Point Estimates and Confidence Intervals

Let’s take a look at the point estimates and confidence intervals for research synthesis and meta-analysis.

Research synthesis and meta-analysis

Studies that produce wide intervals and non-significant results have a lower chance of being published or of being clearly reported, so a meta-analysis is less likely to include them. However, if we do conduct a meta-analysis, we must include all research conducted on the topic of focus, covering both significant and non-significant results.

Meta-analysis can be useful in situations where we have various studies that have returned wide intervals as a result of small sample sizes. If we combine the data from these studies, meta-analysis can detect general effects that may have previously been deemed non-significant (or marginal) in each of the smaller studies.
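To make the idea of combining small studies concrete, here is a minimal sketch of inverse-variance (fixed-effect) pooling, one common way of combining estimates. The function name `pool_fixed_effect`, the three estimates, and their standard errors are all hypothetical and chosen purely for illustration. The intervals use a normal approximation (z = 1.96), which is an assumption rather than a method taken from this lesson.

```python
import math

def pool_fixed_effect(estimates, std_errors, z=1.96):
    """Inverse-variance (fixed-effect) pooling of study estimates.

    Returns the pooled estimate and an approximate 95% interval.
    """
    # Each study is weighted by the inverse of its variance,
    # so more precise studies count for more.
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * est for w, est in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled - z * pooled_se, pooled + z * pooled_se

# Hypothetical effect estimates and standard errors from three small studies.
estimates = [2.0, 2.5, 1.5]
std_errors = [1.2, 1.5, 1.1]

# Approximate 95% interval for each study on its own.
for est, se in zip(estimates, std_errors):
    print(f"study:  {est:.2f} [{est - 1.96 * se:.2f}, {est + 1.96 * se:.2f}]")

pooled, low, high = pool_fixed_effect(estimates, std_errors)
print(f"pooled: {pooled:.2f} [{low:.2f}, {high:.2f}]")
```

With these made-up inputs, each study's interval spans zero, while the pooled interval is narrower and lies entirely above zero. A fixed-effect model assumes that all studies estimate the same underlying effect. If the studies differ systematically, a random-effects model would be a more appropriate choice.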

Reproducibility of a result is a key part of the scientific method, and meta-analysis is one of our main tools for detecting reproducible results. However, data can only be included in a meta-analysis if they’re available. Providing point estimates, intervals, and sample sizes for all results, whether statistically significant or not, helps facilitate this synthesis of published research. This is yet another reason why an estimation-based approach to analysis is favorable.

Results

As we just mentioned, the reproducibility of results is a key part of the scientific method. Unfortunately, the emphasis placed on novelty by both journals and the grant review process can discourage replication studies. Despite this, we can still find generally similar experiments. As results build up in the literature, they can be synthesized to see how consistent the results are and how dependent the outcome is on confounding variables.

There are ways of working with p-values for both formal meta-analysis and less formal research reviews. However, effect sizes with a clearly defined interval (and sample sizes) are far preferable for both. Less formal research reviews often use vote counting, where each analysis contributes a single vote: statistically significant or non-significant. For example, our analysis of Darwin’s data produced a statistically significant result, so it counts as a positive vote for a negative effect of inbreeding on evolutionary fitness.

Now consider another (imaginary) study, say by Wallace, with a similar mean effect of inbreeding that is not statistically significant. Focusing on p-values gives us a picture of inconsistency: Darwin’s study was statistically significant, but Wallace’s was not, so we have one vote each way. As we’ve previously discussed, it’s always preferable to present point estimates and intervals. The figure below displays both results and gives a different impression.
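The contrast between vote counting and interval estimates can also be sketched numerically. The mean differences and standard errors below are made up for illustration and are not Darwin's actual values. The helper `interval` is hypothetical, and it uses a normal approximation (z = 1.96) rather than a t-based interval.

```python
def interval(mean_diff, std_error, z=1.96):
    """Approximate 95% interval for a mean difference (normal approximation)."""
    return mean_diff - z * std_error, mean_diff + z * std_error

# Hypothetical (mean difference, standard error) for each study.
studies = {
    "Darwin": (2.6, 1.2),
    "Wallace": (2.4, 1.6),  # imaginary study: similar effect, less precise
}

intervals = {name: interval(*values) for name, values in studies.items()}

# Vote counting: a study "votes yes" only if its interval excludes zero.
votes = {name: not (low <= 0 <= high) for name, (low, high) in intervals.items()}

# Interval view: do the two intervals share a common range of values?
lows = [low for low, _ in intervals.values()]
highs = [high for _, high in intervals.values()]
overlap = max(lows) < min(highs)

for name, (low, high) in intervals.items():
    estimate = studies[name][0]
    print(f"{name}: {estimate:.2f} [{low:.2f}, {high:.2f}] significant={votes[name]}")
print("intervals overlap:", overlap)
```

Under these assumed numbers, vote counting records one significant and one non-significant result. The intervals, however, overlap substantially around a similar point estimate, which suggests the two studies are consistent rather than contradictory.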
