
Systematic reviews & meta-analyses

Try this first

A meta-analysis of 15 studies "proves" a supplement works, and the brand puts it on the label. Before reading on, name two things that could make that conclusion worthless even though the math is correct.

Say you want to know whether a supplement does anything. One trial isn't enough — it could be a fluke, or too small, or run by a lab that got lucky. So someone smarter does the obvious thing: they round up every trial ever run on that supplement, line up the results, and combine them into a single number with a tight margin of error. That's a meta-analysis, and it sits at the very top of the evidence ladder. It feels like the final word. The trouble is that combining studies doesn't fix bad studies — it averages them. If the trials feeding in are small, short, or paid for by the seller, you don't get the truth. You get a wrong answer with impressive-looking precision.

The one idea

A systematic review gathers all the relevant studies by a stated, repeatable method; a meta-analysis statistically pools their numbers into one estimate. It's the top rung of the ladder — but it's garbage in, garbage out. Pool weak, biased, or industry-funded trials and you get a more precise wrong answer. Before you trust the pooled number, check the inputs.

Two words people use interchangeably, and shouldn't

A systematic review is a search protocol: it sets the question, the inclusion rules, and the databases in advance, then finds everything that qualifies — so the result doesn't depend on which studies the author happened to like. A meta-analysis is the optional next step: take the numbers from those studies and pool them into one weighted estimate, usually drawn as a forest plot. Every meta-analysis should rest on a systematic review; not every review ends in a meta-analysis (sometimes the studies are too different to combine, which is itself a finding).
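To make "one weighted estimate" concrete, here is a minimal sketch of the simplest pooling scheme, inverse-variance (fixed-effect) weighting, where more precise studies count for more. It is one common approach, not necessarily what any given review used, and the function name and the three effect sizes are made up for illustration.

```python
import math

def pool_fixed_effect(estimates, std_errors):
    """Inverse-variance (fixed-effect) pooled estimate and its standard error."""
    # Each study is weighted by 1 / SE^2, so precise studies dominate.
    weights = [1.0 / se**2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se

# Hypothetical effect sizes and standard errors (not from any real trial).
estimates = [0.10, 0.30, 0.20]
std_errors = [0.10, 0.20, 0.10]

pooled, pooled_se = pool_fixed_effect(estimates, std_errors)
low, high = pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se
print(f"pooled = {pooled:.3f}, 95% CI = ({low:.3f}, {high:.3f})")
```

Notice that the pooled standard error is smaller than any single study's. That is where the tight margin comes from, and nothing in the formula asks whether the studies were any good.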

Four checks separate a trustworthy pooled number from a dressed-up guess.

What to check before you trust the diamond
Check	The question it answers	Bad sign
Quality grade (e.g. GRADE)	How sure are the authors, really?	Rated "low" or "very low" certainty
Heterogeneity	Are these studies even comparable?	High I², results scattered apart
Publication bias	Did the null results get buried?	No funnel plot or test reported
Funding & endpoints	Who paid, and what was measured?	Industry-funded, surrogate outcomes

Forest plot: each square an estimate, each line its interval; the diamond is the pooled result. Studies C and F sit far off — apples mixed with oranges.

Heterogeneity: are you even pooling the same thing?

Look at the plot again. Most studies cluster near the no-effect line, but C and F sit way off to either side. That spread is heterogeneity: the studies disagree more than chance alone would explain. Maybe they used different doses, different patients, different durations, or measured different outcomes. When studies are that inconsistent, averaging them is like averaging the temperature of an oven and a freezer to declare the room "comfortable." A summary statistic called I² estimates what percentage of the scatter is real disagreement rather than noise; as a rough rule of thumb, values above about 50% are read as substantial. High heterogeneity doesn't always void a meta-analysis, but it means the single pooled number hides as much as it reveals, and you should be suspicious of anyone who quotes the diamond without mentioning the spread.
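A rough sketch of how I² is usually computed, from Cochran's Q (the weighted sum of squared distances from the pooled estimate). The six studies below are invented to mimic the plot, with C and F far off on either side; the numbers and the function name are assumptions for illustration only.

```python
def i_squared(estimates, std_errors):
    """I^2 (percent) from Cochran's Q, using inverse-variance weights."""
    weights = [1.0 / se**2 for se in std_errors]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, estimates))
    df = len(estimates) - 1
    if q <= 0:
        return 0.0
    # Share of Q beyond what chance alone predicts (df), floored at zero.
    return max(0.0, (q - df) / q) * 100.0

# Hypothetical studies A..F; C (0.90) and F (-0.80) are the outliers.
all_six = [0.05, 0.00, 0.90, 0.10, -0.05, -0.80]
clustered = [0.05, 0.00, 0.10, -0.05]  # same set without C and F

print(f"with C and F:    I^2 = {i_squared(all_six, [0.10] * 6):.0f}%")
print(f"without C and F: I^2 = {i_squared(clustered, [0.10] * 4):.0f}%")
```

With the two outliers in, almost all of the scatter counts as real disagreement; take them out and I² drops to zero, because the remaining studies differ by no more than chance would predict.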

Work one, then finish one

Worked: A label cites a meta-analysis: "15 trials, supplement significantly improves outcomes." You pull the review. The 15 trials are all small (under 60 people), short (4–8 weeks), funded by makers of the supplement, and every one measured a surrogate endpoint: a blood marker that's supposed to track health, not actual health outcomes like heart attacks or how long people live. Pooling them produces a narrow, confident diamond. But narrow only means the inputs agreed with each other and added up to enough data; it says nothing about whether the inputs were any good. Fifteen biased trials measuring the wrong thing pool into a precise answer to a question you don't care about. Garbage in, garbage out, now with a tight confidence interval. Precision is not accuracy.
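The arithmetic behind "precision is not accuracy" can be sketched in a few lines. Everything here is an assumption chosen for illustration: a true effect of zero, a shared bias of 0.3 in every trial, and a per-trial standard error of 0.26 (roughly what a 60-person trial might give). It also ignores sampling noise, so every trial lands exactly on the biased value.

```python
import math

TRUE_EFFECT = 0.0   # assumption: the supplement does nothing
SHARED_BIAS = 0.3   # assumption: every trial is skewed the same way
TRIAL_SE = 0.26     # assumption: roughly a 60-person trial

def pooled_interval(n_trials):
    """95% CI from pooling n equally weighted trials that share one bias."""
    center = TRUE_EFFECT + SHARED_BIAS  # pooling does not remove a shared bias
    pooled_se = TRIAL_SE / math.sqrt(n_trials)  # only the width shrinks
    return center - 1.96 * pooled_se, center + 1.96 * pooled_se

for n in (1, 5, 15):
    low, high = pooled_interval(n)
    covers_truth = low <= TRUE_EFFECT <= high
    print(f"{n:>2} trials: CI = ({low:.2f}, {high:.2f}), contains true effect: {covers_truth}")
```

Adding trials shrinks the interval around the biased value, not around the truth. One trial is honest about its uncertainty; fifteen of them confidently rule out the right answer.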

Your turn: Two meta-analyses on the same supplement reach opposite conclusions — one says it works, one says it doesn't. The math in both is fine. What explains the clash? (They used different inclusion criteria and quality thresholds — each one let in a different set of studies. One may have admitted small industry trials the other excluded for low quality, so they're pooling different evidence and landing in different places.)
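A small sketch of how that clash can happen with identical math. The six trials are hypothetical, and the only thing that differs between the two "reviews" is a minimum sample size, standing in for whatever inclusion and quality rules real reviews might apply.

```python
import math

# Hypothetical trials: (effect estimate, standard error, sample size).
TRIALS = [
    (0.50, 0.20, 50),
    (0.45, 0.20, 55),
    (0.60, 0.25, 40),
    (0.40, 0.20, 58),
    (0.02, 0.12, 300),
    (-0.03, 0.12, 320),
]

def pool(trials):
    """Inverse-variance pooled 95% confidence interval."""
    weights = [1.0 / se**2 for _, se, _ in trials]
    pooled = sum(w * est for w, (est, _, _) in zip(weights, trials)) / sum(weights)
    half_width = 1.96 * math.sqrt(1.0 / sum(weights))
    return pooled - half_width, pooled + half_width

def review(min_sample_size):
    """Pool only the trials that pass this review's inclusion rule."""
    included = [t for t in TRIALS if t[2] >= min_sample_size]
    return pool(included)

lenient = review(min_sample_size=0)    # lets every trial in
strict = review(min_sample_size=100)   # excludes the small trials

print(f"lenient review: CI = ({lenient[0]:.2f}, {lenient[1]:.2f})")
print(f"strict review:  CI = ({strict[0]:.2f}, {strict[1]:.2f})")
```

The lenient review's interval sits entirely above zero ("it works"); the strict review's interval straddles zero ("no clear effect"). Same formula, different inputs.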

Why this matters

This is exactly where a real supplement decision goes wrong. A podcaster or a brand page flashes "backed by a meta-analysis" as if it ends the argument, and it lands hard: meta-analysis sounds like the top of the ladder, because it is. But a meta-analysis headline is only as good as the studies stacked underneath it, and almost nobody clicks through to check them. Before you spend money on the bottle, open the actual review: check the funnel plot for lopsidedness that hints at buried null results, find the GRADE rating, look at whether the trials were big and long or small and short, and notice who paid. If the inputs are weak, the pooled number is a confident wrong answer, and the confidence is the trap.

Recall check · no peeking

What's the difference between a systematic review and a meta-analysis?
What does "garbage in, garbage out" mean for a pooled estimate, and why doesn't a narrow confidence interval rescue it?
Why do the heterogeneity and publication-bias checks matter before you trust the diamond?

Explain it back

In one plain sentence, tell a friend why "but it's a meta-analysis" isn't an automatic mic-drop.

Learn · Shawon Chowdhury · a study guide, kept rough on purpose