Research methods · Foundations

The evidence ladder

Try this first

Four claims about a supplement, all "backed by science." Rank them strongest evidence first: a friend swears it changed their life; a result in mice; a survey linking the supplement to better health; a randomized trial in people. Which would actually move your decision?

You're deciding whether a supplement is worth it: say creatine, collagen, or some peptide a podcaster is excited about. You open the brand's page and it's a wall of citations. There's a study about receptors, a study in rats, a big survey, a glowing quote from a customer. It all says "supported by research," so it all feels roughly equal. It isn't. Two studies can both be real, peer-reviewed, and honestly reported, and still be worlds apart in what they're allowed to prove. The first question is never "is there a study?" It's "what kind of study is it?"

The one idea

Different types of evidence sit on a ladder, from anecdote at the bottom to systematic review at the top. The rung is a ceiling on what the study can prove — not a gold star it earned. A mechanism or a mouse study can be flawless and still cannot, by design, tell you what a pill does in your body. Rank the rung first; only then argue about quality.

The six rungs, bottom to top

Each step up adds something the one below it couldn't give you. Anecdote has no comparison group at all. A mechanism, including in-vitro work, explains why something might work, but a plausible story is not an effect. Animal studies let you intervene cleanly in a living organism, but the subject isn't a human. Observational studies finally watch real people, but can't separate the effect of the supplement from the kind of person who chooses to take it. A randomized controlled trial (RCT) breaks that link by assigning the pill at random. A systematic review or meta-analysis pools many trials so one fluke can't carry the verdict.
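
To see why random assignment matters, here is a minimal simulation sketch. Everything in it is an assumption made up for illustration: the supplement is given zero true effect, a hidden "health habits" score drives the outcome, and in the observational world people with better habits are more likely to take the pill. The names (simulate, habits, gap) are hypothetical.

```python
import random

def simulate(n=20_000, seed=0):
    """Compare an observational comparison with a randomized one.

    Hypothetical setup: the supplement has ZERO true effect. A hidden
    'habits' score improves the outcome and, in the observational world,
    also makes a person more likely to choose the supplement.
    """
    rng = random.Random(seed)
    obs = {True: [], False: []}   # outcomes grouped by who chose the pill
    rct = {True: [], False: []}   # outcomes grouped by coin-flip assignment

    for _ in range(n):
        habits = rng.random()                      # hidden confounder in [0, 1)
        outcome = 2.0 * habits + rng.gauss(0, 1)   # the pill contributes nothing

        chose_pill = rng.random() < habits         # self-selection
        assigned_pill = rng.random() < 0.5         # random assignment

        obs[chose_pill].append(outcome)
        rct[assigned_pill].append(outcome)

    def gap(groups):
        """Mean outcome of pill-takers minus mean outcome of everyone else."""
        def mean(xs):
            return sum(xs) / len(xs)
        return mean(groups[True]) - mean(groups[False])

    return gap(obs), gap(rct)

obs_gap, rct_gap = simulate()
print(f"observational gap: {obs_gap:.2f}")
print(f"randomized gap:    {rct_gap:.2f}")
```

Under these assumptions the observational gap should come out clearly positive (about two thirds in expectation) even though the pill does nothing, while the randomized gap should sit near zero. The only difference between the two comparisons is who decided who got the pill.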

Type sets the ceiling. The top two rungs are where human claims live.

What each rung can and can't prove
Rung	What it shows	Hard limit
1 · Anecdote	One person felt better	No comparison; could be placebo, chance, or anything else
2 · Mechanism / in-vitro	A plausible reason it could work	A story, not an effect; cells in a dish aren't a body
3 · Animal	An effect in a living organism	Mice aren't humans; doses and biology differ
4 · Observational	A link in real people	Can't separate the pill from who chooses to take it
5 · RCT	Cause, by random assignment	One trial can still be small, short, or a fluke
6 · Review / meta-analysis	The weight of many trials	Only as good as the trials it pools
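
The last row, "the weight of many trials," can be sketched with inverse-variance (fixed-effect) pooling, which is one common way to combine trial estimates and not necessarily what any particular review uses. The trial numbers below are invented for illustration, and pool_fixed_effect is a hypothetical helper name.

```python
import math

def pool_fixed_effect(trials):
    """Inverse-variance pooling of (estimate, standard_error) pairs."""
    weights = [1.0 / se**2 for _, se in trials]
    total = sum(weights)
    pooled = sum(w * est for w, (est, _) in zip(weights, trials)) / total
    pooled_se = math.sqrt(1.0 / total)
    return pooled, pooled_se

# Made-up numbers, for illustration only: one small, noisy trial with a big
# apparent effect, and three larger trials that found almost nothing.
hypothetical_trials = [
    (0.90, 0.50),
    (0.10, 0.10),
    (0.00, 0.10),
    (0.05, 0.10),
]

pooled, pooled_se = pool_fixed_effect(hypothetical_trials)
print(f"pooled effect: {pooled:.3f} (SE {pooled_se:.3f})")
```

With these invented inputs the pooled effect lands around 0.06, far from the 0.90 reported by the small trial, because precise trials get more weight. The hard limit in the table still applies: pooling cannot repair trials that were biased to begin with.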

Rung first, quality second

A common mistake is to argue quality before type — "but it's a well-run mouse study." Well-run for a mouse study still tops out at rung three. Type sets the ceiling; quality just tells you how close to that ceiling a given study reaches. A sloppy RCT can be worse than a clean one, but no amount of polish lets a mechanism claim prove what a pill does in people. Place the rung, then ask whether the study did its rung well.
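
One way to picture "rung first, quality second" is as a two-part sort key: compare the rung, and let quality break ties only within a rung. This is a rough sketch, not a scoring method from the guide. The studies and their quality scores are hypothetical, and a strict sort is a simplification of "type sets the ceiling."

```python
from enum import IntEnum

class Rung(IntEnum):
    """The six rungs from the table, bottom (1) to top (6)."""
    ANECDOTE = 1
    MECHANISM = 2
    ANIMAL = 3
    OBSERVATIONAL = 4
    RCT = 5
    REVIEW = 6

# Hypothetical studies: (label, rung, quality score from 0 to 1).
studies = [
    ("well-run mouse study", Rung.ANIMAL, 0.95),
    ("sloppy RCT", Rung.RCT, 0.40),
    ("clean RCT", Rung.RCT, 0.90),
    ("elegant receptor paper", Rung.MECHANISM, 0.99),
]

def rank(studies):
    """Strongest first: compare rung, then use quality only to break ties."""
    return sorted(studies, key=lambda s: (s[1], s[2]), reverse=True)

for label, rung, quality in rank(studies):
    print(f"{rung.name:<10} q={quality:.2f}  {label}")
```

In this toy ordering the receptor paper finishes last despite having the highest quality score, because polish moves a study up within its rung and never across rungs.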

Work one, then finish one

Worked: "Ashwagandha raises testosterone because it acts on the right receptors." Where does this sit? The phrase "because it acts on the right receptors" is a mechanism claim, which puts it on rung two. It describes a pathway by which the effect could happen. That's genuinely useful for deciding what's worth testing, but it proves nothing about whether real men taking real doses end up with higher testosterone. Plausible is not proven. To move up, you'd need it shown in people, ideally in a randomized trial.

Your turn: "In 12 mice, compound X extended lifespan by 20%." Which rung, and what's the catch? (Rung 3: animal evidence, third from the bottom. It's a real effect in living creatures, so it beats a mechanism story, but the sample is tiny (n = 12), the subject is a mouse, not a human, and lifespan results in mice notoriously fail to carry over. Interesting; nowhere near a reason to buy.)

Why this matters

Almost all supplement marketing lives on the bottom two rungs (a mechanism diagram and a pile of testimonials), staged to feel like it's near the top. The brand page for a collagen or a peptide will lead with "clinically studied ingredients," then cite a receptor pathway (rung 2) and five-star reviews (rung 1), with maybe a rat study (rung 3) for weight. A podcaster does the same out loud: a confident causal story stitched from a mechanism and an anecdote. The instant you ask "what type of evidence is this, actually?" most of it collapses to "plausible" and "someone felt better," which is exactly the evidence you'd expect even if the product did nothing. For a real purchase, you want at least one decent RCT in humans, and ideally a review of several. If the strongest thing on offer is a mechanism and a testimonial, you're being sold a story, not a result.

Recall check · no peeking

Name the six rungs in order, bottom to top.
Why do mechanism and animal data rank low when the claim is about humans?
What does "rung first, quality second" mean, and why is that the right order?

Explain it back

In one plain sentence, tell a friend why a single good randomized trial in people can outweigh ten glowing testimonials.

Learn · Shawon Chowdhury · a study guide, kept rough on purpose