Research methods · Foundations
Correlation vs causation
Try this first
Across the year, ice-cream sales and drowning deaths rise and fall together, and the two curves track each other closely. Nobody thinks ice cream drowns people. So what is making both move at once? Name it before reading on.
Two numbers move together. That's all "correlation" means: when one is high, the other tends to be high (or reliably low). It's a real, measurable pattern — the ice-cream and drowning lines really do track each other. The trap is the leap your brain makes for free: one must be causing the other. But a pattern in the data is mute. It tells you the two things are linked; it says nothing about why. And there isn't just one "why" hiding behind it — there are four, and three of them are boring.
The one idea
When A and B move together, there are exactly four explanations: A causes B, B causes A (reverse), a hidden third thing C drives both (confounding), or it's chance. Only the first is the headline. Your job is to rule out the other three before believing it.
The four readings of any link
The ice-cream answer is the third one: summer heat drives both. Hot days sell ice cream and push people into the water, where some drown. Heat is the hidden C sitting above both — a confounder. Neither variable touches the other; they just share a common cause. Once you see one confounder, you start seeing them everywhere, because the world is full of things that move on the same seasons, the same incomes, the same kinds of careful people.
| Reading | What it means | Ice cream & drowning |
|---|---|---|
| A → B | The exciting one: A truly causes B | Ice cream causes drowning (absurd) |
| B → A | Reverse: B causes A, you had the arrow backwards | Drownings cause ice-cream sales (absurd) |
| C → both | Confounding: a third factor drives both | Summer heat drives both — the real answer |
| chance | A fluke; the link vanishes with more data | Possible in tiny samples, not here |
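The confounding row can be checked with a small simulation. The sketch below is illustrative only: the numbers are invented, not real sales or drowning data, and the names `heat`, `ice_cream`, `drowning_risk`, `pearson_r`, and `residuals` are made up for this example. It assumes a toy world in which temperature drives both variables and neither touches the other, then measures the correlation before and after removing the part of each variable that a straight-line fit on heat explains.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def residuals(ys, xs):
    """What is left of ys after a least-squares straight-line fit on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return [(y - mean_y) - slope * (x - mean_x) for x, y in zip(xs, ys)]

rng = random.Random(0)  # fixed seed so the run is repeatable

# Assumed toy model: heat (the hidden C) feeds both variables.
# Neither variable appears in the other's formula.
heat = [rng.uniform(5, 35) for _ in range(2000)]            # daily temperature
ice_cream = [10 * h + rng.gauss(0, 40) for h in heat]       # invented sales figure
drowning_risk = [0.2 * h + rng.gauss(0, 1.5) for h in heat]  # invented risk index

raw_r = pearson_r(ice_cream, drowning_risk)
# Correlation once the heat-explained part is removed from both sides.
controlled_r = pearson_r(residuals(ice_cream, heat),
                         residuals(drowning_risk, heat))

print(f"raw correlation:            {raw_r:.2f}")
print(f"after controlling for heat: {controlled_r:.2f}")
```

Under these assumptions the raw correlation should come out strongly positive, while the controlled one should sit near zero. That is the signature of a C → both link. Removing a straight-line effect is a simplification; real confounders are rarely this tidy or this easy to measure.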
Work one, then finish one
Worked: "Diet soda is linked to weight gain." The headline implies A→B: the sweetener makes you gain weight. But run the four readings. Reverse causation is at least as plausible: people who are already heavier, or trying to lose weight, are the ones most likely to switch to diet soda. On that reading the weight came first and prompted the diet-soda choice, not the other way round, so the arrow points B→A. (Confounding may play a part too: diet-soda drinkers may also differ in how they eat, snack, and track their weight.) The drink looks guilty only because we read the arrow in the direction the headline suggests.
Your turn: "People who meditate are calmer." Give all four readings before you credit meditation. (One: meditation→calm, the headline. Two, reverse: calm, low-stress people are the ones who keep up a meditation habit. Three, confounding: a third factor like more leisure time, higher income, or fewer money worries makes someone both calmer and free to meditate. Four: chance, a fluke in a small or noisy sample. Only the first earns the "meditation works" caption.)
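The fourth reading, chance, is easy to underrate. As a rough sketch (the helper names `pearson_r` and `share_of_flukes` and the 0.5 cutoff are arbitrary choices for this example), the code below generates pairs of series that are unrelated by construction and counts how often they still look correlated, first with 8 data points per series and then with 200.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def share_of_flukes(n_points, trials, rng, threshold=0.5):
    """Fraction of independent random pairs whose |r| exceeds the threshold."""
    hits = 0
    for _ in range(trials):
        a = [rng.gauss(0, 1) for _ in range(n_points)]
        b = [rng.gauss(0, 1) for _ in range(n_points)]  # drawn independently of a
        if abs(pearson_r(a, b)) > threshold:
            hits += 1
    return hits / trials

rng = random.Random(2)  # fixed seed so the run is repeatable
small_sample = share_of_flukes(n_points=8, trials=2000, rng=rng)
large_sample = share_of_flukes(n_points=200, trials=2000, rng=rng)

print(f"|r| > 0.5 with   8 points: {small_sample:.1%} of unrelated pairs")
print(f"|r| > 0.5 with 200 points: {large_sample:.1%} of unrelated pairs")
```

With only a handful of points, a noticeable share of completely unrelated pairs should clear the cutoff; with a couple of hundred points, essentially none should. This is why a link seen in a small or noisy sample deserves the "chance" reading until more data arrives.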
Why this matters
Many of the nutrition and supplement headlines you'll read rest on a correlation from an observational study (researchers watched what people already chose to eat or take and tracked who got sick) dressed up as cause. "People who take fish oil have healthier hearts." "Vitamin D levels are linked to lower risk of everything." Maybe. But fish-oil takers also tend to be wealthier, more health-conscious, and better served by doctors; low vitamin D can be a result of being sick and indoors rather than a cause of the illness. Before you spend money on a bottle, ask the one question the marketing never answers: was this a controlled trial where researchers actually assigned people to take the supplement and watched what happened, or did they just notice that the kind of person who already takes it tends to be healthier anyway? If it's the second, you're looking at a correlation wearing a cause's clothes.
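The difference between those two kinds of study can be sketched in code. Everything below is a made-up toy model, not data from any real study: it assumes a supplement with zero true effect (`TRUE_EFFECT = 0.0`) and a hidden trait, here called `conscious`, that improves health and also makes people more likely to buy the supplement. The same simulated population is then examined twice, once letting people choose for themselves and once assigning the supplement by coin flip.

```python
import random
from statistics import mean

rng = random.Random(1)  # fixed seed so the run is repeatable
TRUE_EFFECT = 0.0       # assumption: the supplement does nothing at all

def health_score(conscious, takes):
    """Invented health score: driven by the hidden trait, not the supplement."""
    return 50 + 10 * conscious + TRUE_EFFECT * takes + rng.gauss(0, 5)

def gap(records):
    """Mean health of supplement takers minus mean health of everyone else."""
    takers = [h for takes, h in records if takes]
    others = [h for takes, h in records if not takes]
    return mean(takers) - mean(others)

# Hidden health-consciousness for each simulated person, between 0 and 1.
people = [rng.random() for _ in range(20000)]

# Observational study: the more health-conscious, the likelier to take it.
observational = []
for conscious in people:
    takes = rng.random() < conscious
    observational.append((takes, health_score(conscious, takes)))

# Controlled trial: a coin flip decides, so the hidden trait cannot sneak in.
trial = []
for conscious in people:
    takes = rng.random() < 0.5
    trial.append((takes, health_score(conscious, takes)))

observational_gap = gap(observational)
trial_gap = gap(trial)

print(f"observational gap: {observational_gap:+.2f}")
print(f"randomized gap:    {trial_gap:+.2f}")
```

In this toy world the observational comparison should show takers looking clearly healthier even though the supplement is inert, while the coin-flip comparison should show a gap close to zero. Random assignment works because it cuts the link between the hidden trait and who ends up taking the supplement.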
Recall check · no peeking
- Name the four explanations for any observed correlation between A and B.
- What is reverse causation, in one sentence, with an example of your own?
- Which hedge words in a headline signal "this is only a correlation"?
Explain it back
In one plain sentence, tell a friend why a study saying a supplement is "linked to" better health does not mean the supplement causes better health.