Understanding Causal Inference in Epidemiology: From Risk Factors to Counterfactual Regression

YouTube video ID: rUxorJ_3NkM

Source: YouTube video by Chisquares

Introduction

The lecture walks through core concepts that differentiate risk factors from causal factors, critiques classic frameworks like Koch’s postulates, and introduces modern tools for establishing causality in public health research.

Risk Factors vs. Causal Factors

  • Risk factor: a proxy variable (e.g., living in sub‑Saharan Africa) that captures a bundle of underlying conditions such as malnutrition and poverty.
  • Causal factor: the true underlying mechanism that directly leads to disease. Measuring causal factors is often impractical, so epidemiologists rely on risk factors to guide prevention.

Limitations of Koch’s Postulates

  • Originally designed for infectious diseases, they assume a single, identifiable pathogen.
  • In chronic diseases, many exposures interact; baseline risk exists even without known exposure, making the postulates insufficient.

Bradford Hill Criteria (ordered by perceived strength)

  1. Experiment – Randomized trials provide the strongest evidence; however, ethical constraints limit their use for harmful exposures.
  2. Temporality – Exposure must precede outcome.
  3. Biologic Gradient – Dose‑response relationship (more exposure → higher risk).
  4. Strength of Association – Larger effect sizes (e.g., odds ratio > 3) indicate stronger evidence.
  5. Coherence – Findings should align with related biological pathways and biomarkers.
  6. Consistency – Replication across studies/populations, though consensus does not guarantee truth.
  7. Specificity – The exposure leads to a single outcome; rarely met in chronic disease.
  8. Analogy – Similar known relationships support the new hypothesis.
  9. Plausibility (Biologic Possibility) – The relationship must make sense biologically, but over‑reliance on current knowledge can be misleading.
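
The strength-of-association criterion rests on simple arithmetic from a 2×2 table. The sketch below shows how an odds ratio and a risk ratio could be computed; the counts are hypothetical and chosen only for illustration, and the function names are not from the lecture.

```python
# Hypothetical 2x2 table (illustrative counts, not taken from the lecture).
exposed_cases, exposed_noncases = 30, 70
unexposed_cases, unexposed_noncases = 10, 90


def odds_ratio(a, b, c, d):
    """Odds of disease among the exposed divided by odds among the unexposed.

    a, b = cases and non-cases among the exposed
    c, d = cases and non-cases among the unexposed
    """
    return (a / b) / (c / d)


def risk_ratio(a, b, c, d):
    """Risk of disease among the exposed divided by risk among the unexposed."""
    return (a / (a + b)) / (c / (c + d))


table = (exposed_cases, exposed_noncases, unexposed_cases, unexposed_noncases)
or_estimate = odds_ratio(*table)
rr_estimate = risk_ratio(*table)

print(f"Odds ratio: {or_estimate:.2f}")
print(f"Risk ratio: {rr_estimate:.2f}")
```

With these made-up counts the odds ratio is above the lecture's example threshold of 3, while the risk ratio sits right at it; the two measures diverge more as the outcome becomes more common.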

Rothman’s Sufficient‑Component Cause Model

  • Sufficient cause: a set of conditions that together can produce disease.
  • Necessary component: a factor present in every sufficient cause (e.g., factor A in all four example pies).
  • A factor can be necessary but not sufficient, sufficient but not necessary, or both. The model helps visualize multiple pathways to the same outcome.
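
One way to make the pie model concrete is to treat each sufficient cause as a set of component factors. The sketch below uses four hypothetical pies (the factor letters are placeholders, not the lecture's actual diagram) and finds the necessary component as the factor shared by every pie.

```python
# Four hypothetical sufficient causes ("pies"); the letters are placeholders.
sufficient_causes = [
    {"A", "B", "C"},
    {"A", "D", "E"},
    {"A", "B", "F"},
    {"A", "C", "G"},
]

# A necessary component appears in every sufficient cause,
# so it is the intersection of all pies.
necessary_components = set.intersection(*sufficient_causes)


def disease_occurs(factors_present):
    """Disease occurs once every component of at least one pie is present."""
    return any(pie <= factors_present for pie in sufficient_causes)


print(necessary_components)              # the factor(s) common to all pies
print(disease_occurs({"A", "B", "C"}))   # a complete pie
print(disease_occurs({"B", "C", "D"}))   # no pie is complete without A
```

In this toy setup, factor A is necessary but not sufficient: removing it blocks every pathway, yet A alone completes no pie.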

Causal Inference Using Regression and Counterfactuals

  1. Associational analysis compares observed exposed vs. unexposed groups.
  2. Causal analysis asks: What would the average outcome be if the entire population were exposed versus if none were? This is the counterfactual framework.
  3. Key assumptions:
       • Well‑defined interventions – the exposure is clearly defined (e.g., quit smoking vs. continue smoking).
       • Exchangeability – within levels of the measured confounders, the exposed and unexposed groups are comparable: had the unexposed been exposed, their average outcome would have matched that of the exposed, and vice versa.
       • Positivity – every subgroup defined by the confounders must contain both exposed and unexposed individuals.
       • Consistency – the observed outcome under the actual exposure matches the counterfactual outcome under the same exposure.
       • No model misspecification – the statistical model must correctly represent the true functional form (linear vs. quadratic, etc.).
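
In symbols, the contrast described above can be written as follows. The notation is an assumption added here for illustration (it is not shown in the summary): $A$ is the exposure, $Y$ the outcome, $L$ the measured confounders, and $Y^{a}$ the outcome that would be observed under exposure level $a$.

```latex
\begin{align*}
\text{Average causal effect} &= \mathrm{E}\left[Y^{a=1}\right] - \mathrm{E}\left[Y^{a=0}\right] \\
\mathrm{E}\left[Y^{a}\right] &= \sum_{l} \mathrm{E}\left[Y \mid A = a,\, L = l\right] \Pr\left[L = l\right]
\end{align*}
```

The second line (standardization over the confounder distribution) holds only when the assumptions listed above are met; the regression model supplies the estimate of $\mathrm{E}[Y \mid A = a, L = l]$.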

Practical Example: Smoking Cessation and Weight Gain (NHANES)

  • Data: 1,556 adult smokers with baseline (1971‑75) and follow‑up (1982) measurements.
  • Exposure: quitting smoking between surveys.
  • Outcome: change in body weight.
  • Procedure:
      1. Create three copies of the dataset: original, all‑exposed, and all‑unexposed.
      2. Set exposure to 1 in the all‑exposed copy and 0 in the all‑unexposed copy; mask outcomes (set to missing) in these two copies so they do not influence the model fit.
      3. Fit a regression model on the original data, adjusting for confounders (age, sex, race, education, physical activity, smoking intensity, smoking duration, baseline weight).
      4. Predict outcomes for the two counterfactual copies using the fitted model.
      5. Compute the average predicted weight change in the all‑exposed and all‑unexposed copies; the difference (≈ 3.46 kg) is the estimated average causal effect.
  • Interpretation:
      • Associational: “People who quit smoking gained on average 3.46 kg more than those who did not quit.”
      • Causal: “If everyone in the population quit smoking, the mean weight gain would be 3.46 kg higher than if nobody quit.”
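
The procedure above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the lecture's code: the data are a hypothetical, noise-free toy sample (not the NHANES cohort), age is the only confounder, and `fit_ols` is a small helper written here so the example needs no external libraries. A real analysis would use a statistics package and the full confounder set.

```python
from statistics import mean

# Hypothetical toy data, NOT the NHANES sample.
# Weight change is constructed as 1.0 + 3.0*quit + 0.1*age, so the true
# effect of quitting is 3.0 kg by design. Older people quit more often,
# which confounds the naive comparison.
ages = [25, 30, 35, 40, 45, 50, 55, 60]
quit_flags = [0, 0, 0, 1, 0, 1, 1, 1]
rows = [(q, a, 1.0 + 3.0 * q + 0.1 * a) for q, a in zip(quit_flags, ages)]


def fit_ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = []
    for i in range(k):
        row = [sum(r[i] * r[j] for r in X) for j in range(k)]
        row.append(sum(r[i] * yi for r, yi in zip(X, y)))
        A.append(row)
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def design(quit, age):
    """Model terms: intercept, exposure, confounder."""
    return [1.0, float(quit), float(age)]


# Fit the outcome model on the original (observed) data only.
beta = fit_ols([design(q, a) for q, a, _ in rows], [w for _, _, w in rows])


def predict(quit, age):
    return sum(b * x for b, x in zip(beta, design(quit, age)))


# Predict for the all-exposed and all-unexposed copies, then average.
mean_all_quit = mean(predict(1, a) for _, a, _ in rows)
mean_none_quit = mean(predict(0, a) for _, a, _ in rows)
ate = mean_all_quit - mean_none_quit

# Unadjusted comparison of observed group means, for contrast.
naive_difference = (mean(w for q, _, w in rows if q == 1)
                    - mean(w for q, _, w in rows if q == 0))

print(f"Standardized estimate: {ate:.2f} kg")
print(f"Naive difference:      {naive_difference:.2f} kg")
```

Because the toy outcome is built with a 3.0 kg effect, the standardized estimate recovers it, while the unadjusted difference is inflated by the age imbalance between quitters and non-quitters. In a linear model without exposure-confounder interactions, the standardized difference equals the exposure coefficient; the copy-and-predict procedure matters more once interactions or nonlinear terms are included.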

Interpreting Causal vs. Associational Results

  • Causal estimates refer to population‑level interventions; they are not statements about individual behavior.
  • Associations can be biased by confounding; causal estimates are free of that bias only to the extent that the assumptions listed above hold.
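
Of the assumptions, positivity is one that can be inspected directly in the data. The sketch below flags confounder strata that lack either exposed or unexposed individuals; the records and the function name `positivity_violations` are hypothetical and used only for illustration.

```python
from collections import defaultdict

# Hypothetical records: (sex, age_group, quit). Not real study data.
records = [
    ("F", "<40", 1), ("F", "<40", 0),
    ("M", "<40", 0), ("M", "<40", 1),
    ("F", "40+", 1), ("F", "40+", 0),
    ("M", "40+", 0),                   # no exposed men aged 40+
]


def positivity_violations(records):
    """Return confounder strata that do not contain both exposure levels."""
    exposures_seen = defaultdict(set)
    for *stratum, exposure in records:
        exposures_seen[tuple(stratum)].add(exposure)
    return sorted(s for s, seen in exposures_seen.items() if seen != {0, 1})


print(positivity_violations(records))
```

A flagged stratum means the model must extrapolate to predict the missing counterfactual there, so the estimate for that subgroup rests on the model rather than on data.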

Take‑Home Messages

  • Risk factors are useful proxies when direct measurement of causal mechanisms is impossible.
  • Bradford Hill criteria provide a structured, though not definitive, checklist for causal inference.
  • Rothman’s component‑cause diagrams clarify that multiple pathways can lead to the same disease.
  • Counterfactual regression, combined with rigorous assumptions, allows researchers to estimate what would happen under hypothetical universal exposure scenarios.
  • Proper model specification and careful handling of missing data are essential to avoid biased causal estimates.

Causal inference in epidemiology moves beyond simple associations by applying frameworks such as the Bradford Hill criteria, Rothman's component‑cause model, and counterfactual regression, each of which depends on explicit assumptions. Used together, these tools let researchers estimate the population‑level impact of public‑health interventions.
