Confounding variable: Everything you need to know

Last updated

11 June 2026

Author

Dovetail Editorial Team

Reviewed by

Cathy Heath

Summarize with AI

Analyze your research data

Quickly find patterns and themes across all your data when you analyze it in Dovetail

Contact sales

A confounding variable (or confounder) is a third variable that influences both the independent and dependent variables in a study, creating a false impression of cause and effect. Because confounders usually enter a study unnoticed, they can lead even experienced researchers to incorrect conclusions.

This article explains how confounding variables work, shows examples of they’ve distorted, and covers proven techniques for reducing their influence.

[Embed: 1QRx4m3axKbEtdiGslIn1K]

What’s the difference between confounding, independent, and dependent variables?

To define a confounding variable, it helps to start with the at the heart of most studies.

A is the thing you’re measuring or observing to see how it responds to changes in another variable—the independent variable, which researchers manipulate. In other words, the dependent variable’s value hinges on the independent variable’s value.

Imagine you’re evaluating the effectiveness of a tutoring program on students’ math scores. The quality or length of the tutoring program is the independent variable, and the students’ scores are the dependent variable.

This type of experiment usually has a and a treatment group. The treatment group receives the treatment—tutoring in this example, though it could be a medication or type of therapy. The control group doesn’t receive the treatment, which gives you a baseline for measuring the treatment’s effectiveness.

What is a confounding variable?

A confounding variable relates to both an experiment’s dependent and independent variables. Confounders can be hard to spot because you don’t bring them into an experiment deliberately—but they can still shape the outcome.

In the tutoring example, a simple confounding variable could be parental involvement. Perhaps the parents who signed their children up for tutoring are also more actively involved in their education generally.

If students achieved higher math scores after tutoring, you could argue the involved parents played a role rather than the tutoring program.

Why do confounding variables matter?

Confounders influence the conclusions of an experiment in ways researchers don’t intend. They create false associations: a researcher may believe Variable A leads to Conclusion B when it’s actually the confounder causing the change.

That’s why experimenters look for ways to reduce or eliminate confounding variables. If you were studying the tutoring program, you’d assign students to the program randomly rather than let their parents sign them up—effectively eliminating parental involvement as a confounder.

How can confounding variables harm research?

Confounding variables can seem abstract, but they’ve cast doubt on—or outright invalidated—well-known studies. Here are two documented examples.

Relationship between alcohol consumption and lung cancer

A study found that drinking alcohol more than doubles the risk of lung cancer. But it didn’t account for a confounding variable: a high percentage of people who drink alcohol also smoke.

Smoking is a well-established cause of lung cancer. Once researchers controlled for smoking, the link between alcohol and lung cancer disappeared.

Does obesity lower the risk of death for heart patients?

One study found that people with obesity have a lower risk of mortality after a heart attack. Here, researchers overlooked age as the confounding factor.

Heart disease is more prevalent among older people, while people with obesity and heart disease tend to be younger than heart patients without obesity. Once researchers controlled for age, the association between obesity and higher survival disappeared.

Requirements for confounding variables

For something to count as a confounding variable, it must meet two criteria:

It must correlate with the independent variable
It must causally relate to the dependent variable

In the alcohol and lung cancer example, alcohol consumption is the independent variable and cancer rates are the dependent variable. Smoking meets both conditions: it correlates with alcohol consumption, and it’s causally related to lung cancer.

Confounding variables vs. selection bias

People sometimes confuse confounding variables with selection bias, another factor that can invalidate research. They can look similar, but they’re not the same.

What is bias in research?

Some bias is present in nearly all research, and it can come from researchers or participants. It’s a significant issue in the social sciences, where attitudes and demographics (e.g., race, income, gender) play a large role in experiments, polls, and surveys—but it also occurs in medicine and the natural sciences.

Selection bias relates to the people you choose for research. It’s a common issue when researchers run studies at colleges and universities, where the population tends to be young and middle class. If you surveyed the reading habits of college students, you couldn’t generalize the results to the broader population.

Similar issues occur in medical research. If you studied the effectiveness of a flu vaccine, a young sample could distort the results, since older adults are more likely to experience severe flu symptoms.

can introduce bias, too. Calling landlines to conduct a survey skews your sample toward older people, while collecting data online excludes people without regular internet access, including unsheltered people and some older adults.

Pressuring people to participate also distorts results. If a company forces employees to complete a survey during their lunch break, many will fill out the form as fast as possible.

Why confounding is different from bias

The alcohol and cancer example illustrates the difference. Smoking is the confounding factor in that correlation, since many people with alcohol use disorder also smoke. But studying people who smoke and have alcohol use disorder isn’t selection bias—as long as participants aren’t disproportionately smokers.

Men smoke at significantly higher rates than women, for example, so a study that disproportionately targets men would introduce selection bias.

In short: bias leads to false conclusions because the research hasn’t used the right sample. With confounding, the correlation between variables is real but not necessarily causal. People with alcohol use disorder do have higher-than-average cancer rates—it’s just unclear whether alcohol is the cause.

Confounding variables and Simpson’s Paradox

Simpson’s Paradox occurs when combined data from two groups (e.g., men and women) contradicts the results of each group analyzed separately. It results from confounding variables you haven’t identified.

A famous example occurred when the University of California, Berkeley was sued for gender discrimination. Overall acceptance rates showed 44% of male applicants were accepted compared to 35% of female applicants. But when researchers broke the data down by department, the acceptance rate for women was equal to or higher than for men in most cases.

The explanation: women applied disproportionately to departments with low acceptance rates, while men typically applied to departments with high acceptance rates. Once you consider the confounding variable—which department applicants chose—the data tells a different story.

How to reduce the impact of confounding variables

Several techniques minimize the effect of confounding variables in research.

Randomization

Randomization works well for studies like clinical trials, where each subject should have an equal chance of receiving a particular treatment. A confounding variable is then more likely to distribute evenly across the groups.

For example, if a study compared the effectiveness of two drugs, researchers could randomly give subjects Drug A, Drug B, or a placebo.

Matching

Matching works when you already know your study’s confounding variables. You match an equal number of participants exposed and not exposed to the confounder. Studies often use siblings because they share similar genetics and family backgrounds—twins are especially useful because they share identical genes.

If you studied the effects of smoking, you could compare sets of twins where one twin smokes and the other doesn’t. This eliminates confounders like age differences, economic disparity, and geography. The drawback: matching only works if you’re aware of the confounding variables.

Restricting enrollment

One of the simplest ways to control for confounding variables is to limit enrollment to people equally affected by the confounder.

Many medical conditions disproportionately affect older people, so age would be a confounding variable in a study with participants of all ages. To prevent this, you can confine enrollment to older participants.

There are drawbacks: you must know the confounders and each subject’s status relative to them, and participants may have unknown health conditions or may not be truthful about their behavior. The range may also stay too wide—if you enroll people aged 65–80, risk factors can still vary significantly within that range.

Include confounders as control variables

You can control for confounders by including them as control variables in regression analysis, alongside your dependent and independent variables. The main challenge is identifying as many potential confounders as possible.

Imagine you’re studying the correlation between soda and obesity. Many potential confounders could affect the results—age, gender, other health conditions, and participants’ overall diet. Accounting for them keeps them from unduly influencing the results.

How do you identify a confounding variable?

The hardest part is spotting confounders in the first place. One of the best ways is to study previous research on similar topics, which may have already identified relevant confounders.

It also helps to look closely at differences between participants. Quantifying differences in age, income, and behavior—rather than reducing everything to simple binaries—can help control confounders.

For experiments on food, drink, or drugs, how much participants consume matters. Simply asking whether participants drink soda leaves a large margin for error: some people drink a few ounces a month, while others drink gallons a week.

Don’t overlook confounding variables

Confounders can cause you to attribute causation to the wrong factors and undermine otherwise solid experiments. Uncover as many potential confounders as you can before you start, then use randomization, matching, restriction, or statistical controls to keep them in check.