Confounding variable: Everything you need to know

Short on time? Get an AI generated summary of this article instead

Confounding variables are critical in many areas of research. Even seasoned researchers can overlook confounding variables and arrive at incorrect conclusions if they aren't paying close attention.

This article will discuss what a confounding variable is and provide examples and guidelines on ensuring this type of variable doesn't muddle your research.

Make research less tedious

Dovetail streamlines research to help you uncover and share actionable insights

Analyze with Dovetail

What's the difference between confounding, independent, and dependent variables?

When defining a confounding variable (also known as a confounder), looking at the dependent and independent variables is helpful. These are critical elements in many studies.

A dependent variable is an element in a scientific experiment or study. This is the thing you’re measuring or observing to see how it responds to changes in another variable, known as the independent variable.

In other words, the dependent variable's value hinges on the independent variable's value. An independent variable is an element in an experiment that the researchers manipulate.

Verywell Mind gives several examples of how these variables work in psychology.

Imagine you’re evaluating the effectiveness of a tutoring program on students' math scores. You may choose the quality or length of the tutoring program as the independent variable and the students' scores as the dependent variable.

This type of experiment usually has a control group and a treatment group.

The treatment group is the group receiving the treatment, which would be tutoring in this example. It could also be a medication or type of therapy. The control group does not receive the treatment, which is necessary to measure the treatment’s effectiveness.

What is a confounding variable?

A confounding variable relates to an experiment's dependent and independent variables. Confounding variables can be difficult to see because you usually don't bring them into an experiment deliberately. However, they can affect the outcome.

Returning to the example of a tutoring program, researchers should look for possible confounding variables. A simple confounding variable could be parental involvement. Perhaps the parents are more actively involved in their children’s education, so they signed them up for tutoring.

If students achieved higher math scores after tutoring, you could argue that the involved parents played a role rather than the tutoring program.

Why do confounding variables matter?

So, why are confounding variables problematic? Confounders can influence the conclusions of an experiment in ways that the researchers do not intend.

Confounding variables can create false associations. A researcher may believe that Variable A leads to Conclusion B, but it’s actually the confounder causing the change.

Experimenters look for ways to reduce or eliminate confounding variables, as they can make it challenging to arrive at clear conclusions.

If you were studying the tutoring program, you'd have to assign students to the program randomly rather than let their parents sign them up. This effectively eliminates the confounding variable of parental involvement.

How can confounding variables harm research?

Confounding variables can seem rather abstract. However, they have important implications for research. There are several documented cases of confounding variables causing doubt or invalidating well-known studies. Let's look at some examples.

Relationship between alcohol consumption and lung cancer

A study found that drinking alcohol more than doubles the risk of lung cancer. However, this study did not consider a confounding variable: A high percentage of people who consume alcohol also smoke.

As we know, scientists have ascertained that smoking increases the risk of lung cancer. So, once researchers controlled studies for smoking, they discovered the link between alcohol and lung cancer didn’t exist.

Does obesity lower the risk of death for heart patients?

One study found that people with obesity have a lower risk of mortality after a heart attack. In this case, researchers ignored age as the confounding factor.

Heart disease is more prevalent among older people. Meanwhile, obesity is associated with a shorter life span overall. Also, people with obesity and heart disease tend to be younger than those who aren't obese.

Once researchers controlled the studies for age, the association between obesity and higher survival disappeared.

Requirements for confounding variables

For something to be considered a confounding variable, it must meet two criteria:

It must correlate with the independent variable
It must causally relate to the dependent variable

If we use the example of correlating alcohol consumption with lung cancer, the consumption of alcohol is the independent variable, while cancer rates are the dependent variable.

As we saw, smoking is a confounder, as it meets both conditions: There's a correlation between smoking and consuming alcohol, and it’s causally related to lung cancer.

Confounding variables vs. selection bias

People sometimes confuse confounding variables with selection bias, another factor that often invalidates research. While these two can appear similar, they are not the same.

What is bias in research?

Some type of bias is common in all experiments and research. Several types of bias may come from researchers or study participants.

Bias is a significant issue in the social sciences, where attitudes and demographics (e.g., race, income, gender) can play a significant role in experiments, polls, and survey results.

However, bias can also occur in medicine and the hard (natural) sciences, such as astronomy and meteorology.

The following are some common types of bias.

Selection bias

Selection bias relates to the people you choose for research. It’s a common issue when researchers conduct studies at colleges and universities, where the population tends to be young and middle class. If you surveyed the reading habits of college students, you could not generalize this and assume it's true of the population in general.

Similar issues can occur with medical research. If you studied the effectiveness of a flu vaccine, a young population could distort the results, as older adults are more likely to experience severe flu symptoms.

Data or measurement collecting bias

This involves the data collection methods researchers use. Are you thinking of calling people on landlines to conduct a survey? As most younger people no longer use landlines, your collection method would skew your data. Collecting data on a website or via email excludes people who don't have regular internet access, including unsheltered people and older adults.

Procedural bias

This bias occurs if you compel people to participate in research or pressure them to answer questions quickly. For example, if a company forces its employees to complete a survey during their break or lunch hour, they may fill out a form as fast as possible.

Why confounding is different from bias

Qualifying Health uses the above example of the possible correlation between alcohol use disorder and cancer to illustrate how bias and confounding differ. As we saw, smoking is the confounding factor in this correlation, as many with alcohol use disorder also smoke.

However, studying people who smoke and have alcoholism is not a type of selection bias as long as participants in the research are not disproportionately smokers.

For example, Statista reports that about twice as many men as women use tobacco products. A study that disproportionately targets men would be a type of selection bias.

As the article in Qualifying Health summarizes it, bias leads to false conclusions because the research hasn't used the correct sample type. On the other hand, confounding the correlation between variables is real but not necessarily causal. For example, people with alcohol use disorder have higher than average cancer rates, but it's unclear whether alcohol is the cause.

Confounding variables and Simpson's Paradox

Simpson's Paradox is an issue that can perplex researchers, causing results that seem contradictory. It occurs when researchers combine data from two groups (e.g., men and women) which contradicts the effects of the separate groups.

Simpson's Paradox results from confounding variables you’ve not identified.

A famous example of Simpson's Paradox occurred when the University of California at Berkeley was sued for gender discrimination. Acceptance rates revealed that 44% of male applicants were accepted compared to 35% of female applicants. However, when researchers broke data down into separate departments at Berkeley, they found the acceptance rate for women was equal to or higher than for men in most cases.

How can we explain these contradictory results? Looking at the data in isolation, the university appears more likely to accept men. However, women applied disproportionately to departments with low acceptance rates, while men typically applied to departments with high acceptance rates.

Once we consider the confounding variable of the specific department applications, the data tells a different story.

How to reduce the impact of confounding variables

Several techniques minimize the effect of confounding variables in research.

Randomization

Randomization works well for studies such as clinical trials for medications, as each subject should have an equal chance of receiving a particular treatment. This means a confounding variable is more likely to distribute across the groups evenly.

For example, if a study was comparing the effectiveness of two drugs, researchers could randomly give their subjects Drug A, Drug B, or a placebo.

Matching

Matching can be effective when you know your study’s confounding variables. You match an equal number of participants exposed or not exposed to the confounding variable. Studies often use siblings because they have similar genetics and family backgrounds. Twins are especially useful as they share identical genes.

If you studied the effects of a specific behavior, such as smoking, you could compare a set of twins where one twin was a smoker and the other a non-smoker. This type of study eliminates confounding factors such as age differences, economic disparity, geography, and others. One drawback of matching is that it will only work if you are aware of the confounding variables.

Restricting enrollment

One of the simplest ways to control for confounding variables is to limit study enrollment to people equally affected by the confounder.

For example, many medical conditions disproportionately affect older people. Age would therefore be a confounding variable if a study included people of all different ages. To prevent this, you can confine enrollment to older participants.

The main drawbacks of restriction are that you must be aware of the confounders and the subjects' status relative to the confounder. Some study participants may have unknown health conditions or may not be truthful about their behavior.

With factors such as age, the range may be too large to eliminate confounders, although you could confine a study to people over 65. However, if you enroll people aged 65–80, the risk factors could still vary significantly within that range.

Include confounders as control variables

Controlling for confounders is possible by using them as control variables when performing regression analysis. With regression analysis, you include all the variables that impact the study, including dependent and independent variables. If you can identify confounding variables, you should also take these into account.

The main challenge is to identify as many potential confounders as possible.

Imagine you’re studying the correlation between soda and obesity. Many potential confounders could affect the results, such as age, gender, other health conditions, and participants' overall diet. By considering these, you can avoid them influencing the results unduly.

How do you identify a confounding variable?

While researchers receive training to account for confounders, the main challenge is identifying them in the first place. One of the best ways to identify confounders is to study previous research on similar topics, which may have identified relevant confounders.

It's also helpful to carefully consider differences in participants in a study. Quantifying differences in age, income, and behavior rather than reducing everything to simple binaries can control confounders.

Grouping people by age can be imprecise if the ranges are too large. For experiments focusing on the effects of food, drink, and drugs, learning how much participants consume may be very significant to the results. Simply asking if participants consume soda or not leaves a large margin for error: Some people drink a few ounces a month, while others drink gallons a week.

Don’t overlook confounding variables

Researchers need to be aware of confounding variables to ensure their experiments are valid. It also prevents them from coming to incorrect conclusions, as confounders can cause you to attribute causation to the wrong factors. To ensure this doesn't happen, uncover as many potential confounders as possible and take the appropriate precautions.