Understanding the difference between correlation and causation

Last updated: 20 February 2023

When conducting any type of testing, experimentation, or measurement, whether in marketing, scientific research, or inventory management, understanding the difference between correlation and causation is vital.

Data that reflects a correlation between a particular measure or variable of interest and an intervention does not necessarily imply causation. You’ve likely heard this framed before as the adage, “correlation does not imply causation.”

For example, suppose you launch a marketing campaign, and your sales increase. Before determining whether the marketing campaign (also known as the intervention) caused the increase in sales (the variable you’re measuring), you must either control or measure other factors that could influence sales.

For instance, if the price of the featured product fell at the same time, you would need to isolate and track that variable to know whether the marketing campaign was a success or whether the lower price is driving the increase in sales.

What is correlation?

Correlation signifies a pattern or relationship between two data sets. The correlation can be positive or negative. It creates an opportunity to make predictions, even when neither variable directly causes the change in the other.

For example, imagine a retailer seeing their sunscreen sales increase while ice cream sales also increase. While sunscreen sales don't directly influence ice cream sales, a third factor – rising temperatures – is driving an increase in ice cream and sunscreen sales.

This knowledge is valuable because the owner knows that when sunscreen sales begin to increase, they also can anticipate a rise in ice cream sales and prepare by ordering more ice cream.

In addition, if they use inventory management software, the lift in sunscreen sales might trigger a prompt to order more ice cream or even automatically order more for them.
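A reorder prompt of this kind could be as simple as a threshold rule. The following Python sketch is hypothetical: the function name, the 20% threshold, and the weekly figures are assumptions for illustration and are not taken from any particular inventory product.

```python
def should_reorder_ice_cream(weekly_sunscreen_sales, lift_threshold=0.20):
    """Return True when the latest week's sunscreen sales exceed the
    average of the preceding weeks by at least lift_threshold."""
    *history, latest = weekly_sunscreen_sales
    baseline = sum(history) / len(history)
    return latest >= baseline * (1 + lift_threshold)

# Hypothetical weekly unit sales; the last value is the most recent week.
spring_spike = should_reorder_ice_cream([100, 104, 98, 102, 131])
steady_week = should_reorder_ice_cream([100, 104, 98, 102, 105])
print(spring_spike)  # True: 131 is more than 20% above the 101 average
print(steady_week)   # False: 105 is within 20% of the 101 average
```

The rule relies only on the correlation, so it is a prediction aid rather than a guarantee: it will also fire when sunscreen sales rise for reasons unrelated to the weather.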

How do you explain correlation in simple terms?

Correlation occurs when two variables move in unison, whether or not either one exerts any influence over the other. In the previous example, sunscreen and ice cream sales go up in the summer and fall in the winter, but one does not cause the other.

Instead, the causing factor was the hot weather that called for sunscreen and increased the desire to eat ice cream.

Why is it essential to understand correlation?

Correlation is a helpful predictor for businesses, but organizations must understand that they cannot manipulate correlational variables to influence each other in entirely predictable ways.

For example, if a company sells two products whose sales are seasonally correlated, boosting sales of one will not automatically boost sales of the other.

However, if they know that two or more products are affected by the same causal variable, they can forecast an increase or drop in sales when the causal variable changes. In the ice cream and sunscreen sales example, it is possible to have a deep understanding of the correlation, yet many other variables could still cause the prediction not to come true.

For example, if a fancy ice cream parlor opens up beside the retail store, customers might still buy sunscreen but go next door to purchase artisanal ice cream.

What is causation?

Causation is an effect observed when one variable or factor directly influences another variable or measure.

To establish causation, you must be able to control other variables and measure the impact of the first variable on the second.

In our example above, the retailer might have previously lowered the price of ice cream and noted a corresponding increase in sales. Or the opposite: the company observes a drop in sales when the price increases.

Knowing this, the marketing department can still gauge the effectiveness of its campaign when the price changes at the same time.

Let’s say that historically, a price drop of 5% increased sales by 10%. Then the company drops the price by another 5% and runs a marketing campaign, and that summer’s sales increase by 25%. In that scenario, the remaining 15 percentage points of the increase can reasonably be attributed to the marketing campaign.
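The arithmetic in this scenario can be written out explicitly. The sketch below assumes the two effects simply add together and that the second price drop produces the same 10% lift as the historical one; neither assumption is guaranteed in practice. The function name `residual_lift` is illustrative.

```python
def residual_lift(observed_lift_pct, expected_lift_pct):
    """Lift, in percentage points, that the known factor does not explain."""
    return observed_lift_pct - expected_lift_pct

# Figures from the scenario above.
price_drop_lift = 10.0   # historical sales lift from a 5% price drop
observed_lift = 25.0     # sales lift observed this summer

campaign_lift = residual_lift(observed_lift, price_drop_lift)
print(f"Lift not explained by the price drop: {campaign_lift:.0f} percentage points")
```

Any other factor that changed that summer, such as unusually hot weather, would also land in this residual, which is why the other variables need to be controlled or measured first.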

What's the difference between correlation and causation?

Correlation and causation can both support predictions, but companies or individuals who aim to use these insights must not confuse one with the other.

A first check for correlation is to plot the data on a graph and see whether the points follow a roughly linear pattern. A linear model describes a continuous response in relation to one or more predictors.

If the retailer plots the sales of ice cream from January through December, they will typically see an increase starting in the spring, rising through the summer, and dropping in the fall. Likewise, if they overlay a graph of sunscreen sales, they will see a similar bell-shaped curve.

Conversely, to determine causation, you must first be able to measure the effects through experimentation or testing.

Double-blind experiments are common in medicine. For example, when a pharmaceutical company tests a new drug, it arranges for half of the patients to receive the new treatment while the other half receives a placebo. During the trial, neither the participants nor the researchers know who is getting the new medication.

This double-blind measure prevents both patients and researchers from exerting influence over the outcomes. Otherwise, they might interact with each other differently or have a biased expectation of improving from treatment.

But suppose the patients who receive the new drug improve more than the other patients (the control group). In that case, the new drug therapy is determined to be the cause of improvement (causation rather than correlation).

Now, let’s say there was no control group, and all participants received the new drug. It would be reasonable to expect a certain number of people to improve regardless of the effectiveness of the medication.

That’s because simply expecting to improve can itself lead to improvement (the placebo effect), which acts as a confounding factor even if the medicine lacks therapeutic value.

This version of the experiment would indicate a connection between the new treatment and fewer symptoms. Still, it would be impossible to accurately attribute benefits directly to the medication.
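The role of the control group can be illustrated with a small simulation. This is a sketch with invented numbers: the 30% placebo improvement rate, the 20-point drug effect, the group size, and the function name are all assumptions chosen for illustration, not data from any real trial.

```python
import random

def improvement_rate(n_patients, improve_probability, rng):
    """Share of simulated patients who improve, each with the given probability."""
    improved = sum(rng.random() < improve_probability for _ in range(n_patients))
    return improved / n_patients

rng = random.Random(7)   # fixed seed so the sketch is repeatable
PLACEBO_RATE = 0.30      # assumed share who improve from expectation alone
DRUG_EFFECT = 0.20       # assumed extra share who improve because of the drug

treated_rate = improvement_rate(10_000, PLACEBO_RATE + DRUG_EFFECT, rng)
control_rate = improvement_rate(10_000, PLACEBO_RATE, rng)
estimated_effect = treated_rate - control_rate

print(f"Improved on the drug:    {treated_rate:.1%}")
print(f"Improved on the placebo: {control_rate:.1%}")
print(f"Estimated drug effect:   {estimated_effect:.1%}")
```

Without the control group, only the first figure would be available, and the whole improvement might be credited to the drug. Subtracting the placebo group's rate separates the part that expectation alone would have produced.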

Why is it important to understand the difference between correlation and causation?

Understanding the difference between correlation and causation is critical because you don’t want to attribute a causal relationship to variables that are merely correlated. Forcing a change in one of two correlated variables will not necessarily create a change in the other.

Let’s return to our earlier retail example. Suppose the store offers sunscreen at half price in May and sales spike.

That doesn’t mean they should automatically stock an equal amount of ice cream—that may well be a waste of money.

In this case, the retailer has created two independent factors that could cause a spike in sunscreen sales – warmer weather and a lower price – while ice cream sales are still influenced only by warmer weather. In other words, sunscreen sales have no causal relationship with ice cream sales.

Understanding the relationships between causal variables enables greater control when it comes time to test and analyze the outcomes. For example, launching a new marketing campaign may be unwise if your sales team informs you of an upcoming price drop on specific goods or services. Due to this added variable, it becomes more challenging to measure the campaign’s effectiveness.

Ultimately, businesses need to be discerning with how they interpret data to make the best possible decisions and achieve excellent outcomes.

How can we determine if variables are correlated?

To determine whether variables are correlated, you can plot them on a scatter plot and see whether the points form a linear pattern or a non-linear one.

They can show a positive correlation (when X increases, Y also increases) or a negative correlation (when X increases, Y decreases).

If Y shows no consistent change as X increases, there is no correlation, or, in other words, there is no clear pattern.
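Alongside a plot, the direction of a linear relationship is commonly summarized with the Pearson correlation coefficient, which ranges from -1 to 1. The sketch below is a minimal illustration: the function name, the 0.3 cutoff for "no clear pattern", and the sample data are assumptions, and Pearson's coefficient only captures linear relationships.

```python
from math import sqrt

def correlation_direction(xs, ys, threshold=0.3):
    """Classify a linear relationship as 'positive', 'negative', or 'none'."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return "none"  # a variable that never changes cannot co-vary
    r = cov / sqrt(var_x * var_y)  # Pearson correlation coefficient
    if r >= threshold:
        return "positive"
    if r <= -threshold:
        return "negative"
    return "none"

x = [1, 2, 3, 4, 5]
print(correlation_direction(x, [2, 4, 5, 8, 10]))  # positive
print(correlation_direction(x, [10, 8, 7, 4, 1]))  # negative
print(correlation_direction(x, [3, 3, 3, 3, 3]))   # none
```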

Why doesn't correlation mean causation?

Correlated variables often change in tandem because a third variable influences both, but the correlation may also be a coincidence.

For example, ice cream sales and violent crime are also closely correlated, but our retail store owner would be happy to know that their ice cream sales do not cause violent crime or vice versa. Instead, both are influenced by a common cause: hot weather.
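A simulation can show how a shared cause produces a correlation that disappears once the shared cause is accounted for. In this sketch, all numbers are invented: temperature drives both simulated series, and neither series depends on the other. Removing a straight-line fit on temperature from each series (a simple form of partial correlation, which is a technique added here and not described in the article) leaves residuals that should be nearly uncorrelated.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def residuals(ys, xs):
    """What is left of ys after removing a least-squares straight-line fit on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

rng = random.Random(42)  # fixed seed so the sketch is repeatable
temperature = [rng.uniform(0, 35) for _ in range(2000)]
# Both outcomes depend on temperature plus independent noise, not on each other.
ice_cream = [10 + 3.0 * t + rng.gauss(0, 10) for t in temperature]
crime = [5 + 0.5 * t + rng.gauss(0, 3) for t in temperature]

raw_r = pearson_r(ice_cream, crime)
adjusted_r = pearson_r(residuals(ice_cream, temperature),
                       residuals(crime, temperature))
print(f"Raw correlation: {raw_r:.2f}")
print(f"Correlation after removing temperature: {adjusted_r:.2f}")
```

The raw correlation is strong even though the code never lets one series affect the other, which is the pattern the ice cream and crime example describes.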

An example of correlation vs. causation in product analytics

Picture a company that develops two games for Xbox – one targets adult users, while the other appeals to children. The company releases a new version of each game annually in the fall because they anticipate increased gaming as the weather gets colder, plus a lift in holiday sales.

This company has also seen a historical increase in sales of the child-friendly game version in countries that offer an extended summer break.

Historically, sales also rise every time a new Xbox version is released.

The company knows that sales of the two games are correlated because both products sell more during the fall and the holiday season. Still, a rise in one has no direct influence over the other.

The causal variables for both are colder weather, which increases indoor activities and therefore video game sales, and holiday buying, which also increases sales.

The children's game has an added causal variable: students being on summer break, which increases video game usage and sales.

The company decides to develop a teen version of its adult game to appeal to the market between its current offerings.

Based on its other two games, the company can anticipate that sales of the teen game will also increase in the fall and during the holidays.

However, it is not yet known whether the teen game will also be affected by school letting out for the summer.

To better forecast the sales impact of summer break and anticipate inventory needs, the company could run a survey or analyze the past sales trends of other teen video games.

How to test for causation in your product

First, you must know the variables that affect the sales of your product and determine which ones are controllable and which are uncontrollable.

For controllable variables, you must ensure that none of them change while you are testing.

For uncontrollable variables, you must know how they affect your sales and have a way to monitor whether they change while you are testing.

For example, if you have an app popular with iPhone users, you know you will get a spike in downloads when a new iPhone is released, but the exact date is an uncontrollable variable.

However, new iPhones are usually released in the fall. Knowing this, you could run your testing during a different season. Alternatively, you could analyze how much your sales increase whenever a new iPhone comes out and factor that into your testing results.
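One common way to factor an uncontrollable event into test results is a difference-in-differences comparison, a technique named here for illustration and not described in the article. The idea is to compare a group that received the change under test with a similar group that did not, where both groups were exposed to the same external event. The download figures and the function name below are hypothetical, and the method assumes both groups would have moved in parallel without the change.

```python
def diff_in_diff(treat_before, treat_after, control_before, control_after):
    """Change in the treated group minus the change in the control group."""
    return (treat_after - treat_before) - (control_after - control_before)

# Hypothetical average daily downloads around a new iPhone release.
# Both groups saw the release; only the treated group saw the product change.
effect = diff_in_diff(
    treat_before=1000, treat_after=1500,
    control_before=1000, control_after=1300,
)
print(effect)  # 200: the part of the 500 increase not explained by the release
```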