Cause vs. correlation: what’s the difference?


Correlation and cause are often confused. A correlation is a relationship between two or more things, but it doesn’t necessarily mean that one causes the other. Researchers use statistical methods to establish correlations, but they should be careful about jumping to conclusions. Proving beyond a reasonable doubt that one factor causes another requires much more than a high correlation coefficient. Much of the confusion between cause and correlation stems from the way research findings are reported in the media, so people should watch or listen carefully for qualifying words before drawing conclusions.

Cause and correlation are often confused or misused terms. A correlation is a relationship between two or more things: when one increases, the other also increases, or when one increases, the other decreases. A cause is something that produces an effect; for example, heating water to a certain temperature will cause it to boil. The crux is that a correlation between two things doesn’t necessarily mean that one causes the other. If there is a relationship between two phenomena, A and B, it could be that A causes B or that B causes A; alternatively, some third factor could be responsible for both, or A and B could have independent causes that simply happen to change in parallel.

Correlation

Researchers trying to explain various phenomena often use statistical methods to establish correlations; this may be the first step towards determining the cause. Scientists and statisticians can use a formula to measure the strength of the relationship between two phenomena. The most common is the Pearson correlation coefficient, r, which ranges from -1 (a perfect negative relationship) to +1 (a perfect positive one). Squaring it gives the squared correlation coefficient, or R², which always lies between 0 and 1, with a value closer to 1 indicating a stronger correlation.
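
The calculation can be sketched in a few lines of Python. This is a minimal illustration, assuming the formula in question is the Pearson correlation coefficient; the function names `pearson_r` and `r_squared` and the sample numbers are hypothetical. Note that a relationship in which both values rise together and one in which one value falls as the other rises both give R² = 1, because squaring discards the sign.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient r between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (at least 2 points)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations (covariance numerator) and the two spread terms
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)


def r_squared(xs, ys):
    """Squared correlation coefficient, R^2, which lies between 0 and 1."""
    r = pearson_r(xs, ys)
    return r * r


# Hypothetical data
print(r_squared([1, 2, 3, 4], [2, 4, 6, 8]))            # 1.0 (both rise together)
print(r_squared([1, 2, 3, 4], [8, 6, 4, 2]))            # 1.0 (one falls as the other rises)
print(round(r_squared([1, 2, 3, 4], [1, 3, 2, 4]), 2))  # 0.64 (a weaker relationship)
```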

When the R² value is high, the relationship may merit further investigation; however, researchers should be careful about jumping to conclusions, because all kinds of strong but meaningless correlations can be found. In one well-known example, the R² between the number of highway fatalities in the United States from 1996 to 2000 and the quantity of lemons imported from Mexico over the same period is 0.97, a very strong correlation, yet it is extremely unlikely that one causes the other.

A correlation, particularly when reported in the media, is often described as a ‘link’, which can be misleading because it may be read as meaning that one factor causes the other. For example, a study finding that men who drank four cups of green tea a day had a lower risk of stroke than those who didn’t could generate the headline “Green Tea Reduces Stroke Risk.” This implies that drinking green tea directly reduces the risk of stroke, but the study does not prove this. Other factors may have influenced the results: the tea drinkers may have differed from non-drinkers in diet, exercise or other habits, and because the study was conducted on men in Japan, whose diets and exercise habits differ from those of men in Western countries, its findings may not apply elsewhere. While there may be a genuine causal relationship here, establishing it would require a larger study that accounts for more variables.

Cause
If factor A is responsible for factor B, there will usually be a correlation between the two, but the reverse is not necessarily true. Proving beyond a reasonable doubt that A is responsible for B requires much more than a high R² value. Having established a strong relationship, researchers then need to propose ways in which A might affect B and test these ideas with experiments. Often more than one possible cause can be identified. In such cases, a good method is to conduct experiments in which all factors but one are held constant, so that any change in the effect can be attributed to the one factor that varies.

For example, a plant growing in a temperate climate may go dormant during the winter and start growing in the spring. One theory might be that rising average temperatures trigger growth; another might be that longer periods of daylight are responsible. To determine which is the case, one group of plants could be exposed to increasing temperatures with constant hours of daylight, while another experiences a constant temperature with increasing daylight hours. The cause could then be identified by which group starts growing. If neither group grows, a third experiment could be performed in which both temperature and daylight are increased; if this results in growth, the researchers may conclude that a combination of both factors is needed.
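
The reasoning in this experiment amounts to a small decision table. As a hedged sketch (the function and parameter names are hypothetical, and a real experiment would use many plants per group and a statistical test rather than a single yes/no per group), it might be written as:

```python
from typing import Optional


def interpret_growth_trials(
    temp_group_grew: bool,
    daylight_group_grew: bool,
    combined_group_grew: Optional[bool] = None,
) -> str:
    """Interpret the two-group plant experiment (hypothetical helper).

    temp_group_grew:     plants given rising temperature, constant daylight
    daylight_group_grew: plants given constant temperature, rising daylight
    combined_group_grew: plants given both (None if that group was not run)
    """
    if temp_group_grew and daylight_group_grew:
        return "either factor alone is enough to trigger growth"
    if temp_group_grew:
        return "rising temperature alone can trigger growth"
    if daylight_group_grew:
        return "longer daylight alone can trigger growth"
    # Neither single-factor group grew, so the combined group decides
    if combined_group_grew is None:
        return "run a third group with both temperature and daylight increasing"
    if combined_group_grew:
        return "both factors appear to be needed together"
    return "neither factor explains growth; look for another cause"


print(interpret_growth_trials(False, True))
print(interpret_growth_trials(False, False))
print(interpret_growth_trials(False, False, True))
```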

In some cases, a given cause will always produce a particular effect; for example, Earth’s gravity will always cause an object to fall if no other force is acting on it. In other cases, however, the effect is not guaranteed. Ionizing radiation and some chemicals are known to cause cancer, but not everyone exposed to them will develop the disease, because chance is involved. Both can damage DNA, and sometimes this damage will result in a cell becoming cancerous, but this won’t happen every time.
If, however, one were to plot levels of exposure to these factors against the incidence of cancer in a large sample of otherwise similar people, one would expect a strong correlation.

While researchers have criteria for deciding which possible causes of a phenomenon to pursue based on the strength of correlations, the factor with the highest R² value is not necessarily the culprit. Scientists and researchers will usually set aside factors that show a weak correlation, but, as noted, completely irrelevant factors can produce a very high R², as can factors that share a common cause with the phenomenon under investigation. The probability that A causes B is therefore not necessarily proportional to the strength of the correlation.

Confusion of cause and correlation
Much of the confusion between cause and correlation results from the way research findings are reported in the media. A relationship may be described as a “cause” when all that has been found is a correlation; for example, violent video games could be reported to cause violent behavior. Yet it is possible that aggressive people are simply more likely to play violent games, and that such people would behave aggressively with or without the influence of the games.

Research has shown that violent games can influence aggression. It also shows that a number of other factors may be responsible for violent behavior, including lower socioeconomic status, mental illness, abusive childhoods, and poor parenting. It is possible that such games could increase the likelihood of violent behavior in an individual already predisposed to aggression by other factors, but claims that violent video games cause violent behavior are not justified by the known facts.

Health is another area where confusion can arise. Anyone who read or heard about all the things that have been reported to cause, or be linked to, cancer might never eat, drink or leave their home again. A reported “cause” may turn out to be only a correlation, and a “link” is just that: it doesn’t identify a definite cause of the cancer. There is a great deal of research into why cancer develops, and scientists often find links, but when these are reported in the media, people should watch or listen carefully for qualifying words like “may,” “may increase,” or “may have an effect” before drawing conclusions.



