Correlation: Discovering Relationships Between Variables
Ice cream sales and drownings are positively correlated. Correlation measures the direction and strength of a linear relationship, but beware: on its own, it never proves cause and effect.
Renato Freitas
Updated on May 5, 2026
What is correlation?
Correlation is a statistical measure that describes the direction and strength of the linear relationship between two numerical variables. If one variable tends to increase as the other increases, the correlation is positive. If one tends to decrease as the other increases, the correlation is negative. If there is no apparent linear pattern, the correlation is close to zero.
The scatter plot (also called a scatter diagram) is the first tool to reach for when visualizing correlation. Each observation is a point on the chart, with one variable on the X-axis and the other on the Y-axis. A cloud of points sloping upward suggests positive correlation; sloping downward, negative correlation; with no clear direction, weak or no correlation.
Pearson's correlation coefficient
Pearson's correlation coefficient, r, quantifies correlation as a number between −1 and +1. Values close to +1 indicate strong positive correlation (the points nearly form an upward-sloping line). Values close to −1 indicate strong negative correlation. Values close to 0 indicate weak or no linear correlation.
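To make the definition concrete, here is a minimal sketch of Pearson's r computed from scratch: the sum of the products of the deviations from the means, divided by the square root of the product of the sums of squared deviations. The function name `pearson_r` and the study-hours data are hypothetical, chosen only for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r: co-variation of x and y scaled by the spread of each."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (proportional to the covariance)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (proportional to the variances)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data, for illustration only
hours_studied = [1, 2, 3, 4, 5]
exam_score = [52, 55, 61, 64, 70]
print(round(pearson_r(hours_studied, exam_score), 3))
```

Because both sums of squares appear in the denominator, rescaling either variable (for example, hours to minutes) leaves r unchanged.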
As a rule of thumb, |r| between 0.7 and 1.0 is considered strong; between 0.5 and 0.7, moderate; between 0.3 and 0.5, weak; below 0.3, very weak or negligible. These thresholds depend on the field, however: in the social sciences, r = 0.5 may be considered strong, while in the physical sciences it may be considered weak.
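The rule of thumb can be written as a small lookup. This is a sketch: the function name `describe_strength` is hypothetical, the cutoffs are the general bands above (a field with different conventions would substitute its own), and a value exactly on a boundary is assigned to the higher band, which is an assumption since the bands do not say.

```python
def describe_strength(r):
    """Label r with the general rule-of-thumb bands; the cutoffs are conventions, not laws."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    magnitude = abs(r)
    # Boundary values go to the higher band (assumption)
    if magnitude >= 0.7:
        strength = "strong"
    elif magnitude >= 0.5:
        strength = "moderate"
    elif magnitude >= 0.3:
        strength = "weak"
    else:
        strength = "very weak or negligible"
    # The sign of r gives the direction of the relationship
    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        direction = "none"
    return strength, direction

print(describe_strength(-0.6))
```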
Pearson's r measures only linear correlation. Two variables can have a strong curvilinear relationship (like drug dose and efficacy — too little is ineffective, the right dose works, too much is toxic) and still have r ≈ 0. Always visualize your data before interpreting the coefficient.
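The dose-and-efficacy case is easy to reproduce. The sketch below uses a hypothetical inverted-U curve (efficacy rises, peaks at dose 5, then falls) that fits the data perfectly, yet Pearson's r comes out as zero because the upward and downward halves cancel. The numbers are invented for illustration and `pearson_r` is a helper defined here.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r for two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical inverted-U dose-response: too little or too much is worse than dose 5
doses = list(range(0, 11))                      # 0, 1, ..., 10
efficacy = [25 - (d - 5) ** 2 for d in doses]   # a perfect curve, not a line

print(pearson_r(doses, efficacy))  # 0.0: no linear trend despite a perfect curved relationship
```

A scatter plot of these points would show the curve immediately, which is exactly why plotting comes before interpreting r.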
- r = +1: perfect positive correlation
- r = −1: perfect negative correlation
- r = 0: no linear correlation
- |r| ≥ 0.7: strong correlation (general rule)
Correlation is not causation
This is one of the most important — and most violated — principles in statistics and journalism. Ice cream sales and the number of drownings have a strong positive correlation. This does not mean eating ice cream causes drownings: both are driven by a third variable (high summer temperatures).
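A small simulation shows how a third variable can manufacture a correlation. In this sketch, a hypothetical daily temperature drives both an ice cream score and a drowning score, and neither score affects the other; the coefficients, noise levels, and seed are invented. One standard way to check for such a confounder (not discussed in the article itself) is the partial correlation, which estimates the correlation that remains after holding temperature fixed.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson's r for two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(42)  # fixed seed so the run is reproducible
n = 2000
# Hypothetical model: temperature drives both scores; there is no direct link between them
temperature = [rng.uniform(15, 35) for _ in range(n)]
ice_cream = [2.0 * t + rng.gauss(0, 5) for t in temperature]
drownings = [0.3 * t + rng.gauss(0, 1) for t in temperature]

r_id = pearson_r(ice_cream, drownings)
r_it = pearson_r(ice_cream, temperature)
r_dt = pearson_r(drownings, temperature)

# First-order partial correlation of ice cream and drownings, controlling for temperature
partial = (r_id - r_it * r_dt) / math.sqrt((1 - r_it ** 2) * (1 - r_dt ** 2))
print(f"raw r = {r_id:.2f}, partial r given temperature = {partial:.2f}")
```

The raw correlation is strong, while the partial correlation is close to zero: once temperature is accounted for, the link between the two scores disappears.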
A misleading correlation generated by a hidden variable, as in the ice cream example, is called a spurious correlation; the hidden variable is often called a confounding variable. Two phenomena may be correlated because one causes the other, because both are caused by a third factor, or simply by coincidence, which is especially likely in a small sample.
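Coincidence in small samples can also be simulated. The sketch below repeatedly draws two variables that are independent by construction and counts how often |r| exceeds 0.7 purely by chance. The helper names, the normal distribution, the 0.7 threshold, and the trial count are assumptions chosen for illustration.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson's r for two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def share_of_large_r(n, trials=5_000, threshold=0.7, seed=0):
    """Fraction of samples of two independent variables whose |r| exceeds the threshold."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        xs = [rng.gauss(0, 1) for _ in range(n)]
        ys = [rng.gauss(0, 1) for _ in range(n)]  # drawn independently of xs
        if abs(pearson_r(xs, ys)) > threshold:
            hits += 1
    return hits / trials

for n in (5, 50):
    print(f"n = {n}: share of samples with |r| > 0.7 = {share_of_large_r(n):.3f}")
```

With only five observations, a sizable share of samples shows a "strong" correlation between unrelated variables; with fifty, it practically never happens.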
To establish causation, we need controlled experiments (treatment and control groups with random assignment), not just observation. Observational studies, even with a high r, can suggest hypotheses but cannot prove cause on their own; proving cause requires rigorous experimental design.
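Random assignment is what lets an experiment rule out hidden variables: because chance alone decides who is treated, factors such as temperature or age are balanced between the groups on average. Here is a minimal sketch of a simple randomized split; the function name `randomize` and the participant IDs are hypothetical, and real trials often use more elaborate schemes (for example, stratified or blocked randomization).

```python
import random

def randomize(units, seed=None):
    """Randomly split units into treatment and control groups of (nearly) equal size."""
    rng = random.Random(seed)
    shuffled = list(units)   # copy so the caller's list is left untouched
    rng.shuffle(shuffled)    # every ordering is equally likely
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

participants = [f"P{i:02d}" for i in range(1, 21)]  # hypothetical participant IDs
treatment, control = randomize(participants, seed=7)
print("treatment:", treatment)
print("control:  ", control)
```

Fixing the seed makes the assignment reproducible for auditing; omitting it gives a fresh random split each run.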
Frequently asked questions
Is negative correlation 'bad'?
No. Negative correlation simply means the variables move in opposite directions. Exercise and body fat have negative correlation — which is desirable. Price and demand also have negative correlation. Correlation describes a pattern; the judgment of 'good' or 'bad' depends on context.
What is the difference between Pearson and Spearman correlation?
Pearson measures linear correlation between continuous numerical variables. Spearman applies the same calculation to the ranks of the values, so it measures any monotonic relationship (one that consistently rises or consistently falls, even if not along a straight line) and is more robust when there are outliers. Spearman also works with ordinal variables.
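Since Spearman's rho is Pearson's r computed on ranks, it can be sketched in a few lines. In this sketch tied values receive the average of their ranks (the usual convention); the helper names and the data, a monotonic series with one extreme outlier, are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r for two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Rank values 1..n, giving tied values the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r applied to the ranks."""
    return pearson_r(ranks(xs), ranks(ys))

x = [1, 2, 3, 4, 5, 6]
y = [1, 2, 3, 4, 5, 100]  # perfectly monotonic, but the last point is an outlier
print(f"Pearson r = {pearson_r(x, y):.3f}, Spearman rho = {spearman_rho(x, y):.3f}")
```

The outlier pulls Pearson's r well below 1, while Spearman's rho stays at 1 because the ordering of the values is perfectly preserved.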
If r = 0.9, are the variables always close to a straight line?
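Usually, but not necessarily: a single extreme outlier or a curved pattern can still produce a high r, which is another reason to plot the data. R² (the coefficient of determination), which in simple linear regression equals r², indicates the proportion of the variance in Y explained by a linear fit on X. With r = 0.9, R² = 0.81, meaning 81% of the variation in Y is explained by the linear relationship with X. The remaining 19% is variation that the linear relationship does not explain.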
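The link between r and R² can be checked numerically. This sketch fits a least-squares line y = a + b·x, computes R² as 1 − SS_res / SS_tot (residual sum of squares over total sum of squares), and compares it with the square of Pearson's r. The helper names and the roughly linear data are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r for two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def r_squared_from_fit(xs, ys):
    """Fit y = a + b*x by least squares and return 1 - SS_res / SS_tot."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx    # slope
    a = my - b * mx  # intercept
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # unexplained variation
    ss_tot = sum((y - my) ** 2 for y in ys)                        # total variation
    return 1 - ss_res / ss_tot

xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 11.8, 14.2, 16.1]  # hypothetical, roughly linear
r = pearson_r(xs, ys)
print(f"r^2 = {r ** 2:.6f}, R^2 from the fit = {r_squared_from_fit(xs, ys):.6f}")
```

The two values agree up to floating-point rounding, which is the r² = R² identity for a straight-line fit with one predictor.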
How do I know if a correlation is statistically significant?
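By running a hypothesis test for the correlation coefficient, usually with the null hypothesis that the true (population) correlation is zero. The result depends on both the value of r and the sample size n. A correlation of r = 0.3 may be significant with n = 200 but not significant with n = 20. The p-value is the probability of observing a correlation at least as strong as the one in the sample if the true correlation were zero; a small p-value (commonly below 0.05) suggests the result is unlikely to be due to chance alone.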
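The r = 0.3 example can be checked with a short calculation. The usual test statistic is t = r·√(n − 2) / √(1 − r²), compared against a t distribution with n − 2 degrees of freedom. Python's standard library has no t distribution, so this sketch approximates the two-sided p-value with Fisher's z-transform instead, where atanh(r)·√(n − 3) is roughly standard normal when the true correlation is zero. The function name `correlation_test` is hypothetical; libraries such as SciPy (`scipy.stats.pearsonr`) report the exact p-value.

```python
import math

def correlation_test(r, n):
    """Return the t statistic and an approximate two-sided p-value for H0: true correlation = 0."""
    if n < 4 or not -1 < r < 1:
        raise ValueError("need n >= 4 and -1 < r < 1")
    # Test statistic, to be compared with a t distribution with n - 2 degrees of freedom
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
    # Approximate p-value via Fisher's z-transform: atanh(r) * sqrt(n - 3) is ~ N(0, 1) under H0
    z = math.atanh(r) * math.sqrt(n - 3)
    p_approx = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return t, p_approx

for n in (20, 200):
    t, p = correlation_test(0.3, n)
    print(f"r = 0.3, n = {n}: t = {t:.2f}, approximate p = {p:.2g}")
```

With n = 20 the approximate p-value is around 0.2, far above 0.05, while with n = 200 it is far below 0.001: the same r, judged very differently because of the sample size.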