Properties of Correlation
DS 1000 — Data Science Concepts
Suppose \(x\) and \(y\) are two numerical variables. In this lab, we will demonstrate five fundamental properties of correlation:
- Symmetry: \[\text{corr}(x,y) = \text{corr}(y,x)\]
- Linear Invariance (Unit & Translation): Scaling (by a positive constant) or shifting variables does not change correlation.
  - Unit Invariance: \[\text{corr}(Ax, By) = \text{corr}(x, y)\] (where \(A, B > 0\))
  - Translation Invariance: \[\text{corr}(x + a, y + b) = \text{corr}(x, y)\] (where \(a, b\) are constants)
- Boundedness: Correlation is always between -1 and +1. \[-1 \le \text{corr}(x,y) \le 1\]
- Perfect Correlation:
  - \[\text{corr}(x,y) = 1\] means all points fall exactly on a line with positive slope.
  - \[\text{corr}(x,y) = -1\] means all points fall exactly on a line with negative slope.
- Linearity Only: Correlation only detects linear patterns. It can miss curved relationships entirely.
Getting Started
First, we need to load our tools. We will use pandas for data, numpy for calculations, and matplotlib for graphing.
1. Symmetry
The first property of correlation is symmetry.
In other words, the correlation between “Height” and “Weight” is exactly the same as the correlation between “Weight” and “Height”. There is no distinction between explanatory and response variables when calculating correlation.
Let’s verify this numerically and visually.
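The numerical check can be sketched as follows. The lab's actual dataset is not shown here, so this example assumes synthetic height and weight values generated with numpy; the variable names and the numbers used to generate them are illustrative only.

```python
import numpy as np

# Synthetic stand-in data (an assumption; not the lab's real dataset).
rng = np.random.default_rng(0)
height = rng.normal(67, 3, size=200)                       # inches
weight = 4.5 * height - 150 + rng.normal(0, 15, size=200)  # lbs

# np.corrcoef returns a 2x2 matrix; the off-diagonal entry is corr(x, y).
r_xy = np.corrcoef(height, weight)[0, 1]
r_yx = np.corrcoef(weight, height)[0, 1]

print(f"corr(height, weight) = {r_xy:.6f}")
print(f"corr(weight, height) = {r_yx:.6f}")
print("Equal:", np.isclose(r_xy, r_yx))
```

Swapping the arguments corresponds to swapping the axes of the scatter plot, which is why the two values agree.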
Notice that the shape of the point cloud is identical, just flipped over the diagonal line. The form remains the same.
2. Unit Invariance
The second property is unit invariance. This is one of the most useful features of the correlation coefficient.
Because correlation is based on standardized scores (z-scores), it has no units. Therefore, changing units does not change the correlation.
- If you convert Miles to Kilometers, correlation stays the same.
- If you convert Celsius to Fahrenheit (a rescaling combined with a constant shift), correlation stays the same.
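To see why rescaling cannot change the correlation, it may help to write the coefficient in terms of z-scores. The sketch below assumes the usual sample definition, with \(\bar{x}\) and \(s_x\) denoting the sample mean and standard deviation of \(x\) (notation introduced here for illustration, not taken from the lab):

\[\text{corr}(x, y) = \frac{1}{n-1}\sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)\]

Multiplying every \(x_i\) by a constant \(A > 0\) multiplies both \(\bar{x}\) and \(s_x\) by \(A\), so each z-score is unchanged:

\[\frac{A x_i - A\bar{x}}{A s_x} = \frac{x_i - \bar{x}}{s_x}\]

The same argument applies to \(y\) with \(B > 0\), which gives \(\text{corr}(Ax, By) = \text{corr}(x, y)\).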
Verification in Python
Let’s take our Height data (in inches) and Weight data (in lbs) and convert them to Metric units.
- Inches to Centimeters: Multiply by 2.54
- Lbs to Kilograms: Divide by 2.2 (an approximation; 1 kg is about 2.2046 lbs)
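A minimal sketch of this conversion check is shown below. It assumes synthetic height and weight data in place of the lab's dataset, and the variable names are hypothetical.

```python
import numpy as np

# Synthetic stand-in data (an assumption; not the lab's real dataset).
rng = np.random.default_rng(0)
height_in = rng.normal(67, 3, size=200)                          # inches
weight_lb = 4.5 * height_in - 150 + rng.normal(0, 15, size=200)  # lbs

# Convert to metric using the factors listed above.
height_cm = height_in * 2.54
weight_kg = weight_lb / 2.2

r_imperial = np.corrcoef(height_in, weight_lb)[0, 1]
r_metric = np.corrcoef(height_cm, weight_kg)[0, 1]

print(f"Imperial units: {r_imperial:.6f}")
print(f"Metric units:   {r_metric:.6f}")
print("Equal:", np.isclose(r_imperial, r_metric))
```

Because both conversion factors are positive, the result does not depend on how precise they are: any positive multipliers would leave the correlation unchanged.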
3. Translation Invariance
The third property is translation invariance. This means that adding (or subtracting) a constant to either variable does not change the correlation. This effectively shifts the graph up, down, left, or right, but does not change the shape of the point cloud.
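As a rough illustration, the sketch below shifts both variables by arbitrary constants and recomputes the correlation. The data are synthetic and the shift amounts (+10 and -50) are assumptions chosen only for the example.

```python
import numpy as np

# Synthetic stand-in data (an assumption; not the lab's real dataset).
rng = np.random.default_rng(0)
height = rng.normal(67, 3, size=200)                       # inches
weight = 4.5 * height - 150 + rng.normal(0, 15, size=200)  # lbs

# Shift each variable by a constant (arbitrary illustrative values).
height_shifted = height + 10
weight_shifted = weight - 50

r_original = np.corrcoef(height, weight)[0, 1]
r_shifted = np.corrcoef(height_shifted, weight_shifted)[0, 1]

print(f"Original: {r_original:.6f}")
print(f"Shifted:  {r_shifted:.6f}")
print("Equal:", np.isclose(r_original, r_shifted))
```

The shift moves the means by the same constants, so every deviation from the mean, and therefore every z-score, stays the same.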
4. Correlation bounded between -1 and 1
Let’s generate 10,000 pairs of random variables and visualize the distribution of correlations. If the bound holds, we should never see a number outside the range [-1, 1].
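One way to run this experiment is sketched below. It assumes each of the 10,000 pairs consists of two independent standard normal samples of 20 observations; the sample size and distribution are not specified in the lab and are chosen here only for illustration. A coarse text histogram stands in for the plot.

```python
import numpy as np

rng = np.random.default_rng(42)
n_pairs = 10_000
n_obs = 20  # observations per variable (an assumption)

# Each row holds one random sample of x and one of y.
x = rng.normal(size=(n_pairs, n_obs))
y = rng.normal(size=(n_pairs, n_obs))

# Row-wise Pearson correlation: centered cross-product over the
# product of the centered norms.
xc = x - x.mean(axis=1, keepdims=True)
yc = y - y.mean(axis=1, keepdims=True)
corrs = (xc * yc).sum(axis=1) / np.sqrt((xc**2).sum(axis=1) * (yc**2).sum(axis=1))

print(f"Smallest correlation: {corrs.min():.4f}")
print(f"Largest correlation:  {corrs.max():.4f}")

# Coarse text histogram of the 10,000 correlations.
counts, edges = np.histogram(corrs, bins=10, range=(-1, 1))
for count, left, right in zip(counts, edges[:-1], edges[1:]):
    print(f"[{left:+.1f}, {right:+.1f}): {'#' * (count // 100)} {count}")
```

A simulation like this can only fail to find a counterexample; the bound itself follows from the Cauchy-Schwarz inequality.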
5. Perfect Correlation
A correlation of exactly 1 or -1 implies a perfect linear relationship.
Does Slope Matter?
A common misconception is that a “higher” correlation means a “steeper” line. This is false. Correlation measures how tightly the points fit the line, not how steep the line is (as long as the slope isn’t zero).
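The sketch below illustrates this with made-up data: several exact lines with very different slopes, followed by a steep line with added noise. The slopes, intercept, and noise level are arbitrary assumptions.

```python
import numpy as np

x = np.linspace(0, 10, 50)

# Exact lines with very different slopes (intercept 3 is arbitrary).
slopes = [0.1, 1.0, 5.0, 100.0, -2.0]
perfect = {m: np.corrcoef(x, m * x + 3.0)[0, 1] for m in slopes}
for m, r in perfect.items():
    print(f"slope = {m:>6}: corr = {r:.16f}")

# A steep line with noise: steeper than every line above,
# but the points no longer fall exactly on it.
rng = np.random.default_rng(1)
y_steep_noisy = 100.0 * x + rng.normal(0, 200, size=x.size)
r_steep_noisy = np.corrcoef(x, y_steep_noisy)[0, 1]
print(f"steep but noisy: corr = {r_steep_noisy:.4f}")
```

Every exact line with positive slope gives a correlation of 1 (up to rounding) and the negative slope gives -1, while the noisy line has a weaker correlation despite being the steepest.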
Note: Some of the above correlations may come out just below 1. This is due to floating-point precision, a limitation of numerical computing rather than of correlation itself. In theory, they should all be exactly 1.
6. Linearity Only
Correlation only measures the strength of linear relationships. If the data has a perfect curved relationship (like a parabola), \(\text{corr}(x,y)\) might be zero!
This is why we must always visualize our data before trusting the correlation number.
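A small sketch of the parabola case follows. It assumes \(y = x^2\) with \(x\) values spaced symmetrically around zero; the symmetry is what makes the correlation vanish, and an off-center range would give a nonzero value.

```python
import numpy as np

# x values symmetric around 0 (an assumption needed for corr = 0).
x = np.linspace(-3, 3, 61)
y = x**2  # y is completely determined by x, but not linearly

r_parabola = np.corrcoef(x, y)[0, 1]
print(f"corr(x, x^2) on a symmetric range: {r_parabola:.6f}")

# For contrast: the same curve on a one-sided range looks nearly linear.
x_pos = np.linspace(0, 3, 61)
r_one_sided = np.corrcoef(x_pos, x_pos**2)[0, 1]
print(f"corr(x, x^2) on [0, 3]:            {r_one_sided:.6f}")
```

On the symmetric range the rising and falling halves of the curve cancel each other out, so no straight line summarizes the pattern.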
Even though \(y\) is perfectly predicted by \(x\) through a curve such as a parabola, the correlation can be 0 because the relationship is not a straight line.
Summary
In this lab, we demonstrated critical properties of correlation using Python:
| Property | Why it matters |
|---|---|
| Symmetry | It doesn’t matter which variable is “explanatory” or “response”. |
| Unit Invariance | Changing units (feet to meters, Fahrenheit to Celsius) does not change the correlation. |
| Translation Invariance | Adding a constant shift to data doesn’t change the relationship strength. |
| Perfect Correlation | Indicates a perfect linear fit. The slope of the line does not affect the correlation. |
| Linearity Only | A correlation of 0 does not mean “no relationship,” it just means “no linear relationship.” |