Chapter 5 of 18

Measures of Spread — Variance, SD & IQR

Understand how scattered data is around its centre — range, variance, standard deviation, IQR, and the coefficient of variation.

Meritshot
Statistics · Variance · Standard Deviation · IQR · Spread · Dispersion

Why Spread Matters

Two datasets can have the same mean but completely different characters:

Team A scores: 70, 72, 74, 76, 78    Mean = 74, very consistent
Team B scores: 40, 55, 74, 90, 111   Mean = 74, wildly variable

Measures of spread describe how scattered the data is around the centre. Without them, the mean tells an incomplete — sometimes misleading — story.

Sample Dataset

Annual salaries for 8 analysts (₹ thousands):
62, 68, 72, 75, 78, 82, 85, 91

n = 8
x̄ = (62+68+72+75+78+82+85+91) / 8 = 613 / 8 = 76.625

1. Range

The simplest measure of spread.

Range = Maximum − Minimum

Range = 91 − 62 = 29 (₹29,000)

Limitation: Uses only two values. One extreme outlier completely changes it.

Same data plus one outlier (₹200k):
Range = 200 − 62 = 138 ← nearly 5× the original range, even though the original 8 values are unchanged
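A few lines of Python are enough to check the range arithmetic and to show how one extreme value inflates it. This is a minimal sketch. `value_range` is a hypothetical helper name, chosen so that it does not shadow Python's built-in `range`. The ₹200k salary is the illustrative outlier from the text.

```python
# Range = Max − Min, for the salary data (₹ thousands)
salaries = [62, 68, 72, 75, 78, 82, 85, 91]

def value_range(data):
    """Return max(data) - min(data)."""
    return max(data) - min(data)

with_outlier = salaries + [200]   # illustrative extra salary of ₹200k

print(value_range(salaries))       # 29
print(value_range(with_outlier))   # 138
# Ratio of the two ranges: about 4.76
print(value_range(with_outlier) / value_range(salaries))
```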

2. Variance

Measures the average squared deviation from the mean.

Why Square the Deviations?

Raw deviations sum to zero:

Σ(xᵢ − x̄) = (62−76.625) + (68−76.625) + ... + (91−76.625) = 0 (always)

Squaring makes every deviation non-negative, so positive and negative deviations can no longer cancel out.
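You can check the cancellation exactly with Python's `fractions.Fraction`, which avoids floating-point rounding. Here is a minimal sketch on the salary data:

```python
from fractions import Fraction

salaries = [62, 68, 72, 75, 78, 82, 85, 91]   # ₹ thousands
# Exact mean: 613/8 = 76.625 (Fraction keeps every step exact)
mean = Fraction(sum(salaries), len(salaries))

deviations = [x - mean for x in salaries]
print(sum(deviations))                        # 0 (positive and negative deviations cancel)
print(float(sum(d * d for d in deviations)))  # 619.875 (squared deviations cannot cancel)
```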

Sample Variance Formula

s² = Σ(xᵢ − x̄)² / (n − 1)

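We divide by (n−1), not n, because we're estimating the population variance from a sample. Deviations are measured from x̄, which is fitted to the same data. As a result, they are on average smaller than deviations from the true mean μ, and dividing by n would underestimate σ². Dividing by (n−1) corrects this and makes s² an unbiased estimator of σ². This is called Bessel's correction.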
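One way to see Bessel's correction at work is to enumerate every possible sample from a tiny population and average both versions of the variance. The sketch below makes three assumptions: a made-up population {2, 4, 9}, samples of size 2, and drawing with replacement (so the draws are independent). The helper name `var_divisor` is hypothetical. Averaged over all samples, dividing by (n − 1) recovers σ² exactly, while dividing by n falls short by the factor (n − 1)/n.

```python
from fractions import Fraction
from itertools import product

# Tiny made-up population, used only for illustration
population = [2, 4, 9]
N = len(population)
mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N   # population variance (divide by N)

def var_divisor(sample, divisor):
    """Sum of squared deviations from the SAMPLE mean, divided by `divisor`."""
    m = Fraction(sum(sample), len(sample))
    return sum((x - m) ** 2 for x in sample) / divisor

n = 2
# Every ordered sample of size n drawn with replacement; all are equally likely
samples = list(product(population, repeat=n))
avg_biased = sum(var_divisor(s, n) for s in samples) / len(samples)
avg_unbiased = sum(var_divisor(s, n - 1) for s in samples) / len(samples)

print(sigma2, avg_biased, avg_unbiased)   # 26/3 13/3 26/3
```

The same result holds for any population and sample size under independent sampling. The tiny case just keeps the enumeration short.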

Calculation

x̄ = 76.625

Deviations and squared deviations:
x      (x − x̄)    (x − x̄)²
62     −14.625     213.891
68      −8.625      74.391
72      −4.625      21.391
75      −1.625       2.641
78       1.375       1.891
82       5.375      28.891
85       8.375      70.141
91      14.375     206.641
                  ─────────
Sum of (x−x̄)²:   619.875

s² = 619.875 / (8−1) = 619.875 / 7 ≈ 88.554 (in (₹ thousands)²)

Population Variance

σ² = Σ(xᵢ − μ)² / N     (divide by N when you have ALL data)

Limitation of variance: Units are squared (₹²) — hard to interpret. We need the standard deviation.
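The hand calculation can be reproduced in Python. The standard library's `statistics.variance` (sample, n − 1) and `statistics.pvariance` (population, N) serve as a cross-check. Here is a minimal sketch:

```python
import statistics

salaries = [62, 68, 72, 75, 78, 82, 85, 91]   # ₹ thousands
n = len(salaries)
x_bar = sum(salaries) / n                      # 76.625

# Print the deviation table and accumulate the sum of squares
print(f"{'x':>4} {'x - x̄':>9} {'(x - x̄)²':>10}")
ss = 0.0
for x in salaries:
    d = x - x_bar
    ss += d * d
    print(f"{x:>4} {d:>9.3f} {d * d:>10.3f}")

s2 = ss / (n - 1)   # sample variance (Bessel's correction)
sigma2 = ss / n     # population variance, IF these 8 values were the whole population

print(f"Sum of squares = {ss:.3f}")   # 619.875
print(f"s² = {s2:.3f}")               # 88.554

# Cross-check against the standard library
assert abs(s2 - statistics.variance(salaries)) < 1e-9
assert abs(sigma2 - statistics.pvariance(salaries)) < 1e-9
```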

3. Standard Deviation

The square root of the variance — back in the original units.

Sample SD:      s = √s² = √88.554 = 9.41 (₹9,410)
Population SD:  σ = √σ²

Interpretation

x̄ = 76.625 (₹76,625)
s  = 9.41   (₹9,410)

→ A typical analyst's salary lies roughly ₹9,410 from the mean
→ Most salaries fall between x̄ − s = ₹67,215 and x̄ + s = ₹86,035 (here, 6 of the 8)
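To check this interpretation, the sketch below computes the sample SD with `statistics.stdev`, compares it with the population version `statistics.pstdev`, and counts how many salaries fall within x̄ ± s:

```python
import statistics

salaries = [62, 68, 72, 75, 78, 82, 85, 91]   # ₹ thousands
x_bar = statistics.mean(salaries)             # 76.625
s = statistics.stdev(salaries)                # sample SD (n − 1 divisor), ≈ 9.410
sigma = statistics.pstdev(salaries)           # population SD (N divisor), ≈ 8.80

low, high = x_bar - s, x_bar + s
inside = [x for x in salaries if low <= x <= high]
print(f"x̄ ± s = {low:.3f} to {high:.3f}")     # 67.215 to 86.035
print(f"{len(inside)} of {len(salaries)} within 1 SD: {inside}")
# 6 of 8 within 1 SD: [68, 72, 75, 78, 82, 85]
```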

The Empirical Rule (68-95-99.7 Rule)

For bell-shaped (approximately normal) distributions:

About 68% of data falls within 1 SD of the mean:    (x̄ ± 1s)
About 95% of data falls within 2 SDs of the mean:   (x̄ ± 2s)
About 99.7% of data falls within 3 SDs of the mean: (x̄ ± 3s)

Example: roughly normal exam scores with x̄ = 70, s = 10
→ About 68% of students scored between 60 and 80
→ About 95% scored between 50 and 90
→ About 99.7% scored between 40 and 100
→ Only about 0.3% scored below 40 or above 100

This rule is covered in depth in Chapter 10 (Normal Distribution).
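The 68-95-99.7 figures are rounded normal-distribution probabilities. Python's `statistics.NormalDist` (Python 3.8+) can compute them directly. This sketch assumes the exam scores are exactly normal:

```python
from statistics import NormalDist

z = NormalDist()   # standard normal: mean 0, SD 1
for k in (1, 2, 3):
    share = z.cdf(k) - z.cdf(-k)   # probability of falling within k SDs of the mean
    print(f"within {k} SD: {share:.1%}")
# within 1 SD: 68.3%
# within 2 SD: 95.4%
# within 3 SD: 99.7%

# Exam example (x̄ = 70, s = 10), assuming exact normality
exam = NormalDist(mu=70, sigma=10)
print(f"{exam.cdf(80) - exam.cdf(60):.1%}")   # 68.3%
```

The exact 2-SD figure is about 95.4%, which the rule rounds to 95%.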

Properties of Standard Deviation

  1. Always ≥ 0 (zero only if all values are identical)
  2. Same units as the original data
  3. Sensitive to outliers (like the mean)
  4. The most widely used measure of spread for symmetric distributions

4. Interquartile Range (IQR)

The IQR measures the spread of the middle 50% of data — robust to outliers.

Quartiles

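Sort the data and divide it into four equal parts. (Several quartile conventions exist. This chapter uses the median-of-halves method, so statistical software may report slightly different quartiles.)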

Sorted: 62, 68, 72, 75, 78, 82, 85, 91

Q1 (25th percentile): median of the lower half
  Lower half: 62, 68, 72, 75
  Q1 = (68+72)/2 = 70

Q2 (50th percentile) = Median = (75+78)/2 = 76.5

Q3 (75th percentile): median of the upper half
  Upper half: 78, 82, 85, 91
  Q3 = (82+85)/2 = 83.5

IQR = Q3 − Q1 = 83.5 − 70 = 13.5 (₹13,500)
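The median-of-halves method fits in a short function. The name `quartiles_median_of_halves` is hypothetical. For odd n, the function leaves the middle value out of both halves, which is one common convention among several. For comparison, the sketch also calls `statistics.quantiles`, whose default method interpolates differently and so gives a different Q1 and Q3 on the same data:

```python
import statistics

def quartiles_median_of_halves(data):
    """Return (Q1, Q2, Q3). Q1 and Q3 are the medians of the lower and
    upper halves; for odd n the middle value is excluded from both halves."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]
    upper = xs[half + (n % 2):]   # skip the middle value when n is odd
    return statistics.median(lower), statistics.median(xs), statistics.median(upper)

salaries = [62, 68, 72, 75, 78, 82, 85, 91]
q1, q2, q3 = quartiles_median_of_halves(salaries)
print(q1, q2, q3, q3 - q1)                   # 70.0 76.5 83.5 13.5

# A different convention (default method='exclusive') gives different quartiles
print(statistics.quantiles(salaries, n=4))   # [69.0, 76.5, 84.25]
```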

Interpretation

IQR = 13.5 → the middle 50% of salaries span a range of ₹13,500
25% earn below ₹70k, 75% earn below ₹83.5k

Outlier Detection with IQR (Tukey's Fences)

Lower fence = Q1 − 1.5 × IQR = 70 − 1.5 × 13.5 = 70 − 20.25 = 49.75
Upper fence = Q3 + 1.5 × IQR = 83.5 + 1.5 × 13.5 = 83.5 + 20.25 = 103.75

Any value below 49.75 or above 103.75 is flagged as a potential outlier.
Our data: all values between 62 and 91 → no outliers by this rule.
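If we added a ninth salary of 200, the quartiles would shift (Q1 = 70, Q3 = 88, IQR = 18, upper fence = 88 + 27 = 115), but 200 > 115 → still flagged as an outlier.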
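Tukey's fences reduce to two lines of arithmetic. In this sketch, `tukey_fences` and `flag_outliers` are hypothetical helper names. Q1 and Q3 are passed in rather than computed: 70 and 83.5 for the original data, and 70 and 88 after adding 200 (median-of-halves method with the middle value excluded).

```python
def tukey_fences(q1, q3, k=1.5):
    """Return (lower, upper) fences: Q1 − k·IQR and Q3 + k·IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def flag_outliers(data, q1, q3, k=1.5):
    """Values strictly outside the fences."""
    lower, upper = tukey_fences(q1, q3, k)
    return [x for x in data if x < lower or x > upper]

salaries = [62, 68, 72, 75, 78, 82, 85, 91]
print(tukey_fences(70, 83.5))                  # (49.75, 103.75)
print(flag_outliers(salaries, 70, 83.5))       # []
print(flag_outliers(salaries + [200], 70, 88)) # [200]
```

k = 1.5 is the usual multiplier. Tukey also used k = 3 for "far out" values.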

When to Use IQR vs Standard Deviation

                        IQR                               Standard Deviation
Resistant to outliers   ✓                                 ✗
Uses all data           ✗ (middle 50% only)               ✓
Used with               Median                            Mean
Best for                Skewed data, outlier detection    Symmetric data, normal distribution
Used in                 Box plots                         Most statistical tests

5. Coefficient of Variation (CV)

Compares spread relative to the mean — useful when comparing distributions with different units or scales.

CV = (s / x̄) × 100%

Salary dataset: CV = (9.41 / 76.625) × 100% = 12.3%

Interpretation: The standard deviation is 12.3% of the mean.
→ Moderate variability relative to the mean

Comparing Two Datasets

Dataset A: Salaries in Finance: x̄ = 82, s = 9 → CV = 11%
Dataset B: Project durations: x̄ = 12 days, s = 4 days → CV = 33%

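Project durations are relatively more variable than salaries,
even though their SD is numerically smaller (4 vs 9). The raw SDs cannot be compared directly anyway, since one is in days and the other in ₹ thousands.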

CV is dimensionless, so it allows comparison across different units or scales. It is only meaningful when the mean is positive and not close to zero.
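A CV helper takes only a few lines on top of `statistics.mean` and `statistics.stdev`. In this sketch the function name is hypothetical, and rejecting non-positive means is a design choice, since CV is not meaningful there. When only summary statistics are known, as for Datasets A and B, you can compute the ratio directly:

```python
import statistics

def coefficient_of_variation(data):
    """Sample CV in percent: (s / x̄) × 100."""
    x_bar = statistics.mean(data)
    if x_bar <= 0:
        raise ValueError("CV is only meaningful when the mean is positive")
    return statistics.stdev(data) / x_bar * 100

salaries = [62, 68, 72, 75, 78, 82, 85, 91]
print(f"{coefficient_of_variation(salaries):.1f}%")   # 12.3%

# From summary statistics only (Datasets A and B)
print(f"{9 / 82 * 100:.0f}%  {4 / 12 * 100:.0f}%")     # 11%  33%
```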

6. The Box Plot (Box-and-Whisker Plot)

Visualises the five-number summary: Min, Q1, Median, Q3, Max.

Box Plot for salary data:

        ┌──────┬──────┐
├───────┤      │      ├──────┤
        └──────┴──────┘
62      70    76.5   83.5   91
Min     Q1    Med    Q3    Max

The box spans Q1 to Q3 (the IQR).
The line inside the box is the median.
In the simple version, whiskers extend to Min and Max. In the outlier-detection (Tukey) version, they stop at the most extreme data values that lie inside the fences.
Points beyond the fences are plotted individually as outliers (dots/circles).
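You can compute the numbers behind a box plot without any plotting library. This sketch returns the five-number summary, with quartiles by the median-of-halves method used in this chapter. A second function returns the Tukey-style whisker ends: the most extreme values still inside the fences. Both function names are hypothetical. The example uses the illustrative ₹200k outlier, with Q1 = 70 and Q3 = 88 for that 9-value dataset.

```python
import statistics

def five_number_summary(data):
    """(Min, Q1, Median, Q3, Max); Q1/Q3 are medians of the lower/upper halves,
    with the middle value excluded from both halves when n is odd."""
    xs = sorted(data)
    n, half = len(xs), len(xs) // 2
    q1 = statistics.median(xs[:half])
    q3 = statistics.median(xs[half + n % 2:])
    return xs[0], q1, statistics.median(xs), q3, xs[-1]

def whisker_ends(data, q1, q3, k=1.5):
    """Tukey-style whiskers: most extreme data values still inside the fences."""
    iqr = q3 - q1
    inside = [x for x in data if q1 - k * iqr <= x <= q3 + k * iqr]
    return min(inside), max(inside)

salaries = [62, 68, 72, 75, 78, 82, 85, 91]
print(*five_number_summary(salaries))            # 62 70.0 76.5 83.5 91
print(whisker_ends(salaries + [200], 70, 88))    # (62, 91): 200 is drawn as a separate point
```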

Side-by-Side Box Plots

Comparing distributions across groups — more informative than just comparing means:

Finance   ──[━━━━━━━━━━]──
Tech      ──────[━━━━━━━━━━━━━━━]─────
Marketing ──[━━━━━━]──

Quick visual: Tech has higher median and wider spread than Finance or Marketing.

Practical Examples

Example 1: Comparing Two Investment Strategies

Strategy A monthly returns (%): 2.1, 2.3, 2.0, 2.4, 2.2, 2.5, 2.1, 2.3, 2.0, 2.2
Strategy B monthly returns (%): 4.0, −1.0, 5.5, 0.5, 3.8, −2.0, 6.1, 0.3, 4.2, −0.4

x̄_A = 2.21%      s_A = 0.17%     CV_A = 7.5%
x̄_B = 2.10%      s_B = 2.92%     CV_B = 139%

Strategy A has a similar (slightly higher) mean return but dramatically lower variability.
Strategy B has big upside months but also losing months, so it carries much higher risk.
Risk-adjusted, Strategy A is clearly superior (similar return, far less risk).
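The summary statistics can be recomputed from the raw returns. This sketch uses the sample SD (`statistics.stdev`, n − 1 divisor). A population SD would give slightly smaller figures.

```python
import statistics

returns = {
    "A": [2.1, 2.3, 2.0, 2.4, 2.2, 2.5, 2.1, 2.3, 2.0, 2.2],
    "B": [4.0, -1.0, 5.5, 0.5, 3.8, -2.0, 6.1, 0.3, 4.2, -0.4],
}
for name, r in returns.items():
    m = statistics.mean(r)
    s = statistics.stdev(r)   # sample SD (n − 1)
    print(f"{name}: mean = {m:.2f}%, s = {s:.2f}%, CV = {s / m:.1%}")
# A: mean = 2.21%, s = 0.17%, CV = 7.5%
# B: mean = 2.10%, s = 2.92%, CV = 139.2%
```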

Example 2: Quality Control

Two production lines, target output = 500 units/day:

Line A: 495, 498, 502, 504, 497, 501, 500, 503, 499, 501
  x̄_A = 500, s_A = 2.8 units, CV = 0.56%

Line B: 480, 510, 490, 520, 485, 515, 505, 495, 510, 490
  x̄_B = 500, s_B = 13.7 units, CV = 2.75%

Both lines produce 500 units on average.
Line B's SD is roughly 5× that of Line A (13.7 vs 2.8): much less consistent output, i.e. worse quality control.
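Here is a minimal sketch that recomputes the production-line figures with the sample SD (`statistics.stdev`):

```python
import statistics

lines = {
    "A": [495, 498, 502, 504, 497, 501, 500, 503, 499, 501],
    "B": [480, 510, 490, 520, 485, 515, 505, 495, 510, 490],
}
sd = {}
for name, output in lines.items():
    m = statistics.mean(output)
    sd[name] = statistics.stdev(output)   # sample SD (n − 1)
    print(f"Line {name}: mean = {m}, s = {sd[name]:.2f}, CV = {sd[name] / m:.2%}")
print(f"B / A spread ratio: {sd['B'] / sd['A']:.1f}")
# Line A: mean = 500, s = 2.79, CV = 0.56%
# Line B: mean = 500, s = 13.74, CV = 2.75%
# B / A spread ratio: 4.9
```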

Example 3: Student Performance Analysis

Class of 30 students, scores:
Q1 = 58, Median = 72, Q3 = 84

IQR = 84 − 58 = 26
Lower fence = 58 − 1.5(26) = 19 → any score below 19 = outlier
Upper fence = 84 + 1.5(26) = 123 → any score above 123 = outlier (impossible if scores are out of 100)

Three students scored 12, 15 and 18 → all three fall below the lower fence of 19, so all are flagged as outliers
→ These students may need individual attention

Comparing All Measures of Spread

Measure         Formula           Resistant?   Best for
Range           Max − Min         No           Quick overview
Variance (s²)   Σ(x−x̄)²/(n−1)    No           Further calculations
Std Dev (s)     √s²               No           Symmetric data, normal distribution
IQR             Q3 − Q1           Yes          Skewed data, outlier detection
CV              (s/x̄)×100%       No           Comparing across different units

Common Mistakes

1. Confusing sample SD with population SD

Sample: divide by (n−1)   → s  (estimating population spread from a sample)
Population: divide by N   → σ  (you have ALL the data)

Using n instead of (n−1) for a sample underestimates the true spread.

2. Interpreting SD without context

s = 15 marks is huge for a 0–20 quiz but tiny for a 0–1000 exam. Use CV for context.

3. Reporting only the mean without spread

"Average salary = ₹76k" is incomplete. "Average = ₹76k, SD = ₹9.4k" tells you much more — and "Median = ₹76.5k, IQR = ₹13.5k" is even better for skewed salary distributions.

4. Using SD for skewed distributions

Salary distribution with outlier (₹200k added, n = 9):
s ≈ 42.1k (up from 9.4k: distorted by a single value)
IQR = 18k (up only modestly from 13.5k; quartiles by the median-of-halves method)

Report IQR + Median for skewed salary data.
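This sketch recomputes s, the IQR and the median with and without the ₹200k salary. The IQR uses the median-of-halves method from earlier in the chapter, with the middle value excluded for odd n. The helper name `iqr_median_of_halves` is hypothetical.

```python
import statistics

def iqr_median_of_halves(data):
    """IQR with Q1/Q3 as medians of the lower/upper halves (middle value excluded for odd n)."""
    xs = sorted(data)
    n, half = len(xs), len(xs) // 2
    return statistics.median(xs[half + n % 2:]) - statistics.median(xs[:half])

salaries = [62, 68, 72, 75, 78, 82, 85, 91]
with_outlier = salaries + [200]   # the illustrative ₹200k salary

for label, data in (("original", salaries), ("with 200", with_outlier)):
    print(f"{label}: s = {statistics.stdev(data):.1f}, "
          f"IQR = {iqr_median_of_halves(data):.1f}, median = {statistics.median(data)}")
# original: s = 9.4, IQR = 13.5, median = 76.5
# with 200: s = 42.1, IQR = 18.0, median = 78
```

One extreme value multiplies the SD by about 4.5, while the IQR and median barely move.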

Practice Exercises

  1. For the dataset: 3, 7, 7, 9, 11, 14, 15, 18, 21, 25 — calculate range, variance, and standard deviation.

  2. Find Q1, Q3, and IQR for: 12, 15, 18, 22, 25, 28, 31, 35, 40. Identify any outliers using Tukey's fences.

  3. Two mutual funds:

    • Fund X: mean annual return = 12%, SD = 2%
    • Fund Y: mean annual return = 18%, SD = 9%

    Which is more variable relative to its return? Calculate the CV for each.
  4. A factory tracks daily defects: 2, 4, 3, 5, 4, 3, 4, 2, 25, 4. Calculate the mean and SD with and without the outlier (25). Which measure of spread better describes this dataset?

  5. Explain in words why we divide by (n−1) instead of n when calculating sample variance.

Summary

In this chapter you learned:

  • Range = Max − Min; simple but outlier-sensitive
  • Variance s² = Σ(x−x̄)²/(n−1); divide by (n−1) for sample (Bessel's correction); units are squared
  • Standard deviation s = √s²; same units as data; 68% within 1 SD, 95% within 2 SDs (for normal distributions)
  • IQR = Q3 − Q1; spread of middle 50%; resistant to outliers
  • Tukey's fences: Q1−1.5×IQR and Q3+1.5×IQR — values beyond these are potential outliers
  • CV = (s/x̄)×100% — relative spread; compare across different scales or units
  • Box plot visualises Min, Q1, Median, Q3, Max — great for comparing groups
  • Use SD+Mean for symmetric data; use IQR+Median for skewed data or when outliers are present

Next up: Data Visualisation for Statistics — histograms, box plots, scatter plots, and how to choose the right chart for statistical data.