Why the Normal Distribution?
The normal distribution is the most important probability distribution in statistics because:
- Many natural phenomena are approximately normal: heights, blood pressure, measurement errors, IQ scores
- It arises from the sum of many random effects (Central Limit Theorem — Chapter 11)
- Many classical statistical methods assume normality of the data or of the errors (t-tests, ANOVA, linear regression)
- It has elegant mathematical properties that make calculations tractable
Properties of the Normal Distribution
Bell-shaped, symmetric around the mean μ
[Figure: bell curve centred on μ, with nested bands marking 68% of the area within μ ± σ, 95% within μ ± 2σ, and 99.7% within μ ± 3σ]
Key properties:
- Symmetric around the mean: left half mirrors right half
- Mean = Median = Mode: all three coincide at the peak
- Parameterised by μ (mean) and σ (standard deviation) — written X ~ N(μ, σ²)
- Asymptotic tails: the curve never touches the x-axis; extends infinitely in both directions
- Total area under the curve = 1 (all probabilities sum to 1)
- Inflection points at μ ± σ — where the curve transitions from concave to convex
The Empirical Rule (68-95-99.7)
For any normal distribution:
P(μ − σ < X < μ + σ) = 0.6827 ≈ 68%
P(μ − 2σ < X < μ + 2σ) = 0.9545 ≈ 95%
P(μ − 3σ < X < μ + 3σ) = 0.9973 ≈ 99.7%
Only about 0.3% of values (0.27% before rounding) lie beyond ±3 standard deviations.
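The three probabilities above can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper name `prob_within` is made up for illustration.

```python
from statistics import NormalDist

def prob_within(k: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """P(mu - k*sigma < X < mu + k*sigma) for X ~ N(mu, sigma^2)."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

for k in (1, 2, 3):
    # The result depends only on k, not on the particular mu or sigma.
    print(f"within ±{k}σ: {prob_within(k):.4f}")
```

Calling `prob_within(2, mu=100, sigma=15)` should give the same value as `prob_within(2)`, which is why the rule holds for every normal distribution.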
Application
IQ scores: μ = 100, σ = 15
68% of people have IQ between 85 and 115
95% of people have IQ between 70 and 130
99.7% of people have IQ between 55 and 145
P(IQ > 130) ≈ (100% − 95%) / 2 = 2.5%
P(IQ > 145) ≈ (100% − 99.7%) / 2 = 0.15%
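The two tail figures above are built from the rounded 95% and 99.7% values, so they are approximations. As a cross-check, a short sketch with the standard library's `statistics.NormalDist` (variable names are illustrative) computes the tails without that rounding:

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)

# Upper-tail probabilities: P(IQ > x) = 1 - P(IQ < x)
p_above_130 = 1 - iq.cdf(130)   # 130 is 2 SDs above the mean
p_above_145 = 1 - iq.cdf(145)   # 145 is 3 SDs above the mean

print(f"P(IQ > 130) = {p_above_130:.4f}")
print(f"P(IQ > 145) = {p_above_145:.5f}")
```

The results come out near 2.3% and 0.13%, slightly below the 2.5% and 0.15% from the rule of thumb.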
The Standard Normal Distribution (Z-Distribution)
The standard normal distribution is a normal distribution with μ = 0 and σ = 1:
Z ~ N(0, 1)
Converting any normal variable to the standard normal allows us to use a single table for all normal distributions.
Z-Score Formula
Z = (X − μ) / σ
Z = number of standard deviations X is from the mean
Z > 0: X is above the mean
Z < 0: X is below the mean
Z = 0: X equals the mean
Interpretation
Salary distribution: μ = ₹76,000, σ = ₹12,000
Priya earns ₹88,000:
Z = (88,000 − 76,000) / 12,000 = 12,000 / 12,000 = +1.0
→ Priya's salary is exactly 1 standard deviation above the mean
Raj earns ₹58,000:
Z = (58,000 − 76,000) / 12,000 = −18,000 / 12,000 = −1.5
→ Raj's salary is 1.5 standard deviations below the mean
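The Z-score formula translates directly into a one-line function. The sketch below is a minimal illustration; the function name `z_score` and the constant names are assumptions, not part of any library.

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations x lies from the mean."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma

MU, SIGMA = 76_000, 12_000  # salary parameters from the example above

print(z_score(88_000, MU, SIGMA))  # Priya: 1.0
print(z_score(58_000, MU, SIGMA))  # Raj: -1.5
```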
Using the Z-Table
The Z-table (standard normal table) gives P(Z < z) for any z value.
Standard Normal Table (excerpt)
    z      P(Z < z)
  -2.00    0.0228
  -1.50    0.0668
  -1.00    0.1587
  -0.50    0.3085
   0.00    0.5000
   0.50    0.6915
   1.00    0.8413
   1.50    0.9332
   1.96    0.9750
   2.00    0.9772
   2.58    0.9951
   3.00    0.9987
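Table entries like these can be regenerated rather than looked up. A minimal sketch, assuming only the standard identity Φ(z) = ½(1 + erf(z/√2)); the function name `phi` is illustrative:

```python
from math import erf, sqrt

def phi(z: float) -> float:
    """Standard normal CDF, P(Z < z), via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# Print a few rows in the same layout as the table excerpt.
for z in (-2.0, -1.0, 0.0, 1.0, 1.96, 2.58, 3.0):
    print(f"{z:6.2f}  {phi(z):.4f}")
```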
Key Z-Values to Memorise
Z = ±1.00 → 68% between → P(Z < 1) = 0.8413
Z = ±1.645 → 90% between → P(Z < 1.645) = 0.9500 (one-tail)
Z = ±1.96 → 95% between → P(Z < 1.96) = 0.9750 ← most important!
Z = ±2.326 → 98% between → P(Z < 2.326) = 0.9900
Z = ±2.576 → 99% between → P(Z < 2.576) = 0.9950
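These critical values come from inverting the standard normal CDF. The sketch below uses `NormalDist.inv_cdf` from the standard library; `critical_z` is a hypothetical helper name.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mu = 0, sigma = 1

def critical_z(central_area: float) -> float:
    """z such that P(-z < Z < z) equals central_area."""
    upper_tail = (1 - central_area) / 2
    return Z.inv_cdf(1 - upper_tail)

for area in (0.90, 0.95, 0.98, 0.99):
    print(f"{area:.0%} between ±{critical_z(area):.3f}")
```

Splitting the leftover area equally between the two tails is what links "95% between" to the one-sided lookup P(Z < 1.96) = 0.9750.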
Probability Calculations
Finding P(X < a)
Salaries ~ N(76,000, 12,000²)
P(salary < 94,000) = ?
Step 1: Convert to Z
Z = (94,000 − 76,000) / 12,000 = 18,000/12,000 = 1.5
Step 2: Look up Z-table
P(Z < 1.5) = 0.9332
Answer: P(salary < 94,000) = 93.32%
Finding P(X > a) — Upper Tail
P(salary > 94,000) = 1 − P(salary < 94,000) = 1 − 0.9332 = 0.0668 = 6.68%
Finding P(a < X < b) — Between Two Values
P(64,000 < salary < 94,000) = ?
Z₁ = (64,000 − 76,000) / 12,000 = −12,000/12,000 = −1.0
Z₂ = (94,000 − 76,000) / 12,000 = +1.5
P(Z < −1.0) = 0.1587
P(Z < +1.5) = 0.9332
P(−1 < Z < 1.5) = 0.9332 − 0.1587 = 0.7745
Answer: P(64,000 < salary < 94,000) = 77.45%
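The three calculation patterns so far (below a value, above a value, between two values) all reduce to CDF lookups. A short sketch with `statistics.NormalDist`, which standardises internally so no explicit Z conversion is needed; the variable names are illustrative:

```python
from statistics import NormalDist

salary = NormalDist(mu=76_000, sigma=12_000)

p_below = salary.cdf(94_000)                         # P(X < 94,000)
p_above = 1 - salary.cdf(94_000)                     # P(X > 94,000), complement
p_between = salary.cdf(94_000) - salary.cdf(64_000)  # P(64,000 < X < 94,000)

print(f"{p_below:.4f}  {p_above:.4f}  {p_between:.4f}")
```

The printed values should match the table-based answers 0.9332, 0.0668, and 0.7745.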
Finding P(X < a) when a is Below the Mean
P(salary < 64,000) = P(Z < −1.0) = 0.1587 = 15.87%
Most tables list P(Z < z) for negative z directly. If yours covers only positive z, use symmetry:
P(Z < −z) = 1 − P(Z < z)
P(Z < −1.0) = 1 − P(Z < 1.0) = 1 − 0.8413 = 0.1587 ✓
Inverse Normal — Finding X from Probability
Sometimes you need to find the value X that corresponds to a given probability.
P(X > ?) = 5% → what salary is in the top 5%?
Step 1: P(X < ?) = 95% → find z such that P(Z < z) = 0.95
From table: z = 1.645
Step 2: Solve for X
z = (X − μ) / σ → X = μ + z × σ
X = 76,000 + 1.645 × 12,000
X = 76,000 + 19,740 = 95,740
Answer: Top 5% earn above ₹95,740
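The same two steps can be done in one call with an inverse CDF. The sketch below assumes `NormalDist.inv_cdf` from the standard library; `upper_tail_cutoff` is a made-up helper name.

```python
from statistics import NormalDist

salary = NormalDist(mu=76_000, sigma=12_000)

def upper_tail_cutoff(dist: NormalDist, tail: float) -> float:
    """Value x with P(X > x) = tail, i.e. the (1 - tail) quantile."""
    return dist.inv_cdf(1 - tail)

cutoff = upper_tail_cutoff(salary, 0.05)
print(f"Top 5% earn above {cutoff:,.0f}")
```

The result is a couple of rupees below the hand-calculated ₹95,740 because the table value 1.645 is itself rounded (the unrounded z is about 1.64485).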
Percentile Calculation
"What is the 25th percentile of salaries?"
P(Z < z) = 0.25 → z = −0.674 (from table)
X = μ + z × σ = 76,000 + (−0.674) × 12,000
= 76,000 − 8,088
= 67,912
→ 25th percentile = ₹67,912 (Q1)
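Quartiles are common enough that the standard library offers a shortcut. A small sketch using `NormalDist.quantiles`, which returns the cut points that divide the distribution into `n` equal-probability intervals:

```python
from statistics import NormalDist

salary = NormalDist(mu=76_000, sigma=12_000)

# quantiles(n=4) returns the three cut points Q1, Q2 (median), Q3.
q1, q2, q3 = salary.quantiles(n=4)

print(f"Q1 = {q1:,.0f}, median = {q2:,.0f}, Q3 = {q3:,.0f}")
```

Q1 comes out close to ₹67,906 rather than ₹67,912; the small gap is again due to rounding z to −0.674 in the hand calculation. By symmetry, Q3 lies the same distance above the mean.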
Normal Approximation to the Binomial
When n is large (np ≥ 5 and n(1−p) ≥ 5), the Binomial distribution can be approximated by a Normal distribution:
X ~ B(n, p) ≈ N(np, np(1−p))
With continuity correction (for better accuracy):
P(X ≤ k) ≈ P(Z ≤ (k + 0.5 − np) / √(np(1−p)))
P(X ≥ k) ≈ P(Z ≥ (k − 0.5 − np) / √(np(1−p)))
Example:
n=100, p=0.40 → np=40, σ=√(100×0.4×0.6)=√24≈4.9
P(X ≤ 35) ≈ P(Z ≤ (35.5 − 40)/4.9) = P(Z ≤ −0.918) ≈ 0.179
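To see how close the approximation is, it helps to compute the exact Binomial probability alongside it. The sketch below is one way to do that with the standard library; `binom_cdf` and `normal_approx_cdf` are hypothetical helper names.

```python
from math import comb, sqrt
from statistics import NormalDist

def binom_cdf(k: int, n: int, p: float) -> float:
    """Exact P(X <= k) for X ~ B(n, p), summing the probability mass function."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def normal_approx_cdf(k: int, n: int, p: float) -> float:
    """Approximate P(X <= k) using N(np, np(1-p)) with continuity correction."""
    mu, sigma = n * p, sqrt(n * p * (1 - p))
    return NormalDist(mu, sigma).cdf(k + 0.5)

exact = binom_cdf(35, 100, 0.40)
approx = normal_approx_cdf(35, 100, 0.40)
print(f"exact = {exact:.4f}, approx = {approx:.4f}")
```

For this example the two values should agree to roughly two decimal places, which is the kind of accuracy the np ≥ 5 and n(1−p) ≥ 5 conditions are meant to secure.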
Practical Examples
Example 1: Quality Control — Manufacturing
Bolt lengths ~ N(10 cm, 0.05² cm²)
Specification: must be between 9.9 cm and 10.1 cm
Z₁ = (9.9 − 10) / 0.05 = −2.0
Z₂ = (10.1 − 10) / 0.05 = +2.0
P(9.9 < X < 10.1) = P(−2 < Z < 2) = 0.9772 − 0.0228 = 0.9544
→ 95.44% of bolts meet specification
→ About 4.56% are rejected — too long or too short
→ Yield improvement: reduce σ (tighter manufacturing process)
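The effect of tightening the process can be explored by treating yield as a function of σ. A minimal sketch; `spec_yield` is an illustrative name, and the smaller σ values are hypothetical process improvements, not figures from the example.

```python
from statistics import NormalDist

def spec_yield(mu: float, sigma: float, lower: float, upper: float) -> float:
    """Fraction of output inside (lower, upper) for X ~ N(mu, sigma^2)."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(upper) - dist.cdf(lower)

# 0.05 is the current process; 0.04 and 0.03 are hypothetical improvements.
for sigma in (0.05, 0.04, 0.03):
    share = spec_yield(10.0, sigma, 9.9, 10.1)
    print(f"sigma = {sigma:.2f} cm -> yield = {share:.4f}")
```

With σ = 0.05 this gives 0.9545 rather than 0.9544, because the table entries used in the hand calculation are rounded to four places.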
Example 2: Finance — Value at Risk (VaR)
Daily portfolio return ~ N(0.1%, 2%²) — daily mean return and SD
95% VaR: What is the maximum loss we expect to see on 95% of days?
(i.e., find the 5th percentile)
P(Z < z) = 0.05 → z = −1.645
X = 0.1% + (−1.645) × 2% = 0.1% − 3.29% = −3.19%
95% VaR = 3.19% daily loss
→ On 95% of trading days, we expect to lose no more than 3.19% of portfolio value
→ On 5% of days (about 12–13 trading days per year), losses exceed 3.19%
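The VaR calculation above is a single quantile lookup with the sign flipped. The sketch below assumes normally distributed returns, as the example does; `parametric_var` is a hypothetical name, and the figure of 252 trading days per year is an assumption used only for the day count.

```python
from statistics import NormalDist

def parametric_var(mean_pct: float, sd_pct: float, confidence: float = 0.95) -> float:
    """Loss (as a positive %) exceeded on a (1 - confidence) share of days,
    assuming returns ~ N(mean_pct, sd_pct^2)."""
    worst_case_return = NormalDist(mean_pct, sd_pct).inv_cdf(1 - confidence)
    return -worst_case_return

var_95 = parametric_var(0.1, 2.0)
print(f"95% VaR = {var_95:.2f}% daily loss")

TRADING_DAYS = 252  # assumption: typical number of trading days in a year
print(f"Expected breach days per year: {(1 - 0.95) * TRADING_DAYS:.1f}")
```

Raising `confidence` to 0.99 moves the cutoff further into the left tail and so produces a larger VaR.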
Example 3: HR — Hiring Threshold
Aptitude test scores ~ N(70, 10²), i.e. μ = 70, σ = 10
Company hires top 20% of applicants.
Find the minimum score to be in top 20%:
P(Z < z) = 0.80 → z = 0.842
X = 70 + 0.842 × 10 = 70 + 8.42 = 78.42
→ Applicants must score at least 78.42 to be hired
→ In a pool of 500 applicants, approximately 100 will qualify
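A quick simulation is one way to sanity-check a cutoff like this. The sketch below draws simulated applicants with `NormalDist.samples`; the sample size and seed are arbitrary choices made for repeatability, not part of the example.

```python
from statistics import NormalDist

scores = NormalDist(mu=70, sigma=10)
cutoff = scores.inv_cdf(0.80)  # minimum score for the top 20%

# Draw simulated applicants; the seed only makes the run repeatable.
sample = scores.samples(100_000, seed=42)
share_hired = sum(s >= cutoff for s in sample) / len(sample)

print(f"cutoff = {cutoff:.2f}, simulated share hired = {share_hired:.3f}")
```

The simulated share should land close to 0.20. In a real pool of 500 the count would vary around 100 from one intake to the next.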
Example 4: Return on Investment
Annual ROI ~ N(8%, 15%²)
P(negative return) = P(X < 0)
Z = (0 − 8) / 15 = −0.533
P(Z < −0.533) = 0.297
→ 29.7% probability of a negative return in any given year
P(return > 30%) = P(Z > (30−8)/15) = P(Z > 1.467) = 1 − 0.929 = 0.071
→ 7.1% probability of returning more than 30%
Common Mistakes
1. Using the empirical rule for non-normal data
The 68-95-99.7 rule applies ONLY to normal distributions.
For skewed or heavy-tailed distributions, use Chebyshev's inequality instead:
P(|X − μ| < kσ) ≥ 1 − 1/k² (works for any distribution with a finite variance)
k=2: at least 75% within 2 SDs (vs 95% for normal)
k=3: at least 89% within 3 SDs (vs 99.7% for normal)
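The gap between Chebyshev's guarantee and the normal-specific figure is easy to tabulate. A minimal sketch, with illustrative function names:

```python
from statistics import NormalDist

def chebyshev_lower_bound(k: float) -> float:
    """Minimum share within k SDs of the mean (any finite-variance distribution)."""
    return 1 - 1 / k**2

def normal_share(k: float) -> float:
    """Exact share within k SDs of the mean for a normal distribution."""
    z = NormalDist()
    return z.cdf(k) - z.cdf(-k)

for k in (2, 3):
    print(f"k={k}: Chebyshev >= {chebyshev_lower_bound(k):.3f}, "
          f"normal = {normal_share(k):.4f}")
```

Chebyshev's bound is a worst-case floor, so it is always the more conservative of the two.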
2. Forgetting to standardise
Wrong: looking up P(Z < 94,000), i.e. treating the raw value as a z-score
Right: P(X < 94,000) = P(Z < 1.5) when μ=76k, σ=12k
3. Mixing up upper and lower tail
P(X > a) = 1 − P(X < a) = 1 − P(Z < z)
Don't confuse "greater than" with the direct table lookup.
4. Applying normal approximation when conditions aren't met
Binomial: n=10, p=0.05 → np=0.5 < 5 → Normal approximation is poor
Use exact Binomial formula instead.
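To illustrate how far off the approximation can be in this case, the sketch below compares the exact probability of zero successes with the continuity-corrected normal estimate. The choice of P(X ≤ 0) as the test quantity is an assumption made for illustration.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 10, 0.05
mu, sigma = n * p, sqrt(n * p * (1 - p))

# P(X <= 0): exact Binomial versus Normal with continuity correction
exact = comb(n, 0) * p**0 * (1 - p) ** n   # equals 0.95 ** 10
approx = NormalDist(mu, sigma).cdf(0.5)    # P(X <= 0) ~ P(Y <= 0 + 0.5)

print(f"exact = {exact:.4f}, normal approx = {approx:.4f}")
```

The exact value is about 0.60 while the approximation gives 0.50, an error of roughly ten percentage points.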
Practice Exercises
- Exam scores ~ N(65, 12²). Find:
  a) P(score > 80)
  b) P(50 < score < 75)
  c) The 90th percentile score
  d) The minimum score for the top 5% (distinction boundary)
- Daily ATM withdrawals ~ N(₹5,000, ₹1,500²). Find the probability that on a randomly chosen day, withdrawals exceed ₹8,000.
- A factory produces components with diameter ~ N(20 mm, 0.4² mm²). Acceptable range: 19.2 to 20.8 mm. What percentage of components are acceptable?
- Investment annual return ~ N(10%, 20%²). What is the 5% Value at Risk (worst 5% of annual returns)?
- Heights of adult men ~ N(170 cm, 7² cm²). A doorway is 185 cm tall. What percentage of men can walk through without ducking?
Summary
In this chapter you learned:
- Normal distribution X ~ N(μ, σ²): symmetric, bell-shaped; parameterised by mean μ and SD σ
- Empirical rule: 68% within ±1σ, 95% within ±2σ, 99.7% within ±3σ
- Standard normal Z ~ N(0,1): mean=0, SD=1; all normal problems convert to this
- Z-score: Z = (X − μ)/σ — measures distance from mean in standard deviations
- Z-table: gives P(Z < z); use complement for upper tail, subtraction for between-value problems
- Key z-values: ±1.645 (central 90%), ±1.96 (central 95%), ±2.576 (central 99%)
- Inverse normal: given probability → find z from table → solve X = μ + z×σ
- Normal approximation to Binomial: valid when np≥5 and n(1-p)≥5; use continuity correction
- Applications: quality control yield rates, VaR in finance, percentile thresholds in hiring
Next up: Sampling Distributions & the Central Limit Theorem — why the normal distribution appears everywhere, and how sample means behave.