Chapter 12 of 18

Confidence Intervals

Build a range of plausible values for a population parameter — constructing, interpreting, and understanding confidence intervals for means and proportions.

Meritshot · 10 min read

What Is a Confidence Interval?

A confidence interval (CI) provides a range of plausible values for an unknown population parameter, based on sample data. Instead of a single point estimate ("the mean is ₹76,400"), a CI says: "We're 95% confident the true mean is between ₹73,200 and ₹79,600."

Why not just give the sample mean? Because x̄ varies from sample to sample. The CI captures this uncertainty, quantifying how much we should trust our estimate.

The General Structure

CI = Point Estimate ± Margin of Error

Margin of Error = Critical Value × Standard Error

For the mean:
CI for μ = x̄ ± z* × (σ/√n)     [if σ known]
CI for μ = x̄ ± t* × (s/√n)      [if σ unknown — more common]

95% Confidence Interval for the Mean (σ Known)

When the population standard deviation σ is known (rare in practice), use z-critical values.

z* for common confidence levels:
90% CI: z* = 1.645
95% CI: z* = 1.96
99% CI: z* = 2.576

Example

Sample: n=50 employees, x̄ = ₹76,400
Known: σ = ₹12,000

SE = σ/√n = 12,000/√50 = 12,000/7.071 = 1,697

95% CI = 76,400 ± 1.96 × 1,697
       = 76,400 ± 3,326
       = (73,074, 79,726)

Interpretation: We are 95% confident the true average salary (μ) is between ₹73,074 and ₹79,726.
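
The calculation above can be reproduced in a few lines of Python. This is a minimal sketch using only the standard library; the helper name z_interval is an illustrative choice rather than a function from any package, and it assumes σ is genuinely known.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, confidence=0.95):
    """Two-sided z-interval for a mean when sigma is known (illustrative helper)."""
    # Critical value z*: leaves (1 - confidence) / 2 in each tail of the normal curve.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = sigma / sqrt(n)      # standard error of the mean
    margin = z_star * se      # margin of error
    return xbar - margin, xbar + margin

lower, upper = z_interval(xbar=76_400, sigma=12_000, n=50)
print(f"95% CI: ({lower:,.0f}, {upper:,.0f})")
```

The result should agree with the hand-computed interval up to rounding, since the code uses an unrounded z* (about 1.95996) instead of 1.96.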

Interpreting Confidence Intervals Correctly

This is where most people go wrong.

The Correct Interpretation

"If we repeated this sampling procedure many times, 95% of the
confidence intervals constructed would contain the true parameter μ."

Common Misinterpretations

WRONG: "There is a 95% probability that μ is between 73,074 and 79,726."
WRONG: "95% of individual salaries fall between these values."
WRONG: "We are 95% sure our sample mean is correct."

WHY: μ is fixed (not random). Once we construct the interval, it either
contains μ or it doesn't. The 95% refers to the METHOD, not this specific interval.

The "95 Out of 100" Analogy

Imagine constructing 100 different confidence intervals from 100 independent samples:
On average, about 95 of those intervals will contain the true μ
About 5 will not

Your one CI is one realisation of this process: it is either one of the intervals
that captures μ or one of those that misses, but you don't know which.
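
The repeated-sampling idea can be checked with a small simulation. The sketch below assumes a normally distributed population whose true mean and SD are known to the simulation (the values are borrowed from the salary example purely for illustration), builds a 95% z-interval from each simulated sample, and counts how often the interval contains μ.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

random.seed(42)  # fixed seed so the run is repeatable

MU, SIGMA, N = 76_400, 12_000, 50      # hypothetical "true" population values
Z_STAR = NormalDist().inv_cdf(0.975)   # two-sided 95% critical value
SE = SIGMA / sqrt(N)

def interval_covers_mu():
    """Draw one sample, build a 95% z-interval, report whether it contains MU."""
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    xbar = fmean(sample)
    return xbar - Z_STAR * SE <= MU <= xbar + Z_STAR * SE

trials = 2_000
hits = sum(interval_covers_mu() for _ in range(trials))
coverage = hits / trials
print(f"{hits} of {trials} intervals contained mu ({coverage:.1%})")
```

The observed share should land close to 95% but will rarely equal it exactly, which is why "95 out of 100" is best read as a long-run average.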

Confidence Interval for the Mean (σ Unknown — t-Distribution)

In practice, σ is almost never known. We estimate it with s, the sample standard deviation. This introduces additional uncertainty, which the t-distribution accounts for.

The t-Distribution

The t-distribution is:

  • Bell-shaped and symmetric (like normal)
  • Has heavier tails than normal (more extreme values)
  • Parameterised by degrees of freedom (df = n − 1)
  • As n → ∞, t → Z (normal)

t* for 95% CI at various df:
df = 5:   t* = 2.571
df = 10:  t* = 2.228
df = 20:  t* = 2.086
df = 30:  t* = 2.042
df = 60:  t* = 2.000
df = ∞:   t* = 1.960  (= z*)

For n ≥ 30, t* is within about 5% of z* (2.045 vs 1.960 at df = 29), so the difference is small.
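
The tabulated values can be reproduced numerically. The sketch below is one possible approach using only the standard library: it integrates the t density with Simpson's rule and finds t* by bisection. The function names are illustrative, and in practice a statistics library (for example SciPy's scipy.stats.t.ppf) would normally be used instead.

```python
from math import exp, lgamma, pi, sqrt

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_coeff = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * (log_df_pi := 0.0)
    coeff = exp(log_coeff) / sqrt(df * pi)
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2_000):
    """P(T <= x) for x >= 0: Simpson's rule on [0, x] plus 0.5 for the left half."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_critical(confidence, df):
    """Two-sided critical value t*, found by bisection on the CDF."""
    target = 1 - (1 - confidence) / 2
    lo, hi = 0.0, 100.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for df in (5, 10, 20, 30, 60):
    print(f"df = {df:>2}: t* = {t_critical(0.95, df):.3f}")
```

The printed values should match the table to three decimals and shrink toward 1.960 as df grows.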

CI Formula with t

CI for μ = x̄ ± t* × (s/√n)

Here t* is the critical value from the t-distribution with df = n − 1 at the desired confidence level.

Example

Sample: n=25 employees
x̄ = ₹76,400
s = ₹11,800 (sample SD — we don't know σ)

SE = s/√n = 11,800/√25 = 11,800/5 = 2,360
df = n − 1 = 24
t* at 95% with 24 df = 2.064

95% CI = 76,400 ± 2.064 × 2,360
       = 76,400 ± 4,871
       = (71,529, 81,271)

Interpretation: We are 95% confident the true average salary is between ₹71,529 and ₹81,271.

Note: this interval is wider than the z-interval because:

  1. n is smaller (25 vs 50)
  2. We used t* = 2.064 instead of z* = 1.96 (heavier tails)
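
A short sketch of the same calculation in Python, with the z-interval width from the earlier example alongside for comparison. The helper t_interval is a hypothetical name, and t* is passed in as a table lookup because the standard library has no t-quantile function.

```python
from math import sqrt

def t_interval(xbar, s, n, t_star):
    """Two-sided t-interval for a mean; t_star must be looked up for df = n - 1."""
    se = s / sqrt(n)          # estimated standard error
    margin = t_star * se
    return xbar - margin, xbar + margin

# t* = 2.064 is the tabulated 95% value for df = 24
t_lower, t_upper = t_interval(xbar=76_400, s=11_800, n=25, t_star=2.064)

# Margin of the earlier z-interval (sigma = 12,000, n = 50), for comparison
z_margin = 1.96 * 12_000 / sqrt(50)

print(f"t-interval: ({t_lower:,.0f}, {t_upper:,.0f}), width {t_upper - t_lower:,.0f}")
print(f"z-interval width: {2 * z_margin:,.0f}")
```

Most of the extra width comes from the smaller sample: halving n from 50 to 25 inflates the standard error by a factor of √2, while swapping 1.96 for 2.064 adds only about 5%.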

Confidence Interval for a Proportion

When your parameter of interest is a population proportion p:

p̂ = x/n (sample proportion)
SE = √(p̂(1−p̂)/n)

CI for p = p̂ ± z* × √(p̂(1−p̂)/n)

Valid when np̂ ≥ 10 and n(1−p̂) ≥ 10

Example

Survey: 400 customers surveyed
280 say they are satisfied

p̂ = 280/400 = 0.70

SE = √(0.70 × 0.30 / 400) = √(0.21/400) = √0.000525 = 0.0229

95% CI = 0.70 ± 1.96 × 0.0229
       = 0.70 ± 0.0449
       = (0.655, 0.745)

Interpretation: We are 95% confident the true proportion of satisfied customers is
between 65.5% and 74.5%.
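
The same steps as a minimal Python sketch. The function name proportion_interval is illustrative, and the validity check simply encodes the rule of thumb stated above.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(successes, n, confidence=0.95):
    """Normal-approximation (Wald) interval for a population proportion."""
    p_hat = successes / n
    # Rule-of-thumb check from the text: need n*p_hat >= 10 and n*(1 - p_hat) >= 10.
    if n * p_hat < 10 or n * (1 - p_hat) < 10:
        raise ValueError("sample too small for the normal approximation")
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z_star * se, p_hat + z_star * se

low, high = proportion_interval(successes=280, n=400)
print(f"95% CI: ({low:.3f}, {high:.3f})")
```

Raising an error when the check fails is a design choice for this sketch; a gentler version could return the interval with a warning instead.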

Factors Affecting CI Width

The margin of error (half-width of CI) = z* × SE:

Width = 2 × z* × σ/√n

Wider CI (less precise) when:
→ Higher confidence level (larger z*)
→ Larger σ (more population variability)
→ Smaller n

Narrower CI (more precise) when:
→ Lower confidence level (smaller z*)
→ Smaller σ (less variability)
→ Larger n
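
To see these effects numerically, the sketch below tabulates the width 2 × z* × σ/√n for a few sample sizes and confidence levels. It reuses σ = 12,000 from the salary example; the sample sizes are arbitrary illustrative choices.

```python
from math import sqrt
from statistics import NormalDist

SIGMA = 12_000  # population SD from the salary example

def ci_width(sigma, n, confidence):
    """Full width of a two-sided z-interval: 2 * z* * sigma / sqrt(n)."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return 2 * z_star * sigma / sqrt(n)

for confidence in (0.90, 0.95, 0.99):
    widths = [ci_width(SIGMA, n, confidence) for n in (50, 200, 800)]
    row = ", ".join(f"{w:,.0f}" for w in widths)
    print(f"{confidence:.0%} (n = 50, 200, 800): {row}")
```

Because n sits under a square root, quadrupling the sample size only halves the width, so precision gets progressively more expensive.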

Trade-offs

You want 99% confidence AND a narrow interval AND a small sample?
→ Impossible — you can have any two of these at the cost of the third.

Practical choice: Fix the confidence level (usually 95%) and desired width,
then solve for the required n.

Determining Sample Size

If you want a specific margin of error E at confidence level 1−α:

For the mean (σ known or estimated):
n = (z* × σ / E)²

For a proportion:
n = z*² × p̂(1−p̂) / E²  →  use p̂=0.5 if unknown (conservative)

Example:
Desired: ±₹1,000 margin of error, 95% confidence
Population SD = ₹12,000

n = (1.96 × 12,000 / 1,000)² = (23.52)² = 553.19 → n = 554

Example (proportion):
Desired: ±3% margin, 95% confidence, p unknown
n = 1.96² × 0.5 × 0.5 / 0.03² = 3.8416 × 0.25 / 0.0009 = 1,067.1 → n = 1,068
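
Both formulas are easy to wrap in small helpers. The sketch below uses illustrative function names and always rounds up, because rounding down would leave the margin of error slightly larger than requested.

```python
from math import ceil
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided critical value z* for the given confidence level."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def sample_size_mean(sigma, margin, confidence=0.95):
    """Smallest n with z* * sigma / sqrt(n) <= margin."""
    return ceil((z_critical(confidence) * sigma / margin) ** 2)

def sample_size_proportion(margin, confidence=0.95, p=0.5):
    """Smallest n for a proportion; p = 0.5 is the conservative default."""
    return ceil(z_critical(confidence) ** 2 * p * (1 - p) / margin ** 2)

n_mean = sample_size_mean(sigma=12_000, margin=1_000)
n_prop = sample_size_proportion(margin=0.03)
print(n_mean, n_prop)
```

Using the unrounded z* changes the intermediate values slightly (for example 553.17 instead of 553.19) but leads to the same rounded-up sample sizes as the hand calculations.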

Practical Examples

Example 1: Market Research

A retailer surveys 200 customers about monthly spending.
x̄ = ₹8,500, s = ₹3,200

95% CI for average monthly spend:
SE = 3,200/√200 = 226.3
df = 199, t* ≈ 1.972

CI = 8,500 ± 1.972 × 226.3
   = 8,500 ± 446.3
   = (₹8,054, ₹8,946)

→ Can estimate the total monthly market by multiplying both CI endpoints by the number of customers
→ If 50,000 customers: total monthly market ≈ (₹402.7M, ₹447.3M)

Example 2: Clinical Trial

A new drug is tested on 30 patients.
Reduction in blood pressure: x̄ = 8.5 mmHg, s = 3.2 mmHg

95% CI:
SE = 3.2/√30 = 0.584
df = 29, t* = 2.045

CI = 8.5 ± 2.045 × 0.584
   = 8.5 ± 1.19
   = (7.31, 9.69) mmHg

→ 95% confident the drug reduces blood pressure by between 7.3 and 9.7 mmHg
→ Since the entire CI is above 0, there's evidence of a positive effect

Example 3: Election Polling

Poll of 1,000 voters: 52% support candidate A

p̂ = 0.52, n = 1,000
SE = √(0.52 × 0.48/1000) = √0.0002496 = 0.01580

95% CI = 0.52 ± 1.96 × 0.01580
       = 0.52 ± 0.031
       = (0.489, 0.551) = 48.9% to 55.1%

→ The CI crosses 50% — the race is "too close to call"
→ The poll is inconclusive despite showing 52% support

Example 4: Employee Satisfaction

HR surveys 150 employees: 108 report high job satisfaction.
p̂ = 108/150 = 0.72

95% CI for proportion:
SE = √(0.72 × 0.28/150) = √0.001344 = 0.0367

CI = 0.72 ± 1.96 × 0.0367
   = 0.72 ± 0.072
   = (0.648, 0.792) = 64.8% to 79.2%

→ "About 72% of employees are highly satisfied, with a margin of error of ±7.2 percentage points at 95% confidence"

One-Sided Confidence Intervals

Sometimes you only care about one direction:

"We are 95% confident the mean is AT LEAST this much" → one-sided lower bound
"We are 95% confident the proportion is AT MOST this much" → one-sided upper bound

For one-sided 95% CI, use z* = 1.645 (not 1.96):
Lower bound: x̄ − 1.645 × SE
Upper bound: x̄ + 1.645 × SE

Example: With n=50, x̄=76,400, σ=12,000, SE=1,697
95% lower CI: 76,400 − 1.645 × 1,697 = 76,400 − 2,792 = ₹73,608
"We are 95% confident the true mean salary is at least ₹73,608."

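A one-sided bound needs only a different quantile: all of α goes into a single tail. The sketch below (hypothetical helper name, σ assumed known) computes the lower bound from the example and compares it with the lower end of the two-sided interval.

```python
from math import sqrt
from statistics import NormalDist

def one_sided_lower_bound(xbar, sigma, n, confidence=0.95):
    """Lower confidence bound for a mean when sigma is known."""
    # All of alpha sits in one tail, so the quantile is `confidence` itself.
    z_star = NormalDist().inv_cdf(confidence)
    return xbar - z_star * sigma / sqrt(n)

bound = one_sided_lower_bound(xbar=76_400, sigma=12_000, n=50)
two_sided_lower = 76_400 - NormalDist().inv_cdf(0.975) * 12_000 / sqrt(50)

print(f"One-sided 95% lower bound: {bound:,.0f}")
print(f"Lower end of two-sided 95% CI: {two_sided_lower:,.0f}")
```

The one-sided bound sits above the two-sided lower limit: giving up the upper limit buys a tighter statement in the direction of interest.
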
Common Mistakes

1. Treating the CI as a range for individual values

WRONG: "95% of employees earn between ₹71,529 and ₹81,271"
RIGHT: The interval is for the POPULATION MEAN, not individual values.
For a prediction interval for individual values, the range is much wider.

2. Wider CI is not always worse

A wider CI reflects genuinely uncertain data. Reporting a falsely narrow CI (by ignoring variability) is misleading. Uncertainty is information.

3. Using z instead of t for small samples

Whenever σ is estimated from the sample, use t. This matters most for n < 30, where using z gives an interval that is too narrow, so its actual coverage falls below the stated confidence level.

4. CI for proportion when np is small

n=20, p̂=0.05 (1 success) → np̂=1 < 10 → normal approximation breaks down
Use exact binomial intervals (Clopper-Pearson) instead.
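
For reference, a Clopper-Pearson interval can be computed from the binomial CDF alone. The sketch below is one way to do it with the standard library, solving for each limit by bisection; the function names and the data (1 success in 20 trials) are hypothetical. Statistics packages usually expose this directly, so treat this as an illustration of the idea rather than a recommended implementation.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def solve_increasing(g, iters=60):
    """Bisection on [0, 1] for the root of a function that increases through zero."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(x, n, confidence=0.95):
    """Exact binomial interval for a proportion, given x successes in n trials."""
    alpha = 1 - confidence
    # Lower limit: the p at which P(X >= x) equals alpha/2 (0 when x = 0).
    lower = 0.0 if x == 0 else solve_increasing(
        lambda p: (1 - binom_cdf(x - 1, n, p)) - alpha / 2)
    # Upper limit: the p at which P(X <= x) equals alpha/2 (1 when x = n).
    upper = 1.0 if x == n else solve_increasing(
        lambda p: alpha / 2 - binom_cdf(x, n, p))
    return lower, upper

# Hypothetical data: 1 success in 20 trials, too few for the normal approximation.
cp_low, cp_high = clopper_pearson(1, 20)
print(f"Exact 95% CI: ({cp_low:.4f}, {cp_high:.4f})")
```

For these data the normal-approximation formula would produce a negative lower limit, whereas the exact interval always stays within [0, 1].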

5. Thinking a 95% CI is "almost certain to be right"

On average, 5 out of 100 CIs at 95% confidence will NOT contain the true parameter. If you compute 20 independent CIs, expect approximately 1 to miss.
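
The arithmetic behind this warning is short enough to check directly. The sketch assumes the 20 intervals are independent, which real analyses on overlapping data may not satisfy.

```python
confidence = 0.95
n_intervals = 20

# Expected number of intervals that miss the true parameter
expected_misses = n_intervals * (1 - confidence)

# Chance that at least one of the intervals misses (independence assumed)
p_at_least_one_miss = 1 - confidence ** n_intervals

print(f"Expected misses: {expected_misses:.1f}")
print(f"P(at least one miss): {p_at_least_one_miss:.1%}")
```

So with 20 intervals, a miss somewhere is more likely than not (roughly a 64% chance), even though each interval on its own is reliable.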

Practice Exercises

  1. n=36, x̄=250, s=48. Construct 90%, 95%, and 99% confidence intervals for μ.

  2. A quality inspector samples 64 items. Average weight x̄=500g, s=24g. Construct a 95% CI for the true average weight.

  3. 450 of 600 surveyed customers prefer Brand A. Construct a 95% CI for the true proportion preferring Brand A.

  4. You want to estimate average household electricity use within ±50 kWh with 99% confidence. Population SD ≈ 400 kWh. How large a sample is required?

  5. Two analysts compute CIs for the same population mean from the same data. Analyst A uses 90% confidence, Analyst B uses 99%. Whose CI is wider and by roughly how much?

Summary

In this chapter you learned:

  • Confidence interval: point estimate ± margin of error; provides a range of plausible values for a population parameter
  • CI for μ (σ known): x̄ ± z* × (σ/√n); z* = 1.645/1.96/2.576 for 90/95/99%
  • CI for μ (σ unknown): x̄ ± t* × (s/√n); use t-distribution with df = n−1
  • t-distribution: heavier tails than normal; converges to Z as n → ∞
  • CI for proportion: p̂ ± z* × √(p̂(1−p̂)/n); valid when np̂≥10 and n(1−p̂)≥10
  • Correct interpretation: "If repeated, 95% of such intervals contain μ" — NOT "95% probability μ is in this interval"
  • Width controlled by: confidence level (↑ → wider), n (↑ → narrower), σ (↑ → wider)
  • Sample size for mean: n = (zσ/E)²; for proportion: n = z²p(1−p)/E²
  • One-sided CI: use z*=1.645 for 95% one-sided; reports lower or upper bound only

Next up: Hypothesis Testing — making formal decisions about population parameters using sample evidence.