What Is a Confidence Interval?
A confidence interval (CI) provides a range of plausible values for an unknown population parameter, based on sample data. Instead of a single point estimate ("the mean is ₹76,400"), a CI says: "We're 95% confident the true mean is between ₹73,200 and ₹79,600."
Why not just give the sample mean? Because x̄ varies from sample to sample. The CI captures this uncertainty, quantifying how much we should trust our estimate.
The General Structure
CI = Point Estimate ± Margin of Error
Margin of Error = Critical Value × Standard Error
For the mean:
CI for μ = x̄ ± z* × (σ/√n) [if σ known]
CI for μ = x̄ ± t* × (s/√n) [if σ unknown — more common]
95% Confidence Interval for the Mean (σ Known)
When the population standard deviation σ is known (rare in practice), use z-critical values.
z* for common confidence levels:
90% CI: z* = 1.645
95% CI: z* = 1.96
99% CI: z* = 2.576
Example
Sample: n=50 employees, x̄ = ₹76,400
Known: σ = ₹12,000
SE = σ/√n = 12,000/√50 = 12,000/7.071 = 1,697
95% CI = 76,400 ± 1.96 × 1,697
= 76,400 ± 3,326
= (73,074, 79,726)
Interpretation: We are 95% confident the true average salary (μ) is between ₹73,074 and ₹79,726.
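The calculation above can be reproduced in a few lines of Python. This is a minimal sketch using only the standard library; the function name `z_interval` and its parameters are illustrative choices, not part of the chapter.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(mean, sigma, n, confidence=0.95):
    """Two-sided CI for the mean when the population SD (sigma) is known."""
    # Two-sided critical value: leave (1 - confidence) / 2 in each tail.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    standard_error = sigma / sqrt(n)
    margin = z_star * standard_error
    return mean - margin, mean + margin


# Salary example from the text: n = 50, sample mean 76,400, sigma 12,000.
low, high = z_interval(mean=76_400, sigma=12_000, n=50)
print(f"95% CI: ({low:,.0f}, {high:,.0f})")  # 95% CI: (73,074, 79,726)
```

Passing `confidence=0.90` or `confidence=0.99` reproduces the other critical values listed above (1.645 and 2.576) without a lookup table.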
Interpreting Confidence Intervals Correctly
This is where most people go wrong.
The Correct Interpretation
"If we repeated this sampling procedure many times, 95% of the
confidence intervals constructed would contain the true parameter μ."
Common Misinterpretations
WRONG: "There is a 95% probability that μ is between 73,074 and 79,726."
WRONG: "95% of individual salaries fall between these values."
WRONG: "We are 95% sure our sample mean is correct."
WHY: μ is fixed (not random). Once we construct the interval, it either
contains μ or it doesn't. The 95% refers to the METHOD, not this specific interval.
The "95 Out of 100" Analogy
Imagine constructing 100 different confidence intervals from 100 samples:
About 95 of those intervals are expected to contain the true μ
About 5 are expected to miss it
Your one CI is one realisation of this process — it's in the 95 or the 5,
but you don't know which.
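The repeated-sampling interpretation can be checked by simulation. The sketch below assumes a normally distributed population with the mean and SD from the salary example (an assumption made only for illustration), draws many samples, and counts how often the z-interval contains the true μ. The function name and the fixed seed are hypothetical choices.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def coverage_rate(mu, sigma, n, confidence=0.95, repetitions=2_000, seed=0):
    """Fraction of simulated z-intervals that contain the true mean mu."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sigma / sqrt(n)
    hits = 0
    for _ in range(repetitions):
        # Draw one sample of size n and build its interval around x_bar.
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = fmean(sample)
        if x_bar - margin <= mu <= x_bar + margin:
            hits += 1
    return hits / repetitions


rate = coverage_rate(mu=76_400, sigma=12_000, n=50)
print(f"Share of intervals containing mu: {rate:.3f}")
```

The printed share should land close to 0.95 but rarely exactly on it, which is the point: the 95% describes the long-run behaviour of the method, not any single interval.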
Confidence Interval for the Mean (σ Unknown — t-Distribution)
In practice, σ is almost never known. We estimate it with s, the sample standard deviation. This introduces additional uncertainty, which the t-distribution accounts for.
The t-Distribution
The t-distribution:
- Is bell-shaped and symmetric (like the normal)
- Has heavier tails than the normal (extreme values are more likely)
- Is parameterised by degrees of freedom (df = n − 1)
- Approaches the standard normal Z as n → ∞
t* for 95% CI at various df:
df = 5: t* = 2.571
df = 10: t* = 2.228
df = 20: t* = 2.086
df = 30: t* = 2.042
df = 60: t* = 2.000
df = ∞: t* = 1.960 (= z*)
For n ≥ 30, t* exceeds z* by less than 5% (2.045 vs 1.960 at df = 29), so the difference is small.
CI Formula with t
CI for μ = x̄ ± t* × (s/√n)
Where t* is from the t-distribution with df = n − 1 at the desired confidence level
Example
Sample: n=25 employees
x̄ = ₹76,400
s = ₹11,800 (sample SD — we don't know σ)
SE = s/√n = 11,800/√25 = 11,800/5 = 2,360
df = n − 1 = 24
t* at 95% with 24 df = 2.064
95% CI = 76,400 ± 2.064 × 2,360
= 76,400 ± 4,871
= (71,529, 81,271)
Interpretation: We are 95% confident the true average salary is between ₹71,529 and ₹81,271.
Note: this interval is wider than the z-interval because:
- n is smaller (25 vs 50)
- We used t* = 2.064 instead of z* = 1.96 (heavier tails)
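The t-based interval follows the same pattern in code. Python's standard library has no t-distribution quantile function, so this sketch takes t* as an input read from a table like the one above. A statistics library such as SciPy could supply it instead, if available. The function name `t_interval` is illustrative.

```python
from math import sqrt


def t_interval(mean, s, n, t_star):
    """Two-sided CI for the mean when sigma is estimated by the sample SD s.

    t_star must be the critical value for df = n - 1 at the chosen
    confidence level, for example read from a t-table.
    """
    standard_error = s / sqrt(n)
    margin = t_star * standard_error
    return mean - margin, mean + margin


# t* = 2.064 is the tabulated 95% value for df = 24 used in the example.
low, high = t_interval(mean=76_400, s=11_800, n=25, t_star=2.064)
print(f"95% CI: ({low:,.0f}, {high:,.0f})")  # 95% CI: (71,529, 81,271)
```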
Confidence Interval for a Proportion
When your parameter of interest is a population proportion p:
p̂ = x/n (sample proportion)
SE = √(p̂(1−p̂)/n)
CI for p = p̂ ± z* × √(p̂(1−p̂)/n)
Valid when np̂ ≥ 10 and n(1−p̂) ≥ 10
Example
Survey: 400 customers surveyed
280 say they are satisfied
p̂ = 280/400 = 0.70
SE = √(0.70 × 0.30 / 400) = √(0.21/400) = √0.000525 = 0.0229
95% CI = 0.70 ± 1.96 × 0.0229
= 0.70 ± 0.0449
= (0.655, 0.745)
Interpretation: We are 95% confident the true proportion of satisfied customers is
between 65.5% and 74.5%.
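A short sketch of the proportion interval, including the validity check stated above. The function name is hypothetical, and the check is written in terms of counts (successes and failures), which is equivalent to np̂ ≥ 10 and n(1−p̂) ≥ 10.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(successes, n, confidence=0.95):
    """Normal-approximation CI for a population proportion."""
    failures = n - successes
    # n * p_hat equals the success count; n * (1 - p_hat) the failure count.
    if successes < 10 or failures < 10:
        raise ValueError("normal approximation needs at least 10 successes and 10 failures")
    p_hat = successes / n
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Survey example from the text: 280 satisfied out of 400.
low, high = proportion_interval(successes=280, n=400)
print(f"95% CI: {low:.1%} to {high:.1%}")  # 95% CI: 65.5% to 74.5%
```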
Factors Affecting CI Width
The margin of error (half-width of CI) = z* × SE:
Width = 2 × z* × σ/√n
Wider CI (less precise) when:
→ Higher confidence level (larger z*)
→ Larger σ (more population variability)
→ Smaller n
Narrower CI (more precise) when:
→ Lower confidence level (smaller z*)
→ Smaller σ (less variability)
→ Larger n
Trade-offs
You want 99% confidence AND a narrow interval AND a small sample?
→ Impossible — you can have any two of these at the cost of the third.
Practical choice: Fix the confidence level (usually 95%) and desired width,
then solve for the required n.
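The effect of each factor on the width can be explored numerically. The sketch below evaluates Width = 2 × z* × σ/√n for a few confidence levels and sample sizes, with σ = 12,000 borrowed from the salary example. The grid of values is an arbitrary choice for illustration.

```python
from math import sqrt
from statistics import NormalDist


def ci_width(sigma, n, confidence):
    """Full width of a two-sided z-interval for the mean."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return 2 * z_star * sigma / sqrt(n)


sigma = 12_000
for confidence in (0.90, 0.95, 0.99):
    for n in (50, 200, 800):
        width = ci_width(sigma, n, confidence)
        print(f"{confidence:.0%} confidence, n={n:>3}: width = {width:,.0f}")
```

Because n enters through √n, quadrupling the sample size only halves the width. This is why narrow intervals at high confidence become expensive quickly.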
Determining Sample Size
If you want a specific margin of error E at confidence level 1−α:
For the mean (σ known or estimated):
n = (z* × σ / E)²
For a proportion:
n = z*² × p̂(1−p̂) / E² → use p̂=0.5 if unknown (conservative)
Example:
Desired: ±₹1,000 margin of error, 95% confidence
Population SD = ₹12,000
n = (1.96 × 12,000 / 1,000)² = (23.52)² = 553.19 → n = 554
Example (proportion):
Desired: ±3% margin, 95% confidence, p unknown
n = 1.96² × 0.5 × 0.5 / 0.03² = 3.8416 × 0.25 / 0.0009 = 1,067.1 → n = 1,068
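Both sample-size formulas translate directly into code. In this sketch the function names are illustrative, and the result is always rounded up, since rounding down would give a margin of error slightly larger than requested.

```python
from math import ceil
from statistics import NormalDist


def z_critical(confidence):
    """Two-sided z critical value for the given confidence level."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def sample_size_mean(sigma, margin, confidence=0.95):
    """Smallest n giving a margin of error <= margin for the mean."""
    return ceil((z_critical(confidence) * sigma / margin) ** 2)


def sample_size_proportion(margin, confidence=0.95, p=0.5):
    """Smallest n for a proportion; p = 0.5 is the conservative default."""
    return ceil(z_critical(confidence) ** 2 * p * (1 - p) / margin**2)


print(sample_size_mean(sigma=12_000, margin=1_000))  # 554
print(sample_size_proportion(margin=0.03))           # 1068
```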
Practical Examples
Example 1: Market Research
A retailer surveys 200 customers about monthly spending.
x̄ = ₹8,500, s = ₹3,200
95% CI for average monthly spend:
SE = 3,200/√200 = 226.3
df = 199, t* ≈ 1.972
CI = 8,500 ± 1.972 × 226.3
= 8,500 ± 446.3
= (₹8,054, ₹8,946)
→ Can estimate total monthly market spend by multiplying the CI endpoints by the total number of customers
→ If 50,000 customers: total market ≈ (₹402.7M, ₹447.3M) per month
Example 2: Clinical Trial
A new drug is tested on 30 patients.
Reduction in blood pressure: x̄ = 8.5 mmHg, s = 3.2 mmHg
95% CI:
SE = 3.2/√30 = 0.584
df = 29, t* = 2.045
CI = 8.5 ± 2.045 × 0.584
= 8.5 ± 1.19
= (7.31, 9.69) mmHg
→ 95% confident the drug reduces blood pressure by between 7.3 and 9.7 mmHg
→ Since the entire CI is above 0, there's evidence of a positive effect
Example 3: Election Polling
Poll of 1,000 voters: 52% support candidate A
p̂ = 0.52, n = 1,000
SE = √(0.52 × 0.48/1000) = √0.0002496 = 0.01580
95% CI = 0.52 ± 1.96 × 0.01580
= 0.52 ± 0.031
= (0.489, 0.551) = 48.9% to 55.1%
→ The CI crosses 50% — the race is "too close to call"
→ The poll is inconclusive despite showing 52% support
Example 4: Employee Satisfaction
HR surveys 150 employees: 108 report high job satisfaction.
p̂ = 108/150 = 0.72
95% CI for proportion:
SE = √(0.72 × 0.28/150) = √0.001344 = 0.0367
CI = 0.72 ± 1.96 × 0.0367
= 0.72 ± 0.072
= (0.648, 0.792) = 64.8% to 79.2%
→ "About 72% of employees are highly satisfied (margin of error ±7.2 percentage points at 95% confidence)"
One-Sided Confidence Intervals
Sometimes you only care about one direction:
"We are 95% confident the mean is AT LEAST this much" → one-sided lower bound
"We are 95% confident the proportion is AT MOST this much" → one-sided upper bound
For one-sided 95% CI, use z* = 1.645 (not 1.96):
Lower bound: x̄ − 1.645 × SE
Upper bound: x̄ + 1.645 × SE
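The two bounds above can be sketched as follows. The only change from the two-sided case is that the whole error probability sits in one tail, so the critical value comes from the `confidence` quantile directly. Function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def one_sided_lower(mean, sigma, n, confidence=0.95):
    """Lower confidence bound: the mean is at least this value."""
    z_star = NormalDist().inv_cdf(confidence)  # about 1.645 for 95%
    return mean - z_star * sigma / sqrt(n)


def one_sided_upper(mean, sigma, n, confidence=0.95):
    """Upper confidence bound: the mean is at most this value."""
    z_star = NormalDist().inv_cdf(confidence)
    return mean + z_star * sigma / sqrt(n)


# Salary figures used earlier in the chapter: n = 50, mean 76,400, sigma 12,000.
bound = one_sided_lower(mean=76_400, sigma=12_000, n=50)
print(f"95% lower bound: {bound:,.1f}")
```

The result can differ from a hand calculation by a rupee or so, because the code uses unrounded values of z* and the standard error.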
Example: With n=50, x̄=76,400, σ=12,000, SE=1,697
95% lower CI: 76,400 − 1.645 × 1,697 = 76,400 − 2,792 = ₹73,608
"We are 95% confident the true mean salary is at least ₹73,608."
Common Mistakes
1. Treating the CI as a range for individual values
WRONG: "95% of employees earn between ₹71,529 and ₹81,271"
RIGHT: The interval is for the POPULATION MEAN, not individual values.
For a prediction interval for individual values, the range is much wider.
2. Wider CI is not always worse
A wider CI reflects genuinely uncertain data. Reporting a falsely narrow CI (by ignoring variability) is misleading. Uncertainty is information.
3. Using z instead of t for small samples
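Whenever σ is estimated by s, use t (z applies only if σ is truly known). The choice matters most for n < 30, where using z produces an interval that is too narrow for the stated confidence level.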
4. CI for proportion when np is small
n=25, p̂=0.04 → np̂ = 1 < 10 → normal approximation breaks down
Use exact Binomial intervals (Clopper-Pearson) instead.
5. Thinking a 95% CI is "almost certain to be right"
On average, 5 out of every 100 CIs at 95% confidence will NOT contain the true parameter. If you compute 20 independent CIs, expect about 1 to miss.
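The exact (Clopper-Pearson) interval mentioned under mistake 4 can be computed with the standard library alone. The sketch below finds each limit by bisection on the binomial CDF. Statistics libraries typically use beta-distribution quantiles instead, so treat this as an illustration of the definition rather than a reference implementation. The count of 1 success in 25 trials is a hypothetical input.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _bisect(f, lo=0.0, hi=1.0, iterations=100):
    """Root of a decreasing function that is positive at lo, negative at hi."""
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(successes, n, confidence=0.95):
    """Exact two-sided binomial CI for a proportion."""
    alpha = 1 - confidence
    # Lower limit: the p at which P(X >= successes) equals alpha / 2.
    if successes == 0:
        lower = 0.0
    else:
        lower = _bisect(lambda p: alpha / 2 - (1 - binom_cdf(successes - 1, n, p)))
    # Upper limit: the p at which P(X <= successes) equals alpha / 2.
    if successes == n:
        upper = 1.0
    else:
        upper = _bisect(lambda p: binom_cdf(successes, n, p) - alpha / 2)
    return lower, upper


low, high = clopper_pearson(successes=1, n=25)
print(f"Exact 95% CI: {low:.4f} to {high:.4f}")
```

Unlike the normal-approximation interval, the limits always stay inside [0, 1] and are not symmetric around p̂.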
Practice Exercises
- n=36, x̄=250, s=48. Construct 90%, 95%, and 99% confidence intervals for μ.
- A quality inspector samples 64 items. Average weight x̄=500g, s=24g. Construct a 95% CI for the true average weight.
- 450 of 600 surveyed customers prefer Brand A. Construct a 95% CI for the true proportion preferring Brand A.
- You want to estimate average household electricity use within ±50 kWh with 99% confidence. Population SD ≈ 400 kWh. How large a sample is required?
- Two analysts compute CIs for the same population mean from the same data. Analyst A uses 90% confidence, Analyst B uses 99%. Whose CI is wider and by roughly how much?
Summary
In this chapter you learned:
- Confidence interval: point estimate ± margin of error; provides a range of plausible values for a population parameter
- CI for μ (σ known): x̄ ± z* × (σ/√n); z* = 1.645/1.96/2.576 for 90/95/99%
- CI for μ (σ unknown): x̄ ± t* × (s/√n); use t-distribution with df = n−1
- t-distribution: heavier tails than normal; converges to Z as n → ∞
- CI for proportion: p̂ ± z* × √(p̂(1−p̂)/n); valid when np̂≥10 and n(1−p̂)≥10
- Correct interpretation: "If repeated, 95% of such intervals contain μ" — NOT "95% probability μ is in this interval"
- Width controlled by: confidence level (↑ → wider), n (↑ → narrower), σ (↑ → wider)
- Sample size for mean: n = (zσ/E)²; for proportion: n = z²p(1−p)/E²
- One-sided CI: use z*=1.645 for 95% one-sided; reports lower or upper bound only
Next up: Hypothesis Testing — making formal decisions about population parameters using sample evidence.