Chapter 15 of 18

Chi-Square Tests

Test categorical data — goodness-of-fit to a distribution and independence between two categorical variables using the chi-square statistic.

Meritshot · 9 min read

When to Use Chi-Square

Chi-square (χ²) tests work with categorical data (counts and frequencies), unlike t-tests, which require quantitative data.

Two main uses:

  1. Goodness-of-fit test: Does this sample match an expected distribution?
  2. Test of independence: Are two categorical variables related in the population?

The Chi-Square Statistic

The chi-square statistic measures how far observed counts are from expected counts:

χ² = Σ [(O − E)² / E]

Where:
O = Observed count in each category
E = Expected count under H₀
Σ = sum over all categories/cells

Properties:
- Always ≥ 0
- Larger χ² = more difference from H₀ = more evidence against H₀
- Under H₀, approximately follows a χ² distribution (when expected counts are large enough), with degrees of freedom depending on the test
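
As a minimal sketch, the formula can be computed directly in Python. The function name `chi_square_statistic` and the three-category counts are illustrative assumptions, not taken from any library or from the examples in this chapter.

```python
def chi_square_statistic(observed, expected):
    """Return the sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts: 60 observations spread over three categories.
observed = [18, 22, 20]
expected = [20, 20, 20]

stat = chi_square_statistic(observed, expected)
print(round(stat, 3))
```

The statistic is zero only when every observed count equals its expected count, and it grows as the counts drift apart.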

1. Chi-Square Goodness-of-Fit Test

Question: Does the observed distribution of one categorical variable match a hypothesised distribution?

Assumptions

  1. Random sample
  2. Expected frequency ≥ 5 in each category (if not, combine categories)
  3. Observations are independent

Worked Example

Scenario: A die is rolled 120 times. Do the outcomes match a fair die?

Observed:
Face    O (observed)    E (expected = 120/6)
1       24              20
2       18              20
3       22              20
4       26              20
5       16              20
6       14              20
       ──────          ──────
Total  120              120

H₀: The die is fair (each face equally likely, p=1/6)
H₁: The die is not fair
α = 0.05

Step 1: Compute χ²
χ² = (24−20)²/20 + (18−20)²/20 + (22−20)²/20 + (26−20)²/20 + (16−20)²/20 + (14−20)²/20
   = 16/20 + 4/20 + 4/20 + 36/20 + 16/20 + 36/20
   = 0.8 + 0.2 + 0.2 + 1.8 + 0.8 + 1.8
   = 5.6

Step 2: Degrees of freedom
df = k − 1 = 6 − 1 = 5

Step 3: Critical value (or p-value)
From χ² table: χ²(5, α=0.05) = 11.07
Our statistic: χ² = 5.6 < 11.07 → FAIL TO REJECT H₀

p-value ≈ 0.35

Conclusion: No significant evidence that the die is unfair (χ²(5)=5.6, p=0.35).
The observed frequencies are consistent with a fair die.
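
The die calculation above can be sketched in plain Python. The helper `chi2_sf` is a hypothetical name; it uses the standard finite-sum formulas for the upper tail of a χ² distribution with integer degrees of freedom. In practice a library routine (for example SciPy's `scipy.stats.chi2.sf`) would normally be used instead.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable, integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= (x / 2) / k
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2))
    #         + sqrt(2x/pi) * exp(-x/2) * sum_{k=1}^{(df-1)/2} x^(k-1) / (1*3*...*(2k-1))
    term, total = 1.0, 0.0
    for k in range(1, (df - 1) // 2 + 1):
        if k > 1:
            term *= x / (2 * k - 1)
        total += term
    tail = math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * total
    return math.erfc(math.sqrt(x / 2)) + tail


observed = [24, 18, 22, 26, 16, 14]        # counts for faces 1..6
n = sum(observed)
expected = [n / len(observed)] * len(observed)  # fair die: 120 / 6 = 20 per face

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1
p_value = chi2_sf(chi2, df)

print(f"chi2 = {chi2:.2f}, df = {df}, p = {p_value:.3f}")
```

Comparing the p-value with α gives the same decision as comparing χ² with the critical value from the table.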

Finding Expected Frequencies from a Hypothesised Distribution

H₀ doesn't have to be "uniform." It can be any specified distribution.

Example: Company claims 40% buy Product A, 35% buy B, 25% buy C.
Survey of 200 customers: 90 bought A, 65 bought B, 45 bought C.

Expected counts:
A: 200 × 0.40 = 80
B: 200 × 0.35 = 70
C: 200 × 0.25 = 50

χ² = (90−80)²/80 + (65−70)²/70 + (45−50)²/50
   = 100/80 + 25/70 + 25/50
   = 1.25 + 0.357 + 0.50
   = 2.107

df = 3 − 1 = 2
χ²(2, 0.05) = 5.991
2.107 < 5.991 → fail to reject H₀

Survey data is consistent with the company's claimed distribution.
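
A short sketch of the same product-share example, assuming the claimed shares are stored in a dictionary (the variable names are illustrative). For df = 2 the upper-tail probability of the χ² distribution reduces to exp(−χ²/2), so no table lookup is needed here.

```python
import math

claimed = {"A": 0.40, "B": 0.35, "C": 0.25}   # proportions stated under H0
observed = {"A": 90, "B": 65, "C": 45}        # survey counts

if abs(sum(claimed.values()) - 1.0) > 1e-9:
    raise ValueError("claimed proportions must sum to 1")

n = sum(observed.values())
expected = {k: n * p for k, p in claimed.items()}   # E = n * p for each category

chi2 = sum((observed[k] - expected[k]) ** 2 / expected[k] for k in claimed)
df = len(claimed) - 1

# Closed form that holds only for df = 2.
p_value = math.exp(-chi2 / 2)

print(f"chi2 = {chi2:.3f}, df = {df}, p = {p_value:.3f}")
```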

2. Chi-Square Test of Independence

Question: Are two categorical variables statistically independent (unrelated)?

H₀: The two variables are independent
H₁: The two variables are NOT independent (there is an association)

Contingency Table

The data is arranged in a two-way table (rows = one variable, columns = another):

Example: Is there a relationship between Department and Job Level?

                Junior   Mid-Level   Senior   Total
Finance           20         35         15      70
Technology        15         40         25      80
Marketing         25         20         5       50
Total             60         95         45     200

Expected Frequencies

Under H₀ (independence), the expected frequency for each cell:

E_ij = (Row total_i × Column total_j) / Grand total

E(Finance, Junior) = 70 × 60 / 200 = 21
E(Finance, Mid) = 70 × 95 / 200 = 33.25
E(Finance, Senior) = 70 × 45 / 200 = 15.75

E(Tech, Junior) = 80 × 60 / 200 = 24
E(Tech, Mid) = 80 × 95 / 200 = 38
E(Tech, Senior) = 80 × 45 / 200 = 18

E(Mktg, Junior) = 50 × 60 / 200 = 15
E(Mktg, Mid) = 50 × 95 / 200 = 23.75
E(Mktg, Senior) = 50 × 45 / 200 = 11.25
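
The nine expected counts can be generated in one pass from the row and column totals. This is a minimal sketch with the table stored as a list of rows; the variable names are illustrative.

```python
# Observed counts: rows are departments, columns are Junior, Mid-Level, Senior.
table = [
    [20, 35, 15],  # Finance
    [15, 40, 25],  # Technology
    [25, 20, 5],   # Marketing
]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
grand_total = sum(row_totals)

# E_ij = (row total_i * column total_j) / grand total
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

for row in expected:
    print(row)
```

Each row and column of the expected table sums to the same total as the observed table, which is a quick way to check the arithmetic.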

Computing χ²

χ² = Σ [(O − E)² / E]

Cell contributions:
Finance, Junior:    (20−21)²/21   = 1/21   = 0.048
Finance, Mid:       (35−33.25)²/33.25 = 3.0625/33.25 = 0.092
Finance, Senior:    (15−15.75)²/15.75 = 0.5625/15.75 = 0.036
Tech, Junior:       (15−24)²/24   = 81/24  = 3.375
Tech, Mid:          (40−38)²/38   = 4/38   = 0.105
Tech, Senior:       (25−18)²/18   = 49/18  = 2.722
Mktg, Junior:       (25−15)²/15   = 100/15 = 6.667
Mktg, Mid:          (20−23.75)²/23.75 = 14.0625/23.75 = 0.592
Mktg, Senior:       (5−11.25)²/11.25 = 39.0625/11.25 = 3.472

χ² = 0.048 + 0.092 + 0.036 + 3.375 + 0.105 + 2.722 + 6.667 + 0.592 + 3.472 = 17.11

Degrees of freedom = (rows − 1) × (columns − 1) = (3−1) × (3−1) = 2 × 2 = 4

χ²(4, α=0.05) = 9.488
χ² = 17.11 > 9.488 → REJECT H₀

p-value ≈ 0.0018

Conclusion: There is a significant association between Department and Job Level.
The distribution of seniority differs across departments.
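
The full test for the Department by Job Level table can be sketched as one function. `chi_square_independence` is a hypothetical name, and the p-value line uses a closed form that is valid only for df = 4, which is what a 3×3 table gives.

```python
import math


def chi_square_independence(table):
    """Return (chi2, df) for a two-way table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n   # expected count under independence
            chi2 += (o - e) ** 2 / e
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


table = [
    [20, 35, 15],  # Finance
    [15, 40, 25],  # Technology
    [25, 20, 5],   # Marketing
]

chi2, df = chi_square_independence(table)

# For df = 4 only: P(X > x) = exp(-x/2) * (1 + x/2)
p_value = math.exp(-chi2 / 2) * (1 + chi2 / 2)

print(f"chi2 = {chi2:.2f}, df = {df}, p = {p_value:.4f}")
```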

Interpreting the Result

The test tells you there IS a relationship, but not what it is. Look at the data:

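Standardised (Pearson) residuals = (O − E) / √E. As a rule of thumb, values beyond about ±2 mark the cells that depart most from independence.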

Tech, Junior: (15−24)/√24 = −9/4.9 = −1.84  (fewer junior than expected)
Tech, Senior: (25−18)/√18 = +7/4.24 = +1.65  (more senior than expected)
Mktg, Junior: (25−15)/√15 = +10/3.87 = +2.58 ← large positive (more junior than expected)
Mktg, Senior: (5−11.25)/√11.25 = −6.25/3.354 = −1.86 (fewer senior than expected)

→ Technology has more senior employees than expected
→ Marketing has more junior employees than expected
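
A sketch that computes the residual (O − E)/√E for every cell of the table, not just the four shown above. The labels and variable names are illustrative. The squared residuals add up to the χ² statistic, so the largest residuals are the cells contributing most to it.

```python
import math

departments = ["Finance", "Technology", "Marketing"]
observed = [
    [20, 35, 15],
    [15, 40, 25],
    [25, 20, 5],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

residuals = []
for i, row in enumerate(observed):
    res_row = []
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / n
        res_row.append((o - e) / math.sqrt(e))
    residuals.append(res_row)

# Columns: Junior, Mid-Level, Senior
for dept, res_row in zip(departments, residuals):
    print(dept.ljust(12), "  ".join(f"{r:+.2f}" for r in res_row))

sum_of_squares = sum(r * r for res_row in residuals for r in res_row)
```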

Practical Examples

Example 1: Website Traffic Source Analysis

Expected (from last year's data): Organic=50%, Paid=30%, Referral=15%, Direct=5%
This year's sample (n=500): Organic=230, Paid=160, Referral=85, Direct=25

E: Organic=250, Paid=150, Referral=75, Direct=25

χ² = (230−250)²/250 + (160−150)²/150 + (85−75)²/75 + (25−25)²/25
   = 400/250 + 100/150 + 100/75 + 0/25
   = 1.6 + 0.667 + 1.333 + 0
   = 3.6

df = 4−1 = 3
χ²(3, 0.05) = 7.815
3.6 < 7.815 → fail to reject H₀

Traffic source distribution is not significantly different from last year.

Example 2: Drug Side Effects by Gender

Question: Are side effects associated with gender?

            Side Effects    No Side Effects    Total
Male             45              155            200
Female           35              165            200
Total            80              320            400

Expected:
Male with SE:    200×80/400 = 40
Male without:    200×320/400 = 160
Female with SE:  200×80/400 = 40
Female without:  200×320/400 = 160

χ² = (45−40)²/40 + (155−160)²/160 + (35−40)²/40 + (165−160)²/160
   = 25/40 + 25/160 + 25/40 + 25/160
   = 0.625 + 0.156 + 0.625 + 0.156
   = 1.562

df = (2−1)(2−1) = 1
χ²(1, 0.05) = 3.841
1.562 < 3.841 → fail to reject H₀

No significant association between gender and side effects.

Example 3: A/B Test Conversion (2×2 Table)

Question: Is conversion rate different between Version A and Version B?

            Converted    Not Converted    Total
Version A      120            880         1000
Version B      148            852         1000
Total          268           1732         2000

Expected:
A, Converted: 1000×268/2000 = 134
A, Not:       1000×1732/2000 = 866
B, Converted: 134
B, Not:       866

χ² = (120−134)²/134 + (880−866)²/866 + (148−134)²/134 + (852−866)²/866
   = 196/134 + 196/866 + 196/134 + 196/866
   = 1.463 + 0.226 + 1.463 + 0.226
   = 3.378

df = 1
χ²(1, 0.05) = 3.841
3.378 < 3.841 → fail to reject H₀ (p ≈ 0.066)

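Conversion rates are not significantly different at α = 0.05 (p=0.066). The result is borderline, so a larger sample would be needed to confirm or rule out a difference of this size.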
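
For a 2×2 table the χ² statistic can also be written with the shortcut n(ad − bc)² / (row and column totals multiplied together), and it equals the square of the pooled two-proportion z statistic. The sketch below checks both on the Example 3 counts; the variable names are illustrative.

```python
import math

a, b = 120, 880   # Version A: converted, not converted
c, d = 148, 852   # Version B: converted, not converted
n = a + b + c + d

# Shortcut formula for the Pearson chi-square of a 2x2 table.
chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Pooled two-proportion z statistic.
p_a = a / (a + b)
p_b = c / (c + d)
p_pool = (a + c) / n
se = math.sqrt(p_pool * (1 - p_pool) * (1 / (a + b) + 1 / (c + d)))
z = (p_a - p_b) / se

# Two-sided normal p-value; identical to the chi-square upper tail with df = 1.
p_value = math.erfc(abs(z) / math.sqrt(2))

print(f"chi2 = {chi2:.3f}, z^2 = {z * z:.3f}, p = {p_value:.3f}")
```

Because the two tests agree exactly, the choice between them for a 2×2 A/B comparison is a matter of presentation.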

For 2×2 Tables: Yates' Correction

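For 2×2 tables (df = 1), Yates' continuity correction is often applied, particularly with small samples. It shrinks each |O − E| by 0.5, which lowers χ² and makes the test more conservative:
χ² = Σ [(|O − E| − 0.5)² / E]
The 2×2 examples above use the uncorrected statistic.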
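
A minimal sketch of the correction applied to the Example 3 A/B table. The function name `chi_square_2x2` and its `yates` flag are hypothetical, and the `max(..., 0.0)` guard is an added assumption so that a difference smaller than 0.5 is not pushed past zero.

```python
import math


def chi_square_2x2(table, yates=False):
    """Pearson chi-square for a 2x2 table, optionally with Yates' correction."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n
            diff = abs(o - e)
            if yates:
                # Shrink |O - E| by 0.5; the guard is an assumption of this sketch.
                diff = max(diff - 0.5, 0.0)
            chi2 += diff ** 2 / e
    return chi2


ab_table = [[120, 880], [148, 852]]

plain = chi_square_2x2(ab_table)
corrected = chi_square_2x2(ab_table, yates=True)

# Upper-tail probability for df = 1 is erfc(sqrt(x / 2)).
p_plain = math.erfc(math.sqrt(plain / 2))
p_corrected = math.erfc(math.sqrt(corrected / 2))

print(f"uncorrected: chi2 = {plain:.3f}, p = {p_plain:.3f}")
print(f"Yates:       chi2 = {corrected:.3f}, p = {p_corrected:.3f}")
```

With samples this large the correction changes the statistic only slightly, and the decision at α = 0.05 is the same either way.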

Measures of Association

After rejecting independence, measure the strength of association:

Phi coefficient (2×2 tables):
φ = √(χ²/n)   → ranges from 0 to 1

Cramér's V (larger tables):
V = √(χ²/(n × min(r−1, c−1)))   → ranges from 0 to 1

Interpretation:
V ≈ 0.1 → weak association
V ≈ 0.3 → moderate association
V ≈ 0.5 → strong association

For our Department-Level example:
V = √(17.11/(200 × min(2,2))) = √(17.11/400) = √0.04278 = 0.207 → weak-to-moderate association (between the 0.1 and 0.3 benchmarks)
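
Both measures fit in one small function, since Cramér's V reduces to φ when the table is 2×2. This is a sketch; `cramers_v` is an illustrative name rather than a library call.

```python
import math


def cramers_v(chi2, n, n_rows, n_cols):
    """Cramér's V from a chi-square statistic; equals phi for a 2x2 table."""
    k = min(n_rows - 1, n_cols - 1)
    if k < 1:
        raise ValueError("table needs at least 2 rows and 2 columns")
    return math.sqrt(chi2 / (n * k))


# Department by Job Level example: chi2 = 17.11, n = 200, 3x3 table.
v_department = cramers_v(17.11, 200, 3, 3)

# A/B test example: chi2 = 3.378, n = 2000, 2x2 table (this is phi).
phi_ab = cramers_v(3.378, 2000, 2, 2)

print(f"V (departments) = {v_department:.3f}")
print(f"phi (A/B test)  = {phi_ab:.3f}")
```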

Common Mistakes

1. Expected frequency < 5

If any E < 5, the χ² approximation becomes unreliable and the p-value can be misleading.
Fix: Combine small categories, or use Fisher's Exact Test (2×2 tables)
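
One way to catch this before running the test is to list the cells whose expected count falls below the threshold. The sketch below uses a hypothetical 2×2 table of 30 observations, invented only to trigger the warning; `small_expected_cells` is an illustrative name.

```python
def small_expected_cells(table, threshold=5):
    """Return (row, column, expected) for every cell with expected count below threshold."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    flagged = []
    for i, r in enumerate(row_totals):
        for j, c in enumerate(col_totals):
            e = r * c / n
            if e < threshold:
                flagged.append((i, j, e))
    return flagged


# Hypothetical small study: 30 observations in a 2x2 table.
small_table = [[12, 3], [9, 6]]
flagged = small_expected_cells(small_table)

for i, j, e in flagged:
    print(f"cell ({i}, {j}) has expected count {e:.1f}")
```

An empty result means the rule of thumb is satisfied; otherwise the fixes listed above apply.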

2. Using χ² for quantitative data

χ² works on COUNTS, not means or continuous values.
For comparing means: use t-test or ANOVA.

3. Confusing goodness-of-fit with independence

Goodness-of-fit: ONE variable, comparing to a known distribution (df = k−1)
Independence: TWO variables in a contingency table (df = (r−1)(c−1))

4. Ignoring effect size after significance

With large samples, even tiny associations become significant.
Always compute Cramér's V to assess practical significance.
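
To illustrate, the sketch below takes the Example 2 side-effects table and multiplies every count by 10. The proportions, and therefore the strength of association, are unchanged, but χ² grows tenfold and crosses the 3.841 critical value. The helper names are illustrative.

```python
import math


def pearson_chi2(table):
    """Pearson chi-square statistic for a two-way table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return sum(
        (o - row_totals[i] * col_totals[j] / n) ** 2 / (row_totals[i] * col_totals[j] / n)
        for i, row in enumerate(table)
        for j, o in enumerate(row)
    )


def phi(table):
    """Phi coefficient for a 2x2 table: sqrt(chi2 / n)."""
    n = sum(sum(row) for row in table)
    return math.sqrt(pearson_chi2(table) / n)


small = [[45, 155], [35, 165]]                   # Example 2, n = 400
big = [[10 * o for o in row] for row in small]   # same proportions, n = 4000

chi2_small, chi2_big = pearson_chi2(small), pearson_chi2(big)
v_small, v_big = phi(small), phi(big)

print(f"n = 400:  chi2 = {chi2_small:.3f}, phi = {v_small:.4f}")
print(f"n = 4000: chi2 = {chi2_big:.3f}, phi = {v_big:.4f}")
```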

5. Reading the direction of association from the p-value

χ² only tells you IF there's an association — not WHICH direction.
Look at observed vs expected, or standardised residuals, to understand the nature of the association.

Practice Exercises

  1. Roll a die 180 times: Observed: 1→28, 2→32, 3→25, 4→35, 5→20, 6→40. Test if the die is fair (α=0.05).

  2. 200 customers are surveyed about brand preference: Brand A=65, B=80, C=55. The company claims equal preference (33.3% each). Test this claim (α=0.05).

  3. Survey of 300 people: Is preferred news source (TV/Online/Print) associated with age group (Young/Middle/Senior)? Set up the contingency table, compute expected values, and test independence.

  4. For a 2×2 contingency table with χ²=4.5 and n=100, compute Cramér's V. Is this a strong, moderate, or weak association?

  5. A quality inspector finds χ² = 2.3 with df=3 (p=0.51). A colleague says "the data is perfect." What's wrong with this interpretation?

Summary

In this chapter you learned:

  • Chi-square statistic: χ² = Σ[(O−E)²/E] — measures how far observed counts are from expected; always ≥ 0
  • Goodness-of-fit test: one variable; df = k−1; compares observed to any hypothesised distribution
  • Test of independence: two-way contingency table; df = (r−1)(c−1); E_ij = (row_i × col_j)/n
  • Decision rule: χ² > χ²_critical (or p < α) → reject H₀ (of fit / of independence)
  • Assumption: all expected frequencies ≥ 5; if not, combine cells or use Fisher's Exact Test
  • Cramér's V: measure of association strength after significant χ²; V = √(χ²/(n×min(r−1,c−1)))
  • Standardised residuals: (O−E)/√E → identify which cells drive the association
  • χ² reveals IF an association exists; look at residuals to understand WHAT the association is
  • For quantitative outcomes, use t-tests/ANOVA; for counts/categories, use χ²

Next up: ANOVA — Analysis of Variance for comparing means across three or more groups simultaneously.