Probability Fundamentals

What Is Probability?

Probability is the mathematical language for quantifying uncertainty. It assigns a number between 0 and 1 to the likelihood of an event:

P(event) = 0   → impossible (will never happen)
P(event) = 1   → certain (will always happen)
P(event) = 0.5 → equally likely to happen or not

Probability underpins everything in inferential statistics — confidence intervals, hypothesis tests, and machine learning models all rely on probability theory.

Sample Spaces and Events

Sample Space (S or Ω)

The set of ALL possible outcomes of an experiment.

Toss a coin:         S = {Heads, Tails}
Roll a die:          S = {1, 2, 3, 4, 5, 6}
Two coin tosses:     S = {HH, HT, TH, TT}
Select one employee: S = {Priya, Raj, Meera, Arjun, Kavya}

Event

A subset of the sample space — one or more outcomes you're interested in.

Roll a die. Event A = "roll an even number"
A = {2, 4, 6} ⊆ S = {1, 2, 3, 4, 5, 6}

Event B = "roll a number greater than 4"
B = {5, 6}

Defining Probability

Classical Probability (Equally Likely Outcomes)

P(A) = (number of outcomes in A) / (number of outcomes in S)

P(even number on die) = 3/6 = 0.5
P(roll a 6) = 1/6 ≈ 0.167
P(roll < 3) = 2/6 = 1/3 ≈ 0.333
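The classical definition can be checked by building the sample space as a set and counting. This is a minimal sketch that assumes a fair six-sided die. `fractions.Fraction` keeps the results exact, and the helper `classical_p` is a hypothetical name used only for illustration.

```python
from fractions import Fraction

# Sample space for one roll of a fair die (all outcomes equally likely)
S = {1, 2, 3, 4, 5, 6}

def classical_p(event, space):
    """Hypothetical helper: |A| / |S| for an event A that is a subset of S."""
    assert event <= space, "an event must be a subset of the sample space"
    return Fraction(len(event), len(space))

even = {x for x in S if x % 2 == 0}   # {2, 4, 6}
six = {6}
below_3 = {x for x in S if x < 3}     # {1, 2}

print(classical_p(even, S))     # 1/2
print(classical_p(six, S))      # 1/6
print(classical_p(below_3, S))  # 1/3
```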

Relative Frequency (Empirical) Probability

P(A) ≈ (number of times A occurred) / (number of trials)

Flipped a coin 1,000 times → Heads appeared 487 times
P(Heads) ≈ 487/1000 = 0.487, close to 0.5 (Law of Large Numbers: for a fair coin, the relative frequency approaches 0.5 as n→∞)
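A short simulation shows the empirical estimate settling down as the number of trials grows. This sketch assumes a simulated fair coin from Python's `random` module; the function name `empirical_p_heads` and the seed are illustrative choices. Small runs can land well away from 0.5, while large runs typically end up close to it.

```python
import random

def empirical_p_heads(n_flips, seed=42):
    """Estimate P(Heads) as (number of heads) / (number of flips) for a simulated fair coin."""
    rng = random.Random(seed)  # seeded so reruns give the same estimate
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips

for n in (10, 1_000, 100_000):
    print(n, empirical_p_heads(n))
```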

Subjective Probability

A probability assigned based on judgement or expertise, not counted data.

"I estimate a 70% chance this merger will be approved."
"Our model gives a 35% probability of default."

Probability Rules

Rule 1: Probability is Between 0 and 1

0 ≤ P(A) ≤ 1    for any event A

Rule 2: Sum of All Probabilities = 1

The total probability across all possible outcomes = 1

P(S) = 1
P(1) + P(2) + P(3) + P(4) + P(5) + P(6) = 6/6 = 1

Rule 3: Complement Rule

The complement of event A (written Aᶜ or Ā) = "A does not happen."

P(Aᶜ) = 1 − P(A)

P(not rolling a 6) = 1 − P(6) = 1 − 1/6 = 5/6

P(at least one head in 5 coin tosses):
Direct calculation means adding up every case with 1, 2, 3, 4 or 5 heads.
Complement: P(no heads) = (1/2)⁵ = 1/32
P(at least one head) = 1 − 1/32 = 31/32 ≈ 0.969

The complement rule is especially useful for "at least one" or "at least N" problems, where the complement ("none" or "fewer than N") has far fewer cases to count.
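For the five-toss example, both routes can be compared by brute force. This sketch assumes a fair coin, so all 2⁵ = 32 sequences are equally likely. It counts the "at least one head" sequences directly and checks the count against 1 − P(no heads).

```python
from fractions import Fraction
from itertools import product

# All 2^5 = 32 equally likely sequences of 5 fair coin tosses
outcomes = list(product("HT", repeat=5))

# Direct count: sequences containing at least one head
direct = Fraction(sum("H" in seq for seq in outcomes), len(outcomes))

# Complement rule: 1 - P(no heads) = 1 - (1/2)^5
via_complement = 1 - Fraction(1, 2) ** 5

print(direct, via_complement)  # 31/32 31/32
```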

Types of Events

Mutually Exclusive (Disjoint) Events

Events that cannot happen simultaneously.

Rolling a die: Event A = {1,2}, Event B = {5,6}
A and B cannot both occur on one roll → mutually exclusive

A and B are NOT mutually exclusive:
A = {2,4,6} (even), B = {3,4,5,6} (> 2)
Both can happen if we roll a 4 or 6

Exhaustive Events

Events that together cover the entire sample space.

A = {1,2,3}, B = {4,5,6}
A ∪ B = {1,2,3,4,5,6} = S → A and B are exhaustive

Mutually exclusive AND exhaustive events form a partition of S.
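These definitions translate directly into set operations. This is a minimal sketch using the die examples above. The helper names `mutually_exclusive`, `exhaustive` and `is_partition` are hypothetical and do not come from any library.

```python
S = {1, 2, 3, 4, 5, 6}

def mutually_exclusive(a, b):
    """Disjoint events share no outcomes."""
    return a.isdisjoint(b)

def exhaustive(events, space):
    """Events are exhaustive if their union covers the whole sample space."""
    return set().union(*events) == space

def is_partition(events, space):
    """A partition is pairwise mutually exclusive AND exhaustive."""
    pairwise = all(mutually_exclusive(a, b)
                   for i, a in enumerate(events) for b in events[i + 1:])
    return pairwise and exhaustive(events, space)

print(mutually_exclusive({1, 2}, {5, 6}))           # True
print(mutually_exclusive({2, 4, 6}, {3, 4, 5, 6}))  # False (both contain 4 and 6)
print(is_partition([{1, 2, 3}, {4, 5, 6}], S))      # True
print(is_partition([{1, 2}, {5, 6}], S))            # False (3 and 4 are not covered)
```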

The Addition Rule

For Mutually Exclusive Events

P(A or B) = P(A) + P(B)

P(rolling 1 or 6) = P(1) + P(6) = 1/6 + 1/6 = 2/6 = 1/3

General Addition Rule (Non-Mutually Exclusive)

When A and B can overlap, adding them double-counts the overlap:

P(A or B) = P(A) + P(B) − P(A and B)

A = {2,4,6} (even), P(A) = 3/6
B = {3,4,5,6} (> 2), P(B) = 4/6
A ∩ B = {4,6}, P(A and B) = 2/6

P(A or B) = 3/6 + 4/6 − 2/6 = 5/6

Venn diagram intuition: P(A or B) is the total area covered by either circle. Adding the two circles counts the intersection twice, so it is subtracted once.
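The double-counting can be seen by comparing the naive sum with a direct count of the union. This sketch assumes a fair die and uses Python's set operators (`&` for intersection, `|` for union). The helper `p` is an illustrative name.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}       # even
B = {3, 4, 5, 6}    # greater than 2

def p(event):
    """Classical probability for a fair die."""
    return Fraction(len(event), len(S))

naive = p(A) + p(B)                 # 7/6: counts {4, 6} twice, impossible as a probability
general = p(A) + p(B) - p(A & B)    # general addition rule
direct = p(A | B)                   # count the union {2, 3, 4, 5, 6} directly
me_sum = p({1}) + p({6})            # valid without subtraction: {1} and {6} are disjoint

print(naive, general, direct)  # 7/6 5/6 5/6
print(me_sum)                  # 1/3
```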

The Multiplication Rule

Used when you need the probability of two events both occurring.

For Independent Events

Two events are independent if knowing one happened doesn't change the probability of the other.

P(A and B) = P(A) × P(B)    [only if A and B are independent]

Two coin flips:
P(H on flip 1 AND H on flip 2) = P(H₁) × P(H₂) = 0.5 × 0.5 = 0.25

Rolling a die and flipping a coin:
P(6 AND Heads) = 1/6 × 1/2 = 1/12 ≈ 0.083
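Independence can be confirmed by enumerating the joint sample space and checking that P(A and B) equals P(A) × P(B). This sketch assumes a fair die and a fair coin, which gives 12 equally likely (die, coin) pairs.

```python
from fractions import Fraction
from itertools import product

# Joint sample space: fair die paired with fair coin (12 equally likely outcomes)
space = list(product(range(1, 7), "HT"))

def p(pred):
    """Classical probability of the outcomes that satisfy pred."""
    return Fraction(sum(1 for o in space if pred(o)), len(space))

p_six = p(lambda o: o[0] == 6)
p_heads = p(lambda o: o[1] == "H")
p_both = p(lambda o: o[0] == 6 and o[1] == "H")

print(p_six, p_heads, p_both)     # 1/6 1/2 1/12
print(p_both == p_six * p_heads)  # True: the two events are independent
```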

For Dependent Events (General Multiplication Rule)

P(A and B) = P(A) × P(B|A)

Where P(B|A) = probability of B given A has occurred (conditional probability — Chapter 8)

Drawing 2 aces from a deck without replacement:
P(Ace₁ AND Ace₂) = P(Ace₁) × P(Ace₂ | Ace₁)
                  = 4/52 × 3/51
                  = 12/2652
                  = 1/221
                  ≈ 0.0045
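The two-aces result can be checked by counting every ordered pair of distinct cards. This sketch models the deck only by whether each card is an ace (4 aces, 48 others), which is all this question needs. The brute-force count should agree with the general multiplication rule.

```python
from fractions import Fraction
from itertools import permutations

# Model a deck only by whether each card is an ace: 4 aces, 48 others
deck = ["A"] * 4 + ["x"] * 48

# Every ordered draw of 2 distinct cards (by position): 52 * 51 = 2652 in total
draws = list(permutations(range(52), 2))
both_aces = sum(deck[i] == "A" and deck[j] == "A" for i, j in draws)

by_count = Fraction(both_aces, len(draws))
by_rule = Fraction(4, 52) * Fraction(3, 51)   # P(Ace1) * P(Ace2 | Ace1)

print(by_count, by_rule)  # 1/221 1/221
```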

Probability Trees

A tree diagram maps out all possible outcomes and probabilities systematically.

Loan application (two stages: credit check, income verification):

Credit Check:
  P(Pass) = 0.7
  P(Fail) = 0.3

Income Verification (if passed credit check):
  P(Pass | Credit Pass) = 0.8
  P(Fail | Credit Pass) = 0.2

If failed credit check → automatic rejection (no income check)

Tree:
Credit Check   Income Check   Outcome    P
Pass (0.7) ─┬─ Pass (0.8) ─── Approved   0.7 × 0.8 = 0.56
            └─ Fail (0.2) ─── Rejected   0.7 × 0.2 = 0.14
Fail (0.3) ────────────────── Rejected   0.3 × 1.0 = 0.30

Total: 0.56 + 0.14 + 0.30 = 1.00 ✓
P(Approved) = 0.56 (56%)
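The tree can also be built in code by multiplying probabilities along each path and then summing the leaves. This sketch hard-codes the loan example's branch probabilities. The tuple layout of `leaves` is an illustrative choice, not a standard structure.

```python
# Stage 1 is the credit check; stage 2 (income check) only happens after a credit pass
p_credit = {"Pass": 0.7, "Fail": 0.3}
p_income_given_pass = {"Pass": 0.8, "Fail": 0.2}

leaves = []  # (credit result, income result or None, outcome, path probability)
for credit, p_cr in p_credit.items():
    if credit == "Fail":
        leaves.append((credit, None, "Rejected", p_cr))  # automatic rejection, branch ends
        continue
    for income, p_inc in p_income_given_pass.items():
        outcome = "Approved" if income == "Pass" else "Rejected"
        leaves.append((credit, income, outcome, p_cr * p_inc))  # multiply along the path

for leaf in leaves:
    print(leaf)

p_approved = sum(p for _, _, outcome, p in leaves if outcome == "Approved")
total = sum(p for _, _, _, p in leaves)
print(round(p_approved, 2), round(total, 2))  # 0.56 1.0
```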

Counting: Combinations and Permutations

When calculating probability, you often need to count outcomes.

Permutations — Order Matters

Arranging r items from n distinct items, order matters:

P(n, r) = n! / (n−r)!

How many ways to arrange 3 people from a group of 5 in order (1st, 2nd, 3rd)?
P(5,3) = 5! / (5−3)! = 5! / 2! = 120 / 2 = 60 ways
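Python's standard library can check this in three ways: the factorial formula, `math.perm` (available since Python 3.8), and explicit listing with `itertools.permutations`. This sketch reuses the five employee names from the sample-space examples.

```python
import math
from itertools import permutations

people = ["Priya", "Raj", "Meera", "Arjun", "Kavya"]

# Formula: n! / (n - r)!
by_formula = math.factorial(5) // math.factorial(5 - 3)
# Standard library shortcut
by_perm = math.perm(5, 3)
# Brute force: list every ordered (1st, 2nd, 3rd) arrangement
by_listing = len(list(permutations(people, 3)))

print(by_formula, by_perm, by_listing)  # 60 60 60
```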

Combinations — Order Doesn't Matter

Choosing r items from n distinct items, order doesn't matter:

C(n, r) = n! / (r! × (n−r)!)    also written as ⁿCᵣ or (n choose r)

How many ways to select a committee of 3 from 5 people?
C(5,3) = 5! / (3! × 2!) = 120 / (6×2) = 10 ways

Probability that a specific 3-person committee is selected (from random draw):
P = 1 / C(5,3) = 1/10 = 0.10
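The same check works for combinations. The sketch below also shows why the formula divides by r!: each unordered committee of 3 appears as 3! = 6 ordered arrangements, so dividing the 60 permutations by 6 leaves 10 committees.

```python
import math
from fractions import Fraction
from itertools import combinations, permutations

people = ["Priya", "Raj", "Meera", "Arjun", "Kavya"]

committees = list(combinations(people, 3))   # unordered selections
print(len(committees), math.comb(5, 3))      # 10 10

# Each committee corresponds to 3! = 6 orderings of the same people
print(len(list(permutations(people, 3))) // math.factorial(3))  # 10

# Chance that one particular committee is the one drawn at random
print(Fraction(1, math.comb(5, 3)))          # 1/10
```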

Lottery Example

Lottery: pick 6 numbers from 1–49
Total combinations: C(49,6) = 49! / (6! × 43!) = 13,983,816
P(winning) = 1 / 13,983,816 ≈ 0.0000000715 (7.15 × 10⁻⁸)
→ Probability of winning = about 1 in 14 million
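Computing the lottery odds by hand involves very large factorials. `math.comb` avoids them and gives the exact count.

```python
import math

tickets = math.comb(49, 6)   # number of distinct 6-number selections
p_win = 1 / tickets          # one winning combination

print(f"{tickets:,}")        # 13,983,816
print(f"{p_win:.3e}")        # 7.151e-08
```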

Practical Examples

Example 1: Loan Default Risk

A bank's historical data:
- P(applicant defaults) = 0.04 (4%)
- P(no default) = 0.96

For a portfolio of 3 independent loans:
P(all 3 default) = 0.04 × 0.04 × 0.04 = 0.000064 (0.0064%)
P(at least one defaults) = 1 − P(none default) = 1 − 0.96³ = 1 − 0.885 = 0.115 (11.5%)

→ Even with a 4% individual default rate, there is an 11.5% chance of at least one default across the three loans
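Enumerating all 2³ = 8 default/no-default patterns shows why the complement shortcut works. This sketch assumes, as the example does, that the three loans default independently. It sums the seven patterns that contain at least one default and compares the result with 1 − 0.96³.

```python
from itertools import product

p_default = 0.04   # individual default probability from the bank's data
n_loans = 3

# Probability of every default/no-default pattern, assuming the loans are independent
patterns = {}
for pattern in product([True, False], repeat=n_loans):   # True = that loan defaults
    prob = 1.0
    for defaulted in pattern:
        prob *= p_default if defaulted else 1 - p_default
    patterns[pattern] = prob

p_all_default = patterns[(True, True, True)]
p_at_least_one = sum(prob for pat, prob in patterns.items() if any(pat))
p_complement = 1 - (1 - p_default) ** n_loans   # shortcut via the complement rule

print(f"{p_all_default:.6f}")                        # 0.000064
print(f"{p_at_least_one:.4f} {p_complement:.4f}")    # 0.1153 0.1153
```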

Example 2: Drug Testing

Medical test for a disease:
P(positive test | have disease) = 0.95 (sensitivity)
P(positive test | no disease) = 0.10 (false positive rate)

Test 3 independent patients, all without the disease. Each tests negative with probability 1 − 0.10 = 0.90:
P(all test negative) = 0.90 × 0.90 × 0.90 = 0.729
P(at least one false positive) = 1 − 0.729 = 0.271

→ A 27% chance of at least one false positive when testing 3 disease-free patients
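A Monte Carlo simulation offers an independent check on the 0.271 figure. This sketch assumes each disease-free patient independently tests positive with probability 0.10. The function name, trial count and seed are illustrative choices, and the simulated value should land near the exact one without matching it exactly.

```python
import random

def simulate_any_false_positive(n_patients=3, fpr=0.10, trials=100_000, seed=1):
    """Fraction of simulated batches of disease-free patients with >= 1 false positive."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Each disease-free patient independently tests positive with probability fpr
        if any(rng.random() < fpr for _ in range(n_patients)):
            hits += 1
    return hits / trials

exact = 1 - (1 - 0.10) ** 3   # complement rule: 1 - 0.9^3
print(round(exact, 3), simulate_any_false_positive())
```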

Example 3: Quality Control

Production line: 2% of items are defective (P(defective) = 0.02)
Items are inspected independently.

Quality control samples 10 items:
P(no defectives) = (0.98)^10 = 0.817 (81.7%)
P(at least one defective) = 1 − 0.817 = 0.183 (18.3%)

→ About 18% of 10-item samples will contain at least one defective item
→ Chapter 9 (Binomial distribution) generalises this calculation
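The Loan, Drug Testing and Quality Control examples all apply the same formula: 1 − (1 − p)ⁿ for n independent trials. This sketch wraps it in a small helper, `p_at_least_one` (a hypothetical name), and shows how quickly the chance of catching a defect grows with sample size when p = 0.02.

```python
def p_at_least_one(p, n):
    """P(at least one occurrence in n independent trials, each with probability p)."""
    return 1 - (1 - p) ** n

# Larger QC samples make it more likely that at least one defective item is caught
for n in (1, 5, 10, 50):
    print(n, round(p_at_least_one(0.02, n), 3))
```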

Example 4: Investment Outcomes

Three independent projects:
Project A: P(success) = 0.8
Project B: P(success) = 0.6
Project C: P(success) = 0.7

P(all three succeed) = 0.8 × 0.6 × 0.7 = 0.336
P(none succeed) = 0.2 × 0.4 × 0.3 = 0.024
P(at least one succeeds) = 1 − 0.024 = 0.976
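The all/none/at-least-one answers are parts of a fuller picture: the distribution of how many projects succeed. This sketch assumes the three projects are independent, as stated. It sums the probabilities of all 2³ success/failure patterns for each success count, so the source's 0.024 and 0.336 show up as the k = 0 and k = 3 entries.

```python
from itertools import product

p_success = {"A": 0.8, "B": 0.6, "C": 0.7}

# Probability of each number of successes (0-3), summed over all 2^3 outcome patterns
dist = {k: 0.0 for k in range(4)}
for pattern in product([True, False], repeat=3):   # True = that project succeeds
    prob = 1.0
    for p, ok in zip(p_success.values(), pattern):
        prob *= p if ok else 1 - p      # independence: multiply per-project probabilities
    dist[sum(pattern)] += prob

for k, prob in dist.items():
    print(k, round(prob, 3))

print(round(1 - dist[0], 3))   # P(at least one succeeds)
```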

Common Mistakes

1. Adding probabilities when they're not mutually exclusive

For a randomly selected employee: P(Finance) = 0.32, P(salary > 80k) = 0.45
P(Finance OR salary > 80k) ≠ 0.32 + 0.45 = 0.77   ← WRONG

Need to subtract the overlap:
P(Finance AND salary > 80k) = 0.15 (Finance employees over 80k)
P(Finance OR salary > 80k) = 0.32 + 0.45 − 0.15 = 0.62   ← CORRECT

2. Multiplying dependent events as if independent

Draw 2 cards from a deck without replacement:
Wrong: P(2 hearts) = 13/52 × 13/52 = 0.0625   (assumes replacement)
Right: P(2 hearts) = 13/52 × 12/51 = 0.0588   (without replacement — dependent)
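Exact fractions make the gap between the two calculations easy to see. This sketch only encodes the card counts, 13 hearts in a 52-card deck.

```python
from fractions import Fraction

hearts, deck = 13, 52

with_replacement = Fraction(hearts, deck) * Fraction(hearts, deck)
# Second draw: one heart and one card fewer remain
without_replacement = Fraction(hearts, deck) * Fraction(hearts - 1, deck - 1)

print(with_replacement, float(with_replacement))                  # 1/16 0.0625
print(without_replacement, round(float(without_replacement), 4))  # 1/17 0.0588
```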

3. Gambler's Fallacy

"I've flipped Tails 5 times in a row — Heads is 'due'."
WRONG. P(Heads on flip 6) is still 0.5.

Each flip is independent. The coin has no memory.
The Law of Large Numbers says proportions converge in the LONG run, not the short run: an early streak is diluted by many later flips, not cancelled out by a compensating run of Heads.
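A simulation can test the "Heads is due" claim directly. This sketch assumes a fair simulated coin, with an illustrative function name, flip count and seed. It looks only at flips that immediately follow at least five Tails in a row and reports the share of Heads among them. If the fallacy were true, that share would sit noticeably above 0.5.

```python
import random

def heads_after_tail_streak(n_flips=1_000_000, streak=5, seed=7):
    """Among flips immediately preceded by `streak` Tails, return the share that are Heads."""
    rng = random.Random(seed)
    flips = [rng.random() < 0.5 for _ in range(n_flips)]  # True = Heads
    run = 0          # current number of consecutive Tails
    hits = total = 0
    for is_heads in flips:
        if run >= streak:          # the previous `streak` flips were all Tails
            total += 1
            hits += is_heads
        run = 0 if is_heads else run + 1
    return hits / total

print(heads_after_tail_streak())  # close to 0.5, not noticeably higher
```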

4. Confusing P(A and B) with P(A or B)

P(A and B) = probability of BOTH happening (never larger than P(A) or P(B))
P(A or B) = probability of AT LEAST ONE happening (never smaller than P(A) or P(B))

Practice Exercises

1. A card is drawn from a standard 52-card deck. Find: a) P(red card) b) P(face card) c) P(red OR face card) (note: these events are not mutually exclusive)
2. Two dice are rolled. Find P(sum = 7).
3. A company has 10 applicants (6 experienced, 4 junior). If 3 are selected randomly, what is P(all 3 are experienced)?
4. A disease affects 1% of the population. A test has 95% sensitivity and a 5% false positive rate. If you test 100 disease-free people, what is P(at least one false positive)?
5. A basket contains 5 red and 3 blue balls. Two balls are drawn without replacement. Find: a) P(both red) b) P(one red, one blue) c) P(at least one red)

Summary

In this chapter you learned:

Probability — a number in [0,1] measuring the likelihood of an event
Sample space (S): all possible outcomes; Event: a subset of S
Classical probability: P(A) = favourable outcomes / total outcomes (equally likely)
Empirical probability: P(A) ≈ observed frequency / total trials
Complement rule: P(Aᶜ) = 1 − P(A); use for "at least one" problems
Mutually exclusive events: P(A and B) = 0; can't both happen
Addition rule: P(A or B) = P(A) + P(B) − P(A and B); for mutually exclusive events, drop the last term
Multiplication rule (independent): P(A and B) = P(A) × P(B)
Multiplication rule (general): P(A and B) = P(A) × P(B|A)
Permutations — order matters: n!/(n−r)!; Combinations — order doesn't: n!/(r!(n−r)!)
Gambler's fallacy: independent events have no memory

Next up: Conditional Probability & Bayes' Theorem — how new information updates our probability estimates.