Data Types & Measurement Scales

Why Measurement Scales Matter

The level at which you measure a variable determines which statistical operations are meaningful for it. Applying a method that the scale does not support produces meaningless, and sometimes dangerously wrong, results.

Example: Can you calculate the average blood type? A=1, B=2, AB=3, O=4 → "mean blood type = 2.3" is nonsense. Blood type is nominal — averaging it makes no sense.

Stevens' Four Levels of Measurement

Stanley Smith Stevens (1946) defined four hierarchical levels. Each level includes all the properties of the levels below it.

RATIO    ← highest (all arithmetic operations valid)
INTERVAL
ORDINAL
NOMINAL  ← lowest (only categorisation)

1. Nominal Scale

Properties: Categories only. No order. No arithmetic.

Examples:
- Blood type: A, B, AB, O
- Department: Finance, Technology, Marketing, HR
- Marital status: Single, Married, Divorced
- Yes/No responses
- Country of origin

What you can do:

Count frequencies (how many in each category)
Find the mode (most common category)
Calculate proportions and percentages

What you cannot do:

Calculate mean or median
Say one category is "greater than" another
Measure the "distance" between categories

Statistical tests and measures: Chi-square test (Chapter 15), mode, frequency tables, bar charts

Survey: 100 employees
Department: Finance=32, Technology=28, Marketing=25, HR=15

Mode = Finance (most common)
Finance proportion = 32/100 = 32%
Cannot say: "Average department = 2.2" ← INVALID
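
The calculation above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the variable names are illustrative, and the counts are the ones from the survey above.

```python
from collections import Counter

# Counts from the survey above; names are illustrative, not from any real dataset.
department_counts = Counter(
    {"Finance": 32, "Technology": 28, "Marketing": 25, "HR": 15}
)

total = sum(department_counts.values())

# Mode: the category with the highest frequency.
mode_department, mode_count = department_counts.most_common(1)[0]

# Proportions: each count divided by the total number of respondents.
proportions = {dept: count / total for dept, count in department_counts.items()}

print(f"Mode: {mode_department} ({mode_count} of {total})")
for dept, share in proportions.items():
    print(f"{dept}: {share:.0%}")
```

Note that nothing here assigns numbers to the categories themselves: only the counts are numeric.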

2. Ordinal Scale

Properties: Categories with a meaningful order, but the gaps between categories are not necessarily equal.

Examples:
- Satisfaction rating: 1=Very Dissatisfied, 2=Dissatisfied, 3=Neutral, 4=Satisfied, 5=Very Satisfied
- Education level: School < Undergraduate < Postgraduate < PhD
- Bond rating: AAA > AA > A > BBB > BB > B > CCC
- Military rank: Private < Corporal < Sergeant < Lieutenant < Captain
- Likert scale items

What you can do:

Everything nominal can do
Rank observations
Find median (the middle rank)
Compare "greater than" / "less than"

What you cannot do:

Assume equal intervals between categories
Calculate the mean (strictly speaking; in practice it is often computed as an approximation)
Multiply or divide

Key insight: The gap between "Dissatisfied" (2) and "Neutral" (3) is NOT necessarily the same as between "Neutral" (3) and "Satisfied" (4). We just know the order.

Bond Ratings Example:
AAA=1, AA=2, A=3, BBB=4

A bond rated AA (rank 2) is better than A (rank 3)
But "AA is twice as good as BBB" is NOT a valid statement
Calculating "mean rating = 2.4" is technically incorrect
→ Median is the appropriate centre measure
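
As a rough sketch of how the median of ordinal data can be found, the Python below ranks a small portfolio and picks the middle observation. The portfolio is made up for illustration, and the rank coding (1 = best) follows the example above.

```python
from statistics import median_low

# Rank order from the example above (1 = best).
RATING_ORDER = ["AAA", "AA", "A", "BBB"]
rank_of = {rating: position + 1 for position, rating in enumerate(RATING_ORDER)}

# Hypothetical portfolio of seven bonds.
portfolio = ["AA", "AAA", "BBB", "A", "AA", "A", "AA"]

# Sort the ranks and take the middle observation.
# median_low always returns an actual data value, so no averaging of ranks occurs.
ranks = sorted(rank_of[rating] for rating in portfolio)
median_rank = median_low(ranks)
median_rating = RATING_ORDER[median_rank - 1]

# Order comparisons are valid on ordinal data.
aa_better_than_a = rank_of["AA"] < rank_of["A"]

print(f"Median rating: {median_rating}")
print(f"AA ranks above A: {aa_better_than_a}")
```

Using median_low rather than median is a design choice here: with an even number of bonds, median would average the two middle ranks, which quietly assumes equal gaps.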

Statistical tests and measures: Spearman correlation (Chapter 17), Mann-Whitney U test, Kruskal-Wallis test, median

3. Interval Scale

Properties: Ordered categories with equal intervals between them, but NO true zero point. Zero is arbitrary.

Examples:
- Temperature in Celsius or Fahrenheit (0°C is not "no temperature")
- Calendar years (the calendar's starting point is a convention, not "no time")
- IQ scores
- pH scale
- Dates (2000, 2010, 2020: the gaps are equal, but the origin of the calendar is arbitrary)

What you can do:

Everything ordinal can do
Add and subtract (calculate differences)
Mean and standard deviation are valid
Calculate intervals: 30°C – 20°C = 10°C difference

What you cannot do:

Multiply or divide meaningfully (no true zero)
Say "40°C is twice as hot as 20°C" — this is WRONG

Temperature Example:
July avg: 38°C, January avg: 18°C
Difference: 38 – 18 = 20°C ← VALID (equal intervals)
Ratio: 38/18 = 2.1 ← INVALID ("38°C is 2.1 times as hot as 18°C" is meaningless)

Proof: 38°C = 100.4°F, 18°C = 64.4°F
100.4/64.4 = 1.56 (different ratio! — ratio depends on the scale chosen)
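
The same check can be run in Python. This is a small sketch using the standard Celsius-to-Fahrenheit and Celsius-to-Kelvin conversions; the function and variable names are illustrative.

```python
def c_to_f(celsius: float) -> float:
    """Convert Celsius to Fahrenheit."""
    return celsius * 9 / 5 + 32


def c_to_k(celsius: float) -> float:
    """Convert Celsius to Kelvin."""
    return celsius + 273.15


july_c, january_c = 38.0, 18.0

# Differences survive a change of scale (up to the unit conversion factor 9/5).
diff_c = july_c - january_c
diff_f = c_to_f(july_c) - c_to_f(january_c)

# Ratios do not: each interval scale gives a different answer.
ratio_c = july_c / january_c
ratio_f = c_to_f(july_c) / c_to_f(january_c)

# Kelvin has a true zero, so this ratio is the physically meaningful one.
ratio_k = c_to_k(july_c) / c_to_k(january_c)

print(f"Difference: {diff_c:.1f} °C = {diff_f:.1f} °F")
print(f"Ratio in °C: {ratio_c:.2f}, in °F: {ratio_f:.2f}, in K: {ratio_k:.2f}")
```

The Celsius and Fahrenheit ratios disagree with each other and with the Kelvin ratio, which is why ratios on interval scales carry no meaning.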

Statistical tests and measures: Pearson correlation, t-tests, ANOVA, mean, standard deviation

4. Ratio Scale

Properties: Ordered, equal intervals, AND a true zero that means "none of the quantity." All arithmetic operations are valid.

Examples:
- Salary: ₹0 means no salary
- Height: 0 cm means no height
- Weight: 0 kg means no weight
- Distance: 0 km means no distance
- Revenue: ₹0 means no revenue
- Age: 0 years means just born
- Number of transactions: 0 means none
- Temperature in Kelvin: 0 K = absolute zero (the lowest possible temperature, so the zero is not arbitrary)

What you can do:

All operations: add, subtract, multiply, divide
All statistical methods
Ratios are meaningful: ₹100,000 salary is twice ₹50,000

Statistical tests and measures: All parametric tests, geometric mean, coefficient of variation

Salary Example:
Priya earns ₹80,000 and Raj earns ₹40,000
Ratio: 80,000 / 40,000 = 2 → Priya earns twice as much ← VALID

The ratio is meaningful because ₹0 = truly no salary
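
One way to see why a true zero matters: on a ratio scale, the ratio stays the same when the unit changes. The sketch below uses the two salaries from the example and, as an illustrative assumption, re-expresses them in thousands of rupees. It also computes the geometric mean and coefficient of variation, which are only meaningful on ratio data.

```python
from statistics import geometric_mean, mean, pstdev

salaries_inr = [80_000, 40_000]  # Priya, Raj (from the example above)

ratio_inr = salaries_inr[0] / salaries_inr[1]

# Change the unit (rupees -> thousands of rupees): the ratio is unchanged,
# because a unit change on a ratio scale is a pure multiplication.
salaries_thousands = [salary / 1000 for salary in salaries_inr]
ratio_thousands = salaries_thousands[0] / salaries_thousands[1]

# Statistics that rely on a true zero.
gm = geometric_mean(salaries_inr)
cv = pstdev(salaries_inr) / mean(salaries_inr)  # population SD / mean

print(f"Ratio in rupees: {ratio_inr}, in thousands: {ratio_thousands}")
print(f"Geometric mean: {gm:.2f}, coefficient of variation: {cv:.3f}")
```

Compare this with temperature in Celsius and Fahrenheit, where converting units also shifts the zero point and therefore changes the ratio.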

Summary Table

Property	Nominal	Ordinal	Interval	Ratio
Categories	✓	✓	✓	✓
Meaningful order	✗	✓	✓	✓
Equal intervals	✗	✗	✓	✓
True zero	✗	✗	✗	✓
Mode	✓	✓	✓	✓
Median	✗	✓	✓	✓
Mean	✗	✗*	✓	✓
SD / Variance	✗	✗*	✓	✓
Ratios (×, ÷)	✗	✗	✗	✓

*Ordinal means and standard deviations are commonly computed in practice (e.g., survey ratings), but they rest on the assumption that the gaps between categories are equal.
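
The table above can be encoded as a small lookup, which may be handy as a guard in an analysis script. This is only a sketch: SCALES, MINIMUM_SCALE, and is_valid are hypothetical names, and the strict rule is applied (ordinal means are rejected).

```python
# Scales ordered from lowest to highest, as in Stevens' hierarchy.
SCALES = ["nominal", "ordinal", "interval", "ratio"]

# Lowest scale at which each statistic becomes valid, per the summary table.
MINIMUM_SCALE = {
    "mode": "nominal",
    "median": "ordinal",
    "mean": "interval",
    "sd": "interval",
    "ratio": "ratio",
}


def is_valid(statistic: str, scale: str) -> bool:
    """Return True if the statistic is meaningful at the given scale."""
    return SCALES.index(scale) >= SCALES.index(MINIMUM_SCALE[statistic])


for scale in SCALES:
    allowed = [stat for stat in MINIMUM_SCALE if is_valid(stat, scale)]
    print(f"{scale}: {', '.join(allowed)}")
```

Because each level inherits the properties of the levels below it, a single "minimum scale" per statistic is enough to reproduce every row of the table.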

Discrete vs Continuous Revisited

Quantitative (interval and ratio) variables can be further classified as discrete or continuous:

Discrete: Only specific values possible (usually integers)

Number of customers: 0, 1, 2, 3, ... (not 2.7 customers)
Number of defects: 0, 1, 2, ...
Credit card transactions: 0, 1, 2, ...

Continuous: Any value in a range (including all decimals)

Salary: ₹78,432.50 (any positive real number)
Time to complete a task: 2.47 minutes
Height: 167.34 cm
Interest rate: 8.75%

The distinction matters for choosing between discrete (Binomial, Poisson) and continuous (Normal, t) probability distributions.
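
To make the difference concrete, the sketch below writes out a Poisson probability mass function and a Normal density by hand using the standard formulas. The parameter values (a mean of 2.7 events, and a height distribution with mean 165 cm and SD 7 cm) are assumptions chosen purely for illustration.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for a Poisson-distributed count with mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of a Normal(mu, sigma) variable at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


# Discrete: probability attaches to individual counts, and the
# probabilities over all counts add up to 1.
p_three_events = poisson_pmf(3, lam=2.7)
total_probability = sum(poisson_pmf(k, lam=2.7) for k in range(50))

# Continuous: a density can be evaluated at any real value, such as 167.34 cm,
# but probabilities come from areas under the curve, not single points.
density_at_height = normal_pdf(167.34, mu=165.0, sigma=7.0)

print(f"P(exactly 3 events) = {p_three_events:.4f}")
print(f"Sum of Poisson probabilities = {total_probability:.6f}")
print(f"Normal density at 167.34 cm = {density_at_height:.4f}")
```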

Practical Examples

Example 1: Finance Dataset Classification

Variable	Scale	Reason
Stock symbol (INFY, TCS)	Nominal	No order between symbols
Analyst rating (Buy/Hold/Sell)	Ordinal	Order exists but unequal gaps
Year (2020, 2021, 2022)	Interval	Equal gaps, but the calendar's zero point is arbitrary
Share price	Ratio	₹0 = no price (true zero)
% Return	Ratio	0% = no return (true zero)
Credit rating (AAA to D)	Ordinal	Order matters, gaps are not equal
Number of shares	Ratio	0 shares = none

Example 2: Clinical Trial Dataset

Variable	Scale
Patient ID	Nominal
Treatment group (A/B/Placebo)	Nominal
Pain level (0–10 scale)	Ordinal
Body temperature (°C)	Interval
Blood pressure (mmHg)	Ratio
Recovery time (days)	Ratio
Improved? (Yes/No)	Nominal

Example 3: HR Survey

Annual engagement survey — variables:
Q1: "I am proud to work here" (1–5) → Ordinal
Q2: Years with the company → Ratio
Q3: Department → Nominal
Q4: Job grade (L1, L2, L3, L4) → Ordinal
Q5: Annual salary → Ratio
Q6: Working hours per week → Ratio

For Q1 (ordinal): Report the median and distribution, not the mean. For Q5 (ratio): Mean, median, and SD are all valid.
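
A minimal sketch of that reporting approach is shown below. The responses and salaries are made up for illustration; an odd number of respondents is used so that the median of Q1 is an actual response rather than an average of two.

```python
from collections import Counter
from statistics import mean, median, stdev

# Hypothetical responses from nine employees.
q1_pride = [4, 5, 3, 4, 2, 5, 4, 3, 4]  # ordinal, 1-5
q5_salary = [40_000, 55_000, 62_000, 80_000, 48_000,
             75_000, 52_000, 90_000, 60_000]  # ratio, in rupees

# Q1 (ordinal): report the median and the full distribution.
q1_median = median(q1_pride)
q1_distribution = Counter(q1_pride)

# Q5 (ratio): mean, median, and standard deviation are all valid.
q5_mean = mean(q5_salary)
q5_median = median(q5_salary)
q5_sd = stdev(q5_salary)  # sample standard deviation

print(f"Q1 median: {q1_median}")
for score in sorted(q1_distribution):
    print(f"  score {score}: {q1_distribution[score]} responses")
print(f"Q5 mean: {q5_mean:.0f}, median: {q5_median}, SD: {q5_sd:.0f}")
```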

Common Mistakes

1. Treating ordinal as interval

Wrong: "Mean satisfaction score = 3.7 out of 5"
(Assumes equal gaps between each point — not guaranteed)

Why it matters: If 4→5 is a bigger improvement than 3→4,
averaging 3s and 5s together gives a distorted picture.

Common practice: Survey researchers often do this anyway as a useful approximation,
but it should be stated as an assumption, not a fact.
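
The risk can be shown with a small, made-up example. Two teams answer a 1-5 satisfaction question. Under the usual equal-gap coding, Team A has the higher mean. Under a different coding that keeps the same order but treats "Very Satisfied" as a much bigger step, Team B has the higher mean. The median comparison does not change. Both codings and both sets of responses are assumptions for illustration.

```python
from statistics import mean, median

team_a = [3, 3, 3, 4, 4]
team_b = [2, 2, 2, 5, 5]

# Two codings that preserve the same order of categories.
equal_gaps = {1: 1, 2: 2, 3: 3, 4: 4, 5: 5}
big_top_gap = {1: 1, 2: 2, 3: 3, 4: 4, 5: 10}


def summarise(responses, coding):
    """Return (mean, median) of the responses under the given coding."""
    coded = [coding[response] for response in responses]
    return mean(coded), median(coded)


mean_a_equal, median_a_equal = summarise(team_a, equal_gaps)
mean_b_equal, median_b_equal = summarise(team_b, equal_gaps)
mean_a_big, median_a_big = summarise(team_a, big_top_gap)
mean_b_big, median_b_big = summarise(team_b, big_top_gap)

print(f"Equal gaps:  mean A={mean_a_equal}, mean B={mean_b_equal}")
print(f"Big top gap: mean A={mean_a_big}, mean B={mean_b_big}")
print(f"Medians: A={median_a_equal}, B={median_b_equal} under both codings")
```

Since ordinal data only tells us the order, both codings are equally defensible, yet they disagree about which team has the higher mean.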

2. Computing ratios on interval data

Wrong: "The year 2000 is twice as late as the year 1000"
(Calendar years are interval: the calendar's starting point is a convention, not "no time")

By contrast, a growth rate measured in % is ratio scale, so a statement such as "2024 GDP growth was twice 2022 growth" is valid.

3. Assigning numbers to nominal categories and treating them as numeric

Wrong:
Encode: Finance=1, Marketing=2, HR=3
Calculate: Mean department = 1.8 → "between Finance and Marketing"
→ Completely meaningless
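
A quick way to see the problem is to compute the "mean department" under two different, equally arbitrary encodings. The staff list and the second encoding below are made up for illustration.

```python
from statistics import mean, mode

# Hypothetical list of six employees' departments.
staff = ["Finance", "Finance", "Marketing", "HR", "Finance", "HR"]

# Two arbitrary numeric encodings of the same categories.
encoding_1 = {"Finance": 1, "Marketing": 2, "HR": 3}
encoding_2 = {"Marketing": 1, "HR": 2, "Finance": 3}

mean_1 = mean([encoding_1[dept] for dept in staff])
mean_2 = mean([encoding_2[dept] for dept in staff])

# The mode works on the labels directly and needs no encoding at all.
most_common = mode(staff)

print(f"'Mean department' with encoding 1: {mean_1:.2f}")
print(f"'Mean department' with encoding 2: {mean_2:.2f}")
print(f"Mode: {most_common}")
```

The "mean" changes with the encoding even though the data is identical, while the mode stays the same.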

4. Confusing discrete ratio with continuous

Number of support tickets = 0, 1, 2, 3... (discrete ratio)
→ Mean = 2.7 tickets/day is valid as an expected value
→ But you can't have 2.7 tickets in a single day

Practice Exercises

1. Classify each variable and justify your answer:
   a) Net Promoter Score (−100 to +100, where 0 = neutral)
   b) Movie rating (1 to 5 stars on Netflix)
   c) Revenue in ₹ crore
   d) Quarter (Q1, Q2, Q3, Q4)
   e) Temperature in Kelvin
   f) Student rank in class (1st, 2nd, 3rd...)
2. A researcher calculates the "average blood type" of 100 patients by coding A=1, B=2, AB=3, O=4. What is wrong with this approach?
3. For each variable below, identify the most appropriate measure of centre:
   a) Customer satisfaction (1–5 ordinal scale)
   b) Employee salaries in a department
   c) Most common job title in the company
4. Can you say that a pH of 8 is "twice as basic" as a pH of 4? Why or why not?
5. A finance team codes analyst recommendations: Sell=1, Hold=2, Buy=3. They report "average recommendation = 2.1". What assumptions are they making? Is this reasonable?

Summary

In this chapter you learned:

Nominal: categories only; no order; mode, frequency, bar chart
Ordinal: ordered categories; unequal gaps; median, rank-based tests
Interval: ordered, equal intervals, no true zero; mean, SD valid; ratios invalid
Ratio: all of the above + true zero; all arithmetic and statistical methods valid
Hierarchy: Ratio > Interval > Ordinal > Nominal (each level inherits lower level properties)
Discrete: only specific values possible (usually whole numbers); Continuous: any value in a range
The scale of measurement determines which statistics are valid — always identify your variable type before analysing
Ordinal means are widely used in practice, but they assume equal gaps between categories

Next up: Data Collection & Sampling Methods — how data gets collected, and how sampling design affects the validity of every conclusion you draw.
