Chapter 8 of 15

Statistical Functions

Analyse data with SUMIF, AVERAGEIF, COUNTIF, PERCENTILE, STDEV, RANK, and other statistical functions for business reporting.

Meritshot8 min read
ExcelStatistical FunctionsSUMIFAVERAGEIFPERCENTILESTDEVRANK
All Excel Chapters

Why Statistical Functions?

Raw numbers don't tell a story — statistics do. Statistical functions help you understand distributions, spot outliers, compare performance, and summarise large datasets into meaningful metrics.

Sample Dataset

A           B           C       D       E
Emp ID      Department  Salary  Rating  Years
EMP-001     Finance     78000   4       3
EMP-002     Technology  95000   5       7
EMP-003     Marketing   68000   3       2
EMP-004     Finance     82000   4       5
EMP-005     Technology  110000  5       9
EMP-006     Marketing   72000   4       4
EMP-007     Finance     91000   5       6
EMP-008     Technology  88000   3       3
EMP-009     HR          62000   4       2
EMP-010     Finance     75000   3       4

SUMIF and SUMIFS (Review and Advanced)

SUMIF with Wildcards

=SUMIF(B2:B11, "Tech*", C2:C11)
→ Sum salaries for any department starting with "Tech"

=SUMIF(B2:B11, "*nology", C2:C11)
→ Sum for departments ending in "nology"

=SUMIF(C2:C11, ">"&AVERAGE(C2:C11), C2:C11)
→ Sum salaries above average (dynamic criteria using cell reference)

SUMIFS with Date Range

-- Sum sales between two dates:
=SUMIFS(Sales!C:C, Sales!A:A, ">="&DATE(2026,1,1), Sales!A:A, "<="&DATE(2026,3,31))
→ Q1 2026 sales total

SUMPRODUCT — The Flexible Alternative

When SUMIFS isn't enough (OR conditions, calculated criteria):

=SUMPRODUCT((B2:B11="Finance") * C2:C11)
→ Sum Finance salaries — same as SUMIF

=SUMPRODUCT(((B2:B11="Finance")+(B2:B11="Technology")) * C2:C11)
→ Sum Finance OR Technology (OR condition — SUMIFS can't do this!)

=SUMPRODUCT((D2:D11>=4) * (C2:C11>80000) * C2:C11)
→ Sum salaries for rating ≥ 4 AND salary > 80000

COUNTIF and COUNTIFS (Advanced)

-- Count using cell reference as criteria:
=COUNTIF(C2:C11, ">"&B15)    → count above threshold in B15

-- Count unique values (array formula, Ctrl+Shift+Enter in old Excel):
=SUMPRODUCT(1/COUNTIF(B2:B11, B2:B11))
→ Count of unique departments

-- Count non-blank:
=COUNTA(B2:B11) - COUNTBLANK(B2:B11)   → same as COUNTA alone

AVERAGEIF and AVERAGEIFS

=AVERAGEIF(B2:B11, "Finance", C2:C11)
→ Average salary for Finance department: (78000+82000+91000+75000)/4 = 81500

=AVERAGEIFS(C2:C11, B2:B11, "Technology", D2:D11, ">=4")
→ Average salary for Technology employees with rating ≥ 4

Descriptive Statistics

Measures of Central Tendency

=AVERAGE(C2:C11)     → arithmetic mean (81100 for our data)
=MEDIAN(C2:C11)      → middle value — less sensitive to outliers (80000)
=MODE(C2:C11)        → most frequent value (might be #N/A if all unique)
=MODE.SNGL(C2:C11)   → single mode (Excel 2010+)
=MODE.MULT(C2:C11)   → all modes if multiple values share the peak

Measures of Spread

=MAX(C2:C11) - MIN(C2:C11)   → range (110000 - 62000 = 48000)

=STDEV(C2:C11)       → sample standard deviation (estimates population from sample)
=STDEV.S(C2:C11)     → same as STDEV (explicit sample version)
=STDEV.P(C2:C11)     → population standard deviation (when you have ALL data)

=VAR.S(C2:C11)       → variance (sample)
=VAR.P(C2:C11)       → variance (population)

Standard deviation tells you: On average, how far is each value from the mean?

If AVERAGE = 81100 and STDEV = 14000:
→ Most salaries fall between 67100 and 95100 (±1 SD)
→ A salary of 110000 is (110000-81100)/14000 = 2.1 standard deviations above average

Percentiles and Quartiles

=PERCENTILE(C2:C11, 0.90)    → 90th percentile salary
=PERCENTILE.INC(C2:C11, 0.5) → same as MEDIAN
=PERCENTILE.EXC(C2:C11, 0.9) → exclusive (doesn't include endpoints)

=QUARTILE(C2:C11, 1)   → Q1 (25th percentile)
=QUARTILE(C2:C11, 2)   → Q2 (50th = median)
=QUARTILE(C2:C11, 3)   → Q3 (75th percentile)

IQR = Q3 - Q1
Upper fence (outlier threshold) = Q3 + 1.5 * IQR
Lower fence = Q1 - 1.5 * IQR

RANK Functions

Rank values within a list:

=RANK(C2, $C$2:$C$11)         → rank of C2 in range (1=highest by default)
=RANK.EQ(C2, $C$2:$C$11, 0)  → 0=descending (highest rank=1), same as RANK
=RANK.EQ(C2, $C$2:$C$11, 1)  → 1=ascending (lowest value=rank 1)
=RANK.AVG(C2, $C$2:$C$11)    → averages rank when there are ties

Example: Rank each employee's salary:

F2: =RANK.EQ(C2, $C$2:$C$11)
→ EMP-002 (95000) → 3rd, EMP-005 (110000) → 1st, EMP-009 (62000) → 10th

LARGE and SMALL

Return the nth largest or smallest value:

=LARGE(C2:C11, 1)    → highest salary: 110000
=LARGE(C2:C11, 2)    → 2nd highest: 95000
=LARGE(C2:C11, 3)    → 3rd highest: 91000

=SMALL(C2:C11, 1)    → lowest: 62000
=SMALL(C2:C11, 2)    → 2nd lowest: 68000

Top 3 salaries:

=LARGE($C$2:$C$11, ROW(A1))   → drag down 3 rows: returns 1st, 2nd, 3rd largest

MIN, MAX with Conditions

Standard MIN/MAX don't take criteria — work around with MINIFS/MAXIFS (Excel 2019+) or array formulas:

=MINIFS(C2:C11, B2:B11, "Finance")    → lowest Finance salary
=MAXIFS(C2:C11, B2:B11, "Technology") → highest Tech salary

-- Pre-2019 equivalent (Ctrl+Shift+Enter array formula):
=MIN(IF(B2:B11="Finance", C2:C11))

Correlation and Regression

=CORREL(C2:C11, E2:E11)    → correlation between salary and years: how related are they?
→ -1 = perfect inverse, 0 = no relationship, +1 = perfect positive correlation

=SLOPE(C2:C11, E2:E11)     → slope of the best-fit line (salary per year of experience)
=INTERCEPT(C2:C11, E2:E11) → y-intercept
=RSQ(C2:C11, E2:E11)       → R-squared (how well years predicts salary; 0–1)

FREQUENCY — Distribution Analysis

Count how many values fall in each bucket:

Bins in F2:F5: 70000, 80000, 90000, 100000

-- Select G2:G6 (one more row than bins), enter formula, Ctrl+Shift+Enter:
{=FREQUENCY(C2:C11, F2:F5)}
→ G2: count < 70000
→ G3: count 70001-80000
→ G4: count 80001-90000
→ G5: count 90001-100000
→ G6: count > 100000

Z-Score (Standardisation)

How many standard deviations is a value from the mean?

=STANDARDIZE(C2, AVERAGE($C$2:$C$11), STDEV.S($C$2:$C$11))

-- Or manually:
=(C2 - AVERAGE($C$2:$C$11)) / STDEV.S($C$2:$C$11)

→ Z-score of 2 = 2 standard deviations above average
→ Z-score of -1 = 1 SD below average
→ |Z| > 2 typically flags as an outlier

Practical Examples

Example 1: Salary Band Analysis

Summary table:

Department | Count | Avg Salary | Min | Max | Median | Std Dev
Finance    | =COUNTIF(B:B,"Finance") | =AVERAGEIF(B:B,"Finance",C:C) | =MINIFS(C:C,B:B,"Finance") | =MAXIFS(C:C,B:B,"Finance") | ...
Technology | ...
Marketing  | ...

Example 2: Performance Distribution

-- How many employees in each rating?
=COUNTIF($D$2:$D$11, 1)   → count of rating 1
=COUNTIF($D$2:$D$11, 2)   → count of rating 2
... 
=COUNTIF($D$2:$D$11, 5)

-- Average salary per rating band:
=AVERAGEIF($D$2:$D$11, 5, $C$2:$C$11)  → average salary of top performers

Example 3: Outlier Detection

Q1 = =QUARTILE($C$2:$C$11, 1)
Q3 = =QUARTILE($C$2:$C$11, 3)
IQR = Q3 - Q1

Upper fence = Q3 + 1.5 * IQR
Lower fence = Q1 - 1.5 * IQR

-- Flag outliers:
=IF(OR(C2 > (Q3+1.5*IQR), C2 < (Q1-1.5*IQR)), "Outlier", "Normal")

Example 4: Sales Performance Ranking Report

A = Salesperson, B = Monthly Revenue

Rank: =RANK.EQ(B2, $B$2:$B$11)
Top Quartile: =IF(B2 >= PERCENTILE($B$2:$B$11, 0.75), "Top 25%", "")
Z-Score: =(B2-AVERAGE($B$2:$B$11))/STDEV.S($B$2:$B$11)

Common Mistakes

1. STDEV vs STDEV.P confusion

STDEV / STDEV.S → sample (use when data is a sample of a larger population)
STDEV.P         → population (use when data IS the entire population)

For most business analyses (employee sample, sales sample), use STDEV.S.

2. AVERAGE ignores empty cells but not zero

=AVERAGE(0, 0, 100)   → 33.3 (zeros count!)
=AVERAGE("", "", 100) → 100 (text/blanks excluded automatically)

If you want to average only non-zero values:

=AVERAGEIF(C2:C11, "<>0")

3. RANK gives same rank to ties but skips the next rank

Values: 100, 100, 80
RANK: 1, 1, 3   (not 1, 1, 2!)
Use RANK.AVG for fair tie-breaking: 1.5, 1.5, 3

4. PERCENTILE vs QUARTILE

=QUARTILE(range, 1)     = =PERCENTILE(range, 0.25)
=QUARTILE(range, 2)     = =PERCENTILE(range, 0.5) = MEDIAN
=QUARTILE(range, 3)     = =PERCENTILE(range, 0.75)

Practice Exercises

  1. For each department in the sample data, calculate: count, average salary, minimum, maximum, and standard deviation.
  2. Rank all employees by salary (1 = highest) and add a "Top 25%" flag using PERCENTILE.
  3. Identify salary outliers using the IQR method (values beyond Q1 - 1.5×IQR or Q3 + 1.5×IQR).
  4. Calculate the correlation between years of experience and salary. Is there a strong relationship?
  5. Use FREQUENCY to create a salary distribution showing how many employees fall in each ₹10,000 bracket from ₹60,000 to ₹120,000.

Summary

In this chapter you learned:

  • SUMIF/SUMIFS with wildcards and dynamic criteria; SUMPRODUCT for OR conditions
  • AVERAGEIF/AVERAGEIFS — conditional average
  • MEDIAN — outlier-resistant central tendency; MODE.SNGL — most frequent value
  • STDEV.S (sample) vs STDEV.P (population); VAR.S — variance
  • PERCENTILE(range, k) — k as decimal (0–1); QUARTILE(range, 1/2/3) for Q1/Q2/Q3
  • RANK.EQ(value, range) — rank descending; RANK.AVG — average rank on ties
  • LARGE(range, n) / SMALL(range, n) — nth largest/smallest
  • MINIFS / MAXIFS — conditional min and max (Excel 2019+)
  • CORREL — correlation coefficient; SLOPE, INTERCEPT, RSQ for regression
  • FREQUENCY — distribution buckets (array formula)
  • Z-score: (value - mean) / stdev — how many standard deviations from average

Next up: Sorting & Filtering — organise and slice your data with sort, AutoFilter, and Advanced Filter.