What Is an Index?
An index is a separate data structure that the database maintains alongside your table to make lookups faster. Think of a book's index at the back — instead of reading every page to find "foreign keys", you jump straight to the page number.
Without an index, a query WHERE email = 'priya@meritshot.com' must scan every row in the table (a full table scan). With an index on email, the database jumps directly to the matching rows in microseconds — even on a table with millions of rows.
Full table scan: O(n) — checks every row
Index lookup: O(log n) — binary search on the index structure
Types of Indexes
| Type | Description |
|---|---|
| B-tree | Default; balanced tree structure; great for =, >, <, BETWEEN, ORDER BY |
| Hash | Optimised for exact-match (=) only; not useful for ranges |
| Full-text | Text search within documents (LIKE '%word%' is slow; FTS indexes fix this) |
| Partial | Index only rows that match a condition |
| Composite | Index on multiple columns together |
| Unique | Index that also enforces uniqueness |
Most of the time you'll use the default B-tree index.
Creating Indexes
Basic Syntax
CREATE INDEX idx_name ON table_name (column1);
CREATE UNIQUE INDEX idx_name ON table_name (column1); -- unique + fast lookup
CREATE INDEX IF NOT EXISTS idx_name ON table_name (column1);
Examples
-- Index on email for login lookups
CREATE INDEX idx_employees_email ON employees (email);
-- Index on hire_date for date range queries
CREATE INDEX idx_employees_hire_date ON employees (hire_date);
-- Unique index (same effect as UNIQUE constraint but created separately)
CREATE UNIQUE INDEX idx_employees_email_unique ON employees (email);
Composite Indexes
An index on multiple columns:
-- Index on (department, salary) together
CREATE INDEX idx_dept_salary ON employees (department, salary);
This index is useful for queries filtering on both columns:
-- Uses the composite index efficiently
SELECT name FROM employees
WHERE department = 'Finance'
AND salary > 75000;
The order of columns in a composite index matters — a (A, B) index helps queries filtering on A alone or on (A, B) together. It does not help queries filtering on B alone.
When Indexes Are Used (and When They're Not)
Index IS Used
-- Equality on indexed column
WHERE email = 'priya@meritshot.com'
-- Range on indexed column
WHERE salary BETWEEN 70000 AND 90000
-- Leftmost column of composite index
WHERE department = 'Finance'
-- ORDER BY on indexed column (may avoid sorting step)
ORDER BY hire_date DESC LIMIT 10
Index IS NOT Used
-- Function on indexed column destroys index usage
WHERE LOWER(email) = 'priya@meritshot.com' -- LOWER() breaks it
WHERE YEAR(hire_date) = 2022 -- YEAR() breaks it
-- Leading wildcard in LIKE — must scan
WHERE name LIKE '%Sharma' -- leading % means "can be anything before"
-- Type mismatch
WHERE id = '5' -- implicit cast can prevent index use
-- Very low cardinality column
WHERE is_active = 1 -- boolean with 2 values; full scan is often faster
Workarounds
-- Instead of LOWER(email), store email in lowercase
-- Or use a functional index (PostgreSQL)
CREATE INDEX idx_email_lower ON employees (LOWER(email));
SELECT * FROM employees WHERE LOWER(email) = 'priya@meritshot.com'; -- now uses index
-- Instead of YEAR(hire_date):
WHERE hire_date BETWEEN '2022-01-01' AND '2022-12-31' -- range on raw column
EXPLAIN — Understanding Query Execution
EXPLAIN (or EXPLAIN QUERY PLAN in SQLite) shows how the database plans to execute your query:
-- SQLite
EXPLAIN QUERY PLAN
SELECT name, salary FROM employees WHERE email = 'priya@meritshot.com';
-- PostgreSQL
EXPLAIN ANALYZE
SELECT name, salary FROM employees WHERE email = 'priya@meritshot.com';
-- MySQL
EXPLAIN
SELECT name, salary FROM employees WHERE email = 'priya@meritshot.com';
Without index:
SCAN TABLE employees -- reads every row
With index on email:
SEARCH TABLE employees USING INDEX idx_employees_email (email=?) -- fast!
Look for these terms in the output:
- SCAN /
Seq Scan— full table scan (slow on large tables) - SEARCH USING INDEX /
Index Scan— using an index (fast) - USING COVERING INDEX — all needed columns are in the index (very fast, no table lookup needed)
Dropping Indexes
DROP INDEX idx_employees_email;
DROP INDEX IF EXISTS idx_employees_email;
Automatically Created Indexes
Databases automatically create indexes for:
PRIMARY KEY— always indexedUNIQUEconstraint — always indexedFOREIGN KEY— MySQL creates one automatically; PostgreSQL does not
-- No need to manually create:
CREATE TABLE employees (
id INTEGER PRIMARY KEY, -- implicit index
email TEXT UNIQUE -- implicit unique index
);
Covering Indexes
A covering index is one that contains all the columns needed by the query — the database never has to touch the actual table rows:
-- Query needs name, salary, and filters on department
SELECT name, salary FROM employees WHERE department = 'Finance';
-- Covering index for this query
CREATE INDEX idx_dept_cover ON employees (department, name, salary);
-- Index contains department (filter), name and salary (output)
-- Query answered entirely from the index — no table rows needed
Partial Indexes
Index only a subset of rows (PostgreSQL, SQLite):
-- Only index active employees (not inactive ones taking up index space)
CREATE INDEX idx_active_employees
ON employees (name)
WHERE is_active = 1;
-- Only effective when your query also has the same WHERE:
SELECT name FROM employees WHERE is_active = 1 AND name LIKE 'P%';
Performance Guidelines
When to Add an Index
- Columns frequently used in
WHERE,JOIN ON,ORDER BY, orGROUP BY - High-cardinality columns (many unique values) — email, ID, timestamps
- Foreign key columns (to speed up joins)
- Columns used in range queries (
BETWEEN,>,<)
When NOT to Add an Index
- Very small tables (full scan is fine for <1,000 rows)
- Columns with very few unique values (e.g.,
is_activewith only 1 or 0) — low cardinality - Columns rarely used in WHERE or JOIN
- Tables with very frequent writes (each index adds overhead to INSERT/UPDATE/DELETE)
The Write Overhead Trade-off
Every index speeds up reads but slows down writes. When you INSERT a row, the database must update every index on that table:
- 5 indexes on a table → every INSERT writes to 6 places (table + 5 indexes)
- This is fine for read-heavy tables; problematic for write-heavy tables
Rule of Thumb
Start with no indexes beyond the PK and FKs. Profile slow queries with EXPLAIN. Add targeted indexes only for proven slow queries.
Practical Examples
Example 1: Optimise a Login Query
-- Slow without index on email (full scan on millions of rows)
SELECT id, name FROM users WHERE email = 'user@example.com';
-- Add an index
CREATE UNIQUE INDEX idx_users_email ON users (email);
-- Now the query is a fast index lookup
Example 2: Optimise a Date Range Report
-- Monthly report — slow without index
SELECT COUNT(*), SUM(amount)
FROM orders
WHERE order_date BETWEEN '2026-01-01' AND '2026-01-31';
-- Index on order_date
CREATE INDEX idx_orders_date ON orders (order_date);
Example 3: Optimise a Join
-- Join often on employee_id in orders
SELECT e.name, COUNT(o.id) AS order_count
FROM employees e
JOIN orders o ON e.id = o.employee_id
GROUP BY e.name;
-- Index the foreign key column (PostgreSQL doesn't auto-create it)
CREATE INDEX idx_orders_employee ON orders (employee_id);
Common Mistakes
1. Indexing every column
Too many indexes hurt write performance significantly. Only index what's proven slow.
2. Using a function on an indexed column
-- Breaks the index
WHERE UPPER(name) = 'PRIYA SHARMA'
-- Use a functional index instead (PostgreSQL)
CREATE INDEX idx_name_upper ON employees (UPPER(name));
3. Assuming indexes are always used
The query planner may choose a full scan if the table is small or the selectivity is low. Use EXPLAIN to verify.
4. Forgetting the left-prefix rule for composite indexes
-- Index on (A, B, C)
WHERE A = 1 -- uses index
WHERE A = 1 AND B = 2 -- uses index
WHERE A = 1 AND B = 2 AND C = 3 -- uses index
WHERE B = 2 -- CANNOT use this index (A is not specified)
WHERE B = 2 AND C = 3 -- CANNOT use this index
Practice Exercises
- Create an index on the
employeestable for thehire_datecolumn. Then write a query that would use it. - What is the difference between a regular index and a unique index?
- Create a composite index on
(department, hire_date). Which queries would use it and which wouldn't? - Run
EXPLAINon a query that filters bydepartment. Does it use an index? - Why should you NOT add an index on a column like
is_activethat only has two possible values?
Summary
In this chapter you learned:
- An index is a separate data structure that makes lookups fast (O(log n) vs O(n))
- Default index type: B-tree — works for =, ranges, ORDER BY
CREATE INDEX idx_name ON table (column)— add an index- Composite indexes: column order matters; only left-prefix queries benefit
- Functions on indexed columns (e.g.,
LOWER(col)) usually break index usage - Use
EXPLAIN/EXPLAIN ANALYZEto see if your query uses an index - PRIMARY KEY and UNIQUE constraints automatically create indexes
- Trade-off: indexes speed reads, slow writes — don't over-index
- Covering indexes include all query columns — avoid table row lookups entirely
- Partial indexes (PostgreSQL/SQLite) index only rows matching a condition
Next up: Views — create reusable, named query results.