Why Database Design Matters
Poor database design leads to:
- Data anomalies — inserting, updating, or deleting a record causes unintended effects in other records
- Data redundancy — the same information stored in many places; update one and miss others
- Inconsistency — different rows disagree about the same fact
- Query difficulty — poorly structured data is hard to query correctly
Good design prevents all of these through normalization and careful schema planning.
The Design Process
- Understand requirements — what data needs to be stored? What questions will be asked?
- Identify entities — things with distinct existence (Customer, Order, Product)
- Identify attributes — properties of each entity (customer: name, email, phone)
- Identify relationships — how entities relate (customer places orders)
- Choose primary keys — unique identifier for each entity
- Normalize — eliminate redundancy while preserving information
- Define constraints — enforce business rules at the database level
Entity-Relationship (ER) Modeling
An ER diagram maps the conceptual model before writing any SQL.
Entities and Attributes
CUSTOMER: customer_id (PK), name, email, phone, city
ORDER: order_id (PK), order_date, status, total_amount
PRODUCT: product_id (PK), name, price, category, stock_qty
Relationship Types
| Type | Description | Example |
|---|---|---|
| One-to-One (1:1) | One A relates to one B | Employee ↔ Passport |
| One-to-Many (1:N) | One A relates to many B | Customer → Orders |
| Many-to-Many (M:N) | Many A relate to many B | Orders ↔ Products |
Implementing Relationships in SQL
One-to-Many: FK in the "many" side
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (
order_id INTEGER PRIMARY KEY,
customer_id INTEGER NOT NULL, -- FK to customers
order_date DATE,
FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);
-- One customer has many orders; each order belongs to one customer
Many-to-Many: Junction/bridge table
-- Orders and Products: one order can have many products; one product can be in many orders
CREATE TABLE order_items (
order_id INTEGER NOT NULL,
product_id INTEGER NOT NULL,
quantity INTEGER NOT NULL DEFAULT 1,
unit_price DECIMAL(10,2) NOT NULL,
PRIMARY KEY (order_id, product_id),
FOREIGN KEY (order_id) REFERENCES orders(order_id),
FOREIGN KEY (product_id) REFERENCES products(product_id)
);
One-to-One: FK with UNIQUE constraint
CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE employee_contracts (
contract_id INTEGER PRIMARY KEY,
employee_id INTEGER UNIQUE NOT NULL, -- UNIQUE enforces 1:1
start_date DATE,
FOREIGN KEY (employee_id) REFERENCES employees(employee_id)
);
What Is Normalization?
Normalization is the process of organising a database to reduce data redundancy and improve data integrity. It's done through a series of Normal Forms (NF) — each one removes a specific type of anomaly.
The Un-normalised Table
Start with a single "flat" table that stores everything:
order_data (bad design)
order_id | customer_name | customer_email | customer_city | product1 | qty1 | product2 | qty2 | order_date
---------|---------------|--------------------|----|----------|------|----------|------|---
1001 | Priya Sharma | priya@example.com | Delhi | Laptop | 1 | Mouse | 2 | 2026-01-10
1002 | Raj Patel | raj@example.com | Mumbai | Phone | 1 | NULL | NULL| 2026-01-15
1003 | Priya Sharma | priya@example.com | Delhi | Tablet | 1 | NULL | NULL| 2026-02-01
Problems:
- Customer info repeated across orders (update email in one place, miss another)
- Fixed number of product columns (what if order has 10 items?)
- Empty NULLs for unused product slots
First Normal Form (1NF)
1NF Rules:
- Each column contains atomic (indivisible) values — no lists, no arrays, no repeated columns
- Each row must be unique (requires a primary key)
- No repeating groups (no product1, product2, product3 columns)
Violation in our table:
product1,qty1,product2,qty2are repeating groups- A customer might have multiple phone numbers in one cell:
"9999,8888"
Fix — Remove repeating groups:
-- 1NF: separate rows for each product in the order
orders (1NF):
order_id | customer_name | customer_email | customer_city | product_name | qty | order_date
---------|---------------|------------------|--------------|--------------|-----|----------
1001 | Priya Sharma | priya@example.com| Delhi | Laptop | 1 | 2026-01-10
1001 | Priya Sharma | priya@example.com| Delhi | Mouse | 2 | 2026-01-10
1002 | Raj Patel | raj@example.com | Mumbai | Phone | 1 | 2026-01-15
1003 | Priya Sharma | priya@example.com| Delhi | Tablet | 1 | 2026-02-01
Now 1NF — but we still have redundancy (customer info repeated for each product in order 1001).
Second Normal Form (2NF)
2NF requires:
- Already in 1NF
- Every non-key column must depend on the entire primary key — not just part of it
If the primary key is (order_id, product_name), then:
qtydepends on both (the quantity of a product in a specific order) ✓customer_namedepends only onorder_id(not on product_name) ✗ — partial dependency
Fix — Split into separate tables:
-- Customers
CREATE TABLE customers (
customer_id INTEGER PRIMARY KEY,
name TEXT NOT NULL,
email TEXT UNIQUE,
city TEXT
);
-- Orders (each order belongs to one customer)
CREATE TABLE orders (
order_id INTEGER PRIMARY KEY,
customer_id INTEGER NOT NULL,
order_date DATE NOT NULL,
FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);
-- Products
CREATE TABLE products (
product_id INTEGER PRIMARY KEY,
name TEXT NOT NULL,
price DECIMAL(10,2)
);
-- Order items (the bridge table)
CREATE TABLE order_items (
order_id INTEGER NOT NULL,
product_id INTEGER NOT NULL,
qty INTEGER NOT NULL DEFAULT 1,
unit_price DECIMAL(10,2) NOT NULL,
PRIMARY KEY (order_id, product_id),
FOREIGN KEY (order_id) REFERENCES orders(order_id),
FOREIGN KEY (product_id) REFERENCES products(product_id)
);
Now in 2NF: each column depends on the full primary key of its table.
Third Normal Form (3NF)
3NF requires:
- Already in 2NF
- No transitive dependencies — non-key column should not depend on another non-key column
Example Violation
-- employees with transitive dependency
employees:
employee_id | name | department_id | department_name | department_budget
------------|-------|--------------|-----------------|------------------
1 | Priya | 10 | Finance | 500000
2 | Raj | 20 | Technology | 750000
3 | Meera | 10 | Finance | 500000
department_namedepends ondepartment_id(not onemployee_id)department_budgetdepends ondepartment_id- These are transitive dependencies:
employee_id → department_id → department_name
Fix — Move department data to its own table:
CREATE TABLE departments (
department_id INTEGER PRIMARY KEY,
department_name TEXT NOT NULL,
budget DECIMAL(12,2)
);
CREATE TABLE employees (
employee_id INTEGER PRIMARY KEY,
name TEXT NOT NULL,
department_id INTEGER NOT NULL,
FOREIGN KEY (department_id) REFERENCES departments(department_id)
);
Now in 3NF: every non-key column depends only on the primary key.
Boyce-Codd Normal Form (BCNF)
BCNF is a slightly stronger version of 3NF: every determinant must be a candidate key. For most practical database design, 3NF is sufficient.
Summary of Normal Forms
| Normal Form | Removes |
|---|---|
| 1NF | Repeating groups, non-atomic values |
| 2NF | Partial dependencies (on part of composite PK) |
| 3NF | Transitive dependencies (non-key → non-key) |
| BCNF | All anomalies from functional dependencies |
Denormalization — When to Break the Rules
Normalization is best for write-heavy systems (fewer anomalies, easier updates). But for read-heavy systems (analytics, reporting), joining many tables is expensive.
Denormalization intentionally adds redundancy to improve read performance:
-- Denormalised orders table (no need to join for common reports)
CREATE TABLE orders_denorm (
order_id INTEGER PRIMARY KEY,
order_date DATE,
customer_id INTEGER,
customer_name TEXT, -- redundant but fast to read
customer_city TEXT, -- redundant
total_items INTEGER, -- pre-computed
total_amount DECIMAL(10,2) -- pre-computed
);
You trade storage and write complexity for read speed. This is common in:
- Data warehouses (star schema, snowflake schema)
- Analytics databases (BigQuery, Redshift)
- Pre-aggregated reporting tables
Real-World Design Patterns
Audit Columns
Every production table should track when and by whom rows were created or changed:
CREATE TABLE employees (
id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT NOT NULL,
department TEXT NOT NULL,
salary DECIMAL NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
created_by INTEGER, -- user who created the record
updated_by INTEGER -- user who last modified it
);
Soft Deletes
Don't delete rows — mark them as deleted to preserve history:
ALTER TABLE employees ADD COLUMN deleted_at TIMESTAMP;
ALTER TABLE employees ADD COLUMN is_deleted INTEGER DEFAULT 0;
-- "Delete"
UPDATE employees SET is_deleted = 1, deleted_at = CURRENT_TIMESTAMP WHERE id = 5;
-- Normal queries filter out deleted rows
SELECT * FROM employees WHERE is_deleted = 0;
-- Or use a view
CREATE VIEW active_employees AS
SELECT * FROM employees WHERE is_deleted = 0;
Lookup/Reference Tables
Instead of storing string values like 'pending', 'confirmed' everywhere, use a reference table:
CREATE TABLE order_statuses (
id INTEGER PRIMARY KEY,
name TEXT NOT NULL UNIQUE
);
INSERT INTO order_statuses VALUES (1,'pending'),(2,'confirmed'),(3,'shipped'),(4,'delivered'),(5,'cancelled');
CREATE TABLE orders (
id INTEGER PRIMARY KEY,
status_id INTEGER NOT NULL,
FOREIGN KEY (status_id) REFERENCES order_statuses(id)
);
Changing a status name requires updating one row, not thousands.
Versioning / History Tables
Keep a full history of changes:
CREATE TABLE product_prices (
id INTEGER PRIMARY KEY AUTOINCREMENT,
product_id INTEGER NOT NULL,
price DECIMAL(10,2) NOT NULL,
valid_from DATE NOT NULL,
valid_to DATE, -- NULL means currently active
FOREIGN KEY (product_id) REFERENCES products(id)
);
-- Current price
SELECT price FROM product_prices
WHERE product_id = 42 AND valid_to IS NULL;
-- Historical price on a specific date
SELECT price FROM product_prices
WHERE product_id = 42
AND valid_from <= '2025-01-01'
AND (valid_to IS NULL OR valid_to > '2025-01-01');
Complete Example: E-Commerce Schema
CREATE TABLE categories (
id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT NOT NULL UNIQUE
);
CREATE TABLE products (
id INTEGER PRIMARY KEY AUTOINCREMENT,
name VARCHAR(200) NOT NULL,
description TEXT,
price DECIMAL(10,2) NOT NULL CHECK (price >= 0),
stock INTEGER NOT NULL DEFAULT 0 CHECK (stock >= 0),
category_id INTEGER,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
FOREIGN KEY (category_id) REFERENCES categories(id)
);
CREATE TABLE customers (
id INTEGER PRIMARY KEY AUTOINCREMENT,
name VARCHAR(100) NOT NULL,
email VARCHAR(150) NOT NULL UNIQUE,
city VARCHAR(100),
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE orders (
id INTEGER PRIMARY KEY AUTOINCREMENT,
customer_id INTEGER NOT NULL,
status VARCHAR(20) NOT NULL DEFAULT 'pending'
CHECK (status IN ('pending','confirmed','shipped','delivered','cancelled')),
order_date DATE NOT NULL DEFAULT CURRENT_DATE,
total DECIMAL(12,2),
FOREIGN KEY (customer_id) REFERENCES customers(id) ON DELETE RESTRICT
);
CREATE TABLE order_items (
order_id INTEGER NOT NULL,
product_id INTEGER NOT NULL,
qty INTEGER NOT NULL CHECK (qty > 0),
unit_price DECIMAL(10,2) NOT NULL CHECK (unit_price >= 0),
PRIMARY KEY (order_id, product_id),
FOREIGN KEY (order_id) REFERENCES orders(id) ON DELETE CASCADE,
FOREIGN KEY (product_id) REFERENCES products(id) ON DELETE RESTRICT
);
-- Useful indexes
CREATE INDEX idx_orders_customer ON orders (customer_id);
CREATE INDEX idx_orders_date ON orders (order_date);
CREATE INDEX idx_products_category ON products (category_id);
CREATE INDEX idx_order_items_product ON order_items (product_id);
Common Mistakes
1. Storing multiple values in one column
-- Bad
CREATE TABLE employees (
tags TEXT -- "finance, accounting, excel" — hard to query
);
-- Good
CREATE TABLE employee_tags (
employee_id INTEGER,
tag TEXT,
PRIMARY KEY (employee_id, tag)
);
2. Using VARCHAR for status fields
-- Risky — typos like 'Pendin' or 'PENDING' get through
status VARCHAR(20)
-- Better — use a CHECK constraint or FK to a status table
status VARCHAR(20) CHECK (status IN ('pending','confirmed','shipped'))
3. Premature normalization
Normalizing all the way to 5NF for a small lookup table creates unnecessary complexity. Use judgment — 3NF is the standard target for OLTP databases.
4. No surrogate key on junction tables
-- Fine for simple cases
PRIMARY KEY (order_id, product_id)
-- But sometimes you need to reference an order_item specifically:
-- Adding a surrogate key helps
CREATE TABLE order_items (
id INTEGER PRIMARY KEY AUTOINCREMENT, -- surrogate for easy reference
order_id INTEGER NOT NULL,
product_id INTEGER NOT NULL,
UNIQUE (order_id, product_id) -- still enforce uniqueness
);
Practice Exercises
- A table
student_courseshas columns:student_name,student_email,course_name,instructor_name,instructor_email,grade. Identify all violations of 1NF, 2NF, and 3NF, then design a normalised schema. - Design a schema for a library system: books can have multiple authors; members can borrow multiple books; each loan has a due date and return date.
- Add audit columns (
created_at,updated_at,created_by) to theorderstable from the e-commerce schema. - Implement a soft-delete pattern for the
customerstable and create a view that shows only non-deleted customers. - Why might a data warehouse intentionally denormalize and store customer name directly in the
orderstable?
Summary
In this chapter you learned:
- Good database design prevents anomalies, redundancy, and inconsistency
- ER modeling: entities (things), attributes (properties), relationships (1:1, 1:N, M:N)
- 1:N → foreign key in the "many" side; M:N → junction/bridge table
- 1NF — atomic values, no repeating groups, unique rows
- 2NF — no partial dependencies on a composite PK (full key dependency)
- 3NF — no transitive dependencies (non-key → non-key removed)
- BCNF — every determinant is a candidate key
- Denormalization — trading update complexity for read performance (analytics/reporting)
- Audit columns (
created_at,updated_at) — always include in production tables - Soft deletes — preserve history instead of physically deleting rows
- Reference/lookup tables — avoid magic strings scattered across many tables
- History/versioning tables — track changes over time with
valid_from/valid_to
You've now completed the full SQL tutorial! You have the knowledge to design solid schemas, write complex queries, optimise performance, and ensure data integrity in any relational database.