Chapter 18 of 18

Database Design & Normalization

Design efficient, consistent schemas — entity-relationship modeling, normalization (1NF through 3NF/BCNF), denormalization trade-offs, and real-world design patterns.

Meritshot12 min read
SQLDatabase DesignNormalizationER Diagram1NF2NF3NFSchema Design
All SQL Chapters

Why Database Design Matters

Poor database design leads to:

  • Data anomalies — inserting, updating, or deleting a record causes unintended effects in other records
  • Data redundancy — the same information stored in many places; update one and miss others
  • Inconsistency — different rows disagree about the same fact
  • Query difficulty — poorly structured data is hard to query correctly

Good design prevents all of these through normalization and careful schema planning.

The Design Process

  1. Understand requirements — what data needs to be stored? What questions will be asked?
  2. Identify entities — things with distinct existence (Customer, Order, Product)
  3. Identify attributes — properties of each entity (customer: name, email, phone)
  4. Identify relationships — how entities relate (customer places orders)
  5. Choose primary keys — unique identifier for each entity
  6. Normalize — eliminate redundancy while preserving information
  7. Define constraints — enforce business rules at the database level

Entity-Relationship (ER) Modeling

An ER diagram maps the conceptual model before writing any SQL.

Entities and Attributes

CUSTOMER:  customer_id (PK), name, email, phone, city
ORDER:     order_id (PK), order_date, status, total_amount
PRODUCT:   product_id (PK), name, price, category, stock_qty

Relationship Types

TypeDescriptionExample
One-to-One (1:1)One A relates to one BEmployee ↔ Passport
One-to-Many (1:N)One A relates to many BCustomer → Orders
Many-to-Many (M:N)Many A relate to many BOrders ↔ Products

Implementing Relationships in SQL

One-to-Many: FK in the "many" side

CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL,  -- FK to customers
    order_date  DATE,
    FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);
-- One customer has many orders; each order belongs to one customer

Many-to-Many: Junction/bridge table

-- Orders and Products: one order can have many products; one product can be in many orders
CREATE TABLE order_items (
    order_id   INTEGER NOT NULL,
    product_id INTEGER NOT NULL,
    quantity   INTEGER NOT NULL DEFAULT 1,
    unit_price DECIMAL(10,2) NOT NULL,

    PRIMARY KEY (order_id, product_id),
    FOREIGN KEY (order_id)   REFERENCES orders(order_id),
    FOREIGN KEY (product_id) REFERENCES products(product_id)
);

One-to-One: FK with UNIQUE constraint

CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE employee_contracts (
    contract_id INTEGER PRIMARY KEY,
    employee_id INTEGER UNIQUE NOT NULL,  -- UNIQUE enforces 1:1
    start_date  DATE,
    FOREIGN KEY (employee_id) REFERENCES employees(employee_id)
);

What Is Normalization?

Normalization is the process of organising a database to reduce data redundancy and improve data integrity. It's done through a series of Normal Forms (NF) — each one removes a specific type of anomaly.

The Un-normalised Table

Start with a single "flat" table that stores everything:

order_data (bad design)
order_id | customer_name | customer_email     | customer_city | product1 | qty1 | product2 | qty2 | order_date
---------|---------------|--------------------|----|----------|------|----------|------|---
1001     | Priya Sharma  | priya@example.com  | Delhi  | Laptop | 1  | Mouse | 2  | 2026-01-10
1002     | Raj Patel     | raj@example.com    | Mumbai | Phone  | 1  | NULL  | NULL| 2026-01-15
1003     | Priya Sharma  | priya@example.com  | Delhi  | Tablet | 1  | NULL  | NULL| 2026-02-01

Problems:

  • Customer info repeated across orders (update email in one place, miss another)
  • Fixed number of product columns (what if order has 10 items?)
  • Empty NULLs for unused product slots

First Normal Form (1NF)

1NF Rules:

  1. Each column contains atomic (indivisible) values — no lists, no arrays, no repeated columns
  2. Each row must be unique (requires a primary key)
  3. No repeating groups (no product1, product2, product3 columns)

Violation in our table:

  • product1, qty1, product2, qty2 are repeating groups
  • A customer might have multiple phone numbers in one cell: "9999,8888"

Fix — Remove repeating groups:

-- 1NF: separate rows for each product in the order
orders (1NF):
order_id | customer_name | customer_email   | customer_city | product_name | qty | order_date
---------|---------------|------------------|--------------|--------------|-----|----------
1001     | Priya Sharma  | priya@example.com| Delhi        | Laptop       | 1   | 2026-01-10
1001     | Priya Sharma  | priya@example.com| Delhi        | Mouse        | 2   | 2026-01-10
1002     | Raj Patel     | raj@example.com  | Mumbai       | Phone        | 1   | 2026-01-15
1003     | Priya Sharma  | priya@example.com| Delhi        | Tablet       | 1   | 2026-02-01

Now 1NF — but we still have redundancy (customer info repeated for each product in order 1001).

Second Normal Form (2NF)

2NF requires:

  1. Already in 1NF
  2. Every non-key column must depend on the entire primary key — not just part of it

If the primary key is (order_id, product_name), then:

  • qty depends on both (the quantity of a product in a specific order) ✓
  • customer_name depends only on order_id (not on product_name) ✗ — partial dependency

Fix — Split into separate tables:

-- Customers
CREATE TABLE customers (
    customer_id    INTEGER PRIMARY KEY,
    name           TEXT NOT NULL,
    email          TEXT UNIQUE,
    city           TEXT
);

-- Orders (each order belongs to one customer)
CREATE TABLE orders (
    order_id      INTEGER PRIMARY KEY,
    customer_id   INTEGER NOT NULL,
    order_date    DATE    NOT NULL,
    FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);

-- Products
CREATE TABLE products (
    product_id    INTEGER PRIMARY KEY,
    name          TEXT    NOT NULL,
    price         DECIMAL(10,2)
);

-- Order items (the bridge table)
CREATE TABLE order_items (
    order_id    INTEGER NOT NULL,
    product_id  INTEGER NOT NULL,
    qty         INTEGER NOT NULL DEFAULT 1,
    unit_price  DECIMAL(10,2) NOT NULL,
    PRIMARY KEY (order_id, product_id),
    FOREIGN KEY (order_id)   REFERENCES orders(order_id),
    FOREIGN KEY (product_id) REFERENCES products(product_id)
);

Now in 2NF: each column depends on the full primary key of its table.

Third Normal Form (3NF)

3NF requires:

  1. Already in 2NF
  2. No transitive dependencies — non-key column should not depend on another non-key column

Example Violation

-- employees with transitive dependency
employees:
employee_id | name  | department_id | department_name | department_budget
------------|-------|--------------|-----------------|------------------
1           | Priya | 10           | Finance         | 500000
2           | Raj   | 20           | Technology      | 750000
3           | Meera | 10           | Finance         | 500000
  • department_name depends on department_id (not on employee_id)
  • department_budget depends on department_id
  • These are transitive dependencies: employee_id → department_id → department_name

Fix — Move department data to its own table:

CREATE TABLE departments (
    department_id   INTEGER PRIMARY KEY,
    department_name TEXT    NOT NULL,
    budget          DECIMAL(12,2)
);

CREATE TABLE employees (
    employee_id   INTEGER PRIMARY KEY,
    name          TEXT    NOT NULL,
    department_id INTEGER NOT NULL,
    FOREIGN KEY (department_id) REFERENCES departments(department_id)
);

Now in 3NF: every non-key column depends only on the primary key.

Boyce-Codd Normal Form (BCNF)

BCNF is a slightly stronger version of 3NF: every determinant must be a candidate key. For most practical database design, 3NF is sufficient.

Summary of Normal Forms

Normal FormRemoves
1NFRepeating groups, non-atomic values
2NFPartial dependencies (on part of composite PK)
3NFTransitive dependencies (non-key → non-key)
BCNFAll anomalies from functional dependencies

Denormalization — When to Break the Rules

Normalization is best for write-heavy systems (fewer anomalies, easier updates). But for read-heavy systems (analytics, reporting), joining many tables is expensive.

Denormalization intentionally adds redundancy to improve read performance:

-- Denormalised orders table (no need to join for common reports)
CREATE TABLE orders_denorm (
    order_id       INTEGER PRIMARY KEY,
    order_date     DATE,
    customer_id    INTEGER,
    customer_name  TEXT,    -- redundant but fast to read
    customer_city  TEXT,    -- redundant
    total_items    INTEGER, -- pre-computed
    total_amount   DECIMAL(10,2) -- pre-computed
);

You trade storage and write complexity for read speed. This is common in:

  • Data warehouses (star schema, snowflake schema)
  • Analytics databases (BigQuery, Redshift)
  • Pre-aggregated reporting tables

Real-World Design Patterns

Audit Columns

Every production table should track when and by whom rows were created or changed:

CREATE TABLE employees (
    id          INTEGER   PRIMARY KEY AUTOINCREMENT,
    name        TEXT      NOT NULL,
    department  TEXT      NOT NULL,
    salary      DECIMAL   NOT NULL,
    created_at  TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at  TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    created_by  INTEGER,  -- user who created the record
    updated_by  INTEGER   -- user who last modified it
);

Soft Deletes

Don't delete rows — mark them as deleted to preserve history:

ALTER TABLE employees ADD COLUMN deleted_at TIMESTAMP;
ALTER TABLE employees ADD COLUMN is_deleted INTEGER DEFAULT 0;

-- "Delete"
UPDATE employees SET is_deleted = 1, deleted_at = CURRENT_TIMESTAMP WHERE id = 5;

-- Normal queries filter out deleted rows
SELECT * FROM employees WHERE is_deleted = 0;

-- Or use a view
CREATE VIEW active_employees AS
SELECT * FROM employees WHERE is_deleted = 0;

Lookup/Reference Tables

Instead of storing string values like 'pending', 'confirmed' everywhere, use a reference table:

CREATE TABLE order_statuses (
    id   INTEGER PRIMARY KEY,
    name TEXT    NOT NULL UNIQUE
);
INSERT INTO order_statuses VALUES (1,'pending'),(2,'confirmed'),(3,'shipped'),(4,'delivered'),(5,'cancelled');

CREATE TABLE orders (
    id        INTEGER PRIMARY KEY,
    status_id INTEGER NOT NULL,
    FOREIGN KEY (status_id) REFERENCES order_statuses(id)
);

Changing a status name requires updating one row, not thousands.

Versioning / History Tables

Keep a full history of changes:

CREATE TABLE product_prices (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    product_id  INTEGER NOT NULL,
    price       DECIMAL(10,2) NOT NULL,
    valid_from  DATE NOT NULL,
    valid_to    DATE,  -- NULL means currently active
    FOREIGN KEY (product_id) REFERENCES products(id)
);

-- Current price
SELECT price FROM product_prices
WHERE product_id = 42 AND valid_to IS NULL;

-- Historical price on a specific date
SELECT price FROM product_prices
WHERE product_id = 42
  AND valid_from <= '2025-01-01'
  AND (valid_to IS NULL OR valid_to > '2025-01-01');

Complete Example: E-Commerce Schema

CREATE TABLE categories (
    id   INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT NOT NULL UNIQUE
);

CREATE TABLE products (
    id          INTEGER       PRIMARY KEY AUTOINCREMENT,
    name        VARCHAR(200)  NOT NULL,
    description TEXT,
    price       DECIMAL(10,2) NOT NULL CHECK (price >= 0),
    stock       INTEGER       NOT NULL DEFAULT 0 CHECK (stock >= 0),
    category_id INTEGER,
    created_at  TIMESTAMP     DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (category_id) REFERENCES categories(id)
);

CREATE TABLE customers (
    id         INTEGER       PRIMARY KEY AUTOINCREMENT,
    name       VARCHAR(100)  NOT NULL,
    email      VARCHAR(150)  NOT NULL UNIQUE,
    city       VARCHAR(100),
    created_at TIMESTAMP     DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE orders (
    id          INTEGER       PRIMARY KEY AUTOINCREMENT,
    customer_id INTEGER       NOT NULL,
    status      VARCHAR(20)   NOT NULL DEFAULT 'pending'
        CHECK (status IN ('pending','confirmed','shipped','delivered','cancelled')),
    order_date  DATE          NOT NULL DEFAULT CURRENT_DATE,
    total       DECIMAL(12,2),
    FOREIGN KEY (customer_id) REFERENCES customers(id) ON DELETE RESTRICT
);

CREATE TABLE order_items (
    order_id   INTEGER       NOT NULL,
    product_id INTEGER       NOT NULL,
    qty        INTEGER       NOT NULL CHECK (qty > 0),
    unit_price DECIMAL(10,2) NOT NULL CHECK (unit_price >= 0),
    PRIMARY KEY (order_id, product_id),
    FOREIGN KEY (order_id)   REFERENCES orders(id)   ON DELETE CASCADE,
    FOREIGN KEY (product_id) REFERENCES products(id) ON DELETE RESTRICT
);

-- Useful indexes
CREATE INDEX idx_orders_customer   ON orders (customer_id);
CREATE INDEX idx_orders_date       ON orders (order_date);
CREATE INDEX idx_products_category ON products (category_id);
CREATE INDEX idx_order_items_product ON order_items (product_id);

Common Mistakes

1. Storing multiple values in one column

-- Bad
CREATE TABLE employees (
    tags TEXT  -- "finance, accounting, excel" — hard to query
);

-- Good
CREATE TABLE employee_tags (
    employee_id INTEGER,
    tag         TEXT,
    PRIMARY KEY (employee_id, tag)
);

2. Using VARCHAR for status fields

-- Risky — typos like 'Pendin' or 'PENDING' get through
status VARCHAR(20)

-- Better — use a CHECK constraint or FK to a status table
status VARCHAR(20) CHECK (status IN ('pending','confirmed','shipped'))

3. Premature normalization

Normalizing all the way to 5NF for a small lookup table creates unnecessary complexity. Use judgment — 3NF is the standard target for OLTP databases.

4. No surrogate key on junction tables

-- Fine for simple cases
PRIMARY KEY (order_id, product_id)

-- But sometimes you need to reference an order_item specifically:
-- Adding a surrogate key helps
CREATE TABLE order_items (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate for easy reference
    order_id   INTEGER NOT NULL,
    product_id INTEGER NOT NULL,
    UNIQUE (order_id, product_id)                  -- still enforce uniqueness
);

Practice Exercises

  1. A table student_courses has columns: student_name, student_email, course_name, instructor_name, instructor_email, grade. Identify all violations of 1NF, 2NF, and 3NF, then design a normalised schema.
  2. Design a schema for a library system: books can have multiple authors; members can borrow multiple books; each loan has a due date and return date.
  3. Add audit columns (created_at, updated_at, created_by) to the orders table from the e-commerce schema.
  4. Implement a soft-delete pattern for the customers table and create a view that shows only non-deleted customers.
  5. Why might a data warehouse intentionally denormalize and store customer name directly in the orders table?

Summary

In this chapter you learned:

  • Good database design prevents anomalies, redundancy, and inconsistency
  • ER modeling: entities (things), attributes (properties), relationships (1:1, 1:N, M:N)
  • 1:N → foreign key in the "many" side; M:N → junction/bridge table
  • 1NF — atomic values, no repeating groups, unique rows
  • 2NF — no partial dependencies on a composite PK (full key dependency)
  • 3NF — no transitive dependencies (non-key → non-key removed)
  • BCNF — every determinant is a candidate key
  • Denormalization — trading update complexity for read performance (analytics/reporting)
  • Audit columns (created_at, updated_at) — always include in production tables
  • Soft deletes — preserve history instead of physically deleting rows
  • Reference/lookup tables — avoid magic strings scattered across many tables
  • History/versioning tables — track changes over time with valid_from / valid_to

You've now completed the full SQL tutorial! You have the knowledge to design solid schemas, write complex queries, optimise performance, and ensure data integrity in any relational database.