Chapter 6 of 14

Tuples & Sets

Understand immutable tuples and unique-element sets — creation, operations, methods, use cases, and when to choose each.

Meritshot · 34 min read

Part 1 — Tuples

A tuple is an ordered, immutable sequence. Once you create a tuple, you cannot add, remove, or reassign its elements. This makes tuples ideal for data that should remain constant throughout the life of a program — coordinates, database rows, configuration values, and function return values.

coordinates = (10, 20)
colors = ("red", "green", "blue")
print(type(coordinates))  # <class 'tuple'>

Why Should You Care About Tuples?

Beginners often wonder why tuples exist when lists can do everything tuples do and more. Here are the key reasons:

  • Data integrity: immutability guarantees that elements cannot be added, removed, or reassigned by accident.
  • Performance: tuples use less memory than lists and are usually faster to create; element access is about equally fast.
  • Hashable: tuples (of hashable elements) can be used as dictionary keys and set members, whereas lists cannot.
  • Signal intent: using a tuple tells other developers "this collection is not meant to change."
  • Safe defaults: a function that receives a tuple cannot add, remove, or reassign the caller's elements.
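
The memory and hashability points can be checked directly. The following is a minimal sketch, not a benchmark: the sizes reported by sys.getsizeof are a CPython implementation detail and may differ between versions, and the variable names are made up for this illustration.

```python
import sys

as_list = [1, 2, 3]
as_tuple = (1, 2, 3)

# On CPython, a tuple is smaller than a list holding the same elements
list_size = sys.getsizeof(as_list)
tuple_size = sys.getsizeof(as_tuple)
print(tuple_size < list_size)  # True on CPython

# A tuple of hashable elements works as a dictionary key
labels = {(0, 0): "origin"}
print(labels[(0, 0)])  # origin

# A list is rejected as a key
try:
    {[0, 0]: "origin"}
    list_key_ok = True
except TypeError:
    list_key_ok = False
print(list_key_ok)  # False
```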

Creating Tuples

Python offers several ways to create tuples. Understanding all of them helps you recognise tuples when you encounter them in other people's code.

1. Parentheses (Standard Syntax)

fruits = ("apple", "banana", "cherry")
point = (3.5, 7.2)
mixed = (1, "hello", True, 3.14)

2. Without Parentheses (Tuple Packing)

Python actually creates a tuple whenever you write a comma-separated sequence of values — the parentheses are optional in most contexts.

# These two lines create identical tuples
a = 1, 2, 3
b = (1, 2, 3)
print(a == b)  # True
print(type(a)) # <class 'tuple'>

This is called tuple packing — Python "packs" the values into a tuple automatically.

3. Single-Element Tuple (The Trailing Comma)

This is one of the most common gotchas. A single value in parentheses is not a tuple — it is just that value. You need a trailing comma.

# NOT a tuple — just an integer in parentheses
not_a_tuple = (42)
print(type(not_a_tuple))  # <class 'int'>

# THIS is a single-element tuple
single = (42,)
print(type(single))  # <class 'tuple'>
print(len(single))   # 1

# Without parentheses — the comma is what matters
also_single = 42,
print(type(also_single))  # <class 'tuple'>

4. The tuple() Constructor

Convert any iterable into a tuple.

# From a list
from_list = tuple([1, 2, 3])
print(from_list)  # (1, 2, 3)

# From a string (each character becomes an element)
from_string = tuple("Python")
print(from_string)  # ('P', 'y', 't', 'h', 'o', 'n')

# From a range
from_range = tuple(range(5))
print(from_range)  # (0, 1, 2, 3, 4)

# From a set (order not guaranteed from the set)
from_set = tuple({3, 1, 2})
print(from_set)  # order may vary

# From a generator expression
squares = tuple(x ** 2 for x in range(6))
print(squares)  # (0, 1, 4, 9, 16, 25)

5. Empty Tuple

empty_a = ()
empty_b = tuple()
print(empty_a == empty_b)  # True
print(len(empty_a))        # 0

6. Repeating Elements

Like lists, you can use the * operator to repeat a tuple.

zeros = (0,) * 5
print(zeros)  # (0, 0, 0, 0, 0)

pattern = (True, False) * 3
print(pattern)  # (True, False, True, False, True, False)

7. Concatenation

You can combine tuples with +. This creates a new tuple (the originals remain unchanged).

first = (1, 2, 3)
second = (4, 5, 6)
combined = first + second
print(combined)  # (1, 2, 3, 4, 5, 6)
print(first)     # (1, 2, 3) — unchanged

Accessing Tuple Elements

Positive and Negative Indexing

Tuples are zero-indexed, exactly like lists.

languages = ("Python", "Java", "C++", "JavaScript", "Go")

# Positive indexing (left to right, starting at 0)
print(languages[0])   # Python
print(languages[2])   # C++
print(languages[4])   # Go

# Negative indexing (right to left, starting at -1)
print(languages[-1])  # Go
print(languages[-3])  # C++
print(languages[-5])  # Python

IndexError

Accessing an out-of-range index raises an IndexError, just as with lists.

point = (10, 20, 30)
# print(point[5])  # IndexError: tuple index out of range

Slicing Tuples

Slicing works identically to list slicing. The result is always a new tuple.

nums = (0, 1, 2, 3, 4, 5, 6, 7, 8, 9)

print(nums[2:5])    # (2, 3, 4)
print(nums[:4])     # (0, 1, 2, 3)
print(nums[6:])     # (6, 7, 8, 9)
print(nums[::2])    # (0, 2, 4, 6, 8)     — every second element
print(nums[1::2])   # (1, 3, 5, 7, 9)     — odd-indexed elements
print(nums[::-1])   # (9, 8, 7, 6, 5, 4, 3, 2, 1, 0)  — reversed
print(nums[7:2:-1]) # (7, 6, 5, 4, 3)     — backward from 7 to 3

Key point: Slicing never raises an IndexError. Out-of-range slice boundaries are silently clamped.

short = (1, 2, 3)
print(short[0:100])  # (1, 2, 3) — no error
print(short[50:])    # ()        — empty tuple, no error

Tuple Unpacking

Tuple unpacking (also called destructuring) assigns each element of a tuple to a separate variable in a single statement. This is one of the most elegant features in Python.

Basic Unpacking

person = ("Priya", 25, "Mumbai")
name, age, city = person

print(name)  # Priya
print(age)   # 25
print(city)  # Mumbai

The number of variables on the left must match the number of elements in the tuple, otherwise Python raises a ValueError.

# ValueError: too many values to unpack (expected 2)
# a, b = (1, 2, 3)

# ValueError: not enough values to unpack (expected 4, got 3)
# a, b, c, d = (1, 2, 3)
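
If the length of incoming data is not guaranteed, the mismatch can be caught instead of crashing the program. A small sketch, with illustrative variable names:

```python
values = (1, 2, 3)

# Three values but only two names on the left
try:
    a, b = values
except ValueError as exc:
    too_many = str(exc)
print(too_many)  # message begins with "too many values to unpack"

# Three values but four names on the left
try:
    a, b, c, d = values
except ValueError as exc:
    not_enough = str(exc)
print(not_enough)  # message begins with "not enough values to unpack"
```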

Swapping Variables

Tuple unpacking makes variable swapping a one-liner — no temporary variable needed.

a, b = 1, 2
a, b = b, a
print(a, b)  # 2 1

# Works with more than two variables
x, y, z = 1, 2, 3
x, y, z = z, x, y
print(x, y, z)  # 3 1 2

Starred Unpacking with *rest

When you don't know the exact length of a tuple, or you only care about certain positions, use the * operator to collect "the rest" into a list.

numbers = (1, 2, 3, 4, 5, 6, 7)

first, *middle, last = numbers
print(first)   # 1
print(middle)  # [2, 3, 4, 5, 6]  — note: this is a LIST
print(last)    # 7

# Grab only the first two
first, second, *rest = numbers
print(first)   # 1
print(second)  # 2
print(rest)    # [3, 4, 5, 6, 7]

# Grab only the last two
*rest, second_last, last = numbers
print(rest)        # [1, 2, 3, 4, 5]
print(second_last) # 6
print(last)        # 7

Important: The starred variable always becomes a list, even if it captures zero or one element.

a, *b, c = (1, 2)
print(a)  # 1
print(b)  # []  — empty list
print(c)  # 2

Ignoring Values with Underscore

By convention, _ is used as a "throwaway" variable for values you don't need.

record = ("Priya", 25, "Mumbai", "priya@example.com")

name, _, city, _ = record
print(name)  # Priya
print(city)  # Mumbai

# Combine with starred unpacking
name, *_ = record
print(name)  # Priya  — only the name, ignoring everything else

Unpacking in Loops

Tuple unpacking is extremely useful when iterating over sequences of tuples.

students = [
    ("Alice", 88),
    ("Bob", 72),
    ("Charlie", 95),
]

for name, score in students:
    print(f"{name}: {score}")

# With enumerate
for index, (name, score) in enumerate(students, start=1):
    print(f"#{index} {name} scored {score}")

Named Tuples

Regular tuples access elements by numeric index, which hurts readability. Named tuples let you access elements by name while retaining all the benefits of tuples (immutability, hashability, low memory).

Creating Named Tuples with collections.namedtuple

from collections import namedtuple

# Define a named tuple type
Point = namedtuple("Point", ["x", "y"])

# Create instances
p1 = Point(3, 7)
p2 = Point(x=10, y=20)

# Access by name (preferred — more readable)
print(p1.x)  # 3
print(p1.y)  # 7

# Access by index (still works)
print(p2[0])  # 10
print(p2[1])  # 20

# Unpack just like a regular tuple
x, y = p1
print(x, y)  # 3 7

Named Tuples Are Still Immutable

from collections import namedtuple

Color = namedtuple("Color", "red green blue")  # string syntax also works
c = Color(255, 128, 0)

# c.red = 200  # AttributeError: can't set attribute

# To "modify", create a new named tuple with _replace()
c2 = c._replace(red=200)
print(c2)  # Color(red=200, green=128, blue=0)
print(c)   # Color(red=255, green=128, blue=0) — original unchanged

Named Tuples with Default Values

from collections import namedtuple

# Defaults apply to the rightmost fields
Student = namedtuple("Student", ["name", "age", "grade"], defaults=["A"])

s1 = Student("Priya", 22)         # grade defaults to "A"
s2 = Student("Rahul", 24, "B+")   # explicit grade

print(s1)  # Student(name='Priya', age=22, grade='A')
print(s2)  # Student(name='Rahul', age=24, grade='B+')

Converting Named Tuples to Dictionaries

from collections import namedtuple

Employee = namedtuple("Employee", ["name", "department", "salary"])
emp = Employee("Alice", "Engineering", 95000)

# Convert to a dictionary (a plain dict since Python 3.8, an OrderedDict before that)
emp_dict = emp._asdict()
print(emp_dict)
# {'name': 'Alice', 'department': 'Engineering', 'salary': 95000}
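
Two related helpers are worth knowing alongside _asdict(): the _fields attribute and the _make() class method. The sketch below reuses the Employee type from above; the row data is invented for illustration.

```python
from collections import namedtuple

Employee = namedtuple("Employee", ["name", "department", "salary"])

# _fields lists the field names in definition order
print(Employee._fields)  # ('name', 'department', 'salary')

# _make builds an instance from any iterable, such as a row read from a file
row = ["Bob", "Marketing", 72000]
emp = Employee._make(row)
print(emp.department)  # Marketing

# Round trip: named tuple -> dict -> named tuple
restored = Employee(**emp._asdict())
print(restored == emp)  # True
```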

Modern Alternative: typing.NamedTuple

Python 3.6+ offers a class-based syntax with type hints.

from typing import NamedTuple

class Coordinate(NamedTuple):
    latitude: float
    longitude: float
    label: str = "Unknown"

loc = Coordinate(28.6139, 77.2090, "New Delhi")
print(loc.label)      # New Delhi
print(loc.latitude)   # 28.6139

Tuple Methods

Tuples have only two built-in methods, because they are immutable.

Method                  Description                               Returns
count(x)                Number of times x appears in the tuple    int
index(x, start, end)    Index of first occurrence of x            int (raises ValueError if missing)

data = (10, 20, 30, 20, 40, 20, 50)

# count — how many times does 20 appear?
print(data.count(20))   # 3
print(data.count(99))   # 0

# index — where is the first 20?
print(data.index(20))       # 1
print(data.index(20, 2))    # 3  — search starting from index 2
print(data.index(20, 4))    # 5  — search starting from index 4

# index raises ValueError if not found
# data.index(99)  # ValueError: tuple.index(x): x not in tuple

Built-in Functions That Work with Tuples

Although tuples have only two methods, many built-in functions accept tuples.

Function        Description                        Example
len(t)          Number of elements                 len((1,2,3)) returns 3
min(t)          Smallest element                   min((3,1,2)) returns 1
max(t)          Largest element                    max((3,1,2)) returns 3
sum(t)          Sum of all elements                sum((1,2,3)) returns 6
sorted(t)       New sorted list from tuple         sorted((3,1,2)) returns [1,2,3]
reversed(t)     Reverse iterator                   tuple(reversed((1,2,3))) returns (3,2,1)
any(t)          True if any element is truthy      any((0, False, 1)) returns True
all(t)          True if all elements are truthy    all((1, True, "hi")) returns True
enumerate(t)    Iterator of (index, item) pairs    See iteration section
zip(a, b)       Pair elements from two tuples      See iteration section

scores = (78, 92, 85, 63, 97, 88)

print(len(scores))    # 6
print(min(scores))    # 63
print(max(scores))    # 97
print(sum(scores))    # 503
print(sorted(scores)) # [63, 78, 85, 88, 92, 97]  — returns a LIST

Immutability — What It Really Means

Tuples are immutable, which means you cannot reassign, add, or remove elements.

t = (1, 2, 3)

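# All of these raise an exception:
# t[0] = 99      # TypeError: 'tuple' object does not support item assignment
# t.append(4)    # AttributeError: 'tuple' object has no attribute 'append'
# del t[0]       # TypeError: 'tuple' object doesn't support item deletion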

Immutability Does NOT Mean the Contents Cannot Change

This is a crucial subtlety. If a tuple contains a mutable object (like a list or dictionary), you can modify that object in place. The tuple itself doesn't change — it still holds the same reference — but the object the reference points to changes.

# A tuple containing a list
t = (1, [2, 3], 4)

# You CANNOT replace the list with a different object
# t[1] = [20, 30]  # TypeError

# But you CAN modify the list IN PLACE
t[1].append(99)
print(t)  # (1, [2, 3, 99], 4)

t[1][0] = 200
print(t)  # (1, [200, 3, 99], 4)

This is because immutability applies to the references stored in the tuple, not to the objects those references point to. The tuple still holds the same list object — it's the list's contents that changed.

Best practice: If you want truly frozen data, make sure every element is also immutable (use tuples, strings, numbers, frozensets — not lists or dicts).
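
One possible way to follow this advice for nested data is a small recursive converter. The deep_freeze function below is a hypothetical helper written for this example (it is not part of the standard library), and it assumes that anything other than a list, tuple, set, or dict is already immutable.

```python
def deep_freeze(value):
    """Recursively convert lists, sets, and dicts into immutable equivalents."""
    if isinstance(value, (list, tuple)):
        return tuple(deep_freeze(item) for item in value)
    if isinstance(value, (set, frozenset)):
        return frozenset(deep_freeze(item) for item in value)
    if isinstance(value, dict):
        # A dict becomes a frozenset of (key, frozen value) pairs
        return frozenset((key, deep_freeze(val)) for key, val in value.items())
    return value  # assumed to be immutable already (int, str, ...)

record = (1, [2, 3], {"tags": ["a", "b"]})
frozen = deep_freeze(record)

print(frozen)  # (1, (2, 3), frozenset({('tags', ('a', 'b'))}))

# The frozen version is hashable, so it can be a dict key or set member
lookup = {frozen: "cached result"}
print(lookup[frozen])  # cached result
```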

Hashability Depends on Contents

A tuple is hashable only if all its elements are hashable. This matters when you try to use a tuple as a dictionary key or add it to a set.

# Hashable tuple: every element is itself hashable
hashable = (1, "hello", (2, 3))
print(hash(hashable))  # works fine
my_dict = {hashable: "value"}  # works as a dict key

# NOT hashable — contains a mutable list
unhashable = (1, [2, 3])
# hash(unhashable)  # TypeError: unhashable type: 'list'
# my_dict = {unhashable: "value"}  # TypeError

Tuple Use Cases

1. Dictionary Keys

Lists cannot be dictionary keys because they are mutable and therefore unhashable. Tuples of hashable elements can.

# Using (row, col) tuples as keys for a sparse grid
grid = {}
grid[(0, 0)] = "start"
grid[(2, 5)] = "treasure"
grid[(4, 4)] = "exit"

print(grid[(2, 5)])  # treasure

# Counting occurrences of coordinate pairs
from collections import Counter
clicks = [(100, 200), (150, 300), (100, 200), (100, 200)]
click_counts = Counter(clicks)
print(click_counts)
# Counter({(100, 200): 3, (150, 300): 1})

2. Function Return Values

Functions frequently return multiple values as tuples.

def divide(a, b):
    """Return both the quotient and remainder."""
    quotient = a // b
    remainder = a % b
    return quotient, remainder  # returns a tuple

q, r = divide(17, 5)
print(f"17 / 5 = {q} remainder {r}")  # 17 / 5 = 3 remainder 2

# You can also capture the result as a single tuple
result = divide(17, 5)
print(result)    # (3, 2)
print(result[0]) # 3
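
For this particular calculation, Python's built-in divmod() already returns the quotient and remainder as a tuple, so the same unpacking pattern applies to it:

```python
q, r = divmod(17, 5)
print(q, r)  # 3 2

# For integers, divmod(a, b) gives the same pair as (a // b, a % b)
same_pair = divmod(17, 5) == (17 // 5, 17 % 5)
print(same_pair)  # True
```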

3. Data Integrity — Read-Only Records

Use tuples when data should not be modified after creation.

# Database-style records
employees = [
    ("E001", "Alice", "Engineering", 95000),
    ("E002", "Bob", "Marketing", 72000),
    ("E003", "Charlie", "Engineering", 88000),
]

# Safe to iterate — nobody can accidentally modify a record
for emp_id, name, dept, salary in employees:
    print(f"{emp_id}: {name} ({dept}) — ${salary:,}")

4. Tuples as Set Elements

Because tuples of hashable elements are themselves hashable, you can put them inside sets.

# Unique coordinate pairs
visited = set()
visited.add((0, 0))
visited.add((1, 2))
visited.add((0, 0))  # duplicate — ignored
print(visited)  # {(0, 0), (1, 2)}

# Check if a coordinate has been visited
print((1, 2) in visited)  # True
print((3, 4) in visited)  # False

5. String Formatting

The % formatting operator expects a tuple when you supply more than one value.

name = "Priya"
age = 25
print("Name: %s, Age: %d" % (name, age))
# Name: Priya, Age: 25
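
A related pitfall appears when the value you want to format is itself a tuple: the % operator treats it as the list of arguments. The sketch below (variable names are illustrative) shows the error and two ways around it.

```python
point = (3, 4)

# A bare tuple on the right is unpacked as two arguments for one placeholder
try:
    text = "Point: %s" % point
except TypeError as exc:
    error = str(exc)
print(error)  # not all arguments converted during string formatting

# Wrapping it in a one-element tuple formats the tuple itself
text = "Point: %s" % (point,)
print(text)  # Point: (3, 4)

# f-strings avoid the ambiguity altogether
f_text = f"Point: {point}"
print(f_text)  # Point: (3, 4)
```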

When to Use Tuples Over Lists

Situation                                                 Use a Tuple   Use a List
Data should not change                                    Yes           No
Need to use as dict key                                   Yes           No
Need to use as set member                                 Yes           No
Returning multiple values from a function                 Yes           No
Fixed collection of heterogeneous items (like a record)   Yes           No
Collection will grow/shrink                               No            Yes
Need append(), sort(), remove() etc.                      No            Yes
Element order matters and items may be modified           No            Yes

Rule of thumb: If the collection represents a fixed record (name, age, score), use a tuple. If the collection represents a dynamic group of similar items (list of names, list of scores), use a list.


Part 2 — Sets

A set is an unordered collection of unique elements. Sets automatically discard duplicates and provide average-case O(1) membership testing. They are Python's implementation of the mathematical set concept.

fruits = {"apple", "banana", "cherry"}
print(type(fruits))  # <class 'set'>

Key Characteristics

  • Unordered — elements have no defined position; you cannot index or slice a set.
  • Unique — duplicate values are automatically removed.
  • Mutable — you can add and remove elements (unlike tuples).
  • Elements must be hashable — you can store integers, strings, and tuples, but not lists, dicts, or other sets.
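
These characteristics can be observed in a few lines. The sketch below is illustrative only; exact error message wording can vary between Python versions, so the comments describe the messages rather than quote them.

```python
tags = {"python", "sql", "python"}
print(len(tags))  # 2 (the duplicate was dropped)

# Sets do not support indexing
try:
    tags[0]
except TypeError as exc:
    index_error = str(exc)
print(index_error)  # says the set object is not subscriptable

# Unhashable elements such as lists are rejected
try:
    bad = {[1, 2], [3, 4]}
except TypeError as exc:
    element_error = str(exc)
print(element_error)  # mentions an unhashable type

# Tuples and frozensets are accepted because they are hashable
ok = {(1, 2), frozenset({3, 4})}
print(len(ok))  # 2
```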

Creating Sets

1. Curly Braces

colors = {"red", "green", "blue"}
numbers = {1, 2, 3, 2, 1}  # duplicates removed
print(numbers)  # {1, 2, 3}

Important: An empty {} creates a dictionary, not a set. Use set() for an empty set.

empty_dict = {}
empty_set = set()

print(type(empty_dict))  # <class 'dict'>
print(type(empty_set))   # <class 'set'>

2. The set() Constructor

# From a list (removes duplicates)
from_list = set([1, 2, 3, 2, 1])
print(from_list)  # {1, 2, 3}

# From a string (each character becomes an element)
from_string = set("mississippi")
print(from_string)  # {'m', 'i', 's', 'p'}  — unique characters only

# From a tuple
from_tuple = set((10, 20, 30, 20))
print(from_tuple)  # {10, 20, 30}

# From a range
from_range = set(range(5))
print(from_range)  # {0, 1, 2, 3, 4}

3. Set Comprehensions

Set comprehensions follow the same syntax as list comprehensions but use curly braces.

# Squares of numbers 1 through 10
squares = {x ** 2 for x in range(1, 11)}
print(squares)  # {1, 4, 9, 16, 25, 36, 49, 64, 81, 100}

# Unique word lengths from a sentence
sentence = "the quick brown fox jumps over the lazy dog"
word_lengths = {len(word) for word in sentence.split()}
print(word_lengths)  # {3, 4, 5}

# With a condition — only even squares
even_squares = {x ** 2 for x in range(1, 11) if x % 2 == 0}
print(even_squares)  # {4, 16, 36, 64, 100}

# Unique first letters (case-insensitive)
names = ["Alice", "Bob", "anna", "Charlie", "bob"]
initials = {name[0].upper() for name in names}
print(initials)  # {'A', 'B', 'C'}

Adding and Removing Elements

add() — Add a Single Element

skills = {"Python", "SQL"}
skills.add("Tableau")
print(skills)  # {'Python', 'SQL', 'Tableau'}

# Adding an element that already exists does nothing
skills.add("Python")
print(skills)  # {'Python', 'SQL', 'Tableau'} — unchanged

update() — Add Multiple Elements

update() accepts any iterable (list, tuple, set, string, etc.).

skills = {"Python"}
skills.update(["SQL", "Tableau"])
skills.update(("R", "Excel"))
skills.update({"Spark"})

print(skills)
# {'Python', 'SQL', 'Tableau', 'R', 'Excel', 'Spark'}

# Updating with a string adds EACH CHARACTER
letters = {"a", "b"}
letters.update("cd")
print(letters)  # {'a', 'b', 'c', 'd'}

remove() — Remove (Raises Error if Missing)

skills = {"Python", "SQL", "Tableau"}
skills.remove("SQL")
print(skills)  # {'Python', 'Tableau'}

# Raises KeyError if the element is not found
# skills.remove("Java")  # KeyError: 'Java'

discard() — Remove (No Error if Missing)

skills = {"Python", "SQL", "Tableau"}
skills.discard("SQL")
print(skills)  # {'Python', 'Tableau'}

# No error if the element doesn't exist
skills.discard("Java")  # silently does nothing
print(skills)  # {'Python', 'Tableau'}

pop() — Remove and Return an Arbitrary Element

Since sets are unordered, you cannot predict which element will be removed.

numbers = {10, 20, 30}
removed = numbers.pop()
print(f"Removed: {removed}")  # could be 10, 20, or 30
print(numbers)  # the remaining two elements

# Raises KeyError on empty set
# set().pop()  # KeyError: 'pop from an empty set'

clear() — Remove All Elements

items = {1, 2, 3}
items.clear()
print(items)  # set()

Set Operations — Mathematical Set Theory

This is where sets truly shine. Python supports all standard set operations, both as operators and methods. The method syntax is more flexible because it accepts any iterable as an argument, while the operator syntax requires both operands to be sets.
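
The difference between the two syntaxes can be seen in a short sketch (values chosen for illustration):

```python
a = {1, 2, 3}

# Methods accept any iterable
from_list = a.union([3, 4])
from_range = a.intersection(range(2, 10))
print(from_list)   # {1, 2, 3, 4}
print(from_range)  # {2, 3}

# Operators need a set or frozenset on both sides
try:
    a | [3, 4]
except TypeError:
    operator_ok = False
else:
    operator_ok = True
print(operator_ok)  # False

# Converting the other operand first makes the operator work
converted = a | set([3, 4])
print(converted)  # {1, 2, 3, 4}
```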

Union — All Elements from Both Sets

a = {1, 2, 3, 4}
b = {3, 4, 5, 6}

# Operator syntax
print(a | b)         # {1, 2, 3, 4, 5, 6}

# Method syntax (accepts any iterable)
print(a.union(b))    # {1, 2, 3, 4, 5, 6}
print(a.union([5, 6, 7]))  # {1, 2, 3, 4, 5, 6, 7}

# Multiple sets at once
c = {7, 8}
print(a | b | c)                # {1, 2, 3, 4, 5, 6, 7, 8}
print(a.union(b, c))            # {1, 2, 3, 4, 5, 6, 7, 8}

# In-place union (modifies the left-hand set; a copy is used so a stays unchanged)
a_copy = a.copy()
a_copy |= b
print(a_copy)  # {1, 2, 3, 4, 5, 6}
# Or equivalently: a_copy.update(b)

Intersection — Elements Common to Both Sets

a = {1, 2, 3, 4}
b = {3, 4, 5, 6}

# Operator syntax
print(a & b)              # {3, 4}

# Method syntax
print(a.intersection(b))  # {3, 4}

# Multiple sets
c = {3, 4, 7, 8}
print(a & b & c)                  # {3, 4}
print(a.intersection(b, c))      # {3, 4}

# In-place intersection
a_copy = a.copy()
a_copy &= b
print(a_copy)  # {3, 4}
# Or equivalently: a_copy.intersection_update(b)

Difference — Elements in First Set but Not in Second

a = {1, 2, 3, 4}
b = {3, 4, 5, 6}

# Operator syntax
print(a - b)              # {1, 2}    — in a but NOT in b
print(b - a)              # {5, 6}    — in b but NOT in a

# Method syntax
print(a.difference(b))    # {1, 2}

# Multiple sets — remove elements found in ANY of b, c
c = {2, 7}
print(a - b - c)                  # {1}
print(a.difference(b, c))        # {1}

# In-place difference
a_copy = a.copy()
a_copy -= b
print(a_copy)  # {1, 2}
# Or equivalently: a_copy.difference_update(b)

Symmetric Difference — Elements in Either Set but Not Both

This is the union minus the intersection: it gives you the elements that belong to exactly one of the two sets.

a = {1, 2, 3, 4}
b = {3, 4, 5, 6}

# Operator syntax
print(a ^ b)                          # {1, 2, 5, 6}

# Method syntax
print(a.symmetric_difference(b))      # {1, 2, 5, 6}

# In-place symmetric difference
a_copy = a.copy()
a_copy ^= b
print(a_copy)  # {1, 2, 5, 6}
# Or equivalently: a_copy.symmetric_difference_update(b)

Note: Symmetric difference with multiple sets using ^ is chained pairwise: a ^ b ^ c first computes a ^ b, then XORs the result with c. This is not the same as "elements appearing in exactly one of the three sets."
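
A small worked example makes the difference concrete. With three overlapping sets, the chained ^ keeps any element that appears in an odd number of sets (one or all three), whereas "exactly one" has to be computed separately. The sets below are chosen only for illustration.

```python
a = {1, 2, 3}
b = {2, 3, 4}
c = {3, 4, 5}

# Chained XOR: (a ^ b) is {1, 4}, then {1, 4} ^ c
chained = a ^ b ^ c
print(chained)  # {1, 3, 5}  (3 is in all three sets, yet it survives)

# Elements in exactly one set: everything, minus anything shared by a pair
exactly_one = (a | b | c) - (a & b) - (a & c) - (b & c)
print(exactly_one)  # {1, 5}
```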

Subset and Superset

a = {1, 2, 3}
b = {1, 2, 3, 4, 5}

# Is a a subset of b? (every element of a is in b)
print(a <= b)           # True
print(a.issubset(b))    # True

# Is a a proper subset of b? (subset and not equal)
print(a < b)            # True

# Is b a superset of a? (b contains all elements of a)
print(b >= a)           # True
print(b.issuperset(a))  # True

# Is b a proper superset?
print(b > a)            # True

# Equal sets
c = {3, 2, 1}
print(a == c)     # True  — order doesn't matter
print(a <= c)     # True  — a set is a subset of itself
print(a < c)      # False — not a PROPER subset (they're equal)

Disjoint Sets — No Elements in Common

evens = {2, 4, 6, 8}
odds = {1, 3, 5, 7}
primes = {2, 3, 5, 7}

print(evens.isdisjoint(odds))    # True  — no overlap
print(evens.isdisjoint(primes))  # False — they share {2}
print(odds.isdisjoint(primes))   # False — they share {3, 5, 7}

Set Operations Summary Table

Operation         Operator   Method                      In-Place Method                              Description
Union             a | b      a.union(b)                  a.update(b) or a |= b                        All elements from both
Intersection      a & b      a.intersection(b)           a.intersection_update(b) or a &= b           Common elements
Difference        a - b      a.difference(b)             a.difference_update(b) or a -= b             In a but not b
Symmetric Diff    a ^ b      a.symmetric_difference(b)   a.symmetric_difference_update(b) or a ^= b   In either but not both
Subset            a <= b     a.issubset(b)               n/a                                          All of a in b?
Proper Subset     a < b      n/a                         n/a                                          Subset and not equal?
Superset          a >= b     a.issuperset(b)             n/a                                          All of b in a?
Proper Superset   a > b      n/a                         n/a                                          Superset and not equal?
Disjoint          n/a        a.isdisjoint(b)             n/a                                          No common elements?

Membership Testing — O(1) Performance

One of the most important practical reasons to use sets is fast membership testing. Checking x in my_set runs in O(1) average time (constant time, regardless of set size), compared to O(n) for lists and tuples.

import time

# Create a large list and set with the same data
data = list(range(10_000_000))  # 10 million integers
data_set = set(data)

target = 9_999_999  # worst case for list (last element)

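# List lookup: O(n)
start = time.perf_counter()  # perf_counter is the high-resolution clock meant for timing
found_list = target in data
list_time = time.perf_counter() - start

# Set lookup: O(1) on average
start = time.perf_counter()
found_set = target in data_set
set_time = time.perf_counter() - start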

print(f"List: {list_time:.6f}s")  # much slower
print(f"Set:  {set_time:.6f}s")   # nearly instant

Practical rule: If you need to check membership repeatedly (especially in a loop), convert your data to a set first. The one-time cost of building the set is quickly recovered.

# BAD — O(n) per lookup, O(n * m) total
allowed_list = ["admin", "editor", "viewer", "moderator", "analyst"]
users = [("Alice", "admin"), ("Bob", "hacker"), ("Charlie", "viewer")]

for name, role in users:
    if role in allowed_list:  # O(n) each time
        print(f"{name}: allowed")

# GOOD — O(1) per lookup
allowed_set = set(allowed_list)
for name, role in users:
    if role in allowed_set:  # O(1) each time
        print(f"{name}: allowed")

Frozen Sets

A frozenset is an immutable version of a set. It supports every non-mutating set operation (union, intersection, membership tests, and so on) but cannot be modified after creation: there is no add(), remove(), discard(), pop(), or clear().

# Create a frozenset
fs = frozenset([1, 2, 3, 4, 5])
print(fs)       # frozenset({1, 2, 3, 4, 5})
print(type(fs)) # <class 'frozenset'>

# All read-only operations work
print(3 in fs)              # True
print(fs | {6, 7})          # frozenset({1, 2, 3, 4, 5, 6, 7})
print(fs & {3, 4, 5, 6})   # frozenset({3, 4, 5})

# Modification operations raise AttributeError
# fs.add(6)      # AttributeError: 'frozenset' object has no attribute 'add'
# fs.remove(1)   # AttributeError

Why Use Frozensets?

Since frozensets are hashable, they can be used where regular sets cannot:

# 1. As dictionary keys
permissions = {
    frozenset({"read", "write"}): "Editor",
    frozenset({"read"}): "Viewer",
    frozenset({"read", "write", "admin"}): "Admin",
}

user_perms = frozenset({"read", "write"})
print(permissions[user_perms])  # Editor

# 2. As elements of another set (set of sets)
groups = set()
groups.add(frozenset({1, 2, 3}))
groups.add(frozenset({4, 5, 6}))
groups.add(frozenset({1, 2, 3}))  # duplicate — ignored
print(groups)  # {frozenset({1, 2, 3}), frozenset({4, 5, 6})}

# 3. As a safe default in function parameters
def process(items=frozenset()):
    """Default is a frozenset — no risk of mutable default argument bug."""
    for item in items:
        print(item)

Common Set Patterns

1. Removing Duplicates from a List

# Simple but DOES NOT preserve order
names = ["Priya", "Rahul", "Priya", "Ananya", "Rahul", "Priya"]
unique = list(set(names))
print(unique)  # order may vary
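
If any consistent order is acceptable (rather than the original order), sorting the set gives a reproducible result. A minimal sketch using the same names list:

```python
names = ["Priya", "Rahul", "Priya", "Ananya", "Rahul", "Priya"]

# sorted() accepts any iterable and always returns a list
unique_sorted = sorted(set(names))
print(unique_sorted)  # ['Ananya', 'Priya', 'Rahul']
```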

2. Removing Duplicates While Preserving Order

# Method 1: dict.fromkeys() — Python 3.7+
names = ["Priya", "Rahul", "Priya", "Ananya", "Rahul", "Priya"]
unique_ordered = list(dict.fromkeys(names))
print(unique_ordered)  # ['Priya', 'Rahul', 'Ananya']

# Method 2: Manual loop with a "seen" set
def deduplicate(items):
    """Remove duplicates while preserving insertion order."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result

print(deduplicate(names))  # ['Priya', 'Rahul', 'Ananya']

# Method 3: Using a generator (memory-efficient for large data)
def unique_gen(items):
    seen = set()
    for item in items:
        if item not in seen:
            seen.add(item)
            yield item

print(list(unique_gen(names)))  # ['Priya', 'Rahul', 'Ananya']

3. Finding Common Elements Between Collections

# Students enrolled in different courses
python_students = {"Alice", "Bob", "Charlie", "Diana", "Eve"}
sql_students = {"Bob", "Charlie", "Frank", "Grace"}
tableau_students = {"Charlie", "Diana", "Grace", "Helen"}

# Students in ALL three courses
all_three = python_students & sql_students & tableau_students
print(f"All three courses: {all_three}")  # {'Charlie'}

# Students in Python OR SQL (at least one)
python_or_sql = python_students | sql_students
print(f"Python or SQL: {python_or_sql}")

# Students in Python but NOT SQL
python_only = python_students - sql_students
print(f"Python only: {python_only}")  # {'Alice', 'Diana', 'Eve'}

# Students in exactly one of the three courses
# (take each course's students and remove anyone enrolled in either of the other two)
in_exactly_one = (
    (python_students - sql_students - tableau_students) |
    (sql_students - python_students - tableau_students) |
    (tableau_students - python_students - sql_students)
)
print(f"Exactly one course: {in_exactly_one}")
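
The three-way expression above grows quickly as more courses are added. One alternative, sketched here with a hypothetical helper named in_exactly_one, is to count how many sets each member appears in using collections.Counter and keep those with a count of one.

```python
from collections import Counter

def in_exactly_one(*groups):
    """Return the elements that appear in exactly one of the given sets."""
    counts = Counter(member for group in groups for member in group)
    return {member for member, n in counts.items() if n == 1}

python_students = {"Alice", "Bob", "Charlie", "Diana", "Eve"}
sql_students = {"Bob", "Charlie", "Frank", "Grace"}
tableau_students = {"Charlie", "Diana", "Grace", "Helen"}

result = in_exactly_one(python_students, sql_students, tableau_students)
print(sorted(result))  # ['Alice', 'Eve', 'Frank', 'Helen']
```

Because each set contains a member at most once, the count for a member equals the number of sets it belongs to.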

4. Data Validation with Sets

# Validate that user input contains only allowed characters
allowed_chars = set("abcdefghijklmnopqrstuvwxyz0123456789_")

def is_valid_username(username):
    """Check if username contains only allowed characters."""
    return set(username.lower()).issubset(allowed_chars)

print(is_valid_username("priya_123"))   # True
print(is_valid_username("bob@email"))   # False — @ not allowed
print(is_valid_username("hello world")) # False — space not allowed
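
When a username is rejected, it is often useful to tell the user why. Set difference gives the offending characters directly. The invalid_characters function below is a hypothetical companion to is_valid_username, written for this example.

```python
allowed_chars = set("abcdefghijklmnopqrstuvwxyz0123456789_")

def invalid_characters(username):
    """Return a sorted list of characters in username that are not allowed."""
    return sorted(set(username.lower()) - allowed_chars)

print(invalid_characters("priya_123"))     # []
print(invalid_characters("bob@email"))     # ['@']
print(invalid_characters("hello world!"))  # [' ', '!']
```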

5. Set Algebra for Data Analysis

# Comparing two time periods
jan_customers = {"Alice", "Bob", "Charlie", "Diana", "Eve"}
feb_customers = {"Bob", "Diana", "Frank", "Grace", "Helen"}

# New customers in February (not seen in January)
new_customers = feb_customers - jan_customers
print(f"New in Feb: {new_customers}")  # {'Frank', 'Grace', 'Helen'}

# Churned customers (were in Jan, gone in Feb)
churned = jan_customers - feb_customers
print(f"Churned: {churned}")  # {'Alice', 'Charlie', 'Eve'}

# Retained customers (in both months)
retained = jan_customers & feb_customers
print(f"Retained: {retained}")  # {'Bob', 'Diana'}

# Retention rate
retention_rate = len(retained) / len(jan_customers) * 100
print(f"Retention rate: {retention_rate:.1f}%")  # 40.0%

Comparison Table — List vs Tuple vs Set vs Frozenset

Feature               list                  tuple                      set                         frozenset
Syntax                [1, 2, 3]             (1, 2, 3)                  {1, 2, 3}                   frozenset({1, 2, 3})
Ordered               Yes                   Yes                        No                          No
Mutable               Yes                   No                         Yes                         No
Allows duplicates     Yes                   Yes                        No                          No
Indexing / slicing    Yes                   Yes                        No                          No
Hashable              No                    Yes*                       No                          Yes
Can be dict key       No                    Yes*                       No                          Yes
Can be set element    No                    Yes*                       No                          Yes
in operator speed     O(n)                  O(n)                       O(1) average                O(1) average
Memory usage          Higher                Lower                      Higher                      Higher
Best use case         Dynamic collections   Fixed records, dict keys   Unique items, fast lookup   Immutable unique groups

*Tuples are hashable only if all their elements are also hashable.

Quick Decision Guide

  • Need to modify the collection? Use list or set.
  • Need order? Use list or tuple.
  • Need uniqueness? Use set or frozenset.
  • Need fast membership testing? Use set or frozenset.
  • Need to use as a dict key? Use tuple or frozenset.
  • Data should never change? Use tuple or frozenset.

Practical Examples

Example 1: Frequency Analysis with Counter

from collections import Counter

# Analyse the frequency of words in a text
text = """
Python is a great programming language. Python is used for data science.
Data science is one of the most popular fields. Python and data science
are growing together. Python makes data analysis easy and fun.
"""

# Normalise and tokenise
words = text.lower().split()
# Remove punctuation from each word
words = [word.strip(".,!?;:") for word in words]

# Count frequencies
word_counts = Counter(words)

# Most common words
print("Top 5 most frequent words:")
for word, count in word_counts.most_common(5):
    print(f"  '{word}' — {count} times")

# Unique words (using a set)
unique_words = set(words)
print(f"\nTotal words: {len(words)}")
print(f"Unique words: {len(unique_words)}")
print(f"Vocabulary richness: {len(unique_words)/len(words):.2%}")

# Words that appear exactly once (hapax legomena)
hapax = {word for word, count in word_counts.items() if count == 1}
print(f"Words appearing only once: {hapax}")

Example 2: De-Duplication Pipeline

def clean_email_list(raw_emails):
    """
    Clean a list of email addresses:
    1. Strip whitespace
    2. Convert to lowercase
    3. Remove duplicates while preserving order
    4. Validate format (basic check)
    """
    seen = set()
    cleaned = []

    for email in raw_emails:
        # Step 1 & 2: normalise
        email = email.strip().lower()

        # Step 3: skip if already seen
        if email in seen:
            continue

        # Step 4: basic validation
        if "@" not in email or "." not in email.split("@")[-1]:
            print(f"  Skipped invalid: {email}")
            continue

        seen.add(email)
        cleaned.append(email)

    return cleaned

# Test data
raw = [
    "  Alice@Example.com ",
    "bob@test.com",
    "alice@example.com",      # duplicate (case-insensitive)
    "CHARLIE@test.com",
    "bob@test.com",           # exact duplicate
    "invalid-email",          # no @
    "diana@company.org",
    "  BOB@TEST.COM  ",      # duplicate with whitespace
]

result = clean_email_list(raw)
print(f"\nCleaned list ({len(result)} emails):")
for email in result:
    print(f"  {email}")

# Output:
#   Skipped invalid: invalid-email
#
# Cleaned list (4 emails):
#   alice@example.com
#   bob@test.com
#   charlie@test.com
#   diana@company.org

Example 3: Finding Unique Visitors Across Days

# Simulated web analytics — visitor IDs for each day of the week
monday = {"user_001", "user_002", "user_003", "user_004"}
tuesday = {"user_002", "user_003", "user_005", "user_006"}
wednesday = {"user_001", "user_003", "user_006", "user_007"}
thursday = {"user_003", "user_004", "user_008"}
friday = {"user_001", "user_002", "user_005", "user_009"}

all_days = [monday, tuesday, wednesday, thursday, friday]
day_names = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]

# Total unique visitors across the week
all_visitors = set()
for day in all_days:
    all_visitors |= day  # union
print(f"Total unique visitors: {len(all_visitors)}")
# {user_001 through user_009} = 9

# Visitors who came EVERY day
daily_visitors = monday.copy()  # copy first: &= would otherwise modify monday in place
for day in all_days[1:]:
    daily_visitors &= day  # intersection
print(f"Visited every day: {daily_visitors}")
# Prints set(): user_003 came Monday to Thursday but missed Friday

# First-time visitors each day
seen = set()
for name, day in zip(day_names, all_days):
    new_visitors = day - seen
    print(f"  {name}: {len(new_visitors)} new visitors — {new_visitors}")
    seen |= day

# Daily unique visitor counts
print("\nDaily visitor counts:")
for name, day in zip(day_names, all_days):
    print(f"  {name}: {len(day)} visitors")
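
The union and intersection loops in this example can also be collapsed into single calls. The sketch below uses smaller, hypothetical visitor sets (`day_one`, `day_two`, `day_three`) rather than the data above. Note that `set.intersection()` returns a new set, whereas `&=` updates the left-hand set in place.

```python
# Hypothetical visitor sets, smaller than the weekly data
day_one = {"u1", "u2", "u3"}
day_two = {"u2", "u3", "u4"}
day_three = {"u3", "u5"}
days = [day_one, day_two, day_three]

# Union of every set in one call; starting from set() also copes with an empty list
everyone = set().union(*days)

# Intersection of every set; needs at least one set in the list
regulars = set.intersection(*days)

print(sorted(everyone))  # ['u1', 'u2', 'u3', 'u4', 'u5']
print(regulars)          # {'u3'}
print(len(day_one))      # 3, the original set is untouched
```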

Example 4: Using Tuples and Sets Together

from collections import namedtuple

# Tracking student course registrations
Registration = namedtuple("Registration", ["student", "course", "semester"])

registrations = [
    Registration("Alice", "Python", "Fall 2025"),
    Registration("Bob", "Python", "Fall 2025"),
    Registration("Alice", "SQL", "Fall 2025"),
    Registration("Charlie", "Python", "Fall 2025"),
    Registration("Bob", "SQL", "Fall 2025"),
    Registration("Alice", "Python", "Spring 2026"),  # re-registration
    Registration("Diana", "Tableau", "Spring 2026"),
]

# Unique student-course pairs (ignoring semester)
unique_pairs = {(r.student, r.course) for r in registrations}
print(f"Unique student-course pairs: {len(unique_pairs)}")
for student, course in sorted(unique_pairs):
    print(f"  {student} -> {course}")

# All unique students
students = {r.student for r in registrations}
print(f"\nUnique students: {students}")

# All unique courses
courses = {r.course for r in registrations}
print(f"Unique courses: {courses}")

# Students per course
for course in sorted(courses):
    enrolled = {r.student for r in registrations if r.course == course}
    print(f"  {course}: {enrolled}")

Practice Exercises

Test your understanding of tuples and sets. Try each exercise before reading the hint.

Exercise 1: Tuple Statistics

Write a function tuple_stats(t) that takes a tuple of numbers and returns a named tuple with fields minimum, maximum, total, average, and count. Do not import the statistics module.

# Example:
# stats = tuple_stats((10, 20, 30, 40, 50))
# stats.minimum  -> 10
# stats.maximum  -> 50
# stats.total    -> 150
# stats.average  -> 30.0
# stats.count    -> 5

Hint: Use collections.namedtuple to define a Stats type, then compute each value with built-in functions.
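
If you get stuck, here is one possible solution, offered as a sketch rather than the only correct answer. Raising ValueError for an empty tuple is an assumption; the exercise does not say how that case should be handled.

```python
from collections import namedtuple

Stats = namedtuple("Stats", ["minimum", "maximum", "total", "average", "count"])

def tuple_stats(t):
    """Return summary statistics for a non-empty tuple of numbers."""
    if not t:
        # Assumption: an empty tuple is treated as an error
        raise ValueError("tuple_stats() needs at least one number")
    total = sum(t)
    count = len(t)
    return Stats(min(t), max(t), total, total / count, count)

stats = tuple_stats((10, 20, 30, 40, 50))
print(stats.minimum, stats.maximum, stats.total, stats.average, stats.count)
# 10 50 150 30.0 5
```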

Exercise 2: Symmetric Difference Without the Operator

Write a function symmetric_diff(a, b) that returns the symmetric difference of two sets without using ^, symmetric_difference(), or symmetric_difference_update(). Use only union, intersection, and difference.

# Example:
# symmetric_diff({1, 2, 3}, {2, 3, 4})  -> {1, 4}

Hint: The symmetric difference is (A - B) | (B - A), or equivalently (A | B) - (A & B).
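
Both formulas in the hint can be written out directly. This is one possible solution; `symmetric_diff_alt` is a hypothetical second name used only to show the equivalent form, and the `^` operator appears solely to check the result.

```python
def symmetric_diff(a, b):
    """Elements that belong to exactly one of the two sets."""
    return (a - b) | (b - a)

def symmetric_diff_alt(a, b):
    """Same result: everything in either set, minus what they share."""
    return (a | b) - (a & b)

left, right = {1, 2, 3}, {2, 3, 4}
print(symmetric_diff(left, right))                  # {1, 4}
print(symmetric_diff_alt(left, right))              # {1, 4}
print(symmetric_diff(left, right) == left ^ right)  # True
```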

Exercise 3: Remove Duplicates Preserving Order (Case-Insensitive)

Write a function unique_words(text) that takes a string, splits it into words, and returns a list of unique words preserving their first occurrence order. Treat "Python" and "python" as the same word, but keep the casing of the first occurrence.

# Example:
# unique_words("Python is great and python is fun and PYTHON rocks")
# -> ['Python', 'is', 'great', 'and', 'fun', 'rocks']

Hint: Use a set to track lowercase versions of words you've already seen.
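
One way to apply the hint is the same seen-set pattern used in the email de-duplication example. Treat this as a sketch of one approach; it splits on whitespace only and does not strip punctuation.

```python
def unique_words(text):
    """Return words in first-seen order, comparing case-insensitively."""
    seen = set()   # lowercase forms already encountered
    result = []    # first occurrences, with their original casing
    for word in text.split():
        key = word.lower()
        if key not in seen:
            seen.add(key)
            result.append(word)
    return result

print(unique_words("Python is great and python is fun and PYTHON rocks"))
# ['Python', 'is', 'great', 'and', 'fun', 'rocks']
```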

Exercise 4: Common Friends

Given a dictionary mapping person names to sets of friends, write a function common_friends(network, person_a, person_b) that returns the set of mutual friends of two people (excluding the two people themselves).

# Example:
# network = {
#     "Alice": {"Bob", "Charlie", "Diana"},
#     "Bob": {"Alice", "Charlie", "Eve"},
#     "Charlie": {"Alice", "Bob", "Diana", "Eve"},
# }
# common_friends(network, "Alice", "Bob")  -> {"Charlie"}

Hint: Use set intersection and then discard the two people from the result.

Exercise 5: Tuple-Based Sparse Matrix

Write a class SparseMatrix that stores only non-zero values using a dictionary with (row, col) tuple keys. Implement set(row, col, value), get(row, col), and non_zero_count() methods.

# Example:
# m = SparseMatrix()
# m.set(0, 0, 5)
# m.set(2, 3, 8)
# m.get(0, 0)       -> 5
# m.get(1, 1)       -> 0  (default for missing entries)
# m.non_zero_count() -> 2

Hint: Use a dictionary with (row, col) tuples as keys. The get method should return 0 if the key is not present.
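
The sketch below shows one way the class might look. Removing an entry when 0 is stored is an assumption made so that non_zero_count() stays accurate; the exercise does not require it. The attribute name `_data` is also just a choice for this example.

```python
class SparseMatrix:
    """Store only non-zero cells, keyed by (row, col) tuples."""

    def __init__(self):
        self._data = {}

    def set(self, row, col, value):
        if value == 0:
            # Assumption: writing 0 removes the cell instead of storing it
            self._data.pop((row, col), None)
        else:
            self._data[(row, col)] = value

    def get(self, row, col):
        # Missing cells read as 0
        return self._data.get((row, col), 0)

    def non_zero_count(self):
        return len(self._data)

m = SparseMatrix()
m.set(0, 0, 5)
m.set(2, 3, 8)
print(m.get(0, 0))         # 5
print(m.get(1, 1))         # 0
print(m.non_zero_count())  # 2
```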

Exercise 6: Set-Based Venn Diagram Analysis

Write a function venn_analysis(set_a, set_b, label_a="A", label_b="B") that prints a complete Venn diagram analysis: elements only in A, elements only in B, elements in both, total unique elements, and whether one is a subset of the other.

# Example:
# venn_analysis({1, 2, 3, 4, 5}, {3, 4, 5, 6, 7}, "Low", "High")
# Only in Low:        {1, 2}
# Only in High:       {6, 7}
# In both:            {3, 4, 5}
# Total unique:       7
# Low subset of High? No
# High subset of Low? No

Hint: Use difference for "only in A" and "only in B", intersection for "in both", union for "total unique", and issubset() (or <=) for the two subset checks.
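
Here is one way the function could be written. Returning the computed regions in addition to printing them is an assumption added so the result is easy to check; the exercise only asks for printed output. The labels "Core" and "Extended" are hypothetical.

```python
def venn_analysis(set_a, set_b, label_a="A", label_b="B"):
    """Print a two-set Venn breakdown and return the computed regions."""
    only_a = set_a - set_b        # difference
    only_b = set_b - set_a
    both = set_a & set_b          # intersection
    total = len(set_a | set_b)    # size of the union

    print(f"Only in {label_a}: {only_a}")
    print(f"Only in {label_b}: {only_b}")
    print(f"In both: {both}")
    print(f"Total unique: {total}")
    print(f"{label_a} subset of {label_b}? {'Yes' if set_a <= set_b else 'No'}")
    print(f"{label_b} subset of {label_a}? {'Yes' if set_b <= set_a else 'No'}")
    return only_a, only_b, both, total

result = venn_analysis({1, 2, 3}, {1, 2, 3, 4}, "Core", "Extended")
# Only in Core: set()
# Only in Extended: {4}
# In both: {1, 2, 3}
# Total unique: 4
# Core subset of Extended? Yes
# Extended subset of Core? No
```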


Summary

In this chapter, you learned:

Tuples:

  • What tuples are — ordered, immutable sequences
  • Creating tuples — parentheses, packing, single-element with trailing comma, tuple() constructor, repetition, concatenation
  • Accessing elements — positive indexing, negative indexing, slicing with start:stop:step
  • Tuple unpacking — basic destructuring, swapping variables, starred *rest unpacking, ignoring values with _
  • Named tuples — collections.namedtuple, typing.NamedTuple, _replace(), _asdict(), defaults
  • Tuple methods — count() and index(), plus built-in functions (len, min, max, sum, sorted)
  • Immutability nuances — references vs objects, mutable elements inside tuples, hashability rules
  • Use cases — dictionary keys, function return values, data integrity, set elements, string formatting

Sets:

  • What sets are — unordered collections of unique, hashable elements
  • Creating sets — curly braces, set() constructor, set comprehensions
  • Modifying sets — add(), update(), remove(), discard(), pop(), clear()
  • Mathematical operations — union (|), intersection (&), difference (-), symmetric difference (^) — both operator and method syntax
  • Subset and superset testing — <=, <, >=, >, issubset(), issuperset(), isdisjoint()
  • Frozen sets — immutable sets that can be dict keys and set elements
  • O(1) membership testing — why the in operator is dramatically faster on sets (O(1) on average) than on lists (O(n))
  • Common patterns — deduplication (with and without order), finding common elements, data validation, customer analysis

Choosing the right data structure:

  • List — ordered, mutable, allows duplicates (dynamic collections)
  • Tuple — ordered, immutable, allows duplicates (fixed records, dict keys)
  • Set — unordered, mutable, unique elements (fast lookup, deduplication)
  • Frozenset — unordered, immutable, unique elements (hashable set)
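
The differences in this list can be checked with a short script. This is only an illustrative sketch; `samples` and `is_hashable` are hypothetical names introduced here.

```python
samples = {
    "list": [1, 2, 2],
    "tuple": (1, 2, 2),
    "set": {1, 2, 2},
    "frozenset": frozenset([1, 2, 2]),
}

def is_hashable(value):
    """True if the value can be used as a dict key or set element."""
    try:
        hash(value)
    except TypeError:
        return False
    return True

for name, value in samples.items():
    print(f"{name}: {len(value)} items, hashable={is_hashable(value)}")
# list: 3 items, hashable=False
# tuple: 3 items, hashable=True
# set: 2 items, hashable=False
# frozenset: 2 items, hashable=True
```

The item counts show which types keep duplicates, and the hashable flag shows which ones can serve as dict keys or set elements.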

Next up: Dictionaries — key-value pair mappings for structured data storage and retrieval.