Why File Handling Matters
Almost every real-world program needs to work with files at some point. Without file handling, data lives only in memory and disappears the moment your program ends. File handling lets your programs:
- Persist data — Save user information, application state, or computation results so they survive after the program exits.
- Read configurations — Load settings from config files (.ini, .json, .yaml) instead of hardcoding values.
- Process logs — Analyze server logs, error reports, or audit trails stored in text files.
- Exchange data — Import and export data in standard formats like CSV and JSON to communicate with other programs, databases, or APIs.
- Generate reports — Write output files such as summaries, invoices, or analytics dashboards.
- Automate workflows — Batch-process thousands of files, rename them, convert formats, or extract information.
Python makes file handling straightforward with built-in functions and modules. By the end of this chapter, you will be comfortable reading, writing, and managing files of all kinds.
Opening Files
The built-in open() function is the gateway to all file operations in Python. It returns a file object that you use to read from or write to the file.
Basic Syntax
file_object = open(filename, mode, encoding=encoding)
- filename — The path to the file (a string or Path object).
- mode — How you want to open the file (read, write, append, etc.). Defaults to "r".
- encoding — The character encoding to use. Pass it as a keyword argument, since open()'s third positional parameter is buffering, not encoding. Defaults to the platform default (usually UTF-8 on modern systems, but not guaranteed).
A Simple Example
# Open a file for reading
file = open("data.txt", "r")
content = file.read()
print(content)
file.close() # Always close when done manually
File Modes Quick Reference
| Mode | Description |
|---|---|
"r" | Read (default). File must exist. |
"w" | Write. Creates file or overwrites existing. |
"a" | Append. Creates file or adds to end. |
"x" | Exclusive create. Error if file already exists. |
"r+" | Read and write. File must exist. |
"w+" | Write and read. Creates or overwrites. |
"a+" | Append and read. Creates or adds to end. |
"rb" | Read in binary mode. |
"wb" | Write in binary mode. |
We will explore each mode in detail later in this chapter.
The encoding Parameter
Always specify encoding explicitly when working with text files. This avoids surprises across different operating systems:
# Recommended: specify encoding
file = open("data.txt", "r", encoding="utf-8")
content = file.read()
file.close()
# Without encoding, Python uses the platform default
# On Windows this might be 'cp1252', on Linux/Mac 'utf-8'
file = open("data.txt", "r") # encoding depends on OS
Common encodings you may encounter:
| Encoding | Description |
|---|---|
"utf-8" | Universal standard. Handles all languages. |
"ascii" | English-only, 128 characters. |
"latin-1" | Western European languages. Never raises decode errors. |
"utf-16" | Used by some Windows applications. |
"cp1252" | Windows default for Western European locales. |
Tip: When in doubt, use `encoding="utf-8"`. It covers the vast majority of use cases.
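To see why the encoding matters, here is a small sketch: text written as UTF-8 is read back with the right codec, with the wrong codec, and with the `errors` parameter controlling decode failures (the filename cafe.txt is just an example):

```python
# Illustrative sketch: the same bytes decoded under different encodings.
with open("cafe.txt", "w", encoding="utf-8") as f:
    f.write("café")  # the 'é' becomes two bytes in UTF-8

# Right encoding: the text round-trips cleanly
with open("cafe.txt", "r", encoding="utf-8") as f:
    print(f.read())  # café

# Wrong encoding: the text is silently mangled, no error raised
with open("cafe.txt", "r", encoding="latin-1") as f:
    print(f.read())  # cafÃ© (each UTF-8 byte decoded as a separate latin-1 char)

# errors="replace" swaps undecodable bytes for the U+FFFD replacement char
with open("cafe.txt", "r", encoding="ascii", errors="replace") as f:
    print(f.read())  # caf��
```

Note that decoding with the wrong codec often produces garbage rather than an exception, which is exactly why relying on the platform default is risky.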
The with Statement (Context Manager)
The with statement is the recommended way to work with files in Python. It guarantees the file will be properly closed, even if an error occurs inside the block.
Why with Is Preferred
Without with, you must remember to close the file manually. If an exception occurs before file.close(), the file stays open, which can lead to data loss or resource leaks.
Without with (risky):
file = open("data.txt", "r")
content = file.read()
# If an error occurs here, file.close() never runs!
file.close()
With with (safe):
with open("data.txt", "r", encoding="utf-8") as file:
content = file.read()
print(content)
# File is automatically closed here, even if an error occurred
What Happens Without with
Consider this scenario where an error prevents the file from closing:
# DANGEROUS: file may remain open
file = open("data.txt", "r")
content = file.read()
result = int(content) # ValueError if content isn't a number!
file.close() # This line never executes if the error above fires
To handle this properly without with, you would need a try/finally block:
# Safe but verbose
file = open("data.txt", "r")
try:
content = file.read()
result = int(content)
finally:
file.close() # Runs no matter what
The with statement does exactly the same thing, but more cleanly:
# Clean and safe
with open("data.txt", "r") as file:
content = file.read()
result = int(content)
# file.close() is called automatically
Opening Multiple Files
You can open multiple files in a single with statement:
with open("input.txt", "r", encoding="utf-8") as infile, \
open("output.txt", "w", encoding="utf-8") as outfile:
for line in infile:
outfile.write(line.upper())
Starting with Python 3.10, you can use parentheses for a cleaner look:
with (
open("input.txt", "r", encoding="utf-8") as infile,
open("output.txt", "w", encoding="utf-8") as outfile,
):
for line in infile:
outfile.write(line.upper())
Checking If a File Is Closed
with open("data.txt", "r") as f:
print(f.closed) # False — file is open inside the block
print(f.closed) # True — file is closed outside the block
Best Practice: Always use `with` for file operations. There is almost never a reason to use manual `open()`/`close()`.
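To see the guarantee in action, here is a small sketch: an exception is raised inside the block, yet the file object still ends up closed (data.txt is just the sample file used above, created here so the snippet is self-contained):

```python
# Demonstrate that `with` closes the file even when an exception escapes.
with open("data.txt", "w", encoding="utf-8") as f:
    f.write("not a number\n")

try:
    with open("data.txt", "r", encoding="utf-8") as f:
        value = int(f.read())  # Raises ValueError
except ValueError:
    # The `with` block closed the file before the exception propagated
    print("Conversion failed, but the file is closed:", f.closed)  # True
```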
Reading Files
Python offers several ways to read file contents, each suited to different situations.
Assume we have a file called notes.txt with this content:
Hello, World!
Python is great.
File handling is useful.
This is the last line.
read() — Read the Entire File
The read() method returns the complete file content as a single string:
with open("notes.txt", "r", encoding="utf-8") as f:
content = f.read()
print(content)
# Output:
# Hello, World!
# Python is great.
# File handling is useful.
# This is the last line.
print(type(content)) # <class 'str'>
print(len(content)) # Total number of characters in the file
Warning: `read()` loads the entire file into memory. For very large files (hundreds of MB or more), this can crash your program. Use line-by-line iteration for large files.
read(n) — Read N Characters
You can pass an integer to read() to read only that many characters:
with open("notes.txt", "r", encoding="utf-8") as f:
first_five = f.read(5)
print(first_five) # Hello
next_eight = f.read(8)
print(next_eight) # , World!
Notice that the second read(8) picks up where the first one left off. The file has an internal cursor (more on this later).
readline() — Read One Line
The readline() method reads a single line from the file, including the trailing newline character (\n):
with open("notes.txt", "r", encoding="utf-8") as f:
line1 = f.readline()
print(line1) # Hello, World! (followed by a blank line: the line's \n plus print's own newline)
print(repr(line1)) # 'Hello, World!\n'
line2 = f.readline()
print(line2) # Python is great. (also followed by a blank line)
You can use readline() in a loop:
with open("notes.txt", "r", encoding="utf-8") as f:
while True:
line = f.readline()
if not line: # Empty string means end of file
break
print(line.strip())
# Output:
# Hello, World!
# Python is great.
# File handling is useful.
# This is the last line.
readlines() — Read All Lines into a List
The readlines() method returns a list where each element is a line from the file:
with open("notes.txt", "r", encoding="utf-8") as f:
lines = f.readlines()
print(lines)
# ['Hello, World!\n', 'Python is great.\n', 'File handling is useful.\n', 'This is the last line.\n']
print(len(lines)) # 4
Iterating Line by Line (Most Memory-Efficient)
The best way to process a file line by line is to iterate directly over the file object. Python reads one line at a time, keeping memory usage low even for enormous files:
with open("notes.txt", "r", encoding="utf-8") as f:
for line in f:
print(line.strip())
# Output:
# Hello, World!
# Python is great.
# File handling is useful.
# This is the last line.
This approach is strongly recommended for large files because it never loads the entire file into memory.
Stripping Newlines
Every method that reads lines includes the trailing \n. Use strip() to remove it:
with open("notes.txt", "r", encoding="utf-8") as f:
for line in f:
clean_line = line.strip() # Removes leading/trailing whitespace including \n
print(clean_line)
You can also use rstrip("\n") if you want to remove only the trailing newline but keep other whitespace:
with open("notes.txt", "r", encoding="utf-8") as f:
for line in f:
clean_line = line.rstrip("\n")
print(clean_line)
Reading into a List Without Newlines
A common pattern to get a clean list of lines:
with open("notes.txt", "r", encoding="utf-8") as f:
lines = [line.strip() for line in f]
print(lines)
# ['Hello, World!', 'Python is great.', 'File handling is useful.', 'This is the last line.']
Reading in Chunks
For processing very large files efficiently, you can read fixed-size chunks:
def read_in_chunks(filepath, chunk_size=1024):
"""Read a file in fixed-size chunks."""
with open(filepath, "r", encoding="utf-8") as f:
while True:
chunk = f.read(chunk_size)
if not chunk:
break
# Process each chunk
print(f"Read {len(chunk)} characters")
read_in_chunks("large_file.txt")
Comparison of Reading Methods
| Method | Returns | Memory Usage | Best For |
|---|---|---|---|
| `read()` | Entire string | High (whole file) | Small files |
| `read(n)` | N characters | Low | Chunk processing |
| `readline()` | One line string | Low | Reading one line at a time |
| `readlines()` | List of strings | High (whole file) | When you need all lines in a list |
| `for line in f` | One line per loop | Low | Large files (recommended) |
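As a small worked example of the recommended line-by-line style, this sketch counts lines and words without ever holding the whole file in memory (it recreates the sample notes.txt from above so it runs standalone):

```python
# Recreate the sample file used throughout this section
with open("notes.txt", "w", encoding="utf-8") as f:
    f.write("Hello, World!\nPython is great.\n"
            "File handling is useful.\nThis is the last line.\n")

# Stream the file one line at a time; memory use stays constant
line_count = 0
word_count = 0
with open("notes.txt", "r", encoding="utf-8") as f:
    for line in f:
        line_count += 1
        word_count += len(line.split())

print(f"{line_count} lines, {word_count} words")  # 4 lines, 14 words
```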
Writing Files
write() — Write a String
The write() method writes a string to the file. It does not add a newline automatically — you must include \n yourself:
with open("output.txt", "w", encoding="utf-8") as f:
f.write("Hello, World!\n")
f.write("This is line 2.\n")
f.write("This is line 3.\n")
The resulting output.txt:
Hello, World!
This is line 2.
This is line 3.
write() returns the number of characters written:
with open("output.txt", "w", encoding="utf-8") as f:
chars_written = f.write("Hello!\n")
print(chars_written) # 7 (6 characters + 1 newline)
writelines() — Write Multiple Strings
The writelines() method takes an iterable of strings and writes them all. Like write(), it does not add newlines between items:
lines = ["Line 1\n", "Line 2\n", "Line 3\n"]
with open("output.txt", "w", encoding="utf-8") as f:
f.writelines(lines)
If your list does not already contain newlines, add them:
items = ["apple", "banana", "cherry"]
with open("fruits.txt", "w", encoding="utf-8") as f:
f.writelines(item + "\n" for item in items)
Overwriting vs. Appending
Overwriting ("w" mode) — Erases all existing content and starts fresh:
# First write — creates the file
with open("log.txt", "w", encoding="utf-8") as f:
f.write("Log started.\n")
# Second write — ERASES everything and writes new content
with open("log.txt", "w", encoding="utf-8") as f:
f.write("Fresh start.\n")
# log.txt now contains ONLY: "Fresh start.\n"
Appending ("a" mode) — Adds to the end of the file without erasing:
# First write — creates the file
with open("log.txt", "w", encoding="utf-8") as f:
f.write("Log started.\n")
# Append — adds to the end
with open("log.txt", "a", encoding="utf-8") as f:
f.write("New entry added.\n")
# log.txt now contains:
# Log started.
# New entry added.
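Append mode is the natural fit for logging. Here is a minimal sketch of an append-only logger; the filename app.log and the log_message helper are illustrative, not a standard API:

```python
from datetime import datetime

def log_message(message, logfile="app.log"):
    """Append a timestamped line; "a" mode creates the file if needed."""
    timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    with open(logfile, "a", encoding="utf-8") as f:
        f.write(f"[{timestamp}] {message}\n")

log_message("Application started")
log_message("User logged in")
# app.log now holds both entries, oldest first
```

Because each call opens, appends, and closes the file, entries from separate runs of the program accumulate instead of overwriting each other.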
Creating New Files Safely
Use "x" mode to create a file only if it does not already exist. This prevents accidentally overwriting important data:
try:
with open("important_data.txt", "x", encoding="utf-8") as f:
f.write("This file was just created.\n")
print("File created successfully.")
except FileExistsError:
print("File already exists! Not overwriting.")
Writing with print()
You can redirect print() output to a file using the file parameter:
with open("output.txt", "w", encoding="utf-8") as f:
print("Hello, World!", file=f)
print("This is line 2.", file=f)
print("Value:", 42, file=f)
This is convenient because print() automatically adds newlines and can handle multiple arguments with spaces.
File Modes Deep Dive
Understanding file modes is essential. Here is a comprehensive reference:
Text Modes
| Mode | Read | Write | Creates File | Truncates (Erases) | Cursor Position | File Must Exist |
|---|---|---|---|---|---|---|
"r" | Yes | No | No | No | Beginning | Yes |
"w" | No | Yes | Yes | Yes | Beginning | No |
"a" | No | Yes | Yes | No | End | No |
"x" | No | Yes | Yes | N/A (new file) | Beginning | Must NOT exist |
"r+" | Yes | Yes | No | No | Beginning | Yes |
"w+" | Yes | Yes | Yes | Yes | Beginning | No |
"a+" | Yes | Yes | Yes | No | End | No |
Binary Modes
Add "b" to any mode for binary file operations (images, PDFs, executables, etc.):
| Mode | Description |
|---|---|
"rb" | Read binary |
"wb" | Write binary (creates or truncates) |
"ab" | Append binary |
"xb" | Exclusive create binary |
"rb+" | Read and write binary |
"wb+" | Write and read binary (truncates) |
"ab+" | Append and read binary |
Text vs. Binary Mode
| Aspect | Text Mode ("r", "w") | Binary Mode ("rb", "wb") |
|---|---|---|
| Data type | str | bytes |
| Newline handling | Converts \r\n to \n on read | No conversion |
| Encoding | Uses specified encoding | No encoding |
| Use for | .txt, .csv, .json, .html | Images, audio, video, .pdf |
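The first two rows of the table are easy to verify directly: the same file yields `str` in text mode and `bytes` in binary mode (sample.txt is created here just for the demonstration):

```python
# Create a tiny sample file
with open("sample.txt", "w", encoding="utf-8") as f:
    f.write("hi\n")

# Text mode decodes bytes into a string
with open("sample.txt", "r", encoding="utf-8") as f:
    text = f.read()
print(type(text))  # <class 'str'>

# Binary mode returns the raw bytes, no decoding, no newline translation
with open("sample.txt", "rb") as f:
    raw = f.read()
print(type(raw))   # <class 'bytes'>
# On Windows, raw may contain b"\r\n" where text mode shows "\n"
```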
The + Modes Explained
The + sign adds the complementary capability to a mode:
# r+ : Read AND write. File must exist. Cursor starts at beginning.
with open("data.txt", "r+", encoding="utf-8") as f:
content = f.read() # Read existing content
f.write("Appended!") # Write at current cursor position (end after read)
# w+ : Write AND read. Creates or truncates file.
with open("data.txt", "w+", encoding="utf-8") as f:
f.write("New content")
f.seek(0) # Move cursor back to beginning
content = f.read() # Now we can read what we wrote
print(content) # New content
# a+ : Append AND read. Creates file if needed. Write cursor always at end.
with open("data.txt", "a+", encoding="utf-8") as f:
f.write("More data\n")
f.seek(0) # Move cursor to beginning for reading
content = f.read() # Read entire file
print(content)
File Pointer / Cursor
Every open file has an internal cursor (also called the file pointer) that tracks your current position in the file. Reading or writing advances the cursor forward.
tell() — Get Current Position
The tell() method returns the current position of the cursor in bytes:
with open("notes.txt", "r", encoding="utf-8") as f:
print(f.tell()) # 0 — cursor at the very start
f.read(5)
print(f.tell()) # 5 — moved forward 5 bytes (one byte per ASCII character)
f.readline()
print(f.tell()) # Position after the first line ends
seek() — Move the Cursor
The seek(offset, whence) method moves the cursor:
- offset — Number of bytes to move.
- whence — Reference point (optional):
  - 0 — from the beginning (default)
  - 1 — from the current position (binary mode only)
  - 2 — from the end (binary mode only)
with open("notes.txt", "r", encoding="utf-8") as f:
content = f.read()
print(f.tell()) # Cursor is at the end
f.seek(0) # Move back to the beginning
print(f.tell()) # 0
first_line = f.readline()
print(first_line.strip()) # Hello, World!
Practical Example: Re-reading a File
with open("data.txt", "r", encoding="utf-8") as f:
# First pass: count lines
line_count = sum(1 for _ in f)
print(f"Total lines: {line_count}")
# Reset cursor to re-read
f.seek(0)
# Second pass: process content
for line in f:
print(line.strip().upper())
Practical Example: Overwriting Part of a File
# Create a sample file
with open("record.txt", "w", encoding="utf-8") as f:
f.write("Name: Alice\n")
f.write("Score: 085\n")
# Overwrite part of it using seek
with open("record.txt", "r+", encoding="utf-8") as f:
content = f.read()
print("Before:", repr(content))
f.seek(0) # Go back to start
# Overwrite with same-length content
f.write("Name: Bobby\n")
f.write("Score: 099\n")
f.seek(0)
print("After:", f.read())
# Output:
# Before: 'Name: Alice\nScore: 085\n'
# After: Name: Bobby
# Score: 099
Using seek() in Binary Mode
In binary mode, you can seek relative to the current position or the end of the file:
with open("data.bin", "rb") as f:
f.seek(0, 2) # Move to the end of the file
file_size = f.tell()
print(f"File size: {file_size} bytes")
f.seek(-10, 2) # Move to 10 bytes before the end
last_10 = f.read()
print(f"Last 10 bytes: {last_10}")
Working with CSV Files
CSV (Comma-Separated Values) is one of the most common data exchange formats. Python's csv module handles the tricky parts like quoting, escaping, and different delimiters.
Why Use the csv Module?
You might think you can just split lines by commas:
# Naive approach — BREAKS with commas inside quoted fields!
line = 'John,"New York, NY",30'
fields = line.split(",")
print(fields) # ['John', '"New York', ' NY"', '30'] — WRONG!
The csv module handles this correctly:
import csv
import io
line = 'John,"New York, NY",30'
reader = csv.reader(io.StringIO(line))
for row in reader:
print(row) # ['John', 'New York, NY', '30'] — CORRECT!
csv.reader — Reading CSV Files
Assume students.csv contains:
Name,Age,Score
Priya,22,95
Rahul,24,82
Ananya,23,90
import csv
with open("students.csv", "r", encoding="utf-8") as f:
reader = csv.reader(f)
header = next(reader) # Read the header row
print("Columns:", header)
for row in reader:
name, age, score = row
print(f"{name} is {age} years old and scored {score}")
# Output:
# Columns: ['Name', 'Age', 'Score']
# Priya is 22 years old and scored 95
# Rahul is 24 years old and scored 82
# Ananya is 23 years old and scored 90
csv.DictReader — Reading CSV as Dictionaries
DictReader uses the first row as keys, giving you a dictionary for each row:
import csv
with open("students.csv", "r", encoding="utf-8") as f:
reader = csv.DictReader(f)
for row in reader:
print(f"{row['Name']}: {row['Score']}")
# Each row is a dict (an OrderedDict before Python 3.8): {'Name': 'Priya', 'Age': '22', 'Score': '95'}
# Output:
# Priya: 95
# Rahul: 82
# Ananya: 90
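One thing to remember: the csv module always hands you strings, even for numeric columns. This sketch computes the average score, converting explicitly (it recreates the sample students.csv so it runs standalone):

```python
import csv

# Recreate the sample students.csv used above
with open("students.csv", "w", newline="", encoding="utf-8") as f:
    f.write("Name,Age,Score\nPriya,22,95\nRahul,24,82\nAnanya,23,90\n")

total = 0
count = 0
with open("students.csv", "r", encoding="utf-8") as f:
    reader = csv.DictReader(f)
    for row in reader:
        total += int(row["Score"])  # CSV values arrive as str; convert yourself
        count += 1

print(f"Average score: {total / count:.1f}")  # Average score: 89.0
```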
csv.writer — Writing CSV Files
import csv
data = [
["Name", "Age", "Score"],
["Priya", 22, 95],
["Rahul", 24, 82],
["Ananya", 23, 90],
]
with open("students.csv", "w", newline="", encoding="utf-8") as f:
writer = csv.writer(f)
writer.writerows(data) # Write all rows at once
You can also write one row at a time:
import csv
with open("students.csv", "w", newline="", encoding="utf-8") as f:
writer = csv.writer(f)
writer.writerow(["Name", "Age", "Score"]) # Header
writer.writerow(["Priya", 22, 95])
writer.writerow(["Rahul", 24, 82])
Important: Always pass `newline=""` when opening CSV files. This prevents blank lines between rows on Windows.
csv.DictWriter — Writing CSV from Dictionaries
import csv
students = [
{"Name": "Priya", "Age": 22, "Score": 95},
{"Name": "Rahul", "Age": 24, "Score": 82},
{"Name": "Ananya", "Age": 23, "Score": 90},
]
with open("students.csv", "w", newline="", encoding="utf-8") as f:
fieldnames = ["Name", "Age", "Score"]
writer = csv.DictWriter(f, fieldnames=fieldnames)
writer.writeheader() # Writes the header row
writer.writerows(students) # Writes all data rows
Custom Delimiters
Not all "CSV" files use commas. Some use tabs, semicolons, or pipes:
import csv
# Reading a tab-separated file
with open("data.tsv", "r", encoding="utf-8") as f:
reader = csv.reader(f, delimiter="\t")
for row in reader:
print(row)
# Writing a semicolon-separated file
with open("data_semicolon.csv", "w", newline="", encoding="utf-8") as f:
writer = csv.writer(f, delimiter=";")
writer.writerow(["Name", "City", "Score"])
writer.writerow(["Priya", "Mumbai", 95])
Quoting Options
The csv module provides different quoting strategies:
import csv
data = [["Name", "Comment"], ["Alice", 'She said "hello"'], ["Bob", "No comment"]]
# QUOTE_MINIMAL (default) — only quote fields that need it
with open("out.csv", "w", newline="", encoding="utf-8") as f:
writer = csv.writer(f, quoting=csv.QUOTE_MINIMAL)
writer.writerows(data)
# QUOTE_ALL — quote every field
with open("out.csv", "w", newline="", encoding="utf-8") as f:
writer = csv.writer(f, quoting=csv.QUOTE_ALL)
writer.writerows(data)
# QUOTE_NONNUMERIC — quote all non-numeric fields
with open("out.csv", "w", newline="", encoding="utf-8") as f:
writer = csv.writer(f, quoting=csv.QUOTE_NONNUMERIC)
writer.writerows(data)
| Quoting Constant | Behavior |
|---|---|
| `csv.QUOTE_MINIMAL` | Quote only fields that contain special chars |
| `csv.QUOTE_ALL` | Quote every field |
| `csv.QUOTE_NONNUMERIC` | Quote all non-numeric fields |
| `csv.QUOTE_NONE` | Never quote (use escape character instead) |
Working with JSON Files
JSON (JavaScript Object Notation) is the dominant data format for web APIs, configuration files, and data exchange. Python's json module provides seamless conversion between Python objects and JSON.
Python to JSON Type Mapping
| Python | JSON |
|---|---|
| `dict` | object `{}` |
| `list`, `tuple` | array `[]` |
| `str` | string `""` |
| `int`, `float` | number |
| `True` / `False` | `true` / `false` |
| `None` | `null` |
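The mapping is easy to see in a quick round trip. Note in particular that a tuple becomes a JSON array and comes back as a list, and that `None` becomes `null`:

```python
import json

data = {"point": (1, 2), "label": None, "ok": True}
text = json.dumps(data)
print(text)  # {"point": [1, 2], "label": null, "ok": true}

back = json.loads(text)
print(back["point"])  # [1, 2] (the tuple came back as a list)
print(back["label"])  # None
```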
json.load() — Read JSON from a File
Assume config.json contains:
{
"database": {
"host": "localhost",
"port": 5432,
"name": "myapp"
},
"debug": true,
"allowed_hosts": ["localhost", "127.0.0.1"]
}
import json
with open("config.json", "r", encoding="utf-8") as f:
config = json.load(f)
print(config["database"]["host"]) # localhost
print(config["database"]["port"]) # 5432
print(config["debug"]) # True
print(config["allowed_hosts"]) # ['localhost', '127.0.0.1']
print(type(config)) # <class 'dict'>
json.dump() — Write JSON to a File
import json
data = {
"name": "Meritshot",
"courses": ["Python", "SQL", "Power BI"],
"students": 5000,
"active": True
}
with open("data.json", "w", encoding="utf-8") as f:
json.dump(data, f, indent=2)
The resulting data.json:
{
"name": "Meritshot",
"courses": [
"Python",
"SQL",
"Power BI"
],
"students": 5000,
"active": true
}
The indent Parameter
The indent parameter controls pretty-printing:
import json
data = {"name": "Alice", "scores": [90, 85, 92]}
# No indent — compact, single line (good for saving space)
print(json.dumps(data))
# {"name": "Alice", "scores": [90, 85, 92]}
# indent=2 — readable, indented by 2 spaces
print(json.dumps(data, indent=2))
# {
# "name": "Alice",
# "scores": [
# 90,
# 85,
# 92
# ]
# }
# indent=4 — more indentation
print(json.dumps(data, indent=4))
json.loads() and json.dumps() — String Conversion
These work with strings instead of files:
import json
# Python dict to JSON string
python_dict = {"name": "Priya", "age": 22, "enrolled": True}
json_string = json.dumps(python_dict)
print(json_string) # '{"name": "Priya", "age": 22, "enrolled": true}'
print(type(json_string)) # <class 'str'>
# JSON string to Python dict
json_text = '{"name": "Rahul", "age": 24, "enrolled": false}'
python_obj = json.loads(json_text)
print(python_obj) # {'name': 'Rahul', 'age': 24, 'enrolled': False}
print(type(python_obj)) # <class 'dict'>
print(python_obj["name"]) # Rahul
Handling Nested Data
JSON often contains deeply nested structures:
import json
api_response = {
"status": "success",
"data": {
"users": [
{
"id": 1,
"name": "Alice",
"address": {
"city": "Mumbai",
"pincode": "400001"
}
},
{
"id": 2,
"name": "Bob",
"address": {
"city": "Delhi",
"pincode": "110001"
}
}
],
"total": 2
}
}
# Write nested JSON
with open("api_data.json", "w", encoding="utf-8") as f:
json.dump(api_response, f, indent=2)
# Read and navigate nested JSON
with open("api_data.json", "r", encoding="utf-8") as f:
data = json.load(f)
for user in data["data"]["users"]:
print(f"{user['name']} lives in {user['address']['city']}")
# Output:
# Alice lives in Mumbai
# Bob lives in Delhi
Additional json.dump() / json.dumps() Parameters
import json
data = {"name": "Priya", "city": "Mumbai", "age": 22}
# sort_keys — Sort dictionary keys alphabetically
print(json.dumps(data, sort_keys=True, indent=2))
# {
# "age": 22,
# "city": "Mumbai",
# "name": "Priya"
# }
# ensure_ascii=False — Preserve non-ASCII characters (Hindi, Chinese, etc.)
data_hindi = {"name": "प्रिया", "city": "मुंबई"}
print(json.dumps(data_hindi, ensure_ascii=False, indent=2))
# {
# "name": "प्रिया",
# "city": "मुंबई"
# }
# separators — Customize separators for compact output
print(json.dumps(data, separators=(",", ":")))
# {"name":"Priya","city":"Mumbai","age":22}
Custom Serialization with default
Some Python objects (like datetime) are not JSON-serializable by default:
import json
from datetime import datetime, date
data = {
"event": "Enrollment",
"date": datetime(2026, 3, 15, 10, 30),
"today": date(2026, 3, 15)
}
# This will raise TypeError:
# json.dumps(data) # TypeError: Object of type datetime is not JSON serializable
# Solution: provide a custom serializer
def custom_serializer(obj):
if isinstance(obj, (datetime, date)):
return obj.isoformat()
raise TypeError(f"Type {type(obj)} not serializable")
json_string = json.dumps(data, default=custom_serializer, indent=2)
print(json_string)
# {
# "event": "Enrollment",
# "date": "2026-03-15T10:30:00",
# "today": "2026-03-15"
# }
Working with Binary Files
Binary files store data as raw bytes rather than text. This includes images, audio, video, PDFs, executables, and compressed archives. Use "b" modes ("rb", "wb", "ab") for these files.
Reading a Binary File
with open("photo.jpg", "rb") as f:
data = f.read()
print(type(data)) # <class 'bytes'>
print(len(data)) # File size in bytes
print(data[:10]) # First 10 bytes (e.g., b'\xff\xd8\xff\xe0...')
Writing a Binary File
# Write raw bytes
with open("output.bin", "wb") as f:
f.write(b"\x00\x01\x02\x03\x04")
f.write(bytes([72, 101, 108, 108, 111])) # "Hello" in ASCII bytes
Copying a File Byte by Byte
A practical example of copying any type of file:
def copy_file(source, destination, chunk_size=4096):
"""Copy a file in chunks (works for any file type)."""
bytes_copied = 0
with open(source, "rb") as src, open(destination, "wb") as dst:
while True:
chunk = src.read(chunk_size)
if not chunk:
break
dst.write(chunk)
bytes_copied += len(chunk)
print(f"Copied {bytes_copied} bytes from {source} to {destination}")
# Usage
copy_file("photo.jpg", "photo_backup.jpg")
Checking File Type by Magic Bytes
Many file types start with specific "magic bytes" that identify them:
def identify_file_type(filepath):
"""Identify file type by reading its first few bytes."""
signatures = {
b"\xff\xd8\xff": "JPEG image",
b"\x89PNG": "PNG image",
b"GIF87a": "GIF image",
b"GIF89a": "GIF image",
b"%PDF": "PDF document",
b"PK": "ZIP archive (or .docx, .xlsx)",
}
with open(filepath, "rb") as f:
header = f.read(8)
for magic, filetype in signatures.items():
if header.startswith(magic):
return filetype
return "Unknown"
# Usage
# print(identify_file_type("photo.jpg")) # JPEG image
# print(identify_file_type("report.pdf")) # PDF document
File and Directory Operations with os and pathlib
Python provides two main approaches for working with the file system: the older os / os.path module and the modern pathlib module (introduced in Python 3.4).
Checking File Existence
import os
# Using os.path
if os.path.exists("data.txt"):
print("File exists")
else:
print("File not found")
# Check specifically for file vs directory
print(os.path.isfile("data.txt")) # True if it is a file
print(os.path.isdir("data")) # True if it is a directory
Getting File Information
import os
# File size in bytes
size = os.path.getsize("data.txt")
print(f"Size: {size} bytes")
# Absolute path
abs_path = os.path.abspath("data.txt")
print(f"Absolute path: {abs_path}")
# File name and directory
print(os.path.basename("/home/user/data.txt")) # data.txt
print(os.path.dirname("/home/user/data.txt")) # /home/user
# Split path and extension
name, ext = os.path.splitext("report.csv")
print(name, ext) # report .csv
Listing Directory Contents
import os
# List all items in a directory
items = os.listdir(".")
print(items) # ['file1.txt', 'folder1', 'script.py', ...]
# List only files
files = [f for f in os.listdir(".") if os.path.isfile(f)]
print(files)
# List only directories
dirs = [d for d in os.listdir(".") if os.path.isdir(d)]
print(dirs)
# List files with a specific extension
csv_files = [f for f in os.listdir(".") if f.endswith(".csv")]
print(csv_files)
Creating and Removing Directories
import os
# Create a single directory
os.mkdir("new_folder")
# Create nested directories (like mkdir -p)
os.makedirs("path/to/nested/folder", exist_ok=True)
# exist_ok=True prevents error if directory already exists
# Remove an empty directory
os.rmdir("new_folder")
# Remove nested empty directories
os.removedirs("path/to/nested/folder")
Renaming and Moving Files
import os
# Rename a file
os.rename("old_name.txt", "new_name.txt")
# Move a file to a different directory
os.rename("file.txt", "archive/file.txt")
# For more robust moving, use shutil
import shutil
shutil.move("source.txt", "destination/source.txt")
# Copy a file
shutil.copy2("original.txt", "backup.txt") # Preserves metadata
shutil.copytree("source_dir", "backup_dir") # Copy entire directory
# Remove a non-empty directory
shutil.rmtree("old_directory")
Joining Paths Safely
Never concatenate paths with string concatenation. Use os.path.join():
import os
# Correct — works on any OS
path = os.path.join("data", "output", "report.csv")
print(path) # data/output/report.csv (or data\output\report.csv on Windows)
# Wrong — breaks on Windows
path = "data" + "/" + "output" + "/" + "report.csv"
Walking Directory Trees
os.walk() recursively traverses an entire directory tree:
import os
for dirpath, dirnames, filenames in os.walk("project"):
# dirpath — current directory path
# dirnames — list of subdirectories in dirpath
# filenames — list of files in dirpath
print(f"\nDirectory: {dirpath}")
print(f" Subdirectories: {dirnames}")
print(f" Files: {filenames}")
# Example output:
# Directory: project
# Subdirectories: ['src', 'data']
# Files: ['README.md', 'setup.py']
#
# Directory: project/src
# Subdirectories: []
# Files: ['main.py', 'utils.py']
#
# Directory: project/data
# Subdirectories: []
# Files: ['input.csv']
A practical example: find all Python files in a project:
import os
python_files = []
for dirpath, dirnames, filenames in os.walk("project"):
for filename in filenames:
if filename.endswith(".py"):
full_path = os.path.join(dirpath, filename)
python_files.append(full_path)
print(f"Found {len(python_files)} Python files:")
for f in python_files:
print(f" {f}")
os.path vs pathlib.Path Comparison
| Task | os.path | pathlib.Path |
|---|---|---|
| Join paths | os.path.join("a", "b") | Path("a") / "b" |
| Get file name | os.path.basename(p) | Path(p).name |
| Get directory | os.path.dirname(p) | Path(p).parent |
| Get extension | os.path.splitext(p)[1] | Path(p).suffix |
| Check existence | os.path.exists(p) | Path(p).exists() |
| Is file? | os.path.isfile(p) | Path(p).is_file() |
| Is directory? | os.path.isdir(p) | Path(p).is_dir() |
| Absolute path | os.path.abspath(p) | Path(p).resolve() |
| List directory | os.listdir(p) | Path(p).iterdir() |
| Glob files | glob.glob("*.py") | Path(".").glob("*.py") |
| Read file | open(p).read() | Path(p).read_text() |
| Write file | open(p, "w").write(s) | Path(p).write_text(s) |
| Create directory | os.makedirs(p) | Path(p).mkdir(parents=True) |
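The two APIs give the same answers; only the style differs. A quick sketch checking a few rows of the table against each other:

```python
import os
from pathlib import Path

p = "data/output/report.csv"

# Same results, different spelling
print(os.path.basename(p), "==", Path(p).name)       # report.csv == report.csv
print(os.path.splitext(p)[1], "==", Path(p).suffix)  # .csv == .csv
# Parent directory: equal on POSIX; on Windows, Path prints backslashes
print(os.path.dirname(p), "==", str(Path(p).parent))
```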
The pathlib Module (Modern Approach)
pathlib was introduced in Python 3.4 and provides an object-oriented interface for filesystem paths. It is now the recommended way to handle paths in new Python code.
Creating Path Objects
from pathlib import Path
# From a string
p = Path("data/output/report.csv")
# Current directory
cwd = Path.cwd()
print(cwd) # /home/user/project
# Home directory
home = Path.home()
print(home) # /home/user
# Joining paths with the / operator
data_dir = Path("data")
output_file = data_dir / "output" / "report.csv"
print(output_file) # data/output/report.csv
Path Properties
from pathlib import Path
p = Path("data/output/report.csv")
print(p.name) # report.csv — file name with extension
print(p.stem) # report — file name without extension
print(p.suffix) # .csv — file extension
print(p.suffixes) # ['.csv'] — all extensions (e.g., ['.tar', '.gz'])
print(p.parent) # data/output — parent directory
print(p.parents[0]) # data/output — immediate parent
print(p.parents[1]) # data — grandparent
print(p.parts) # ('data', 'output', 'report.csv')
Changing Parts of a Path
from pathlib import Path
p = Path("data/report.csv")
# Change extension
new_p = p.with_suffix(".json")
print(new_p) # data/report.json
# Change file name
new_p = p.with_name("summary.csv")
print(new_p) # data/summary.csv
# Change stem (keep extension)
new_p = p.with_stem("final_report")
print(new_p) # data/final_report.csv
Checking Properties
from pathlib import Path
p = Path("data.txt")
print(p.exists()) # True if path exists
print(p.is_file()) # True if it is a regular file
print(p.is_dir()) # True if it is a directory
print(p.is_absolute()) # True if path is absolute (/home/user/...)
Reading and Writing with Path
pathlib provides convenience methods for quick file I/O:
from pathlib import Path
# Write text to a file (creates or overwrites)
Path("greeting.txt").write_text("Hello, World!\n", encoding="utf-8")
# Read text from a file
content = Path("greeting.txt").read_text(encoding="utf-8")
print(content) # Hello, World!
# Write bytes
Path("data.bin").write_bytes(b"\x00\x01\x02\x03")
# Read bytes
data = Path("data.bin").read_bytes()
print(data) # b'\x00\x01\x02\x03'
Note: read_text() and write_text() handle opening and closing the file for you, but they read and write the entire content at once. For line-by-line processing, open the file with open() (or Path.open()) inside a with statement.
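Path objects also have an open() method that behaves exactly like the built-in open(), so the line-by-line pattern still applies (the file name here is illustrative):

```python
from pathlib import Path

p = Path("notes.txt")  # illustrative file name
p.write_text("first\nsecond\nthird\n", encoding="utf-8")

# Path.open() returns the same file object the built-in open() does
with p.open("r", encoding="utf-8") as f:
    for line in f:
        print(line.strip())

p.unlink()  # clean up the demo file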
Glob Patterns
glob() finds files matching a pattern. rglob() searches recursively:
from pathlib import Path
project = Path("project")
# All .py files in the directory (not subdirectories)
for py_file in project.glob("*.py"):
    print(py_file)
# All .py files recursively (including subdirectories)
for py_file in project.rglob("*.py"):
    print(py_file)
# All CSV files in any 'data' subdirectory
for csv_file in project.rglob("data/*.csv"):
    print(csv_file)
# All image files
for img in project.rglob("*.jpg"):
    print(img)
for img in project.rglob("*.png"):
    print(img)
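Running one rglob() call per extension means one full directory walk per pattern. A single recursive pass filtered by a set of suffixes scales better; a sketch (the directory tree and suffix set are illustrative):

```python
import shutil
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".png", ".gif"}

# Build a small demo tree
project = Path("project_demo")
(project / "assets").mkdir(parents=True, exist_ok=True)
(project / "logo.png").touch()
(project / "assets" / "photo.jpg").touch()
(project / "readme.md").touch()

# One walk, filtered by suffix, instead of one rglob() per extension
images = sorted(p for p in project.rglob("*") if p.suffix.lower() in IMAGE_SUFFIXES)
for img in images:
    print(img)  # prints the .png and .jpg paths, skips readme.md

shutil.rmtree(project)  # clean up the demo tree
```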
Creating Directories
from pathlib import Path
# Create a single directory
Path("new_folder").mkdir(exist_ok=True)
# Create nested directories
Path("path/to/nested/folder").mkdir(parents=True, exist_ok=True)
Iterating Over Directory Contents
from pathlib import Path
data_dir = Path("data")
# List all items
for item in data_dir.iterdir():
    if item.is_file():
        print(f"File: {item.name} ({item.stat().st_size} bytes)")
    elif item.is_dir():
        print(f"Dir: {item.name}")
Getting File Metadata
from pathlib import Path
from datetime import datetime
p = Path("data.txt")
stat = p.stat()
print(f"Size: {stat.st_size} bytes")
# st_ctime is creation time on Windows, but metadata-change time on Unix
print(f"Created/Changed: {datetime.fromtimestamp(stat.st_ctime)}")
print(f"Modified: {datetime.fromtimestamp(stat.st_mtime)}")
print(f"Accessed: {datetime.fromtimestamp(stat.st_atime)}")
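st_size is a raw byte count; for display you often want a human-readable form. A small helper (the function name human_size is mine, not a standard-library API):

```python
def human_size(num_bytes):
    """Format a byte count for display (helper name is illustrative)."""
    size = float(num_bytes)
    for unit in ("B", "KB", "MB", "GB", "TB"):
        if size < 1024:
            return f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} PB"

print(human_size(500))        # 500.0 B
print(human_size(2048))       # 2.0 KB
print(human_size(5_242_880))  # 5.0 MB
```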
Deleting Files
from pathlib import Path
# Delete a file
Path("temp.txt").unlink(missing_ok=True) # missing_ok prevents error if absent
# Delete an empty directory
Path("empty_folder").rmdir()
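rmdir() refuses to delete a directory that still has contents; for whole trees, the standard-library shutil.rmtree is the usual tool. A sketch (the directory name is illustrative):

```python
import shutil
from pathlib import Path

tree = Path("demo_tree")
(tree / "sub").mkdir(parents=True, exist_ok=True)
(tree / "sub" / "file.txt").write_text("x", encoding="utf-8")

try:
    tree.rmdir()  # fails: the directory is not empty
except OSError as e:
    print(f"rmdir failed: {e}")

shutil.rmtree(tree)   # removes the directory and everything inside it
print(tree.exists())  # False
```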
Temporary Files
The tempfile module creates temporary files and directories that are automatically cleaned up when no longer needed. This is useful for storing intermediate data during processing.
NamedTemporaryFile
Creates a temporary file with a name you can reference:
import tempfile
# Create a temporary file
with tempfile.NamedTemporaryFile(mode="w", suffix=".txt", delete=False,
                                 encoding="utf-8") as tmp:
    tmp.write("Temporary data\n")
    tmp.write("More temporary data\n")
    print(f"Temp file: {tmp.name}")  # e.g., /tmp/tmpxyz123.txt
# The file still exists because delete=False
# Read it back
with open(tmp.name, "r", encoding="utf-8") as f:
    print(f.read())
# Clean up manually
import os
os.unlink(tmp.name)
With delete=True (the default), the file is deleted as soon as the with block ends. One caveat: on Windows, a NamedTemporaryFile that is still open cannot be opened again by name, so use delete=False if something else must read it before it closes:
import tempfile
with tempfile.NamedTemporaryFile(mode="w", suffix=".csv",
                                 encoding="utf-8") as tmp:
    tmp.write("Name,Score\n")
    tmp.write("Alice,95\n")
    tmp_path = tmp.name  # Save the path for later reference
    print(f"Temp file: {tmp_path}")
    # File exists here
# File is automatically deleted here
TemporaryDirectory
Creates a temporary directory that is automatically removed when the with block ends:
import tempfile
from pathlib import Path
with tempfile.TemporaryDirectory() as tmpdir:
    print(f"Temp dir: {tmpdir}")  # e.g., /tmp/tmpxyz456
    # Create files inside
    data_file = Path(tmpdir) / "data.txt"
    data_file.write_text("Hello from temp dir!", encoding="utf-8")
    results_file = Path(tmpdir) / "results.csv"
    results_file.write_text("Name,Score\nAlice,95\n", encoding="utf-8")
    # Use the files
    print(data_file.read_text(encoding="utf-8"))
# Entire directory and all contents are automatically deleted here
mkstemp and mkdtemp (Low-Level)
For more control, you can use the lower-level functions:
import tempfile
import os
# Create a temporary file (returns file descriptor and path)
fd, path = tempfile.mkstemp(suffix=".txt")
try:
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write("Low-level temp file\n")
    # Read it back
    with open(path, "r", encoding="utf-8") as f:
        print(f.read())
finally:
    os.unlink(path)  # Manual cleanup required
Practical Examples
Example 1: Log File Analyzer
A program that reads a server log file, counts error types, and writes a summary report:
from collections import Counter
from datetime import datetime
def analyze_log(input_path, output_path):
    """Analyze a log file and generate a summary report."""
    error_counts = Counter()
    warning_counts = Counter()
    total_lines = 0
    error_lines = 0
    warning_lines = 0
    with open(input_path, "r", encoding="utf-8") as f:
        for line in f:
            total_lines += 1
            line = line.strip()
            if "ERROR" in line:
                error_lines += 1
                # Extract error type: "2026-03-15 10:30:00 ERROR TimeoutError: ..."
                parts = line.split("ERROR")
                if len(parts) > 1:
                    error_type = parts[1].strip().split(":")[0].strip()
                    error_counts[error_type] += 1
            elif "WARNING" in line:
                warning_lines += 1
                parts = line.split("WARNING")
                if len(parts) > 1:
                    warning_type = parts[1].strip().split(":")[0].strip()
                    warning_counts[warning_type] += 1
    # Write the report
    with open(output_path, "w", encoding="utf-8") as f:
        f.write("Log Analysis Report\n")
        f.write(f"Generated: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}\n")
        f.write(f"{'=' * 50}\n\n")
        f.write(f"Total lines processed: {total_lines}\n")
        f.write(f"Error lines: {error_lines}\n")
        f.write(f"Warning lines: {warning_lines}\n\n")
        f.write("Top Errors:\n")
        for error_type, count in error_counts.most_common(10):
            f.write(f" {error_type}: {count}\n")
        f.write("\nTop Warnings:\n")
        for warning_type, count in warning_counts.most_common(10):
            f.write(f" {warning_type}: {count}\n")
    print(f"Report written to {output_path}")
# Usage:
# analyze_log("server.log", "log_report.txt")
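The extraction rule in analyze_log (take the text after ERROR, up to the first colon) can be sanity-checked on a few hand-written lines; the sample messages below are made up:

```python
from collections import Counter

sample = [
    "2026-03-15 10:30:00 ERROR TimeoutError: upstream did not respond",
    "2026-03-15 10:30:05 ERROR TimeoutError: upstream did not respond",
    "2026-03-15 10:30:09 WARNING DiskSpace: 90% used",
    "2026-03-15 10:31:00 ERROR ConnectionError: connection refused",
]

error_counts = Counter()
for line in sample:
    if "ERROR" in line:
        # Same rule as analyze_log: text after ERROR, before the first ':'
        error_type = line.split("ERROR", 1)[1].strip().split(":")[0].strip()
        error_counts[error_type] += 1

print(error_counts.most_common(2))  # [('TimeoutError', 2), ('ConnectionError', 1)]
```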
Example 2: Configuration File Reader
A utility that reads a simple key=value config file, supporting comments and sections:
def read_config(filepath):
    """
    Read a configuration file with the format:
    # comment
    [section]
    key = value
    """
    config = {}
    current_section = "default"
    with open(filepath, "r", encoding="utf-8") as f:
        for line_num, line in enumerate(f, start=1):
            line = line.strip()
            # Skip empty lines and comments
            if not line or line.startswith("#"):
                continue
            # Section header: [section_name]
            if line.startswith("[") and line.endswith("]"):
                current_section = line[1:-1].strip()
                if current_section not in config:
                    config[current_section] = {}
                continue
            # Key = Value pair
            if "=" in line:
                key, value = line.split("=", 1)  # Split on first = only
                key = key.strip()
                value = value.strip()
                # Type conversion
                if value.lower() in ("true", "yes"):
                    value = True
                elif value.lower() in ("false", "no"):
                    value = False
                elif value.isdigit():
                    value = int(value)
                else:
                    try:
                        value = float(value)
                    except ValueError:
                        pass  # Keep as string
                if current_section not in config:
                    config[current_section] = {}
                config[current_section][key] = value
            else:
                print(f"Warning: could not parse line {line_num}: {line}")
    return config
def write_config(filepath, config):
    """Write a configuration dictionary to a file."""
    from datetime import datetime
    with open(filepath, "w", encoding="utf-8") as f:
        f.write("# Configuration file\n")
        f.write(f"# Generated on {datetime.now()}\n\n")
        for section, values in config.items():
            f.write(f"[{section}]\n")
            for key, value in values.items():
                f.write(f"{key} = {value}\n")
            f.write("\n")
# Example config file (app.conf):
# ---------------------------------
# # Application Configuration
#
# [server]
# host = localhost
# port = 8080
# debug = true
#
# [database]
# host = localhost
# port = 5432
# name = myapp
# ---------------------------------
# Usage:
# config = read_config("app.conf")
# print(config["server"]["host"]) # localhost
# print(config["server"]["port"]) # 8080 (as int)
# print(config["server"]["debug"]) # True (as bool)
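The type-conversion branch of read_config is worth isolating so it can be tested on its own; an equivalent standalone helper (the name coerce is mine):

```python
def coerce(value):
    """Apply the same string-to-type rules read_config uses."""
    if value.lower() in ("true", "yes"):
        return True
    if value.lower() in ("false", "no"):
        return False
    if value.isdigit():
        return int(value)
    try:
        return float(value)
    except ValueError:
        return value  # keep as string

print(coerce("8080"), coerce("true"), coerce("3.14"), coerce("localhost"))
# 8080 True 3.14 localhost
```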
Example 3: CSV Report Generator
A program that reads raw data from a CSV, processes it, and generates a summary report:
import csv
def generate_sales_report(input_csv, output_csv):
    """
    Read a sales CSV and generate a summary by region.
    Input CSV format: Date,Region,Product,Quantity,Price
    Output CSV format: Region,TotalSales,AverageOrderValue,OrderCount
    """
    region_data = {}
    # Read input data
    with open(input_csv, "r", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        for row in reader:
            region = row["Region"]
            quantity = int(row["Quantity"])
            price = float(row["Price"])
            sale_amount = quantity * price
            if region not in region_data:
                region_data[region] = {"total_sales": 0, "order_count": 0}
            region_data[region]["total_sales"] += sale_amount
            region_data[region]["order_count"] += 1
    # Calculate averages and write output
    with open(output_csv, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=[
            "Region", "TotalSales", "AverageOrderValue", "OrderCount"
        ])
        writer.writeheader()
        for region, data in sorted(region_data.items()):
            avg_value = data["total_sales"] / data["order_count"]
            writer.writerow({
                "Region": region,
                "TotalSales": round(data["total_sales"], 2),
                "AverageOrderValue": round(avg_value, 2),
                "OrderCount": data["order_count"],
            })
    print(f"Report saved to {output_csv}")
    print(f"Processed {sum(d['order_count'] for d in region_data.values())} orders "
          f"across {len(region_data)} regions")
# Usage:
# generate_sales_report("sales_data.csv", "sales_summary.csv")
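Because csv.DictReader accepts any file-like object, the grouping logic can be tried without touching disk, using io.StringIO and a few made-up rows:

```python
import csv
import io

# A tiny in-memory sample in the expected input format
sample = io.StringIO(
    "Date,Region,Product,Quantity,Price\n"
    "2026-03-01,North,Widget,2,10.0\n"
    "2026-03-02,North,Widget,1,10.0\n"
    "2026-03-02,South,Gadget,5,4.0\n"
)

totals = {}
for row in csv.DictReader(sample):
    amount = int(row["Quantity"]) * float(row["Price"])
    totals[row["Region"]] = totals.get(row["Region"], 0) + amount

print(totals)  # {'North': 30.0, 'South': 20.0}
```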
Example 4: Student Records System (JSON-Based CRUD)
A complete mini-application that creates, reads, updates, and deletes student records stored in a JSON file:
import json
from pathlib import Path
RECORDS_FILE = "students.json"
def load_students():
    """Load student records from the JSON file."""
    path = Path(RECORDS_FILE)
    if not path.exists():
        return []
    with open(RECORDS_FILE, "r", encoding="utf-8") as f:
        return json.load(f)

def save_students(students):
    """Save student records to the JSON file."""
    with open(RECORDS_FILE, "w", encoding="utf-8") as f:
        json.dump(students, f, indent=2, ensure_ascii=False)

def add_student(name, age, course, score):
    """Add a new student record."""
    students = load_students()
    # Generate a simple ID
    new_id = max((s["id"] for s in students), default=0) + 1
    student = {
        "id": new_id,
        "name": name,
        "age": age,
        "course": course,
        "score": score
    }
    students.append(student)
    save_students(students)
    print(f"Added student: {name} (ID: {new_id})")
    return new_id

def get_student(student_id):
    """Get a student by their ID."""
    students = load_students()
    for student in students:
        if student["id"] == student_id:
            return student
    return None

def update_student(student_id, **kwargs):
    """Update a student's fields."""
    students = load_students()
    for student in students:
        if student["id"] == student_id:
            for key, value in kwargs.items():
                if key in student and key != "id":
                    student[key] = value
            save_students(students)
            print(f"Updated student ID {student_id}")
            return True
    print(f"Student ID {student_id} not found")
    return False

def delete_student(student_id):
    """Delete a student by their ID."""
    students = load_students()
    original_count = len(students)
    students = [s for s in students if s["id"] != student_id]
    if len(students) < original_count:
        save_students(students)
        print(f"Deleted student ID {student_id}")
        return True
    print(f"Student ID {student_id} not found")
    return False

def list_students():
    """Display all students in a formatted table."""
    students = load_students()
    if not students:
        print("No student records found.")
        return
    print(f"\n{'ID':<5} {'Name':<20} {'Age':<5} {'Course':<15} {'Score':<6}")
    print("-" * 55)
    for s in students:
        print(f"{s['id']:<5} {s['name']:<20} {s['age']:<5} {s['course']:<15} {s['score']:<6}")
    print(f"\nTotal students: {len(students)}")

def search_students(query):
    """Search students by name (case-insensitive)."""
    students = load_students()
    results = [s for s in students if query.lower() in s["name"].lower()]
    if results:
        print(f"Found {len(results)} match(es):")
        for s in results:
            print(f" ID {s['id']}: {s['name']} - {s['course']} (Score: {s['score']})")
    else:
        print(f"No students found matching '{query}'")
    return results
# Usage example:
# add_student("Priya Sharma", 22, "Python", 95)
# add_student("Rahul Verma", 24, "Data Science", 88)
# add_student("Ananya Patel", 23, "Machine Learning", 92)
#
# list_students()
# Output:
# ID Name Age Course Score
# -------------------------------------------------------
# 1 Priya Sharma 22 Python 95
# 2 Rahul Verma 24 Data Science 88
# 3 Ananya Patel 23 Machine Learning 92
#
# Total students: 3
#
# update_student(2, score=91)
# delete_student(3)
# search_students("priya")
Common Mistakes
Mistake 1: Not Closing Files
# BAD — file may remain open if an error occurs
file = open("data.txt", "r")
content = file.read()
process(content)  # If this raises an error, file.close() never runs
file.close()

# GOOD — use 'with' to guarantee closing
with open("data.txt", "r") as file:
    content = file.read()
process(content)
Mistake 2: Using the Wrong Mode
# BAD — accidentally overwriting data you wanted to add to
with open("log.txt", "w") as f:  # "w" erases everything!
    f.write("New entry\n")

# GOOD — use "a" to append
with open("log.txt", "a") as f:
    f.write("New entry\n")

# BAD — trying to write to a read-only file
with open("data.txt", "r") as f:
    f.write("Hello")  # io.UnsupportedOperation: not writable

# GOOD — open in write or read-write mode
with open("data.txt", "w") as f:
    f.write("Hello")
Mistake 3: Encoding Errors (UnicodeDecodeError)
# BAD — may fail if the file contains non-ASCII characters
with open("data.txt", "r") as f:
    content = f.read()
# UnicodeDecodeError: 'ascii' codec can't decode byte 0xe0

# GOOD — specify encoding explicitly
with open("data.txt", "r", encoding="utf-8") as f:
    content = f.read()

# Last resort if the encoding is unknown: 'latin-1' maps every byte to a
# character, so it never raises, but non-Latin text may come out garbled
with open("data.txt", "r", encoding="latin-1") as f:
    content = f.read()
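A middle ground between strict UTF-8 and latin-1 is the errors parameter of open(): errors="replace" keeps UTF-8 but substitutes each undecodable byte with U+FFFD instead of raising. A sketch that manufactures a bad byte on purpose:

```python
from pathlib import Path

# Write a byte sequence that is NOT valid UTF-8 (0xE9 is 'é' in Latin-1)
Path("data.txt").write_bytes(b"caf\xe9")

with open("data.txt", "r", encoding="utf-8", errors="replace") as f:
    content = f.read()

print(content == "caf\ufffd")  # True: the bad byte became U+FFFD
Path("data.txt").unlink()      # clean up the demo file
```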
Mistake 4: Reading Entire Large Files into Memory
# BAD — loads a 2GB file entirely into memory
with open("huge_log.txt", "r") as f:
    content = f.read()  # May crash or freeze your computer!

# GOOD — process line by line
with open("huge_log.txt", "r") as f:
    for line in f:  # Reads one line at a time
        if "ERROR" in line:
            print(line.strip())
Mistake 5: Forgetting newline="" with CSV
import csv

# BAD — on Windows, this creates blank lines between rows
with open("data.csv", "w") as f:
    writer = csv.writer(f)
    writer.writerows(data)

# GOOD — always use newline="" with csv
with open("data.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerows(data)
Mistake 6: Not Handling FileNotFoundError
# BAD — crashes if the file does not exist
with open("missing_file.txt", "r") as f:
    content = f.read()
# FileNotFoundError: [Errno 2] No such file or directory: 'missing_file.txt'

# GOOD — handle the error gracefully
try:
    with open("missing_file.txt", "r", encoding="utf-8") as f:
        content = f.read()
except FileNotFoundError:
    print("File not found. Please check the path.")
    content = ""
Mistake 7: Transforming File Lines the Hard Way
# CLUNKY — reads every line, then mutates the list in place
with open("data.txt", "r") as f:
    lines = f.readlines()
for i, line in enumerate(lines):
    lines[i] = line.upper()  # Works, but verbose compared to a comprehension
# GOOD — build a new list directly
with open("data.txt", "r", encoding="utf-8") as f:
    lines = [line.strip().upper() for line in f]
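When the goal is to transform a file's lines and save the result, the full pattern is: read into a new list, then write back. A self-contained sketch (the file name and contents are illustrative):

```python
from pathlib import Path

Path("data.txt").write_text("alpha\nbeta\n", encoding="utf-8")

# Read and transform into a new list in one pass
with open("data.txt", "r", encoding="utf-8") as f:
    lines = [line.strip().upper() for line in f]

# Write the transformed lines back
with open("data.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(lines) + "\n")

print(Path("data.txt").read_text(encoding="utf-8"))  # ALPHA and BETA, one per line
Path("data.txt").unlink()  # clean up the demo file
```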
Best Practices
1. Always Use with for File Operations
# This is always the right choice
with open("file.txt", "r", encoding="utf-8") as f:
    content = f.read()
2. Always Specify Encoding Explicitly
# Be explicit — avoid platform-dependent behavior
with open("file.txt", "r", encoding="utf-8") as f:
    content = f.read()
3. Handle Errors Gracefully
from pathlib import Path
def safe_read(filepath):
    """Read a file with comprehensive error handling."""
    try:
        with open(filepath, "r", encoding="utf-8") as f:
            return f.read()
    except FileNotFoundError:
        print(f"Error: '{filepath}' does not exist.")
    except PermissionError:
        print(f"Error: No permission to read '{filepath}'.")
    except UnicodeDecodeError:
        print(f"Error: '{filepath}' contains non-UTF-8 characters.")
    except IsADirectoryError:
        print(f"Error: '{filepath}' is a directory, not a file.")
    return None
4. Use pathlib for File Paths
from pathlib import Path
# Modern, clean, cross-platform
data_dir = Path("data")
output_file = data_dir / "results" / "report.csv"
# Create parent directories if needed
output_file.parent.mkdir(parents=True, exist_ok=True)
# Check before reading
if output_file.exists():
    content = output_file.read_text(encoding="utf-8")
5. Process Large Files Line by Line
# Memory-efficient for any file size
def count_lines(filepath):
    count = 0
    with open(filepath, "r", encoding="utf-8") as f:
        for _ in f:
            count += 1
    return count
6. Use Temporary Files for Intermediate Data
import tempfile
import json
# Process data through a temporary file
with tempfile.NamedTemporaryFile(mode="w", suffix=".json",
                                 delete=True, encoding="utf-8") as tmp:
    json.dump(intermediate_data, tmp)
    tmp.flush()  # Ensure data is written to disk
    # Pass tmp.name to another function that reads it
    # (on Windows, reopening the file by name requires delete=False)
    process_file(tmp.name)
# Temp file is cleaned up automatically
7. Use Atomic Writes for Critical Data
When writing important files, write to a temporary file first, then rename it. This prevents data corruption if the program crashes mid-write:
import json
import tempfile
import os
from pathlib import Path
def safe_json_write(filepath, data):
    """Write JSON data atomically to prevent corruption."""
    filepath = Path(filepath)
    # Write to a temporary file in the same directory
    fd, tmp_path = tempfile.mkstemp(
        dir=filepath.parent,
        suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f, indent=2)
        # Atomic rename (on most filesystems)
        os.replace(tmp_path, filepath)
    except Exception:
        os.unlink(tmp_path)  # Clean up temp file on error
        raise
Summary of Best Practices
| Practice | Why |
|---|---|
| Use with statement | Guarantees file is closed properly |
| Specify encoding="utf-8" | Avoids platform-dependent encoding issues |
| Handle FileNotFoundError | Prevents crashes on missing files |
| Use pathlib.Path for paths | Cross-platform, readable, object-oriented |
| Iterate line by line for large files | Keeps memory usage low |
| Use newline="" with CSV | Prevents blank rows on Windows |
| Use "x" mode for new files | Prevents accidental overwriting |
| Write to temp file, then rename | Prevents data corruption on crash |
Practice Exercises
Exercise 1: Word Counter
Write a program that reads a text file and prints:
- Total number of lines
- Total number of words
- Total number of characters (excluding newlines)
- The 5 most common words
# Hint structure:
def word_counter(filepath):
    """Count words, lines, and characters in a text file."""
    # Read the file
    # Count lines, words, characters
    # Use collections.Counter for most common words
    # Print results
    pass
# Expected output:
# Lines: 150
# Words: 1234
# Characters: 6789
# Most common words:
# the: 45
# is: 32
# and: 28
# to: 25
# of: 22
Exercise 2: CSV Grade Calculator
Write a program that:
- Reads a CSV file with columns: Name,Math,Science,English
- Calculates the average score for each student
- Assigns a grade (A: 90+, B: 80+, C: 70+, D: 60+, F: below 60)
- Writes the results to a new CSV with columns: Name,Average,Grade
# Input CSV (grades.csv):
# Name,Math,Science,English
# Priya,95,88,92
# Rahul,72,68,75
# Ananya,88,91,85
# Output CSV (results.csv):
# Name,Average,Grade
# Priya,91.67,A
# Rahul,71.67,C
# Ananya,88.0,B
Exercise 3: JSON Phonebook
Build a phonebook application using JSON for storage that supports:
- Adding a contact (name, phone, email)
- Searching by name
- Deleting a contact
- Listing all contacts
- Exporting contacts to CSV
# Hint: Use the Student Records System example as a starting point
# Store data in phonebook.json
# Each contact: {"name": "...", "phone": "...", "email": "..."}
Exercise 4: File Organizer
Write a program that organizes files in a directory by moving them into subfolders based on their extension:
# Before:
# downloads/
# photo1.jpg
# report.pdf
# data.csv
# script.py
# notes.txt
# image.png
# After:
# downloads/
# Images/
# photo1.jpg
# image.png
# Documents/
# report.pdf
# Data/
# data.csv
# Code/
# script.py
# Text/
# notes.txt
# Hint: Use pathlib for path operations and shutil.move for moving files
# Define a mapping: {".jpg": "Images", ".png": "Images", ".pdf": "Documents", ...}
Exercise 5: Log File Merger
Write a program that:
- Reads multiple log files from a directory (all .log files)
- Each line has a timestamp format: 2026-03-15 10:30:00 - message
- Merges all lines from all files
- Sorts them by timestamp
- Writes the sorted, merged result to a single output file
# Hint:
# 1. Use pathlib.Path.glob("*.log") to find all log files
# 2. Read all lines from all files into a single list
# 3. Sort by the timestamp portion of each line
# 4. Write the sorted lines to output
Exercise 6: File Backup Tool
Write a backup utility that:
- Takes a source directory and a backup directory as arguments
- Copies all files from source to backup, preserving directory structure
- Only copies files that are newer than the backup copy (or missing from backup)
- Generates a backup report listing all copied files and total bytes transferred
# Hint:
# Use pathlib for path operations
# Use os.stat().st_mtime to compare modification times
# Use shutil.copy2 to copy files (preserves metadata)
# Use os.walk or Path.rglob to traverse directories
Summary
In this chapter, you learned:
- Why file handling matters — Programs need to persist data, read configurations, process logs, and exchange data with other systems.
- Opening files — The open() function with various modes (r, w, a, x, r+, w+, a+) and the importance of specifying encoding="utf-8".
- The with statement — The recommended way to work with files, guaranteeing proper cleanup even when errors occur.
- Reading files — read(), readline(), readlines(), line-by-line iteration (most memory-efficient), and chunk-based reading.
- Writing files — write(), writelines(), the difference between overwriting (w) and appending (a), and creating files safely with x mode.
- File modes in depth — Text vs. binary modes, the + read-write modes, and when to use each.
- File pointer manipulation — Using tell() and seek() to navigate within files.
- CSV files — Reading and writing with csv.reader, csv.writer, csv.DictReader, and csv.DictWriter, plus custom delimiters and quoting options.
- JSON files — json.load(), json.dump(), json.loads(), json.dumps(), pretty-printing with indent, handling nested data, and custom serialization.
- Binary files — Reading and writing bytes, copying files, and identifying file types by magic bytes.
- File system operations — Using os and os.path for checking existence, listing directories, creating and removing directories, and walking directory trees.
- The pathlib module — The modern, object-oriented approach to file paths with Path objects, glob patterns, and convenience methods.
- Temporary files — Using the tempfile module for intermediate data that cleans up after itself.
- Best practices — Always use with, specify encoding, handle errors gracefully, use pathlib for paths, and process large files line by line.
File handling is a foundational skill that you will use in almost every Python project. With the techniques covered here, you are well-equipped to read, write, and manage files of all types confidently.