Strings in Python
A string is an immutable sequence of Unicode characters. Every piece of text you work with in Python — a name, a sentence, a line from a file, an HTTP response — is a string. Strings are one of the most frequently used data types, and Python gives you an exceptionally rich set of tools for creating, inspecting, and transforming them.
# Strings are objects of the str class
greeting = "Hello, World!"
print(type(greeting)) # <class 'str'>
print(len(greeting)) # 13
Because Python 3 strings are Unicode by default, you can include characters from any language or symbol set:
hindi = "नमस्ते"
emoji = "Python is fun 🐍"
japanese = "こんにちは"
print(hindi, emoji, japanese)
Creating Strings
Python offers several ways to create strings, each suited to different situations.
Single and Double Quotes
Single quotes and double quotes are interchangeable. Use whichever lets you avoid escaping:
name = 'Meritshot'
name = "Meritshot" # exactly the same
# Use the other quote style to embed quotes naturally
message = "It's a beautiful day"
html = '<div class="container">Hello</div>'
Triple Quotes (Multi-Line Strings)
Triple quotes (""" or ''') let you write strings that span multiple lines. The line breaks are preserved as \n characters:
poem = """Roses are red,
Violets are blue,
Python is awesome,
And so are you."""
print(poem)
# Roses are red,
# Violets are blue,
# Python is awesome,
# And so are you.
# Triple quotes are also used for docstrings
def greet(name):
    """Return a personalised greeting string."""
    return f"Hello, {name}!"
Raw Strings
Prefixing a string with r or R tells Python to treat backslashes as literal characters, not as escape sequences. This is especially useful for regular expressions and Windows file paths:
# Without raw string — \n is interpreted as a newline
path = "C:\new_folder\test"
print(path)
# C:
# ew_folder est
# With raw string — backslashes are kept literally
path = r"C:\new_folder\test"
print(path) # C:\new_folder\test
# Essential for regex patterns
import re
pattern = r"\d{3}-\d{4}" # matches 123-4567
Byte Strings
Prefixing with b creates a bytes object instead of a str. Byte strings hold raw binary data and are used for file I/O, network protocols, and encoding operations:
data = b"Hello"
print(type(data)) # <class 'bytes'>
print(data[0]) # 72 (ASCII code for 'H')
# Convert between str and bytes
text = "Hello"
encoded = text.encode("utf-8") # str → bytes
decoded = encoded.decode("utf-8") # bytes → str
print(encoded) # b'Hello'
print(decoded) # Hello
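The ASCII example above hides the fact that one character can occupy several bytes once encoded. A minimal sketch with a non-ASCII word (the sample text and variable names are illustrative assumptions):

```python
# Non-ASCII characters take more than one byte in UTF-8.
word = "caf\u00e9"                 # "café", written with an escape for clarity
utf8_bytes = word.encode("utf-8")

print(len(word))        # 4 characters
print(len(utf8_bytes))  # 5 bytes, because "é" is encoded as two bytes
print(utf8_bytes)       # b'caf\xc3\xa9'

# Decoding with a codec that cannot represent those bytes raises an error...
try:
    utf8_bytes.decode("ascii")
except UnicodeDecodeError as exc:
    print(f"decode failed: {exc.reason}")

# ...unless decode() is told how to handle undecodable bytes.
lossy = utf8_bytes.decode("ascii", errors="replace")
print(repr(lossy))      # each undecodable byte becomes U+FFFD
```

The practical point is that `len()` of a `str` counts characters while `len()` of a `bytes` object counts bytes, so the two can differ for the same text.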
String Indexing and Slicing
Strings are sequences, which means every character has a position (index) and you can extract sub-sequences (slices).
Positive Indexing
Indices start at 0 for the first character:
text = "Python"
# P y t h o n
# 0 1 2 3 4 5
print(text[0]) # P — first character
print(text[1]) # y
print(text[5]) # n — last character
# print(text[6]) # IndexError: string index out of range
Negative Indexing
Negative indices count from the end, starting at -1:
text = "Python"
# P y t h o n
# -6 -5 -4 -3 -2 -1
print(text[-1]) # n — last character
print(text[-2]) # o — second to last
print(text[-6]) # P — first character
Slicing: [start:stop:step]
Slicing extracts a substring. The syntax is string[start:stop:step] where:
- start: index where the slice begins (inclusive, default 0)
- stop: index where the slice ends (exclusive, default end of string)
- step: how far to advance between characters (default 1)
text = "Hello, Python!"
# H e l l o , P y t h o n !
# 0 1 2 3 4 5 6 7 8 9 10 11 12 13
# Basic slicing
print(text[0:5]) # Hello — characters 0 to 4
print(text[7:13]) # Python — characters 7 to 12
print(text[7:]) # Python! — from index 7 to end
print(text[:5]) # Hello — from start to index 4
print(text[:]) # Hello, Python! — full copy
# Slicing with step
print(text[0:10:2]) # Hlo y — every 2nd character from 0 to 9
print(text[::3]) # Hl tn (every 3rd character: indices 0, 3, 6, 9, 12)
# Negative indices in slices
print(text[-7:-1]) # Python — 7th from end to 2nd from end
print(text[-7:]) # Python! — 7th from end to the end
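One difference between indexing and slicing is worth seeing in code: an out-of-range index raises `IndexError`, while out-of-range slice bounds are quietly clamped to the string. A short sketch using the same sample text:

```python
text = "Hello, Python!"

# Slices clamp out-of-range bounds instead of raising.
tail = text[7:100]     # stop is far past the end
empty = text[50:60]    # start is past the end

print(tail)            # Python!
print(repr(empty))     # ''

# Indexing a single position is strict.
try:
    text[100]
except IndexError as exc:
    print(f"IndexError: {exc}")
```

This is why slicing is a safe way to take "up to N characters" without checking the length first.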
Reversing a String
A step of -1 reverses the string:
text = "Python"
reversed_text = text[::-1]
print(reversed_text) # nohtyP
# Reverse just a portion
print(text[4::-1]) # ohtyP — from index 4 backwards to start
Getting Every Nth Character
text = "abcdefghijklmnop"
print(text[::2]) # acegikmo — every 2nd character
print(text[::3]) # adgjmp — every 3rd character
print(text[1::2]) # bdfhjlnp (every 2nd character starting from index 1)
String Immutability
Strings in Python are immutable — once a string is created, you cannot change any of its characters in place. Any operation that appears to modify a string actually creates a brand-new string object.
name = "Python"
# This will raise an error
# name[0] = "J" # TypeError: 'str' object does not support item assignment
# Instead, create a new string
name = "J" + name[1:]
print(name) # Jython
Why Immutability Matters
- Safety — strings can be used as dictionary keys and set elements because their hash never changes
- Performance — Python can optimise memory by reusing identical string objects (interning)
- Thread safety — immutable objects are inherently safe to share across threads
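The first point can be illustrated with a small sketch. The page names and the `visits` dictionary are made up for the example:

```python
original = "python"
shouted = original.upper()   # returns a new string

print(original)   # python  (unchanged)
print(shouted)    # PYTHON

# Strings are hashable, so they work as dictionary keys.
visits = {}
for page in ["home", "about", "home"]:
    visits[page] = visits.get(page, 0) + 1
print(visits)     # {'home': 2, 'about': 1}

# A list is mutable and therefore unhashable.
try:
    visits[["home"]] = 1
except TypeError as exc:
    print(f"TypeError: {exc}")
```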
Workaround: Convert to a List
If you need to modify characters frequently, convert to a list, make your changes, and join back:
text = "Hello, World!"
chars = list(text) # ['H', 'e', 'l', 'l', 'o', ',', ' ', 'W', 'o', 'r', 'l', 'd', '!']
chars[0] = 'J'
chars[7] = 'w'
result = "".join(chars)
print(result) # Jello, world!
String Methods
Python strings come with dozens of built-in methods. None of them modify the original string (because strings are immutable); each returns a new value, usually a new string.
Case Methods
text = "hello, WORLD! Python 3"
print(text.upper()) # HELLO, WORLD! PYTHON 3
print(text.lower()) # hello, world! python 3
print(text.title()) # Hello, World! Python 3
print(text.capitalize()) # Hello, world! python 3 (only first char)
print(text.swapcase()) # HELLO, world! pYTHON 3
casefold() — aggressive lowering that handles special Unicode characters. Use this for case-insensitive comparisons:
german = "Straße" # German word with ß
print(german.lower()) # straße
print(german.casefold()) # strasse (ß → ss)
# Case-insensitive comparison
word1 = "Straße"
word2 = "STRASSE"
print(word1.casefold() == word2.casefold()) # True
Search Methods
text = "Hello, Python! Python is great."
# find() — returns first index or -1 if not found
print(text.find("Python")) # 7
print(text.find("Python", 10)) # 15 (search from index 10)
print(text.find("Java")) # -1
# rfind() — searches from the right
print(text.rfind("Python")) # 15 (last occurrence)
# index() — like find() but raises ValueError if not found
print(text.index("Python")) # 7
# print(text.index("Java")) # ValueError: substring not found
# rindex() — like rfind() but raises ValueError
print(text.rindex("Python")) # 15
# count() — count non-overlapping occurrences
print(text.count("Python")) # 2
print(text.count("o")) # 3
print(text.count("o", 0, 10)) # 1 (count within range)
Tip: Use find() when you're not sure the substring exists (it returns -1). Use index() when you expect it to exist and want an error if it doesn't.
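As a sketch of that choice, here are two hypothetical helpers (the names `extract_domain` and `require_domain` are not from any library) that look up the same character but treat a miss differently:

```python
def extract_domain(address):
    """Return the part after '@', or None when there is no '@'."""
    at = address.find("@")        # -1 signals "not found"
    if at == -1:
        return None
    return address[at + 1:]


def require_domain(address):
    """Same lookup, but a missing '@' is treated as an error."""
    at = address.index("@")       # raises ValueError when absent
    return address[at + 1:]


print(extract_domain("user@example.com"))  # example.com
print(extract_domain("nobody"))            # None
print(require_domain("user@example.com"))  # example.com
```

Note the explicit `-1` check: without it, `address[at + 1:]` would become `address[0:]` and silently return the whole string.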
Check Methods (Return Boolean)
These methods test the content of a string and return True or False:
# Prefix and suffix
print("hello.py".startswith("hello")) # True
print("hello.py".endswith(".py")) # True
print("hello.py".startswith(("hello", "world"))) # True (tuple of prefixes)
# Character type checks
print("Hello".isalpha()) # True — only letters
print("12345".isdigit()) # True — only digits
print("Hello123".isalnum()) # True — letters and digits only
print(" \t\n".isspace()) # True — only whitespace
# Case checks
print("HELLO".isupper()) # True
print("hello".islower()) # True
print("Hello World".istitle()) # True — title case
# Numeric checks for different scripts
print("42".isnumeric()) # True
print("²".isnumeric()) # True (superscript is numeric)
print("½".isnumeric()) # True (fraction is numeric)
print("²".isdigit()) # True
print("½".isdigit()) # False (fraction is NOT a digit)
| Method | Returns True When | Example |
|---|---|---|
| isalpha() | All characters are letters | "Hello" |
| isdigit() | All characters are digits | "123" |
| isalnum() | All characters are letters or digits | "abc123" |
| isspace() | All characters are whitespace | " \t\n" |
| isupper() | All cased characters are uppercase | "HELLO" |
| islower() | All cased characters are lowercase | "hello" |
| istitle() | Title case (each word capitalised) | "Hello World" |
| isnumeric() | All characters are numeric (broad) | "42", "½" |
| isidentifier() | Valid Python identifier | "my_var" |
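A common use of these checks is guarding a call to `int()`. The sketch below assumes a hypothetical helper name, `parse_int_or_none`, and uses `isdecimal()` plus `isascii()` (Python 3.7+) because `isdigit()` accepts characters such as "²" that `int()` rejects:

```python
def parse_int_or_none(text):
    """Return int(text) for plain 0-9 input, otherwise None."""
    # isdecimal() is stricter than isdigit(): it rejects "²".
    # isascii() additionally limits the input to the characters 0-9.
    if text.isascii() and text.isdecimal():
        return int(text)
    return None


print("²".isdigit())              # True, yet int("²") raises ValueError
print(parse_int_or_none("42"))    # 42
print(parse_int_or_none("²"))     # None
print(parse_int_or_none("-42"))   # None: the sign is not a digit
print(parse_int_or_none(""))      # None: an empty string fails isdecimal()
```

When negative numbers or surrounding whitespace must be accepted, wrapping `int(text)` in `try`/`except ValueError` is usually the simpler route.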
Transform Methods
# strip() — remove leading/trailing whitespace (or specified characters)
text = " Hello, World! "
print(text.strip()) # "Hello, World!"
print(text.lstrip()) # "Hello, World! "
print(text.rstrip()) # " Hello, World!"
# Strip specific characters
url = "###Welcome###"
print(url.strip("#")) # "Welcome"
print(url.lstrip("#")) # "Welcome###"
print(url.rstrip("#")) # "###Welcome"
# replace(old, new, count) — replace occurrences
text = "banana banana banana"
print(text.replace("banana", "apple")) # "apple apple apple"
print(text.replace("banana", "apple", 2)) # "apple apple banana"
# translate() and maketrans() — character-level replacement
# Create a translation table: a→@, e→3, o→0
table = str.maketrans("aeo", "@30")
message = "Hello everyone"
print(message.translate(table)) # H3ll0 3v3ry0n3
# Use translate to remove characters (map to None)
remove_digits = str.maketrans("", "", "0123456789")
mixed = "Room 404, Floor 3"
print(mixed.translate(remove_digits)) # "Room , Floor "
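One pitfall with the strip family: the argument is a set of characters, not a prefix or suffix. A short sketch of the difference, assuming Python 3.9 or later for `removeprefix()` and `removesuffix()` (the file name is illustrative):

```python
filename = "report.txt"

# rstrip() keeps removing any of ".", "t", "x" from the right,
# so the final "t" of "report" is stripped as well.
stripped = filename.rstrip(".txt")
print(stripped)    # repor

# removesuffix() / removeprefix() remove one exact substring.
trimmed = filename.removesuffix(".txt")
print(trimmed)     # report

banner = "###Welcome###".removeprefix("###")
print(banner)      # Welcome###
```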
Split and Join
# split(sep, maxsplit) — split string into a list
text = "apple,banana,cherry,date"
print(text.split(",")) # ['apple', 'banana', 'cherry', 'date']
print(text.split(",", 2)) # ['apple', 'banana', 'cherry,date']
# Default split — splits on any whitespace and removes empty strings
text = " Hello World Python "
print(text.split()) # ['Hello', 'World', 'Python']
# rsplit() — splits from the right
text = "a/b/c/d/e"
print(text.rsplit("/", 2)) # ['a/b/c', 'd', 'e']
# splitlines() — split on line boundaries
multiline = "Line 1\nLine 2\rLine 3\r\nLine 4"
print(multiline.splitlines())
# ['Line 1', 'Line 2', 'Line 3', 'Line 4']
# join(iterable) — join a list into a string
words = ["Python", "is", "awesome"]
print(" ".join(words)) # "Python is awesome"
print("-".join(words)) # "Python-is-awesome"
print(", ".join(words)) # "Python, is, awesome"
# Join with newlines
lines = ["Line 1", "Line 2", "Line 3"]
print("\n".join(lines))
# Line 1
# Line 2
# Line 3
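To show split() and its maxsplit argument working together, here is a minimal sketch that parses a "key=value; key=value" line into a dictionary. The input format and the function name `parse_settings` are assumptions made for this example:

```python
def parse_settings(line):
    """Parse 'key=value; key=value' into a dict (hypothetical format)."""
    settings = {}
    for item in line.split(";"):
        item = item.strip()
        if "=" not in item:
            continue                       # skip empty or malformed pieces
        # maxsplit=1 splits at the first "=" only, so values may contain "=".
        key, value = item.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


print(parse_settings("host=localhost; port=8080; query=a=b;"))
# {'host': 'localhost', 'port': '8080', 'query': 'a=b'}
```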
Padding and Alignment
name = "Python"
# ljust(width, fillchar): left-justify (pad on the right)
print(name.ljust(15)) # "Python         "
print(name.ljust(15, "-")) # "Python---------"
# rjust(width, fillchar): right-justify (pad on the left)
print(name.rjust(15)) # "         Python"
print(name.rjust(15, ".")) # ".........Python"
# center(width, fillchar)
print(name.center(20)) # "       Python       "
print(name.center(20, "=")) # "=======Python======="
# zfill(width) — pad with zeros on the left (useful for numbers)
print("42".zfill(5)) # "00042"
print("-42".zfill(6)) # "-00042" (sign stays in front)
print("3.14".zfill(8)) # "00003.14"
String Formatting
Python offers three main ways to format strings. f-strings are the modern, recommended approach.
f-Strings (Recommended — Python 3.6+)
f-strings (formatted string literals) are prefixed with f and allow you to embed Python expressions inside curly braces {}:
name = "Priya"
age = 22
score = 95.678
# Basic variable insertion
print(f"Name: {name}") # Name: Priya
print(f"Age: {age}") # Age: 22
# Expressions inside braces
print(f"Next year: {age + 1}") # Next year: 23
print(f"Name length: {len(name)}") # Name length: 5
print(f"Uppercase: {name.upper()}") # Uppercase: PRIYA
# Conditional expressions
print(f"Status: {'Pass' if score >= 40 else 'Fail'}") # Status: Pass
Number Formatting with f-Strings
pi = 3.14159265
big_number = 1234567890
price = 49999.5
percentage = 0.8567
# Decimal places
print(f"Pi: {pi:.2f}") # Pi: 3.14
print(f"Pi: {pi:.4f}") # Pi: 3.1416
# Thousands separator
print(f"Population: {big_number:,}") # Population: 1,234,567,890
print(f"Population: {big_number:_}") # Population: 1_234_567_890
# Currency formatting
print(f"Price: ₹{price:,.2f}") # Price: ₹49,999.50
# Percentage
print(f"Score: {percentage:.1%}") # Score: 85.7%
# Scientific notation
print(f"Value: {big_number:.2e}") # Value: 1.23e+09
# Integer formatting
print(f"Binary: {42:b}") # Binary: 101010
print(f"Octal: {42:o}") # Octal: 52
print(f"Hex: {255:x}") # Hex: ff
print(f"Hex: {255:X}") # Hex: FF
Alignment and Padding with f-Strings
name = "Python"
num = 42
# Alignment: < left, > right, ^ center
print(f"|{name:<20}|") # |Python              |
print(f"|{name:>20}|") # |              Python|
print(f"|{name:^20}|") # |       Python       |
# Padding with custom characters
print(f"|{name:*<20}|") # |Python**************|
print(f"|{name:*>20}|") # |**************Python|
print(f"|{name:*^20}|") # |*******Python*******|
# Number padding
print(f"|{num:05d}|") # |00042|
print(f"|{num:10d}|") # |        42|
print(f"|{num:<10d}|") # |42        |
# Practical: formatted table
products = [("Widget", 29.99), ("Gadget", 149.50), ("Thingamajig", 5.00)]
print(f"{'Product':<15} {'Price':>10}")
print("-" * 26)
for product, price in products:
    print(f"{product:<15} {'₹':>1}{price:>9.2f}")
# Product              Price
# --------------------------
# Widget          ₹    29.99
# Gadget          ₹   149.50
# Thingamajig     ₹     5.00
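Widths and precisions do not have to be hard-coded: a format spec can itself contain braces. The helper below is a sketch; `format_row` and its default values are hypothetical names chosen for illustration:

```python
def format_row(label, value, width=12, precision=2):
    """Format one table row; width and precision are chosen at runtime."""
    # Inner braces are evaluated first and become part of the format spec.
    return f"{label:<{width}}{value:>{width}.{precision}f}"


row1 = format_row("Widget", 29.99)
row2 = format_row("Pi", 3.14159, 6, 4)
print(f"|{row1}|")   # |Widget             29.99|
print(f"|{row2}|")   # |Pi    3.1416|
```

This is handy when column widths depend on the data, for example the length of the longest product name.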
.format() Method
The .format() method works in all Python 3 versions:
# Positional arguments
print("Hello, {}! You scored {}%.".format("Rahul", 88))
# Hello, Rahul! You scored 88%.
# Numbered arguments (reusable)
print("{0} loves {1}. {1} loves {0}.".format("Alice", "Bob"))
# Alice loves Bob. Bob loves Alice.
# Named arguments
print("Name: {name}, Age: {age}".format(name="Priya", age=22))
# Formatting specifiers (same syntax as f-strings)
print("Pi: {:.2f}".format(3.14159)) # Pi: 3.14
print("Price: {:,.2f}".format(49999.5)) # Price: 49,999.50
print("{:>20}".format("right-aligned")) # "       right-aligned"
# Unpacking a dictionary
person = {"name": "Meritshot", "role": "Education"}
print("Company: {name}, Industry: {role}".format(**person))
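One case where .format() remains convenient is a template that is defined once and filled in later, since an f-string is evaluated at the point where it is written. A small sketch (the template text and order data are made up):

```python
# A template defined once, filled in later.
TEMPLATE = "Dear {name}, your order #{order_id} ships in {days} days."

orders = [
    {"name": "Priya", "order_id": 4521, "days": 3},
    {"name": "Rahul", "order_id": 4522, "days": 5},
]

# ** unpacks each dictionary into keyword arguments for format().
messages = [TEMPLATE.format(**order) for order in orders]
for message in messages:
    print(message)
# Dear Priya, your order #4521 ships in 3 days.
# Dear Rahul, your order #4522 ships in 5 days.
```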
% Formatting (Old Style)
This is the oldest formatting method. You'll encounter it in legacy code, but prefer f-strings for new code:
name = "Priya"
age = 22
score = 95.678
print("Name: %s, Age: %d" % (name, age)) # Name: Priya, Age: 22
print("Score: %.2f%%" % score) # Score: 95.68%
print("Hex: %x, Octal: %o" % (255, 42)) # Hex: ff, Octal: 52
print("Padded: %10s | %-10s" % ("right", "left")) # "Padded:      right | left      "
Format Specifiers Reference
| Specifier | Meaning | Example | Output |
|---|---|---|---|
| d | Integer | f"{42:d}" | 42 |
| f | Fixed-point float | f"{3.14159:.2f}" | 3.14 |
| e / E | Scientific notation | f"{1500:.2e}" | 1.50e+03 |
| s | String | f"{'hi':>10s}" | "        hi" |
| % | Percentage | f"{0.85:.1%}" | 85.0% |
| , | Thousands separator | f"{1000000:,}" | 1,000,000 |
| b | Binary | f"{10:b}" | 1010 |
| o | Octal | f"{10:o}" | 12 |
| x / X | Hexadecimal | f"{255:x}" | ff |
| < | Left align | f"{'hi':<10}" | "hi        " |
| > | Right align | f"{'hi':>10}" | "        hi" |
| ^ | Center align | f"{'hi':^10}" | "    hi    " |
| 0 | Zero-pad | f"{42:05}" | 00042 |
Escape Characters
Escape sequences let you include special characters in strings:
| Escape | Description | Example Output |
|---|---|---|
| \n | Newline | Line break |
| \t | Tab | Horizontal tab |
| \\ | Backslash | \ |
| \" | Double quote | " |
| \' | Single quote | ' |
| \r | Carriage return | Returns cursor to start of line |
| \b | Backspace | Moves the cursor back one position (the effect depends on the terminal) |
| \0 | Null character | Null byte |
| \uXXXX | Unicode (16-bit) | \u0041 → A |
| \UXXXXXXXX | Unicode (32-bit) | \U0001F600 → smiley |
| \xHH | Hex character | \x41 → A |
| \ooo | Octal character | \101 → A |
# Common escape sequences
print("Line 1\nLine 2") # newline
print("Name:\tPython") # tab
print("She said \"hello\"") # embedded quotes
print("Backslash: \\") # literal backslash
print("A\bB") # backspace: many terminals display "B", but the string still holds 3 characters
# Unicode escapes
print("\u2764") # ❤ (heart)
print("\u03C0") # π (pi)
print("\U0001F680") # 🚀 (rocket)
# Use raw strings to disable escaping
print(r"No \n newline here") # No \n newline here
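When it is unclear which characters a string really contains, `repr()` and `len()` make escape sequences visible. A brief sketch:

```python
s = "A\bB"
print(len(s))    # 3: the backspace is a real character in the string
print(repr(s))   # 'A\x08B'

tabbed = "Name:\tPython\n"
print(repr(tabbed))   # 'Name:\tPython\n'

raw = r"C:\new"
print(len(raw))           # 6: the backslash and "n" are two separate characters
print(raw == "C:\\new")   # True: the same string written with an escaped backslash
```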
String Concatenation and Repetition
Concatenation with +
first = "Merit"
last = "shot"
full = first + last
print(full) # Meritshot
# You can only concatenate strings with strings
# print("Age: " + 25) # TypeError!
print("Age: " + str(25)) # Age: 25
Repetition with *
line = "-" * 40
print(line) # ----------------------------------------
laugh = "Ha" * 3
print(laugh) # HaHaHa
# Useful for creating patterns
box_top = "+" + "-" * 20 + "+"
print(box_top) # +--------------------+
Performance: + vs join()
Repeated + concatenation is slow for many strings because it creates a new string object each time. Use join() for building strings from many pieces:
# SLOW: O(n²) in the worst case, because each + can copy the whole string built so far
result = ""
for i in range(10000):
    result += str(i) # creates a new string every iteration
# FAST: O(n), because join() allocates the result once
result = "".join(str(i) for i in range(10000))
# The same idea spelled out: collect the pieces in a list, then join
parts = []
for i in range(10000):
    parts.append(str(i))
result = "".join(parts)
Rule of thumb: For 2-3 concatenations, + is fine. For loops or many parts, always use join().
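To check the claim on a given machine, the two approaches can be timed with the standard `timeit` module. This is a rough sketch, not a benchmark: the function names are illustrative, absolute numbers vary by interpreter and hardware, and CPython can sometimes extend a string in place for `+=`, so the gap may be smaller than the worst-case estimate suggests.

```python
import timeit


def build_with_plus(n):
    """Build the string with repeated += concatenation."""
    result = ""
    for i in range(n):
        result += str(i)
    return result


def build_with_join(n):
    """Build the same string with a single join() call."""
    return "".join(str(i) for i in range(n))


n = 10_000
same = build_with_plus(n) == build_with_join(n)
print(same)   # True: both approaches produce the same string

plus_time = timeit.timeit(lambda: build_with_plus(n), number=20)
join_time = timeit.timeit(lambda: build_with_join(n), number=20)
print(f"+=   : {plus_time:.4f}s")
print(f"join : {join_time:.4f}s")
```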
String Membership
The in and not in operators check whether a substring exists within a string:
text = "Python is an amazing programming language"
print("Python" in text) # True
print("python" in text) # False (case-sensitive!)
print("Java" not in text) # True
print("amazing" in text) # True
print("amaz" in text) # True (partial match works)
# Case-insensitive check
search = "python"
print(search.lower() in text.lower()) # True
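For text that may contain non-English letters, the `casefold()` method described earlier is a more robust basis for this check than `lower()`. A minimal sketch with a hypothetical helper name:

```python
def contains_ignore_case(haystack, needle):
    """Case-insensitive substring test based on casefold()."""
    return needle.casefold() in haystack.casefold()


print(contains_ignore_case("Python is an amazing language", "PYTHON"))  # True
print(contains_ignore_case("Hauptstraße 5", "STRASSE"))                 # True
print("STRASSE".lower() in "Hauptstraße 5".lower())                     # False
```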
Iterating Over Strings
Character by Character
word = "Python"
# Simple loop
for char in word:
    print(char, end=" ")
# P y t h o n
print() # newline
With Index Using enumerate()
word = "Python"
for index, char in enumerate(word):
    print(f"Index {index}: '{char}'")
# Index 0: 'P'
# Index 1: 'y'
# Index 2: 't'
# Index 3: 'h'
# Index 4: 'o'
# Index 5: 'n'
Using List Comprehensions with Strings
word = "Hello World"
# Get all uppercase letters
uppers = [ch for ch in word if ch.isupper()]
print(uppers) # ['H', 'W']
# Get ASCII values of each character
codes = [ord(ch) for ch in word]
print(codes) # [72, 101, 108, 108, 111, 32, 87, 111, 114, 108, 100]
# Build a new string — convert vowels to uppercase
result = "".join(ch.upper() if ch in "aeiou" else ch for ch in "hello world")
print(result) # hEllO wOrld
Common String Patterns
Reversing a String
text = "Hello"
# Method 1: slicing (most Pythonic)
print(text[::-1]) # olleH
# Method 2: reversed() + join
print("".join(reversed(text))) # olleH
# Method 3: loop (educational, not recommended)
result = ""
for char in text:
    result = char + result
print(result) # olleH
Checking for Palindrome
A palindrome reads the same forwards and backwards:
def is_palindrome(text):
    """Check if a string is a palindrome (case-insensitive, ignoring spaces)."""
    cleaned = text.lower().replace(" ", "")
    return cleaned == cleaned[::-1]
print(is_palindrome("racecar")) # True
print(is_palindrome("A man a plan a canal Panama")) # True (ignoring spaces/case)
print(is_palindrome("hello")) # False
print(is_palindrome("Madam")) # True
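The version above removes only spaces, so a phrase with commas or other punctuation would not be recognised. One possible variant, sketched here under the hypothetical name `is_palindrome_loose`, keeps only letters and digits before comparing:

```python
def is_palindrome_loose(text):
    """Palindrome check that ignores case, spaces, and punctuation."""
    # Keep only alphanumeric characters, folded to a common case.
    cleaned = [ch.casefold() for ch in text if ch.isalnum()]
    return cleaned == cleaned[::-1]


print(is_palindrome_loose("A man, a plan, a canal: Panama!"))  # True
print(is_palindrome_loose("Was it a car or a cat I saw?"))     # True
print(is_palindrome_loose("hello"))                            # False
```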
Counting Vowels and Consonants
def count_vowels_consonants(text):
    """Count vowels and consonants in a string."""
    vowels = "aeiouAEIOU"
    v_count = 0
    c_count = 0
    for char in text:
        if char.isalpha():
            if char in vowels:
                v_count += 1
            else:
                c_count += 1
    return v_count, c_count
vowels, consonants = count_vowels_consonants("Hello, World!")
print(f"Vowels: {vowels}, Consonants: {consonants}")
# Vowels: 3, Consonants: 7
Removing All Whitespace
text = " Hello World Python "
# Remove leading/trailing only
print(text.strip()) # "Hello World Python"
# Remove all spaces (tabs and newlines are not affected)
print(text.replace(" ", "")) # "HelloWorldPython"
# Normalise whitespace (collapse multiple spaces into one)
print(" ".join(text.split())) # "Hello World Python"
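Note that replace(" ", "") removes only space characters. When tabs and newlines should go as well, split() followed by join() with an empty separator handles every kind of whitespace. A short sketch with an illustrative input:

```python
text = "  Hello\tWorld\n Python  "

# replace() targets the space character only.
spaces_removed = text.replace(" ", "")
print(repr(spaces_removed))   # 'Hello\tWorld\nPython'

# split() with no arguments splits on any run of whitespace.
no_ws = "".join(text.split())
print(no_ws)                  # HelloWorldPython
```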
Caesar Cipher (Basic Encryption)
Shift each letter by a fixed number of positions in the alphabet:
def caesar_cipher(text, shift):
    """Encrypt text using Caesar cipher with the given shift."""
    result = []
    for char in text:
        if char.isalpha():
            # Determine base: 'A' for uppercase, 'a' for lowercase
            base = ord('A') if char.isupper() else ord('a')
            # Shift the character, wrapping around with modulo 26
            shifted = (ord(char) - base + shift) % 26 + base
            result.append(chr(shifted))
        else:
            result.append(char) # keep non-letters unchanged
    return "".join(result)
encrypted = caesar_cipher("Hello, World!", 3)
print(encrypted) # Khoor, Zruog!
decrypted = caesar_cipher(encrypted, -3)
print(decrypted) # Hello, World!
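Because isalpha() is also True for letters outside A-Z (for example "é"), the function above would map such characters to unrelated letters. One way to restrict the shift to ASCII letters is a translation table built with str.maketrans(). This is an alternative sketch, and `caesar_ascii` is a hypothetical name:

```python
import string


def caesar_ascii(text, shift):
    """Caesar cipher that shifts only the ASCII letters A-Z and a-z."""
    shift %= 26   # also turns negative shifts into the equivalent positive one
    lower = string.ascii_lowercase
    upper = string.ascii_uppercase
    table = str.maketrans(
        lower + upper,
        lower[shift:] + lower[:shift] + upper[shift:] + upper[:shift],
    )
    # Characters missing from the table (digits, punctuation, "é") pass through.
    return text.translate(table)


print(caesar_ascii("Hello, World!", 3))    # Khoor, Zruog!
print(caesar_ascii("Khoor, Zruog!", -3))   # Hello, World!
print(caesar_ascii("caf\u00e9", 1))        # dbgé
```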
Extracting Digits from a String
text = "Order #4521, Total: ₹2,999.50, Items: 3"
# Method 1: list comprehension
digits = "".join(ch for ch in text if ch.isdigit())
print(digits) # 45212999503
# Method 2: filter()
digits = "".join(filter(str.isdigit, text))
print(digits) # 45212999503
# Method 3: extract separate numbers with a regular expression
import re
numbers = re.findall(r"\d+\.?\d*", text)
print(numbers) # ['4521', '2', '999.50', '3']
Title Case Conversion
# Built-in title() has limitations with apostrophes and acronyms
text = "hello world from PYTHON's string methods"
print(text.title()) # Hello World From Python'S String Methods
# Note: 'S is capitalised (not ideal)
# Better approach for edge cases — use capwords from string module
import string
print(string.capwords(text)) # Hello World From Python's String Methods
Validating Email (Basic Pattern Check)
def is_valid_email(email):
    """Basic email validation without regex."""
    if email.count("@") != 1:
        return False
    local, domain = email.split("@")
    if not local or not domain:
        return False
    if "." not in domain:
        return False
    if domain.startswith(".") or domain.endswith("."):
        return False
    # Check for valid characters in the local part
    allowed = set("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789._-")
    if not all(ch in allowed for ch in local):
        return False
    return True
print(is_valid_email("user@example.com")) # True
print(is_valid_email("user.name@company.co")) # True
print(is_valid_email("invalid@")) # False
print(is_valid_email("no-at-sign.com")) # False
print(is_valid_email("@no-local.com")) # False
Regex Basics (Brief Introduction)
Python's re module provides regular expression support for powerful pattern matching. Here is a quick introduction — regex is a deep topic that deserves its own chapter.
Importing and Basic Functions
import re
text = "Contact us at support@meritshot.com or call 91-9876543210"
# re.search() — find first match
match = re.search(r"\d{2}-\d{10}", text)
if match:
    print(f"Phone found: {match.group()}") # Phone found: 91-9876543210
# re.findall() — find ALL matches (returns list of strings)
emails = re.findall(r"[\w.-]+@[\w.-]+", text)
print(emails) # ['support@meritshot.com']
# re.sub() — search and replace using patterns
cleaned = re.sub(r"\d", "*", text)
print(cleaned)
# Contact us at support@meritshot.com or call **-**********
# re.split() — split on a pattern
parts = re.split(r"[,;\s]+", "apple, banana; cherry date")
print(parts) # ['apple', 'banana', 'cherry', 'date']
Common Regex Patterns
| Pattern | Matches | Example |
|---|---|---|
| \d | Any digit (0-9) | "42" |
| \D | Non-digit | "abc" |
| \w | Word character (letter, digit, _) | "hello_42" |
| \W | Non-word character | "@#!" |
| \s | Whitespace | " \t\n" |
| \S | Non-whitespace | "abc" |
| . | Any character (except newline) | "a", "1", "@" |
| + | One or more | \d+ matches "123" |
| * | Zero or more | \d* matches "" or "123" |
| ? | Zero or one | colou?r matches "color" or "colour" |
| {n} | Exactly n times | \d{4} matches "2026" |
| {n,m} | Between n and m times | \d{2,4} matches "42" or "2026" |
| ^ | Start of string | ^Hello |
| $ | End of string | world$ |
| [abc] | Any one of a, b, c | [aeiou] matches vowels |
| [^abc] | NOT a, b, or c | [^0-9] matches non-digits |
import re
# Extract all words
words = re.findall(r"\b\w+\b", "Hello, World! Python 3.12")
print(words) # ['Hello', 'World', 'Python', '3', '12']
# Validate phone number format
phone = "91-9876543210"
if re.match(r"^\d{2}-\d{10}$", phone):
    print("Valid phone number")
# Extract date components
date_str = "Today is 2026-03-15"
match = re.search(r"(\d{4})-(\d{2})-(\d{2})", date_str)
if match:
    year, month, day = match.groups()
    print(f"Year: {year}, Month: {month}, Day: {day}")
# Year: 2026, Month: 03, Day: 15
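When a pattern has several groups, naming them can make the code easier to follow than relying on position. The sketch below rewrites the date example with named groups and a precompiled pattern; `DATE_RE` is just an illustrative variable name:

```python
import re

# (?P<name>...) gives each group a name; compile() lets the pattern be reused.
DATE_RE = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")

match = DATE_RE.search("Today is 2026-03-15")
if match:
    parts = match.groupdict()
    print(parts)                    # {'year': '2026', 'month': '03', 'day': '15'}
    print(int(match["year"]) + 1)   # 2027

# search() returns None when nothing matches, so always test the result.
print(DATE_RE.search("no date here"))  # None
```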
Practical Examples
Password Strength Checker
def check_password_strength(password):
    """
    Check password strength and return a score with feedback.
    Criteria: length, uppercase, lowercase, digits, special characters.
    """
    score = 0
    feedback = []
    # Length check
    if len(password) >= 8:
        score += 1
    else:
        feedback.append("Use at least 8 characters")
    if len(password) >= 12:
        score += 1
    # Character type checks
    if any(ch.isupper() for ch in password):
        score += 1
    else:
        feedback.append("Add at least one uppercase letter")
    if any(ch.islower() for ch in password):
        score += 1
    else:
        feedback.append("Add at least one lowercase letter")
    if any(ch.isdigit() for ch in password):
        score += 1
    else:
        feedback.append("Add at least one digit")
    special_chars = "!@#$%^&*()_+-=[]{}|;:',.<>?/~`"
    if any(ch in special_chars for ch in password):
        score += 1
    else:
        feedback.append("Add at least one special character")
    # Common pattern checks
    common_passwords = ["password", "123456", "qwerty", "admin"]
    if password.lower() in common_passwords:
        score = 0
        feedback = ["This is a commonly used password; choose something unique"]
    # Determine strength label
    if score <= 2:
        strength = "Weak"
    elif score <= 4:
        strength = "Moderate"
    else:
        strength = "Strong"
    return strength, score, feedback
# Test
passwords = ["hello", "Hello123", "M3rit$hot!2026", "password"]
for pw in passwords:
    strength, score, tips = check_password_strength(pw)
    print(f"Password: {pw:20s} | Strength: {strength:8s} | Score: {score}/6")
    for tip in tips:
        print(f"  → {tip}")
    print()
Word Frequency Counter
def word_frequency(text):
    """Count the frequency of each word in a text (case-insensitive)."""
    # Remove punctuation and convert to lowercase
    import string
    cleaned = text.lower()
    cleaned = cleaned.translate(str.maketrans("", "", string.punctuation))
    # Split into words and count
    words = cleaned.split()
    freq = {}
    for word in words:
        freq[word] = freq.get(word, 0) + 1
    # Sort by frequency (highest first)
    sorted_freq = sorted(freq.items(), key=lambda x: x[1], reverse=True)
    return sorted_freq
text = """
Python is a great programming language. Python is used for web development,
data science, and automation. Many developers love Python because Python
is easy to learn and Python has a rich ecosystem.
"""
results = word_frequency(text)
print(f"{'Word':<15} {'Count':>5}")
print("-" * 21)
for word, count in results[:10]: # top 10 words
    print(f"{word:<15} {count:>5}")
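The counting and sorting steps can also be delegated to collections.Counter from the standard library. The following is an alternative sketch rather than a replacement for the function above; the name `word_frequency_counter` and the sample sentence are assumptions:

```python
from collections import Counter
import string


def word_frequency_counter(text, top=3):
    """Return the `top` most common words as (word, count) pairs."""
    # Same cleaning as before: lowercase, then drop ASCII punctuation.
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    # Counter tallies the words; most_common() sorts by count, highest first.
    return Counter(cleaned.split()).most_common(top)


sample = "Python is great. Python is easy, and Python is fun."
print(word_frequency_counter(sample))
# [('python', 3), ('is', 3), ('great', 1)]
```

Words with equal counts are returned in the order they were first seen.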
Text Cleaner
import string
import re
def clean_text(text):
    """
    Clean and normalise text:
    - Convert to lowercase
    - Remove URLs and email addresses
    - Remove punctuation
    - Remove digits
    - Collapse extra whitespace
    """
    # Convert to lowercase
    text = text.lower()
    # Remove URLs
    text = re.sub(r"https?://\S+", "", text)
    # Remove email addresses
    text = re.sub(r"\S+@\S+", "", text)
    # Remove punctuation
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Remove digits
    text = re.sub(r"\d+", "", text)
    # Normalise whitespace (collapse multiple spaces/newlines into one space);
    # split() also discards leading and trailing whitespace
    text = " ".join(text.split())
    return text
raw_text = """
Hello!!! This is a MESSY text with extra spaces.
Visit https://meritshot.com for more info.
Contact: support@meritshot.com
Order #12345 was placed on 2026-03-15.
"""
cleaned = clean_text(raw_text)
print("Original:")
print(raw_text)
print("Cleaned:")
print(cleaned)
# hello this is a messy text with extra spaces visit for more info contact order was placed on
Practice Exercises
- Reverse Words: Write a function that takes a sentence and returns the sentence with each word reversed, but the word order preserved. For example, "Hello World" becomes "olleH dlroW".
- Acronym Generator: Write a function that takes a phrase and returns its acronym. For example, "Artificial Intelligence" becomes "AI" and "as soon as possible" becomes "ASAP".
- String Compression: Write a function that compresses a string using run-length encoding. For example, "aaabbbccccdd" becomes "a3b3c4d2". If the compressed string is not shorter than the original, return the original.
- Anagram Checker: Write a function that checks if two strings are anagrams of each other (same letters, different order). For example, "listen" and "silent" are anagrams.
- Pig Latin Translator: Write a function that converts English to Pig Latin. Rules: if a word starts with a vowel, add "yay" to the end; if it starts with consonants, move the leading consonants to the end and add "ay". For example, "hello" becomes "ellohay" and "apple" becomes "appleyay".
- Masked Credit Card: Write a function that takes a credit card number (as a string of 16 digits) and returns it masked with * except for the last 4 digits. For example, "1234567890123456" becomes "************3456".
Summary
In this chapter, you learned:
- String creation: single quotes, double quotes, triple quotes, raw strings (r""), and byte strings (b"")
- Indexing and slicing: positive and negative indices, [start:stop:step] syntax, reversing with [::-1]
- Immutability: strings cannot be modified in place; every operation creates a new string
- String methods: case conversion, searching, checking, transforming, splitting, joining, and padding
- String formatting: f-strings (recommended), .format(), and legacy % formatting
- Escape characters: \n, \t, \\, \", Unicode escapes, and raw strings
- Concatenation and repetition: +, *, and why join() is faster in loops
- Membership testing: in and not in operators
- Common patterns: palindromes, Caesar cipher, email validation, digit extraction
- Regex basics: re.search(), re.findall(), re.sub(), and common patterns
- Practical programs: password checker, word counter, text cleaner
Next up: Lists — learn about Python's most versatile data structure for storing ordered collections of items.