Chapter 5 of 14

Python Strings

Complete guide to Python strings — creation, indexing, slicing, methods, formatting, regex basics, and real-world string operations.

Meritshot | 24 min read

Strings in Python

A string is an immutable sequence of Unicode characters. Every piece of text you work with in Python — a name, a sentence, a line from a file, an HTTP response — is a string. Strings are one of the most frequently used data types, and Python gives you an exceptionally rich set of tools for creating, inspecting, and transforming them.

# Strings are objects of the str class
greeting = "Hello, World!"
print(type(greeting))  # <class 'str'>
print(len(greeting))   # 13

Because Python 3 strings are Unicode by default, you can include characters from any language or symbol set:

hindi = "नमस्ते"
emoji = "Python is fun 🐍"
japanese = "こんにちは"
print(hindi, emoji, japanese)

Creating Strings

Python offers several ways to create strings, each suited to different situations.

Single and Double Quotes

Single quotes and double quotes are interchangeable. Use whichever lets you avoid escaping:

name = 'Meritshot'
name = "Meritshot"       # exactly the same

# Use the other quote style to embed quotes naturally
message = "It's a beautiful day"
html = '<div class="container">Hello</div>'

Triple Quotes (Multi-Line Strings)

Triple quotes (""" or ''') let you write strings that span multiple lines. The line breaks are preserved as \n characters:

poem = """Roses are red,
Violets are blue,
Python is awesome,
And so are you."""

print(poem)
# Roses are red,
# Violets are blue,
# Python is awesome,
# And so are you.

# Triple quotes are also used for docstrings
def greet(name):
    """Return a personalised greeting string."""
    return f"Hello, {name}!"

Raw Strings

Prefixing a string with r or R tells Python to treat backslashes as literal characters, not as escape sequences. This is especially useful for regular expressions and Windows file paths:

# Without raw string — \n is interpreted as a newline
path = "C:\new_folder\test"
print(path)
# C:
# ew_folder	est

# With raw string — backslashes are kept literally
path = r"C:\new_folder\test"
print(path)  # C:\new_folder\test

# Essential for regex patterns
import re
pattern = r"\d{3}-\d{4}"   # matches 123-4567
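
One quick way to see what the r prefix changes is to compare string lengths. The snippet below is only an illustrative sketch; the variable names and the sample sentence are made up for this example.

```python
import re

escaped = "\n"    # one character: a newline
raw = r"\n"       # two characters: a backslash and the letter n

print(len(escaped))   # 1
print(len(raw))       # 2

# A raw string and a doubled backslash describe the same text
print(r"\d{3}-\d{4}" == "\\d{3}-\\d{4}")   # True

# The pattern from above, used in a real search
found = re.search(r"\d{3}-\d{4}", "Call 555-1234 today")
print(found.group())  # 555-1234
```

The raw form is usually easier to read because the pattern looks the same in the source code as it does to the regex engine.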

Byte Strings

Prefixing with b creates a bytes object instead of a str. Byte strings hold raw binary data and are used for file I/O, network protocols, and encoding operations:

data = b"Hello"
print(type(data))    # <class 'bytes'>
print(data[0])       # 72 (ASCII code for 'H')

# Convert between str and bytes
text = "Hello"
encoded = text.encode("utf-8")     # str → bytes
decoded = encoded.decode("utf-8")  # bytes → str
print(encoded)   # b'Hello'
print(decoded)   # Hello
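
For plain ASCII text the str and bytes forms look almost identical, so the difference is easier to see with a non-ASCII character. The following sketch uses a made-up sample word and shows that characters and bytes are counted differently.

```python
word = "caf\u00e9"               # "café", written with an escape so the code point is unambiguous
as_utf8 = word.encode("utf-8")

print(len(word))       # 4 characters
print(len(as_utf8))    # 5 bytes, because é takes two bytes in UTF-8
print(as_utf8)         # b'caf\xc3\xa9'

# Decoding with a codec that cannot represent the data raises an error
try:
    as_utf8.decode("ascii")
except UnicodeDecodeError as exc:
    print("Cannot decode as ASCII:", exc.reason)
```

This is why the encoding should be stated explicitly whenever text crosses a file or network boundary.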

String Indexing and Slicing

Strings are sequences, which means every character has a position (index) and you can extract sub-sequences (slices).

Positive Indexing

Indices start at 0 for the first character:

text = "Python"
#       P  y  t  h  o  n
#       0  1  2  3  4  5

print(text[0])   # P   — first character
print(text[1])   # y
print(text[5])   # n   — last character
# print(text[6]) # IndexError: string index out of range

Negative Indexing

Negative indices count from the end, starting at -1:

text = "Python"
#       P   y   t   h   o   n
#      -6  -5  -4  -3  -2  -1

print(text[-1])   # n   — last character
print(text[-2])   # o   — second to last
print(text[-6])   # P   — first character

Slicing: [start:stop:step]

Slicing extracts a substring. The syntax is string[start:stop:step] where:

  • start — index where the slice begins (inclusive, default 0)
  • stop — index where the slice ends (exclusive, default end of string)
  • step — how many characters to skip (default 1)

text = "Hello, Python!"
#       H  e  l  l  o  ,     P  y  t  h  o  n  !
#       0  1  2  3  4  5  6  7  8  9  10 11 12 13

# Basic slicing
print(text[0:5])     # Hello       — characters 0 to 4
print(text[7:13])    # Python      — characters 7 to 12
print(text[7:])      # Python!     — from index 7 to end
print(text[:5])      # Hello       — from start to index 4
print(text[:])       # Hello, Python!  — full copy

# Slicing with step
print(text[0:10:2])  # Hlo y      — every 2nd character from 0 to 9
print(text[::3])     # Hl tn      (every 3rd character: indices 0, 3, 6, 9, 12)

# Negative indices in slices
print(text[-7:-1])   # Python     — 7th from end to 2nd from end
print(text[-7:])     # Python!    — 7th from end to the end
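
Slices are more forgiving than single-index access: out-of-range bounds are clamped instead of raising an error. A small sketch of that behaviour, reusing the same sample text:

```python
text = "Hello, Python!"

print(text[7:100])          # Python!  (stop is clamped to the end)
print(repr(text[50:60]))    # ''       (an empty string, not an error)

# Single-index access is strict
try:
    text[50]
except IndexError as exc:
    print("IndexError:", exc)   # IndexError: string index out of range

# Any split point gives two halves that rebuild the original
cut = 5
print(text[:cut] + text[cut:] == text)   # True
```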

Reversing a String

A step of -1 reverses the string:

text = "Python"
reversed_text = text[::-1]
print(reversed_text)  # nohtyP

# Reverse just a portion
print(text[4::-1])    # ohtyP  — from index 4 backwards to start

Getting Every Nth Character

text = "abcdefghijklmnop"

print(text[::2])   # acegikmo   — every 2nd character
print(text[::3])   # adgjmp     — every 3rd character
print(text[1::2])  # bdfhjlnp   (every 2nd character starting from index 1)

String Immutability

Strings in Python are immutable — once a string is created, you cannot change any of its characters in place. Any operation that appears to modify a string actually creates a brand-new string object.

name = "Python"

# This will raise an error
# name[0] = "J"   # TypeError: 'str' object does not support item assignment

# Instead, create a new string
name = "J" + name[1:]
print(name)  # Jython

Why Immutability Matters

  1. Safety — strings can be used as dictionary keys and set elements because their hash never changes
  2. Performance — Python can optimise memory by reusing identical string objects (interning)
  3. Thread safety — immutable objects are inherently safe to share across threads
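
The first point in the list above can be checked directly. This is a minimal sketch with invented sample data, showing that a method call leaves the original string untouched and that strings, unlike lists, can serve as dictionary keys.

```python
original = "python"
shouted = original.upper()   # returns a new string

print(original)   # python  (unchanged)
print(shouted)    # PYTHON

# Strings are hashable, so they work as dictionary keys
stock = {"apple": 3, "banana": 5}
stock["apple"] += 1
print(stock["apple"])   # 4

# A list is mutable and therefore not hashable
try:
    bad = {["apple"]: 3}
except TypeError as exc:
    print("TypeError:", exc)
```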

Workaround: Convert to a List

If you need to modify characters frequently, convert to a list, make your changes, and join back:

text = "Hello, World!"
chars = list(text)         # ['H', 'e', 'l', 'l', 'o', ',', ' ', 'W', 'o', 'r', 'l', 'd', '!']
chars[0] = 'J'
chars[7] = 'w'
result = "".join(chars)
print(result)  # Jello, world!

String Methods

Python strings come with dozens of built-in methods. Because strings are immutable, none of them modify the original string: methods that transform text return a new string, while others return an integer, a Boolean, or a list.

Case Methods

text = "hello, WORLD! Python 3"

print(text.upper())       # HELLO, WORLD! PYTHON 3
print(text.lower())       # hello, world! python 3
print(text.title())       # Hello, World! Python 3
print(text.capitalize())  # Hello, world! python 3  (only first char)
print(text.swapcase())    # HELLO, world! pYTHON 3

casefold() — aggressive lowering that handles special Unicode characters. Use this for case-insensitive comparisons:

german = "Straße"                  # German word with ß
print(german.lower())              # straße
print(german.casefold())           # strasse  (ß → ss)

# Case-insensitive comparison
word1 = "Straße"
word2 = "STRASSE"
print(word1.casefold() == word2.casefold())  # True
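
casefold() is also handy as a sort key. The sketch below uses a made-up list of names to show how the default ordering differs from a case-insensitive one.

```python
names = ["bob", "Alice", "dave", "Carol"]

# Default ordering compares code points, so uppercase letters come first
print(sorted(names))                    # ['Alice', 'Carol', 'bob', 'dave']

# Passing str.casefold as the key ignores case while keeping the original spelling
print(sorted(names, key=str.casefold))  # ['Alice', 'bob', 'Carol', 'dave']
```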

Search Methods

text = "Hello, Python! Python is great."

# find() — returns first index or -1 if not found
print(text.find("Python"))      # 7
print(text.find("Python", 10))  # 15  (search from index 10)
print(text.find("Java"))        # -1

# rfind() — searches from the right
print(text.rfind("Python"))     # 15  (last occurrence)

# index() — like find() but raises ValueError if not found
print(text.index("Python"))     # 7
# print(text.index("Java"))    # ValueError: substring not found

# rindex() — like rfind() but raises ValueError
print(text.rindex("Python"))    # 15

# count() — count non-overlapping occurrences
print(text.count("Python"))     # 2
print(text.count("o"))          # 3
print(text.count("o", 0, 10))   # 1  (count within range)

Tip: Use find() when you're not sure the substring exists (returns -1). Use index() when you expect it to exist and want an error if it doesn't.
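
The tip above can be sketched as follows. The helper name text_after is hypothetical and exists only for this illustration.

```python
def text_after(text, marker):
    """Return whatever follows marker, or None when marker is absent (hypothetical helper)."""
    position = text.find(marker)
    if position == -1:          # find() signals "not found" with -1
        return None
    return text[position + len(marker):]

print(text_after("key=value", "="))   # value
print(text_after("no marker", "="))   # None

# index() suits cases where absence is a genuine error
try:
    "no marker".index("=")
except ValueError as exc:
    print("ValueError:", exc)   # ValueError: substring not found
```

Always compare the result of find() with -1 before using it: -1 is itself a valid index (the last character), so an unchecked result can silently return the wrong text.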

Check Methods (Return Boolean)

These methods test the content of a string and return True or False:

# Prefix and suffix
print("hello.py".startswith("hello"))   # True
print("hello.py".endswith(".py"))       # True
print("hello.py".startswith(("hello", "world")))  # True (tuple of prefixes)

# Character type checks
print("Hello".isalpha())    # True  — only letters
print("12345".isdigit())    # True  — only digits
print("Hello123".isalnum()) # True  — letters and digits only
print("   \t\n".isspace())  # True  — only whitespace

# Case checks
print("HELLO".isupper())    # True
print("hello".islower())    # True
print("Hello World".istitle())  # True — title case

# Numeric checks for different scripts
print("42".isnumeric())     # True
print("²".isnumeric())      # True  (superscript is numeric)
print("½".isnumeric())      # True  (fraction is numeric)
print("²".isdigit())        # True
print("½".isdigit())        # False (fraction is NOT a digit)

| Method | Returns True When | Example |
| --- | --- | --- |
| isalpha() | All characters are letters | "Hello" |
| isdigit() | All characters are digits | "123" |
| isalnum() | All characters are letters or digits | "abc123" |
| isspace() | All characters are whitespace | " \t\n" |
| isupper() | All cased characters are uppercase | "HELLO" |
| islower() | All cased characters are lowercase | "hello" |
| istitle() | Title case (each word capitalised) | "Hello World" |
| isnumeric() | All characters are numeric (broad) | "42", "½" |
| isidentifier() | Valid Python identifier | "my_var" |
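
A common surprise with these checks is that isdigit() rejects signs and decimal points, so it is not a general "is this a number?" test. One possible approach, sketched here with a hypothetical helper, is to attempt the conversion and catch the error.

```python
def looks_like_number(text):
    """Return True if float() accepts the text (hypothetical helper)."""
    try:
        float(text)
    except ValueError:
        return False
    return True

print("-5".isdigit())     # False (the minus sign is not a digit)
print("3.14".isdigit())   # False (neither is the decimal point)

print(looks_like_number("-5"))     # True
print(looks_like_number("3.14"))   # True
print(looks_like_number("abc"))    # False
```

Note that float() is fairly permissive: it also accepts values such as "1e5", "inf", and text with surrounding whitespace, so stricter input rules need an extra check.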

Transform Methods

# strip() — remove leading/trailing whitespace (or specified characters)
text = "   Hello, World!   "
print(text.strip())         # "Hello, World!"
print(text.lstrip())        # "Hello, World!   "
print(text.rstrip())        # "   Hello, World!"

# Strip specific characters
url = "###Welcome###"
print(url.strip("#"))       # "Welcome"
print(url.lstrip("#"))      # "Welcome###"
print(url.rstrip("#"))      # "###Welcome"

# replace(old, new, count) — replace occurrences
text = "banana banana banana"
print(text.replace("banana", "apple"))       # "apple apple apple"
print(text.replace("banana", "apple", 2))    # "apple apple banana"

# translate() and maketrans() — character-level replacement
# Create a translation table: a→@, e→3, o→0
table = str.maketrans("aeo", "@30")
message = "Hello everyone"
print(message.translate(table))  # H3ll0 3v3ry0n3

# Use translate to remove characters (map to None)
remove_digits = str.maketrans("", "", "0123456789")
mixed = "Room 404, Floor 3"
print(mixed.translate(remove_digits))  # "Room , Floor "
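
One detail worth illustrating: the argument to strip(), lstrip(), and rstrip() is a set of characters, not a prefix or suffix. The sketch below uses invented sample values and assumes Python 3.9 or newer for removeprefix() and removesuffix().

```python
url = "www.wikipedia.org"

# lstrip() removes any leading characters found in the set {'w', '.'}
print(url.lstrip("www."))         # ikipedia.org  (the w of "wikipedia" goes too)

# removeprefix() removes the exact prefix once (Python 3.9+)
print(url.removeprefix("www."))   # wikipedia.org

filename = "report.txt"
print(filename.removesuffix(".txt"))   # report
```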

Split and Join

# split(sep, maxsplit) — split string into a list
text = "apple,banana,cherry,date"
print(text.split(","))        # ['apple', 'banana', 'cherry', 'date']
print(text.split(",", 2))     # ['apple', 'banana', 'cherry,date']

# Default split — splits on any whitespace and removes empty strings
text = "  Hello   World  Python  "
print(text.split())           # ['Hello', 'World', 'Python']

# rsplit() — splits from the right
text = "a/b/c/d/e"
print(text.rsplit("/", 2))    # ['a/b/c', 'd', 'e']

# splitlines() — split on line boundaries
multiline = "Line 1\nLine 2\rLine 3\r\nLine 4"
print(multiline.splitlines())
# ['Line 1', 'Line 2', 'Line 3', 'Line 4']

# join(iterable) — join a list into a string
words = ["Python", "is", "awesome"]
print(" ".join(words))        # "Python is awesome"
print("-".join(words))        # "Python-is-awesome"
print(", ".join(words))       # "Python, is, awesome"

# Join with newlines
lines = ["Line 1", "Line 2", "Line 3"]
print("\n".join(lines))
# Line 1
# Line 2
# Line 3
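
split() and join() often appear together when parsing simple text formats. The following is a minimal sketch, assuming a made-up "key=value;key=value" format; parse_settings is a hypothetical helper, and partition() is used so that only the first "=" in each pair counts as the separator.

```python
def parse_settings(line):
    """Turn 'a=1;b=2' into {'a': '1', 'b': '2'} (hypothetical helper)."""
    settings = {}
    for pair in line.split(";"):
        key, sep, value = pair.partition("=")   # split at the first "=" only
        if sep:                                  # skip pieces without "="
            settings[key.strip()] = value.strip()
    return settings

config = parse_settings("name = Priya; age = 22; url = a=b")
print(config)   # {'name': 'Priya', 'age': '22', 'url': 'a=b'}

# join() reverses the process
print(";".join(f"{k}={v}" for k, v in config.items()))
# name=Priya;age=22;url=a=b
```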

Padding and Alignment

name = "Python"

# ljust(width, fillchar) — left-justify (pad on the right)
print(name.ljust(15))         # "Python         "
print(name.ljust(15, "-"))    # "Python---------"

# rjust(width, fillchar) — right-justify (pad on the left)
print(name.rjust(15))         # "         Python"
print(name.rjust(15, "."))    # ".........Python"

# center(width, fillchar)
print(name.center(20))        # "       Python       "
print(name.center(20, "="))   # "=======Python======="

# zfill(width) — pad with zeros on the left (useful for numbers)
print("42".zfill(5))          # "00042"
print("-42".zfill(6))         # "-00042"  (sign stays in front)
print("3.14".zfill(8))        # "00003.14"

String Formatting

Python offers three main ways to format strings. f-strings are the modern, recommended approach.

f-Strings

f-strings (formatted string literals, available since Python 3.6) are prefixed with f and allow you to embed Python expressions inside curly braces {}:

name = "Priya"
age = 22
score = 95.678

# Basic variable insertion
print(f"Name: {name}")             # Name: Priya
print(f"Age: {age}")               # Age: 22

# Expressions inside braces
print(f"Next year: {age + 1}")     # Next year: 23
print(f"Name length: {len(name)}") # Name length: 5
print(f"Uppercase: {name.upper()}") # Uppercase: PRIYA

# Conditional expressions
print(f"Status: {'Pass' if score >= 40 else 'Fail'}")  # Status: Pass

Number Formatting with f-Strings

pi = 3.14159265
big_number = 1234567890
price = 49999.5
percentage = 0.8567

# Decimal places
print(f"Pi: {pi:.2f}")             # Pi: 3.14
print(f"Pi: {pi:.4f}")             # Pi: 3.1416

# Thousands separator
print(f"Population: {big_number:,}")     # Population: 1,234,567,890
print(f"Population: {big_number:_}")     # Population: 1_234_567_890

# Currency formatting
print(f"Price: ₹{price:,.2f}")          # Price: ₹49,999.50

# Percentage
print(f"Score: {percentage:.1%}")        # Score: 85.7%

# Scientific notation
print(f"Value: {big_number:.2e}")        # Value: 1.23e+09

# Integer formatting
print(f"Binary: {42:b}")                 # Binary: 101010
print(f"Octal: {42:o}")                  # Octal: 52
print(f"Hex: {255:x}")                   # Hex: ff
print(f"Hex: {255:X}")                   # Hex: FF

Alignment and Padding with f-Strings

name = "Python"
num = 42

# Alignment: < left, > right, ^ center
print(f"|{name:<20}|")    # |Python              |
print(f"|{name:>20}|")    # |              Python|
print(f"|{name:^20}|")    # |       Python       |

# Padding with custom characters
print(f"|{name:*<20}|")   # |Python**************|
print(f"|{name:*>20}|")   # |**************Python|
print(f"|{name:*^20}|")   # |*******Python*******|

# Number padding
print(f"|{num:05d}|")     # |00042|
print(f"|{num:10d}|")     # |        42|
print(f"|{num:<10d}|")    # |42        |

# Practical: formatted table
products = [("Widget", 29.99), ("Gadget", 149.50), ("Thingamajig", 5.00)]
print(f"{'Product':<15} {'Price':>10}")
print("-" * 26)
for product, price in products:
    print(f"{product:<15} {'₹':>1}{price:>9.2f}")
# Product              Price
# --------------------------
# Widget           ₹    29.99
# Gadget           ₹   149.50
# Thingamajig      ₹     5.00
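
Widths do not have to be hard-coded. Format specifiers can contain nested replacement fields, so a column width can be computed from the data. A small sketch with invented sample rows:

```python
rows = [("Widget", 29.99), ("Gadget", 149.5), ("Thingamajig", 5.0)]

# Column width is computed from the longest name instead of being hard-coded
width = max(len(name) for name, _ in rows)

for name, price in rows:
    print(f"{name:<{width}} | {price:>8.2f}")
# Widget      |    29.99
# Gadget      |   149.50
# Thingamajig |     5.00
```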

.format() Method

The .format() method works in all Python 3 versions:

# Positional arguments
print("Hello, {}! You scored {}%.".format("Rahul", 88))
# Hello, Rahul! You scored 88%.

# Numbered arguments (reusable)
print("{0} loves {1}. {1} loves {0}.".format("Alice", "Bob"))
# Alice loves Bob. Bob loves Alice.

# Named arguments
print("Name: {name}, Age: {age}".format(name="Priya", age=22))

# Formatting specifiers (same syntax as f-strings)
print("Pi: {:.2f}".format(3.14159))          # Pi: 3.14
print("Price: {:,.2f}".format(49999.5))      # Price: 49,999.50
print("{:>20}".format("right-aligned"))       #      right-aligned

# Unpacking a dictionary
person = {"name": "Meritshot", "role": "Education"}
print("Company: {name}, Industry: {role}".format(**person))

% Formatting (Old Style)

This is the oldest formatting method. You'll encounter it in legacy code, but prefer f-strings for new code:

name = "Priya"
age = 22
score = 95.678

print("Name: %s, Age: %d" % (name, age))           # Name: Priya, Age: 22
print("Score: %.2f%%" % score)                       # Score: 95.68%
print("Hex: %x, Octal: %o" % (255, 42))             # Hex: ff, Octal: 52
print("Padded: %10s | %-10s" % ("right", "left"))   # Padded:      right | left
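
To compare the three styles side by side, here is the same sentence built with each of them. The sample values are reused from above; the variable names are only for illustration.

```python
name = "Priya"
score = 95.678

with_percent = "%s scored %.1f" % (name, score)
with_format = "{} scored {:.1f}".format(name, score)
with_fstring = f"{name} scored {score:.1f}"

print(with_fstring)                                   # Priya scored 95.7
print(with_percent == with_format == with_fstring)    # True
```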

Format Specifiers Reference

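| Specifier | Meaning | Example | Output |
| --- | --- | --- | --- |
| d | Integer | f"{42:d}" | 42 |
| f | Fixed-point float | f"{3.14159:.2f}" | 3.14 |
| e / E | Scientific notation | f"{1500:.2e}" | 1.50e+03 |
| s | String | f"{'hi':>10s}" | "        hi" |
| % | Percentage | f"{0.85:.1%}" | 85.0% |
| , | Thousands separator | f"{1000000:,}" | 1,000,000 |
| b | Binary | f"{10:b}" | 1010 |
| o | Octal | f"{10:o}" | 12 |
| x / X | Hexadecimal | f"{255:x}" | ff |
| < | Left align | f"{'hi':<10}" | "hi        " |
| > | Right align | f"{'hi':>10}" | "        hi" |
| ^ | Center align | f"{'hi':^10}" | "    hi    " |
| 0 | Zero-pad | f"{42:05}" | 00042 |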

Escape Characters

Escape sequences let you include special characters in strings:

| Escape | Description | Example Output |
| --- | --- | --- |
| \n | Newline | Line break |
| \t | Tab | Horizontal tab |
| \\ | Backslash | \ |
| \" | Double quote | " |
| \' | Single quote | ' |
| \r | Carriage return | Returns cursor to start of line |
| \b | Backspace | Moves the cursor back one position |
| \0 | Null character | Null byte |
| \uXXXX | Unicode (16-bit) | \u0041 → A |
| \UXXXXXXXX | Unicode (32-bit) | \U0001F600 → smiley |
| \xHH | Hex character | \x41 → A |
| \ooo | Octal character | \101 → A |

# Common escape sequences
print("Line 1\nLine 2")        # newline
print("Name:\tPython")         # tab
print("She said \"hello\"")    # embedded quotes
print("Backslash: \\")         # literal backslash
print("A\bB")                  # backspace: most terminals display "B" (the cursor steps back and B overwrites A)

# Unicode escapes
print("\u2764")                 # ❤  (heart)
print("\u03C0")                 # π  (pi)
print("\U0001F680")             # 🚀 (rocket)

# Use raw strings to disable escaping
print(r"No \n newline here")   # No \n newline here
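
When a string contains invisible characters, repr() is a convenient way to inspect it, because it shows escape sequences instead of acting on them. A brief sketch with a made-up sample string:

```python
text = "Name:\tPython\n"

print(text)         # the tab and newline are rendered
print(repr(text))   # 'Name:\tPython\n'
print(len(text))    # 13 (each escape sequence is a single character)

# f-strings offer the same view through the !r conversion
print(f"{text!r}")  # 'Name:\tPython\n'
```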

String Concatenation and Repetition

Concatenation with +

first = "Merit"
last = "shot"
full = first + last
print(full)  # Meritshot

# You can only concatenate strings with strings
# print("Age: " + 25)     # TypeError!
print("Age: " + str(25))   # Age: 25

Repetition with *

line = "-" * 40
print(line)   # ----------------------------------------

laugh = "Ha" * 3
print(laugh)  # HaHaHa

# Useful for creating patterns
box_top = "+" + "-" * 20 + "+"
print(box_top)  # +--------------------+

Performance: + vs join()

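Repeated + concatenation can be slow for many strings because each + builds a new string object and may copy everything accumulated so far. Use join() for building strings from many pieces:

# SLOW: potentially O(n²), because each += can copy the whole string built so far
# (CPython sometimes optimises this in place, but that is an implementation detail)
result = ""
for i in range(10000):
    result += str(i)         # may create a new string every iteration

# FAST: O(n), because join() works out the total size and copies each piece once
result = "".join(str(i) for i in range(10000))

# Equivalent, and handy when the loop has more logic: build a list, then join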
parts = []
for i in range(10000):
    parts.append(str(i))
result = "".join(parts)

Rule of thumb: For 2-3 concatenations, + is fine. For loops or many parts, always use join().
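
If you want to measure the difference on your own machine, the standard timeit module can help. This is only a rough sketch: the function names are hypothetical, and the timings depend heavily on the interpreter and hardware, so no expected numbers are shown.

```python
import timeit

def build_with_plus(n):
    """Build the string with repeated += (hypothetical helper)."""
    result = ""
    for i in range(n):
        result += str(i)
    return result

def build_with_join(n):
    """Build the same string with a single join() call (hypothetical helper)."""
    return "".join(str(i) for i in range(n))

# Both strategies must produce the same text
print(build_with_plus(1000) == build_with_join(1000))   # True

# Run each version 20 times and report the total elapsed seconds
plus_time = timeit.timeit(lambda: build_with_plus(10_000), number=20)
join_time = timeit.timeit(lambda: build_with_join(10_000), number=20)
print(f"+=   : {plus_time:.4f} s")
print(f"join : {join_time:.4f} s")
```

On some interpreters the gap may be small for inputs of this size; it tends to grow as the strings get longer.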

String Membership

The in and not in operators check whether a substring exists within a string:

text = "Python is an amazing programming language"

print("Python" in text)        # True
print("python" in text)        # False  (case-sensitive!)
print("Java" not in text)      # True
print("amazing" in text)       # True
print("amaz" in text)          # True  (partial match works)

# Case-insensitive check
search = "python"
print(search.lower() in text.lower())  # True

Iterating Over Strings

Character by Character

word = "Python"

# Simple loop
for char in word:
    print(char, end=" ")
# P y t h o n

print()  # newline

With Index Using enumerate()

word = "Python"

for index, char in enumerate(word):
    print(f"Index {index}: '{char}'")
# Index 0: 'P'
# Index 1: 'y'
# Index 2: 't'
# Index 3: 'h'
# Index 4: 'o'
# Index 5: 'n'

Using List Comprehensions with Strings

word = "Hello World"

# Get all uppercase letters
uppers = [ch for ch in word if ch.isupper()]
print(uppers)  # ['H', 'W']

# Get the Unicode code point of each character (same as ASCII for these)
codes = [ord(ch) for ch in word]
print(codes)   # [72, 101, 108, 108, 111, 32, 87, 111, 114, 108, 100]

# Build a new string — convert vowels to uppercase
result = "".join(ch.upper() if ch in "aeiou" else ch for ch in "hello world")
print(result)  # hEllO wOrld

Common String Patterns

Reversing a String

text = "Hello"

# Method 1: slicing (most Pythonic)
print(text[::-1])             # olleH

# Method 2: reversed() + join
print("".join(reversed(text)))  # olleH

# Method 3: loop (educational, not recommended)
result = ""
for char in text:
    result = char + result
print(result)                 # olleH

Checking for Palindrome

A palindrome reads the same forwards and backwards:

def is_palindrome(text):
    """Check if a string is a palindrome (case-insensitive, ignoring spaces)."""
    cleaned = text.lower().replace(" ", "")
    return cleaned == cleaned[::-1]

print(is_palindrome("racecar"))     # True
print(is_palindrome("A man a plan a canal Panama"))  # True (ignoring spaces/case)
print(is_palindrome("hello"))       # False
print(is_palindrome("Madam"))       # True
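
The version above ignores only spaces, so a phrase with commas or other punctuation would fail. One possible variant, sketched here under the hypothetical name is_palindrome_alnum, keeps only letters and digits before comparing.

```python
def is_palindrome_alnum(text):
    """Palindrome check that ignores case, spaces, and punctuation (hypothetical variant)."""
    cleaned = "".join(ch for ch in text.casefold() if ch.isalnum())
    return cleaned == cleaned[::-1]

print(is_palindrome_alnum("A man, a plan, a canal: Panama!"))  # True
print(is_palindrome_alnum("No lemon, no melon"))               # True
print(is_palindrome_alnum("Hello, World!"))                    # False
```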

Counting Vowels and Consonants

def count_vowels_consonants(text):
    """Count vowels and consonants in a string."""
    vowels = "aeiouAEIOU"
    v_count = 0
    c_count = 0
    for char in text:
        if char.isalpha():
            if char in vowels:
                v_count += 1
            else:
                c_count += 1
    return v_count, c_count

vowels, consonants = count_vowels_consonants("Hello, World!")
print(f"Vowels: {vowels}, Consonants: {consonants}")
# Vowels: 3, Consonants: 7

Removing All Whitespace

text = "  Hello   World   Python  "

# Remove leading/trailing only
print(text.strip())                     # "Hello   World   Python"

# Remove all space characters (tabs and newlines, if any, would remain)
print(text.replace(" ", ""))            # "HelloWorldPython"

# Normalise whitespace (collapse multiple spaces into one)
print(" ".join(text.split()))           # "Hello World Python"
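
When the text may contain tabs or newlines as well as spaces, replace(" ", "") is not enough. The sketch below, using a made-up sample string, shows two ways to drop every kind of whitespace.

```python
import re

text = "  Hello\t World \n Python  "

# replace() only targets the space character
print(repr(text.replace(" ", "")))   # 'Hello\tWorld\nPython'

# split() with no arguments splits on any whitespace, so joining removes all of it
print("".join(text.split()))         # HelloWorldPython

# The regex class \s also matches any whitespace character
print(re.sub(r"\s+", "", text))      # HelloWorldPython
```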

Caesar Cipher (Basic Encryption)

Shift each letter by a fixed number of positions in the alphabet:

def caesar_cipher(text, shift):
    """Encrypt text using Caesar cipher with the given shift."""
    result = []
    for char in text:
        if char.isalpha():
            # Determine base: 'A' for uppercase, 'a' for lowercase
            base = ord('A') if char.isupper() else ord('a')
            # Shift the character, wrapping around with modulo 26
            shifted = (ord(char) - base + shift) % 26 + base
            result.append(chr(shifted))
        else:
            result.append(char)  # keep non-letters unchanged
    return "".join(result)

encrypted = caesar_cipher("Hello, World!", 3)
print(encrypted)                              # Khoor, Zruog!

decrypted = caesar_cipher(encrypted, -3)
print(decrypted)                              # Hello, World!
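
The same cipher can also be expressed with str.maketrans() and translate(), which were introduced earlier in this chapter. This is an alternative sketch rather than a replacement for the loop above; caesar_table is a hypothetical helper, and it only shifts the 26 ASCII letters.

```python
import string

def caesar_table(shift):
    """Build a translation table for a Caesar shift (hypothetical helper)."""
    shift %= 26   # also turns negative shifts into the equivalent positive one
    lower = string.ascii_lowercase
    upper = string.ascii_uppercase
    return str.maketrans(
        lower + upper,
        lower[shift:] + lower[:shift] + upper[shift:] + upper[:shift],
    )

encrypted = "Hello, World!".translate(caesar_table(3))
print(encrypted)                                # Khoor, Zruog!
print(encrypted.translate(caesar_table(-3)))    # Hello, World!
```

Characters missing from the table, such as punctuation, digits, and accented letters, pass through unchanged.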

Extracting Digits from a String

text = "Order #4521, Total: ₹2,999.50, Items: 3"

# Method 1: generator expression
digits = "".join(ch for ch in text if ch.isdigit())
print(digits)  # 45212999503

# Method 2: filter()
digits = "".join(filter(str.isdigit, text))
print(digits)  # 45212999503

# Method 3: extract separate numbers with a regular expression
import re
numbers = re.findall(r"\d+\.?\d*", text)
print(numbers)  # ['4521', '2', '999.50', '3']

Title Case Conversion

# Built-in title() has limitations with apostrophes and acronyms
text = "hello world from PYTHON's string methods"

print(text.title())       # Hello World From Python'S String Methods
# Note: 'S is capitalised (not ideal)

# Better approach for edge cases — use capwords from string module
import string
print(string.capwords(text))  # Hello World From Python's String Methods

Validating Email (Basic Pattern Check)

def is_valid_email(email):
    """Basic email validation without regex."""
    if email.count("@") != 1:
        return False
    local, domain = email.split("@")
    if not local or not domain:
        return False
    if "." not in domain:
        return False
    if domain.startswith(".") or domain.endswith("."):
        return False
    # Check for valid characters
    allowed = set("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789._-")
    if not all(ch in allowed for ch in local):
        return False
    return True

print(is_valid_email("user@example.com"))      # True
print(is_valid_email("user.name@company.co"))   # True
print(is_valid_email("invalid@"))               # False
print(is_valid_email("no-at-sign.com"))         # False
print(is_valid_email("@no-local.com"))          # False

Regex Basics (Brief Introduction)

Python's re module provides regular expression support for powerful pattern matching. Here is a quick introduction — regex is a deep topic that deserves its own chapter.

Importing and Basic Functions

import re

text = "Contact us at support@meritshot.com or call 91-9876543210"

# re.search() — find first match
match = re.search(r"\d{2}-\d{10}", text)
if match:
    print(f"Phone found: {match.group()}")   # Phone found: 91-9876543210

# re.findall() — find ALL matches (returns list of strings)
emails = re.findall(r"[\w.-]+@[\w.-]+", text)
print(emails)  # ['support@meritshot.com']

# re.sub() — search and replace using patterns
cleaned = re.sub(r"\d", "*", text)
print(cleaned)
# Contact us at support@meritshot.com or call **-**********

# re.split() — split on a pattern
parts = re.split(r"[,;\s]+", "apple, banana; cherry  date")
print(parts)  # ['apple', 'banana', 'cherry', 'date']

Common Regex Patterns

| Pattern | Matches | Example |
| --- | --- | --- |
| \d | Any digit (0-9) | "42" |
| \D | Non-digit | "abc" |
| \w | Word character (letter, digit, _) | "hello_42" |
| \W | Non-word character | "@#!" |
| \s | Whitespace | " \t\n" |
| \S | Non-whitespace | "abc" |
| . | Any character (except newline) | "a", "1", "@" |
| + | One or more | \d+ matches "123" |
| * | Zero or more | \d* matches "" or "123" |
| ? | Zero or one | colou?r matches "color" or "colour" |
| {n} | Exactly n times | \d{4} matches "2026" |
| {n,m} | Between n and m times | \d{2,4} matches "42" or "2026" |
| ^ | Start of string | ^Hello |
| $ | End of string | world$ |
| [abc] | Any one of a, b, c | [aeiou] matches vowels |
| [^abc] | NOT a, b, or c | [^0-9] matches non-digits |

import re

# Extract all words
words = re.findall(r"\b\w+\b", "Hello, World! Python 3.12")
print(words)  # ['Hello', 'World', 'Python', '3', '12']

# Validate phone number format
phone = "91-9876543210"
if re.match(r"^\d{2}-\d{10}$", phone):
    print("Valid phone number")

# Extract date components
date_str = "Today is 2026-03-15"
match = re.search(r"(\d{4})-(\d{2})-(\d{2})", date_str)
if match:
    year, month, day = match.groups()
    print(f"Year: {year}, Month: {month}, Day: {day}")
    # Year: 2026, Month: 03, Day: 15
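
Two refinements are often useful once patterns get longer: compiling the pattern once with re.compile(), and naming the groups so they can be read by name instead of position. A small sketch, using a made-up log line and a hypothetical constant name:

```python
import re

# Compile once, reuse many times; (?P<name>...) labels each group
DATE_PATTERN = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")

log = "Created 2026-03-15, updated 2026-04-01"

first = DATE_PATTERN.search(log)
print(first.group("year"))    # 2026
print(first.groupdict())      # {'year': '2026', 'month': '03', 'day': '15'}

# finditer() yields a match object for every occurrence
for match in DATE_PATTERN.finditer(log):
    print(match.group("month"), match.group("day"))
# 03 15
# 04 01
```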

Practical Examples

Password Strength Checker

def check_password_strength(password):
    """
    Check password strength and return a score with feedback.
    Criteria: length, uppercase, lowercase, digits, special characters.
    """
    score = 0
    feedback = []

    # Length check
    if len(password) >= 8:
        score += 1
    else:
        feedback.append("Use at least 8 characters")

    if len(password) >= 12:
        score += 1

    # Character type checks
    if any(ch.isupper() for ch in password):
        score += 1
    else:
        feedback.append("Add at least one uppercase letter")

    if any(ch.islower() for ch in password):
        score += 1
    else:
        feedback.append("Add at least one lowercase letter")

    if any(ch.isdigit() for ch in password):
        score += 1
    else:
        feedback.append("Add at least one digit")

    special_chars = "!@#$%^&*()_+-=[]{}|;:',.<>?/~`"
    if any(ch in special_chars for ch in password):
        score += 1
    else:
        feedback.append("Add at least one special character")

    # Common pattern checks
    common_passwords = ["password", "123456", "qwerty", "admin"]
    if password.lower() in common_passwords:
        score = 0
        feedback = ["This is a commonly used password — choose something unique"]

    # Determine strength label
    if score <= 2:
        strength = "Weak"
    elif score <= 4:
        strength = "Moderate"
    else:
        strength = "Strong"

    return strength, score, feedback


# Test
passwords = ["hello", "Hello123", "M3rit$hot!2026", "password"]
for pw in passwords:
    strength, score, tips = check_password_strength(pw)
    print(f"Password: {pw:20s} | Strength: {strength:8s} | Score: {score}/6")
    for tip in tips:
        print(f"   → {tip}")
    print()

Word Frequency Counter

import string

def word_frequency(text):
    """Count the frequency of each word in a text (case-insensitive)."""
    # Convert to lowercase and remove punctuation
    cleaned = text.lower()
    cleaned = cleaned.translate(str.maketrans("", "", string.punctuation))

    # Split into words and count
    words = cleaned.split()
    freq = {}
    for word in words:
        freq[word] = freq.get(word, 0) + 1

    # Sort by frequency (highest first)
    sorted_freq = sorted(freq.items(), key=lambda x: x[1], reverse=True)
    return sorted_freq


text = """
Python is a great programming language. Python is used for web development,
data science, and automation. Many developers love Python because Python
is easy to learn and Python has a rich ecosystem.
"""

results = word_frequency(text)
print(f"{'Word':<15} {'Count':>5}")
print("-" * 21)
for word, count in results[:10]:      # top 10 words
    print(f"{word:<15} {count:>5}")
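
The standard library's collections.Counter can do the counting and sorting in one step. The following is a shorter sketch of the same idea; top_words is a hypothetical name and the sample sentence is invented.

```python
import string
from collections import Counter

def top_words(text, limit=2):
    """Return the most common words using Counter (hypothetical alternative)."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return Counter(cleaned.split()).most_common(limit)

sample = "Python is great. Python is easy. I like Python!"
print(top_words(sample))   # [('python', 3), ('is', 2)]
```

most_common() returns (word, count) pairs sorted from most to least frequent, which matches the shape returned by word_frequency() above.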

Text Cleaner

import string
import re

def clean_text(text):
    """
    Clean and normalise text:
    - Convert to lowercase
    - Remove URLs and email addresses
    - Remove punctuation
    - Remove digits
    - Collapse extra whitespace
    """
    # Convert to lowercase
    text = text.lower()

    # Remove URLs
    text = re.sub(r"https?://\S+", "", text)

    # Remove email addresses
    text = re.sub(r"\S+@\S+", "", text)

    # Remove punctuation
    text = text.translate(str.maketrans("", "", string.punctuation))

    # Remove digits
    text = re.sub(r"\d+", "", text)

    # Normalise whitespace (collapse multiple spaces/newlines into one space)
    text = " ".join(text.split())

    return text.strip()


raw_text = """
    Hello!!!   This is a MESSY   text with    extra     spaces.
    Visit https://meritshot.com for more info.
    Contact:  support@meritshot.com
    Order #12345 was placed on 2026-03-15.
"""

cleaned = clean_text(raw_text)
print("Original:")
print(raw_text)
print("Cleaned:")
print(cleaned)
# hello this is a messy text with extra spaces visit for more info contact order was placed on

Practice Exercises

  1. Reverse Words: Write a function that takes a sentence and returns the sentence with each word reversed, but the word order preserved. For example, "Hello World" becomes "olleH dlroW".

  2. Acronym Generator: Write a function that takes a phrase and returns its acronym. For example, "Artificial Intelligence" becomes "AI" and "as soon as possible" becomes "ASAP".

  3. String Compression: Write a function that compresses a string using run-length encoding. For example, "aaabbbccccdd" becomes "a3b3c4d2". If the compressed string is not shorter than the original, return the original.

  4. Anagram Checker: Write a function that checks if two strings are anagrams of each other (same letters, different order). For example, "listen" and "silent" are anagrams.

  5. Pig Latin Translator: Write a function that converts English to Pig Latin. Rules: if a word starts with a vowel, add "yay" to the end; if it starts with consonants, move the leading consonants to the end and add "ay". For example, "hello" becomes "ellohay" and "apple" becomes "appleyay".

  6. Masked Credit Card: Write a function that takes a credit card number (as a string of 16 digits) and returns it masked with * except for the last 4 digits. For example, "1234567890123456" becomes "************3456".

Summary

In this chapter, you learned:

  • String creation: single quotes, double quotes, triple quotes, raw strings (r""), and byte strings (b"")
  • Indexing and slicing: positive and negative indices, [start:stop:step] syntax, reversing with [::-1]
  • Immutability: strings cannot be modified in place; every operation creates a new string
  • String methods: case conversion, searching, checking, transforming, splitting, joining, and padding
  • String formatting: f-strings (recommended), .format(), and legacy % formatting
  • Escape characters: \n, \t, \\, \", Unicode escapes, and raw strings
  • Concatenation and repetition: +, *, and why join() is faster in loops
  • Membership testing: in and not in operators
  • Common patterns: palindromes, Caesar cipher, email validation, digit extraction
  • Regex basics: re.search(), re.findall(), re.sub(), and common patterns
  • Practical programs: password checker, word counter, text cleaner

Next up: Lists — learn about Python's most versatile data structure for storing ordered collections of items.