Chapter 15 of 23

STL — Maps, Sets and Unordered Containers

Master C++ associative containers — std::map, std::unordered_map, std::set, std::unordered_set, multimap, and multiset — with complexity analysis and a complete word-frequency counter example.

Meritshot13 min read
C++STLMapSetUnordered MapHash TableAssociative Containers
All C++ Chapters

STL — Maps, Sets and Unordered Containers

While vector and deque store elements sequentially, associative containers store them by key, enabling fast lookup, insertion, and deletion. These containers power everything from dictionary autocomplete in apps like Google Search India to counting word frequencies in NLP pipelines at companies like Jio and Infosys BPO.

In competitive programming, knowing when to use map vs unordered_map is often the difference between an accepted solution and a time-limit-exceeded verdict. This chapter covers all the major associative containers with the complexity and usage detail expected at the ₹15–40 LPA interview level.


std::map — Sorted Key-Value Store

std::map maintains keys in sorted order using a self-balancing binary search tree (typically a Red-Black Tree). Every operation — insert, find, erase — runs in O(log n) time.

Basic Usage

#include <iostream>
#include <map>
#include <string>
using namespace std;

int main() {
    map<string, int> population;

    // Insert using operator[]
    population["Delhi"]     = 32000000;
    population["Mumbai"]    = 21000000;
    population["Bengaluru"] = 13000000;
    population["Chennai"]   = 11000000;

    // Insert using insert() — preferred to avoid default-constructing the value
    population.insert({"Hyderabad", 10000000});
    population.insert(make_pair("Pune", 7000000));

    // Lookup
    cout << "Delhi population: " << population["Delhi"] << endl;

    // find() returns iterator — use it to avoid accidentally creating keys
    auto it = population.find("Kolkata");
    if (it == population.end()) {
        cout << "Kolkata not found" << endl;
    } else {
        cout << "Kolkata: " << it->second << endl;
    }

    // Iterating — keys come out in sorted (alphabetical) order
    cout << "\nAll cities (sorted):" << endl;
    for (const auto& [city, pop] : population) {   // C++17 structured binding
        cout << "  " << city << ": " << pop << endl;
    }

    cout << "\nSize: " << population.size() << endl;
    return 0;
}

Output:

Delhi population: 32000000
Kolkata not found

All cities (sorted):
  Bengaluru: 13000000
  Chennai: 11000000
  Delhi: 32000000
  Hyderabad: 10000000
  Mumbai: 21000000
  Pune: 7000000

Size: 6

The keys are alphabetically ordered because map sorts them. This is one of its defining features.

Important: operator[] Creates Keys

Accessing a non-existent key with operator[] inserts it with a default value:

map<string, int> m;
cout << m["newkey"] << endl;   // Prints 0 AND inserts "newkey" with value 0
cout << m.size()   << endl;    // 1 — "newkey" now exists in the map

Use find() when you want to check existence without side effects. Use count(key) for a simple existence check:

if (m.count("newkey") > 0) {
    // Key exists
}

std::unordered_map — Hash Table Key-Value Store

std::unordered_map uses a hash table under the hood. Keys are not sorted. Average-case complexity is O(1) for all operations, but worst case degrades to O(n) with hash collisions.

#include <iostream>
#include <unordered_map>
using namespace std;

int main() {
    unordered_map<string, int> freq;

    vector<string> words = {"roti", "dal", "roti", "sabzi", "dal", "roti"};

    for (const string& w : words) {
        freq[w]++;   // Inserts 0 if missing, then increments
    }

    for (const auto& [word, count] : freq) {
        cout << word << ": " << count << endl;
    }
    // Order is NOT guaranteed — depends on hash values
    return 0;
}

std::map vs std::unordered_map

This comparison is one of the most frequently asked C++ questions in technical interviews at Cisco India, Qualcomm Hyderabad, and e-commerce companies.

Featurestd::mapstd::unordered_map
Internal structureRed-Black Tree (BST)Hash Table
Key orderingSorted (ascending)Unordered
InsertO(log n)O(1) average, O(n) worst
FindO(log n)O(1) average, O(n) worst
EraseO(log n)O(1) average, O(n) worst
SpaceO(n) with small overheadO(n), higher constant
Key requirementsMust support operator<Must be hashable + equality-comparable
Iterator invalidation on insertNo (iterators remain valid)Possible (rehashing)
Preferred whenSorted iteration needed, small nFast lookup, order doesn't matter

Default choice: unordered_map for raw speed. Fall back to map when you need sorted keys, or when hash collisions are a concern (e.g., competitive programming anti-hash tests).

Custom Hash for unordered_map

For custom key types, provide a hash function:

struct Point { int x, y; };

struct PointHash {
    size_t operator()(const Point& p) const {
        return hash<int>()(p.x) ^ (hash<int>()(p.y) << 16);
    }
};

struct PointEqual {
    bool operator()(const Point& a, const Point& b) const {
        return a.x == b.x && a.y == b.y;
    }
};

unordered_map<Point, string, PointHash, PointEqual> grid;
grid[{0, 0}] = "origin";

std::set — Sorted Unique Elements

std::set stores unique elements in sorted order, backed by a Red-Black Tree. It is ideal when you need a sorted collection of distinct values.

#include <iostream>
#include <set>
using namespace std;

int main() {
    set<int> s;

    s.insert(50);
    s.insert(30);
    s.insert(70);
    s.insert(10);
    s.insert(30);   // Duplicate — silently ignored

    cout << "Size: " << s.size() << endl;   // 4

    // Iterating — always sorted ascending
    for (int x : s) cout << x << " ";    // 10 30 50 70
    cout << endl;

    // Search
    auto it = s.find(30);
    if (it != s.end()) cout << "Found: " << *it << endl;

    // Existence check
    cout << "Has 99? " << s.count(99) << endl;   // 0

    // Erase
    s.erase(30);
    for (int x : s) cout << x << " ";   // 10 50 70
    cout << endl;

    // Lower and upper bound
    s.insert({20, 40, 60, 80});
    auto lb = s.lower_bound(35);   // First element >= 35
    auto ub = s.upper_bound(65);   // First element > 65
    cout << "lower_bound(35): " << *lb << endl;   // 40
    cout << "upper_bound(65): " << *ub << endl;   // 70

    return 0;
}

lower_bound and upper_bound are O(log n) — a massive advantage over vector where binary search must be called separately.


std::unordered_set — Hash-Based Unique Elements

std::unordered_set is to std::set what unordered_map is to map: same interface, hash table backing, O(1) average operations, no ordering.

#include <iostream>
#include <unordered_set>
using namespace std;

int main() {
    unordered_set<string> visited;

    vector<string> urls = {
        "meritshot.com", "google.com", "meritshot.com", "flipkart.com"
    };

    for (const string& url : urls) {
        if (visited.count(url) == 0) {
            visited.insert(url);
            cout << "New: " << url << endl;
        } else {
            cout << "Already visited: " << url << endl;
        }
    }

    return 0;
}

Output:

New: meritshot.com
New: google.com
Already visited: meritshot.com
New: flipkart.com

Use unordered_set for O(1) duplicate detection. Common in LeetCode problems like "Two Sum" and "Longest Substring Without Repeating Characters."


std::multimap — Map with Duplicate Keys

std::multimap allows multiple entries with the same key, still sorted.

#include <iostream>
#include <map>
using namespace std;

int main() {
    multimap<string, int> salaries;

    // Multiple employees can have the same department key
    salaries.insert({"Engineering", 1200000});
    salaries.insert({"Engineering", 1500000});
    salaries.insert({"Engineering", 900000});
    salaries.insert({"HR",          600000});
    salaries.insert({"HR",          700000});

    // Iterate over all Engineering entries
    auto range = salaries.equal_range("Engineering");
    cout << "Engineering salaries:" << endl;
    for (auto it = range.first; it != range.second; ++it) {
        cout << "  Rs." << it->second / 100000.0 << " LPA" << endl;
    }

    cout << "Total entries: " << salaries.size() << endl;   // 5
    cout << "Engineering count: " << salaries.count("Engineering") << endl; // 3
    return 0;
}

equal_range returns a pair of iterators marking the begin and end of all entries with the given key.


std::multiset — Set with Duplicate Elements

std::multiset stores sorted elements, allowing duplicates. Useful for frequency-order processing or when you need sorted access to repeated values.

#include <iostream>
#include <set>
using namespace std;

int main() {
    multiset<int> ms;
    ms.insert({5, 3, 8, 3, 5, 5, 1});

    for (int x : ms) cout << x << " ";   // 1 3 3 5 5 5 8
    cout << endl;

    cout << "Count of 5: " << ms.count(5) << endl;   // 3

    // Erase ONLY ONE occurrence of 3
    ms.erase(ms.find(3));

    for (int x : ms) cout << x << " ";   // 1 3 5 5 5 8
    cout << endl;

    // Erase ALL occurrences of 5
    ms.erase(5);

    for (int x : ms) cout << x << " ";   // 1 3 8
    cout << endl;

    return 0;
}

Important: ms.erase(value) removes ALL elements with that value. ms.erase(ms.find(value)) removes only one. This distinction is a classic interview trap.


Common Operations Summary

Operationmap/setunordered_map/unordered_set
Insertm[k]=v or insert({k,v})Same
Findm.find(k) or m.count(k)Same
Erase by keym.erase(key)Same
Erase by iteratorm.erase(it)Same
Iteratefor (auto& [k,v] : m) — sortedSame — unordered
Check emptym.empty()Same
Sizem.size()Same
Clearm.clear()Same
Bounds (map/set only)lower_bound, upper_bound, equal_rangeNot available

Worked Example: Word Frequency Counter

Word frequency counting is used in everything from search engine indexing (Google Search, Bing India) to spam detection and academic plagiarism tools (Turnitin). Here is a complete implementation using both map and unordered_map.

#include <iostream>
#include <map>
#include <unordered_map>
#include <vector>
#include <string>
#include <sstream>
#include <algorithm>
#include <cctype>
using namespace std;

// Normalise a word: lowercase, remove punctuation
string normalise(const string& word) {
    string result;
    for (char c : word) {
        if (isalpha(c)) result += tolower(c);
    }
    return result;
}

int main() {
    string paragraph =
        "India is a land of diversity. India has many languages, cultures, "
        "and traditions. The culture of India is rich and vibrant. "
        "People in India celebrate many festivals. India is known for its "
        "rich heritage and culture.";

    // --- Using unordered_map for O(1) average insertion ---
    unordered_map<string, int> freqMap;
    istringstream stream(paragraph);
    string token;

    while (stream >> token) {
        string word = normalise(token);
        if (!word.empty()) freqMap[word]++;
    }

    cout << "=== Word Frequencies ===" << endl;
    for (const auto& [word, count] : freqMap) {
        cout << "  " << word << ": " << count << endl;
    }

    // --- Sort by frequency descending using vector of pairs ---
    vector<pair<string, int>> sortedFreq(freqMap.begin(), freqMap.end());
    sort(sortedFreq.begin(), sortedFreq.end(),
         [](const pair<string,int>& a, const pair<string,int>& b) {
             if (a.second != b.second) return a.second > b.second;
             return a.first < b.first;   // Alphabetical tiebreak
         });

    cout << "\n=== Top 5 Words ===" << endl;
    for (int i = 0; i < min(5, (int)sortedFreq.size()); i++) {
        cout << "  " << i + 1 << ". "
             << sortedFreq[i].first << " (" << sortedFreq[i].second << ")"
             << endl;
    }

    // --- Use std::map to get an alphabetically sorted view ---
    map<string, int> alphaMap(freqMap.begin(), freqMap.end());

    cout << "\n=== Alphabetical Order ===" << endl;
    for (const auto& [word, count] : alphaMap) {
        cout << "  " << word << ": " << count << endl;
    }

    // --- Find words that appear exactly once (hapax legomena) ---
    cout << "\n=== Words Appearing Once ===" << endl;
    for (const auto& [word, count] : alphaMap) {
        if (count == 1) cout << "  " << word << endl;
    }

    // --- Using multimap to group words by frequency ---
    multimap<int, string, greater<int>> byFreq;  // Sorted by freq descending
    for (const auto& [word, count] : freqMap) {
        byFreq.insert({count, word});
    }

    cout << "\n=== Grouped By Frequency ===" << endl;
    int lastFreq = -1;
    for (const auto& [count, word] : byFreq) {
        if (count != lastFreq) {
            cout << "  Frequency " << count << ": ";
            lastFreq = count;
        }
        cout << word << " ";
    }
    cout << endl;

    return 0;
}

Sample Output (truncated):

=== Word Frequencies ===
  india: 5
  is: 3
  culture: 2
  ...

=== Top 5 Words ===
  1. india (5)
  2. is (3)
  3. and (3)
  4. culture (2)
  5. many (2)

=== Alphabetical Order ===
  a: 1
  and: 3
  celebrate: 1
  ...

=== Words Appearing Once ===
  a
  celebrate
  diversity
  ...

=== Grouped By Frequency ===
  Frequency 5: india
  Frequency 3: and is
  Frequency 2: culture many rich
  ...

This example demonstrates transferring from unordered_map to map for sorted output, sorting pairs by custom comparator, and using multimap for frequency grouping — all patterns that appear in real-world data processing code.


Common Pitfalls

  1. operator[] on map inserts on miss. Using m["key"] when checking existence accidentally inserts the key with a zero-initialised value. Use m.find("key") != m.end() or m.count("key") instead.

  2. multiset::erase(value) removes all. If you only want to remove one occurrence from a multiset, use ms.erase(ms.find(value)). This is one of the most common multiset bugs in competitive programming.

  3. Iterating and modifying simultaneously. Erasing elements while iterating with a range-based for loop is undefined behaviour. Use the iterator-based erase pattern: it = m.erase(it).

  4. Assuming unordered_map is always faster. For small maps (under ~30 entries), map is often faster in practice due to cache effects and the overhead of hashing. Always profile before committing to either.

  5. Hash collision attacks in competitive programming. Adversarial test cases can trigger O(n) behaviour in unordered_map if the judge knows your hash function. Use a custom hash with a random seed, or use map when safety matters more than speed.

  6. Structured bindings require C++17. for (const auto& [key, val] : m) requires C++17 or later. Ensure your compiler is invoked with -std=c++17 (or -std=c++20). On older standards, use it->first and it->second.


Practice Exercises

  1. Write a program that reads a list of student names and their marks, stores them in a map<string, int>, then prints only the students who scored above the class average.

  2. Implement a phone book using multimap<string, string> where each person can have multiple phone numbers. Support adding, removing one number, and listing all numbers for a name.

  3. Use unordered_set<int> to find all duplicate elements in a vector of integers in O(n) time. Compare your solution with a naive O(n^2) approach on a vector of 100,000 elements.

  4. Given a sentence, use map<char, int> to find the first non-repeating character. This is a classic problem in Amazon and Microsoft India telephonic rounds.

  5. Build a simple in-memory index for a document search engine: use map<string, set<int>> where the key is a word and the value is the set of document IDs containing that word. Implement search(word) to return document IDs and addDocument(id, text) to index a new document.


Summary

  • std::map stores key-value pairs sorted by key in a Red-Black Tree; all operations are O(log n).
  • std::unordered_map uses a hash table for O(1) average operations but offers no sorted iteration.
  • std::set stores unique elements sorted; std::unordered_set offers the same uniqueness guarantee with O(1) average lookup.
  • std::multimap and std::multiset allow duplicate keys/values while maintaining sorted order.
  • Use find() or count() for existence checks on map — never operator[] when you do not want accidental insertion.
  • multiset::erase(value) removes all matching elements; use erase(find(value)) to remove exactly one.
  • lower_bound and upper_bound on map/set provide O(log n) range queries not available on unordered containers.
  • For sorted output from an unordered_map, copy to a vector<pair<K,V>> and sort, or copy to a map.
  • Hash collision attacks are a real concern; prefer map in competitive programming when adversarial inputs are expected, or use a randomised custom hash.
  • C++17 structured bindings (auto& [key, val]) make range-based iteration over maps significantly cleaner.