1.1 What is an Algorithm
Introduction
Before you write a single line of code for any problem, you need a clear, step-by-step plan. That plan—when it is precise, unambiguous, and finite—is what we call an algorithm. This is the first and most important idea in the entire DSA course: every correct program is the implementation of some algorithm. If you understand what an algorithm is and what makes one good or bad, you have the foundation for everything that follows.
Real-World Analogy
Think of an algorithm like a recipe. A recipe has:
- Inputs: ingredients (flour, eggs, sugar)
- Steps: mix, bake at 350°F for 30 minutes
- Output: a cake
If the steps are vague (“add some flour” or “bake until it looks done”), two people can get different results. If the steps are exact (“add 2 cups flour,” “bake 30 minutes”), anyone who follows them gets the same cake. An algorithm is the same: it takes defined inputs, follows a fixed set of clear steps, and produces a defined output. The more precise the steps, the more reliable the result.
Giving someone directions is also an algorithm: start at A, turn left at the gas station, go 2 miles, turn right at the red house. Input = starting point; steps = the turns and distances; output = reaching the destination. Ambiguous directions (“go that way for a bit”) are like a bad algorithm—they don’t guarantee the same outcome every time.
Formal Definition
In computer science, we define an algorithm as:
An algorithm is a finite, well-defined sequence of unambiguous instructions that, when followed from a given set of inputs, produces a corresponding set of outputs and terminates in a finite amount of time.
Breaking that down:
- Finite: The description consists of a finite number of steps. It is not an endless list of instructions.
- Well-defined: Each step is clear. There’s no “do something useful” or “maybe do this.”
- Unambiguous: Only one interpretation. No “sometimes do A, sometimes B” unless we explicitly say under what condition.
- Inputs: We know what we’re given (e.g., a list of numbers, a string).
- Outputs: We know what we must produce (e.g., the maximum, a sorted list, yes/no).
- Terminates: The process stops. We don’t require infinite time or infinite memory.
Why This Topic Matters
Interviews and real-world design both revolve around algorithms. Interviewers ask you to “design an algorithm” for a problem—they want to see that you can break a task into clear, correct steps before coding. In production, the algorithm you choose decides whether your system scales (e.g., O(n log n) vs O(n²)) or fails under load. Understanding what an algorithm is—and what makes one better than another—is the basis for the rest of this course.
Mental Model
Picture an algorithm as a black box with a contract:
┌─────────────────────────────────────┐
│  INPUT(s)                           │
│  e.g. array [3, 1, 4, 1, 5]         │
└──────────────┬──────────────────────┘
               │
               ▼
┌─────────────────────────────────────┐
│  ALGORITHM (sequence of steps)      │
│    • Step 1: ...                    │
│    • Step 2: ...                    │
│    • ...                            │
└──────────────┬──────────────────────┘
               │
               ▼
┌─────────────────────────────────────┐
│  OUTPUT(s)                          │
│  e.g. maximum value 5               │
└─────────────────────────────────────┘
You don’t have to care how the box is implemented (which language, which data structures) when you’re defining the algorithm. You only care: given these inputs, these steps produce this output. Implementation comes later.
Properties of a Good Algorithm
Not every set of steps is a “good” algorithm. We usually require:
| Property | Meaning |
|---|---|
| Finiteness | Stops after a finite number of steps. |
| Definiteness | Each step is precisely defined; no ambiguity. |
| Input | Zero or more well-defined inputs. |
| Output | At least one well-defined output. |
| Effectiveness | Every step is doable in finite time (e.g., no “solve the Halting Problem”). |
Step-by-Step: Describing an Algorithm
When you explain an algorithm, you typically:
- State the problem: What are we given, and what must we return?
- State your assumptions: Either assume valid input or say exactly what you assume (e.g., “array is non-empty”).
- List steps in order: Numbered, clear, one idea per step.
- Handle edge cases: Empty input, single element, duplicates—whatever the problem allows.
You can describe the same algorithm in plain English, pseudocode, or code. The algorithm is the idea; the code is one way to run it.
Example: Find the Maximum in a List
Problem: Given a list of numbers, return the largest number.
Input: A list arr of numbers (we’ll assume at least one element).
Output: The maximum value in arr.
Algorithm (steps):
- Assume the first element is the maximum; call it max_so_far.
- For each remaining element in the list: if that element is greater than max_so_far, update max_so_far to this element.
- When all elements have been checked, return max_so_far.
This is finite (we do one pass), well-defined (each step is clear), and effective. It’s an algorithm.
Python Implementation
def find_max(arr):
    if not arr:
        return None  # or raise, depending on contract
    max_so_far = arr[0]
    for i in range(1, len(arr)):
        if arr[i] > max_so_far:
            max_so_far = arr[i]
    return max_so_far
The code above is one implementation of that algorithm. The algorithm itself is the three-step idea; Python is just the language we used to run it.
Algorithm vs Program
- Algorithm: Language-independent, step-by-step procedure. It’s the “what to do” and “in what order.”
- Program: An implementation of one or more algorithms in a specific language (Python, Java, etc.). It’s the “how it runs on a machine.”
One algorithm can be implemented in many languages; the core logic stays the same. When we analyze “time complexity” or “space complexity” later, we’re analyzing the algorithm, not the quirks of a single language.
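To make the distinction concrete, here is a minimal sketch of the same find-the-maximum algorithm written two ways in Python. The function names `find_max_indexed` and `find_max_iter` are illustrative only, and both assume a non-empty list. The three steps are identical; only the surface syntax changes.

```python
def find_max_indexed(arr):
    """Index-based loop: reads closest to the numbered steps."""
    max_so_far = arr[0]
    for i in range(1, len(arr)):
        if arr[i] > max_so_far:
            max_so_far = arr[i]
    return max_so_far


def find_max_iter(arr):
    """Iterator-based loop: same steps, different surface syntax."""
    it = iter(arr)
    max_so_far = next(it)  # step 1: the first element is the current maximum
    for x in it:           # step 2: compare each remaining element
        if x > max_so_far:
            max_so_far = x
    return max_so_far      # step 3: return the result


data = [3, 1, 4, 1, 5]
print(find_max_indexed(data), find_max_iter(data))  # prints: 5 5
```

Both functions do one pass and the same number of comparisons, so any complexity analysis applies to both.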
Common mistake: Confusing “writing code” with “designing an algorithm.” In interviews, always clarify the steps (and edge cases) before coding. The algorithm is the plan; the code is the execution of that plan.
When asked “how would you solve X?”, start by stating the problem (inputs/outputs), then give a high-level algorithm in steps. Only then translate to code. Showing that you think in algorithms—not just syntax—signals strong problem-solving skills.
Summary
- An algorithm is a finite, well-defined, unambiguous sequence of steps that maps inputs to outputs and terminates.
- Good algorithms are finite, definite, have clear input/output, and are effective.
- You can describe an algorithm in words, pseudocode, or code; the algorithm is the idea, the code is one implementation.
- Thinking in algorithms first (then implementing) is the foundation of DSA and of strong technical interviews.
1.2 What is a Data Structure
Introduction
An algorithm tells you what steps to follow. But those steps operate on something—numbers, names, relationships. How you organize and store that “something” so that your algorithm can work efficiently is the job of a data structure. Algorithms and data structures are inseparable: the right structure makes the algorithm simple and fast; the wrong one makes it clumsy or slow.
Real-World Analogy
Imagine you need to find a book in a library.
- Stack of unsorted books: You look one by one until you find it. Slow to search, but easy to add a new book (just put it on top).
- Books sorted by title on a shelf: You can jump to the right section (like binary search). Fast to search, but inserting a new book may require shifting others.
- Index cards (catalog): You look up “Author name” and get the shelf location. Very fast lookup if you know the key.
Same “data” (books), different ways of organizing it—each with different tradeoffs for “find a book,” “add a book,” “remove a book.” A data structure is exactly that: a chosen way to organize and store data so that the operations we care about (search, insert, delete, etc.) are as efficient as we need.
A queue at a ticket counter: first come, first served. The “structure” is an ordered line; the operations are “join at the end” and “serve from the front.” A stack of plates: last in, first out. Same idea—different rules for how we add and remove—different data structure.
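The two rules above can be sketched in a few lines of Python. This is one possible illustration, assuming `collections.deque` for the queue and a plain list for the stack; the names and sample values are made up for the example.

```python
from collections import deque

# Queue: first in, first out (the ticket counter).
queue = deque()
queue.append("Ana")    # join at the end
queue.append("Ben")
queue.append("Chen")
first_served = queue.popleft()  # serve from the front

# Stack: last in, first out (the pile of plates).
stack = []
stack.append("plate 1")  # put on top
stack.append("plate 2")
stack.append("plate 3")
top_plate = stack.pop()  # take from the top

print(first_served, "|", top_plate)  # prints: Ana | plate 3
```

The stored items are ordinary values in both cases; what differs is which end you are allowed to remove from.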
Formal Definition
A data structure is a way to store, organize, and manage data in memory so that certain operations (access, insertion, deletion, search, etc.) can be performed efficiently.
“Efficiently” depends on the problem: sometimes we need fast lookup by key (dictionary); sometimes we need order (sorted list); sometimes we need fast insert at one end (list, queue). The data structure is the organization; the algorithm is the procedure that uses it.
Why This Topic Matters
In interviews and in production, the first question after “what’s the algorithm?” is often “what data structure will you use?” Picking an array vs a hash set can change the time complexity from O(n²) to O(n). Trees, graphs, and heaps exist because certain problems are naturally expressed and solved with those shapes. You don’t just “use a list for everything”—you choose a structure that matches the operations and constraints of the problem.
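As a rough illustration of the O(n²) versus O(n) claim, the sketch below checks a sequence for duplicates twice: once remembering seen values in a list, once in a set. The function names are hypothetical, and the set version assumes the items are hashable.

```python
def has_duplicate_list(items):
    """O(n^2) in the worst case: each `in` check scans a list."""
    seen = []
    for x in items:
        if x in seen:      # linear scan of everything seen so far
            return True
        seen.append(x)
    return False


def has_duplicate_set(items):
    """O(n) on average: each `in` check is a hash lookup."""
    seen = set()
    for x in items:
        if x in seen:      # average O(1) membership test
            return True
        seen.add(x)
    return False


print(has_duplicate_list([3, 1, 4, 1, 5]), has_duplicate_set([3, 1, 4, 1, 5]))  # True True
```

The loop is the same in both versions. Only the structure behind `seen` changed, and with it the cost of the membership test.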
Mental Model
Every data structure gives you two things:
- Storage layout: How the data is arranged in memory (contiguous array, linked nodes, key–value buckets, tree, graph).
- Operation set: What you can do with it—and how fast. For example: “add at end O(1),” “search by value O(n),” “look up by key O(1) average.”
ALGORITHM (steps)  +  DATA STRUCTURE (storage + operations)  =  PROGRAM
"How to solve"        "Where to put data & how to access"       Working code
Change the data structure and the same high-level algorithm might become faster or simpler; change the algorithm and you might need a different structure. They are designed together.
Algorithm and Data Structure: A Partnership
An algorithm assumes it can do certain things with the data: “get the next element,” “check if this key exists,” “remove the smallest.” The data structure provides those operations. If your algorithm needs “find minimum quickly,” you might choose a heap; if it needs “check membership quickly,” you might choose a set. So:
- Algorithm = what steps to perform.
- Data structure = how data is organized so those steps are efficient.
You’ll often hear “hash map” or “two pointers” or “binary search”—the first is a data structure, the others are algorithmic ideas. In practice we combine them: e.g., “use a hash map and one pass” (structure + algorithm).
Types of Data Structures (High-Level)
We don’t need to memorize a catalog yet. It’s enough to see the landscape:
- Linear: Data in a line—arrays, linked lists, stacks, queues. Order matters; access by position or by scanning.
- Tree: Hierarchical—binary trees, BSTs, heaps. Good for “smallest/largest,” “split by range,” “parent–child” relationships.
- Graph: Nodes and edges—networks, dependencies, maps. Good for “paths,” “connected components,” “shortest route.”
- Hash-based: Store by key; access by key in (average) constant time. Dictionaries, sets.
Later sections of the course go deep on each. Here the takeaway is: different shapes support different operations and tradeoffs.
Same Data, Different Structure
Suppose you must support: “add an element at the front” and “traverse all elements.”
- Array (list): Insert at front is expensive—you shift every element one position. O(n). Traverse is O(n).
- Singly linked list: Insert at front is just “create a node and point the head to it.” O(1). Traverse is still O(n).
So for “insert at front” often, a linked list can be a better choice than an array. The data is the same (a sequence of items); the structure decides the cost of each operation.
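A minimal sketch of this comparison follows. The `Node` and `SinglyLinkedList` classes are hypothetical teaching code, not a library API, and a Python `list` stands in for the array.

```python
class Node:
    def __init__(self, value, next=None):
        self.value = value
        self.next = next


class SinglyLinkedList:
    def __init__(self):
        self.head = None

    def push_front(self, value):
        # O(1): the new node points at the old head and becomes the head.
        self.head = Node(value, self.head)

    def __iter__(self):
        # O(n) traversal: follow next pointers until the end.
        node = self.head
        while node is not None:
            yield node.value
            node = node.next


linked = SinglyLinkedList()
array = []
for x in [1, 2, 3]:
    linked.push_front(x)   # O(1) each
    array.insert(0, x)     # O(n) each: existing elements shift right

print(list(linked), array)  # both hold [3, 2, 1]
```

Both containers end up with the same sequence. The difference is how much work each front insertion costs as the sequence grows.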
Choosing a Data Structure
Ask:
- What operations do I need? (insert, delete, search, get min, range query, etc.)
- How often is each used? (e.g., search often → consider hash or sorted structure)
- What are the constraints? (memory, order, duplicates allowed?)
Then match the structure to the operations. This “operation-first” thinking is how experienced engineers pick structures in interviews.
When stuck, write down the operations you need (e.g., “add,” “remove max,” “check if present”). Then think: “Which standard structure supports these?” Often the problem is designed to fit one—array, hash, heap, tree, or graph.
Summary
- A data structure is a way to store and organize data so that the operations you need (access, insert, delete, search) can be done efficiently.
- Algorithms and data structures work together: the algorithm defines the steps; the structure defines how data is stored and what operations cost.
- Different structures (linear, tree, graph, hash) offer different tradeoffs; choose based on the operations and constraints of the problem.
1.3 How to Think Like a Problem Solver
Introduction
Solving a problem in an interview or in real code isn’t about memorizing solutions—it’s about a repeatable way of thinking. This section gives you a framework: what to do from the moment you read the problem until you have a clear, implementable plan. Master this and you’ll handle new problems you’ve never seen before.
Why a Framework Matters
Without a process, it’s easy to jump into code, miss edge cases, or get stuck. With a process, you: clarify the problem, test your understanding with examples, get a simple solution first, then improve. That’s how strong problem solvers work—and how you should too.
The Problem-Solving Mindset
- Clarify before solving: Make sure you understand inputs, outputs, and rules. Ask “what if” (empty input? duplicates? negative numbers?).
- Start simple: Get a correct solution first, even if it’s slow. Correctness is the baseline; optimization comes next.
- Use examples: Work 2–3 small examples by hand. If your approach works on paper, it’s easier to translate to code.
- Break it down: If the problem is big, solve a smaller version or one part first.
Step-by-Step Framework
Follow these steps every time. They work for interviews and for practice.
Step 1: Read and Restate
Read the problem fully. Then restate it in your own words: “So I’m given X, and I need to return Y, with these rules: …” If you can’t restate it, you don’t understand it yet. Ask clarifying questions (in an interview) or re-read (in practice).
Restating forces you to identify the inputs, the output, and any constraints. That’s the contract your solution must satisfy.
Step 2: Identify Inputs and Output
Write them down explicitly. Example: “Input: array of integers, length n. Output: indices of two numbers that add up to target, or [-1, -1] if none.” Knowing I/O prevents you from solving the wrong problem.
Step 3: Work Small Examples and Edge Cases
Pick 2–3 small examples and solve them by hand. Then think of edge cases:
- Empty input, single element, two elements
- No valid answer (e.g., no pair sums to target)
- Duplicates, negatives, zeros
- Maximum size (what if n is huge?)
If your approach breaks on an edge case, fix the approach before coding.
Step 4: Brute Force First
Describe the simplest solution that definitely works—even if it’s “try every pair” or “check every possibility.” Say it out loud or write 3–5 steps. This gives you a correctness baseline and often reveals a path to a better solution.
Step 5: Optimize (If Needed)
Look for repeated work, unnecessary loops, or a better data structure. Can you use a hash map to avoid a second loop? Can you sort and use two pointers? We’ll build a toolkit of patterns; for now, the habit is: “I have a working solution; where is the waste?”
Step 6: Write the Code
Only after steps 1–5, translate your algorithm to code. You’re implementing a plan, not exploring in the dark. Test with the examples and edge cases you already thought through.
READ → RESTATE → I/O → EXAMPLES & EDGE CASES → BRUTE FORCE → OPTIMIZE → CODE → TEST
Restate in Your Own Words
This is the single most underused habit. Before writing a single line, say: “I need to … given … and return … when …” If you do this, you’ll catch misunderstandings early and your code will match the problem.
Work Examples by Hand
Take a tiny input (e.g., array [2, 7, 11], target 9). Walk through your intended steps. Do you get [0, 1]? If yes, your logic is clear. If no, fix the logic. This “hand trace” is how you avoid bugs and how you explain your approach in an interview.
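If it helps, the hand trace can be mirrored in code. The sketch below assumes the intended approach is the simple check-every-pair idea and records each comparison the way you would write it on paper. The function name `two_sum_trace` is illustrative.

```python
def two_sum_trace(arr, target):
    """Check every pair and record each comparison, like a hand trace."""
    trace = []
    for i in range(len(arr)):
        for j in range(i + 1, len(arr)):
            total = arr[i] + arr[j]
            trace.append((i, j, total))
            if total == target:
                return [i, j], trace
    return [-1, -1], trace


result, steps = two_sum_trace([2, 7, 11], 9)
for i, j, total in steps:
    print(f"i={i}, j={j}, sum={total}")
print("result:", result)  # result: [0, 1]
```

For this input the very first pair (indices 0 and 1) sums to 9, so the trace has a single row.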
Common mistake: Jumping into code before understanding the problem or testing an approach on paper. You’ll waste time debugging or solving the wrong problem. Always clarify the problem and work an example first.
Interviewers want to see your process. Talk through: “First I’m clarifying … here are the inputs and output … let me try a small example … my brute force would be … then I can optimize by …” That narrative is as important as the code.
Summary
- Use a framework: read → restate → I/O → examples & edge cases → brute force → optimize → code.
- Clarify the problem and restate it; identify inputs and output explicitly.
- Work small examples and edge cases by hand before coding.
- Get a correct (brute force) solution first, then look for optimization.
1.4 Brute Force Approach
Introduction
The brute force approach is the straightforward solution that “just tries everything” or does the most obvious thing—no clever tricks yet. It’s often slow for large inputs, but it’s simple, easy to get right, and gives you a correct baseline. In problem-solving and in interviews, you should start with brute force, then optimize only when needed.
What Is Brute Force?
Brute force means solving a problem by trying all relevant possibilities or by the most direct, naive method—without optimizing for time or space. Correctness first; efficiency second.
Examples:
- Find two numbers that sum to target: Check every pair. Two nested loops. Slow (O(n²)) but obviously correct.
- Find maximum in an array: Scan every element and keep the largest. One loop. For this problem, that “naive” scan is already optimal—so “brute force” here is also the best solution.
- Find a value in an unsorted list: Linear search, checking each element until you find it. Brute force and, for a one-off search on unsorted data with no index, the best you can do, since any element might be the target.
So “brute force” doesn’t always mean “bad.” It means “no clever optimization yet.” Sometimes the brute force solution is already good enough.
Why Start With Brute Force?
- Correctness: You have a solution that works. You can test it, debug it, and verify against examples.
- Baseline: You know the worst-case behavior (e.g., O(n²)). Optimization then means “do better than this.”
- Clarity: Simple logic is easier to explain and to implement without bugs.
- Interview signal: Showing brute force first, then improving, demonstrates structured thinking—better than jumping to a half-remembered “optimal” idea and getting stuck.
Mental Model
Think: “What’s the dumbest correct solution?” That’s brute force. Then ask: “Where am I doing redundant work? Can a different data structure or a different order of operations remove it?” That’s the path from brute force to a better solution.
Example: Two Sum (Evolution)
Problem: Given an array of integers and a target, return indices of two numbers that add up to the target. Assume exactly one valid pair.
Brute Force: Try Every Pair
For each index i, for each index j > i, check if arr[i] + arr[j] == target. If yes, return [i, j].
- Time: Two nested loops over n elements → O(n²).
- Space: O(1) besides the input.
def two_sum_brute(arr, target):
    n = len(arr)
    for i in range(n):
        for j in range(i + 1, n):
            if arr[i] + arr[j] == target:
                return [i, j]
    return [-1, -1]
This is correct. For small n it’s fine. For large n we want to do better.
Better: One Pass With a Hash Map
As we scan the array, for each value arr[i] we need a partner target - arr[i]. If we’ve already seen that partner at some index j, we can return [j, i]. Store “value → index” in a dictionary.
- Time: One pass, O(n). Each lookup/insert in the dict is O(1) average.
- Space: O(n) for the dictionary.
def two_sum_better(arr, target):
    seen = {}  # value -> index
    for i, x in enumerate(arr):
        need = target - x
        if need in seen:
            return [seen[need], i]
        seen[x] = i
    return [-1, -1]
Same problem, same correctness—faster when n is large. This is the “evolution”: brute force first, then optimize by removing the inner loop with a hash map.
The jump from O(n²) to O(n) came from asking: “What am I doing repeatedly?” Answer: “Looking for a complement for each element.” A hash map turns that repeated search into O(1) lookup—classic trade: a bit of extra space for a lot less time.
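One practical use of the brute force baseline is as a reference when testing the faster version. The sketch below is one way to do that, with hypothetical helper names (`is_valid`, `cross_check`) and small random inputs. It checks that the hash-map answer is a valid pair rather than comparing the two outputs directly, because the two versions may return different valid pairs when more than one exists.

```python
import random


def two_sum_brute(arr, target):
    for i in range(len(arr)):
        for j in range(i + 1, len(arr)):
            if arr[i] + arr[j] == target:
                return [i, j]
    return [-1, -1]


def two_sum_hash(arr, target):
    seen = {}  # value -> index
    for i, x in enumerate(arr):
        if target - x in seen:
            return [seen[target - x], i]
        seen[x] = i
    return [-1, -1]


def is_valid(arr, target, answer):
    """True if answer is a correct pair of distinct indices, or a correct 'no pair'."""
    if answer == [-1, -1]:
        # "No pair" is only correct if brute force also finds none.
        return two_sum_brute(arr, target) == [-1, -1]
    i, j = answer
    return i != j and arr[i] + arr[j] == target


def cross_check(trials=200, seed=0):
    """Run the fast version on random small inputs; brute force is the reference."""
    rng = random.Random(seed)
    for _ in range(trials):
        arr = [rng.randint(-10, 10) for _ in range(rng.randint(0, 8))]
        target = rng.randint(-20, 20)
        if not is_valid(arr, target, two_sum_hash(arr, target)):
            return False
    return True


print(cross_check())  # True
```

For example, with `[1, 2, 3, 4]` and target 5 the brute force version returns `[0, 3]` while the hash-map version returns `[1, 2]`. Both are correct, which is why the check validates the pair instead of demanding equal output.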
When Is Brute Force Acceptable?
- Small input: If n is tiny (e.g., ≤ 20), O(n²) is a few hundred steps and even O(2^n) is only about a million (2^20 = 1,048,576), which typically runs in well under a second. Brute force is fine.
- Quick to code: In a contest or interview, a slow-but-correct solution can be better than no solution or a buggy “clever” one.
- No better known: Some problems (e.g., certain NP-hard cases) don’t have a much better algorithm; brute force (or clever brute force) is what we use.
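As an illustration of the small-input case, here is a sketch of an exponential brute force: try every non-empty subset of a short list until one sums to a target. The problem choice (subset sum) and the name `subset_with_sum` are assumptions made for this example. For n elements there are 2^n − 1 non-empty subsets, so this is only reasonable when n is small.

```python
from itertools import combinations


def subset_with_sum(nums, target):
    """Try every non-empty subset, smallest first; return the first match or None."""
    for size in range(1, len(nums) + 1):
        for combo in combinations(nums, size):
            if sum(combo) == target:
                return list(combo)
    return None


print(subset_with_sum([3, 34, 4, 12, 5, 2], 9))  # [4, 5]
```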
Evolution: Brute Force → Better → Optimal
Get in the habit of this progression:
- Brute force: Correct, simple, maybe slow. State it and implement it.
- Better: Identify the bottleneck (extra loop? repeated work?). Use a better structure or invariant (e.g., hash, two pointers, sort).
- Optimal: Often “better” is already optimal (e.g., one pass + hash for Two Sum). Sometimes you can prove “we must look at each element at least once” → O(n) is a lower bound.
Don’t skip step 1. It keeps your thinking grounded and your code correct.
Common mistake: Optimizing too early and producing a wrong or incomplete solution, or not mentioning brute force in an interview at all. Interviewers want to see that you can get a correct solution first, then improve.
Say: “The brute force would be to try every pair / check every subset / … That’s O(…). Then we can optimize by …” This shows you prioritize correctness and then efficiency—exactly what interviewers look for.
Summary
- Brute force = straightforward, try-all or naive solution. Correctness first; no clever optimization yet.
- Always consider brute force first: it’s a correct baseline and often reveals how to optimize.
- Evolution: brute force → find bottleneck → better data structure or algorithm → optimal (if needed).
- For small inputs or when a better solution isn’t obvious, brute force is acceptable and sometimes preferred.
1.5 Optimization Strategy
Introduction
Once you have a correct solution—often brute force—the next step is to ask: where is the time or space being wasted? Optimization isn’t random cleverness; it’s a systematic way to find bottlenecks and remove them. This section gives you a repeatable optimization strategy so you can move from “it works” to “it scales.”
Why Optimize Systematically?
Without a strategy, you might optimize the wrong part (e.g., micro-tune a loop that’s already O(n) while the real cost is an O(n²) nested loop elsewhere). With a strategy, you measure or reason about where the cost is, then attack that first. One bottleneck fixed often yields a bigger win than many small tweaks.
Mental Model
Think of your program as a pipeline: some steps run once, some run inside loops. The total time is dominated by what runs most often or on the largest data. Optimization means: find that hot spot, then either do less work there or do the same work fewer times.
Step-by-Step Optimization Strategy
Step 1: Establish Correctness First
Never optimize broken code. Get a solution that passes your tests and handles edge cases. Optimization changes code; if the baseline is wrong, you’ll either hide bugs or optimize the wrong behavior.
Step 2: Identify the Bottleneck
Ask: “What is the slow part?”
- By counting: Look at loops. A single loop over n → O(n). Two nested loops over n → O(n²). Three nested → O(n³). The deepest or most repeated structure usually dominates.
- By profiling (when you can run code): Use a profiler to see where CPU time is spent. Focus on the top few functions or lines.
In interviews you usually reason by counting. In production, profiling confirms your guess.
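Reasoning by counting can be checked with a tiny experiment. The sketch below counts loop iterations rather than measuring time, so the numbers are exact and repeatable. The function names are illustrative.

```python
def count_pair_checks(n):
    """Count inner-loop iterations of the check-every-pair pattern."""
    checks = 0
    for i in range(n):
        for j in range(i + 1, n):
            checks += 1
    return checks


def count_single_pass(n):
    """Count iterations of a single loop over n elements."""
    steps = 0
    for _ in range(n):
        steps += 1
    return steps


for n in (10, 100, 1000):
    print(n, count_single_pass(n), count_pair_checks(n))
# 10 10 45
# 100 100 4950
# 1000 1000 499500
```

The nested version does n(n − 1)/2 checks, which grows quadratically: a 10× larger input costs roughly 100× more work. For real programs, Python's standard-library `cProfile` module is one way to get measured timings per function instead of hand counts.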
Step 3: Ask “What Am I Doing Redundantly?”
Often the bottleneck is repeated work. Examples:
- Repeated search: For each element, scanning the rest of the array to find a match → O(n²). Replace the inner scan with a hash lookup → O(n).
- Repeated computation: Computing the same sum or max over and over in a loop. Compute once, reuse (e.g., prefix sum, sliding window).
- Repeated traversal: Walking the whole list to find one item, many times. One pass with a hash or one sort + linear pass can often replace many passes.
Most big wins come from eliminating a loop (by a better data structure) or from reusing work (caching, prefix sums, invariants). Look for “do I need to do this every time, or can I do it once and reuse?”
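Here is a small sketch of “do it once and reuse,” using a sliding window. The example problem (largest sum of k consecutive elements) and the function names are assumptions for illustration; both versions assume 1 ≤ k ≤ len(arr).

```python
def max_window_sum_naive(arr, k):
    """Recompute each window's sum from scratch: O(n * k)."""
    return max(sum(arr[i:i + k]) for i in range(len(arr) - k + 1))


def max_window_sum_sliding(arr, k):
    """Reuse the previous window's sum: O(n)."""
    window = sum(arr[:k])  # first window, computed once
    best = window
    for i in range(k, len(arr)):
        # Slide by one: add the entering element, drop the leaving one.
        window += arr[i] - arr[i - k]
        best = max(best, window)
    return best


data = [1, 4, 2, 10, 23, 3, 1, 0, 20]
print(max_window_sum_naive(data, 4), max_window_sum_sliding(data, 4))  # 39 39
```

Adjacent windows share k − 1 elements, so re-adding them each time is the redundant work. The sliding version keeps the running sum and adjusts it in O(1) per step.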
Step 4: Consider the Right Data Structure
The wrong structure forces extra work. Examples:
- “Check if this value exists” in a list → O(n) per check. In a set → O(1) average. Replace list membership with a set when you need many lookups.
- “Get the smallest element” in a list → O(n). In a min-heap → O(1) to peek at the minimum and O(log n) to remove it (extract-min). Use a heap when you need repeated min/max.
- “Insert at front” in an array → O(n) shift. In a linked list → O(1). Choose the structure that makes your dominant operation cheap.
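The repeated-minimum case can be sketched with Python's standard `heapq` module. The task values are made-up sample data; the point is that both approaches produce the same order but at different cost per removal.

```python
import heapq

tasks = [5, 1, 8, 3, 2]

# List approach: every "remove the smallest" is an O(n) scan plus an O(n) removal.
remaining = list(tasks)
from_list = []
while remaining:
    smallest = min(remaining)
    remaining.remove(smallest)
    from_list.append(smallest)

# Heap approach: O(n) to build the heap, then O(log n) per extract-min.
heap = list(tasks)
heapq.heapify(heap)
from_heap = []
while heap:
    from_heap.append(heapq.heappop(heap))

print(from_list, from_heap)  # both: [1, 2, 3, 5, 8]
```

Draining all n items costs O(n²) with the list and O(n log n) with the heap, which matters once n is large or removals are frequent.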
Step 5: Trade Space for Time (When Appropriate)
Often you can use extra memory to avoid recomputation: hash maps, prefix arrays, caching. If the problem allows O(n) extra space and you can turn O(n²) into O(n), that’s usually a good trade. Don’t over-optimize space when time is the constraint.
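Caching is perhaps the simplest form of this trade. The sketch below uses Fibonacci numbers purely as a convenient example of repeated subproblems (the choice of problem is an assumption for illustration) and the standard-library `functools.lru_cache` decorator to store results.

```python
from functools import lru_cache


def fib_plain(n):
    """Recomputes the same subproblems many times: exponential time, no cache."""
    return n if n < 2 else fib_plain(n - 1) + fib_plain(n - 2)


@lru_cache(maxsize=None)
def fib_cached(n):
    """Stores each result once: O(n) time in exchange for O(n) cached values."""
    return n if n < 2 else fib_cached(n - 1) + fib_cached(n - 2)


print(fib_plain(20), fib_cached(20))  # 6765 6765
```

The two functions compute the same values. The cached one spends memory proportional to n to avoid redoing work, which is the same idea behind the hash map in Two Sum.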
Step 6: Re-evaluate After a Change
After each optimization, confirm correctness and re-check the bottleneck. Sometimes the bottleneck moves (e.g., I/O or a different loop). Stop when you meet the required performance or when further optimization isn’t worth the complexity.
CORRECT solution → FIND bottleneck (loops, repeated work) → REMOVE redundancy
(data structure / cache / one pass) → VERIFY still correct → REPEAT if needed
Common Optimization Patterns
- Hash for lookups: Replace “search for X in the rest of the array” with “is X in this set?” → often O(n²) to O(n).
- Two pointers / sliding window: Replace “for every start, for every end” with one pass where start and end move in one direction → O(n²) to O(n).
- Sort first: Sorted data enables binary search or two-pointer scans. Sort is O(n log n); if that’s cheaper than repeated linear scans, sort once and reuse.
- Prefix sum: Replace “sum from i to j” computed in a loop with precomputed prefix array → O(1) per query after O(n) setup.
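The prefix sum pattern from the list above can be sketched as follows. The helper names `build_prefix` and `range_sum` are illustrative, and indices are 0-based with an inclusive range.

```python
def build_prefix(arr):
    """prefix[i] holds the sum of arr[0:i], so prefix has len(arr) + 1 entries."""
    prefix = [0]
    for x in arr:
        prefix.append(prefix[-1] + x)
    return prefix


def range_sum(prefix, i, j):
    """Sum of arr[i..j] inclusive, answered in O(1) from the prefix array."""
    return prefix[j + 1] - prefix[i]


arr = [2, 4, 1, 7, 3]
prefix = build_prefix(arr)      # [0, 2, 6, 7, 14, 17]
print(range_sum(prefix, 1, 3))  # 4 + 1 + 7 = 12
```

The leading 0 in the prefix array avoids a special case when the range starts at index 0.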
When to Stop Optimizing
Optimize until:
- You meet the time/space constraints (e.g., problem says n ≤ 10^5 and your solution is O(n log n)), or
- Further optimization would make the code much harder to read or maintain without a clear need.
Don’t optimize for the sake of it. Correct and clear first; then fast enough.
State your brute force and its complexity, then say: “The bottleneck is the inner loop / repeated lookup. I can remove it by using a hash map / two pointers / …” That shows you think in bottlenecks and know how to optimize systematically.
Summary
- Optimize only after you have a correct solution.
- Find the bottleneck (nested loops, repeated work); attack that first.
- Look for redundancy: repeated search → hash; repeated range computation → prefix sum or sliding window.
- Choose the right data structure so the dominant operation is cheap.
- Trade space for time when it removes a bottleneck.
1.6 Writing Pseudocode
Introduction
Pseudocode is a compact, language-agnostic way to describe an algorithm using a mix of plain English and simple control structures (loops, conditionals). It’s the bridge between “I know what to do” and “here’s the code.” Writing pseudocode first helps you get the logic right before you worry about syntax, and it’s how you’re expected to communicate your approach in interviews.
Why Write Pseudocode?
- Clarify logic: You focus on steps and order, not semicolons or types. Mistakes in logic show up before you write a single line of code.
- Communicate: In interviews, pseudocode lets you explain your algorithm quickly. Interviewers can follow it even if they don’t use your language.
- Plan: It’s an outline. You can refine it (e.g., “here I need a loop”) and only then translate to real code.
What Pseudocode Looks Like
There’s no single standard. The goal is: readable and unambiguous. Use:
- Indentation for blocks (like Python).
- Keywords such as if / else / for / while / return.
- Short names for variables and data (e.g., arr, n, result).
- Plain English for high-level steps when that’s clearer than fake code.
Avoid: language-specific syntax (e.g., list comprehensions or pointer arithmetic) unless they make the idea obvious. Prefer clarity over looking like real code.
Conventions We’ll Use
- Assignment: x = value or result ← value.
- Indexing: arr[i] for the i-th element (0-based unless stated).
- Length: length(arr) or n when we’ve set n = length(arr).
- Loops: for i from 0 to n-1 or for each element x in arr.
- Return: return result.
Example: Two Sum in Pseudocode
Input: Array arr of integers, integer target.
Output: Indices i, j such that arr[i] + arr[j] == target, or “none” if no such pair.
function twoSum(arr, target):
    n = length(arr)
    for i from 0 to n-1:
        for j from i+1 to n-1:
            if arr[i] + arr[j] == target:
                return [i, j]
    return "none"
This is brute force. Now the optimized version, using a hash map:
function twoSum(arr, target):
    seen = empty map  // value -> index
    for i from 0 to length(arr)-1:
        need = target - arr[i]
        if need is in seen:
            return [seen[need], i]
        put (arr[i], i) in seen
    return "none"
Same problem; the second version makes the “lookup complement” step explicit and shows we only need one pass.
How Much Detail?
Enough that someone could implement it without guessing. Include:
- Loop bounds and what the loop variable means.
- Conditions for if/else and what you return in each case.
- Where you update state (e.g., “add current element to set”).
You can omit: exact type names, error handling, or trivial details (“increment i”).
In an interview, write 5–15 lines of pseudocode before coding. It keeps you on track and gives the interviewer a clear picture of your algorithm. If you get stuck coding, the pseudocode is your roadmap.
Common mistake: Writing pseudocode that’s so vague (“do something with the array”) that it doesn’t constrain the implementation, or writing full code as pseudocode, so that small syntax errors distract from the logic. Aim for the middle: clear structure and key steps, not every semicolon.
Summary
- Pseudocode = language-agnostic description of an algorithm using simple control flow and names.
- Use it to clarify logic, communicate in interviews, and plan before coding.
- Keep it readable: indentation, clear loops and conditions, enough detail to implement without guessing.
1.7 Debugging Techniques
Introduction
Debugging is the process of finding and fixing the cause of incorrect behavior—wrong output, crash, or infinite loop. It’s a core skill: even correct algorithms get implemented with off-by-one errors or missed edge cases. This section gives you a systematic way to narrow down where the bug is and fix it quickly.
Mindset
Assume the bug is in your code or assumptions, not the compiler or the problem. Form a hypothesis (“maybe the loop starts at the wrong index”), test it, and adjust. Random changes rarely help; targeted checks do.
Step-by-Step Debugging Process
Step 1: Reproduce the Failure
Get a concrete input that produces the wrong result (or crash). If you can’t reproduce it, you can’t fix it reliably. Use the problem’s examples first; then try small inputs you design (edge cases: empty, one element, two elements).
Step 2: State the Expected vs Actual
Write down: “For input X, I expected Y, but I got Z.” That makes the bug unambiguous. Sometimes in writing this you already spot the mistake (e.g., “I expected the first index to be 0 but I’m returning 1”).
Step 3: Narrow the Location
Find where the program first goes wrong. Methods:
- Print / log: Print key variables at the start of the loop, after conditionals, and before return. Check: “At step 2, is i what I think it is? Is seen correct?”
- Binary search: Comment out the second half of the logic (or test with half the input). If the bug disappears, it’s in the part you removed. Narrow until you’re down to a few lines.
- Rubber duck: Explain the code line by line to yourself or someone else. Often you hear yourself say something wrong (“and here we add 1 … wait, we shouldn’t”).
Step 4: Find the Root Cause
Don’t stop at “it’s in this function.” Identify the exact wrong assumption or wrong operation. Examples: “I used <= but the problem says strictly less,” “I’m updating the index after the loop so it’s off by one,” “I’m not handling empty input.” Fix that assumption or line.
Step 5: Fix and Re-test
Make the minimal change that fixes the bug. Re-run the failing case and the examples you had passing. Add a test for the edge case you missed so it doesn’t come back.
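The five steps can be compressed into a small harness, sketched below. The function `count_less_than`, the helper `run_cases`, and the test cases are all hypothetical, invented here only to illustrate the "expected vs actual" report and the habit of keeping the failing case as a regression test.

```python
def count_less_than(arr, limit):
    """Count elements strictly less than limit (the fixed version)."""
    count = 0
    for x in arr:
        if x < limit:  # the hypothetical bug was `<=` on this line
            count += 1
    return count


def run_cases(fn, cases):
    """Run fn on each (args, expected) pair and describe every mismatch."""
    failures = []
    for args, expected in cases:
        actual = fn(*args)
        if actual != expected:
            failures.append(f"For input {args}, expected {expected}, got {actual}")
    return failures


cases = [
    (([1, 2, 3], 3), 2),  # the original failing case, kept as a regression test
    (([], 5), 0),         # edge: empty input
    (([5], 5), 0),        # edge: single element equal to the limit
]
failures = run_cases(count_less_than, cases)
print(failures or "all cases pass")
```

Running the same `cases` against the buggy `<=` variant would list the first and third cases as failures, which is exactly the "For input X, I expected Y, but I got Z" statement from Step 2.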
Techniques in Practice
Print Debugging
Insert print(...) (or logs) at critical points: loop start (loop variable, key state), after branches, before return. Inspect the output. Remove or comment out prints when done.
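# Example: debugging a loop (hash-map two sum)
def two_sum(arr, target):
    seen = {}  # value -> index where it was seen
    for i in range(len(arr)):
        print("i:", i, "arr[i]:", arr[i], "need:", target - arr[i])  # temporary
        if (target - arr[i]) in seen:
            return [seen[target - arr[i]], i]
        seen[arr[i]] = i
    return None  # no pair found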
Use a Debugger
When you can run the code locally, use a debugger: set breakpoints, step line by line, inspect variables. You see state at each step without adding prints. Essential for larger programs.
Check Edge Cases Explicitly
Many bugs are at boundaries: empty input, one element, first/last index. Add a small test for these. If the code fails on “empty list,” the fix is usually at the start of the function (early return or different initialization).
Compare With a Working Example
Trace through your code by hand on the same input that fails. Write down the value of each variable at each step. Where does your trace first differ from what the code actually does? That’s near the bug.
Common Bug Categories
- Off-by-one: Loop runs one too many or too few times; index is 0-based vs 1-based. Check loop bounds and indexing.
- Wrong condition: Used < instead of <=, or forgot to handle equality. Re-read the problem and your condition.
- State not updated: Forgot to add an element to a set or update a variable inside a loop. Ensure every path that should update state does.
- Edge case: Empty input, single element, or “no answer” case not handled. Add explicit checks.
A common mistake is changing code randomly without a hypothesis, or without re-running the same failing case. Always reproduce, locate, then fix. One fix at a time.
When your code fails a test case, say: “Let me trace through with this input.” Walk through your logic step by step and state expected values. You’ll often spot the bug while explaining. That’s more impressive than silent trial-and-error.
Summary
- Reproduce the failure with a concrete input; state expected vs actual.
- Narrow the bug location (prints, binary search, rubber duck).
- Find the root cause (wrong condition, off-by-one, missing edge case), fix it, then re-test.
- Watch for off-by-one, wrong conditions, missing state updates, and unhandled edge cases.
1.8 Handling Edge Cases
Introduction
An edge case is an input or situation at the “boundary” of what your solution is designed for: empty input, a single element, maximum size, or a “no valid answer” scenario. Code that works on “normal” examples often fails on edge cases. Handling them explicitly is what separates a quick hack from a robust solution—and what interviewers look for.
Why Edge Cases Matter
In production, edge cases are where systems break: empty list causing a crash, or one user when the logic assumed many. In interviews, test cases often include edge cases on purpose. If you don’t handle them, your solution is incomplete. Thinking about edge cases up front also improves your algorithm design: you clarify the problem and avoid bugs before they happen.
What Counts as an Edge Case?
Depends on the problem. Common categories:
Size Boundaries
- Empty: Empty array, empty string, zero elements. Many algorithms assume “at least one”; they crash or return nonsense on empty.
- Single element: One item in the list, one character in the string. Loops that assume “first and rest” or “pair” can break.
- Two elements: Minimal “non-trivial” case. Good for testing “first and last” or “pair” logic.
- Large input: Maximum n. Tests performance and overflow (e.g., integer or recursion depth).
Value Boundaries
- Zero, negative, or maximum values: Division by zero, negative indices, or values at the limit of the type.
- Duplicates: All elements equal, or many duplicates. “Find two that sum to target” with many repeated numbers can affect indexing or “same index” bugs.
- No valid answer: Problem says “return -1 if not found” or “return empty list.” Your code must explicitly handle and return that.
Structural / Problem-Specific
- Already sorted / reverse sorted: For sorting or search, these can stress your algorithm.
- All same: All zeros, all same character. Can break “find the different one” or “max” logic if you’re not careful.
How to Handle Edge Cases
1. List Them Before Coding
After reading the problem, write down 3–5 edge cases: “What if empty? What if one element? What if no pair exists?” Then ensure your algorithm and code account for each. In an interview, say them out loud: “I’ll need to handle empty input and the case when no two numbers sum to target.”
2. Early Returns / Guards
At the start of the function, check for edge cases and return a defined result. That keeps the main logic simple and avoids special cases inside loops.
def find_max(arr):
    if not arr:  # edge: empty
        return None
    if len(arr) == 1:  # edge: single element (optional, main loop handles it)
        return arr[0]
    # main logic
    result = arr[0]
    for i in range(1, len(arr)):
        if arr[i] > result:
            result = arr[i]
    return result
3. Design the Main Logic to Naturally Handle Boundaries
Sometimes the “normal” logic already works for one element or two (e.g., a loop that runs from 0 to n-1 and updates a result). Other times you need a special case. Prefer one clear path when possible; use early returns when the edge is truly different.
4. Test Edge Cases Explicitly
Before submitting or shipping, run: empty input, one element, two elements, no answer, all same. If any fails, fix the code and add that case to your mental (or automated) test list.
Example: Two Sum Edge Cases
- Empty array: Return “no pair” (e.g., [-1, -1] or empty list) without entering the loop.
- Single element: Can’t form a pair; return “no pair.”
- No pair sums to target: After the loop, return the agreed “not found” value.
- Duplicate values: Ensure you don’t use the same index twice (e.g., i and j must be different). Your algorithm should already enforce that (e.g., j > i or storing index when we see a value).
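The four cases above can be covered by one hash-map pass, sketched here. The name `two_sum` and the `[-1, -1]` "not found" value are assumptions for illustration; a real problem statement defines its own contract.

```python
def two_sum(nums, target):
    """Return [i, j] with i < j and nums[i] + nums[j] == target.

    Returns [-1, -1] when no pair exists (assumed "not found" contract).
    """
    seen = {}  # value -> earliest index at which it appeared
    for j, value in enumerate(nums):
        need = target - value
        if need in seen:
            # seen only holds indices smaller than j, so i != j is guaranteed
            return [seen[need], j]
        if value not in seen:
            seen[value] = j
    # Reached for empty input, a single element, or when no pair sums to target
    return [-1, -1]


print(two_sum([2, 7, 11, 15], 9))
print(two_sum([3, 3], 6))
print(two_sum([], 5))
```

Because a value is stored only after it has been checked, the duplicate case `[3, 3]` pairs index 0 with index 1 and a lone `[3]` never pairs with itself.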
Edge cases are not “extra”—they define the contract of your function. The problem statement usually says what to return when input is empty or when there’s no solution; your code must implement that contract.
A common mistake is assuming “input will always have at least two elements” or “there will always be an answer” without checking the problem. If the problem doesn’t guarantee it, handle the opposite case explicitly.
After reading the problem, say: “I’ll consider edge cases: empty input, single element, and when no valid answer exists.” Then add the checks. Interviewers notice when you proactively handle boundaries.
Summary
- Edge cases = boundary inputs or situations: empty, single element, no answer, duplicates, max size, etc.
- List them before coding; use early returns or design main logic so boundaries are handled.
- Test edge cases explicitly; they define the contract of your solution.
1.9 Problem Decomposition
Introduction
Problem decomposition is the skill of breaking a large, complex problem into smaller subproblems that are easier to solve. You solve the small pieces (often recursively or in order), then combine their results to get the full solution. It’s the same idea behind “divide and conquer,” dynamic programming, and clean system design—and it’s how you avoid feeling overwhelmed by a hard problem.
Why Decompose?
- Manageable pieces: A subproblem like “find the max in the left half” is easier to think about than “find the max in the whole array” when you’re building recursion or iteration.
- Reuse: The same subproblem often appears many times (e.g., “sort this range”). Solve it once and reuse—that’s the heart of dynamic programming and recursion.
- Clear structure: Dependencies between subproblems (e.g., “solve A and B before C”) define the order of computation and help you avoid circular or missing steps.
Mental Model
Picture the problem as a tree: the root is the original problem; children are subproblems. You solve leaves first (base cases), then combine upward until you reach the root. The art is choosing a split that makes the subproblems simpler and the combination step cheap.
                Original: "Sort array A[0..n-1]"
                               |
              +----------------+----------------+
              v                                 v
Subproblem 1: Sort A[0..mid-1]    Subproblem 2: Sort A[mid..n-1]
              |                                 |
              v                                 v
   (base: 1 element, done)           (base: 1 element, done)
              |                                 |
              +----------------+----------------+
                               v
         COMBINE: merge two sorted halves -> sorted A
How to Decompose
Step 1: Identify Natural Subproblems
Ask: “What smaller version of this problem would help?” Examples: “max of array” → “max of left half” and “max of right half,” then take the larger. “Count ways to reach step n” → “ways to reach n-1” and “ways to reach n-2,” then add. The subproblems should have the same structure as the original (same kind of input/output, smaller size or simpler case).
Step 2: Define the Dependency
Does subproblem A need the result of B? Then B must be solved first. In recursion, that’s “call B, then use its result in A.” In dynamic programming, that’s “fill table in an order so that when we compute A, B is already computed.”
Step 3: Identify Base Cases
What’s the smallest or simplest case you can solve directly? Empty array, single element, zero steps—these stop the recursion or form the first row/column of your DP table. Without clear base cases, decomposition doesn’t terminate.
Step 4: Define How to Combine
Given solutions to the subproblems, how do you get the solution to the bigger problem? Merge two sorted halves, take max of two values, add counts, etc. The combine step should be simpler than solving the whole problem from scratch.
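As a minimal sketch of the four steps, here is the “max of array” decomposition from Step 1 written out. The function names `max_in_range` and `find_max` are illustrative choices, and returning `None` for an empty array is an assumed contract.

```python
def max_in_range(arr, lo, hi):
    """Maximum of arr[lo:hi] (hi exclusive). Assumes lo < hi."""
    if hi - lo == 1:                         # base case: one element
        return arr[lo]
    mid = (lo + hi) // 2
    left_max = max_in_range(arr, lo, mid)    # subproblem 1: left half
    right_max = max_in_range(arr, mid, hi)   # subproblem 2: right half
    # combine: take the larger of the two results
    return left_max if left_max >= right_max else right_max


def find_max(arr):
    if not arr:  # assumed contract: empty input has no maximum
        return None
    return max_in_range(arr, 0, len(arr))


print(find_max([3, 9, 2, 7]))
```

A plain loop is simpler for this particular problem; the recursive form is shown only because it makes the subproblems, base case, and combine step visible.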
Example: Counting Inversions
Problem: Count pairs (i, j) with i < j and arr[i] > arr[j].
Decomposition: Split array into left and right halves. Count inversions in left, count in right, and count inversions that cross the middle (one in left, one in right). The cross count can be computed during a merge-like pass: when we take an element from the right and there are remaining elements in the left, each of those left elements forms an inversion with this right element. So we get:
- Subproblem 1: count inversions in left half (same problem, smaller n).
- Subproblem 2: count inversions in right half.
- Combine: add left count + right count + cross count (computed in O(n) during merge).
Base case: 0 or 1 element → 0 inversions. This is merge-sort with an extra counter—decomposition turned a hard count into “solve two halves + merge.”
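The decomposition above might be sketched as follows. The function name `count_inversions` and the choice to return both the sorted copy and the count are assumptions made for this illustration.

```python
def count_inversions(arr):
    """Return (sorted_copy, number_of_inversions) for arr."""
    if len(arr) <= 1:                    # base case: 0 or 1 element
        return list(arr), 0
    mid = len(arr) // 2
    left, left_count = count_inversions(arr[:mid])    # subproblem 1
    right, right_count = count_inversions(arr[mid:])  # subproblem 2

    merged = []
    cross = 0
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            # right[j] is smaller than every remaining left element,
            # so each of those forms one cross inversion with it
            cross += len(left) - i
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, left_count + right_count + cross


print(count_inversions([2, 4, 1, 3, 5]))
```

Using `<=` in the comparison keeps equal elements from being counted as inversions, matching the strict `arr[i] > arr[j]` in the problem statement.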
Top-Down vs Bottom-Up
- Top-down (recursion / memoization): Start from the full problem, recurse to subproblems, combine on the way back. Natural to think about; you must handle overlapping subproblems (memoize) to avoid exponential time.
- Bottom-up (tabulation): Solve smallest subproblems first, then larger ones, in order. No recursion; good when dependency order is clear (e.g., “solve for size 1, then 2, then … n”).
Same decomposition; different order of evaluation. Choose based on what’s easier to implement and what fits the problem.
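To make the contrast concrete, here is a sketch of both styles on the “count ways to reach step n” example, assuming each move climbs 1 or 2 steps and that there is exactly one way to stand at step 0. The function names are illustrative.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def ways_top_down(n):
    """Top-down: recurse from n, memoizing overlapping subproblems."""
    if n <= 1:  # base cases: step 0 and step 1 each have one way
        return 1
    return ways_top_down(n - 1) + ways_top_down(n - 2)


def ways_bottom_up(n):
    """Bottom-up: build from the smallest subproblems toward n."""
    if n <= 1:
        return 1
    prev, curr = 1, 1  # ways to reach step 0 and step 1
    for _ in range(2, n + 1):
        prev, curr = curr, prev + curr
    return curr


print(ways_top_down(10), ways_bottom_up(10))
```

Both functions use the same recurrence; only the order of evaluation differs. The bottom-up version also needs just two variables because each step depends only on the previous two.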
When stuck on a problem, ask: “If I had the answer for a smaller input (n-1, n/2, or a subset), could I use it to get the answer for the full input?” If yes, you’ve found a decomposition.
Say: “I can break this into …” and name the subproblems and how you’ll combine. That shows structured thinking. Then implement the base case and the combine step; the recursion or loop over subproblems often falls into place.
Summary
- Problem decomposition = break a big problem into smaller subproblems, solve them, then combine.
- Identify subproblems, their dependencies, base cases, and the combine step.
- Same idea underlies recursion, divide-and-conquer, and dynamic programming.
1.10 Pattern Recognition in Problems
Introduction
Many problems that look different on the surface share the same underlying pattern: same kind of input, same kind of operation, same data structure or technique that fits. Once you recognize “this is a two-pointer problem” or “this is a BFS shortest-path,” you can apply a known strategy instead of inventing from scratch. Pattern recognition is what turns experience into speed and confidence.
Why Patterns Matter
Interview and contest problems are often designed to test a specific technique. The problem statement might not say “use a hash map”—you’re expected to see that “find two things that sum to target” or “check if we’ve seen this before” suggests a hash map. The more patterns you know, the faster you match the problem to the right tool and the less you waste time on the wrong approach.
Mental Model
Read the problem → notice structure (sorted? pairs? subsequences? paths?) and operations (find, count, maximize, exists?) → map to a pattern → apply the standard strategy for that pattern. Pattern recognition is the step between “what’s being asked” and “which algorithm/structure to use.”
Common Patterns (High-Level)
You’ll see these again in later sections. Here we name them so you can start building the mapping.
Two Pointers
Two indices moving over a sequence (often from both ends or both from the start). Good for: “two numbers that sum to target” in a sorted array, “remove duplicates in place,” “palindrome check,” “merge two sorted arrays.” Clue: sorted array or string, “pair,” “two indices.”
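A minimal sketch of the pattern for “two numbers that sum to target” in a sorted array follows. The function name and the `None` return for “no pair” are assumptions for illustration.

```python
def pair_with_sum_sorted(nums, target):
    """Return indices (lo, hi) of a pair summing to target, or None.

    Assumes nums is sorted in non-decreasing order.
    """
    lo, hi = 0, len(nums) - 1
    while lo < hi:
        current = nums[lo] + nums[hi]
        if current == target:
            return (lo, hi)
        if current < target:
            lo += 1   # sum too small: move the left pointer right
        else:
            hi -= 1   # sum too large: move the right pointer left
    return None


print(pair_with_sum_sorted([1, 2, 4, 7, 11], 9))
```

Each step discards one index for good, so the loop runs at most n - 1 times and uses no extra memory.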
Sliding Window
A contiguous segment of fixed or variable size that moves one step at a time. Good for: “longest subarray with sum ≤ K,” “minimum window containing all characters,” “max in every window of size k.” Clue: “subarray,” “substring,” “contiguous,” “window.”
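As one possible sketch of a variable-size window, here is “longest subarray with sum ≤ K”. It assumes every number is non-negative, which is what makes shrinking from the left valid; the function name is illustrative.

```python
def longest_subarray_sum_at_most(nums, k):
    """Length of the longest contiguous subarray with sum <= k.

    Assumes all values in nums are non-negative.
    """
    best = 0
    window_sum = 0
    left = 0
    for right, value in enumerate(nums):
        window_sum += value               # grow the window to the right
        while window_sum > k and left <= right:
            window_sum -= nums[left]      # shrink from the left until valid
            left += 1
        best = max(best, right - left + 1)
    return best


print(longest_subarray_sum_at_most([1, 2, 1, 0, 1, 1, 0], 4))
```

Both pointers only move forward, so the total work is linear even though there is a nested loop.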
Hash Map / Set for Lookup
Store what you’ve seen; check membership or fetch in O(1). Good for: “two sum,” “first duplicate,” “group anagrams,” “subarray with sum 0.” Clue: “find a pair,” “have we seen this,” “count distinct,” “group by.”
Prefix Sum
Precompute cumulative sums (or other aggregates) so range queries are O(1). Good for: “subarray sum equals K,” “range sum queries.” Clue: “sum of range,” “subarray sum,” multiple range queries.
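A short sketch of the idea for range-sum queries is shown below. The helper names `build_prefix` and `range_sum`, and the inclusive `[l, r]` convention, are choices made for this example.

```python
def build_prefix(nums):
    """prefix[i] holds the sum of nums[0:i], so prefix[0] == 0."""
    prefix = [0]
    for x in nums:
        prefix.append(prefix[-1] + x)
    return prefix


def range_sum(prefix, l, r):
    """Sum of nums[l..r], both ends inclusive, in O(1)."""
    return prefix[r + 1] - prefix[l]


nums = [3, 1, 4, 1, 5]
prefix = build_prefix(nums)
print(range_sum(prefix, 1, 3))  # 1 + 4 + 1
```

The extra leading 0 in `prefix` avoids a special case for ranges that start at index 0.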
Binary Search (on Answer or on Index)
When the answer or the “split point” is in a sorted space, binary search can reduce tries. Good for: “find minimum capacity,” “search in sorted array,” “kth smallest.” Clue: sorted data, “minimum maximum,” “feasibility check.”
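For the “search in sorted array” case, one common form is a lower-bound search, sketched below with an illustrative name. It returns the first index whose value is at least the target; the standard library's `bisect.bisect_left` provides the same behavior.

```python
def lower_bound(nums, target):
    """First index i with nums[i] >= target, or len(nums) if none.

    Assumes nums is sorted in non-decreasing order.
    """
    lo, hi = 0, len(nums)  # search the half-open range [lo, hi)
    while lo < hi:
        mid = (lo + hi) // 2
        if nums[mid] < target:
            lo = mid + 1   # answer lies strictly right of mid
        else:
            hi = mid       # mid could be the answer; keep it in range
    return lo


print(lower_bound([1, 3, 5, 7], 5))
```

The same loop shape works for binary search on the answer: replace `nums[mid] < target` with a feasibility check that is false up to some point and true afterward.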
BFS / DFS (Graph or Implicit Graph)
Explore neighbors level by level (BFS) or depth-first (DFS). Good for: shortest path in unweighted graph, connected components, “reachable,” “level order.” Clue: “shortest path,” “neighbors,” “grid,” “level.”
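Here is a sketch of BFS for the shortest path on a grid, treated as an implicit unweighted graph. The grid encoding (a non-empty list of equal-length strings with `#` as a wall), the function name, and the `-1` “unreachable” value are all assumptions for illustration.

```python
from collections import deque


def shortest_path_length(grid, start, goal):
    """Fewest 4-directional moves from start to goal, or -1 if unreachable."""
    rows, cols = len(grid), len(grid[0])
    queue = deque([(start, 0)])  # entries are ((row, col), distance)
    visited = {start}
    while queue:
        (r, c), dist = queue.popleft()
        if (r, c) == goal:
            return dist
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] != "#" and (nr, nc) not in visited:
                visited.add((nr, nc))  # mark on enqueue to avoid duplicates
                queue.append(((nr, nc), dist + 1))
    return -1


grid = ["..#",
        ".#.",
        "..."]
print(shortest_path_length(grid, (0, 0), (2, 2)))
```

Because the queue processes cells in order of distance, the first time the goal is dequeued its distance is the shortest one.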
Dynamic Programming
Optimal substructure + overlapping subproblems. Good for: “maximum sum,” “count ways,” “longest increasing subsequence,” “edit distance.” Clue: “maximum/minimum,” “count,” “choose or skip,” “subsequence.”
How to Get Better at Recognition
- Solve by pattern: After solving a problem, label it (“two pointers,” “sliding window”). Next time you see similar wording or structure, try that pattern first.
- Note keywords: “Subarray” often suggests sliding window or prefix sum; “two numbers” in sorted array suggests two pointers; “shortest path” in unweighted graph suggests BFS.
- Use constraints: Large n with “find pair” often rules out O(n²) brute force and points to hash or two pointers. Small n might allow brute force or bitmask DP.
Patterns are not rigid. One problem might be solvable with two pointers or a hash map. The goal is to narrow the set of strategies quickly so you don’t wander. If one pattern doesn’t fit, try the next best match.
After reading the problem, say: “This looks like a two-pointer / sliding-window / … problem because …” Then outline the standard approach. That signals you’ve seen similar problems and know the template.
Summary
- Pattern recognition = matching problem structure and operations to a known technique (two pointers, hash, sliding window, BFS, DP, etc.).
- Use keywords, constraints, and experience to narrow the strategy.
- Naming the pattern and applying the template speeds you up and impresses interviewers.
1.11 Fermi Estimation for System Design
Introduction
Fermi estimation (named after physicist Enrico Fermi) is the skill of getting an order-of-magnitude answer to a question using rough assumptions and simple arithmetic—no calculator, no exact data. “How many piano tuners are in Chicago?” “How many queries per second does Google handle?” You break the question into a few factors, estimate each to the nearest power of 10 or a round number, multiply or divide, and get a number that’s right within a factor of 10 or so. In system design interviews and in real-world capacity planning, this is how you sanity-check ideas and communicate scale.
Why It Matters
Interviewers ask “how many servers do you need?” or “what’s the storage for 1 billion users?” They don’t expect a precise number—they want to see that you can reason about scale: break the problem down, estimate each piece, and combine. Fermi estimation is that process. In practice, it’s how you quickly check if a design is in the right ballpark before diving into details.
How to Do It
Step 1: Break the Question Into Factors
Turn the big question into a product or quotient of quantities you can guess. Example: “Queries per second for Google search?” → (queries per user per day) × (number of users) ÷ (seconds per day). Each factor is easier to estimate than the whole.
Step 2: Estimate Each Factor
Use round numbers and powers of 10. “Number of users”: there are ~8 billion people on Earth and not all of them search, so 10^9 is a reasonable round figure. “Searches per person per day” might be 1–10; say 5. “Seconds per day” = 24 × 3600 ≈ 10^5. It’s okay to be wrong by 2–3×; we’re aiming for order of magnitude.
Step 3: Multiply or Divide
Do the arithmetic. Prefer mental math: 5 × 10^9 / 10^5 = 5 × 10^4 = 50,000. So “on the order of 10^4 to 10^5 queries per second” is a valid answer. Stating “roughly 50k QPS” or “tens of thousands” is better than “I don’t know” or an overly precise wrong number.
Step 4: Sanity Check
Does the result make sense? If you got 10^9 QPS for Google, that might be too high (global internet traffic is finite). If you got 10 QPS, that’s too low for a global product. Adjust assumptions if the result is obviously off.
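The arithmetic in Steps 1 to 3 can be checked with a few lines of code. The input figures below are the same rough assumptions used in the text (10^9 users, 5 searches per user per day), not measured data.

```python
import math

users = 1e9                     # assumed round figure for number of users
searches_per_user_per_day = 5   # assumed; anywhere from 1 to 10 is plausible
seconds_per_day = 24 * 3600     # 86,400, rounded to 1e5 for mental math

exact_qps = users * searches_per_user_per_day / seconds_per_day
rounded_qps = users * searches_per_user_per_day / 1e5


def order_of_magnitude(x):
    """Power of 10 at or just below x, e.g. 57,870 -> 4."""
    return math.floor(math.log10(x))


print(round(exact_qps), round(rounded_qps))
print(order_of_magnitude(exact_qps), order_of_magnitude(rounded_qps))
```

Rounding 86,400 up to 10^5 shifts the result by about 15%, but both versions land in the tens of thousands, which is all a Fermi estimate is meant to establish.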
Example: How Many Servers for a Video Platform?
Question: Roughly how many servers does a YouTube-scale video platform need?
Breakdown:
- Assume 1 billion daily active users, each watching ~30 min of video per day → 0.5 billion hours of video streamed per day.
- Assume average bitrate ~1 Mbps (mixed quality) → 0.5 × 10^9 hours × 3600 s/hour × 10^6 bits/s ≈ 1.8 × 10^18 bits per day, or ~2 × 10^17 bytes/day. Spread over ~10^5 seconds per day, that is ≈ 2 × 10^13 bits per second (roughly 20 Tbps of average global egress).
- If one server can serve ~10 Gbps (simplified), we need on the order of 2 × 10^13 / 10^10 ≈ 2,000 servers just for average egress. That’s too low for the whole platform because we ignored replication, peaks, storage, etc. So we might say “order of 10^5 to 10^6 machines” when we include redundancy, storage servers, and peak load. The exact number isn’t the point; the reasoning is.
Interviewers care that you: (1) break it down, (2) state assumptions, (3) do the math, (4) sanity check.
Rules of Thumb
- Round boldly: 7 billion → 10^10 (or keep 7 × 10^9 if one significant figure matters); 24 × 3600 → 10^5. Fewer significant figures = faster and often “good enough.”
- State assumptions: “I’m assuming 10 million users …” so the interviewer can correct you or follow your logic.
- One or two significant figures: “About 50k” or “on the order of 10^5” is fine. Avoid fake precision like “47,382.”
“How much storage for 1 billion users with 100 MB each?” → 10^9 × 10^8 bytes = 10^17 bytes = 100 PB. Saying “around 100 petabytes” or “order of 10^17 bytes” is a valid Fermi answer.
When asked for scale or capacity, don’t freeze. Say: “Let me break this down. We need … I’ll assume … So that’s roughly … Does that order of magnitude make sense?” Showing the process matters more than the exact number.
Summary
- Fermi estimation = order-of-magnitude answers using rough assumptions and simple arithmetic.
- Break the question into factors, estimate each (powers of 10, round numbers), multiply/divide, then sanity check.
- State assumptions; aim for one or two significant figures. The reasoning is what interviewers evaluate.
1.12 Invariants & Monovariants in Logical Problems
Introduction
In logical and algorithmic problems, an invariant is something that stays true throughout a process (e.g., before and after each step of a loop or each move in a game). A monovariant (or monotonic variant) is a quantity that only moves in one direction—usually it only decreases (or only increases). Invariants help you prove correctness or narrow down possible states; monovariants help you prove that a process eventually terminates. Together they’re powerful tools for reasoning about algorithms and puzzles.
Why They Matter
In interviews you might get a puzzle or a “prove this loop terminates” question. In competitive programming, some problems are solved by finding an invariant that must hold and then constructing the answer from it. In code, loop invariants are what you use to reason that your algorithm is correct. So even if you don’t hear the words “invariant” and “monovariant” every day, the ideas are central to rigorous thinking.
Invariants
Definition
An invariant is a condition or quantity that remains true (or unchanged) every time a certain operation is applied. You typically state it “at the start of the loop” and “after each iteration”—and show that if it was true before the step, it’s still true after.
Example: Sum of Array Mod 2
Suppose you have a game where in each move you pick two elements and replace them with their sum. The sum of the whole array never changes, because a and b are replaced by a + b. In particular its parity (the sum mod 2, i.e., even or odd) never changes. So if the initial sum is odd, the final single number (if you merge everything) must be odd. That invariant restricts what the answer can be.
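A quick simulation can make the invariant tangible. The sketch below plays the merge game with randomly chosen pairs; the function name `merge_game` and the use of a seeded `random.Random` are choices made for this illustration, and the input is assumed to be non-empty.

```python
import random


def merge_game(values, rng):
    """Repeatedly replace two randomly chosen elements with their sum.

    Returns the single remaining number. Assumes values is non-empty.
    """
    values = list(values)  # work on a copy
    while len(values) > 1:
        i, j = rng.sample(range(len(values)), 2)  # two distinct positions
        merged = values[i] + values[j]
        for idx in sorted((i, j), reverse=True):  # pop the higher index first
            values.pop(idx)
        values.append(merged)
    return values[0]


start = [3, 4, 6, 8]  # sum is 21, which is odd
results = [merge_game(start, random.Random(seed)) for seed in range(5)]
print(results)
```

Whatever order the pairs are merged in, the final number has the same parity as the starting sum, so no sequence of moves can end on an even number here.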
Example: Loop Invariant in Code
In “find max in array,” a useful invariant is: “At the start of each iteration, max_so_far is the maximum of all elements we’ve seen so far.” Before the loop it’s true (we’ve seen only the first element). Each step we compare the next element and update; so after the step we’ve still seen a prefix and max_so_far is still the max of that prefix. When the loop ends we’ve seen the whole array, so max_so_far is the global max. That’s how you prove the loop correct.
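The invariant described above can be written directly into the loop as an assertion. This is a teaching sketch: the function name is illustrative, the input is assumed non-empty, and the `max(arr[:i])` checks make it quadratic, so they would be removed in real code.

```python
def find_max_checked(arr):
    """Find the maximum of a non-empty list, asserting the loop invariant."""
    max_so_far = arr[0]
    for i in range(1, len(arr)):
        # Invariant at the start of each iteration:
        # max_so_far is the maximum of the prefix arr[0..i-1]
        assert max_so_far == max(arr[:i])
        if arr[i] > max_so_far:
            max_so_far = arr[i]
    # After the loop the prefix is the whole array
    assert max_so_far == max(arr)
    return max_so_far


print(find_max_checked([3, 1, 4, 1, 5]))
```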
Monovariants
Definition
A monovariant is a quantity that (under the rules of the process) only increases or only decreases. For example: “the number of inversions in the array” might only decrease when we swap two elements in a certain way. If it’s bounded below (e.g., by 0) and it’s integer, it can only decrease a finite number of times—so the process must stop. That’s a termination argument.
Example: Distance to Goal
In a BFS over a grid, “distance from start” increases as we go to deeper layers. So “distance from start” is a monovariant (monotonically increasing along any path). We don’t use it for termination (BFS stops when we find the goal or exhaust the graph), but we use a similar idea: “steps taken” only increases, and the graph is finite, so we can’t run forever.
Example: Inversion Count
In bubble sort, each swap reduces the total number of inversions by exactly 1. So “inversion count” is a monovariant that decreases with each swap. It’s bounded below by 0. So after a finite number of swaps, we must have 0 inversions—i.e., the array is sorted. That proves bubble sort eventually terminates (and with a sorted array).
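The claim can be checked mechanically. The sketch below runs bubble sort and asserts after every swap that the inversion count dropped by exactly 1; both function names are illustrative, and the quadratic recount inside the loop exists only to verify the monovariant.

```python
def count_inversions_naive(arr):
    """Count pairs (i, j) with i < j and arr[i] > arr[j] in O(n^2)."""
    n = len(arr)
    return sum(1 for i in range(n) for j in range(i + 1, n) if arr[i] > arr[j])


def bubble_sort_with_monovariant(arr):
    """Return (sorted_copy, number_of_swaps), checking the monovariant."""
    arr = list(arr)
    inversions = count_inversions_naive(arr)
    swaps = 0
    swapped = True
    while swapped:
        swapped = False
        for i in range(len(arr) - 1):
            if arr[i] > arr[i + 1]:
                arr[i], arr[i + 1] = arr[i + 1], arr[i]  # adjacent swap
                swaps += 1
                inversions -= 1  # claimed: each swap removes one inversion
                assert inversions == count_inversions_naive(arr)
                swapped = True
    assert inversions == 0  # no inversions left means sorted
    return arr, swaps


print(bubble_sort_with_monovariant([5, 4, 3, 2, 1]))
```

A side effect of the argument is that the number of swaps equals the initial inversion count, which for a reversed list of 5 elements is 10.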
Using Invariants and Monovariants Together
In a puzzle: first find an invariant that limits what’s possible (e.g., “parity of sum”). Then, if the process has steps that change the state, find a monovariant that only decreases (or increases) and is bounded, so the process must stop. The invariant might tell you the only possible final state; the monovariant tells you you’ll get there.
Invariant = “X stays true (or unchanged).” Use it to prove correctness or to restrict possible answers. Monovariant = “Y only decreases (or only increases) and is bounded.” Use it to prove termination.
“Prove that this loop terminates.” Identify a quantity that (1) decreases (or increases) each iteration, and (2) has a lower (or upper) bound. For a loop that halves n each time with integer division, “n” is a monovariant: it strictly decreases while n ≥ 1 and is bounded below by 0, so the loop must stop. Because n halves rather than merely decreasing, the loop runs O(log n) times.
When asked “why does this terminate?” or “what’s always true here?”, name an invariant or monovariant and state why it holds. For example: “The number of inversions decreases with each swap and is non-negative, so we can only do finitely many swaps.”
Summary
- An invariant is a condition or quantity that stays true (or unchanged) through each step; use it to prove correctness or narrow possibilities.
- A monovariant is a quantity that only increases or only decreases and is bounded; use it to prove termination.
- Together they support rigorous reasoning about loops, games, and puzzles.
2.1 Python Data Types
Introduction
Every value in Python has a type: it’s an integer, a string, a list, a dictionary, and so on. The type determines what you can do with the value (index it, add to it, hash it) and how it behaves in memory (mutable vs immutable, shared by reference). For DSA, you need a clear picture of the built-in types so you can pick the right one and avoid subtle bugs—especially around mutability and copying.
Why Data Types Matter for DSA
Different types support different operations at different costs. You use a list when you need order and index access; you use a set or dict when you need fast membership or key lookup. Immutable types (e.g., str, tuple) can be used as dictionary keys and in sets; mutable ones cannot. Knowing types helps you reason about time complexity (e.g., “in” on a list is O(n), on a set is O(1) average) and about correctness (e.g., “did I just mutate a shared list?”).
Numeric Types
int
Integers in Python have arbitrary precision: they can be as large as memory allows. There’s no fixed 32- or 64-bit overflow like in C or Java (though operations get slower for huge numbers). For DSA, this means you usually don’t worry about integer overflow in pure Python; in contests or when interfacing with other systems, modulo arithmetic might still be required by the problem.
x = 42
y = 10**100 # valid in Python
print(type(x)) # <class 'int'>
float
Floating-point numbers are approximate. Avoid using == for equality; use tolerance checks (e.g., abs(a - b) < 1e-9) or integer arithmetic when possible. For DSA, many problems use only integers; when floats appear, be aware of precision issues.
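A small sketch of the tolerance check, using the classic 0.1 + 0.2 case. The tolerance `1e-9` is the same illustrative value as in the text; the right tolerance depends on the problem.

```python
import math

a = 0.1 + 0.2
b = 0.3

exact_equal = (a == b)                        # False: binary rounding error
close_abs = abs(a - b) < 1e-9                 # True: absolute tolerance
close_rel = math.isclose(a, b, rel_tol=1e-9)  # True: relative tolerance

print(exact_equal, close_abs, close_rel)
```

`math.isclose` compares relative to the size of the inputs, which tends to behave better than a fixed absolute tolerance when values are very large or very small.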
bool
True and False. They’re a subtype of int (True is 1, False is 0), but for clarity use them as booleans. Used in conditionals and as results of comparisons.
Sequence Types
str (string)
Immutable sequence of Unicode characters. Indexing s[i] is O(1); slicing s[l:r] creates a new string, so it costs O(r − l). Concatenating with + in a loop can take O(n²) total for n characters; prefer ''.join(...) for building strings. For DSA, strings often appear in pattern matching, palindromes, and parsing.
s = "hello"
# s[0] = 'H' # TypeError: str does not support item assignment
t = s.upper() # returns new string; s unchanged
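The join idiom mentioned above can be sketched as follows: collect the pieces in a list, then join once at the end. The function name `build_with_join` and the uppercase transformation are arbitrary choices for the example.

```python
def build_with_join(pieces):
    """Build one string from many pieces with a single join at the end."""
    parts = []
    for piece in pieces:
        parts.append(piece.upper())  # list append is amortized O(1)
    return "".join(parts)            # one pass over all characters


word = build_with_join(["a", "b", "c"])
print(word)

s = "hello"
middle = s[1:4]  # slicing copies the selected characters into a new string
print(middle, s)
```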
list
Mutable, ordered sequence. Append at end is amortized O(1); insert at front is O(n). Indexing and assignment arr[i] = x are O(1). The main workhorse for arrays in DSA. Supports negative indices: arr[-1] is the last element.
arr = [3, 1, 4, 1, 5]
arr.append(9) # O(1) amortized
arr[0] = 10 # O(1)
# arr[10] # IndexError
tuple
Immutable sequence. Same indexing and slicing as list, but you can’t change elements. Use when you need a fixed sequence (e.g., (x, y) coordinates, return multiple values) or when you need a hashable value (e.g., as dict key or set element).
t = (1, 2, 3)
# t[0] = 10 # TypeError
point = (3, 4) # common for (x, y) coordinates
Mapping and Set Types
dict
Key–value mapping. Keys must be hashable (e.g., int, str, or a tuple whose elements are all hashable). Lookup, insert, and delete by key are O(1) average. Essential for “count frequency,” “have we seen this,” “two sum” style problems. Iteration order is insertion order (Python 3.7+).
d = {}
d["a"] = 1
d["b"] = 2
if "a" in d: # O(1) average
print(d["a"])
set
Unordered collection of unique, hashable elements. Add, remove, and “in” are O(1) average. Use when you need fast membership or to remove duplicates. No indexing; iteration order is arbitrary.
s = {1, 2, 3}
s.add(2) # no change; 2 already there
print(len(s)) # 3
None
None is the single value of type NoneType. Used to mean “no value” or “missing.” Functions that don’t explicitly return something return None. In DSA you’ll use it for “not found” or optional results when you don’t want to use -1 or an exception.
Mutability and Why It Matters
Mutable objects (list, dict, set) can be changed in place. When you pass them to a function, the function gets a reference to the same object—so if the function modifies it, the caller sees the change. That’s useful when you want to build or update a structure, but dangerous when you didn’t intend to share: two names pointing to the same list can cause bugs if you assume they’re independent.
Immutable objects (int, float, str, tuple) can’t be changed. “Operations” that look like changes (e.g., s += 'x') create a new object. So you can’t accidentally mutate shared state; also, immutable values are hashable and can go in sets and as dict keys.
Modifying a list while iterating over it can skip elements or behave oddly. If you need to change a list during a loop, iterate over a copy (e.g., for x in list(arr):) or use a while loop and update the index yourself. Same idea: be clear about who “owns” the data and whether you’re sharing or copying.
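Both hazards can be seen in a few lines. The variable and function names below are illustrative; the second part shows the "iterate over a copy" approach described above.

```python
# Aliasing: two names, one list
a = [1, 2, 3]
alias = a            # same object, no copy made
snapshot = list(a)   # shallow copy: a separate list
alias.append(4)
print(a, snapshot)   # a changed through alias; snapshot did not


def remove_evens_in_place(nums):
    """Remove even numbers from nums, mutating the caller's list."""
    for x in list(nums):  # iterate over a copy so removal is safe
        if x % 2 == 0:
            nums.remove(x)  # removes the first occurrence of x


data = [1, 2, 2, 3, 4]
remove_evens_in_place(data)
print(data)
```

When in-place mutation is not required, building a new list with a comprehension such as `[x for x in nums if x % 2 != 0]` is usually simpler and runs in linear time, whereas repeated `remove` calls can be quadratic.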
Type Checking
type(x) returns the type of x. isinstance(x, int) is True if x is an int (or a subclass). For branching on type, isinstance is preferred because it respects inheritance. In DSA code you often assume types from the problem statement; type hints (e.g., def f(arr: list[int]) -> int:) document and can be checked with tools.
Quick Reference: DSA-Oriented
- Need ordered sequence, index access, append: list.
- Need fast “have we seen this” or unique elements: set.
- Need fast lookup by key or count by key: dict.
- Need immutable sequence (e.g., as dict key): tuple.
- Need sequence of characters, often read-only: str.
Summary
- Python’s main types for DSA: int, float, bool, str, list, tuple, dict, set, None.
- Mutability matters: mutable (list, dict, set) can be changed in place and are passed by reference; immutable (int, str, tuple) are hashable and safe to share.
- Choose the type that supports the operations you need at the right cost (e.g., set/dict for O(1) lookup).
2.2 Conditionals & Loops
Introduction
Conditionals let your program choose different paths (if this, do that; else do something else). Loops let you repeat a block of code—over a sequence, a range of indices, or until a condition is false. Together they form the control flow of almost every algorithm. For DSA you must be fluent: clean conditionals for edge cases and branches, and correct loops with the right bounds and iteration style.
Conditionals
if / elif / else
Execute a block only when its condition is true. elif is “else if”; only one branch runs. Indentation defines the block (typically 4 spaces).
if n < 0:
    print("negative")
elif n == 0:
    print("zero")
else:
    print("positive")
Truthiness
Conditions don’t have to be strictly True or False. Values are “truthy” or “falsy”: False, None, 0, 0.0, '', [], (), {}, and set() are falsy; most other values are truthy. So if arr: means “if arr is non-empty”; if not arr: means “if arr is empty.” Use this for clean guards.
if not arr:
    return 0
if key in d:  # key exists in dict
    ...
Comparison and Logical Operators
Comparisons: <, <=, >, >=, ==, !=. Chained comparisons: a < b < c is equivalent to a < b and b < c. Logical: and, or, not. Short-circuit: and stops at first falsy; or stops at first truthy. Use parentheses when it helps readability.
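Chained comparisons and short-circuit evaluation are often combined into guards. The following is a minimal sketch; the function names in_bounds and first_is_positive are assumptions chosen for illustration.

```python
def in_bounds(i, arr):
    # Chained comparison: same as (0 <= i) and (i < len(arr)).
    return 0 <= i < len(arr)

def first_is_positive(arr):
    # Short-circuit: if arr is empty, the left side is falsy and
    # arr[0] is never evaluated, so there is no IndexError.
    return bool(arr) and arr[0] > 0
```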
Loops
for loop
Iterate over an iterable: a sequence (str, list, tuple) or something that yields values (e.g., range).
for x in [1, 2, 3]:
    print(x)  # 1, 2, 3
for i in range(5):  # i = 0, 1, 2, 3, 4
    print(i)
for i in range(2, 10, 2):  # start 2, stop 10 (exclusive), step 2 → 2,4,6,8
    print(i)
range(n) gives 0 to n-1 (n values). range(a, b) gives a to b-1. range(a, b, step) goes from a by step, stopping before b. So “iterate indices 0 to n-1” is for i in range(len(arr)); “iterate indices 0 to n-1 inclusive in reverse” is for i in range(len(arr)-1, -1, -1).
enumerate
When you need both index and value in a loop, use enumerate(iterable, start=0). It yields (index, value) pairs. Avoids manual index management and off-by-one errors.
arr = [10, 20, 30]
for i, x in enumerate(arr):
    print(i, x)  # prints 0 10, then 1 20, then 2 30
for i, x in enumerate(arr, start=1):
    print(i, x)  # prints 1 10, then 2 20, then 3 30
while loop
Repeat while a condition is true. Use when the number of iterations isn’t known in advance (e.g., “while the stack is not empty,” “while n > 0”). Ensure the condition eventually becomes false to avoid infinite loops.
n = 10
while n > 0:
    print(n)
    n -= 1
break and continue
break: exit the innermost loop immediately. Use when you’ve found what you need (e.g., found a pair that sums to target).
continue: skip the rest of the current iteration and go to the next. Use to skip unwanted elements without nesting.
for i in range(n):
    if arr[i] < 0:
        continue  # skip negative
    process(arr[i])
else on Loops
A for or while can have an else block. It runs only if the loop completes without hitting a break. Useful for “search and report if not found.”
for x in arr:
    if x == target:
        print("Found")
        break
else:
    print("Not found")  # runs only if we never broke
Nested Loops and Complexity
A loop inside a loop typically does work proportional to the product of the iteration counts. Two loops over n → O(n²). That’s why we optimize by removing inner loops (e.g., with a hash map) when possible. Be aware of what’s inside the inner loop: if each iteration does O(1), total is O(n²); if the inner loop does O(n), total can be O(n³).
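To make the nested-loop cost concrete, here is a sketch that answers “do two lists share an element?” in two ways. The function names are hypothetical, and the set-based version assumes the elements are hashable.

```python
def has_common_nested(a, b):
    # Two nested loops: up to len(a) * len(b) comparisons.
    for x in a:
        for y in b:
            if x == y:
                return True
    return False

def has_common_set(a, b):
    # Build a set once (O(len(a))), then each lookup is O(1) on average,
    # so the inner loop disappears: O(len(a) + len(b)) overall.
    seen = set(a)
    for y in b:
        if y in seen:
            return True
    return False
```

Both functions return the same answers; only the amount of work differs as the inputs grow.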
Common Patterns in DSA
- Iterate by index: for i in range(len(arr)) when you need to read or write arr[i] or use i in logic.
- Iterate by value: for x in arr when you only need the elements.
- Index and value: for i, x in enumerate(arr).
- Reverse index: for i in range(len(arr)-1, -1, -1) or for x in reversed(arr) (latter doesn’t give index).
Off-by-one with range: range(n) gives 0..n-1, not 0..n. To run from 0 to n inclusive you need range(n+1). Also: modifying a list while iterating with for x in arr can skip or repeat elements; iterate over a copy or use indices and update carefully.
Prefer for i, x in enumerate(arr) over for i in range(len(arr)): x = arr[i] when you need both—it’s clearer and less error-prone. Use while when the stopping condition is “until structure is empty” (e.g., stack) or “until state changes” (e.g., binary search on answer).
Summary
- Conditionals: if/elif/else; use truthiness (if arr:, if not arr:) for clean guards.
- for: over iterables and range; use enumerate for index + value; range(n) is 0..n-1.
- while: when iterations aren’t known in advance; ensure condition eventually becomes false.
- break exits the loop; continue skips to the next iteration; else on a loop runs when no break occurred.
2.3 Functions
Introduction
A function is a named block of code that takes inputs (arguments), does work, and optionally returns a value. Functions let you reuse logic, structure programs into clear steps, and express algorithms in small, testable pieces. For DSA, almost every solution is written as one or more functions—so you need a solid grasp of how to define them, pass data in and out, and avoid common pitfalls (especially with mutable arguments and scope).
Why Functions Matter for DSA
In problems you’ll write a function that takes the input (e.g., an array and a target) and returns the answer. Recursion is “a function that calls itself.” Helper functions keep your main solution readable (e.g., “is_valid,” “merge”). Understanding how arguments are passed (by object reference, which behaves like neither pure pass-by-value nor pass-by-reference), return values, and default arguments helps you avoid bugs and write clean, interview-ready code.
Defining a Function
Use the def keyword, the function name, parentheses with parameters, and a colon. The body is indented. Execution starts at the first line of the body and ends at return or at the end of the block (then the function returns None).
def greet(name):
    return "Hello, " + name
result = greet("Alice")  # result is "Hello, Alice"
Parameters and Arguments
Parameters are the names listed in the def line; arguments are the values you pass when you call the function. In Python, arguments are passed by object reference: the parameter name is bound to the same object the caller passed. For immutable types (int, str, tuple), reassigning the parameter doesn’t affect the caller. For mutable types (list, dict), modifying the object does affect the caller—because it’s the same object.
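The difference between rebinding a parameter and mutating the object it refers to can be sketched as follows. The function and variable names are illustrative assumptions.

```python
def rebind(n, items):
    # Both assignments bind the LOCAL names to new objects;
    # the caller's objects are not touched.
    n = n + 1
    items = items + [99]   # + builds a new list
    return n, items

def mutate(items):
    # append changes the one list object shared with the caller.
    items.append(99)

count = 1
data = [1, 2]
rebind(count, data)
after_rebind = (count, list(data))   # (1, [1, 2]): caller unchanged
mutate(data)
after_mutate = list(data)            # [1, 2, 99]: caller sees the change
```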
Positional and Keyword Arguments
Arguments can be passed by position or by name. f(1, 2) passes 1 and 2 to the first and second parameters. f(a=1, b=2) or f(1, b=2) uses names; keyword arguments must come after positional ones in a call.
def add(a, b):
    return a + b
add(3, 5)  # 8, positional
add(a=3, b=5)  # 8, keyword
add(3, b=5)  # 8, mixed
Default Parameter Values
You can give a parameter a default so callers can omit it. Defaults are evaluated once at function definition time—so avoid using mutable defaults (like def f(arr=[])); that one list is shared across all calls that omit arr. Use def f(arr=None) and if arr is None: arr = [] instead.
def power(x, n=2):
    return x ** n
power(5)  # 25
power(5, 3)  # 125
Mutable default argument: def f(x, arr=[]): arr.append(x); return arr. Every call that omits arr shares the same list. Use arr=None and create a new list inside the function when arr is None.
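A runnable sketch of the shared-default problem and the None-based fix described above. The names collect_shared and collect_fresh are hypothetical.

```python
def collect_shared(x, arr=[]):
    # The default list is created once, when the def statement runs.
    arr.append(x)
    return arr

def collect_fresh(x, arr=None):
    # A new list is created on every call that omits arr.
    if arr is None:
        arr = []
    arr.append(x)
    return arr

first = collect_shared(1)
second = collect_shared(2)   # same list object as first, now [1, 2]

fresh_a = collect_fresh(1)   # [1]
fresh_b = collect_fresh(2)   # [2], independent of fresh_a
```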
Return Values
return exits the function and sends a value back to the caller. You can return one value, multiple values (as a tuple—often unpacked by the caller), or nothing (returns None). Returning multiple values is just return a, b; the caller gets (a, b) and can write x, y = f().
def min_max(arr):
    if not arr:
        return None, None
    return min(arr), max(arr)
lo, hi = min_max([3, 1, 4])  # lo=1, hi=4
Scope
Variables defined inside a function are local to that function; they aren’t visible outside. Variables defined at the top level (module level) are global. Reading a global name inside a function is allowed; assigning to it (e.g., count = 0) creates a new local name unless you declare global count. For DSA, prefer passing values in and returning them—avoid global state so your functions are easy to test and reason about.
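The scope rules above can be sketched with a module-level name and three small functions. All names here are assumptions for illustration; count_evens shows the preferred style with no global state.

```python
count = 0  # module-level (global) name

def bump_local():
    count = 1  # assignment creates a new local name; the global is untouched
    return count

def bump_global():
    global count  # opt in to rebinding the module-level name
    count += 1
    return count

def count_evens(arr):
    # Preferred style: take input, return output, touch no global state.
    return sum(1 for x in arr if x % 2 == 0)

local_result = bump_local()     # 1
after_local = count             # still 0
global_result = bump_global()   # 1
after_global = count            # now 1
```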
Mutability and Side Effects
If you pass a list and the function appends to it or changes elements, the caller’s list is modified. That’s useful when you intentionally want to build or update a structure in place (e.g., “fill this list with results”). It’s a bug when the caller didn’t expect the input to change. When in doubt, don’t mutate the input unless the problem or API says so; if you need to return a modified copy, build a new list/dict and return it.
In interviews, state your contract: “This function takes the array and target and returns the two indices; it does not modify the input array.” Then implement accordingly—copy if you need to sort or mutate internally, or work on the original if the problem allows in-place changes.
Recursion Preview
A function can call itself—that’s recursion. You’ll see it in depth in the next section. For now: every recursive function needs a base case (when to stop) and a recursive case (how to express the result in terms of a smaller subproblem). Example: factorial—fact(0)=1; fact(n)=n*fact(n-1).
def fact(n):
    if n <= 0:
        return 1
    return n * fact(n - 1)
Summary
- Functions are defined with def; they take parameters and return values (or None).
- Arguments are passed by object reference: mutating a mutable argument affects the caller; reassigning an immutable doesn’t.
- Avoid mutable default arguments; use None and create a new list/dict inside if needed.
- Prefer passing data in and returning results; be clear about whether your function mutates input.
2.4 Recursion Basics
Introduction
Recursion is when a function calls itself to solve a smaller instance of the same problem. Many algorithms (tree traversal, divide-and-conquer, backtracking) are naturally expressed recursively. To use it well you need a clear base case (when to stop), a correct recursive case (how to reduce the problem), and an understanding of the call stack so you can reason about correctness and space.
Why Recursion Matters for DSA
Recursion appears everywhere: merge sort (“sort left half, sort right half, merge”), binary search (“search left or right half”), tree DFS (“process root, then recurse on each child”), and backtracking (“try a choice, recurse, undo”). If you’re comfortable with recursion, these patterns are much easier to write and debug. The same logic can often be converted to an iterative loop with an explicit stack—but thinking recursively first is a core skill.
Two Parts of Every Recursive Function
Base Case
The base case is when the problem is so small that you can return an answer directly without calling yourself again. Without a base case, recursion never stops and you get infinite recursion (and eventually a stack overflow). Examples: “if n is 0, return 1” (factorial); “if the list is empty, return 0” (sum of list).
Recursive Case
The recursive case expresses the answer in terms of the same function on a smaller input. You must ensure the subproblem is strictly smaller (e.g., n−1 or half the array) so that repeated recursion eventually hits the base case. Example: “fact(n) = n * fact(n−1)”—each call reduces n until n is 0.
Recursive function:
1. Base case: return known answer for smallest input.
2. Recursive case: call self on smaller input; combine result with current step.
Example: Factorial
Definition: fact(0) = 1; fact(n) = n × fact(n−1) for n ≥ 1. Base case: n ≤ 0 → return 1. Recursive case: return n * fact(n−1).
def fact(n):
    if n <= 0:
        return 1
    return n * fact(n - 1)
Trace for fact(3): fact(3)=3*fact(2), fact(2)=2*fact(1), fact(1)=1*fact(0), fact(0)=1. Then 1→1, 2*1=2, 3*2=6. So fact(3)=6.
Example: Sum of an Array
Sum of arr = first element + sum of the rest. Base case: empty list → 0. Recursive case: arr[0] + sum_arr(arr[1:]).
def sum_arr(arr):
    if not arr:
        return 0
    return arr[0] + sum_arr(arr[1:])
Note: arr[1:] creates a new list each time, so this uses O(n) extra space per level and O(n²) time for the slicing. For real DSA you’d often pass indices (left, right) instead of slicing—same recursive idea, better complexity. The point here is the structure: base + recursive case.
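The index-passing variant mentioned in the note might look like the sketch below. The name sum_from and its default parameter i are assumptions; the recursion depth is still one frame per element, only the copying is removed.

```python
def sum_from(arr, i=0):
    # Base case: index has run past the last element.
    if i == len(arr):
        return 0
    # Recursive case: current element plus the sum of the rest,
    # where "the rest" is identified by index instead of a sliced copy.
    return arr[i] + sum_from(arr, i + 1)
```

Each call now does O(1) work besides the recursive call, so the total time is O(n).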
Example: Fibonacci
F(0)=0, F(1)=1, F(n)=F(n−1)+F(n−2). Base cases: n=0 → 0, n=1 → 1. Recursive case: return fib(n−1) + fib(n−2).
def fib(n):
    if n <= 0:
        return 0
    if n == 1:
        return 1
    return fib(n - 1) + fib(n - 2)
This version is correct but exponential in n because it recomputes the same F(k) many times. Later you’ll fix that with memoization or iteration—but as a first recursive definition it’s the standard example.
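As a preview of those fixes, here is one possible sketch of each. It uses functools.lru_cache from the standard library for memoization; the names fib_memo and fib_iter are illustrative, and this is not necessarily the form the later sections use.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib_memo(n):
    # Same recurrence as fib, but each F(k) is computed once and cached.
    if n <= 0:
        return 0
    if n == 1:
        return 1
    return fib_memo(n - 1) + fib_memo(n - 2)

def fib_iter(n):
    # Iterative version: keep only the last two values. O(n) time, O(1) space.
    a, b = 0, 1
    for _ in range(max(n, 0)):
        a, b = b, a + b
    return a
```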
Call Stack Intuition
Each recursive call pushes a new frame onto the call stack (with its own parameters and local variables). When a call returns, its frame is popped and execution resumes in the caller. So recursion uses space proportional to the depth of the recursion (e.g., fact(n) has depth n; sum_arr on a list of length n has depth n). Python does not perform tail-call optimization, and CPython caps the recursion depth (the default limit is about 1000 frames), so recursing once per element on a list of 10^5 elements raises RecursionError long before it finishes. For deep recursion, an iterative solution is usually better; passing indices instead of slices removes the copying cost but does not reduce the depth.
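A small sketch of what running out of stack looks like in practice. It assumes CPython's default recursion limit has not been raised; count_down and count_down_iter are hypothetical names.

```python
def count_down(n):
    # One stack frame per level: depth grows linearly with n.
    if n == 0:
        return 0
    return 1 + count_down(n - 1)

def count_down_iter(n):
    # Same result with a loop: constant stack depth.
    steps = 0
    while n > 0:
        steps += 1
        n -= 1
    return steps

try:
    count_down(100_000)
    hit_limit = False
except RecursionError:
    # Python raises an exception instead of crashing the process.
    hit_limit = True
```

With the default limit, hit_limit ends up True, while count_down_iter(100_000) completes normally.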
Common Mistakes
- No base case or wrong base case: Recursion never stops, or returns wrong value for the smallest input. Always ask: “What’s the smallest input, and what should I return?”
- Recursive case not smaller: If you call f(n) in terms of f(n) or f(n+1), you never reach the base case. Ensure the argument (or problem size) strictly decreases.
- Wrong recurrence: The formula that combines “current step” with “result of smaller problem” must match the problem definition. Check with a small example by hand.
A common pitfall is forgetting the base case, or writing a base case that is never reached (e.g., “if n == 1” when you only ever call with n ≥ 2 but the recursive call can skip past 1 and produce n=0). Trace one small call to verify the base case is hit.
When designing recursion, write the base case first (“what’s the trivial case?”), then the recursive case (“if I had the answer for the smaller problem, how would I get the full answer?”). That order matches how the computation unfolds.
Summary
- Recursion = function calls itself on a smaller instance; needs a base case and a recursive case.
- Base case: return directly for smallest input. Recursive case: call self on smaller input and combine.
- Recursion depth = stack space; avoid very deep recursion (e.g., one call per element of a long list) and switch to iteration when the depth could be large.
- Naive Fibonacci is exponential; memoization or iteration fixes it (covered later).
2.5 Lists
Introduction
A list in Python is a mutable, ordered sequence of values. It’s the primary “array” type for DSA: you use it for sequences that need index access, in-place updates, and dynamic growth. Understanding how to create, index, slice, and modify lists—and what each operation costs—is essential for implementing algorithms correctly and efficiently.
Why Lists Matter for DSA
Most array-based problems give you or expect a list. You’ll index by position (arr[i]), append during a single pass, or use a list as an explicit stack or queue. Knowing that append is amortized O(1) but insert(0, x) is O(n) helps you choose the right structure (e.g., collections.deque for frequent front insert/delete).
Creating Lists
arr = [] # empty list
arr = [1, 2, 3] # literal
arr = list() # same as []
arr = list(range(5)) # [0, 1, 2, 3, 4]
arr = [0] * 10 # [0, 0, ..., 0] — same object repeated (careful with mutable elements!)
arr = [x * 2 for x in range(5)] # list comprehension: [0, 2, 4, 6, 8]
[[]] * 5 creates a list of five references to the same inner list. Appending to one affects all. Use [[] for _ in range(5)] to get five separate lists.
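The aliasing problem is easy to verify directly. The variable names below are illustrative.

```python
shared = [[]] * 3                     # three references to ONE inner list
shared[0].append(1)                   # shared is now [[1], [1], [1]]

separate = [[] for _ in range(3)]     # three distinct inner lists
separate[0].append(1)                 # separate is now [[1], [], []]

# The same rule applies to 2D grids: build each row separately.
grid = [[0] * 3 for _ in range(2)]
grid[0][1] = 5                        # only row 0 changes
```

[0] * 3 is safe for the inner rows because integers are immutable; the risk only arises when the repeated element is itself mutable.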
Indexing and Slicing
Indices are 0-based. arr[i] is the element at position i; valid range is 0 to len(arr)-1. Negative indices count from the end: arr[-1] is the last element, arr[-2] the second-to-last. Indexing is O(1).
Slicing arr[start:stop] returns a new list from index start up to but not including stop. Omitted start defaults to 0; omitted stop defaults to the end. arr[start:stop:step] steps by step. Slicing creates a copy—O(k) where k is the slice length.
arr = [10, 20, 30, 40, 50]
arr[0] # 10
arr[-1] # 50
arr[1:4] # [20, 30, 40] — indices 1,2,3
arr[:3] # [10, 20, 30]
arr[::2] # [10, 30, 50] — every 2nd element
Mutability and In-Place Operations
Lists are mutable: you can change elements, append, insert, and remove. Assigning to an index updates that slot: arr[i] = x is O(1). In-place operations modify the list and often return None (e.g., arr.sort() returns None; sorted(arr) returns a new list).
Key Operations and Time Complexity
| Operation | Time |
|---|---|
| arr[i], arr[i] = x | O(1) |
| arr.append(x) | O(1) amortized |
| arr.pop() | O(1) |
| arr.insert(0, x), arr.pop(0) | O(n) |
| x in arr | O(n) |
| arr + other (concatenate) | O(n + m) |
append is amortized O(1) because the list occasionally grows its backing storage; over many appends the average cost per append is constant. insert(0, x) shifts all elements, so it’s O(n). Use collections.deque when you need O(1) front and back operations.
Common Methods
- arr.append(x): add at end. Amortized O(1).
- arr.extend(iterable): add all elements from iterable at end. O(k) for k elements.
- arr.insert(i, x): insert x at index i; elements shift. O(n).
- arr.pop(): remove and return last element. O(1). arr.pop(i) removes at index i, O(n).
- arr.remove(x): remove first occurrence of x. O(n).
- arr.reverse(): reverse in place. O(n).
- arr.sort(): sort in place. O(n log n). sorted(arr) returns new list, doesn’t change arr.
- len(arr), arr.count(x), arr.index(x): length O(1); count and index O(n).
List as Stack or Queue
Stack: Use append for push and pop() for pop—both O(1) amortized. Perfect for DFS, expression evaluation, etc.
Queue: Using insert(0, x) to enqueue and pop() to dequeue makes enqueue O(n). For a real queue use collections.deque (append and popleft both O(1)).
# Stack
stack = []
stack.append(1)
stack.append(2)
top = stack.pop() # 2
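For the queue case, a minimal sketch with collections.deque looks like this (variable names are illustrative):

```python
from collections import deque

# Queue: enqueue at the right end, dequeue from the left end, both O(1).
queue = deque()
queue.append(1)
queue.append(2)
queue.append(3)
front = queue.popleft()   # 1: first in, first out
remaining = list(queue)   # [2, 3]
```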
Copying vs Reference
Assignment does not copy: b = arr makes b refer to the same list. Modifying b changes arr. To copy: arr.copy() or arr[:] — shallow copy (same elements; if elements are mutable, inner objects are shared). For a deep copy of nested structures use copy.deepcopy(arr).
When you need to pass a list to a function that might mutate it and you want to keep the original, pass arr.copy() or arr[:]. When implementing recursion that “tries a choice then backtracks,” often you append, recurse, then pop—same list, no copy—for O(n) space instead of copying at each level.
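The append, recurse, pop pattern can be sketched with subset generation. This is one common way to write it, not the only one; the names subsets, path, and backtrack are assumptions.

```python
def subsets(nums):
    result = []
    path = []  # one shared list, reused at every recursion level

    def backtrack(start):
        # Copy only when recording an answer, not at every level.
        result.append(path.copy())
        for i in range(start, len(nums)):
            path.append(nums[i])   # try a choice
            backtrack(i + 1)       # recurse
            path.pop()             # undo the choice

    backtrack(0)
    return result
```

For nums = [1, 2, 3] this produces all 8 subsets, starting with [] and [1].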
Summary
- Lists are mutable, ordered sequences; index and slice with 0-based indices; negative indices from the end.
- append/pop at end are O(1) amortized; insert(0)/pop(0)/in are O(n).
- Use a list as a stack (append + pop); for a queue use deque.
- Assignment shares the list; use arr.copy() or arr[:] for a shallow copy when needed.
2.6 Tuples
Introduction
A tuple is an immutable, ordered sequence of values. Like a list, it supports indexing and slicing—but you cannot add, remove, or change elements after creation. That immutability makes tuples hashable: they can be used as dictionary keys and as set elements, and they safely represent fixed structures (e.g., coordinates, multi-value returns) without accidental mutation.
Why Tuples Matter for DSA
You’ll use tuples when you need a fixed pair or record (e.g., (row, col) for grid positions, (distance, node) for Dijkstra’s priority queue). Returning multiple values from a function is done with a tuple (return i, j). Storing “visited” or “seen” composite keys (e.g., (i, j) for 2D states) requires a hashable type—list won’t work in a set or as a dict key; tuple will. When you don’t need to mutate the sequence, tuple is a clear and efficient choice.
Creating Tuples
t = () # empty tuple
t = (1,) # single-element tuple — comma required
t = (1, 2, 3) # literal
t = 1, 2, 3 # parentheses optional when unambiguous
t = tuple() # empty, same as ()
t = tuple([1, 2, 3]) # from iterable: (1, 2, 3)
Single-element tuple needs a trailing comma: (1,). Without it, (1) is just the integer 1 in parentheses.
Indexing and Slicing
Same as lists: 0-based indices, negative indices from the end, slicing t[start:stop] returns a new tuple. Indexing and slicing are O(1) and O(k) respectively. You cannot assign: t[0] = 5 raises TypeError.
t = (10, 20, 30, 40)
t[0] # 10
t[-1] # 40
t[1:3] # (20, 30)
Unpacking
You can assign elements of a tuple to multiple names in one line. Useful for function returns and loop variables.
x, y = (3, 5) # x=3, y=5
a, b = b, a # swap without temp
for i, (r, c) in enumerate([(1,2), (3,4)]):
    ...  # i=0, r=1, c=2; then i=1, r=3, c=4
Star unpacking: first, *rest = (1, 2, 3, 4) gives first=1, rest=[2, 3, 4] (rest is a list).
Hashable: Dict Keys and Set Elements
Because tuples are immutable, they can be hashed (assuming their elements are hashable). So you can use a tuple as a dict key or put it in a set—unlike a list.
seen = set()
seen.add((0, 0)) # valid
# seen.add([0, 0]) # TypeError: unhashable type: 'list'
dist = {}
dist[(1, 2)] = 5 # key is tuple (1, 2)
In DSA, “visited cells” in a 2D grid are often stored as set() of (r, c) tuples. State in memoization (e.g., DP) is often a tuple of parameters so it can be used as a key.
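A sketch of a tuple used as a memoization key. The problem chosen here (counting right/down paths across a grid) and the names count_paths and paths are assumptions for illustration.

```python
def count_paths(rows, cols):
    # Number of ways to go from the top-left to the bottom-right cell
    # moving only right or down.
    memo = {}  # key: (r, c) tuple, value: number of paths from that cell

    def paths(r, c):
        if r >= rows or c >= cols:
            return 0                      # stepped off the grid
        if r == rows - 1 and c == cols - 1:
            return 1                      # reached the goal
        if (r, c) in memo:
            return memo[(r, c)]           # reuse a stored result
        memo[(r, c)] = paths(r + 1, c) + paths(r, c + 1)
        return memo[(r, c)]

    return paths(0, 0)
```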
Tuple vs List: When to Use Which
- Tuple: Fixed structure, multiple return values, keys or set members, or when you want to signal “this sequence is not meant to change.”
- List: Need to append, remove, or change elements; building a sequence over time; stack/queue.
Tuples are slightly more memory-efficient and can be slightly faster for creation and access in some cases, but the main reason to choose a tuple is semantics: immutability and hashability.
A tuple is only hashable if every element is hashable. ([1, 2], 3) contains a list, so it cannot be put in a set or used as a dict key. Use (tuple([1, 2]), 3) or keep the inner structure immutable.
When returning multiple values, return i, j is a tuple; the caller can write i, j = f(). For BFS/DFS on a grid, store coordinates as (r, c) in the queue and in the visited set so you have one consistent, hashable type.
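A minimal grid BFS sketch using (r, c) tuples in both the queue and the visited set. It assumes a grid of 0 (open) and 1 (wall) cells and 4-directional movement; the function name and grid encoding are hypothetical.

```python
from collections import deque

def shortest_path_len(grid, start, goal):
    # Returns the number of steps from start to goal, or -1 if unreachable.
    rows, cols = len(grid), len(grid[0])
    queue = deque([(start, 0)])   # entries are ((r, c), distance)
    visited = {start}             # set of (r, c) tuples
    while queue:
        (r, c), dist = queue.popleft()
        if (r, c) == goal:
            return dist
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == 0 and (nr, nc) not in visited):
                visited.add((nr, nc))
                queue.append(((nr, nc), dist + 1))
    return -1
```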
Summary
- Tuples are immutable, ordered sequences; same indexing and slicing as lists, but no assignment.
- Tuples are hashable (if elements are hashable)—use as dict keys and set elements; lists are not.
- Use tuples for fixed structure, multi-value returns, and composite keys (e.g., (r, c), (i, j)).
2.7 Sets
Introduction
A set is an unordered collection of unique, hashable elements. No duplicates are stored; adding the same element again has no effect. Membership test (x in s), add, and remove are O(1) average. Sets are the go-to structure when you need “have I seen this?” or “distinct values” without caring about order or index.
Why Sets Matter for DSA
Many problems reduce to “check if this value exists” or “collect unique items.” With a list, x in arr is O(n); with a set it’s O(1) average—so a single pass with a set can replace nested loops (e.g., Two Sum: for each element, check if complement is in a set). Sets also give you deduplication for free: set([1,2,2,3]) → {1, 2, 3}. In graph BFS/DFS, “visited” is often a set of nodes. Use a set whenever the key operation is membership or uniqueness.
Creating Sets
s = set() # empty set (not {} — that's empty dict)
s = {1, 2, 3} # literal
s = set([1, 2, 2, 3]) # from iterable: {1, 2, 3}, duplicates removed
s = set("hello") # {'h', 'e', 'l', 'o'} — unique chars
Elements must be hashable (immutable: int, str, tuple, etc.). You cannot have a list or another set inside a set.
Key Operations and Time Complexity
- x in s, x not in s: O(1) average.
- s.add(x): add element; no effect if already present. O(1) average.
- s.remove(x): remove x; raises KeyError if not in set. O(1) average.
- s.discard(x): remove x if present; no error if absent. O(1) average.
- len(s): O(1).
There is no indexing: sets are unordered. Iteration order is arbitrary (and can change). Use for x in s when you only need to process each element once.
Set Operations
Mathematical set operations are built in:
a = {1, 2, 3}
b = {2, 3, 4}
a | b # union: {1, 2, 3, 4}
a & b # intersection: {2, 3}
a - b # difference (in a, not in b): {1}
a ^ b # symmetric difference (in one but not both): {1, 4}
These return a new set. In-place versions: a.update(b) (union), a.intersection_update(b), a.difference_update(b). Useful for “remove all seen elements” or “keep only common elements.”
No Duplicates, No Order
Adding the same element multiple times doesn’t change the set. So building a set from a list is an easy way to get unique values—but you lose order. If you need “unique but preserve insertion order” (Python 3.7+), dict.fromkeys(arr) gives you an ordered mapping; list(dict.fromkeys(arr)) is unique elements in first-seen order.
“Remove duplicates from a list” → list(set(arr)) is O(n) but order is arbitrary. For “unique, keep first occurrence order”: list(dict.fromkeys(arr)).
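Both deduplication approaches side by side, as a quick sketch (the sample list is illustrative):

```python
arr = [3, 1, 3, 2, 1]

unique_any_order = set(arr)                  # {1, 2, 3}; iteration order not guaranteed
unique_in_order = list(dict.fromkeys(arr))   # [3, 1, 2]; first-seen order kept
```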
When to Use a Set
- Membership: “Is x in the collection?” — use a set for O(1) instead of list O(n).
- Uniqueness: “Collect distinct values” — add to a set; duplicates are ignored.
- Visited / seen: In BFS/DFS, store visited nodes in a set for O(1) check and add.
- No indexing needed: If you don’t need order or position, set is simpler and faster for in/add/remove.
frozenset
frozenset is an immutable set. It’s hashable, so it can be used as a dict key or as an element of another set. Use it when you need a set that must not change and must be storable in a set or as a key (e.g., “set of frozensets” for distinct subsets).
fs = frozenset([1, 2, 3])
# fs.add(4) # AttributeError
seen_subsets = {fs} # valid — frozenset is hashable
Using a list when you need membership: if x in arr in a loop makes the overall algorithm O(n²). If you’re only doing membership checks and adds, use a set for O(n) total. Also: {} is an empty dict, not an empty set—use set() for an empty set.
In “find two elements that sum to target,” maintain a set of values seen so far. For each new value x, check if target - x is in the set—O(1)—then add x. One pass, O(n). Same idea for “first duplicate,” “two arrays have a common element,” etc.
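The one-pass “seen set” idea can be sketched as follows. The function name has_pair_with_sum is an assumption, and this version only reports whether a pair exists, not where it is.

```python
def has_pair_with_sum(arr, target):
    seen = set()  # values encountered so far
    for x in arr:
        # Check BEFORE adding x, so x is never paired with itself.
        if target - x in seen:
            return True
        seen.add(x)
    return False
```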
Summary
- Sets store unique, hashable elements; unordered; no indexing.
- in, add, remove, discard are O(1) average; use sets for fast membership and deduplication.
- Use a set for “seen,” “visited,” “distinct values,” or any problem where the main operation is “is this in the collection?”
2.8 Dictionaries
Introduction
A dictionary (dict) is a mapping from keys to values. Keys must be hashable (immutable); values can be anything. Lookup, insert, and delete by key are O(1) average. In Python 3.7+, dictionaries preserve insertion order. For DSA, dicts are essential: frequency counts, “value → index” or “value → count,” memoization caches, and graph adjacency (when nodes are hashable) all use dicts.
Why Dictionaries Matter for DSA
Whenever you need “for this key, what’s the value?” or “have I seen this key and what did I store?” a dict is the right structure. Two Sum: map each value to its index so you can find a complement in O(1). Frequency: map each element to its count in one pass. Memoization: map (arguments as key) to return value. Subarray with sum K: prefix sum → count of prefixes seen. Most hash-based optimizations in this course are implemented with a dict.
Creating Dictionaries
d = {} # empty dict
d = {"a": 1, "b": 2, "c": 3} # literal
d = dict() # empty
d = dict(a=1, b=2) # keyword args (keys are strings)
d = dict([("a", 1), ("b", 2)]) # from list of (key, value) pairs
d = dict.fromkeys([1, 2, 3], 0) # {1: 0, 2: 0, 3: 0} — same value for all keys
Accessing and Modifying
d[key] returns the value for key; raises KeyError if the key is missing. d[key] = value sets or overwrites. del d[key] removes the key; key in d is O(1) average. Use d.get(key, default) to avoid KeyError: returns d[key] if key exists, else default (or None if no default given).
d = {"x": 10, "y": 20}
d["x"] # 10
d["z"] = 30 # add or update
d.get("x") # 10
d.get("w") # None
d.get("w", 0) # 0 — default when key missing
Key Operations and Time Complexity
- d[key], d[key] = value, del d[key], key in d: O(1) average.
- len(d): O(1).
- d.get(key, default): O(1) average.
Iteration: for k in d iterates keys; d.keys(), d.values(), d.items() give key, value, or (key, value) pairs. In Python 3 these are view-like and reflect the current dict; iteration order is insertion order (3.7+).
Common Patterns in DSA
- Frequency count: for x in arr: d[x] = d.get(x, 0) + 1. One pass, O(n).
- Value → index (e.g., Two Sum): for i, x in enumerate(arr): store d[x] = i (or list of indices if duplicates matter). Check target - x in d before storing.
- Memoization: Key = tuple of arguments (or something hashable); value = computed result. Check if key in d: return d[key] before computing.
# Frequency count
freq = {}
for x in arr:
    freq[x] = freq.get(x, 0) + 1
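The value → index pattern from the list above, sketched as a Two Sum function that returns indices. The name two_sum and the choice to return None when no pair exists are assumptions.

```python
def two_sum(arr, target):
    index_of = {}  # value -> index where it was seen
    for i, x in enumerate(arr):
        j = index_of.get(target - x)
        # Compare with None explicitly: index 0 is a valid hit but is falsy.
        if j is not None:
            return j, i
        index_of[x] = i  # store AFTER checking, so x is not paired with itself
    return None
```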
defaultdict
collections.defaultdict(factory) is a dict that does not raise KeyError on d[key] access: if the key is missing, it creates a value with factory(), stores it, and returns it. (d.get(key) and key in d do not create entries.) defaultdict(int) gives 0 for missing keys, so freq[x] += 1 works without get. defaultdict(list) gives [] for missing keys, which is useful for “group by key.” Same O(1) average behavior as dict.
from collections import defaultdict
freq = defaultdict(int)
for x in arr:
    freq[x] += 1  # no get() needed
groups = defaultdict(list)
for item in items:
    groups[item.key].append(item)  # each key gets a list
Keys Must Be Hashable
Dictionary keys must be immutable and hashable: int, str, tuple (of hashable elements), etc. You cannot use a list or another dict as a key. For composite keys (e.g., (i, j) for 2D state), use a tuple. If you need to key by something mutable, convert to something hashable (e.g., tuple(sorted(items)) for a “canonical” set-like key).
Using a mutable type as a key: d[[1,2]] = 3 raises TypeError: unhashable type: 'list'. Use a tuple: d[(1, 2)] = 3. Also: d[key] when key might be missing raises KeyError—use d.get(key) or check key in d first.
For “first index where each value appears,” store d[x] = i only when x is not already in d, so the first index is kept. For “last index,” overwrite every time: d[x] = i. Choose based on what the problem needs.
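Both policies can be sketched in a single pass. The function name first_and_last_index is hypothetical.

```python
def first_and_last_index(arr):
    first, last = {}, {}
    for i, x in enumerate(arr):
        if x not in first:
            first[x] = i   # keep only the first index seen
        last[x] = i        # overwrite every time, so the last index wins
    return first, last
```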
Summary
- Dictionaries map hashable keys to values; lookup, insert, delete by key are O(1) average.
- Use d.get(key, default) to avoid KeyError; use defaultdict when you want a default value for missing keys.
- Dicts are the standard tool for frequency counts, value→index (Two Sum), and memoization.
2.9 Strings
Introduction
A string in Python is an immutable sequence of Unicode characters. Like a tuple, you can index and slice it, but you cannot change a character in place. Strings are hashable (so they can be dict keys or set elements) and support many methods for searching, splitting, and transforming. For DSA, strings show up in palindromes, anagrams, pattern matching, and parsing—and building strings efficiently (e.g., with ''.join()) matters for performance.
Why Strings Matter for DSA
String problems often ask: “Is it a palindrome?” “Are two strings anagrams?” “Find pattern in text.” You’ll index by position (s[i]), slice substrings (s[l:r]), and compare or count characters. Because strings are immutable, “changing” a character means creating a new string (e.g., s[:i] + c + s[i+1:])—O(n). For heavy string building, use a list of characters and ''.join(...) at the end for O(n) total instead of O(n²) from repeated concatenation.
Indexing and Slicing
Same as lists: 0-based indices, negative indices from the end. s[i] is O(1). s[start:stop] returns a new string (substring); omitted start/stop mean “from start” / “to end.” s[::-1] is the reversed string—handy for palindrome checks. Slicing is O(k) where k is the slice length.
s = "hello"
s[0] # 'h'
s[-1] # 'o'
s[1:4] # "ell"
s[::-1] # "olleh" — reverse
Immutability
You cannot do s[0] = 'H'—that raises TypeError. To “change” a string you build a new one: s = s[:i] + new_char + s[i+1:], or use methods like replace, upper, etc., which all return new strings. The original string is never modified.
Concatenation and Building Strings
a + b creates a new string of length len(a)+len(b). Doing result = result + c in a loop is O(n²) over n characters, because each concatenation copies the whole result. For building a string from many parts, collect parts in a list and join once: ''.join(parts) is O(n).
# Slow: O(n^2)
result = ""
for c in s:
    result = result + c.upper()  # avoid in loops
# Fast: O(n)
result = ''.join(c.upper() for c in s)
# Or: parts = []; parts.append(...); ''.join(parts)
When you need to build a string character by character or from many segments, use parts.append(...) and then ''.join(parts). Append to list is amortized O(1); one join at the end is O(n). Repeated s += c in a loop is O(n²).
Common Methods
- s.split(sep=None): split by whitespace (default) or by sep; returns a list of strings. "a b c".split() → ['a','b','c'].
- s.strip(), s.lstrip(), s.rstrip(): remove leading/trailing whitespace (or specified chars).
- s.upper(), s.lower(): return a new string in upper/lower case.
- s.replace(old, new): return a new string with all occurrences of old replaced by new.
- s.startswith(prefix), s.endswith(suffix): boolean.
- s.find(sub): index of the first occurrence of sub, or -1. s.index(sub) is the same but raises ValueError if not found.
- s.count(sub): number of non-overlapping occurrences of sub.
All of these return new strings or other values; they do not modify s.
Membership and Iteration
c in s is O(n) — Python scans the string. for c in s iterates over characters. for i, c in enumerate(s) gives index and character. To get a list of characters: list(s). To convert a list of characters back: ''.join(lst).
s = "abc"
list(s) # ['a', 'b', 'c']
''.join(['a', 'b', 'c']) # "abc"
Comparing Strings
Strings are compared lexicographically (dictionary order): character by character, using Unicode code points. ==, !=, <, >, <=, >= all work. So sorted(list_of_strings) sorts them in code-point order, which is alphabetical for same-case ASCII text; note that every uppercase letter sorts before every lowercase one ("Z" < "a"). For case-insensitive sort: sorted(lst, key=str.lower).
A common mistake is building a string with s += c in a loop, which is O(n²); use a list and ''.join() instead. Also: s[i] returns a string of length 1; Python has no separate “char” type, so s[0] == 'h' is the correct way to compare a character.
For “check if palindrome”: s == s[::-1] is simple but uses O(n) extra space for the reversed slice. For O(1) space, use two pointers from both ends. For anagrams, sorted(s1)==sorted(s2) is O(n log n); frequency count with a dict or Counter is O(n).
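The two-pointer palindrome check and the frequency-count anagram check mentioned above can be sketched as follows. The function names is_palindrome and is_anagram are illustrative, not part of any library.

```python
from collections import Counter


def is_palindrome(s: str) -> bool:
    """Two-pointer check: O(n) time, O(1) extra space."""
    left, right = 0, len(s) - 1
    while left < right:
        if s[left] != s[right]:
            return False
        left += 1
        right -= 1
    return True


def is_anagram(s1: str, s2: str) -> bool:
    """Compare character frequencies: O(n) time."""
    return Counter(s1) == Counter(s2)


print(is_palindrome("racecar"))        # True
print(is_anagram("listen", "silent"))  # True
```

The two-pointer version never builds the reversed copy that s[::-1] creates, which is where the O(1) extra space comes from.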
Summary
- Strings are immutable sequences of characters; index and slice like lists; slicing creates a new string.
- Build strings with ''.join(parts), not repeated s += c in a loop, to avoid O(n²).
- Use split, strip, upper/lower, replace, find, startswith/endswith as needed; all return new values.
2.10 List Comprehension
Introduction
A list comprehension is a compact syntax for building a list from an iterable (and optionally filtering). Instead of a multi-line loop that appends to a list, you write a single expression in brackets. It’s idiomatic in Python, readable once you’re used to it, and often slightly faster than an equivalent loop. For DSA you’ll use it to build lists of indices, transformed values, or filtered elements in one line.
Basic Syntax
[expression for item in iterable] — for each item in the iterable, evaluate expression and collect the results into a new list. The result has the same length as the iterable (unless you add a filter).
squares = [x * x for x in range(5)] # [0, 1, 4, 9, 16]
uppers = [c.upper() for c in "hello"] # ['H', 'E', 'L', 'L', 'O']
lengths = [len(w) for w in ["a", "bb", "ccc"]] # [1, 2, 3]
With a Filter: if
[expression for item in iterable if condition] — only include expression when condition is true. The result can be shorter than the iterable.
evens = [x for x in range(10) if x % 2 == 0] # [0, 2, 4, 6, 8]
pos = [x for x in arr if x > 0] # only positive elements
There is no else in the filter position. To map some items to one value and others to another, use a conditional expression in the expression part: [a if cond else b for x in ...].
# Map: even -> "e", odd -> "o"
labels = ["e" if x % 2 == 0 else "o" for x in range(5)] # ['e','o','e','o','e']
Nested Loops
You can use multiple for clauses. They nest left to right: the rightmost iterable is the “inner” loop.
pairs = [(i, j) for i in range(2) for j in range(2)]
# [(0,0), (0,1), (1,0), (1,1)]
Equivalent to:
pairs = []
for i in range(2):
    for j in range(2):
        pairs.append((i, j))
You can add an if after the loops to filter. Keep comprehensions readable; if they get long or complex, an explicit loop is often clearer.
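As a small illustration of multiple for clauses with and without a trailing if, the sketch below flattens a nested list and builds index pairs with i < j. The sample data is made up for the example.

```python
matrix = [[1, 2, 3], [4, 5, 6]]

# Outer loop first (rows), inner loop second (values in each row)
flat = [x for row in matrix for x in row]
print(flat)  # [1, 2, 3, 4, 5, 6]

# Filter after the loops: keep only pairs where i comes before j
ordered_pairs = [(i, j) for i in range(3) for j in range(3) if i < j]
print(ordered_pairs)  # [(0, 1), (0, 2), (1, 2)]
```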
When to Use List Comprehensions
- Use: Simple “build a list from an iterable” or “build with one filter”—one line, easy to read.
- Avoid: Complex logic, multiple conditions, or when you need side effects (e.g., printing, updating other variables). Use an explicit for loop instead.
A list comprehension always produces a new list. If you don’t need to keep the list (e.g., you’re only iterating once), a generator expression (expr for x in it) saves memory—same syntax, but with parentheses. sum(x*x for x in range(5)) doesn’t build a list of squares.
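A minimal sketch of the difference: both lines below compute the same sum, but only the first builds an intermediate list. The any call shows another place where a generator helps, because it can stop at the first match.

```python
# List comprehension: builds the whole list, then sums it
total_from_list = sum([x * x for x in range(5)])

# Generator expression: values are produced one at a time, no list is stored
total_from_gen = sum(x * x for x in range(5))

print(total_from_list, total_from_gen)  # 30 30

# any() stops consuming the generator as soon as one item is True
has_negative = any(x < 0 for x in [3, -1, 4])
print(has_negative)  # True
```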
Dict and Set Comprehensions
Same idea with different brackets: {key_expr: value_expr for x in it} builds a dict; {expr for x in it} builds a set. Useful for “list of pairs → dict” or “unique values from a transformation.”
d = {x: x * x for x in range(5)} # {0:0, 1:1, 2:4, 3:9, 4:16}
s = {c.upper() for c in "hello"} # {'H', 'E', 'L', 'O'}
Putting side effects (e.g., print(x)) in a comprehension works but is confusing; use a loop instead. Also: [x for x in it if x] keeps truthy values; [x if x else 0 for x in it] maps falsy to 0. The if after the loop filters; the x if cond else y in the expression chooses between two values.
In DSA, list comprehensions are handy for “indices where condition holds”: [i for i, x in enumerate(arr) if x > 0], or “transform and collect”: [x * 2 for x in arr]. Keep the expression and condition simple so the line stays readable.
Summary
- List comprehension: [expr for item in iterable] or [expr for item in iterable if cond].
- Use for simple “build list / filter list” in one line; use explicit loops for complex logic or side effects.
- Dict comprehension: {k: v for ...}; set comprehension: {expr for ...}. Generator: (expr for ...) when you don’t need a full list.
2.11 Lambda Functions
Introduction
A lambda is an anonymous function defined with a single expression. You write lambda arguments: expression—no def, no block, no explicit return. Lambdas are useful when you need a small function as an argument (e.g., key for sorted, or a comparator). For anything longer or reusable, use a normal def function.
Syntax
lambda x: x * 2 is a function that takes one argument x and returns x * 2. You can have multiple arguments: lambda a, b: a + b. You cannot have multiple statements or assignments—only one expression. The expression is implicitly returned.
f = lambda x: x * 2
f(5) # 10
g = lambda a, b: a - b
g(7, 3) # 4
Common Use: sorted and key
Many built-ins accept a key function that maps each element to a value used for comparison. sorted(iterable, key=...) sorts by the result of key(item). A lambda is often the shortest way to supply that.
arr = [("b", 2), ("a", 3), ("c", 1)]
sorted(arr) # by first element: [('a',3), ('b',2), ('c',1)]
sorted(arr, key=lambda t: t[1]) # by second: [('c',1), ('b',2), ('a',3)]
# Sort list of numbers by absolute value
sorted([-3, 1, -2], key=lambda x: abs(x)) # [1, -2, -3]
Same idea for min, max: min(items, key=lambda x: x.weight). For custom comparison in sorting (e.g., multiple criteria), you can use key=lambda x: (x.a, -x.b) to sort by a ascending and b descending.
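Here is a short sketch of a tuple key with one descending field, plus min and max with key. The scores list is invented sample data.

```python
scores = [("ann", 90), ("bob", 85), ("cat", 90), ("dan", 85)]

# Highest score first; ties broken by name in ascending order
ranked = sorted(scores, key=lambda t: (-t[1], t[0]))
print(ranked)  # [('ann', 90), ('cat', 90), ('bob', 85), ('dan', 85)]

# min/max accept the same kind of key function
lowest = min(scores, key=lambda t: t[1])
longest = max(["a", "ccc", "bb"], key=len)
print(lowest, longest)  # ('bob', 85) ccc
```

When several items share the minimum key, min returns the first one it encounters, which is why ('bob', 85) is returned rather than ('dan', 85).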
map and filter
map(f, iterable) returns an iterator of f(x) for each x. filter(pred, iterable) returns an iterator of elements for which pred(x) is true. Lambdas can be used as f or pred.
list(map(lambda x: x * 2, [1, 2, 3])) # [2, 4, 6]
list(filter(lambda x: x > 0, [-1, 2, 0, 3])) # [2, 3]
In Python, list comprehensions are often clearer: [x * 2 for x in lst] and [x for x in lst if x > 0]. Use map/filter with lambda when you prefer that style or when you need an iterator without building a list.
When to Use Lambda vs def
- Lambda: One-off function passed to sorted, min, max, key=, or similar; expression fits on one line and is simple.
- def: Reusable logic, multiple lines, default arguments, or when a name would make the code clearer. If you’re doing f = lambda ... and using f in several places, a named function is better.
A common mistake is using a lambda for anything that needs more than one expression. Lambdas can’t contain statements (assignments, loops, etc.). If the logic is at all involved, use def. Also: a lambda created in a loop does not capture the loop variable’s value at definition time; it looks the variable up when it is called, so every lambda sees the final value. To freeze the current value, pass it as a default argument (lambda x=x: ...) or use a named function.
In DSA, the most common use of lambda is sorted(arr, key=lambda x: (x[0], -x[1])) or similar: sort by one field ascending, another descending. Heap operations are different: heapq.heappush and heapq.heappop take no key argument or comparator, so you store (priority, item) tuples or use a custom class that defines __lt__ instead of passing a lambda. Keep lambdas short and readable.
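The loop-variable pitfall is easiest to see in a small experiment. The helper names make_late_bound and make_value_bound are hypothetical and exist only for this illustration.

```python
def make_late_bound():
    funcs = []
    for i in range(3):
        funcs.append(lambda: i)      # i is looked up when the lambda is called
    return funcs


def make_value_bound():
    funcs = []
    for i in range(3):
        funcs.append(lambda i=i: i)  # default argument freezes the current i
    return funcs


late = [f() for f in make_late_bound()]
bound = [f() for f in make_value_bound()]
print(late)   # [2, 2, 2]
print(bound)  # [0, 1, 2]
```

All three late-bound lambdas share the same variable i, which holds 2 once the loop has finished.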
Summary
- Lambda = anonymous function: lambda args: expression; one expression only, implicitly returned.
- Use with sorted(..., key=lambda x: ...), min/max key=, or map/filter when the logic is one short expression.
- Use def for anything multi-line, reusable, or complex.
2.12 Sorting in Python
Introduction
Sorting arranges elements in a defined order (ascending or descending). In Python you have two main ways: sorted(iterable)—which returns a new sorted list and leaves the original unchanged—and list.sort()—which sorts the list in place and returns None. Both use the same underlying algorithm (Timsort) and support a key function and reverse flag. For DSA, you’ll sort to enable binary search, two-pointer techniques, or to order items by a custom rule; knowing the API and the cost is essential.
Why Sorting Matters for DSA
Many algorithms assume or produce sorted data: binary search requires a sorted array; “merge two sorted lists” and “find pairs with sum K” often start by sorting; greedy problems sometimes need items ordered by value or deadline. Python’s sort is O(n log n) and stable—so you get predictable, efficient ordering. Custom key lets you sort by a field, by multiple criteria, or by a computed value without writing a comparison function by hand.
Two Ways to Sort
sorted(iterable, key=..., reverse=...)
Built-in function. Accepts any iterable (list, tuple, string, etc.) and returns a new list in sorted order. The original is not modified. Use when you need to keep the original or when the input isn’t a list (e.g., sorted("hello") → ['e','h','l','l','o']).
arr = [3, 1, 4, 1, 5]
new_list = sorted(arr) # [1, 1, 3, 4, 5]; arr unchanged
s = "hello"
sorted(s) # ['e', 'h', 'l', 'l', 'o'] — returns list of chars
list.sort(key=..., reverse=...)
Method on lists only. Sorts the list in place and returns None. Slightly more efficient when you don’t need the original: no extra list is allocated. Use when the variable is a list and you’re fine mutating it.
arr = [3, 1, 4, 1, 5]
arr.sort() # arr is now [1, 1, 3, 4, 5]; returns None
# new_list = arr.sort() # wrong — new_list would be None
| Feature | sorted() | list.sort() |
|---|---|---|
| Input | Any iterable | List only |
| Return value | New sorted list | None (mutates list) |
| Original | Unchanged | Modified in place |
The key Parameter
Both sorted() and list.sort() accept key=function. For each element x, the sort compares key(x) instead of x. So you can sort by a field, by a transformation, or by a tuple for multiple criteria—without implementing a comparator. The key function is called once per element; the result is cached, so cost is O(n) key calls plus O(n log n) comparisons.
Sort by a Single Field
pairs = [(2, "b"), (1, "a"), (2, "a")]
sorted(pairs) # by first elem: [(1,'a'), (2,'a'), (2,'b')]
sorted(pairs, key=lambda p: p[1]) # by second: [(1,'a'), (2,'a'), (2,'b')] (ties keep input order)
Sort by Multiple Criteria
Return a tuple from key. Python compares tuples lexicographically: first element, then second, and so on. To sort by field A ascending and field B descending when B is numeric, negate it in the key: (x.a, -x.b). Negation does not work for non-numeric fields such as strings; there you can rely on stability and sort in two passes (secondary key first, then primary), using reverse=True on the pass that should be descending.
# Sort by second element ascending, then first descending
pairs = [(2, 1), (3, 1), (1, 2)]
sorted(pairs, key=lambda p: (p[1], -p[0])) # [(3,1), (2,1), (1,2)]
Sort by Computed Value
arr = [-4, 2, -1, 3]
sorted(arr, key=abs) # [-1, 2, 3, -4] — by absolute value
sorted(arr, key=lambda x: -x) # [3, 2, -1, -4] — descending (key negated)
The reverse Parameter
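reverse=True sorts in descending order. The same comparisons are used as for ascending order, and stability is preserved: elements with equal keys keep their original relative order. That makes it slightly different from sorting ascending and then reversing the list, which would also flip the order of ties.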
sorted([3, 1, 4], reverse=True) # [4, 3, 1]
Stability
Python’s sort is stable: when two elements compare equal, their relative order in the output is the same as in the input. So you can sort in several passes, least significant key first: the later sort preserves the order of ties left by the earlier one. Example: sort by first name, then by last name; people with the same last name stay in the first-name order from the previous pass. Or: sort by value only, and equal values keep their original index order.
A tuple key such as (primary_key, secondary_key) gives the same ordering in a single pass: tuple comparison looks at the primary first and, when the primaries are equal, lets the secondary break the tie. The result matches a stable sort by secondary followed by a stable sort by primary.
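To see stability in action, the sketch below sorts by the secondary key first and then by the primary key, and compares the result with a single tuple-key sort. The names are made-up sample data stored as (last, first) pairs.

```python
people = [("Smith", "Zoe"), ("Jones", "Amy"), ("Smith", "Al"), ("Jones", "Bob")]

# Two passes: least significant key (first name) first, then last name
by_first = sorted(people, key=lambda p: p[1])
two_pass = sorted(by_first, key=lambda p: p[0])

# One pass with a tuple key
one_pass = sorted(people, key=lambda p: (p[0], p[1]))

print(two_pass)
# [('Jones', 'Amy'), ('Jones', 'Bob'), ('Smith', 'Al'), ('Smith', 'Zoe')]
print(two_pass == one_pass)  # True
```

The second pass only compares last names, so within each last name the first-name order from the earlier pass survives.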
Time and Space Complexity
Python uses Timsort (a hybrid of merge sort and insertion sort). Time: O(n log n) in the average and worst case. Each key is computed once—O(n) key calls—and the comparison-based sort does O(n log n) comparisons. Space: sorted() allocates a new list—O(n) extra space. list.sort() sorts in place but Timsort uses O(n) auxiliary space for the merge step. So both are O(n log n) time, O(n) space; sort() avoids a second list.
Edge Cases
- Empty iterable: sorted([]) → []; [].sort() does nothing. Both are safe.
- Single element: Returns or leaves the list with one element; no error.
- Duplicates: All kept; order among equal elements is stable (same as input).
- Mixed types: In Python 3, comparing incompatible types (e.g., int vs str) raises TypeError. Ensure elements are comparable or supply a key that returns a comparable type (e.g., all numbers or all strings).
Sorting Custom Objects
By default, instances of a user-defined class are not orderable: == falls back to identity, but < raises TypeError unless the class defines __lt__. To sort by an attribute, use key=lambda obj: obj.attr. For multiple attributes, key=lambda obj: (obj.a, obj.b). You can also define __lt__ (and optionally other rich comparison methods) on the class so the object is directly comparable; then sorted(list_of_objs) works without key. For one-off sorts, key is usually simpler.
class Person:
    def __init__(self, name, age):
        self.name, self.age = name, age
people = [Person("Bob", 30), Person("Alice", 25)]
sorted(people, key=lambda p: p.age) # by age
sorted(people, key=lambda p: (p.age, p.name)) # by age, then name
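If you prefer the __lt__ route, a minimal sketch looks like this. Version is a hypothetical class invented for the example; sorted only needs __lt__ to order the instances.

```python
class Version:
    def __init__(self, major, minor):
        self.major, self.minor = major, minor

    def __lt__(self, other):
        # Delegate to tuple comparison: major first, then minor
        return (self.major, self.minor) < (other.major, other.minor)

    def __repr__(self):
        return f"Version({self.major}, {self.minor})"


versions = [Version(2, 0), Version(1, 5), Version(1, 2)]
ordered = sorted(versions)  # no key needed
print(ordered)  # [Version(1, 2), Version(1, 5), Version(2, 0)]
```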
Common Mistakes
- Expecting list.sort() to return the list: It returns None. Write arr.sort() and then use arr; don’t assign the return value.
- Sorting in place when you need the original: If you still need the unsorted list, use sorted(arr) and assign to a new variable, or copy first: arr_copy = arr.copy(); arr_copy.sort().
- Wrong multi-criteria order: Remember that key=lambda x: (x.a, x.b) sorts by a first, then b. For descending on one field, use negation (numbers) or a key that inverts order.
arr = arr.sort() sets arr to None because sort() returns None. Use arr.sort() (no assignment) for in-place, or arr = sorted(arr) for a new list.
When the problem needs “order by X, then by Y,” say: “I’ll sort with key=lambda x: (x.X, x.Y)” or “by (X, -Y) if Y should be descending.” Mention that Python’s sort is stable and O(n log n). If you need to preserve the original list, use sorted() and assign to a new variable.
Summary
- sorted(iterable) returns a new sorted list; list.sort() sorts in place and returns None.
- Use key=function to sort by a field, computed value, or tuple for multiple criteria; use reverse=True for descending.
- Python’s sort is stable and O(n log n) time, O(n) space; choose sorted() vs sort() based on whether you need to keep the original.
2.13 Time Complexity of Built-ins
Introduction
To analyze your algorithms you need to know the cost of the operations you use. Python's built-in types have specific time complexities: list index is O(1), but x in list is O(n); dict and set lookup are O(1) average; string concatenation in a loop is O(n²). This section summarizes the main time complexities of built-in operations so you can reason correctly about your overall complexity and avoid hidden bottlenecks.
Why This Matters for DSA
If you use in on a list inside a loop, you may turn an O(n) idea into O(n²). If you use a set for membership instead, you stay O(n). Choosing the right structure and knowing the cost of each call (index, append, insert, in, sort, etc.) lets you state and optimize complexity accurately in interviews and in code.
List
| Operation | Time |
|---|---|
| arr[i], arr[i] = x | O(1) |
| arr.append(x), arr.pop() | O(1) amortized |
| arr.insert(i, x), arr.pop(i) | O(n) |
| x in arr, arr.index(x), arr.count(x) | O(n) |
| arr + other (lengths n, m), arr[i:j] (slice of size k) | O(n + m), O(k) |
| arr.sort(), sorted(arr) | O(n log n) |
Dict and Set
Average case assumes a good hash function and no excessive collisions. Worst case (many collisions) can degrade toward O(n) per operation, but in practice the average case is the one to use for analysis.
| Operation | Time (average) |
|---|---|
| d[key], d[key]=v, del d[key], key in d | O(1) |
| s.add(x), s.discard(x), x in s | O(1) |
| Iteration over d or s | O(n) |
String
| Operation | Time |
|---|---|
| s[i], s[i:j] (slice length k) | O(1), O(k) |
| sub in s | O(n) typical |
| s + t (lengths n, m) | O(n + m) |
| ''.join(list_of_strings) (total length n) | O(n) |
Repeated s += c in a loop does a new concatenation each time (copy of growing string), so total is O(n²). Use ''.join(parts) for O(n).
Deque (collections.deque)
append, appendleft, pop, popleft are O(1). Indexing d[i] in the middle is O(n). Use when you need O(1) at both ends (queue, sliding window).
When analyzing a loop, multiply "number of iterations" by "cost per iteration." If the inner step is x in list (O(n)), and the loop runs n times, that's O(n²). Switching to a set makes the inner step O(1), so total O(n).
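As a concrete sketch of that multiplication, here are two versions of a "does any pair sum to target?" check. Both function names are illustrative; the only difference is whether the inner membership test runs on a list or a set.

```python
def has_pair_with_sum_slow(nums, target):
    """O(n^2): each `in` scans a list slice linearly."""
    for i, x in enumerate(nums):
        if target - x in nums[i + 1:]:
            return True
    return False


def has_pair_with_sum_fast(nums, target):
    """O(n) average: set membership is O(1) on average."""
    seen = set()
    for x in nums:
        if target - x in seen:
            return True
        seen.add(x)
    return False


print(has_pair_with_sum_slow([2, 7, 11, 15], 9))  # True
print(has_pair_with_sum_fast([2, 7, 11, 15], 9))  # True
```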
Summary
- List: index/append/pop at end O(1) (append amortized); insert/pop at front or middle, in, index/count O(n); sort O(n log n).
- Dict / Set: get, set, delete, in O(1) average; iteration O(n).
- String: index O(1), slice O(k); in O(n); avoid repeated +=, use ''.join().
2.14 collections Module
Introduction
The collections module provides high-performance alternatives and extensions to built-in types. For DSA the most used are: deque (double-ended queue with O(1) append/pop at both ends), Counter (count hashable elements and get frequencies), and defaultdict (dict with a default value for missing keys). Knowing when to use each keeps your code simple and efficient.
deque (Double-Ended Queue)
collections.deque is a sequence that supports O(1) append, appendleft, pop, and popleft. Use it instead of a list when you need to add or remove at both ends (e.g., BFS queue, sliding window from both sides). Indexing in the middle is O(n), so use for queue/stack style access, not random access.
from collections import deque
q = deque()
q.append(1)
q.append(2)
q.appendleft(0) # [0, 1, 2]
x = q.popleft() # 0, q is [1, 2]
y = q.pop() # 2, q is [1]
BFS pattern: q = deque([start]), then while q: node = q.popleft(); ...; q.append(neighbor). This is the standard queue for level-order traversal and shortest path in unweighted graphs.
List insert(0, x) and pop(0) are O(n) because elements shift. Deque uses a linked structure internally, so both ends are O(1). For a queue or a stack that sometimes needs to pop from the left, always use deque.
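The BFS pattern described above can be sketched as a shortest-distance function on an unweighted graph. The adjacency-dict representation and the name bfs_distances are assumptions made for this example.

```python
from collections import deque


def bfs_distances(graph, start):
    """Return {node: number of edges from start} for every reachable node."""
    dist = {start: 0}
    q = deque([start])
    while q:
        node = q.popleft()                # O(1) pop from the left
        for neighbor in graph.get(node, []):
            if neighbor not in dist:      # first visit is the shortest path
                dist[neighbor] = dist[node] + 1
                q.append(neighbor)
    return dist


graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
print(bfs_distances(graph, "A"))  # {'A': 0, 'B': 1, 'C': 1, 'D': 2}
```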
Counter
Counter counts hashable elements. Give it an iterable (list, string) and it returns a dict-like object: keys are elements, values are counts. in, [], and iteration work like dict; missing keys return 0 (no KeyError).
from collections import Counter
c = Counter([1, 2, 2, 3, 3, 3]) # Counter({3: 3, 2: 2, 1: 1})
c = Counter("hello") # Counter({'l': 2, 'h': 1, 'e': 1, 'o': 1})
c["x"] # 0 (missing key)
c.most_common(2) # [('l', 2), ('h', 1)], the 2 most common
c.subtract(Counter("he")) # subtract counts in place
Building a frequency map from an array: freq = Counter(arr) is one line and O(n). Use most_common(k) for “top k frequent” and elements() to iterate with repetition. For anagrams, Counter(s1) == Counter(s2) is a clean O(n) check.
Many “count frequency” or “top k frequent” problems are one-liners with Counter. If you need to add or subtract counts (e.g., sliding window frequencies), use c.subtract(iterable) or update with another Counter. For “first k most common,” most_common(k) returns a list of (elem, count) pairs.
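A brief sketch of both ideas: a "top k frequent" helper built on most_common, and a window-style count adjustment with subtract. The function name top_k_frequent is illustrative.

```python
from collections import Counter


def top_k_frequent(nums, k):
    """Return the k most frequent values, most frequent first."""
    return [value for value, _ in Counter(nums).most_common(k)]


print(top_k_frequent([1, 1, 1, 2, 2, 3], 2))  # [1, 2]

# Remove the counts of characters that leave a window
window = Counter("hello")
window.subtract("he")
print(window["h"], window["l"])  # 0 2
print(+window)  # Counter({'l': 2, 'o': 1})
```

subtract keeps keys whose count has dropped to zero; the unary + shown on the last line returns a new Counter without zero or negative counts.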
defaultdict
defaultdict(factory) is a dict that calls factory() when a key is missing. defaultdict(int) gives 0 for missing keys—so d[x] += 1 works without get. defaultdict(list) gives []—so d[key].append(item) groups by key. Same O(1) average as dict.
from collections import defaultdict
freq = defaultdict(int)
for x in arr:
    freq[x] += 1 # no KeyError
groups = defaultdict(list)
for item in items:
    groups[item.key].append(item)
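A self-contained version of the "group by key" pattern is grouping anagrams, where the key is the word's sorted letters. The function name group_anagrams and the word list are chosen for illustration.

```python
from collections import defaultdict


def group_anagrams(words):
    """Group words that use exactly the same letters."""
    groups = defaultdict(list)
    for word in words:
        signature = "".join(sorted(word))  # same for all anagrams of word
        groups[signature].append(word)     # missing key starts as []
    return list(groups.values())


print(group_anagrams(["eat", "tea", "tan", "ate", "nat", "bat"]))
# [['eat', 'tea', 'ate'], ['tan', 'nat'], ['bat']]
```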
When to Use Which
- deque: Queue (BFS), stack with occasional left pop, sliding window that shrinks from the left.
- Counter: Frequency count from iterable, “top k frequent,” anagram check, subtract/add counts.
- defaultdict: Frequency (int) or “group by key” (list) when you want to avoid get or explicit key checks.
Summary
- deque: O(1) append/pop at both ends; use for BFS queue and when you need operations at both ends.
- Counter: Count elements from iterable; most_common(k), subtract; ideal for frequency and anagram problems.
- defaultdict: Dict with default value for missing keys; use for counts (int) or grouping (list).
2.15 heapq
Introduction
The heapq module provides a min-heap implementation on top of a list. The smallest element is always at index 0; after you pop it, the heap reorganizes so the next smallest is at the top. heappush and heappop are O(log n); heapify builds a heap from an existing list in O(n). There is no separate heap type: you use a list and call heapq functions so the list satisfies the heap invariant. For DSA, heaps are used for “top K,” “merge K sorted lists,” and priority queues (e.g., Dijkstra).
Why heapq Matters for DSA
When you need repeated “smallest (or largest) element” or “smallest among many candidates,” a heap gives O(log n) insert and O(log n) extract-min. That’s better than sorting the whole list each time (O(n log n)) or scanning for the min (O(n)). Top K elements, merge K sorted lists, and Dijkstra’s algorithm all rely on this “get min, add new candidates” pattern.
Basic API
- heapq.heapify(lst): Turn list lst into a min-heap in place. O(n). After this, lst[0] is the smallest element.
- heapq.heappush(heap, x): Add x to the heap, keeping the heap invariant. O(log n).
- heapq.heappop(heap): Remove and return the smallest element. O(log n).
- heapq.heappushpop(heap, x): Push x then pop the smallest. More efficient than push + pop when you’re replacing the min.
- heapq.nsmallest(k, iterable) / heapq.nlargest(k, iterable): Return the k smallest or k largest. Useful for one-off “top k”; for streaming or repeated updates, maintain a heap yourself.
import heapq
arr = [3, 1, 4, 1, 5]
heapq.heapify(arr) # arr is now a min-heap; arr[0] == 1
x = heapq.heappop(arr) # 1, heap size reduced
heapq.heappush(arr, 0) # add 0; 0 becomes new min
smallest = arr[0] # 0 — peek without popping (don't pop yet)
Min-Heap vs Max-Heap
heapq implements a min-heap only. For a max-heap, negate values: push -x, and when you pop, negate again to get the original maximum. For items with priorities, store (-priority, item) tuples so that “largest priority” becomes “smallest -priority” and pops first.
# Max-heap: negate
heapq.heappush(heap, -x) # push negative
max_val = -heapq.heappop(heap) # pop and negate back
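A runnable sketch of the (-priority, item) idea follows. The job list is invented sample data, and higher numbers are assumed to mean more urgent.

```python
import heapq

jobs = [(2, "write report"), (5, "fix outage"), (1, "tidy desk")]

heap = []
for priority, name in jobs:
    heapq.heappush(heap, (-priority, name))  # negate so the largest pops first

order = []
while heap:
    neg_priority, name = heapq.heappop(heap)
    order.append((-neg_priority, name))      # negate back to the original

print(order)
# [(5, 'fix outage'), (2, 'write report'), (1, 'tidy desk')]
```

If two jobs share a priority, the tuples are compared on the second element, so the items themselves need to be comparable (strings are).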
Time Complexity
heapify: O(n). heappush, heappop: O(log n) each. Peeking the min is O(1) (heap[0]). So “push n elements then pop n times” is O(n log n). “Top k from n” with a size-k heap: O(n log k)—push each of n elements, heap never exceeds size k.
Example: Top K Elements
Keep a min-heap of size k. For each element: if heap has fewer than k elements, push it; else if the new element is larger than the heap’s minimum, pop the min and push the new one. At the end, the heap contains the K largest. To get them in order, pop repeatedly (smallest of the K first) or use nsmallest on the heap. Alternatively, push all with negated values for “K largest” and pop k times from a max-heap (negated) to get largest first.
def top_k_largest(arr, k):
    if k <= 0:
        return [] # nothing to keep; also avoids heap[0] on an empty heap
    heap = arr[:k]
    heapq.heapify(heap) # min-heap of first k
    for x in arr[k:]:
        if x > heap[0]:
            heapq.heapreplace(heap, x) # pop min, push x (or heappop + heappush)
    return heap # k largest (unordered in heap; sort if needed)
heapq.heapreplace(heap, x) pops the smallest and then pushes x—equivalent to heappop then heappush, but slightly more efficient.
The heap is just a list; the order of elements in the list is not fully sorted—only the heap invariant holds (parent ≤ children). So heap[0] is the min, but heap[1] is not necessarily the second smallest. Don’t iterate the list expecting sorted order; use heappop to get elements in order.
For “merge K sorted lists,” push the first element of each list (with list index or iterator) into a min-heap. Pop the smallest, then push the next element from the same list. Repeat until the heap is empty. Each push/pop is O(log K); total O(N log K) for N total elements.
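The merge procedure described above can be sketched as follows, assuming the inputs are plain Python lists. The function name merge_k_sorted is illustrative; each heap entry is (value, list index, position in that list).

```python
import heapq


def merge_k_sorted(lists):
    """Merge already-sorted lists into one sorted list in O(N log K)."""
    heap = [(lst[0], i, 0) for i, lst in enumerate(lists) if lst]
    heapq.heapify(heap)
    merged = []
    while heap:
        value, i, j = heapq.heappop(heap)  # smallest remaining head
        merged.append(value)
        if j + 1 < len(lists[i]):          # push the next item from list i
            heapq.heappush(heap, (lists[i][j + 1], i, j + 1))
    return merged


print(merge_k_sorted([[1, 4, 7], [2, 5], [], [3, 6, 8]]))
# [1, 2, 3, 4, 5, 6, 7, 8]
```

The standard library also offers heapq.merge(*iterables), which returns a lazy iterator over the merged values; writing it by hand is mainly useful for practice.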
Summary
- heapq gives a min-heap on a list: heapify, heappush, heappop; min is always at heap[0].
- For a max-heap, negate values when pushing and when popping.
- Use for top K, merge K sorted lists, and priority-queue algorithms (e.g., Dijkstra).
2.16 bisect
Introduction
The bisect module provides binary search on sorted lists. It finds the position where an element would be inserted to keep the list sorted, or where an existing element sits. The search functions (bisect_left, bisect_right) are O(log n); the insort functions find the position in O(log n), but the insertion itself is O(n). Use it when you have a sorted sequence and need “find index of x” or “insert x and keep sorted” without writing binary search by hand.
Why bisect Matters for DSA
Binary search on a sorted array is a core pattern: search for a value, find the first position ≥ x, or the last position ≤ x. bisect_left and bisect_right give you these insertion points; from that you can check “is x present?” or “how many elements are < x?”. Many problems (search in rotated array, lower/upper bound, binary search on answer) build on the same idea—bisect is the standard library way to get the indices right.
Main Functions
- bisect.bisect_left(a, x): Leftmost index where x can be inserted so the list stays sorted. If x is already in a, returns the index of the first occurrence. So “is x in a?” → i = bisect_left(a, x); i < len(a) and a[i] == x.
- bisect.bisect_right(a, x) (alias bisect.bisect(a, x)): Rightmost index where x can be inserted. If x is in a, one past the last occurrence. So “count of elements ≤ x” → bisect_right(a, x); “count of elements < x” → bisect_left(a, x).
- bisect.insort_left(a, x): Insert x in a at the position that keeps order (before equal elements). O(n) for the shift.
- bisect.insort_right(a, x) (alias bisect.insort(a, x)): Insert x after equal elements. O(n) for the shift.
import bisect
a = [1, 2, 2, 3, 4]
bisect.bisect_left(a, 2) # 1 — first position for 2
bisect.bisect_right(a, 2) # 3 — one past last 2
bisect.bisect_left(a, 5) # 5 — would go at end
# Is 2 in a? i = bisect_left(a, 2); i < len(a) and a[i] == 2 → True
Left vs Right
Use bisect_left when you want “first index ≥ x” or “where to insert so x is before equals.” Use bisect_right when you want “first index > x” (one past last x) or “count of elements ≤ x” (the return value is that count when the list is 0-indexed and sorted). For “lower bound” (first ≥ x): bisect_left. For “upper bound” (first > x): bisect_right.
Time Complexity
All bisect functions are O(log n) for the search. insort is O(n) because it shifts elements to make room. So use bisect for search; use insort only when you occasionally need to maintain a sorted list with inserts (e.g., small dynamic set). For many inserts, consider a structure that supports O(log n) insert (e.g., sorted container from a library) or batch sort.
The list must be sorted (ascending) for bisect to be correct. If the list is not sorted, the result is meaningless. Also: bisect_left returns an index that can be len(a) (x is greater than all elements)—check bounds before using as an index.
“Lower bound” = first index i such that a[i] >= x → bisect_left(a, x). “Upper bound” = first index i such that a[i] > x → bisect_right(a, x). Number of elements in [L, R] in sorted a → bisect_right(a, R) - bisect_left(a, L).
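These recipes translate directly into two small helpers. The names contains and count_in_range are illustrative; both assume the list is sorted in ascending order.

```python
import bisect


def contains(a, x):
    """Membership test on a sorted list in O(log n)."""
    i = bisect.bisect_left(a, x)
    return i < len(a) and a[i] == x  # bounds check before indexing


def count_in_range(a, low, high):
    """Number of elements v in sorted a with low <= v <= high."""
    return bisect.bisect_right(a, high) - bisect.bisect_left(a, low)


a = [1, 2, 2, 3, 4, 7]
print(contains(a, 3), contains(a, 5))  # True False
print(count_in_range(a, 2, 4))         # 4
```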
Summary
- bisect_left(a, x) = leftmost insertion point (first ≥ x); bisect_right(a, x) = rightmost (first > x). List must be sorted.
- Use for “is x in sorted list?,” “count in range [L, R],” lower/upper bound. O(log n) search; insort is O(n).
2.17 itertools
Introduction
The itertools module provides iterator building blocks: combinations, permutations, Cartesian product, chaining iterables, grouping, and more. These are lazy (one element at a time) and memory-efficient. For DSA you’ll use them for “all subsets of size k,” “all permutations,” “all pairs from two lists,” and similar enumeration tasks without writing nested loops or recursion by hand.
Why itertools Matters for DSA
Problems that ask for “all combinations,” “all permutations,” or “iterate over pairs (i, j)” can be done with itertools.combinations, permutations, or product. They’re correct, readable, and avoid off-by-one errors. For backtracking you often write your own recursion, but when the problem is “enumerate all,” the standard iterators are a quick and correct option (and you can wrap them in list() if you need the full list, though that stores every result at once, e.g. n! tuples for all permutations of n items).
Combinations and Permutations
- itertools.combinations(iterable, r): All subsequences of length r in iterable order; no repeats, (a,b) same as (b,a).
- itertools.combinations_with_replacement(iterable, r): Same but elements can repeat (e.g., (1,1), (1,2), (2,2)).
- itertools.permutations(iterable, r=None): All orderings of length r (default: full length). (a,b) and (b,a) both appear.
import itertools
list(itertools.combinations([1,2,3], 2)) # [(1,2), (1,3), (2,3)]
list(itertools.permutations([1,2], 2)) # [(1,2), (2,1)]
list(itertools.combinations_with_replacement([1,2], 2)) # [(1,1),(1,2),(2,2)]
Product
itertools.product(*iterables, repeat=1) — Cartesian product: all pairs (or tuples) from the given iterables. product(A, B) is like nested loops “for a in A: for b in B.” With repeat=k, one iterable is used k times (e.g., all k-tuples from a set of choices).
list(itertools.product([1,2], ['a','b'])) # [(1,'a'),(1,'b'),(2,'a'),(2,'b')]
list(itertools.product([0,1], repeat=3)) # all 3-bit: (0,0,0)..(1,1,1)
Chain and Slice
- itertools.chain(*iterables): Iterate over the first iterable, then the second, and so on. Flattens one level. chain.from_iterable(iterables) takes one iterable of iterables.
- itertools.islice(iterable, start, stop, step): Slice an iterator (like list slice but lazy). No negative indices.
list(itertools.chain([1,2], [3,4])) # [1, 2, 3, 4]
list(itertools.islice(range(10), 2, 6)) # [2, 3, 4, 5]
groupby
itertools.groupby(iterable, key=None) — Group consecutive elements that have the same key. The iterable should be sorted by the key for meaningful groups. Yields (key, group_iterator) pairs. Useful for “run-length” style grouping.
# Consecutive equal elements (sort first if needed)
for k, g in itertools.groupby([1,1,2,2,2,3]):
    print(k, list(g)) # 1 [1,1], 2 [2,2,2], 3 [3]
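One place where grouping consecutive elements without sorting is exactly what you want is run-length encoding. The sketch below uses a hypothetical helper name, run_length_encode.

```python
from itertools import groupby


def run_length_encode(s):
    """Collapse each run of equal characters into (char, run length)."""
    return [(ch, len(list(group))) for ch, group in groupby(s)]


print(run_length_encode("aaabccdd"))  # [('a', 3), ('b', 1), ('c', 2), ('d', 2)]

# Unsorted input: the two runs of 'a' stay separate
print(run_length_encode("aabaa"))     # [('a', 2), ('b', 1), ('a', 2)]
```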
Other Useful Tools
- itertools.cycle(iterable): Repeat the iterable forever (use with islice to limit).
- itertools.repeat(x, times=None): Yield x repeatedly (or times times).
- itertools.count(start=0, step=1): Infinite counter.
For “all subsets of size k” from a list: itertools.combinations(arr, k). For “all permutations of length k”: itertools.permutations(arr, k). For nested loops over indices: product(range(n), range(m)). Convert to list only if you need to index or reuse; otherwise iterate once to save memory.
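Combining these tools, one possible way to enumerate every subset (the power set) is to chain combinations of each size. The function name all_subsets is illustrative; note that the result has 2^n entries, so this is only practical for small inputs.

```python
from itertools import chain, combinations


def all_subsets(items):
    """Return every subset of items as a tuple, smallest subsets first."""
    sizes = range(len(items) + 1)
    return list(chain.from_iterable(combinations(items, k) for k in sizes))


print(all_subsets([1, 2, 3]))
# [(), (1,), (2,), (3,), (1, 2), (1, 3), (2, 3), (1, 2, 3)]
```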
Summary
- combinations / permutations / product — enumerate subsets, orderings, or Cartesian product; lazy iterators.
- chain, islice, groupby — combine or slice iterators, group consecutive keys.
- Use itertools for “all combinations/permutations/pairs” to avoid manual loops and keep code clear.
2.18 functools
Introduction
The functools module provides tools for higher-order functions: caching, partial application, and reduction. For DSA the most important is lru_cache—a decorator that memoizes function results by argument (with an optional size limit). It turns a recursive solution with repeated subproblems into an efficient one without writing a cache by hand. Other tools like partial and reduce are useful in general Python but less central to algorithm implementation.
Why functools Matters for DSA
Recursive solutions often recompute results for the same arguments (e.g., Fibonacci, many DP problems). Memoization stores the return value for each argument so the second time you call with the same args you get the cached value. @lru_cache does this automatically: add the decorator and ensure arguments are hashable (e.g., use tuples instead of lists for state). Then the running time is proportional to the number of distinct argument values (times the work done per call) instead of exploding exponentially.
lru_cache
functools.lru_cache(maxsize=128, typed=False) — Decorator that caches the most recent results of a function. Arguments must be hashable (so use tuples, not lists, for composite state). Call the function with the same arguments again and you get the cached result (O(1) lookup) instead of recomputing.
from functools import lru_cache
@lru_cache(maxsize=None) # unbounded cache
def fib(n):
    if n <= 1:
        return n
    return fib(n - 1) + fib(n - 2)
fib(100) # fast — each n computed once
Without the decorator, fib(n) would be exponential. With it, each distinct n is computed once; subsequent calls with the same n return the cached value. Use maxsize=None for unbounded cache (all distinct argument tuples stored). Use a number (e.g., 1000) to limit memory; LRU evicts least recently used entries when full.
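To make LRU eviction concrete, here is a small sketch with a deliberately tiny cache. The function square is a hypothetical stand-in for any expensive computation; cache_info() is the inspection method that lru_cache attaches to the decorated function.

```python
from functools import lru_cache

@lru_cache(maxsize=2)
def square(x):
    return x * x

square(1)  # miss, cache holds {1}
square(2)  # miss, cache holds {1, 2}
square(3)  # miss, cache is full so the least recently used key (1) is evicted
square(3)  # hit
square(1)  # miss again, because 1 was evicted earlier

info = square.cache_info()  # named tuple: hits, misses, maxsize, currsize
```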
Hashable Arguments Only
The cache uses arguments as dict keys, so they must be hashable. If your state includes a list, convert it to a tuple for the call: f(tuple(arr), i). For multiple arguments, lru_cache treats the whole (arg1, arg2, ...) as the key.
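A minimal sketch of the tuple conversion, assuming a hypothetical helper max_from that returns the largest element of values[i:].

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def max_from(values, i):
    """Largest element of values[i:]; values must be hashable (a tuple)."""
    if i == len(values) - 1:
        return values[i]
    return max(values[i], max_from(values, i + 1))

arr = [3, 9, 2]
result = max_from(tuple(arr), 0)   # convert the list once, at the call site

try:
    max_from(arr, 0)               # a list cannot be used as a cache key
    raised = False
except TypeError:
    raised = True
```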
Memoization = “remember results by arguments.” lru_cache is memoization with an optional cap (LRU eviction). Same idea as a manual dict cache: if args in cache: return cache[args]; cache[args] = result. The decorator does this for you and handles thread-safety and size limit.
partial
functools.partial(func, *args, **kwargs) — Returns a new callable that fixes some arguments of func. Useful when you need to pass a function that takes one argument (e.g., to sorted or map) but your function takes two—fix the second with partial.
from functools import partial
def power(base, exp):
    return base ** exp
square = partial(power, exp=2)
square(5) # 25
reduce
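functools.reduce(function, iterable[, initializer]): Apply a two-argument function cumulatively from left to right: function(function(...(function(initial, first), second), ...), last). The initializer is optional and has no None default: if it is omitted, the first element is the starting value, and an empty iterable raises TypeError. Classic example: reduce(lambda a, b: a + b, [1,2,3], 0) → 6. reduce is the standard-library tool for a “fold” (reducing a sequence to one value), but in Python 3 it lives in functools and is not a built-in; for readability, a simple loop is often clearer. Use it when it fits the problem (e.g., product of a list: reduce(operator.mul, arr, 1), after import operator).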
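A small sketch of reduce next to the equivalent loop, including the empty-iterable behaviour; the sample list is arbitrary.

```python
import operator
from functools import reduce

arr = [2, 3, 4]
product = reduce(operator.mul, arr, 1)   # ((1 * 2) * 3) * 4

# The same fold written as an explicit loop.
loop_product = 1
for x in arr:
    loop_product *= x

# With a starting value, an empty iterable returns that value.
empty_product = reduce(operator.mul, [], 1)

# Without one, an empty iterable raises TypeError.
try:
    reduce(operator.mul, [])
    raised = False
except TypeError:
    raised = True
```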
A common mistake is using lru_cache on a function that takes unhashable arguments (e.g., a list): you’ll get TypeError: unhashable type: 'list'. Convert lists to tuples when calling, or use a tuple of primitive args that describes the state. Also clear the cache between test cases in competitive coding: use fib.cache_clear() (or your decorated function’s cache_clear()).
In DP/recursion, define the function with the minimal set of (hashable) parameters that define the state. Decorate with @lru_cache(maxsize=None). If you have a list that’s part of the state, pass indices or a tuple of relevant values instead of the list so the key is hashable.
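One way to follow this advice is to keep the list outside the cached function and memoize on an index only. The sketch below assumes a hypothetical example problem (maximum sum of non-adjacent elements); defining the cached helper inside the outer function also gives each call a fresh cache. It ends with cache_clear() on a module-level function.

```python
from functools import lru_cache

def max_non_adjacent_sum(nums):
    """Max sum of elements with no two adjacent (illustrative problem)."""
    @lru_cache(maxsize=None)
    def best(i):
        # State is just the index i: best sum achievable using nums[i:].
        if i >= len(nums):
            return 0
        return max(nums[i] + best(i + 2), best(i + 1))
    return best(0)

answer = max_non_adjacent_sum([2, 7, 9, 3, 1])   # picks 2 + 9 + 1

@lru_cache(maxsize=None)
def fib(n):
    return n if n <= 1 else fib(n - 1) + fib(n - 2)

fib(30)
size_before = fib.cache_info().currsize   # one entry per n in 0..30
fib.cache_clear()                         # reset between test cases
size_after = fib.cache_info().currsize
```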
Summary
- lru_cache — Memoization decorator; use for recursive/DP functions with hashable args to avoid recomputation.
- partial — Fix some arguments of a function; useful for callbacks. reduce — Fold an iterable to a single value.
- For DSA, lru_cache is the main tool; ensure arguments are hashable (e.g., tuple, int).
2.19 Python Memory Management (Reference Counting & GC)
Introduction
Python manages memory mainly through reference counting: each object keeps a count of how many references point to it. When that count drops to zero, the object is reclaimed immediately. A garbage collector (GC) also runs to break reference cycles (e.g., A refers to B, B refers to A) that reference counting alone cannot free. For DSA you rarely tune this—but understanding “variables are references” and “when is an object freed?” helps you reason about aliasing, copies, and space.
Why This Matters for DSA
You don’t usually optimize Python memory at the level of refcounts. What matters is: assignment doesn’t copy objects (a = b means both names point to the same object); mutating that object affects all names; when you need a copy, use .copy() or list() or slicing (all of which make shallow copies). Knowing that “no more references = object can be freed” explains why a big structure can be reclaimed once you drop the only reference to it (e.g., reassign the variable, or return from a function so the local reference disappears).
Reference Counting
Every object has a reference count: how many names, containers, or other objects point to it. When you assign x = [1, 2], the list’s refcount is 1. When you do y = x, it becomes 2—same object, two names. When y is reassigned or goes out of scope, the count drops. When it reaches 0, Python frees the object (and any objects it alone referenced). This is automatic and immediate for non-cyclic structures.
import sys
a = [1, 2, 3]
sys.getrefcount(a) # 2 (a + the argument to getrefcount)
b = a
sys.getrefcount(a) # 3
del b
# refcount drops; when a is no longer used, list is freed
getrefcount is for curiosity; the exact number includes the temporary reference from passing a to the function. The idea: more references = higher count; when all references are gone, the object is reclaimed.
Cycles and the Garbage Collector
If object A holds a reference to B and B holds a reference to A, their refcounts never become 0 even if no external reference exists. Python’s cycle detector (in the gc module) periodically finds such cycles and reclaims them. So you can have circular references (e.g., a graph node pointing to neighbors) and they will still be collected when unreachable. You normally don’t need to call gc.collect(); the runtime runs it when needed.
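The following sketch shows a cycle surviving the loss of its external references until the cycle detector runs. It assumes CPython; the Node class is hypothetical, and automatic collection is disabled only to keep the demonstration deterministic.

```python
import gc
import weakref

class Node:
    def __init__(self):
        self.partner = None

gc.disable()                      # keep the automatic collector out of the demo
a, b = Node(), Node()
a.partner, b.partner = b, a       # reference cycle: a -> b -> a
probe = weakref.ref(a)            # a weak reference does not keep the object alive

del a, b                          # no external references remain
alive_after_del = probe() is not None       # the cycle still holds both objects

gc.collect()                      # run the cycle detector explicitly
alive_after_collect = probe() is not None
gc.enable()
```

In normal code you would not disable or trigger the collector yourself; it is done here only so the before/after states can be observed.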
Implications for Your Code
- Aliasing: a = b does not copy; both refer to the same object. Changes via a are visible via b. For DSA this is why “pass by reference” for lists matters: mutating inside a function affects the caller’s list.
- When objects are freed: When the last reference disappears (variable reassigned, scope left, container holding the reference cleared), the object becomes eligible for reclamation. So a large local list is freed when the function returns (assuming you don’t return it or store it elsewhere).
- No explicit free: You don’t manually free memory; dropping references is enough. To release a large structure early, delete the name (del big_list) or reassign it so nothing else points to the object.
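A brief sketch of aliasing versus copying. The helper append_zero is hypothetical; copy.deepcopy is the standard-library call for copying nested structures.

```python
import copy

def append_zero(values):
    values.append(0)          # mutates the caller's list; no copy was made

original = [1, 2]
alias = original              # second name for the same list
append_zero(alias)            # visible through both names

shallow = original.copy()     # independent outer list
shallow.append(99)            # does not affect original

nested = [[1], [2]]
shallow_nested = list(nested)         # new outer list, same inner lists
deep_nested = copy.deepcopy(nested)   # inner lists are copied too
nested[0].append(7)
```

After this runs, the shallow copy of nested sees the appended 7 (it shares the inner lists), while the deep copy does not.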
“Reference” here means a pointer to an object: a variable name, an element in a list, a dict value, etc. Immutable objects (ints, tuples) can be interned or shared by the implementation, so refcount might be higher than you expect—but the model “no references → freed” still holds.
Summary
- Python uses reference counting plus a garbage collector for cycles. When refcount hits 0 (and no cycle keeps it alive), the object is reclaimed.
- Assignment copies the reference, not the object—so aliasing and mutation matter. Use explicit copy when you need an independent copy.
- For DSA, this backs your mental model of “pass by reference” and when big structures get freed (when no reference remains).
2.20 Internals of list & dict (Dynamic Resizing & Amortized Analysis)
Introduction
Lists and dicts in Python grow dynamically: they allocate more memory when needed instead of fixing a size upfront. That growth is done in chunks (list) or by resizing the hash table (dict), so individual operations stay fast on average. Understanding dynamic resizing and amortized analysis explains why we say list.append is O(1) amortized and dict get/set is O(1) average—and why the occasional "expensive" resize doesn't make the whole sequence slow.
Why This Matters for DSA
When we state "append is O(1) amortized," we mean: over n appends, total time is O(n), so average per append is O(1). You don't need to implement resizing yourself—but knowing that lists over-allocate and resize in steps (and that dicts resize when load factor gets high) lets you trust the complexity we use in analysis and avoid micro-optimizations like preallocating list size "to be safe."
List: Dynamic Array
A list is implemented as a dynamic array: a contiguous block of pointers to objects. When you append and the current block is full, Python allocates a larger block, copies the pointers over, and frees the old one. It doesn't grow by one slot each time—it uses a growth strategy (in CPython, the new capacity is computed so that the sequence of appends does O(n) total work). So most appends just write into an existing empty slot (O(1)); occasionally one triggers a resize (O(current size)), but that cost is "spread" over the next many cheap appends. That's amortized O(1) per append.
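Over-allocation can be observed indirectly in CPython with sys.getsizeof: the reported size changes only when the underlying block is reallocated. This is a rough sketch; the exact growth pattern is an implementation detail and varies between versions.

```python
import sys

n = 1000
values = []
resizes = 0
last_size = sys.getsizeof(values)

for i in range(n):
    values.append(i)
    size = sys.getsizeof(values)
    if size != last_size:     # the underlying block was reallocated
        resizes += 1
        last_size = size
```

The resizes counter ends up far smaller than n, which is the point: most appends write into spare capacity.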
Amortized means: if you do n operations, total cost is O(n), so average cost per operation is O(1). A single operation might sometimes be O(k) (e.g., resize when size was k), but the sum of all such costs over the life of the structure is linear in the number of operations. So we say "append is O(1) amortized."
Dict: Hash Table and Resizing
A dict is a hash table: an array of buckets, each holding key-value entries. On get/set, the key is hashed and the bucket is found; collisions are handled by probing or chaining (CPython uses open addressing, i.e., probing). When the table gets too full (high load factor = number of entries / number of buckets), the table is resized (e.g., doubled) and all entries are rehashed into the new table. That resize is O(n), but it happens only when the load factor crosses a threshold, so over many insertions the average cost per insertion stays O(1). Deletes don't shrink the table immediately in CPython; the table can grow and stay large. So we say get/set/delete are O(1) average (assuming a good hash function and normal load).
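To illustrate load-factor-driven resizing, here is a toy hash map with separate chaining. It is a teaching sketch only and is not how CPython implements dict; the class name, the starting capacity of 8, and the 0.75 threshold are all assumptions chosen for the example.

```python
class ChainedHashMap:
    """Toy hash map with separate chaining; illustrative only."""

    def __init__(self, capacity=8, max_load=0.75):
        self._buckets = [[] for _ in range(capacity)]
        self._size = 0
        self._max_load = max_load
        self.resizes = 0

    def _bucket(self, key):
        return self._buckets[hash(key) % len(self._buckets)]

    def set(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)   # overwrite an existing key
                return
        bucket.append((key, value))
        self._size += 1
        if self._size / len(self._buckets) > self._max_load:
            self._resize()

    def get(self, key, default=None):
        for k, v in self._bucket(key):
            if k == key:
                return v
        return default

    def _resize(self):
        # O(n): double the bucket array and rehash every entry.
        entries = [pair for bucket in self._buckets for pair in bucket]
        self._buckets = [[] for _ in range(2 * len(self._buckets))]
        for k, v in entries:
            self._bucket(k).append((k, v))
        self.resizes += 1

table = ChainedHashMap()
for i in range(1000):
    table.set(i, i * i)
```

With doubling, the number of resizes grows only logarithmically with the number of insertions, so the rehashing cost is spread thinly across them.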
Amortized Analysis Intuition
For a list: suppose every time we double the capacity we do work proportional to the current size. So we do 1 + 2 + 4 + 8 + … + (capacity at some point) units of copy work. That sum is less than 2 × final size. So for n appends, total copy work is O(n), hence O(1) per append on average. The "expensive" resizes are rare and their cost is amortized over the many cheap appends that follow. Same idea for dict: occasional O(n) resize, but averaged over all insertions, each insertion is O(1).
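The doubling argument can be checked with a toy dynamic array that counts how many elements it copies. This is a sketch under the pure-doubling assumption used in the paragraph above, not CPython's actual growth rule; the class name DoublingArray is hypothetical.

```python
class DoublingArray:
    """Toy dynamic array that doubles its capacity when full."""

    def __init__(self):
        self._capacity = 1
        self._size = 0
        self._slots = [None]
        self.copies = 0   # total elements moved during resizes

    def append(self, value):
        if self._size == self._capacity:
            # Resize: allocate double the space and copy every element over.
            new_slots = [None] * (2 * self._capacity)
            for i in range(self._size):
                new_slots[i] = self._slots[i]
            self.copies += self._size
            self._slots = new_slots
            self._capacity *= 2
        self._slots[self._size] = value
        self._size += 1

n = 1000
arr = DoublingArray()
for i in range(n):
    arr.append(i)
```

For n = 1000 the copy work is 1 + 2 + 4 + ... + 512 = 1023, which stays below 2n.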
In interviews you can say: "List append is O(1) amortized because the list over-allocates and resizes in chunks; over n appends the total work is O(n). Dict get/set is O(1) average because it's a hash table with resizing when load factor gets high." That shows you understand the internals at a level that supports complexity analysis.
Summary
- List = dynamic array; grows in chunks so append is O(1) amortized (total work for n appends is O(n)).
- Dict = hash table; resizes when load factor is high so get/set are O(1) average.
- Amortized = total cost of n operations is O(n), so average per operation is O(1). Occasional expensive resize is spread over many cheap operations.
2.21 Object Internals (slots, is vs ==)
Introduction
Two concepts come up in Python's object model: is vs == (identity vs value equality), and __slots__ (restricting instance attributes to save memory and speed up attribute access). For DSA you'll mostly care about is vs ==, especially the convention of writing is None instead of == None, and only rarely about __slots__, unless you create huge numbers of small objects.
is vs ==
is checks identity: do two names refer to the exact same object in memory? == checks value equality: do the objects compare equal (via __eq__)? Two different list instances with the same elements are == but not is. The same object is both is and ==.
a = [1, 2, 3]
b = [1, 2, 3]
a == b # True — same values
a is b # False — different objects
c = a
a is c # True — same object
Use is None (and is not None) to check for None—it's the conventional and correct way, because there's only one None object. Using == None works but can be overridden by a custom __eq__; is None cannot. For other values, use == when you care about equality of content.
A common mistake is using == when you mean identity (e.g., a None check, which should be "if x is None") or using is when you mean value equality. For small integers and some strings, is can seem to work because of interning/caching, but don't rely on it; use == for comparing values.
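A small sketch of why is None is the safer check: a class can override __eq__, but identity cannot be overridden. AlwaysEqual is a hypothetical class written for this example.

```python
class AlwaysEqual:
    """Hypothetical class whose __eq__ claims equality with everything."""
    def __eq__(self, other):
        return True

obj = AlwaysEqual()
equals_none = (obj == None)   # True: the overridden __eq__ answers
is_none = (obj is None)       # False: identity cannot be overridden

x = [1, 2]
y = list(x)                   # equal contents, different object
```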
__slots__
By default, each instance of a class has a __dict__ that stores its attributes—flexible but uses extra memory per instance. __slots__ is a class attribute that lists the only attribute names the instances are allowed to have. The class then uses a fixed-size structure (like a tuple) instead of a dict for those attributes, saving memory and giving slightly faster attribute access. You can't add new instance attributes beyond those in __slots__.
class Point:
    __slots__ = ('x', 'y')
    def __init__(self, x, y):
        self.x, self.y = x, y
p = Point(1, 2)
# p.z = 3 # AttributeError — not in __slots__
Use __slots__ when you have many small instances (e.g., graph nodes, events) and want to reduce memory. For DSA it's an optimization, not required for correctness.
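A side-by-side sketch of a regular class and a slotted one; PlainPoint and SlotPoint are illustrative names.

```python
class PlainPoint:
    def __init__(self, x, y):
        self.x, self.y = x, y

class SlotPoint:
    __slots__ = ("x", "y")
    def __init__(self, x, y):
        self.x, self.y = x, y

plain = PlainPoint(1, 2)
slotted = SlotPoint(1, 2)

plain.z = 3                               # fine: stored in plain.__dict__
has_dict = hasattr(slotted, "__dict__")   # slotted instances have no per-instance dict

try:
    slotted.z = 3                         # not listed in __slots__
    raised = False
except AttributeError:
    raised = True
```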
Summary
- is = identity (same object); == = value equality. Use is None / is not None for None checks.
- __slots__ = fixed set of instance attributes; saves memory and speeds access when you have many small objects.
2.22 Concurrency Basics (GIL & Asyncio Overview)
Introduction
CPython, the standard Python implementation, has a Global Interpreter Lock (GIL): only one thread can execute Python bytecode at a time in a single process. So multithreading doesn't give you parallel execution of CPU-bound Python code; threads take turns. (Python 3.13 added an optional free-threaded build without the GIL, but the default build still has it.) For I/O-bound work (network, disk), threads can still help because the lock is released while waiting. Asyncio is a different model: cooperative concurrency with async/await, one thread, many tasks that yield during I/O. For DSA and algorithm interviews you rarely need concurrency; this section is an overview so you know the landscape.
The GIL (Global Interpreter Lock)
The GIL is a mutex that protects access to Python objects. Only one thread holds it at a time, so only one thread runs Python bytecode at a time. That simplifies the interpreter (no fine-grained locking on every object) but means:
- CPU-bound: Multiple threads running pure Python computation don't run in parallel—they serialize. To use multiple CPU cores for CPU-bound work, use multiprocessing (separate processes, each with its own interpreter and GIL) or offload to C extensions that release the GIL.
- I/O-bound: While a thread is waiting for I/O (network, file), it typically releases the GIL. So other threads can run. Multithreading can still speed up I/O-bound programs (e.g., many simultaneous HTTP requests) because waiting is overlapped.
For algorithm contests and interviews, code is almost always single-threaded. The GIL matters when you design systems (e.g., "should we use threads or processes for this worker pool?") or when you optimize I/O-heavy scripts. For DSA problem-solving, you can ignore the GIL.
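As a rough illustration of overlapping I/O waits with threads, the sketch below uses time.sleep as a stand-in for a network call. fake_download is a hypothetical function, and the timing is approximate.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_download(task_id):
    """Stand-in for an I/O call; time.sleep releases the GIL while waiting."""
    time.sleep(0.2)
    return task_id

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(fake_download, range(4)))
elapsed = time.perf_counter() - start   # close to one wait, not four in a row
```

Replacing the sleep with a pure-Python CPU loop would remove most of the benefit, since the threads would then compete for the GIL.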
Asyncio Overview
Asyncio provides cooperative concurrency: you define coroutines with async def and await other coroutines or I/O operations. One thread runs an event loop; when a coroutine hits await (e.g., waiting for a socket), the loop can run another coroutine. So many I/O operations can be in flight without blocking the thread—useful for servers handling many connections or scripts making many HTTP requests. It's not parallelism (one thread); it's concurrency (overlapping I/O wait).
Key ideas: async def defines a coroutine; await yields until the awaited thing completes; asyncio.run(main()) runs the top-level coroutine. For DSA you don't need to write async code; just know that asyncio is for I/O-bound concurrency on one thread, not for making CPU-bound algorithms faster.
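A minimal asyncio sketch of those key ideas. fake_request is a hypothetical coroutine in which asyncio.sleep stands in for real I/O; asyncio.gather runs the coroutines concurrently and returns their results in argument order.

```python
import asyncio
import time

async def fake_request(task_id):
    await asyncio.sleep(0.1)   # yields to the event loop while "waiting"
    return task_id * 10

async def main():
    # gather schedules all coroutines concurrently on one thread.
    return await asyncio.gather(*(fake_request(i) for i in range(5)))

start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start   # close to one wait, not five
```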
When to Use What
- Single-threaded: Default for DSA, scripts, most interview code.
- Multithreading: I/O-bound work where you want to overlap waits (e.g., many URLs); limited by GIL for CPU.
- Multiprocessing: CPU-bound work to use multiple cores; no GIL sharing (each process has its own).
- Asyncio: I/O-bound, many concurrent operations; one thread, cooperative scheduling.
Summary
- The GIL allows only one thread to run Python bytecode at a time; multithreading doesn't parallelize CPU-bound Python code.
- Asyncio = cooperative concurrency with
async/await; good for I/O-bound, one thread. - For DSA and interviews, single-threaded code is the norm; concurrency is for system design and I/O-heavy applications.
3.1 Big-O Notation
Introduction
Big-O notation describes how the time or space used by an algorithm grows as the input size grows. We use it to compare algorithms and to state guarantees: "this runs in O(n) time" means the running time is at most proportional to n (for large n), up to a constant factor. Big-O is the language of complexity analysis in interviews and in practice—you must be able to state it, derive it from code, and compare two Big-O expressions.
Why Big-O Matters
We care about growth rate, not the exact number of nanoseconds. When n doubles, does the time double (O(n)), quadruple (O(n²)), or stay roughly the same (O(1))? That tells us whether an algorithm will scale. Problem constraints (e.g., n ≤ 10⁵) combined with Big-O tell you if a solution will pass (e.g., O(n log n) is fine, O(n²) might be too slow). In interviews, you're expected to state complexity and justify it.
Formal Definition (Intuitive)
We say T(n) is O(g(n)) if, for large enough n, T(n) is at most a constant multiple of g(n). In symbols: there exist constants c > 0 and n₀ such that for all n ≥ n₀, T(n) ≤ c · g(n). So we ignore constant factors and lower-order terms: 5n + 3 is O(n), 2n² + n is O(n²). Big-O is an upper bound: "no worse than this growth."
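The definition can be made concrete by picking witnesses for 5n + 3. Algebraically, 5n + 3 ≤ 6n exactly when n ≥ 3, so c = 6 and n₀ = 3 work. The check below is only a finite sanity test of that choice, not a proof.

```python
def T(n):
    return 5 * n + 3

c, n0 = 6, 3

# T(n) <= c * n should hold for every n >= n0 (checked on a finite range).
holds = all(T(n) <= c * n for n in range(n0, 10_001))

# Below n0 the inequality may fail, which is why the definition allows n0.
fails_below_n0 = T(2) > c * 2   # 13 > 12
```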
Common Complexity Classes
From best to worst (for large n):
- O(1) — Constant. Same cost regardless of n (e.g., hash lookup, array index).
- O(log n) — Logarithmic. Doubling n adds a constant amount of work (e.g., binary search, balanced tree operations).
- O(n) — Linear. Doubling n doubles the work (e.g., one pass over an array).
- O(n log n) — Linearithmic. Typical for efficient sorting (e.g., merge sort, heapsort).
- O(n²) — Quadratic. Nested loops over n (e.g., two loops over the array).
- O(n³), O(2ⁿ), O(n!) — Higher; often too slow for large n unless the problem size is tiny.
Big-O is only an upper bound, so a linear-time algorithm is technically also O(n²); by convention we state the tightest bound we can and write O(n), not O(n²), when the bound is linear. So "runs in O(n) time" means time ≤ c·n for some c and large n. The base of the logarithm doesn't matter for Big-O (log₂ n and log₁₀ n differ by a constant factor), so we write O(log n).
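A quick way to build intuition for these classes is to count loop iterations at n and at 2n. The three counting functions below are illustrative stand-ins for linear, quadratic, and logarithmic algorithms.

```python
def count_linear(n):
    steps = 0
    for _ in range(n):
        steps += 1
    return steps

def count_quadratic(n):
    steps = 0
    for _ in range(n):
        for _ in range(n):
            steps += 1
    return steps

def count_logarithmic(n):
    steps = 0
    while n > 1:       # halve n until it reaches 1
        n //= 2
        steps += 1
    return steps

n = 256
linear_ratio = count_linear(2 * n) / count_linear(n)            # doubles
quadratic_ratio = count_quadratic(2 * n) / count_quadratic(n)   # quadruples
log_increase = count_logarithmic(2 * n) - count_logarithmic(n)  # one extra step
```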
How to Derive Big-O from Code
- Count basic steps (or representative operations) as a function of input size n.
- Drop constant factors (e.g., 3n → n).
- Keep the dominant term (the one that grows fastest as n grows). So 5n + 10 → O(n); n² + n → O(n²).
Single loop that does O(1) work per iteration → O(n). Two nested loops, each over n → O(n²). Loop that halves n each time → O(log n). Recursion with one call per level and n levels → O(n); with two calls per level and n levels (like naive Fibonacci) → exponential.
# O(n): one loop
for i in range(len(arr)):
    process(arr[i])
# O(n²): two nested loops
for i in range(n):
    for j in range(n):
        do_something()
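The exponential blow-up of naive Fibonacci can be seen by counting calls. The helper fib_calls is a hypothetical instrumented version that returns both the value and the number of calls made.

```python
def fib_calls(n):
    """Return (fib(n), number of calls made) for naive recursion."""
    if n <= 1:
        return n, 1
    a, calls_a = fib_calls(n - 1)
    b, calls_b = fib_calls(n - 2)
    return a + b, calls_a + calls_b + 1

value_10, calls_10 = fib_calls(10)
value_20, calls_20 = fib_calls(20)
```

Going from n = 10 to n = 20 multiplies the call count by more than 100, whereas a linear algorithm would only double its work.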
Summary
- Big-O = upper bound on growth rate; we ignore constants and lower-order terms.
- Common: O(1), O(log n), O(n), O(n log n), O(n²); know what they mean and when they arise.
- Derive from code by counting steps, dropping constants, keeping the dominant term.
3.2 Theta & Omega Notation
Introduction
Big-O gives an upper bound: "the algorithm is no worse than this growth." Sometimes we need a lower bound ("at least this much work") or a tight bound ("exactly this growth, up to constants"). Omega (Ω) is the lower bound; Theta (Θ) means "both upper and lower bound"—the growth is tightly characterized. Together, O, Ω, and Θ are the standard asymptotic notation; in interviews you'll mostly use O, but knowing Θ and Ω helps you state "optimal" precisely and understand lower-bound arguments.
Why This Topic Matters
When we say "this algorithm is optimal," we often mean its running time is Θ(f(n)) and that any algorithm for the problem must take Ω(f(n))—so we've matched the lower bound. For example, comparison-based sorting is Ω(n log n); merge sort is O(n log n), so merge sort is optimal (Θ(n log n) for that problem). Omega is also used in proofs: "you must look at every element at least once, so the algorithm is Ω(n)." Theta is the right way to say "the complexity is exactly this order of growth."
Omega (Ω): Lower Bound
We say T(n) is Ω(g(n)) if, for large enough n, T(n) is at least a constant multiple of g(n). Formally: there exist constants c > 0 and n₀ such that for all n ≥ n₀, T(n) ≥ c · g(n). So Ω describes a lower bound: "the algorithm does at least this much work." Example: finding the maximum in an unsorted array by comparison is Ω(n), because you must look at every element (otherwise an unseen element might be the max). So any correct algorithm is at least linear.
Big-O = "no worse than" (upper bound). Omega = "no better than" (lower bound). Saying "T(n) is Ω(n)" means T(n) grows at least as fast as n (up to a constant). So we're giving a guarantee that the cost doesn't disappear or grow slower than n.
Theta (Θ): Tight Bound
We say T(n) is Θ(g(n)) if T(n) is both O(g(n)) and Ω(g(n)). So for large n, T(n) is sandwiched between two constant multiples of g(n): c₁·g(n) ≤ T(n) ≤ c₂·g(n) for some c₁, c₂ > 0 and n ≥ n₀. That means the growth rate is exactly g(n), up to constants—no faster, no slower. Example: merge sort does Θ(n log n) comparisons—we can prove both O(n log n) and Ω(n log n) for comparison-based sorting on n elements, so Θ(n log n) is the tight bound.
Relationship: T(n) = Θ(g(n)) if and only if T(n) = O(g(n)) and T(n) = Ω(g(n)). So when you've proved both an upper and a lower bound with the same g(n), you write Θ(g(n)).
Comparison of O, Ω, and Θ
| Notation | Meaning | Use |
|---|---|---|
| O(g(n)) | T(n) ≤ c·g(n) for large n (upper bound) | "Grows no faster than g(n)" |
| Ω(g(n)) | T(n) ≥ c·g(n) for large n (lower bound) | "Grows no slower than g(n)" |
| Θ(g(n)) | T(n) between c₁·g(n) and c₂·g(n) (tight) | "Exactly this growth" |
Examples
- Linear scan to find max: We do n−1 comparisons → O(n). We must touch every element → Ω(n). So the algorithm is Θ(n).
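- Binary search (worst case, e.g., target absent): Each step halves the range → O(log n) comparisons. In that case we need about log₂ n steps to narrow from n to 1 → Ω(log n). So the worst case is Θ(log n). (A successful search can finish in one step if the target happens to be the middle element.)
- Bubble sort: O(n²) (nested loops). The worst case (e.g., reverse order) also takes Ω(n²) steps, so the worst case is Θ(n²). The best case (already sorted, with an early-exit check) is Θ(n), not Θ(n²); a Θ bound always refers to a specific scenario (e.g., worst case).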
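The first two examples can be checked by counting operations. The helpers below are illustrative instrumented versions of a linear max scan and a standard binary search.

```python
def max_with_count(values):
    """Return (max, comparisons) using a single left-to-right scan."""
    best, comparisons = values[0], 0
    for v in values[1:]:
        comparisons += 1
        if v > best:
            best = v
    return best, comparisons

def binary_search_steps(sorted_values, target):
    """Return (index or -1, loop iterations) for a standard binary search."""
    lo, hi, steps = 0, len(sorted_values) - 1, 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_values[mid] == target:
            return mid, steps
        if sorted_values[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1, steps

n = 1024
data = list(range(n))
_, max_comparisons = max_with_count(data)                        # n - 1 comparisons
_, worst_steps = binary_search_steps(data, n)                    # absent target
_, best_steps = binary_search_steps(data, data[(n - 1) // 2])    # middle element
```

For n = 1024 the absent-target search takes 11 iterations (about log₂ n), while hitting the middle element takes one.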
Claim: "Comparison-based sorting of n elements is Ω(n log n)." So any algorithm that only compares elements must do at least on the order of n log n comparisons in the worst case. Merge sort does O(n log n), so merge sort is optimal for that model—its worst case is Θ(n log n).
When to Use Which
- O: Most common. "My algorithm runs in O(n²) time." Safe and correct as long as you've proved an upper bound.
- Ω: When you're proving "you can't do better" or "at least this much work is required." Used in lower-bound proofs and to justify optimality.
- Θ: When you've proved both upper and lower bounds with the same growth. "Merge sort is Θ(n log n) in the comparison model." More precise than O when you know the bound is tight.
A common mistake is saying "this algorithm is Θ(n)" when you've only shown it's O(n). Theta requires a matching lower bound. For example, "find max in array" is O(n) and Ω(n), so Θ(n). But "check if array has an even number" can finish in O(1) if an even number appears early: you might not read all n elements, so the running time is O(n) but not Ω(n) over all inputs. Its worst case is still Θ(n), because an array with no even numbers forces a full scan.
In interviews you'll usually state Big-O. If the interviewer asks "is that tight?" or "can we do better?", you can say: "It's O(n); and we need Ω(n) because we must look at every element, so it's Θ(n)—optimal for this problem." That shows you understand upper and lower bounds.
Summary
- Ω(g(n)) = lower bound: T(n) ≥ c·g(n) for large n. "At least this much work."
- Θ(g(n)) = tight bound: T(n) = O(g(n)) and T(n) = Ω(g(n)). "Exactly this growth."
- Use O for upper bounds (most common); use Ω for lower-bound proofs; use Θ when you have both and they match.
3.3 Best / Worst / Average Case
Introduction
So far we've talked about Big-O, Theta, and Omega as ways to describe how an algorithm's cost grows with input size. But the same algorithm can take different amounts of time depending on what the input looks like. Linear search might find the target in the first cell (one step) or in the last (n steps). That's why we distinguish best case, worst case, and average case: they tell us how the algorithm behaves under different input scenarios. Master this and you'll know exactly what to say in interviews and how to choose algorithms in practice.
Real-World Analogy
Imagine searching for your keys at home. Best case: they're on the table by the door—you find them in one look. Worst case: they're in the last drawer you check, after searching every room. Average case: sometimes they're in the first place, sometimes the middle, sometimes the last—over many days, you do "about half" the possible search. The method (where you look) is the same; only the input (where the keys actually are) changes the cost. Algorithms work the same way: same code, different inputs → different runtimes.
Formal Definitions
Best Case
The best case is the scenario (or type of input) that minimizes the algorithm's work. For a given input size n, it's the minimum number of steps (or comparisons, or operations) the algorithm can take over all valid inputs of that size. We express it with Big-O (or Theta when the bound is tight) and call it "best-case time complexity."
- Example: Linear search in an array of size n. Best case = target is at index 0 → 1 comparison → O(1) best case.
Worst Case
The worst case is the scenario that maximizes the algorithm's work. For input size n, it's the maximum number of steps over all valid inputs. We almost always care about worst case when we say "time complexity" because it's a guarantee: the algorithm will never do more than this, no matter how unlucky the input.
- Example: Linear search, target not present or at last index → n comparisons → O(n) worst case.
Average Case
The average case is the expected (average) number of steps over some distribution of inputs—usually we assume all inputs of size n are equally likely, or we use a probability distribution that matches reality. Average case is harder to define and analyze than best/worst, but it often reflects "typical" performance.
- Example: Linear search with target equally likely at any index (1 to n) or absent. If present: average position ≈ n/2 → about n/2 comparisons. Often we still say O(n) average, since the constant factor doesn't change the growth class.
Best/worst/average refer to which input we're considering, not different algorithms. We then describe each with asymptotic notation (O, Θ, Ω). So we get phrases like "worst-case O(n²)" or "average-case Θ(n log n)."
Why This Topic Matters
In interviews and in production, you need to know: (1) Worst case—so you can guarantee latency and avoid surprises. (2) Best case—so you know when an algorithm can be very fast (e.g., early exit). (3) Average case—when inputs are random or typical, average often predicts real behavior. Many algorithms (e.g., quicksort) have a bad worst case but a good average case; choosing them means accepting that worst case in exchange for speed on average.
Mental Model
Think of the algorithm as a fixed procedure. For each input size n, imagine all possible inputs of that size. Each input leads to some number of steps. Best case = minimum over those inputs; worst case = maximum; average case = average (with a defined distribution). The three cases can have different Big-O classes (e.g., best O(1), worst O(n), average O(n)).
Step-by-Step: Analyzing an Algorithm for All Three Cases
- Identify the "cost" you're measuring (e.g., comparisons, swaps, or total operations).
- Ask: what input minimizes this cost? That gives the best case; count steps and express in Big-O.
- Ask: what input maximizes this cost? That gives the worst case.
- For average case, define a distribution over inputs (e.g., uniform), compute expected cost, then simplify to Big-O.
Example: Linear Search
Problem: find index of target in list arr, or return -1 if not present.
def linear_search(arr, target):
for i in range(len(arr)):
if arr[i] == target:
return i
return -1
- Best case: target at index 0. One comparison, then return. → O(1).
- Worst case: target at last index or absent. n comparisons. → O(n).
- Average case (target in array, uniform position): Position k (1-based) with probability 1/n; comparisons = k. Expected comparisons = (1 + 2 + … + n) / n = n(n+1)/(2n) = (n+1)/2 → Θ(n).
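The three cases above can be verified by counting comparisons for every possible target position. comparisons_to_find is a hypothetical instrumented variant of linear_search.

```python
def comparisons_to_find(arr, target):
    """Number of element comparisons linear search makes before returning."""
    for count, value in enumerate(arr, start=1):
        if value == target:
            return count
    return len(arr)   # absent target: every element was compared

n = 101
arr = list(range(n))
costs = [comparisons_to_find(arr, target) for target in arr]

best = min(costs)          # target at index 0
worst = max(costs)         # target at the last index
average = sum(costs) / n   # uniform over positions, equals (n + 1) / 2
```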
Example: Binary Search (on sorted array)
We repeatedly compare with the middle and discard half. Best case: target is the middle element → 1 comparison → O(1). Worst case: target absent or at a leaf of the "decision tree"—we halve until size 1 → about log₂ n comparisons → O(log n). Average case (assuming target equally likely in any position): also on the order of log n → O(log n). So for binary search, best is O(1), worst and average are O(log n).
Example: Bubble Sort
Compare adjacent pairs and swap if out of order; repeat until no swaps. Best case: array already sorted → one pass, no swaps, early exit possible → O(n). Worst case: array reverse sorted → every pass does maximum swaps, O(n) passes → O(n²). Average case: random order → still O(n²) comparisons and swaps on average. So we say: best O(n), worst O(n²), average O(n²).
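A sketch of bubble sort with the early-exit check, instrumented to count comparisons. The function name and return shape are assumptions for illustration.

```python
def bubble_sort_comparisons(values):
    """Sort a copy with early-exit bubble sort; return (sorted_list, comparisons)."""
    arr = list(values)
    n = len(arr)
    comparisons = 0
    for i in range(n - 1):
        swapped = False
        for j in range(n - 1 - i):      # the last i elements are already in place
            comparisons += 1
            if arr[j] > arr[j + 1]:
                arr[j], arr[j + 1] = arr[j + 1], arr[j]
                swapped = True
        if not swapped:                 # a pass with no swaps means sorted
            break
    return arr, comparisons

n = 50
_, best = bubble_sort_comparisons(range(n))                 # already sorted
result, worst = bubble_sort_comparisons(range(n, 0, -1))    # reverse sorted
```

On sorted input the count is n − 1 (one pass); on reversed input it is n(n − 1)/2.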
ASCII Diagram: Where Do the Cases Come From?
Input size n = 5. "Cost" = number of comparisons (e.g., linear search).
Input:  [T, ?, ?, ?, ?]   [?, ?, ?, ?, T]   [?, T, ?, ?, ?]   ...
        (target at 0)     (target at 4)     (target at 1)
Cost:   1                 5                 2                 ...
Best case  = min(1, 5, 2, ...) = 1 → O(1)
Worst case = max(1, 5, 2, ...) = 5 → O(n)
Average    = (1+5+2+...)/#inputs → O(n)
Comparison Table: Best / Worst / Average
| Algorithm | Best | Worst | Average |
|---|---|---|---|
| Linear search | O(1) | O(n) | O(n) |
| Binary search (sorted) | O(1) | O(log n) | O(log n) |
| Bubble sort | O(n) | O(n²) | O(n²) |
| Insertion sort | O(n) | O(n²) | O(n²) |
| Quicksort (naive pivot) | O(n log n) | O(n²) | O(n log n) |
| Merge sort | O(n log n) | O(n log n) | O(n log n) |
Edge Cases and Assumptions
- Empty input: Often O(1) or a single check. Sometimes counted separately or as part of best case.
- Average case depends on distribution: If keys are usually at the start, "average" for linear search is better than n/2. We usually assume uniform or "random" when we say "average."
- Worst case can be rare: Quicksort's O(n²) worst case happens for specific inputs (e.g., already sorted with first-element pivot); random shuffling makes it unlikely.
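The quicksort point can be seen by counting pivot comparisons on sorted versus shuffled input. This is a simple list-building sketch with a first-element pivot, not an in-place production quicksort; the fixed seed only makes the run repeatable.

```python
import random

def quicksort_comparisons(values):
    """Quicksort with first-element pivot; return (sorted_list, comparisons)."""
    if len(values) <= 1:
        return list(values), 0
    pivot, rest = values[0], values[1:]
    smaller = [x for x in rest if x < pivot]
    larger = [x for x in rest if x >= pivot]
    left, c_left = quicksort_comparisons(smaller)
    right, c_right = quicksort_comparisons(larger)
    # Count one pivot comparison per remaining element at this level.
    return left + [pivot] + right, len(rest) + c_left + c_right

n = 200
sorted_input = list(range(n))
shuffled_input = sorted_input[:]
random.Random(0).shuffle(shuffled_input)

_, worst = quicksort_comparisons(sorted_input)            # n(n-1)/2 comparisons
result, typical = quicksort_comparisons(shuffled_input)   # far fewer
```

On already-sorted input every partition is maximally unbalanced, so the recursion is n levels deep; keep n small in this sketch to stay under Python's recursion limit.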
A common mistake is confusing "best case" with "fast algorithm." Best case only says: "in the most favorable input, we do this well." A bad algorithm can still have a great best case (e.g., one comparison) but a terrible worst case. Always state which case you mean: "Linear search is O(1) in the best case and O(n) in the worst case."
When you see "time complexity" without "best/worst/average," it almost always means worst case. So "sorting is O(n log n)" means worst-case O(n log n) for algorithms like merge sort. If someone says "average case," they'll say it explicitly.
Interview Insight
Interviewers often ask: "What's the time complexity?" Follow up with: "Best, worst, or average?" Then answer precisely: "Worst case O(n) because we might scan the whole array; best case O(1) if the target is first." For sorting, be ready to say why quicksort is used in practice (good average case) despite O(n²) worst case, and when merge sort is preferred (when you need a guaranteed bound).
Summary
- Best case = input that minimizes work; worst case = input that maximizes work; average case = expected work over a chosen input distribution.
- Same algorithm can have different O(·) for each case (e.g., linear search: O(1) best, O(n) worst and average).
- Default "time complexity" usually means worst case. State best/worst/average explicitly when it matters.
- Analyze by asking: what input minimizes/maximizes cost? Then express each with Big-O (or Θ when tight).
3.4 Loop & Nested Loop Analysis
Introduction
Most iterative algorithms are built from loops. To get their time complexity, we need a reliable way to count how many times the loop body runs and how much work each run does. A single loop that runs n times with O(1) work per iteration gives O(n). Two nested loops, each over n, give O(n²). This section teaches you to analyze single loops, nested loops, and common variations (loop bounds that change, multiple sequential loops, loop with a shrinking range) so you can derive Big-O from code quickly and correctly.
Why This Topic Matters
Loop structure is the main source of complexity in non-recursive code. If you can analyze loops, you can analyze most iterative algorithms—searching, sorting, matrix traversal, and many DP and greedy solutions. Interview problems often have obvious loop structure; being able to say "two nested loops over n → O(n²)" or "outer n, inner halves each time → O(n log n)" shows you understand complexity at a glance.
Mental Model
Total time = (number of iterations) × (work per iteration), where "work per iteration" is itself in Big-O. So we (1) figure out how many times the loop runs as a function of n, (2) figure out the cost per iteration (constant, or another loop, etc.), (3) multiply (or sum if different iterations do different work), then (4) simplify to Big-O by keeping the dominant term.
Single Loop Analysis
Fixed number of iterations
If the loop runs a constant number of times (e.g., 10, 100) independent of input size n, total cost is O(1). Example: a loop that runs exactly 5 times doing O(1) work each time → 5 × O(1) = O(1).
Loop runs n times
Most common: for i in range(n): or for i in range(len(arr)):. If the body does O(1) work per iteration, total is n × O(1) = O(n). If the body does O(log n) work per iteration (e.g., a binary search or a balanced-tree operation inside), total is n × O(log n) = O(n log n).
# O(n): n iterations, O(1) per iteration
for i in range(n):
    do_something_constant()

# Still O(n): body is O(1) in terms of n
for i in range(n):
    x = arr[i] + 1
    result.append(x)
Loop runs a fraction or multiple of n
If the loop runs n/2, 2n, or 5n times, we still get O(n)—constant factors are dropped. So for i in range(0, n, 2): runs n/2 times → O(n).
Loop variable doubles (or range halves) each time
If the loop variable doubles each time (e.g., i = 1; while i < n: i *= 2), the number of iterations is about log₂ n. The same count arises when the remaining range is halved each time (e.g., while n > 1: n //= 2). With O(1) work per iteration → O(log n). If the body does O(n) work (e.g., process all elements), total is O(n log n).
# O(log n): i takes values 1, 2, 4, 8, ... while i < n
i = 1
while i < n:
    do_something_constant()
    i *= 2
For a single loop, first ask: how many iterations as a function of n? Then: what is the cost of one iteration (in Big-O)? Multiply them. Sum only when different iterations do different amounts of work (e.g., inner loop size depends on outer index).
Nested Loop Analysis
Two loops, both run n times
Outer loop runs n times. For each outer iteration, inner loop runs n times. Inner body is O(1). So total = n × n × O(1) = O(n²). Classic pattern for "check every pair" or full matrix traversal.
# O(n²): n * n iterations, O(1) per inner iteration
for i in range(n):
    for j in range(n):
        process(i, j)
Inner loop depends on outer index
Very common: inner loop runs from 0 to i (or i to n). Example: for i in range(n): for j in range(i): .... Inner iterations: when i=0 → 0, i=1 → 1, …, i=n-1 → n-1. Total inner iterations = 0+1+2+…+(n-1) = n(n-1)/2 = Θ(n²). So still O(n²).
# O(n²): inner runs i times for each i; total 0+1+...+(n-1) = n(n-1)/2
for i in range(n):
    for j in range(i):
        do_something()
Why the sum 0+1+…+(n-1) is Θ(n²)
Sum of first k integers = 1+2+…+k = k(k+1)/2. So 0+1+…+(n-1) = (n-1)n/2 = (n² - n)/2. For large n, the n² term dominates; we drop the linear term and constant factor → Θ(n²). So any time the inner loop runs "up to the outer index" (or "from outer index to n"), expect O(n²).
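A quick empirical check of the closed form: the sketch below counts the inner-body executions for the `range(i)` pattern and compares the count with n(n-1)/2. The helper name `triangular_iterations` is made up for this illustration.

```python
def triangular_iterations(n):
    """Count how many times the inner body runs when j ranges over range(i)."""
    count = 0
    for i in range(n):
        for j in range(i):
            count += 1
    return count


for n in (1, 10, 100):
    # The measured count matches the closed form n(n-1)/2 exactly.
    assert triangular_iterations(n) == n * (n - 1) // 2
```

Because the count is (n² - n)/2, multiplying n by 10 multiplies the work by roughly 100, which is the signature of quadratic growth.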
Three nested loops, each O(n)
n × n × n = O(n³). Same idea: multiply the number of iterations of each loop when they're independent (or sum over outer index if inner bounds depend on it).
Step-by-Step: Deriving Complexity from Loops
- Identify the loops and their bounds (as functions of n or input size).
- For each loop, determine how many times it runs (exact or in Big-O). If the bound depends on another loop variable, express it in terms of that variable first.
- Multiply iteration counts for nested loops when the inner count doesn't depend on the outer index, or sum over the outer index when it does (e.g., inner runs 0 to i → sum i from 0 to n-1).
- Multiply by work per iteration if it's not O(1).
- Simplify to a single Big-O (drop constants, keep dominant term).
ASCII Diagram: Nested Loop Iterations
Outer i:   0        1      2        3          ...   n-1
Inner j:   (none)   (0)    (0,1)    (0,1,2)    ...   (0..n-2)
Count:     0        1      2        3          ...   n-1
Total inner iterations = 0 + 1 + 2 + ... + (n-1) = n(n-1)/2 → O(n²)
Common Loop Patterns (Quick Reference)
| Pattern | Time |
|---|---|
| Single loop, n iterations, O(1) body | O(n) |
| Single loop, doubles until n (e.g. i *= 2) | O(log n) |
| Two nested loops, each n, O(1) inner body | O(n²) |
| Outer n, inner 0..i (or i..n) | O(n²) |
| Three nested loops, each n | O(n³) |
| Outer n, inner halves each time (e.g. binary search style) | O(n log n) |
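The last row of the table (outer loop over n, inner loop that halves) has no code example elsewhere in this section, so here is a minimal sketch. The function name is hypothetical; the inner loop simply halves a counter, standing in for any binary-search-style inner step.

```python
def outer_linear_inner_halving(n):
    """Count inner iterations: outer loop runs n times, inner halves from n down to 1."""
    count = 0
    for _ in range(n):
        size = n
        while size > 1:      # runs floor(log2 n) times
            count += 1
            size //= 2
    return count


print(outer_linear_inner_halving(8))     # 24    = 8 * 3
print(outer_linear_inner_halving(1024))  # 10240 = 1024 * 10
```

The total is n × floor(log₂ n), which simplifies to O(n log n).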
Python Examples with Line-by-Line Complexity
Example 1: Sequential loops
def two_loops(arr):
    total = 0
    for x in arr:          # n iterations, O(1) each → O(n)
        total += x
    for x in arr:          # n iterations, O(1) each → O(n)
        total += x * 2
    return total           # O(n) + O(n) = O(n)
Two independent loops, each O(n). We add their costs: O(n) + O(n) = O(n). So overall O(n).
Example 2: Nested loops (full pair check)
def count_pairs(arr):
    count = 0
    for i in range(len(arr)):          # n times
        for j in range(len(arr)):      # n times each → n * n
            if i != j and arr[i] + arr[j] == 0:
                count += 1
    return count                       # O(n²)
Outer n, inner n, body O(1). Total O(n²).
Example 3: Inner loop from 0 to i
def prefix_sums(arr):
    n = len(arr)
    result = [0] * n
    for i in range(n):              # i = 0, 1, ..., n-1
        for j in range(i + 1):      # inner loop runs i + 1 times: 1, 2, ..., n
            result[i] += arr[j]     # total inner iterations: 1+2+...+n = n(n+1)/2
    return result                   # O(n²)
Inner iterations: 1 + 2 + … + n = n(n+1)/2 → O(n²).
Edge Cases
- Loop with early break/return: Complexity is the worst case over all inputs—assume the loop runs as many times as the bound allows unless you're analyzing best/average separately.
- Loop bound is min(n, k) or similar: If k is a constant, effective iterations ≤ k → O(1). If k is a parameter, express complexity in both n and k (e.g., O(min(n, k)) for that loop, or O(n × k) when a loop over n contains a loop over k).
- Inner loop that doesn't start at 0: Same idea: count how many times it runs for each outer value and sum (e.g., j from i to n-1 gives (n-i) iterations for outer i; sum = n + (n-1) + … + 1 = n(n+1)/2 → O(n²)).
A common mistake is multiplying when you should add. Two sequential loops (one after the other) add: O(n) + O(n) = O(n). Two nested loops multiply: O(n) × O(n) = O(n²). A related mistake is confusing "inner runs i times" with "inner runs n times": the total is the sum over i (0 to n-1), which is n(n-1)/2 = Θ(n²), not Θ(n).
When the inner loop's bound depends on the outer index, write the total as a sum (e.g., Σ i from 0 to n-1). That sum is often an arithmetic series; simplify it to a closed form (e.g., n(n-1)/2) and then to Big-O. This is how you prove "nested loop with inner 0..i" is O(n²), not O(n).
Interviewers often ask "what's the time complexity?" after you write a solution. For loop-based code, say the structure clearly: "We have two nested loops over the array, so O(n²)." If the inner bound depends on the outer index, briefly say "inner runs up to i each time, so total iterations are on the order of n²." That shows you can derive it, not just memorize.
Summary
- Single loop: (iterations) × (work per iteration). n iterations with O(1) body → O(n); log n iterations → O(log n).
- Nested loops: Multiply when bounds are independent (n × n → O(n²)). When inner bound depends on outer index, sum over outer index (e.g., 0+1+…+(n-1) = Θ(n²)).
- Sequential loops: Add their costs: O(n) + O(n) = O(n).
- Derive by: count iterations per loop, multiply or sum as appropriate, multiply by work per iteration, then simplify to Big-O.
3.5 Recursion Tree Method
Introduction
Recursive algorithms are described by recurrence relations: equations that express T(n) in terms of T on smaller inputs (e.g., T(n) = 2T(n/2) + n). To find their time complexity, we need to solve the recurrence. The recursion tree method does this by drawing a tree: each node represents the non-recursive cost of one recursive call, and we sum the costs level by level (or across leaves) to get the total. It gives intuition, works for many recurrences that don't fit the Master Theorem, and is a standard tool in interviews and coursework.
Why This Topic Matters
Merge sort, quicksort, binary search, and many divide-and-conquer algorithms have runtimes given by recurrences like T(n) = 2T(n/2) + Θ(n). The recursion tree method lets you see why the total is O(n log n): you draw the tree, sum the work at each level, and observe that there are log n levels with O(n) work per level. When the Master Theorem doesn't apply (e.g., T(n) = T(n/3) + T(2n/3) + n), the tree method still works. It also builds the intuition you need for the Master Theorem and for substitution proofs.
What Is a Recurrence?
A recurrence is an equation that defines T(n) in terms of T on smaller arguments. Example: T(n) = 2T(n/2) + n says: "to solve a problem of size n, we do n work and then solve two subproblems of size n/2." We need a base case (e.g., T(1) = 1 or T(1) = O(1)) to stop the recursion. The goal is to find a closed form or Big-O for T(n).
In recurrences we often write T(n/2) even when n is odd; we assume n is a power of 2 for simplicity, or use floor/ceiling. The asymptotic result (e.g., O(n log n)) is the same. We also often write "+ n" or "+ Θ(n)" for the "non-recursive" work at the current level—the cost of dividing and combining.
Recursion Tree: The Idea
Imagine the recurrence as a tree:
- Root = one call of size n; we charge the "extra" work at this level (the + n part) to the root.
- Children = the recursive calls. So T(n) = 2T(n/2) + n gives two children, each of size n/2; we write the cost "n" at the root.
- Each child again has its own cost (n/2 at that level) and its children (size n/4), and so on until we hit the base case (size 1).
- Total cost = sum of the costs written at every node. We can sum by level: level 0 has 1 node with cost n; level 1 has 2 nodes with cost n/2 each → total n; level 2 has 4 nodes with cost n/4 each → total n; … So each level sums to n, and there are log₂ n levels → total Θ(n log n).
Step-by-Step: Recursion Tree Method
- Write the recurrence in the form T(n) = (sum of T(subproblems)) + (work at this level). Put the "work at this level" as the node cost.
- Draw the tree (at least conceptually): root cost, then children with their costs, and so on. Identify the pattern: how many children per node? What size? What cost per node at level i?
- Find the number of levels until base case. For T(n/2), size halves each level → about log₂ n levels. For T(n/3), about log₃ n levels.
- Sum the cost per level, then sum over levels. Or sum over all leaves (if each leaf has the same cost and you know the count) and add the internal cost—whichever is easier.
- Simplify to Big-O. Often the sum is a geometric series or "constant per level × number of levels."
Worked Example: T(n) = 2T(n/2) + n
This is the merge sort recurrence. Assume T(1) = Θ(1).
Level 0:             [n]                    cost = n
                   /     \
Level 1:       [n/2]     [n/2]              cost = n/2 + n/2 = n
               /  \       /  \
Level 2:   [n/4][n/4] [n/4][n/4]            cost = 4·(n/4) = n
...
Level k:   2^k nodes, each cost n/2^k       cost = 2^k · (n/2^k) = n
Every level has total cost n. Number of levels: size goes n → n/2 → … → 1, and n/2^k = 1 gives k = log₂ n, so the levels are numbered 0 through log₂ n (log₂ n + 1 levels in all). Total = n × (log₂ n + 1) = Θ(n log n).
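The level-by-level sum can be cross-checked by evaluating the recurrence directly. This is a sketch that assumes the base case T(1) = 1 and restricts n to powers of two; `merge_sort_cost` is an illustrative name for the recurrence, not an actual sort.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def merge_sort_cost(n):
    """Evaluate T(n) = 2T(n/2) + n with T(1) = 1, for n a power of two."""
    if n == 1:
        return 1
    return 2 * merge_sort_cost(n // 2) + n


for k in range(11):
    n = 2 ** k
    # With T(1) = 1 the exact solution is n·log₂ n + n, i.e. n per level
    # across log₂ n + 1 levels.
    assert merge_sort_cost(n) == n * k + n
```

The exact value n·log₂ n + n confirms the Θ(n log n) bound: the lower-order term n is the contribution of the leaf level.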
Worked Example: T(n) = 2T(n/2) + Θ(1)
Same structure, but only constant work at each node. Level 0: 1; level 1: 2; level 2: 4; … level k: 2^k nodes, each Θ(1). Total = 2^0 + 2^1 + … + 2^(log n) = 2^(log n + 1) − 1 = Θ(n). So T(n) = Θ(n)—the tree is a full binary tree with Θ(n) leaves, and constant work per node.
When the Per-Level Cost Changes
If the work at level i is not constant (e.g., it decreases with depth), write the cost at each level and sum. Example: T(n) = 2T(n/2) + n². Root cost n²; level 1: 2 × (n/2)² = n²/2; level 2: 4 × (n/4)² = n²/4; … level k: 2^k × (n/2^k)² = n²/2^k. Total = n²(1 + 1/2 + 1/4 + …) ≤ 2n² → O(n²). Here the root dominates; the series is geometric and sums to a constant.
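To see the geometric decay concretely, the sketch below lists the total cost of each level for a recurrence of the form T(n) = a·T(n/b) + f(n). The helper `level_costs` is hypothetical, assumes n is a power of b, and charges f(1) at the leaves.

```python
def level_costs(n, a, b, f):
    """Per-level totals for T(n) = a·T(n/b) + f(n), with n a power of b."""
    costs = []
    nodes, size = 1, n
    while size >= 1:
        costs.append(nodes * f(size))  # (number of nodes) × (cost per node)
        nodes *= a
        size //= b
    return costs


quadratic = level_costs(16, 2, 2, lambda m: m * m)
print(quadratic)                       # [256, 128, 64, 32, 16]
print(sum(quadratic) <= 2 * 16 * 16)   # True: the root level dominates

print(level_costs(16, 2, 2, lambda m: m))  # [16, 16, 16, 16, 16]
```

With f(n) = n² each level costs half of the previous one, so the sum stays below 2n². With f(n) = n every level costs the same, which is where the extra log factor comes from.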
ASCII: Summing by Levels vs by Leaves
Sum by levels (T(n) = 2T(n/2) + n):
  L0: n   L1: n   L2: n   ...   L_(log n): n
  Total: n · (log n + 1) = Θ(n log n)
Sum by leaves (T(n) = 2T(n/2) + Θ(1)):
  Number of leaves = 2^(log n) = n
  Work per leaf = Θ(1)
  Total = Θ(n)   (Internal work is also Θ(n), same order.)
Common Recurrence Patterns (Recursion Tree View)
| Recurrence | Result |
|---|---|
| T(n) = 2T(n/2) + n | Θ(n log n) |
| T(n) = 2T(n/2) + Θ(1) | Θ(n) |
| T(n) = 2T(n/2) + n² | Θ(n²) |
| T(n) = T(n/2) + n | Θ(n) |
| T(n) = T(n-1) + n | Θ(n²) |
Edge Cases and Tips
- Uneven splits: T(n) = T(n/3) + T(2n/3) + n. The tree is not full; the "depth" is determined by the branch that shrinks slowest (2n/3). After k steps, size is at most (2/3)^k n; so depth ≈ log_{3/2} n. Per-level cost is still O(n). Total O(n log n).
- More than two subproblems: T(n) = 3T(n/2) + n. Each node has 3 children of size n/2. Level k has 3^k nodes, cost (n/2^k) each → total n×(3/2)^k. Sum over k = 0 to log₂ n: geometric with ratio 3/2 → dominated by last term, O(n^(log₂ 3)).
- Base case: Use T(1) = c or T(0) = c. For asymptotic result, the exact base constant doesn't change Big-O.
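The uneven-split recurrence from the first bullet can be evaluated numerically to see that it grows like n log n. This is a rough sketch under stated assumptions: integer sizes n // 3 and n - n // 3 stand in for n/3 and 2n/3, and the base case T(n) = 1 for n ≤ 2 is an arbitrary choice made only so the recursion terminates.

```python
from functools import lru_cache
import math


@lru_cache(maxsize=None)
def uneven_cost(n):
    """T(n) = T(n // 3) + T(n - n // 3) + n, with T(n) = 1 for n <= 2 (assumed)."""
    if n <= 2:
        return 1
    third = n // 3
    return uneven_cost(third) + uneven_cost(n - third) + n


for n in (10**3, 10**4, 10**5):
    ratio = uneven_cost(n) / (n * math.log2(n))
    print(n, round(ratio, 3))  # the ratio stays close to a constant as n grows
```

If T(n) were Θ(n²) or Θ(n), the printed ratio would grow or shrink steadily; a roughly constant ratio is consistent with Θ(n log n).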
A common mistake is counting only the leaves and forgetting the internal work. The tree for T(n) = 2T(n/2) + n has Θ(n) leaves, each doing base-case work, but the recurrence also charges a "+ size" term at every internal node. So we must sum all node costs (or sum per level), not just the bottom level. For T(n) = 2T(n/2) + n, the internal work dominates (n log n); for T(n) = 2T(n/2) + Θ(1), leaf count gives Θ(n) and that's correct.
When per-level cost is the same at every level (e.g., n at each level for T(n)=2T(n/2)+n), total = (cost per level) × (number of levels). When per-level cost forms a geometric series (increasing or decreasing), the sum is dominated by the first or last term—use the geometric sum formula and then simplify to Big-O.
If asked "why is merge sort O(n log n)?" you can say: "The recurrence is T(n) = 2T(n/2) + n. In the recursion tree, each level does O(n) work and there are O(log n) levels, so total O(n log n)." Drawing a small tree (root + one level of children) is enough to show you understand the method. For recurrences that don't match the Master Theorem, saying "I'd draw the recursion tree and sum the levels" is a good approach.
Summary
- Recurrence = equation like T(n) = 2T(n/2) + n; recursion tree = picture of cost at each level of recursion.
- Method: Draw tree (root = current work, children = subcalls); find number of levels and cost per level; sum over levels (or use geometric series); simplify to Big-O.
- Classic: T(n) = 2T(n/2) + n → log n levels, n work per level → Θ(n log n).
- Use the tree when the Master Theorem doesn't apply or when you want a clear visual derivation.
3.6 Master Theorem
Introduction
The Master Theorem (or Master Method) gives a cookbook solution for recurrences of the form T(n) = aT(n/b) + f(n): we split the problem into a subproblems of size n/b, and do f(n) extra work. By comparing f(n) with the quantity n^(log_b a) (where log_b a is the "critical exponent"), the theorem tells you whether the total time is dominated by the base-case work, balanced across levels, or dominated by the top-level work, and it gives the answer in one step. It's the fastest way to solve many divide-and-conquer recurrences without drawing the recursion tree.
Why This Topic Matters
Merge sort, binary search, and many recursive algorithms fit T(n) = aT(n/b) + f(n). The Master Theorem lets you state their complexity in seconds: "a=2, b=2, f(n)=n → n^(log_b a)=n, so Case 2 → Θ(n log n)." In interviews, knowing the three cases and when they apply is enough to justify divide-and-conquer runtimes. When the recurrence doesn't fit (e.g., a or b not constant, or f(n) has a different form), you fall back to the recursion tree or substitution method.
Standard Form and the Critical Exponent
Recurrence in standard form:
T(n) = aT(n/b) + f(n)
- a ≥ 1: number of subproblems per step.
- b > 1: factor by which the problem size shrinks (so each subproblem has size n/b).
- f(n): cost of dividing and combining (the "non-recursive" work). We assume f(n) is asymptotically positive.
The critical exponent is log_b a. So the "size" of the recursive work (ignoring f) is like n^(log_b a): that's the number of leaves in the recursion tree (a^(log_b n) = n^(log_b a)). The theorem compares f(n) with n^(log_b a) to decide which dominates.
We often write n/b meaning floor or ceiling; for the theorem we assume n is a power of b so that sizes are integers. The asymptotic result is unchanged. Also, f(n) is usually given as Θ(n^k), Θ(n^k log n), etc.; the theorem uses a precise "polynomial comparison" condition.
The Three Cases
Case 1: f(n) is polynomially smaller than n^(log_b a)
If f(n) = O(n^(log_b a − ε)) for some constant ε > 0, then the recursive part dominates. Result: T(n) = Θ(n^(log_b a)).
Intuition: the work at the root (and each level) is tiny compared to the growth of the leaf count; total is dominated by the leaves.
Case 2: f(n) is the same order as n^(log_b a) (up to log factors)
If f(n) = Θ(n^(log_b a) · log^k n) for some k ≥ 0 (usually k = 0 or 1), then work is balanced across levels. Result: T(n) = Θ(n^(log_b a) · log^(k+1) n). For k = 0: T(n) = Θ(n^(log_b a) · log n).
Intuition: every level does about the same total work; there are Θ(log n) levels, so we get an extra log factor.
Case 3: f(n) is polynomially larger than n^(log_b a)
If f(n) = Ω(n^(log_b a + ε)) for some ε > 0, and if the regularity condition holds (a·f(n/b) ≤ c·f(n) for some c < 1 and large n), then the top-level work dominates. Result: T(n) = Θ(f(n)).
Intuition: the root (and upper levels) do so much work that the sum is dominated by f(n). The regularity condition ensures that work decreases as we go down the tree.
Quick Reference Table
| Condition | T(n) |
|---|---|
| f(n) = O(n^(log_b a − ε)), ε > 0 | Θ(n^(log_b a)) |
| f(n) = Θ(n^(log_b a) · log^k n), k ≥ 0 | Θ(n^(log_b a) · log^(k+1) n) |
| f(n) = Ω(n^(log_b a + ε)), ε > 0, and regularity | Θ(f(n)) |
Step-by-Step: How to Apply the Master Theorem
- Identify a, b, and f(n) from T(n) = aT(n/b) + f(n).
- Compute log_b a. So n^(log_b a) is your comparison benchmark. (Tip: log_b a = (ln a)/(ln b).)
- Compare f(n) with n^(log_b a):
- If f(n) is O(n^(log_b a − ε)) for some ε > 0 → Case 1 → T(n) = Θ(n^(log_b a)).
- If f(n) is Θ(n^(log_b a) · log^k n) → Case 2 → T(n) = Θ(n^(log_b a) · log^(k+1) n).
- If f(n) is Ω(n^(log_b a + ε)) and a·f(n/b) ≤ c·f(n) for some c < 1 → Case 3 → T(n) = Θ(f(n)).
- If none of the cases clearly applies (e.g., f(n) is not polynomial in n, or recurrence has a different form), use recursion tree or substitution.
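The steps above can be sketched as a small classifier. This is only an illustration under a simplifying assumption: f(n) is restricted to the form Θ(n^d · log^k n) with d ≥ 0 and k ≥ 0, for which the regularity condition in Case 3 holds automatically. The function name `master_theorem` and its parameters `d` and `k` are hypothetical.

```python
import math


def master_theorem(a, b, d, k=0):
    """Classify T(n) = a·T(n/b) + Θ(n^d · log^k n).

    Assumes a >= 1, b > 1, d >= 0, k >= 0. Returns (case_number, bound_as_text).
    """
    if a < 1 or b <= 1 or d < 0 or k < 0:
        raise ValueError("expected a >= 1, b > 1, d >= 0, k >= 0")
    crit = math.log(a) / math.log(b)           # critical exponent log_b a
    if math.isclose(d, crit, abs_tol=1e-9):    # same order up to log factors
        return 2, f"Θ(n^{crit:.3g} · log^{k + 1} n)"
    if d < crit:                               # f polynomially smaller: leaves dominate
        return 1, f"Θ(n^{crit:.3g})"
    return 3, f"Θ(n^{d:.3g} · log^{k} n)"      # f polynomially larger: root dominates


print(master_theorem(2, 2, 1)[0])  # 2  (T(n) = 2T(n/2) + n)
print(master_theorem(1, 2, 0)[0])  # 2  (T(n) = T(n/2) + Θ(1))
print(master_theorem(2, 2, 0)[0])  # 1  (T(n) = 2T(n/2) + Θ(1))
print(master_theorem(2, 2, 2)[0])  # 3  (T(n) = 2T(n/2) + n²)
```

Comparing exponents with a tolerance is a design choice to avoid floating-point noise in log(a)/log(b); recurrences whose f(n) is not of the assumed form still need the recursion tree or substitution.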
Worked Examples
Example 1: T(n) = 2T(n/2) + n (merge sort)
a = 2, b = 2, f(n) = n. So n^(log_b a) = n^(log₂ 2) = n^1 = n. We have f(n) = n = Θ(n) = Θ(n^(log_b a) · log^0 n). That's Case 2 with k = 0. So T(n) = Θ(n^(log_b a) · log^1 n) = Θ(n log n).
Example 2: T(n) = T(n/2) + Θ(1) (binary search)
a = 1, b = 2, f(n) = Θ(1). So n^(log_b a) = n^(log₂ 1) = n^0 = 1. We have f(n) = Θ(1) = Θ(n^0). So f(n) = Θ(n^(log_b a) · log^0 n) → Case 2, k = 0. T(n) = Θ(n^0 · log n) = Θ(log n).
Example 3: T(n) = 2T(n/2) + Θ(1)
a = 2, b = 2, f(n) = Θ(1). n^(log_b a) = n. So f(n) = O(n^(1−ε)) for any ε in (0,1] (e.g. f(n) = O(n^0.5)). Case 1. T(n) = Θ(n^(log_b a)) = Θ(n).
Example 4: T(n) = 2T(n/2) + n²
a = 2, b = 2, f(n) = n². n^(log_b a) = n. So f(n) = n² = Ω(n^(1+ε)) for ε = 1. Regularity: 2·(n/2)² = n²/2 ≤ c·n² for c = 1/2 < 1. Case 3. T(n) = Θ(n²).
Example 5: T(n) = 3T(n/2) + n (e.g. Karatsuba multiplication)
a = 3, b = 2, f(n) = n. n^(log_b a) = n^(log₂ 3) ≈ n^1.58. So f(n) = n = O(n^(1.58 − ε)) for small ε (e.g. ε = 0.5). Case 1. T(n) = Θ(n^(log₂ 3)) ≈ Θ(n^1.58).
When the Master Theorem Does Not Apply
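- T(n) = 2T(n/2) + n log n: f(n) = n log n is larger than n but not polynomially larger (there is no ε > 0 with n log n = Ω(n^(1+ε))), so Case 3 fails, and the basic Case 2 (k = 0) fails too. The extended Case 2 stated above, with k = 1, does cover it and gives T(n) = Θ(n log² n). Check which statement of the theorem you are using before claiming it applies.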
- T(n) = T(n/2) + T(n/2) + n is really 2T(n/2)+n, so it applies. But T(n) = T(n/3) + T(2n/3) + n has two different fractions; standard form assumes one n/b. Use recursion tree.
- a or b not constant: e.g. T(n) = n·T(√n) + n. Form is different; use substitution or tree.
A common mistake is using Case 2 when f(n) is only O(n^(log_b a)) but not Θ(n^(log_b a) · log^k n). For Case 2 we need f(n) = Θ(n^(log_b a) · log^k n). If f(n) is polynomially smaller (e.g. f(n) = O(n^0.9) and n^(log_b a) = n), that's Case 1. Another is forgetting the regularity condition in Case 3: without it, the theorem doesn't guarantee T(n) = Θ(f(n)).
For Case 2, the most common situation is f(n) = Θ(n^(log_b a)), i.e. no log factor in f(n). Then k = 0 and T(n) = Θ(n^(log_b a) · log n). Merge sort is the classic: f(n) = n, n^(log_b a) = n, so Θ(n log n).
You don't need to recite the theorem word-for-word. Say: "It's the form aT(n/b)+f(n); I compare f(n) to n^(log_b a). Here that's n, and f(n)=n so they're equal—Case 2—so Θ(n log n)." For 3T(n/2)+n, say "log_b a is log₂ 3, so n^(log₂ 3) is bigger than n; the leaves dominate, Case 1, so Θ(n^(log₂ 3))." That shows you understand the three cases.
Summary
- Standard form: T(n) = aT(n/b) + f(n). Compute n^(log_b a) and compare with f(n).
- Case 1: f(n) polynomially smaller → T(n) = Θ(n^(log_b a)).
- Case 2: f(n) = Θ(n^(log_b a) · log^k n) → T(n) = Θ(n^(log_b a) · log^(k+1) n).
- Case 3: f(n) polynomially larger and regularity → T(n) = Θ(f(n)).
- When the recurrence doesn't fit (e.g. different split ratios), use the recursion tree or substitution method.
3.7 Space Complexity
Introduction
Space complexity is how much extra memory an algorithm uses, as a function of input size n (and sometimes other parameters). We express it in Big-O, just like time: O(1), O(n), O(n²), etc. It tells you whether your solution will run in limited memory (e.g. on embedded systems or with huge inputs) and whether you can improve by using less auxiliary space—e.g. in-place algorithms use O(1) extra space. This section covers what to count, how to analyze iterative and recursive code, and how to state space complexity clearly in interviews.
Why This Topic Matters
Interviews often ask "what's the space complexity?" in addition to time. Recursive solutions can be O(n) space just from the call stack; building a copy of the input adds O(n). In-place algorithms are valued when you must not use extra linear space. In practice, running out of memory can be as bad as running too long—so understanding space helps you choose and justify algorithms (e.g. iterative vs recursive DFS, merge sort vs in-place quicksort in terms of space).
Auxiliary Space vs Total Space
- Total space = space for input + output + any extra data structures and call stack. Sometimes we say "the algorithm uses O(n) space" meaning total.
- Auxiliary space = extra space used beyond the input (and sometimes beyond the output, depending on convention). So "O(1) auxiliary space" means we only use a constant number of extra variables; the input itself is not counted.
In DSA and interviews, "space complexity" usually means auxiliary space—we don't count the input (or the output if it's required, e.g. returning a new array). So "merge sort is O(n) space" means O(n) extra for the temporary arrays, not counting the input array. Always clarify if the problem says "space" without specifying: most of the time it's auxiliary.
When the output size is Θ(n) or larger (e.g. returning a list of all results), some definitions count it and some don't. In interviews, it's common to say "O(n) auxiliary space" to exclude the output, or "O(n) space including output." Stating "auxiliary" avoids ambiguity.
What to Count
- Extra variables and data structures: A list of size n you build → O(n). A hash map with n keys → O(n). A few integers → O(1).
- Call stack (recursion): Each recursive call uses space for parameters, return address, and local variables. Depth of recursion × space per frame. Example: recursive binary search has depth Θ(log n), so O(log n) space if each frame is O(1).
- Input and output: Usually not counted in auxiliary space. If you must count total space, input is often Θ(n) and output can be too.
Iterative Code: No Recursion
Space is just the extra data structures and variables. One loop with a fixed number of variables → O(1). Building a list of size n → O(n). Two arrays of size n → O(n) (constant factors are dropped in Big-O, so "two arrays" is still O(n)). Nested loops don't add space unless you allocate per iteration and keep the result (e.g. storing a new list of size n in each of n iterations adds up to O(n²)).
# O(1) auxiliary: only a few variables
def max_element(arr):
    m = arr[0]
    for x in arr:          # iterate directly; arr[1:] would copy the list (O(n) extra)
        if x > m:
            m = x
    return m

# O(n) auxiliary: copy of input
def sorted_copy(arr):
    return sorted(arr)     # returns new list of size n
Recursive Code: Call Stack
Space = (maximum depth of recursion) × (space per frame). Frame space is usually O(1) per call (parameters and locals). So if depth is Θ(n), space is O(n); if depth is Θ(log n), space is O(log n).
# O(n) space: depth n, O(1) per frame
def fact(n):
    if n <= 1:
        return 1
    return n * fact(n - 1)

# O(log n) space: depth log n, O(1) per frame
def binary_search_rec(arr, t, lo, hi):
    if lo > hi:
        return -1
    mid = (lo + hi) // 2
    if arr[mid] == t:
        return mid
    if arr[mid] > t:
        return binary_search_rec(arr, t, lo, mid - 1)
    return binary_search_rec(arr, t, mid + 1, hi)
Common Space Complexities (Quick Reference)
| Scenario | Auxiliary Space |
|---|---|
| Few variables, no extra structures, iterative | O(1) |
| One extra array/list of size n | O(n) |
| Hash map/set with n entries | O(n) |
| Recursion depth n, O(1) per frame | O(n) |
| Recursion depth log n (e.g. binary search) | O(log n) |
| Matrix/table of size n×n (e.g. DP) | O(n²) |
In-Place and O(1) Space
An in-place algorithm uses O(1) extra space (or at most O(log n) for recursion). It may overwrite the input. Examples: swapping two elements, reversing an array with two pointers, some sorting algorithms (e.g. quicksort with tail recursion or iterative implementation can be O(log n) stack; "in-place" often means no extra array). Saying "we can do this in O(1) space" is a strong claim—it usually means no extra arrays or maps that grow with n.
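As one concrete instance of the two-pointer reversal mentioned above, here is a minimal sketch. The function name `reverse_in_place` is illustrative; it mutates its argument and uses only two index variables, so the auxiliary space is O(1).

```python
def reverse_in_place(arr):
    """Reverse `arr` by swapping from both ends; O(n) time, O(1) auxiliary space."""
    left, right = 0, len(arr) - 1
    while left < right:
        arr[left], arr[right] = arr[right], arr[left]
        left += 1
        right -= 1


data = [1, 2, 3, 4, 5]
reverse_in_place(data)
print(data)  # [5, 4, 3, 2, 1]
```

By contrast, `arr[::-1]` would build a second list of size n, which is O(n) auxiliary space.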
Time vs Space Trade-off
Often you can use more space to get faster time: e.g. a hash map gives O(1) average lookup instead of an O(n) scan, at the cost of O(n) extra space. Memoization (DP) is another case: store subproblem results to avoid recomputation, trading extra space for less time. The reverse also holds: in-place algorithms save space but may be trickier to write or slower. In interviews, stating the trade-off ("we can do O(n) time and O(n) space with a set, or O(n²) time and O(1) space with two loops") shows good understanding.
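The trade-off can be made concrete with a small problem chosen here purely for illustration: deciding whether any two elements of a list sum to a target. Both function names are hypothetical, and the O(n) time of the set version assumes average-case O(1) set operations.

```python
def has_pair_sum_quadratic(arr, target):
    """O(n²) time, O(1) auxiliary space: check every pair with two loops."""
    n = len(arr)
    for i in range(n):
        for j in range(i + 1, n):
            if arr[i] + arr[j] == target:
                return True
    return False


def has_pair_sum_linear(arr, target):
    """O(n) expected time, O(n) auxiliary space: remember the values seen so far."""
    seen = set()
    for x in arr:
        if target - x in seen:
            return True
        seen.add(x)
    return False


nums = [3, 8, 1, 6]
print(has_pair_sum_quadratic(nums, 9), has_pair_sum_linear(nums, 9))    # True True
print(has_pair_sum_quadratic(nums, 20), has_pair_sum_linear(nums, 20))  # False False
```

Both return the same answers; they differ only in which resource they spend.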
A common mistake is forgetting the call stack in recursive code. A recursive function that only uses a few variables per call still uses O(depth) space. Two related mistakes: counting the input in auxiliary space (we typically don't), and saying "O(1) space" when you're building a list of results of size n; that list is O(n) unless the problem explicitly doesn't count the output.
To reduce recursion space, convert to iteration with an explicit stack if needed. For example, DFS can be implemented recursively (O(h) or O(n) stack space) or iteratively with a stack data structure—the stack still uses O(n) in the worst case, but you avoid stack overflow and can sometimes optimize what you store. For "O(1) space" requirements, prefer iterative solutions or tail recursion where the language optimizes it.
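A minimal sketch of the iterative approach follows, assuming the graph is given as a dictionary mapping each node to a list of neighbors (an assumption made for this example; `dfs_iterative` is an illustrative name). The explicit list used as a stack replaces the call stack.

```python
def dfs_iterative(graph, start):
    """Depth-first traversal using an explicit stack instead of recursion."""
    visited = set()
    order = []
    stack = [start]
    while stack:
        node = stack.pop()
        if node in visited:
            continue
        visited.add(node)
        order.append(node)
        # Push neighbors in reverse so they are visited in their listed order.
        for neighbor in reversed(graph.get(node, [])):
            if neighbor not in visited:
                stack.append(neighbor)
    return order


graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
print(dfs_iterative(graph, "A"))  # ['A', 'B', 'D', 'C']
```

The auxiliary space is still linear in the worst case (the `visited` set alone is O(n)), but the depth is no longer limited by the interpreter's recursion limit.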
When asked "what's the space complexity?", say both what you're counting and the result: "Auxiliary space is O(n) because we use a hash set of seen elements." For recursion: "O(n) space for the call stack since we have n recursive calls." If you give an O(n) space solution, you can add: "We could do O(1) space if we sort in place and use two pointers, but that would change the time to O(n log n)." That shows you understand the trade-off.
Summary
- Space complexity = extra memory as a function of n, usually meaning auxiliary space (excluding input, and often output).
- Count: extra data structures (arrays, maps, etc.) and call stack depth for recursion.
- Iterative: space = size of extra variables and structures. Recursive: space = depth × space per frame.
- O(1) = in-place style; O(n) = one extra linear structure or linear recursion depth; O(log n) = logarithmic depth (e.g. binary search recursion).
- Time–space trade-offs: more space can mean faster time (e.g. hash map); less space may mean more time or a more complex algorithm.
3.8 Amortized Analysis
Introduction
Amortized analysis gives a bound on the average cost per operation over a sequence of operations, rather than the cost of a single operation in the worst case. Some operations in the sequence may be expensive (e.g. resizing an array), but if they happen rarely, the average cost per operation can still be low. We say "amortized O(1) per append" meaning: over n appends, total cost is O(n), so each append "costs" O(1) on average. This is how we justify that list.append in Python (and dynamic arrays in general) is O(1) amortized, even though a single append can trigger an O(n) resize.
Why This Topic Matters
Dynamic arrays (Python list, C++ vector, Java ArrayList), hash tables, and many data structures have operations that are cheap most of the time and occasionally expensive. Worst-case per-operation analysis would say "append can be O(n)," which is misleading—we don't pay O(n) every time. Amortized analysis tells the right story: append is O(1) amortized, so building a list of n elements is O(n) total. In interviews, saying "append is O(1) amortized" shows you understand the difference between worst-case per operation and amortized cost.
Amortized vs Worst-Case vs Average Case
- Worst-case (per operation): The cost of a single operation in the worst possible scenario. One append might be O(n) when it triggers a resize.
- Average case (over inputs): Expected cost of one operation when the input is random. That's a different notion—we're not averaging over a sequence of operations.
- Amortized: We fix a sequence of operations (e.g. n appends). Total cost of the sequence is T. Amortized cost per operation = T / n. So we're spreading the cost of expensive operations over the whole sequence.
Amortized analysis does not use probability—it's deterministic. We consider the worst possible sequence of operations and show that even then, the total cost is bounded; dividing by the number of operations gives the amortized cost. So "O(1) amortized" means: for any sequence of n operations, total cost is O(n).
Classic Example: Dynamic Array (List) Append
Start with an array of capacity 1 (or some constant). When we append and the array is full, we allocate a new array of double the size, copy all elements, then append. So most appends are O(1); every time we double, we do O(current size) work. Question: over n appends, what is the total cost?
We do at most O(1) work for each of the n appends, plus the cost of copies during resizes. Copy sizes: 1, 2, 4, 8, … up to at most n (roughly). So total copy cost ≤ 1 + 2 + 4 + … + n ≤ 2n (geometric series). So total operations = n (appends) + O(n) (copies) = O(n). Hence amortized cost per append = O(n)/n = O(1).
Append #: 1 2 3 4 5 6 7 8 9 ...
Cost: 1 2 3 1 5 1 1 1 9 ...
(cost = 1 write + elements copied; the resizes at appends 2, 3, 5, 9 copy 1, 2, 4, 8 elements)
Total after n appends: n + (1+2+4+... < 2n) = n + O(n) = O(n)
Amortized per append: O(1)
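The per-append costs can be reproduced with a short simulation. This is a minimal sketch, assuming a cost model of 1 unit per write plus 1 unit per copied element and a starting capacity of 1; `append_costs` is a hypothetical helper, not how Python's list is implemented internally.

```python
def append_costs(n: int) -> list[int]:
    """Cost of each of n appends into a doubling array that starts at capacity 1.

    Assumed cost model: 1 unit to write the new element, plus 1 unit per
    element copied when a resize happens.
    """
    capacity, size = 1, 0
    costs = []
    for _ in range(n):
        cost = 1                  # writing the new element
        if size == capacity:      # full: double the capacity and copy everything
            cost += size
            capacity *= 2
        size += 1
        costs.append(cost)
    return costs


costs = append_costs(9)
print(costs)        # [1, 2, 3, 1, 5, 1, 1, 1, 9]
print(sum(costs))   # 24
```

Under this model the total stays below 3n for every n, which is the aggregate-method argument in executable form.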
Three Methods for Amortized Analysis
Aggregate method
Sum the total cost of n operations; show it's T(n); then amortized cost = T(n)/n. We used this for dynamic array: total O(n), so O(1) amortized per append.
Accounting (banker's) method
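Assign a "charge" (amortized cost) to each operation. Some of that charge pays for the operation itself; the rest is "credit" stored for later. We require that credit never goes negative. If we charge 3 units per append (so amortized O(1)), then a cheap append uses 1 and saves 2; when we resize we use the saved credit to pay for the copy. Charging only 2 is not enough: after a doubling, the older half of the elements has already spent its credit, so each new element must save enough to move itself and one older element at the next resize. So total charge n×3 = O(n) covers all real cost.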
Potential method
Define a "potential" function Φ on the data structure state. Let cᵢ be the real cost of the i-th operation and Φᵢ the potential after it. We show that amortized cost âᵢ = cᵢ + Φᵢ − Φᵢ₋₁ is small (e.g. O(1)). The sum telescopes: sum of âᵢ = sum of cᵢ + Φₙ − Φ₀; if Φₙ ≥ Φ₀ (for example Φ₀ = 0 and Φ is never negative), total real cost is bounded by sum of âᵢ. For dynamic array, Φ can be "2 × (number of elements) − capacity"; when we double, the drop in potential pays for the copy.
For interviews, the aggregate method is usually enough: "Total cost of n appends is O(n), so O(1) amortized."
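The potential method can be checked numerically. The sketch below assumes the same cost model as the aggregate argument (1 unit per write, 1 unit per copied element), a starting capacity of 1, and the potential Φ = 2 × (number of elements) − capacity from the text; `amortized_costs` is a hypothetical helper name.

```python
def amortized_costs(n: int) -> list[int]:
    """Amortized cost (real cost + change in potential) of each of n appends."""
    capacity, size = 1, 0
    phi_prev = 2 * size - capacity       # potential before the first append
    result = []
    for _ in range(n):
        real = 1                         # writing the new element
        if size == capacity:             # resize: copy all current elements
            real += size
            capacity *= 2
        size += 1
        phi = 2 * size - capacity        # potential after this append
        result.append(real + phi - phi_prev)
        phi_prev = phi
    return result


print(amortized_costs(9))   # every entry is 3
```

With a starting capacity of 1 the initial potential is −1 rather than 0, but the potential after any append is at least as large, so the sum of amortized costs still bounds the total real cost.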
Summary of Dynamic Array Resize
- Doubling when full: copy sizes 1, 2, 4, … up to ≈ n. Sum = O(n). So n appends cost O(n) total → O(1) amortized per append.
- If we grew by a constant (e.g. +10 each time): over n appends there are about n/10 resizes, copying 10 + 20 + 30 + … + n elements, which is Θ(n²) total → amortized O(n) per append. So doubling (or any constant factor growth) is essential for O(1) amortized.
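To see the difference between the two growth policies, one can count copied elements directly. This is an illustrative sketch; `total_copies` and its `grow` callback are hypothetical names, and the starting capacity of 1 is an assumption.

```python
from typing import Callable


def total_copies(n: int, grow: Callable[[int], int]) -> int:
    """Elements copied over n appends; grow(capacity) returns the new capacity."""
    capacity, size, copied = 1, 0, 0
    for _ in range(n):
        if size == capacity:          # full: resize and copy every element
            copied += size
            capacity = grow(capacity)
        size += 1
    return copied


n = 10_000
doubling = total_copies(n, lambda c: 2 * c)    # geometric growth
additive = total_copies(n, lambda c: c + 10)   # constant-size growth
print(doubling, additive)
```

For n = 10,000 the doubling policy copies fewer than 2n elements, while the +10 policy copies millions, matching the O(n) versus Θ(n²) totals above.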
When Amortized Analysis Applies
Use it when you have a sequence of operations and some are occasionally expensive. Examples: dynamic array append/push, hash table insert (with rehashing), splay tree operations, incrementing a binary counter (flipping bits). We don't use "amortized" for a single standalone operation—we use it for the cost per operation in a long run.
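The binary counter mentioned above makes a compact second example. A single increment can flip many bits (e.g. 0111 → 1000), yet the total over n increments stays below 2n. The sketch below counts flips by comparing consecutive values; `total_bit_flips` is a hypothetical helper.

```python
def total_bit_flips(n: int) -> int:
    """Total number of bit flips when a counter is incremented from 0 up to n."""
    flips = 0
    for value in range(n):
        # The bits that differ between value and value + 1 are exactly the flipped ones.
        flips += bin(value ^ (value + 1)).count("1")
    return flips


print(total_bit_flips(16))   # 31, which is less than 2 * 16
```

Bit 0 flips on every increment, bit 1 on every second one, bit 2 on every fourth, and so on; that is the same geometric series as in the dynamic array argument.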
A common mistake is confusing amortized with average case. Amortized is "worst-case total over the sequence, divided by number of operations", with no randomness. Average case is "expected cost of one operation under a distribution over inputs." Also: saying "append is O(1)" without "amortized" is fine in practice (everyone understands), but technically it's O(1) amortized; a single append can be O(n) in the worst case.
When you see "dynamic array" or "list that grows," think doubling (or 1.5× or similar) and geometric series. The sum of resize costs is O(n), so amortized O(1) per insert. Same idea applies when rehashing a hash table: if we double the table when load factor is high, insert is O(1) amortized.
If asked "what's the time complexity of appending n elements to a list?" say "O(n) total, so O(1) amortized per append—occasionally we double and copy, but the total copy cost is O(n)." If they ask "why is append O(1)?" you can say "amortized O(1) because we double the capacity when full, so the total work for n appends is O(n)." That shows you know the difference between one expensive operation and amortized cost.
Summary
- Amortized cost = (total cost of a sequence of operations) / (number of operations). We bound the total, then divide.
- Used when some operations are occasionally expensive (e.g. resize); over the sequence, the average cost per operation is small.
- Dynamic array append with doubling: total O(n) for n appends → O(1) amortized per append. Constant-size growth would give amortized O(n).
- Methods: aggregate (sum total, divide), accounting (charge and credit), potential (potential function). For interviews, aggregate is usually enough.
- Amortized is not average case: it's deterministic worst-case over the sequence.
3.9 Recurrence Relations
Introduction
A recurrence relation is an equation that defines a function T(n) in terms of its values on smaller inputs—e.g. T(n) = 2T(n/2) + n. Recurrences appear whenever we analyze recursive algorithms: the cost of solving a problem of size n is the cost of the "current step" plus the cost of solving smaller subproblems. To get Big-O for the algorithm, we must solve the recurrence—find a closed form or an asymptotic bound. This section ties together recurrences as a concept, when they arise, and how to solve them using the tools from earlier topics (recursion tree, Master Theorem, and substitution).
Why This Topic Matters
Merge sort, quicksort, binary search, divide-and-conquer, and many recursive DP or backtracking algorithms have runtimes that naturally express as recurrences. Writing the recurrence is step one; solving it gives you the complexity. Recurrence relations are the bridge between "what the code does" and "what is T(n) in Big-O." In interviews, you might be asked to write a recurrence for your recursive solution and then solve it (or say "it fits the Master Theorem, so Θ(n log n)").
What Is a Recurrence Relation?
Formally, a recurrence for T(n) has the form:
T(n) = (expression involving T(n₁), T(n₂), …) + (non-recursive work)
where n₁, n₂, … are smaller than n (e.g. n−1, n/2, n/3). We also need a base case: T(1) = c or T(0) = c so the recursion stops. The "non-recursive work" is the cost of dividing the problem and combining results (e.g. merging two sorted halves). We want to find a closed form or asymptotic bound for T(n).
We often write T(n/2) even when n is odd; we assume n is a power of 2 for simplicity, or we use floor/ceiling. The Big-O result is the same. The "non-recursive" term is usually written as f(n)—e.g. Θ(n), Θ(1), Θ(n²)—and we use the Master Theorem or recursion tree by comparing f(n) with n^(log_b a).
Where Recurrences Come From
- Divide-and-conquer: Split into a subproblems of size n/b, do f(n) work. T(n) = aT(n/b) + f(n). Examples: merge sort 2T(n/2)+n, binary search T(n/2)+1.
- Linear reduction: One subproblem of size n−1 plus linear or constant work. T(n) = T(n−1) + n or T(n) = T(n−1) + Θ(1). Examples: factorial, simple recursive scan.
- Multiple subproblems with different sizes: T(n) = T(n/3) + T(2n/3) + n. Doesn't fit the standard Master Theorem form; use recursion tree.
Common Recurrence Types and Their Solutions
| Recurrence | Solution | Typical use |
|---|---|---|
| T(n) = T(n−1) + Θ(1) | Θ(n) | Single recursion, constant work |
| T(n) = T(n−1) + n | Θ(n²) | Single recursion, linear work |
| T(n) = T(n/2) + Θ(1) | Θ(log n) | Binary search |
| T(n) = T(n/2) + n | Θ(n) | One half, linear merge |
| T(n) = 2T(n/2) + Θ(1) | Θ(n) | Tree traversal style |
| T(n) = 2T(n/2) + n | Θ(n log n) | Merge sort |
| T(n) = 2T(n/2) + n² | Θ(n²) | Heavy combine step |
How to Solve Recurrences: Method Overview
- Recursion tree (3.5): Draw the tree, sum cost per level (or over leaves). Works for any recurrence; especially useful when Master Theorem doesn't apply (e.g. T(n/3)+T(2n/3)+n).
- Master Theorem (3.6): For T(n) = aT(n/b) + f(n). Compare f(n) with n^(log_b a); apply Case 1, 2, or 3. Fast when it fits.
- Substitution: Guess the form (e.g. T(n) = O(n log n)), then prove by induction. Useful when you have a candidate bound and need to verify, or when the recurrence is non-standard.
- Expand and sum: For simple recurrences like T(n) = T(n−1) + n, unroll: T(n) = n + (n−1) + … + T(1) = n(n+1)/2 + c = Θ(n²).
Quick Derivation: T(n) = T(n−1) + n
T(n) = n + T(n−1) = n + (n−1) + T(n−2) = … = n + (n−1) + … + 1 + T(0) = n(n+1)/2 + Θ(1) = Θ(n²).
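The closed form can be sanity-checked by evaluating the recurrence directly. A minimal sketch, assuming the base case T(0) = 1; `linear_recurrence` is a hypothetical name.

```python
def linear_recurrence(n: int, base: int = 1) -> int:
    """Evaluate T(n) = T(n-1) + n by unrolling, with T(0) = base."""
    total = base
    for k in range(1, n + 1):
        total += k            # the non-recursive work done at size k
    return total


for n in (1, 10, 100):
    # Matches n(n+1)/2 plus the base-case constant.
    assert linear_recurrence(n) == n * (n + 1) // 2 + 1
```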
Quick Derivation: T(n) = 2T(n/2) + n
Master Theorem: a=2, b=2, f(n)=n, n^(log_b a)=n. So f(n)=Θ(n)=Θ(n^(log_b a)). Case 2 → T(n)=Θ(n log n). Or recursion tree: each level costs n, log n levels → n log n.
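For powers of two the recurrence can also be evaluated exactly, which gives a quick check on the Θ(n log n) answer. This sketch assumes T(1) = 1, in which case T(n) = n·log₂ n + n; `merge_cost` is a hypothetical name.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def merge_cost(n: int) -> int:
    """T(n) = 2T(n/2) + n with T(1) = 1, for n a power of two."""
    if n == 1:
        return 1
    return 2 * merge_cost(n // 2) + n


for k in range(11):
    n = 2 ** k
    # Each of the k levels costs n, and the n leaves cost 1 each.
    assert merge_cost(n) == n * k + n
```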
When the Recurrence Doesn't Fit Standard Form
- Uneven splits: T(n) = T(n/3) + T(2n/3) + n. Use recursion tree; depth from slowest branch (2n/3); per-level cost O(n) → O(n log n).
- More than one recursive term with different arguments: Still draw the tree and sum, or try substitution with a guessed bound.
- f(n) not polynomial: e.g. T(n) = 2T(n/2) + n log n. Use the extended Master Theorem or a recursion tree; for this recurrence the result is Θ(n log² n).
Common mistakes: forgetting the base case when writing a recurrence (without it, T(n) is not well-defined); using the Master Theorem when the recurrence isn't in the form aT(n/b)+f(n) (e.g. T(n)=T(n/2)+T(n/3)+n); and treating the "combine" cost and the "divide" cost as separate terms, when both go into f(n) as the non-recursive part.
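When no formula applies, evaluating the recurrence numerically is a cheap way to test a guessed bound. The sketch below handles the uneven split T(n) = T(n/3) + T(2n/3) + n, assuming floor division for the smaller part and T(n) = 1 for n < 3; `uneven` is a hypothetical name.

```python
import math
from functools import lru_cache


@lru_cache(maxsize=None)
def uneven(n: int) -> int:
    """T(n) = T(floor(n/3)) + T(n - floor(n/3)) + n, with T(n) = 1 for n < 3."""
    if n < 3:
        return 1
    third = n // 3
    return uneven(third) + uneven(n - third) + n


for n in (10**3, 10**4, 10**5):
    # If T(n) = Θ(n log n), this ratio should stay within constant bounds.
    print(n, round(uneven(n) / (n * math.log2(n)), 3))
```

A ratio that stays roughly flat as n grows is consistent with the O(n log n) bound from the recursion tree; it is evidence, not a proof.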
When you see a recursive algorithm, write the recurrence first: "We do f(n) work and make a calls of size n/b" → T(n)=aT(n/b)+f(n). Then decide: does the Master Theorem apply? If yes, plug in. If no (e.g. uneven split), use the recursion tree. For T(n)=T(n−1)+something, unrolling usually gives a simple sum.
If you give a recursive solution, the interviewer may ask "what's the recurrence?" Say something like: "We split into two halves and do O(n) merge, so T(n)=2T(n/2)+n." Then: "That's Master Theorem Case 2, so Θ(n log n)." For a recurrence that doesn't fit, say "I'd draw the recursion tree and sum the levels." That shows you know the full toolkit.
Summary
- Recurrence relation = equation defining T(n) in terms of T on smaller inputs plus non-recursive work; need a base case.
- Common forms: T(n)=aT(n/b)+f(n) (divide-and-conquer), T(n)=T(n−1)+g(n) (linear reduction).
- Solve with: recursion tree, Master Theorem (when form fits), substitution, or expand-and-sum for simple cases.
- Know the classic solutions: T(n−1)+n → Θ(n²); 2T(n/2)+n → Θ(n log n); T(n/2)+1 → Θ(log n).
- When the recurrence is non-standard (uneven splits, non-polynomial f), use the recursion tree or substitution.
3.10 Proof of Time Complexity (Basic Induction)
Introduction
So far we've derived time complexity using recursion trees and the Master Theorem. To be rigorous, we can prove that our solution is correct—e.g. that T(n) = O(n log n) for the recurrence T(n) = 2T(n/2) + n. The standard way to do that is the substitution method: guess a bound (e.g. T(n) ≤ c·n log n), then use induction to show that the recurrence implies the bound for a suitable constant c and large n. This section introduces basic induction and the substitution method so you can prove (or verify) complexity bounds when needed—and understand why the Master Theorem and recursion tree conclusions are valid.
Why This Topic Matters
In coursework and sometimes in interviews, you may need to justify that your asymptotic bound is correct—not just state it. The substitution method is the standard proof technique for recurrence solutions. It also helps when the recurrence doesn't fit the Master Theorem: you guess the answer from the recursion tree, then prove it by substitution. Understanding induction makes you confident that "T(n) = 2T(n/2) + n ⇒ T(n) = Θ(n log n)" is not just a formula but a provable fact.
Induction in One Paragraph
Induction proves a statement P(n) for all natural numbers n (or all n ≥ n₀). You show: (1) Base case: P(n₀) is true. (2) Inductive step: For every n > n₀, if P(k) is true for all k < n (or for k = n−1, in simple induction), then P(n) is true. Then by induction, P(n) holds for all n ≥ n₀. For recurrences we often use strong induction: assume the bound holds for all smaller values (T(1), T(2), …, T(n−1)), then prove it for T(n) using the recurrence.
We prove Big-O bounds: "T(n) ≤ c·f(n) for n ≥ n₀." We choose c and n₀ so that the base case and the inductive step both work. Sometimes we need to subtract a lower-order term in the guess (e.g. T(n) ≤ c·n log n − dn) so that the recurrence "falls into" the bound when we substitute.
The Substitution Method (Steps)
- Guess the form of the solution: e.g. T(n) ≤ c·n log n for some constant c > 0.
- State the inductive hypothesis: Assume T(k) ≤ c·k log k for all k < n (and for k in the range we care about, e.g. k ≥ 2).
- Plug into the recurrence: T(n) = 2T(n/2) + n ≤ 2(c·(n/2) log(n/2)) + n = c·n log(n/2) + n = c·n log n − c·n + n.
- Show the right-hand side is ≤ your guess: We want c·n log n − c·n + n ≤ c·n log n. That holds if −c·n + n ≤ 0, i.e. c ≥ 1. So choose c = 1 (or larger).
- Base case: Because log 1 = 0, the bound c·1·log 1 = 0 cannot cover T(1) > 0 for any c. So start the induction at n₀ = 2: check the smallest cases (e.g. T(2) and T(3)) by hand, choose c large enough to cover them, and let the inductive step handle all larger n.
Worked Example: T(n) = 2T(n/2) + n ⇒ T(n) = O(n log n)
Guess: T(n) ≤ c·n log n for n ≥ 2, for some c ≥ 1.
Inductive hypothesis: Assume T(k) ≤ c·k log k for all 2 ≤ k < n.
Recurrence: T(n) = 2T(n/2) + n. By the hypothesis (with k = n/2, and we assume n/2 ≥ 2 so n ≥ 4, or we handle n = 2, 3 separately), T(n/2) ≤ c·(n/2)·log(n/2). So:
T(n) ≤ 2·c·(n/2)·log(n/2) + n = c·n·log(n/2) + n = c·n·(log n − 1) + n = c·n log n − c·n + n.
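We need T(n) ≤ c·n log n. So we need −c·n + n ≤ 0, i.e. c ≥ 1. The base cases add a second constraint: for example, if T(1) = 1 then T(2) = 2T(1) + 2 = 4, and T(2) ≤ c·2·log 2 = 2c needs c ≥ 2. Choose c to satisfy both. Then T(n) ≤ c·n log n for n ≥ 2. Hence T(n) = O(n log n).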
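A numeric check cannot replace the induction, but it quickly shows which constant the base case forces. The sketch below assumes T(1) = 1 and uses floor and ceiling halves so that every n is covered; `t_merge` is a hypothetical name and log means log₂.

```python
import math
from functools import lru_cache


@lru_cache(maxsize=None)
def t_merge(n: int) -> int:
    """T(n) = T(floor(n/2)) + T(ceil(n/2)) + n, with T(1) = 1."""
    if n == 1:
        return 1
    return t_merge(n // 2) + t_merge(n - n // 2) + n


# c = 1 satisfies the inductive step but fails the base case n = 2.
assert t_merge(2) > 1 * 2 * math.log2(2)

# c = 2 covers the base case, and the bound then holds across the tested range.
c = 2
assert all(t_merge(n) <= c * n * math.log2(n) for n in range(2, 5000))
```

Under these assumptions c = 1 is too small even though the algebra of the inductive step only asks for c ≥ 1, which is why the base case has to be checked separately.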
When the Guess Needs a Lower-Order Term
Sometimes a "plain" guess doesn't work because the recurrence leaves an extra positive term that won't disappear. For example, for T(n) = 2T(n/2) + 1 the guess T(n) ≤ c·n gives T(n) ≤ 2·c·(n/2) + 1 = c·n + 1, which is not ≤ c·n for any c. Then we guess with a subtraction: T(n) ≤ c·n − d. Substituting, we get 2(c·(n/2) − d) + 1 = c·n − 2d + 1 = c·n − d − (d − 1); we choose d ≥ 1 so that d − 1 ≥ 0, and then the right-hand side is ≤ c·n − d. So the inductive step goes through. The same trick works for other forms (e.g. guessing T(n) ≤ c·n log n − dn). The Master Theorem and recursion tree already tell us the correct form; the subtraction trick is just to make the algebra work in the proof.
Proving Θ (Upper and Lower Bounds)
To prove T(n) = Θ(n log n), we prove both T(n) = O(n log n) and T(n) = Ω(n log n). For the upper bound we use substitution as above. For the lower bound, we guess T(n) ≥ c·n log n and show that the recurrence implies it (with a suitable c > 0 and possibly a lower-order term added in the guess). The idea is the same; the inequalities are reversed.
Summary of the Substitution Method
- Guess the asymptotic form (from recursion tree or Master Theorem).
- Assume by (strong) induction that the bound holds for all smaller inputs.
- Substitute into the recurrence and show that the bound holds for n.
- Choose constants (c, n₀, and sometimes a subtracted term) so that the base case and inductive step both work.
A common mistake is citing a theorem in place of the proof, e.g. writing "T(n) = 2T(n/2) + n, so by the Master Theorem T(n) = O(n log n)." That's correct but not a substitution proof. In substitution you must plug your inductive hypothesis (the bound for T(n/2)) into the recurrence and show the bound for T(n). Also: forgetting the base case, which induction requires.
In interviews you usually just state the bound and cite the Master Theorem or recursion tree. Substitution is for when someone asks "can you prove it?" or in written exams. If you do prove by substitution, keep the algebra simple: write "T(n) ≤ 2·(c·(n/2) log(n/2)) + n = …" and end with "≤ c·n log n when c ≥ 1."
Most interviewers are satisfied with "By the Master Theorem, Case 2, so Θ(n log n)" or "The recursion tree has log n levels and n work per level, so O(n log n)." If they ask "can you prove it?" say: "We'd use the substitution method: assume T(k) ≤ c·k log k for k < n, substitute into T(n) = 2T(n/2)+n, and show T(n) ≤ c·n log n for some c." You don't need to do the full algebra unless they insist.
Summary
- Induction proves a statement for all n: base case + inductive step (assume for smaller k, prove for n).
- Substitution method for recurrences: guess a bound (e.g. T(n) ≤ c·n log n), assume it holds for all smaller inputs, substitute into the recurrence, and show the bound holds for n; choose c and base case appropriately.
- Sometimes the guess needs a lower-order term (e.g. c·n log n − dn) so the inequality works out.
- To prove Θ, prove both O and Ω.
- In practice, recursion tree and Master Theorem are enough for stating complexity; substitution is for rigor or when asked to prove.
4.1 Number Systems
Introduction
When you see the number 42, you instantly read it as “forty-two.” When a computer stores that same value, it uses a different representation: 101010 in binary. The value is the same; the way we write it depends on the number system (base) we choose. In algorithms and programming, you will constantly meet binary (bits, masks, XOR), hexadecimal (memory addresses, colors, hashes), and sometimes octal. Understanding how number systems work—and how to convert between them—is essential for bit manipulation, low-level reasoning, and many interview problems.
Real-World Analogy
Think of number systems like different languages for writing the same quantities.
- Decimal (base 10): What we use every day. Ten symbols: 0–9. “42” means 4×10 + 2×1.
- Time: Clocks use base 60 for seconds and minutes (60 seconds in a minute), and base 12 for hours (12 on the clock face). So “1:30” is one way of writing a duration—same idea as a different base.
- Binary (base 2): Only two symbols: 0 and 1. Computers use it because hardware is built from switches that are either on or off. “101010” is the same quantity as decimal 42, just written in base 2.
- Hexadecimal (base 16): Sixteen symbols: 0–9 and A–F. Shorthand for binary (one hex digit = four bits). Used in memory addresses, color codes (#FF5733), and when debugging.
The value (how many) doesn’t change; only the notation (how we write it) changes with the base.
Formal Definition
A number system (or numeral system) is a way of representing numbers using a fixed set of symbols and a base (radix). In a positional system, the value of a digit depends on its position in the number.
In base b, a number written as dₖ dₖ₋₁ … d₁ d₀ (where each dᵢ is a digit from 0 to b−1) has value:
dₖ·bᵏ + dₖ₋₁·bᵏ⁻¹ + … + d₁·b¹ + d₀·b⁰.
So in base 10, 42 = 4×10¹ + 2×10⁰. In base 2, 101010 = 1×2⁵ + 0×2⁴ + 1×2³ + 0×2² + 1×2¹ + 0×2⁰ = 32 + 8 + 2 = 42.
The radix is the number of distinct digits. Base 2 → digits 0,1. Base 10 → 0–9. Base 16 → 0–9, A(10), B(11), C(12), D(13), E(14), F(15).
Why This Topic Matters
In DSA and interviews, number systems show up everywhere:
- Bit manipulation: AND, OR, XOR, shifts—all operate on the binary representation. You need to think in base 2 to set/clear/test bits.
- Fast exponentiation / modular arithmetic: The binary expansion of the exponent drives “square and multiply” algorithms (covered later).
- Encoding and hashing: Hex strings represent raw bytes (e.g., SHA hashes). Parsing and generating such strings requires comfort with base 16.
- Problem constraints: Some problems ask for “the number of 1s in the binary representation” or “convert to base k”—direct number-system questions.
Building a clear mental model of bases and conversion will make later topics (Bit Manipulation, Fast Exponentiation, Modular Arithmetic) much easier.
Mental Model
In any base b, each position is a power of b. The rightmost digit is the “ones” place (b⁰), the next is “b’s” (b¹), then “b²’s,” and so on. So:
Base 10: ... thousands hundreds tens ones
... 10³ 10² 10¹ 10⁰
Example: 4 2 → 4×10 + 2×1 = 42
Base 2: ... 32 16 8 4 2 1
... 2⁵ 2⁴ 2³ 2² 2¹ 2⁰
Example: 1 0 1 0 1 0 → 32+8+2 = 42
Base 16: ... 256 16 1
... 16² 16¹ 16⁰
Example: 2 A → 2×16 + 10×1 = 42
Same number, different “columns.” Conversion is just re-expressing the same total in a different column system.
Decimal (Base 10)
We use digits 0–9. Each place is a power of 10. You already do this intuitively: 347 = 3×100 + 4×10 + 7×1. Nothing new here except naming: this is our default radix.
Binary (Base 2)
Only two digits: 0 and 1. Every number is a sum of distinct powers of 2. That’s why binary is natural for computers: each bit is one “switch” (on/off).
- Rightmost bit = 2⁰ = 1 (least significant bit, LSB).
- Next left = 2¹ = 2, then 4, 8, 16, … (most significant bit, MSB, on the left).
42 in binary: 42 = 32 + 8 + 2 = 1×2⁵ + 0×2⁴ + 1×2³ + 0×2² + 1×2¹ + 0×2⁰ → digits from MSB to LSB: 101010. So 42₁₀ = 101010₂.
Counting in binary: 0, 1, 10, 11, 100, 101, 110, 111, 1000, … (same idea as rolling over digits when you pass 9 in decimal).
Octal (Base 8)
Digits 0–7. Each octal digit corresponds to exactly three binary digits (because 8 = 2³). So conversion between binary and octal is trivial: group bits in threes from the right. Less common today than hex, but you may see file permissions (e.g., chmod 755) expressed in octal.
Example: 42₁₀ = 101010₂. Group as 101 | 010 → 5 and 2 in octal → 52₈. Check: 5×8 + 2 = 42 ✓.
Hexadecimal (Base 16)
Digits 0–9 and A–F (A=10, B=11, C=12, D=13, E=14, F=15). One hex digit = four bits (16 = 2⁴). So binary ↔ hex is a simple grouping by fours. Hex is compact and easy to read; that’s why memory addresses, RGB values (e.g., #FF5733), and many hashes are shown in hex.
42₁₀ = 101010₂. Pad to four-bit groups: 0010 | 1010 → 2 and 10 (A in hex) → 0x2A or 2A₁₆. Check: 2×16 + 10 = 42 ✓.
Step-by-Step: Converting Between Bases
Two main directions: from decimal to base b, and from base b to decimal.
Decimal → Base b (e.g., Decimal → Binary)
Method: repeated division by b; remainders (read in reverse) give digits from LSB to MSB.
- Divide the number by b. The remainder is the rightmost digit (LSB).
- Take the quotient and divide by b again. The new remainder is the next digit to the left.
- Repeat until the quotient is 0. The sequence of remainders (last to first) is the number in base b.
Example: 42 → binary (base 2). 42÷2=21 rem 0; 21÷2=10 rem 1; 10÷2=5 rem 0; 5÷2=2 rem 1; 2÷2=1 rem 0; 1÷2=0 rem 1. Remainders (bottom to top): 1,0,1,0,1,0 → 101010.
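The same trace can be produced in code. This is a small sketch of the repeated-division steps; `division_steps` is a hypothetical helper, and joining remainders with `str` only works as written for bases up to 10.

```python
def division_steps(n: int, b: int) -> list[tuple[int, int]]:
    """Return (quotient, remainder) for each division of n by b until the quotient is 0."""
    steps = []
    while n > 0:
        n, r = divmod(n, b)       # quotient feeds the next step; remainder is the next digit
        steps.append((n, r))
    return steps


steps = division_steps(42, 2)
print(steps)    # [(21, 0), (10, 1), (5, 0), (2, 1), (1, 0), (0, 1)]

# Remainders come out LSB first, so read them in reverse to get the written form.
digits = "".join(str(r) for _, r in reversed(steps))
print(digits)   # 101010
```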
Base b → Decimal
Method: expand by place value. Multiply each digit by its place power and add: dₖ·bᵏ + … + d₀·b⁰. For binary, that’s “add the powers of 2 where the bit is 1.”
Example: 101010₂ = 1×32 + 0×16 + 1×8 + 0×4 + 1×2 + 0×1 = 32+8+2 = 42.
Binary ↔ Hex (Shortcut)
Binary → Hex: group bits in fours from the right; replace each group with the corresponding hex digit (0–9, A–F). Hex → Binary: replace each hex digit with its 4-bit binary form (e.g., A → 1010).
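The grouping shortcut can be sketched directly on strings. Both helpers below (`bits_to_hex`, `hex_to_bits`) are hypothetical names for illustration; they assume the input contains only valid digits and do no validation.

```python
def bits_to_hex(bits: str) -> str:
    """Convert a binary digit string to hex by grouping bits in fours from the right."""
    # Left-pad with zeros so the length is a multiple of 4.
    padded = bits.zfill((len(bits) + 3) // 4 * 4)
    groups = [padded[i:i + 4] for i in range(0, len(padded), 4)]
    return "".join("0123456789ABCDEF"[int(g, 2)] for g in groups)


def hex_to_bits(hex_digits: str) -> str:
    """Replace each hex digit with its 4-bit binary form."""
    return "".join(format(int(h, 16), "04b") for h in hex_digits)


print(bits_to_hex("101010"))   # 2A
print(hex_to_bits("2A"))       # 00101010
```

Note that `hex_to_bits` keeps leading zeros, because every hex digit always expands to exactly four bits.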
ASCII Diagram: Place Value Across Bases
Decimal 42 in different bases (same value, different digits):
Base 10: 4 2 → 4×10¹ + 2×10⁰
Base 2: 1 0 1 0 1 0 → 1×2⁵ + 1×2³ + 1×2¹
Base 8: 5 2 → 5×8¹ + 2×8⁰
Base 16: 2 A → 2×16¹ + 10×16⁰
Position: (MSB) ............ (LSB)
Value: b^(n-1) ... b^1 b^0
Python Implementation
Python has built-in support for number systems. You can parse strings in a given base and format integers in binary, octal, or hex.
Parsing: String in Base b → Integer
# int(string, base) — base 2 to 36
int("101010", 2) # 42
int("52", 8) # 42
int("2A", 16) # 42
int("2a", 16) # 42 (hex digits case-insensitive)
# Optional prefix: 0b, 0o, 0x (base inferred)
int("0b101010", 0) # 42
int("0o52", 0) # 42
int("0x2A", 0) # 42
Formatting: Integer → String in Base b
# bin(), oct(), hex() return strings with prefix
bin(42) # '0b101010'
oct(42) # '0o52'
hex(42) # '0x2a'
# Without prefix: use format() or slice
format(42, 'b') # '101010'
format(42, 'o') # '52'
format(42, 'x') # '2a'
format(42, 'X') # '2A' (uppercase hex)
# Generic base (2–36) — no built-in; implement with repeated division
Custom Base Conversion (Decimal to Base b)
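def to_base(n: int, b: int) -> str:
    # Guard against bases the digit table cannot represent (b = 1 would loop forever).
    if not 2 <= b <= 36:
        raise ValueError("base must be between 2 and 36")
    if n == 0:
        return "0"
    digits = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    result = []
    neg = n < 0
    n = abs(n)
    while n:
        result.append(digits[n % b])  # next digit, LSB first
        n //= b
    if neg:
        result.append("-")  # ends up in front after the reversal
    return "".join(reversed(result))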
# Examples
to_base(42, 2) # "101010"
to_base(42, 16) # "2A"
to_base(255, 16) # "FF"
Custom Base Conversion (Base b String to Decimal)
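def from_base(s: str, b: int) -> int:
    if not 2 <= b <= 36:
        raise ValueError("base must be between 2 and 36")
    s = s.strip().upper()
    if not s:
        return 0  # design choice: empty input maps to 0 (int("", b) raises instead)
    start = 1 if s[0] in "-+" else 0
    sign = -1 if s[0] == "-" else 1
    n = 0
    digits = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    for c in s[start:]:
        d = digits.find(c)
        if d < 0 or d >= b:  # reject characters that are not digits in base b
            raise ValueError(f"invalid digit {c!r} for base {b}")
        n = n * b + d  # Horner step
    return sign * n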
# Examples
from_base("101010", 2) # 42
from_base("2A", 16) # 42
Line-by-Line Explanation: to_base
- if n == 0: return "0": edge case; zero in any base is "0".
- digits = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ": one character per value 0 to 35; supports bases 2–36.
- result = []: we collect remainder digits; we'll reverse at the end because we get LSB first.
- neg = n < 0 and n = abs(n): handle negatives by tracking the sign and converting to a positive value; then we process positive n.
- while n: repeat until the quotient is 0. n % b is the next digit (LSB first); n //= b is the quotient for the next iteration.
- return "".join(reversed(result)): remainders were collected LSB-first, so reversing gives MSB-first (normal written form). The "-" appended for a negative number ends up in front after the reversal.
This is exactly the “repeated division” algorithm we described earlier; the code just automates it.
Edge Cases
- Zero: In any base, 0 is written "0". Your to_base(0, b) should return "0".
- Negative numbers: Standard conversion is for non-negative integers. If you need negative, handle the sign separately (e.g., convert abs(n) and prepend "-").
- Empty string: int("", 2) raises ValueError. In custom from_base, you might return 0 or raise; document the choice.
- Invalid digits: e.g. int("12", 2), where digit 2 is invalid in base 2. Python raises ValueError. In your own parser, validate each character against the base.
- Large numbers: Python integers are arbitrary-precision, so no overflow; int("1"*1000, 2) works.
Common Mistakes
- Reading binary/hex backwards: The rightmost bit is the LSB (2⁰). Don’t treat the leftmost as “first” when computing value; left is MSB, highest power.
- Confusing prefix with value: 0b101010 is a literal in code; the value is 42. When you call bin(42) you get the string '0b101010'; the 0b is a prefix, not part of the mathematical value.
- Off-by-one in bit position: “Bit 0” usually means the LSB (2⁰). “Bit i” often means the coefficient of 2ⁱ. Check problem wording (0-indexed vs 1-indexed).
- Using int(string) without base for non-decimal: int("101010") is 101010 in decimal, not 42. For binary you must use int("101010", 2).
int("101010") returns 101010 (one hundred one thousand ten in decimal). To interpret "101010" as binary, you must call int("101010", 2), which returns 42. Always pass the base when the string is not in decimal.
Pattern Recognition
Base conversion fits two patterns you’ll reuse:
- Decimal → base b: Repeated division by b; remainders (reverse order) = digits. Same idea as “extract digits” in decimal (n % 10, n // 10).
- Base b → decimal: Horner-style expansion: start at 0, then for each digit do value = value * b + digit. One pass left-to-right.
Many “digit” problems (e.g., “sum of digits in base k,” “palindrome in base b”) use these same building blocks.
Binary ↔ hex conversion is O(number of digits): group four bits per hex digit. Decimal ↔ binary via repeated division is O(log n) steps (each step halves the number). For very large numbers, Python’s int(..., 2) and bin() are implemented in C and are efficient; use them instead of hand-rolled loops when possible.
When a problem involves “binary representation,” “number of 1 bits,” “reverse bits,” or “base k,” state clearly: “We can get the binary representation with bin(n) or by repeated division; then operate on the string or digits.” For “count set bits,” mention that you can use bin(n).count('1') or bit operations (n & 1 and n >>= 1 in a loop). Knowing both the string and the bit-op approaches shows depth.
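Both approaches to counting set bits fit in a few lines. This is a minimal sketch for non-negative integers; the function names are hypothetical.

```python
def popcount_string(n: int) -> int:
    """Count 1 bits via the binary string representation."""
    return bin(n).count("1")


def popcount_bits(n: int) -> int:
    """Count 1 bits with bit operations: test the lowest bit, then shift right."""
    count = 0
    while n:
        count += n & 1    # 1 if the least significant bit is set
        n >>= 1           # drop the least significant bit
    return count


print(popcount_string(42), popcount_bits(42))   # 3 3
```

The bit-operation version does one iteration per bit of n, so it runs in O(log n) steps, the same order as building the string.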
Practice Problems
- Convert a decimal number to binary and to hex by hand for small values (e.g., 0–255).
- Implement to_base(n, b) and from_base(s, b) for bases 2–36 without using int(s, b) or bin/hex (practice the algorithm).
- Given a positive integer, count the number of 1s in its binary representation (Hamming weight).
- Check if a number’s binary representation is a palindrome (e.g., 9 = 1001 is; 10 = 1010 is not).
- Convert a number from base A to base B (e.g., base 7 → base 5) by going through decimal or by direct conversion if you know the place values.
Summary
- A number system is a way of writing numbers using a base (radix) and a set of digits. In a positional system, value = sum of digit × base^position.
- Decimal (base 10), binary (base 2), octal (base 8), and hexadecimal (base 16) are the most common in programming. Binary is fundamental for bits; hex is a compact shorthand for binary (one hex digit = four bits).
- Decimal → base b: Repeated division by b; remainders (reverse order) give the digits. Base b → decimal: Expand by place value, or use Horner: value = value * b + digit.
- In Python: int(string, base) parses; bin(), oct(), hex() and format(n, 'b'/'o'/'x') format. Use these for correctness and speed; implement by hand when practicing the algorithm.
- Number systems underpin bit manipulation, fast exponentiation, and encoding; master them early for a smooth path through Mathematics for Algorithms and Bit Manipulation.
4.2 Prime Numbers
Introduction
A prime number is a natural number greater than 1 that has exactly two positive divisors: 1 and itself. Primes are the building blocks of integers—every integer greater than 1 can be written uniquely (up to order) as a product of primes. In DSA, primes appear in hashing (hash table sizes), cryptography, factorization problems, and counting (e.g., divisors, coprime pairs). This section covers the definition, why primes matter, how to test if a number is prime, and how to count primes in a range—setting you up for the Sieve of Eratosthenes (next topic) and number-theoretic algorithms.
Real-World Analogy
Think of primes as indivisible building blocks. Just as you can’t split an atom into smaller pieces of the same substance (in classical chemistry), you can’t factor a prime into smaller positive integers other than 1 and itself. Composite numbers are “molecules” made of primes: 12 = 2×2×3, 42 = 2×3×7. Once you know the primes up to a limit, you can build and analyze all integers in that range—same idea as having a periodic table of elements.
Formal Definition
A prime number is an integer p ≥ 2 such that the only positive divisors of p are 1 and p. An integer n ≥ 2 that is not prime is composite; it has at least one divisor d with 2 ≤ d ≤ n−1.
By convention, 1 is neither prime nor composite. The first primes are 2, 3, 5, 7, 11, 13, 17, 19, 23, … Note that 2 is the only even prime; every other even number is divisible by 2.
Why This Topic Matters
- Primality testing: “Is n prime?” appears in problems and interviews. You need a correct, efficient check—often trial division first, then optimizations.
- Counting primes: “How many primes ≤ N?” or “List all primes ≤ N” lead to the Sieve of Eratosthenes (next section)—one of the most common algorithms in number theory.
- Divisors and factorization: GCD, LCM, divisor count, and “sum of divisors” rely on prime factorization. Primes are the first step.
- Hashing and randomization: Prime-sized hash tables and prime moduli reduce collisions. Choosing a “large enough prime” is a standard trick.
Mental Model
For a number n to be composite, it must have a divisor d with 2 ≤ d ≤ √n. Why? If n = a×b with a ≤ b, then a ≤ √n (otherwise a×b > n). So we only need to check divisors up to √n—that’s the core of trial-division primality testing and of “find one factor” for composites.
n = a × b with a ≤ b
⇒ a² ≤ a·b = n
⇒ a ≤ √n
So: if no divisor in [2, √n], then n is prime.
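The √n bound also gives a simple way to find one factor of a composite. The sketch below is a plain trial-division search, assuming n ≥ 2; `smallest_factor` is a hypothetical helper name.

```python
def smallest_factor(n: int) -> int:
    """Smallest divisor d >= 2 of n; returns n itself when n is prime. Assumes n >= 2."""
    d = 2
    while d * d <= n:         # only candidates up to sqrt(n) need checking
        if n % d == 0:
            return d
        d += 1
    return n                  # no divisor in [2, sqrt(n)], so n is prime


print(smallest_factor(91))    # 7, since 91 = 7 × 13
print(smallest_factor(97))    # 97, because 97 is prime
```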
Primality Testing: Is n Prime?
We want a function is_prime(n) that returns True if n is prime and False otherwise.
Brute Force: Trial Division
Check every integer d from 2 to n−1: if d divides n, then n is composite. If none divide, n is prime. Correct but slow—O(n) divisions for n.
Better: Check Only Up to √n
If n has a divisor d > √n, then n/d is a divisor < √n. So we only need to check d from 2 to ⌊√n⌋. If n % d == 0 for any such d, n is composite; otherwise prime. This reduces the loop to O(√n) iterations.
Optimal for Trial Division: Check 2, Then Odd Numbers
If n is even, we can immediately return n == 2. Then check only odd d: 3, 5, 7, … up to √n. This halves the number of divisions (still O(√n) but with a smaller constant). For very large n, probabilistic tests (e.g., Miller–Rabin) are used in practice; for DSA and interviews, trial division up to √n is usually enough.
Brute force: O(n) checks. Better: O(√n) by checking only up to √n. Best trial division: O(√n) but only odd divisors after 2. For “list all primes ≤ N,” the Sieve of Eratosthenes (next topic) is O(N log log N)—much better than testing each number with trial division.
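As a quick sanity check on the progression above, the sketch below puts the brute-force check next to the √n version and confirms they agree on small inputs. The names is_prime_naive and is_prime_sqrt are illustrative only and do not appear elsewhere in this course.

```python
def is_prime_naive(n: int) -> bool:
    """Brute force: try every d in [2, n-1]. O(n) divisions."""
    if n < 2:
        return False
    for d in range(2, n):
        if n % d == 0:
            return False
    return True


def is_prime_sqrt(n: int) -> bool:
    """Try only d with d*d <= n. O(sqrt(n)) divisions."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


# Both versions give the same answers; only the amount of work differs.
agree = all(is_prime_naive(n) == is_prime_sqrt(n) for n in range(200))
print(agree)
```

The two functions differ only in the loop bound, which is the whole optimization.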
Python Implementation
is_prime(n) — Trial Division
def is_prime(n: int) -> bool:
    if n < 2:
        return False
    if n == 2:
        return True
    if n % 2 == 0:
        return False
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True
Line-by-Line Explanation
- “n < 2”: Primes are ≥ 2; 0 and 1 are not prime.
- “n == 2”: 2 is the only even prime; return early.
- “n % 2 == 0”: Any other even number is composite.
- “while d * d <= n”: Check only up to √n. We use d * d <= n to avoid floating-point √n. We start at d = 3 and add 2 each time (odd divisors only).
- If any such d divides n, return False. If the loop finishes, no divisor was found, so return True.
Time and Space Complexity
Time: O(√n). The loop runs at most about √n times (only odd d, so roughly √n/2 iterations). Each iteration does a remainder and a multiply; both O(1) for fixed-size integers. For arbitrary-precision integers, cost per division grows with the number of digits; we still say O(√n) in terms of the value n when analyzing algorithm design.
Space: O(1). Only a few variables.
Edge Cases
- n < 2: Return False. 0 and 1 are not prime.
- n == 2: Return True. Handle this before the “even” check so we don’t incorrectly label 2 as composite.
- Large n: Trial division is fine for n up to around 10¹² in practice (√n ≈ 10⁶). For bigger n, use a probabilistic test or accept that trial division is slow.
Counting Primes and Listing Primes
To answer “how many primes ≤ N?” or “list all primes ≤ N,” testing each number with is_prime would take about O(N √N) time—too slow for large N. The Sieve of Eratosthenes (next topic, 4.3) does this in O(N log log N) by marking composites once. Here we only note the goal; the algorithm is in the next section.
For a single query “is n prime?”, use trial division. For “all primes up to N” or “count primes up to N”, use the Sieve. Don’t sieve when you only need one primality check; don’t do N trial divisions when you need the full list.
Common Mistakes
- Treating 1 as prime: 1 has only one positive divisor (itself). By definition, primes have exactly two divisors (1 and n). So 1 is not prime.
- Checking up to n−1 instead of √n: That leaves the algorithm correct but O(n) instead of O(√n). Always use the √n bound.
- Using floating-point sqrt(n): int(n**0.5) can have rounding errors for large n. Prefer while d * d <= n so all arithmetic is integer.
- Forgetting to handle 2: If you do “if n % 2 == 0: return False” before “if n == 2: return True”, you’ll incorrectly say 2 is not prime.
1 is not prime. The definition requires exactly two positive divisors. 1 has only one divisor (1), so it is excluded. Your is_prime(1) must return False.
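If you prefer an explicit bound over the d * d <= n loop condition, Python 3.8+ provides math.isqrt, which returns the exact integer floor of the square root without going through floats. The sketch below is one way to use it; the function name is_prime_isqrt is illustrative only.

```python
import math


def is_prime_isqrt(n: int) -> bool:
    """Trial division with an exact integer square-root bound."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2          # 2 is prime, every other even number is not
    limit = math.isqrt(n)      # floor(sqrt(n)), computed in integer arithmetic
    for d in range(3, limit + 1, 2):
        if n % d == 0:
            return False
    return True


print(is_prime_isqrt(97), is_prime_isqrt(91))
```

Either form avoids int(n**0.5); which one reads better is a matter of taste.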
Interview Insight
When asked “how do you check if a number is prime?”, say: “Trial division: check divisors from 2 to √n. If any divide n, it’s composite; otherwise prime. We can skip evens after 2 to speed up. Time O(√n), space O(1).” If the follow-up is “count primes up to N,” say: “Then we’d use the Sieve of Eratosthenes to mark composites and count primes in O(N log log N).”
Practice Problems
- Implement is_prime(n) with trial division (up to √n, odd divisors after 2).
- Given N, count how many primes are in the range [2, N] using trial division for each (then compare with the Sieve in the next section).
- Find the smallest prime factor of n (or return n if prime)—same loop as primality test, but return the first d that divides n.
- Check if n is a “prime power” (n = p^k for some prime p and k ≥ 1): after confirming n > 1, find the smallest prime factor p; then check if n is a power of p.
Summary
- A prime is an integer ≥ 2 whose only positive divisors are 1 and itself. 1 is not prime. 2 is the only even prime.
- Primality test: Trial division—check if any d in [2, √n] divides n. If yes, composite; if no, prime. Optimize by checking 2 then only odd divisors. Time O(√n), space O(1).
- For “list/count primes ≤ N,” use the Sieve of Eratosthenes (next topic)—O(N log log N)—not N separate trial divisions.
- Edge cases: n < 2 → false; n == 2 → true; use the integer comparison d*d <= n instead of floating-point √n to avoid rounding issues.
4.3 Sieve of Eratosthenes
Introduction
The Sieve of Eratosthenes is an ancient algorithm that finds all prime numbers up to a given limit N by repeatedly marking multiples of primes as composite. Instead of testing each number with trial division (which would cost about O(N √N) for the whole range), the sieve marks each composite number once—or a small number of times—yielding a total cost of O(N log log N) time and O(N) space. It is the standard way to “list all primes ≤ N” or “count primes ≤ N” and is a must-know for number theory, competitive programming, and interviews.
Real-World Analogy
Imagine a long list of numbers from 2 to N written on a board. You go through the list in order. When you see a number that hasn’t been crossed out, it’s prime—so you cross out all its multiples (4, 6, 8, … for 2; then 6, 9, 12, … for 3; and so on). When you’re done, every number still left is prime. You never “test” a number by dividing it; you only erase multiples of numbers you’ve already declared prime. That’s the sieve: “sift out” composites by marking multiples of each prime.
Formal Definition
The Sieve of Eratosthenes works as follows: maintain a boolean array is_prime[0..N] (or similar) where initially every index is considered “prime.” For each i from 2 to √N (or to N): if i is still marked prime, then mark all multiples of i (2i, 3i, …) as composite. When the loop finishes, every index that remains marked prime (in [2, N]) is a prime number.
We only need to run the outer loop over p from 2 up to √N, because any composite ≤ N has a prime factor ≤ √N, so it has already been marked by the time the loop passes that factor.
Why This Topic Matters
- Count primes / list primes: Many problems ask “how many primes ≤ N?” or “output all primes ≤ N.” The sieve answers both in one pass.
- Precomputation: Once you’ve sieved up to N, you have O(1) primality checks for any number ≤ N (just look up the array). Useful when you need many such checks.
- Prime factorization, divisors: A common variant stores the smallest prime factor (SPF) for each number instead of just a boolean. That allows fast factorization and divisor enumeration.
- Interview staple: “Count primes less than n” is a classic LeetCode-style problem; the expected solution is the sieve.
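The smallest-prime-factor variant mentioned in the list above can be sketched as follows. This is a minimal illustration, not a tuned implementation; build_spf and factorize are hypothetical helper names, and factorize assumes its argument is within the sieved range.

```python
def build_spf(n: int) -> list[int]:
    """spf[k] = smallest prime factor of k for 2 <= k <= n; spf[0] = spf[1] = 0."""
    spf = [0] * (n + 1)
    for p in range(2, n + 1):
        if spf[p] == 0:                 # no smaller prime divides p, so p is prime
            for k in range(p, n + 1, p):
                if spf[k] == 0:         # the first prime to reach k is its smallest factor
                    spf[k] = p
    return spf


def factorize(m: int, spf: list[int]) -> list[int]:
    """Prime factors of m with repetition, assuming 2 <= m < len(spf)."""
    factors = []
    while m > 1:
        factors.append(spf[m])
        m //= spf[m]
    return factors


spf = build_spf(100)
print(factorize(84, spf))
```

Each factorization takes one table lookup per prime factor, so it costs O(log m) steps once the table is built.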
Mental Model
Think of the sieve as “every composite number has a smallest prime factor.” When we process prime p, we mark all multiples of p. The first time a composite m gets marked is when we process its smallest prime factor; other primes that divide m may mark it again later, but only a few primes divide any one number. So we’re not “testing” each number: we’re crossing out each composite a small number of times. That’s why the total work is much less than N × √N.
For each prime p in [2, √N]:
Mark 2p, 3p, 4p, ... (multiples of p) as composite.
After the loop: any unmarked number in [2, N] is prime.
Key: we only iterate p up to √N; multiples beyond N are skipped.
Step-by-Step Breakdown
- Create: An array is_prime of length N+1. Set is_prime[0] = is_prime[1] = False (0 and 1 are not prime). Set is_prime[2..N] = True (assume prime until marked composite).
- Outer loop: For p from 2 to N (or to √N for the optimized version): if is_prime[p] is still True, then p is prime.
- Mark composites: For each multiple k = 2*p, 3*p, 4*p, ... of p, set is_prime[k] = False (stop when k > N).
- Optional optimization: Start marking from p*p instead of 2*p. Why? Because 2*p, 3*p, ..., (p-1)*p each have a prime factor smaller than p, so they were already marked when we processed that smaller prime. So we only need to mark p², p²+p, p²+2p, ... up to N.
- Result: After the loop, collect all i in [2, N] such that is_prime[i] is True; those are the primes. Or count them.
ASCII Diagram
Sieving primes up to 30. We mark multiples of 2, then 3, then 5 (we don’t need to go beyond √30 ≈ 5). X = composite, . = prime (unchanged).
p=2: mark 4,6,8,10,12,14,16,18,20,22,24,26,28,30
2  3  4  5  6  7  8  9  10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
.  .  X  .  X  .  X  .  X  .  X  .  X  .  X  .  X  .  X  .  X  .  X  .  X  .  X  .  X
p=3: newly mark 9,15,21,27 (the even multiples of 3 were already marked by 2)
2  3  4  5  6  7  8  9  10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
.  .  X  .  X  .  X  X  X  .  X  .  X  X  X  .  X  .  X  X  X  .  X  .  X  X  X  .  X
p=5: newly mark 25
2  3  4  5  6  7  8  9  10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
.  .  X  .  X  .  X  X  X  .  X  .  X  X  X  .  X  .  X  X  X  .  X  X  X  X  X  .  X
Primes: 2, 3, 5, 7, 11, 13, 17, 19, 23, 29
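The trace in the diagram can be reproduced programmatically. The sketch below records, for each prime p ≤ √n, which numbers it marks for the first time; sieve_trace is an illustrative name, and the inner loop starts at p² as in the optimized sieve.

```python
def sieve_trace(n: int) -> dict[int, list[int]]:
    """Map each prime p with p*p <= n to the numbers it marks for the first time."""
    if n < 2:
        return {}
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    newly_marked: dict[int, list[int]] = {}
    p = 2
    while p * p <= n:
        if is_prime[p]:
            fresh = []
            for k in range(p * p, n + 1, p):
                if is_prime[k]:          # not yet crossed out by a smaller prime
                    fresh.append(k)
                    is_prime[k] = False
            newly_marked[p] = fresh
        p += 1
    return newly_marked


for prime, marked in sieve_trace(30).items():
    print(prime, marked)
# 2 [4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30]
# 3 [9, 15, 21, 27]
# 5 [25]
```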
Evolution: Brute Force → Sieve → Optimized Sieve
Brute Force
For each number i from 2 to N, call is_prime(i) (trial division). Time: O(N √N). Space: O(1) per call. Too slow for large N.
Sieve of Eratosthenes (Basic)
One array; for each p from 2 to N, if p is prime, mark all multiples of p. Time: O(N log log N) (see below). Space: O(N). Much faster than brute force.
Optimized Sieve
(1) Outer loop only to √N—once we’ve processed primes up to √N, all composites ≤ N are already marked. (2) Start marking multiples at p²—smaller multiples were marked by smaller primes. (3) Optional: use a list of booleans for odd numbers only (half the memory, index mapping). The asymptotic time remains O(N log log N); constants improve.
Stopping the outer loop at √N doesn’t change the asymptotic time (most marking is done by small primes), but it avoids redundant work. Starting inner loop at p² is both correct and faster: we mark fewer cells per prime. For “count primes” only, you can use a bit-packed sieve (one bit per odd number) to reduce memory.
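The odd-only idea from point (3) can be sketched like this. It assumes the index mapping “index i stands for the odd number 2i + 1”, which is one common choice rather than the only one; sieve_odd_only is an illustrative name.

```python
def sieve_odd_only(n: int) -> list[int]:
    """Primes <= n, storing flags for odd numbers only (index i <-> 2*i + 1)."""
    if n < 2:
        return []
    size = (n + 1) // 2            # how many odd numbers 1, 3, 5, ... are <= n
    is_odd_prime = [True] * size
    is_odd_prime[0] = False        # index 0 stands for 1, which is not prime
    i = 1
    while (2 * i + 1) ** 2 <= n:
        if is_odd_prime[i]:
            p = 2 * i + 1
            # p*p is odd, so its index is (p*p) // 2. Moving by 2p in value
            # (skipping even multiples) is a step of p in index space.
            for j in range((p * p) // 2, size, p):
                is_odd_prime[j] = False
        i += 1
    return [2] + [2 * i + 1 for i in range(1, size) if is_odd_prime[i]]


print(sieve_odd_only(30))
```

The array is half the size of the basic sieve's, at the cost of slightly less readable index arithmetic.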
Python Implementation
Basic Sieve: List All Primes ≤ N
def sieve(n: int) -> list[int]:
    if n < 2:
        return []
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    p = 2
    while p * p <= n:
        if is_prime[p]:
            for k in range(2 * p, n + 1, p):
                is_prime[k] = False
        p += 1
    return [i for i in range(2, n + 1) if is_prime[i]]
Count Primes Only (Same Idea)
def count_primes(n: int) -> int:
    if n < 2:
        return 0
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    p = 2
    while p * p <= n:
        if is_prime[p]:
            for k in range(2 * p, n + 1, p):
                is_prime[k] = False
        p += 1
    return sum(1 for i in range(2, n + 1) if is_prime[i])
Optimized: Start Marking at p²
def sieve_optimized(n: int) -> list[int]:
    if n < 2:
        return []
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    p = 2
    while p * p <= n:
        if is_prime[p]:
            for k in range(p * p, n + 1, p):
                is_prime[k] = False
        p += 1
    return [i for i in range(2, n + 1) if is_prime[i]]
The only change is range(2 * p, ...) → range(p * p, ...). Correct because multiples p×2, p×3, …, p×(p−1) were already marked when we processed primes 2, 3, …, p−1.
Line-by-Line Explanation (Basic Sieve)
- “if n < 2: return []”: No primes below 2.
- “is_prime = [True] * (n + 1)”: Indices 0..n, where index i stands for the number i. Indices 0 and 1 are placeholders; 2..n are the candidates.
- “is_prime[0] = is_prime[1] = False”: 0 and 1 are not prime.
- “while p * p <= n”: We only need to consider p up to √n. Every composite ≤ n has a prime factor ≤ √n, so by the time p exceeds √n all composites are already marked and a larger p would mark nothing new.
- “if is_prime[p]”: If p was already marked composite (by a smaller prime), skip it; its multiples were marked along with it.
- “for k in range(2*p, n+1, p)”: Mark 2p, 3p, 4p, … up to n. Step size p gives exactly the multiples of p.
- “return [i for i in range(2, n+1) if is_prime[i]]”: Collect all indices that are still True; those are the primes.
Time Complexity
For each prime p ≤ √N, we mark about N/p numbers (multiples of p). The total number of operations is roughly:
N/2 + N/3 + N/5 + N/7 + … (sum over primes p ≤ √N, which is at most the same sum over all primes p ≤ N). Factor out N: N × (1/2 + 1/3 + 1/5 + …). The sum of 1/p over primes ≤ N is known to grow like log log N. So the total is O(N log log N).
Stopping the outer loop at √N doesn’t change the dominant term (most marks come from small primes). Starting the inner loop at p² reduces the number of marks per prime but the same asymptotic holds. So the sieve is O(N log log N) time.
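To get a feel for the bound, one can count the inner-loop writes directly and compare them with N log log N. This is an empirical sketch only (count_marks is an illustrative name); the printed ratios are not a proof and no particular values are claimed here.

```python
import math


def count_marks(n: int) -> int:
    """Number of inner-loop writes done by the sieve that starts marking at p*p."""
    is_prime = [True] * (n + 1)
    marks = 0
    p = 2
    while p * p <= n:
        if is_prime[p]:
            for k in range(p * p, n + 1, p):
                is_prime[k] = False
                marks += 1           # counts repeat writes to the same cell too
        p += 1
    return marks


for n in (10**3, 10**4, 10**5):
    marks = count_marks(n)
    ratio = marks / (n * math.log(math.log(n)))
    print(n, marks, round(ratio, 3))
```

If the analysis holds, the ratio in the last column should stay bounded as n grows.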
Space Complexity
We use one boolean array of length N+1 → O(N) space. If we pack one bit per number (or only store odd indices), we can get O(N) bits, i.e. O(N/8) bytes—same big-O, smaller constant.
Edge Cases
- n < 2: No primes. Return [] or count 0.
- n == 2: One prime (2). With p = 2 we have p × p = 4 > 2, so the outer loop body never runs and nothing is marked; is_prime[2] stays True. Correct.
- Large N: O(N) space can be an issue for N in the hundreds of millions. Use a segmented sieve or bit-packed storage if needed.
Common Mistakes
- Including 0 or 1 as prime: Set is_prime[0] = is_prime[1] = False at the start.
- Looping p to N instead of √N: Correct but wasteful. Stopping at √N is enough and faster.
- Starting the inner loop at 2*p and forgetting the p² optimization: Correct either way; starting at p² is a performance improvement.
- Off-by-one in range: Use range(2, n + 1) so that n itself is included when we collect primes. Use range(p * p, n + 1, p) so the last multiple ≤ n is marked.
Forgetting to mark 0 and 1 as non-prime. If you leave is_prime[0] or is_prime[1] as True, your list or count will be wrong. Always set is_prime[0] = is_prime[1] = False before the loop.
Pattern Recognition
“Mark multiples of each prime” is a recurring idea: when you want to eliminate all numbers that are “multiples of something,” iterate over the “something” (here, primes) and mark multiples in one batch. The same pattern appears in segmented sieves (sieving a range [L, R]) and in some divisor-sum precomputations.
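The segmented sieve mentioned above applies the same marking pattern to a window [lo, hi]. The following is a minimal sketch under the assumption that hi − lo is small enough to hold one flag per number; primes_in_range is an illustrative name.

```python
import math


def primes_in_range(lo: int, hi: int) -> list[int]:
    """Primes in [lo, hi], using an array of length hi - lo + 1 plus a sqrt(hi) sieve."""
    if hi < 2 or lo > hi:
        return []
    lo = max(lo, 2)
    limit = math.isqrt(hi)
    # Step 1: ordinary sieve for the base primes <= sqrt(hi).
    base = [True] * (limit + 1)
    base_primes = []
    for p in range(2, limit + 1):
        if base[p]:
            base_primes.append(p)
            for k in range(p * p, limit + 1, p):
                base[k] = False
    # Step 2: mark multiples of each base prime inside the window.
    segment = [True] * (hi - lo + 1)
    for p in base_primes:
        # First multiple of p that is >= lo, but never p itself.
        start = max(p * p, ((lo + p - 1) // p) * p)
        for k in range(start, hi + 1, p):
            segment[k - lo] = False
    return [lo + i for i, flag in enumerate(segment) if flag]


print(primes_in_range(100, 120))
```

Memory is O(√hi + (hi − lo)) instead of O(hi), which is what makes large upper bounds feasible.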
Interview Insight
For “count primes less than n” (or “list primes ≤ n”), say: “I’ll use the Sieve of Eratosthenes. Create a boolean array, mark 0 and 1 as non-prime, then for each p from 2 to √n, if p is still prime, mark all multiples of p. Finally count or collect indices that remain True. Time O(n log log n), space O(n).” If they ask for optimization, mention starting the inner loop at p² and stopping the outer loop at √n.
Practice Problems
- Implement the sieve and return the list of primes ≤ N; then implement count-only without building the list.
- LeetCode 204: Count Primes — use the sieve; handle n ≤ 2.
- Precompute a “smallest prime factor” (SPF) array: for each number, store its smallest prime divisor. (When marking multiples of p, if spf[k] is not yet set, set spf[k] = p.) Then use SPF to factor any number ≤ N quickly.
- Segmented sieve: find primes in an interval [L, R] without building a full sieve up to R (useful when R is huge but R−L is moderate).
Summary
- The Sieve of Eratosthenes finds all primes ≤ N by marking multiples of each prime; unmarked numbers are prime. Use a boolean array and iterate p from 2 to √N; for each prime p, mark multiples (start at 2p or p²).
- Time: O(N log log N). Space: O(N).
- Evolution: brute force (N × trial division) → sieve → optimized sieve (outer loop to √N, inner from p²).
- Edge cases: n < 2 → empty list / 0; always set is_prime[0] = is_prime[1] = False.
- Standard for “list/count primes ≤ N”; variant with smallest-prime-factor array enables fast factorization.
4.4 GCD & LCM
Introduction
The Greatest Common Divisor (GCD) of two integers is the largest positive integer that divides both. The Least Common Multiple (LCM) is the smallest positive integer that both divide. GCD and LCM are fundamental in number theory, modular arithmetic, simplifying fractions, and many DSA problems (e.g., “minimum steps to reach,” “repeat pattern,” “coprime pairs”). This section covers definitions, the Euclidean algorithm for GCD (and why it’s fast), the link between GCD and LCM, and clean Python implementations—including edge cases and interview-ready phrasing.
Real-World Analogy
Imagine you have two ropes of length 12 and 18. You want to cut both into equal pieces (no leftover), with each piece as long as possible. The piece length must divide both 12 and 18—so it’s a common divisor. The greatest such length is 6. That’s the GCD(12, 18) = 6. Now imagine two gears that turn every 12 and 18 seconds. When do they align (both at the start)? The first time is at the smallest positive time that both 12 and 18 divide—that’s the LCM(12, 18) = 36. So: GCD = “largest common measure”; LCM = “earliest common multiple.”
Formal Definition
GCD(a, b) (or gcd(a, b)) is the largest positive integer d such that d | a and d | b (i.e., d divides both a and b). By convention, GCD(0, b) = |b| and GCD(a, 0) = |a|, and GCD(0, 0) is usually defined as 0.
LCM(a, b) is the smallest positive integer m such that a | m and b | m. For positive a, b, we have LCM(a, b) × GCD(a, b) = a × b, so LCM(a, b) = a × b / GCD(a, b).
For negative numbers, we usually work with absolute values: GCD(−12, 18) = GCD(12, 18) = 6. So in code we often take gcd(abs(a), abs(b)).
Why This Topic Matters
- Modular arithmetic and cryptography: Extended GCD (next topic) is used to compute modular inverses—essential for RSA and many algorithms.
- Simplifying fractions: A fraction a/b in lowest terms is (a/gcd(a,b)) / (b/gcd(a,b)).
- Periodicity and “when do two events align?”: Problems like “two runners with periods 12 and 18—when do they meet?” reduce to LCM (or GCD of periods in special cases).
- Interview problems: “Minimum steps to make two numbers equal,” “number of coprime pairs,” “split array into groups with GCD K”—all use GCD/LCM.
Mental Model
Think of a and b in terms of their prime factorizations. The GCD takes the minimum exponent for each prime (the “overlap”). The LCM takes the maximum exponent (the “union”). So for 12 = 2²×3 and 18 = 2×3²: GCD = 2¹×3¹ = 6, LCM = 2²×3² = 36. And indeed 6 × 36 = 12 × 18. That’s the intuition behind GCD × LCM = a × b (for positive a, b).
a = 12 = 2² × 3¹ b = 18 = 2¹ × 3²
GCD = common part = 2¹ × 3¹ = 6
LCM = combined part = 2² × 3² = 36
Check: 6 × 36 = 216 = 12 × 18
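The min/max-exponent picture can be checked with a few lines of code. This sketch factors by trial division, so it is only meant for small inputs; prime_factors and gcd_lcm_by_factoring are illustrative names, and the Euclidean algorithm later in this section is the practical method.

```python
from collections import Counter


def prime_factors(n: int) -> Counter:
    """Trial-division factorization of n >= 1 as {prime: exponent}."""
    factors: Counter = Counter()
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors[d] += 1
            n //= d
        d += 1
    if n > 1:                      # whatever is left is a single prime
        factors[n] += 1
    return factors


def gcd_lcm_by_factoring(a: int, b: int) -> tuple[int, int]:
    """(GCD, LCM) of positive a, b from their prime factorizations."""
    fa, fb = prime_factors(a), prime_factors(b)
    g = m = 1
    for p in set(fa) | set(fb):
        g *= p ** min(fa[p], fb[p])   # a Counter returns 0 for a missing prime
        m *= p ** max(fa[p], fb[p])
    return g, m


print(gcd_lcm_by_factoring(12, 18))  # (6, 36)
```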
GCD: Brute Force → Euclidean Algorithm
Brute Force
List all divisors of a and b; take the largest number that appears in both. Finding divisors of n costs O(√n), so total is about O(√a + √b). Works for small numbers but is slow and clumsy.
Euclidean Algorithm (Optimal)
Key identity: GCD(a, b) = GCD(b, a mod b). Why? Write a = q·b + r with r = a mod b. Any common divisor of a and b also divides a − q·b = r, so it is a common divisor of b and r. Conversely, any common divisor of b and r divides q·b + r = a, so it is a common divisor of a and b. The two pairs therefore have exactly the same common divisors, and hence the same GCD. Base case: GCD(a, 0) = a (every integer divides 0, so the largest common divisor is a itself). So we repeatedly replace (a, b) with (b, a mod b) until the second number is 0; then the first number is the GCD.
GCD(48, 18): 48 mod 18 = 12 → GCD(18, 12). 18 mod 12 = 6 → GCD(12, 6). 12 mod 6 = 0 → GCD(6, 0) = 6. So GCD(48, 18) = 6.
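When a ≥ b, the remainder a mod b is less than a/2: if b ≤ a/2 then a mod b < b ≤ a/2, and if b > a/2 then a mod b = a − b < a/2. So every two steps at least halve the first argument, and the number of steps is O(log min(a, b)). No factorization needed, just remainders.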
Step-by-Step: Euclidean Algorithm
- Start with (a, b). If b = 0, return a (base case).
- Otherwise compute r = a mod b and return GCD(b, r).
- Repeat until the second argument is 0. The first argument at that point is the GCD.
GCD(48, 18)
= GCD(18, 48 mod 18) = GCD(18, 12)
= GCD(12, 18 mod 12) = GCD(12, 6)
= GCD(6, 12 mod 6) = GCD(6, 0)
= 6
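The trace above can be generated mechanically. A small sketch, assuming non-negative inputs; gcd_steps is an illustrative name.

```python
def gcd_steps(a: int, b: int) -> list[tuple[int, int]]:
    """Every (a, b) pair visited by the Euclidean algorithm, for a, b >= 0."""
    steps = [(a, b)]
    while b:
        a, b = b, a % b
        steps.append((a, b))
    return steps


print(gcd_steps(48, 18))  # [(48, 18), (18, 12), (12, 6), (6, 0)]
```

The first component of the last pair is the GCD.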
LCM from GCD
For positive integers a, b: LCM(a, b) = a × b / GCD(a, b). So compute GCD first, then LCM = (a // g) * b (or a * (b // g)) to avoid overflow: divide one number by the GCD first, then multiply by the other. In Python, integers are arbitrary size, but (a // gcd) * b keeps the intermediate values smaller and is good practice for other languages.
Writing lcm = a * b // gcd(a, b) can overflow in C++/Java for large a, b. Prefer lcm = (a // gcd(a, b)) * b so the multiplication is with a smaller factor. In Python it’s less critical but still clearer and portable.
Python Implementation
GCD — Recursive
def gcd(a: int, b: int) -> int:
    a, b = abs(a), abs(b)
    if b == 0:
        return a
    return gcd(b, a % b)
GCD — Iterative
def gcd_iter(a: int, b: int) -> int:
    a, b = abs(a), abs(b)
    while b:
        a, b = b, a % b
    return a
LCM (Using GCD)
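def lcm(a: int, b: int) -> int:
    if a == 0 or b == 0:
        return 0
    # Work with absolute values so the result is never negative.
    a, b = abs(a), abs(b)
    # Divide before multiplying to keep the intermediate value small.
    return (a // gcd(a, b)) * b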
We use (a // g) * b so the division happens first (no overflow). We take absolute values so LCM is defined for negatives in a consistent way (e.g., LCM(−12, 18) = LCM(12, 18) = 36).
Using the Standard Library
import math
math.gcd(a, b) # GCD; handles 0 and negatives sensibly
# Python 3.9+:
math.lcm(a, b) # LCM; returns 0 if either argument is 0
In practice, use math.gcd and math.lcm when available; implement by hand when practicing the algorithm or when you need extended GCD (next topic).
Line-by-Line Explanation (Iterative GCD)
- “a, b = abs(a), abs(b)”: GCD is defined for non-negative numbers; negatives are handled by taking absolute values so the result is non-negative.
- “while b:”: Loop until the second argument is 0. When b becomes 0, the current a is the GCD.
- “a, b = b, a % b”: Replace (a, b) with (b, a mod b). This is one step of the Euclidean algorithm; the GCD is unchanged.
- “return a”: When b = 0, GCD(a, 0) = a, so we return a.
Time Complexity
GCD: Each step replaces (a, b) with (b, a mod b). For a ≥ b we have a mod b < a/2, so after two steps the first argument has dropped below half its previous value. After the first step both arguments are at most min(a, b), so the number of steps is O(log min(a, b)). Each step is O(1) for fixed-size integers. For big integers, the cost of a step grows with the number of digits, and the total is roughly O(log² n) bit operations with schoolbook arithmetic, where n is the larger input. We usually state the cost as O(log min(a, b)) iterations.
LCM: One GCD plus one division and one multiplication → same as GCD, O(log min(a, b)).
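Consecutive Fibonacci numbers are the classic slow case for the Euclidean algorithm (this is the content of Lamé's theorem): every quotient is 1, so the arguments shrink as slowly as possible. The sketch below counts steps for such pairs; euclid_step_count and fib_pair are illustrative names.

```python
def euclid_step_count(a: int, b: int) -> int:
    """Number of (a, b) -> (b, a % b) steps until b becomes 0."""
    steps = 0
    while b:
        a, b = b, a % b
        steps += 1
    return steps


def fib_pair(k: int) -> tuple[int, int]:
    """Return (F(k+1), F(k)) with F(1) = F(2) = 1, for k >= 1."""
    x, y = 1, 1                    # (F(2), F(1))
    for _ in range(k - 1):
        x, y = x + y, x
    return x, y


for k in (5, 10, 20, 40):
    a, b = fib_pair(k)
    print(k, (a, b), euclid_step_count(a, b))
```

Even for these worst-case inputs the step count grows linearly in k while the numbers themselves grow exponentially, which is the logarithmic bound in action.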
Space Complexity
Iterative GCD: O(1) extra space. Recursive GCD: O(log min(a, b)) stack depth. Prefer iterative if you care about stack or want to avoid recursion limits for huge inputs.
Edge Cases
- One or both zero: GCD(0, b) = b, GCD(a, 0) = a, GCD(0, 0) = 0 (by convention). LCM(0, b) = LCM(a, 0) = 0 (no positive multiple of 0 in the usual sense; 0 is the standard convention).
- Negatives: GCD and LCM are usually defined for positive integers; in code we take absolute values so GCD(−12, 18) = 6, LCM(−12, 18) = 36.
- Equal numbers: GCD(a, a) = a, LCM(a, a) = a.
Common Mistakes
- Forgetting to handle zero: GCD(a, 0) must return a, not crash or loop forever. In iterative code, “while b:” correctly stops when b = 0.
- LCM overflow: Using a * b // gcd(a, b) in a language with fixed-width integers can overflow. Use (a // gcd(a, b)) * b.
- LCM when one is zero: Define LCM(a, 0) = 0 (or avoid calling LCM with 0). Don’t divide by zero.
- Assuming a and b are positive: If the problem allows negatives, use abs(a), abs(b) for a non-negative GCD.
Optimization Insight
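For machine-word integers, the Euclidean algorithm is hard to beat: it needs only O(log min(a, b)) remainder operations. (For very large multi-precision integers, asymptotically faster “half-GCD” methods exist, but they are well outside what DSA courses and interviews expect.) The binary GCD (Stein’s algorithm) uses only shifts, comparisons, and subtraction, which is sometimes faster in practice because it avoids division. For interviews and most code, the standard Euclidean (iterative) is enough. Use math.gcd in Python for production.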
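For reference, here is a minimal sketch of the binary GCD (Stein's algorithm) mentioned above. It is written for clarity rather than speed, and binary_gcd is an illustrative name; in Python, math.gcd will normally be faster.

```python
import math


def binary_gcd(a: int, b: int) -> int:
    """GCD using only shifts, comparisons, and subtraction (Stein's algorithm)."""
    a, b = abs(a), abs(b)
    if a == 0:
        return b
    if b == 0:
        return a
    # Pull out the power of two shared by a and b.
    shift = 0
    while ((a | b) & 1) == 0:
        a >>= 1
        b >>= 1
        shift += 1
    # Make a odd; remaining factors of two in a are not shared with b.
    while (a & 1) == 0:
        a >>= 1
    while b:
        while (b & 1) == 0:        # a is odd, so factors of two in b are not common
            b >>= 1
        if a > b:
            a, b = b, a
        b -= a                     # gcd(a, b) = gcd(a, b - a); b - a is even
    return a << shift


print(binary_gcd(48, 18), math.gcd(48, 18))
```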
Pattern Recognition
“Reduce (a, b) to (b, a mod b)” is the same idea as “reduce the problem size using the remainder.” Many number-theoretic algorithms (extended GCD, modular inverse, continued fractions) build on this. When you see “greatest common divisor,” “simplify fraction,” or “coprime,” think GCD first.
Interview Insight
When asked “how do you compute GCD?”, say: “Euclidean algorithm: GCD(a, b) = GCD(b, a mod b), with base case GCD(a, 0) = a. Iterative implementation in a loop: while b is non-zero, set (a, b) = (b, a % b); then return a. Time O(log min(a, b)), space O(1). LCM is a*b/GCD(a,b); compute as (a // gcd) * b to avoid overflow.” If the problem involves “lowest terms” or “coprime,” connect it to dividing by GCD.
Practice Problems
- Implement gcd and lcm (iterative and recursive for GCD) and test with (0, b), (a, 0), negatives, and (48, 18).
- Simplify a fraction (a, b) to lowest terms: divide numerator and denominator by GCD(a, b).
- Count pairs (i, j) in an array such that GCD(arr[i], arr[j]) = 1 (coprime pairs)—use GCD in the inner check or use inclusion–exclusion with prime factors.
- Given two numbers and a step, “minimum steps to make both equal” often involves GCD of the steps and the difference.
- LCM of an array: LCM(a, b, c) = LCM(LCM(a, b), c); use a loop and the two-argument LCM.
Summary
- GCD(a, b) = largest positive integer that divides both. LCM(a, b) = smallest positive integer that both divide. For positive a, b: LCM × GCD = a × b.
- Euclidean algorithm: GCD(a, b) = GCD(b, a mod b); base case GCD(a, 0) = a. Iterative: while b ≠ 0, (a, b) = (b, a % b); return a. Time O(log min(a, b)), space O(1) iterative.
- LCM: LCM(a, b) = (a // GCD(a, b)) × b (or use math.lcm in Python 3.9+). Handle zero: LCM(a, 0) = 0.
- Use abs(a), abs(b) when inputs can be negative. Prefer math.gcd and math.lcm in production; implement by hand for learning and for extended GCD (next topic).
4.5 Extended Euclidean Algorithm
Introduction
The Extended Euclidean Algorithm not only computes GCD(a, b) but also finds integers x and y such that ax + by = gcd(a, b). This identity is called Bézout’s identity. The coefficients x, y are essential for computing modular inverses (next topic), solving linear Diophantine equations, and many cryptographic algorithms. This section builds on the ordinary Euclidean algorithm and shows how to “extend” it to recover the coefficients.
Real-World Analogy
Imagine you have two piles of coins with a and b coins. You’re allowed to add or remove multiples of each pile. The Euclidean algorithm tells you the “unit” of value you can always form (the GCD). The extended algorithm tells you how to form it: “take x copies of the first pile and y copies of the second (with removal meaning negative copies), and you get exactly gcd(a, b).” So you get both the number (GCD) and a “recipe” (x, y) to express it as a combination of a and b.
Formal Definition
Bézout’s identity: For any integers a, b (not both zero), there exist integers x, y such that ax + by = gcd(a, b). The Extended Euclidean Algorithm computes gcd(a, b) and one such pair (x, y). The pair is not unique: (x + k·b/g, y − k·a/g) also works for any integer k, where g = gcd(a, b).
We usually want one concrete solution (x, y). The extended algorithm gives one by propagating coefficients backward through the Euclidean steps.
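The non-uniqueness claim is easy to verify numerically. The sketch below starts from one known solution for (a, b) = (35, 15), namely 35·1 + 15·(−2) = 5, and shifts it by multiples of (b/g, −a/g); bezout_family is an illustrative name.

```python
def bezout_family(a: int, b: int, g: int, x: int, y: int, ks: range) -> list[tuple[int, int]]:
    """All pairs (x + k*b/g, y - k*a/g) for k in ks, given one solution (x, y)."""
    return [(x + k * (b // g), y - k * (a // g)) for k in ks]


a, b, g = 35, 15, 5
solutions = bezout_family(a, b, g, 1, -2, range(-2, 3))
for sx, sy in solutions:
    print((sx, sy), a * sx + b * sy)   # the last value is 5 on every line
```

The shift works because a·(b/g) − b·(a/g) = 0, so adding it changes the coefficients without changing the sum.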
Why This Topic Matters
- Modular inverse: Finding x such that a·x ≡ 1 (mod m) reduces to solving a·x + m·y = 1. That has a solution iff gcd(a, m) = 1; the extended algorithm gives x (mod m). Critical for RSA and many algorithms.
- Linear Diophantine equations: Equations like a·x + b·y = c have integer solutions (x, y) iff gcd(a, b) | c. The extended algorithm gives a particular solution when c = gcd(a, b); scale to get one for general c.
- Interview and contest problems: “Compute modular inverse,” “find one solution to a·x + b·y = c”—both rely on extended GCD.
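The Diophantine case from the list above can be sketched as follows. This is a minimal illustration assuming a, b ≥ 0; solve_linear_diophantine is a hypothetical helper name, and it carries its own small extended-Euclid loop so the block runs on its own.

```python
from typing import Optional


def solve_linear_diophantine(a: int, b: int, c: int) -> Optional[tuple[int, int]]:
    """One integer solution (x, y) of a*x + b*y == c for a, b >= 0, or None."""
    # Extended Euclid: keep old_r == a*old_x + b*old_y at every step.
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    g = old_r
    if g == 0:                      # a == b == 0: solvable only when c == 0
        return (0, 0) if c == 0 else None
    if c % g != 0:
        return None                 # gcd(a, b) does not divide c
    scale = c // g
    return old_x * scale, old_y * scale


print(solve_linear_diophantine(35, 15, 10))  # (2, -4)
print(solve_linear_diophantine(35, 15, 7))   # None
```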
Mental Model
In the Euclidean algorithm we repeatedly replace (a, b) with (b, a mod b). At each step we have a = q·b + r, so r = a − q·b. If we already know how to write the GCD as a combination of b and r (i.e., b·x₁ + r·y₁ = g), then we can substitute r = a − q·b and get a combination of a and b: a·y₁ + b·(x₁ − q·y₁) = g. So we propagate coefficients backward: from (b, r) we get coefficients for (a, b). Base case: when b = 0, we have a·1 + 0·0 = a = gcd(a, 0).
At step: a = q·b + r ⇒ r = a − q·b
If b·x₁ + r·y₁ = g then b·x₁ + (a − q·b)·y₁ = g
⇒ a·y₁ + b·(x₁ − q·y₁) = g → new (x, y) = (y₁, x₁ − q·y₁)
Step-by-Step: Recursive Form
- Base case: If b = 0, then gcd(a, 0) = a and a·1 + 0·0 = a. So return (a, 1, 0) meaning (gcd, x, y).
- Recursive step: Compute (g, x₁, y₁) = extended_gcd(b, a % b). So g = b·x₁ + (a % b)·y₁. We have a = q·b + (a % b) with q = a // b. So (a % b) = a − q·b. Substitute: g = b·x₁ + (a − q·b)·y₁ = a·y₁ + b·(x₁ − q·y₁). So coefficients for (a, b) are x = y₁, y = x₁ − q·y₁. Return (g, x, y).
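Extended GCD(35, 15): 35 = 2·15 + 5, so we need coefficients for (15, 5). Extended GCD(15, 5): 15 = 3·5 + 0, so we need coefficients for (5, 0). Base case: (5, 0) returns (5, 1, 0), i.e. 5·1 + 0·0 = 5. Back to (15, 5) with q = 3: x = y₁ = 0, y = x₁ − q·y₁ = 1 − 3·0 = 1, so 15·0 + 5·1 = 5. Back to (35, 15) with q = 2: x = y₁ = 1, y = x₁ − q·y₁ = 0 − 2·1 = −2. So (g, x, y) = (5, 1, −2). Check: 35·1 + 15·(−2) = 35 − 30 = 5 ✓.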
Python Implementation
Recursive Extended GCD
def extended_gcd(a: int, b: int) -> tuple[int, int, int]:
    """Returns (g, x, y) such that |a|*x + |b|*y = g = gcd(a, b)."""
    a, b = abs(a), abs(b)
    if b == 0:
        return (a, 1, 0)
    q = a // b
    g, x1, y1 = extended_gcd(b, a % b)
    # g = b*x1 + (a % b)*y1, and a % b = a - q*b
    # so g = a*y1 + b*(x1 - q*y1)
    x, y = y1, x1 - q * y1
    return (g, x, y)
Iterative Extended GCD
Keep track of (a, b) and coefficients (x_a, y_a) for current a and (x_b, y_b) for current b, such that a = orig_a·x_a + orig_b·y_a and b = orig_a·x_b + orig_b·y_b. Initially a = orig_a, b = orig_b → (x_a, y_a) = (1, 0), (x_b, y_b) = (0, 1). When we replace (a, b) with (b, a − q·b), we update the coefficient vectors the same way: new (x_b, y_b) becomes (x_a − q·x_b, y_a − q·y_b). When b = 0, (g, x, y) = (a, x_a, y_a).
def extended_gcd_iter(a: int, b: int) -> tuple[int, int, int]:
    """Returns (g, x, y) such that |a|*x + |b|*y = g = gcd(a, b)."""
    a, b = abs(a), abs(b)
    x_a, y_a = 1, 0
    x_b, y_b = 0, 1
    while b:
        q = a // b
        a, b = b, a % b
        x_a, y_a, x_b, y_b = x_b, y_b, x_a - q * x_b, y_a - q * y_b
    return (a, x_a, y_a)
Line-by-Line Explanation (Recursive)
- “if b == 0: return (a, 1, 0)”: Base case: a·1 + 0·0 = a = gcd(a, 0).
- “q = a // b”: Quotient so that a = q·b + (a % b).
- “g, x1, y1 = extended_gcd(b, a % b)”: Get GCD and coefficients for (b, a % b): g = b·x1 + (a % b)·y1.
- “x, y = y1, x1 - q * y1”: Substitute (a % b) = a − q·b to get g = a·y1 + b·(x1 − q·y1), so x = y1, y = x1 − q·y1.
- “return (g, x, y)”: One solution to a·x + b·y = g.
Time and Space Complexity
Time: Same as the Euclidean algorithm—O(log min(a, b)) steps. Each step does a division and a few arithmetic operations. So O(log min(a, b)).
Space: Recursive version O(log min(a, b)) stack depth. Iterative version O(1) extra variables.
Edge Cases
- b = 0: Return (a, 1, 0). Correct: a·1 + 0·0 = a.
- a = 0, b ≠ 0: extended_gcd(0, b) → (b, 0, 1). Check: 0·0 + b·1 = b = gcd(0, b) ✓.
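- Negative inputs: We take abs(a), abs(b) so we work with non-negative numbers. The GCD is non-negative; x, y can be negative (e.g., 35·1 + 15·(−2) = 5). Note that the returned coefficients then satisfy |a|·x + |b|·y = g; to get coefficients for the original signed inputs, negate x when a < 0 and negate y when b < 0.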
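If you need the identity to hold for the original signed inputs, one option is to run the algorithm on the absolute values and flip the coefficient signs at the end. The sketch below does that; extended_gcd_signed is an illustrative name, not part of the implementations above.

```python
import math


def extended_gcd_signed(a: int, b: int) -> tuple[int, int, int]:
    """Return (g, x, y) with a*x + b*y == g == gcd(|a|, |b|), for any signs."""
    # Iterative extended Euclid on |a|, |b|, keeping
    # old_r == |a|*old_x + |b|*old_y at every step.
    old_r, r = abs(a), abs(b)
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    # |a| == -a when a < 0, so negate the matching coefficient.
    if a < 0:
        old_x = -old_x
    if b < 0:
        old_y = -old_y
    return old_r, old_x, old_y


for pair in [(35, 15), (-35, 15), (35, -15), (-35, -15)]:
    g, x, y = extended_gcd_signed(*pair)
    print(pair, (g, x, y), pair[0] * x + pair[1] * y == g)
```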
Application: Modular Inverse (Preview)
We want x such that a·x ≡ 1 (mod m). That means a·x + m·y = 1 for some y. This has a solution iff gcd(a, m) = 1. Run extended_gcd(a, m) to get (1, x, y). The x you get might be negative; reduce modulo m: x % m (or (x % m + m) % m) is the modular inverse of a modulo m. This is covered in detail in the next topic (Modular Inverse).
When you need the modular inverse of a modulo m, call extended_gcd(a, m). If g ≠ 1, no inverse exists. Otherwise the inverse is x % m (adjust for negative). In Python 3.8+, you can use pow(a, -1, m) for the inverse when gcd(a, m) = 1.
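As a preview, the recipe above can be sketched in a self-contained function. It tracks only the coefficient of a, since the coefficient of m is not needed for the inverse; mod_inverse is an illustrative name and m > 1 is assumed.

```python
def mod_inverse(a: int, m: int) -> int:
    """Inverse of a modulo m (m > 1); raises ValueError if gcd(a, m) != 1."""
    # Extended Euclid on (a mod m, m), keeping old_r ≡ a*old_x (mod m).
    old_r, r = a % m, m
    old_x, x = 1, 0
    while r:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
    if old_r != 1:
        raise ValueError("inverse does not exist: gcd(a, m) != 1")
    return old_x % m               # Python's % yields a value in [0, m)


print(mod_inverse(3, 7))           # 5, since 3 * 5 = 15 ≡ 1 (mod 7)
print(pow(3, -1, 7))               # built-in equivalent in Python 3.8+
```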
Common Mistakes
- Wrong order of (x, y): The equation is a·x + b·y = g. So the first coefficient multiplies a, the second multiplies b. Don’t swap x and y when returning or when using the result for modular inverse (inverse of a mod m uses the coefficient that multiplies a).
- Forgetting to reduce x modulo m for inverse: extended_gcd returns an x that might be negative or larger than m. The modular inverse is (x % m + m) % m (or x % m in Python since % is non-negative when divisor is positive).
- Assuming inverse exists: a has an inverse mod m only if gcd(a, m) = 1. Always check g == 1 before using x as the inverse.
The equation is a·x + b·y = g. So when you compute the modular inverse of a modulo m, you use the coefficient that goes with a (the first coefficient x), not the one that goes with m (the second coefficient y). Inverse of a mod m = x mod m (after checking gcd(a, m) = 1).
Interview Insight
When asked “how do you find coefficients such that ax + by = gcd(a, b)?”, say: “Extended Euclidean algorithm. Recursively compute (g, x1, y1) for (b, a mod b). Then use a = q·b + (a mod b) to get coefficients for (a, b): x = y1, y = x1 − q·y1. Base case: (a, 1, 0) when b = 0. Same complexity as GCD, O(log min(a,b)).” For “modular inverse,” say: “Run extended GCD(a, m); if g = 1, the inverse is x mod m.”
Practice Problems
- Implement recursive and iterative extended GCD and verify a·x + b·y = g for (35, 15), (48, 18), (0, 7).
- Using extended GCD, compute the modular inverse of a modulo m when gcd(a, m) = 1 (handle negative x).
- Solve a·x + b·y = c in integers: first run extended_gcd(a, b). If g does not divide c, no solution. Otherwise one solution is (x·(c//g), y·(c//g)); describe the full solution set.
Summary
- Bézout’s identity: There exist integers x, y with a·x + b·y = gcd(a, b). The Extended Euclidean Algorithm computes (g, x, y).
- Recursive: Base case (b = 0) → (a, 1, 0). Otherwise (g, x1, y1) = extended_gcd(b, a % b); then x = y1, y = x1 − (a//b)·y1; return (g, x, y).
- Iterative: Maintain (a, b) and coefficient vectors (x_a, y_a), (x_b, y_b); update with the same recurrence as (a, b) when doing (a, b) = (b, a % b).
- Time O(log min(a, b)), space O(1) iterative. Use extended GCD to get the modular inverse of a mod m (x mod m when g = 1)—next topic.
4.6 Fast Exponentiation
Introduction
Fast exponentiation (also called binary exponentiation or square-and-multiply) computes a^b using only O(log b) multiplications instead of b−1. The idea is to use the binary representation of the exponent: write b in base 2, and combine the powers a^(2⁰), a^(2¹), a^(2²), … (built by squaring repeatedly), multiplying one in whenever the corresponding bit is 1. This is essential for modular exponentiation (a^b mod m), which is used in RSA, hashing, and many algorithms, and where we must keep intermediate results modulo m to avoid overflow. This section covers the idea, the algorithm, and clean Python implementations.
Real-World Analogy
To get 3^13, you could multiply 3 by itself 12 times. Instead, build powers by squaring: 3¹, 3², 3⁴, 3⁸ (each is the previous squared). Then 13 in binary is 1101, so 3^13 = 3⁸ × 3⁴ × 3¹: multiply only the powers that correspond to 1-bits. That takes three squarings and two multiplications instead of 12 multiplications. Same idea as “repeated doubling” in any setting: grow exponentially, then combine.
Formal Definition
Write the exponent b in binary: b = bₖ·2^k + … + b₁·2¹ + b₀·2⁰, where each bᵢ is 0 or 1. Then a^b = a^(b₀) × (a²)^(b₁) × (a⁴)^(b₂) × … . We compute a, a², a⁴, a⁸, … by repeated squaring, and multiply into the result only when the current bit of b is 1. Total multiplications: O(log b).
For modular exponentiation we compute a^b mod m: after each multiplication or squaring, take the result mod m so numbers stay in [0, m−1].
Why This Topic Matters
- Modular exponentiation: RSA and many crypto primitives need a^b mod m for huge b. Naive a × a × … mod m would take b steps; fast exponentiation does O(log b) steps.
- Python’s pow(a, b, m): The built-in three-argument form does exactly this. Knowing the algorithm explains why it’s fast and how to implement it when needed.
- Matrix exponentiation, recurrence relations: The same “binary exponentiation” idea applies to raising a matrix to power b (e.g., Fibonacci in O(log n)), covered later in the course.
Mental Model
Scan the bits of b from right to left (LSB first). Maintain a running power base = a^(2^i) (start with base = a, then square each time we move left). Maintain a result res (start 1). When the current bit of b is 1, multiply res by base. After each step, square base (for the next bit) and shift b right. When b becomes 0, res is a^b.
b in binary: ... b₂ b₁ b₀
a^b = (a^1)^b₀ × (a^2)^b₁ × (a^4)^b₂ × ...
So: res = 1; base = a; for each bit of b (LSB first):
    if bit is 1: res *= base
    base *= base; b //= 2
Return res.
Evolution: Naive → Fast Exponentiation
Naive (Brute Force)
Multiply res = 1 by a exactly b times. Time O(b), space O(1). For b in the millions or more, this is impractical.
Fast Exponentiation (Square-and-Multiply)
Repeated squaring + multiply when bit is 1. Number of iterations = number of bits of b = ⌊log₂ b⌋ + 1. Each iteration does one or two multiplications (squaring base, and possibly multiplying res by base). Time O(log b), space O(1) iterative.
No method that starts from a alone can compute a^b with fewer than ⌈log₂ b⌉ multiplications in the general case: each multiplication can at most double the largest exponent reached so far. Fast exponentiation uses at most about 2·log₂ b multiplications, so it is optimal up to a constant factor.
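To make the gap concrete, the sketch below counts multiplications for both approaches. The helper names (naive_multiplications, fast_power_counting) are illustrative and not part of the course code, and the fast version skips the final unused squaring so that the count reflects useful work only.

```python
def naive_multiplications(b: int) -> int:
    """Multiplications needed to form a^b as a * a * ... * a (b factors)."""
    return max(b - 1, 0)


def fast_power_counting(a: int, b: int) -> tuple[int, int]:
    """Square-and-multiply for b >= 0; returns (a^b, multiplications used)."""
    res, base, mults = 1, a, 0
    while b:
        if b & 1:
            res *= base      # this bit contributes base = a^(2^i)
            mults += 1
        b >>= 1
        if b:                # square only if more bits remain
            base *= base
            mults += 1
    return res, mults


value, used = fast_power_counting(3, 13)
print(value, used, naive_multiplications(13))
```

For 3^13 this reports 6 multiplications (3 into res, 3 squarings) against 12 for the naive loop; the count grows with the bit length of b rather than with b itself.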
Step-by-Step Example
Compute 3^13. 13 in binary is 1101 (LSB = 1).
res=1, base=3, b=13
b&1=1 → res=1*3=3, base=3²=9, b=6
b&1=0 → res=3, base=9²=81, b=3
b&1=1 → res=3*81=243, base=81²=6561, b=1
b&1=1 → res=243*6561=1594323, base=..., b=0
Return 1594323. Check: 3^13 = 1594323 ✓
We did four “bit” steps (squaring each time, and three multiplies into res). So about 4 squarings + 3 multiplies instead of 12 multiplies.
Modular Exponentiation
To compute a^b mod m, reduce modulo m after every multiplication and squaring. That keeps all intermediates in [0, m−1] and avoids overflow. The algorithm is the same; only the multiplication and squaring are done mod m.
# In code: res = (res * base) % m; base = (base * base) % m
Python’s pow(a, b, m) does exactly this when m is provided. Use it in production; implement by hand when learning or when you need a custom variant (e.g., matrix exponentiation).
Python Implementation
Iterative: a^b (no mod)
def power(a: int, b: int) -> int:
    """Returns a^b for non-negative b."""
    if b == 0:
        return 1
    res = 1
    base = a
    while b:
        if b & 1:
            res *= base
        base *= base
        b >>= 1
    return res
Iterative: a^b mod m
def power_mod(a: int, b: int, m: int) -> int:
    """Returns (a^b) % m for non-negative b. Assumes m >= 1."""
    if b == 0:
        return 1 % m
    a %= m
    res = 1
    base = a
    while b:
        if b & 1:
            res = (res * base) % m
        base = (base * base) % m
        b >>= 1
    return res
Recursive (Alternative)
def power_mod_rec(a: int, b: int, m: int) -> int:
    if b == 0:
        return 1 % m
    a %= m
    half = power_mod_rec(a, b // 2, m)
    half = (half * half) % m
    if b & 1:
        half = (half * a) % m
    return half
Recursive idea: a^b = (a^(b//2))² × a^(b%2). Same O(log b) steps; uses O(log b) stack.
Line-by-Line Explanation (Iterative with Mod)
- if b == 0: return 1 % m – a^0 = 1; reduce mod m for consistency (handles m = 1).
- a %= m – Work with a in [0, m−1] so base stays small.
- res = 1, base = a – Result starts at 1; the current power of a is a^1.
- while b: – Process each bit of b until b = 0.
- if b & 1: res = (res * base) % m – If the LSB is 1, multiply result by current base (that bit contributes to the exponent).
- base = (base * base) % m – Square: a^(2^i) → a^(2^(i+1)).
- b >>= 1 – Shift right to process the next bit.
Time and Space Complexity
Time: O(log b) iterations, each doing at most two multiplications and two reductions mod m. With fixed-size integers each multiplication is O(1), so the total is O(log b). With arbitrary-precision integers and schoolbook multiplication, each modular multiplication costs O((log m)²) once a has been reduced mod m, for a total of O(log b · (log m)²); the exact bound depends on the cost model.
Space: O(1) for the iterative version (a few variables). Recursive version O(log b) stack depth.
Edge Cases
- b = 0: a^0 = 1. Return 1 (or 1 % m).
- a = 0, b > 0: 0^b = 0. The loop will leave res = 0 after the first 1-bit (base becomes 0). Or handle explicitly: if a % m == 0 and b > 0, return 0.
- m = 1: Any integer mod 1 is 0. So a^b mod 1 = 0 for any a, b. The code 1 % 1 = 0 and (res * base) % 1 = 0 is correct.
- Negative exponent: a^(−b) = 1/(a^b). Not usually needed for modular exponentiation in DSA; if needed, use the modular inverse (when gcd(a, m) = 1, a⁻¹ mod m exists and pow(a, -b, m) = pow(a⁻¹, b, m) in Python 3.8+).
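The edge cases above can be checked mechanically against the built-in pow. The sketch below uses a compact variant of the iterative routine (written out here so the block runs on its own; the case list is an arbitrary illustration) and assumes Python 3.8+ for the negative-exponent check.

```python
def power_mod(a: int, b: int, m: int) -> int:
    """(a^b) % m for b >= 0 and m >= 1, by square-and-multiply."""
    res = 1 % m          # 1 % 1 == 0 covers m = 1
    base = a % m         # also maps negative a into [0, m-1]
    while b:
        if b & 1:
            res = (res * base) % m
        base = (base * base) % m
        b >>= 1
    return res


# (a, b, m) triples: zero exponent, m = 1, zero base, negative base, huge exponent.
cases = [(5, 0, 7), (5, 0, 1), (0, 5, 7), (-2, 3, 7), (123456789, 10**18, 10**9 + 7)]
for a, b, m in cases:
    assert power_mod(a, b, m) == pow(a, b, m)

# Negative exponent: pow(a, -b, m) equals pow(inverse_of_a, b, m).
inv3 = pow(3, -1, 7)
assert pow(3, -2, 7) == pow(inv3, 2, 7)
```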
Common Mistakes
- Forgetting to reduce mod m at each step: If you only take mod at the end, intermediates can overflow (in other languages) or become huge (slow in Python). Always do res = (res * base) % m and base = (base * base) % m.
- Using b - 1 instead of b >>= 1: We must process bits, not decrement b. Use b >>= 1 (or b //= 2).
- Wrong order of operations: Check the bit (b & 1) before squaring base and shifting b. Multiply into res when bit is 1, then always square base and shift.
Do not compute a^b first and take mod m only at the end. For large b, a^b is astronomically large and may not fit in memory. Always reduce modulo m after every multiplication so that values stay in [0, m−1].
Pattern Recognition
“Binary decomposition of the exponent” is a recurring pattern: any operation that is associative (like multiplication, matrix multiplication) can be raised to power b in O(log b) “multiplications” by using the binary expansion of b. Same idea underlies matrix exponentiation for Fibonacci and linear recurrences.
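As a minimal sketch of this pattern, the same loop can be written once for any associative operation by passing in the multiplication and its identity element. The names power_generic, mat_mul and fib_matrix are hypothetical, chosen for this illustration; the Fibonacci part relies on the standard fact that [[1,1],[1,0]]^n has F(n) in its top-right entry.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def power_generic(x: T, n: int, mul: Callable[[T, T], T], identity: T) -> T:
    """x combined with itself n times (n >= 0) under an associative mul."""
    result = identity
    while n:
        if n & 1:
            result = mul(result, x)
        x = mul(x, x)
        n >>= 1
    return result


Matrix = tuple[tuple[int, int], tuple[int, int]]


def mat_mul(p: Matrix, q: Matrix) -> Matrix:
    """Product of two 2x2 integer matrices."""
    return (
        (p[0][0] * q[0][0] + p[0][1] * q[1][0], p[0][0] * q[0][1] + p[0][1] * q[1][1]),
        (p[1][0] * q[0][0] + p[1][1] * q[1][0], p[1][0] * q[0][1] + p[1][1] * q[1][1]),
    )


def fib_matrix(n: int) -> int:
    """n-th Fibonacci number (F(0) = 0, F(1) = 1) in O(log n) matrix products."""
    powered = power_generic(((1, 1), (1, 0)), n, mat_mul, ((1, 0), (0, 1)))
    return powered[0][1]


print(power_generic(3, 13, lambda u, v: u * v, 1))   # plain integers
print([fib_matrix(i) for i in range(10)])            # 2x2 matrices
```

Only associativity is needed: the loop regroups the n factors freely but never reorders them, which is why it also works for non-commutative matrix products.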
Interview Insight
When asked “how do you compute a^b mod m efficiently?”, say: “Fast exponentiation using the binary representation of b. Start with res = 1, base = a mod m. For each bit of b from LSB: if the bit is 1, multiply res by base mod m; then square base mod m and shift b right. Time O(log b), space O(1). In Python we can use pow(a, b, m).” If the problem is “a^b without mod,” same algorithm without the % m.
Practice Problems
- Implement power_mod(a, b, m) iteratively and verify against pow(a, b, m) for small and large b.
- Compute the last k digits of a^b (i.e., a^b mod 10^k) using fast exponentiation.
- LeetCode-style: “Count good numbers” or problems that need (base)^(exponent) mod mod—use fast exponentiation.
- Later: apply the same “binary exponentiation” idea to matrices (e.g., compute the n-th Fibonacci number in O(log n) using matrix power).
Summary
- Fast exponentiation computes a^b in O(log b) multiplications by using the binary expansion of b: a^b = product of a^(2^i) over bits that are 1. Algorithm: res = 1, base = a; for each bit (LSB first): if bit 1, res *= base; base *= base; b //= 2.
- Modular exponentiation: Reduce mod m after every multiplication and squaring so intermediates stay in [0, m−1]. Use pow(a, b, m) in Python.
- Time O(log b), space O(1) iterative. Same idea extends to matrix exponentiation and other associative operations.
- Edge cases: b = 0 → 1; reduce a mod m first; handle a = 0 or m = 1 if needed.
4.7 Modular Arithmetic
Introduction
Modular arithmetic is arithmetic on integers where we only care about the remainder when dividing by a fixed positive integer m (the modulus). Two integers a and b are congruent modulo m, written a ≡ b (mod m), if they leave the same remainder when divided by m—equivalently, if m divides (a − b). Addition, subtraction, and multiplication “work” modulo m: you can reduce before or after the operation and get a consistent result. Division modulo m is different: it requires the modular inverse (next topic). Modular arithmetic is foundational for cryptography, hashing, cyclic structures, and almost every problem that asks for “answer modulo 10⁹+7” in competitive programming.
Real-World Analogy
Think of a clock with 12 hours. If it’s 9 o’clock and you add 5 hours, you get 2 o’clock—not 14. So 9 + 5 ≡ 2 (mod 12). The clock “wraps around” at 12. Similarly, “what day of the week is it 100 days from now?” is modular arithmetic mod 7. The modulus is the size of the cycle; we only care which position we’re in, not how many full cycles have passed.
Formal Definition
For a positive integer m (the modulus), we say a ≡ b (mod m) iff m divides (a − b), i.e., (a − b) is a multiple of m. Equivalently, a and b leave the same remainder when divided by m: a mod m = b mod m.
The residue class of a modulo m is the set of all integers congruent to a mod m. A representative of that class is often chosen in the range [0, m−1]: that’s a mod m (when we define mod to return a value in [0, m−1]).
In code, “a mod m” usually means the remainder when a is divided by m. In Python, a % m returns a value in [0, m−1] when m > 0 (so −17 % 5 = 3, because −17 = (−4)·5 + 3). In C/Java, a % m can be negative when a is negative; the “canonical” representative is then (a % m + m) % m.
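The difference between the two conventions can be seen from Python alone by emulating the truncating remainder. In the sketch below, c_style_remainder and canonical_from_remainder are hypothetical helper names, and the emulation assumes the C99/Java rule that the quotient is truncated toward zero.

```python
def c_style_remainder(a: int, m: int) -> int:
    """Remainder with the quotient truncated toward zero, as in C99/Java."""
    q = abs(a) // abs(m)
    if (a < 0) != (m < 0):
        q = -q
    return a - q * m


def canonical_from_remainder(r: int, m: int) -> int:
    """Map a remainder in (-m, m) into [0, m-1], for m > 0."""
    return (r + m) % m


print(-17 % 5)                                             # Python: floor-based
print(c_style_remainder(-17, 5))                           # C/Java-style: truncating
print(canonical_from_remainder(c_style_remainder(-17, 5), 5))
```

The first and third lines agree (both give the representative in [0, m−1]); the second shows the negative value that the (a % m + m) % m idiom is there to repair.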
Why This Topic Matters
- Competitive programming and interviews: Problems often ask for the answer “modulo 10⁹+7” or “mod 998244353” to avoid big integers and focus on the algorithm. You must add, subtract, multiply (and sometimes divide) correctly under the modulus.
- Cryptography: RSA and many protocols work in modular arithmetic (mod a large composite or prime).
- Hashing: Hash tables use hash(key) % capacity. Understanding mod avoids off-by-one and negative-index bugs.
- Cyclic behavior: Sequences that repeat (e.g., state machines, periodic signals) are naturally described with modular arithmetic.
Mental Model
Work with numbers as their “remainder when divided by m.” Every integer is equivalent to exactly one value in {0, 1, …, m−1}. When you add or multiply, do the operation and then take the remainder (or take remainders first—see below). So the universe of values is finite: only m “slots,” and everything wraps around.
Integers mod m: ... ≡ -2m ≡ -m ≡ 0 ≡ m ≡ 2m ≡ ... (all same "slot")
... ≡ -m+1 ≡ 1 ≡ m+1 ≡ ... (another slot)
We usually pick representatives 0, 1, ..., m-1.
Basic Operations Modulo m
For addition, subtraction, and multiplication, the following hold:
- Addition: (a + b) mod m = ((a mod m) + (b mod m)) mod m. So you can reduce a and b first, then add, then reduce again (avoids overflow in other languages).
- Subtraction: (a − b) mod m = ((a mod m) − (b mod m) + m) mod m. The +m ensures the result is non-negative when (a mod m) < (b mod m).
- Multiplication: (a · b) mod m = ((a mod m) · (b mod m)) mod m. Reduce before multiplying to keep intermediates small.
Mod 7: 5 + 4 = 9 ≡ 2; 5 − 4 = 1; 5 × 4 = 20 ≡ 6. Subtraction can go negative: for (3 − 5) mod 7 we get (3 mod 7) − (5 mod 7) = 3 − 5 = −2, and −2 ≡ 5 (mod 7) because −2 + 7 = 5. In code: (−2 + 7) % 7 = 5.
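A brute-force check is a quick way to convince yourself that the three identities hold for negative operands too. The function name identities_hold and the test range are arbitrary choices for this sketch.

```python
def identities_hold(m: int, lo: int = -30, hi: int = 30) -> bool:
    """Check the add/sub/mul reduction identities for all a, b in [lo, hi], m > 0."""
    for a in range(lo, hi + 1):
        for b in range(lo, hi + 1):
            if (a + b) % m != ((a % m) + (b % m)) % m:
                return False
            if (a - b) % m != ((a % m) - (b % m) + m) % m:
                return False
            if (a * b) % m != ((a % m) * (b % m)) % m:
                return False
    return True


print(identities_hold(7), identities_hold(12))
print((5 + 4) % 7, (5 - 4) % 7, (5 * 4) % 7, (3 - 5 + 7) % 7)
```

The second line reproduces the mod 7 arithmetic worked out above: 2, 1, 6 and 5.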
Division and the Need for Inverses
Division modulo m is not “divide then take remainder.” In integers, we don’t have fractions. Instead, to “divide by b” mod m we multiply by the modular inverse of b: a number b⁻¹ such that b · b⁻¹ ≡ 1 (mod m). Then (a / b) mod m is defined as (a · b⁻¹) mod m. The inverse exists iff gcd(b, m) = 1 (e.g., when m is prime, every b not divisible by m has an inverse). So:
(a / b) mod m = (a · b⁻¹) mod m, where b⁻¹ is the modular inverse of b (next topic). Never compute (a mod m) / (b mod m) as integers and then take mod; that’s wrong.
Do not assume (a / b) mod m = (a mod m) / (b mod m). Division in modular arithmetic is multiplication by the inverse: (a · b⁻¹) mod m. If you don’t have the inverse, you can’t “divide” in the usual sense.
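A small numeric case shows the failure. The values 20, 4 and 7 are picked for illustration, and the inverse is taken with Python's built-in pow(b, -1, m), which assumes Python 3.8+.

```python
a, b, m = 20, 4, 7            # true quotient is 5, and 5 mod 7 = 5

wrong = (a % m) // (b % m)    # 6 // 4 = 1: integer division of the residues
inv_b = pow(b, -1, m)         # 4 * 2 = 8 ≡ 1 (mod 7), so inv_b is 2
right = (a % m) * inv_b % m   # 6 * 2 = 12 ≡ 5 (mod 7)

print(wrong, right, (a // b) % m)
```

Only the inverse-based value matches (a // b) % m; dividing the residues gives 1 instead of 5.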
Congruence Properties
If a ≡ b (mod m) and c ≡ d (mod m), then:
- a + c ≡ b + d (mod m)
- a − c ≡ b − d (mod m)
- a · c ≡ b · d (mod m)
- a^k ≡ b^k (mod m) for any non-negative integer k
So we can replace any term with a congruent one before doing operations. That justifies reducing mod m at each step to keep numbers small.
Python Implementation
Addition and Multiplication
def add_mod(a: int, b: int, m: int) -> int:
    return (a % m + b % m) % m

def sub_mod(a: int, b: int, m: int) -> int:
    return (a % m - b % m + m) % m

def mul_mod(a: int, b: int, m: int) -> int:
    return (a % m * (b % m)) % m
Adding m in sub_mod ensures the result is in [0, m−1] even when a % m < b % m (since in Python a % m is already in [0, m−1], (a % m - b % m) can be negative; adding m and then % m fixes it).
Power: a^b mod m
# Use fast exponentiation; in Python:
pow(a, b, m) # computes (a^b) % m efficiently
Negative Numbers and “Canonical” Representative
# In Python, a % m is already in [0, m-1] when m > 0
-17 % 5  # 3
# If you get a from a language where % can be negative:
def to_canonical(a: int, m: int) -> int:
    return (a % m + m) % m
Line-by-Line Explanation (sub_mod)
- a % m, b % m – Bring both into [0, m−1].
- a % m - b % m – Can be negative (e.g., 3 − 5 = −2); the difference lies in [−(m−1), m−1].
- + m – Add one modulus so the value is positive, in [1, 2m−1] (e.g., −2 + 7 = 5).
- % m – Final reduce. It is required: whenever a % m ≥ b % m the sum is m or more (e.g., 5 − 3 + 7 = 9, and 9 % 7 = 2).
Time and Space Complexity
Addition, subtraction, multiplication modulo m: O(1) for fixed-size integers; O(log m) or O(log a + log b) for arbitrary-precision. Power mod m: O(log b) using fast exponentiation (previous topic).
Edge Cases
- m = 1: Every integer is ≡ 0 (mod 1). So a mod 1 = 0 for any a. Operations mod 1 always yield 0.
- m < 1: Modulus is usually defined as positive. In code, avoid m ≤ 0 or handle explicitly (Python’s % with negative divisor has a different convention).
- Negative a: In Python, a % m is in [0, m−1]. So (−17) % 5 = 3. No extra fix needed. In other languages, use (a % m + m) % m.
- Large intermediates: Always reduce before and after multiplication so (a * b) % m is computed as ((a % m) * (b % m)) % m to avoid overflow in C++/Java; in Python it’s for speed and clarity.
Common Mistakes
- Dividing without inverse: (a / b) mod m must be (a · b⁻¹) mod m. Don’t use integer division.
- Subtraction giving negative: (a − b) mod m must be in [0, m−1]. Use (a % m − b % m + m) % m.
- Overflow in multiplication: In C++/Java, (a % m) * (b % m) can still overflow if m is large. Use long or ((a % m) * (b % m)) % m with 64-bit; for very large m, use a type that can hold (m−1)² or use a custom big-int approach.
- Assuming mod distributes over everything: (a − b) mod m ≠ (a mod m) − (b mod m) when the right-hand side is negative; you must add m. And (a / b) mod m ≠ (a mod m) / (b mod m).
Optimization Insight
In long expressions (e.g., sums of products), reduce after each operation so that every intermediate stays in [0, m−1]. That keeps numbers small and avoids overflow. For (a + b + c) mod m, you can do ((a + b) % m + c) % m; for (a * b * c) mod m, ((a * b) % m * c) % m.
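For a sum of products, reducing at each step looks like the sketch below. The function name dot_mod and the default modulus are illustrative; in Python the reductions are about speed, while in C++/Java they are what prevents overflow.

```python
MOD = 10**9 + 7


def dot_mod(xs: list[int], ys: list[int], m: int = MOD) -> int:
    """Sum of x*y over paired elements, reduced mod m after every operation."""
    total = 0
    for x, y in zip(xs, ys):
        term = (x % m) * (y % m) % m   # operands reduced first, product < m*m
        total = (total + term) % m     # running sum stays in [0, m-1]
    return total


xs = [10**12, -5, 7]
ys = [3, 10**9 + 8, -2]
print(dot_mod(xs, ys) == sum(x * y for x, y in zip(xs, ys)) % MOD)
```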
Pattern Recognition
Whenever the problem says “output modulo 10⁹+7” (or similar), all arithmetic in your solution should be done mod m. Counts, sums, products—reduce at each step. If you need to “divide” (e.g., divide by 2 or by n!), use the modular inverse. The pattern is: work in the ring of integers mod m; addition, subtraction, multiplication are safe; division is multiply by inverse.
Interview Insight
When the problem asks for “answer mod 10⁹+7,” say: “I’ll do all arithmetic modulo m. For addition and multiplication I’ll reduce after each step. For subtraction I’ll use (a - b + m) % m to keep the result non-negative. For division I’ll use the modular inverse (e.g., pow(b, -1, m) in Python or extended GCD).” Mention that a ≡ b (mod m) means (a − b) is divisible by m, and that we work with representatives in [0, m−1].
Practice Problems
- Implement add_mod, sub_mod, mul_mod and test with negative numbers and large values.
- Compute (a + b + c) mod m and (a · b · c) mod m by reducing at each step.
- Given a, b, m, compute (a − b) mod m correctly when a < b.
- Problems that ask for “number of ways mod 10⁹+7”: use modular arithmetic throughout; when you need to divide by k!, compute k! mod m and then multiply by its inverse mod m.
Summary
- a ≡ b (mod m) iff m | (a − b); equivalently, same remainder when divided by m. We usually work with representatives in [0, m−1].
- Addition/subtraction/multiplication: (a ± b) mod m and (a · b) mod m can be computed by reducing operands first, then doing the operation, then reducing. For subtraction use (a % m − b % m + m) % m to keep result in [0, m−1].
- Division: (a / b) mod m = (a · b⁻¹) mod m; b⁻¹ is the modular inverse (exists when gcd(b, m) = 1). Never use integer division.
- Congruence is preserved under +, −, ·, and powers. Reduce at each step to avoid overflow and keep intermediates small. Use pow(a, b, m) for a^b mod m.
4.8 Modular Inverse
Introduction
The modular inverse of an integer a modulo m is an integer x such that a · x ≡ 1 (mod m). We write x ≡ a⁻¹ (mod m). It lets us “divide” by a in modular arithmetic: (b / a) mod m = (b · a⁻¹) mod m. The inverse exists if and only if gcd(a, m) = 1 (a and m are coprime). When it exists, it is unique modulo m. This section covers when and why the inverse exists, two ways to compute it (Extended Euclidean and Fermat’s little theorem when m is prime), and how to use it in code.
Real-World Analogy
In normal arithmetic, the inverse of 3 is 1/3 because 3 × (1/3) = 1. Modulo m we only have integers, so we can’t use fractions. The modular inverse is the integer that “plays the role” of 1/a: when you multiply a by it, you get 1 (mod m). For example, mod 7 we have 3 × 5 = 15 ≡ 1 (mod 7), so 5 is the inverse of 3 mod 7. “Dividing by 3” mod 7 means multiplying by 5.
Formal Definition
Modular inverse: For integers a and m with m > 0, a⁻¹ (mod m) is an integer x in [0, m−1] such that a · x ≡ 1 (mod m). Such an x exists iff gcd(a, m) = 1, and when it exists it is unique modulo m (all solutions are x + k·m for integer k).
Why gcd(a, m) = 1? The equation a·x ≡ 1 (mod m) means a·x + m·y = 1 for some integer y. By Bézout’s identity, this has a solution iff gcd(a, m) divides 1, i.e., gcd(a, m) = 1.
Why This Topic Matters
- Division mod m: To compute (a / b) mod m you need b⁻¹ mod m, then (a · b⁻¹) mod m. Essential whenever the problem asks for “answer mod 10⁹+7” and your formula involves division (e.g., n! / (k! (n−k)!) for combinations).
- Combinatorics: nCr mod p, Catalan numbers, partition counts—many use factorials and require dividing by factorials mod p. Precompute factorials and inverse factorials mod p using the inverse.
- Cryptography: RSA decryption and many protocols use the modular inverse.
When the Inverse Exists
The inverse of a mod m exists iff gcd(a, m) = 1. So:
- When m is prime, every a with 1 ≤ a ≤ m−1 has an inverse (since gcd(a, m) = 1). 0 has no inverse.
- When m is composite, a has an inverse iff a and m are coprime. For example mod 10, 3 has inverse 7 (3·7=21≡1); 2 has no inverse because gcd(2, 10) = 2 ≠ 1.
Two Methods to Compute the Inverse
Method 1: Extended Euclidean Algorithm
Solve a·x + m·y = 1. The coefficient x is a modular inverse of a mod m. Run extended_gcd(a, m); if g ≠ 1, no inverse. Otherwise reduce x to [0, m−1]: inv = x % m or inv = (x % m + m) % m if x might be negative. This works for any m and any a with gcd(a, m) = 1. Time O(log min(a, m)).
Method 2: Fermat’s Little Theorem (When m is Prime)
If m is prime and a is not divisible by m, then a^(m−1) ≡ 1 (mod m). So a · a^(m−2) ≡ 1 (mod m), hence a⁻¹ ≡ a^(m−2) (mod m). Compute a^(m−2) mod m with fast exponentiation. Time O(log m). Only applies when m is prime.
Inverse of 3 mod 7. Extended GCD: 3·x + 7·y = 1 → (1, 5, −2), so x = 5. Check: 3·5 = 15 ≡ 1 (mod 7). Or by Fermat (m=7 prime): 3⁻¹ ≡ 3⁵ = 243 ≡ 5 (mod 7).
Python Implementation
Using Extended GCD (Works for Any m)
def mod_inverse_gcd(a: int, m: int) -> int | None:
    """Returns a^(-1) mod m if gcd(a, m) = 1, else None."""
    g, x, _ = extended_gcd(a % m, m)
    if g != 1:
        return None
    return (x % m + m) % m
Assume extended_gcd is from topic 4.5. We take a % m so we work with a in [0, m−1]; then (x % m + m) % m puts the inverse in [0, m−1].
Using Fermat (When m is Prime)
def mod_inverse_fermat(a: int, m: int) -> int | None:
    """Returns a^(-1) mod m using a^(m-2). Only when m is prime."""
    a %= m
    if a == 0:
        return None
    return pow(a, m - 2, m)
Using Python’s Built-in (3.8+)
# When gcd(a, m) = 1:
pow(a, -1, m) # returns a^(-1) mod m
# Raises ValueError if gcd(a, m) != 1
In practice use pow(a, -1, m) when you know a and m are coprime (e.g., m prime and a not divisible by m).
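When the modulus is prime, all three approaches should agree, which makes a convenient self-check. The extended_gcd below is a stand-in written from the recurrence in the topic 4.5 summary (the course's own version may differ in detail), and inverse_three_ways is a hypothetical helper for this sketch.

```python
def extended_gcd(a: int, b: int) -> tuple[int, int, int]:
    """Stand-in for the topic 4.5 routine: (g, x, y) with a*x + b*y = g."""
    if b == 0:
        return a, 1, 0
    g, x1, y1 = extended_gcd(b, a % b)
    return g, y1, x1 - (a // b) * y1


def inverse_three_ways(a: int, p: int) -> tuple[int, int, int]:
    """Inverse of a modulo a prime p via extended GCD, Fermat, and pow(a, -1, p)."""
    g, x, _ = extended_gcd(a % p, p)
    if g != 1:
        raise ValueError("a is not invertible modulo p")
    return x % p, pow(a, p - 2, p), pow(a, -1, p)


print(inverse_three_ways(3, 7))
for a in (2, 123456, 10**9 + 6):
    by_gcd, by_fermat, builtin = inverse_three_ways(a, 10**9 + 7)
    assert by_gcd == by_fermat == builtin
    assert a * builtin % (10**9 + 7) == 1
```

Note that this stand-in returns x = −2 for (3, 7), not 5; both are valid Bézout coefficients, and reducing with x % p brings them to the same inverse.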
Line-by-Line Explanation (mod_inverse_gcd)
- a % m – Work with a in [0, m−1]; gcd and inverse are unchanged.
- extended_gcd(a % m, m) – Get (g, x, y) with (a mod m)·x + m·y = g.
- if g != 1: return None – Inverse exists only when gcd is 1.
- (x % m + m) % m – x might be negative; this gives the unique representative in [0, m−1].
Time and Space Complexity
Extended GCD method: O(log min(a, m)). Fermat (m prime): O(log m) for pow(a, m−2, m). Space O(1) for both iterative implementations.
Edge Cases
- a = 0: 0 has no inverse (0·x ≡ 0 ≢ 1 for any x). Return None or handle before calling.
- gcd(a, m) ≠ 1: No inverse. extended_gcd returns g > 1; return None. pow(a, -1, m) raises ValueError.
- m = 1: Every number is ≡ 0 (mod 1); “inverse” isn’t useful. Usually m ≥ 2 in practice.
- Negative a: Reduce a to [0, m−1] first with a % m; the inverse of (a mod m) is the same as the inverse of a mod m.
Common Mistakes
- Not checking gcd: If you assume the inverse exists and it doesn’t, you’ll get wrong results or a crash. Always check g == 1 (or catch ValueError for pow).
- Returning negative x: The inverse should be in [0, m−1]. Use (x % m + m) % m after extended GCD.
- Using Fermat when m is not prime: a⁻¹ ≡ a^(m−2) is guaranteed only when m is prime. For composite m use extended GCD.
Do not use Fermat’s little theorem (a^(m−2)) to compute the inverse when m is composite. The identity a^(m−1) ≡ 1 (mod m) is guaranteed for every a coprime to m only when m is prime. The common modulus m = 10⁹+7 is prime, so Fermat is fine there; if the problem uses a composite modulus, you must use the extended Euclidean algorithm.
Application: (a / b) mod m
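To compute (a / b) mod m when gcd(b, m) = 1: find inv = b⁻¹ mod m, then return (a % m * inv) % m. So “division” is multiply by the inverse.
def div_mod(a: int, b: int, m: int) -> int | None:
    """Returns (a / b) mod m, or None if b has no inverse mod m."""
    try:
        # pow raises ValueError (it never returns None) when gcd(b, m) != 1.
        inv = pow(b, -1, m)  # alternative: mod_inverse_gcd(b, m), which returns None
    except ValueError:
        return None
    return (a % m * inv) % m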
Interview Insight
When asked “how do you compute the modular inverse?”, say: “If gcd(a, m) = 1, the inverse exists. Two ways: (1) Extended Euclidean algorithm: solve a·x + m·y = 1, then x mod m is the inverse. (2) When m is prime, by Fermat’s little theorem a⁻¹ ≡ a^(m−2) (mod m), so we can use pow(a, m-2, m). In Python 3.8+ we use pow(a, -1, m). Always check that the inverse exists when m is composite.”
Practice Problems
- Implement mod_inverse using extended GCD and (when m is prime) using Fermat; verify a · inv ≡ 1 (mod m).
- Compute nCr mod p (p prime): nCr = n! / (k! (n−k)!); precompute factorials and inverse factorials mod p, then combine.
- Given a, b, m, compute (a / b) mod m when possible; return a sentinel or raise when b has no inverse.
Summary
- The modular inverse of a mod m is x with a·x ≡ 1 (mod m). It exists iff gcd(a, m) = 1; when it exists it is unique in [0, m−1].
- Extended GCD: Solve a·x + m·y = 1; x mod m (adjusted for negative) is the inverse. Works for any m. Time O(log min(a, m)).
- Fermat (m prime): a⁻¹ ≡ a^(m−2) (mod m). Compute with pow(a, m−2, m). Only when m is prime.
- Use pow(a, -1, m) in Python 3.8+ when gcd(a, m) = 1. (a / b) mod m = (a · b⁻¹) mod m.
4.9 Combinatorics
Introduction
Combinatorics is the study of counting: the number of ways to arrange, select, or partition objects under given rules. In DSA and competitive programming you constantly need “how many ways …?” (paths, subsets, arrangements, valid configurations). This section covers the foundational counting principles—the sum rule and product rule—and the role of factorials in counting. The next topic (4.10) gives the exact formulas for permutations and combinations; here we build the mindset and the modular tools (factorials and inverse factorials mod m) needed to compute those efficiently.
Real-World Analogy
Imagine choosing an outfit: 3 shirts and 4 pants. Any shirt can pair with any pant, so total outfits = 3 × 4 = 12. That’s the product rule: when one choice doesn’t affect the other, multiply the number of options. Now imagine you can either wear a hat (2 choices) or no hat (1 choice), but not both—total hat options = 2 + 1 = 3. That’s the sum rule: when choices are mutually exclusive, add. Most counting problems combine these two rules.
Formal Definition
Sum rule: If a task can be done in one of n₁ ways, or n₂ ways, …, or nₖ ways, and these options are mutually exclusive, then the task can be done in n₁ + n₂ + … + nₖ ways.
Product rule: If a task is done in a sequence of steps: step 1 in n₁ ways, step 2 in n₂ ways (regardless of step 1), …, step k in nₖ ways, then the task can be done in n₁ × n₂ × … × nₖ ways.
Many problems require breaking the count into cases (sum rule) and then counting each case by a sequence of choices (product rule).
Why This Topic Matters
- “Number of ways” problems: Count paths, subsets, valid sequences, placements—all use combinatorial reasoning.
- Permutations and combinations: The next topic gives P(n,r) and C(n,r); both rely on factorials and on the product/sum rules for derivation.
- Modular counting: Problems often ask for the answer “mod 10⁹+7.” You need factorials and inverse factorials mod m (from the previous topics) to compute nCr and nPr without overflow.
Mental Model
Ask: “Is this a sequence of independent choices?” → product rule (multiply). “Is this one of several disjoint cases?” → sum rule (add). Often you combine: “For each case i, count the ways (product rule); then sum over cases.” Also: “Order matters” vs “order doesn’t matter” will distinguish permutations from combinations (topic 4.10).
Factorials and Growth
The factorial of a non-negative integer n is n! = n × (n−1) × … × 1, with 0! = 1. It counts the number of ways to arrange n distinct objects in a line (permutations of n objects). Factorials grow very fast: 10! ≈ 3.6×10⁶, 20! ≈ 2.4×10¹⁸. So we almost always work modulo m (e.g., 10⁹+7) when n is large.
def factorial(n: int, m: int | None = None) -> int:
    """n! or n! mod m if m given."""
    if n < 0:
        return 0  # or raise
    res = 1
    for i in range(2, n + 1):
        res *= i
        if m is not None:
            res %= m
    return res
For repeated use (e.g., many nCr queries), precompute factorials 0! .. N! mod m in an array—O(N) time once, then O(1) per lookup.
Precomputing Factorials and Inverse Factorials Mod m
To compute nCr = n! / (k! (n−k)!) mod m (m prime), we need n!, k!, (n−k)! mod m and then divide by (k! (n−k)!) using the modular inverse. Precompute:
- fact[i] = i! mod m for i = 0..N
- inv_fact[i] = (i!)⁻¹ mod m (inverse factorial)
Then nCr mod m = fact[n] × inv_fact[k] × inv_fact[n−k] mod m. Building inv_fact: inv_fact[N] = pow(fact[N], -1, m), then inv_fact[i] = inv_fact[i+1] × (i+1) mod m (so inv_fact[i] = 1/(i!) mod m).
def precompute_factorials(n: int, m: int) -> tuple[list[int], list[int]]:
    fact = [1] * (n + 1)
    for i in range(1, n + 1):
        fact[i] = (fact[i - 1] * i) % m
    inv_fact = [1] * (n + 1)
    inv_fact[n] = pow(fact[n], -1, m)
    for i in range(n - 1, -1, -1):
        inv_fact[i] = (inv_fact[i + 1] * (i + 1)) % m
    return fact, inv_fact
Then C(n, k) mod m = fact[n] * inv_fact[k] % m * inv_fact[n - k] % m (for 0 ≤ k ≤ n). This is the standard setup for combinatorics mod m.
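Putting the tables to use, a query function might look like the sketch below. The precompute step is restated so the block runs on its own; the name ncr_mod and the table size are choices made for this example, and it assumes m is prime with n < m so that fact[n] is invertible. math.comb (Python 3.8+) serves as an exact reference.

```python
from math import comb

MOD = 10**9 + 7


def precompute_factorials(n: int, m: int) -> tuple[list[int], list[int]]:
    """fact[i] = i! mod m and inv_fact[i] = (i!)^(-1) mod m for i = 0..n."""
    fact = [1] * (n + 1)
    for i in range(1, n + 1):
        fact[i] = fact[i - 1] * i % m
    inv_fact = [1] * (n + 1)
    inv_fact[n] = pow(fact[n], -1, m)          # one inverse, then walk down
    for i in range(n - 1, -1, -1):
        inv_fact[i] = inv_fact[i + 1] * (i + 1) % m
    return fact, inv_fact


def ncr_mod(n: int, k: int, fact: list[int], inv_fact: list[int], m: int) -> int:
    """C(n, k) mod m in O(1); returns 0 when k is outside [0, n]."""
    if k < 0 or k > n:
        return 0
    return fact[n] * inv_fact[k] % m * inv_fact[n - k] % m


fact, inv_fact = precompute_factorials(1000, MOD)
print(ncr_mod(5, 3, fact, inv_fact, MOD))
print(ncr_mod(1000, 500, fact, inv_fact, MOD) == comb(1000, 500) % MOD)
```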
Sum Rule and Product Rule: Examples
Product rule
Number of k-digit strings over an alphabet of size d (repetition allowed): each of k positions has d choices → d^k. Number of ways to order n distinct items: n choices for first, n−1 for second, … → n!.
Sum rule
Number of ways to get a sum of 7 with two dice: (1,6), (2,5), (3,4), (4,3), (5,2), (6,1) → 6 ways. Each outcome is mutually exclusive, so we add.
Combined
Count 5-letter words that start with a vowel (a,e,i,o,u) or end with a consonant. Cases: (1) start vowel, end consonant; (2) start vowel, end vowel; (3) start consonant, end consonant. Each case uses the product rule; the three cases are disjoint, so add the three counts.
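The three-case count can be sanity-checked by brute force on shorter words. The sketch assumes a 26-letter lowercase alphabet with 5 vowels and 21 consonants, and that any letter string counts as a "word"; the function names are illustrative.

```python
from itertools import product
from string import ascii_lowercase

VOWELS = set("aeiou")


def count_by_cases(length: int) -> int:
    """Strings of the given length (>= 2) that start with a vowel or end with a consonant."""
    v, c = 5, 21
    middle = 26 ** (length - 2)          # free choice for the inner letters
    vowel_consonant = v * middle * c     # case 1
    vowel_vowel = v * middle * v         # case 2
    consonant_consonant = c * middle * c # case 3
    return vowel_consonant + vowel_vowel + consonant_consonant


def count_brute_force(length: int) -> int:
    """Enumerate every string and test the condition directly."""
    return sum(
        1
        for word in product(ascii_lowercase, repeat=length)
        if word[0] in VOWELS or word[-1] not in VOWELS
    )


print(count_by_cases(3) == count_brute_force(3))
print(count_by_cases(5))
```

The brute force is kept to length 3 (17,576 strings) so it runs instantly; the formula then gives the 5-letter answer without enumeration.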
Time and Space Complexity
Single factorial mod m: O(n). Precompute factorials 0..N: O(N) time, O(N) space. Each nCr query after precomputation: O(1). So for many nCr queries (e.g., up to N), precomputation is essential.
Edge Cases
- n < 0 or k < 0: Factorial and C(n,k) are usually defined for n, k ≥ 0. Return 0 or handle as invalid.
- k > n: C(n, k) = 0 (no way to choose k items from n). So check 0 ≤ k ≤ n before computing.
- m = 1: Everything mod 1 is 0; factorials and nCr mod 1 are 0. Usually m is a prime > 1 (e.g., 10⁹+7).
Common Mistakes
- Confusing sum and product: If choices are independent (one doesn’t restrict the other), multiply. If they’re mutually exclusive (either A or B), add.
- Computing n! without mod for large n: n! overflows or becomes huge. Always work mod m when the problem says “mod 10⁹+7.”
- Computing nCr as fact[n] // (fact[k] * fact[n-k]): Integer division is wrong in modular arithmetic. Use fact[n] * inv_fact[k] * inv_fact[n-k] mod m.
Do not use integer division for nCr mod m: (fact[n] // fact[k]) // fact[n-k] is wrong because integer division on reduced values doesn’t preserve congruence. You must multiply by the modular inverse of the denominator (or use precomputed inverse factorials).
Pattern Recognition
“In how many ways …?” usually means: identify disjoint cases (sum rule) and/or a sequence of choices (product rule). When the count involves “choose k from n” or “arrange n items,” use permutations and combinations (next topic); the formulas are built from factorials and the rules above.
Interview Insight
When the problem asks for “number of ways mod 10⁹+7,” say: “I’ll use the sum and product rules to break the count into cases and choices. For formulas that need nCr or nPr I’ll precompute factorials and inverse factorials mod m in O(N), then each query is O(1). I won’t use integer division for nCr—I’ll use inverse factorials.”
Practice Problems
- Precompute fact and inv_fact for n up to 10⁶ mod 10⁹+7; implement nCr(n, k, fact, inv_fact, m) and test.
- Count the number of k-digit numbers with no leading zero: first digit 1..9, remaining k−1 digits 0..9 → 9 × 10^(k−1) (product rule).
- Solve a problem that asks for “number of ways” mod 10⁹+7 using sum/product rules and (when needed) nCr from precomputed factorials.
Summary
- Sum rule: Mutually exclusive options → add the counts. Product rule: Sequence of independent choices → multiply the counts.
- Factorial: n! = n×(n−1)×…×1, 0! = 1; grows fast—work mod m for large n. Precompute fact[0..N] and inv_fact[0..N] mod m for O(1) nCr/nPr.
- nCr mod m = fact[n] × inv_fact[k] × inv_fact[n−k] mod m (for 0 ≤ k ≤ n). Never use integer division; use modular inverses.
- Combinatorics problems: identify cases (sum) and choices (product); then plug in permutations/combinations (next topic) when order or selection is involved.
4.10 Permutations & Combinations
Introduction
A permutation is an arrangement of objects in a specific order; a combination is a selection of objects where order does not matter. “How many ways to arrange 3 books from 5?” is P(5, 3) = 60. “How many ways to choose 3 books from 5?” is C(5, 3) = 10. Permutations and combinations are the core counting tools in DSA: subsets, teams, passwords, placements. This section gives the definitions, formulas (with and without repetition), efficient computation mod m using precomputed factorials (from 4.9), and common identities.
Real-World Analogy
Permutation: Picking 3 people for president, vice-president, secretary—order matters (Alice–Bob–Carol is different from Bob–Alice–Carol). So we count arrangements. Combination: Picking 3 people for a committee—order doesn’t matter (the same three people are one committee). So we count subsets. Same “choose 3 from n,” but permutation counts orderings (more), combination counts subsets (fewer); and P(n, 3) = C(n, 3) × 3!.
Formal Definition
Permutation (without repetition): P(n, r), also written nPr = number of ways to arrange r distinct objects chosen from n distinct objects. Order matters. Formula: P(n, r) = n! / (n−r)! = n × (n−1) × … × (n−r+1).
Combination (without repetition): C(n, r), also written nCr or (n choose r) = number of ways to choose r objects from n distinct objects. Order does not matter. Formula: C(n, r) = n! / (r! (n−r)!).
By convention, P(n, 0) = C(n, 0) = 1 (one way to arrange or choose nothing). For r > n, P(n, r) = C(n, r) = 0.
Why This Topic Matters
- Subset and selection problems: “How many subsets of size k?” → C(n, k). “How many ways to assign k distinct roles from n people?” → P(n, k).
- Counting paths, sequences, and configurations: Many problems decompose into “choose positions” (combination) and “assign values” (permutation or product rule).
- Competitive programming: nCr and nPr mod 10⁹+7 appear constantly; you need the formulas and fast implementation with precomputed factorials.
Relation Between P and C
To arrange r objects chosen from n: first choose the r objects (C(n, r) ways), then order them (r! ways). So P(n, r) = C(n, r) × r!. Hence C(n, r) = P(n, r) / r! = n! / (r! (n−r)!).
Key Identities
- Symmetry: C(n, k) = C(n, n−k). Choosing k is the same as “leaving out” n−k.
- Pascal’s identity: C(n, k) = C(n−1, k−1) + C(n−1, k). Either the first element is in the subset (C(n−1, k−1)) or it isn’t (C(n−1, k)). This gives a recurrence to build Pascal’s triangle.
- Sum of row: C(n, 0) + C(n, 1) + … + C(n, n) = 2^n (total subsets of an n-element set).
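The three identities can be seen together by building Pascal's triangle from the recurrence. This is a small sketch (the name pascal_rows is an assumption, not from the notes) that constructs rows 0..10 using only Pascal's identity and then checks symmetry and the row sum on every row.

```python
def pascal_rows(n_max: int) -> list[list[int]]:
    """Rows 0..n_max of Pascal's triangle; rows[n][k] == C(n, k)."""
    rows = [[1]]
    for n in range(1, n_max + 1):
        prev = rows[-1]
        # Interior entries use Pascal's identity; both ends are C(n, 0) = C(n, n) = 1.
        row = [1] + [prev[k - 1] + prev[k] for k in range(1, n)] + [1]
        rows.append(row)
    return rows

rows = pascal_rows(10)
for n, row in enumerate(rows):
    assert row == row[::-1]      # symmetry: C(n, k) == C(n, n - k)
    assert sum(row) == 2 ** n    # row sum: total number of subsets
print(rows[5])  # [1, 5, 10, 10, 5, 1]
```

Building the triangle this way costs O(n²) time and needs no modular inverses, which makes it a reasonable fallback when n is small or the modulus is not prime.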
With Repetition (Brief)
Permutations with repetition: Arrange r objects from n types, each type available unlimited times. Each of r positions has n choices → n^r.
Combinations with repetition (multiset choice): Choose r objects from n types (unlimited of each). Formula: C(n+r−1, r) = C(n+r−1, n−1) (“stars and bars”). Example: number of ways to distribute r identical candies to n children = C(n+r−1, r).
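A quick way to build trust in the stars-and-bars formula is to enumerate multisets for small values. The sketch below uses itertools.combinations_with_replacement and math.comb (Python 3.8+); the helper name multiset_count is illustrative.

```python
from itertools import combinations_with_replacement
from math import comb

def multiset_count(n_types: int, r: int) -> int:
    """Stars and bars: number of size-r multisets drawn from n_types types."""
    return comb(n_types + r - 1, r)

n_types, r = 4, 3
# Each multiset corresponds to one way of giving r identical candies to n_types children.
brute = sum(1 for _ in combinations_with_replacement(range(n_types), r))
assert brute == multiset_count(n_types, r) == 20
assert n_types ** r == 64   # ordered arrangements with repetition, for comparison
```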
Python Implementation
Assume we have precomputed fact and inv_fact (from topic 4.9) for indices 0..N mod m.
nCr (combinations)
def nCr(n: int, k: int, fact: list[int], inv_fact: list[int], m: int) -> int:
    if k < 0 or k > n:
        return 0
    return fact[n] * inv_fact[k] % m * inv_fact[n - k] % m
nPr (permutations)
def nPr(n: int, r: int, fact: list[int], inv_fact: list[int], m: int) -> int:
    if r < 0 or r > n:
        return 0
    return fact[n] * inv_fact[n - r] % m
nPr(n, r) = n! / (n−r)! = fact[n] × inv_fact[n−r]. No need for inv_fact[r] unless you derive from nCr × r!.
Line-by-Line Explanation (nCr)
- Guard (if k < 0 or k > n: return 0): No way to choose k from n when k > n or k < 0. C(n, 0) = 1 is handled by the formula (inv_fact[0] = 1).
- Return expression (fact[n] * inv_fact[k] % m * inv_fact[n - k] % m): n! / (k! (n−k)!) mod m. Multiply by inverse factorials instead of dividing. Reduce mod m after each multiplication to keep intermediates bounded.
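To try the two functions end to end, the tables have to come from somewhere. The following is a sketch of one common way to build them, assuming m is prime (so Fermat's little theorem gives the inverse) and the table size is below m; build_factorials is a hypothetical helper name, and nCr/nPr are restated only so the snippet runs on its own.

```python
from math import comb

MOD = 10**9 + 7

def build_factorials(limit: int, m: int = MOD) -> tuple[list[int], list[int]]:
    """Return (fact, inv_fact) for indices 0..limit; assumes m is prime and limit < m."""
    fact = [1] * (limit + 1)
    for i in range(1, limit + 1):
        fact[i] = fact[i - 1] * i % m
    inv_fact = [1] * (limit + 1)
    inv_fact[limit] = pow(fact[limit], m - 2, m)    # Fermat inverse of limit!
    for i in range(limit, 0, -1):
        inv_fact[i - 1] = inv_fact[i] * i % m        # 1/(i-1)! = i * (1/i!)
    return fact, inv_fact

def nCr(n, k, fact, inv_fact, m):
    if k < 0 or k > n:
        return 0
    return fact[n] * inv_fact[k] % m * inv_fact[n - k] % m

def nPr(n, r, fact, inv_fact, m):
    if r < 0 or r > n:
        return 0
    return fact[n] * inv_fact[n - r] % m

fact, inv_fact = build_factorials(1000)
assert nCr(5, 2, fact, inv_fact, MOD) == 10
assert nPr(5, 2, fact, inv_fact, MOD) == 20
assert nCr(1000, 500, fact, inv_fact, MOD) == comb(1000, 500) % MOD
```

Computing a single modular inverse for the largest factorial and walking downward keeps the whole precomputation at O(N) multiplications plus one modular exponentiation.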
Time and Space Complexity
With precomputed fact and inv_fact: O(1) per nCr or nPr query. Precomputation is O(N) time and O(N) space (topic 4.9). Without precomputation, computing one nCr with a loop (multiply and divide) is O(min(k, n−k)); for many queries, precomputation is better.
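For the "no precomputation" case mentioned above, here is a minimal sketch of the loop-based computation on exact integers (no modulus). The name ncr_exact is an assumption. Each intermediate value is itself a binomial coefficient, which is why the integer division is exact here, unlike in modular arithmetic.

```python
from math import comb

def ncr_exact(n: int, k: int) -> int:
    """C(n, k) as an exact integer in O(min(k, n - k)) multiplications."""
    if k < 0 or k > n:
        return 0
    k = min(k, n - k)                # symmetry: C(n, k) == C(n, n - k)
    result = 1
    for i in range(k):
        # After this step result == C(n, i + 1), so the division leaves no remainder.
        result = result * (n - i) // (i + 1)
    return result

assert ncr_exact(5, 2) == 10
assert ncr_exact(52, 5) == comb(52, 5)
```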
Edge Cases
- k > n or k < 0: C(n, k) = 0. Return 0.
- k = 0 or k = n: C(n, 0) = C(n, n) = 1. The formula gives fact[n] * inv_fact[0] * inv_fact[n] = 1 when inv_fact[0] = 1 and inv_fact[n] = 1/fact[n].
- r > n for nPr: P(n, r) = 0. Return 0.
Common Mistakes
- Using P when order doesn’t matter: “Choose a committee of 3” → C(n, 3), not P(n, 3). P counts orderings.
- Using C when order matters: “Rank top 3” or “assign 3 distinct roles” → P(n, 3).
- Off-by-one in formulas: P(n, r) has r factors: n, n−1, …, n−r+1. So the last factor is (n−r+1), not (n−r).
- Integer division for nCr: Use inverse factorials (or modular inverse) when working mod m; never integer division.
Using P(n, r) when the problem says “choose” or “select” and order doesn’t matter. That overcounts by r!. Use C(n, r). Conversely, using C(n, r) when positions or order are distinct (e.g., “first place, second place, third place”) undercounts—use P(n, r).
Pattern Recognition
Ask: “Does order matter?” Yes → permutation (arrange, rank, assign distinct roles). No → combination (choose, committee, subset). Then check: repetition allowed? If yes, use n^r (permutation) or stars-and-bars (combination with repetition). If no, use P(n, r) or C(n, r).
Interview Insight
When the problem involves “number of ways to choose/arrange,” clarify: “Order matters → permutation P(n,r) = n!/(n−r)!. Order doesn’t matter → combination C(n,r) = n!/(r!(n−r)!). I’ll precompute factorials and inverse factorials mod m so each nCr/nPr is O(1).” Mention symmetry C(n,k)=C(n,n−k) to reduce k when k > n/2; this is a slight optimization for loop-based computation (with precomputed tables each query is already O(1)).
Practice Problems
- Implement nCr and nPr with precomputed fact/inv_fact; verify against small values (e.g., C(5,2)=10, P(5,2)=20).
- Count subsets of size k: C(n, k). Count permutations of n objects: n!. Count ways to place k distinct items in n distinct positions (each at most one): P(n, k).
- LeetCode-style: “Unique paths” (grid with C(n+m-2, n-1)), “combinations” (enumerate C(n,k)), or “number of ways” mod 10⁹+7 using nCr.
Summary
- Permutation P(n, r) = n!/(n−r)!: arrange r distinct objects from n; order matters. Combination C(n, r) = n!/(r!(n−r)!): choose r from n; order doesn’t matter. P(n, r) = C(n, r) × r!.
- Key identities: C(n, k) = C(n, n−k); C(n, k) = C(n−1, k−1) + C(n−1, k); sum of C(n,0)..C(n,n) = 2^n.
- With repetition: permutations n^r; combinations (stars and bars) C(n+r−1, r).
- Implement nCr/nPr mod m with precomputed fact and inv_fact; O(1) per query. Edge cases: k > n or k < 0 → 0.
4.11 Probability Basics
Introduction
Probability in DSA appears in randomized algorithms, expected-value analysis, and problems that ask “what is the probability that …?” or “expected number of steps?”. You don’t need measure theory—only discrete probability: sample space, events, and the rule P(event) = number of favorable outcomes / number of total outcomes when outcomes are equally likely. This section covers basic definitions, the addition and multiplication rules, complement, independence, and expected value (including linearity). Enough to reason about probability in interviews and to tie counting (combinatorics) to probability.
Real-World Analogy
Roll a fair die. The sample space is {1, 2, 3, 4, 5, 6}. The probability of rolling a 4 is 1/6 (one favorable outcome out of six). The probability of rolling an even number is 3/6 = 1/2 (outcomes 2, 4, 6). “Probability” here is just “favorable count / total count” when every outcome is equally likely. Same idea in algorithms: “probability a random permutation has property P” = (number of permutations with P) / (total permutations).
Formal Definition (Discrete, Equally Likely)
Sample space Ω: the set of all possible outcomes. An event E is a subset of Ω. When all outcomes are equally likely, the probability of event E is P(E) = |E| / |Ω| (number of outcomes in E divided by total outcomes). So probability reduces to counting: count favorable outcomes and total outcomes (often using permutations and combinations).
We have 0 ≤ P(E) ≤ 1. P(Ω) = 1 (something must happen). P(empty) = 0.
Why This Topic Matters
- Expected value: “Expected number of comparisons in quicksort,” “expected steps until random walk hits a state”—linearity of expectation is used everywhere.
- Randomized algorithms: “With probability at least 1/2 the algorithm succeeds”—you need to bound or compute such probabilities.
- Interview problems: “Probability that two people share a birthday,” “expected value of …”—formulate as counting (favorable / total) or as expectation.
Key Rules
Complement
P(not E) = 1 − P(E). So “probability at least one” = 1 − “probability none.”
Addition (Disjoint Events)
If events A and B cannot happen together (disjoint), then P(A or B) = P(A) + P(B). For more than two disjoint events, add all.
Multiplication (Independent Events)
If A and B are independent (one occurring doesn’t change the chance of the other), then P(A and B) = P(A) × P(B). Example: two fair coin flips; P(both heads) = (1/2)×(1/2) = 1/4.
General Addition
For any two events: P(A or B) = P(A) + P(B) − P(A and B). When A and B are disjoint, P(A and B) = 0.
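The rules above can be checked by treating events as Python sets over a small sample space. The sketch below uses one fair die and exact fractions; the events A and B are chosen only for illustration.

```python
from fractions import Fraction

omega = set(range(1, 7))                  # sample space: one fair six-sided die

def prob(event: set[int]) -> Fraction:
    """P(E) = |E| / |Ω| for equally likely outcomes."""
    return Fraction(len(event & omega), len(omega))

A = {x for x in omega if x % 2 == 0}      # even roll: {2, 4, 6}
B = {x for x in omega if x > 3}           # roll above 3: {4, 5, 6}

# General addition: subtract the overlap {4, 6} that would be counted twice.
assert prob(A | B) == prob(A) + prob(B) - prob(A & B) == Fraction(2, 3)
# Complement rule.
assert prob(omega - A) == 1 - prob(A)
# A and B are not independent here, so the product rule does not apply.
assert prob(A & B) != prob(A) * prob(B)
```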
Expected Value
For a discrete random variable X that takes values x₁, x₂, … with probabilities p₁, p₂, …, the expected value is E[X] = Σ xᵢ · P(X = xᵢ). Intuition: long-run average if you repeat the experiment many times.
Linearity of expectation: E[X + Y] = E[X] + E[Y] for any X, Y (even if they are not independent). So E[X₁ + X₂ + … + Xₙ] = E[X₁] + … + E[Xₙ]. This is used constantly: break the quantity into indicator or simple random variables, compute each expectation, and add.
Expected number of heads in n fair coin flips: Let Xᵢ = 1 if flip i is heads, 0 otherwise. E[Xᵢ] = 1/2. Total heads X = X₁ + … + Xₙ, so E[X] = n/2.
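As a sanity check on the indicator argument, the expectation can also be computed directly from the definition by enumerating all 2^n flip sequences. This is a brute-force sketch (the function name is illustrative) and is only practical for small n.

```python
from fractions import Fraction
from itertools import product

def expected_heads(n: int) -> Fraction:
    """E[number of heads] over all 2**n equally likely flip sequences."""
    outcomes = list(product((0, 1), repeat=n))     # 1 = heads, 0 = tails
    return Fraction(sum(sum(o) for o in outcomes), len(outcomes))

for n in range(1, 9):
    assert expected_heads(n) == Fraction(n, 2)     # matches n * E[X_i] = n/2
```

Linearity gives the answer in one line, while the enumeration needs 2^n outcomes; that gap is the reason indicator variables are the preferred tool.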
Probability as Counting
When outcomes are equally likely, P(E) = (number of outcomes in E) / (total outcomes). So:
- Probability that a random subset of size k from {1..n} contains 1 = C(n−1, k−1) / C(n, k) = k/n (or: 1 is in the subset with probability k/n by symmetry).
- Probability that a random permutation of n is a derangement (no fixed point) = (number of derangements) / n!. Useful in some puzzles.
So combinatorics (nCr, nPr, counting) directly gives probabilities when the sample space is “all subsets” or “all permutations” with uniform distribution.
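Both examples can be verified by direct enumeration for small n. The sketch below counts favorable and total outcomes with itertools; the function names and the chosen values of n and k are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import combinations, permutations

def prob_subset_contains(n: int, k: int, target: int = 1) -> Fraction:
    """P(a uniformly random k-subset of {1..n} contains target)."""
    subsets = list(combinations(range(1, n + 1), k))
    favorable = sum(1 for s in subsets if target in s)
    return Fraction(favorable, len(subsets))

def prob_derangement(n: int) -> Fraction:
    """P(a uniformly random permutation of n items has no fixed point)."""
    perms = list(permutations(range(n)))
    favorable = sum(1 for p in perms if all(p[i] != i for i in range(n)))
    return Fraction(favorable, len(perms))

assert prob_subset_contains(6, 2) == Fraction(2, 6)   # k/n
assert prob_derangement(4) == Fraction(9, 24)         # 9 derangements of 4 items
```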
Python: Simple Probability and Expectation
For small spaces you can enumerate and count. For expectation you can sum value × probability, or simulate (Monte Carlo) for verification.
# Example: P(sum of two fair dice = 7)
# Favorable: (1,6),(2,5),(3,4),(4,3),(5,2),(6,1) → 6 outcomes. Total 36.
p_7 = 6 / 36 # 1/6
# Expected value of one die roll (1 to 6)
E_one_die = sum(x for x in range(1, 7)) / 6 # 3.5
# E[X+Y] = E[X] + E[Y]: two dice → 3.5 + 3.5 = 7
Common Mistakes
- Assuming independence when events aren’t: P(A and B) = P(A)×P(B) only when A and B are independent. If they’re not, use P(A and B) = P(A) P(B|A) (conditional) or count outcomes.
- Adding non-disjoint events without subtracting overlap: P(A or B) = P(A) + P(B) − P(A and B). Forgetting the subtraction double-counts.
- Confusing E[X·Y] with E[X]·E[Y]: Linearity gives E[X+Y] = E[X] + E[Y]. In general E[X·Y] ≠ E[X]·E[Y] unless X and Y are uncorrelated/independent.
Using P(A and B) = P(A) × P(B) when A and B are not independent. For example, “probability first draw is red and second draw is red” from an urn without replacement—the second probability depends on the first. Use conditional probability or count favorable outcomes directly.
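The urn example can be made concrete. The sketch below assumes an urn with 3 red and 2 blue balls (numbers chosen for illustration), computes P(red, then red) with the conditional rule, confirms it by counting ordered pairs, and shows that the naive independent product gives a different value.

```python
from fractions import Fraction
from itertools import permutations

def p_two_reds(reds: int, blues: int) -> Fraction:
    """P(first red) * P(second red | first red), drawing without replacement."""
    total = reds + blues
    return Fraction(reds, total) * Fraction(reds - 1, total - 1)

# Brute force: every ordered pair of distinct balls is equally likely.
balls = ["R"] * 3 + ["B"] * 2
pairs = list(permutations(range(len(balls)), 2))
favorable = sum(1 for i, j in pairs if balls[i] == balls[j] == "R")

assert Fraction(favorable, len(pairs)) == p_two_reds(3, 2) == Fraction(3, 10)
assert p_two_reds(3, 2) != Fraction(3, 5) ** 2   # treating the draws as independent gives 9/25
```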
Interview Insight
When asked a probability question, say: “I’ll assume outcomes are equally likely and compute P(E) = favorable outcomes / total outcomes. I’ll use counting—combinations or permutations—for the numerator and denominator.” For expected value: “I’ll use linearity: write the quantity as a sum of simpler random variables (e.g., indicators), compute each expectation, and add.”
Practice Problems
- Probability that two people in a room of n share a birthday (use complement: 1 − P(all different) = 1 − (365/365)×(364/365)×…×((365−n+1)/365)).
- Expected number of trials until first success (geometric): if P(success) = p, E[trials] = 1/p.
- Given n items and k chosen at random, expected number of “special” items in the chosen set (linearity with indicators).
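For the birthday problem, the complement formula translates into a short loop. This is a sketch under the usual simplifying assumptions (365 equally likely birthdays, no leap years, independent people); the function name is illustrative.

```python
def p_shared_birthday(n: int, days: int = 365) -> float:
    """P(at least two of n people share a birthday) = 1 - P(all different)."""
    if n > days:
        return 1.0                         # pigeonhole: a repeat is certain
    p_all_different = 1.0
    for i in range(n):
        p_all_different *= (days - i) / days
    return 1.0 - p_all_different

assert p_shared_birthday(22) < 0.5 < p_shared_birthday(23)
print(round(p_shared_birthday(23), 3))
```

The first assertion reflects the well-known threshold: 23 people are enough for the probability to exceed one half.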
Summary
- When outcomes are equally likely, P(E) = |E| / |Ω|—probability is counting. Use combinatorics (nCr, nPr) for numerator and denominator.
- Complement: P(not E) = 1 − P(E). Disjoint: P(A or B) = P(A) + P(B). Independent: P(A and B) = P(A)×P(B). General or: P(A or B) = P(A) + P(B) − P(A and B).
- Expected value: E[X] = Σ x·P(X=x). Linearity: E[X+Y] = E[X] + E[Y]; use to break into indicators or simple terms.
- In DSA: randomized algorithms (probability of success), expected running time, and “probability that …” problems (count favorable / total).
4.12 Matrix Basics
Introduction
A matrix is a rectangular grid of numbers (or elements) arranged in rows and columns. An m × n matrix has m rows and n columns. In DSA, matrices appear as 2D arrays (graph adjacency, grid problems), in linear recurrences (matrix exponentiation for Fibonacci—topic 4.17), and in dynamic programming. This section covers representation (row/column indexing), basic operations—addition, scalar multiplication, matrix multiplication, and transpose—and clean Python implementations. Mastery of matrix multiplication (dimensions, loop order) is essential for matrix exponentiation and many advanced topics.
Real-World Analogy
Think of a matrix as a spreadsheet or a data table: rows are entities (e.g., users), columns are attributes (e.g., age, score). The entry in row i and column j is the value for that pair. In graphics or physics, a matrix can represent a transformation (rotation, scaling); multiplying a vector by a matrix gives a new vector. In algorithms, we often multiply matrices to combine transitions (e.g., “one step” in a recurrence becomes a matrix; “n steps” is the matrix raised to power n).
Formal Definition
An m × n matrix A has m rows and n columns. We write A[i][j] or A_ij for the entry in row i and column j (often 0-indexed: rows 0..m−1, columns 0..n−1). The transpose A^T is the n×m matrix with A^T[j][i] = A[i][j]; rows and columns are swapped.
Two matrices of the same dimensions can be added entry-wise. Matrix multiplication is defined when the number of columns of the first equals the number of rows of the second: (m×n)(n×p) → m×p.
Why This Topic Matters
- 2D arrays and grids: Matrices are the natural structure for grids (maze, board, image). Traversal, DP on grids, and adjacency matrices all use the same indexing.
- Matrix multiplication: Recurrences like Fibonacci can be written as a vector updated by a fixed matrix; then F_n is obtained by “matrix to power n” (topic 4.17). So matrix multiply is the building block.
- Graphs: Adjacency matrix of a graph is an n×n matrix; A[i][j] = 1 if there’s an edge from i to j. Powers of the adjacency matrix count walks of a given length.
Representation in Python
A matrix is typically a list of lists: each inner list is a row. So A[i][j] is row i, column j. Ensure every row has the same length (same number of columns).
# 2×3 matrix
A = [
    [1, 2, 3],
    [4, 5, 6]
]
# A[0][1] = 2, A[1][2] = 6
rows, cols = len(A), len(A[0])
Basic Operations
Addition
For two matrices A and B of the same size (m×n), (A + B)[i][j] = A[i][j] + B[i][j]. Time O(m·n), space O(m·n) for the result.
Scalar Multiplication
(c·A)[i][j] = c × A[i][j]. Time O(m·n).
Transpose
A^T has dimensions n×m with A^T[j][i] = A[i][j]. So the j-th row of A^T is the j-th column of A. Time O(m·n).
Matrix Multiplication
Let A be m×n and B be n×p. The product C = A×B is m×p with:
C[i][j] = Σ_{k=0}^{n−1} A[i][k] · B[k][j]
So the (i,j) entry of the product is the dot product of row i of A and column j of B. The inner dimension (n) must match; the result has dimensions m×p.
2×2 times 2×2: A = [[a,b],[c,d]], B = [[e,f],[g,h]]. C[0][0] = a·e + b·g, C[0][1] = a·f + b·h, C[1][0] = c·e + d·g, C[1][1] = c·f + d·h.
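The "row i of A dotted with column j of B" description maps almost word for word onto Python's zip. The sketch below is a compact illustration with concrete numbers, not a replacement for the loop-based version; mat_mul_rows_cols is a hypothetical name.

```python
def mat_mul_rows_cols(A, B):
    """Product via explicit rows of A and columns of B (zip(*B) yields B's columns)."""
    cols_of_B = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols_of_B] for row in A]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C = mat_mul_rows_cols(A, B)
print(C)  # [[19, 22], [43, 50]]

# Matrix multiplication is not commutative in general.
assert mat_mul_rows_cols(B, A) != C
# A 1×3 row times a 3×1 column gives a 1×1 matrix.
assert mat_mul_rows_cols([[1, 2, 3]], [[4], [5], [6]]) == [[32]]
```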
ASCII Diagram: Matrix Multiplication
A (m×n) × B (n×p) = C (m×p)
row i ──────────────► dot with col j ──► C[i][j]
[ ... A[i][k] ... ] [ B[k][j] ] = sum_k A[i][k]*B[k][j]
[ ... ]
Inner dimension n must match; result has outer dimensions m and p.
Identity Matrix
The identity matrix I_n is the n×n matrix with I[i][j] = 1 if i = j and 0 otherwise. For any n×n matrix A, A·I = I·A = A. In exponentiation we use I as the base case (A^0 = I).
Python Implementation
Matrix Addition
def mat_add(A: list[list[float]], B: list[list[float]]) -> list[list[float]]:
    m, n = len(A), len(A[0])
    if len(B) != m or len(B[0]) != n:
        raise ValueError("dimension mismatch")
    return [[A[i][j] + B[i][j] for j in range(n)] for i in range(m)]
Matrix Multiplication
def mat_mul(A: list[list[float]], B: list[list[float]]) -> list[list[float]]:
    m, n, p = len(A), len(A[0]), len(B[0])
    if len(B) != n:
        raise ValueError("dimension mismatch: A is m×n, B must be n×p")
    C = [[0] * p for _ in range(m)]
    for i in range(m):
        for j in range(p):
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]
    return C
Transpose
def mat_transpose(A: list[list[float]]) -> list[list[float]]:
    if not A:
        return []
    m, n = len(A), len(A[0])
    return [[A[i][j] for i in range(m)] for j in range(n)]
Identity Matrix
def identity(n: int) -> list[list[float]]:
    I = [[0] * n for _ in range(n)]
    for i in range(n):
        I[i][i] = 1
    return I
Line-by-Line Explanation (mat_mul)
- Dimensions (m, n, p = len(A), len(A[0]), len(B[0])): A is m×n, B must be n×p; result C is m×p.
- Check (if len(B) != n): B must have n rows (same as A’s columns) for the product to be defined.
- Result (C = [[0] * p for _ in range(m)]): Initialize the m×p result to zeros.
- Loops (for i in range(m): for j in range(p): for k in range(n):): For each (i,j), C[i][j] = sum over k of A[i][k]*B[k][j]. The order i, j, k is the standard textbook order. For row-major storage the order i, k, j is usually more cache-friendly, because the innermost loop then scans a row of B and a row of C instead of a column of B.
Time and Space Complexity
Addition: O(m·n) time, O(m·n) space for result. Transpose: O(m·n) time and space. Matrix multiplication (naive): Three nested loops over m, p, n → O(m·n·p) time. For two n×n matrices, O(n³). Space for result O(m·p). (Strassen and other methods reduce the exponent for large matrices but are rarely needed in interviews.)
Modular Matrix Multiplication
When entries are integers and the problem asks for “result mod M,” reduce modulo M after each multiplication and addition to avoid overflow and keep numbers small. Same loop structure; add % M when accumulating C[i][j].
def mat_mul_mod(A: list[list[int]], B: list[list[int]], M: int) -> list[list[int]]:
    m, n, p = len(A), len(A[0]), len(B[0])
    if len(B) != n:
        raise ValueError("dimension mismatch")
    C = [[0] * p for _ in range(m)]
    for i in range(m):
        for j in range(p):
            for k in range(n):
                C[i][j] = (C[i][j] + A[i][k] * B[k][j]) % M
    return C
Edge Cases
- Empty matrix: A = [] has no rows, so len(A[0]) raises IndexError; A = [[]] has one row and zero columns. Check if not A or not A[0] before using len(A[0]).
- Dimension mismatch: For A×B, A’s columns must equal B’s rows. Validate before looping.
- Single row or column: A 1×n matrix times an n×1 matrix gives a 1×1 matrix (a single number). Handled correctly by the same loops.
Common Mistakes
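- Wrong summation index or range: C[i][j] must sum over k, the shared dimension (A’s columns / B’s rows). The three loops can be nested in any order as long as the update stays C[i][j] += A[i][k] * B[k][j]; the bug is ranging k over m or p instead of n, or indexing with the wrong loop variable.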
- Using A[k][j] instead of B[k][j]: The second matrix is B; row k of A times column j of B uses A[i][k] and B[k][j]. Don’t mix A and B.
- Dimension confusion: (m×n)(n×p) = m×p. The shared dimension n is the one we sum over.
In matrix multiplication, the (i,j) entry uses row i of A and column j of B. So the inner loop is over k: A[i][k] and B[k][j]. Writing A[i][j] and B[i][j] or swapping A and B in the inner product is wrong.
Pattern Recognition
Many problems that look like “apply a linear recurrence n times” can be expressed as “vector × matrix^n”. The recurrence defines the matrix; matrix multiplication combines two steps into one. So implementing mat_mul (and later mat_pow using fast exponentiation) is a recurring pattern.
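As a preview of how the pattern plays out, here is a minimal sketch of Fibonacci via matrix powers. It is an illustration under assumptions, not the treatment from topic 4.17: mat_pow and fib are hypothetical names, and mat_mul_mod is restated in compact form so the snippet runs on its own.

```python
def mat_mul_mod(A, B, M):
    """Compact modular product; same result as the triple-loop version."""
    n, p = len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(n)) % M for j in range(p)]
            for i in range(len(A))]

def mat_pow(A, e, M):
    """A**e mod M by repeated squaring (A square, e >= 0)."""
    size = len(A)
    result = [[int(i == j) for j in range(size)] for i in range(size)]  # identity
    while e > 0:
        if e & 1:
            result = mat_mul_mod(result, A, M)
        A = mat_mul_mod(A, A, M)
        e >>= 1
    return result

def fib(n, M=10**9 + 7):
    """F(0)=0, F(1)=1. [[1,1],[1,0]]**n equals [[F(n+1), F(n)], [F(n), F(n-1)]]."""
    return mat_pow([[1, 1], [1, 0]], n, M)[0][1]

print([fib(i) for i in range(10)])  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```

Each squaring halves the remaining exponent, so the n-step recurrence needs only O(log n) matrix products.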
Interview Insight
When the problem involves matrices, state dimensions clearly: “A is m×n, B is n×p, so the product is m×p. The (i,j) entry is the dot product of row i of A and column j of B—three nested loops over i, j, k with C[i][j] += A[i][k]*B[k][j]. For matrix exponentiation we’ll use this multiply with fast exponentiation (binary exponentiation on the matrix).”
Practice Problems
- Implement mat_add, mat_mul, transpose, identity and test on 2×2 examples (hand-compute expected result).
- Implement mat_mul_mod for integer matrices mod M; use it as the “multiply” in matrix exponentiation (topic 4.17).
- Count walks of length k from vertex i to j in a graph using the adjacency matrix: the (i,j) entry of A^k is the number of walks of length k.
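For the last practice problem, one way to test a solution is to compare the matrix power against a direct recursive count on a small graph. The sketch below uses a hypothetical 3-vertex directed graph and plain repeated multiplication (k products) rather than fast exponentiation; all names are illustrative.

```python
def mat_mul(A, B):
    """Plain integer matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def walks_by_matrix(adj, k):
    """Return adj**k; entry (i, j) counts walks of length k from i to j."""
    n = len(adj)
    power = [[int(i == j) for j in range(n)] for i in range(n)]   # adj**0 = identity
    for _ in range(k):
        power = mat_mul(power, adj)
    return power

def walks_by_recursion(adj, start, end, k):
    """Count walks directly: take one edge, then count walks of length k - 1."""
    if k == 0:
        return int(start == end)
    return sum(walks_by_recursion(adj, nxt, end, k - 1)
               for nxt in range(len(adj)) if adj[start][nxt])

# Directed graph with edges 0→1, 0→2, 1→2, 2→0.
adj = [[0, 1, 1],
       [0, 0, 1],
       [1, 0, 0]]
for k in range(5):
    P = walks_by_matrix(adj, k)
    assert all(P[i][j] == walks_by_recursion(adj, i, j, k)
               for i in range(3) for j in range(3))
```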
Summary
- A matrix is an m×n grid; A[i][j] is row i, column j. Represent in Python as list of lists (each list is a row).
- Addition: entry-wise, same dimensions. Transpose: A^T[j][i] = A[i][j]. Multiplication: (m×n)(n×p) = m×p; (AB)[i][j] = Σ_k A[i][k]·B[k][j].
- Naive matrix multiply: O(m·n·p) time. For mod M, reduce after each operation. Identity matrix I satisfies A·I = I·A = A.
- Loop order: for i, for j, for k with C[i][j] += A[i][k]*B[k][j]. Use row of A and column of B—don’t mix indices.
4.13 Euler's Totient Function
Introduction
Euler’s totient function φ(n) counts the number of integers in {1, 2, …, n} that are coprime to n (i.e., gcd(k, n) = 1). So φ(n) is also the number of elements in the set of integers modulo n that have a multiplicative inverse, that is, the size of the “unit group” mod n. It appears in Euler’s theorem (a^φ(n) ≡ 1 (mod n) when gcd(a, n) = 1), in RSA cryptography, and in counting problems (e.g., “how many fractions a/b in lowest terms with 0 < a < b ≤ n?”). This section defines φ(n), gives formulas (including from prime factorization and via a sieve), and shows how to compute it in code.
Real-World Analogy
Imagine n chairs in a circle, numbered 1 to n. Call chair k “valid” if k shares no common factor with n other than 1. The number of valid chairs is φ(n). Equivalently: among 1, 2, …, n, how many share no prime factor with n? That count is φ(n). For n = 10, the numbers coprime to 10 are 1, 3, 7, 9 → φ(10) = 4.
Formal Definition
For a positive integer n, φ(n) (Euler’s totient) is the number of integers k in the range 1 ≤ k ≤ n such that gcd(k, n) = 1. So φ(1) = 1 (1 is coprime to 1 by convention). For n ≥ 2, φ(n) is the count of elements in {1, …, n} that have a multiplicative inverse modulo n.
If n has prime factorization n = p₁^a₁ · p₂^a₂ · … · pₖ^aₖ, then φ(n) = n × Π (1 − 1/pᵢ) = n × (1 − 1/p₁)(1 − 1/p₂)…(1 − 1/pₖ). Equivalently, φ(n) = Π (pᵢ^aᵢ − pᵢ^(aᵢ−1)) = Π pᵢ^(aᵢ−1) (pᵢ − 1).
Why This Topic Matters
- Euler’s theorem: If gcd(a, n) = 1, then a^φ(n) ≡ 1 (mod n). So a^(−1) ≡ a^(φ(n)−1) (mod n), which gives another way to compute the modular inverse when n is not prime (Fermat applies only when n is prime).
- RSA: The public and private exponents are chosen using φ(N) where N = p·q. Security relies on the hardness of computing φ(N) without knowing the factors.
- Counting: Problems like “count pairs (a, b) with 1 ≤ a < b ≤ n and gcd(a, b) = 1” or “number of reduced fractions” use sums involving φ.
Mental Model
φ(n) is “how many numbers in 1..n don’t share any prime factor with n.” So start with n and for each distinct prime p dividing n, “remove” a fraction 1/p of the numbers (those divisible by p). What remains is n × (1 − 1/p₁)(1 − 1/p₂)… = φ(n). For a prime p, every number 1..p−1 is coprime to p, so φ(p) = p − 1.
Key Formulas
- Prime: φ(p) = p − 1.
- Prime power: φ(p^k) = p^k − p^(k−1) = p^(k−1)(p − 1).
- Multiplicative: If gcd(m, n) = 1, then φ(m·n) = φ(m)·φ(n). So from prime powers, φ(n) = Π φ(pᵢ^aᵢ) = Π (pᵢ^aᵢ − pᵢ^(aᵢ−1)).
- Product form: φ(n) = n × Π_{p|n} (1 − 1/p).
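These formulas can be checked against the definition on small inputs. The sketch below counts coprime integers directly with math.gcd and compares that count with the prime-power product; both function names are illustrative, and the factorization is passed in as a {prime: exponent} dictionary rather than computed.

```python
from math import gcd

def phi_brute(n: int) -> int:
    """φ(n) straight from the definition: count k in [1, n] with gcd(k, n) == 1."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

def phi_from_factorization(factors: dict[int, int]) -> int:
    """φ(n) for n = product of p**a, using φ(p**a) = p**(a-1) * (p - 1)."""
    result = 1
    for p, a in factors.items():
        result *= p ** (a - 1) * (p - 1)
    return result

assert phi_brute(10) == 4                                     # 1, 3, 7, 9
assert phi_from_factorization({2: 2, 3: 1}) == phi_brute(12) == 4
assert phi_brute(8 * 9) == phi_brute(8) * phi_brute(9)        # gcd(8, 9) = 1
assert phi_brute(4 * 6) != phi_brute(4) * phi_brute(6)        # gcd(4, 6) = 2: rule fails
```

The last assertion is a reminder that multiplicativity needs coprime arguments.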
Step-by-Step: Computing φ(n) from Prime Factorization
- Factor n into primes: n = p₁^a₁ · … · pₖ^aₖ.
- For each distinct prime p dividing n, multiply the running result by (1 − 1/p), or equivalently compute φ(n) = n × Π (1 − 1/p). Alternatively, φ(n) = Π (pᵢ^aᵢ − pᵢ^(aᵢ−1)).
- If you only need φ(n) for one n, factor n (trial division up to √n) then apply the formula. If you need φ(1) through φ(N), use a sieve (see below).
Computing φ(1..N) with a Sieve
Initialize phi[i] = i for all i. For each prime p from 2 to N: for each multiple k = p, 2p, 3p, … ≤ N, do phi[k] -= phi[k] // p (the integer form of phi[k] *= (1 − 1/p), i.e., phi[k] = phi[k] * (p − 1) // p). After processing all primes, phi[i] = φ(i). This runs in O(N log log N) like the sieve of Eratosthenes.
φ(12): 12 = 2² × 3. φ(12) = 12 × (1 − 1/2)(1 − 1/3) = 12 × (1/2)(2/3) = 4. Or φ(12) = φ(4)·φ(3) = (4−2)·(3−1) = 2·2 = 4. The integers in [1,12] coprime to 12 are 1, 5, 7, 11 → four numbers.
Euler’s Theorem
If gcd(a, n) = 1, then a^φ(n) ≡ 1 (mod n). So the order of a modulo n divides φ(n). Corollary: a^(−1) ≡ a^(φ(n)−1) (mod n), which computes the modular inverse even when n is composite. When n is prime (e.g., n = 10⁹+7), φ(n) = n − 1 and this reduces to Fermat’s little theorem, which is simpler; for composite n, use φ(n) or the extended GCD.
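The corollary can be turned into a short inverse routine. This is a sketch under assumptions: mod_inverse_euler is a hypothetical name, the compact totient below is restated only to keep the snippet self-contained, and the cross-check uses Python's built-in pow(a, -1, n), which requires Python 3.8 or later.

```python
from math import gcd

def totient(n: int) -> int:
    """φ(n) by trial division, for n >= 1."""
    result, d = n, 2
    while d * d <= n:
        if n % d == 0:
            while n % d == 0:
                n //= d
            result -= result // d      # multiply result by (1 - 1/d)
        d += 1
    if n > 1:
        result -= result // n          # leftover prime factor
    return result

def mod_inverse_euler(a: int, n: int) -> int:
    """Inverse of a modulo n via Euler's theorem; requires gcd(a, n) == 1."""
    if gcd(a, n) != 1:
        raise ValueError("a has no inverse modulo n")
    return pow(a, totient(n) - 1, n)

assert mod_inverse_euler(7, 10) == 3               # 7 * 3 = 21 ≡ 1 (mod 10)
assert mod_inverse_euler(7, 10) == pow(7, -1, 10)  # built-in modular inverse
```

In practice the extended GCD (or the built-in pow with exponent −1) avoids factoring n, so the Euler route is mainly useful when φ(n) is already known.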
Python Implementation
φ(n) from Factorization (Single Value)
def totient(n: int) -> int:
    """Returns φ(n) for n >= 1."""
    if n <= 0:
        return 0
    if n == 1:
        return 1
    res = n
    d = 2
    while d * d <= n:
        if n % d == 0:
            while n % d == 0:
                n //= d
            res -= res // d
        d += 1
    if n > 1:
        res -= res // n
    return res
Idea: start with res = n. For each distinct prime p dividing n, multiply res by (1 − 1/p), implemented as res -= res // p (so res becomes res * (p-1) / p). We iterate d and divide n by d so we only process distinct primes.
φ(1..N) via Sieve
def totient_sieve(n: int) -> list[int]:
    """Returns list [φ(0), φ(1), ..., φ(n)]. φ(0)=0."""
    phi = list(range(n + 1))
    for i in range(2, n + 1):
        if phi[i] == i:
            for j in range(i, n + 1, i):
                phi[j] -= phi[j] // i
    return phi
If phi[i] == i, then i is prime (not yet reduced). For each prime i, update all its multiples j: phi[j] *= (1 - 1/i) via phi[j] -= phi[j] // i.
Line-by-Line Explanation (totient)
- Guard (if n <= 0: return 0): φ is defined for positive n only.
- Base case (if n == 1: return 1): φ(1) = 1 by convention.
- Start value (res = n): Start with n; we’ll multiply by (1 − 1/p) for each prime p.
- Loop bound (while d * d <= n): Trial division up to √n. When we exit, n is 1 or a single prime.
- Factor test (if n % d == 0): d is a prime factor. The inner loop (while n % d == 0: n //= d) removes all factors of d so we process d only once.
- Update (res -= res // d): Same as res = res * (1 − 1/d) = res * (d−1) / d in integers.
- Leftover prime (if n > 1: res -= res // n): The remaining n is a prime factor; apply the same update.
Time and Space Complexity
Single φ(n) from factorization: O(√n) for trial division. Totient sieve: O(N log log N) time, O(N) space—same as the sieve of Eratosthenes. Use the sieve when you need φ(1)..φ(N); use the factorization method for a single large n.
Edge Cases
- n ≤ 0: φ is defined for positive integers; return 0 or handle as invalid.
- n = 1: φ(1) = 1 (1 is coprime to 1).
- Prime n: φ(n) = n − 1.
Common Mistakes
- Using φ(n) for modular inverse when n is prime: For prime n, Fermat (a^(n−2)) is simpler than a^(φ(n)−1) (they’re equal when n is prime since φ(n) = n−1). Use φ when n is composite.
- Forgetting distinct primes: In the product φ(n) = n Π (1 − 1/p), each distinct prime p appears once. When factoring, don’t apply (1 − 1/p) multiple times for the same p.
- Sieve: updating phi[j] for composite j: In the sieve we iterate primes i and update multiples j. We do phi[j] -= phi[j] // i; doing this for each prime factor of j gives the correct φ(j).
In the product formula φ(n) = n × Π (1 − 1/p), the product is over distinct primes dividing n. So for n = 12 = 2²×3, use (1−1/2) once and (1−1/3) once, not (1−1/2) twice.
Optimization Insight
When you need φ(n) for many n in a range [1, N], the sieve is better than factoring each n (sieve is O(N log log N) total vs N × O(√n) for individual factorization). When you need a single φ(n) for large n, factorization is O(√n) and sufficient.
Interview Insight
When asked about Euler’s totient, say: “φ(n) counts integers in [1, n] coprime to n. Formula: φ(n) = n × product over distinct primes p|n of (1 − 1/p). For prime p, φ(p) = p−1. Euler’s theorem: if gcd(a,n)=1, a^φ(n) ≡ 1 (mod n). I can compute φ(n) by factoring n in O(√n) or precompute φ(1..N) with a sieve in O(N log log N).”
Practice Problems
- Implement totient(n) and totient_sieve(N); verify φ(12)=4, φ(7)=6.
- Compute the modular inverse of a mod n (composite n) using a^(φ(n)−1) mod n when gcd(a, n) = 1.
- Count the number of integers in [1, n] coprime to n (that’s φ(n)); or count pairs (a, b) with 1 ≤ a < b ≤ n and gcd(a, b) = 1 using φ.
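For the coprime-pairs problem, one possible approach is to observe that for each b ≥ 2 there are exactly φ(b) values a in [1, b−1] with gcd(a, b) = 1, so the answer is φ(2) + … + φ(n). The sketch below restates the sieve so it runs on its own and checks the sum against brute force; coprime_pairs and coprime_pairs_brute are hypothetical names.

```python
from math import gcd

def totient_sieve(n: int) -> list[int]:
    """[φ(0), φ(1), ..., φ(n)] with φ(0) = 0."""
    phi = list(range(n + 1))
    for i in range(2, n + 1):
        if phi[i] == i:                      # i is prime: not reduced yet
            for j in range(i, n + 1, i):
                phi[j] -= phi[j] // i
    return phi

def coprime_pairs(n: int) -> int:
    """Pairs (a, b) with 1 <= a < b <= n and gcd(a, b) = 1."""
    return sum(totient_sieve(n)[2:])

def coprime_pairs_brute(n: int) -> int:
    return sum(1 for b in range(2, n + 1) for a in range(1, b) if gcd(a, b) == 1)

assert coprime_pairs(5) == coprime_pairs_brute(5) == 9
assert totient_sieve(12)[12] == 4 and totient_sieve(12)[7] == 6
```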
Summary
- φ(n) = number of integers in {1, …, n} with gcd(k, n) = 1. φ(1) = 1; for prime p, φ(p) = p−1.
- Formula: φ(n) = n × Π_{p|n} (1 − 1/p) = Π (p^a − p^(a−1)) over prime powers p^a in n.
- Euler’s theorem: If gcd(a, n) = 1 then a^φ(n) ≡ 1 (mod n). So a^(−1) ≡ a^(φ(n)−1) (mod n) for composite n.
- Single φ(n): factor n, then apply formula—O(√n). Range φ(1..N): sieve in O(N log log N).
4.14 Chinese Remainder Theorem
Introduction
The Chinese Remainder Theorem (CRT) says that when we have several congruences with pairwise coprime moduli, there is a unique solution modulo the product of the moduli. Specifically: given x ≡ a₁ (mod m₁), x ≡ a₂ (mod m₂), …, x ≡ aₖ (mod mₖ) with gcd(mᵢ, mⱼ) = 1 for i ≠ j, there exists a unique x (mod M) where M = m₁·m₂·…·mₖ. CRT lets us combine results computed modulo different primes (e.g., nCr mod several primes) into one result modulo their product, or split a problem into smaller moduli. This section states the theorem, gives the constructive formula, and implements it in Python.
Real-World Analogy
You have a number of items. When you divide by 3 the remainder is 2; when you divide by 5 the remainder is 3; when you divide by 7 the remainder is 2. Is there a number that fits all three? CRT says yes, and that all such numbers differ by a multiple of 3×5×7 = 105. So there is a unique remainder mod 105. Like solving a puzzle: each condition narrows the set; with coprime moduli, the constraints are “independent” and pin down one residue class mod the product.
Formal Definition
Chinese Remainder Theorem: Let m₁, m₂, …, mₖ be positive integers that are pairwise coprime (gcd(mᵢ, mⱼ) = 1 for i ≠ j). Let M = m₁·m₂·…·mₖ. Then for any integers a₁, …, aₖ, the system of congruences
x ≡ a₁ (mod m₁), x ≡ a₂ (mod m₂), …, x ≡ aₖ (mod mₖ)
has a unique solution modulo M. That is, there exists an integer x such that all congruences hold, and any two such x are congruent modulo M.
If the moduli are not pairwise coprime, a solution may not exist (e.g., x ≡ 0 (mod 2) and x ≡ 1 (mod 2) is impossible). When a solution exists (e.g., x ≡ 1 (mod 2) and x ≡ 1 (mod 4)), it is unique modulo lcm(m₁, …, mk).
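For two congruences with moduli that may share a factor, one possible approach is to check consistency modulo g = gcd(m, n) and then solve modulo lcm(m, n). The sketch below is an illustration under that assumption; the name crt2_general is hypothetical and does not appear elsewhere in this course.

```python
from math import gcd

def crt2_general(a: int, m: int, b: int, n: int):
    """Solve x ≡ a (mod m), x ≡ b (mod n) for any positive m, n.

    Returns (x, l) with l = lcm(m, n) and 0 <= x < l, or None if the
    two congruences contradict each other.
    """
    g = gcd(m, n)
    if (b - a) % g != 0:
        return None  # inconsistent modulo g, so no solution exists
    l = m // g * n
    n_g = n // g
    # Write x = a + m*t. Then (m/g)*t ≡ (b - a)/g (mod n/g),
    # and m/g is invertible modulo n/g because they are coprime.
    inv = pow(m // g, -1, n_g) if n_g > 1 else 0
    t = ((b - a) // g) * inv % n_g
    return (a + m * t) % l, l

print(crt2_general(1, 2, 1, 4))  # (1, 4)
print(crt2_general(0, 2, 1, 2))  # None
```

When gcd(m, n) = 1 this reduces to the ordinary two-modulus CRT, with l = m·n.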
Why This Topic Matters
- Combining results: Compute something mod p₁, mod p₂, …, mod pk (e.g., nCr mod each prime), then use CRT to get the result mod p₁·p₂·…·pk or mod a large composite.
- Large modulus: Instead of working mod a huge M, work mod several smaller coprime factors and recombine with CRT.
- Contest problems: “Find x such that x ≡ a (mod m) and x ≡ b (mod n)”—direct CRT application.
Mental Model
Each congruence x ≡ ai (mod mi) says “x is in a certain residue class mod mi.” Because the mi are coprime, the conditions are independent: there is exactly one residue class mod M that matches all of them. So we “build” x by combining the contributions: for each i, we want a term that is ai mod mi and 0 mod mj for j ≠ i; then add these terms.
Construction (Formula)
Let M = m₁·m₂·…·mk and Mi = M / mi. Then gcd(Mi, mi) = 1 (since mi is coprime to every other mj). So Mi has an inverse mod mi; call it yi (so Mi·yi ≡ 1 (mod mi)). One solution is:
x = Σi ai · Mi · yi
Reduce x mod M to get the unique representative in [0, M−1]. Check: for each i, all terms aj·Mj·yj with j ≠ i are divisible by mi (because Mj contains mi), so x ≡ ai·Mi·yi ≡ ai·1 ≡ ai (mod mi).
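To make the check concrete, here is a small sketch that builds each term a_i·M_i·y_i for the system x ≡ 2 (mod 3), x ≡ 3 (mod 5), x ≡ 2 (mod 7). The numbers are chosen for illustration only; each term is a_i modulo its own modulus and 0 modulo the others.

```python
moduli = [3, 5, 7]
remainders = [2, 3, 2]

M = 1
for m in moduli:
    M *= m  # M = 105

terms = []
for a, m in zip(remainders, moduli):
    Mi = M // m            # product of the other moduli
    yi = pow(Mi, -1, m)    # inverse of Mi modulo m (Python 3.8+)
    terms.append(a * Mi * yi)

print(terms)           # [140, 63, 30]
print(sum(terms) % M)  # 23

# Each term is a_i modulo m_i and 0 modulo every other modulus.
for i, term in enumerate(terms):
    print([term % m for m in moduli])
```

The three rows printed by the last loop are [2, 0, 0], [0, 3, 0], and [0, 0, 2], so the sum has the required remainder for every modulus.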
Two Moduli (Special Case)
Solve x ≡ a (mod m) and x ≡ b (mod n) with gcd(m, n) = 1. Write x = a + m·t for some integer t. Substitute into the second: a + m·t ≡ b (mod n) ⇒ m·t ≡ (b − a) (mod n). So t ≡ (b − a)·m^(−1) (mod n), where m^(−1) is the inverse of m modulo n. Compute t mod n, then x = a + m·t is a solution; reduce mod (m·n) for the unique solution in [0, mn−1].
Example (using the general formula): x ≡ 2 (mod 3), x ≡ 3 (mod 5). M = 15, M₁ = 5, M₂ = 3. y₁ = inverse of 5 mod 3 = 2 (5·2=10≡1). y₂ = inverse of 3 mod 5 = 2 (3·2=6≡1). x = 2·5·2 + 3·3·2 = 20 + 18 = 38 ≡ 8 (mod 15). Check: 8 mod 3 = 2, 8 mod 5 = 3 ✓. The substitution method agrees: t ≡ (3 − 2)·2 ≡ 2 (mod 5), so x = 2 + 3·2 = 8.
Step-by-Step: General CRT
- Check moduli are pairwise coprime (or handle non-coprime case separately).
- Compute M = m₁·m₂·…·mk and for each i, Mi = M / mi.
- For each i, compute yi = modular inverse of Mi modulo mi (e.g., pow(M_i, -1, m_i) in Python).
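- x = Σ ai·Mi·yi; then x = x % M. In Python, % with a positive modulus already returns a value in [0, M−1]; in languages where % can return a negative result, use (x % M + M) % M.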
Python Implementation
Two Moduli
import math
def crt2(a: int, m: int, b: int, n: int) -> int | None:
    """Solves x ≡ a (mod m), x ≡ b (mod n). Returns x mod (m*n) or None if no solution."""
    if math.gcd(m, n) != 1:
        return None  # or solve for lcm when solution exists
    t = (b - a) * pow(m, -1, n) % n
    x = a + m * t
    return x % (m * n)
General CRT (List of (remainder, modulus))
def crt(remainders: list[int], moduli: list[int]) -> int | None:
    """Solves x ≡ remainders[i] (mod moduli[i]) for all i. Moduli must be pairwise coprime."""
    if len(remainders) != len(moduli):
        return None
    M = 1
    for m in moduli:
        M *= m
    x = 0
    for a, m in zip(remainders, moduli):
        Mi = M // m
        yi = pow(Mi, -1, m)
        x = (x + a * Mi * yi) % M
    return x
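We reduce x mod M after each term (or once at the end) to keep intermediate values small. Python integers do not overflow, but smaller numbers are cheaper to multiply; in fixed-width languages the reduction is needed to avoid overflow. The final x is in [0, M−1].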
Line-by-Line Explanation (General CRT)
- M = product of moduli: The solution is unique mod M.
- Mi = M // m: M_i = M/m_i; divisible by all m_j except m_i.
- yi = pow(Mi, -1, m): Inverse of M_i mod m_i (exists because gcd(M_i, m_i) = 1).
- x = (x + a * Mi * yi) % M: Add a_i·M_i·y_i and keep x mod M so the sum doesn’t grow and we stay in [0, M−1].
Time and Space Complexity
For k congruences: k modular inverses (each O(log min(M_i, m_i))), k multiplications and additions. So O(k · log(max moduli)). Space O(1) if we don’t store all Mi, yi (we can compute and add in one loop).
Edge Cases
- Moduli not pairwise coprime: A solution may not exist (e.g., x ≡ 0 (mod 2) and x ≡ 1 (mod 2)). If it exists, it is unique mod lcm(m₁, …, mk). For a full implementation, check pairwise gcd and either return None or use the lcm and check consistency.
- Single congruence: Just return a mod m (or a if already in range).
- Empty list: Return 0 or define M = 1, x = 0.
Common Mistakes
- Assuming CRT applies when moduli are not coprime: The uniqueness and existence hold only when moduli are pairwise coprime. For 2 and 4, a solution exists only if a ≡ b (mod 2) (consistency); then solution is unique mod 4.
- Forgetting to reduce x mod M: The formula can produce a large x; the answer is x mod M. Always return x % M (and handle negative if needed).
- Wrong order of arguments: (remainder, modulus) pairs must match: x ≡ a (mod m). Don’t swap a and m when calling.
Using CRT when the moduli are not pairwise coprime. For example, x ≡ 2 (mod 4) and x ≡ 0 (mod 2) has solutions (e.g., x ≡ 2 (mod 4)), but x ≡ 1 (mod 2) and x ≡ 2 (mod 4) has no solution. Always ensure gcd(m_i, m_j) = 1 for i ≠ j, or implement consistency checks and use lcm.
Optimization Insight
When combining results mod p₁, p₂, …, pk (e.g., nCr mod each prime), compute each result in parallel or in one pass, then run CRT once. The CRT step is O(k) and typically k is small. Precompute M and the inverses yi if you solve many systems with the same moduli.
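One way to realize the precomputation idea is to store, for each modulus, the coefficient M_i·y_i mod M once and reuse it for every system. The class below is a sketch under that assumption; the name CRTSolver is hypothetical.

```python
class CRTSolver:
    """Reusable CRT for a fixed list of pairwise coprime moduli."""

    def __init__(self, moduli):
        self.M = 1
        for m in moduli:
            self.M *= m
        # coeff_i = M_i * y_i mod M is 1 modulo m_i and 0 modulo every other m_j.
        # pow(..., -1, m) raises ValueError if the moduli are not coprime.
        self.coeffs = []
        for m in moduli:
            Mi = self.M // m
            self.coeffs.append(Mi * pow(Mi, -1, m) % self.M)

    def solve(self, remainders):
        """Return the unique x in [0, M) with x ≡ remainders[i] (mod moduli[i])."""
        return sum(a * c for a, c in zip(remainders, self.coeffs)) % self.M

solver = CRTSolver([3, 5, 7])
print(solver.solve([2, 3, 2]))  # 23
print(solver.solve([1, 1, 1]))  # 1
```

After construction, each query needs only k multiplications and one reduction, with no further modular inverses.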
Interview Insight
When asked about CRT, say: “If we have x ≡ a_i (mod m_i) for pairwise coprime m_i, there’s a unique solution mod M = product of m_i. We build it as sum of a_i * (M/m_i) * inv(M/m_i) mod m_i. I’ll compute M, then for each i get the inverse of M/m_i mod m_i and add the term. For two moduli, we can also solve by writing x = a + m*t and solving for t mod n.”
Practice Problems
- Implement crt2 and crt; verify with the example x ≡ 2 (mod 3), x ≡ 3 (mod 5) → 8 (mod 15).
- Solve a system of three congruences with coprime moduli (e.g., mod 3, 5, 7) and check the result.
- Use CRT to combine nCr mod 2, mod 3, mod 5 into nCr mod 30 (or another small composite).
Summary
- CRT: For pairwise coprime m₁, …, mk, the system x ≡ ai (mod mi) has a unique solution modulo M = m₁·…·mk.
- Construction: x = Σ a_i·(M/m_i)·y_i where y_i = (M/m_i)^(−1) mod m_i; then x mod M.
- Two moduli: x = a + m·t with t ≡ (b−a)·m^(−1) (mod n); solution x mod (m·n).
- Moduli must be pairwise coprime for the standard statement. Time O(k · log(max modulus)); reduce x mod M to get the unique representative.
4.15 Lucas Theorem (Large nCr % Mod)
Introduction
Lucas’s theorem computes C(n, k) mod p when p is prime and n, k can be very large (e.g., 10¹⁸). The idea: write n and k in base p; then C(n, k) ≡ Π_i C(n_i, k_i) (mod p), where n_i and k_i are the base-p digits. So instead of precomputing factorials up to n (impossible when n is huge), we only need factorials up to p−1 and then one small binomial per digit. Because the factorial table has p entries, this is the standard way to compute nCr mod p in competitive programming when n is large and p is a small prime (e.g., p up to about 10⁶). For a large prime such as 10⁹+7, a table of size p is usually impractical, and Lucas only matters once n ≥ p.
Real-World Analogy
To compute “choose k from n” mod p, we could use n! / (k!(n−k)!) but n! is astronomically large. Lucas says: break n and k into “digits” in base p (like writing numbers in base 10). Each digit is at most p−1, so “choose ki from ni” for each digit is a small binomial coefficient we can precompute. The answer mod p is the product of these small binomials. So we reduce a huge problem to many tiny ones.
Formal Definition
Let p be prime. Write n and k in base p: n = n0 + n1·p + n2·p² + …, k = k0 + k1·p + k2·p² + … (digits 0 ≤ ni, ki ≤ p−1). Then C(n, k) ≡ Πi C(ni, ki) (mod p). If for some digit i we have ki > ni, then C(ni, ki) = 0, so the whole product is 0.
The theorem follows from the fact that (1+x)^p ≡ 1 + x^p (mod p) (by the binomial theorem and the fact that p divides C(p, j) for 0 < j < p). So the coefficient of x^k in (1+x)^n mod p factors as the product over digits.
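The divisibility fact used in this argument is easy to test numerically. The sketch below (the function name is an illustrative choice) checks whether every middle binomial coefficient C(p, j), 0 < j < p, is divisible by p:

```python
from math import comb

def middle_binomials_vanish(p: int) -> bool:
    """True if p divides C(p, j) for every 0 < j < p."""
    return all(comb(p, j) % p == 0 for j in range(1, p))

print([p for p in range(2, 20) if middle_binomials_vanish(p)])
# [2, 3, 5, 7, 11, 13, 17, 19]
```

In this range the property holds exactly for the primes; for example, C(4, 2) = 6 is not divisible by 4, which is one way to see why the theorem needs a prime modulus.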
Why This Topic Matters
- Large n, prime p: Problems often ask for nCr mod a small prime p with n up to 10¹⁸. You cannot compute n! or store a factorial table up to n. Lucas reduces the task to O(log_p n) binomials, each with arguments < p.
- Competitive programming: Standard tool for “n choose k mod prime” when n is huge and p is small enough for an O(p) table. Precompute factorials 0..p−1 once, then each query is O(log n).
- When p is not prime: Factor p into prime powers, compute nCr mod each prime power (using Lucas for primes; for prime powers there are extensions), then combine with CRT.
Mental Model
Think of n and k in base p. Each “digit position” contributes independently: we must choose ki “items” from ni available in that position. The theorem says the total number of ways (mod p) is the product of the ways at each position. So we only ever need C(a, b) for 0 ≤ a, b ≤ p−1—a small table.
Step-by-Step: Computing C(n, k) mod p (Lucas)
- Precompute factorials and inverse factorials for 0..p−1 (so we can compute C(ni, ki) in O(1)).
- Get base-p digits of n and k: n = n0 + n1·p + …, k = k0 + k1·p + … (e.g., repeatedly n % p, n //= p).
- If len(k_digits) > len(n_digits), pad n with zeros (or treat missing digits of n as 0). For each digit index i, if ki > ni, return 0.
- result = 1. For each i: result = (result * C(ni, ki)) % p. Return result.
Example
C(10, 3) mod 5. 10 = 2·5 + 0 → digits (0,2); 3 = 0·5 + 3 → digits (3,0). So C(10,3) ≡ C(0,3)·C(2,0) (mod 5). C(0,3) = 0 (cannot choose 3 from 0). So C(10,3) ≡ 0 (mod 5). Check: C(10,3) = 120 ≡ 0 (mod 5) ✓. Another: C(7,2) mod 5. 7 = (2,1) in base 5, 2 = (2,0). C(7,2) ≡ C(2,2)·C(1,0) = 1·1 = 1 (mod 5). C(7,2) = 21 ≡ 1 (mod 5) ✓.
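Both worked examples can be reproduced with a compact digit-by-digit sketch. It uses math.comb for the per-digit binomials, which is an assumption that suits small p only; the name lucas_small is hypothetical.

```python
from math import comb

def lucas_small(n: int, k: int, p: int) -> int:
    """C(n, k) mod prime p via Lucas's digit product (for n, k >= 0)."""
    result = 1
    while n > 0 or k > 0:
        ni, ki = n % p, k % p      # current base-p digits
        if ki > ni:
            return 0               # C(ni, ki) = 0 kills the product
        result = result * comb(ni, ki) % p
        n //= p
        k //= p
    return result

print(lucas_small(10, 3, 5))  # 0
print(lucas_small(7, 2, 5))   # 1
```

Comparing against comb(n, k) % p for many small n and k is a quick way to gain confidence before switching to precomputed factorial tables.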
Python Implementation
Assume we have precomputed fact and inv_fact for indices 0..p−1 (size p).
def nCr_mod_small(n: int, k: int, fact: list[int], inv_fact: list[int], p: int) -> int:
    """C(n,k) mod p when 0 <= n, k < p."""
    if k < 0 or k > n:
        return 0
    return fact[n] * inv_fact[k] % p * inv_fact[n - k] % p
def digits_base_p(x: int, p: int) -> list[int]:
    """Digits of x in base p (LSB first)."""
    if x == 0:
        return [0]
    d = []
    while x:
        d.append(x % p)
        x //= p
    return d
def lucas(n: int, k: int, fact: list[int], inv_fact: list[int], p: int) -> int:
    """C(n, k) mod p using Lucas's theorem. p must be prime."""
    if k < 0 or k > n:
        return 0
    nd = digits_base_p(n, p)
    kd = digits_base_p(k, p)
    if len(kd) > len(nd):
        return 0
    res = 1
    for i in range(len(kd)):
        ni = nd[i] if i < len(nd) else 0
        ki = kd[i]
        if ki > ni:
            return 0
        res = res * nCr_mod_small(ni, ki, fact, inv_fact, p) % p
    return res
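The functions above assume that fact and inv_fact already exist. One common way to build them is sketched here, with a single Fermat inverse followed by a backward pass; the helper name build_factorials is an assumption and is not defined elsewhere in this section.

```python
def build_factorials(p: int):
    """Return (fact, inv_fact) with fact[i] = i! mod p and
    inv_fact[i] = (i!)^(-1) mod p for 0 <= i < p. p must be prime."""
    fact = [1] * p
    for i in range(1, p):
        fact[i] = fact[i - 1] * i % p
    inv_fact = [1] * p
    # Fermat's little theorem: a^(p-2) is the inverse of a modulo prime p
    inv_fact[p - 1] = pow(fact[p - 1], p - 2, p)
    # 1/(i-1)! = (1/i!) * i, so fill the table from the top down
    for i in range(p - 1, 0, -1):
        inv_fact[i - 1] = inv_fact[i] * i % p
    return fact, inv_fact

fact, inv_fact = build_factorials(5)
print(fact)      # [1, 1, 2, 1, 4]
print(inv_fact)  # [1, 1, 3, 1, 4]
```

Both tables take O(p) time and space, which is why this approach suits small primes.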
Line-by-Line Explanation
- digits_base_p(x, p): Extract digits of x in base p (LSB first) by repeated x % p and x //= p.
- if len(kd) > len(nd): If k has more base-p digits than n, then k > n, so C(n,k) = 0.
- for i in range(len(kd)): We only need to consider digit positions where k has a digit. Where k has no digit (higher positions), k_i = 0 and C(n_i, 0) = 1, so we skip those.
- ni = nd[i] if i < len(nd) else 0: i-th digit of n. Since we already ensured len(kd) ≤ len(nd), we have i < len(nd), so nd[i] exists. (If we didn’t return 0 earlier, n ≥ k implies n has at least as many digits as k.)
- if ki > ni: return 0: If any digit k_i > n_i, C(n_i, k_i) = 0, so the whole product is 0.
- res = res * nCr_mod_small(ni, ki, ...) % p: Lucas: multiply the binomial coefficient for each digit position.
Time and Space Complexity
Precomputation: Factorials and inverse factorials for 0..p−1: O(p) time and space. Per query (Lucas): O(log_p n) digit extraction and O(log_p n) small binomial lookups (each O(1) with precomputed fact/inv_fact). So O(log_p n) per query. For example, with n = 10¹⁸ and a prime p near 10⁶, log_p n is about 3, so just a few multiplications. The O(p) table is the real cost: for p = 10⁹+7 it would need about 10⁹ entries, which is usually impractical.
Edge Cases
- k > n or k < 0: C(n, k) = 0. Return 0.
- k = 0 or k = n: C(n, 0) = C(n, n) = 1. Lucas gives product of C(n_i, 0) = 1 for each digit → 1.
- Any digit ki > ni: C(n_i, k_i) = 0, so entire product 0. Return 0.
- n = 0: digits [0]; k must be 0 (else k > n); C(0,0) = 1.
Common Mistakes
- Using Lucas when p is not prime: The theorem holds only for prime p. For composite modulus, use prime-power factorization and CRT, or use a different method.
- Precomputing factorials up to n: The whole point of Lucas is that n can be huge. Precompute only 0..p−1.
- Digit order: Lucas uses base-p digits; product is over the same position i. LSB first is consistent as long as both n and k use the same convention.
Applying Lucas when the modulus is composite (e.g., 1000 or 10⁶). Lucas only applies when the modulus is prime. For composite moduli you need to factor into prime powers, compute nCr mod each (Lucas handles prime factors; prime powers need an extension), then combine with CRT.
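For the simplest composite case, a squarefree modulus such as 30 = 2·3·5, the combination can be sketched as follows: run Lucas once per prime factor, then merge the residues with CRT. The helper names are hypothetical, and prime-power factors (e.g., 4 or 9) are deliberately not handled.

```python
from math import comb

def lucas_digits(n: int, k: int, p: int) -> int:
    """C(n, k) mod prime p via Lucas; math.comb handles the small digit binomials."""
    result = 1
    while k > 0:
        ni, ki = n % p, k % p
        if ki > ni:
            return 0
        result = result * comb(ni, ki) % p
        n //= p
        k //= p
    return result

def binom_mod_squarefree(n: int, k: int, primes: list[int]) -> int:
    """C(n, k) modulo the product of distinct primes (assumption: squarefree modulus)."""
    M = 1
    for p in primes:
        M *= p
    x = 0
    for p in primes:
        r = lucas_digits(n, k, p)               # residue modulo this prime
        Mi = M // p
        x = (x + r * Mi * pow(Mi, -1, p)) % M   # CRT term
    return x

print(binom_mod_squarefree(7, 2, [2, 3, 5]))   # 21
print(binom_mod_squarefree(10, 3, [2, 3, 5]))  # 0
```

Here C(7, 2) = 21 stays 21 modulo 30, and C(10, 3) = 120 reduces to 0.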
Optimization Insight
Precompute fact and inv_fact once for the given prime p (e.g., at program start). Then each Lucas query is O(log n). When n and k are both less than p, Lucas is unnecessary—use direct nCr with the precomputed arrays (and then you only need arrays of size max(n)+1, but if p is small, size p is fine).
Interview Insight
When asked “how do you compute C(n, k) mod p for very large n?”, say: “Lucas’s theorem: write n and k in base p; then C(n,k) ≡ product of C(n_i, k_i) mod p. So we only need binomials with arguments less than p. I precompute factorials and inverse factorials for 0..p-1, then for each base-p digit compute C(n_i, k_i) and multiply. Time O(log_p n) per query. This only works when p is prime.”
Practice Problems
- Implement digits_base_p, nCr_mod_small, and lucas; verify C(10,3) mod 5 = 0, C(7,2) mod 5 = 1.
- Solve a problem that asks for C(n, k) mod p with n ≤ 10¹⁸ and a small prime p (e.g., p ≤ 10⁶) using Lucas.
- Compare: when n < p, direct nCr with fact[0..n] vs Lucas (both work; direct is simpler).
Summary
- Lucas’s theorem (prime p): Write n, k in base p; then C(n, k) ≡ Π_i C(n_i, k_i) (mod p). If any k_i > n_i, result is 0.
- Precompute fact and inv_fact for 0..p−1 (O(p)). Each query: get base-p digits of n and k, multiply C(n_i, k_i) for each digit, which takes O(log_p n).
- Only applies when p is prime. For composite modulus use prime factors + CRT.
- Use when n (or k) can be much larger than p; when n, k < p, direct nCr is enough.
4.16 Inclusion-Exclusion Principle
Introduction
The inclusion-exclusion principle counts the size of a union of finite sets by adding sizes of sets, subtracting sizes of pairwise intersections, adding sizes of triple intersections, and so on—alternating signs so that each element is counted exactly once. It is one of the most useful counting tools in DSA: “how many numbers in [1, N] are not divisible by any of these primes?” “how many permutations have at least one fixed point?” “how many strings avoid certain substrings?” All reduce to union counting. This section states the formula, gives the intuition, and shows how to implement it (often by iterating over subsets of conditions).
Real-World Analogy
You want to count how many people in a room speak English or Spanish or French. If you add “English speakers” + “Spanish” + “French,” you count anyone who speaks two languages twice, and anyone who speaks all three three times. So subtract the counts of “English and Spanish,” “English and French,” “Spanish and French.” Now those who speak all three were added three times and subtracted three times (once in each pair)—so add back “English and Spanish and French.” The result is the count of people who speak at least one of the three. That’s inclusion-exclusion: add singles, subtract pairs, add triples.
Formal Definition
For finite sets A₁, A₂, …, An, the size of their union is:
|A₁ ∪ A₂ ∪ … ∪ A_n| = Σ |A_i| − Σ |A_i ∩ A_j| + Σ |A_i ∩ A_j ∩ A_k| − … + (−1)^(n+1) |A₁ ∩ … ∩ A_n|
Equivalently: for every non-empty subset S of {1, …, n}, take the intersection of the A_i for i ∈ S and add (−1)^(|S|+1) times its size. So |∪ A_i| = Σ_{∅≠S⊆{1..n}} (−1)^(|S|+1) |∩_{i∈S} A_i|.
The “inclusion” is adding; the “exclusion” is subtracting to correct overcounts. For the union, the sign alternates: odd-sized subsets add, even-sized subtract. (When you count the complement, total − |∪ A_i|, the signs flip and the empty subset contributes the total.) The key is that each element in the union is counted exactly once.
Why This Topic Matters
- “Count numbers not divisible by any of …”: Let Ai = numbers divisible by pi. Then “not divisible by any” = total − |A₁ ∪ … ∪ Ak|. Intersection of Ai for i ∈ S is “divisible by LCM of those pi”—size ⌊N / LCM⌋.
- Derangements: Permutations with no fixed point = n! − (permutations with at least one fixed point). The latter is inclusion-exclusion over “position i is fixed.”
- Contest problems: Many “count valid configurations” or “count numbers with property P” use inclusion-exclusion over violating conditions.
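The derangement count mentioned above can be sketched directly from the formula. With A_i = “position i is fixed,” an intersection of j such sets has (n − j)! permutations and there are C(n, j) ways to pick the j positions. The function name is an illustrative choice.

```python
from math import comb, factorial

def derangements(n: int) -> int:
    """Permutations of n items with no fixed point, via inclusion-exclusion."""
    # |union of A_i| = sum over j >= 1 of (-1)^(j+1) * C(n, j) * (n - j)!
    at_least_one_fixed = sum(
        (-1) ** (j + 1) * comb(n, j) * factorial(n - j)
        for j in range(1, n + 1)
    )
    return factorial(n) - at_least_one_fixed

print([derangements(n) for n in range(7)])  # [1, 0, 1, 2, 9, 44, 265]
```

Because all intersections of the same size have the same cardinality, the 2^n subsets collapse into n terms.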
Mental Model
We want |A₁ ∪ … ∪ An|. If we add all |Ai|, we overcount elements in more than one set. Subtract intersections of two sets; then we undercount elements in three sets. Add intersections of three sets; and so on. The alternating sum makes every element in the union contribute exactly 1 (proved by checking how many times an element in exactly r sets is counted: C(r,1) − C(r,2) + … + (−1)^(r+1) C(r,r) = 1).
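The identity in parentheses can be checked numerically. This small sketch (the function name is hypothetical) computes how many times the alternating sum counts an element that lies in exactly r of the sets:

```python
from math import comb

def times_counted(r: int) -> int:
    """C(r,1) - C(r,2) + ... + (-1)^(r+1) C(r,r) for an element in exactly r sets."""
    return sum((-1) ** (j + 1) * comb(r, j) for j in range(1, r + 1))

print([times_counted(r) for r in range(1, 8)])  # [1, 1, 1, 1, 1, 1, 1]
```

The result is 1 for every r ≥ 1, which follows from expanding (1 − 1)^r = 0 with the binomial theorem.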
Two and Three Sets
Two sets: |A ∪ B| = |A| + |B| − |A ∩ B|.
Three sets: |A ∪ B ∪ C| = |A| + |B| + |C| − |A∩B| − |A∩C| − |B∩C| + |A∩B∩C|.
Pattern: sum of singles, minus sum of pairs, plus sum of triples, …
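Python sets make the three-set formula easy to verify. In this sketch the sets are the multiples of 2, 3, and 5 in [1, 30], chosen only for illustration:

```python
A = {x for x in range(1, 31) if x % 2 == 0}
B = {x for x in range(1, 31) if x % 3 == 0}
C = {x for x in range(1, 31) if x % 5 == 0}

direct = len(A | B | C)
by_formula = (len(A) + len(B) + len(C)
              - len(A & B) - len(A & C) - len(B & C)
              + len(A & B & C))
print(direct, by_formula)  # 22 22
```

The individual sizes are 15, 10, 6 for the singles, 5, 3, 2 for the pairs, and 1 for the triple, giving 31 − 10 + 1 = 22.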
Step-by-Step: Applying Inclusion-Exclusion
- Define the sets: Identify conditions (e.g., “divisible by pi”). Let Ai be the set of elements satisfying condition i.
- Decide what to count: Often we want “elements in none of the sets” = total − |∪ Ai|, or “elements in at least one” = |∪ Ai|.
- Compute intersection sizes: For each non-empty subset S, compute |∩i∈S Ai|. This is problem-specific (e.g., “divisible by LCM of primes in S” → ⌊N / LCM⌋).
- Combine with alternating signs: |∪ A_i| = Σ (−1)^(|S|+1) |∩_{i∈S} A_i| over non-empty S. Or “none” = total − that sum.
Example: Numbers Not Divisible by 2 or 3
Count integers in [1, 100] not divisible by 2 or 3. A₁ = divisible by 2, A₂ = divisible by 3. |A₁| = 50, |A₂| = 33, |A₁ ∩ A₂| = (divisible by 6) = 16. So |A₁ ∪ A₂| = 50 + 33 − 16 = 67. Numbers not divisible by 2 or 3 = 100 − 67 = 33. Check: in every block of 6 consecutive integers exactly 2 qualify (residues 1 and 5 mod 6). The range 1..96 holds 16 full blocks, giving 32; of the remaining 97..100, only 97 qualifies, so the total is 33 ✓.
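The same count can be confirmed by brute force. A minimal sketch comparing the formula with direct enumeration:

```python
N = 100
by_formula = N - (N // 2 + N // 3 - N // 6)
by_brute_force = sum(1 for x in range(1, N + 1) if x % 2 != 0 and x % 3 != 0)
print(by_formula, by_brute_force)  # 33 33
```

For small N, a brute-force count like this is a cheap test for any inclusion-exclusion implementation.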
Python Implementation (Subset Iteration)
Iterate over non-empty subsets of {0, 1, …, k−1} using bitmasks 1 to 2^k − 1. For each subset S, compute the size of the intersection (problem-specific) and add (−1)^(|S|+1) × size to the result.
from math import lcm  # Python 3.9+
from typing import Callable
def inclusion_exclusion_union(n_conditions: int, intersection_size: Callable[[int], int]) -> int:
    """
    Returns |A0 ∪ A1 ∪ ... ∪ A_{k-1}|.
    intersection_size(mask) returns |∩_{i in S} A_i|, where S is given as a
    bitmask: bit i of mask is set iff i is in S.
    """
    k = n_conditions
    total = 0
    for mask in range(1, 1 << k):
        pop = bin(mask).count("1")
        sign = 1 if pop % 2 == 1 else -1
        total += sign * intersection_size(mask)
    return total
# Example: count 1..N not divisible by any prime in primes
def count_not_divisible(N: int, primes: list[int]) -> int:
    def inter_size(mask: int) -> int:
        prod = 1
        for i in range(len(primes)):
            if (mask >> i) & 1:
                prod = lcm(prod, primes[i])
                if prod > N:
                    return 0  # no multiple of prod lies in [1, N]
        return N // prod
    return N - inclusion_exclusion_union(len(primes), inter_size)
intersection_size(mask) must return the size of the intersection of sets Ai for which bit i is set in mask. For “divisible by prime pi,” the intersection is “divisible by LCM of selected primes,” so size = N // LCM (or 0 if LCM > N).
Line-by-Line Explanation
- for mask in range(1, 1 << k): Non-empty subsets: mask from 1 to 2^k − 1; bit i set means set A_i is in the intersection.
- pop = bin(mask).count("1"): |S| = number of sets in this intersection.
- sign = 1 if pop % 2 == 1 else -1: (−1)^(|S|+1): odd |S| → +1, even → −1. So we add singles, subtract pairs, add triples, …
- total += sign * intersection_size(mask): Add signed size of this intersection.
- In count_not_divisible: “not divisible by any” = N − |union|. inter_size(mask) computes LCM of primes in mask and returns N // LCM (or 0 if LCM > N).
Time and Space Complexity
We iterate over 2^k − 1 subsets and call intersection_size once per subset, so the cost is O(2^k × cost of intersection_size). For the “not divisible by primes” example, each intersection takes O(k) to build the LCM (or O(1) if we precompute LCMs), so the total is O(2^k · k). Space is O(1) plus whatever intersection_size uses. For k around 20, 2^k (about 10⁶ subsets) is usually acceptable; for much larger k, we need pruning or a different approach.
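One possible form of pruning, sketched here for the “not divisible by any prime” case, is to build subsets recursively and stop extending a subset once its product exceeds N, since every larger subset then contributes 0. The function name is hypothetical and the sketch assumes the inputs are primes (so the LCM is the plain product).

```python
def count_not_divisible_pruned(N: int, primes: list[int]) -> int:
    """Count x in [1, N] divisible by none of the given primes."""
    ps = sorted(set(primes))

    def dfs(start: int, prod: int, sign: int) -> int:
        # Signed contribution of the current subset plus all of its extensions.
        total = sign * (N // prod)
        for i in range(start, len(ps)):
            nxt = prod * ps[i]
            if nxt > N:
                break  # sorted order: every later prime overshoots as well
            total += dfs(i + 1, nxt, -sign)
        return total

    # The empty subset (prod = 1, sign = +1) contributes N itself, so this
    # computes total minus union in a single pass.
    return dfs(0, 1, 1)

print(count_not_divisible_pruned(100, [2, 3]))     # 33
print(count_not_divisible_pruned(100, [2, 3, 5]))  # 26
```

The number of visited subsets is bounded by the number of products that stay at or below N, which can be far smaller than 2^k when the primes are large relative to N.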
Edge Cases
- No conditions (k = 0): Union of zero sets is empty; |∪| = 0. Or define “not in any” = all N elements.
- Single set (k = 1): |A₁| = one term; no subtraction.
- LCM exceeds N: Intersection “divisible by LCM” has size 0; return 0 to avoid invalid division or wrong count.
Common Mistakes
- Wrong sign: The formula is |∪| = Σ (−1)^(|S|+1) |∩S| over non-empty S, so odd-sized subsets add and even-sized subtract. Flipping every sign gives −|∪|, not the complement; “count in none” is total − |∪|. Double-check which quantity you want.
- Including the empty subset: The union sum runs over non-empty subsets only (the empty intersection would be the whole universe, which we are not adding). So start mask from 1.
- Wrong intersection meaning: For “divisible by pi,” the intersection over S is “divisible by the LCM of the pi for i ∈ S.” The plain product equals the LCM only when the numbers are pairwise coprime, which holds for distinct primes but not for general divisors such as 4 and 6.
Using the wrong sign for “count elements in none of the sets.” That count = Total − |∪ Ai|. So compute |∪| with inclusion-exclusion (add odd, subtract even), then subtract from total. Don’t flip the sign inside the sum unless you’re sure you’re computing the complement correctly.
Pattern Recognition
When the problem asks “count elements that satisfy none of the bad conditions” or “avoid all of these,” define Ai = “satisfies bad condition i.” Then “satisfies none” = total − |∪ Ai|. When it asks “count elements that satisfy at least one,” you want |∪ Ai| directly. The same subset loop works; only the interpretation (and possibly the final subtraction) changes.
Interview Insight
When the problem involves “count numbers not divisible by any of these” or “count permutations avoiding all of these positions,” say: “I’ll use inclusion-exclusion. Define A_i as the set satisfying condition i (e.g., divisible by p_i). Then |union| = sum over non-empty subsets of (−1)^(|S|+1) times the size of the intersection. I’ll iterate over subsets with a bitmask (1 to 2^k−1), compute the intersection size for each (e.g., N // LCM for the chosen primes), and add with the correct sign. For ‘none of the conditions’ I subtract the union from the total.”
Practice Problems
- Count integers in [1, N] not divisible by any of 2, 3, 5 using inclusion-exclusion; verify for N = 100.
- Count derangements of n (permutations with no fixed point): n! − (at least one fixed point); expand “at least one fixed point” with inclusion-exclusion over positions.
- Given a list of primes, count coprime integers in [1, N] (coprime to product of primes) = same as “not divisible by any prime.”
Summary
- Inclusion-exclusion: |A₁ ∪ … ∪ A_n| = Σ_{∅≠S} (−1)^(|S|+1) |∩_{i∈S} A_i|. Add singles, subtract pairs, add triples, …
- Implement by iterating non-empty subsets (e.g., mask 1 to 2^k − 1); for each subset compute intersection size and add with sign (−1)^(|S|+1).
- “Count in none” = total − |∪|. “Count in at least one” = |∪|. For “not divisible by any of primes,” intersection over S = numbers divisible by LCM(primes in S), size ⌊N / LCM⌋.
- Time O(2^k × cost of intersection). Watch sign and empty subset.
4.17 Matrix Exponentiation for Recurrence
Introduction
Many linear recurrences (e.g., Fibonacci: F_n = F_{n−1} + F_{n−2}) can be written as a state vector updated by a fixed matrix: v_n = M · v_{n−1}. Then v_n = M^n · v_0, so we get the n-th term by matrix exponentiation (compute M^n with fast exponentiation from topic 4.6, using matrix multiplication from topic 4.12) and then multiplying by the initial vector. This gives the n-th term in O(d³ log n) time where d is the dimension of the state (e.g., d = 2 for Fibonacci), instead of O(n) with a naive loop. This section shows how to build the matrix from a recurrence and how to implement it in code (including modulo).
Real-World Analogy
Think of the recurrence as a state machine: at each step, the current state (e.g., “last two Fibonacci numbers”) is updated by a fixed rule. That rule is linear: it is a matrix multiplying the state vector. Doing n steps means applying the matrix n times, which is M^n. Fast exponentiation lets us compute M^n in about log n matrix multiplications, so instead of stepping one by one we double the number of steps covered each time.
Formal Definition
A linear recurrence of order d has the form F_n = c₁·F_{n−1} + c₂·F_{n−2} + … + c_d·F_{n−d} (with initial values F_0, …, F_{d−1}). Define the state vector v_n = (F_n, F_{n−1}, …, F_{n−d+1})^T. Then there is a d×d matrix M such that v_n = M · v_{n−1}. So v_n = M^{n−d+1} · v_{d−1} (the exact exponent depends on indexing). The first entry of v_n is F_n.
The matrix M encodes the recurrence: the first row is (c₁, c₂, …, c_d); each remaining row shifts the previous state down by one slot (ones on the subdiagonal, zeros elsewhere).
Why This Topic Matters
- Fibonacci and similar: F_n in O(log n) instead of O(n). Essential when n is huge (e.g., 10¹⁸).
- Linear recurrences in contests: Many problems give a recurrence of order 2 or 3; matrix exponentiation is the standard solution.
- Counting paths: The number of walks of length n from vertex i to vertex j in a graph is the (i, j) entry of (adjacency matrix)^n; same idea (matrix power).
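The walk-counting claim can be illustrated on a triangle graph. The sketch below uses its own plain-integer matrix helpers (hypothetical names, no modulus) so that it is self-contained:

```python
def mat_mul(A, B):
    """Product of two square integer matrices of the same size."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def count_walks(adj, length):
    """Entry (i, j) of the result = number of walks of the given length from i to j."""
    n = len(adj)
    result = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    base = [row[:] for row in adj]
    while length:
        if length & 1:
            result = mat_mul(result, base)
        base = mat_mul(base, base)
        length >>= 1
    return result

triangle = [[0, 1, 1],
            [1, 0, 1],
            [1, 1, 0]]
W = count_walks(triangle, 3)
print(W[0][0], W[0][1])  # 2 3
```

The two closed walks of length 3 from vertex 0 are 0→1→2→0 and 0→2→1→0, which matches W[0][0] = 2.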
Fibonacci as a 2×2 Matrix
F_n = F_{n-1} + F_{n-2}. State: v_n = (F_n, F_{n-1})^T. We want v_n from v_{n-1} = (F_{n-1}, F_{n-2})^T. The two components satisfy F_n = 1·F_{n-1} + 1·F_{n-2} and F_{n-1} = 1·F_{n-1} + 0·F_{n-2}. Hence:
[ F_n     ]   [ 1 1 ]   [ F_{n-1} ]
[ F_{n-1} ] = [ 1 0 ] · [ F_{n-2} ]
So M = [[1,1],[1,0]]. Since v_n = M · v_{n-1}, we get v_n = M^{n-1} · v_1, with v_1 = (F_1, F_0)^T = (1, 0)^T.
F_n is the first entry of v_n: F_n = (M^{n-1})_{00} · 1 + (M^{n-1})_{01} · 0 = (M^{n-1})_{00}, the top-left entry of M^{n-1}.
Check: for n = 1, M^0 = I, so v_1 = (1, 0)^T and F_1 = 1. For n = 2, M^1 = M, so v_2 = M · (1, 0)^T = (1, 1)^T and F_2 = 1.
For n = 0 we define F_0 = 0 and handle it separately.
Step-by-Step: Building the Matrix for a Recurrence
- Write the recurrence: F_n = c₁·F_{n−1} + … + c_d·F_{n−d}.
- State vector: v_n = (F_n, F_{n−1}, …, F_{n−d+1})^T (d components).
- First row of M: (c₁, c₂, …, c_d), the coefficients of the recurrence.
- Remaining rows: using 0-based indices, row i (1 ≤ i ≤ d−1) has a 1 in column i−1 and 0 elsewhere, which shifts F_{n−1} into the second slot, and so on. So M is: row 0 = [c₁, c₂, …, c_d]; row 1 = [1, 0, …, 0]; row 2 = [0, 1, 0, …, 0]; …; row d−1 = [0, …, 0, 1, 0].
- Initial vector: v_{d−1} = (F_{d−1}, F_{d−2}, …, F_0)^T. Then v_n = M^{n−d+1} · v_{d−1} for n ≥ d−1.
- Compute M^{n−d+1} with binary exponentiation (matrix version); multiply by v_{d−1}; the first component is F_n.
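The construction steps above can be sketched as a small builder. The names companion_matrix and step are illustrative, and the example uses the order-3 recurrence T_n = T_{n−1} + T_{n−2} + T_{n−3} with T_0 = T_1 = 0, T_2 = 1 as an assumed test case.

```python
def companion_matrix(coeffs):
    """Build the d×d matrix for F_n = c1*F_{n-1} + ... + cd*F_{n-d}."""
    d = len(coeffs)
    M = [[0] * d for _ in range(d)]
    M[0] = list(coeffs)          # first row: the recurrence coefficients
    for i in range(1, d):
        M[i][i - 1] = 1          # shift rows: row i copies slot i-1
    return M

def step(M, v):
    """One application of the recurrence: returns M · v."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

M = companion_matrix([1, 1, 1])
print(M)  # [[1, 1, 1], [1, 0, 0], [0, 1, 0]]

v = [1, 0, 0]  # (T_2, T_1, T_0)
for _ in range(3):
    v = step(M, v)
print(v)  # [4, 2, 1], i.e. (T_5, T_4, T_3)
```

Applying step n − d + 1 times is the slow O(n) version; replacing the loop with a matrix power gives the O(d³ log n) method.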
Python Implementation
Matrix Power (Modulo)
def mat_pow_mod(M: list[list[int]], exp: int, mod: int) -> list[list[int]]:
    """Returns M^exp mod mod. M is square (d×d); exp must be >= 0."""
    d = len(M)
    if exp == 0:
        # 1 % mod keeps the identity reduced (all zeros when mod == 1)
        I = [[1 % mod if i == j else 0 for j in range(d)] for i in range(d)]
        return I
    base = [row[:] for row in M]
    res = [[1 % mod if i == j else 0 for j in range(d)] for i in range(d)]
    while exp:
        if exp & 1:
            res = mat_mul_mod(res, base, mod)
        base = mat_mul_mod(base, base, mod)
        exp >>= 1
    return res
Assume mat_mul_mod(A, B, mod) from topic 4.12 (multiplies two matrices and reduces mod mod).
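For readers who do not have the topic 4.12 code at hand, a minimal version of mat_mul_mod could look like the sketch below. This is an assumption about its interface (two lists of lists and a modulus), not necessarily the implementation given in that topic.

```python
def mat_mul_mod(A, B, mod):
    """Return (A · B) with every entry reduced modulo mod."""
    rows, inner, cols = len(A), len(B), len(B[0])
    C = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for k in range(inner):
            a = A[i][k]
            if a == 0:
                continue  # skip zero entries; common in shift matrices
            for j in range(cols):
                C[i][j] = (C[i][j] + a * B[k][j]) % mod
    return C

M = [[1, 1], [1, 0]]
print(mat_mul_mod(M, M, 1000))  # [[2, 1], [1, 1]]
```

The i-k-j loop order walks each row of B contiguously and lets zero entries of A be skipped cheaply.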
Fibonacci F_n mod m
def fib_mod(n: int, m: int) -> int:
    if n <= 0:
        return 0
    if n == 1:
        return 1 % m  # equals 0 when m == 1
    M = [[1, 1], [1, 0]]
    P = mat_pow_mod(M, n - 1, m)
    # v_n = P * (1, 0)^T; F_n = P[0][0]*1 + P[0][1]*0 = P[0][0]
    return P[0][0]
General Linear Recurrence (Order 2)
F_n = a·F_{n−1} + b·F_{n−2}. Matrix M = [[a, b], [1, 0]]. v_n = M^{n−1} · (F_1, F_0)^T. With P = M^{n−1}, F_n = first component of P · (F_1, F_0)^T = P[0][0]*F_1 + P[0][1]*F_0.
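The order-2 case can be sketched as one self-contained function. The name linrec2 and its parameter order are assumptions made for this example; the 2×2 product is written out inline so that no other helper is needed.

```python
def linrec2(n: int, a: int, b: int, f0: int, f1: int, mod: int) -> int:
    """n-th term of F_n = a*F_{n-1} + b*F_{n-2} modulo mod, for n >= 0."""
    if n == 0:
        return f0 % mod

    def mul(X, Y):
        # 2x2 matrix product with entries reduced modulo mod
        return [
            [(X[0][0] * Y[0][0] + X[0][1] * Y[1][0]) % mod,
             (X[0][0] * Y[0][1] + X[0][1] * Y[1][1]) % mod],
            [(X[1][0] * Y[0][0] + X[1][1] * Y[1][0]) % mod,
             (X[1][0] * Y[0][1] + X[1][1] * Y[1][1]) % mod],
        ]

    result = [[1 % mod, 0], [0, 1 % mod]]      # identity
    base = [[a % mod, b % mod], [1 % mod, 0]]  # M = [[a, b], [1, 0]]
    e = n - 1
    while e:
        if e & 1:
            result = mul(result, base)
        base = mul(base, base)
        e >>= 1
    # F_n = P[0][0]*F_1 + P[0][1]*F_0 with P = M^(n-1)
    return (result[0][0] * f1 + result[0][1] * f0) % mod

print(linrec2(10, 1, 1, 0, 1, 10**9 + 7))  # 55
print(linrec2(5, 2, 3, 0, 1, 10**9 + 7))   # 61
```

The second call uses F_n = 2·F_{n−1} + 3·F_{n−2} with F_0 = 0, F_1 = 1, whose first terms are 0, 1, 2, 7, 20, 61.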
Line-by-Line Explanation (mat_pow_mod)
- if exp == 0: M^0 = identity matrix.
- res = identity: Accumulator for the result; we multiply by base when the current bit of exp is 1.
- while exp: if exp & 1: res = mat_mul_mod(res, base, mod): Binary exponentiation: when the LSB of exp is 1, multiply res by base.
- base = mat_mul_mod(base, base, mod); exp >>= 1: Square base and shift exp (same as integer fast exponentiation).
Time and Space Complexity
Matrix multiplication (d×d): O(d³). Matrix power: O(log n) multiplications, so O(d³ log n) time. Space O(d²) for the matrices. For Fibonacci (d = 2), this is O(log n) — much better than O(n) with a loop when n is huge.
Edge Cases
- n = 0 or n = 1: Handle before matrix power (F0 = 0, F1 = 1 for standard Fibonacci).
- Negative n: Often undefined for recurrences; return 0 or handle as invalid.
- mod = 1: All entries become 0; return 0.
Common Mistakes
- Wrong matrix: The first row must be the recurrence coefficients in order (c₁, c₂, …). Rows below shift: (1,0,…,0), (0,1,0,…), …. Swapping rows or columns gives wrong results.
- Wrong initial vector or exponent: v_n = M^{n−1} · v_1 for Fibonacci (the state holds F_n, F_{n−1}). So we need M^{n−1}, not M^n. Check with n = 2: M^1 · (1,0)^T = (1,1)^T → F_2 = 1 ✓.
- Index off by one: F_n = first component of v_n = M^{n−1} · v_1. So use exponent n−1 for Fibonacci.
Using M^n instead of M^{n−1} for Fibonacci. We have v_n = M · v_{n−1}, so v_n = M^{n−1} · v_1. So the exponent in mat_pow_mod must be n−1 to get F_n. Using n would give the first component of v_{n+1}, i.e., F_{n+1}.
Optimization Insight
For d = 2 (Fibonacci-like), the matrix is 2×2; each multiply is O(1). So total O(log n). For d = 3 or 4, still very fast. When the recurrence has constant coefficients and we need a single term Fn for huge n, matrix exponentiation is the standard; when n is small (e.g., n < 10⁶), a simple loop may be simpler and cache-friendly.
Interview Insight
When asked “compute F_n (or the n-th term of a linear recurrence) for very large n,” say: “I’ll express the recurrence as a state vector updated by a matrix M. Then the n-th state is M^{n-1} times the initial vector. I’ll compute M^{n-1} with matrix binary exponentiation (same as integer fast exponentiation but with matrix multiply). Time O(d^3 log n). For Fibonacci, M = [[1,1],[1,0]], initial (1,0), and F_n is the first component of M^{n-1} * (1,0).”
Practice Problems
- Implement mat_pow_mod and fib_mod(n, m); verify F_10 = 55, F_0 = 0, F_1 = 1.
- Solve F_n = 2*F_{n-1} + 3*F_{n-2} with given F_0, F_1 using matrix exponentiation.
- Count the number of ways to tile a 2×n board with 2×1 dominoes (recurrence: a_n = a_{n-1} + a_{n-2} with a_1 = 1, a_2 = 2; this is the Fibonacci recurrence with shifted indices, a_n = F_{n+1}).
Summary
- Linear recurrence F_n = c₁·F_{n−1} + … + c_d·F_{n−d} can be written v_n = M · v_{n−1}; then v_n = M^{n−d+1} · v_{d−1}. First row of M is (c₁, …, c_d); below that, shift rows.
- Fibonacci: M = [[1,1],[1,0]], v_1 = (1,0)^T; F_n = first component of M^{n−1} · v_1 = (M^{n−1})_{00}.
- Compute M^exp with matrix binary exponentiation (same as fast exponentiation, with mat_mul_mod). Time O(d³ log n), space O(d²).
- Handle n < d (base cases) separately. Use exponent n−1 (not n) for Fibonacci to get F_n.
5.1 Array Basics
Introduction
An array is a contiguous block of memory that stores a sequence of elements of the same type, each identifiable by an index. In Python, the built-in list is the primary “array” type: it supports indexing (arr[i]), length (len(arr)), and dynamic growth (append). Arrays are the foundation of most data structures and algorithms—strings are arrays of characters, matrices are arrays of arrays, and most problems involve traversing or querying an array. This section covers what an array is, 0-based indexing, basic operations (access, update, traverse), and how Python lists behave so you can reason about time complexity and edge cases.
Real-World Analogy
Think of an array like a row of lockers or parking spots numbered 0, 1, 2, … Each slot holds one item. You can go directly to slot 5 (O(1) “access”) and read or replace what’s there. You can walk the row from start to end (traversal). The “address” of each slot is computed from the base address plus the index, which is why access by index is constant time. Adding a new slot at the end is cheap (append, amortized O(1)); inserting in the middle or at the front requires shifting the later elements, which is O(n) for Python lists as well.
Formal Definition
An array of size n is a sequence of n elements stored in contiguous memory, indexed by integers 0 to n−1 (or 1 to n in 1-based indexing; we use 0-based). Access by index i is O(1) because the address of the i-th element is base + i × (size of one element). Length is typically stored, so len is O(1).
In a static array, the size is fixed at creation. In a dynamic array (like Python’s list), the size can grow (and sometimes shrink); append is amortized O(1), but insert at position 0 is O(n) because elements must be shifted.
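To make the amortized O(1) claim for append concrete, here is a minimal sketch of a dynamic array that doubles its capacity whenever it is full. The class name DynamicArray and its copies counter are illustrative only, and doubling is an assumption chosen for simplicity; CPython's list uses its own over-allocation scheme, though the amortized argument has the same shape.

```python
class DynamicArray:
    """Toy dynamic array backed by a fixed-size block (illustration only)."""

    def __init__(self):
        self._capacity = 1
        self._size = 0
        self._slots = [None] * self._capacity
        self.copies = 0  # element copies caused by resizing so far

    def __len__(self):
        return self._size

    def __getitem__(self, i):
        if not 0 <= i < self._size:
            raise IndexError("index out of range")
        return self._slots[i]

    def append(self, x):
        if self._size == self._capacity:
            self._grow()
        self._slots[self._size] = x
        self._size += 1

    def _grow(self):
        # Allocate a block twice as large and copy every element over.
        self._capacity *= 2
        new_slots = [None] * self._capacity
        for i in range(self._size):
            new_slots[i] = self._slots[i]
            self.copies += 1
        self._slots = new_slots


da = DynamicArray()
for value in range(1000):
    da.append(value)

# Total copy work stays below 2 * n, so the average cost per append is constant.
print(len(da), da.copies)
```

Each individual resize is O(n), but resizes become rarer as the array grows, which is what "amortized" captures.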
Why This Topic Matters
- Foundation: Almost every data structure (strings, heaps, graphs as adjacency lists) uses arrays. Two-pointer, sliding window, and prefix-sum techniques all operate on arrays.
- Interviews: “Given an array of integers…” is the most common problem start. You must be comfortable with indexing, bounds, and traversal.
- Complexity: Access O(1); search by value O(n); insert at end amortized O(1); insert at front or middle O(n). Choosing the right structure (array vs linked list) depends on these costs.
Mental Model
Picture a row of boxes numbered 0, 1, …, n−1. Each box holds one value. “arr[i]” means “open box i.” “len(arr)” is the number of boxes. Traversal is “visit each box in order.” Slicing arr[start:end] is “the segment from box start up to (but not including) box end.” Negative index −1 means “last box,” −2 means “second to last,” and so on.
Indexing and Slicing in Python
- 0-based: First element is arr[0], last is arr[len(arr)−1] or arr[−1].
- Negative indices: arr[−1] is the last element, arr[−2] the second-to-last; arr[−i] is the same as arr[len(arr)−i] (for valid i).
- Slicing: arr[start:end] gives elements from index start to end−1 (end excluded). arr[:end] means start=0; arr[start:] means end=len(arr). arr[::step] can reverse (step=−1) or skip elements.
arr = [10, 20, 30, 40, 50]
arr[0] # 10
arr[-1] # 50
arr[1:4] # [20, 30, 40]
arr[::-1] # [50, 40, 30, 20, 10] — reverse
Basic Operations
| Operation | Python | Time |
|---|---|---|
| Access by index | arr[i] | O(1) |
| Update | arr[i] = x | O(1) |
| Length | len(arr) | O(1) |
| Append | arr.append(x) | O(1) amortized |
| Insert at position | arr.insert(i, x) | O(n) |
| Search by value | x in arr | O(n) |
Traversal
# By index
for i in range(len(arr)):
    print(arr[i])
# By value
for x in arr:
    print(x)
# With index and value
for i, x in enumerate(arr):
    print(i, x)
Edge Cases
- Empty array: arr = []; len(arr) = 0. Accessing arr[0] raises IndexError. Check len(arr) or use “if arr” before indexing.
- Index out of bounds: arr[i] raises IndexError when i ≥ len(arr) or i < −len(arr). Negative indices from −1 down to −len(arr) are valid and count from the end, so arr[−1] works whenever len(arr) ≥ 1.
- Single element: arr[0] and arr[−1] are the same; no special case needed.
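The bounds rules above can be checked directly. The following is a small sketch; get_or_default is a hypothetical helper introduced here for illustration, not something from the standard library.

```python
arr = [10, 20, 30]

# Valid indices run from -len(arr) to len(arr) - 1.
first_via_negative = arr[-3]   # same element as arr[0]
last_via_negative = arr[-1]    # same element as arr[2]


def get_or_default(items, i, default=None):
    """Return items[i], or default when i is outside the valid range."""
    if -len(items) <= i < len(items):
        return items[i]
    return default


# Indexing past either end raises IndexError instead of wrapping around.
errors = []
for bad_index in (3, -4):
    try:
        arr[bad_index]
    except IndexError:
        errors.append(bad_index)
```

The explicit range check also covers the empty list, where no index at all is valid.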
Common Mistakes
- Off-by-one: Valid indices are 0 to len(arr)−1. Loop “for i in range(len(arr))” gives 0..len−1. Using range(1, len(arr)) skips the first element; range(len(arr)+1) causes IndexError on the last iteration.
- Modifying list while iterating: Removing or inserting elements during “for x in arr” can skip elements or cause errors. Iterate over a copy (e.g., arr[:]) or use indices and adjust.
- Assuming append returns the list: arr.append(x) returns None and mutates arr. Don’t write result = arr.append(x) and expect result to be the new list.
Using 1-based logic when the language is 0-based: the “first element” is arr[0] and the “second” is arr[1]. “Element at position i” in problem statements often means 1-based; convert to index i−1 when accessing (arr[i−1]), and note in a comment that the input position is 1-based.
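Two of the mistakes above are easy to reproduce. This sketch uses illustrative function names (remove_evens_buggy, remove_evens_safe); the buggy version is wrong on purpose to show how elements get skipped.

```python
def remove_evens_buggy(values):
    """Buggy on purpose: removes from the list while iterating over it."""
    for x in values:
        if x % 2 == 0:
            values.remove(x)  # shifts later elements left under the iterator
    return values


def remove_evens_safe(values):
    """Builds a new list, so the list being iterated never changes."""
    return [x for x in values if x % 2 != 0]


buggy_result = remove_evens_buggy([2, 4, 6, 1])  # the 4 is skipped and survives
safe_result = remove_evens_safe([2, 4, 6, 1])

numbers = [1, 2]
returned = numbers.append(3)  # append mutates numbers and returns None
```

In the buggy run, removing 2 moves 4 into index 0, which the loop has already passed, so 4 is never examined.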
Interview Insight
When given “an array of n integers,” clarify: “0-indexed? Can be empty? Sorted or unsorted? Any duplicates?” Then state: “I’ll use a list; access and update by index are O(1). I’ll traverse with for i in range(len(arr)) when I need the index, or for x in arr when I only need values. For in-place changes I’ll be careful not to modify while iterating.”
Practice Problems
- Given an array, return the maximum element and its index (one pass).
- Reverse an array in place (swap arr[i] and arr[n−1−i] for i in range(len(arr)//2)).
- Check if an array is a palindrome (compare arr[i] and arr[n−1−i]).
Summary
- An array is a contiguous sequence of elements indexed 0 to n−1. Access and update by index are O(1).
- Python list: append amortized O(1), insert O(n), “x in arr” O(n). Use 0-based and negative indices (arr[−1] = last).
- Traverse with for i in range(len(arr)), for x in arr, or enumerate(arr). Watch empty array and index bounds.
- Off-by-one and “modify while iterating” are common bugs; clarify 0 vs 1-based when the problem says “position.”
5.2 Searching
Introduction
Searching in an array means finding whether a given value (the target) exists and, often, at which index. The right search strategy depends on whether the array is sorted or unsorted. For an unsorted array, you have no choice but to scan elements until you find the target or reach the end—linear search, O(n). For a sorted array, you can repeatedly eliminate half of the remaining elements—binary search, O(log n). This section builds both from scratch, explains why binary search works only when the array is sorted, and shows how to implement "first index of," "last index of," and related variants so you can handle interview problems confidently.
Real-World Analogy
Imagine finding a name in a phone book (sorted A–Z). You don't read every page: you open the middle, see "M," and know the name is either in the first half or the second half. You throw away one half and repeat. That's binary search—each step halves the search space. Now imagine a pile of unsorted receipts. To find one with a certain date, you must look through them one by one until you find it or run out. That's linear search. The structure of the data (sorted vs not) determines which strategy is possible and how fast it can be.
Given arr = [3, 7, 2, 9, 1] (unsorted), finding 7 requires checking 3, then 7—you might get lucky, but in the worst case you check all 5. Given arr = [1, 2, 3, 7, 9] (sorted), you compare 7 to the middle (3), then to the right half's middle (7), and you're done in 2 steps. No need to look at 1 or 2 or 9.
Formal Definition
Search problem: Given an array A of size n and a target value x, determine if x is in A and optionally return an index i such that A[i] = x. Linear search examines elements in order until a match or end of array; worst-case time O(n), space O(1). Binary search (for sorted arrays) repeatedly compares the target to the middle element and discards half of the remaining range; worst-case time O(log n), space O(1) (iterative) or O(log n) (recursive stack).
Binary search requires a sorted array (or an array that can be treated as sorted with a predicate). The "discard half" step is valid only when you know that one half cannot contain the target, and that knowledge comes from the ordering.
Why This Topic Matters
- Interviews: "Find the index of target in a sorted array" is a classic. You must code binary search correctly (bounds, loop condition, when to return) and know first/last occurrence variants.
- Building block: Binary search on answer (Topic 5.9), search in rotated array, and many optimization problems use the same "narrow the range" idea.
- Complexity: Going from O(n) to O(log n) when the array is sorted is a huge win for large n. Knowing when you can use binary search (sorted + comparable) is essential.
Mental Model
Linear search: "Walk from index 0 to n−1; stop when you see the target or run out." Binary search: "Keep a range [left, right] where the target might be. While the range is non-empty, look at the middle element. If it equals the target, you're done. If it's less than the target, the target must be in the right half (if at all). If it's greater, the target must be in the left half. Update the range and repeat." The key is that "left" and "right" are indices, and you always shrink the range; the loop exits when left > right (range is empty) or when you find the target.
Linear Search (Unsorted or Any Array)
Algorithm
- For each index i from 0 to n−1: if arr[i] == target, return i (or True).
- If the loop finishes without returning, the target is not present; return −1 (or False).
Python Implementation
def linear_search(arr, target):
    for i in range(len(arr)):
        if arr[i] == target:
            return i
    return -1
Time O(n): in the worst case we check every element. Space O(1). This is optimal for an unsorted array—you cannot avoid looking at every element in the worst case (the target might be last or absent).
Python built-ins: target in arr returns True/False (same idea, O(n)). arr.index(target) returns the first index or raises ValueError; also O(n).
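As a short sketch of the built-ins just mentioned, the hypothetical helper find_index below wraps list.index so that a missing target yields −1 instead of an exception, matching the convention of linear_search.

```python
def find_index(arr, target):
    """Return the first index of target in arr, or -1 if it is absent."""
    try:
        return arr.index(target)   # linear scan; raises ValueError when absent
    except ValueError:
        return -1


receipts = [3, 7, 2, 9, 1]
has_nine = 9 in receipts           # membership test, also a linear scan
index_of_seven = find_index(receipts, 7)
index_of_missing = find_index(receipts, 8)
```

Calling `target in arr` followed by `arr.index(target)` would scan the list twice; catching ValueError keeps it to one pass.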
Binary Search (Sorted Array)
Why It Works
If the array is sorted in non-decreasing order, then for any index mid, every element to the left is ≤ arr[mid] and every element to the right is ≥ arr[mid]. So when we compare target to arr[mid]:
- If target == arr[mid], we found it.
- If target < arr[mid], the target cannot be at mid or to the right; search only [left, mid−1].
- If target > arr[mid], the target cannot be at mid or to the left; search only [mid+1, right].
Each step removes at least half of the remaining indices, so after O(log n) steps the range is empty or we find the target.
ASCII Diagram: Binary Search Step
Sorted array: [ 2, 5, 7, 9, 12, 15 ]    target = 9
Index:          0  1  2  3  4   5
Step 1: left=0, right=5, mid=2 → arr[2]=7 < 9 → search right half
        [ 2, 5, 7, | 9, 12, 15 ]
                ↑ mid
Step 2: left=3, right=5, mid=4 → arr[4]=12 > 9 → search left half
        [ 9, 12, 15 ]
             ↑ mid
Step 3: left=3, right=3, mid=3 → arr[3]=9 == 9 → found at index 3.
Standard Binary Search (Any Occurrence)
def binary_search(arr, target):
    left, right = 0, len(arr) - 1
    while left <= right:
        mid = (left + right) // 2
        if arr[mid] == target:
            return mid
        if arr[mid] < target:
            left = mid + 1
        else:
            right = mid - 1
    return -1
Line-by-Line Explanation
- left <= right: When left == right, the range has one element; we must check it. So the loop condition is left <= right. Exiting when left > right means the range is empty, so the target was not found.
- mid = (left + right) // 2: Middle index (integer division). Python integers do not overflow, so this form is safe; in languages with fixed-width integers, write left + (right − left) / 2 instead.
- If arr[mid] < target, every element at index ≤ mid is too small, so set left = mid + 1. If arr[mid] > target, every element at index ≥ mid is too large, so set right = mid - 1.
First Occurrence (Leftmost Index)
When duplicates are allowed, "find the first index where arr[i] == target" requires a small change: when arr[mid] == target, don't return yet—remember mid as a candidate and continue searching the left half (there might be an earlier occurrence).
def first_index(arr, target):
    left, right = 0, len(arr) - 1
    result = -1
    while left <= right:
        mid = (left + right) // 2
        if arr[mid] == target:
            result = mid
            right = mid - 1  # keep looking left
        elif arr[mid] < target:
            left = mid + 1
        else:
            right = mid - 1
    return result
When we find a match, we shrink the range to [left, mid−1] to see if there's another match to the left. If not, result holds the leftmost index we saw.
Last Occurrence (Rightmost Index)
Similarly, for the last occurrence: when arr[mid] == target, set result = mid and search the right half with left = mid + 1.
def last_index(arr, target):
    left, right = 0, len(arr) - 1
    result = -1
    while left <= right:
        mid = (left + right) // 2
        if arr[mid] == target:
            result = mid
            left = mid + 1  # keep looking right
        elif arr[mid] < target:
            left = mid + 1
        else:
            right = mid - 1
    return result
Evolution: Brute Force → Linear → Binary
| Approach | When | Time | Space |
|---|---|---|---|
| Linear search | Unsorted or one-off | O(n) | O(1) |
| Binary search | Sorted array | O(log n) | O(1) |
If the array is sorted, always prefer binary search over linear search—O(log n) vs O(n). If you need to search the same array many times, consider sorting once (O(n log n)) and then doing k binary searches (k × O(log n)); that can beat k linear searches (k × O(n)) when k is large.
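The "sort once, then search many times" idea might look like the sketch below. The function names are illustrative, and it leans on bisect_left from the standard library's bisect module for the binary search step.

```python
from bisect import bisect_left


def contains_sorted(sorted_arr, target):
    """Binary-search membership test; sorted_arr must already be sorted."""
    i = bisect_left(sorted_arr, target)
    return i < len(sorted_arr) and sorted_arr[i] == target


def answer_queries(arr, queries):
    """Sort a copy once, then answer each membership query in O(log n)."""
    sorted_arr = sorted(arr)          # O(n log n), paid once
    return [contains_sorted(sorted_arr, q) for q in queries]


results = answer_queries([3, 7, 2, 9, 1], [7, 4, 1, 10])
```

Total cost is O(n log n + k log n) for k queries, compared with O(kn) for k linear scans. If only membership matters and the values are hashable, a set is often the simpler choice.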
Time and Space Complexity
- Linear search: Time O(n), space O(1).
- Binary search (iterative): Time O(log n)—each step halves the range, so at most ⌈log₂(n+1)⌉ iterations. Space O(1).
- Binary search (recursive): Same time O(log n), but space O(log n) for the call stack.
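To check the iteration bound empirically, the sketch below counts loop iterations for a target larger than every element and compares the count with ⌈log₂(n+1)⌉, computed without floating point as n.bit_length() (the two are equal for n ≥ 1). A recursive variant is included to show where the O(log n) stack space comes from. Both function names are illustrative.

```python
def binary_search_steps(arr, target):
    """Iterative binary search that also reports how many iterations ran."""
    left, right = 0, len(arr) - 1
    steps = 0
    while left <= right:
        steps += 1
        mid = (left + right) // 2
        if arr[mid] == target:
            return mid, steps
        if arr[mid] < target:
            left = mid + 1
        else:
            right = mid - 1
    return -1, steps


def binary_search_recursive(arr, target, left=0, right=None):
    """Recursive variant: same comparisons, one stack frame per halving."""
    if right is None:
        right = len(arr) - 1
    if left > right:
        return -1
    mid = (left + right) // 2
    if arr[mid] == target:
        return mid
    if arr[mid] < target:
        return binary_search_recursive(arr, target, mid + 1, right)
    return binary_search_recursive(arr, target, left, mid - 1)


n = 1_000_000
data = list(range(n))
index, steps = binary_search_steps(data, n)   # n is larger than every element
bound = n.bit_length()                        # ceil(log2(n + 1)) for n >= 1
print(index, steps, bound)
```

For a million elements the search needs about twenty iterations, while a linear scan for the same absent target would touch all million.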
Edge Cases
- Empty array: Linear: loop doesn't run, return −1. Binary: left=0, right=-1, so left <= right is false; return −1.
- Target not present: Both return −1 (or your chosen sentinel).
- Single element: Linear: one comparison. Binary: one iteration, left==right, check arr[mid].
- All same value: Linear finds the first. Standard binary finds any; first_index/last_index give the correct boundary.
- Unsorted array: Binary search is unreliable here: it can return −1 even though the target is present. Always ensure the array is sorted (or use a predicate that preserves the "discard half" property) before using binary search.
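The unsorted case is worth seeing once. This sketch repeats the standard binary_search from earlier in the section so it runs on its own, then applies it to the unsorted example array from the analogy.

```python
def binary_search(arr, target):
    """Standard exact-match binary search; assumes arr is sorted."""
    left, right = 0, len(arr) - 1
    while left <= right:
        mid = (left + right) // 2
        if arr[mid] == target:
            return mid
        if arr[mid] < target:
            left = mid + 1
        else:
            right = mid - 1
    return -1


unsorted = [3, 7, 2, 9, 1]
missed = binary_search(unsorted, 7)          # 7 sits at index 1, yet the search misses it
found = binary_search(sorted(unsorted), 7)   # the sorted copy is [1, 2, 3, 7, 9]
```

On the unsorted input the first probe sees 2 at the middle and discards the left half, which is exactly where 7 lives. Note that the index returned for the sorted copy refers to the copy, not to the original array.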
Common Mistakes
- Using binary search on an unsorted array: Binary search assumes sorted order. If the array isn't sorted, use linear search or sort first.
- Off-by-one in loop condition: Use left <= right so that when left == right you still check that single element. Using left < right can skip the last candidate.
- Wrong mid update: When arr[mid] < target, the target is in the right half, so left = mid + 1. When arr[mid] > target, right = mid - 1. Don't set left = mid or right = mid without ±1, or the range might not shrink and you can get an infinite loop.
- Integer overflow for mid: In C/Java, mid = (left + right) / 2 can overflow for very large indices. Use mid = left + (right - left) / 2. In Python, (left + right) // 2 is fine.
Writing while left < right and then returning left or right without verifying that arr[left] == target. The "find insertion point" variant (bisect) uses left < right and returns left; the "find exact match" variant should use left <= right and return when arr[mid] == target, or −1 when the loop exits.
Python Built-ins: bisect
The bisect module provides binary search for sorted lists:
- bisect.bisect_left(arr, target): the leftmost insertion point, i.e. the first index i with arr[i] >= target (or len(arr) if every element is smaller). If target is present, this is its first occurrence.
- bisect.bisect_right(arr, target) (or bisect.bisect): the rightmost insertion point, i.e. the first index i with arr[i] > target (or len(arr) if there is none). If target is present, this is one past its last occurrence.
So "first index of target" is bisect_left (and check arr[i]==target); "last index" is bisect_right(arr, target) - 1 (and check). Count of target = bisect_right(arr, target) - bisect_left(arr, target).
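Putting the two bisect calls together, a sketch of "first index, last index, and count" could look like this. The helper name first_last_count and the (−1, −1, 0) sentinel are choices made for this example.

```python
from bisect import bisect_left, bisect_right


def first_last_count(sorted_arr, target):
    """Return (first index, last index, count) of target, or (-1, -1, 0)."""
    lo = bisect_left(sorted_arr, target)    # first index with value >= target
    hi = bisect_right(sorted_arr, target)   # first index with value > target
    if lo == hi:                            # empty range: target is absent
        return -1, -1, 0
    return lo, hi - 1, hi - lo


values = [1, 2, 2, 2, 5, 8]
present = first_last_count(values, 2)
absent = first_last_count(values, 3)
insert_at = bisect_left(values, 3)          # where 3 would go to keep order
```

Comparing lo with hi doubles as the presence check, so there is no need to index into the list and risk going out of range when lo equals len(sorted_arr).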
Clarify: "Is the array sorted? Can there be duplicates? What should I return if the target is not found—−1 or something else?" Then implement binary search with left <= right, correct updates for left/right, and handle the "first/last index" variant if asked. Mention bisect in Python if the problem is just "find index" and you're allowed to use the standard library.
Practice Problems
- Binary search: Sorted array, return index of target or −1.
- First and last position: Sorted array with duplicates; return [first_index, last_index] of target or [−1, −1].
- Search insert position: Sorted array, return the index where target would be inserted to keep order (same as bisect_left).
- Count occurrences: Sorted array, count how many times target appears (last_index − first_index + 1, or use bisect).
Summary
- Linear search: Scan from 0 to n−1; O(n) time, O(1) space. Use for unsorted arrays or when you need a simple one-off check.
- Binary search: Requires sorted array. Maintain [left, right]; compare target to middle; discard half each time. O(log n) time, O(1) space (iterative).
- Use left <= right and mid = (left+right)//2; update left = mid+1 or right = mid−1 so the range always shrinks.
- First occurrence: when match, search left (right = mid - 1). Last occurrence: when match, search right (left = mid + 1).
- Python: in and index() are linear. For sorted lists, use bisect_left/bisect_right for insertion position and range of target.
5.3 Insertion & Deletion
Introduction
Insertion means adding a new element at a given position; deletion means removing an element (by index or by value). Because array elements are stored in contiguous memory, inserting or deleting in the middle (or at the front) forces the rest of the elements to shift—that’s why these operations are O(n) in the worst case. Appending at the end is the exception: no shift is needed, so it’s O(1) amortized in a dynamic array like Python’s list. This section covers exactly when and why shifting happens, how to implement insertion and deletion correctly, and how to reason about time complexity so you can choose the right structure (array vs linked list) when the problem involves many middle insertions or deletions.
Real-World Analogy
Imagine a row of cars in a parking lot with no gaps. To add a car at spot 2, you must move the car currently at 2 (and every car after it) one spot forward to make room—that’s insertion and shifting. To remove the car at spot 2, you must move every car after it one spot backward to close the gap—that’s deletion and shifting. Adding a car at the end of the row doesn’t require moving anyone. The “contiguous, no gaps” rule is what makes middle insertions and deletions expensive; a linked list is like having each car point to the next, so you can insert or remove by changing pointers without moving everyone.
arr = [10, 20, 30, 40]. Insert 25 at index 2: we need [10, 20, 25, 30, 40]. Elements at indices 2 and 3 (30, 40) shift right. Delete element at index 1 (20): we need [10, 30, 40]. Elements at indices 2 and 3 shift left. In both cases, the number of elements that move is proportional to the number of positions after the insertion or deletion point—hence O(n) in the worst case.
Formal Definition
Insertion at position i: Add a new element at index i. All elements at indices ≥ i must move one position to the right (and the array may need to be reallocated if it is full). The number of shifts is n−i, which is n in the worst case (i = 0), so O(n) time. Deletion at position i: Remove the element at index i. All elements at indices > i must move one position to the left. The number of shifts is n−1−i, which is n−1 in the worst case, so O(n) time. Append (insert at end): No shift; O(1) amortized. Space for in-place operations is O(1) extra; dynamic arrays may use extra space for growth.
In a static array (fixed size), insertion might be impossible if the array is full; deletion only “marks” or overwrites. In a dynamic array (Python list), the structure can grow and shrink; the implementation hides reallocation, but the shift cost remains when inserting or deleting not at the end.
Why This Topic Matters
- Complexity reasoning: You must know that append is cheap and insert(0, x) or insert(i, x) for small i is expensive. Same for pop(0) vs pop().
- Choosing data structures: If the problem has many insertions/deletions at the front or middle, an array (list) may be the wrong choice; a deque or linked list can offer O(1) at ends or O(1) at a known node.
- Interviews: “Implement a list that supports insert and delete” or “why is insert at front slow?”—you need to explain shifting and give the correct big-O.
Mental Model
Picture the array as a row of slots. Insert at i: Make room by shifting everything from i to the end one step to the right, then write the new element at i. Delete at i: Remove the element at i and shift everything from i+1 to the end one step to the left. The “shift” is a loop that copies elements; the cost is proportional to how many elements lie after the insertion or deletion point.
Insertion
Insert at End (Append)
No shifting: the new element goes into the next available slot. In Python, arr.append(x) is O(1) amortized (occasional reallocation when capacity is exceeded, but amortized constant).
arr = [10, 20, 30]
arr.append(40) # arr → [10, 20, 30, 40]
Insert at Position i
Elements at indices i, i+1, …, n−1 must move one place right; then write the new element at i. So we need space for one more element (dynamic array handles this) and a loop that copies from right to left to avoid overwriting.
Before: [ 10, 20, 30, 40 ] insert 25 at index 2
Index: 0 1 2 3
Step 1: Shift right from index 2 onward, copying from the end (40 moves to index 4, then 30 moves to index 3)
Step 2: Write 25 at index 2
After: [ 10, 20, 25, 30, 40 ]
↑ new
Python: arr.insert(i, x) inserts x at index i; all elements at i and beyond shift right. Time O(n).
arr = [10, 20, 30, 40]
arr.insert(2, 25) # arr → [10, 20, 25, 30, 40]
Insert at beginning (insert(0, x)) shifts all n elements—O(n). Insert at end is same as append—O(1) amortized.
Deletion
Delete by Index
Remove the element at index i; elements at i+1, …, n−1 shift left by one. Time O(n) because up to n−1 elements may move.
Before: [ 10, 20, 30, 40 ] delete index 1
After: [ 10, 30, 40 ]
↑ 30 and 40 moved left
Python: arr.pop(i) removes and returns the element at index i (default i = last, so pop() is O(1) at end). pop(0) is O(n)—shifts all remaining elements.
arr = [10, 20, 30, 40]
arr.pop(1) # returns 20, arr → [10, 30, 40]
arr.pop() # returns 40, arr → [10, 30] (pop from end, O(1))
Delete by Value
Find the first occurrence of the value (O(n) search) and remove it (shift the rest left, O(n)). Total O(n). Python: arr.remove(x) removes the first occurrence of x; raises ValueError if not found.
arr = [10, 20, 20, 30]
arr.remove(20) # removes first 20 → [10, 20, 30]
Step-by-Step: Manual Insert and Delete (Conceptual)
Insert x at index i (without using insert):
- Ensure capacity (or append a dummy so length increases by 1).
- For j from n−1 down to i: set arr[j+1] = arr[j]. (Shift right from the end so we don’t overwrite.)
- Set arr[i] = x.
Delete at index i (without using pop):
- For j from i to n−2: set arr[j] = arr[j+1]. (Shift left.)
- Decrease length by 1 (or remove last slot).
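The conceptual steps above can be sketched on a Python list as follows. The names insert_at and delete_at are illustrative. One deliberate difference from the built-ins: this version raises IndexError for an out-of-range insert index, whereas list.insert clamps it.

```python
def insert_at(arr, i, x):
    """Insert x at index i (0 <= i <= len(arr)) by shifting elements right."""
    if not 0 <= i <= len(arr):
        raise IndexError("insert index out of range")
    arr.append(None)                      # grow by one slot (placeholder value)
    for j in range(len(arr) - 2, i - 1, -1):
        arr[j + 1] = arr[j]               # copy from the end so nothing is overwritten
    arr[i] = x


def delete_at(arr, i):
    """Delete and return the element at index i by shifting elements left."""
    if not 0 <= i < len(arr):
        raise IndexError("delete index out of range")
    removed = arr[i]
    for j in range(i, len(arr) - 1):
        arr[j] = arr[j + 1]               # close the gap one slot at a time
    del arr[-1]                           # drop the now-duplicated last slot
    return removed


data = [10, 20, 30, 40]
insert_at(data, 2, 25)      # data is now [10, 20, 25, 30, 40]
removed = delete_at(data, 1)  # removes 20; data is now [10, 25, 30, 40]
```

The shift loops are where the O(n) cost lives: inserting or deleting at index 0 runs them over the whole list, while doing so at the end skips them entirely.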
Python Implementation Summary
# Insertion
arr.append(x) # end, O(1) amortized
arr.insert(i, x) # at index i, O(n)
# Deletion
arr.pop() # remove last, O(1)
arr.pop(i) # remove at index i, O(n)
arr.remove(x) # remove first occurrence of x, O(n)
# Length changes
len(arr) # after insert +1, after delete −1
Line-by-Line Notes
- insert(0, x) and pop(0) are O(n); avoid them in a loop, or use collections.deque for O(1) at both ends.
- remove(x) only removes the first match. To remove all occurrences, either loop (careful: indices change) or build a new list with a list comprehension.
- None of insert, append, pop, or remove returns the list. Each mutates it in place and returns either the removed element (pop) or None (insert, append, remove).
Evolution: Many Insertions/Deletions
| Scenario | Best choice | Why |
|---|---|---|
| Insert/delete only at end | List (array) | append/pop() are O(1) amortized. |
| Insert/delete at front (e.g. queue from front) | deque | appendleft/popleft O(1); list insert(0)/pop(0) O(n). |
| Insert/delete in middle by index | List or linked list | List O(n) per op; linked list O(1) if you have the node (but no random access). |
If you are building a list and only ever append, a dynamic array is optimal. If you need to remove from the front frequently (e.g. queue), use collections.deque so that popleft is O(1). If you need many middle insertions and you have a reference to the node, a linked list avoids shifting—but you lose O(1) access by index.
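For the queue scenario in the table, a minimal sketch with collections.deque might look like this; the job names are placeholders.

```python
from collections import deque

queue = deque()
for job in ("a", "b", "c"):
    queue.append(job)                # enqueue at the right end, O(1)

served = []
while queue:
    served.append(queue.popleft())   # dequeue from the left end, O(1)

front_loaded = deque([2, 3])
front_loaded.appendleft(1)           # O(1), unlike list.insert(0, x)
as_list = list(front_loaded)
```

The trade-off is that a deque is tuned for its two ends: indexing into the middle of a large deque is slower than indexing a list.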
Time Complexity
- append(x): O(1) amortized.
- insert(i, x): O(n); worst when i=0 (shift all).
- pop(): O(1).
- pop(i): O(n); worst when i=0.
- remove(x): O(n) (find + shift).
Space Complexity
In-place insertion/deletion: O(1) extra space (aside from the list’s own storage). The list may over-allocate for growth; that’s implementation-dependent and amortized.
Edge Cases
- Empty list: pop() and pop(i) raise IndexError; remove(x) raises ValueError. Check if arr before popping.
- Single element: pop(0) or pop() both remove the only element and leave an empty list.
- Index out of range: insert(len(arr), x) is valid (same as append). insert(i, x) never raises for an out-of-range i: Python clamps it, so i > len(arr) inserts at the end and i < −len(arr) inserts at the front. pop(i) raises IndexError when i ≥ len(arr) or i < −len(arr).
- Value not present: remove(x) raises ValueError if x not in list. Check if x in arr first if you need to avoid the exception.
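These edge cases are quick to confirm in an interpreter. The sketch below records which exception each failing call raises; the outcomes dictionary and its keys are just bookkeeping for the example.

```python
arr = [1, 2, 3]
arr.insert(10, 4)     # index past the end is clamped: behaves like append
arr.insert(-10, 0)    # index before the start is clamped: inserts at the front

outcomes = {}
try:
    [].pop()
except IndexError:
    outcomes["pop on empty"] = "IndexError"
try:
    arr.pop(99)
except IndexError:
    outcomes["pop out of range"] = "IndexError"
try:
    arr.remove(42)
except ValueError:
    outcomes["remove missing"] = "ValueError"
```

The asymmetry is the thing to remember: insert is forgiving about its index, while pop and remove are not.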
Common Mistakes
- Using insert(0, x) or pop(0) in a loop: That’s O(n) per call, so k operations become O(kn). Use deque for queue-like behavior.
- Removing while iterating: Deleting elements in a for x in arr loop skips elements (index shifts). Iterate over a copy (e.g. for x in arr[:]) or use a while loop and adjust index when you remove.
- Assuming remove removes all occurrences: It removes only the first. To remove all, use list comprehension arr = [a for a in arr if a != x] or a loop over a copy.
- Expecting insert/append to return the list: They return None; the list is modified in place.
Building a list by repeatedly inserting at the front: result.insert(0, x) in a loop makes each insertion O(n), so total O(n²). Instead, append in the loop and reverse at the end (result.reverse() or result[::-1]), or use a deque and appendleft, then convert to list if needed.
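The three options just described can be placed side by side. This is a sketch with illustrative function names; all three produce the same reversed list, and only the cost differs.

```python
from collections import deque


def reversed_copy_slow(items):
    """O(n^2): every insert(0, x) shifts everything already collected."""
    result = []
    for x in items:
        result.insert(0, x)
    return result


def reversed_copy_append(items):
    """O(n): append in the loop, reverse once at the end."""
    result = []
    for x in items:
        result.append(x)
    result.reverse()
    return result


def reversed_copy_deque(items):
    """O(n): appendleft is O(1) on a deque; convert to a list at the end."""
    result = deque()
    for x in items:
        result.appendleft(x)
    return list(result)


sample = [1, 2, 3, 4]
outputs = [f(sample) for f in (reversed_copy_slow, reversed_copy_append, reversed_copy_deque)]
```

For a handful of elements the difference is invisible; it starts to matter once the list reaches tens of thousands of items.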
Pattern Recognition
When a problem involves “add element” or “remove element”:
- If only at the end: list append/pop is fine.
- If at the front: consider deque (Topic 9.7).
- If in the middle and you need to preserve order: list insert/pop is O(n) per op; acceptable for small n or few operations.
When asked “how do you insert/delete in an array?”, explain the shift: “Insert at i shifts elements from i to the end right; delete at i shifts elements from i+1 to the end left. So both are O(n) in the worst case. Append and pop from the end are O(1). If the problem needs many front operations, I’d use a deque.” Mention that remove-by-value is O(n) because it combines search and shift.
Practice Problems
- Implement “insert at index i” and “delete at index i” on a list without using insert/pop (use a loop to shift).
- Remove all occurrences of value x from a list in place (one pass with two pointers or build new list and assign back).
- Merge two sorted arrays into one sorted array (compare and push to result; no “insert in middle” needed; just append).
Summary
- Insertion at i: Shift elements from i right, then write. O(n). Append is O(1) amortized.
- Deletion at i: Shift elements from i+1 left. O(n). Pop from end is O(1).
- Python: insert(i,x), pop(i), remove(x); avoid insert(0,x) and pop(0) in a loop; use deque for O(1) at both ends.
- Don’t remove (or insert) while iterating over the same list; use a copy or index-based loop with care.
5.4 Two Pointer Technique
Introduction
The two pointer technique uses two indices (or pointers) that move through an array—often from opposite ends or both from the start—to solve a problem in one pass with O(n) time and O(1) extra space. Instead of nested loops (e.g. checking every pair), you move the pointers based on the current values and the problem condition, so each element is considered at most a constant number of times. Typical uses: finding a pair with a given sum in a sorted array, removing duplicates in place, checking if a sequence is a palindrome, or partitioning (e.g. move zeros to the end). This section covers the main patterns (converging pointers, same-direction pointers) and when to apply each.
Real-World Analogy
Imagine two people walking toward each other from opposite ends of a corridor. They meet in the middle. If the corridor is sorted by some rule (e.g. height), you can decide at each step whether the “left” person or the “right” person should move so you get closer to a goal (e.g. two people whose heights sum to a target). That’s the converging two-pointer pattern. Alternatively, imagine one person walking fast and one slow along the same path—useful for finding the “middle” or a cycle. That’s the fast/slow or same-direction pattern. In both cases, you avoid checking every possible pair by using the structure of the array (e.g. sorted) to rule out large parts of the search space.
Sorted array [1, 2, 3, 4, 5], target sum 7. Put left=0, right=4. arr[0]+arr[4]=6 < 7 → increase sum by moving left right. left=1: arr[1]+arr[4]=7 → found. Only a few comparisons instead of checking all pairs.
Formal Definition
Two pointer technique: Maintain two indices left and right (or slow and fast). At each step, update one or both based on arr[left], arr[right], and the problem condition. Converging: left starts at 0, right at n−1; they move toward each other until left ≥ right. Same direction: both start at 0 (or 0 and 1); one or both advance. Guarantee: each element is processed O(1) times, so total time O(n), space O(1).
Converging pointers work well when the array is sorted (or has a known structure) so that moving one pointer in one direction has a predictable effect (e.g. sum increases or decreases). Same-direction pointers work for in-place compaction, “read” vs “write” positions, or fast/slow for cycle or middle detection.
Why This Topic Matters
- Interviews: Two Sum (sorted), 3Sum, remove duplicates, move zeros, palindrome, container with most water—all frequently asked and often solved with two pointers.
- Efficiency: Replaces O(n²) “check every pair” with O(n) when the problem allows ruling out ranges using order or invariants.
- In-place: Many two-pointer solutions use O(1) extra space, which is required when you cannot allocate a new array.
Mental Model
Converging: You have a “window” [left, right]. The condition (e.g. sum, or “elements between”) depends on both ends. If the current window is too small (e.g. sum too low), move the pointer that will increase it (e.g. left++ on a sorted array increases the smallest element). If too large, move the pointer that will decrease it (e.g. right--). Same direction: One pointer is the “reader” (scans the array), the other is the “writer” (next position to write a valid element). Or one is “slow” and one “fast” so that when fast has moved 2k steps, slow has moved k (e.g. find middle).
Pattern 1: Converging Pointers (Opposite Ends)
Use when the array is sorted (or can be sorted) and you are comparing or combining values at two positions. Start with left = 0, right = len(arr) - 1. Loop while left < right. Depending on the problem, move left right or right left so the search space shrinks.
Two Sum in Sorted Array
Find two indices such that arr[i] + arr[j] == target. Because the array is sorted, if arr[left] + arr[right] < target, we need a larger sum—move left right. If arr[left] + arr[right] > target, move right left. If equal, return the pair.
def two_sum_sorted(arr, target):
    left, right = 0, len(arr) - 1
    while left < right:
        s = arr[left] + arr[right]
        if s == target:
            return [left, right]
        if s < target:
            left += 1
        else:
            right -= 1
    return [-1, -1]
Each iteration either increases left or decreases right, so the number of steps is at most n−1. Time O(n), space O(1).
ASCII Diagram: Converging Pointers
Sorted: [ 1, 2, 3, 4, 5 ]    target = 7
Index:    0  1  2  3  4
          ↑           ↑
          left        right    sum=6 < 7 → left++
             ↑        ↑
             left     right    sum=7 → return (1, 4)
Check Palindrome (converging)
Check if arr reads the same from both ends. Move left and right toward the center; if arr[left] != arr[right], return False. Stop when left >= right.
def is_palindrome(arr):
    left, right = 0, len(arr) - 1
    while left < right:
        if arr[left] != arr[right]:
            return False
        left += 1
        right -= 1
    return True
Pattern 2: Same-Direction Pointers (Read/Write)
One pointer scans the array (read); the other marks where to write the “next valid” element. Used for in-place removal or compaction (e.g. remove duplicates, move zeros).
Remove Duplicates In Place (Sorted)
Sorted array: keep one copy of each value, in place. write is the next index to write a unique value; read scans. When arr[read] != arr[write-1] (or write==0), copy arr[read] to arr[write] and advance write. Return write as the new length.
def remove_duplicates_sorted(arr):
    if not arr:
        return 0
    write = 1
    for read in range(1, len(arr)):
        if arr[read] != arr[write - 1]:
            arr[write] = arr[read]
            write += 1
    return write
Move Zeros to End
Keep relative order of non-zeros; put all zeros at the end. write = next position for a non-zero. Scan with read; when arr[read] != 0, swap or copy to arr[write] and increment write. Fill the rest with zeros if needed (or swap and leave zeros at end).
def move_zeros(arr):
    write = 0
    for read in range(len(arr)):
        if arr[read] != 0:
            arr[write], arr[read] = arr[read], arr[write]
            write += 1
Time O(n), space O(1).
Pattern 3: Fast and Slow Pointers
Two pointers both start at the beginning; one moves one step per iteration, the other two steps (or different rates). Used for finding the middle of a list, detecting a cycle (in linked lists), or similar “position relative to length” problems. On arrays, a common use is “slow” = write, “fast” = read, which is the same as the read/write pattern above.
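Since fast/slow pointers are mostly a linked-list tool, here is a hedged sketch on a singly linked list. The Node class and build_list helper are hypothetical scaffolding for this example only; middle_node and has_cycle show the two uses named above.

```python
class Node:
    """Minimal singly linked list node (hypothetical helper for this sketch)."""

    def __init__(self, value, next=None):
        self.value = value
        self.next = next


def build_list(values):
    """Build a linked list from an iterable and return its head (or None)."""
    head = None
    for value in reversed(list(values)):
        head = Node(value, head)
    return head


def middle_node(head):
    """Return the middle node; for even length, the second of the two middles."""
    slow = fast = head
    while fast is not None and fast.next is not None:
        slow = slow.next          # one step
        fast = fast.next.next     # two steps
    return slow


def has_cycle(head):
    """Cycle check: fast can only meet slow again if the list loops back."""
    slow = fast = head
    while fast is not None and fast.next is not None:
        slow = slow.next
        fast = fast.next.next
        if slow is fast:
            return True
    return False


odd_middle = middle_node(build_list([1, 2, 3, 4, 5])).value
even_middle = middle_node(build_list([1, 2, 3, 4])).value
```

Both functions make a single pass and keep only two references, so they run in O(n) time and O(1) extra space.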
Step-by-Step: Two Sum (Sorted)
- Set left = 0, right = len(arr) - 1.
- While left < right: compute s = arr[left] + arr[right].
- If s == target, return (left, right).
- If s < target, do left += 1 (we need a larger sum; increasing the left element increases the sum because the array is sorted).
- If s > target, do right -= 1.
- If the loop exits, no pair found; return a sentinel.
Evolution: Brute Force → Two Pointers
Two sum (sorted): Brute force: two nested loops, check every pair—O(n²). Two pointers: one pass from both ends—O(n). The key is using sorted order: if the current sum is too small, the only way to get a larger sum is to move the left pointer right; if too large, move the right pointer left.
Whenever you have a sorted array and need to find a pair (or triple) satisfying a condition, consider converging pointers (or one pointer + binary search). Same-direction pointers are for in-place scans (one pass, O(1) space). Don’t fall back to nested loops if a single pass with two indices is enough.
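The "one pointer + binary search" alternative mentioned above can be sketched as follows. This is an illustrative version, assuming an ascending sorted list; the function name two_sum_sorted_bsearch is made up for this example. It runs in O(n log n), so converging pointers remain the better choice when they apply.

```python
from bisect import bisect_left


def two_sum_sorted_bsearch(arr, target):
    """Return (i, j) with i < j and arr[i] + arr[j] == target, or None.

    Assumes arr is sorted in ascending order.
    """
    n = len(arr)
    for i in range(n - 1):
        need = target - arr[i]
        # Search only to the right of i so the two indices stay distinct.
        j = bisect_left(arr, need, i + 1, n)
        if j < n and arr[j] == need:
            return (i, j)
    return None
```

For the sorted array [1, 2, 3, 4, 5] and target 7 this returns the same pair as the converging version, (1, 4).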
Time and Space Complexity
- Converging (two sum, palindrome): Each iteration does O(1) work and one of the pointers moves; total iterations O(n). Time O(n), space O(1).
- Same direction (remove duplicates, move zeros): Single loop, each element read once, O(1) writes. Time O(n), space O(1).
Edge Cases
- Empty or single element: For converging, left < right is false when n ≤ 1; handle (e.g. return False or empty result). For read/write, handle n==0 so you don’t use write-1 when write is 0.
- No valid pair: Return a clear value (e.g. [-1,-1], False, or empty list).
- Multiple valid pairs: Clarify whether you need one or all (e.g. two sum usually returns one pair; 3Sum returns all unique triplets).
- Duplicates: In “remove duplicates,” sorted array is assumed. In two sum, if you need distinct indices, (left, right) are always distinct when left < right.
Common Mistakes
- Using two pointers on an unsorted array for “pair sum”: Converging logic assumes order; sort first or use a hash map.
- Wrong loop condition: Use left < right for converging (not left <= right unless you need to handle the same index twice). For “find pair,” left and right must be distinct.
- Moving the wrong pointer: In an ascending sorted array, a smaller index holds a smaller (or equal) value. So to increase the sum, advance left (left++); to decrease it, retreat right (right--). Reverse if the array is sorted descending.
- Read/write off-by-one: In remove duplicates, the first element is always kept; write starts at 1 and we compare with arr[write-1].
Using left <= right and then using left and right as a pair: when left == right, you’re using the same index twice. For “two distinct indices,” keep left < right. For “palindrome,” left == right is the middle element and doesn’t need to be compared with itself, so left < right is correct.
Pattern Recognition
- Sorted array + pair/triplet sum or “find two”: Think converging pointers (or one pointer + binary search for the second).
- In-place remove/compact (duplicates, zeros, “remove value”): Think same-direction read/write pointer.
- Palindrome, “valid from both ends”: Converging pointers from both ends.
- Linked list middle or cycle: Fast/slow pointers (Topic 8).
When the problem involves “find two indices” or “in-place removal” or “palindrome,” say: “I can use two pointers. If the array is sorted, I’ll start from both ends and move based on the condition. If I need to remove elements in place, I’ll use a read and a write pointer.” State the invariant (e.g. “elements in [0, write) are the valid ones”) and the time/space (O(n), O(1)).
Practice Problems
- Two Sum II (sorted): Return indices (1-based) of two numbers that add to target; converging pointers.
- Remove duplicates from sorted array: In place, return new length; same-direction write pointer.
- Move zeros: In place; read/write with swap.
- Valid palindrome: Ignore non-alphanumeric, case-insensitive; converging pointers.
- Container with most water: Heights array; converging pointers (move the shorter line inward).
Summary
- Two pointers = two indices moving in one pass; often O(n) time, O(1) space.
- Converging: left at 0, right at n−1; move toward each other. Use for sorted array pair sum, palindrome, “two from ends.”
- Same direction: read and write (or fast/slow). Use for in-place remove duplicates, move zeros, compaction.
- Loop condition for “distinct pair”: left < right. Move the pointer that will fix the condition (e.g. sum too small → left++, sorted).
- Recognize “sorted + pair” and “in-place remove” as two-pointer problems to avoid O(n²) or extra space.
5.5 Sliding Window
Introduction
The sliding window technique solves problems on contiguous subarrays (or substrings) by maintaining a “window” [left, right] and moving it in one pass. Instead of checking every possible subarray (O(n²)), you expand or shrink the window based on the problem condition, so each element is added to the window once and removed at most once, for O(n) time overall. There are two main types: fixed-size window (e.g. max sum of k consecutive elements) and variable-size window (e.g. smallest subarray with sum ≥ target, or longest substring with at most k distinct characters). This section covers both patterns, when to use which, and how to keep window state (e.g. sum or frequency map) updated efficiently.
Real-World Analogy
Imagine a train car with a fixed number of seats (fixed window): as the train moves, one person gets off at the front and one gets on at the back—you always see the same number of people, but the group “slides.” For a variable window, imagine a rope you pull from both ends: you lengthen it until a condition is met (e.g. “contains at least 3 red beads”), then shorten from the left until the condition fails, then lengthen again. You never go backward on the right pointer; you only adjust left. In both cases, you avoid re-scanning the whole array by reusing what you already know about the current window.
Array [2, 1, 5, 1, 3, 2], k = 3. Fixed window: first window sum = 2+1+5 = 8; slide right: drop 2, add 1 → 1+5+1 = 7; drop 1, add 3 → 5+1+3 = 9; drop 5, add 2 → 1+3+2 = 6. Max sum = 9. One pass, O(n), instead of recomputing each window from scratch.
Formal Definition
Sliding window: Maintain a contiguous segment [left, right] (inclusive or [left, right) as convenient). Fixed-size: Window length is k. Advance right and left together (or right first until window size k, then slide both). Variable-size: Expand (right++) when the current window doesn’t satisfy the condition; shrink (left++) when it does (or the opposite, depending on the problem). Keep a running state (sum, count, frequency map) that you update in O(1) when adding/removing one element. Total time O(n); space O(1) or O(k) for a frequency map.
The key invariant: each element enters the window once and leaves at most once (when left advances). So the number of “add to window” and “remove from window” operations is O(n), and if each update is O(1), the whole algorithm is O(n).
Why This Topic Matters
- Interviews: Max sum subarray of size k, smallest subarray with sum ≥ target, longest substring with at most K distinct characters, minimum window substring—all classic sliding window.
- Efficiency: Turns “check every subarray” O(n²) (or O(n·k) for fixed k) into O(n) by reusing window state.
- Pattern: “Contiguous subarray/substring” + “max/min length” or “satisfy condition” often suggests sliding window.
Mental Model
Fixed-size: The window is a “frame” of k elements. Slide one step: subtract the element that just left (left), add the new element (right). Keep a running sum (or other aggregate) and update it in O(1) per slide. Variable-size: Right expands the window; when the condition is met (or violated, depending on the problem), you may record a candidate answer and then shrink from the left until the condition is no longer met, then expand again. The goal is usually “smallest window that satisfies” or “largest window that satisfies”; the expansion/shrink logic depends on that.
Fixed-Size Window
Problem: Maximum Sum of K Consecutive Elements
Given an array and integer k, find the maximum sum of any contiguous subarray of length k.
Algorithm
- Compute the sum of the first k elements (window [0, k−1]). This is the first candidate.
- For right from k to n−1: the new window drops arr[left] and adds arr[right], where left = right − k. So new_sum = current_sum − arr[left] + arr[right]. Update current_sum and track the maximum.
- Return the maximum sum seen.
def max_sum_k(arr, k):
    if not arr or k <= 0 or k > len(arr):
        return 0
    window_sum = sum(arr[:k])
    best = window_sum
    for right in range(k, len(arr)):
        left = right - k
        window_sum = window_sum - arr[left] + arr[right]
        best = max(best, window_sum)
    return best
Line-by-Line Notes
- window_sum is the sum of the current window. When we slide, we subtract the element that exits (arr[left]) and add the element that enters (arr[right]).
- Loop runs (n − k) iterations; each iteration is O(1). Total O(n).
ASCII Diagram: Fixed Window
arr: [ 2, 1, 5, 1, 3, 2 ] k = 3
Step 0: [ 2, 1, 5 ] sum = 8
Step 1: [ 1, 5, 1 ] sum = 7 (drop 2, add 1)
Step 2: [ 5, 1, 3 ] sum = 9 (drop 1, add 3) ← max
Step 3: [ 1, 3, 2 ] sum = 6
Result: max = 9
Variable-Size Window
Problem: Smallest Subarray with Sum ≥ Target
Given an array of positive integers and a target, find the length of the smallest contiguous subarray whose sum is ≥ target. If none, return 0.
Idea: Expand the window (right++) until the window sum ≥ target. Then we have a candidate length (right − left + 1). Shrink from the left (left++) until the sum is < target again; each time before shrinking, update the minimum length. Then expand again. Each element is added once and removed once—O(n).
def min_subarray_sum(arr, target):
    if not arr or target <= 0:
        return 0
    left = 0
    window_sum = 0
    min_len = float('inf')
    for right in range(len(arr)):
        window_sum += arr[right]
        while window_sum >= target:
            min_len = min(min_len, right - left + 1)
            window_sum -= arr[left]
            left += 1
    return min_len if min_len != float('inf') else 0
While the sum is ≥ target, we shrink from the left and update the minimum length. Time O(n): right and left each advance at most n times. Space O(1).
Problem: Longest Substring with At Most K Distinct Characters
Given a string (or array of characters) and k, find the length of the longest substring with at most k distinct characters. Idea: Expand (right++) and add the new character to a frequency map. While the number of distinct characters exceeds k, shrink from the left (decrement the count of s[left], delete its entry when the count reaches 0, then left++). Once the shrink loop ends, the window has ≤ k distinct characters, so record its size as a candidate maximum. Time O(n), space O(k) for the map.
def longest_k_distinct(s, k):
    if k <= 0 or not s:
        return 0
    left = 0
    freq = {}
    max_len = 0
    for right in range(len(s)):
        c = s[right]
        freq[c] = freq.get(c, 0) + 1
        while len(freq) > k:
            c_left = s[left]
            freq[c_left] -= 1
            if freq[c_left] == 0:
                del freq[c_left]
            left += 1
        max_len = max(max_len, right - left + 1)
    return max_len
Step-by-Step: Variable Window (Min Subarray Sum)
- left = 0, window_sum = 0, min_len = ∞.
- For right from 0 to n−1: add arr[right] to window_sum.
- While window_sum ≥ target: update min_len = min(min_len, right − left + 1); subtract arr[left] from window_sum; left++.
- After the loop, return min_len (or 0 if no valid window).
The “while” shrink step ensures that when we leave the inner loop, the window [left, right] has sum < target. So the next expansion (right++) is the only way to get back to ≥ target. No need to re-scan from the beginning.
Evolution: Brute Force → Sliding Window
Max sum of k consecutive: Brute force: for each starting index i, sum arr[i..i+k−1] — O(n·k). Sliding window: one pass, add one and remove one per step — O(n). Smallest subarray sum ≥ target: Brute force: for each (i, j), sum arr[i..j] and compare — O(n²). Sliding window: expand and shrink, each element processed O(1) times — O(n).
| Problem type | Brute force | Sliding window |
|---|---|---|
| Fixed size k (max sum) | O(n·k) | O(n) |
| Variable (min subarray sum) | O(n²) | O(n) |
Whenever you need a contiguous subarray (or substring) that maximizes or minimizes a measure (sum, length, count) or satisfies a condition (sum ≥ T, at most k distinct), consider sliding window. Fixed k → fixed-size window with running aggregate. Variable size → expand until condition met, then shrink from the left; keep the invariant that the window state is updated in O(1) when adding/removing one element.
Time and Space Complexity
- Fixed-size (max sum k): One pass over the array; each element enters and leaves the window once. Time O(n), space O(1).
- Variable-size (min subarray sum): Left and right each advance at most n times; inner while runs at most n times total. Time O(n), space O(1).
- Variable-size with frequency map (k distinct): Time O(n), space O(k) for the map.
Edge Cases
- Empty array or k > n (fixed window): Return 0 or handle (e.g. no valid window).
- k = 0 or negative: No valid window; return 0 or appropriate value.
- Target unreachable (min subarray): If total sum < target, return 0 (or report no solution).
- All elements same (k distinct): One distinct character; window can be the whole array if k ≥ 1.
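- Negative numbers: The “shrink while sum ≥ target” logic for the smallest-subarray problem assumes positive elements, so that expanding never decreases the sum and shrinking never increases it. With negative values that monotonicity breaks and the window can miss valid answers; use prefix sums (typically with a monotonic deque) instead. “Max subarray sum” is a different problem solved by Kadane’s algorithm (Topic 5.8).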
Common Mistakes
- Recomputing the window from scratch each time: That gives O(n·k) or O(n²). Always update the running state (sum, frequency) in O(1) when sliding.
- Shrinking too much (variable window): Shrink only while the condition is (still) satisfied (or violated, depending on the problem). For “smallest subarray with sum ≥ target,” shrink while sum ≥ target; for “longest with ≤ k distinct,” shrink while distinct > k.
- Using “if” instead of “while” when shrinking: After expanding, you may need to shrink multiple steps (e.g. remove several elements from the left) to restore the invariant. Use while for the shrink loop.
- Off-by-one in length: Length of [left, right] inclusive is right - left + 1. Check your problem’s definition (0-based indices vs 1-based length).
In variable-size window, moving left past right: ensure after shrinking, left ≤ right + 1. Usually left is increased until the condition fails, so the next iteration expands right again. Don’t reset left to 0 on each iteration unless the problem requires it (almost never in standard sliding window).
Pattern Recognition
- “Contiguous subarray of size k” / “every consecutive k”: Fixed-size window.
- “Smallest/largest subarray such that sum ≥ / ≤ target”: Variable-size window; expand then shrink (or the reverse).
- “Longest substring with at most K distinct”: Variable window + frequency map.
- “Minimum window substring” (containing all chars of T): Variable window + frequency map for T and current window.
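The last pattern in the list above, minimum window substring, can be sketched as follows. This is one possible variant, not the only one: instead of two separate frequency maps it keeps a single Counter of remaining needs plus a missing count. The name min_window and the choice to return an empty string when no window exists are assumptions of this sketch.

```python
from collections import Counter


def min_window(s, t):
    """Smallest substring of s containing every character of t (with multiplicity).

    Returns "" if no such window exists.
    """
    if not s or not t:
        return ""
    need = Counter(t)        # how many of each character the window still needs
    missing = len(t)         # total number of required characters not yet covered
    best_start, best_len = 0, float("inf")
    left = 0
    for right, c in enumerate(s):
        if need[c] > 0:
            missing -= 1
        need[c] -= 1         # a negative value means the window has a surplus of c
        while missing == 0:  # window is valid: record it, then shrink from the left
            if right - left + 1 < best_len:
                best_start, best_len = left, right - left + 1
            need[s[left]] += 1
            if need[s[left]] > 0:
                missing += 1
            left += 1
    if best_len == float("inf"):
        return ""
    return s[best_start:best_start + best_len]
```

The shrink step uses while, not if, for the same reason as the other variable-size windows: several characters may need to leave before the window stops being valid.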
Say: “This is a contiguous subarray problem, so I’ll use a sliding window. If the size is fixed (k), I’ll maintain a running sum and slide by subtracting the left element and adding the right. If the size is variable, I’ll expand with right until the condition is met, then shrink from the left with a while loop and update the answer.” State the invariant (“window sum is the sum of [left, right]”) and time O(n), space O(1) or O(k).
Practice Problems
- Max sum of k consecutive elements: Fixed window; running sum.
- Smallest subarray with sum ≥ target: Variable window; expand, then shrink while sum ≥ target.
- Longest substring with at most K distinct characters: Variable window + freq map; shrink while distinct > k.
- Maximum average subarray of length k: Same as max sum of k (fixed window); average = sum/k.
- Minimum window substring: Smallest substring of s containing all characters of t; variable window + two frequency maps.
Summary
- Sliding window = contiguous subarray [left, right] with state updated in O(1) when adding/removing one element; total time O(n).
- Fixed-size: Window length k; slide by subtracting arr[left] and adding arr[right]; one pass O(n).
- Variable-size: Expand (right++) until condition met; shrink (left++) with a while until condition fails; track min/max length (or other measure).
- Use while (not if) when shrinking so the invariant is restored after multiple removals.
- Recognize “contiguous subarray” + “max/min sum or length” or “at most k distinct” as sliding window to get O(n) instead of O(n²) or O(n·k).
5.6 Prefix Sum
Introduction
A prefix sum (or cumulative sum) array stores, at each index i, the sum of all elements from the start of the array up to and including i. Once built in O(n), any range sum query—"what is the sum of elements from index left to right?"—can be answered in O(1) using the identity: sum(arr[left..right]) = prefix[right] − prefix[left−1] (with a convention for left=0). This turns repeated range-sum queries from O(n) per query to O(1), so q queries take O(n + q) instead of O(n·q). Prefix sum is the basis for many problems: subarray sum, equilibrium index, and 2D range sums (matrices). This section covers building the prefix array, the range-sum formula, and common uses.
Real-World Analogy
Imagine a road with mile markers. If you record the cumulative distance from the start at each marker (0, 5, 12, 18, …), then the distance between marker 2 and marker 4 is (distance at 4) − (distance at 2). You don't re-measure the segment; you subtract two stored numbers. The prefix array is that list of cumulative distances: prefix[i] = "total from start up to i." Any segment [left, right] is prefix[right] − prefix[left−1].
arr = [1, 2, 3, 4, 5]. Prefix: prefix = [1, 3, 6, 10, 15]. Sum of arr[2..4] = 3+4+5 = 12. Using prefix: prefix[4] − prefix[1] = 15 − 3 = 12. One subtraction instead of looping.
Formal Definition
Prefix sum array: For array arr of length n, define prefix[i] = arr[0] + arr[1] + … + arr[i] for 0 ≤ i < n. Convention: prefix[-1] = 0 (sum of zero elements). Then range sum from index left to right (inclusive): sum(arr[left..right]) = prefix[right] − prefix[left−1]. With prefix[-1]=0, this holds for left=0 too. Build: O(n); per query: O(1).
We can use a 1-indexed prefix array so that prefix[i] = sum of first i elements; then sum of elements from index a to b (1-based) is prefix[b] − prefix[a−1]. The same idea applies in 2D: prefix[r][c] = sum of the rectangle from (0,0) to (r,c), and a subrectangle sum becomes four prefix lookups.
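A rough sketch of the 2D case follows. It assumes the padded convention (an extra zero row and column, so P[r][c] is the sum of the first r rows and first c columns) rather than the inclusive definition given above; the function names build_prefix_2d and block_sum are illustrative only.

```python
def build_prefix_2d(grid):
    """Return P with P[r][c] = sum of grid[0..r-1][0..c-1]; row 0 and column 0 are zeros."""
    rows = len(grid)
    cols = len(grid[0]) if rows else 0
    P = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        for c in range(cols):
            # Add the cell, the block above, the block to the left,
            # and subtract the overlap that was counted twice.
            P[r + 1][c + 1] = grid[r][c] + P[r][c + 1] + P[r + 1][c] - P[r][c]
    return P


def block_sum(P, r1, c1, r2, c2):
    """Sum of grid[r1..r2][c1..c2], inclusive 0-based corners, using four lookups."""
    return P[r2 + 1][c2 + 1] - P[r1][c2 + 1] - P[r2 + 1][c1] + P[r1][c1]
```

Building P takes O(rows·cols); each block query is O(1). The padding plays the same role as prefix[0] = 0 in the 1D case: no special handling for blocks that touch the top or left edge.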
Why This Topic Matters
- Range queries: Many problems ask for "sum of subarray [L, R]" repeatedly. Naive: O(n) per query. With prefix sum: O(n) preprocess, O(1) per query.
- Interviews: Subarray sum equals K (with hash map + prefix), equilibrium index, 2D range sum (matrix block sum).
- Building block: Difference array (Topic 5.7) and many segment-tree problems can be understood via prefix thinking.
Mental Model
prefix[i] = "sum of everything from the start up to i." So the sum from left to right is "sum up to right" minus "sum up to (left−1)." Picture a number line: prefix marks cumulative totals; the segment [left, right] is the gap between two marks.
Building the Prefix Array
def build_prefix(arr):
    n = len(arr)
    prefix = [0] * (n + 1)  # prefix[0] = 0, prefix[i] = sum(arr[0..i-1])
    for i in range(n):
        prefix[i + 1] = prefix[i] + arr[i]
    return prefix
Here prefix[i] = sum of arr[0..i−1] (first i elements). So sum(arr[left..right]) = prefix[right+1] − prefix[left]. Alternatively, use prefix[i] = sum(arr[0..i]) and prefix[-1]=0; then sum(arr[left..right]) = prefix[right] − prefix[left−1] (treat prefix[-1]=0 in code as a special case or use a length-(n+1) array with prefix[0]=0).
Convention: Length-(n+1) with prefix[0] = 0
Let prefix[0] = 0 and prefix[i] = arr[0] + … + arr[i−1] for 1 ≤ i ≤ n. Then:
- Sum of arr[left..right] (0-based, inclusive) = prefix[right+1] − prefix[left].
- No special case for left=0: prefix[0]=0 gives prefix[right+1] − 0 = sum of first (right+1) elements.
Range Sum Query
def range_sum(prefix, left, right):
    # prefix has length n+1, prefix[i] = sum(arr[0..i-1])
    return prefix[right + 1] - prefix[left]
O(1) per query. Left and right are 0-based inclusive indices.
ASCII Diagram
arr: [ 1, 2, 3, 4, 5 ]
index: 0 1 2 3 4
prefix: [ 0, 1, 3, 6, 10, 15 ]
index: 0 1 2 3 4 5 (prefix[i] = sum arr[0..i-1])
sum(arr[2..4]) = arr[2]+arr[3]+arr[4] = 3+4+5 = 12
= prefix[5] - prefix[2] = 15 - 3 = 12
Python Implementation (In-Place or New Array)
# Build prefix (new array, length n+1)
prefix = [0]
for x in arr:
    prefix.append(prefix[-1] + x)

# Range sum [left, right] inclusive
def query(left, right):
    return prefix[right + 1] - prefix[left]

# Example: subarray sum equals K (count)
# For each right, count how many left with prefix[left] = prefix[right+1] - K
# Use a dict: for each prefix value, how many indices seen so far
from collections import defaultdict

def subarray_sum_count(arr, K):
    prefix = 0
    seen = defaultdict(int)
    seen[0] = 1
    count = 0
    for x in arr:
        prefix += x
        count += seen[prefix - K]
        seen[prefix] += 1
    return count
Line-by-Line: Subarray Sum Equals K
We want count of (left, right) such that sum(arr[left..right]) = K. That is prefix[right+1] − prefix[left] = K, i.e. prefix[left] = prefix[right+1] − K. As we iterate right, we have prefix = prefix[right+1]. So for each right, add to count the number of left < right+1 with prefix[left] = prefix − K. Maintain a frequency map of prefix values seen so far; before adding current prefix to the map, add seen[prefix − K] to count.
Time and Space Complexity
- Build prefix: One pass, O(n) time, O(n) space for the prefix array (or O(1) extra if you overwrite arr in place and no longer need the original values).
- Range sum query: O(1) per query.
- q queries: O(n + q) with prefix sum vs O(n·q) naive.
Edge Cases
- Empty array: prefix = [0]; any range query with left > right can return 0.
- Single element: prefix = [0, arr[0]]; sum(arr[0..0]) = prefix[1] − prefix[0] = arr[0].
- left = right: Sum of one element; formula still works.
- left > right: Define as 0 or handle as invalid.
Common Mistakes
- Off-by-one: With prefix[0]=0 and prefix[i]=sum(arr[0..i−1]), sum(arr[left..right]) = prefix[right+1] − prefix[left]. Using prefix[right] − prefix[left] gives sum(arr[left..right−1]).
- Wrong convention: If prefix[i] = sum(arr[0..i]), then sum(arr[left..right]) = prefix[right] − (prefix[left−1] if left>0 else 0). Stick to one convention (e.g. length n+1 with prefix[0]=0) and use it consistently.
- Index bounds: For prefix of length n+1, valid indices are 0..n. Query (left, right) must have 0 ≤ left ≤ right < n.
Using prefix[right] − prefix[left] for inclusive range [left, right]. That equals the sum of arr[left..right−1]. For inclusive right you need prefix[right+1] − prefix[left] (with the standard length-(n+1) prefix where prefix[i] = sum of first i elements).
Pattern Recognition
- "Sum of subarray [L,R]" or "range sum" repeatedly: Build prefix once, then O(1) per query.
- "Number of subarrays with sum K": Prefix + hash map (store count of prefix values; for each right, add count of prefix value = current_prefix − K).
- "Equilibrium index" (left sum = right sum): Total sum = S; at index i, left sum = prefix[i], right sum = S − prefix[i] − arr[i]; solve for i.
When you see "range sum" or "subarray sum," say: "I can precompute a prefix sum array in O(n). Then each range sum is O(1) as prefix[right+1] − prefix[left]." For "count subarrays with sum K," say: "I'll use prefix and a hash map: for each right, I need the count of left with prefix[left] = current_prefix − K."
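The equilibrium index pattern above can be sketched with a running left sum in place of a stored prefix array. This is an illustrative version; the name equilibrium_index and the choice to return -1 when no index qualifies are assumptions of this example.

```python
def equilibrium_index(arr):
    """Return the first index i where sum(arr[:i]) == sum(arr[i+1:]), or -1 if none."""
    total = sum(arr)
    left_sum = 0  # plays the role of prefix[i] = sum(arr[0..i-1])
    for i, x in enumerate(arr):
        right_sum = total - left_sum - x
        if left_sum == right_sum:
            return i
        left_sum += x
    return -1
```

For [1, 3, 5, 2, 2] the left sum at index 2 is 1 + 3 = 4 and the right sum is 2 + 2 = 4, so the function returns 2. Time O(n), space O(1).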
Practice Problems
- Range sum query (many queries): Build prefix; answer each [L,R] in O(1).
- Subarray sum equals K (count): Prefix + frequency map.
- Equilibrium index: Index where sum of elements on left = sum on right; use total sum and prefix.
- 2D range sum (matrix): prefix[r][c] = sum of rectangle (0,0) to (r,c); block sum using four prefix values.
Summary
- Prefix sum lets you answer range sum queries in O(1) after O(n) build. prefix[i] = sum of arr[0..i−1] (with prefix[0]=0).
- Range sum [left, right] inclusive = prefix[right+1] − prefix[left].
- Use length-(n+1) array and prefix[0]=0 to avoid special cases.
- "Subarray sum equals K" count: iterate with current prefix, add seen[current_prefix − K], then update seen[current_prefix].
5.7 Difference Array
Introduction
A difference array (or difference table) lets you apply many range-update operations—“add value v to every element in [left, right]”—in O(1) time per update. After all updates, you recover the final array by taking the prefix sum of the difference array—one O(n) pass. So q range updates take O(q + n) instead of O(q·n). It is the “inverse” idea of prefix sum: prefix sum answers range queries (sum over [L,R]); the difference array handles range updates (add to [L,R]). This section covers how to represent range updates as two point updates, how to build and apply the difference array, and when to use it.
Real-World Analogy
Imagine a long fence where you paint segments. Instead of walking the whole segment each time to add paint, you mark “+1 bucket” at the start of the segment and “−1 bucket” at the end. Later, you walk once from left to right, carrying a running total of “buckets so far”—that total is how much paint is on the fence at each point. The difference array is those +1 and −1 marks; the prefix sum of that array is the final “amount of paint” (or the final array after all range adds).
Start with array [0, 0, 0, 0]. Updates: add 5 to [1, 2], add 3 to [0, 1]. The first update marks +5 at index 1 and −5 at index 3; the second marks +3 at index 0 and −3 at index 2. So diff[0]=3, diff[1]=5, diff[2]=−3, diff[3]=−5 (diff[4]=0 is the boundary slot). Prefix sum of diff: [3, 8, 5, 0]. So final array = [3, 8, 5, 0].
Formal Definition
Difference array diff: For an array arr, define diff[0] = arr[0] and diff[i] = arr[i] − arr[i−1] for i ≥ 1. Then arr is the prefix sum of diff. Equivalently: to add v to every element in [left, right], do diff[left] += v and diff[right+1] −= v (if right+1 is in bounds). After all such updates, arr[i] = diff[0] + diff[1] + … + diff[i] = prefix sum of diff. Each range update is O(1); recovering the array is O(n).
We use a length-(n+1) diff array so that diff[right+1] is valid when right = n−1. Initialize diff with zeros (or from an initial arr). Apply each range update with two point updates; then prefix sum gives the final array.
Why This Topic Matters
- Range updates: Problems like “add v to [L,R] for many (L,R,v), then output the final array” are O(q·n) naive. With a difference array: O(q) for updates + O(n) to recover = O(q + n).
- Interviews: Range add queries, “car pooling,” “meeting rooms” style “add in range,” or recover array after many range updates.
- Duality with prefix sum: Prefix sum = range query (sum); difference array = range update (add). Taking prefix sum of diff recovers the array.
Mental Model
Think of diff as “how much does this index change from the previous one?” Adding v to [left, right] means: at left, the array “steps up” by v (diff[left] += v); at right+1, it “steps down” by v (diff[right+1] −= v). The prefix sum of diff accumulates these steps into the actual values.
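A small round-trip check of this mental model is sketched below. It assumes a length-n diff with no padding slot, and the helper names to_diff and from_diff are made up for this example: taking differences and then prefix sums should give back the original array.

```python
from itertools import accumulate


def to_diff(arr):
    """diff[0] = arr[0]; diff[i] = arr[i] - arr[i-1] for i >= 1."""
    return [arr[i] - (arr[i - 1] if i > 0 else 0) for i in range(len(arr))]


def from_diff(diff):
    """The prefix sum of diff recovers the original values."""
    return list(accumulate(diff))


arr = [2, 7, 7, 4]
diff = to_diff(arr)         # [2, 5, 0, -3]: the step up or down at each index
restored = from_diff(diff)  # [2, 7, 7, 4]
```

Each entry of diff is the size of the step at that index, which is why a range add only needs to change the step at left and the step just past right.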
Building the Difference Array from an Initial Array
If you start with an array arr: diff[0] = arr[0], and for i from 1 to n−1, diff[i] = arr[i] − arr[i−1]. Then arr is the prefix sum of diff. (We can use length n and define prefix sum accordingly, or use length n+1 with diff[n]=0.)
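def build_diff(arr):
    n = len(arr)
    diff = [0] * (n + 1)  # diff[n] is padding so diff[right + 1] is always valid
    if n == 0:
        return diff       # empty input: diff = [0]
    diff[0] = arr[0]
    for i in range(1, n):
        diff[i] = arr[i] - arr[i - 1]
    return diff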
Applying a Range Update: Add v to [left, right]
Update: diff[left] += v and diff[right + 1] -= v. When right = n−1, the second update lands on diff[n], a padding slot that is never read when recovering indices 0..n−1, so the prefix sums for the real elements stay correct.
def range_add(diff, left, right, v):
    diff[left] += v
    if right + 1 < len(diff):
        diff[right + 1] -= v
O(1) per update.
Recovering the Array (Prefix Sum of diff)
def recover_array(diff):
    arr = []
    s = 0
    for i in range(len(diff) - 1):  # diff has length n+1; the last slot is padding
        s += diff[i]
        arr.append(s)
    return arr
O(n). The final arr[i] is the prefix sum of diff up to i.
ASCII Diagram
Start: arr = [0, 0, 0, 0], diff = [0, 0, 0, 0, 0]
Update: add 5 to [1, 2]
  diff[1] += 5, diff[3] -= 5
  diff: [0, 5, 0, -5, 0]
  Prefix sum of diff (first 4): [0, 5, 5, 0] ✓ (indices 1, 2 got +5)
Update: add 3 to [0, 1]
  diff[0] += 3, diff[2] -= 3
  diff: [3, 5, -3, -5, 0]
  Prefix sum of diff (first 4): 3, 3+5=8, 8+(-3)=5, 5+(-5)=0
Final array: [3, 8, 5, 0]
Full Example in Code
# Start with zeros; apply range adds; recover array
n = 5
diff = [0] * (n + 1)

def add(left, right, v):
    diff[left] += v
    if right + 1 <= n:
        diff[right + 1] -= v

add(1, 3, 10)  # add 10 to indices 1,2,3
add(0, 2, 5)   # add 5 to indices 0,1,2

# Recover
arr = []
s = 0
for i in range(n):
    s += diff[i]
    arr.append(s)
# arr = [5, 15, 15, 10, 0]
Time and Space Complexity
- Build diff from arr: O(n).
- One range update (add v to [L,R]): O(1)—two index updates.
- Recover array: O(n)—one prefix-sum pass.
- q range updates + recover: O(q + n). Naive would be O(q·n).
- Space: O(n) for the diff array.
Edge Cases
- right = n−1: diff[right+1] is diff[n]; ensure diff has length n+1 so this is valid.
- left > right: Treat as no-op or skip.
- Empty array: diff = [0]; recover gives [].
Common Mistakes
- Forgetting diff[right+1] −= v: Without it, the “step up” at left is never canceled, so every index from left to the end gets +v.
- Using right instead of right+1: The “step down” must happen at the first index that should not get the update, i.e. right+1.
- Diff array too short: Use length n+1 so that diff[right+1] is valid for right = n−1.
Doing only diff[left] += v and forgetting diff[right+1] −= v. Then the prefix sum from left onward is increased by v forever; the range update becomes “add v to [left, end].” Always add both point updates for a bounded range [left, right].
Pattern Recognition
- “Add v to all elements in [L,R]” repeated many times, then output the array: Difference array.
- “Apply many range updates (add/subtract), then query” or “get final state”: Diff array + prefix sum to recover.
- 2D variant: Similar idea with a 2D diff matrix and 2D prefix sum to recover (four corners for each rectangle update).
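The 2D variant in the last bullet can be sketched as follows, assuming a grid that starts at all zeros and updates given as inclusive 0-based corners. The function name apply_rect_updates and the tuple layout (r1, c1, r2, c2, v) are assumptions of this example.

```python
def apply_rect_updates(rows, cols, updates):
    """Apply 'add v to rectangle (r1, c1)..(r2, c2)' updates to a zero grid.

    updates is an iterable of (r1, c1, r2, c2, v) with inclusive 0-based corners.
    Returns the final rows x cols grid.
    """
    # One extra row and column so the marks at r2 + 1 and c2 + 1 are always in bounds.
    diff = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r1, c1, r2, c2, v in updates:
        diff[r1][c1] += v
        diff[r1][c2 + 1] -= v
        diff[r2 + 1][c1] -= v
        diff[r2 + 1][c2 + 1] += v
    # In-place 2D prefix sum over the real cells, in row-major order.
    for r in range(rows):
        for c in range(cols):
            if r > 0:
                diff[r][c] += diff[r - 1][c]
            if c > 0:
                diff[r][c] += diff[r][c - 1]
            if r > 0 and c > 0:
                diff[r][c] -= diff[r - 1][c - 1]
    return [row[:cols] for row in diff[:rows]]
```

Each update costs O(1) (four corner marks) and the recovery pass costs O(rows·cols), mirroring the O(q + n) bound of the 1D case.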
When the problem has “add value to range [L,R]” many times and then asks for the final array (or a single query), say: “I’ll use a difference array. Each range add is two point updates: diff[L] += v and diff[R+1] -= v. Then I recover the array with one prefix-sum pass. Total O(q + n).”
Practice Problems
- Range add: Given q queries (L, R, v), add v to arr[L..R] for each; output final array. Use diff.
- Car pooling / booking: Trips (num_passengers, start, end); at each point, current load = prefix sum of diff (add at start, subtract at end).
- 2D range add: Add v to all cells in rectangle (r1,c1) to (r2,c2); use 2D difference array and 2D prefix sum to recover.
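For the car pooling problem in the list above, a minimal sketch might look like this. It assumes stops are non-negative integers with start < end, and that passengers occupy seats on the half-open interval [start, end), so a drop-off frees seats before a pick-up at the same stop. The name can_carpool and the capacity parameter are assumptions of this example.

```python
def can_carpool(trips, capacity):
    """Return True if the load never exceeds capacity.

    trips is a list of (num_passengers, start, end); passengers ride on [start, end).
    """
    if not trips:
        return True
    last_stop = max(end for _, _, end in trips)
    diff = [0] * (last_stop + 1)
    for passengers, start, end in trips:
        diff[start] += passengers  # board at start
        diff[end] -= passengers    # leave at end (half-open interval)
    load = 0
    for delta in diff:             # prefix sum of diff = current load at each stop
        load += delta
        if load > capacity:
            return False
    return True
```

Because the interval is half-open, the subtraction goes at end rather than end + 1; with an inclusive range [left, right] it would go at right + 1 as in the rest of this section.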
Summary
- Difference array supports range updates “add v to [left, right]” in O(1) each: diff[left] += v, diff[right+1] -= v.
- Recover the final array by taking the prefix sum of diff, in O(n).
- Use a length-(n+1) diff array so diff[right+1] is valid when right = n−1.
- q range updates + recover: O(q + n) vs naive O(q·n).
5.8 Kadane's Algorithm
Introduction
Kadane's algorithm finds the maximum sum of a contiguous subarray (maximum subarray sum) in O(n) time and O(1) space with a single pass. The idea: at each position, the maximum sum ending at that position is either “extend the best subarray ending at the previous position” or “start fresh with only this element.” By keeping a running “max sum ending here” and a global “best so far,” you never need to check every subarray—so you avoid O(n²). This section covers the standard formulation, handling all-negative arrays, and how to recover the indices of the best subarray.
Real-World Analogy
Imagine you’re tracking daily profit over a month. “Best contiguous period” is the stretch of days that would have made you the most money. At each day, you decide: “Do I extend my current streak (add today’s profit) or throw it away and start from today?” If the current streak goes negative, starting from today might be better. You only need to remember “best sum ending at yesterday” and “best sum seen so far”—no need to try every possible start and end day.
arr = [-2, 1, -3, 4, -1, 2, 1, -5, 4]. The maximum sum contiguous subarray is [4, -1, 2, 1] with sum 6. Kadane: at index 3 (value 4), we start fresh (previous ending sum was negative); we extend through indices 4,5,6; then -5 reduces the ending sum; 4 starts a new candidate. One pass, no nested loops.
Formal Definition
Maximum subarray sum (MSCS): Given array arr, find max{ sum(arr[i..j]) : 0 ≤ i ≤ j < n }. Kadane's algorithm: Define cur = maximum sum of a contiguous subarray ending at the current index. Then cur = max(arr[i], cur + arr[i]) (either start fresh at i or extend the previous best ending). Update best = max(best, cur). Initial: cur = arr[0], best = arr[0]. Time O(n), space O(1).
If the problem allows an “empty” subarray (sum 0), use cur = max(0, cur + arr[i]) and best = max(best, cur) with cur = 0, best = 0 initially. Then all-negative arrays yield 0. For “non-empty subarray,” use the formulation above so the answer is at least the maximum single element.
Why This Topic Matters
- Interviews: “Maximum subarray sum” is a classic; Kadane is the expected O(n) solution. Follow-ups: return the indices, handle circular array, or maximum product subarray (similar idea).
- Efficiency: Brute force is O(n²) (or O(n³) with naive sum). Kadane is O(n).
- Pattern: “Best contiguous segment” with an additive criterion often reduces to “max sum ending here” + “global max.”
Mental Model
At each index, you have a “current streak” (max sum of a subarray ending at the previous index). Adding the current element might improve it or make it worse. If the current streak becomes negative, it’s better to drop it and start a new streak with only the current element (otherwise any future positive segment would be dragged down). So: cur = max(arr[i], cur + arr[i]). The global best is the maximum cur you ever see.
Algorithm (Non-Empty Subarray)
- Initialize cur = arr[0], best = arr[0].
- For i from 1 to n−1: cur = max(arr[i], cur + arr[i]); best = max(best, cur).
- Return best.
Interpretation: cur is the maximum sum of a contiguous subarray that ends at index i. We either extend (cur + arr[i]) or start fresh (arr[i]).
Python Implementation
def max_subarray_sum(arr):
    if not arr:
        return 0  # or None, depending on problem
    cur = arr[0]
    best = arr[0]
    for i in range(1, len(arr)):
        cur = max(arr[i], cur + arr[i])
        best = max(best, cur)
    return best
Line-by-Line Explanation
- cur = max(arr[i], cur + arr[i]): Extend the previous best ending (cur + arr[i]) or start a new subarray with only arr[i]. We take the max, so we never keep a negative “ending” when arr[i] alone is better.
- best = max(best, cur): Track the best sum we’ve seen over all positions.
- Empty array: return 0 or raise; for non-empty we guarantee at least one element is considered.
Recovering the Indices (Start and End of Best Subarray)
While updating cur and best, record a new candidate start index whenever we start fresh at i (cur becomes arr[i]). Whenever best improves, copy the candidate range (start, i) into (best_start, best_end).
def max_subarray_indices(arr):
    if not arr:
        return 0, -1, -1
    cur = arr[0]
    best = arr[0]
    start = end = 0
    best_start = best_end = 0
    for i in range(1, len(arr)):
        if arr[i] > cur + arr[i]:
            cur = arr[i]
            start = i
        else:
            cur = cur + arr[i]
        end = i
        if cur > best:
            best = cur
            best_start, best_end = start, end
    return best, best_start, best_end
ASCII Diagram
arr:   [ -2,  1, -3,  4, -1,  2,  1, -5,  4 ]
index:    0   1   2   3   4   5   6   7   8
i=0: cur=-2, best=-2
i=1: cur=max(1, -2+1)=1, best=1
i=2: cur=max(-3, 1-3)=-2, best=1
i=3: cur=max(4, -2+4)=4, best=4 (start fresh)
i=4: cur=max(-1, 4-1)=3, best=4
i=5: cur=max(2, 3+2)=5, best=5
i=6: cur=max(1, 5+1)=6, best=6 ← max sum
i=7: cur=max(-5, 6-5)=1, best=6
i=8: cur=max(4, 1+4)=5, best=6
Return 6 (subarray [4,-1,2,1])
All-Negative and Empty-Subarray Variants
All-negative array
With the non-empty formulation, cur and best are always sums of actual non-empty subarrays, so on an all-negative array the answer is the maximum (least negative) element. Correct.
Allow empty subarray (sum = 0)
Use cur = max(0, cur + arr[i]) and best = max(best, cur), with cur = 0, best = 0. Then if every element is negative, we return 0 (empty subarray).
def max_subarray_sum_allow_empty(arr):
    cur = best = 0
    for x in arr:
        cur = max(0, cur + x)
        best = max(best, cur)
    return best
Evolution: Brute Force → Kadane
Brute force: For each pair (i, j), compute sum(arr[i..j])—O(n²) pairs, O(n) sum each = O(n³), or O(n²) with prefix sum. Kadane: One pass, O(n). The key is that we don’t need to try every start index; the recurrence “max sum ending at i” depends only on “max sum ending at i−1” and arr[i].
Whenever you need the “best contiguous segment” by sum (or a similar additive measure), ask: “Can I compute the best segment ending at each index from the best ending at the previous index?” If yes, that’s a linear recurrence and usually O(n) with O(1) space.
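To make the comparison concrete, here is a small sketch that runs an O(n²) brute force (with a running sum instead of recomputing each range) next to the one-pass recurrence. The function names are chosen for this example only, and both versions assume a non-empty input.

```python
def max_subarray_brute_force(arr):
    """O(n^2): try every start index and extend the running sum to every end index."""
    best = arr[0]
    for i in range(len(arr)):
        running = 0
        for j in range(i, len(arr)):
            running += arr[j]          # sum of arr[i..j]
            best = max(best, running)
    return best


def max_subarray_kadane(arr):
    """O(n): best sum ending at i depends only on the best sum ending at i - 1."""
    cur = best = arr[0]
    for x in arr[1:]:
        cur = max(x, cur + x)
        best = max(best, cur)
    return best


example = [-2, 1, -3, 4, -1, 2, 1, -5, 4]
brute_result = max_subarray_brute_force(example)
kadane_result = max_subarray_kadane(example)
```

Checking the fast version against the slow one on small random arrays is a cheap way to catch initialization mistakes.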
Time and Space Complexity
- Time: O(n)—one pass over the array.
- Space: O(1)—only a few variables (cur, best, and optionally indices).
Edge Cases
- Empty array: Return 0, None, or as specified. Avoid indexing arr[0].
- Single element: cur and best both equal that element; correct.
- All negative: Non-empty version returns the maximum element. Empty-allowed version returns 0.
- All positive: The whole array is the answer; Kadane correctly extends to the end.
Common Mistakes
- Initializing cur = 0, best = 0 for non-empty: Then for all-negative arrays you’d return 0, but the problem may require a non-empty subarray (answer = max element). Use cur = arr[0], best = arr[0] for non-empty.
- Using cur = cur + arr[i] without the max: Then a negative prefix keeps dragging cur down; you never “restart.” You must do cur = max(arr[i], cur + arr[i]).
- Confusing with “max sum subsequence”: Subarray = contiguous. Subsequence = any subset in order. Kadane is for subarray only.
Using “allow empty” (cur = max(0, cur + x)) when the problem says “non-empty contiguous subarray.” For [-1, -2, -3], the non-empty answer is -1; the empty-allowed answer is 0. Always clarify and implement accordingly.
Pattern Recognition
- “Maximum sum contiguous subarray”: Kadane.
- “Maximum product contiguous subarray”: Similar idea but track both max and min (negative × negative = positive).
- “Best contiguous segment” with a cumulative condition: Consider “best ending here” recurrence.
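The maximum-product variant mentioned above could look like the sketch below. It is one common formulation, not the only one; the name `max_product_subarray` is an assumption for the example, and the input is assumed non-empty. The minimum is tracked because a negative element turns the smallest product into the largest.

```python
def max_product_subarray(arr):
    """Maximum product of a non-empty contiguous subarray."""
    cur_max = cur_min = best = arr[0]
    for x in arr[1:]:
        if x < 0:
            # Multiplying by a negative swaps the roles of max and min.
            cur_max, cur_min = cur_min, cur_max
        cur_max = max(x, cur_max * x)  # extend or start fresh
        cur_min = min(x, cur_min * x)
        best = max(best, cur_max)
    return best
```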
Say: “I’ll use Kadane’s algorithm. I’ll keep the maximum sum of a subarray ending at the current index, and either extend the previous best or start fresh. One pass, O(n) time, O(1) space.” If asked for indices, mention tracking start when we restart and end when we update the global best.
Practice Problems
- Maximum subarray sum: Standard Kadane; return the sum or the subarray indices.
- Maximum product subarray: Track cur_max and cur_min (and swap on negative); update best.
- Maximum sum circular subarray: Either max subarray in linear array, or total − min subarray (wrap-around case).
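For the circular problem, one possible sketch runs Kadane for the maximum and for the minimum in the same pass. The name `max_circular_subarray_sum` is hypothetical and the input is assumed non-empty. The all-negative guard matters: there, total minus the minimum subarray would correspond to an empty subarray.

```python
def max_circular_subarray_sum(arr):
    """Maximum sum of a non-empty subarray that may wrap around the end."""
    total = cur_max = best_max = cur_min = best_min = arr[0]
    for x in arr[1:]:
        cur_max = max(x, cur_max + x)
        best_max = max(best_max, cur_max)
        cur_min = min(x, cur_min + x)
        best_min = min(best_min, cur_min)
        total += x
    if best_max < 0:
        # Every element is negative; the wrap-around case would be empty.
        return best_max
    # Either no wrap (best_max) or wrap (everything except the minimum subarray).
    return max(best_max, total - best_min)
```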
Summary
- Kadane’s algorithm finds the maximum sum of a contiguous subarray in O(n) time, O(1) space.
- cur = max(arr[i], cur + arr[i]) (max sum ending at i); best = max(best, cur).
- Non-empty: initialize cur = arr[0], best = arr[0]. Empty allowed: cur = 0, best = 0 and cur = max(0, cur + arr[i]).
- All-negative: non-empty answer = maximum element; empty allowed = 0.
5.9 Binary Search on Answer
Introduction
Binary search on the answer (also called binary search on value or answer space binary search) is used when you want to find the minimum or maximum value that satisfies a condition, and that value lies in a known range [low, high]. Instead of checking every value in the range, you binary search: pick mid; ask “is mid a valid answer?” (or “can we achieve at least/most mid?”). If the predicate is monotonic—e.g. “if x works, then every value ≥ x works”—you can discard half of the range each time. Total cost is O(log(range) × cost of one predicate check). This section covers when to use it, how to design the predicate, and classic problems (minimum capacity, split array, etc.).
Real-World Analogy
Imagine finding the minimum speed at which you must drive to reach a city within 10 hours. Speeds 1, 2, 3, … up to some max. If speed 50 works, then 51, 52, … also work. If 50 doesn’t work, then 49, 48, … don’t work. So “does speed x work?” is monotonic in x. You binary search on speed: try 50; if it works, try 25; if not, try 75; and so on. You need only about log₂(max_speed) tries instead of checking every speed.
Split array largest sum: Partition array into k contiguous subarrays; minimize the largest sum among them. Answer is in [max(arr), sum(arr)]. For a candidate “max sum” S, we can greedily form segments so each has sum ≤ S and count how many segments we need. If segments ≤ k, S is feasible. If we can do it with S, we can do it with S+1. Binary search on S; each check is O(n). Total O(n log(sum)).
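The split-array example can be sketched as below. This is an illustrative version under stated assumptions: the elements are non-negative integers, 1 ≤ k ≤ len(arr), and the function name `split_array_largest_sum` is chosen for the example.

```python
def split_array_largest_sum(arr, k):
    """Smallest possible value of the largest segment sum when arr is cut into at most k parts."""

    def feasible(limit):
        # Greedy: keep extending the current segment until it would exceed limit.
        segments, cur = 1, 0
        for x in arr:
            if cur + x > limit:
                segments += 1
                cur = x
            else:
                cur += x
        return segments <= k

    low, high = max(arr), sum(arr)  # answer lies in [max(arr), sum(arr)]
    while low < high:
        mid = (low + high) // 2
        if feasible(mid):
            high = mid       # mid works; try a smaller limit
        else:
            low = mid + 1    # mid is too small
    return low
```

The greedy check is O(n), so the total is O(n log(sum(arr))), as stated above.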
Formal Definition
Answer space: The answer is an integer (or real) in range [low, high]. Predicate: feasible(x) = “is x a valid answer?” (or “can we achieve at least x?” / “at most x?”). Monotonicity: For “minimize the answer,” typically if feasible(mid) then every value ≥ mid is feasible (so we try left half for a smaller answer). For “maximize,” if feasible(mid) then every value ≤ mid is feasible (try right half). Algorithm: Binary search on [low, high]; at each mid, call feasible(mid); narrow the range. Time O(log(range) × T), where T = cost of feasible().
The hardest part is (1) identifying that the answer lies in a range and (2) writing a correct, monotonic predicate. Once you have that, the binary search loop is standard.
Why This Topic Matters
- Interviews: “Minimum capacity to ship in D days,” “Koko eating bananas,” “Split array largest sum,” “minimum time to complete trips”—all classic “binary search on answer” problems.
- Optimization: Turns “find minimum/maximum x such that P(x)” into O(log(range)) iterations instead of linear scan over the answer space.
- Pattern: When the problem asks “minimize the maximum” or “maximize the minimum” and you can check “is x achievable?” in reasonable time, consider binary search on x.
Mental Model
The answer is “some number” in [low, high]. You don’t search the array by index—you search the value of the answer. For each candidate value mid, you ask: “If the answer were mid, would that be valid?” (feasible(mid)). The key is that feasibility is monotonic: once you cross a threshold, all values on one side work and all on the other don’t. So you can binary search that threshold.
When to Use
- Problem asks for minimum or maximum of something (capacity, speed, sum, time).
- You can define a range [low, high] that contains the answer.
- You can write a function feasible(x) that returns True if x is achievable/valid, and feasibility is monotonic (e.g. if x is valid, then any y ≥ x is valid for “minimize” problems).
Algorithm: Minimize the Answer
Find the smallest value x in [low, high] such that feasible(x) is True. Typical: if feasible(mid), then we can try smaller—search left (high = mid). Otherwise search right (low = mid + 1).
def minimize_answer(low, high, feasible):
    while low < high:
        mid = (low + high) // 2
        if feasible(mid):
            high = mid  # try smaller
        else:
            low = mid + 1
    return low
Loop invariant: answer is in [low, high]. When low == high, that value is the minimum feasible.
Algorithm: Maximize the Answer
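Find the largest value x such that feasible(x) is True. If feasible(mid), mid is still a candidate, so keep it and search right (low = mid); otherwise search left (high = mid - 1). Because low = mid does not shrink the range when mid == low, take the ceiling midpoint. An equivalent variant loops while low ≤ high, records mid as the best answer so far when feasible(mid), and sets low = mid + 1.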
def maximize_answer(low, high, feasible):
    while low < high:
        mid = (low + high + 1) // 2  # ceiling to avoid infinite loop
        if feasible(mid):
            low = mid
        else:
            high = mid - 1
    return low
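As a small usage sketch of the maximize loop, the integer square root is the largest x with x·x ≤ n, and that predicate is monotonic (true up to a threshold, false after). The helper `integer_sqrt` is a hypothetical example, not part of the original text; `maximize_answer` is repeated so the snippet runs on its own.

```python
def maximize_answer(low, high, feasible):
    """Largest x in [low, high] with feasible(x) True, assuming feasible(low) holds."""
    while low < high:
        mid = (low + high + 1) // 2  # ceiling midpoint so low = mid makes progress
        if feasible(mid):
            low = mid
        else:
            high = mid - 1
    return low


def integer_sqrt(n):
    """Floor of the square root of a non-negative integer n."""
    return maximize_answer(0, n, lambda x: x * x <= n)
```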
Example: Minimum Capacity to Ship in D Days
Weights in array; ship all in at most D days; same order; minimize the maximum weight per day (capacity). Answer in [max(weights), sum(weights)]. feasible(cap): can we ship all with capacity cap in ≤ D days? Greedy: pack days until adding the next item would exceed cap; then start a new day. If days needed ≤ D, cap is feasible. Monotonic: if cap works, cap+1 works. Binary search for minimum cap.
def ship_within_days(weights, D):
    def feasible(cap):
        days = 1
        cur = 0
        for w in weights:
            if cur + w > cap:
                days += 1
                cur = w
            else:
                cur += w
            if days > D:
                return False
        return True

    low, high = max(weights), sum(weights)
    while low < high:
        mid = (low + high) // 2
        if feasible(mid):
            high = mid
        else:
            low = mid + 1
    return low
Line-by-Line Notes
- feasible(cap): simulate shipping with daily capacity cap; count days. If days ≤ D, cap is valid.
- low = max(weights): any valid capacity must fit the heaviest package. high = sum(weights): that capacity ships everything in one day, so it is always feasible.
- We want minimum cap, so when feasible(mid) we set high = mid to try smaller.
ASCII Diagram: Binary Search on Answer
Range [1, 10]. Find minimum x such that feasible(x).
feasible(1)=F, feasible(2)=F, feasible(3)=F, feasible(4)=T, feasible(5)=T, ...
low=1, high=10 → mid=5, feasible(5)=T → high=5
low=1, high=5 → mid=3, feasible(3)=F → low=4
low=4, high=5 → mid=4, feasible(4)=T → high=4
low=4, high=4 → return 4
Time and Space Complexity
- Iterations: O(log(high - low + 1))—each step halves the range.
- Per iteration: Cost of feasible(mid). Often O(n) for array problems.
- Total: O(T × log(range)), where T = cost of feasible(). Space O(1) plus any space used by feasible().
Edge Cases
- No feasible value: If even high is not feasible (minimize) or low not feasible (maximize), handle after the loop or ensure range is chosen so that at least one value is feasible.
- low > high: Range is empty; no solution (or adjust low/high so the answer is in range).
- Integer overflow for mid: Python integers do not overflow, but in fixed-width languages (C++, Java) low + high can. There, use mid = low + (high - low) / 2 with integer division.
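One way to handle the "no feasible value" and empty-range cases is to verify the result after the loop. The sketch below returns None in those cases; that return convention and the name `minimize_answer_or_none` are assumptions for illustration.

```python
def minimize_answer_or_none(low, high, feasible):
    """Smallest feasible x in [low, high], or None if the range is empty or nothing is feasible."""
    if low > high:
        return None  # empty range
    while low < high:
        mid = (low + high) // 2
        if feasible(mid):
            high = mid
        else:
            low = mid + 1
    # The loop never tested the final candidate if it was the original high.
    return low if feasible(low) else None
```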
Common Mistakes
- Predicate not monotonic: If feasible(x) is true for some x and false for larger x, binary search is wrong. Verify “if x works, then x+1 works” (or the appropriate direction).
- Wrong bound update (maximize with mid): When maximizing, use mid = (low + high + 1) // 2 so that when feasible(mid) and you set low = mid, you don’t get stuck (low=3, high=4, mid=3 forever).
- Wrong range: low/high must include the answer. E.g. minimum capacity must be ≥ max(weights).
In “maximize” binary search, writing mid = (low + high) // 2 and then low = mid when feasible: when low and high are consecutive (e.g. 4 and 5), mid stays 4, and if feasible(4) is True you set low=4 again—infinite loop. Use mid = (low + high + 1) // 2 so mid moves to high.
Evolution: Linear Scan vs Binary Search on Answer
Naive: try each value from low to high until you find the minimum feasible—O(range × T). Binary search: O(log(range) × T). When the range is large (e.g. up to 10^9), only binary search is practical.
If the problem says “minimize the maximum” or “maximize the minimum” and you can check “can we achieve value x?” in O(n) or O(n log n), binary search on the answer often gives the optimal complexity. Always check monotonicity of the predicate.
Pattern Recognition
- “Minimize the maximum …” / “Maximize the minimum …”: Candidate for binary search on answer.
- “What is the minimum capacity/speed/time such that …?”: Answer in a range; define feasible(cap) and binary search.
- “Split into k parts, minimize the largest sum”: Binary search on the largest sum; feasible = greedy partition.
When you see “minimize the maximum” or “maximize the minimum,” say: “The answer is in a range. I’ll binary search on the answer. For each candidate value, I’ll check if it’s achievable—that’s my feasible function. If feasible is monotonic, we’re done. Time O(log(range) × cost of feasible).” Then implement the loop and the predicate.
Practice Problems
- Capacity to ship packages in D days: Minimize max capacity; feasible = greedy days.
- Koko eating bananas: Minimize speed; feasible = can finish in H hours.
- Split array largest sum: Minimize largest sum when splitting into k subarrays; feasible = greedy segments.
- Minimum time to complete trips: Binary search on time; feasible = can complete all trips.
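As one worked instance from this list, the banana-eating problem might be sketched as follows. It assumes positive pile sizes and h ≥ len(piles) (otherwise no speed works); the name `min_eating_speed` is chosen for the example.

```python
def min_eating_speed(piles, h):
    """Smallest integer speed at which every pile can be finished within h hours."""

    def feasible(speed):
        # Each pile takes ceil(pile / speed) hours.
        hours = sum((p + speed - 1) // speed for p in piles)
        return hours <= h

    low, high = 1, max(piles)  # speed max(piles) finishes each pile in one hour
    while low < high:
        mid = (low + high) // 2
        if feasible(mid):
            high = mid
        else:
            low = mid + 1
    return low
```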
Summary
- Binary search on answer finds min/max value in [low, high] such that feasible(x) is true. Requires monotonic predicate.
- Minimize: If feasible(mid), try smaller (high = mid); else low = mid + 1. Return low.
- Maximize: Use mid = (low + high + 1) // 2; if feasible(mid), low = mid; else high = mid - 1. Return low.
- Choose [low, high] so the answer is inside; implement feasible() correctly and check monotonicity.
5.10 2D Arrays
Introduction
A 2D array (or matrix) is an array of arrays: each element is itself an array, so you access elements with two indices—mat[row][col] or mat[i][j]. Rows and columns form a grid; dimensions are “rows × columns” (e.g. 3×4). In Python, a 2D array is a list of lists. Used for matrices in math, grids in games, adjacency structures for graphs, and images (pixels). This section covers creating 2D arrays correctly in Python, indexing, traversal patterns, and common pitfalls (shallow copy).
Real-World Analogy
Think of a spreadsheet or chessboard: rows and columns. Cell (2, 3) means row 2, column 3. A 2D array is the same: first index = row, second index = column. Like a 1D array is a row of lockers, a 2D array is a grid of lockers—you need two coordinates (row, col) to open one.
mat = [[1,2,3], [4,5,6], [7,8,9]] is a 3×3 matrix. mat[0][0]=1, mat[1][2]=6, mat[2][1]=8. Rows are mat[0], mat[1], mat[2]; number of rows = len(mat), number of columns = len(mat[0]) (assuming non-empty).
Formal Definition
2D array (matrix): A structure with rows and columns. Element at row r and column c is accessed as mat[r][c]. Dimensions: rows × cols. In memory, often stored row-major (row 0, then row 1, …) so mat[r][c] is at base + r×cols + c. In Python, mat is a list of lists; mat[r] is the r-th row (a list); mat[r][c] is the c-th element of that row. Access and update: O(1).
All rows should have the same length (rectangular matrix) unless you explicitly need a “ragged” 2D structure.
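The row-major formula can be illustrated with a flat list standing in for contiguous memory. This is only a model of the address arithmetic (Python lists of lists are not laid out this way), and the helper names are assumptions for the example.

```python
def flat_index(r, c, cols):
    """Offset of cell (r, c) in a row-major layout with `cols` columns."""
    return r * cols + c


def grid_position(index, cols):
    """Inverse mapping: offset -> (row, col)."""
    return divmod(index, cols)


mat = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
cols = len(mat[0])
flat = [value for row in mat for value in row]  # row 0, then row 1, then row 2
value_at_1_2 = flat[flat_index(1, 2, cols)]     # same cell as mat[1][2]
```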
Why This Topic Matters
- Foundation: Matrices in math, dynamic programming tables, graph adjacency matrices, grid problems (BFS/DFS).
- Interviews: Matrix traversal, spiral, search in sorted 2D array, island count, path sum—all assume comfort with 2D indexing and bounds.
- Python gotcha: Creating 2D arrays with [[0]*c]*r reuses the same row; use [[0]*c for _ in range(r)] for independent rows.
Mental Model
Rows are horizontal; columns are vertical. mat[i][j] = row i, column j. Row index i goes from 0 to rows−1; column index j from 0 to cols−1. Traversal “row by row” is for i in range(rows): for j in range(cols): mat[i][j]. “Column by column” swaps the loops (j outer, i inner).
Creating a 2D Array in Python
Correct: Independent Rows
rows, cols = 3, 4
mat = [[0] * cols for _ in range(rows)]
This creates rows separate lists, each of length cols. Changing mat[0][0] does not affect mat[1][0].
Wrong: Shallow Copy
mat = [[0] * cols] * rows # BAD: same row repeated
Here all rows are the same list. mat[0] is mat[1] is mat[2]. So mat[0][0] = 1 makes mat[1][0] and mat[2][0] also 1. Never use this for a mutable 2D array.
Using [[0]*cols]*rows creates one inner list referenced by every row. Use [[0]*cols for _ in range(rows)] so each row is a new list.
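A short demonstration of the difference, using illustrative variable names: writing one cell through the aliased construction changes every row, while the comprehension keeps rows independent.

```python
rows, cols = 3, 4

shared = [[0] * cols] * rows                      # one inner list, referenced three times
independent = [[0] * cols for _ in range(rows)]   # three separate inner lists

shared[0][0] = 1
independent[0][0] = 1

first_column_shared = [row[0] for row in shared]            # every row sees the write
first_column_independent = [row[0] for row in independent]  # only row 0 changed
rows_are_aliased = shared[0] is shared[1]
```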
Dimensions and Bounds
rows = len(mat)
cols = len(mat[0]) if mat else 0
Valid indices: 0 ≤ i < rows, 0 ≤ j < cols. For non-rectangular lists, cols might vary; then use len(mat[i]) for the i-th row.
Access and Update
Same as 1D: mat[i][j] to read or write. O(1).
mat[1][2] = 99
x = mat[0][0]
Traversal
Row by Row (Standard)
for i in range(len(mat)):
    for j in range(len(mat[0])):
        print(mat[i][j])
Column by Column
for j in range(len(mat[0])):
    for i in range(len(mat)):
        print(mat[i][j])
With enumerate
for i, row in enumerate(mat):
    for j, val in enumerate(row):
        print(i, j, val)
ASCII Diagram
mat (3 rows × 4 cols):
        col 0  col 1  col 2  col 3
row 0     a      b      c      d
row 1     e      f      g      h
row 2     i      j      k      l
mat[1][2] = g
len(mat) = 3, len(mat[0]) = 4
Time and Space Complexity
- Access/update mat[i][j]: O(1).
- Traverse all elements: O(rows × cols).
- Space: O(rows × cols) for the matrix.
Edge Cases
- Empty matrix: mat = []; len(mat)=0. Don’t use len(mat[0]). Check if not mat or not mat[0] before using dimensions.
- Single row: mat = [[1,2,3]]; one row, three columns.
- Single column: mat = [[1],[2],[3]]; three rows, one column.
Common Mistakes
- [[0]*c]*r: Same row reference; mutations affect every row. Use list comprehension.
- Off-by-one in bounds: Valid (i,j) are 0 to rows−1 and 0 to cols−1. Loop conditions: range(len(mat)), range(len(mat[0])).
- Assuming rectangular: If rows can have different lengths, use len(mat[i]) for the j-loop bound.
Pattern Recognition
- Grid / matrix problems: 2D array; neighbors = (i±1,j), (i,j±1) or 8-direction.
- DP table: Often 2D; dp[i][j] from dp[i-1][j], dp[i][j-1], etc.
- Next topics: Matrix traversal (5.11), spiral (5.12) build on 2D indexing.
When given a matrix, state dimensions: “It’s rows × cols. I’ll use mat[i][j] for row i, column j. I’ll traverse with nested loops—row by row unless we need column order. I’ll check for empty matrix and bounds before indexing.” If you need to create a 2D array, use [[0]*cols for _ in range(rows)].
Practice Problems
- Matrix initialization: Create rows×cols matrix of zeros (correct way).
- Transpose: New matrix where new[i][j] = mat[j][i].
- Sum all elements: Nested loop; O(rows×cols).
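The transpose exercise could be sketched like this, assuming a rectangular input; the result is built with the independent-rows construction so that no rows are aliased.

```python
def transpose(mat):
    """Return a new cols×rows matrix with result[j][i] = mat[i][j]."""
    if not mat or not mat[0]:
        return []
    rows, cols = len(mat), len(mat[0])
    result = [[0] * rows for _ in range(cols)]  # note the swapped dimensions
    for i in range(rows):
        for j in range(cols):
            result[j][i] = mat[i][j]
    return result
```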
Summary
- 2D array = list of lists; mat[i][j] = row i, column j. Dimensions: rows × cols.
- Create with [[0]*cols for _ in range(rows)], not [[0]*cols]*rows.
- Traverse row-by-row or column-by-column with nested loops; check empty and bounds.
5.11 Matrix Traversal
Introduction
Matrix traversal means visiting every cell of a 2D array in a defined order. Common patterns: row-wise (left to right, top to bottom), column-wise, diagonal (main, anti-, or all diagonals), layer/ring (from outer boundary inward—used in spiral), and BFS/DFS on the grid (neighbor-based). The choice of order affects how you solve problems (e.g. DP row-by-row, spiral by layer). This section covers these patterns, how to iterate diagonals and boundaries, and 4- vs 8-neighbor movement for grid problems.
Real-World Analogy
Imagine reading a page of text: you go left to right, then next line (row-wise). Or scanning a form column by column (column-wise). Diagonal is like cutting the grid with lines of slope 1 or −1. Layer by layer is like peeling an onion—outer rectangle first, then the inner one. Each order suits different tasks (fill row-by-row for DP; peel for spiral).
3×3 matrix: row-wise order is (0,0), (0,1), (0,2), (1,0), (1,1), (1,2), (2,0), (2,1), (2,2). Main diagonal: (0,0), (1,1), (2,2). Layer 0 = top row, right col, bottom row, left col; layer 1 = center. Spiral (next topic) follows layers with a specific winding order.
Formal Definition
Traversal: A sequence of cell indices (i, j) that visits each cell of a rows×cols matrix exactly once (or as needed). Row-wise: for i in range(rows): for j in range(cols). Column-wise: for j in range(cols): for i in range(rows). Diagonal: cells with constant (i−j) or (i+j). Layer: cells on the boundary of a sub-rectangle; repeat for inner rectangles. Neighbors: 4-direction (i±1, j), (i, j±1); 8-direction adds diagonals.
Why This Topic Matters
- DP and iteration: Many DP tables are filled row by row or diagonal by diagonal.
- Spiral / boundaries: Topic 5.12 uses layer-wise traversal; you need clean boundary loops.
- Grid BFS/DFS: Graph-like traversal on (i,j) with 4 or 8 neighbors—islands, shortest path in maze.
Mental Model
Row-wise = “for each row, scan all columns.” Column-wise = “for each column, scan all rows.” Diagonal = “all (i,j) where i−j = d” (one diagonal) or “for each sum s, all (i,j) with i+j = s.” Layer = “for k = 0, 1, …, (min(rows, cols) − 1)//2, traverse the k-th boundary (top row, right col, bottom row, left col of the k-th rectangle).”
Row-Wise and Column-Wise
# Row-wise (standard)
for i in range(rows):
    for j in range(cols):
        process(mat[i][j])

# Column-wise
for j in range(cols):
    for i in range(rows):
        process(mat[i][j])
Diagonal Traversal
Main Diagonal (i == j)
for i in range(min(rows, cols)):
    process(mat[i][i])
All Diagonals (constant i − j)
For each value d = i − j, we get one diagonal. d ranges from -(cols-1) to (rows-1). For each d, iterate (i, j) such that i − j = d and 0 ≤ i < rows, 0 ≤ j < cols.
for d in range(-(cols - 1), rows):
    for i in range(rows):
        j = i - d
        if 0 <= j < cols:
            process(mat[i][j])
Anti-Diagonals (constant i + j)
Sum s = i + j ranges from 0 to (rows−1)+(cols−1). For each s, iterate valid (i, j) with i + j = s.
for s in range(rows + cols - 1):
    for i in range(rows):
        j = s - i
        if 0 <= j < cols:
            process(mat[i][j])
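The loops above scan every row for every diagonal and filter with a bounds check. A possible refinement, sketched here under the assumption of a rectangular matrix, computes the valid range of i for each sum s directly, so each cell is touched exactly once. The function name `anti_diagonals` is hypothetical.

```python
def anti_diagonals(mat):
    """Group cells by i + j; each inner list is one anti-diagonal, top to bottom."""
    if not mat or not mat[0]:
        return []
    rows, cols = len(mat), len(mat[0])
    result = []
    for s in range(rows + cols - 1):
        i_lo = max(0, s - (cols - 1))  # smallest i with j = s - i <= cols - 1
        i_hi = min(rows - 1, s)        # largest i with j = s - i >= 0
        result.append([mat[i][s - i] for i in range(i_lo, i_hi + 1)])
    return result
```

This brings the work down from O((rows + cols) × rows) to O(rows × cols).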
Layer / Boundary Traversal
Traverse the k-th “ring” (boundary of the rectangle from row k to row rows−1−k and col k to col cols−1−k). Top row (k, k) to (k, cols−1−k), right col (k+1, cols−1−k) to (rows−1−k, cols−1−k), bottom row (rows−1−k, cols−2−k) to (rows−1−k, k), left col (rows−2−k, k) to (k+1, k). Handle the case when the ring collapses to a single row or column.
def traverse_boundary(mat, k):
    rows, cols = len(mat), len(mat[0])
    top, bottom = k, rows - 1 - k
    left, right = k, cols - 1 - k
    if top > bottom or left > right:
        return
    for j in range(left, right + 1):
        process(mat[top][j])
    for i in range(top + 1, bottom + 1):
        process(mat[i][right])
    if top != bottom:
        for j in range(right - 1, left - 1, -1):
            process(mat[bottom][j])
    if left != right:
        for i in range(bottom - 1, top, -1):
            process(mat[i][left])
Repeat for k = 0, 1, … until the rectangle is empty (top > bottom or left > right). This is the structure used for spiral order.
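The repeat-until-empty loop can be sketched with a variant that returns each ring as a list instead of calling process. The names `boundary_cells` and `all_layers` are assumptions for this example; the ring logic follows the same four segments as above.

```python
def boundary_cells(mat, k):
    """Values on the k-th ring in clockwise order, or [] if that ring does not exist."""
    rows, cols = len(mat), len(mat[0])
    top, bottom = k, rows - 1 - k
    left, right = k, cols - 1 - k
    if top > bottom or left > right:
        return []
    cells = [mat[top][j] for j in range(left, right + 1)]
    cells += [mat[i][right] for i in range(top + 1, bottom + 1)]
    if top != bottom:  # skip when the ring is a single row
        cells += [mat[bottom][j] for j in range(right - 1, left - 1, -1)]
    if left != right:  # skip when the ring is a single column
        cells += [mat[i][left] for i in range(bottom - 1, top, -1)]
    return cells


def all_layers(mat):
    """List of rings from the outermost inward."""
    if not mat or not mat[0]:
        return []
    layers = []
    k = 0
    while True:
        ring = boundary_cells(mat, k)
        if not ring:
            break
        layers.append(ring)
        k += 1
    return layers
```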
4-Direction and 8-Direction Neighbors
For cell (i, j), 4-neighbors: (i−1,j), (i+1,j), (i,j−1), (i,j+1). 8-neighbors add (i−1,j−1), (i−1,j+1), (i+1,j−1), (i+1,j+1). Use for BFS/DFS on grids (maze, islands).
# 4-direction
dr = [-1, 1, 0, 0]
dc = [0, 0, -1, 1]
for d in range(4):
    ni, nj = i + dr[d], j + dc[d]
    if 0 <= ni < rows and 0 <= nj < cols:
        process(mat[ni][nj])  # (ni, nj) is a valid neighbor
ASCII Diagram
Matrix 3×3:
      j=0  j=1  j=2
i=0    a    b    c
i=1    d    e    f
i=2    g    h    i
Row-wise: a,b,c, d,e,f, g,h,i
Main diagonal: a, e, i (i=j)
Layer 0: a,b,c, f,i, h,g, d
Layer 1: e
Time and Space Complexity
- Any full traversal: Visits O(rows×cols) cells; time O(rows×cols).
- Space: O(1) for the traversal order itself; O(rows×cols) if you store visited (e.g. BFS/DFS).
Edge Cases
- Single row/column: Layer has only one segment (top row or left col); avoid duplicating corners in boundary traversal.
- Non-square: Diagonals have different lengths; main diagonal length = min(rows, cols).
Common Mistakes
- Boundary traversal: When top == bottom or left == right, don’t traverse the same row/column twice (skip the “bottom” or “left” segments when the ring is a single row or column).
- Neighbor bounds: Always check 0 ≤ ni < rows and 0 ≤ nj < cols before using (ni, nj).
When the problem involves “traverse the matrix” or “spiral” or “layer,” say: “I’ll use row/column indices and either nested loops (row-wise) or a boundary loop (layer). For neighbors I’ll use a dr, dc array and bounds check.” For spiral, mention that you traverse layer by layer (Topic 5.12).
Practice Problems
- Traverse diagonals: Print all elements in anti-diagonal order (i+j constant).
- Boundary of matrix: Print only the outer ring.
- BFS on grid: Shortest path in a 0/1 matrix (4-neighbors).
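For the BFS exercise, a minimal sketch follows. The conventions are assumptions for illustration: 0 marks an open cell, 1 a wall, the path runs from the top-left to the bottom-right corner, and -1 means unreachable. The function name `shortest_path_length` is hypothetical.

```python
from collections import deque


def shortest_path_length(grid):
    """Fewest 4-direction steps from (0, 0) to (rows-1, cols-1) through 0-cells, or -1."""
    if not grid or not grid[0] or grid[0][0] == 1 or grid[-1][-1] == 1:
        return -1
    rows, cols = len(grid), len(grid[0])
    dist = [[-1] * cols for _ in range(rows)]  # -1 means not visited yet
    dist[0][0] = 0
    queue = deque([(0, 0)])
    while queue:
        i, j = queue.popleft()
        if (i, j) == (rows - 1, cols - 1):
            return dist[i][j]
        for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            ni, nj = i + di, j + dj
            in_bounds = 0 <= ni < rows and 0 <= nj < cols
            if in_bounds and grid[ni][nj] == 0 and dist[ni][nj] == -1:
                dist[ni][nj] = dist[i][j] + 1
                queue.append((ni, nj))
    return -1
```

The dist matrix doubles as the visited set, which is the O(rows×cols) extra space noted in the complexity section.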
Summary
- Row-wise: for i: for j. Column-wise: for j: for i.
- Diagonal: constant (i−j) or (i+j); iterate (i, j) with bounds check.
- Layer: For each k, traverse top row, right col, bottom row, left col of the k-th rectangle; handle single row/col.
- Neighbors: 4-direction (dr, dc); 8-direction adds diagonals; always check bounds.
5.12 Spiral Matrix
Introduction
Spiral matrix traversal visits elements in the order they appear when you walk the matrix in a spiral: right along the top row, down the right column, left along the bottom row, up the left column, then repeat for the inner rectangle. Same idea as Topic 5.11’s layer traversal with a fixed winding order. Two common problems: (1) Given a matrix, return elements in spiral order; (2) Generate an n×n matrix filled with 1 to n² in spiral order. Both use the same layer-by-layer boundary logic. This section covers the standard solution, corner handling, and the generate variant.
Real-World Analogy
Imagine walking around the perimeter of a rectangular field: go right along the top edge, turn and go down the right edge, turn and go left along the bottom, turn and go up the left edge. You’re back near the start but one “layer” in. Repeat for the inner rectangle until you’ve covered the whole field. That’s spiral order—one boundary at a time, clockwise (or counter-clockwise; we use clockwise here).
Matrix [[1,2,3],[4,5,6],[7,8,9]]. Spiral order: 1,2,3, 6,9, 8,7, 4, 5. Layer 0: top row 1,2,3; right col 6,9; bottom row 8,7; left col 4. Layer 1: center 5.
Formal Definition
Spiral order (clockwise): For each layer k = 0, 1, …, traverse in order: (1) top row from (k,k) to (k, cols−1−k), (2) right column from (k+1, cols−1−k) to (rows−1−k, cols−1−k), (3) bottom row from (rows−1−k, cols−2−k) to (rows−1−k, k) if top ≠ bottom, (4) left column from (rows−2−k, k) to (k+1, k) if left ≠ right. Stop when the current rectangle is empty (top > bottom or left > right). Each cell is visited once; output length = rows×cols.
Why This Topic Matters
- Interviews: “Spiral order” and “generate spiral matrix” are common; clean boundary loops and single-row/column handling are what interviewers look for.
- Layer abstraction: Same pattern as general boundary traversal (5.11); spiral fixes the order (right, down, left, up).
Mental Model
Maintain four bounds: top, bottom, left, right. The current “ring” is the rectangle between these. Add all of the top row (left→right), then right column (top+1→bottom), then bottom row (right−1→left) only if there is more than one row, then left column (bottom−1→top+1) only if there is more than one column. Then shrink: top++, bottom--, left++, right--. Repeat until top > bottom or left > right.
Algorithm: Spiral Order (Read)
- Initialize top=0, bottom=rows−1, left=0, right=cols−1. Result list = [].
- While top ≤ bottom and left ≤ right:
  - Traverse top row: for j from left to right, append mat[top][j].
  - Traverse right column: for i from top+1 to bottom, append mat[i][right].
  - If top < bottom: traverse bottom row from right−1 to left, append mat[bottom][j].
  - If left < right: traverse left column from bottom−1 to top+1, append mat[i][left].
  - top++, bottom--, left++, right--.
- Return result.
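The steps above translate almost line for line into the sketch below, which shrinks all four bounds once per ring and uses the strict checks top < bottom and left < right. The name `spiral_order_by_layer` is chosen for this example.

```python
def spiral_order_by_layer(mat):
    """Clockwise spiral order, processing one full ring per loop iteration."""
    if not mat or not mat[0]:
        return []
    top, bottom = 0, len(mat) - 1
    left, right = 0, len(mat[0]) - 1
    result = []
    while top <= bottom and left <= right:
        for j in range(left, right + 1):             # top row, left to right
            result.append(mat[top][j])
        for i in range(top + 1, bottom + 1):         # right column, below the corner
            result.append(mat[i][right])
        if top < bottom:                             # more than one row in this ring
            for j in range(right - 1, left - 1, -1):
                result.append(mat[bottom][j])
        if left < right:                             # more than one column in this ring
            for i in range(bottom - 1, top, -1):
                result.append(mat[i][left])
        top, bottom = top + 1, bottom - 1            # shrink to the inner rectangle
        left, right = left + 1, right - 1
    return result
```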
Python Implementation: Spiral Order
def spiral_order(mat):
    if not mat or not mat[0]:
        return []
    rows, cols = len(mat), len(mat[0])
    top, bottom = 0, rows - 1
    left, right = 0, cols - 1
    result = []
    while top <= bottom and left <= right:
        for j in range(left, right + 1):
            result.append(mat[top][j])
        top += 1
        for i in range(top, bottom + 1):
            result.append(mat[i][right])
        right -= 1
        if top <= bottom:
            for j in range(right, left - 1, -1):
                result.append(mat[bottom][j])
            bottom -= 1
        if left <= right:
            for i in range(bottom, top - 1, -1):
                result.append(mat[i][left])
            left += 1
    return result
Line-by-Line Notes
- After the top row we do top += 1, so the right column starts at the next row (no duplicate corner).
- After the right column we do right -= 1; bottom row goes from right to left (no duplicate).
- if top <= bottom: when there is only one row left, we already added it as the “top” row; skip the bottom pass.
- if left <= right: when there is only one column left, we already added it in the right column; skip the left pass.
Generate Spiral Matrix (n×n, 1 to n²)
Same bounds (top, bottom, left, right). Fill with a counter: do the same four segments (top row, right col, bottom row, left col), writing counter and incrementing. Initialize mat with zeros; write in spiral order.
def generate_spiral(n):
    mat = [[0] * n for _ in range(n)]
    top, bottom = 0, n - 1
    left, right = 0, n - 1
    num = 1
    while top <= bottom and left <= right:
        for j in range(left, right + 1):
            mat[top][j] = num
            num += 1
        top += 1
        for i in range(top, bottom + 1):
            mat[i][right] = num
            num += 1
        right -= 1
        if top <= bottom:
            for j in range(right, left - 1, -1):
                mat[bottom][j] = num
                num += 1
            bottom -= 1
        if left <= right:
            for i in range(bottom, top - 1, -1):
                mat[i][left] = num
                num += 1
            left += 1
    return mat
ASCII Diagram
3×3: [1 2 3]
[4 5 6]
[7 8 9]
Layer 0: top row 1,2,3 → right 6,9 → bottom 8,7 → left 4
Layer 1: top row 5 (then top>bottom, done)
Order: 1,2,3,6,9,8,7,4,5
Time and Space Complexity
- Spiral order (read): Each cell visited once; time O(rows×cols). Space O(1) extra plus O(rows×cols) for the output list.
- Generate spiral: O(n²) time and space for the n×n matrix.
Edge Cases
- Empty matrix: Return [] or empty matrix.
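- Single row: The top row adds every cell. After top += 1 we have top > bottom, so the right-column loop is empty and the bottom row is skipped. The left-column check may still pass, but its range is empty, so nothing is added. Correct.
- Single column: The top row adds one cell and the right column adds the rest. After right -= 1 the bottom-row range is empty and the left-column check (left <= right) fails, so nothing is duplicated.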
Common Mistakes
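- Double-counting corners: Each corner belongs to two segments. Either start the next segment one cell past the corner (right column from top+1, bottom row from right−1, left column from bottom−1), or update the bound immediately after each segment (top += 1, then right -= 1, and so on) as the implementations above do. Do not apply both adjustments at once, or cells are skipped.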
- Forgetting single row/column: Without “if top <= bottom” before the bottom row, a single horizontal strip would be added twice. Same for “if left <= right” and the left column.
Traversing the bottom row or left column without checking top < bottom or left < right. When the ring collapses to one row or one column, those segments were already covered by the top row or right column; adding them again duplicates elements.
Pattern Recognition
- “Spiral order” / “spiral traversal”: Layer-by-layer, four segments per layer with the if-checks.
- “Generate n×n spiral”: Same loop, write counter instead of read.
Say: “I’ll traverse layer by layer with top, bottom, left, right. For each layer I add the top row, right column, then if there’s more than one row the bottom row, then if there’s more than one column the left column. Then shrink the bounds.” Mention the single-row/column check to avoid duplicates.
Practice Problems
- Spiral order: Return elements of matrix in spiral order (as above).
- Generate spiral matrix: n×n matrix with 1..n² in spiral order.
- Spiral order for non-square: Same algorithm works for rows×cols.
Summary
- Spiral order = layer by layer: top row (left→right), right col (top+1→bottom), bottom row (right−1→left) if top<bottom, left col (bottom−1→top+1) if left<right; then shrink all four bounds.
- The Python implementations instead update one bound right after each segment (top += 1, right -= 1, etc.), which avoids duplicate corners without the ±1 offsets.
- In that incremental form, use if top <= bottom and if left <= right before the bottom and left segments so a single row or column is not traversed twice.
5.13 Subarrays
Introduction
A subarray is a contiguous segment of an array—elements from index start to index end (inclusive), with start ≤ end. So a subarray is uniquely defined by the pair (start, end). There are n(n+1)/2 subarrays in an array of length n. This is different from a subsequence, which keeps order but does not require contiguity (Topic 5.14). Most “subarray” problems (max sum, subarray sum equals K, longest with property) use techniques you’ve seen: Kadane, prefix sum, sliding window, two pointers. This section clarifies the definition, count, and how to enumerate or reason about subarrays.
Real-World Analogy
Imagine a train with numbered cars. A subarray is a connected stretch of cars: cars 3, 4, 5 together. You can’t take cars 3 and 5 and skip 4 and still call it a “subarray”—that would be a subsequence. So “subarray” = one contiguous block. The number of possible contiguous blocks is the number of ways to pick a start car and an end car (with start ≤ end)—that’s n(n+1)/2 for n cars.
Array [1, 2, 3]. Subarrays: [1], [1,2], [1,2,3], [2], [2,3], [3]—six total. 3(3+1)/2 = 6. [1,3] is not a subarray (not contiguous); it’s a subsequence.
Formal Definition
Subarray: For array arr of length n, a subarray is arr[start..end] for some 0 ≤ start ≤ end < n (0-based). It has (end − start + 1) elements. Count: For each start (0 to n−1), end can be start, start+1, …, n−1—that’s (n − start) choices. Total = n + (n−1) + … + 1 = n(n+1)/2. Subsequence (different): any subset of indices in order; not necessarily contiguous; 2^n possible.
Why This Topic Matters
- Terminology: Interview problems often say “subarray”; you must not confuse with “subsequence.” Subarray ⇒ contiguous; use Kadane, prefix sum, sliding window.
- Counting: “How many subarrays have sum K?” uses prefix + map; “max sum subarray” uses Kadane; “longest subarray with …” uses sliding window or two pointers.
Mental Model
Fix start; vary end from start to n−1. That gives all subarrays starting at start. Then start = 0, 1, …, n−1. So you can enumerate with two loops: for start in range(n): for end in range(start, n). The subarray is arr[start:end+1].
Enumerating All Subarrays
# All subarrays: (start, end) with 0 <= start <= end < n
for start in range(len(arr)):
    for end in range(start, len(arr)):
        sub = arr[start:end+1]
        # process sub, or use start/end to compute sum via prefix
This is O(n²) subarrays; if you do O(1) work per subarray (e.g. sum via prefix), total O(n²). Often we don’t enumerate all—we use Kadane (O(n)) or prefix+map for “count subarray sum = K” (O(n)).
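As a minimal sketch of the "sum via prefix" idea mentioned above, the following computes the sum of every subarray in O(1) per pair after an O(n) prefix pass, with no slice copies. The function name all_subarray_sums and the dictionary output format are illustrative choices.

```python
def all_subarray_sums(arr):
    """Map each (start, end) pair to sum(arr[start..end]) using prefix sums."""
    n = len(arr)
    # prefix[k] is the sum of the first k elements, so prefix[0] == 0.
    prefix = [0] * (n + 1)
    for k, value in enumerate(arr):
        prefix[k + 1] = prefix[k] + value

    sums = {}
    for start in range(n):
        for end in range(start, n):
            # O(1) range sum: no slice, no inner loop.
            sums[(start, end)] = prefix[end + 1] - prefix[start]
    return sums


sums = all_subarray_sums([1, 2, 3])
print(len(sums))      # 6, matching 3 * (3 + 1) // 2
print(sums[(1, 2)])   # 5, the sum of [2, 3]
```

The total work is O(n²) because there are n(n+1)/2 pairs, each handled in constant time.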
Count of Subarrays
Number of pairs (start, end) with 0 ≤ start ≤ end < n = n + (n−1) + … + 1 = n(n+1)/2. So brute-force “check every subarray” is at least Ω(n²) if we must look at each once.
Subarray vs Subsequence
| | Subarray | Subsequence |
|---|---|---|
| Definition | Contiguous segment | Any subset, order preserved |
| Count | n(n+1)/2 | 2^n |
| Techniques | Kadane, prefix, sliding window | DP, LIS-style |
Common Subarray Problems (Recap)
- Maximum sum subarray: Kadane (5.8)—O(n).
- Count subarrays with sum = K: Prefix sum + frequency map (5.6)—O(n).
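- Longest subarray with sum ≤ K (non-negative elements) / at most K distinct: Sliding window (5.5), O(n).
- Minimum length subarray with sum ≥ K (positive elements): Sliding window, O(n). With negative values the window sum no longer moves monotonically as the window grows or shrinks, so a plain sliding window is not enough.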
Time and Space Complexity
- Enumerate all subarrays: O(n²) subarrays; O(n) per subarray if you copy → O(n³). With prefix sum for range sum: O(n²).
- Optimized solutions: Kadane O(n); prefix+map for count O(n); sliding window O(n).
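For reference, here is a compact sketch of the two O(n) techniques named above: Kadane for the maximum subarray sum and prefix sum plus frequency map for counting subarrays with sum K. The function names are illustrative, and max_subarray_sum assumes a non-empty array.

```python
def max_subarray_sum(arr):
    """Kadane: best sum over all non-empty contiguous segments (arr must be non-empty)."""
    best = current = arr[0]
    for value in arr[1:]:
        # Either extend the segment ending at the previous index or start fresh here.
        current = max(value, current + value)
        best = max(best, current)
    return best


def count_subarrays_with_sum(arr, k):
    """Count (start, end) pairs whose elements sum to k, using prefix-sum frequencies."""
    seen = {0: 1}      # the empty prefix has sum 0
    running = 0
    count = 0
    for value in arr:
        running += value
        # A subarray ending here sums to k when an earlier prefix equals running - k.
        count += seen.get(running - k, 0)
        seen[running] = seen.get(running, 0) + 1
    return count


print(max_subarray_sum([-2, 1, -3, 4, -1, 2, 1, -5, 4]))  # 6, from [4, -1, 2, 1]
print(count_subarrays_with_sum([1, 1, 1], 2))             # 2
```

Both make a single pass; the second uses O(n) extra space for the map in the worst case.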
Edge Cases
- Empty array: Zero subarrays (or define “empty subarray” if problem allows).
- Single element: One subarray: the element itself.
Common Mistakes
- Confusing subarray with subsequence: Subarray must be contiguous. [1,3] from [1,2,3] is a subsequence, not a subarray.
- Brute force when better exists: Don’t enumerate all subarrays for “max sum” or “count sum = K”—use Kadane or prefix+map.
Using “subsequence” techniques (e.g. pick/don’t pick DP) for a problem that asks for “subarray.” Subarray ⇒ contiguous; the recurrence is different (e.g. “max sum ending at i” for Kadane).
Pattern Recognition
- “Subarray” + max/min sum: Kadane or prefix + structure.
- “Subarray” + count with sum K: Prefix + hash map.
- “Subarray” + longest/shortest with condition: Sliding window or two pointers.
When the problem says “subarray,” confirm: “So we need a contiguous segment.” Then choose: max sum → Kadane; count sum = K → prefix + map; longest/shortest with property → sliding window. Don’t mix up with subsequence.
Practice Problems
- Max sum subarray: Kadane.
- Subarray sum equals K (count): Prefix + frequency map.
- Longest subarray with at most K distinct: Sliding window.
Summary
- Subarray = contiguous segment arr[start..end]. Count = n(n+1)/2.
- Subarray ≠ subsequence: Subsequence keeps order but not contiguity; 2^n possible.
- Enumerate with for start: for end in range(start, n). Optimized: Kadane, prefix+map, sliding window.
5.14 Subsequences
Introduction
A subsequence of an array is a sequence obtained by deleting zero or more elements without changing the order of the remaining elements. Unlike a subarray, elements do not have to be contiguous—you can skip any indices. There are 2^n subsequences (each element is either included or not). Many “subsequence” problems use dynamic programming (pick/don’t pick, or “state at index i”) or greedy (e.g. longest increasing subsequence with binary search). This section defines subsequences, how to enumerate them, and how they differ from subarrays (Topic 5.13).
Real-World Analogy
Imagine a queue of people. A subsequence is any group you get by asking some people to step out—the ones left stay in the same relative order. You might keep persons 1, 3, and 5; that’s a subsequence. A subarray would require that the kept people stood next to each other in the original queue. So “subsequence” = same order, any selection; “subarray” = one contiguous block.
Array [1, 2, 3]. Subsequences include: [], [1], [2], [3], [1,2], [1,3], [2,3], [1,2,3]—eight = 2^3. [1,3] is a subsequence (we skipped 2) but not a subarray. Every subarray is a subsequence; not every subsequence is a subarray.
Formal Definition
Subsequence: A sequence (b_0, b_1, …, b_{k−1}) is a subsequence of array arr if there exist indices 0 ≤ i_0 < i_1 < … < i_{k−1} < n such that b_j = arr[i_j] for each j. So we choose a strictly increasing sequence of indices and take those elements in order. Count: For each of the n elements, we either include it or not → 2^n subsequences (including the empty one). vs Subarray: Subarray = contiguous; count n(n+1)/2.
Why This Topic Matters
- Terminology: “Subsequence” ⇒ order preserved, not contiguous. Techniques are different from subarray (DP, LIS, pick/don’t pick).
- Classic problems: Longest increasing subsequence (LIS), longest common subsequence (LCS), “is s a subsequence of t?”—all use subsequence definition.
Mental Model
At each index, you have two choices: pick this element (add to the subsequence) or skip it. So generating all subsequences is like generating all subsets—2^n. For optimization (“longest subsequence with property”), you usually use DP: state = (index, maybe extra info); recurrence = pick vs don’t pick.
Generating All Subsequences
Recursion (Pick / Don’t Pick)
def subsequences(arr, i, path, result):
    if i == len(arr):
        result.append(path[:])
        return
    # Don't pick arr[i]
    subsequences(arr, i + 1, path, result)
    # Pick arr[i]
    path.append(arr[i])
    subsequences(arr, i + 1, path, result)
    path.pop()
Call with subsequences(arr, 0, [], result). Each path is one subsequence. Total 2^n.
Using Bitmask
For small n, iterate mask from 0 to 2^n − 1; if bit j is set, include arr[j].
for mask in range(1 << len(arr)):
    sub = [arr[j] for j in range(len(arr)) if (mask >> j) & 1]
    # process sub
Count: 2^n
Each element is either in or out of the subsequence → n independent binary choices → 2^n. Including the empty subsequence.
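A quick check of the 2^n count, sketched with the bitmask approach. Note that 2^n counts index selections; when the array contains repeated values, the number of distinct value sequences can be smaller. The helper name all_subsequences is illustrative.

```python
def all_subsequences(arr):
    """Every subsequence of arr, one per bitmask (practical only for small n)."""
    n = len(arr)
    return [
        [arr[j] for j in range(n) if (mask >> j) & 1]
        for mask in range(1 << n)
    ]


subs = all_subsequences([1, 2, 3])
print(len(subs))  # 8 == 2 ** 3

# With repeated values, different index choices can give the same sequence.
dupes = all_subsequences([1, 1])
print(len(dupes), len({tuple(s) for s in dupes}))  # 4 3
```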
Subsequence vs Subarray (Recap)
- Subarray: Contiguous; indices [start, end]; count n(n+1)/2. Use Kadane, prefix sum, sliding window.
- Subsequence: Any subset in order; count 2^n. Use DP (pick/don’t pick), LIS, LCS.
Classic Example: Longest Increasing Subsequence (LIS)
Find the length of the longest subsequence that is strictly increasing. DP: dp[i] = length of LIS ending at index i. For each i, look at all j < i with arr[j] < arr[i]; dp[i] = 1 + max(dp[j]) over those j, or 1 if there is no such j. O(n²). Better: O(n log n) with binary search (patience sorting). This is a subsequence problem: we skip elements.
def lis_length(arr):
    if not arr:
        return 0
    dp = [1] * len(arr)
    for i in range(1, len(arr)):
        for j in range(i):
            if arr[j] < arr[i]:
                dp[i] = max(dp[i], dp[j] + 1)
    return max(dp)
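The O(n log n) alternative mentioned above can be sketched with the standard library's bisect module. The name lis_length_fast is illustrative; bisect_left is used so that equal values replace rather than extend, which keeps the subsequence strictly increasing.

```python
from bisect import bisect_left


def lis_length_fast(arr):
    """Length of the longest strictly increasing subsequence in O(n log n)."""
    # tails[k] = smallest possible last value of an increasing subsequence of length k + 1.
    tails = []
    for value in arr:
        pos = bisect_left(tails, value)
        if pos == len(tails):
            tails.append(value)   # value extends the longest subsequence found so far
        else:
            tails[pos] = value    # value gives a smaller tail for length pos + 1
    return len(tails)


print(lis_length_fast([10, 9, 2, 5, 3, 7, 101, 18]))  # 4, e.g. [2, 3, 7, 18]
```

The tails list is always sorted, which is what makes the binary search valid. It holds candidate tail values, not an actual subsequence, so only its length is meaningful.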
Is Subsequence (Check)
Given two strings/arrays s and t, is s a subsequence of t? Two pointers: scan t; whenever t[j] == s[i], advance i. If i reaches len(s), yes. O(|s| + |t|).
def is_subsequence(s, t):
    i = 0
    for c in t:
        if i < len(s) and s[i] == c:
            i += 1
    return i == len(s)
Time and Space Complexity
- Generate all: 2^n subsequences; O(n) each → O(n·2^n). Space O(n) recursion depth.
- LIS (DP): O(n²) time, O(n) space. LIS with binary search: O(n log n).
- Is subsequence: O(len(s) + len(t)).
Edge Cases
- Empty array: One subsequence: [].
- Empty s in “is s subsequence of t?”: Empty string is subsequence of any t; return True.
Common Mistakes
- Treating as subarray: Don’t use Kadane or sliding window for “longest increasing subsequence”—it’s not contiguous.
- Wrong order: Subsequence must preserve original order. [3,1,2] is not a subsequence of [1,2,3].
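Using “subarray” techniques for “subsequence” problems. Max sum subsequence is different from Kadane (max sum subarray): for a subsequence you can skip negative elements, so the answer is the sum of all positive elements (or the single largest element if none are positive and the subsequence must be non-empty). A pick/don’t pick DP gives the same result. Always check the problem wording.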
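A small side-by-side sketch makes the difference concrete. It assumes both problems require a non-empty selection and a non-empty input; the function names are illustrative.

```python
def max_sum_subsequence(arr):
    """Best sum of a non-empty subsequence: take every positive value,
    or the largest element if there are no positives."""
    positives = [x for x in arr if x > 0]
    return sum(positives) if positives else max(arr)


def max_sum_subarray(arr):
    """Kadane: best sum of a non-empty contiguous segment."""
    best = current = arr[0]
    for value in arr[1:]:
        current = max(value, current + value)
        best = max(best, current)
    return best


data = [3, -2, 5, -1]
print(max_sum_subsequence(data))  # 8: pick 3 and 5, skip the negatives
print(max_sum_subarray(data))     # 6: [3, -2, 5] must include the -2
```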
Pattern Recognition
- “Longest increasing subsequence”: DP O(n²) or binary search O(n log n).
- “Is s a subsequence of t?”: Two pointers O(n+m).
- “Count subsequences with property”: Often DP with state (index, …).
When the problem says “subsequence,” confirm: “So we can skip elements but must keep order.” Then: generating all → recursion or bitmask; longest/subsequence count → DP; “is A subsequence of B?” → two pointers. Don’t use subarray (contiguous) techniques.
Practice Problems
- Longest increasing subsequence: DP or binary search.
- Is subsequence: Two pointers.
- Longest common subsequence (LCS): Classic 2D DP.
Summary
- Subsequence = elements in original order, not necessarily contiguous. Count = 2^n.
- Subarray = contiguous; count n(n+1)/2. Don’t confuse.
- Generate all: recursion (pick/don’t pick) or bitmask. Optimize: DP (LIS, LCS) or two pointers (is subsequence).
6.1 Linear Search
Introduction
Linear search (also called sequential search) is the simplest search algorithm: you start at the beginning of a collection and check every element in order until you either find the target or reach the end. It makes no assumptions about the data—the array or list can be sorted, unsorted, or in any order. Because it does not exploit structure, in the worst case it must look at every element, giving O(n) time. That makes it the baseline "brute force" for search: correct for any input, but not fast when the data has useful structure (like sorted order). Mastering linear search teaches you how to scan collections safely, handle "not found," and recognize when a better algorithm (like binary search) can replace it.
Real-World Analogy
Imagine you're looking for a specific book on a shelf where books are not in any particular order. You have no choice but to start at one end and look at each spine until you find the title you want or run out of books. You cannot "jump to the middle" and conclude the book isn't in the other half—because there is no ordering, the target could be anywhere. That's linear search: one item at a time, in sequence. Contrast this with a sorted phone book, where you can open to the middle and eliminate half the names in one step—that's binary search (Topic 6.2). Linear search is what you do when you don't have that luxury.
Array arr = [40, 12, 88, 5, 23, 9], target 23. You check index 0 (40 ≠ 23), index 1 (12 ≠ 23), index 2 (88 ≠ 23), index 3 (5 ≠ 23), index 4 (23 = 23) → found at index 4. If the target were 99, you would check all six elements and then report "not found."
Formal Definition
Linear search: Given a sequence A of n elements (e.g. array or list) and a target value x, examine A[0], A[1], …, A[n−1] in order. If for some index i we have A[i] = x, return i (or True). If no such i exists, return a "not found" sentinel (e.g. −1 or False). No assumption is made about the order of elements. The algorithm is correct for any input; its cost depends on how many comparisons are made before a match or end of sequence.
Why This Topic Matters
- Foundation: Linear search is the first search strategy you should internalize. It works everywhere—unsorted arrays, linked lists, even streams—and is the fallback when no better structure exists.
- Baseline for comparison: When you learn binary search (O(log n)), you'll appreciate why "halving the search space" requires sorted data. Linear search is the O(n) baseline that motivates using better algorithms when the data allows.
- Interviews: Interviewers often ask "find the index of target" on an unsorted array—that's linear search. They may then ask "what if the array were sorted?" to lead you to binary search. Knowing when to use which is key.
- Built-in behavior: In Python, target in lst and lst.index(target) are linear search under the hood. Understanding linear search explains how these work and when they are expensive.
Mental Model
Think of linear search as "walk from left to right; stop when you see the target or run out of elements." You maintain a single position (index). At each step, you ask: "Is this element the one I want?" If yes, you're done. If no, you move to the next index. If you've passed the last index without finding it, the target is not in the collection. No need to remember previous elements or look ahead—just one pass, one element at a time.
Step-by-Step Breakdown
- Initialize: You'll traverse indices from 0 to len(arr) - 1. No extra data structure is needed.
- Loop: For each index i in that range, compare arr[i] with target.
- Match: If arr[i] == target, return i (or True) immediately. You've found the target.
- No match: If the loop completes without returning, the target was never seen. Return a sentinel value such as −1 or False to indicate "not found."
ASCII Diagram
arr = [ 40, 12, 88, 5, 23, 9 ] target = 23
i=0: 40 ≠ 23 → continue
i=1: 12 ≠ 23 → continue
i=2: 88 ≠ 23 → continue
i=3: 5 ≠ 23 → continue
i=4: 23 = 23 → return 4
If target = 99:
i=0..5: no match → after loop, return -1
Python Implementation
Basic: Return Index or −1
def linear_search(arr, target):
    for i in range(len(arr)):
        if arr[i] == target:
            return i
    return -1
Variant: Return Boolean (Exists or Not)
def linear_search_exists(arr, target):
    for x in arr:
        if x == target:
            return True
    return False
Variant: Find Last Occurrence
To find the last index where the target appears, scan from the end or keep updating the result as you see matches.
def linear_search_last(arr, target):
    result = -1
    for i in range(len(arr)):
        if arr[i] == target:
            result = i
    return result
Line-by-Line Explanation (Basic Version)
- for i in range(len(arr)): Iterates over every valid index from 0 to n−1. This guarantees we consider every element exactly once (until we return early).
- if arr[i] == target: The only comparison we need. For custom types, this might be replaced by a predicate (e.g. if predicate(arr[i])) or kept as == if the type supports it.
- return i: As soon as we find a match, we return that index. No need to look further for "first occurrence"; the first match we see is the first occurrence because we go left to right.
- return -1: Reached only if the loop completed without returning. By convention, −1 often means "not found" for index-returning functions (since 0 is a valid index).
Time Complexity
Worst case: The target is not in the array, or it is at the last position. We compare arr[i] with target for every index i from 0 to n−1. That's n comparisons → O(n) time.
Best case: The target is at index 0. We do one comparison and return. So 1 comparison → O(1) time.
Average case (target in array, uniformly random position): On average the target is around the middle; we check about n/2 elements before finding it. So about n/2 comparisons → O(n). If the target might not be in the array at all, the average depends on the probability of presence; still linear in n.
We say linear search is Θ(n) in the worst case: when the target is absent or at the last position, we perform exactly n comparisons. So worst-case time is both O(n) and Ω(n), hence Θ(n). For big-O we typically state O(n) and note that it is optimal for an unsorted array: any correct algorithm needs Ω(n) comparisons in the worst case unless it has additional structure (e.g. a hash set) or assumptions (e.g. sorted order).
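The best, worst, and average counts can be checked with an instrumented version of the search. This is a sketch for illustration only; the name linear_search_counting and the tuple return value are assumptions, not part of the basic implementation.

```python
def linear_search_counting(arr, target):
    """Linear search that also reports how many comparisons it made."""
    comparisons = 0
    for i, value in enumerate(arr):
        comparisons += 1
        if value == target:
            return i, comparisons
    return -1, comparisons


arr = [40, 12, 88, 5, 23, 9]
print(linear_search_counting(arr, 40))  # (0, 1)   best case: target at the front
print(linear_search_counting(arr, 9))   # (5, 6)   worst case with target present
print(linear_search_counting(arr, 99))  # (-1, 6)  target absent: n comparisons

# Average over every present target: (1 + 2 + ... + n) / n = (n + 1) / 2.
average = sum(linear_search_counting(arr, x)[1] for x in arr) / len(arr)
print(average)  # 3.5
```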
Space Complexity
We only use a loop variable (index i) and the input reference. No extra array or recursive call stack that grows with n. So O(1) auxiliary space.
Edge Cases
- Empty array: range(0) runs zero times; we skip the loop and return −1. Correct.
- Single element: One comparison: either it's the target (return 0) or not (return −1).
- Target at index 0: One comparison, return 0. Best case.
- Target at last index: n comparisons, return n−1. Worst case when target exists.
- Duplicate values: The basic implementation returns the first occurrence. If you need the last, use the "last occurrence" variant above.
- None or non-comparable elements: In Python, == against None does not raise, so None values are safe to scan. Objects that do not define __eq__ compare by identity, so define __eq__ or use an explicit predicate when you need value equality.
Common Mistakes
- Using linear search on a sorted array when you need speed: If the array is sorted, binary search (Topic 6.2) gives O(log n). Use linear search only when the array is unsorted or when you need a simple one-off check and n is small.
- Forgetting the "not found" return: If you only return inside the loop, you must return something after the loop (e.g. −1). Otherwise the function returns None when the target is absent, which can cause bugs if the caller expects an integer.
- Modifying the array while iterating: Don't add or remove elements from the list during the linear search loop; iteration behavior can become confusing. If you need to collect indices or modify data, do it in a separate pass or with a while-loop and explicit index management.
Using for x in arr when you need the index. Then you only have the value x, not its position. Use for i in range(len(arr)) and arr[i], or for i, x in enumerate(arr) so you have both.
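As a short illustration of the enumerate form, this sketch collects every index where the target occurs in one pass. The function name find_all_indices is illustrative.

```python
def find_all_indices(arr, target):
    """Return every index i with arr[i] == target, in increasing order."""
    # enumerate yields (index, value) pairs, so both are available in the loop.
    return [i for i, x in enumerate(arr) if x == target]


print(find_all_indices([4, 7, 4, 1, 4], 4))  # [0, 2, 4]
print(find_all_indices([4, 7, 4, 1, 4], 9))  # []
```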
Evolution: When to Use Linear vs Better Search
| Scenario | Use | Reason |
|---|---|---|
| Unsorted array, find target | Linear search | No structure to exploit; O(n) is optimal. |
| Sorted array, find target | Binary search | Halve search space each step; O(log n). |
| Need "exists?" only, small n | Linear or in | Simple and fast enough for small data. |
| Many searches on same data | Consider sort + binary, or hash set | Amortize cost: sort once O(n log n), then k × O(log n) or O(1) per lookup. |
Linear search is the brute force for search: correct and simple, but O(n). If your array is sorted, switch to binary search for O(log n). If you only need membership and will do many lookups, building a set from the array gives O(n) build and O(1) average lookup—better when k (number of lookups) is large. Use linear search when the data is unsorted, small, or you need the index and don't have (or don't want to maintain) extra structure.
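When many lookups hit the same data, the one-time build can be sketched as follows. This assumes the elements are hashable; build_lookups is an illustrative name, and the first-index dictionary is one possible way to keep index information alongside membership.

```python
def build_lookups(arr):
    """One O(n) pass preparing average O(1) membership and first-index lookups."""
    members = set(arr)              # assumes hashable elements
    first_index = {}
    for i, x in enumerate(arr):
        first_index.setdefault(x, i)  # keep only the earliest index per value
    return members, first_index


arr = [40, 12, 88, 5, 23, 9, 12]
members, first_index = build_lookups(arr)
queries = [23, 99, 12]
print([q in members for q in queries])             # [True, False, True]
print([first_index.get(q, -1) for q in queries])   # [4, -1, 1]
```

The trade-off is O(n) extra space and the need to rebuild or update the structures whenever the array changes.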
Pattern Recognition
Many problems are "linear scan and compare": find first/last index, count occurrences, find min/max, or check a condition over the array. The pattern is: one loop, one pass, O(n). If the problem says "unsorted" or "arbitrary order," think linear search first. If it says "sorted" or "non-decreasing," think binary search.
Python Built-ins
target in arr performs a linear search and returns True or False. arr.index(target) returns the first index of target or raises ValueError if not found. Both are O(n). For a safe "index or -1" you can use:
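# Illustrative wrapper name; list.index raises ValueError when target is absent.
def index_or_minus_one(arr, target):
    try:
        return arr.index(target)
    except ValueError:
        return -1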
Or stick to an explicit loop so you have full control (e.g. last occurrence, custom predicate).
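Two explicit-loop variants that the built-ins do not cover directly are sketched below: searching by a custom predicate, and finding the last occurrence by scanning from the end so the loop can stop early. The names find_first and find_last are illustrative.

```python
def find_first(arr, predicate):
    """Index of the first element for which predicate(element) is true, else -1."""
    for i, x in enumerate(arr):
        if predicate(x):
            return i
    return -1


def find_last(arr, target):
    """Index of the last occurrence of target, scanning right to left, else -1."""
    for i in range(len(arr) - 1, -1, -1):
        if arr[i] == target:
            return i
    return -1


print(find_first([3, 8, 12, 5], lambda x: x > 7))  # 1
print(find_last([1, 2, 1, 3], 1))                  # 2
```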
For "find index of target in an array," clarify: Is the array sorted? If yes, say you'd use binary search (Topic 6.2) for O(log n). If unsorted, implement linear search and state O(n) time, O(1) space. Mention that for unsorted data, O(n) is optimal in the worst case—you must look at every element if the target might be last or absent. If the interviewer asks "what if we search many times?" discuss sorting once and then binary search, or using a hash set for membership.
Practice Problems
- Implement linear search: return first index of target or −1.
- Find the last index of target in an unsorted array.
- Count how many times target appears (single linear pass).
- Find the minimum (or maximum) element and its index—same O(n) scan, different comparison.
Summary
- Linear search = scan elements in order from index 0 to n−1; return index when arr[i] == target, else return −1 after the loop.
- Time: O(n) worst and average; O(1) best (target at front). Space: O(1).
- Use for unsorted data or when you need the index and have no extra structure. For sorted data, prefer binary search.
- Edge cases: empty array (return −1), duplicates (first occurrence returned by basic version), last occurrence (keep updating result in loop).
- Python: in and index() are linear; use explicit loop when you need last index or custom logic.
6.2 Binary Search
Introduction
Binary search is the fast way to find a target in a sorted array: instead of checking every element (linear search, O(n)), you repeatedly compare the target with the middle element and throw away half of the remaining range. Each step halves the search space, so you need at most about log₂(n) steps—giving O(log n) time. This only works when the array is sorted (or ordered by some predicate) so that comparing with the middle tells you which half can contain the target. Binary search is a cornerstone of algorithm design: the same "narrow the range" idea appears in search-in-rotated-array, binary search on answer (Topic 5.9), and countless interview problems. Mastering the loop condition, the updates for left and right, and the first/last occurrence variants will make you interview-ready.
Real-World Analogy
Imagine a phone book sorted A–Z. To find "Smith," you don't read every page. You open to the middle: if you see "M," Smith is in the second half; if you see "T," Smith is in the first half. You throw away the half that cannot contain Smith and repeat on the remaining half. Each time you eliminate roughly half the pages. After about log₂(number of pages) steps, you're down to one page. That's binary search—divide and conquer by comparison with the middle. The critical requirement: the book must be sorted. In an unsorted pile of papers, opening to the middle tells you nothing about where "Smith" might be.
Sorted array arr = [2, 5, 7, 9, 12, 15], target 9. Step 1: mid = 2, arr[2]=7 < 9 → search right half [9, 12, 15]. Step 2: mid = 4, arr[4]=12 > 9 → search left half [9]. Step 3: mid = 3, arr[3]=9 == 9 → found at index 3. Only three comparisons instead of scanning all six elements.
Formal Definition
Binary search (sorted array): Let A be an array of n elements sorted in non-decreasing order, and x a target. Maintain a range [left, right] of indices where x could lie. While the range is non-empty, let mid = (left + right) // 2. If A[mid] = x, return mid. If A[mid] < x, then x (if present) must be in [mid+1, right]. If A[mid] > x, then x must be in [left, mid−1]. Update the range and repeat. The loop terminates when left > right (range empty → not found) or when a match is found. Invariant: If x is in the array, it lies in [left, right] at the start of each iteration.
Why This Topic Matters
- Speed: O(log n) vs O(n) for linear search on sorted data. For n = 1,000,000, that's about 20 comparisons instead of a million.
- Interview staple: "Find target in sorted array," "first/last position of target," "search in rotated sorted array," "binary search on answer"—all rely on the same pattern.
- Building block: Binary search on answer (Topic 5.9), lower_bound/upper_bound, and many optimization problems use "find the smallest/largest value that satisfies a condition" with a binary search over a range.
- Python: The bisect module implements binary search for sorted lists. Knowing how it works helps you use bisect_left and bisect_right, and decide when to write a custom loop.
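A minimal sketch of how the bisect functions mentioned above map onto "index or −1" and "count occurrences". The wrapper names index_of and count_of are illustrative; bisect_left and bisect_right are the standard library functions, and the input list is assumed to be sorted in non-decreasing order.

```python
from bisect import bisect_left, bisect_right


def index_of(sorted_arr, target):
    """First index of target in a sorted list, or -1 if absent."""
    i = bisect_left(sorted_arr, target)   # leftmost position where target could be inserted
    if i < len(sorted_arr) and sorted_arr[i] == target:
        return i
    return -1


def count_of(sorted_arr, target):
    """Number of occurrences of target in a sorted list."""
    # bisect_right lands just past the last occurrence, bisect_left at the first.
    return bisect_right(sorted_arr, target) - bisect_left(sorted_arr, target)


arr = [2, 5, 5, 5, 9, 12]
print(index_of(arr, 5))   # 1
print(index_of(arr, 7))   # -1
print(count_of(arr, 5))   # 3
```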
Mental Model
Keep a range [left, right] of indices where the target might be. Initially that's the whole array. Each step: look at the middle element. If it equals the target, you're done. If it's smaller than the target, the target (if present) must be to the right—so discard the left half and set left = mid + 1. If it's larger, the target must be to the left—set right = mid - 1. The range shrinks every time; when left > right, the range is empty and the target isn't in the array. The key is: one comparison with the middle tells you which half to keep—and that's only valid because the array is sorted.
Step-by-Step Breakdown
- Initialize: left = 0, right = len(arr) - 1. The search range is the entire array.
- Loop: While left <= right (range is non-empty), compute mid = (left + right) // 2.
- Compare: If arr[mid] == target, return mid. If arr[mid] < target, set left = mid + 1. Otherwise arr[mid] > target, set right = mid - 1.
- Termination: If the loop exits without returning, the range became empty; return −1 (not found).
ASCII Diagram
Sorted array: [ 2, 5, 7, 9, 12, 15 ] target = 9
Index: 0 1 2 3 4 5
Step 1: left=0, right=5, mid=2 → arr[2]=7 < 9 → discard left half
[ 2, 5, 7, | 9, 12, 15 ]
left = 3, right = 5
Step 2: left=3, right=5, mid=4 → arr[4]=12 > 9 → discard right half
[ 9, 12, 15 ]
left = 3, right = 3
Step 3: left=3, right=3, mid=3 → arr[3]=9 == 9 → return 3
Python Implementation
Standard: Any Occurrence, Return Index or −1
def binary_search(arr, target):
    left, right = 0, len(arr) - 1
    while left <= right:
        mid = (left + right) // 2
        if arr[mid] == target:
            return mid
        if arr[mid] < target:
            left = mid + 1
        else:
            right = mid - 1
    return -1
First Occurrence (Leftmost Index)
When the array may contain duplicates, "first index of target" means: when you find a match at mid, don't return yet—there might be another match to the left. Remember mid as a candidate and search [left, mid−1].
def first_index(arr, target):
    left, right = 0, len(arr) - 1
    result = -1
    while left <= right:
        mid = (left + right) // 2
        if arr[mid] == target:
            result = mid
            right = mid - 1  # keep looking left
        elif arr[mid] < target:
            left = mid + 1
        else:
            right = mid - 1
    return result
Last Occurrence (Rightmost Index)
For the last occurrence: when arr[mid] == target, set result = mid and search the right half with left = mid + 1.
def last_index(arr, target):
    left, right = 0, len(arr) - 1
    result = -1
    while left <= right:
        mid = (left + right) // 2
        if arr[mid] == target:
            result = mid
            left = mid + 1  # keep looking right
        elif arr[mid] < target:
            left = mid + 1
        else:
            right = mid - 1
    return result
Recursive Version
Same logic, expressed recursively: base case is empty range (not found) or match at mid; otherwise recurse on the left or right half.
def binary_search_rec(arr, target, left, right):
    if left > right:
        return -1
    mid = (left + right) // 2
    if arr[mid] == target:
        return mid
    if arr[mid] < target:
        return binary_search_rec(arr, target, mid + 1, right)
    return binary_search_rec(arr, target, left, mid - 1)
# Call: binary_search_rec(arr, target, 0, len(arr) - 1)
Line-by-Line Explanation (Standard Version)
- left <= right: We must include the case left == right (one element left). So the loop runs while the range is non-empty. Exiting when left > right means we've considered every candidate, so the target was not found.
- mid = (left + right) // 2: Integer division so mid is a valid index. In C/Java, left + (right - left) / 2 avoids overflow; in Python (left + right) // 2 is standard.
- arr[mid] < target → left = mid + 1: Every element at index ≤ mid is ≤ arr[mid], so < target. The target cannot be there; search only [mid+1, right].
- else (arr[mid] > target) → right = mid - 1: Every element at index ≥ mid is ≥ arr[mid], so > target. Search only [left, mid−1].
- We always exclude mid when we shrink (mid+1 or mid−1), so the range strictly shrinks and the loop cannot run forever.
Time Complexity
Why O(log n)? Each iteration compares the target with one element and then discards at least half of the current range. So the size of the range goes: n → n/2 → n/4 → … → 1 (or 0). The number of times we can halve n until we reach 1 is ⌈log₂(n)⌉ (or one more for the empty check). So the number of iterations is at most ⌈log₂(n+1)⌉, which is O(log n).
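The iteration bound can be checked empirically with an instrumented search. This is a sketch: binary_search_steps and worst_case_bound are illustrative names, and the bound uses the identity ⌈log₂(n+1)⌉ = n.bit_length() for non-negative integers n to avoid floating-point rounding.

```python
def binary_search_steps(arr, target):
    """Standard binary search that also reports the number of loop iterations."""
    left, right = 0, len(arr) - 1
    steps = 0
    while left <= right:
        steps += 1
        mid = (left + right) // 2
        if arr[mid] == target:
            return mid, steps
        if arr[mid] < target:
            left = mid + 1
        else:
            right = mid - 1
    return -1, steps


def worst_case_bound(n):
    """ceil(log2(n + 1)) computed exactly with integer arithmetic."""
    return n.bit_length()


n = 1_000_000
arr = list(range(n))
_, steps = binary_search_steps(arr, n)   # n is larger than every element, so it is absent
print(steps, worst_case_bound(n))
assert steps <= worst_case_bound(n)
```

For n = 1,000,000 the bound evaluates to 20, which matches the "about 20 comparisons" figure quoted earlier in this topic.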
Best case: Target at the first mid we check—O(1). Worst case: Target not present or at a leaf of the "decision tree"—O(log n). Average case: O(log n) as well.
We say binary search is Θ(log n) in the worst case: we do at least and at most on the order of log n comparisons. No comparison-based search on a sorted array can do better than Ω(log n) in the worst case. Decision-tree argument: a search has n + 1 possible outcomes (one of the n indices, or "not found"), each comparison gives a binary decision, and a binary tree with n + 1 leaves has height at least ⌈log₂(n+1)⌉. (The n! orderings and log₂(n!) ≈ n log n bound belong to the analogous argument for sorting, not searching.) So binary search is optimal for sorted array search.
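The iteration bound can be checked empirically. The following is a minimal sketch, assuming a hypothetical helper count_iterations that mirrors the standard loop and only adds a counter; searching for a value larger than every element forces the worst case.

```python
import math

def count_iterations(arr, target):
    """Run the standard binary search loop and return the number of iterations."""
    left, right = 0, len(arr) - 1
    steps = 0
    while left <= right:
        steps += 1
        mid = (left + right) // 2
        if arr[mid] == target:
            break
        if arr[mid] < target:
            left = mid + 1
        else:
            right = mid - 1
    return steps

for n in (1, 7, 8, 1000, 10**6):
    arr = list(range(n))
    worst = count_iterations(arr, n)       # n is larger than every element
    bound = math.ceil(math.log2(n + 1))    # the bound from the text
    print(n, worst, bound)
```

For each n printed, the measured worst case should not exceed the bound ⌈log₂(n+1)⌉.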
Space Complexity
Iterative version: Only a few variables (left, right, mid). O(1) auxiliary space.
Recursive version: Each recursive call uses stack space. The depth is the number of halvings, so O(log n) space for the call stack.
Edge Cases
- Empty array: left=0, right=-1 → left <= right is false → return −1 immediately.
- Single element: left=right=0, one iteration, compare once, return 0 or −1.
- Target not present: Loop eventually makes the range empty; return −1.
- All elements equal to target: Standard version returns any matching index. First_index returns 0; last_index returns n−1.
- Unsorted array: Binary search is incorrect—it may miss the target or return a wrong index. Always ensure the array is sorted (or that your predicate preserves the "discard half" property).
Common Mistakes
- Using binary search on unsorted data: The "discard half" logic depends on sorted order. If the array isn't sorted, use linear search or sort first.
- Loop condition left < right instead of left <= right: When left == right, there is still one element to check. With left < right you exit without checking it and can incorrectly return −1.
- Not shrinking the range: You must set left = mid + 1 or right = mid - 1. Using left = mid or right = mid can leave the range unchanged when left == mid or right == mid, causing an infinite loop.
- Integer overflow for mid (other languages): In C/Java, (left + right) / 2 can overflow. Use left + (right - left) / 2. In Python this is not an issue for list indices.
Confusing "find exact index" with "find insertion point." For exact match we use left <= right and return when arr[mid] == target. For insertion point (smallest index where arr[i] >= target) we often use left < right and return left at the end—that's the bisect_left style. Don't mix the two loop invariants.
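The insertion-point variant mentioned above can be sketched as follows. This is an illustrative version, assuming a hypothetical name lower_bound and a half-open range [left, right); it is intended to behave like bisect_left on a sorted list.

```python
def lower_bound(arr, target):
    """Smallest index i with arr[i] >= target, or len(arr) if there is none."""
    left, right = 0, len(arr)          # half-open range [left, right)
    while left < right:
        mid = (left + right) // 2
        if arr[mid] < target:
            left = mid + 1             # answer is strictly right of mid
        else:
            right = mid                # mid may itself be the answer, so keep it
    return left

print(lower_bound([1, 3, 3, 5], 3))    # first index holding a value >= 3
print(lower_bound([1, 3, 3, 5], 4))    # where 4 would be inserted
```

Note that right = mid is safe here only because the loop condition is left < right: mid is then always strictly less than right, so the range still shrinks.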
Evolution: Linear → Binary
| Approach | Requirement | Time | Space |
|---|---|---|---|
| Linear search | Any order | O(n) | O(1) |
| Binary search | Sorted array | O(log n) | O(1) iter / O(log n) rec |
Whenever the array (or search space) is sorted and you need to find a value or the boundary of a condition, binary search is the tool. If you need to run many searches on the same array, sorting once (O(n log n)) and then doing k binary searches (k × O(log n)) beats k linear searches (k × O(n)) for moderate to large k. For "find insertion point" or "lower_bound," use the same pattern with left < right and return left—or use bisect_left.
Pattern Recognition
Binary search applies when: (1) the data is sorted (or monotonic in some sense), and (2) comparing with the middle tells you which half to keep. The pattern: maintain [left, right], compute mid, compare, then set left = mid + 1 or right = mid - 1. The same idea extends to "binary search on answer": the "array" is a range of possible answers, and you have a predicate that is false for small values and true for large (or vice versa); you binary search for the boundary.
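As one illustration of "binary search on answer," the sketch below computes an integer square root. The function name integer_sqrt and the choice of problem are assumptions made for the example; the predicate mid * mid <= n is true for small candidates and false for large ones, and the search finds the boundary.

```python
def integer_sqrt(n):
    """Largest integer r with r * r <= n, for n >= 0."""
    left, right = 0, n                 # candidate answers, not array indices
    result = 0
    while left <= right:
        mid = (left + right) // 2
        if mid * mid <= n:
            result = mid               # mid is feasible; try something larger
            left = mid + 1
        else:
            right = mid - 1            # mid is too large
    return result

print(integer_sqrt(17))
```

The loop has the same shape as the last-occurrence search: record the candidate on success, then keep looking to the right.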
Python Built-ins: bisect
The bisect module provides binary search for sorted lists:
- bisect.bisect_left(arr, target): Leftmost index i such that arr[i] ≥ target. If target is present, this is the first occurrence. Inserting target at this index keeps the list sorted.
- bisect.bisect_right(arr, target) (or bisect.bisect): Index one past the last occurrence of target, i.e. the smallest i such that arr[i] > target. So last index of target = bisect_right(arr, target) - 1 (if present).
To check existence and get first index: i = bisect_left(arr, target); if i < len(arr) and arr[i] == target, then index is i. Count of target: bisect_right(arr, target) - bisect_left(arr, target).
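The existence check, first/last index, and count described above can be combined in one small helper. This is a sketch; the name first_last_count and the (-1, -1, 0) convention for a missing target are assumptions for illustration.

```python
from bisect import bisect_left, bisect_right

def first_last_count(arr, target):
    """Return (first index, last index, count) of target in a sorted list."""
    lo = bisect_left(arr, target)      # first index with arr[i] >= target
    hi = bisect_right(arr, target)     # first index with arr[i] > target
    if lo == hi:                       # empty range: target is absent
        return -1, -1, 0
    return lo, hi - 1, hi - lo

print(first_last_count([1, 2, 2, 2, 5], 2))
print(first_last_count([1, 2, 2, 2, 5], 3))
```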
Clarify: "Is the array sorted? Are duplicates allowed? Return first index, last index, or any?" Then implement with left <= right, correct updates, and first_index/last_index if needed. Mention that for sorted arrays, binary search is O(log n) and optimal. If the problem is "insertion position" or "lower_bound," you can implement it or use bisect_left in Python. Be ready to derive why the loop terminates and why we need left <= right.
Practice Problems
- Find target in sorted array (return index or −1).
- Find first and last position of target in sorted array with duplicates.
- Search insert position: index where target would be inserted to keep order (same as bisect_left).
- Count occurrences of target in sorted array (last_index − first_index + 1 or bisect).
- Search in rotated sorted array (array is sorted then rotated; adapt the comparison logic).
Summary
- Binary search requires a sorted array. Maintain [left, right]; compare target with arr[mid]; discard left or right half; repeat until found or range empty.
- Time O(log n), space O(1) iterative / O(log n) recursive. Optimal for comparison-based search on sorted data.
- Use left <= right and mid = (left + right) // 2; update left = mid + 1 or right = mid - 1 so the range always shrinks.
- First occurrence: on match, search left (right = mid - 1). Last occurrence: on match, search right (left = mid + 1).
- Python: bisect_left / bisect_right for insertion position and range; use explicit loop when you need exact "first/last index" semantics or custom predicates.
6.3 Ternary Search
Introduction
Ternary search is a divide-and-conquer technique that splits the search space into three parts (instead of two like binary search) using two interior points, then discards one of the three segments. It is most useful for finding the maximum or minimum of a unimodal function: a function that first strictly increases and then strictly decreases (or the reverse). On a plain sorted array, ternary search is not better than binary search: each step keeps one-third of the range instead of one-half but costs two comparisons, so the worst case is about 2·log₃ n ≈ 1.26·log₂ n comparisons, which is still O(log n) but with a worse constant. Where ternary search shines is unimodal optimization: given a function f that has a single peak (or valley), ternary search can find that peak in O(log n) evaluations of f. This appears in problems like "find the peak in a bitonic array," "minimize a cost function," or "find the maximum in a sequence that first goes up then down."
Real-World Analogy
Imagine you're standing on a hill that goes up to a single peak and then down on the other side. You can't see the whole hill; you can only check the height at positions you walk to. To find the peak efficiently, you might check two points partway along the range: if the left point is higher than the right, the peak must be to the left of the right point, so you discard the right third. If the right point is higher, discard the left third. If they're equal (or you're close enough), you're near the peak. You keep narrowing the range until you've found the top. That's ternary search on a unimodal "height" function—each step throws away at least one-third of the remaining range.
Bitonic array arr = [1, 3, 8, 12, 9, 5, 2] (increases to 12, then decreases). We want the index of the maximum. Compare at indices mid1 and mid2: if arr[mid1] < arr[mid2], the peak is in the right two-thirds; if arr[mid1] > arr[mid2], the peak is in the left two-thirds. We narrow until we have a single candidate (the peak).
Formal Definition
Unimodal function: A function f on indices [0, n−1] is unimodal if there exists an index m such that f is strictly increasing on [0, m] and strictly decreasing on [m, n−1] (or the reverse—strictly decreasing then strictly increasing). So there is exactly one local maximum (or minimum). Ternary search: Maintain a range [left, right]. Choose two interior points mid1 = left + (right - left) // 3 and mid2 = right - (right - left) // 3. Compare f(mid1) and f(mid2). If unimodal and we want the maximum: if f(mid1) < f(mid2), the maximum cannot be in [left, mid1]; if f(mid1) > f(mid2), the maximum cannot be in [mid2, right]. Update the range and repeat until the range is small enough (e.g. one or two elements).
Why This Topic Matters
- Unimodal optimization: Many real-world and contest problems ask for the maximum (or minimum) of a function that has a single peak—e.g. bitonic array peak, minimize cost with a convex structure. Ternary search is the standard tool.
- Interviews: "Find peak in a bitonic array" or "find the index of the maximum in an array that increases then decreases" are classic. Knowing ternary search (or the equivalent "compare two points and discard one third") shows you understand divide-and-conquer beyond binary.
- Comparison with binary search: For sorted array lookup, binary search is strictly better (fewer comparisons, same O(log n)). For unimodal functions, ternary search is natural because there is no single "middle" comparison that tells you "which half"—you need two points to decide which third to discard.
Mental Model
You have a range [left, right] and a unimodal function (one peak). Pick two points inside the range: one in the left third and one in the right third. Compare their values. If the left point is lower than the right, the peak must be to the right of the left point (otherwise the function wouldn't go up toward the right point)—so discard the left third. If the left point is higher, the peak must be to the left of the right point—discard the right third. You always throw away at least one-third of the range, so after O(log n) steps the range collapses to the peak.
Step-by-Step Breakdown (Find Maximum of Unimodal Array)
- Initialize: left = 0, right = len(arr) - 1.
- Loop: While the range has more than three elements (right - left > 2), or while left < right with care: compute mid1 = left + (right - left) // 3, mid2 = right - (right - left) // 3 (so mid1 < mid2).
- Compare: If arr[mid1] < arr[mid2], the peak is in [mid1+1, right] (or at least not in [left, mid1]), set left = mid1 + 1. If arr[mid1] > arr[mid2], the peak is in [left, mid2−1], set right = mid2 - 1. If equal, the peak lies strictly between mid1 and mid2, so we can move either way (e.g. left = mid1 + 1).
- Termination: When the range is small (at most 3 elements), compare and return the index of the maximum.
ASCII Diagram
Unimodal (bitonic): [ 1, 3, 8, 12, 9, 5, 2 ] find index of max
Index: 0 1 2 3 4 5 6
left=0, right=6 → mid1 = 2, mid2 = 4
arr[2]=8, arr[4]=9 → 8 < 9 → peak in right part → left = 3
left=3, right=6 → mid1 = 4, mid2 = 5
arr[4]=9, arr[5]=5 → 9 > 5 → peak in left part → right = 4
left=3, right=4 → small range: max(arr[3], arr[4]) = 12 at index 3 → return 3
Python Implementation
Find Index of Maximum in Unimodal (Bitonic) Array
def ternary_search_max(arr):
    if not arr:
        return -1  # no valid index in an empty array
    left, right = 0, len(arr) - 1
    while right - left > 2:
        mid1 = left + (right - left) // 3
        mid2 = right - (right - left) // 3
        if arr[mid1] < arr[mid2]:
            left = mid1 + 1
        else:
            right = mid2 - 1
    # Range has at most 3 elements; find max index
    best = left
    for i in range(left + 1, right + 1):
        if arr[i] > arr[best]:
            best = i
    return best
Ternary Search on a Function (Find x that Maximizes f(x))
When the "array" is implicit—you have a function f and a continuous or integer range—evaluate f at mid1 and mid2 and narrow the range.
def ternary_search_func(f, left, right, eps=1e-9):
    """Find x in [left, right] that maximizes f(x). f is unimodal."""
    while right - left > eps:
        mid1 = left + (right - left) / 3
        mid2 = right - (right - left) / 3
        if f(mid1) < f(mid2):
            left = mid1
        else:
            right = mid2
    return (left + right) / 2
Line-by-Line Explanation (Array Version)
- mid1 = left + (right - left) // 3: One-third of the way from left.
- mid2 = right - (right - left) // 3: One-third from the right. So we have three segments: [left, mid1], (mid1, mid2), [mid2, right].
- arr[mid1] < arr[mid2]: On a unimodal array that increases then decreases, if the value at mid1 is less than at mid2, mid1 is still on the "increasing" side, so the peak is to the right of mid1. Discard [left, mid1] by setting left = mid1 + 1.
- else: arr[mid1] >= arr[mid2]. Then mid2 is past the peak; the peak is to the left of mid2. Set right = mid2 - 1.
- When right - left <= 2, we have at most 3 elements; a simple loop finds the maximum index. This avoids off-by-one issues at the end.
Time Complexity
Each iteration reduces the range to at most 2/3 of its size (we discard at least one-third). So the number of iterations k satisfies (2/3)^k · n ≤ 1, i.e. k ≈ log n / log(3/2) ≈ 1.71·log₂ n = O(log n). Each iteration does a constant number of comparisons and index computations. So time O(log n).
For the function version with real-valued domain and termination when right - left < eps, the number of iterations is O(log((right−left)/eps)).
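For the mirror-image problem (a function that decreases then increases), the same loop works with the comparison flipped. The sketch below is an assumption-laden illustration: the name ternary_search_min, the returned iteration counter, and the test function (x − 2)² are all chosen for the example.

```python
def ternary_search_min(f, left, right, eps=1e-9):
    """Find x in [left, right] that minimizes a unimodal f; also count iterations."""
    iterations = 0
    while right - left > eps:
        mid1 = left + (right - left) / 3
        mid2 = right - (right - left) / 3
        if f(mid1) > f(mid2):
            left = mid1            # minimum cannot lie in [left, mid1)
        else:
            right = mid2           # minimum cannot lie in (mid2, right]
        iterations += 1
    return (left + right) / 2, iterations

x, steps = ternary_search_min(lambda t: (t - 2) ** 2, -10.0, 10.0)
print(x, steps)
```

Each iteration keeps 2/3 of the interval, so the iteration count grows with log((right − left)/eps), as stated above; the result should be close to 2, within floating-point limits.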
Ternary search does more comparisons per step than binary search (2 vs 1). On a unimodal array each step keeps up to 2/3 of the range; in the sorted-lookup variant each step keeps 1/3 of the range but still costs two comparisons, about 1.26·log₂ n comparisons in total versus log₂ n for binary search. So for sorted array lookup, binary search is strictly better. Use ternary search only when the problem is unimodal optimization (find the peak/valley), not when you're just searching for a target value in a sorted list.
Space Complexity
Only a constant number of variables (left, right, mid1, mid2, best). O(1) auxiliary space.
Edge Cases
- Single element: left == right; skip the while loop, return left.
- Two elements: right - left == 1; we might enter the loop depending on condition (if we use right - left > 2, we don't); then the final loop compares both and returns the max index.
- Strictly increasing array: The "peak" is at the last index. Unimodal allows this (decreasing part is empty). Ternary search still converges to the last index.
- Strictly decreasing array: Peak at index 0. Similarly handled.
- Non-unimodal array: Ternary search can return a wrong (local) maximum. Ensure the array or function is unimodal before using.
Common Mistakes
- Using ternary search for sorted array lookup: For "find target in sorted array," use binary search. Ternary search is for finding the maximum/minimum of a unimodal function, not for equality check.
- Wrong mid1/mid2 or update: mid1 and mid2 must lie strictly between left and right, and we must discard a full third. Using mid1 = (2*left + right) // 3 and mid2 = (left + 2*right) // 3 is an equally valid choice (mid1 is identical; mid2 can land one position lower when right - left is not a multiple of 3). When arr[mid1] < arr[mid2], discard the left third (left = mid1 + 1); when arr[mid1] > arr[mid2], discard the right third (right = mid2 - 1).
- Infinite loop with small range: Ensure the loop condition (e.g. right - left > 2) guarantees the range shrinks, and handle the base case (small range) explicitly.
Assuming ternary search is "faster" than binary search because it divides into three. For sorted array search, binary search does fewer comparisons (1 per step) and halves the range (better reduction). Ternary is for unimodal optimization where one comparison is not enough to decide which half to keep.
Comparison: Binary vs Ternary
| Use case | Algorithm | Time |
|---|---|---|
| Find target in sorted array | Binary search | O(log n), 1 compare/step |
| Find max/min in unimodal array | Ternary search | O(log n), 2 compares/step |
| Minimize unimodal function f(x) | Ternary search on domain | O(log (range/eps)) |
For unimodal functions, ternary search is the standard O(log n) method. Alternative: if you can compute the derivative (or discrete difference), you could use binary search on the sign of the derivative to find where it crosses zero—equivalent to finding the peak. For arrays, "find peak in bitonic array" is the classic ternary search problem.
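The "binary search on the discrete difference" alternative can be sketched as below. This is an illustrative version, assuming a non-empty, strictly increasing-then-decreasing list and a hypothetical name peak_index; the sign of arr[mid + 1] − arr[mid] plays the role of the derivative.

```python
def peak_index(arr):
    """Index of the maximum of a non-empty, strictly unimodal (bitonic) list."""
    left, right = 0, len(arr) - 1
    while left < right:
        mid = (left + right) // 2      # mid < right, so mid + 1 is a valid index
        if arr[mid] < arr[mid + 1]:
            left = mid + 1             # still rising: peak is right of mid
        else:
            right = mid                # falling: peak is at mid or to its left
    return left

print(peak_index([1, 3, 8, 12, 9, 5, 2]))
```

This variant halves the range with one comparison per step, at the cost of requiring adjacent elements to be comparable (no plateaus).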
Pattern Recognition
Think "ternary" when: (1) you need the maximum or minimum of something, and (2) that something is unimodal (one peak or one valley). Keywords: "bitonic," "increases then decreases," "single peak," "minimize cost that first decreases then increases." If the problem is "find where this value is" in a sorted list, use binary search instead.
If asked "find the peak in an array that first increases then decreases," describe ternary search: divide the range into thirds, compare the two interior points, discard one third. State O(log n) time, O(1) space. Mention that for plain "find target in sorted array," binary search is preferred. You can implement the loop with mid1/mid2 and handle the small-range base case by scanning the few remaining elements for the max.
Practice Problems
- Find the index of the maximum in a bitonic (unimodal) array.
- Find the minimum of a unimodal function (decreases then increases) over an integer or real range.
- Peak Index in a Mountain Array (LeetCode-style: array increases then decreases, find peak index).
Summary
- Ternary search divides the range into three parts using two points (mid1, mid2); compare and discard one third. Used for unimodal optimization (find peak or valley), not for sorted array lookup.
- Unimodal = strictly increasing then strictly decreasing (or the reverse). One local maximum (or minimum).
- Time O(log n), space O(1). For sorted array search, binary search is better (fewer comparisons per step).
- Implementation: while range large, compute mid1 and mid2, compare arr[mid1] and arr[mid2], set left = mid1+1 or right = mid2−1; then handle small range by linear scan for max.
6.4 Exponential & Interpolation Search
Introduction
This topic covers two search variants that improve on standard binary search in specific settings: exponential search when the target is likely near the start or when the array is effectively unbounded, and interpolation search when the sorted data is roughly uniformly distributed. Exponential search finds a range by repeatedly doubling an index (1, 2, 4, 8, …) until the value at that index exceeds the target, then runs binary search within that range—giving O(log i) time where i is the target's index. Interpolation search estimates the position of the target using the value at the endpoints (assuming uniform spread), then narrows the range like binary search but with a smarter probe—average O(log log n) for uniform data, but O(n) worst case for skewed distributions. Both assume a sorted array; neither is a drop-in replacement for binary search in all cases, but they are useful tools when the problem structure matches.
Real-World Analogy
Exponential search: Like searching for a word in a dictionary when you suspect it's in the first few pages. Instead of opening to the middle, you flip 1 page, then 2, then 4, then 8—until you've passed the word. Then you binary search in the small range you just bounded. Interpolation search: Like estimating where "Newton" sits in an alphabetically sorted list by the letters: "N" is about 14/26 of the way through the alphabet, so you might open the book about 14/26 of the way through. If the names were uniformly spread, that gets you close in one shot; if not, you adjust. Both methods exploit extra structure (target near start, or uniform distribution) to reduce work.
Exponential: Sorted array of 1000 elements, target at index 5. Binary search does ~10 steps over [0, 999]. Exponential: check indices 1, 2, 4, 8—at 8 we exceed or match; then binary search in [4, 8] (or [0, 8]). Fewer steps because we quickly bound the range. Interpolation: Array of values 10, 20, 30, …, 1000 (uniform). Target 320. We estimate position ≈ (320−10)/(1000−10) × n ≈ 0.31n; probe there and narrow. Often very few steps when data is uniform.
Why This Topic Matters
- Exponential search: Used when the target index is small (e.g. search in an unbounded or very large sorted stream, or "find first 1" in a sorted bit array). Also the backbone of "binary search in a range we don't know yet"—find the range by doubling, then binary search.
- Interpolation search: In theory and in practice, when data is uniformly distributed, interpolation search can do better than binary search (O(log log n) average). Useful in specialized settings (e.g. numeric keys in a known range).
- Interviews: Less common than binary search, but "search in an unbounded sorted array" or "find position with minimal comparisons when target is near start" can lead to exponential search. Interpolation is sometimes mentioned as a "faster average case when data is uniform."
Exponential Search
Formal Definition
Exponential search: Given a sorted array A and target x, find the smallest power-of-two range that could contain x: start with index i = 1; while i < n and A[i] < x, double i (e.g. i = 2, 4, 8, …). Then run binary search in the range [i/2, min(i, n−1)]. If the target's index is k, we need O(log k) steps to bound the range and O(log (range size)) ≈ O(log k) for binary search, so total O(log k) where k is the index of the target (or the upper bound we reach).
Mental Model
You don't know where the target is, but you want to find a small range that contains it. Jump 1, 2, 4, 8, … until you've passed the target (or reached the end). That gives you an upper bound. The target must lie in the previous "jump" range. Then binary search inside that range.
Step-by-Step
- If arr[0] == target, return 0.
- Set i = 1. While i < n and arr[i] < target, set i *= 2 (exponential jump).
- We have bounded the target to the range [i/2, min(i, n−1)]. Run binary search in that range and return the result.
ASCII Diagram
Sorted array, target at index 5. i = 1, 2, 4, 8, ...
i=1: arr[1] < target → i=2
i=2: arr[2] < target → i=4
i=4: arr[4] < target → i=8
i=8: arr[8] > target or i >= n → stop. Range is [4, min(8,n-1)]
Binary search in [4, 8] → find target at 5.
Python Implementation
def exponential_search(arr, target):
    n = len(arr)
    if n == 0:
        return -1
    if arr[0] == target:
        return 0
    i = 1
    while i < n and arr[i] < target:
        i *= 2
    # Binary search in range [i//2, min(i, n-1)]
    left, right = i // 2, min(i, n - 1)
    while left <= right:
        mid = (left + right) // 2
        if arr[mid] == target:
            return mid
        if arr[mid] < target:
            left = mid + 1
        else:
            right = mid - 1
    return -1
Time and Space Complexity (Exponential)
Let k be the index of the target (or the first index where we exceed the target). We need O(log k) doublings to reach or pass k. The range for binary search is at most 2× the last jump, so O(log k) for binary search. Total O(log k). If the target is at the beginning, this is much better than O(log n). Worst case (target at end): O(log n). Space O(1).
Interpolation Search
Formal Definition
Interpolation search: Given a sorted array A and target x, estimate the position of x assuming values are uniformly distributed between A[left] and A[right]: pos = left + (x - A[left]) * (right - left) // (A[right] - A[left]). If A[pos] == x, return pos. If A[pos] < x, search [pos+1, right]; else search [left, pos−1]. Average case (uniform distribution): O(log log n). Worst case (e.g. values increase exponentially): O(n).
Mental Model
You have a sorted range and know the values at the endpoints. If the values were evenly spaced, where would the target sit? That estimated index is your probe. If the target is there, you're done; otherwise narrow to the left or right subrange and repeat. When data is close to uniform, each step reduces the range by a large factor (not just half), giving very few steps on average.
Step-by-Step
- Maintain [left, right]. If left > right or target < arr[left] or target > arr[right], return −1 (target cannot be in range).
- Compute pos = left + (target - arr[left]) * (right - left) // (arr[right] - arr[left]). Clamp pos to [left, right].
- If arr[pos] == target, return pos. If arr[pos] < target, set left = pos + 1; else right = pos - 1. Repeat.
Python Implementation
def interpolation_search(arr, target):
    left, right = 0, len(arr) - 1
    while left <= right:
        if arr[right] == arr[left]:
            if arr[left] == target:
                return left
            return -1
        pos = left + (target - arr[left]) * (right - left) // (arr[right] - arr[left])
        pos = max(left, min(pos, right))
        if arr[pos] == target:
            return pos
        if arr[pos] < target:
            left = pos + 1
        else:
            right = pos - 1
    return -1
Time and Space Complexity (Interpolation)
Average case (uniform distribution): Each probe reduces the range by a factor that depends on how close the estimate is; analysis gives O(log log n) expected comparisons. Worst case: When values are heavily skewed (e.g. 1, 2, 4, 8, …, 2^n), the estimate can be off and we might only eliminate one element per step → O(n). Space O(1).
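The gap between the uniform and skewed cases can be observed by counting probes. The sketch below assumes a hypothetical helper interpolation_probes (the same loop as the implementation above, plus a counter) and two illustrative integer data sets.

```python
def interpolation_probes(arr, target):
    """Return (index or -1, number of probes) for interpolation search."""
    left, right = 0, len(arr) - 1
    probes = 0
    while left <= right:
        if arr[right] == arr[left]:            # avoid division by zero
            probes += 1
            return (left if arr[left] == target else -1), probes
        pos = left + (target - arr[left]) * (right - left) // (arr[right] - arr[left])
        pos = max(left, min(pos, right))       # clamp the estimate into the range
        probes += 1
        if arr[pos] == target:
            return pos, probes
        if arr[pos] < target:
            left = pos + 1
        else:
            right = pos - 1
    return -1, probes

uniform = list(range(10, 10_010, 10))          # 10, 20, ..., 10000
skewed = [2 ** k for k in range(64)]           # 1, 2, 4, ..., 2**63

print(interpolation_probes(uniform, 320))
print(interpolation_probes(skewed, 2 ** 60))
```

On the evenly spaced list the first estimate lands on the target. On the powers of two the estimate consistently undershoots, so the search needs noticeably more probes than the at most 7 iterations binary search would use on 64 elements.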
Using interpolation search when the data is not roughly uniform (e.g. many duplicates, or geometric progression). Worst-case performance degrades to O(n). Also avoid division by zero when arr[right] == arr[left]—handle that case separately.
Comparison Table
| Algorithm | Best for | Time (typical) | Worst case |
|---|---|---|---|
| Binary search | General sorted array | O(log n) | O(log n) |
| Exponential search | Target near start, unbounded array | O(log k), k = index | O(log n) |
| Interpolation search | Uniformly distributed sorted data | O(log log n) avg | O(n) |
Use exponential search when the target is likely near the beginning (e.g. "find first occurrence of 1" in a sorted 0/1 array) or when the array is conceptually unbounded. Use interpolation search only when you have strong reason to believe the data is uniformly distributed; otherwise binary search is safer and predictable O(log n).
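The "find first occurrence of 1 in a sorted 0/1 array" case can be sketched by combining the doubling phase with bisect_left restricted to the bounded range. The function name first_one is an assumption for illustration; bisect_left's lo and hi arguments are part of the standard library.

```python
from bisect import bisect_left

def first_one(bits):
    """Index of the first 1 in a sorted 0/1 list, or -1 if there is none."""
    n = len(bits)
    if n == 0:
        return -1
    if bits[0] == 1:
        return 0
    i = 1
    while i < n and bits[i] == 0:      # doubling phase: 1, 2, 4, 8, ...
        i *= 2
    hi = min(i, n - 1)
    # bits[i // 2] is 0, so the first 1 (if any) lies in [i // 2, hi].
    pos = bisect_left(bits, 1, i // 2, hi + 1)
    return pos if pos < n and bits[pos] == 1 else -1

print(first_one([0, 0, 0, 1, 1]))
```

If the first 1 sits at index k, both phases touch only O(log k) elements, regardless of how long the list is.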
Edge Cases
- Exponential: Empty array; target at index 0 (handle before loop); target larger than all elements (i grows until i ≥ n, then binary search in [i/2, n−1]).
- Interpolation: arr[right] == arr[left] (avoid division by zero; check if target equals that value). Target outside [arr[left], arr[right]] (return −1). Duplicate values (formula can still give a valid index; duplicates may require scanning).
Pattern Recognition
Exponential: "Unbounded sorted array," "target likely near the start," "find range then search." Interpolation: "Sorted array with uniform spread," "numeric keys in a known range," "minimize comparisons when distribution is uniform." When in doubt, default to binary search.
For "search in an unbounded sorted array," describe exponential search: double the index until you pass the target, then binary search in the last range. State O(log k) where k is the target index. Interpolation search is less commonly asked; if mentioned, say it gives O(log log n) average for uniform data but O(n) worst case, and that binary search is usually the safe choice.
Practice Problems
- Search in an unbounded sorted array (exponential search to find range, then binary search).
- Find the first 1 in a sorted binary array (0s then 1s)—exponential search + first-occurrence binary.
- Implement interpolation search and compare with binary on uniformly distributed data.
Summary
- Exponential search: Bound the target by doubling index (1, 2, 4, 8, …); then binary search in that range. O(log k) where k is the target index. Best when target is near the start or array is unbounded.
- Interpolation search: Estimate position from value and endpoints; probe and narrow. O(log log n) average for uniform data, O(n) worst case. Use only when data is roughly uniformly distributed.
- Both assume a sorted array. For general sorted search, binary search remains the default; use exponential or interpolation when the problem structure matches.
6.5 Bubble Sort
Introduction
Bubble sort is a simple comparison-based sorting algorithm that repeatedly steps through the array, compares adjacent elements, and swaps them if they are in the wrong order. Each full pass "bubbles" the largest (or smallest) unsorted element to its correct position at the end (or beginning) of the segment. It is easy to understand and implement but inefficient for large data: time is O(n²) in the worst and average case, and O(n) in the best case (when the array is already sorted and we use an early-exit optimization). It is mainly used for teaching, small arrays, or when simplicity matters more than speed. Understanding bubble sort builds intuition for comparison-based sorting and sets the stage for faster algorithms like merge sort and quicksort (Topics 6.8–6.9).
Real-World Analogy
Imagine ordering a row of bottles by height. You walk left to right: if two adjacent bottles are out of order (shorter one on the right), you swap them. You keep doing full passes along the row. After one pass, the tallest bottle has "bubbled" to the right end. You repeat, ignoring the last position (already correct), until no swaps happen in a pass—then the row is sorted. That's bubble sort: repeatedly swap adjacent inversions until none remain.
Array [5, 2, 8, 1]. Pass 1: 5↔2 → [2,5,8,1]; 5<8 ok; 8↔1 → [2,5,1,8]. Largest (8) is at end. Pass 2: 2,5 ok; 5↔1 → [2,1,5,8] (the last position is already final, so it is not compared again). Pass 3: 2↔1 → [1,2,5,8]. No more swaps → sorted.
Formal Definition
Bubble sort: For i from 0 to n−2 (outer loop over passes), for j from 0 to n−2−i (inner loop; after pass i, the last i elements are already the largest and sorted), compare arr[j] and arr[j+1]. If arr[j] > arr[j+1], swap them. After pass i, the (i+1)-th largest element is in place at index n−1−i. Stability: When elements are equal, we do not swap (use >, not >=), so bubble sort is stable. In-place: O(1) extra space (aside from the array).
Why This Topic Matters
- Foundation: One of the first sorting algorithms taught. Builds the idea of "compare and swap" and "one element per pass in place."
- Stability: Bubble sort is stable when implemented with strict > (no swap on equal). Useful to contrast with non-stable sorts later.
- Interviews: Rarely asked to implement for production, but "explain bubble sort" or "sort with only adjacent swaps" (e.g. minimum adjacent swaps to sort) can appear. Knowing why it's O(n²) and that better sorts exist is expected.
Mental Model
Each pass scans the unsorted portion and pushes the maximum to the right boundary. After k passes, the k largest elements are in their final positions at the end. So the unsorted region shrinks from the right. Alternatively: "repeatedly fix adjacent inversions until there are none"—when no swap occurs in a pass, the array is sorted (useful for early termination).
Step-by-Step Breakdown
- Outer loop: For pass i = 0, 1, …, n−2 (we need at most n−1 passes; after n−1 passes, the smallest has bubbled to the front or the largest to the back).
- Inner loop: For j from 0 to n−2−i, compare arr[j] and arr[j+1]. If arr[j] > arr[j+1], swap.
- Early exit (optimization): If in a full pass no swap happened, the array is sorted; break out.
ASCII Diagram
Initial: [ 5, 2, 8, 1 ]
Pass 1 (i=0): j=0: 5>2 swap → [2,5,8,1]; j=1: 5<8; j=2: 8>1 swap → [2,5,1,8]
Pass 2 (i=1): j=0: 2<5; j=1: 5>1 swap → [2,1,5,8]
Pass 3 (i=2): j=0: 2>1 swap → [1,2,5,8]. That was the last of the n−1 = 3 passes → done.
Python Implementation
Standard (No Early Exit)
def bubble_sort(arr):
    n = len(arr)
    for i in range(n - 1):
        for j in range(n - 1 - i):
            if arr[j] > arr[j + 1]:
                arr[j], arr[j + 1] = arr[j + 1], arr[j]
With Early Termination
def bubble_sort_optimized(arr):
    n = len(arr)
    for i in range(n - 1):
        swapped = False
        for j in range(n - 1 - i):
            if arr[j] > arr[j + 1]:
                arr[j], arr[j + 1] = arr[j + 1], arr[j]
                swapped = True
        if not swapped:
            break
Line-by-Line Explanation
- for i in range(n - 1): We do at most n−1 passes. After pass i, the (i+1) largest elements are at indices n−1−i through n−1.
- for j in range(n - 1 - i): Inner loop only goes up to n−2−i so we compare adjacent pairs in the unsorted part; we don't touch the last i elements (already in place).
- if arr[j] > arr[j + 1]: Strict > keeps the sort stable (equal elements not swapped). Swap so the larger moves right.
Time Complexity
Worst case: Reverse order. Every pair is inverted. Number of comparisons = (n−1) + (n−2) + … + 1 = n(n−1)/2 = O(n²).
Best case: Already sorted. With early termination, one pass (n−1 comparisons), no swaps → O(n). Without early exit, still (n−1)+(n−2)+…+1 = O(n²).
Average case: Random order; about half the pairs may be inverted. Still on the order of n² comparisons → O(n²).
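The best-case and worst-case comparison counts above can be checked directly. The following is a sketch with a hypothetical helper bubble_sort_comparisons that sorts a copy of the input and counts adjacent comparisons; the early_exit flag switches between the two implementations.

```python
def bubble_sort_comparisons(arr, early_exit=True):
    """Bubble sort a copy of arr and return the number of comparisons made."""
    arr = list(arr)
    n = len(arr)
    comparisons = 0
    for i in range(n - 1):
        swapped = False
        for j in range(n - 1 - i):
            comparisons += 1
            if arr[j] > arr[j + 1]:
                arr[j], arr[j + 1] = arr[j + 1], arr[j]
                swapped = True
        if early_exit and not swapped:
            break
    return comparisons

n = 10
print(bubble_sort_comparisons(range(n)))                     # sorted input, early exit
print(bubble_sort_comparisons(range(n), early_exit=False))   # sorted input, no early exit
print(bubble_sort_comparisons(range(n, 0, -1)))              # reversed input
```

For n = 10 the three counts correspond to n − 1, n(n−1)/2, and n(n−1)/2 respectively.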
Space Complexity
Only a few variables (i, j, maybe swapped). Sorting is in-place. O(1) auxiliary space.
Edge Cases
- Empty or single element: range(n - 1) is empty for n = 0 and n = 1, so no pass runs and the array is unchanged (already sorted).
- Already sorted: With early exit, one pass then break. Without, still O(n²) comparisons.
- All equal: No swaps (we use >); one pass with early exit. Stable.
Common Mistakes
- Inner loop bound: Use range(n - 1 - i), not range(n - 1), so we don't re-scan the already-sorted tail (the result is correct either way, but the shrinking range is more efficient and is the standard definition).
- Using >= for the swap: That makes the sort unstable (equal elements may be reordered). Use > for stability.
Early termination (no-swap check) gives O(n) on already-sorted or nearly-sorted data. For general random data, bubble sort remains O(n²); for production use prefer O(n log n) sorts (merge sort, quicksort, or built-in sort).
Evolution: Bubble vs Better Sorts
| Algorithm | Time (avg/worst) | Stable |
|---|---|---|
| Bubble sort | O(n²) | Yes |
| Merge sort / Quick sort | O(n log n) | Merge yes, quick no (default) |
If asked to implement bubble sort, write the two nested loops with adjacent compare-and-swap and mention O(n²) time, O(1) space, stability. Add early termination for best case O(n). Acknowledge that in practice we use O(n log n) sorts or the language's built-in sort.
Practice Problems
- Implement bubble sort with and without early termination.
- Count the number of swaps bubble sort would do (equals number of inversions).
- Minimum adjacent swaps to sort an array (answer: inversion count).
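As a starting point for the second and third problems, here is a minimal sketch that compares the number of swaps bubble sort performs with a brute-force inversion count. Both function names are hypothetical helpers introduced for this illustration.

```python
def bubble_sort_swaps(arr):
    """Return the number of swaps bubble sort performs on a copy of arr."""
    a = list(arr)
    swaps = 0
    n = len(a)
    for i in range(n - 1):
        for j in range(n - 1 - i):
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
                swaps += 1
    return swaps


def count_inversions_brute(arr):
    """Count pairs (i, j) with i < j and arr[i] > arr[j] in O(n^2) time."""
    n = len(arr)
    return sum(1 for i in range(n) for j in range(i + 1, n) if arr[i] > arr[j])


data = [5, 2, 8, 1]
print(bubble_sort_swaps(data), count_inversions_brute(data))  # 4 4
```

Each adjacent swap removes exactly one inversion, which is why the two counts agree.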
Summary
- Bubble sort: Repeatedly compare adjacent elements and swap if out of order; each pass bubbles the largest to the right. O(n²) worst/average, O(n) best with early exit, O(1) space, stable (use >).
- Inner loop: j from 0 to n − 2 − i; swap when arr[j] > arr[j+1].
- Use for teaching or tiny arrays; prefer merge/quicksort or built-in sort for real use.
6.6 Selection Sort
Introduction
Selection sort works by repeatedly finding the minimum element in the unsorted portion of the array and swapping it to the front of that portion. After i passes, the first i positions hold the i smallest elements in sorted order; the rest is still unsorted. Unlike bubble sort, it does at most one swap per pass (the minimum into place), but it must still scan the whole unsorted part to find that minimum, so time remains O(n²) in all cases (best, average, worst). It is in-place (O(1) extra space) but not stable in the typical implementation: the swap sends the element that was at the front past everything in between, including any elements equal to it, which can change their relative order. Selection sort is easy to implement and performs at most n−1 swaps, far fewer than bubble sort's O(n²) in the worst case, which is useful when swaps cost much more than comparisons.
Real-World Analogy
Imagine sorting a row of cards by value. You scan the whole row, find the smallest card, and put it in the first position (swap with whatever was there). Then you scan from the second position to the end, find the smallest in that range, and put it in the second position. You repeat: "find minimum of the rest, put it next." Each pass places one element in its final position with one swap. That's selection sort—select the minimum, place it, repeat.
Array [64, 25, 12, 22, 11]. Pass 1: min in [0..4] is 11 at index 4 → swap with index 0 → [11, 25, 12, 22, 64]. Pass 2: min in [1..4] is 12 at index 2 → swap with index 1 → [11, 12, 25, 22, 64]. Pass 3: min in [2..4] is 22 at index 3 → swap with index 2 → [11, 12, 22, 25, 64]. Pass 4: min in [3..4] is 25 at index 3 → no swap needed. Sorted.
Formal Definition
Selection sort: For i from 0 to n−2: (1) Find the index min_idx of the minimum element in arr[i..n−1]. (2) Swap arr[i] with arr[min_idx]. After step i, arr[0..i] contains the i+1 smallest elements in sorted order. Stability: The standard implementation (swap minimum to front) is not stable: the element displaced from position i can land behind an element equal to it. In-place: O(1) extra space. Swaps: At most n−1 (one per pass, and none in any pass where the minimum is already at position i).
Why This Topic Matters
- Minimal swaps: When swapping is expensive (e.g. large records, external storage), selection sort minimizes the number of swaps (at most n−1), while still being simple.
- Contrast with bubble sort: Bubble sort fixes inversions with many adjacent swaps; selection sort does one "long-distance" swap per pass. Both O(n²), but selection sort has a fixed, small swap count.
- Interviews: "Implement selection sort" or "sort with minimum swaps" can come up. Knowing it's unstable and O(n²) in all cases is important.
Mental Model
Keep a "sorted region" at the front (initially empty). Each pass: look at the entire unsorted region, find the smallest element, and move it to the end of the sorted region (one swap). The sorted region grows by one element each time; after n−1 passes, the last element is automatically in place.
Step-by-Step Breakdown
- Outer loop: For i = 0 to n−2 (after n−1 passes, the first n−1 positions are correct; the last is the maximum).
- Find minimum: Set min_idx = i. For j from i+1 to n−1, if arr[j] < arr[min_idx], set min_idx = j.
- Swap: If min_idx != i, swap arr[i] and arr[min_idx].
ASCII Diagram
Initial: [ 64, 25, 12, 22, 11 ]
i=0: min in [0..4] at index 4 (11) → swap 64,11 → [11, 25, 12, 22, 64]
i=1: min in [1..4] at index 2 (12) → swap 25,12 → [11, 12, 25, 22, 64]
i=2: min in [2..4] at index 3 (22) → swap 25,22 → [11, 12, 22, 25, 64]
i=3: min in [3..4] at index 3 (25) → no swap. Done.
Python Implementation
def selection_sort(arr):
    n = len(arr)
    for i in range(n - 1):
        min_idx = i
        for j in range(i + 1, n):
            if arr[j] < arr[min_idx]:
                min_idx = j
        if min_idx != i:
            arr[i], arr[min_idx] = arr[min_idx], arr[i]
Variant—selection sort by maximum (place largest at end): For i from n−1 down to 1, find max in arr[0..i], swap to arr[i]. Same O(n²), same instability.
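The maximum-based variant described above might look like the following. This is a sketch under the same assumptions as the main implementation; the name selection_sort_max is introduced here for illustration.

```python
def selection_sort_max(arr):
    """Sort arr in place by repeatedly moving the largest element to the end."""
    for end in range(len(arr) - 1, 0, -1):
        max_idx = 0
        # Find the index of the largest element in arr[0..end].
        for j in range(1, end + 1):
            if arr[j] > arr[max_idx]:
                max_idx = j
        if max_idx != end:
            arr[end], arr[max_idx] = arr[max_idx], arr[end]


data = [64, 25, 12, 22, 11]
selection_sort_max(data)
print(data)  # [11, 12, 22, 25, 64]
```

Here the sorted region grows from the right instead of the left; the comparison count is unchanged.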
Line-by-Line Explanation
- for i in range(n - 1): We need n−1 passes; after that, the first n−1 elements are the n−1 smallest in order, so the last element is the largest.
- min_idx = i: Assume the element at i is the minimum of the unsorted part; update min_idx whenever we find a smaller one.
- for j in range(i + 1, n): Scan every element after i. If arr[j] < arr[min_idx], we found a smaller element, so min_idx = j.
- if min_idx != i: Only swap if the minimum wasn't already at i (avoids a redundant swap).
Time Complexity
All cases: The inner loop always runs (n−1−i) times for pass i. Total comparisons = (n−1) + (n−2) + … + 1 = n(n−1)/2 = O(n²). There is no early exit—we always scan the full unsorted portion to find the minimum. So best, average, and worst case are all O(n²).
Swaps: At most n−1 swaps (one per pass when min_idx ≠ i). So when comparisons are cheap but swaps are expensive, selection sort can be preferable to bubble sort.
Space Complexity
Only loop indices and min_idx. O(1) auxiliary space; in-place.
Edge Cases
- Empty or single element: range(n - 1) is empty; no iterations; array unchanged.
- Already sorted: Still O(n²) comparisons; each pass finds the minimum at i, so no swaps (min_idx == i every time).
- All equal: min_idx stays at i (we use <), so no swaps and the order of equals is preserved here. The algorithm is still classified as unstable, because stability must hold for every input and some inputs are reordered (e.g. [2a, 2b, 1] becomes [1, 2b, 2a]).
Common Mistakes
- Leaving i out of the min search: The minimum of the unsorted part can be at i itself, so start with min_idx = i and compare against j from i+1 onward.
- Forgetting the swap: Finding the minimum is useless if you don't swap it to position i. Always swap arr[i] and arr[min_idx] when they differ.
Claiming selection sort is stable. The classic implementation (swap min to front) is not stable: the swap sends the element at position i to min_idx, and if an element equal to it sits between those two positions, that pair ends up in reversed order. For example, [2a, 2b, 1] becomes [1, 2b, 2a]. For a stable O(n²) sort, use insertion sort (Topic 6.7).
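The instability is easy to observe by sorting records that carry a label alongside the key. The sketch below is the same algorithm as above, adapted to compare only the first field of each tuple; selection_sort_by_key and the sample records are hypothetical.

```python
def selection_sort_by_key(records):
    """Selection sort on (key, label) tuples, comparing keys only."""
    n = len(records)
    for i in range(n - 1):
        min_idx = i
        for j in range(i + 1, n):
            if records[j][0] < records[min_idx][0]:
                min_idx = j
        if min_idx != i:
            records[i], records[min_idx] = records[min_idx], records[i]


records = [(2, "a"), (2, "b"), (1, "c")]
selection_sort_by_key(records)
print(records)  # [(1, 'c'), (2, 'b'), (2, 'a')]
```

The two records with key 2 started in the order a, b and finished in the order b, a: the first pass swapped (2, "a") to the back, past its equal.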
Comparison: Selection vs Bubble
| Property | Selection sort | Bubble sort |
|---|---|---|
| Time (all cases) | O(n²) | O(n²) worst/avg; O(n) best with early exit |
| Swaps | At most n−1 | Up to O(n²) |
| Stable | No | Yes (with strict >) |
Selection sort minimizes swaps, which can matter when moving large objects. For general-purpose sorting, O(n log n) algorithms (merge sort, quicksort) or the language's built-in sort are preferred. Use selection sort when you need a simple in-place O(n²) sort and care about minimizing swap count.
Pattern Recognition
"Find the minimum (or maximum), put it in place, repeat"—that's selection sort. Useful when the problem asks for "minimum number of swaps to sort" (selection sort achieves at most n−1) or when implementing a simple sort from scratch.
If asked to implement selection sort: outer loop i from 0 to n−2, inner loop find min index in arr[i+1..n−1] (starting min_idx=i), then swap arr[i] and arr[min_idx]. State O(n²) for all cases, O(1) space, not stable, and at most n−1 swaps. Compare with bubble sort (many swaps, stable with care) and insertion sort (stable, good for nearly sorted).
Practice Problems
- Implement selection sort and count comparisons and swaps on a few inputs.
- Modify to sort by maximum (place largest at end) instead of minimum at front.
- Minimum number of swaps required to sort an array (when only "swap any two" is allowed: answer is n − (number of cycles in the permutation); when only "swap with minimum" is allowed, selection sort is optimal).
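For the "swap any two" case in the last problem, one possible sketch of the n − (number of cycles) idea follows. It assumes all values are distinct; min_swaps_to_sort is a hypothetical name.

```python
def min_swaps_to_sort(arr):
    """Minimum number of arbitrary swaps needed to sort arr.

    Assumes distinct values. Each cycle of length L in the permutation
    needs L - 1 swaps, so the total is n minus the number of cycles.
    """
    n = len(arr)
    # order[k] is the original index of the element that belongs at position k.
    order = sorted(range(n), key=lambda idx: arr[idx])
    visited = [False] * n
    swaps = 0
    for start in range(n):
        if visited[start]:
            continue
        cycle_len = 0
        k = start
        while not visited[k]:
            visited[k] = True
            k = order[k]
            cycle_len += 1
        swaps += cycle_len - 1
    return swaps


print(min_swaps_to_sort([64, 25, 12, 22, 11]))  # 3
```

On [64, 25, 12, 22, 11] the permutation has two cycles (one of length 2, one of length 3), giving 5 − 2 = 3 swaps, the same number selection sort performed in the trace earlier in this topic.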
Summary
- Selection sort: For each position i, find the minimum in arr[i..n−1] and swap it to i. O(n²) time (all cases), O(1) space, not stable, at most n−1 swaps.
- Inner loop: start with min_idx = i, scan [i+1, n−1] for anything smaller, then swap arr[i] and arr[min_idx] if different.
- Use when swap count must be minimal or for teaching; prefer O(n log n) sorts or built-in sort in practice.
6.7 Insertion Sort
Introduction
Insertion sort builds the sorted array one element at a time by repeatedly taking the next element from the unsorted portion and inserting it into its correct position among the already-sorted elements. The sorted region grows from the left: after processing index i, the subarray arr[0..i] is sorted. To insert arr[i], we compare it with elements to its left and shift larger ones right until we find the right spot. Worst and average case are O(n²) (many comparisons and shifts), but best case is O(n) when the array is already sorted—each element is compared once and no shifts. Insertion sort is stable (we insert after equal elements) and in-place (O(1) extra space). It is the algorithm of choice for small or nearly sorted data and is how many people sort cards in hand. It also underlies efficient algorithms for incremental sorting and for small subarrays in hybrid sorts (e.g. quicksort with insertion sort for small segments).
Real-World Analogy
Imagine sorting a hand of cards. You keep the cards you've already sorted in your left hand (or on the table). You pick the next card from the unsorted pile and insert it into the correct position in the sorted part—sliding larger cards to the right as you go. You repeat until all cards are in the sorted region. That's insertion sort: each new element is inserted into the already-sorted prefix.
Array [12, 11, 13, 5, 6]. Sorted prefix starts as [12]. Insert 11: 11 < 12, shift 12 right → [11, 12]. Insert 13: 13 > 12, no shift → [11, 12, 13]. Insert 5: 5 < 13,12,11, shift all right → [5, 11, 12, 13]. Insert 6: 6 < 13,12,11, 6 > 5, shift 13,12,11 right → [5, 6, 11, 12, 13]. Done.
Formal Definition
Insertion sort: For i from 1 to n−1: assume arr[0..i−1] is sorted. Set key = arr[i]. Shift elements arr[j] (j = i−1, i−2, …) one position right while arr[j] > key. Place key in the vacated position. After iteration i, arr[0..i] is sorted. Stability: We shift only when arr[j] > key (strict), so equal elements are not moved past the inserted one—stable. In-place: O(1) extra space.
Why This Topic Matters
- Best case O(n): When the array is already sorted (or nearly sorted), insertion sort does one comparison per element and little or no shifting—faster than selection sort, which always does O(n²) comparisons.
- Stable and in-place: Unlike selection sort, it is both stable and in-place (bubble sort shares these properties but typically does more work). Useful when stability is required and n is small.
- Practical use: Small arrays (e.g. n ≤ 10–50), nearly sorted data, or as the base case in merge sort / quicksort (sort small subarrays with insertion sort).
- Interviews: "Implement insertion sort," "sort a stream one element at a time," or "why is insertion sort good for nearly sorted data?" are common.
Mental Model
Maintain a sorted prefix [0..i−1]. Take arr[i] and "insert" it: walk left from i−1, shifting every element that is greater than the key one step right, until you find an element ≤ key (or reach the start). Put the key in the gap. The sorted prefix is now [0..i]. Repeat for i = 1, 2, …, n−1.
Step-by-Step Breakdown
- Outer loop: For i from 1 to n−1 (index 0 is trivially sorted).
- Save key: key = arr[i]. We will insert key into the sorted region [0..i−1].
- Shift and find position: Set j = i−1. While j ≥ 0 and arr[j] > key, set arr[j+1] = arr[j] and j = j−1. The loop stops when we hit an element ≤ key or run off the start.
- Place key: arr[j+1] = key (the vacated slot, which sits right after the last element ≤ key).
ASCII Diagram
Initial: [ 12, 11, 13, 5, 6 ]
i=1: key=11, shift 12 right → [ 11, 12, 13, 5, 6 ]
i=2: key=13, 13>12, no shift → [ 11, 12, 13, 5, 6 ]
i=3: key=5, shift 13,12,11 right → [ 5, 11, 12, 13, 6 ]
i=4: key=6, shift 13,12,11 right, 6>5 stop → [ 5, 6, 11, 12, 13 ]
Python Implementation
Standard (Shift in a while-loop)
def insertion_sort(arr):
    n = len(arr)
    for i in range(1, n):
        key = arr[i]
        j = i - 1
        while j >= 0 and arr[j] > key:
            arr[j + 1] = arr[j]
            j -= 1
        arr[j + 1] = key
Variant (Swap backward instead of shift)
Instead of shifting, swap the key backward until it is in order. Same O(n²), same stability if we use strict > when swapping.
def insertion_sort_swap(arr):
    for i in range(1, len(arr)):
        j = i
        while j > 0 and arr[j - 1] > arr[j]:
            arr[j - 1], arr[j] = arr[j], arr[j - 1]
            j -= 1
Line-by-Line Explanation (Standard Version)
- for i in range(1, n): We insert each element from index 1 onward into the sorted prefix [0..i−1].
- key = arr[i]; j = i - 1: We'll compare key with elements to its left; j is the index currently being compared.
- while j >= 0 and arr[j] > key: As long as the element at j is greater than key, it must move right to make room. We copy it to arr[j+1] and decrement j. Strict > ensures stability.
- arr[j + 1] = key: When the loop exits, j is either −1 (key is the smallest so far) or the index of the last element that is ≤ key. The insertion position is j+1.
Time Complexity
Worst case: Array in reverse order. Each insertion shifts all existing sorted elements. Comparisons and shifts: 1 + 2 + … + (n−1) = n(n−1)/2 = O(n²).
Best case: Already sorted. For each i, we compare key with arr[i−1] once and find it's not smaller, so no shifts. Total n−1 comparisons → O(n).
Average case: Random order; on average we shift about half of the sorted prefix per insertion → O(n²).
Space Complexity
Only variables i, j, and key. O(1) auxiliary space; in-place.
Edge Cases
- Empty or single element: range(1, n) is empty; array unchanged (already sorted).
- Already sorted: Best case; one comparison per element, no shifts; O(n).
- All equal: No element satisfies arr[j] > key, so no shifts; stable, O(n).
Common Mistakes
- Using >= in the condition: arr[j] >= key would shift equal elements, making the sort unstable. Use arr[j] > key.
- Forgetting to place key: After the while loop, the vacated position is arr[j+1]. Must assign arr[j+1] = key.
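Starting the outer loop at 0. Index 0 alone is already a sorted one-element prefix, so insertion starts from index 1. Starting at 0 does not break the result (j = −1, the while loop is skipped, and the key is written back where it was), but it is a wasted iteration and usually signals a misunderstanding of the invariant.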
Comparison: Insertion vs Bubble vs Selection
| Algorithm | Best | Worst/Avg | Stable |
|---|---|---|---|
| Insertion sort | O(n) | O(n²) | Yes |
| Bubble sort | O(n) with early exit | O(n²) | Yes |
| Selection sort | O(n²) | O(n²) | No |
For nearly sorted data, insertion sort is fast (O(n) best case). The insertion point can also be found with binary search in the sorted prefix—that reduces comparisons to O(n log n) but shifts remain O(n²), so overall still O(n²). Binary insertion is useful when comparisons are expensive. For small n, insertion sort often beats merge/quicksort due to low constant factors; many standard libraries use it for small subarrays.
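A minimal sketch of the binary-insertion idea, using the standard library's bisect module to locate the insertion point. The function name binary_insertion_sort is introduced here for illustration, and this is one possible formulation rather than a reference implementation.

```python
from bisect import bisect_right


def binary_insertion_sort(arr):
    """Insertion sort that finds each insertion point with binary search."""
    for i in range(1, len(arr)):
        key = arr[i]
        # bisect_right searches arr[0:i] and returns the position just after
        # any elements equal to key, which keeps the sort stable.
        pos = bisect_right(arr, key, 0, i)
        # Shift arr[pos..i-1] one step right, then drop key into the gap.
        arr[pos + 1:i + 1] = arr[pos:i]
        arr[pos] = key


data = [12, 11, 13, 5, 6]
binary_insertion_sort(data)
print(data)  # [5, 6, 11, 12, 13]
```

The search costs O(log i) comparisons per element, but the slice assignment still moves up to i elements, so the total work stays quadratic in the worst case.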
Pattern Recognition
"Maintain a sorted prefix; take the next element and insert it in the right place"—that's insertion sort. Good when data arrives one element at a time (online sorting), when the array is small or nearly sorted, or when you need a stable in-place O(n²) sort.
Implement insertion sort: for i from 1 to n−1, key = arr[i], shift elements to the right while arr[j] > key, then arr[j+1] = key. State O(n) best (already sorted), O(n²) worst/average, O(1) space, stable. Mention that it's the preferred simple sort for nearly sorted data and for small n in hybrid sorts.
Practice Problems
- Implement insertion sort and test on already-sorted, reverse-sorted, and random arrays.
- Count inversions using insertion sort (each time you shift, you're fixing an inversion).
- Binary insertion sort: use binary search to find the insertion position, then shift. Compare total operations.
Summary
- Insertion sort: For each i from 1 to n−1, insert arr[i] into the sorted prefix [0..i−1] by shifting larger elements right and placing the key. O(n) best, O(n²) worst/average, O(1) space, stable.
- Use strict
arr[j] > keyfor stability; place key atarr[j+1]after the while loop. - Best of the simple O(n²) sorts for nearly sorted or small data; used as base case in many O(n log n) sorts.
6.8 Merge Sort
Introduction
Merge sort is a divide-and-conquer sorting algorithm: it splits the array into two halves, recursively sorts each half (using merge sort), then merges the two sorted halves into one sorted array. The merge step takes two sorted subarrays and combines them in linear time by repeatedly taking the smaller of the two front elements. Because we always divide in half and merge in O(n), the recurrence is T(n) = 2T(n/2) + O(n), which gives O(n log n) time in all cases (best, average, worst). Merge sort uses O(n) extra space for temporary arrays (or one shared temp array) and O(log n) stack space for recursion. It is stable when we take the left element when equal during merge. Merge sort is the go-to when you need guaranteed O(n log n), stability, or when sorting linked lists (where merge is natural and random access is not).
Real-World Analogy
Imagine sorting a deck of cards by splitting it in half, giving each half to a friend to sort (they do the same—split and hand off until they have one card each), then merging the two sorted piles: you always look at the top of each pile and take the smaller card, placing it face-down on the result. When one pile is empty, you put the rest of the other pile on the result. That's merge sort: divide until trivial (one element), then merge sorted halves.
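Array [38, 27, 43, 3, 9, 82, 10]. Split: [38, 27, 43, 3] and [9, 82, 10] (this is the split the index-based version makes with mid = (low + high) // 2; the slicing version with mid = n // 2 puts the extra element in the right half instead, and either choice is fine). Recursively sort: [3, 27, 38, 43] and [9, 10, 82]. Merge: compare 3 and 9 → take 3; 27 and 9 → take 9; 27 and 10 → take 10; 27 and 82 → take 27; … → [3, 9, 10, 27, 38, 43, 82].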
Formal Definition
Merge sort: If n ≤ 1, return (already sorted). Otherwise: (1) Divide: mid = n//2; left = arr[0..mid−1], right = arr[mid..n−1]. (2) Conquer: left = merge_sort(left), right = merge_sort(right). (3) Merge: Merge the two sorted sequences left and right into one sorted sequence by repeatedly comparing the front elements and taking the smaller. Stability: When left[i] == right[j], take left[i] first → stable. Space: O(n) for the merged result (and recursion stack O(log n)).
Why This Topic Matters
- Guaranteed O(n log n): Unlike quicksort, merge sort has no bad pivot; it always does O(n log n) comparisons and O(n log n) work. Predictable performance.
- Stable: Important when sorting by one key then another (e.g. sort by name, then stably by age: people of the same age remain in name order).
- Linked lists and external sorting: Merge sort works well on linked lists (no random access needed) and is the basis for external sorting (merge sorted chunks from disk).
- Interviews: "Implement merge sort," "merge two sorted arrays," "count inversions" (using merge step), "sort linked list."
Mental Model
Divide the array in half until each piece has 0 or 1 element (trivially sorted). Then merge adjacent sorted pieces: two pointers at the start of each piece, take the smaller, advance that pointer, until both are exhausted. The merge of two arrays of size n/2 takes O(n). The recursion tree has log n levels and O(n) work per level → O(n log n).
Step-by-Step Breakdown
- Base case: If len(arr) <= 1, return the array (already sorted).
- Divide: mid = len(arr) // 2; left = arr[:mid], right = arr[mid:].
- Conquer: left = merge_sort(left), right = merge_sort(right).
- Merge: Create result list. While both left and right are non-empty, compare left[0] and right[0]; append the smaller to result and remove it from that list. Append the remainder of the non-empty list to result. Return result.
ASCII Diagram
Recursion (split):
                 [38, 27, 43, 3, 9, 82, 10]
                  /                      \
          [38,27,43,3]                [9,82,10]
           /        \                  /     \
       [38,27]     [43,3]          [9,82]    [10]
        /   \       /   \           /  \       |
     [38]  [27]  [43]  [3]       [9]  [82]   [10]
Merge (bottom-up): [27,38] [3,43] → [3,27,38,43]; [9,82] [10] → [9,10,82]
Then: [3,27,38,43] and [9,10,82] → [3,9,10,27,38,43,82]
Python Implementation
Recursive (New Lists)
def merge_sort(arr):
    if len(arr) <= 1:
        return arr
    mid = len(arr) // 2
    left = merge_sort(arr[:mid])
    right = merge_sort(arr[mid:])
    return merge(left, right)


def merge(left, right):
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:  # <= for stability
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    result.extend(left[i:])
    result.extend(right[j:])
    return result
Index-Based (One Temp Array)
To avoid creating many small lists, use indices and one temporary array of size n. Merge arr[low..mid] and arr[mid+1..high] into temp, then copy back to arr[low..high].
def merge_sort_inplace(arr):
    temp = [0] * len(arr)

    def merge_segments(lo, mid, hi):
        i, j, k = lo, mid + 1, lo
        while i <= mid and j <= hi:
            if arr[i] <= arr[j]:
                temp[k] = arr[i]
                i += 1
            else:
                temp[k] = arr[j]
                j += 1
            k += 1
        while i <= mid:
            temp[k] = arr[i]
            i += 1
            k += 1
        while j <= hi:
            temp[k] = arr[j]
            j += 1
            k += 1
        for i in range(lo, hi + 1):
            arr[i] = temp[i]

    def sort(low, high):
        if low >= high:
            return
        mid = (low + high) // 2
        sort(low, mid)
        sort(mid + 1, high)
        merge_segments(low, mid, high)

    sort(0, len(arr) - 1)
The list-slicing version above is simpler but uses O(n) space for copies at each level. The index-based version uses one O(n) temp array; the array is modified in place, but auxiliary space is still O(n).
Line-by-Line Explanation (Merge Function)
- if left[i] <= right[j]: We take from the left when it's less than or equal to the right. Taking left when equal preserves the original order of equal elements (stability).
- result.extend(left[i:]): After one list is exhausted, append the rest of the other. Only one of left[i:] or right[j:] is non-empty.
Time Complexity
Let T(n) be the time for n elements. We split into two halves of size n/2, sort each (2 · T(n/2)), and merge in O(n). So T(n) = 2T(n/2) + O(n). By the Master Theorem (case 2: a=2, b=2, f(n)=Θ(n), n^(log_b a)=n), T(n) = Θ(n log n). This holds for best, average, and worst case—merge sort always does the same divide and merge structure. O(n log n) comparisons and O(n log n) moves.
Space Complexity
Recursive list-slicing version: Each call allocates copies of its two halves and a merged result, all proportional to the size of its input. Calls along one root-to-leaf path are alive at the same time, but their sizes halve at each level (n + n/2 + n/4 + … < 2n), and the second half of a call only starts after the first half has returned and released its temporaries. So the extra space in use at any moment is O(n), plus O(log n) for the recursion stack. Index-based with one temp: One temp array of size n plus O(log n) stack → O(n).
Edge Cases
- Empty or single element: Base case returns immediately; no merge needed.
- Two elements: Split into [a] and [b], merge → sorted.
- Already sorted: Still O(n log n); merge sort doesn't adapt to existing order (unlike insertion sort).
Common Mistakes
- Using < instead of <= in merge: For stability we must take the left element when equal. If we use <, we take the right when equal, which can reorder equal elements.
- Off-by-one in index-based merge: Use mid = (low + high) // 2, left segment [low..mid], right segment [mid+1..high]. Merge into temp, then copy back to arr[low..high].
Assuming "in-place" means O(1) extra space. Standard merge sort uses O(n) extra space for merging. True in-place merge in O(1) space exists but is complex and slower in practice; usually "in-place merge sort" means the array is modified using one O(n) temp buffer.
Comparison: Merge Sort vs Quick Sort
| Property | Merge sort | Quick sort |
|---|---|---|
| Time (worst) | O(n log n) | O(n²) |
| Time (avg) | O(n log n) | O(n log n) |
| Space | O(n) | O(log n) typical |
| Stable | Yes | No (default) |
For small subarrays (e.g. n ≤ 15–20), use insertion sort instead of recursing down to size 1. This reduces constant factors and stack depth. Many production merge sorts use this hybrid. Merge sort is also ideal for linked lists: finding the midpoint is O(n) (slow/fast pointers), merging is O(n), and no extra array is needed if you merge by rewiring pointers.
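The linked-list case can be sketched as follows. ListNode, from_list, and to_list are hypothetical helpers defined only for this example; the split uses slow/fast pointers and the merge rewires next pointers instead of allocating a buffer.

```python
class ListNode:
    """Minimal singly linked list node (hypothetical helper for this sketch)."""

    def __init__(self, val, next=None):
        self.val = val
        self.next = next


def merge_sort_list(head):
    """Sort a singly linked list and return the new head."""
    if head is None or head.next is None:
        return head
    # Find the middle with slow/fast pointers, then cut the list in two.
    slow, fast = head, head.next
    while fast is not None and fast.next is not None:
        slow = slow.next
        fast = fast.next.next
    right_head = slow.next
    slow.next = None
    left = merge_sort_list(head)
    right = merge_sort_list(right_head)
    # Merge by rewiring next pointers; <= keeps equal values in original order.
    dummy = tail = ListNode(None)
    while left is not None and right is not None:
        if left.val <= right.val:
            tail.next, left = left, left.next
        else:
            tail.next, right = right, right.next
        tail = tail.next
    tail.next = left if left is not None else right
    return dummy.next


def from_list(values):
    """Build a linked list from a Python list."""
    head = None
    for v in reversed(values):
        head = ListNode(v, head)
    return head


def to_list(head):
    """Collect linked-list values into a Python list."""
    out = []
    while head is not None:
        out.append(head.val)
        head = head.next
    return out


print(to_list(merge_sort_list(from_list([38, 27, 43, 3, 9, 82, 10]))))
# [3, 9, 10, 27, 38, 43, 82]
```

The only extra memory is the dummy node per merge and the O(log n) recursion stack.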
Pattern Recognition
"Divide in half, sort each half, merge"—that's merge sort. Same pattern: merge two sorted arrays (two pointers), count inversions (count when taking from right in merge), sort linked list (split at mid, merge).
Implement merge sort: base case len ≤ 1; split at mid; recurse on left and right; merge by comparing fronts and appending the smaller (use ≤ for stability). State O(n log n) time, O(n) space, stable. For "merge two sorted arrays," use the same merge logic without the recursion. For "count inversions," in the merge step when you take an element from the right half, add (remaining length of left) to the inversion count.
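The inversion-counting variant mentioned above can be sketched like this. The function name count_inversions and its (sorted copy, count) return shape are choices made for this illustration.

```python
def count_inversions(arr):
    """Return (sorted_copy, inversion_count) using merge sort."""
    if len(arr) <= 1:
        return list(arr), 0
    mid = len(arr) // 2
    left, inv_left = count_inversions(arr[:mid])
    right, inv_right = count_inversions(arr[mid:])
    merged = []
    inversions = inv_left + inv_right
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            # right[j] is smaller than every remaining left element,
            # so it forms an inversion with each of them.
            inversions += len(left) - i
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, inversions


print(count_inversions([5, 2, 8, 1]))  # ([1, 2, 5, 8], 4)
```

The four inversions in [5, 2, 8, 1] are (5, 2), (5, 1), (2, 1), and (8, 1).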
Practice Problems
- Implement merge sort recursively and with an index-based in-place style.
- Merge two sorted arrays (or merge two sorted halves of one array) in O(n) time.
- Count inversions in an array using merge sort (whenever the merge takes an element from the right half, add the number of elements still remaining in the left half).
- Sort a linked list using merge sort (find mid with slow/fast pointers, merge by rewiring).
Summary
- Merge sort: Divide array in half, recursively sort halves, merge two sorted halves. O(n log n) time (all cases), O(n) extra space, stable (take left when equal in merge).
- Merge step: two pointers, append smaller (or left when equal), then append remainder.
- Use when you need guaranteed O(n log n), stability, or sorting linked lists; prefer quicksort for average-case in-place when stability isn't required.
6.9 Quick Sort
Introduction
Quick sort is a divide-and-conquer algorithm that works by choosing a pivot element, partitioning the array so that all elements smaller than the pivot are to its left and all larger are to its right, then recursively sorting the left and right subarrays. The pivot ends up in its final sorted position. Average-case time is O(n log n), but worst case is O(n²) when the pivot is always the smallest or largest (e.g. already sorted array with last element as pivot). Quick sort is typically in-place (O(log n) stack space for recursion) and is not stable in the standard implementation. It is widely used in practice because of good average performance, cache friendliness, and the fact that random or median-of-three pivot selection makes worst case rare. Many language runtimes use a quicksort variant (often with insertion sort for small subarrays).
Real-World Analogy
Imagine sorting a stack of papers by picking one as a "reference" (pivot)—say the last one. You go through the rest and put everything smaller than that reference in one pile and everything larger in another. The reference goes in the middle. You then sort each pile the same way (pick a pivot, split). No merging step—the pivot is already in place. That's quicksort: partition around a pivot, then recurse on the two sides.
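Array [10, 80, 30, 90, 40, 50, 70], pivot = 70 (last). Partition: smaller than 70 → 10, 30, 40, 50; larger → 80, 90; pivot between them. Conceptually that gives [10, 30, 40, 50 | 70 | 80, 90]; an in-place partition does not sort the two sides, and the Lomuto trace in the ASCII Diagram leaves the larger side as [90, 80]. Recursively sort each side. No merge: pivot 70 is already in its final position.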
Formal Definition
Quick sort: (1) Choose a pivot (e.g. arr[high]). (2) Partition: Rearrange so that elements < pivot are in the left segment, elements > pivot in the right segment, and the pivot is between them (or at a fixed index). (3) Recursively quick sort the left segment and the right segment. Partition invariant: After partition, pivot is in its final position; no merge step. Stability: Standard partition (Lomuto or Hoare) is not stable. In-place: Partition can be done with O(1) extra space; recursion uses O(log n) stack in the average case.
Why This Topic Matters
- Practical default: Many standard libraries use quicksort (or a hybrid) for sorting because of its average O(n log n) time, small constants, and in-place operation.
- Partition as a building block: The partition step is reused in "quickselect" (find kth smallest without fully sorting), "Dutch national flag," and many interview problems.
- Worst case and pivot choice: Understanding why sorted input can give O(n²) with last-element pivot leads to random pivot or median-of-three, making worst case unlikely.
- Interviews: "Implement quicksort," "partition an array," "find the kth largest element" (quickselect).
Mental Model
Pick a pivot. Walk through the array and group "small" and "large" around it so the pivot lands in its final position. Now the array is split into "left of pivot" and "right of pivot": both are unsorted, but every left element ≤ pivot ≤ every right element. Sort the left and right with the same process. There is no merge: when the recursion returns, the whole segment is sorted because the pivot is already in place.
Step-by-Step Breakdown
- Base case: If the segment has 0 or 1 element, return (already sorted).
- Choose pivot: Often the last element (or first, or random, or median-of-three). Swap pivot to the end (or keep index) for easier partitioning.
- Partition: Maintain a "small" region. Scan elements; when you find one < pivot, extend the small region and put the element there. At the end, place the pivot after the small region. Return the pivot's index.
- Recurse: Quick sort arr[low..pivot_idx−1] and arr[pivot_idx+1..high].
ASCII Diagram
Lomuto partition (pivot = last). arr = [10, 80, 30, 90, 40, 50, 70], pivot=70
i = index of last "small" element (init -1). j scans 0..5.
j=0: 10<70 → i=0, swap arr[0],arr[0] (no change): [10, 80, 30, 90, 40, 50, 70]
j=1: 80>70 skip
j=2: 30<70 → i=1, swap arr[1],arr[2]: [10, 30, 80, 90, 40, 50, 70]
j=3: 90>70 skip
j=4: 40<70 → i=2, swap arr[2],arr[4]: [10, 30, 40, 90, 80, 50, 70]
j=5: 50<70 → i=3, swap arr[3],arr[5]: [10, 30, 40, 50, 80, 90, 70]
After loop: swap pivot arr[6] with arr[i+1]=arr[4] → [10, 30, 40, 50, 70, 90, 80]. Pivot index = 4.
Python Implementation
Lomuto Partition (Pivot = Last)
def partition_lomuto(arr, low, high):
    pivot = arr[high]
    i = low - 1
    for j in range(low, high):
        if arr[j] <= pivot:
            i += 1
            arr[i], arr[j] = arr[j], arr[i]
    arr[i + 1], arr[high] = arr[high], arr[i + 1]
    return i + 1


def quicksort(arr, low, high):
    if low < high:
        pi = partition_lomuto(arr, low, high)
        quicksort(arr, low, pi - 1)
        quicksort(arr, pi + 1, high)

# Call: quicksort(arr, 0, len(arr) - 1)
Wrapper and Random Pivot (Avoid Worst Case)
import random


def quicksort_random(arr, low, high):
    if low < high:
        rand = random.randint(low, high)
        arr[rand], arr[high] = arr[high], arr[rand]
        pi = partition_lomuto(arr, low, high)
        quicksort_random(arr, low, pi - 1)
        quicksort_random(arr, pi + 1, high)
Hoare Partition (Two Pointers)
Two pointers move inward from the left and right ends; swap when the left pointer finds an element ≥ pivot and the right pointer finds one ≤ pivot; stop when they cross. Pivot can be first or middle. Typically does fewer swaps than Lomuto. The pivot does not necessarily end at the returned index p, so recurse on [low..p] and [p+1..high] rather than skipping p.
def partition_hoare(arr, low, high):
    pivot = arr[low]
    left, right = low, high
    while True:
        while left <= right and arr[left] < pivot:
            left += 1
        while left <= right and arr[right] > pivot:
            right -= 1
        if left >= right:
            return right
        arr[left], arr[right] = arr[right], arr[left]
        left += 1
        right -= 1
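The Hoare partition returns a split point and not the pivot's final index, so the driver needs different recursion bounds than the Lomuto version. The sketch below repeats the partition function so that it runs on its own; the driver name quicksort_hoare is an assumption chosen for illustration.

```python
def partition_hoare(arr, low, high):
    # Same logic as the Hoare partition above: pivot is the first element.
    pivot = arr[low]
    left, right = low, high
    while True:
        while left <= right and arr[left] < pivot:
            left += 1
        while left <= right and arr[right] > pivot:
            right -= 1
        if left >= right:
            return right  # split point, not necessarily the pivot's index
        arr[left], arr[right] = arr[right], arr[left]
        left += 1
        right -= 1


def quicksort_hoare(arr, low, high):
    # Hypothetical driver: recurse on [low, p] and [p + 1, high], not p - 1.
    if low < high:
        p = partition_hoare(arr, low, high)
        quicksort_hoare(arr, low, p)
        quicksort_hoare(arr, p + 1, high)


data = [10, 80, 30, 90, 40, 50, 70]
quicksort_hoare(data, 0, len(data) - 1)
```

Because the split point always satisfies low ≤ p < high for this partition, both recursive calls work on strictly smaller ranges and the recursion terminates.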
Line-by-Line Explanation (Lomuto)
- i = low - 1: The region arr[low..i] will contain elements ≤ pivot. Initially empty (i is "before" low).
- if arr[j] <= pivot: We use ≤ so that elements equal to the pivot can go either side; putting them in the small region is fine. Then we extend the small region (i += 1) and swap arr[i] with arr[j].
- arr[i + 1], arr[high] = arr[high], arr[i + 1]: After the loop, arr[low..i] ≤ pivot and arr[i+1..high−1] > pivot. So the pivot (at high) should go at index i+1. Swap and return i+1 as the pivot index.
Time Complexity
Best case: Pivot is always near the middle (each partition splits roughly in half). T(n) = 2T(n/2) + O(n) → O(n log n).
Average case: Random pivot (or random input). On average the pivot divides the array in a constant fraction; recurrence yields O(n log n).
Worst case: Pivot is always the smallest or largest (e.g. sorted array, pivot = last). One segment has n−1 elements, the other 0. T(n) = T(n−1) + O(n) → O(n²). Randomizing the pivot (or choosing median-of-three) makes this rare in practice.
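To make the gap between the worst and average cases concrete, the sketch below counts element comparisons made by Lomuto quicksort on an already-sorted input, once with the last element as pivot and once with a random pivot. The function name count_comparisons and the explicit stack (used only so that the deep worst-case recursion does not hit Python's recursion limit) are assumptions for this illustration.

```python
import random


def count_comparisons(arr, randomize):
    """Sort a copy of arr with Lomuto quicksort; return the number of comparisons."""
    a = list(arr)
    comparisons = 0
    stack = [(0, len(a) - 1)]  # explicit stack instead of recursion
    while stack:
        low, high = stack.pop()
        if low >= high:
            continue
        if randomize:
            r = random.randint(low, high)  # inclusive on both ends
            a[r], a[high] = a[high], a[r]
        pivot = a[high]
        i = low - 1
        for j in range(low, high):
            comparisons += 1
            if a[j] <= pivot:
                i += 1
                a[i], a[j] = a[j], a[i]
        a[i + 1], a[high] = a[high], a[i + 1]
        stack.append((low, i))        # left part: [low, pi - 1]
        stack.append((i + 2, high))   # right part: [pi + 1, high]
    return comparisons


n = 500
sorted_input = list(range(n))
worst = count_comparisons(sorted_input, randomize=False)   # exactly n * (n - 1) // 2
typical = count_comparisons(sorted_input, randomize=True)  # varies from run to run
```

With the last element as pivot, every partition of a sorted segment of size m makes m − 1 comparisons and leaves a segment of size m − 1, which sums to n(n − 1)/2. The randomized count is far smaller, in line with the O(n log n) average.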
Space Complexity
Partition uses O(1) extra space. Recursion depth: best/average O(log n), worst O(n) (unbalanced splits). So O(log n) average stack space, O(n) worst.
Edge Cases
- Empty or single element: low >= high → return; no partition.
- All equal: Every element ≤ pivot; Lomuto puts all in the small region, pivot at end. One segment has n−1, the other 0 → O(n²) unless we optimize (e.g. three-way partition).
- Already sorted (pivot = last): Each partition puts pivot at the end; left segment has n−1 elements → O(n²). Use random pivot to avoid.
Common Mistakes
- Including pivot in the partition loop: In Lomuto we iterate j from low to high-1 and keep pivot at high until the final swap. Don't compare or move the pivot during the loop.
- Wrong recursion bounds: After partition, pivot is at index pi. Recurse on [low, pi−1] and [pi+1, high]; do not include pi in either subarray.
Using the last element as pivot on an already-sorted array gives worst-case O(n²). Always randomize the pivot (swap a random element to the end before partitioning) or use median-of-three when implementing for production or interviews.
Comparison: Quick Sort vs Merge Sort
| Property | Quick sort | Merge sort |
|---|---|---|
| Worst time | O(n²) | O(n log n) |
| Avg time | O(n log n) | O(n log n) |
| Space | O(log n) avg | O(n) |
| Stable | No | Yes |
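Random pivot or median-of-three (compare first, middle, last; use median as pivot) avoids the O(n²) behavior on sorted or nearly sorted data, although adversarial inputs can still defeat median-of-three. For segments smaller than a threshold (e.g. 10–20), use insertion sort to reduce recursion overhead. Three-way partition (elements equal to pivot in the middle) never recurses on the equal elements, so an array of all-equal keys is sorted in O(n); more generally it helps whenever there are many duplicates.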
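Of the optimizations above, three-way partitioning is the least obvious to write down. The following is a minimal sketch, assuming a random pivot and the Dutch-national-flag style of partitioning; the name quicksort_3way and the pointer names lt, i, gt are illustrative choices, not taken from the text.

```python
import random


def quicksort_3way(arr, low, high):
    """In-place quicksort that groups keys equal to the pivot in the middle."""
    if low >= high:
        return
    pivot = arr[random.randint(low, high)]
    lt, i, gt = low, low, high
    # Invariant: arr[low..lt-1] < pivot, arr[lt..i-1] == pivot, arr[gt+1..high] > pivot
    while i <= gt:
        if arr[i] < pivot:
            arr[lt], arr[i] = arr[i], arr[lt]
            lt += 1
            i += 1
        elif arr[i] > pivot:
            arr[i], arr[gt] = arr[gt], arr[i]
            gt -= 1  # do not advance i: the swapped-in element is unexamined
        else:
            i += 1
    # Elements in arr[lt..gt] equal the pivot and are already in place.
    quicksort_3way(arr, low, lt - 1)
    quicksort_3way(arr, gt + 1, high)
```

On an array where every key is equal, the first partition pass classifies everything as "equal" and both recursive calls receive empty ranges, which is where the O(n) behavior comes from.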
Pattern Recognition
"Choose pivot, partition, recurse on both sides"—that's quicksort. The same partition idea: "reorder so that all elements with property X are before those without" (e.g. move zeros to the end), or "find kth smallest" (quickselect: partition once; if pivot index is k, done; else recurse on left or right).
Implement partition (Lomuto or Hoare), then quicksort that recurses on [low, pi−1] and [pi+1, high]. State O(n log n) average, O(n²) worst, O(log n) space average. Mention random pivot to avoid worst case. For "kth largest," use quickselect: partition and recurse on the side that contains the kth position (or use the pivot index to decide).
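The quickselect idea described above can be sketched as follows. This is one possible formulation under stated assumptions: it works on a copy, uses a random pivot with Lomuto partitioning, and replaces recursion with a loop that narrows [low, high]. The names quickselect and kth_largest are hypothetical.

```python
import random


def quickselect(arr, k):
    """Return the k-th smallest element of arr (k is 0-based); arr is not modified."""
    if not 0 <= k < len(arr):
        raise IndexError("k out of range")
    a = list(arr)
    low, high = 0, len(a) - 1
    while True:
        # Move a random pivot to the end, then do a Lomuto partition.
        r = random.randint(low, high)
        a[r], a[high] = a[high], a[r]
        pivot = a[high]
        i = low - 1
        for j in range(low, high):
            if a[j] <= pivot:
                i += 1
                a[i], a[j] = a[j], a[i]
        a[i + 1], a[high] = a[high], a[i + 1]
        p = i + 1
        if p == k:
            return a[p]
        if p < k:
            low = p + 1   # target lies to the right of the pivot
        else:
            high = p - 1  # target lies to the left of the pivot


def kth_largest(arr, k):
    # k is 1-based here: k = 1 means the maximum.
    return quickselect(arr, len(arr) - k)
```

Only one side is processed after each partition, so the expected time is O(n), while the worst case remains O(n²) as with quicksort.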
Practice Problems
- Implement quicksort with Lomuto partition and with random pivot.
- Partition: given pivot value, reorder array so all < pivot come first, then pivot(s), then > pivot.
- Find the kth largest element (quickselect: partition, then recurse on one side based on pivot index vs k).
- Sort an array with many duplicates (three-way partition: less, equal, greater).
Summary
- Quick sort: Choose pivot (e.g. last), partition so smaller elements are left and larger right, pivot in place; recurse on left and right. O(n log n) average, O(n²) worst, O(log n) space average, not stable.
- Lomuto: pivot at high; i = last index of "small" region; for each j, if arr[j] ≤ pivot, extend small region and swap. Final swap places pivot at i+1.
- Use random or median-of-three pivot to avoid worst case; use insertion sort for small subarrays in practice.
6.10 Heap Sort
Introduction
Heap sort sorts an array by treating it as a binary max-heap: first we build a max-heap (so the largest element is at the root), then we repeatedly take the maximum (root), swap it to the end of the unsorted region, and sift down to restore the heap property on the remaining elements. Building the heap can be done in O(n) time (bottom-up heapify); each of the n "extract max" steps takes O(log n), so total time is O(n log n) in all cases (best, average, worst). Heap sort is in-place (O(1) extra space) and not stable. It is useful when you need guaranteed O(n log n) with no extra space and when the heap data structure is already relevant (e.g. priority queue, k largest elements). Many embedded or memory-constrained systems use heap sort for this reason.
Real-World Analogy
Imagine a tournament bracket where the winner (largest) always rises to the top. You arrange all players in a binary tree so that each parent beats both children—that's a max-heap. The champion is at the root. You take the champion out, put them in the "sorted" seat at the end, and run a new match among the remaining players to get the next champion. Repeat until everyone is seated in order. That's heap sort: repeatedly extract the maximum from a heap and place it at the end.
Array [12, 11, 13, 5, 6, 7]. Build max-heap: e.g. [13, 11, 12, 5, 6, 7] (13 at root). Swap root (13) with last (7) → [7, 11, 12, 5, 6 | 13], heapify first 5 → [12, 11, 7, 5, 6 | 13]. Swap root (12) with last (6) → [6, 11, 7, 5 | 12, 13], heapify → [11, 6, 7, 5 | 12, 13]. Continue until sorted: [5, 6, 7, 11, 12, 13].
Formal Definition
Max-heap: A complete binary tree (represented as an array: parent at i has left child at 2i+1, right at 2i+2) where every parent is ≥ its children. So the maximum is at index 0. Heap sort: (1) Build heap: Start from the last non-leaf (index n/2−1) down to 0; for each node, sift it down so the subtree rooted at that node becomes a heap. (2) For i from n−1 down to 1: swap arr[0] with arr[i], then sift down from 0 on the heap of size i (so arr[i..n−1] is sorted). Stability: Not stable. In-place: O(1) extra space.
Why This Topic Matters
- Guaranteed O(n log n) in-place: Unlike quicksort, no worst-case O(n²); unlike merge sort, no O(n) extra space. Good for memory-constrained environments.
- Heap as a tool: The same heapify and sift-down operations are used for priority queues (Topic 9.8), "find k largest" (min-heap of size k), and many scheduling problems.
- Interviews: "Implement heap sort," "heapify an array," "k largest elements" (min-heap of size k or quickselect). Understanding parent/child indices and sift-down is essential.
Mental Model
The array is a complete binary tree: index 0 is the root; for node at index i, left child is 2i+1, right is 2i+2, parent is (i−1)//2. Sift down: If the node is smaller than the larger child, swap with that child and repeat in the subtree. Build heap: Sift down every non-leaf from bottom to top. Sort: Swap root (max) with the last element, then sift down from root on the reduced heap (excluding the last). The "last" position joins the sorted region.
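The index arithmetic in this mental model can be written as three small helpers, plus a checker for the max-heap property that is handy when testing. The helper names and the is_max_heap function are assumptions introduced for illustration.

```python
def parent(i):
    return (i - 1) // 2


def left(i):
    return 2 * i + 1


def right(i):
    return 2 * i + 2


def is_max_heap(arr, n=None):
    """Check that arr[0..n-1] satisfies the max-heap property."""
    if n is None:
        n = len(arr)
    # Every non-root node must be <= its parent.
    return all(arr[parent(i)] >= arr[i] for i in range(1, n))
```

For example, is_max_heap([13, 11, 12, 5, 6, 7]) holds, while the unheapified [12, 11, 13, 5, 6, 7] fails because index 2 (value 13) exceeds its parent at index 0 (value 12).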
Step-by-Step Breakdown
- Build max-heap: For i from (n//2 − 1) down to 0, call sift_down(arr, n, i) so that the subtree at i satisfies the heap property. After this, arr[0] is the maximum.
- Extract max and shrink heap: For i from n−1 down to 1: swap arr[0] with arr[i]; then sift_down(arr, i, 0) to restore the heap on indices [0..i−1]. After each step, arr[i..n−1] is sorted.
ASCII Diagram
Array as tree (indices):
        0
       / \
      1   2
     / \  /
    3  4 5
Parent(i) = (i-1)//2, Left(i) = 2*i+1, Right(i) = 2*i+2
After build heap: root = max. Swap root with last, heap size -= 1, sift down from 0.
Sift down: if arr[i] < max(arr[left], arr[right]), swap with the larger child; repeat.
Python Implementation
Sift Down and Build Heap
def sift_down(arr, n, i):
    largest = i
    left = 2 * i + 1
    right = 2 * i + 2
    if left < n and arr[left] > arr[largest]:
        largest = left
    if right < n and arr[right] > arr[largest]:
        largest = right
    if largest != i:
        arr[i], arr[largest] = arr[largest], arr[i]
        sift_down(arr, n, largest)

def heap_sort(arr):
    n = len(arr)
    for i in range(n // 2 - 1, -1, -1):
        sift_down(arr, n, i)
    for i in range(n - 1, 0, -1):
        arr[0], arr[i] = arr[i], arr[0]
        sift_down(arr, i, 0)
Iterative Sift Down (Avoid Recursion)
def sift_down_iter(arr, n, i):
    while True:
        largest = i
        left = 2 * i + 1
        right = 2 * i + 2
        if left < n and arr[left] > arr[largest]:
            largest = left
        if right < n and arr[right] > arr[largest]:
            largest = right
        if largest == i:
            break
        arr[i], arr[largest] = arr[largest], arr[i]
        i = largest
Line-by-Line Explanation
- left = 2 * i + 1, right = 2 * i + 2: Standard indexing for a 0-based complete binary tree. Check left < n and right < n before using them (a node may have no children).
- if largest != i: If the current node is not the largest among itself and its children, swap with the larger child and sift down in that subtree. This restores the heap property at i (assuming both subtrees are already heaps).
- for i in range(n // 2 - 1, -1, -1): The last non-leaf index is (n//2 − 1). We heapify from bottom to top so that when we sift down at a node, its children are already heaps.
- arr[0], arr[i] = arr[i], arr[0]: Move the current max (root) to position i, which becomes part of the sorted suffix. Then sift_down(arr, i, 0) restores the heap on [0..i−1].
Time Complexity
Build heap: Sift down is O(height) per node. Sum over all nodes (bottom-up) gives O(n), not O(n log n)—because most nodes are near the leaves (short sift). Rigorous: at height h there are at most n/2^(h+1) nodes, each sifts O(h); sum h·n/2^(h+1) = O(n).
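The sum in the build-heap bound can be worked out explicitly. The derivation below is a sketch under the assumption that sifting down a node at height h costs at most c·h for some constant c (the symbol c is introduced here for illustration), and it uses the ceiling of the node count per height.

```latex
\begin{aligned}
T_{\text{build}}(n)
  &\le \sum_{h=0}^{\lfloor \log_2 n \rfloor}
       \left\lceil \frac{n}{2^{h+1}} \right\rceil \cdot c\,h \\
  &\le c\,n \sum_{h=0}^{\infty} \frac{h}{2^{h}} \\
  &= 2\,c\,n = O(n).
\end{aligned}
```

The second line uses ⌈n/2^(h+1)⌉ ≤ n/2^h, which holds for every h ≤ log₂ n. The third line uses the standard series Σ h·x^h = x/(1 − x)² evaluated at x = 1/2, which equals 2.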
Sort phase: We do n−1 extractions; each involves a swap and a sift down on a heap of size at most n. So (n−1) × O(log n) = O(n log n).
Total: O(n) + O(n log n) = O(n log n). Same for best, average, and worst case.
Space Complexity
Only a few variables (indices, largest). Recursive sift_down uses O(log n) stack; iterative sift_down uses O(1). So O(1) with iterative sift-down, O(log n) with recursive.
Edge Cases
- Empty or single element: Build loop and sort loop run over empty ranges or do nothing; array unchanged.
- Two elements: Build heap: one sift_down at index 0 (compare with left child). Sort: one swap, then sift_down on size 1 (no-op). Correct.
Common Mistakes
- Wrong parent/child formula: For 0-based indexing, left = 2*i+1, right = 2*i+2, parent = (i−1)//2. Don't use 1-based formulas (2*i, 2*i+1) without adjusting.
- Sift down on wrong size: After swapping root with arr[i], the heap is only indices [0..i−1]. Call sift_down(arr, i, 0), not sift_down(arr, n, 0).
Building the heap by inserting elements one at a time (sifting each new element up) costs O(n log n) in the worst case. The O(n) build sifts down from the last non-leaf to the root, so each node is sifted only after the subtrees of its children are already heaps.
Comparison with Other Sorts
| Algorithm | Time | Space | Stable |
|---|---|---|---|
| Heap sort | O(n log n) all | O(1) | No |
| Merge sort | O(n log n) | O(n) | Yes |
| Quick sort | O(n log n) avg, O(n²) worst | O(log n) | No |
Use iterative sift_down to avoid recursion and keep space O(1). Heap sort is a good choice when you need O(n log n) guaranteed and cannot afford O(n) extra space. For "k largest" or "k smallest," a min-heap of size k (or max-heap for k smallest) gives O(n log k); full heap sort is O(n log n) and is used when you need the entire array sorted in place.
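The "min-heap of size k" approach to the k largest elements can be sketched with Python's standard heapq module, which implements a binary min-heap on a list. The function name k_largest is an assumption for this example.

```python
import heapq


def k_largest(nums, k):
    """Return the k largest values of nums in descending order using a size-k min-heap."""
    if k <= 0:
        return []
    heap = []  # min-heap holding the k largest values seen so far
    for x in nums:
        if len(heap) < k:
            heapq.heappush(heap, x)
        elif x > heap[0]:
            # heap[0] is the smallest of the current top k; replace it in O(log k).
            heapq.heapreplace(heap, x)
    return sorted(heap, reverse=True)
```

The heap never grows beyond k entries, so the scan costs O(n log k) time and O(k) extra space; the final sort of k values adds O(k log k).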
Pattern Recognition
"Largest at root, swap to end, restore heap"—that's heap sort. The same heap operations: build heap (O(n) when done bottom-up), sift down (O(log n)), and the tree indexing (2*i+1, 2*i+2) appear in priority queues, top-k problems, and scheduling.
Implement heap sort: (1) Build max-heap from the bottom up (last non-leaf down to 0). (2) For i from n−1 to 1, swap arr[0] with arr[i], then sift_down(arr, i, 0). State O(n log n) time, O(1) space (iterative sift-down), not stable. Be able to derive or state that build heap is O(n). For "k largest," mention min-heap of size k in O(n log k) or quickselect.
Practice Problems
- Implement heap sort with recursive and iterative sift-down.
- Given an array, heapify it (build max-heap) in O(n).
- Find the k largest elements (min-heap of size k, or quickselect).
- Merge k sorted lists using a min-heap (Topic 9.8).
Summary
- Heap sort: Build max-heap (bottom-up, O(n)), then repeatedly swap root with last, shrink heap, sift down. O(n log n) time (all cases), O(1) space (iterative sift-down), not stable.
- Tree indexing: left = 2*i+1, right = 2*i+2; parent = (i−1)//2. Sift down: swap with larger child until heap property holds.
- Use when you need guaranteed O(n log n) in-place; heap operations also power priority queues and top-k.
6.11 Counting Sort
Introduction
Counting sort is a non-comparison sorting algorithm for integers (or elements that can be mapped to small integers) in a known range, e.g. [0, k]. It counts how many times each value appears, then uses those counts to place each element in the correct position in the output. Time is O(n + k) where n is the number of elements and k is the range size; when k is O(n), this becomes O(n)—faster than any comparison-based sort. Space is O(n + k) for the output and the count array. Counting sort is stable when we iterate through the input from end to start and place each element using cumulative counts (so equal elements keep their relative order). It is the building block for radix sort (Topic 6.12) and is used whenever the key range is small and known.
Real-World Analogy
Imagine sorting a pile of papers that are only graded 1, 2, 3, 4, or 5. Instead of comparing papers to each other, you make five stacks (one per grade), drop each paper into the right stack, then read off the stacks in order: 1, then 2, then 3, 4, 5. That's counting sort: count how many of each value, then output that many of each in order. No comparisons between elements—just counting and placing.
Array [4, 2, 2, 8, 3, 3, 1], range [1..9]. Count: 1→1, 2→2, 3→2, 4→1, 8→1. Cumulative (for stable placement): 1→1, 2→3, 3→5, 4→6, 8→7. Place from end: 1 at index 0, two 2s at 1–2, two 3s at 3–4, 4 at 5, 8 at 6 → [1, 2, 2, 3, 3, 4, 8].
Formal Definition
Counting sort (integer keys in range [0, k] or [min, max]): (1) Count: count[x] = number of times value x appears. (2) Cumulative: replace count[x] with the number of elements with value ≤ x (a running sum of the counts), so count[x] is one past the last output index for value x. (3) Place: for each element in the input, from end to start, decrement count[value] and put the element at output[count[value]]. Stability: Iterating backward while filling each value's slots from last to first preserves the order of equal elements. (An equivalent stable scheme uses start positions, i.e. the number of elements with value < x, iterates forward, and increments.) Time O(n + k), space O(n + k).
Why This Topic Matters
- Linear time when range is small: If k = O(n), counting sort is O(n)—beats comparison-based Ω(n log n) lower bound because it doesn't compare elements; it uses the integer key as an index.
- Stable and simple: Stable counting sort is the standard subroutine in radix sort (sort by digit from least to most significant).
- Interviews: "Sort integers in range [0, 100]" or "sort by frequency then by value"—counting (or count then output) is natural. Often combined with "what if range is large?" (use comparison sort or radix).
Mental Model
First pass: count how many 0s, how many 1s, …, how many ks. Second pass (cumulative): for each value, how many elements are less than or equal to it? That number is one past the last slot the value occupies in sorted order. Third pass: for each element in the input (back to front for stability), look up "where does this value go?", place it in the last free slot for its value, and move that slot back by one so the next equal element (which appeared earlier in the input) lands just before it.
Step-by-Step Breakdown
- Find range: If not given, compute min_val and max_val of the array. Range size k = max_val - min_val + 1.
- Count: Create array count of size k (index 0 for min_val). Traverse the input and increment count[arr[i] - min_val].
- Cumulative: Convert counts to cumulative counts: count[i] += count[i-1], so count[i] = number of elements with value ≤ min_val + i, which is one past the last output index for that value. (An alternative is to shift these to starting indices and traverse the input forward, incrementing.)
- Place (stable): Create output array. Traverse input from end to start. For each element x, let c = x − min_val; set output[count[c] − 1] = x, then count[c] -= 1.
ASCII Diagram
Input: [4, 2, 2, 8, 3, 3, 1], range 1..9 → indices 0..8 for values 1..9
Count: [1,2,2,1,0,0,0,1,0] for values 1,2,3,4,5,6,7,8,9
Cumulative: [1,3,5,6,6,6,6,7,7] (count[i] += count[i-1]; count[i] = number of elements ≤ value i+1)
Place from end: 1→idx 0; 3→idx 4; 3→idx 3; 8→idx 6; 2→idx 2; 2→idx 1; 4→idx 5
Output: [1, 2, 2, 3, 3, 4, 8]
Python Implementation
Stable Counting Sort (Range [min_val, max_val])
def counting_sort(arr):
    if not arr:
        return []
    min_val, max_val = min(arr), max(arr)
    k = max_val - min_val + 1
    count = [0] * k
    for x in arr:
        count[x - min_val] += 1
    for i in range(1, k):
        count[i] += count[i - 1]
    output = [0] * len(arr)
    for i in range(len(arr) - 1, -1, -1):
        x = arr[i]
        idx = count[x - min_val] - 1
        output[idx] = x
        count[x - min_val] -= 1
    return output
Simple Version (Unstable, When Stability Not Needed)
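Just count, then overwrite the array: for each value v, write count[v] copies of v. Simpler, but it regenerates values instead of moving the original elements, so it only suits bare non-negative integer keys in [0, max_val] and cannot keep attached data in its original relative order.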
def counting_sort_simple(arr, max_val):
    count = [0] * (max_val + 1)
    for x in arr:
        count[x] += 1
    i = 0
    for v in range(max_val + 1):
        for _ in range(count[v]):
            arr[i] = v
            i += 1
Line-by-Line Explanation (Stable Version)
- count[x - min_val] += 1: We map value x to index x − min_val so the count array has size k (not max_val+1). After the loop, count[i] = frequency of value (min_val + i).
- count[i] += count[i - 1]: Cumulative sum. After this, count[i] = number of elements with value ≤ (min_val + i), so count[i] − 1 is the last output index for value (min_val + i).
- for i in range(len(arr) - 1, -1, -1): Back to front, so that when we place an element we use the current count and then decrement. The next equal element (which appeared earlier in the input) goes to the previous slot, preserving order (stable).
Time Complexity
One pass to find min/max: O(n). One pass to count: O(n). Cumulative sum over k: O(k). One pass to place: O(n). Total O(n + k). When k = O(n), this is O(n). When k is very large (e.g. range 0 to 10^9 with few elements), counting sort is impractical—use comparison sort or radix sort on digits.
Space Complexity
Count array: O(k). Output array: O(n). So O(n + k). In-place counting sort exists (by permuting cycles) but is more complex; usually we accept O(n) for the output.
Edge Cases
- Empty array: Return [] or skip.
- Single element: count[0] = 1, cumulative count[0] = 1, place at index 0. Correct.
- All same value: Count = n for that value; cumulative gives one run of indices; all n elements placed in order (stable).
- Negative numbers or large range: Use min_val offset so indices are 0..k−1. If max_val − min_val is huge, consider radix sort or comparison sort instead.
Common Mistakes
- Placing from front instead of back: To keep stability, we must place elements in reverse order of the input. If we iterate forward, equal elements get reversed in the output.
- Off-by-one in index: After cumulative sum, count[i] is "one past the last index" for value i (or "number of elements ≤ value i"). So the last occurrence of value i goes at index count[i]−1; then decrement count[i].
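The effect of the traversal direction is invisible on bare integers, so the sketch below sorts (key, label) pairs by key and runs the same decrementing placement scheme in both directions. The function name counting_sort_pairs and the backward flag are assumptions made for this demonstration.

```python
def counting_sort_pairs(pairs, backward=True):
    """Counting sort of (key, label) pairs by integer key, decrementing placement."""
    if not pairs:
        return []
    keys = [key for key, _ in pairs]
    min_key = min(keys)
    count = [0] * (max(keys) - min_key + 1)
    for key in keys:
        count[key - min_key] += 1
    for i in range(1, len(count)):
        count[i] += count[i - 1]  # count[i] = number of pairs with key <= min_key + i
    output = [None] * len(pairs)
    order = reversed(pairs) if backward else pairs
    for key, label in order:
        count[key - min_key] -= 1
        output[count[key - min_key]] = (key, label)
    return output


pairs = [(2, "a"), (1, "b"), (2, "c"), (1, "d")]
stable = counting_sort_pairs(pairs, backward=True)
# stable   == [(1, "b"), (1, "d"), (2, "a"), (2, "c")]
unstable = counting_sort_pairs(pairs, backward=False)
# unstable == [(1, "d"), (1, "b"), (2, "c"), (2, "a")]
```

Both results are sorted by key, but the forward pass reverses the labels within each group of equal keys, which is exactly the stability bug described in the first item above.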
Using counting sort when the range k is very large (e.g. 32-bit integers). Space and time O(n + k) become O(2^32) or similar—impractical. Use comparison sort or radix sort (sort by digit in chunks) when range is large.
When to Use: Counting vs Comparison Sorts
| Scenario | Choice |
|---|---|
| Integers, range 0..k with k = O(n) | Counting sort O(n) |
| Integers, large range (e.g. 32-bit) | Radix sort or comparison sort |
| Stable sort by digit (for radix) | Stable counting sort per digit |
When the range is known and small (e.g. ages 0–120, grades 1–5), counting sort is optimal in time. For sorting objects by an integer key, use the key for counting and store the objects (or indices) to preserve stability and handle duplicates. Counting sort is the inner loop of radix sort (Topic 6.12).
Pattern Recognition
"Integer keys in a small known range" → counting sort. "Sort by digit" (least significant to most) → counting sort per digit = radix sort. "Frequency of each value" or "how many times does x appear" → count array first.
Describe counting sort: count frequencies, then cumulative counts, then place from end to start for stability. State O(n + k) time and space; mention it's non-comparison and beats Ω(n log n) when k = O(n). If the range is large, say you'd use radix sort (digits) or a comparison sort. For "sort integers in [0, 100]," counting sort is ideal.
Practice Problems
- Implement stable counting sort for integers in [min, max].
- Sort an array of characters (range 0..255 or 'a'..'z') using counting sort.
- Use counting sort as the stable digit sort inside radix sort (Topic 6.12).
Summary
- Counting sort: Count frequency of each value, compute cumulative positions, place each element from end to start for stability. O(n + k) time and space; stable when placing backward.
- Use when keys are integers (or small discrete keys) in a known range; when k = O(n), sort is O(n).
- Do not use when range is huge; use radix or comparison sort instead. Stable version is the building block for radix sort.
6.12 Radix Sort
Introduction
Radix sort sorts integers (or strings) by processing them digit by digit (or character by character), from the least significant to the most significant (LSD), using a stable sort for each digit. Because each digit has a small range (0–9 for decimal, 0–255 for a byte), we use counting sort per digit—giving O(d · (n + k)) time where d is the number of digits and k is the digit range (e.g. 10). For fixed-width integers, d is constant, so radix sort is O(n). It handles large value ranges (e.g. 32-bit integers) without the huge count array that pure counting sort would need. Radix sort is stable when the per-digit sort is stable, and uses O(n + k) extra space per pass (typically O(n) for the output). MSD radix sort (most significant first) is an alternative used for variable-length keys or lexicographic order; LSD is simpler and standard for fixed-length integers.
Real-World Analogy
Imagine sorting a stack of dated papers. First sort by day (1–31), then by month (1–12), then by year. Each pass uses a stable sort so that after sorting by month, papers with the same month stay in day order. After the last pass, everything is in full date order. Radix sort does the same with digits: sort by ones, then tens, then hundreds—each pass stable, so the previous order is preserved within the same digit group.
Array [170, 45, 75, 90, 802, 24, 2, 66]. Sort by ones digit: [170, 90, 802, 2, 24, 45, 75, 66]. Sort by tens: [802, 2, 24, 45, 66, 170, 75, 90]. Sort by hundreds: [2, 24, 45, 66, 75, 90, 170, 802]. Each pass uses stable counting sort on that digit.
Formal Definition
LSD Radix sort: Assume all keys have the same number of digits (pad with leading zeros if needed). For digit position p = 0 (least significant) to d−1 (most significant): stable sort the array by the p-th digit. After all passes, the array is sorted. Why stable? When we sort by digit p, keys that agree on digit p keep their relative order from the previous pass, so the order already established by digits 0 through p−1 is preserved within each digit-p group. Time: d passes × O(n + k) per pass = O(d · (n + k)). For integers in base 10 or base 256, d is the number of digits; k = 10 or 256. Space: O(n + k) per pass (reusable).
Why This Topic Matters
- Large range, linear time: For 32-bit integers, we have at most 32/8 = 4 bytes (or 10 decimal digits). So d is small; radix sort runs in O(n) for fixed-width integers when using byte or digit chunks.
- No comparisons: Like counting sort, radix sort uses the structure of the key (digits) rather than comparisons—so it can beat the Ω(n log n) comparison lower bound for integers.
- Interviews: "Sort integers in O(n)," "sort by digit," or "how would you sort 1 million 32-bit integers?"—radix sort (or counting sort if range is small) is the answer.
Mental Model
Think of each number as having d digits (e.g. 802 = 8,0,2 from most to least significant). We sort one digit at a time, starting from the rightmost digit (least significant). Each pass is a stable sort (counting sort) on that digit only. After the first pass, numbers are ordered by ones; after the second, by ones then tens; after the last pass, by the full key. Stability is critical: it keeps the work from previous passes intact.
Step-by-Step Breakdown (LSD)
- Find max: Get the maximum value to determine the number of digits (or use a fixed width, e.g. 32-bit = 4 bytes).
- For each digit position from least significant to most significant: extract that digit (or byte) from each element, run stable counting sort on that digit, and replace the array with the sorted order.
- After all digit passes, the array is sorted.
ASCII Diagram
[170, 45, 75, 90, 802, 24, 2, 66] (ones: 0,5,5,0,2,4,2,6)
Pass 1 (ones): [170,90,802,2,24,45,75,66] (tens: 7,9,0,0,2,4,7,6)
Pass 2 (tens): [802,2,24,45,66,170,75,90] (hundreds: 8,0,0,0,0,1,0,0)
Pass 3 (hundreds): [2,24,45,66,75,90,170,802]
Sorted.
Python Implementation
LSD Radix Sort (Decimal Digits)
def counting_sort_by_digit(arr, exp):
    n = len(arr)
    output = [0] * n
    count = [0] * 10
    for i in range(n):
        digit = (arr[i] // exp) % 10
        count[digit] += 1
    for i in range(1, 10):
        count[i] += count[i - 1]
    for i in range(n - 1, -1, -1):
        digit = (arr[i] // exp) % 10
        output[count[digit] - 1] = arr[i]
        count[digit] -= 1
    for i in range(n):
        arr[i] = output[i]

def radix_sort(arr):
    if not arr:
        return
    max_val = max(arr)
    exp = 1
    while max_val // exp > 0:
        counting_sort_by_digit(arr, exp)
        exp *= 10
LSD Radix Sort (By Byte, for Large Integers)
For 32-bit non-negative integers, process 4 bytes (or 8-bit chunks). Each pass: extract byte (x >> (8*p)) & 255, counting sort on 0..255, then copy back. Four passes; each O(n + 256) = O(n).
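The byte-based variant just described could look like the sketch below. It assumes keys are non-negative and fit in num_bytes bytes (4 by default, i.e. 32-bit values), and it returns a new list instead of copying back into the input; the function name radix_sort_bytes and the num_bytes parameter are illustrative.

```python
def radix_sort_bytes(arr, num_bytes=4):
    """LSD radix sort of non-negative integers below 256**num_bytes; returns a new list."""
    if any(x < 0 or x >= 256 ** num_bytes for x in arr):
        raise ValueError("keys must be non-negative and fit in num_bytes bytes")
    src = list(arr)
    for p in range(num_bytes):
        shift = 8 * p
        count = [0] * 256
        for x in src:
            count[(x >> shift) & 255] += 1
        for b in range(1, 256):
            count[b] += count[b - 1]
        out = [0] * len(src)
        for x in reversed(src):  # back to front keeps each pass stable
            b = (x >> shift) & 255
            count[b] -= 1
            out[count[b]] = x
        src = out
    return src
```

Unlike the decimal version, this sketch always runs all num_bytes passes; skipping passes whose byte is zero for every key is a possible refinement.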
Line-by-Line Explanation
- (arr[i] // exp) % 10: For exp = 1 we get the ones digit; for exp = 10 the tens digit; for exp = 100 the hundreds digit. So we isolate one decimal digit per pass.
- counting_sort_by_digit(arr, exp): Sorts the array in place by the digit given by exp. We use the same stable counting sort as Topic 6.11; the "key" is that digit.
- while max_val // exp > 0: We stop when exp exceeds the maximum value (no more digits). After each pass we multiply exp by 10 (next digit to the left).
Time Complexity
Let d be the number of digits (or byte passes). Each pass is a counting sort: O(n + k) with k = 10 (digits) or 256 (bytes). Total O(d · (n + k)). For integers in a fixed range, d is constant (e.g. 10 decimal digits for 32-bit, or 4 bytes). So O(n) when d and k are constants. For variable-length strings, d is the length of the longest string.
Space Complexity
Per pass: count array O(k), output array O(n). So O(n + k); k is small (10 or 256). Can reuse the same output buffer for every pass.
Edge Cases
- Empty array: Return or skip.
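- Negative numbers: Standard LSD radix sort assumes non-negative keys. For signed integers, split into negative and non-negative values, radix sort the magnitudes of the negatives and the non-negatives separately, then output the negatives in reverse order (negated back) followed by the non-negatives. Alternatively, subtract the minimum from every key so that all keys are non-negative, sort, and add it back.
- Variable length: For integers, pad with leading zeros so all keys have the same number of digits (for LSD). For strings compared lexicographically, pad on the right with a character that sorts before all others. Or use MSD radix sort, which handles variable length naturally (a key that has run out of characters sorts before any longer key with the same prefix).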
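The split-and-recombine handling of negative numbers can be sketched as below. The block carries its own decimal LSD radix sort for non-negative integers so that it runs standalone; both function names are assumptions, and both functions return new lists instead of sorting in place.

```python
def radix_sort_nonneg(arr):
    """LSD radix sort (decimal digits) for non-negative integers; returns a new list."""
    out = list(arr)
    if not out:
        return out
    max_val = max(out)
    exp = 1
    while max_val // exp > 0:
        count = [0] * 10
        for x in out:
            count[(x // exp) % 10] += 1
        for d in range(1, 10):
            count[d] += count[d - 1]
        nxt = [0] * len(out)
        for x in reversed(out):  # back to front for stability
            d = (x // exp) % 10
            count[d] -= 1
            nxt[count[d]] = x
        out = nxt
        exp *= 10
    return out


def radix_sort_signed(arr):
    # Sort the magnitudes of the negatives, then reverse and negate them back.
    neg = radix_sort_nonneg([-x for x in arr if x < 0])
    non_neg = radix_sort_nonneg([x for x in arr if x >= 0])
    return [-x for x in reversed(neg)] + non_neg
```

The largest magnitude among the negatives is the smallest value, which is why the sorted magnitudes are reversed before being negated.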
Common Mistakes
- Using an unstable sort per digit: If the per-digit sort is not stable, the previous digit order is lost and the final array may be wrong. Always use stable counting sort.
- Wrong digit order: LSD must process from least significant to most significant. If you sort by most significant first (MSD), you need a different algorithm (MSD radix sort, often recursive).
Forgetting that radix sort is for fixed or bounded key length. If keys are arbitrary-length strings and you use LSD with padding, d equals the max length—fine. If keys are integers with unbounded size, d can grow; then radix sort is O(n · d) and may not beat comparison sort. For 32- or 64-bit integers, d is constant.
LSD vs MSD
| Variant | Use |
|---|---|
| LSD | Fixed-length integers; simple; stable sort per digit from right to left. |
| MSD | Variable-length keys, lexicographic order; can skip empty buckets; recursive. |
For 32-bit integers, using byte-based radix (4 passes, k=256) is often faster than digit-based (10 passes, k=10) because fewer passes. Use counting sort for each byte. For strings, LSD with fixed padding or MSD (trie-like) are both used depending on the data.
Pattern Recognition
"Sort integers in O(n)" or "sort without comparison" with large range → radix sort (or counting sort if range is small). "Sort by digit" or "sort strings lexicographically" with fixed-length or by character → radix (LSD or MSD). The key is that each digit has a small range so counting sort per digit is O(n).
Describe LSD radix sort: sort by least significant digit first, then next digit, …, using a stable sort (counting sort) each time. State O(d · n) for d digits and small digit range. Mention that it's good for fixed-width integers and beats O(n log n) because it's non-comparison. If asked about negatives, say you'd split into negative and non-negative and handle separately (e.g. two's complement or offset).
Practice Problems
- Implement LSD radix sort for non-negative integers (decimal digits then byte-based).
- Sort an array of strings of equal length using LSD radix sort (by character position from right to left).
- Extend to signed integers (split, radix sort, recombine).
Summary
- Radix sort (LSD): Sort by least significant digit, then next, …, using a stable sort (counting sort) per digit. O(d · (n + k)) time, O(n + k) space; for fixed d and k, O(n).
- Stability of the per-digit sort is essential so that the order from previous passes is preserved.
- Use for fixed-width integers (or fixed-length strings) when you need O(n) sort; use counting sort when the full key range is small.
6.13 Bucket Sort
Introduction
Bucket sort assumes the input is uniformly distributed over a known range. It distributes elements into buckets (e.g. bucket i holds values in the range [i/n, (i+1)/n) when values are in [0, 1)), then sorts each bucket (typically with insertion sort, since buckets are small), and concatenates the buckets in order. When the distribution is uniform, each bucket holds O(1) elements on average, so sorting all buckets takes O(n) expected time in total and the whole algorithm runs in expected O(n). Worst case (all elements in one bucket) is O(n²) if we use insertion sort per bucket. Space is O(n) for the buckets. Bucket sort is useful for floating-point numbers in [0, 1) or for keys that can be normalized to a range; it is a distribution sort (like counting and radix) that relies on the key structure rather than comparisons.
Real-World Analogy
Imagine sorting exam scores from 0 to 100. You set up 10 buckets: 0–9, 10–19, …, 90–100. You drop each paper into the right bucket. If scores are spread out, each bucket has only a few papers; you sort each bucket quickly (e.g. insertion sort) and then stack the buckets in order. That's bucket sort: scatter into buckets by range, sort each bucket, then concatenate.
Values in [0, 1): [0.78, 0.17, 0.39, 0.26, 0.72, 0.94]. Use 6 buckets: [0,1/6), [1/6,2/6), … Bucket 0: empty; 1: [0.17, 0.26]; 2: [0.39]; 3: empty; 4: [0.78, 0.72]; 5: [0.94]. Sort each (insertion sort): [0.17, 0.26], [0.39], [0.72, 0.78], [0.94]. Concatenate → [0.17, 0.26, 0.39, 0.72, 0.78, 0.94].
Formal Definition
Bucket sort (values in [0, 1) or normalized range): Create n buckets (or a fixed number). For each element x, place it in bucket floor(x * n) (for [0,1)) or the bucket that covers x's range. Sort each bucket (insertion sort or another sort). Append buckets in order to get the sorted array. Expected time: O(n) when the distribution is uniform (each bucket O(1) size). Worst time: O(n²) when all elements fall in one bucket. Space: O(n).
Why This Topic Matters
- Expected O(n) for uniform data: When input is uniformly distributed, bucket sort achieves linear expected time. It can beat the Ω(n log n) lower bound for comparison sorts because it places elements by value rather than by pairwise comparison.
- Floating-point and normalized keys: Counting sort needs integer indices; radix sort needs digits. Bucket sort works directly on reals in [0, 1) or any range you can map to buckets.
- Interviews: "Sort numbers uniformly distributed in [0, 1)" or "what sort would you use for uniformly distributed floats?"—bucket sort is the standard answer.
Mental Model
Divide the range into n equal intervals (buckets). Scatter each element into the bucket that contains its value. With uniform input, each bucket receives about one element on average. Sort each bucket (cheap, because it is small), then read the buckets in order. No merging step is needed, only concatenation.
Step-by-Step Breakdown
- Create buckets: n empty buckets (e.g. lists), indexed 0 to n−1.
- Scatter: For each element x in [0, 1), bucket index = int(x * n) (or min of that and n−1 to handle 1.0). Append x to that bucket.
- Sort each bucket: Use insertion sort (or any sort) on each bucket.
- Concatenate: Output = bucket[0] + bucket[1] + … + bucket[n−1].
ASCII Diagram
Input [0.78, 0.17, 0.39, 0.26, 0.72, 0.94], n=6
Bucket index = floor(x * 6). 0.78→4, 0.17→1, 0.39→2, 0.26→1, 0.72→4, 0.94→5
Buckets: [] [0.17,0.26] [0.39] [] [0.78,0.72] [0.94]
After sort each: [] [0.17,0.26] [0.39] [] [0.72,0.78] [0.94]
Concatenate: [0.17, 0.26, 0.39, 0.72, 0.78, 0.94]
Python Implementation
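def bucket_sort(arr):
    # Assumes every value lies in [0, 1]; returns a new sorted list.
    if not arr:
        return []
    n = len(arr)
    buckets = [[] for _ in range(n)]
    for x in arr:
        # Bucket i covers [i/n, (i+1)/n); clamp so x == 1.0 lands in the last bucket.
        idx = min(int(x * n), n - 1)
        buckets[idx].append(x)
    for b in buckets:
        b.sort()  # built-in sort; insertion sort also works for small buckets
    return [x for b in buckets for x in b]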
For values in a range [min_val, max_val], normalize: (x - min_val) / (max_val - min_val + 1e-9) then use the same logic, or map directly to bucket index: int((x - min_val) / (max_val - min_val + 1e-9) * n).
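The normalization described above can be sketched as follows. This is a minimal sketch rather than the only way to map a range onto buckets: the name bucket_sort_range and the optional num_buckets parameter are illustrative assumptions, and instead of the 1e-9 padding it clamps the index so that the maximum value lands in the last bucket.

```python
def bucket_sort_range(arr, num_buckets=None):
    """Bucket sort for numbers in an arbitrary range (hypothetical helper)."""
    if not arr:
        return []
    lo, hi = min(arr), max(arr)
    if lo == hi:
        # All values are equal, so the input is already sorted.
        return list(arr)
    n = num_buckets if num_buckets else len(arr)
    buckets = [[] for _ in range(n)]
    for x in arr:
        # Scale x into [0, 1], then into a bucket index; clamp x == hi to n - 1.
        idx = min(int((x - lo) / (hi - lo) * n), n - 1)
        buckets[idx].append(x)
    result = []
    for b in buckets:
        b.sort()
        result.extend(b)
    return result


print(bucket_sort_range([29, 25, 3, 49, 9, 37, 21, 43]))
# [3, 9, 21, 25, 29, 37, 43, 49]
```

Computing min and max costs one extra O(n) pass, which does not change the overall complexity.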
Line-by-Line Explanation
- idx = min(int(x * n), n - 1): For x in [0, 1), x * n is in [0, n), so int(x * n) is between 0 and n−1. If x is exactly 1.0, we get n, so min(..., n - 1) keeps the index valid.
- b.sort(): Python's sort is O(k log k) for a bucket of size k. Total over buckets: sum of O(n_i log n_i). Expected case: each bucket has O(1) size → O(n) total. Worst: one bucket has n elements → O(n log n) for that bucket (or O(n²) with insertion sort).
Time Complexity
Expected (uniform distribution): Each bucket has O(1) elements on average. Sorting each bucket is O(1) on average (insertion sort on constant size). Total O(n). Worst case: All elements in one bucket; that bucket has n elements. With insertion sort per bucket → O(n²). With a comparison sort per bucket → O(n log n) for that bucket, so total O(n log n). Space: O(n) for the buckets.
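To make the gap between the two cases concrete, the sketch below counts how many elements land in each bucket and uses the sum of squared bucket sizes as a rough proxy for insertion-sort work. The helper names and the two synthetic inputs are assumptions chosen for illustration; the proxy is an upper-bound style estimate, not a measured running time.

```python
def bucket_sizes(values):
    """Number of elements landing in each of n = len(values) buckets over [0, 1)."""
    n = len(values)
    sizes = [0] * n
    for x in values:
        sizes[min(int(x * n), n - 1)] += 1
    return sizes


def insertion_cost_proxy(sizes):
    # Insertion sort on a bucket of size s does at most about s * s work.
    return sum(s * s for s in sizes)


n = 1000
evenly_spread = [i / n for i in range(n)]   # roughly one value per bucket
skewed = [i / (n * n) for i in range(n)]    # every value is below 1/n → bucket 0

even_cost = insertion_cost_proxy(bucket_sizes(evenly_spread))
skew_cost = insertion_cost_proxy(bucket_sizes(skewed))
print(even_cost, skew_cost)
```

For the evenly spread input the proxy stays proportional to n, while for the skewed input it equals n², because a single bucket holds everything.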
Space Complexity
We need n buckets and the elements are distributed among them. O(n).
Edge Cases
- Empty array: Return [].
- All same value: All go to one bucket; sort that bucket (already sorted).
- Value exactly 1.0: Use min(int(x * n), n - 1) so we don't index out of bounds.
Common Mistakes
- Assuming O(n) always: Bucket sort is O(n) only in the expected case when the distribution is uniform. Worst case can be O(n²) with insertion sort per bucket.
- Wrong bucket index for [0, 1): Use int(x * n); ensure 1.0 maps to n−1 (not n) to avoid an index error.
Using bucket sort when the data is not uniformly distributed (e.g. heavily skewed). Then many elements fall in few buckets and the "sort each bucket" step becomes expensive. Use comparison sort or another distribution sort (counting/radix) when the key structure fits better.
When to Use
| Scenario | Choice |
|---|---|
| Floats in [0, 1), uniform | Bucket sort O(n) expected |
| Integers in small range | Counting sort |
| Skewed or unknown distribution | Comparison sort (merge/quick) |
Use insertion sort for small buckets (low constant factor); use a full O(k log k) sort per bucket if you want to avoid O(n²) worst case when one bucket gets many elements. The number of buckets is often chosen as n; more buckets mean smaller buckets (faster to sort) but more overhead.
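The insertion-sort-per-bucket variant mentioned above might look like the following. It is a sketch under the same assumption as the main implementation (values in [0, 1]); the function names insertion_sort and bucket_sort_insertion are illustrative.

```python
def insertion_sort(bucket):
    """Sort a small list in place; stable because it shifts only on strict >."""
    for i in range(1, len(bucket)):
        item = bucket[i]
        j = i - 1
        while j >= 0 and bucket[j] > item:
            bucket[j + 1] = bucket[j]
            j -= 1
        bucket[j + 1] = item


def bucket_sort_insertion(arr):
    """Bucket sort for values in [0, 1], using insertion sort per bucket."""
    n = len(arr)
    if n == 0:
        return []
    buckets = [[] for _ in range(n)]
    for x in arr:
        buckets[min(int(x * n), n - 1)].append(x)
    result = []
    for b in buckets:
        insertion_sort(b)
        result.extend(b)
    return result


print(bucket_sort_insertion([0.78, 0.17, 0.39, 0.26, 0.72, 0.94]))
# [0.17, 0.26, 0.39, 0.72, 0.78, 0.94]
```

With this variant the worst case is O(n²) when one bucket receives all elements, which is the trade-off the paragraph above describes.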
Pattern Recognition
"Uniformly distributed in [0, 1)" or "floats in a range" → bucket sort. "Distribute by range then sort small groups" is the pattern. When data is not uniform, bucket sort loses its advantage.
In an interview, describe bucket sort like this: create n buckets for range [0, 1), scatter each element into bucket floor(x * n), sort each bucket (e.g. insertion sort), concatenate. State O(n) expected when uniform, O(n²) worst when all in one bucket. Mention that it's good for uniformly distributed floats; for integers in a small range use counting sort instead.
Practice Problems
- Implement bucket sort for floats in [0, 1).
- Bucket sort for integers in [min, max] by mapping to bucket index.
Summary
- Bucket sort: Scatter elements into buckets by range, sort each bucket, concatenate. O(n) expected when uniform, O(n²) worst with insertion sort per bucket. O(n) space.
- Use for uniformly distributed data in a known range (e.g. [0, 1)); not for skewed or arbitrary distributions.
6.14 Stability in Sorting
Introduction
A sorting algorithm is stable if whenever two elements are equal according to the sort key, their relative order in the output is the same as in the input. That is, if element A appears before element B in the input and A and B are equal, then A still appears before B in the output. Stability matters when you sort by one key first and then by another (e.g. sort by name, then by age—you want people with the same age to remain in name order), or when the sort is used as a subroutine (e.g. radix sort requires a stable per-digit sort). Not all sorts are stable: quicksort and heapsort typically are not; merge sort and insertion sort are. This topic defines stability, explains why it matters, and summarizes which algorithms are stable and how to achieve stability when needed.
Real-World Analogy
Imagine sorting a class list by (last name, first name) in two passes: first by first name, then by last name. After the second sort, you want "Smith, Alice" and "Smith, Bob" to stay in alphabetical order by first name. If the second sort is stable, all "Smith" entries keep their relative order from the first-name sort. If it's unstable, two "Smith" entries might be swapped even though their first names were already in order.
Pairs (score, name): [(3, "Alice"), (2, "Bob"), (3, "Charlie")]. Sort by score. Stable sort → [(2, "Bob"), (3, "Alice"), (3, "Charlie")] (Alice before Charlie because they were in that order for equal score). Unstable sort might give [(2, "Bob"), (3, "Charlie"), (3, "Alice")].
Formal Definition
Stability: Let the sort key be a function key(x). A sort is stable if for any two elements a, b with key(a) = key(b), if a appeared before b in the input, then a appears before b in the output. Equivalently: equal elements preserve their original relative order. Why it breaks: An algorithm that swaps or moves elements without regard to original position can reorder equal elements (e.g. quicksort swapping with pivot, heapsort swapping root to end).
Why This Topic Matters
- Multi-key sorting: Sort by (last name, first name) by first sorting by first name, then by last name with a stable sort. The second sort preserves first-name order within the same last name.
- Radix sort: LSD radix sort requires a stable sort per digit. If the per-digit sort is unstable, the final order can be wrong.
- Interviews: "What is a stable sort?" "Which sorts are stable?" "How would you make quicksort stable?" (e.g. attach original index as tiebreaker).
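The two-pass multi-key idea can be sketched with Python's built-in sorted(), which is guaranteed to be stable. The sample names are made up for illustration.

```python
people = [
    ("Bob", "Smith"),
    ("Alice", "Jones"),
    ("Alice", "Smith"),
    ("Carol", "Jones"),
]  # (first name, last name), made-up sample data

# Pass 1: sort by the secondary key (first name).
by_first = sorted(people, key=lambda p: p[0])
# Pass 2: sort by the primary key (last name). sorted() is stable, so people
# with the same last name keep the first-name order from pass 1.
by_last_then_first = sorted(by_first, key=lambda p: p[1])

print(by_last_then_first)
# [('Alice', 'Jones'), ('Carol', 'Jones'), ('Alice', 'Smith'), ('Bob', 'Smith')]
```

The result matches a single sort on the tuple key (last name, first name); the two-pass form is shown only to highlight where stability is needed.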
Mental Model
When two elements compare equal, a stable sort never swaps them; it leaves the one that was first in the input still first. So "equal" is treated as "no swap." In code, that usually means using strict < in comparisons (so we only swap when strictly less/greater), or when merging, taking the element from the left subarray when keys are equal (merge sort).
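A small insertion sort shows how the choice between strict and non-strict comparison decides stability. This is a sketch: the function name and the strict flag are illustrative assumptions, added only so that both behaviours can be compared side by side.

```python
def insertion_sort_by_key(items, key, strict=True):
    """Insertion sort on a copy; strict=False shifts past equal keys (unstable)."""
    def should_shift(a, b):
        return a > b if strict else a >= b

    out = list(items)
    for i in range(1, len(out)):
        item = out[i]
        j = i - 1
        while j >= 0 and should_shift(key(out[j]), key(item)):
            out[j + 1] = out[j]
            j -= 1
        out[j + 1] = item
    return out


pairs = [(3, "Alice"), (2, "Bob"), (3, "Charlie")]

stable = insertion_sort_by_key(pairs, key=lambda p: p[0], strict=True)
unstable = insertion_sort_by_key(pairs, key=lambda p: p[0], strict=False)
print(stable)    # [(2, 'Bob'), (3, 'Alice'), (3, 'Charlie')]
print(unstable)  # [(2, 'Bob'), (3, 'Charlie'), (3, 'Alice')]
```

Both outputs are correctly ordered by score; only the strict version keeps Alice ahead of Charlie.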
Which Sorts Are Stable?
- Stable: Merge sort (take left when equal in merge), insertion sort (insert after equals with strict >), bubble sort (swap only when strict >), counting sort (place from end to start), radix sort (stable per-digit sort), bucket sort (if each bucket is sorted with a stable sort).
- Unstable: Quick sort (partition can swap equals past each other), heap sort (swap root with end can reorder equals), selection sort (swapping min to front can move a later equal past an earlier one).
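To see an unstable sort reorder equal keys, here is a sketch of a plain selection sort run on three labelled records. The function name and the sample data are assumptions for illustration.

```python
def selection_sort_by_key(items, key):
    """Classic selection sort on a copy: swap the minimum into position i."""
    out = list(items)
    for i in range(len(out)):
        m = i
        for j in range(i + 1, len(out)):
            if key(out[j]) < key(out[m]):
                m = j
        # This long-distance swap can carry out[i] past an equal element.
        out[i], out[m] = out[m], out[i]
    return out


records = [(2, "first"), (2, "second"), (1, "third")]
result = selection_sort_by_key(records, key=lambda r: r[0])
print(result)
# [(1, 'third'), (2, 'second'), (2, 'first')]
```

The keys end up sorted, but the two records with key 2 have traded places, which a stable sort would not allow.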
Comparison Table
| Algorithm | Stable? | Note |
|---|---|---|
| Merge sort | Yes | Take left when equal in merge |
| Insertion / Bubble | Yes | Use strict > (no swap on equal) |
| Counting / Radix | Yes | Place backward; stable per digit |
| Quick sort / Heap sort / Selection | No | Swaps can reorder equals |
How to Make an Unstable Sort Stable
Attach the original index (or any unique tiebreaker) to each element. When two keys are equal, compare the indices so that the element that was first in the input is considered "smaller." Then any sort becomes stable with respect to the original order, because the comparison is never truly equal—we break ties by index. Example: sort pairs (value, index); compare by value first, then by index. This uses O(n) extra space for the indices.
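# Index-as-tiebreaker technique. Python's built-in sort is already stable, so it
# only stands in here for an unstable algorithm such as quicksort.
def stable_sort(arr):
    indexed = [(x, i) for i, x in enumerate(arr)]   # pair each value with its original position
    indexed.sort(key=lambda p: (p[0], p[1]))        # compare by value, then by original index
    return [x for x, i in indexed]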
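Because the helper above calls Python's built-in sort, it does not by itself show an unstable algorithm being repaired. The following sketch applies the same tiebreaker to a hand-written quicksort (Lomuto partition). The name quicksort_stable and its key parameter are illustrative assumptions, and the recursion is only intended for small inputs.

```python
def quicksort_stable(items, key=lambda x: x):
    """Quicksort made stable by comparing (key, original index) pairs."""
    # Decorate: (key, original index, item). The index makes every pair unique.
    decorated = [(key(x), i, x) for i, x in enumerate(items)]

    def rank(entry):
        return (entry[0], entry[1])

    def sort_range(lo, hi):
        if lo >= hi:
            return
        pivot = rank(decorated[hi])
        p = lo
        for j in range(lo, hi):
            if rank(decorated[j]) < pivot:
                decorated[p], decorated[j] = decorated[j], decorated[p]
                p += 1
        decorated[p], decorated[hi] = decorated[hi], decorated[p]
        sort_range(lo, p - 1)
        sort_range(p + 1, hi)

    sort_range(0, len(decorated) - 1)
    return [x for _, _, x in decorated]


pairs = [(3, "Alice"), (2, "Bob"), (3, "Charlie")]
print(quicksort_stable(pairs, key=lambda p: p[0]))
# [(2, 'Bob'), (3, 'Alice'), (3, 'Charlie')]
```

Since no two (key, index) pairs compare equal, the partition can never reorder ties, at the cost of O(n) extra space for the decorated list.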
Common Mistakes
- Using >= or <= when swapping: For stability in bubble/insertion, use strict > so we do not swap when equal. If you use >=, equal elements may be swapped.
- Assuming "default" quicksort is stable: Standard in-place quicksort is not stable. Say "quicksort is unstable unless we use a tiebreaker."
In an interview, define stability: equal elements keep their relative order. Give an example (sort by name, then by age with a stable sort, so people of the same age stay in name order). List stable sorts (merge, insertion, bubble, counting, radix) and unstable ones (quick, heap, selection). To make an unstable sort stable, add the original index as a secondary key so ties are broken by position.
Summary
- Stable sort: If key(a) = key(b) and a was before b in the input, a is before b in the output.
- Stable: merge, insertion, bubble, counting, radix (with stable digit sort), bucket (with stable bucket sort). Unstable: quick, heap, selection.
- Use stable sort for multi-key sorting and as the subroutine for radix sort. To get stability with an unstable algorithm, attach original index as tiebreaker.
6.15 Inversion Count
Introduction
An inversion in an array is a pair of indices (i, j) such that i < j and arr[i] > arr[j]—i.e. two elements that are out of order. The inversion count (or number of inversions) measures how "unsorted" the array is: it is 0 when the array is sorted (ascending) and maximum when the array is reverse sorted. Counting inversions in O(n²) is trivial (check every pair); we can do it in O(n log n) by modifying merge sort: during the merge step, whenever we take an element from the right subarray, every remaining element in the left subarray is greater than it, so we add the number of remaining left elements to the inversion count. Inversion count equals the minimum number of adjacent swaps needed to sort the array (as in bubble sort). It appears in problems about "how far from sorted," "minimum swaps to sort," and in analysis of sorting algorithms.
Real-World Analogy
Imagine a queue of people sorted by height. An inversion is a pair where the taller person is standing in front of the shorter one. Counting inversions tells you "how many pairs are in the wrong order." If you could only swap adjacent people (like bubble sort), the number of such swaps needed to sort the queue is exactly the inversion count.
Array [2, 4, 1, 3, 5]. Inversions: (2,1), (4,1), (4,3) → count = 3. Sorted is [1,2,3,4,5]; we need at least 3 adjacent swaps to get there (e.g. swap 4 and 1, then 2 and 1, then 4 and 3).
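The example can be checked with a brute-force sketch that tests every pair, alongside a bubble sort that counts its adjacent swaps. Both function names are illustrative; the O(n log n) method comes later in this topic.

```python
def count_inversions_brute(arr):
    """Check every pair (i, j) with i < j: O(n^2) time, O(1) extra space."""
    count = 0
    n = len(arr)
    for i in range(n):
        for j in range(i + 1, n):
            if arr[i] > arr[j]:
                count += 1
    return count


def bubble_sort_swaps(arr):
    """Number of adjacent swaps bubble sort performs on a copy of arr."""
    a = list(arr)
    swaps = 0
    for end in range(len(a) - 1, 0, -1):
        for k in range(end):
            if a[k] > a[k + 1]:
                a[k], a[k + 1] = a[k + 1], a[k]
                swaps += 1
    return swaps


print(count_inversions_brute([2, 4, 1, 3, 5]))  # 3
print(bubble_sort_swaps([2, 4, 1, 3, 5]))       # 3
print(count_inversions_brute([5, 4, 3, 2, 1]))  # 10
```

The two counts agree because every swap bubble sort makes removes exactly one inversion.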
Formal Definition
Inversion: A pair (i, j) with 0 ≤ i < j < n and arr[i] > arr[j]. Inversion count: The total number of such pairs. Relation to sorting: Minimum number of adjacent swaps to sort the array (bubble sort) equals the inversion count. Merge-sort approach: When merging two sorted halves, if we take an element from the right half, it is smaller than every remaining element in the left half—so there are (mid − i + 1) inversions involving this right element and those left elements. Sum over all such "right" picks to get the total count.
Why This Topic Matters
- Minimum adjacent swaps: Problems like "sort the array using only adjacent swaps" have answer = inversion count.
- Measure of disorder: Inversion count is a simple measure of how far the array is from sorted; used in rankings and similarity.
- Interviews: "Count inversions in an array" is a classic merge-sort modification. Also "minimum swaps to sort" (adjacent or not—different formulas).
Mental Model
During merge sort, when we merge the left and right sorted halves, we compare the front of each. When we take an element from the right half and put it in the output, that element is smaller than every element still in the left half (because the left half is sorted and the right element is smaller than the current left front). So this one right element forms an inversion with each of those remaining left elements. Add (number of remaining left elements) to the inversion count. We do not count anything when we take from the left: that element is less than or equal to every element still in the right half, so it forms no new inversion, and its inversions with right elements already placed were counted when those elements were taken.
Step-by-Step (Merge-Sort Based)
- Run merge sort, but during the merge step, maintain a global (or passed) inversion count.
- When merging: when we choose to take an element from the right subarray (because it is smaller than the current left front), add (number of elements remaining in the left subarray) to the inversion count. That equals (mid − i + 1) if we use index i for the current position in the left array and mid is the end of the left array.
- Return the inversion count (and the sorted array if needed).
Python Implementation
def merge_and_count(arr, temp, left, mid, right):
    # Merge sorted arr[left..mid] and arr[mid+1..right]; return the cross inversions.
    i, j, k = left, mid + 1, left
    inv_count = 0
    while i <= mid and j <= right:
        if arr[i] <= arr[j]:
            temp[k] = arr[i]
            i += 1
        else:
            temp[k] = arr[j]
            j += 1
            inv_count += (mid - i + 1)  # the element just taken is smaller than arr[i..mid]
        k += 1
    while i <= mid:
        temp[k] = arr[i]
        i += 1
        k += 1
    while j <= right:
        temp[k] = arr[j]
        j += 1
        k += 1
    for i in range(left, right + 1):
        arr[i] = temp[i]
    return inv_count

def count_inversions(arr, temp, left, right):
    # Sorts arr[left..right] in place and returns its inversion count.
    inv_count = 0
    if left < right:
        mid = (left + right) // 2
        inv_count += count_inversions(arr, temp, left, mid)
        inv_count += count_inversions(arr, temp, mid + 1, right)
        inv_count += merge_and_count(arr, temp, left, mid, right)
    return inv_count

# Usage: temp = [0] * len(arr); total = count_inversions(arr, temp, 0, len(arr)-1)
Line-by-Line Explanation
- if arr[i] <= arr[j]: We take from the left when it's less than or equal, so we only take from the right when arr[j] < arr[i]. In that case, arr[j] is smaller than arr[i] and every element after i in the left half (all ≥ arr[i]). So arr[j] forms an inversion with (mid − i + 1) elements.
- inv_count += (mid - i + 1): The number of elements left in the left subarray (from i to mid inclusive). Each of these is an inversion with the current right element we're placing.
Time Complexity
Same as merge sort: we do the same splits and merges, with one extra addition per "take from right" event. So O(n log n). Space O(n) for the temp array and recursion stack O(log n).
Space Complexity
O(n) for the temporary array used in merge; O(log n) for recursion stack. O(n) total.
Brute Force vs Merge-Sort
| Method | Time | Space |
|---|---|---|
| Brute force (all pairs) | O(n²) | O(1) |
| Merge sort based | O(n log n) | O(n) |
For "minimum adjacent swaps to sort," the answer is exactly the inversion count (each swap of an adjacent out-of-order pair removes exactly one inversion, and no adjacent swap can remove more than one). For "minimum swaps to sort" when any two elements can be swapped (not necessarily adjacent), the answer for distinct elements is n − c, where c is the number of cycles in the permutation that maps each position to its sorted position (a cycle of length k needs k−1 swaps). Don't confuse the two problems.
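The cycle-based formula for the "any two elements" version can be sketched as below. The function name min_swaps_any is illustrative, and the sketch assumes all elements are distinct; with duplicates the formula does not apply directly.

```python
def min_swaps_any(arr):
    """Minimum swaps (any two positions) to sort arr; assumes distinct elements."""
    n = len(arr)
    # order[k] = index in arr of the element that belongs at sorted position k.
    order = sorted(range(n), key=lambda i: arr[i])
    visited = [False] * n
    cycles = 0
    for start in range(n):
        if visited[start]:
            continue
        cycles += 1
        j = start
        while not visited[j]:
            visited[j] = True
            j = order[j]
    # A cycle of length k needs k - 1 swaps, so the total is n - cycles.
    return n - cycles


print(min_swaps_any([3, 2, 1]))        # 1 (swap 3 and 1)
print(min_swaps_any([2, 4, 1, 3, 5]))  # 3
```

For [3, 2, 1] the inversion count is 3 but a single non-adjacent swap sorts the array, which illustrates why the two problems have different answers.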
Edge Cases
- Empty or single element: 0 inversions.
- Sorted array: 0 inversions (no pair is out of order).
- Reverse sorted: Every pair (i, j) with i < j is an inversion → n(n−1)/2.
In an interview, define an inversion as a pair (i, j) with i < j and arr[i] > arr[j]. Say you can count in O(n²) by checking all pairs, or in O(n log n) by modifying merge sort: when merging, whenever you take an element from the right half, add (remaining left half size) to the count. Mention that inversion count = minimum adjacent swaps to sort (bubble sort).
Practice Problems
- Count inversions in an array (merge-sort method).
- Minimum adjacent swaps to sort (answer: inversion count).
- Count inversions where arr[i] > 2*arr[j] (modify merge step to count such pairs in O(n log n)).
Summary
- Inversion: (i, j) with i < j and arr[i] > arr[j]. Inversion count = total number of such pairs.
- Count in O(n log n) by merge sort: when placing an element from the right half during merge, add (mid − i + 1) to the count.
- Inversion count = minimum adjacent swaps needed to sort the array. For "any swap" minimum swaps, use n − (number of cycles).
7.1 String Basics
Introduction
In the world of data structures and algorithms, a string is one of the most fundamental and frequently used types of data. Whether you're checking if a word is a palindrome, finding anagrams, searching for a pattern in text, or parsing input, you are working with strings. String basics are the foundation for the entire Strings section: once you understand how strings are represented, how to access and slice them efficiently, and how immutability affects your algorithms, you can tackle frequency counting, anagrams, palindromes, pattern matching, and advanced algorithms like KMP and hashing with confidence.
In Python, a string is an immutable sequence of Unicode characters. You can think of it as a read-only array of characters: you can read any position in O(1), but you cannot change a character in place—any "change" creates a new string. This has important consequences for how we design string algorithms and how we reason about time and space.
Real-World Analogy
Imagine a necklace of beads where each bead is a letter. The beads are fixed on the string in order: you can point to the 1st bead, the 5th bead, or the last bead. You can look at any bead or a contiguous stretch of beads (a substring). But you cannot replace a single bead without making a whole new necklace—that's immutability. When you "add" another necklace (concatenation), you're really creating a new, longer necklace by copying both. So "building" a long necklace by adding one bead at a time means making a new necklace each time, which gets expensive. The smart way is to collect all the beads (or segments) first and then thread them once—in code, that's using a list and ''.join().
String s = "hello". The length is 5. The first character is s[0] → 'h'; the last is s[-1] → 'o'. The substring from index 1 to 4 (exclusive) is s[1:4] → "ell". Reversing the string with s[::-1] gives "olleh"—useful for checking palindromes. You cannot do s[0] = 'H'; to get "Hello" you must build a new string, e.g. 'H' + s[1:].
Formal Definition
String: A finite sequence of characters from some alphabet (e.g. ASCII, Unicode). The length n is the number of characters. We use 0-based indexing: the first character is at index 0, the last at index n−1. A substring (or slice) is a contiguous segment s[i:j] (indices i to j−1). A prefix is s[0:k]; a suffix is s[k:n]. Strings are compared lexicographically (character by character, like dictionary order). In Python, strings are immutable: no in-place modification.
Why This Topic Matters
- Core DSA topic: String problems appear regularly in interviews and contests: palindromes, anagrams, pattern matching, parsing, and text processing. String basics give you the vocabulary and operations you need.
- Efficiency matters: Accessing s[i] is O(1), but building a string with s += c in a loop is O(n²). Knowing when to use a list and ''.join() keeps your solutions fast.
- Bridges to advanced topics: Frequency counting (7.2), anagrams (7.3), and palindromes (7.4) all rely on iterating over characters, slicing substrings, and comparing strings. Pattern matching (7.5+) builds on substrings and indices.
- Language-agnostic thinking: In other languages (C++, Java), strings may be mutable or stored differently, but the logical view—sequence of characters, indexing, substrings—is the same. Master the concepts here and you can adapt.
Mental Model
Think of a string as a read-only array of characters with fixed positions. You have a cursor (index) that you can move; at each position you can read the character in O(1). You can also "window" a contiguous range (slice) to get a substring—that costs O(k) where k is the length of the slice, because a new string is created. You never "edit" the string; you only create new strings (slices, concatenations, or results of methods like replace, upper). When you need to build a string from many parts, imagine collecting the parts in a list and then joining them once—that way you pay O(n) total instead of repeated copying.
Step-by-Step: How We Work With Strings in Algorithms
- Read and index: Use s[i] for a single character (O(1)) and s[start:end] for a substring (O(length of slice)). Use len(s) for length.
- Iterate: Use for c in s to iterate over characters, or for i in range(len(s)) when you need indices. Use enumerate(s) for both.
- Compare: Use ==, <, > for lexicographic comparison. For "are these two strings equal?" use s1 == s2 (O(n)).
- Build new strings: Avoid result += c in a loop. Use parts = [], parts.append(...), then ''.join(parts) for O(n) total.
- Reverse and slice: s[::-1] is the reversed string. For palindrome check, s == s[::-1] is simple but uses O(n) extra space; two pointers from both ends give O(1) space.
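The two-pointer palindrome check mentioned in the last step can be sketched like this. The function name is illustrative, and the check is case-sensitive and treats every character (including spaces) as significant.

```python
def is_palindrome(s):
    """Compare characters from both ends, moving inward: O(n) time, O(1) space."""
    left, right = 0, len(s) - 1
    while left < right:
        if s[left] != s[right]:
            return False
        left += 1
        right -= 1
    return True


print(is_palindrome("racecar"))  # True
print(is_palindrome("hello"))    # False
print(is_palindrome(""))         # True (no characters to mismatch)
```

Unlike s == s[::-1], this version never builds a reversed copy and can stop at the first mismatch.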
ASCII Diagram: String as Indexed Sequence
s = "hello" (length n = 5)
Index:   0     1     2     3     4
Char:    h     e     l     l     o
         ↑                       ↑
        s[0]               s[4] or s[-1]
Slice s[1:4] → indices 1,2,3 → "ell"
Slice s[:3] → indices 0,1,2 → "hel" (prefix)
Slice s[2:] → indices 2,3,4 → "llo" (suffix)
Slice s[::-1] → step -1 → "olleh" (reverse)
Python Implementation: Essential Operations
Creation, Length, Indexing
s = "hello"
n = len(s) # 5
c0 = s[0] # 'h' — first character
c_last = s[-1] # 'o' — last character
sub = s[1:4] # "ell" — substring (indices 1 to 3)
rev = s[::-1] # "olleh" — reverse
Immutability: You Cannot Assign to s[i]
# s[0] = 'H' # TypeError: 'str' object does not support item assignment
# To "change" one character, build a new string:
s_new = 'H' + s[1:] # "Hello"
# Or for a generic index i:
i = 0
s_new = s[:i] + 'H' + s[i+1:] # "Hello" when i=0
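When several positions need to change, a common idiom is to convert the string to a list of characters, edit the list in place, and join once at the end. The sketch below is one way to do that; the variable names are illustrative.

```python
s = "hello"
chars = list(s)          # ['h', 'e', 'l', 'l', 'o'], a mutable copy of the characters
chars[0] = "H"           # lists support item assignment
chars[-1] = "O"
s_new = "".join(chars)   # build the new string once: "HellO"

print(s, s_new)          # hello HellO (the original string is unchanged)
```

This costs O(n) for the conversion and O(n) for the join, regardless of how many single-character edits happen in between.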
Building a String Efficiently (List + join)
# Slow: O(n^2), avoid in loops
result = ""
for c in "hello":
    result = result + c.upper()   # each concatenation copies the entire result so far
# Fast: O(n)
result = ''.join(c.upper() for c in "hello")   # "HELLO"
# Or explicitly with a list:
parts = []
for c in "hello":
    parts.append(c.upper())
result = ''.join(parts)
Common Methods Relevant to DSA
s = "hello world"
s.split() # ['hello', 'world'] — by whitespace
s.find("ell") # 1 — index of first occurrence, or -1 if not found
s.count("l") # 3 — number of non-overlapping occurrences
s.startswith("hel") # True
s.endswith("rld") # True
s.replace("l", "L") # "heLLo worLd" — returns new string
s.upper() # "HELLO WORLD"
s.lower() # "hello world"
Line-by-Line: Why join Is O(n) and += in a Loop Is O(n²)
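Suppose you build a string by adding one character at a time in a loop. The first concatenation copies 1 character, the second copies 2, the third copies 3, … the nth copies n. Total work is 1 + 2 + 3 + … + n = n(n+1)/2 = O(n²). So repeated result += c is quadratic in general. (CPython can sometimes extend the string in place and avoid the copy, but PEP 8 advises against relying on that optimization.)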
With a list: each parts.append(c) is amortized O(1). At the end, ''.join(parts) allocates one new string of length n and copies each character once: O(n). So the whole process is O(n).
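The arithmetic above can be replayed with a simple cost model that counts characters written, rather than timing real code. Both function names are illustrative, and the model deliberately ignores interpreter-level optimizations.

```python
def chars_copied_concat(n):
    """Characters written when a string grows by one character per step."""
    copied = 0
    length = 0
    for _ in range(n):
        length += 1
        copied += length   # the new string of `length` characters is written out
    return copied


def chars_copied_join(n):
    """Characters written by a single join over n one-character parts."""
    return n


for n in (10, 1000):
    print(n, chars_copied_concat(n), chars_copied_join(n))
# 10 55 10
# 1000 500500 1000
```

Growing n by a factor of 100 multiplies the concatenation cost by roughly 10,000 but the join cost by only 100.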
Time Complexity of Key Operations
| Operation | Time |
|---|---|
| s[i], len(s) | O(1) |
| s[i:j] (slice of length k) | O(k) |
| s1 + s2 (concatenation) | O(len(s1)+len(s2)) |
| c in s, s.find(sub) | O(n) |
| s1 == s2 | O(n) |
| ''.join(list_of_strings) (total length n) | O(n) |
Space Complexity
Storing a string of length n takes O(n) space. Slicing s[i:j] creates a new string of length k, so O(k) extra space. Reversing via s[::-1] uses O(n) extra space; reversing in place is not possible because strings are immutable. For O(1) extra space, use two pointers and compare s[left] and s[right] without creating a new string (e.g. for palindrome check).
Edge Cases
- Empty string: "" has length 0. s[0] would raise IndexError. Check if not s or len(s) == 0 before indexing.
- Single character: "a" has length 1; s[0] and s[-1] are the same. Reversing gives the same string (palindrome).
- Whitespace and case: Depending on the problem, you may need to remove whitespace (strip() handles only the ends; interior spaces need a filter or replace) or normalize case (lower()) before comparing (e.g. "A man a plan a canal Panama" for palindromes).
- Unicode: In Python 3, strings are Unicode, and len(s) and s[i] work on code points. A user-perceived character (a grapheme cluster, e.g. some emojis) may span several code points. For many DSA problems we assume ASCII or single code points; for full i18n, consider normalization.
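For the whitespace-and-case edge case, one possible sketch skips non-alphanumeric characters and compares case-insensitively while keeping the two-pointer structure. The function name is illustrative, and treating only alphanumeric characters as significant is an assumption that depends on the problem statement.

```python
def is_palindrome_normalized(text):
    """Palindrome check that ignores case and non-alphanumeric characters."""
    left, right = 0, len(text) - 1
    while left < right:
        if not text[left].isalnum():
            left += 1                      # skip spaces and punctuation on the left
        elif not text[right].isalnum():
            right -= 1                     # skip spaces and punctuation on the right
        elif text[left].lower() != text[right].lower():
            return False
        else:
            left += 1
            right -= 1
    return True


print(is_palindrome_normalized("A man a plan a canal Panama"))  # True
print(is_palindrome_normalized("hello world"))                  # False
```

Skipping characters in place keeps the extra space at O(1), whereas building a cleaned copy first would use O(n).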
Common Mistakes
- Building a string with += in a loop: This leads to O(n²) time. Use a list and ''.join() instead.
- Trying to assign to s[i]: Strings are immutable; use s = s[:i] + new_char + s[i+1:] or build a new string another way.
- Assuming s[i] is a special "char" type: In Python, s[i] is a string of length 1. Comparisons like s[0] == 'h' work; you can use it in sets or as dict keys.
- Off-by-one in slices: s[i:j] includes index i and excludes j. So s[0:len(s)] is the whole string; s[0:len(s)-1] is all but the last character.
Using result += c (or result = result + c) inside a loop to build a string. Each concatenation copies the entire current result, so after n steps you've done O(n²) work. Always prefer parts.append(c) and ''.join(parts) when building strings in a loop.
Brute: Build string with += in a loop → O(n²) time. Better: Use a list, append each part in O(1) amortized, then ''.join(parts) in O(n) → O(n) total. For palindrome check, s == s[::-1] is O(n) time and O(n) space; optimal for space is two pointers and compare s[left] == s[right] while moving inward—O(n) time, O(1) space.
Pattern Recognition
When you see a string problem, ask: Do I need to read characters (index, iterate), extract substrings (slicing), compare strings (equality, lexicographic), or build a new string? For reading and comparing, use indices and loops. For building, use a list and join. For "is it a palindrome?", either s == s[::-1] (simple) or two pointers (O(1) space). For "are two strings anagrams?", sorted(s1)==sorted(s2) is O(n log n); frequency count (e.g. Counter) is O(n). These patterns will recur in the next topics (7.2–7.5).
Keep Python's string methods in mind: find, count, startswith, endswith, split, replace. For interviews, you're often allowed to use these; for learning, implement key logic yourself (e.g. substring search) to understand later algorithms like KMP. When in doubt, index and slice explicitly—it's clear and correct.
Interviewers expect you to know: strings are immutable; s[i] is O(1), slicing s[i:j] is O(k); building with += in a loop is O(n²)—use list and ''.join(). For palindrome: s == s[::-1] or two pointers. For anagram: sort both and compare, or use a frequency map. Stating these complexities and trade-offs shows you understand string basics and are ready for harder string problems.
Practice Problems
- Check if a string is a palindrome (two-pointer and slice versions).
- Reverse a string (and reverse words in a sentence).
- Build a string from alternating characters of two strings (e.g. "ace" and "bdf" → "abcdef").
- Given a string, replace every occurrence of a character with another (without using str.replace).
Summary
- A string is an immutable sequence of characters; 0-based indexing, s[i] O(1), s[i:j] O(k).
- Immutability: You cannot assign to s[i]; "changes" require building a new string (e.g. slice + concatenation).
- Building strings: Use parts.append(...) and ''.join(parts) for O(n) total; avoid result += c in a loop (O(n²)).
- Use len(s), s[::-1] for reverse, s.find, s.count, split, strip, upper/lower as needed; all return new values or indices.
- Edge cases: empty string, single character, case and whitespace. For palindromes and anagrams, these basics are the foundation for the next topics in Section 7.
7.2 Frequency Counting
Introduction
Frequency counting means counting how many times each distinct element (character, digit, or key) appears in a string, array, or sequence. It is one of the most common and powerful techniques in string and array problems. Once you have a frequency map—a dictionary (or hash map) from element to count—you can answer questions like: "Do two strings have the same character counts?" (anagrams), "Which character appears most often?", "Can we form string A from the characters of string B?", and "How many characters do we need to change?" Frequency counting turns many "compare every pair" or "search for each element" ideas into a single pass plus O(1) lookups, giving O(n) time instead of O(n²) or worse.
In this topic we focus on character frequency in strings: building the map, using it for comparison and validation, and connecting it to anagrams (7.3) and other string problems. The same pattern applies to counting digits, array elements, or any hashable keys.
Real-World Analogy
Imagine you have a bag of letter tiles (like in Scrabble). To check whether you can spell a word, you don't try every possible arrangement—you count how many of each letter you have. Then for the target word, you count how many of each letter it needs. If, for every letter, your count is at least the word's count, you can spell it. That's frequency counting: two counts (your tiles, the word) and a per-letter comparison. Similarly, to see if two words are anagrams (same letters, different order), you compare their letter counts; if every letter has the same count in both, they're anagrams.
String s = "aabbbc". Frequency map: {'a': 2, 'b': 3, 'c': 1}. We can answer: "How many 'b's?" → 3. "Which character appears most?" → 'b'. For t = "abc", map {'a': 1, 'b': 1, 'c': 1}. To check "can we form t from characters of s?" we need, for each char in t, freq_s[char] >= freq_t[char]. Here s has 2 a's, 3 b's, 1 c; t needs 1, 1, 1 → yes. For anagrams: "listen" and "silent" have the same frequency map (each letter count 1), so they are anagrams.
Formal Definition
Frequency (count) of an element x in a sequence S: the number of indices i such that S[i] = x. A frequency map (or frequency table) is a mapping from each distinct element to its count. For a string of length n over an alphabet of size Σ, the map has at most Σ keys; we typically use a hash table (dict) so that building the map is O(n) and lookup/update is O(1) amortized. Two strings are anagrams iff their frequency maps are equal. We say we can "form" string A from string B iff for every character c, freq_B[c] ≥ freq_A[c].
Why This Topic Matters
- Anagrams (7.3): The standard way to check if two strings are anagrams is to compare their character frequencies—either build two maps and compare, or build one and decrement while scanning the other. Frequency counting is the backbone of anagram problems.
- Palindrome construction: A string can be rearranged into a palindrome iff at most one character has odd frequency. So building the frequency map is the first step.
- Substring / window problems: "Find the minimum window in s that contains all characters of t" uses frequency maps for t and for the current window; we update counts as we slide.
- Interviews: "Are two strings anagrams?", "First unique character", "Majority element", "Group anagrams": all rely on counting. Using a dict or Counter and stating O(n) time is expected.
Mental Model
One pass over the sequence: for each element, either add it to the map with count 1 (first time) or increment its count. After the pass, the map answers "how many times does x appear?" in O(1). For comparing two strings (e.g. anagrams), you can build two maps and check equality, or build one map from the first string and then iterate over the second, decrementing the count for each character—if any count goes negative or a character is missing, they're not anagrams. Think: "count first, compare counts."
Step-by-Step: Building a Frequency Map
- Initialize: Create an empty dictionary freq = {} (or defaultdict(int), or Counter()).
- Single pass: For each character c in the string, do freq[c] = freq.get(c, 0) + 1 (or freq[c] += 1 with defaultdict/Counter).
- Use the map: Look up freq[char] for any character; missing keys mean count 0 (use freq.get(char, 0) if not using defaultdict/Counter).
Step-by-Step: Check Anagrams via Frequency
- If len(s) != len(t), they cannot be anagrams → return False.
- Build frequency map for s: one pass, O(n).
- Option A: Build frequency map for t and check freq_s == freq_t. Option B: Iterate over t, and for each c decrement freq_s[c]; if c is not in the map or its count would become negative, return False. Option B uses one map and avoids building a second.
ASCII Diagram: Frequency Map
s = "aabbbc"
Pass: a a b b b c
freq: a:1 a:2 b:1 b:2 b:3 c:1
Final freq = { 'a': 2, 'b': 3, 'c': 1 }
For t = "abc": need a≥1, b≥1, c≥1.
freq['a']=2≥1, freq['b']=3≥1, freq['c']=1≥1 → can form "abc" from s.
Python Implementation
Manual Frequency Map (dict)
def count_freq(s):
    freq = {}
    for c in s:
        freq[c] = freq.get(c, 0) + 1
    return freq

# Example: count_freq("aabbbc") → {'a': 2, 'b': 3, 'c': 1}
Using collections.Counter
from collections import Counter

def count_freq_counter(s):
    return Counter(s)  # one line; Counter is a dict subclass

# Counter("aabbbc") → Counter({'b': 3, 'a': 2, 'c': 1})
# Indexing a Counter with a missing key such as 'x' returns 0 (no KeyError)
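As a small illustration of querying a Counter, the sketch below answers the two questions from the earlier "aabbbc" walk-through (how many 'b's, and which character appears most). The helper name most_frequent_char is hypothetical and only for illustration; Counter.most_common is part of the standard library.

```python
from collections import Counter

def most_frequent_char(s):
    """Return (char, count) for the most common character, or None for an empty string."""
    if not s:
        return None
    # most_common(1) returns a one-element list of (element, count) pairs
    return Counter(s).most_common(1)[0]

freq = Counter("aabbbc")
print(freq["b"])                     # count of 'b'
print(freq["z"])                     # missing key: reads as 0, no KeyError
print(most_frequent_char("aabbbc"))  # ('b', 3)
```

Ties are not an issue in this example because 'b' is the unique maximum; if several characters share the top count, which one comes first should not be relied on without checking the documentation.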
Check Anagrams (Two Maps)
def are_anagrams(s, t):
    if len(s) != len(t):
        return False
    return count_freq(s) == count_freq(t)
Check Anagrams (One Map, Decrement)
def are_anagrams_one_map(s, t):
    if len(s) != len(t):
        return False
    freq = {}
    for c in s:
        freq[c] = freq.get(c, 0) + 1
    for c in t:
        if c not in freq or freq[c] == 0:
            return False
        freq[c] -= 1
    return True
Can We Form String t from String s?
def can_form(t, s):
    """Can we form string t using characters from s (each char used at most once)?"""
    freq_s = count_freq(s)
    for c in t:
        if freq_s.get(c, 0) < 1:
            return False
        freq_s[c] -= 1  # or: freq_s[c] = freq_s.get(c, 0) - 1
    return True
Line-by-Line Explanation (One-Map Anagram Check)
- if len(s) != len(t): return False: Different lengths ⇒ different total counts ⇒ cannot be anagrams.
- First loop: build freq for s. Each character increments its count.
- Second loop: for each character in t, we "use" one occurrence. If the character isn't in the map, or we've already used all of them (freq[c] == 0), return False. Otherwise decrement freq[c].
- If we finish the second loop without returning, every character in t was matched with a distinct occurrence in s, and lengths are equal, so the strings are anagrams.
Time Complexity
Building the map: One pass over n characters; each update is O(1) amortized. Total O(n).
Comparing two maps (anagrams): Building both maps is O(n + m). Comparing two dicts takes time proportional to the number of distinct keys, which is at most min(n, |Σ|), so the comparison never dominates. Overall O(n + m). One-map decrement approach: O(n) for the first string, O(m) for the second → O(n + m).
Lookup: After the map is built, freq[c] or freq.get(c, 0) is O(1) amortized.
Space Complexity
The frequency map has at most min(n, |Σ|) entries (one per distinct character in the string). So O(min(n, |Σ|)). For ASCII we can say O(1) if we assume |Σ| is constant (128 or 256); for Unicode, O(k) where k is the number of distinct characters. For anagram check with one map we use one such map: O(min(n, |Σ|)).
Edge Cases
- Empty string: Frequency map is {}. Two empty strings are anagrams. Forming "" from any string s is true (need zero of each character).
- Single character: "a" → {'a': 1}. "a" and "a" are anagrams.
- All same character: "aaaa" → {'a': 4}. "aaaa" and "aaaa" are anagrams; "aa" can be formed from "aaaa".
- Case and spaces: Often the problem says "ignore case" or "alphanumeric only." Normalize first (e.g. s = s.lower(); s = ''.join(c for c in s if c.isalnum())) before counting.
Common Mistakes
- Forgetting the length check for anagrams: If you only decrement one map while scanning t, a shorter t can still pass (e.g. s = "aab" and t = "ab": every character of t is found in the map, but s has an unused 'a'). Always check len(s) == len(t) first.
- KeyError when looking up a missing character: Use freq.get(c, 0) or Counter (which returns 0 for missing keys) instead of freq[c] when the character might not be in the map.
- Using a list of size 26 for "lowercase letters only" but not normalizing: If the problem says only 'a'–'z', you can use ord(c) - ord('a') as index, but ensure you've converted to lowercase first.
Checking anagrams without comparing lengths. If s = "a" and t = "aa", building one map from s gives {'a': 1}; decrementing for "aa" fails on the second 'a' (count 0), so you correctly return False. The reverse case is the trap: with s = "aa" and t = "a", the decrement loop finishes with one 'a' left over and returns True unless you also compare lengths (or verify that every count ended at zero). Always enforce len(s) == len(t) for anagrams.
Brute Force vs Frequency Map
| Approach | Anagram check | Time |
|---|---|---|
| Sort both and compare | sorted(s)==sorted(t) | O(n log n) |
| Frequency count (dict or Counter) | Two maps or one map + decrement | O(n) |
Brute: For "are s and t anagrams?", compare every permutation of one to the other—O(n!)—or sort both and compare—O(n log n). Better: Frequency count both strings and compare maps—O(n) time, O(|Σ|) space. For "first non-repeating character," brute is O(n²) (for each position, scan to see if that char appears again); with a frequency map (one pass to count, one pass to find first with count 1) we get O(n).
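The two-pass idea for "first non-repeating character" can be sketched as follows. The function name first_unique_index and the choice to return -1 when no such character exists are assumptions made for this example, not requirements stated in the text.

```python
from collections import Counter

def first_unique_index(s):
    """Return the index of the first character that occurs exactly once, or -1."""
    freq = Counter(s)              # pass 1: count every character
    for i, c in enumerate(s):      # pass 2: first position whose count is 1
        if freq[c] == 1:
            return i
    return -1                      # assumption: -1 signals "no unique character"

print(first_unique_index("loveleetcode"))  # 2 ('v' is the first unique character)
print(first_unique_index("aabb"))          # -1
```

Both passes are linear, so the total is O(n) time, compared with O(n²) for rescanning the string at every position.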
Pattern Recognition
Think "frequency count" when you see: anagrams, permutation, same characters, rearrange, minimum window containing all characters of t, first unique / first non-repeating, majority element, can form string A from B. If the problem asks "how many of each?" or "do they have the same multiset of elements?", build a map first.
In Python, Counter from collections is ideal for frequency counting: Counter(s) in one line, and freq[c] returns 0 for missing keys. For "lowercase letters only" and tight constraints, a list of 26 ints with index ord(c)-ord('a') is also O(n) and uses less overhead than a dict. Use dict/Counter when the alphabet is large or unknown.
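A minimal sketch of the 26-slot list variant, assuming the input contains only 'a' to 'z'. The names count_freq_lowercase and are_anagrams_lowercase are hypothetical, and the explicit range check is an added safeguard rather than something the text requires.

```python
def count_freq_lowercase(s):
    """Count letters 'a'..'z' in a fixed list of 26 ints."""
    counts = [0] * 26
    for c in s:
        if not "a" <= c <= "z":
            # Without this guard, an out-of-range character could silently
            # index the wrong slot (negative index) or raise IndexError.
            raise ValueError(f"expected a lowercase letter, got {c!r}")
        counts[ord(c) - ord("a")] += 1
    return counts

def are_anagrams_lowercase(s, t):
    """Anagram check using two fixed-size count lists."""
    return len(s) == len(t) and count_freq_lowercase(s) == count_freq_lowercase(t)

print(are_anagrams_lowercase("listen", "silent"))  # True
print(are_anagrams_lowercase("aab", "abb"))        # False
```

Comparing two lists of 26 ints is a constant amount of work, so the whole check stays O(n).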
For "are two strings anagrams?", state: "We can sort both and compare—O(n log n)—or use a frequency map: one pass per string, compare maps—O(n)." Implement the map approach. Mention edge case: different lengths ⇒ not anagrams. For "first non-repeating character," say: "One pass to build frequency map, second pass to find first character with count 1—O(n) time, O(1) space if alphabet is fixed."
Practice Problems
- Check if two strings are anagrams (same characters, same counts).
- First non-repeating character in a string (return index or character).
- Given two strings s and t, can you form t using characters from s (each at most once)?
- Determine if a string can be rearranged into a palindrome (at most one character with odd frequency).
- Group anagrams: given a list of strings, group those that are anagrams of each other (use sorted string or frequency tuple as key).
Summary
- Frequency counting: one pass over the sequence, for each element increment its count in a dict (or Counter). Time O(n), space O(distinct elements).
- Use the map to compare counts (anagrams: same map), check "can form A from B" (freq_B ≥ freq_A for every char), or find first with count 1 (first non-repeating).
- Anagram check: length check, then two maps and compare, or one map and decrement for the second string. O(n) time.
- Edge cases: empty string, single char, case/non-alphanumeric (normalize first). Use freq.get(c, 0) or Counter to avoid KeyError.
7.3 Anagram Problems
Introduction
Two strings are anagrams of each other if they contain the same characters with the same frequencies, only in a different order. "listen" and "silent" are anagrams; "hello" and "world" are not. Anagram problems are among the most common string questions in interviews and coding rounds: check if two strings are anagrams, group a list of strings into anagram clusters, find all starting indices where a pattern's anagram appears in a text, or compute the minimum number of character changes to make two strings anagrams. All of these rest on the same idea—character frequency—but the problem shape changes (two strings vs many strings, exact match vs sliding window). This topic builds on frequency counting (7.2) and gives you a toolkit for every anagram variant.
Real-World Analogy
Think of anagrams as the same set of letter tiles arranged differently. If you have the tiles L-I-S-T-E-N and your friend has S-I-L-E-N-T, you both have exactly one L, one I, one S, one T, one E, one N. So you can spell the same words with your tiles; the two words you spell are anagrams. A "group anagrams" problem is like sorting a pile of words into buckets: words that use the same multiset of letters go in the same bucket. "Find anagram of pattern in text" is like looking for a contiguous stretch in a long string where the letter counts match the pattern: same multiset, in a window.
"listen" and "silent": both have 1× l, i, s, t, e, n → anagrams. "aab" and "abb": first has 2 a's and 1 b, second has 1 a and 2 b's → not anagrams. Group anagrams: ["eat","tea","ate","tan","nat","bat"] → groups [["eat","tea","ate"], ["tan","nat"], ["bat"]] (same sorted form or same frequency map per group). Find anagrams of "ab" in "cbaebabacd": windows "ba", "ba", "ab", "ba" at indices 1, 4, 5, 6 → answer [1, 4, 5, 6].
Formal Definition
Anagram: Strings s and t are anagrams iff they have the same length and the same multiset of characters (i.e. the same frequency map). Equivalently, t is a permutation of s. The anagram relation is therefore an equivalence relation (reflexive, symmetric, and transitive) on any set of strings, which is what makes grouping well defined. Group anagrams: Partition a list of strings so that two strings are in the same group iff they are anagrams. Find anagrams in text: Given text T and pattern P, find every starting index i such that the substring T[i : i+|P|] is an anagram of P. This is a fixed-length sliding window with frequency comparison.
Why This Topic Matters
- Interview staple: "Valid Anagram" and "Group Anagrams" are classic LeetCode-style questions. "Find All Anagrams in a String" combines anagrams with sliding window—very common.
- Reuses 7.2: Every anagram solution uses frequency counts. Here we apply that in different problem shapes: pair comparison, grouping by key, and sliding window.
- Pattern for other problems: "Minimum steps to make two strings anagrams" (count difference); "anagram palindrome" (at most one odd count). Same frequency logic, different output.
Mental Model
Anagrams = same character counts, different order. So compare counts, not order. For two strings: build counts and compare, or build one and decrement with the other. For grouping: assign each string a canonical key (sorted string or tuple of counts) and group by that key. For "find anagram in text": maintain a window of length |P|, keep a frequency map for the window, slide and update the map; when window's map equals pattern's map, record the start index.
Problem 1: Check If Two Strings Are Anagrams
Given strings s and t, return True if they are anagrams, False otherwise.
Approach
If len(s) != len(t), return False. Then either: (A) build frequency maps for both and check freq_s == freq_t, or (B) build one map from s and iterate over t decrementing; if any character is missing or count goes negative, return False.
from collections import Counter

def is_anagram(s, t):
    if len(s) != len(t):
        return False
    return Counter(s) == Counter(t)

# One-map variant (no Counter):
def is_anagram_one_map(s, t):
    if len(s) != len(t):
        return False
    freq = {}
    for c in s:
        freq[c] = freq.get(c, 0) + 1
    for c in t:
        if freq.get(c, 0) == 0:
            return False
        freq[c] -= 1
    return True
Time O(n+m), space O(|Σ|). With length check, n = m.
Problem 2: Group Anagrams
Given a list of strings, group them so that each group contains all anagrams of each other. Return a list of groups (lists of strings).
Approach
Every anagram has the same "signature": either the sorted string (e.g. "eat" → "aet") or a tuple of character counts. Use this as a key in a dict: key → list of strings with that key. One pass over the list: for each string, compute key, append to groups[key]. Return list(groups.values()).
from collections import defaultdict

def group_anagrams(strs):
    groups = defaultdict(list)
    for s in strs:
        key = tuple(sorted(s))  # or: key = ''.join(sorted(s))
        groups[key].append(s)
    return list(groups.values())

# Alternative key: tuple of counts for 'a'..'z' (if lowercase only)
def group_anagrams_count_key(strs):
    groups = defaultdict(list)
    for s in strs:
        count = [0] * 26
        for c in s:
            count[ord(c) - ord('a')] += 1
        groups[tuple(count)].append(s)
    return list(groups.values())
Time: O(n · k log k) with sorted key (n strings, max length k), or O(n · k) with count tuple. Space: O(n · k) for output.
Problem 3: Find All Anagrams in a String
Given string s (text) and string p (pattern), return a list of all start indices in s such that the substring of length len(p) starting at that index is an anagram of p.
Approach
Sliding window of fixed length len(p). Build frequency map need for p. Maintain a window map (or a single map that we update as we slide). For each window, check if the window's character counts match need. When we slide right, remove the character that leaves the window and add the new one. We can track a single map: when we add a char to the window, increment; when we remove, decrement. When the map equals need, append the start index. Comparing two maps each time is O(26) for lowercase; we can instead track "how many distinct chars have the correct count" to get O(1) comparison.
from collections import Counter

def find_anagrams(s, p):
    if len(p) > len(s):
        return []
    need = Counter(p)
    window = Counter(s[:len(p)])
    result = []
    if window == need:
        result.append(0)
    for i in range(len(p), len(s)):
        # add s[i], remove s[i - len(p)]
        window[s[i]] = window.get(s[i], 0) + 1
        left_char = s[i - len(p)]
        window[left_char] -= 1
        if window[left_char] == 0:
            del window[left_char]
        if window == need:
            result.append(i - len(p) + 1)
    return result
Time O(n) (n = len(s)); each step does O(1) or O(26) for map compare. Space O(1) if alphabet size is constant.
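The "track how many distinct characters have the correct count" idea mentioned in the approach can be sketched as below. This is one possible arrangement, not the only one: the name find_anagrams_matched and the variable matched are hypothetical, and returning [] for an empty pattern is an assumption (the version above would instead report every index).

```python
from collections import Counter

def find_anagrams_matched(s, p):
    """Fixed-length sliding window with an O(1) match test per step."""
    k = len(p)
    if k == 0 or k > len(s):
        return []                  # assumption: empty pattern yields no matches
    need = Counter(p)
    window = Counter()             # counts only characters that occur in p
    matched = 0                    # distinct chars c with window[c] == need[c]
    result = []
    for i, c in enumerate(s):
        if c in need:              # character entering the window
            if window[c] == need[c]:
                matched -= 1       # it was exact, now it will overshoot
            window[c] += 1
            if window[c] == need[c]:
                matched += 1
        if i >= k:                 # character leaving the window
            left = s[i - k]
            if left in need:
                if window[left] == need[left]:
                    matched -= 1
                window[left] -= 1
                if window[left] == need[left]:
                    matched += 1
        # The window has length k; if every needed count is exact, the counts
        # already sum to k, so no foreign character can be inside the window.
        if i >= k - 1 and matched == len(need):
            result.append(i - k + 1)
    return result

print(find_anagrams_matched("cbaebabacd", "abc"))  # [0, 6]
print(find_anagrams_matched("abab", "ab"))         # [0, 1, 2]
```

Each step touches at most two counters and compares two integers, so the per-window check no longer depends on the alphabet size.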
Problem 4: Minimum Number of Steps to Make Two Strings Anagrams
Given two strings of equal length, you may change any character of one string to any other character. Return the minimum number of such changes that makes the two strings anagrams. Equal length is required because a change never adds or removes a character. (Equivalently: how many characters differ in the multiset view? Or: total length minus the "matched" count.)
Approach
Count frequency of each character in both strings. For each character, the "surplus" in one string that we can't match with the other is the extra we must change. One common approach: count frequency of s and t. For each character, the number of changes needed for that character is |freq_s[c] - freq_t[c]|. Sum over all c and divide by 2 (each "change" fixes one excess in one string and one deficit in the other). Alternatively: minimum steps = (sum of (freq_s[c] - freq_t[c]) for c where freq_s[c] > freq_t[c]) = half of sum of absolute differences.
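from collections import Counter

def min_steps_anagram(s, t):
    # Assumes len(s) == len(t); otherwise no sequence of changes can work.
    freq_s = Counter(s)
    freq_t = Counter(t)
    total_diff = 0
    all_keys = set(freq_s) | set(freq_t)
    for c in all_keys:
        total_diff += abs(freq_s.get(c, 0) - freq_t.get(c, 0))
    # Each change removes one surplus and fills one deficit, hence the halving.
    return total_diff // 2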
Time O(n+m), space O(|Σ|).
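The "sum of surpluses" formulation from the approach can also be written with Counter subtraction, which keeps only positive counts. This is a sketch assuming equal-length inputs; the name min_steps_surplus and the ValueError on unequal lengths are choices made for this example.

```python
from collections import Counter

def min_steps_surplus(s, t):
    """Minimum character changes so that s and t become anagrams."""
    if len(s) != len(t):
        # assumption: reject inputs that can never become anagrams by changes
        raise ValueError("strings must have equal length")
    # Counter subtraction drops zero and negative results, leaving exactly
    # the characters s has in excess of t.
    surplus = Counter(s) - Counter(t)
    return sum(surplus.values())

print(min_steps_surplus("bab", "aba"))            # 1
print(min_steps_surplus("leetcode", "practice"))  # 5
```

With equal lengths the total surplus of s over t equals the total surplus of t over s, which is why this matches half the sum of absolute differences.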
Evolution: Brute → Sort → Frequency
| Approach | Time (two strings) | Note |
|---|---|---|
| Check all permutations of one | O(n!) | Impractical |
| Sort both, compare | O(n log n) | Simple, no extra structure |
| Frequency count (dict/Counter) | O(n) | Optimal for comparison |
For two strings: frequency count wins—O(n). For group anagrams, the key must be canonical: sorted string is O(k log k) per string; count tuple is O(k) per string (fixed alphabet). For find anagrams in text, sliding window + frequency map is O(n); avoid re-building the window map from scratch each time—update incrementally when sliding.
Edge Cases
- Empty strings: Two empty strings are anagrams. Group anagrams of [""] → [[""]]. Find anagrams of "" in s: every index is valid (often problem says pattern length > 0).
- Unequal length: For "are s and t anagrams?", if lengths differ, return False immediately.
- Single character: "a" and "a" are anagrams; "a" and "b" are not.
- Case and non-letters: Often problems say "lowercase only" or "ignore case and non-alphanumeric." Normalize before counting (e.g. s = ''.join(c.lower() for c in s if c.isalnum())).
Common Mistakes
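- Forgetting the length check for "are two strings anagrams?": a decrement-based check that only scans t accepts a t that is shorter than s (e.g. s = "aa", t = "a": the single 'a' is matched and the loop ends with one 'a' unused). The opposite case, "a" vs "aa", does fail on the second 'a', which makes the bug easy to miss in testing.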
- In "find anagrams in string," building a new Counter for every window instead of updating the previous window—that makes each step O(k) and total O(n·k). Update in O(1): add one char, remove one char.
- In group anagrams, using the string itself as key—"eat" and "tea" would get different keys. Use sorted string or count tuple.
In find-all-anagrams, comparing the full window map to the pattern map by rebuilding the window from the substring each time: Counter(s[i:i+len(p)]) == need in a loop. That is O(k) per index, so O(n·k). Instead, maintain one window map and update it when sliding: drop s[i-1], add s[i+len(p)-1]—O(1) per step, total O(n).
Pattern Recognition
Keywords: anagram, permutation, same characters, rearrange, group by same letters. If the problem asks "same multiset" or "same characters in any order," use frequency. For "all start indices where window is anagram of pattern," use fixed-length sliding window + frequency map.
Start with "Two strings are anagrams iff they have the same character counts. So we can sort both and compare—O(n log n)—or use a Counter for each—O(n)." For group anagrams: "Use a canonical key—sorted string or tuple of counts—and group by that key." For find anagrams in string: "Sliding window of length len(p), maintain frequency of the window, update when we slide; when window count matches pattern count, add start index." Mention length check and empty/case edge cases.
Practice Problems
- Valid Anagram (LeetCode 242): check if two strings are anagrams.
- Group Anagrams (LeetCode 49): group list of strings into anagram groups.
- Find All Anagrams in a String (LeetCode 438): return all start indices.
- Minimum Number of Steps to Make Two Strings Anagram (LeetCode 1347).
- Check if a string can be rearranged into a palindrome (at most one odd count).
Summary
- Anagrams = same character counts, any order. Always use frequency (dict/Counter); optionally sort for a canonical key.
- Check two strings: length check, then Counter(s)==Counter(t) or one-map decrement. O(n).
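- Group anagrams: key = tuple(sorted(s)) (a bare sorted(s) is a list and cannot be a dict key) or tuple of counts; group by key. O(n·k log k) with the sorted key, O(n·k) with the count tuple.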
- Find anagrams in text: sliding window of length |p|, maintain window frequency, update on slide; when window == need, record start. O(n).
- Min steps to anagram: sum of |freq_s[c]-freq_t[c]| over all c, divided by 2. O(n+m).
7.4 Palindrome Problems
Introduction
A palindrome is a string that reads the same forward and backward: "aba", "racecar", "a". Palindrome problems are extremely common in string interviews: check if a string is a palindrome, find the longest palindromic substring, determine if a string can be rearranged into a palindrome, or compute the minimum number of insertions (or deletions) to make a string a palindrome. The core idea is symmetry around a center—either the string equals its reverse, or we expand from centers to find palindromic substrings. This topic ties together string basics (7.1), indexing and two pointers, and frequency counting (7.2) for the "rearrange to palindrome" variant.
Real-World Analogy
Imagine writing a word on a strip of paper and holding it up to a mirror. If the word looks the same in the mirror as it does on the paper (ignoring the mirror flip), it's a palindrome—the first letter matches the last, the second matches the second-to-last, and so on. "NOON" is a palindrome; "NOON" in the mirror still reads NOON. For "longest palindromic substring," you're looking for the longest stretch inside a string that has this mirror property. For "can we rearrange to form a palindrome?", you're asking: can we arrange the letters so that the left half is the mirror of the right? That's possible only if at most one letter appears an odd number of times (that one can go in the middle).
"aba" → same forward and backward → palindrome. "abba" → palindrome. "abc" → not (a≠c). Valid palindrome (ignore non-alphanumeric, case): "A man, a plan, a canal: Panama" → normalize to "amanaplanacanalpanama" → same as reverse → valid. Longest palindromic substring in "babad": "bab" or "aba" (length 3). Can rearrange to palindrome? "aab" → 2 a's, 1 b → one odd (b) → yes, e.g. "aba". "abc" → three odds → no.
Formal Definition
Palindrome: String s of length n is a palindrome iff s[i] = s[n−1−i] for all i in 0..n−1. Equivalently, s = s[::-1] (reverse). A palindromic substring is any contiguous substring that is a palindrome. A string can be rearranged into a palindrome iff at most one character has odd frequency (that character can sit in the center; the rest pair off). Valid palindrome (typical problem): after removing non-alphanumeric characters and normalizing case, the string reads the same forward and backward.
Why This Topic Matters
- Interview staple: "Valid Palindrome," "Longest Palindromic Substring," and "Palindrome Number" (digit version) appear constantly. "Minimum insertions to make palindrome" is a classic DP variant.
- Two-pointer foundation: Checking a palindrome with two pointers (left and right moving inward) is the standard O(1)-space approach and generalizes to "expand around center" for longest palindromic substring.
- Links to 7.2 and 7.11: "Can rearrange to palindrome" uses frequency counts; longest palindromic substring has an optimal linear-time algorithm (Manacher's, topic 7.11). Here we cover the core ideas and O(n²) expand-around-center.
Mental Model
Check palindrome: Compare first with last, second with second-to-last, etc. Either use s == s[::-1] (simple, O(n) space) or two pointers left, right; while left < right, if s[left] != s[right] return False; then left += 1, right -= 1. Longest palindromic substring: For each possible "center" (character or between two characters), expand outward while left and right match; track the longest. Rearrange to palindrome: Count character frequencies; allow at most one character with odd count.
Problem 1: Check If a String Is a Palindrome
Given string s, return True if it reads the same forward and backward, False otherwise.
Approach
Slice: return s == s[::-1]. Time O(n), space O(n) for the reversed copy. Two pointers: left = 0, right = len(s)-1; while left < right, if s[left] != s[right] return False; then left += 1, right -= 1. Return True. Time O(n), space O(1).
def is_palindrome_slice(s):
    return s == s[::-1]

def is_palindrome_two_pointers(s):
    left, right = 0, len(s) - 1
    while left < right:
        if s[left] != s[right]:
            return False
        left += 1
        right -= 1
    return True
Problem 2: Valid Palindrome (Ignore Non-Alphanumeric, Case)
Given a string, consider only alphanumeric characters and ignore case. Return True if the resulting string is a palindrome.
Approach
Normalize: build a string (or list) of alphanumeric characters only, lowercased. Then check that string with two pointers (or slice). Alternatively, use two pointers on the original string: skip non-alphanumeric by advancing left or right until they point to alphanumeric, then compare (lowercase); if unequal return False.
def is_valid_palindrome(s):
    # Normalize then check
    cleaned = ''.join(c.lower() for c in s if c.isalnum())
    left, right = 0, len(cleaned) - 1
    while left < right:
        if cleaned[left] != cleaned[right]:
            return False
        left += 1
        right -= 1
    return True

# O(1) extra space: two pointers on original, skip non-alphanumeric
def is_valid_palindrome_inplace(s):
    left, right = 0, len(s) - 1
    while left < right:
        while left < right and not s[left].isalnum():
            left += 1
        while left < right and not s[right].isalnum():
            right -= 1
        if s[left].lower() != s[right].lower():
            return False
        left += 1
        right -= 1
    return True
Time O(n), space O(n) for cleaned string or O(1) for in-place.
Problem 3: Longest Palindromic Substring
Given string s, return the longest contiguous substring that is a palindrome.
Approach: Expand Around Center
Every palindrome has a "center": either a single character (odd length) or between two characters (even length). For each center (2n−1 possible centers: n for character, n−1 for between), expand left and right while s[left] == s[right]; track the longest substring seen. Total O(n²) time, O(1) space. (Manacher's algorithm, topic 7.11, does O(n).)
def longest_palindrome(s):
    if not s:
        return ""
    n = len(s)
    start, max_len = 0, 1

    def expand(l, r):
        nonlocal start, max_len
        while l >= 0 and r < n and s[l] == s[r]:
            if r - l + 1 > max_len:
                max_len = r - l + 1
                start = l
            l -= 1
            r += 1

    for i in range(n):
        expand(i, i)      # odd-length: center at s[i]
        expand(i, i + 1)  # even-length: center between s[i] and s[i+1]
    return s[start:start + max_len]
Time O(n²), space O(1).
Problem 4: Can String Be Rearranged Into a Palindrome?
Given a string, determine if its characters can be rearranged to form a palindrome (e.g. "aab" → "aba").
Approach
Build character frequency. A palindrome has at most one character with odd count (the center). So: count odds; return sum(1 for c in freq.values() if c % 2 == 1) <= 1.
from collections import Counter

def can_rearrange_palindrome(s):
    freq = Counter(s)
    odd_count = sum(1 for c in freq.values() if c % 2 == 1)
    return odd_count <= 1
Time O(n), space O(|Σ|).
Problem 5: Count Palindromic Substrings
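Given string s, return the number of palindromic substrings. Substrings are counted by position: two substrings with different start or end indices count separately even if they contain the same characters (so "aaa" has 6, not 3).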
Approach
Same "expand around center" as longest palindromic substring: for each center, expand and for each valid (left, right) where s[left:right+1] is a palindrome, count one. Odd center: start with (i, i); even: (i, i+1). While expanding, each time s[l]==s[r] we have one more palindromic substring.
def count_palindromic_substrings(s):
    n = len(s)
    count = 0

    def expand(l, r):
        nonlocal count
        while l >= 0 and r < n and s[l] == s[r]:
            count += 1
            l -= 1
            r += 1

    for i in range(n):
        expand(i, i)
        expand(i, i + 1)
    return count
Time O(n²), space O(1).
Evolution: Check Palindrome
| Approach | Time | Space |
|---|---|---|
| Slice: s == s[::-1] | O(n) | O(n) |
| Two pointers (left/right) | O(n) | O(1) |
For check palindrome: two pointers give O(1) space; slice is simpler but O(n) space. For longest palindromic substring, expand-around-center is O(n²). Manacher's algorithm (topic 7.11) achieves O(n) by reusing information from previous centers. For count palindromic substrings, expand-around-center is the standard approach and likewise runs in O(n²) time with O(1) space.
Edge Cases
- Empty string: Often considered a palindrome (length 0). Check problem statement.
- Single character: Always a palindrome.
- Valid palindrome: String with no alphanumeric characters (e.g. " ") → empty after cleaning → typically true (empty is palindrome).
- Longest palindromic substring: If no palindrome of length > 1, return any single character (e.g. s[0]).
Common Mistakes
- In "valid palindrome," forgetting to skip non-alphanumeric or to normalize case—"A" and "a" must be treated as the same.
- In expand-around-center, only expanding from characters and forgetting even-length palindromes (center between two chars), so you miss "abba". Always try both expand(i,i) and expand(i,i+1).
- For "rearrange to palindrome," confusing with "is the string already a palindrome?": we only need at most one odd count, not that the string equals its reverse.
When implementing expand-around-center for longest palindromic substring, using only one type of center (e.g. odd-length only). Even-length palindromes like "abba" have their center between two characters; you must call expand(i, i+1) for each i as well as expand(i, i).
Pattern Recognition
Keywords: palindrome, reads same forward and backward, symmetric, longest palindromic, rearrange to palindrome. Check → two pointers or reverse. Longest/count palindromic substrings → expand around center (or Manacher for linear). Rearrange → frequency count, at most one odd.
For "valid palindrome" with only alphanumeric and case ignored, the in-place two-pointer version avoids building a new string and is O(1) space. For longest palindromic substring, expand-around-center is interview-friendly and O(n²); mention that Manacher's gives O(n) if the interviewer asks for optimal.
"Is it a palindrome?" → "We can compare with reverse—s == s[::-1]—or use two pointers from both ends, O(1) space." For longest palindromic substring: "For each center (character or between two), expand while characters match; track the longest. O(n²) time, O(1) space. There's also Manacher's algorithm for O(n)." For "can rearrange to palindrome": "Count character frequencies; we need at most one character with odd count—O(n) with a Counter."
Practice Problems
- Valid Palindrome (LeetCode 125): ignore non-alphanumeric, case.
- Longest Palindromic Substring (LeetCode 5): expand around center or Manacher.
- Palindromic Substrings (LeetCode 647): count palindromic substrings.
- Palindrome Permutation (LeetCode 266): can string be rearranged to palindrome?
- Minimum Insertions to Make Palindrome (DP or longest palindromic subsequence).
Summary
- Palindrome = reads same forward and backward. Check: s == s[::-1] or two pointers. O(n) time; two pointers use O(1) space.
- Valid palindrome: Normalize (alphanumeric, lowercase) then check, or two pointers skipping non-alphanumeric.
- Longest palindromic substring: Expand around center (odd and even centers); O(n²). Manacher's (7.11) is O(n).
- Can rearrange to palindrome: At most one character with odd frequency. Build freq map, count odds ≤ 1.
- Count palindromic substrings: Same expand-around-center; increment count for each valid expansion.
7.5 Pattern Matching
Introduction
Pattern matching (or string search) is the problem: given a text string T of length n and a pattern string P of length m, find all starting positions (or the first position) in T where P occurs as a contiguous substring. For example, pattern "ab" in text "cababc" occurs at indices 1 and 3. This is one of the most studied problems in string algorithms: search in a document, DNA sequence, or log file. The naive approach (try every possible start index and compare character by character) is correct and easy to implement; it runs in O(n·m) in the worst case. Faster algorithms avoid re-scanning the text unnecessarily and achieve O(n + m): KMP and the Z algorithm in the worst case, Rabin-Karp with hashing in expectation (its worst case is still O(n·m) when many hash collisions occur). This topic introduces the problem and the naive method; topics 7.6–7.10 cover hashing and linear-time algorithms.
Real-World Analogy
Imagine searching for a phrase in a book. You slide a bookmark (the "window") along the page. At each position you check whether the characters under the window match the phrase. If they do, you've found an occurrence. If not, you move the window one position to the right and try again. The naive method does exactly that: for every starting position, compare the next m characters with the pattern. Smarter methods (KMP, etc.) use the fact that after a mismatch, you sometimes know enough to skip ahead more than one position—like remembering "we already matched 'ab', so if we fail on the next character we can try aligning the pattern so that the 'ab' we already saw is reused."
Text T = "abcabc", pattern P = "abc". Start at 0: T[0:3] = "abc" matches → index 0. Start at 1: "bca" ≠ "abc". Start at 2: "cab" ≠ "abc". Start at 3: T[3:6] = "abc" matches → index 3. Result: [0, 3]. With P = "abd", no matches. Pattern longer than text (e.g. P length 10, T length 5) → no valid start index, return [].
Formal Definition
Pattern matching (exact): Given text T[0..n−1] and pattern P[0..m−1], find all indices i in [0, n−m] such that T[i..i+m−1] = P (character by character). We require m ≤ n; otherwise there are no valid positions. Output: list of start indices, or −1 / empty list if none. Variants: find first occurrence only, or count occurrences. The problem is also called "substring search" or "find pattern in text."
Why This Topic Matters
- Fundamental problem: Search engines, editors, and bioinformatics all do pattern matching. Understanding the naive method and its cost motivates KMP and hashing (7.6–7.10).
- Interview baseline: "Find first occurrence of pattern in text" can be solved with str.find in Python, but interviewers often want you to implement the naive loop or discuss how to do it in O(n + m) with KMP.
- Building block: Many string problems (e.g. "repeated substring," "longest repeating substring") use substring search or similar ideas. Pattern matching is the core.
Mental Model
Slide a window of length m over the text. For each start index i, the window is T[i..i+m−1]. Check if the window equals P by comparing T[i+j] with P[j] for j = 0..m−1. If any j fails, try the next i. If all j match, record i and then try the next i. The naive method never "skips" i based on what we learned from a mismatch; that's what KMP improves on.
Step-by-Step: Naive (Brute-Force) Algorithm
- If len(P) > len(T), return [] (no possible match).
- For i = 0 to n - m (inclusive): the candidate start index is i.
- For j = 0 to m - 1: if T[i+j] != P[j], break (mismatch at this i).
- If the inner loop completed without breaking, then T[i:i+m] == P; append i to the result.
- Return the list of start indices.
ASCII Diagram: Naive Search
T = "abcabc" P = "abc" n=6, m=3
i=0: [a b c] a b c compare T[0..2] with P → match → add 0
i=1: a [b c a] b c T[1..3] = "bca" ≠ "abc" → skip
i=2: a b [c a b] c T[2..4] = "cab" ≠ "abc" → skip
i=3: a b c [a b c] T[3..5] = "abc" → match → add 3
Result: [0, 3]
Python Implementation
Naive: Find All Occurrences
def find_pattern_naive(text, pattern):
    n, m = len(text), len(pattern)
    if m > n:
        return []
    result = []
    for i in range(n - m + 1):  # candidate start indices 0..n-m
        match = True
        for j in range(m):
            if text[i + j] != pattern[j]:
                match = False
                break
        if match:
            result.append(i)
    return result
Naive: Find First Occurrence Only
def find_first_naive(text, pattern):
    n, m = len(text), len(pattern)
    if m > n:
        return -1
    for i in range(n - m + 1):
        for j in range(m):
            if text[i + j] != pattern[j]:
                break
        else:
            return i  # no break in inner loop
    return -1
Using Python's str.find
# First occurrence
first = text.find(pattern)  # -1 if not found
# All occurrences (loop with start parameter)
def find_all_builtin(text, pattern):
    result = []
    start = 0
    while True:
        i = text.find(pattern, start)
        if i == -1:
            break
        result.append(i)
        start = i + 1  # next search after this match
    return result
Line-by-Line Explanation (Naive)
- if m > n: return []: No room for pattern; no valid start index.
- for i in range(n - m + 1): Valid start indices are 0 through n−m (inclusive). So n−m+1 positions.
- Inner loop: for each j from 0 to m−1, compare text[i+j] with pattern[j]. If any mismatch, set match = False and break.
- If we never broke, match is still True → full match at i → append i.
Time Complexity
Worst case: Pattern never matches (e.g. T = "aaa...a", P = "aab"). At each of the (n−m+1) start positions we may compare up to m characters before a mismatch. Total comparisons can be (n−m+1)·m = O(n·m). When m is small or matches are rare, behavior is closer to O(n).
Best case: First character of P never matches in T (e.g. P starts with 'x', T has no 'x'). Then at each i we do one comparison and break. Total O(n).
Python str.find: Implementations typically use a mix of strategies; worst case can still be O(n·m) for a naive implementation, but in practice often better for random text.
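To make the worst and best cases above concrete, the following sketch counts the character comparisons made by the naive double loop. The helper name count_comparisons is hypothetical (it is not part of this topic's implementations), and the inputs are chosen only to trigger the two extremes.

```python
def count_comparisons(text, pattern):
    """Return how many character comparisons the naive search performs."""
    n, m = len(text), len(pattern)
    comparisons = 0
    for i in range(n - m + 1):  # empty range when m > n
        for j in range(m):
            comparisons += 1
            if text[i + j] != pattern[j]:
                break
    return comparisons


# Worst case: every window matches until its last character.
worst = count_comparisons("a" * 10, "aab")
# Best case: every window fails on its first character.
best = count_comparisons("a" * 10, "xab")
print(worst, best)
```

With n = 10 and m = 3 there are 8 windows, so the worst case performs 8·3 = 24 comparisons and the best case performs 8.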
Space Complexity
Only a few variables (indices, result list). The result list holds at most O(n) indices (if the pattern is length 1 and every character matches). So O(n) for the output; O(1) auxiliary space excluding output.
Edge Cases
- Empty pattern: Usually defined as matching everywhere (every index is a "start"). Check problem: find("abc", "") might return [0,1,2,3] or similar. Often problems assume m ≥ 1.
- Pattern longer than text: Return [] or −1.
- Pattern equals text: One match at index 0.
- Pattern not in text: Return [] or −1.
- Overlapping matches: e.g. T = "aaa", P = "aa". Naive finds [0, 1]. Built-in find with start = i+1 after each match also finds overlapping occurrences.
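The overlapping case can be illustrated with a small sketch built on str.find. The function name find_all and its overlapping flag are assumptions made for this example; the only difference between the two modes is how far the search start advances after a match.

```python
def find_all(text, pattern, overlapping=True):
    """Collect start indices of pattern in text using str.find."""
    # A step of 1 allows overlaps; stepping by len(pattern) skips past each match.
    # max(1, ...) guards against an infinite loop on an empty pattern.
    step = 1 if overlapping else max(1, len(pattern))
    result = []
    start = 0
    while True:
        i = text.find(pattern, start)
        if i == -1:
            break
        result.append(i)
        start = i + step
    return result


print(find_all("aaa", "aa"))                     # overlapping matches
print(find_all("aaa", "aa", overlapping=False))  # non-overlapping matches
```

Which behavior is wanted depends on the problem statement, so it is worth confirming before coding.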
Common Mistakes
- Off-by-one in the range of i: valid i goes from 0 to n - m inclusive, so range(n - m + 1). Using range(n - m) misses the last valid start whenever n ≥ m.
- Indexing the text incorrectly: the inner loop must compare text[i + j] with pattern[j]. Dropping the + j and writing text[i] compares every pattern character against the same text character.
The last valid start index for the pattern is n - m, not n - m - 1. So the loop must be for i in range(n - m + 1). If you use range(n - m), you never check the window that starts at index n−m (e.g. T of length 6, P of length 3: you must check i = 0,1,2,3; i=3 is 6-3 = 3).
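A short sketch of this off-by-one, using the same T and P as the diagram earlier in this topic. The helper starts and its stop parameter are hypothetical; they exist only so the two loop bounds can be compared side by side.

```python
def starts(text, pattern, stop):
    """Naive search over start indices 0..stop-1 (stop is passed in to compare bounds)."""
    m = len(pattern)
    return [i for i in range(stop) if text[i:i + m] == pattern]


text, pattern = "abcabc", "abc"
n, m = len(text), len(pattern)
correct = starts(text, pattern, n - m + 1)  # checks i = 0, 1, 2, 3
buggy = starts(text, pattern, n - m)        # checks i = 0, 1, 2 only
print(correct, buggy)
```

The buggy bound never examines the window starting at index 3, so the second occurrence is silently lost.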
Evolution: Naive → Hashing → KMP
| Method | Time (typical) | Topic |
|---|---|---|
| Naive (try every start, compare) | O(n·m) | 7.5 |
| Rabin-Karp (rolling hash) | O(n + m) average | 7.7, 7.8 |
| KMP (failure function) | O(n + m) | 7.9 |
Brute force is correct and often acceptable when the text and pattern are small or when you only need one match and the pattern is unlikely to match at every position. For large inputs, use Rabin-Karp (hashing; topic 7.8) for expected linear time, or KMP (topic 7.9) when you need guaranteed linear time. KMP avoids re-scanning the text by precomputing a "failure" function on the pattern and shifting the pattern intelligently after a mismatch.
Pattern Recognition
Keywords: find pattern, substring search, needle in haystack, first occurrence of, all occurrences. When the problem is "exact match" of a contiguous substring, it's pattern matching. Use naive for interviews unless asked for O(n+m); mention KMP or hashing as the next step.
In Python, text.find(pattern) returns the first index or −1; pattern in text returns a boolean. For "all occurrences," call i = text.find(pattern, start) in a loop: stop when i is −1, otherwise record i and set start = i + 1. For implementing from scratch, the naive double loop is expected; stating "we could optimize with KMP for O(n+m)" shows you know the theory.
"Find pattern in text" → "We can try every start index from 0 to n−m and compare the next m characters with the pattern. That's O(n·m) worst case. In Python we'd use str.find. For O(n+m) we can use KMP or Rabin-Karp with rolling hash." Implement the naive loop; mention edge case: pattern longer than text → no match. Last valid start index is n−m, so loop is range(n - m + 1).
Practice Problems
- Implement strStr() / Find First Occurrence (LeetCode 28): return first index of pattern in text, or −1.
- Find all occurrences of a pattern in a long text (naive then try Rabin-Karp or KMP).
- Repeated Substring Pattern: can the string be written as a shorter pattern repeated? (Uses substring check.)
Summary
- Pattern matching: find start indices in text T where pattern P occurs as a contiguous substring. Require m ≤ n.
- Naive algorithm: For i in 0..n−m, compare T[i..i+m−1] with P character by character. Record i if match. Time O(n·m), space O(1) auxiliary.
- Valid start indices: 0 to n−m inclusive → range(n - m + 1). Use text[i+j] and pattern[j] in inner loop.
- Faster methods: Rabin-Karp (7.8, on average), KMP (7.9), and the Z algorithm (7.10) achieve O(n + m). Naive is the baseline; use it unless linear time is required.
7.6 String Hashing
Introduction
String hashing assigns a numeric value (a hash) to a string so that we can compare two strings by comparing their hashes: if the hashes are equal, the strings are likely equal (with a small collision probability); if the hashes differ, the strings are definitely different. The standard approach is the polynomial rolling hash: treat the string as a number in base B, with each character as a digit, and take the value modulo a large prime M. With precomputed prefix hashes and powers of B, we can compute the hash of any substring in O(1)—which is the key to Rabin-Karp pattern matching (7.8) and to quickly comparing substrings in problems like "longest duplicate substring." This topic builds the hash function and the prefix structure; 7.7 Rolling Hash and 7.8 Rabin-Karp use it for sliding-window updates.
Real-World Analogy
Think of a string as a number in a strange base: 'a' = 1, 'b' = 2, … (or use character codes). The string "abc" might become the number 1·B² + 2·B + 3 for some base B. Two different strings usually produce different numbers; we then take that number modulo a large prime to keep it in a manageable range. Comparing two long strings by their hashes is like comparing two people by their ID numbers: if the IDs match, it's almost certainly the same person; if they don't match, they're different. The "almost" is because two different people could in theory have the same ID (hash collision)—we make that very unlikely by choosing a large modulus and sometimes using two hashes.
String "ab" with base B=31, treating 'a'=1, 'b'=2: hash = 1·31 + 2 = 33 (before mod). With modulus M=10⁹+7: hash = 33. For "abc": 1·31² + 2·31 + 3. Substring s[1:3] = "bc": we can get it as (hash of "abc") − (hash of "a")·31², then mod M—that's the idea behind prefix hashes. So we precompute hash of every prefix s[0:i]; then hash(s[i:j]) = (prefix[j] − prefix[i]·B^(j−i)) mod M (with proper handling of mod).
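The numbers in this worked example can be checked with a few lines of Python. This is a minimal sketch, assuming the 'a' = 1, 'b' = 2, … mapping used above; the helper names value and poly_hash are hypothetical.

```python
B, M = 31, 10**9 + 7


def value(c):
    """Map 'a'..'z' to 1..26, as in the worked example."""
    return ord(c) - ord('a') + 1


def poly_hash(s):
    """Polynomial hash of s in base B, modulo M."""
    h = 0
    for c in s:
        h = (h * B + value(c)) % M
    return h


h_ab = poly_hash("ab")    # 1*31 + 2
h_abc = poly_hash("abc")  # 1*31**2 + 2*31 + 3
h_a = poly_hash("a")
# Strip the prefix "a", which sits 2 positions to the left inside "abc".
h_bc = (h_abc - h_a * B**2) % M
print(h_ab, h_abc, h_bc, poly_hash("bc"))
```

The last two printed values agree, which is exactly the prefix-subtraction identity: removing the shifted hash of "a" from the hash of "abc" leaves the hash of "bc".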
Formal Definition
Polynomial rolling hash: For string s of length n, assign a numeric value to each character (e.g. ord(c) or ord(c) - ord('a') + 1). The hash of s is H(s) = (s₀·B^(n−1) + s₁·B^(n−2) + … + s_(n−1)) mod M, where B is the base (e.g. 31, 131) and M is a large prime (e.g. 10⁹+7). Prefix hash: P[i] = hash of s[0..i−1] (length i). Then hash of s[i..j−1] = (P[j] − P[i]·B^(j−i)) mod M, using modular arithmetic. We need precomputed B^k for shifts.
Why This Topic Matters
- Rabin-Karp (7.8): Pattern matching in O(n+m) average time by comparing pattern hash with each window hash; the window hash is updated in O(1) using a "rolling" update (topic 7.7).
- Substring comparison in O(1): Problems like "longest repeated substring," "compare two substrings," or "number of distinct substrings" can use hashing to compare substrings in O(1) after O(n) preprocessing.
- Interview and contests: String hashing is a standard tool when you need fast equality checks on substrings without building suffix structures (e.g. suffix array 7.12).
Mental Model
String → number in base B (with mod M). Same string → same number. Different strings → usually different numbers (collision possible but rare for good B, M). Store hash of each prefix; then substring s[i:j] has hash = (prefix[j] − prefix[i]·B^(j−i)) mod M. We precompute pow[i] = B^i % M so that multiplying by B^(j−i) is O(1).
Polynomial Hash Formula
For string s with character values c₀, c₁, …, c_(k−1) (length k):
H = (c₀·B^(k−1) + c₁·B^(k−2) + … + c_(k−1)·B^0) mod M
So we can compute H in one pass: start with 0, then for each character do H = (H * B + char_value) % M. For substring s[i:j] (length L = j−i), we need H_sub = (P[j] − P[i]·B^L) mod M. To avoid negative values: H_sub = (P[j] − (P[i] · pow[L]) % M + M) % M.
Step-by-Step: Building Prefix Hashes
- Choose base B (e.g. 31, 131) and modulus M (e.g. 10**9+7).
- Precompute pow[i] = (B ** i) % M for i = 0..n (or compute on the fly).
- prefix[0] = 0. For i from 1 to n: prefix[i] = (prefix[i-1] * B + char_value(s[i-1])) % M. Here prefix[i] is the hash of s[0:i].
- Hash of substring s[i:j] = (prefix[j] - prefix[i] * pow[j-i]) % M, then (... + M) % M to keep non-negative.
Python Implementation
Single String Hash and Prefix Hashes
def build_hashes(s, B=31, M=10**9+7):
    """Return (prefix hashes, powers of B). prefix[i] = hash of s[0:i]."""
    n = len(s)
    prefix = [0] * (n + 1)
    pow_B = [1] * (n + 1)
    for i in range(1, n + 1):
        pow_B[i] = (pow_B[i-1] * B) % M
        # char value: ord(s[i-1]) or (ord(s[i-1]) - ord('a') + 1) for lowercase
        val = ord(s[i-1])
        prefix[i] = (prefix[i-1] * B + val) % M
    return prefix, pow_B

def substr_hash(prefix, pow_B, i, j, M=10**9+7):
    """Hash of s[i:j] (0-indexed, j exclusive)."""
    L = j - i
    h = (prefix[j] - prefix[i] * pow_B[L]) % M
    if h < 0:  # never true in Python; kept for ports to languages with signed %
        h += M
    return h
Compute Hash of a String (No Prefix)
def string_hash(s, B=31, M=10**9+7):
    h = 0
    for c in s:
        h = (h * B + ord(c)) % M
    return h
Line-by-Line: Substring Hash Formula
prefix[i] is the hash of s[0:i], i.e. s₀·B^(i−1) + s₁·B^(i−2) + … + s_(i−1). So prefix[j] = hash of s[0:j]. The substring s[i:j] has hash = s_i·B^(L−1) + … + s_(j−1)·B^0. We have prefix[j] = prefix[i]·B^L + (s_i·B^(L−1) + … + s_(j−1)). So hash(s[i:j]) = prefix[j] − prefix[i]·B^L. Taking mod M: (prefix[j] - prefix[i] * pow_B[L]) % M. Adding M if negative keeps the result in [0, M−1].
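One way to gain confidence in this derivation is to compare the prefix formula against a direct hash for every substring of a short string. The sketch below does that; the names build_prefix, direct_hash and formula_matches_everywhere are hypothetical and chosen only for this check.

```python
def build_prefix(s, B=31, M=10**9 + 7):
    """Return (prefix, power) where prefix[i] is the hash of s[0:i]."""
    prefix = [0] * (len(s) + 1)
    power = [1] * (len(s) + 1)
    for i, c in enumerate(s, start=1):
        power[i] = power[i - 1] * B % M
        prefix[i] = (prefix[i - 1] * B + ord(c)) % M
    return prefix, power


def direct_hash(s, B=31, M=10**9 + 7):
    """Hash computed from scratch in O(len(s))."""
    h = 0
    for c in s:
        h = (h * B + ord(c)) % M
    return h


def formula_matches_everywhere(s, B=31, M=10**9 + 7):
    """Check prefix[j] - prefix[i]*B^(j-i) against a direct hash for all i <= j."""
    prefix, power = build_prefix(s, B, M)
    n = len(s)
    for i in range(n + 1):
        for j in range(i, n + 1):
            via_prefix = (prefix[j] - prefix[i] * power[j - i]) % M
            if via_prefix != direct_hash(s[i:j], B, M):
                return False
    return True


print(formula_matches_everywhere("abracadabra"))
```

The check is quadratic in the number of substrings and is meant only as a test; in real use the point of the prefix arrays is that each substring hash costs O(1).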
Time Complexity
Building prefix hashes: One pass over the string, O(1) per character. Precomputing pow_B is O(n). Total O(n).
Single substring hash: O(1) using prefix and pow_B.
Hash of a string (no prefix): O(k) for string of length k.
Space Complexity
Prefix array: O(n). Power array: O(n). So O(n) for the precomputed structures. Single hash value is O(1).
Collisions and Double Hashing
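Two different strings can have the same hash (collision). With one modulus M ≈ 10⁹, the probability that a single comparison of two different strings collides is about 1/M, and by the birthday bound the probability that some pair among n distinct strings collides is roughly n²/(2M). That is small only while n is well below √M ≈ 3·10⁴, so a single modulus is fine for a handful of comparisons but risky when many hashes are stored and compared against each other. To reduce collision risk, use two bases/moduli and compare both hashes; only if both match do we declare equality (and optionally verify with a direct character-by-character comparison).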
# Double hash: (h1, h2) with (B1, M1) and (B2, M2)
# Equal only if h1 and h2 both match. Very low collision rate.
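A minimal sketch of double hashing follows. The function name double_hash and the default parameter pairs are assumptions for illustration; the second demonstration uses deliberately weak moduli (9 and 11) so that a collision in one component is easy to see.

```python
def double_hash(s, params=((31, 10**9 + 7), (131, 10**9 + 9))):
    """Return a tuple with one polynomial hash per (base, modulus) pair."""
    hashes = []
    for base, mod in params:
        h = 0
        for c in s:
            h = (h * base + ord(c)) % mod
        hashes.append(h)
    return tuple(hashes)


# Deliberately weak parameters: 10 % 9 == 1, so the first hash is just the
# sum of character codes mod 9 and cannot tell "ab" from "ba".
weak = ((10, 9), (10, 11))
h1 = double_hash("ab", weak)
h2 = double_hash("ba", weak)
print(h1, h2)
```

Here the first components collide (both are 6) while the second components differ (1 versus 10), so the pair as a whole still distinguishes the two strings. With large prime moduli the same idea makes a simultaneous collision far less likely than a single-hash collision.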
Edge Cases
- Empty substring: s[i:i] has length 0. Hash is 0 by convention (prefix[i] − prefix[i]·B^0 = 0).
- Single character: s[i:i+1] has hash = (prefix[i+1] − prefix[i]·B) % M = ord(s[i]) if we define prefix and pow correctly.
- Full string: s[0:n] hash = prefix[n], which is the standard hash of the whole string.
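- Modulo: Use (x % M + M) % M when subtracting to avoid negative values. In Python, % with a positive modulus is already non-negative, so the extra + M matters mainly when porting to languages such as C++ or Java.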
Common Mistakes
- Forgetting to take modulo after each operation—intermediate values can overflow. In Python integers don't overflow, but taking mod at each step keeps numbers small and matches the math.
- Wrong substring hash formula: it must be prefix[j] - prefix[i] * pow_B[j-i], not prefix[j] − prefix[i] (that would ignore the shift). Length of substring is j−i, so we multiply prefix[i] by B^(j−i).
- Using a small modulus (e.g. 1000) or base that's too small: collisions become likely. Use prime M ≥ 10⁹ and base B > |alphabet|.
The hash of s[i:j] is not prefix[j] − prefix[i]. It is prefix[j] − prefix[i]·B^(j−i), because prefix[i] represents the hash of the first i characters, which in the full prefix[j] are shifted by (j−i) positions. So you must multiply prefix[i] by B^(j−i) (i.e. pow_B[j-i]) before subtracting.
Precompute prefix hashes and powers once in O(n). Then any substring hash is O(1). For Rabin-Karp we don't need all prefix hashes—we use a rolling hash: update the current window hash in O(1) when sliding by one character (add new char, remove old; topic 7.7). So pattern matching can be done in O(n) time with O(1) extra space for the hash state (plus O(m) for the pattern hash).
Pattern Recognition
Use string hashing when you need to compare substrings quickly (e.g. "are these two substrings equal?", "longest duplicate substring," "number of distinct substrings") or when implementing Rabin-Karp. If the problem only needs exact pattern match once, naive or KMP might be simpler; hashing shines when you compare many substrings or when rolling window updates are natural.
Common choices: B = 31 or 131, M = 10**9+7 or 10**9+9. For lowercase letters only, you can use ord(c) - ord('a') + 1 so values are 1–26 (avoids leading zeros). For arbitrary ASCII/Unicode, ord(c) is fine. Always use a prime modulus to reduce collisions.
"We can hash a string with a polynomial rolling hash: base B, modulus M, H = (c0*B^(k-1) + ... + c_{k-1}) mod M. With prefix hashes and precomputed powers, we get the hash of any substring in O(1). That lets us compare substrings in O(1) and is the basis for Rabin-Karp. Collisions are possible; we can use two moduli to make them very rare." Implement prefix build and substring hash; mention use in pattern matching (7.8).
Practice Problems
- Implement polynomial hash and prefix hashes; compute hash of any substring in O(1).
- Longest Duplicate Substring: use binary search on length + hashing to check if there's a duplicate of length L.
- Compare two substrings of a string in O(1) after O(n) preprocessing (hash equality).
Summary
- Polynomial rolling hash: H(s) = (c0·B^(k−1) + … + c_(k−1)) mod M. Same string → same hash; different strings usually different (collision possible).
- Prefix hashes: prefix[i] = hash of s[0:i]. Build in O(n). Hash of s[i:j] = (prefix[j] − prefix[i]·B^(j−i)) mod M; use pow_B for B^(j−i).
- Choose prime M (e.g. 10⁹+7), base B > alphabet size. Double hashing (two moduli) reduces collisions.
- Used in Rabin-Karp (7.8) and for O(1) substring comparison after O(n) preprocessing.
7.7 Rolling Hash
Introduction
A rolling hash (or sliding window hash) is a way to update the hash of a fixed-length window when the window slides by one position: instead of recomputing the hash of the new window from scratch (O(m) for a window of length m), we remove the contribution of the character that left the window, shift the remaining hash, and add the new character—all in O(1). This is the engine behind Rabin-Karp pattern matching (7.8): we compute the pattern's hash once, then compute the hash of the first window of the text; for each subsequent position we roll the hash in O(1) and compare with the pattern hash. So we scan the text in O(n) with only O(1) work per position, giving O(n + m) total (m for the pattern, n for the text).
Real-World Analogy
Imagine a train with m cars (the window). Each car has a number (character value). The "hash" of the train is a combination of those numbers in order. When the train moves forward one track position, the front car leaves and a new car joins at the back. Instead of recalculating the whole combination from the new set of cars, we subtract the contribution of the car that left, shift the rest (as if every remaining car moved one position in the formula), and add the new car's contribution. One subtraction, one shift, one addition—constant time.
Text T = "abcde", window length m = 3, base B = 31, mod M. Window at index 0: "abc" → hash H0 = a·31² + b·31 + c. Window at index 1: "bcd". Naive: recompute b·31² + c·31 + d. Rolling: H0 corresponds to (a, b, c). To get (b, c, d): remove a's contribution (a·31²), multiply the rest by 31 (so b·31² + c·31), then add d. So H1 = (H0 − a·31²)·31 + d = H0·31 − a·31³ + d. Precompute 31² and 31³ (or 31^(m−1) and 31^m) so each step is O(1).
Formal Definition
Rolling hash update: Let the current window be T[i..i+m−1] with hash H = (T[i]·B^(m−1) + T[i+1]·B^(m−2) + … + T[i+m−1]) mod M. The next window is T[i+1..i+m] with hash H' = (T[i+1]·B^(m−1) + … + T[i+m]) mod M. We have H' = (H − T[i]·B^(m−1))·B + T[i+m], all mod M. So: subtract the leftmost character's contribution (times B^(m−1)), multiply by B (shift), add the new rightmost character. Precompute B^(m−1) mod M once; then each roll is O(1).
Why This Topic Matters
- Rabin-Karp (7.8): Pattern matching uses one hash for the pattern and a rolling hash for the text window. Without the roll, each window would cost O(m), giving O(n·m) again; with the roll, O(n + m).
- Fixed-length window problems: Any problem that asks "for every contiguous block of length m" (e.g. distinct substrings of length m, or "find all positions where window equals X") can use a rolling hash to update in O(1) per step.
- Efficiency: Going from O(m) per window to O(1) per window is what makes hash-based pattern matching competitive with KMP.
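As an illustration of the fixed-length window use case, the sketch below counts distinct length-m substrings by collecting rolling hashes in a set. The function name count_distinct_windows is hypothetical, and because it trusts hash equality without verification, a collision could in principle make it undercount.

```python
def count_distinct_windows(text, m, B=31, M=10**9 + 7):
    """Count distinct length-m substrings by hash (collisions could undercount)."""
    n = len(text)
    if m <= 0 or m > n:
        return 0
    top = pow(B, m - 1, M)  # weight of the leftmost character in a window
    h = 0
    for c in text[:m]:
        h = (h * B + ord(c)) % M
    seen = {h}
    for i in range(1, n - m + 1):
        h = (h - ord(text[i - 1]) * top) % M      # drop the character that left
        h = (h * B + ord(text[i + m - 1])) % M    # shift, then append the new one
        seen.add(h)
    return len(seen)


print(count_distinct_windows("abcabc", 3))
```

For "abcabc" with m = 3 the windows are "abc", "bca", "cab", "abc", so three distinct hashes are collected. Each slide costs O(1), giving O(n) overall instead of O(n·m).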
Mental Model
Current window hash = value of the string in the window. When we slide right by one: (1) the leftmost character "leaves"—subtract its contribution, which is left_char * B^(m-1); (2) the rest of the window "moves left" in our formula—multiply the remaining hash by B; (3) the new character "enters" at the right—add it (with factor B^0 = 1). So: new_hash = (old_hash - left_char * B^(m-1)) * B + new_char, then mod M. Keep B^(m-1) % M stored.
Step-by-Step: Rolling Update
- Compute the hash of the first window T[0..m−1] (one pass, O(m)). Store as cur.
- Precompute base_pow = B^(m−1) % M (or use a precomputed array from 7.6).
- For i from 1 to n−m (each slide): left character = T[i−1], new character = T[i+m−1]. cur = ((cur - left_char * base_pow) * B + new_char) % M. Handle negative: cur = (cur % M + M) % M after the subtraction step if needed.
- After each update, cur is the hash of T[i..i+m−1]. Compare with pattern hash as needed.
ASCII Diagram: One Roll
Window length m = 3. Current window: T[i..i+2] = "abc"
Hash H = a·B² + b·B + c
Slide right by 1: new window = "bcd"
- Remove a: H - a·B² → b·B + c
- Shift: (b·B + c)·B = b·B² + c·B
- Add d: b·B² + c·B + d = new hash H'
So H' = (H - a·B²)·B + d
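The roll in the diagram can be checked numerically by comparing rolled hashes against hashes recomputed from scratch. This is a small sketch on the text "abcde" with m = 3; the helper name direct and the variable names are assumptions made for the example.

```python
B, M = 31, 10**9 + 7


def direct(s):
    """Polynomial hash of s computed from scratch."""
    h = 0
    for c in s:
        h = (h * B + ord(c)) % M
    return h


text, m = "abcde", 3
top = pow(B, m - 1, M)   # B^(m-1) mod M, weight of the leftmost character
h = direct(text[:m])     # hash of the first window "abc"
rolled = [h]
for i in range(1, len(text) - m + 1):
    # Remove text[i-1], shift by B, add text[i+m-1].
    h = ((h - ord(text[i - 1]) * top) * B + ord(text[i + m - 1])) % M
    rolled.append(h)

recomputed = [direct(text[i:i + m]) for i in range(len(text) - m + 1)]
print(rolled == recomputed)
```

The two lists agree for all three windows ("abc", "bcd", "cde"), which is a quick sanity check on the formula H' = (H − a·B²)·B + d.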
Python Implementation
Compute First Window Hash
def first_window_hash(s, m, B=31, M=10**9+7):
    """Hash of s[0:m] (length m)."""
    h = 0
    for i in range(m):
        h = (h * B + ord(s[i])) % M
    return h
Rolling Hash: Update One Step
def roll_hash(cur, left_char, new_char, base_pow, B=31, M=10**9+7):
    """Given hash of window [i..i+m-1], return hash of [i+1..i+m].
    left_char = T[i], new_char = T[i+m]."""
    cur = (cur - ord(left_char) * base_pow) % M
    cur = (cur * B + ord(new_char)) % M
    if cur < 0:  # never true in Python; kept for ports to languages with signed %
        cur += M
    return cur
Full Loop: All Window Hashes (Rabin-Karp Style)
def rolling_hashes(text, m, B=31, M=10**9+7):
    """Yield hash of text[i:i+m] for i = 0, 1, ..., n-m."""
    n = len(text)
    if m > n:
        return
    base_pow = pow(B, m - 1, M)
    cur = first_window_hash(text, m, B, M)
    yield cur
    for i in range(1, n - m + 1):
        cur = roll_hash(cur, text[i-1], text[i+m-1], base_pow, B, M)
        yield cur
Line-by-Line: Roll Formula
cur is (T[i-1]·B^(m−1) + T[i]·B^(m−2) + … + T[i+m−2]). We want the hash of T[i..i+m−1], i.e. T[i]·B^(m−1) + … + T[i+m−1]. Subtract T[i−1]·B^(m−1) from cur to get (T[i]·B^(m−2) + … + T[i+m−2]). Multiplying by B gives (T[i]·B^(m−1) + … + T[i+m−2]·B). Adding T[i+m−1] gives the new hash. So cur = (cur - ord(text[i-1])*base_pow)*B + ord(text[i+m-1]), then mod M.
Time Complexity
First window: O(m). Each roll: O(1). Total for n−m+1 windows: O(m) + (n−m)·O(1) = O(n). So we get the hash of every length-m window in O(n), which is the key to Rabin-Karp's O(n + m).
Space Complexity
O(1) for the current hash, the base power, and a few variables. We don't store all window hashes unless we need them; in Rabin-Karp we only compare with the pattern hash. O(1) auxiliary space.
Edge Cases
- m > n: No window; don't run the loop or return empty.
- m == 1: base_pow = B^0 = 1. Roll: cur = (cur - left_char)*B + new_char; correct.
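- Negative modulo: cur - left_char * base_pow can be negative. In Python, % with a positive modulus already returns a value in [0, M−1], so taking % M after the subtraction is enough; in languages where % can return a negative remainder (e.g. C++, Java), use (... % M + M) % M before multiplying by B.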
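The behavior of % on a negative left operand differs between languages, which is easy to check directly. In the sketch below, normalize_mod is a hypothetical helper, and math.fmod is used only to imitate the truncating remainder found in C-style languages.

```python
import math


def normalize_mod(x, M):
    """Map any integer remainder into [0, M-1]; safe regardless of % semantics."""
    return ((x % M) + M) % M


python_style = (-5) % 7          # in Python the sign follows the divisor
c_style = int(math.fmod(-5, 7))  # truncating remainder: the sign follows the dividend
print(python_style, c_style, normalize_mod(c_style, 7))
```

Python yields 2 while the truncating remainder yields −5; applying normalize_mod brings the latter back to 2. The extra + M is therefore harmless in Python and necessary when the same formula is ported to a language with truncating %.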
Common Mistakes
- Using the wrong power: we subtract left_char * B^(m−1), not B^m. The leftmost character in the window has weight B^(m−1) in the polynomial.
- Off-by-one in indices: when at position i, the window is T[i..i+m−1]; the character that just left is T[i−1], the new one is T[i+m−1].
- Forgetting to take mod after each operation can lead to huge numbers (in other languages, overflow). In Python it's correct but slow; taking mod keeps values bounded.
The leftmost character in the current window has coefficient B^(m−1), not B^m. So we subtract left_char * B^(m−1). If you use B^m you're shifting one extra time and the hash will be wrong. Precompute base_pow = pow(B, m - 1, M).
Without rolling: n−m+1 windows, each hash computed in O(m) → O(n·m). With rolling: O(m) for the first window, then O(1) per slide → O(n). The only extra storage is B^(m−1) mod M and the current hash. This is why Rabin-Karp can be O(n + m) instead of O(n·m).
Connection to Rabin-Karp (7.8)
In Rabin-Karp we compute the pattern hash once (O(m)). Then we iterate over start indices i = 0..n−m: for i=0 we get the first window hash (O(m)); for i≥1 we roll from the previous hash (O(1)). When the current window hash equals the pattern hash, we optionally verify with a character-by-character check (to handle collisions). Total O(n + m). Topic 7.8 implements the full algorithm.
Use the same base and modulus as in 7.6 for consistency. When implementing Rabin-Karp, compute the pattern hash with the same formula as the first window (so that matching strings have equal hashes). For double hashing, maintain two rolling hashes with two (B, M) pairs and compare both.
"When we slide the window right by one, we remove the left character and add the right. The hash update is: subtract left_char·B^(m−1), multiply by B, add new_char—all mod M. That's O(1) per position, so we can get every length-m window hash in O(n). That's the rolling hash used in Rabin-Karp." Implement first_window_hash and roll_hash; mention that B^(m−1) is precomputed.
Practice Problems
- Implement rolling hash: first window, then roll step. Verify that hashes match prefix-based substring hash for the same window.
- Rabin-Karp (7.8): use rolling hash to find all positions where pattern occurs in text.
- Find all distinct substrings of length m in a string: use a set of rolling hashes (or double hash) to count distinct windows.
Summary
- Rolling hash: Update the hash of a length-m window in O(1) when sliding right by one. Formula: new_hash = (old_hash − left_char·B^(m−1))·B + new_char (mod M).
- Precompute B^(m−1) % M once. First window hash computed in O(m); then each of the (n−m) rolls is O(1) → total O(n).
- Used in Rabin-Karp (7.8) for O(n + m) pattern matching. Also for any fixed-length sliding window problem where window equality is checked via hash.
7.8 Rabin-Karp Algorithm
Introduction
The Rabin-Karp algorithm is a pattern-matching algorithm that uses hashing (7.6) and a rolling hash (7.7) to find all occurrences of a pattern P in text T in O(n + m) average time, where n = |T| and m = |P|. Instead of comparing the pattern character-by-character at every start index (naive O(n·m)), we compute the hash of the pattern once, then scan the text with a sliding window of length m, updating the window hash in O(1) per step. When the window hash equals the pattern hash, we have a candidate match; we then optionally verify with a character-by-character check to rule out hash collisions. Rabin-Karp is simple to implement, extends naturally to multiple patterns or 2D matching, and is a classic interview topic alongside KMP (7.9).
Real-World Analogy
Imagine you have a stencil (the pattern) and you're sliding it along a long strip of text. Instead of comparing every letter under the stencil with the stencil at each position, you assign a "fingerprint" (hash) to the stencil and a fingerprint to whatever is under the window. You slide the window one step, update the fingerprint in constant time (rolling hash), and compare. When the fingerprints match, you double-check by looking at the letters—in case two different texts had the same fingerprint (collision). So you do fast filter by hash, then confirm when needed.
Text T = "cababc", pattern P = "ab". Pattern hash H(P) = h("ab"). First window T[0:2] = "ca" → H("ca") ≠ H(P). Roll: window "ab" → H("ab") = H(P) → candidate at index 1; verify: T[1:3] == "ab" ✓. Roll: window "ba" → no match. Roll: window "ab" at index 3 → hash match, verify: T[3:5] == "ab" ✓. Roll: window "bc" at index 4 → no match. Result: [1, 3].
Formal Definition
Rabin-Karp: (1) Compute hash H(P) of pattern P (length m) using the same polynomial hash as in 7.6. (2) Compute hash of T[0..m−1]. (3) For i = 0..n−m: if hash(T[i..i+m−1]) = H(P), then either report i as a match or verify by comparing T[i..i+m−1] with P character-by-character; then update the window hash to T[i+1..i+m] using the rolling update (7.7). Same base B and modulus M for pattern and text. Verification: On hash match, if we verify and it's a false positive, cost is O(m); worst-case many false positives → O(n·m). With a good hash, expected verification cost is low → O(n + m) average.
Why This Topic Matters
- O(n + m) pattern matching: With rolling hash, we avoid O(m) work per position; combined with sparse verification, average time is linear. Alternative to KMP (7.9) with different trade-offs.
- Multiple patterns: Rabin-Karp extends to searching for k patterns of the same length m at once: compute k pattern hashes, put them in a set; at each window, check if the window hash is in the set. KMP would need k passes or Aho-Corasick.
- 2D and other variants: Rolling hash can be extended to 2D (e.g. find a small image in a large one) by hashing rows and then columns. Conceptually the same idea.
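The multiple-pattern idea from the list above can be sketched as follows. This is an illustrative sketch, not the implementation from this topic: the function name rabin_karp_multi is hypothetical, and it assumes all patterns are non-empty and share one length m so that a single rolling window suffices.

```python
def rabin_karp_multi(text, patterns, B=31, M=10**9 + 7):
    """Find occurrences of several equal-length patterns in one scan of text."""
    if not patterns:
        return []
    m = len(patterns[0])
    if m == 0 or any(len(p) != m for p in patterns):
        raise ValueError("this sketch assumes non-empty patterns of equal length")
    n = len(text)
    if m > n:
        return []

    def poly_hash(s):
        h = 0
        for c in s:
            h = (h * B + ord(c)) % M
        return h

    # Map each pattern hash to the patterns that produce it (handles collisions).
    by_hash = {}
    for p in patterns:
        by_hash.setdefault(poly_hash(p), set()).add(p)

    top = pow(B, m - 1, M)
    h = poly_hash(text[:m])
    hits = []
    for i in range(n - m + 1):
        if h in by_hash:
            window = text[i:i + m]        # slice only on a hash hit
            if window in by_hash[h]:      # verification step
                hits.append((i, window))
        if i < n - m:
            h = ((h - ord(text[i]) * top) * B + ord(text[i + m])) % M
    return hits


print(rabin_karp_multi("cababc", ["ab", "bc"]))
```

The text is scanned once regardless of k; the dictionary lookup replaces the single hash comparison of the one-pattern version, and the slice comparison keeps the output free of false positives.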
Mental Model
One pattern hash. One window hash that rolls. At each position: if window hash == pattern hash, we have a candidate—verify (optional but recommended) and report. Then roll the window hash to the next position. Repeat. Verification is the safety net for collisions.
Step-by-Step Algorithm
- If m > n, return [] (no possible match).
- Choose B and M (e.g. B=31, M=10⁹+7). Precompute base_pow = B^(m−1) % M.
- Compute pattern_hash = hash of P[0..m−1] (same formula as first window: H = (H*B + ord(c)) % M).
- Compute window_hash = hash of T[0..m−1].
- For i from 0 to n−m: if window_hash == pattern_hash, optionally verify T[i..i+m−1] == P; if match (or skip verify), append i to result. If i < n−m, roll: window_hash = (window_hash - ord(T[i])*base_pow)*B + ord(T[i+m]), then mod M.
- Return the list of start indices.
Python Implementation
def rabin_karp(text, pattern, B=31, M=10**9+7):
    n, m = len(text), len(pattern)
    if m > n:
        return []
    base_pow = pow(B, m - 1, M)  # weight of the leftmost character

    def hash_str(s, length):
        h = 0
        for i in range(length):
            h = (h * B + ord(s[i])) % M
        return h

    def roll(h, left_c, new_c):
        h = (h - ord(left_c) * base_pow) % M
        h = (h * B + ord(new_c)) % M
        return h  # Python's % with M > 0 already gives a value in [0, M-1]

    pattern_hash = hash_str(pattern, m)
    window_hash = hash_str(text, m)
    result = []
    for i in range(n - m + 1):
        if window_hash == pattern_hash:
            if text[i:i+m] == pattern:  # verify to avoid false positive
                result.append(i)
        if i < n - m:
            window_hash = roll(window_hash, text[i], text[i + m])
    return result
Line-by-Line Explanation
- hash_str(s, length): Same polynomial hash as 7.6, which ensures H(P) and H(T[i..i+m−1]) use the same formula so equal strings have equal hashes.
- roll: Subtract left character's contribution (× B^(m−1)), multiply by B, add new character; handle negative mod so result is in [0, M−1].
- When window_hash == pattern_hash, we verify with text[i:i+m] == pattern so that we only report true matches. Without verification, collisions would produce false positives.
- We roll only when i < n - m so we don't read past the end of the text.
Time Complexity
Preprocessing: Pattern hash O(m), base_pow O(1) with pow(B, m−1, M). First window hash O(m). Main loop: n−m+1 iterations; each iteration: O(1) hash comparison, O(1) roll (when i < n−m). When we verify, we do O(m) character comparisons. Average case: Few hash matches (and few collisions), so verification cost is low → O(n + m). Worst case: Many hash collisions (e.g. pathological input or bad M)—every position might verify → O(n·m). With a large prime M and good B, worst case is rare in practice.
Space Complexity
O(1) auxiliary: pattern_hash, window_hash, base_pow, and loop variables. Result list is O(number of matches). So O(1) extra space excluding output.
Verification: Why We Need It
Hash equality only implies likely string equality. Two different strings can have the same hash (collision). So when window_hash == pattern_hash, we must either trust the hash (risking false positives) or verify. Verifying takes O(m) per candidate but keeps the output correct. With a single modulus M ≈ 10⁹, the chance of a random collision per comparison is about 1/M; over n positions expected false positives are low. Double hashing (two moduli) reduces the chance further; we can then skip verification in non-critical settings or verify only when both hashes match.
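To see a false positive in action, the sketch below runs a Rabin-Karp scan with a deliberately weak hash, once without and once with verification. The function name rabin_karp_demo, its verify flag, and the parameters B = 10, M = 9 are assumptions chosen to force a collision; they are not recommended settings.

```python
def rabin_karp_demo(text, pattern, B, M, verify):
    """Rabin-Karp scan; with verify=False, every hash match is reported."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []

    def poly_hash(s):
        h = 0
        for c in s:
            h = (h * B + ord(c)) % M
        return h

    target = poly_hash(pattern)
    top = pow(B, m - 1, M)
    h = poly_hash(text[:m])
    result = []
    for i in range(n - m + 1):
        if h == target and (not verify or text[i:i + m] == pattern):
            result.append(i)
        if i < n - m:
            h = ((h - ord(text[i]) * top) * B + ord(text[i + m])) % M
    return result


# Deliberately weak parameters: 10 % 9 == 1, so the hash is just the sum of
# character codes mod 9 and any two anagrams collide.
unverified = rabin_karp_demo("baab", "ab", B=10, M=9, verify=False)
verified = rabin_karp_demo("baab", "ab", B=10, M=9, verify=True)
print(unverified, verified)
```

Without verification the window "ba" at index 0 is reported because it hashes like "ab"; with verification only the true match at index 2 remains. A large prime modulus makes such collisions rare, but only the character comparison rules them out entirely.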
Edge Cases
- m > n: Return [] without looping.
- m == n: One window; compare hash, verify once.
- Empty pattern: Usually not defined; if m == 0, all indices could be considered matches—check problem.
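- Negative modulo: In Python, (h - ord(left_c)*base_pow) % M is already in [0, M−1], because % with a positive modulus never returns a negative value. In languages such as C++ or Java the same expression can be negative; there, fix it with ((x % M) + M) % M before multiplying by B.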
Common Mistakes
- Using a different hash formula for the pattern and the text window—they must be identical so that equal strings get equal hashes.
- Rolling when i == n−m (reading T[n] when the 0-indexed length is n). The condition must be if i < n - m before rolling.
- Skipping verification entirely when correctness is required; hashes can collide.
Rolling the hash after the last valid window: when i = n−m, the window is T[n−m..n−1]. There is no "next" window, so do not call roll with text[n]. Only roll when i < n - m.
Rabin-Karp vs KMP
| Aspect | Rabin-Karp | KMP (7.9) |
|---|---|---|
| Average time | O(n + m) | O(n + m) |
| Worst time | O(n·m) with many collisions | O(n + m) |
| Idea | Hash + roll; verify on match | Failure function; no backtrack in text |
| Multiple patterns | Easy (set of hashes) | Need Aho-Corasick or k passes |
Rabin-Karp gives O(n + m) in practice with a good hash and verification. For guaranteed O(n + m) worst case with no hashing, use KMP (7.9). For "find any of k patterns" where all patterns share the same length m, Rabin-Karp needs only one pass over the text: build a set of the k pattern hashes in O(m·k), then at each window check whether the window hash is in the set, an O(n) scan on average. Double hashing reduces false positives so verification is rarely needed.
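A multi-pattern scan along these lines might look like the sketch below. It assumes every pattern has the same length, since a single sliding window can only be compared against hashes of that length. The names MOD, BASE, poly_hash, and find_any are hypothetical, and the constants are illustrative rather than taken from 7.6.

```python
# Assumed constants for illustration only.
MOD = 1_000_000_007
BASE = 131


def poly_hash(s):
    """Polynomial hash: first character gets the highest power of BASE."""
    h = 0
    for ch in s:
        h = (h * BASE + ord(ch)) % MOD
    return h


def find_any(text, patterns):
    """Return (index, pattern) pairs for equal-length patterns, in text order."""
    if not patterns:
        return []
    m = len(patterns[0])
    if any(len(p) != m for p in patterns):
        raise ValueError("this sketch assumes all patterns have the same length")
    n = len(text)
    if m == 0 or m > n:
        return []
    # Map each hash to the patterns that produce it, so we can verify on a hit.
    by_hash = {}
    for p in patterns:
        by_hash.setdefault(poly_hash(p), set()).add(p)
    high = pow(BASE, m - 1, MOD)
    h = poly_hash(text[:m])
    hits = []
    for i in range(n - m + 1):
        candidates = by_hash.get(h)
        if candidates:
            window = text[i:i + m]
            if window in candidates:  # verification against real strings
                hits.append((i, window))
        if i < n - m:
            h = ((h - ord(text[i]) * high) * BASE + ord(text[i + m])) % MOD
    return hits
```

Storing the patterns alongside their hashes keeps verification cheap: a hash hit is confirmed by one set lookup on the actual window string.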
Pattern Recognition
Use Rabin-Karp when you need exact pattern match in O(n + m) and hashing is acceptable, or when you need to search for multiple patterns in one pass. If the problem forbids hashing or requires worst-case linear time with no probability, prefer KMP.
Use the same B and M as in 7.6 and 7.7. In contests, M = 10**9+7 or 10**9+9 is standard. For maximum safety use double hashing and verify only when both hashes match; or always verify for correctness. Rabin-Karp is easier to code than KMP for many people—rolling hash + one loop.
"We hash the pattern once and the first window of the text. Then we slide the window one position at a time, updating the hash in O(1) with a rolling hash. When the window hash equals the pattern hash, we verify character-by-character to avoid false positives from collisions. Average time O(n + m)." Implement the loop with hash_str and roll; mention edge case m > n and rolling only when i < n−m.
Practice Problems
- Find all occurrences of pattern in text (Rabin-Karp).
- Implement strStr() with Rabin-Karp (return first index).
- Search for multiple patterns in text: compute set of pattern hashes, scan text with rolling hash, check membership.
Summary
- Rabin-Karp: Pattern matching using polynomial hash + rolling hash. Compute H(P) and H(T[0..m−1]); for each start i, if hashes match then verify and report i; then roll to next window. O(n + m) average.
- Same hash formula for pattern and window. Roll only when i < n−m. Verify on hash match to avoid false positives.
- Worst case O(n·m) with many collisions; use large prime M (and optionally double hash). KMP (7.9) gives guaranteed O(n + m) with no hashing.
7.9 KMP Algorithm
Introduction
The KMP (Knuth-Morris-Pratt) algorithm finds all occurrences of a pattern P in text T in O(n + m) time with no backtracking in the text. The naive method can re-scan the same text characters many times when a mismatch occurs; KMP avoids that by precomputing a failure function (also called lps—longest proper prefix that is also a suffix) on the pattern. When a mismatch happens at pattern index j, we don't move the text pointer back; we shift the pattern using lps so that we reuse the already-matched prefix and only advance the text pointer forward. The result is a single forward pass over the text and a bounded number of pattern shifts, giving guaranteed O(n + m) worst-case time and O(m) space.
Real-World Analogy
Imagine the pattern is a ruler with notches. You slide the ruler along the text. When the notches align with the text, you have a match. When they don't, the naive approach would lift the ruler and try the next position from scratch. KMP says: "We already know the first few characters of the ruler matched the text—so we can slide the ruler so that the next possible alignment uses that matched part again." The lps array tells us how far we can slide the ruler (how much of the pattern is a prefix that matches a suffix we already saw) so we never re-read the text backward.
Pattern P = "ababc". After matching "abab" and then failing on the next character, we know "ab" is a prefix of P that also appears as a suffix of the matched "abab". So we can shift P so that its leading "ab" aligns with the second "ab" we already matched in the text; we don't need to go back in the text. The lps array for "ababc" is [0, 0, 1, 2, 0]: "a" and "ab" have no border, so lps[0] = lps[1] = 0; "aba" has border "a", so lps[2] = 1; "abab" has border "ab", so lps[3] = 2; "ababc" has no border, so lps[4] = 0. The mismatch after "abab" happens at j = 4, so we set j = lps[j−1] = lps[3] = 2 and continue by comparing the same text character with P[2].
Formal Definition
LPS (longest proper prefix that is also a suffix): For the substring P[0..i], a proper prefix is a prefix not equal to the whole string. lps[i] = length of the longest proper prefix of P[0..i] that is also a suffix of P[0..i]. Example: P = "aabaab"; for i=5 we have "aabaab"; proper prefixes "a","aa","aab","aaba","aabaa"; suffixes "b","ab","aab","baab","abaab"; the longest that appears in both is "aab" (length 3) → lps[5]=3. KMP search: Maintain text index i and pattern index j. If T[i]=P[j], increment both. If not and j>0, set j = lps[j−1] (shift pattern); if j=0, increment i. When j=m, we found a match at i−m; then set j = lps[j−1] to find next overlap.
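To make the definition concrete, here is a brute-force sketch that computes lps directly from it. The function name naive_lps is hypothetical, and the cubic running time makes it suitable only for checking small examples against the linear-time construction.

```python
def naive_lps(pattern):
    """Brute-force LPS straight from the definition; for checking only."""
    lps = []
    for i in range(len(pattern)):
        prefix = pattern[:i + 1]          # the substring P[0..i]
        best = 0
        # Try every proper border length k (k strictly less than len(prefix)).
        for k in range(1, len(prefix)):
            if prefix[:k] == prefix[-k:]:
                best = k                  # k grows, so the last hit is the longest
        lps.append(best)
    return lps


print(naive_lps("aabaab"))  # [0, 1, 0, 1, 2, 3]
print(naive_lps("ababc"))   # [0, 0, 1, 2, 0]
```

The last entry for "aabaab" is 3, matching the border "aab" from the example above.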
Why This Topic Matters
- Guaranteed O(n + m): No hashing, no probability; worst-case linear time. Preferred when you need deterministic performance.
- No backtracking in text: The text pointer i never decreases. Useful for streaming or when the text is read once.
- Foundation for other algorithms: The same "failure function" idea appears in Aho-Corasick (7.14) and in problems like "repeated substring," "shortest period."
Mental Model
We have two pointers: i in the text (never goes back), j in the pattern. Match → both advance. Mismatch → we don't move i back; instead we ask: "What's the longest prefix of P[0..j−1] that is also a suffix?" That length is lps[j−1]. We set j = lps[j−1] and compare T[i] with P[j] again (so we've effectively shifted the pattern). If j=0 and still mismatch, then we advance i. When j reaches m, we found a match at i−m; then set j = lps[m−1] to continue searching for the next occurrence.
Building the LPS Array
We want lps[i] = length of longest proper prefix of P[0..i] that is also a suffix. We build it with two pointers: len = current longest border length, i = current index (1 to m−1). If P[i] == P[len], then lps[i] = len + 1, and we increment both. If not and len > 0, set len = lps[len−1] (try a shorter border). If len == 0, then lps[i] = 0 and i++.
def build_lps(pattern):
    """LPS[i] = length of longest proper prefix of P[0..i] that is also a suffix."""
    m = len(pattern)
    lps = [0] * m
    length = 0  # length of current longest border
    i = 1
    while i < m:
        if pattern[i] == pattern[length]:
            length += 1
            lps[i] = length
            i += 1
        else:
            if length != 0:
                length = lps[length - 1]
            else:
                lps[i] = 0
                i += 1
    return lps
KMP Search Algorithm
- Build lps for the pattern (O(m)).
- i = 0 (text), j = 0 (pattern).
- While i < n: if T[i] == P[j], then i++, j++. If j == m, we found a match at i−m; append to result; set j = lps[j−1] to continue. Else (mismatch): if j > 0, set j = lps[j−1]; else i++.
- Return list of start indices.
Python Implementation
def kmp_search(text, pattern):
    n, m = len(text), len(pattern)
    if m > n:
        return []
    if m == 0:
        return list(range(n + 1))  # problem-dependent
    lps = build_lps(pattern)
    result = []
    i = j = 0
    while i < n:
        if text[i] == pattern[j]:
            i += 1
            j += 1
            if j == m:
                result.append(i - m)
                j = lps[j - 1]
        else:
            if j > 0:
                j = lps[j - 1]
            else:
                i += 1
    return result
Line-by-Line: Build LPS
- length holds the length of the longest border for the prefix we've extended so far. When P[i] == P[length], extending by one character gives a longer border, so lps[i] = length + 1.
- When P[i] != P[length], we can't extend the current border. We try the next shorter candidate: the longest border of P[0..length−1] has length lps[length−1], so set length = lps[length−1] and recheck (without advancing i yet).
- When P[i] != P[length] and length == 0, there is no border; lps[i] = 0 and we move i forward.
Line-by-Line: Search
- Match: advance both i and j. When j == m, we've matched the whole pattern → record start i−m, then set j = lps[m−1] so we can find the next occurrence that might overlap (e.g. pattern "aa" in "aaa" gives matches at 0 and 1).
- Mismatch and j > 0: shift pattern by setting j = lps[j−1]; we don't move i, so we compare T[i] with P[j] again.
- Mismatch and j == 0: no border to fall back to; advance i.
ASCII Diagram: LPS and Shift
P = "ababc" (m=5)
Proper prefix that is also suffix (borders):
P[0..0] "a" → none (proper prefix only "") → lps[0]=0
P[0..1] "ab" → none → lps[1]=0
P[0..2] "aba" → "a" → lps[2]=1
P[0..3] "abab"→ "ab" → lps[3]=2
P[0..4] "ababc"→ none (suffix "c" no prefix "c")→ lps[4]=0
Text "abababc", match "abab" then mismatch at next char.
Shift: j was 4, lps[3]=2 so j=2. Now we compare with P[2]='a' without moving text pointer.
Time Complexity
Build LPS: The inner while (length = lps[length−1]) can run multiple times per i, but length decreases each time and we only increase length by 1 when we advance i. Amortized O(m). Search: Each step either increases i or decreases j (j drops by at least 1 when we do lps[j−1], and j is increased at most n times when we match). So total iterations O(n + m). Overall O(n + m).
Space Complexity
LPS array is O(m). A few variables for i, j. O(m) space.
Edge Cases
- m > n: Return [] without building lps or searching.
- m == 0: Often "empty pattern matches everywhere"; return [0,1,...,n] or as per problem.
- No match: Return [].
- Overlapping matches: e.g. P = "aa", T = "aaa". After match at 0, we set j = lps[1] = 1; then we compare T[2] with P[1] and get match at 1. So overlapping matches are found correctly.
Common Mistakes
- In build_lps, advancing i when we set length = lps[length−1]—we should not advance i until we've either set lps[i] or determined length==0. The standard code only increments i when pattern[i]==pattern[length] or when length==0.
- In search, after finding a match (j==m), forgetting to set j = lps[j−1] and instead setting j=0—then we miss overlapping matches (e.g. "aa" in "aaa").
- Confusing 0-indexed lps: lps[j−1] is the border length for P[0..j−1]; the next position to try in the pattern is index lps[j−1], so j = lps[j−1] is correct.
After finding a match (j == m), set j = lps[j - 1] (i.e. lps[m−1]) so that we can detect the next occurrence that might overlap with the current one. If you set j = 0, you will miss overlaps like pattern "aa" in text "aaa" (matches at 0 and 1).
KMP never backs up in the text—every character is compared at most twice in amortized sense (when we decrease j we're "reusing" a prefix). So for one pattern, KMP is the standard O(n + m) solution with no hashing. For multiple patterns, use Aho-Corasick (7.14), which generalizes the failure function to a trie of patterns.
Pattern Recognition
Use KMP when you need exact pattern match with guaranteed O(n + m), when you can't use hashing, or when the problem asks for "failure function" or "border" of a string. Also for "shortest period," "repeated substring" (pattern = string, search in string+string).
Memorize: lps[i] = longest proper prefix of P[0..i] that is also a suffix. Build with two pointers (length, i); on match extend border; on mismatch shorten with length = lps[length−1]. Search: match → i++, j++; j==m → report, j=lps[j−1]; mismatch and j>0 → j=lps[j−1]; else i++.
"KMP precomputes an lps array on the pattern: lps[i] is the length of the longest proper prefix of P[0..i] that is also a suffix. When we mismatch at j, we set j = lps[j−1] and don't move the text pointer—so we never backtrack in the text. That gives O(n + m) worst case. After a full match we set j = lps[m−1] to find overlapping matches." Implement build_lps and the search loop; mention overlap handling.
Practice Problems
- Implement strStr() / Find first occurrence using KMP.
- Find all occurrences of pattern in text (KMP).
- Repeated Substring Pattern: is s equal to a proper substring repeated? (Concatenate s with itself, search for s in (s+s)[1:-1] with KMP.)
- Shortest Palindrome: add characters in front to make palindrome (use KMP on s + '#' + reverse(s), use lps).
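As an alternative to the (s+s)[1:-1] search, the "shortest period" and "repeated substring" problems can also be answered directly from the lps array. The sketch below relies on the standard fact that the smallest period of s is len(s) − lps[-1]; the function names are hypothetical and the prefix function is re-implemented compactly so the block stands alone.

```python
def prefix_function(s):
    """Compact LPS construction (same values as build_lps)."""
    lps = [0] * len(s)
    length = 0
    for i in range(1, len(s)):
        while length > 0 and s[i] != s[length]:
            length = lps[length - 1]      # fall back to a shorter border
        if s[i] == s[length]:
            length += 1
        lps[i] = length
    return lps


def shortest_period(s):
    """Smallest p > 0 such that s[i] == s[i + p] for every valid i."""
    if not s:
        return 0
    return len(s) - prefix_function(s)[-1]


def is_repeated_substring(s):
    """True if s is some proper substring repeated two or more times."""
    n = len(s)
    if n == 0:
        return False
    p = shortest_period(s)
    return p < n and n % p == 0


print(shortest_period("abcabcab"))    # 3
print(is_repeated_substring("abab"))  # True
print(is_repeated_substring("aba"))   # False
```

The divisibility check matters: "abcabcab" has period 3 but is not a whole number of copies of "abc", so it is not a repeated substring.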
Summary
- KMP: Pattern matching in O(n + m) using an lps (failure) array. Text pointer never goes backward.
- LPS[i] = length of longest proper prefix of P[0..i] that is also a suffix. Build in O(m) with two pointers.
- Search: Match → i++, j++. j==m → report i−m, j = lps[j−1]. Mismatch: j>0 → j = lps[j−1]; else i++.
- After a match, set j = lps[m−1] to catch overlapping occurrences. Guaranteed O(n + m); no hashing.
7.10 Z Algorithm
Introduction
The Z algorithm (or Z-box algorithm) computes an array Z for a string S such that Z[i] is the length of the longest substring starting at position i that matches a prefix of S. So Z[0] = n (the whole string is a prefix of itself), and for i > 0, Z[i] tells us how many characters starting at i match S[0], S[1], …. We build the Z array in O(n) time using an invariant: we maintain a "Z-box" [L, R] meaning S[L..R] = S[0..R−L] (the substring from L matches the prefix of the same length). The Z algorithm is used for pattern matching: form S = P + '$' + T (with '$' a character not in P or T); then for each index i in the T part, if Z[i] = len(P), the pattern occurs at that position in T. Like KMP, it gives O(n + m) pattern matching with no hashing.
Real-World Analogy
Imagine you have a ribbon with a repeating pattern at the start. For each position along the ribbon, you ask: "How far to the right does the ribbon match the pattern starting at the beginning?" The Z array stores that length at each position. We compute it efficiently by reusing what we already know: if we've computed a stretch [L, R] that matches the prefix, then for positions inside that stretch we can copy information from the corresponding position in the prefix (with a small adjustment) and only extend when necessary. So we avoid re-comparing from scratch at every index.
String S = "aabxaabxcaabxaabxay". Z[0] = len(S). Z[1] = 1 (S[1]='a' matches S[0]; S[2]='b' ≠ S[1]). Z[2] = 0. Z[3] = 0. Z[4] = 4 (S[4..7] = "aabx" = S[0..3], then S[8]='c' ≠ S[4]='a'). Z[5] = 1. For pattern matching with P = "ab" and T = "cababc": S = "ab" + "$" + "cababc" = "ab$cababc", and m = 2. T starts at index m+1 = 3 in S, so S-index i corresponds to T-index i−(m+1). Here Z[4] = 2 and Z[6] = 2, so S[4..5] and S[6..7] both equal P, giving matches at T-indices 4−3 = 1 and 6−3 = 3. In general, when Z[i] = m and i ≥ m+1, we have a match at T-index i−m−1.
Formal Definition
Z[i]: For string S of length n, Z[i] = length of the longest substring S[i..i+k−1] that equals the prefix S[0..k−1]. So S[i..i+Z[i]−1] = S[0..Z[i]−1]. By convention Z[0] = n (we can define it as length of the string, or sometimes we don't use Z[0] for pattern matching). Z-box [L, R]: We maintain L and R such that S[L..R] = S[0..R−L], i.e. the substring starting at L matches the prefix of length R−L+1. For each i, if i ≤ R we use the fact that S[i..R] = S[i−L..R−L]; so Z[i] ≥ min(Z[i−L], R−i+1). We then extend by comparing. If i > R we compute Z[i] from scratch. After computing Z[i], if i+Z[i]−1 > R we update L=i, R=i+Z[i]−1.
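A quadratic reference version that follows the definition literally can help when checking a linear-time implementation. This is only a sketch for validation; the name naive_z is hypothetical, and it sets Z[0] = n by the convention stated above.

```python
def naive_z(s):
    """O(n^2) Z array straight from the definition; for checking only."""
    n = len(s)
    z = [0] * n
    if n:
        z[0] = n  # convention: the whole string matches itself
    for i in range(1, n):
        k = 0
        # Extend while the character at offset k from i matches the prefix.
        while i + k < n and s[k] == s[i + k]:
            k += 1
        z[i] = k
    return z


print(naive_z("aabxaab"))    # [7, 1, 0, 0, 3, 1, 0]
print(naive_z("ab$cababc"))  # [9, 0, 0, 0, 2, 0, 2, 0, 0]
```

In the second example the entries equal to 2 sit at indices 4 and 6, which correspond to matches of "ab" at positions 1 and 3 of "cababc".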
Why This Topic Matters
- Pattern matching in O(n + m): With S = P + '$' + T, any index i (i ≥ m+1) where Z[i] = m gives a match of P in T at position i−m−1. One Z array computation is O(|S|) = O(n + m).
- Alternative to KMP: Same linear time, different idea (prefix matching at every position). Some problems are more natural with Z (e.g. find all positions where a prefix repeats).
- String structure: Z array reveals periodicity and repeated prefixes; used in "find period," "distinct substrings," and similar.
Mental Model
We have a window [L, R] that we've already verified matches the prefix S[0..R−L]. For the current index i: if i is inside [L, R], we know S[i..R] = S[i−L..R−L], so Z[i] is at least min(Z[i−L], R−i+1). We then try to extend by comparing S[R+1] with S[R−i+1], etc. If i > R, we compute Z[i] by comparing S[i] with S[0] and extending. Whenever we get a new rightmost R (i+Z[i]−1 > R), we update L=i and R=i+Z[i]−1.
Step-by-Step: Building the Z Array
- Z[0] = n (or set and skip in loop). L = R = 0 initially.
- For i from 1 to n−1: If i > R, no box covers i. Set L = R = i, then while R < n and S[R−L] == S[R], increment R. The loop exits with R one past the last matching character, so Z[i] = R−L; then decrement R so that [L, R] is inclusive and S[L..R] = S[0..R−L].
- If i ≤ R, we're inside the box. k = i−L. If Z[k] < R−i+1, then Z[i] = Z[k]. Else we have Z[i] ≥ R−i+1; set L = i and extend R from R+1 comparing S[R+1] with S[R−i+1], then Z[i] = R−i+1.
Python Implementation
Build Z Array
def build_z(s):
    """Z[i] = length of longest substring starting at i that matches prefix of s."""
    n = len(s)
    z = [0] * n  # z[0] is left as 0 here; set z[0] = n if your convention needs it
    l = r = 0
    for i in range(1, n):
        if i > r:
            l = r = i
            while r < n and s[r - l] == s[r]:
                r += 1
            z[i] = r - l
            r -= 1  # r was one past the last match; make the box inclusive
        else:
            k = i - l
            if z[k] < r - i + 1:
                z[i] = z[k]
            else:
                l = i
                while r + 1 < n and s[r + 1] == s[r + 1 - l]:
                    r += 1
                z[i] = r - i + 1
    return z
Pattern Matching with Z
def z_pattern_match(text, pattern):
    """Return list of start indices in text where pattern occurs."""
    m, n = len(pattern), len(text)
    if m > n:
        return []
    if m == 0:
        return list(range(n + 1))  # problem-dependent; also avoids indexing past the Z array
    s = pattern + '\0' + text  # use \0 or any separator not in P, T
    z = build_z(s)
    result = []
    for i in range(m + 1, len(s) - m + 1):
        if z[i] == m:
            result.append(i - m - 1)
    return result
T starts at index m+1 in the combined string S. When Z[i] = m for some i ≥ m+1, the substring S[i..i+m−1] equals the pattern, so the match in T starts at index i − m − 1.
Line-by-Line: Z Build (i > R case)
When i > R, no prior box covers i. We set L = R = i and extend R: compare S[R] with S[R−L], the character at the same offset from the start of the string. While they match, R increases. The while loop exits when r reaches n or on a mismatch, so r is one past the last match and the match length is Z[i] = r − l. We then do r -= 1 so that R is the last index of the Z-box, giving S[i..R] = S[0..R−i] with L = i.
Time Complexity
Each time we compare characters, either R increases or i increases. R never decreases (we only set L=i and R to a value ≥ current R when we extend). So total comparisons O(n). O(n) to build Z for a string of length n. Pattern matching: S has length m+1+n, so O(n + m).
Space Complexity
Z array is O(n). L and R are O(1). O(n) space. For pattern matching, the combined string is O(n + m), so O(n + m) space.
Edge Cases
- Empty string: Return Z = [] or Z[0]=0.
- Single character: Z[0]=1, no other indices (or i from 1 to n−1 gives Z[i] from the loop).
- Pattern longer than text: Return [] before building Z.
- Separator in pattern or text: Use a character that does not appear in P or T (e.g. '\0' or '#'); otherwise Z might span across the separator incorrectly.
Common Mistakes
- Using a separator that appears in P or T—then Z values can extend across the boundary and give false matches. Use a character guaranteed absent.
- Off-by-one in pattern matching: T starts at index m+1 in S (after P and the separator). So when Z[i]=m at index i, the match in T starts at i−(m+1), not i−m.
- In the Z build, when i ≤ R and Z[k] ≥ R−i+1, we must extend from R+1; don't forget to update L and R after extending.
In pattern matching with S = P + '$' + T, the match start index in T is i − m − 1, not i − m. The T part starts at index m+1 in S (index 0 is start of P, index m is the separator, index m+1 is start of T). So when Z[i] = m for index i ≥ m+1, the pattern occupies S[i..i+m−1], and that corresponds to T starting at position (i − (m+1)) = i − m − 1.
Z Algorithm vs KMP
Both achieve O(n + m) pattern matching. KMP: precompute LPS on the pattern, then one pass over the text with two pointers. Z: form S = P + sep + T, compute Z for S; positions in T where Z[i]=m are matches. Z gives "prefix match length at every position" in one array; KMP gives "failure function" and doesn't store prefix match at every text position. For just pattern matching, either is fine; Z can be more intuitive for problems that ask "longest prefix match at each index."
The key to O(n) Z build is that R never decreases. When we use Z[i−L] for i inside the box, we get a lower bound and sometimes the exact value without any comparison. When we extend, we only compare from R+1 onward, so each comparison increases R. So total work is linear.
Pattern Recognition
Use Z when you need length of prefix match at every position, or when pattern matching with a single combined string is convenient. Also for "find all positions where the string matches its prefix," "period of string," or "number of occurrences of prefix at each position."
Keep the Z-box [L, R] meaning S[L..R] = S[0..R−L]. When i ≤ R, use k = i−L and Z[i] = min(Z[k], R−i+1) unless Z[k] ≥ R−i+1 in which case we extend. When i > R, compute Z[i] from scratch and update L, R. For pattern matching, always use a separator not in P or T.
"The Z array at i gives the length of the longest substring starting at i that matches the prefix. We build it in O(n) by maintaining a box [L,R] that matches the prefix; for i inside the box we reuse Z[i−L], then extend if needed. For pattern matching we form P + sep + T and find indices i where Z[i] = len(P); those are match starts in T at i−len(P)−1." Implement build_z and mention the separator.
Practice Problems
- Find all occurrences of pattern in text using Z algorithm.
- Find all positions in a string where the prefix repeats (Z[i] = some value).
- Shortest period of a string (smallest period p such that s is periodic with period p)—use Z or KMP.
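For the shortest-period problem, one possible Z-based approach is to find the smallest p with p + Z[p] == n, meaning the suffix starting at p matches the prefix all the way to the end of the string. The sketch below is self-contained and uses a compact half-open box [l, r) rather than the inclusive [L, R] box described in this section; the function names are hypothetical.

```python
def z_array(s):
    """Linear-time Z array using a half-open box [l, r): s[l:r] == s[0:r-l]."""
    n = len(s)
    z = [0] * n
    if n == 0:
        return z
    z[0] = n
    l = r = 0
    for i in range(1, n):
        if i < r:
            z[i] = min(r - i, z[i - l])   # reuse the mirrored value, capped at the box
        while i + z[i] < n and s[z[i]] == s[i + z[i]]:
            z[i] += 1                     # extend by direct comparison
        if i + z[i] > r:
            l, r = i, i + z[i]            # new rightmost box
    return z


def shortest_period_z(s):
    """Smallest p > 0 such that s[i] == s[i + p] for every valid i."""
    n = len(s)
    z = z_array(s)
    for p in range(1, n):
        if p + z[p] == n:
            return p
    return n


print(shortest_period_z("abcabcab"))  # 3
print(shortest_period_z("abcd"))      # 4
```

If the problem requires the string to be a whole number of repetitions, additionally check that n is divisible by the returned period.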
Summary
- Z[i] = length of longest substring starting at i that equals a prefix of S. Build in O(n) using a Z-box [L, R].
- Pattern matching: S = P + separator + T; compute Z for S; when Z[i] = m and i ≥ m+1, pattern occurs in T at start index i−m−1.
- Use a separator that does not appear in P or T. Match start in T is i−m−1, not i−m.
- Same O(n + m) as KMP; Z array gives prefix-match length at every position, which is useful for period and repeat problems.
7.11 Manacher's Algorithm
Introduction
Manacher's algorithm finds the longest palindromic substring of a string in O(n) time and O(n) space. In topic 7.4 we used "expand around center" for every center (odd and even), which takes O(n²). Manacher improves this by reusing information: we maintain the rightmost boundary R of any palindrome we've seen so far and its center C. For each position i, we use the mirror position 2·C − i to get a lower bound on the palindrome radius at i, then expand only when needed. We also transform the string by inserting a separator (e.g. '#') between every pair of characters and at the ends, so that every palindrome in the new string has odd length—then we only need one type of center. The result is a single pass where the "expand" step advances R at least once per comparison, giving O(n) total.
Real-World Analogy
Imagine you're measuring how far each point on a line can "see" in both directions while staying inside a symmetric corridor (palindrome). Once you've computed the corridor for a point to your left, a point inside that corridor can often reuse the same width (mirror image) instead of measuring from scratch. You only extend the measurement when you're past the known corridor or when the mirror gives a lower bound and you need to check further. So you avoid re-measuring the same stretch repeatedly—that's how Manacher gets linear time.
String s = "babad". Transform with '#' → "#b#a#b#a#d#". Now every palindrome has odd length and a unique center. The longest palindrome in the original string is "bab" or "aba" (length 3). In the transformed string, the center of "aba" is the middle 'b' at index 5, and the palindrome "#a#b#a#" extends 3 characters on each side of it. Manacher's array P at that center is therefore 3, which equals the number of original characters in that palindrome. So we get length 3 and can recover the substring.
Formal Definition
Transformed string: T = "#" + "#".join(s) + "#", so length is 2n+1. Every palindrome in T has odd length with a unique center. P[i]: the radius (half-length) of the longest palindrome centered at index i in T. The palindrome spans T[i−P[i]..i+P[i]] and has length 2·P[i]+1 in T. The number of original characters in that palindrome equals P[i]. So the longest palindromic substring in s has length max(P[i]). We maintain center C and right boundary R of the rightmost palindrome; for each i we use the mirror 2·C−i to get a lower bound, then expand. R never decreases, so total work is O(n).
Why This Topic Matters
- Optimal for longest palindromic substring: Expand-around-center (7.4) is O(n²); Manacher is O(n). For large inputs or when the problem explicitly asks for linear time, Manacher is the answer.
- Reuses structure: Like Z algorithm and KMP, we reuse previously computed information (the mirror palindrome under center C) to avoid redundant work.
- Interview and contests: "Longest palindromic substring in O(n)" is a classic follow-up after the O(n²) solution.
Mental Model
We have a "rightmost" palindrome with center C and right boundary R (so it spans to index R). For the current position i: if i is inside [C−(R−C), R] = [2C−R, R], then the mirror of i with respect to C is j = 2C−i. Whatever part of the palindrome at j lies inside the palindrome at C is mirrored around i, so the palindrome at i is at least as large as the part that fits: P[i] ≥ min(P[j], R−i). We set P[i] to that minimum, then try to expand. If we expand past R, we update C = i and R = new right boundary. So we always move R forward, giving O(n) total expansions.
Transform and Key Recurrence
Transform: T = "#" + "#".join(list(s)) + "#" so that every palindrome has odd length. Example: "abba" → "#a#b#b#a#".
Recurrence: Maintain C (center of the palindrome that extends farthest right) and R (its right boundary, inclusive). For each i from 1 to len(T)−1: If i ≤ R, set P[i] = min(R − i, P[2*C − i]). Then while T[i + P[i] + 1] == T[i − P[i] − 1], increment P[i]. If i + P[i] > R, set C = i and R = i + P[i].
Python Implementation
def manacher(s):
    """Return (max_radius, center_index) in transformed string; original length = max_radius."""
    if not s:
        return 0, 0
    t = '#' + '#'.join(s) + '#'
    n = len(t)
    p = [0] * n
    c = r = 0
    for i in range(1, n - 1):
        mirror = 2 * c - i
        if i < r:
            p[i] = min(r - i, p[mirror])
        while i + p[i] + 1 < n and i - p[i] - 1 >= 0 and t[i + p[i] + 1] == t[i - p[i] - 1]:
            p[i] += 1
        if i + p[i] > r:
            c = i
            r = i + p[i]
    max_rad = max(p)
    center = p.index(max_rad)
    return max_rad, center

def longest_palindrome_manacher(s):
    """Return the longest palindromic substring of s."""
    if not s:
        return ""
    max_rad, center = manacher(s)
    t = '#' + '#'.join(s) + '#'
    start = center - max_rad
    end = center + max_rad + 1
    substring = t[start:end]
    return substring.replace('#', '')
Line-by-Line Explanation
- t = '#' + '#'.join(s) + '#': Transforms "ab" into "#a#b#". So every palindrome in t has odd length (center is one character).
- if i < r: We're inside the palindrome centered at c; mirror = 2*c − i. If the palindrome at mirror stays inside [2*c−r, r], we can copy P[mirror]; if it would extend past that boundary, we cap the value at r−i.
- while ... expand: Try to extend the palindrome at i; stop at boundary or mismatch.
- if i + p[i] > r: We extended past the old right boundary, so update center c and right boundary r.
- To get the original substring: the segment of t from center−max_rad to center+max_rad (inclusive) has 2*max_rad+1 characters, of which max_rad+1 are '#' and max_rad are original characters. So the longest palindrome length is max_rad. To extract it, take t[center-max_rad : center+max_rad+1] and replace '#' with ''.
Time Complexity
Each time we expand (increment P[i]), we increase the right boundary R. R never decreases and is at most n (length of t). So total expansions are O(n). The "min(r − i, p[mirror])" step is O(1). So O(n) total.
Space Complexity
Transformed string t: O(n). Array P: O(n). So O(n).
Edge Cases
- Empty string: Return "" or length 0.
- Single character: Transform "#a#"; P[1]=1 (radius 1); longest is "a".
- No palindrome of length > 1: e.g. "abc"; max P[i] = 1; return first character.
Common Mistakes
- Forgetting to transform the string—then even-length palindromes don't have a single center in the middle of a character.
- Wrong mirror index: mirror = 2*C − i (i and mirror are symmetric about C).
- When extracting the substring, mixing up indices in t vs original s. The segment in t is [center−P[center], center+P[center]]; remove '#' to get the original substring.
The mirror of index i with respect to center C is 2*C − i, not C − i or i − C. So P[i] gets a lower bound from P[2*C − i] when i is inside the current rightmost palindrome.
Expand-Around-Center vs Manacher
| Method | Time | Space |
|---|---|---|
| Expand around center (7.4) | O(n²) | O(1) |
| Manacher's algorithm | O(n) | O(n) |
Expand-around-center does O(n) work per center (up to n/2 expansions per center in the worst case), so O(n²). Manacher reuses the rightmost palindrome: when i is inside it, we get a free lower bound and often don't expand at all; when we do expand, we push R forward, so total expansions are O(n). Trade-off: O(n) extra space for the P array and transformed string.
Pattern Recognition
Use Manacher when the problem asks for longest palindromic substring in O(n), or when you need the radius (or length) of the longest palindrome centered at every position (e.g. count palindromic substrings in O(n) by summing (P[i]+1)//2 over all centers i of the transformed string). For "just" longest palindrome and n is small, expand-around-center is simpler.
Transform with '#' so that the transformed string has length 2n+1 and every palindrome is odd-length. Then P[i] is the "radius": the number of characters on each side of the center in the transformed string. The original palindrome length equals P[i]. After the loop, max(P) gives the longest palindrome radius; extract the substring from the transformed string and remove '#'.
"We can do expand-around-center in O(n²). For O(n) we use Manacher's algorithm: transform the string so every palindrome has odd length (insert '#' between characters). Then we maintain the rightmost palindrome [C, R] and for each position i we use the mirror 2*C−i to get a lower bound on P[i], then expand. R only moves forward, so total work is O(n)." Implement the transform and the main loop with mirror and expand.
Practice Problems
- Longest Palindromic Substring (LeetCode 5) in O(n) with Manacher.
- Count palindromic substrings in O(n) using the P array (each center i of the transformed string contributes (P[i]+1)//2 palindromes).
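The counting problem can be sketched as follows. Each center i of the transformed string contributes (P[i]+1)//2 palindromic substrings of the original string, because every second step of the radius adds one more original palindrome around that center. The function names are hypothetical, and the radius computation is repeated here so the block runs on its own.

```python
def manacher_radii(s):
    """Radius array P over the transformed string '#a#b#...#', in O(n)."""
    t = '#' + '#'.join(s) + '#'
    n = len(t)
    p = [0] * n
    c = r = 0  # center and right boundary of the rightmost palindrome
    for i in range(n):
        if i < r:
            p[i] = min(r - i, p[2 * c - i])   # lower bound from the mirror
        while (i - p[i] - 1 >= 0 and i + p[i] + 1 < n
               and t[i - p[i] - 1] == t[i + p[i] + 1]):
            p[i] += 1
        if i + p[i] > r:
            c, r = i, i + p[i]
    return p


def count_palindromic_substrings(s):
    """Number of palindromic substrings of s, counting each position separately."""
    if not s:
        return 0
    return sum((radius + 1) // 2 for radius in manacher_radii(s))


print(count_palindromic_substrings("aaa"))   # 6
print(count_palindromic_substrings("abba"))  # 6
```

For "aaa" the six palindromes are the three single characters, the two occurrences of "aa", and "aaa".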
Summary
- Manacher's algorithm finds the longest palindromic substring in O(n) by reusing the rightmost palindrome boundary.
- Transform: T = "#" + "#".join(s) + "#" so every palindrome has odd length. P[i] = radius of longest palindrome centered at i.
- Recurrence: If i ≤ R, P[i] = min(R−i, P[2*C−i]); then expand. If i+P[i] > R, update C=i, R=i+P[i].
- R never decreases → O(n) time. O(n) space for T and P.
7.12 Suffix Array
Introduction
A suffix array of a string S of length n is an array of integers that gives the starting indices of all suffixes of S in lexicographic (sorted) order. So suffix_array[0] is the index of the smallest suffix, suffix_array[1] the next smallest, and so on. For example, for S = "banana", the suffixes are "banana"(0), "anana"(1), "nana"(2), "ana"(3), "na"(4), "a"(5). Sorted: "a"(5), "ana"(3), "anana"(1), "banana"(0), "na"(4), "nana"(2). So the suffix array is [5, 3, 1, 0, 4, 2]. Once built, we can search for a pattern P in O(m log n) by binary search: find the range of suffixes that have P as a prefix. We can also build an LCP array (longest common prefix between consecutive suffixes in the suffix array) to solve problems like "longest repeated substring" and "number of distinct substrings." Naive construction is O(n² log n) (sort n suffixes, each comparison O(n)); efficient algorithms (doubling, SA-IS) achieve O(n log n) or O(n).
Real-World Analogy
Imagine a dictionary of all suffixes of a word: "banana" gives entries "a", "ana", "anana", "banana", "na", "nana" in alphabetical order. The suffix array is the list of "page numbers" (starting indices) in that order. To find where "nan" appears, you open the dictionary to the right place (binary search) and check if the suffix at that position starts with "nan". The sorted order lets you binary search instead of scanning every suffix.
S = "banana", n = 6. Suffixes: S[0:]= "banana", S[1:]= "anana", S[2:]= "nana", S[3:]= "ana", S[4:]= "na", S[5:]= "a". Sorted lexicographically: "a" < "ana" < "anana" < "banana" < "na" < "nana". So SA = [5, 3, 1, 0, 4, 2]. Pattern "na": binary search finds suffixes that start with "na"—they are at positions 4 and 5 in SA (indices 4 and 2 in S). So "na" occurs at indices 2 and 4.
Formal Definition
Suffix array SA[0..n−1]: A permutation of {0, 1, …, n−1} such that the suffix starting at SA[0] is lexicographically smallest, the suffix at SA[1] is the next smallest, and so on. So S[SA[i]:] < S[SA[i+1]:] for all 0 ≤ i < n−1 (the inequality is strict because no two suffixes have the same length). Pattern search: P occurs in S iff there is some suffix that has P as a prefix. Because suffixes are sorted, all such suffixes form a contiguous range in the suffix array. Binary search finds the leftmost and rightmost index in SA where the suffix has P as prefix; the occurrences of P in S are exactly SA[lo], SA[lo+1], …, SA[hi]. LCP array: LCP[i] = length of longest common prefix of S[SA[i]:] and S[SA[i−1]:] for i ≥ 1, with LCP[0] = 0 by convention. Used for repeated substrings and distinct substring count.
Why This Topic Matters
- Pattern matching: After O(n log n) or O(n) build, we can answer "does P occur in S?" and "where does P occur?" in O(m log n) per query (binary search + O(m) comparison per step).
- Longest repeated substring: With the LCP array, the maximum LCP value gives the length of the longest substring that appears at least twice. The substring is S[SA[i]:SA[i]+LCP[i]].
- Distinct substrings: Total substrings = n(n+1)/2; subtract sum of LCP to get distinct count (each repeated substring is "overcounted" by the common prefix length).
Mental Model
List all suffixes (starting index 0, 1, …, n−1), sort them lexicographically. The sorted order of indices is the suffix array. To find pattern P, compare P only against the first m = |P| characters of each suffix. Binary search for the first suffix whose length-m prefix is ≥ P, and for the first suffix whose length-m prefix is > P. Every suffix in between has P as a prefix, so the range between the two positions (if non-empty) gives the starting indices where P occurs.
Building the Suffix Array (Naive)
Create pairs (suffix string, index) for each starting index 0..n−1. Sort by the suffix string. Extract the indices in order. Comparing two suffixes is O(n) in the worst case, and we have O(n log n) comparisons, so O(n² log n) total. For small n this is acceptable; for large n we need O(n log n) construction (e.g. doubling with radix sort or SA-IS).
Python Implementation
Naive Suffix Array Build
def build_suffix_array_naive(s):
    """Return suffix array: indices of suffixes in lexicographic order."""
    n = len(s)
    suffixes = [(s[i:], i) for i in range(n)]
    suffixes.sort(key=lambda x: x[0])
    return [idx for _, idx in suffixes]
Pattern Search: Binary Search
def suffix_array_search(s, suffix_array, pattern):
    """Return list of start indices in s where pattern occurs."""
    n, m = len(s), len(pattern)
    if m > n or not pattern:
        return []

    def cmp_suffix(i, pattern):
        # Compare only the first m characters of s[i:] with pattern, so each comparison is O(m).
        # Return -1 if s[i:] < pattern, 0 if pattern is a prefix of s[i:], 1 if s[i:] > pattern.
        prefix = s[i:i + m]
        if prefix == pattern:
            return 0
        if prefix < pattern:
            return -1
        return 1

    # First binary search: leftmost index in SA whose suffix is >= pattern
    # (it either has pattern as a prefix or is lexicographically greater).
    lo, hi = 0, n
    while lo < hi:
        mid = (lo + hi) // 2
        if cmp_suffix(suffix_array[mid], pattern) < 0:
            lo = mid + 1
        else:
            hi = mid
    left = lo
    # Second binary search, starting from left: first index whose suffix does not
    # have pattern as a prefix. Suffixes with the prefix are contiguous from left.
    hi = n
    while lo < hi:
        mid = (lo + hi) // 2
        if cmp_suffix(suffix_array[mid], pattern) == 0:
            lo = mid + 1
        else:
            hi = mid
    right = lo - 1
    if left <= right:
        return sorted([suffix_array[i] for i in range(left, right + 1)])
    return []
Simpler variant: one binary search for "first suffix ≥ pattern", one for "first suffix > pattern" (where we treat "prefix of suffix" as equal to pattern). Then occurrences are SA[left], ..., SA[right−1] when we use the second search as "first suffix that does not have P as prefix."
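The two-bound variant can be sketched as below. It is an illustration under the assumption that the suffix array is already built; the names sa_search_range and first_index are hypothetical, and the suffix array in the usage lines is built naively only to keep the example self-contained.

```python
def sa_search_range(s, sa, pattern):
    """Return sorted start indices of pattern in s, given the suffix array sa of s."""
    m = len(pattern)
    if m == 0:
        return []

    def first_index(strict):
        # strict=False: first position whose length-m prefix is >= pattern.
        # strict=True:  first position whose length-m prefix is >  pattern.
        lo, hi = 0, len(sa)
        while lo < hi:
            mid = (lo + hi) // 2
            prefix = s[sa[mid]:sa[mid] + m]
            if prefix < pattern or (strict and prefix == pattern):
                lo = mid + 1
            else:
                hi = mid
        return lo

    left = first_index(False)
    right = first_index(True)
    # Occurrences are SA[left], ..., SA[right - 1]; the slice is empty if there are none.
    return sorted(sa[left:right])


text = "banana"
sa = sorted(range(len(text)), key=lambda i: text[i:])  # naive build: [5, 3, 1, 0, 4, 2]
print(sa_search_range(text, sa, "na"))   # [2, 4]
print(sa_search_range(text, sa, "ana"))  # [1, 3]
```

Using a half-open range [left, right) removes the separate "left <= right" check, since an empty slice already represents "no occurrence".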
LCP Array (Brief)
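After building the suffix array, the LCP array LCP[i] = length of the longest common prefix of S[SA[i]:] and S[SA[i−1]:]. It can be built in O(n) with Kasai's algorithm: process the suffixes in text order (start index 0, 1, 2, …) and carry over the current match length h, which drops by at most 1 when moving from the suffix at index i to the suffix at index i+1, so the total number of character comparisons is O(n). Then max(LCP) is the length of the longest repeated substring; the substring is S[SA[i]: SA[i] + LCP[i]] for the i that achieves the max.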
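A linear-time LCP construction in the style of Kasai's algorithm might look like the sketch below. The function name build_lcp_kasai is chosen for this example, LCP[0] is set to 0 by convention, and the suffix array in the usage lines is built naively for brevity.

```python
def build_lcp_kasai(s, sa):
    """Return lcp where lcp[k] = LCP of the suffixes at sa[k] and sa[k-1]; lcp[0] = 0."""
    n = len(s)
    rank = [0] * n                 # rank[i] = position of suffix i in the suffix array
    for pos, start in enumerate(sa):
        rank[start] = pos
    lcp = [0] * n
    h = 0                          # current match length, carried between iterations
    for i in range(n):             # visit suffixes in text order
        if rank[i] > 0:
            j = sa[rank[i] - 1]    # suffix just before suffix i in sorted order
            while i + h < n and j + h < n and s[i + h] == s[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h > 0:
                h -= 1             # the next suffix loses at most one matched character
        else:
            h = 0
    return lcp


text = "banana"
sa = sorted(range(len(text)), key=lambda i: text[i:])
lcp = build_lcp_kasai(text, sa)
print(lcp)                                        # [0, 1, 3, 0, 0, 2]
print(len(text) * (len(text) + 1) // 2 - sum(lcp))  # 15 distinct substrings
```

The second print applies the n(n+1)/2 − sum(LCP) formula for the number of distinct substrings.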
Time Complexity
Naive build: O(n² log n)—n suffixes, sort with O(n log n) comparisons, each comparison O(n). Efficient build: O(n log n) with doubling + sort, or O(n) with SA-IS (not covered here). Pattern search: O(m log n) with binary search (log n steps, each comparison O(m)).
Space Complexity
Suffix array: O(n). The naive build above materializes every suffix string, so it uses O(n²) temporary space in the worst case (total length n(n+1)/2). Sorting indices with key=lambda i: s[i:] does not help, because Python's sort computes and stores every key before it starts comparing. To keep extra space at O(n), sort the indices with a comparison function (functools.cmp_to_key) that compares characters of s in place, or use a doubling or SA-IS construction.
Edge Cases
- Empty string: Suffix array is [].
- Single character: SA = [0].
- Pattern longer than string: Return [] from search.
- Pattern empty: Usually return all indices 0..n−1 or handle per problem. The search implementation above returns [] for an empty pattern.
Common Mistakes
- Comparing suffixes incorrectly: use lexicographic order (same as Python's string comparison).
- In binary search, off-by-one in the "first suffix that has P as prefix" vs "first suffix that is > P" — the range of occurrences is [left, right] inclusive when left is first with prefix P and right is last with prefix P.
Naive: Sort suffixes as strings → O(n² log n). Better: Doubling algorithm: sort by first 1 char, then by first 2, 4, 8, … using ranks; each phase O(n) with radix sort → O(n log n). Optimal: SA-IS and others in O(n). For interviews, naive build is often acceptable when n is small; mention that production uses O(n log n) or O(n) construction.
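The doubling idea can be sketched as follows. This simplified version uses Python's comparison sort in each round, so it runs in O(n log² n) rather than the O(n log n) obtained with radix sort; the function name build_suffix_array_doubling is chosen for this example.

```python
def build_suffix_array_doubling(s):
    """Suffix array by prefix doubling; each round sorts by the first 2k characters."""
    n = len(s)
    if n == 0:
        return []
    rank = [ord(ch) for ch in s]   # rank by first character
    sa = list(range(n))
    k = 1
    while True:
        def key(i):
            # Rank of the first k chars, then rank of the next k chars (-1 if past the end).
            return (rank[i], rank[i + k] if i + k < n else -1)

        sa.sort(key=key)
        new_rank = [0] * n
        for idx in range(1, n):
            prev, cur = sa[idx - 1], sa[idx]
            # Equal keys share a rank; a different key starts the next rank.
            new_rank[cur] = new_rank[prev] + (key(prev) != key(cur))
        rank = new_rank
        if rank[sa[-1]] == n - 1:  # all ranks distinct: order is final
            break
        k *= 2
    return sa


print(build_suffix_array_doubling("banana"))  # [5, 3, 1, 0, 4, 2]
```

Each round only compares pairs of integers, so no suffix strings are ever materialized and the extra space stays O(n).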
Pattern Recognition
Use suffix array when you need many pattern searches on the same text (build once, query in O(m log n)), longest repeated substring, longest common substring of two strings (concatenate with separator, build SA and LCP, find max LCP between suffixes from different strings), or distinct substring count. For a single pattern search, KMP or Rabin-Karp is simpler.
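The longest-common-substring use case can be sketched as below. It assumes the separator (here "\x00", an arbitrary choice) occurs in neither input, and it uses a naive suffix sort and naive prefix comparison to stay short; the function name longest_common_substring is chosen for this example.

```python
def longest_common_substring(a, b, sep="\x00"):
    """Longest string that occurs in both a and b, via a suffix array of a + sep + b."""
    if sep in a or sep in b:
        raise ValueError("separator must not occur in the inputs")
    s = a + sep + b
    sa = sorted(range(len(s)), key=lambda i: s[i:])  # naive build, fine for small inputs
    best_len, best_start = 0, 0
    for k in range(1, len(sa)):
        i, j = sa[k - 1], sa[k]
        # Only adjacent suffixes that start in different strings are candidates.
        if (i < len(a)) == (j < len(a)):
            continue
        h = 0
        # The match cannot cross the separator, because sep appears exactly once in s.
        while i + h < len(s) and j + h < len(s) and s[i + h] == s[j + h]:
            h += 1
        if h > best_len:
            best_len, best_start = h, i
    return s[best_start:best_start + best_len]


print(longest_common_substring("banana", "ananas"))    # anana
print(longest_common_substring("abcdxyz", "xyzabcd"))  # abcd
```

For large inputs the naive sort and the per-pair comparison would be replaced by an efficient suffix array build and an LCP array.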
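In Python, sorting indices with key=lambda i: s[i:] reads more cleanly than storing (s[i:], i) pairs, but it does not save memory: sort calls the key function once per element and keeps all the resulting slices for the duration of the sort, so both versions hold O(n²) characters in the worst case. For a naive build on small n, either form is fine. For search, binary search with a comparator that compares s[SA[mid]:SA[mid]+m] with the pattern avoids building a list of all suffixes and keeps each comparison at O(m).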
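If memory is the concern, one option is to sort indices with a comparison function so that no suffix strings are stored. The sketch below uses functools.cmp_to_key; the function name build_suffix_array_inplace is chosen for this example, and the time complexity is still O(n² log n).

```python
from functools import cmp_to_key


def build_suffix_array_inplace(s):
    """Naive suffix array that compares characters of s directly (O(n) extra space)."""
    n = len(s)

    def compare(i, j):
        # Walk both suffixes until the first differing character.
        while i < n and j < n:
            if s[i] != s[j]:
                return -1 if s[i] < s[j] else 1
            i += 1
            j += 1
        # One suffix ran out: the shorter suffix is a prefix of the other and sorts first.
        if i == n and j == n:
            return 0
        return -1 if i == n else 1

    return sorted(range(n), key=cmp_to_key(compare))


print(build_suffix_array_inplace("banana"))  # [5, 3, 1, 0, 4, 2]
```

The comparator trades speed for memory: pure-Python character loops are slower than slice comparison, so this form mainly makes sense when the O(n²) slices would not fit.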
"A suffix array is the list of starting indices of all suffixes in sorted order. We can build it naively by sorting the suffixes in O(n² log n). To search for pattern P we binary search: find the range of suffixes that have P as prefix—O(m log n). With an LCP array we can get the longest repeated substring in O(n)." Implement naive build and one binary search; mention LCP for repeated substring.
Practice Problems
- Build suffix array (naive) and search for a pattern.
- Longest repeated substring: build SA and LCP, return substring for max LCP.
- Number of distinct substrings: n(n+1)/2 − sum(LCP).
- Longest common substring of two strings: S = A + '#' + B, build SA and LCP, find max LCP where the two suffixes come from A and B.
Summary
- Suffix array SA: starting indices of all suffixes in lexicographic order. SA[i] = index of the i-th smallest suffix.
- Naive build: sort suffixes → O(n² log n). Efficient: O(n log n) or O(n) with doubling/SA-IS.
- Pattern search: binary search for range of suffixes with P as prefix → O(m log n). Occurrences are SA[lo..hi].
- LCP array: longest common prefix of consecutive suffixes in SA order. Max LCP = length of longest repeated substring.
7.13 Suffix Tree
Introduction
A suffix tree of a string S is a trie (or compressed trie) that contains all suffixes of S. Each path from the root corresponds to a substring of S; each leaf is labeled with the starting index of the suffix that ends there. Once built, we can search for a pattern P in O(m) time by walking from the root along edges that match P; if we reach a node (or a point on an edge), all leaves in the subtree below give the occurrence indices. The suffix tree also supports longest repeated substring, longest common substring of two strings, and other problems. Naive construction inserts each of the n suffixes into a trie in O(n²) time and space; Ukkonen's algorithm builds the tree in O(n) time and space. This topic introduces the structure and naive build; efficient construction (Ukkonen) is often studied in advanced courses.
Real-World Analogy
Imagine a family tree where every path from the root spells a prefix of some suffix. The root has branches for each first letter of a suffix; each branch leads to more branches for the next character, and so on. When you reach a "leaf," you've read one full suffix and the leaf tells you where that suffix started in the original string. To find where "nan" appears, you walk: root → 'n' → 'a' → 'n'. The subtree under that point contains all the leaves (starting indices) where "nan" occurs. So one walk gives you the answer.
S = "banana". Suffixes: "banana"(0), "anana"(1), "nana"(2), "ana"(3), "na"(4), "a"(5). In the trie, root has edges for 'b', 'a', 'n'. The edge 'a' leads to a node that is itself the end of the suffix "a" (index 5) and has one child edge 'n', which continues to the ends of "ana" (suffix 3) and "anana" (suffix 1). Pattern "na": from root go to 'n', then 'a'; the suffix ends at or below that node are 4 ("na") and 2 ("nana"), so "na" occurs at indices 2 and 4. In a compressed suffix tree, edges are labeled with substrings (e.g. "nana") instead of single characters to reduce nodes.
Formal Definition
Suffix tree (uncompressed): A trie where each root-to-leaf path spells a suffix of S. Each leaf stores the starting index of that suffix. Internal nodes may have multiple children (branching). Compressed suffix tree: Edges are labeled with substrings (not single chars); any internal node (except possibly the root) has at least two children. This keeps the number of nodes O(n). Pattern search: Follow the path from the root that matches P character by character (or substring by substring). If we can match all of P, every leaf in the subtree at that point is an occurrence of P. Space: Uncompressed trie can have O(n²) nodes; compressed tree O(n) nodes and O(n) space with Ukkonen.
Why This Topic Matters
- O(m) pattern search: After O(n) build (Ukkonen), each pattern query is O(m)—faster than suffix array's O(m log n) binary search when many queries are needed.
- Longest repeated substring: Find the deepest internal node (with at least two leaf descendants); the path from root to that node spells the longest repeated substring.
- Longest common substring of two strings: Build suffix tree for A + '#' + B; find the deepest node that has leaves from both A and B (using a separator to distinguish).
- Foundation: Suffix trees generalize to multiple strings and are related to suffix arrays (the suffix array can be obtained by a DFS of the suffix tree).
Mental Model
Root = empty string. Each edge is labeled with one or more characters. Insert all suffixes: S[0:], S[1:], …, S[n−1:]. Shared prefixes share the same path. Leaves store the start index. To search P: start at root, follow edges that match P; if we consume all of P, collect all leaf indices in the current subtree. In a compressed tree, one edge might say "nana" so we skip four characters in one step.
Structure: Trie of Suffixes
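Start with an empty trie. For each starting index i from 0 to n−1, insert the suffix S[i:] into the trie. When inserting, follow existing edges that match; when no edge matches the next character, create a new edge and a new node. Store the start index i at the node where the suffix ends. Without a unique terminator character (such as '$') appended to S, a suffix that is a prefix of another suffix ends at an internal node rather than a true leaf (for "banana", the suffix "a" ends on the path of "ana"), which is why the implementation below marks the end node with a flag instead of relying on it having no children. Each marked node corresponds to exactly one suffix; each branching node (one with at least two children) represents a substring that is a prefix of at least two different suffixes.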
Python Implementation (Naive Build)
class SuffixTreeNode:
    def __init__(self):
        self.children = {}    # char -> SuffixTreeNode
        self.start = None     # for leaf: start index of suffix
        self.is_leaf = False

def build_suffix_tree_naive(s):
    """Build uncompressed suffix tree (trie of suffixes). O(n^2) time and space."""
    root = SuffixTreeNode()
    n = len(s)
    for i in range(n):
        node = root
        for j in range(i, n):
            c = s[j]
            if c not in node.children:
                node.children[c] = SuffixTreeNode()
            node = node.children[c]
        node.is_leaf = True
        node.start = i
    return root

def suffix_tree_search(root, s, pattern):
    """Return list of start indices where pattern occurs in s."""
    node = root
    for c in pattern:
        if c not in node.children:
            return []
        node = node.children[c]
    result = []

    def collect_leaves(n):
        if n.is_leaf:
            result.append(n.start)
        for child in n.children.values():
            collect_leaves(child)

    collect_leaves(node)
    return sorted(result)
Pattern Search
Walk from the root following the first character of P, then the second, and so on. If at any step the required character is not on any edge, P does not occur in S—return []. If we finish reading P, we are at a node (or in the middle of an edge in a compressed tree). All leaves in the subtree rooted at that point are the starting indices of suffixes that have P as a prefix—i.e. occurrences of P. Collect them with a DFS. Time: O(m) to walk + O(k) to collect k leaves. If we only need to check existence, we can stop after the walk—O(m).
Longest Repeated Substring (Concept)
In the suffix tree, the longest repeated substring is the string spelled by the path from the root to the deepest internal node that has at least two leaf descendants (i.e. the substring appears at least twice). So we can do a DFS, compute the depth of each node, and among nodes with ≥2 leaves in the subtree, take the one with maximum depth. The path label from root to that node is the longest repeated substring.
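On the naive trie of suffixes, the same idea can be sketched without a separate DFS: count how many suffixes pass through each node while inserting, and track the deepest node whose count reaches 2. The names CountNode and longest_repeated_substring_trie are chosen for this example, and the approach is O(n²) in time and space like the naive build.

```python
class CountNode:
    __slots__ = ("children", "count")

    def __init__(self):
        self.children = {}
        self.count = 0  # number of suffixes whose path passes through this node


def longest_repeated_substring_trie(s):
    root = CountNode()
    best_len, best_start = 0, 0
    n = len(s)
    for i in range(n):
        node = root
        for j in range(i, n):
            node = node.children.setdefault(s[j], CountNode())
            node.count += 1
            depth = j - i + 1
            # count >= 2 means the path s[i:j+1] is a prefix of at least two suffixes,
            # i.e. it occurs at least twice in s.
            if node.count >= 2 and depth > best_len:
                best_len, best_start = depth, i
    return s[best_start:best_start + best_len]


print(longest_repeated_substring_trie("banana"))  # ana
print(longest_repeated_substring_trie("aaaa"))    # aaa
```

The per-node count equals the number of suffix ends in that node's subtree, so "count ≥ 2" is the same condition as "at least two leaf descendants" in the description above.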
Time Complexity
Naive build: n suffixes, each of length up to n; each insertion may traverse and create O(n) nodes. Total O(n²) time and O(n²) space in the worst case (e.g. all characters distinct). Ukkonen: O(n) time and O(n) space for the compressed tree. Pattern search: O(m) to walk + O(k) to collect k occurrences. So O(m + k) per query.
Space Complexity
Uncompressed: O(n²) worst case. Compressed (Ukkonen): O(n) nodes and edges. Each edge stores a substring reference (start, end indices into S) so O(1) per edge.
Edge Cases
- Empty string: Tree has only root (no suffixes to insert).
- Single character: One suffix; one leaf.
- Pattern not in string: Walk fails at some character → return [].
- All same character "aaa": Heavy sharing; the naive trie is a single path of n nodes, and every node on that path marks the end of one suffix (a compressed tree without a terminator would collapse this into one long edge).
Suffix Tree vs Suffix Array
| Aspect | Suffix Tree | Suffix Array |
|---|---|---|
| Build (efficient) | O(n) Ukkonen | O(n log n) or O(n) |
| Pattern search | O(m + k) | O(m log n) |
| Space | O(n) compressed | O(n) |
| Implementation | Complex (Ukkonen) | Simpler |
Naive suffix tree: Insert n suffixes into a trie → O(n²) time and space. Ukkonen's algorithm: Builds the tree in a single left-to-right pass with suffix links and active point, achieving O(n). For interviews, the naive trie of suffixes is often enough to convey the idea; mention Ukkonen for linear-time build. In practice, suffix arrays + LCP are often preferred for simplicity and cache efficiency.
Pattern Recognition
Use suffix tree when you need very fast pattern search (O(m) per query after O(n) build), longest repeated substring (deepest branching node), or longest common substring of two strings (build for A#B, find deepest node with leaves from both). For most problems, suffix array + LCP is simpler to implement and sufficient.
A compressed suffix tree uses edge labels as (start, end) indices into S instead of copying substrings, so space stays O(n). Ukkonen builds it online. If you only need pattern search and n is moderate, the naive trie is acceptable; for production or large n, use a suffix array or a library that implements Ukkonen.
"A suffix tree is a trie of all suffixes. Each leaf stores the start index. To search for P we walk from the root following P; if we reach a node, all leaves in the subtree are the occurrences—O(m) search. Naive build is O(n²) by inserting each suffix; Ukkonen's algorithm builds in O(n). The longest repeated substring is the path label of the deepest internal node with at least two leaf descendants." Implement the naive trie build and search; mention Ukkonen for linear time.
Practice Problems
- Build a suffix tree (naive trie) and implement pattern search.
- Longest repeated substring using suffix tree (deepest branching node).
- Check if pattern P occurs in S using the suffix tree (O(m) after build).
Summary
- Suffix tree: Trie (or compressed trie) of all suffixes; leaves store start indices. Path from root spells a substring.
- Pattern search: Walk from root following P; collect all leaf indices in the subtree → O(m + k).
- Naive build: Insert n suffixes → O(n²) time and space. Ukkonen: O(n) time and space.
- Longest repeated substring: Deepest internal node with ≥2 leaf descendants. Suffix tree supports many string problems; suffix array + LCP are often simpler in practice.
7.14 Aho-Corasick Algorithm
Introduction
The Aho-Corasick algorithm solves multi-pattern string matching: given a text T and a set of patterns {P₁, P₂, …, Pₖ}, find all occurrences of any pattern in T in a single pass over the text. Instead of running KMP (or Rabin-Karp) k times—once per pattern—we build a trie of all patterns and add failure links (like the KMP failure function, but on the trie): from each node, the failure link points to the longest proper suffix of the current path that is also a prefix of some pattern. Then we scan T once, at each character following the trie (or the failure link when there is no matching edge), and at each node we report any pattern that ends there. Total time is O(n + m + z), where n = |T|, m = sum of pattern lengths, and z = total number of matches.
Real-World Analogy
Imagine a single pass through a document with a "dictionary" of many keywords. At each position you're in a "state" (a node in the trie). You try to extend the state by reading the next character; if you can't, you fall back to a shorter matching state (the failure link) and try again, without moving the text pointer back. So you never backtrack in the text—one forward scan—and whenever your state corresponds to a full keyword, you report it. It's like KMP for one pattern, but the "failure" can jump to a state that matches a different pattern's prefix.
Patterns: "he", "she", "his", "hers". Text: "she sells seashells". Build trie: root → 's' → 'h' → 'e' (match "she"); root → 'h' → 'e' (match "he"); from that 'h' node, 'i' → 's' (match "his"); from the "he" node, 'r' → 's' (match "hers"). Failure links: e.g. after "sh" we might fail on the next char; the longest proper suffix of "sh" that is a prefix of some pattern is "h" (prefix of "he", "his", "hers"). So from the "sh" node we have a failure link to the "h" node. Scan: s→h→e matches "she" at index 0, and because the failure link of the "she" node points to the "he" node, "he" is reported at index 1 in the same step. The same pair is reported again inside "seashells", at indices 13 and 14.
Formal Definition
Aho-Corasick automaton: (1) Trie: One node per prefix of any pattern; edges are characters; nodes that correspond to the end of a pattern store the pattern ID(s). (2) Failure link: From node u (path spells string S), failure[u] = the node v such that the path from root to v is the longest proper suffix of S that is also a prefix of some pattern. (3) Output link (optional): From u, follow failure links until we hit a node that ends a pattern; we can precompute "which patterns end at u or at any node reachable by failure from u" to report all matches quickly. Search: Start at root; for each character c of T, while current node has no edge c, go to failure[node]; then take the edge c (or stay at root if still no edge). At each node, report patterns ending there.
Why This Topic Matters
- Multi-pattern in one pass: KMP does one pattern per pass; Aho-Corasick does all patterns in O(n + m + z). Essential when you have a fixed set of keywords and many texts (e.g. spam filter, intrusion detection, bioinformatics).
- No backtrack in text: Like KMP, the text pointer never goes backward. So we can stream the text.
- Interview and contests: "Find all occurrences of any of these k patterns" is the classic use case; mentioning Aho-Corasick shows you know beyond single-pattern KMP.
Mental Model
Trie of patterns: each path from root spells a prefix of some pattern; mark nodes where a pattern ends. Failure link: from each node, "if the next character doesn't match, what's the longest suffix of my path that could still match something?"—that suffix is a prefix of some pattern, so we have a node for it. Search: read T character by character; at each step, follow the trie (or failure) so that the current node always represents the longest match of some pattern prefix ending at the current position. When that node (or any node reachable via failure) ends a pattern, report it.
Building the Automaton
Step 1: Build the Trie
Start with root. For each pattern, insert it into the trie (like a normal trie). At the node reached after the last character, mark that pattern ID (and length, or the pattern itself) so we know which pattern(s) end there.
Step 2: Build Failure Links (BFS)
Root's failure is root (or null). For each node u at depth 1 (children of root), failure[u] = root. For nodes at depth > 1: let u be reached by edge c from parent p. Set w = failure[p]. While w ≠ root and w has no edge c, set w = failure[w]. Then failure[u] = w.child[c] if that exists, else root. This computes the longest proper suffix of the path to u that is a prefix of some pattern.
Python Implementation (Simplified)
from collections import deque

class AhoNode:
    def __init__(self):
        self.children = {}
        self.fail = None
        self.output = []  # pattern indices ending at this node

def build_aho_corasick(patterns):
    root = AhoNode()
    for i, p in enumerate(patterns):
        node = root
        for c in p:
            if c not in node.children:
                node.children[c] = AhoNode()
            node = node.children[c]
        node.output.append(i)
    root.fail = root
    q = deque()
    for c, child in root.children.items():
        child.fail = root
        q.append(child)
    while q:
        node = q.popleft()
        for c, child in node.children.items():
            q.append(child)
            fail = node.fail
            while fail is not root and c not in fail.children:
                fail = fail.fail
            child.fail = fail.children.get(c, root)
            child.output += child.fail.output
    return root

def aho_search(text, root, patterns):
    """Return list of (pattern_index, start_position_in_text) for each match."""
    result = []
    node = root
    for i, c in enumerate(text):
        while node is not root and c not in node.children:
            node = node.fail
        node = node.children.get(c, root)
        for pat_id in node.output:
            start = i - len(patterns[pat_id]) + 1
            result.append((pat_id, start))
    return result
Line-by-Line: Failure and Output
- child.fail = fail.children.get(c, root): Starting from the failure link of the parent (node.fail), we follow failure links until we find a node that has an edge for c, or we reach root. The child's failure is the node reached by that edge (or root if there is none).
- child.output += child.fail.output: Any pattern that ends at the failure node also "ends" at the current node in the sense that we should report it when we reach the current node (because the failure path is a suffix of the current path). So we merge output lists so that at each node we report all patterns that end at this node or at any node on the failure chain.
- In search: when we can't follow an edge, we go to failure until we can or we're at root. Then we take the edge (or stay at root). We report all patterns in node.output at each step.
Time Complexity
Build trie: O(m) where m = total length of all patterns. Build failure links: BFS over nodes; each node we may follow failure links a bounded number of times (amortized analysis: total failures followed is O(m)). So O(m). Search: n characters; at each character we may follow failure links (each follow moves us to a strictly shorter path, so total over the whole search is O(n)) and then one transition. Reporting z matches is O(z). Total O(n + m + z).
Space Complexity
Trie has O(m) nodes (each character of each pattern creates at most one node). Failure and output pointers: O(m). So O(m).
Edge Cases
- Empty pattern: Skip or treat as matching everywhere; handle per problem.
- One pattern: Aho-Corasick reduces to KMP-like behavior (trie is a path + failure links).
- Pattern a prefix of another: The shorter pattern's end node is an ancestor of the longer's; output lists and failure chain ensure both are reported.
Common Mistakes
- Forgetting to merge output from failure node into the current node—then you only report patterns that end exactly at this node and miss patterns that end at a suffix (e.g. "he" inside "she").
- In the search loop, moving the text pointer when following failure—we should not advance i when we follow failure; we only advance when we consume a character from the text.
When at a node that doesn't have an edge for the current character, we follow the failure link and do not advance the text index. We only advance the text index when we actually consume a character (take an edge). So the loop is: while no edge and not root, go to failure; then if there is an edge, take it and advance i; else (at root with no edge) just advance i.
Aho-Corasick vs KMP vs Rabin-Karp
| Problem | Best choice |
|---|---|
| Single pattern | KMP or Rabin-Karp O(n + m) |
| Multiple patterns, one text | Aho-Corasick O(n + m + z) |
| Multiple patterns of equal length, hashing OK | Rabin-Karp: set of hashes, one pass, O(n) expected |
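For k patterns, running KMP k times costs O(k·n + m) in total (O(n + mᵢ) for pattern i), which is dominated by O(k·n) when patterns are short. Aho-Corasick does one pass over the text plus an O(m) build, so O(n + m + z), which is better when k is large. Rabin-Karp with a set of pattern hashes also does one pass, but only when all patterns share one length (otherwise it needs one rolling window per distinct length); Aho-Corasick is deterministic and doesn't need verification (no hash collisions).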
Pattern Recognition
Use Aho-Corasick when you have multiple patterns and one text (or many texts with the same pattern set). Keywords: "find any of these keywords," "multi-pattern search," "dictionary matching." For a single pattern, use KMP or Rabin-Karp.
Precompute output lists so that at each node we store all pattern IDs that end at this node or at any node reachable by following failure links. Then during search we only need to iterate node.output at each step—no need to follow the failure chain to collect matches. Build failure links with BFS (level order) so that when we compute failure for a node at depth d, all nodes at depth < d already have their failure set.
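An alternative to merging output lists is the output link mentioned in the formal definition: each node stores only the patterns that end exactly there, plus a pointer to the nearest node on its failure chain that ends a pattern. The sketch below uses parallel lists indexed by node id instead of node objects; the names build_automaton, find_all, ends, and out_link are chosen for this example.

```python
from collections import deque


def build_automaton(patterns):
    """Trie + failure links + output links, stored as parallel lists (node 0 is the root)."""
    children, fail, ends, out_link = [{}], [0], [[]], [0]
    for pid, pattern in enumerate(patterns):
        node = 0
        for ch in pattern:
            if ch not in children[node]:
                children[node][ch] = len(children)
                children.append({})
                fail.append(0)
                ends.append([])
                out_link.append(0)
            node = children[node][ch]
        ends[node].append(pid)      # pattern pid ends exactly at this node
    queue = deque(children[0].values())  # depth-1 nodes keep fail = 0 (root)
    while queue:
        u = queue.popleft()
        for ch, v in children[u].items():
            w = fail[u]
            while w and ch not in children[w]:
                w = fail[w]
            fail[v] = children[w].get(ch, 0)
            # Nearest node on the failure chain that ends a pattern (0 means none).
            out_link[v] = fail[v] if ends[fail[v]] else out_link[fail[v]]
            queue.append(v)
    return children, fail, ends, out_link


def find_all(text, patterns):
    """Return (pattern_index, start_index) for every occurrence of a non-empty pattern."""
    children, fail, ends, out_link = build_automaton(patterns)
    matches, node = [], 0
    for i, ch in enumerate(text):
        while node and ch not in children[node]:
            node = fail[node]       # fall back without moving the text index
        node = children[node].get(ch, 0)
        v = node if ends[node] else out_link[node]
        while v:                    # visit only nodes that actually end a pattern
            for pid in ends[v]:
                matches.append((pid, i - len(patterns[pid]) + 1))
            v = out_link[v]
    return matches


print(find_all("ushers", ["he", "she", "his", "hers"]))  # [(1, 1), (0, 2), (3, 2)]
```

Compared with merged lists, output links store each pattern id once, at the cost of a short chain walk when reporting; the walk only touches nodes that produce a match, so reporting stays O(z) overall.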
"For multiple patterns we build a trie of all patterns and add failure links: from each node, the failure points to the longest proper suffix of the current path that is also a prefix of some pattern—like KMP on the trie. Then we scan the text once: at each character we follow the trie or failure, and at each node we report patterns in the output list. Time O(n + m + z)." Implement trie build and failure BFS; mention output list merge from failure node.
Practice Problems
- Find all occurrences of any of k patterns in a text (Aho-Corasick).
- Keyword matching: given a list of forbidden words and a document, find all positions where any forbidden word appears.
- Multi-pattern search with pattern IDs: return (pattern_id, start_index) for each match.
Summary
- Aho-Corasick: Multi-pattern matching in one pass. Trie of patterns + failure links (longest proper suffix that is a prefix of some pattern) + output lists.
- Build: Trie O(m), failure links via BFS O(m). Search: one pass over text O(n + z). Total O(n + m + z).
- At each node, output = patterns ending at this node plus those at nodes on the failure chain. Do not advance text index when following failure.
- Use when k patterns and one text; for single pattern, KMP or Rabin-Karp is simpler.
7.15 Longest Repeating Substring
Introduction
The longest repeating substring problem asks: given a string S, find the longest substring that appears at least twice in S (the two occurrences may overlap). For example, in "banana" the substring "ana" appears at indices 1 and 3; "anana" would be longer but only appears once; so "ana" (length 3) is one valid answer. This problem ties together suffix structures (7.12, 7.13) and hashing (7.6, 7.7): we can solve it with suffix array + LCP (max LCP value gives the length; the substring is at SA[i] for that i), with a suffix tree (deepest internal node with ≥2 leaves), or with binary search on length + rolling hash (for a fixed length L, check if any length-L substring appears twice using a set of hashes). Each approach has different time/space trade-offs; we cover the main ones here.
Real-World Analogy
Imagine a long paragraph. You want to find the longest phrase that appears more than once—perhaps it's a repeated slogan or a copy-paste. You could list every possible phrase and see which repeats (too slow), or you could use structure: sort all suffixes and notice that when two suffixes share a long common prefix, that prefix is a repeating substring. The "longest common prefix" between consecutive sorted suffixes (the LCP array) directly tells you the longest such repeat.
S = "banana". Suffixes sorted: "a"(5), "ana"(3), "anana"(1), "banana"(0), "na"(4), "nana"(2). LCP: "ana" and "anana" share "ana" (length 3); "anana" and "banana" share "" (0); etc. Max LCP = 3, and the substring is "ana" (e.g. at SA[1]=3, so S[3:6]="ana"). So longest repeating substring = "ana". In "aaaa" the longest repeating substring is "aaa" (appears at 0 and 1).
Formal Definition
Longest repeating substring: A substring T of S that appears at least twice (i.e. there exist indices i ≠ j such that S[i : i+|T|] = S[j : j+|T|] = T, using half-open slices) and |T| is maximum. Overlapping is allowed (e.g. "aaa" in "aaaa" has occurrences 0 and 1). Suffix array + LCP: After sorting suffixes, LCP[i] = length of common prefix of S[SA[i]:] and S[SA[i−1]:]. Any common prefix of two suffixes is a repeating substring (it appears at SA[i] and SA[i−1]). So max LCP over i is the length of the longest repeating substring; the substring itself is S[SA[i]: SA[i] + LCP[i]] for an i that achieves the max. Hash approach: For length L, use rolling hash to get hashes of all length-L substrings; if any hash appears at least twice, some substring of length L repeats (up to hash collisions). Binary search on L to find the maximum such L.
Why This Topic Matters
- Classic string problem: Appears in interviews and contests; combines suffix array (7.12), LCP, or hashing (7.6–7.7).
- Application: Plagiarism detection, data compression (repeated blocks), bioinformatics (repeated sequences).
- Unifies earlier topics: You can solve with suffix array + LCP (from 7.12), suffix tree (7.13), or binary search + rolling hash (7.6, 7.7).
Approach 1: Suffix Array + LCP
Build the suffix array (e.g. naive sort of suffixes). Build the LCP array: LCP[i] = longest common prefix of S[SA[i]:] and S[SA[i−1]:]. Then max_len = max(LCP), and the substring is S[SA[i]: SA[i] + max_len] for any i where LCP[i] = max_len. If max_len is 0, there is no repeating substring of length ≥ 1 (all characters distinct).
def longest_repeating_suffix_array(s):
    n = len(s)
    if n < 2:
        return ""
    suffixes = [(s[i:], i) for i in range(n)]
    suffixes.sort(key=lambda x: x[0])
    sa = [idx for _, idx in suffixes]
    lcp = [0] * n
    for i in range(1, n):
        a, b = s[sa[i]:], s[sa[i-1]:]
        j = 0
        while j < len(a) and j < len(b) and a[j] == b[j]:
            j += 1
        lcp[i] = j
    max_len = max(lcp)
    if max_len == 0:
        return ""
    i = lcp.index(max_len)
    return s[sa[i]:sa[i] + max_len]
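Time: O(n² log n) for naive SA + O(n²) for naive LCP (each LCP[i] can be O(n)). With O(n log n) SA build and O(n) LCP build, total O(n log n). Space: O(n) for the SA and LCP arrays; the naive version above also materializes every suffix string, which takes O(n²) space.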
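The O(n) LCP build mentioned above is commonly done with Kasai's algorithm. The following is a minimal sketch, assuming the suffix array is already available; here it is still built by a naive sort, so the sort dominates the total cost. The function names are illustrative and not part of the original text.

```python
def build_lcp_kasai(s, sa):
    """LCP array in O(n) (Kasai's algorithm).

    lcp[i] = length of the common prefix of the suffixes starting at
    sa[i-1] and sa[i]; lcp[0] is 0 by convention.
    """
    n = len(s)
    rank = [0] * n                      # rank[start] = position of that suffix in sa
    for pos, start in enumerate(sa):
        rank[start] = pos
    lcp = [0] * n
    h = 0                               # current matched length, reused across iterations
    for i in range(n):                  # visit suffixes in text order
        if rank[i] > 0:
            j = sa[rank[i] - 1]         # suffix just before suffix i in sorted order
            while i + h < n and j + h < n and s[i + h] == s[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h > 0:
                h -= 1                  # the next suffix loses at most one matched character
        else:
            h = 0
    return lcp


def longest_repeating_with_kasai(s):
    n = len(s)
    if n < 2:
        return ""
    sa = sorted(range(n), key=lambda i: s[i:])   # naive suffix array (assumption: small inputs)
    lcp = build_lcp_kasai(s, sa)
    best = max(range(n), key=lambda i: lcp[i])
    return s[sa[best]:sa[best] + lcp[best]]
```

Because h only drops by one per step and never exceeds n, the inner while loop does O(n) work in total across all iterations.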
Approach 2: Binary Search + Rolling Hash
Binary search on the length L (from 1 to n−1). For a fixed L, compute the hash of every length-L substring using a rolling hash (7.7). Store hashes in a set (or dict: hash → list of start indices). If any hash appears at least twice, then some substring of length L repeats—so we can try larger L. Otherwise try smaller L. Return the maximum L for which a repeat exists, and optionally the substring (e.g. by storing one start index per hash and checking for a second occurrence). To avoid collisions, use double hashing or verify with a direct comparison when two hashes match.
def has_repeating_substring(s, L, B=31, M=10**9+7):
    """True if some length-L substring appears at least twice."""
    n = len(s)
    if L > n or L <= 0:
        return False
    seen = {}  # hash -> list of start indices with that hash
    base_pow = pow(B, L - 1, M)
    h = 0
    for i in range(L):
        h = (h * B + ord(s[i])) % M
    seen[h] = [0]
    for i in range(1, n - L + 1):
        # Slide the window: drop s[i-1], append s[i+L-1].
        # Python's % already returns a non-negative result.
        h = (h - ord(s[i-1]) * base_pow) % M
        h = (h * B + ord(s[i + L - 1])) % M
        if h in seen:
            # Verify with a direct comparison to rule out collisions.
            for start in seen[h]:
                if s[start:start+L] == s[i:i+L]:
                    return True
            seen[h].append(i)
        else:
            seen[h] = [i]
    return False
def longest_repeating_binary_search(s):
    n = len(s)
    if n < 2:
        return ""
    lo, hi = 1, n - 1
    best_len = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        if has_repeating_substring(s, mid):
            best_len = mid
            lo = mid + 1
        else:
            hi = mid - 1
    if best_len == 0:
        return ""
    for i in range(n - best_len + 1):
        sub = s[i:i+best_len]
        # str.count skips overlapping matches (it would miss "aaa" in "aaaa"),
        # so look for a later, possibly overlapping, occurrence with find.
        if s.find(sub, i + 1) != -1:
            return sub
    return ""
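Time: O(log n) binary search steps; each step is O(n) expected with a rolling hash, so O(n log n) expected overall. Verification costs O(L) per hash match, which can degrade toward O(n² log n) in the worst case (many hash matches that turn out to be collisions). The simple recovery loop at the end adds up to O(n²); storing a start index during the hash check avoids it. With double hashing we can often skip verification, accepting a very small collision probability. Space: O(n) for the set/dict.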
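The double-hashing variant mentioned above can be sketched as follows. This is an illustrative version under stated assumptions: the two bases and moduli are arbitrary choices, the function names are hypothetical, and matches are not verified, so the result is correct only with very high probability rather than with certainty.

```python
def find_repeat_double_hash(s, length, b1=131, m1=10**9 + 7, b2=137, m2=998_244_353):
    """Return the start of a length-`length` substring that occurs again later, or -1.

    Two independent rolling hashes form the key; no direct comparison is done.
    """
    n = len(s)
    if length <= 0 or length > n:
        return -1
    p1 = pow(b1, length - 1, m1)        # weight of the outgoing character, hash 1
    p2 = pow(b2, length - 1, m2)        # weight of the outgoing character, hash 2
    h1 = h2 = 0
    for ch in s[:length]:
        h1 = (h1 * b1 + ord(ch)) % m1
        h2 = (h2 * b2 + ord(ch)) % m2
    seen = {(h1, h2): 0}                # (hash1, hash2) -> first start index
    for i in range(1, n - length + 1):
        out_ch, in_ch = ord(s[i - 1]), ord(s[i + length - 1])
        h1 = ((h1 - out_ch * p1) * b1 + in_ch) % m1
        h2 = ((h2 - out_ch * p2) * b2 + in_ch) % m2
        key = (h1, h2)
        if key in seen:
            return seen[key]
        seen[key] = i
    return -1


def longest_repeating_double_hash(s):
    lo, hi = 1, len(s) - 1
    best_start, best_len = 0, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        start = find_repeat_double_hash(s, mid)
        if start != -1:                 # a repeat of length mid exists, try longer
            best_start, best_len = start, mid
            lo = mid + 1
        else:                           # no repeat of length mid, try shorter
            hi = mid - 1
    return s[best_start:best_start + best_len]
```

Returning the start index from the check means the answer can be sliced directly after the binary search, with no extra recovery pass. The binary search is valid because a repeat of length L implies a repeat of every shorter length (take a prefix).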
Approach 3: Suffix Tree (Concept)
Build the suffix tree (7.13). The longest repeating substring is the path label of the deepest internal node that has at least two leaf descendants (i.e. the substring appears at least twice). Depth = number of characters from root to that node. With Ukkonen's algorithm the tree is built in O(n); then a DFS to find the deepest such node is O(n). So total O(n).
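Ukkonen's algorithm is too long to show here, but the "deepest node shared by at least two suffixes" idea can be illustrated with an uncompressed suffix trie. This sketch is an assumption-laden stand-in, not a suffix tree: it takes O(n²) time and space, and the function name is hypothetical.

```python
def longest_repeating_suffix_trie(s):
    """Naive suffix trie: the deepest node reached by >= 2 suffixes spells the answer."""
    root = {"children": {}, "count": 0}
    best_depth, best_end = 0, 0         # answer is s[best_end - best_depth:best_end]
    for start in range(len(s)):         # insert suffix s[start:]
        node = root
        for pos in range(start, len(s)):
            ch = s[pos]
            child = node["children"].get(ch)
            if child is None:
                child = {"children": {}, "count": 0}
                node["children"][ch] = child
            child["count"] += 1         # number of suffixes passing through this node
            node = child
            depth = pos - start + 1
            if child["count"] >= 2 and depth > best_depth:
                best_depth, best_end = depth, pos + 1
    return s[best_end - best_depth:best_end]
```

A node with count ≥ 2 is a prefix of two different suffixes, so its path label occurs at two positions. A real suffix tree compresses chains of single-child nodes into edges, which is what brings the size down to O(n).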
Comparison of Approaches
| Approach | Time | Note |
|---|---|---|
| Suffix array + LCP (naive) | O(n² log n) | Simple; efficient SA gives O(n log n) |
| Binary search + hash | O(n log n) | No suffix structure; verify to avoid collisions |
| Suffix tree (Ukkonen) | O(n) | Optimal; implementation complex |
Brute force: For each length L from n−1 down to 1, check all O(n) substrings of length L and see if any appears twice—O(n³) or O(n²) with hashing per L. Better: Suffix array + LCP gives the answer in one pass over the LCP array after building SA. Alternative: Binary search on L + rolling hash: for each L we do one pass O(n); log n values of L → O(n log n). Suffix tree gives O(n) with Ukkonen.
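The brute force described above is worth having as a reference implementation for testing the faster approaches on small inputs. A minimal sketch (the function name is illustrative):

```python
def longest_repeating_brute_force(s):
    """Try lengths from longest to shortest; return the first substring seen twice."""
    n = len(s)
    for length in range(n - 1, 0, -1):
        seen = set()
        for i in range(n - length + 1):
            sub = s[i:i + length]       # O(length) to slice and hash
            if sub in seen:
                return sub              # overlapping occurrences count
            seen.add(sub)
    return ""
```

Each of the O(n) lengths scans O(n) substrings and pays O(n) per slice, so the worst case is roughly O(n³) character operations.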
Edge Cases
- All characters distinct: No repeating substring of length ≥ 1; return "".
- String length < 2: No repeat possible; return "".
- Repeated pattern: e.g. "abab". The longest repeating substring is "ab" (length 2), at indices 0 and 2. The whole string "abab" occurs only once, so it does not qualify. For "aaaa", "aaa" repeats (at 0 and 1, overlapping).
Common Mistakes
- Confusing "longest repeating" with "longest substring that appears exactly twice"—the problem usually means "at least twice," so "aaa" in "aaaa" is valid.
- In LCP approach, returning the substring from the wrong index: use SA[i] (the start of the suffix at position i in the sorted order), and length LCP[i].
- In hash approach, not handling collisions: two different substrings can have the same hash; verify with direct comparison or use double hashing.
For interviews, the suffix array + LCP approach is the most direct: "Sort the suffixes, compute LCP; the maximum LCP value is the length of the longest repeating substring; the substring is S[SA[i]: SA[i]+LCP[i]] for that i." If the interviewer wants O(n log n) without building a full suffix array, binary search + rolling hash is a good alternative.
"We can build the suffix array and LCP array. The longest repeating substring has length max(LCP), and we get the actual substring from S[SA[i]: SA[i]+LCP[i]] for an i that achieves the max. Alternatively, binary search on the length L and for each L use a rolling hash to check if any length-L substring appears twice—O(n log n)." Implement one approach; mention the other and suffix tree for O(n).
Practice Problems
- Longest Repeating Substring (LeetCode 1062): return the length of the longest substring that appears at least twice.
- Longest Duplicate Substring (LeetCode 1044): same problem, but return the substring itself; often solved with binary search + hashing or suffix array.
Summary
- Longest repeating substring = longest substring that appears at least twice (overlap allowed).
- Suffix array + LCP: max(LCP) = length; substring = S[SA[i]: SA[i]+LCP[i]] for i with LCP[i] = max. Build SA (e.g. O(n² log n) naive) and LCP.
- Binary search + hash: Binary search on length L; for each L, rolling hash + set to see if any hash repeats; verify to avoid collisions. O(n log n).
- Suffix tree: Deepest internal node with ≥2 leaves; O(n) with Ukkonen.
8.1 Singly Linked List
Introduction
An array stores elements in contiguous memory: you can jump to any index in O(1), but inserting or deleting in the middle (or at the front) can cost O(n) because you must shift elements. A singly linked list is a different way to represent a sequence: each element lives in a node that holds a value and a pointer (reference) to the next node. There is no random access by index—you must walk from the head—but insertion and deletion at the front (or at a known node) can be done in O(1). This tradeoff (no random access vs cheap front/middle updates) is why linked lists appear everywhere: in low-level memory allocators, in LRU caches, in graph adjacency lists, and in countless interview problems.
In this section we build a singly linked list from scratch in Python: the node class, the list class, and the core operations—traverse, insert at head/tail, delete, search. You will see exactly why some operations are O(1) and others O(n), and how to avoid the most common bugs (losing references, off-by-one, empty list). Master this and you are ready for reverse list, cycle detection, merge lists, and LRU cache.
Real-World Analogy
Imagine a treasure hunt where each clue card says: “Your next clue is at the red mailbox.” You start at the first card (the head), read it, follow to the next card, and so on until one card says “You’re done” (no next—that’s null). You cannot jump to “the 5th card” without walking through the first four. Adding a new first clue is easy: write a new card that points to the old first card and call that the new head. Removing the first clue is easy: the new head is whatever the first card pointed to. That’s a singly linked list: each node points only forward; to go backward you’d need a doubly linked list (Section 8.2).
Browser “Back” and “Forward” buttons: the history of visited pages can be modeled as a list. If we only move forward (clicking links), a singly linked list is enough: each page holds a reference to the “next” page. When you insert a new visit (new page), you make it the new head and point it to the previous head—O(1). No need to shift a big array.
Formal Definition
Singly linked list: A linear data structure consisting of nodes. Each node contains (1) a value (data) and (2) a next reference (pointer) to the next node, or None if it is the last node. The list is accessed via a head reference to the first node. There is no direct access by index; traversal is sequential from head to the node whose next is None. The number of nodes is the length of the list.
We do not store “where is the 3rd element” in one step—we must follow head → next → next. So access by index is O(k) for the k-th element. Insertion after a given node (or at head) is O(1) if we already have a reference to that node; insertion at tail is O(1) if we keep a tail pointer, otherwise O(n). Deletion of a node is O(1) if we have a reference to the node before it (so we can rewire before.next); otherwise we may need O(n) to find that predecessor.
Why This Topic Matters
- Foundation for linked structures: Doubly linked lists, circular lists, and many graph/tree representations build on the same “node + pointer” idea. If you understand a singly linked list, you understand the pattern.
- Interview staple: Reverse a list, detect cycle, merge two sorted lists, remove Nth from end, reorder list—all assume you can traverse and mutate pointers confidently.
- Real systems: Free lists in memory allocators, buckets in hash tables (chaining), and LRU caches often use linked lists for O(1) front insertion and removal.
Mental Model
Picture the list as a chain of boxes. Each box has two slots: one for data, one for “next box.” The first box is the head. The last box’s “next” slot is empty (None). To traverse: start at head, look at data, then move to the box in “next”; repeat until “next” is empty. To insert at front: create a new box whose “next” is the current head, then set head to the new box. Never “lose” the head or the rest of the chain—always update pointers in an order that doesn’t drop references.
Step-by-Step: Node and List Structure
A node is the unit of the list. In Python we use a class with data (or val) and next. The list itself is represented by a head reference; optionally we keep a tail and a length for O(1) tail operations and O(1) size.
1. Define the Node
class Node:
    def __init__(self, data):
        self.data = data
        self.next = None
Every node holds one value and a link to the next. None means “no next node.”
2. Define the List (Head + Optional Tail and Length)
class LinkedList:
    def __init__(self):
        self.head = None  # empty list
        # Optional: self.tail = None, self.length = 0
An empty list is one where head is None. We can add tail for O(1) append and length for O(1) size.
ASCII Diagram: Structure and Traversal
Empty list: head → None
List [10, 20, 30]:
 head
   │
   ▼
┌─────┬─────┐    ┌─────┬─────┐    ┌─────┬─────┐
│ 10  │  ●──┼───►│ 20  │  ●──┼───►│ 30  │ None│
└─────┴─────┘    └─────┴─────┘    └─────┴─────┘
    node1            node2            node3
(data, next)     (data, next)     (data, next)
Traversal: curr = head → curr = curr.next → curr = curr.next → curr is None (stop).
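The diagram above can be reproduced in a few lines. This is a small sketch that wires the three nodes by hand and walks them with a generator; the helper name iter_values is an assumption for illustration.

```python
class Node:
    def __init__(self, data):
        self.data = data
        self.next = None


def iter_values(head):
    """Yield each node's data, following next until None."""
    curr = head
    while curr is not None:
        yield curr.data
        curr = curr.next


# Build 10 -> 20 -> 30 by hand, exactly as drawn.
node1, node2, node3 = Node(10), Node(20), Node(30)
node1.next = node2
node2.next = node3
head = node1

print(list(iter_values(head)))   # [10, 20, 30]
print(list(iter_values(None)))   # [] for the empty list
```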
Python Implementation: Full Singly Linked List
Below is a complete implementation with: insert at head, insert at tail (with tail pointer), delete at head, search by value, get length, and traverse/print. We use a tail pointer so that appending is O(1).
class Node:
    def __init__(self, data):
        self.data = data
        self.next = None

class LinkedList:
    def __init__(self):
        self.head = None
        self.tail = None
        self.length = 0

    def is_empty(self):
        return self.head is None

    def insert_at_head(self, data):
        new_node = Node(data)
        new_node.next = self.head
        self.head = new_node
        if self.tail is None:
            self.tail = new_node
        self.length += 1

    def insert_at_tail(self, data):
        new_node = Node(data)
        if self.tail is None:
            self.head = self.tail = new_node
        else:
            self.tail.next = new_node
            self.tail = new_node
        self.length += 1

    def delete_at_head(self):
        if self.head is None:
            return None
        value = self.head.data
        self.head = self.head.next
        if self.head is None:
            self.tail = None
        self.length -= 1
        return value

    def search(self, key):
        curr = self.head
        while curr is not None:
            if curr.data == key:
                return True
            curr = curr.next
        return False

    def get_length(self):
        return self.length

    def to_list(self):
        result = []
        curr = self.head
        while curr is not None:
            result.append(curr.data)
            curr = curr.next
        return result
Line-by-Line Explanation (Key Parts)
- insert_at_head: Create a new node; set its next to the current head; set head to the new node. If the list was empty, the new node is also the tail. Always update length. Time O(1).
- insert_at_tail: Create a new node. If the list is empty, set both head and tail to it. Otherwise set tail.next = new_node and then tail = new_node. Time O(1) because we have a tail pointer.
- delete_at_head: If empty, return None. Otherwise save head.data, set head = head.next. If the list becomes empty (head is None), set tail = None. Decrement length. Time O(1).
- search: Walk from head with curr = curr.next until we find key or hit None. Time O(n).
- to_list: Traverse and collect values into a Python list. Time O(n), space O(n) for the result.
When inserting or deleting, updating pointers in the wrong order can “lose” the rest of the list. Rule: when you need to rewire A → B to A → C → B, first set C.next = B, then set A.next = C. If you set A.next = C before setting C.next, you lose the reference to B. Similarly, in delete_at_head, saving the value or the next node before changing head avoids losing the list.
Insertion in the Middle (After a Given Node)
If you have a reference node to the node after which you want to insert:
def insert_after(self, node, data):
    if node is None:
        return
    new_node = Node(data)
    new_node.next = node.next
    node.next = new_node
    if node == self.tail:
        self.tail = new_node
    self.length += 1
Order matters: set new_node.next = node.next first, then node.next = new_node. Time O(1). If you don’t have a reference to the node and must search by index or value, finding that node is O(n).
Deletion by Reference (Delete Node Given Only That Node)
In a singly linked list, to remove a node you normally need the previous node so you can do prev.next = node.next. If you are given only the node to delete (and no reference to the head), you cannot get the previous node without traversing from head—unless you “fake” deletion by copying the next node’s value into the current node and then skipping the next node. That gives O(1) but does not work for the tail (there is no “next” to copy). We’ll see this trick again in “Delete Node in a Linked List” (LeetCode 237).
# Given only the node to delete (and it's not the tail):
def delete_node(node):
    node.data = node.next.data
    node.next = node.next.next
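To see the copy trick in action, here is a small self-contained sketch. The explicit tail check and the helper names (build, to_list) are additions for illustration, not part of the original snippet.

```python
class Node:
    def __init__(self, data):
        self.data = data
        self.next = None


def delete_node(node):
    """Copy trick: overwrite node with its successor, then skip the successor."""
    if node is None or node.next is None:
        raise ValueError("copy trick needs a successor; it cannot delete the tail")
    node.data = node.next.data
    node.next = node.next.next


def build(values):
    """Hypothetical helper: build a list and return (head, nodes)."""
    nodes = [Node(v) for v in values]
    for a, b in zip(nodes, nodes[1:]):
        a.next = b
    return (nodes[0] if nodes else None), nodes


def to_list(head):
    out = []
    while head is not None:
        out.append(head.data)
        head = head.next
    return out


head, nodes = build(["a", "b", "c", "d"])
delete_node(nodes[1])            # "delete" b without touching head or a
print(to_list(head))             # ['a', 'c', 'd']
```

One caveat of this design: the node object that physically leaves the list is the successor (originally holding "c"), not the node passed in. Any outside reference to that successor now points at an unlinked node.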
Time Complexity Summary
| Operation | Time | Note |
|---|---|---|
| Access by index k | O(k) | Must traverse k nodes from head |
| Search by value | O(n) | Worst case scan entire list |
| Insert at head | O(1) | Just rewire head |
| Insert at tail (with tail ptr) | O(1) | Wire tail → new, update tail |
| Insert after given node | O(1) | If you have the node reference |
| Delete at head | O(1) | head = head.next |
| Delete by reference (copy trick) | O(1) | Only when node is not tail |
| Traverse / length (no stored length) | O(n) | One pass from head to None |
Space Complexity
The list itself uses O(n) space for n nodes (each node stores a value and a next reference). Additional space for operations: insert_at_head, delete_at_head, insert_after use O(1) extra; search and to_list use O(1) extra for the pointer/index, but to_list returns a new list of size n so the output is O(n).
Edge Cases
- Empty list: head (and tail) is None. Every operation must check: insert at head/tail sets both head and tail to the new node; delete at head returns None and leaves the list empty.
- Single node: After delete_at_head, set tail = None so the list is correctly empty.
- Insert after tail: When you do insert_after(tail, data), update self.tail = new_node so tail stays correct.
- Delete the only node: head becomes None; tail must also become None.
Common Mistakes
- Reversing the order of pointer updates and losing the rest of the list. Always set the new node’s next before changing existing pointers.
- Forgetting to update tail when the last node is removed or when a new node is added at the end.
- Assuming you can delete a node in O(1) when given only that node and it’s the tail: you cannot without a reference to the previous node (or a doubly linked list).
- Off-by-one in “get k-th node”: the 1st node is at index 0; the k-th node requires k−1 moves of curr = curr.next from head.
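The off-by-one in the last item is easiest to avoid by writing the index walk once and testing its boundaries. A minimal sketch, using 0-based indices (so index k takes k moves, and the k-th node in 1-based terms takes k−1); the function name node_at is hypothetical.

```python
class Node:
    def __init__(self, data):
        self.data = data
        self.next = None


def node_at(head, index):
    """Return the node at 0-based `index`, or raise IndexError. O(index) time."""
    if index < 0:
        raise IndexError("negative index")
    curr = head
    for _ in range(index):          # exactly `index` moves from head
        if curr is None:
            break
        curr = curr.next
    if curr is None:
        raise IndexError("index out of range")
    return curr


# 10 -> 20 -> 30
head = Node(10)
head.next = Node(20)
head.next.next = Node(30)

print(node_at(head, 0).data)   # 10 (1st node, zero moves)
print(node_at(head, 2).data)   # 30 (3rd node, two moves)
```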
Without tail pointer: Insert at tail is O(n) because you must traverse to the last node. With tail pointer: Insert at tail becomes O(1). The tradeoff is one extra field and updating it on every operation that changes the last node. For a queue implemented as a linked list, tail is essential for O(1) enqueue. Stored length: If you maintain self.length, get_length is O(1); otherwise it’s O(n).
Pattern Recognition
Many linked-list problems use the same patterns:
- Two pointers: Slow and fast (e.g. find middle, detect cycle, remove Nth from end).
- Dummy node: A dummy head (e.g. dummy = Node(0); dummy.next = head) simplifies edge cases when the head might change (insertions, deletions at front).
- Prev/curr traversal: When you need to delete a node or insert before a node, keep prev and curr and advance prev = curr; curr = curr.next.
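The three patterns above can be sketched in two short functions: remove_value combines a dummy node with prev/curr traversal, and middle_node uses slow/fast pointers. The function and helper names are illustrative assumptions.

```python
class Node:
    def __init__(self, data):
        self.data = data
        self.next = None


def remove_value(head, target):
    """Remove every node whose data equals target; return the new head."""
    dummy = Node(None)              # dummy head: deleting the real head needs no special case
    dummy.next = head
    prev, curr = dummy, head
    while curr is not None:
        if curr.data == target:
            prev.next = curr.next   # unlink curr; prev stays where it is
        else:
            prev = curr             # advance prev only past nodes we keep
        curr = curr.next
    return dummy.next


def middle_node(head):
    """Slow moves 1 step, fast moves 2; slow is at the middle when fast runs out.

    For even length this returns the second of the two middle nodes.
    """
    slow = fast = head
    while fast is not None and fast.next is not None:
        slow = slow.next
        fast = fast.next.next
    return slow


def from_list(values):
    dummy = Node(None)
    tail = dummy
    for v in values:
        tail.next = Node(v)
        tail = tail.next
    return dummy.next


def to_list(head):
    out = []
    while head is not None:
        out.append(head.data)
        head = head.next
    return out


print(to_list(remove_value(from_list([1, 2, 1, 3]), 1)))   # [2, 3]
print(middle_node(from_list([1, 2, 3, 4, 5])).data)         # 3
```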
Interviewers expect you to implement a linked list without hesitation and to reason about pointer updates. Always clarify: “Is the list singly or doubly linked? Do we have a tail? Can we use a dummy node?” For “delete node given only that node,” mention the copy trick (overwrite with next, skip next) and its limitation (doesn’t work for tail). For “insert at head” and “delete at head,” state O(1) and show the exact pointer updates. Drawing a small diagram (3–4 nodes) before coding helps avoid pointer bugs.
Practice Problems
- LeetCode 206: Reverse Linked List (iterative and recursive).
- LeetCode 141: Linked List Cycle (Floyd’s slow/fast).
- LeetCode 21: Merge Two Sorted Lists.
- LeetCode 19: Remove Nth Node From End of List (two pointers).
- LeetCode 237: Delete Node in a Linked List (copy trick).
- LeetCode 876: Middle of the Linked List (slow/fast).
Summary
- A singly linked list is a sequence of nodes; each node has data and next (or None). The list is accessed via head; optionally tail and length for O(1) append and size.
- Insert at head / tail (with tail) / after a node: O(1). Delete at head: O(1). Search / access by index: O(n).
- Always update pointers in an order that doesn’t lose references; update tail when the last node changes.
- Use dummy node, two pointers, and prev/curr as standard patterns for list problems. Master this before tackling reverse, cycle detection, and merge (Sections 8.4–8.6).
8.2 Doubly Linked List
Introduction
In a singly linked list, each node points only to the next node. To delete a node, you typically need a reference to the previous node—which can cost O(n) to find. A doubly linked list adds a prev pointer in each node, so every node knows both its predecessor and its successor. That allows O(1) deletion of a node when you have a reference to that node (you just rewire prev.next and next.prev), and backward traversal from tail to head. The cost is extra space (one more pointer per node) and slightly more complex pointer updates. Doubly linked lists are the backbone of many practical structures: LRU caches (Section 8.9), browser back/forward history, and ordered data structures that need fast removal of an arbitrary element.
In this section we define the doubly linked list node and list class, implement insert/delete at head and tail, and—importantly—delete a node given only that node in O(1). We compare with the singly linked list and show when the extra prev pointer is worth it.
Real-World Analogy
Think of a train with bidirectional links. Each carriage has a door to the next carriage and a door to the previous one. From any carriage you can walk forward or backward without starting from the engine. If you remove one carriage, you only need to connect “previous carriage → next carriage” and “next carriage → previous carriage”—you don’t need to walk from the engine to find the previous carriage. That’s the doubly linked list: each node has prev and next; deletion at a known node is just rewiring two links.
Browser back/forward: each page has “previous page” and “next page.” When you click a new link, the new page becomes the new “head” of the forward list and its prev points to the current page. When you click Back, you follow prev; when you click Forward, you follow next. Removing a page from the middle (e.g. clearing history from a point) is O(1) if you have the node—exactly what a doubly linked list provides.
Formal Definition
Doubly linked list: A linear data structure of nodes. Each node has (1) data, (2) next (reference to the next node, or None at the tail), and (3) prev (reference to the previous node, or None at the head). The list is accessed via head (first node) and optionally tail (last node). Traversal can go forward (head → next → …) or backward (tail → prev → …). Given a reference to any node, that node can be removed in O(1) by updating its predecessor’s next and its successor’s prev.
The key advantage over a singly linked list: delete node in O(1) given only that node. In a singly linked list you need the previous node to do deletion; in a doubly linked list the node itself carries prev, so you have everything you need.
Why This Topic Matters
- LRU cache: The standard implementation keeps “most recently used” items in a doubly linked list so the least recently used (tail) can be evicted in O(1), and moving a node to “most recent” (head) is O(1) by remove-then-insert-at-head—both require O(1) delete at a known node.
- Ordered structures: Many ordered containers (e.g. Java’s LinkedList, C++’s std::list) use doubly linked lists for O(1) removal of an arbitrary element when you have an iterator/node.
- Interview follow-up: “How would you delete a node in O(1) given only that node?” Answer: use a doubly linked list (or the “copy next into node” trick for singly, which doesn’t work for the tail).
Mental Model
Picture a chain where each link has two hooks: one to the next link, one to the previous. The first link’s “previous” hook is empty (None); the last link’s “next” hook is empty. To remove a link: disconnect its prev link’s “next” from it, and its next link’s “prev” from it; then connect prev’s next to next, and next’s prev to prev. You never need to traverse from the head to find “who points to this node”—the node’s prev tells you.
Node and List Structure
Node: data, prev, next
class Node:
    def __init__(self, data):
        self.data = data
        self.prev = None
        self.next = None
Each node has two pointers. head.prev and tail.next are always None.
List: head and tail
class DoublyLinkedList:
    def __init__(self):
        self.head = None
        self.tail = None
        self.length = 0
ASCII Diagram
Empty: head → None, tail → None
List [10, 20, 30]:
        head                                            tail
         │                                               │
         ▼                                               ▼
┌──────┬────┬──────┐    ┌──────┬────┬──────┐    ┌──────┬────┬──────┐
│      │ 10 │  ●───┼───►│      │ 20 │  ●───┼───►│      │ 30 │ None │
│ None │    │      │◄───┼───●  │    │      │◄───┼───●  │    │      │
└──────┴────┴──────┘    └──────┴────┴──────┘    └──────┴────┴──────┘
  prev  data  next        prev  data  next        prev  data  next
Delete middle node (20): set 10's next = 30, set 30's prev = 10. No traversal needed.
Python Implementation
class Node:
    def __init__(self, data):
        self.data = data
        self.prev = None
        self.next = None

class DoublyLinkedList:
    def __init__(self):
        self.head = None
        self.tail = None
        self.length = 0

    def is_empty(self):
        return self.head is None

    def insert_at_head(self, data):
        new_node = Node(data)
        new_node.next = self.head
        if self.head is not None:
            self.head.prev = new_node
        self.head = new_node
        if self.tail is None:
            self.tail = new_node
        self.length += 1

    def insert_at_tail(self, data):
        new_node = Node(data)
        new_node.prev = self.tail
        if self.tail is not None:
            self.tail.next = new_node
        self.tail = new_node
        if self.head is None:
            self.head = new_node
        self.length += 1

    def delete_at_head(self):
        if self.head is None:
            return None
        value = self.head.data
        self.head = self.head.next
        if self.head is None:
            self.tail = None
        else:
            self.head.prev = None
        self.length -= 1
        return value

    def delete_at_tail(self):
        if self.tail is None:
            return None
        value = self.tail.data
        self.tail = self.tail.prev
        if self.tail is None:
            self.head = None
        else:
            self.tail.next = None
        self.length -= 1
        return value

    def delete_node(self, node):
        """Remove node in O(1) given a reference to it."""
        if node is None:
            return
        if node.prev is not None:
            node.prev.next = node.next
        else:
            self.head = node.next
        if node.next is not None:
            node.next.prev = node.prev
        else:
            self.tail = node.prev
        self.length -= 1

    def search_forward(self, key):
        curr = self.head
        while curr is not None:
            if curr.data == key:
                return True
            curr = curr.next
        return False

    def to_list_forward(self):
        result = []
        curr = self.head
        while curr is not None:
            result.append(curr.data)
            curr = curr.next
        return result

    def to_list_backward(self):
        result = []
        curr = self.tail
        while curr is not None:
            result.append(curr.data)
            curr = curr.prev
        return result
Line-by-Line Explanation (Key Operations)
- insert_at_head: New node’s next = head. If the list was non-empty, head.prev = new_node. Then head = new_node. If the list was empty, tail = new_node. Order: wire the new node into the list first, then update head. O(1).
- insert_at_tail: Symmetric: new_node.prev = tail; if non-empty, tail.next = new_node; tail = new_node; if empty, head = new_node. O(1).
- delete_at_head: Save value, head = head.next. If the list becomes empty, tail = None. Else head.prev = None (the new head has no predecessor). O(1).
- delete_at_tail: Symmetric: tail = tail.prev; if empty, head = None; else tail.next = None. O(1).
- delete_node(node): If node.prev exists, set node.prev.next = node.next; else node was head, so head = node.next. If node.next exists, set node.next.prev = node.prev; else node was tail, so tail = node.prev. No traversal: O(1).
A common bug when deleting a node is forgetting the case where the node is the head or the tail. If node.prev is None, you must set self.head = node.next; if node.next is None, you must set self.tail = node.prev. Also, when delete_at_head removes a node and the list is still non-empty, the new head’s prev must become None; otherwise it keeps a stale reference to the removed node.
Singly vs Doubly Linked List: Comparison
| Aspect | Singly | Doubly |
|---|---|---|
| Space per node | 1 pointer (next) | 2 pointers (prev, next) |
| Delete node given only that node | O(1) copy trick (not for tail), else O(n) to find prev | O(1) always |
| Backward traversal | Not possible without extra structure | O(n) from tail |
| Insert/delete at head or tail | O(1) with head (and tail for insert tail) | O(1) |
| Pointer updates per insert/delete | Fewer | More (prev and next both sides) |
Time and Space Complexity
Time: Insert at head/tail O(1). Delete at head/tail O(1). Delete given node O(1). Search O(n). Access by index O(k). Forward or backward traversal O(n).
Space: O(n) for n nodes; each node has two pointers plus data. Doubly uses roughly one extra pointer per node compared to singly (e.g. 2× pointer space per node).
Edge Cases
- Empty list: head and tail are None. Insert at head or tail sets both head and tail to the new node.
- Single node: head == tail. delete_at_head or delete_at_tail must set both head and tail to None.
- delete_node on head: node.prev is None → set head = node.next. If that’s None, set tail = None.
- delete_node on tail: node.next is None → set tail = node.prev. If that’s None, set head = None.
Common Mistakes
- Updating only next (or only prev) when deleting, leaving the other direction inconsistent. Both prev.next and next.prev must be updated (or head/tail if at boundary).
- In insert_at_head, forgetting head.prev = new_node when the list was non-empty: the old head’s prev then stays None and backward traversal is wrong.
- Assuming delete_node is O(1) in a singly linked list without the copy trick: in general you need O(n) to find the previous node.
Use a doubly linked list when you need O(1) removal of an arbitrary node given a reference (e.g. LRU cache: remove from current position and add to head). Use singly when you only need front/back insertion and deletion and don’t need to delete an arbitrary node by reference—saves space and simpler code. For “delete node given only that node” in an interview, stating “with a doubly linked list this is O(1)” shows you know the tradeoff.
When the problem involves “remove from the middle in O(1)” or “move this item to the front in O(1)” (e.g. LRU), a doubly linked list is the right structure. Be ready to implement delete_node(node) and to handle head/tail in that method. Draw a 3-node diagram and show updating both prev.next and next.prev (or head/tail) so the interviewer sees you handle boundaries. Comparing “singly: need prev to delete, so O(n) unless copy trick; doubly: O(1) delete given node” is a strong answer.
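The “move this item to the front in O(1)” operation can be sketched as remove-then-insert-at-head. The class below is a hypothetical, stripped-down helper (RecencyList and its method names are assumptions, not the DoublyLinkedList class above); a full LRU cache would also pair each node with a hash map entry.

```python
class Node:
    def __init__(self, data):
        self.data = data
        self.prev = None
        self.next = None


class RecencyList:
    """Most recently used item at head, least recently used at tail."""

    def __init__(self):
        self.head = None
        self.tail = None

    def _link_front(self, node):
        node.prev = None
        node.next = self.head
        if self.head is not None:
            self.head.prev = node
        else:
            self.tail = node        # list was empty
        self.head = node

    def _unlink(self, node):
        if node.prev is not None:
            node.prev.next = node.next
        else:
            self.head = node.next   # node was head
        if node.next is not None:
            node.next.prev = node.prev
        else:
            self.tail = node.prev   # node was tail
        node.prev = node.next = None

    def push_front(self, data):
        node = Node(data)
        self._link_front(node)
        return node                 # caller keeps the reference for O(1) moves

    def move_to_front(self, node):
        if node is self.head:
            return
        self._unlink(node)
        self._link_front(node)

    def pop_back(self):
        node = self.tail
        if node is None:
            return None
        self._unlink(node)
        return node.data

    def to_list(self):
        out, curr = [], self.head
        while curr is not None:
            out.append(curr.data)
            curr = curr.next
        return out


items = RecencyList()
a = items.push_front("a")
b = items.push_front("b")
c = items.push_front("c")          # order: c, b, a
items.move_to_front(a)             # order: a, c, b
print(items.to_list())             # ['a', 'c', 'b']
print(items.pop_back())            # b (least recently used)
```

Every operation touches a constant number of pointers, which is the property the LRU design relies on.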
Practice Problems
- LeetCode 146: LRU Cache (doubly linked list + hash map for O(1) get/put).
- LeetCode 430: Flatten a Multilevel Doubly Linked List.
- LeetCode 707: Design Linked List (support both singly and doubly).
Summary
- A doubly linked list has nodes with data, prev, and next. head and tail give O(1) access to both ends.
- Delete a node in O(1) given only that node by rewiring node.prev.next and node.next.prev (and head/tail if node is head or tail).
- Insert/delete at head or tail remain O(1). Backward traversal is O(n) from tail. Use doubly when you need O(1) arbitrary-node removal or backward traversal; use singly when you don’t, to save space and simplify updates.
8.3 Circular Linked List
Introduction
In a normal (linear) linked list, the last node’s next is None—traversal stops when you hit that. In a circular linked list, the last node’s next points back to the head, so there is no “end”: from any node you can keep following next and eventually loop back. That shape is useful when the data is naturally cyclic: round-robin scheduling, Josephus problem, multiplayer turn order, or any “rotate through a fixed set” scenario. You can build a singly circular list (one pointer per node, tail → head) or a doubly circular list (head’s prev → tail, tail’s next → head). The main implementation catch is termination: a traversal must stop when it reaches the starting point again, or you’ll loop forever.
In this section we define circular singly and doubly linked lists, implement insert and delete (with care for the one-node case), and show how to traverse safely. We also touch on the Josephus problem as a classic application.
Real-World Analogy
Imagine a round table of people. Each person can point to the person to their right. The person at the “last” seat points back to the first—so there is no real last: everyone has a next. To go around the table you start at one person and keep moving “next” until you’re back where you started. That’s a circular list. Removing someone means rewiring “previous person → next person”; if you remove the “head,” the new head is whoever was second. The table never has a physical end—only a designated starting point (head) for convenience.
Round-robin CPU scheduling: Ready processes are kept in a circular list. The scheduler gives the CPU to the current node, then advances to next; when it reaches the “head” again it has completed one full round. No need to check for “end of list”—the list is circular by design.
Formal Definition
Circular linked list (singly): A linked list in which the last node’s next points to the first node (head), forming a ring. There is no node with next is None. The list is identified by a head (or any designated node). Traversal: start at head, follow next until you return to head (or use a counter for n steps). Circular doubly linked list: Same idea, with head.prev = tail and tail.next = head, so you can go forward or backward in a loop.
Empty list is still head is None. A list with one node has node.next = node (and in doubly, node.prev = node)—that’s the critical edge case.
Why This Topic Matters
- Round-robin and rotation: Any “take turns in a circle” or “rotate through items” logic maps naturally to a circular list; advancing is just curr = curr.next with no special case for the last element.
- Josephus problem: Classic problem: n people in a circle, eliminate every k-th person; who survives? Modeling the circle as a circular list and repeatedly removing the k-th node is a direct approach.
- Interview awareness: Less common than linear lists, but “detect cycle” (Section 8.5) and “split a circular list” sometimes appear; understanding that the last node points to head (not None) avoids bugs.
Mental Model
Picture a ring or loop of nodes. You have a “start” marker (head). Walking “next” from any node always leads to another node; after n steps you’re back at the start. To avoid infinite loops in code, you either (1) stop when curr == head again (after at least one step), or (2) iterate exactly n times if you know the length, or (3) use a “visited” set (rarely needed if you control the structure).
Singly Circular: Node and List
Same node as singly linked list: data and next. The list keeps a head; the last node satisfies last.next == head. Optionally we keep a tail so that insert-at-tail is O(1) and we don’t have to traverse to find the last node.
class Node:
    def __init__(self, data):
        self.data = data
        self.next = None

class CircularLinkedList:
    def __init__(self):
        self.head = None
        self.tail = None  # tail.next will point to head
        self.length = 0
ASCII Diagram
Empty: head → None, tail → None

Circular list [10, 20, 30]:

 head                            tail
   │                               │
   ▼                               ▼
┌────┬───┐      ┌────┬───┐      ┌────┬───┐
│ 10 │ ●─┼─────►│ 20 │ ●─┼─────►│ 30 │ ●─┼──┐
└────┴───┘      └────┴───┘      └────┴───┘  │
   ▲                                        │
   └────────────────────────────────────────┘

tail.next → head; last node (30) points to 10.
One node: head → [data|next] → points to self (node.next = node).
Python Implementation: Singly Circular
class Node:
    def __init__(self, data):
        self.data = data
        self.next = None

class CircularLinkedList:
    def __init__(self):
        self.head = None
        self.tail = None
        self.length = 0

    def is_empty(self):
        return self.head is None

    def insert_at_head(self, data):
        new_node = Node(data)
        if self.head is None:
            new_node.next = new_node
            self.head = self.tail = new_node
        else:
            new_node.next = self.head
            self.tail.next = new_node
            self.head = new_node
        self.length += 1

    def insert_at_tail(self, data):
        new_node = Node(data)
        if self.head is None:
            new_node.next = new_node
            self.head = self.tail = new_node
        else:
            new_node.next = self.head
            self.tail.next = new_node
            self.tail = new_node
        self.length += 1

    def delete_at_head(self):
        if self.head is None:
            return None
        value = self.head.data
        if self.head == self.tail:
            self.head = self.tail = None
        else:
            self.tail.next = self.head.next
            self.head = self.head.next
        self.length -= 1
        return value

    def traverse(self, max_steps=None):
        """Yield each node's data from head, once around the ring.
        If max_steps is given, stop after at most that many values."""
        if self.head is None:
            return
        curr = self.head
        steps = 0
        while True:
            yield curr.data
            steps += 1
            if max_steps is not None and steps >= max_steps:
                break
            curr = curr.next
            if curr == self.head:
                break
Line-by-Line Explanation (Key Points)
- insert_at_head (non-empty): New node’s next = head. Then tail.next = new_node so the circle stays closed (the old last node now points to the new first). Then head = new_node. If we didn’t update tail.next, the circle would break.
- insert_at_head / insert_at_tail (empty): Only node in the list, so new_node.next = new_node. Set both head and tail to it.
- delete_at_head: If single node, set head and tail to None. Otherwise, tail.next = head.next (close the circle around the new head), then head = head.next. Forgetting to update tail.next would leave tail pointing to the removed node and break the circle.
- traverse: Stop when curr == self.head after at least one step, or when max_steps is reached. Without a stopping condition you get an infinite loop.
Common mistake: forgetting to update the last node’s next when inserting or deleting at head (or tail). In a circular list, the tail always points to the head. So: on insert_at_head, set tail.next = new_node before moving head; on delete_at_head, set tail.next = head.next before moving head. Also: inserting into an empty list must set node.next = node, and deleting the only node must set both head and tail to None.
Doubly Circular (Concept)
In a doubly circular list, head.prev = tail and tail.next = head. Insert/delete at head or tail require updating both the head and tail sides of the ring (four pointer updates instead of two). The same O(1) delete-given-node idea from Section 8.2 applies: node.prev.next = node.next and node.next.prev = node.prev; no need to treat “end” differently because there is no end.
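A minimal sketch of the doubly circular idea, assuming a hypothetical DNode class and standalone helpers insert_after, remove, and ring_values (illustrative names, not part of the implementation above). It shows the four pointer updates on insert and that removal needs no end-of-list branch, only the one-node case.

```python
class DNode:
    """Hypothetical doubly linked node; a lone node forms a ring with itself."""
    def __init__(self, data):
        self.data = data
        self.prev = self
        self.next = self


def insert_after(node, data):
    """Insert a new node right after `node` and return it (four pointer updates)."""
    new_node = DNode(data)
    new_node.prev = node
    new_node.next = node.next
    node.next.prev = new_node
    node.next = new_node
    return new_node


def remove(node):
    """Unlink `node` in O(1); return its successor, or None if it was the only node."""
    if node.next is node:
        return None
    node.prev.next = node.next
    node.next.prev = node.prev
    successor = node.next
    node.prev = node.next = node  # detach the removed node completely
    return successor


def ring_values(start):
    """Collect data once around the ring, starting at `start`."""
    values, curr = [], start
    while True:
        values.append(curr.data)
        curr = curr.next
        if curr is start:
            return values
```

Because head.prev is the tail, a list built this way can reach its last node in O(1) without storing a separate tail reference.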
Traversal and Termination
Ways to traverse once around without infinite loop:
- By count: For i in range(length): use curr, then curr = curr.next. Stops after n steps.
- By reaching head again: Start at curr = head; do { process curr; curr = curr.next } while curr != head. Process each node exactly once.
- Do-while style: Process node, then advance; stop when you’re back at head. Ensures the head is processed.
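The stopping rules above can be sketched as standalone functions. The Node class is repeated and the helper names make_ring, values_by_count, and values_until_head are assumptions, chosen so the block runs on its own.

```python
class Node:
    def __init__(self, data):
        self.data = data
        self.next = None


def make_ring(values):
    """Build a singly circular list from `values`; return its head (None if empty)."""
    head = tail = None
    for v in values:
        node = Node(v)
        if head is None:
            head = tail = node
        else:
            tail.next = node
            tail = node
        tail.next = head  # keep the circle closed after every append
    return head


def values_by_count(head, length):
    """Traverse exactly `length` steps; never checks where the ring closes."""
    out, curr = [], head
    for _ in range(length):
        out.append(curr.data)
        curr = curr.next
    return out


def values_until_head(head):
    """Do-while style: process first, then advance, stop on returning to head."""
    out = []
    if head is None:
        return out
    curr = head
    while True:
        out.append(curr.data)
        curr = curr.next
        if curr is head:
            break
    return out
```

Counting is the safer choice when the structure may be modified during traversal; the back-at-head test needs no stored length but relies on head staying in the ring.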
Time and Space Complexity
Same as linear singly linked list when a tail pointer is kept: insert at head O(1), insert at tail O(1), delete at head O(1). Search O(n)—and you must use a count or “back at head” to stop. Space O(n) for n nodes. The only extra cost is maintaining the circle (updating tail.next on head insert/delete).
Edge Cases
- Empty list: head and tail None. Insert creates the single node with node.next = node.
- Single node: head == tail and head.next == head. Delete must set head = tail = None.
- Two nodes: Each points to the other. Delete head: tail stays, tail.next must become tail (new head), so tail.next = head.next correctly gives tail.next = tail.
Josephus Problem (Application)
n people stand in a circle; every k-th person is eliminated until one remains. Model: circular list of n nodes. Keep a pointer prev to the node just before the first person counted (initially the tail). Repeatedly advance prev k−1 steps so that prev.next is the k-th person, remove that node with prev.next = prev.next.next, and continue counting from the node after it. When one node remains, its data is the survivor. Each elimination costs O(k) for the advance plus O(1) for the unlink, so the simulation takes O(n·k) time in total; faster formulations exist using recursion or closed form.
For “delete the k-th node from current” in a singly circular list you need the predecessor: start from the node just before the first one counted, advance k−1 times to land on the predecessor of the k-th node, then skip it with prev.next = prev.next.next. In a doubly circular list you can land on the node to delete and remove it in O(1) through its own prev and next pointers.
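The simulation can be sketched with a singly circular list and a predecessor pointer. The function name josephus and the 1..n labelling of people are assumptions for illustration; this is the naive O(n·k) approach, not one of the faster formulations.

```python
class Node:
    def __init__(self, data):
        self.data = data
        self.next = None


def josephus(n, k):
    """Return the survivor's label (1..n) when every k-th person is eliminated."""
    if n < 1 or k < 1:
        raise ValueError("n and k must be positive")
    # Build the ring 1 -> 2 -> ... -> n -> 1.
    head = Node(1)
    prev = head
    for label in range(2, n + 1):
        prev.next = Node(label)
        prev = prev.next
    prev.next = head  # prev is the tail, i.e. the node just before person 1

    remaining = n
    while remaining > 1:
        # Advance k-1 steps so that prev.next is the k-th person counted.
        for _ in range(k - 1):
            prev = prev.next
        prev.next = prev.next.next  # unlink the k-th person
        remaining -= 1
    return prev.data


print(josephus(7, 3))  # prints 4
```

Starting prev at the tail means the first count begins at person 1, and after each removal prev already sits just before the next person to count.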
If the problem says “circular list” or “nodes in a ring,” remember: there is no None at the end—traversal stops when you return to the start (or after n steps). Mention the one-node case (node.next = node) and that insert/delete at head must update tail.next to keep the circle closed. For Josephus, briefly describe “circular list, repeatedly skip k−1 and remove the next node” and note that a doubly circular list allows O(1) removal.
Practice Problems
- Implement a circular linked list with insert_at_head, insert_at_tail, delete_at_head, and safe traversal.
- Josephus problem: find the survivor when every k-th person is eliminated from n people in a circle.
- LeetCode 708: Insert into a Sorted Circular Linked List (insert and keep order; handle wrap-around).
Summary
- A circular linked list has the last node’s next point back to the head (and in doubly, head’s prev point to the tail), forming a ring. No node has next/prev None (except in the empty list).
- Maintain the circle on every insert/delete: e.g. tail.next = head always; on insert_at_head set tail.next = new_node; on delete_at_head set tail.next = head.next.
- One-node list: node.next = node. Traverse by stopping when you return to head or after n steps. Use circular lists for round-robin, rotation, or Josephus-style problems.
8.4 Reverse Linked List
Introduction
Reversing a linked list means flipping the direction of every next pointer so that the last node becomes the new head and the original head becomes the tail. It is one of the most frequently asked linked-list problems in interviews and appears in many variations: reverse the whole list, reverse a portion (between positions m and n), reverse in groups of k, or reverse nodes in even/odd groups. Mastering the basic “reverse the entire list” gives you the same pointer-manipulation skills you need for all of these. We will build from intuition to code, then compare iterative (in-place) and recursive solutions and see why the iterative version is usually preferred for production and interviews.
Real-World Analogy
Imagine a train with cars linked in a single direction. The engine is at the front (head); each car is coupled only to the car behind it. To “reverse” the train, you don’t move the cars—you re-couple them. You start from the engine: uncouple it from the next car, then that next car becomes the new “front” and you attach the old engine behind it. You repeat: the car that was third is now first, and you attach the previous “front” behind it. By the time you reach the last car, the entire train is reversed: the old last car is the new engine, and the old engine is at the back. Reversing a linked list is the same idea: we rewire next pointers one node at a time, without moving data.
Original list: head → 1 → 2 → 3 → None. After reverse: head → 3 → 2 → 1 → None. The node that held 1 now has next = None (it’s the tail); the node that held 3 is the new head. The values don’t move—only the links change.
Formal Definition
Reverse linked list (in-place): Given the head of a singly linked list, reverse the list by changing each node’s next pointer to point to its previous node (instead of the next). The former last node becomes the new head; the former head becomes the last node (its next becomes None). The operation is typically done in-place—no new list is allocated; only references are updated. Return the new head of the reversed list.
We do not create new nodes or copy values. We only change next pointers. That’s why we need to keep a reference to the “previous” node as we traverse: so we can set curr.next = prev. The challenge is doing this without losing the rest of the list—we must save the next node before we overwrite curr.next.
Why This Topic Matters
- Interview staple: LeetCode 206 (Reverse Linked List) is among the most common questions. Interviewers use it to check if you can manipulate pointers correctly and handle the head/tail and single-node cases.
- Building block: “Reverse between m and n,” “reverse in groups of k,” “reorder list,” and “palindrome linked list” all use the same idea: reverse a segment and reattach it. Once you can reverse a full list, you can reverse a sublist.
- Pointer discipline: Reversing in-place teaches you to update pointers in the right order (save next, then rewire, then advance) and to avoid losing references—a skill that transfers to merge, partition, and reorder problems.
Mental Model
Picture three pointers moving together:
- prev: The node that should come “after” the current node in the new list (i.e. the already-reversed part’s head). Initially None (because the new tail will point to nothing).
- curr: The node we are currently rewiring. We will set curr.next = prev.
- next_node: The rest of the list. We must save this before changing curr.next, otherwise we lose the list.
In one step we: save next_node = curr.next, set curr.next = prev, then move prev = curr and curr = next_node. Repeat until curr is None; then prev is the new head.
Step-by-Step Breakdown (Iterative)
- Initialize prev = None and curr = head. We have not reversed any nodes yet, so “previous” to the first node is nothing.
- While curr is not None:
  - Save the rest of the list: next_node = curr.next.
  - Rewire: curr.next = prev. Now the current node points backward (to the already-reversed part).
  - Advance: prev = curr and curr = next_node.
- When the loop exits, curr is None (we passed the tail). The last node we processed is prev, which is the new head. Return prev.
Critical rule: always save curr.next before you overwrite it. Otherwise you cannot move to the next node.
ASCII Diagram: One Step of the Iterative Reverse
Before rewiring (curr = node 2, prev = node 1 already reversed):

            prev      curr      next_node
             │         │         │
             ▼         ▼         ▼
           ┌───┐     ┌───┐     ┌───┐
 None ◄────│ 1 │     │ 2 │────►│ 3 │────► ...
           └───┘     └───┘     └───┘

Step 1: next_node = curr.next          (save 3 before overwriting curr.next)
Step 2: curr.next = prev               (2 now points to 1)
Step 3: prev = curr, curr = next_node  (prev = 2, curr = 3)

After one step:

                      prev      curr
                       │         │
                       ▼         ▼
           ┌───┐     ┌───┐     ┌───┐
 None ◄────│ 1 │◄────│ 2 │     │ 3 │────► ...
           └───┘     └───┘     └───┘
After processing the last node, curr becomes None and prev is the old last node—the new head. The list is fully reversed.
Evolution: From Extra Space to In-Place
We can reverse a linked list in several ways. Seeing the evolution helps you choose the right approach and understand why the iterative in-place method is optimal.
Approach 1: Brute Force — Use an Array (or Stack)
Idea: Traverse the list and push each node’s value onto an array. Then traverse again (or traverse the array from the end) and overwrite each node’s value with the values in reverse order. Alternatively, push references to nodes onto a stack, then pop and rewire next pointers.
Time: O(n). Space: O(n) for the array or stack. It works and is easy to reason about, but it uses extra space when we don’t need to.
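A minimal sketch of the stack variant that stores node references and rewires next pointers. The names ListNode, from_list, to_list, and reverse_with_stack are assumptions introduced so the example is self-contained.

```python
class ListNode:
    def __init__(self, val, next=None):
        self.val = val
        self.next = next


def from_list(values):
    """Helper: build a linked list from a Python list and return its head."""
    head = None
    for v in reversed(values):
        head = ListNode(v, head)
    return head


def to_list(head):
    """Helper: collect node values into a Python list."""
    out = []
    while head is not None:
        out.append(head.val)
        head = head.next
    return out


def reverse_with_stack(head):
    """Reverse a singly linked list using O(n) extra space for node references."""
    stack = []
    curr = head
    while curr is not None:
        stack.append(curr)
        curr = curr.next
    if not stack:
        return None
    new_head = stack.pop()       # old tail becomes the new head
    curr = new_head
    while stack:
        curr.next = stack.pop()  # link to the node that preceded it originally
        curr = curr.next
    curr.next = None             # old head is now the tail
    return new_head


print(to_list(reverse_with_stack(from_list([1, 2, 3]))))  # prints [3, 2, 1]
```

The final curr.next = None matters: without it the old head would still point at the old second node and the result would contain a two-node loop.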
Approach 2: Recursive Reverse
Idea: Assume we can reverse the sublist starting at head.next. That returns the new head of the reversed rest. The current head is still pointing to the old second node (which is now the tail of the reversed rest). We set head.next.next = head (the old second node now points back to head) and head.next = None, then return the new head we got from the recursion.
Time: O(n). Space: O(n) for the call stack. Elegant and teaches recursion; in practice the stack depth equals list length.
Approach 3: Iterative In-Place (Optimal)
Idea: The three-pointer method above. One pass, rewire as we go. No extra data structure and no recursion stack (aside from a few local variables).
Time: O(n). Space: O(1). This is what you want in interviews and production when asked to “reverse in-place.”
| Approach | Time | Space | Note |
|---|---|---|---|
| Array/stack (values or refs) | O(n) | O(n) | Simple but extra space |
| Recursive | O(n) | O(n) stack | Elegant; stack depth = n |
| Iterative in-place | O(n) | O(1) | Preferred in interviews |
Python Implementation
Iterative (In-Place) — Recommended
def reverse_list(head):
    prev = None
    curr = head
    while curr is not None:
        next_node = curr.next  # save rest of list
        curr.next = prev       # rewire: point backward
        prev = curr            # advance prev
        curr = next_node       # advance curr
    return prev                # new head
Recursive
def reverse_list_rec(head):
    if head is None or head.next is None:
        return head
    new_head = reverse_list_rec(head.next)
    head.next.next = head  # reverse the link from next back to head
    head.next = None       # head becomes tail
    return new_head
Base case: empty list or single node—nothing to reverse, return head. Recursive case: reverse everything after head; that gives new_head. The node that was second (head.next) is now the tail of the reversed rest; we make it point back to head with head.next.next = head, then set head.next = None so head is the new tail. We never change new_head—it’s returned all the way up.
Line-by-Line Explanation (Iterative)
- prev = None: The “previous” node for the first element is nothing; that’s why the new tail’s next will be None.
- curr = head: We start at the current head; we’ll rewire it to point to prev (None).
- while curr is not None: We process every node. When curr becomes None, we’ve passed the tail.
- next_node = curr.next: Must save the rest of the list before we overwrite curr.next. Without this, we could not move to the next node.
- curr.next = prev: Reverse the link. The current node now points to the already-reversed part (or None for the first node).
- prev = curr, curr = next_node: Move both pointers forward. The node we just processed becomes the new “head” of the reversed portion; we advance curr to the next node to process.
- return prev: When the loop ends, curr is None (we’ve passed the last node). The last node we processed is prev; that’s the new head of the reversed list.
Time Complexity
We visit each node exactly once: one next read, one next write, and pointer updates. So the number of operations is proportional to n (the number of nodes). Time: O(n).
Space Complexity
- Iterative: Only a fixed number of variables (prev, curr, next_node). No extra data structure and no recursion. Space: O(1).
- Recursive: Each call uses a stack frame. For a list of length n, we have n recursive calls (minus the base case). Space: O(n) for the call stack.
Edge Cases
- Empty list (head is None): The while loop never runs; we return prev, which is None. Correct.
- Single node: We do one iteration: next_node = None, curr.next = None, prev = head, curr = None. We return prev (the single node). Correct.
- Two nodes: First iteration reverses the first node; second iteration reverses the second and we return it as the new head. Both iterative and recursive handle this without special code.
No need for a special check for empty or single node in the iterative version—the logic naturally returns None or the single node. You can add if not head or not head.next: return head as an early exit for clarity or for the recursive version (recursive needs the base case).
Common Mistakes
- Overwriting curr.next before saving it: If you do curr.next = prev first and then try to get “the next node” with curr = curr.next, you’ve lost the rest of the list. Always save next_node = curr.next at the start of the loop.
- Returning head instead of prev: After the loop, the original head is now the tail; the new head is prev. Returning head would give the caller a list that looks like a single node (or wrong head).
- Using curr.next after rewiring: Once curr.next = prev, the link to the rest of the list is gone unless you saved it in next_node. Never rely on curr.next for “advance” after you’ve changed it; use the saved reference.
For “reverse the entire list,” the iterative O(1) space solution is already optimal: you must touch every node to change its pointer, so Ω(n) time, and you need only a constant number of pointers to do it in one pass, so O(1) extra space is achievable. The only “optimization” beyond the standard iterative solution is code clarity: use clear names (prev, curr, next_node) and avoid unnecessary branches (empty/single-node are handled by the same loop).
Pattern Recognition
Reversing a linked list is a pointer-rewiring pattern:
- Maintain a “reversed so far” (or “previous”) reference; process one node at a time; save the rest of the list before rewiring; then advance. The same idea appears when reversing a sublist (e.g. between positions m and n): you find the node before the sublist and the node after it, reverse the middle with the same while loop, then reattach the new head and tail of the reversed segment.
- In “reverse in groups of k,” you reverse the first k nodes with this pattern, then recursively or iteratively process the next segment and attach the reversed segments. The core step is always “reverse a contiguous segment by rewiring next pointers.”
Once you can reverse a full list in a loop, you can isolate that loop for a segment and add logic to find the segment boundaries and reattach—that’s the pattern for LeetCode 92 (Reverse Linked List II) and 25 (Reverse Nodes in k-Group).
When reversing a sublist (e.g. from position m to n), use a dummy node to simplify: dummy.next = head. Traverse to the node before position m (call it left_prev). The sublist head is left_prev.next. Run the same iterative reverse logic for (n − m + 1) nodes, then set left_prev.next to the new head of the reversed segment and the old head of the segment (now tail) to the node after position n. The same three-pointer logic applies inside the segment.
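The dummy-node recipe can be sketched as follows. The function name reverse_between, the ListNode class, the helpers, and the precondition 1 ≤ m ≤ n ≤ length are assumptions for illustration; positions are 1-indexed as in the text.

```python
class ListNode:
    def __init__(self, val, next=None):
        self.val = val
        self.next = next


def from_list(values):
    """Helper: build a linked list from a Python list and return its head."""
    head = None
    for v in reversed(values):
        head = ListNode(v, head)
    return head


def to_list(head):
    """Helper: collect node values into a Python list."""
    out = []
    while head is not None:
        out.append(head.val)
        head = head.next
    return out


def reverse_between(head, m, n):
    """Reverse nodes at 1-indexed positions m..n (assumes 1 <= m <= n <= length)."""
    dummy = ListNode(0, head)
    left_prev = dummy
    for _ in range(m - 1):
        left_prev = left_prev.next

    segment_tail = left_prev.next  # first node of the segment; it ends up last
    prev, curr = None, segment_tail
    for _ in range(n - m + 1):     # same three-pointer loop, bounded by a count
        next_node = curr.next
        curr.next = prev
        prev = curr
        curr = next_node

    left_prev.next = prev          # prev is the new head of the reversed segment
    segment_tail.next = curr       # curr is the node after position n (or None)
    return dummy.next


print(to_list(reverse_between(from_list([1, 2, 3, 4, 5]), 2, 4)))  # prints [1, 4, 3, 2, 5]
```

The dummy node removes the special case m = 1: left_prev is then the dummy itself, and dummy.next picks up the new head.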
State the problem clearly: “Reverse the list in-place by changing next pointers; return the new head.” Give the iterative solution with prev, curr, and saving next_node before rewiring. Mention edge cases: empty list (return None), single node (return that node). If asked for recursion, give the recursive version and note that it uses O(n) stack space. Interviewers often follow up with “reverse between m and n” or “reverse in groups of k”—recognize that the core is the same reverse loop applied to a segment.
Practice Problems
- LeetCode 206: Reverse Linked List (implement both iterative and recursive).
- LeetCode 92: Reverse Linked List II (reverse nodes from position m to n; use dummy node and the same rewiring loop).
- LeetCode 25: Reverse Nodes in k-Group (reverse every k nodes; link reversed segments).
- LeetCode 234: Palindrome Linked List (reverse the second half and compare with the first half, or use a stack).
- LeetCode 24: Swap Nodes in Pairs (similar pointer discipline; can be seen as reverse in groups of 2).
Summary
- Reverse linked list means rewiring each node’s next to point to its previous node; the old tail becomes the new head, the old head becomes the tail (next = None).
- Iterative in-place: Use prev, curr, and save next_node = curr.next before setting curr.next = prev; then prev, curr = curr, next_node. Return prev when curr is None. Time O(n), space O(1).
- Recursive: Base case: empty or single node. Otherwise reverse head.next, set head.next.next = head and head.next = None, return the new head from the recursion. Time O(n), space O(n) stack.
- Always save the next node before overwriting curr.next. Return the new head (prev in iterative), not the original head. The same rewiring pattern extends to reversing a sublist or reversing in groups of k.
8.5 Detect Cycle (Floyd's Algorithm)
Introduction
A cycle in a linked list exists when some node’s next pointer eventually points back to an earlier node, so traversal never reaches None and instead loops forever. Detecting whether a list has a cycle—and optionally finding where the cycle starts—is a classic problem. The elegant solution is Floyd’s cycle-finding algorithm (also called the tortoise and hare): use two pointers that move at different speeds. If there is a cycle, they will eventually meet inside the cycle; if there is no cycle, the fast pointer reaches None. The algorithm runs in O(n) time and O(1) extra space, and the same slow/fast pointer idea appears in “find the middle of the list” and “find duplicate number” in an array of integers in range 1..n. This section builds the intuition, proves why the pointers meet, and shows how to find the cycle’s starting node.
Real-World Analogy
Imagine two runners who start at the same point on a track: one runs at speed 1 (one step per second), the other at speed 2 (two steps per second). If the track is a straight line (no cycle), the fast runner reaches the end and stops. If the track is a circle (cycle), the fast runner will eventually lap the slow runner, and they meet. You don’t need to mark every position or remember where you’ve been; you only need two runners and the rule “slow moves 1, fast moves 2.” That’s Floyd’s algorithm: the tortoise (slow) and the hare (fast) both start at the head; slow advances one node per step, fast advances two. If fast hits None, no cycle. If they meet, there is a cycle.
List: head → 1 → 2 → 3 → 4 → 5 → 3 (5 points back to 3). Slow and fast start at 1. After a few steps: slow is inside the cycle, fast is also inside and “catches up” from behind. They meet at some node inside the cycle. If the list were 1 → 2 → 3 → None, fast would become None and we return false.
Formal Definition
Cycle: A linked list has a cycle if some node reachable from the head can be reached again by repeatedly following next pointers. Equivalently, no node reachable from the head has next = None. Cycle detection: Given the head of a singly linked list, determine whether the list contains a cycle. Optionally, return the node where the cycle begins (cycle start). Floyd’s algorithm: Use two pointers (slow and fast) starting at head. Slow moves one step per iteration (slow = slow.next), fast moves two (fast = fast.next.next). If fast or fast.next becomes None, there is no cycle. If slow == fast at some point after moving, there is a cycle.
We do not use a hash set to store visited nodes—that would work but uses O(n) space. Floyd’s method uses only two pointers, so O(1) extra space.
Why This Topic Matters
- Interview staple: LeetCode 142 (Linked List Cycle II) and the simpler “has cycle?” (LeetCode 141) are very common. Interviewers use them to check understanding of two-pointer techniques and optional follow-up “find the start of the cycle.”
- Same pattern elsewhere: “Find the middle of the linked list” (slow once, fast twice; when fast reaches the end, slow is at the middle). “Find duplicate in array 1..n” (treat array indices as next pointers; then the array is like a linked list with a cycle, and Floyd finds the duplicate).
- Correctness and proof: Understanding why the tortoise and hare meet (and why moving one back to head and advancing both one step at a time finds the cycle start) separates memorized code from real understanding.
Mental Model
Picture the list as a path that is either a straight line (tail’s next is None) or a stick with a loop: a “tail” from head into the cycle, then a circle. Two pointers start at the head. Slow takes one step, fast takes two. On a straight line, fast hits None. In a loop, both eventually enter the cycle; once inside, fast gains one step per iteration relative to slow, so fast will catch slow after a number of steps at most the length of the cycle. So: if fast ever becomes None (or fast.next is None before we use fast.next.next), no cycle. If slow == fast, cycle.
Step-by-Step: Floyd’s Algorithm (Detection Only)
- Initialize slow = head and fast = head. (If head is None, return False.)
- Loop while fast is not None and fast.next is not None:
  - Move slow: slow = slow.next.
  - Move fast: fast = fast.next.next.
  - If slow == fast, return True (cycle detected).
- If the loop exits, fast reached None or had no next, so the list is acyclic. Return False.
We check fast.next before using fast.next.next to avoid calling next on None.
Why Do Slow and Fast Meet? (Intuition)
Suppose the list has a cycle. Let L be the number of nodes from head to the cycle entrance, and C the number of nodes in the cycle. After L steps, slow is at the cycle entrance. Fast might already be inside the cycle; it has moved 2L steps total. Think of fast as “ahead” of slow by some offset in the cycle. From here on, both are inside the cycle. Each step, fast moves one extra step relative to slow. So fast gains one “position” on slow per step. The cycle has C positions, so within C steps fast will have lapped slow—they meet. So total steps until meeting is O(L + C) = O(n).
Formally: when slow enters the cycle, fast is already somewhere in the cycle. Let d be the number of forward steps fast needs to reach slow’s position (0 ≤ d < C). Each iteration slow moves one step and fast moves two, so this gap shrinks by 1. After d iterations the gap is 0 and they meet. So they always meet within one full lap of the cycle after slow enters.
Finding the Cycle Start (Optional but Important)
Once slow and fast meet at some node meet, we can find the cycle’s starting node without extra space:
- Keep one pointer at meet and set another pointer entry = head.
- Move both one step at a time: entry = entry.next, meet = meet.next (or move the pointer that was at meet). They will meet at the cycle entrance.
Why? Let L = distance head → cycle start, C = cycle length. Suppose slow and fast first meet after slow has taken s steps, at a node a steps past the cycle start (0 ≤ a < C). Fast has taken 2s steps and stands on the same node, so the extra s steps are whole laps of the cycle: s = kC for some integer k ≥ 1. Slow’s own path is L steps to the cycle start plus a steps inside the cycle (plus possibly whole laps), so s and L + a differ by a multiple of C. Together these give L + a ≡ 0 (mod C), that is, L ≡ C − a (mod C). Now start one pointer at head and one at the meeting point and move both one step at a time. After L steps the first pointer is at the cycle start, and the second is a + L steps past the cycle start, which is a whole number of laps, so it is at the cycle start too. They cannot meet earlier, because the first pointer is not inside the cycle before step L. So the two pointers meet exactly at the cycle start after L steps.
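The relation between L, a, and C (that L + a is a multiple of C) can be checked numerically. The sketch below simulates the pointers on integer positions instead of real nodes, which is an assumption made to keep it short: positions 0..L−1 form the tail and positions L..L+C−1 form the cycle. The function names are hypothetical.

```python
def step(pos, L, C):
    """Advance one position on a list with an L-node tail and a C-node cycle."""
    pos += 1
    return pos if pos < L + C else L  # the last cycle node links back to position L


def meeting_offset(L, C):
    """Run phase 1; return (steps taken by slow, offset a of the meeting node)."""
    slow = fast = 0
    steps = 0
    while True:
        slow = step(slow, L, C)
        fast = step(step(fast, L, C), L, C)
        steps += 1
        if slow == fast:
            return steps, slow - L


def entry_position(L, C):
    """Run phase 2 from the meeting point; return where the two pointers meet."""
    _, a = meeting_offset(L, C)
    entry, meet = 0, L + a
    while entry != meet:
        entry = step(entry, L, C)
        meet = step(meet, L, C)
    return entry


steps, a = meeting_offset(2, 4)
print(steps, a, entry_position(2, 4))  # prints 4 2 2
```

With L = 2 and C = 4 the pointers meet after 4 steps at offset a = 2, so L + a = 4 is one full lap, and phase 2 stops at position 2, the cycle start.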
ASCII Diagram: List With Cycle
List with cycle (tail points into the loop):

 head
  │
  ▼
┌───┐     ┌───┐     ┌───┐     ┌───┐
│ 1 │────►│ 2 │────►│ 3 │────►│ 4 │
└───┘     └───┘     └───┘     └───┘
                      ▲         │
                      │         ▼
                    ┌───┐     ┌───┐
                    │ 6 │◄────│ 5 │
                    └───┘     └───┘

Cycle: 3 → 4 → 5 → 6 → 3 ...
Slow and fast both enter; fast catches slow inside the cycle.
Evolution: Hash Set vs Floyd
Two main approaches:
Approach 1: Hash Set (Visited Nodes)
Traverse from head. For each node, check if it is in a set of visited nodes; if yes, cycle (and this node is in the cycle). If no, add the node and move to next. If you reach None, no cycle. Time O(n), Space O(n). Easy to implement and to find cycle start (first repeated node).
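A minimal sketch of the visited-set approach, assuming a hypothetical ListNode class that does not override equality or hashing, so the set compares nodes by identity. The function name cycle_start_with_set is illustrative.

```python
class ListNode:
    def __init__(self, val):
        self.val = val
        self.next = None


def cycle_start_with_set(head):
    """Return the first node seen twice (the cycle start), or None if acyclic."""
    visited = set()  # stores node objects; default hashing is by identity
    curr = head
    while curr is not None:
        if curr in visited:
            return curr
        visited.add(curr)
        curr = curr.next
    return None


# Build 1 -> 2 -> 3 -> 4 -> 5 -> back to 3.
nodes = [ListNode(v) for v in range(1, 6)]
for a, b in zip(nodes, nodes[1:]):
    a.next = b
nodes[-1].next = nodes[2]
print(cycle_start_with_set(nodes[0]).val)  # prints 3
```

Storing the nodes themselves rather than their values is deliberate: two different nodes may hold the same value without forming a cycle.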
Approach 2: Floyd’s Algorithm (Two Pointers)
Slow and fast as above. No set. Time O(n), Space O(1). Preferred when O(1) space is required. Cycle start can still be found by the “entry from head” trick after detection.
| Approach | Time | Space |
|---|---|---|
| Hash set (visited) | O(n) | O(n) |
| Floyd (slow/fast) | O(n) | O(1) |
Python Implementation
Detection Only (Has Cycle?)
def has_cycle(head):
    if head is None:
        return False
    slow = head
    fast = head
    while fast is not None and fast.next is not None:
        slow = slow.next
        fast = fast.next.next
        if slow == fast:
            return True
    return False
Detection + Return Cycle Start (or None)
def detect_cycle_start(head):
    if head is None or head.next is None:
        return None
    slow = head
    fast = head
    while fast is not None and fast.next is not None:
        slow = slow.next
        fast = fast.next.next
        if slow == fast:
            entry = head
            while entry != slow:
                entry = entry.next
                slow = slow.next
            return entry
    return None
After finding the meeting point, we move entry from head and slow from meet one step at a time until they are equal; that node is the cycle start.
Line-by-Line Explanation (Detection)
- if head is None: return False: Empty list has no cycle.
- slow = fast = head: Both start at the head.
- while fast is not None and fast.next is not None: We need fast.next to exist so we can use fast.next.next without raising. If fast reaches the end, no cycle.
- slow = slow.next, fast = fast.next.next: Tortoise moves 1 step, hare moves 2.
- if slow == fast: return True: Same node → we’re in a cycle.
- return False: Loop exited because fast hit the end.
Time Complexity
When there is no cycle: fast reaches None in O(n) steps (at most n/2 iterations of the loop). When there is a cycle: as argued, slow and fast meet in O(L + C) = O(n) steps. So time O(n) in both cases.
Space Complexity
Only a fixed number of pointers (slow, fast, and optionally entry). Space O(1).
Edge Cases
- Empty list: head is None → return False (or None for cycle start).
- Single node, no cycle: head.next is None. With fast = head, the loop condition fails immediately because fast.next is None, so we never enter the loop and return False. Correct.
- Single node, self-cycle: head.next = head. The implementation moves first and then compares: after one iteration slow = head.next = head and fast = head.next.next = head, so slow == fast and we return True. Correct.
Checking fast and fast.next before advancing avoids null dereference. For “cycle start,” if there is no cycle we return None; if there is, the entry/slow phase runs in O(n) and finds the start.
Common Mistakes
- Using fast.next.next without checking fast.next: If the list has an odd number of nodes and no cycle, fast lands on the last node; then fast.next is None and fast.next.next raises AttributeError. Always ensure fast is not None and fast.next is not None before moving fast by two.
- Comparing before moving: If you check slow == fast at the start of the loop (before updating), both are head and you return True for a list that has no cycle. So: move first, then check; or start with slow at head and fast at head.next and then loop (with proper null checks). The code above moves first and then checks, so we never compare the initial equal state; we only compare after at least one step.
- Wrong cycle start: The cycle start is not necessarily the meeting node. You must do the second phase: one pointer from head, one from the meeting node, step both until they meet.
Floyd’s algorithm is already optimal for cycle detection in terms of extra space: you need at least one pointer to traverse, and with two pointers you get O(1) space and O(n) time. You cannot do better than O(n) time in the worst case (you may have to enter the cycle to detect it). The hash set approach trades space for implementation simplicity; use Floyd when the problem asks for O(1) space or when you want to show the classic solution.
Pattern Recognition
Slow/fast (tortoise and hare): Same-direction two pointers with different step sizes. Use for: (1) cycle detection in a linked list, (2) finding the middle of a linked list (slow 1, fast 2; when fast reaches the end, slow is at the middle), (3) “find duplicate number” in an array where values are in 1..n and there is exactly one duplicate (treat index i → value arr[i] as next pointer; then Floyd finds the duplicate as the cycle start). The pattern is “two pointers, different speeds; use the meeting point or the relative position to infer something about the structure.”
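The array variant in (3) can be sketched as follows. This is a minimal illustration, assuming the input has n + 1 integers in 1..n with a single repeated value; the function name find_duplicate is chosen here for the example.

```python
def find_duplicate(nums):
    """Treat index i -> nums[i] as a 'next' pointer and find the cycle start.

    Assumes len(nums) == n + 1, every value is in 1..n, and one value repeats.
    """
    # Phase 1: tortoise and hare until they meet somewhere inside the cycle.
    slow = fast = nums[0]
    while True:
        slow = nums[slow]
        fast = nums[nums[fast]]
        if slow == fast:
            break
    # Phase 2: restart one pointer; step both by one until they meet.
    entry = nums[0]
    while entry != slow:
        entry = nums[entry]
        slow = nums[slow]
    return entry


print(find_duplicate([1, 3, 4, 2, 2]))  # 2
print(find_duplicate([3, 1, 3, 4, 2]))  # 3
```

No value equals 0, so index 0 is never inside the cycle; the duplicated value is the one index that two different positions point to, which makes it the cycle entrance.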
For “find the middle node” of a linked list: slow = fast = head, then while fast and fast.next: slow = slow.next, fast = fast.next.next. When the loop ends, slow is the middle (or the second of the two middle nodes in an even-length list, which is the node LeetCode 876 asks for). Same loop structure as cycle detection; there is no cycle, so fast eventually reaches the end of the list.
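A minimal runnable sketch of the middle-node loop, assuming a simple ListNode class and a from_values helper (both introduced here for illustration). Running it on a four-node list shows which of the two middle nodes this loop returns.

```python
class ListNode:
    """Assumed node shape: a value plus a next reference."""
    def __init__(self, val, next=None):
        self.val = val
        self.next = next


def from_values(values):
    """Hypothetical helper: build a list from a Python sequence."""
    head = None
    for v in reversed(values):
        head = ListNode(v, head)
    return head


def middle_node(head):
    """Slow moves 1 step, fast moves 2; slow is the middle when fast runs out."""
    slow = fast = head
    while fast is not None and fast.next is not None:
        slow = slow.next
        fast = fast.next.next
    return slow


print(middle_node(from_values([1, 2, 3, 4, 5])).val)  # 3
print(middle_node(from_values([1, 2, 3, 4])).val)     # 3
```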
State the problem: “Determine if the list has a cycle and optionally return the node where the cycle begins.” Give Floyd’s algorithm: slow and fast from head; move slow by 1, fast by 2; if they meet, cycle; if fast hits None, no cycle. Mention the null check for fast.next before fast.next.next. For “find cycle start,” explain the second phase: pointer from head and pointer from meeting point, both advance one step at a time until they meet—that’s the cycle start. If asked “why do they meet?”, give the intuition: once both are in the cycle, fast gains one step per iteration, so within cycle length steps they meet.
Practice Problems
- LeetCode 141: Linked List Cycle (detection only).
- LeetCode 142: Linked List Cycle II (detect cycle and return the cycle start node).
- LeetCode 287: Find the Duplicate Number (array of n+1 integers in 1..n; one duplicate; use indices as next pointers and apply Floyd to find the duplicate).
- LeetCode 876: Middle of the Linked List (slow/fast; when fast reaches end, slow is middle).
Summary
- A list has a cycle if following next never reaches None and eventually repeats a node. Floyd’s algorithm uses two pointers (slow and fast) starting at head; slow moves 1 step, fast 2 steps per iteration. If they meet, there is a cycle; if fast becomes None (or fast.next is None), there is no cycle.
- Always check fast is not None and fast.next is not None before using fast.next.next. Move slow and fast first, then check equality (to avoid falsely detecting a “cycle” at head when there isn’t one).
- To find cycle start: after slow and fast meet, set one pointer to head and keep one at the meeting node; advance both one step at a time until they meet; that node is the cycle entrance. Time O(n), space O(1).
- The same slow/fast pattern is used for finding the middle of a list and for “find duplicate in array 1..n” by modeling the array as a linked list with a cycle.
8.6 Merge Lists
Introduction
Merging two sorted linked lists means combining them into one sorted list by repeatedly taking the smaller of the two current heads and appending it to the result. It is the linked-list version of the “merge” step in merge sort and is one of the most common list operations in interviews. You use a dummy node to avoid special-casing the first node, and two pointers (one per list) that advance as you attach nodes. The result can be built in-place (reusing the existing nodes) so that time is O(n + m) and extra space is O(1). This section covers the two-sorted-list merge in detail, then briefly extends to merge k sorted lists (using a heap or repeated two-way merges).
Real-World Analogy
Imagine two sorted stacks of cards (e.g. both sorted by number). You want one combined sorted stack. You look at the top of each stack; take the smaller card and put it face-down on the result pile; repeat until one stack is empty, then put the rest of the other stack on top. You never need to “search” for where to insert—the next smallest overall is always one of the two tops. Merging two sorted linked lists is the same: the “top” is the head of each list; you compare, take the smaller, advance that list’s pointer, and attach the node to the merged list.
List A: 1 → 3 → 5. List B: 2 → 4 → 6. Compare 1 and 2, take 1; compare 3 and 2, take 2; compare 3 and 4, take 3; and so on. Result: 1 → 2 → 3 → 4 → 5 → 6. Each node is chosen in O(1) time; total O(n + m).
Formal Definition
Merge two sorted lists: Given the heads of two singly linked lists sorted in non-decreasing order, merge them into one sorted list. We “merge” by repeatedly choosing the smaller of the two current head values, appending that node to the result, and advancing the pointer of the list we took from. The merged list should use the existing nodes (in-place) or new nodes, as required. Return the head of the merged list. Either list may be empty; the result is the other list. This is the same logic as the merge step in merge sort, but on linked lists instead of arrays.
The key is to avoid special-casing “who is the first node?” A dummy node (a temporary node whose next will point to the real head) lets us always do “append to current tail” in the same way; at the end we return dummy.next.
Why This Topic Matters
- Interview staple: LeetCode 21 (Merge Two Sorted Lists) is extremely common. It tests pointer handling and the dummy-node pattern. LeetCode 23 (Merge k Sorted Lists) is a natural follow-up.
- Merge sort on lists: Merge sort for linked lists works by splitting the list (e.g. slow/fast for middle), recursively sorting the two halves, then merging with this exact algorithm. So “merge two sorted lists” is the core subroutine.
- Reusable pattern: The same “two pointers, compare, take smaller, advance” pattern appears in merging sorted arrays and in two-pointer problems on sorted data. Master it once for lists and you reuse it everywhere.
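The merge-sort connection in the second bullet can be sketched like this. It is an illustrative outline, not a reference solution: ListNode, from_values and to_values are assumed helpers, and the split starts fast one node ahead so that slow stops on the last node of the left half.

```python
class ListNode:
    """Assumed node shape: a value plus a next reference."""
    def __init__(self, val=0, next=None):
        self.val = val
        self.next = next


def merge_two_lists(list1, list2):
    """Dummy-node merge of two sorted lists, reusing the input nodes."""
    dummy = ListNode(0)
    tail = dummy
    p, q = list1, list2
    while p is not None and q is not None:
        if p.val <= q.val:
            tail.next = p
            p = p.next
        else:
            tail.next = q
            q = q.next
        tail = tail.next
    tail.next = p if p is not None else q
    return dummy.next


def sort_list(head):
    """Merge sort: split at the middle, sort each half, merge the halves."""
    if head is None or head.next is None:
        return head
    slow, fast = head, head.next
    while fast is not None and fast.next is not None:
        slow = slow.next
        fast = fast.next.next
    right = slow.next
    slow.next = None  # cut the list into two halves
    return merge_two_lists(sort_list(head), sort_list(right))


def from_values(values):
    """Hypothetical helper: build a list from a Python sequence."""
    head = None
    for v in reversed(values):
        head = ListNode(v, head)
    return head


def to_values(head):
    """Hypothetical helper: collect node values into a Python list."""
    out = []
    while head is not None:
        out.append(head.val)
        head = head.next
    return out


print(to_values(sort_list(from_values([4, 2, 1, 3]))))  # [1, 2, 3, 4]
```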
Mental Model
You have two “current” nodes: p for list 1, q for list 2. You also maintain a “tail” of the merged list (initially the dummy). In each step: if one list is exhausted, attach the rest of the other to the tail and stop. Otherwise, compare p.val and q.val; attach the smaller node to the tail, advance that list’s pointer (p or q), and set tail to the node you just attached. Repeat until both lists are consumed. The dummy’s next is the head of the merged list.
Step-by-Step Breakdown
- Create a dummy node and set tail = dummy. We will build the merged list by doing tail.next = ... and then tail = tail.next.
- Set p = list1, q = list2.
- While both p and q are not None:
  - If p.val <= q.val: set tail.next = p, then p = p.next.
  - Else: set tail.next = q, then q = q.next.
  - Set tail = tail.next (the node we just attached).
- After the loop, at most one of p and q still has nodes. Attach the remainder: tail.next = p if p is not None else q.
- Return dummy.next (the real head; dummy is not part of the result).
ASCII Diagram: Merge in Progress
list1: 1 → 3 → 5 → None    list2: 2 → 4 → 6 → None
After attaching 1 and 2:
dummy → 1 → 2        (merged so far; tail is at 2)
p → 3 → 5 → None     (rest of list1)
q → 4 → 6 → None     (rest of list2)
We compare p.val (3) and q.val (4); take 3, so tail.next = p, tail = p, p = p.next.
Then tail.next = q (4), tail = q, q = q.next; etc.
Finally attach remaining: tail.next = p or q (whichever is non-None).
Python Implementation
Merge Two Sorted Lists (In-Place)
def merge_two_lists(list1, list2):
    dummy = ListNode(0)  # or Node(0) with .val and .next
    tail = dummy
    p, q = list1, list2
    while p is not None and q is not None:
        if p.val <= q.val:
            tail.next = p
            p = p.next
        else:
            tail.next = q
            q = q.next
        tail = tail.next
    tail.next = p if p is not None else q
    return dummy.next
We reuse existing nodes; no new list allocation. Only dummy is extra (one node).
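A short usage sketch makes the "reuse existing nodes" claim checkable. The ListNode class and the from_values / to_values helpers are assumptions added for the example, and the merge function is repeated so the block runs on its own.

```python
class ListNode:
    """Assumed node shape: a value plus a next reference."""
    def __init__(self, val=0, next=None):
        self.val = val
        self.next = next


def merge_two_lists(list1, list2):
    dummy = ListNode(0)
    tail = dummy
    p, q = list1, list2
    while p is not None and q is not None:
        if p.val <= q.val:
            tail.next = p
            p = p.next
        else:
            tail.next = q
            q = q.next
        tail = tail.next
    tail.next = p if p is not None else q
    return dummy.next


def from_values(values):
    """Hypothetical helper: build a list from a Python sequence."""
    head = None
    for v in reversed(values):
        head = ListNode(v, head)
    return head


def to_values(head):
    """Hypothetical helper: collect node values into a Python list."""
    out = []
    while head is not None:
        out.append(head.val)
        head = head.next
    return out


a = from_values([1, 3, 5])
b = from_values([2, 4, 6])
merged = merge_two_lists(a, b)
print(to_values(merged))  # [1, 2, 3, 4, 5, 6]
print(merged is a)        # True: the result starts at list1's original head node
```

Because nodes are relinked rather than copied, the input heads a and b should not be used as separate lists after the merge.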
Line-by-Line Explanation
- dummy = ListNode(0), tail = dummy: Dummy gives us a uniform “append” point. We always set tail.next and then move tail to the new end.
- p, q = list1, list2: Current heads of the two lists.
- while p is not None and q is not None: As long as both lists have a node, we compare and take one.
- if p.val <= q.val: Take the smaller (use <= so list1 is preferred when equal; either way is fine). tail.next = p attaches that node; p = p.next advances list1.
- tail = tail.next: The new tail is the node we just attached.
- tail.next = p if p is not None else q: After the loop, one list may still have nodes. Attach the rest in one shot (at least one of p and q is None, so the other is the remainder).
- return dummy.next: The first real node of the merged list; dummy is discarded.
Time Complexity
Each node from both lists is attached exactly once. We do O(1) work per node (compare, set pointers). So time O(n + m) where n and m are the lengths of the two lists.
Space Complexity
We only use a dummy node and a few pointers. No extra list or recursion. Space O(1) (the merged list reuses the input nodes; the dummy is one node).
Edge Cases
- Both lists empty: p and q are None; we never enter the loop; tail.next = p if p is not None else q sets tail.next = None. We return dummy.next = None. Correct.
- One list empty: We don’t enter the while loop; tail.next is set to the non-empty list’s head. Correct.
- Single node in one list: Handled by the same logic; we attach that node and the other list’s remainder.
Common Mistakes
- Forgetting to advance the tail: After tail.next = p (or q), you must do tail = tail.next. Otherwise the next attachment overwrites the same tail.next and you lose the rest of the list. Always move tail to the node you just attached.
- Returning the wrong head: Return dummy.next, not dummy. The dummy is not part of the data; it was only a handle.
- Not handling empty lists: The code above handles “both empty” and “one empty” via the loop and the final tail.next = p if p else q. No need for an extra if not list1: return list2 unless you want an early exit for clarity.
Merge K Sorted Lists (Brief)
Given k sorted linked lists, merge them into one sorted list. Two main approaches:
Approach 1: Repeated Two-Way Merge
Merge list 0 with list 1, then merge that result with list 2, and so on. Total time O(k × total nodes) in the worst case (each merge touches the growing result). Simple but can be slow when k is large.
Approach 2: Min-Heap (Priority Queue)
Put the head of each list into a min-heap (by value). Repeatedly pop the smallest, append it to the result, and push its next if not None. Each pop/push is O(log k); we do it for every node across all lists. Time O(N log k) where N is the total number of nodes and k is the number of lists. Space O(k) for the heap. This is the standard optimal solution for LeetCode 23.
In Python for “merge k lists,” use heapq. Push (node.val, id(node), node) so that nodes are comparable (heapq compares by tuple; if values tie, id(node) breaks ties). Or use a wrapper class that implements __lt__ by comparing val. When you pop, get the node, set tail.next = node, tail = node, and if node.next is not None, push node.next.
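The heap approach described above can be sketched as follows. This is one possible arrangement, assuming a simple ListNode class; from_values and to_values are hypothetical helpers used only to run the example.

```python
import heapq


class ListNode:
    """Assumed node shape: a value plus a next reference."""
    def __init__(self, val=0, next=None):
        self.val = val
        self.next = next


def merge_k_lists(lists):
    """Merge k sorted lists with a min-heap holding one node per list."""
    heap = []
    for head in lists:
        if head is not None:
            # id(head) breaks ties so ListNode objects are never compared.
            heapq.heappush(heap, (head.val, id(head), head))
    dummy = ListNode(0)
    tail = dummy
    while heap:
        _, _, node = heapq.heappop(heap)
        tail.next = node
        tail = node
        if node.next is not None:
            heapq.heappush(heap, (node.next.val, id(node.next), node.next))
    return dummy.next


def from_values(values):
    """Hypothetical helper: build a list from a Python sequence."""
    head = None
    for v in reversed(values):
        head = ListNode(v, head)
    return head


def to_values(head):
    """Hypothetical helper: collect node values into a Python list."""
    out = []
    while head is not None:
        out.append(head.val)
        head = head.next
    return out


lists = [from_values([1, 4, 5]), from_values([1, 3, 4]), from_values([2, 6])]
print(to_values(merge_k_lists(lists)))  # [1, 1, 2, 3, 4, 4, 5, 6]
```

The heap never holds more than k entries, which is where the O(k) space bound comes from.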
Pattern Recognition
Two-pointer merge: Two sorted sequences, two pointers; compare current elements, take the smaller, advance that pointer. Same idea as merging two sorted arrays; here we only change next pointers. Dummy node: When building a new list and the head is not known in advance (or you want to avoid “if first node” branches), use a dummy as the initial tail and return dummy.next. This pattern appears in “merge two lists,” “partition list,” and “remove elements.”
State the problem: “Merge two sorted linked lists into one sorted list, reusing nodes.” Use a dummy node and a tail; in a loop, compare the two current heads, attach the smaller to tail, advance that pointer and tail. After the loop, attach the remainder. Return dummy.next. Mention edge cases: both empty, one empty. If asked to merge k lists, give the heap approach: push all heads, pop smallest, append, push next; O(N log k) time, O(k) space.
Practice Problems
- LeetCode 21: Merge Two Sorted Lists (implement with dummy and in-place).
- LeetCode 23: Merge k Sorted Lists (heap of k heads; pop smallest, push next).
- LeetCode 2: Add Two Numbers (two lists representing digits; similar two-pointer traverse and build result).
- LeetCode 148: Sort List (merge sort: find middle with slow/fast, recurse, merge two sorted halves).
Summary
- Merge two sorted lists: Use a dummy node and a tail; compare the two current heads (p, q), attach the smaller to tail.next, advance that pointer and tail; then attach the remainder and return dummy.next. Time O(n + m), space O(1) excluding the dummy.
- Always set tail = tail.next after attaching a node so the next attachment goes to the new end.
- For merge k sorted lists, use a min-heap of the k heads; pop smallest, append to result, push next; time O(N log k), space O(k). The two-way merge pattern is the core of merge sort on linked lists.
8.7 Intersection Point
Introduction
Two singly linked lists may intersect: at some node they merge into a single list and share the same nodes until the end. The lists can have different lengths before the intersection. The problem is: given the heads of two such lists, find the intersection node (the first node that is common to both lists), or return None if they do not intersect. You cannot modify the lists. The elegant O(1) space solution uses two pointers that traverse both lists in a way that aligns their “distance from the end”: either by computing lengths and advancing the longer list’s pointer by the difference, or by having each pointer traverse list A then list B (and the other list B then list A) so they cover the same total distance and meet at the intersection. This section covers both approaches and the hash-set fallback.
Real-World Analogy
Imagine two roads that start in different places but later merge into one. Two people start at the two beginnings and walk at the same speed. If one road is longer before the merge, the person on it reaches the merge later. To make them meet at the merge point, give the person on the longer road a head start of exactly the length difference. Alternatively: person A walks “road 1 then road 2,” person B walks “road 2 then road 1.” They cover the same total distance, so on their second leg they reach the merge point at the same moment. If there is no merge, both run out of road at the same time without ever standing on a shared node. That’s the two-pointer idea for intersection.
List A: 4 → 1 → 8 → 4 → 5. List B: 5 → 6 → 1 → 8 → 4 → 5. The tail 8 → 4 → 5 is shared; the intersection node is the first shared node—the one with value 8. A has 2 nodes before intersection; B has 3. If we align so both pointers are “same distance from the end,” they will meet at the intersection node.
Formal Definition
Intersection of two linked lists: We are given two singly linked list heads, headA and headB. The lists may have different lengths. If they intersect, they share a common tail: from some node onward, next is the same for both. The intersection node is the first node that appears in both lists when traversing from the respective heads. If the lists do not intersect (both have distinct tails), return None. We assume no cycles and do not modify the lists. Return the intersection node (the actual node object), or None.
Constraints: list lengths can differ; we want O(1) extra space if possible; we only traverse and compare nodes by reference (identity), not by value.
Why This Topic Matters
- Interview staple: LeetCode 160 (Intersection of Two Linked Lists) is a common two-pointer problem. It tests whether you can “align” two traversals without extra space.
- Alignment idea: The trick—making two pointers travel the same effective distance so they meet at the target—reappears in “find cycle start” (head + meet) and in problems where you need to compare or sync two sequences of different lengths.
- Reference equality: Intersection is defined by node identity (same object), not value. Two nodes with the same value are not necessarily the intersection node; we compare node references with is (or ==, which falls back to identity when the node class does not define __eq__).
Mental Model
Picture two lists: each has a “unique” part and then a “common” part. The lengths of the unique parts can differ. If we start two pointers at the two heads and move them one step at a time, they will not meet at the intersection in one pass because one list might be longer before the merge. So we either: (1) compute both lengths, advance the longer list’s pointer by |lenA − lenB| steps so both pointers are the same distance from the end, then step both until they are equal or both None; or (2) run pointer A over list A then list B, and pointer B over list B then list A—each travels lenA + lenB, so they meet at the intersection node on the second “leg” if it exists.
Step-by-Step Breakdown
Method 1: Length Difference
- Traverse list A to get length lenA; traverse list B to get lenB.
- Set p = headA and q = headB. If lenA > lenB, advance p by lenA − lenB steps; otherwise advance q by lenB − lenA steps. Now both pointers are the same distance from the end of their lists.
- Move both pointers one step at a time until they point to the same node (intersection) or both become None (no intersection). Return the node where they meet, or None.
Method 2: Two Pointers (A then B, B then A)
- Set p = headA, q = headB.
- While p != q: if p is None, set p = headB; else set p = p.next. If q is None, set q = headA; else set q = q.next. So when one pointer reaches the end of its list, it continues from the other list’s head.
- When the loop exits, p == q. If they are both None, there is no intersection; otherwise p (or q) is the intersection node. Return p.
In method 2, each pointer travels lenA + lenB nodes. So they meet at the intersection node (when both are on the common part) or both become None after the same number of steps (when there is no intersection).
ASCII Diagram: Intersection
headA:      A1 → A2 → \
                       C1 → C2 → C3 → None   (shared tail)
headB: B1 → B2 → B3 → /
Unique A: 2 nodes. Unique B: 3 nodes. Common: 3 nodes.
Intersection node = C1.
Length method: lenA=5, lenB=6. Advance B by 1: start B at B2.
  Then move both: A1,B2 → A2,B3 → C1,C1 → meet at C1.
Two-pointer method: p goes A1→A2→C1→C2→C3→(switch to B) B1→B2→B3→C1.
                    q goes B1→B2→B3→C1→C2→C3→(switch to A) A1→A2→C1.
  They meet at C1 when both are on the common part.
Python Implementation
Method 1: Length Difference
def get_intersection_node(headA, headB):
    def length(head):
        n = 0
        while head:
            n += 1
            head = head.next
        return n

    lenA, lenB = length(headA), length(headB)
    p, q = headA, headB
    if lenA > lenB:
        for _ in range(lenA - lenB):
            p = p.next
    else:
        for _ in range(lenB - lenA):
            q = q.next
    while p is not q:
        p = p.next
        q = q.next
    return p
Method 2: Two Pointers (No Length)
def get_intersection_node(headA, headB):
    p, q = headA, headB
    while p is not q:
        p = p.next if p else headB
        q = q.next if q else headA
    return p
When p reaches the end of A, we set p = headB; when q reaches the end of B, we set q = headA. So each pointer eventually traverses A then B (or B then A). They meet at the intersection node or both become None.
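To try method 2 on the lists from the diagram (A1, A2 and B1, B2, B3 feeding into the shared tail C1 → C2 → C3), the sketch below builds two lists that really share node objects. ListNode and the chain helper are assumptions introduced for the example; the function is restated so the block is runnable by itself.

```python
class ListNode:
    """Assumed node shape: a value plus a next reference."""
    def __init__(self, val, next=None):
        self.val = val
        self.next = next


def get_intersection_node(headA, headB):
    p, q = headA, headB
    while p is not q:
        p = p.next if p else headB
        q = q.next if q else headA
    return p


def chain(labels, tail=None):
    """Hypothetical helper: create nodes for labels and link the last one to tail."""
    head = tail
    for label in reversed(labels):
        head = ListNode(label, head)
    return head


shared = chain(["C1", "C2", "C3"])
head_a = chain(["A1", "A2"], tail=shared)
head_b = chain(["B1", "B2", "B3"], tail=shared)

meet = get_intersection_node(head_a, head_b)
print(meet.val)        # C1
print(meet is shared)  # True: same node object, not just an equal value

# Equal values but separate node objects: no intersection.
print(get_intersection_node(chain([1, 2]), chain([1, 2])))  # None
```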
Line-by-Line Explanation (Method 2)
- p, q = headA, headB: Start at the two heads.
- while p is not q: We use is for reference equality (same node object). When they are the same node, or both None, we exit.
- p = p.next if p else headB: If p is not None, advance; if p is None (we’ve passed the end of A), switch to list B’s head. Same for q with headA.
- return p: After the loop, p is q. If there was no intersection, both traversed to the end and are None. If there was an intersection, both are at the intersection node. So p is the correct return value in both cases.
Time Complexity
Method 1: Two length passes O(n + m), then advancing by the difference O(|n − m|), then stepping until meet O(min(n, m)). Total O(n + m).
Method 2: Each pointer travels at most n + m nodes (list A then list B, or vice versa). So at most 2(n + m) steps. O(n + m).
Space Complexity
Both methods use only a few pointers (and method 1 uses a length helper with O(1) extra space). Space O(1).
Edge Cases
- No intersection: Lists have different tails. Method 2: both pointers become None at the same step (after traversing A+B and B+A); p is q when both are None, so we return None. Correct.
- One or both lists empty: If headA is None, in method 2 the first iteration sets p = headB (the conditional expression yields headB when p is None), and both pointers then reach None together, so we return None. If both heads are None, p is q holds immediately and we return None. Correct.
- Same list: If headA is headB, we return headA immediately (p is q before the first iteration).
- Intersection at head of one list: One list is a suffix of the other. The length method or the two-pointer method still finds the first common node.
Common Mistakes
- Comparing by value instead of by reference: The intersection is defined by node identity: the first node that is the same object in both lists. Two nodes with the same value are not necessarily the intersection. Use p is q (or id(p) == id(q)), not p.val == q.val, to detect the intersection node.
- Modifying the lists: We must not change next pointers. The problem expects a read-only traversal. Method 1 and 2 only read; don’t reverse or mutate.
- In method 2, wrong switch: When p is None we set p = headB (the other list), not headA. Mixing these up breaks the “same distance” property.
Alternative: Hash Set
Traverse list A and store every node (or node id) in a set. Then traverse list B; the first node that is in the set is the intersection node. Time O(n + m), Space O(n) or O(m). Use when O(1) space is not required; the two-pointer method is preferred for O(1) space.
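A minimal sketch of the hash-set alternative, storing node ids so the lookup depends only on identity. The function name get_intersection_node_hashset, the ListNode class, and the chain helper are assumptions made for this example.

```python
class ListNode:
    """Assumed node shape: a value plus a next reference."""
    def __init__(self, val, next=None):
        self.val = val
        self.next = next


def get_intersection_node_hashset(headA, headB):
    """Record every node of list A, then return the first node of B seen before."""
    seen = set()
    node = headA
    while node is not None:
        seen.add(id(node))  # identity, not value
        node = node.next
    node = headB
    while node is not None:
        if id(node) in seen:
            return node
        node = node.next
    return None


def chain(labels, tail=None):
    """Hypothetical helper: create nodes for labels and link the last one to tail."""
    head = tail
    for label in reversed(labels):
        head = ListNode(label, head)
    return head


common = chain([8, 4, 5])
list_a = chain([4, 1], tail=common)
list_b = chain([5, 6, 1], tail=common)
print(get_intersection_node_hashset(list_a, list_b).val)  # 8
```

Both lists contain a node with value 1 before the shared part, yet the result is the node with value 8, because only the shared node objects are found in the set.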
The two-pointer method (A then B, B then A) avoids computing lengths and uses a single loop. It is usually the cleanest to code and to explain. Both the length-difference and the two-pointer method achieve O(n + m) time and O(1) space; choose the one you find clearer. The hash set is O(n + m) time but O(n) space—mention it as an alternative if the interviewer allows extra space.
Pattern Recognition
Aligning two traversals: When two sequences have different lengths but share a common suffix (or you want them to “meet” at a certain point), you can either (1) equalize the distance from the end by advancing the longer one, or (2) make each pointer traverse both sequences so they cover the same total length. The same idea appears in “find cycle start” (one pointer from head, one from meeting point; both step once until they meet).
If the problem said “intersection by value” (first node where values match), you’d need to be careful: multiple nodes can have the same value. The standard LeetCode 160 problem is intersection by reference (same node object). Always clarify in an interview: “Is the intersection defined by the same node object or by equal value?”
State the problem: “Find the first node that is common to both lists, or None if they don’t intersect. No cycles, don’t modify the lists.” Give the two-pointer method: p and q start at headA and headB; when p hits the end of A, set p = headB; when q hits the end of B, set q = headA; loop until p is q; return p. Explain why they meet: each travels lenA + lenB, so they align on the common part (or both become None). Mention edge cases: no intersection (return None), one list empty. If asked for another approach, give the length-difference method or the hash set.
Practice Problems
- LeetCode 160: Intersection of Two Linked Lists (implement two-pointer and/or length-difference; O(1) space).
- Variation: If lists could have cycles, you must first detect cycles and find cycle starts; then reason about intersection of two lists that may have cycles (more complex).
Summary
- Intersection point is the first node that is common to both lists (by reference). Use two pointers and align their effective path length so they meet at that node or both become None.
- Length method: Get lenA and lenB; advance the longer list’s pointer by |lenA − lenB|; then step both until they are equal. Return the meeting node or None.
- Two-pointer method: p = headA, q = headB. While p is not q: p = p.next if p else headB, q = q.next if q else headA. Return p. Each pointer traverses list A then B (or B then A), so they meet at the intersection or both become None. Time O(n + m), space O(1).
- Compare nodes by reference (p is q), not by value. Do not modify the lists.
8.8 Remove Nth Node
Introduction
Remove the nth node from the end of a singly linked list: given the head and an integer n, remove the node that is the n-th from the end (1-indexed from the end) and return the head. For example, n = 1 means remove the last node; n = 2 means remove the second-to-last; if the list has length L, n = L means remove the first node (head). The challenge is that we don’t know the length in advance—we need a single pass or a clever two-pointer setup. The standard solution uses a dummy node and two pointers: advance one pointer n + 1 steps ahead, then move both until the leading pointer reaches the end; the trailing pointer is then just before the node to remove, so we can do prev.next = prev.next.next. This section covers the one-pass two-pointer approach, edge cases (removing the head, single node), and why the dummy simplifies the code.
Real-World Analogy
Imagine a line of people and you must remove the person who is “n places from the end.” If you start at the front, you don’t know where the end is until you walk there. Trick: pick a “leader” and a “follower” and send the leader a fixed number of steps ahead. Then move both together until the leader walks off the end of the line. Because the gap never changes, the follower stops a known distance from the end: with a gap of n + 1, the follower is standing right before the person to remove. You only need one pass. A dummy at the very front lets you treat “removing the head” the same as removing any other node (the follower sits on the dummy, and dummy.next is the node to remove).
List: 1 → 2 → 3 → 4 → 5, n = 2. We remove the 2nd from end = the node with value 4. Result: 1 → 2 → 3 → 5. If n = 5, we remove the head (1); result: 2 → 3 → 4 → 5. The dummy node lets us handle “remove head” without a special branch: the follower ends up at the dummy, and we set dummy.next = head.next.
Formal Definition
Remove Nth node from end: Given the head of a singly linked list and an integer n (1 ≤ n ≤ list length), remove the n-th node from the end of the list (1-indexed: 1 = last node, 2 = second-to-last, etc.) and return the head. We assume the list has at least n nodes. We do this in (at most) one pass and O(1) extra space. The removed node is no longer part of the list; the list’s length becomes one less. If we remove the head, we return the new head (the former second node).
Using a dummy node whose next is the real head allows us to have a “previous” pointer even when the node to remove is the head; then we always do prev.next = prev.next.next and return dummy.next.
Why This Topic Matters
- Interview staple: LeetCode 19 (Remove Nth Node From End of List) is very common. It tests the “lead pointer by n” two-pointer pattern and the dummy node to avoid head special-case.
- Same pattern elsewhere: “Find the k-th node from the end” uses the same idea (lead by k, then step both; when lead reaches end, the other pointer is at the k-th from end). “Reorder list” and “split list” sometimes use similar “distance from end” logic.
- Dummy + two pointers: The combination “dummy so we have a predecessor for the head” and “lead pointer by n steps” is a reusable pattern for any “from the end” indexing.
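The related “find the k-th node from the end” idea from the second bullet can be sketched as below. Here the lead is k (not k + 1) because we want the node itself rather than its predecessor. The names kth_from_end, ListNode and from_values are assumptions for illustration, and returning None for a too-short list is an assumed policy.

```python
class ListNode:
    """Assumed node shape: a value plus a next reference."""
    def __init__(self, val, next=None):
        self.val = val
        self.next = next


def from_values(values):
    """Hypothetical helper: build a list from a Python sequence."""
    head = None
    for v in reversed(values):
        head = ListNode(v, head)
    return head


def kth_from_end(head, k):
    """Return the k-th node from the end (1 = last), or None if the list is shorter."""
    lead = follow = head
    for _ in range(k):
        if lead is None:
            return None  # fewer than k nodes (assumed handling)
        lead = lead.next
    # Keep the gap of k until lead passes the last node.
    while lead is not None:
        lead = lead.next
        follow = follow.next
    return follow


head = from_values([1, 2, 3, 4, 5])
print(kth_from_end(head, 2).val)  # 4
print(kth_from_end(head, 5).val)  # 1
print(kth_from_end(head, 6))      # None
```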
Mental Model
We want a pointer to the node before the one we will remove, so we can do prev.next = prev.next.next. The node to remove is n steps from the end—i.e. when we are at the end (None), we go back n steps to get to that node, and one more step back to get to its predecessor. So: put a “fast” pointer n + 1 steps ahead of a “slow” pointer (slow starts at dummy, so slow is “one before” the first data node). Move both one step at a time until fast reaches None. Then slow is exactly at the predecessor of the n-th-from-end node. Remove the next node and return dummy.next.
Step-by-Step Breakdown
- Create a dummy node and set dummy.next = head. Set slow = dummy and fast = dummy.
- Advance fast by n + 1 steps (so fast is (n + 1) steps ahead of slow). If fast becomes None before we finish n + 1 steps, the list has fewer than n nodes; handle as needed (e.g. return head or raise).
- While fast is not None: move slow = slow.next and fast = fast.next. When the loop exits, fast is None: we’ve passed the last node. So slow is at the node that is (n + 1) from the end, i.e. the predecessor of the n-th-from-end node.
- Remove the next node: slow.next = slow.next.next. Return dummy.next (the head; it may have changed if we removed the original head).
Why n + 1? We need slow to land on the predecessor of the n-th-from-end node. The n-th-from-end node is n steps before None. So its predecessor is n + 1 steps before None. By moving fast ahead by n + 1 and then stepping both until fast is None, slow has moved the same number of steps as fast from its starting point—so slow is (n + 1) steps before the current “end” (None), i.e. at the predecessor. So we advance fast by n + 1 in the beginning.
ASCII Diagram
List: 1 → 2 → 3 → 4 → 5 → None    n = 2 (remove 4)
After advancing fast by n+1 = 3 steps:
dummy → 1 → 2 → 3 → 4 → 5 → None
slow            fast
After moving both until fast is None:
dummy → 1 → 2 → 3 → 4 → 5 → None
                slow        fast
slow is at 3; slow.next is 4 (the node to remove).
slow.next = slow.next.next → 3.next = 5
Result: 1 → 2 → 3 → 5 → None
Python Implementation
def remove_nth_from_end(head, n):
    dummy = ListNode(0)
    dummy.next = head
    slow = fast = dummy
    for _ in range(n + 1):
        fast = fast.next
        if fast is None and _ < n:
            return head  # list shorter than n; optional
    while fast is not None:
        slow = slow.next
        fast = fast.next
    slow.next = slow.next.next
    return dummy.next
We advance fast n + 1 times. If we want to assume valid input (list length ≥ n), we can skip the if fast is None check and assume fast is not None after the loop. Then we step both until fast is None, remove the next node of slow, and return dummy.next.
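A usage sketch covering the main cases (middle node, head, single node, and n too large). ListNode, from_values and to_values are assumed helpers, and the function is repeated with a named loop variable so the block runs standalone; returning the list unchanged when it is shorter than n is an assumed policy, not a requirement of the problem.

```python
class ListNode:
    """Assumed node shape: a value plus a next reference."""
    def __init__(self, val=0, next=None):
        self.val = val
        self.next = next


def remove_nth_from_end(head, n):
    dummy = ListNode(0, head)
    slow = fast = dummy
    for step in range(n + 1):
        fast = fast.next
        if fast is None and step < n:
            return head  # list shorter than n: left unchanged (assumed policy)
    while fast is not None:
        slow = slow.next
        fast = fast.next
    slow.next = slow.next.next  # bypass the n-th node from the end
    return dummy.next


def from_values(values):
    """Hypothetical helper: build a list from a Python sequence."""
    head = None
    for v in reversed(values):
        head = ListNode(v, head)
    return head


def to_values(head):
    """Hypothetical helper: collect node values into a Python list."""
    out = []
    while head is not None:
        out.append(head.val)
        head = head.next
    return out


print(to_values(remove_nth_from_end(from_values([1, 2, 3, 4, 5]), 2)))  # [1, 2, 3, 5]
print(to_values(remove_nth_from_end(from_values([1, 2, 3, 4, 5]), 5)))  # [2, 3, 4, 5]
print(to_values(remove_nth_from_end(from_values([1]), 1)))              # []
```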
Line-by-Line Explanation
- dummy.next = head, slow = fast = dummy: Dummy gives us a predecessor for the head. Both pointers start at dummy.
- for _ in range(n + 1): fast = fast.next: Move fast forward by n + 1 steps. After this, fast is (n + 1) nodes ahead of slow. So when we then move both one step at a time until fast is None, slow will be at the (n + 1)-th node from the end = predecessor of the n-th from end.
- while fast is not None: slow, fast = slow.next, fast.next: Advance both until fast reaches the end. Slow ends at the node before the one to remove.
- slow.next = slow.next.next: Bypass the n-th-from-end node. We assume the list has at least n nodes so slow.next exists and we are not dereferencing None.
- return dummy.next: The (possibly new) head. If we removed the original head, dummy.next now points to the second node, which is correct.
Time Complexity
Let L be the length of the list. Fast moves L + 1 steps in total: n + 1 in the for-loop and L − n in the while-loop, at which point it reaches None. Slow moves L − n steps. The for-loop and the while-loop are two phases of a single traversal by fast, not two passes over the list. Time O(L), i.e. linear in the list length.
Space Complexity
Only the dummy and a few pointers. Space O(1).
Edge Cases
- Remove the head (n = length): If the list has exactly n nodes, after n steps fast is at the last node, and after n + 1 steps fast is None. So we never enter the while loop, and slow is still at dummy. Then slow.next is head, the node to remove, and slow.next = slow.next.next makes dummy.next point to head.next, the new head. Correct.
- Single node, n = 1: List is [x]. We remove the only node. Dummy.next should become None. After n+1 = 2 steps, fast is None. Slow is still dummy. slow.next = slow.next.next sets dummy.next = None. Return dummy.next = None. Correct.
- List shorter than n: If we don’t check and n is too large, fast becomes None before we complete n + 1 steps, and the next fast = fast.next fails with an AttributeError on None. So either assume valid n or add a check and return head (or raise) when fast becomes None too early.
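The edge cases above can be exercised with a small harness. This is a sketch under assumptions: the ListNode shape (val, next), the build and to_list helpers, and the name remove_nth_from_end_checked are hypothetical, and this variant raises ValueError on an invalid n instead of returning head.

```python
class ListNode:
    """Minimal singly linked node (assumed shape: val, next)."""
    def __init__(self, val=0, next=None):
        self.val = val
        self.next = next


def build(values):
    """Build a linked list from a Python list and return its head."""
    dummy = ListNode()
    tail = dummy
    for v in values:
        tail.next = ListNode(v)
        tail = tail.next
    return dummy.next


def to_list(head):
    """Collect node values into a Python list."""
    out = []
    while head is not None:
        out.append(head.val)
        head = head.next
    return out


def remove_nth_from_end_checked(head, n):
    """Lead-by-(n + 1) removal that raises on an invalid n."""
    if n < 1:
        raise ValueError("n must be at least 1")
    dummy = ListNode(0, head)
    slow = fast = dummy
    for _ in range(n + 1):
        if fast is None:  # ran off the list before finishing the lead
            raise ValueError("n is larger than the list length")
        fast = fast.next
    while fast is not None:
        slow = slow.next
        fast = fast.next
    slow.next = slow.next.next  # unlink the n-th node from the end
    return dummy.next
```

Checking fast before each step (rather than after) means the n = length case still completes the full lead, while any larger n is rejected before a None is dereferenced.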
Common Mistakes
- Leading by n instead of n + 1: If you advance fast by only n steps, when you then move both until fast is None, slow ends at the n-th-from-end node itself, not its predecessor. To remove that node you need the previous node. So you must lead by n + 1 so that slow lands on the predecessor. Alternatively, you could lead by n and then remove the node at slow by copying slow.next’s value into slow and doing slow.next = slow.next.next (delete-next trick), but that doesn’t work if the node to remove is the last node. So “lead by n + 1 and remove slow.next” is the standard approach.
- No dummy: When n = length, the node to remove is the head. Without a dummy you have no “previous” node. You’d need a special case: “if we need to remove the head, return head.next.” The dummy unifies this: the predecessor of the head is the dummy.
- Off-by-one in the loop: The “n-th from end” is 1-indexed: 1 = last node. So the predecessor of the n-th from end is (n + 1) from the end. Hence advance fast by n + 1.
Alternative: Two Pass (Length First)
First pass: compute length L. The node to remove is at position L − n + 1 from the front (1-indexed). Second pass: traverse to the (L − n)-th node (the predecessor) and set prev.next = prev.next.next. Use a dummy to handle L − n = 0 (removing head). Time O(n), space O(1). Same complexity; the one-pass “lead by n + 1” is more common in interviews.
One pass is sufficient: we don’t need to know the total length. The lead-by-(n+1) trick gives us the predecessor in a single sweep. The two-pass solution is easier to derive but does two traversals; both are O(n) time and O(1) space. In practice, the one-pass solution is preferred for elegance and because it matches the “two pointers with a gap” pattern used in other problems.
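For comparison, the two-pass alternative can be sketched as follows. The ListNode class and the build/to_list helpers are assumptions for illustration, and the function assumes 1 ≤ n ≤ length.

```python
class ListNode:
    """Minimal singly linked node (assumed shape: val, next)."""
    def __init__(self, val=0, next=None):
        self.val = val
        self.next = next


def build(values):
    dummy = ListNode()
    tail = dummy
    for v in values:
        tail.next = ListNode(v)
        tail = tail.next
    return dummy.next


def to_list(head):
    out = []
    while head is not None:
        out.append(head.val)
        head = head.next
    return out


def remove_nth_from_end_two_pass(head, n):
    # Pass 1: count the nodes.
    length = 0
    node = head
    while node is not None:
        length += 1
        node = node.next

    # Pass 2: walk (length - n) steps from the dummy to reach the predecessor.
    dummy = ListNode(0, head)
    prev = dummy
    for _ in range(length - n):
        prev = prev.next
    prev.next = prev.next.next  # unlink the target node
    return dummy.next
```

When n equals the length, the second loop runs zero times, prev stays at the dummy, and the head is unlinked without a special case.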
Pattern Recognition
“K-th from end” with two pointers: To get a pointer to the k-th node from the end (or its predecessor), advance one pointer by k (or k + 1) steps, then move both until the leading pointer reaches None. The trailing pointer is then at the desired position. Use a dummy when the “previous” of the head might be needed. This pattern appears in “remove nth from end,” “find middle,” and “reorder list” (find middle, reverse second half, merge).
If the problem asked to “find” the n-th node from the end (not remove it), you’d advance fast by n steps (not n + 1), then move both until fast is None. Then slow is at the n-th-from-end node. For removal we need the predecessor, so we use n + 1 and then remove slow.next.
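A minimal sketch of the "find" variant, assuming a ListNode with val and next. The function name and the choice to return None for an out-of-range n are illustrative, not part of the original problem statement. Both pointers start at head here, since no predecessor is needed.

```python
class ListNode:
    """Minimal singly linked node (assumed shape: val, next)."""
    def __init__(self, val=0, next=None):
        self.val = val
        self.next = next


def build(values):
    dummy = ListNode()
    tail = dummy
    for v in values:
        tail.next = ListNode(v)
        tail = tail.next
    return dummy.next


def find_nth_from_end(head, n):
    """Return the n-th node from the end (1 = last), or None if n is out of range."""
    if n < 1:
        return None
    slow = fast = head
    for _ in range(n):  # lead by n, not n + 1
        if fast is None:
            return None  # list has fewer than n nodes
        fast = fast.next
    while fast is not None:
        slow = slow.next
        fast = fast.next
    return slow
```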
State the problem: “Remove the n-th node from the end in one pass, return the head.” Use a dummy and two pointers: advance the fast pointer n + 1 steps from the dummy, then move both until fast is None. Slow is then at the predecessor of the node to remove; do slow.next = slow.next.next and return dummy.next. Explain why n + 1: we need the predecessor so we can unlink the node; the predecessor is (n + 1) steps from the end. Mention edge case: removing the head (n = length) is handled by the dummy—slow stays at dummy and we unlink the first node. Assume n is valid (list length ≥ n) unless told otherwise.
Practice Problems
- LeetCode 19: Remove Nth Node From End of List (one-pass with dummy and lead by n + 1).
- Variation: Find the n-th node from the end (lead by n, then step both; slow lands on that node).
- LeetCode 61: Rotate List (rotate right by k; similar “from end” reasoning—find new tail and new head).
Summary
- Remove n-th node from end: Use a dummy (dummy.next = head) and two pointers. Advance fast by n + 1 steps from the dummy; then move both slow and fast until fast is None. Slow is at the predecessor of the n-th-from-end node. Set slow.next = slow.next.next and return dummy.next.
- Lead by n + 1 (not n) so that slow lands on the predecessor, allowing a single unlink. The dummy handles the case when the node to remove is the head.
- Time O(n), space O(1). Alternative: two-pass (compute length, then traverse to predecessor). Same complexity; one-pass is standard in interviews.
8.9 LRU Cache Implementation
Introduction
An LRU (Least Recently Used) cache is a fixed-capacity cache that evicts the least recently used item when the cache is full and a new item is inserted. It supports two operations in O(1) average time: get(key)—return the value for the key if present and mark the item as most recently used; and put(key, value)—insert or update the key and mark it most recently used, evicting the LRU item if the cache is at capacity. To achieve O(1) get and put we combine a hash map (key → node or value) for O(1) lookup with a doubly linked list to maintain access order: the list keeps items ordered by recency, with “most recent” at one end and “least recent” at the other. When we access or add an item, we move its node to the “most recent” end; when we evict, we remove from the “least recent” end. The doubly linked list allows O(1) removal of a node when we have a reference to it (Section 8.2). This section builds the design from scratch and gives a complete Python implementation.
Real-World Analogy
Imagine a clothes rack that can hold only a fixed number of hangers. When you wear an item, you put it back on the “most recently used” end of the rack. When the rack is full and you add a new item, you remove the one that hasn’t been used for the longest time (the one at the “least recently used” end). You need to (1) find an item by “name” quickly (hash map) and (2) reorder the rack when you use something (move that hanger to the front)—and remove from the back when full. The doubly linked list is the rack: each hanger has a link to the next and the previous so you can pluck one out and move it to the front in O(1) time.
Capacity 2. put(1, 10): cache has [1]. put(2, 20): cache has [1, 2] (2 is most recent). get(1): return 10, now order is [2, 1] (1 moved to most recent). put(3, 30): cache full; evict LRU = 2; cache becomes [1, 3]. get(2): not found, return -1.
Formal Definition
LRU cache: A data structure with a fixed positive capacity that supports: (1) get(key)—return the value associated with key if the key exists in the cache, otherwise return -1 (or None). If the key exists, the item is considered “used” and becomes the most recently used. (2) put(key, value)—if the key already exists, update its value and mark it most recently used; if the key is new and the cache is at capacity, evict the least recently used item, then add (key, value) and mark it most recently used. Both operations must run in O(1) average time. “Least recently used” means the item that was accessed (get or put) least recently among all items currently in the cache.
We need O(1) lookup (hash map), O(1) “move to most recent” (remove node from current position and add to head—requires doubly linked list so we can unlink in O(1) given the node), and O(1) “remove least recent” (remove from tail end of the list).
Why This Topic Matters
- Interview staple: LeetCode 146 (LRU Cache) is a classic design problem. It combines hash map and doubly linked list and is asked frequently at senior levels.
- Real systems: Caches in operating systems, databases, and web servers often use LRU or variants (LRU-K, ARC). Understanding LRU is a stepping stone to cache design and eviction policies.
- Data structure combination: The pattern “hash map + doubly linked list” appears whenever you need O(1) lookup and O(1) reordering or removal by reference. Same idea is used in some LFU (Least Frequently Used) implementations and in ordered maps with “move to front.”
Mental Model
Maintain two structures: (1) A hash map from key to the list node that holds (key, value). (2) A doubly linked list that represents recency order: e.g. “most recent” at the head (right after a dummy head node) and “least recent” at the tail (right before a dummy tail node). So we have head <-> [most recent] <-> ... <-> [least recent] <-> tail. On get: look up the node in the map; if found, unlink it and add it after head (make it most recent); return the value. On put: if key exists, update the node’s value and move it to most recent. If key is new and cache is full, remove the node before tail (LRU), delete its key from the map, then create a new node, add it after head, and put it in the map. If key is new and cache is not full, just add the new node after head and put it in the map.
Step-by-Step Breakdown
Structure
- Node: key, value, prev, next. We need the key in the node so that when we evict the node at the tail we can remove the corresponding key from the map.
- Dummy head and tail: head.next = most recent, tail.prev = least recent. This avoids null checks when adding or removing.
- Map: key → node for O(1) lookup.
Helper: Add node after head (make most recent)
- Link the node between head and head.next: node.prev = head, node.next = head.next, head.next.prev = node, head.next = node.
Helper: Remove node from list
- node.prev.next = node.next, node.next.prev = node.prev. The node is unlinked.
get(key)
- If key not in map, return -1.
- Get the node from the map. Remove the node from its current position (remove_node). Add it after head (add_after_head). Return node.value.
put(key, value)
- If key in map: get the node, update node.value, remove_node, add_after_head. Return.
- If cache is full (len(map) == capacity): the LRU node is tail.prev. Remove it from the list, remove its key from the map.
- Create a new node for (key, value). add_after_head, and map[key] = node.
ASCII Diagram: List Order
head <-> [MRU] <-> ... <-> [LRU] <-> tail
After get(key): unlink that node, insert it between head and head.next.
After put(new): if full, unlink tail.prev (LRU), remove from map; then add new node after head.
New / updated item is always at head.next (most recent).
Python Implementation
class Node:
    def __init__(self, key=0, value=0):
        self.key = key
        self.value = value
        self.prev = None
        self.next = None

class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.cache = {}  # key -> node
        self.head = Node()
        self.tail = Node()
        self.head.next = self.tail
        self.tail.prev = self.head

    def _add_after_head(self, node):
        node.prev = self.head
        node.next = self.head.next
        self.head.next.prev = node
        self.head.next = node

    def _remove_node(self, node):
        node.prev.next = node.next
        node.next.prev = node.prev

    def get(self, key: int) -> int:
        if key not in self.cache:
            return -1
        node = self.cache[key]
        self._remove_node(node)
        self._add_after_head(node)
        return node.value

    def put(self, key: int, value: int) -> None:
        if key in self.cache:
            node = self.cache[key]
            node.value = value
            self._remove_node(node)
            self._add_after_head(node)
            return
        if len(self.cache) == self.capacity:
            lru = self.tail.prev
            self._remove_node(lru)
            del self.cache[lru.key]
        new_node = Node(key, value)
        self.cache[key] = new_node
        self._add_after_head(new_node)
Line-by-Line Explanation
- Node: Holds key (for map cleanup on evict), value, prev, next. Doubly linked.
- head, tail: Dummy nodes. head.next is the first real node (MRU), tail.prev is the last (LRU). head.prev and tail.next can stay None or unused.
- _add_after_head(node): Inserts node between head and head.next. Standard doubly linked insert after head.
- _remove_node(node): Unlinks node by rewiring node.prev.next and node.next.prev. O(1) because we have the node reference.
- get: If key not in cache, return -1. Else get node, remove it from list, add after head (move to MRU), return value.
- put: If key exists: update value, move to MRU (remove + add_after_head). Else: if at capacity, evict tail.prev (remove from list, delete from map). Then create new node, put in map, add_after_head.
Time Complexity
get(key): O(1) average: one hash lookup, then a constant number of pointer updates (two to unlink the node, four to reinsert it after head). put(key, value): O(1) average: one lookup, and either (update + move) or (evict one node + add). All operations are O(1) assuming hash map and doubly linked list operations are O(1).
Space Complexity
O(capacity)—we store at most capacity nodes and capacity map entries. The dummy head and tail are O(1). Space O(capacity).
Edge Cases
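- Capacity 0: put should not add; get always returns -1. The implementation above assumes a positive capacity: with capacity 0, put treats the empty cache as full, takes tail.prev (the dummy head) as the LRU node, and fails in _remove_node because head.prev is None. Guard: if capacity == 0, put returns without doing anything.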
- put same key twice: Update value and move to MRU. Handled by “if key in self.cache” branch.
- get then put same key: get moves to MRU; put updates value and moves to MRU again. No duplicate nodes; the map still points to the same node.
Common Mistakes
- Forgetting to store the key in the node: When we evict, we remove tail.prev and must do del self.cache[lru.key]. If the node doesn’t store the key, we cannot remove the correct key from the map. Always store (key, value) in the node.
- Wrong order of pointer updates: When adding after head, set the new node’s own prev and next first, then update head.next.prev, and only then head.next. If you overwrite head.next first, head.next.prev no longer refers to the old first node, and you lose the reference to it unless you saved it.
- Using a singly linked list: To remove a node in O(1) when you have only the node, you need to change the previous node’s next. Without a prev pointer, finding the previous node is O(n). So LRU cache needs a doubly linked list for O(1) move/remove.
Evolution: Why Hash Map + Doubly Linked List
Naive: Store (key, value) in a list. get: scan the list O(n); put: scan to update or add, and to find LRU (e.g. last element) O(n). Too slow.
Hash map only: We can lookup in O(1) but we don’t have “order of use.” To evict LRU we’d need to track timestamps and scan to find the minimum—O(capacity) per eviction. Not O(1) put.
Hash map + doubly linked list: Map gives O(1) lookup; list gives order. Move to MRU = unlink + add after head = O(1). Evict LRU = remove tail.prev = O(1). This meets the O(1) get/put requirement.
You could use an ordered dict (e.g. Python’s collections.OrderedDict) and move the key to the end on get/put (most recent at end), then evict from the beginning. That also gives O(1) get/put in practice. The “hash map + doubly linked list” implementation is the classic interview solution and shows you understand both structures; in Python, mentioning OrderedDict as an alternative is fine. For other languages, the manual doubly linked list is standard.
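The OrderedDict alternative can be sketched as below. The class name LRUCacheOD and the capacity-0 guard are choices made for this illustration; move_to_end and popitem(last=False) are the standard OrderedDict methods for reordering and for removing the oldest entry.

```python
from collections import OrderedDict


class LRUCacheOD:
    """LRU cache sketch backed by OrderedDict (most recent at the end)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return -1
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if self.capacity <= 0:
            return  # nothing can be stored
        if key in self.data:
            self.data.move_to_end(key)
        elif len(self.data) >= self.capacity:
            self.data.popitem(last=False)  # evict the least recently used
        self.data[key] = value
```

Replaying the capacity-2 trace from earlier in this section (put 1, put 2, get 1, put 3) evicts key 2, so get(2) returns -1 while keys 1 and 3 remain.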
Pattern Recognition
Hash map + doubly linked list: Use when you need O(1) lookup by key and O(1) “move to front” or “remove by reference.” The list maintains an order; the map gives quick access to the node so you can unlink and reinsert. Same pattern: LRU cache, LFU cache (with frequency lists), and some ordered cache eviction policies.
In Python 3.7+, plain dict preserves insertion order. You could implement a minimal LRU by deleting and re-inserting the key on access (to move it to the “end” as most recent) and evicting the first key when full. Note that plain dict has no popitem(last=False): dict.popitem() takes no arguments and removes the last-inserted item, so the oldest key has to be fetched with next(iter(d)) and then deleted; popitem(last=False) belongs to OrderedDict. The result is very short to write and fast in practice. The interview version with explicit doubly linked list is preferred when the goal is to demonstrate the data structure design; for production Python, OrderedDict or the dict trick is often used.
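A minimal sketch of the plain-dict variant, assuming Python 3.7+ insertion ordering. The class name is hypothetical. Because plain dict.popitem() accepts no arguments and removes the newest entry, the oldest key is obtained with next(iter(...)) instead. How cheap that lookup stays after many deletions depends on the dict implementation, so OrderedDict is the safer choice when strict constant-time eviction matters.

```python
class LRUCacheDict:
    """LRU cache sketch on a plain dict (oldest key first, newest last)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = {}

    def get(self, key):
        if key not in self.data:
            return -1
        value = self.data.pop(key)  # remove, then re-insert at the end
        self.data[key] = value
        return value

    def put(self, key, value):
        if self.capacity <= 0:
            return
        if key in self.data:
            del self.data[key]  # re-inserted below as most recent
        elif len(self.data) >= self.capacity:
            oldest = next(iter(self.data))  # first key = least recently used
            del self.data[oldest]
        self.data[key] = value
```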
State the requirements: O(1) get and put, evict LRU when full. Say you’ll use a hash map for O(1) lookup and a doubly linked list to maintain recency order (most recent at head, least at tail). Describe the node (key, value, prev, next)—emphasize storing key so you can remove from the map on eviction. For get: lookup; if found, unlink the node and add after head, return value. For put: if key exists, update and move to head; else if full, evict tail.prev and remove from map; then create new node, add after head, put in map. Mention dummy head/tail to simplify insert/remove. If they ask “why doubly linked?”, answer: O(1) removal of a node when we have a reference—we need prev and next to unlink.
Practice Problems
- LeetCode 146: LRU Cache (implement get/put with hash map + doubly linked list).
- LeetCode 460: LFU Cache (least frequently used; multiple lists or heap + map).
- LeetCode 588: Design In-Memory File System (can use similar caching ideas for directory listings).
Summary
- LRU cache supports get(key) and put(key, value) in O(1) average time; when full, put evicts the least recently used item. Combine a hash map (key → node) with a doubly linked list (order by recency: head = MRU, tail = LRU).
- Store (key, value) in each node so that on eviction we can remove the key from the map. Use dummy head and tail for simpler insert/remove.
- get: lookup node; if found, unlink and add after head, return value. put: if key exists, update value and move to head; else if full, evict tail.prev and delete from map; then add new node after head and put in map. Time O(1), space O(capacity).
9.1 Stack Implementation
Introduction
A stack is a linear data structure that follows LIFO (Last In, First Out): the last element added is the first one removed. It supports three core operations—push (add an element on top), pop (remove and return the top element), and peek or top (return the top element without removing it)—all in O(1) time when implemented with an array or a linked list. Stacks appear everywhere: in the call stack of your program (function calls and returns), in expression evaluation (postfix/prefix), in matching brackets and parsing, in DFS (depth-first search), and in undo/redo. This section defines the stack ADT, implements it in Python (using a list and optionally a linked list), and covers edge cases and when to reach for a stack in problem-solving.
Real-World Analogy
Think of a stack of plates. You can only add a new plate on top and remove the top plate. You cannot pull a plate from the middle without toppling the stack. The last plate you put on is the first one you take off—LIFO. Similarly, the “undo” in an editor: the last action you did is the first one that gets undone. Stacks model any situation where “most recent” matters and you only need access to the top.
Push 10, push 20, push 30. Top is 30. Pop returns 30; top is now 20. Pop returns 20; the stack has only 10 left. Order of removal is always the reverse of order of insertion.
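The same trace can be reproduced with a plain Python list used as a stack; this is only an illustrative sketch, with variable names chosen for the example.

```python
stack = []
stack.append(10)  # push 10
stack.append(20)  # push 20
stack.append(30)  # push 30

top = stack[-1]       # peek: the most recently pushed element
first = stack.pop()   # removes and returns the current top
second = stack.pop()  # removes and returns the next one down
```

After the two pops only the first pushed element remains, which is the reverse-of-insertion order described above.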
Formal Definition
Stack (ADT): A collection that supports: (1) push(x)—add element x on top of the stack; (2) pop()—remove and return the top element (undefined if the stack is empty); (3) peek() or top()—return the top element without removing it (undefined if empty); (4) isEmpty() (or empty())—return true if the stack has no elements. Optionally: size(). The only element that can be accessed or removed is the one most recently pushed—the top. No random access by index. LIFO order is guaranteed.
Implementation can use a dynamic array (list) or a singly linked list (push and pop at the head). The dynamic array gives O(1) amortized push and O(1) pop/peek; the linked list gives O(1) worst-case push/pop/peek.
Why This Topic Matters
- Foundation for Stack & Queue section: Stacks and queues are the simplest linear ADTs after arrays and lists. Many interview problems (“valid parentheses,” “next greater element,” “min stack”) are stack-based.
- Call stack: Recursion and function calls are implemented with a stack. Understanding stacks helps you reason about recursion and stack overflow.
- Algorithm building block: DFS uses an explicit or implicit stack; expression evaluation (postfix), bracket matching, and monotonic stack problems all rely on the LIFO property.
Mental Model
Picture a vertical tube open at the top. You drop items in one at a time; they pile up. The only item you can see or take out is the one on top. That’s the stack. In code, we maintain a “top” (e.g. the last index in an array, or the head of a linked list) and only add or remove there. No need to shift elements—push and pop are O(1).
Step-by-Step: Operations
Using a dynamic array (list)
- Push: Append the element to the end of the list. Top = last index. O(1) amortized.
- Pop: Remove and return the last element (list.pop()). O(1).
- Peek/Top: Return the last element (list[-1]) without removing. O(1).
- isEmpty: Check if the list is empty (len(list) == 0). O(1).
Using a singly linked list
Use the head as the top. Push: create a new node, set its next to the current head, set head to the new node. Pop: save head’s value, set head = head.next, return the value. Peek: return head.data (or head.val). All O(1).
ASCII Diagram
Stack (top on the right):
push(10): [10]
push(20): [10, 20] top = 20
push(30): [10, 20, 30] top = 30
pop(): [10, 20] returns 30
peek(): [10, 20] returns 20
pop(): [10] returns 20
Linked-list version: top = head
head → 30 → 20 → 10 → None (30 is top)
push: new_node.next = head; head = new_node
pop: val = head.val; head = head.next; return val
Python Implementation
Using list (recommended in Python)
class Stack:
    def __init__(self):
        self._data = []

    def push(self, x):
        self._data.append(x)

    def pop(self):
        if self.is_empty():
            raise IndexError("pop from empty stack")
        return self._data.pop()

    def peek(self):
        if self.is_empty():
            raise IndexError("peek from empty stack")
        return self._data[-1]

    def is_empty(self):
        return len(self._data) == 0

    def size(self):
        return len(self._data)
Using linked list (for practice)
class Node:
    def __init__(self, data):
        self.data = data
        self.next = None

class StackLinkedList:
    def __init__(self):
        self.head = None

    def push(self, x):
        new = Node(x)
        new.next = self.head
        self.head = new

    def pop(self):
        if self.head is None:
            raise IndexError("pop from empty stack")
        val = self.head.data
        self.head = self.head.next
        return val

    def peek(self):
        if self.head is None:
            raise IndexError("peek from empty stack")
        return self.head.data

    def is_empty(self):
        return self.head is None
Line-by-Line Explanation (List Version)
- _data = []: Internal list; we only add/remove at the end so LIFO is preserved.
- push(x): append(x): Add at the end; that becomes the new top. O(1) amortized.
- pop(): _data.pop(): Remove and return the last element. Check is_empty first to avoid popping from an empty list.
- peek(): _data[-1]: Return last element without removing. Must check is_empty.
- is_empty(): len(_data) == 0: O(1). Alternatively return not self._data.
Time Complexity
Push: O(1) amortized (list append may occasionally resize). Pop: O(1). Peek: O(1). isEmpty, size: O(1). All core operations are constant time.
Space Complexity
O(n) where n is the number of elements in the stack. The list (or linked list) stores each element once.
Edge Cases
- Pop from empty stack: Undefined behavior unless we define it. In the implementation above we raise IndexError. In problems, often the input is guaranteed non-empty, or we must check before popping.
- Peek on empty stack: Similarly, raise or return a sentinel. Always consider “what if the stack is empty?” when using peek in algorithms (e.g. bracket matching).
- Push None: Allowed if the stack is intended to hold any value. Some problems use the stack to store indices or nodes; None can be a valid element or a sentinel—clarify in interviews.
Common Mistakes
- Popping or peeking without checking if the stack is empty: In production code, always guard pop/peek with is_empty (or a size check) or document that the caller must ensure non-empty. In contest/problem code, if the problem says “non-empty,” you may skip the check; otherwise one invalid input can cause a runtime error.
- Using the wrong end of the list: For a stack, push and pop must happen at the same end. If you push with append but pop from the front (pop(0)), that’s O(n) per pop and it’s a queue-like order, not LIFO. Stick to append + pop() for the list-based stack.
- Confusing stack with queue: Stack = LIFO (last in, first out). Queue = FIFO (first in, first out). Use a stack when you need “most recent” or “reverse order” (e.g. DFS, undo, bracket matching).
When to Use a Stack
Use a stack when the problem has one or more of: (1) LIFO / “most recent”—e.g. undo, backtracking; (2) Matching or nesting—e.g. valid parentheses, balanced brackets, HTML tags; (3) Need to “look back” at the last relevant element—e.g. next greater element (monotonic stack), stock span; (4) DFS—explicit stack or recursion (call stack); (5) Expression evaluation—postfix (RPN) or infix with operator stack. If you need “first in, first out,” use a queue instead.
In Python, you often don’t need a custom Stack class: a list used with append and pop() is a stack. For interviews, either use a list directly (“I’ll use a list as a stack: append for push, pop for pop”) or implement a thin wrapper. Knowing the ADT and when to use it matters more than the exact class name.
When the problem involves “matching pairs,” “nested structure,” “most recent,” or “reverse order,” mention a stack. Implement with a list: push = append, pop = pop(), peek = list[-1]. State the time: O(1) per operation, O(n) space. For “min stack” or “max stack,” you’ll extend this with an auxiliary structure (e.g. second stack or heap) in later topics.
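As an illustration of the "matching pairs" case, here is a minimal bracket-matching sketch that uses a list as the stack. The function name is hypothetical, and the choice to ignore non-bracket characters is an assumption of this example.

```python
def is_balanced(s):
    """Return True if every bracket in s is closed in the correct order."""
    pairs = {')': '(', ']': '[', '}': '{'}  # closing -> matching opening
    openers = set(pairs.values())
    stack = []
    for ch in s:
        if ch in openers:
            stack.append(ch)  # push: wait for its closing partner
        elif ch in pairs:
            # Guard the pop: an empty stack means an unmatched closer.
            if not stack or stack.pop() != pairs[ch]:
                return False
    return not stack  # leftover openers mean unmatched brackets
```

The `not stack` check before `stack.pop()` is the empty-stack guard discussed under Common Mistakes; without it, an input such as ")" would raise IndexError.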
Practice Problems
- LeetCode 20: Valid Parentheses (stack to match brackets).
- LeetCode 155: Min Stack (stack + auxiliary structure for O(1) getMin).
- LeetCode 232: Implement Queue using Stacks (two stacks for FIFO).
- LeetCode 94: Binary Tree Inorder Traversal (iterative version uses a stack).
Summary
- A stack is LIFO: push (add on top), pop (remove top), peek (read top). All O(1) with list or linked list. Optionally isEmpty and size.
- In Python, use a list: append for push, pop() for pop, list[-1] for peek. Or implement a small Stack class that wraps a list.
- Use a stack for: matching/nesting (parentheses), “most recent” (undo), next-greater/span (monotonic stack), DFS, and expression evaluation. Guard pop/peek when the stack may be empty.
9.2 Monotonic Stack
Introduction
A monotonic stack is a stack whose elements are kept in sorted order from bottom to top, either increasing or decreasing (whether ties are allowed depends on the comparison used). We use it to answer “for each element, find the next (or previous) element that is greater or smaller” in O(n) time for the entire array, instead of O(n) per element with a naive scan. The idea: as we scan the array, we push indices (or values) onto the stack; before pushing, we pop all elements that “break” the desired monotonicity. Those pops give us the answers: for example, when we pop an index because we found a larger value, that larger value is the “next greater element” for the popped index. Monotonic stacks power problems like Next Greater Element, Stock Span, and Largest Rectangle in Histogram. This section builds the pattern from intuition to code.
Real-World Analogy
Imagine a line of people by height. You walk from left to right. You want to know, for each person, “who is the next person to my right who is taller than me?” When you meet someone taller, they are the “next taller” for everyone you’re still “holding” (people you passed who haven’t found their answer yet). You keep a mental stack of people who haven’t found their “next taller” person. When a new person arrives, anyone on your stack who is shorter than this new person has found their answer: this new person. You remove them from the stack and record the answer, then add the new person to the stack. Everyone left below the new person is at least as tall, so the stack always has people in decreasing height from bottom to top, with the shortest waiting person on top. That’s the monotonic stack (here, a monotonically decreasing one).
Array [2, 1, 4, 3, 5]. Next greater element (NGE) for each: 2→4, 1→4, 4→5, 3→5, 5→-1. We scan left to right, keep a stack of indices where we haven’t found NGE yet. When we see 4, 2 and 1 are smaller so we pop them and set their NGE = 4. Then we push 4’s index. When we see 5, we pop 4 and 3 (smaller) and set their NGE = 5. Result: [4, 4, 5, 5, -1].
Formal Definition
Monotonic stack: A stack (usually storing indices) that we maintain in monotonic order relative to the corresponding array values. Monotonically decreasing stack: From bottom to top, the values at the stacked indices are in non-increasing (or strictly decreasing) order. Used when we want “next greater” or “previous greater” type queries. Monotonically increasing stack: From bottom to top, values are in non-decreasing (or strictly increasing) order. Used for “next smaller” or “previous smaller.” The key: when we push a new index, we first pop all indices whose values violate the order; each pop often corresponds to finding an “answer” (e.g. next greater element) for that index.
We typically store indices in the stack so we can both compare values (arr[stack[-1]]) and write answers (result[stack.pop()] = current value or current index).
Why This Topic Matters
- Interview staple: Next Greater Element (LeetCode 496, 503), Daily Temperatures (739), Online Stock Span (901), Largest Rectangle in Histogram (84), Trapping Rain Water (42): all use monotonic stacks. Recognizing the pattern is half the solution.
- O(n) where naive is O(n²): For each position, “find next greater” naively is O(n) per element. With a monotonic stack, each element is pushed once and popped at most once, so total O(n).
- Reusable pattern: Same idea applies to “next smaller,” “previous greater,” “previous smaller,” and to 2D problems (e.g. maximal rectangle).
Mental Model
We scan the array (left to right for “next,” or right to left for “previous”). The stack holds “candidates” that are still waiting for their answer. For “next greater element”: the stack has larger elements at the bottom and smaller at the top (so the top is the smallest among the waiting). When a new value comes in that is greater than the stack top, the stack top has found its next greater: the new value. We pop and assign the answer, and repeat until the stack is empty or the top is not smaller. Then we push the current index. So the stack stays sorted by value (bottom ≥ top in terms of arr[i]). That’s a monotonically decreasing stack (by value). For “next smaller,” we maintain a monotonically increasing stack: pop while current is smaller than top, and the current is the “next smaller” for the popped indices.
Step-by-Step: Next Greater Element (NGE)
Given array arr, find for each index i the next greater element (first index j > i such that arr[j] > arr[i]). If none, -1.
- Initialize result = [-1] * n and an empty stack (of indices).
- For each index i from 0 to n-1:
  - While the stack is not empty and arr[stack[-1]] < arr[i]: the element at stack[-1] has found its next greater; it’s arr[i]. Set result[stack.pop()] = arr[i] (or = i if you need the index).
  - Push i onto the stack.
- Indices left in the stack have no next greater element; they already have -1 in result.
The stack maintains indices whose values are in non-increasing order (bottom to top): when we see a larger value, we “resolve” all smaller ones before pushing it.
ASCII Diagram: Next Greater Element
arr: [2, 1, 4, 3, 5]
i=0: stack=[0] (value 2)
i=1: stack=[0,1] (2, 1 - both waiting)
i=2: arr[2]=4 > 1 → pop 1, result[1]=4
     arr[2]=4 > 2 → pop 0, result[0]=4
     stack=[2]
i=3: stack=[2,3] (4, 3)
i=4: arr[4]=5 > 3 → pop 3, result[3]=5
     arr[4]=5 > 4 → pop 2, result[2]=5
     stack=[4]
result: [4, 4, 5, 5, -1]
Monotonically Increasing vs Decreasing
| Stack type | Pop condition (scan L→R) | Use for |
|---|---|---|
| Decreasing (bottom→top) | Pop while arr[stack[-1]] < arr[i] | Next greater element |
| Increasing (bottom→top) | Pop while arr[stack[-1]] > arr[i] | Next smaller element |
For “previous” greater/smaller, scan right to left and the same pop logic applies (previous greater = scan from right, pop when current is greater than top, etc.).
Python Implementation
Next Greater Element (each element)
def next_greater_element(arr):
    n = len(arr)
    result = [-1] * n
    stack = []  # indices
    for i in range(n):
        while stack and arr[stack[-1]] < arr[i]:
            idx = stack.pop()
            result[idx] = arr[i]  # or result[idx] = i for index
        stack.append(i)
    return result
Next Smaller Element
def next_smaller_element(arr):
    n = len(arr)
    result = [-1] * n
    stack = []  # indices
    for i in range(n):
        while stack and arr[stack[-1]] > arr[i]:
            idx = stack.pop()
            result[idx] = arr[i]
        stack.append(i)
    return result
Only the comparison changes: < for next greater (pop when current is larger), > for next smaller (pop when current is smaller).
Line-by-Line Explanation (NGE)
- result = [-1] * n: Default “no next greater.” We only update when we pop.
- stack: Holds indices of elements that don’t have an answer yet. The values at those indices are in non-increasing order from bottom to top.
- while stack and arr[stack[-1]] < arr[i]: The current value arr[i] is greater than the top’s value, so the top has found its next greater element (arr[i]). Pop and record.
- stack.append(i): After resolving everyone we can, push the current index. It becomes the new top, and its value is no larger than anything below it, which preserves monotonicity.
Time Complexity
Each index is pushed exactly once and popped at most once. So total operations are O(n). Time O(n).
Space Complexity
The stack can hold up to n indices in the worst case (e.g. strictly decreasing array—nothing gets popped until the end). Space O(n).
Edge Cases
- Empty array: Return []. The loop doesn’t run, and result = [-1] * 0 is already the empty list.
- Strictly decreasing array: No element has a next greater. Stack grows to size n; result stays all -1.
- Strictly increasing array: Each element’s next greater is the next element. We pop one per step; stack size stays 1.
- Duplicate values: For “next greater” we use < so equal values don’t pop each other; both stay in the stack. For “next greater or equal” you’d use <= when popping so equals resolve each other. Clarify with the problem.
Common Mistakes
Storing values instead of indices. If you store values, you can compare but you cannot write the result for the correct index. Always store indices in the stack (and use arr[stack[-1]] for comparison) so that when you pop, you know which position’s answer to set.
- Wrong comparison direction: Next greater → pop when current is greater than top (arr[stack[-1]] < arr[i]). Next smaller → pop when current is smaller than top (arr[stack[-1]] > arr[i]). Reversing these gives wrong answers.
- Previous vs next: “Next” = scan left to right; “previous” = scan right to left. The pop condition is the same; only the loop direction and which index gets the answer change.
Variants: Previous Greater, Stock Span
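Previous greater element: For each i, find the nearest j < i such that arr[j] > arr[i]. Two equivalent ways: (a) scan right to left (i from n-1 to 0) with the same pop-and-assign loop as NGE: pop while arr[stack[-1]] < arr[i] and set result[popped] = arr[i], then push i; (b) scan left to right, pop while arr[stack[-1]] <= arr[i], then set result[i] = arr[stack[-1]] (or -1 if the stack is empty), and push i. In both, the stack stays decreasing (bottom to top) in value.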
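One way to make the previous-greater idea concrete is a left-to-right scan that reads the stack top after popping. This is a minimal sketch; the function name previous_greater_element is illustrative and not taken from the text.

```python
def previous_greater_element(arr):
    """For each index i, return the nearest value to the left that is
    strictly greater than arr[i], or -1 if there is none."""
    result = [-1] * len(arr)
    stack = []  # indices; their values are strictly decreasing bottom to top
    for i, value in enumerate(arr):
        # Values <= the current one can never be a "previous greater"
        # for i or for anything after i, so discard them.
        while stack and arr[stack[-1]] <= value:
            stack.pop()
        if stack:
            result[i] = arr[stack[-1]]
        stack.append(i)
    return result
```

For arr = [2, 1, 4, 3, 5] this returns [-1, 2, -1, 4, -1]: only 1 and 3 have a larger value somewhere to their left.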
Stock span: For each day i, find how many consecutive days ending today (including today) had a price ≤ today’s price. Equivalently: find the “previous greater” index j; span = i - j. Use a decreasing stack (by value) and scan left to right: pop while the top’s price is ≤ today’s price; after the pops, the stack top (if any) is the previous greater day. Span for current = current index - stack[-1] after pops (or current + 1 if the stack is empty).
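The stock span rule can be sketched as follows. The name stock_span and the list-in, list-out interface are assumptions made for illustration.

```python
def stock_span(prices):
    """Return, for each day, the number of consecutive days ending on that
    day (inclusive) whose price was <= that day's price."""
    spans = []
    stack = []  # indices of days; prices strictly decreasing bottom to top
    for i, price in enumerate(prices):
        # Days with price <= today's are absorbed into today's span.
        while stack and prices[stack[-1]] <= price:
            stack.pop()
        # Stack top (if any) is the previous strictly greater day.
        spans.append(i - stack[-1] if stack else i + 1)
        stack.append(i)
    return spans
```

For prices [100, 80, 60, 70, 60, 75, 85] the spans are [1, 1, 1, 2, 1, 4, 6].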
The monotonic stack achieves O(n) by ensuring each element is pushed once and popped at most once. There is no faster asymptotic time for “next greater for all” because we must at least read each element. The pattern extends to “next greater in a circular array” (LeetCode 503): traverse twice or use indices modulo n so that the second pass resolves elements that didn’t find a greater in the first pass.
Pattern Recognition
When the problem asks for “next/previous greater/smaller” for each element, or “nearest” such element, think monotonic stack. Keywords: next greater, next smaller, stock span, histogram rectangle, trapping rain water (nearest higher bars). Decide: (1) next or previous? (scan direction); (2) greater or smaller? (pop condition and stack order). Store indices; use arr[i] for comparisons.
For “largest rectangle in histogram” (LeetCode 84): for each bar, we need the “previous smaller” and “next smaller” (indices where height is less than current). Then width = next_smaller - previous_smaller - 1 and area = height * width. If a bar has no previous smaller bar, use -1; if it has no next smaller bar, use n, so the width formula still holds. Run both a “previous smaller” and “next smaller” pass (or do both in one pass with a single stack and careful bookkeeping). Same monotonic stack pattern: for “smaller,” pop while the top is larger than the current bar, so the stack stays increasing (bottom to top).
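A two-pass version of the histogram idea might look like the sketch below. The function name and the sentinels -1 and n for "no smaller bar" are illustrative choices, and this is one of several possible formulations rather than a canonical solution.

```python
def largest_rectangle_area(heights):
    n = len(heights)
    prev_smaller = [-1] * n  # nearest index to the left with a smaller height
    next_smaller = [n] * n   # nearest index to the right with a smaller height

    stack = []
    for i in range(n):
        # Pop bars that are not strictly smaller than the current bar.
        while stack and heights[stack[-1]] >= heights[i]:
            stack.pop()
        if stack:
            prev_smaller[i] = stack[-1]
        stack.append(i)

    stack = []
    for i in range(n):
        # The current bar is the next smaller for every taller bar on the stack.
        while stack and heights[stack[-1]] > heights[i]:
            next_smaller[stack.pop()] = i
        stack.append(i)

    best = 0
    for i in range(n):
        width = next_smaller[i] - prev_smaller[i] - 1
        best = max(best, heights[i] * width)
    return best
```

For heights [2, 1, 5, 6, 2, 3] the bar of height 5 spans width 2, giving the maximum area 10.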
State: “For each element I need the next greater element. I’ll use a monotonic stack of indices: when I see a value larger than the stack top’s value, the stack top has found its next greater, so I pop and record, then push the current index. The stack stays decreasing by value (bottom to top), and each element is pushed once and popped at most once, so the time is O(n).” Give the code and mention edge cases (empty array, all -1 for a decreasing array). If asked for “previous greater” or “next smaller,” explain the same idea with a different scan direction or comparison.
Practice Problems
- LeetCode 496: Next Greater Element I (subset in a larger array).
- LeetCode 503: Next Greater Element II (circular array; double the array or traverse twice).
- LeetCode 739: Daily Temperatures (next greater index; store index, result is index - i).
- LeetCode 84: Largest Rectangle in Histogram (previous and next smaller).
- LeetCode 42: Trapping Rain Water (can use monotonic stack or two pointers).
Summary
- A monotonic stack keeps elements (usually indices) in increasing or decreasing order by value. We pop when the current value “breaks” that order; each pop often assigns an answer (e.g. next greater = current value).
- Next greater: Scan left to right, stack of indices in decreasing value (bottom to top). Pop while arr[stack[-1]] < arr[i]; result[pop] = arr[i]. Then push i. Time O(n), space O(n).
- Next smaller: Same scan; pop while arr[stack[-1]] > arr[i]. Store indices (not values) so we can write results correctly. Use for stock span, histogram rectangle, and “nearest” type problems.
9.3 Next Greater Element
Introduction
The Next Greater Element (NGE) problem: for each element in an array, find the first element to its right that is strictly greater. If none exists, use -1 (or a sentinel). This appears in three classic forms: (1) NGE for the whole array: return an array where result[i] = next greater of arr[i]; (2) NGE I (LeetCode 496): given a subset array nums1 and a larger array nums2, find the next greater element in nums2 for each value in nums1 and return answers in nums1’s order; (3) NGE II (LeetCode 503): same as (1) but the array is circular, so “to the right” wraps around. We solve all three with the same monotonic stack idea from Section 9.2: O(n) time, one pass (or two for circular). This section focuses on problem formulation, brute force vs optimal, and the circular variant.
Real-World Analogy
Imagine standing in a line of people with numbers on their shirts. For each person, you want to know: “who is the next person to my right with a higher number?” The first such person to the right is that element’s “next greater.” In a circular line, after the last person you wrap to the first. The monotonic stack is like remembering “everyone who hasn’t found their next higher person yet”; when someone with a higher number arrives, they are the answer for everyone you’re still holding.
Array [4, 2, 1, 5, 3]. NGE: 4→5, 2→5, 1→5, 5→-1, 3→-1. So result = [5, 5, 5, -1, -1]. For circular: after 3 we wrap to the start; the first element we meet is 4, which is greater than 3, so 3→4. The 5 is the maximum, so it still has no greater element. Circular result: [5, 5, 5, -1, 4].
Formal Definition
Next Greater Element (standard): Given array arr of length n, for each index i find the smallest index j such that j > i and arr[j] > arr[i]. The value of the next greater element is arr[j], or -1 if no such j exists. NGE I (496): nums1 is a subset of nums2. For each element x in nums1, find the next greater element of x in nums2 (i.e. the first element to the right of x’s position in nums2 that is greater than x). Return an array of the same length as nums1 with these values (or -1). NGE II (503): Same as standard but the array is circular: “next” wraps from the end to the start. So for each index i we consider indices i+1, i+2, …, n-1, 0, 1, … until we find a greater element.
Why This Topic Matters
- LeetCode 496 and 503: Direct “Next Greater Element” problems. 496 tests mapping from a subset to a larger array; 503 tests the circular extension. Both are solved with the same stack.
- Daily Temperatures (739): For each day, find the number of days until a warmer day, i.e. the distance to the next greater element, not the value. Same stack; store indices, and when the current index i pops index j, set result[j] = i - j.
- Building block: Many “nearest larger” or “next/previous greater” problems reduce to one or two NGE-style passes.
Mental Model
Scan left to right. The stack holds indices of elements that haven’t found their next greater yet. When we see a value larger than the stack top’s value, the stack top has found its answer: the current value. Pop and assign, repeat, then push the current index. The stack stays monotonically decreasing by value (bottom to top). For circular: after the first pass, indices left in the stack might still have an answer in the “wrap-around” part, so run a second pass (or traverse positions 0 to 2n-1 with index mod n) and use the same pop logic; in that second pass each waiting element is compared against the values that precede it in the array.
Step-by-Step: Standard NGE (One Array)
- result = [-1] * n, stack = [].
- For i in 0..n-1: while stack and arr[stack[-1]] < arr[i]: result[stack.pop()] = arr[i]. Then stack.append(i).
- Return result. Elements still in the stack have no next greater (remain -1).
Step-by-Step: NGE II (Circular)
- result = [-1] * n, stack = [].
- Traverse twice: for i in range(2 * n), use index j = i % n and value arr[j]. While stack and arr[stack[-1]] < arr[j]: result[stack.pop()] = arr[j]. Then, only in the first pass (i < n), stack.append(j). (We only push each index once—during the first pass.)
- After two full passes, every element that has a next greater in the circular sense has been assigned. Return result.
Alternatively: push indices 0..n-1 in the first pass; in the second pass (i from n to 2n-1, j = i % n) only do the pop-and-assign, no push. Same effect.
Evolution: Brute Force → Optimal
Brute Force
For each index i, scan j from i+1 to n-1 (and for circular, then 0 to i-1) until arr[j] > arr[i]. Set result[i] = arr[j] or -1. Time O(n²), extra space O(1) beyond the result array.
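The brute-force scan can be sketched as below; it is mainly useful as a reference to check the stack version against. The function name and the circular flag are illustrative assumptions.

```python
def next_greater_brute_force(arr, circular=False):
    n = len(arr)
    result = [-1] * n
    for i in range(n):
        # Standard: look only at positions i+1 .. n-1.
        # Circular: look at the n-1 positions after i, wrapping around.
        steps = n - 1 if circular else n - 1 - i
        for k in range(1, steps + 1):
            j = (i + k) % n
            if arr[j] > arr[i]:
                result[i] = arr[j]
                break
    return result
```

For arr = [1, 2, 1] the standard call returns [2, -1, -1], while circular=True returns [2, -1, 2] because the last 1 wraps around to find the 2.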
Optimal: Monotonic Stack
One pass (or two for circular); each index pushed and popped at most once. Time O(n), space O(n) for the stack. See Section 9.2 for the mechanics.
Python Implementation
Standard NGE (one array)
def next_greater_element(arr):
    n = len(arr)
    result = [-1] * n
    stack = []  # indices still waiting for their next greater
    for i in range(n):
        while stack and arr[stack[-1]] < arr[i]:
            result[stack.pop()] = arr[i]
        stack.append(i)
    return result
NGE II (circular)
def next_greater_element_circular(arr):
    n = len(arr)
    result = [-1] * n
    stack = []  # indices still waiting for their next greater
    for i in range(2 * n):
        j = i % n
        while stack and arr[stack[-1]] < arr[j]:
            result[stack.pop()] = arr[j]
        if i < n:  # push each index only once, during the first pass
            stack.append(j)
    return result
NGE I (LeetCode 496: nums1 subset of nums2)
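def next_greater_element_1(nums1, nums2):
    # Build a value -> next greater value map for nums2.
    # Keying by value relies on nums2 having distinct values (as LeetCode 496 guarantees).
    nge2 = {}
    stack = []  # values still waiting for their next greater
    for x in nums2:
        while stack and stack[-1] < x:
            nge2[stack.pop()] = x
        stack.append(x)
    while stack:
        nge2[stack.pop()] = -1
    return [nge2.get(x, -1) for x in nums1]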
Here we use a stack of values (since we need to map value → next greater value for nums1). For each x in nums2, pop all values smaller than x and set their NGE = x; then push x. At the end, remaining values have no NGE (-1). Then for each value in nums1, look up in the map.
Time and Space Complexity
Standard and circular: Each index is pushed once and popped at most once. Time O(n). Space O(n) for the stack and result. NGE I: One pass over nums2 to build the map, then one pass over nums1 to build the answer. Time O(len(nums2) + len(nums1)), space O(len(nums2)) for the stack and map.
Edge Cases
- Empty array: Return [].
- Single element: result = [-1].
- Strictly decreasing array: result = [-1] * n. Stack grows to n; nothing is ever popped until we finish.
- Circular, all equal: No element has a strictly greater; result = [-1] * n.
- NGE I: If nums1 contains a value not in nums2, we can define NGE as -1 (as in the code with get(x, -1)).
Common Mistakes
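Circular: pushing the same index twice. In the second pass we only want to resolve remaining stack indices (assign their next greater from the wrap-around). Pushing an index again in the second pass is redundant: any later pop only rewrites the answer that index already received in the first pass, so it costs extra work and makes the stack harder to reason about. So push only when i < n (first pass).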
- Strictly greater vs greater-or-equal: Standard definition is “strictly greater” (arr[j] > arr[i]). If the problem says “greater or equal,” use >= when comparing and popping.
- NGE I: The problem asks for “next greater in nums2” for each value in nums1. Build the NGE map for the whole nums2, then map nums1 values to answers. Don’t scan nums2 for each nums1 value (that would be O(n*m)).
Daily Temperatures (739) Connection
For each day i, return the number of days you have to wait until a warmer day. So we need the index of the next greater element, not the value: result[i] = (that index) - i, or 0 if there is no warmer day (so initialize result to 0, not -1). Same monotonic stack: store indices; when the current day i is warmer than the day on top of the stack, pop that index j and set result[j] = i - j. The rest is unchanged.
For “next greater index” (e.g. Daily Temperatures), store indices in the stack, pop once into a variable (j = stack.pop()), and assign result[j] = i - j. Calling pop() twice in one statement would remove two indices. For “next greater value,” assign result[j] = arr[i]. Same loop; only what you store in result changes.
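As a sketch of the distance variant, the loop below stores the popped index in a variable before using it. The function name daily_temperatures follows the LeetCode 739 problem title; the exact signature is an assumption.

```python
def daily_temperatures(temps):
    result = [0] * len(temps)  # 0 means "no warmer day later"
    stack = []                 # indices of days still waiting for a warmer day
    for i, t in enumerate(temps):
        while stack and temps[stack[-1]] < t:
            j = stack.pop()    # pop once, then reuse j
            result[j] = i - j  # distance to the next warmer day
        stack.append(i)
    return result
```

For temps = [73, 74, 75, 71, 69, 72, 76, 73] this gives [1, 1, 4, 2, 1, 1, 0, 0].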
State: “For each element I need the first element to the right that is strictly greater. I’ll use a monotonic stack of indices: scan left to right, pop and set result when current value is greater than top’s value, then push current index. O(n) time.” For circular: “I’ll traverse 2n positions with index mod n; only push in the first n steps so each index is pushed once. Second pass resolves wrap-around.” For NGE I: “Build the next-greater map for nums2 in one pass with the same stack, then map each nums1 value to its NGE.”
Practice Problems
- LeetCode 496: Next Greater Element I (nums1, nums2; build NGE for nums2, map to nums1).
- LeetCode 503: Next Greater Element II (circular; two passes or 2n loop with mod).
- LeetCode 739: Daily Temperatures (next greater index; result[i] = index - i).
Summary
- Next Greater Element: For each index i, result[i] = first value to the right that is strictly greater, or -1. Use a monotonically decreasing stack (by value, bottom to top); pop when current > top and assign result[pop] = current value; push current index. O(n) time and space.
- Circular (NGE II): Traverse 2n with index mod n; only push when index < n. Second pass resolves wrap-around. Same O(n).
- NGE I: Build NGE map for nums2 (stack of values, pop when current > top, map[pop] = current); then answer[i] = map[nums1[i]]. Daily Temperatures: store indices, result[pop] = current_index - pop.
9.4 Expression Evaluation
Introduction
Expression evaluation is the task of computing the value of a mathematical expression given as a string (e.g. "3 + 4 * 2" or "( 1 + 2 ) * 3"). Expressions can be written in infix (operator between operands, e.g. 3 + 4), postfix (RPN—operands first, then operator, e.g. 3 4 +), or prefix (Polish—operator first). Stacks are the natural tool: postfix is evaluated with a single stack (scan left to right; push operands, when you see an operator pop two operands, compute, push result). Infix is either converted to postfix first (using an operator stack and precedence rules) or evaluated with two stacks (operands and operators) or with a single pass that respects parentheses and precedence. This section covers postfix evaluation, infix-to-postfix conversion, and a simple calculator pattern. Time is O(n) for a string of length n.
Real-World Analogy
Think of postfix like a stack of instructions: “put 3 on the table, put 4 on the table, now add the top two and replace them with the result.” You only ever need to look at the “top” numbers and the next instruction. No parentheses or “do multiplication before addition”—the order of operations is fixed by the order of tokens. Infix is how we usually write: “3 + 4 * 2.” To evaluate correctly we must defer some operations (e.g. we see + but we don’t add yet if * comes next) and use a stack to hold operators until we can apply them. The stack is the “pending work” list.
Postfix 3 4 + 2 *: push 3, push 4, see + → pop 4, pop 3, push 7. See 2 → push 2. See * → pop 2, pop 7, push 14. Result 14. Infix 3 + 4 * 2: we want 3 + (4*2) = 11; * has higher precedence than +, so we evaluate * before + when building postfix or when using two stacks.
Formal Definition
Infix: Operators appear between operands. Parentheses and precedence (e.g. * before +) determine order. Example: (1 + 2) * 3. Postfix (RPN): No parentheses; each operator follows its operands. Example: 1 2 + 3 * means (1+2)*3. Evaluation: scan left to right; operands go on a stack; when an operator is seen, pop two operands, apply the operator, push the result. Prefix: Operator precedes operands. Example: * + 1 2 3. Evaluation is often done right to left with a stack. We focus on postfix evaluation and infix to postfix (Shunting-yard idea) as the core stack-based methods.
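Since the section's later code focuses on postfix, here is a minimal sketch of the right-to-left prefix evaluation mentioned above. It assumes valid input given as a list of string tokens and integer division that truncates toward zero; the name eval_prefix is illustrative.

```python
def eval_prefix(tokens):
    stack = []
    for tok in reversed(tokens):  # scan right to left
        if tok in {"+", "-", "*", "/"}:
            # Scanning from the right, the FIRST pop is the left operand.
            left = stack.pop()
            right = stack.pop()
            if tok == "+":
                stack.append(left + right)
            elif tok == "-":
                stack.append(left - right)
            elif tok == "*":
                stack.append(left * right)
            else:
                stack.append(int(left / right))  # truncate toward zero
        else:
            stack.append(int(tok))
    return stack[0]
```

For the example above, eval_prefix("* + 1 2 3".split()) returns 9, matching (1 + 2) * 3. Note that the operand order is the mirror image of postfix evaluation, where the first pop is the right operand.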
Why This Topic Matters
- Classic stack application: Postfix evaluation is the standard “stack for expression” example. Many interview problems (basic calculator, expression parsing) build on this.
- LeetCode 150 (Evaluate Reverse Polish Notation), 224 (Basic Calculator), 227 (Basic Calculator II): Direct expression evaluation. 150 is postfix; 224/227 are infix with +, -, *, / and sometimes parentheses.
- Parsing and compilers: Expression parsing is a small version of what parsers do. Understanding operator precedence and stack-based evaluation transfers to more complex parsing.
Mental Model
Postfix: One stack of numbers. Scan tokens: if it’s a number, push it; if it’s an operator, pop two numbers (right first, then left), compute left op right, push the result. At the end the stack has one value—the answer. Infix to postfix: One stack for operators (and maybe “(”). Output a list of tokens (postfix). Scan infix: numbers go straight to output; for an operator, pop from the stack and output all operators that have precedence ≥ current (and are not “(”), then push current; for “(” push; for “)” pop and output until “(”. Finally pop and output the rest of the stack.
Step-by-Step: Postfix Evaluation
- Split the expression into tokens (numbers and operators). Assume valid postfix.
- Stack = []. For each token: if token is a number, push it. If token is an operator (+, -, *, /), pop the top two values (call them right and left, in that order). Compute left op right (e.g. left - right for “-”). Push the result.
- After processing all tokens, the stack should have exactly one element—the result. Return it.
Note: For subtraction and division, the first pop is the right operand and the second pop is the left. So “left op right” gives the correct order (e.g. 5 3 - → pop 3, pop 5 → 5 - 3 = 2).
Step-by-Step: Infix to Postfix (Shunting-Yard Idea)
- Output list = [], operator stack = []. Scan infix tokens left to right.
- Number: append to output.
- “(”: push onto operator stack.
- “)”: pop from stack and append to output until we pop “(”. Discard the “(”.
- Operator (+, -, *, /): while the stack is not empty and the top is an operator with precedence ≥ current (and top ≠ “(”), pop and append to output. Then push the current operator.
- End of input: pop all remaining operators from the stack and append to output. Result is the postfix expression. Then evaluate the postfix with the previous algorithm.
Precedence: * and / higher than + and -. Left associativity: when precedence is equal, pop the top first (e.g. 1 - 2 - 3 → 1 2 - 3 - in postfix).
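The conversion steps above can be sketched as follows. This assumes the input is already split into valid tokens and that all four operators are left-associative; the names PRECEDENCE and infix_to_postfix are illustrative.

```python
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}

def infix_to_postfix(tokens):
    output = []  # postfix tokens
    ops = []     # operator stack; may also hold "("
    for tok in tokens:
        if tok == "(":
            ops.append(tok)
        elif tok == ")":
            # Flush operators back to the matching "(".
            while ops[-1] != "(":
                output.append(ops.pop())
            ops.pop()  # discard the "("
        elif tok in PRECEDENCE:
            # Pop operators of higher or equal precedence (left associativity),
            # but never pop past an open parenthesis.
            while ops and ops[-1] != "(" and PRECEDENCE[ops[-1]] >= PRECEDENCE[tok]:
                output.append(ops.pop())
            ops.append(tok)
        else:
            output.append(tok)  # operand goes straight to the output
    while ops:
        output.append(ops.pop())
    return output
```

For example, infix_to_postfix("3 + 4 * 2".split()) returns ['3', '4', '2', '*', '+'], and "1 - 2 - 3" becomes ['1', '2', '-', '3', '-'].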
ASCII Diagram: Postfix Evaluation
Postfix: 3 4 + 2 *
Token 3: stack [3]
Token 4: stack [3, 4]
Token +: pop 4, 3 → 3+4=7, stack [7]
Token 2: stack [7, 2]
Token *: pop 2, 7 → 7*2=14, stack [14]
Result: 14
Python Implementation
Evaluate Postfix (LeetCode 150 style)
def eval_rpn(tokens):
    stack = []
    for t in tokens:
        if t in "+-*/":
            right = stack.pop()
            left = stack.pop()
            if t == "+": stack.append(left + right)
            elif t == "-": stack.append(left - right)
            elif t == "*": stack.append(left * right)
            else: stack.append(int(left / right))  # truncate toward zero
        else:
            stack.append(int(t))
    return stack[0]
LeetCode 150 uses string tokens like "3", "4", "+". Division truncates toward zero (e.g. 6 / -132 = 0 in Python 3 with int()).
Simple Infix Calculator (no parentheses, + - * /)
One common approach: treat expression as a sum of terms. Each term is a product (or single number). Scan: keep a running result and current “term”; when you see + or -, add the current term to the result and reset term; when you see * or /, update the term. Alternatively: first convert infix to postfix (with precedence), then evaluate postfix. Below is a two-stack style: numbers and operators; when we see an operator with lower or equal precedence than the top, we collapse (pop two numbers and one operator, compute, push result) until we can push the current operator.
def calculate_infix(s):
    # Remove spaces, then parse numbers and + - * /
    s = s.replace(" ", "")
    i, n = 0, len(s)
    num_stack = []
    op_stack = []
    precedence = {"+": 0, "-": 0, "*": 1, "/": 1}
    def apply_op():
        r, l = num_stack.pop(), num_stack.pop()
        op = op_stack.pop()
        if op == "+": num_stack.append(l + r)
        elif op == "-": num_stack.append(l - r)
        elif op == "*": num_stack.append(l * r)
        else: num_stack.append(int(l / r))
    while i < n:
        if s[i].isdigit():
            j = i
            while j < n and s[j].isdigit(): j += 1
            num_stack.append(int(s[i:j]))
            i = j
        else:
            op = s[i]
            while op_stack and precedence.get(op_stack[-1], -1) >= precedence[op]:
                apply_op()
            op_stack.append(op)
            i += 1
    while op_stack:
        apply_op()
    return num_stack[0]
This assumes a well-formed expression with no parentheses. For parentheses, push “(” and pop until “(” when we see “)”.
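The "sum of terms" approach described before the two-stack code can be sketched without any stack at all. This assumes a well-formed expression with non-negative integers, the operators + - * /, optional spaces, and no parentheses; the name calculate_terms is illustrative.

```python
def calculate_terms(s):
    result = 0  # sum of all completed terms
    term = 0    # the product/quotient currently being built
    num = 0     # the number currently being read
    op = "+"    # the operator that precedes `num`
    for ch in s + "+":  # the trailing "+" flushes the last number
        if ch.isdigit():
            num = num * 10 + int(ch)
        elif ch in "+-*/":
            if op == "+":
                result += term  # previous term is complete
                term = num
            elif op == "-":
                result += term
                term = -num
            elif op == "*":
                term *= num
            else:
                term = int(term / num)  # truncate toward zero
            op, num = ch, 0
        # spaces are skipped
    return result + term
```

For example, calculate_terms("3+2*2") returns 7 and calculate_terms(" 3+5 / 2 ") returns 5. The design choice is that + and - close the current term, while * and / only modify it, which is how precedence is respected without an operator stack.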
Line-by-Line: Postfix Evaluation
- stack = []: We only need one stack for operands.
- if t in "+-*/": Token is an operator. Pop two operands (right first, then left).
- left - right, int(left / right): Order matters for - and /. Right was popped first. Integer division toward zero: use int(a / b) in Python for LeetCode 150.
- else: stack.append(int(t)): Token is a number string; convert and push.
- return stack[0]: Valid postfix leaves exactly one value.
Time and Space Complexity
Postfix evaluation: One pass over n tokens. Each token is pushed once; each operator causes two pops and one push. Time O(n), space O(n) for the stack (worst case: many operands before any operator). Infix to postfix: One pass; each token and each operator pushed and popped O(1) times. O(n) time and space.
Edge Cases
- Single operand: Postfix with one number, e.g. ["42"]. Stack ends with [42]. Return 42.
- Negative numbers: In LeetCode 150, tokens are strings; “-2” might be one token. Check problem: sometimes negative numbers are represented as “-”, “2” (two tokens). Handle according to problem.
- Division by zero: Postfix can have “a 0 /”. Guard or assume valid input.
- Integer division: LeetCode 150 requires truncation toward zero. In Python, int(6 / -132) is 0; 6 // -132 is -1. Use int(a / b) for “truncate toward zero.”
Common Mistakes
Wrong operand order for subtraction and division. We pop the right operand first, then the left. So result = left op right. For “5 3 -” we want 5 - 3 = 2. If you do right - left you get -2. Same for division: “6 2 /” should be 3, not 1/3.
- Precedence in infix: * and / must be applied before + and -. When converting to postfix, an operator of higher precedence on the stack is popped before pushing a lower-precedence one. When two operators have equal precedence (e.g. + and -), left associativity means pop the top first.
- Parentheses in infix: “(” has the effect of “starting fresh” for precedence; “)” pops until we remove the matching “(”. Don’t output “(” or “)” in the postfix.
Basic Calculator With Parentheses (LeetCode 224)
Expression may contain +, -, parentheses, and spaces. One approach: use a stack to store the result and sign for each “level” of parentheses. When we see “(”, push current result and current sign; when we see “)”, pop and combine. Alternatively: convert to postfix respecting parentheses (treat “(” on the operator stack as a barrier, i.e. lower precedence than any operator, so we don’t pop past it until “)”), then evaluate. Another approach: recursive or iterative with a sign variable; when we see “(”, evaluate the subexpression (recursively or with a stack) and multiply by the sign.
For “basic calculator” with +, -, (, ): keep a stack of (result_so_far, sign_before_this_level). When you see “(”, push (result, sign) and reset result=0, sign=1. When you see “)”, pop (prev, s), do result = prev + s * result. When you see “+”, sign=1; “-”, sign=-1. When you see a number, add sign*num to result. This avoids building a full postfix string.
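The (result, sign) stack idea can be sketched as below. It assumes a valid expression containing only non-negative integers, +, -, parentheses, and spaces; the name calculate_with_parens is illustrative.

```python
def calculate_with_parens(s):
    stack = []   # saved (result_so_far, sign_before_paren) per open "("
    result = 0   # running total at the current nesting level
    sign = 1     # sign to apply to the number being read
    num = 0      # the number currently being read
    for ch in s:
        if ch.isdigit():
            num = num * 10 + int(ch)
        elif ch in "+-":
            result += sign * num  # commit the number just read
            num = 0
            sign = 1 if ch == "+" else -1
        elif ch == "(":
            stack.append((result, sign))  # remember the outer context
            result, sign = 0, 1           # start fresh inside the parentheses
        elif ch == ")":
            result += sign * num          # commit the last inner number
            num = 0
            prev, sign_before = stack.pop()
            result = prev + sign_before * result
        # spaces are skipped
    return result + sign * num
```

For example, calculate_with_parens("(1+(4+5+2)-3)+(6+8)") returns 23, and a leading unary minus such as "-(2+3)" yields -5 because the sign is saved with the outer context.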
For postfix: “I’ll use one stack. Scan tokens: numbers go on the stack; for an operator I pop two operands (right then left), compute left op right, push the result. Final stack top is the answer. O(n) time.” Mention operand order for - and / and integer division. For infix: “I can convert to postfix using an operator stack and precedence, then evaluate, or use two stacks and collapse when precedence allows.”
Practice Problems
- LeetCode 150: Evaluate Reverse Polish Notation (postfix evaluation).
- LeetCode 224: Basic Calculator (+,-, parentheses, spaces).
- LeetCode 227: Basic Calculator II (+, -, *, /, no parentheses).
Summary
- Postfix evaluation: One stack. Scan tokens: push numbers; for an operator pop two (right, then left), compute left op right, push result. Return stack[0]. Order of operands matters for - and /.
- Infix to postfix: Output list + operator stack. Numbers to output; “(” push; “)” pop to output until “(”; operator: pop and output while top has precedence ≥ current, then push. Evaluate the resulting postfix.
- Use int(left / right) for integer division toward zero. Postfix and infix evaluation are O(n) time and space.
9.5 Queue Implementation
Introduction
A queue is a linear data structure that follows FIFO (First In, First Out): the first element added is the first one removed. It supports enqueue (add at the rear), dequeue (remove from the front), and peek (read the front without removing). Like a stack, we want these operations in O(1) time. In Python, using a list with append for enqueue and pop(0) for dequeue is wrong for performance—pop(0) is O(n) because it shifts all elements. The correct approach is collections.deque (double-ended queue), which supports O(1) append and popleft, or a linked list with head and tail pointers (enqueue at tail, dequeue at head). This section covers the queue ADT, correct Python usage, and a linked-list implementation for understanding.
Real-World Analogy
Think of a line at a ticket counter. People join at the back (enqueue) and are served from the front (dequeue). The first person in line is the first to be served—FIFO. You cannot serve someone from the middle; you only see who is at the front. Queues model task scheduling, BFS (breadth-first search), buffering, and any “first come, first served” scenario.
Enqueue 10, 20, 30. Front is 10. Dequeue returns 10; front is now 20. Dequeue returns 20; the queue has only 30. Order of removal is always the same as order of insertion.
Formal Definition
Queue (ADT): A collection that supports: (1) enqueue(x)—add element x at the rear (back) of the queue; (2) dequeue()—remove and return the element at the front; (3) peek() or front()—return the front element without removing it; (4) isEmpty()—return true if the queue has no elements. Optionally: size(). Only the front element can be removed or read; only the rear can receive new elements. FIFO order is guaranteed. All operations should be O(1) for an efficient implementation.
Why This Topic Matters
- BFS: Breadth-first search uses a queue to process nodes level by level. You enqueue neighbors and dequeue the current node. With a list and pop(0), every dequeue costs O(n), so the n dequeues alone can cost O(n²) on a graph with n nodes; with a deque they cost O(n) in total.
- Interview and production: “Implement a queue,” “queue using stacks” (LeetCode 232), “sliding window” with a deque (Section 9.9). Correct choice of structure (deque vs list) matters.
- Foundation for deque and priority queue: A queue is the basic FIFO structure; deque extends it with O(1) operations at both ends; priority queue (Section 9.8) orders by priority instead of arrival time.
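To make the BFS use case concrete, here is a minimal sketch that uses collections.deque as the queue. The adjacency-dict graph format and the name bfs_order are assumptions made for illustration.

```python
from collections import deque

def bfs_order(graph, start):
    """Return nodes in breadth-first order from `start`.
    `graph` maps each node to a list of its neighbors."""
    visited = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()  # O(1) dequeue from the front
        order.append(node)
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)  # O(1) enqueue at the rear
    return order
```

With graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}, bfs_order(graph, "A") returns ['A', 'B', 'C', 'D']: both neighbors of A are visited before D.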
Mental Model
Picture a horizontal tube: elements enter at the rear (right) and leave from the front (left). You only add at one end and remove from the other. In a linked list, we keep a head (front—where we dequeue) and a tail (rear—where we enqueue). Enqueue: create a new node, set tail.next = new_node, tail = new_node (or head = tail = new_node if empty). Dequeue: return head.data, head = head.next (and if head becomes None, set tail = None). In a deque (double-ended queue), the underlying structure allows O(1) append and popleft so we use it as a queue without implementing pointers ourselves.
Step-by-Step: Operations
Using collections.deque (recommended in Python)
- Enqueue: q.append(x). O(1).
- Dequeue: q.popleft(). O(1).
- Peek: q[0]. O(1).
- isEmpty: len(q) == 0 or not q. O(1).
Using a list (avoid for queues)
Enqueue with append is O(1), but dequeue with pop(0) is O(n)—every element shifts. Do not use a list as a queue when you expect many enqueue/dequeue operations.
Using a linked list (head + tail)
Enqueue: new node at tail; update tail. Dequeue: remove head; update head (and tail if queue becomes empty). Both O(1).
ASCII Diagram
Queue (front left, rear right):
enqueue(10), enqueue(20), enqueue(30):
front → [10] → [20] → [30] ← rear
dequeue() → returns 10
front → [20] → [30] ← rear
peek() → 20
dequeue() → returns 20
front → [30] ← rear (head and tail point to same node)
Python Implementation
Using collections.deque (recommended)
from collections import deque
q = deque()
q.append(10) # enqueue
q.append(20)
q.append(30)
front = q.popleft() # dequeue → 10
peek = q[0] # peek → 20
is_empty = len(q) == 0
Queue class wrapping deque
from collections import deque
class Queue:
    def __init__(self):
        self._data = deque()
    def enqueue(self, x):
        self._data.append(x)
    def dequeue(self):
        if self.is_empty():
            raise IndexError("dequeue from empty queue")
        return self._data.popleft()
    def peek(self):
        if self.is_empty():
            raise IndexError("peek from empty queue")
        return self._data[0]
    def is_empty(self):
        return len(self._data) == 0
    def size(self):
        return len(self._data)
Linked-list implementation
class Node:
    def __init__(self, data):
        self.data = data
        self.next = None
class QueueLinkedList:
    def __init__(self):
        self.head = None  # front: dequeue here
        self.tail = None  # rear: enqueue here
    def enqueue(self, x):
        new = Node(x)
        if self.tail is None:
            self.head = self.tail = new
        else:
            self.tail.next = new
            self.tail = new
    def dequeue(self):
        if self.head is None:
            raise IndexError("dequeue from empty queue")
        val = self.head.data
        self.head = self.head.next
        if self.head is None:
            self.tail = None
        return val
    def peek(self):
        if self.head is None:
            raise IndexError("peek from empty queue")
        return self.head.data
    def is_empty(self):
        return self.head is None
Line-by-Line Explanation (Linked List)
- enqueue: New node at tail. If queue was empty, head = tail = new. Else tail.next = new, tail = new. O(1).
- dequeue: If empty, raise. Else save head.data, head = head.next. If head becomes None (was single element), set tail = None. Return saved value. O(1).
- peek: Return head.data if not empty. O(1).
Time and Space Complexity
Enqueue, dequeue, peek, is_empty: O(1) with deque or linked list. List with pop(0): Dequeue is O(n). Space O(n) for n elements stored.
Edge Cases
- Dequeue from empty queue: Undefined unless we define it. Raise IndexError or return None. Always check is_empty before dequeue/peek in production, or document that the caller must ensure non-empty.
- Single element: After one enqueue, head == tail. One dequeue leaves head and tail both None. The linked-list code handles this with the if self.head is None: self.tail = None branch in dequeue.
Common Mistakes
- Using a list with pop(0) for dequeue: pop(0) shifts all remaining elements left, so each dequeue is O(n). For BFS or any algorithm that does many enqueue/dequeue operations, this turns an O(n) traversal into O(n²). Always use collections.deque with append and popleft for a queue in Python.
- Forgetting to update tail when queue becomes empty: In the linked-list dequeue, when you remove the last element (head becomes None after head = head.next), you must set tail = None. Otherwise tail still points to the removed node, and the next enqueue links the new node after that stale node while head stays None, so the new element is unreachable.
- Queue vs stack: Queue = FIFO (first in, first out); stack = LIFO (last in, first out). Use a queue for BFS and “first come first served”; use a stack for DFS and “most recent.”
Queue Using Two Stacks (LeetCode 232)
To implement a queue with two stacks: one stack for “input” (enqueue: push onto input stack) and one for “output” (dequeue: if output is empty, pop all from input and push onto output, then pop from output; else just pop from output). Peek: same as dequeue but don’t remove the top of output. Amortized O(1) per operation: each element is pushed and popped at most twice (once per stack). This is a common interview follow-up.
For “implement queue using stacks,” use stack_in and stack_out. enqueue: push to stack_in. dequeue: if stack_out is empty, while stack_in: stack_out.push(stack_in.pop()); then return stack_out.pop(). This way the oldest element in stack_in ends up at the top of stack_out. Amortized O(1) because each element moves at most once from in to out.
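The two-stack scheme described above can be sketched as follows. This is a minimal illustration, not the only valid layout: the class name QueueViaStacks and the helper _refill are names chosen here, and plain Python lists stand in for the stacks (append = push, pop = pop).

```python
class QueueViaStacks:
    """FIFO queue built from two Python lists used as stacks."""

    def __init__(self):
        self._stack_in = []   # receives every enqueued element
        self._stack_out = []  # holds elements in dequeue order (oldest on top)

    def _refill(self):
        # Only move elements when the output stack is empty; otherwise
        # newer elements would land on top of older ones.
        if not self._stack_out:
            while self._stack_in:
                self._stack_out.append(self._stack_in.pop())

    def enqueue(self, x):
        self._stack_in.append(x)

    def dequeue(self):
        self._refill()
        if not self._stack_out:
            raise IndexError("dequeue from empty queue")
        return self._stack_out.pop()

    def peek(self):
        self._refill()
        if not self._stack_out:
            raise IndexError("peek from empty queue")
        return self._stack_out[-1]

    def is_empty(self):
        return not self._stack_in and not self._stack_out


q = QueueViaStacks()
q.enqueue(1)
q.enqueue(2)
first = q.dequeue()   # 1
q.enqueue(3)
second = q.dequeue()  # 2
third = q.peek()      # 3
```

Refilling only when the output stack is empty is what preserves FIFO order and keeps the cost amortized O(1): every element crosses from the input stack to the output stack at most once.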
State: “A queue is FIFO—enqueue at rear, dequeue from front. In Python I use collections.deque with append for enqueue and popleft for dequeue—both O(1). I avoid list with pop(0) because that’s O(n) per dequeue.” If asked to implement from scratch, give the linked-list version with head and tail, and mention that deque is the standard library queue. If asked “queue using stacks,” describe the two-stack approach with amortized O(1).
Practice Problems
- LeetCode 232: Implement Queue using Stacks (two stacks; amortized O(1)).
- BFS on a graph or tree (queue for level-order).
- LeetCode 225: Implement Stack using Queues (reverse problem—one queue and reorganize on push, or two queues).
Summary
- A queue is FIFO: enqueue at rear, dequeue from front, peek at front. All O(1) with deque or linked list (head/tail).
- In Python, use collections.deque: append to enqueue, popleft to dequeue. Do not use a list with pop(0); it is O(n) per dequeue.
- Linked-list queue: enqueue at tail (create node, tail.next = new, tail = new); dequeue at head (return head.data, head = head.next; set tail = None if empty).
- Queue using two stacks: enqueue pushes to one stack; dequeue pops from the other, refilling from the first when empty. Amortized O(1).
9.6 Circular Queue
Introduction
A circular queue (or ring buffer) is a queue implemented with a fixed-size array where the front and rear indices wrap around to the start when they reach the end. This gives O(1) enqueue and dequeue without shifting elements and without dynamic allocation—useful in embedded systems, producer-consumer buffers, and when a bounded capacity is required. The main design choice is how to distinguish “empty” from “full”: (1) reserve one slot (never store more than capacity − 1 elements) so that “front == rear” means empty and “(rear + 1) % capacity == front” means full; or (2) store a separate size (or count) so empty is size == 0 and full is size == capacity. This section covers both approaches and a complete Python implementation (LeetCode 622 style).
Real-World Analogy
Imagine a circular track with a fixed number of stations. People board at the “rear” station and get off at the “front” station. When the rear reaches the last station, the next boarding happens at station 0—the track is circular. When the front catches up to the rear (all stations between them are empty), the queue is empty. When the rear is one step behind the front (going forward), the queue is full. The circular arrangement means we never “shift” anyone; we only move the front and rear pointers (indices) and use modulo arithmetic to wrap.
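Capacity 4 (indices 0–3). Empty: front = 0, rear = 0. Enqueue 10, 20: rear moves to 2, data = [10, 20, _, _]. Dequeue: return 10, front = 1. Enqueue 30, 40: 30 goes to index 2 and 40 to index 3, so rear advances to 3 and then wraps to 0; data = [_, 20, 30, 40] with front = 1, rear = 0. With a size counter, size = 3 and one more enqueue is allowed; with one-slot reservation, (rear + 1) % 4 == front, so the queue is already full. The free slot is index 0 (just behind front), which is where the next enqueue would write.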
Formal Definition
Circular queue: A queue stored in a fixed-size array of capacity k. Two indices: front (where we dequeue) and rear (next free slot for enqueue, or last enqueued—depending on convention). Indices advance with wrap: front = (front + 1) % k, rear = (rear + 1) % k. Empty: no elements (front == rear if one slot is reserved; or size == 0). Full: capacity reached ((rear + 1) % k == front with one slot reserved; or size == k). Enqueue: place element at rear, advance rear (if not full). Dequeue: return element at front, advance front (if not empty). All operations O(1).
Why This Topic Matters
- LeetCode 622: Design Circular Queue: Direct implementation problem. Tests understanding of wrap-around and full/empty handling.
- Bounded buffers: Producer-consumer queues, task queues with a fixed maximum size, and streaming pipelines often use a circular queue to avoid unbounded growth and to reuse a fixed block of memory.
- No shifting, O(1) per operation: Unlike a linear array queue (where dequeue would shift or we’d leave a hole), the circular design reuses freed slots by wrapping the rear index.
Mental Model
Picture an array bent into a circle. front points to the next element to dequeue; rear points to the next slot where we’ll enqueue (or to the last enqueued element—convention varies). When we enqueue, we write at rear and do rear = (rear + 1) % k. When we dequeue, we read from front and do front = (front + 1) % k. The queue content is the segment from front to rear (going forward, wrapping). If we don’t reserve a slot, “front == rear” can mean both empty and full—so we either waste one slot (max capacity k−1) or maintain a separate size/count variable.
Two Conventions: One Slot Reserved vs Size Counter
Convention 1: Reserve one slot (capacity − 1 elements max)
- Empty: front == rear.
- Full: (rear + 1) % capacity == front. We never fill the last slot so that “front == rear” uniquely means empty.
- Enqueue: if not full, write at rear, then rear = (rear + 1) % capacity.
- Dequeue: if not empty, read from front, then front = (front + 1) % capacity.
Convention 2: Store size (or count)
- Empty: size == 0. Full: size == capacity.
- Enqueue: if size < capacity, write at rear, rear = (rear + 1) % capacity, size += 1.
- Dequeue: if size > 0, read from front, front = (front + 1) % capacity, size -= 1.
Both give O(1) operations. Convention 2 uses one extra integer (size) but allows storing exactly capacity elements; Convention 1 uses no extra variable but stores at most capacity − 1 elements.
Step-by-Step (Size-Based)
- Allocate array of length capacity. front = 0, rear = 0, size = 0.
- Enqueue(x): if size == capacity, return False (full). Else: arr[rear] = x, rear = (rear + 1) % capacity, size += 1, return True.
- Dequeue(): if size == 0, return None or error. Else: val = arr[front], front = (front + 1) % capacity, size -= 1, return val.
- Front(): if size == 0 return -1; else return arr[front]. Rear(): if size == 0 return -1; else return arr[(rear - 1 + capacity) % capacity] (last enqueued).
ASCII Diagram
Capacity 4, indices 0..3 (circular):
Empty: front=0, rear=0, size=0
  [  _,  _,  _,  _ ]
     ^
     f,r
After enqueue 10, 20: front=0, rear=2, size=2
  [ 10, 20,  _,  _ ]
     ^       ^
     f       r
After dequeue (removes 10): front=1, rear=2, size=1
  [  _, 20,  _,  _ ]    (index 0 still holds the stale 10, but it is logically free)
         ^   ^
         f   r
After enqueue 30, 40: rear advances 2→3, then wraps 3→0. front=1, rear=0, size=3
  [  _, 20, 30, 40 ]    (rear=0 is the next write slot; last written at index 3)
     ^   ^
     r   f
Rear element = arr[(rear-1+4)%4] = arr[3] = 40.
Python Implementation (LeetCode 622 Style)
class MyCircularQueue:
    def __init__(self, k: int):
        self.capacity = k
        self.arr = [0] * k
        self.front = 0
        self.rear = 0
        self.size = 0

    def enQueue(self, value: int) -> bool:
        if self.isFull():
            return False
        self.arr[self.rear] = value
        self.rear = (self.rear + 1) % self.capacity
        self.size += 1
        return True

    def deQueue(self) -> bool:
        if self.isEmpty():
            return False
        self.front = (self.front + 1) % self.capacity
        self.size -= 1
        return True

    def Front(self) -> int:
        if self.isEmpty():
            return -1
        return self.arr[self.front]

    def Rear(self) -> int:
        if self.isEmpty():
            return -1
        return self.arr[(self.rear - 1 + self.capacity) % self.capacity]

    def isEmpty(self) -> bool:
        return self.size == 0

    def isFull(self) -> bool:
        return self.size == self.capacity
LeetCode 622 uses method names enQueue, deQueue, Front, Rear, isEmpty, isFull; deQueue returns bool (success), and Front/Rear return -1 when empty. We use a size counter so we can store exactly capacity elements.
Line-by-Line Explanation
- rear: Points to the next free slot. So on enqueue we write at rear, then advance rear.
- enQueue: If full, False. Else arr[rear] = value, rear = (rear + 1) % capacity, size += 1. O(1).
- deQueue: If empty, False. Else front = (front + 1) % capacity, size -= 1. We don’t need to clear the old cell. O(1).
- Rear(): Last enqueued element is at (rear - 1 + capacity) % capacity, because rear is the next free slot. If empty, return -1.
Time and Space Complexity
Enqueue, dequeue, front, rear, isEmpty, isFull: all O(1). Space O(capacity) for the array and O(1) for the indices and size.
Edge Cases
- Enqueue when full: Return False (or raise). Do not overwrite or advance rear.
- Dequeue when empty: Return False (or None). Do not advance front.
- Capacity 0 or 1: With size, capacity 1 is fine: one element, front == rear after one enqueue, size == 1. Rear() = arr[(rear-1+1)%1] = arr[0]. For capacity 0, enqueue should always fail.
Common Mistakes
- Confusing full and empty when not using a size: If you use “front == rear” for both empty and full (and don’t reserve a slot), you cannot distinguish. You must either (1) reserve one slot so full is (rear+1)%cap == front and never let rear “catch up” to front except when empty, or (2) maintain a size/count. Many bugs come from forgetting the one-slot reservation or the size update.
- Wrong index for Rear(): If rear is the “next write” index, the last enqueued value is at (rear - 1 + capacity) % capacity. If rear is “last written,” then Rear() is arr[rear] and you advance rear after writing—then rear points to the last element. Be consistent with your convention.
- Forgetting to wrap: Always use (rear + 1) % capacity and (front + 1) % capacity. Plain rear + 1 can go out of bounds when rear == capacity - 1.
One-Slot-Reserved Implementation (No Size)
class CircularQueueReservedSlot:
    def __init__(self, k: int):
        self.capacity = k
        self.arr = [0] * k
        self.front = 0
        self.rear = 0  # next free slot

    def enQueue(self, value: int) -> bool:
        if (self.rear + 1) % self.capacity == self.front:
            return False  # full
        self.arr[self.rear] = value
        self.rear = (self.rear + 1) % self.capacity
        return True

    def deQueue(self) -> bool:
        if self.front == self.rear:
            return False  # empty
        self.front = (self.front + 1) % self.capacity
        return True

    def Front(self) -> int:
        if self.front == self.rear:
            return -1
        return self.arr[self.front]

    def Rear(self) -> int:
        if self.front == self.rear:
            return -1
        return self.arr[(self.rear - 1 + self.capacity) % self.capacity]

    def isEmpty(self) -> bool:
        return self.front == self.rear

    def isFull(self) -> bool:
        return (self.rear + 1) % self.capacity == self.front
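Here we store at most capacity − 1 elements. Full is detected by (rear + 1) % capacity == front. To hold k elements, as LeetCode 622 requires, allocate k + 1 slots and wrap with % (k + 1). As written, the class also needs k ≥ 1, because k = 0 would make the modulo divide by zero.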
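One way to keep the reserved-slot convention while still holding k elements is to allocate k + 1 physical slots. The sketch below assumes that layout; the class name ReservedSlotQueue, the attribute n, and the choice to return None from an empty dequeue are illustrative choices, not part of the LeetCode 622 interface.

```python
class ReservedSlotQueue:
    """Ring buffer that holds up to k items by allocating k + 1 slots."""

    def __init__(self, k: int):
        self.n = k + 1            # physical slots; one always stays empty
        self.arr = [None] * self.n
        self.front = 0            # index of the oldest element
        self.rear = 0             # index of the next free slot

    def is_empty(self) -> bool:
        return self.front == self.rear

    def is_full(self) -> bool:
        return (self.rear + 1) % self.n == self.front

    def enqueue(self, value) -> bool:
        if self.is_full():
            return False
        self.arr[self.rear] = value
        self.rear = (self.rear + 1) % self.n
        return True

    def dequeue(self):
        if self.is_empty():
            return None
        value = self.arr[self.front]
        self.front = (self.front + 1) % self.n
        return value


rq = ReservedSlotQueue(3)
accepted = [rq.enqueue(v) for v in (10, 20, 30, 40)]  # fourth is rejected
oldest = rq.dequeue()
wrapped = rq.enqueue(40)  # rear wraps from index 3 back to 0
```

Because n is always at least 1, this variant also tolerates k = 0: the queue is simultaneously empty and full, so every enqueue fails without a division by zero.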
When implementing “design circular queue,” state your convention: “I’ll use a size counter so the queue can hold exactly capacity elements. front is the dequeue index, rear is the next enqueue index. Empty when size==0, full when size==capacity.” Alternatively: “I’ll reserve one slot so front==rear means empty and (rear+1)%cap==front means full; max elements = capacity−1.” Either is acceptable; size is easier to reason about for many people.
Explain: “Circular queue uses a fixed array and two indices that wrap with modulo. I use a size variable so empty is size==0 and full is size==capacity. Enqueue: if not full, write at rear, rear = (rear+1)%cap, size++. Dequeue: if not empty, front = (front+1)%cap, size--. Front and Rear read at front and (rear-1+cap)%cap.” Mention the alternative (one slot reserved) and the risk of confusing full and empty without it.
Practice Problems
- LeetCode 622: Design Circular Queue (implement with size or one-slot reservation).
Summary
- A circular queue uses a fixed array and indices front and rear that wrap: (front + 1) % capacity, (rear + 1) % capacity. Enqueue at rear, dequeue at front. O(1) per operation.
- Empty/full: either (1) use a size counter (empty when size==0, full when size==capacity; can store exactly capacity elements), or (2) reserve one slot (empty when front==rear, full when (rear+1)%capacity==front; max capacity−1 elements).
- Rear(): last enqueued is at (rear - 1 + capacity) % capacity when rear is the “next free” index. Always wrap indices with modulo to avoid out-of-bounds.
9.7 Deque
Introduction
A deque (double-ended queue) is a linear structure that supports O(1) insertion and deletion at both the front and the rear. It generalizes both the stack (LIFO) and the queue (FIFO): you can push/pop from either end. In CPython, collections.deque is implemented as a doubly linked list of fixed-size blocks and provides append, appendleft, pop, popleft, plus indexing and rotation. Use a deque when you need a queue (avoid list’s O(n) pop(0)), when you need stack-like and queue-like operations in the same structure, or when implementing sliding-window algorithms (e.g. monotonic deque for max in window). This section covers the deque ADT, the Python API, and typical use cases.
Real-World Analogy
Imagine a line where people can join or leave from either end. Someone can cut in at the front (appendleft) or join at the back (append); someone can leave from the front (popleft) or from the back (pop). That’s a deque. It’s more flexible than a strict queue (only rear in, only front out) or a stack (only one end). Real examples: browser history (back/forward can be modeled with two stacks or one deque), undo/redo where new actions are inserted at the front, and algorithms that need to inspect or remove from both ends (e.g. sliding window maximum).
d = deque(); d.append(10); d.append(20); d.appendleft(5). Order: [5, 10, 20]. d.popleft() → 5; d.pop() → 20. Remaining: [10]. You can use the same deque as a queue (append right, popleft) or as a stack (append and pop from the right).
Formal Definition
Deque (double-ended queue): A collection that supports insertion and deletion at both ends in O(1) time. Typical operations: append(x) (add at right/rear), appendleft(x) (add at left/front), pop() (remove and return from right), popleft() (remove and return from left). Optionally: peek at front or rear (e.g. d[0], d[-1]), rotate (shift elements left or right), clear, len. A pure linked-list deque has no O(1) random access; Python’s deque supports indexing, which is O(1) at either end but degrades to O(n) toward the middle. The key guarantee is O(1) append, appendleft, pop, popleft.
Why This Topic Matters
- Queue and stack in one: In Python, deque is the standard choice for a queue (append + popleft) and can double as a stack (append + pop). Using a list for a queue is wrong (pop(0) is O(n)); deque is correct.
- Sliding window and monotonic deque: Problems like “max in every sliding window” (LeetCode 239) use a deque to keep candidates; we remove from the front when they leave the window and from the rear when a larger element makes them useless. Both ends are accessed in O(1).
- BFS, palindromes, and rotation: BFS uses a queue (deque). Checking “can this string be a palindrome?” sometimes uses a deque of characters. Rotate operations (e.g. rotate(-1) to move left) are O(k) in Python but useful in some problems.
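As a concrete instance of the BFS use case, here is a minimal sketch of breadth-first traversal with a deque as the queue. The graph representation (a dict mapping each vertex to a list of neighbors), the function name bfs_order, and the sample graph are assumptions made for illustration.

```python
from collections import deque

def bfs_order(graph, start):
    """Return vertices in the order BFS visits them from start."""
    visited = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()          # dequeue from the front, O(1)
        order.append(node)
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)   # mark when enqueued to avoid duplicates
                queue.append(neighbor)  # enqueue at the rear, O(1)
    return order

example_graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
visit_order = bfs_order(example_graph, "A")  # ['A', 'B', 'C', 'D']
```

Each vertex is enqueued and dequeued once, so with O(1) queue operations the traversal is O(V + E); swapping popleft for list.pop(0) would add an O(n) shift to every dequeue.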
Mental Model
Picture a horizontal tube open at both ends. You can add or remove from the left (front) or the right (rear). So you have four core operations: add-left, add-right, remove-left, remove-right. The deque maintains order: the first element is at index 0 (front), the last at index -1 (rear). Use it as a queue by restricting to add-right and remove-left; as a stack by restricting to add-right and remove-right.
Python: collections.deque API
- Constructor: deque(), deque(iterable), deque(iterable, maxlen=k) (bounded deque; when full, appending drops from the other end).
- Add: d.append(x) (right), d.appendleft(x) (left). O(1).
- Remove: d.pop() (right), d.popleft() (left). O(1). Raises IndexError if empty.
- Access: d[0], d[-1] (front and rear). d[i] for arbitrary i is supported but O(n) in the middle; use for small indices or when needed.
- Other: len(d), d.clear(), d.rotate(k) (positive k = rotate right: last becomes first; negative = rotate left). d.extend(iterable), d.extendleft(iterable) (note: extendleft adds elements in reverse order).
ASCII Diagram
Deque: front (left) ←—— [ 5, 10, 20, 30 ] ——→ rear (right)
index 0 ... index -1
append(40) → [ 5, 10, 20, 30, 40 ]
appendleft(1) → [ 1, 5, 10, 20, 30, 40 ]
popleft() → returns 1; deque = [ 5, 10, 20, 30, 40 ]
pop() → returns 40; deque = [ 5, 10, 20, 30 ]
Using Deque as Queue or Stack
| Use as | Add | Remove |
|---|---|---|
| Queue (FIFO) | append (rear) | popleft (front) |
| Stack (LIFO) | append (top) | pop (top) |
Python Examples
from collections import deque
# As queue (BFS)
q = deque()
q.append(1)
q.append(2)
x = q.popleft() # 1
# As stack
s = deque()
s.append(10)
s.append(20)
y = s.pop() # 20
# Both ends
d = deque([1, 2, 3])
d.appendleft(0) # [0, 1, 2, 3]
d.rotate(1) # [3, 0, 1, 2] (right rotate)
d.rotate(-1) # [0, 1, 2, 3] (left rotate)
Bounded Deque (maxlen)
When you create deque(iterable, maxlen=k), the deque can hold at most k elements. When it is full and you append, the leftmost element is automatically dropped; when you appendleft, the rightmost is dropped. So a bounded deque behaves like a sliding window of the last k elements (if you only append). Useful for “last k items” or a fixed-size buffer.
from collections import deque
d = deque(maxlen=3)
d.append(1)
d.append(2)
d.append(3) # [1, 2, 3]
d.append(4) # [2, 3, 4] (1 dropped)
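To illustrate the “last k items” use, here is a small sketch of a moving average built on a bounded deque. The function name moving_averages and the running-sum bookkeeping are choices made for this example, and it assumes k ≥ 1.

```python
from collections import deque

def moving_averages(values, k):
    """Average of the last k values (fewer while the window is filling)."""
    window = deque(maxlen=k)
    running_sum = 0
    averages = []
    for v in values:
        if len(window) == k:
            running_sum -= window[0]  # this element is about to be dropped
        window.append(v)              # deque discards the leftmost item itself
        running_sum += v
        averages.append(running_sum / len(window))
    return averages

result = moving_averages([1, 2, 3, 4, 5], 3)  # [1.0, 1.5, 2.0, 3.0, 4.0]
```

The deque handles eviction automatically; the only manual step is subtracting the outgoing value before the append so the running sum stays in step with the window.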
Time and Space Complexity
append, appendleft, pop, popleft: O(1). Index access d[i]: O(n) in the middle for a linked-list-based implementation; Python’s deque is optimized but still avoid repeated indexing in a loop. rotate(k): O(k). len(d), d[0], d[-1]: O(1). Space O(n) for n elements.
Edge Cases
- pop/popleft on empty deque: Raises IndexError. Check if d: or len(d) > 0 before popping if the structure might be empty.
- extendleft(iterable): Inserts elements in reverse order (each is appended to the left in turn). So after d = deque([1, 2]); d.extendleft([3, 4]), d is [4, 3, 1, 2]: the iterable’s last element ends up leftmost. To keep the iterable’s original order at the front, use d.extendleft(reversed(iterable)).
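A short sketch of both edge cases, using only the standard collections.deque API. The variable names are illustrative.

```python
from collections import deque

d = deque([1, 2])
d.extendleft([3, 4])            # appendleft(3), then appendleft(4)
reversed_order = list(d)        # [4, 3, 1, 2]

e = deque([1, 2])
e.extendleft(reversed([3, 4]))  # appendleft(4), then appendleft(3)
kept_order = list(e)            # [3, 4, 1, 2]

empty = deque()
popped = empty.popleft() if empty else None  # guard avoids IndexError
```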
Common Mistakes
- Using a list as a queue: q = []; q.append(x); q.pop(0) makes dequeue O(n). Use deque: q.append(x); q.popleft() for O(1). Similarly, don’t use list.insert(0, x) for “add to front” in a queue-like scenario, because that is also O(n). Use deque.appendleft(x).
- Confusing left and right: “Front” of a queue is the left (index 0); we popleft. “Rear” is the right (index -1); we append there. In a stack, “top” is the right; we append and pop.
- Assuming deque supports efficient random access: d[i] for arbitrary i can be O(n). Prefer operating at the ends (d[0], d[-1], pop, popleft) when possible.
Monotonic Deque (Sliding Window Maximum)
For “max in every sliding window of size k,” we maintain a deque of indices such that their values are in decreasing order (front = index of current max in the window). When the window moves: (1) remove from front if that index has left the window; (2) from the rear, remove indices whose values are ≤ the new element (they can never be the max again); (3) append the new index at the rear. The front is always the index of the maximum in the current window. We need both popleft (index left the window) and pop (back is smaller than new)—hence a deque. See Section 9.9 (Monotonic Queue) for full detail.
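The three steps above translate fairly directly into code. The following is a minimal sketch, with the function name sliding_window_max chosen for illustration and the assumption that 1 ≤ k ≤ len(nums).

```python
from collections import deque

def sliding_window_max(nums, k):
    """Return the maximum of every length-k window of nums."""
    dq = deque()   # indices; their values decrease from front to rear
    result = []
    for i, x in enumerate(nums):
        # (1) Drop the front index if it has left the window [i - k + 1, i].
        if dq and dq[0] <= i - k:
            dq.popleft()
        # (2) Drop rear indices whose values are <= the new element.
        while dq and nums[dq[-1]] <= x:
            dq.pop()
        # (3) Add the new index at the rear.
        dq.append(i)
        # The first full window ends at index k - 1.
        if i >= k - 1:
            result.append(nums[dq[0]])
    return result

maxima = sliding_window_max([1, 3, -1, -3, 5, 3, 6, 7], 3)  # [3, 3, 5, 5, 6, 7]
```

Storing indices rather than values is what makes step (1) possible: the deque can tell when its front candidate has slid out of the window.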
When you need to remove from both the front (e.g. “element left the window”) and the rear (e.g. “this element is dominated by the new one”), use a deque. When you only remove from one end (e.g. stack: only top; queue: only front), a stack or a simple queue is enough. The deque is the “both ends” structure.
State: “A deque supports O(1) add and remove at both ends. In Python I use collections.deque: append and popleft for a queue, append and pop for a stack. For sliding window max I keep a deque of indices in decreasing value order and remove from front when the index is out of the window and from the rear when the new element is greater.” Mention that list with pop(0) or insert(0, x) is O(n) and should be avoided for queue-like use.
Practice Problems
- LeetCode 239: Sliding Window Maximum (monotonic deque of indices).
- LeetCode 232: Implement Queue using Stacks (or use deque as the queue).
- BFS: use deque with append and popleft.
- Palindrome checker: use a deque (popleft and pop to compare from both ends).
Summary
- A deque allows O(1) add and remove at both ends. In Python: collections.deque with append, appendleft, pop, popleft.
- Use as queue: append (enqueue), popleft (dequeue). Use as stack: append (push), pop (pop). Never use a list with pop(0) for a queue.
- Bounded deque: deque(maxlen=k) drops the opposite end when full.
- Monotonic deque: keep candidates in order; remove from front (out of window) and from rear (dominated). This is the key to sliding window maximum.
9.8 Priority Queue
Introduction
A priority queue is a collection where each element has a priority (or key), and we always remove the element with the highest (or lowest) priority, rather than in FIFO order like a normal queue. It supports insert (add an element with a priority) and extract-max (or extract-min) in O(log n) time when implemented with a binary heap. Peek (see the max or min without removing) is O(1). Priority queues are used in Dijkstra’s algorithm (always expand the closest vertex), merge k sorted lists (always take the smallest of the k heads), “top k” and “kth largest” problems, and task scheduling. In Python, the heapq module provides a min-heap (smallest element at the top); for a max-heap we negate keys, or wrap items in a class whose __lt__ reverses the comparison (heappush and heappop accept no key or comparator argument). This section covers the ADT, heap-based implementation, and the Python API.
Real-World Analogy
Think of an ER triage: patients are not served in arrival order but by urgency. The one with the highest priority (e.g. critical) is seen first. When a new patient arrives, they are inserted into the queue according to priority; when a doctor is free, the highest-priority patient is removed. Similarly, a task scheduler might always run the task with the earliest deadline or the highest priority. The priority queue is the data structure that supports “insert with priority” and “remove the best” efficiently.
Insert (task1, 3), (task2, 1), (task3, 2) with higher number = higher priority. Extract-max returns task1 (3), then task3 (2), then task2 (1). Order of removal is by priority, not by insertion time.
Formal Definition
Priority queue (ADT): A collection of (element, priority) pairs that supports: (1) insert(x, p) or push(x)—add element x with priority p (or with x comparable); (2) extract-max() or extract-min()—remove and return the element with maximum or minimum priority; (3) peek—return the max or min without removing. Optionally: increase-priority, decrease-priority, size, isEmpty. Implementations: binary heap (O(log n) insert and extract, O(1) peek), or balanced BST (same bounds, supports more operations). We focus on the heap-based implementation.
Why This Topic Matters
- Dijkstra and A*: Always extract the vertex with smallest tentative distance. The priority queue is the core; without it we’d scan all vertices for each extraction, which is O(V²) overall. With a binary heap we get O((V+E) log V).
- Merge k sorted lists (LeetCode 23): Keep the smallest element among the k current heads; extract it and push the next from that list. Min-heap of size k gives O(N log k) for N total elements.
- Top K / Kth largest (LeetCode 215, 347): Min-heap of size k: keep the k largest; the heap top is the kth largest. Or use quickselect. Priority queue is the standard “maintain top k” tool.
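The “min-heap of size k” idea from the last bullet can be sketched as follows. The function name kth_largest is illustrative, and the sketch assumes 1 ≤ k ≤ len(nums).

```python
import heapq

def kth_largest(nums, k):
    """Return the kth largest value in nums, assuming 1 <= k <= len(nums)."""
    heap = []                           # min-heap holding the k largest seen so far
    for x in nums:
        if len(heap) < k:
            heapq.heappush(heap, x)
        elif x > heap[0]:
            heapq.heapreplace(heap, x)  # pop the smallest, push x, in one step
    return heap[0]                      # smallest of the k largest

answer = kth_largest([3, 2, 1, 5, 6, 4], 2)  # 5
```

The heap never grows beyond k items, so the total cost is O(n log k) time and O(k) extra space.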
Mental Model
A binary min-heap is a complete binary tree where each node is ≤ its children. So the smallest element is at the root. We store the tree in an array: index 0 = root; for node at index i, left child at 2i+1, right at 2i+2, parent at (i-1)//2. Insert: add at the end and “bubble up” (swap with parent while smaller). Extract-min: save root, move last element to root, then “bubble down” (swap with the smaller child while larger than a child). Both are O(log n). For a max-heap, reverse the comparisons (each node ≥ children).
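To make the bubble-up and bubble-down steps concrete, here is a minimal array-backed min-heap. It is a teaching sketch with illustrative names (MinHeap, push, pop, peek); in practice heapq does the same job.

```python
class MinHeap:
    """Array-backed binary min-heap using the index arithmetic above."""

    def __init__(self):
        self.a = []

    def push(self, x):
        self.a.append(x)                 # add at the end
        i = len(self.a) - 1
        while i > 0:                     # bubble up
            parent = (i - 1) // 2
            if self.a[i] >= self.a[parent]:
                break
            self.a[i], self.a[parent] = self.a[parent], self.a[i]
            i = parent

    def pop(self):
        if not self.a:
            raise IndexError("pop from empty heap")
        root = self.a[0]
        last = self.a.pop()              # remove the last element
        if self.a:
            self.a[0] = last             # move it to the root
            i, n = 0, len(self.a)
            while True:                  # bubble down
                smallest = i
                for child in (2 * i + 1, 2 * i + 2):
                    if child < n and self.a[child] < self.a[smallest]:
                        smallest = child
                if smallest == i:
                    break
                self.a[i], self.a[smallest] = self.a[smallest], self.a[i]
                i = smallest
        return root

    def peek(self):
        return self.a[0]


h = MinHeap()
for value in (5, 2, 8, 1, 9, 3):
    h.push(value)
drained = [h.pop() for _ in range(6)]  # [1, 2, 3, 5, 8, 9]
```

Both loops move along a single root-to-leaf path, which is why push and pop are O(log n) in a complete binary tree.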
Python: heapq Module
heapq provides a min-heap on a list. The list is modified in place to satisfy the heap property (smallest at index 0).
- heappush(heap, item): Push item onto heap; heap is a list. O(log n).
- heappop(heap): Pop and return the smallest item. O(log n).
- heap[0]: Peek the smallest without removing. O(1).
- heapify(x): Transform list x into a heap in place. O(n). Use when you already have all elements; faster than pushing one by one.
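- nlargest(k, iterable), nsmallest(k, iterable): Return a list of the k largest or k smallest items, ordered from largest or smallest respectively. These work best when k is small relative to the input; for k close to the input size, sorted() is usually faster, and for k = 1 use max() or min().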
Items must be comparable. For (priority, value) pairs, the first element is used for comparison. So push (priority, value); the smallest priority is at the top. For a max-heap, push (-priority, value) and negate when you pop.
ASCII Diagram: Min-Heap
Min-heap (smallest at root):
        1
      /   \
     3     2
    / \   / \
   7   4 5   6
Array: [1, 3, 2, 7, 4, 5, 6]
Index:  0  1  2  3  4  5  6
Parent of i: (i-1)//2. Children of i: 2i+1, 2i+2.
heappop() → 1; then last (6) moves to root and bubbles down.
heappush(0) → 0 goes at end, bubbles up to root.
Python Examples
Min-heap (default)
import heapq
h = []
heapq.heappush(h, 5)
heapq.heappush(h, 2)
heapq.heappush(h, 8)
heapq.heappush(h, 1)
x = heapq.heappop(h) # 1 (smallest)
y = h[0] # 2 (peek, don't remove)
Max-heap (negate keys)
import heapq
# Max-heap: store (-value, value) or just -value
h = []
heapq.heappush(h, -5)
heapq.heappush(h, -2)
heapq.heappush(h, -8)
max_val = -heapq.heappop(h) # 8
Priority queue with payload (e.g. for Dijkstra or merge k lists)
import heapq
# (priority, payload). Smallest priority is popped first.
pq = []
heapq.heappush(pq, (0, "start"))
heapq.heappush(pq, (2, "node2"))
heapq.heappush(pq, (1, "node1"))
priority, node = heapq.heappop(pq) # (0, "start")
When priorities tie, the second element is used to break ties (e.g. (0, "a") vs (0, "b")). If the payload is not comparable, use (priority, counter, payload) so that (priority, counter, x) is always comparable (counter = insertion order).
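The (priority, counter, payload) pattern might look like the sketch below. The Task class is a hypothetical payload with no ordering defined, and push_task / pop_task are helper names introduced only for this example.

```python
import heapq
import itertools

class Task:
    """Hypothetical payload type that defines no ordering."""
    def __init__(self, name):
        self.name = name

pq = []
counter = itertools.count()  # 0, 1, 2, ... in insertion order

def push_task(priority, task):
    # The unique counter settles ties, so Task objects are never compared.
    heapq.heappush(pq, (priority, next(counter), task))

def pop_task():
    priority, _, task = heapq.heappop(pq)
    return priority, task

push_task(1, Task("write"))
push_task(0, Task("plan"))
push_task(1, Task("review"))
names = [pop_task()[1].name for _ in range(3)]  # ['plan', 'write', 'review']
```

Without the counter, the two priority-1 entries would fall through to comparing Task objects and raise TypeError; with it, equal priorities come out in insertion order.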
Time and Space Complexity
heappush: O(log n). heappop: O(log n). heap[0] (peek): O(1). heapify: O(n). Space O(n) for n elements. For “merge k lists” with total N nodes: we do N pushes and N pops, so O(N log k) time when the heap has at most k elements.
Edge Cases
- heappop on empty heap: Raises IndexError. Check if heap: before popping.
- Duplicate priorities: heapq is not stable; it does not preserve insertion order among equal items. With (priority, value) tuples, a tie on priority falls through to comparing value, so the smaller value comes out first. If you need first-in, first-out among equal priorities, add an insertion counter: (priority, counter, value). Don’t rely on order among equals unless you define it.
- Non-comparable payloads: Use (priority, index, payload) so that (p, i, x) is comparable even when x is not. The index breaks ties.
Common Mistakes
- Assuming heapq is a max-heap: heapq is a min-heap; heappop returns the smallest element. For “kth largest” or “extract max,” negate the keys when pushing and negate again when popping, or use (negative priority, value).
- Pushing (value, priority) instead of (priority, value): heapq compares the first element. So push (priority, value) so that the smallest priority is on top. If you push (value, priority), the smallest value will be on top, which is wrong for “process by priority.”
- Using heap as a sorted list: A heap does not store elements in sorted order; only the root is guaranteed min (or max). To get sorted order, repeatedly heappop—that’s O(n log n). Don’t iterate over the list expecting sorted order.
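A small sketch of the last point: after heapify, only index 0 is guaranteed to hold the minimum, and sorted order comes from repeated heappop. The sample data is arbitrary.

```python
import heapq

data = [5, 1, 4, 2, 3]
heap = list(data)
heapq.heapify(heap)          # O(n); only heap[0] is guaranteed to be the minimum
smallest = heap[0]

# Draining the heap yields sorted order in O(n log n).
drained = [heapq.heappop(heap) for _ in range(len(heap))]
```

If all you need is the sorted sequence, sorted(data) is simpler; the heap pays off when pushes and pops are interleaved.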
Merge K Sorted Lists (Pattern)
Put the head of each of the k lists into a min-heap as (value, list_id, node_or_index). Pop the smallest; append to result; if that list has a next element, push (next.val, list_id, next). Repeat until the heap is empty. Each of the N total elements is pushed and popped once; heap size ≤ k. Time O(N log k), space O(k). LeetCode 23.
When using heapq with custom objects, make the first component of the tuple the key for ordering. For merge k lists, push (node.val, i, node) so that the smallest value is on top; when you pop, get the node and push (node.next.val, i, node.next) if node.next exists. Use a counter as the second component if you need to avoid comparing nodes: (val, counter, node) with counter incremented each push.
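The pattern can be sketched on plain Python lists instead of linked-list nodes, which keeps the example self-contained. The function name merge_k_sorted and the (value, list_id, index) tuple layout are choices made here; LeetCode 23 itself uses ListNode objects.

```python
import heapq

def merge_k_sorted(lists):
    """Merge already-sorted Python lists into one sorted list."""
    heap = []
    for list_id, lst in enumerate(lists):
        if lst:                                    # skip empty lists
            heapq.heappush(heap, (lst[0], list_id, 0))
    merged = []
    while heap:
        value, list_id, idx = heapq.heappop(heap)  # smallest current head
        merged.append(value)
        nxt = idx + 1
        if nxt < len(lists[list_id]):
            heapq.heappush(heap, (lists[list_id][nxt], list_id, nxt))
    return merged

merged = merge_k_sorted([[1, 4, 5], [1, 3, 4], [2, 6]])  # [1, 1, 2, 3, 4, 4, 5, 6]
```

Since each list has at most one entry in the heap at a time, list_id is unique among heap entries and ties on value are settled before the index is ever compared.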
State: “A priority queue lets me always take the element with highest or lowest priority. In Python I use heapq: it’s a min-heap, so heappop gives the smallest. For max-heap I negate the key. Push and pop are O(log n), peek is O(1). I use it for Dijkstra (smallest distance), merge k lists (smallest of k heads), and top k (min-heap of size k).” Mention that heapify(list) is O(n) when you have all elements upfront.
Practice Problems
- LeetCode 23: Merge k Sorted Lists (min-heap of k heads).
- LeetCode 215: Kth Largest Element in an Array (min-heap of size k or quickselect).
- LeetCode 347: Top K Frequent Elements (heap of (count, value) or bucket sort).
- LeetCode 373: Find K Pairs with Smallest Sums (heap of (sum, i, j)).
- Dijkstra’s shortest path (priority queue of (distance, node)).
Summary
- A priority queue supports insert and extract-min (or extract-max) in O(log n) via a binary heap. Peek is O(1).
- In Python, heapq is a min-heap: heappush(h, x), heappop(h), h[0] for peek. For max-heap, push -x and negate on pop. For (priority, payload), push (priority, payload) so smallest priority is on top.
- Use for: Dijkstra, merge k sorted lists (O(N log k)), top k / kth largest (min-heap of size k), and any “always process the best” algorithm. heapify(list) is O(n) when building from a full list.
9.9 Monotonic Queue
Introduction
A monotonic queue is a deque used to maintain a sequence of indices (or values) in monotonic order while supporting removal from both ends: from the front when an element “leaves” (e.g. goes out of a sliding window) and from the rear when a new element “dominates” older ones (e.g. a larger value makes smaller values useless as future maxima). The classic application is sliding window maximum (LeetCode 239): for each window of size k, report the maximum value in O(1) amortized time, so that the total over n windows is O(n). We keep a deque of indices such that their corresponding values are in decreasing order; the front is always the index of the current window’s maximum. This section gives the full algorithm, intuition, and code.
Real-World Analogy
Imagine a line of people of different heights behind a sliding door. The door shows only k consecutive people (the “window”). You want to know the tallest person in the current view. When the door slides right, the leftmost person may leave (remove from front if their index is out of the window) and a new person enters on the right. Anyone still in view who is no taller than the new person can never be the “tallest in view” again, so we remove them from the back. The candidates we keep are always in decreasing height (front = tallest in the window). That’s the monotonic queue: we remove from the front (out of window) and from the back (dominated by the new element).
Array [1, 3, -1, -3, 5, 3, 6, 7], k = 3. Window [1,3,-1] → max 3; [3,-1,-3] → max 3; [-1,-3,5] → max 5; [-3,5,3] → max 5; [5,3,6] → max 6; [3,6,7] → max 7. Result [3, 3, 5, 5, 6, 7]. The deque holds indices; we popleft when index < current window start, and pop from the back while arr[back] ≤ arr[i], then append i.
Formal Definition
Monotonic queue (for sliding window max): A deque that stores indices of the array such that the corresponding values are in non-increasing order (front = index of current max). Invariant: for the current window [left, right], the front of the deque is the index of the maximum in that window. When we advance the window: (1) if the front index is less than left (out of window), remove it from the front (popleft); (2) from the rear, remove all indices whose values are ≤ the new element at the right end (they are dominated); (3) append the new index at the rear. Then the front is the index of the max for the current window. Each index is pushed once and popped at most once, so total time O(n).
Why This Topic Matters
- LeetCode 239: Sliding Window Maximum: The standard O(n) solution uses a monotonic deque. Naive “max of each window” is O(n k); with the deque it’s O(n).
- Pattern for “range max/min” in a sliding window: Same idea applies to “minimum in each window” (use increasing order: pop from rear when new element is smaller). Also appears in problems like “longest subarray with max - min ≤ limit” (two deques for max and min).
- Difference from monotonic stack: A stack only removes from one end. For a sliding window we must remove from the front (elements that left the window) and from the rear (dominated elements)—so we need a deque.
Mental Model
The deque holds “candidates” for being the maximum of the current window. A candidate is useful only if (1) it’s still inside the window (index ≥ left), and (2) no larger element has appeared after it (so we keep indices in decreasing value order). When we move the window right: first drop the front if it’s outside the window; then from the back, drop any index whose value is ≤ the new value (the new value is to the right of them and is at least as large, so they’ll never be the max again); then add the new index. The front is always the index of the maximum in [left, right].
Step-by-Step: Sliding Window Maximum
- Let arr be the array, k the window size. Initialize dq = deque() (of indices) and result = [].
- For each right index i from 0 to n-1:
  - Remove from front: While dq is not empty and dq[0] < i - k + 1 (index is to the left of the window), dq.popleft().
  - Remove from rear: While dq is not empty and arr[dq[-1]] <= arr[i], dq.pop(). (Either <= or < gives correct maxima. With <=, equal values are dropped and only the newest of a run of equals stays; with <, equal values are all kept. Using <= keeps the deque shorter.)
  - Append: dq.append(i).
  - Record result: Once we have at least k elements (i ≥ k - 1), the max for the current window is arr[dq[0]]. Append it to result.
- Return result. Length is n - k + 1.
ASCII Diagram
arr = [1, 3, -1, -3, 5, 3, 6, 7], k = 3
Window [0..2]: indices 0,1,2. dq after processing 0,1,2: [1] (3 beats 1 and -1)
max = arr[1] = 3.
Window [1..3]: dq[0]=1 is still in range. Process 3: arr[3]=-3, dq=[1,3]. max=3.
Window [2..4]: Process 4: arr[4]=5. Pop 3 (-3≤5), pop 1 (3≤5). dq=[4]. max=5.
Window [3..5]: Process 5: arr[5]=3. dq=[4,5]. max=5.
Window [4..6]: Process 6: arr[6]=6. Pop 5, pop 4. dq=[6]. max=6.
Window [5..7]: Process 7: arr[7]=7. Pop 6. dq=[7]. max=7.
Result: [3, 3, 5, 5, 6, 7]
Python Implementation (LeetCode 239)
from collections import deque

def max_sliding_window(nums, k):
    dq = deque()
    result = []
    for i in range(len(nums)):
        # Remove indices that are out of the current window
        while dq and dq[0] < i - k + 1:
            dq.popleft()
        # Remove from rear: elements that are <= current (they're dominated)
        while dq and nums[dq[-1]] <= nums[i]:
            dq.pop()
        dq.append(i)
        # First window is complete when i >= k - 1
        if i >= k - 1:
            result.append(nums[dq[0]])
    return result
We use <= when comparing values so that when two values are equal, we keep the newer index (closer to the right of the window). Alternatively use < to keep the leftmost of equals; both give correct max. For “min in each window,” use a deque with increasing order: pop from rear when nums[dq[-1]] >= nums[i].
Line-by-Line Explanation
- dq[0] < i - k + 1: The current window is [i - k + 1, i]. So any index < i - k + 1 is to the left of the window and must be removed from the front.
- nums[dq[-1]] <= nums[i]: The element at dq[-1] is ≤ the new element at i. Since i is to the right, the element at dq[-1] can never be the max of any future window that includes i. So we pop it from the rear.
- dq.append(i): Add the current index. After the two while loops, the deque is still in decreasing order by value (front = max in window).
- if i >= k - 1: The first window that is complete has right index k - 1 (indices 0..k-1). From then on, every step produces one window max.
Time and Space Complexity
Each index is appended once and removed at most once (either from the front when it leaves the window or from the rear when it’s dominated). So the total number of deque operations is O(n), and the time is O(n). The deque holds at most k indices (only indices inside the current window survive), so the auxiliary space is O(k). The result array takes another O(n - k + 1), which is why the space is sometimes quoted as O(n).
Edge Cases
- k = 1: Each window has one element; result is a copy of the array. The deque always has one element after each step. Correct.
- k = n: One window; result has one element = max(arr). The deque may shrink to one index after processing all elements. Correct.
- k > n: Problem usually assumes k ≤ n. If k > n, we might return [] or the max of the whole array depending on problem definition.
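- Strictly increasing array: Each new element dominates all previous ones, so the deque holds only the current index. Each window max is its rightmost element, so the result is arr[k-1:].
- Strictly decreasing array: Nothing is ever popped from the rear, so the deque grows to k indices and shrinks only from the front. Each window max is its leftmost element, so the result is arr[:n-k+1].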
Common Mistakes
- Removing from the wrong end: Indices that leave the window are at the front of the deque (they are the oldest of the current candidates), so we popleft when dq[0] < left. Indices dominated by the new value are at the rear (the smallest surviving values sit there), so we pop from the rear while arr[dq[-1]] ≤ arr[i]. Swapping these (e.g. popping from rear for “out of window”) breaks the invariant.
- Storing values instead of indices: We need indices to know when an element is out of the window (index < i - k + 1). If we stored only values, we couldn’t tell when to remove from the front. Always store indices in the deque.
- Wrong comparison for “min in window”: For sliding window minimum, we want the deque in increasing order (front = min). Pop from rear when arr[dq[-1]] >= arr[i]. The new smaller value dominates.
Monotonic Queue vs Monotonic Stack
| Structure | Removal | Typical use |
|---|---|---|
| Monotonic stack | Only from top (one end) | Next greater/smaller element (no window) |
| Monotonic queue | Front (out of window) and rear (dominated) | Sliding window max/min |
Sliding Window Minimum
For the minimum in each window, keep the deque in increasing order (front = index of current min). Pop from front when index < i - k + 1. Pop from rear when arr[dq[-1]] >= arr[i] (the new value is smaller or equal, so the rear is dominated). Then result.append(nums[dq[0]]). Same O(n) time.
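A minimal sketch of the minimum variant described above, mirroring the maximum version with only the rear comparison flipped. The function name min_sliding_window is chosen for this example.

```python
from collections import deque


def min_sliding_window(nums, k):
    """Return the minimum of every length-k window of nums."""
    dq = deque()   # indices; their values are kept in increasing order
    result = []
    for i, x in enumerate(nums):
        # Front: drop indices that fell out of the window [i - k + 1, i]
        while dq and dq[0] < i - k + 1:
            dq.popleft()
        # Rear: drop indices whose values are >= the new value (dominated)
        while dq and nums[dq[-1]] >= x:
            dq.pop()
        dq.append(i)
        if i >= k - 1:
            result.append(nums[dq[0]])  # front is the index of the window min
    return result
```

On the running example [1, 3, -1, -3, 5, 3, 6, 7] with k = 3 this yields [-1, -3, -3, -3, 3, 3].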
For “subarray range” problems (e.g. “number of subarrays where max - min ≤ limit”), maintain two monotonic deques: one for max (decreasing) and one for min (increasing). For each right, advance left as needed so that max - min ≤ limit (using the two fronts), then count subarrays. Same O(n) idea.
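The two-deque idea can be sketched for the "longest subarray" form of the problem (LeetCode 1438). This is one possible arrangement, assuming limit ≥ 0 so that a single-element window is always valid; the function name is chosen for the example.

```python
from collections import deque


def longest_subarray(nums, limit):
    """Length of the longest subarray with max - min <= limit (limit >= 0)."""
    max_dq = deque()  # indices, values decreasing: front = window max
    min_dq = deque()  # indices, values increasing: front = window min
    left = 0
    best = 0
    for right, x in enumerate(nums):
        while max_dq and nums[max_dq[-1]] <= x:
            max_dq.pop()
        max_dq.append(right)
        while min_dq and nums[min_dq[-1]] >= x:
            min_dq.pop()
        min_dq.append(right)

        # Shrink from the left until the window satisfies the limit again
        while nums[max_dq[0]] - nums[min_dq[0]] > limit:
            left += 1
            if max_dq[0] < left:
                max_dq.popleft()
            if min_dq[0] < left:
                min_dq.popleft()

        best = max(best, right - left + 1)
    return best
```

For the counting form, the same loop works if the last line accumulates right - left + 1 into a running total instead of taking a maximum, since every subarray ending at right and starting at or after left is valid.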
State: “For sliding window max I use a deque of indices keeping decreasing order by value. When the window moves: remove from front if the index is out of the window, remove from rear all indices whose values are ≤ the new element, then append the new index. The front is always the max index. Each index is pushed and popped at most once, so O(n).” Mention that we need indices (not just values) to detect “out of window.” For window min, same idea with increasing order and pop rear when arr[rear] >= arr[i].
Practice Problems
- LeetCode 239: Sliding Window Maximum (decreasing deque of indices).
- Sliding window minimum (increasing deque).
- LeetCode 1438: Longest Continuous Subarray With Absolute Diff Less Than or Equal to Limit (two deques for max and min).
Summary
- A monotonic queue is a deque that maintains indices in monotonic order (decreasing for window max, increasing for window min). We remove from the front when the index leaves the window and from the rear when the new element dominates.
- Sliding window maximum: Deque of indices, values in decreasing order. Popleft when dq[0] < i - k + 1; pop when arr[dq[-1]] <= arr[i]; append i; if i ≥ k - 1, result.append(arr[dq[0]]). Time O(n), space O(k).
- Store indices (not values) so we can check “out of window.” For window minimum, use increasing order and pop rear when arr[dq[-1]] >= arr[i].
10.1 Hash Tables
Introduction
A hash table (hash map, dictionary, or associative array) is a data structure that maps keys to values and supports insert, lookup, and delete by key in O(1) average time. It works by applying a hash function to the key to get an index into an array of “buckets”; each bucket can hold one or more (key, value) pairs. Collisions (two keys hashing to the same index) are handled by chaining (a list per bucket) or open addressing (probe to the next free slot). In Python, dict and set are hash-table based: dict stores key→value, set stores unique keys only. Hash tables are the go-to when you need fast “find by key” or “check membership” and don’t need order. This section covers the idea, hash functions, collision handling at a high level, and Python usage.
Real-World Analogy
Think of a library catalog: you look up a book by its call number (like a hash). The call number tells you which shelf (bucket) to go to. Several books might share the same shelf (collision); you then scan that shelf to find the exact book. The “hash function” (call number system) spreads books across many shelves so no single shelf has too many. If the hash is good, lookup is fast—you go to one shelf and do a short scan. Hash tables work the same: hash(key) → bucket index → search within that bucket (or probe) to find the key.
Store ("apple", 1), ("banana", 2), ("cherry", 3). Hash of "apple" might be 3, "banana" 7, "cherry" 3 (collision with apple). Bucket 3 holds [("apple", 1), ("cherry", 3)]; bucket 7 holds [("banana", 2)]. Lookup "cherry": hash→3, scan bucket 3, find ("cherry", 3). O(1) average if buckets are small.
Formal Definition
Hash table: A data structure that implements a map from keys to values (or a set of keys). It uses a hash function h(key) that maps each key to an integer in [0, m−1] (bucket index), where m is the number of buckets. Operations: insert(key, value) (or set.add(key)), get(key) / lookup, delete(key). With a good hash function and load factor (n/m) kept bounded, these operations are O(1) average. Worst case (all keys in one bucket) is O(n). Keys must be hashable (immutable and with a consistent __hash__) and equality-comparable. Mutable types (list, dict) are not hashable in Python.
Why This Topic Matters
- Most used structure in interviews: “Two sum,” “group anagrams,” “first non-repeating character,” “subarray with sum k”—all rely on fast lookup. Reaching for a dict or set often turns O(n²) into O(n).
- Python dict and set: Both are hash tables. dict for key→value; set for unique keys and O(1) membership. Knowing when to use which (and that keys must be hashable) is essential.
- Collision handling and load factor: Understanding chaining vs open addressing and why we resize (to keep load factor low) helps you reason about average vs worst case and about custom hash tables (e.g. in systems design).
Mental Model
An array of buckets. Each key is sent to a bucket by hash(key) % m. If we use chaining, each bucket is a list (or another structure) of (key, value) pairs; lookup: go to bucket, scan the list for the key. If we use open addressing, each bucket holds at most one pair; on collision we “probe” (e.g. linear probing: try next index) until we find an empty slot or the key. The load factor α = n/m (number of elements / number of buckets) should stay below a threshold (e.g. 0.7) so that average chain length or probe length is small. When α gets too high, we resize (double m, rehash all keys).
Hash Function
A good hash function: (1) is deterministic (same key → same hash), (2) spreads keys uniformly over buckets (reduces collisions), (3) is fast to compute. For integers, often h(x) = x % m (or a more sophisticated mix). For strings, we might combine character codes: e.g. h = 0; for c in s: h = (h * 31 + ord(c)) % m. Python’s built-in hash() is used by dict and set. For str and bytes it is salted for security, so it can vary between runs (unless PYTHONHASHSEED is fixed), but within one run it is consistent. For user types, define __hash__ and __eq__; if two objects are equal they must have the same hash.
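The string hash mentioned above can be written out as a small function. The name poly_hash and the default base of 31 are choices made for this sketch; it is meant to show the recurrence, not to replace Python's built-in hash().

```python
def poly_hash(s, m, base=31):
    """Polynomial rolling hash of string s, reduced to a bucket index in [0, m-1]."""
    h = 0
    for c in s:
        h = (h * base + ord(c)) % m  # reduce at each step to keep h small
    return h
```

For example, poly_hash("ab", 1000) computes (97 * 31 + 98) % 1000 = 105. The result depends only on the characters and m, so it is deterministic across runs.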
Collision Handling (Overview)
- Chaining: Each bucket is a list (or linked list). Insert: append to the list at hash(key). Lookup/delete: scan that list. Average list length = n/m = α. So average O(1 + α) = O(1) if α = O(1).
- Open addressing: One item per bucket. On collision, probe (e.g. linear: (h + i) % m, or quadratic, or double hashing). Lookup: probe until we find the key or an empty slot. Delete: can use a “tombstone” marker. Average probes also O(1) for low load factor. See Section 10.2 for detail.
Python: dict and set
Both are implemented with hash tables. dict: key → value; keys must be hashable and unique. set: unordered collection of unique hashable elements. Operations:
- dict: d[key] = value (insert/update), d[key] or d.get(key) (lookup), key in d (membership), del d[key] or d.pop(key) (delete). Average O(1).
- set: s.add(x), x in s, s.remove(x) or s.discard(x). Average O(1).
Keys must be immutable (or at least their hash must not change). So int, str, tuple (of hashables) are fine; list and dict are not. Use tuple(list) or frozenset if you need to hash a sequence or set.
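A short illustration of the operations listed above, using only built-in dict and set methods. The variable names are arbitrary.

```python
# dict: key -> value
ages = {}
ages["ada"] = 36                 # insert
ages["ada"] = 37                 # update (same key, new value)
ages[(51, 0)] = "tuple key"      # a tuple of hashables is a valid key
missing = ages.get("bob")        # returns None instead of raising KeyError
has_ada = "ada" in ages          # membership test on keys
removed = ages.pop("ada")        # delete the key and return its value

# set: unique hashable elements
seen = set()
seen.add(3)
seen.add(3)                      # adding a duplicate changes nothing
seen.discard(99)                 # discard never raises; remove(99) would raise KeyError
groups = {frozenset({1, 2})}     # a frozenset can itself be a set element

# A list is unhashable, so it cannot be a key
try:
    ages[[1, 2]] = "list key"
    list_key_error = None
except TypeError as exc:
    list_key_error = type(exc).__name__
```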
ASCII Diagram
Hash table (chaining), m = 4, keys "a","b","c","d"; h("a")=0, h("b")=1, h("c")=0, h("d")=2.
Buckets: 0: [("a", val_a), ("c", val_c)]
1: [("b", val_b)]
2: [("d", val_d)]
3: []
Lookup "c": h("c")=0 → bucket 0 → scan → find ("c", val_c).
Time and Space Complexity
Average case (uniform hashing, load factor α = O(1)): insert, lookup, delete O(1). Worst case (all keys collide): O(n) per operation. Space: O(n) for n key-value pairs plus the bucket array O(m); total O(n + m). With resizing to keep α bounded, m = Θ(n), so space O(n).
Edge Cases
- Mutable keys: Don’t use list or dict as a key. They’re unhashable, so the attempt raises TypeError. Use tuple or frozenset if you need to key by a sequence or set.
- Missing key: d[key] raises KeyError if key not in d; d.get(key) returns None (or a default). Use get when you’re not sure the key exists.
- Empty dict/set: {} or set(). Check with if d: or len(d) == 0.
Common Mistakes
- Using a list as a dict key: Lists are mutable and unhashable. d[[1,2]] = 3 raises TypeError. Use tuple([1,2]) as the key if you need to store by a sequence. Same for set: s.add([1,2]) fails; use frozenset or tuple.
- Assuming order in Python 3.6 and earlier: In Python 3.7+, dict preserves insertion order. In 3.6 and earlier, dict order was not guaranteed. Don’t rely on order unless you know the version or use collections.OrderedDict.
- Modifying a dict while iterating: Adding or deleting keys during iteration can raise RuntimeError or give unpredictable results. Iterate over a copy of the keys (list(d)) or collect changes and apply after the loop.
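The last mistake in the list can be shown side by side with the fix. Both function names are invented for this sketch; the unsafe version is included only to show the failure.

```python
def drop_negatives_unsafe(d):
    """Deletes while iterating over d itself: raises RuntimeError in CPython."""
    for key in d:
        if d[key] < 0:
            del d[key]  # the dict changes size during iteration
    return d


def drop_negatives(d):
    """Iterates over a snapshot of the keys, so deleting from d is safe."""
    for key in list(d):
        if d[key] < 0:
            del d[key]
    return d
```

Building a new dict with a comprehension, such as {k: v for k, v in d.items() if v >= 0}, is another common way to avoid the problem.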
When to Use Hash Table
Use a dict when you need: fast lookup by key, fast insert/update by key, or to count/frequency (key → count). Use a set when you need: unique elements, O(1) membership (x in s), or to remove duplicates. If you need order (sorted keys or insertion order), consider OrderedDict or sorted(d.keys()). If you need range queries (“all keys between a and b”), a hash table is not the right tool—use a balanced tree or sorted structure.
For “count frequency” or “group by key,” the pattern is: one pass, for each item do d[key] = d.get(key, 0) + 1 or d[key] = d.get(key, []) + [item]. For “check if seen,” use a set: seen.add(x) and if x in seen. For “two sum” style (find a pair with sum k), store seen values in a set or store value→index in a dict and check for complement.
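The three patterns above, sketched as small functions. One substitution is made deliberately: grouping uses collections.defaultdict(list) with append instead of d.get(key, []) + [item], because the latter builds a new list on every step. Function names are chosen for the example.

```python
from collections import defaultdict


def char_counts(s):
    """Frequency count: key -> number of occurrences."""
    counts = {}
    for ch in s:
        counts[ch] = counts.get(ch, 0) + 1
    return counts


def group_anagrams(words):
    """Group by key: sorted letters -> list of words with those letters."""
    groups = defaultdict(list)
    for w in words:
        groups["".join(sorted(w))].append(w)
    return list(groups.values())


def two_sum(nums, target):
    """Complement lookup: value -> index of values seen so far."""
    seen = {}
    for i, x in enumerate(nums):
        if target - x in seen:
            return [seen[target - x], i]
        seen[x] = i
    return None  # no pair adds up to target
```

Each function makes a single pass and does O(1) average work per element, which is where the O(n) totals come from.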
State: “I’ll use a hash map (dict) for O(1) lookup by key. I’ll store … (e.g. value → index for two sum, or character → frequency). One pass, for each element I check/update the map. Total time O(n), space O(n).” Mention that keys must be hashable. For “unique elements” or “membership,” say “I’ll use a set for O(1) add and O(1) membership check.”
Practice Problems
- LeetCode 1: Two Sum (dict: value → index; check for complement).
- LeetCode 49: Group Anagrams (dict: sorted string or tuple of counts → list of strings).
- LeetCode 387: First Unique Character (dict: char → count or first index).
- LeetCode 217: Contains Duplicate (set to check seen).
Summary
- A hash table maps keys to values (or stores a set of keys) with O(1) average insert, lookup, and delete. It uses a hash function to choose a bucket and handles collisions by chaining or open addressing.
- In Python, dict (key→value) and set (unique keys) are hash tables. Keys must be hashable (immutable; no list/dict as key). Use dict.get(key, default) to avoid KeyError.
- Keep load factor low (resize when needed) for O(1) average. Use hash tables for fast lookup, membership, counting, and grouping: the default for “find by key” and “have I seen this?” in interviews.
10.2 Collision Handling
Introduction
A collision occurs when two different keys hash to the same bucket index. Because the number of possible keys is usually much larger than the number of buckets, collisions are inevitable. The way we handle them determines the performance and implementation of the hash table. The two main strategies are chaining (each bucket holds a list of (key, value) pairs) and open addressing (each bucket holds at most one pair; on collision we “probe” for another slot). Each has tradeoffs: chaining is simple and handles high load factors well; open addressing avoids pointers and can have better cache behavior but requires careful handling of deletions and load factor. This section details both methods, probe sequences, load factor, and deletion.
Real-World Analogy
With chaining, each shelf (bucket) can hold multiple books—like a stack or a list on that shelf. When two books have the same call number (collision), both go on the same shelf; you scan that shelf to find the one you need. With open addressing, each shelf holds exactly one book. If the shelf is taken, you look at the “next” shelf (linear probe), or skip by a rule (quadratic or double hashing) until you find an empty one. The table never has more than one item per slot, but finding the right slot can take several steps when the table is crowded.
Suppose keys 5 and 13 both hash to index 2. (The exact hash function does not matter for the illustration; with h(x) = x % 8 they would also collide, at index 5, since 5 % 8 = 5 and 13 % 8 = 5.) Chaining: bucket 2 = [(5, v1), (13, v2)]. Lookup 13: go to bucket 2, scan, find (13, v2). Open addressing: bucket 2 has (5, v1); we try 3, 4, … until an empty slot; we store (13, v2) there. Lookup 13: start at 2, see 5; probe until we find 13 or an empty slot.
Formal Definition
Chaining (separate chaining): Each bucket is a container (list, linked list) that can hold multiple (key, value) pairs. Insert: compute h(key), append to the container at that index. Lookup: go to bucket h(key), search the container for the key. Delete: go to bucket, remove the pair from the container. Open addressing: The table is a single array; each slot holds at most one (key, value) or is empty. Insert: if slot h(key) is occupied, use a probe sequence (e.g. (h(key) + i) % m for i = 0, 1, 2, …) until an empty slot is found. Lookup: follow the same probe sequence until the key is found or an empty slot is reached. Delete: requires special handling (tombstone or rehash) so that lookups for keys that probed past the deleted slot still work.
Why This Topic Matters
- Implementation choice: When building a custom hash table (e.g. in systems or interviews), you must choose and implement a collision strategy. Chaining is easier to get right; open addressing is common in high-performance or embedded settings.
- Worst case vs average: With a bad hash function or adversarial keys, chaining can degenerate to one long chain (O(n) per operation). Open addressing can suffer from clustering (long probe runs). Understanding both helps you reason about resizing and load factor.
- Deletion in open addressing: You cannot simply clear the slot—a later key might have probed past it. Tombstones or “lazy delete” plus periodic rehash are the standard solutions.
Chaining (Separate Chaining)
Each bucket is a list (or linked list) of entries. Insert(k, v): Compute i = h(k) % m. If the table uses a list of lists, append (k, v) to table[i]. Lookup(k): i = h(k) % m; scan table[i] for an entry with key k; return value or None. Delete(k): i = h(k) % m; remove the entry with key k from table[i].
Analysis: Let n = number of elements, m = number of buckets, α = n/m (load factor). Assuming uniform hashing, the expected length of a chain is α. So lookup and delete take O(1 + α) on average. If we keep α = O(1) (e.g. resize when α > 1 or 2), operations are O(1). Worst case: all keys in one bucket → O(n).
Python-style pseudocode (chaining)
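def insert(table, key, value):
    # table is a list of buckets; each bucket is a list of (key, value) pairs
    i = hash(key) % len(table)
    for j, (k, v) in enumerate(table[i]):
        if k == key:
            table[i][j] = (key, value)  # key already present: overwrite
            return
    table[i].append((key, value))  # new key: add to the chain

def lookup(table, key):
    i = hash(key) % len(table)
    for k, v in table[i]:
        if k == key:
            return v
    return None  # key not found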
Open Addressing
One entry per slot. When slot i = h(key) % m is occupied by another key, we probe. The probe sequence is a sequence of indices i₀, i₁, i₂, … that must eventually cover all slots (so we can find an empty slot if the table is not full).
Linear probing
Probe sequence: (h(key) + i) % m for i = 0, 1, 2, …. Simple but causes primary clustering: consecutive occupied slots form runs, and new keys that hash into the run make it longer. Average probe length grows quickly as α approaches 1.
Quadratic probing
Probe sequence: (h(key) + c₁·i + c₂·i²) % m for i = 0, 1, 2, …. Reduces primary clustering but can fail to find an empty slot even when one exists (unless m and the constants are chosen carefully). Often used when α is kept below 0.5.
Double hashing
Probe sequence: (h₁(key) + i · h₂(key)) % m for i = 0, 1, 2, …, with a second hash function h₂. Good spread and fewer clusters. h₂(key) must be nonzero and relatively prime to m for the probe to cover all slots.
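The coverage condition can be checked numerically. The helper below is a sketch that takes the two hash values directly as integers (h1 and h2 stand in for h₁(key) and h₂(key)) and lists the first m probe indices.

```python
def probe_sequence(h1, h2, m):
    """First m indices of the double-hashing probe sequence (h1 + i*h2) % m."""
    return [(h1 + i * h2) % m for i in range(m)]


# Step 3 is relatively prime to m = 8, so every slot is visited once.
full = probe_sequence(2, 3, 8)      # [2, 5, 0, 3, 6, 1, 4, 7]

# Step 4 shares a factor with m = 8, so the probe cycles through two slots.
partial = probe_sequence(2, 4, 8)   # [2, 6, 2, 6, 2, 6, 2, 6]
```

This is why implementations often pick m prime, or m a power of two with h₂ forced to be odd.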
Lookup: Follow the probe sequence; if we find the key, return its value; if we find an empty slot, the key is not present. Insert: Follow the probe sequence; use the first empty slot (or first tombstone, depending on policy). Delete: If we simply clear the slot, a later lookup for a key that had probed past this slot might stop at the empty slot and incorrectly report “not found.” So we either (1) mark the slot as a tombstone (deleted but not empty—probe continues past it), or (2) rehash all remaining elements in the same cluster. Tombstones are common; they can be reused on insert. Too many tombstones can slow lookups; periodic rehash cleans them.
Load Factor and Resizing
The load factor is α = n / m (number of elements / number of buckets). For chaining, α can exceed 1 (average chain length = α). For open addressing, α must be < 1. To keep operations O(1) average:
- Chaining: Resize (e.g. double m) when α exceeds a threshold (e.g. 1 or 2). Rehash all entries into the new table.
- Open addressing: Resize when α exceeds 0.7–0.8 (or similar). Higher α causes long probe sequences. After resize, rehash all entries (tombstones are not copied).
Resizing is O(n) but amortized over many inserts, so average insert stays O(1).
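A chained table with a load-factor check might look like the sketch below. The class name, the starting size of 4 buckets, and the threshold of 1.0 are illustrative assumptions; real implementations choose their own thresholds and growth factors.

```python
class ChainedTable:
    def __init__(self, buckets=4, max_load=1.0):
        self.table = [[] for _ in range(buckets)]
        self.size = 0
        self.max_load = max_load

    def _bucket(self, key):
        return self.table[hash(key) % len(self.table)]

    def insert(self, key, value):
        bucket = self._bucket(key)
        for j, (k, _) in enumerate(bucket):
            if k == key:
                bucket[j] = (key, value)   # existing key: overwrite
                return
        bucket.append((key, value))
        self.size += 1
        if self.size / len(self.table) > self.max_load:
            self._resize(2 * len(self.table))

    def lookup(self, key, default=None):
        for k, v in self._bucket(key):
            if k == key:
                return v
        return default

    def delete(self, key):
        bucket = self._bucket(key)
        for j, (k, _) in enumerate(bucket):
            if k == key:
                del bucket[j]
                self.size -= 1
                return True
        return False

    def _resize(self, new_buckets):
        # Every entry is rehashed, because index = hash % m changes with m
        entries = [pair for bucket in self.table for pair in bucket]
        self.table = [[] for _ in range(new_buckets)]
        for k, v in entries:
            self.table[hash(k) % new_buckets].append((k, v))
```

Inserting ten distinct keys into this table triggers two resizes (4 → 8 at the fifth insert, 8 → 16 at the ninth), so the load factor never stays above 1.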
Comparison: Chaining vs Open Addressing
| Aspect | Chaining | Open Addressing |
|---|---|---|
| Slots | One list per bucket; multiple items per index | One item per slot; array only |
| Load factor | Can be > 1 | Must be < 1 |
| Delete | Remove from list | Tombstone or rehash cluster |
| Cache | Pointer chasing in lists | Sequential probe can be cache-friendly |
ASCII Diagram: Linear Probing
m = 8. Insert keys that hash to: 5→2, 13→2, 20→4, 7→7.
After 5: [_, _, (5,v), _, _, _, _, _] index 2
After 13: [_, _, (5,v), (13,v), _, _, _, _] collision at 2, probe to 3
After 20: [_, _, (5,v), (13,v), (20,v), _, _, _]
After 7: [_, _, (5,v), (13,v), (20,v), _, _, (7,v)]
Lookup 13: start at 2, see 5; probe 3, see 13 → found.
Delete 5: if we clear slot 2 → [_, _, _, (13,v), ...]
Lookup 13: start at 2, see empty → not found (wrong!). So use tombstone at 2.
Common Mistakes
- In open addressing, clearing a slot on delete: Clearing the slot breaks the probe chain. A key that was inserted after the deleted key (and probed past this slot) can only be found by continuing past the deleted slot. Once the slot is empty, lookup stops there and incorrectly reports “not found.” Use a tombstone (special “deleted” marker) so that lookup treats the slot as “keep probing,” while insert may reuse it.
- Assuming no collisions: In analysis, always account for collisions. Average case assumes uniform hashing and load factor O(1). Worst case is O(n) if all keys collide.
- Forgetting to rehash after resize: When we double m, every key must be reinserted into the new table (new indices). Tombstones are not carried over.
Python's dict
CPython’s dict uses open addressing with a variant of random probing (perturbation) to reduce clustering. Deleted slots are marked (tombstone). Resizing happens when the table is about two-thirds full. You don’t implement this yourself in Python—but knowing that dict is open-addressing based explains why “one slot per entry” and “tombstones” matter in general.
In interviews, you usually just say “hash table with chaining” or “open addressing” at a high level. If asked to implement, chaining is simpler: list of lists, insert/lookup/delete by scanning the list at hash(key) % m. Mention load factor and resizing (double m, rehash) to keep O(1) average.
If asked “how do hash tables handle collisions?”, say: “Two main ways: chaining—each bucket is a list, we append and scan; and open addressing—one item per slot, we probe (e.g. linear or double hashing) until we find an empty slot or the key. For open addressing, delete uses a tombstone so we don’t break the probe chain. We keep load factor low and resize when needed for O(1) average.”
Summary
- Chaining: Each bucket is a list of (key, value) pairs. Insert: append to list at h(key)%m. Lookup/delete: scan that list. Average O(1+α); α can be > 1.
- Open addressing: One entry per slot. Collision: probe (linear, quadratic, or double hashing) until empty slot or key found. Delete: use tombstone so probe chain is not broken. α must be < 1; resize when α gets high.
- Resize (double m, rehash all) to keep load factor bounded. Python’s dict uses open addressing with tombstones and resizing.
10.3 Dictionary Internals
Introduction
Understanding how Python’s dict is implemented helps you reason about performance, key requirements (hashable, immutable), and insertion order. CPython’s dict uses open addressing with a perturbed probe sequence (not plain linear probing) to reduce clustering. It stores entries in a compact table (indices + keys + values) and maintains insertion order (since Python 3.7). Resizing follows a growth policy (typically roughly 2× when about two-thirds full). This section explains the high-level layout, hash and probe, resizing, and gives concrete examples so you can predict behavior and avoid pitfalls.
Real-World Analogy
Think of a dict as a filing cabinet with numbered drawers. The hash of the key tells you which drawer to try first. If it’s full (collision), you use a fixed “perturbation” rule to try other drawers in a pseudo-random order so you don’t always pile up in one place. The cabinet can be resized (bigger cabinet, same files re-filed). There is also a separate list of keys in insertion order (like a log of who put what in when)—that’s why in Python 3.7+ the order is guaranteed. The “key must be immutable” rule is like: you can’t change the label on a file after it’s been filed, or the drawer number would no longer match.
d = {"a": 1, "b": 2, "c": 3}. The keys "a", "b", "c" are hashed and stored; the order of iteration is "a", "b", "c" (insertion order). If you do d["b"] = 20, the key "b" stays in the same logical position; only the value changes. If you do d["d"] = 4, a new entry is added at the “end” of the order. So list(d.keys()) is ["a", "b", "c", "d"]. Deleting "b" and re-inserting it puts "b" at the end: a re-inserted key counts as a new insertion, and since Python 3.7 this ordering is guaranteed by the language, not just by CPython.
Formal Definition
CPython dict layout (simplified): (1) A hash table (indices array): each slot holds an index into the “entries” array, or is empty/dummy. (2) An entries array: stores (hash, key, value) in a compact form, with entries in insertion order (or a variant that preserves iteration order). Lookup: compute hash(key), use perturbed probing on the indices table to find the slot that points to the entry with matching key; then return that entry’s value. Insert: probe for an empty slot or the same key; if new key, add to entries and store the index in the slot. Delete: mark slot as dummy (tombstone) so probe chains still work. Resize: When the table is about 2/3 full, allocate a larger indices table (e.g. 2× or 4×) and reinsert all entries. Insertion order: The entries array (or equivalent) is built in insertion order so that iterating the dict yields keys in the order they were first inserted.
Why This Topic Matters
- Key requirements: Dict keys must be hashable (immutable and implementing __hash__ and __eq__). Understanding “why immutable?” comes from knowing that the hash is used to find the slot—if the key changed after insertion, the hash would change and we’d look in the wrong place.
- Insertion order (3.7+): You can rely on dict preserving order in iteration and in **kwargs. This is part of the language spec and is used in JSON and serialization. Knowing it avoids confusion with older Python or other languages.
- Performance and resizing: Insert can occasionally be O(n) when a resize happens, but amortized O(1). Understanding resizing explains why “pre-sizing” a dict (e.g. with an initial capacity) is rarely needed in Python—the implementation handles it.
Hash and Probe (High Level)
Python calls hash(key) to get an integer. For user-defined types, this uses id() by default unless you define __hash__. The hash is then perturbed (e.g. mixed with higher bits) before reducing to an index, so that similar keys don’t cluster. The probe sequence is not linear; it uses the perturbed value to jump around the table. This reduces clustering and keeps average probe length low. Exact details are in the CPython source (Objects/dictobject.c); for interviews, “open addressing with a smart probe and resizing” is enough.
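To make the "jumping around" concrete, here is a minimal sketch of a perturbed probe. It follows the recurrence found in CPython's Objects/dictobject.c at the time of writing (next slot = (5*i + perturb + 1) & mask, with perturb shifted right by 5 bits each step). The function name probe_sequence and its parameters are made up for this illustration, and the real code works on C integers, so treat this as a model rather than the exact implementation.

```python
PERTURB_SHIFT = 5  # shift used by CPython; treated here as an assumption


def probe_sequence(h, table_size, count):
    """Return the first `count` slots probed for hash `h`.

    table_size must be a power of two so that `& mask` acts as a modulo.
    """
    mask = table_size - 1
    perturb = h & 0xFFFFFFFFFFFFFFFF  # model the hash as an unsigned 64-bit value
    i = perturb & mask
    slots = [i]
    for _ in range(count - 1):
        perturb >>= PERTURB_SHIFT          # higher hash bits gradually feed in
        i = (i * 5 + perturb + 1) & mask   # jump to the next candidate slot
        slots.append(i)
    return slots


print(probe_sequence(7, 8, 8))  # [7, 4, 5, 2, 3, 0, 1, 6]
```

Once perturb has been shifted down to 0, the recurrence i → 5*i + 1 visits every slot of a power-of-two table, which is why a probe always finds an empty slot while the table is not full.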
Insertion Order (Python 3.7+)
Since Python 3.7, the language guarantees that dict maintains insertion order: the order in which keys were first inserted is the order in which they are iterated. So:
d = {}
d["z"] = 1
d["a"] = 2
d["m"] = 3
print(list(d.keys())) # ['z', 'a', 'm'] — same as insertion order
Reassigning a value does not change order:
d["a"] = 99
print(list(d.keys())) # still ['z', 'a', 'm']
Deleting a key removes it from the order; re-inserting the same key puts it at the “end” of the current order:
del d["a"]
d["a"] = 100
print(list(d.keys())) # ['z', 'm', 'a'] — 'a' is now last
Resizing and Growth
When the number of entries reaches about 2/3 of the indices table size, the dict is resized: a new, larger indices table is allocated (sizes follow a sequence that roughly doubles), and every entry is reinserted. This is O(n) but happens only periodically, so amortized insert is O(1). You don’t need to “pre-allocate” a dict for normal use; dict.fromkeys(iterable) or building in one pass is fine.
Start with an empty dict. First few inserts use a small table. After enough inserts, the table is resized; subsequent lookups use the new indices. You can observe “steps” in memory growth if you measure size, but for correctness you can assume O(1) average insert and lookup.
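The "steps" in memory growth can be observed with sys.getsizeof. The sketch below is CPython-specific: the exact byte counts and the entry counts at which the size jumps vary between versions, so no expected output is shown. The helper name size_steps is hypothetical.

```python
import sys


def size_steps(n):
    """Insert n keys one at a time and record the reported dict size after each insert."""
    d = {}
    sizes = []
    for i in range(n):
        d[i] = None
        sizes.append(sys.getsizeof(d))
    return sizes


sizes = size_steps(100)
# Entry counts at which the reported size changed, i.e. where a resize happened
jumps = [i + 1 for i in range(1, len(sizes)) if sizes[i] != sizes[i - 1]]
print(jumps)
```

Most inserts leave the size unchanged; only a handful trigger a resize, which is the amortized O(1) behavior described above.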
Keys Must Be Hashable
An object is hashable if it has a __hash__ that doesn’t change during its lifetime and implements __eq__ so that equal objects have the same hash. Immutable built-in types (int, float, str, tuple) are hashable. Lists and dicts are not (they are mutable). So:
d = {}
d[[1, 2]] = 3 # TypeError: unhashable type: 'list'
# Correct: use tuple
d[tuple([1, 2])] = 3 # OK
d[(1, 2)] = 4 # overwrites; (1, 2) == tuple([1, 2])
Using a tuple as key is standard when you need to key by a sequence:
# Group by pair (a, b)
pairs = [(1, 2), (1, 3), (2, 2)]
groups = {}
for a, b in pairs:
    key = (a, b)
    groups[key] = groups.get(key, 0) + 1
print(groups) # {(1, 2): 1, (1, 3): 1, (2, 2): 1}
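User-defined objects can also serve as keys when they are immutable and define equality and hashing consistently. One convenient way, shown here as a small sketch, is a frozen dataclass, which generates __eq__ and __hash__ from the fields. The class name Point and the visits dict are hypothetical names for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen=True makes instances immutable and hashable
class Point:
    x: int
    y: int


visits = {}
visits[Point(1, 2)] = "start"
visits[Point(1, 2)] = "revisited"  # equal key, so this overwrites the value

print(len(visits), visits[Point(1, 2)])  # 1 revisited
```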
Dictionary Internals: Compact Representation
Modern CPython uses a compact representation: a separate array of indices into a dense array of (hash, key, value) entries. The dense array is in insertion order; the indices array is the “hash table” that maps hash → index. This gives both O(1) lookup (via the indices) and ordered iteration (via the dense array). When you delete, the indices slot is marked as unused (dummy); the dense entry can be left in place or the table can be compacted, depending on implementation details.
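The two-array idea can be modeled in a few lines. The class below is a toy sketch, not CPython's implementation: it assumes a fixed power-of-two table size, uses plain linear probing instead of the perturbed probe, and supports neither deletion nor resizing. The name ToyCompactDict and its methods are invented for illustration.

```python
class ToyCompactDict:
    """Toy model of the compact layout: sparse index table + dense entries list."""

    def __init__(self, table_size=8):
        self.indices = [None] * table_size  # sparse: slot -> position in entries
        self.entries = []                   # dense: (hash, key, value) in insertion order

    def _slot_for(self, key):
        """Return (slot, hash): the slot holding key, or the first empty slot on its probe path."""
        h = hash(key)
        mask = len(self.indices) - 1
        i = h & mask
        while True:
            pos = self.indices[i]
            if pos is None or self.entries[pos][1] == key:
                return i, h
            i = (i + 1) & mask  # linear probing keeps the sketch short

    def __setitem__(self, key, value):
        slot, h = self._slot_for(key)
        pos = self.indices[slot]
        if pos is None:
            # Always keep one slot empty so that probing terminates
            if len(self.entries) + 1 >= len(self.indices):
                raise OverflowError("toy table is full; a real dict would resize")
            self.indices[slot] = len(self.entries)
            self.entries.append((h, key, value))
        else:
            self.entries[pos] = (h, key, value)  # same position, so order is unchanged

    def __getitem__(self, key):
        slot, _ = self._slot_for(key)
        pos = self.indices[slot]
        if pos is None:
            raise KeyError(key)
        return self.entries[pos][2]

    def keys(self):
        return [key for _, key, _ in self.entries]


d = ToyCompactDict()
d["z"] = 1
d["a"] = 2
d["z"] = 3           # update in place
print(d.keys())      # ['z', 'a']
print(d["z"])        # 3
```

Lookup goes through indices (the hash table), while iteration simply walks entries, which is how one structure gives both O(1) average lookup and insertion-ordered iteration.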
Practical Examples
Example 1: Building a frequency map
words = ["apple", "banana", "apple", "cherry", "banana", "apple"]
freq = {}
for w in words:
    freq[w] = freq.get(w, 0) + 1
# freq = {'apple': 3, 'banana': 2, 'cherry': 1}
# Order is insertion order: apple first, then banana, then cherry.
Example 2: Inverting a dict (value → key)
d = {"a": 1, "b": 2, "c": 1}
inv = {}
for k, v in d.items():
    inv.setdefault(v, []).append(k)
# inv = {1: ['a', 'c'], 2: ['b']}
# If multiple keys have the same value, they go in a list.
Example 3: Default value and key existence
d = {"x": 10}
print(d.get("y")) # None (key missing)
print(d.get("y", 0)) # 0 (default)
print("y" in d) # False
d["y"] = d.get("y", 0) + 1 # safe: now d["y"] == 1
Example 4: Why mutable keys fail
# Lists are mutable — unhashable
try:
    {[1, 2]: "bad"}
except TypeError as e:
    print(e) # unhashable type: 'list'
# Tuple of immutables is fine
{(1, 2): "ok", (3, 4): "ok"} # OK
# Tuple containing a list is not hashable
try:
    {(1, [2, 3]): "bad"}
except TypeError as e:
    print(e) # unhashable type: 'list'
Time and Space Complexity
Average: insert, lookup, delete O(1). Amortized: insert O(1) (resize cost spread over many inserts). Worst case: O(n) if many keys collide or during resize. Space: O(n) for n entries plus the indices table (typically similar in size to the number of entries after resizing). Iteration is O(n) and yields keys (or items) in insertion order.
Common Mistakes
- Using a list or dict as a key: Only hashable types can be keys. Use tuple(seq) or frozenset(s) if you need to key by a sequence or set. Remember: a tuple is hashable only if all its elements are hashable, so (1, [2]) is not hashable.
- Assuming order in Python 3.6 and below: In 3.6 and earlier, dict did not guarantee insertion order. Code that relies on order should require 3.7+ or use collections.OrderedDict.
- Modifying dict while iterating: Adding or deleting keys during for k in d can raise RuntimeError or give undefined behavior. Iterate over list(d) or a copy if you need to mutate.
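The last mistake is easy to reproduce. The sketch below contrasts iterating the live dict with iterating a snapshot of its keys; the function names are hypothetical.

```python
def remove_negative_unsafe(d):
    for k in d:            # iterating the live dict
        if d[k] < 0:
            del d[k]       # size changes mid-iteration


def remove_negative_safe(d):
    for k in list(d):      # list(d) is a snapshot of the keys
        if d[k] < 0:
            del d[k]
    return d


data = {"a": 1, "b": -2, "c": 3}
try:
    remove_negative_unsafe(dict(data))
except RuntimeError as e:
    print(e)  # dictionary changed size during iteration

print(remove_negative_safe(dict(data)))  # {'a': 1, 'c': 3}
```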
Hash Randomization
By default, Python randomizes the hashes of str and bytes objects with a seed chosen at interpreter startup (the seed can be fixed or disabled with the PYTHONHASHSEED environment variable). So hash("hello") can differ between runs. This is to prevent a class of denial-of-service attacks based on crafted keys that all collide. For the same process, hash is stable. Don’t rely on hash values being the same across different runs or machines.
When you need “dict with default” behavior, use collections.defaultdict or d.get(key, default). For grouping, d.setdefault(key, []).append(value) or defaultdict(list) is cleaner than checking if key not in d: d[key] = [].
If asked “how does Python’s dict work?”, say: “It’s a hash table with open addressing and a perturbed probe to reduce clustering. Keys must be hashable (immutable). Since 3.7, dict preserves insertion order. Resizing happens when the table is about two-thirds full; amortized O(1) insert and lookup.” Give an example of building a frequency dict or using tuple as key when you need to key by a pair.
Practice Problems
- LeetCode 49: Group Anagrams (dict with key = tuple of counts or sorted string).
- LeetCode 1: Two Sum (dict: value → index).
- Use dict.get(key, default) and setdefault in frequency and grouping problems.
Summary
- CPython’s dict uses open addressing with a perturbed probe, compact storage, and resizing when ~2/3 full. Insertion order is preserved (3.7+).
- Keys must be hashable (immutable; implement __hash__ and __eq__). Use tuple or frozenset to key by sequence or set; avoid list/dict as key.
- Use d.get(key, default) for safe lookup, and d.setdefault(key, []) or defaultdict for grouping. Don’t mutate a dict while iterating over it.
10.4 Frequency Problems
Introduction
Frequency problems are those where you need to count how often each element (or pattern) appears in the input—a list, string, or stream. The hash table (dict) is the natural tool: use the element as the key and the count as the value. Once you have a frequency map, you can answer questions like "which element appears most often?", "how many elements appear exactly k times?", or "are these two sequences anagrams?". This section covers building frequency maps, common patterns (max frequency, k most frequent, anagrams), and how to do it cleanly in Python with dict, defaultdict, and Counter.
Formal Definition
A frequency map (or count map) is a function from the set of distinct elements in the input to the non-negative integers: for each element x, freq(x) is the number of times x appears. Implemented as a hash table: keys are elements, values are counts. The map is built in a single pass: for each occurrence of x, set freq[x] = freq.get(x, 0) + 1. Any query that depends only on per-element counts can then be answered from this map.
Mental Model
Think of the frequency map as a scoreboard: one row per distinct item, one column for "how many." You scan the input once and, for every item you see, add 1 to its row. After the pass, the scoreboard holds all counts. "Most frequent" = row with the largest number; "anagrams?" = two scoreboards (one per string) are identical.
Real-World Analogy
Imagine counting votes in an election: each ballot has a candidate name. You go through the pile and, for each name, add one to that candidate's tally. At the end you have a "frequency map": candidate → number of votes. Finding the winner is "key with maximum value." Frequency problems in algorithms are the same: one pass (or a few) to build counts, then query or process the counts.
Given arr = [1, 2, 2, 3, 2, 1], a frequency map is {1: 2, 2: 3, 3: 1}. So 2 appears most often (3 times). For strings: "hello" → {'h': 1, 'e': 1, 'l': 2, 'o': 1}. Two strings are anagrams if their character frequency maps are equal.
Diagram: Input → Frequency Map
arr = [1, 2, 2, 3, 2, 1]
Step: scan 1 → scan 2 → scan 2 → scan 3 → scan 2 → scan 1
freq: {} → {1:1} → {1:1, 2:1} → {1:1, 2:2} → {1:1, 2:2, 3:1} → {1:1, 2:3, 3:1} → {1:2, 2:3, 3:1}
│ │ │ │ │ │
└───────┴────────────┴──────────────┴────────────────┴────────────────────┴──→ one pass
Final freq = {1: 2, 2: 3, 3: 1} → "most frequent" = key with max value = 2
Building a Frequency Map
For any iterable (list, string, etc.), the pattern is: iterate once; for each element, set freq[element] = freq.get(element, 0) + 1 (or use defaultdict(int) or Counter).
Using a plain dict
def build_freq(arr):
    freq = {}
    for x in arr:
        freq[x] = freq.get(x, 0) + 1
    return freq
# Example
print(build_freq([1, 2, 2, 3, 2, 1])) # {1: 2, 2: 3, 3: 1}
print(build_freq("hello")) # {'h': 1, 'e': 1, 'l': 2, 'o': 1}
Using defaultdict(int)
from collections import defaultdict
def build_freq_defaultdict(arr):
    freq = defaultdict(int)
    for x in arr:
        freq[x] += 1
    return dict(freq)
Using Counter
from collections import Counter
# Counter is built for this
freq = Counter([1, 2, 2, 3, 2, 1]) # Counter({2: 3, 1: 2, 3: 1})
char_freq = Counter("hello") # Counter({'l': 2, 'h': 1, 'e': 1, 'o': 1})
# Most common n elements
print(Counter("hello").most_common(2)) # [('l', 2), ('h', 1)]
Common Frequency Patterns
1. Element with maximum frequency
After building freq, find the key with the largest value. One pass over the dict, or use max(freq, key=freq.get).
def most_frequent(arr):
    freq = {}
    for x in arr:
        freq[x] = freq.get(x, 0) + 1
    return max(freq, key=freq.get) # key whose value is maximum
# Or with Counter
from collections import Counter
def most_frequent_counter(arr):
    return Counter(arr).most_common(1)[0][0]
2. K most frequent elements
Build frequency map, then either: (a) sort (key, count) by count and take top k — O(n log k) with a heap or O(n log n) with full sort; or (b) use bucket sort: bucket[i] = list of elements with frequency i — O(n).
from collections import Counter
def top_k_frequent(nums, k):
    freq = Counter(nums)
    # bucket[i] = elements that appear i times
    n = len(nums)
    buckets = [[] for _ in range(n + 1)]
    for x, count in freq.items():
        buckets[count].append(x)
    result = []
    # Walk from the highest possible frequency down to 1
    for i in range(n, 0, -1):
        for x in buckets[i]:
            result.append(x)
            if len(result) == k:
                return result
    return result
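Option (a), the heap-based approach, can be sketched with heapq.nlargest from the standard library, which keeps only k candidates at a time. The function name top_k_frequent_heap is hypothetical; ties between equally frequent elements are returned in whatever order nlargest produces.

```python
import heapq
from collections import Counter


def top_k_frequent_heap(nums, k):
    """Return the k elements with the highest counts, most frequent first."""
    freq = Counter(nums)
    # Iterating a Counter yields its keys; freq.get ranks each key by its count
    return heapq.nlargest(k, freq, key=freq.get)


print(top_k_frequent_heap([1, 1, 1, 2, 2, 3], 2))  # [1, 2]
```

This is usually the shorter option to write, while the bucket approach avoids the log factor.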
3. Elements that appear exactly k times
Build freq; then collect all keys where freq[key] == k.
from collections import Counter
def elements_with_freq_k(arr, k):
    freq = Counter(arr)
    return [x for x, c in freq.items() if c == k]
4. Checking anagrams (same character frequencies)
Two strings are anagrams if their character frequency maps are equal. So build Counter(s1) and Counter(s2) and check equality, or use sorted(s1) == sorted(s2).
from collections import Counter
def are_anagrams(s1, s2):
    return Counter(s1) == Counter(s2)
# Without Counter
def are_anagrams_manual(s1, s2):
    if len(s1) != len(s2):
        return False
    f1, f2 = {}, {}
    for c in s1:
        f1[c] = f1.get(c, 0) + 1
    for c in s2:
        f2[c] = f2.get(c, 0) + 1
    return f1 == f2
Why This Topic Matters
- Interview staple: "Count frequency", "find most frequent", "group by frequency", and "anagrams" appear constantly. The hash-table-one-pass pattern is expected.
- Streaming and big data: When you can't store the whole array, you can still maintain a frequency map of what you've seen so far (e.g. for approximate "heavy hitters" or exact counts if the number of distinct keys is small).
- Preprocessing step: Many problems (e.g. "substring with same character counts", "permutation in string") reduce to building or comparing frequency maps over windows.
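As an illustration of comparing frequency maps over windows, here is a minimal sketch of the "permutation in string" idea: slide a window of len(pattern) across the text and keep its character counts up to date. The function name contains_permutation is hypothetical, and zero counts are deleted so that the equality check behaves the same on all Python 3 versions.

```python
from collections import Counter


def contains_permutation(pattern, text):
    """Return True if some contiguous window of text is a permutation of pattern."""
    m = len(pattern)
    if m == 0:
        return True
    if m > len(text):
        return False
    need = Counter(pattern)
    window = Counter(text[:m])
    if window == need:
        return True
    for i in range(m, len(text)):
        window[text[i]] += 1        # character entering the window
        left = text[i - m]
        window[left] -= 1           # character leaving the window
        if window[left] == 0:
            del window[left]        # drop zero counts before comparing
        if window == need:
            return True
    return False


print(contains_permutation("ab", "eidbaooo"))  # True  ("ba" is a window)
print(contains_permutation("ab", "eidboaoo"))  # False
```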
Grouping by Key (Beyond Count)
Sometimes you don't need counts—you need to group elements by a key (e.g. group anagrams by "sorted string" or "tuple of counts"). Use dict with value as list; setdefault(key, []).append(item) or defaultdict(list).
from collections import defaultdict
# Group anagrams: key = sorted string, value = list of words
def group_anagrams(words):
    groups = defaultdict(list)
    for w in words:
        key = tuple(sorted(w)) # or "".join(sorted(w))
        groups[key].append(w)
    return list(groups.values())
# Example
words = ["eat", "tea", "tan", "ate", "nat", "bat"]
print(group_anagrams(words))
# [['eat', 'tea', 'ate'], ['tan', 'nat'], ['bat']]
Time and Space Complexity
- Building frequency map: One pass over n elements; dict insert/lookup O(1) average. Time O(n), space O(k) where k = number of distinct elements.
- Max frequency / k most frequent: After building the map, max over k keys is O(k); bucket approach for top-k is O(n). Overall O(n) time, O(n) space for buckets.
- Anagram check: O(n) for each string, n = length. O(n) time, O(1) space if alphabet size is fixed (e.g. 26 letters).
Why O(n) for building: We touch each of the n elements exactly once. For each element we do one dict lookup (O(1) average) and one insert/update (O(1) average). So total time is n · O(1) = O(n). Space is one entry per distinct key, so O(k) where k ≤ n.
Edge Cases
- Empty input: Return empty dict {} or empty list for "top k" / "elements with freq k".
- Single element: freq has one key with count 1; "most frequent" is that element; anagram of two single-char strings is just equality of the two chars.
- All elements same: One key with count n; "most frequent" is that element, and top-k can return at most that one distinct element.
- k larger than distinct count: In "top k frequent," if there are fewer than k distinct elements, return all of them (or pad as per problem).
Pattern Recognition
Use a frequency map when the problem involves: "count occurrences," "most frequent," "least frequent," "elements appearing exactly k times," "anagrams," "group by same multiset," "majority element," "first unique/non-repeating." The pattern is: one pass to build freq, then one or more passes (or a single aggregation) to answer the question.
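For example, "first unique/non-repeating" follows the pattern directly: one pass to count, a second pass in the original order to find the first element with count 1. This is a small sketch; the name first_unique_index and the choice to return -1 when nothing is unique are assumptions for this example.

```python
from collections import Counter


def first_unique_index(s):
    """Return the index of the first character that appears exactly once, or -1."""
    freq = Counter(s)               # pass 1: build counts
    for i, ch in enumerate(s):      # pass 2: scan in original order
        if freq[ch] == 1:
            return i
    return -1


print(first_unique_index("leetcode"))      # 0
print(first_unique_index("loveleetcode"))  # 2
print(first_unique_index("aabb"))          # -1
```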
Common Mistakes
- Assuming order in frequency map: Plain dict in Python 3.7+ preserves insertion order, so the order you see when iterating is the order of first occurrence. For "most frequent" you must explicitly find the key with max value; don't assume the first or last key is the answer.
- Off-by-one in "first k" vs "all with frequency ≥ x": "Top k frequent" means exactly k elements; "all elements with frequency ≥ 2" is a different problem. Read the problem carefully.
- Anagrams: case and spaces: Often problem says "ignore case" or "ignore spaces". Normalize (e.g. lower and remove spaces) before building the Counter.
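The normalization step from the last point might look like the sketch below. It assumes the problem wants case and whitespace ignored (other problems may also want punctuation removed); the function name are_anagrams_normalized is hypothetical.

```python
from collections import Counter


def are_anagrams_normalized(s1, s2):
    """Anagram check that ignores case and whitespace."""
    def normalize(s):
        return [ch for ch in s.lower() if not ch.isspace()]

    return Counter(normalize(s1)) == Counter(normalize(s2))


print(are_anagrams_normalized("Dormitory", "dirty room"))  # True
print(are_anagrams_normalized("Hello", "World"))           # False
```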
Use Counter when you only need counts; it has most_common(k) and supports addition/subtraction. Use defaultdict(int) when you need to update counts in place or combine with other logic. Use plain dict with get(x, 0) + 1 when you want no extra imports.
When you see "count occurrences", "most frequent", "anagram", or "group by same character set", say: "I'll use a hash table to build a frequency map in one pass. For anagrams I'll compare Counter(s1) == Counter(s2) or group by a canonical key like sorted string." Then code the one-pass loop and the query (max, top-k, or equality).
Practice Problems
- LeetCode 347: Top K Frequent Elements (bucket sort or heap).
- LeetCode 49: Group Anagrams (group by sorted string or tuple of counts).
- LeetCode 242: Valid Anagram (Counter(s1) == Counter(s2)).
- LeetCode 387: First Unique Character (build freq, then scan for first with count 1).
Summary
- Frequency map: One pass: freq[x] = freq.get(x, 0) + 1 or Counter(iterable). Use for counts, max frequency, top-k, anagrams, grouping.
- Max frequency: max(freq, key=freq.get) or Counter(...).most_common(1).
- Anagrams: Same character counts → Counter(s1) == Counter(s2). Group anagrams by key = sorted string or tuple of counts.
- Prefer Counter when you need most_common; use defaultdict(int) or plain dict when you need custom updates.
10.5 Two Sum Pattern
Introduction
The Two Sum problem asks: given an array of numbers and a target value, find two elements (by value or by index) that add up to the target. The hash-table solution is the standard approach: in a single pass, for each element x, check whether target - x has already been seen; if so, you have a pair. Store seen values (and optionally their indices) in a dict for O(1) lookup. This pattern generalizes to "find a pair satisfying a condition" and to variants like Three Sum (reduce to Two Sum) or "count pairs with given sum." Mastering Two Sum is essential—it appears in interviews constantly and is the building block for many other problems.
Real-World Analogy
Imagine you're in a store with a fixed budget. You pick up an item and check its price. To know if you can buy two items that exactly match your budget, you need to remember the prices of items you've already seen. When you look at a new item, you ask: "Have I already seen an item whose price is (budget minus this item's price)?" A hash table is like a quick lookup pad: you write down each price as you see it, and when you see a new one you instantly check if the "complement" is on your list.
nums = [2, 7, 11, 15], target = 9. We need two numbers that add to 9. At index 0 we see 2; we need 9 - 2 = 7. Store 2 → 0. At index 1 we see 7; 9 - 7 = 2 is already in the dict at index 0. So indices 0 and 1 (values 2 and 7) are the answer: [0, 1].
Diagram: One-Pass Two Sum
nums = [ 2, 7, 11, 15 ] target = 9
index: 0 1 2 3
i=0: x=2, complement=9-2=7. 7 in seen? No. seen = {2:0}
i=1: x=7, complement=9-7=2. 2 in seen? Yes (at 0). Return [0, 1] ✓
Visual:
┌─────┬─────┬─────┬─────┐
│ 2 │ 7 │ 11 │ 15 │ ← array
└──┬──┴──┬──┴─────┴─────┘
│ │
│ └── at i=1: need 2 → found at index 0 → pair (0, 1)
└──────── at i=0: store seen[2]=0, need 7 (not seen yet)
Formal Definition
Given a sequence nums[0..n-1] and integer target, find distinct indices i, j such that nums[i] + nums[j] = target. Output is typically the pair (i, j) or [i, j]. Uniqueness: exactly one valid pair is assumed in the classic problem; we do not reuse the same index twice.
Problem Statement (Classic)
Given an array of integers nums and an integer target, return indices of the two numbers such that they add up to target. Assume exactly one solution exists, and you may not use the same element twice.
Thinking Evolution: Brute Force → Better → Optimal
Brute Force: Check Every Pair — O(n²)
Try all pairs (i, j) with i < j: if nums[i] + nums[j] == target, return [i, j]. Two nested loops; no extra space. Correct but slow for large n.
def two_sum_brute(nums, target):
    for i in range(len(nums)):
        for j in range(i + 1, len(nums)):
            if nums[i] + nums[j] == target:
                return [i, j]
    return []
Better: Sort + Two Pointers — O(n log n) time, O(1) extra (if we can lose indices)
If we only needed values, we could sort and use two pointers at the ends. But we need indices, so we must keep (value, index) pairs and sort by value, then run two pointers—still O(n log n). Good when the array is already sorted (then O(n)).
Optimal: Hash Table (One Pass) — O(n) time, O(n) space
For each x = nums[i], the needed partner is target - x. If we have already seen that value at some index j, we are done. So we maintain a mapping "value → index" and check before adding the current element. One pass, O(n) time and space.
Hash-Table Solution (One Pass)
Idea: as we iterate, for each nums[i], the value we need to complete the pair is complement = target - nums[i]. If complement was seen at some earlier index j, return [j, i]. Otherwise, record nums[i] → i in a dict and continue.
def two_sum(nums, target):
    seen = {} # value -> index
    for i, x in enumerate(nums):
        complement = target - x
        if complement in seen:
            return [seen[complement], i]
        seen[x] = i
    return [] # no solution (problem usually guarantees one)
Line-by-Line Explanation
- seen = {}: Maps each value we have seen to the index where we saw it (the most recent index if the value repeats). Every stored index is earlier than the current i, so the returned pair lists the earlier index first.
- for i, x in enumerate(nums): Process each element once; i is the index, x is the value.
- complement = target - x: The other number we need so that x + complement = target.
- if complement in seen: We've already seen that value at index seen[complement]. So [seen[complement], i] is a valid pair; return it.
- seen[x] = i: Record that we've seen value x at index i for future lookups. We do this after the check so we never use the same index twice.
Why one pass works: when we are at index i, all indices 0..i-1 are already in seen. So if the pair is (j, i) with j < i, we will find complement = target - nums[i] equal to nums[j] when we process i.
Edge Cases
- No solution: Return [] or whatever the problem specifies; the one-pass loop will finish without returning.
- Duplicate values: Storing seen[x] = i overwrites the previous index, so seen keeps the most recent index of each value. We only need one valid pair, so returning the first pair we find is usually acceptable. If the problem requires "all pairs," use a list of indices per value.
- target - x == x (same element would satisfy 2x = target): We check complement in seen before adding x to seen, so we only match with an earlier index. We never use the same index twice.
Two-Pass Variant
First pass: build seen (value → index). Second pass: for each i, if target - nums[i] is in seen and the stored index is not i, return [seen[target - nums[i]], i]. Same O(n) time and O(n) space; one pass is cleaner and avoids the "same index" check when the complement is at the same position.
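A minimal sketch of the two-pass variant, with the explicit "not the same index" check, could look like this. The function name two_sum_two_pass is hypothetical; because the first pass keeps the last index of each value, the pair is returned as [i, j] with i being the position found in the second pass.

```python
def two_sum_two_pass(nums, target):
    """Two-pass Two Sum: build the value -> index map first, then look up complements."""
    seen = {}
    for i, x in enumerate(nums):   # pass 1: last index of each value
        seen[x] = i
    for i, x in enumerate(nums):   # pass 2: look for the complement
        j = seen.get(target - x)
        if j is not None and j != i:   # must not reuse the same element
            return [i, j]
    return []


print(two_sum_two_pass([2, 7, 11, 15], 9))  # [0, 1]
print(two_sum_two_pass([3, 3], 6))          # [0, 1]
print(two_sum_two_pass([3, 2, 4], 6))       # [1, 2]
```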
Return Values vs Indices
Some problems ask for the values of the two numbers, not indices. Same algorithm: when you find the pair, return [complement, x] or [nums[j], nums[i]]. If the problem asks for count of pairs with sum equal to target, use a frequency map: for each x, add freq.get(target - x, 0) to the count (handle same-index and duplicates per problem rules), then update freq[x].
Why This Topic Matters
- Interview staple: Two Sum is one of the most asked problems. The hash-table pattern is the expected optimal solution (O(n) time).
- Pattern for "find pair": Any "find two elements such that f(a, b) = target" can sometimes be reduced to storing seen values and checking for a complement. Works when the relation can be rewritten as "need b = g(target, a)" and g is easy to compute.
- Building block: Three Sum is often "fix one element, then Two Sum on the rest"; 4Sum and similar follow. Subarray sum problems use prefix-sum + hash table (different but related idea).
Variants and Extensions
Two Sum II – Sorted array
If the array is sorted, use two pointers at the start and end: if nums[lo] + nums[hi] > target, decrement hi; if smaller, increment lo; if equal, return. O(n) time, O(1) extra space. No hash table needed.
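The two-pointer walk described above can be sketched as follows. It assumes nums is sorted in non-decreasing order and returns 0-based indices (LeetCode 167 itself expects 1-based indices); the function name two_sum_sorted is hypothetical.

```python
def two_sum_sorted(nums, target):
    """Two pointers on a sorted array; returns 0-based [lo, hi] or []."""
    lo, hi = 0, len(nums) - 1
    while lo < hi:
        total = nums[lo] + nums[hi]
        if total == target:
            return [lo, hi]
        if total > target:
            hi -= 1   # sum too large: move the right pointer to a smaller value
        else:
            lo += 1   # sum too small: move the left pointer to a larger value
    return []


print(two_sum_sorted([2, 7, 11, 15], 9))   # [0, 1]
print(two_sum_sorted([1, 3, 4, 6, 9], 13)) # [2, 4]
```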
Count pairs with given sum
Use a frequency map. For each x, count how many times target - x has been seen (excluding current element if needed). Add that to the result, then do freq[x] += 1. Handle duplicates and "same index" per problem.
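A sketch of the counting variant, under the assumption that a pair means two different positions i < j and that equal values at different positions count separately. The function name count_pairs_with_sum is hypothetical.

```python
from collections import defaultdict


def count_pairs_with_sum(nums, target):
    """Count index pairs (i, j), i < j, with nums[i] + nums[j] == target."""
    freq = defaultdict(int)   # value -> how many times seen so far
    count = 0
    for x in nums:
        count += freq.get(target - x, 0)  # pair x with every earlier complement
        freq[x] += 1                      # only now make x available to later elements
    return count


print(count_pairs_with_sum([1, 5, 7, -1, 5], 6))  # 3
print(count_pairs_with_sum([1, 1, 1], 2))         # 3
```

Because the lookup happens before freq[x] is updated, an element is never paired with itself.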
Two Sum – Return all unique pairs (values)
Sort and use two pointers, or use a set of pairs and a set of seen values to avoid duplicates. Depends on whether duplicates in the array are allowed and whether (a,b) and (b,a) count as one.
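One way to sketch the set-based option, assuming that (a, b) and (b, a) count as the same pair and that a value may pair with itself only if it occurs at least twice. The function name unique_pairs is hypothetical.

```python
def unique_pairs(nums, target):
    """Return the distinct value pairs (a, b), a <= b, with a + b == target."""
    seen = set()    # values encountered so far
    pairs = set()   # normalized (smaller, larger) pairs, so duplicates collapse
    for x in nums:
        c = target - x
        if c in seen:
            pairs.add((min(x, c), max(x, c)))
        seen.add(x)
    return sorted(pairs)


print(unique_pairs([1, 1, 2, 3, 3, 4], 4))  # [(1, 3)]
print(unique_pairs([2, 2, 0, 4], 4))        # [(0, 4), (2, 2)]
```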
Time and Space Complexity
- Time: O(n) — one pass over the array; dict lookup and insert are O(1) average.
- Space: O(n) — in the worst case we store n distinct values in the dict.
Common Mistakes
- Using the same element twice: In the one-pass solution we store seen[x] = i after checking for the complement, so we never use the same index twice. In a two-pass solution, you must ensure seen[complement] != i before returning.
- Returning value instead of index: Read the problem: often "return indices" is required. Store index in the dict, not just presence.
- Order of result: Some problems want the smaller index first. Our one-pass returns [seen[complement], i] so the first index is always smaller.
When you see "find two numbers that sum to target", say: "I'll use a hash table. For each element I'll check if (target - element) was already seen; if yes, we have a pair. I'll store each value and its index so I can return indices." Then code the one-pass loop.
If asked "can you do it in O(1) space?", mention that for an unsorted array, the standard optimal is O(n) time and O(n) space with a hash table. O(1) space would require sorting (then two pointers), which is O(n log n) time—so there's a time–space tradeoff. For a sorted array, two pointers give O(n) time and O(1) space.
Practice Problems
- LeetCode 1: Two Sum (classic; return indices).
- LeetCode 167: Two Sum II – Input Array Is Sorted (two pointers).
- LeetCode 15: 3Sum (fix one, then two sum on the rest; avoid duplicate triplets).
- LeetCode 18: 4Sum (similar idea with two fixed elements or one fixed + 3Sum).
Summary
- Two Sum (unsorted): One pass with a dict mapping value → index. For each x, if target - x is in the dict, return [seen[target-x], i]; else seen[x] = i.
- Time O(n), space O(n). Same pattern extends to "count pairs" (use frequency map) and to "find pair" for other relations when you can define a complement.
- Sorted array: Two pointers at both ends, move based on sum vs target — O(n) time, O(1) space. Three Sum / Four Sum often reduce to Two Sum after fixing one or two elements.
10.6 Subarray with Given Sum
Introduction
Given an array (possibly with negative numbers) and a target sum, the problem is to find a contiguous subarray whose elements add up to that target. The efficient solution uses prefix sums plus a hash table: if the prefix sum at index i is P[i], then a subarray from j+1 to i has sum P[i] - P[j]. So we need P[i] - P[j] = target, i.e. P[j] = P[i] - target. As we compute prefix sums left to right, we store each prefix sum (and index or count) in a dict; at each step we check whether current_prefix - target was seen before. This gives O(n) time and O(n) space. The same idea applies to "count subarrays with given sum" or "longest subarray with sum k."
Formal Definition
A subarray of arr is a contiguous block arr[i], arr[i+1], ..., arr[j] for some 0 ≤ i ≤ j < n. The sum of this subarray is arr[i] + arr[i+1] + ... + arr[j]. We define prefix sum P[k] = arr[0] + arr[1] + ... + arr[k] (with P[-1] = 0 for the empty prefix). Then the sum of arr[i..j] equals P[j] - P[i-1]. Finding a subarray with sum target is equivalent to finding indices i, j with i ≤ j such that P[j] - P[i-1] = target, i.e. P[i-1] = P[j] - target. So for each ending position j, we need a starting position i-1 (or i) where the prefix sum was P[j] - target.
Mental Model
Imagine walking along the array and keeping a running total (prefix sum). At each step you ask: "Have I ever had a running total that is exactly (current total minus target)?" If yes, the segment from that earlier moment to now has sum target. The hash table is your memory of "prefix sum → when you had it" (index or count).
Real-World Analogy
Imagine a road with mile markers. You know the total distance from the start to each marker (that's your "prefix sum"). To find a segment that is exactly 10 miles long, you ask: "At which earlier marker was the cumulative distance exactly (current distance minus 10)?" The hash table is your quick log of "cumulative distance → marker number." When you reach a new marker, you look up whether you've already passed a point where the cumulative distance was 10 less than now—if yes, the stretch between that point and now is 10 miles.
arr = [1, 2, 3, 4, 5], target = 9. Prefix sums: P[0]=1, P[1]=3, P[2]=6, P[3]=10, P[4]=15. At index 3, prefix sum is 10. We need P[j] = 10 - 9 = 1. Prefix sum 1 was at index 0 (before index 0 we had sum 0). So subarray from index 1 to 3 (1-based: positions 2 to 4) has sum 9: [2, 3, 4] → 2+3+4 = 9. So the answer is subarray [2, 3, 4] or indices [1, 3] (0-based).
Prefix Sum Idea
Define prefix[i] = sum of arr[0..i] (inclusive). By convention we also define prefix[-1] = 0 (empty prefix). Then the sum of arr[j+1..i] is:
sum(arr[j+1..i]) = prefix[i] - prefix[j]
So we want prefix[i] - prefix[j] = target, i.e. prefix[j] = prefix[i] - target. As we iterate i from 0 to n-1, we maintain a running curr_sum (which is prefix[i]). We need to know: have we seen the value curr_sum - target at some earlier index? If yes, the subarray from (that index + 1) to i has sum target.
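The identity can be checked with an explicit prefix array. In this sketch the array is shifted by one position so that the empty prefix gets a real slot: shifted[k] is the sum of the first k elements, i.e. shifted[k] corresponds to prefix[k-1] in the notation above and shifted[0] = 0. The helper names build_prefix and range_sum are hypothetical.

```python
from itertools import accumulate


def build_prefix(arr):
    """shifted[k] = sum of the first k elements; shifted[0] = 0 is the empty prefix."""
    return [0] + list(accumulate(arr))


def range_sum(shifted, lo, hi):
    """Sum of arr[lo..hi] inclusive, as a difference of two prefix sums."""
    return shifted[hi + 1] - shifted[lo]


arr = [1, 2, 3, 4, 5]
shifted = build_prefix(arr)
print(shifted)                   # [0, 1, 3, 6, 10, 15]
print(range_sum(shifted, 1, 3))  # 9, i.e. 2 + 3 + 4
```

The hash-table solution never needs the whole array: it keeps only the running sum and a dict of the prefix sums seen so far.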
Diagram: Prefix Sum and Segment
arr = [ 1, 2, 3, 4, 5 ] target = 9
index = 0 1 2 3 4
prefix: P[-1]=0, P[0]=1, P[1]=3, P[2]=6, P[3]=10, P[4]=15
To get sum 9 for segment [1..3] (elements 2,3,4):
sum(arr[1..3]) = 2+3+4 = 9 = P[3] - P[0] = 10 - 1 = 9 ✓
So: prefix[j] = P[0] = 1, prefix[i] = P[3] = 10.
We need prefix[j] = prefix[i] - target = 10 - 9 = 1. Seen at j=0? Yes.
Segment from index (0+1) to 3 → arr[1..3] = [2,3,4], sum = 9.
Timeline as we scan:
i=0: curr=1, need=1-9=-8 (not in seen). seen = {0:-1, 1:0}
i=1: curr=3, need=3-9=-6 (not in seen). seen = {..., 3:1}
i=2: curr=6, need=6-9=-3 (not in seen). seen = {..., 6:2}
i=3: curr=10, need=10-9=1 (in seen at 0). Return (0+1, 3) = (1, 3) ✓
Return One Subarray (Indices or Yes/No)
Store in the hash table: prefix_sum → smallest index where that prefix sum occurred (so we get the longest such subarray if we care, or any valid one). Initialize with 0 → -1 so that a subarray starting at index 0 is handled (we need "prefix[-1] = 0").
def subarray_sum(nums, target):
    # Returns (start, end) 0-based indices if found, else (-1, -1)
    seen = {0: -1} # prefix sum 0 at "index" -1 (before start)
    curr = 0
    for i, x in enumerate(nums):
        curr += x
        need = curr - target
        if need in seen:
            start = seen[need] + 1
            return (start, i)
        seen.setdefault(curr, i) # keep the first (smallest) index for each prefix sum
    return (-1, -1)
# Example
nums = [1, 2, 3, 4, 5]
print(subarray_sum(nums, 9)) # (1, 3) -> nums[1:4] = [2, 3, 4], sum = 9
Why seen[0] = -1? So when curr == target (e.g. first few elements sum to target), we have need = curr - target = 0, and we want to say "from start (index 0) to i". The "index" for prefix sum 0 is -1, so start = -1 + 1 = 0.
Count Subarrays with Given Sum
Instead of storing one index per prefix sum, store the count of how many times each prefix sum has occurred. When at index i with prefix sum curr, the number of subarrays ending at i with sum target is the count of indices j such that prefix[j] = curr - target, i.e. seen.get(curr - target, 0). Add that to the result, then do seen[curr] += 1 (and initialize seen[0] = 1 for the empty prefix).
def count_subarray_sum(nums, target):
    seen = {0: 1} # prefix sum 0 has occurred once (empty prefix)
    curr = 0
    count = 0
    for x in nums:
        curr += x
        need = curr - target
        count += seen.get(need, 0)
        seen[curr] = seen.get(curr, 0) + 1
    return count
# Example: [1, 2, 3], target=3. Prefix: 1, 3, 6.
# At i=0: curr=1, need=1-3=-2 -> 0; seen[1]=1.
# At i=1: curr=3, need=3-3=0 -> 1 (empty prefix); count=1; seen[3]=1.
# At i=2: curr=6, need=6-3=3 -> 1; count=2; seen[6]=1.
# Subarrays with sum 3: [1,2] and [3]. So count=2.
Handling Negative Numbers
The prefix-sum + hash table approach works with negative numbers. Sliding window with two pointers does not work when the array can have negatives, because shrinking the window can either increase or decrease the sum, so there is no rule for which pointer to move. So for arrays with negatives, the prefix-sum + hash table method is the correct O(n) approach.
Longest Subarray with Sum K
Store prefix_sum → first index where that sum was seen. When you see curr - target in seen, the subarray from seen[curr-target]+1 to i has sum target; update the maximum length. Only store the first occurrence of each prefix sum so that the subarray length i - seen[need] is as large as possible.
def longest_subarray_sum_k(nums, k):
    seen = {0: -1}
    curr = 0
    max_len = 0
    for i, x in enumerate(nums):
        curr += x
        need = curr - k
        if need in seen:
            length = i - seen[need]
            max_len = max(max_len, length)
        if curr not in seen:  # keep first occurrence only
            seen[curr] = i
    return max_len
Why This Topic Matters
- Interview staple: "Subarray with given sum" and "count subarrays with sum k" are common. The prefix-sum + hash table pattern is the standard solution when negatives are allowed.
- Difference from Two Sum: Two Sum finds two elements; here we find a contiguous segment. The key is rephrasing "sum of segment = target" as "two prefix sums differ by target."
- Positive-only arrays: If all elements are positive, a sliding window (expand right until sum ≥ target, then shrink left) also works in O(n). But with negatives, prefix sum + hash is the way to go.
Time and Space Complexity
- Time: O(n) — one pass; each dict lookup and insert is O(1) average.
- Space: O(n) — up to n distinct prefix sums in the worst case.
Edge Cases
- Empty array: No subarray exists; return false, 0, or [].
- Target = 0: A subarray with sum 0 exists if any prefix sum repeats (same prefix seen twice) or if some prefix sum is 0 (segment from start). seen[0] = 1 or seen[0] = -1 handles "segment from index 0."
- First element equals target: The prefix sum after index 0 equals target, so need = 0, which is found only if prefix 0 (the empty prefix) was recorded beforehand. So initializing seen[0] is essential.
Pattern Recognition
Use prefix sum + hash table when you see: "contiguous subarray with sum k," "count subarrays with sum k," "longest/shortest subarray with sum k," or "subarray sum equals target" and the array may contain negative numbers. If the problem says "all positive," you can also mention sliding window as an alternative (O(1) space).
Common Mistakes
- Forgetting to initialize prefix sum 0: The empty prefix has sum 0. You must put seen[0] = -1 (for index version) or seen[0] = 1 (for count version) before the loop. Otherwise you miss subarrays that start at index 0.
- Using sliding window for arrays with negatives: Sliding window assumes monotonic change when you move the pointer. With negatives, the sum can go up when you shrink the window, so the two-pointer shrink logic fails. Use prefix sum + hash instead.
- Store last vs first occurrence: For "longest subarray with sum k" you want the first index where each prefix sum was seen. For "shortest subarray" you'd store the last occurrence (update every time).
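The "store the last occurrence" variant can be sketched as follows. The function name `shortest_subarray_sum_k` and the choice to return 0 when no subarray exists are assumptions made for this example, mirroring the return convention of the longest-subarray version.

```python
def shortest_subarray_sum_k(nums, k):
    """Length of the shortest non-empty subarray with sum k, or 0 if none."""
    seen = {0: -1}  # prefix sum -> most recent index where it occurred
    curr = 0
    best = 0
    for i, x in enumerate(nums):
        curr += x
        need = curr - k
        if need in seen:
            length = i - seen[need]
            if best == 0 or length < best:
                best = length
        seen[curr] = i  # overwrite every time: keep the last occurrence
    return best


print(shortest_subarray_sum_k([1, 2, 1, 3], 3))  # 1 -> the subarray [3]
```

Because the lookup happens before `seen[curr]` is updated, the matched index is always strictly earlier than `i`, so the subarray is never empty, even when k is 0.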
When you hear "subarray sum equals k" or "contiguous subarray with sum target", say: "I'll use prefix sums and a hash table. At each position I'll check if (current prefix sum minus target) was seen before; that means there's a segment from that earlier position to here with sum target. I'll initialize with prefix 0 so segments starting at index 0 are counted."
If the interviewer says "all elements are positive", you can mention both approaches: (1) Sliding window: expand until sum ≥ target, then shrink from the left—O(n) time, O(1) space. (2) Prefix sum + hash: still works, O(n) time and O(n) space. For arrays with negatives, only (2) is correct.
Practice Problems
- LeetCode 560: Subarray Sum Equals K (count subarrays with sum k).
- LeetCode 325: Maximum Size Subarray Sum Equals k (longest subarray with sum k).
- GeeksforGeeks: Subarray with given sum (return start/end indices; handle negatives).
Summary
- Subarray sum = target is solved with prefix sum + hash table. At each index, check if current_prefix - target was seen; if yes, the segment from (that index + 1) to current has sum target.
- Initialize seen[0] = -1 (for indices) or seen[0] = 1 (for count) so subarrays starting at index 0 are included.
- Find one: store prefix_sum → index; count: store prefix_sum → count; longest: store prefix_sum → first index only. Works with negative numbers; sliding window does not.
10.7 Custom Hashing
Introduction
Custom hashing means using objects or composite values as keys in a hash table (e.g. Python dict) when built-in types are not enough. To use a custom type as a key, it must be hashable: it must implement __hash__ and __eq__ so that equal objects have the same hash and the hash does not change over the object's lifetime (so typically the object is immutable). For composite keys—e.g. "pair (a, b)" or "set of items"—we use tuple or frozenset, which are hashable when their elements are hashable. Sometimes you need to design a hash function for a domain (e.g. mapping (i, j) to a single integer) to avoid collisions and keep lookups O(1). This section covers making custom classes hashable, using tuples and frozensets as keys, and simple hash-function design.
Real-World Analogy
Think of a library where books are filed by "author + title" together. The filing system needs a single label (like a hash) for each (author, title) pair. You define a rule: e.g. "author alphabetically, then title," and that rule must always give the same label for the same pair and never change. Custom hashing is that rule: you decide how to turn your composite key into something the hash table can use, and you keep that rule consistent.
You want to count how many times the pair (x, y) appears in a list of coordinate pairs. Keys must be hashable—so use (x, y) as the key: freq[(x, y)] = freq.get((x, y), 0) + 1. For "group by set of tags" (order doesn't matter), use frozenset(tags) as the key so that {'a','b'} and {'b','a'} map to the same key.
Hashable Requirements in Python
An object is hashable if:
- It implements __hash__() returning an integer that does not change during the object's lifetime.
- It implements __eq__() for equality. If a == b, then hash(a) == hash(b) must hold.
- Immutable types (int, float, str, tuple, frozenset) are hashable when their contents are hashable. Mutable types (list, dict, set) are not hashable.
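These requirements can be checked directly with the built-in `hash()`, which raises `TypeError` for unhashable objects. The helper `is_hashable` below is a hypothetical name used only for this small demonstration.

```python
def is_hashable(obj):
    """Return True if hash(obj) succeeds, False if it raises TypeError."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


print(is_hashable((1, 2)))             # True: tuple of ints
print(is_hashable(frozenset({1, 2})))  # True: frozenset of ints
print(is_hashable([1, 2]))             # False: list is mutable
print(is_hashable((1, [2, 3])))        # False: tuple containing a list
print(hash((1, 2)) == hash((1, 2)))    # True: equal objects, equal hashes
```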
Diagram: Tuple vs Frozenset as Key
Tuple (order matters):
    (1, 2) and (2, 1) are different keys.
    Points: (1,2) → count 3
            (2,1) → count 1
Frozenset (order doesn't matter):
    frozenset({1,2}) == frozenset({2,1}): same key → one bucket for "set of 1 and 2"
    Tags: {"a","b"} → group [item1, item2]
          {"b","a"} → same group
Using Tuples as Composite Keys
When the key is a fixed-length combination of values (pair, triple, etc.), use a tuple. Tuples are hashable if all elements are hashable.
# Count occurrences of (x, y) pairs
points = [(1, 2), (1, 3), (1, 2), (2, 2)]
freq = {}
for p in points:
    freq[p] = freq.get(p, 0) + 1
print(freq) # {(1, 2): 2, (1, 3): 1, (2, 2): 1}
# Group by (row, col) in a grid
grid_data = {(0, 0): 'A', (0, 1): 'B', (1, 0): 'C'}
# (row, col) is a natural composite key
Order matters: (1, 2) and (2, 1) are different keys. Use a tuple when the order of components is part of the key.
Using Frozenset for Set-Like Keys
When the key is a set of items (order doesn't matter), use frozenset. Two sets with the same elements must map to the same key.
# Group anagrams: key = set of (char, count) or just sorted tuple
# Alternative: key = frozenset(Counter(s).items()) so "aab" and "aba" match
from collections import Counter
def group_by_char_set(words):
    groups = {}
    for w in words:
        # frozenset of (char, count) pairs - order doesn't matter
        key = frozenset(Counter(w).items())
        groups.setdefault(key, []).append(w)
    return list(groups.values())
# Or simpler for anagrams: key = tuple(sorted(s))
def group_anagrams_sorted(words):
    groups = {}
    for w in words:
        key = tuple(sorted(w))
        groups.setdefault(key, []).append(w)
    return list(groups.values())
Making a Custom Class Hashable
To use your class instances as dict keys, implement __hash__ and __eq__. The object should be immutable (or at least the fields used in __hash__ and __eq__ must not change after creation).
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y
    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented  # let Python fall back to its default comparison
        return self.x == other.x and self.y == other.y
    def __hash__(self):
        return hash((self.x, self.y))  # delegate to tuple's hash
# Now Point can be used as key
d = {}
d[Point(1, 2)] = "hello"
print(d[Point(1, 2)])  # hello
Rule: Include in __hash__ exactly the same fields you use in __eq__. Typically return hash((self.a, self.b, ...)) so that equal objects get the same hash.
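As an alternative to writing both methods by hand, the standard library's `dataclasses` module can generate them: with `frozen=True` (and the default `eq=True`) a dataclass gets field-based `__eq__` and `__hash__` and rejects attribute assignment. The class name `GridPoint` below is an assumption for this sketch, not a type used elsewhere in this section.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class GridPoint:
    x: int
    y: int


d = {GridPoint(1, 2): "hello"}
print(d[GridPoint(1, 2)])  # hello: equal fields give equal hash and equality

try:
    GridPoint(1, 2).x = 5  # frozen instances reject assignment
except FrozenInstanceError:
    print("cannot modify a frozen instance")
```

Freezing the instance enforces the "fields must not change after creation" rule instead of relying on convention.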
Designing a Simple Hash for Indices (i, j)
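When you have two indices i, j in a bounded range, with j in 0 to n-1 (n columns), you can map them to a single integer to use as a key: key = i * n + j (row-major). The mapping is collision-free as long as 0 ≤ j < n. The column-major alternative is key = i + j * n, where n is then the number of rows (0 ≤ i < n). This avoids storing tuples and can be faster. Reverse (row-major): i, j = key // n, key % n.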
def flatten_2d(i, j, cols):
    return i * cols + j
def unflatten(key, cols):
    return key // cols, key % cols
# Use in dict for 2D memoization
n, m = 10, 10
memo = {}
def dfs(i, j):
    k = flatten_2d(i, j, m)
    if k in memo:
        return memo[k]
    # ... compute and store memo[k] = result
Why This Topic Matters
- Dict keys: Many problems require grouping or lookup by a composite key (pair, triple, set). Tuples and frozensets are the standard way; custom classes are needed when you want a named type as key.
- Immutability: If you use a list as part of a key, Python will raise TypeError: unhashable type: 'list'. Convert to tuple(lst) or frozenset(lst) as appropriate.
- Interview clarity: Saying "I'll use (a, b) as the key" or "I'll use frozenset for set equality" shows you understand hashability and key design.
Common Mistakes
- Using a list or dict as a key: d[[1,2]] = 3 raises TypeError. Use d[tuple([1,2])] or d[(1,2)]. If the key is a set (order doesn't matter), use d[frozenset([1,2])].
- Mutable default or mutable fields in __hash__: If your class has list/dict attributes and you use them in __hash__, mutating them later changes the hash and breaks the dict. Prefer immutable fields for hashable types.
- Tuple with mutable elements: (1, [2, 3]) is not hashable because the list is mutable. Only tuples of hashable elements are hashable.
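The mutable-field mistake can be reproduced with a deliberately unsafe class. `Bag` is a hypothetical class written only to show the failure; it hashes a list attribute by converting it to a tuple.

```python
class Bag:
    """Deliberately unsafe: the hash depends on a mutable list attribute."""

    def __init__(self, items):
        self.items = items

    def __eq__(self, other):
        return isinstance(other, Bag) and self.items == other.items

    def __hash__(self):
        return hash(tuple(self.items))


key = Bag([1, 2])
d = {key: "value"}
key.items.append(3)  # the key's hash changes after it was inserted

print(len(d))               # 1: the entry is still stored
print(Bag([1, 2]) in d)     # False: hash matches the stored one, but equality fails
print(Bag([1, 2, 3]) in d)  # False: equal to the key, but the stored hash is stale
```

The entry was filed under the hash of (1, 2), while the key now compares equal only to bags holding [1, 2, 3], so no equal-valued lookup can find it again.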
For composite keys: use tuple when order matters (e.g. (i, j), (a, b)); use frozenset when you need set equality (e.g. same set of tags). For custom classes, implement __eq__ and __hash__ using the same fields and keep those fields immutable.
When the problem says "group by" or "count by" a composite (pair, set, etc.), say: "I'll use a dict with a composite key. For a pair I'll use a tuple (a, b); for a set I'll use frozenset so order doesn't matter. Keys must be hashable, so I'll avoid lists." Then write the loop with key = (x, y) or key = frozenset(...) and d[key] = d.get(key, 0) + 1 (or setdefault for lists).
Practice Problems
- LeetCode 49: Group Anagrams (key = tuple(sorted(s)) or frozenset(Counter(s).items())).
- Problems that count pairs (i, j) or state (i, j): use (i, j) or flatten to i*n+j as key.
- DP or memoization with 2D state: dict key = (i, j) or flattened index.
Summary
- Hashable: Implement __hash__ and __eq__; equal objects must have the same hash; keep key state immutable.
- Composite keys: Use tuple for ordered pairs/tuples (e.g. (i, j), (a, b)); use frozenset for set-like keys where order doesn't matter.
- Custom class: __hash__ and __eq__ based on the same fields; hash((self.x, self.y)) is a simple pattern. For 2D indices, flatten with i * cols + j if you need a single integer key.
Section 11: Trees
This section covers trees from fundamentals to advanced structures and techniques. You will learn terminology (root, depth, height, subtree), traversals (inorder, preorder, postorder, level-order), height and diameter, balanced trees, LCA, path sum problems, serialization, and then BST, AVL, Red-Black trees, Trie, Segment Tree, Fenwick Tree, Sparse Table, Binary Lifting, Euler Tour, Heavy-Light Decomposition, and Centroid Decomposition. Master these to handle tree problems in interviews and contests.
11.1 Tree Terminology
Introduction
A tree is a hierarchical data structure consisting of nodes connected by edges, with no cycles and exactly one path between any two nodes. One node is designated the root; every other node has exactly one parent and zero or more children. Trees are fundamental in DSA: binary trees, BSTs, heaps, tries, and segment trees all build on this structure. To read problems and implement solutions correctly, you must be precise about terms like root, leaf, height, depth, ancestor, and subtree. This section defines all essential tree terminology with examples and ties it to how we represent trees in code.
Mental Model
Picture an upside-down tree: the root at the top, branches (edges) going down to children, and leaves at the bottom. "Depth" is how far down you are from the root (steps from the top). "Height" is how far down the longest branch goes from a given node (steps to the lowest leaf). Recursion on trees almost always says: "Do something at this node, then do the same thing for the left and right subtrees"—so the subtree is your unit of thinking.
What Is a Tree (Formal)
- Tree: An undirected, connected, acyclic graph. So: (1) there is a path between any two nodes, (2) there are no cycles.
- Rooted tree: A tree with one node chosen as the root. All edges are then thought of as directed "away from the root" (parent → child).
- Node: An element of the tree that holds a value (and possibly left/right pointers in a binary tree).
- Edge: A connection between two nodes (parent–child). In a rooted tree we say the edge goes from parent to child.
Real-World Analogy
Think of a company org chart: the CEO is the root; each person has one manager (parent) and possibly several reports (children). The hierarchy has no loops (no one reports to themselves through a chain), and from any person you can trace a single path up to the CEO. "Depth" is how many levels below the CEO; "height" of a person is the longest chain of reports below them.
Tree with root 1, left child 2, right child 3; node 2 has left 4 and right 5. So nodes 4 and 5 are leaves; 2 and 3 are internal. The path from root to 5 is 1 → 2 → 5. Depth of 5 is 2 (if root has depth 0). Height of node 2 is 1; height of the tree (root) is 2.
Core Terminology (All Points)
1. Root
The root is the topmost node; it has no parent. All other nodes are descendants of the root. In code we usually hold a reference to the root and traverse from there.
2. Parent and Child
For an edge (u, v) in the rooted tree, if the edge is directed from u to v, then u is the parent of v, and v is a child of u. Every node has at most one parent; the root has none.
3. Sibling
Nodes that share the same parent are siblings. In a binary tree, the left and right children of a node are siblings.
4. Leaf (External Node)
A leaf is a node with no children. Leaves are the "bottom" of the tree. In a binary tree, a leaf has both left and right as null/None.
5. Internal Node
An internal node is any node that has at least one child (i.e. not a leaf).
6. Edge
An edge is a link between a parent and a child. A tree with n nodes has exactly n − 1 edges (this follows from connected + acyclic).
7. Path
A path between two nodes is the sequence of edges connecting them. In a tree there is exactly one path between any two nodes. The path length is the number of edges (or the number of nodes minus one on that path).
8. Depth (of a node)
The depth of a node is the number of edges on the path from the root to that node. The root has depth 0. Its children have depth 1, and so on. Depth is "distance from root."
9. Level
Level is sometimes used like depth: level 0 = root, level 1 = nodes at depth 1, etc. In some contexts "level" is 1-based (root at level 1); clarify in problems. Here we use depth (0-based from root) consistently.
10. Height (of a node)
The height of a node is the number of edges on the longest path from that node down to a leaf. Leaves have height 0. The height of a node is the max height of its children plus one (or 0 if no children).
11. Height of the tree
The height of the tree is the height of the root. Equivalently, it is the maximum depth of any node (since the deepest leaf is at depth = tree height). Empty tree (no root) is often defined to have height −1 so that a single-node tree has height 0.
12. Degree (of a node)
The degree of a node is the number of children it has. In a binary tree, each node has degree 0, 1, or 2 (no children, one child, or both a left and a right child).
13. Subtree
For any node u, the subtree rooted at u is the node u together with all its descendants and the edges between them. It is itself a tree with root u. Recursive algorithms often "process the subtree at u" by processing u and then recursively processing its left and right subtrees.
14. Ancestor and Descendant
If there is a path from node u down to node v (following parent-to-child edges), then u is an ancestor of v, and v is a descendant of u. The root is an ancestor of every node. A node is an ancestor of itself (and a descendant of itself) in many definitions; in others "proper" ancestor excludes the node itself. Be consistent: in LCA problems, a node is usually considered its own ancestor.
15. Binary tree
A binary tree is a rooted tree in which each node has at most two children, typically called left and right. This is the most common tree type in interviews. A full binary tree has every node with 0 or 2 children; a complete binary tree has every level full except possibly the last, which is filled from left to right (the shape used for heaps).
Diagram (ASCII)
        1          (root; depth 0, height 2)
       / \
      2   3        (depth 1; 2 has height 1, 3 has height 0)
     / \
    4   5          (leaves; depth 2, height 0)
Edges: (1,2), (1,3), (2,4), (2,5). 5 nodes, 4 edges.
Path from 1 to 5: 1 → 2 → 5 (length 2).
Subtree at 2: nodes {2, 4, 5}. Ancestors of 5: 5, 2, 1.
Representation in Code (Python)
We represent a binary tree node with a class that has a value and optional left and right children. The tree is referenced by its root.
class TreeNode:
    def __init__(self, val=0, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right
# Example: build the tree in the diagram
root = TreeNode(1)
root.left = TreeNode(2)
root.right = TreeNode(3)
root.left.left = TreeNode(4)
root.left.right = TreeNode(5)
# Check leaf: node is leaf iff (node.left is None and node.right is None)
def is_leaf(node):
    # Compare against None explicitly so the result is always a bool
    return node is not None and node.left is None and node.right is None
Key Formulas and Facts
- n nodes ⇒ n − 1 edges.
- Height h (max depth) ⇒ at most 2^(h+1) − 1 nodes in a binary tree (reached when every level is full, i.e. a perfect tree). At least h+1 nodes (chain).
- Depth of node = number of edges from root to node. Height of node = longest path from node to a leaf (edges).
- For the root, depth = 0 and height = tree height. For a leaf, height = 0.
Edge Cases and Conventions
- Empty tree (root is None): No nodes, no edges. Height is conventionally −1 so that a single-node tree has height 0 (0 = 1 + max(−1, −1)).
- Single node: One node, zero edges. Depth 0, height 0. It is both root and leaf.
- Skew tree (chain): Every node has at most one child. Height = n − 1 (n nodes). This is the worst case for height when doing recursion (O(n) stack depth).
- Level 0 vs level 1: Some definitions put the root at "level 1." In this course we use depth 0 for the root; always check the problem statement.
Why This Topic Matters
- Problem statements use "depth", "height", "subtree", "ancestor" precisely. Misreading leads to wrong solutions (e.g. confusing depth and height).
- Recursion: "Process the subtree at node u" means process u and recurse on left and right. Base case is often "if node is None" or "if node is leaf."
- Interviews: You may be asked "what is the height of this tree?" or "how many edges are there?"—answering correctly shows you know the definitions.
Common Mistakes
- Confusing depth and height: Depth is "distance from root" (root has depth 0). Height is "longest path from this node down to a leaf" (leaves have height 0). The height of the tree is the height of the root, which equals the maximum depth of any node.
- Empty tree: If the root is None, the tree has no nodes. Convention: height of empty tree = −1 so that height(single node) = 0.
- Level vs depth: Some problems use "level" 1-based. Always confirm: "Is the root at depth 0 or 1?"
When implementing "height of tree", do: if root is None return −1; else return 1 + max(height(root.left), height(root.right)). When implementing "depth", pass current depth as a parameter and increment when going to children.
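The two recipes above can be sketched as follows. The function names `height` and `depths` are illustrative choices, and `depths` assumes node values are distinct so they can serve as dictionary keys.

```python
class TreeNode:
    def __init__(self, val=0, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right


def height(node):
    """Edges on the longest downward path; -1 for an empty tree."""
    if node is None:
        return -1
    return 1 + max(height(node.left), height(node.right))


def depths(node, depth=0, out=None):
    """Map each node value to its depth (assumes distinct values)."""
    if out is None:
        out = {}
    if node is not None:
        out[node.val] = depth
        depths(node.left, depth + 1, out)   # children are one level deeper
        depths(node.right, depth + 1, out)
    return out


# Tree from the diagram: 1 -> (2 -> (4, 5), 3)
root = TreeNode(1, TreeNode(2, TreeNode(4), TreeNode(5)), TreeNode(3))
print(height(root))       # 2
print(height(root.left))  # 1
print(depths(root))       # {1: 0, 2: 1, 4: 2, 5: 2, 3: 1}
```

Height flows upward from the children (a postorder computation), while depth flows downward from the parent as a parameter.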
If asked to define terms, say: "The root has no parent. Depth is distance from root; height is the longest path from a node down to a leaf. A leaf has no children. The tree height is the root's height. A subtree at a node includes that node and all descendants." Then you can implement height/depth correctly in code.
Practice Problems
- Any tree problem that uses depth, height, or subtree (e.g. max depth, min depth, count nodes).
- LeetCode 104: Maximum Depth of Binary Tree; LeetCode 111: Minimum Depth of Binary Tree.
Summary
- Tree: Connected, acyclic; n nodes, n−1 edges. Root: top node; parent/child: one edge direction; leaf: no children; internal: has at least one child.
- Depth: edges from root to node (root depth 0). Height (of node): longest path from node to leaf (leaf height 0). Tree height = root height = max depth.
- Subtree at u: u and all descendants. Ancestor/descendant: path from u down to v. Binary tree: at most two children per node (left, right). Code: TreeNode(val, left, right); empty tree height = −1.
11.2 Binary Tree Traversals
Introduction
Traversal means visiting every node of a binary tree in a well-defined order. The four main traversals are: inorder (left → root → right), preorder (root → left → right), postorder (left → right → root), and level-order (BFS, level by level). Recursive implementations are simple and follow the same pattern: handle the base case (null), then recurse on left and right with the "visit" step placed before, between, or after the two recursions. Traversals are the basis for many tree problems (serialization, BST validation, path sum, etc.).
Real-World Analogy
Imagine exploring a branching maze. Preorder: you mark the room (visit) as soon as you enter, then explore left branch, then right. Inorder: explore left first, then mark the room, then explore right—like reading a tree from "left to right" on the page. Postorder: explore left and right first, then mark the room when you're leaving—useful when you need info from children before processing the node. Level-order: visit all rooms on the first floor, then the second, then the third.
Tree: root 1, left 2, right 3; node 2 has left 4, right 5. Preorder: 1, 2, 4, 5, 3. Inorder: 4, 2, 5, 1, 3. Postorder: 4, 5, 2, 3, 1. Level-order: 1, 2, 3, 4, 5.
Diagram: Tree and Traversal Order
        1          (root)
       / \
      2   3
     / \
    4   5
Preorder (Root → Left → Right): 1 → 2 → 4 → 5 → 3 (visit as you enter node)
Inorder (Left → Root → Right): 4 → 2 → 5 → 1 → 3 (visit between left and right)
Postorder (Left → Right → Root): 4 → 5 → 2 → 3 → 1 (visit as you leave node)
Level-order (BFS): 1 → 2 → 3 → 4 → 5 (by levels: 1; then 2,3; then 4,5)
Recursive Traversals
Base case: if root is None, return (or return []). Otherwise, the order of "visit root" relative to "recurse left" and "recurse right" defines the traversal.
Preorder (Root → Left → Right)
Visit the node first, then left subtree, then right subtree. Used for copying trees, prefix notation, or when you need the root before children.
def preorder(root):
    if not root:
        return
    print(root.val)  # visit
    preorder(root.left)
    preorder(root.right)
# Return list instead of print
def preorder_list(root):
    if not root:
        return []
    return [root.val] + preorder_list(root.left) + preorder_list(root.right)
Inorder (Left → Root → Right)
Recurse left, visit node, recurse right. In a BST, inorder gives values in sorted order. Used for BST validation and "sorted" output.
def inorder(root):
    if not root:
        return
    inorder(root.left)
    print(root.val)  # visit
    inorder(root.right)
def inorder_list(root):
    if not root:
        return []
    return inorder_list(root.left) + [root.val] + inorder_list(root.right)
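The "inorder of a BST is sorted" property gives a simple validity check. The sketch below is one way to use it, assuming a strict BST (no duplicate values); `inorder_iter` and `is_bst` are hypothetical names, and the generator form is a stylistic choice rather than the approach used above.

```python
class TreeNode:
    def __init__(self, val=0, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right


def inorder_iter(node):
    """Yield values in inorder: left subtree, node, right subtree."""
    if node is not None:
        yield from inorder_iter(node.left)
        yield node.val
        yield from inorder_iter(node.right)


def is_bst(root):
    """True if the inorder sequence is strictly increasing."""
    prev = None
    for v in inorder_iter(root):
        if prev is not None and v <= prev:
            return False
        prev = v
    return True


bst = TreeNode(4, TreeNode(2, TreeNode(1), TreeNode(3)), TreeNode(6))
not_bst = TreeNode(4, TreeNode(2, None, TreeNode(5)), TreeNode(6))
print(list(inorder_iter(bst)))  # [1, 2, 3, 4, 6]
print(is_bst(bst))              # True
print(is_bst(not_bst))          # False: inorder is 2, 5, 4, 6
```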
Postorder (Left → Right → Root)
Recurse left, recurse right, then visit. Used when you need children processed first (e.g. height, diameter, deleting a tree).
def postorder(root):
    if not root:
        return
    postorder(root.left)
    postorder(root.right)
    print(root.val)  # visit
def postorder_list(root):
    if not root:
        return []
    return postorder_list(root.left) + postorder_list(root.right) + [root.val]
Level-Order (BFS)
Visit nodes level by level: first the root, then all nodes at depth 1, then depth 2, etc. Use a queue: enqueue root; while queue not empty, dequeue a node, visit it, enqueue its left and right (if non-null).
from collections import deque
def level_order(root):
    if not root:
        return []
    q = deque([root])
    result = []
    while q:
        node = q.popleft()
        result.append(node.val)
        if node.left:
            q.append(node.left)
        if node.right:
            q.append(node.right)
    return result
# Level-order as list of levels (each level is a list)
def level_order_by_levels(root):
    if not root:
        return []
    result = []
    q = deque([root])
    while q:
        level = []
        for _ in range(len(q)):  # process exactly the nodes of the current level
            node = q.popleft()
            level.append(node.val)
            if node.left:
                q.append(node.left)
            if node.right:
                q.append(node.right)
        result.append(level)
    return result  # e.g. [[1], [2, 3], [4, 5]]
Summary of Order
| Traversal | Order | Typical use |
|---|---|---|
| Preorder | Root → Left → Right | Copy tree, prefix expr, serialize |
| Inorder | Left → Root → Right | BST sorted order, validate BST |
| Postorder | Left → Right → Root | Height, diameter, delete tree |
| Level-order | BFS by level | Print by level, min depth, BFS problems |
Time and Space Complexity
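- Time: O(n); each node is visited once. (The list-returning versions build new lists with + at every node, which adds copying cost of up to O(n·h); appending to one shared list keeps the total at O(n).)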
- Space (recursive): O(h) call stack, where h = tree height. Worst O(n) for a skew tree.
- Space (level-order queue): O(w) where w is max level width; worst O(n).
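The list-returning recursive versions concatenate lists with +, which copies elements at every level of the recursion. A minimal sketch of a variant that appends to one shared list instead is shown here for preorder; the name `preorder_collect` is an assumption for this example, and the same pattern applies to inorder and postorder by moving the `append` line.

```python
class TreeNode:
    def __init__(self, val=0, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right


def preorder_collect(root):
    """Preorder traversal that appends to a single result list."""
    result = []

    def visit(node):
        if node is None:
            return
        result.append(node.val)  # visit: one O(1) append per node
        visit(node.left)
        visit(node.right)

    visit(root)
    return result


root = TreeNode(1, TreeNode(2, TreeNode(4), TreeNode(5)), TreeNode(3))
print(preorder_collect(root))  # [1, 2, 4, 5, 3]
```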
Why This Topic Matters
- Most tree problems involve a traversal (or a variant). Knowing the four orders and when to use each is essential.
- BST: inorder gives sorted order; a BST with distinct keys can be rebuilt from its preorder sequence alone, because the ordering property fixes where each value must go.
- Iterative versions (next topic) avoid stack overflow for very deep trees and are sometimes required.
Edge Cases
- Empty tree (root is None): Return [] or skip; do not visit.
- Single node: All four traversals output that one value.
- Skew tree: Recursive traversals still O(n) time but O(n) stack depth; iterative avoids stack overflow.
Common Mistakes
- Swapping left and right. Inorder is left → root → right; reversing to right → root → left gives reverse sorted order in a BST, not sorted.
- Level-order: forgetting to check null children. Enqueue left/right only if they are non-null; otherwise a None is dequeued later and node.val raises AttributeError.
"I'll use recursive traversals: preorder visit then left then right; inorder left then visit then right (BST gives sorted); postorder left then right then visit. Level-order is BFS with a queue. All O(n) time; recursion uses O(h) stack. For iterative I'd use an explicit stack (next topic)."
Practice Problems
- LeetCode 94: Binary Tree Inorder Traversal; 144: Preorder; 145: Postorder; 102: Level Order.
- LeetCode 98: Validate BST (inorder gives sorted order).
Remember: Pre = root first; In = root in the middle (left, root, right); Post = root last. Level-order = BFS with a queue.
Summary
- Preorder: root, left, right. Inorder: left, root, right (BST → sorted). Postorder: left, right, root.
- Level-order: BFS with a queue; can return flat list or list of levels.
- Recursive: base case null; place "visit" before, between, or after the two recursions. Time O(n), space O(h) for recursion.
11.3 Iterative Traversal
Introduction
Iterative traversal implements inorder, preorder, and postorder using an explicit stack instead of the call stack. This avoids stack overflow on very deep trees and gives you full control over the visit order. Preorder iterative is straightforward (push root; pop, visit, push right then left). Inorder requires "go left until null, then pop/visit and go right." Postorder can be done with two stacks or by doing a "reverse preorder" (root, right, left) and reversing the result. Level-order uses a queue (already iterative). This section gives the standard iterative implementations for preorder, inorder, and postorder.
Why Iterative?
- Stack overflow: Recursion uses the call stack; a very tall tree can cause stack overflow. Iterative uses an explicit stack in heap memory.
- No recursion: Some environments or style guides prefer iterative code. Iterative also makes it easier to pause/resume or process in chunks.
- Same complexity: Time O(n), space O(h) for the stack—same as recursion, but the stack is under your control.
Preorder (Iterative)
Order: root → left → right. Use a stack. Push the root. While the stack is not empty: pop a node, visit it, then push its right child (if any), then its left child (if any). Pushing right before left ensures we process left before right when we pop.
Diagram: Stack During Preorder
Tree:   1
       / \
      2   3
Preorder: 1, 2, 3
Step 1: stack=[1] → pop 1, visit 1, push 3 then 2 → stack=[3,2]
Step 2: stack=[3,2] → pop 2, visit 2, push nothing → stack=[3]
Step 3: stack=[3] → pop 3, visit 3, push nothing → stack=[]
Done. Output: 1, 2, 3
def preorder_iterative(root):
    if not root:
        return []
    stack = [root]
    result = []
    while stack:
        node = stack.pop()
        result.append(node.val)
        if node.right:
            stack.append(node.right)
        if node.left:
            stack.append(node.left)
    return result
Inorder (Iterative)
Order: left → root → right. Idea: go as far left as possible, pushing nodes onto the stack. When you hit null, pop (that's the next node to visit), visit it, then set current to its right and repeat (go left from there).
def inorder_iterative(root):
    result = []
    stack = []
    curr = root
    while curr or stack:
        while curr:
            stack.append(curr)
            curr = curr.left
        curr = stack.pop()
        result.append(curr.val)
        curr = curr.right
    return result
How it works: The inner while curr pushes all left descendants. When curr becomes None, we've reached the leftmost node; we pop it (visit), then move to its right subtree and repeat. This mimics "recurse left, visit, recurse right" without recursion.
Postorder (Iterative)
Order: left → right → root. One clean approach: do a modified preorder that visits root, then right, then left. The resulting sequence is the reverse of postorder. So run that and reverse the result.
def postorder_iterative(root):
    if not root:
        return []
    stack = [root]
    result = []
    while stack:
        node = stack.pop()
        result.append(node.val)
        if node.left:
            stack.append(node.left)
        if node.right:
            stack.append(node.right)
    return result[::-1]  # reverse: now left, right, root
Alternatively, use two stacks: push root to stack1; while stack1 not empty, pop to stack2 and push left then right to stack1. Then pop everything from stack2—that's postorder. Or use a single stack with a "last visited" pointer to avoid revisiting; the reverse-preorder trick is usually simpler to remember.
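Both alternatives can be sketched as follows. The function names are hypothetical, and the single-stack version is one common way to realize the "last visited" idea, not necessarily the exact variant the text has in mind.

```python
class TreeNode:
    def __init__(self, val=0, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right


def postorder_two_stacks(root):
    """stack1 drives a root-right-left walk; stack2 reverses it."""
    if root is None:
        return []
    stack1, stack2 = [root], []
    while stack1:
        node = stack1.pop()
        stack2.append(node)
        if node.left:
            stack1.append(node.left)
        if node.right:
            stack1.append(node.right)
    return [node.val for node in reversed(stack2)]


def postorder_one_stack(root):
    """Single stack plus a pointer to the most recently visited node."""
    result, stack = [], []
    curr, last = root, None
    while curr or stack:
        if curr:
            stack.append(curr)  # descend left as far as possible
            curr = curr.left
        else:
            top = stack[-1]
            if top.right and top.right is not last:
                curr = top.right  # right subtree not processed yet
            else:
                result.append(top.val)  # both subtrees done: visit
                last = stack.pop()
    return result


root = TreeNode(1, TreeNode(2, TreeNode(4), TreeNode(5)), TreeNode(3))
print(postorder_two_stacks(root))  # [4, 5, 2, 3, 1]
print(postorder_one_stack(root))   # [4, 5, 2, 3, 1]
```

The single-stack version emits values in postorder directly, so it needs no final reversal and can stream results as it goes.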
Summary Table
| Traversal | Iterative idea |
|---|---|
| Preorder | Stack: pop, visit, push right, push left |
| Inorder | Go left pushing nodes; pop & visit; go to right |
| Postorder | Preorder root→right→left, then reverse |
Time and Space Complexity
- Time: O(n) — each node pushed and popped once.
- Space: O(h) for the stack, h = tree height. Worst O(n) for a skew tree.
Common Mistakes
- Preorder: pushing left before right. Then the left child would be popped after the right. We want to visit left first, so push right then left (so left is on top of the stack).
- Inorder: forgetting to go right after pop. After visiting the node, set curr = curr.right to process the right subtree (if it is null, the next iteration pops the next node from the stack).
- Postorder: wrong reverse. The trick is "preorder but root → right → left"; then reverse. If you reverse a normal preorder (root → left → right), you get right → left → root, which is the postorder of the mirrored tree, not postorder.
If asked "traverse without recursion," say: "I'll use an explicit stack. Preorder: push root, then while stack not empty pop, visit, push right then left. Inorder: go left pushing nodes until null, then pop and visit, then go right. Postorder: modified preorder root→right→left then reverse." Mention that space is still O(h) and time O(n).
Practice Problems
- LeetCode 94: Binary Tree Inorder Traversal (iterative).
- LeetCode 144: Binary Tree Preorder Traversal (iterative).
- LeetCode 145: Binary Tree Postorder Traversal (iterative).
Preorder iterative: push right before left so left is popped first. Postorder: do "preorder" but push left then right, then reverse the output. Inorder: "go left with stack, pop and visit, then go right."
Summary
- Preorder iterative: stack with root; pop → visit → push right, then left.
- Inorder iterative: while curr or stack: go left pushing nodes; pop (visit); curr = curr.right.
- Postorder iterative: preorder but push left then right; reverse result. All O(n) time, O(h) space.
11.4 Height & Diameter
Introduction
The height of a tree (or a node) is the number of edges on the longest path from that node down to a leaf. The diameter (or width) of a tree is the length of the longest path between any two nodes (measured in edges). Both are computed naturally with a postorder traversal: you need information from the children (their heights) before you can compute the current node's height or the best path through the current node. Height is a building block for balance checks and many other tree problems; diameter is a classic interview question (LeetCode 543). This section gives recursive definitions, code, and examples.
Formal Definitions
Height of a node: height(node) = 0 if the node is a leaf (no children). Otherwise height(node) = 1 + max(height(left_child), height(right_child)). For a null node we define height(null) = -1 so that a single-node tree has height 0. Height of the tree = height of the root.
Diameter: The diameter is the maximum over all pairs of nodes (u, v) of the length (in edges) of the unique path between u and v. Equivalently, it is the maximum over all nodes of the length of the longest path that passes through that node. For a node with left height L and right height R (in edges), the longest path through that node has length L + R + 2 (two edges from the node to the two children, then L and R edges down). So we can compute diameter by considering at each node the candidate through = L_h + R_h + 2 and taking the max over all nodes and over the diameters of the left and right subtrees.
Height (Recursive Definition)
Height of a node: Longest path (in edges) from that node to a leaf. Leaves have height 0. For an internal node, height = 1 + max(height(left_child), height(right_child)). Height of the tree = height of the root. Convention: height of an empty tree (null) = −1, so a single-node tree has height 0.
def height(root):
    if root is None:
        return -1  # empty tree: -1, so a leaf gets height 0
    return 1 + max(height(root.left), height(root.right))
Tree: 1 (root) with left 2 and right 3; node 2 has left 4 and right 5. Leaves 3, 4, 5 have height 0. Node 2 has height 1 + max(0, 0) = 1. Node 1 has height 1 + max(1, 0) = 2. So tree height = 2.
Diameter (Longest Path Between Any Two Nodes)
The diameter is the maximum number of edges on any path between two nodes. That path may or may not pass through the root. For each node, the longest path whose highest point is that node runs from the deepest leaf of the left subtree up to the node and down to the deepest leaf of the right subtree. Each leg has (child height + 1) edges, so with edge-based heights the candidate "through this node" is left_height + right_height + 2. We compute this candidate at every node and take the maximum; equivalently, at each node we compare the candidate with the best diameter found entirely inside the left or right subtree.
Recursive approach: Return both (height, diameter) for the subtree. Base case: null → height = −1, diameter = 0. At a node: get (L_height, L_diam) and (R_height, R_diam). Current height = 1 + max(L_height, R_height). L_height counts the edges from the left child down to its deepest leaf, so the longest downward path from the current node into the left subtree has L_height + 1 edges (one edge to the left child, then L_height more); likewise R_height + 1 on the right. The path through the node therefore has (L_height + 1) + (R_height + 1) = L_height + R_height + 2 edges, and diameter at this node = max(L_diam, R_diam, L_height + R_height + 2). A missing child contributes −1 + 1 = 0 edges, so the formula needs no special cases.
Many sources define "height" as the number of nodes on the path (so null = 0 and leaf = 1). With that convention the path through a node has L_height + R_height edges, or L_height + R_height + 1 nodes. The most common convention in coding problems is edges: height(null) = -1, height(leaf) = 0. Then:
- Path through node (edges) = (L_height + 1) + (R_height + 1) = L_height + R_height + 2.
Some problems define diameter as the number of nodes on the longest path. Then path through node = 1 + L_height + R_height (current node + left path + right path in "node count" height). Always clarify "edges" vs "nodes" in the problem.
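To make the node-counting convention concrete, here is a small sketch in which height counts nodes (null = 0, leaf = 1) and the diameter is reported in nodes. The `TreeNode` class and the name `diameter_in_nodes` are assumptions made for this example.

```python
class TreeNode:
    """Minimal binary tree node, assumed here for illustration."""
    def __init__(self, val, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right


def diameter_in_nodes(root):
    best = 0  # largest node count seen on any path so far

    def node_height(node):
        nonlocal best
        if node is None:
            return 0  # node-count convention: empty subtree has height 0
        left = node_height(node.left)
        right = node_height(node.right)
        best = max(best, 1 + left + right)  # current node + both legs
        return 1 + max(left, right)

    node_height(root)
    return best


# Sample tree: 1 with children 2 and 3; node 2 has children 4 and 5.
sample = TreeNode(1, TreeNode(2, TreeNode(4), TreeNode(5)), TreeNode(3))
print(diameter_in_nodes(sample))  # 4 (nodes 4, 2, 1, 3)
```

For any non-empty tree the node-based answer is the edge-based answer plus one, so the sample's 4 nodes correspond to a diameter of 3 edges.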
Implementation: Diameter (Edge-Based)
def diameter_of_binary_tree(root):
    def dfs(node):
        if node is None:
            return -1, 0  # height, diameter
        L_h, L_d = dfs(node.left)
        R_h, R_d = dfs(node.right)
        height = 1 + max(L_h, R_h)
        through = L_h + R_h + 2  # edges through this node
        diam = max(L_d, R_d, through)
        return height, diam
    if root is None:
        return 0
    _, d = dfs(root)
    return d
Same tree: 1 (root), left 2 (with children 4, 5), right 3. Heights: 4, 5, 3 → 0; 2 → 1; 1 → 2. Candidate paths: 4 → 2 → 5 has 2 edges; 4 → 2 → 1 → 3 (or 5 → 2 → 1 → 3) has 3 edges. So diameter = 3. In the code: at node 1, L_h=1, R_h=0, through = 1+0+2 = 3; at node 2, through = 0+0+2 = 2. So max diameter = 3.
Diagram: Diameter "Through" a Node
      1
     / \
    2   3      At node 1: left height L_h=1 (path 2→4 or 2→5), right height R_h=0 (node 3).
   / \         Path THROUGH node 1 = (edge 1→2) + (edge 1→3) + (path in left) + (path in right)
  4   5                            = 1 + 1 + L_h + R_h = 2 + 1 + 0 = 3 edges. So diameter ≥ 3.
Longest path in tree: 4 --- 2 --- 1 --- 3 (edges: 4-2, 2-1, 1-3) → length 3 ✓
Why This Topic Matters
- Height is used everywhere: balance factor (AVL), level-order, "minimum depth," and as a subroutine for diameter.
- Diameter is a classic problem (e.g. LeetCode 543). The pattern "postorder, combine left and right info" is reusable.
- Getting the base case and the "through node" formula right (edges vs nodes) is a common interview pitfall.
Time and Space Complexity
- Time: O(n) — each node visited once. We do a constant amount of work per node (compare, add, max).
- Space: O(h) for the call stack, h = tree height. Worst case O(n) for a skew tree.
Edge Cases
- Empty tree (root is None): Diameter 0; height −1. Return 0 for diameter from the wrapper if needed.
- Single node: Height 0, diameter 0 (no path of length > 0 between two nodes).
- Only one child: At a node with only a left child, right height = −1; path through = L_h + (−1) + 2 = L_h + 1. Still correct.
Common Mistakes
- Counting nodes vs edges: If the problem says "diameter = longest path" and measures in nodes, then at each node the path through it is 1 + L_height + R_height (if height is "number of nodes to leaf"). If it measures in edges, use L_height + R_height + 2 with edge-based height (null = −1). Check the problem statement.
- Wrong base case for height: Use −1 for null so that height(leaf) = 0. If you return 0 for null, then height(leaf) becomes 1 and all formulas shift.
- Forgetting to consider diameter inside subtrees: The longest path might not pass through the root. So diameter = max(left_diam, right_diam, through_current).
For diameter, use a helper that returns (height, diameter). Height = 1 + max(L_h, R_h). Diameter through node = L_h + R_h + 2 (edges). Overall diameter = max(L_d, R_d, through). Return (height, diameter) from the helper.
State clearly: "I'll use postorder so I have both children's heights. For each node I'll compute the longest path through it as left height plus right height plus two (for the two edges to the children). The diameter is the max of that over all nodes and the diameters in the subtrees." Then code the dfs returning (height, diameter).
Practice Problems
- LeetCode 543: Diameter of Binary Tree (edge-based diameter).
- LeetCode 110: Balanced Binary Tree (use height; check |left_h − right_h| ≤ 1 and recurse).
Summary
- Height: null → −1; else 1 + max(height(left), height(right)). Tree height = height(root).
- Diameter: longest path (in edges) between any two nodes. At each node, path through = L_height + R_height + 2; diameter = max(L_diam, R_diam, through).
- Use postorder; return (height, diameter) from helper. O(n) time, O(h) space. Clarify edges vs nodes in the problem.
11.5 Balanced Trees
Introduction
A balanced binary tree is one in which for every node, the heights of its left and right subtrees differ by at most 1. Formally: |height(left) − height(right)| ≤ 1, and both left and right subtrees are themselves balanced. A balanced tree has height O(log n), so operations that depend on height (e.g. search in a BST) stay efficient. Checking whether a tree is balanced is a classic problem (LeetCode 110): compute height recursively and at each node verify the balance condition; if any node is unbalanced, return false (or a sentinel like −1). This section gives the definition, the O(n) check algorithm, and code.
Definition
A binary tree is height-balanced if:
- For every node, |height(left_subtree) − height(right_subtree)| ≤ 1.
- Both the left and right subtrees are height-balanced (recursive definition).
An empty tree is balanced (by convention). A single node is balanced (height 0; both children have height −1, difference 0).
Diagram: Balanced vs Unbalanced
Balanced (|L_h - R_h| ≤ 1):            Unbalanced (at root: L_h=1, R_h=-1, diff=2):
      1                                      1
     / \                                    /
    2   3                                  2
   /                                        \
  4                                          3
(at 1: L_h=1, R_h=0, |1-0|=1 ✓)        (at 1: L_h=1, R_h=-1, |1-(-1)|=2 ✗)
Algorithm: Check Balance and Return Height
Use a helper that returns the height of the subtree if it is balanced, and a sentinel value (e.g. −1 or −∞) if any subtree is unbalanced. Then the tree is balanced iff the helper returns a non-sentinel at the root. This avoids computing height separately from balance and keeps one O(n) pass.
- Base case: null → return 0 (or −1 if you use edge-based height; then "unbalanced" sentinel can be −2 or use a tuple (is_balanced, height)).
- Recurse on left and right. If either returns the sentinel, return sentinel (unbalanced somewhere below).
- If |L_height − R_height| > 1, return sentinel.
- Otherwise return 1 + max(L_height, R_height).
Common convention: use edge-based height (null = −1). So sentinel = −2 or any value that means "invalid." Then balanced check: if helper returns −2, not balanced.
def is_balanced(root):
    def height_if_balanced(node):
        if node is None:
            return -1  # edge-based: null has height -1
        L = height_if_balanced(node.left)
        R = height_if_balanced(node.right)
        if L == -2 or R == -2:
            return -2  # already found unbalanced subtree
        if abs(L - R) > 1:
            return -2
        return 1 + max(L, R)
    return height_if_balanced(root) != -2
Balanced: Root with two children (both leaves). Left height 0, right height 0, difference 0. Root height 1. OK. Unbalanced: Root with only a left child; left child has only a left child (chain). At root: left height 1, right height −1, difference 2 > 1 → not balanced.
Alternative: Return (is_balanced, height)
Some prefer returning a tuple so the meaning is explicit:
def is_balanced_v2(root):
    def check(node):
        if node is None:
            return True, -1
        ok_left, h_left = check(node.left)
        ok_right, h_right = check(node.right)
        if not ok_left or not ok_right or abs(h_left - h_right) > 1:
            return False, 0  # height irrelevant
        return True, 1 + max(h_left, h_right)
    return check(root)[0]
Time and Space Complexity
- Time: O(n) — each node visited once; we don't recompute height for the same node.
- Space: O(h) for the call stack.
Why This Topic Matters
- AVL and Red-Black trees (later topics) maintain balance so that height stays O(log n) and operations remain O(log n).
- LeetCode 110: Balanced Binary Tree is a common interview question. The "return sentinel if unbalanced" trick is the standard O(n) solution.
Common Mistakes
- Computing height in a separate pass. Calling a standalone height() at every node costs up to O(n) per node, for a total of O(n²) on a skew tree. Use one helper that returns either height or a sentinel so each node is visited once.
- Wrong sentinel value. Use a value that can't be a valid height (e.g. −2 when null returns −1). Check if L == -2 or R == -2 before checking abs(L - R) > 1.
"I'll use a helper that returns the height if the subtree is balanced, and −2 (or similar) if not. For null I return −1. For a node I recurse left and right; if either is −2 or |L−R|>1 I return −2; else return 1+max(L,R). The tree is balanced iff the helper doesn't return the sentinel."
Practice Problems
- LeetCode 110: Balanced Binary Tree.
- LeetCode 108: Convert Sorted Array to BST (build a balanced BST from sorted array).
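For the sorted-array problem listed above, one common approach is to pick the middle element as the root and recurse on the two halves, so that the two sides never differ by more than one node. The sketch below assumes a minimal `TreeNode` class and uses the hypothetical name `sorted_array_to_bst`; it is one possible solution, not the only accepted one.

```python
class TreeNode:
    """Minimal binary tree node, assumed here for illustration."""
    def __init__(self, val, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right


def sorted_array_to_bst(values):
    """Build a height-balanced BST from a sorted list."""
    def build(lo, hi):
        if lo > hi:
            return None
        mid = (lo + hi) // 2  # middle element becomes the subtree root
        node = TreeNode(values[mid])
        node.left = build(lo, mid - 1)    # smaller keys go left
        node.right = build(mid + 1, hi)   # larger keys go right
        return node
    return build(0, len(values) - 1)


def height(node):
    """Edge-based height: empty tree is -1, leaf is 0."""
    return -1 if node is None else 1 + max(height(node.left), height(node.right))


root = sorted_array_to_bst([1, 2, 3, 4, 5, 6, 7])
print(root.val, height(root))  # 4 2
```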
Use one helper that returns height when balanced and a sentinel (e.g. −2) when not, so you only traverse once. Check abs(L - R) <= 1 and that both L and R are not the sentinel.
Summary
- Balanced: For every node, |height(left) − height(right)| ≤ 1, and both subtrees are balanced.
- Check: Helper returns height if balanced, sentinel (e.g. −2) if not. Tree balanced iff root returns non-sentinel. O(n) time, O(h) space.
11.6 Lowest Common Ancestor
Introduction
The Lowest Common Ancestor (LCA) of two nodes p and q in a binary tree is the deepest node that has both p and q as descendants (a node can be a descendant of itself). It appears in many problems: distance between two nodes, path queries, and tree-based structures. For a general binary tree, the standard approach is a single recursive pass: if the current node is null or equals p or q, return it; otherwise recurse on left and right; if both sides return non-null, the current node is the LCA; otherwise return whichever side is non-null. For a BST, we can use the key order to descend left or right. This section gives both the general-tree and BST solutions.
Definition
LCA(p, q): The node that is an ancestor of both p and q and has the greatest depth (is "lowest" in the tree). By convention, if one of p or q is an ancestor of the other, that node is the LCA (e.g. LCA(root, leaf) = root when the leaf is in the tree).
Recursive Solution (General Binary Tree)
Idea: At each node, if the node is None or equals p or q, return the node. Recurse on left and right. If both left and right return non-null, the current node is the LCA (p and q lie in different subtrees). If only one side is non-null, that side holds the LCA (or one of p, q); propagate it up.
def lowest_common_ancestor(root, p, q):
    if root is None or root is p or root is q:
        return root
    left = lowest_common_ancestor(root.left, p, q)
    right = lowest_common_ancestor(root.right, p, q)
    if left and right:
        return root  # p and q in different subtrees
    return left if left else right
Tree: 1 (root), left 2 (children 4, 5), right 3. LCA(4, 5) = 2 (both in left subtree of 1). LCA(4, 3) = 1 (4 in left, 3 in right). LCA(4, 2) = 2 (2 is ancestor of 4; return 2 when we hit 2).
Diagram: LCA Examples
      1
     / \
    2   3
   / \
  4   5
LCA(4, 5): Both in subtree of 2 → LCA = 2
LCA(4, 3): 4 in left subtree of 1, 3 in right → LCA = 1
LCA(4, 2): 2 is an ancestor of 4 → when we visit 2, we return 2 (root is p or q) → LCA = 2
LCA(1, 5): 1 is ancestor of 5 → LCA = 1
Why This Works
When we return a non-null value from a subtree, it means "this subtree contains at least one of p or q (or their LCA)." If both left and right return non-null, p and q must be in different subtrees, so the current node is the LCA. If only one side is non-null, the LCA is in that subtree (or the returned value is p or q itself); we just pass it up.
BST: Use Key Order
In a BST, if both p.val and q.val are less than root.val, the LCA is in the left subtree. If both are greater, it's in the right subtree. Otherwise (one ≤ root.val ≤ the other, or root is one of p, q), the root is the LCA.
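def lowest_common_ancestor_bst(root, p, q):
    if p.val > q.val:
        p, q = q, p  # normalize so that p.val <= q.val
    while root:
        if root.val < p.val:
            root = root.right  # both keys are larger
        elif root.val > q.val:
            root = root.left  # both keys are smaller
        else:
            return root  # p.val <= root.val <= q.val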
Time and Space Complexity
- General tree: O(n) time, O(h) space (call stack).
- BST (iterative): O(h) time, O(1) space.
Common Mistakes
- Assuming both p and q exist in the tree. The classic problem assumes they exist. If they might not, you need to verify both are found (e.g. count how many of p, q were seen in the subtree) and return None if only one is present.
- BST: not normalizing p and q. Ensure p.val ≤ q.val (swap if needed) so the condition "root between p and q" is simply p.val ≤ root.val ≤ q.val.
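When p or q might be missing from the tree, one way to handle it is to traverse the whole tree in postorder, count how many of the two targets were actually seen, and only trust the candidate if the count is 2. The sketch below assumes p and q are distinct node objects; `TreeNode` and the name `lca_if_both_present` are hypothetical and introduced only for this example.

```python
class TreeNode:
    """Minimal binary tree node, assumed here for illustration."""
    def __init__(self, val, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right


def lca_if_both_present(root, p, q):
    found = 0  # how many of p, q were encountered in the tree

    def dfs(node):
        nonlocal found
        if node is None:
            return None
        # Visit both children first (no early return), so every target is counted.
        left = dfs(node.left)
        right = dfs(node.right)
        if node is p or node is q:
            found += 1
            return node
        if left and right:
            return node  # p and q sit in different subtrees
        return left or right

    candidate = dfs(root)
    return candidate if found == 2 else None


n4, n5, n3 = TreeNode(4), TreeNode(5), TreeNode(3)
n2 = TreeNode(2, n4, n5)
root = TreeNode(1, n2, n3)
outsider = TreeNode(99)  # never attached to the tree

print(lca_if_both_present(root, n4, n5).val)    # 2
print(lca_if_both_present(root, n4, outsider))  # None
```

The trade-off is that this version always visits every node, whereas the classic version can stop descending as soon as it meets p or q.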
"For a general tree I'll recurse: if root is None or p or q I return root. Then I get left and right results. If both are non-null, root is the LCA. Otherwise I return whichever is non-null. For BST I'll iterate: go left if root.val > q.val, right if root.val < p.val; when root is between p and q (or equals one), that's the LCA."
Practice Problems
- LeetCode 236: Lowest Common Ancestor of a Binary Tree.
- LeetCode 235: Lowest Common Ancestor of a BST.
- Distance between two nodes: dist = depth(p) + depth(q) − 2*depth(LCA).
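The distance formula above can be checked with a short sketch. It assumes both nodes are present in the tree and reuses the recursive LCA from this section; `TreeNode`, `depth`, and `distance` are names introduced here for illustration.

```python
class TreeNode:
    """Minimal binary tree node, assumed here for illustration."""
    def __init__(self, val, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right


def lowest_common_ancestor(root, p, q):
    if root is None or root is p or root is q:
        return root
    left = lowest_common_ancestor(root.left, p, q)
    right = lowest_common_ancestor(root.right, p, q)
    if left and right:
        return root
    return left if left else right


def depth(root, target, d=0):
    """Number of edges from root down to target, or -1 if target is absent."""
    if root is None:
        return -1
    if root is target:
        return d
    left = depth(root.left, target, d + 1)
    return left if left != -1 else depth(root.right, target, d + 1)


def distance(root, p, q):
    """Edges on the path between p and q (assumes both are in the tree)."""
    lca = lowest_common_ancestor(root, p, q)
    return depth(root, p) + depth(root, q) - 2 * depth(root, lca)


n4, n5, n3 = TreeNode(4), TreeNode(5), TreeNode(3)
n2 = TreeNode(2, n4, n5)
root = TreeNode(1, n2, n3)

print(distance(root, n4, n3))  # 3  (4 - 2 - 1 - 3)
print(distance(root, n4, n5))  # 2  (4 - 2 - 5)
```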
General tree: "if root in (None, p, q) return root; recurse left and right; if both non-null return root else return left or right." BST: descend left or right by comparing keys until root lies between p and q (inclusive).
Summary
- LCA = deepest node that has both p and q as descendants. Node is its own descendant.
- General tree: Recurse; return root if root in (None, p, q); else if both left and right non-null return root, else return the non-null side.
- BST: Iterative or recursive: go left if both keys < root, right if both > root; else root is LCA. O(h) for BST, O(n) for general.
11.7 Path Sum Problems
Introduction
Path sum problems ask whether (or how many, or which) paths in a binary tree have a given sum. Common variants: (1) Root-to-leaf: Does any path from root to a leaf sum to target? (2) Return all root-to-leaf paths that sum to target. (3) Path sum III: Count paths where the sum equals target—paths can start and end at any node (not necessarily root or leaf). The first two use simple recursion with a running sum; the third uses a prefix-sum + hash table idea similar to "subarray with given sum." This section covers all three patterns.
Variant 1: Has Path Sum (Root to Leaf)
Problem: Given root and targetSum, return true if there exists a root-to-leaf path such that the sum of node values equals targetSum. At each node, subtract the node's value from the remaining target; at a leaf, check if the remaining is 0. Recurse on left and right with the new target.
def has_path_sum(root, target_sum):
    if root is None:
        return False
    rem = target_sum - root.val
    if root.left is None and root.right is None:
        return rem == 0  # leaf: the path matches only if nothing remains
    return has_path_sum(root.left, rem) or has_path_sum(root.right, rem)
Tree: 5 (root), left 4 (left 11 with children 7, 2), right 8 (children 13 and 4; that 4 has right child 1). Target 22. Path 5 → 4 → 11 → 2 has sum 5+4+11+2 = 22. So return true.
Variant 2: All Paths With Sum (Root to Leaf)
Return a list of all root-to-leaf paths where the path sum equals targetSum. Use DFS: maintain current path (list) and running sum; at a leaf, if sum equals target, append a copy of the path to the result. Backtrack by popping after recursing.
def path_sum_ii(root, target_sum):
    result = []
    def dfs(node, rem, path):
        if node is None:
            return
        path.append(node.val)
        rem -= node.val
        if node.left is None and node.right is None and rem == 0:
            result.append(path[:])  # copy, because path keeps changing
        dfs(node.left, rem, path)
        dfs(node.right, rem, path)
        path.pop()  # backtrack
    if root:
        dfs(root, target_sum, [])
    return result
Diagram: Root-to-Leaf vs Any Path
Tree:        5
            / \
           4   8
          /   / \
        11   13  4
       /  \       \
      7    2       1
Root-to-leaf (Variant 1 & 2): Path must start at root and end at a leaf.
  Example: 5 → 4 → 11 → 2 (sum 22) ✓
Any path (Variant 3): Path can start and end at any node (parent → descendant).
  Example: 11 → 7 (sum 18), or 5 → 4 → 11 (sum 20). We use prefix sum on the
  current path: at node 11, running_sum = 5+4+11 = 20; if target=20, count += seen[0].
Variant 3: Count Paths With Sum (Any Start/End)
Paths can start and end at any node (parent to descendant). Same idea as "subarray with given sum": maintain a running sum from the root as we traverse, and at each node check how many times (running_sum - target) has been seen (prefix sums along the current path). Use a dict mapping prefix_sum → count; when going down, add current prefix to the dict; when going up, remove it (backtrack).
def path_sum_iii(root, target_sum):
    from collections import defaultdict
    count = defaultdict(int)  # prefix sum → occurrences on the current path
    count[0] = 1  # empty prefix, so paths starting at the root are counted
    def dfs(node, running):
        if node is None:
            return 0
        running += node.val
        ans = count.get(running - target_sum, 0)  # paths ending at this node
        count[running] += 1
        ans += dfs(node.left, running)
        ans += dfs(node.right, running)
        count[running] -= 1  # backtrack when leaving the node
        return ans
    return dfs(root, 0)
Why count[0] = 1? So when running_sum == target_sum, we have running_sum - target_sum == 0, and we count one path (from root to current node).
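The array analogue mentioned above ("subarray with given sum") may make the prefix-sum idea easier to see, because a single root-to-node path behaves exactly like an array. The function name `count_subarrays_with_sum` is an assumption made for this sketch.

```python
from collections import defaultdict


def count_subarrays_with_sum(nums, target):
    """Count contiguous subarrays whose elements add up to target."""
    seen = defaultdict(int)  # prefix sum -> number of times it has occurred
    seen[0] = 1              # empty prefix, so subarrays starting at index 0 count
    running = 0
    total = 0
    for x in nums:
        running += x
        # Every earlier prefix equal to running - target closes a matching subarray here.
        total += seen.get(running - target, 0)
        seen[running] += 1
    return total


print(count_subarrays_with_sum([1, 1, 1], 2))      # 2
print(count_subarrays_with_sum([10, 5, 3, 3], 8))  # 1  (the subarray 5, 3)
```

The tree version runs the same loop along each root-to-node path; the only addition is the decrement on the way back up, which removes prefixes that belong to a branch the traversal has already left.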
Edge Cases (Path Sum)
- Empty tree: hasPathSum(empty, anything) = false; path_sum_ii(empty, k) = []; path_sum_iii(empty, k) = 0.
- Single node: Root-to-leaf path is just the root; check if root.val == target (or rem == 0 after subtracting).
- Target is zero: A path with sum 0 exists if some root-to-leaf path sums to 0 (e.g. tree with negative values that cancel). For variant 3, count[0]=1 counts "path from root to current node" when running_sum == target.
- Negative values: All three variants work with negative node values; variant 3's prefix-sum approach is correct as long as we backtrack the map when leaving a node.
Pattern Recognition
Use root-to-leaf path sum when the problem says "path from root to leaf" or "root-to-leaf sum." Use prefix sum on the current path + hash table when paths can start and end at any node (variant 3). The key is: at each node, "how many paths ending here have sum target?" = count of prefix sums equal to (running_sum − target).
Time and Space Complexity
- Variant 1 & 2: O(n) time for the traversal. Space O(h) for recursion. Variant 2 also copies each matching path, which adds up to O(h) time and space per matching leaf.
- Variant 3: O(n) time, O(h) space for recursion and the prefix-sum map (at most h keys on current path).
Root-to-leaf: pass (remaining target) down; at leaf check rem == 0. For "all paths," backtrack with path.append/pop. For "any path" count: prefix sum on the current path + hash table; initialize count[0]=1 and backtrack the map when leaving the node.
Summary
- Root-to-leaf sum: Recurse with (target - node.val); at leaf return (rem == 0).
- All root-to-leaf paths: DFS with path list; at leaf if sum matches append path[:]; backtrack with pop.
- Count paths (any start/end): Prefix sum along path + dict; count[running - target]; backtrack count when leaving. O(n) time.
11.8 Serialize & Deserialize
Introduction
Serialize converts a binary tree into a string (e.g. for storage or transmission); deserialize reconstructs the tree from that string. The format must encode both values and structure. A common approach is preorder with a marker for null children (e.g. "null" or "#"): we can then reconstruct uniquely by reading tokens left to right and building the tree recursively. LeetCode 297 is the classic problem. This section covers preorder-based serialize/deserialize with a simple format.
Format: Preorder with Null Markers
Serialize: preorder traversal; for each node output its value and a separator; for null output a sentinel (e.g. "null"). Example: tree 1(2, 3) with 2 having left 4 → "1,2,4,null,null,null,3,null,null". Deserialize: split the string into a list of tokens; consume one token at a time; if it's the null marker, return None; otherwise create a node, then recursively build left and right (consuming tokens in preorder order).
Diagram: Serialize (Preorder + Null)
Tree:     1
         / \
        2   3
       /
      4
Preorder visit: 1 → 2 → 4 → (null) → (null) → (null) → 3 → (null) → (null)
Serialized: "1,2,4,null,null,null,3,null,null"
Reading back: first token 1 → root; 2 → left of 1; 4 → left of 2;
null → left of 4 is None; null → right of 4 is None; null → right of 2 is None;
3 → right of 1; null, null → children of 3. Tree restored.
class Codec:
    def serialize(self, root):
        if root is None:
            return "null"
        left = self.serialize(root.left)
        right = self.serialize(root.right)
        return str(root.val) + "," + left + "," + right
    def deserialize(self, data):
        tokens = data.split(",")
        self.i = 0  # index of the next unread token
        def build():
            if self.i >= len(tokens) or tokens[self.i] == "null":
                self.i += 1
                return None
            node = TreeNode(int(tokens[self.i]))
            self.i += 1
            node.left = build()
            node.right = build()
            return node
        return build()
Tree: 1 (root), left 2, right 3. Serialize: "1,2,null,null,3,null,null". Deserialize: first token 1 → root; next 2 → left child; next null → left of 2 is None; next null → right of 2 is None; next 3 → right of root; then null, null for 3's children. Tree is restored.
Why Preorder Works
Preorder (root, left, right) encodes the structure: when we read back, the first token is the root; the next tokens form the left subtree (until we've consumed the same "shape" as the left), then the right. Null markers tell us when a subtree ends, so we don't need extra delimiters.
Time and Space Complexity
- Serialize: O(n) time and space (string length O(n)).
- Deserialize: O(n) time (each token read once), O(n) space for the tree and token list.
Common Mistakes
- Forgetting null markers for missing children. If you only output values in preorder without "null," you cannot tell whether a value is a left child, right child, or root of a subtree when reading back. The nulls define the structure.
- Deserialize: not advancing the index. Each call to build() must consume exactly one token (either a value or "null"). Increment the index (or use an iterator) so the next call sees the next token.
- Using a different separator or format. Serialize and deserialize must agree on the format (e.g. comma, "null" spelling).
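The "use an iterator" option mentioned in the list above can be sketched like this. It assumes integer node values and well-formed input; `TreeNode` and the free functions `serialize` and `deserialize` are written here only for illustration.

```python
class TreeNode:
    """Minimal binary tree node, assumed here for illustration."""
    def __init__(self, val, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right


def serialize(root):
    parts = []

    def visit(node):
        if node is None:
            parts.append("null")  # marker for a missing child
            return
        parts.append(str(node.val))
        visit(node.left)
        visit(node.right)

    visit(root)
    return ",".join(parts)


def deserialize(data):
    tokens = iter(data.split(","))  # the iterator keeps track of the position

    def build():
        token = next(tokens)  # each call consumes exactly one token
        if token == "null":
            return None
        node = TreeNode(int(token))
        node.left = build()
        node.right = build()
        return node

    return build()


sample = TreeNode(1, TreeNode(2, TreeNode(4)), TreeNode(3))
text = serialize(sample)
print(text)                                  # 1,2,4,null,null,null,3,null,null
print(serialize(deserialize(text)) == text)  # True
```

Because `next(tokens)` advances automatically, there is no index to forget to increment. Collecting pieces in a list and joining once also avoids building many intermediate strings.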
Say: "I'll use preorder and output 'null' for missing children so the structure is uniquely recoverable. For deserialize I'll split the string and use a single global index (or iterator) that advances as we consume tokens. Each node consumes one token; if it's 'null' we return None, else we build the node and recurse for left and right."
Practice Problems
- LeetCode 297: Serialize and Deserialize Binary Tree.
- Variants: use different delimiters or binary format; serialize to a list instead of string.
Use preorder + "null" for missing children. Serialize: if null return "null"; else return str(val) + "," + serialize(left) + "," + serialize(right). Deserialize: split by comma; use an index (or iterator) to consume one token per call; if "null" return None and advance; else build node and set left/right by recursing.
Summary
- Serialize: Preorder; output value or "null"; comma-separated. Deserialize: split tokens; build in preorder (consume one token per node; "null" → None).
- One-to-one mapping between tree and string. O(n) time and space for both.
11.9 Binary Search Tree
Introduction
A Binary Search Tree (BST) is a binary tree where for every node, all values in the left subtree are less than the node's value, and all values in the right subtree are greater (often defined as left ≤ root < right or strict inequality, depending on problem). This property gives inorder traversal = sorted order. Search, insert, and delete can be done in O(h) time where h is height (O(log n) if balanced). BSTs support efficient lookup, range queries, and ordered iteration. This section covers the BST property, search/insert, validation, and delete (concept).
Formal Definition
A binary tree is a BST if for every node n: (1) every key in the left subtree of n is strictly less than n.key (or ≤ if duplicates allowed on the left), and (2) every key in the right subtree of n is strictly greater than n.key. An empty tree is a BST. This invariant implies that an inorder traversal visits keys in non-decreasing order.
BST Property
For every node with value val: all nodes in the left subtree have value < val; all nodes in the right subtree have value > val. (Duplicate handling: some definitions use ≤ on one side.) As a result, inorder (left → root → right) visits nodes in ascending order.
Search
Compare target with root: if equal, found; if target < root.val, search left; else search right. If we reach null, not found. O(h) time.
def search_bst(root, target):
    if root is None or root.val == target:
        return root  # found the node, or fell off the tree (None)
    if target < root.val:
        return search_bst(root.left, target)
    return search_bst(root.right, target)
Insert
Find the position where the key would be found (search); insert a new node as a leaf there. If key < root.val go left (if left is null, attach new node); else go right. O(h) time.
def insert_bst(root, val):
    if root is None:
        return TreeNode(val)  # empty spot: the new leaf goes here
    if val < root.val:
        root.left = insert_bst(root.left, val)
    else:
        root.right = insert_bst(root.right, val)  # equal keys go right
    return root
Validate BST
Check that every node lies in an allowed range (min, max). Root is in (-∞, +∞); left child must be in (min, root.val); right child in (root.val, max). Recurse with updated bounds. O(n) time.
def is_valid_bst(root):
    def check(node, lo, hi):
        if node is None:
            return True
        if not (lo < node.val < hi):
            return False
        return check(node.left, lo, node.val) and check(node.right, node.val, hi)
    return check(root, float("-inf"), float("inf"))
Valid BST: 5 (root), left 3, right 7; 3 has left 1. Inorder: 1, 3, 5, 7 (sorted). Invalid: 5 with left 6 (6 > 5 violates left-subtree rule).
Diagram: BST Property
Valid BST:                 Invalid (6 > 5 in left subtree):
      5                          5
     / \                        / \
    3   7                      6   7
   /    (all left < 5,         ↑
  1      all right > 5)        left subtree must be < 5
Inorder (L→root→R): 1, 3, 5, 7 (always sorted in a BST)
Delete (Concept)
To delete a node: (1) If it's a leaf, remove it. (2) If it has one child, replace it with that child. (3) If it has two children, replace its value with the inorder successor (smallest in right subtree) or inorder predecessor (largest in left subtree), then delete that successor/predecessor node (which has at most one child). O(h) time.
Edge Cases (BST)
- Empty tree: search/insert/validate on null root; return None, new node, or true (empty tree is valid BST) as appropriate.
- Single node: Valid BST; search returns the node if key matches; insert replaces or adds as child per implementation.
- Duplicates: Problem may allow left ≤ root < right or strict inequality. Validate and insert logic must match (e.g. allow left or right for equal keys).
- Integer overflow in validate: Using float('inf') for (lo, hi) avoids overflow; or use None to mean "no bound."
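The "None means no bound" option from the last item can be sketched like this. The name `is_valid_bst_none_bounds` and the `TreeNode` class are assumptions for the example; the logic mirrors the range check used in this section, with strict inequality on both sides.

```python
class TreeNode:
    """Minimal binary tree node, assumed here for illustration."""
    def __init__(self, val, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right


def is_valid_bst_none_bounds(root):
    def check(node, lo, hi):
        if node is None:
            return True
        if lo is not None and node.val <= lo:
            return False  # violates the lower bound inherited from an ancestor
        if hi is not None and node.val >= hi:
            return False  # violates the upper bound inherited from an ancestor
        return check(node.left, lo, node.val) and check(node.right, node.val, hi)

    return check(root, None, None)


valid = TreeNode(5, TreeNode(3, TreeNode(1)), TreeNode(7))
invalid = TreeNode(5, TreeNode(3, None, TreeNode(6)), TreeNode(7))  # 6 sits in 5's left subtree

print(is_valid_bst_none_bounds(valid))    # True
print(is_valid_bst_none_bounds(invalid))  # False
```

With None bounds no node value is reserved as a sentinel, so even a key equal to the largest representable value is accepted at the root.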
Time and Space Complexity
- Search, insert, delete: O(h) time; O(h) space for recursion. h = height (O(log n) if balanced, O(n) worst).
- Validate: O(n) time, O(h) space.
BST: left < root < right ⇒ inorder = sorted. Search/insert: compare with root and go left or right. Validate: pass (min, max) range; left in (min, root.val), right in (root.val, max).
"BST: every node has left subtree < root < right subtree; inorder gives sorted order. Search and insert: compare with root, recurse left or right; O(h). To validate I pass (lo, hi) and tighten: left gets (lo, root.val), right gets (root.val, hi). For delete with two children I use inorder successor or predecessor."
Practice Problems
- LeetCode 98: Validate Binary Search Tree.
- LeetCode 700: Search in a BST.
- LeetCode 701: Insert into a BST.
- LeetCode 450: Delete Node in a BST.
Summary
- BST property: Left subtree < root < right subtree ⇒ inorder gives sorted order.
- Search/insert: Compare with root; recurse left or right; O(h). Delete: replace with successor or predecessor, then delete that node.
- Validate: Check each node is in (lo, hi); tighten range for left/right. O(n).
11.10 AVL Tree
Introduction
An AVL tree is a self-balancing BST where for every node, the heights of the left and right subtrees differ by at most 1. The balance factor of a node is height(left) − height(right); in an AVL tree it is −1, 0, or 1. After insert or delete, we may need to rotate to restore this property. Rotations (single: left/right; double: left-right/right-left) rearrange nodes so the tree stays balanced while preserving the BST order. AVL guarantees O(log n) height, so search, insert, and delete are O(log n). This section covers the balance factor, rotations, and when to apply them.
Balance Factor
Balance factor (BF) = height(left subtree) − height(right subtree). AVL invariant: |BF| ≤ 1 for every node. If after an insert/delete some node has BF = 2 or −2, we fix it by rotating at that node (or a descendant).
Rotations
Right rotation (RR): Used when the left subtree is too tall (BF = 2). The left child becomes the new root of the subtree; the old root becomes its right child; the previous left child's right subtree becomes the old root's left subtree. Preserves BST order.
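Left rotation (LL): Used when the right subtree is too tall (BF = −2). The right child becomes the new root; the old root becomes its left child; the previous right child's left subtree becomes the old root's right subtree.
Note on naming: in this section RR and LL abbreviate the rotation performed (right rotation, left rotation). Many textbooks instead use LL and RR for the shape of the imbalance, so their "LL case" is the one fixed by a right rotation.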
Diagram: Left Rotation (LL)
Before (right-heavy, BF=-2):      After left rotate at 1:
    1                                   2
     \                                 / \
      2                               1   3
       \
        3
Node 2 becomes new root; 1 becomes left child of 2; 2's old left (none) becomes 1's right.
Left-Right (LR): When BF = 2 and the left child has BF = −1 (left's right subtree is taller). First left-rotate the left child, then right-rotate the current node.
Right-Left (RL): When BF = −2 and the right child has BF = 1. First right-rotate the right child, then left-rotate the current node.
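def height(node):
    # Height of an empty subtree is -1, so a leaf has height 0.
    return -1 if node is None else 1 + max(height(node.left), height(node.right))

def right_rotate(z):
    # Left child y becomes the subtree root; y's right subtree moves under z.
    y = z.left
    z.left = y.right
    y.right = z
    return y  # caller must attach y where z used to hang

def left_rotate(z):
    # Mirror image: right child y becomes the subtree root.
    y = z.right
    z.right = y.left
    y.left = z
    return y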
Insert 1, 2, 3 in order into an AVL. After 1, 2: BF(root)=−1. After 3: right subtree of root grows; BF(root)=−2. Apply left rotation at root: node 2 becomes root, 1 is its left child, 3 is its right child. Tree is balanced.
Insert and Delete (Concept)
Insert: Insert as in BST; then walk back up the path to the root. At each node, recompute height and BF. At the first node where |BF| = 2, apply the appropriate rotation (LL, RR, LR, RL). That rotation restores the subtree to the height it had before the insert, so no ancestor needs a further fix: one single or one double rotation suffices per insert.
Delete: Delete as in BST; then rebalance along the path from the deleted node to the root, applying rotations when |BF| = 2. Unlike insert, a rotation after a delete can leave the subtree shorter than before, so keep checking every ancestor up to the root.
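A minimal sketch of insert with rebalancing, assuming each node caches its height so that BF costs O(1) to compute (the recursive height() shown earlier would cost O(n) per call). The names AVLNode, rebalance, and avl_insert are illustrative, and duplicate keys are ignored here as a simplifying assumption.

```python
class AVLNode:
    """Hypothetical node type; height is cached and a leaf has height 0."""
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.height = 0


def h(node):
    return -1 if node is None else node.height


def refresh(node):
    node.height = 1 + max(h(node.left), h(node.right))


def bf(node):
    return h(node.left) - h(node.right)


def rotate_right(z):
    y = z.left
    z.left = y.right
    y.right = z
    refresh(z)  # z now sits below y, so refresh it first
    refresh(y)
    return y


def rotate_left(z):
    y = z.right
    z.right = y.left
    y.left = z
    refresh(z)
    refresh(y)
    return y


def rebalance(node):
    refresh(node)
    if bf(node) == 2:                      # left-heavy
        if bf(node.left) < 0:              # left child leans right: LR case
            node.left = rotate_left(node.left)
        return rotate_right(node)
    if bf(node) == -2:                     # right-heavy
        if bf(node.right) > 0:             # right child leans left: RL case
            node.right = rotate_right(node.right)
        return rotate_left(node)
    return node


def avl_insert(node, key):
    """Insert key into the subtree rooted at node; return the new subtree root."""
    if node is None:
        return AVLNode(key)
    if key < node.key:
        node.left = avl_insert(node.left, key)
    elif key > node.key:
        node.right = avl_insert(node.right, key)
    else:
        return node                        # duplicate: ignored in this sketch
    return rebalance(node)
```

Inserting 1 through 7 in increasing order would turn a plain BST into a chain; with this rebalancing the root ends up as 4 and the tree has height 2.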
Time and Space Complexity
- Search, insert, delete: O(log n) time (height is O(log n)). Space O(log n) for recursion.
- Rotation: O(1) per rotation. An insert needs at most one single or double rotation; a delete may need a rotation at each of O(log n) ancestors along the path.
Common Mistakes
- Applying the wrong rotation. BF = 2 with left child's BF ≥ 0 → single right rotation (RR). Left child's BF = −1 → double rotation (LR: left rotate left child, then right rotate node). Similarly for BF = −2, check the right child's BF: ≤ 0 → single left rotation (LL); 1 → double rotation (RL). A child BF of 0 at an unbalanced node can only occur after a delete.
- Forgetting to update heights after rotation. After any rotation, recompute heights of the nodes that moved; then propagate height updates up the path to the root.
You usually don't implement full AVL in an interview; explaining the idea is enough: "AVL keeps |BF| ≤ 1. After insert we fix with single (RR/LL) or double (LR/RL) rotation. I'd need to update heights and check the child's BF to choose the rotation." If asked to code, implement height() and one rotation (e.g. left rotate) and describe the rest.
Practice Problems
- LeetCode 1382: Balance a BST (can use inorder + rebuild, or AVL-style rotations).
- Concept: implement insert with rebalance; or explain when to use RR, LL, LR, RL.
BF = height(left) − height(right). BF = 2: left heavy → RR or LR (check left child's BF). BF = −2: right heavy → LL or RL (check right child's BF). After rotation, update heights and continue up.
Summary
- AVL: BST with |balance factor| ≤ 1 at every node. BF = height(left) − height(right).
- Rebalance: Single rotations (RR when left-heavy, LL when right-heavy); double rotations (LR, RL) when the taller child is skewed the other way.
- Insert/delete: BST step then rebalance along path to root. O(log n) per operation.
11.11 Red-Black Tree
Introduction
A Red-Black tree is a self-balancing BST where each node has an extra color (red or black). Invariants on colors and "black height" guarantee that the tree stays roughly balanced, so height is O(log n) and search, insert, and delete are O(log n). Red-black trees are used in many standard libraries (e.g. std::map in C++) because they balance well in practice and rebalancing after insert/delete involves a bounded number of rotations and recolorings. This section covers the invariants and the high-level idea of insert/delete fix-up.
Invariants (Rules)
- Every node is either red or black.
- The root is black.
- Leaves (null pointers) are considered black.
- A red node has only black children (no two reds in a row on any path).
- For every node, all simple paths from that node down to descendant leaves contain the same number of black nodes (called "black height").
From these, the longest path (alternating red–black) is at most twice the shortest (all black), so height ≤ 2·log₂(n+1) = O(log n).
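The invariants can be checked mechanically. The sketch below is an illustration only: RBNode and the "R"/"B" color encoding are assumptions made for this example, and the check covers the color rules but not the BST ordering.

```python
RED, BLACK = "R", "B"


class RBNode:
    """Hypothetical node type for this sketch."""
    def __init__(self, key, color, left=None, right=None):
        self.key = key
        self.color = color
        self.left = left
        self.right = right


def black_height(node):
    """Return the black height of the subtree, or -1 if a rule is violated."""
    if node is None:
        return 1                      # null leaves count as black
    if node.color == RED:
        for child in (node.left, node.right):
            if child is not None and child.color == RED:
                return -1             # two reds in a row
    left = black_height(node.left)
    right = black_height(node.right)
    if left == -1 or right == -1 or left != right:
        return -1                     # unequal black counts below this node
    return left + (1 if node.color == BLACK else 0)


def is_red_black(root):
    """Check the root-is-black, no-double-red, and equal-black-height rules."""
    if root is None:
        return True
    return root.color == BLACK and black_height(root) != -1
```

A checker like this is mainly useful as a test helper while experimenting with insert and delete fix-up code.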
Insert (High Level)
Insert as in a BST and color the new node red. This may violate "red has black children" (if the parent is red) or "root is black" (if we inserted the root). Fix-up: Walk up from the new node. If the current node is red and its parent is red, we have a "double red." Depending on the color of the uncle (parent's sibling): (1) If uncle is red: recolor parent and uncle to black, grandparent to red, and continue from grandparent. (2) If uncle is black (or null): rotate (and possibly recolor) so the double red is resolved—either a single rotation (LL/RR style) or a double rotation (LR/RL style), then recolor. After fix-up, ensure the root is black.
Delete (High Level)
Delete as in a BST. If the removed node was black, the "black height" on some path decreases; fix-up restores it by recolorings and rotations. The sibling of the node that "lost" a black is used to push a black down or rotate. Details are more involved than insert; the idea is to propagate the "deficit" up until we can fix it with a rotation and recolor.
After inserting a red node under a red parent: if the uncle is red, we recolor (parent and uncle → black, grandparent → red) and repeat the check at the grandparent. If the uncle is black, we rotate at the grandparent (e.g. left-rotate if the red nodes form a right spine) and recolor (new subtree root → black, old grandparent → red), so that the tree satisfies "no two reds in a row" and black heights stay equal on every path.
Red-Black vs AVL
- AVL: Stricter balance (|BF| ≤ 1); slightly lower height; more rotations on insert/delete.
- Red-Black: Looser balance; fewer rotations in practice; often faster for mixed insert/delete workloads. Both give O(log n) operations.
Time and Space Complexity
- Search, insert, delete: O(log n) time. Space O(log n) for recursion.
- Fix-up: O(log n) steps in the worst case. Insert needs at most 2 rotations and delete at most 3; it is the recolorings that can propagate O(log n) levels up the path.
Remember the five invariants; the key is "same black count on every path" and "no two reds adjacent." Insert: new node red; fix double red with uncle red (recolor) or uncle black (rotate + recolor). Root stays black.
Practice Problems
- Implementations: rarely required in interviews; understanding invariants and when to recolor/rotate is enough.
- Compare with AVL: when to prefer Red-Black (fewer rotations, good for mixed workloads) vs AVL (stricter balance, more lookups).
Summary
- Red-Black tree: BST with red/black nodes; root and leaves black; red nodes have black children; same black height on all paths ⇒ height O(log n).
- Insert: Insert red; fix double red (recolor if uncle red; rotate + recolor if uncle black). Delete: Fix black-height by recolor/rotate. Both O(log n).
11.12 Trie
Introduction
A Trie (prefix tree) is a tree used to store a set of strings. Each edge is labeled with a character; a path from the root to a node spells a prefix (possibly a full word). Nodes typically have a flag (e.g. is_end) to mark the end of a word. Tries support insert, search (exact word), and startsWith (prefix lookup) in O(m) time where m is the length of the word or prefix. They are used for autocomplete, spell check, and prefix-based problems (e.g. "count words with prefix"). This section covers the structure and basic operations.
Structure
Each node has:
- Children: A mapping from character to child node (e.g. dict or array of size 26 for lowercase letters).
- is_end: True if a word ends at this node (so the path from root spells a complete word).
The root represents the empty prefix. To insert "cat", we add edges c → a → t and set is_end at the node for "t".
Insert
Start at the root. For each character in the word, go to the corresponding child (create it if missing). After processing the last character, set is_end = True at that node. Time O(m), m = word length.
Search (Exact Word)
Follow the path for each character. If we hit a missing child, the word is not in the trie. If we finish the word, return True only if is_end is True at the final node (so "car" is not found when only "cart" was inserted). Time O(m).
startsWith (Prefix)
Follow the path for each character of the prefix. If we can traverse without missing a child, the prefix exists. We don't require is_end. Time O(m).
class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_end = False

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for c in word:
            if c not in node.children:
                node.children[c] = TrieNode()
            node = node.children[c]
        node.is_end = True

    def search(self, word):
        node = self.root
        for c in word:
            if c not in node.children:
                return False
            node = node.children[c]
        return node.is_end

    def startsWith(self, prefix):
        node = self.root
        for c in prefix:
            if c not in node.children:
                return False
            node = node.children[c]
        return True
Insert "cat", "car", "card". Trie: root → c → a → t (is_end); root → c → a → r (is_end) → d (is_end). search("car") → True; search("ca") → False (no word ends at "ca"). startsWith("ca") → True.
Diagram: Trie for "cat", "car", "card"
        root
         │
         c
         │
         a
        / \
       t   r        t: is_end (word "cat")
           │        r: is_end (word "car")
           d        d: is_end (word "card")

"cat" ends at t; "car" and "card" share c→a→r, then "card" continues to d.

search("car")    → follow c,a,r → node has is_end ✓
search("ca")     → follow c,a   → node is_end? No ✗
startsWith("ca") → follow c,a   → path exists ✓
Edge Cases (Trie)
- Empty string: If the problem allows "" as a word, mark root.is_end = True when inserting "". search("") then returns True.
- Prefix of existing word: Insert "cat" then "ca"—you need a node at "ca" with is_end = True. search("ca") is True only if you set is_end when inserting "ca".
- Duplicate insert: Inserting the same word twice: just set is_end again; no structural change. Count of words may require a separate count field per node if needed.
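The "separate count field" mentioned above can be sketched like this. CountingTrie, pass_count, and end_count are hypothetical names for this example; the idea is simply to increment a counter on every node along the inserted path, so duplicates and "count words with prefix" are both handled.

```python
class CountingTrieNode:
    def __init__(self):
        self.children = {}
        self.pass_count = 0   # number of inserted words whose path passes through here
        self.end_count = 0    # number of inserted words that end exactly here


class CountingTrie:
    def __init__(self):
        self.root = CountingTrieNode()

    def insert(self, word):
        node = self.root
        node.pass_count += 1
        for c in word:
            if c not in node.children:
                node.children[c] = CountingTrieNode()
            node = node.children[c]
            node.pass_count += 1
        node.end_count += 1

    def _walk(self, s):
        """Return the node reached by spelling s, or None if the path breaks."""
        node = self.root
        for c in s:
            node = node.children.get(c)
            if node is None:
                return None
        return node

    def count_word(self, word):
        node = self._walk(word)
        return 0 if node is None else node.end_count

    def count_prefix(self, prefix):
        node = self._walk(prefix)
        return 0 if node is None else node.pass_count
```

With this layout the empty string needs no special case: inserting "" only touches the root, and count_prefix("") returns the total number of inserted words.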
Pattern Recognition
Use a Trie when the problem involves: prefix matching, "all words with prefix P," autocomplete, spell check, storing a set of strings with fast prefix/word lookup, or when you need to traverse by character and share structure across strings (e.g. "longest common prefix," "word squares").
Time and Space Complexity
- Insert, search, startsWith: O(m) time per operation, m = length of word/prefix.
- Space: O(total characters in all words) in the worst case; nodes are shared for common prefixes.
"I'll use a trie: each node has a dict of character → child and an is_end flag. Insert: walk character by character, create nodes as needed, set is_end at the last node. Search: walk and return whether the final node has is_end. startsWith: walk and return whether the path exists."
Practice Problems
- LeetCode 208: Implement Trie (Prefix Tree).
- LeetCode 212: Word Search II (trie of words + DFS on board).
- LeetCode 14: Longest Common Prefix (trie or simple compare).
Trie: one node per prefix; children by character; is_end for complete words. Insert: traverse/create; set is_end. Search: traverse; return is_end at last node. startsWith: traverse; return True if path exists.
Summary
- Trie: Tree for strings; path = prefix; is_end marks word end. Insert/search/startsWith in O(m).
- Use for prefix lookups, autocomplete, "count words with prefix," and string set membership.
11.13 Segment Tree
Introduction
A Segment Tree is a binary tree used for range queries (e.g. sum, min, max over [l, r]) and point updates (or range updates with lazy propagation) on an array. Each node stores an aggregate value for a segment [l, r]. The root covers [0, n−1]; left child covers the left half, right child the right half. Build takes O(n); query(l, r) and point update(index, value) take O(log n). It is useful when you have many range queries and updates. This section covers a segment tree for range sum with point update.
Structure
We use an array-based representation: root at index 1; for node at index i, left child at 2*i, right at 2*i + 1. Each node holds the aggregate (e.g. sum) for its segment. Leaves correspond to single elements. Tree size: about 4*n (or 2 * next_power_of_2(n)) to be safe.
Diagram: Segment Tree for arr[0..3] (Sum)
arr = [a0, a1, a2, a3]   (indices 0..3)

Logical tree (each node covers a range):
                 [0..3] sum
                /          \
          [0..1]            [2..3]
          /    \            /    \
     [0..0]  [1..1]    [2..2]  [3..3]
       a0      a1        a2      a3

Array representation (root at 1):
index:     1         2         3         4    5    6    7
value:     sum(0-3)  sum(0-1)  sum(2-3)  a0   a1   a2   a3
children:  2,3       4,5       6,7       -    -    -    -
Build
Fill leaves with the array values. Then fill internal nodes bottom-up: tree[i] = tree[2*i] + tree[2*i+1] (for sum). O(n) time.
Query(l, r)
Recurse from the root. If the current node's segment is entirely inside [l, r], return its value. If it doesn't overlap [l, r], return 0 (for sum). Otherwise recurse on left and right children and combine (e.g. add) the results. O(log n) time.
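The recursive query just described can be sketched as follows. This is one possible layout, not the only one: build_sum_tree and range_sum are names chosen for this example, the tree array uses the 4*n sizing from the Structure section, and each call carries the segment [lo, hi] covered by the current node.

```python
def build_sum_tree(tree, nums, node, lo, hi):
    """Fill tree[node] with the sum of nums[lo..hi] (inclusive)."""
    if lo == hi:
        tree[node] = nums[lo]
        return
    mid = (lo + hi) // 2
    build_sum_tree(tree, nums, 2 * node, lo, mid)
    build_sum_tree(tree, nums, 2 * node + 1, mid + 1, hi)
    tree[node] = tree[2 * node] + tree[2 * node + 1]


def range_sum(tree, node, lo, hi, l, r):
    """Sum of nums[l..r], where the current node covers [lo, hi]."""
    if r < lo or hi < l:
        return 0                      # no overlap: identity for sum
    if l <= lo and hi <= r:
        return tree[node]             # fully inside the query range
    mid = (lo + hi) // 2              # partial overlap: ask both children
    return (range_sum(tree, 2 * node, lo, mid, l, r)
            + range_sum(tree, 2 * node + 1, mid + 1, hi, l, r))


nums = [1, 3, 5, 7, 9]
tree = [0] * (4 * len(nums))
build_sum_tree(tree, nums, 1, 0, len(nums) - 1)
total_1_to_3 = range_sum(tree, 1, 0, len(nums) - 1, 1, 3)  # 3 + 5 + 7
```

For min or max queries, the same shape works with the combine step and the "no overlap" return value swapped for the matching operation and its identity.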
Point Update
Update the leaf for the given index (add delta or set value). Then update all ancestors: for index i, tree[i] = tree[2*i] + tree[2*i+1]. O(log n) time.
class SegmentTree:
    def __init__(self, nums):
        n = len(nums)
        self.n = n
        self.size = 1
        while self.size < n:              # round up to a power of two
            self.size *= 2
        self.tree = [0] * (2 * self.size)
        for i in range(n):                # leaves live at size .. size + n - 1
            self.tree[self.size + i] = nums[i]
        for i in range(self.size - 1, 0, -1):
            self.tree[i] = self.tree[2*i] + self.tree[2*i + 1]

    def update(self, index, val):
        # Set element `index` (0-based) to val, then recompute every ancestor.
        i = self.size + index
        self.tree[i] = val
        i //= 2
        while i:
            self.tree[i] = self.tree[2*i] + self.tree[2*i + 1]
            i //= 2

    def query(self, left, right):
        # Sum over the inclusive 0-based range [left, right].
        l, r = left + self.size, right + self.size
        s = 0
        while l <= r:
            if l % 2 == 1:                # l is a right child: take it, move right
                s += self.tree[l]
                l += 1
            if r % 2 == 0:                # r is a left child: take it, move left
                s += self.tree[r]
                r -= 1
            l //= 2
            r //= 2
        return s
Array [1, 3, 5, 7, 9]. Build: leaves 1,3,5,7,9; parents = sum of children. query(1, 3) = 3+5+7 = 15. update(2, 10): set index 2 to 10, then update ancestors; query(1, 3) = 3+10+7 = 20.
Time and Space Complexity
- Build: O(n). Query / point update: O(log n).
- Space: O(n) for the tree array (about 4n nodes).
Common Mistakes
- Wrong segment boundaries in query. The iterative query that merges segments (l, r with l%2, r%2) must use the same 0-based or 1-based convention as your build. Check that [left, right] is inclusive and matches the problem.
- Array size too small. Use at least 4*n (or 2 * next power of 2) so that all nodes fit; otherwise index 2*i or 2*i+1 can go out of bounds.
- Update: forgetting to propagate. After updating the leaf, update all ancestors (i = i//2 in a loop) with the same combine function (e.g. sum of children).
"I'll use a segment tree with array representation: root at 1, children at 2*i and 2*i+1. Build bottom-up. For range sum query I merge segments that fall entirely inside [l, r]. For point update I update the leaf and then propagate to the root. Time O(n) build, O(log n) per query and update." Mention lazy propagation if the problem has range updates.
Practice Problems
- LeetCode 307: Range Sum Query - Mutable (segment tree or BIT).
- LeetCode 315: Count of Smaller Numbers After Self (segment tree / BIT for rank).
- SPOJ RMQ or similar: range min/max query with point update.
Segment tree: array representation, root at 1; children at 2*i and 2*i+1. Build bottom-up. Query: merge segments that lie inside [l, r]. Update: update leaf then propagate up. For range-update use lazy propagation.
Summary
- Segment tree: Binary tree over segments; each node = aggregate for [l, r]. Build O(n); query/point update O(log n).
- Use for range sum/min/max and point (or range with lazy) updates. Array size ~4n; query/update by walking from leaves to root.
11.14 Fenwick Tree (BIT)
Introduction
A Fenwick Tree (Binary Indexed Tree, BIT) supports prefix sum (sum of the first i elements, i.e. indices 1 to i in the 1-indexed layout used here) and point update (add a delta to one element) in O(log n) time and O(n) space. It is simpler and often faster in practice than a segment tree for these two operations. Range sum [l, r] = prefix_sum(r) − prefix_sum(l−1). The key idea: each index i in the BIT array stores the sum of a contiguous segment ending at i; the segment length is determined by the lowest set bit of i (i & -i). This section covers the 1-indexed BIT for prefix sum and point update.
Idea
We use a 1-indexed array tree. Index i is responsible for the segment of length lsb(i) = i & -i ending at position i: that is, indices [i - lsb(i) + 1, i]. To compute prefix_sum(i), we add tree[i], then subtract lsb(i) and repeat: i -= i & -i until i is 0. To update(i, delta), we add delta to tree[i], then add lsb(i) and repeat: i += i & -i until we exceed n. Both are O(log n).
Diagram: Fenwick Tree — Which Index Covers What
Index i   Binary   lsb(i) = i & -i   Covers range (1-indexed)
-------   ------   ---------------   ------------------------
1         0001     1                 [1..1]
2         0010     2                 [1..2]
3         0011     1                 [3..3]
4         0100     4                 [1..4]
5         0101     1                 [5..5]
6         0110     2                 [5..6]
7         0111     1                 [7..7]
8         1000     8                 [1..8]

prefix_sum(7) = tree[7] + tree[6] + tree[4]        (7 → 6 → 4 → 0; add tree[pos], then subtract lsb)
update(5, d): add d to tree[5], tree[6], tree[8]   (5 → 6 → 8; add lsb)
Operations
prefix_sum(i): Sum of original array from index 1 to i (1-indexed). Start with sum = 0, pos = i. While pos > 0: sum += tree[pos]; pos -= pos & -pos. Return sum.
update(i, delta): Add delta to the element at index i. Start with pos = i. While pos ≤ n: tree[pos] += delta; pos += pos & -pos.
range_sum(l, r): prefix_sum(r) − prefix_sum(l − 1). Use 1-indexed l, r.
class FenwickTree:
    def __init__(self, n):
        self.n = n
        self.tree = [0] * (n + 1)

    def update(self, i, delta):
        while i <= self.n:
            self.tree[i] += delta
            i += i & -i

    def prefix_sum(self, i):
        s = 0
        while i > 0:
            s += self.tree[i]
            i -= i & -i
        return s

    def range_sum(self, l, r):
        return self.prefix_sum(r) - self.prefix_sum(l - 1)
Build from array: Initialize tree to zeros, then for each 1-based index i call update(i, arr[i−1]) (arr[i−1] being the i-th element of a 0-indexed Python list). This takes O(n log n). An O(n) build also exists: copy the values into tree[1..n], then for i from 1 to n add tree[i] to tree[i + lsb(i)] whenever that index is ≤ n.
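A sketch of the O(n) build: each index pushes its finished segment sum into the next index that covers it, i + lsb(i). The function names here are chosen for this example, and arr is assumed to be a 0-indexed Python list.

```python
def build_fenwick(arr):
    """Return a 1-indexed Fenwick array for the 0-indexed list arr in O(n)."""
    n = len(arr)
    tree = [0] + list(arr)            # tree[i] starts as the i-th element
    for i in range(1, n + 1):
        covering = i + (i & -i)       # next index whose segment contains i's segment
        if covering <= n:
            tree[covering] += tree[i]
    return tree


def fenwick_prefix_sum(tree, i):
    """Sum of the first i elements (1-indexed), same loop as prefix_sum above."""
    s = 0
    while i > 0:
        s += tree[i]
        i -= i & -i
    return s


tree = build_fenwick([1, 3, 5, 7])
```

Because indices are visited in increasing order, tree[i] is already complete when it is added to tree[i + lsb(i)], which is why a single pass is enough.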
Array [1, 3, 5, 7] (1-indexed: indices 1..4). After build: prefix_sum(3) = 1+3+5 = 9 and range_sum(2, 4) = prefix_sum(4) − prefix_sum(1) = 16 − 1 = 15. update(2, 2): add 2 to index 2 (so value becomes 5); now prefix_sum(3) = 1+5+5 = 11 and range_sum(2, 4) = 18 − 1 = 17.
Time and Space Complexity
- Update, prefix_sum, range_sum: O(log n) per call.
- Space: O(n). Build: O(n log n) with naive updates; O(n) with careful build.
Edge Cases (Fenwick Tree)
- 1-indexed vs 0-indexed: The standard BIT is 1-indexed. If your problem uses 0-based indices, convert: update(i+1, delta) and prefix_sum(i+1); range_sum(l, r) uses prefix_sum(r+1) − prefix_sum(l).
- Empty array or n=0: Initialize with n+1 size; avoid update/query when n is 0 (or handle with a guard).
- Range [l, r] when l=1: range_sum(1, r) = prefix_sum(r) − prefix_sum(0); prefix_sum(0) should be 0 (no elements).
Common Mistakes
- Using 0-based index in the BIT. The lsb logic and "which index covers what" assume 1-based indices. If you use 0-based, the covering ranges and update/query logic change; stick to 1-based and convert at the interface.
- Forgetting to add delta (or setting value). update(i, delta) adds delta to the element. If the problem says "set value at i to v," you need to add (v − old_value) or maintain the array and update with the difference.
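Both points can be handled at the interface. The sketch below is a hypothetical wrapper (PointSetBIT is not a standard name) that exposes 0-based indices and a "set value" operation while keeping the internal tree 1-indexed and delta-based.

```python
class PointSetBIT:
    """0-based interface over a 1-indexed Fenwick tree, supporting 'set value'."""

    def __init__(self, nums):
        self.n = len(nums)
        self.vals = [0] * self.n          # current values, needed to compute deltas
        self.tree = [0] * (self.n + 1)
        for i, v in enumerate(nums):
            self.set(i, v)

    def _add(self, i, delta):
        i += 1                            # convert to the internal 1-based index
        while i <= self.n:
            self.tree[i] += delta
            i += i & -i

    def _prefix(self, count):
        """Sum of the first `count` elements."""
        s = 0
        while count > 0:
            s += self.tree[count]
            count -= count & -count
        return s

    def set(self, i, value):
        """Set element i (0-based) to value by adding the difference."""
        self._add(i, value - self.vals[i])
        self.vals[i] = value

    def range_sum(self, l, r):
        """Sum of elements l..r inclusive (0-based)."""
        return self._prefix(r + 1) - self._prefix(l)
```

Keeping a plain copy of the values costs O(n) extra space but avoids an extra range query just to find the old value.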
"Fenwick tree uses 1-indexed array; each index i stores the sum of a segment of length (i & -i) ending at i. Prefix sum: add tree[i] and do i -= i & -i. Update: add delta and do i += i & -i. Range sum is prefix_sum(r) − prefix_sum(l−1). O(log n) per operation, O(n) space. Simpler than segment tree for prefix/range sum + point update."
Practice Problems
- LeetCode 307: Range Sum Query - Mutable (BIT or segment tree).
- LeetCode 315: Count of Smaller Numbers After Self (BIT for rank/count).
- Inversion count: use BIT to count smaller elements to the right.
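As an illustration of the inversion-count idea, here is a sketch that scans from right to left and uses a BIT over value ranks. The function name and the coordinate-compression step are choices made for this example; equal elements are not counted as inversions.

```python
def count_inversions(nums):
    """Count pairs (i, j) with i < j and nums[i] > nums[j] in O(n log n)."""
    # Compress values to ranks 1..m so they can index the BIT.
    ranks = {v: i + 1 for i, v in enumerate(sorted(set(nums)))}
    m = len(ranks)
    tree = [0] * (m + 1)
    inversions = 0
    for v in reversed(nums):
        r = ranks[v]
        # Prefix query: how many elements to the right have rank < r?
        i = r - 1
        while i > 0:
            inversions += tree[i]
            i -= i & -i
        # Point update: record that rank r has now been seen.
        i = r
        while i <= m:
            tree[i] += 1
            i += i & -i
    return inversions
```

For [2, 4, 1, 3, 5] the inversions are (2, 1), (4, 1), and (4, 3), so the function returns 3.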
BIT is 1-indexed. Update: i += i & -i. Prefix sum: i -= i & -i. Range sum [l, r] = prefix_sum(r) − prefix_sum(l−1). Use when you need prefix/range sum + point updates; simpler than segment tree for that.
Summary
- Fenwick Tree (BIT): 1-indexed array; index i covers segment of length i & -i ending at i. prefix_sum and update in O(log n).
- range_sum(l, r) = prefix_sum(r) − prefix_sum(l−1). Build by updates in O(n log n) or O(n) with proper build.
11.15 Sparse Table
Introduction
A Sparse Table is a data structure for range min/max (or other idempotent) queries on a static array. After O(n log n) preprocessing, each query is answered in O(1) time. It works for idempotent operations: min, max, gcd—where combining a value with itself gives the same value (so overlapping intervals are fine). It does not support point updates; the array is static. For range sum you need segment tree or BIT. This section covers the idea, build, and query for range minimum.
Idea
Precompute st[i][j] = minimum of the segment starting at index i with length 2^j (i.e. arr[i..i+2^j−1]). We can compute st[i][j] from st[i][j−1] and st[i + 2^(j−1)][j−1] (two halves). For a query [l, r], let k = floor(log2(r − l + 1)). The segments [l, l+2^k−1] and [r−2^k+1, r] cover [l, r] (they overlap but for min/max that's fine). So query(l, r) = min(st[l][k], st[r−2^k+1][k]).
Build
st[i][0] = arr[i] (length 1). For j from 1 to max_j: st[i][j] = min(st[i][j−1], st[i + 2^(j−1)][j−1]), for all i such that the second segment is in bounds. Precompute log2 for integers (e.g. log_table[length] = k) for O(1) query.
def build_sparse_table(arr):
    n = len(arr)
    # Number of levels is floor(log2(n)) + 1; bit_length avoids floating point
    # and does not fail on an empty array.
    max_j = max(1, n.bit_length())
    st = [[0] * max_j for _ in range(n)]
    for i in range(n):
        st[i][0] = arr[i]              # segments of length 1
    for j in range(1, max_j):
        step = 1 << (j - 1)            # half of the current segment length
        for i in range(n - (1 << j) + 1):
            st[i][j] = min(st[i][j-1], st[i + step][j-1])
    return st

def query_min(st, l, r, log_table):
    length = r - l + 1
    k = log_table[length]              # floor(log2(length))
    return min(st[l][k], st[r - (1 << k) + 1][k])
Precomputing log2 for Query
To get k = floor(log2(length)) in O(1), precompute an array log_table: for each length from 1 to n, store the exponent k such that 2^k ≤ length < 2^(k+1). Then query_min(st, l, r, log_table) uses k = log_table[r - l + 1].
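One common way to fill log_table is the recurrence log_table[x] = log_table[x // 2] + 1. The sketch below shows it together with a compact min table in the same st[i][j] layout so that it runs on its own; the function names are chosen for this example.

```python
def build_log_table(n):
    """log_table[length] = floor(log2(length)) for 1 <= length <= n."""
    log_table = [0] * (n + 1)
    for length in range(2, n + 1):
        log_table[length] = log_table[length // 2] + 1
    return log_table


def build_min_table(arr, log_table):
    """st[i][j] = min(arr[i .. i + 2^j - 1])."""
    n = len(arr)
    levels = log_table[n] + 1
    st = [[0] * levels for _ in range(n)]
    for i in range(n):
        st[i][0] = arr[i]
    for j in range(1, levels):
        half = 1 << (j - 1)
        for i in range(n - (1 << j) + 1):
            st[i][j] = min(st[i][j - 1], st[i + half][j - 1])
    return st


def range_min(st, log_table, l, r):
    """Minimum of arr[l..r] from two overlapping blocks of length 2^k."""
    k = log_table[r - l + 1]
    return min(st[l][k], st[r - (1 << k) + 1][k])


arr = [3, 2, 4, 5, 1, 1, 5, 2]
log_table = build_log_table(len(arr))
st = build_min_table(arr, log_table)
```

The table costs O(n) extra space and removes any floating-point log call from the query path.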
arr = [3, 2, 4, 5, 1, 1, 5, 2]. st[i][0] = arr[i]. st[0][1] = min(arr[0..1]) = 2; st[0][2] = min(arr[0..3]) = 2. Query [2, 5]: length 4, k=2; min(st[2][2], st[5-4+1][2]) = min(min(arr[2..5]), min(arr[2..5])) = min(1, 1) = 1.
Time and Space Complexity
- Build: O(n log n) time and space (n × log n table).
- Query: O(1) time.
- No updates: Sparse table is static. For point updates use segment tree.
When to Use
Use sparse table when: array is static, queries are range min/max/gcd, and you need O(1) per query. Use segment tree when you need point/range updates or non-idempotent (e.g. sum) queries.
Common Mistakes
- Using for range sum. Sum is not idempotent (overlapping segments would double-count). Use segment tree or BIT for sum.
- Off-by-one in query. The second segment starts at r - (1 << k) + 1 and has length 2^k, so it ends at r. Check that both segments are within [l, r].
"For static RMQ I can use a sparse table: precompute st[i][j] = min over [i, i+2^j−1] in O(n log n). Query [l,r]: k = floor(log2(r-l+1)); return min(st[l][k], st[r-2^k+1][k]) in O(1). Works for min, max, gcd—idempotent only. No updates."
Practice Problems
- Range Minimum Query (RMQ) on static array.
- Problems where you need many range min/max queries and the array doesn't change.
st[i][j] = op over arr[i..i+2^j−1]. Build: st[i][0]=arr[i]; st[i][j] = op(st[i][j-1], st[i+2^(j-1)][j-1]). Query [l,r]: k = floor(log2(r-l+1)); return op(st[l][k], st[r-2^k+1][k]). Idempotent only (min, max, gcd).
Summary
- Sparse table: Static array; O(n log n) build, O(1) range min/max (or gcd) query. Idempotent operations only.
- st[i][j] = min over [i, i+2^j−1]. Query: two overlapping segments of length 2^k cover [l, r]. No updates.
11.16 Binary Lifting
Introduction
Binary Lifting is a technique on rooted trees that precomputes "power-of-two" steps upward from each node. With O(n log n) preprocessing, you can answer k-th ancestor (the node reached by going up k edges from a node) in O(log n) per query, and Lowest Common Ancestor (LCA) in O(log n) as well. The idea is similar to a sparse table: store up[u][j] = the node reached by moving 2^j steps up from u, then decompose any jump into binary. This section covers the precomputation, k-th ancestor, and LCA using binary lifting.
Formal Definition
Given a rooted tree with n nodes and root r, define:
- parent(u) = the parent of node u (the root has no parent).
- up[u][0] = parent(u); up[u][j] = up[ up[u][j−1] ][j−1] for j ≥ 1 (i.e. the 2^j-th ancestor is reached by two jumps of 2^(j−1)).
- K-th ancestor of u: the node reached by moving exactly k edges from u toward the root (0-th ancestor = u; 1-st = parent).
- LCA(u, v): the deepest node that is an ancestor of both u and v.
Mental Model
Think of climbing the tree in "powers of two": from any node you can jump 1, 2, 4, 8, … steps up. To move up k steps, write k in binary (e.g. 5 = 101) and apply the corresponding jumps (1 step + 4 steps). The precomputed table up[u][j] gives you the result of a single 2^j-step jump from u.
Precomputation
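Assume we have parent[u] and depth[u] for each node (from a BFS/DFS from the root). Let LOG = floor(log2(n)) + 1 (n.bit_length() in Python), so that any jump of up to n−1 edges fits in LOG bits.
- up[u][0] = parent[u] (or -1 / None for root).
- For j from 1 to LOG−1: up[u][j] = up[ up[u][j−1] ][j−1] if up[u][j−1] is valid; else -1.
Fill the table one level j at a time for all nodes, so every up[·][j−1] entry exists before level j reads it. If you instead fill all levels for one node at a time, process nodes in BFS order so that up[parent[u]][*] is already computed. Time O(n log n), space O(n log n).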
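The parent and depth arrays assumed above can be produced with one BFS. This sketch assumes the tree is given as an undirected adjacency list adj (a list of neighbor lists) with nodes numbered 0..n−1; the function name is chosen for this example.

```python
from collections import deque


def parent_and_depth(adj, root=0):
    """Return (parent, depth) lists for a tree given as an adjacency list."""
    n = len(adj)
    parent = [-1] * n                 # -1 marks "no parent" (the root)
    depth = [0] * n
    seen = [False] * n
    seen[root] = True
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if not seen[v]:
                seen[v] = True
                parent[v] = u
                depth[v] = depth[u] + 1
                queue.append(v)
    return parent, depth


# Root 0 with children 1 and 2; 3 is a child of 1; 4 is a child of 2.
adj = [[1, 2], [0, 3], [0, 4], [1], [2]]
parent, depth = parent_and_depth(adj)
```

An iterative BFS also avoids Python's recursion limit on deep, chain-like trees.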
K-th Ancestor Query
To find the k-th ancestor of u: if k > depth[u], no such node (return -1). Otherwise, for each bit set in k, jump by that power of two. Example: k = 5 (binary 101) → jump 2^0 then 2^2: u = up[u][0], then u = up[u][2]. Iterate j from 0 to LOG−1; if k has the j-th bit set, do u = up[u][j]. O(log n) time.
LCA Using Binary Lifting
1) Bring both nodes to the same depth: if depth[u] > depth[v], replace u with its (depth[u]−depth[v])-th ancestor (using k-th ancestor); similarly if depth[v] > depth[u]. 2) If u == v, return u. 3) Lift both u and v in large steps: for j from LOG−1 down to 0, if up[u][j] != up[v][j], set u = up[u][j], v = up[v][j]. After the loop, u and v are one step below the LCA, so parent[u] (or up[u][0]) is the LCA. O(log n) per query.
Diagram: Binary Lifting (up table)
Tree (root at top):          up[u][j] = 2^j-th ancestor of u
        r (root)             u=3: up[3][0]=1, up[3][1]=r
       / \                   u=2: up[2][0]=r, up[2][1]=-1
      1   2                  K-th ancestor of 3, k=2: 3 -> up[3][1]=r
       \   \                 LCA(3,4): same depth? 3 depth 2, 4 depth 2.
        3   4                j=1: up[3][1]=up[4][1]=r -> equal, no jump;
                             j=0: up[3][0]=1 != up[4][0]=2 -> u=1, v=2; LCA = up[1][0] = r.
def build_binary_lifting(parent, n):
    LOG = (n).bit_length()
    up = [[-1] * LOG for _ in range(n)]
    for u in range(n):
        up[u][0] = parent[u]
    for j in range(1, LOG):
        for u in range(n):
            if up[u][j-1] != -1:
                up[u][j] = up[up[u][j-1]][j-1]
    return up

def kth_ancestor(up, u, k, depth):
    if k > depth[u]:
        return -1
    for j in range(len(up[0])):
        if (k >> j) & 1:
            u = up[u][j]
            if u == -1:
                return -1
    return u

def lca(up, depth, u, v):
    if depth[u] < depth[v]:
        u, v = v, u
    d = depth[u] - depth[v]
    u = kth_ancestor(up, u, d, depth)
    if u == v:
        return u
    LOG = len(up[0])
    for j in range(LOG - 1, -1, -1):
        if up[u][j] != up[v][j]:
            u, v = up[u][j], up[v][j]
    return up[u][0]
Tree: root 0, children 1,2; 1's child 3; 2's child 4. parent = [-1,0,0,1,2], depth = [0,1,1,2,2]. up[3][0]=1, up[3][1]=0; up[4][0]=2, up[4][1]=0. kth_ancestor(3, 2) = up[3][1] = 0. LCA(3, 4): both nodes are at depth 2, so no leveling is needed. For j=2 both entries are -1 and for j=1 both are 0, so they are equal and we do not jump (a jump would land on or above the LCA). For j=0, up[3][0]=1 and up[4][0]=2 differ, so we set u=1, v=2. The loop ends with u and v one step below the LCA, so LCA = up[1][0] = 0.
Time and Space Complexity
- Preprocessing: O(n log n) time and space (n nodes × log n levels).
- K-th ancestor: O(log n) per query.
- LCA: O(log n) per query.
When to Use
Use binary lifting when you have a static tree and many k-th ancestor or LCA queries. For a single LCA query, one traversal to compute parent/depth followed by a simple climb is O(n); binary lifting pays off when you have many queries. Also used in advanced tree techniques (e.g. path aggregates with segment tree over Euler tour).
Edge Cases
- Root: parent[root] = -1; up[root][j] = -1 for all j. kth_ancestor(root, 0) = root; kth_ancestor(root, 1) = -1.
- K > depth: kth_ancestor(u, k) should return -1 (or invalid) when k exceeds depth[u].
- LCA(u, u): return u. Same depth step is a no-op when depths are equal.
Common Mistakes
- Processing order in build. Compute up[u][j] only after the level j−1 entries it reads are ready: either fill the table level by level (j in the outer loop) or process nodes in BFS/level order so the parent's row is complete first.
- LCA loop direction. In the "lift both" step, iterate j from LOG−1 down to 0 so you take the largest possible steps first and don't overshoot the LCA.
"I'll use binary lifting: precompute up[u][j] = 2^j-th ancestor in O(n log n). K-th ancestor: break k into bits and jump. LCA: bring both to same depth with k-th ancestor, then lift both while up[u][j] != up[v][j]. Final parent is LCA. O(log n) per query."
Practice Problems
- LeetCode 1483: Kth Ancestor of a Tree Node (binary lifting).
- LeetCode 236: Lowest Common Ancestor of a Binary Tree (also solvable with binary lifting on general tree after parent/depth build).
- Distance between two nodes: depth[u] + depth[v] − 2*depth[LCA(u,v)].
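The distance formula can be tried end to end with a compact binary-lifting sketch. The helper names below (build_up, lca_of, tree_distance) are chosen for this example, and parent/depth are assumed to be precomputed with -1 as the root's parent.

```python
def build_up(parent):
    """up[u][j] = 2^j-th ancestor of u, or -1 if it does not exist."""
    n = len(parent)
    log = max(1, n.bit_length())
    up = [[-1] * log for _ in range(n)]
    for u in range(n):
        up[u][0] = parent[u]
    for j in range(1, log):           # level by level, so level j-1 is complete
        for u in range(n):
            mid = up[u][j - 1]
            if mid != -1:
                up[u][j] = up[mid][j - 1]
    return up


def lca_of(up, depth, u, v):
    if depth[u] < depth[v]:
        u, v = v, u
    diff = depth[u] - depth[v]
    j = 0
    while diff:                       # lift the deeper node bit by bit
        if diff & 1:
            u = up[u][j]
        diff >>= 1
        j += 1
    if u == v:
        return u
    for j in range(len(up[0]) - 1, -1, -1):
        if up[u][j] != up[v][j]:      # jump only while still below the LCA
            u, v = up[u][j], up[v][j]
    return up[u][0]


def tree_distance(up, depth, u, v):
    """Number of edges on the path between u and v."""
    return depth[u] + depth[v] - 2 * depth[lca_of(up, depth, u, v)]


parent = [-1, 0, 0, 1, 2]             # root 0; 1, 2 under 0; 3 under 1; 4 under 2
depth = [0, 1, 1, 2, 2]
up = build_up(parent)
```

On this tree the path from 3 to 4 runs 3 → 1 → 0 → 2 → 4, so tree_distance(up, depth, 3, 4) is 4.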
up[u][0]=parent; up[u][j]=up[up[u][j-1]][j-1]. K-th ancestor: for each bit in k, u=up[u][j]. LCA: same depth (k-th ancestor), then for j from high to 0 if up[u][j]!=up[v][j] lift both; return up[u][0].
Summary
- Binary lifting: Precompute up[u][j] = 2^j-th ancestor; O(n log n) build, O(log n) k-th ancestor and LCA.
- K-th ancestor: decompose k in binary and jump. LCA: same depth, then lift both until parents match; that parent is LCA.
11.17 Euler Tour (Tree Flattening)
Introduction
An Euler Tour (or tree flattening) is a DFS order that visits each node when entering and when leaving the subtree. The result is an array where every subtree corresponds to a contiguous segment: if node u has in[u] and out[u] (first and last time we visit it), then the segment [in[u], out[u]] contains exactly the nodes in the subtree of u. This lets you answer "subtree queries" (e.g. sum or update all nodes in subtree of u) using a segment tree or Fenwick tree on the flattened array. Build: one DFS, O(n).
Formal Definition
For a rooted tree, perform a DFS from the root. When we first enter a node u, append u to the tour and set in[u] = current_index. When we finish processing all children and backtrack from u, either append u again (full tour) or just record out[u] = the last index used inside u's subtree. Either way, the subtree of u corresponds to the contiguous range [in[u], out[u]] in the tour. In the variant where we push at enter and at exit, every node of the subtree appears twice in that range, so it has length 2·subtree_size(u). For subtree queries we usually use the variant that stores each node once, in order of first visit: then [in[u], out[u]] has length subtree_size(u) and contains exactly the nodes of the subtree.
Two Common Variants
- Enter + exit (full tour): Push node when entering and when leaving. Segment [in[u], out[u]] contains every node in the subtree twice (u itself at the two boundaries). Useful for path queries (e.g. count edges on path) with a different structure.
- Subtree flattening (one index per node): Store each node at the time of first visit. Fill an array ord[] so that ord[in[u]] = u and the range in[u] .. out[u] is exactly the in-times of all nodes in the subtree of u. Then subtree of u = segment [in[u], out[u]] in ord (or in a value array indexed by in-time).
Mental Model
Imagine walking along the edges of the tree: start at root, go down to a child, eventually backtrack. Write down the node every time you "enter" it. The list you get has the property: for any node u, the block of entries that starts at "enter u" and ends at the last descendant entered before the walk leaves u's subtree for good is exactly the subtree of u. So [in[u], out[u]] = subtree of u in the flattened array.
Build (DFS)
Initialize a timer timer = 0. For each node u: set in[u] = timer, then timer += 1 (or append u to tour). Recurse on all children. Set out[u] = timer - 1 (last index that belongs to u's subtree). So subtree of u = [in[u], out[u]] inclusive. O(n) time.
Diagram: Euler Tour (subtree = contiguous segment)
Tree:      r          DFS order (enter only): r, 1, 3, 2, 4
          / \         in[r]=0, in[1]=1, in[3]=2, in[2]=3, in[4]=4
         1   2        out[r]=4, out[1]=2, out[3]=2, out[2]=4, out[4]=4
        /     \       Subtree of 1 = [1,2] -> nodes 1,3. Subtree of r = [0,4] -> all.
       3       4
(subtree of 1: 1,3)
def euler_tour(g, root=0):
    n = len(g)
    in_time = [0] * n
    out_time = [0] * n
    timer = [0]  # one-element list so the nested dfs can mutate it

    def dfs(u, parent):
        in_time[u] = timer[0]
        timer[0] += 1
        for v in g[u]:
            if v != parent:
                dfs(v, u)
        out_time[u] = timer[0] - 1  # last index used inside u's subtree

    dfs(root, -1)
    return in_time, out_time

# Subtree of u in flattened array = [in_time[u], out_time[u]]
# Use with segment tree / BIT: value at index in_time[u] = value of node u
Use Case: Subtree Sum / Update
Store val[in_time[u]] = value_of_node_u. Subtree sum for u = range query [in_time[u], out_time[u]] on a segment tree or BIT. Point update at node u = update index in_time[u]. Range update on subtree of u = update segment [in_time[u], out_time[u]]. O(log n) per query with segment tree/BIT.
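The use case above can be sketched end to end as follows. This is a minimal illustration, not a reference implementation: the Fenwick class, the flatten helper (an iterative version of the DFS, chosen here to avoid recursion limits), the example tree, and the node values are all made up for the demonstration.

```python
class Fenwick:
    """Fenwick tree (BIT) with a 0-based public interface: point add, range sum."""

    def __init__(self, n):
        self.n = n
        self.tree = [0] * (n + 1)  # internal storage is 1-based

    def add(self, i, delta):
        """Add delta at 0-based index i."""
        i += 1
        while i <= self.n:
            self.tree[i] += delta
            i += i & -i

    def prefix(self, i):
        """Sum of 0-based indices 0..i inclusive; 0 when i < 0."""
        i += 1
        total = 0
        while i > 0:
            total += self.tree[i]
            i -= i & -i
        return total

    def range_sum(self, lo, hi):
        return self.prefix(hi) - self.prefix(lo - 1)


def flatten(g, root=0):
    """Iterative DFS returning (in_time, out_time), one index per node."""
    n = len(g)
    in_time, out_time = [0] * n, [0] * n
    timer = 0
    stack = [(root, -1, False)]
    while stack:
        u, parent, done = stack.pop()
        if done:
            out_time[u] = timer - 1  # last index used inside u's subtree
            continue
        in_time[u] = timer
        timer += 1
        stack.append((u, parent, True))  # revisit u after its children
        for v in reversed(g[u]):
            if v != parent:
                stack.append((v, u, False))
    return in_time, out_time


# Hypothetical tree: 0 is the root with children 1 and 2; 1 has child 3; 2 has child 4.
g = [[1, 2], [0, 3], [0, 4], [1], [2]]
node_value = [10, 20, 30, 40, 50]  # made-up values

in_time, out_time = flatten(g)
bit = Fenwick(len(g))
for u, val in enumerate(node_value):
    bit.add(in_time[u], val)  # value of node u lives at index in_time[u]


def subtree_sum(u):
    return bit.range_sum(in_time[u], out_time[u])


print(subtree_sum(1))   # 60  (nodes 1 and 3)
print(subtree_sum(0))   # 150 (whole tree)
bit.add(in_time[3], 5)  # point update at node 3
print(subtree_sum(1))   # 65
```

A plain Fenwick tree covers point update plus subtree sum. Adding a value to every node of a subtree needs a range-update structure instead, such as a lazy segment tree.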
Time and Space Complexity
- Build: O(n) one DFS. in/out arrays: O(n).
- Subtree query/update: O(log n) with segment tree or BIT on the flattened array of size n.
Edge Cases
- Single node: in[u] = out[u]; segment has one element.
- Root: Subtree of root = [0, n−1].
Common Mistakes
- Confusing in/out with "enter/exit" variant. For subtree = contiguous segment, use "first visit" only and out[u] = last index in subtree (as built above).
- 0-based vs 1-based. Segment tree/BIT on indices 0..n−1: use in_time and out_time as 0-based; if BIT is 1-based, use in_time[u]+1.
"I'll flatten the tree with a DFS: in[u] when we enter, out[u] when we leave. Subtree of u is the contiguous segment [in[u], out[u]]. Then I can use a segment tree or BIT on that array for subtree sum/update in O(log n)."
Practice Problems
- Subtree sum queries / subtree update (e.g. add x to all nodes in subtree of u).
- Problems that need "all nodes in subtree" as a range on an array (e.g. tree + segment tree).
Euler tour: DFS, in[u] = timer++, recurse, out[u] = timer−1. Subtree of u = [in[u], out[u]]. Put node values at in[u]; use segment tree/BIT for subtree queries/updates.
Summary
- Euler tour (flattening): DFS to get in[u], out[u]; subtree of u = contiguous segment [in[u], out[u]].
- Use with segment tree or BIT for O(log n) subtree sum/update. Build O(n).
11.18 Heavy-Light Decomposition
Introduction
Heavy-Light Decomposition (HLD) splits a rooted tree into a set of heavy paths such that any path from a node to the root intersects at most O(log n) heavy paths. Each heavy path is a contiguous segment in a DFS order, so you can use a segment tree (or similar) on the concatenation of these paths to support path queries (e.g. max edge weight on path u–v) and sometimes path updates. HLD is used when you need many path-aggregate queries/updates on a tree; building it is O(n), and each path query is O(log² n) with a segment tree over chains.
Formal Definition
For each node u, the heavy child is the child whose subtree has the largest size (break ties arbitrarily). The heavy edge is the edge from u to its heavy child; all other edges from u to children are light edges. A heavy path is a maximal sequence of nodes connected by heavy edges. Each node belongs to exactly one heavy path. The root of that path is the head of the chain. When we DFS and assign positions, we visit the heavy child first so that each heavy path becomes a contiguous segment in the DFS order.
Mental Model
From each node, pick the "heaviest" child (largest subtree) and call that edge heavy; the rest are light. Following heavy edges from any node leads down to a leaf and forms one chain. The tree is partitioned into chains; a path from u to root goes "up" and may switch chains at light edges. A light child's subtree is at most half the size of its parent's subtree, so every time you climb across a light edge the subtree size at least doubles. You therefore switch chains at most O(log n) times.
Steps to Build
- Compute size[u] for all nodes (DFS).
- For each node, mark the heavy child (child with max size).
- DFS again (heavy child first) to assign pos[u] in a global array and head[u] (chain head). Each chain is contiguous in the global array.
Then map node values to the segment tree at indices pos[u]. Path from u to v: find LCA w; path u–v = path u–w + path w–v. To query path u–w: while u is not in the same chain as w, query segment [pos[head[u]], pos[u]], then move u to parent of head[u]. Repeat until u and w are in the same chain; then query [pos[w], pos[u]]. Same for the other half. O(log n) chain jumps × O(log n) segment tree query = O(log² n).
Diagram: Heavy vs light edges
Tree (numbers = subtree sizes):      Edge labels: H = heavy, L = light
        5 (root)                            r
       / \                                 / \
      3   1                               H   L
     / \                                 / \
    1   1                               H   L
Heavy child = child with largest subtree (ties broken arbitrarily). Path from leaf to root crosses O(log n) light edges.
def dfs_size(g, u, p, size):
    size[u] = 1
    for v in g[u]:
        if v != p:
            dfs_size(g, v, u, size)
            size[u] += size[v]

def dfs_hld(g, u, p, size, head, pos, head_of, timer):
    # head = head of the chain that u belongs to; timer = one-element list used as a counter
    head_of[u] = head
    pos[u] = timer[0]
    timer[0] += 1
    heavy = None
    for v in g[u]:
        if v != p and (heavy is None or size[v] > size[heavy]):
            heavy = v
    if heavy is not None:
        dfs_hld(g, heavy, u, size, head, pos, head_of, timer)  # heavy child continues the chain
    for v in g[u]:
        if v != p and v != heavy:
            dfs_hld(g, v, u, size, v, pos, head_of, timer)  # new chain, head = v

# Query path u -> v: split at LCA w. For u->w: while u not in chain of w, query [pos[head_of[u]], pos[u]], u = parent[head_of[u]]; then query [pos[w], pos[u]].
# Same for v->w. Combine results (e.g. max of segments).
# Note: parent[] is not filled in by the functions above; record it during dfs_size (parent[v] = u) if you need it.
Path Query (u to v)
Let w = LCA(u, v). Path u–v = path u–w plus path w–v. To get aggregate from u to w: while u and w are in different chains, query the segment from pos[head_of[u]] to pos[u], then set u = parent[head_of[u]]. When u and w are in the same chain, query [pos[w], pos[u]] and combine. Do the same for v to w. Combine the two halves (e.g. take max, or add for sums). Both halves end at w, so for aggregates where duplicates matter (such as sum) include w only once, for example by querying [pos[w] + 1, pos[v]] on the second half.
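The path query can be sketched as runnable code. This sketch uses a common variant rather than the explicit LCA split described above: it repeatedly climbs whichever endpoint has the deeper chain head, so the LCA is reached implicitly and each node is covered exactly once. The names MaxSegTree, build_hld, and path_max, the example tree, and the node values are all hypothetical. The build is iterative and keeps each chain contiguous, but not each subtree.

```python
class MaxSegTree:
    """Iterative segment tree for range maximum over a fixed array."""

    def __init__(self, values):
        self.n = len(values)
        self.t = [float("-inf")] * (2 * self.n)
        self.t[self.n:] = values
        for i in range(self.n - 1, 0, -1):
            self.t[i] = max(self.t[2 * i], self.t[2 * i + 1])

    def query(self, lo, hi):
        """Maximum over indices lo..hi inclusive."""
        best = float("-inf")
        lo += self.n
        hi += self.n + 1
        while lo < hi:
            if lo & 1:
                best = max(best, self.t[lo])
                lo += 1
            if hi & 1:
                hi -= 1
                best = max(best, self.t[hi])
            lo >>= 1
            hi >>= 1
        return best


def build_hld(g, root=0):
    n = len(g)
    parent, depth, size = [-1] * n, [0] * n, [1] * n
    order, stack = [], [root]
    while stack:  # first pass: parents, depths, visiting order
        u = stack.pop()
        order.append(u)
        for v in g[u]:
            if v != parent[u]:
                parent[v], depth[v] = u, depth[u] + 1
                stack.append(v)
    for u in reversed(order):  # children before parents: subtree sizes
        if parent[u] != -1:
            size[parent[u]] += size[u]
    head, pos, timer = [0] * n, [0] * n, 0
    stack = [(root, root)]
    while stack:  # second pass: walk each heavy chain top to bottom
        u, h = stack.pop()
        while u != -1:
            head[u], pos[u] = h, timer
            timer += 1
            children = [v for v in g[u] if v != parent[u]]
            heavy = max(children, key=lambda v: size[v], default=-1)
            for v in children:
                if v != heavy:
                    stack.append((v, v))  # a light child starts a new chain
            u = heavy
    return parent, depth, head, pos


def path_max(u, v, parent, depth, head, pos, seg):
    best = float("-inf")
    while head[u] != head[v]:
        if depth[head[u]] < depth[head[v]]:
            u, v = v, u  # always climb the endpoint with the deeper chain head
        best = max(best, seg.query(pos[head[u]], pos[u]))
        u = parent[head[u]]
    lo, hi = sorted((pos[u], pos[v]))  # same chain: one final segment
    return max(best, seg.query(lo, hi))


# Hypothetical tree with edges 0-1, 0-2, 1-3, 1-4, 4-5 and made-up node values.
g = [[1, 2], [0, 3, 4], [0], [1], [1, 5], [4]]
value = [5, 1, 9, 7, 3, 8]
parent, depth, head, pos = build_hld(g)
base = [0] * len(g)
for u in range(len(g)):
    base[pos[u]] = value[u]
seg = MaxSegTree(base)

print(path_max(3, 5, parent, depth, head, pos, seg))  # 8 (path 3-1-4-5)
print(path_max(2, 3, parent, depth, head, pos, seg))  # 9 (path 2-0-1-3)
```

Swapping max for another associative operation (min, sum) only changes the segment tree and the combine step. Because this variant never queries the same position twice, sums need no special handling of the LCA.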
Time and Space Complexity
- Build: O(n) for size DFS + HLD DFS.
- Path query/update: O(log² n) with segment tree (O(log n) chain jumps × O(log n) per segment query).
- Space: O(n) for pos, head, size, plus segment tree O(n).
When to Use
Use HLD when you need path queries or updates (max/min/sum on path u–v, or update edges/nodes on a path). For subtree queries only, Euler tour + segment tree is simpler. HLD is heavier to implement but standard for path problems on trees.
Edge Cases
- Leaf: No heavy child; it starts its own chain (head = itself).
- Path u–u: Single node; return value at u (or identity for aggregate).
Common Mistakes
- Querying the wrong segment. When u and w are in the same chain, segment is [pos[w], pos[u]] (assuming pos[w] ≤ pos[u]); order depends on DFS. Be consistent with depth.
- Forgetting to combine both halves. Path u–v = u–LCA + LCA–v; combine the two aggregates correctly (e.g. for max, take max of both; for sum, add).
"I'll use heavy-light decomposition: heavy child = largest subtree. Heavy paths are contiguous in DFS order. Path from u to v: go to LCA in O(log n) steps, each step querying one segment on a segment tree. Total O(log² n) per path query. Use when we need path max/sum/update."
Practice Problems
- Path maximum/minimum/sum query on a tree (nodes or edges).
- Path update (e.g. add x to all nodes on path u–v).
- CP/contest problems that explicitly ask for HLD or "path queries on tree."
HLD: heavy child = max subtree size; DFS heavy first so each chain is contiguous. path_query(u,v) = split at LCA; climb from u (and v) by chains, query segment [pos[head[u]], pos[u]], then move to parent of head. O(log² n) with segment tree.
Summary
- HLD: Partition tree into heavy paths; any path to root crosses O(log n) chains. Chains are contiguous in segment tree.
- Path query: climb from u and v to LCA, query segment per chain; O(log² n). Build O(n).
11.19 Centroid Decomposition
Introduction
Centroid Decomposition is a technique that recursively splits a tree by centroids. A centroid is a node whose removal leaves no connected component of size greater than n/2. Every tree has at least one centroid (and at most two). We pick a centroid, solve for it (e.g. count paths through it), then remove it and recurse on each remaining component. The recursion depth is O(log n), so total work over all levels is often O(n log n). It is used for problems like "count pairs of nodes (u, v) such that distance(u, v) = k" or "sum of distances to all nodes."
Formal Definition
In a tree of n nodes, a node c is a centroid if every connected component of the tree after removing c has size ≤ n/2. Equivalently: for every neighbor v of c, the size of the subtree of v (when rooting at c) is ≤ n/2. To find a centroid: start at any node, then repeatedly move to the neighbor that has subtree size > n/2 until no such neighbor exists; that node is a centroid. The centroid tree is built by: pick centroid c, remove c, recurse on each component; the centroid tree has c as root and its children are the roots of the centroid trees of those components. Depth of centroid tree is O(log n).
Mental Model
Think of the centroid as the "balance point" of the tree: no single branch has more than half the nodes. After removing it, we have several smaller trees; we recursively find their centroids and make them children of the current centroid. Every node has at most O(log n) ancestors in the centroid tree, and any path u–v in the original tree passes through the lowest common ancestor of u and v in the centroid tree. That lets us count paths by "paths through centroid c" and then recurse.
Finding a Centroid
1) Root the tree arbitrarily; compute size[u] for all nodes (DFS). 2) Start at the root. 3) If there exists a child v such that size[v] > n/2, move to v and repeat. 4) The node where we stop is a centroid. O(n) time.
Using Centroid Decomposition (e.g. count paths of length k)
At each centroid c: count paths of length k that pass through c. Such a path has one endpoint in one component (after removing c) and the other in another (or c itself). For each component, compute distances from c to all nodes in that component; store counts by distance. Then for each distance d in one component, we need k−d in another; add to answer. Then mark c as removed and recurse on each component. Total O(n log n) if we do O(size) work per centroid.
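The counting procedure above can be sketched as follows. It counts unordered pairs of distinct nodes at distance exactly k in an unweighted tree, assuming k ≥ 1. The function name count_paths_of_length_k and its helpers are hypothetical. The sketch is recursive, so very deep trees would need an iterative rewrite or a raised recursion limit.

```python
def count_paths_of_length_k(g, k):
    """Count unordered node pairs at distance exactly k (tree as adjacency lists)."""
    n = len(g)
    removed = [False] * n
    size = [0] * n
    answer = 0

    def compute_size(u, p):
        size[u] = 1
        for v in g[u]:
            if v != p and not removed[v]:
                compute_size(v, u)
                size[u] += size[v]

    def find_centroid(u, p, total):
        for v in g[u]:
            if v != p and not removed[v] and size[v] > total // 2:
                return find_centroid(v, u, total)
        return u

    def collect(u, p, d, out):
        """Record the distance from the centroid for each node of one branch."""
        if d > k:
            return  # deeper nodes cannot be an endpoint of a length-k path through c
        out.append(d)
        for v in g[u]:
            if v != p and not removed[v]:
                collect(v, u, d + 1, out)

    def decompose(start):
        nonlocal answer
        compute_size(start, -1)
        c = find_centroid(start, -1, size[start])
        seen = {0: 1}  # distance -> count, over c itself and branches handled so far
        for v in g[c]:
            if removed[v]:
                continue
            dists = []
            collect(v, c, 1, dists)
            for d in dists:  # pair this branch only with earlier branches (or c)
                answer += seen.get(k - d, 0)
            for d in dists:  # make the branch visible after it has been paired
                seen[d] = seen.get(d, 0) + 1
        removed[c] = True
        for v in g[c]:
            if not removed[v]:
                decompose(v)

    decompose(0)
    return answer


# Hypothetical examples: a path 0-1-2-3-4 and a star with center 0.
path = [[1], [0, 2], [1, 3], [2, 4], [3]]
star = [[1, 2, 3], [0], [0], [0]]
print(count_paths_of_length_k(path, 2))  # 3: (0,2), (1,3), (2,4)
print(count_paths_of_length_k(star, 2))  # 3: every pair of leaves
```

Pairing each branch against the counts from earlier branches, and only then merging it into seen, avoids counting two endpoints from the same branch and counts each pair once.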
Diagram: Centroid and decomposition
Tree (n=5):        Centroid: remove 2 -> components {1,4} and {3,5}, sizes 2 and 2 (both <= 5/2)
  1 - 2 - 3        So 2 is a centroid. Centroid tree: 2 is root; one child = centroid of {1,4};
  |       |        the other = centroid of {3,5}. Depth O(log n).
  4       5
def get_size(g, u, p, size, removed):
    size[u] = 1
    for v in g[u]:
        if v != p and not removed[v]:
            get_size(g, v, u, size, removed)
            size[u] += size[v]

def find_centroid(g, u, p, n, size, removed):
    for v in g[u]:
        if v != p and not removed[v] and size[v] > n // 2:
            return find_centroid(g, v, u, n, size, removed)
    return u

def decompose(g, u, removed, parent_centroid):
    size = {}
    get_size(g, u, -1, size, removed)
    n = size[u]
    c = find_centroid(g, u, -1, n, size, removed)
    # parent_centroid is c's parent in the centroid tree; store it here if needed
    # Process paths through c here (e.g. count paths of length k),
    # before recursing, while c's component is still intact
    removed[c] = True
    for v in g[c]:
        if not removed[v]:
            decompose(g, v, removed, c)
    removed[c] = False  # only if we need to traverse again; often we don't restore
Time and Space Complexity
- Find centroid: O(n) per level. Recursion depth O(log n), so total O(n log n) for building the decomposition (if we do size DFS per centroid).
- Path-count type problems: Often O(n log n) or O(n log² n) depending on how we aggregate per centroid.
- Space: O(n) for size, removed, and centroid tree.
When to Use
Use centroid decomposition when the problem asks for counting paths with a property (e.g. length = k, sum of weights = k), or aggregating over all pairs (e.g. sum of distances). The key is "paths through current centroid" then recurse. For single-source or single-path queries, BFS/DFS or LCA may be simpler.
Edge Cases
- n = 1: The only node is the centroid; no children.
- Path graph (a chain): Centroid is the middle node(s); recursion depth is O(log n).
Common Mistakes
- Size in wrong tree. When computing size for "find centroid," only count nodes in the current component (ignore removed nodes).
- Counting the same path twice. When counting paths through c, ensure you count pairs (u, v) with u and v in different components (or one is c) and don't double-count.
"I'll use centroid decomposition: find a node whose removal leaves no component larger than n/2, count paths through it (e.g. by distance buckets in each component), then recurse on components. Depth O(log n), so total O(n log n) for path-counting problems."
Practice Problems
- Count pairs (u, v) such that dist(u, v) = k.
- Sum of distances between all pairs, or from a set of nodes to all others.
- Problems that say "paths in a tree" and need to aggregate over many paths.
Centroid: remove it, all components size ≤ n/2. Find by moving to child with size > n/2 until none. Decompose: pick centroid, solve paths through it, recurse on components. Depth O(log n); use for path-count and pair-aggregate problems.
Summary
- Centroid: Node whose removal leaves no component of size > n/2. Found in O(n).
- Centroid decomposition: Recursively pick centroid, count/solve paths through it, recurse. O(log n) depth, often O(n log n) total. Use for path counting and distance problems.
Section 12: Heap
This section covers heaps: complete binary trees that satisfy the heap property. You will learn Min Heap and Max Heap, heapify, and classic patterns like Top K elements, merge K sorted lists, and median in a stream. Master these to tackle priority-queue problems in interviews and contests.
12.1 Min Heap
Introduction
A min heap is a complete binary tree where every node has a value smaller than or equal to the values of its children. The smallest element is always at the root. Min heaps are used whenever you need fast access to the minimum (e.g. priority queues, scheduling, finding the K smallest elements). Unlike a sorted array, you can insert and remove the minimum in O(log n) time while keeping the structure valid.
Real-World Analogy
Imagine a hospital emergency queue: patients are not served in arrival order but by priority (e.g. severity). The person with the smallest “priority number” (most urgent) is at the front. When someone new arrives, they are placed in the right spot; when the front is served, the next most urgent rises to the top. The min heap is exactly this: the “smallest” (highest priority) is always at the root, and updates are done in logarithmic time.
Formal Definition
- Complete binary tree: All levels are fully filled except possibly the last, which is filled from left to right.
- Min-heap property: For every node i, value(i) ≤ value(children of i). So the root has the minimum value in the entire tree.
- We do not require ordering between siblings (e.g. left vs right); only parent ≤ children.
Why This Topic Matters
Min heaps power priority queues, which appear in scheduling, graph algorithms (Dijkstra), merge K sorted lists, Top K problems, and median-finding. Interviewers often ask you to implement a heap from scratch or to recognize when “always take the smallest/largest” suggests a heap. Understanding the array representation and the bubble-up / bubble-down operations is essential for both implementation and complexity analysis.
Mental Model
Think of the heap as a pyramid of values: the smallest sits on top. When you add a new value, drop it at the next free spot (bottom-right of the tree), then let it float up by swapping with its parent until the parent is smaller or you hit the root. When you remove the minimum, you take the root, replace it with the last element in the tree, then let that value sink down by swapping with the smaller child until both children are larger or you hit a leaf. “Float up” and “sink down” keep the min-heap property with O(log n) work per operation.
Array Representation (Critical)
We store the heap in a 0-indexed array and use index arithmetic to navigate the tree:
- Root at index 0.
- For a node at index i: parent at (i - 1) // 2, left child at 2*i + 1, right child at 2*i + 2.
So we never need pointers; the tree structure is implicit. The array must stay “complete”: we always add at the end and remove from the end when we pop the root.
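The index arithmetic can be checked on a small array. In this sketch the helper names parent, left, right, and is_min_heap are illustrative, and the sample array is just one valid min heap.

```python
def parent(i):
    return (i - 1) // 2


def left(i):
    return 2 * i + 1


def right(i):
    return 2 * i + 2


def is_min_heap(a):
    """True if every non-root element is >= its parent."""
    return all(a[parent(i)] <= a[i] for i in range(1, len(a)))


heap = [2, 5, 7, 9, 6, 10, 8]
print(heap[parent(4)])                 # 5
print(heap[left(1)], heap[right(1)])   # 9 6
print(is_min_heap(heap))               # True
print(is_min_heap([2, 5, 1]))          # False: 1 is smaller than its parent 2
```

Checking each element against its parent covers every parent-child edge exactly once, so the validity test is a single O(n) pass.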
ASCII Diagram
Min heap (tree view) Array: [2, 5, 7, 9, 6, 10, 8]
2 (root, min)
/ \
5 7
/ \ / \
9 6 10 8
Index: 0 1 2 3 4 5 6
Parent of 4: (4-1)//2 = 1 → value 5
Left of 1: 2*1+1 = 3 → value 9
Right of 1: 2*1+2 = 4 → value 6
Every node ≤ its children.
Core Operations
1. Insert (push)
- Append the new element at the end of the array (next free position in the complete tree).
- Bubble up (sift-up): Compare with parent; if smaller, swap with parent and repeat until the parent is smaller or equal, or we reach the root.
Cost: O(log n) because the path from leaf to root has at most ⌊log₂ n⌋ + 1 nodes.
2. Get minimum (peek)
Return the element at index 0. No structural change. O(1).
3. Remove minimum (pop)
- Save the root (minimum) to return later.
- Replace the root with the last element of the array and shrink the size by one.
- Bubble down (sift-down): Compare the new root with both children; if it is greater than the smaller child, swap with that child and repeat until both children are ≥ current or we reach a leaf.
Cost: O(log n) because we traverse at most one path from root to leaf.
Python Implementation
class MinHeap:
    def __init__(self):
        self.heap = []

    def _parent(self, i):
        return (i - 1) // 2

    def _left(self, i):
        return 2 * i + 1

    def _right(self, i):
        return 2 * i + 2

    def _swap(self, i, j):
        self.heap[i], self.heap[j] = self.heap[j], self.heap[i]

    def push(self, x):
        self.heap.append(x)
        self._sift_up(len(self.heap) - 1)

    def _sift_up(self, i):
        while i > 0:
            p = self._parent(i)
            if self.heap[i] >= self.heap[p]:
                break
            self._swap(i, p)
            i = p

    def peek(self):
        if not self.heap:
            raise IndexError("peek from empty heap")
        return self.heap[0]

    def pop(self):
        if not self.heap:
            raise IndexError("pop from empty heap")
        n = len(self.heap)
        self._swap(0, n - 1)
        min_val = self.heap.pop()
        if self.heap:
            self._sift_down(0)
        return min_val

    def _sift_down(self, i):
        n = len(self.heap)
        while True:
            left = self._left(i)
            right = self._right(i)
            smallest = i
            if left < n and self.heap[left] < self.heap[smallest]:
                smallest = left
            if right < n and self.heap[right] < self.heap[smallest]:
                smallest = right
            if smallest == i:
                break
            self._swap(i, smallest)
            i = smallest

    def size(self):
        return len(self.heap)
Line-by-Line Explanation
- _parent(i), _left(i), _right(i): Index formulas for the implicit binary tree; keep the logic in one place.
- push(x): Append at end, then _sift_up from that index until the min-heap property holds (current ≥ parent or at root).
- _sift_up(i): While not at root, compare with parent; if smaller, swap and move index to parent.
- pop(): Swap root with last element, remove last (that’s the min), then _sift_down(0) so the new root sinks to the correct level.
- _sift_down(i): Find the smallest among node i and its two children; if the smallest is a child, swap with that child and repeat from the child index; otherwise stop.
Time Complexity
- push: O(log n). One append + at most O(log n) swaps along the path to the root.
- peek: O(1).
- pop: O(log n). Swap + pop + at most O(log n) swaps along one path downward.
- Building a heap from n elements by repeated push: O(n log n). Building in place with heapify (topic 12.3) is O(n).
Space Complexity
O(n) for storing n elements in the array. No extra space proportional to n for the operations (only a few variables for indices).
Edge Cases
- Empty heap: peek and pop should raise or return a sentinel; the implementation above raises IndexError.
- Single element: After one push, one pop leaves the heap empty; no need to sift down.
- Duplicate values: Min heap allows duplicates; any of the equal minima can be returned first. No need for stable ordering unless the problem requires it.
In _sift_down, compare with both children and swap with the smaller one. Swapping with the larger child can break the min-heap property in the other subtree.
Python’s heapq module is a min-heap on a list. Use heapq.heappush(h, x), heapq.heappop(h), heapq.heapify(lst). heappush and heappop take no key or comparator argument, so for a max-heap negate the values, or wrap items in a small class whose __lt__ reverses the comparison. Knowing both the library and the manual implementation makes you interview-ready.
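A short usage sketch of the three heapq calls named above, with made-up input values:

```python
import heapq

# Build a heap by repeated pushes.
h = []
for x in [7, 2, 5, 1, 9]:
    heapq.heappush(h, x)

smallest_peek = h[0]             # peek: the minimum sits at index 0
first = heapq.heappop(h)         # removes and returns the minimum
second = heapq.heappop(h)
print(smallest_peek, first, second)  # 1 1 2

# Turn an existing list into a heap in place, then drain it in sorted order.
data = [8, 3, 6, 4]
heapq.heapify(data)
drained = [heapq.heappop(data) for _ in range(len(data))]
print(drained)                   # [3, 4, 6, 8]
```

The list itself is the heap: there is no separate heap object, so any list kept in heap order by these functions can be inspected with ordinary indexing.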
Evolution: From Naive to Heap
For “repeatedly get the minimum and possibly add new elements”:
- Brute force: Store elements in an unsorted list; each “get min” is O(n) scan, each insert O(1). Total for n get-mins + inserts: O(n²).
- Better: Keep a sorted list; get min O(1), but insert is O(n) to maintain order. Still O(n²) for n operations if we do many inserts.
- Optimal: Min heap — get min O(1), insert and remove min O(log n). n operations → O(n log n).
If the problem involves “K smallest/largest”, “merge K sorted”, “stream of numbers and report median”, or “schedule by priority”, think heap. Clarify whether you can use heapq or must implement from scratch; then use the array index formulas and sift-up/sift-down correctly.
Summary
- Min heap: Complete binary tree with parent ≤ children; minimum at root.
- Array: Index i → parent (i-1)//2, left 2*i+1, right 2*i+2.
- Insert: Append, then sift up. Remove min: Swap root with last, pop last, then sift down. Peek: Return root.
- Time: push O(log n), pop O(log n), peek O(1). Space: O(n).
12.2 Max Heap
Introduction
A max heap is a complete binary tree where every node has a value greater than or equal to the values of its children. The largest element is always at the root. Max heaps are used when you need fast access to the maximum: Top K largest elements, scheduling by highest priority, or any “repeatedly take the biggest” scenario. The structure and array representation are identical to the min heap; only the comparison direction changes.
Relationship to Min Heap
Max heap is the mirror of min heap: replace “smaller” with “larger” in every comparison. Parent ≥ both children; root holds the global maximum. All operations (push, pop, peek) have the same O(log n) or O(1) complexity. In Python’s heapq (which is min-heap only), a common trick for a max heap is to negate values: push -x, and when you pop, take -heapq.heappop(h) to get the real maximum.
Formal Definition
- Complete binary tree: Same as min heap — all levels full except possibly the last, filled left to right.
- Max-heap property: For every node i, value(i) ≥ value(children of i). The root is the maximum.
Why This Topic Matters
Many problems ask for the K largest elements, the maximum in a sliding window, or “always process the highest-priority item.” Max heap (or negated min heap) is the right tool. Interviewers may ask you to implement a max heap from scratch or to adapt a min-heap solution; knowing the single comparison flip makes this straightforward.
Mental Model
Same pyramid as min heap, but the largest is on top. On insert: append at the end, then sift up by swapping with the parent while the current value is greater than the parent. On remove max: save the root, replace root with the last element, pop the last, then sift down by swapping with the larger child until both children are ≤ current or you reach a leaf.
Array Representation
Identical to min heap: root at 0, parent (i-1)//2, left 2*i+1, right 2*i+2. Only the sift logic uses “greater than” instead of “less than.”
ASCII Diagram
Max heap (tree view) Array: [10, 8, 7, 5, 6, 3, 4]
10 (root, max)
/ \
8 7
/ \ / \
5 6 3 4
Every node ≥ its children. Sift-up: swap if current > parent.
Sift-down: swap with the *larger* child if current < that child.
Core Operations (Summary)
- Push: Append, then sift up (swap with parent while heap[i] > heap[parent]).
- Peek: Return heap[0]. O(1).
- Pop (remove max): Swap root with last, pop last, then sift down (swap with the larger child while current is smaller than that child).
Python Implementation
class MaxHeap:
    def __init__(self):
        self.heap = []

    def _parent(self, i):
        return (i - 1) // 2

    def _left(self, i):
        return 2 * i + 1

    def _right(self, i):
        return 2 * i + 2

    def _swap(self, i, j):
        self.heap[i], self.heap[j] = self.heap[j], self.heap[i]

    def push(self, x):
        self.heap.append(x)
        self._sift_up(len(self.heap) - 1)

    def _sift_up(self, i):
        while i > 0:
            p = self._parent(i)
            if self.heap[i] <= self.heap[p]:  # stop when current <= parent
                break
            self._swap(i, p)
            i = p

    def peek(self):
        if not self.heap:
            raise IndexError("peek from empty heap")
        return self.heap[0]

    def pop(self):
        if not self.heap:
            raise IndexError("pop from empty heap")
        n = len(self.heap)
        self._swap(0, n - 1)
        max_val = self.heap.pop()
        if self.heap:
            self._sift_down(0)
        return max_val

    def _sift_down(self, i):
        n = len(self.heap)
        while True:
            left = self._left(i)
            right = self._right(i)
            largest = i
            if left < n and self.heap[left] > self.heap[largest]:  # compare with >
                largest = left
            if right < n and self.heap[right] > self.heap[largest]:
                largest = right
            if largest == i:
                break
            self._swap(i, largest)
            i = largest

    def size(self):
        return len(self.heap)
Min Heap vs Max Heap: Comparison
| Aspect | Min Heap | Max Heap |
|---|---|---|
| Property | Parent ≤ children | Parent ≥ children |
| Root | Minimum | Maximum |
| Sift-up | Swap if current < parent | Swap if current > parent |
| Sift-down | Swap with smaller child | Swap with larger child |
| heapq | Direct use | Negate values: push -x, pop -val |
To use heapq as a max heap: heapq.heappush(h, -x) and max_val = -heapq.heappop(h). For objects, push (-priority, item) to get max-first ordering.
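Both tricks can be seen in a few lines. The numbers and task names below are invented for illustration.

```python
import heapq

# Max heap of numbers: store negated values, negate again when reading.
nums = [3, 10, 4, 8]
h = []
for x in nums:
    heapq.heappush(h, -x)
largest = -heapq.heappop(h)
second = -heapq.heappop(h)
print(largest, second)         # 10 8

# Max-first ordering for objects: push (-priority, item) tuples.
tasks = [(2, "write report"), (5, "fix outage"), (1, "tidy desk")]
pq = []
for priority, name in tasks:
    heapq.heappush(pq, (-priority, name))
neg_priority, name = heapq.heappop(pq)
print(-neg_priority, name)     # 5 fix outage
```

Tuples compare element by element, so when two priorities tie, Python falls back to comparing the items themselves. If the items are not comparable, a common workaround is to add a tie-breaking counter as the second tuple element.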
Time and Space Complexity
Same as min heap: push O(log n), pop O(log n), peek O(1). Space O(n).
When to Use Which
- Min heap: K smallest, merge K sorted (smallest next), Dijkstra (smallest distance), upper half of median (min of upper half).
- Max heap: K largest, “highest priority first,” sliding window max (often with deque or two heaps), lower half of median (max of lower half).
If the problem says “K largest,” use a max heap or a min heap of size K (keep only K elements; pop the smallest when exceeding K). For “K smallest,” use a max heap of size K (pop the largest) or a min heap and pop K times. State which you’re using and why.
Summary
- Max heap: Complete binary tree with parent ≥ children; maximum at root.
- Same array layout as min heap; only comparisons are reversed (use > and “larger child” in sift-down).
- heapq in Python is min-heap only; use negated values for a max heap.
- Same complexities as min heap; choose by whether you need “smallest” or “largest” at the root.
12.3 Heapify
Introduction
Heapify is the operation of turning an arbitrary array of numbers into a valid heap (min or max) in place. You already use it when you call heapq.heapify(lst) or when building a heap from a list. The surprising part: building a heap from n elements can be done in O(n) time, not O(n log n). This lesson explains why that is true and how to implement and use heapify correctly.
Why This Topic Matters
Whenever you have a pre-existing list of values and need a heap (e.g., for Top K, merge K sorted, or a one-time priority queue), building the heap with heapify is faster than pushing each element one by one. Understanding the O(n) analysis also comes up in interviews when you're asked to optimize "build heap from array."
Two Ways to Build a Heap
Method 1: Repeated Insert (Push) — O(n log n)
Start with an empty heap. For each element in the list, call push(x). Each push does O(log n) work in the worst case (sift up), and we do n pushes, so total time is O(n log n). Simple but not optimal when you already have all the data.
Method 2: Heapify (Bottom-Up Sift-Down) — O(n)
Treat the array as a complete binary tree (same index rules: parent (i-1)//2, left 2*i+1, right 2*i+2). Then, starting from the last non-leaf node down to the root, run sift-down (bubble down) at each node. This restores the heap property in one pass and can be proven to run in O(n).
Mental Model
- Leaves are already valid heaps (single nodes).
- Index of the last non-leaf in a 0-based array of length n is n // 2 - 1.
- For each node from that index down to 0, we "fix" the subtree rooted at that node by sifting the node down until it is ≤ both children in a min heap (or ≥ both in a max heap), or it reaches a leaf.
ASCII Diagram: Heapify Order (Min Heap)
Array length n = 7. Last non-leaf index = 7//2 - 1 = 2.
Tree indices: 0
/ \
1 2 ← start heapify here (index 2), then 1, then 0
/ \ / \
3 4 5 6
Order of sift-down: 2 → 1 → 0. At each step, the node may sink down
multiple levels until its subtree satisfies the heap property.
Formal Definition
Heapify (build heap in place): Given an array A[0..n-1], rearrange it so that the complete binary tree represented by the array satisfies the heap property (min or max). The algorithm is: for i = n//2 - 1 down to 0, perform sift-down at index i.
Step-by-Step: Heapify for Min Heap
- Set n = len(arr).
- Last non-leaf index: start = n // 2 - 1.
- For i = start down to 0 (inclusive):
  - Run sift-down at index i: compare with left and right children, swap with the smaller child if the current node is larger, and repeat until the node is ≤ both children or reaches a leaf.
Python Implementation: Heapify (Min Heap)
def heapify_min(arr):
    """Turn list into a min heap in place. O(n)."""
    n = len(arr)

    def sift_down(i):
        while True:
            left = 2 * i + 1
            right = 2 * i + 2
            smallest = i
            if left < n and arr[left] < arr[smallest]:
                smallest = left
            if right < n and arr[right] < arr[smallest]:
                smallest = right
            if smallest == i:
                break
            arr[i], arr[smallest] = arr[smallest], arr[i]
            i = smallest

    # Start from last non-leaf down to root
    for i in range(n // 2 - 1, -1, -1):
        sift_down(i)
Why Is Heapify O(n) and Not O(n log n)?
Intuition: Most nodes are near the bottom of the tree. When we sift down from a node at height h, it can move at most h steps. There are roughly n/2^(h+1) nodes at height h. So the total work is on the order of:
\[ \sum_{h=0}^{\lfloor \log n \rfloor} \frac{n}{2^{h+1}} \cdot h \;\le\; n \sum_{h \ge 0} \frac{h}{2^{h+1}} \;=\; n \cdot 1 \;=\; O(n). \]
The infinite sum converges to a constant (\(\sum_{h \ge 0} \frac{h}{2^h} = 2\), so \(\sum_{h \ge 0} \frac{h}{2^{h+1}} = 1\)), which means the whole build is O(n). This is why heapq.heapify and a proper "build heap from array" implementation are preferred when you start with a full list.
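The linear bound can also be observed empirically. The sketch below is a bottom-up heapify instrumented to count swaps; the function name heapify_count_swaps and the descending test input are choices made for this illustration. Each node can sink at most as many levels as its height, so the swap count stays below n.

```python
def heapify_count_swaps(arr):
    """Bottom-up min-heapify in place; returns the number of swaps performed."""
    n = len(arr)
    swaps = 0
    for start in range(n // 2 - 1, -1, -1):
        i = start
        while True:
            left, right = 2 * i + 1, 2 * i + 2
            smallest = i
            if left < n and arr[left] < arr[smallest]:
                smallest = left
            if right < n and arr[right] < arr[smallest]:
                smallest = right
            if smallest == i:
                break
            arr[i], arr[smallest] = arr[smallest], arr[i]
            swaps += 1
            i = smallest
    return swaps


for n in (10, 1_000, 100_000):
    data = list(range(n, 0, -1))  # descending input forces many swaps
    swaps = heapify_count_swaps(data)
    print(n, swaps, round(swaps / n, 3))  # the ratio stays below 1
```

If the work were Θ(n log n), the printed ratio would grow with n. Here it stays bounded because the total number of swaps never exceeds the sum of all node heights.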
Time and Space Complexity
- Time: O(n) for heapify on n elements.
- Space: O(1) extra if done in place (only the array is modified).
Using heapq.heapify
import heapq
arr = [7, 2, 5, 1, 9, 3]
heapq.heapify(arr) # in-place min heap
# arr is now a valid min-heap (e.g. arr[0] is minimum)
print(arr[0]) # 1
print(heapq.heappop(arr)) # 1
For a max heap, negate the values first: arr_neg = [-x for x in arr], then heapq.heapify(arr_neg). Pop with -heapq.heappop(arr_neg).
Edge Cases
- Empty list: n//2 - 1 is -1 in Python; range(-1, -1, -1) is empty, so there are no iterations, which is correct.
- Single element: n//2 - 1 = -1; loop runs zero times; single node is already a heap.
- Two elements: One non-leaf at index 0; one sift-down may swap them, which is correct.
Common Mistakes
Summary
- Heapify builds a heap from an array in place by running sift-down from the last non-leaf (n//2 - 1) down to the root.
- Complexity is O(n), not O(n log n), because most nodes are low in the tree and move few steps.
- Use heapq.heapify(lst) for a min heap in Python; for max heap, heapify a negated list.
- Prefer heapify over n pushes when you already have all elements: it is faster and uses the same space.
12.4 Top K Elements
Introduction
Top K Elements is one of the most common patterns in coding interviews and real systems: given a collection of items (numbers, strings, objects with a score), find the K largest or K smallest elements. Heaps give an efficient, streaming-friendly solution that avoids sorting the entire dataset. Mastering this pattern unlocks problems like "K closest points," "K most frequent elements," and "merge K sorted lists."
Real-World Analogy
Imagine you run a music app with millions of songs. You want to show users "Top 10 most played this week." You could sort all songs by play count and take the first 10 — but sorting millions of entries is expensive and unnecessary. A smarter approach: keep only a small "candidate set" of size K (e.g., a min heap of the top 10 so far). As you scan through songs, you compare each with the smallest in your top 10; if the new song is bigger, it kicks out the smallest and joins the set. At the end, you have exactly the top K without ever fully sorting the list.
Formal Definition
Input: An array (or stream) of n elements and an integer K (1 ≤ K ≤ n).
Output: The K elements that are largest (or smallest) by some comparison key.
Goal: Achieve better than O(n log n) full sort when possible, and support streaming (process elements one by one) when needed.
Why This Topic Matters
- Interviewers love "Top K" in various forms: K largest, K smallest, K most frequent, K closest.
- Heaps give O(n log K) time and O(K) space — ideal when K is much smaller than n.
- The same idea extends to priority queues in Dijkstra, merge K sorted lists, and finding the median of a stream.
Mental Model
- K largest: Keep a min heap of size K. The root is the "smallest of the top K." If a new element is larger than the root, pop the root and push the new one. At the end, the heap contains the K largest; order inside the heap doesn't matter for the answer.
- K smallest: Keep a max heap of size K (in Python: min heap of negated values). The root is the "largest of the bottom K." If a new element is smaller than that, pop the root and push the new one. Result: K smallest.
Evolution: Brute Force → Better → Optimal
Brute Force: Full Sort
Sort the entire array, then take the first K (for K smallest) or last K (for K largest).
- Time: O(n log n).
- Space: O(n) or O(log n) depending on sort.
Simple but wasteful when K ≪ n.
Better: Partial Sort or Quickselect
Use quickselect to find the K-th smallest (or largest) element, then partition. Or use a library partial sort.
- Time: O(n) average for quickselect; O(n log K) for heap approach.
- Doesn't naturally support streaming (elements arriving one by one).
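As a rough sketch of the quickselect idea, the function below finds the K-th largest value using a random pivot and a three-way partition. The name kth_largest_quickselect is hypothetical, the input is assumed to satisfy 1 <= k <= len(nums), and the O(n) bound is an average; an unlucky run of pivots can degrade to O(n²).

```python
import random


def kth_largest_quickselect(nums, k):
    """Return the k-th largest value (k is 1-based). Assumes 1 <= k <= len(nums)."""
    items = list(nums)           # work on a copy so the caller's list is untouched
    target = len(items) - k      # index of the answer in ascending order
    lo, hi = 0, len(items) - 1
    while True:
        pivot = items[random.randint(lo, hi)]
        # Three-way partition of items[lo..hi]: < pivot | == pivot | > pivot
        lt, i, gt = lo, lo, hi
        while i <= gt:
            if items[i] < pivot:
                items[lt], items[i] = items[i], items[lt]
                lt += 1
                i += 1
            elif items[i] > pivot:
                items[i], items[gt] = items[gt], items[i]
                gt -= 1
            else:
                i += 1
        if target < lt:
            hi = lt - 1          # answer lies among the smaller values
        elif target > gt:
            lo = gt + 1          # answer lies among the larger values
        else:
            return pivot         # target falls inside the block equal to pivot


print(kth_largest_quickselect([3, 2, 1, 5, 6, 4], 2))  # 5
```

Because the whole array must be in memory to partition it, this approach does not fit the streaming case.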
Optimal (for streaming and when K is small): Heap
Maintain a min heap of size K for "K largest" (or max heap of size K for "K smallest"). One pass over the data; each insertion is O(log K).
- Time: O(n log K).
- Space: O(K).
Works for streams and is easy to reason about in interviews.
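To illustrate the streaming use case, here is a small sketch of a tracker that can be fed one value at a time. The class name KLargestStream and its method names are hypothetical, and K is assumed to be at least 1.

```python
import heapq


class KLargestStream:
    """Track the K largest values seen so far (hypothetical helper, assumes k >= 1)."""

    def __init__(self, k):
        self.k = k
        self.heap = []  # min heap holding at most k values

    def add(self, x):
        if len(self.heap) < self.k:
            heapq.heappush(self.heap, x)
        elif x > self.heap[0]:
            heapq.heapreplace(self.heap, x)  # drop the smallest candidate, keep x

    def current(self):
        """Return the K largest so far, largest first."""
        return sorted(self.heap, reverse=True)


stream = KLargestStream(3)
for value in [5, 1, 9, 3, 7, 8]:
    stream.add(value)
print(stream.current())  # [9, 8, 7]
```

Memory stays at O(K) no matter how many values pass through add.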
Step-by-Step: K Largest Using Min Heap
- Create an empty min heap (e.g. a Python list used with heapq).
- For each element x in the array:
  - If heap size < K: heapq.heappush(heap, x).
  - Else: if x > heap[0], then heapq.heapreplace(heap, x) (or pop then push). This keeps the heap size K and drops the smallest of the current top K when a larger candidate appears.
- After the loop, the heap contains exactly the K largest elements. The root is the K-th largest; to get them in sorted order (optional), pop K times or sort the heap.
Python Implementation: K Largest and K Smallest
import heapq

def k_largest(nums, k):
    """Return the K largest elements. Uses min heap of size K."""
    if k <= 0 or not nums:
        return []
    if k >= len(nums):
        return list(nums)
    heap = []
    for x in nums:
        if len(heap) < k:
            heapq.heappush(heap, x)
        elif x > heap[0]:
            heapq.heapreplace(heap, x)  # pop smallest, push x
    return heap  # order is heap order; for sorted: sorted(heap, reverse=True)

def k_smallest(nums, k):
    """Return the K smallest elements. Uses max heap of size K via negated min heap."""
    if k <= 0 or not nums:
        return []
    if k >= len(nums):
        return list(nums)
    heap = []  # min heap of -x => "max heap" of x
    for x in nums:
        if len(heap) < k:
            heapq.heappush(heap, -x)
        elif x < -heap[0]:  # x is smaller than current "max of smallest K"
            heapq.heapreplace(heap, -x)
    return [-x for x in heap]
Examples Section
Example 1: K Largest — Walkthrough
nums = [3, 2, 1, 5, 6, 4], K = 2. We want the 2 largest: 5 and 6.
Min heap of size 2 (only the top 2 candidates):
- Push 3 → heap [3].
- Push 2 → heap [2, 3] (min at root).
- 1 < 2 (the root) → do nothing.
- 5 > 2 → replace: pop 2, push 5 → heap [3, 5].
- 6 > 3 → replace: pop 3, push 6 → heap [5, 6].
- 4 < 5 → do nothing.
Final heap = [5, 6]. The 2 largest are 5 and 6. Root (5) is the 2nd largest.
Example 2: K Smallest — Walkthrough
nums = [7, 10, 4, 3, 20, 15], K = 3. We want the 3 smallest: 3, 4, 7.
Max heap of size 3 implemented as min heap of negatives:
- Push -7 → heap [-7].
- Push -10 → heap [-10, -7]. Root -10 means "largest of small set" is 10.
- Push -4 → heap [-10, -7, -4].
- 3 < 10 (current max of small set) → replace -10 with -3 → heap [-7, -3, -4].
[-7, -3, -4]. - 20 > 7 → ignore.
- 15 > 7 → ignore.
Heap contains [-7, -3, -4] → values [7, 3, 4]. The 3 smallest are 3, 4, 7.
Example 3: Code Run with Output
import heapq
nums = [3, 2, 1, 5, 6, 4]
k = 2
# K largest
heap = []
for x in nums:
    if len(heap) < k:
        heapq.heappush(heap, x)
    elif x > heap[0]:
        heapq.heapreplace(heap, x)
print("K largest (heap order):", heap) # e.g. [5, 6]
print("K-th largest value:", heap[0]) # 5 (2nd largest)
print("Sorted K largest:", sorted(heap, reverse=True)) # [6, 5]
Output:
K largest (heap order): [5, 6]
K-th largest value: 5
Sorted K largest: [6, 5]
Example 4: K Most Frequent (Top K by Frequency)
Given nums = [1, 1, 1, 2, 2, 3] and K = 2, return the 2 most frequent elements: 1 (freq 3) and 2 (freq 2). Here the "score" is frequency; we want "top K by frequency."
Approach: Count frequencies, then use a min heap of size K on (frequency, element). Python compares tuples by first element, then second. So we store (freq, item) and the heap keeps the K pairs with largest freq (smallest of those K at root).
from collections import Counter
import heapq
def top_k_frequent(nums, k):
    count = Counter(nums)
    heap = []
    for num, freq in count.items():
        if len(heap) < k:
            heapq.heappush(heap, (freq, num))
        elif freq > heap[0][0]:
            heapq.heapreplace(heap, (freq, num))
    return [num for _, num in heap]
# Example run
nums = [1, 1, 1, 2, 2, 3]
print(top_k_frequent(nums, 2)) # [2, 1] or [1, 2] (heap order)
Result: the two most frequent elements are 1 and 2.
Time and Space Complexity
- Time: O(n log K) — n elements, each heap operation O(log K).
- Space: O(K) for the heap. If you count a frequency map, O(n) for that; heap alone is O(K).
Edge Cases
- K ≤ 0 or empty array: Return empty list.
- K ≥ n: All elements are "top K"; return a copy of the array (or heap of all).
- K = 1: One pass with a single variable (or a heap of size 1) to track max or min.
- Duplicates: Heap approach naturally handles duplicates; they can coexist in the heap.
Common Mistakes
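- Forgetting that heapq is a min heap. For K smallest you must negate values (or wrap items in a class whose __lt__ is reversed, since the heapq functions take no key or comparator argument) to simulate a max heap of size K.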
Pattern Recognition
Use the Top K heap pattern when you see:
- "K largest," "K smallest," "K most frequent," "K closest," "K closest points to origin."
- Streaming data where you can't sort the whole input.
- Problems that reduce to "maintain a set of K best candidates and update as we scan."
Interview Insight
Practice Problems
- LeetCode 215: Kth Largest Element in an Array (use min heap of size K or quickselect).
- LeetCode 347: Top K Frequent Elements (count + min heap on frequency).
- LeetCode 373: Find K Pairs with Smallest Sums (heap of pairs from two sorted arrays).
- K closest points to origin: maintain min heap of size K by distance (or max heap of K smallest distances).
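The "max heap of K smallest distances" option for K closest points might look like the sketch below. The function name k_closest is hypothetical, points are assumed to be (x, y) tuples of numbers, and squared distance is used so no square root is needed.

```python
import heapq


def k_closest(points, k):
    """Return the k points closest to the origin, in no particular order."""
    if k <= 0:
        return []
    heap = []  # entries are (-squared_distance, x, y): a max heap on distance
    for x, y in points:
        d2 = x * x + y * y
        if len(heap) < k:
            heapq.heappush(heap, (-d2, x, y))
        elif d2 < -heap[0][0]:  # closer than the farthest point currently kept
            heapq.heapreplace(heap, (-d2, x, y))
    return [(x, y) for _, x, y in heap]


print(sorted(k_closest([(1, 3), (-2, 2), (5, 8), (0, 1)], 2)))  # [(-2, 2), (0, 1)]
```

Storing the coordinates after the negated distance lets Python's tuple comparison break ties without a custom class.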
Summary
- Top K largest: Min heap of size K; replace root when a larger element is seen. Result: heap contains K largest; root = K-th largest.
- Top K smallest: Max heap of size K (in Python: min heap of negated values); replace when a smaller element is seen.
- Time O(n log K), space O(K). Beats full sort when K ≪ n and supports streaming.
- Same idea extends to "top K by any key" (e.g. frequency): use (key, item) in the heap and compare by key.
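For quick checks, the standard library already exposes this pattern through heapq.nlargest and heapq.nsmallest, both of which accept an optional key function. The short example below is only an illustration; the sample data is made up.

```python
import heapq
from collections import Counter

nums = [3, 2, 1, 5, 6, 4]
print(heapq.nlargest(2, nums))   # [6, 5]
print(heapq.nsmallest(3, nums))  # [1, 2, 3]

words = ["a", "b", "a", "c", "b", "a"]
count = Counter(words)
# key=count.get ranks each distinct word by its frequency
print(heapq.nlargest(2, count, key=count.get))  # ['a', 'b']
```

In an interview you are usually expected to write the size-K heap loop yourself, but these helpers are handy for verifying a hand-written solution.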
12.5 Merge K Sorted Lists
Introduction
In many problems, we are given K sorted lists (or arrays, or linked lists) and asked to merge them into one sorted list. A naive pairwise merge can be slow when K or the total number of elements is large. A min heap gives a clean, optimal solution that generalizes the two-list merge you saw in merge sort.
Real-World Analogy
Imagine K checkout queues in a supermarket, each already ordered by arrival time. You want to reconstruct the global order in which customers arrived across all queues. You always pick the earliest arriving customer among the queue fronts, then advance that queue. A min heap automates "find the earliest front" in O(log K) time.
Problem Definition
Input: K sorted lists, total of N elements.
Output: A single sorted list containing all N elements.
Goal: Better than O(NK) and easy to implement for arbitrary K.
Brute Force and Better Approaches
Approach 1: Concatenate + Sort (Brute Force)
- Concatenate all K lists into one big list (size N).
- Sort the big list using a standard sort: O(N log N).
Simple, but it ignores the fact that each list is already sorted.
Approach 2: Repeated Pairwise Merge
- Merge list 1 and list 2 (like merge sort) into a new sorted list.
- Merge that result with list 3, and so on.
Each merge of two lists of total size M costs O(M). With unbalanced (left-to-right) merging, the elements of the early lists are re-copied in every later merge, so the total can approach O(NK). Balancing the merges (pairing lists as in a tournament tree) brings this down to O(N log K), but the implementation is more involved.
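A balanced pairwise merge could be sketched as below: lists are merged in rounds, neighbor with neighbor, so there are about log2(K) rounds and each round touches every element once. The names merge_two and merge_k_balanced are hypothetical.

```python
def merge_two(a, b):
    """Standard two-pointer merge of two sorted lists."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])
    out.extend(b[j:])
    return out


def merge_k_balanced(lists):
    """Merge lists in rounds, pairing neighbors, until one list remains."""
    if not lists:
        return []
    current = [list(lst) for lst in lists]
    while len(current) > 1:
        merged = []
        for i in range(0, len(current), 2):
            if i + 1 < len(current):
                merged.append(merge_two(current[i], current[i + 1]))
            else:
                merged.append(current[i])  # odd list out waits for the next round
        current = merged
    return current[0]


print(merge_k_balanced([[1, 4, 5], [1, 3, 4], [2, 6]]))  # [1, 1, 2, 3, 4, 4, 5, 6]
```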
Approach 3 (Optimal and Clean): Min Heap
Use a min heap of size at most K, where each heap entry represents the "current front" element from one of the lists. Repeatedly extract the smallest element from the heap and push the next element from the same list.
- Time: O(N log K) — N heap operations, each O(log K).
- Space: O(K) extra for the heap, plus the output list.
Mental Model
- Visualize each list as a line of sorted items with a pointer at the front.
- The heap always stores at most one element per list: the current front with its list index.
- At each step, you remove the globally smallest front from the heap and advance that list’s pointer.
ASCII Diagram
Given 3 sorted lists:
L0: 1 4 7
L1: 2 5 8
L2: 3 6 9
Initial heap (value, list_index, element_index):
[(1, 0, 0), (2, 1, 0), (3, 2, 0)]
Pop (1,0,0) → output [1], push next from L0 → (4,0,1)
Heap: [(2,1,0), (3,2,0), (4,0,1)]
Pop (2,1,0) → output [1,2], push (5,1,1)
Heap: [(3,2,0), (4,0,1), (5,1,1)]
... continue until heap is empty ...
Final output: [1,2,3,4,5,6,7,8,9]
Python Implementation (Lists of Arrays)
import heapq
def merge_k_sorted_lists(lists):
    """
    Merge K sorted lists (Python lists) into one sorted list.
    lists: List[List[int]]
    Returns: List[int]
    """
    heap = []
    result = []
    # 1) Initialize heap with first element of each non-empty list
    for list_idx, arr in enumerate(lists):
        if arr:  # non-empty
            first_val = arr[0]
            heapq.heappush(heap, (first_val, list_idx, 0))  # (value, which list, index in that list)
    # 2) Extract-min and push next from same list
    while heap:
        val, list_idx, elem_idx = heapq.heappop(heap)
        result.append(val)
        next_idx = elem_idx + 1
        if next_idx < len(lists[list_idx]):
            next_val = lists[list_idx][next_idx]
            heapq.heappush(heap, (next_val, list_idx, next_idx))
    return result
Example: Arrays
lists = [
[1, 4, 5],
[1, 3, 4],
[2, 6]
]
print(merge_k_sorted_lists(lists))
Step-by-step heap evolution (values only):
- Initial heap: [1 (L0), 1 (L1), 2 (L2)] → pop 1 (L0), push 4 → output [1].
- Heap: [1 (L1), 2 (L2), 4 (L0)] → pop 1 (L1), push 3 → output [1, 1].
- Heap: [2 (L2), 4 (L0), 3 (L1)] → pop 2 (L2), push 6 → output [1, 1, 2].
- Heap: [3 (L1), 4 (L0), 6 (L2)] → pop 3, push 4 → output [1, 1, 2, 3].
- Heap: [4 (L0), 6 (L2), 4 (L1)] → pop 4 (L0), push 5 → output [1, 1, 2, 3, 4].
- Heap: [4 (L1), 6 (L2), 5 (L0)] → pop 4 (L1) → output [1, 1, 2, 3, 4, 4].
- Heap: [5 (L0), 6 (L2)] → pop 5 → output [1, 1, 2, 3, 4, 4, 5].
- Heap: [6 (L2)] → pop 6 → output [1, 1, 2, 3, 4, 4, 5, 6].
Final result: [1, 1, 2, 3, 4, 4, 5, 6].
Linked List Variant (LeetCode-style)
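Many interview problems use linked lists instead of arrays. The idea is identical: the heap stores (node.val, list_id, node). On pop, you append the node to the merged list and push node.next if it exists. The list_id breaks ties between equal values, so Python never has to compare two ListNode objects (which would raise a TypeError).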
import heapq
class ListNode:
    def __init__(self, val=0, next=None):
        self.val = val
        self.next = next

def merge_k_sorted_linked_lists(lists):
    """
    lists: List[ListNode] - each is the head of a sorted linked list.
    Returns: head of merged sorted linked list.
    """
    heap = []
    # Initialize heap with head of each list
    for i, node in enumerate(lists):
        if node:
            heapq.heappush(heap, (node.val, i, node))
    dummy = ListNode(0)
    tail = dummy
    while heap:
        val, i, node = heapq.heappop(heap)
        tail.next = node
        tail = tail.next
        if node.next:
            heapq.heappush(heap, (node.next.val, i, node.next))
    return dummy.next
Time and Space Complexity
- N = total number of elements across all K lists.
- Each element is pushed to and popped from the heap at most once.
- Each heap operation costs O(log K) (heap size ≤ K).
- Total time: O(N log K).
- Extra space: O(K) for the heap, plus O(N) for the merged output.
Edge Cases
- All lists empty: Heap is empty, result is empty list.
- Some lists empty: We simply skip them during initialization.
- K = 1: Return that list directly.
- K = 0: Return empty list.
Common Mistakes
Pattern Recognition
Use this pattern whenever you see:
- "Merge K sorted arrays/lists/streams."
- "Multi-way merge" problems (e.g. merging log files from many servers).
- Anything that sounds like "always take the smallest front across multiple sorted sources."
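For the multi-way merge of already sorted sources, the standard library offers heapq.merge, which returns a lazy iterator. The example below is a sketch with made-up log records of the form (timestamp, message); the server names are hypothetical.

```python
import heapq

# Each source is already sorted by timestamp (hypothetical log records)
server_a = [(1, "a: boot"), (4, "a: ready")]
server_b = [(2, "b: boot"), (3, "b: ready")]
server_c = [(5, "c: boot")]

# heapq.merge yields items one at a time; inputs must already be sorted
merged = list(heapq.merge(server_a, server_b, server_c))
print([ts for ts, _ in merged])  # [1, 2, 3, 4, 5]
```

Because the result is produced lazily, the sources can be generators reading from files rather than in-memory lists.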
Interview Insight
Summary
- Goal: Merge K sorted lists with total N elements efficiently.
- Tool: Min heap holding the current front element of each list.
- Complexity: O(N log K) time, O(K) extra space.
- Extremely common pattern in system design (logs, streams) and interviews (e.g. LeetCode 23).
12.6 Median in Stream
Introduction
The Median in a Data Stream problem asks: as numbers arrive one by one (a stream), maintain a data structure so that at any moment you can quickly return the median of all elements seen so far. Re-sorting after every insertion would be too slow. The classic efficient solution uses two heaps: a max heap for the lower half and a min heap for the upper half, so the median is always at the "boundary" between them.
Real-World Analogy
Imagine a running race where times are reported live. You want to show the "middle" time so far (the median) after each finisher. You could sort all times after every new result, but that gets expensive. Instead, you keep two groups: the faster half, with the smaller times (you only care about the slowest in that half, i.e. its max), and the slower half, with the larger times (you only care about the fastest in that half, i.e. its min). The median is either that max, that min, or their average, depending on how many numbers you have. Two heaps give you that max and min in O(1) and updates in O(log n).
Formal Definition
Stream: A sequence of numbers arriving one at a time. You must support:
- addNum(num) — add a number to the stream.
- findMedian() — return the median of all numbers added so far.
Median (for sorted order of current elements):
- If count n is odd: median = middle element = element at index n // 2 (0-based).
- If count n is even: median = average of the two middle elements = (element at n//2 - 1 + element at n//2) / 2.
In heap terms: the lower half has the smaller elements; the upper half has the larger elements. The "middle" is at the boundary: the max of the lower half and the min of the upper half.
Why This Topic Matters
- Classic interview problem (e.g. LeetCode 295). Tests understanding of heaps and invariants.
- Real use: sliding-window medians, real-time analytics, load balancing (median latency).
- Pattern: "maintain two halves with a clear boundary" appears in other problems too.
Mental Model: Two Heaps
- Lower half (left): A max heap — we need the largest of the small numbers. In Python, implement as a min heap of negated values.
- Upper half (right): A min heap — we need the smallest of the large numbers.
- Invariant: Size of lower half is either equal to or one more than the size of upper half. So the median is always: the max of the lower half (when total is odd), or the average of max-of-lower and min-of-upper (when total is even).
Evolution: Brute Force → Optimal
Brute Force: Store All + Sort on Query
Keep a list. On addNum, append. On findMedian, sort the list and return the middle (or average of two middles).
- addNum: O(1). findMedian: O(n log n).
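The brute-force version is short enough to write out and is useful as a reference when testing a faster solution. This is a sketch; the class name BruteForceMedian is hypothetical, and find_median assumes at least one number has been added.

```python
class BruteForceMedian:
    """Store everything; sort on every query (reference implementation only)."""

    def __init__(self):
        self.items = []

    def add_num(self, num):
        self.items.append(num)          # O(1) append

    def find_median(self):
        ordered = sorted(self.items)    # O(n log n) on every call
        n = len(ordered)
        mid = n // 2
        if n % 2 == 1:
            return float(ordered[mid])
        return (ordered[mid - 1] + ordered[mid]) / 2


bf = BruteForceMedian()
for x in [1, 2, 3]:
    bf.add_num(x)
print(bf.find_median())  # 2.0
```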
Optimal: Two Heaps (Max-Heap + Min-Heap)
Maintain lo (max heap for lower half) and hi (min heap for upper half). After each add, rebalance so that len(lo) >= len(hi) and len(lo) - len(hi) <= 1. Median = lo[0] when total is odd, or (lo[0] + hi[0]) / 2 when even (with negated lo, use -lo[0]).
- addNum: O(log n). findMedian: O(1).
- Space: O(n).
Step-by-Step: addNum with Two Heaps
- If the lower-half (max) heap is empty or num <= current max of lower half, push num into the lower half (push -num into the min-heap representation of the max heap). Otherwise, push num into the upper-half min heap.
- Rebalance sizes: if the lower half has more than one extra element compared with the upper half, move the max of the lower half to the upper half (pop from lo, push the negated value to hi). If the upper half becomes larger than the lower half, move the min of the upper half to the lower half (pop from hi, push the negated value to lo).
- After rebalance: len(lo) >= len(hi) and len(lo) - len(hi) <= 1.
ASCII Diagram
After adding: 1, 2, 3, 4, 5
Lower half (max heap, stored as min heap of -x): [-3, -2, -1] → max = 3
Upper half (min heap): [4, 5]
Total count = 5 (odd). Median = max of lower = 3.
After adding 6:
Lower: [-3,-2,-1] Upper: [4,5,6]
No rebalance needed: sizes 3 and 3 → even. Median = (3 + 4) / 2 = 3.5
Python Implementation
import heapq
class MedianFinder:
    def __init__(self):
        self.lo = []  # max heap of lower half (store -x for min-heap simulation)
        self.hi = []  # min heap of upper half

    def addNum(self, num: int) -> None:
        if not self.lo or num <= -self.lo[0]:
            heapq.heappush(self.lo, -num)
        else:
            heapq.heappush(self.hi, num)
        # Rebalance: we want len(lo) >= len(hi) and len(lo) - len(hi) <= 1
        if len(self.lo) > len(self.hi) + 1:
            move = -heapq.heappop(self.lo)
            heapq.heappush(self.hi, move)
        elif len(self.hi) > len(self.lo):
            move = heapq.heappop(self.hi)
            heapq.heappush(self.lo, -move)

    def findMedian(self) -> float:
        if len(self.lo) > len(self.hi):
            return float(-self.lo[0])  # odd count: convert so the result is always a float
        return (-self.lo[0] + self.hi[0]) / 2.0
Examples Section
Example 1: Step-by-Step Walkthrough
addNum(1), addNum(2), findMedian(), addNum(3), findMedian().
- addNum(1): lo = [-1], hi = []. Median would be 1 (odd count).
- addNum(2): 2 > 1 (max of lo), so push to hi. lo = [-1], hi = [2]. Sizes 1 and 1; no rebalance. findMedian() → (1 + 2) / 2 = 1.5.
- addNum(3): 3 > 1, push to hi. lo = [-1], hi = [2, 3]. Now len(hi) > len(lo); rebalance: move 2 from hi to lo. lo = [-2, -1], hi = [3]. findMedian() → -lo[0] = 2.
Example 2: Code Run with Output
mf = MedianFinder()
mf.addNum(1)
mf.addNum(2)
print(mf.findMedian()) # 1.5
mf.addNum(3)
print(mf.findMedian()) # 2.0
mf.addNum(4)
mf.addNum(5)
print(mf.findMedian()) # 3.0
Output:
1.5
2.0
3.0
Example 3: Even vs Odd Count
Time and Space Complexity
- addNum: O(log n), with at most three heap operations (one push, plus a pop and a push if a rebalance is needed).
- findMedian: O(1) — just reading the root of one or two heaps.
- Space: O(n) — all elements stored in the two heaps.
Edge Cases
- No elements: Define behavior (e.g. return 0 or raise). LeetCode 295 assumes at least one add before first findMedian.
- Single element: Median is that element; one heap has one element, the other is empty.
- Duplicate values: Algorithm works; duplicates can go to either half by the <= comparison (consistent with "lower half" containing the middle).
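One way to make the "no elements" case explicit is to raise an error, as in the sketch below. It follows the same two-heap scheme described in this section; the class name SafeMedianFinder, the snake_case method names, and the choice of ValueError are assumptions made for illustration.

```python
import heapq


class SafeMedianFinder:
    """Two-heap median that raises on an empty stream (hypothetical variant)."""

    def __init__(self):
        self.lo = []  # negated values: max heap of the lower half
        self.hi = []  # min heap of the upper half

    def add_num(self, num):
        if not self.lo or num <= -self.lo[0]:
            heapq.heappush(self.lo, -num)
        else:
            heapq.heappush(self.hi, num)
        # Restore len(lo) == len(hi) or len(lo) == len(hi) + 1
        if len(self.lo) > len(self.hi) + 1:
            heapq.heappush(self.hi, -heapq.heappop(self.lo))
        elif len(self.hi) > len(self.lo):
            heapq.heappush(self.lo, -heapq.heappop(self.hi))

    def find_median(self):
        if not self.lo:
            raise ValueError("median of an empty stream is undefined")
        if len(self.lo) > len(self.hi):
            return float(-self.lo[0])
        return (-self.lo[0] + self.hi[0]) / 2


mf = SafeMedianFinder()
try:
    mf.find_median()
except ValueError as err:
    print("empty:", err)
for x in [5, 5, 1, 5]:  # duplicates end up in both halves
    mf.add_num(x)
print(mf.find_median())  # 5.0
```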
Common Mistakes
Pattern Recognition
Use two heaps when you need:
- Median (or similar "middle" statistic) in a stream or dynamic set.
- Fast access to both "largest of the small" and "smallest of the large" with incremental updates.
Interview Insight
Practice Problems
- LeetCode 295: Find Median from Data Stream (exact problem).
- Sliding window median: maintain two heaps for the window and update as the window moves.
Summary
- Median in stream = two heaps: max heap (lower half) + min heap (upper half).
- Invariant: len(lower) >= len(upper) and len(lower) - len(upper) <= 1.
- Median: odd total → root of lower; even total → average of the two roots.
- addNum O(log n), findMedian O(1), space O(n).
Section 13: Graph Theory
This section introduces graphs: nodes (vertices) connected by edges. You will learn how to represent graphs in code, traverse them using BFS and DFS, and build up to powerful algorithms like Dijkstra, Topological Sort, Minimum Spanning Tree, and Network Flow. Mastering graph representation is the first and most important step: if you choose the wrong representation, your algorithms will be harder to write, reason about, and optimize.
13.1 Graph Representation
Introduction
A graph is a set of vertices (nodes) connected by edges. Before you can run BFS/DFS, Dijkstra, or any other graph algorithm, you must decide how to store the graph in memory. In Python, the most common representations are:
- Edge list
- Adjacency matrix
- Adjacency list (the most common in competitive programming and interviews)
Key Concepts
- Directed vs Undirected: In a directed graph (digraph), edges have direction (u → v). In an undirected graph, edges are two-way (u — v).
- Weighted vs Unweighted: Weighted edges have a cost/weight (e.g. distance, time). Unweighted edges are effectively weight 1.
- Sparse vs Dense: A graph with few edges compared to n² is sparse; one with many edges is dense. This heavily influences representation choice.
Representation 1: Edge List
Store the graph simply as a list of edges, each edge being a pair (u, v) for unweighted, or a triple (u, v, w) for weighted graphs.
# Undirected, unweighted edge list
edges = [
(0, 1),
(0, 2),
(1, 2),
(2, 3),
]
Representation 2: Adjacency Matrix
For a graph with n vertices (typically labeled 0..n-1), an adjacency matrix is
an n × n 2D array where entry matrix[u][v] indicates whether there is an edge from
u to v (and possibly stores its weight).
n = 4
# Unweighted directed graph: 1 means edge exists, 0 means no edge
matrix = [[0] * n for _ in range(n)]
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
for u, v in edges:
    matrix[u][v] = 1
# For undirected: mark both directions
for u, v in edges:
    matrix[u][v] = 1
    matrix[v][u] = 1
Pros and Cons
- Pros: O(1) check if edge (u, v) exists; very simple; good for dense graphs.
- Cons: Uses O(n²) space even if there are few edges; iterating neighbors of u is O(n) (scan the whole row).
Representation 3: Adjacency List (Preferred)
The adjacency list stores, for each vertex u, a list of its neighbors. In Python, we usually use a list of lists (for 0..n-1 vertex labels) or a dictionary mapping each node to a list of neighbors.
n = 4
# Unweighted, undirected graph using list of lists
adj = [[] for _ in range(n)]
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
for u, v in edges:
    adj[u].append(v)
    adj[v].append(u)  # for undirected
print(adj) # e.g. [[1, 2], [0, 2], [0, 1, 3], [2]]
For weighted graphs, we store (neighbor, weight) pairs:
# Weighted directed graph
adj = [[] for _ in range(n)]
weighted_edges = [
(0, 1, 5), # edge 0 -> 1 with weight 5
(0, 2, 2),
(1, 2, 1),
(2, 3, 7),
]
for u, v, w in weighted_edges:
    adj[u].append((v, w))
Pros and Cons
- Pros: Space O(n + m) (where m is number of edges): ideal for sparse graphs; iterating all neighbors of u is O(deg(u)), which is often small.
- Cons: Checking if an arbitrary edge (u, v) exists may require an O(deg(u)) scan.
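If fast edge checks matter and vertices are not numbered 0..n-1, one common variation is a dictionary of sets. The sketch below assumes hashable vertex labels (strings here); the helper name has_edge is hypothetical.

```python
from collections import defaultdict

edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]

# Undirected graph: each vertex maps to the set of its neighbors
adj = defaultdict(set)
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)


def has_edge(adj, u, v):
    """Average O(1) membership test; 'u in adj' avoids creating empty entries."""
    return u in adj and v in adj[u]


print(has_edge(adj, "A", "B"))  # True
print(has_edge(adj, "A", "D"))  # False
print(sorted(adj["C"]))         # ['A', 'B', 'D']
```

The trade-off is that sets do not preserve a neighbor order, so traversal order may differ from a list-based graph.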
ASCII Diagram and Adjacency List Example
Graph (undirected):
    0
   / \
  1---2
       \
        3
Edges: (0,1), (0,2), (1,2), (2,3)
Adjacency list:
0: 1, 2
1: 0, 2
2: 0, 1, 3
3: 2
Python Example: Building and Traversing a Graph
Let's build an undirected, unweighted graph using an adjacency list and run a simple BFS from node 0.
from collections import deque
def build_undirected_graph(n, edges):
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj

def bfs(start, adj):
    n = len(adj)
    visited = [False] * n
    order = []
    q = deque([start])
    visited[start] = True
    while q:
        u = q.popleft()
        order.append(u)
        for v in adj[u]:
            if not visited[v]:
                visited[v] = True
                q.append(v)
    return order

n = 4
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
adj = build_undirected_graph(n, edges)
print("Adjacency list:", adj)
print("BFS from 0:", bfs(0, adj))
Example Run
The script prints the adjacency list [[1, 2], [0, 2], [0, 1, 3], [2]] and the BFS order [0, 1, 2, 3] (the order depends on neighbor order). The key takeaway: once you choose an adjacency list representation, algorithms like BFS and DFS become straightforward to implement.
When to Use Which Representation
| Representation | Space | Best For |
|---|---|---|
| Edge list | O(m) | Algorithms that iterate edges (e.g. Kruskal) |
| Adjacency matrix | O(n²) | Dense graphs, constant-time edge checks |
| Adjacency list | O(n + m) | Sparse graphs, BFS/DFS, Dijkstra, most interview problems |
Time and Space Complexity Summary
- Edge list: Space O(m). Neighbor iteration O(m). Edge existence check O(m).
- Adjacency matrix: Space O(n²). Neighbor iteration for u: O(n). Edge check (u, v): O(1).
- Adjacency list: Space O(n + m). Neighbor iteration for u: O(deg(u)). Edge check (u, v): O(deg(u)).
Common Mistakes
Interview Insight
Summary
- Graphs can be represented as edge lists, adjacency matrices, or adjacency lists.
- For most interview problems and competitive programming tasks, adjacency lists are the right default choice.
- The representation strongly affects time and space complexity of graph algorithms; choose based on n, m, and required operations.
13.2 BFS
Introduction
Breadth-First Search (BFS) is a graph (and tree) traversal algorithm that explores all nodes at the current "distance" (number of edges) from the source before moving to nodes one step farther. It uses a queue: you process nodes in the order they were discovered, which naturally gives you level-by-level exploration. BFS is the standard tool for shortest path in unweighted graphs, finding connected components, and many grid/puzzle problems.
Real-World Analogy
Imagine a rumor spreading in a social network: it starts from one person. In the first minute, all direct friends hear it. In the second minute, all friends of those friends (who haven't heard yet) hear it, and so on. BFS does exactly this: it "spreads" from the source in waves. The first wave is distance 1, the next is distance 2, etc. So the first time you reach a node, you've found a shortest path to it (in terms of number of edges).
Formal Definition
Input: A graph (adjacency list or matrix), and a source vertex s.
Output: Depending on the problem: visitation order, distances from s, a shortest-path tree (parent pointers), or simply "all reachable nodes."
Key property: In an unweighted graph, BFS from s visits vertices in non-decreasing order of shortest-path distance (number of edges) from s. The first time a node is reached, that path is a shortest path.
Why This Topic Matters
- Shortest path in unweighted graphs: BFS gives O(V + E) solution; no need for Dijkstra.
- Level-order traversal in trees; multi-source BFS (e.g. all 0s as sources in a grid).
- Connected components, bipartite checking, and many interview problems (grid, word ladder, etc.).
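As one example of these applications, bipartite checking can be sketched as a BFS that 2-colors the graph: each newly discovered neighbor gets the opposite color, and a conflict means an odd cycle. The function name is_bipartite is hypothetical, and the graph is assumed to be an undirected adjacency list with vertices 0..n-1.

```python
from collections import deque


def is_bipartite(adj):
    """Return True if the undirected graph can be 2-colored."""
    n = len(adj)
    color = [-1] * n  # -1 means not yet colored
    for start in range(n):  # loop covers disconnected graphs
        if color[start] != -1:
            continue
        color[start] = 0
        q = deque([start])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if color[v] == -1:
                    color[v] = 1 - color[u]
                    q.append(v)
                elif color[v] == color[u]:
                    return False  # same color on both ends of an edge
    return True


print(is_bipartite([[1, 3], [0, 2], [1, 3], [0, 2]]))  # True (4-cycle)
print(is_bipartite([[1, 2], [0, 2], [0, 1]]))          # False (triangle)
```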
Mental Model
- Use a queue (FIFO). Start by enqueueing the source and marking it visited.
- While the queue is not empty: dequeue a node u, then enqueue all unvisited neighbors of u and mark them visited. Those neighbors are "one edge farther" than u.
- Because we process in FIFO order, we always finish all nodes at distance d before processing any node at distance d + 1.
Step-by-Step Algorithm
- Create a queue and a visited set (or array). Enqueue s and mark s visited.
- Optionally, set dist[s] = 0 and parent[s] = None.
- While the queue is not empty:
  - Dequeue u.
  - For each neighbor v of u: if v is not visited, mark v visited, set dist[v] = dist[u] + 1 (and parent[v] = u if needed), and enqueue v.
ASCII Diagram
Graph (undirected):     BFS from 0 (queue order)
    0                   Level 0: 0
   / \                  Level 1: 1, 2
  1---2                 Level 2: 3
   \ /
    3
Queue steps: [0] → pop 0, add 1,2 → [1,2] → pop 1, add 3 → [2,3] → pop 2 → [3] → pop 3 → []
Visitation order: 0, 1, 2, 3. Distances: d[0]=0, d[1]=1, d[2]=1, d[3]=2.
Python Implementation
from collections import deque
def bfs_shortest_paths(adj, start):
    """
    adj: list of lists (adjacency list), indices 0..n-1.
    start: source vertex.
    Returns: (dist, parent) where dist[u] = shortest distance from start to u,
    parent[u] = predecessor on a shortest path (None for start).
    """
    n = len(adj)
    dist = [-1] * n
    parent = [None] * n
    dist[start] = 0
    q = deque([start])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if dist[v] == -1:
                dist[v] = dist[u] + 1
                parent[v] = u
                q.append(v)
    return dist, parent

def path_from_parent(parent, start, end):
    """Reconstruct path from start to end using parent array."""
    path = []
    cur = end
    while cur is not None:
        path.append(cur)
        cur = parent[cur]
    path.reverse()
    return path if path and path[0] == start else []
Examples Section
Example 1: BFS Visitation and Distances
Adjacency list: adj = [[1, 2], [0, 2], [0, 1, 3], [2]]
- Start: queue = [0], dist = [0, -1, -1, -1].
- Pop 0: neighbors 1, 2 → dist[1]=1, dist[2]=1, queue = [1, 2].
- Pop 1: neighbor 0 (visited), 2 (visited). Queue = [2].
- Pop 2: neighbor 3 → dist[3]=2, queue = [3].
- Pop 3: no new neighbors. Done.
Result: dist = [0, 1, 1, 2]. Shortest path from 0 to 3 has length 2 (e.g. 0→2→3).
Example 2: Code Run with Path Reconstruction
adj = [[1, 2], [0, 2], [0, 1, 3], [2]]
dist, parent = bfs_shortest_paths(adj, 0)
print("Distances:", dist) # [0, 1, 1, 2]
print("Path 0 -> 3:", path_from_parent(parent, 0, 3)) # [0, 2, 3]
Output:
Distances: [0, 1, 1, 2]
Path 0 -> 3: [0, 2, 3]
Example 3: Multi-Source BFS (Grid)
from collections import deque
def grid_bfs_distances(grid, source_value=0):
    """grid: 2D list. Cells with value source_value are sources. Return 2D dist."""
    rows, cols = len(grid), len(grid[0])
    dist = [[-1] * cols for _ in range(rows)]
    q = deque()
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == source_value:
                dist[r][c] = 0
                q.append((r, c))
    while q:
        r, c = q.popleft()
        for dr, dc in [(0,1),(0,-1),(1,0),(-1,0)]:
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and dist[nr][nc] == -1:
                dist[nr][nc] = dist[r][c] + 1
                q.append((nr, nc))
    return dist
# Example: 3x3 grid, (0,0) and (2,2) are sources (value 0)
grid = [[0, 1, 1], [1, 1, 1], [1, 1, 0]]
print(grid_bfs_distances(grid)) # distances to nearest 0
Time and Space Complexity
- Time: O(V + E) — each vertex is enqueued and dequeued at most once; each edge is considered at most once (for directed) or twice (for undirected).
- Space: O(V) for visited/dist/parent and the queue (queue can hold up to O(V) vertices in the worst case).
Edge Cases
- Disconnected graph: BFS from s only reaches nodes in the same component. To visit all nodes, run BFS from each unvisited node (or use a loop over components).
- Single node: The queue starts as [s] and is empty after one step; dist[s] = 0.
- Directed graph: Same algorithm; only follow outgoing edges from u (adj[u]).
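The "run BFS from each unvisited node" idea for disconnected graphs can be sketched as a component-labeling routine. The function name connected_components is hypothetical; the graph is assumed to be an undirected adjacency list with vertices 0..n-1.

```python
from collections import deque


def connected_components(adj):
    """Return (number of components, component id of each vertex)."""
    n = len(adj)
    comp = [-1] * n  # -1 means not yet visited
    count = 0
    for s in range(n):
        if comp[s] != -1:
            continue  # already reached by an earlier BFS
        comp[s] = count
        q = deque([s])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if comp[v] == -1:
                    comp[v] = count
                    q.append(v)
        count += 1
    return count, comp


adj = [[1], [0], [3], [2], []]
print(connected_components(adj))  # (3, [0, 0, 1, 1, 2])
```

Every vertex is enqueued once across all the BFS runs, so the total cost is still O(V + E).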
Common Mistakes
Pattern Recognition
Use BFS when you need:
- Shortest path in terms of number of edges (unweighted graph).
- Level-by-level or "distance in steps" exploration (e.g. word ladder, grid moves).
- Multi-source shortest distances (all sources in the queue at distance 0).
Interview Insight
Practice Problems
- LeetCode 127: Word Ladder (BFS over "word graph").
- LeetCode 542: 01 Matrix (multi-source BFS from 0s).
- LeetCode 1091: Shortest Path in Binary Matrix (BFS on grid).
Summary
- BFS = explore graph using a queue; visit nodes in order of increasing distance from the source.
- In unweighted graphs, BFS from s computes shortest-path distances (and a shortest-path tree) in O(V + E).
- Mark nodes visited when you enqueue them; use a queue (deque), not a stack.
- Multi-source BFS: start with all sources in the queue at distance 0.
13.3 DFS
Introduction
Depth-First Search (DFS) is a fundamental graph traversal algorithm that explores as far as possible along one path before backtracking. You can think of it as always going "deeper" first, using a stack (either explicit or via recursion). DFS is the basis for many advanced algorithms: topological sort, cycle detection, connected components, articulation points & bridges, strongly connected components (SCC), and more.
DFS vs BFS: Mental Contrast
- BFS explores in layers (level by level) using a queue → good for shortest paths in unweighted graphs.
- DFS explores by going deep into the graph using a stack/recursion → good for exploring structure, detecting cycles, and topological ordering.
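To make the contrast concrete, here is a minimal sketch that runs both traversals on the same small graph. The helper names bfs_order and dfs_order and the sample graph are illustrative assumptions, not part of the course code.

```python
from collections import deque

def bfs_order(adj, start):
    """Visit nodes layer by layer using a queue."""
    visited = {start}
    order = []
    q = deque([start])
    while q:
        u = q.popleft()
        order.append(u)
        for v in adj[u]:
            if v not in visited:
                visited.add(v)
                q.append(v)
    return order

def dfs_order(adj, start):
    """Visit nodes by going deep first using recursion."""
    visited = set()
    order = []

    def visit(u):
        visited.add(u)
        order.append(u)
        for v in adj[u]:
            if v not in visited:
                visit(v)

    visit(start)
    return order

# Undirected edges: 0-1, 0-2, 1-3, 2-4
adj = [[1, 2], [0, 3], [0, 4], [1], [2]]
print(bfs_order(adj, 0))  # [0, 1, 2, 3, 4]
print(dfs_order(adj, 0))  # [0, 1, 3, 2, 4]
```

BFS finishes both neighbors of 0 before touching node 3, while DFS follows 0 → 1 → 3 to the end before returning to 2.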
Formal Definition
Input: A graph (usually adjacency list), and optionally a starting node.
Output: A traversal order, discovery/finish times, connected components, or answers to questions like "is there a path between u and v?", "is the graph acyclic?", etc.
Recursive DFS: Core Idea
The recursive DFS for a starting node u is:
- Mark u as visited.
- For each neighbor v of u:
  - If v is not visited, recursively DFS from v.
ASCII Diagram
Graph (undirected):
    0
   / \
  1   2
      |
      3
DFS from 0 (one possible order):
0 → 1 (backtrack) → 2 → 3 (backtrack) → done
Python Implementation (Recursive)
def dfs_recursive(adj, start, visited=None, order=None):
    """Depth-first search from start. Returns visitation order."""
    if visited is None:
        visited = set()
    if order is None:
        order = []
    visited.add(start)
    order.append(start)
    for v in adj[start]:
        if v not in visited:
            dfs_recursive(adj, v, visited, order)
    return order
Python Implementation (Iterative with Stack)
def dfs_iterative(adj, start):
    visited = set()
    order = []
    stack = [start]
    while stack:
        u = stack.pop()
        if u in visited:
            continue
        visited.add(u)
        order.append(u)
        # Push neighbors in reverse so they are popped in adjacency-list order
        for v in reversed(adj[u]):
            if v not in visited:
                stack.append(v)
    return order
Examples Section
Example 1: Simple DFS Order
adj = [[1,2],[0],[0,3],[2]].
Using dfs_recursive(adj, 0):
- Start at 0: visit 0, then neighbor 1 → visit 1 (backtrack).
- Back at 0: next neighbor 2 → visit 2, then neighbor 3 → visit 3 (backtrack).
- Traversal order: [0, 1, 2, 3] (one valid DFS order).
Example 2: Counting Connected Components
def count_components(adj):
    n = len(adj)
    visited = [False] * n
    components = 0

    def dfs(u):
        visited[u] = True
        for v in adj[u]:
            if not visited[v]:
                dfs(v)

    for u in range(n):
        if not visited[u]:
            components += 1
            dfs(u)
    return components
For adj = [[1], [0], [3], [2]] (two separate edges 0–1 and 2–3), count_components(adj) returns 2.
Example 3: Cycle Detection in Undirected Graph
def has_cycle_undirected(adj):
    n = len(adj)
    visited = [False] * n

    def dfs(u, parent):
        visited[u] = True
        for v in adj[u]:
            if not visited[v]:
                if dfs(v, u):
                    return True
            elif v != parent:
                # visited neighbor that is not parent → cycle
                return True
        return False

    for u in range(n):
        if not visited[u]:
            if dfs(u, -1):
                return True
    return False
Time and Space Complexity
- Time: O(V + E). Each vertex is visited once, and each edge is examined at most once (directed) or twice (undirected, once from each endpoint).
- Space: O(V) for visited + recursion stack (in recursive version) or explicit stack (iterative).
Edge Cases
- Disconnected graph: A single DFS from one start will not visit all nodes; run DFS from each unvisited node to cover all components.
- Deep/long path: Recursive DFS may hit recursion depth limits in Python on very deep graphs; in those cases, prefer iterative DFS with an explicit stack.
- Directed graph: DFS is defined the same way, but many algorithms (e.g. cycle detection, SCC) interpret edges differently.
Common Mistakes
Pattern Recognition
Use DFS when you need:
- To explore all reachable nodes and recurse on structure (trees, graphs).
- To find connected components, cycles, topological order, or articulation points & bridges.
- To perform backtracking-style exploration (e.g. generating paths, solving puzzles on graphs).
Interview Insight
Summary
- DFS explores depth-first using recursion or an explicit stack.
- Time complexity O(V + E), space O(V).
- Key for many graph algorithms: connected components, cycle detection, topological sort, SCC, and more.
13.4 Topological Sort (DFS & Kahn's Algorithm)
Introduction
A topological sort of a directed acyclic graph (DAG) is a linear ordering of its vertices
such that for every directed edge u → v, u comes before v in the ordering.
Topological order is the backbone of many dependency problems: task scheduling, build systems, course
prerequisites, and more. In this topic, we will learn two classic algorithms:
DFS-based topological sort and Kahn's algorithm (BFS + indegree).
Real-World Analogy
Imagine you have a set of courses with prerequisites: to take course B, you must first complete course A. This forms a directed edge A → B. A topological ordering is a valid sequence of courses you can follow so that all prerequisites are satisfied. Similarly, in build systems, some files or modules must be built before others; topological sort gives a valid build order.
When Topological Order Exists
- The graph must be a DAG (Directed Acyclic Graph).
- If there is a cycle (like A → B → C → A), no linear order can satisfy all edges.
Method 1: DFS-Based Topological Sort
Mental Model
Think of running DFS on the directed graph. For each node, you first recursively visit all nodes reachable from it, and only after exploring all its outgoing edges do you "finish" the node and add it to a list. If you then reverse this list of finishing times, you get a valid topological order. Intuition: a node comes after all of its descendants in DFS finishing time, so reversing moves it before them.
Algorithm (DFS)
- Maintain a visited array and a list order.
- For each vertex u:
  - If u is not visited, run dfs(u).
- In dfs(u):
  - Mark u as visited.
  - For each neighbor v (edge u → v): if v is not visited, dfs(v).
  - After exploring all neighbors, append u to order.
- Reverse order; the result is a topological sort.
Python Implementation (DFS)
def topo_sort_dfs(adj):
    """
    adj: adjacency list of a directed graph, vertices 0..n-1.
    Returns a list of vertices in topological order.
    Assumes the graph is a DAG (no cycles).
    """
    n = len(adj)
    visited = [False] * n
    order = []

    def dfs(u):
        visited[u] = True
        for v in adj[u]:
            if not visited[v]:
                dfs(v)
        order.append(u)  # u finishes after all of its descendants

    for u in range(n):
        if not visited[u]:
            dfs(u)
    order.reverse()
    return order
Method 2: Kahn's Algorithm (BFS + Indegree)
Mental Model
Kahn's algorithm repeatedly removes nodes with indegree 0 (no incoming edges). Such nodes
have no prerequisites, so they can safely come next in the topological order. When you remove a node u,
you conceptually delete its outgoing edges (u → v), which may cause some neighbors v
to drop to indegree 0, and thus become candidates next.
Algorithm (Kahn's)
- Compute indegree[v] for all vertices v (number of incoming edges).
- Push all vertices with indegree[v] == 0 into a queue.
- While the queue is not empty:
  - Pop u from the queue, append u to the result order.
  - For each neighbor v (edge u → v): decrement indegree[v] by 1; if it becomes 0, push v to the queue.
- If order contains all vertices, you found a topological ordering. If not, the graph had a cycle.
Python Implementation (Kahn's Algorithm)
from collections import deque

def topo_sort_kahn(adj):
    n = len(adj)
    indegree = [0] * n
    for u in range(n):
        for v in adj[u]:
            indegree[v] += 1
    q = deque([u for u in range(n) if indegree[u] == 0])
    order = []
    while q:
        u = q.popleft()
        order.append(u)
        for v in adj[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                q.append(v)
    if len(order) != n:
        raise ValueError("Graph has a cycle; no topological ordering exists.")
    return order
Examples Section
Example 1: Simple DAG
adj = [
[1, 2], # 0
[3], # 1
[3], # 2
[] # 3
]
A valid topological order is [0, 1, 2, 3] or [0, 2, 1, 3]. Running either
topo_sort_dfs(adj) or topo_sort_kahn(adj) will produce a valid ordering.
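Because several orderings can be valid, it helps to check an ordering against the definition rather than compare it to one expected list. The following is a minimal sketch of such a checker; the name is_topological_order is a hypothetical helper introduced here for illustration.

```python
def is_topological_order(adj, order):
    """Return True if order lists every vertex once and respects every edge u -> v."""
    n = len(adj)
    if sorted(order) != list(range(n)):
        return False  # missing or repeated vertices
    position = {u: i for i, u in enumerate(order)}
    # Every edge u -> v must have u placed before v
    return all(position[u] < position[v] for u in range(n) for v in adj[u])

adj = [[1, 2], [3], [3], []]
print(is_topological_order(adj, [0, 1, 2, 3]))  # True
print(is_topological_order(adj, [0, 2, 1, 3]))  # True
print(is_topological_order(adj, [1, 0, 2, 3]))  # False (edge 0 -> 1 is violated)
```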
Example 2: Kahn's Algorithm Step-by-Step
Using the same graph as above:
- Compute indegrees: indegree[0]=0, indegree[1]=1, indegree[2]=1, indegree[3]=2.
- Queue starts with [0] (only node with indegree 0).
- Pop 0 → order=[0]. Decrement indegree[1] to 0, indegree[2] to 0 → queue becomes [1, 2].
- Pop 1 → order=[0, 1]. Decrement indegree[3] to 1 → queue=[2].
- Pop 2 → order=[0, 1, 2]. Decrement indegree[3] to 0 → queue=[3].
- Pop 3 → order=[0, 1, 2, 3]. Queue empty, order length = 4 = n → success.
Example 3: Detecting a Cycle
adj = [
[1], # 0
[2], # 1
[0], # 2
]
In Kahn's algorithm, no node will ever have indegree 0 (or after a few steps the queue becomes empty before
we've processed all vertices). Our implementation checks len(order) != n and raises an error.
In DFS-based approaches, you can detect a cycle by tracking recursion stack (colors: WHITE/GRAY/BLACK).
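The color-based idea can be sketched as follows. This is one possible variant, not the implementation given earlier: WHITE means unvisited, GRAY means currently on the recursion stack, BLACK means finished. The function name topo_sort_dfs_checked and the choice to return None on a cycle are assumptions made for this example.

```python
WHITE, GRAY, BLACK = 0, 1, 2

def topo_sort_dfs_checked(adj):
    """Return a topological order, or None if the directed graph has a cycle."""
    n = len(adj)
    color = [WHITE] * n
    order = []

    def dfs(u):
        color[u] = GRAY  # u is on the current recursion path
        for v in adj[u]:
            if color[v] == GRAY:
                return False  # back edge to an ancestor: cycle
            if color[v] == WHITE and not dfs(v):
                return False
        color[u] = BLACK  # u and everything reachable from it is done
        order.append(u)
        return True

    for u in range(n):
        if color[u] == WHITE and not dfs(u):
            return None
    order.reverse()
    return order

print(topo_sort_dfs_checked([[1, 2], [3], [3], []]))  # [0, 2, 1, 3]
print(topo_sort_dfs_checked([[1], [2], [0]]))          # None
```

An edge to a BLACK node is harmless (that node is already finished), which is why only GRAY neighbors signal a cycle.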
Time and Space Complexity
- DFS-based topological sort: O(V + E) time, O(V) space (visited + recursion stack + order).
- Kahn's algorithm: O(V + E) time, O(V) space (indegree array + queue + order).
Common Mistakes
Pattern Recognition
Use topological sort when you see:
- Tasks, jobs, or courses with prerequisites (dependencies form a DAG).
- Build order of modules or packages based on dependency edges.
- Any problem that says "do X before Y" for many pairs (X, Y) and asks for a valid global order.
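As an illustration of the prerequisite pattern, here is a sketch that turns (course, prerequisite) pairs into a graph and applies Kahn's algorithm. The function name course_order, the pair format, and returning None when a cycle makes the schedule impossible are all assumptions for this example.

```python
from collections import deque

def course_order(num_courses, prerequisites):
    """prerequisites: list of (course, prereq) pairs. Returns a valid order or None."""
    adj = [[] for _ in range(num_courses)]
    indegree = [0] * num_courses
    for course, prereq in prerequisites:
        adj[prereq].append(course)  # edge prereq -> course
        indegree[course] += 1
    q = deque(c for c in range(num_courses) if indegree[c] == 0)
    order = []
    while q:
        u = q.popleft()
        order.append(u)
        for v in adj[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                q.append(v)
    # If some course was never released, the prerequisites contain a cycle
    return order if len(order) == num_courses else None

print(course_order(4, [(1, 0), (2, 0), (3, 1), (3, 2)]))  # [0, 1, 2, 3]
print(course_order(2, [(0, 1), (1, 0)]))                  # None
```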
Interview Insight
Summary
- Topological sort orders vertices u before v for all edges u → v in a DAG.
- DFS method: run DFS, push nodes after exploring neighbors, then reverse the list.
- Kahn's algorithm: repeatedly remove indegree-0 nodes, updating neighbors' indegrees.
- Time O(V + E), space O(V); only defined for DAGs (no cycles).
13.5 Dijkstra
Introduction
Dijkstra's algorithm finds the shortest path distances from a single source vertex to all other vertices in a graph with non-negative edge weights. It is one of the most important algorithms in graph theory and appears in routing (GPS, networks), scheduling, and many interview problems.
Key Requirements
- Graph may be directed or undirected.
- Edge weights must be non-negative. If negative edges exist, Dijkstra can give wrong answers (use Bellman-Ford or other algorithms instead).
- Graph is usually represented with an adjacency list storing (neighbor, weight) pairs.
Real-World Analogy
Imagine you are at a city intersection (the source) and want to know the shortest driving distance to every other intersection. Initially, distances are infinity except for your starting point (distance 0). At each step, you permanently choose the not-yet-finalized intersection with the smallest known distance and "relax" edges from it, possibly updating distances of its neighbors if going through this intersection yields a shorter route. This is exactly what Dijkstra's algorithm does using a min-priority queue.
Mental Model
- Maintain an array dist where dist[v] is the current best known distance from source s to v.
- Use a min-heap (priority queue) to always pick the vertex u with the smallest tentative distance.
- When you "finalize" a vertex (pop it from the heap with its minimal distance), you relax its outgoing edges u → v: if dist[u] + w(u,v) < dist[v], update dist[v] and push a new pair into the heap.
Algorithm (High-Level)
- Initialize dist[v] = ∞ for all v, and dist[s] = 0 for the source vertex s.
- Create a min-heap and push (0, s).
- While the heap is not empty:
  - Pop (d, u) from the heap. If d > dist[u], skip (this is an outdated entry).
  - For each edge u → v with weight w, if dist[u] + w < dist[v], update dist[v] and push (dist[v], v) into the heap.
ASCII Diagram
Graph (directed, weighted):
      (1)
  0 -------> 1
  |          |
 (4)        (2)
  |          |
  v   (5)    v
  2 -------> 3
  |          ^
 (1)        (1)
  |          |
  v          |
  4 ---------+
Edges:
0 -> 1 (1), 0 -> 2 (4)
1 -> 3 (2)
2 -> 3 (5), 2 -> 4 (1)
4 -> 3 (1)
Shortest distances from 0:
dist[0] = 0
dist[1] = 1
dist[2] = 4
dist[4] = 5
dist[3] = 3 (0 -> 1 -> 3; the alternative 0 -> 2 -> 4 -> 3 costs 6)
Python Implementation (Adjacency List + Heap)
import heapq

def dijkstra(adj, source):
    """adj: adjacency list, adj[u] = list of (v, w) edges. Returns dist[] and parent[]."""
    n = len(adj)
    INF = float('inf')
    dist = [INF] * n
    parent = [None] * n
    dist[source] = 0
    heap = [(0, source)]  # (distance, node)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # outdated entry
        for v, w in adj[u]:
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                parent[v] = u
                heapq.heappush(heap, (nd, v))
    return dist, parent

def reconstruct_path(parent, s, t):
    path = []
    cur = t
    while cur is not None:
        path.append(cur)
        cur = parent[cur]
    path.reverse()
    return path if path and path[0] == s else []
Examples Section
Example 1: Shortest Paths from a Source
adj = [
[(1, 1), (2, 4)], # 0
[(3, 2)], # 1
[(3, 5), (4, 1)], # 2
[], # 3
[(3, 1)], # 4
]
dist, parent = dijkstra(adj, 0)
print("dist:", dist)
print("path 0 -> 3:", reconstruct_path(parent, 0, 3))
Output:
dist: [0, 1, 4, 3, 5]
path 0 -> 3: [0, 1, 3]
Example 2: Step-by-Step Heap Evolution (Intuition)
- Start: dist[0]=0, others ∞. Heap = [(0,0)].
- Pop (0,0): relax 0→1 (1) → dist[1]=1, push (1,1); relax 0→2 (4) → dist[2]=4, push (4,2). Heap=[(1,1),(4,2)].
- Pop (1,1): relax 1→3 (2) → dist[3]=3, push (3,3). Heap=[(3,3),(4,2)].
- Pop (3,3): no outgoing edges. Heap=[(4,2)].
- Pop (4,2): relax 2→3 (5) → new distance 9 > current dist[3]=3, ignore; relax 2→4 (1) → dist[4]=5, push (5,4).
- Pop (5,4): relax 4→3 (1) → new distance 6 > 3, ignore. Done.
Example 3: Unreachable Nodes
If a vertex v cannot be reached from the source, dist[v] will remain ∞. You can treat that as "unreachable" in problem statements.
Time and Space Complexity
- Let V = number of vertices, E = number of edges.
- Each edge is relaxed at most once in the main loop.
- Each relaxation may push a new entry into the heap. Heap operations cost O(log V).
- Time: O(E log V) using a binary heap (Python's heapq).
- Space: O(V + E) for the adjacency list, O(V) for dist/parent, and up to O(E) heap entries, because outdated entries stay in the heap until they are popped.
Edge Cases
- Negative weights: Dijkstra is not valid if any edge weight is negative. Use Bellman-Ford or other algorithms instead.
- Disconnected graph: Some nodes may remain at distance ∞, meaning unreachable from the source.
- Multiple edges / self-loops: Algorithm still works; relaxations simply may never improve dist values.
Common Mistakes
- Omitting the check if d > dist[u] when popping from the heap, which can cause extra work or incorrect processing of outdated entries.
Pattern Recognition
Use Dijkstra when you see:
- "Shortest path" or "minimum cost" in a graph with non-negative weights.
- Grid problems where moving between cells has different positive costs (e.g. terrain costs).
- Network routing, travel planning, or any path-finding with positive distances.
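The grid pattern can be sketched like this. The cost model is an assumption chosen for illustration: entering a cell costs grid[r][c], the start cell's cost is included, and moves are 4-directional. The function name min_cost_path is hypothetical.

```python
import heapq

def min_cost_path(grid):
    """Minimum total cost from top-left to bottom-right under the assumed cost model."""
    rows, cols = len(grid), len(grid[0])
    INF = float('inf')
    dist = [[INF] * cols for _ in range(rows)]
    dist[0][0] = grid[0][0]
    heap = [(grid[0][0], 0, 0)]  # (cost so far, row, col)
    while heap:
        d, r, c = heapq.heappop(heap)
        if d > dist[r][c]:
            continue  # outdated entry
        if (r, c) == (rows - 1, cols - 1):
            return d  # first pop of the target is final
        for dr, dc in ((0, 1), (0, -1), (1, 0), (-1, 0)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                nd = d + grid[nr][nc]
                if nd < dist[nr][nc]:
                    dist[nr][nc] = nd
                    heapq.heappush(heap, (nd, nr, nc))
    return dist[rows - 1][cols - 1]

grid = [
    [1, 3, 1],
    [1, 5, 1],
    [4, 2, 1],
]
print(min_cost_path(grid))  # 7 (cells 1 -> 3 -> 1 -> 1 -> 1 along the top and right)
```

The grid is never converted to an explicit adjacency list; each cell's neighbors are generated on the fly, which is the usual approach for grid problems.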
Interview Insight
Summary
- Dijkstra computes single-source shortest paths in graphs with non-negative edge weights.
- Uses a min-heap priority queue and edge relaxation: if going through u shortens dist[v], update it.
- Time complexity O(E log V), space O(V + E).
- Do not use when negative edge weights are present; use Bellman-Ford or other algorithms instead.
13.6 Bellman-Ford
Introduction
Bellman-Ford is a single-source shortest-path algorithm that, unlike Dijkstra, works with negative edge weights. It can also detect negative cycles (cycles whose total weight is negative), in which case no finite shortest path exists from the source to nodes reachable through that cycle. The trade-off is higher time complexity: O(V · E) instead of O(E log V).
When to Use Bellman-Ford
- Graph has negative edge weights (Dijkstra is invalid).
- You need to detect negative cycles (e.g. arbitrage in currency graphs).
- Dense graphs where V is small; O(V · E) may be acceptable.
Real-World Analogy
Imagine currency exchange rates: each edge (A → B) has a "cost" (e.g. −log(rate)). A path from currency A back to A with total negative cost means you can make money by cycling (arbitrage). Bellman-Ford can find shortest paths and, with one extra pass, tell you if such a "negative cycle" exists.
Algorithm (High-Level)
- Initialize dist[s] = 0 and dist[v] = ∞ for all other vertices.
- Repeat V − 1 times: for every edge (u, v) with weight w, relax: if dist[u] + w < dist[v], set dist[v] = dist[u] + w (and optionally update parent).
- Negative cycle check: Run one more relaxation pass. If any edge (u, v) still improves dist[v], then the graph contains a negative cycle reachable from the source.
Why V − 1 rounds? A shortest path from s to any vertex has at most V − 1 edges. Each round relaxes all edges once; after V − 1 rounds, shortest paths of up to V − 1 edges have been propagated. If a path has V or more edges and is still improving, it must use a negative cycle.
Mental Model
- Think of "waves" of relaxation: round 1 fixes shortest paths of length 1 edge, round 2 fixes paths of length 2, and so on. After V − 1 rounds, all finite shortest paths are correct.
- If after V − 1 rounds you can still relax an edge, that relaxation is "driven" by a negative cycle.
ASCII Diagram
Directed graph (can have negative weights):
      (2)
  0 -------> 1
   \         |
   (-1)     (1)
     \       |
      \      v
       +---> 2
             |
            (3)
             |
             v
             3
Edges: 0→1(2), 0→2(-1), 1→2(1), 2→3(3)
No negative cycle. Shortest from 0: dist[0]=0, dist[1]=2, dist[2]=-1, dist[3]=2.
Python Implementation
def bellman_ford(edges, n, source):
    """
    edges: list of (u, v, w) directed edges. Vertices 0..n-1.
    Returns (dist, parent, has_negative_cycle).
    """
    INF = float('inf')
    dist = [INF] * n
    parent = [None] * n
    dist[source] = 0
    # V - 1 relaxation rounds
    for _ in range(n - 1):
        for u, v, w in edges:
            if dist[u] != INF and dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                parent[v] = u
    # Negative cycle detection: one more round
    has_negative_cycle = False
    for u, v, w in edges:
        if dist[u] != INF and dist[u] + w < dist[v]:
            has_negative_cycle = True
            break
    return dist, parent, has_negative_cycle
Examples Section
Example 1: Graph Without Negative Cycle
edges = [(0, 1, 2), (0, 2, -1), (1, 2, 1), (2, 3, 3)]
dist, parent, has_neg = bellman_ford(edges, 4, 0)
print("dist:", dist) # [0, 2, -1, 2]
print("negative cycle:", has_neg) # False
Output: dist: [0, 2, -1, 2], negative cycle: False. Shortest path 0→2→3 has length 2.
Example 2: Graph With Negative Cycle
edges = [(0, 1, 2), (0, 2, -1), (1, 2, 1), (2, 3, 3), (3, 0, -5)]
dist, parent, has_neg = bellman_ford(edges, 4, 0)
print("negative cycle:", has_neg) # True
Output: negative cycle: True. Distances may be incorrect for nodes in or reachable through the cycle.
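The same negative-cycle check applies to the currency analogy. The sketch below is illustrative only: the function name has_arbitrage, the rate triples, and the tolerance eps are assumptions. It weights each edge by −log(rate) and starts every currency at distance 0, which acts like a virtual source connected to all currencies, so a cycle anywhere in the graph is detected.

```python
import math

def has_arbitrage(n, rates, eps=1e-12):
    """rates: list of (u, v, rate) meaning 1 unit of u buys `rate` units of v."""
    # A cycle with rate product > 1 becomes a cycle with negative total weight
    edges = [(u, v, -math.log(rate)) for u, v, rate in rates]
    dist = [0.0] * n  # virtual source: every currency starts at 0
    for _ in range(n - 1):
        for u, v, w in edges:
            if dist[u] + w < dist[v] - eps:
                dist[v] = dist[u] + w
    # One more pass: any further improvement implies a negative cycle
    return any(dist[u] + w < dist[v] - eps for u, v, w in edges)

# 0.9 * 0.9 * 1.3 = 1.053 > 1, so cycling through the three currencies gains money
print(has_arbitrage(3, [(0, 1, 0.9), (1, 2, 0.9), (2, 0, 1.3)]))  # True
# 0.9 * 0.9 * 1.2 = 0.972 < 1, so no gain
print(has_arbitrage(3, [(0, 1, 0.9), (1, 2, 0.9), (2, 0, 1.2)]))  # False
```

The small eps guards against floating-point noise in the logarithms; with integer weights it would not be needed.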
Example 3: Edge List vs Adjacency List
Bellman-Ford is typically implemented over an edge list (list of (u, v, w)) so that each round iterates over all edges once. If your graph is stored as an adjacency list, build the edge list first or iterate adj[u] for each u and relax (u, v, w) for each neighbor.
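A minimal sketch of that conversion, assuming the adjacency list stores (neighbor, weight) pairs as in the Dijkstra section; adj_to_edges is a hypothetical helper name.

```python
def adj_to_edges(adj):
    """Flatten adj[u] = [(v, w), ...] into a list of (u, v, w) edges."""
    return [(u, v, w) for u, neighbors in enumerate(adj) for v, w in neighbors]

adj = [[(1, 2), (2, -1)], [(2, 1)], [(3, 3)], []]
print(adj_to_edges(adj))  # [(0, 1, 2), (0, 2, -1), (1, 2, 1), (2, 3, 3)]
```

Building the list once costs O(V + E) and keeps the relaxation loop a simple pass over edges.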
Time and Space Complexity
- Time: O(V · E) — V − 1 rounds, each round O(E) relaxations.
- Space: O(V) for dist and parent; O(E) for the edge list.
Edge Cases
- Negative cycle reachable from source: Algorithm reports it; distances to nodes reachable via the cycle are meaningless (can be made arbitrarily negative).
- Disconnected graph: Nodes unreachable from the source stay at ∞; they are not affected by negative cycles in other components.
- Multiple edges: Include each edge in the edge list; relaxation handles them naturally.
Common Mistakes
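- Forgetting to check dist[u] != INF before relaxing u→v. In languages where ∞ is represented by a large finite integer, "∞ + w" with a negative w is smaller than ∞ and can incorrectly update dist[v]. In Python, float('inf') plus any finite number is still inf, so the comparison would fail anyway, but the check keeps the logic clear.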
Pattern Recognition
Use Bellman-Ford when you see:
- Shortest path with negative edge weights.
- Negative cycle detection (e.g. arbitrage, fault detection in networks).
- Constraints like "at most K edges" (modified Bellman-Ford can be used for limited-hop shortest path).
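The limited-hop variant can be sketched as below. The key change from the standard algorithm is that each round reads from a snapshot of the previous round's distances, so one round extends paths by exactly one edge. The function name and example graph are assumptions for illustration.

```python
def shortest_with_at_most_k_edges(n, edges, source, k):
    """Shortest distances from source using paths of at most k edges."""
    INF = float('inf')
    dist = [INF] * n
    dist[source] = 0
    for _ in range(k):
        prev = dist[:]  # snapshot: prevents chaining several edges in one round
        for u, v, w in edges:
            if prev[u] != INF and prev[u] + w < dist[v]:
                dist[v] = prev[u] + w
    return dist

edges = [(0, 1, 1), (1, 2, 1), (0, 2, 5)]
print(shortest_with_at_most_k_edges(3, edges, 0, 1))  # [0, 1, 5]
print(shortest_with_at_most_k_edges(3, edges, 0, 2))  # [0, 1, 2]
```

With one edge allowed, vertex 2 is only reachable directly at cost 5; allowing two edges unlocks the cheaper route 0 → 1 → 2.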
Interview Insight
Summary
- Bellman-Ford computes single-source shortest paths and detects negative cycles.
- V − 1 rounds of full edge relaxation; one extra round to detect negative cycle.
- Time O(V · E), space O(V). Use when edges can be negative or you need cycle detection.
13.7 Floyd-Warshall
Introduction
Floyd-Warshall is an all-pairs shortest path algorithm: it computes the shortest distance between every pair of vertices in a graph. It works with negative edge weights and can detect negative cycles. The algorithm is simple to implement (three nested loops) but takes O(V³) time and O(V²) space, so it is practical only when the number of vertices V is moderate (typically a few hundred or when you explicitly need a full distance matrix).
When to Use Floyd-Warshall
- You need shortest path between every pair of nodes (e.g. distance matrix for a small graph).
- Graph may have negative edge weights (unlike Dijkstra).
- V is not too large (V² or V³ is acceptable).
- Dense graphs: one run gives all pairs; running V times Dijkstra would be O(V · E log V), which can be worse for dense E ≈ V².
Real-World Analogy
Imagine a table of distances between every pair of cities. Initially you have direct road distances. Floyd-Warshall asks: "For each pair (A, B), could we get a shorter distance by going through city C?" It tries every intermediate city and updates the table. After considering all intermediates, the table holds the true shortest distances (and can reveal if any city acts as a "negative cycle" hub).
Dynamic Programming Idea
Define dist[i][j][k] = shortest path from i to j using only vertices {0, 1, …, k} as intermediates. Then:
- Base: dist[i][j][−1] = direct edge weight (or ∞ if no edge). We use a 2D table and overwrite it: let dist[i][j] mean "shortest from i to j using intermediates 0..k" in round k.
- Transition: Either we don't use vertex k, so dist[i][j] stays as is; or we go i → k → j, so dist[i][j] = min(dist[i][j], dist[i][k] + dist[k][j]).
Algorithm (High-Level)
- Initialize an n×n matrix dist: dist[i][j] = 0 if i == j, else weight of edge (i, j) or ∞ if no edge.
- For k = 0 to n − 1: for each pair (i, j), set dist[i][j] = min(dist[i][j], dist[i][k] + dist[k][j]).
- Negative cycle: If after the loops any dist[i][i] < 0, the graph contains a negative cycle (you can go from i to i with negative cost).
Mental Model
In round k, we "allow" vertex k as an intermediate. For every pair (i, j), we ask: is it shorter to go i → k → j than our current best? The order of the k-loop matters (k must be the outer loop); i and j can be in any order.
ASCII Diagram
Graph (4 nodes, directed, weighted):
  0 --(2)--> 1
  |          |
 (4)        (1)
  v          v
  2 <--(2)-- 3
Edges: 0→1(2), 0→2(4), 1→3(1), 3→2(2).
Initial dist (direct edges, rest ∞):
     0  1  2  3
  0  0  2  4  ∞
  1  ∞  0  ∞  1
  2  ∞  ∞  0  ∞
  3  ∞  ∞  2  0
After Floyd-Warshall, dist[0][2] stays 4 (the detour 0→1→3→2 costs 5), while dist[0][3] = 3 (0→1→3) and dist[1][2] = 3 (1→3→2).
Python Implementation
def floyd_warshall(n, edges, directed=True):
    """
    n: number of vertices (0..n-1).
    edges: list of (u, v, w). If directed=False, add both (u,v,w) and (v,u,w).
    Returns: 2D list dist, and has_negative_cycle (bool).
    """
    INF = float('inf')
    dist = [[INF] * n for _ in range(n)]
    for i in range(n):
        dist[i][i] = 0
    for u, v, w in edges:
        dist[u][v] = min(dist[u][v], w)  # keep the cheapest parallel edge
        if not directed:
            dist[v][u] = min(dist[v][u], w)
    for k in range(n):  # k must be the outer loop
        for i in range(n):
            for j in range(n):
                if dist[i][k] != INF and dist[k][j] != INF:
                    dist[i][j] = min(dist[i][j], dist[i][k] + dist[k][j])
    has_negative_cycle = any(dist[i][i] < 0 for i in range(n))
    return dist, has_negative_cycle
Examples Section
Example 1: Small Graph, All-Pairs Distances
n = 4
edges = [(0, 1, 2), (0, 2, 4), (1, 3, 1), (3, 2, 2)]
dist, neg_cycle = floyd_warshall(n, edges)
for i in range(n):
    print(dist[i])
Output:
[0, 2, 4, 3]
[inf, 0, 3, 1]
[inf, inf, 0, inf]
[inf, inf, 2, 0]
Interpretation: dist[0][2]=4 via the direct edge 0→2 (the detour 0→1→3→2 costs 5); dist[0][3]=3 via 0→1→3; dist[1][0]=∞ (no path from 1 to 0).
Example 2: Negative Cycle Detection
edges_neg = [(0, 1, 2), (0, 2, 4), (1, 3, 1), (3, 2, 2), (2, 0, -10)]
dist, neg_cycle = floyd_warshall(4, edges_neg)
print("has_negative_cycle:", neg_cycle) # True
Example 3: Transitive Closure (Unweighted)
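For an unweighted graph, you can use the same structure to compute reachability. One option keeps the numeric version: set dist[i][j]=1 if there is an edge, 0 for i==j, ∞ otherwise, and run Floyd-Warshall unchanged; afterwards dist[i][j] < ∞ means "j is reachable from i". The other option uses a boolean matrix reach, where reach[i][j] is true when i==j or there is an edge i → j, and replaces "min" and "+" with logical OR and AND: reach[i][j] = reach[i][j] or (reach[i][k] and reach[k][j]). Afterwards reach[i][j] is true exactly when j is reachable from i.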
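A minimal sketch of the boolean variant (often called Warshall's algorithm). The function name transitive_closure and the unweighted (u, v) edge format are assumptions for this example.

```python
def transitive_closure(n, edges):
    """edges: list of directed (u, v) pairs. Returns reach[i][j] = True if j is reachable from i."""
    reach = [[i == j for j in range(n)] for i in range(n)]
    for u, v in edges:
        reach[u][v] = True
    for k in range(n):  # allow vertex k as an intermediate
        for i in range(n):
            if reach[i][k]:  # skip rows that cannot reach k at all
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = True
    return reach

reach = transitive_closure(4, [(0, 1), (1, 3), (3, 2)])
print(reach[0][2])  # True  (0 -> 1 -> 3 -> 2)
print(reach[2][0])  # False (2 has no outgoing edges)
```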
Time and Space Complexity
- Time: O(V³) — three nested loops over V.
- Space: O(V²) for the distance matrix.
Edge Cases
- No edge between i, j: Initialize to ∞; after the algorithm, still ∞ means no path.
- Negative cycle: Some dist[i][i] < 0; distances to nodes reachable through the cycle are not well-defined.
- Undirected graph: Store each edge in both directions (or set dist[u][v] = dist[v][u] = w).
Common Mistakes
Pattern Recognition
Use Floyd-Warshall when you see:
- "Shortest path between all pairs of vertices."
- Small graph (V ≤ few hundred) and need a full distance matrix.
- Reachability / transitive closure with the same triple-loop structure.
Interview Insight
Summary
- Floyd-Warshall computes all-pairs shortest paths in O(V³) time and O(V²) space.
- Uses dynamic programming: allow intermediates 0..k; relax dist[i][j] via k.
- Works with negative weights; negative cycle detected if dist[i][i] < 0 for some i.
- Use when you need every pair; otherwise prefer Dijkstra or Bellman-Ford for single-source.
13.8 Prim's Algorithm
Introduction
Prim's algorithm finds a Minimum Spanning Tree (MST) of a connected, undirected, weighted graph. An MST is a spanning tree (connects all vertices, no cycles) whose total edge weight is as small as possible. Prim's grows a single tree from a starting vertex by repeatedly adding the cheapest edge that connects a vertex already in the tree to a vertex outside it—very similar in structure to Dijkstra, but the "key" is the minimum edge weight to reach a node, not the path length.
Minimum Spanning Tree (MST)
- Spanning tree: Subgraph that is a tree and includes all vertices.
- Minimum: Sum of edge weights is minimum among all spanning trees.
- If the graph is not connected, there is no spanning tree; we can run Prim's on each component to get an MST forest.
Real-World Analogy
Imagine laying cable to connect all houses in a neighborhood at minimum cost. You start at one house. At each step, you extend the cable to the nearest house not yet connected (minimum cost edge from the current network to a new house). When all houses are connected, you have an MST—no redundant links, minimum total cost.
Algorithm (High-Level)
- Start with a single vertex (e.g. 0); mark it "in the tree."
- Maintain the minimum cost to add each vertex to the tree (initially ∞ except neighbors of the start).
- Repeat until all vertices are in the tree: pick the vertex v with minimum cost that is not yet in the tree; add it and the edge that achieved that cost to the MST; update the minimum cost for neighbors of v (if an edge v→u has weight w and w < current cost for u, set cost[u] = w).
This is exactly like Dijkstra, but the "distance" is replaced by "minimum edge weight from the current tree to this vertex."
Mental Model
- You have a growing set T of vertices in the MST. For each vertex not in T, track the cheapest edge from T to that vertex.
- Each step: add the vertex with the cheapest such edge; that edge becomes part of the MST; then update the cheapest edge for its neighbors.
ASCII Diagram
Undirected weighted graph:
      1
  0 ----- 1
  | \     |
 4|  \2   |3
  |   \   |
  2 ----- 3
      1
Edges: 0-1(1), 0-2(4), 0-3(2), 1-3(3), 2-3(1).
Prim from 0: add 0; cheapest to 1 is 1, to 3 is 2, to 2 is 4.
Add 1 (edge 0-1); then add 3 (edge 0-3, weight 2, which beats 1-3 at weight 3); then add 2 (edge 2-3, weight 1, which beats 0-2 at weight 4).
MST edges: (0,1), (0,3), (2,3). Total weight = 1+2+1 = 4.
Python Implementation (Min-Heap)
import heapq
def prim(n, adj, start=0):
    """
    n: number of vertices (0..n-1).
    adj: adjacency list, adj[u] = list of (v, w) for undirected edges.
    Returns: (mst_total_weight, mst_edges).
    """
    in_mst = [False] * n
    min_cost = [float('inf')] * n
    min_cost[start] = 0
    parent = [None] * n
    heap = [(0, start, -1)]  # (cost, node, parent)
    mst_edges = []
    mst_weight = 0
    while heap:
        c, u, p = heapq.heappop(heap)
        if in_mst[u]:
            continue  # stale entry: u was already added via a cheaper edge
        in_mst[u] = True
        mst_weight += c
        if p != -1:
            mst_edges.append((p, u, c))
        for v, w in adj[u]:
            if not in_mst[v] and w < min_cost[v]:
                min_cost[v] = w
                parent[v] = u
                heapq.heappush(heap, (w, v, u))
    return mst_weight, mst_edges
Examples Section
Example 1: Small Graph MST
def build_adj_undirected(n, edges):
    adj = [[] for _ in range(n)]
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    return adj
n = 4
edges = [(0, 1, 1), (0, 2, 4), (0, 3, 2), (1, 3, 3), (2, 3, 1)]
adj = build_adj_undirected(n, edges)
weight, mst_edges = prim(n, adj, 0)
print("MST weight:", weight) # 4
print("MST edges:", mst_edges) # e.g. [(0, 1, 1), (0, 3, 2), (3, 2, 1)]
Output: MST weight: 4, and three edges (e.g. 0-1, 0-3, 2-3) forming the MST.
Example 2: Step-by-Step (Intuition)
- Start: in_mst = [F,F,F,F], heap = [(0,0,-1)].
- Pop (0,0,-1): add 0; push (1,1,0), (4,2,0), (2,3,0).
- Pop (1,1,0): add 1, edge (0,1); edge 1-3 has weight 3, but 3 already has cost 2 from 0, so nothing is pushed and (2,3,0) stays the best entry for 3.
- Pop (2,3,0): add 3, edge (0,3); push (1,2,3).
- Pop (1,2,3): add 2, edge (3,2). All in MST; total weight 0+1+2+1 = 4. The leftover entry (4,2,0) is popped and skipped because 2 is already in the tree.
Example 3: Disconnected Graph
Time and Space Complexity
- Time: O(E log V) with a binary heap—each edge is considered at most once, and we do O(E) heap operations of cost O(log V).
- Space: O(V) for in_mst, min_cost, parent; up to O(E) heap entries in the worst case, since stale entries are skipped on pop rather than removed.
Prim vs Kruskal
- Prim: Grows one tree from a source; uses a min-heap of "crossing" edges; good for dense graphs (can be O(V²) with an array instead of heap).
- Kruskal: Sorts all edges and adds the smallest that doesn't create a cycle (Union-Find); O(E log E); often simpler and good for sparse graphs.
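The O(V²) array-based variant of Prim mentioned above can be sketched as follows. It assumes a dense graph stored as a symmetric weight matrix with inf for missing edges; the function name prim_dense and the choice to raise on a disconnected graph are assumptions for this example.

```python
def prim_dense(weight):
    """weight: n x n symmetric matrix, weight[u][v] = edge weight or inf. Returns MST total weight."""
    n = len(weight)
    INF = float('inf')
    in_mst = [False] * n
    min_cost = [INF] * n
    min_cost[0] = 0  # start from vertex 0
    total = 0
    for _ in range(n):
        # O(V) scan replaces the heap: pick the cheapest vertex outside the tree
        u = min((v for v in range(n) if not in_mst[v]), key=lambda v: min_cost[v])
        if min_cost[u] == INF:
            raise ValueError("graph is not connected")
        in_mst[u] = True
        total += min_cost[u]
        for v in range(n):
            if not in_mst[v] and weight[u][v] < min_cost[v]:
                min_cost[v] = weight[u][v]
    return total

INF = float('inf')
weight = [
    [INF, 1, 4, 2],
    [1, INF, INF, 3],
    [4, INF, INF, 1],
    [2, 3, 1, INF],
]
print(prim_dense(weight))  # 4
```

Each of the V iterations does an O(V) scan and an O(V) update, giving O(V²) overall, which beats O(E log V) when E is close to V².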
Edge Cases
- Disconnected graph: Prim from one node gives an MST of that component only; run for each component to get an MST forest.
- Single node: MST has weight 0 and no edges.
- Multiple edges between same pair: Use the minimum weight; the algorithm naturally uses the smallest when relaxing.
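The disconnected case can be handled by restarting Prim from every vertex that is not yet in any tree. The sketch below is one way to do it; prim_forest is a hypothetical name, and it returns the total weight, the forest edges, and the number of components.

```python
import heapq

def prim_forest(n, adj):
    """adj[u] = list of (v, w) for undirected edges. Returns (total_weight, edges, components)."""
    in_mst = [False] * n
    total = 0
    forest_edges = []
    components = 0
    for start in range(n):
        if in_mst[start]:
            continue  # already covered by an earlier tree
        components += 1
        heap = [(0, start, -1)]  # (cost, node, parent)
        while heap:
            c, u, p = heapq.heappop(heap)
            if in_mst[u]:
                continue
            in_mst[u] = True
            total += c
            if p != -1:
                forest_edges.append((p, u, c))
            for v, w in adj[u]:
                if not in_mst[v]:
                    heapq.heappush(heap, (w, v, u))
    return total, forest_edges, components

# Component {0, 1, 2} is a triangle; component {3, 4} is a single edge
adj = [
    [(1, 1), (2, 3)],
    [(0, 1), (2, 2)],
    [(0, 3), (1, 2)],
    [(4, 7)],
    [(3, 7)],
]
print(prim_forest(5, adj))  # (10, [(0, 1, 1), (1, 2, 2), (3, 4, 7)], 2)
```

A forest over c components has V − c edges, so here 5 − 2 = 3 edges are returned.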
Common Mistakes
Pattern Recognition
Use Prim (or Kruskal) when you see:
- "Minimum spanning tree," "connect all nodes at minimum cost," "minimum wiring/cabling."
- Problems that reduce to MST (e.g. clustering with minimum total distance).
Interview Insight
Summary
- Prim's algorithm builds an MST by growing a single tree, always adding the minimum-weight edge to a new vertex.
- Implementation is similar to Dijkstra; key = min edge weight from tree to node, not path length.
- Time O(E log V) with heap; space O(V) for the bookkeeping arrays plus up to O(E) heap entries. For disconnected graphs, run on each component.
13.9 Kruskal's Algorithm
Introduction
Kruskal's algorithm finds a Minimum Spanning Tree (MST) by considering edges in increasing order of weight and adding an edge to the MST only if it does not create a cycle. It uses a Union-Find (Disjoint Set Union) data structure to check in nearly constant time whether adding an edge would connect two vertices that are already in the same connected component. Kruskal is simple to implement and often preferred for sparse graphs.
Why Kruskal Works
The greedy choice: the minimum-weight edge that does not form a cycle is always part of some MST (cut property). So we sort all edges, then process them from smallest to largest; for each edge (u, v, w), if u and v are not yet in the same component, add the edge and merge their components. After processing all edges, we have exactly V − 1 edges (for a connected graph) and they form an MST.
Real-World Analogy
Imagine you have a list of possible road segments between cities, each with a cost. You want to connect all cities at minimum total cost without building redundant roads (no cycles). Sort the segments by cost, then add the cheapest segment that doesn't already connect two cities that are connected (directly or indirectly). That's Kruskal.
Algorithm (High-Level)
- Sort all edges by weight (ascending).
- Initialize a Union-Find structure with each vertex in its own set.
- For each edge (u, v, w) in sorted order:
  - If find(u) != find(v), add the edge to the MST and union(u, v).
  - Otherwise skip (u and v are already in the same component; adding this edge would create a cycle).
- Stop when you have added V − 1 edges (connected graph) or when no more edges remain.
Union-Find (Disjoint Set) Recap
We need two operations: find(x) — which set does x belong to? and union(x, y) — merge the sets containing x and y. With path compression and union by rank, both are nearly O(1) amortized. We use this to check "are u and v in the same component?" before adding an edge.
Mental Model
- Start with each vertex as its own "island." Edges are bridges. Sort bridges by cost.
- Pick the cheapest bridge; if it connects two different islands, build it and merge the islands. Repeat until you have one island (one connected component) and exactly V − 1 bridges (MST).
ASCII Diagram
Same graph as Prim (4 nodes):
0 --- 1        Edges sorted by weight: (0,1,1), (2,3,1), (0,3,2), (1,3,3), (0,2,4)
| \   |
|  \  |        Kruskal: add (0,1,1), add (2,3,1), add (0,3,2). Now 0,1,2,3 connected.
2 --- 3        Skip (1,3,3): 1 and 3 same component. Skip (0,2,4): 0 and 2 same component.
MST: (0,1), (2,3), (0,3). Total weight = 4.
Python Implementation (with Union-Find)
class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n
    def find(self, x):
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])
        return self.parent[x]
    def union(self, x, y):
        px, py = self.find(x), self.find(y)
        if px == py:
            return False
        if self.rank[px] < self.rank[py]:
            px, py = py, px
        self.parent[py] = px
        if self.rank[px] == self.rank[py]:
            self.rank[px] += 1
        return True
def kruskal(n, edges):
    """
    n: number of vertices (0..n-1).
    edges: list of (u, v, w) for undirected edges.
    Returns: (mst_total_weight, mst_edges).
    """
    edges_sorted = sorted(edges, key=lambda e: e[2])
    uf = UnionFind(n)
    mst_edges = []
    mst_weight = 0
    for u, v, w in edges_sorted:
        if uf.find(u) != uf.find(v):
            uf.union(u, v)
            mst_edges.append((u, v, w))
            mst_weight += w
            if len(mst_edges) == n - 1:
                break
    return mst_weight, mst_edges
Examples Section
Example 1: Same Graph as Prim
n = 4
edges = [(0, 1, 1), (0, 2, 4), (0, 3, 2), (1, 3, 3), (2, 3, 1)]
weight, mst = kruskal(n, edges)
print("MST weight:", weight) # 4
print("MST edges:", mst) # [(0, 1, 1), (2, 3, 1), (0, 3, 2)]
Output: MST weight: 4, MST edges: [(0, 1, 1), (2, 3, 1), (0, 3, 2)]. Same total weight as Prim; edge set may differ but cost is the same.
Example 2: Step-by-Step
- Sorted edges: (0,1,1), (2,3,1), (0,3,2), (1,3,3), (0,2,4).
- Add (0,1,1): components {0,1}, {2}, {3}. MST = [(0,1,1)].
- Add (2,3,1): components {0,1}, {2,3}. MST = [(0,1,1), (2,3,1)].
- Add (0,3,2): merge {0,1} and {2,3} → one component. MST = [(0,1,1), (2,3,1), (0,3,2)].
- We have 3 edges = n−1; stop. (1,3,3) and (0,2,4) would create cycles; skip.
Example 3: Disconnected Graph (MST Forest)
No code change needed: just stop when no more edges can be added; the result may have fewer than n−1 edges.
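As a minimal sketch of this case, the snippet below runs a compact Kruskal on a graph with two components. The name kruskal_forest and the inline find with path halving are illustrative choices made to keep the example self-contained; they are not the implementation shown above.

```python
def kruskal_forest(n, edges):
    """Kruskal on a possibly disconnected graph; returns (total_weight, forest_edges)."""
    parent = list(range(n))

    def find(x):
        # Iterative find with path halving (illustrative; keeps the sketch short).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    total, forest = 0, []
    for u, v, w in sorted(edges, key=lambda e: e[2]):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[rv] = ru          # merge the two components
            forest.append((u, v, w))
            total += w
    return total, forest


# Two components: {0, 1, 2} and {3, 4}.
edges = [(0, 1, 1), (1, 2, 2), (0, 2, 3), (3, 4, 5)]
weight, forest = kruskal_forest(5, edges)
print(weight, forest)  # 8 [(0, 1, 1), (1, 2, 2), (3, 4, 5)]
```

The forest has 3 edges, which is V minus the number of components (5 − 2), rather than V − 1.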
Time and Space Complexity
- Time: O(E log E) for sorting edges; O(E · α(V)) for the Union-Find operations (α is inverse Ackermann, effectively constant). So O(E log E) overall, which is the same as O(E log V): in a simple graph E ≤ V², so log E ≤ 2 log V.
- Space: O(V) for Union-Find; O(E) for the sorted edge list (or sort in place).
Kruskal vs Prim
- Kruskal: Sort edges once; no need for adjacency list; easy to implement; excellent for sparse graphs (E ≈ V).
- Prim: Uses a heap and adjacency list; good when you have one source or dense graphs (with array: O(V²)).
Edge Cases
- Disconnected graph: Result is a spanning forest; number of edges = V − number of components.
- Multiple edges between same pair: Include all in the edge list; the sort will consider the smallest first; Union-Find prevents duplicates in the MST.
- Equal weights: Any order among equal-weight edges is fine; the MST may not be unique but total weight is.
Common Mistakes
Pattern Recognition
Use Kruskal when you see:
- "Minimum spanning tree," "connect all nodes at minimum cost," especially when the input is an edge list.
- Sparse graphs (E not much larger than V); sorting E edges is cheap.
Interview Insight
Summary
- Kruskal's algorithm builds an MST by adding edges in increasing order of weight, skipping edges that would create a cycle (Union-Find).
- Time O(E log E); space O(V) for Union-Find plus O(E) for the sorted edge list. Simple and ideal for sparse graphs.
- Requires a working Union-Find (Disjoint Set) for cycle detection.
13.10 Disjoint Set (Union-Find)
Introduction
A Disjoint Set Union (DSU), also called Union-Find, is a data structure that maintains a partition of elements into disjoint sets. It supports two main operations: find(x) — which set does x belong to? — and union(x, y) — merge the sets containing x and y. It is essential for Kruskal's MST, cycle detection in graphs, connected components, and many problems that ask "are x and y in the same group?" with dynamic merging.
Operations
- find(x): Return a representative (e.g. root) of the set containing x. If find(x) == find(y), then x and y are in the same set.
- union(x, y): Merge the sets containing x and y. After this, find(x) == find(y).
- Often we also want number of sets or size of set containing x; both can be maintained with minor extra bookkeeping.
Representation: Parent Array
We represent each set as a tree: each node points to its parent; the root points to itself. The "representative" of a set is its root. We store parent[i] = parent of element i (or i if it is the root). Initially parent[i] = i for all i (each element is its own set).
Find with Path Compression
To find the root of x, walk up the parent chain until we reach a node that points to itself. Path compression: while traversing, set parent[x] = root for every node along the path so that future finds are O(1) for those nodes. This keeps the tree flat and gives amortized near-constant time.
Union by Rank (or Size)
When merging two trees, attach the smaller tree under the larger tree's root so that the depth doesn't grow unnecessarily. Union by rank: maintain a rank[i] (upper bound on height); when merging, make the root with smaller rank point to the root with larger rank; if equal, increment the new root's rank. Alternatively, union by size uses set size instead of rank. Combined with path compression, both yield amortized O(α(n)) per operation, where α is the inverse Ackermann function (effectively a constant).
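To make the union-by-size alternative concrete, here is a hedged sketch. The class name UnionFindBySize and the size_of helper are hypothetical names chosen for illustration, and the find is written iteratively (two passes) so that long parent chains do not hit Python's recursion limit.

```python
class UnionFindBySize:
    """Disjoint sets with union by size and an iterative, fully compressing find."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n  # size[r] is meaningful only while r is a root

    def find(self, x):
        # First pass: locate the root.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Second pass: point every node on the path directly at the root.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return False
        if self.size[rx] < self.size[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx              # attach the smaller tree under the larger
        self.size[rx] += self.size[ry]
        return True

    def size_of(self, x):
        return self.size[self.find(x)]


uf = UnionFindBySize(5)
uf.union(0, 1)
uf.union(2, 3)
uf.union(1, 3)
print(uf.size_of(3), uf.size_of(4))  # 4 1
```

Tracking sizes answers "how many elements are in x's set?" without extra work, which rank alone cannot do.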
Mental Model
- Each set is a tree; the root is the "representative." find(x) = go to root; union(x,y) = make one root point to the other.
- Path compression flattens the tree on every find; union by rank keeps trees short. Together they make operations extremely fast in practice.
ASCII Diagram
Initial: parent = [0,1,2,3,4] (5 singletons)
After union(0,1), union(2,3): sets {0,1}, {2,3}, {4}
parent might be [0,0,2,2,4] (1→0, 3→2)
After union(1,3): merge sets containing 1 and 3
find(1)=0, find(3)=2; union(0,2) → e.g. parent[2]=0
parent = [0,0,0,2,4] so find(4)=4, find(0)=find(1)=find(2)=find(3)=0
Python Implementation
class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n
        self.count = n # number of disjoint sets
    def find(self, x):
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x]) # path compression
        return self.parent[x]
    def union(self, x, y):
        px, py = self.find(x), self.find(y)
        if px == py:
            return False
        if self.rank[px] < self.rank[py]:
            px, py = py, px
        self.parent[py] = px
        if self.rank[px] == self.rank[py]:
            self.rank[px] += 1
        self.count -= 1
        return True
    def same(self, x, y):
        return self.find(x) == self.find(y)
Examples Section
Example 1: Basic Union and Find
uf = UnionFind(5)
uf.union(0, 1)
uf.union(2, 3)
uf.union(1, 3)
print(uf.find(0), uf.find(2), uf.find(4)) # 0, 0, 4 (0 and 2 in same set after merge)
print(uf.same(0, 2), uf.same(0, 4)) # True, False
print("Number of sets:", uf.count) # 2
Output: 0 0 4, True False, Number of sets: 2 (sets {0,1,2,3} and {4}).
Example 2: Counting Connected Components (Graph)
def count_components(n, edges):
    uf = UnionFind(n)
    for u, v in edges:
        uf.union(u, v)
    return uf.count
n, edges = 5, [(0, 1), (1, 2), (3, 4)]
print(count_components(n, edges)) # 2 (components {0,1,2} and {3,4})
Example 3: Cycle Detection in Undirected Graph
def has_cycle(n, edges):
    uf = UnionFind(n)
    for u, v in edges:
        if uf.same(u, v):
            return True
        uf.union(u, v)
    return False
edges = [(0, 1), (1, 2), (2, 0)] # triangle
print(has_cycle(3, edges)) # True
Time and Space Complexity
- Amortized time per find or union: O(α(n)), where α(n) is the inverse Ackermann function (≤ 5 for any practical n). Effectively constant.
- Space: O(n) for parent and rank arrays.
Applications
- Kruskal's MST: Check if adding an edge connects two different components (find) and merge them (union).
- Connected components: Start with n sets; for each edge, union the two endpoints; number of sets = number of components.
- Cycle detection: If for edge (u,v) we have find(u)==find(v), we have a cycle.
- Dynamic connectivity: Answer "are u and v connected?" as edges are added (or removed, with more advanced structures).
Edge Cases
- Single element: find(x) returns x; union(x,x) does nothing (already same set).
- Union same set twice: union(x,y) after they're already in the same set is a no-op; return False if you use a boolean to indicate "did we merge?".
Common Mistakes
Pattern Recognition
Use Union-Find when you see:
- "Same group," "connected," "merge sets," "detect cycle" as edges or relations are added.
- Kruskal's algorithm, connected components, or "dynamic connectivity" style problems.
Interview Insight
Summary
- Union-Find maintains disjoint sets with find (representative) and union (merge).
- Path compression + union by rank give amortized O(α(n)) per operation and O(n) space.
- Essential for Kruskal, connected components, and cycle detection in graphs.
13.11 Tarjan's Algorithm (SCC & Bridges)
Introduction
Tarjan's algorithm refers to a family of DFS-based methods that use discovery time and low-link (or "low") values to find important structures in graphs. Two central applications are: (1) Strongly Connected Components (SCC) in directed graphs, which are maximal sets of vertices where every pair can reach each other; and (2) Bridges in undirected graphs, which are edges whose removal increases the number of connected components. Each runs in O(V + E) with a single DFS.
Part 1: Strongly Connected Components (SCC)
Definition
In a directed graph, an SCC is a maximal set of vertices such that for every pair u, v in the set, there is a path from u to v and from v to u. Each vertex belongs to exactly one SCC. Tarjan finds all SCCs in one DFS using a stack and "low-link" values.
Idea (Tarjan for SCC)
- During DFS, assign each vertex a discovery time (disc) and compute a low-link (low): the smallest disc reachable from the current vertex by following tree edges and then at most one edge to a vertex that is still on the stack.
- Maintain a stack of vertices that might be in the current SCC. When we finish a vertex u and low[u] == disc[u], u is the "root" of an SCC; pop from the stack until u is popped. Those vertices form one SCC.
Python: Tarjan SCC
def tarjan_scc(adj):
    n = len(adj)
    disc = [-1] * n
    low = [-1] * n
    on_stack = [False] * n
    stack = []
    time = [0]
    sccs = []
    def dfs(u):
        disc[u] = low[u] = time[0]
        time[0] += 1
        stack.append(u)
        on_stack[u] = True
        for v in adj[u]:
            if disc[v] == -1:
                dfs(v)
                low[u] = min(low[u], low[v])
            elif on_stack[v]:
                low[u] = min(low[u], disc[v])
        if low[u] == disc[u]:
            scc = []
            while True:
                v = stack.pop()
                on_stack[v] = False
                scc.append(v)
                if v == u:
                    break
            sccs.append(scc)
    for u in range(n):
        if disc[u] == -1:
            dfs(u)
    return sccs
Example: SCC
adj = [[1], [2], [0, 3], [4], [3]]
print(tarjan_scc(adj)) # [[4, 3], [2, 1, 0]] (the SCC {3, 4} is completed first)
Part 2: Bridges (Undirected)
Definition
A bridge is an edge whose removal increases the number of connected components. In an undirected graph, a DFS tree edge (u, v), with u the parent of v, is a bridge if and only if there is no back edge from the subtree of v to u or an ancestor of u. Equivalently: (u, v) is a bridge if and only if low[v] > disc[u]. Non-tree edges are never bridges, because each one lies on a cycle. Unlike articulation points, bridges need no special handling for the DFS root.
Idea (Tarjan for Bridges)
Run DFS from each unvisited node. For each vertex v, compute low[v] = the minimum of disc[v] and disc[w] over all w reachable from v by zero or more tree edges followed by at most one back edge. For tree edge (parent, v): if low[v] > disc[parent], then (parent, v) is a bridge.
Python: Bridges
def find_bridges(n, edges):
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    disc = [-1] * n
    low = [-1] * n
    time = [0]
    bridges = []
    def dfs(u, parent):
        disc[u] = low[u] = time[0]
        time[0] += 1
        for v in adj[u]:
            if disc[v] == -1:
                dfs(v, u)
                low[u] = min(low[u], low[v])
                if low[v] > disc[u]:
                    bridges.append((u, v))
            # Skipping by vertex id assumes no parallel edges between u and parent.
            elif v != parent:
                low[u] = min(low[u], disc[v])
    for u in range(n):
        if disc[u] == -1:
            dfs(u, -1)
    return bridges
Example: Bridges
edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4)]
print(find_bridges(5, edges)) # [(3, 4), (2, 3)] (the deeper bridge is recorded first)
Time and Space Complexity
- SCC: O(V + E) — one DFS; each vertex and edge processed once.
- Bridges: O(V + E) — one DFS with parent check.
- Space: O(V) for disc, low, stack/recursion.
Common Mistakes
- Using low[v] instead of disc[v] when updating low from a back edge. The condition for a bridge is low[v] > disc[u]; for back-edge updates use disc[w] (not low[w]) so that we only consider a single back edge.
- In the SCC version: forgetting to check on_stack[v] before using disc[v] to update low[u].
Pattern Recognition
- SCC: "Strongly connected," "mutually reachable," condensation graph, 2-SAT.
- Bridges: "Critical connections," "edges whose removal disconnects," "articulation points" (related: vertex removal).
Interview Insight
Summary
- Tarjan SCC: One DFS with disc, low, stack; when low[u]==disc[u], pop to form an SCC.
- Tarjan Bridges: DFS with disc and low; (parent, v) is a bridge if low[v] > disc[parent].
- Both run in O(V + E). Use for strongly connected components and critical edges.
13.12 Bridges & Articulation Points
Introduction
In an undirected graph, two key concepts describe "critical" structure: bridges (edges whose removal increases the number of connected components) and articulation points, or cut vertices (vertices whose removal increases the number of connected components). Both can be found in O(V + E) with one DFS using discovery time and low-link values, similar to Tarjan's approach in the previous topic.
Bridges (Recap)
A bridge is an edge (u, v) such that removing it disconnects the graph (increases the number of connected components). In a DFS tree, a tree edge (parent, v) is a bridge if and only if low[v] > disc[parent] — meaning no back edge from the subtree of v reaches parent or above. See topic 13.11 for full Tarjan bridges implementation.
Articulation Points (Cut Vertices)
Definition
An articulation point is a vertex whose removal (together with its incident edges) increases the number of connected components. So the graph becomes "more disconnected" if we delete that vertex. Finding all articulation points helps identify single points of failure in networks.
Conditions (DFS Tree)
- Root of the DFS tree: The root is an articulation point if and only if it has at least two children in the DFS tree. (Removing it disconnects those subtrees.)
- Non-root vertex u: u is an articulation point if there exists a child v of u in the DFS tree such that low[v] ≥ disc[u]. That means no back edge from v's subtree reaches above u, so removing u would disconnect the subtree of v from the rest.
Mental Model
During DFS, for each vertex u we compute low[u] (earliest reachable discovery time). For a non-root u, if some child v has low[v] ≥ disc[u], then v's subtree has no back edge to u's ancestors — so u "splits" the graph. For the root, we simply count its children in the DFS tree.
Single DFS: Bridges and Articulation Points Together
One DFS can compute disc, low, and then determine both bridges and articulation points. We need to count children of the root separately to classify the root as an articulation point.
Python Implementation
def bridges_and_articulation_points(n, edges):
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    disc = [-1] * n
    low = [-1] * n
    time = [0]
    bridges = []
    is_articulation = [False] * n
    def dfs(u, parent):
        disc[u] = low[u] = time[0]
        time[0] += 1
        children = 0
        for v in adj[u]:
            if disc[v] == -1:
                children += 1
                dfs(v, u)
                low[u] = min(low[u], low[v])
                if parent != -1 and low[v] >= disc[u]:
                    is_articulation[u] = True
                if low[v] > disc[u]:
                    bridges.append((u, v))
            elif v != parent:
                low[u] = min(low[u], disc[v])
        if parent == -1 and children >= 2:
            is_articulation[u] = True
    for u in range(n):
        if disc[u] == -1:
            dfs(u, -1)
    articulation_points = [u for u in range(n) if is_articulation[u]]
    return bridges, articulation_points
Examples Section
Example 1: Bridges and Articulation Points
edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4)]
bridges, ap = bridges_and_articulation_points(5, edges)
print("Bridges:", bridges) # [(3, 4), (2, 3)]
print("Articulation points:", ap) # [2, 3]
Output: Bridges: [(3, 4), (2, 3)], Articulation points: [2, 3]. Bridges appear in the order the DFS finishes them, so the deeper edge (3, 4) comes first.
Example 2: Root as Articulation Point
edges = [(0, 1), (0, 2), (0, 3)]
_, ap = bridges_and_articulation_points(4, edges)
print(ap) # [0]
Example 3: No Articulation Point
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
bridges, ap = bridges_and_articulation_points(4, edges)
print("Bridges:", bridges, "AP:", ap) # [], []
Time and Space Complexity
- Time: O(V + E) — one DFS.
- Space: O(V) for disc, low, and the articulation flag.
Edge Cases
- Disconnected graph: Run DFS from each unvisited vertex; each DFS root is checked for ≥2 children. Vertices in other components are not reachable and are correctly not affected.
- Single vertex or two vertices: A single vertex has no bridges and no articulation points. Two vertices joined by one edge have one bridge (that edge) but no articulation point: the DFS root has only one child, and the other vertex is a leaf.
Common Mistakes
- Using low[v] > disc[u] for articulation points. The correct condition for a non-root u is low[v] ≥ disc[u] (≥, not >). Equality occurs when the subtree of v has a back edge to u itself but to nothing above u; removing u still cuts that subtree off from the rest of the graph.
Pattern Recognition
Use this when you see:
- "Critical nodes," "vertices whose removal disconnects," "single point of failure," "articulation points."
- "Critical edges," "bridges," "edges whose removal disconnects" (see also 13.11).
Interview Insight
Summary
- Bridge: edge (u,v) with low[v] > disc[u] (in DFS tree with u as parent).
- Articulation point: root with ≥2 children, or non-root u with a child v such that low[v] ≥ disc[u].
- One DFS finds both in O(V + E).
13.13 Bipartite Graph
Introduction
A graph is bipartite if its vertices can be partitioned into two sets (say A and B) such that no edge has both endpoints in the same set — every edge goes between A and B. Equivalently, the graph is 2-colorable: we can assign two "colors" to vertices so that no two adjacent vertices share the same color. Bipartite graphs are exactly those that contain no odd-length cycle. Checking whether a graph is bipartite is done by BFS or DFS coloring in O(V + E).
Formal Definition
- Partition: V = A ∪ B, A ∩ B = ∅, and every edge (u, v) has u in A and v in B (or vice versa).
- 2-colorable: There exists a function color : V → {0, 1} such that for every edge (u, v), color(u) ≠ color(v).
- No odd cycle: The graph has no cycle of odd length. (If it had an odd cycle, 2-coloring would be impossible.)
Real-World Analogy
Imagine people and jobs: edges mean "person can do job." You want to split people and jobs into two groups so that every "can do" link is between the two groups (people on one side, jobs on the other). That's a bipartite structure. Scheduling conflicts (e.g. events that can't share a room) also model as edges; 2-coloring means assigning two time slots so that conflicting events get different slots — possible only if the conflict graph is bipartite.
Algorithm: BFS/DFS 2-Coloring
Start from an arbitrary unvisited vertex; assign it color 0. Then traverse the graph (BFS or DFS). For each edge (u, v), if v is unvisited, assign v the opposite color of u (1 − color[u]). If v is already visited, check that color[v] ≠ color[u]; if color[v] == color[u], we have found an edge inside the same "side" — the graph is not bipartite. If we finish without conflict, the graph is bipartite.
Mental Model
- Think of "layers": from a start vertex, layer 0 gets color 0, layer 1 gets color 1, layer 2 gets color 0, and so on. If any edge connects two vertices of the same layer (same color), that edge creates an odd cycle — not bipartite.
Python Implementation (BFS)
from collections import deque
def is_bipartite(n, edges):
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    color = [-1] * n # -1 = unvisited; 0 and 1 = two colors
    for start in range(n):
        if color[start] != -1:
            continue
        color[start] = 0
        q = deque([start])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if color[v] == -1:
                    color[v] = 1 - color[u]
                    q.append(v)
                elif color[v] == color[u]:
                    return False
    return True
Examples Section
Example 1: Bipartite Graph
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(is_bipartite(4, edges)) # True
Example 2: Not Bipartite (Odd Cycle)
edges = [(0, 1), (1, 2), (2, 0)]
print(is_bipartite(3, edges)) # False
Example 3: Disconnected Graph
# Square 0-1-2-3-0 and triangle 4-5-6-4
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (4, 5), (5, 6), (6, 4)]
print(is_bipartite(7, edges)) # False
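When the two sides themselves are needed, not just a yes/no answer, the same coloring can return the partition. The following is a sketch under that assumption; the name bipartition is hypothetical, and it uses an explicit stack instead of a queue to show that the traversal order does not matter for the check.

```python
def bipartition(n, edges):
    """Return (side_a, side_b) if the graph is 2-colorable, else None."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    color = [-1] * n
    for start in range(n):
        if color[start] != -1:
            continue
        color[start] = 0
        stack = [start]  # explicit stack: no recursion needed
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if color[v] == -1:
                    color[v] = 1 - color[u]
                    stack.append(v)
                elif color[v] == color[u]:
                    return None  # an edge inside one side: odd cycle
    side_a = [v for v in range(n) if color[v] == 0]
    side_b = [v for v in range(n) if color[v] == 1]
    return side_a, side_b


print(bipartition(4, [(0, 1), (1, 2), (2, 3), (3, 0)]))  # ([0, 2], [1, 3])
print(bipartition(3, [(0, 1), (1, 2), (2, 0)]))          # None
```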
Time and Space Complexity
- Time: O(V + E) — each vertex and edge is processed once (we may start BFS/DFS from each unvisited vertex).
- Space: O(V) for color array and the queue (or recursion stack).
Edge Cases
- Single vertex or empty graph: Trivially bipartite (one or zero colors needed).
- Disconnected: Must check every component; if any component is not bipartite, the whole graph is not.
- No edges: Every vertex can get the same color in principle, but we can still assign alternating colors per component; the graph is bipartite.
Common Mistakes
Pattern Recognition
Think "bipartite" when you see:
- "Two groups," "no two adjacent same type," "2-colorable," "schedule with two slots so no conflict."
- Problems that use bipartite matching (maximum matching in bipartite graphs) or "can we split into two sets with no internal edges?"
Interview Insight
Summary
- Bipartite = vertices can be split into two sets with no edge inside a set = 2-colorable = no odd cycle.
- Algorithm: BFS/DFS with colors 0 and 1; conflict (same color on both ends of an edge) ⇒ not bipartite.
- Time O(V + E), space O(V). Check every component.
13.14 Eulerian Path
Introduction
An Eulerian path is a path in a graph that visits every edge exactly once. An Eulerian circuit (or Eulerian cycle) is an Eulerian path that starts and ends at the same vertex. The famous "Seven Bridges of Königsberg" problem asked whether such a walk exists. Euler showed that it depends on the degrees of the vertices. We will see the conditions for undirected and directed graphs and a simple algorithm (Hierholzer's) to build an Eulerian circuit or path in O(E).
Definitions
- Eulerian path: Walk that uses every edge exactly once (vertices may repeat).
- Eulerian circuit: Eulerian path that is closed (start = end).
Conditions: Undirected Graph
- Eulerian circuit exists iff the graph is connected (except isolated vertices) and every vertex has even degree.
- Eulerian path exists iff the graph is connected (except isolated vertices) and exactly 0 or 2 vertices have odd degree. With 0 odd-degree vertices the path is a circuit; with 2, any Eulerian path must start at one of them and end at the other.
Conditions: Directed Graph
- Eulerian circuit exists iff every vertex has in-degree = out-degree and all vertices with at least one edge lie in a single strongly connected component. (When in-degree = out-degree everywhere, being connected after ignoring direction is already enough.)
- Eulerian path exists iff: at most one vertex has out_degree − in_degree = 1 (start), at most one has in_degree − out_degree = 1 (end), and all others have in_degree = out_degree. The underlying graph (ignoring direction) must be connected.
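The directed degree conditions above can be sketched as a small check. The function name directed_euler_endpoints is an illustrative assumption, and the sketch deliberately tests only the degree conditions; connectivity of the underlying graph would have to be verified separately.

```python
def directed_euler_endpoints(n, edges):
    """
    Degree test for a directed Eulerian path.
    Returns (start, end) if the in/out-degree conditions hold, else None.
    For a circuit, start == end. Connectivity is NOT checked here.
    """
    out_deg = [0] * n
    in_deg = [0] * n
    for u, v in edges:
        out_deg[u] += 1
        in_deg[v] += 1

    starts = [v for v in range(n) if out_deg[v] - in_deg[v] == 1]
    ends = [v for v in range(n) if in_deg[v] - out_deg[v] == 1]
    balanced = sum(1 for v in range(n) if in_deg[v] == out_deg[v])

    if len(starts) == 1 and len(ends) == 1 and balanced == n - 2:
        return starts[0], ends[0]
    if not starts and not ends and balanced == n:
        # All balanced: any vertex with an outgoing edge can start a circuit.
        for v in range(n):
            if out_deg[v] > 0:
                return v, v
    return None


print(directed_euler_endpoints(3, [(0, 1), (1, 2), (2, 0)]))  # (0, 0)
print(directed_euler_endpoints(3, [(0, 1), (1, 2)]))          # (0, 2)
print(directed_euler_endpoints(3, [(0, 1), (0, 2)]))          # None
```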
Hierholzer's Algorithm
To build an Eulerian circuit: start from any vertex that has an edge (or from a designated "start" vertex if we want a path), and do a DFS that removes each edge as it is used. When we get stuck (the current vertex has no unused edges left), we append that vertex to the output and backtrack. Vertices are therefore recorded in reverse order; reverse the list at the end to get the actual Eulerian circuit (or path). Time O(E).
Mental Model
Imagine tracing a pencil along edges without lifting it, using each edge once. You can only get "stuck" at the start/end vertex (for a path) or at the same vertex you started (for a circuit). Hierholzer simulates this by going as far as possible, then backtracking and recording vertices when we have no way out — that gives the reverse of the Eulerian order.
Python Implementation (Undirected Eulerian Circuit)
from collections import defaultdict
def eulerian_circuit_undirected(n, edges):
    """
    n: vertices 0..n-1. edges: list of (u, v).
    Returns list of vertices in order of Eulerian circuit, or [] if none exists.
    """
    deg = [0] * n
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
        deg[u] += 1
        deg[v] += 1
    if any(d % 2 != 0 for d in deg):
        return []
    # Start from a vertex with at least one edge
    start = next((i for i in range(n) if deg[i] > 0), None)
    if start is None:
        return [0] if n > 0 else []  # no edges: trivial circuit at a single vertex
    path = []
    stack = [start]
    while stack:
        u = stack[-1]
        if adj[u]:
            v = adj[u].pop()
            adj[v].remove(u) # remove reverse edge (O(deg) per call)
            stack.append(v)
        else:
            path.append(stack.pop())
    path.reverse()
    # A circuit over all edges lists len(edges) + 1 vertices; fewer means some
    # edges lie in another component, so no Eulerian circuit exists.
    return path if len(path) == len(edges) + 1 else []
Note: Using a multiset or keeping an index per vertex for adjacency list avoids slow remove. For simplicity we show the idea; in practice use a list of pairs and a "next" pointer per vertex, or a multiset.
Finding Start Vertex for Eulerian Path
For an Eulerian path (not circuit), pick the start vertex as follows: if there are two vertices with odd degree, start at one of them (the path will end at the other). If all degrees are even, start at any vertex with non-zero degree. Then run the same "remove edge and recurse, push on backtrack" logic; reverse the resulting list to get the path order. For a clean O(E) implementation, use an adjacency list with a "next index" per vertex so you don't scan already-used edges.
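The "next index" idea can be sketched as follows for undirected graphs. This is one possible arrangement, not the only one: the function name eulerian_trail_undirected, the edge ids, and the used-flag array are assumptions introduced for illustration. Each edge gets an id so that traversing it from one endpoint marks it used for the other endpoint as well.

```python
def eulerian_trail_undirected(n, edges):
    """
    Return the vertex sequence of an Eulerian path or circuit, or [] if none exists.
    Uses edge ids plus a per-vertex 'next index', so each adjacency entry is scanned once.
    """
    if not edges:
        return []
    adj = [[] for _ in range(n)]          # adj[u] = list of (neighbor, edge_id)
    for eid, (u, v) in enumerate(edges):
        adj[u].append((v, eid))
        adj[v].append((u, eid))

    odd = [v for v in range(n) if len(adj[v]) % 2 == 1]
    if len(odd) not in (0, 2):
        return []
    start = odd[0] if odd else edges[0][0]

    used = [False] * len(edges)
    nxt = [0] * n                          # next unscanned position in adj[u]
    stack, trail = [start], []
    while stack:
        u = stack[-1]
        # Skip entries whose edge was already traversed from the other endpoint.
        while nxt[u] < len(adj[u]) and used[adj[u][nxt[u]][1]]:
            nxt[u] += 1
        if nxt[u] == len(adj[u]):
            trail.append(stack.pop())      # dead end: record on backtrack
        else:
            v, eid = adj[u][nxt[u]]
            used[eid] = True
            nxt[u] += 1
            stack.append(v)
    trail.reverse()
    # If some edge was never reached, the edges are not all in one component.
    return trail if len(trail) == len(edges) + 1 else []


# Triangle 0-1-2 plus a pendant edge 2-3: odd-degree vertices are 2 and 3.
print(eulerian_trail_undirected(4, [(0, 1), (1, 2), (2, 0), (2, 3)]))  # [2, 1, 0, 2, 3]
```

Because nxt[u] only moves forward, the total scanning work is proportional to the sum of adjacency list lengths, which is 2E.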
Examples Section
Example 1: Eulerian Circuit (All Even Degrees)
Degrees: 0:2, 1:2, 2:2. Condition satisfied; circuit exists.
Example 2: Eulerian Path (Two Odd Degrees)
Two odd-degree vertices ⇒ path exists; start at one, end at the other.
Example 3: No Eulerian Path (Four Odd Degrees)
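The example graphs themselves are not shown here, so as a hedged illustration the sketch below classifies three assumed stand-ins: a triangle (all degrees 2), a 3-vertex path graph (two odd degrees), and the complete graph K4 (four odd degrees). The function name classify_euler_undirected is hypothetical.

```python
def classify_euler_undirected(n, edges):
    """Return 'circuit', 'path', or 'none' for an undirected graph."""
    deg = [0] * n
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
        deg[u] += 1
        deg[v] += 1

    # Connectivity among vertices that have at least one edge.
    with_edges = [v for v in range(n) if deg[v] > 0]
    if with_edges:
        seen = {with_edges[0]}
        stack = [with_edges[0]]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        if any(v not in seen for v in with_edges):
            return "none"

    odd = sum(1 for d in deg if d % 2 == 1)
    if odd == 0:
        return "circuit"
    if odd == 2:
        return "path"
    return "none"


triangle = [(0, 1), (1, 2), (2, 0)]
path_graph = [(0, 1), (1, 2)]
k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
print(classify_euler_undirected(3, triangle))    # circuit
print(classify_euler_undirected(3, path_graph))  # path
print(classify_euler_undirected(4, k4))          # none
```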
Time and Space Complexity
- Time: O(E) to check degrees and O(E) for Hierholzer (each edge used once). Total O(E) (with suitable data structure to avoid O(E) per edge removal).
- Space: O(V + E) for graph and path.
Edge Cases
- Disconnected graph: If more than one component has edges, no Eulerian path/circuit uses all edges. Check connectivity (or degree conditions only for the component that has edges).
- Zero edges: Single vertex is a trivial circuit.
- Multiple edges / self-loops: Conditions and algorithm extend; count multiplicity in degrees.
Common Mistakes
Pattern Recognition
Think "Eulerian" when you see:
- "Use every edge exactly once," "trace without lifting the pencil," "postman route."
- Problems that reduce to finding a closed or open walk covering all edges.
Interview Insight
Summary
- Eulerian circuit: All vertices even degree (undirected); in = out (directed).
- Eulerian path: Exactly 0 or 2 odd-degree vertices (undirected); one start and one end vertex (directed).
- Hierholzer: DFS, remove edges, push vertex when stuck; reverse to get the path. O(E).
13.15 Hamiltonian Path
Introduction
A Hamiltonian path is a path in a graph that visits every vertex exactly once. A Hamiltonian cycle (or circuit) is a Hamiltonian path that starts and ends at the same vertex. Unlike the Eulerian problem (which has a simple degree-based characterization and O(E) algorithm), determining whether a graph has a Hamiltonian path or cycle is NP-complete in general. We typically use backtracking or DP with bitmask for small graphs (small V).
Hamiltonian vs Eulerian
- Eulerian: Visit every edge exactly once. Polynomial-time check and construction (degree conditions + Hierholzer).
- Hamiltonian: Visit every vertex exactly once. NP-complete; no simple necessary-and-sufficient condition for general graphs.
Definitions
- Hamiltonian path: Permutation (v₁, v₂, …, vₙ) of the vertices such that every consecutive pair is adjacent (there is an edge between vᵢ and vᵢ₊₁).
- Hamiltonian cycle: Hamiltonian path with an edge from the last vertex back to the first.
Why It's Hard
There is no known condition like "all degrees even" that characterizes Hamiltonian graphs. Some sufficient conditions exist (e.g. Dirac's theorem: if the graph has n ≥ 3 vertices and every vertex has degree ≥ n/2, then the graph has a Hamiltonian cycle), but they are not necessary. In practice we use backtracking or DP for small n.
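A small sketch makes the "sufficient but not necessary" point concrete. The helper name satisfies_dirac is an assumption for illustration, and it expects a simple undirected graph given as an adjacency list.

```python
def satisfies_dirac(adj):
    """True if n >= 3 and every vertex has degree >= n/2 (simple undirected graph assumed)."""
    n = len(adj)
    return n >= 3 and all(2 * len(set(neighbors)) >= n for neighbors in adj)


k4 = [[1, 2, 3], [0, 2, 3], [0, 1, 3], [0, 1, 2]]
c5 = [[1, 4], [0, 2], [1, 3], [2, 4], [3, 0]]
print(satisfies_dirac(k4))  # True  (so a Hamiltonian cycle is guaranteed)
print(satisfies_dirac(c5))  # False (yet 0-1-2-3-4-0 is a Hamiltonian cycle)
```

The 5-cycle fails the degree test even though the cycle itself is Hamiltonian, so a False result says nothing either way.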
Approach 1: Backtracking (DFS)
Try to build a path vertex by vertex. From the current vertex u, try each unvisited neighbor v; recurse. If we ever have visited all n vertices, we have a Hamiltonian path. If we need a cycle, also check that the last vertex is adjacent to the start. Backtrack when we hit a dead end. Time in the worst case is O(n!) for n vertices; with pruning it can be faster for sparse graphs.
Approach 2: DP with Bitmask
For small n (e.g. n ≤ 20), we can use dynamic programming: state is (mask, last) where mask is a bitmask of visited vertices and last is the last vertex in the path. dp[mask][last] = true if there is a path that visits exactly the vertices in mask and ends at last. Transition: extend by a neighbor v not in mask. Base: mask has one bit set (start vertex). Answer: any state with mask = (1<<n)−1 for path; for a cycle, fix the start vertex (e.g. vertex 0) in the base case and also require an edge from last back to it, since the state does not remember where the path began. Time O(n² · 2ⁿ), space O(n · 2ⁿ).
Python Implementation: Backtracking
def hamiltonian_path_exists(adj, n):
    """Returns True if the graph has a Hamiltonian path. adj: adjacency list."""
    def dfs(path, visited):
        if len(path) == n:
            return True
        u = path[-1]
        for v in adj[u]:
            if not visited[v]:
                visited[v] = True
                path.append(v)
                if dfs(path, visited):
                    return True
                path.pop()
                visited[v] = False
        return False
    for start in range(n):
        visited = [False] * n
        visited[start] = True
        if dfs([start], visited):
            return True
    return False
Python Implementation: DP Bitmask (Path)
def hamiltonian_path_dp(adj, n):
    """Returns True if graph has Hamiltonian path. adj: list of sets or lists (neighbors)."""
    # dp[mask][last] = can we visit all in mask and end at last?
    dp = [[False] * n for _ in range(1 << n)]
    for i in range(n):
        dp[1 << i][i] = True
    for mask in range(1 << n):
        for last in range(n):
            if not dp[mask][last]:
                continue
            for v in adj[last]:
                if mask & (1 << v):
                    continue
                new_mask = mask | (1 << v)
                dp[new_mask][v] = True
    full = (1 << n) - 1
    return any(dp[full][v] for v in range(n))
Examples Section
Example 1: Graph With Hamiltonian Path
adj = [[1], [0, 2], [1, 3], [2]]
print(hamiltonian_path_exists(adj, 4)) # True
Example 2: Graph With Hamiltonian Cycle
For cycle: after finding a path of length n, check if the last vertex is adjacent to the start.
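A minimal backtracking sketch of that check follows. The name hamiltonian_cycle is hypothetical; the sketch fixes the start at vertex 0 (any cycle passes through every vertex, so one start suffices) and requires at least 3 vertices so that the closing edge is not the edge just used.

```python
def hamiltonian_cycle(adj):
    """Return one Hamiltonian cycle as a vertex list (start repeated at the end), or None."""
    n = len(adj)
    if n < 3:
        return None                        # a simple cycle needs at least 3 vertices
    neighbors = [set(a) for a in adj]      # O(1) adjacency tests
    path = [0]                             # every cycle contains vertex 0, so fix it as start
    visited = [False] * n
    visited[0] = True

    def dfs():
        if len(path) == n:
            # All vertices used: is there a closing edge back to the start?
            return path[0] in neighbors[path[-1]]
        for v in adj[path[-1]]:
            if not visited[v]:
                visited[v] = True
                path.append(v)
                if dfs():
                    return True
                path.pop()
                visited[v] = False
        return False

    return path + [path[0]] if dfs() else None


square = [[1, 3], [0, 2], [1, 3], [0, 2]]
line = [[1], [0, 2], [1, 3], [2]]
print(hamiltonian_cycle(square))  # [0, 1, 2, 3, 0]
print(hamiltonian_cycle(line))    # None (a path exists, but no closing edge)
```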
Example 3: No Hamiltonian Path
adj = [[1], [0, 2], [1], []] # 3 is isolated
print(hamiltonian_path_exists(adj, 4)) # False
Time and Space Complexity
- Backtracking: Worst case O(n!) (try all orderings); with pruning, often much better.
- DP bitmask: O(n² · 2ⁿ) time, O(n · 2ⁿ) space. Practical for n up to about 20.
Edge Cases
- Single vertex: Trivially has a Hamiltonian path (and cycle if we allow "empty" cycle).
- Disconnected graph: No Hamiltonian path that visits all vertices (cannot reach other components).
- Two vertices with one edge: The edge is a Hamiltonian path. It is not a Hamiltonian cycle, because closing the cycle would reuse the same edge; in a simple graph a cycle needs at least 3 vertices.
Common Mistakes
Pattern Recognition
Think "Hamiltonian" when you see:
- "Visit every city exactly once," "TSP (Traveling Salesman)" — TSP asks for a minimum-weight Hamiltonian cycle.
- "Order all nodes with constraints," "permutation of vertices with adjacency constraints."
Interview Insight
Summary
- Hamiltonian path visits every vertex exactly once; Hamiltonian cycle is a closed such path.
- Problem is NP-complete. Use backtracking (small/sparse) or DP bitmask (n ≤ ~20) for exact solution.
- DP: state (mask, last), transition by unvisited neighbors; O(n² · 2ⁿ) time.
13.16 Network Flow
Introduction
Network flow is a powerful model for problems where something "flows" through a network: water through pipes, cars through roads, data through links, or goods in a supply chain. The classic problem is Maximum Flow: given a directed graph with capacities on edges, a source s, and a sink t, find the maximum amount of flow that can be sent from s to t without exceeding capacities or violating conservation at intermediate nodes.
Basic Definitions
- Capacity c(u, v): Maximum allowed flow on edge (u, v) (often a non-negative integer).
- Flow f(u, v): Actual flow along (u, v), with 0 ≤ f(u, v) ≤ c(u, v).
- Conservation: For every vertex u except source s and sink t, total incoming flow equals total outgoing flow.
- Value of flow: Total flow out of s (or into t).
- Residual graph: Graph that indicates how we can adjust flow: residual capacity c_f(u, v) = c(u, v) − f(u, v) for forward edges, and c_f(v, u) = f(u, v) for backward edges.
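The residual capacities follow directly from these definitions. The sketch below assumes capacities and flows are stored in dictionaries keyed by (u, v); that representation and the name residual_capacities are illustrative choices only.

```python
def residual_capacities(capacity, flow):
    """capacity, flow: dicts mapping (u, v) -> number. Returns the residual dict."""
    residual = {}
    for (u, v), c in capacity.items():
        f = flow.get((u, v), 0)
        # Forward residual: how much more flow the edge can still take.
        residual[(u, v)] = residual.get((u, v), 0) + (c - f)
        # Backward residual: how much already-sent flow can be cancelled.
        residual[(v, u)] = residual.get((v, u), 0) + f
    return residual


capacity = {(0, 1): 3, (1, 2): 2}
flow = {(0, 1): 2, (1, 2): 2}
res = residual_capacities(capacity, flow)
print(res[(0, 1)], res[(1, 0)])  # 1 2
print(res[(1, 2)], res[(2, 1)])  # 0 2
```

Edge (1, 2) is saturated, so its forward residual capacity is 0, while its backward entry records the 2 units that could be undone.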
Max-Flow / Min-Cut Theorem
The Max-Flow / Min-Cut Theorem states that the value of the maximum s–t flow equals the capacity of the minimum s–t cut. A cut is a partition of the vertices into two sets S and T = V \ S with s in S and t in T; its capacity is the sum of the capacities of the edges from S to T. Many network design and scheduling problems can be reduced to max-flow or min-cut.
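The theorem can be observed on a small instance. The sketch below is an illustration under assumptions: the function name max_flow_and_min_cut and the sample network are hypothetical, and the flow is computed with BFS augmenting paths over a capacity matrix. After the last BFS fails to reach t, the vertices it did reach form the source side S of a minimum cut.

```python
from collections import deque

def max_flow_and_min_cut(n, edges, s, t):
    """Return (max flow value, source-side set S of a minimum cut)."""
    cap = [[0] * n for _ in range(n)]
    for u, v, c in edges:
        cap[u][v] += c

    def bfs_parents():
        parent = [-1] * n
        parent[s] = s
        q = deque([s])
        while q:
            u = q.popleft()
            for v in range(n):
                if parent[v] == -1 and cap[u][v] > 0:
                    parent[v] = u
                    q.append(v)
        return parent

    flow = 0
    while True:
        parent = bfs_parents()
        if parent[t] == -1:
            break
        # Bottleneck along the path found by BFS.
        b = float('inf')
        v = t
        while v != s:
            b = min(b, cap[parent[v]][v])
            v = parent[v]
        # Update residual capacities in both directions.
        v = t
        while v != s:
            u = parent[v]
            cap[u][v] -= b
            cap[v][u] += b
            v = u
        flow += b
    # Vertices still reachable from s in the final residual graph.
    S = {v for v in range(n) if parent[v] != -1}
    return flow, S


edges = [(0, 1, 4), (0, 2, 3), (1, 3, 2), (2, 3, 5), (1, 2, 1)]
flow, S = max_flow_and_min_cut(4, edges, 0, 3)
cut_capacity = sum(c for u, v, c in edges if u in S and v not in S)
print(flow, sorted(S), cut_capacity)  # 6 [0, 1] 6
```

The cut edges here are 0→2, 1→2, and 1→3 with capacities 3 + 1 + 2 = 6, matching the flow value.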
Ford-Fulkerson and Edmonds-Karp
- Ford-Fulkerson: Repeatedly find an augmenting path from s to t in the residual graph and push as much flow as possible along it. Complexity depends on how paths are chosen; if capacities are integers, it terminates.
- Edmonds-Karp: A specific implementation of Ford-Fulkerson that always chooses the shortest augmenting path in terms of number of edges using BFS. It runs in O(V · E²) time.
Mental Model
Think of starting with zero flow. You repeatedly find a path from s to t along edges that have remaining capacity (residual capacity > 0) and push as much flow as possible along that path (the bottleneck). When no such path remains, you have a maximum flow. The residual graph tells you where you can still push more flow (forward edges) or cancel some you previously sent (backward edges).
Python Implementation: Edmonds-Karp (Adjacency List)
from collections import deque
def edmonds_karp(n, edges, source, sink):
    """
    n: number of vertices (0..n-1)
    edges: list of (u, v, c) directed edges with capacity c
    Returns: max_flow value
    """
    # Build capacity matrix and adjacency list
    capacity = [[0] * n for _ in range(n)]
    adj = [[] for _ in range(n)]
    for u, v, c in edges:
        capacity[u][v] += c  # allow parallel edges by accumulating
        adj[u].append(v)
        adj[v].append(u)  # add reverse edge for residual graph
    def bfs():
        parent = [-1] * n
        parent[source] = source
        q = deque([source])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if parent[v] == -1 and capacity[u][v] > 0:
                    parent[v] = u
                    if v == sink:
                        return parent
                    q.append(v)
        return parent
    max_flow = 0
    while True:
        parent = bfs()
        if parent[sink] == -1:
            break  # no more augmenting paths
        # find bottleneck capacity along path
        flow = float('inf')
        v = sink
        while v != source:
            u = parent[v]
            flow = min(flow, capacity[u][v])
            v = u
        # update residual capacities
        v = sink
        while v != source:
            u = parent[v]
            capacity[u][v] -= flow
            capacity[v][u] += flow
            v = u
        max_flow += flow
    return max_flow
Examples Section
Example 1: Simple Flow Network
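n = 4
edges = [
    (0, 1, 3),
    (0, 2, 2),
    (1, 2, 1),
    (1, 3, 2),
    (2, 3, 3),
]
print(edmonds_karp(n, edges, 0, 3))  # 5
One maximum flow: 0→1→3 with 2 units, 0→1→2→3 with 1 unit, and 0→2→3 with 2 units, total 5. Both edges out of the source (capacities 3 and 2) are saturated, so no larger flow is possible.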
Example 2: Bottleneck Intuition
Time and Space Complexity
- Edmonds-Karp: O(V · E²) time, O(V²) space for capacity matrix + O(V + E) for adjacency.
- Ford-Fulkerson (generic): O(E · |max_flow|) in the worst case (depends on capacities and path choices).
Edge Cases
- No path from s to t: BFS never reaches t; max flow = 0.
- Parallel edges: Handled by summing capacities between the same u, v.
- Self-loops: Typically irrelevant for s–t flow and can be ignored or left with zero capacity.
Common Mistakes
Pattern Recognition
Use max-flow / network flow when you see:
- "Maximum number of disjoint paths," "maximum matching" (in bipartite graphs), "assign people to tasks," "route as many units as possible."
- Capacity constraints on edges or nodes, and conservation of some quantity except at sources/sinks.
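As one illustration of these patterns, bipartite matching can be modeled as unit-capacity flow: source → each left vertex, left → right for every allowed pair, each right vertex → sink. The sketch below is self-contained; the function name bipartite_matching_via_flow and the vertex numbering are assumptions made for this example.

```python
from collections import deque

def bipartite_matching_via_flow(n_left, n_right, pairs):
    """pairs: list of (i, j) meaning left vertex i may be matched to right vertex j."""
    # Assumed layout: 0 = source, 1..n_left = left side,
    # n_left+1..n_left+n_right = right side, last index = sink.
    source = 0
    sink = n_left + n_right + 1
    n = sink + 1
    cap = [[0] * n for _ in range(n)]
    for i in range(n_left):
        cap[source][1 + i] = 1
    for j in range(n_right):
        cap[1 + n_left + j][sink] = 1
    for i, j in pairs:
        cap[1 + i][1 + n_left + j] = 1

    matched = 0
    while True:
        # BFS for an augmenting path in the residual graph.
        parent = [-1] * n
        parent[source] = source
        q = deque([source])
        while q and parent[sink] == -1:
            u = q.popleft()
            for v in range(n):
                if parent[v] == -1 and cap[u][v] > 0:
                    parent[v] = u
                    q.append(v)
        if parent[sink] == -1:
            break
        # Every augmenting path carries exactly one unit here.
        v = sink
        while v != source:
            u = parent[v]
            cap[u][v] -= 1
            cap[v][u] += 1
            v = u
        matched += 1
    return matched


# Three people, two tasks: person 0 can do either task, person 1 only task 0, person 2 only task 1.
print(bipartite_matching_via_flow(3, 2, [(0, 0), (0, 1), (1, 0), (2, 1)]))  # 2
```

The maximum flow value equals the size of the maximum matching because every unit of flow uses one left vertex and one right vertex exactly once.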
Interview Insight
Summary
- Network flow models flows with capacities and conservation; max-flow seeks the largest s–t flow.
- Residual graph and augmenting paths underpin Ford-Fulkerson and Edmonds-Karp.
- Edmonds-Karp: BFS for augmenting paths, O(V·E²); foundation for many advanced flow algorithms.
13.17 Edmonds-Karp
Introduction
Edmonds-Karp is the name for the max-flow algorithm that implements Ford–Fulkerson by always choosing the shortest augmenting path (in number of edges) from source to sink in the residual graph, using BFS. This choice guarantees at most O(V · E) augmentations and total time O(V · E²), making it a standard, easy-to-code algorithm for maximum flow.
Why “Shortest” Augmenting Path?
In generic Ford–Fulkerson, we only require some augmenting path; the number of augmentations can be huge (even infinite for irrational capacities). Edmonds–Karp fixes this: by always taking a shortest s–t path in the residual graph, one can prove that the distance from s to any vertex (in edges) never decreases. Each edge can be “critical” (bottleneck on a shortest path) at most O(V) times, so there are at most O(V · E) augmentations. Each BFS costs O(E), hence O(V · E²) total.
Algorithm Summary
- Initialize flow to zero; residual capacities = original capacities.
- Repeat: run BFS from s to t in the residual graph (only edges with residual capacity > 0). If t is unreachable, stop.
- Reconstruct the path via parent pointers; find bottleneck = minimum residual capacity on the path.
- Augment: subtract bottleneck from forward residual capacities, add bottleneck to backward residual capacities. Add bottleneck to total flow.
Python Implementation
from collections import deque
def edmonds_karp(n, edges, source, sink):
    capacity = [[0] * n for _ in range(n)]
    adj = [[] for _ in range(n)]
    for u, v, c in edges:
        capacity[u][v] += c
        adj[u].append(v)
        adj[v].append(u)
    def bfs():
        parent = [-1] * n
        parent[source] = source
        q = deque([source])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if parent[v] == -1 and capacity[u][v] > 0:
                    parent[v] = u
                    if v == sink:
                        return parent
                    q.append(v)
        return parent
    max_flow = 0
    while True:
        parent = bfs()
        if parent[sink] == -1:
            break
        flow_inc = float('inf')
        v = sink
        while v != source:
            u = parent[v]
            flow_inc = min(flow_inc, capacity[u][v])
            v = u
        v = sink
        while v != source:
            u = parent[v]
            capacity[u][v] -= flow_inc
            capacity[v][u] += flow_inc
            v = u
        max_flow += flow_inc
    return max_flow
Examples Section
Example 1: Two Paths
n = 4
edges = [(0, 1, 10), (0, 2, 10), (1, 3, 10), (2, 3, 10)]
print(edmonds_karp(n, edges, 0, 3)) # 20
Example 2: Bottleneck in the Middle
edges2 = [(0, 1, 100), (0, 2, 100), (1, 2, 1), (1, 3, 100), (2, 3, 100)]
print(edmonds_karp(4, edges2, 0, 3)) # 200
Example 3: No Path
edges3 = [(0, 1, 5), (2, 3, 5)]
print(edmonds_karp(4, edges3, 0, 3)) # 0
Time and Space Complexity
- Time: O(V · E²) — O(V · E) augmentations, each requiring O(E) BFS.
- Space: O(V²) for capacity matrix; O(V + E) for adjacency list and BFS.
Common Mistakes
Summary
- Edmonds-Karp = Ford–Fulkerson with BFS for shortest augmenting path.
- O(V · E²) time, O(V²) space; deterministic and easy to implement.
- Use whenever you need max-flow in practice for moderate-sized graphs; for very large graphs, consider Dinic or other algorithms.
13.18 Dinic
Introduction
Dinic's algorithm (also written Dinitz) is a faster maximum flow algorithm than Edmonds-Karp. It uses level graphs and blocking flows: in each phase, it builds a layered graph with BFS and then pushes as much flow as possible along many paths in that layer using DFS, until no more flow can be sent (blocking flow). For general graphs it runs in O(V² · E); for unit capacities or bipartite matching it is often even faster in practice.
Core Ideas
- Level graph: Run BFS from the source in the residual graph. Assign each vertex a level = distance in edges from the source. Keep only edges (u, v) where level[v] = level[u] + 1. This gives a DAG (no backward edges in the level graph).
- Blocking flow: In the level graph, send flow from s to t until no s–t path remains in the level graph (every path has at least one saturated edge). That flow is a blocking flow for this phase.
- Phases: Each phase: (1) BFS to build level graph; (2) repeated DFS to find and push blocking flow; then update residual and repeat until no augmenting path exists.
Mental Model
Edmonds-Karp sends flow along one shortest path per BFS. Dinic "locks in" the current distances and sends flow along many shortest paths in the same level graph before recomputing levels. That reduces the number of BFS phases and improves performance.
Edge Representation (Forward + Reverse)
Store each logical edge as two directed edges: forward (u→v with capacity c) and backward (v→u with capacity 0). Each edge stores a pointer to its reverse so we can update both when sending flow: forward.cap −= f, reverse.cap += f.
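A common alternative way to pair forward and reverse edges, shown here only as a sketch, is a flat edge list in which edges are always added two at a time, so edge i and edge i ^ 1 are each other's reverse. The class name EdgeList and its methods are hypothetical and are not the representation used elsewhere in this section.

```python
class EdgeList:
    """Flat edge storage where edge i and edge i ^ 1 form a forward/reverse pair."""

    def __init__(self, n):
        self.to = []                       # to[i]  = head vertex of edge i
        self.cap = []                      # cap[i] = residual capacity of edge i
        self.adj = [[] for _ in range(n)]  # adj[u] = indices of edges leaving u

    def add_edge(self, u, v, c):
        # Forward edge gets an even index, its reverse the next odd index.
        self.adj[u].append(len(self.to))
        self.to.append(v)
        self.cap.append(c)
        self.adj[v].append(len(self.to))
        self.to.append(u)
        self.cap.append(0)

    def push(self, i, f):
        """Send f units along edge i and credit its reverse."""
        self.cap[i] -= f
        self.cap[i ^ 1] += f


g = EdgeList(2)
g.add_edge(0, 1, 5)
g.push(0, 3)
print(g.cap)  # [2, 3]
```

The XOR trick removes the need to store a reverse index per edge, at the cost of requiring that edges are always inserted in pairs.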
Python Implementation
from collections import deque
class Dinic:
    def __init__(self, n):
        self.n = n
        self.adj = [[] for _ in range(n)]  # each: [to, cap, rev_index]
    def add_edge(self, u, v, c):
        fwd = [v, c, None]
        bwd = [u, 0, None]
        fwd[2] = len(self.adj[v])
        bwd[2] = len(self.adj[u])
        self.adj[u].append(fwd)
        self.adj[v].append(bwd)
    def bfs_level(self, s, t):
        self.level = [-1] * self.n
        q = deque()
        self.level[s] = 0
        q.append(s)
        while q:
            u = q.popleft()
            for v, cap, rev in self.adj[u]:
                if cap > 0 and self.level[v] == -1:
                    self.level[v] = self.level[u] + 1
                    q.append(v)
        return self.level[t] != -1
    def dfs_flow(self, u, t, f, it):
        if u == t:
            return f
        for i in range(it[u], len(self.adj[u])):
            it[u] = i
            v, cap, rev = self.adj[u][i]
            if cap > 0 and self.level[v] == self.level[u] + 1:
                d = self.dfs_flow(v, t, min(f, cap), it)
                if d > 0:
                    self.adj[u][i][1] -= d
                    self.adj[v][rev][1] += d
                    return d
        return 0
    def max_flow(self, s, t):
        flow = 0
        INF = 10**18
        while self.bfs_level(s, t):
            it = [0] * self.n
            while True:
                f = self.dfs_flow(s, t, INF, it)
                if f == 0:
                    break
                flow += f
        return flow
Examples Section
Example 1: Same Network as Edmonds-Karp
n = 4
dinic = Dinic(n)
dinic.add_edge(0, 1, 3)
dinic.add_edge(0, 2, 2)
dinic.add_edge(1, 2, 1)
dinic.add_edge(1, 3, 2)
dinic.add_edge(2, 3, 3)
print(dinic.max_flow(0, 3))  # 5
Example 2: Two Disjoint Paths
dinic2 = Dinic(4)
dinic2.add_edge(0, 1, 10)
dinic2.add_edge(1, 3, 10)
dinic2.add_edge(0, 2, 10)
dinic2.add_edge(2, 3, 10)
print(dinic2.max_flow(0, 3)) # 20
Time and Space Complexity
- Time: O(V² · E) for general graphs. On unit networks such as the flow graph for bipartite matching, the bound improves to O(E · √V); in practice Dinic is often much faster than its worst case.
- Space: O(V + E) for adjacency lists and edge structures.
Common Mistakes
- Omitting the current-edge iterator it[] in DFS. Without it, we rescan from the start of each vertex's list and performance degrades; the iterator ensures we don't revisit saturated edges in the same phase.
- Storing the wrong reverse index: an incorrect rev breaks residual updates and gives wrong flow.
Pattern Recognition
Use Dinic when you see:
- Max-flow on larger graphs where Edmonds-Karp may be too slow.
- Bipartite matching, scheduling, or routing modeled as flow.
- Competitive programming problems with strict time limits.
Summary
- Dinic uses level graphs (BFS) and blocking flows (DFS) for maximum flow.
- O(V² · E) in general; faster on many special cases; preferred over Edmonds-Karp for large graphs.
13.19 Johnson's Algorithm
Introduction
Johnson's algorithm finds all-pairs shortest paths in a directed graph that may have negative edge weights (but no negative cycles). It combines Bellman-Ford (to compute a "potential" function) with Dijkstra (run once per vertex on reweighted edges). The reweighting makes all edge weights non-negative so Dijkstra is valid, and the final distances are corrected to get the true shortest paths. Total time is O(V E log V) with a binary heap, or O(V E + V² log V) with a Fibonacci heap.
Why Not Just Bellman-Ford or Floyd-Warshall?
- Bellman-Ford from each source: O(V² · E) — slow for all pairs.
- Floyd-Warshall: O(V³) and handles negative weights, but no benefit from sparse graphs.
- Johnson: One Bellman-Ford O(V·E) plus V runs of Dijkstra O(E log V) each ⇒ O(V·E + V·E log V) = O(V E log V) for sparse graphs, which can beat Floyd-Warshall when E is small.
Algorithm Overview
- Add a dummy vertex s with zero-weight edges to every other vertex. Run Bellman-Ford from s. Let h[v] = shortest distance from s to v. If Bellman-Ford detects a negative cycle, stop (no finite all-pairs distances).
- Reweight edges: For each edge (u, v) with weight w, set w'(u,v) = w(u,v) + h[u] − h[v]. The key property: w' ≥ 0 (triangle inequality from shortest paths), and any path's length in w' differs from its length in w by a constant that depends only on endpoints: dist'(u,v) = dist(u,v) + h[u] − h[v].
- Run Dijkstra from each vertex u in the graph with the new weights w'. Get d'[u][v] = shortest distance from u to v under w'.
- Convert back: True shortest distance dist[u][v] = d'[u][v] − h[u] + h[v].
Mental Model
The potential h[v] is the shortest distance from the dummy source to v, so it is never positive. Because h[v] ≤ h[u] + w(u,v) for every edge, adding h[u] − h[v] to each weight makes every edge non-negative, so Dijkstra works. Along any path the added terms telescope to h[start] − h[end], which is why the correction − h[u] + h[v] in the final step recovers the real distances.
Python Implementation (Sketch)
import heapq
def johnson(n, edges):
    """
    n: vertices 0..n-1. edges: list of (u, v, w).
    Returns: 2D list dist, or None if negative cycle exists.
    """
    # Step 1: Add dummy source n, run Bellman-Ford
    adj_bf = [[] for _ in range(n + 1)]
    for u, v, w in edges:
        adj_bf[u].append((v, w))
    for v in range(n):
        adj_bf[n].append((v, 0))
    INF = float('inf')
    h = [INF] * (n + 1)
    h[n] = 0
    for _ in range(n):
        for u in range(n + 1):
            for v, w in adj_bf[u]:
                if h[u] != INF and h[u] + w < h[v]:
                    h[v] = h[u] + w
    # Negative cycle check
    for u in range(n + 1):
        for v, w in adj_bf[u]:
            if h[u] != INF and h[u] + w < h[v]:
                return None  # negative cycle
    h = h[:n]  # discard dummy
    # Step 2 & 3: Build adj with reweighted edges, run Dijkstra from each u
    adj = [[] for _ in range(n)]
    for u, v, w in edges:
        adj[u].append((v, w + h[u] - h[v]))
    dist = [[INF] * n for _ in range(n)]
    for s in range(n):
        dist[s][s] = 0
        heap = [(0, s)]
        while heap:
            d, u = heapq.heappop(heap)
            if d != dist[s][u]:
                continue
            for v, w in adj[u]:
                if dist[s][u] + w < dist[s][v]:
                    dist[s][v] = dist[s][u] + w
                    heapq.heappush(heap, (dist[s][v], v))
        for v in range(n):
            if dist[s][v] != INF:
                dist[s][v] = dist[s][v] - h[s] + h[v]
    return dist
Examples Section
Example 1: Graph With Negative Edge (No Negative Cycle)
Dummy source 3 → 0,1,2 with weight 0. Bellman-Ford gives h[0], h[1], h[2]. Reweight; then Dijkstra from each vertex; correct with −h[u]+h[v].
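To make the reweighting step concrete, here is a minimal sketch on a hypothetical three-vertex graph. The edge list and the helper names potentials and reweight are illustrative assumptions. Starting every h[v] at 0 plays the role of the dummy source with zero-weight edges.

```python
def potentials(n, edges):
    """Bellman-Ford potentials; returns None if a negative cycle is found."""
    # h[v] = 0 initially is equivalent to a dummy source with 0-weight edges to all v.
    h = [0] * n
    for _ in range(n):
        changed = False
        for u, v, w in edges:
            if h[u] + w < h[v]:
                h[v] = h[u] + w
                changed = True
        if not changed:
            return h
    return None  # still improving after n rounds: negative cycle


def reweight(edges, h):
    """Apply w'(u, v) = w(u, v) + h[u] - h[v] to every edge."""
    return [(u, v, w + h[u] - h[v]) for u, v, w in edges]


edges = [(0, 1, 4), (1, 2, -2), (0, 2, 3), (2, 0, 1)]  # assumed sample graph
h = potentials(3, edges)
print(h)                   # [-1, 0, -2]
print(reweight(edges, h))  # [(0, 1, 3), (1, 2, 0), (0, 2, 4), (2, 0, 0)]
```

The negative edge 1→2 becomes 0 after reweighting, and every other edge stays non-negative, so Dijkstra can be run on the new weights.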
Example 2: Negative Cycle
Time and Space Complexity
- Time: O(V · E) for Bellman-Ford + V × O(E log V) for V Dijkstras ⇒ O(V E log V) with binary heap. With Fibonacci heap: O(V E + V² log V).
- Space: O(V²) for the distance matrix; O(V + E) for adjacency and heaps.
Edge Cases
- Negative cycle: Bellman-Ford step must detect it; return None or signal that all-pairs distances are undefined.
- Disconnected: Unreachable pairs remain at INF after Dijkstra; correction formula still applies (INF stays INF).
Common Mistakes
Summary
- Johnson's algorithm solves all-pairs shortest paths with possible negative weights (no negative cycle).
- Uses one Bellman-Ford to get potentials h, reweights to non-negative, then V Dijkstras; correct with −h[u]+h[v].
- Time O(V E log V) with binary heap; good for sparse graphs compared to Floyd-Warshall O(V³).
13.20 0-1 BFS
Introduction
0-1 BFS is a variant of BFS that finds the shortest path (in terms of total edge weight) in a graph where every edge has weight either 0 or 1. Instead of a priority queue (Dijkstra), we use a double-ended queue (deque): push vertices reached by a 0-weight edge to the front and vertices reached by a 1-weight edge to the back. This keeps the deque ordered by distance, so we always process the smallest-distance node next. Time complexity is O(V + E) — linear, like BFS.
When to Use 0-1 BFS
- Graph has only 0 and 1 edge weights (e.g. "free" vs "cost 1" moves).
- You need shortest path from a source; Dijkstra would work but 0-1 BFS is simpler and faster (no log factor).
- Common in grid problems: moving to an empty cell = 0, breaking a wall or paying cost = 1.
Algorithm
- Initialize dist[s] = 0 for source s, dist[v] = ∞ for others. Use a deque; push s at the front.
- While the deque is not empty:
  - Pop a vertex u from the front of the deque.
  - For each neighbor v with edge weight w (0 or 1):
    - If dist[u] + w < dist[v], update dist[v] = dist[u] + w. If w == 0, push v to the front of the deque; else push v to the back.
Why this works: vertices in the deque are always in non-decreasing order of distance. So the front always has the smallest distance; we never need a heap.
Mental Model
Think of the deque as two "layers": the front contains nodes at the current minimum distance; the back contains nodes at current distance + 1. Processing from the front keeps the invariant. When we add a 0-weight edge, we don't increase distance, so we put the node at the front; when we add a 1-weight edge, we put it at the back.
Python Implementation
from collections import deque
def bfs_01(n, adj, source):
    """
    n: vertices 0..n-1.
    adj: adj[u] = list of (v, w) where w is 0 or 1.
    Returns: dist[0..n-1] from source.
    """
    INF = float('inf')
    dist = [INF] * n
    dist[source] = 0
    dq = deque([source])
    while dq:
        u = dq.popleft()
        for v, w in adj[u]:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                if w == 0:
                    dq.appendleft(v)
                else:
                    dq.append(v)
    return dist
Examples Section
Example 1: Simple 0-1 Graph
adj = [
    [(1, 0), (2, 1)],  # 0
    [(3, 1)],          # 1
    [(3, 0)],          # 2
    []                 # 3
]
print(bfs_01(4, adj, 0)) # [0, 0, 1, 1]
Output: [0, 0, 1, 1] — dist[0]=0, dist[1]=0 (via 0→1 weight 0), dist[2]=1, dist[3]=1.
Example 2: Grid With 0 and 1 Costs
# Conceptual: grid[r][c] = 0 or 1 (cost to enter)
# Vertex index = r * cols + c. Neighbors: up, down, left, right.
# Edge weight = grid[nr][nc]. Use bfs_01 on that graph.
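The conceptual grid version can be sketched directly on (row, column) pairs without building an explicit graph. The function name min_cost_grid and the choice of top-left start and bottom-right target are assumptions for this illustration.

```python
from collections import deque

def min_cost_grid(grid):
    """grid[r][c] is 0 or 1: the cost of stepping INTO that cell."""
    rows, cols = len(grid), len(grid[0])
    INF = float('inf')
    dist = [[INF] * cols for _ in range(rows)]
    dist[0][0] = 0  # assumption: the start cell itself costs nothing
    dq = deque([(0, 0)])
    while dq:
        r, c = dq.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                w = grid[nr][nc]
                if dist[r][c] + w < dist[nr][nc]:
                    dist[nr][nc] = dist[r][c] + w
                    # 0-cost moves go to the front, 1-cost moves to the back.
                    if w == 0:
                        dq.appendleft((nr, nc))
                    else:
                        dq.append((nr, nc))
    return dist[rows - 1][cols - 1]


open_corridor = [
    [0, 1, 0],
    [0, 1, 0],
    [0, 0, 0],
]
walled = [
    [0, 1],
    [1, 0],
]
print(min_cost_grid(open_corridor))  # 0
print(min_cost_grid(walled))         # 1
```

In the first grid a free path runs down the left column and along the bottom row; in the second, every route must enter one cost-1 cell.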
Time and Space Complexity
- Time: O(V + E). A vertex can enter the deque at most twice (once at distance d + 1 and once more if it later improves to d), so each edge is scanned a constant number of times. No log factor.
- Space: O(V) for dist and the deque.
Edge Cases
- All weights 0: Every reachable node gets distance 0. Because every push goes to the front, the visiting order resembles DFS more than BFS, but the distances are still correct.
- All weights 1: Same as standard BFS; distance = number of edges.
- Disconnected: Unreachable vertices remain at INF.
Common Mistakes
Pattern Recognition
Use 0-1 BFS when you see:
- "Minimum cost" or "shortest path" with only two types of cost (e.g. free vs 1, or 0 vs 1).
- Grid problems: empty cell = 0, wall/cost = 1; or "minimum walls to break" to reach target.
Summary
- 0-1 BFS finds shortest path when all edge weights are 0 or 1, using a deque (0 → front, 1 → back).
- O(V + E) time, O(V) space; no priority queue needed.
- Ideal for grid and graph problems with binary costs.
13.21 2-SAT
Introduction
2-SAT is the problem of deciding whether a Boolean formula in conjunctive normal form (CNF) where each clause has exactly two literals can be satisfied. Each clause is of the form (a ∨ b) (at least one of a, b is true). We want an assignment to all variables that makes every clause true, or we report that no such assignment exists. The classic solution reduces 2-SAT to strongly connected components (SCC) in an "implication graph" and runs in O(V + E) (linear in the number of variables and clauses).
From Clauses to Implications
A clause (a ∨ b) is equivalent to: if ¬a then b, and if ¬b then a. So we have two implications: ¬a → b and ¬b → a. We build a directed graph with one node per literal (so for variable x we have nodes for x and ¬x). For each clause (a ∨ b), add directed edges (¬a, b) and (¬b, a).
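The translation from clauses to edges is mechanical. The sketch below assumes variable i is encoded as literal i and its negation as literal i + n; the helper name implication_edges is hypothetical.

```python
def implication_edges(n, clauses):
    """Return the implication-graph edges for a list of 2-literal clauses."""
    def neg(lit):
        # Literal i (0..n-1) is x_i; literal i + n is its negation.
        return lit + n if lit < n else lit - n

    edges = []
    for a, b in clauses:
        edges.append((neg(a), b))  # ¬a → b
        edges.append((neg(b), a))  # ¬b → a
    return edges


# n = 2: literals 0 = x0, 1 = x1, 2 = ¬x0, 3 = ¬x1.
print(implication_edges(2, [(0, 1)]))  # [(2, 1), (3, 0)]
print(implication_edges(2, [(2, 3)]))  # [(0, 3), (1, 2)]
```

Each clause always contributes exactly two edges, so a formula with m clauses gives a graph with 2n nodes and 2m edges.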
Implication Graph and Satisfiability
- Implication graph: 2n nodes (n variables × 2 for literal and negation). Edges from implications.
- Key fact: The 2-SAT formula is satisfiable if and only if no variable x has both a path from x to ¬x and a path from ¬x to x — i.e. x and ¬x are not in the same strongly connected component (SCC).
- If x and ¬x lie in the same SCC, then we must have both x → ¬x and ¬x → x, so x must be both true and false — impossible.
Algorithm
- Build the implication graph from all clauses (each clause (a ∨ b) gives edges (¬a→b) and (¬b→a)).
- Find all SCCs (e.g. Tarjan's algorithm or two DFS Kosaraju).
- For each variable x, check: if scc_id[x] == scc_id[¬x], return "unsatisfiable."
- Otherwise, assign each variable: one common method is to assign x = false if the SCC of x appears before the SCC of ¬x in the topological order of the condensation graph (and x = true otherwise). Alternatively: assign x so that the literal in the "later" SCC is true (so the implication chain is satisfied).
Assignment Construction
After computing SCCs, compare the positions of x and ¬x in the topological order of the condensation graph. For each variable x, make true the literal whose SCC comes later in that order: if the SCC of x comes before the SCC of ¬x, set x false (so ¬x is true); otherwise set x true. How this maps to SCC ids depends on the algorithm. If ids follow topological order, value[x] = (scc[x] > scc[¬x]). Tarjan's algorithm completes sink components first, so its ids run in reverse topological order, and the test becomes value[x] = (scc[x] < scc[¬x]).
Python Implementation (Using Tarjan SCC)
def two_sat(n, clauses):
    """
    n: number of variables (0..n-1). Each variable i has literals i (true) and i+n (false/¬i).
    clauses: list of (a, b) meaning (literal_a ∨ literal_b). Literal: 0..n-1 = variable true, n..2n-1 = variable false.
    Returns: assignment [True/False] for each variable, or None if unsatisfiable.
    """
    # Build implication graph: 2n nodes
    N = 2 * n
    adj = [[] for _ in range(N)]
    def neg(lit):
        return lit + n if lit < n else lit - n
    for a, b in clauses:
        adj[neg(a)].append(b)  # ¬a → b
        adj[neg(b)].append(a)  # ¬b → a
    # Tarjan SCC
    disc = [-1] * N
    low = [-1] * N
    on_stack = [False] * N
    stack = []
    time = [0]
    scc_id = [-1] * N
    scc_count = [0]
    def dfs(u):
        disc[u] = low[u] = time[0]
        time[0] += 1
        stack.append(u)
        on_stack[u] = True
        for v in adj[u]:
            if disc[v] == -1:
                dfs(v)
                low[u] = min(low[u], low[v])
            elif on_stack[v]:
                low[u] = min(low[u], disc[v])
        if low[u] == disc[u]:
            while True:
                v = stack.pop()
                on_stack[v] = False
                scc_id[v] = scc_count[0]
                if v == u:
                    break
            scc_count[0] += 1
    for u in range(N):
        if disc[u] == -1:
            dfs(u)
    # Check x and ¬x in same SCC?
    for i in range(n):
        if scc_id[i] == scc_id[i + n]:
            return None
    # Assign: Tarjan numbers SCCs in reverse topological order, so the literal
    # with the smaller id is later in topological order and is set true.
    return [scc_id[i] < scc_id[i + n] for i in range(n)]
Examples Section
Example 1: Satisfiable Formula
# Literals: 0=x0, 1=x1, 2=¬x0, 3=¬x1. Clauses (a∨b): (0,1), (2,1), (0,3)
n = 2
clauses = [(0, 1), (2, 1), (0, 3)]
ans = two_sat(n, clauses)
print(ans) # e.g. [True, True]
Example 2: Unsatisfiable Formula
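A small unsatisfiable instance over two variables contains all four clauses (x0 ∨ x1), (x0 ∨ ¬x1), (¬x0 ∨ x1), (¬x0 ∨ ¬x1). The sketch below uses the same literal encoding as two_sat and checks the claim two ways: by brute force over all assignments, and by confirming that x0 and ¬x0 reach each other in the implication graph. The helper names brute_force_satisfiable and reaches are hypothetical.

```python
from itertools import product

def brute_force_satisfiable(n, clauses):
    """Try all 2^n assignments; only sensible for tiny n."""
    def value(lit, assignment):
        return assignment[lit] if lit < n else not assignment[lit - n]
    for assignment in product([False, True], repeat=n):
        if all(value(a, assignment) or value(b, assignment) for a, b in clauses):
            return True
    return False


def reaches(adj, src, dst):
    """Iterative DFS reachability in the implication graph."""
    seen = {src}
    stack = [src]
    while stack:
        u = stack.pop()
        if u == dst:
            return True
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return False


n = 2
# Literals: 0 = x0, 1 = x1, 2 = ¬x0, 3 = ¬x1.
clauses = [(0, 1), (0, 3), (2, 1), (2, 3)]
adj = [[] for _ in range(2 * n)]
for a, b in clauses:
    na = a + n if a < n else a - n
    nb = b + n if b < n else b - n
    adj[na].append(b)  # ¬a → b
    adj[nb].append(a)  # ¬b → a

print(brute_force_satisfiable(n, clauses))        # False
print(reaches(adj, 0, 2) and reaches(adj, 2, 0))  # True
```

Since x0 and ¬x0 are mutually reachable, they lie in the same SCC, which is exactly the unsatisfiability condition.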
Time and Space Complexity
- Time: O(n + m) where n = number of variables, m = number of clauses. Building graph O(m), Tarjan O(2n + 2m) = O(n + m).
- Space: O(n + m) for the implication graph and SCC data.
Common Mistakes
Pattern Recognition
Use 2-SAT when you see:
- Constraints that are "at least one of two choices" or "if not A then B" type.
- Binary assignments (true/false, 0/1) with pairwise constraints.
Summary
- 2-SAT is solved by building the implication graph and checking that no variable has x and ¬x in the same SCC.
- Assignment: for each variable x, set x = true iff the SCC of x has higher topological order than the SCC of ¬x (or use the standard reverse-topo assignment).
- O(n + m) time with Tarjan (or Kosaraju) for SCC.
13.22 Stable Matching (Gale-Shapley)
Introduction
The stable matching problem (also known as the stable marriage problem) has two sets of agents (e.g. men and women, or students and schools). Each agent has a strict preference list over the other set. We want a perfect matching (everyone matched exactly once) with no blocking pair — no two agents who prefer each other over their current partners. Gale-Shapley is an algorithm that always finds a stable matching in O(n²) time.
Definitions
- Matching: A set of pairs (a, b) with one from each set; each agent appears in at most one pair. Perfect matching: everyone appears exactly once.
- Blocking pair: Two agents x and y (from different sets) who are not matched to each other but each prefers the other to their current partner. So (x, y) would "run off" together.
- Stable matching: A perfect matching with no blocking pair.
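The blocking-pair definition translates into a direct O(n²) check. The sketch below assumes a perfect matching given as partner_proposer[m] = w; the function name find_blocking_pairs is hypothetical.

```python
def find_blocking_pairs(n, pref_proposer, pref_acceptor, partner_proposer):
    """Return every (m, w) that blocks the given perfect matching."""
    # rank tables: smaller rank means more preferred
    rank_p = [{w: r for r, w in enumerate(pref_proposer[m])} for m in range(n)]
    rank_a = [{m: r for r, m in enumerate(pref_acceptor[w])} for w in range(n)]
    partner_acceptor = [None] * n
    for m, w in enumerate(partner_proposer):
        partner_acceptor[w] = m

    blocking = []
    for m in range(n):
        for w in range(n):
            if partner_proposer[m] == w:
                continue
            m_prefers = rank_p[m][w] < rank_p[m][partner_proposer[m]]
            w_prefers = rank_a[w][m] < rank_a[w][partner_acceptor[w]]
            if m_prefers and w_prefers:
                blocking.append((m, w))
    return blocking


# Everyone ranks partner 0 first.
pref_proposer = [[0, 1], [0, 1]]
pref_acceptor = [[0, 1], [0, 1]]
print(find_blocking_pairs(2, pref_proposer, pref_acceptor, [1, 0]))  # [(0, 0)]
print(find_blocking_pairs(2, pref_proposer, pref_acceptor, [0, 1]))  # []
```

A matching is stable exactly when this function returns an empty list, which makes it a handy test oracle for any stable-matching implementation.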
Gale-Shapley Algorithm (Proposer-Acceptor)
One set is the proposers (e.g. men); the other is the acceptors (e.g. women). Each proposer has a list of acceptors in order of preference; each acceptor has a list of proposers. The algorithm:
- Every proposer is initially free. Each acceptor has no partner.
- While there exists a free proposer who has not proposed to every acceptor:
- Pick such a proposer m. He proposes to his most preferred acceptor w to whom he has not yet proposed.
- If w is free, she tentatively accepts (m, w). If w is matched to m', she accepts the one she prefers (m or m'); the rejected proposer becomes free again.
- When no free proposer has any remaining choices, the tentative matching is final and is stable.
Key Properties
- Termination: Each proposer proposes at most n times, so at most n² proposals; the algorithm always terminates.
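- Stable: The output matching has no blocking pair. Proof by contradiction: suppose (m, w) blocks. Then m prefers w to his final partner, so he proposed to w before that partner and was rejected or later replaced. Acceptors only trade up, so w's final partner is someone she prefers to m, which contradicts w preferring m.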
- Proposer-optimal: Every proposer gets the best partner they can have in any stable matching. Acceptors get their worst valid partner in any stable matching (acceptor-pessimal).
Mental Model
Proposers "propose" in order of preference; acceptors keep the best offer so far and reject the rest. Rejected proposers move down their list. Because acceptors only trade up, no one who was rejected can later form a blocking pair with an acceptor who already has a better (for her) partner.
Python Implementation
def gale_shapley(n, pref_proposer, pref_acceptor):
    """
    n: number of proposers and acceptors (each 0..n-1).
    pref_proposer[i] = list of n acceptor indices in order of preference for proposer i.
    pref_acceptor[j] = list of n proposer indices in order of preference for acceptor j.
    Returns: list partner_proposer[0..n-1] where partner_proposer[i] = acceptor matched to proposer i.
    """
    # rank_acceptor[j][m] = rank of proposer m in acceptor j's list (0 = best)
    rank_acceptor = [[0] * n for _ in range(n)]
    for j in range(n):
        for r, m in enumerate(pref_acceptor[j]):
            rank_acceptor[j][m] = r
    partner_acceptor = [-1] * n  # partner_acceptor[j] = proposer matched to j
    next_proposal = [0] * n  # next index in pref_proposer[i] to try
    free = list(range(n))  # free proposers
    while free:
        m = free.pop()
        if next_proposal[m] >= n:
            continue
        w = pref_proposer[m][next_proposal[m]]
        next_proposal[m] += 1
        if partner_acceptor[w] == -1:
            partner_acceptor[w] = m
        else:
            m_prime = partner_acceptor[w]
            if rank_acceptor[w][m] < rank_acceptor[w][m_prime]:  # w prefers m
                partner_acceptor[w] = m
                free.append(m_prime)
            else:
                free.append(m)
    # Build result: partner_proposer[m] = w such that partner_acceptor[w] == m
    partner_proposer = [0] * n
    for w in range(n):
        m = partner_acceptor[w]
        if m != -1:
            partner_proposer[m] = w
    return partner_proposer
Examples Section
Example 1: Two-by-Two
n = 2
pref_proposer = [[1, 0], [0, 1]] # 0 prefers 1>0, 1 prefers 0>1
pref_acceptor = [[0, 1], [1, 0]] # 0 prefers 0>1, 1 prefers 1>0
result = gale_shapley(n, pref_proposer, pref_acceptor)
print(result) # e.g. [1, 0] -> proposer 0 gets acceptor 1, proposer 1 gets acceptor 0
One stable outcome: (0,1) and (1,0). Proposer 0 gets acceptor 1; proposer 1 gets acceptor 0. No blocking pair.
Example 2: Three-by-Three
Time and Space Complexity
- Time: O(n²) — each proposer proposes at most n times; each proposal is O(1) with rank lookup.
- Space: O(n²) for preference and rank arrays; O(n) for matching state.
Common Mistakes
- Forgetting to advance next_proposal[m]: increment it after every proposal so each proposer moves down their list and never re-proposes to the same acceptor.
Pattern Recognition
Use Gale-Shapley when you see:
- "Stable matching," "stable marriage," "college admissions," "internship matching" with two sides and preference lists.
- Problems asking for a matching with no "blocking pair" or "no one would switch."
Summary
- Stable matching: perfect matching with no blocking pair. Gale-Shapley always finds one.
- Proposers propose in preference order; acceptors keep best offer. O(n²) time.
- Output is proposer-optimal and acceptor-pessimal among all stable matchings.
14.1 Recursion Deep Dive
Introduction
Recursion is one of the most powerful and elegant ideas in computer science, but it is also one of the most misunderstood by beginners. In this deep dive, you will build a rock-solid mental model for how recursion works, how Python actually executes recursive functions under the hood, how to design your own recursive solutions, and how to analyze their time and space complexity. By the end, recursion will feel less like “magic” and more like a predictable, mechanical process you can control.
Real-World Analogy
Imagine a line of people, each holding an envelope. The first person is asked:
- If your envelope says “STOP”, open it and read the number inside.
- Otherwise, pass the question to the next person in line, wait for their answer, then add 1 to it and say that.
Eventually, someone’s envelope says “STOP”; that person answers with, say, 0. Then the person just before them adds 1 and answers 1, the one before answers 2, and so on back to the start. Each person only does a tiny bit of work and relies on the next person for the rest. This is exactly how recursion works: each call handles a tiny part of the job and delegates the rest to a “smaller” version of the same problem.
Formal Definition
Informally, a function is recursive if it calls itself (directly or indirectly). Formally:
Every well-formed recursive function has:
- Base case(s): Simple input(s) where the answer is known immediately and no further recursion is needed.
- Recursive case(s): Rule(s) that reduce the current problem to one or more smaller subproblems of the same type.
Why This Topic Matters
- Interview relevance: Many classic interview problems (trees, backtracking, divide-and-conquer, DP) are naturally recursive.
- Expressiveness: Recursive code often mirrors the mathematical or combinatorial definition of the problem, making it easier to reason about.
- Foundation for advanced topics: Backtracking, dynamic programming, and many graph and tree algorithms build directly on recursion.
Mental Model: The Call Stack
When you call a function in Python, the interpreter allocates a stack frame that stores:
- Parameter values
- Local variables
- Return address (where to continue after the function returns)
With recursion, each recursive call gets its own frame. Think of the call stack as a stack of plates: each call pushes a new plate; when a call returns, its plate is popped off.
Call stack grows downward (top of stack is the most recent call)
Before any calls:
[main]
Call fact(3):
[main]
[fact(3)]
fact(3) calls fact(2):
[main]
[fact(3)]
[fact(2)]
fact(2) calls fact(1):
[main]
[fact(3)]
[fact(2)]
[fact(1)]
fact(1) hits base case and returns 1.
Then fact(2) returns 2 * 1, fact(3) returns 3 * 2, etc., and frames are popped.
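The push and pop order can be observed by recording each call and return. In the sketch below, the depth and log parameters are tracing aids added for illustration only; they are not part of a normal factorial function.

```python
def fact(n, depth=0, log=None):
    """Factorial that records each frame push and pop in log."""
    if log is None:
        log = []
    log.append(f"{'  ' * depth}push fact({n})")
    # Base case at n <= 1, matching the diagram above.
    result = 1 if n <= 1 else n * fact(n - 1, depth + 1, log)[0]
    log.append(f"{'  ' * depth}pop  fact({n}) -> {result}")
    return result, log


value, log = fact(3)
print("\n".join(log))
# push fact(3)
#   push fact(2)
#     push fact(1)
#     pop  fact(1) -> 1
#   pop  fact(2) -> 2
# pop  fact(3) -> 6
```

The indentation mirrors stack depth: the deepest frame is pushed last and popped first.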
Step-by-Step Breakdown: Factorial
The factorial of a non-negative integer n, written n!, is defined as:
- 0! = 1
- n! = n × (n − 1)! for n ≥ 1
Notice that the definition itself is recursive: n! is defined in terms of (n − 1)!.
Recursive Design Recipe
- Define the problem clearly: Input: integer n ≥ 0. Output: n!.
- Find the base case: What is the simplest n you can handle directly? Here, 0! = 1.
- Assume the recursive call works: Assume you already have a function that correctly computes (n − 1)!.
- Write the recursive step: Using (n − 1)!, how do you get n!? Answer: n! = n × (n − 1)!
- Make progress: Ensure each recursive call moves toward the base case (n gets smaller).
Python Implementation: Factorial
def factorial(n: int) -> int:
    """Compute n! for n >= 0 using recursion."""
    if n < 0:
        raise ValueError("n must be non-negative")
    # Base case
    if n == 0:
        return 1
    # Recursive case: n! = n * (n - 1)!
    return n * factorial(n - 1)
Line-by-Line Explanation
- The check if n < 0 guards against invalid input before any recursion happens.
- The line if n == 0: return 1 is the base case; no more recursion.
- The line return n * factorial(n - 1) trusts that factorial(n - 1) works (the inductive assumption), multiplies by n, and returns.
Execution Trace for factorial(3)
factorial(3)
-> 3 * factorial(2)
|
v
factorial(2)
-> 2 * factorial(1)
|
v
factorial(1)
-> 1 * factorial(0)
|
v
factorial(0)
-> 1 (base case)
Unwinding:
factorial(0) returns 1
factorial(1) returns 1 * 1 = 1
factorial(2) returns 2 * 1 = 2
factorial(3) returns 3 * 2 = 6
Another Example: Sum of an Array
Problem: Given a list of numbers, return their sum.
Recursive idea:
- Base case: The sum of an empty list is 0.
- Recursive case: the sum of [x] + rest is x + sum(rest).
from typing import List

def recursive_sum(arr: List[int]) -> int:
    if not arr:  # base case: empty list
        return 0
    first = arr[0]
    rest = arr[1:]
    return first + recursive_sum(rest)
If you instead wrote return arr[0] + recursive_sum(arr), the list would never get smaller, and the recursion would continue until Python raises RecursionError.
Time and Space Complexity
Factorial and Recursive Sum
- Time complexity: For both factorial(n) and recursive_sum(arr), we make exactly one recursive call with a “smaller” input each time.
Let T(n) be the time to compute factorial(n). We have:
T(0) = O(1) # base case
T(n) = T(n - 1) + O(1) # one recursive call plus constant work
If you expand this recurrence:
T(n) = T(n - 1) + c
= T(n - 2) + c + c
= ...
= T(0) + n * c
= O(n)
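- Time: O(n) for factorial(n), counting each multiplication as one step. recursive_sum also makes n + 1 calls on a list of length n, but as written each call copies the remainder with arr[1:], which costs O(n), so its total time is O(n^2); passing an index instead of slicing restores O(n).
- Space (call stack): O(n) frames for both, because up to n + 1 calls are on the stack at once before unwinding. The slices kept alive by recursive_sum add O(n^2) extra memory in total.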
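The O(n) stack cost has a practical consequence in Python: the interpreter caps the recursion depth. The following sketch, which assumes a standard CPython setup, reads the limit with sys.getrecursionlimit() and deliberately asks for more frames than it allows. The input validation is omitted to keep the example short.

```python
import sys


def factorial(n: int) -> int:
    if n == 0:
        return 1
    return n * factorial(n - 1)


limit = sys.getrecursionlimit()  # commonly 1000 by default in CPython
print("recursion limit:", limit)

try:
    factorial(limit + 100)  # needs more frames than the limit allows
    hit_limit = False
except RecursionError:
    hit_limit = True

print("hit the recursion limit:", hit_limit)
```

The limit can be raised with sys.setrecursionlimit, but for simple linear problems an iterative version is usually the safer choice.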
Brute Force → Better → Optimal (Recursion vs Iteration)
Brute Force Thinking
A beginner might first write factorial using a loop (iterative):
def factorial_iterative(n: int) -> int:
    if n < 0:
        raise ValueError("n must be non-negative")
    result = 1
    for k in range(1, n + 1):
        result *= k
    return result
This is already efficient (O(n) time, O(1) extra space). Recursion does not automatically make things faster—it makes them clearer when the problem is naturally recursive (trees, backtracking, divide-and-conquer).
Recursive “Better” for Structure
For problems with a recursive structure (e.g., binary trees: process root, left subtree, right subtree), the recursive version is often the most natural and concise. For example, tree traversal is much clearer recursively than managing your own explicit stack on the first try.
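As a rough illustration of how closely recursive code can follow a recursive data structure, here is a small sketch of tree height and in-order traversal. The Node class is a hypothetical minimal node type introduced only for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical binary tree node used only for this illustration."""
    value: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def height(root: Optional[Node]) -> int:
    """Number of nodes on the longest root-to-leaf path (empty tree: 0)."""
    if root is None:  # base case: empty subtree
        return 0
    # Recursive case: one for the root plus the taller of the two subtrees.
    return 1 + max(height(root.left), height(root.right))


def inorder(root: Optional[Node]) -> List[int]:
    """Values in left subtree, root, right subtree order."""
    if root is None:
        return []
    return inorder(root.left) + [root.value] + inorder(root.right)


tree = Node(2, Node(1), Node(4, Node(3)))
print(height(tree))   # 3
print(inorder(tree))  # [1, 2, 3, 4]
```

Both functions have the same shape as the definition of a binary tree itself: an empty case, and a node with two smaller trees.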
Tail Recursion (Preview)
A recursive call is in tail position if it is the very last operation in the function (nothing remains to do after the recursive call returns). For example:
def tail_recursive_factorial(n: int, acc: int = 1) -> int:
    if n == 0:
        return acc
    return tail_recursive_factorial(n - 1, acc * n)
In some languages (e.g., Scheme, some C compilers), the compiler can optimize tail recursion to reuse the same stack frame (tail call optimization), making space O(1). Python does not perform tail call optimization, so tail-recursive functions in Python still use O(n) stack space and are subject to the recursion limit.
Common Mistakes
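- Missing or unreachable base case: the recursion never terminates on its own and Python eventually raises RecursionError.
- Not making progress: every recursive call must move toward the base case (n - 1, index + 1, smaller subarray, smaller tree).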
Interview Insight
- Start by clearly stating the base case(s) and what they return.
- Define the recursive step verbally (“Assume I can solve the smaller problem; here’s how I use it”).
- Draw a small recursion tree or call stack for a tiny input (n = 3, 4) to verify correctness.
- Then write code that exactly matches your definition.
Practice Problems
- Compute the nth Fibonacci number recursively (then think about why the naive version is slow).
- Given a string, return its reverse using recursion.
- Count how many times a target value appears in a list using recursion.
- Given a binary tree, compute its height and node count recursively.
Summary
- Every recursive function needs clear base case(s) and recursive case(s) that shrink the problem.
- The call stack holds one frame per active call, giving recursion an extra O(depth) space cost.
- Linear recursive algorithms like factorial and sum of array typically run in O(n) time and O(n) space.
- Use recursion when it matches the natural structure of the problem (trees, divide-and-conquer, backtracking) and favor iteration for simple linear tasks in Python.
14.2 Tail Recursion
Introduction
Not all recursive calls are created equal. When the recursive call is the last thing your function does—with no further computation after it returns—we say the call is in tail position. Such functions are called tail-recursive. In languages that support tail call optimization (TCO), tail recursion can use constant stack space instead of growing the stack with every call. Python does not perform TCO, but understanding tail recursion sharpens your reasoning about recursion, stack usage, and how to convert recursive ideas into efficient iterative code.
Real-World Analogy
Imagine passing a baton in a relay. In ordinary recursion, the runner receives the baton, runs a leg, then waits for the next runner to finish the rest of the race, and only then does something with the result (e.g., adds their own time). In tail recursion, the runner passes the baton and is done—they do nothing after the handoff. The “result” is carried forward in the baton itself (like an accumulator). When the last runner crosses the line, the final answer is already in the baton; no one needs to “add up” work on the way back.
Formal Definition
A function is tail-recursive if every recursive call it makes is in tail position. More precisely:
- Tail position: The last action of the function is to call itself (and possibly return that call’s result with no extra work).
- Tail call: A function call that is in tail position.
- Tail call optimization (TCO): A compiler/runtime optimization that reuses the current stack frame for a tail call, so the stack does not grow.
Why This Topic Matters
- Stack safety: In TCO-supporting languages, tail-recursive code can run in O(1) stack space, avoiding stack overflow on large inputs.
- Conversion to loops: Any tail-recursive function can be mechanically converted to an equivalent loop with no recursion—useful in Python where TCO is absent.
- Interview clarity: Interviewers sometimes ask “can you make this tail-recursive?” or “convert this to iterative”; knowing the pattern helps.
Mental Model: Accumulator Pattern
A common way to make a recursive function tail-recursive is to add an accumulator parameter that carries the “result so far.” The base case then returns the accumulator (or a function of it); the recursive case updates the accumulator and passes it down, with no work left to do after the recursive call returns.
Ordinary recursion (factorial):
fact(n) = n * fact(n-1) → work AFTER the call returns (multiply by n)
Tail recursion (factorial with accumulator acc):
fact_tail(n, acc) = fact_tail(n-1, n * acc) → no work after the call; result is in acc when n=0
Step-by-Step: Converting Factorial to Tail-Recursive Form
- Original: fact(n) = n * fact(n-1), base case fact(0) = 1.
- Introduce accumulator: Define fact_tail(n, acc) meaning “compute n! × acc.” So fact(n) = fact_tail(n, 1).
- Recursive rule: fact_tail(n, acc) = fact_tail(n-1, n * acc); the “n ×” is folded into acc.
- Base case: fact_tail(0, acc) = acc (we have accumulated the full n! in acc).
- The recursive call is now the last operation; no multiplication after return.
Python Implementation
Tail-recursive factorial
def factorial_tail(n: int, acc: int = 1) -> int:
    """Tail-recursive factorial. fact(n) = factorial_tail(n, 1)."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return acc
    return factorial_tail(n - 1, n * acc)
Tail-recursive sum of list (using index to avoid slicing)
To avoid O(n) slicing per call, we pass an index and the list:
from typing import List

def sum_tail(arr: List[int], i: int = 0, acc: int = 0) -> int:
    """Tail-recursive sum: sum(arr) = sum_tail(arr, 0, 0)."""
    if i == len(arr):
        return acc
    return sum_tail(arr, i + 1, acc + arr[i])
Line-by-Line Explanation (factorial_tail)
- acc holds the partial product so far: when we reach n = 0, acc equals n! × acc_initial. With acc_initial = 1, we get n!.
- if n == 0: return acc is the base case; no more recursion, and the answer is in acc.
- return factorial_tail(n - 1, n * acc) returns the recursive call’s result directly; nothing is done after the call returns, so this is a tail call.
Ordinary vs Tail Recursion: Comparison
| Aspect | Ordinary recursion | Tail recursion |
|---|---|---|
| After recursive call returns | More work (e.g. multiply by n) | Nothing; return that result |
| Stack in Python | O(n) frames | O(n) frames (no TCO) |
| Stack with TCO | Still O(n) | O(1) |
| Conversion to loop | Need explicit stack or different rewrite | Straightforward: loop that updates (n, acc) until n=0 |
Tail Recursion to Iteration (Python-Friendly)
Because Python does not do TCO, the “optimal” way to get constant stack space is to convert the tail-recursive function into a loop. The transformation is mechanical:
- Replace the recursive function with a loop.
- Loop variables = the parameters that change (e.g. n, acc).
- Base case → loop exit condition; return value = accumulator (or final state).
- Recursive case → update variables and continue loop.
def factorial_iterative(n: int) -> int:
    if n < 0:
        raise ValueError("n must be non-negative")
    acc = 1
    while n != 0:
        acc = n * acc
        n = n - 1
    return acc
This is exactly the same computation as factorial_tail(n, 1), but with no recursive calls and O(1) extra space.
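The same mechanical steps apply to sum_tail. The sketch below is one possible loop form; the name sum_iterative is an illustrative choice, with i and acc playing the roles of the changing parameters.

```python
from typing import List


def sum_iterative(arr: List[int]) -> int:
    """Loop form of sum_tail(arr, 0, 0): i and acc become loop variables."""
    i, acc = 0, 0
    while i != len(arr):    # base case i == len(arr) becomes the exit test
        acc = acc + arr[i]  # same updates as the recursive call's arguments
        i = i + 1
    return acc


print(sum_iterative([3, 1, 4, 1, 5]))  # 14
```

Because no frames accumulate, this version also works on lists far longer than Python's recursion limit.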
Time and Space Complexity
- Time: Same as the ordinary recursive version—O(n) for factorial and for list sum (with index, no slicing).
- Space in Python: Tail-recursive factorial_tail and sum_tail still use O(n) stack space because there is no TCO. The iterative version uses O(1) extra space.
Edge Cases
- n < 0: Define behavior (e.g. raise); tail version handles it the same as ordinary version.
- n = 0 or empty list: Base case returns accumulator; ensure initial accumulator is correct (1 for factorial, 0 for sum).
Common Mistakes
- Wrong initial accumulator: factorial_tail(n, 0) would always return 0. The public API should call the tail helper with the correct initial value (e.g. factorial_tail(n, 1)).
- Expecting TCO in Python: Python does not optimize tail calls, so for large inputs prefer a while loop over relying on tail recursion. The loop is “TCO by hand” and is the standard Pythonic approach.
Interview Insight
Practice Problems
- Write a tail-recursive version of “reverse a string” (e.g. pass index and an accumulator string or list).
- Write a tail-recursive “length of list” and then convert it to a loop.
- Implement “power(a, b)” (a^b) tail-recursively using an accumulator, then as a loop.
Summary
- Tail recursion means the recursive call is in tail position—nothing is done after it returns.
- Use an accumulator (and sometimes an index) to carry partial results and make the last step a single recursive call.
- With TCO, tail recursion uses O(1) stack; Python does not do TCO, so stack remains O(n).
- Convert tail-recursive functions to a loop in Python for O(1) stack and to avoid recursion limits.
14.3 Subsets
Introduction
Given a set (or list) of distinct elements, the subsets problem asks: generate all possible subsets. The empty set and the set itself count as subsets. This is a fundamental backtracking pattern: at each step you have a choice—include the current element or exclude it—and you explore both options recursively. Mastering this pattern unlocks subset-sum, combination, and many “generate all possibilities” interview problems.
Real-World Analogy
Imagine packing a suitcase from a row of items. For each item you ask: “Do I take it or leave it?” You don’t need to decide the order—only which items are in the bag. Every possible combination of “take/leave” gives one subset. Empty bag = empty set; all items = full set. Recursion walks through the row: at each item you branch into “include” and “exclude,” then recurse on the rest. When you’ve passed every item, the current bag is one subset—record it and return.
Formal Definition
We do not consider order: [1, 2] and [2, 1] are the same subset. Duplicates in the input are often disallowed so that subsets are uniquely defined.
Why This Topic Matters
- Backtracking foundation: The include/exclude choice at each index is the core of many recursive enumeration problems.
- Interview staple: “Subsets,” “Subsets II” (with duplicates), and “Subset Sum” appear frequently.
- Exponential size: Output has 2^n subsets, so time is at least Ω(2^n); the goal is to generate them in O(n · 2^n) time (the cost of writing the output) without extra waste.
Mental Model: Decision Tree
For nums = [1, 2, 3], think of a tree where:
- Level i corresponds to index i (element nums[i]).
- Each node has two children: “include nums[i]” and “exclude nums[i]”.
- Each leaf is a complete subset (after processing all n indices).
                 start
              /         \
       include 1       exclude 1
        /    \          /    \
    inc 2   exc 2   inc 2   exc 2
     ...     ...     ...     ...
(Leaves: [1,2,3], [1,2], [1,3], [1], [2,3], [2], [3], [] → 8 = 2^3)
Step-by-Step Breakdown
- State: Current index i, and the current partial subset path built so far.
- Base case: When i == len(nums), we have fixed include/exclude for every element; path is one subset, so append a copy to the result.
- Recursive case: For index i, two choices:
  - Exclude: Recurse with i + 1 and same path.
  - Include: Append nums[i] to path, recurse with i + 1, then backtrack (pop) so the same path can be reused for other branches.
- Order of exploration (exclude then include, or vice versa) only affects the order of subsets in the output; both are correct.
Python Implementation
from typing import List

def subsets(nums: List[int]) -> List[List[int]]:
    result: List[List[int]] = []

    def backtrack(i: int, path: List[int]) -> None:
        if i == len(nums):
            result.append(path[:])  # copy of current subset
            return
        # Choice 1: exclude nums[i]
        backtrack(i + 1, path)
        # Choice 2: include nums[i]
        path.append(nums[i])
        backtrack(i + 1, path)
        path.pop()  # backtrack

    backtrack(0, [])
    return result
Line-by-Line Explanation
- result collects all subsets; path is the current partial subset (modified and restored during recursion).
- if i == len(nums): result.append(path[:]) is the base case: we’ve decided for every element; append a copy of path (so later backtracking doesn’t change stored subsets).
- backtrack(i + 1, path) excludes nums[i] and recurses on the rest.
- path.append(nums[i]); backtrack(i + 1, path); path.pop() includes nums[i], recurses, then undoes the append so the next sibling branch sees the same path.
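Common pitfall: forgetting path.pop() or appending path instead of path[:]. Without the copy, every entry in result would reference the same list, which ends up empty once backtracking completes; without the pop, elements leak into sibling branches and the stored subsets are wrong.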
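The aliasing problem is easy to reproduce. The following deliberately broken variant (the name subsets_buggy is made up for this demonstration) stores path itself rather than a copy:

```python
from typing import List


def subsets_buggy(nums: List[int]) -> List[List[int]]:
    """Deliberately wrong: stores path itself instead of a copy."""
    result: List[List[int]] = []

    def backtrack(i: int, path: List[int]) -> None:
        if i == len(nums):
            result.append(path)  # BUG: every entry aliases the same list
            return
        backtrack(i + 1, path)
        path.append(nums[i])
        backtrack(i + 1, path)
        path.pop()

    backtrack(0, [])
    return result


out = subsets_buggy([1, 2])
print(out)                            # [[], [], [], []]
print(all(s is out[0] for s in out))  # True: one list object, stored four times
```

All four entries are the same list object, and that object is empty by the time the recursion has fully unwound.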
Example Walkthrough
For nums = [1, 2]:
- Start: i=0, path=[].
- Exclude 1: backtrack(1, []). Exclude 2 → base case at i=2 → add []; include 2 → add [2]. Pop 2.
- Include 1: path=[1], then backtrack(1, [1]). Exclude 2 → add [1]; include 2 → add [1,2]. Pop 2 then pop 1.
- Result: [[], [2], [1], [1, 2]] (order may vary by implementation).
Time and Space Complexity
- Time: O(n · 2^n). There are 2^n leaves; each base case does O(n) work to copy path. So total O(n · 2^n).
- Space (excluding output): O(n) for recursion stack and path. Space for output is O(n · 2^n) to store all subsets.
Edge Cases
- Empty input: nums = [] → one subset []. Our base case at i=0 appends path[:] = [], so result = [[]]. Correct.
- Single element: nums = [1] → [[], [1]]. Handled by same logic.
Subsets II (With Duplicates)
If the array may contain duplicates and we want unique subsets (as multisets of values), we must avoid generating the same subset through different indices of equal values. Standard approach: sort the array so equal values are adjacent, then, when choosing the next element to add at a given depth, skip any candidate equal to the one just tried at that depth (so each distinct value starts only one branch per depth).
def subsets_with_dup(nums: List[int]) -> List[List[int]]:
    result: List[List[int]] = []
    nums = sorted(nums)

    def backtrack(i: int, path: List[int]) -> None:
        result.append(path[:])
        for j in range(i, len(nums)):
            if j > i and nums[j] == nums[j - 1]:
                continue  # skip duplicate
            path.append(nums[j])
            backtrack(j + 1, path)
            path.pop()

    backtrack(0, [])
    return result
Here we use a “for-loop over next start index” style: at each step we choose the next index j to include (and skip duplicate values). Result still contains all unique subsets.
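A brute-force reference can help when testing a duplicate-skipping implementation. The sketch below is not the backtracking method itself: it swaps in itertools.combinations and a set to deduplicate, and the function name unique_subsets_reference is an illustrative assumption.

```python
from itertools import combinations
from typing import List, Set, Tuple


def unique_subsets_reference(nums: List[int]) -> Set[Tuple[int, ...]]:
    """Brute-force reference: every combination of the sorted input, deduplicated."""
    ordered = sorted(nums)
    seen: Set[Tuple[int, ...]] = set()
    for k in range(len(ordered) + 1):
        # combinations picks by position, so equal values produce repeated tuples;
        # the set keeps one copy of each.
        for combo in combinations(ordered, k):
            seen.add(combo)
    return seen


expected = unique_subsets_reference([1, 2, 2])
print(sorted(expected))
# [(), (1,), (1, 2), (1, 2, 2), (2,), (2, 2)]
```

Comparing {tuple(s) for s in subsets_with_dup(nums)} against this set, and checking that the returned list has no repeated entries, is one way to gain confidence in the skip rule.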
Alternative: Bitmask
Each subset can be represented by a bitmask of length n: bit i is 1 if the element at index i is included. We can iterate over all integers from 0 to 2^n − 1 and decode each into a subset. Time is still O(n · 2^n); there is no recursion, but the pattern is less flexible for “subset sum” or “skip duplicates” variants.
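A minimal sketch of the bitmask idea, assuming distinct elements; the function name subsets_bitmask is chosen here for illustration:

```python
from typing import List


def subsets_bitmask(nums: List[int]) -> List[List[int]]:
    """Enumerate subsets by counting a mask from 0 to 2^n - 1."""
    n = len(nums)
    result: List[List[int]] = []
    for mask in range(1 << n):  # 1 << n == 2^n masks
        # Bit i of mask set means nums[i] is included.
        subset = [nums[i] for i in range(n) if mask & (1 << i)]
        result.append(subset)
    return result


print(subsets_bitmask([1, 2, 3]))
# [[], [1], [2], [1, 2], [3], [1, 3], [2, 3], [1, 2, 3]]
```

The output order follows the binary counting order of the masks rather than the order of the decision tree.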
Common Mistakes
- Appending path instead of path[:]. You must append a copy so that later backtracking doesn’t mutate the subsets already stored in result.
- Wrong duplicate check in Subsets II: at start index i, allow the first occurrence of a value in the remaining segment, and skip j > i where nums[j] == nums[j-1] so you don’t start two branches that would form the same subset.
- In an interview, mention that you copy path when recording a subset. For duplicates, say “sort and skip duplicate starts.”
Practice Problems
- LeetCode 78: Subsets (distinct elements).
- LeetCode 90: Subsets II (with duplicates).
- Subset sum: list all subsets that sum to a target (same tree, add a sum parameter and optionally prune).
Summary
- Subsets: generate all 2^n subsets via include/exclude at each index; base case appends a copy of path.
- Always path.pop() after the “include” branch to backtrack.
- Time O(n · 2^n), space O(n) for recursion and path.
- With duplicates (Subsets II): sort and skip duplicate “next element” choices to get unique subsets.
14.4 Permutations
Introduction
A permutation of a sequence is a rearrangement of its elements. Unlike subsets, order matters: [1, 2, 3] and [3, 2, 1] are different permutations. Given n distinct elements, there are n! permutations. The permutations problem asks: generate all n! permutations. This is another core backtracking pattern: at each step you “place” one element in the next position by choosing from the remaining elements, recurse, then undo the choice. Mastering this pattern is essential for ordering problems, anagrams, and “arrange all” style questions.
Real-World Analogy
Imagine lining up n people in a row. The first position: you can pick any of n people. The second position: any of the remaining n−1. Then n−2, and so on. Each full arrangement is one permutation. Backtracking does exactly this: “place person A in position 0, then recursively arrange the rest; then try person B in position 0, and so on.” When everyone is placed, you have one permutation—record it. Then undo the last placement and try the next candidate.
Formal Definition
Order distinguishes permutations; duplicates in the input complicate “unique” permutations (handled in Permutations II).
Why This Topic Matters
- Backtracking core: “Choose one unused element for the next slot, recurse, then unchoose” is the standard way to enumerate orderings.
- Interview staple: LeetCode 46 (Permutations), 47 (Permutations II), and many “arrange” / “anagram” problems.
- Size: Output has n! permutations; time is at least Ω(n · n!) to write them; we aim for O(n · n!) without extra waste.
Mental Model: Placement Tree
For nums = [1, 2, 3]:
- Level 0: choose which element goes in position 0 (three choices).
- Level 1: choose which remaining element goes in position 1 (two choices).
- Level 2: one element left → one choice; then we have a full permutation.
Position: 0 1 2
Choice 1: 1 → 2 → 3 [1,2,3]
\ \→ 3 → 2 [1,3,2]
Choice 2: 2 → 1 → 3 [2,1,3]
... etc. (6 leaves = 3!)
Step-by-Step Breakdown
- State: Current position pos (0 to n), current partial permutation path, and which elements are still available (e.g. a list or a “used” boolean array).
- Base case: When pos == n, every position is filled; path is one permutation, so append a copy to the result.
- Recursive case: For position pos, try every available element: put it at path[pos], mark it used, recurse to pos + 1, then unmark and backtrack.
- Alternatively, swap elements in the original array so that “used” items are in the prefix and “available” in the suffix; then swap back after recursing.
Python Implementation (Using a “Used” Array)
from typing import List

def permutations(nums: List[int]) -> List[List[int]]:
    result: List[List[int]] = []
    n = len(nums)
    used = [False] * n
    path: List[int] = []

    def backtrack(pos: int) -> None:
        if pos == n:
            result.append(path[:])
            return
        for i in range(n):
            if used[i]:
                continue
            path.append(nums[i])
            used[i] = True
            backtrack(pos + 1)
            used[i] = False
            path.pop()

    backtrack(0)
    return result
Line-by-Line Explanation
- used[i] is True if nums[i] is already in path.
- if pos == n: result.append(path[:]) is the base case: all positions filled; append a copy of path.
- for i in range(n) tries every index; if used[i]: continue skips already-placed elements.
- path.append(nums[i]); used[i] = True; backtrack(pos + 1); used[i] = False; path.pop() places the element, recurses, then undoes the choice so other branches can use it.
Python Implementation (Swap-Based, In-Place)
We can keep “available” elements in nums[pos .. n-1]. For each j >= pos, swap nums[pos] and nums[j], recurse on pos + 1, then swap back.
def permutations_swap(nums: List[int]) -> List[List[int]]:
    result: List[List[int]] = []
    n = len(nums)

    def backtrack(pos: int) -> None:
        if pos == n:
            result.append(nums[:])
            return
        for j in range(pos, n):
            nums[pos], nums[j] = nums[j], nums[pos]
            backtrack(pos + 1)
            nums[pos], nums[j] = nums[j], nums[pos]

    backtrack(0)
    return result
Here “choose element at index j for position pos” is done by swapping; after recursion we restore the array so the next j gets the original arrangement.
Time and Space Complexity
- Time: O(n · n!). There are n! leaves; each base case does O(n) to copy the permutation. So total O(n · n!).
- Space (excluding output): O(n) for recursion stack, path and used (the swap-based version needs no path or used, but its recursion stack is still O(n)). Output space is O(n · n!).
Edge Cases
- Empty input: nums = [] → one permutation []. Base case at pos=0 appends path[:] = []; result = [[]]. Correct.
- Single element: nums = [1] → [[1]]. One leaf.
Permutations II (With Duplicates)
If the array contains duplicates, we must output unique permutations, with no repeated sequences. Strategy: sort the array so equal values are adjacent, then skip duplicate choices when picking the next element for the current position. Rule: for an unused index i, if nums[i] == nums[i-1] and nums[i-1] is not currently used, skip i. This forces equal values to be placed in index order, so the same permutation is never generated from two equal elements in different orders.
def permutations_ii(nums: List[int]) -> List[List[int]]:
    result: List[List[int]] = []
    nums = sorted(nums)
    n = len(nums)
    used = [False] * n
    path: List[int] = []

    def backtrack(pos: int) -> None:
        if pos == n:
            result.append(path[:])
            return
        for i in range(n):
            if used[i]:
                continue
            if i > 0 and nums[i] == nums[i - 1] and not used[i - 1]:
                continue  # skip duplicate: same value and previous not used
            path.append(nums[i])
            used[i] = True
            backtrack(pos + 1)
            used[i] = False
            path.pop()

    backtrack(0)
    return result
The condition not used[i - 1] ensures we don’t start two branches that would place the same value at the same position (which would yield duplicate permutations).
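For comparison, a different technique reaches the same result without the used array: count how many copies of each value remain and, at each position, try each distinct value at most once. This is an alternative sketch rather than the sort-and-skip method described above, and the name unique_permutations_counter is an illustrative assumption.

```python
from collections import Counter
from typing import List


def unique_permutations_counter(nums: List[int]) -> List[List[int]]:
    """Unique permutations by choosing among distinct remaining values."""
    result: List[List[int]] = []
    remaining = Counter(nums)  # value -> how many copies are still unplaced
    path: List[int] = []
    n = len(nums)

    def backtrack() -> None:
        if len(path) == n:
            result.append(path[:])
            return
        # sorted(...) snapshots the distinct values, so each value is tried
        # at most once per position.
        for value in sorted(remaining):
            if remaining[value] == 0:
                continue
            remaining[value] -= 1
            path.append(value)
            backtrack()
            path.pop()
            remaining[value] += 1

    backtrack()
    return result


print(unique_permutations_counter([1, 1, 2]))
# [[1, 1, 2], [1, 2, 1], [2, 1, 1]]
```

Because equal values are never distinguished by index, duplicates cannot arise in the first place; the trade-off is the extra dictionary and the per-call sort of the distinct values.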
Comparison: Subsets vs Permutations
| Aspect | Subsets | Permutations |
|---|---|---|
| Order | Does not matter | Matters |
| Count | 2^n | n! |
| Choice at step | Include or exclude current element | Pick one unused element for current position |
| State | Index + path | Position + path + used (or swap range) |
Common Mistakes
- Appending path instead of path[:]. You must append a copy so backtracking doesn’t mutate stored permutations.
- Forgetting to reset used[i] or to path.pop() after the recursive call. Without undoing, the next iteration reuses the same element or leaves the path in a wrong state.
- Wrong duplicate rule in Permutations II: after sorting, skip when nums[i] == nums[i-1] and not used[i-1] (so we don’t place the same value at the same position twice via different indices).
- Swap-based version: it avoids the used array and keeps “available” elements in a contiguous suffix. Same asymptotic time; slightly less auxiliary space.
Practice Problems
- LeetCode 46: Permutations (distinct elements).
- LeetCode 47: Permutations II (with duplicates).
- Next Permutation (single next in lex order)—different idea but good contrast.
- Letter combinations / permutations of digits (e.g. phone keypad).
Summary
- Permutations: all n! orderings; order matters; at each position choose one unused element, recurse, backtrack.
- Track “used” elements (array or swap-based); base case appends a copy of the current path/array.
- Time O(n · n!), space O(n) for recursion and working storage.
- Permutations II: sort and skip duplicate choices (same value, previous not used) to get unique permutations.
14.5 Combination Sum
Introduction
The Combination Sum family asks: given an array of candidates and a target value, find all unique combinations of candidates whose sum equals the target. A combination is a multiset (order does not matter); [2, 2, 3] and [3, 2, 2] are the same. Two main variants: I—each candidate may be used unlimited times; II—each candidate may be used at most once, and the array may contain duplicates. Both are solved by backtracking with a “start index” to avoid listing the same combination in different orders, and (for II) skipping duplicate values correctly.
Real-World Analogy
You have coins of given denominations and must make exact change for a target amount. Each way to make change is a “combination”: which coins and how many of each. You don’t care about the order you pick coins, only the multiset. Backtracking: “Use one more coin of type A (if it keeps us ≤ target), recurse; then stop using A and try the next coin type.” That “next coin type” is the start index: we only consider candidates from index i onward so we never build [3, 2] and [2, 3] as two different combinations.
Formal Definition
Why This Topic Matters
- Classic backtracking: Combines “subset” style (which elements) with a constraint (sum = target) and optional reuse.
- Interview staple: LeetCode 39 (Combination Sum), 40 (Combination Sum II), and variants (e.g. k numbers that sum to target).
- Pattern: “Start index” + “use or skip” (or “use 0, 1, … times”) + prune when sum exceeds target.
Mental Model: Decision Tree with Start Index
We build combinations by deciding, for each “slot,” which candidate to use next—but we only consider candidates at or after a start index so we never generate [2, 3] and [3, 2] separately.
- At each call: current sum, current path (combination so far), start index i.
- Base: sum == target → record path; sum > target → return (prune).
- For each j from i to n−1: add candidates[j] to path, recurse (with same j for unlimited use, or j+1 for use-once), then backtrack.
Candidates [2, 3, 5], target 5.
Start at 0: use 2 → sum=2, recurse from 0 again (unlimited) or from 1 (once).
use 3 → sum=3, recurse...
use 5 → sum=5 → record [5].
Pruning: if sum > target, stop that branch.
Step-by-Step Breakdown
- State: start (index into candidates), path (current combination), cur_sum (sum of path).
- Base cases: If cur_sum == target, append path[:] to result and return. If cur_sum > target, return (prune).
- Recursive case: For j from start to len(candidates)-1:
  - I (unlimited): Add candidates[j], recurse with start = j (can use j again), then pop and continue to next j.
  - II (once): If duplicate (j > start and candidates[j] == candidates[j-1]), skip. Else add candidates[j], recurse with start = j+1, pop.
Python Implementation: Combination Sum I (Unlimited Use)
from typing import List

def combination_sum(candidates: List[int], target: int) -> List[List[int]]:
    result: List[List[int]] = []
    path: List[int] = []

    def backtrack(start: int, cur_sum: int) -> None:
        if cur_sum == target:
            result.append(path[:])
            return
        if cur_sum > target:
            return
        for j in range(start, len(candidates)):
            path.append(candidates[j])
            backtrack(j, cur_sum + candidates[j])  # same j: can reuse
            path.pop()

    backtrack(0, 0)
    return result
Python Implementation: Combination Sum II (Use at Most Once)
def combination_sum_ii(candidates: List[int], target: int) -> List[List[int]]:
    result: List[List[int]] = []
    path: List[int] = []
    candidates = sorted(candidates)

    def backtrack(start: int, cur_sum: int) -> None:
        if cur_sum == target:
            result.append(path[:])
            return
        if cur_sum > target:
            return
        for j in range(start, len(candidates)):
            if j > start and candidates[j] == candidates[j - 1]:
                continue  # skip duplicate: same value, avoid same combination
            path.append(candidates[j])
            backtrack(j + 1, cur_sum + candidates[j])  # j+1: use once
            path.pop()

    backtrack(0, 0)
    return result
Line-by-Line Explanation
- I: backtrack(j, ...) keeps start = j so we can pick the same candidate again. cur_sum > target prunes.
- II: candidates is sorted so duplicates are adjacent. if j > start and candidates[j] == candidates[j-1]: continue avoids using the same value twice in the same “slot” (which would create duplicate combinations). backtrack(j+1, ...) ensures each candidate is used at most once.
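Note: the check uses j > start (not j > 0). It skips a repeated value only as a sibling choice at the same depth, which prevents duplicate combinations while still allowing equal values to appear together in one combination (e.g. [1, 1, 6]).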
Time and Space Complexity
- I: Time depends on target and candidates. The recursion depth is at most target/min (min = smallest candidate, assumed positive) and each call branches at most n ways, so a loose upper bound is O(n^(target/min)). Space O(target/min) for recursion depth and path.
- II: Up to 2^n subsets, each checked for sum; O(2^n) time in the worst case. Space O(n) for recursion and path.
Edge Cases
- Target 0: One valid combination: empty list []. Handle by either appending when cur_sum==0 at start, or defining “target 0” as return [[]].
- Empty candidates: No combinations; return [].
- All candidates > target (II): Pruning will cut all branches; result [].
Common Mistakes
- In Combination Sum II, passing j instead of j+1 in the recursive call gives multiple uses of the same candidate.
- Forgetting to prune when cur_sum > target; without it you can recurse unnecessarily (and risk stack overflow or TLE).
- Pruning late: it is cheaper to check before recursing whether cur_sum + candidates[j] > target and skip that j (or break if candidates are positive and sorted, since later j are larger).
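The sort-and-break idea can be sketched as follows for the unlimited-use variant. This is one possible formulation, assuming all candidates are positive and distinct; the name combination_sum_pruned and the use of a remaining parameter in place of cur_sum are choices made for this example.

```python
from typing import List


def combination_sum_pruned(candidates: List[int], target: int) -> List[List[int]]:
    """Combination Sum I (unlimited use) with sort-and-break pruning.

    Assumes every candidate is a positive integer and candidates are distinct.
    """
    result: List[List[int]] = []
    path: List[int] = []
    ordered = sorted(candidates)

    def backtrack(start: int, remaining: int) -> None:
        if remaining == 0:
            result.append(path[:])
            return
        for j in range(start, len(ordered)):
            if ordered[j] > remaining:
                break  # sorted input: every later candidate is too large as well
            path.append(ordered[j])
            backtrack(j, remaining - ordered[j])  # same j: reuse allowed
            path.pop()

    backtrack(0, target)
    return result


print(combination_sum_pruned([2, 3, 6, 7], 7))  # [[2, 2, 3], [7]]
```

Checking before the recursive call means a branch that would overshoot the target is never entered, so the separate "sum exceeds target" base case is no longer needed.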
Practice Problems
- LeetCode 39: Combination Sum (unlimited use).
- LeetCode 40: Combination Sum II (at most once, with duplicates).
- Combination Sum III: use exactly k numbers from 1..9 that sum to n.
- Subset Sum (count or list subsets that sum to target)—same tree, different base case.
Summary
- Combination Sum I: Backtrack with start index; recurse with start=j to allow reusing the same candidate; prune when cur_sum > target.
- Combination Sum II: Sort candidates; recurse with start=j+1; skip when j > start and candidates[j] == candidates[j-1] to avoid duplicate combinations.
- Always append path[:] and pop after recursion to backtrack correctly.
- Pruning on sum is essential for efficiency.
14.6 N Queens
Introduction
The N Queens problem: place n queens on an n×n chessboard so that no two queens attack each other. Queens attack along their row, column, and both diagonals. We place exactly one queen per row (or per column); at each row we choose a column, check that it is safe with respect to all previously placed queens, then recurse to the next row. Backtrack when no column is valid. This is a classic constraint satisfaction problem and a standard backtracking interview question.
Real-World Analogy
Imagine placing one token per row on a grid. Each token “blocks” its entire column and both diagonal directions. Your job: fill every row without any two tokens sharing a column or diagonal. You try the first row—pick a column; then the second row—pick a column that isn’t blocked; and so on. If you reach a row where every column is blocked, you undo the last row’s choice and try another column. That’s exactly N Queens: one queen per row, try each column, check safety, recurse, backtrack.
Formal Definition
Why This Topic Matters
- Constraint backtracking: Combines “choose one option per step” with a non-trivial safety check (column + diagonals).
- Interview staple: LeetCode 51 (N-Queens) and 52 (N-Queens II count). Tests recursion, pruning, and encoding state.
- Pattern: One decision per row/column; efficient “is this cell safe?” is key (sets or arrays for columns and diagonals).
Mental Model: Row-by-Row Placement
We fill the board row by row. Row 0: try column 0, 1, …, n−1. For each choice, mark that column and the two diagonals as “used,” then recurse to row 1. At row 1 we only try columns that are still safe. If we reach row n, we have placed n queens—record the solution. If for some row no column is safe, backtrack: unmark the last placement and try the next column.
Row 0: place Q at col 1 → mark col 1, diag (0-1), anti (0+1)
Row 1: try col 0 (safe?), col 2 (safe?), ...
Row 2: ...
...
Row n-1: place last Q → solution. Backtrack to find more.
Diagonal Indexing
For a cell (row, col):
- Main diagonal (↘): cells with same (row − col). Use index row - col (can be negative; shift by +n for array index).
- Anti-diagonal (↙): cells with same (row + col). Use index row + col (0 to 2n−2).
So we maintain three sets (or boolean arrays): cols, main_diag (row−col), anti_diag (row+col). A placement (r, c) is safe iff c not in cols, (r−c) not in main_diag, (r+c) not in anti_diag.
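A minimal sketch of this indexing for the boolean-array variant. The helper name diag_indices is hypothetical, and the shift used here is n − 1, the smallest shift that keeps every main-diagonal index non-negative; any larger shift also works if the array is sized to match.

```python
from typing import Tuple


def diag_indices(r: int, c: int, n: int) -> Tuple[int, int]:
    """Return (main, anti) array indices for cell (r, c) on an n x n board."""
    main = r - c + (n - 1)  # r - c lies in [-(n-1), n-1]; shift it into [0, 2n-2]
    anti = r + c            # already in [0, 2n-2]
    return main, anti


n = 4
# Cells on the same main diagonal share the first index ...
same_main = diag_indices(0, 1, n)[0] == diag_indices(2, 3, n)[0]
# ... and cells on the same anti-diagonal share the second.
same_anti = diag_indices(0, 3, n)[1] == diag_indices(3, 0, n)[1]
print(same_main, same_anti)  # True True
```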
Step-by-Step Breakdown
- State: Current row r (0 to n), current placement (e.g. list of column indices or a 2D board), and sets/arrays for columns and diagonals.
- Base case: If r == n, all rows have a queen; record the solution (e.g. append a copy of the board or the column list).
- Recursive case: For each column c, if (r, c) is safe, mark column and both diagonals, add queen to placement, recurse to r+1, then unmark and remove queen (backtrack).
Python Implementation
from typing import List

def solve_n_queens(n: int) -> List[List[str]]:
    result: List[List[str]] = []
    # placement: board[r] = column index of queen in row r
    board: List[int] = [-1] * n
    cols: set = set()
    main_diag: set = set()  # row - col
    anti_diag: set = set()  # row + col

    def safe(r: int, c: int) -> bool:
        return c not in cols and (r - c) not in main_diag and (r + c) not in anti_diag

    def format_board() -> List[str]:
        rows = []
        for c in board:
            rows.append("." * c + "Q" + "." * (n - c - 1))
        return rows

    def backtrack(r: int) -> None:
        if r == n:
            result.append(format_board())
            return
        for c in range(n):
            if not safe(r, c):
                continue
            board[r] = c
            cols.add(c)
            main_diag.add(r - c)
            anti_diag.add(r + c)
            backtrack(r + 1)
            anti_diag.discard(r + c)
            main_diag.discard(r - c)
            cols.discard(c)
            board[r] = -1

    backtrack(0)
    return result
Line-by-Line Explanation
- board[r] = c means the queen in row r is in column c. We only need one index per row.
- safe(r, c): no conflict in column c, main diagonal (r−c), or anti-diagonal (r+c).
- backtrack(r): if r == n, format the board (e.g. [".Q..", "...Q", ...]) and append to result.
- In the loop: if safe, set board[r]=c, add c and diagonals to sets, recurse backtrack(r+1), then remove from sets and reset board[r] so the next column can be tried.
Time and Space Complexity
- Time: Because each row gets one queen and no two queens may share a column, there are at most n! complete placements to consider, so O(n!) is the usual upper bound. The diagonal checks prune many branches early, so far fewer nodes are visited in practice, though the count still grows very quickly with n.
- Space: O(n) for the recursion stack, board, and the three sets, excluding the output.
Edge Cases
- n = 1: One queen, one cell; one solution.
- n = 2 or 3: No solution; result is [].
- n = 4: Two solutions, which are mirror images of each other.
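The edge cases above can be checked with a small counting variant (the N-Queens II formulation). This is a sketch under the assumption that boolean arrays replace the sets; count_n_queens is an illustrative name, not part of the implementation above.

```python
def count_n_queens(n: int) -> int:
    """Count placements of n non-attacking queens, one per row."""
    cols = [False] * n
    main = [False] * (2 * n - 1)  # indexed by r - c + (n - 1)
    anti = [False] * (2 * n - 1)  # indexed by r + c

    def place(r: int) -> int:
        if r == n:
            return 1  # every row has a queen
        total = 0
        for c in range(n):
            m, a = r - c + n - 1, r + c
            if cols[c] or main[m] or anti[a]:
                continue  # attacked: do not branch into this column
            cols[c] = main[m] = anti[a] = True
            total += place(r + 1)
            cols[c] = main[m] = anti[a] = False  # backtrack
        return total

    return place(0)


print([count_n_queens(n) for n in range(1, 7)])  # [1, 0, 0, 2, 10, 4]
```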
Common Mistakes
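- Forgetting to backtrack: after the recursive call, remove c and both diagonals from the sets and reset board[r] so the next column is tried with a clean state.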
Practice Problems
- LeetCode 51: N-Queens (return all distinct board configurations).
- LeetCode 52: N-Queens II (return the number of distinct solutions).
- Print one valid configuration only (stop after first solution if desired).
Summary
- N Queens: Place n queens, one per row; try each column per row; ensure no two share column or diagonal.
- Track columns and both diagonals (row−col, row+col) for O(1) safety check.
- Backtrack: after recursing, unmark column and diagonals and remove the queen so the next column can be tried.
- Output can be list of strings (one per row, "Q" and ".") or list of column indices per row.
14.7 Sudoku Solver
Introduction
The Sudoku Solver problem: given a 9×9 grid partially filled with digits 1–9 (and empty cells marked as '.' or 0), fill the grid so that every row, every column, and every 3×3 sub-box contains the digits 1–9 exactly once. We solve it by backtracking: pick an empty cell, try each valid digit (1–9), check that it doesn’t violate row, column, or box constraints, place it and recurse; if no digit leads to a solution, backtrack and try another digit (or another cell order). It’s a classic constraint satisfaction problem and a standard interview question.
Real-World Analogy
Imagine a 9×9 grid with some numbers already written. Your job: fill the blanks so that in every row, every column, and every 3×3 block, the numbers 1–9 each appear exactly once. You pick one empty cell, try 1, then 2, …; for each try you check “is this allowed here?” (no same number in row, column, or block). If it’s allowed, you write it and move to the next empty cell. If you ever get stuck (no valid digit), you erase the last number you wrote and try the next option. That’s backtracking: try, check, recurse, undo.
Formal Definition
Why This Topic Matters
- Constraint backtracking: Multiple constraints (row, column, box) and many choices per cell make it a strong test of backtracking and pruning.
- Interview staple: LeetCode 37 (Sudoku Solver). Tests recursion, validity checks, and in-place modification.
- Pattern: “Find empty cell → try each valid value → recurse → backtrack” generalizes to many grid puzzles.
Mental Model: Cell-by-Cell Fill
We process the grid in some order (e.g. row-major). For each cell:
- If the cell is already filled, move to the next cell (or return true if no more cells—solved).
- If the cell is empty, try digits 1–9. For each digit, check if it is valid in this cell (not already in the same row, column, or 3×3 box). If valid, place it, recurse to the next cell; if the recursion returns true, we’re done. Otherwise remove the digit and try the next.
- If no digit works, return false so the caller can backtrack.
Grid order: (0,0), (0,1), ... (8,8).
At empty cell (r,c): try d in 1..9
    if valid(r, c, d): board[r][c]=d, recurse next cell; if true return true; else board[r][c]='.'
  return false
Box Index
The nine 3×3 boxes are indexed by (row // 3, col // 3). So box index for cell (r, c) is box = (r // 3, c // 3). To check “digit d in box containing (r, c)”: look at cells (r//3)*3 + i, (c//3)*3 + j for i, j in 0..2. Alternatively, use a single index box_id = (r // 3) * 3 + (c // 3) (0–8).
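As a small illustration of both forms of box indexing, the helpers below (hypothetical names box_id and box_cells) compute the single box index and list the nine cells of the box containing (r, c).

```python
from typing import List, Tuple


def box_id(r: int, c: int) -> int:
    """Single index 0..8 of the 3x3 box containing (r, c), numbered row-major."""
    return (r // 3) * 3 + (c // 3)


def box_cells(r: int, c: int) -> List[Tuple[int, int]]:
    """All nine cells of the box containing (r, c), starting at its top-left corner."""
    br, bc = (r // 3) * 3, (c // 3) * 3
    return [(br + i, bc + j) for i in range(3) for j in range(3)]


print(box_id(4, 7))        # 5
print(box_cells(4, 7)[0])  # (3, 6)
```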
Step-by-Step Breakdown
- Find next empty cell: Scan row by row (and column by column); if no empty cell, the board is solved—return True.
- Try digits 1–9: For each digit d, check valid(r, c, d) (not in row r, not in column c, not in the 3×3 box containing (r, c)).
- Place and recurse: Set board[r][c] = d, call solver (next cell). If it returns True, return True. Else set board[r][c] back to empty and try next d.
- Backtrack: If no digit works, return False.
Python Implementation
from typing import List

# board: List[List[str]] with '1'..'9' or '.'
def solve_sudoku(board: List[List[str]]) -> None:
    def valid(r: int, c: int, d: str) -> bool:
        for i in range(9):
            if board[r][i] == d or board[i][c] == d:
                return False
        br, bc = (r // 3) * 3, (c // 3) * 3
        for i in range(3):
            for j in range(3):
                if board[br + i][bc + j] == d:
                    return False
        return True

    def solve() -> bool:
        for r in range(9):
            for c in range(9):
                if board[r][c] != '.':
                    continue
                for d in "123456789":
                    if not valid(r, c, d):
                        continue
                    board[r][c] = d
                    if solve():
                        return True
                    board[r][c] = '.'
                return False  # no digit worked
        return True  # no empty cell

    solve()
Line-by-Line Explanation
- valid(r, c, d): Check row r (all columns), column c (all rows), and the 3×3 box starting at (br, bc) = (r//3*3, c//3*3). If d appears anywhere, return False.
- solve(): The double loop finds the first empty cell (board[r][c] == '.'). If there is none, return True. For that cell, try each d; if valid, place d and recurse; if solve() returns True, propagate True. Otherwise reset the cell to '.' and try the next d. If no d works, return False.
- Critical: reset board[r][c] to '.' when the recursive call fails. Without resetting, the board is left in an invalid state and later checks are wrong.
Time and Space Complexity
- Time: Worst case we try up to 9 choices per empty cell; number of empty cells can be large. Loosely O(9^m) where m is the number of empty cells, but pruning (valid check) reduces it. Often cited as exponential in the number of blanks.
- Space: O(m) for the recursion stack, where m is the number of empty cells (at most 81 for a 9×9 grid); the board is modified in place, so no other extra storage is needed.
Edge Cases
- Already solved: No empty cells; solve() returns True immediately.
- Invalid initial board: Same digit twice in a row/col/box; our solver may still run but won’t find a solution. Problem usually guarantees valid input.
- Character representation: Use strings '1'..'9' and '.' (or 0) as in LeetCode; ensure comparison is consistent.
Optimization: Track Rows, Columns, Boxes
Instead of scanning row/column/box every time, maintain three 9×9 boolean arrays (or sets of digits): rows[r][d], cols[c][d], boxes[b][d] indicating whether digit d is in row r, column c, or box b. When placing d at (r,c), set these to True; when removing, set to False. Then valid(r,c,d) is O(1). Initialize from the given board.
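One way this optimization might look, sketched with sets of digits rather than boolean arrays. The function name solve_sudoku_fast and the precomputed list of empty cells are assumptions for illustration; the search order is still row-major.

```python
from typing import List, Set, Tuple


def solve_sudoku_fast(board: List[List[str]]) -> bool:
    """Solve in place; returns True if a solution was found."""
    rows: List[Set[str]] = [set() for _ in range(9)]
    cols: List[Set[str]] = [set() for _ in range(9)]
    boxes: List[Set[str]] = [set() for _ in range(9)]
    empties: List[Tuple[int, int]] = []

    # Initialize the trackers from the given board.
    for r in range(9):
        for c in range(9):
            d = board[r][c]
            if d == '.':
                empties.append((r, c))
            else:
                rows[r].add(d)
                cols[c].add(d)
                boxes[(r // 3) * 3 + c // 3].add(d)

    def solve(k: int) -> bool:
        if k == len(empties):
            return True  # every empty cell has been filled
        r, c = empties[k]
        b = (r // 3) * 3 + c // 3
        for d in "123456789":
            if d in rows[r] or d in cols[c] or d in boxes[b]:
                continue  # O(1) validity check
            board[r][c] = d
            rows[r].add(d)
            cols[c].add(d)
            boxes[b].add(d)
            if solve(k + 1):
                return True
            # Backtrack: undo the board and all three trackers together.
            board[r][c] = '.'
            rows[r].discard(d)
            cols[c].discard(d)
            boxes[b].discard(d)
        return False

    return solve(0)
```

The key discipline is that the board and the three trackers are always updated and undone as a unit; if one of them is missed on backtrack, later validity checks silently go wrong.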
Common Mistakes
- Wrong box bounds: the box containing (r, c) starts at row (r//3)*3 and column (c//3)*3, and spans 3 rows and 3 columns. Off-by-one or wrong multiplier is a frequent bug.
Practice Problems
- LeetCode 37: Sudoku Solver (solve in place).
- LeetCode 36: Valid Sudoku (check if a filled board is valid—no need to solve).
- Variants: larger grids (e.g. 16×16), or “find all solutions.”
Summary
- Sudoku Solver: Fill empty cells so each row, column, and 3×3 box has 1–9 exactly once.
- Backtrack: find empty cell → try 1–9 → valid (row, col, box) → place, recurse, unplace if recurse fails.
- Box for (r, c): top-left at (r//3)*3, (c//3)*3; check that 3×3 region for duplicates.
- Reset cell to empty when backtracking; use row/col/box arrays for O(1) validity if optimizing.
14.8 Rat in Maze
Introduction
The Rat in a Maze problem: given an n×n grid where some cells are open (1) and some blocked (0), find a path from the top-left (0, 0) to the bottom-right (n−1, n−1). The rat can move only up, down, left, or right (typically one step at a time). We use backtracking: from the current cell, try each allowed direction; if the next cell is in bounds, open, and unvisited, mark it visited, recurse; if we reach the destination, record or return the path; otherwise unmark and try the next direction. Variations ask for one path, all paths, or the shortest path (BFS is better for shortest).
Real-World Analogy
Imagine a mouse in a maze drawn on graph paper. Some squares are walls; the mouse can only step on open squares and move one step up, down, left, or right. Starting at the top-left corner, the mouse tries one direction: if the next square is open and not yet visited, it steps there and continues. If it reaches the bottom-right, it has found a path. If it gets stuck (all neighbors blocked or visited), it steps back and tries another direction. That try-step-back-try-again is backtracking.
Formal Definition
Input: an n×n grid maze with maze[i][j] == 1 (open) or 0 (blocked). Start at (0, 0), destination (n−1, n−1). Valid move: one step in one of four directions (up, down, left, right) to a cell that is in bounds, open, and (in the standard formulation) not already on the current path. Output: a path (sequence of moves or coordinates) from start to destination, or a report that no path exists. Sometimes the problem allows visiting a cell multiple times; then we only need to avoid going back to the previous cell to prevent trivial loops.
Why This Topic Matters
- Grid backtracking: Same “try each option, recurse, backtrack” pattern on a 2D grid; foundation for many puzzle and path problems.
- Interview staple: Common in coding rounds; sometimes “print all paths” or “count paths” (with possible DP overlap).
- Pattern: Direction array (dx, dy), bounds check, “open and unvisited” check, mark/unmark, recurse.
Mental Model: Try All Directions
At current cell (r, c):
- If (r, c) is the destination, we have a path—record it and return (or continue to find more paths).
- Otherwise, try each of the four neighbors (down, up, right, left—or any order). For each neighbor (nr, nc): if in bounds, open, and unvisited, mark visited, add to path, recurse, then unmark and remove from path.
Directions: D(down), U(up), R(right), L(left) → (1,0), (-1,0), (0,1), (0,-1)
At (r,c): for each (dr,dc), next = (r+dr, c+dc)
    if in bounds and maze[nr][nc]==1 and not visited[nr][nc]:
        mark, path += move, recurse, unmark, path pop
Step-by-Step Breakdown
- State: Current position (r, c), current path (list of moves or cells), visited matrix (or mark/unmark on the maze).
- Base case: If (r, c) == (n−1, n−1), destination reached—append current path to result (or return true for “one path” version).
- Recursive case: Define direction vectors (e.g. down, up, right, left). For each direction, compute next cell (nr, nc). If nr, nc in [0, n−1], maze[nr][nc] is open, and (nr, nc) not visited: mark visited, append move to path, recurse, then unmark and pop path.
Python Implementation
from typing import List, Tuple

# maze: 1 = open, 0 = blocked. Find path from (0,0) to (n-1,n-1).
# Moves: D U R L → (1,0), (-1,0), (0,1), (0,-1)
DIRS = [(1, 0, "D"), (-1, 0, "U"), (0, 1, "R"), (0, -1, "L")]

def rat_in_maze(maze: List[List[int]], n: int) -> List[str]:
    result: List[str] = []
    path: List[str] = []
    visited = [[False] * n for _ in range(n)]

    def in_bounds(r: int, c: int) -> bool:
        return 0 <= r < n and 0 <= c < n

    def backtrack(r: int, c: int) -> None:
        if r == n - 1 and c == n - 1:
            result.append("".join(path))
            return
        for dr, dc, move in DIRS:
            nr, nc = r + dr, c + dc
            if not in_bounds(nr, nc) or maze[nr][nc] == 0 or visited[nr][nc]:
                continue
            visited[nr][nc] = True
            path.append(move)
            backtrack(nr, nc)
            path.pop()
            visited[nr][nc] = False

    if maze[0][0] == 1:
        visited[0][0] = True
        backtrack(0, 0)
    return result
Line-by-Line Explanation
- DIRS: (delta row, delta col, move name) for down, up, right, left.
- backtrack(r, c): If (r, c) is (n−1, n−1), save the path string and return. For each direction, compute (nr, nc); skip if out of bounds, blocked, or visited. Otherwise mark (nr, nc) visited, append the move, recurse, then pop the move and unmark.
- Start: if (0,0) is open, mark it visited and call backtrack(0, 0). The result list holds all path strings (e.g. ["DDRR", "DRDR"]).
- Critical: unmark visited[nr][nc] after the recursive call. Without unmarking, cells stay “visited” and other valid paths that pass through the same cell are missed when enumerating all paths.
Time and Space Complexity
- Time: In the worst case we explore many paths; each cell can be tried in multiple paths. Upper bound is exponential (e.g. O(4^(n²)) in theory); in practice pruning (blocked/visited) reduces it.
- Space: O(n²) for visited matrix; O(path length) for recursion stack and path list. For “all paths” we store each path string.
Edge Cases
- Start or destination blocked: If maze[0][0]==0 or maze[n-1][n-1]==0, no path; return [] or false.
- 1×1 grid: Start is the destination, so the only path is the empty one (no moves). The implementation above returns [""] in this case because the destination check runs on entry to backtrack.
- No path: All paths get stuck; result is [].
Variation: Count Paths or One Path
For “count all paths,” increment a counter (or len(result)) when reaching the destination instead of storing path strings. For “find one path,” return True as soon as you reach the destination and pass the path back; no need to try other directions.
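A sketch of the "find one path" variation, assuming the same 1 = open, 0 = blocked convention. The name find_one_path is illustrative; the function returns the move string of the first path found in D, U, R, L order, or None if there is none.

```python
from typing import List, Optional

DIRS = [(1, 0, "D"), (-1, 0, "U"), (0, 1, "R"), (0, -1, "L")]


def find_one_path(maze: List[List[int]]) -> Optional[str]:
    n = len(maze)
    if n == 0 or maze[0][0] == 0:
        return None
    visited = [[False] * n for _ in range(n)]
    path: List[str] = []

    def dfs(r: int, c: int) -> bool:
        if r == n - 1 and c == n - 1:
            return True  # destination reached: signal success upward
        for dr, dc, move in DIRS:
            nr, nc = r + dr, c + dc
            if not (0 <= nr < n and 0 <= nc < n):
                continue
            if maze[nr][nc] == 0 or visited[nr][nc]:
                continue
            visited[nr][nc] = True
            path.append(move)
            if dfs(nr, nc):
                return True  # early termination: skip the remaining directions
            path.pop()
            visited[nr][nc] = False
        return False

    visited[0][0] = True
    return "".join(path) if dfs(0, 0) else None


maze = [[1, 0, 0, 0],
        [1, 1, 0, 1],
        [0, 1, 0, 0],
        [1, 1, 1, 1]]
print(find_one_path(maze))  # DRDDRR
```

Returning True immediately leaves path holding exactly the moves of the successful route, so no copy is needed. For a pure reachability question the unmark step could be dropped, since a cell that failed once cannot succeed later.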
Common Mistakes
Practice Problems
- GfG / classic: Rat in a Maze (print all paths in lex order—often D, L, R, U).
- LeetCode 79: Word Search (path in grid spelling a word—similar try-directions backtracking).
- Count number of paths; shortest path (BFS).
Summary
- Rat in Maze: Find path from (0,0) to (n−1,n−1) on a grid; move to open neighbors only; backtrack over directions.
- Use direction vectors (dr, dc), bounds check, and “open and unvisited” check; mark before recurse, unmark after.
- Base case: at destination, record path (or return true).
- For shortest path use BFS; use backtracking for all paths or one path.
14.9 Branch & Bound
Introduction
Branch and Bound is a search strategy that extends backtracking with bounds to prune the search tree. We explore the same kind of "decision tree" (branch), but at each node we compute a bound (e.g. a lower bound for a minimization problem) that tells us the best possible value we can get from that subtree. If the bound shows that no descendant can beat our current best solution, we prune that branch and do not explore it. This turns "find all solutions" into "find an optimal solution" and often reduces the number of nodes visited compared to plain backtracking.
Real-World Analogy
Imagine searching for the cheapest route that visits several cities. Plain backtracking would try every ordering and then pick the best. Branch and bound: as you build a partial route, you estimate the minimum cost any completion could have (e.g. using a simple lower bound). If that estimate is already higher than the best full route you've found so far, you stop extending that partial route—you "prune" that branch. You still branch (try next city), but you bound (estimate) and cut off hopeless branches.
Formal Definition
Why This Topic Matters
- Optimization over backtracking: When you need the best solution (not all), bounds can drastically reduce the search space.
- Classic problems: 0/1 Knapsack (max profit), Traveling Salesman (min cost), Job Assignment (min cost).
- Interview relevance: Less common than pure backtracking, but understanding "bound + prune" helps in optimization and DP discussions.
Mental Model: Tree + Bound + Prune
Think of the search as a tree:
- Each node = partial solution (e.g. first k items chosen for knapsack).
- Children = extensions (e.g. include item k+1 or exclude it).
- At each node: compute a bound (e.g. lower bound for min TSP = current cost + minimum remaining edge cost).
- If bound >= current best (for minimization), prune. Otherwise recurse into children.
- When we reach a leaf (full solution), update "best" if this solution is better.
Minimization example:
    best = infinity
    at node: bound = lower_bound(partial)
        if bound >= best: return (prune)
        if leaf: best = min(best, cost); return
        for each child: recurse
Step-by-Step Breakdown
- State: Partial solution (e.g. current weight, value, or path), current index or level, and global best value (and optionally best solution).
- Bound: Compute a bound for the current partial solution. For minimization: lower bound = best possible completion; for maximization: upper bound.
- Prune: If bound cannot beat the current best, return without exploring children.
- Base case: If partial solution is complete (e.g. all items decided), update best if better and return.
- Branch: Generate children (e.g. include next item / exclude next item); for each child, recurse.
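To make the steps concrete for a minimization problem, here is a hedged sketch for Job Assignment. The names min_assignment_cost and lower_bound are illustrative, and the bound (each remaining worker takes its cheapest still-free job, ignoring conflicts) is one simple valid choice among several.

```python
from typing import List


def min_assignment_cost(cost: List[List[int]]) -> int:
    """cost[w][j] = cost of worker w doing job j; each worker gets a distinct job."""
    n = len(cost)
    used = [False] * n
    # Start from a known feasible solution (worker i takes job i).
    best = sum(cost[i][i] for i in range(n))

    def lower_bound(w: int, cur: int) -> int:
        # Optimistic completion: conflicts between remaining workers are ignored,
        # so this never exceeds the true minimum of the subtree.
        return cur + sum(
            min(cost[k][j] for j in range(n) if not used[j])
            for k in range(w, n)
        )

    def branch(w: int, cur: int) -> None:
        nonlocal best
        if w == n:                        # base case: complete assignment
            best = min(best, cur)
            return
        if lower_bound(w, cur) >= best:   # bound + prune
            return
        for j in range(n):                # branch: try each free job for worker w
            if not used[j]:
                used[j] = True
                branch(w + 1, cur + cost[w][j])
                used[j] = False

    branch(0, 0)
    return best


cost = [[9, 2, 7, 8],
        [6, 4, 3, 7],
        [5, 8, 1, 8],
        [7, 6, 9, 4]]
print(min_assignment_cost(cost))  # 13
```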
Backtracking vs Branch and Bound
| Aspect | Backtracking | Branch & Bound |
|---|---|---|
| Goal | Find all solutions or one solution | Find optimal solution (min or max) |
| Pruning | Only when constraint violated (e.g. invalid) | When bound shows subtree cannot beat current best |
| Extra work per node | None (or simple feasibility) | Compute bound (lower/upper) |
Example: 0/1 Knapsack (Maximization)
Given weights and values of n items and capacity W, maximize total value with total weight <= W. Branch: at each item, include or exclude. Bound: an upper bound for the current partial solution (e.g. current value + fractional knapsack value of remaining items—greedy upper bound). If this upper bound <= best value so far, prune.
# Sketch: items = [(w1, v1), ...], capacity W.
# Assumes n, W, w, v and upper_bound() are defined elsewhere.
best_value = 0

def knapsack_bb(i: int, cur_w: int, cur_v: int) -> None:
    global best_value
    if cur_w > W:
        return  # infeasible: over capacity
    if i == n:
        best_value = max(best_value, cur_v)
        return
    # Upper bound: cur_v + fractional knapsack value of items i..n-1 with remaining weight (W - cur_w)
    ub = upper_bound(i, cur_w, cur_v)
    if ub <= best_value:
        return  # prune
    knapsack_bb(i + 1, cur_w, cur_v)  # exclude
    knapsack_bb(i + 1, cur_w + w[i], cur_v + v[i])  # include
A good upper bound is critical: if it is too loose, little gets pruned; if it is invalid (it can fall below the true best value of the subtree), pruning can discard the optimum. The fractional knapsack bound is valid and often strong.
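The sketch above leaves upper_bound and the item arrays undefined. A self-contained version might look like the following; the function name knapsack_branch_and_bound is illustrative, weights are assumed to be positive integers, and items are sorted by value density so the greedy fractional fill is a valid upper bound.

```python
from typing import List, Tuple


def knapsack_branch_and_bound(items: List[Tuple[int, int]], capacity: int) -> int:
    """items = [(weight, value), ...] with positive weights; returns the max total value."""
    # Highest value per unit weight first, as the fractional bound requires.
    items = sorted(items, key=lambda wv: wv[1] / wv[0], reverse=True)
    n = len(items)
    best_value = 0

    def upper_bound(i: int, cur_w: int, cur_v: int) -> float:
        # Fill the remaining capacity greedily; take a fraction of the first misfit.
        remaining = capacity - cur_w
        bound = float(cur_v)
        for w, v in items[i:]:
            if w <= remaining:
                remaining -= w
                bound += v
            else:
                bound += v * remaining / w
                break
        return bound

    def branch(i: int, cur_w: int, cur_v: int) -> None:
        nonlocal best_value
        if cur_w > capacity:
            return  # constraint prune
        if i == n:
            best_value = max(best_value, cur_v)
            return
        if upper_bound(i, cur_w, cur_v) <= best_value:
            return  # bound prune
        w, v = items[i]
        branch(i + 1, cur_w + w, cur_v + v)  # include first: finds good solutions early
        branch(i + 1, cur_w, cur_v)          # exclude

    branch(0, 0, 0)
    return best_value


print(knapsack_branch_and_bound([(10, 60), (20, 100), (30, 120)], 50))  # 220
```

Trying the "include" branch first is a design choice: it tends to raise best_value early, which makes later bound checks prune more.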
Time and Space Complexity
- Time: Still exponential in the worst case (no pruning), but in practice a good bound can reduce the tree significantly. Depends heavily on the problem and bound quality.
- Space: O(depth of tree) for recursion; O(1) or O(n) for best solution and state.
Edge Cases
- No feasible solution: best remains at initial value (e.g. -infinity for max, +infinity for min); report "no solution."
- Bound computation: Ensure the bound is valid: a lower bound must never exceed the true minimum of its subtree, and an upper bound must never fall below the true maximum. Otherwise pruning may discard the optimal solution.
Common Mistakes
Practice Problems
- 0/1 Knapsack (max profit) with fractional upper bound.
- Traveling Salesman: lower bound = current cost + MST of remaining cities (or similar).
- Job Assignment (min cost): bound = current cost + minimum possible cost of assigning remaining jobs.
Summary
- Branch and Bound: Branch (split into subproblems) and bound (estimate best value in subtree); prune if bound cannot beat current best.
- Used for optimization (min or max); keep a global "best" and update at leaves.
- Bounds must be valid (lower bound <= true minimum; upper bound >= true maximum) or pruning may miss the optimum.
- Strong bounds and good branching order improve pruning and speed.
14.10 Pruning Techniques
Introduction
Pruning in backtracking and search means cutting off branches of the search tree that cannot lead to a solution (or to a better solution, in optimization). Without pruning, we may explore millions of useless nodes; with good pruning, we often reduce the search space dramatically. This topic summarizes pruning techniques: when and how to detect "this branch is hopeless" and skip it. The ideas apply across subsets, permutations, combination sum, N Queens, Sudoku, and branch-and-bound.
Real-World Analogy
Imagine searching a maze by trying every path. As soon as you see a sign "dead end" or "no exit," you stop walking that way and try another. Pruning is that sign: we compute something (constraint violated? bound too bad? duplicate?) and decide "don't go further here." The earlier and more accurate the sign, the less walking we do.
Formal Definition
Why This Topic Matters
- Efficiency: Good pruning can turn an infeasible brute force into an acceptable solution.
- Interview: Saying "I'll prune when..." shows you think about search space and optimization.
- Reuse: The same ideas (constraint check, bound, skip duplicate) apply to many problems.
Mental Model
Before or after extending the current node, ask:
- Is the current (partial) solution already invalid? → Constraint prune.
- Can this subtree never beat the best we have? → Bound prune.
- Is this branch equivalent to one we already explored? → Symmetry / duplicate prune.
- Do we only need one solution? → Early terminate when we find it.
Types of Pruning
1. Constraint Pruning (Feasibility)
If the partial solution already violates a problem constraint, no extension can fix it. Examples:
- Combination Sum: If cur_sum > target, stop: adding more will only increase the sum.
- N Queens: If placing a queen at (r, c) attacks an existing queen, don't place it (or don't recurse).
- Knapsack: If cur_weight > W, return; no need to add more items.
Implement by checking the constraint at the start of the recursive call or before recursing; if not feasible: return.
2. Bound Pruning (Optimization)
Used when we want the best solution. Compute a bound (lower for min, upper for max). If bound cannot beat the current best, prune. See Branch & Bound (14.9).
3. Symmetry and Duplicate Pruning
Avoid exploring branches that are equivalent to ones already explored. Examples:
- Subsets / Combination Sum: Use a "start index": only consider candidates from index i onward so we don't get [2,3] and [3,2] as two different branches.
- Subsets II / Combination Sum II: Sort and skip: if j > start and candidates[j] == candidates[j-1], skip j so we don't generate the same combination twice.
- Permutations II: Skip placing the same value at the same position twice (e.g. nums[i]==nums[i-1] and not used[i-1]).
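The Permutations II rule can be sketched as follows; permute_unique is an illustrative name, and the input is sorted first so that equal values are adjacent.

```python
from typing import List


def permute_unique(nums: List[int]) -> List[List[int]]:
    nums = sorted(nums)  # equal values must be adjacent for the skip rule
    used = [False] * len(nums)
    path: List[int] = []
    result: List[List[int]] = []

    def backtrack() -> None:
        if len(path) == len(nums):
            result.append(path[:])
            return
        for i in range(len(nums)):
            if used[i]:
                continue
            # Duplicate prune: among equal values, only the first unused copy
            # may be placed at this position.
            if i > 0 and nums[i] == nums[i - 1] and not used[i - 1]:
                continue
            used[i] = True
            path.append(nums[i])
            backtrack()
            path.pop()
            used[i] = False

    backtrack()
    return result


print(permute_unique([1, 1, 2]))  # [[1, 1, 2], [1, 2, 1], [2, 1, 1]]
```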
4. Ordering Heuristics
Order in which we try choices can affect how much we prune:
- Try promising branches first: In branch-and-bound, trying "include high-value item" before "exclude" may improve the best solution early, so later bound pruning is more effective.
- Try constrained options first: In Sudoku, choosing the cell with fewest valid digits (most constrained) can reduce branching.
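A possible sketch of the "most constrained cell first" heuristic for Sudoku. The helper names candidates and most_constrained_cell are hypothetical, and the board format ('1'..'9' or '.') is assumed to match the LeetCode convention.

```python
from typing import List, Optional, Tuple


def candidates(board: List[List[str]], r: int, c: int) -> List[str]:
    """Digits that can legally go in the empty cell (r, c)."""
    taken = set(board[r]) | {board[i][c] for i in range(9)}
    br, bc = (r // 3) * 3, (c // 3) * 3
    taken |= {board[br + i][bc + j] for i in range(3) for j in range(3)}
    return [d for d in "123456789" if d not in taken]


def most_constrained_cell(
    board: List[List[str]],
) -> Optional[Tuple[int, int, List[str]]]:
    """Return (r, c, options) for the empty cell with the fewest options, or None."""
    best: Optional[Tuple[int, int, List[str]]] = None
    for r in range(9):
        for c in range(9):
            if board[r][c] != '.':
                continue
            opts = candidates(board, r, c)
            if best is None or len(opts) < len(best[2]):
                best = (r, c, opts)
    return best
```

A solver would call most_constrained_cell instead of scanning for the first empty cell; a cell with zero options then triggers an immediate backtrack, and a cell with one option costs no branching at all.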
5. Early Termination
When the problem asks for "one solution" or "any valid path," return as soon as you find it. Don't continue to explore other branches. Example: Rat in Maze "find one path"—if reached destination: return True and propagate True up so the caller stops trying other directions.
Example: Pruning in Combination Sum
def backtrack(start: int, cur_sum: int) -> None:
    if cur_sum == target:
        result.append(path[:])
        return
    if cur_sum > target:  # constraint prune
        return
    for j in range(start, len(candidates)):  # start index = symmetry/order prune
        path.append(candidates[j])
        backtrack(j, cur_sum + candidates[j])  # Combination Sum I: pass j (not j + 1) so a candidate can be reused
        path.pop()
Here cur_sum > target is constraint pruning; start ensures we don't generate the same combination in different orders.
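The fragment above relies on candidates, target, path and result from an enclosing scope. A self-contained sketch might look like this, assuming all candidates are positive; it also sorts the candidates so the sum check can move into the loop and break early.

```python
from typing import List


def combination_sum(candidates: List[int], target: int) -> List[List[int]]:
    candidates = sorted(candidates)  # sorting enables the early break below
    result: List[List[int]] = []
    path: List[int] = []

    def backtrack(start: int, cur_sum: int) -> None:
        if cur_sum == target:
            result.append(path[:])
            return
        for j in range(start, len(candidates)):
            if cur_sum + candidates[j] > target:
                break  # sorted and positive: every later candidate overshoots too
            path.append(candidates[j])
            backtrack(j, cur_sum + candidates[j])  # j, not j + 1: reuse allowed
            path.pop()

    backtrack(0, 0)
    return result


print(combination_sum([2, 3, 6, 7], 7))  # [[2, 2, 3], [7]]
```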
Example: Pruning in N Queens
We don't "prune" in the sense of returning early from a node—we simply never recurse into invalid cells. The "safety" check (column and diagonals) acts as pruning: we only branch into safe columns. So "try each column, recurse only if safe" is constraint pruning at the branch level.
Common Mistakes
- Forgetting the sum check: if cur_sum > target: return is simple and avoids many useless recursions. Missing it can cause TLE or stack overflow on large targets.
Practice Problems
- Revisit Subsets II, Combination Sum II, Permutations II and name the pruning used (duplicate/symmetry).
- Revisit N Queens: no explicit "prune" return, but only branching into safe columns is constraint pruning.
- 0/1 Knapsack with branch-and-bound: bound pruning + optional constraint prune (weight > W).
Summary
- Pruning = not exploring branches that can't lead to a (or a better) solution.
- Constraint pruning: Partial solution violates a constraint → return.
- Bound pruning: Bound shows subtree can't beat current best → return (optimization).
- Symmetry/duplicate pruning: Start index, sort+skip duplicates, so we don't explore equivalent branches.
- Early termination: Return as soon as one solution is found when that's all we need.
Section 15: Dynamic Programming
This section covers dynamic programming (DP): solving problems by breaking them into overlapping subproblems and storing results so each subproblem is solved once. You will learn memoization (top-down: recurse and cache) and tabulation (bottom-up: fill a table), then apply them to classic problems like Fibonacci, Knapsack, LCS, LIS, and Coin Change. State design and transition design are the keys: once you define "what to store" and "how to combine," the code follows.
15.1 Memoization
Introduction
Memoization is a technique to make recursive algorithms efficient by caching the results of function calls. When a function is called with the same arguments again, we return the stored result instead of recomputing. This is the top-down approach to dynamic programming: you write the natural recurrence (e.g. fib(n) = fib(n-1) + fib(n-2)), then add a cache so each subproblem is solved only once. Memoization turns exponential-time recursion into time proportional to the number of distinct subproblems.
Real-World Analogy
Imagine solving a puzzle that has many identical sub-puzzles. The first time you solve a sub-puzzle, you write the answer on a sticky note and stick it on that piece. The next time you need that same sub-puzzle, you read the note instead of solving it again. Memoization is that sticky note: the cache stores answers for "already solved" inputs so we never recompute them.
Formal Definition
Why This Topic Matters
- Foundation of DP: Many DP solutions are easiest to derive as recurrences; memoization implements them with minimal change.
- Interview standard: "First write the recurrence, then add memoization" is a common approach; tabulation can follow.
- Complexity: Reduces time from exponential (repeated subproblems) to O(number of states × cost per state).
Mental Model
For a recursive function f(args):
- If args is in the cache, return cache[args].
- Otherwise compute the result (using base cases and recursive calls to f).
- Store the result in cache[args], then return it.
Every distinct set of arguments is a "state"; we compute each state at most once.
Step-by-Step: Fibonacci with Memoization
Recurrence: fib(0)=0, fib(1)=1, fib(n)=fib(n-1)+fib(n-2) for n≥2. Without cache, many repeated calls (e.g. fib(2) computed many times). With a cache keyed by n, we compute fib(n) only once per n.
- Base: n=0 or n=1 → return n.
- Check cache: if n in cache, return cache[n].
- Compute: result = fib(n-1) + fib(n-2) (these calls may use cache).
- Store cache[n] = result; return result.
Python Implementation
Explicit cache (dict)
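from typing import Dict, Optional

def fib_memo(n: int, cache: Optional[Dict[int, int]] = None) -> int:
    if cache is None:
        cache = {}  # fresh dict per top-level call; avoids a shared mutable default
    if n <= 1:
        return n
    if n in cache:
        return cache[n]
    result = fib_memo(n - 1, cache) + fib_memo(n - 2, cache)
    cache[n] = result
    return result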
Using functools.lru_cache
from functools import lru_cache

@lru_cache(maxsize=None)
def fib_lru(n: int) -> int:
    if n <= 1:
        return n
    return fib_lru(n - 1) + fib_lru(n - 2)
lru_cache automatically stores return values keyed by arguments. maxsize=None means unbounded cache. Arguments must be hashable.
Line-by-Line Explanation (Explicit Cache)
- cache maps n → fib(n). The default cache=None means we create one dict per top-level call.
- if n <= 1: return n is the base case.
- if n in cache: return cache[n]: already computed; no recursion.
- result = fib_memo(n-1, cache) + fib_memo(n-2, cache): recurse; nested calls use the same cache and may return cached values.
- cache[n] = result; return result: store and return.
Time and Space Complexity
- Time: Each distinct argument (state) is computed once. For Fibonacci, states are 0..n, so O(n) calls with O(1) work each → O(n) time.
- Space: O(n) for the cache (n+1 entries) and O(n) for the recursion stack in the worst case. So O(n) total.
Without memoization, fib(n) makes exponentially many recursive calls (roughly 1.6^n, bounded above by 2^n); memoization reduces this to O(n).
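The gap can be seen by counting calls directly. The helper count_calls below is purely illustrative: it runs a naive and a memoized Fibonacci side by side and returns how many times each function was entered.

```python
from typing import Dict, Tuple


def count_calls(n: int) -> Tuple[int, int]:
    """Return (naive_calls, memo_calls) needed to compute fib(n)."""
    naive_calls = 0
    memo_calls = 0

    def naive(k: int) -> int:
        nonlocal naive_calls
        naive_calls += 1
        return k if k <= 1 else naive(k - 1) + naive(k - 2)

    cache: Dict[int, int] = {}

    def memo(k: int) -> int:
        nonlocal memo_calls
        memo_calls += 1
        if k <= 1:
            return k
        if k not in cache:
            cache[k] = memo(k - 1) + memo(k - 2)
        return cache[k]

    assert naive(n) == memo(n)  # both compute the same value
    return naive_calls, memo_calls


print(count_calls(20))  # (21891, 39)
```

For n = 20 the naive version is entered 21891 times, while the memoized one needs 2n − 1 = 39 calls: one computing call per state plus one cheap lookup or base-case call for each second operand.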
Edge Cases
- n < 0: Define behavior (e.g. raise or extend definition).
- n = 0 or 1: Base case; no cache needed for these if you handle them before the cache check.
Cache Key Design
For functions with multiple arguments, the cache key must uniquely identify the state. Use a tuple of arguments: cache[(i, j)]. Ensure arguments are hashable (no lists; use tuples). For recursive DP on arrays, the state might be (index, extra_param).
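As an illustration of a tuple key, here is a sketch of Longest Common Subsequence length memoized on (i, j). The function name lcs_length and the suffix-based recurrence are choices made for this example.

```python
from typing import Dict, Tuple


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b."""
    cache: Dict[Tuple[int, int], int] = {}

    def go(i: int, j: int) -> int:
        # State (i, j): LCS of the suffixes a[i:] and b[j:].
        if i == len(a) or j == len(b):
            return 0
        key = (i, j)  # a tuple is hashable; a list would not be
        if key in cache:
            return cache[key]
        if a[i] == b[j]:
            result = 1 + go(i + 1, j + 1)
        else:
            result = max(go(i + 1, j), go(i, j + 1))
        cache[key] = result
        return result

    return go(0, 0)


print(lcs_length("abcde", "ace"))  # 3
```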
Common Mistakes
- Forgetting to return cache[key] after the lookup; otherwise you compute again and may overwrite the cache without using it.
Memoization vs Tabulation
Memoization: Top-down; recurse and cache; compute only needed states; recursion stack. Tabulation: Bottom-up; fill a table in order; often iterate; no recursion. Same asymptotic time and space when both solve the same subproblems. Choose memoization when the recurrence is natural and state space is sparse or order is tricky.
Practice Problems
- Fibonacci (above).
- Climbing stairs: ways to reach step n (same recurrence as fib).
- 0/1 Knapsack: memoize (index, remaining_weight).
- Longest Common Subsequence: memoize (i, j) on two string indices.
Summary
- Memoization = cache results of recursive calls; before compute, check cache; after compute, store and return.
- Transforms exponential recursion into O(states × work per state) time.
- Cache key = function arguments (tuple if multiple); must be hashable.
- Use an explicit dict or @lru_cache; the same idea applies to any recurrence.
15.2 Tabulation
Introduction
Tabulation is the bottom-up approach to dynamic programming. Instead of recursing and
caching, we fill a table (usually an array or 2D array) in a fixed order so that when we
compute dp[i], all the values it depends on (e.g. dp[i-1], dp[i-2])
are already computed. We use loops, not recursion. The recurrence is the same as in memoization—only the
order of evaluation changes: we solve "smaller" subproblems first, then build up to the
answer.
Real-World Analogy
Building a wall brick by brick: you don't build the top row first and then ask "what's under it?" You build the bottom row, then the next, then the next. Each row depends only on the row below. Tabulation is like that: fill the table in an order where every cell you write only needs cells you've already written.
Formal Definition
Tabulation: define a table dp[...] where each entry corresponds to a state (e.g. dp[i] = answer for the subproblem of size i). Determine a fill order so that when computing dp[state], all states it depends on are already computed. Initialize base cases (e.g. dp[0], dp[1]), then iterate and fill the rest. No recursion; the final answer is typically dp[n] or dp[0][m] etc.
Why This Topic Matters
- No recursion stack: Avoids stack overflow when the "depth" of subproblems is large.
- Same recurrence: Once you have the recurrence (from memoization or reasoning), tabulation is a mechanical rewrite: loops in dependency order.
- Space optimization: Often only a few previous rows or columns are needed, so we can reduce space (e.g. O(n) to O(1) for Fibonacci).
Mental Model
For a 1D recurrence like Fibonacci:
- Define dp[i] = value for subproblem i.
- Set base cases: dp[0], dp[1], ...
- For i from the first "unknown" index to n: dp[i] = f(dp[i-1], dp[i-2], ...) using the recurrence.
- Return dp[n] (or the relevant entry).
Critical: fill in an order such that every dependency is already in the table.
Step-by-Step: Fibonacci with Tabulation
Recurrence: fib(0)=0, fib(1)=1, fib(i)=fib(i-1)+fib(i-2). Table: dp[i] = fib(i). Order: i = 0, 1, 2, ..., n. So we need dp[0] and dp[1] first; then for i ≥ 2, dp[i] depends only on dp[i-1] and dp[i-2], both already computed.
Python Implementation
Full table
def fib_tab(n: int) -> int:
    if n <= 1:
        return n
    dp = [0] * (n + 1)
    dp[0] = 0
    dp[1] = 1
    for i in range(2, n + 1):
        dp[i] = dp[i - 1] + dp[i - 2]
    return dp[n]
Space-optimized (only previous two)
def fib_tab_opt(n: int) -> int:
    if n <= 1:
        return n
    prev2, prev1 = 0, 1
    for i in range(2, n + 1):
        curr = prev1 + prev2
        prev2, prev1 = prev1, curr
    return prev1
We only need the last two values, so O(1) extra space instead of O(n).
Line-by-Line Explanation
- dp[i] stores fib(i). Base: dp[0]=0, dp[1]=1.
- Loop i from 2 to n: dp[i] = dp[i-1] + dp[i-2] applies the recurrence; i-1 and i-2 are already filled.
- Return dp[n]. In the optimized version, prev2 and prev1 roll forward so we never need the full array.
Time and Space Complexity
- Time: O(n) — one loop, O(1) work per iteration.
- Space (full table): O(n) for dp.
- Space (optimized): O(1) for two variables.
Order of Filling (2D Example)
For a 2D table (e.g. dp[i][j] depends on dp[i-1][j] and dp[i][j-1]), we must fill so that when we compute dp[i][j], those cells are already done. Typical: fill row by row, or column by column. For dp[i][j] depending on dp[i+1][j] and dp[i][j+1], fill in reverse order (bottom-right to top-left).
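The reverse-order case can be sketched with a minimum path sum over a grid, where each cell depends on the cell below and the cell to the right. The problem choice and the name min_path_sum are assumptions for illustration; the point is only the direction of the two loops.

```python
def min_path_sum(grid: list) -> int:
    """Minimum sum of a right/down path from the top-left to the bottom-right cell."""
    if not grid or not grid[0]:
        return 0
    m, n = len(grid), len(grid[0])
    INF = float("inf")
    # One extra row and column of INF act as out-of-grid sentinels.
    dp = [[INF] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):          # bottom row first
        for j in range(n - 1, -1, -1):      # rightmost column first
            if i == m - 1 and j == n - 1:
                dp[i][j] = grid[i][j]       # base case: the target cell
            else:
                # dp[i+1][j] and dp[i][j+1] were filled earlier in this order.
                dp[i][j] = grid[i][j] + min(dp[i + 1][j], dp[i][j + 1])
    return dp[0][0]
```

Filling the same table top-left to bottom-right would read cells that still hold the INF placeholder and give wrong answers.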
Edge Cases
- n <= 1: Return n without building the table.
- Table size: Use n+1 entries for 0-indexed fib(n); ensure indices stay in bounds.
Tabulation vs Memoization
| Aspect | Memoization | Tabulation |
|---|---|---|
| Direction | Top-down (recurse to smaller) | Bottom-up (fill from base) |
| Structure | Recursion + cache | Loops + table |
| States computed | Only those reached by recurrence | All in the fill order (unless optimized) |
| Stack | Uses recursion stack | No recursion |
Common Mistakes
- Wrong fill order: if dp[i] depends on dp[i+1], you must fill from high index to low index, or the dependency is not yet computed.
- Wrong table size: allocate dp = [0] * (n + 1) if you need indices 0..n; ensure every access is within bounds.
Practice Problems
- Fibonacci (above); climbing stairs (same table).
- 0/1 Knapsack: 2D table dp[i][w]; fill by i and w.
- LCS: 2D table dp[i][j]; fill row by row or by (i+j).
Summary
- Tabulation = bottom-up: fill a table in dependency order so each cell's dependencies are already computed.
- Same recurrence as memoization; evaluation order is explicit (loops).
- No recursion → no stack overflow; often allows space optimization (e.g. keep only last row or last two values).
- Ensure correct fill order and indices; for 2D, state the order clearly.
15.3 State Design
Introduction
In dynamic programming, the state is the set of parameters that uniquely define a subproblem. Choosing the right state is the first and most important step: it determines the dimensions of your table (or the keys in your memo), the recurrence, and the complexity. State design means deciding: "What do I need to know to answer this subproblem?" Once the state is clear, the transition (how to combine smaller states) often follows naturally.
Real-World Analogy
Imagine solving "best way to get from A to B" with choices along the way. The "state" is your current situation: where you are and what resources you have left (e.g. time, money). Two people in the same place with the same resources face the same subproblem—so we store the answer keyed by (place, resources). State design is asking: "What exactly is 'same situation'?" so we don't store redundant or insufficient information.
Formal Definition
Why This Topic Matters
- First step in any DP: Wrong state → wrong or inefficient solution; right state → clean recurrence.
- Interview: "What is your state?" is a standard question; answering clearly shows you understand the problem.
- Complexity: Number of states = table size (or cache size); minimizing state dimensions keeps complexity manageable.
Mental Model: What Must We Remember?
Ask: "If I'm in the middle of the problem, what information do I need to compute the rest without redoing the past?" That information is your state. Examples:
- Fibonacci: "How many steps left?" → state = n (one parameter).
- 0/1 Knapsack: "Which items are left and how much weight capacity?" → state = (index, remaining_weight).
- LCS: "How much of each string is left?" → state = (i, j) (prefix lengths or indices).
Step-by-Step: How to Choose State
- Identify the "decisions" or "progress": What choices are we making? (e.g. which item to take, which character to match.)
- Ask what we need to know after each decision: Typically "how much of the input is left" (indices) and any "resource" (capacity, count, flag).
- Express as parameters: State = (index, ...) or (i, j, ...). Ensure that two different states never have the same answer when they should differ (and that equivalent situations map to the same state).
- Check base cases: Smallest state(s) should have known answers (e.g. empty input, zero capacity).
Common State Patterns
1D State
Single parameter: position, index, or "length." Examples: Fibonacci (n), climbing stairs (step), house robber (index). Table: dp[i].
2D State
Two parameters: often two indices (two strings, two sequences) or (index, capacity). Examples: LCS (i, j), knapsack (i, w), edit distance (i, j). Table: dp[i][j] or dp[i][w].
State with Extra Dimension
Sometimes we need an extra parameter: "number of items taken," "whether we took the previous," "remaining k." Examples: "at most k transactions" → (index, k); "paint n houses with no two adjacent same color" → (house, color). Table: dp[i][k] or dp[i][color].
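A minimal sketch of the (house, color) state, assuming a cost matrix costs[i][c] and at least two colors. The function name min_paint_cost and the cost-minimizing objective are assumptions added for illustration.

```python
def min_paint_cost(costs: list) -> int:
    """costs[i][c] = cost of painting house i with color c; adjacent houses must differ.

    Assumes every row has the same length and there are at least two colors.
    """
    if not costs:
        return 0
    num_colors = len(costs[0])
    # prev[c] = min total cost so far with the most recent house painted color c
    prev = list(costs[0])
    for i in range(1, len(costs)):
        curr = []
        for c in range(num_colors):
            # The previous house may use any color except c.
            best_other = min(prev[p] for p in range(num_colors) if p != c)
            curr.append(costs[i][c] + best_other)
        prev = curr
    return min(prev)
```

The extra color dimension is what lets the transition enforce the "no two adjacent the same" rule; a state of only the house index would not carry enough information.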
Example: 0/1 Knapsack State
Problem: Items with weight and value; capacity W. Maximize value with total weight ≤ W.
State: (i, w) = "considering items from index i onward, with w weight remaining." Answer for (i, w) = max value we can get from items i..n-1 with capacity w. Base: i = n (no items) → 0; or w = 0 → 0. Transition: skip item i → (i+1, w); take item i → value[i] + (i+1, w - weight[i]) if w ≥ weight[i]. State is 2D: index and remaining capacity.
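The (i, w) state above translates almost directly into a memoized recursion. This is a sketch under the assumption that weights are non-negative integers; the name knapsack_top_down is illustrative.

```python
from functools import lru_cache


def knapsack_top_down(weight: list, value: list, W: int) -> int:
    """Max value from items i..n-1 with remaining capacity w, starting at (0, W)."""
    n = len(weight)

    @lru_cache(maxsize=None)
    def best(i: int, w: int) -> int:
        if i == n or w == 0:
            return 0  # base: no items left or no capacity left
        result = best(i + 1, w)  # skip item i
        if weight[i] <= w:
            # take item i, then continue with reduced capacity
            result = max(result, value[i] + best(i + 1, w - weight[i]))
        return result

    return best(0, W)
```

Here lru_cache keys on the argument pair (i, w), which is exactly the state.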
Example: LCS State
Problem: Longest common subsequence of two strings A, B.
State: (i, j) = "LCS of A[0..i-1] and B[0..j-1]" (prefixes of length i and j). Base: i=0 or j=0 → 0. Transition: if A[i-1]==B[j-1] then 1 + dp(i-1,j-1); else max(dp(i-1,j), dp(i,j-1)). State is 2D: two indices.
Common Mistakes
Practice Problems
- Climbing stairs: state = step index (1D).
- 0/1 Knapsack: state = (index, remaining weight) (2D).
- LCS: state = (i, j) (2D).
- House robber: state = index; sometimes "did we take previous?" (index, 0/1).
Summary
- State = parameters that uniquely define a subproblem; must be sufficient and ideally minimal.
- Ask: "What do I need to know to solve the rest from here?" → that's your state.
- Common: 1D (index/length), 2D (two indices or index + capacity), or 2D + extra (index, k or index, flag).
- Wrong state → wrong recurrence; right state makes the transition natural.
15.4 Transition Design
Introduction
Once the state is defined, the transition is the rule that expresses
dp[state] in terms of other states. It answers: "Given that I'm in this state, what choices
do I have, and how do the results of those choices combine?" Transition design is the heart of the
recurrence: base cases handle the smallest states; for every other state we write an equation (or
min/max/sum over choices) that uses only "smaller" or "already computed" states. Getting the transition
right is what makes the DP correct.
Real-World Analogy
At every step you have a few choices. Each choice leads to a new situation (a new state). The "transition" is the rule: "My best outcome from here = (best outcome if I choose A) combined with (best outcome if I choose B)"—e.g. take the max, or add the cost of the choice and recurse. You're not inventing new math; you're writing down "what happens if I do this" and "how do I combine the outcomes."
Formal Definition
A transition is a rule for computing dp[S] using only base cases and dp[S'] for states S' that are "smaller" or "already computed." Typically: dp[S] = combine( choice_1(S), choice_2(S), ... ) where each choice may involve a cost/reward and a recursive dp[next_state]. Combine is often max, min, or + (sum). The transition must only reference states that are computed before S in the fill order (tabulation) or that eventually hit base cases (memoization).
Why This Topic Matters
- Correctness: Wrong transition → wrong answer; the transition encodes the problem logic.
- Interview: "What are the choices at this state?" and "How do you combine them?" are the next questions after state.
- Fill order: In tabulation, the transition dictates which states must be computed first (dependencies).
Mental Model: Choices and Combine
For each state, ask:
- What are my choices? (e.g. take item / skip item; use character i / skip i.)
- What state does each choice lead to? (e.g. (i+1, w) or (i+1, w - weight[i]).)
- What is the "value" of each choice? (immediate cost or reward + value of the next state.)
- How do I combine? Usually max (optimization), min (minimization), or + (counting/sum).
Common Transition Patterns
Linear combination (Fibonacci-style)
dp[i] = dp[i-1] + dp[i-2] (or a linear combination of a few previous terms). No explicit "choice"; the recurrence is the rule. Base: dp[0], dp[1].
Take or skip (knapsack-style)
At state (i, w): choice 1 — skip item i → go to (i+1, w), value = dp(i+1, w). Choice 2 — take item i (if w ≥ weight[i]) → value = value[i] + dp(i+1, w - weight[i]). Then dp(i,w) = max(choice1, choice2). Base: when i = n or w = 0.
Match or skip (LCS / edit distance-style)
At state (i, j): if A[i-1] == B[j-1] then we can "match" → 1 + dp(i-1, j-1). Else we "skip" one character: max(dp(i-1, j), dp(i, j-1)). So dp(i,j) = 1 + dp(i-1,j-1) if match, else max(dp(i-1,j), dp(i,j-1)). Base: i=0 or j=0 → 0.
Min/max over many options
Sometimes we take the best over several next states: dp[i] = min over k of (cost(i,k) + dp[k]). Examples: matrix chain multiplication, segment splits. Ensure k runs over valid indices and that dp[k] is computed before dp[i] (or use memoization).
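One simple instance of the "min over k" pattern is the minimum number of jumps to reach the end of an array, where cost(i, k) is 1 for every index k that can reach i. The problem and the name min_jumps are assumptions chosen to keep the sketch short; interval problems such as matrix chain multiplication use the same shape with a 2D state.

```python
def min_jumps(nums: list) -> int:
    """nums[k] = max jump length from index k; min jumps to reach the last index, or -1."""
    n = len(nums)
    if n == 0:
        return -1
    INF = float("inf")
    dp = [INF] * n   # dp[i] = min jumps needed to reach index i
    dp[0] = 0        # base: we start at index 0
    for i in range(1, n):
        # Take the best over every earlier index k that can reach i in one jump.
        for k in range(i):
            if k + nums[k] >= i:
                dp[i] = min(dp[i], 1 + dp[k])
    return dp[n - 1] if dp[n - 1] != INF else -1
```

Because k always runs over indices smaller than i, every dp[k] is already final when dp[i] is computed.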
Example: 0/1 Knapsack Transition
# dp[i][w] = max value from items i..n-1 with capacity w
# Base: dp[n][w] = 0 for all w; dp[i][0] = 0
# Transition:
if w < weight[i]:
    dp[i][w] = dp[i+1][w]  # can't take
else:
    dp[i][w] = max(dp[i+1][w], value[i] + dp[i+1][w - weight[i]])
Two choices: skip (same w, next i) or take (value plus state with reduced w and next i). Take the max.
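Wrapped in loops, the transition above becomes a complete function. This is a sketch assuming non-negative integer weights; the name knapsack_suffix is illustrative. Because row i reads row i+1, the outer loop runs from the last item back to the first.

```python
def knapsack_suffix(weight: list, value: list, W: int) -> int:
    """Tabulated 0/1 knapsack where dp[i][w] covers items i..n-1."""
    n = len(weight)
    # Row n is the base case (no items left), already all zeros.
    dp = [[0] * (W + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):      # last item first, since dp[i] reads dp[i+1]
        for w in range(W + 1):
            if w < weight[i]:
                dp[i][w] = dp[i + 1][w]  # can't take item i
            else:
                dp[i][w] = max(dp[i + 1][w],
                               value[i] + dp[i + 1][w - weight[i]])
    return dp[0][W]
```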
Example: LCS Transition
# dp[i][j] = LCS length of A[0..i-1] and B[0..j-1]
# Base: dp[0][j] = dp[i][0] = 0
if A[i-1] == B[j-1]:
    dp[i][j] = 1 + dp[i-1][j-1]
else:
    dp[i][j] = max(dp[i-1][j], dp[i][j-1])
Match (both advance) or skip one of the two characters; take max of the two skip options.
Dependency Order
In tabulation, we must fill the table so that when we compute dp[i][j], every state it depends on is already computed. For knapsack with the suffix definition, dp[i][w] depends on dp[i+1][...], so we fill i from n-1 down to 0; with the prefix definition (items 0..i-1), dp[i][w] depends on dp[i-1][...], so we fill i from 1 up to n. For LCS, dp[i][j] depends on dp[i-1][j-1], dp[i-1][j], and dp[i][j-1], all of which have a smaller i+j, so we can fill row by row or by increasing i+j.
Common Mistakes
Practice Problems
- Fibonacci: transition dp[i] = dp[i-1] + dp[i-2].
- 0/1 Knapsack: transition = max(skip, take) with correct indices.
- LCS: transition = match or max(skip A, skip B).
- Climbing stairs: dp[i] = dp[i-1] + dp[i-2] (or sum of last k steps for k steps at a time).
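The "sum of last k steps" generalization of climbing stairs can be sketched as follows. The name climb_ways and the convention that there is one way to stand at step 0 are assumptions for illustration.

```python
def climb_ways(n: int, k: int = 2) -> int:
    """Number of ways to reach step n taking between 1 and k steps at a time."""
    dp = [0] * (n + 1)
    dp[0] = 1  # one way to be at the start: take no steps
    for i in range(1, n + 1):
        # Sum over the last k states (fewer near the start).
        for step in range(1, min(k, i) + 1):
            dp[i] += dp[i - step]
    return dp[n]
```

With k = 2 this is the Fibonacci recurrence shifted by one index.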
Summary
- Transition = how dp[state] is computed from other states; encodes choices and combine rule (max/min/sum).
- For each state: list choices → next state and value for each → combine (usually max, min, or +).
- Patterns: linear combo, take/skip, match/skip, min over options; base cases handle smallest states.
- In tabulation, fill order must respect dependencies; in memoization, recursion handles order.
15.5 Fibonacci
Introduction
The Fibonacci sequence is the classic first example of dynamic programming. Defined by
F(0)=0, F(1)=1, and F(n)=F(n-1)+F(n-2) for n≥2, it has overlapping
subproblems: computing F(n) naively by recursion recomputes F(n-2), F(n-3), ... many times, leading to
exponential time. Fibonacci illustrates state (single parameter n), transition (add two previous), and the
full progression: brute force → memoization → tabulation → space-optimized tabulation.
Why Fibonacci for DP
- Overlapping subproblems: F(n-2) is needed for both F(n) and F(n-1); without storing it, we recompute.
- Simple state: One parameter n; state space is 0..n.
- Simple transition: F(n) = F(n-1) + F(n-2); no choices, just combine.
- Same pattern everywhere: Climbing stairs, tile problems, and many counting problems use the same recurrence.
State and Transition
State: n (or index i) — "Fibonacci value at position n."
Base cases: dp[0] = 0, dp[1] = 1.
Transition: dp[i] = dp[i-1] + dp[i-2] for i ≥ 2. No "choice"; it's a direct recurrence.
Brute Force → Memoization → Tabulation → Space-Optimized
1. Brute force (recursion only)
def fib_naive(n: int) -> int:
    if n <= 1:
        return n
    return fib_naive(n - 1) + fib_naive(n - 2)
Time: O(2^n) — each call branches into two; many repeated subproblems. Space: O(n) stack.
2. Memoization (top-down)
def fib_memo(n: int, cache: dict = None) -> int:
    if cache is None:
        cache = {}
    if n <= 1:
        return n
    if n in cache:
        return cache[n]
    cache[n] = fib_memo(n - 1, cache) + fib_memo(n - 2, cache)
    return cache[n]
Time: O(n). Space: O(n) cache + O(n) stack.
3. Tabulation (bottom-up, full table)
def fib_tab(n: int) -> int:
    if n <= 1:
        return n
    dp = [0] * (n + 1)
    dp[0], dp[1] = 0, 1
    for i in range(2, n + 1):
        dp[i] = dp[i - 1] + dp[i - 2]
    return dp[n]
Time: O(n). Space: O(n) table; no recursion stack.
4. Space-optimized tabulation
def fib_opt(n: int) -> int:
    if n <= 1:
        return n
    prev2, prev1 = 0, 1
    for i in range(2, n + 1):
        curr = prev1 + prev2
        prev2, prev1 = prev1, curr
    return prev1
Time: O(n). Space: O(1) — only two variables. We only need the last two values to compute the next.
Time and Space Summary
| Approach | Time | Space |
|---|---|---|
| Brute force | O(2^n) | O(n) stack |
| Memoization | O(n) | O(n) |
| Tabulation | O(n) | O(n) |
| Space-optimized | O(n) | O(1) |
Edge Cases
- n < 0: Usually undefined; raise or define (e.g. extend for negative indices if needed).
- n = 0 or 1: Return n; handle before building any table.
- Large n: Values grow fast; use modulo if problem asks for F(n) mod M to avoid overflow (Python ints are big, but modulo is common in contests).
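A sketch of the modulo variant, combined with an explicit check for negative n. The name fib_mod and the choice to raise ValueError are assumptions; a given problem may define these cases differently.

```python
def fib_mod(n: int, mod: int) -> int:
    """Return F(n) mod `mod` using O(1) extra space."""
    if n < 0:
        raise ValueError("n must be non-negative")
    prev2, prev1 = 0, 1  # F(0), F(1)
    for _ in range(n):
        # Reduce at every step so intermediate values stay below mod.
        prev2, prev1 = prev1, (prev1 + prev2) % mod
    return prev2 % mod
```

Reducing at each step gives the same result as reducing once at the end, because addition is compatible with the modulo operation.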
Common Mistakes
- Wrong update order in the space-optimized version: compute curr = prev1 + prev2 first, then assign prev2, prev1 = prev1, curr. Updating prev1 before computing curr loses the previous value.
Related Problems (Same Idea)
- Climbing stairs: Number of ways to reach step n (1 or 2 steps at a time) — same recurrence as Fibonacci.
- House robber: Can't take two adjacent; dp[i] = max(dp[i-1], nums[i] + dp[i-2]) — similar "use previous two" structure.
- Tile a 2×n board: Often Fibonacci or a small variant.
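The house robber recurrence above fits the same two-variable rolling pattern as space-optimized Fibonacci. A minimal sketch, assuming non-negative amounts; the name rob is illustrative.

```python
def rob(nums: list) -> int:
    """Max total from non-adjacent houses: dp[i] = max(dp[i-1], nums[i] + dp[i-2])."""
    prev2, prev1 = 0, 0  # best totals up to house i-2 and house i-1
    for x in nums:
        # Either skip this house (prev1) or take it on top of prev2.
        prev2, prev1 = prev1, max(prev1, x + prev2)
    return prev1
```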
Summary
- Fibonacci: F(0)=0, F(1)=1, F(n)=F(n-1)+F(n-2); state = n, transition = add previous two.
- Brute force is O(2^n); memoization and tabulation are O(n) time; space-optimized tabulation is O(1) space.
- Same pattern applies to climbing stairs and many counting problems.
- Handle n ≤ 1; watch off-by-one and variable update order in the O(1)-space version.
15.6 Knapsack (0/1 & Unbounded)
Introduction
The knapsack problem: given items with weights and values, and a capacity W, choose items to maximize total value without exceeding capacity. Two main variants: 0/1 Knapsack — each item at most once; Unbounded Knapsack — each item can be used any number of times. Both are solved by DP with state (index, capacity) or (capacity only for unbounded) and transition "take or skip" (0/1) or "take one more of current item or move on" (unbounded). Knapsack is the template for many "choose subset with constraint" problems.
Problem Definition
- Input: n items; item i has weight wt[i] and value val[i]; capacity W.
- 0/1: Each item at most once. Maximize sum of values of chosen items such that sum of weights ≤ W.
- Unbounded: Each item can be taken any number of times. Same objective.
0/1 Knapsack: State and Transition
State: dp[i][w] = maximum value we can get from items 0..i-1 with capacity w (or: from items i..n-1 with remaining capacity w — definition can vary; below we use "items 0..i-1, capacity w").
Base: dp[0][w] = 0 for all w (no items); dp[i][0] = 0 (no capacity).
Transition: For item i-1 (0-indexed), we have two choices:
- Skip: the candidate value is dp[i-1][w].
- Take: if wt[i-1] ≤ w, the candidate value is val[i-1] + dp[i-1][w - wt[i-1]].
Then dp[i][w] = max(skip, take), where take is considered only if valid.
Answer: dp[n][W].
0/1 Knapsack: Python (Tabulation)
def knapsack_01(wt: list, val: list, W: int) -> int:
    n = len(wt)
    dp = [[0] * (W + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for w in range(1, W + 1):
            dp[i][w] = dp[i - 1][w]  # skip
            if wt[i - 1] <= w:
                take = val[i - 1] + dp[i - 1][w - wt[i - 1]]
                dp[i][w] = max(dp[i][w], take)
    return dp[n][W]
0/1 Knapsack: Space Optimization
dp[i][w] only depends on dp[i-1][...]. So we can use a single row (or two rows) and fill w from right to left so we don't overwrite values we still need: dp[w] = max(dp[w], val[i-1] + dp[w - wt[i-1]]) for w from W down to wt[i-1].
def knapsack_01_opt(wt: list, val: list, W: int) -> int:
    dp = [0] * (W + 1)
    for i in range(len(wt)):
        for w in range(W, wt[i] - 1, -1):  # reverse so we don't use updated dp
            dp[w] = max(dp[w], val[i] + dp[w - wt[i]])
    return dp[W]
Unbounded Knapsack: State and Transition
State: dp[w] = maximum value we can get with capacity w, using any item any number of times.
Base: dp[0] = 0.
Transition: For each capacity w, try including one copy of each item i: if wt[i] ≤ w, dp[w] = max(dp[w], val[i] + dp[w - wt[i]]). We iterate w from 0 to W so that dp[w - wt[i]] may already include more of item i (unbounded).
Answer: dp[W].
Unbounded Knapsack: Python
def knapsack_unbounded(wt: list, val: list, W: int) -> int:
    dp = [0] * (W + 1)
    for w in range(1, W + 1):
        for i in range(len(wt)):
            if wt[i] <= w:
                dp[w] = max(dp[w], val[i] + dp[w - wt[i]])
    return dp[W]
Note: We iterate w forward (0 to W) so that the same item can be used multiple times (we use updated dp[w - wt[i]]). In 0/1 we iterate w backward to avoid using the same item twice.
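The effect of loop direction can be seen by running the same 1D update in both directions. This sketch uses an item-outer loop for both variants, which differs from the capacity-outer loop used for the unbounded version in this section but is also a common formulation. The name best_value_1d and the reuse flag are assumptions, and weights are assumed to be positive.

```python
def best_value_1d(wt: list, val: list, W: int, reuse: bool) -> int:
    """1D knapsack; reuse=False gives 0/1 behavior, reuse=True gives unbounded."""
    dp = [0] * (W + 1)
    for i in range(len(wt)):
        if reuse:
            # Forward: dp[w - wt[i]] may already include item i.
            capacities = range(wt[i], W + 1)
        else:
            # Backward: dp[w - wt[i]] still excludes item i.
            capacities = range(W, wt[i] - 1, -1)
        for w in capacities:
            dp[w] = max(dp[w], val[i] + dp[w - wt[i]])
    return dp[W]
```

With a single item of weight 2 and value 3 and capacity 6, the backward loop can use the item once, while the forward loop can use it three times.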
0/1 vs Unbounded: Key Difference
| Aspect | 0/1 Knapsack | Unbounded Knapsack |
|---|---|---|
| Item use | At most once | Unlimited |
| State | (i, w) or 1D with reverse w loop | (w) — 1D |
| Loop order (1D) | w from W down to wt[i] | w from 1 to W (forward) |
Time and Space Complexity
- 0/1: Time O(n×W), space O(n×W) full table or O(W) with 1D and reverse loop.
- Unbounded: Time O(n×W), space O(W).
Edge Cases
- W = 0 or n = 0: Answer 0.
- All weights > W: Answer 0.
- Negative weight/value: Standard formulation assumes non-negative; if allowed, problem changes.
Common Mistakes
Practice Problems
- LeetCode 416: Partition Equal Subset Sum (0/1: can we reach sum/2?).
- LeetCode 518: Coin Change 2 (unbounded: number of ways to make amount).
- LeetCode 322: Coin Change (unbounded: minimum number of coins).
Summary
- 0/1 Knapsack: Each item once; state (i, w); transition max(skip, take); 1D with w loop reversed.
- Unbounded Knapsack: Each item unlimited; state (w); transition try each item, dp[w] = max(val[i] + dp[w-wt[i]]); w loop forward.
- Reverse w in 0/1 to avoid using same item twice; forward w in unbounded to allow reuse.
- Time O(n×W); space O(W) with optimization for both.
15.7 LCS
Introduction
The Longest Common Subsequence (LCS) of two strings is the longest sequence of characters that appears in both strings in the same relative order (not necessarily contiguous). For example, LCS of "abcde" and "ace" is "ace" (length 3). LCS is a classic 2D DP: state (i, j) = LCS length of the prefixes A[0..i-1] and B[0..j-1]; transition is "match if equal, else skip one character from either string." It appears in diff tools, bioinformatics, and many string problems.
Formal Definition
State and Transition
State: dp[i][j] = length of LCS of A[0..i-1] and B[0..j-1] (prefixes of length i and j).
Base: dp[0][j] = dp[i][0] = 0 for all i, j (empty prefix has LCS length 0).
Transition:
- If A[i-1] == B[j-1]: we can match this character → dp[i][j] = 1 + dp[i-1][j-1].
- Else: we skip one character, either from A or from B → dp[i][j] = max(dp[i-1][j], dp[i][j-1]).
Answer: dp[len(A)][len(B)].
Python Implementation
def lcs(A: str, B: str) -> int:
    m, n = len(A), len(B)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if A[i - 1] == B[j - 1]:
                dp[i][j] = 1 + dp[i - 1][j - 1]
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]
Line-by-Line Explanation
- dp[i][j] uses 1-based lengths for i, j; indices into A, B are i-1, j-1.
- Match: A[i-1] == B[j-1] → extend LCS by 1, so 1 + dp[i-1][j-1].
- No match: best of "skip A[i-1]" (dp[i-1][j]) or "skip B[j-1]" (dp[i][j-1]).
- Fill order: row by row (or column by column); when we compute dp[i][j], dp[i-1][j-1], dp[i-1][j], dp[i][j-1] are already computed.
Reconstructing One LCS (Backtrack)
To recover one LCS string, backtrack from dp[m][n]: if A[i-1]==B[j-1], include that character and go to (i-1, j-1); else go to (i-1, j) or (i, j-1) depending on which gave the max (or either if equal). Build the string in reverse, then reverse at the end.
def lcs_string(A: str, B: str) -> str:
    m, n = len(A), len(B)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if A[i - 1] == B[j - 1]:
                dp[i][j] = 1 + dp[i - 1][j - 1]
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    # Backtrack
    i, j = m, n
    res = []
    while i and j:
        if A[i - 1] == B[j - 1]:
            res.append(A[i - 1])
            i, j = i - 1, j - 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(res))
Time and Space Complexity
- Time: O(m×n) — two nested loops over m and n.
- Space: O(m×n) for the table. Can be reduced to O(min(m,n)) by keeping only two rows (current and previous), since dp[i][j] only depends on row i-1 and current row.
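The two-row reduction can be sketched as below. Swapping the strings so that the shorter one indexes the columns is what brings the space down to O(min(m, n)); the name lcs_two_rows is illustrative. Note that this version returns only the length, since the full table needed for backtracking is no longer kept.

```python
def lcs_two_rows(A: str, B: str) -> int:
    """LCS length keeping only the previous and current rows."""
    if len(B) > len(A):
        A, B = B, A  # make B the shorter string so each row has min(m, n) + 1 cells
    prev = [0] * (len(B) + 1)  # row i-1 (row 0 is the all-zero base case)
    for i in range(1, len(A) + 1):
        curr = [0] * (len(B) + 1)  # row i; curr[0] = 0 is the base case
        for j in range(1, len(B) + 1):
            if A[i - 1] == B[j - 1]:
                curr[j] = 1 + prev[j - 1]
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[len(B)]
```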
Edge Cases
- Empty string: LCS of "" and anything is ""; length 0. Our base case handles this.
- No common character: LCS length 0; dp stays 0.
- One string is subsequence of the other: LCS length = length of the shorter string.
Common Mistakes
Related Problems
- Longest Common Substring: Contiguous; different recurrence (reset to 0 when mismatch).
- Edit distance (Levenshtein): Similar 2D table; insert/delete/replace with costs.
- Shortest Common Supersequence: Length = len(A) + len(B) - LCS(A,B).
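To show how the "reset to 0 when mismatch" recurrence differs from LCS, here is a sketch of longest common substring. The name longest_common_substring is illustrative; here dp[i][j] is the length of the longest common suffix of A[0..i-1] and B[0..j-1], and the answer is the maximum over all cells rather than the last cell.

```python
def longest_common_substring(A: str, B: str) -> int:
    """Length of the longest contiguous block shared by A and B."""
    m, n = len(A), len(B)
    # dp[i][j] = length of the common suffix of A[:i] and B[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    best = 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if A[i - 1] == B[j - 1]:
                dp[i][j] = 1 + dp[i - 1][j - 1]
                best = max(best, dp[i][j])
            # On a mismatch dp[i][j] stays 0: the contiguous run is broken.
    return best
```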
Practice Problems
- LeetCode 1143: Longest Common Subsequence (length).
- Print one LCS (backtrack above).
- LeetCode 72: Edit Distance (related 2D DP).
Summary
- LCS: longest subsequence common to two strings; state (i, j) = LCS length of prefixes of length i, j.
- Transition: match → 1 + dp[i-1][j-1]; else max(dp[i-1][j], dp[i][j-1]).
- Time O(m×n), space O(m×n) or O(min(m,n)) with two rows.
- Subsequence ≠ substring; backtrack table to recover one LCS string.
15.8 LIS
Introduction
The Longest Increasing Subsequence (LIS) problem: given an array of numbers, find the length of the longest subsequence that is strictly (or non-strictly) increasing. For example, in [10, 9, 2, 5, 3, 7, 101, 18], one LIS is [2, 3, 7, 101] with length 4. LIS can be solved in O(n²) with DP (dp[i] = longest increasing subsequence ending at index i) or in O(n log n) using a "tails" array and binary search. It is a classic DP and appears in many forms (e.g. Russian doll envelopes, building bridges).
Formal Definition
O(n²) DP: State and Transition
State: dp[i] = length of the longest increasing subsequence that ends at index i (and includes nums[i]).
Base: dp[i] = 1 for all i (each element alone is a subsequence of length 1).
Transition: For each i, consider all j < i such that nums[j] < nums[i]. We can extend the LIS ending at j by appending nums[i], so dp[i] = 1 + max{ dp[j] : j < i and nums[j] < nums[i] }. If no such j exists, dp[i] stays 1.
Answer: max(dp[0], dp[1], ..., dp[n-1]).
O(n²) Python Implementation
def lis_n2(nums: list) -> int:
    if not nums:
        return 0
    n = len(nums)
    dp = [1] * n
    for i in range(1, n):
        for j in range(i):
            if nums[j] < nums[i]:
                dp[i] = max(dp[i], 1 + dp[j])
    return max(dp)
O(n log n) Approach: Tails + Binary Search
We maintain an array tails where tails[k] = smallest ending value of an increasing subsequence of length k+1 seen so far. For each new value x, we either extend the longest chain (append to tails) or replace the first element in tails that is ≥ x (so we keep a "better" candidate for that length). Finding that position is a binary search. The length of LIS is the length of tails at the end.
import bisect

def lis_nlogn(nums: list) -> int:
    tails = []
    for x in nums:
        pos = bisect.bisect_left(tails, x)
        if pos == len(tails):
            tails.append(x)
        else:
            tails[pos] = x
    return len(tails)
bisect_left(tails, x) returns the index where x would be inserted to keep tails sorted. If pos == len(tails), all current endings are < x, so we extend. Otherwise we replace tails[pos] with x (same length, better ending value for future).
Comparison: O(n²) vs O(n log n)
| Aspect | O(n²) DP | O(n log n) Tails |
|---|---|---|
| Time | O(n²) | O(n log n) |
| Space | O(n) | O(n) for tails |
| Reconstruct LIS | Easy: backtrack parent pointers from max dp[i] | Tails doesn't store indices; need extra structure to recover |
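The parent-pointer reconstruction mentioned in the table can be sketched on top of the O(n²) DP. The name lis_sequence and the parent array are assumptions for illustration; when several LIS exist, this returns one of them.

```python
def lis_sequence(nums: list) -> list:
    """Return one longest strictly increasing subsequence of nums."""
    if not nums:
        return []
    n = len(nums)
    dp = [1] * n        # dp[i] = LIS length ending at index i
    parent = [-1] * n   # parent[i] = previous index in that subsequence, or -1
    for i in range(1, n):
        for j in range(i):
            if nums[j] < nums[i] and dp[j] + 1 > dp[i]:
                dp[i] = dp[j] + 1
                parent[i] = j
    # Start from an index where dp is largest and walk the parent chain back.
    end = max(range(n), key=lambda idx: dp[idx])
    seq = []
    while end != -1:
        seq.append(nums[end])
        end = parent[end]
    seq.reverse()
    return seq
```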
Time and Space Complexity
- O(n²): Time O(n²), space O(n) for dp array.
- O(n log n): Time O(n log n) (n elements, binary search each), space O(n) for tails (length at most n).
Edge Cases
- Empty array: LIS length 0.
- Single element: LIS length 1.
- Strictly decreasing: LIS length 1 (each element alone).
- Non-strict (≤): If the problem allows equal values, use bisect_right so that x replaces the first element strictly greater than x (instead of the first element ≥ x).
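A sketch of the non-strict variant follows. It differs from the strict tails method only in using bisect.bisect_right; the name longest_non_decreasing is illustrative.

```python
import bisect


def longest_non_decreasing(nums: list) -> int:
    """Length of the longest non-decreasing subsequence (equal values allowed)."""
    tails = []  # tails[k] = smallest ending value of a valid subsequence of length k+1
    for x in nums:
        # bisect_right skips past elements equal to x, so equal values extend the chain.
        pos = bisect.bisect_right(tails, x)
        if pos == len(tails):
            tails.append(x)
        else:
            tails[pos] = x
    return len(tails)
```

For input [1, 2, 2, 3] this counts both 2s, whereas the strict method with bisect_left would count only one of them.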
Common Mistakes
Practice Problems
- LeetCode 300: Longest Increasing Subsequence (length).
- LeetCode 354: Russian Doll Envelopes (sort + LIS on second dimension).
- Number of LIS (count; different DP).
Summary
- LIS: longest increasing (strict or non-strict) subsequence; we usually compute length.
- O(n²) DP: dp[i] = LIS ending at i; dp[i] = 1 + max dp[j] for j < i, nums[j] < nums[i]; answer max(dp).
- O(n log n): Tails array + binary search; length of tails = LIS length.
- Empty/single element edge cases; use bisect_left for strict LIS in the O(n log n) method.
15.9 Coin Change
Introduction
The Coin Change problem: given coins of certain denominations and a target amount, find the minimum number of coins needed to make that amount (each coin can be used unlimited times), or find the number of distinct combinations that make the amount. Both are unbounded DP: state is amount (or amount + some index depending on formulation); we iterate over amounts and try each coin. Coin Change is a direct application of the unbounded knapsack idea—minimize coins (min instead of max) or count ways (sum over choices).
Two Classic Variants
- Minimum number of coins (LeetCode 322): Return the fewest number of coins that sum to amount. If impossible, return -1.
- Number of combinations (LeetCode 518): Return the number of combinations that make amount. Order does not matter (1+2 and 2+1 are the same).
Variant 1: Minimum Number of Coins
State: dp[a] = minimum number of coins needed to make amount a (using coins unlimited times).
Base: dp[0] = 0 (no coins needed for 0). Initialize dp[a] = infinity for a > 0 (or a large number) to represent "not yet reachable."
Transition: For each amount a from 1 to amount, try each coin c: if c <= a, dp[a] = min(dp[a], 1 + dp[a - c]). After the loop, if dp[amount] is still infinity, return -1.
def coin_change_min(coins: list, amount: int) -> int:
    INF = float("inf")
    dp = [INF] * (amount + 1)
    dp[0] = 0
    for a in range(1, amount + 1):
        for c in coins:
            if c <= a:
                dp[a] = min(dp[a], 1 + dp[a - c])
    return dp[amount] if dp[amount] != INF else -1
We iterate a forward so that the same coin can be used multiple times (unbounded).
Variant 2: Number of Combinations
State: dp[a] = number of ways to make amount a (order of coins doesn't matter—combinations).
Base: dp[0] = 1 (one way to make 0: use no coins).
Transition: To avoid counting (1,2) and (2,1) as different, we iterate by coin first, then amount. For each coin c, for each amount a from c to amount: dp[a] += dp[a - c]. This way we build combinations in a fixed order (e.g. use coin 1 first, then coin 2, ...).
def coin_change_ways(coins: list, amount: int) -> int:
    dp = [0] * (amount + 1)
    dp[0] = 1
    for c in coins:
        for a in range(c, amount + 1):
            dp[a] += dp[a - c]
    return dp[amount]
If we iterated amount first then coin, we would count permutations (order matters). For combinations, coin-first is correct.
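For contrast, the amount-outer loop order can be sketched as below; it counts ordered sequences of coins. The name count_permutations is illustrative, and coins are assumed to be positive integers.

```python
def count_permutations(coins: list, amount: int) -> int:
    """Number of ordered coin sequences that sum to amount (order matters)."""
    dp = [0] * (amount + 1)
    dp[0] = 1  # one way to make 0: the empty sequence
    for a in range(1, amount + 1):   # amount outer
        for c in coins:              # coins inner: any coin may come last
            if c <= a:
                dp[a] += dp[a - c]
    return dp[amount]
```

With coins [1, 2] and amount 3, this counts 1+1+1, 1+2, and 2+1 separately, while the coin-outer version counts 1+2 and 2+1 once.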
Key Difference: Loop Order for Combinations
Time and Space Complexity
- Minimum coins: Time O(amount × len(coins)), space O(amount).
- Number of ways: Time O(amount × len(coins)), space O(amount).
Edge Cases
- amount = 0: Min coins = 0; number of ways = 1.
- No solution (min coins): e.g. amount = 3, coins = [2]. Return -1.
- Empty coins: Min coins for amount > 0 is impossible (-1); ways = 0.
Common Mistakes
Practice Problems
- LeetCode 322: Coin Change (minimum number of coins).
- LeetCode 518: Coin Change 2 (number of combinations).
- Number of permutations to make amount (amount loop outer, coins inner).
Summary
- Min coins: dp[a] = min(1 + dp[a-c]) over coins c; dp[0]=0, else init infinity; return -1 if dp[amount] still infinity.
- Number of combinations: dp[0]=1; loop coins outer, amount inner; dp[a] += dp[a-c].
- Loop order matters when counting: coins outer for combinations, amount outer for permutations. For min coins, either order works.
- Time O(amount × |coins|), space O(amount).
15.10 Matrix Chain Multiplication
Introduction
Matrix Chain Multiplication: given a chain of matrices with compatible dimensions, find the order of multiplying them (where to put parentheses) that minimizes the total number of scalar multiplications. Multiplying an (a×b) matrix by a (b×c) matrix costs a×b×c scalar multiplications. The order of multiplication can change the total cost dramatically. This is a classic interval DP or partition DP: we define dp[i][j] = min cost to multiply matrices i through j, and try every split point k between i and j.
Problem Definition
We have n matrices A0, A1, ..., An-1. Matrix Ai has dimensions d[i] × d[i+1]. So we are given an array d of length n+1: d[0], d[1], ..., d[n]. The product A0×A1×...×An-1 is well-defined and results in a d[0]×d[n] matrix. We want the minimum number of scalar multiplications to compute this product (by choosing the order of operations).
Example: Three matrices: 10×20, 20×30, 30×40. (A×B)×C costs 10×20×30 + 10×30×40 = 6000 + 12000 = 18000. A×(B×C) costs 20×30×40 + 10×20×40 = 24000 + 8000 = 32000. So the first order is better.
State and Transition
State: dp[i][j] = minimum number of scalar multiplications to compute the product Ai×Ai+1×...×Aj (i and j 0-indexed, i ≤ j).
Base: dp[i][i] = 0 (single matrix—no multiplication needed).
Transition: For i < j, we must split at some k where i ≤ k < j: multiply Ai...Ak (cost dp[i][k]), multiply Ak+1...Aj (cost dp[k+1][j]), then multiply the two results. The two results have dimensions d[i]×d[k+1] and d[k+1]×d[j+1], so that final multiplication costs d[i] * d[k+1] * d[j+1]. So:
dp[i][j] = min over k in [i, j) of { dp[i][k] + dp[k+1][j] + d[i]*d[k+1]*d[j+1] }
Fill Order
dp[i][j] depends on dp[i][k] and dp[k+1][j] where both segments [i,k] and [k+1,j] are shorter than [i,j]. So we fill by length L = j - i + 1: first L=1 (already base), then L=2, 3, ..., n. So: for L from 2 to n, for i from 0 to n-L, j = i+L-1, then for k from i to j-1 compute the candidate and take min.
Python Implementation
def matrix_chain_order(d: list) -> int:
    n = len(d) - 1  # number of matrices
    if n <= 0:
        return 0
    dp = [[0] * n for _ in range(n)]
    for L in range(2, n + 1):  # chain length
        for i in range(n - L + 1):
            j = i + L - 1
            dp[i][j] = float("inf")
            for k in range(i, j):
                cost = dp[i][k] + dp[k + 1][j] + d[i] * d[k + 1] * d[j + 1]
                dp[i][j] = min(dp[i][j], cost)
    return dp[0][n - 1]
d[i]*d[k+1]*d[j+1]: result of Ai...Ak is d[i]×d[k+1], result of Ak+1...Aj is d[k+1]×d[j+1]; multiplying them costs d[i]*d[k+1]*d[j+1].
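To recover the parenthesization as well as the cost, one option is to record the best split for each segment and backtrack. The sketch below assumes the same dimension array d; the function name matrix_chain_with_parens, the split table, and the "A0 x A1" output format are illustrative choices, not part of the original implementation.

```python
def matrix_chain_with_parens(d):
    """Return (min_cost, parenthesization) for A0..A(n-1), where Ai is d[i] x d[i+1]."""
    n = len(d) - 1
    if n <= 0:
        return 0, ""
    dp = [[0] * n for _ in range(n)]
    split = [[0] * n for _ in range(n)]  # split[i][j] = best k for segment i..j
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            dp[i][j] = float("inf")
            for k in range(i, j):
                cost = dp[i][k] + dp[k + 1][j] + d[i] * d[k + 1] * d[j + 1]
                if cost < dp[i][j]:
                    dp[i][j], split[i][j] = cost, k

    def build(i, j):
        # Rebuild the expression from the recorded split points.
        if i == j:
            return f"A{i}"
        k = split[i][j]
        return f"({build(i, k)} x {build(k + 1, j)})"

    return dp[0][n - 1], build(0, n - 1)
```

On the three-matrix example (10×20, 20×30, 30×40), matrix_chain_with_parens([10, 20, 30, 40]) gives cost 18000 with the grouping ((A0 x A1) x A2), matching the hand calculation.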
Line-by-Line Explanation
- n = len(d) - 1: n matrices, with dimensions d[0]×d[1], d[1]×d[2], ..., d[n-1]×d[n].
- Base: diagonal dp[i][i] = 0 (initialized); we only fill when L ≥ 2.
- L = length of segment (number of matrices); i = start index; j = i+L-1 = end index.
- k is the split: we multiply (Ai...Ak) × (Ak+1...Aj). Cost = dp[i][k] + dp[k+1][j] + cost of final multiply.
Time and Space Complexity
- Time: O(n³). Three nested loops: length L (O(n)), start i (O(n)), split k (O(n)).
- Space: O(n²) for the dp table.
Edge Cases
- n = 0 or 1: Zero or one matrix; return 0 (no multiplication).
- n = 2: Two matrices; only one way to multiply; cost d[0]*d[1]*d[2].
Common Mistakes
Practice Problems
- Classic: Matrix chain multiplication (min scalar multiplications).
- Print optimal parentheses (store split point k in another table and backtrack).
- Variants: burst balloons (similar interval DP), optimal BST.
Summary
- Matrix chain: Given dimensions d[0..n], minimize scalar multiplications for A0×...×An-1.
- State dp[i][j] = min cost for matrices i..j; base dp[i][i]=0; transition try split k: dp[i][k] + dp[k+1][j] + d[i]*d[k+1]*d[j+1].
- Fill by length L = 2..n so dependencies are computed first.
- Time O(n³), space O(n²).
15.11 Partition DP
Introduction
Partition DP (also called interval DP or split DP) is the pattern where the state is a range [i, j] and the transition tries every way to partition that range into subranges (e.g. [i, k] and [k+1, j]), solve the subranges recursively or from smaller intervals, and combine their results with a cost or value that depends on the split. Matrix chain multiplication (15.10) is the canonical example; others include burst balloons, palindrome partitioning, merge stones, and optimal BST. The hallmark is: state = (i, j), transition = try all k, combine.
Formal Definition
We define dp[i][j] as the optimal value (min or max cost, count, etc.) for the subproblem on the interval [i, j] (array segment, string substring, or range of indices). The transition: for each partition point k in [i, j), we split into [i, k] and [k+1, j] (or similar), get dp[i][k] and dp[k+1][j], and combine them with a cost that may depend on i, k, j. We take min or max over all k. Base cases: single-element or empty intervals (e.g. dp[i][i]).
Why This Topic Matters
- Pattern recognition: Many "choose where to split" or "parenthesize" problems are partition DP.
- Fill order: We must compute dp[i][j] only after all shorter intervals are done—typically by increasing length L = j - i + 1.
- Complexity: Often O(n³): n² states, O(n) partition points per state.
Mental Model
For a segment [i, j]:
- Base: if i == j (or i > j), return the base value (e.g. 0 or known cost).
- For each split k from i to j-1: left = [i, k], right = [k+1, j]. Cost = combine(dp[i][k], dp[k+1][j], i, k, j).
- dp[i][j] = min (or max) over all k of that cost.
Common Examples
1. Matrix Chain (already seen)
dp[i][j] = min cost to multiply matrices i..j. Split at k: cost = dp[i][k] + dp[k+1][j] + d[i]*d[k+1]*d[j+1]. Fill by length.
2. Burst Balloons (LeetCode 312)
Given balloons with values nums[i], bursting a balloon earns the product of its value and the values of its two current neighbors (a missing neighbor counts as 1). Maximize total coins. Pad nums with a 1 on each side. State: dp[i][j] = max coins from bursting all balloons strictly between i and j (indices i+1 to j-1), with i and j left standing as boundaries. Split: if k is the last balloon burst in (i, j), its neighbors at that moment are i and j, so dp[i][j] = max over k in (i, j) of (dp[i][k] + dp[k][j] + nums[i]*nums[k]*nums[j]). Base: dp[i][i+1] = 0 (no balloon in between).
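The burst-balloons recurrence can be sketched as follows. This is one possible implementation under the open-interval formulation, with the input padded by a 1 on each side; the name max_coins and the padded array vals are illustrative.

```python
def max_coins(nums):
    """Max coins for LeetCode 312, using dp over open intervals (i, j)."""
    vals = [1] + list(nums) + [1]      # boundary balloons that are never burst
    m = len(vals)
    dp = [[0] * m for _ in range(m)]   # dp[i][j] = best for balloons strictly between i and j
    for gap in range(2, m):            # gap = j - i; gap 1 means nothing in between
        for i in range(m - gap):
            j = i + gap
            for k in range(i + 1, j):  # k = last balloon burst inside (i, j)
                coins = dp[i][k] + dp[k][j] + vals[i] * vals[k] * vals[j]
                dp[i][j] = max(dp[i][j], coins)
    return dp[0][m - 1]
```

Note that the fill order here is by gap j - i rather than by segment length, because the state is an open interval; the idea is the same, since shorter intervals are finished before longer ones.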
3. Palindrome Partitioning (min cuts)
Given a string, partition into palindromic substrings. Find minimum number of cuts. State: dp[i][j] = min cuts for s[i..j] to be partitioned into palindromes; or dp[i] = min cuts for prefix s[0..i-1]. Transition: try each split so that the right part is a palindrome; dp[i] = min(1 + dp[j]) for j where s[j..i-1] is palindrome. (Alternative formulation with 2D interval is also possible.)
4. Merge Stones / Minimum Cost to Merge
Merge adjacent piles two at a time; each merge costs the total size of the merged pile. Minimize the total cost to merge everything into one pile. dp[i][j] = min cost to merge segment [i, j] into one pile. Split: merge [i, k] into one pile and [k+1, j] into one pile, then merge the two; cost = dp[i][k] + dp[k+1][j] + sum(i..j). Fill by length.
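A minimal sketch of the merge recurrence, assuming exactly two adjacent piles are merged at a time (the general LeetCode 1000 problem merges K piles and needs extra care). The name min_merge_cost is illustrative; prefix sums supply sum(i..j) in O(1).

```python
from itertools import accumulate


def min_merge_cost(piles):
    """Min total cost to merge all piles into one, merging two adjacent piles per step."""
    n = len(piles)
    if n <= 1:
        return 0
    prefix = [0] + list(accumulate(piles))   # prefix[t] = piles[0] + ... + piles[t-1]
    dp = [[0] * n for _ in range(n)]         # dp[i][i] = 0: a single pile needs no merge
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            total = prefix[j + 1] - prefix[i]    # cost of the final merge of [i, j]
            dp[i][j] = min(dp[i][k] + dp[k + 1][j] for k in range(i, j)) + total
    return dp[0][n - 1]
```

For piles [3, 2, 4, 1], the best plan merges 3+2 and 4+1 first (cost 5 each) and then the two results (cost 10), for a total of 20.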
Fill Order (General)
dp[i][j] depends on dp[i][k] and dp[k+1][j] for k between i and j-1. Both [i,k] and [k+1,j] have length smaller than [i,j]. So we iterate by length L from 2 to n: for each L, for each start i, j = i + L - 1, then for each k from i to j-1. This guarantees dependencies are ready.
for L in 2..n:
    for i in 0..(n-L):
        j = i + L - 1
        dp[i][j] = +infinity (or -infinity when maximizing)
        for k in i..(j-1):
            dp[i][j] = min(dp[i][j], combine(dp[i][k], dp[k+1][j], i, k, j))
Time and Space Complexity
- Time: Typically O(n³)—three nested loops (length, start, split). Sometimes O(n²) if the split is determined by a simpler rule.
- Space: O(n²) for the 2D dp table.
Common Mistakes
Practice Problems
- Matrix chain multiplication (15.10).
- LeetCode 312: Burst Balloons.
- LeetCode 132: Palindrome Partitioning II (min cuts).
- Merge stones / minimum cost to merge consecutive elements.
Summary
- Partition DP: state = interval [i, j]; transition = try every split k, combine dp[i][k] and dp[k+1][j] with a cost.
- Fill by length L = 2 to n so that shorter intervals are computed first.
- Matrix chain, burst balloons, merge stones, palindrome partitioning are classic examples.
- Typically O(n³) time, O(n²) space.
15.12 DP on Trees
Introduction
DP on Trees means defining subproblems on the tree structure: typically, the state is "at node u" (and possibly an extra dimension like "did we take this node" or "how many nodes chosen"). We compute the answer for a node using the answers for its children—so we process in post-order (children before root). The recurrence is natural: each subtree is a subproblem; we combine results from subtrees. Classic problems include maximum path sum, house robber III (take/skip nodes with no two adjacent), tree diameter, and "best outcome in subtree" with constraints.
Why Trees Work Well for DP
- No cycles: Once the root is fixed, the subtrees of a node's children are disjoint, so their subproblems can be solved independently.
- Natural subproblems: "Subtree rooted at u" is a clear subproblem; state = node (and maybe a small extra state).
- Order: Post-order DFS ensures we visit children before parent, so when we compute for node u, we already have results for all children.
Mental Model
For each node u:
- Recursively get results for all children (e.g. left and right for binary tree).
- Combine children's results with the current node's value (or constraint) to get the result for the subtree rooted at u.
- Return one or more values (e.g. "best path in subtree," "best if we take u," "best if we don't take u").
State and Transition (General)
State: Often (node u) or (node u, flag). For example: dp[u] = best value for subtree at u; or (take[u], skip[u]) = best if we take u / skip u. We don't always store in a table—we can return values from a DFS and use them in the parent.
Transition: Combine children. For binary tree: left_result, right_result = dfs(left), dfs(right); then result_at_u = f(node.val, left_result, right_result). For "take/skip" (house robber): take[u] = val[u] + skip[left] + skip[right]; skip[u] = max(take[left], skip[left]) + max(take[right], skip[right]).
Example: Maximum Path Sum (Binary Tree)
Find the maximum path sum (any path: node-to-node, not necessarily root-to-leaf). For each node we need: (1) the best path that goes through this node and ends at this node (so we can extend to parent)—call it "chain"; (2) the best path entirely inside the subtree (candidate for global max).
def max_path_sum(root):
    best = [float("-inf")]
    def dfs(node):
        if not node:
            return 0
        left = max(0, dfs(node.left))  # ignore negative chains
        right = max(0, dfs(node.right))
        # path through this node: node.val + left + right
        best[0] = max(best[0], node.val + left + right)
        # chain ending at this node (for parent)
        return node.val + max(left, right)
    dfs(root)
    return best[0]
dfs returns the "chain": the maximum sum of a path that starts at the current node and goes down into its subtree. The global best path may be "left chain + node + right chain," so we update best at every node.
Example: House Robber III (Take / Skip)
Tree version: we can't rob two adjacent nodes (parent and child). For each node return (take, skip): take = node.val + skip(left) + skip(right); skip = max(take(left), skip(left)) + max(take(right), skip(right)). Answer at root = max(take(root), skip(root)).
def rob(root):
    def dfs(node):
        if not node:
            return (0, 0)
        left_take, left_skip = dfs(node.left)
        right_take, right_skip = dfs(node.right)
        take = node.val + left_skip + right_skip
        skip = max(left_take, left_skip) + max(right_take, right_skip)
        return (take, skip)
    t, s = dfs(root)
    return max(t, s)
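The same post-order pattern (return one value to the parent, update a global answer at each node) also gives the tree diameter. The sketch below is an illustration under stated assumptions: TreeNode is a hypothetical minimal node class with val, left, and right, and the diameter is measured in edges (add 1 to count nodes instead).

```python
class TreeNode:
    """Hypothetical minimal binary tree node used only for this example."""
    def __init__(self, val=0, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right


def tree_diameter(root):
    """Longest path between any two nodes, measured in edges."""
    best = 0

    def depth(node):
        # Returns the number of nodes on the longest downward chain from node.
        nonlocal best
        if not node:
            return 0
        left = depth(node.left)
        right = depth(node.right)
        best = max(best, left + right)   # edges on the path passing through node
        return 1 + max(left, right)

    depth(root)
    return best
```

As with max path sum, the value returned to the parent (a single chain) differs from the value used to update the answer (two chains joined at the node).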
Time and Space Complexity
- Time: O(n)—we visit each node once and do O(1) or O(children) work per node. Total O(n) for a tree with n nodes.
- Space: O(h) for recursion stack (height h); O(n) if we store a dp table per node.
Edge Cases
- Empty tree (null root): Return 0 or identity value for the problem.
- Single node: Base case; return node value or (val, 0) for take/skip.
- Negative values: In path sum, "ignore negative" by max(0, chain) so we don't drag negative into the parent.
Common Mistakes
Practice Problems
- LeetCode 124: Binary Tree Maximum Path Sum.
- LeetCode 337: House Robber III.
- Tree diameter (longest path between two nodes): for each node, diameter through node = 1 + max depth left + max depth right; return max depth and best diameter.
- Sum of all paths (root to leaf) or path count.
Summary
- DP on Trees: State = subtree rooted at node (and maybe take/skip or other flag); compute after children (post-order).
- Transition = combine children's results with current node; return value(s) for subtree.
- Time O(n), space O(h) or O(n).
- Max path sum: chain + update global with "through node"; house robber III: (take, skip) with take = val + skip(children).
15.13 DP on Graph
Introduction
DP on Graph means defining subproblems on the vertices (and sometimes edges) of a graph and computing them in an order that respects dependencies. On a DAG (directed acyclic graph), we have a natural order—topological sort—so we can compute dp[v] after all predecessors of v. That gives shortest path, longest path, count of paths, and similar problems in one or two passes. On graphs with cycles, "DP" in the classic sense is harder (no fixed order); we use shortest-path algorithms (Bellman-Ford, Dijkstra) or state that includes more information (e.g. bitmask of visited nodes). This topic focuses on DAG-based DP, which is the standard "DP on graph" in interviews.
Real-World Analogy
Imagine tasks with dependencies: task B can only start after task A finishes. The graph has an edge A → B meaning "A before B." To compute the earliest finish time at each task (a kind of DP), we must process tasks in an order where every dependency is done first—that's topological order. DP on a DAG is exactly that: we assign a value to each node using the values of the nodes that point into it, in topo order.
Formal Definition
We define dp[v] as the optimal value for node v (e.g. shortest path from source to v, longest path, or number of paths to v). The transition: dp[v] is computed from dp[u] for all predecessors u of v (u → v), plus the edge cost or weight. We must process nodes in topological order so that when we compute dp[v], all dp[u] for u → v are already computed. If the graph has cycles, there is no topo order; we need other techniques (shortest path algorithms or expanded state).
Why This Topic Matters
- DAGs everywhere: Dependency graphs, project scheduling, compilation order, and many problem constraints form DAGs.
- Single-source shortest path in DAG: Can be solved in O(V + E) with one topo pass—simpler than Dijkstra when the graph is a DAG.
- Interview: "Longest path in a DAG," "number of paths from source to target," "critical path" are common.
Mental Model
For each node v in topological order:
- Initialize dp[v] (e.g. 0 for source, infinity for others in shortest path).
- For each edge (u, v) with weight w: update dp[v] from dp[u] (e.g. dp[v] = min(dp[v], dp[u] + w) for shortest path).
- After the pass, dp[v] is the answer for node v (e.g. shortest distance from source to v).
Prerequisite: Topological Sort
We need nodes in an order such that for every edge (u, v), u appears before v. Algorithms: DFS (push to stack when leaving) or Kahn's (in-degree zero queue). Without topo order we cannot guarantee that dp[u] is ready when we compute dp[v].
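Kahn's algorithm can be sketched as a small helper that also reports cycles. The name topo_order and the choice to return None for a cyclic graph are assumptions made for this example; the graph is an adjacency list where graph[u] lists the successors of u.

```python
from collections import deque


def topo_order(graph):
    """Return a topological order of nodes 0..n-1, or None if the graph has a cycle."""
    n = len(graph)
    indeg = [0] * n
    for u in range(n):
        for v in graph[u]:
            indeg[v] += 1
    queue = deque(u for u in range(n) if indeg[u] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in graph[u]:
            indeg[v] -= 1
            if indeg[v] == 0:      # all predecessors of v are already placed
                queue.append(v)
    # Nodes on a cycle never reach in-degree 0, so they are missing from order.
    return order if len(order) == n else None
```

Checking len(order) == n is a cheap way to detect a cycle before running any DP that relies on the order.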
Example 1: Shortest Path in DAG (Single Source)
Problem: Given a DAG with edge weights and a source s, find the shortest distance from s to every vertex. Negative weights are allowed (unlike Dijkstra).
State: dist[v] = shortest distance from s to v.
Base: dist[s] = 0; dist[v] = ∞ for v ≠ s.
Transition: For each node u in topo order, for each neighbor v of u: dist[v] = min(dist[v], dist[u] + weight(u,v)). This is "relaxation" in topo order.
def shortest_path_dag(graph, weights, source, n):
    # graph: list of lists, graph[u] = list of neighbors v
    # weights: dict mapping (u, v) -> w; a missing edge defaults to weight 1
    from collections import deque
    # Kahn's algorithm: compute a topological order
    indeg = [0] * n
    for u in range(n):
        for v in graph[u]:
            indeg[v] += 1
    topo = []
    q = deque([i for i in range(n) if indeg[i] == 0])
    while q:
        u = q.popleft()
        topo.append(u)
        for v in graph[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                q.append(v)
    # Relax edges in topological order
    INF = float("inf")
    dist = [INF] * n
    dist[source] = 0
    for u in topo:
        if dist[u] == INF:
            continue  # u is unreachable from source
        for v in graph[u]:
            w = weights.get((u, v), 1)
            dist[v] = min(dist[v], dist[u] + w)
    return dist
Example 2: Longest Path in DAG
Same idea: for longest path from s to v, initialize dist[v] = -∞ and use dist[v] = max(dist[v], dist[u] + w). Or negate weights and run shortest path. Application: critical path in project scheduling.
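A minimal sketch of the longest-path variant, assuming the graph is a DAG given as an edge list of (u, v, w) triples. That input format and the name longest_path_dag are choices made for this example, not the signature used elsewhere in this section.

```python
from collections import deque


def longest_path_dag(n, edges, source):
    """Return dist, where dist[v] is the max weight of a source->v path (-inf if unreachable)."""
    graph = [[] for _ in range(n)]
    indeg = [0] * n
    for u, v, w in edges:
        graph[u].append((v, w))
        indeg[v] += 1
    # Kahn's algorithm for the topological order.
    queue = deque(u for u in range(n) if indeg[u] == 0)
    topo = []
    while queue:
        u = queue.popleft()
        topo.append(u)
        for v, _ in graph[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    NEG = float("-inf")
    dist = [NEG] * n
    dist[source] = 0
    for u in topo:
        if dist[u] == NEG:
            continue               # unreachable nodes must not relax anything
        for v, w in graph[u]:
            dist[v] = max(dist[v], dist[u] + w)
    return dist
```

The only changes from the shortest-path version are the initial value (-∞ instead of ∞) and max in place of min.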
Example 3: Number of Paths from Source to Target
State: paths[v] = number of paths from source to v.
Base: paths[source] = 1; paths[v] = 0 for others initially.
Transition: For each u in topo order, for each neighbor v of u: paths[v] += paths[u] (add paths that come through u).
def count_paths_dag(graph, source, target, n):
    from collections import deque
    indeg = [0] * n
    for u in range(n):
        for v in graph[u]:
            indeg[v] += 1
    topo = []
    q = deque([i for i in range(n) if indeg[i] == 0])
    while q:
        u = q.popleft()
        topo.append(u)
        for v in graph[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                q.append(v)
    paths = [0] * n
    paths[source] = 1
    for u in topo:
        for v in graph[u]:
            paths[v] += paths[u]
    return paths[target]
Graphs with Cycles
If the graph has cycles, there is no topological order. Options:
- Shortest path: Use Bellman-Ford (handles negative weights) or Dijkstra (non-negative weights only).
- State expansion: e.g. dp[v][mask] = best way to reach v having visited set mask (Hamiltonian path style); then we can iterate in mask order.
- Memoization: DFS with a cache, dp(v) = f(dp of v's neighbors). This only works when the dependencies are acyclic, e.g. the shortest path from v to a target in a DAG: dp(v) needs dp of v's successors first, which recursion with memo (or a reverse topological order) provides.
Time and Space Complexity
- Topological sort: O(V + E).
- One pass over nodes and edges: O(V + E). So total O(V + E) for shortest/longest path or count paths in a DAG.
- Space: O(V) for dist/paths and topo list.
Edge Cases
- Source not in topo order first: Ensure we only relax from nodes that are reachable from source (check dist[u] != INF before relaxing, or run BFS/DFS from source first and only consider those nodes).
- Multiple components: Nodes unreachable from source stay at ∞ or 0 paths; that's correct.
- Cycle in graph: Topo sort fails (we won't get all nodes, or Kahn's will leave some with positive in-degree). Detect and handle (e.g. report "no order" or use a different algorithm).
Common Mistakes
Practice Problems
- Shortest path in DAG (single source, possibly negative weights).
- Longest path in DAG (critical path).
- Number of paths from source to target in DAG.
- LeetCode 329: Longest Increasing Path in a Matrix (build DAG from grid, then longest path).
Summary
- DP on Graph (DAG): State = value at each node (e.g. dist, path count); transition = combine from all predecessors; order = topological sort.
- Shortest path: dist[v] = min(dist[v], dist[u] + w) in topo order; longest path: use max; count paths: paths[v] += paths[u].
- Time O(V + E), space O(V). With cycles, use other algorithms or expanded state.
- Always process in topo order; only relax from reachable nodes (dist[u] finite) when applicable.
15.14 Bitmask DP
Introduction
Bitmask DP uses an integer (a bitmask) to represent a subset of a set of n elements: bit i is 1 if element i is in the subset, 0 otherwise. So we have 2ⁿ possible subsets encoded as 0 to 2ⁿ − 1. The state is often (mask, ...) where mask is this integer, plus optional dimensions (e.g. "last visited node"). This lets us solve "visit each city exactly once" (TSP), "assign n tasks to n people with cost," and other subset-based optimization in O(2ⁿ × poly(n)) time. The key is: state = subset (mask) + optional context; transition = try adding or removing one element.
Real-World Analogy
Imagine a checklist of n places. Instead of storing a list of "visited" or "not visited," you carry a single number: the binary digits tell you which places are checked (1) and which are not (0). For example, 1011 (binary) = 11 means places 0, 1, and 3 are visited. When you visit one more place, you flip one bit. Bitmask DP is that idea: the mask is the checklist, and we build up from smaller masks (fewer places visited) to the full mask (all visited).
Formal Definition
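We represent a subset of {0, 1, ..., n−1} as an integer mask whose bit i is 1 exactly when element i is in the subset. The basic operations are: add i → mask | (1 << i); remove i → mask & ~(1 << i); check i → (mask >> i) & 1 or mask & (1 << i); iterate over the elements of mask → loop over i and keep those whose bit is 1. In bitmask DP, the state includes a mask (and often another parameter like "current position"); we iterate over masks and/or over elements to add or remove.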
Why This Topic Matters
- Exponential but feasible: 2ⁿ is manageable for n up to ~20; bitmask DP is the standard for "subset of items" optimization.
- TSP and assignment: Traveling salesman (visit all, min cost), job assignment (min cost to assign n jobs to n workers), and many "choose one per slot" problems.
- Interview: Less common than linear/2D DP, but appears in harder rounds; knowing the pattern is a plus.
Bit Operations Cheat Sheet
- 1 << i: bit i set (2ⁱ).
- mask | (1 << i): add i to subset.
- mask & ~(1 << i): remove i from subset.
- (mask >> i) & 1 or bool(mask & (1 << i)): is i in subset?
- mask.bit_count() (Python 3.10+) or bin(mask).count("1"): size of subset.
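The operations above can be tried on the checklist example 1011 (binary). The helper name elements and the variable names are illustrative.

```python
def elements(mask):
    """Return the indices of the set bits of mask, in increasing order."""
    return [i for i in range(mask.bit_length()) if (mask >> i) & 1]


mask = 0b1011                    # subset {0, 1, 3}
added = mask | (1 << 2)          # add element 2 -> {0, 1, 2, 3}
removed = mask & ~(1 << 0)       # remove element 0 -> {1, 3}
has_one = bool(mask & (1 << 1))  # is element 1 in the subset?
size = bin(mask).count("1")      # subset size; works on any Python 3 version
```

Adding an element that is already present, or removing one that is absent, leaves the mask unchanged, so neither operation needs a membership check first.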
Mental Model
State = (mask, optional extra). For "visit all and end somewhere": dp[mask][v] = best cost to have visited exactly the set in mask and be at v. Transition: we got to v from some u in mask; so dp[mask][v] = min over u in mask, u ≠ v of (dp[mask without v][u] + cost(u, v)). Base: dp[1<<s][s] = 0 (start at s, only s visited).
Example: Traveling Salesman (TSP)
Problem: n cities, distance/cost matrix. Start at city 0, visit every city exactly once, return to 0. Minimize total cost.
State: dp[mask][v] = minimum cost to have visited all cities in mask and be at city v (v must be in mask). We will finally return to 0, so answer = min over v of (dp[full_mask][v] + cost[v][0]).
Base: dp[1 << 0][0] = 0 (start at 0, only 0 visited).
Transition: For each mask and each v in mask, we could have come from some u in mask with u ≠ v. So dp[mask][v] = min over u in mask, u ≠ v of (dp[mask without v][u] + cost[u][v]). Entries with v = 0 and more than one city in mask stay at infinity, because every tour starts at 0 and returns to it only at the end. Iterate masks in increasing order (or by size); for each mask, for each v in mask, for each u in mask with u ≠ v, relax.
Python: TSP (Bitmask DP)
def tsp(n, cost):
    # cost[i][j] = cost from i to j; n cities 0..n-1
    if n == 1:
        return 0  # single city: no travel needed
    INF = float("inf")
    full = (1 << n) - 1
    dp = [[INF] * n for _ in range(1 << n)]
    dp[1][0] = 0  # mask with only 0, at city 0
    for mask in range(1 << n):
        for v in range(n):
            if not (mask & (1 << v)):
                continue
            prev = mask & ~(1 << v)
            for u in range(n):
                if not (prev & (1 << u)):
                    continue
                if dp[prev][u] != INF:
                    dp[mask][v] = min(dp[mask][v], dp[prev][u] + cost[u][v])
    ans = INF
    for v in range(1, n):
        if dp[full][v] != INF:
            ans = min(ans, dp[full][v] + cost[v][0])
    return ans
Note: For each v in mask we read dp[prev][u], where prev = mask without v and u ranges over the cities in prev. Clearing a bit always gives a smaller integer, so prev < mask, and iterating mask from 0 to 2ⁿ − 1 guarantees that dp[prev][u] is final before it is read.
Line-by-Line Explanation (TSP)
- dp[mask][v]: min cost to visit all cities in mask and end at v.
- Base: dp[1][0] = 0 (only city 0 visited, we're at 0).
- For each mask and v in mask: prev = mask without v. We came from some u in prev; so dp[mask][v] = min( dp[prev][u] + cost[u][v] ) over u in prev.
- Answer: after visiting all (mask = full), go back to 0: min over v of dp[full][v] + cost[v][0].
Fix: Indentation in TSP Loop
The inner loop over u must be inside the "v in mask" block, and we must compute prev and use it correctly. Correct structure:
for mask in range(1 << n):
    for v in range(n):
        if not (mask & (1 << v)):
            continue
        prev = mask & ~(1 << v)
        for u in range(n):
            if not (prev & (1 << u)):
                continue
            if dp[prev][u] != INF:
                dp[mask][v] = min(dp[mask][v], dp[prev][u] + cost[u][v])
Time and Space Complexity
- Time: O(2ⁿ × n²): for each of 2ⁿ masks and each v (n), we iterate over u (n).
- Space: O(2ⁿ × n) for the dp table.
Edge Cases
- n = 1: Only one city; no travel; return 0 (or cost[0][0] if self-loop).
- Disconnected: If some cost[i][j] is infinite, ensure we don't use it; INF check in transition.
Common Mistakes
Practice Problems
- TSP (visit all, return to start).
- Assignment: n tasks to n people, cost[i][j]; min total cost (same structure: dp[mask] = min cost to assign the tasks in mask to the first popcount(mask) people; transition: give the next person each task not yet in mask).
- LeetCode 847: Shortest Path Visiting All Nodes (graph; state (mask, node)).
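For the assignment problem in the list above, a one-dimensional mask DP is enough, because the number of set bits already says which person is next. The sketch below assumes a square matrix with cost[p][t] = cost of giving task t to person p; the name min_assignment_cost is illustrative.

```python
def min_assignment_cost(cost):
    """Min total cost to give each of n people exactly one of n tasks."""
    n = len(cost)
    INF = float("inf")
    # dp[mask] = min cost when the tasks in mask went to persons 0..popcount(mask)-1
    dp = [INF] * (1 << n)
    dp[0] = 0
    for mask in range(1 << n):
        if dp[mask] == INF:
            continue
        person = bin(mask).count("1")      # next person to receive a task
        if person == n:
            continue                       # everyone already has a task
        for task in range(n):
            if not mask & (1 << task):     # task is still free
                new_mask = mask | (1 << task)
                dp[new_mask] = min(dp[new_mask], dp[mask] + cost[person][task])
    return dp[(1 << n) - 1]
```

This runs in O(2ⁿ × n) time and O(2ⁿ) space, one factor of n less than TSP because no "current city" dimension is needed.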
Summary
- Bitmask DP: State includes a mask (subset) encoded as an integer; use bit operations to add/remove/check elements.
- TSP: dp[mask][v] = min cost to visit mask and end at v; transition from dp[mask\v][u] + cost[u][v]; iterate masks in order so dependencies are ready.
- Time typically O(2ⁿ × poly(n)); space O(2ⁿ × ...).
- Check bit ops (1<<i, |, &, ~) and iteration order.
15.15 Digit DP
Introduction
Digit DP (digit dynamic programming) solves problems about numbers in a range that satisfy a property on their digits. Examples: count numbers in [L, R] with digit sum equal to S, with no digit 4, or with digits in non-decreasing order. We build the number digit by digit (from most significant). The key state is (position, tight, ...): tight means the prefix we've chosen equals the prefix of the upper bound, so we cannot pick a digit larger than the bound's digit at the current position. We use memoization over (pos, tight, optional params) and iterate over valid digits at each step.
Problem Type
Typically: count integers in [0, N] (or [L, R] as count(R) − count(L−1)) that satisfy a digit constraint. We convert N to a list of digits (e.g. "1234" → [1,2,3,4]) and process from left to right.
State and Transition
State: (pos, tight, ...). pos = current position (0 = most significant). tight = True if the prefix built so far equals the prefix of the bound (so we are "tight" and the next digit cannot exceed bound[pos]). Extra state depends on the problem: digit sum, "has digit 4," "last digit," etc.
Base: When pos == len(digits), we have built a full number; return 1 if it satisfies the property, else 0.
Transition: At pos, try each digit d from 0 to (bound[pos] if tight else 9). New_tight = tight and (d == bound[pos]). Recurse to (pos+1, new_tight, updated_extra_state). Sum (for count) or combine results.
Example: Count Numbers in [0, N] With No Digit 4
State: (pos, tight). If we've built a prefix without 4 and we're at pos, try d in 0..9 (or 0..bound[pos] if tight); skip d==4. Base: pos == len → return 1.
def count_no_four(n: int) -> int:
    s = list(map(int, str(n)))
    def dfs(pos: int, tight: bool) -> int:
        if pos == len(s):
            return 1
        up = s[pos] if tight else 9
        res = 0
        for d in range(0, up + 1):
            if d == 4:
                continue
            res += dfs(pos + 1, tight and (d == s[pos]))
        return res
    return dfs(0, True)
The version above has no cache, so the same (pos, tight) state is recomputed many times. With memoization on (pos, tight), each state is computed once and reused.
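A memoized variant might look like the following sketch. It uses functools.lru_cache from the standard library; the name count_no_four_memo is chosen here to distinguish it from the function above.

```python
from functools import lru_cache


def count_no_four_memo(n: int) -> int:
    """Count integers in [0, n] whose decimal digits contain no 4."""
    digits = tuple(map(int, str(n)))

    @lru_cache(maxsize=None)
    def dfs(pos: int, tight: bool) -> int:
        if pos == len(digits):
            return 1
        up = digits[pos] if tight else 9
        total = 0
        for d in range(up + 1):
            if d != 4:
                total += dfs(pos + 1, tight and d == digits[pos])
        return total

    return dfs(0, True)
```

A quick sanity check: for n = 999 every digit position has 9 allowed digits, so the count should be 9³ = 729.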
Example: Count Numbers With Digit Sum = Target
State: (pos, tight, sum_so_far). Try digit d; new_sum = sum_so_far + d; recurse (pos+1, new_tight, new_sum). Base: pos == len and sum_so_far == target → 1, else 0. Memo on (pos, tight, sum_so_far).
def count_digit_sum(n: int, target: int) -> int:
    s = list(map(int, str(n)))
    from functools import lru_cache
    @lru_cache(maxsize=None)
    def dfs(pos: int, tight: bool, sum_so_far: int) -> int:
        if pos == len(s):
            return 1 if sum_so_far == target else 0
        if sum_so_far > target:
            return 0
        up = s[pos] if tight else 9
        res = 0
        for d in range(0, up + 1):
            res += dfs(pos + 1, tight and (d == s[pos]), sum_so_far + d)
        return res
    return dfs(0, True, 0)
Range [L, R]
Count in [L, R] = count(R) − count(L−1), where count(N) counts the qualifying integers in [0, N]. When L = 0, count(L−1) = count(−1) must be treated as 0, since there are no non-negative integers below 0.
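One way to package this, sketched for the digit-sum property: a count_upto helper that returns 0 for negative bounds, and a thin wrapper that subtracts. Both names are illustrative, and the wrapper assumes 0 <= lo <= hi.

```python
from functools import lru_cache


def count_upto(n: int, target: int) -> int:
    """Count integers in [0, n] with digit sum == target; returns 0 when n < 0."""
    if n < 0:
        return 0
    digits = tuple(map(int, str(n)))

    @lru_cache(maxsize=None)
    def dfs(pos: int, tight: bool, total: int) -> int:
        if total > target:
            return 0                       # prune: digits only increase the sum
        if pos == len(digits):
            return 1 if total == target else 0
        up = digits[pos] if tight else 9
        return sum(dfs(pos + 1, tight and d == digits[pos], total + d)
                   for d in range(up + 1))

    return dfs(0, True, 0)


def count_in_range(lo: int, hi: int, target: int) -> int:
    """Count integers in [lo, hi] with digit sum == target (assumes 0 <= lo <= hi)."""
    return count_upto(hi, target) - count_upto(lo - 1, target)
```

For example, count_in_range(10, 20, 2) counts 11 and 20, while count_in_range(0, 20, 2) also includes 2.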
Time and Space Complexity
- Time: O(digits × 2 × (domain of extra state)) with memo. For "no 4," states are (pos, tight) — O(len(N)). For digit sum, extra dimension is sum (bounded by 9×len(N)). So typically O(poly(log N)) states.
- Space: Same as state space + recursion depth O(len(N)).
Edge Cases
- N = 0: Digits = [0]; count is 0 or 1 depending on whether 0 is allowed.
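- Leading zeros: Padding with leading zeros lets one pass of length len(N) also cover shorter numbers (e.g. 007 stands for 7). This is harmless when zeros do not affect the property (digit sum, "no digit 4"), but when they do (e.g. "no digit 0" or "distinct digits"), we need an extra "started" or "leading zero" flag.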
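A sketch of the "started" flag on a property where leading zeros do matter: counting numbers in [1, n] with no digit 0. Without the flag, the padding zeros of shorter numbers would be mistaken for real zero digits. The function name and the choice of property are illustrative.

```python
from functools import lru_cache


def count_no_zero_digit(n: int) -> int:
    """Count integers in [1, n] whose decimal representation contains no digit 0."""
    if n < 1:
        return 0
    digits = tuple(map(int, str(n)))

    @lru_cache(maxsize=None)
    def dfs(pos: int, tight: bool, started: bool) -> int:
        if pos == len(digits):
            return 1 if started else 0     # all padding means the number 0: excluded
        up = digits[pos] if tight else 9
        total = 0
        for d in range(up + 1):
            if d == 0 and started:
                continue                   # a real zero digit inside the number
            # d == 0 with started False is just padding and stays allowed
            total += dfs(pos + 1, tight and d == digits[pos], started or d != 0)
        return total

    return dfs(0, True, False)
```

For n = 100 this gives 90: the 9 one-digit numbers plus the 81 two-digit numbers built from digits 1 to 9.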
Common Mistakes
Practice Problems
- Count numbers in [1, N] with no digit 4 (or 9).
- Count numbers in [L, R] with digit sum = S.
- Sum of digits of all numbers in [L, R] (extra state: sum so far; at the end add sum to global or return (count, sum)).
Summary
- Digit DP: Count (or optimize) numbers in a range satisfying a digit property; build digit by digit from MSB.
- State: (pos, tight, ...); tight = prefix equals bound prefix; try d from 0 to bound[pos] (if tight) or 9.
- Memoize on (pos, tight, extra); [L,R] = count(R) − count(L−1).
- Time O(poly(log N)) with memo; watch tight update and leading zeros.
15.16 Divide & Conquer DP
Introduction
Divide & Conquer DP, in the sense used in this section, is the optimal-split optimization (closely related to Knuth-Yao) for certain partition DP recurrences in which the optimal split point k is monotone: if opt[i][j] is the best k for segment [i, j], then opt[i][j-1] ≤ opt[i][j] ≤ opt[i+1][j]. That lets us search only a narrow range of k instead of trying all k in [i, j), reducing time from O(n³) to O(n²) for problems like optimal BST, matrix chain (with certain cost functions), and some "merge" problems. The cost function must satisfy the quadrangle inequality (and sometimes monotonicity) for the optimization to be valid. Note that elsewhere the name "divide and conquer optimization" usually refers to a different technique for layered recurrences of the form dp[t][j] = min over k of (dp[t-1][k] + cost(k, j)); the range restriction described here is the one that 15.17 calls Knuth optimization.
When It Applies
Recurrence of the form:
dp[i][j] = min over k in [i, j) of { dp[i][k] + dp[k+1][j] + cost(i, j, k) }
Often cost(i, j, k) is just w(i, j) (doesn't depend on k)—e.g. sum of frequencies in [i, j] for optimal BST. For the optimization to hold we need:
- Quadrangle inequality (QI): w(a, c) + w(b, d) ≤ w(a, d) + w(b, c) for all a ≤ b ≤ c ≤ d, so that the best k doesn't "jump" arbitrarily.
- Monotonicity of opt: opt[i][j-1] ≤ opt[i][j] ≤ opt[i+1][j], so when we fill by length L, we know opt[i][j] lies between opt[i][j-1] and opt[i+1][j].
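Before relying on the restricted range, it can help to test a candidate cost function on small inputs. The sketch below brute-forces the quadrangle inequality w(a, c) + w(b, d) ≤ w(a, d) + w(b, c) for a ≤ b ≤ c ≤ d, together with monotonicity on nested intervals. It is an O(n⁴) check meant only for experiments; the function names and the sample freq array are illustrative.

```python
from itertools import accumulate


def satisfies_knuth_conditions(w, n):
    """Brute-force check of QI and monotonicity for w(i, j) on 0 <= i <= j < n."""
    for a in range(n):
        for b in range(a, n):
            for c in range(b, n):
                for d in range(c, n):
                    # Quadrangle inequality.
                    if w(a, c) + w(b, d) > w(a, d) + w(b, c):
                        return False
                    # Monotonicity: inner interval [b, c] costs no more than [a, d].
                    if w(b, c) > w(a, d):
                        return False
    return True


freq = [3, 1, 4, 1, 5]
prefix = [0] + list(accumulate(freq))


def interval_sum(i, j):
    """Sum of freq[i..j], the weight used in the optimal BST recurrence."""
    return prefix[j + 1] - prefix[i]
```

Interval sums of non-negative values pass the check (QI holds with equality), whereas a weight such as -(j - i)² fails it.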
Idea: Restrict k Range
Instead of looping k from i to j-1, we loop k only from opt[i][j-1] to opt[i+1][j] (with bounds i and j-1). For the first row or when opt is not yet computed, use i and j-1. When we fill by increasing length L = j−i+1, opt[i][j-1] and opt[i+1][j] are already known from previous lengths.
Example: Optimal BST (Sketch)
Given keys and frequencies, build a BST that minimizes total search cost (sum of depth×frequency). dp[i][j] = min cost for keys i..j; root is some k in [i, j]; cost = dp[i][k-1] + dp[k+1][j] + sum(freq[i..j]). The sum doesn't depend on k. With QI, opt[i][j-1] ≤ opt[i][j] ≤ opt[i+1][j]. So we iterate k only in that range.
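# Fill by length L; opt[i][j] = best root for keys i..j
def optimal_bst_cost(freq):
    n = len(freq)
    if n == 0:
        return 0
    prefix = [0] * (n + 1)  # prefix sums give sum(freq[i..j]) in O(1)
    for i, f in enumerate(freq):
        prefix[i + 1] = prefix[i] + f
    dp = [[0] * n for _ in range(n)]
    opt = [[0] * n for _ in range(n)]
    for L in range(1, n + 1):
        for i in range(n - L + 1):
            j = i + L - 1
            lo = opt[i][j - 1] if i < j else i  # single key: the root must be i
            hi = opt[i + 1][j] if i < j else j
            dp[i][j] = float("inf")
            for k in range(lo, hi + 1):
                cost = (dp[i][k-1] if k > i else 0) + (dp[k+1][j] if k < j else 0) + prefix[j + 1] - prefix[i]
                if cost < dp[i][j]:
                    dp[i][j], opt[i][j] = cost, k
    return dp[0][n - 1]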
Time and Space Complexity
- Time: O(n²). For a fixed length L, the ranges [opt[i][j-1], opt[i+1][j]] for consecutive i overlap only at their endpoints, so their total size telescopes to O(n); summing over the n lengths gives O(n²).
- Space: O(n²) for dp and opt tables.
Common Mistakes
Practice Problems
- Optimal BST (minimize expected search cost).
- Matrix chain with certain cost structures.
- Merge stones / minimum cost to merge (when the cost satisfies QI).
Summary
- Divide & Conquer DP: Optimize partition DP by restricting the split k using monotonicity of the optimal split point.
- Requires quadrangle inequality (and often monotonicity); then opt[i][j-1] ≤ opt[i][j] ≤ opt[i+1][j].
- Loop k only in [opt[i][j-1], opt[i+1][j]]; fill by length so these opt values are known.
- Time O(n²), space O(n²).
15.17 Knuth Optimization
Introduction
Knuth Optimization (also called Knuth-Yao or the opt table method) is a technique to speed up certain partition DP recurrences from O(n³) to O(n²). It applies when the cost function in the recurrence satisfies Knuth's conditions (a form of quadrangle inequality and monotonicity), which imply that the optimal split point k for segment [i, j] lies between the optimal split for [i, j−1] and for [i+1, j]. We maintain an opt[i][j] table and only try k in that range instead of all k in [i, j), reducing the inner loop from O(n) to O(1) amortized. It is widely used for optimal BST, matrix chain (with appropriate cost), and similar problems.
Real-World Analogy
Imagine cutting a rod at different positions to minimize cost. If "the best cut for a longer piece is never to the left of the best cut for a shorter piece starting at the same place," then when we solve for a longer segment we only need to look near where we cut the slightly shorter segments. Knuth optimization formalizes this: the best split doesn't jump around; it moves monotonically, so we narrow the search.
Formal Definition
Consider a recurrence of the form dp[i][j] = min_{i ≤ k < j} { dp[i][k] + dp[k+1][j] + w(i, j) } for i < j, with base dp[i][i] = 0. Let opt[i][j] be the value of k that achieves this minimum (smallest k if tie). Knuth's conditions on the weight w(i, j) are:
- (1) Quadrangle inequality (QI): w(i, j) + w(i′, j′) ≤ w(i, j′) + w(i′, j) for i ≤ i′ ≤ j ≤ j′.
- (2) Monotonicity: w(i, j) ≤ w(i, j+1) and w(i, j) ≤ w(i−1, j).
Under these, we have opt[i][j−1] ≤ opt[i][j] ≤ opt[i+1][j], so we can restrict the k loop.
Why This Topic Matters
- Speedup: O(n³) → O(n²) for a class of partition DP problems; critical when n is large.
- Optimal BST and variants: Standard solution uses Knuth optimization (or the same idea).
- Competitive programming: Common in advanced DP problems; knowing when and how to apply it is valuable.
Mental Model
When filling dp[i][j] for a segment [i, j]:
- We know opt[i][j−1] and opt[i+1][j] from previous iterations (shorter segments).
- So the best k for [i, j] lies in [opt[i][j−1], opt[i+1][j]] (clamped to [i, j−1]).
- Loop only over k in that range; update dp[i][j] and set opt[i][j] to the best k.
Knuth's Conditions in Practice
For optimal BST: w(i, j) = sum of frequencies from i to j (prefix sum). This satisfies QI and monotonicity. For matrix chain with cost = d[i]*d[k+1]*d[j+1], the cost depends on k, so the standard matrix chain recurrence is not of the form w(i,j) only; a different but related formulation (e.g. with cumulative cost) can satisfy the conditions. Many "concave" or "convex" cost functions in partition DP satisfy them.
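Before relying on the restricted k range, it can help to test a candidate weight function on small inputs. The sketch below is a brute-force check of the two conditions stated above; the function name `satisfies_knuth_conditions` and the callable interface `w(i, j)` are assumptions made for illustration, not part of any library. It runs in O(n⁴), so it is only meant for small n.

```python
def satisfies_knuth_conditions(w, n):
    """Brute-force check of QI and monotonicity for w(i, j), 0 <= i <= j < n.

    Hypothetical helper for experimentation; O(n^4), use only on small n.
    """
    for a in range(n):
        for b in range(a, n):
            for c in range(b, n):
                for d in range(c, n):
                    # Quadrangle inequality: w(a,c) + w(b,d) <= w(a,d) + w(b,c)
                    if w(a, c) + w(b, d) > w(a, d) + w(b, c):
                        return False
                    # Monotonicity: the inner interval never costs more
                    if w(b, c) > w(a, d):
                        return False
    return True


# Interval sums of non-negative frequencies (the optimal BST weight).
freq = [4, 1, 7, 3]
pref = [0]
for f in freq:
    pref.append(pref[-1] + f)


def interval_sum(i, j):
    return pref[j + 1] - pref[i]


print(satisfies_knuth_conditions(interval_sum, len(freq)))
print(satisfies_knuth_conditions(lambda i, j: -(j - i) ** 2, 4))
```

Passing this check on small cases does not prove the conditions hold in general, but a failure is a clear signal not to apply the optimization.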
Step-by-Step: Optimal BST with Knuth
- Input: Keys 0..n−1 (or 1..n), frequency f[i] for key i. We want a BST minimizing sum of (depth of key i × f[i]).
- Precompute: Prefix sum of frequencies so w(i, j) = sum(f[i..j]) in O(1).
- Base: dp[i][i] = f[i] (a single key is the root at depth 1, which is what the recurrence gives when both empty subtrees cost 0); opt[i][i] = i.
- Fill by length L: For L from 2 to n, for i from 0 to n−L, j = i+L−1. Set lo = opt[i][j−1], hi = opt[i+1][j] (or i and j−1 for first row/column). For k from lo to hi: cost = dp[i][k−1] + dp[k+1][j] + w(i,j). If cost < dp[i][j], update dp[i][j] and opt[i][j] = k.
Python Implementation (Optimal BST)
def optimal_bst(freq):
    n = len(freq)
    if n == 0:
        return 0  # empty tree has zero cost
    # prefix sum: w(i,j) = pref[j+1] - pref[i]
    pref = [0]
    for f in freq:
        pref.append(pref[-1] + f)
    # dp[i][j] = min cost for keys i..j, with the root counted at depth 1
    dp = [[0] * n for _ in range(n)]
    opt = [[i] * n for i in range(n)]
    for i in range(n):
        dp[i][i] = freq[i]  # single key: it is the root
    for L in range(2, n + 1):
        for i in range(0, n - L + 1):
            j = i + L - 1
            dp[i][j] = float("inf")
            # L >= 2, so both shorter segments exist and are already filled
            lo = opt[i][j - 1]
            hi = opt[i + 1][j]
            w = pref[j + 1] - pref[i]
            for k in range(lo, hi + 1):
                left = dp[i][k - 1] if k > i else 0
                right = dp[k + 1][j] if k < j else 0
                cost = left + right + w
                if cost < dp[i][j]:
                    dp[i][j] = cost
                    opt[i][j] = k
    return dp[0][n - 1]
Line-by-Line Explanation
- pref: w(i, j) = pref[j+1] − pref[i] in O(1).
- opt[i][j]: best root k for keys i..j; initialized to i (for length 1, opt[i][i]=i).
- Length L from 2 to n; for segment [i, j] with j = i+L−1, lo = opt[i][j−1], hi = opt[i+1][j].
- Cost for root k: dp[i][k−1] + dp[k+1][j] + w(i,j). Update dp and opt when we get a better cost.
Time and Space Complexity
- Time: O(n²). For each (i, j) we try opt[i+1][j] − opt[i][j−1] + 1 values of k. For a fixed length, these ranges telescope across consecutive i, so the work per length is O(n) and the total over all n lengths is O(n²).
- Space: O(n²) for dp and opt.
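To see where the O(n²) bound comes from, fix a length L and let j = i + L − 1. Write a_i for opt[i][i+L−2], the optimal split of the length L−1 segment starting at i (a_i is notation introduced here only for this sketch). Assuming the monotonicity opt[i][j−1] ≤ opt[i][j] ≤ opt[i+1][j] holds, the number of k values tried for this length is:

```latex
\sum_{i=0}^{n-L} \bigl(\mathrm{opt}[i+1][j] - \mathrm{opt}[i][j-1] + 1\bigr)
  = \sum_{i=0}^{n-L} \bigl(a_{i+1} - a_i + 1\bigr)
  = a_{n-L+1} - a_0 + (n - L + 1)
  \le 2n .
```

Each of the n lengths therefore costs O(n), which gives O(n²) overall.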
Edge Cases
- n = 0 or 1: For n = 0 (no keys), return 0 before indexing dp[0][n−1]. For n = 1 there is a single node and the cost is f[0].
- lo > hi: Clamp lo to i and hi to j−1 (or j for optimal BST where k is root index in [i, j]); ensure the loop runs correctly when opt[i][j−1] > opt[i+1][j] (shouldn't happen if conditions hold, but defensive coding helps).
Common Mistakes
Practice Problems
- Optimal BST (classic).
- Matrix chain with cost that fits the form (e.g. some variants).
- Merge stones / minimum cost to merge when cost is additive (sum) over the segment.
Summary
- Knuth Optimization: Restrict the split k in partition DP to [opt[i][j−1], opt[i+1][j]] when the cost w(i,j) satisfies quadrangle inequality and monotonicity.
- Maintain opt[i][j] = best k for segment [i, j]; fill by length so opt[i][j−1] and opt[i+1][j] are known.
- Time O(n²), space O(n²). Optimal BST is the standard example.
- Do not apply when cost doesn't satisfy the conditions.
15.18 Convex Hull Trick
Introduction
The Convex Hull Trick (CHT) solves the problem: given many linear functions f_k(x) = m_k·x + c_k, for a query value x find the minimum (or maximum) of f_k(x) over all k. The "trick" is that we only need to maintain the lower envelope (for min) or upper envelope (for max): the set of line segments that actually win for some x. In DP, recurrences of the form dp[i] = min_j (m[j]*x[i] + c[j]) + const can be evaluated in O(log n) per state using CHT, turning O(n²) into O(n log n). It is used when the transition is linear in the "query" variable.
Problem Formulation
We have lines L_k: y = m_k·x + c_k. For query x, compute min_k L_k(x) (or max). The lower envelope is the pointwise minimum of the lines; it is a concave piecewise-linear function, and the lines that appear on it occur in order of decreasing slope as x increases. We add lines one by one and maintain the hull so that we can query in O(log n) or O(1) amortized.
When Slopes and Queries Are Monotonic
For a min hull, if we add lines in decreasing slope order and query increasing x, we can use a deque. With decreasing slopes, each new line is the minimum for sufficiently large x, so the hull runs left to right from the front of the deque to the back. The hull is maintained by removing lines that are never minimal: when adding a new line, pop from the back while the new line overtakes the second-from-back line no later than the back line does. Query: pop from the front while the next line is better at current x. Each line is pushed and popped at most once, so total O(n).
Deque Implementation (Min Hull, Decreasing Slope, Increasing Query)
from collections import deque

# Lines as (m, c). Query x returns min over added lines.
# Assumes lines added in order of strictly decreasing m; queries in increasing x.
def cht_deque():
    lines = deque()  # (m, c) pairs that form the hull, leftmost segment first
    def cross(m1, c1, m2, c2):
        # x where m1*x+c1 == m2*x+c2
        return (c2 - c1) / (m1 - m2) if m1 != m2 else float("-inf")
    def add(m, c):
        while len(lines) >= 2:
            m1, c1 = lines[-2]
            m2, c2 = lines[-1]
            x12 = cross(m1, c1, m2, c2)
            x1n = cross(m1, c1, m, c)
            if x1n <= x12:  # back line is never the minimum once the new line exists
                lines.pop()
            else:
                break
        lines.append((m, c))
    def query(x):
        while len(lines) >= 2:
            m1, c1 = lines[0]
            m2, c2 = lines[1]
            if m1*x + c1 > m2*x + c2:
                lines.popleft()  # O(1); list.pop(0) would be O(n)
            else:
                break
        return lines[0][0]*x + lines[0][1] if lines else float("inf")
    return add, query
DP Application
Suppose dp[i] = min_{j < i} (m[j]*x[i] + c[j]) + const[i], where m[j] and c[j] depend on j (and known values), and x[i] is a value at i. Then we can treat each (m[j], c[j]) as a line, add them as we compute j, and query(x[i]) to get the min. If x[i] is monotonic (e.g. prefix sum) and the slopes m[j] also arrive in monotonic order, the deque version applies and we get O(n) total.
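As a concrete instance of this pattern, take the illustrative recurrence dp[i] = min over j < i of dp[j] + (xs[i] − xs[j])² + C with dp[0] = 0 and strictly increasing xs. Expanding the square gives a line with slope −2·xs[j] and intercept dp[j] + xs[j]², queried at xs[i]; slopes decrease and queries increase, so a deque hull applies. The recurrence, the function name `min_cost_jumps`, and the parameter `C` are assumptions chosen for this sketch. It compares intersections by cross-multiplication to stay in integer arithmetic.

```python
from collections import deque


def min_cost_jumps(xs, C):
    """dp[i] = min_{j<i} dp[j] + (xs[i] - xs[j])**2 + C, with dp[0] = 0.

    Hypothetical example; assumes xs is strictly increasing.
    """
    hull = deque()  # lines (m, c) with strictly decreasing slopes

    def back_is_useless(l1, l2, l3):
        # l2 never attains the minimum if l3 overtakes l1 no later than l2 does.
        (m1, c1), (m2, c2), (m3, c3) = l1, l2, l3
        return (c3 - c1) * (m1 - m2) <= (c2 - c1) * (m1 - m3)

    def add(m, c):
        while len(hull) >= 2 and back_is_useless(hull[-2], hull[-1], (m, c)):
            hull.pop()
        hull.append((m, c))

    def query(x):
        # Queries are increasing, so a line dropped from the front is never needed again.
        while len(hull) >= 2 and hull[0][0] * x + hull[0][1] >= hull[1][0] * x + hull[1][1]:
            hull.popleft()
        m, c = hull[0]
        return m * x + c

    dp = [0] * len(xs)
    for i, x in enumerate(xs):
        if i > 0:
            dp[i] = query(x) + x * x + C
        add(-2 * x, dp[i] + x * x)  # line contributed by state i
    return dp


print(min_cost_jumps([1, 3, 4, 8, 10], 5))
```

A direct O(n²) loop over all j gives the same table and is a convenient way to validate the hull logic on small inputs.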
General Case: Li Chao Tree
When query x is not monotonic or lines are added in arbitrary order, we need a structure that supports insert(line) and query(x). The Li Chao segment tree stores in each segment the line that is "best" at the segment's midpoint; updates and queries are O(log U) where U is the range of x (after coordinate compression if needed).
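A minimal sketch of such a structure for minimum queries over integer x in a fixed range is shown below. The class name `LiChaoTree`, the dictionary-based node storage, and the integer-only domain are assumptions made for this illustration; real implementations vary in layout and often handle floating-point or compressed coordinates.

```python
class LiChaoTree:
    """Min Li Chao tree over integer x in [lo, hi] (illustrative sketch)."""

    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi
        self.line = {}  # node id -> (m, c); created lazily

    def insert(self, m, c):
        node, lo, hi = 1, self.lo, self.hi
        new = (m, c)
        while True:
            cur = self.line.get(node)
            if cur is None:
                self.line[node] = new
                return
            mid = (lo + hi) // 2
            # Keep in this node whichever line is lower at the midpoint.
            if new[0] * mid + new[1] < cur[0] * mid + cur[1]:
                self.line[node], new = new, cur
                cur = self.line[node]
            if lo == hi:
                return
            # The losing line can still win on at most one side of mid.
            if new[0] * lo + new[1] < cur[0] * lo + cur[1]:
                node, hi = 2 * node, mid
            elif new[0] * hi + new[1] < cur[0] * hi + cur[1]:
                node, lo = 2 * node + 1, mid + 1
            else:
                return

    def query(self, x):
        node, lo, hi = 1, self.lo, self.hi
        best = float("inf")
        while True:
            cur = self.line.get(node)
            if cur is None:  # an empty node has no lines below it either
                return best
            best = min(best, cur[0] * x + cur[1])
            if lo == hi:
                return best
            mid = (lo + hi) // 2
            if x <= mid:
                node, hi = 2 * node, mid
            else:
                node, lo = 2 * node + 1, mid + 1


tree = LiChaoTree(-10, 10)
for m, c in [(2, 3), (-1, 10), (0, 5), (1, -4), (-3, 20)]:
    tree.insert(m, c)
print(tree.query(0), tree.query(7))
```

Lines can be inserted in any slope order and queries can arrive in any order, at the cost of a logarithmic factor in the size of the x range.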
Time and Space Complexity
- Deque (monotonic): O(n) total for n add and n query (amortized O(1) per operation).
- Li Chao: O(log U) per insert and per query; space O(U) or O(n log U) with dynamic structure.
Common Mistakes
Practice Problems
- DP problems where the transition is linear in a variable (e.g. max profit with linear cost per item).
- Classic: "Commando" style (dp[i] = max a[i]*b[j] + c[j] + ...).
Summary
- Convex Hull Trick: Query min (or max) of linear functions f_k(x) = m_k*x + c_k at given x.
- Maintain lower (or upper) envelope; add lines and query. Deque when slopes and x are monotonic; Li Chao for general.
- DP: when transition is dp[i] = min_j (m[j]*x[i] + c[j]) + const, CHT gives O(n) or O(n log n) instead of O(n²).
- Watch monotonicity assumptions and division by zero in line intersection.
15.19 Profile DP
Introduction
Profile DP (also called contour DP or plug DP) is a technique for counting or optimizing on a grid by processing row by row (or column by column). The profile is the state of the "boundary" between the current row and the previous one: typically which cells are filled, or connectivity information along the contour. The state is often encoded as a mask or a string of length m (the width). We compute dp[row][profile]: number of ways (or best cost) to fill the grid up to the current row with the given profile at the boundary. Transition: try all valid ways to extend the profile to the next row. Used for domino tiling, counting fillings with shapes, and path problems on grids. State space can be exponential in m (e.g. 2^m or larger with connectivity).
What Is a Profile?
When we have filled rows 0..r−1 and are about to fill row r, the profile describes the interface between row r−1 and row r. For simple tiling (e.g. 1×2 dominoes): profile might be a bitmask of length m where bit j = 1 if the cell (r−1, j) is already covered by a domino that "sticks out" into row r (vertical domino). So we know which cells in row r are blocked from above. For connectivity problems (e.g. Hamiltonian path on grid), the profile stores which cells on the contour are "connected" (same path component)—more complex encoding (e.g. bracket sequence or state of m+1 "plugs").
State and Transition
State: dp[r][profile] = number of ways (or min cost) to fill rows 0..r−1 such that the boundary between row r−1 and row r is in state profile.
Base: Row 0: profile usually "empty" or "all cells need to be filled"; dp[0][initial_profile] = 1.
Transition: From (r, profile), try all valid ways to place tiles in row r that are consistent with profile (e.g. vertical domino from above fills a cell; we can place horizontal dominoes in row r). This yields a new profile for the boundary between row r and row r+1. Add to dp[r+1][new_profile].
Example: Domino Tiling (1×2) on n×m Grid
Profile = mask of m bits: bit j = 1 if (r−1, j) is covered by a domino that extends down (so (r, j) is "blocked" from above). We iterate over row r: for each cell we can place a horizontal domino (covers (r,j) and (r,j+1)) if both are free, or a vertical domino (covers (r,j) and (r+1,j)) if (r,j) is free. The state is (column, profile); we go column by column and update the profile (which cells in current row are now covered). Final answer: dp[n][0] (full grid filled, no dangling from last row). Implementation details depend on whether we process cell-by-cell or row-by-row; the key is encoding "what the previous row leaves for the current row."
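The row-by-row variant of this idea can be sketched as follows. The function names `count_domino_tilings` and `_row_transitions` are chosen for this illustration, and the code assumes the mask convention described above: bit j of the profile is 1 when cell (r, j) is already covered by a vertical domino from row r−1.

```python
def _row_transitions(mask, m):
    """All profiles for the next row reachable by filling the current row.

    `mask` marks cells of the current row blocked from above (hypothetical helper).
    """
    results = []

    def go(col, new_mask):
        if col == m:
            results.append(new_mask)
            return
        if mask & (1 << col):            # already covered from the row above
            go(col + 1, new_mask)
            return
        # Vertical domino: covers (r, col) and sticks out into row r+1.
        go(col + 1, new_mask | (1 << col))
        # Horizontal domino: needs (r, col+1) to exist and be free.
        if col + 1 < m and not mask & (1 << (col + 1)):
            go(col + 2, new_mask)

    go(0, 0)
    return results


def count_domino_tilings(n, m):
    """Number of ways to tile an n x m grid with 1x2 dominoes."""
    dp = {0: 1}                          # before row 0 nothing sticks out
    for _ in range(n):
        nxt = {}
        for mask, ways in dp.items():
            for new_mask in _row_transitions(mask, m):
                nxt[new_mask] = nxt.get(new_mask, 0) + ways
        dp = nxt
    return dp.get(0, 0)                  # nothing may stick out below the last row


print(count_domino_tilings(2, 3), count_domino_tilings(3, 4))
```

Storing only reachable profiles in a dictionary keeps the table small when many masks cannot occur; a dense array of size 2^m works equally well for small m.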
Time and Space Complexity
- State space: O(n × |profiles|). For simple mask, |profiles| = 2^m. With connectivity (bracket/plug), it can be larger but often still manageable for small m.
- Transition: Per state we try several choices (e.g. place domino or not); total time O(n × |profiles| × choices). Often used when m is small (e.g. m ≤ 10–20).
Common Mistakes
Practice Problems
- Domino tiling: count ways to tile n×m with 1×2 (or 2×1) dominoes.
- Count ways to fill grid with L-shapes or other fixed shapes.
- Hamiltonian path on grid (connectivity profile / plug DP).
Summary
- Profile DP: Grid problems; state = (row, profile); profile = boundary state between current and previous row (mask or connectivity).
- Process row by row; transition = try all valid ways to fill current row consistent with profile, get new profile.
- State space O(n × 2^m) or more; use for small m. Domino tiling is the classic example.
- Encode profile correctly; be consistent about row boundaries.
15.20 SOS DP
Introduction
SOS DP (Sum Over Subsets DP) computes, for each mask m (0 to 2^n − 1), the sum of A[s] over all submasks s of m (i.e. s ⊆ m, or s & m == s). Naively that is O(3^n) in total (each mask m has 2^popcount(m) submasks). SOS DP does it in O(n × 2^n) by processing one bit at a time: after processing bit i, F[m] contains the sum over all submasks of m that differ from m only in bits 0..i. Used in problems like "for each mask m count pairs (i, j) with i | j = m" or "sum of values over all submasks."
Formal Definition
Given an array A of length 2^n (indexed by mask), define F[m] = sum of A[s] for all s such that s ⊆ m (s is a submask of m, i.e. the set of bits in s is a subset of the set of bits in m; equivalently (s | m) == m or (s & m) == s). SOS DP computes F for all m in O(n × 2^n). Variant: sum over supermasks (m ⊆ s) uses a similar loop with the opposite direction.
Why This Topic Matters
- Fast submask/supermask aggregation: Many problems ask "for each mask, sum over submasks or count pairs with OR = mask." Naive is O(3^n); SOS is O(n × 2^n).
- Competitive programming: Common in problems involving bitmasks and subset convolution.
- Interview: Less common but appears in bitmask-heavy problems.
Mental Model
Start with F[m] = A[m]. Then for each bit i (0 to n−1): for each mask m that has bit i set, add F[m without bit i] to F[m]. After processing bit i, F[m] = sum of A[s] over all s that are submasks of m and only differ in bits 0..i. After all bits, F[m] = sum over all submasks of m.
Algorithm
Initialize F[m] = A[m] for all m. Then:
for i in 0..n-1:
    for m in 0..(2^n - 1):
        if m has bit i set:
            F[m] += F[m ^ (1 << i)]
Loop nesting matters: the bit loop must be the outer loop. Within one round i, the order in which we visit m does not matter, because the round only writes masks that have bit i set and only reads masks that have bit i clear, so F[m ^ (1<<i)] is never modified during round i. For each m with bit i set, we add F[m without i]. This way each submask is counted once.
Python Implementation
def sos_submask(A, n):
    """F[m] = sum of A[s] for all s ⊆ m."""
    F = A[:]
    for i in range(n):
        for m in range(1 << n):
            if m & (1 << i):
                F[m] += F[m ^ (1 << i)]
    return F
Iterating m from 0 to 2^n−1 is correct (and so is any other order): when we update F[m], we read F[m ^ (1<<i)], which has bit i clear and is therefore not written during round i. It still holds the sum accumulated over bits 0..i−1, so we are adding the contribution of the submasks that do not include bit i.
Line-by-Line Explanation
- F = A[:]: start with F[m] = A[m] (each mask is its own submask).
- For each bit i: for each mask m with bit i set, F[m] += F[m ^ (1<<i)]. So we add the sum of all submasks of m that don't have bit i (which is F[m ^ (1<<i)] after previous bits) to F[m]. Thus F[m] becomes the sum over submasks that may or may not have bit i.
- After all bits, F[m] = sum of A[s] for all s ⊆ m.
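For testing, the same table can be computed by direct submask enumeration, which is the O(3^n) baseline mentioned in the introduction. The sketch below uses the standard `s = (s - 1) & m` step to walk through the submasks of m in decreasing order; the function name `submask_sums_bruteforce` is an assumption for this example.

```python
def submask_sums_bruteforce(A, n):
    """F[m] = sum of A[s] over all s ⊆ m, by enumerating submasks (O(3^n))."""
    F = [0] * (1 << n)
    for m in range(1 << n):
        s = m
        while True:
            F[m] += A[s]
            if s == 0:
                break
            s = (s - 1) & m  # next smaller submask of m
    return F


print(submask_sums_bruteforce([1, 2, 3, 4], 2))
```

Comparing this output with the SOS result on random small inputs is a quick way to catch loop-nesting or indexing mistakes.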
Sum Over Supermasks
To get G[m] = sum of A[s] for all s such that m ⊆ s (s is a supermask of m): start with G[m] = A[m], iterate bits, and for each m that does not have bit i, add G[m | (1<<i)] to G[m]. As with submasks, the order of m within a round does not matter, because the round reads only masks with bit i set and writes only masks with bit i clear. Alternatively: define A'[s] = A[complement of s] and run the submask version on A'; the sum over submasks of complement(m) in A' equals the sum over supermasks of m in A.
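A sketch of the supermask variant, mirroring the submask loop, is shown below. The function name `sos_supermask` is chosen here for symmetry and is not taken from the original text.

```python
def sos_supermask(A, n):
    """G[m] = sum of A[s] for all s with m ⊆ s (s is a supermask of m)."""
    G = list(A)
    for i in range(n):
        bit = 1 << i
        for m in range(1 << n):
            if not m & bit:
                # m | bit has bit i set and is not written during this round.
                G[m] += G[m | bit]
    return G


print(sos_supermask([1, 2, 3, 4], 2))
```

With A = [1, 2, 3, 4] and n = 2, G[0] collects every entry, while G[3] keeps only A[3] because 3 has no proper supermask.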
Time and Space Complexity
- Time: O(n × 2^n): two nested loops.
- Space: O(2^n) for F (can do in-place over A).
Edge Cases
- n = 0: 2^0 = 1 mask; F[0] = A[0].
- In-place: If we update A in place, the update for m reads A[m ^ (1<<i)], which has bit i clear and is never written in round i. We therefore always read the value from before the round, regardless of the order in which m is visited.
Common Mistakes
Practice Problems
- For each mask m, count pairs (i, j) such that a[i] | a[j] = m (use SOS on frequency array).
- Sum over submasks: given A, compute F[m] = sum of A[s] for s ⊆ m.
- Subset convolution (more advanced, builds on SOS idea).
Summary
- SOS DP: F[m] = sum of A[s] over all submasks s of m; computed in O(n × 2^n).
- Initialize F[m] = A[m]; for each bit i, for each m with bit i set: F[m] += F[m ^ (1<<i)].
- Loop order: bits in the outer loop, masks in the inner loop (any order of m works within a round). Sum over supermasks transfers in the opposite direction, from m | (1<<i) into m.
- Use for submask/supermask aggregation in bitmask problems.
Section 16: Bit Manipulation
This section covers bit manipulation: working with numbers at the level of individual bits (0s and 1s). You will learn bit basics (representation, positions, and operators), XOR tricks, single-number and subset/mask problems, Gray code, and bit DP. Mastery of bits is essential for low-level optimization, encoding, and many interview problems that ask for O(1) space or constant-time checks.
16.1 Bit Basics
Introduction
Bit basics are the foundation of bit manipulation: understanding how integers are stored as sequences of bits, how to read and set individual bit positions, and how the core bitwise operators (AND, OR, XOR, NOT, and shifts) work. Every other topic in this section builds on these ideas. Without a solid grasp of bit basics, XOR tricks, bitmasks, and bit DP will feel like magic; with it, they become systematic tools.
Real-World Analogy
Think of an integer as a row of light switches: each switch is either on (1) or off (0). The rightmost switch is the "ones" place, the next is "twos," then "fours," and so on—each position is a power of 2. Flipping a switch changes that bit; bitwise operators are rules for combining two rows of switches (e.g., "turn on a light only if both rows have it on" for AND). Bit basics are learning how these switches are numbered and how the rules work.
Formal Definition
Why This Topic Matters
- Interviews: Many problems (single number, subset generation, power-of-two checks) rely on bit operations. Interviewers often expect you to know AND/OR/XOR/NOT and shifts by heart.
- Performance: Bit operations are among the fastest CPU instructions; using them can replace branches and arithmetic in hot loops.
- Encoding and flags: Bits are used to represent sets (each bit = one element), permissions, and compact state in DP (e.g., bitmask DP).
Mental Model
For any non-negative integer n:
- Binary: Write n as a sum of powers of 2; the coefficient of 2^i is the bit at position i (0 = LSB).
- Position i: "Is 2^i included in n?" → check (n >> i) & 1. Set bit i: n | (1 << i). Clear bit i: n & ~(1 << i).
- Operators: AND = both 1 → 1; OR = at least one 1 → 1; XOR = exactly one 1 → 1; NOT = flip; left shift = multiply by 2; right shift = integer divide by 2.
Bit Positions and Powers of 2
In binary, the bit at position 0 (rightmost) has value 2^0 = 1; position 1 has value 2^1 = 2; position 2 has value 4; and so on. So the integer value is the sum of (bit at position i) × 2^i over all i.
Position:    3 2 1 0   (0 = LSB)
Power of 2:  8 4 2 1
n = 13:      1 1 0 1   → 8+4+0+1 = 13
n = 6:       0 1 1 0   → 0+4+2+0 = 6
To read the bit at position i: shift n right by i so that bit moves to position 0, then mask with 1 to keep only that bit: (n >> i) & 1. To set bit i to 1: n | (1 << i). To clear bit i (set to 0): n & ~(1 << i). To toggle bit i: n ^ (1 << i).
Bitwise Operators (Step-by-Step)
We compare two numbers bit-by-bit. Assume 8-bit width for clarity (Python uses arbitrary precision; the idea is the same).
AND (&)
Result bit is 1 only when both input bits are 1. Use: extract a subset of bits (mask), check if a bit is set, clear bits.
a     = 12 → 1 1 0 0
b     = 10 → 1 0 1 0
a & b =  8 → 1 0 0 0   (only position 3 has 1 in both)
OR (|)
Result bit is 1 when at least one input bit is 1. Use: set bits, merge sets.
a     = 12 → 1 1 0 0
b     = 10 → 1 0 1 0
a | b = 14 → 1 1 1 0
XOR (^)
Result bit is 1 when exactly one input bit is 1 (one or the other, not both). Use: toggle bits, detect difference, cancel duplicates (a ^ a = 0).
a     = 12 → 1 1 0 0
b     = 10 → 1 0 1 0
a ^ b =  6 → 0 1 1 0
NOT (~)
Flips every bit. In Python, integers are arbitrary-precision, so ~n is -(n+1) (two's complement of the representation). For a fixed width w, NOT would give (2^w - 1) - n. To clear bit i we use n & ~(1 << i): ~(1 << i) is a mask with every bit 1 except bit i.
Left shift (<<)
n << k shifts all bits of n left by k positions; new right bits are 0. Equivalent to multiplying n by 2^k (for non-negative n in range).
5 << 1 → 10 (5×2)
5 << 2 → 20 (5×4)
Right shift (>>)
n >> k shifts all bits right by k; in Python for non-negative n this is integer division by 2^k (floor).
13 >> 1 → 6
13 >> 2 → 3
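The operator examples in this section can be checked directly in Python. The helper `not_fixed_width` and its default width of 8 bits are assumptions introduced for this sketch, to show how the fixed-width form of NOT relates to Python's unbounded `~`.

```python
def not_fixed_width(n: int, width: int = 8) -> int:
    """Bitwise NOT of n restricted to `width` bits (hypothetical helper)."""
    mask = (1 << width) - 1   # `width` ones, e.g. 0b11111111 for width 8
    return ~n & mask          # equals (2**width - 1) - n for 0 <= n < 2**width


a, b = 12, 10                 # 0b1100 and 0b1010
assert a & b == 8
assert a | b == 14
assert a ^ b == 6
assert ~a == -(a + 1)         # Python's NOT on unbounded integers
assert not_fixed_width(a) == 255 - a
assert 5 << 2 == 20 and 13 >> 2 == 3
```

Masking after `~` is the usual way to emulate fixed-width registers when porting bit tricks from C or Java.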
Python Implementation: Reading, Setting, Clearing, Toggling
def get_bit(n: int, i: int) -> int:
    """Return the bit at position i (0 = LSB)."""
    return (n >> i) & 1

def set_bit(n: int, i: int) -> int:
    """Set bit at position i to 1."""
    return n | (1 << i)

def clear_bit(n: int, i: int) -> int:
    """Set bit at position i to 0."""
    return n & ~(1 << i)

def toggle_bit(n: int, i: int) -> int:
    """Flip bit at position i."""
    return n ^ (1 << i)

def is_power_of_two(n: int) -> bool:
    """True iff n is 1, 2, 4, 8, ... (exactly one bit set)."""
    return n > 0 and (n & (n - 1)) == 0
Line-by-Line Explanation
- get_bit: n >> i moves bit i to position 0; & 1 keeps only that bit (0 or 1).
- set_bit: 1 << i is a number with only bit i set; n | ... forces that bit to 1 without changing others.
- clear_bit: ~(1 << i) is a mask with 0 only at position i; n & ... clears that bit.
- toggle_bit: n ^ (1 << i) flips bit i (0→1, 1→0).
- is_power_of_two: Powers of 2 have form 100...0; subtracting 1 gives 011...1. So n & (n-1) is 0 only when n has at most one bit set. We need n > 0 because 0 & (-1) is 0 but 0 is not a power of 2.
ASCII Diagram: AND / OR / XOR at One Position
Bit A Bit B A&B A|B A^B
0 0 0 0 0
0 1 0 1 1
1 0 0 1 1
1 1 1 1 0
Time and Space Complexity
For the basic operations above (get/set/clear/toggle bit, is_power_of_two):
- Time: O(1) — a fixed number of arithmetic/bit operations.
- Space: O(1) — no extra structures (Python integers are immutable; we return new integers for set/clear/toggle).
If we iterate over all bits of n (e.g., count set bits by looping i = 0 to bit length), the number of iterations is O(log n) for positive n, since the bit length is about log₂(n).
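As an illustration of an O(log n) bit loop, the sketch below lists the positions of the set bits of a non-negative integer. The function name `set_bit_positions` is an assumption for this example; the guard against negative input is there because `n >> i` never reaches 0 for negative n in Python.

```python
def set_bit_positions(n: int) -> list[int]:
    """Positions (0 = LSB) of the 1 bits of a non-negative integer."""
    if n < 0:
        raise ValueError("n must be non-negative")
    positions = []
    i = 0
    while n >> i:              # stops once all remaining higher bits are 0
        if (n >> i) & 1:
            positions.append(i)
        i += 1
    return positions


print(set_bit_positions(13))   # 13 = 0b1101
```

The loop runs once per bit of n, which matches `n.bit_length()` iterations.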
Edge Cases
- Negative numbers: In Python, negative integers have a "virtual" infinite sign extension in two's complement. Right shift n >> k is arithmetic for negative n (sign-extending). For bit tricks, we often work with non-negative n or use n & 0xFFFF... to restrict to a fixed width.
- Shift amount: n >> k when k is large (e.g. k ≥ bit length of n) gives 0 for non-negative n. n << k with large k can produce very large numbers; no overflow in Python, but be aware of magnitude.
- n = 0: 0 & (0-1) is 0, but 0 is not a power of two, hence the n > 0 check in is_power_of_two.
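The behaviour for negative numbers described above can be observed with a few assertions. The helper `to_unsigned` and the 8-bit default width are assumptions made for this sketch.

```python
def to_unsigned(n: int, width: int = 8) -> int:
    """View n as an unsigned `width`-bit value (hypothetical helper)."""
    return n & ((1 << width) - 1)


assert -8 >> 1 == -4                 # arithmetic shift keeps the sign
assert -1 >> 100 == -1               # sign extension: never reaches 0
assert 5 >> 100 == 0                 # non-negative values shift down to 0
assert to_unsigned(-1) == 255        # 0b11111111
assert bin(to_unsigned(-8)) == "0b11111000"
```

Masking to a fixed width is usually the first step when a bit trick written for 32-bit or 64-bit integers is ported to Python.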
Common Mistakes
Using 1 << i and forgetting that bit positions are 0-indexed from the LSB. Position 0 is the rightmost bit; position 1 is the second from the right. So "the first bit" often means position 0.
Confusing logical AND (and) with bitwise AND (&). 5 and 3 is 3 (truthy value); 5 & 3 is 1. Use & for bit masks.
Assuming n % 2 and n & 1 are equivalent in every language. In Python they agree for all integers, including negatives: n % 2 is 0 or 1 by Python's floor-mod rule, and n & 1 is the LSB of the two's-complement representation. In languages with truncating division (e.g. C or Java), n % 2 is −1 for negative odd n. For "is n even?" both are fine in practice; for bit extraction, n & 1 is the standard.
Counting set bits in n: use bin(n).count('1') for clarity, or a loop with n & 1 and n >>= 1, or n.bit_count() in Python 3.10+. For "clear the lowest set bit," use n & (n - 1); this is the same trick as in is_power_of_two and appears in many problems.
n & 1 is typically faster than n % 2 (one bit op vs. division). Multiplying by 2^k via n << k and dividing by 2^k via n >> k are also faster than n * (2**k) and n // (2**k). In tight loops, prefer bit operations when the intent is bit-level.
Power of two: n > 0 and (n & (n-1)) == 0. Clear lowest set bit: n & (n-1). These show up everywhere in bit manipulation and bitmask DP.
Practice Problems
- Implement count_set_bits(n) (number of 1s in binary).
- Check if the i-th bit is set without using get_bit (directly with one expression).
- Given n, return the position of the LSB that is 1 (e.g. n=12 → 2; 12 = 1100, lowest set bit is at position 2).
- Turn off the rightmost 1 in n using one expression (same as clear lowest set bit).
Summary
- Bits: Binary digits 0 and 1; position i has value 2^i; LSB is position 0.
- Read/set/clear/toggle: (n >> i) & 1, n | (1 << i), n & ~(1 << i), n ^ (1 << i).
- Operators: AND (both 1), OR (at least one 1), XOR (exactly one 1), NOT (flip), << (×2^k), >> (÷2^k).
- Power of two: n > 0 and (n & (n - 1)) == 0. Clear lowest set bit: n & (n - 1).
- Master these before XOR tricks, bitmasks, and bit DP.
16.2 XOR Tricks
Introduction
XOR tricks are patterns that use the exclusive-or operator (^) to solve problems in elegant, often constant-space ways. XOR has special algebraic properties (cancelling duplicates, toggling bits, and being reversible) that make it ideal for "find the unique element," "swap without extra variable," and "find missing number" type problems. This topic builds directly on Bit Basics: you use the same shift-and-mask ideas, but the focus is on when and why XOR is the right tool.
Real-World Analogy
Imagine two identical keys: if you "XOR" them together, they cancel out and you get nothing (0). If you XOR a key with nothing, you get the same key back. Now suppose you have a pile of duplicate keys and one unique key: XORing every key together cancels all pairs; what remains is the unique one. That's the core idea behind "single number" and "find the missing number" problems: pairing cancels, the odd one out remains.
Formal Definition
XOR (exclusive or), written a ^ b, is a bitwise operation: the result bit is 1 when exactly one of the two input bits is 1. Key properties: (1) a ^ a = 0 (same value cancels); (2) a ^ 0 = a (identity); (3) XOR is commutative (a ^ b = b ^ a) and associative ((a ^ b) ^ c = a ^ (b ^ c)). So the XOR of a multiset of numbers depends only on the parity of how many times each bit appears: even occurrences cancel, odd ones remain.
Why This Topic Matters
- Interview staple: "Single Number," "Missing Number," and "Two numbers that appear once" are common; the expected solution is often O(n) time, O(1) space using XOR.
- No extra space: XOR lets you accumulate a "signature" in one variable, so you avoid hashing or extra arrays when finding the unique or missing element.
- Foundation for 16.3: Single Number problems (next topic) are direct applications of these tricks.
Mental Model
- Cancel pairs: x ^ x = 0. So XORing all numbers in an array where every value appears twice except one → the result is the single number.
- Identity: x ^ 0 = x. XORing with 0 leaves the number unchanged; 0 is the "neutral" element.
- Order doesn't matter: Because of commutativity and associativity, a ^ b ^ c ^ a ^ b = c (the two a's and two b's cancel).
- Reversible: (a ^ b) ^ b = a. So XOR can be used to "encode" and "decode" with the same key (e.g. swap by triple-XOR).
Core XOR Properties (Step-by-Step)
These four facts are the entire foundation. Memorize them.
- Self-cancel: a ^ a = 0. Same number XORed with itself gives 0.
- Identity: a ^ 0 = a. XOR with 0 does nothing.
- Commutative: a ^ b = b ^ a. Order of operands doesn't matter.
- Associative: (a ^ b) ^ c = a ^ (b ^ c). So we can write a ^ b ^ c without parentheses and compute in any order.
From (1)–(4) it follows that the XOR of a multiset is 0 if every value appears an even number of times, and equals the XOR of all values that appear an odd number of times. If exactly one value appears once and the rest twice, the XOR of the whole array is that single value.
Classic Tricks
1. Swap two variables without a temporary
Using (a ^ b) ^ b = a and (a ^ b) ^ a = b:
a = a ^ b # a now holds a^b
b = a ^ b # b = (a^b)^b = a
a = a ^ b # a = (a^b)^a = b
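After these three steps, a and b are swapped. No extra variable is needed, but the trick only works for integers, and the two operands must be distinct storage locations: if both refer to the same location (e.g. arr[i] and arr[j] with i == j), the first step zeroes it out. In practice, make sure the indices differ before swapping this way.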
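A guarded version for array elements might look like the sketch below. The function name `xor_swap` is an assumption for this example; in everyday Python, `arr[i], arr[j] = arr[j], arr[i]` is the idiomatic swap, so this is shown only to illustrate the aliasing pitfall.

```python
def xor_swap(arr: list[int], i: int, j: int) -> None:
    """Swap arr[i] and arr[j] in place using XOR (illustrative only)."""
    if i == j:
        return              # same cell: XOR swap would zero it out
    arr[i] ^= arr[j]        # arr[i] holds a ^ b
    arr[j] ^= arr[i]        # arr[j] = b ^ (a ^ b) = a
    arr[i] ^= arr[j]        # arr[i] = (a ^ b) ^ a = b


values = [3, 9, 4]
xor_swap(values, 0, 1)
xor_swap(values, 2, 2)      # no-op thanks to the guard
print(values)
```

Without the `i == j` guard, the second call would leave 0 in `values[2]`.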
2. Find the single number (every other appears twice)
Given an array where every element appears exactly twice except one that appears once, XOR all elements. Pairs cancel; the result is the single number.
def single_number(nums: list[int]) -> int:
    result = 0
    for x in nums:
        result ^= x
    return result
Time O(n), space O(1). This is the core of the next topic (16.3).
3. Find the missing number (0 to n)
Given an array of size n containing distinct integers from 0 to n (inclusive) with one number missing, we can XOR (1) all indices from 0 to n, and (2) all elements in the array. The XOR of (1) and (2) cancels every index that has a matching element; the only "unpaired" value is the missing number.
def missing_number(nums: list[int]) -> int:
    n = len(nums)
    xor_all = 0
    for i in range(n + 1):
        xor_all ^= i
    for x in nums:
        xor_all ^= x
    return xor_all
Alternatively: xor_all = 0; for i in range(n): xor_all ^= i ^ nums[i]; then xor_all ^= n. Same idea: indices and values pair except the missing one.
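The single-loop alternative just described can be sketched as follows. The function name `missing_number_one_pass` is chosen for this example; the accumulator starts at n, which is equivalent to XORing n in at the end.

```python
def missing_number_one_pass(nums: list[int]) -> int:
    """Missing value in an array holding n distinct integers from 0..n."""
    acc = len(nums)                 # account for index n up front
    for i, x in enumerate(nums):
        acc ^= i ^ x                # pair each index with each value
    return acc


print(missing_number_one_pass([3, 0, 1]))
```

Every value that is present cancels against its matching index, so only the missing value survives in `acc`.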
4. Toggle a bit (from Bit Basics)
n ^ (1 << i) flips the i-th bit. XOR with 1 toggles; XOR with 0 keeps unchanged.
5. Check if two numbers have opposite signs (optional)
For integers a and b, (a ^ b) < 0 is true exactly when one is negative and the other is non-negative (in two's complement, the sign bits differ). Note that 0 counts as non-negative here. Not always asked, but it shows XOR acting on sign bits.
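A minimal sketch of this check is shown below; the function name `opposite_signs` is an assumption for the example.

```python
def opposite_signs(a: int, b: int) -> bool:
    """True when exactly one of a, b is negative (0 counts as non-negative)."""
    return (a ^ b) < 0


print(opposite_signs(5, -3), opposite_signs(-2, -7), opposite_signs(0, -1))
```

Because Python integers behave as if sign-extended indefinitely, the XOR is negative precisely when the two sign bits differ.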
ASCII Diagram: Why XOR Cancels Pairs
Array: [4, 1, 2, 1, 2] → only 4 appears once
4 ^ 1 ^ 2 ^ 1 ^ 2
= 4 ^ (1^1) ^ (2^2)   (reorder by commutativity)
= 4 ^ 0 ^ 0
= 4
Python Implementation: Two Numbers That Appear Once
Variation: every element appears twice except two numbers that each appear once. Find those two.
- XOR all elements → xor_all = a ^ b (the two unknowns).
- Pick any set bit in xor_all (e.g. the lowest: diff = xor_all & -xor_all or xor_all & ~(xor_all - 1)). That bit is 1 in one of a, b and 0 in the other.
- Partition the array: XOR all elements where that bit is set → one of the numbers; XOR all where that bit is 0 → the other. (Same as: one group has a, the other has b; pairs within each group still cancel.)
def two_single_numbers(nums: list[int]) -> tuple[int, int]:
    xor_all = 0
    for x in nums:
        xor_all ^= x
    # xor_all = a ^ b (the two singles)
    low_bit = xor_all & (-xor_all)  # or: xor_all & ~(xor_all - 1)
    first = 0
    for x in nums:
        if x & low_bit:
            first ^= x
    second = xor_all ^ first
    return (first, second)
Line-by-Line Explanation (Two Singles)
- xor_all: After the loop, xor_all = a ^ b (the two numbers that appear once).
- low_bit = xor_all & (-xor_all): In two's complement, -xor_all flips bits and adds 1; xor_all & -xor_all keeps only the lowest set bit of xor_all. So low_bit is a power of 2 where a and b differ.
- if x & low_bit: We split numbers into two groups: those with that bit set and those without. One of a, b is in the first group, the other in the second; duplicates still come in pairs in the same group.
- first ^= x: XOR of all elements in the first group gives one of the two singles (the other group gives the other).
- second = xor_all ^ first recovers the second from a ^ b ^ a = b.
Time and Space Complexity
- Single number: O(n) time, O(1) space.
- Missing number: O(n) time, O(1) space.
- Two single numbers: O(n) time, O(1) space (two passes over the array, a few variables).
Edge Cases
- Empty array: Single-number and two-singles should define behavior (e.g. return 0 or raise). For missing number, n=0 means the only "index" is 0 and the array is empty, so the missing number is 0.
- Single element: For single number, return that element. For missing number with n=1, the array has one element (0 or 1); the missing one is the other.
- Swap: If a and b refer to the same variable (e.g. arr[i] ^= arr[j] when i == j), you zero that cell. Always check i != j before swapping with XOR.
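The swap pitfall above can be sketched as follows. This is a minimal illustration, and the helper name xor_swap is an assumption, not something defined earlier in the text.

```python
def xor_swap(arr: list[int], i: int, j: int) -> None:
    """Swap arr[i] and arr[j] in place using XOR, guarding against i == j."""
    if i == j:
        return  # without this guard, arr[i] ^= arr[i] would zero the cell
    arr[i] ^= arr[j]
    arr[j] ^= arr[i]
    arr[i] ^= arr[j]


data = [5, 9]
xor_swap(data, 0, 1)  # data is now [9, 5]
xor_swap(data, 0, 0)  # no-op thanks to the guard; data stays [9, 5]
```

In Python the tuple swap a, b = b, a is the idiomatic choice; the XOR version is mainly of interest as an exercise in the cancellation property.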
Common Mistakes
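- Using XOR when the other elements appear three times: x ^ x ^ x = x, so a triple does not cancel and XOR-all yields single ^ x (where x is the triple), not the single.
- Two singles: partitioning by a bit where a and b do not differ. If you split by a bit where both have 0 (or both 1), the two singles end up in the same group and you get a ^ b instead of a or b. Always use a set bit of xor_all = a ^ b to partition.
- Lowest set bit of n: n & (-n) (two's complement) or n & ~(n - 1). Both isolate the lowest set bit of n, which is the largest power of 2 that divides n; use this to "split" by one differing bit in the two-singles problem.
- Missing number: the index XOR and the value XOR can be fused into a single loop (acc ^= i ^ nums[i] for i in range(n), then acc ^= n). Same O(n) time, but one loop instead of two and a single accumulator variable.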
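The single-loop missing-number idea can be sketched like this. The function name missing_number is illustrative, and the input is assumed to hold n distinct values drawn from 0..n.

```python
def missing_number(nums: list[int]) -> int:
    """XOR every index 0..n with every value; all pairs cancel except the missing value."""
    acc = len(nums)  # start with index n, which the loop below never visits
    for i, x in enumerate(nums):
        acc ^= i ^ x
    return acc


print(missing_number([3, 0, 1]))  # 2
```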
Practice Problems
- Single Number I: every element twice except one — XOR all.
- Missing Number: array of size n with distinct 0..n, one missing — XOR indices and values.
- Single Number III: every element twice except two — partition by a bit of (a^b), XOR each group.
- Swap two integers without a temporary variable using XOR.
- Given an array where every element appears 3 times except one that appears 1 time — think why XOR alone isn't enough (need digit-by-digit mod 3 or similar).
Summary
- Properties: a ^ a = 0, a ^ 0 = a; XOR is commutative and associative, so pairs cancel.
- Single number (pairs): XOR every element → result is the one that appears once.
- Missing number (0..n): XOR all indices 0..n and all array elements → result is the missing number.
- Two singles: XOR all → a^b; pick a set bit, partition by that bit, XOR each half to get the two numbers.
- Swap: a ^= b; b ^= a; a ^= b; (only when a and b are distinct variables).
- These patterns are the basis for the next topic (Single Number problems).
16.3 Single Number Problems
Introduction
Single Number problems ask you to find the element that appears a different number of times than the others—usually once while every other element appears twice (or three times), or to find two elements that each appear once while the rest appear twice. These are direct applications of XOR tricks from 16.2: the "single number I" and "single number III" variants are solved with the patterns you already learned. The "single number II" variant (one appears once, rest three times) cannot be solved by plain XOR alone; it needs a bit-count or state-machine approach. This topic ties everything together and adds the generalization to three occurrences.
Real-World Analogy
Imagine a room of people where everyone has a twin except one person (Single Number I): if you "cancel" every pair, the one left without a pair is the answer. If instead two people have no twin (Single Number III), you first find "how the two differ" (e.g. one has a red badge, one doesn't), split the room into two groups by that trait, then in each group cancel pairs—the leftover in each group is one of the two. For "everyone has two twins except one" (Single Number II), pairing doesn't work; you need to count "at each position, how many people have that trait" and take the remainder mod 3 to isolate the single.
Formal Definition
Why This Topic Matters
- Interview frequency: Single Number I and III are classic; II appears as a follow-up. Knowing the progression (XOR for pairs → partition for two singles → bit-count/mod-3 for triplets) shows depth.
- Reinforces XOR: Single Number I and III are the main use cases for the XOR tricks from 16.2; practicing them cements the "cancel pairs" mental model.
- Bit-count pattern: Single Number II introduces counting bits mod k (here k=3), which generalizes to "every other appears k times."
Mental Model
- I (pairs): XOR everything → pairs cancel, result is the single. One variable, one pass.
- III (two singles): XOR everything → get a ^ b. Use one differing bit to split the array; XOR each half to get a and b.
- II (triplets): XOR doesn't cancel triplets. For each bit position, count how many numbers have that bit set; take count % 3. The resulting bits form the single number (or use a state machine: "ones" and "twos" per bit).
Single Number I (One Single, Rest Twice)
This is the direct application of a ^ a = 0 and commutativity: XOR of the entire array equals the single element.
def single_number_i(nums: list[int]) -> int:
    result = 0
    for x in nums:
        result ^= x
    return result
Time: O(n). Space: O(1).
Single Number III (Two Singles, Rest Twice)
After XORing all elements we have xor_all = a ^ b. We need to separate a and b. They differ in at least one bit; pick the lowest set bit of xor_all (so one of a,b has that bit, the other doesn't). Partition the array by that bit and XOR each partition: one partition gives a, the other gives b.
def single_number_iii(nums: list[int]) -> list[int]:
    xor_all = 0
    for x in nums:
        xor_all ^= x
    low = xor_all & (-xor_all)  # lowest set bit
    a, b = 0, 0
    for x in nums:
        if x & low:
            a ^= x
        else:
            b ^= x
    return [a, b]
Time: O(n). Space: O(1). Order of a and b in the output may vary; the problem usually accepts any order.
Single Number II (One Single, Rest Three Times)
XOR cancels pairs, not triplets: if x appears three times, x ^ x ^ x = x, so the XOR of the whole array is not the single. We need to exploit "count mod 3" per bit.
Idea: Bit-count mod 3
For each bit position i, count how many numbers in the array have bit i set. If every other number appears 3 times, that count is 3 × (number of distinct tripled values with bit i set), plus 1 if the single number has bit i set and plus 0 otherwise. The remainder mod 3 is therefore exactly the i-th bit of the single number. So: for each bit i, set the i-th bit of the result to (count of numbers with bit i set) % 3.
def single_number_ii_bitcount(nums: list[int]) -> int:
    result = 0
    for i in range(32):  # assume 32-bit integers
        count = 0
        for x in nums:
            if (x >> i) & 1:
                count += 1
        result |= (count % 3) << i
    return result
This works for non-negative 32-bit integers. For Python's arbitrary-precision integers, you can use i up to max(nums).bit_length() or a fixed upper bound (e.g. 32 or 64) if the problem states a range.
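If inputs may be negative and the problem guarantees 32-bit signed values, one way to adapt the bit-count idea is to count bits 0..31 and then reinterpret bit 31 as the sign bit. The sketch below assumes that 32-bit range; the function name is illustrative.

```python
def single_number_ii_signed(nums: list[int]) -> int:
    """Bit-count mod 3 for values assumed to fit in signed 32-bit integers."""
    result = 0
    for i in range(32):
        # Python's >> on a negative int behaves like an arithmetic shift,
        # so (x >> i) & 1 reads bit i of the two's-complement pattern.
        count = sum((x >> i) & 1 for x in nums)
        if count % 3:
            result |= 1 << i
    # If bit 31 is set, the 32-bit pattern represents a negative number.
    if result >= 1 << 31:
        result -= 1 << 32
    return result


print(single_number_ii_signed([-4, 1, 1, 1]))  # -4
```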
Optimization: State machine (ones and twos)
We can simulate "count mod 3" per bit with two variables ones and twos: ones holds bits that have been set 1 time mod 3, twos holds bits set 2 times mod 3. After processing all numbers, ones is the single. The update rules (for each number x):
def single_number_ii_state(nums: list[int]) -> int:
    ones, twos = 0, 0
    for x in nums:
        ones = (ones ^ x) & ~twos  # add x to ones, but remove from ones if already in twos
        twos = (twos ^ x) & ~ones  # add x to twos, but remove from twos if now in ones
    return ones
Intuition: follow one bit of x through three occurrences. On the first occurrence the bit is XORed into ones (twos is clear, so the mask ~twos keeps it), and twos stays clear because of ~ones. On the second occurrence the XOR clears the bit from ones, and twos picks it up. On the third occurrence ones ^ x would set the bit again, but ~twos masks it out; then twos ^ x clears it from twos. So each bit is in ones exactly when its count mod 3 is 1. One pass, O(1) space.
Evolution: Brute Force → Better → Optimal
- Brute force: For each element, count how many times it appears with a nested loop. O(n²) time, O(1) space.
- Better (I, II & III): Hash map to count frequencies, then find the element(s) with count 1. O(n) time, O(n) space.
- Optimal: For I & III, the XOR solution: O(n) time, O(1) space. For II, bit-count or state machine: O(n) time, O(1) space (vs. O(n) space for the hash map).
Comparison Table
Problem           | Frequency of others | Approach                | Key idea
------------------|---------------------|-------------------------|------------------------
Single Number I   | twice               | XOR all                 | Pairs cancel
Single Number II  | three times         | Bit-count % 3 or state  | Triplets don't cancel
Single Number III | twice               | XOR all, then partition | a^b, split by diff bit
Time and Space Complexity
- Single Number I: O(n) time, O(1) space.
- Single Number III: O(n) time, O(1) space.
- Single Number II (bit-count): O(n × bits) time (e.g. O(32n)), O(1) space.
- Single Number II (state): O(n) time, O(1) space.
Edge Cases
- Smallest inputs: For I, an array of length 1 returns that element. For III, an array of length 2 means both elements are the "singles."
- Negative numbers (II): The ones/twos state machine uses only bitwise operators, which Python applies to negative integers as if they were in infinite two's complement, so it handles negatives directly. The bit-count version with a fixed range(32) loop does not: if the problem expects a fixed width, mask the inputs (e.g. x & 0xFFFFFFFF) and convert the result back to a signed value.
- Order of output (III): Returning [a, b] vs [b, a] is usually acceptable; sort if the problem requires sorted order.
Common Mistakes
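- Using plain XOR for Single Number II: the XOR of three copies of x is x, so the total XOR is single ^ (xor of all distinct triple values), which is not the single number.
- Single Number III: partitioning by an arbitrary bit instead of one that separates a and b. You must use a bit where a ^ b is 1, i.e. a set bit of xor_all, so that a and b go to different groups.
- Interview tip: "Single Number I: XOR all elements. Single Number III: XOR all to get a ^ b, take the lowest set bit with xor_all & -xor_all, partition array by that bit, XOR each group to get the two numbers. Single Number II: either count set bits per position mod 3, or use the ones/twos state machine. State the constraint (twice vs three times) and choose the right tool."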
Practice Problems
- LeetCode 136 — Single Number (I).
- LeetCode 137 — Single Number II.
- LeetCode 260 — Single Number III.
- Generalize: every element appears k times except one that appears once (use count mod k per bit or state with log₂(k) variables).
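The "count mod k per bit" generalization from the last practice problem might look like the sketch below. It assumes non-negative inputs that fit in the given bit width; the name single_among_k and the bits parameter are illustrative.

```python
def single_among_k(nums: list[int], k: int, bits: int = 32) -> int:
    """Every value appears k times except one that appears once (non-negative ints)."""
    result = 0
    for i in range(bits):
        # Values that appear k times contribute a multiple of k to this count.
        count = sum((x >> i) & 1 for x in nums)
        if count % k:
            result |= 1 << i
    return result


print(single_among_k([5, 5, 5, 5, 9], k=4))  # 9
```

With k = 3 this reduces to the bit-count solution for Single Number II.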
Summary
- Single Number I: XOR every element; pairs cancel → result is the single. O(n), O(1).
- Single Number III: XOR all → a^b; pick lowest set bit of that; partition by that bit, XOR each half → the two singles. O(n), O(1).
- Single Number II: XOR doesn't work (triplets don't cancel). Use bit-count mod 3 per bit, or ones/twos state machine. O(n), O(1).
- Recognize "pairs" (XOR) vs "two singles" (XOR + partition) vs "triplets" (mod 3 or state).
16.4 Subsets via Bitmask
Introduction
Subsets via bitmask is the technique of representing every subset of a set of
n elements as an integer from 0 to 2^n − 1: the
i-th bit is 1 if and only if the i-th element is in the subset. By iterating over all such
integers, we generate all 2^n subsets without recursion or backtracking, using just a loop over
masks and bit checks. This is the foundation for subset enumeration, brute-force over choices,
and bitmask DP (Section 15.14 / 16.6).
Real-World Analogy
Imagine a row of n light switches; each switch controls whether one item is "in" or
"out" of a subset. Every possible on/off configuration corresponds to one subset. If we label
configurations by counting in binary (0 = all off, 1 = only first on, 2 = only second on, …,
2^n − 1 = all on), then we have a one-to-one mapping between integers 0 to 2^n − 1
and subsets. Subsets via bitmask is: "for each number in that range, read the bits and build
the subset."
Formal Definition
Given a set of n elements (typically indexed 0 to n−1), a bitmask
is an integer m in the range 0 ≤ m < 2^n. The subset
represented by m is { i : the i-th bit of m is 1 }. So mask 0 is the
empty set, mask 2^n − 1 is the full set. Subset enumeration via
bitmask means iterating m = 0 to 2^n − 1 and, for each
m, building the subset of indices (or elements) where (m >> i) & 1
is 1.
Why This Topic Matters
- Exhaustive search: Many problems ask for "all subsets" or "try every combination of choices"; bitmask gives a simple, non-recursive way to iterate all 2^n subsets.
- Bitmask DP: In DP, state is often "subset of items chosen so far"; the state is stored as an integer mask. Subsets via bitmask is how you interpret and iterate those states.
- Interview staple: Subset Sum, Partition, "generate all subsets," and TSP-style problems often use or build on this idea.
Mental Model
- Mask = subset: Integer m from 0 to 2^n − 1 ↔ subset of {0, 1, …, n−1}. Bit i set ⇔ element i in subset.
- Iterate: Loop m = 0 to 2^n − 1; for each m, loop i = 0 to n−1 and if (m >> i) & 1, include element i.
- Cardinality: The number of elements in the subset is the number of set bits in m (popcount). Subset of size k ↔ mask with exactly k bits set.
Step-by-Step: Generate All Subsets
- Let arr be the list of n elements (or use indices 0..n−1).
- For m = 0 to 2^n − 1 (inclusive):
  - Build a list for this mask: for each i in 0..n−1, if (m >> i) & 1, append arr[i] (or i) to the current subset.
  - Yield or collect this subset (or process it directly).
ASCII Diagram: n = 3
Elements: [A, B, C] (indices 0, 1, 2)
n = 3 → 2^3 = 8 subsets
mask binary subset
0 000 {}
1 001 {A}
2 010 {B}
3 011 {A,B}
4 100 {C}
5 101 {A,C}
6 110 {B,C}
7 111 {A,B,C}
Python Implementation
def subsets_bitmask(arr: list) -> list[list]:
    """Generate all subsets of arr using bitmask. Order of subsets follows mask 0..2^n-1."""
    n = len(arr)
    result = []
    for m in range(1 << n):  # 1 << n = 2^n
        subset = []
        for i in range(n):
            if (m >> i) & 1:
                subset.append(arr[i])
        result.append(subset)
    return result
To get only subsets of a given size k, add a check: if bin(m).count('1') == k (or m.bit_count() == k in Python 3.10+) before appending.
Line-by-Line Explanation
- 1 << n: Same as 2^n; the number of masks.
- for m in range(1 << n): m takes values 0, 1, …, 2^n − 1.
- (m >> i) & 1: 1 if the i-th bit of m is set, 0 otherwise, so we include arr[i] exactly when bit i is 1.
- Each m produces one subset; the order follows the numeric value of the mask. It starts with the empty set but is not sorted by size (e.g. mask 3 = {0,1} comes before mask 4 = {2}).
Time and Space Complexity
- Time: O(n · 2^n): we iterate 2^n masks and for each mask check n bits and build a list of size at most n.
- Space: O(n · 2^n) for storing all subsets (2^n subsets, each of size up to n); if we only process one subset at a time and don't store all, space is O(n) for the current subset plus O(1) extra.
Generating all subsets is inherently exponential; the factor of n is from building each subset. For "process each subset without storing" use a generator to keep space O(n).
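A generator version of the enumeration, which keeps only the current subset in memory, can be sketched as follows. The name iter_subsets is an assumption for illustration.

```python
from typing import Iterator


def iter_subsets(arr: list) -> Iterator[list]:
    """Yield the subsets of arr one at a time, in mask order 0..2^n - 1."""
    n = len(arr)
    for m in range(1 << n):
        # Only the current subset is alive at any moment: O(n) extra space.
        yield [arr[i] for i in range(n) if (m >> i) & 1]


for subset in iter_subsets(["A", "B"]):
    print(subset)  # [], ['A'], ['B'], ['A', 'B'] on successive lines
```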
Edge Cases
- n = 0: range(1) gives one mask (0); the only subset is the empty list []. Correct.
- Large n: 2^n grows very fast (n=20 → about 1e6, n=25 → about 3.3e7). Use only when n is small enough; for larger n, backtracking with pruning is preferred.
- Subsets of indices vs elements: The code above uses arr[i] so subsets are over the actual elements; if you need indices, append i instead.
Common Mistakes
- Using range(2**n) instead of range(1 << n). Both are correct; 1 << n is the idiomatic form and avoids a general exponentiation. More importantly, ensure the upper bound is exclusive: masks are 0 to 2^n − 1, so range(1 << n) is correct (it stops before 2^n).
- Filtering by size the slow way: if you need only masks with exactly k set bits (subsets of size k), you can use for m in range(1 << n): if m.bit_count() == k: .... For larger n, use Gosper's hack or a recursive/backtracking generator to avoid iterating all 2^n masks when you only want a fixed size.
- Storing every subset when you only need to process them one at a time: for m in range(1 << n): subset = [arr[i] for i in range(n) if (m >> i) & 1]; .... This keeps memory O(n). For bitmask DP you don't build the subset list at all; you use the integer mask as the state key.
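Gosper's hack, mentioned above, steps directly from one mask with k set bits to the next larger one. The sketch below is one common formulation; the generator name masks_with_k_bits is illustrative.

```python
from typing import Iterator


def masks_with_k_bits(n: int, k: int) -> Iterator[int]:
    """Yield every n-bit mask with exactly k set bits, in increasing order."""
    if k == 0:
        yield 0
        return
    if k > n:
        return
    m = (1 << k) - 1   # smallest mask with k set bits
    limit = 1 << n
    while m < limit:
        yield m
        c = m & -m     # lowest set bit of m
        r = m + c      # carry ripples through the lowest block of ones
        # Move the leftover ones of that block down to the low end.
        m = (((r ^ m) >> 2) // c) | r


print(list(masks_with_k_bits(4, 2)))  # [3, 5, 6, 9, 10, 12]
```

This visits only C(n, k) masks instead of filtering all 2^n of them.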
Pattern Recognition
Whenever the problem asks for "all subsets," "every combination of choices from n items," or "try all 2^n possibilities," consider bitmask enumeration. If the problem then asks for the best subset satisfying a constraint (e.g. maximum sum, feasibility), you iterate masks and evaluate each, or use bitmask DP to avoid redoing work (Section 16.6).
Practice Problems
- Generate all subsets of an array (LeetCode 78 — Subsets).
- Generate all subsets of size k (combinations).
- Subset Sum: is there a subset that sums to target? (Iterate masks, sum selected elements.)
- Partition Equal Subset Sum: can the array be split into two subsets with equal sum? (Check if any subset sums to half total.)
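As a brute-force starting point for the last two practice problems, the masks can be evaluated directly without building subset lists. This is a sketch for small n only (it is O(n · 2^n)); the function names are illustrative.

```python
def has_subset_sum(nums: list[int], target: int) -> bool:
    """Return True if some subset of nums (possibly empty) sums to target."""
    n = len(nums)
    for m in range(1 << n):
        total = sum(nums[i] for i in range(n) if (m >> i) & 1)
        if total == target:
            return True
    return False


def can_partition(nums: list[int]) -> bool:
    """Equal-sum partition exists iff some subset sums to half the total."""
    total = sum(nums)
    return total % 2 == 0 and has_subset_sum(nums, total // 2)


print(has_subset_sum([3, 34, 4, 12, 5, 2], 9))  # True (4 + 5)
print(can_partition([1, 5, 11, 5]))             # True ([11] and [1, 5, 5])
```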
Summary
- Bitmask = subset: Integer m in [0, 2^n − 1]; bit i set ⇔ element i in subset.
- Enumerate all: For m from 0 to 2^n − 1, build subset = [ arr[i] for i in range(n) if (m >> i) & 1 ].
- Time: O(n · 2^n); space: O(n) per subset if not storing all.
- Foundation for brute-force subset problems and bitmask DP (state = mask).
16.5 Gray Code
Introduction
Gray code (also called reflected binary code) is an ordering of
the 2^n n-bit binary numbers such that consecutive numbers differ in exactly
one bit. This property is useful in error correction, analog-to-digital conversion,
and puzzles (e.g. Tower of Hanoi). In competitive programming and interviews, you may be asked
to generate the n-bit Gray code sequence (LeetCode 89) or to convert between binary and Gray
encoding. The key formula: the i-th Gray code value (0-indexed) is i ^ (i >> 1).
Real-World Analogy
Imagine a circular dial with 2^n positions, each labeled with an n-bit pattern. If two adjacent positions differed in several bits, a slight misalignment could read a completely wrong value. Gray code ensures that when you move from one position to the next, only one "digit" flips, so small errors cause at most a small misread. It's like a ruler where neighboring marks are as similar as possible so you never jump from 011 to 100 by a tiny error.
Formal Definition
The i-th value (0-indexed) of the n-bit reflected Gray code is gray(i) = i ⊕ (i >> 1) (XOR of
i with i right-shifted by 1). Equivalently, to convert binary b to Gray g:
g = b ^ (b >> 1). To convert Gray g back to binary b: start
with the MSB of b equal to the MSB of g, then each lower bit is b[i] = g[i] ⊕ b[i+1]
(or use a loop: b = g; while g: g >>= 1; b ^= g).
Why This Topic Matters
- Interview: LeetCode 89 "Gray Code" asks for the n-bit Gray code sequence; the one-liner i ^ (i >> 1) for i in range(2^n) is the expected solution.
- Single-bit change: When you need to enumerate all 2^n states but want consecutive states to differ by one bit (e.g. in some DP or search), Gray code gives that order.
- Conversion: Binary ↔ Gray conversion appears in encoding and hardware; the XOR formula is simple and O(1) per value.
Mental Model
- Sequence: List of 2^n numbers; each pair of neighbors differs in exactly one bit.
- Formula: The i-th Gray code number (0-indexed) = i ^ (i >> 1). No need to build the sequence recursively; just iterate i and apply the formula.
- Binary → Gray: g = b ^ (b >> 1).
- Gray → binary: repeatedly XOR with a shifted copy of the current value until the shifted value becomes 0.
Construction: Reflected Binary Gray Code
For n=1, the Gray code is just [0, 1]. For n=2, take the n=1 sequence, reflect it (so 1, 0), prefix the first half with 0 and the reflected half with 1: [00, 01, 11, 10] → decimals [0, 1, 3, 2]. For n=3, repeat: reflect [0,1,3,2] to [2,3,1,0], prefix first four with 0 and next four with 1 → [0,1,3,2,6,7,5,4]. The formula i ^ (i >> 1) gives exactly this sequence: for i = 0,1,2,…,7 we get 0,1,3,2,6,7,5,4.
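The reflect-and-prefix construction described above can be sketched iteratively and compared against the formula. The function name gray_code_reflected is an assumption for illustration.

```python
def gray_code_reflected(n: int) -> list[int]:
    """Build the n-bit Gray code by repeated reflection."""
    seq = [0]
    for bit in range(n):
        # First half keeps a 0 in the new top bit; the reflected half gets a 1.
        seq += [x | (1 << bit) for x in reversed(seq)]
    return seq


print(gray_code_reflected(3))  # [0, 1, 3, 2, 6, 7, 5, 4]
```

For any n this produces the same list as [i ^ (i >> 1) for i in range(1 << n)].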
ASCII Diagram: n = 3
i   binary(i)   i>>1   gray = i^(i>>1) (binary)   gray (decimal)
0   000         000    000                        0
1   001         000    001                        1
2   010         001    011                        3
3   011         001    010                        2
4   100         010    110                        6
5   101         010    111                        7
6   110         011    101                        5
7   111         011    100                        4
Gray sequence (order): 0, 1, 3, 2, 6, 7, 5, 4
Consecutive pairs differ in exactly one bit.
Python Implementation
Generate n-bit Gray code sequence (LeetCode 89)
def gray_code(n: int) -> list[int]:
    """Return the n-bit Gray code sequence (0 to 2^n - 1 in Gray order)."""
    return [i ^ (i >> 1) for i in range(1 << n)]
Binary to Gray
def binary_to_gray(b: int) -> int:
    return b ^ (b >> 1)
Gray to binary
def gray_to_binary(g: int) -> int:
    b = g
    while g:
        g >>= 1
        b ^= g
    return b
Line-by-Line Explanation
- gray_code(n): For each i from 0 to 2^n − 1, i ^ (i >> 1) produces the Gray code value at position i. The list is already in the required order (consecutive indices differ by 1 in binary, which yields consecutive Gray values differing in one bit).
- binary_to_gray: One XOR with the right-shifted value. Each bit of the result is the XOR of that bit and the next higher bit of b, which is the standard binary-to-Gray rule.
- gray_to_binary: We recover b from g. The MSB of b equals the MSB of g. For the next bit, b[i] = g[i] ⊕ b[i+1]. The loop does: repeatedly shift g right and XOR into b, so b accumulates the decoded binary. (Starting with b=g, then b ^= (g>>1), then b ^= (g>>2), … recovers b.)
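A quick way to gain confidence in the two conversions is to check the round trip and the one-bit property for small n. The two functions are restated so the snippet runs on its own; check_gray is a hypothetical helper name.

```python
def binary_to_gray(b: int) -> int:
    return b ^ (b >> 1)


def gray_to_binary(g: int) -> int:
    b = g
    while g:
        g >>= 1
        b ^= g
    return b


def check_gray(n: int) -> bool:
    """Verify round-trip and single-bit-change properties for n-bit codes."""
    size = 1 << n
    seq = [binary_to_gray(i) for i in range(size)]
    # Decoding the Gray value at index i must give back i.
    round_trip = all(gray_to_binary(g) == i for i, g in enumerate(seq))
    # Neighbours must differ in exactly one bit.
    one_bit = all(bin(seq[i] ^ seq[i + 1]).count("1") == 1 for i in range(size - 1))
    return round_trip and one_bit


print(all(check_gray(n) for n in range(9)))  # True
print(gray_to_binary(6))                     # 4: Gray value 6 sits at index 4
```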
Time and Space Complexity
- Generate sequence: O(2^n) time (one operation per value), O(2^n) space for the list. If we stream values (generator), space is O(1).
- Binary ↔ Gray (single value): O(1) for binary-to-Gray; O(number of bits of g), i.e. O(log g), for Gray-to-binary (the while loop runs once per bit).
Edge Cases
- n = 0: 2^0 = 1; the sequence is [0]. 0 ^ (0 >> 1) = 0. Correct.
- n = 1: Sequence [0, 1]; consecutive differ in one bit. Correct.
- Gray to binary for large g: In Python integers are arbitrary-precision; the while loop runs until g becomes 0, so it runs over the bit length of g.
Common Mistakes
- Confusing index and value: the Gray value at index i is i ^ (i >> 1), not the other way around. We iterate i (binary 0..2^n − 1) and compute the Gray value for that index.
- Using i ^ (i << 1) instead of i ^ (i >> 1). Gray code uses right shift: we XOR with the value that has bits shifted right by 1, so each Gray bit is (binary bit i) ⊕ (binary bit i+1).
- Output format: if the problem wants n-bit binary strings, use [format(i ^ (i >> 1), '0' + str(n) + 'b') for i in range(1 << n)]. Or pad with zeros to length n.
- Interview tip: "Gray code sequence: [i ^ (i >> 1) for i in range(2**n)]. Binary to Gray: b ^ (b >> 1). Gray to binary: b = g; while g: g >>= 1; b ^= g. LeetCode 89 is the standard problem."
Practice Problems
- LeetCode 89 — Gray Code: generate the sequence for n.
- Convert a given binary number to Gray and back; verify round-trip.
- Find the position (index) of a given Gray code value in the n-bit sequence (equivalent to Gray-to-binary).
Summary
- Gray code: Ordering of 0..2^n − 1 so consecutive values differ in exactly one bit.
- Generate: [ i ^ (i >> 1) for i in range(1 << n) ].
- Binary → Gray: g = b ^ (b >> 1).
- Gray → binary: b = g; while g: g >>= 1; b ^= g.
- Uses only XOR and shift; no recursion needed for the sequence.
16.6 Bit DP
Introduction
Bit DP (bitmask dynamic programming) is the technique of using a bitmask—
an integer whose bits represent a subset of n elements—as (part of) the state in dynamic
programming. Instead of storing "which items we've chosen" as a list or set, we encode it as a single
integer from 0 to 2^n − 1. The DP state is then something like dp[mask] or
dp[mask][j], and transitions correspond to flipping bits (adding or removing one element).
This gives O(2^n × poly(n)) solutions for problems like Traveling Salesman (TSP), assignment
(n tasks to n people), and "best subset" optimization. Bit DP is the natural combination of Section 16.4
(Subsets via Bitmask) and DP (Section 15): state = subset encoded as mask; transition = try one more
element. See also Section 15.14 (Bitmask DP) for more examples.
Real-World Analogy
Imagine you have a checklist of n cities to visit. Instead of writing down the list of cities you've already visited, you keep a single number: its binary digits are 1 for "visited" and 0 for "not visited." So the number 13 (binary 1101) means cities 0, 2, and 3 are visited. When you move to a new city, you flip one bit. In Bit DP we ask: "For every possible checklist (every mask), what is the best cost (or score) we can achieve?" We fill the table from small checklists (few 1s) to the full checklist (all 1s), so when we compute a state we have already computed the smaller states it depends on.
Formal Definition
In Bit DP the state contains a bitmask: dp[mask] = best value for the subset mask; dp[mask][i] = best value for
subset mask with some extra context (e.g. current position i). Transitions: from a
state (mask, …), try adding one new element j (set bit j): new_mask = mask | (1 << j); then
dp[new_mask][…] is updated from dp[mask][…] plus the cost or reward of including j. We iterate masks
so that when we compute dp[mask], all states for subsets of mask (fewer bits) are already computed.
Why This Topic Matters
- Exponential but feasible: For n up to about 20, 2^n is around 1e6; with a small polynomial factor (e.g. n or n²), Bit DP is the standard solution for "choose a subset and optimize."
- TSP and assignment: Traveling Salesman (visit all cities once, minimize cost) and "assign n tasks to n people" (minimize total cost) are classic Bit DP problems; interviews and contests ask them.
- Unifies bits and DP: You use bit operations (Section 16.1) to encode subsets (Section 16.4) and DP (Section 15) to avoid recomputing; mastering Bit DP shows you can combine these tools.
Bit Operations Recap (State = Mask)
These are the operations you need to implement Bit DP. Let mask be the current subset and i an element (index 0 to n−1).
- Include i: new_mask = mask | (1 << i)
- Exclude i / remove i: mask & ~(1 << i)
- Is i in mask? (mask >> i) & 1 or mask & (1 << i)
- Size of subset (popcount): mask.bit_count() (Python 3.10+) or bin(mask).count('1')
- Full set (all n elements): full = (1 << n) - 1
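The operations above, applied to one concrete mask. The values n = 4 and mask = 0b0101 are arbitrary choices for illustration.

```python
n = 4
mask = 0b0101                        # subset {0, 2}

with_1 = mask | (1 << 1)             # include 1  -> 0b0111
without_2 = mask & ~(1 << 2)         # remove 2   -> 0b0001
has_2 = bool(mask & (1 << 2))        # membership -> True
has_3 = bool((mask >> 3) & 1)        # membership -> False
size = bin(mask).count("1")          # popcount   -> 2
full = (1 << n) - 1                  # all n bits -> 0b1111
members = [i for i in range(n) if (mask >> i) & 1]  # [0, 2]

print(bin(with_1), bin(without_2), has_2, has_3, size, bin(full), members)
```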
Mental Model
- State: dp[mask] or dp[mask][j] = "best value when the subset chosen so far is mask" (and optionally "we are at j" or "last chosen was j").
- Base case: Usually the empty subset (mask = 0) or a subset with one element (mask = 1 << i).
- Transition: To extend mask by one element j (where j is not in mask), set new_mask = mask | (1 << j) and update dp[new_mask][…] from dp[mask][…] plus the cost/reward of adding j.
- Order: Iterate masks in increasing order (0 to 2^n − 1). Clearing a bit of mask always gives a smaller number, so every proper subset of mask is processed before mask itself. For TSP we need "mask without v," which is smaller than mask, so the same order works.
Step-by-Step: Designing a Bit DP
- Identify the "elements": What are the n items? (e.g. n cities, n tasks.)
- Define state: What do we need to remember? Usually "which subset" (mask) plus one more thing (e.g. current city, last task assigned). So state = (mask, j) → dp[mask][j].
- Base case: Smallest mask (e.g. mask with one element). Set dp[1<<i][i] = 0 or given value.
- Transition: For each mask and each "current" j in mask, how did we get here? From some previous state (smaller mask, different j). Write the recurrence (min/max/sum over possible predecessors).
- Answer: Usually dp[full_mask][…] or min/max over dp[full_mask][j] for all j, possibly plus a final step (e.g. return to start).
- Loop order: Iterate mask from 0 to 2^n − 1 so that any "mask without one element" is already computed.
Example 1: Traveling Salesman (TSP)
Problem: n cities (0 to n−1). Cost matrix cost[u][v]. Start at city 0, visit every city exactly once, return to 0. Minimize total cost.
State: dp[mask][v] = minimum cost to have visited exactly the set of cities in mask and currently be at city v (v must be in mask).
Base: dp[1 << 0][0] = 0 (we start at city 0; only city 0 is visited).
Transition: We reached (mask, v) by having been at some city u in the set "mask without v," then moving from u to v. So:
dp[mask][v] = min over u in (mask \ {v}) of ( dp[mask & ~(1<<v)][u] + cost[u][v] )
Answer: After visiting all cities, we are at some v; we need to go back to 0. So ans = min over v of ( dp[full][v] + cost[v][0] ).
Python: TSP (Bit DP)
def tsp(n, cost):
    """cost[i][j] = cost from i to j. Start at 0, visit all once, return to 0. Min total cost."""
    INF = float("inf")
    full = (1 << n) - 1
    dp = [[INF] * n for _ in range(1 << n)]
    dp[1 << 0][0] = 0
    for mask in range(1 << n):
        for v in range(n):
            if not (mask & (1 << v)):
                continue
            prev_mask = mask & ~(1 << v)  # mask without v
            for u in range(n):
                if not (prev_mask & (1 << u)):
                    continue
                if dp[prev_mask][u] != INF:
                    dp[mask][v] = min(dp[mask][v], dp[prev_mask][u] + cost[u][v])
    ans = INF
    for v in range(n):
        if dp[full][v] != INF:
            ans = min(ans, dp[full][v] + cost[v][0])
    return ans
Line-by-Line Explanation (TSP)
- full = (1 << n) - 1: The mask with all n bits set (all cities visited).
- dp[mask][v]: Minimum cost to visit exactly the cities in mask and end at v.
- Base: dp[1<<0][0] = 0: only city 0 visited, we're at 0, cost 0.
- For each mask and each v in mask: prev_mask = mask & ~(1<<v) is the set "mask without v." We must have come from some city u in prev_mask; the cost is dp[prev_mask][u] + cost[u][v]. We take the minimum over such u.
- Answer: For the full mask, we are at some v; add cost[v][0] to return to 0, and take the minimum over v.
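The same recurrence can also be written top-down with memoization, and a brute-force search over permutations gives an independent check on small inputs. Both functions below are sketches with illustrative names; they assume n ≥ 1 and a square cost matrix, and the two sample matrices are made up for the example.

```python
from functools import lru_cache
from itertools import permutations


def tsp_top_down(cost: list[list[int]]) -> float:
    """Memoized form of dp[mask][v]; assumes at least one city."""
    n = len(cost)
    full = (1 << n) - 1

    @lru_cache(maxsize=None)
    def best(mask: int, v: int) -> float:
        # Min cost to start at 0, visit exactly the cities in mask, end at v.
        if mask == 1:
            return 0 if v == 0 else float("inf")
        prev = mask & ~(1 << v)
        result = float("inf")
        for u in range(n):
            if prev & (1 << u):
                result = min(result, best(prev, u) + cost[u][v])
        return result

    return min(best(full, v) + cost[v][0] for v in range(n))


def tsp_bruteforce(cost: list[list[int]]) -> float:
    """Try every visiting order of cities 1..n-1; only feasible for tiny n."""
    n = len(cost)
    best = float("inf")
    for order in permutations(range(1, n)):
        route = (0, *order, 0)
        best = min(best, sum(cost[a][b] for a, b in zip(route, route[1:])))
    return best


sym = [[0, 10, 15, 20], [10, 0, 35, 25], [15, 35, 0, 30], [20, 25, 30, 0]]
print(tsp_top_down(sym), tsp_bruteforce(sym))  # 80 80
```

The brute force is O(n!) and is only meant as a test oracle; the memoized version has the same O(2^n × n²) bound as the bottom-up table.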
Example 2: Assignment (Min Cost to Assign n Tasks to n People)
Problem: n tasks, n people. cost[i][j] = cost if person j does task i. Each person does exactly one task; each task assigned to exactly one person. Minimize total cost.
State: Let mask be the set of people already used (bit j = 1 if person j has been given a task). We assign tasks in the fixed order 0, 1, 2, …, so when mask has k set bits, tasks 0..k−1 have been assigned. dp[mask] = minimum total cost to assign tasks 0..k−1 to exactly the people in mask, where k = popcount(mask).
Base: dp[0] = 0 (no tasks assigned, no people used).
Transition: The last task assigned is task k−1, and it went to some person j in mask. So dp[mask] = min over j in mask of ( dp[mask without j] + cost[k−1][j] ).
Answer: dp[full], where full = (1 << n) − 1.
def assignment_min_cost(cost):
    """cost[i][j] = cost for task i to be done by person j. n tasks, n people. Min total cost."""
    n = len(cost)
    full = (1 << n) - 1
    dp = [float("inf")] * (1 << n)
    dp[0] = 0
    for mask in range(1 << n):
        k = mask.bit_count()  # number of tasks assigned so far (tasks 0..k-1)
        for j in range(n):
            if not (mask & (1 << j)):
                continue
            prev = mask & ~(1 << j)
            dp[mask] = min(dp[mask], dp[prev] + cost[k - 1][j])
    return dp[full]
Here mask = set of people already assigned; k = number of people in mask = number of tasks we've assigned (tasks 0 to k−1). So the last task assigned was task k−1, and it was assigned to some person j in mask. Cost for that is cost[k−1][j]; the rest is dp[prev]. We minimize over j.
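For interpreters older than Python 3.10, where int.bit_count() is unavailable, the same DP can use bin(mask).count("1"). The sketch below restates it that way and checks it against a brute force over all permutations; the function names and the sample matrix are illustrative.

```python
from itertools import permutations


def assignment_bitdp(cost: list[list[int]]) -> float:
    """Same recurrence as above, with a popcount that works before Python 3.10."""
    n = len(cost)
    dp = [float("inf")] * (1 << n)
    dp[0] = 0
    for mask in range(1, 1 << n):
        k = bin(mask).count("1")  # tasks 0..k-1 are assigned
        for j in range(n):
            if mask & (1 << j):
                dp[mask] = min(dp[mask], dp[mask & ~(1 << j)] + cost[k - 1][j])
    return dp[(1 << n) - 1]


def assignment_bruteforce(cost: list[list[int]]) -> float:
    """p[i] is the person doing task i; try every permutation (tiny n only)."""
    n = len(cost)
    return min(sum(cost[i][p[i]] for i in range(n)) for p in permutations(range(n)))


cost = [[9, 2, 7], [6, 4, 3], [5, 8, 1]]
print(assignment_bitdp(cost), assignment_bruteforce(cost))  # 9 9
```

Here the optimum assigns task 0 to person 1, task 1 to person 0, and task 2 to person 2 (2 + 6 + 1 = 9).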
Time and Space Complexity
- TSP: O(2^n × n²) time: for each of 2^n masks and each v (n), we loop over u (n). Space O(2^n × n).
- Assignment: O(2^n × n) time: for each mask we iterate over n bits to find j in mask; popcount is O(1) in Python 3.10+. Space O(2^n).
In general, Bit DP is O(2^n × (number of states per mask) × (transition cost)). The 2^n factor is fixed; the rest depends on the problem.
Edge Cases
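- n = 0 or n = 1: TSP with one city: no travel; the code above returns cost[0][0], which is 0 for a standard cost matrix. With n = 0 the TSP code fails on dp[1 << 0][0], so add an explicit guard if empty input is possible. Assignment with n=1: one task, one person; return cost[0][0].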
- Unreachable / INF: If some costs are infinite, initialize dp with INF and only relax when the previous state is not INF; the answer may be INF if no valid tour or assignment exists.
- Base case mask: Ensure the base state (e.g. dp[1<<0][0] for TSP) is set before any state that depends on it; iterate from mask=0 so that small masks are computed first.
Common Mistakes
- Wrong bit operations: writing 1 < i instead of 1 << i; using + to add an element instead of | (1 << i); using - to remove instead of & ~(1 << i). Always use the correct bit ops so the mask truly represents a subset.
Pattern Recognition
Use Bit DP when: (1) You need to "choose a subset" of n elements (n small, e.g. ≤ 20). (2) The objective is min/max/count over valid subsets. (3) The constraint is "each element used at most once" or "visit each exactly once." (4) Subproblems naturally decompose by "which subset we've chosen so far" plus optional context (current position, last item). If the problem asks for "minimum cost to visit all" or "assign all with min cost," think Bit DP.
Practice Problems
- TSP: visit all cities once, return to start (min cost).
- Assignment: n tasks to n people, cost[i][j], min total cost.
- LeetCode 847 — Shortest Path Visiting All Nodes: state (mask, node), BFS or DP.
- Partition to K equal sum subsets (can use mask for "which elements in current subset" in backtracking; Bit DP if K and n are small).
Summary
- Bit DP: DP state includes a bitmask encoding a subset; transitions add or remove one element (flip one bit).
- State: dp[mask] or dp[mask][j]; mask = subset of {0,…,n−1}; use | (1<<i) to add, & ~(1<<i) to remove, (mask>>i)&1 to check.
- TSP: dp[mask][v] = min cost to visit mask and end at v; transition from dp[mask\v][u] + cost[u][v]; answer min_v dp[full][v] + cost[v][0].
- Assignment: dp[mask] = min cost to assign tasks 0..popcount(mask)-1 to people in mask; transition from dp[mask\j] + cost[k-1][j].
- Order: Iterate mask from 0 to 2^n − 1 so smaller subsets are ready when computing larger ones. Time O(2^n × poly(n)), space O(2^n × …).
Section 17: Greedy Algorithms
This section covers greedy algorithms: making the best local choice at each step in the hope that it leads to a global optimum. You will learn Activity Selection (maximum non-overlapping intervals), Fractional Knapsack, Huffman Coding, Job Sequencing, Interval Merging, and the Gas Station problem. Greedy works when the problem has optimal substructure and the greedy choice property—recognizing when that holds is a key skill for interviews and contests.
17.1 Activity Selection
Introduction
The Activity Selection problem: given n activities, each with a
start time and finish time, select the maximum number
of activities that can be performed by a single person (or resource) assuming only one activity at a
time. Two activities are compatible if they do not overlap—i.e. the finish time of one is less than
or equal to the start time of the other. This is one of the classic problems where a greedy algorithm
is optimal: sort by finish time and repeatedly pick the next activity that doesn't overlap the last
chosen one.
Real-World Analogy
Imagine you have a single meeting room and many meeting requests with fixed start and end times. You want to schedule as many meetings as possible. The greedy strategy: always choose the meeting that ends earliest among those that haven't started yet (and don't overlap the last one you picked). Ending early frees the room sooner, leaving more room for later meetings. This "earliest finish first" rule turns out to be optimal for maximizing the count.
Formal Definition
Input: n activities, where activity i has start time s[i] and finish time f[i] (assume s[i] < f[i]). Two activities i and j are compatible if they do not overlap: f[i] ≤ s[j] or f[j] ≤ s[i].
Goal: Select a set of mutually compatible activities of maximum size.
Greedy choice: Sort activities by finish time; then, in order, add an activity to the solution if its start time is at least the finish time of the last chosen activity.
Why This Topic Matters
- Classic greedy: Activity Selection is the standard first example of "greedy works here"; it teaches the pattern of sorting by one key (finish time) and scanning once.
- Interview staple: Same idea appears as "merge intervals," "non-overlapping intervals," "minimum rooms for meetings," and "maximum events you can attend."
- Optimal substructure: The problem has the property that an optimal solution contains an optimal solution to a smaller subproblem (activities that start after some time t).
Mental Model
- Sort by finish time: So we always consider "what ends first." That way we free the resource as early as possible.
- Greedy rule: After choosing some activities, the next valid choice is any activity whose start ≥ last finish. Among those, picking the one that finishes earliest (first in sorted order) is optimal.
- Why it works: If an optimal solution passes over an activity that ends earlier than the one it chose at some step, we can swap: the earlier-finishing activity is also compatible and leaves more room for the rest. So there is always an optimal solution that follows our greedy choices.
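The exchange argument can be sanity-checked empirically. The sketch below is an illustration, not a proof: it compares earliest-finish-first against an exhaustive search on small random instances. The helper names and the random test sizes are assumptions chosen for the example.

```python
from itertools import combinations
import random


def greedy_count(acts: list[tuple[int, int]]) -> int:
    """Earliest-finish-first count for (start, finish) pairs."""
    count, last_finish = 0, float("-inf")
    for s, f in sorted(acts, key=lambda a: a[1]):
        if s >= last_finish:
            count += 1
            last_finish = f
    return count


def brute_force_count(acts: list[tuple[int, int]]) -> int:
    """Size of the largest mutually compatible subset (all subsets tried; small n only)."""
    def compatible(subset) -> bool:
        # With start < finish, checking neighbours in finish order is sufficient.
        ordered = sorted(subset, key=lambda a: a[1])
        return all(ordered[k][1] <= ordered[k + 1][0] for k in range(len(ordered) - 1))

    for size in range(len(acts), 0, -1):
        if any(compatible(c) for c in combinations(acts, size)):
            return size
    return 0


random.seed(0)
for _ in range(200):
    acts = []
    for _ in range(random.randint(0, 7)):
        s = random.randint(0, 10)
        acts.append((s, s + random.randint(1, 4)))
    assert greedy_count(acts) == brute_force_count(acts)
```

Agreement on random inputs does not replace the swap argument, but a mismatch would immediately expose a wrong greedy rule (for example, sorting by start time).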
Step-by-Step Algorithm
- Sort activities by finish time (ascending). If finish times tie, any order is fine (or sort by start as tiebreaker).
- Initialize: last finish = −∞ (or a value before any start), count = 0 (or list of chosen indices).
- For each activity in sorted order: if activity.start ≥ last_finish, then choose it: set last_finish = activity.finish, increment count (or append to list).
- Return the count (or the list of chosen activities).
ASCII Diagram
Activities (start, finish): (1,3), (2,4), (3,5), (0,6), (5,7), (8,9)
Sorted by finish: (1,3) (2,4) (3,5) (0,6) (5,7) (8,9)
f=3 f=4 f=5 f=6 f=7 f=9
Greedy: pick (1,3) → last=3. (2,4) start 2 < 3 → skip. (3,5) start 3 ≥ 3 → pick, last=5.
(0,6) start 0 < 5 → skip. (5,7) start 5 ≥ 5 → pick, last=7. (8,9) start 8 ≥ 7 → pick.
Chosen: (1,3), (3,5), (5,7), (8,9) → 4 activities. Optimal.
Python Implementation
def activity_selection(starts: list[int], finishes: list[int]) -> list[int]:
    """Return indices of a maximum-size set of non-overlapping activities.
    starts[i], finishes[i] = start and finish time of activity i."""
    n = len(starts)
    # Sort by finish time; keep original index
    activities = [(finishes[i], starts[i], i) for i in range(n)]
    activities.sort()
    result = []
    last_finish = -1
    for f, s, idx in activities:
        if s >= last_finish:
            result.append(idx)
            last_finish = f
    return result
Line-by-Line Explanation
- We sort by (finish, start, index) so that activities are processed in order of increasing finish time; we keep the original index to return.
- last_finish = -1: initially no activity is chosen, so any start time ≥ −1 is valid (this sentinel assumes start times are non-negative; use −∞ otherwise).
- For each activity: if its start is at least last_finish, it doesn't overlap the last chosen one; we add it and update last_finish to this activity's finish.
- We only need to compare with the last chosen activity (not all previous) because we sorted by finish: the last chosen has the maximum finish among all chosen, so compatibility with it implies compatibility with the whole set.
Time and Space Complexity
- Time: O(n log n) — dominated by sorting. One pass over the sorted list is O(n).
- Space: O(n) for the list of (finish, start, index) and the result. If we only need the count, we can avoid storing indices and use O(1) extra beyond the sort.
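A count-only variant might look like the sketch below. It sorts the caller's list in place instead of building a list of tuples, so the function itself allocates nothing proportional to n (the sort routine may still use temporary memory internally). The name max_activities and the [start, finish] list format are assumptions for this example.

```python
def max_activities(intervals: list[list[int]]) -> int:
    """Count of a maximum set of non-overlapping [start, finish] intervals.

    Sorts the caller's list in place, so no auxiliary list is built.
    """
    intervals.sort(key=lambda iv: iv[1])  # ascending finish time
    count = 0
    last_finish = float("-inf")           # nothing chosen yet
    for start, finish in intervals:
        if start >= last_finish:          # touching intervals are compatible
            count += 1
            last_finish = finish
    return count


print(max_activities([[1, 3], [2, 4], [3, 5], [0, 6], [5, 7], [8, 9]]))  # 4
```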
Edge Cases
- Empty input: Return empty list (or count 0).
- Single activity: Return that one (start ≥ last_finish with last_finish = −1).
- All overlap: Only one activity can be chosen; the one that finishes first is selected.
- No overlap: All activities are compatible; we take all (each has start ≥ previous finish after sorting).
Common Mistakes
Use s >= last_finish (or strict > if intervals are closed, so that a shared endpoint counts as overlap). Don't use s > last_finish if the problem allows touching (one ends at 3, next starts at 3).
Practice Problems
- LeetCode 435 — Non-overlapping Intervals (min intervals to remove so rest are non-overlapping: equivalent to max intervals to keep).
- LeetCode 452 — Minimum Number of Arrows to Burst Balloons (interval overlap variant).
- Maximum number of meetings in one room (same as Activity Selection).
Summary
- Problem: Maximum number of non-overlapping activities (each has start, finish).
- Greedy: Sort by finish time; repeatedly pick the next activity with start ≥ last chosen finish.
- Why optimal: Earliest-finish-first leaves the resource free as early as possible; there is always an optimal solution that includes the greedy choice.
- Time: O(n log n); space: O(n).
17.2 Fractional Knapsack
Introduction
The Fractional Knapsack problem: given n items, each with a
weight and a value, and a knapsack of capacity W,
you may take any fraction of each item (e.g. half of an item). The goal is to
maximize the total value in the knapsack without exceeding capacity. Because fractions are allowed,
a greedy strategy is optimal: sort items by value per unit weight (value/weight)
in descending order, then repeatedly take as much as possible of the next item until the knapsack
is full. This contrasts with the 0/1 Knapsack (Section 15.6), where each item is taken whole or not—
that problem requires DP; Fractional Knapsack is solved by greedily choosing the "most valuable
per pound" first.
Real-World Analogy
Imagine you can fill a bag with grains: rice, gold dust, and sand. Each has a value per kilogram. You want to maximize the total value in the bag. The best strategy is to take as much as you can of the most valuable-per-kg material first (e.g. gold), then the next (e.g. rice), and only use the least valuable (sand) to top off the bag. That's exactly the fractional knapsack greedy: value per unit weight decides the order; you take whole items until capacity runs out, then a fraction of the next item if needed.
Formal Definition
Input: n items, where item i has weight w[i] and value v[i]; knapsack capacity W. We may take a fraction x[i] of item i (0 ≤ x[i] ≤ 1), contributing weight x[i]*w[i] and value x[i]*v[i].
Constraint: Total weight ≤ W.
Goal: Maximize total value.
Greedy: Sort items by v[i]/w[i] (value per unit weight) descending. In that order, take each item fully if capacity allows; otherwise take the fraction that fills the remaining capacity.
Why This Topic Matters
- Greedy vs DP: Fractional Knapsack is the classic example where greedy works because we can "take a fraction"; 0/1 Knapsack cannot be solved by this greedy (counterexample: one heavy high-value item vs many light medium-value items).
- Interview: Often asked as "maximum value you can get with capacity W if you can take fractions of items."
- Key idea: When the choice is continuous (fractions), "value per unit" is the right ordering; when the choice is discrete (whole items only), we need DP.
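To see why the density ordering is not enough once items must be taken whole, here is a small sketch comparing a density-ordered greedy restricted to whole items against an exhaustive 0/1 search. Both helper names and the instance are assumptions chosen for illustration; weights are assumed strictly positive.

```python
from itertools import combinations


def greedy_01_by_density(items: list[tuple[int, int]], capacity: int) -> int:
    """Density-ordered greedy restricted to whole items (NOT optimal for 0/1)."""
    total, remaining = 0, capacity
    for w, v in sorted(items, key=lambda it: it[1] / it[0], reverse=True):
        if w <= remaining:
            total += v
            remaining -= w
    return total


def best_01_brute_force(items: list[tuple[int, int]], capacity: int) -> int:
    """Best whole-item value, found by trying every subset (small n only)."""
    best = 0
    for r in range(len(items) + 1):
        for subset in combinations(items, r):
            if sum(w for w, _ in subset) <= capacity:
                best = max(best, sum(v for _, v in subset))
    return best


items = [(10, 60), (20, 100), (30, 120)]  # (weight, value); densities 6, 5, 4
print(greedy_01_by_density(items, 50))    # 160: takes the two densest, third no longer fits
print(best_01_brute_force(items, 50))     # 220: the two heavier items together
```

With fractions allowed, the same instance yields 60 + 100 + (20/30) × 120 = 240, which is at least the 0/1 optimum, as expected.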
Mental Model
- Value density: value/weight = how much value we get per unit capacity. We want to fill the knapsack with the "densest" value first.
- Greedy order: Process items from highest value/weight to lowest. For each item, take min(remaining capacity, item weight)—i.e. take the whole item or a fraction that fills the rest.
- Why optimal: If an optimal solution used less of a high-density item and more of a lower-density item, we could swap: replace some of the lower-density with high-density and get more value for the same weight. So an optimal solution always uses items in order of decreasing value/weight until capacity is used.
Step-by-Step Algorithm
- Compute value per unit weight for each item: ratio[i] = v[i] / w[i]. Handle w[i] == 0 (infinite ratio: take that item fully; or skip if no weight).
- Sort items by ratio descending (highest value-per-weight first).
- Initialize: remaining = W, total_value = 0.
- For each item in sorted order: take = min(remaining, w[i]); add take * (v[i]/w[i]) to total_value; subtract take from remaining. If remaining == 0, break.
- Return total_value (and optionally which fractions of which items were taken).
ASCII Diagram
Items: (weight, value) (2, 40), (3, 50), (5, 60)
Capacity W = 5
Ratio (v/w): 40/2=20, 50/3≈16.67, 60/5=12
Sorted by ratio: (2,40), (3,50), (5,60)
Take (2,40) fully: weight 2, value 40, remaining = 3.
Take (3,50) fully: weight 3, value 50, remaining = 0. Stop.
Total value = 40 + 50 = 90. (Item 3 not taken; we're full.)
Python Implementation
def fractional_knapsack(weights: list[float], values: list[float], capacity: float) -> float:
    """Maximize value with capacity W; fractions of items allowed. Returns max total value."""
    n = len(weights)
    # (ratio = value/weight, weight, value), sorted by ratio descending
    items = []
    for i in range(n):
        w, v = weights[i], values[i]
        ratio = v / w if w > 0 else float('inf')
        items.append((ratio, w, v))
    items.sort(key=lambda x: x[0], reverse=True)
    total_value = 0.0
    remaining = capacity
    for ratio, w, v in items:
        if w == 0:
            # Zero-weight item: free value. Handled separately because
            # take * ratio would be 0 * inf, which is nan.
            total_value += v
            continue
        if remaining <= 0:
            break
        take = min(remaining, w)
        total_value += take * ratio  # take * (v/w) = value taken
        remaining -= take
    return total_value
Line-by-Line Explanation
- ratio = v / w: value per unit weight. If w == 0, we take that item fully (ratio infinity) or define behavior (e.g. skip).
- sort(..., reverse=True): highest ratio first so we fill the knapsack with the most "valuable per kg" items first.
- take = min(remaining, w): take the whole item or only the fraction that fits.
- take * ratio is the value we get (take * (v/w) = (take/w) * v).
- We stop when remaining <= 0 (knapsack full) or when we've processed all items.
Time and Space Complexity
- Time: O(n log n) — sorting by ratio. The scan is O(n).
- Space: O(n) for the list of (ratio, w, v). In-place sorting of indices by ratio would also work and keep the same complexity.
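As a sketch of the index-sorting alternative, the version below sorts item indices by ratio and also reports the fraction x[i] taken of each item. The function name and return shape are assumptions for this example, and it assumes every weight is strictly positive.

```python
def fractional_knapsack_with_fractions(
    weights: list[float], values: list[float], capacity: float
) -> tuple[float, list[float]]:
    """Return (max value, x) where x[i] in [0, 1] is the fraction of item i taken.

    Assumes every weight is strictly positive.
    """
    # Sort indices, not items, so fractions can be reported in input order.
    order = sorted(range(len(weights)),
                   key=lambda i: values[i] / weights[i], reverse=True)
    fractions = [0.0] * len(weights)
    total_value = 0.0
    remaining = capacity
    for i in order:
        if remaining <= 0:
            break
        take = min(remaining, weights[i])
        fractions[i] = take / weights[i]
        total_value += fractions[i] * values[i]
        remaining -= take
    return total_value, fractions


value, x = fractional_knapsack_with_fractions([2, 3, 5], [40, 50, 60], 4)
print(value, x)  # item 0 fully, two thirds of item 1, none of item 2
```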
Edge Cases
- Capacity 0: Return 0 (we take nothing).
- Weight 0: Item contributes no weight but has value; ratio is infinite. Taking it fully gives free value. Note that take = min(remaining, w) is 0 for such an item, and 0 * inf evaluates to nan in Python, so zero-weight items should be handled explicitly by adding their full value (problem-dependent). Often problems assume w[i] > 0.
- Total weight < W: We take all items; remaining stays positive; total value is sum of all values.
Common Mistakes
Practice Problems
- Standard Fractional Knapsack (given weights, values, W; return max value).
- Same but return the list of (item index, fraction taken).
- Compare with 0/1 Knapsack: same instance, show that fractional optimal ≥ 0/1 optimal, and that greedy fractional can exceed 0/1 optimal.
Summary
- Problem: Maximize value with capacity W; any fraction of each item allowed.
- Greedy: Sort by value/weight (desc); take each item fully or the fraction that fills remaining capacity.
- Why optimal: Value-per-weight ordering ensures we never prefer a lower-density item over a higher-density one; swapping would improve.
- Time: O(n log n); space: O(n). Does not work for 0/1 Knapsack—use DP there.
17.3 Huffman Coding
Introduction
Huffman Coding is a greedy algorithm used for lossless data compression. Given symbols and their frequencies, it builds a binary prefix code so that frequent symbols get shorter codes and rare symbols get longer codes. The objective is to minimize the total number of bits needed to encode the data. Huffman coding is widely used in compression systems (for example, parts of ZIP, PNG, and JPEG pipelines).
Real-World Analogy
Imagine sending messages where some words appear very often ("the", "is") and others are rare. If every word had a fixed-length code, you waste bits on common words. A smarter strategy is to assign very short codes to common words and longer codes to rare words. Huffman coding does exactly this in an optimal way while ensuring the decoding process is unambiguous.
Formal Definition
Given a set of symbols S = {s1, s2, ..., sk} with frequencies f1, f2, ..., fk, find a prefix-free binary code (no code is a prefix of another) minimizing sum(fi * code_length(si)). Huffman's greedy solution repeatedly combines the two lowest-frequency nodes into one parent node until a single tree remains. Left edge is often labeled 0, right edge 1.
Why This Topic Matters
- Core greedy proof pattern: "pick smallest two repeatedly" is a classic example of greedy choice plus optimal substructure.
- Practical systems: Understanding Huffman helps you reason about real compression formats and encoding tradeoffs.
- Interview-ready: Commonly appears with heaps/priority queues and tree construction questions.
Mental Model
- Each symbol starts as a leaf with weight = frequency.
- Always merge the two lightest trees first (greedy choice).
- The merged node has weight equal to the sum of child weights.
- Repeat until one tree remains; path from root to leaf is the symbol's code.
- Frequent symbols tend to stay near the root, so they get shorter codes.
Step-by-Step Breakdown
- Create a min-heap (priority queue) of nodes: each node is (frequency, symbol/tree).
- While heap size > 1:
  - Pop the two smallest nodes x and y.
  - Create a parent node with frequency x.freq + y.freq.
  - Set parent.left = x (bit 0), parent.right = y (bit 1).
  - Push parent back into heap.
- The remaining node is the root of the Huffman tree.
- DFS from root to assign codes: append '0' on left, '1' on right.
ASCII Diagram
Example frequencies:
A:5 B:9 C:12 D:13 E:16 F:45
Merge smallest repeatedly:
5+9=14
12+13=25
14+16=30
25+30=55
45+55=100 (root)
One possible tree (0=left,1=right):
            [100]
           /     \
        F:45     [55]
               /      \
            [25]      [30]
           /   \      /   \
        C:12  D:13  [14]  E:16
                    /  \
                  A:5  B:9
Codes (one valid assignment):
F:0
C:100
D:101
A:1100
B:1101
E:111
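The code table above can be checked mechanically. The sketch below verifies that it is prefix-free and computes the weighted length sum(fi * code_length(si)); the helper names are assumptions for this example.

```python
def is_prefix_free(codes: dict[str, str]) -> bool:
    """True if no code word is a prefix of a different code word."""
    words = sorted(codes.values())
    # After lexicographic sorting, a prefix sits immediately before a word it prefixes.
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))


def weighted_length(freqs: dict[str, int], codes: dict[str, str]) -> int:
    """Total number of bits needed to encode every symbol occurrence."""
    return sum(freqs[s] * len(codes[s]) for s in freqs)


freqs = {"A": 5, "B": 9, "C": 12, "D": 13, "E": 16, "F": 45}
codes = {"F": "0", "C": "100", "D": "101", "A": "1100", "B": "1101", "E": "111"}
print(is_prefix_free(codes), weighted_length(freqs, codes))  # True 224
```

For comparison, a fixed-length code for six symbols needs 3 bits per symbol, i.e. 300 bits for the same 100 occurrences.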
Python Implementation
import heapq
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    freq: int
    ch: Optional[str] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def huffman_codes(freq_map: dict[str, int]) -> dict[str, str]:
    """
    Build Huffman codes for symbols in freq_map.
    Returns mapping: symbol -> binary code.
    """
    heap = []
    uid = 0  # tie-breaker for stable heap ordering
    for ch, freq in freq_map.items():
        heapq.heappush(heap, (freq, uid, Node(freq=freq, ch=ch)))
        uid += 1
    if not heap:
        return {}
    # Special case: one symbol -> assign "0"
    if len(heap) == 1:
        only = heap[0][2]
        return {only.ch: "0"}
    while len(heap) > 1:
        f1, _, n1 = heapq.heappop(heap)
        f2, _, n2 = heapq.heappop(heap)
        parent = Node(freq=f1 + f2, left=n1, right=n2)
        heapq.heappush(heap, (parent.freq, uid, parent))
        uid += 1
    root = heap[0][2]
    codes = {}

    def dfs(node: Node, path: str) -> None:
        if node.ch is not None:
            codes[node.ch] = path
            return
        dfs(node.left, path + "0")
        dfs(node.right, path + "1")

    dfs(root, "")
    return codes
Line-by-Line Explanation
- Use a min-heap so we can always extract the two smallest frequencies in O(log n).
- uid avoids comparison errors when frequencies tie (heap entries stay comparable).
- Each merge creates an internal node with summed frequency.
- After all merges, one root remains: that root encodes all symbols.
- DFS builds codes from root to leaves; left adds '0', right adds '1'.
- Single-symbol input is a special edge case: assign code "0" so encoding isn't empty.
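Once a code table exists, encoding and decoding are short. The sketch below is self-contained and uses a hard-coded prefix-free table (one valid Huffman table for frequencies A:5 B:9 C:12 D:13 E:16 F:45) rather than calling the builder; the function names encode and decode are assumptions for this example.

```python
def encode(text: str, codes: dict[str, str]) -> str:
    """Concatenate the code word of each symbol."""
    return "".join(codes[ch] for ch in text)


def decode(bits: str, codes: dict[str, str]) -> str:
    """Left-to-right decoding; unambiguous because the code is prefix-free."""
    by_code = {code: sym for sym, code in codes.items()}
    out, buffer = [], ""
    for bit in bits:
        buffer += bit
        if buffer in by_code:  # a complete code word has been read
            out.append(by_code[buffer])
            buffer = ""
    if buffer:
        raise ValueError("trailing bits do not form a complete code word")
    return "".join(out)


codes = {"F": "0", "C": "100", "D": "101", "A": "1100", "B": "1101", "E": "111"}
bits = encode("FACE", codes)
print(bits, decode(bits, codes))  # 01100100111 FACE
```

A production decoder would usually walk the tree bit by bit instead of growing a string buffer; the dictionary lookup is used here only to keep the sketch short.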
Brute Force → Better → Optimal
Brute Force (impractical)
Enumerate all prefix-free binary code assignments and pick the best weighted length. This search space is enormous and not feasible.
Better Intuition
We suspect low-frequency symbols should be deeper and high-frequency symbols shallower, but manually enforcing this for all symbols is still hard.
Optimal Greedy (Huffman)
Merge the two least frequent symbols/subtrees first, reduce the problem size by one, and repeat. Using a min-heap gives O(k log k), where k is number of distinct symbols.
Time Complexity
- Building heap: O(k) to heapify or O(k log k) with repeated push (both acceptable).
- Merges: k−1 merges, each with pop/pop/push = O(log k), total O(k log k).
- DFS code generation: O(k) nodes/leaves traversal.
- Overall: O(k log k).
Space Complexity
- Heap stores up to O(k) nodes.
- Tree has O(k) leaves and O(k) internal nodes, so O(k) total auxiliary space.
- Code map stores one code per symbol: O(k).
Edge Cases
- Empty frequency map: return empty code map.
- One symbol: assign "0" (or "1") by convention.
- Tied frequencies: multiple valid Huffman trees/codes may exist; total encoded length remains optimal.
Common Mistakes
Pattern Recognition
If a problem says "combine two minimum-cost elements repeatedly" and each merge contributes to future cost, think min-heap greedy. Huffman is the canonical version of this pattern.
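The bare pattern, without building a tree, can be sketched as follows: repeatedly merge the two smallest weights and accumulate the merge sums. The function name is an assumption for this example. On Huffman frequencies, the accumulated total equals the weighted code length of the resulting code.

```python
import heapq


def min_total_merge_cost(weights: list[int]) -> int:
    """Total cost of repeatedly merging the two smallest weights into their sum."""
    heap = list(weights)  # copy so the caller's list is untouched
    heapq.heapify(heap)
    total = 0
    while len(heap) > 1:
        merged = heapq.heappop(heap) + heapq.heappop(heap)
        total += merged   # each merge adds one bit to every symbol beneath it
        heapq.heappush(heap, merged)
    return total


print(min_total_merge_cost([5, 9, 12, 13, 16, 45]))  # 14 + 25 + 30 + 55 + 100 = 224
```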
Practice Problems
- Given symbol frequencies, build Huffman codes and compute total encoded bits.
- Given a Huffman tree, output all symbol codes.
- Given text, build frequency map, compress with Huffman codes, and decode back.
Summary
- Goal: Minimum weighted prefix-free binary encoding.
- Greedy rule: Always merge the two least frequent nodes.
- Data structure: Min-heap for efficient repeated smallest extraction.
- Result: Frequent symbols get shorter codes, rare symbols longer codes.
- Complexity: O(k log k) time, O(k) space.
17.4 Job Sequencing
Introduction
Job Sequencing with Deadlines is a classic greedy scheduling problem. Each job takes exactly one unit of time, has a deadline (latest time slot by which it must be finished), and gives a profit if completed by deadline. We can do at most one job per time slot. The goal is to choose and schedule jobs to maximize total profit.
The key greedy strategy is: sort jobs by profit descending, and for each job place it in the latest available slot not exceeding its deadline. This preserves earlier slots for other jobs and leads to optimal profit.
Real-World Analogy
Imagine you are a freelancer with limited daily slots and many client tasks. Each task pays differently and has a latest acceptable submission day. You want maximum earnings. Intuitively, you pick the highest-paying task first. But you schedule it as late as possible before its deadline so earlier slots remain open for other tasks. That is exactly the job sequencing greedy rule.
Formal Definition
Input: n jobs, each given as (id, deadline, profit). Every job requires one unit of time. Time slots are usually 1..D, where D = max(deadline). A job with deadline d must be scheduled in some slot t ≤ d. Objective: maximize total profit.
Greedy algorithm:
- Sort jobs by decreasing profit.
- For each job, assign it to the latest free slot ≤ its deadline.
- If no such slot exists, skip the job.
Why This Topic Matters
- Interview classic: Frequently asked in greedy sections, often with "return max jobs and max profit".
- Greedy intuition: Shows how local choices (highest profit first) combine with careful placement (latest valid slot).
- Foundation for scheduling: Builds intuition for interval scheduling and deadline-based planning problems.
Mental Model
- Think of deadlines as limited parking spaces in time.
- High-profit jobs are "more valuable cars" that should get priority.
- Park each chosen job in the rightmost spot it can legally occupy.
- Rightmost placement keeps left spots available for jobs with earlier deadlines.
Brute Force → Better → Optimal
Brute Force
Try all subsets of jobs, and for each subset check if it can be scheduled before deadlines, then compute profit. This is exponential (O(2^n * n log n) or worse), not feasible for large n.
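A brute-force reference might look like the sketch below; it is useful mainly for validating the greedy on small inputs. It relies on the standard feasibility test for unit-time jobs: a set can be scheduled if and only if, after sorting by deadline, the k-th job (1-based) has deadline ≥ k. The (deadline, profit) tuple format and function names are assumptions, and profits are assumed non-negative.

```python
from itertools import combinations


def feasible(subset) -> bool:
    """subset holds (deadline, profit) pairs of unit-time jobs; slots are 1, 2, ..."""
    deadlines = sorted(d for d, _ in subset)
    return all(d >= k for k, d in enumerate(deadlines, start=1))


def brute_force_max_profit(jobs: list[tuple[int, int]]) -> int:
    """Best total profit over all schedulable subsets (exponential; small n only)."""
    best = 0
    for r in range(1, len(jobs) + 1):
        for subset in combinations(jobs, r):
            if feasible(subset):
                best = max(best, sum(p for _, p in subset))
    return best


jobs = [(2, 100), (1, 19), (2, 27), (1, 25), (3, 15)]  # (deadline, profit)
print(brute_force_max_profit(jobs))  # 142
```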
Better
Sort by profit and for each job linearly scan backward for a free slot. This is the classic greedy implementation: O(n log n + n * D), where D is max deadline (or O(n^2) in worst case when D ~ n).
Optimal (with DSU / Disjoint Set)
Use union-find to quickly locate the latest free slot ≤ deadline in almost O(1) amortized time. Complexity becomes O(n log n + n * alpha(n)), typically written O(n log n). This is preferred when deadlines are large.
Step-by-Step Greedy (Simple Slot Array)
- Sort all jobs by profit descending.
- Compute max_deadline.
- Create slots array of size max_deadline + 1 initialized empty.
- For each job, scan from min(deadline, max_deadline) down to 1.
- If a free slot is found, place job there and add its profit.
- Return selected job sequence (or count) and total profit.
ASCII Diagram
Jobs: (id, deadline, profit)
J1(2,100), J2(1,19), J3(2,27), J4(1,25), J5(3,15)
Sort by profit: J1(2,100), J3(2,27), J4(1,25), J2(1,19), J5(3,15)
Slots:                                            [1] [2] [3]
Start empty:                                       _   _   _
Place J1 at latest <=2 -> slot2                    _   J1  _
Place J3 at latest <=2 -> slot1 (slot2 filled)     J3  J1  _
J4 deadline1 -> slot1 occupied, skip
J2 deadline1 -> slot1 occupied, skip
Place J5 at latest <=3 -> slot3                    J3  J1  J5
Total profit = 27 + 100 + 15 = 142
Python Implementation (Simple Greedy)
from dataclasses import dataclass

@dataclass
class Job:
    job_id: str
    deadline: int
    profit: int

def job_sequencing(jobs: list[Job]) -> tuple[list[str], int]:
    """
    Returns (scheduled_job_ids_in_slot_order, max_profit).
    Each job takes 1 unit time and must be done by its deadline.
    """
    if not jobs:
        return ([], 0)
    jobs.sort(key=lambda j: j.profit, reverse=True)
    max_deadline = max(job.deadline for job in jobs)
    slots = [None] * (max_deadline + 1)  # index 0 unused
    total_profit = 0
    for job in jobs:
        for t in range(min(job.deadline, max_deadline), 0, -1):
            if slots[t] is None:
                slots[t] = job
                total_profit += job.profit
                break
    schedule = [slots[t].job_id for t in range(1, max_deadline + 1) if slots[t] is not None]
    return (schedule, total_profit)
Line-by-Line Explanation
- Sort by highest profit first so most valuable jobs get first chance.
- Slots represent time 1..max_deadline; one job per slot.
- For each job, scan backward from its deadline to find the latest free valid slot.
- When scheduled, add profit once and move to next job.
- Backward scan is crucial; forward scan can block future jobs with tighter deadlines.
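A two-job instance is enough to show why the scan direction matters. The sketch below runs the same profit-ordered greedy with either scan direction; the function name, the latest_first flag, and the (deadline, profit) tuple format are assumptions for this example.

```python
def schedule_profit(jobs: list[tuple[int, int]], latest_first: bool) -> int:
    """jobs are (deadline, profit) pairs; returns the profit of the greedy schedule.

    latest_first=True scans slots deadline..1 (the backward rule);
    latest_first=False scans 1..deadline (the flawed forward variant).
    """
    max_deadline = max((d for d, _ in jobs), default=0)
    used = [False] * (max_deadline + 1)  # index 0 unused
    total = 0
    for deadline, profit in sorted(jobs, key=lambda j: j[1], reverse=True):
        slots = range(deadline, 0, -1) if latest_first else range(1, deadline + 1)
        for t in slots:
            if not used[t]:
                used[t] = True
                total += profit
                break
    return total


jobs = [(2, 100), (1, 50)]
print(schedule_profit(jobs, latest_first=True))   # 150: slot 2 for the first job, slot 1 for the second
print(schedule_profit(jobs, latest_first=False))  # 100: the first job takes slot 1 and blocks the second
```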
Optimization Insight (DSU Approach)
Let parent[t] point to the latest available slot ≤ t. When slot s is occupied, union it with s-1, meaning the next query for s jumps directly to the next free candidate. This reduces scheduling lookup to near-constant amortized time.
Python Implementation (DSU Optimized)
def job_sequencing_dsu(jobs: list[Job]) -> tuple[list[str], int]:
    if not jobs:
        return ([], 0)
    jobs.sort(key=lambda j: j.profit, reverse=True)
    max_deadline = max(job.deadline for job in jobs)
    parent = list(range(max_deadline + 1))  # parent[t] = best available slot <= t
    slot_job = [None] * (max_deadline + 1)

    def find(x: int) -> int:
        # Iterative find with path compression (avoids deep recursion on long chains)
        root = x
        while parent[root] != root:
            root = parent[root]
        while parent[x] != root:
            parent[x], x = root, parent[x]
        return root

    total_profit = 0
    for job in jobs:
        if job.deadline <= 0:
            continue  # unschedulable; a negative index would wrap around the list
        s = find(min(job.deadline, max_deadline))
        if s > 0:
            slot_job[s] = job
            total_profit += job.profit
            parent[s] = find(s - 1)  # mark s as used
    schedule = [slot_job[t].job_id for t in range(1, max_deadline + 1) if slot_job[t] is not None]
    return (schedule, total_profit)
Time Complexity
- Simple greedy: Sorting O(n log n), placement O(n * D) in worst case, where D=max deadline.
- If D ~ n: often simplified as O(n^2).
- DSU optimized: Sorting O(n log n), each find/union near O(alpha(n)), total O(n log n).
Space Complexity
- Simple version: O(D) for slots.
- DSU version: O(D) for parent + slot tracking arrays.
- Additional sorting overhead depends on language implementation, typically O(n).
Edge Cases
- No jobs: answer is empty schedule and profit 0.
- Deadlines <= 0: such jobs are unschedulable in 1-based slot model; skip or prefilter.
- Same deadlines/profits: multiple optimal schedules may exist with same profit.
- Very large deadlines: if max deadline is huge but n is small, coordinate-compress deadlines or cap relevant slots to n.
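Capping deadlines at n can be sketched as below. Since at most n jobs are ever scheduled, any schedulable set also fits in slots 1..n, so replacing each deadline d by min(d, n) leaves the optimum unchanged while keeping the slot array at size n + 1. The function name and the (deadline, profit) tuple format are assumptions for this example.

```python
def max_profit_capped(jobs: list[tuple[int, int]]) -> int:
    """jobs are (deadline, profit) pairs with unit duration.

    Deadlines are capped at n = len(jobs), so memory is O(n) even when
    the raw deadlines are huge.
    """
    n = len(jobs)
    used = [False] * (n + 1)  # slots 1..n; index 0 unused
    total = 0
    for deadline, profit in sorted(jobs, key=lambda j: j[1], reverse=True):
        for t in range(min(deadline, n), 0, -1):  # deadlines <= 0 yield no slots
            if not used[t]:
                used[t] = True
                total += profit
                break
    return total


print(max_profit_capped([(10**9, 5), (10**9, 7), (1, 3)]))  # 15
```

Without the cap, the simple slot-array version would try to allocate about 10^9 entries for this three-job input.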
Common Mistakes
Pattern Recognition
When you see tasks with unit duration, deadlines, and profits where objective is maximize total profit, think of this exact greedy pattern: highest profit first + latest valid slot placement.
Practice Problems
- Standard Job Sequencing Problem (maximize count and profit).
- Return scheduled job IDs in slot order and total profit.
- Implement both backward-scan and DSU versions; compare runtime on large random tests.
Summary
- Goal: Maximize profit under deadlines with unit-time jobs.
- Greedy choice: Process jobs by highest profit first.
- Placement rule: Put each chosen job in the latest available slot <= deadline.
- Complexity: O(n log n + nD) simple, O(n log n) with DSU optimization.
17.5 Interval Merging
Introduction
Interval Merging means combining overlapping intervals into disjoint intervals that cover
the same ranges. Given intervals like [1,3] and [2,6], they overlap, so they merge
into [1,6]. This is a core greedy pattern used in scheduling, calendar systems, event conflict
detection, and range normalization.
Real-World Analogy
Think of booked times on a calendar. If one meeting is 10:00–11:00 and another is 10:30–12:00, you can summarize occupied time as 10:00–12:00. If a third meeting is 13:00–14:00, that remains separate. Merging intervals creates a clean "occupied blocks" view.
Formal Definition
Input: a list of intervals [start, end] with start <= end.
Output: a list of non-overlapping intervals where:
- Union of output intervals equals union of input intervals.
- No two output intervals overlap.
Greedy approach:
- Sort intervals by start time.
- Scan from left to right, merging into the last output interval when overlap exists.
Why This Topic Matters
- Interview frequent: LeetCode 56 (Merge Intervals) is one of the most common interval problems.
- Foundational pattern: Sorting + linear scan appears in many interval tasks (insert interval, erase overlap, meeting rooms).
- Data cleaning: Practical when normalizing time ranges, IP ranges, or numeric segments.
Mental Model
- After sorting by start, any future overlap can only happen with the most recently merged interval.
- Keep a result list; compare current interval with result[-1].
- If overlap: extend end boundary.
- If no overlap: start a new merged block.
Step-by-Step Breakdown
- If input is empty, return empty list.
- Sort intervals by start ascending (tie-break by end ascending is fine).
- Initialize result with first interval.
- For each next interval [s, e]:
  - If s <= last_end, overlap exists; update last_end = max(last_end, e).
  - Else, append [s, e] as a new interval.
- Return result.
ASCII Diagram
Input: [1,3] [2,6] [8,10] [15,18]
Sorted (already sorted): [1,3], [2,6], [8,10], [15,18]
Scan:
result = [1,3]
[2,6] overlaps with [1,3] -> merge -> [1,6]
[8,10] no overlap with [1,6] -> append
[15,18] no overlap with [8,10] -> append
Output: [1,6], [8,10], [15,18]
Python Implementation
def merge_intervals(intervals: list[list[int]]) -> list[list[int]]:
    """
    Merge overlapping intervals.
    Intervals are inclusive ranges [start, end].
    """
    if not intervals:
        return []
    intervals.sort(key=lambda x: x[0])
    merged = [intervals[0][:]]  # copy first interval
    for s, e in intervals[1:]:
        last = merged[-1]
        if s <= last[1]:  # overlap
            last[1] = max(last[1], e)
        else:
            merged.append([s, e])
    return merged
Line-by-Line Explanation
- Sort by start so potential overlaps become adjacent.
- merged stores normalized disjoint intervals built so far.
- Only compare current interval with merged[-1], not all previous intervals.
- Overlap condition s <= last_end merges touching/inclusive intervals.
- Non-overlap starts a new block in output.
Brute Force → Better → Optimal
Brute Force
Repeatedly compare every pair, merge overlaps, restart until stable. This can devolve into O(n^2) or worse.
Better
Sort first, then only compare neighbors conceptually during one pass. This avoids repeated global scans.
Optimal (comparison model)
O(n log n) due to sorting lower bound. After sorting, the merge scan is O(n). Total O(n log n) is optimal for arbitrary unsorted input.
Time Complexity
- Sorting: O(n log n)
- Single pass merge: O(n)
- Overall: O(n log n)
Space Complexity
- Output: O(n) in worst case (no intervals overlap).
- Extra: O(1) beyond output if sorting in place (language-dependent sort stack may add O(log n)).
Edge Cases
- Empty input: return [].
- Single interval: return it unchanged.
- Fully nested: [1,10] and [2,3] -> keep [1,10].
- Touching intervals: [1,2] and [2,3] merge if inclusive endpoints are intended.
Common Mistakes
Confusing s <= last_end (inclusive merge) with s < last_end (strict overlap only): pick the comparison that matches the problem's endpoint convention.
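The difference between the two comparisons is easiest to see on touching intervals. The sketch below exposes the choice as a flag; the function name merge and the touching_merges parameter are assumptions for this example.

```python
def merge(intervals: list[tuple[int, int]], touching_merges: bool = True) -> list[tuple[int, int]]:
    """Merge intervals using s <= last_end (touching_merges=True) or s < last_end (False)."""
    merged: list[tuple[int, int]] = []
    for s, e in sorted(intervals):
        if merged and (s <= merged[-1][1] if touching_merges else s < merged[-1][1]):
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged


print(merge([(1, 2), (2, 3), (5, 6)]))                         # [(1, 3), (5, 6)]
print(merge([(1, 2), (2, 3), (5, 6)], touching_merges=False))  # [(1, 2), (2, 3), (5, 6)]
```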
Pattern Recognition
Whenever input is intervals/ranges and you need normalized non-overlapping blocks, start with sort by start + linear scan. This pattern also appears in Insert Interval and many event timeline problems.
Practice Problems
- LeetCode 56 — Merge Intervals.
- LeetCode 57 — Insert Interval.
- LeetCode 435 — Non-overlapping Intervals (related greedy variant).
- LeetCode 452 — Minimum Number of Arrows to Burst Balloons.
Summary
- Core idea: Sort intervals by start, then merge overlaps in one pass.
- Overlap check: compare current start with last merged end.
- Complexity: O(n log n) time, O(n) output space.
- Key caution: define inclusive vs strict overlap correctly.
17.6 Gas Station Problem
Introduction
The Gas Station Problem asks: given two arrays gas and cost,
where gas[i] is fuel available at station i and cost[i] is fuel needed to go
from station i to i+1 (circularly), can we complete one full cycle? If yes, return a valid starting
station index; otherwise return -1.
This is a famous greedy problem because a naive "try every start" approach is O(n^2), while the optimal greedy solution is O(n): one pass, constant extra space, and a clear correctness argument.
Real-World Analogy
Imagine driving around a circular highway with fuel pumps at checkpoints. At each checkpoint you get some fuel, then you spend fuel to reach the next checkpoint. If your tank becomes negative at some point, your chosen start clearly fails. The greedy insight is stronger: if start s fails at station t, then no station between s and t can be a valid start either.
Formal Definition
Given arrays gas[0..n-1] and cost[0..n-1].
Define net gain at i as delta[i] = gas[i] - cost[i].
We need an index start such that cumulative fuel from start around the full circle never
drops below 0.
- If sum(gas) < sum(cost), no solution exists.
- If sum(gas) >= sum(cost), at least one solution exists (LeetCode 134 additionally guarantees that it is unique).
Why This Topic Matters
- Interview staple: LeetCode 134 is one of the most common greedy interview questions.
- Greedy proof skill: Teaches elimination arguments ("if this segment fails, all starts inside it fail").
- Linear optimization: Great example of reducing O(n^2) brute force to O(n).
Mental Model
- Track running tank while scanning from left to right.
- If tank drops below 0 at i, current start cannot reach i+1.
- Any start between current start and i also fails (it would have even less fuel before reaching i+1).
- So move start to i+1 and reset local tank to 0.
- Global feasibility is checked by total sum.
Brute Force → Better → Optimal
Brute Force
Try each station as start, simulate full cycle each time. Complexity O(n^2), too slow for large n.
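For reference, the brute-force idea can be sketched as follows. The function name can_complete_circuit_bruteforce is an illustrative assumption; the sketch is meant as a baseline for checking faster solutions on small inputs, not as a recommended approach.

```python
def can_complete_circuit_bruteforce(gas: list[int], cost: list[int]) -> int:
    """Try every start and simulate one full lap: O(n^2) time, O(1) space."""
    n = len(gas)
    for start in range(n):
        tank = 0
        for step in range(n):
            i = (start + step) % n  # wrap around the circle
            tank += gas[i] - cost[i]
            if tank < 0:
                break  # this start runs dry before reaching station i+1
        else:
            # Inner loop finished without a break: the lap succeeded.
            return start
    return -1
```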
Better Intuition
Observe that failures eliminate whole ranges of starts, not just one index. We should skip impossible candidates in bulk.
Optimal Greedy
One pass with two accumulators:
total += gas[i]-cost[i] for feasibility and tank += gas[i]-cost[i] for current
candidate start. If tank < 0, reset start to i+1 and tank to 0. At end, if
total < 0 return -1; else return start.
Step-by-Step Breakdown
- Initialize total = 0, tank = 0, start = 0.
- For each station i:
  - delta = gas[i] - cost[i]
  - total += delta, tank += delta
  - If tank < 0, set start = i + 1 and tank = 0.
- After loop: if total < 0, return -1; else return start.
ASCII Diagram
gas   = [1, 2, 3, 4, 5]
cost  = [3, 4, 5, 1, 2]
delta = [-2,-2,-2,+3,+3]
i=0: tank=-2 -> fail, start=1, tank=0
i=1: tank=-2 -> fail, start=2, tank=0
i=2: tank=-2 -> fail, start=3, tank=0
i=3: tank=+3
i=4: tank=+6
total = 0 (feasible)
Answer: start = 3
Route from 3: tank never goes negative over full cycle.
Python Implementation
def can_complete_circuit(gas: list[int], cost: list[int]) -> int:
    """
    Return start index to complete circular route once, or -1 if impossible.
    """
    total = 0
    tank = 0
    start = 0
    for i in range(len(gas)):
        delta = gas[i] - cost[i]
        total += delta
        tank += delta
        # Current candidate start cannot reach i+1
        if tank < 0:
            start = i + 1
            tank = 0
    return start if total >= 0 else -1
Line-by-Line Explanation
- total tracks global feasibility. If negative at end, impossible for all starts.
- tank tracks fuel for current candidate start segment.
- When tank < 0 at i, current start fails before i+1.
- Set start = i+1: all indices from old start..i are discarded as impossible.
- Reset tank = 0 and continue scan.
- Final answer is candidate start only if total is non-negative.
Correctness Intuition
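Suppose we start at s and first fail at station i (tank becomes negative at i). Then the cumulative sum of delta from s to i is negative, while every shorter prefix from s to k-1 (for s < k <= i) is non-negative, because we had not failed earlier. The sum from k to i equals the sum from s to i minus the sum from s to k-1, so it is negative as well: starting at k, the tank would still be negative after station i. So none of those k can be valid starts, and skipping directly to i+1 is safe. Finally, if total >= 0, the last candidate start is valid: it never fails between start and n-1 by construction, and on the wrap-around the tank is lowest at the end of the last discarded segment, where it equals total >= 0.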
Time Complexity
- Single scan: O(n)
- Each index processed once: O(n) overall
Space Complexity
- Only a few scalar variables (total, tank, start)
- Space: O(1)
Edge Cases
- No feasible cycle: if total gas < total cost, answer is -1.
- Single station: return 0 if gas[0] >= cost[0], else -1.
- Multiple valid starts: some variants may allow multiple; this algorithm returns one valid candidate.
- All zeros: total = 0, start 0 is valid.
Common Mistakes
- Skipping the final check total >= 0. Local resets alone are not enough to prove feasibility.
Pattern Recognition
When a circular traversal asks for a feasible start and failures invalidate contiguous ranges of starts, look for a greedy one-pass elimination strategy with a global feasibility check.
Practice Problems
- LeetCode 134 — Gas Station.
- Circular tour with petrol pumps (classic variant).
- Adaptation: return all feasible starts (harder; requires additional reasoning).
Summary
- Goal: Find start index to complete one circular trip, or -1.
- Greedy: Reset start to i+1 whenever current tank becomes negative.
- Feasibility: Total gas must be at least total cost.
- Complexity: O(n) time, O(1) space.
Section 18: Computational Geometry
This section introduces computational geometry: solving algorithmic problems involving points, lines, vectors, polygons, and spatial relationships. You will learn core building blocks such as points and vectors, cross product, orientation tests, line intersection, polygon area, convex hull, and sweep line. The focus is on clean geometric intuition plus robust formulas that work in code.
18.1 Points & Vectors
Introduction
Points and vectors are the alphabet of computational geometry. Most geometry algorithms reduce to a few vector operations: subtraction, addition, scaling, dot product, and later cross product. If you deeply understand what a point is, what a vector is, and how to compute with them, topics like line intersection, convex hull, polygon area, and sweep line become much easier.
Real-World Analogy
A point is like a location pin on a map (where). A vector is like a movement instruction (how far and in which direction). For example, "start at (2,3)" is a point; "move by (+4, -1)" is a vector. Applying the vector to the point gives a new point: (6,2).
Formal Definition
- A point is a coordinate pair P = (x, y).
- A vector is a directed displacement v = (vx, vy).
- Vector from A to B is B - A = (Bx-Ax, By-Ay).
- Point translation: P + v = (Px+vx, Py+vy).
- Vector length: |v| = sqrt(vx^2 + vy^2).
- Squared length (often preferred in code): |v|^2 = vx^2 + vy^2.
Why This Topic Matters
- Foundation layer: Orientation, intersection, area, hulls all depend on point/vector operations.
- Implementation reliability: Many bugs in geometry come from weak coordinate modeling and sign mistakes.
- Interview readiness: Geometry questions often test whether you can convert picture intuition into vector math.
Mental Model
- Points are positions; vectors are movements/directions.
- You can subtract two points to get a vector.
- You can add a vector to a point to get another point.
- You can add/subtract vectors to combine movements.
- Distance between points is length of their difference vector.
Step-by-Step Breakdown
- Represent each point as (x, y).
- To get direction from A to B, compute AB = B - A.
- To move point P by vector v, compute P' = P + v.
- To compare distances, prefer squared distance to avoid unnecessary square roots.
- Use dot product to reason about projection/angle-type checks (perpendicular, acute, obtuse).
ASCII Diagram
Coordinate plane:
y
^
5 |            B(5,4)
4 |              *
3 |  A(2,2)  *
2 |     *
1 |
0 +---------------------> x
  0  1  2  3  4  5  6
Vector AB = B - A = (5-2, 4-2) = (3,2)
Interpretation:
Start at A, move +3 in x and +2 in y to reach B.
Core Operations
1) Vector Addition and Subtraction
For vectors u=(ux,uy) and v=(vx,vy):
u+v=(ux+vx, uy+vy), u-v=(ux-vx, uy-vy).
2) Scalar Multiplication
k*v = (k*vx, k*vy). This changes magnitude; sign of k can reverse direction.
3) Dot Product
u·v = ux*vx + uy*vy. Uses:
- Length: |v|^2 = v·v
- Orthogonal check: u·v = 0 means perpendicular.
- Angle type: dot > 0 acute, dot = 0 right, dot < 0 obtuse.
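The sign rule for the dot product can be sketched as a small classifier. This is a minimal illustration that assumes both vectors are non-zero and given as integer tuples; the names dot and angle_type are illustrative choices.

```python
def dot(u: tuple[int, int], v: tuple[int, int]) -> int:
    """Dot product of two 2D vectors."""
    return u[0] * v[0] + u[1] * v[1]


def angle_type(u: tuple[int, int], v: tuple[int, int]) -> str:
    """Classify the angle between two non-zero vectors by the sign of u·v."""
    d = dot(u, v)
    if d > 0:
        return "acute"
    if d < 0:
        return "obtuse"
    return "right"  # d == 0: perpendicular (only meaningful for non-zero vectors)
```

No angle is ever computed: the sign of one integer expression is enough, so integer inputs give an exact answer.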
4) Distance Between Points
For points A and B, distance:
dist(A,B)=sqrt((Bx-Ax)^2 + (By-Ay)^2).
When an algorithm only compares distances, use squared distance to avoid floating-point rounding and the cost of sqrt.
Python Implementation
from dataclasses import dataclass
import math

@dataclass(frozen=True)
class Point:
    x: float
    y: float

def vector(a: Point, b: Point) -> Point:
    """Return vector AB = B - A."""
    return Point(b.x - a.x, b.y - a.y)

def add(u: Point, v: Point) -> Point:
    return Point(u.x + v.x, u.y + v.y)

def sub(u: Point, v: Point) -> Point:
    return Point(u.x - v.x, u.y - v.y)

def scale(v: Point, k: float) -> Point:
    return Point(v.x * k, v.y * k)

def dot(u: Point, v: Point) -> float:
    return u.x * v.x + u.y * v.y

def norm2(v: Point) -> float:
    """Squared length."""
    return dot(v, v)

def norm(v: Point) -> float:
    """Length."""
    return math.sqrt(norm2(v))

def dist2(a: Point, b: Point) -> float:
    return norm2(vector(a, b))

def dist(a: Point, b: Point) -> float:
    return math.sqrt(dist2(a, b))
Line-by-Line Explanation
- Point holds coordinates; immutable (frozen=True) avoids accidental mutation bugs.
- vector(a,b) computes displacement from a to b by subtraction.
- add/sub/scale implement fundamental vector algebra operations.
- dot computes scalar product used for projections and angle checks.
- norm2 uses dot(v,v), preferred when only comparing distances.
- dist2 and dist are point-to-point distance helpers.
Brute Force → Better → Optimal Thinking
Brute Force Habit
Beginners often compute Euclidean distance with sqrt everywhere, even when only ordering/comparing is needed.
Better Practice
Compare squared distances to avoid sqrt and reduce floating error exposure.
Optimal Geometry Style
Use integer arithmetic where possible (especially with input integers), defer floating operations until final output, and structure all geometry around reusable vector primitives.
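As one illustration of this style, a nearest-point query can be answered entirely in integers by comparing squared distances. The sketch below assumes a non-empty list of integer coordinate tuples; closest_to_origin is an illustrative name.

```python
def closest_to_origin(points: list[tuple[int, int]]) -> tuple[int, int]:
    """Return the point with the smallest squared distance to (0, 0).

    Ordering by x^2 + y^2 is the same as ordering by sqrt(x^2 + y^2),
    so no square root (and no floating-point value) is needed.
    """
    return min(points, key=lambda p: p[0] * p[0] + p[1] * p[1])
```

Because sqrt is monotonic on non-negative numbers, the ranking is unchanged, and the comparison stays exact for integer input.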
Time Complexity
- Each primitive operation (add/sub/dot/norm2/vector) is O(1).
- Distance with sqrt is O(1) but more expensive constant factor than squared distance.
- Geometry algorithms built from these primitives depend on number of points/segments (covered in later topics).
Space Complexity
- Each operation uses O(1) extra space.
- Algorithm-level memory depends on problem (e.g. hull arrays, sweep structures).
Edge Cases
- Coincident points: A = B gives zero vector and zero distance.
- Large coordinates: squared terms may overflow in fixed-width integer languages; use 64-bit types.
- Floating precision: equality comparisons on floats should use epsilon tolerance.
Common Mistakes
- Reversing the subtraction order: B-A is the vector from A to B, not vice versa.
Pattern Recognition
If a geometry question involves movement, direction, distance, collinearity, turn direction, or projection, convert it to vector operations first. This algebraic translation is the key skill.
Practice Problems
- Given two points A and B, compute vector AB, distance, and squared distance.
- Given vectors u and v, determine if they are perpendicular using dot product.
- Find the nearest point to origin from a list of points using squared distance.
Summary
- Points represent location; vectors represent displacement.
- Core operations: subtraction, addition, scaling, dot product, norm, distance.
- Prefer squared distance for comparisons.
- Strong point/vector fundamentals make all later computational geometry topics easier.
18.2 Cross Product
Introduction
The cross product is one of the most powerful tools in computational geometry. In 2D, it helps answer questions like:
- Did we turn left or right?
- Are three points clockwise, counterclockwise, or collinear?
- What is the area of a triangle/parallelogram?
Even though "cross product" is a 3D vector operation in full linear algebra, in 2D geometry we use its scalar z-component. This scalar sign (+/−/0) is the heart of many geometry algorithms.
Real-World Analogy
Imagine walking from point A to B, then deciding where C lies relative to your direction. If C is to your left, that's a positive turn. If C is to your right, that's a negative turn. If C is straight ahead on the same line, turn is zero. Cross product gives this left/right/straight information instantly.
Formal Definition
For 2D vectors u = (ux, uy) and v = (vx, vy), define:
cross(u, v) = ux*vy - uy*vx.
This equals the z-component of the 3D cross product if vectors are embedded as (x, y, 0).
- cross(u, v) > 0: v is to the left of u (counterclockwise turn).
- cross(u, v) < 0: v is to the right of u (clockwise turn).
- cross(u, v) = 0: u and v are collinear.
Magnitude: |cross(u,v)| = |u||v||sin(theta)|, which equals the area of the parallelogram formed by u and v.
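The link to the 3D cross product can be checked numerically. The sketch below embeds two sample 2D vectors as (x, y, 0) and compares the result with the scalar formula; cross2, cross3, and the sample vectors are illustrative assumptions.

```python
def cross2(u: tuple[int, int], v: tuple[int, int]) -> int:
    """Scalar 2D cross product ux*vy - uy*vx."""
    return u[0] * v[1] - u[1] * v[0]


def cross3(a: tuple[int, int, int], b: tuple[int, int, int]) -> tuple[int, int, int]:
    """Standard 3D cross product a x b."""
    ax, ay, az = a
    bx, by, bz = b
    return (ay * bz - az * by, az * bx - ax * bz, ax * by - ay * bx)


u, v = (3, 1), (1, 2)
embedded = cross3((*u, 0), (*v, 0))  # x and y components vanish for z = 0 inputs
```

For these vectors embedded is (0, 0, 5) and cross2(u, v) is 5: only the z-component survives, and its positive sign says v lies counterclockwise from u.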
Why This Topic Matters
- Orientation test basis: Next topic (18.3) directly uses cross signs.
- Area computations: Triangle and polygon area formulas use cross products.
- Core in advanced geometry: Convex hull, segment intersection, winding checks all rely on cross product.
Mental Model
- Cross product is a signed area measurement.
- Sign tells direction of turn (left/right).
- Absolute value tells "how much turn" geometrically (scaled area).
- Zero means no turn (collinear points/vectors).
Step-by-Step: Cross for Three Points
Given points A, B, C, define vectors:
AB = B - A, AC = C - A.
Then orientation value:
cross(AB, AC).
- Compute AB = (Bx-Ax, By-Ay).
- Compute AC = (Cx-Ax, Cy-Ay).
- Evaluate AB.x * AC.y - AB.y * AC.x.
- Interpret sign:
  - > 0: A→B→C is counterclockwise
  - < 0: A→B→C is clockwise
  - = 0: A, B, C are collinear
ASCII Diagram
Case 1: Left turn (positive cross)
             C
            *
           /
          /
A *------* B
AB points right, AC points up-right -> cross(AB, AC) > 0
Case 2: Right turn (negative cross)
A *------* B
          \
           \
            * C
AB points right, AC points down-right -> cross(AB, AC) < 0
Area Interpretation
For vectors u and v:
- Parallelogram area = |cross(u, v)|
- Triangle area (with sides u and v from same origin) = |cross(u, v)| / 2
For triangle ABC: area = |cross(B-A, C-A)| / 2.
Python Implementation
from dataclasses import dataclass

@dataclass(frozen=True)
class Point:
    x: float
    y: float

def cross(u: Point, v: Point) -> float:
    """2D cross product (scalar z-component)."""
    return u.x * v.y - u.y * v.x

def vector(a: Point, b: Point) -> Point:
    return Point(b.x - a.x, b.y - a.y)

def orientation(a: Point, b: Point, c: Point) -> float:
    """Positive: left turn, Negative: right turn, Zero: collinear."""
    return cross(vector(a, b), vector(a, c))

def triangle_area2(a: Point, b: Point, c: Point) -> float:
    """Twice the signed area of triangle ABC."""
    return orientation(a, b, c)

def triangle_area(a: Point, b: Point, c: Point) -> float:
    """Unsigned area of triangle ABC."""
    return abs(triangle_area2(a, b, c)) / 2.0
Line-by-Line Explanation
- cross(u, v) computes the determinant of [u; v].
- orientation(a,b,c) is cross of AB and AC, the standard turn test primitive.
- triangle_area2 returns signed double area (very common in geometry to avoid division).
- triangle_area takes absolute value and divides by 2 for actual area.
Brute Force → Better → Optimal Thinking
Brute Force Habit
Using trigonometry (angles, arccos, atan2) for left/right checks is slower and numerically fragile.
Better Practice
Use determinant/cross directly for orientation and area checks.
Optimal Geometry Style
Prefer integer arithmetic cross computations when coordinates are integers; avoid floating operations unless final answer requires decimals.
Time Complexity
- Cross product of two vectors: O(1).
- Orientation for three points: O(1).
- Triangle area from cross: O(1).
Space Complexity
- All primitive calculations use O(1) extra space.
Edge Cases
- Collinear points: cross = 0 (or very close to 0 with floats).
- Large coordinates: products may overflow in 32-bit ints; use 64-bit in C++/Java.
- Floating inputs: compare against epsilon, e.g., abs(val) < 1e-9.
Common Mistakes
- Swapping the argument order: cross(u,v) = -cross(v,u), so the sign flips.
- Tip: define a single orient(a,b,c) helper early and use it everywhere (hull, intersection, polygon routines). Consistency prevents sign bugs.
Pattern Recognition
If a question asks about left/right turn, clockwise/counterclockwise order, collinearity, or the area of triangle/polygon pieces, cross product is usually the first tool to try.
Practice Problems
- Given A, B, C, determine orientation (CW/CCW/collinear).
- Compute area of triangle from three points using cross product.
- Given a polyline of points, count how many left turns occur.
Summary
- Formula: cross(u,v)=ux*vy-uy*vx.
- Sign meaning: positive left turn, negative right turn, zero collinear.
- Area meaning: |cross| = parallelogram area, |cross|/2 = triangle area.
- Complexity: O(1) time and space for primitive operations.
18.3 Orientation Test
Introduction
The orientation test determines whether three points A, B, C make a left turn (counterclockwise), a right turn (clockwise), or are collinear. This tiny O(1) primitive is one of the most used checks in computational geometry and appears in convex hulls, segment intersection, polygon algorithms, and sweep-line methods.
Real-World Analogy
Imagine standing at A, walking to B, and then deciding where C lies relative to your facing direction. If C is to your left, it is a counterclockwise turn. If it is to your right, clockwise. If straight ahead on the same line, collinear. The orientation test computes this exactly from coordinates.
Formal Definition
For points A = (ax, ay), B = (bx, by), C = (cx, cy), define:
orient(A,B,C) = (bx-ax)*(cy-ay) - (by-ay)*(cx-ax).
This is the 2D cross product of vectors AB and AC.
- orient > 0 => counterclockwise (left turn)
- orient < 0 => clockwise (right turn)
- orient = 0 => collinear
|orient|/2 is the area of triangle ABC.
Why This Topic Matters
- Core geometry primitive: Many geometry algorithms reduce to repeated orientation checks.
- Intersection logic: Segment intersection is mostly orientation comparisons + boundary checks.
- Convex hull decisions: Hull construction repeatedly removes points that cause wrong turn direction.
Mental Model
- Orientation is the signed "turn amount" from AB to AC.
- Positive sign means C is to the left of directed line AB.
- Negative sign means C is to the right.
- Zero means A, B, C lie on one line.
Step-by-Step Breakdown
- Given points A, B, C, compute vector AB and vector AC.
- Compute determinant: (Bx-Ax)*(Cy-Ay) - (By-Ay)*(Cx-Ax).
- Check sign:
  - positive -> CCW
  - negative -> CW
  - zero -> collinear
- Use this result as building block in larger algorithms.
ASCII Diagram
1) Counterclockwise (left turn)
             C
            *
           /
          /
A *------* B
orient(A,B,C) > 0
2) Clockwise (right turn)
A *------* B
          \
           \
            * C
orient(A,B,C) < 0
3) Collinear
A *-----*-----* C
        B
orient(A,B,C) = 0
Python Implementation
from dataclasses import dataclass

@dataclass(frozen=True)
class Point:
    x: int
    y: int

def orient(a: Point, b: Point, c: Point) -> int:
    """
    Returns:
    >0 for counterclockwise
    <0 for clockwise
    0 for collinear
    """
    return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x)

def orientation_label(a: Point, b: Point, c: Point) -> str:
    v = orient(a, b, c)
    if v > 0:
        return "CCW"
    if v < 0:
        return "CW"
    return "COLLINEAR"
Line-by-Line Explanation
- orient computes determinant/cross of AB and AC.
- No trigonometry is needed; only integer arithmetic for integer coordinates.
- orientation_label maps numeric sign to readable category for debugging/interview output.
Application: Segment Intersection Skeleton
For segments AB and CD, compute:
o1 = orient(A,B,C),
o2 = orient(A,B,D),
o3 = orient(C,D,A),
o4 = orient(C,D,B).
General intersection occurs when o1 and o2 have opposite signs and
o3 and o4 have opposite signs. Collinear edge cases require on-segment checks.
Brute Force → Better → Optimal Thinking
Brute Force Habit
Using slope comparisons (dy/dx) can fail on vertical lines and introduces floating precision issues.
Better Practice
Replace slope logic with orientation determinant. No division required, robust for vertical/horizontal lines.
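The difference shows up as soon as one segment is vertical. The sketch below contrasts a naive slope-based collinearity check with the determinant version; both function names and the tuple representation are illustrative assumptions.

```python
Pt = tuple[int, int]


def collinear_by_slope(a: Pt, b: Pt, c: Pt) -> bool:
    """Naive check: compares dy/dx, which divides by zero on vertical lines."""
    return (b[1] - a[1]) / (b[0] - a[0]) == (c[1] - a[1]) / (c[0] - a[0])


def orient(a: Pt, b: Pt, c: Pt) -> int:
    """Cross product of AB and AC; uses only subtraction and multiplication."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def collinear_by_orient(a: Pt, b: Pt, c: Pt) -> bool:
    """Robust check: three points are collinear exactly when orient is 0."""
    return orient(a, b, c) == 0
```

For the vertical triple (2,0), (2,5), (2,9), collinear_by_slope raises ZeroDivisionError, while collinear_by_orient returns True without any special casing.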
Optimal Geometry Style
Build all geometry predicates from orient and avoid floating arithmetic unless unavoidable.
This improves correctness and performance.
Time Complexity
- Single orientation query: O(1).
- Four orientation checks for segment intersection core: O(1).
Space Complexity
- Orientation computation uses O(1) extra space.
Edge Cases
- Duplicate points: if A=B or B=C etc., orientation can be 0 and needs problem-specific handling.
- Collinearity: orientation=0 alone does not imply overlap/intersection; bounding-box checks may be needed.
- Large coordinates: use 64-bit integer types in fixed-width languages to avoid overflow.
- Floating coordinates: use epsilon threshold for near-zero values.
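For floating-point coordinates, the epsilon idea can be sketched as a sign function with a tolerance band around zero. The name orient_sign and the default eps = 1e-9 are assumptions for illustration; an absolute tolerance like this depends on the coordinate scale, so it should be tuned to the problem.

```python
def orient_sign(a: tuple[float, float], b: tuple[float, float],
                c: tuple[float, float], eps: float = 1e-9) -> int:
    """Return +1 (CCW), -1 (CW), or 0 (treated as collinear within eps)."""
    val = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    if abs(val) < eps:
        return 0  # too close to zero to trust the sign
    return 1 if val > 0 else -1
```

Callers then branch on the returned sign instead of comparing the raw floating-point value with 0.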
Common Mistakes
- Swapping the point order: orient(A,B,C) = -orient(A,C,B); the sign flips and downstream logic breaks.
Pattern Recognition
If a geometry problem asks any of these: clockwise vs counterclockwise, turn direction, collinearity,
or relative side of a directed line, immediately think orient(a,b,c).
Practice Problems
- Given triples of points, classify each as CW, CCW, or collinear.
- Implement segment intersection using orientation + on-segment checks.
- Use orientation to filter non-left turns in a convex hull stack routine.
Summary
- Orientation formula: (Bx-Ax)*(Cy-Ay) - (By-Ay)*(Cx-Ax).
- Sign tells direction: + CCW, - CW, 0 collinear.
- Complexity: O(1) time and O(1) space.
- This primitive powers intersection tests, hull algorithms, and many geometry predicates.
18.4 Line Intersection
Introduction
Line Intersection problems ask whether two lines or line segments intersect, and sometimes where they intersect. In competitive programming and interviews, this usually means:
- Do two segments AB and CD intersect?
- If they intersect at one point, what is that point?
- How do we handle collinear overlaps and touching endpoints?
The key tool is the orientation test from 18.3. With a few orientation checks plus boundary checks, we can robustly detect intersection in O(1).
Real-World Analogy
Think of two roads drawn on a map. They may cross at an interior point, just touch at one endpoint, run parallel forever, or overlap partially if they are on the same line. Segment intersection logic is exactly classifying these possibilities using coordinate math.
Formal Definition
- General intersection occurs when C and D lie on opposite sides of line AB, and A and B lie on opposite sides of line CD.
- Using orientation: o1 = orient(A,B,C), o2 = orient(A,B,D), o3 = orient(C,D,A), o4 = orient(C,D,B).
- General case: the segments intersect if o1*o2 < 0 and o3*o4 < 0.
- Special collinear cases require "point on segment" checks.
Why This Topic Matters
- Geometry backbone: Segment intersection is used in polygon algorithms, clipping, collision detection, and sweep-line methods.
- Interview frequent: Often appears directly or as a subroutine in harder geometry tasks.
- Precision practice: Teaches careful handling of edge cases (collinear, endpoints, overlap).
Mental Model
- Each segment defines a directed line and a side test (orientation sign).
- If endpoints of one segment fall on different sides of the other line, they "straddle" it.
- Mutual straddling implies proper intersection.
- If orientations are zero, points are collinear and you must check interval overlap.
Step-by-Step Segment Intersection Test
- Compute o1=orient(A,B,C), o2=orient(A,B,D), o3=orient(C,D,A), o4=orient(C,D,B).
- If o1 and o2 have opposite signs and o3 and o4 have opposite signs, return intersect.
- Handle special cases when any orientation is 0:
  - If o1==0 and C is on segment AB -> intersect.
  - If o2==0 and D is on segment AB -> intersect.
  - If o3==0 and A is on segment CD -> intersect.
  - If o4==0 and B is on segment CD -> intersect.
- Else, no intersection.
ASCII Diagram
1) Proper intersection (crossing)
A *         * D
    \     /
      \ /
       X
      / \
    /     \
C *         * B
Segments AB and CD cross at interior point X.
2) Touching at endpoint
A *-----* B
        |
        |
        * C
B is shared endpoint -> intersection exists.
3) Collinear overlap
A *---------* B
      C *---------* D
Same line, overlapping range -> intersection exists.
Python Implementation (Boolean Intersection)
from dataclasses import dataclass

@dataclass(frozen=True)
class Point:
    x: int
    y: int

def orient(a: Point, b: Point, c: Point) -> int:
    return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x)

def on_segment(a: Point, b: Point, p: Point) -> bool:
    """Assumes p is collinear with segment ab; checks bounding box inclusion."""
    return (min(a.x, b.x) <= p.x <= max(a.x, b.x) and
            min(a.y, b.y) <= p.y <= max(a.y, b.y))

def segments_intersect(a: Point, b: Point, c: Point, d: Point) -> bool:
    o1 = orient(a, b, c)
    o2 = orient(a, b, d)
    o3 = orient(c, d, a)
    o4 = orient(c, d, b)
    # General case: strict opposite signs
    if ((o1 > 0 and o2 < 0) or (o1 < 0 and o2 > 0)) and \
       ((o3 > 0 and o4 < 0) or (o3 < 0 and o4 > 0)):
        return True
    # Special collinear cases
    if o1 == 0 and on_segment(a, b, c):
        return True
    if o2 == 0 and on_segment(a, b, d):
        return True
    if o3 == 0 and on_segment(c, d, a):
        return True
    if o4 == 0 and on_segment(c, d, b):
        return True
    return False
Line-by-Line Explanation
- orient gives relative side/turn information in O(1).
- on_segment uses bounding box to verify collinear point lies within segment endpoints.
- General case checks strict sign opposition for mutual straddling.
- Special cases catch endpoint touching and collinear overlap.
Computing Exact Intersection Point (Non-Parallel Lines)
For infinite lines AB and CD, if they are not parallel, you can compute intersection using determinants. With A = (x1, y1), B = (x2, y2), C = (x3, y3), D = (x4, y4), let:
den = (x1-x2)*(y3-y4) - (y1-y2)*(x3-x4)
If den == 0, lines are parallel (or coincident). Otherwise:
px = ((x1*y2 - y1*x2)*(x3-x4) - (x1-x2)*(x3*y4 - y3*x4)) / den
py = ((x1*y2 - y1*x2)*(y3-y4) - (y1-y2)*(x3*y4 - y3*x4)) / den
This gives line-line intersection point. For segment-segment intersection, additionally ensure point lies on both segments.
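The determinant formula translates directly into code. The sketch below assumes integer coordinates with A = (x1, y1), B = (x2, y2), C = (x3, y3), D = (x4, y4), and uses fractions.Fraction so the result is exact; line_intersection is an illustrative name, and returning None for parallel or coincident lines is a design choice of this sketch.

```python
from fractions import Fraction


def line_intersection(a, b, c, d):
    """Intersection of infinite lines AB and CD for integer points.

    Returns (px, py) as exact Fractions, or None when den == 0
    (lines are parallel or coincident).
    """
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = a, b, c, d
    den = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    if den == 0:
        return None
    ab = x1 * y2 - y1 * x2  # determinant of A and B
    cd = x3 * y4 - y3 * x4  # determinant of C and D
    px = Fraction(ab * (x3 - x4) - (x1 - x2) * cd, den)
    py = Fraction(ab * (y3 - y4) - (y1 - y2) * cd, den)
    return px, py
```

For example, the diagonals (0,0)-(2,2) and (0,2)-(2,0) meet at (1, 1). For segments rather than lines, the returned point would still need to be checked against both segments.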
Brute Force → Better → Optimal Thinking
Brute Force Habit
Solve line equations with floating-point slopes/intercepts and compare ranges manually. This is error-prone for vertical lines and precision corner cases.
Better Practice
Use orientation signs for robust boolean intersection, then compute exact point only when necessary.
Optimal Predicate Style
Integer orientation + bounding checks gives fast, robust O(1) predicate suitable for high-volume geometry tasks.
Time Complexity
- Constant number of orientation and min/max operations.
- Overall: O(1) per segment pair query.
Space Complexity
- Only fixed temporary variables.
- Overall: O(1).
Edge Cases
- Shared endpoint: counts as intersection in most definitions.
- Collinear disjoint: orientations may be 0 but bounding boxes do not overlap.
- Collinear overlapping: intersection exists as a segment (not unique point).
- Parallel non-collinear lines: no intersection.
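These four cases make a convenient regression set. The sketch below restates the predicate compactly with integer tuples instead of the Point class (an assumption made to keep it self-contained) and evaluates one sample input per case; the sample coordinates are illustrative.

```python
def orient(a, b, c):
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def on_segment(a, b, p):
    """Bounding-box test; only valid when p is collinear with a and b."""
    return (min(a[0], b[0]) <= p[0] <= max(a[0], b[0]) and
            min(a[1], b[1]) <= p[1] <= max(a[1], b[1]))


def sign(v):
    return (v > 0) - (v < 0)


def segments_intersect(a, b, c, d):
    o1, o2 = orient(a, b, c), orient(a, b, d)
    o3, o4 = orient(c, d, a), orient(c, d, b)
    # Mutual straddling: strictly opposite signs on both lines.
    if sign(o1) * sign(o2) < 0 and sign(o3) * sign(o4) < 0:
        return True
    # Zero orientation: fall back to on-segment checks.
    return ((o1 == 0 and on_segment(a, b, c)) or
            (o2 == 0 and on_segment(a, b, d)) or
            (o3 == 0 and on_segment(c, d, a)) or
            (o4 == 0 and on_segment(c, d, b)))


cases = {
    "shared endpoint": ((0, 0), (2, 0), (2, 0), (2, 3)),
    "collinear disjoint": ((0, 0), (1, 0), (2, 0), (3, 0)),
    "collinear overlapping": ((0, 0), (3, 0), (2, 0), (5, 0)),
    "parallel non-collinear": ((0, 0), (3, 0), (0, 1), (3, 1)),
}
results = {name: segments_intersect(*pts) for name, pts in cases.items()}
```

Here the shared-endpoint and collinear-overlapping cases evaluate to True, and the other two to False.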
Common Mistakes
Pattern Recognition
If a problem involves collision/contact of line segments, polygon edge crossing, ray casting, or planar graph edge checks, line intersection predicate is a core building block.
Practice Problems
- Implement boolean segment intersection for integer points.
- Return exact intersection point for two non-parallel infinite lines.
- Given many segments, count intersecting pairs (naive O(n^2), then optimize later with sweep line).
Summary
- Use orient signs to test relative sides and mutual straddling.
- Always include collinear on-segment edge cases.
- Boolean segment intersection runs in O(1) time and O(1) space per query.
- This predicate is foundational for advanced computational geometry algorithms.
18.5 Polygon Area
Introduction
The Polygon Area problem asks you to compute the area enclosed by a polygon given its vertices in order. The standard and most important method is the Shoelace Formula (also called Gauss's area formula), which runs in O(n) for n vertices and works for any simple polygon (convex or concave).
This topic is foundational in computational geometry because many advanced tasks (centroid, winding, clipping, lattice geometry) build on the same cross-product summation pattern.
Real-World Analogy
Imagine tracing a boundary on a map with GPS points in order. Instead of decomposing manually into many triangles and adding areas one by one, shoelace gives one clean loop: pair each point with the next point, compute a signed cross term, sum them, then divide by 2.
Formal Definition
Given vertices P0, P1, ..., P(n-1) in cyclic order, with Pi = (xi, yi), define:
signed_area2 = sum(i=0..n-1) (xi * y(i+1) - yi * x(i+1)),
where index i+1 wraps around modulo n.
- Signed area: A_signed = signed_area2 / 2
- Actual area: A = |A_signed|
Why This Topic Matters
- Interview frequent: "Compute area of polygon from vertices" is a common geometry prompt.
- Cross-product pattern: Reinforces orientation/cross ideas from 18.2 and 18.3.
- Algorithmic utility: Needed in polygon processing, GIS, graphics, and robotics mapping.
Mental Model
- Area can be seen as sum of signed triangle/parallelogram contributions from each edge.
- Each edge Pi -> P(i+1) contributes cross(Pi, P(i+1)) / 2.
- Positive/negative contributions cancel correctly for concave shapes.
- Absolute value at the end gives geometric area.
Step-by-Step Shoelace Computation
- Initialize area2 = 0 (this stores twice signed area).
- For each vertex i, let j = (i+1) mod n.
- Add term: xi * yj - yi * xj to area2.
- After loop, signed area = area2 / 2.
- Return abs(area2) / 2 for actual polygon area.
ASCII Diagram
Example polygon (rectangle): P0(0,0), P1(4,0), P2(4,3), P3(0,3)
Shoelace table:
i   (xi,yi)    (xj,yj)   xi*yj - yi*xj
0   (0,0)  ->  (4,0)     0*0 - 0*4 = 0
1   (4,0)  ->  (4,3)     4*3 - 0*4 = 12
2   (4,3)  ->  (0,3)     4*3 - 3*0 = 12
3   (0,3)  ->  (0,0)     0*0 - 3*0 = 0
area2 = 24
Area = |24| / 2 = 12
Python Implementation
from dataclasses import dataclass

@dataclass(frozen=True)
class Point:
    x: float
    y: float

def polygon_area(points: list[Point]) -> float:
    """
    Returns absolute area of a simple polygon.
    Points must be given in boundary order (clockwise or counterclockwise).
    """
    n = len(points)
    if n < 3:
        return 0.0
    area2 = 0.0
    for i in range(n):
        j = (i + 1) % n
        area2 += points[i].x * points[j].y - points[i].y * points[j].x
    return abs(area2) / 2.0

def signed_polygon_area(points: list[Point]) -> float:
    """
    Positive for counterclockwise ordering, negative for clockwise.
    """
    n = len(points)
    if n < 3:
        return 0.0
    area2 = 0.0
    for i in range(n):
        j = (i + 1) % n
        area2 += points[i].x * points[j].y - points[i].y * points[j].x
    return area2 / 2.0
Line-by-Line Explanation
- n < 3 returns 0 because fewer than 3 points cannot enclose positive area.
- j = (i + 1) % n wraps last vertex back to first, closing polygon.
- Each loop adds one shoelace cross term.
- polygon_area uses absolute value to return geometric area.
- signed_polygon_area preserves orientation information (useful in many algorithms).
Brute Force → Better → Optimal
Brute Force
Triangulate manually from a fixed point and sum triangle areas. Works but can be more error-prone and verbose.
Better
Use cross products edge-by-edge to accumulate signed contributions.
Optimal (for ordered vertices)
Shoelace formula gives O(n) time and O(1) extra space, which is optimal since every vertex must be read.
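The fan-triangulation idea and the shoelace sum can be compared on a concave shape. The sketch below uses integer tuples and works with twice the signed area to stay exact; fan_area2, shoelace_area2, and the L-shaped sample polygon are illustrative assumptions.

```python
def fan_area2(pts: list[tuple[int, int]]) -> int:
    """Twice the signed area via triangles (P0, Pi, Pi+1) fanned from P0."""
    x0, y0 = pts[0]
    total = 0
    for i in range(1, len(pts) - 1):
        ux, uy = pts[i][0] - x0, pts[i][1] - y0
        vx, vy = pts[i + 1][0] - x0, pts[i + 1][1] - y0
        total += ux * vy - uy * vx  # signed, so outside triangles subtract
    return total


def shoelace_area2(pts: list[tuple[int, int]]) -> int:
    """Twice the signed area via the shoelace sum over edges."""
    n = len(pts)
    return sum(pts[i][0] * pts[(i + 1) % n][1] - pts[i][1] * pts[(i + 1) % n][0]
               for i in range(n))


# Concave "L" shape, counterclockwise; its area is 6, so area2 is 12.
l_shape = [(0, 0), (4, 0), (4, 1), (1, 1), (1, 3), (0, 3)]
```

Both functions return 12 for l_shape. If the same vertices are listed starting from (4, 0), one of the fan triangles has a negative signed area, and the sum is still 12; this is the cancellation that makes the method valid for concave polygons.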
Time Complexity
- Single pass over n vertices.
- Overall: O(n).
Space Complexity
- Only constant extra variables (area2, indices).
- Overall: O(1) extra space.
Edge Cases
- Fewer than 3 points: area is 0.
- Collinear polygon points: area can be 0.
- Clockwise vertex order: signed area negative; absolute still correct.
- Repeated first point at end: either handle directly or preprocess to avoid duplicate closing point.
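Several of these cases can be handled in one small helper. The sketch below assumes integer coordinates given as tuples and returns twice the signed area so that no division is needed; polygon_area2 is an illustrative name. A repeated closing point only adds a zero term to the shoelace sum, so stripping it mainly keeps the vertex count (and the n < 3 check) honest.

```python
def polygon_area2(pts: list[tuple[int, int]]) -> int:
    """Twice the signed area: positive for CCW order, negative for CW."""
    if len(pts) > 1 and pts[0] == pts[-1]:
        pts = pts[:-1]  # drop duplicate closing vertex (copy, input untouched)
    n = len(pts)
    if n < 3:
        return 0  # fewer than 3 distinct positions cannot enclose area
    return sum(pts[i][0] * pts[(i + 1) % n][1] - pts[i][1] * pts[(i + 1) % n][0]
               for i in range(n))
```

The geometric area is abs(polygon_area2(pts)) / 2, and the sign alone answers the clockwise versus counterclockwise question.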
Common Mistakes
- Tip: with integer coordinates, keep area2 as an integer for exact arithmetic and divide only at the end. This avoids floating precision drift in large polygons.
Pattern Recognition
Whenever you are given polygon vertices in order and asked for area (or orientation), think shoelace immediately. The same cross-sum appears in centroid and winding-based formulas.
Practice Problems
- Compute area of convex and concave polygons from ordered vertices.
- Determine whether vertex order is clockwise or counterclockwise via signed area.
- Given polygon points, remove duplicate final closing point and recompute area robustly.
Summary
- Main formula: Area = |sum(xi*y(i+1) - yi*x(i+1))| / 2.
- Works for simple convex/concave polygons with ordered vertices.
- Complexity: O(n) time, O(1) extra space.
- Signed area gives orientation; absolute gives geometric area.
18.6 Convex Hull
Introduction
The Convex Hull of a set of points is the smallest convex polygon that contains all points. A classic visualization is: put nails on a board at each point and stretch a rubber band around them; when released, the band outlines the convex hull.
Convex hull is a cornerstone geometry problem. Many higher-level tasks (diameter of points, rotating calipers, collision boundaries, farthest pair, shape simplification) start by computing the hull.
Real-World Analogy
Imagine plotting all delivery locations on a map. If you want a minimal outer boundary that encloses all locations, you draw the boundary around the outermost points only. Inner points do not matter for the boundary. That outer boundary is the convex hull.
Formal Definition
Why This Topic Matters
- Foundational geometry primitive: Used by many optimization and distance problems.
- Interview significance: Tests sorting + orientation + stack-like construction.
- Algorithmic pattern: "Maintain valid boundary, pop when turn is wrong" appears in several geometry routines.
Mental Model
- Sort points left-to-right.
- Build lower boundary from left to right.
- Build upper boundary from right to left.
- While adding a point, if last turn is not counterclockwise (or not strictly, depending on collinear policy), remove middle point.
- Combine boundaries to form full hull.
Approach Choice
There are multiple hull algorithms:
- Jarvis March (Gift Wrapping): O(nh), where h is the number of hull points.
- Graham Scan: O(n log n).
- Monotonic Chain (Andrew): O(n log n), easy to implement and interview-friendly.
We use Monotonic Chain because it is concise, robust, and directly uses orientation tests.
Brute Force → Better → Optimal
Brute Force
For each pair of points (A, B), check if all other points lie on one side of line AB. If yes, AB is hull edge. Complexity O(n^3), too slow for large n.
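A minimal sketch of this brute-force check, assuming points are (x, y) integer tuples in general position (no three collinear). The function name hull_edges_bruteforce is hypothetical; it returns hull edges as unordered pairs rather than an ordered polygon.

```python
from itertools import combinations


def cross(o, a, b):
    """Cross product of OA x OB; the sign tells which side of line OA the point b lies on."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def hull_edges_bruteforce(points):
    """O(n^3): a pair (a, b) is a hull edge if all other points lie strictly on one side."""
    pts = sorted(set(points))
    edges = []
    for a, b in combinations(pts, 2):          # O(n^2) candidate edges
        signs = [cross(a, b, p) for p in pts if p != a and p != b]  # O(n) side tests
        if all(s > 0 for s in signs) or all(s < 0 for s in signs):
            edges.append((a, b))
    return edges


# Four square corners plus one interior point; only the four sides should qualify.
sample = [(0, 0), (4, 0), (4, 4), (0, 4), (1, 2)]
sample_edges = hull_edges_bruteforce(sample)
```

With collinear input the strict comparisons reject an edge that has another point lying on it, which is one more reason the sort-based method is preferred in practice.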
Better Intuition
Hull is boundary-only, so interior points should be eliminated quickly when they cause inward turns.
Optimal Practical Method
Sort points and use orientation-based stack popping. Complexity O(n log n) due to sorting; linear pass after sort.
Step-by-Step (Monotonic Chain)
- Sort unique points lexicographically by (x, y).
- Build lower hull:
- Iterate sorted points.
- While last two points + new point make non-left turn, pop last point.
- Append new point.
- Build upper hull similarly using reversed order.
- Concatenate lower and upper, excluding duplicate endpoints.
ASCII Diagram
Points:
* *
* * *
* * * *
*
Hull uses only outer boundary points:
  H-----------H
 /             \
H               H
 \             /
  H-----------H
Interior points are ignored.
Python Implementation
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class Point:
    x: int
    y: int


def cross(o: Point, a: Point, b: Point) -> int:
    """
    Cross of OA x OB where O is origin point 'o'.
    Positive => counterclockwise turn.
    """
    return (a.x - o.x) * (b.y - o.y) - (a.y - o.y) * (b.x - o.x)


def convex_hull(points: list[Point]) -> list[Point]:
    """
    Monotonic chain convex hull.
    Returns hull vertices in counterclockwise order without repeating first point.
    """
    pts = sorted(set(points))
    n = len(pts)
    if n <= 1:
        return pts

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # Remove duplicate endpoints
    return lower[:-1] + upper[:-1]
Line-by-Line Explanation
- sorted(set(points)) removes duplicates and sorts by x, then y.
- cross(lower[-2], lower[-1], p) checks turn direction of last edge with new point.
- <= 0 pops non-left turns, producing strictly convex boundary (collinear middle points removed).
- Upper hull repeats same logic in reverse order.
- Last point of lower and upper duplicates first point of the other boundary, so we slice with [:-1].
Collinear Policy (Important)
The condition cross <= 0 removes collinear points on edges, keeping only extreme endpoints.
If you want to keep all boundary collinear points, change condition to cross < 0.
Interviewers may ask this variant explicitly.
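A sketch of the keep-collinear variant, assuming points are plain (x, y) tuples rather than the Point class. The function name hull_keep_collinear is illustrative. One caveat appears once the test becomes strict: when every input point is collinear, both chains contain every point, so the concatenation needs de-duplication.

```python
def cross(o, a, b):
    """Positive for a counterclockwise turn o -> a -> b, zero when collinear."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def hull_keep_collinear(points):
    """Monotonic chain that pops only on strict right turns, keeping collinear boundary points."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) < 0:
            lower.pop()
        lower.append(p)

    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) < 0:
            upper.pop()
        upper.append(p)

    # dict.fromkeys removes repeats while preserving first-seen order;
    # repeats occur when the same collinear point sits on both chains.
    return list(dict.fromkeys(lower[:-1] + upper[:-1]))


fence = hull_keep_collinear([(1, 1), (2, 2), (2, 0), (2, 4), (3, 3), (4, 2)])
line = hull_keep_collinear([(0, 0), (1, 1), (2, 2), (3, 3)])
```

Here (3, 3) is kept because it lies on the edge from (4, 2) to (2, 4), while (2, 2) is interior and is dropped.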
Time Complexity
- Sorting: O(n log n)
- Lower + Upper passes: O(n) amortized (each point pushed/popped at most once per pass)
- Overall: O(n log n)
Space Complexity
- Hull stacks and sorted list require O(n) space.
- Overall: O(n).
Edge Cases
- 0 or 1 point: hull is the same set.
- 2 points: hull is both points.
- All points collinear: with cross <= 0, result has two endpoints only.
- Duplicate points: remove before processing.
Common Mistakes
- Wrong comparison direction in the cross-product test (< vs >), causing an inside-out hull.
Pattern Recognition
If a problem asks for outer boundary, minimal enclosing polygon from points, or wants to ignore interior points before further computation (farthest pair, perimeter), convex hull is usually the first step.
Practice Problems
- LeetCode 587 — Erect the Fence (convex hull with collinear boundary points kept).
- Compute perimeter/area of convex hull of given points.
- Given points, find farthest pair by first computing hull (rotating calipers extension).
Summary
- Goal: smallest convex polygon containing all points.
- Method: sort + lower/upper hull with orientation-based popping.
- Complexity: O(n log n) time, O(n) space.
- Key detail: choose collinear policy via cross condition.
18.7 Sweep Line Algorithm
Introduction
A Sweep Line Algorithm is a design technique for geometry problems where we move an imaginary line (usually left to right) across the plane and process events in sorted order. Instead of checking all pairs (often O(n^2)), we maintain only the currently relevant objects in an active set, which often reduces complexity to O(n log n).
Sweep line is not one single algorithm; it is a pattern used in many tasks: segment intersection detection, union of intervals, closest pair of points, skyline, rectangle overlap, and area/coverage computations.
Real-World Analogy
Imagine a vertical scanner moving across a map from left to right. At each x-position where something changes (an object starts or ends), you update your current "active" objects. You do not care about objects far away from the scanner because they cannot interact right now. This local focus is what makes sweep line fast.
Formal Definition
- Events: Critical x (or y) coordinates where state changes (e.g., segment start/end).
- Active set: Data structure of objects intersecting current sweep position.
- Update/query rules: Insert/remove objects at events and check only local neighbors or aggregate state.
Why This Topic Matters
- Major complexity drop: Converts many O(n^2) geometric checks into O(n log n).
- Interview and contest value: Appears in advanced geometry and interval problems.
- Reusable paradigm: Same event-sorting + active-structure idea works across domains.
Mental Model
- Sort events by sweep coordinate.
- At each event, update active set.
- Only nearby/adjacent items in active set can create new interactions.
- Maintain an invariant about what active set represents at current sweep position.
Core Sweep Template
events = sorted(all_events)
active = OrderedStructure()
for event in events:
    if event.type == "start":
        active.insert(event.object)
        # check possible interactions with neighbors
    elif event.type == "end":
        # check neighbors that become adjacent after removal
        active.remove(event.object)
    else:
        # custom event handling (if problem generates extra events)
        process(event)
Step-by-Step Example: Detect Any Overlapping Intervals
Problem: given intervals on a number line, determine whether any two overlap. This is a 1D sweep line version that clearly shows the pattern.
- Convert each interval [l, r] into two events: (l, start), (r, end).
- Sort events by position; if equal position, process starts before ends (or define based on closed/open interval policy).
- Maintain active_count = number of currently open intervals.
- On start: if active_count > 0, overlap exists; increment count.
- On end: decrement count.
ASCII Diagram
Intervals:
I1: [1,5]
I2: [3,7]
I3: [8,10]
Events sorted:
x=1  start(I1)
x=3  start(I2) -> active already non-empty => overlap found
x=5  end(I1)
x=7  end(I2)
x=8  start(I3)
x=10 end(I3)
Python Implementation (1D Sweep)
def has_overlap(intervals: list[tuple[int, int]]) -> bool:
    """
    Return True if any two closed intervals overlap.
    """
    events = []
    for l, r in intervals:
        events.append((l, 0))  # 0 = start
        events.append((r, 1))  # 1 = end

    # For closed intervals [l,r], start before end at same coordinate means touching counts as overlap.
    events.sort(key=lambda x: (x[0], x[1]))

    active = 0
    for _, typ in events:
        if typ == 0:  # start
            if active > 0:
                return True
            active += 1
        else:  # end
            active -= 1
    return False
Line-by-Line Explanation
- Each interval contributes two events.
- Sorting determines the exact sweep processing order.
- active tracks how many intervals are currently open.
- When a new interval starts while one is active, overlap is detected.
2D Sweep Insight: Segment Intersection (High Level)
In 2D segment intersection (Bentley-Ottmann style), events are segment endpoints (and sometimes discovered intersections), and active set stores segments ordered by y-coordinate at current sweep x. Only neighboring segments in this order need intersection checks, which is the key optimization from O(n^2) naive checks.
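The neighbor check itself is an ordinary orientation-based test. The sketch below shows only that building block, not the full sweep with an ordered active set. It assumes integer coordinates and treats touching or overlapping collinear segments as intersecting; the function names are illustrative.

```python
def orient(a, b, c):
    """Return +1 for a counterclockwise turn a -> b -> c, -1 for clockwise, 0 if collinear."""
    v = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (v > 0) - (v < 0)


def on_segment(a, b, p):
    """Given that p is collinear with a-b, check that p lies within the segment's bounding box."""
    return (min(a[0], b[0]) <= p[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= p[1] <= max(a[1], b[1]))


def segments_intersect(p1, p2, q1, q2):
    """True if closed segments p1-p2 and q1-q2 share at least one point."""
    d1 = orient(q1, q2, p1)
    d2 = orient(q1, q2, p2)
    d3 = orient(p1, p2, q1)
    d4 = orient(p1, p2, q2)

    # General case: each segment's endpoints fall on different sides of the other segment's line.
    if d1 != d2 and d3 != d4:
        return True

    # Collinear cases: an endpoint lies on the other segment.
    if d1 == 0 and on_segment(q1, q2, p1):
        return True
    if d2 == 0 and on_segment(q1, q2, p2):
        return True
    if d3 == 0 and on_segment(p1, p2, q1):
        return True
    if d4 == 0 and on_segment(p1, p2, q2):
        return True
    return False
```

In a sweep, this test would run only on segments that become adjacent in the active order, instead of on every pair.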
Brute Force → Better → Optimal
Brute Force
Check all pairs of objects (e.g., all segment pairs) -> O(n^2).
Better
Sort by one coordinate and process incrementally while keeping active candidates.
Optimal Pattern (for many sweep problems)
Event sorting O(n log n) + logarithmic active structure operations per event -> O(n log n) (or O((n+k) log n) when k intersections/events are produced).
Time Complexity
- General sweep pattern: O(E log E + updates/queries), where E is number of events.
- Typical case: O(n log n) for static event sets.
- Intersection-reporting variants: often O((n + k) log n), where k is reported intersections.
Space Complexity
- Events list + active structure usually require O(n) space.
- Output-sensitive variants add O(k) output storage.
Edge Cases
- Tied events: tie-break order (start/end/intersection) must match problem semantics.
- Touching boundaries: decide whether touching counts as overlap/intersection.
- Duplicate objects: ensure deterministic handling to avoid double counting.
- Floating geometry: precision issues can break event ordering and active comparisons.
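The tie-break and touching-boundary cases can be made concrete with a small sketch that counts the maximum number of simultaneously open intervals. The function name max_overlap and its closed flag are assumptions for illustration; with closed=False the intervals are read as half-open [l, r) with l < r.

```python
START, END = 0, 1


def max_overlap(intervals, closed=True):
    """Maximum number of intervals open at the same time."""
    events = []
    for l, r in intervals:
        events.append((l, START))
        events.append((r, END))

    if closed:
        # Closed [l, r]: at equal coordinates process starts first, so touching counts as overlap.
        events.sort(key=lambda e: (e[0], e[1]))
    else:
        # Half-open [l, r): at equal coordinates process ends first, so touching does not overlap.
        events.sort(key=lambda e: (e[0], -e[1]))

    best = active = 0
    for _, typ in events:
        if typ == START:
            active += 1
            best = max(best, active)
        else:
            active -= 1
    return best
```

The only difference between the two policies is the sort key, which is why the tie-break order has to be decided before any other part of the sweep is written.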
Common Mistakes
Pattern Recognition
If a problem has many objects along one axis and asks for overlaps/intersections/coverage while scanning position, think sweep line. Keywords: events, active intervals/segments, sorted endpoints, local neighbor checks.
Practice Problems
- Detect if any intervals overlap (1D sweep).
- Compute maximum number of overlapping intervals (meeting rooms variant).
- Count/report segment intersections (advanced sweep with ordered active set).
- Rectangle union area with line sweep + segment tree (advanced).
Summary
- Idea: move an imaginary line, process sorted events, maintain active set.
- Benefit: avoids all-pairs checks by focusing on currently relevant objects.
- Typical complexity: O(n log n) or output-sensitive O((n+k) log n).
- Key correctness detail: define and preserve event ordering + active-set invariant.
Section 19: Advanced Data Structures
This section covers advanced structures used when arrays, hash maps, and basic trees are not enough. You will learn ordered sets/maps, Treaps, Skip Lists, B-Trees, KD Trees, Persistent Segment Trees, and Rope. The common theme is maintaining order or rich query power with efficient updates.
19.1 Ordered Set / Map
Introduction
An Ordered Set stores unique keys in sorted order. An Ordered Map stores key-value pairs with keys kept in sorted order. Unlike hash-based structures, ordered structures support powerful operations such as:
- find smallest/largest key quickly
- find predecessor/successor of a key
- iterate keys in sorted order
- query/count in key ranges efficiently
These are typically implemented using self-balancing BSTs (Red-Black Tree, AVL, etc.), giving O(log n) update/query times.
Real-World Analogy
Think of a dictionary shelf arranged alphabetically. You can quickly find words near a target word, and easily browse in order. A hash table is like a random box with labels: great for exact lookup, but poor for "next bigger word" or "all words between A and B". Ordered set/map is the alphabetically organized shelf.
Formal Definition
- Ordered Set: collection of distinct keys from a totally ordered domain.
- Ordered Map: mapping key -> value with unique ordered keys.
- Core ops (n = number of keys): insert/delete/search = O(log n), next/prev = O(log n), in-order traversal = O(n).
- Typical backend: balanced BST, so tree height stays O(log n).
Why This Topic Matters
- Range and neighbor queries: Essential in problems involving closest values, interval boundaries, and sorted dynamic sets.
- Sweep line active sets: Many geometry/interval algorithms need ordered active structures.
- Interview depth: Knowing when hash map is insufficient and ordered map is required is a key design skill.
Mental Model
- Hash map: fast exact key lookup, no order guarantees.
- Ordered map/set: slightly slower exact lookup (log n), but rich order-aware operations.
- Use ordered structures when "next/previous/range" appears in requirements.
Core Operations (Conceptual)
Ordered Set
- add(x), remove(x), contains(x)
- first(), last()
- lower(x) (greatest key < x), higher(x) (smallest key > x)
- floor(x) (greatest key <= x), ceiling(x) (smallest key >= x)
- iterate in sorted order
Ordered Map
- put(key, value), get(key), remove(key)
- All ordered-key neighbor operations from set apply to keys.
- Range views/queries on key intervals in many language libraries.
Step-by-Step Usage Pattern
- Insert elements as stream arrives.
- For each new element x, query predecessor/successor to find nearest existing values.
- Optionally update answer from range query on [L, R].
- Delete elements when sliding window or active interval moves.
This pattern appears in nearest-neighbor updates, balanced window statistics, and sweep line active sets.
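A sketch of this usage pattern, using a sorted Python list with bisect as a stand-in for a true ordered set. Updates are therefore O(n) rather than O(log n), and the function name closest_gaps is hypothetical.

```python
import bisect


def closest_gaps(stream):
    """For each value, the distance to the nearest previously seen value (None for the first)."""
    seen = []   # sorted list of earlier values, playing the role of the ordered set
    out = []
    for x in stream:
        i = bisect.bisect_left(seen, x)
        best = None
        if i > 0:                       # predecessor (floor) exists
            best = x - seen[i - 1]
        if i < len(seen):               # successor (ceiling) exists
            gap = seen[i] - x
            best = gap if best is None else min(best, gap)
        out.append(best)
        bisect.insort(seen, x)          # insert after querying, keeping the list sorted
    return out


gaps = closest_gaps([10, 4, 7, 20, 8])
```

Swapping the list for a balanced-tree-backed structure would keep the same query logic while bringing insertion down to O(log n).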
ASCII Diagram
Ordered set keys: 2, 5, 8, 13, 21
For x = 10:
  predecessor (lower/floor) = 8
  successor (higher/ceiling) = 13
For range [5, 13]:
  keys in range -> 5, 8, 13
Python Implementation Notes
Python standard library has no built-in tree-based ordered set/map. Common options:
- bisect + sorted list: simple, but insertion/deletion O(n) due to list shifts.
- third-party SortedContainers: near O(log n) operations, very practical.
- heapq: good for min/max only, not general ordered predecessor/successor.
Example with bisect (educational baseline)
import bisect


class OrderedSetList:
    def __init__(self):
        self.a = []

    def add(self, x: int) -> None:
        i = bisect.bisect_left(self.a, x)
        if i == len(self.a) or self.a[i] != x:
            self.a.insert(i, x)  # O(n) shift

    def remove(self, x: int) -> None:
        i = bisect.bisect_left(self.a, x)
        if i < len(self.a) and self.a[i] == x:
            self.a.pop(i)  # O(n) shift

    def contains(self, x: int) -> bool:
        i = bisect.bisect_left(self.a, x)
        return i < len(self.a) and self.a[i] == x

    def lower(self, x: int):
        i = bisect.bisect_left(self.a, x)
        return self.a[i - 1] if i > 0 else None

    def higher(self, x: int):
        i = bisect.bisect_right(self.a, x)
        return self.a[i] if i < len(self.a) else None
Line-by-Line Explanation
- bisect_left finds insertion/search position in sorted order via binary search O(log n).
- Membership test is O(log n) by checking that position.
- lower/higher come naturally from neighboring indices.
- Insertion/deletion in Python list is O(n) due to element shifts, so this is not a true balanced-tree complexity baseline.
Complexity Table (Typical Implementations)
Structure                     Search     Insert     Delete     Lower/Higher    Ordered Iteration
------------------------------------------------------------------------------------------------
Hash Set / Hash Map           O(1)*      O(1)*      O(1)*      Not supported   Unordered
Balanced BST Ordered Set/Map  O(log n)   O(log n)   O(log n)   O(log n)        O(n)
Python sorted list + bisect   O(log n)   O(n)       O(n)       O(log n)        O(n)
(* average-case for hash structures)
Brute Force → Better → Optimal
Brute Force
Keep unsorted list and scan linearly for predecessor/successor/range each time: O(n) per query.
Better
Keep sorted list + binary search for lookup/neighbors: O(log n) query but O(n) updates.
Optimal (dynamic ordered data)
Use balanced BST ordered set/map for O(log n) both queries and updates.
Time Complexity
- Balanced BST ordered set/map: insert, erase, find, lower/higher all O(log n).
- In-order iteration: O(n).
- Range query traversal: O(log n + k) where k results returned.
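The O(log n + k) range bound can be illustrated on a static sorted list: two binary searches locate the boundaries, and the slice copies the k results. This is a sketch on a plain list, not a balanced tree, and the function names are illustrative.

```python
import bisect


def count_in_range(sorted_keys, lo, hi):
    """Number of keys with lo <= key <= hi, using two O(log n) binary searches."""
    left = bisect.bisect_left(sorted_keys, lo)    # first index with key >= lo
    right = bisect.bisect_right(sorted_keys, hi)  # first index with key > hi
    return right - left


def keys_in_range(sorted_keys, lo, hi):
    """The keys themselves: O(log n) to find the boundaries plus O(k) to copy them."""
    left = bisect.bisect_left(sorted_keys, lo)
    right = bisect.bisect_right(sorted_keys, hi)
    return sorted_keys[left:right]


keys = [2, 5, 8, 13, 21]
```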
Space Complexity
- Store n keys (and n values for map): O(n).
- Balanced tree pointers/metadata add constant-factor overhead per node.
Edge Cases
- Empty set/map: predecessor/successor queries should return None/null safely.
- Duplicate inserts in set: ignored by set semantics.
- Key overwrite in map: insert same key updates value instead of creating duplicate key.
- Boundary queries: lower(min_key) or higher(max_key) has no answer.
Common Mistakes
Pattern Recognition
Keywords that strongly indicate ordered set/map: "closest smaller/greater", "next element in dynamic set", "range count/sum over keys", "maintain sorted active elements", "floor/ceiling".
Practice Problems
- Maintain a dynamic set of numbers and answer predecessor/successor queries online.
- Sliding window with nearest value lookup (active ordered set).
- Given stream of numbers, report count of keys in [L, R] after each update.
Summary
- Ordered set/map stores keys in sorted order with dynamic updates.
- Main strength: predecessor/successor and range operations in O(log n).
- Backed by balanced BST in typical language libraries.
- Use when order-aware queries are required; use hash structures for pure exact-lookup workloads.
19.2 Treap
Introduction
A treap (tree + heap) is a randomized binary search tree where each node stores a key (BST order: left < root < right) and a random priority (heap order: parent priority ≥ children in max-heap convention, or the reverse for min-heap). The random priorities make the tree balanced in expectation: height is O(log n) with high probability, so search, insert, and delete run in expected O(log n) time without the deterministic rebalancing rules of AVL or Red-Black trees.
Real-World Analogy
Think of organizing people in a line by height (BST key order), but each person also draws a random “VIP number” (priority). Higher VIP numbers float up like in a heap. The combination forces a unique shape that stays shallow on average—no one has to manually rebalance by complex rules; the randomness does the work.
Formal Definition
Each node stores a pair (key, priority). Invariants:
- BST property: all keys in left subtree < node.key < all keys in right subtree.
- Heap property: each node’s priority is ≥ (or ≤, by convention) both children’s priorities.
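Both invariants can be verified in one recursive pass. The sketch below assumes the max-heap convention and unique keys; the TNode class and is_treap function are hypothetical names, and the sample tree is hand-built for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TNode:
    key: int
    pri: int
    left: Optional["TNode"] = None
    right: Optional["TNode"] = None


def is_treap(node, lo=None, hi=None, parent_pri=None):
    """Check the BST property via an open key window (lo, hi) and the max-heap property via parent_pri."""
    if node is None:
        return True
    if lo is not None and node.key <= lo:
        return False                      # key too small for this subtree
    if hi is not None and node.key >= hi:
        return False                      # key too large for this subtree
    if parent_pri is not None and node.pri > parent_pri:
        return False                      # child outranks its parent
    return (is_treap(node.left, lo, node.key, node.pri)
            and is_treap(node.right, node.key, hi, node.pri))


sample = TNode(40, 9,
               left=TNode(20, 7, left=TNode(10, 6), right=TNode(30, 5)),
               right=TNode(50, 8, right=TNode(60, 4)))
```

A checker like this is mainly useful as a debugging aid while implementing split, merge, or rotations.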
Why This Topic Matters
- Simple balancing: Easier to implement than AVL/RB in contest settings; split/merge are powerful primitives.
- Implicit treap extension: Same idea supports sequence operations (rope-like) by keying by index.
- Interview depth: Shows you understand randomized data structures and BST invariants.
Mental Model
- BST alone can degenerate to a chain; heap priority “pulls” high-priority nodes up, keeping depth small in expectation.
- Insert: treat as BST insert, then rotate (or use split/merge) until heap property holds—or build via split/merge directly.
- Split and merge are the two core operations from which everything else can be built.
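The rotation-based insert mentioned in the list above can be sketched as follows, assuming a max-heap on priority and unique keys (duplicates are ignored). RNode, insert_rot, and the helper names are illustrative. This is the rotation variant, not the split/merge variant.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class RNode:
    key: int
    pri: float
    left: Optional["RNode"] = None
    right: Optional["RNode"] = None


def rotate_right(y):
    """Lift y.left above y; in-order key sequence is unchanged."""
    x = y.left
    y.left = x.right
    x.right = y
    return x


def rotate_left(x):
    """Lift x.right above x; in-order key sequence is unchanged."""
    y = x.right
    x.right = y.left
    y.left = x
    return y


def insert_rot(root, key, rng=random):
    """Ordinary BST insert, then rotate the new node up while it outranks its parent."""
    if root is None:
        return RNode(key, rng.random())
    if key < root.key:
        root.left = insert_rot(root.left, key, rng)
        if root.left.pri > root.pri:
            root = rotate_right(root)
    elif key > root.key:
        root.right = insert_rot(root.right, key, rng)
        if root.right.pri > root.pri:
            root = rotate_left(root)
    return root  # equal key: already present, nothing to do


def inorder(node):
    return [] if node is None else inorder(node.left) + [node.key] + inorder(node.right)


def heap_ok(node):
    """True if no child has a higher priority than its parent."""
    if node is None:
        return True
    for child in (node.left, node.right):
        if child is not None and child.pri > node.pri:
            return False
    return heap_ok(node.left) and heap_ok(node.right)
```

Passing a seeded random.Random instance as rng makes the resulting shape reproducible, which helps when testing.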
Split and Merge
Split(root, key)
Splits the treap into two treaps (L, R), where all keys in L are ≤ key (or < key, by convention) and all keys in R are > key. Walk down the tree comparing keys only: the recursion re-attaches subtrees beneath their original ancestors, so heap order is preserved without any priority comparisons or rotations.
Merge(L, R)
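Assumes all keys in L < all keys in R. Combines them into one treap: the root with the higher priority becomes the root of the result, and the other treap is merged recursively into its right subtree (if L's root wins) or its left subtree (if R's root wins).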
Insert and Delete (via Split/Merge)
- Insert(k): Split root at k, create node (k, random priority), merge(merge(left, node), right).
- Delete(k): Split to isolate k, split again to remove node, merge remaining parts.
ASCII Diagram (Conceptual)
Treap (max-heap on priority, BST on key)
            (40, prio 9)
           /            \
      (20, 7)          (50, 8)
      /     \                \
 (10,6)    (30,5)          (60,4)
Keys follow BST; priorities decrease down the heap (example numbers illustrative).
Python Implementation (Split / Merge Treap)
import random
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Node:
    key: int
    pri: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def split(root: Optional[Node], key: int) -> Tuple[Optional[Node], Optional[Node]]:
    """All keys <= key go left, > key go right."""
    if root is None:
        return (None, None)
    if root.key <= key:
        l, r = split(root.right, key)
        root.right = l
        return (root, r)
    else:
        l, r = split(root.left, key)
        root.left = r
        return (l, root)


def merge(a: Optional[Node], b: Optional[Node]) -> Optional[Node]:
    """All keys in a < all keys in b."""
    if not a:
        return b
    if not b:
        return a
    if a.pri > b.pri:
        a.right = merge(a.right, b)
        return a
    else:
        b.left = merge(a, b.left)
        return b


def insert(root: Optional[Node], key: int) -> Optional[Node]:
    # split: left has keys <= t, right has > t
    left, right = split(root, key - 1)       # left: keys < key, right: keys >= key
    mid_left, mid_right = split(right, key)  # mid_left: keys == key, mid_right: keys > key
    if mid_left is not None:                 # key already present (unique keys)
        return merge(merge(left, mid_left), mid_right)
    node = Node(key=key, pri=random.randint(1, 10**9))
    return merge(merge(left, node), mid_right)


def erase(root: Optional[Node], key: int) -> Optional[Node]:
    left, right = split(root, key - 1)
    mid_left, mid_right = split(right, key)  # drop mid_left (subtree of nodes with this key)
    return merge(left, mid_right)
Here split(root, t) puts keys <= t in the left treap and keys > t in the right treap. Double-split isolates keys equal to key for insert/delete. Duplicate-key policy should match the problem statement.
Line-by-Line Explanation
- split: If root.key ≤ key, the root belongs to the left treap; split right child and attach left part back.
- merge: Higher priority becomes parent; recursively merge the other side.
- insert: Isolate range for key, insert new leaf if missing, merge three parts.
- erase: Isolate the node with key and drop it by merging around it.
Time and Space Complexity
- Expected time: O(log n) per split, merge, insert, delete, search (walk).
- Worst case (rare): O(n) if the random priorities happen to produce a degenerate, chain-like tree. This is extremely unlikely when priorities are drawn independently at random from a large range.
- Space: O(n) nodes, O(log n) expected recursion depth.
Edge Cases
- Duplicate keys: Decide whether BST allows duplicates; may store count in node.
- Empty tree: split/merge base cases return None.
- Deterministic priorities: If priorities repeat, tie-break consistently to avoid structural ambiguity.
Common Mistakes
Practice Problems
- Implement treap with insert, delete, and search.
- Extend with subtree size for k-th order statistic.
- Implicit treap: reverse subarray, cut/paste sequence.
Summary
- Treap: randomized BST balanced by heap priorities.
- Core ops: split, merge; insert/delete compose from them.
- Expected complexity: O(log n) height and operations.
- Alternative to AVL/RB; especially popular in competitive programming.
19.3 Skip List
Introduction
A skip list is a probabilistic data structure that maintains a sorted sequence of elements and supports search, insert, and delete in expected O(log n) time. Instead of a single linked list (O(n) search), skip lists stack several levels of sparse express lanes: higher levels skip over many nodes so you can “fast-forward” toward the target, then drop down level by level. It is a practical alternative to balanced BSTs and is used in real systems (for example Redis sorted sets use a skip list–like design; Java’s ConcurrentSkipListMap is skip-list based).
Real-World Analogy
Imagine a train line with a local track that stops at every station and an express track that skips stops. To reach a distant station, you ride express until you would overshoot, then switch to a slower line that stops more often. Skip lists are the same idea in a linked structure: higher levels are express, lower levels are local.
Formal Definition
Each node stores a key, an array of forward pointers with forward[i] pointing to the next node at level i, and a level (height) chosen at insert time. Invariant: level i lists are sorted by key; level i+1 is a subsequence of level i. Search starts at the highest level of the header, walks forward while next key < target, then steps down one level, repeating until level 0. Expected number of levels per node is O(1) if level is chosen with geometric distribution (e.g. coin flips until tails).
Why This Topic Matters
- Engineering reality: Easier to implement lock-free or concurrent variants than many tree rebalancing schemes.
- Same asymptotics as balanced BST: Expected O(log n) search/insert/delete without rotations.
- Interviews: Tests understanding of randomized structures and layered linked lists.
Mental Model
- Bottom level (0) is an ordinary sorted linked list.
- Higher levels are shortcuts: a node at level k appears in lists 0..k.
- Search never goes backward along a level—only forward and down.
- Random height keeps shortcuts sparse so expected path length is logarithmic.
Random Level (Typical Rule)
On insert, set level = 1, then while random() < p (often p = 1/2 or 1/4) and level < MAX_LEVEL, increment level. Ignoring the cap, the expected level is 1/(1 − p), which is small (2 for p = 1/2); cap MAX_LEVEL at O(log n) to bound pointers.
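A quick simulation of this rule, counting levels from 1 as in the description above. The function names and the seed parameter are assumptions made for illustration.

```python
import random


def random_level(p=0.5, max_level=16, rng=random):
    """Flip a biased coin until it fails or the cap is reached; levels are counted from 1."""
    level = 1
    while rng.random() < p and level < max_level:
        level += 1
    return level


def average_level(trials, p=0.5, max_level=16, seed=0):
    """Empirical mean level over many draws, using a seeded generator for repeatability."""
    rng = random.Random(seed)
    return sum(random_level(p, max_level, rng) for _ in range(trials)) / trials
```

With p = 1/2 the empirical mean should land close to 2, so an average node carries about two forward pointers regardless of n.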
Step-by-Step: Search
- Start at header, current level = max level in structure.
- While forward pointer at this level points to a node with key < target, move forward.
- If next key ≥ target or null, go down one level.
- At level 0, forward is either the node with key or the insertion position.
Step-by-Step: Insert
- Search path: record, at each level, the last node before dropping (the “update” array).
- Choose random level for new node; extend list height if needed.
- Splice new node: for each level ≤ new level, set new.forward[i] = update[i].forward[i] and update[i].forward[i] = new.
Step-by-Step: Delete
- Find node at level 0 (same search).
- For each level where the node participates, redirect predecessors’ forward pointers to skip the node.
ASCII Diagram
Levels (higher = more express):
L2: HEAD -----------------------> 30 ---------> NULL
L1: HEAD ---------> 15 ---------> 30 --> 45 --> NULL
L0: HEAD --> 10 --> 15 --> 20 --> 30 --> 45 --> NULL
Search for 20: from L2, skip to 30 (too far), down; at L1, skip to 15, forward would be 30 > 20, down; at L0, walk 10 -> 15 -> 20.
Python Implementation (Educational)
import random
from dataclasses import dataclass, field
from typing import List, Optional

MAX_LEVEL = 16
P = 0.5


@dataclass
class Node:
    key: int
    forward: List[Optional["Node"]] = field(default_factory=list)


class SkipList:
    def __init__(self):
        self.header = Node(key=-10**18, forward=[None] * MAX_LEVEL)
        self.level = 0

    def _random_level(self) -> int:
        lvl = 0
        while random.random() < P and lvl < MAX_LEVEL - 1:
            lvl += 1
        return lvl

    def search(self, key: int) -> bool:
        cur = self.header
        for i in range(self.level, -1, -1):
            while cur.forward[i] is not None and cur.forward[i].key < key:
                cur = cur.forward[i]
        cur = cur.forward[0]
        return cur is not None and cur.key == key

    def insert(self, key: int) -> None:
        update: List[Optional[Node]] = [None] * MAX_LEVEL
        cur = self.header
        for i in range(self.level, -1, -1):
            while cur.forward[i] is not None and cur.forward[i].key < key:
                cur = cur.forward[i]
            update[i] = cur
        cur = cur.forward[0]
        if cur is not None and cur.key == key:
            return
        new_level = self._random_level()
        if new_level > self.level:
            for i in range(self.level + 1, new_level + 1):
                update[i] = self.header
            self.level = new_level
        x = Node(key=key, forward=[None] * MAX_LEVEL)
        for i in range(new_level + 1):
            x.forward[i] = update[i].forward[i]
            update[i].forward[i] = x

    def delete(self, key: int) -> None:
        update: List[Optional[Node]] = [None] * MAX_LEVEL
        cur = self.header
        for i in range(self.level, -1, -1):
            while cur.forward[i] is not None and cur.forward[i].key < key:
                cur = cur.forward[i]
            update[i] = cur
        cur = cur.forward[0]
        if cur is None or cur.key != key:
            return
        for i in range(self.level + 1):
            # Identity check: compare node objects, not their field values.
            if update[i].forward[i] is not cur:
                continue
            update[i].forward[i] = cur.forward[i]
        while self.level > 0 and self.header.forward[self.level] is None:
            self.level -= 1
The sentinel header uses a key smaller than any real key. Production code uses a proper comparator and optional values per node.
Line-by-Line Explanation
- _random_level: geometric distribution; expected O(1) levels per node.
- search: same walk as insert until level 0, then check key equality.
- insert: update[i] is the predecessor at level i before splicing in the new node.
- delete: unlink at every level where the target appears; shrink self.level if top levels empty.
Time and Space Complexity
- Expected time: O(log n) for search, insert, delete (with p = 1/2 and MAX_LEVEL = O(log n)).
- Worst case (unlikely): O(n) if every node reaches max level (bad luck).
- Space: O(n) nodes; expected O(n) extra pointers total (each node ~2 pointers on average with p = 1/2).
Edge Cases
- Duplicate keys: Decide whether insert is no-op or multiset; above code skips duplicate.
- Empty list: header only; level may be 0.
- MIN/MAX key: sentinel must be smaller than any inserted key (or use separate head/tail sentinels).
Common Mistakes
- Forgetting to set update entries to header when new node’s level exceeds current list height.
- Off-by-one errors in forward arrays (level 0..L vs length L+1).
Pattern Recognition
When you need sorted order, predecessor/successor, or range iteration with expected logarithmic updates and want a linked structure (or concurrency-friendly design), skip list is a strong candidate next to treap or balanced tree libraries.
Practice Problems
- Implement skip list with insert, delete, search, and optional lower_bound.
- Compare average path length vs theoretical O(log n) by simulation.
- Extend nodes with satellite data (sorted map semantics).
Summary
- Skip list: multi-level sorted linked lists with random node heights.
- Core idea: search forward on high levels, drop down when next key would overshoot.
- Expected complexity: O(log n) time, O(n) space.
- Practical alternative to balanced BSTs; used in real concurrent and database-adjacent structures.
19.4 B-Tree
Introduction
A B-tree is a self-balancing search tree designed so that each node can hold many keys (not just one like a typical binary node). That high branching factor keeps the tree shallow, which is ideal when each node read/write corresponds to an expensive disk block or page. B-trees (and the closely related B+ tree, where all records live in leaves) are the standard index structure in relational databases and many file systems.
Real-World Analogy
A phone book split into thick chapters is easier to search than a single endless scroll: you jump to the right chapter (wide internal node), then narrow inside it. B-tree nodes are those chapters—each stores several separators so one disk read brings many routing decisions at once.
Formal Definition
Fix a parameter t ≥ 2, often called the minimum degree (definitions vary by textbook; some use maximum children m instead). A B-tree of order t satisfies:
- Every node has at most 2t − 1 keys and at most 2t children (common convention).
- Every node except the root has at least t − 1 keys (at least t children for internal nodes).
- The root may have as few as 1 key (if not a leaf) or is a single leaf.
- All leaves appear at the same depth (perfectly balanced).
- An internal node with k keys has k + 1 children; keys partition child subtrees.
B-Tree vs B+ Tree (Practical Note)
In a B+ tree, keys in internal nodes are only separators; all actual records (or pointers to rows) sit in leaves, often linked in order for range scans. Database indexes are usually B+ trees. Classic B-tree may store values in internal nodes too. Algorithm courses often teach B-tree properties; production systems emphasize B+ tree behavior.
Why This Topic Matters
- Disk and cache locality: Fewer levels ⇒ fewer random I/Os for large datasets.
- Industry standard: Explaining indexes as “B+ tree” is expected in system design interviews.
- Contrast with BST: Binary trees minimize comparisons in RAM; B-trees minimize node accesses when nodes are huge.
Mental Model
- Each node = one block/page of sorted keys + child pointers.
- Height stays O(log n) but base of logarithm is large (number of keys per node), so height is small.
- Insert may split a full node; delete may merge or borrow from sibling.
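To see how small the height stays, the sketch below computes the largest possible height for n keys under the common convention that every non-root node holds at least t − 1 keys, so a tree of height h holds at least 2·t^h − 1 keys. The function name is illustrative, and the root is counted as height 0.

```python
def max_btree_height(n: int, t: int) -> int:
    """Largest height h (root at height 0) of a B-tree with n >= 1 keys and parameter t >= 2."""
    if n < 1 or t < 2:
        raise ValueError("need n >= 1 and t >= 2")
    h = 0
    # A tree of height h + 1 needs at least 2 * t**(h + 1) - 1 keys; grow while n allows it.
    while 2 * t ** (h + 1) - 1 <= n:
        h += 1
    return h


billion = 10**9
wide = max_btree_height(billion, 512)   # many keys per node
narrow = max_btree_height(billion, 2)   # smallest legal t
```

Integer arithmetic is used instead of a floating-point logarithm to avoid rounding errors at exact powers of t.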
Search Algorithm
- Start at root.
- Find smallest index i such that key ≤ keys[i] (using linear or binary search within the node); if no such index exists, let i be the number of keys.
- If key equals keys[i], found (unless internal-only separators in B+).
- Else descend to child i, whose subtree holds the keys between keys[i−1] and keys[i].
- Repeat until leaf or match.
Cost per level: O(t) comparisons inside node if linear scan, O(log t) if binary search on keys within node.
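The search steps above can be sketched as follows, using binary search inside each node. `BTreeNode` and `btree_search` are hypothetical names chosen for this illustration, and the node layout (a key list plus a child list that is empty for leaves) is an assumption, not a fixed on-disk format.

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BTreeNode:
    # Hypothetical minimal node: sorted keys plus child pointers.
    keys: List[int]
    children: List["BTreeNode"] = field(default_factory=list)  # empty for a leaf


def btree_search(node: Optional[BTreeNode], key: int) -> bool:
    """Return True if key is stored anywhere in a classic B-tree."""
    while node is not None:
        # Smallest i with key <= keys[i]; equals len(keys) if none exists.
        i = bisect_left(node.keys, key)
        if i < len(node.keys) and node.keys[i] == key:
            return True
        if not node.children:  # reached a leaf without a match
            return False
        node = node.children[i]
    return False


root = BTreeNode(
    keys=[40, 70],
    children=[BTreeNode([10, 20]), BTreeNode([50, 60]), BTreeNode([80, 90])],
)
print(btree_search(root, 60), btree_search(root, 65))  # True False
```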
Insert (High Level)
- Descend to the leaf where the new key belongs.
- If leaf has room (< max keys), insert in sorted order.
- If leaf is full, split into two nodes with median key promoted to parent (or handled per variant).
- If parent becomes overfull, split propagates upward; new root may appear if old root splits.
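The split step can be illustrated on the key list alone. This is a minimal sketch with a hypothetical helper `split_full_keys`; it ignores child pointers and the parent update, which a full insert would also have to handle.

```python
from typing import List, Tuple


def split_full_keys(keys: List[int]) -> Tuple[List[int], int, List[int]]:
    """Split a full node's sorted keys into (left, median, right)."""
    mid = len(keys) // 2
    return keys[:mid], keys[mid], keys[mid + 1:]


# Full leaf for t = 2 (max 3 keys); the median 20 moves up to the parent.
left, median, right = split_full_keys([10, 20, 30])
print(left, median, right)  # [10] 20 [30]
```

For a full node with 2t − 1 keys, both halves end up with exactly t − 1 keys, which is the minimum allowed, so the two new nodes are valid immediately.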
Delete (High Level)
- Remove key from leaf (or swap with inorder predecessor/successor if stored in internal node—B+ usually deletes from leaf).
- If node has too few keys, borrow from left/right sibling via parent rotation, or merge with sibling and pull down parent key.
- If root becomes empty after merge, shrink height.
ASCII Diagram (Order t = 2, max 3 keys per node — illustrative)
          [ 40 | 70 ]
         /     |     \
  [10|20]   [50|60]   [80|90]
Internal node splits key range; children are subtrees with keys in (..,40), (40,70), (70,..).
Actual B-trees use larger t in practice (many keys per page).
Python Sketch: Search in a Simplified Node
def find_child_index(keys: list[int], k: int) -> int:
    """Return child slot for key k (0..len(keys))."""
    i = 0
    while i < len(keys) and k > keys[i]:
        i += 1
    return i

# Conceptual: node has keys[] and children[] length len(keys)+1
# Recursively descend until leaf or match.
Time Complexity
- Let n = number of keys; each node holds Θ(t) keys, so height h = O(log_t n).
- Search: O(h · t) with linear scan per node, or O(h · log t) with binary search inside node.
- Since t is constant for a fixed page size, this is O(log n), and the height stays small in practice.
- Insert/Delete: O(h) node visits; a single operation triggers at most O(h) splits or merges.
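To make the small-height claim concrete, here is a sketch that computes the worst-case height for n keys. It relies on the standard bound that a B-tree of height h with minimum degree t holds at least 2·t^h − 1 keys; the function name `btree_max_height` is a hypothetical helper, and height is counted in edges from root to leaf.

```python
def btree_max_height(n: int, t: int) -> int:
    """Largest h with 2 * t**h - 1 <= n, i.e. the worst-case height for n keys."""
    if n < 1 or t < 2:
        raise ValueError("need n >= 1 and t >= 2")
    h = 0
    while 2 * t ** (h + 1) - 1 <= n:
        h += 1
    return h


print(btree_max_height(10**9, 100))  # 4
print(btree_max_height(10**9, 2))    # 28
```

With a fanout of about 100, a billion keys sit at most four levels below the root, compared with 28 levels for the minimum fanout t = 2.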
Space Complexity
- O(n) keys plus O(n) child pointers overall (tree structure).
- Internal fragmentation in real disks adds constant-factor overhead per page.
Edge Cases
- Root split: tree grows in height when root is full and splits.
- Root merge: height may shrink when root’s children merge.
- Duplicate keys: policy varies (disallow, or chain in leaf for B+).
Common Mistakes
Pattern Recognition
Questions about database indexes, filesystem metadata, “why not binary tree on disk,” or “log base” of index depth → think B-tree / B+ tree and fanout.
Practice Problems
- Trace insertions that cause leaf split and then internal split on paper.
- Compare max height of AVL vs B-tree for same n and large page fanout.
- Explain why B+ tree is preferred for range queries in SQL indexes.
Summary
- B-tree: balanced multi-way search tree with min/max keys per node.
- Goal: minimize tree height for block-oriented storage.
- Ops: search; insert/delete with split, merge, borrow.
- B+ tree: internal separators only, records in leaves—common in DB engines.
19.5 KD Tree
Introduction
A k-d tree (k-dimensional tree) is a binary space-partitioning tree that stores points in
k-dimensional space. Each internal node splits space with an axis-aligned hyperplane: at depth d, we often
split on axis d mod k (cycle through dimensions). Points in the “left” subtree lie on one
side of the split, points in the “right” subtree on the other. k-d trees support
nearest-neighbor search, range queries (report points in a box or ball),
and approximate geometry queries more efficiently than brute-force O(n) scans when dimensionality is modest
and data is not adversarial.
Real-World Analogy
Imagine organizing a map of cities: first draw a vertical line dividing east and west halves, then within each half draw horizontal lines, then vertical again, and so on. Each region shrinks until it holds one city (or a small bucket). To find the city closest to a query point, you descend the tree like a decision tree, then backtrack and check whether the other side of a split could hold a closer city.
Formal Definition
Each node stores a point p, a split dimension axis ∈ {0,…,k−1},
and left/right children containing points whose coordinate on that axis is less than or greater than (or ≤ / >)
p[axis], according to convention. Leaves may store one point or a small bucket. The tree is
binary and partitions ℝ^k recursively. Common construction: split at the median along the current axis
for balance; with linear-time median selection this gives O(n log n) build time for n points
(O(n log² n) if every level is fully sorted).
Why This Topic Matters
- Nearest neighbor: Core in graphics, ML (k-NN), mapping, and collision broad-phase.
- Range search: “All points in axis-aligned rectangle” in average sublinear time per query in good cases.
- Curriculum bridge: Connects BST ideas to computational geometry and higher dimensions.
Mental Model
- Each node = one splitting plane perpendicular to one coordinate axis.
- Depth cycles axis: x, then y, then z, then x again in 3D.
- Search prunes subtrees when query cannot possibly beat current best distance.
Construction (Median Split)
- Choose axis = depth mod k.
- Sort points by that axis (or use nth_element / quickselect for median in O(n) per level).
- Median point becomes node; recursively build left from lower half, right from upper half.
- Stop when no points remain (or bucket size ≤ 1).
Balanced median builds yield height O(log n) for n points (assuming distinct coordinates in typical analysis).
Nearest Neighbor Search (Sketch)
- Recursive descent: at each node, go to the child on the same side of the split as the query point.
- Update best distance when visiting a point.
- On unwind, if distance from query to splitting plane ≤ current best radius, search the other subtree too (sphere can cross the plane).
Pruning is the key: if the query is far from the plane compared to best distance, skip the other side.
ASCII Diagram (k = 2)
Points in plane:
y
|            C
|    B           D
|  A               E
+------------------- x
First split on x (vertical line through median x):
left region: A, B
right region: C, D, E
Next split on y within each region, and so on.
Python Implementation (Build + NN Search Sketch)
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class KDNode:
    point: Tuple[float, ...]
    axis: int
    left: Optional["KDNode"] = None
    right: Optional["KDNode"] = None

def build_kdt(points: List[Tuple[float, ...]], depth: int = 0) -> Optional[KDNode]:
    if not points:
        return None
    k = len(points[0])
    axis = depth % k
    # sorted() leaves the caller's list untouched (list.sort would reorder it).
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return KDNode(
        point=points[mid],
        axis=axis,
        left=build_kdt(points[:mid], depth + 1),
        right=build_kdt(points[mid + 1 :], depth + 1),
    )

def dist2(a: Tuple[float, ...], b: Tuple[float, ...]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(node: Optional[KDNode], target: Tuple[float, ...],
            best: Tuple[float, Optional[Tuple[float, ...]]]) -> Tuple[float, Optional[Tuple[float, ...]]]:
    """Returns (best_dist2, best_point). Start with best = (float("inf"), None)."""
    if node is None:
        return best
    d = dist2(node.point, target)
    if d < best[0]:
        best = (d, node.point)
    axis = node.axis
    diff = target[axis] - node.point[axis]
    # Visit the side containing the target first, then the far side if needed.
    first, second = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(first, target, best)
    if diff * diff < best[0]:  # hyperplane might intersect best sphere
        best = nearest(second, target, best)
    return best
Production code adds tie-breaking, bucket leaves, and iterative stacks; high dimensions often use approximate NN or other structures (LSH, ball trees) because k-d tree performance can degrade.
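The same pruning idea also answers axis-aligned range queries. The following is a minimal sketch of a range count; it defines its own small `Node` type and `build` helper (hypothetical names mirroring the fields above) so that it runs standalone, and it assumes inclusive box bounds.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Point = Tuple[float, ...]


@dataclass
class Node:
    # Hypothetical standalone node type mirroring point/axis/left/right.
    point: Point
    axis: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(points: Sequence[Point], depth: int = 0) -> Optional[Node]:
    if not points:
        return None
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return Node(pts[mid], axis, build(pts[:mid], depth + 1), build(pts[mid + 1:], depth + 1))


def range_count(node: Optional[Node], lo: Point, hi: Point) -> int:
    """Count points p with lo[d] <= p[d] <= hi[d] on every axis d."""
    if node is None:
        return 0
    a, c = node.axis, node.point[node.axis]
    count = int(all(l <= x <= h for l, x, h in zip(lo, node.point, hi)))
    if lo[a] <= c:  # box reaches into the left side (coordinates <= c)
        count += range_count(node.left, lo, hi)
    if hi[a] >= c:  # box reaches into the right side (coordinates >= c)
        count += range_count(node.right, lo, hi)
    return count


pts = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build(pts)
print(range_count(tree, (3, 1), (8, 5)))  # 3: (5, 4), (8, 1), (7, 2)
```

Both pruning tests use non-strict comparisons so that points sharing the split coordinate are found regardless of which side the median sort placed them on.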
Line-by-Line Explanation
- build_kdt: sorts on current axis, median as root; simple but O(n log² n) if sorting each level naively; median-of-axis via quickselect improves to O(n log n) total.
- nearest: visits near child first, then optionally the far child if the splitting hyperplane is close enough.
- diff * diff < best[0]: squared distance to plane along axis (exact plane distance in axis-normal direction).
Time and Space Complexity
- Build: O(n log² n) naive per-level sort; O(n log n) with linear-time median partition per level.
- NN query (average, low k): often O(log n) with good pruning; worst case O(n).
- Space: O(n) nodes.
In high dimensions, “curse of dimensionality” makes pruning weak—many queries degrade toward linear scan.
Edge Cases
- Empty set: return None.
- Duplicate coordinates: may need duplicate handling or bucket nodes.
- All points collinear: tree still valid but splits may be degenerate.
Common Mistakes
Pattern Recognition
Keywords: 2D/3D nearest point, “points in rectangle,” k-NN with small k and moderate n, spatial indexing without full database engine.
Practice Problems
- LeetCode 973 — K Closest Points to Origin (can use heap; try k-d tree for learning).
- Implement range count: points inside axis-aligned box.
- Compare runtime of brute-force vs k-d tree NN on random 2D points.
Summary
- k-d tree: binary tree splitting k-dimensional space with axis-aligned planes.
- Build: median on rotating axes; NN: recurse + prune across split plane.
- Good for: low–moderate dimension, static or lightly dynamic point sets.
- Caveat: high dimension ⇒ curse of dimensionality; consider other methods.
19.6 Persistent Segment Tree
Introduction
A persistent data structure preserves old versions after updates: you can still query “what did the array look like at time t?” or “sum elements in [L, R] in version t?” A persistent segment tree is a segment tree where each point update creates a new root and only allocates O(log n) new nodes along the path from root to leaf, reusing unchanged subtrees from the previous version. This is path copying (structural sharing).
Real-World Analogy
Think of version control: when you edit one file, Git does not duplicate the entire repository—it stores a delta or new blob and reuses unchanged trees. Persistent segment trees do the same: only the path from root to the changed leaf is “new”; sibling subtrees are shared pointers to old nodes.
Formal Definition
Each node of a segment tree covers an interval [l, r) (or [l, r] by convention) and stores an
aggregate (sum, min, count, etc.). Persistent update: to set position pos,
clone the root, then recursively clone only the child on the path containing pos; the other
child pointer copies the previous child's pointer. Each version is identified by a root
pointer. Space over time: O(n + q log n) for the initial build plus q updates, since the build
allocates O(n) nodes and each update allocates O(log n) new nodes.
Why This Topic Matters
- Offline/historical queries: “Range sum at version k” without storing full array snapshots.
- Competitive programming: Classic for K-th order statistic on subarray, mergeable history problems.
- Functional persistence: Illustrates immutable structure sharing clearly.
Mental Model
- Normal segment tree: one tree, mutable.
- Persistent: each update yields a new root; old root still valid for old queries.
- Unchanged branches are literally the same nodes in memory (shared).
Brute Force → Better → Optimal
Brute Force
After each update, copy the entire array or entire segment tree: O(n) time and O(n) space per version → O(nq) total for q updates.
Better
Store only deltas per version (difference arrays)—works for some queries but not universal.
Optimal (Persistent Segment Tree)
Path copy only: O(log n) new nodes per update, O(log n) query per version.
Step-by-Step: Persistent Point Update
- Start from previous version’s root and interval [0, n).
- Create a new node copying the aggregate field; point one child to old child if range does not contain pos.
- Recurse into the half that contains pos; the other child reuses previous pointer unchanged.
- At leaf, write new value and propagate sums upward in cloned nodes only.
- Return new root as new version id.
ASCII Diagram (Path Copy)
Version 0 tree:          Version 1 after updating index i:
      R0                       R1 (new root)
     /  \                     /  \
   A0    B0                 A1    B0     <- B subtree reused
  /  \                     /  \
... ...                  ... ...         only left spine cloned
Python Implementation (Sum, Persistent Update)
from dataclasses import dataclass
from typing import Optional, List

@dataclass
class Node:
    left: Optional["Node"]
    right: Optional["Node"]
    sum: int

def build(a: List[int], l: int, r: int) -> Node:
    if l + 1 == r:
        return Node(None, None, a[l])
    m = (l + r) // 2
    left = build(a, l, m)
    right = build(a, m, r)
    return Node(left, right, left.sum + right.sum)

def update(prev: Optional[Node], l: int, r: int, pos: int, val: int) -> Node:
    if l + 1 == r:
        return Node(None, None, val)
    m = (l + r) // 2
    if pos < m:
        new_left = update(prev.left, l, m, pos, val)
        new_right = prev.right  # shared with the previous version
    else:
        new_left = prev.left  # shared with the previous version
        new_right = update(prev.right, m, r, pos, val)
    return Node(new_left, new_right, new_left.sum + new_right.sum)

def query(node: Node, l: int, r: int, ql: int, qr: int) -> int:
    if qr <= l or r <= ql:
        return 0
    if ql <= l and r <= qr:
        return node.sum
    m = (l + r) // 2
    return query(node.left, l, m, ql, qr) + query(node.right, m, r, ql, qr)
build creates version 0. Each update(prev_root, …) returns a new root for the next
version. Assumes prev is non-null for internal calls; guard or use leaves for base. Production
code adds null checks and dynamic size.
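A short, self-contained usage sketch may help show what "version = root pointer" means in practice. It repeats compact versions of build, update, and query under a hypothetical `PNode` type (with the field named `total` rather than `sum`), keeps the roots in a plain list, and assumes half-open intervals [l, r).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class PNode:
    # Hypothetical immutable node; frozen=True turns accidental mutation into an error.
    left: Optional["PNode"]
    right: Optional["PNode"]
    total: int


def build(a: List[int], l: int, r: int) -> PNode:
    if l + 1 == r:
        return PNode(None, None, a[l])
    m = (l + r) // 2
    left, right = build(a, l, m), build(a, m, r)
    return PNode(left, right, left.total + right.total)


def update(prev: PNode, l: int, r: int, pos: int, val: int) -> PNode:
    if l + 1 == r:
        return PNode(None, None, val)
    m = (l + r) // 2
    if pos < m:
        left, right = update(prev.left, l, m, pos, val), prev.right
    else:
        left, right = prev.left, update(prev.right, m, r, pos, val)
    return PNode(left, right, left.total + right.total)


def query(node: PNode, l: int, r: int, ql: int, qr: int) -> int:
    if qr <= l or r <= ql:
        return 0
    if ql <= l and r <= qr:
        return node.total
    m = (l + r) // 2
    return query(node.left, l, m, ql, qr) + query(node.right, m, r, ql, qr)


a = [5, 1, 4, 2]
n = len(a)
versions = [build(a, 0, n)]                        # version 0
versions.append(update(versions[0], 0, n, 1, 10))  # version 1: a[1] = 10

print(query(versions[0], 0, n, 0, 4))          # 12, old version is unchanged
print(query(versions[1], 0, n, 0, 4))          # 21
print(versions[1].right is versions[0].right)  # True, untouched subtree is shared
```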
Line-by-Line Explanation
- update clones only the side containing pos; the other child is shared via reference.
- sum is recomputed from children in new nodes along the path.
- query is identical to normal segment tree but uses the root for the chosen version.
Time and Space Complexity
- Build: O(n) nodes for initial array.
- Update: O(log n) time, O(log n) new nodes.
- Query: O(log n) per version.
- Total space after q updates: O(n + q log n) nodes stored.
Edge Cases
- First version: build from initial array before persistent updates.
- Out of range index: reject or define behavior.
- Zero-length range: query returns 0.
Common Mistakes
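- Mutating nodes reachable from prev when cloning; older versions must remain valid, so only newly created nodes may be written.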
Pattern Recognition
Problems asking for range queries on historical array states, K-th number in subarray, or “if we only changed one element per step, answer across timeline” often map to persistent segment tree or related persistent structures (Fenwick with copies is not standard—segment tree persistence is the usual tool).
Practice Problems
- Maintain versions of an array with point updates; answer range sum for arbitrary past version.
- K-th smallest value in subarray L..R (chairman tree / merge sort tree + persistence).
- Compare memory of full copy vs persistent after many updates.
Summary
- Persistent segment tree: immutable path copying on update; shared unchanged subtrees.
- Version = root pointer; update O(log n) time and nodes.
- Use for: historical range queries and advanced order-statistics on subarrays.
- Never mutate old nodes in place when older versions must remain valid.
19.7 Rope Data Structure
Introduction
A rope is a tree-based string data structure optimized for very large text and frequent edits (insert, delete, split, concatenate) at arbitrary positions. Instead of storing one giant contiguous string (where middle edits can be expensive due to shifting), a rope stores text in leaves and keeps internal nodes with subtree length metadata. This gives near O(log n) edits on average for balanced ropes.
Real-World Analogy
Think of a book made of many pages clipped together. Inserting text near page 500 does not require rewriting all pages after it; you split one page and insert new pages, then reconnect. A rope is that “editable pages” model for strings.
Formal Definition
A rope is a binary tree in which:
- Leaves hold short string chunks.
- Each internal node stores weight = total length of left subtree (or full subtree size by variant).
- In-order traversal of leaves gives the full string.
Why This Topic Matters
- Text editors: Classic structure for efficient mutable text buffers.
- Large strings: Avoids O(n) copying for every middle insertion/deletion in long documents.
- Tree-operation pattern: Reinforces split/merge ideas from treaps and persistent structures.
Mental Model
- String is represented as concatenation of leaf chunks.
- Internal weights help route by index (like order-statistics in trees).
- To edit at position i, split around i, modify middle, then concatenate back.
Core Operations
1) Index / Character Access
Compare index i with node weight. If i < weight, go left; else go right with i - weight. Reach leaf and index into chunk.
2) Split(rope, i)
Returns two ropes: left contains first i characters, right contains the rest.
3) Concat(a, b)
Create parent whose left = a, right = b, and recompute weight/size. Rebalance if needed.
4) Insert/Delete/Substr
- Insert at i: split at i -> concat(left, new_text_rope) -> concat(result, right).
- Delete [l, r): split at l, split second part at (r-l), drop middle, concat remaining.
- Substring [l, r): same split pattern but keep middle.
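The operations above can be sketched on a deliberately simple rope. This is an unbalanced illustration, not a production design: the `Rope` class stores the full subtree length in a `length` field (one of the variants mentioned earlier), and the helper names `leaf`, `concat`, `split`, `insert`, `delete`, and `flatten` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Rope:
    # Hypothetical minimal node: a leaf holds text; an internal node holds two children.
    left: Optional["Rope"] = None
    right: Optional["Rope"] = None
    text: str = ""
    length: int = 0  # total characters in this subtree


def leaf(text: str) -> Optional[Rope]:
    return Rope(text=text, length=len(text)) if text else None


def concat(a: Optional[Rope], b: Optional[Rope]) -> Optional[Rope]:
    if a is None:
        return b
    if b is None:
        return a
    return Rope(left=a, right=b, length=a.length + b.length)


def split(node: Optional[Rope], i: int) -> Tuple[Optional[Rope], Optional[Rope]]:
    """Return (first i characters, the rest) without modifying existing nodes."""
    if node is None:
        return None, None
    if node.left is None and node.right is None:  # leaf: cut the chunk
        return leaf(node.text[:i]), leaf(node.text[i:])
    weight = node.left.length  # characters in the left subtree
    if i < weight:
        l, r = split(node.left, i)
        return l, concat(r, node.right)
    l, r = split(node.right, i - weight)
    return concat(node.left, l), r


def insert(node: Optional[Rope], i: int, text: str) -> Optional[Rope]:
    l, r = split(node, i)
    return concat(concat(l, leaf(text)), r)


def delete(node: Optional[Rope], lo: int, hi: int) -> Optional[Rope]:
    l, rest = split(node, lo)
    _, r = split(rest, hi - lo)  # drop the middle part [lo, hi)
    return concat(l, r)


def flatten(node: Optional[Rope]) -> str:
    if node is None:
        return ""
    if node.left is None and node.right is None:
        return node.text
    return flatten(node.left) + flatten(node.right)


r = insert(leaf("abcdef"), 3, "XYZ")
print(flatten(r))                 # abcXYZdef
print(flatten(delete(r, 3, 6)))   # abcdef
```

Because `concat` never creates an internal node with a missing child, `split` can safely read `node.left.length`. Without rebalancing, repeated edits can make this tree deep, so the O(log n) bounds do not apply to the sketch.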
ASCII Diagram
Rope for "HelloWorld!!!" (chunked):
              [len=13]
             /        \
        [len=5]      [len=8]
        /    \       /     \
     "He"  "llo" "World"  "!!!"
In-order leaves -> "He" + "llo" + "World" + "!!!"
Python Sketch (Conceptual Rope Node)
from dataclasses import dataclass
from typing import Optional

@dataclass
class RopeNode:
    left: Optional["RopeNode"] = None
    right: Optional["RopeNode"] = None
    text: str = ""      # non-empty only for leaves in this sketch
    weight: int = 0     # length of left subtree (or len(text) for leaf convention)
    total_len: int = 0

def recalc(node: Optional[RopeNode]) -> int:
    if node is None:
        return 0
    if node.left is None and node.right is None:
        node.weight = len(node.text)
        node.total_len = len(node.text)
    else:
        left_len = recalc(node.left)
        right_len = recalc(node.right)
        node.weight = left_len
        node.total_len = left_len + right_len
    return node.total_len
Production ropes include balancing (often via AVL/Red-Black/Treap-like priorities), leaf-size constraints, and efficient split/concat implementations.
Step-by-Step Example: Insert
Insert "XYZ" into "abcdef" at position 3:
- Split at 3 -> left "abc", right "def".
- Create rope for "XYZ".
- Concat(left, new) -> "abcXYZ".
- Concat(result, right) -> "abcXYZdef".
Brute Force → Better → Optimal
Brute Force
Plain immutable string edits in middle repeatedly: each edit may copy O(n) characters.
Better
Gap buffer/piece table (editor-specific alternatives) reduce some costs depending on edit patterns.
Optimal (for tree-based random-position edits)
Rope with balanced tree gives O(log n) navigation/split/concat plus chunk-size effects, making large-document random edits practical.
Time Complexity (Balanced Rope)
- Index access: O(log n)
- Split/concat: O(log n)
- Insert/delete: O(log n + k), where k is the length of the inserted or deleted text
- Flatten to full string: O(n)
Without balancing, rope height can degrade and operations become slower (up to O(n) in worst case).
Space Complexity
- O(n) for characters plus tree node overhead.
- Persistent/versioned variants can share unchanged subtrees similarly to persistent trees.
Edge Cases
- Insert at 0 or end: reduces to prepend/append concat.
- Delete empty range: no change.
- Very small text: plain string may be faster due to lower constants.
Common Mistakes
Pattern Recognition
If requirements include huge mutable text, many random-position edits, undo/versioning, or fast substring operations, rope/piece-table/gap-buffer discussion is expected. Rope is strong for tree-based split/merge workloads.
Practice Problems
- Implement split and concat for a simplified rope with metadata updates.
- Build insert/delete using only split + concat.
- Benchmark repeated middle insertions: plain string vs rope-style chunk tree.
Summary
- Rope: balanced tree of string chunks with length metadata.
- Core idea: split + concat compose efficient text edits.
- Best for: large, mutable text with frequent middle edits.
- Requires metadata maintenance and balancing for promised performance.
20.1 Meet in the Middle
Introduction
Meet in the Middle (MITM) is a powerful algorithmic strategy used when brute force is too slow, but classic dynamic programming is not feasible because constraints are large or values are huge. The core trick is simple: instead of solving a problem over n elements at once, split it into two halves of roughly n/2 each, solve each half separately, and then combine the results efficiently.
For beginners, this pattern can feel magical at first. But once you see the complexity math, it becomes one of the most practical tools for interview and competitive programming problems involving subsets, sums, or combinations.
Real-World Analogy
Imagine trying to open a safe where the code has 8 digits. Brute force means trying all 10^8 possibilities. Instead, suppose you split the code into left 4 digits and right 4 digits. You precompute all left possibilities and all right possibilities, then combine information to find the full code much faster. You did not reduce correctness, only restructured the search.
Formal Definition
Meet in the Middle is a divide-and-combine strategy where the search space is partitioned into two parts, all possibilities of each part are enumerated, and a fast merge/search technique (sorting + binary search, hashing, two pointers, etc.) is used to produce the final answer.
For subset-style problems, brute force is typically O(2^n), and splitting gives roughly O(2^(n/2)) work per side plus merge overhead.
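The hashing variant of the merge step mentioned in the definition can be sketched as follows. The example counts subsets with an exact target sum; `subset_sums` and `count_subsets_with_sum` are hypothetical helper names, and the empty subset is counted as a valid subset.

```python
from collections import Counter
from typing import List


def subset_sums(nums: List[int]) -> List[int]:
    """All 2^k subset sums of nums, with repeats."""
    sums = [0]
    for num in nums:
        sums += [s + num for s in sums]
    return sums


def count_subsets_with_sum(arr: List[int], target: int) -> int:
    """Count subsets of arr (including the empty one) whose sum equals target."""
    mid = len(arr) // 2
    left_counts = Counter(subset_sums(arr[:mid]))
    # For every right-half sum y, look up how many left-half sums equal target - y.
    return sum(left_counts[target - y] for y in subset_sums(arr[mid:]))


print(count_subsets_with_sum([1, 2, 3], 3))  # 2: {3} and {1, 2}
```

A hash lookup replaces the sort and binary search, so the merge is linear in the number of half-sums on average; it suits exact-match questions, while ordered searches fit "largest sum ≤ S" style questions.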
Why This Topic Matters
- Transforms impossible brute force into feasible solutions for n around 30 to 45.
- Appears in subset-sum style interview questions when constraints block standard DP table methods.
- Builds strong pattern recognition for balancing time and memory in exponential problems.
- Teaches a reusable design principle: precompute partial state, then merge smartly.
Mental Model
All subsets of n elements:
2^n combinations (too many)
          |
          v
Split into two halves:
Left half (n/2)         Right half (n/2)
2^(n/2) subsets         2^(n/2) subsets
      |                         |
      +-- combine via search ---+
                  |
                  v
            final answer
The key benefit is that 2^(n/2) is dramatically smaller than 2^n. For example, if n = 40, brute force is about 1 trillion subsets, while MITM uses about 1 million subsets per side.
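The arithmetic behind that comparison is easy to check directly; the variable names below are just for illustration.

```python
n = 40
brute = 2 ** n              # subsets of all n elements
mitm = 2 * 2 ** (n // 2)    # subsets enumerated across both halves

print(f"{brute:,}")         # 1,099,511,627,776
print(f"{mitm:,}")          # 2,097,152
print(f"{brute // mitm:,}") # 524,288 times fewer subsets to enumerate
```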
Evolution: Brute Force → Better → Optimal
Brute Force
Enumerate every subset of all n elements and check condition.
- Time: O(2^n) (usually too slow for n ~ 40)
- Space: small if streamed, but time kills feasibility.
Better
Use pruning/backtracking if constraints allow monotonic cuts. This helps some cases, but worst case can still be near 2^n.
Optimal (for medium n exponential search problems)
Use MITM: split array into 2 halves, generate all subset results for each half, then combine using binary search/hash/two pointers.
- Time often near O(2^(n/2) * poly(n))
- Space often O(2^(n/2))
Step-by-Step Breakdown (Classic Subset Sum: Max Sum ≤ S)
Problem: Given an array arr and a limit S, find the maximum subset sum that is less than or equal to S.
- Split arr into left and right.
- Generate all subset sums of left, store in list L.
- Generate all subset sums of right, store in list R.
- Sort R.
- For each value x in L, we need the largest y in R such that x + y ≤ S.
- Use binary search on R to find this best y.
- Track the global maximum x + y.
ASCII Diagram
arr = [3, 34, 4, 12, 5, 2], S = 10
Split:
left = [3, 34, 4]
right = [12, 5, 2]
L = all subset sums(left) -> [0, 3, 34, 37, 4, 7, 38, 41]
R = all subset sums(right) -> [0, 12, 5, 17, 2, 14, 7, 19]
Sort R -> [0, 2, 5, 7, 12, 14, 17, 19]
For each x in L:
find largest y <= S - x in R (binary search)
candidate = x + y
best candidate <= S is answer
Python Implementation
from bisect import bisect_right
from typing import List

def all_subset_sums(nums: List[int]) -> List[int]:
    """
    Returns list of sums of all subsets of nums.
    If len(nums) = k, result size is 2^k.
    """
    sums = [0]
    for num in nums:
        new_sums = []
        for current in sums:
            new_sums.append(current + num)
        sums.extend(new_sums)
    return sums

def max_subset_sum_leq_s(arr: List[int], s: int) -> int:
    """
    Meet in the Middle solution.
    Returns max subset sum <= s.
    Assumes non-negative values and s >= 0 (see Edge Cases for negatives).
    """
    n = len(arr)
    mid = n // 2
    left = arr[:mid]
    right = arr[mid:]
    left_sums = all_subset_sums(left)
    right_sums = all_subset_sums(right)
    right_sums.sort()
    best = 0
    for x in left_sums:
        # Safe shortcut only because every right sum is >= 0 here.
        if x > s:
            continue
        # Need maximum y such that x + y <= s
        target = s - x
        idx = bisect_right(right_sums, target) - 1
        if idx >= 0:
            best = max(best, x + right_sums[idx])
    return best

# Example
if __name__ == "__main__":
    arr = [3, 34, 4, 12, 5, 2]
    s = 10
    print(max_subset_sum_leq_s(arr, s))  # 10
Line-by-Line Explanation
Function all_subset_sums
- Start with [0] because empty subset sum is always 0.
- For each number, create new sums by adding it to every existing sum.
- Append those new sums to original list; count doubles each step.
- After k numbers, there are exactly 2^k sums.
Function max_subset_sum_leq_s
- Split array into two halves to reduce exponent from n to n/2.
- Compute all subset sums for both halves.
- Sort right sums so we can binary search quickly.
- For each left sum x, find best right sum y ≤ s - x.
- Use bisect_right to get index of last valid y.
- Update answer with x + y.
Time Complexity
Let left size be n1 and right size be n2, with n1 + n2 = n and usually n1 ≈ n2 ≈ n/2.
- Generate left_sums: O(2^n1)
- Generate right_sums: O(2^n2)
- Sort right_sums: O(2^n2 log(2^n2)) = O(2^n2 * n2)
- Loop over left_sums and binary search each: O(2^n1 * log(2^n2)) = O(2^n1 * n2)
When halves are equal, complexity is roughly O(2^(n/2) * n), far better than O(2^n).
Space Complexity
- left_sums stores 2^n1 values.
- right_sums stores 2^n2 values.
- Total: O(2^(n/2)) when split evenly.
Edge Cases
- Empty array: answer is 0 (empty subset).
- All numbers greater than S: answer may still be 0 if empty subset allowed.
- Negative numbers present: MITM still works, but pruning shortcuts such as skipping left sums with x > s are no longer valid, because a negative right sum can bring the total back under S.
- Large duplicates: valid; no special correctness issue, only many repeated sums.
Common Mistakes
- Forgetting the empty subset sum 0; this can break correctness for small S.
Pattern Recognition
Suspect MITM when you see:
- n around 30 to 45 (too large for 2^n brute force, but small enough that 2^(n/2) states fit in time and memory).
- Subset/combinational decisions (pick or skip each element).
- Need to optimize sum/count/closest value by combining partial results.
- Constraints where DP by sum is impossible (sum values too large).
Interview Insight
A strong answer sounds like: "We reduce 2^n to roughly 2 * 2^(n/2) plus merge. That makes n ~ 40 practical." Interviewers look for this complexity-driven reasoning, not just the final code.
Practice Problems
- Maximum subset sum ≤ S (classic MITM foundation).
- Count subsets with sum in range [A, B] using sorted list + binary search bounds.
- Closest subsequence sum (find subset sum closest to target).
- Partition with minimum difference for moderate n and large values.
If n = 40 and values are up to 10^12, DP by sum is impossible, and 2^40 brute force is too big. This is the textbook signal for Meet in the Middle.
Summary
- MITM splits one exponential search into two smaller exponential parts.
- Main gain is exponent reduction from n to n/2.
- Typical flow: split -> generate all half-results -> sort/hash -> merge by search.
- Best suited for subset-style problems with medium n and large value ranges.
- In interviews, always justify MITM with complexity math.
20.2 Mo's Algorithm
Introduction
Mo's Algorithm is an offline query optimization technique for range queries on arrays. It is used when you are asked to answer many queries like "compute something on subarray [L, R]" and a direct approach per query is too slow. Instead of solving each query from scratch, Mo's Algorithm reorders queries so that the current window changes only a little from one query to the next.
For beginners, think of it as "smart scheduling of queries." The data does not change, queries are known in advance, and we process them in an order that minimizes repeated work.
Real-World Analogy
Suppose you manage a camera sliding on a rail to inspect different segments of a long pipeline. If requests come in random order, the camera keeps jumping back and forth, wasting time. If you reorder requests by nearby segments, camera movement is reduced. Mo's Algorithm does exactly this for the query window boundaries L and R.
Formal Definition
Mo's Algorithm is an offline algorithm that sorts range queries in block order so that a current interval can be adjusted incrementally using add() and remove() operations, reducing total complexity compared to recomputing each query independently.
Why This Topic Matters
- Solves many query problems where O(q * n) is too slow.
- Teaches a powerful engineering idea: reduce total work by changing processing order.
- Frequently appears in coding contests and advanced interview rounds.
- Builds intuition for pointer movement costs and amortized analysis.
Mental Model
At any time, maintain one active range [curL, curR]. For next query [L, R], move boundaries step by step:
- If curL > L, decrement curL and add(arr[curL]).
- If curR < R, increment curR and add(arr[curR]).
- If curL < L, remove(arr[curL]) then increment curL.
- If curR > R, remove(arr[curR]) then decrement curR.
Current Window: [curL........curR]
Target Window: [L..........R]
Move left and right ends gradually.
Each movement triggers add/remove in O(1) (or near O(1)).
Evolution: Brute Force → Better → Optimal
Brute Force
For each query, scan from L to R and compute answer.
- Per query: up to O(n)
- Total: O(q * n)
Better
Prefix sums help only for additive/invertible functions (like sum). But for richer metrics (distinct count, frequency-based power, mode-like conditions), prefix sum may not work.
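For contrast, here is the prefix-sum approach on the invertible case (range sum). This is a minimal sketch with hypothetical helper names `build_prefix` and `range_sum`, using inclusive query bounds to match the rest of this section.

```python
from itertools import accumulate
from typing import List


def build_prefix(arr: List[int]) -> List[int]:
    """prefix[i] = sum of arr[0:i], so prefix has len(arr) + 1 entries."""
    return [0, *accumulate(arr)]


def range_sum(prefix: List[int], l: int, r: int) -> int:
    """Sum of arr[l..r], inclusive on both ends."""
    return prefix[r + 1] - prefix[l]


arr = [1, 1, 2, 1, 3, 4, 2, 3]
prefix = build_prefix(arr)
print(range_sum(prefix, 2, 6))  # 12
```

The subtraction works because sums can be "undone"; a distinct count has no such inverse, since knowing the distinct counts of two prefixes does not determine the distinct count of their difference.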
Optimal (for static offline range queries with local updateable state)
Use Mo's Algorithm with block decomposition and incremental add/remove updates.
- Typical total complexity: O((n + q) * sqrt(n) * F), where F is cost of one add/remove.
- When F = O(1), this is usually around O((n + q) * sqrt(n)).
Step-by-Step Breakdown (Count Distinct in Range)
Problem: Given array arr and queries [L, R], return number of distinct elements in each range.
- Choose block size B = int(sqrt(n)).
- Represent each query as (L, R, index) to restore output order later.
- Sort queries by (L // B, R) (with optional odd-even R optimization).
- Maintain:
  - freq[value] = count of value in current window
  - distinct = number of values with frequency > 0
- Adjust current window to each query using add/remove boundary moves.
- Store distinct as answer for that query's original index.
ASCII Diagram
arr index: 0 1 2 3 4 5 6 7
arr value: [1,1,2,1,3,4,2,3]
Queries:
Q0: [0,4]
Q1: [1,3]
Q2: [2,6]
Q3: [4,7]
After Mo sorting, process in movement-friendly order.
Window moves:
Start: empty
-> expand to Q?
-> shrink/expand slightly to next Q
-> ...
Instead of recomputing counts from scratch each time.
Python Implementation
from math import isqrt
from typing import List, Tuple

def mos_distinct_count(arr: List[int], queries: List[Tuple[int, int]]) -> List[int]:
    """
    Returns distinct element count for each query [L, R] (inclusive)
    using Mo's Algorithm.
    """
    n = len(arr)
    q = len(queries)
    if n == 0:
        return [0] * q
    block = max(1, isqrt(n))
    indexed_queries = []
    for i, (l, r) in enumerate(queries):
        indexed_queries.append((l, r, i))
    # Standard Mo ordering with odd-even optimization on R
    indexed_queries.sort(
        key=lambda x: (
            x[0] // block,
            x[1] if ((x[0] // block) % 2 == 0) else -x[1]
        )
    )
    freq = {}
    answers = [0] * q
    distinct = 0
    cur_l, cur_r = 0, -1  # empty window

    def add(value: int) -> None:
        nonlocal distinct
        old = freq.get(value, 0)
        freq[value] = old + 1
        if old == 0:
            distinct += 1

    def remove(value: int) -> None:
        nonlocal distinct
        old = freq[value]
        if old == 1:
            distinct -= 1
            del freq[value]
        else:
            freq[value] = old - 1

    for l, r, idx in indexed_queries:
        while cur_l > l:
            cur_l -= 1
            add(arr[cur_l])
        while cur_r < r:
            cur_r += 1
            add(arr[cur_r])
        while cur_l < l:
            remove(arr[cur_l])
            cur_l += 1
        while cur_r > r:
            remove(arr[cur_r])
            cur_r -= 1
        answers[idx] = distinct
    return answers

if __name__ == "__main__":
    arr = [1, 1, 2, 1, 3, 4, 2, 3]
    queries = [(0, 4), (1, 3), (2, 6), (4, 7)]
    print(mos_distinct_count(arr, queries))  # [3, 2, 4, 3]
Line-by-Line Explanation
Block Size and Query Ordering
- block = sqrt(n) balances number of blocks and movement inside blocks.
- Sort by block of L first, then by R to reduce jumps.
- Odd-even trick on R further reduces backtracking in practice.
State Maintenance
- freq dictionary tracks frequency of each element in current window.
- distinct is the live answer for current window.
- add increases frequency and updates distinct when frequency becomes 1.
- remove decreases frequency and updates distinct when frequency becomes 0.
Pointer Movement
- Expand/shrink window one step at a time until current range equals query range.
- Every pointer step performs exactly one add or remove.
- Once aligned, current distinct is answer for that query.
Additional Worked Example
arr = [5, 5, 6, 7, 5], query [1, 4] includes values [5, 6, 7, 5]. Frequencies become {5: 2, 6: 1, 7: 1}, so distinct count is 3. If next query is [2, 4], you remove leftmost 5; frequencies become {5: 1, 6: 1, 7: 1}, distinct stays 3. This shows how one update can reuse nearly all previous work.
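The window update in this example can be traced with a short sketch. It uses collections.Counter as the frequency map, and the helper name shrink_left is a hypothetical one chosen for illustration; it is not part of the implementation above.

```python
from collections import Counter

arr = [5, 5, 6, 7, 5]

# Window for query [1, 4]: built once from scratch.
freq = Counter(arr[1:5])
distinct = len(freq)
first_answer = distinct  # values 5, 6, 7


def shrink_left(freq: Counter, value: int, distinct: int) -> int:
    """Remove one occurrence of value from the window; return the new distinct count."""
    freq[value] -= 1
    if freq[value] == 0:
        del freq[value]
        distinct -= 1
    return distinct


# Moving to query [2, 4]: only arr[1] leaves the window.
distinct = shrink_left(freq, arr[1], distinct)
second_answer = distinct

print(first_answer, second_answer)  # 3 3
print(dict(freq))                   # {5: 1, 6: 1, 7: 1}
```

One removal was enough to answer the second query; the other three elements were never touched again.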
Time Complexity
Let n be array size and q number of queries.
- Sorting queries: O(q log q).
- Total pointer movement is O((n + q) * sqrt(n)) with block size sqrt(n): the left pointer moves O(sqrt(n)) per query, and the right pointer sweeps O(n) per block across about sqrt(n) blocks.
- If each add/remove is O(1), total is O(q log q + (n + q) * sqrt(n)).
- If add/remove costs F, multiply movement term by F.
Compared to brute force O(q * n), this is a major improvement for large q.
Space Complexity
- Query storage: O(q).
- Frequency map/array: O(U) where U is number of distinct values in active domain.
- Answer array: O(q).
Edge Cases
- Single-element query: answer computed by one add operation.
- All values same: distinct is always 1 for non-empty range.
- Large value domain: use dictionary or coordinate compression.
- Empty array: guard and return zeros for all queries.
Common Mistakes
When Mo's Algorithm Is Not Ideal
- Online queries that must be answered immediately in given order.
- Problems with frequent point/range updates unless using advanced Mo-with-updates variant.
- Query functions that are hard to maintain incrementally on add/remove.
Pattern Recognition
Think "Mo's Algorithm" when:
- You have many static array range queries.
- Direct per-query recomputation is too slow.
- You can maintain answer with local add/remove operations.
- Queries can be processed offline.
Interview Insight
Practice Problems
- Count distinct numbers in each range.
- Frequency of value x in each range (for fixed x and for mixed x).
- Range power metric like sum(freq[v]^2 * v) using update formulas.
- Number of pairs with equal values inside each query range.
Summary
- Mo's Algorithm is an offline range-query optimization via smart query ordering.
- Core mechanism is incremental window adjustment with add/remove hooks.
- Best for static arrays with many range queries and cheap local updates.
- Typical complexity is around O((n + q) * sqrt(n)) when updates are O(1).
- It is a pattern for minimizing total work, not just speeding one query.
20.3 Square Root Decomposition
Introduction
Square Root Decomposition is a technique to speed up range queries (and sometimes updates) by dividing an array into blocks of size about sqrt(n). Instead of scanning every element for every query, you precompute block-level summaries, then answer a query using a mix of full blocks and small leftover parts.
For beginners, this is one of the best "bridge topics" between brute force and advanced trees (Fenwick Tree / Segment Tree). It teaches how to trade preprocessing and memory for faster repeated operations.
Real-World Analogy
Think of a warehouse inventory stored shelf-by-shelf. If someone asks for total items between shelf 12 and shelf 487, counting each item one by one is slow. If you already know the total for each shelf section, you can add full section totals quickly and only manually count the partial start/end sections.
Formal Definition
Square Root Decomposition partitions an array of size n into ~sqrt(n) blocks, each of size ~sqrt(n), and stores summary information per block (sum/min/max/count, depending on problem) to process operations faster than naive scanning.
Typical target complexity: O(sqrt(n)) per query/update.
Why This Topic Matters
- Simple to implement compared to segment trees.
- Gives strong intuition for block-based optimization strategies.
- Works well for medium constraints and static/semi-dynamic arrays.
- Frequently appears in interviews as a stepping stone to advanced structures.
Mental Model
Array of size n
|----block0----|----block1----|----block2----| ... |
~sqrt(n) ~sqrt(n) ~sqrt(n)
Query [L, R]:
1) Consume left partial block element-by-element
2) Consume full middle blocks using precomputed summary
3) Consume right partial block element-by-element
You do small manual work at the boundaries and fast aggregated work in the middle.
Evolution: Brute Force → Better → Optimal
Brute Force
For each range query, iterate from L to R.
- Range sum query: O(n) worst case per query.
- With q queries, total cost can reach O(q * n).
Better
Prefix sums solve static range sum queries in O(1) but do not handle point updates efficiently (O(n) rebuild if naive). So for mixed queries + updates, prefix-only approach is limited.
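The trade-off described above can be seen in a minimal prefix-sum sketch. The function names build_prefix and range_sum are illustrative choices, and the rebuild shown is the naive O(n) approach the text refers to.

```python
from typing import List


def build_prefix(arr: List[int]) -> List[int]:
    """prefix[i] holds sum(arr[:i]), so prefix has len(arr) + 1 entries."""
    prefix = [0]
    for value in arr:
        prefix.append(prefix[-1] + value)
    return prefix


def range_sum(prefix: List[int], left: int, right: int) -> int:
    """Sum of arr[left..right] inclusive, in O(1)."""
    return prefix[right + 1] - prefix[left]


nums = [2, 1, 5, 3, 4]
prefix = build_prefix(nums)
print(range_sum(prefix, 1, 3))  # 9

# A point update invalidates every prefix entry after the index,
# so the naive fix is a full O(n) rebuild.
nums[2] = 10
prefix = build_prefix(nums)
print(range_sum(prefix, 1, 3))  # 14
```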
Optimal (for this pattern and simplicity target)
Use square root decomposition with block sums.
- Point update: O(1) (adjust one block summary)
- Range sum query: O(sqrt(n))
Step-by-Step Breakdown (Range Sum + Point Update)
- Choose block size B = ceil(sqrt(n)).
- Create array block_sum where block_sum[i] stores sum of block i.
- Build: iterate array once and add each value to its block sum.
- Point update index idx to new_val:
  - Find block b = idx // B.
  - Adjust block_sum[b] += (new_val - arr[idx]).
  - Update arr[idx] = new_val.
- Range query [L, R]:
  - If within same block, direct iterate.
  - Else process left partial block, then full middle blocks using block_sum, then right partial block.
ASCII Diagram
n = 16, B = 4
Index: 0 1 2 3 | 4 5 6 7 | 8 9 10 11 | 12 13 14 15
Block: 0 0 0 0 | 1 1 1 1 | 2 2 2 2 | 3 3 3 3
Query [2, 13]
Left partial: indices 2..3
Full blocks: block 1, block 2 (use block_sum directly)
Right partial: indices 12..13
Python Implementation
from math import isqrt
from typing import List


class SqrtDecompositionRangeSum:
    """
    Supports:
    - point update: arr[idx] = value
    - range sum query: sum(arr[l:r+1])
    """

    def __init__(self, arr: List[int]) -> None:
        self.n = len(arr)
        self.arr = arr[:]  # keep internal copy
        self.block_size = max(1, isqrt(self.n))
        if self.block_size * self.block_size < self.n:
            self.block_size += 1  # ceil(sqrt(n))
        self.block_count = (self.n + self.block_size - 1) // self.block_size
        self.block_sum = [0] * self.block_count
        for i, val in enumerate(self.arr):
            self.block_sum[i // self.block_size] += val

    def update(self, idx: int, new_val: int) -> None:
        block_idx = idx // self.block_size
        delta = new_val - self.arr[idx]
        self.arr[idx] = new_val
        self.block_sum[block_idx] += delta

    def query(self, left: int, right: int) -> int:
        if left > right:
            return 0
        total = 0
        start_block = left // self.block_size
        end_block = right // self.block_size
        if start_block == end_block:
            for i in range(left, right + 1):
                total += self.arr[i]
            return total
        # Left partial block
        end_of_start_block = (start_block + 1) * self.block_size - 1
        for i in range(left, min(end_of_start_block, self.n - 1) + 1):
            total += self.arr[i]
        # Full middle blocks
        for b in range(start_block + 1, end_block):
            total += self.block_sum[b]
        # Right partial block
        start_of_end_block = end_block * self.block_size
        for i in range(start_of_end_block, right + 1):
            total += self.arr[i]
        return total


if __name__ == "__main__":
    nums = [2, 1, 5, 3, 4, 7, 6, 2, 9, 8]
    ds = SqrtDecompositionRangeSum(nums)
    print(ds.query(2, 7))  # 27
    ds.update(3, 10)       # index 3: 3 -> 10
    print(ds.query(2, 7))  # 34
    print(ds.query(0, 9))  # 54
Line-by-Line Explanation
Construction
- Compute block size close to sqrt(n).
- Allocate block_sum for all blocks.
- One pass over array to fill block sums.
Update Operation
- Find block containing index.
- Compute delta between new and old value.
- Apply delta to both array and block summary.
- No full recomputation needed.
Query Operation
- If both indices in same block, direct loop.
- Otherwise split into three segments: left partial, full blocks, right partial.
- Use block_sum only for complete blocks.
- Add remaining elements manually at boundaries.
Additional Worked Examples
For arr = [4, 8, 1, 6, 3, 7, 2, 5, 9] with block size 3, the block sums are:
- Block 0 (0..2): 4 + 8 + 1 = 13
- Block 1 (3..5): 6 + 3 + 7 = 16
- Block 2 (6..8): 2 + 5 + 9 = 16
Query [1, 7]:
- Left partial: indices 1..2 -> 8 + 1 = 9
- Full middle: block 1 -> 16
- Right partial: indices 6..7 -> 2 + 5 = 7
- Total = 9 + 16 + 7 = 32
If we update index 4 from 3 to 11, only block 1 changes by +8. New block 1 sum becomes 24. This shows why updates are efficient.
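The arithmetic in this worked example can be reproduced with a few lines. This is only a checking sketch: build_block_sums is a hypothetical helper name, and the query is written out by hand for block size 3 instead of using a general query routine.

```python
from typing import List


def build_block_sums(arr: List[int], block_size: int) -> List[int]:
    """Return one sum per block of block_size consecutive elements."""
    block_count = (len(arr) + block_size - 1) // block_size
    sums = [0] * block_count
    for i, value in enumerate(arr):
        sums[i // block_size] += value
    return sums


arr = [4, 8, 1, 6, 3, 7, 2, 5, 9]
block_size = 3
block_sum = build_block_sums(arr, block_size)
print(block_sum)  # [13, 16, 16]

# Query [1, 7]: left partial (1..2) + full block 1 + right partial (6..7).
total = sum(arr[1:3]) + block_sum[1] + sum(arr[6:8])
print(total)  # 32

# Point update: index 4 changes from 3 to 11, so only block 1 shifts by +8.
block_sum[4 // block_size] += 11 - arr[4]
arr[4] = 11
print(block_sum)  # [13, 24, 16]
```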
Time Complexity
- Build: O(n)
- Point update: O(1)
- Range query:
  - At most two partial blocks: about O(sqrt(n)) in worst case due to block size.
  - Middle full blocks: about O(sqrt(n)).
- Total per query: O(sqrt(n))
Space Complexity
- Array copy: O(n)
- Block summary: about O(sqrt(n))
- Total auxiliary (excluding input storage choice): O(sqrt(n))
Edge Cases
- n = 0: guard behavior (no valid updates/queries).
- left > right: return neutral value (0 for sum).
- Single-element range: direct access works naturally.
- Last block smaller: always clamp loop bounds by n - 1.
Common Mistakes
Comparison With Nearby Approaches
| Approach | Query | Update | When Useful |
|---|---|---|---|
| Brute Force | O(n) | O(1) | Very small inputs |
| Prefix Sum | O(1) | O(n) naive rebuild | Static arrays |
| Sqrt Decomposition | O(sqrt(n)) | O(1) | Balanced simple solution |
| Segment Tree | O(log n) | O(log n) | High performance + flexibility |
Pattern Recognition
Think of square root decomposition when:
- You need many range queries and occasional updates.
- Segment tree feels heavy for current constraints/time.
- A block-level summary can combine answers quickly.
- Problem constraints are around 10^5 with moderate operations and relaxed time limits.
Interview Insight
Practice Problems
- Range sum query with point update.
- Range minimum query with point update (store block minimums).
- Count numbers greater than k in range (block sorting variant).
- Jump-game style queries with block precomputed jump pointers (advanced sqrt decomposition).
Summary
- Square Root Decomposition divides data into blocks of size about sqrt(n).
- Query combines boundary scans with fast full-block summaries.
- For range-sum + point-update, common complexity is O(sqrt(n)) query and O(1) update.
- It is easier than segment trees and great for medium-complexity constraints.
- This technique strengthens your understanding of preprocessing and amortized optimization.
20.4 Randomized Algorithms
Introduction
Randomized algorithms intentionally use randomness during execution to improve average performance, simplify logic, or avoid worst-case adversarial inputs. Instead of following exactly the same path for the same problem shape, the algorithm makes random choices (for example, choosing a random pivot in quicksort), which often leads to strong expected performance.
For beginners, the key mindset is this: randomness is not "guessing without logic." It is a controlled design tool with mathematical guarantees such as expected runtime, probability of error, or high-probability success.
Real-World Analogy
Imagine searching for one specific card in a large deck. If you always scan from the top and an adversary who knows this arranged the deck, your card will sit at the bottom every time. If you instead check positions in a random order, no fixed arrangement is consistently bad for you: on average you check about half the deck, whatever the arrangement. Randomized algorithms similarly reduce dependence on input order patterns.
Formal Definition
A randomized algorithm is an algorithm that has access to random bits and may produce different internal execution paths for the same input. Its performance/correctness is analyzed probabilistically (expected time, error probability, success probability, etc.).
Why This Topic Matters
- Many classic high-performance algorithms are randomized (randomized quicksort, randomized selection, hashing techniques).
- Randomization often gives simpler implementations than deterministic worst-case-optimal approaches.
- Prevents predictable worst-case behavior in adversarial settings.
- Important for interviews, contests, distributed systems, and probabilistic data structures.
Mental Model
Deterministic Algorithm:
Input -> fixed path -> fixed output/runtime behavior
Randomized Algorithm:
Input + random bits -> one of many possible paths
-> output/runtime analyzed by probability
The output may still be always correct (Las Vegas), while runtime varies; or runtime may be bounded but output has tiny error chance (Monte Carlo).
Core Subtopics
1) Las Vegas vs Monte Carlo
- Las Vegas: Always correct output, random runtime (example: randomized quicksort pivot choice affects runtime, not correctness).
- Monte Carlo: Bounded runtime, but small probability of wrong answer (example: probabilistic primality tests with configurable error probability).
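The difference between the two styles can be sketched on a toy search problem: find the index of any True entry in a list by random probing. Both function names and the half-True test list are assumptions made for this illustration.

```python
import random
from typing import List, Optional


def las_vegas_find(flags: List[bool], rng: random.Random) -> int:
    """Probe random positions until a True entry is found.

    The returned index is always correct; only the number of probes is random.
    Assumes at least one entry is True (otherwise the loop never ends).
    """
    while True:
        i = rng.randrange(len(flags))
        if flags[i]:
            return i


def monte_carlo_find(flags: List[bool], max_probes: int, rng: random.Random) -> Optional[int]:
    """Probe at most max_probes random positions.

    Runtime is bounded, but the result may be None even though a True entry exists.
    """
    for _ in range(max_probes):
        i = rng.randrange(len(flags))
        if flags[i]:
            return i
    return None


rng = random.Random(0)      # fixed seed only to make the demo repeatable
flags = [False, True] * 50  # half of the entries are True

idx = las_vegas_find(flags, rng)
print(flags[idx])  # True

result = monte_carlo_find(flags, max_probes=5, rng=rng)
print(result is None or flags[result])  # True
```

With half the entries True, the bounded version misses with probability (1/2)^5 = 1/32 per call, while the unbounded version needs 2 probes on average.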
2) Expected Value in Algorithm Analysis
Expected runtime is a weighted average over all random choices. Even if some runs are slow, if they are rare, the expected runtime can still be excellent.
3) Amplification
For Monte Carlo methods, repeating the algorithm independently can reduce error probability exponentially.
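If a single trial fails with probability p, then k independent trials all fail together with probability p^k. This applies directly to one-sided error, where one successful trial settles the answer.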
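The p^k rule can be turned into a small calculator. This is a sketch under the assumption of independent trials with one-sided error; the function names are illustrative.

```python
import math


def amplified_error(p: float, k: int) -> float:
    """Probability that k independent trials all fail, given per-trial failure probability p."""
    return p ** k


def trials_needed(p: float, delta: float) -> int:
    """Smallest k with p**k <= delta, assuming 0 < p < 1 and 0 < delta < 1."""
    return math.ceil(math.log(delta) / math.log(p))


print(amplified_error(0.5, 10))  # 0.0009765625
print(trials_needed(0.5, 1e-6))  # 20
```

Even a coin-flip failure rate drops below one in a million after 20 repetitions, which is why modest per-trial guarantees are often good enough.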
Evolution: Brute Force → Better → Optimal
Brute Force
Try all possibilities deterministically (often infeasible for large inputs).
- May guarantee correctness, but time can explode.
Better
Use deterministic heuristics; may improve average behavior but can still be vulnerable to crafted worst-case inputs.
Optimal (practical perspective)
Use randomized design with probabilistic guarantees: strong expected performance, lower implementation complexity, robust behavior against adversarial patterns.
Step-by-Step Breakdown: Randomized Quicksort
Randomized quicksort picks a random pivot index in each recursive call, then partitions around it.
- If subarray has 0 or 1 elements, return.
- Pick random pivot index in current subarray.
- Swap pivot to end (or start) for partition convenience.
- Partition elements into less-than pivot and greater-or-equal regions.
- Recursively sort left and right partitions.
ASCII Diagram
Array: [9, 1, 7, 3, 8, 2, 5]
Random pivot chosen: 3
Partition result:
[1, 2] [3] [9, 7, 8, 5]
left pivot right
Recurse on left and right similarly.
Python Implementation
import random
from typing import List


def randomized_quicksort(arr: List[int]) -> List[int]:
    """
    Returns a new sorted list using randomized quicksort.
    Expected runtime: O(n log n), worst case: O(n^2) but unlikely.
    """
    nums = arr[:]  # avoid mutating caller data

    def partition(left: int, right: int) -> int:
        # Choose random pivot and move it to end
        pivot_idx = random.randint(left, right)
        nums[pivot_idx], nums[right] = nums[right], nums[pivot_idx]
        pivot = nums[right]
        store = left
        for i in range(left, right):
            if nums[i] < pivot:
                nums[store], nums[i] = nums[i], nums[store]
                store += 1
        nums[store], nums[right] = nums[right], nums[store]
        return store

    def sort(left: int, right: int) -> None:
        if left >= right:
            return
        p = partition(left, right)
        sort(left, p - 1)
        sort(p + 1, right)

    if nums:
        sort(0, len(nums) - 1)
    return nums


if __name__ == "__main__":
    data = [9, 1, 7, 3, 8, 2, 5]
    print(randomized_quicksort(data))
Line-by-Line Explanation
- pivot_idx = random.randint(left, right) ensures pivot choice does not depend on input order pattern.
- Partition places pivot in final sorted position.
- Elements less than pivot go left, others go right.
- Recursive calls sort subproblems independently.
- Correctness remains deterministic: final sorted output is always correct.
- Randomness affects recursion shape and thus runtime.
Additional Example: Randomized Quickselect (k-th Smallest)
Quickselect finds the k-th smallest element. Random pivot gives expected O(n) time.
import random
from typing import List


def randomized_quickselect(arr: List[int], k: int) -> int:
    """
    Returns k-th smallest (1-indexed k).
    Expected O(n), worst O(n^2).
    """
    if not 1 <= k <= len(arr):
        raise ValueError("k out of range")
    nums = arr[:]
    target = k - 1
    left, right = 0, len(nums) - 1
    while left <= right:
        pivot_idx = random.randint(left, right)
        nums[pivot_idx], nums[right] = nums[right], nums[pivot_idx]
        pivot = nums[right]
        store = left
        for i in range(left, right):
            if nums[i] < pivot:
                nums[store], nums[i] = nums[i], nums[store]
                store += 1
        nums[store], nums[right] = nums[right], nums[store]
        if store == target:
            return nums[store]
        if store < target:
            left = store + 1
        else:
            right = store - 1
    raise RuntimeError("Unexpected state")
For arr = [7, 10, 4, 3, 20, 15] and k = 3, quickselect returns 7 (3rd smallest) without fully sorting all elements.
Time Complexity
Randomized Quicksort
- Expected: O(n log n)
- Worst case: O(n^2) (very unlikely across random pivots)
Randomized Quickselect
- Expected: O(n)
- Worst case: O(n^2)
Expected bounds come from probability of balanced-enough partitions over random pivot choices.
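For quicksort, the expected bound can be made concrete with the standard indicator-variable argument, sketched here under the assumption that all keys are distinct. The symbols are introduced only for this derivation: z_1 < ... < z_n are the keys in sorted order and X is the total number of comparisons. Two keys z_i and z_j are compared exactly when one of them is the first pivot chosen among the j - i + 1 keys z_i, ..., z_j.

```latex
\Pr[\,z_i \text{ and } z_j \text{ are compared}\,] = \frac{2}{j - i + 1}, \qquad i < j

\mathbb{E}[X]
  = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \frac{2}{j - i + 1}
  = \sum_{i=1}^{n-1} \sum_{d=2}^{n-i+1} \frac{2}{d}
  < 2n \sum_{d=2}^{n} \frac{1}{d}
  < 2n \ln n
  = O(n \log n)
```

The last step uses the fact that 1/2 + 1/3 + ... + 1/n is less than ln n.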
Space Complexity
- Quicksort recursion stack: expected O(log n), worst O(n).
- Quickselect iterative version: O(1) extra (ignoring copied input array choice).
Edge Cases
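- Duplicate values: partition rules must handle equals consistently. With the two-way partition above (strict < on the left), an array of all-equal values produces maximally unbalanced splits regardless of pivot choice, so runtime degrades to O(n^2); three-way partitioning avoids this.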
- Already sorted input: deterministic bad pivot quicksort suffers, randomized pivot protects expected performance.
- Very small arrays: overhead can dominate; hybrid with insertion sort is common in production.
- Reproducibility needs: set random seed when deterministic testing is required.
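For the duplicate-values case, one common remedy is three-way partitioning (the Dutch national flag scheme), which groups all keys equal to the pivot in a single pass. The sketch below is a variant swapped in for illustration, not the implementation shown earlier; the function name is hypothetical.

```python
import random
from typing import List


def quicksort_three_way(arr: List[int]) -> List[int]:
    """Return a new sorted list using random pivots and three-way partitioning."""
    nums = arr[:]

    def sort(left: int, right: int) -> None:
        while left < right:
            pivot = nums[random.randint(left, right)]
            # Invariant: nums[left:lt] < pivot, nums[lt:i] == pivot, nums[gt+1:right+1] > pivot
            lt, i, gt = left, left, right
            while i <= gt:
                if nums[i] < pivot:
                    nums[lt], nums[i] = nums[i], nums[lt]
                    lt += 1
                    i += 1
                elif nums[i] > pivot:
                    nums[i], nums[gt] = nums[gt], nums[i]
                    gt -= 1
                else:
                    i += 1
            # Recurse on the smaller side and loop on the larger one,
            # which keeps the recursion depth at O(log n).
            if lt - left < right - gt:
                sort(left, lt - 1)
                left = gt + 1
            else:
                sort(gt + 1, right)
                right = lt - 1

    sort(0, len(nums) - 1)
    return nums


print(quicksort_three_way([3, 1, 3, 3, 2, 1, 3]))  # [1, 1, 2, 3, 3, 3, 3]
```

On an all-equal array this version finishes after a single partition pass, because the whole range lands in the "equal" region and neither side needs further sorting.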
Common Mistakes
When to Prefer Randomized Algorithms
- Input can be adversarial or highly patterned.
- Deterministic worst-case-optimal method is too complex to implement under time pressure.
- Small probability of error is acceptable (Monte Carlo scenarios).
- You need practical speed with simple implementation.
Pattern Recognition
Suspect randomization when you hear:
- "Average performance is fine, worst-case patterns hurt."
- "Need robust behavior across unknown or hostile test data."
- "Can trade tiny failure probability for big performance gain."
- "Need random pivot/hash/sampling for stability."
Interview Insight
Practice Problems
- Implement randomized quicksort and compare against deterministic pivot on sorted inputs.
- Implement randomized quickselect for k-th smallest.
- Simulate repeated Monte Carlo trials and compute empirical error reduction.
- Design a random sampling approach to estimate majority candidate before verification.
Summary
- Randomized algorithms use controlled randomness to improve expected behavior.
- Las Vegas: always correct, random runtime. Monte Carlo: bounded runtime, small error chance.
- Classic examples include randomized quicksort and randomized quickselect.
- The right analysis is probabilistic (expected time, error probability, high probability bounds).
- Randomization is a practical engineering tool, not a shortcut around correctness.
20.5 Reduction Techniques
Introduction
Reduction is one of the most important problem-solving techniques in computer science. The idea is simple but powerful: instead of solving a new problem directly, transform it into another problem you already know how to solve efficiently. If the transformation is correct and efficient, you inherit the known solution.
For beginners aiming at mastery, reduction changes how you think about hard problems. You stop asking only "How do I solve this from scratch?" and start asking "Which known problem does this resemble, and how can I map it there?"
Real-World Analogy
Suppose you have a document in a rare file format. You do not build a custom printer driver for that format. You convert it to PDF, then use standard printing tools. You solved your original task by reducing it to a widely supported one.
Formal Definition
A problem A is reducible to problem B (written informally as A -> B) if any instance of A can be transformed into an instance of B such that solving that transformed instance yields a correct solution for A, and the transformation itself is computationally efficient.
Why This Topic Matters
- Speeds up problem solving by reusing known patterns and algorithms.
- Essential for interview questions where direct solution looks messy.
- Foundation of NP-hardness/NP-completeness proofs.
- Improves code quality: cleaner logic, less reinvention, fewer bugs.
Mental Model
Original Problem A
|
| transform(input)
v
Known Problem B -- solve with known algorithm --> result_B
|
| map back (if needed)
v
Answer for Problem A
The transformation and mapping back must preserve correctness.
Core Types of Reductions
1) Algorithmic Reduction (Practical)
Reduce to a standard data structure or algorithm you already know, then solve quickly in interviews/contests.
2) Complexity Reduction (Theory)
Use polynomial-time transformations to show relative hardness (for example, prove new problem is NP-hard by reducing a known NP-hard problem to it).
3) Decision vs Optimization Reduction
Sometimes an optimization problem can be solved via repeated decision checks (often with binary search on answer).
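As an illustration of this pattern, consider a hypothetical shipping task: packages must be shipped in the given order within a fixed number of days, and we want the smallest daily capacity that makes this possible. The decision question "is capacity c enough?" is easy to answer greedily and is monotone in c, so binary search finds the optimum. The problem statement and both function names are assumptions chosen for this sketch; it also assumes a non-empty weight list and days >= 1.

```python
from typing import List


def can_ship(weights: List[int], days: int, capacity: int) -> bool:
    """Decision version: can all packages ship in order within `days` days at this capacity?"""
    used_days, load = 1, 0
    for w in weights:
        if w > capacity:
            return False
        if load + w > capacity:
            used_days += 1  # start a new day
            load = 0
        load += w
    return used_days <= days


def min_capacity(weights: List[int], days: int) -> int:
    """Optimization version, solved by binary search over the decision version."""
    lo, hi = max(weights), sum(weights)
    while lo < hi:
        mid = (lo + hi) // 2
        if can_ship(weights, days, mid):
            hi = mid        # feasible: try smaller capacities
        else:
            lo = mid + 1    # infeasible: need more capacity
    return lo


print(min_capacity([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 5))  # 15
```

The cost is one O(n) feasibility check per binary-search step, so O(n log S) overall, where S is the sum of the weights.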
Evolution: Brute Force → Better → Optimal
Brute Force
Design custom algorithm directly on original statement, often ending in high complexity and complex case handling.
Better
Spot partial structure but still perform significant manual logic for special cases.
Optimal (thinking approach)
Recognize canonical target problem, reduce cleanly, apply proven solver, and map result back.
Step-by-Step Reduction Framework
- State source problem clearly: input, output, constraints.
- Identify target known problem: sorting, hashing, graph shortest path, bipartite matching, etc.
- Define transformation: exactly how source input becomes target input.
- Prove correctness: show answer equivalence in both directions when needed.
- Analyze complexity: transformation cost + target solver cost + reverse mapping cost.
- Handle edge cases: ensure transformation remains valid on boundaries.
Reduction Example 1: Pair Sum → Hash Lookup
Problem
Given array nums and target T, determine if any pair sums to T.
Reduction Idea
Reduce pair search to repeated lookup problem: for each x, check whether T - x has been seen before.
Pair Sum
-> For each element x
-> Need existence of (T - x)
-> Hash set membership query
Python Implementation
from typing import List


def has_two_sum(nums: List[int], target: int) -> bool:
    seen = set()
    for x in nums:
        if target - x in seen:
            return True
        seen.add(x)
    return False
Line-by-Line Explanation
- seen stores numbers processed so far.
- For current x, complement is target - x.
- If complement exists in seen, valid pair found.
- Otherwise add x and continue.
Reduction result: from naive O(n^2) pair checking to expected O(n).
Reduction Example 2: Scheduling With Deadlines → Sort + Heap
Problem
Each task has duration and deadline. Maximize number of tasks completed before deadlines.
Reduction Idea
Reduce to ordered processing by deadlines, maintaining chosen durations in a max-heap. If total time exceeds current deadline, remove longest task.
Python Implementation
import heapq
from typing import List, Tuple


def max_tasks(tasks: List[Tuple[int, int]]) -> int:
    """
    tasks[i] = (duration, deadline)
    """
    tasks.sort(key=lambda x: x[1])  # reduce to deadline order
    total = 0
    max_heap = []  # store negative durations
    for duration, deadline in tasks:
        total += duration
        heapq.heappush(max_heap, -duration)
        if total > deadline:
            longest = -heapq.heappop(max_heap)
            total -= longest
    return len(max_heap)
Why Reduction Works
- Sorting by deadline makes each prefix a local feasibility checkpoint.
- When infeasible, removing the longest chosen task lowers the count by one (the same as removing any other task) while freeing the most time for later tasks.
- Heap gives fast access to longest chosen task.
ASCII Diagram
Source input
|
| transform
v
Sorted/graph/heap/hash representation
|
| apply known algorithm
v
intermediate solution
|
| interpret/map back
v
final answer
Time Complexity (How to Derive Properly)
Always split analysis into components:
- Transformation cost: build target instance.
- Solver cost: run known algorithm on transformed instance.
- Mapping-back cost: convert result to original problem output format.
Total complexity is the sum of these components.
For example, if transformation is O(n), solver is O(n log n), and mapping back is O(n), total is O(n log n).
Space Complexity
- Depends on target representation (hash set, heap, graph, DP table).
- Include extra memory introduced by transformation, not just solver memory.
- In interviews, explicitly mention whether transformation is in-place or auxiliary.
Edge Cases
- Empty input: transformed problem must remain valid.
- Duplicate elements: mapping should preserve multiplicity if needed.
- Signed/large values: avoid assumptions that break hashing/sorting logic.
- Constraint mismatches: ensure target algorithm assumptions actually hold.
Common Mistakes
Pattern Recognition
Reduction is likely when problem statements include clues like:
- "Can this be sorted first?"
- "Can each query become membership lookup?"
- "Can this interval/task/string problem become graph traversal?"
- "Can optimization be solved via binary search + feasibility check?"
Interview Insight
Practice Problems
- Reduce interval overlap checks to sorting + linear scan.
- Reduce duplicate detection to hash set membership.
- Reduce shortest transformation sequence problem to BFS on implicit graph.
- Reduce capacity minimization problem to binary search on answer + greedy feasibility.
Summary
- Reduction means solving a new problem by transforming it into a known one.
- A valid reduction must preserve correctness and stay computationally efficient.
- It is both a practical interview technique and a core complexity-theory concept.
- Always analyze transformation + solve + mapping-back costs together.
- Mastering reduction greatly accelerates advanced problem-solving ability.
20.6 NP-Completeness Basics
Introduction
NP-Completeness is a foundational concept in theoretical computer science that helps us classify problems by computational difficulty. It tells us why some problems likely do not have fast exact algorithms and guides us toward practical alternatives like approximation, heuristics, or special-case optimization.
For beginners, this topic is not about memorizing heavy theory symbols. It is about understanding a powerful decision framework: when to keep searching for a fast algorithm, and when to stop and change strategy.
Real-World Analogy
Imagine packing a truck with many packages where each has size and value constraints. Finding the perfect combination might be possible for small inputs, but becomes explosively hard at scale. NP-Completeness is the formal language for this "combinatorial explosion" and helps engineers choose realistic methods under time limits.
Formal Definition
P (Polynomial Time)
Class of decision problems solvable in polynomial time by a deterministic algorithm.
NP (Nondeterministic Polynomial Time)
Class of decision problems where, for every "yes" instance, a proposed solution (certificate) can be verified in polynomial time.
NP-Hard
A problem is NP-hard if every problem in NP can be polynomial-time reduced to it. NP-hard problems are at least as hard as NP problems, and may not even be decision problems.
NP-Complete
A decision problem is NP-complete if it is both:
- In NP, and
- NP-hard.
If any NP-complete problem has a polynomial-time algorithm, then P = NP.
Why This Topic Matters
- Explains why many optimization problems resist efficient exact algorithms.
- Prevents wasted effort trying to force impossible asymptotic improvements.
- Guides practical choices: approximation, randomized methods, branch-and-bound, ILP, or constraint solvers.
- Frequently tested in higher-level interviews and competitive programming discussions.
Mental Model
P ⊆ NP
|
| hardest problems inside NP
v
NP-Complete
NP-Hard includes NP-Complete and possibly even harder/non-decision problems.
Think of NP-complete problems as "universal hard cores" of NP: if you solve one efficiently, you unlock all.
Core Subtopics
1) Decision vs Optimization Form
NP-completeness is defined on decision problems (yes/no). Many optimization problems are handled by converting them to decision versions.
2) Polynomial-Time Reduction
To show problem B is hard, reduce known hard problem A to B (A -> B). This means: if we could solve B fast, we could solve A fast.
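A classic small instance of A -> B takes A = Independent Set and B = Vertex Cover, both NP-complete: a graph with n vertices has an independent set of size k exactly when it has a vertex cover of size at most n - k, because the complement of one is the other. The sketch below performs that transformation and cross-checks it with brute-force solvers on a tiny graph. The solvers are exponential and exist only for checking; all function names are illustrative, and k is assumed to be non-negative.

```python
from itertools import combinations
from typing import List, Tuple

Edge = Tuple[int, int]


def reduce_is_to_vc(n: int, edges: List[Edge], k: int) -> Tuple[int, List[Edge], int]:
    """Map an Independent Set instance (G, k) to a Vertex Cover instance (G, n - k)."""
    return n, edges, n - k


def has_independent_set(n: int, edges: List[Edge], k: int) -> bool:
    """Brute force: are there k vertices with no edge between any two of them?"""
    for chosen in combinations(range(n), k):
        picked = set(chosen)
        if not any(u in picked and v in picked for u, v in edges):
            return True
    return False


def has_vertex_cover(n: int, edges: List[Edge], size: int) -> bool:
    """Brute force: is there a vertex cover using at most `size` vertices?"""
    if size < 0:
        return False
    # Any superset of a cover is still a cover, so checking sets of
    # exactly min(size, n) vertices is enough.
    for chosen in combinations(range(n), min(size, n)):
        picked = set(chosen)
        if all(u in picked or v in picked for u, v in edges):
            return True
    return False


n, edges = 4, [(0, 1), (1, 2), (2, 3)]  # path graph 0-1-2-3
for k in range(n + 1):
    direct = has_independent_set(n, edges, k)
    via_reduction = has_vertex_cover(*reduce_is_to_vc(n, edges, k))
    print(k, direct, via_reduction)  # the two booleans agree for every k
```

The transformation itself does almost no work, which is the point: a fast Vertex Cover solver would immediately give a fast Independent Set solver.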
3) Verification Perspective
For NP membership, you do not need to find a solution quickly; you only need to verify a candidate quickly.
Evolution: Brute Force → Better → Optimal
Brute Force
Enumerate all possibilities (subsets, permutations, assignments). Usually exponential.
Better
Use pruning, memoization, or special constraints. Works for medium-sized instances but not worst-case scalable.
Optimal (strategy-level, not always exact algorithm)
Classify hardness first. If NP-complete, choose best practical approach: approximation, heuristic search, or exact algorithms on small/structured cases.
Step-by-Step: How to Prove NP-Completeness
- Convert your target problem to a decision version (if needed).
- Show target is in NP (polynomial-time verifier).
- Pick a known NP-complete source problem.
- Build polynomial-time reduction from source to target.
- Prove correctness of transformation (yes-instance maps to yes-instance, no to no).
- Conclude target is NP-hard; with step 2, target is NP-complete.
ASCII Diagram
Known NP-Complete Problem A
|
| polynomial-time reduction
v
Target Problem B
If B were easy (poly-time), A would be easy.
So B is at least as hard as A.
Example 1: Subset Sum (Decision Form)
Problem: Given integers and target T, is there a subset with sum exactly T?
- Given candidate subset, verification is polynomial (compute sum and compare).
- Hence in NP.
- Known NP-complete problem (decision form).
Example: nums = [3, 34, 4, 12, 5, 2], T = 9 -> yes (4 + 5).
Example 2: Traveling Salesman Problem (Decision Form)
Decision TSP: Is there a tour with total cost <= K?
- Candidate tour can be verified quickly by summing edge costs and checking validity.
- Decision version is NP-complete; optimization version is NP-hard.
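The verification step for Decision TSP can be sketched as follows. The sketch assumes a complete graph given as a distance matrix and a certificate given as a list of city indices; the function name and the sample matrix are made up for illustration.

```python
from typing import List


def verify_tsp_tour(dist: List[List[int]], tour: List[int], budget: int) -> bool:
    """Accept iff tour visits every city exactly once and its closed-loop cost is <= budget."""
    n = len(dist)
    if sorted(tour) != list(range(n)):
        return False  # not a permutation of the cities
    cost = sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))
    return cost <= budget


dist = [
    [0, 10, 15, 20],
    [10, 0, 35, 25],
    [15, 35, 0, 30],
    [20, 25, 30, 0],
]
print(verify_tsp_tour(dist, [0, 1, 3, 2], 80))   # True  (10 + 25 + 30 + 15 = 80)
print(verify_tsp_tour(dist, [0, 1, 3, 2], 79))   # False (cost exceeds budget)
print(verify_tsp_tour(dist, [0, 1, 1, 2], 100))  # False (city 1 repeated)
```

The check runs in O(n log n) time because of the sort, far below the cost of searching over all tours.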
Python Helper (Verifier Mindset Demonstration)
This code does not solve NP-complete problems efficiently; it demonstrates polynomial-time verification for a candidate certificate.
from typing import List


def verify_subset_sum(nums: List[int], chosen_indices: List[int], target: int) -> bool:
    """
    Verifier for Subset Sum certificate:
    chosen_indices represent one proposed subset.
    Runs in polynomial time.
    """
    total = 0
    n = len(nums)
    seen = set()
    for idx in chosen_indices:
        if idx < 0 or idx >= n:
            return False
        if idx in seen:
            return False  # a subset may use each element at most once
        seen.add(idx)
        total += nums[idx]
    return total == target


if __name__ == "__main__":
    nums = [3, 34, 4, 12, 5, 2]
    cert = [2, 4]  # nums[2]=4, nums[4]=5
    print(verify_subset_sum(nums, cert, 9))  # True
Line-by-Line Explanation
- Certificate is list of indices representing a proposed subset.
- Verifier checks index validity and accumulates selected values.
- Final equality check to target decides acceptance.
- This is polynomial-time verification, illustrating NP membership intuition.
Time Complexity Perspective
- For NP-complete problems, no known polynomial-time exact algorithms in general case.
- Exact methods often exponential: O(2^n), O(n!), etc.
- Verification of certificates for NP problems is polynomial.
Space Complexity Perspective
- Depends on chosen method: brute force recursion, DP pseudo-polynomial tables, branch-and-bound, SAT encoding, etc.
- When discussing NP-completeness, focus primarily on asymptotic hardness class and reduction correctness.
Edge Cases and Clarifications
- Pseudo-polynomial algorithms: may exist (for example DP on subset sum values), but do not imply problem is in P.
- Special graph/input structures: many NP-hard problems become easy on restricted cases.
- Approximation possibility: NP-hard does not always mean no good approximation, but guarantees vary by problem.
- Practical solvability: small input sizes can still be solved exactly using optimized exponential methods.
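The pseudo-polynomial point can be illustrated with the standard reachability DP for Subset Sum. This sketch assumes non-negative integers; the function name is illustrative.

```python
from typing import List


def subset_sum_dp(nums: List[int], target: int) -> bool:
    """Decide Subset Sum for non-negative integers in O(n * target) time."""
    if target < 0:
        return False
    reachable = [False] * (target + 1)
    reachable[0] = True  # the empty subset sums to 0
    for x in nums:
        # Iterate downward so each number is used at most once.
        for s in range(target, x - 1, -1):
            if reachable[s - x]:
                reachable[s] = True
    return reachable[target]


print(subset_sum_dp([3, 34, 4, 12, 5, 2], 9))   # True
print(subset_sum_dp([3, 34, 4, 12, 5, 2], 30))  # False
```

The running time grows with the numeric value of target, not with the number of bits needed to write it down. Doubling the bit length of target can square the table size, which is why this does not place Subset Sum in P.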
Common Mistakes
Pattern Recognition
Suspect NP-complete territory when you see:
- Combinatorial selection with global constraints (subsets, partitions, schedules, tours).
- Decision form asks "Does there exist ... ?".
- Huge search space with no obvious greedy/local-optimal guarantee.
- Problems resembling SAT, 3-SAT, Clique, Vertex Cover, Subset Sum, Hamiltonian Cycle, TSP decision.
Interview Insight
Practice Problems
- Classify decision vs optimization versions of Knapsack, TSP, and Set Cover.
- Write a verifier for Clique certificate.
- Explain why Subset Sum has pseudo-polynomial DP but is still NP-complete.
- Map a new problem to a known NP-complete problem via reduction sketch.
Summary
- NP-completeness classifies hardest decision problems in NP.
- To prove NP-complete: show in NP and show NP-hard via polynomial-time reduction.
- Decision vs optimization distinction is essential.
- This theory directly guides practical strategy choices for hard real-world problems.
- Mastering basics here prepares you for P vs NP and advanced complexity topics.
20.7 P vs NP
Introduction
The P vs NP question is one of the most famous open problems in computer science and mathematics. It asks whether every problem whose solution can be verified quickly can also be solved quickly. This sounds simple, but its implications affect optimization, cryptography, AI, scheduling, logistics, and many real-world engineering systems.
As a beginner, your goal is not to "solve" P vs NP. Your goal is to deeply understand what the question means, why it matters, and how it changes practical algorithm design decisions.
Real-World Analogy
Suppose someone gives you a completed Sudoku. Checking if it is valid is fast. But filling in a partially completed puzzle, especially on larger generalized grids, can be much harder. P vs NP asks whether this "easy to verify but hard to find" gap is fundamental, or whether we just have not discovered the right fast algorithms yet.
Formal Definitions
P
Set of decision problems solvable in polynomial time by deterministic algorithms.
NP
Set of decision problems for which a proposed solution (certificate) can be verified in polynomial time.
The P vs NP Question
Is P = NP ?
- If yes: every efficiently verifiable problem is also efficiently solvable.
- If no: there exist problems that are easy to verify but inherently hard to solve.
P ⊆ NP. The unknown part is whether this inclusion is strict.
Why This Topic Matters
- Explains why many practical optimization problems remain computationally difficult.
- Influences cryptographic assumptions (for example, difficulty of certain factoring/discrete-log style tasks).
- Helps engineers choose realistic strategies: exact, approximate, heuristic, or probabilistic.
- Improves interview communication when discussing hard problem classes.
Mental Model
All easy-to-solve decision problems -> P
All easy-to-verify decision problems -> NP
Known: P is inside NP.
Unknown: Is NP strictly bigger?
Think of P as "construct answer quickly" and NP as "check answer quickly."
Core Subtopics
1) Verification vs Construction
NP focuses on verification efficiency, not construction efficiency. Many learners confuse these.
2) NP-Complete Problems
NP-complete problems are the hardest problems in NP. If one NP-complete problem is in P, then P = NP.
3) Practical Consequence
Because no polynomial-time algorithm is known for any NP-complete problem, and P = NP remains unproven, practitioners assume hard instances remain hard and build robust approximate/heuristic pipelines.
Evolution: Brute Force → Better → Optimal (Practical Strategy)
Brute Force
Search all possible solutions and verify each. Correct but usually exponential.
Better
Use pruning, branch-and-bound, memoization, or constraint propagation to shrink search space.
Optimal (engineering mindset under current knowledge)
Classify complexity, then choose constrained exact methods, approximations, or heuristics rather than chasing unknown polynomial-time exact solutions for NP-complete cases.
Step-by-Step Reasoning Workflow for New Problems
- Convert to decision version if needed.
- Check if candidate answers can be verified in polynomial time (NP membership clue).
- Look for reduction to/from known NP-complete problems.
- If likely NP-hard, decide practical path: exact for small n, approximation, or heuristic.
- Communicate trade-offs clearly (quality vs runtime vs guarantees).
ASCII Diagram
P ?= NP
Case 1: P = NP
easy-to-verify => easy-to-solve
Case 2: P != NP
some problems remain verification-easy but solution-hard
Concrete Example: SAT Intuition
SAT (Boolean satisfiability) asks whether there exists an assignment of variables that makes a boolean formula true.
- Given an assignment, verification is fast (evaluate formula).
- Finding a satisfying assignment may be hard for large instances.
- SAT was the first proven NP-complete problem.
(x1 OR x2) AND (NOT x1 OR x3). If someone gives x1=False, x2=True, x3=True, checking truth is quick.
Python Implementation (Verification Demonstration)
This snippet demonstrates fast verification (NP perspective), not fast universal solving.
from typing import Dict, List, Tuple

# Clause represented as list of literals:
# (var_name, is_positive)
Clause = List[Tuple[str, bool]]

def verify_cnf(clauses: List[Clause], assignment: Dict[str, bool]) -> bool:
    """
    Verifies if a given assignment satisfies a CNF formula.
    Runs in polynomial time in formula size.
    """
    for clause in clauses:
        clause_ok = False
        for var, is_positive in clause:
            if var not in assignment:
                return False
            value = assignment[var]
            literal_value = value if is_positive else (not value)
            if literal_value:
                clause_ok = True
                break
        if not clause_ok:
            return False
    return True

if __name__ == "__main__":
    # (x1 OR x2) AND (NOT x1 OR x3)
    formula = [
        [("x1", True), ("x2", True)],
        [("x1", False), ("x3", True)]
    ]
    assignment = {"x1": False, "x2": True, "x3": True}
    print(verify_cnf(formula, assignment))  # True
Line-by-Line Explanation
- Each clause is checked independently.
- A clause is satisfied if at least one literal evaluates true.
- Formula is satisfied only if all clauses are satisfied.
- This verification is polynomial in number of literals and clauses.
Time Complexity Perspective
- Verification for NP problems is polynomial by definition.
- Exact solving for NP-complete problems has no known polynomial-time algorithm in general.
- Brute force for SAT-style problems can be O(2^n) over variable assignments.
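To make the O(2^n) figure tangible, here is a minimal brute-force sketch that enumerates every assignment and checks each one. It uses the same clause encoding as the verifier above; the names satisfies and brute_force_sat are illustrative, not part of any library. It is meant only to show the shape of exhaustive search, not a practical SAT solver.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple

Clause = List[Tuple[str, bool]]  # (var_name, is_positive)


def satisfies(clauses: List[Clause], assignment: Dict[str, bool]) -> bool:
    """Polynomial-time check: every clause has at least one true literal."""
    return all(
        any(assignment[var] == is_positive for var, is_positive in clause)
        for clause in clauses
    )


def brute_force_sat(clauses: List[Clause]) -> Optional[Dict[str, bool]]:
    """Try all 2^n assignments; return a satisfying one, or None if none exists."""
    variables = sorted({var for clause in clauses for var, _ in clause})
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if satisfies(clauses, assignment):
            return assignment
    return None


if __name__ == "__main__":
    # (x1 OR x2) AND (NOT x1 OR x3)
    formula = [[("x1", True), ("x2", True)], [("x1", False), ("x3", True)]]
    print(brute_force_sat(formula))
    # x1 AND NOT x1 has no satisfying assignment
    print(brute_force_sat([[("x1", True)], [("x1", False)]]))
```

Each added variable doubles the number of candidate assignments, while the cost of checking a single candidate grows only with the formula size.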
Space Complexity Perspective
- Verification often needs only input + assignment storage.
- Exact search methods may require recursion stacks, memo tables, or solver state structures.
Edge Cases and Clarifications
- Small inputs: NP-hard problems can still be solved exactly in practice.
- Structured instances: special constraints can make hard problems tractable.
- Pseudo-polynomial algorithms: do not automatically place a problem in P.
- Randomized/approximate solvers: useful in practice without resolving P vs NP.
Common Mistakes
Pattern Recognition
You are in P vs NP territory when a problem has:
- Existential decision phrasing: "Does there exist...?"
- Easy candidate checking but difficult candidate finding.
- Large combinatorial search spaces (assignments/subsets/tours/schedules).
- Connections to SAT, Clique, Vertex Cover, Hamiltonian Cycle, TSP decision, etc.
Interview Insight
Practice Problems
- Differentiate P, NP, NP-hard, NP-complete with examples.
- Write verifiers for SAT and Clique candidate certificates.
- Convert optimization formulations into decision forms.
- For a known NP-hard problem, design one exact-small and one approximate-large strategy.
Summary
- P vs NP asks whether fast verification implies fast solving.
- We know P ⊆ NP; equality remains unknown.
- NP-complete problems are central: one polynomial algorithm there would collapse the gap.
- In practice, treat NP-complete problems with strategy diversity: exact, approximate, heuristic.
- Understanding this topic improves both interview communication and real-world algorithm decisions.
20.8 Simulated Annealing & Heuristic Search
Introduction
Simulated Annealing and Heuristic Search are practical strategies for solving hard optimization problems where exact algorithms are too slow. Instead of guaranteeing the perfect global optimum every time, these methods aim to find very good solutions quickly, especially for large NP-hard search spaces.
For beginners, this topic is the bridge between theory and real-world engineering: when exact methods are unrealistic, you still need smart, measurable ways to produce high-quality solutions.
Real-World Analogy
Imagine trying to find the lowest valley in a huge mountain range at night. A purely greedy strategy that always walks downhill can get stuck in a nearby valley (local minimum). Simulated Annealing sometimes allows uphill moves early, so you can escape shallow valleys and eventually settle into deeper ones as the "temperature" cools down.
Formal Definitions
Heuristic Search
A search strategy guided by domain-specific rules or scoring functions to find good solutions faster than exhaustive search.
Simulated Annealing (SA)
A probabilistic local-search metaheuristic inspired by thermal annealing in metallurgy. At each step, it explores a neighboring solution and may accept worse moves with probability based on temperature and score difference.
Why This Topic Matters
- Provides practical tools for NP-hard optimization where exact methods do not scale.
- Useful in routing, scheduling, layout optimization, hyperparameter tuning, and game AI.
- Teaches trade-offs between solution quality and runtime budget.
- Frequently discussed in advanced interviews as "what would you do at scale?"
Mental Model
Current solution S
|
| generate neighbor S'
v
If better -> accept
If worse -> maybe accept with probability exp(-delta / T)
|
v
Lower temperature T gradually
High T: explore widely
Low T: exploit best regions
Core Subtopics
1) Objective Function
A numeric score to optimize (minimize cost or maximize reward). Everything depends on objective quality.
2) Neighborhood Design
How to move from one candidate solution to a nearby one (swap two cities, reassign one task, flip one bit, etc.).
3) Cooling Schedule
How temperature decreases over time (geometric cooling, linear cooling, adaptive cooling).
4) Acceptance Rule
If neighbor is better, accept. If worse by delta, accept with probability exp(-delta / T).
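As a rough illustration of the cooling schedules named above, the sketch below generates the temperature sequence for geometric and linear cooling. The function names and the sample parameters are assumptions chosen for readability; real schedules are tuned per problem.

```python
from typing import Iterator


def geometric_cooling(t0: float, alpha: float, min_temp: float) -> Iterator[float]:
    """Yield T0, T0*alpha, T0*alpha^2, ... while the temperature stays above min_temp."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be in (0, 1) so the temperature decreases")
    temp = t0
    while temp > min_temp:
        yield temp
        temp *= alpha


def linear_cooling(t0: float, step: float, min_temp: float) -> Iterator[float]:
    """Yield T0, T0-step, T0-2*step, ... while the temperature stays above min_temp."""
    if step <= 0.0:
        raise ValueError("step must be positive so the temperature decreases")
    temp = t0
    while temp > min_temp:
        yield temp
        temp -= step


if __name__ == "__main__":
    print(list(geometric_cooling(100.0, 0.5, 1.0)))
    print(list(linear_cooling(100.0, 25.0, 1.0)))
```

Geometric cooling spends proportionally more steps at low temperatures, while linear cooling spreads its steps evenly over the temperature range.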
Evolution: Brute Force → Better → Optimal (Practical)
Brute Force
Try every possible solution. Correct but impossible for large combinatorial spaces.
Better
Greedy/local search quickly finds decent solutions but often gets trapped in local optima.
Optimal (under large-scale constraints)
Use heuristic/metaheuristic methods (SA, tabu search, genetic algorithms, beam search) with quality-time trade-off controls.
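The local-optimum trap mentioned under "Better" can be seen in a tiny sketch. The 1-D landscape and the function name greedy_descent are made up for illustration; the point is only that pure downhill moves stop at the first valley they reach.

```python
from typing import List


def greedy_descent(costs: List[float], start: int) -> int:
    """Move to the cheaper adjacent index until no neighbor improves; return the final index."""
    pos = start
    while True:
        neighbors = [i for i in (pos - 1, pos + 1) if 0 <= i < len(costs)]
        best = min(neighbors, key=lambda i: costs[i], default=pos)
        if costs[best] >= costs[pos]:
            return pos  # no strictly better neighbor: a local minimum
        pos = best


if __name__ == "__main__":
    landscape = [5, 3, 4, 2, 6, 1, 7]  # toy cost landscape, global minimum at index 5
    print(greedy_descent(landscape, 0))  # 1 (stuck at cost 3)
    print(greedy_descent(landscape, 4))  # 5 (happens to reach the global minimum)
```

The result depends entirely on the starting point, which is the weakness that metaheuristics such as SA try to soften.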
Step-by-Step: Simulated Annealing Workflow
- Pick initial solution S (random or heuristic seed).
- Set initial temperature T0 and cooling rate alpha.
- Generate neighbor S'.
- Compute delta = cost(S') - cost(S) for minimization.
- If delta <= 0, accept S'.
- Else accept with probability exp(-delta / T).
- Track best solution seen so far.
- Cool temperature: T = T * alpha.
- Stop when T is low enough or the iteration/time limit is reached.
ASCII Diagram
Search Landscape (cost):
High cost /\ /\
/ \____/ \___
/ \____ <- global minimum region
/\__ local min
Greedy: falls and gets stuck at local min
SA: can jump out early (high T), then settle later (low T)
Python Implementation (TSP-Style Demonstration)
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]

def distance(a: Point, b: Point) -> float:
    dx = a[0] - b[0]
    dy = a[1] - b[1]
    return math.sqrt(dx * dx + dy * dy)

def route_cost(route: List[int], points: List[Point]) -> float:
    total = 0.0
    n = len(route)
    for i in range(n):
        p = points[route[i]]
        q = points[route[(i + 1) % n]]  # cycle
        total += distance(p, q)
    return total

def random_neighbor(route: List[int]) -> List[int]:
    # Swap two random positions (requires at least 2 cities)
    n = len(route)
    i, j = random.sample(range(n), 2)
    new_route = route[:]
    new_route[i], new_route[j] = new_route[j], new_route[i]
    return new_route

def simulated_annealing_tsp(
    points: List[Point],
    initial_temp: float = 1000.0,
    cooling_rate: float = 0.995,
    min_temp: float = 1e-3,
    iterations_per_temp: int = 200
) -> Tuple[List[int], float]:
    n = len(points)
    if n < 2:
        # Nothing to optimize; also avoids random.sample failing in random_neighbor
        return list(range(n)), 0.0
    current = list(range(n))
    random.shuffle(current)
    current_cost = route_cost(current, points)
    best = current[:]
    best_cost = current_cost
    temp = initial_temp
    while temp > min_temp:
        for _ in range(iterations_per_temp):
            candidate = random_neighbor(current)
            candidate_cost = route_cost(candidate, points)
            delta = candidate_cost - current_cost
            # Always accept better; sometimes accept worse
            if delta <= 0 or random.random() < math.exp(-delta / temp):
                current = candidate
                current_cost = candidate_cost
                if current_cost < best_cost:
                    best = current[:]
                    best_cost = current_cost
        temp *= cooling_rate
    return best, best_cost

if __name__ == "__main__":
    pts = [(0, 0), (1, 5), (5, 2), (6, 6), (8, 3), (2, 1)]
    best_route, best_cost = simulated_annealing_tsp(pts)
    print("Best route:", best_route)
    print("Best cost:", round(best_cost, 3))
Line-by-Line Explanation
- route_cost measures objective (tour length).
- random_neighbor defines neighborhood by swapping two cities.
- delta compares candidate and current cost.
- Acceptance rule balances exploration and exploitation.
- best is tracked independently from current to avoid losing strong solutions.
- Temperature cooling gradually reduces probability of accepting worse moves.
Additional Worked Example (Acceptance Probability)
Let delta = 5.
- At T = 100: accept probability exp(-5/100) ≈ 0.951 (very likely).
- At T = 1: accept probability exp(-5/1) ≈ 0.0067 (very unlikely).
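The two probabilities above can be reproduced with a few lines. The helper name acceptance_probability is illustrative; it simply wraps the exp(-delta / T) rule for a minimization problem.

```python
import math


def acceptance_probability(delta: float, temp: float) -> float:
    """Probability of accepting a move that changes cost by delta (minimization)."""
    if delta <= 0:
        return 1.0  # improving or equal moves are always accepted
    return math.exp(-delta / temp)


if __name__ == "__main__":
    for temp in (100.0, 1.0):
        print(temp, round(acceptance_probability(5.0, temp), 4))
```

The same worsening move goes from almost always accepted to almost never accepted as the temperature drops, which is the whole mechanism behind the shift from exploration to exploitation.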
Time Complexity
Let:
- K = number of temperature levels
- I = iterations per temperature
- N_eval = cost to evaluate one candidate
Total complexity is roughly O(K * I * N_eval).
- For TSP-like full route recomputation, N_eval = O(n).
- With incremental delta evaluation, this can be reduced in many problems.
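One way incremental delta evaluation can look is sketched below. Note that it uses a 2-opt move (reversing a segment of the tour) rather than the swap neighborhood from the implementation above, because the 2-opt delta is especially simple: only two edges change. The sketch assumes symmetric distances, and the helper names are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def dist(a: Point, b: Point) -> float:
    return math.hypot(a[0] - b[0], a[1] - b[1])


def tour_length(route: List[int], points: List[Point]) -> float:
    """Full O(n) recomputation of the closed tour length."""
    n = len(route)
    return sum(dist(points[route[k]], points[route[(k + 1) % n]]) for k in range(n))


def two_opt_delta(route: List[int], points: List[Point], i: int, j: int) -> float:
    """Cost change from reversing route[i+1 .. j] (0 <= i < j < n), computed in O(1)."""
    n = len(route)
    a, b = points[route[i]], points[route[i + 1]]
    c, d = points[route[j]], points[route[(j + 1) % n]]
    # Only two edges change: (a, b) and (c, d) are replaced by (a, c) and (b, d).
    return dist(a, c) + dist(b, d) - dist(a, b) - dist(c, d)


def apply_two_opt(route: List[int], i: int, j: int) -> List[int]:
    """Return a new route with the segment route[i+1 .. j] reversed."""
    return route[: i + 1] + route[i + 1 : j + 1][::-1] + route[j + 1 :]


if __name__ == "__main__":
    pts = [(0, 0), (1, 5), (5, 2), (6, 6), (8, 3), (2, 1)]
    route = [0, 1, 2, 3, 4, 5]
    delta = two_opt_delta(route, pts, 1, 4)
    full = tour_length(apply_two_opt(route, 1, 4), pts) - tour_length(route, pts)
    print(abs(delta - full) < 1e-9)  # the O(1) delta matches full recomputation
```

With this kind of delta, N_eval drops from O(n) to O(1) per candidate, although applying an accepted reversal still costs time proportional to the segment length.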
Space Complexity
- Store current and best solution: typically O(n).
- Extra overhead depends on representation and neighbor generation.
Edge Cases
- Temperature too low initially: behaves almost greedy, poor exploration.
- Cooling too fast: freezes before discovering strong regions.
- Cooling too slow: high runtime.
- Weak neighborhood: search cannot escape structural traps effectively.
Common Mistakes
Comparison: Greedy vs SA vs Exact
| Approach | Quality | Speed | Guarantee |
|---|---|---|---|
| Greedy | Often decent | Very fast | Rarely global optimal |
| Simulated Annealing | Usually better with tuning | Moderate | No exact guarantee |
| Exact Search/DP | Optimal | Often too slow at scale | Optimality proof |
Pattern Recognition
Use SA/heuristic search when:
- Problem is NP-hard and input size is large.
- Exact optimum is less important than high-quality solution quickly.
- Objective function is easy to evaluate.
- You can design meaningful neighborhood transitions.
Interview Insight
Practice Problems
- Implement SA for Traveling Salesman with swap and 2-opt neighborhoods.
- Apply SA to maximize score in a scheduling/assignment toy problem.
- Compare greedy vs SA quality across 20 random seeds.
- Tune cooling schedules and plot cost vs iteration.
Summary
- Heuristic search aims for strong solutions quickly in hard search spaces.
- Simulated Annealing escapes local minima by probabilistically accepting worse moves early.
- Quality depends on objective design, neighborhood design, and cooling schedule.
- No exact optimality guarantee, but often excellent practical performance.
- Essential for large-scale optimization where exact methods are infeasible.
21.1 Design LRU Cache
Introduction
Designing an LRU (Least Recently Used) Cache is one of the most important data structure design problems for interviews and real-world systems. The cache stores key-value pairs with fixed capacity. When capacity is full and a new key must be inserted, we evict the least recently used key.
The core challenge is not just implementing get/put behavior. The challenge is achieving both operations in constant time, O(1).
Real-World Analogy
Imagine a study desk with space for only 3 books. The books you use often stay on the desk. If a new book arrives and the desk is full, you remove the book that has not been touched for the longest time. That is exactly LRU behavior.
Formal Definition
An LRU cache supports:
- get(key): return value if key exists; otherwise return -1.
- put(key, value): insert/update key-value pair.
Constraint: both operations should run in O(1) average time.
Why This Topic Matters
- Classic interview design question testing data-structure composition.
- Used in CPU caches, DB buffer pools, API caching layers, and web backends.
- Teaches how to combine fast lookup with fast order maintenance.
- Builds foundation for advanced designs like LFU and ARC caches.
Mental Model
Need two abilities together:
1) Find by key quickly -> Hash Map (dict)
2) Track usage order quickly -> Doubly Linked List
Most recent <-> ... <-> Least recent
Hash map gives direct node access. Doubly linked list gives O(1) move-to-front and O(1) remove-from-end eviction.
Evolution: Brute Force → Better → Optimal
Brute Force
Use list of keys by recency. Lookup is O(n); updates may also be O(n).
Better
Use hash map for key lookup and separate array/list for order. Still costly to move keys in middle due to shifting.
Optimal
Use Hash Map + Doubly Linked List:
- get: O(1) lookup + O(1) move node to front.
- put: O(1) insert/update + O(1) tail eviction if needed.
Data Structure Blueprint
Doubly Linked List Role
- Head side: most recently used (MRU).
- Tail side: least recently used (LRU).
- Supports O(1) remove and O(1) insert at front if node pointer is known.
Hash Map Role
- map[key] = node_reference
- Provides O(1) direct access to node for get/update.
Step-by-Step Operations
get(key)
- If key not in map, return -1.
- Get node from map.
- Remove node from current place in list.
- Insert node right after head (MRU position).
- Return node value.
put(key, value)
- If key already exists:
  - Update node value.
  - Move node to MRU position.
- If key does not exist:
  - Create node and insert at MRU position.
  - Add key -> node in map.
  - If size exceeds capacity, remove tail's previous node (LRU) and delete from map.
ASCII Diagram
Head <-> [K3] <-> [K1] <-> [K7] <-> Tail
         MRU               LRU
get(K1):
Head <-> [K1] <-> [K3] <-> [K7] <-> Tail
put(new) when full:
Evict K7 (near tail), insert new near head
Python Implementation
class Node:
    def __init__(self, key: int, value: int):
        self.key = key
        self.value = value
        self.prev = None
        self.next = None

class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.cache = {}  # key -> node
        # Dummy head and tail to simplify edge cases
        self.head = Node(0, 0)  # MRU side next to head
        self.tail = Node(0, 0)  # LRU side prev to tail
        self.head.next = self.tail
        self.tail.prev = self.head

    def _remove(self, node: Node) -> None:
        prev_node = node.prev
        next_node = node.next
        prev_node.next = next_node
        next_node.prev = prev_node

    def _add_to_front(self, node: Node) -> None:
        first_real = self.head.next
        self.head.next = node
        node.prev = self.head
        node.next = first_real
        first_real.prev = node

    def get(self, key: int) -> int:
        if key not in self.cache:
            return -1
        node = self.cache[key]
        self._remove(node)
        self._add_to_front(node)
        return node.value

    def put(self, key: int, value: int) -> None:
        if key in self.cache:
            node = self.cache[key]
            node.value = value
            self._remove(node)
            self._add_to_front(node)
            return
        new_node = Node(key, value)
        self.cache[key] = new_node
        self._add_to_front(new_node)
        if len(self.cache) > self.capacity:
            # Evict least recently used node (just before tail)
            lru = self.tail.prev
            self._remove(lru)
            del self.cache[lru.key]
Line-by-Line Explanation
Node and Sentinels
- Node stores key, value, prev, next.
- Dummy head and tail avoid null-check complexity at boundaries.
Private Helpers
- _remove(node): unlink node in O(1).
- _add_to_front(node): place node after head in O(1).
Public API
- get: check map, move accessed node to front, return value.
- put existing key: update + move to front.
- put new key: insert front; if overflow, evict tail-prev.
Worked Example
With capacity 2:
- put(1, 10) -> cache: [1]
- put(2, 20) -> cache: [2, 1] (2 is MRU)
- get(1) returns 10 -> order becomes [1, 2]
- put(3, 30) -> capacity exceeded, evict LRU key 2 -> order [3, 1]
- get(2) returns -1
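The trace above can be replayed in code. To keep the snippet short and self-contained, this sketch swaps in a compact variant built on collections.OrderedDict instead of the hand-rolled doubly linked list; the class name CompactLRU and its order() helper are hypothetical, and capacity 2 is assumed because that is what the eviction in the trace implies.

```python
from collections import OrderedDict


class CompactLRU:
    """LRU cache on top of OrderedDict (most recently used key kept at the end)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.data: OrderedDict = OrderedDict()

    def get(self, key: int) -> int:
        if key not in self.data:
            return -1
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def put(self, key: int, value: int) -> None:
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # drop the least recently used key

    def order(self) -> list:
        """Keys from most to least recently used, matching the trace notation."""
        return list(reversed(self.data))


if __name__ == "__main__":
    cache = CompactLRU(2)
    cache.put(1, 10)
    cache.put(2, 20)
    print(cache.order())   # [2, 1]
    print(cache.get(1))    # 10
    print(cache.order())   # [1, 2]
    cache.put(3, 30)       # evicts key 2
    print(cache.order())   # [3, 1]
    print(cache.get(2))    # -1
```

In an interview the explicit hash map plus doubly linked list is usually what is being asked for, so this variant is best treated as a quick way to check expected behavior.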
Time Complexity
- get: O(1) average
- put: O(1) average
- Why: hashmap lookup O(1) average, linked-list pointer operations O(1)
Space Complexity
- O(capacity) for hash map + linked list nodes.
Edge Cases
- Capacity = 1: every new key evicts previous key.
- Put existing key: must update value and mark as most recent.
- Get missing key: return -1 without modifying order.
- Repeated gets: key should remain near front.
Common Mistakes
- Forgetting to move the node to the MRU position on get, which breaks recency semantics.
Pattern Recognition
Use this design pattern when problem asks for:
- Fast lookup by key + fast recency/frequency order maintenance.
- O(1) operations under fixed capacity policy.
- Eviction by usage order (LRU/LFU variants).
Interview Insight
Practice Problems
- Implement LRU Cache with same API and test edge cases.
- Add delete(key) operation in O(1).
- Implement thread-safe LRU conceptually (locks/read-write strategy).
- Compare LRU vs LFU behavior on repeated access patterns.
Summary
- LRU cache evicts the least recently used key when full.
- Optimal design is Hash Map + Doubly Linked List.
- Both get and put achieve O(1) average time.
- Correct recency updates are as important as correct lookup.
- This pattern is a cornerstone of data-structure design interviews.
21.2 Design LFU Cache
Introduction
An LFU (Least Frequently Used) cache evicts the key with the lowest access frequency when the cache is full. If several keys share the same minimum frequency, the usual tie-break is LRU among those keys (evict the least recently used among the least frequent).
This topic builds directly on LRU thinking, but the eviction policy depends on counts of gets and puts, not only recency.
Real-World Analogy
Imagine a small bookshelf that fits only a few books. You track how often each book is opened. When you need space for a new book, you remove the one opened least often. If two books were opened equally rarely, you remove the one you have not touched for the longest time.
Formal Definition
Support:
- get(key): return the value if present; otherwise -1. A successful get increments that key’s frequency.
- put(key, value): set or insert. Inserting a new key sets its frequency to 1. Updating an existing key also counts as a use (frequency increases).
When capacity is exceeded, evict the LFU key; break ties by LRU.
Problem statements vary on details (for example, whether put on existing key increments frequency). Always confirm the spec in an interview.
Why This Topic Matters
- Common follow-up after LRU in FAANG-style interviews.
- Models “popularity” better than pure recency for some workloads (hot keys stay longer).
- Teaches composing multiple structures: key map, per-frequency lists, global min frequency.
Mental Model
key -> Node(key, value, freq)
freq 1: head <-> ... <-> tail (LRU order within this freq)
freq 2: head <-> ... <-> tail
freq 3: ...
min_freq -> points to smallest freq that still has keys
Eviction: remove LRU node from list at min_freq
Evolution: Brute Force → Better → Optimal
Brute Force
Store keys in a dict with (freq, last_used_time). On eviction, scan all keys to find minimum — O(n) per eviction.
Better
Keep a min-heap of (freq, time, key) — updates are messy and not all O(1).
Optimal
Hash map + doubly linked lists per frequency + min_freq pointer. Each get/put touches only local list operations — O(1) average.
Track min_freq and move it when a bucket empties.
Data Structure Blueprint
- key_to_node: key -> Node for O(1) lookup.
- freq_to_list: freq -> DoublyLinkedList storing keys at that frequency in LRU order (e.g. head = MRU, tail-prev = LRU).
- min_freq: smallest frequency that currently has at least one key.
Step-by-Step Operations
get(key)
- If key missing, return -1.
- Remove node from its current frequency list.
- If that list becomes empty and freq == min_freq, increment min_freq (next bucket exists because we are about to add at freq+1).
- Increment node’s frequency; add it to the MRU side of the new frequency list.
- Return value.
put(key, value)
- If key exists: update value, then same frequency bump as get (one use).
- If new key and cache full: evict LRU node from freq_to_list[min_freq], delete from key_to_node.
- Insert new node with frequency 1; set min_freq = 1 (new key is among the smallest freq).
ASCII Diagram
freq 1: [A] <-> [B] (B is LRU at freq 1)
freq 2: [C]
min_freq = 1 -> evict B if we must evict one key at freq 1
Python Implementation
class Node:
    def __init__(self, key: int, value: int, freq: int = 1):
        self.key = key
        self.value = value
        self.freq = freq
        self.prev = None
        self.next = None

class DoublyLinkedList:
    """Maintains LRU order: head side = MRU, before tail = LRU."""
    def __init__(self) -> None:
        self.head = Node(0, 0, 0)
        self.tail = Node(0, 0, 0)
        self.head.next = self.tail
        self.tail.prev = self.head

    def add_to_front(self, node: Node) -> None:
        nxt = self.head.next
        self.head.next = node
        node.prev = self.head
        node.next = nxt
        nxt.prev = node

    def remove(self, node: Node) -> None:
        node.prev.next = node.next
        node.next.prev = node.prev

    def pop_lru(self) -> Node:
        """Remove node just before tail (LRU)."""
        if self.head.next is self.tail:
            raise ValueError("empty list")
        lru = self.tail.prev
        self.remove(lru)
        return lru

    def is_empty(self) -> bool:
        return self.head.next is self.tail

class LFUCache:
    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.key_to_node: dict = {}
        self.freq_to_list: dict = {}
        self.min_freq = 0

    def _ensure_list(self, freq: int) -> DoublyLinkedList:
        if freq not in self.freq_to_list:
            self.freq_to_list[freq] = DoublyLinkedList()
        return self.freq_to_list[freq]

    def _touch(self, node: Node) -> None:
        f = node.freq
        self.freq_to_list[f].remove(node)
        if self.freq_to_list[f].is_empty():
            del self.freq_to_list[f]
            if self.min_freq == f:
                # Moved key will be inserted at f+1, so new minimum frequency is f+1
                self.min_freq = f + 1
        node.freq = f + 1
        self._ensure_list(node.freq).add_to_front(node)

    def get(self, key: int) -> int:
        if key not in self.key_to_node:
            return -1
        node = self.key_to_node[key]
        self._touch(node)
        return node.value

    def put(self, key: int, value: int) -> None:
        if self.capacity <= 0:
            return
        if key in self.key_to_node:
            node = self.key_to_node[key]
            node.value = value
            self._touch(node)
            return
        if len(self.key_to_node) >= self.capacity:
            lst = self.freq_to_list[self.min_freq]
            ev = lst.pop_lru()
            del self.key_to_node[ev.key]
            if lst.is_empty():
                del self.freq_to_list[self.min_freq]
        node = Node(key, value, 1)
        self.key_to_node[key] = node
        self._ensure_list(1).add_to_front(node)
        self.min_freq = 1
Line-by-Line Explanation
- DoublyLinkedList holds all keys sharing the same frequency; order encodes LRU tie-breaking.
- _touch removes the node from old freq list; if that list empties and was min_freq, bump min_freq.
- Node moves to freq + 1 at MRU position.
- put on new key: if full, evict LRU at min_freq via pop_lru.
- New keys start at freq 1 and reset min_freq to 1.
Worked Example
With capacity 2:
- put(1, 1), put(2, 2): both freq 1, min_freq = 1.
- get(1): key 1 goes to freq 2; only key 2 remains at freq 1, min_freq = 1.
- put(3, 3): full; evict LRU among freq 1 → key 2 removed.
- Cache holds keys 1 and 3.
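The trace above can be checked with a compact, self-contained variant. This sketch replaces the per-frequency doubly linked lists with one OrderedDict per frequency (oldest key first), which keeps the same idea of frequency buckets plus min_freq in fewer lines. The class name CompactLFU is hypothetical, and capacity 2 is assumed because the trace evicts on the third insert.

```python
from collections import OrderedDict, defaultdict


class CompactLFU:
    """LFU cache with LRU tie-breaking, using one OrderedDict per frequency."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.values = {}                         # key -> value
        self.freqs = {}                          # key -> access count
        self.buckets = defaultdict(OrderedDict)  # freq -> keys, least recent first
        self.min_freq = 0

    def _touch(self, key: int) -> None:
        f = self.freqs[key]
        del self.buckets[f][key]
        if not self.buckets[f]:
            del self.buckets[f]
            if self.min_freq == f:
                self.min_freq = f + 1  # the key itself moves to bucket f + 1
        self.freqs[key] = f + 1
        self.buckets[f + 1][key] = None  # appended last = most recent in its bucket

    def get(self, key: int) -> int:
        if key not in self.values:
            return -1
        self._touch(key)
        return self.values[key]

    def put(self, key: int, value: int) -> None:
        if self.capacity <= 0:
            return
        if key in self.values:
            self.values[key] = value
            self._touch(key)
            return
        if len(self.values) >= self.capacity:
            bucket = self.buckets[self.min_freq]
            old_key, _ = bucket.popitem(last=False)  # LRU among the least frequent
            if not bucket:
                del self.buckets[self.min_freq]
            del self.values[old_key]
            del self.freqs[old_key]
        self.values[key] = value
        self.freqs[key] = 1
        self.buckets[1][key] = None
        self.min_freq = 1


if __name__ == "__main__":
    lfu = CompactLFU(2)
    lfu.put(1, 1)
    lfu.put(2, 2)
    lfu.get(1)                 # key 1 moves to freq 2
    lfu.put(3, 3)              # evicts key 2 (only key left at freq 1)
    print(sorted(lfu.values))  # [1, 3]
    print(lfu.get(2))          # -1
```

An LRU cache fed the same sequence would also evict key 2 here, so distinguishing the two policies needs a longer access pattern in which a frequently used key has not been touched recently.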
Time Complexity
- get and put: O(1) average, from a hash map lookup plus constant linked-list work per list.
Space Complexity
- O(capacity) for nodes and map entries; frequency lists share those nodes.
Edge Cases
- capacity == 0: put should no-op (LeetCode-style).
- Single key repeated access: frequency grows; min_freq updates when lower buckets empty.
- Tie on eviction: must evict LRU within minimum frequency list.
Common Mistakes
- Forgetting to update min_freq after removing the last key at the current minimum frequency.
LRU vs LFU (Quick Comparison)
| Policy | Evicts based on | Good when |
|---|---|---|
| LRU | Recency | Temporal locality |
| LFU | Frequency (tie: LRU) | Stable hot keys |
Pattern Recognition
Reach for LFU when the problem mentions:
- Evict least frequent key, or “count” of accesses.
- Tie-breaking by recency among equal frequency.
Interview Insight
Be ready to trace a get and a full put that triggers eviction.
Practice Problems
- Implement LFU with the LeetCode 460 API and test against examples.
- Compare behavior with LRU on the same access sequence.
- Extend with a max total memory size in bytes (variable value sizes).
Summary
- LFU evicts the least frequently used key; ties broken by LRU within that frequency.
- Use key -> node, freq -> DLL, and min_freq for O(1) operations.
- Moving a key to a higher frequency may empty the old bucket and advance min_freq.
- Clarify problem details for put on existing keys in interviews.
21.3 Design Twitter
Introduction
The "Design Twitter" problem is a classic object-oriented + data structure design question. You need to support posting tweets, following/unfollowing users, and returning a personalized news feed containing recent tweets from the user and people they follow.
This problem tests whether you can combine clean API design, efficient data modeling, and practical complexity trade-offs.
Real-World Analogy
Think of each user as a newspaper publisher. If user A subscribes to users B, C, and D, A's feed should show the newest headlines across all these publishers, mixed by timestamp. You do not need to sort the entire history each time; you only need the most recent few items.
Formal Definition
Support operations:
- postTweet(userId, tweetId)
- getNewsFeed(userId): return up to 10 most recent tweet IDs from user + followees.
- follow(followerId, followeeId)
- unfollow(followerId, followeeId)
Ordering should be by most recent first.
Why This Topic Matters
- Common interview problem that mixes hash maps, sets, lists, and heaps.
- Introduces real feed-aggregation thinking used in large social systems.
- Teaches k-way merge pattern for "top recent items from multiple streams."
Mental Model
Each user has:
- own tweet list (most recent at end)
- set of followees
Feed for user U:
Merge recent tweets from:
U + all users U follows
Keep top 10 by timestamp
Evolution: Brute Force → Better → Optimal
Brute Force
Collect all tweets in system, filter by follow relations, sort, take top 10. Very expensive.
Better
Collect only tweets from relevant users (self + followees), sort combined list, take top 10.
Optimal (for interview constraints)
Use per-user tweet lists and max-heap k-way merge over latest tweet from each followed stream.
- Avoid sorting all candidate tweets from scratch each request.
- Extract only up to 10 feed items.
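For contrast with the heap-based merge, the "Better" approach can be sketched as a plain function. The names feed_by_sorting and feed_by_nlargest are hypothetical, and the sample data mirrors the ASCII diagram later in this section. Both variants look at every candidate tweet, which is exactly the work the k-way merge avoids.

```python
import heapq
from typing import Dict, List, Set, Tuple

Tweet = Tuple[int, int]  # (timestamp, tweetId)


def _candidates(
    tweets: Dict[int, List[Tweet]], followees: Dict[int, Set[int]], user_id: int
) -> List[Tweet]:
    """All tweets from the user and the users they follow."""
    users = set(followees.get(user_id, set())) | {user_id}
    return [tw for u in users for tw in tweets.get(u, [])]


def feed_by_sorting(
    tweets: Dict[int, List[Tweet]],
    followees: Dict[int, Set[int]],
    user_id: int,
    limit: int = 10,
) -> List[int]:
    """Sort all M candidate tweets newest first: O(M log M)."""
    ordered = sorted(_candidates(tweets, followees, user_id), reverse=True)
    return [tweet_id for _, tweet_id in ordered[:limit]]


def feed_by_nlargest(
    tweets: Dict[int, List[Tweet]],
    followees: Dict[int, Set[int]],
    user_id: int,
    limit: int = 10,
) -> List[int]:
    """Keep only the `limit` newest while scanning: O(M log limit)."""
    newest = heapq.nlargest(limit, _candidates(tweets, followees, user_id))
    return [tweet_id for _, tweet_id in newest]


if __name__ == "__main__":
    tweets = {1: [(8, 101), (12, 102)], 2: [(11, 201)], 3: [(9, 301), (10, 302)]}
    followees = {1: {2, 3}}
    print(feed_by_sorting(tweets, followees, 1))   # [102, 201, 302, 301, 101]
    print(feed_by_nlargest(tweets, followees, 1))  # same order
```

When users have long histories, M can be far larger than the number of followed users F, which is why starting from only the latest tweet of each stream pays off.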
Data Structure Design
- followees: dict[int, set[int]] -> who each user follows.
- tweets: dict[int, list[tuple[int, int]]] -> per-user list of (timestamp, tweetId).
- time counter -> incremented each post to maintain global order.
Step-by-Step Operations
postTweet(userId, tweetId)
- Increment global timestamp.
- Append (time, tweetId) to that user's tweet list.
follow(followerId, followeeId)
- If same user, ignore (or safely no-op).
- Add followee to follower's set.
unfollow(followerId, followeeId)
- Remove followee from follower set if present.
- If absent, no-op.
getNewsFeed(userId)
- Build candidate user set: self + followees.
- Push each candidate user's latest tweet into max-heap (by timestamp).
- Pop most recent tweet, append to answer.
- From same user stream, push the previous tweet (if exists).
- Repeat until 10 tweets or heap empty.
ASCII Diagram
User 1 follows: {2, 3}
Tweets:
U1: (t=8, id=101), (t=12, id=102)
U2: (t=11, id=201)
U3: (t=9, id=301), (t=10, id=302)
Heap starts with latest from each stream:
[(12,102,U1), (11,201,U2), (10,302,U3)]
Pop top repeatedly and push previous from same user stream.
Python Implementation
import heapq
from collections import defaultdict
from typing import List

class Twitter:
    def __init__(self):
        self.time = 0
        self.followees = defaultdict(set)  # follower -> set of followees
        self.tweets = defaultdict(list)    # user -> list of (time, tweetId)

    def postTweet(self, userId: int, tweetId: int) -> None:
        self.time += 1
        self.tweets[userId].append((self.time, tweetId))

    def getNewsFeed(self, userId: int) -> List[int]:
        users = set(self.followees[userId])
        users.add(userId)  # user should see own tweets
        # Max-heap using negative time in Python min-heap
        heap = []
        for u in users:
            if self.tweets[u]:
                idx = len(self.tweets[u]) - 1  # latest index
                t, tid = self.tweets[u][idx]
                heapq.heappush(heap, (-t, tid, u, idx))
        feed = []
        while heap and len(feed) < 10:
            neg_t, tid, u, idx = heapq.heappop(heap)
            feed.append(tid)
            prev_idx = idx - 1
            if prev_idx >= 0:
                pt, ptid = self.tweets[u][prev_idx]
                heapq.heappush(heap, (-pt, ptid, u, prev_idx))
        return feed

    def follow(self, followerId: int, followeeId: int) -> None:
        if followerId == followeeId:
            return
        self.followees[followerId].add(followeeId)

    def unfollow(self, followerId: int, followeeId: int) -> None:
        self.followees[followerId].discard(followeeId)
Line-by-Line Explanation
- time gives unique increasing order for tweets.
- tweets[user] acts as an append-only personal timeline.
- getNewsFeed initializes heap with latest tweet from each relevant user.
- Each pop gives globally most recent remaining tweet among streams.
- After pop, pushing previous tweet from same stream creates k-way merge behavior.
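The k-way merge behavior described above can also be sketched with the standard library's heapq.merge. This is only an illustration under stated assumptions, not the method used by the Twitter class: recent_feed and streams are hypothetical names, and each stream is assumed to be an append-only list of (time, tweetId) tuples in increasing time order.

```python
import heapq
from itertools import islice
from typing import Iterable, List, Tuple


def recent_feed(streams: Iterable[List[Tuple[int, int]]], k: int = 10) -> List[int]:
    """Return up to k tweet IDs, newest first, from several time-sorted streams."""
    # Each stream is oldest-to-newest, so reversed() yields newest-to-oldest.
    newest_first = (reversed(s) for s in streams)
    # reverse=True tells merge that every input is sorted from largest to smallest.
    merged = heapq.merge(*newest_first, reverse=True)
    # merge is lazy, and islice stops after k items.
    return [tweet_id for _, tweet_id in islice(merged, k)]


# Same streams as in the ASCII diagram.
u1 = [(8, 101), (12, 102)]
u2 = [(11, 201)]
u3 = [(9, 301), (10, 302)]
print(recent_feed([u1, u2, u3], k=4))  # [102, 201, 302, 301]
```

Because the merge is lazy, asking for only k items avoids walking the full history of every stream, which is the same saving the hand-written heap loop provides.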
Worked Example
- User 1 posts 5, posts 6.
- User 1 follows user 2.
- User 2 posts 7.
- getNewsFeed(1) returns [7, 6, 5] (newest first).
- User 1 unfollows user 2.
- getNewsFeed(1) returns [6, 5].
Time Complexity
- postTweet: O(1)
- follow/unfollow: O(1) average (set operations)
- getNewsFeed:
  - Heap init over F followed users + self: O(F log F) worst-case (or O(F) with heapify variant).
  - Up to 10 pop/push operations: O(10 log F) = O(log F), since the factor 10 is a fixed constant.
Space Complexity
- Follow graph storage: O(total follow relations).
- Tweet storage: O(total tweets).
- Feed heap: O(F).
Edge Cases
- User with no tweets and no follows: empty feed.
- Self-follow requests: usually ignored.
- Unfollow non-followed user: no-op.
- Multiple users with same logical time: avoided by global incrementing counter.
Common Mistakes
System Design Extension Insight
At real scale, pull-based feed generation can become expensive. Systems often use hybrid fan-out strategies:
- Fan-out on write: push tweet to followers' timelines at post time.
- Fan-out on read: compute feed when requested.
- Hybrid: push for normal users, pull for celebrity accounts.
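To make the contrast with the pull-based class concrete, here is a minimal fan-out-on-write sketch. PushFeed and its method names are hypothetical, and the sketch deliberately ignores backfilling on follow and cleanup on unfollow, which a real system would have to handle.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List, Set


class PushFeed:
    """Hypothetical fan-out-on-write store: feeds are precomputed at post time."""

    def __init__(self, feed_size: int = 10):
        self.followers: Dict[int, Set[int]] = defaultdict(set)  # followee -> followers
        # deque(maxlen=...) silently drops the oldest entry once the feed is full.
        self.timelines: Dict[int, Deque[int]] = defaultdict(
            lambda: deque(maxlen=feed_size)
        )

    def follow(self, follower_id: int, followee_id: int) -> None:
        if follower_id != followee_id:
            self.followers[followee_id].add(follower_id)

    def post_tweet(self, user_id: int, tweet_id: int) -> None:
        # Write cost grows with follower count: O(number of followers).
        for reader in self.followers[user_id] | {user_id}:
            self.timelines[reader].append(tweet_id)

    def get_news_feed(self, user_id: int) -> List[int]:
        # Read is a plain copy of the precomputed timeline, newest first.
        return list(reversed(self.timelines[user_id]))


feed = PushFeed()
feed.follow(1, 2)
feed.post_tweet(1, 5)
feed.post_tweet(2, 7)
print(feed.get_news_feed(1))  # [7, 5]
```

The trade-off is visible in post_tweet: reads become cheap, but a post by an account with millions of followers triggers millions of writes, which is why hybrid systems fall back to pull for such accounts.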
Pattern Recognition
This problem pattern appears when you need:
- Top-k recent items from multiple sorted streams.
- Social graph style follow/unfollow relations.
- Efficient API operations with evolving user activity.
Interview Insight
Practice Problems
- Extend feed size from 10 to configurable k.
- Add likeTweet and return top liked recent tweets.
- Implement pagination for older feed pages.
- Add blocked users constraint to feed filtering.
Summary
- Design Twitter combines graph relationships and top-k feed aggregation.
- Use per-user tweet lists, follow sets, and heap-based k-way merge for feed.
- Main feed optimization: extract only needed recent items, not full history.
- This problem is a strong bridge between DSA design and system design thinking.
21.4 Design MinStack
Introduction
MinStack is a stack data structure that supports normal stack operations and can also return the minimum element in constant time. The key requirement is: push, pop, top, and getMin should all be O(1).
This is a classic interview design problem because it looks simple, but naive approaches often fail on time complexity.
Real-World Analogy
Imagine a pile of books where each book has a weight label. Besides normal push/pop behavior, you want to instantly answer: “What is the lightest book currently in the pile?” If you scan the full pile every time, it is slow. MinStack keeps extra tracking so the answer is immediate.
Formal Definition
Design a stack that supports:
- push(x): push element x.
- pop(): remove top element.
- top(): return top element.
- getMin(): return minimum element currently in stack.
All operations must run in O(1).
getMin is the core constraint that rules out scanning the full stack.
Why This Topic Matters
- Very common interview question and often asked as a warm-up for advanced design questions.
- Teaches augmentation pattern: enrich a base data structure with auxiliary state.
- Same pattern appears in max-stack, queue-with-min, and monotonic data structures.
Mental Model
Main stack: stores actual values
Min stack : stores minimum so far at each depth
Depth i in min stack = minimum among first i elements in main stack
When you pop from main stack, you also pop from min stack, so both remain synchronized.
Evolution: Brute Force → Better → Optimal
Brute Force
Keep one normal stack; for getMin, scan all elements.
- push/pop/top: O(1)
- getMin: O(n)
Better
Maintain one variable current_min. Easy for push, but pop becomes hard when the popped element equals current min (you no longer know next minimum without scanning).
Optimal
Use two stacks:
- Main stack stores values.
- Min stack stores minimum-so-far after each push.
Now all operations are O(1).
Step-by-Step Operations
push(x)
- Push x to main stack.
- If min stack empty, push x.
- Else push min(x, min_stack[-1]) to min stack.
pop()
- Pop from main stack.
- Pop from min stack.
top()
- Return main stack top.
getMin()
- Return min stack top.
ASCII Diagram
After pushes: 5, 3, 7, 2
Main: [5, 3, 7, 2]
Min : [5, 3, 3, 2]
top() -> 2
getMin() -> 2
After pop():
Main: [5, 3, 7]
Min : [5, 3, 3]
getMin() -> 3
Python Implementation
class MinStack:
    def __init__(self):
        self.stack = []      # actual values
        self.min_stack = []  # min_stack[i] = minimum of stack[0..i]

    def push(self, val: int) -> None:
        self.stack.append(val)
        if not self.min_stack:
            self.min_stack.append(val)
        else:
            self.min_stack.append(min(val, self.min_stack[-1]))

    def pop(self) -> None:
        self.stack.pop()
        self.min_stack.pop()

    def top(self) -> int:
        return self.stack[-1]

    def getMin(self) -> int:
        return self.min_stack[-1]
Line-by-Line Explanation
- stack holds actual values in LIFO order.
- min_stack[i] stores minimum among first i + 1 pushed elements still present.
- push computes new running minimum instantly from previous minimum.
- pop removes from both stacks together to keep depths aligned.
- getMin is O(1) because minimum is always at min_stack[-1].
Alternative Compact Approach
You can also store pairs in one stack: (value, min_so_far). This avoids maintaining two separate lists but uses the same core idea.
class MinStackPairs:
    def __init__(self):
        self.stack = []  # list of (value, min_so_far)

    def push(self, val: int) -> None:
        current_min = val if not self.stack else min(val, self.stack[-1][1])
        self.stack.append((val, current_min))

    def pop(self) -> None:
        self.stack.pop()

    def top(self) -> int:
        return self.stack[-1][0]

    def getMin(self) -> int:
        return self.stack[-1][1]
Worked Example
- push(4) -> min=4
- push(1) -> min=1
- push(3) -> min=1
- getMin() returns 1
- pop() removes 3
- getMin() still 1
- pop() removes 1
- getMin() now 4
Time Complexity
- push: O(1)
- pop: O(1)
- top: O(1)
- getMin: O(1)
Space Complexity
- O(n) extra space for min tracking.
- Two-stack version stores n values + n minima.
Edge Cases
- Duplicate minima: min stack still works because each depth stores min-so-far.
- All decreasing values: min stack mirrors main values.
- All increasing values: min stack repeats first minimum.
- Operations on empty stack: define behavior (exception/no-op) based on problem specification.
Common Mistakes
- Scanning the whole stack inside getMin, violating the O(1) requirement.
Pattern Recognition
Use this augmentation pattern when:
- A base data structure needs an extra query in constant time.
- You can precompute “state so far” incrementally (min/max/gcd prefix-like metadata).
- Push/pop operations naturally maintain aligned metadata stacks.
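As a rough illustration of how the same augmentation carries over to other "state so far" aggregates, the sketch below parameterizes the pair-based stack by a combine function. AggStack and aggregate are hypothetical names introduced here; the only assumption is that the aggregate of a longer prefix can be computed from the previous aggregate and the new value.

```python
from math import gcd
from typing import Callable, List, Tuple


class AggStack:
    """Stack that keeps a running aggregate (min, max, gcd, ...) at every depth."""

    def __init__(self, combine: Callable[[int, int], int]):
        self.combine = combine
        self.items: List[Tuple[int, int]] = []  # (value, aggregate_so_far)

    def push(self, val: int) -> None:
        # New aggregate depends only on the new value and the previous aggregate.
        agg = val if not self.items else self.combine(val, self.items[-1][1])
        self.items.append((val, agg))

    def pop(self) -> int:
        # Popping the pair discards the value and its aggregate together.
        return self.items.pop()[0]

    def aggregate(self) -> int:
        return self.items[-1][1]


max_stack = AggStack(max)
for v in (5, 3, 7, 2):
    max_stack.push(v)
print(max_stack.aggregate())  # 7

gcd_stack = AggStack(gcd)
for v in (12, 18, 8):
    gcd_stack.push(v)
print(gcd_stack.aggregate())  # 2
```

Passing min reproduces MinStack, so the individual designs are instances of one pattern rather than separate tricks.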
Interview Insight
Practice Problems
- Design MaxStack with O(1) max retrieval.
- Queue with O(1) min using two MinStacks.
- Support getSecondMin() with reasonable complexity trade-offs.
- Build a stack that supports getMin and getMax in O(1).
Summary
- MinStack augments a stack to answer minimum queries in O(1).
- Optimal idea: maintain synchronized running-min metadata.
- All required operations become O(1) with O(n) extra space.
- This is a foundational design pattern for augmented data structures.
21.5 Design HashMap
Introduction
Designing a HashMap means building a data structure that stores key-value pairs and supports fast insert, search, and delete operations. In interviews, this problem checks whether you understand hashing fundamentals instead of only using built-in dictionaries.
The objective is to achieve average-case O(1) for put, get, and remove.
Real-World Analogy
Think of a large set of mailboxes. A hash function is like a rule that maps each person's name to one mailbox number. If two names map to the same mailbox, you need a strategy to handle that collision without losing data.
Formal Definition
Design a map with operations:
- put(key, value) – insert or update key.
- get(key) – return value if key exists, else -1.
- remove(key) – delete key if present.
Target complexity: average O(1) per operation.
Why This Topic Matters
- HashMap is one of the most used data structures in real software systems.
- Understanding internals helps debug performance and collision issues.
- Frequently asked in coding interviews as “Design HashMap from scratch”.
Mental Model
key --hash()--> bucket index
table[index] holds entries that hash to same index
Collision handling required when multiple keys share index
Hashing gives quick bucket location; collision strategy gives correctness.
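A tiny sketch of the key-to-bucket step may help here. bucket_index is a hypothetical helper, and the printed values assume CPython, where small non-negative integers hash to themselves.

```python
def bucket_index(key: int, capacity: int) -> int:
    # Python's % returns a non-negative result for a positive capacity,
    # so negative keys still land in a valid bucket.
    return hash(key) % capacity


capacity = 8
for key in (1, 9, 17, 3, -3):
    print(key, "->", bucket_index(key, capacity))
# Keys 1, 9 and 17 all map to bucket 1, so they collide and must share that bucket.
```

With only 8 buckets, any two keys that differ by a multiple of 8 collide, which is why the collision strategy, and not the hash alone, determines correctness.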
Evolution: Brute Force → Better → Optimal
Brute Force
Store key-value pairs in a list and linearly search for every operation.
put/get/remove: O(n)
Better
Use fixed-size array with direct index for small key ranges. Fast but not general for large/sparse keys.
Optimal (general-purpose)
Use hashing + collision handling + resizing (rehashing) for stable average O(1).
Collision Handling Approaches
1) Separate Chaining
Each bucket stores a list of entries. Colliding keys go into the same list.
2) Open Addressing (Linear/Quadratic/Double Hashing)
Store entries directly in table and probe for next free slot on collision.
In this course implementation, we use separate chaining because it is simpler and interview-friendly.
Step-by-Step Design (Separate Chaining)
- Create array of buckets.
- For a key, compute bucket index with hash function.
- put: search bucket; update if key exists, else append new pair.
- get: search bucket and return value if found.
- remove: search bucket and delete pair if found.
- Resize and rehash when load factor exceeds threshold (e.g., 0.75).
ASCII Diagram
bucket_count = 8
index:  0    1     2    3          4    5     6    7
        []   [k1]  []   [k2->k9]   []   [k3]  []   []
k2 and k9 collided to same bucket (index 3)
Python Implementation
from typing import List, Tuple


class MyHashMap:
    def __init__(self):
        self.capacity = 8
        self.size = 0
        self.load_factor_threshold = 0.75
        self.buckets: List[List[Tuple[int, int]]] = [[] for _ in range(self.capacity)]

    def _index(self, key: int) -> int:
        return hash(key) % self.capacity

    def _rehash(self) -> None:
        # Double the table and reinsert every pair under the new capacity.
        old_buckets = self.buckets
        self.capacity *= 2
        self.buckets = [[] for _ in range(self.capacity)]
        self.size = 0
        for bucket in old_buckets:
            for key, value in bucket:
                self.put(key, value)

    def put(self, key: int, value: int) -> None:
        idx = self._index(key)
        bucket = self.buckets[idx]
        for i, (k, v) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # update in place; size unchanged
                return
        bucket.append((key, value))
        self.size += 1
        if self.size / self.capacity > self.load_factor_threshold:
            self._rehash()

    def get(self, key: int) -> int:
        idx = self._index(key)
        bucket = self.buckets[idx]
        for k, v in bucket:
            if k == key:
                return v
        return -1

    def remove(self, key: int) -> None:
        idx = self._index(key)
        bucket = self.buckets[idx]
        for i, (k, v) in enumerate(bucket):
            if k == key:
                bucket.pop(i)
                self.size -= 1
                return
Line-by-Line Explanation
- buckets is an array where each entry is a list of (key, value) pairs.
- _index maps a key to bucket index.
- put updates existing key if found, else appends new pair.
- get linearly checks only one bucket chain, not whole map.
- remove deletes matching key from its bucket.
- _rehash doubles capacity and reinserts all pairs to new buckets.
Worked Example
- put(1, 100)
- put(9, 900) (may collide with key 1 for small bucket count)
- get(1) returns 100
- put(1, 111) updates existing key
- get(1) returns 111
- remove(9), then get(9) returns -1
Time Complexity
- Average: put/get/remove = O(1)
- Worst case: O(n) if many keys collide into same bucket.
- Rehash: O(n) occasionally, but amortized cost per operation remains O(1) average.
Space Complexity
- O(n + m) where n = number of entries, m = number of buckets.
Edge Cases
- Update existing key: size should not increase.
- Remove missing key: no-op.
- Negative/large keys: hash function should still map safely.
- Frequent inserts: ensure resizing is implemented or performance degrades.
Common Mistakes
Pattern Recognition
Use HashMap when you need:
- Fast lookup by key.
- Frequent insert/delete/search operations.
- No requirement for sorted order.
Interview Insight
Practice Problems
- Implement HashSet using same hashing framework.
- Build frequency counter from scratch using custom HashMap.
- Implement open addressing version and compare with chaining.
- Add iterator over key-value pairs.
Summary
- HashMap provides average O(1) put/get/remove using hashing.
- Collision handling is essential for correctness.
- Resizing keeps load factor controlled and operations fast in practice.
- This design is foundational for many higher-level algorithms and systems.
21.6 Design Rate Limiter
Introduction
A rate limiter controls how many requests a user/client can make in a given time window. It protects systems from abuse, traffic spikes, accidental overload, and unfair resource usage.
In interview settings, this topic tests both algorithmic data structure skills and practical backend design thinking.
Real-World Analogy
Think of a building elevator that allows only a fixed number of people per minute for safety. If too many people arrive, some must wait. A rate limiter is the software version of this control gate.
Formal Definition
A rate limiter answers this decision repeatedly:
- allow(user, timestamp) -> True if request allowed, else False.
Example policy: "Allow at most 3 requests per 10 seconds per user."
Why This Topic Matters
- Critical for API reliability and abuse prevention in production systems.
- Common in backend and system design interviews.
- Teaches windowing, counters, queues, and trade-offs between precision and cost.
Mental Model
Incoming request
      |
      v
Lookup user state (counter/timestamps/tokens)
      |
Check policy
  allow? -> Yes: consume capacity
            No : reject/throttle
Core Approaches
1) Fixed Window Counter
Count requests in discrete windows (e.g., per minute). Fast and simple, but can allow bursts near boundary transitions.
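The boundary burst can be seen in a minimal fixed-window sketch. FixedWindowLimiter is a hypothetical class written for this illustration; it assumes integer timestamps in seconds and never evicts old counters, which a real implementation would need to do.

```python
from collections import defaultdict
from typing import Dict, Tuple


class FixedWindowLimiter:
    """Hypothetical fixed-window counter: one counter per (user, window number)."""

    def __init__(self, limit: int, window_seconds: int):
        self.limit = limit
        self.window = window_seconds
        self.counts: Dict[Tuple[str, int], int] = defaultdict(int)

    def allow(self, user_id: str, timestamp: int) -> bool:
        # Integer division assigns every timestamp to a discrete window.
        key = (user_id, timestamp // self.window)
        if self.counts[key] >= self.limit:
            return False
        self.counts[key] += 1
        return True


limiter = FixedWindowLimiter(limit=3, window_seconds=10)
# Three requests at the end of window 0 and three at the start of window 1.
burst = [limiter.allow("A", t) for t in (8, 9, 9, 10, 10, 11)]
print(burst)  # all six are allowed although they fall within 4 seconds
```

Each window individually respects the limit of 3, yet the user gets 6 requests through in a 4-second span, which is the fairness artifact the sliding approaches avoid.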
2) Sliding Window Log
Store timestamps of recent requests; remove outdated ones each call. Accurate, but memory heavier.
3) Token Bucket
Tokens refill at steady rate; request consumes one token. Supports burst tolerance with smooth long-term rate control.
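A minimal token bucket might look like the sketch below. TokenBucket, its parameters, and the choice to start with a full bucket are assumptions made for illustration; timestamps are supplied by the caller so the behavior is deterministic.

```python
class TokenBucket:
    """Hypothetical single-client token bucket driven by caller-supplied timestamps."""

    def __init__(self, capacity: int, refill_per_second: float, start_time: float = 0.0):
        self.capacity = capacity
        self.rate = refill_per_second
        self.tokens = float(capacity)  # start full, so an initial burst is allowed
        self.last = start_time

    def allow(self, timestamp: float) -> bool:
        # Refill lazily based on elapsed time, capped at capacity.
        elapsed = max(0.0, timestamp - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = max(self.last, timestamp)
        if self.tokens >= 1.0:
            self.tokens -= 1.0  # each request consumes one token
            return True
        return False


bucket = TokenBucket(capacity=3, refill_per_second=0.5)
decisions = [bucket.allow(t) for t in (0, 0, 0, 0, 2, 2)]
print(decisions)  # [True, True, True, False, True, False]
```

The first three requests drain the initial burst capacity; after that, one request is admitted per two seconds, matching the refill rate of 0.5 tokens per second.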
Evolution: Brute Force → Better → Optimal
Brute Force
Store all historical request timestamps forever and scan all on every request. Correct but very slow and memory-heavy.
Better
Use fixed window counters. O(1) operations, but edge burst artifacts can violate fairness expectations.
Optimal (for accuracy + interview clarity)
Sliding window log using deque per user: keeps only relevant timestamps; accurate per-window enforcement.
Step-by-Step (Sliding Window Log)
Policy: max limit requests per window_seconds.
- For incoming request at time t, get user's deque.
- Remove timestamps <= t - window_seconds (outside active window).
- If deque size is already >= limit, reject request.
- Else append t and allow request.
ASCII Diagram
Window = 10s, Limit = 3
User A timestamps deque:
[12, 15, 19]
Request at t=22:
Remove <= 12 -> deque becomes [15, 19]
size=2 < 3 -> allow and append 22
new deque: [15, 19, 22]
Python Implementation (Sliding Window)
from collections import defaultdict, deque
from typing import Deque, Dict


class RateLimiter:
    def __init__(self, limit: int, window_seconds: int):
        self.limit = limit
        self.window = window_seconds
        self.user_requests: Dict[str, Deque[int]] = defaultdict(deque)

    def allow(self, user_id: str, timestamp: int) -> bool:
        q = self.user_requests[user_id]
        window_start = timestamp - self.window

        # Remove requests outside current sliding window
        while q and q[0] <= window_start:
            q.popleft()

        if len(q) >= self.limit:
            return False

        q.append(timestamp)
        return True
Line-by-Line Explanation
- user_requests[user] stores only recent timestamps relevant to policy.
- Cleanup loop keeps queue minimal by removing expired requests.
- Queue length equals current request count inside active window.
- Accepting request appends timestamp, updating future state.
Additional Example (Boundary Behavior)
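The following sketch traces what happens right at the edge of the window, assuming the same sliding-window policy as the implementation above (limit 3, window 10 seconds). The class is repeated in compact form under the illustrative name SlidingLimiter only so the snippet runs on its own.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class SlidingLimiter:
    def __init__(self, limit: int, window_seconds: int):
        self.limit = limit
        self.window = window_seconds
        self.user_requests: Dict[str, Deque[int]] = defaultdict(deque)

    def allow(self, user_id: str, timestamp: int) -> bool:
        q = self.user_requests[user_id]
        # "<=" means a request exactly window_seconds old no longer counts.
        while q and q[0] <= timestamp - self.window:
            q.popleft()
        if len(q) >= self.limit:
            return False
        q.append(timestamp)
        return True


limiter = SlidingLimiter(limit=3, window_seconds=10)
results = [(t, limiter.allow("A", t)) for t in (12, 15, 19, 21, 22)]
print(results)
# [(12, True), (15, True), (19, True), (21, False), (22, True)]
```

At t=21 the request from t=12 is still inside the window (12 > 21 - 10), so the fourth request is rejected. At t=22 that same request is exactly 10 seconds old and is evicted by the <= comparison, freeing one slot. Note that the rejected request at t=21 is not recorded, so it does not count against later requests.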
Time Complexity
- Per request: amortized O(1) for deque cleanup + append/check.
- Each timestamp enters and leaves deque once.
Space Complexity
- Per user: O(number of requests within current window).
- Total: sum across active users.
Edge Cases
- Out-of-order timestamps: simple deque approach assumes non-decreasing request time per user.
- Very high cardinality users: need state eviction (TTL cleanup) for inactive users.
- Clock skew across servers: distributed systems require synchronized or logical time strategy.
- Limit = 0: reject all requests by definition.
Common Mistakes
Distributed System Considerations
- Use Redis or centralized store for shared counters/timestamps.
- Prefer atomic operations/Lua scripts to avoid race conditions.
- Choose key granularity: per user, per IP, per API key, per endpoint.
- Decide fail-open vs fail-closed behavior when rate-limit store is unavailable.
Pattern Recognition
Rate limiter design appears when requirements mention:
- "X requests per Y seconds"
- Traffic shaping / abuse prevention
- Per-client fairness under high throughput
Interview Insight
Practice Problems
- Implement fixed-window and compare behavior with sliding-window on bursty traffic.
- Implement token bucket rate limiter.
- Add endpoint-specific policies (different limits per API route).
- Build distributed limiter using Redis sorted sets or counters.
Summary
- Rate limiters protect systems by controlling request frequency.
- Sliding window log offers accurate policy enforcement with manageable complexity.
- Choose strategy (fixed/sliding/token) based on fairness, precision, and cost needs.
- Distributed correctness requires shared atomic state and careful clock assumptions.
21.7 Design Vending Machine
Introduction
Designing a Vending Machine is a classic object-oriented design problem that tests how you model states, transitions, inventory, and payments in a clean and extensible way. The challenge is not just “dispense item” — it is handling all business rules correctly.
This problem is excellent practice for modeling real-world workflows with robust error handling.
Real-World Analogy
You select a product, insert money, and expect one of two outcomes: either product + change, or clear reason for rejection/refund. Internally, the machine moves through states like “waiting for selection”, “waiting for payment”, and “dispensing”.
Formal Definition
A vending machine should support:
- Load inventory with item code, price, and quantity.
- Select an item by code.
- Insert money.
- Dispense item if payment is sufficient and stock exists.
- Return change/refund when needed.
Why This Topic Matters
- Highly common in low-level design interviews.
- Teaches state machine thinking and class responsibility separation.
- Builds habits for transactional correctness and edge-case handling.
Mental Model
States:
IDLE -> ITEM_SELECTED -> PAYMENT_COLLECTED -> DISPENSING -> IDLE
\_______________________________________/
cancel/refund path
Every action should be validated against current state and machine invariants (stock, balance, item validity).
Evolution: Brute Force → Better → Optimal
Brute Force
Put all logic in one giant function with many if-else checks. Works for tiny demos, becomes hard to maintain.
Better
Use one class with helper methods for inventory and payment checks. Cleaner, but state transitions can still become messy.
Optimal (for interview-quality design)
Model:
- Clear entities (Item, InventorySlot, VendingMachine).
- Explicit state variable (or State pattern for larger systems).
- Deterministic transitions and guarded operations.
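One way to make transitions deterministic and guarded is an explicit transition table. The sketch below follows the states named in the Mental Model; State, TRANSITIONS, next_state, and the action strings are hypothetical names chosen for illustration.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    ITEM_SELECTED = auto()
    PAYMENT_COLLECTED = auto()
    DISPENSING = auto()


# (current state, action) -> next state; anything not listed is rejected.
TRANSITIONS = {
    (State.IDLE, "select"): State.ITEM_SELECTED,
    (State.ITEM_SELECTED, "insert_money"): State.PAYMENT_COLLECTED,
    (State.PAYMENT_COLLECTED, "insert_money"): State.PAYMENT_COLLECTED,
    (State.PAYMENT_COLLECTED, "dispense"): State.DISPENSING,
    (State.DISPENSING, "done"): State.IDLE,
    (State.ITEM_SELECTED, "cancel"): State.IDLE,
    (State.PAYMENT_COLLECTED, "cancel"): State.IDLE,
}


def next_state(state: State, action: str) -> State:
    """Return the next state, or raise if the action is not allowed here."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"{action!r} not allowed in state {state.name}") from None


state = State.IDLE
for action in ("select", "insert_money", "insert_money", "dispense", "done"):
    state = next_state(state, action)
print(state.name)  # IDLE
```

Keeping the table as data means that adding a new state or action is a one-line change, and every illegal action fails in the same place instead of in scattered if-else branches.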
Core Components
Item
Represents product details: code, name, price.
Inventory Slot
Maps item to available quantity.
VendingMachine
- Inventory storage
- Current selected item
- Inserted balance
- State transitions and business rules
Step-by-Step Flow
- User selects item code.
- Machine validates code and stock.
- User inserts money (possibly multiple times).
- When balance >= price, machine dispenses item.
- Machine returns change if balance > price.
- Machine resets session state for next customer.
ASCII Diagram
[IDLE]
| select(valid, in-stock)
v
[ITEM_SELECTED]
| insert money
v
[PAYMENT_COLLECTED]
| enough balance?
| yes -> dispense + change -> reset
| no -> wait for more / cancel-refund
Python Implementation
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Item:
    code: str
    name: str
    price: int  # price in smallest unit (e.g., cents)


@dataclass
class InventorySlot:
    item: Item
    quantity: int


class VendingMachine:
    # This compact version tracks two explicit states; payment and dispensing
    # happen within ITEM_SELECTED.
    IDLE = "IDLE"
    ITEM_SELECTED = "ITEM_SELECTED"

    def __init__(self):
        self.inventory: Dict[str, InventorySlot] = {}
        self.state = VendingMachine.IDLE
        self.selected_code: Optional[str] = None
        self.balance = 0

    def load_item(self, item: Item, quantity: int) -> None:
        if quantity <= 0:
            return
        if item.code in self.inventory:
            self.inventory[item.code].quantity += quantity
        else:
            self.inventory[item.code] = InventorySlot(item=item, quantity=quantity)

    def select_item(self, code: str) -> str:
        if code not in self.inventory:
            return "Invalid item code."
        slot = self.inventory[code]
        if slot.quantity <= 0:
            return "Item out of stock."
        self.selected_code = code
        self.state = VendingMachine.ITEM_SELECTED
        self.balance = 0
        return f"Selected {slot.item.name}. Price: {slot.item.price}"

    def insert_money(self, amount: int) -> str:
        if self.state != VendingMachine.ITEM_SELECTED:
            return "Select an item first."
        if amount <= 0:
            return "Insert a positive amount."
        self.balance += amount
        slot = self.inventory[self.selected_code]
        if self.balance < slot.item.price:
            remaining = slot.item.price - self.balance
            return f"Inserted {amount}. Remaining: {remaining}"
        return f"Inserted {amount}. Ready to dispense."

    def dispense(self) -> Tuple[str, int]:
        if self.state != VendingMachine.ITEM_SELECTED or self.selected_code is None:
            return ("No item selected.", 0)
        slot = self.inventory[self.selected_code]
        price = slot.item.price
        if slot.quantity <= 0:
            # Capture the refund before resetting; _reset_session zeroes the balance.
            refund = self._refund_all()
            self._reset_session()
            return ("Item became unavailable.", refund)
        if self.balance < price:
            return ("Insufficient balance.", 0)
        slot.quantity -= 1
        change = self.balance - price
        item_name = slot.item.name
        self._reset_session()
        return (f"Dispensed: {item_name}", change)

    def cancel(self) -> int:
        refund = self._refund_all()
        self._reset_session()
        return refund

    def _refund_all(self) -> int:
        refund = self.balance
        self.balance = 0
        return refund

    def _reset_session(self) -> None:
        self.state = VendingMachine.IDLE
        self.selected_code = None
        self.balance = 0
Line-by-Line Explanation
- Item and InventorySlot separate immutable product info from mutable stock count.
- select_item validates existence and stock before entering purchase session.
- insert_money updates balance only in valid state.
- dispense enforces payment and stock checks, decrements inventory, and computes change.
- cancel returns full refund and resets machine session.
Worked Example
- Load Coke (code C1, price 120, qty 2).
- Select C1 -> state ITEM_SELECTED.
- Insert 50 -> remaining 70.
- Insert 100 -> balance 150 (enough).
- Dispense -> returns "Dispensed: Coke", change 30, quantity reduces by 1.
Time Complexity
- load_item, select_item, insert_money, dispense, cancel: O(1) average (hash map access).
Space Complexity
- O(n) where n is number of item codes loaded into machine.
Edge Cases
- Invalid item code: reject immediately.
- Out-of-stock item: prevent selection/dispense.
- Insufficient balance: keep waiting or allow cancel.
- Cancel mid-transaction: full refund.
- Concurrent users: real system needs session locking/isolation.
Common Mistakes
Design Extensions
- Coin inventory for exact change-making.
- Multiple payment methods (cash/card/UPI).
- Admin mode for refill/pricing updates.
- Telemetry: sales logs, low-stock alerts, fault reporting.
Pattern Recognition
This pattern appears when systems involve:
- Clear workflow states and transitions.
- Inventory/resources and transactional updates.
- User actions that can fail at different checkpoints.
Interview Insight
Practice Problems
- Add exact change-making using limited coin inventory.
- Support multiple item selection cart before checkout.
- Add timeout auto-cancel with refund.
- Model this using full State Design Pattern classes.
Summary
- Vending machine design is a stateful workflow + inventory + payment problem.
- Clean object model with guarded transitions yields maintainable behavior.
- Correctness depends on state reset, stock handling, and money accounting.
- This is a strong practice problem for real-world low-level design interviews.
21.8 Design TinyURL
Introduction
Design TinyURL is a classic system design + data structure problem where long URLs are converted into short unique aliases. The core requirements are correctness, uniqueness, fast lookup, and scalability.
In interviews, this problem checks whether you can move from simple mapping logic to production concerns like collisions, key space size, distributed ID generation, and analytics.
Real-World Analogy
Imagine replacing long home addresses with short locker numbers. Instead of writing the full address every time, you hand out a compact code that maps back to the real destination when needed.
Formal Definition
Provide two main APIs:
- encode(longUrl) -> returns short URL.
- decode(shortUrl) -> returns original long URL.
Typical constraints:
- Same short URL should decode to exactly one long URL.
- Generated keys should avoid collisions (or handle them safely).
- Operation latency should be low.
Why This Topic Matters
- Frequently asked in backend/system design interviews.
- Combines hashing, encoding, storage, and distributed architecture choices.
- Teaches how small API surfaces can hide large scaling complexity.
Mental Model
Long URL --(generate short key)--> short key
short key --(store mapping)------> long URL
decode:
short key --(lookup)-------------> long URL
Core Approaches
1) Auto-increment ID + Base62 Encoding
Generate numeric IDs sequentially, then encode into Base62 (0-9, a-z, A-Z). Deterministic and collision-free if ID uniqueness is guaranteed.
2) Random Key Generation
Generate random 6-8 char keys and retry on collision. Simple, but collision checks are required.
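The retry-on-collision idea can be sketched as follows. random_token, store, and max_tries are hypothetical names; the sketch uses an in-memory dict as the mapping and the standard secrets module for key generation.

```python
import secrets
import string
from typing import Dict

ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase  # 62 symbols


def random_token(store: Dict[str, str], long_url: str,
                 length: int = 7, max_tries: int = 10) -> str:
    """Store long_url under a fresh random token and return the token."""
    for _ in range(max_tries):
        token = "".join(secrets.choice(ALPHABET) for _ in range(length))
        if token not in store:  # collision check before committing
            store[token] = long_url
            return token
    raise RuntimeError("could not find a free token; consider a longer length")


store: Dict[str, str] = {}
token = random_token(store, "https://example.com/very/long/path")
print(token, "->", store[token])
```

In a distributed setting the check-then-insert step would have to be atomic (for example, an insert that fails if the key already exists); the dict lookup here only works because the sketch is single-threaded.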
3) Hash-based Key
Use hash of URL and truncate. Needs collision resolution and often does not guarantee uniqueness by itself.
Evolution: Brute Force → Better → Optimal
Brute Force
Store full URL and search linearly for decode/encode relations. Too slow.
Better
Use hash map with random key generation and collision retries.
Optimal (common production-friendly baseline)
Use unique numeric IDs (from DB sequence/snowflake-like service), Base62 encode for short token, and store direct mapping in key-value storage.
Step-by-Step Design (ID + Base62)
- Generate unique numeric ID.
- Convert ID to Base62 token.
- Store token -> longUrl.
- Optionally store longUrl -> token to return same short URL for duplicate long URLs.
- Return short URL as baseDomain + "/" + token.
- For decode, parse token and lookup original long URL.
ASCII Diagram
ID Service -> 125
Base62(125) -> "21"   (125 = 2*62 + 1, using alphabet 0-9, a-z, A-Z)
Store: "21" -> "https://example.com/very/long/path"
Decode "21" -> lookup -> original URL
Python Implementation (Interview-Scale)
class Codec:
    ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
    BASE = 62

    def __init__(self):
        self.id_counter = 1
        self.short_to_long = {}
        self.long_to_short = {}
        self.domain = "https://tiny.url/"

    def _encode_base62(self, num: int) -> str:
        if num == 0:
            return "0"
        chars = []
        while num > 0:
            num, rem = divmod(num, self.BASE)
            chars.append(self.ALPHABET[rem])
        # Digits are produced least-significant first, so reverse them.
        return "".join(reversed(chars))

    def encode(self, longUrl: str) -> str:
        # Optional dedup: same long URL returns same short URL
        if longUrl in self.long_to_short:
            return self.domain + self.long_to_short[longUrl]
        token = self._encode_base62(self.id_counter)
        self.id_counter += 1
        self.short_to_long[token] = longUrl
        self.long_to_short[longUrl] = token
        return self.domain + token

    def decode(self, shortUrl: str) -> str:
        token = shortUrl.rsplit("/", 1)[-1]
        return self.short_to_long.get(token, "")
Line-by-Line Explanation
- id_counter ensures unique ID generation in this single-instance demonstration.
- _encode_base62 converts numeric IDs to short human-friendly tokens.
- short_to_long is required for decode path.
- long_to_short is optional but useful for idempotent encode behavior.
- decode extracts token and does O(1) average map lookup.
Worked Example
- encode("https://abc.com/page1") -> https://tiny.url/1 (token depends on base62 encoding).
- encode("https://abc.com/page2") -> another unique token.
- decode("https://tiny.url/1") -> original page1 URL.
- Re-encoding page1 can return same token if dedup map is enabled.
Time Complexity
- encode: O(log_62 N) for Base62 conversion + O(1) average hash ops.
- decode: O(1) average hash lookup.
Space Complexity
- O(n) for stored mappings, where n is number of unique URLs.
Edge Cases
- Duplicate URLs: choose dedup or always-new-token policy.
- Invalid short token: return error/empty response/404.
- URL normalization: decide whether equivalent URLs map to same key.
- Token exhaustion: increase token length or expand key generation space.
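To reason about token exhaustion, it helps to compute how many tokens a given length provides. The helpers below are hypothetical and only do the arithmetic for a 62-symbol alphabet.

```python
def base62_capacity(length: int) -> int:
    """Number of distinct tokens of exactly `length` Base62 characters."""
    return 62 ** length


def min_token_length(url_count: int) -> int:
    """Smallest fixed token length whose key space covers url_count URLs."""
    length = 1
    while base62_capacity(length) < url_count:
        length += 1
    return length


for length in (6, 7, 8):
    print(length, base62_capacity(length))
# 6 56800235584
# 7 3521614606208
# 8 218340105584896
print(min_token_length(10**9))  # 6
```

Six characters already cover about 56.8 billion IDs, so adding one character to the token length multiplies the available key space by 62.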
Common Mistakes
System Design Considerations
- Storage: key-value DB for token -> URL, optional secondary index for dedup.
- ID generation: DB auto-increment, distributed ID service, or pre-allocated ranges.
- Caching: cache hot token lookups for low-latency decode.
- Analytics: click counts, referrers, geolocation, time buckets.
- Security: phishing detection, malware scanning, abuse rate limiting.
- Expiration: optional TTL-based link expiry and cleanup jobs.
Pattern Recognition
TinyURL-style design appears when you need:
- Compact externally visible identifiers for long internal data.
- Fast reverse lookup by short token.
- Large-scale unique ID allocation across distributed systems.
Interview Insight
Practice Problems
- Add custom alias support while preserving uniqueness constraints.
- Implement expiration and soft-delete of links.
- Add analytics counters with eventual consistency discussion.
- Implement random-key strategy and collision retry logic.
Summary
- TinyURL is a bidirectional mapping + key generation design problem.
- ID + Base62 is a strong baseline: compact, predictable, collision-free with unique IDs.
- Decode speed, uniqueness guarantees, and abuse controls are critical in production.
- This problem is a strong blend of DSA fundamentals and practical system design.
21.9 Design Logging System
Introduction
A Logging System stores log entries and supports time-based retrieval. In interview versions, each log usually has an ID and timestamp, and queries request IDs between two timestamps at a given granularity (Year, Month, Day, etc.).
This problem tests how well you handle temporal data, string/timestamp normalization, and query boundaries.
Real-World Analogy
Think of a surveillance archive where every event has exact date-time. If someone asks “show all events in June 2017” or “show events between these two days,” you should quickly filter by time range and precision level.
Formal Definition
Support operations:
- put(logId, timestamp) – store a log.
- retrieve(start, end, granularity) – return log IDs with timestamps in the requested range after applying granularity.
Typical timestamp format: "YYYY:MM:DD:HH:MM:SS".
With Day granularity, for example, only the prefix YYYY:MM:DD is compared.
Why This Topic Matters
- Common interview problem combining data structure and string/time handling.
- Very relevant for backend systems, observability tools, and monitoring platforms.
- Teaches precision-aware range queries and indexing trade-offs.
Mental Model
Store logs as (timestamp, id)
Query:
1) truncate start/end by granularity
2) expand end to inclusive upper bound for that granularity
3) return logs with timestamp in [start, end]
Evolution: Brute Force → Better → Optimal
Brute Force
Store unsorted logs; for each query scan all logs and compare processed timestamps.
- put: O(1)
- retrieve: O(n)
Better
Keep logs sorted by timestamp and use binary search for range boundaries.
Optimal (for this problem style)
Sorted timestamp list + binary search + granularity prefix transformation.
- Fast range extraction.
- Simple correctness around timestamp string comparisons.
Granularity Mapping
Prefix lengths:
- Year -> 4
- Month -> 7
- Day -> 10
- Hour -> 13
- Minute -> 16
- Second -> 19
Step-by-Step Query Logic
- Convert start and end to granularity-aware lower/upper bounds.
- For lower bound: keep prefix, fill rest with minimal suffix :00:00:00..., keeping the colon separators in their original positions.
- For upper bound: keep prefix, fill rest with maximal suffix :99:99:99... (safe lexical upper trick), again keeping the colons in place so string comparison stays aligned.
- Use binary search over sorted timestamps to find matching interval.
- Return IDs in that index range.
ASCII Diagram
Stored (sorted):
2017:01:01:23:59:59 -> id1
2017:01:02:00:00:00 -> id2
2017:02:10:12:00:00 -> id3
Query:
start=2017:01:01:00:00:00
end =2017:01:31:23:59:59
granularity=Month
Range becomes:
[2017:01:00:00:00:00, 2017:01:99:99:99:99]
Matches id1, id2
Python Implementation
from bisect import bisect_left, bisect_right
from typing import List, Tuple


class LogSystem:
    # Suffix templates keep the colons in place, so generated bounds compare
    # correctly against stored "YYYY:MM:DD:HH:MM:SS" strings.
    MIN_TS = "0000:00:00:00:00:00"
    MAX_TS = "9999:99:99:99:99:99"

    def __init__(self):
        self.logs: List[Tuple[str, int]] = []  # sorted by timestamp
        self.gidx = {
            "Year": 4,
            "Month": 7,
            "Day": 10,
            "Hour": 13,
            "Minute": 16,
            "Second": 19,
        }

    def put(self, log_id: int, timestamp: str) -> None:
        # For interview simplicity: append then sort.
        # In production/high-volume inserts, use better indexing/storage.
        self.logs.append((timestamp, log_id))
        self.logs.sort(key=lambda x: x[0])

    def retrieve(self, start: str, end: str, granularity: str) -> List[int]:
        idx = self.gidx[granularity]
        # Build lexical lower/upper bounds for the chosen granularity.
        # Filling with bare "0"/"9" characters would overwrite the colons and
        # break the comparison (":" sorts after "9" in ASCII).
        low = start[:idx] + self.MIN_TS[idx:]
        high = end[:idx] + self.MAX_TS[idx:]
        left = bisect_left(self.logs, (low, -10**18))
        right = bisect_right(self.logs, (high, 10**18))
        return [log_id for _, log_id in self.logs[left:right]]
Line-by-Line Explanation
- logs stores tuples sorted by timestamp for range binary search.
- gidx maps granularity to prefix cutoff index.
- low and high are generated by keeping prefix and relaxing suffix.
- bisect_left finds first timestamp >= low; bisect_right finds first > high.
- Slice between indices yields all logs in the time range.
Worked Example
- put(1, "2017:01:01:23:59:59")
- put(2, "2017:01:02:00:00:00")
- put(3, "2017:02:01:00:00:00")
- retrieve("2017:01:01:00:00:00", "2017:01:31:23:59:59", "Month") -> [1, 2]
Time Complexity
- put in this simple version: O(n log n) due to full sort after each insert.
- retrieve: O(log n + k), where k is number of returned logs.
With always-sorted insertion via bisect + list insert, put becomes O(n) shift cost and retrieval stays O(log n + k).
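The always-sorted insertion mentioned above could look roughly like this. It is a sketch, assuming logs are kept as a plain list of `(timestamp, log_id)` tuples as in the implementation; the helper name `put_sorted` is hypothetical. `bisect.insort` locates the position in O(log n), and the list insert then shifts elements in O(n).

```python
from bisect import insort
from typing import List, Tuple


def put_sorted(logs: List[Tuple[str, int]], log_id: int, timestamp: str) -> None:
    """Insert (timestamp, log_id) while keeping `logs` sorted.

    Tuples compare by timestamp first, then by log_id, so equal
    timestamps stay grouped together.
    """
    insort(logs, (timestamp, log_id))


# Usage: inserting out of order still leaves the list sorted.
logs: List[Tuple[str, int]] = []
put_sorted(logs, 2, "2017:01:02:00:00:00")
put_sorted(logs, 1, "2017:01:01:23:59:59")
```

This avoids re-sorting the whole list on every insert while leaving the retrieve logic unchanged.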
Space Complexity
- O(n) to store logs.
Edge Cases
- No logs: retrieval returns empty list.
- start > end: should return empty or validate input.
- Multiple logs same timestamp: all matching IDs should be returned.
- Granularity mismatch: truncation must apply to both start and end, with end expanded to the inclusive upper bound for that granularity.
Common Mistakes
System Design Extensions
- Partition logs by date/hour shards for scalable retrieval.
- Use time-series databases or index trees for high-volume ingestion.
- Add retention policies (TTL), compression, and archival storage.
- Support filters by service, severity, region in addition to time range.
Pattern Recognition
This pattern appears when you need:
- Time-based range retrieval.
- Precision/granularity-aware queries.
- Large append-heavy data with selective reads.
Interview Insight
Practice Problems
- Optimize put using bisect insertion instead of full sort.
- Add severity filtering (INFO/WARN/ERROR) alongside time retrieval.
- Implement rolling log retention (delete entries older than X days).
- Design distributed logging ingestion pipeline with query index service.
Summary
- Logging System design combines timestamp storage and granularity-aware range querying.
- Fixed-format timestamp strings allow direct lexical ordering and binary search.
- Correct boundary handling is the key correctness detail.
- This problem is an excellent bridge between DSA and practical backend observability design.
22.1 Pattern Recognition Framework
Introduction
Pattern recognition is the skill of mapping a new problem to a known solution template quickly and correctly. Most interview problems look different on the surface, but underneath they repeat the same families: sliding window, two pointers, binary search on answer, graph traversal, DP, greedy, and so on.
This topic gives you a practical framework to identify those patterns under time pressure.
Real-World Analogy
A good doctor does not memorize every possible symptom combination separately. They identify clinical patterns and then apply proven treatment protocols. DSA interviews are similar: strong candidates recognize problem signatures and apply the right algorithmic protocol.
Formal Definition
Pattern recognition in DSA is the process of extracting structural cues from a problem statement and selecting the most appropriate algorithm/data structure template with complexity justification.
Why This Topic Matters
- Reduces solve time dramatically in interviews and contests.
- Prevents brute-force dead ends by forcing early complexity thinking.
- Improves communication: you can explain “why this pattern” clearly.
Mental Model
Problem Statement
|
v
Signal Extraction
(constraints, input form, operation type, objective)
|
v
Pattern Candidate Set
|
v
Complexity + Correctness Check
|
v
Chosen Template + Edge Cases + Implementation
The 6-Step Pattern Recognition Framework
- Classify input shape: array/string, matrix, tree, graph, intervals, stream.
- Identify task verb: search, count, optimize, shortest path, connectivity, partition.
- Read constraints first: they eliminate impossible complexities.
- List 2-3 candidate patterns: do not lock in too early.
- Pick by complexity + invariant: choose method with a provable invariant.
- Validate with edge cases: empty input, duplicates, negatives, boundaries.
Pattern Cue Table
| Problem Signal | Likely Pattern | Typical Complexity |
|---|---|---|
| Contiguous subarray/substring | Sliding Window / Prefix Sum | O(n) |
| Sorted data + target condition | Binary Search / Two Pointers | O(log n) / O(n) |
| Tree/graph reachability | DFS / BFS / Union-Find | O(V+E) |
| Optimal with overlapping subproblems | Dynamic Programming | Varies |
| Intervals scheduling/merging | Sort + Greedy | O(n log n) |
Evolution: Brute Force → Better → Optimal
Brute Force
Start with complete enumeration to understand state space and correctness baseline.
Better
Use data structures to remove repeated work (hash map, prefix sums, sorting).
Optimal
Recognize the dominant pattern early and implement invariant-driven solution with target complexity.
Step-by-Step Example (Framework in Action)
Problem: “Find length of longest substring without repeating characters.”
- Input shape: string.
- Task verb: longest contiguous segment.
- Constraint: typically O(n) desired.
- Pattern candidates: brute-force substrings, sliding window.
- Choose sliding window with frequency/last-seen map.
- Invariant: window always has unique characters.
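The choice made in the example above can be sketched as follows. This is one possible sliding-window version using a last-seen map; the function name `length_of_longest_substring` is an assumption for illustration.

```python
def length_of_longest_substring(s: str) -> int:
    """Length of the longest substring of s with no repeated character."""
    last_seen = {}  # character -> most recent index
    left = 0        # left edge of the current window
    best = 0
    for right, ch in enumerate(s):
        # If ch already occurs inside the window, move left past that
        # occurrence so the window holds unique characters again.
        if ch in last_seen and last_seen[ch] >= left:
            left = last_seen[ch] + 1
        last_seen[ch] = right
        best = max(best, right - left + 1)
    return best
```

Each index is visited once and `left` only moves forward, so the run time is O(n) with extra space bounded by the number of distinct characters.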
Python Mini-Framework Helper
This helper is not a solver; it demonstrates a checklist-style classifier to train your intuition.
def suggest_patterns(problem_text: str, n_hint: int | None = None) -> list[str]:
    text = problem_text.lower()
    patterns = []
    if "substring" in text or "subarray" in text or "contiguous" in text:
        patterns.append("Sliding Window / Prefix Sum")
    if "sorted" in text or "monotonic" in text:
        patterns.append("Binary Search / Two Pointers")
    if "graph" in text or "node" in text or "edge" in text:
        patterns.append("BFS / DFS / Union-Find / Shortest Path")
    if "tree" in text:
        patterns.append("DFS / BFS / Tree DP")
    if "minimum" in text or "maximum" in text or "count ways" in text:
        patterns.append("Dynamic Programming / Greedy")
    if "interval" in text:
        patterns.append("Sort + Merge / Greedy")
    if n_hint is not None:
        if n_hint <= 2000:
            patterns.append("O(n^2) may be acceptable")
        else:
            patterns.append("Target near O(n) or O(n log n)")
    # Remove duplicates while preserving order
    seen = set()
    ordered = []
    for p in patterns:
        if p not in seen:
            seen.add(p)
            ordered.append(p)
    return ordered
Line-by-Line Explanation
- Uses keyword cues from statement text to propose candidate patterns.
- n_hint injects complexity sanity check.
- Returns ordered unique recommendations, mimicking interview thought flow.
Time Complexity Perspective
- Pattern recognition itself is fast; main gain is avoiding wrong-path implementations.
- Primary goal: pick a pattern whose target complexity fits constraints.
Space Complexity Perspective
- Many optimal patterns trade space for speed (hash maps, DP tables, heaps).
- Always state this trade-off explicitly in interviews.
Edge Cases Checklist (Universal)
- Empty input / single element.
- Duplicates / all equal values.
- Negative values / zero handling.
- Boundary indices and overflow-prone operations.
Common Mistakes
Interview Insight
Practice Problems
- Take 20 random problems and label primary + secondary pattern before coding.
- For each solved problem, write one-sentence “pattern signature” for revision.
- Redo medium problems by forcing an alternative valid pattern and compare trade-offs.
Summary
- Pattern recognition is the fastest path from problem statement to correct algorithm family.
- Use a repeatable framework: classify -> shortlist -> justify -> implement.
- Constraints + invariants should drive pattern choice, not keyword guessing.
- This skill is the backbone of interview consistency.
22.2 Problem Difficulty Ladder
Introduction
A problem difficulty ladder is a structured progression system for DSA practice where you intentionally move from easy patterns to advanced combinations. Instead of solving random questions, you follow a sequence that builds transferable skills layer by layer.
This topic helps you train like an athlete: controlled progression, measurable checkpoints, and targeted weakness repair.
Real-World Analogy
You do not begin gym training with maximum weight on day one. You build movement quality first, then strength, then complexity under fatigue. DSA mastery works the same way: foundations first, then harder pattern combinations, then interview simulation pressure.
Formal Definition
Problem Difficulty Ladder is a staged practice framework where each level introduces stricter constraints, deeper pattern composition, and higher implementation precision requirements.
Why This Topic Matters
- Prevents random practice and skill plateaus.
- Builds confidence through repeatable progression milestones.
- Improves interview readiness by matching practice to target company bar.
Mental Model
Level 1: Pattern Recognition
Level 2: Pattern Execution
Level 3: Pattern Mixing
Level 4: Constraint-Driven Optimization
Level 5: Interview Simulation
Each level assumes mastery of the previous one. Skipping levels usually creates hidden weakness.
The 5-Level Difficulty Ladder
Level 1: Foundational Pattern Identification
- Goal: detect core pattern quickly.
- Problem type: easy variants of arrays/strings/hash maps.
- Target: explain why chosen pattern fits constraints.
Level 2: Clean Implementation Under Time
- Goal: implement correctly without template copy-paste dependency.
- Problem type: medium single-pattern questions.
- Target: pass edge cases in first/second attempt.
Level 3: Hybrid Pattern Problems
- Goal: combine two or more patterns (e.g., binary search + greedy check, DFS + DP).
- Problem type: medium-hard composition problems.
- Target: reason about interaction between sub-techniques.
Level 4: Optimization and Trade-offs
- Goal: move from acceptable solution to optimal complexity.
- Problem type: hard constraints, advanced data structures.
- Target: justify why alternatives fail complexity limits.
Level 5: Interview Simulation
- Goal: communicate, code, debug, and optimize in one realistic session.
- Problem type: unseen mixed-difficulty questions under strict time.
- Target: complete solution with clear explanation and test strategy.
Evolution: Brute Force → Better → Optimal (Training Strategy)
Brute Force Practice
Solve random questions without pattern tracking. Progress feels slow and inconsistent.
Better Practice
Group by topic, but still no progression gates or readiness metrics.
Optimal Practice
Use ladder with entry criteria, exit criteria, and review cycles.
Step-by-Step Weekly Ladder Plan
- Select one core topic cluster (e.g., sliding window + prefix sum).
- Solve 6-8 level-1/2 problems for recognition and execution.
- Solve 4-6 level-3 hybrid problems.
- Solve 2-3 level-4 optimization problems.
- Run one level-5 mock interview session.
- Perform retrospective: errors, time leaks, communication gaps, pattern confusion.
Difficulty Ladder Scorecard
| Metric | Target | Why It Matters |
|---|---|---|
| Pattern identification time | < 3 minutes | Reduces interview dead time |
| First correct implementation rate | >= 70% | Reliability under pressure |
| Hint dependency | Decreasing trend | Independence growth |
| Edge-case misses | <= 1 per problem | Code robustness |
ASCII Progress Board
Week N:
L1 [#####]
L2 [#### ]
L3 [###  ]
L4 [##   ]
L5 [#    ]
Goal: push weakest filled bar each week
Python Tracker Utility (Practice Analytics)
from dataclasses import dataclass
from typing import List


@dataclass
class Attempt:
    level: int
    solved: bool
    used_hint: bool
    minutes: int
    edge_case_bugs: int


def summarize(attempts: List[Attempt]) -> dict:
    if not attempts:
        return {"total": 0}
    total = len(attempts)
    solved = sum(a.solved for a in attempts)
    hint_count = sum(a.used_hint for a in attempts)
    avg_minutes = sum(a.minutes for a in attempts) / total
    avg_bugs = sum(a.edge_case_bugs for a in attempts) / total
    by_level = {}
    for lv in range(1, 6):
        group = [a for a in attempts if a.level == lv]
        if not group:
            continue
        by_level[lv] = {
            "count": len(group),
            "solve_rate": sum(a.solved for a in group) / len(group),
            "avg_minutes": sum(a.minutes for a in group) / len(group),
        }
    return {
        "total": total,
        "solve_rate": solved / total,
        "hint_rate": hint_count / total,
        "avg_minutes": avg_minutes,
        "avg_edge_case_bugs": avg_bugs,
        "by_level": by_level,
    }
Line-by-Line Explanation
- Each attempt records level, correctness, hint usage, time, and bug count.
- summarize provides global and per-level health metrics.
- Use this to identify bottlenecks (for example level-3 hybrid weakness).
Time Complexity Perspective
- Ladder planning is about improving average solve complexity choices over time.
- The real complexity gain is strategic: fewer brute-force starts and faster convergence to optimal patterns.
Space Complexity Perspective
- Training logs are lightweight; keeping detailed notes gives disproportionate long-term payoff.
Edge Cases in Preparation
- Overfitting to one platform: solve from multiple sources.
- Skipping review: solved count rises but skill stagnates.
- Only hard problems: weak fundamentals remain hidden.
Common Mistakes
Interview Insight
Practice Problems
- Create your own 5-level ladder for one topic (e.g., graphs).
- Run a 2-week cycle and analyze by-level solve rate changes.
- For each failed problem, classify failure type: pattern miss, implementation bug, edge case miss, time panic.
Summary
- Difficulty ladders convert random practice into systematic skill growth.
- Use staged progression with measurable exit criteria per level.
- Track quality metrics (time, hints, bugs), not only solved count.
- This framework builds interview reliability and long-term mastery.
22.3 Whiteboard Coding
Introduction
Whiteboard coding is not just coding without an IDE. It is a structured communication exercise where interviewers evaluate how you think, decompose problems, reason about edge cases, and recover from mistakes in real time.
Strong candidates treat whiteboard coding as a collaborative design-and-implementation session, not silent puzzle solving.
Real-World Analogy
Imagine a pilot simulation: evaluators are not checking only whether the destination is reached, but how decisions are made under constraints, how checklists are followed, and how anomalies are handled calmly. Whiteboard interviews test similar discipline.
Formal Definition
Whiteboard coding is a constrained problem-solving format where the candidate explains approach, writes code manually, validates with examples, and analyzes complexity without relying on IDE tooling.
Why This Topic Matters
- Many companies still use whiteboard or whiteboard-like collaborative coding rounds.
- Builds clarity of thought, algorithm articulation, and debugging confidence.
- Improves on-the-spot reasoning when syntax support is limited.
Mental Model
Understand -> Clarify -> Plan -> Code -> Trace -> Analyze -> Improve
The order matters. Jumping directly into code usually causes avoidable errors.
The Whiteboard Coding Workflow (7 Steps)
- Restate the problem: confirm input/output and objective.
- Ask clarifying questions: constraints, duplicates, edge conditions, return format.
- Propose brute force briefly: show baseline understanding.
- Derive optimal approach: explain key invariant/data structure choice.
- Write clean skeleton first: function signature, helpers, core loop.
- Dry run with sample: trace variable changes aloud.
- Complexity + edge cases: finalize confidently.
Brute Force → Better → Optimal (Communication Style)
Brute Force
“A direct approach is X with complexity O(...). It is correct but too slow because ...”
Better
“We can remove repeated work by using ... and reduce complexity to ...”
Optimal
“Final approach uses invariant ... with data structure ... giving O(...) time and O(...) space.”
Whiteboard-Friendly Code Structure
- Use short meaningful variable names (left, right, freq).
- Split tricky logic into helper functions when possible.
- Avoid deeply nested code if a guard clause can simplify flow.
- Write comments only for non-obvious invariants.
ASCII Whiteboard Layout Strategy
+-----------------------------------------------+
| Problem Notes / Constraints                   |
|-----------------------------------------------|
| Example + Dry Run Table                       |
|-----------------------------------------------|
| Final Code                                    |
|-----------------------------------------------|
| Complexity + Edge Cases                       |
+-----------------------------------------------+
This layout keeps your thinking visible and easy for the interviewer to follow.
Step-by-Step Demonstration Pattern
Use this sequence for almost any medium DSA question:
- “I’ll restate: ...”
- “Assumptions: ...”
- “Naive approach: ... O(...)”
- “Better insight: ...”
- “Final algorithm steps: 1..2..3..”
- Write code and narrate critical lines.
- Dry run + complexity + edge case checks.
Python Template for Whiteboard Communication
def solve(nums):
    # 1) Guard clauses
    if not nums:
        return 0
    # 2) State initialization
    left = 0
    best = 0
    freq = {}
    # 3) Main loop with invariant narration:
    #    window [left..right] always valid
    for right, x in enumerate(nums):
        freq[x] = freq.get(x, 0) + 1
        while not is_valid(freq):  # placeholder condition
            y = nums[left]
            freq[y] -= 1
            if freq[y] == 0:
                del freq[y]
            left += 1
        best = max(best, right - left + 1)
    return best
This is a generic communication template: guard clauses, state, invariant loop, and final result.
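To make the placeholder concrete, here is one possible instantiation of the template. The problem chosen (longest contiguous run with at most k distinct values), the extra parameter `k`, and the function names are assumptions for illustration only; the template itself leaves `is_valid` open.

```python
from typing import Dict, List


def is_valid(freq: Dict[int, int], k: int) -> bool:
    """Window is valid while it holds at most k distinct values."""
    return len(freq) <= k


def longest_with_at_most_k_distinct(nums: List[int], k: int) -> int:
    # 1) Guard clauses
    if not nums or k <= 0:
        return 0
    # 2) State initialization
    left = 0
    best = 0
    freq: Dict[int, int] = {}
    # 3) Main loop: window [left..right] is valid after the inner loop
    for right, x in enumerate(nums):
        freq[x] = freq.get(x, 0) + 1
        while not is_valid(freq, k):
            y = nums[left]
            freq[y] -= 1
            if freq[y] == 0:
                del freq[y]
            left += 1
        best = max(best, right - left + 1)
    return best
```

Only the validity check and the guard changed; the skeleton and the narration points stay the same, which is what makes the template reusable on a whiteboard.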
Line-by-Line Explanation
- Guard clause shows immediate boundary awareness.
- Initialization makes data dependencies explicit.
- Main loop updates state progressively.
- Invariant-preserving while-loop demonstrates correctness control.
- best update indicates objective tracking.
Time Complexity Checklist During Interview
- State complexity before coding when possible.
- Name each dominant loop/data-structure operation.
- Mention amortized behavior if relevant (e.g., two pointers, deque pops).
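As an illustration of the amortized argument for deque pops, consider a sliding-window maximum. This is a sketch with an assumed function name `sliding_window_max`; the point to narrate is that each index is appended once and popped at most once, so the nested while-loop still totals O(n) work.

```python
from collections import deque
from typing import List


def sliding_window_max(nums: List[int], k: int) -> List[int]:
    """Maximum of every length-k window, in O(n) amortized time."""
    if k <= 0 or k > len(nums):
        return []
    dq = deque()  # indices whose values are in decreasing order
    out = []
    for i, x in enumerate(nums):
        # Pop smaller-or-equal values: they can never be a maximum again.
        while dq and nums[dq[-1]] <= x:
            dq.pop()
        dq.append(i)
        # Drop the front index once it falls out of the window.
        if dq[0] <= i - k:
            dq.popleft()
        if i >= k - 1:
            out.append(nums[dq[0]])
    return out
```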
Space Complexity Checklist
- Differentiate input space vs extra auxiliary space.
- Mention recursion stack for DFS/backtracking solutions.
Edge Cases to Always Ask/Check
- Empty input, single element, all equal values.
- Negative values, large constraints, duplicates.
- Index boundaries and overflow-sensitive operations.
Common Mistakes
Interview Recovery Strategy (When Stuck)
- Pause and summarize current state in one sentence.
- State where uncertainty is (pattern, edge case, implementation detail).
- Propose smallest possible next test case.
- Adjust with explicit reasoning instead of random edits.
Pattern Recognition in Whiteboard Context
When under pressure, use quick cue mapping:
- Contiguous region -> sliding window/prefix.
- Sorted + threshold -> binary search/two pointers.
- Reachability/path -> BFS/DFS.
- “Ways/optimal with overlap” -> DP.
Interview Insight
Practice Problems
- Re-solve 10 medium problems on paper without running code.
- Record yourself explaining approach in under 2 minutes before coding.
- Practice “live dry-run” for each solution with at least two edge cases.
- Run timed mock interviews with a friend and feedback rubric.
Summary
- Whiteboard coding evaluates reasoning + communication + correctness.
- Follow a repeatable workflow: clarify, plan, code, trace, analyze.
- Narrated brute-force-to-optimal progression increases interviewer confidence.
- Calm debugging and explicit invariants often differentiate top candidates.
22.4 Communication Strategy
Introduction
In coding interviews, communication is a core technical skill, not a soft add-on. Interviewers evaluate not just what solution you reach, but how clearly and reliably you reason toward it.
A strong communication strategy helps you make your thinking visible, reduce misunderstandings, and recover smoothly from mistakes.
Real-World Analogy
A senior engineer in production incidents does not silently type fixes. They narrate assumptions, risks, and next steps so the team can coordinate. Interview communication follows the same principle: make your internal model externally understandable.
Formal Definition
Interview communication strategy is a structured way to articulate problem understanding, algorithm decisions, implementation plan, validation, and trade-offs throughout the session.
Why This Topic Matters
- Prevents solving the wrong interpretation of the problem.
- Signals algorithmic clarity and collaboration style.
- Creates opportunities for hints and alignment instead of silent failure.
- Can differentiate two candidates with similar coding ability.
Mental Model
Understand -> Align -> Decide -> Implement -> Validate -> Reflect
At each stage, communicate the minimum essential information clearly and briefly.
The 6-Phase Communication Framework
Phase 1: Problem Alignment
- Restate input, output, and objective in your own words.
- Confirm assumptions and ambiguous requirements.
Phase 2: Constraint Anchoring
- Mention expected complexity targets based on n limits.
- Rule out infeasible brute force clearly.
Phase 3: Approach Narration
- Present brute force briefly, then improved and final approach.
- Name invariants/data structures explicitly.
Phase 4: Implementation Signposting
- Before coding each block, say what it does.
- Call out tricky lines and boundary handling.
Phase 5: Validation Loop
- Dry run with one normal and one edge case.
- Narrate state transitions (pointers/maps/queues).
Phase 6: Final Technical Wrap
- Time and space complexity.
- Trade-offs and possible optimizations.
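Constraint anchoring (Phase 2) can be rehearsed with a rough feasibility check like the one below. This is only a sketch: the budget of about 10^8 simple operations is a common rule of thumb assumed here, not a guarantee for any judge or language, and the function name `feasible_complexities` is hypothetical.

```python
import math
from typing import List

# Assumed budget: roughly 10**8 simple operations within the time limit.
OPS_BUDGET = 10**8


def feasible_complexities(n: int, budget: int = OPS_BUDGET) -> List[str]:
    """Return complexity classes whose rough operation count fits the budget."""
    log_n = math.log2(n) if n > 1 else 1
    estimates = {
        "O(log n)": log_n,
        "O(n)": n,
        "O(n log n)": n * log_n,
        "O(n^2)": n**2,
        "O(n^3)": n**3,
        # Treat exponential growth as infeasible beyond small n.
        "O(2^n)": 2**n if n < 64 else float("inf"),
    }
    return [name for name, ops in estimates.items() if ops <= budget]
```

For example, with n around 10^5 the check keeps O(n log n) and drops O(n^2), which is the kind of statement worth saying aloud before proposing an approach.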
Brute Force → Better → Optimal Communication Script
Brute Force
“A straightforward method is ..., complexity is O(...), but this may fail for large constraints.”
Better
“We can avoid repeated work by ..., improving to O(...).”
Optimal
“Final approach uses ... invariant with ... data structure; complexity becomes O(...).”
High-Value Sentence Templates
- “Let me restate to confirm we are aligned...”
- “Given n up to ..., I should target around O(...).”
- “The invariant I maintain is ...”
- “I’ll quickly dry run this on ...”
- “One edge case here is ... and this line handles it.”
ASCII Interview Timeline
0-3 min : Clarify + constraints
3-8 min : Approach evolution
8-20 min : Code with signposting
20-25 min : Dry run + complexity + refinements
Mini Python Example + Narration Style
The code is simple; focus is how to narrate intent while writing.
def two_sum(nums, target):
    # Narration: map stores value -> index seen so far
    seen = {}
    for i, x in enumerate(nums):
        need = target - x
        # Narration: if complement already seen, pair found
        if need in seen:
            return [seen[need], i]
        seen[x] = i
    return []
Line-by-Line Communication Notes
- State data structure purpose before writing it (seen lookup table).
- State the key equation (need = target - x) aloud.
- Point out return behavior when solution is found.
- Mention fallback return and assumptions about existence.
Time Complexity Communication Checklist
- Name dominant operations and loop counts.
- Distinguish average vs worst case when hashing is used.
- Avoid vague terms like “fast” without Big-O.
Space Complexity Communication Checklist
- Mention auxiliary structures explicitly (maps, stacks, queues).
- Include recursion depth if recursion is used.
Common Mistakes
Recovery Strategy When You Make a Mistake
- Acknowledge quickly: “I see a bug in boundary handling.”
- Localize precisely: “Issue is in while-loop condition.”
- Patch with rationale: “I’ll change this to preserve invariant ...”
- Re-run one small test case to verify fix.
Pattern Recognition + Communication
When stating a chosen pattern, always attach evidence:
- Input form (string/array/graph).
- Objective type (max/min/count/path).
- Constraint target (O(n), O(n log n), etc.).
- Invariant that proves correctness.
Interview Insight
Practice Problems
- Solve 10 known problems while recording a 2-minute approach explanation before coding.
- Practice one mock where you are graded only on clarity, not code correctness.
- Create a personal checklist card: clarify, constraints, invariant, dry run, complexity.
Summary
- Communication strategy is a technical multiplier in coding interviews.
- Use a repeatable phase-based framework from alignment to wrap-up.
- Narrate invariants and decisions, not every keystroke.
- Clear recovery behavior after mistakes often improves interviewer confidence.
22.5 Time Management
Introduction
Time management in DSA interviews is the skill of allocating minutes intentionally across understanding, planning, coding, testing, and optimization. Many candidates fail not because they lack knowledge, but because they spend too long in one phase and run out of time for crucial final steps.
This topic teaches a practical pacing framework that improves completion rate and interview consistency.
Real-World Analogy
In a marathon, running too fast in the first few kilometers can destroy performance later. In interviews, over-investing early (for example, 20 minutes on one edge case before writing core logic) creates the same failure pattern.
Formal Definition
Interview time management is the deliberate budgeting of limited session time across problem-solving phases, with checkpoint-based adjustments to maximize the probability of a complete, correct, and communicable solution.
Why This Topic Matters
- Increases probability of shipping a full solution within interview limits.
- Prevents “almost done but no dry run” outcomes.
- Improves interviewer confidence through controlled execution.
Mental Model
Budget -> Execute -> Checkpoint -> Adjust -> Deliver
The goal is not perfection in every phase; the goal is high-confidence delivery before time expires.
Standard 45-Minute Coding Round Budget
| Phase | Target Time | Outcome |
|---|---|---|
| Understand + clarify | 3-5 min | Aligned problem statement |
| Approach design | 7-10 min | Chosen algorithm + complexity |
| Implementation | 18-22 min | Working code skeleton complete |
| Dry run + edge cases | 6-8 min | Bug fixes + correctness confidence |
| Final wrap | 2-3 min | Complexity + optional optimization |
Brute Force → Better → Optimal (Pacing Strategy)
Brute Force Pacing
No time checkpoints; candidate gets stuck and notices too late.
Better Pacing
Rough phase targets, but no active adjustment if delayed.
Optimal Pacing
Checkpoint-driven pacing with explicit pivot decisions when phase exceeds budget.
Checkpoint Rules (Practical)
- If no clear approach by minute 10, state fallback approach and start coding.
- If core skeleton not done by minute 25, reduce optional abstractions and finish main path first.
- If debugging crosses 5 minutes, run smallest failing test and isolate one variable/invariant at a time.
- Reserve final 2-3 minutes for complexity and trade-off summary no matter what.
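For mock sessions, the checkpoint rules above could be encoded as a small decision helper. This is a sketch under assumptions: the function name, its parameters, the returned strings, and the priority order among rules are all choices made for illustration; only the minute thresholds come from the rules themselves.

```python
def checkpoint_action(
    minute: int,
    has_approach: bool,
    skeleton_done: bool,
    debug_minutes: int,
    total: int = 45,
) -> str:
    """Suggest the next move at a given minute of a timed round."""
    # Reserve the final minutes for the summary no matter what.
    if minute >= total - 3:
        return "wrap up: state complexity and trade-offs"
    # Debugging has crossed 5 minutes: narrow the search.
    if debug_minutes > 5:
        return "isolate: run the smallest failing test"
    # Core skeleton late at minute 25: cut optional work.
    if minute >= 25 and not skeleton_done:
        return "simplify: drop optional abstractions, finish main path"
    # No clear approach by minute 10: commit to the fallback.
    if minute >= 10 and not has_approach:
        return "fallback: state baseline approach and start coding"
    return "on track"
```

Checking this after each mock phase makes the pivot decisions explicit instead of leaving them to feel.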
ASCII Pace Tracker
00----05----10----20----30----40----45
| U/C | Design | Coding | Test | Wrap |
U/C = Understand + Clarify
Time-Aware Execution Template (Python)
This utility models phase tracking during mock practice sessions.
from dataclasses import dataclass
from typing import List


@dataclass
class PhaseLog:
    phase: str
    planned_minutes: int
    actual_minutes: int


def pacing_report(logs: List[PhaseLog]) -> dict:
    total_planned = sum(x.planned_minutes for x in logs)
    total_actual = sum(x.actual_minutes for x in logs)
    overruns = []
    for x in logs:
        delta = x.actual_minutes - x.planned_minutes
        if delta > 0:
            overruns.append((x.phase, delta))
    return {
        "total_planned": total_planned,
        "total_actual": total_actual,
        "overrun_minutes": max(0, total_actual - total_planned),
        "phase_overruns": overruns,
    }
Line-by-Line Explanation
- PhaseLog captures planned vs actual time per phase.
- pacing_report highlights where time leaks repeatedly occur.
- Use these signals to adjust future budgets (for example, reduce design overthinking).
Time Complexity Perspective
- Pacing decisions should be complexity-aware: avoid spending long time polishing non-viable O(n^2) plans when constraints need O(n log n) or better.
Space Complexity Perspective
- When short on time, choose simpler implementation with slightly higher space if it is easier to code correctly and explain.
Edge Cases in Time Management
- Hard problem spike: switch to clear baseline + discuss optimization path.
- Unexpected bug late: patch minimal safe fix first, then mention the full refinement you would make if time remains.
- Interviewer interruptions: answer briefly, then restate where you paused.
Common Mistakes
Interview Insight
Practice Problems
- Run 5 timed mocks with fixed 45-minute budget and log phase splits.
- For each mock, identify one recurring time leak and one correction rule.
- Practice “minute-10 decision”: commit to a viable approach and move to coding.
Summary
- Time management is a first-class interview skill, not an afterthought.
- Use phase budgets and checkpoints to avoid late-stage incompletion.
- Prioritize complete, validated solutions over perfection paralysis.
- Consistent pacing dramatically improves interview outcomes.
22.6 Fast I/O
Introduction
Fast I/O (Input/Output) is the technique of reading and writing data efficiently when input size is huge. In many coding contests and some interview-style assessments, algorithm complexity is correct but solution still times out due to slow I/O methods.
In Python, understanding when and how to optimize I/O can be the difference between AC (Accepted) and TLE (Time Limit Exceeded).
Real-World Analogy
Suppose you can process packages quickly in a warehouse, but the loading gate is narrow and slow. Overall throughput is still poor. Similarly, even an O(n) algorithm can underperform if each line read/write is expensive.
Formal Definition
Fast I/O means minimizing overhead in data ingestion and output emission by using buffered operations and reduced per-call overhead.
Why This Topic Matters
- Critical for competitive programming and large-batch coding tests.
- Prevents TLE in otherwise correct solutions.
- Builds awareness of runtime bottlenecks beyond algorithm Big-O.
Mental Model
Total runtime = Algorithm time + I/O overhead
If input/output volume is huge:
I/O overhead can dominate
Optimize both compute path and data movement path.
Brute Force → Better → Optimal
Brute Force
Use repeated input() and print() in loops.
- Simple but high per-call overhead.
Better
Use sys.stdin.readline and buffer outputs in list, then join once.
Optimal (Python contest baseline)
Read raw bytes using sys.stdin.buffer.read(), parse tokens once, and write output using sys.stdout.write() in bulk.
Step-by-Step Fast I/O Strategy
- Read entire input once using buffered method.
- Split into tokens and parse with pointer/index.
- Avoid repeated string conversions where possible.
- Collect outputs in list and write once at end.
ASCII Diagram
Slow path:
input() -> parse -> print()
input() -> parse -> print()
... repeated many times
Fast path:
read all -> tokenize -> compute -> join outputs -> single write
Python Implementations
Approach A: Practical Fast Enough (Most Cases)
import sys

def solve():
    input = sys.stdin.readline  # faster than built-in input() for many lines
    n = int(input())  # count line; read to consume it, not needed for the sum
    arr = list(map(int, input().split()))
    ans = sum(arr)
    sys.stdout.write(str(ans) + "\n")

if __name__ == "__main__":
    solve()
Approach B: High-Volume Fast I/O Template
import sys

def solve():
    # Read the whole input once as bytes and split on any whitespace
    data = sys.stdin.buffer.read().split()
    it = iter(data)
    n = int(next(it))
    total = 0
    for _ in range(n):
        total += int(next(it))
    sys.stdout.write(str(total) + "\n")

if __name__ == "__main__":
    solve()
Line-by-Line Explanation
- sys.stdin.buffer.read() reads bytes in one buffered call.
- split() tokenizes by whitespace quickly.
- Iterator over tokens avoids manual index tracking complexity.
- sys.stdout.write avoids repeated print overhead.
Additional Example: Batch Output
Do not call print each time. Append to a list and do:
out = []
for x in answers:  # answers: results computed earlier
    out.append(str(x))
sys.stdout.write("\n".join(out))
Time Complexity Perspective
- Algorithm complexity remains same asymptotically.
- Fast I/O reduces constant factors significantly in input-heavy tasks.
Space Complexity Perspective
- Bulk read approach uses more memory (stores full input tokens).
- readline-based approach uses less memory but may be slightly slower.
- Choose based on input size and memory limits.
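The memory trade-off above can be sketched by summing integers two ways. This is an illustrative sketch only: `sum_streaming` and `sum_bulk` are hypothetical names, and they take a text stream as a parameter (here an `io.StringIO`) so the example runs without piped input; in a contest the stream would be `sys.stdin`.

```python
import io
from typing import TextIO


def sum_streaming(stream: TextIO) -> int:
    """Line-by-line: only one line is held in memory at a time."""
    total = 0
    for line in stream:  # file-like objects iterate lazily over lines
        for token in line.split():
            total += int(token)
    return total


def sum_bulk(stream: TextIO) -> int:
    """Bulk read: the full input and its token list live in memory at once."""
    return sum(map(int, stream.read().split()))


sample = "1 2 3\n4 5\n"
print(sum_streaming(io.StringIO(sample)))  # 15
print(sum_bulk(io.StringIO(sample)))       # 15
```

Both return the same result; the difference is peak memory and per-line overhead, which is why the choice depends on the stated limits.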
When to Use Which Approach
- Small/medium input: normal input() is often fine.
- Large input, moderate memory: readline + batched output.
- Very large input, tight time: buffered read().split() style.
Edge Cases
- Trailing spaces/newlines: use robust parsing (split() handles whitespace).
- Empty input: guard before reading expected tokens.
- Mixed token types: parse carefully and validate expected count.
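The edge cases above can be handled in one small parsing helper. The following is a minimal sketch, not a standard contest template: `parse_count_and_values` is a hypothetical name, and it accepts the raw bytes as an argument (rather than reading `sys.stdin.buffer` itself) only so it can be tried directly.

```python
from typing import List, Tuple


def parse_count_and_values(raw: bytes) -> Tuple[int, List[int]]:
    """Parse 'n' followed by n integers from raw bytes (hypothetical helper)."""
    tokens = raw.split()  # no-argument split ignores spaces and blank lines
    if not tokens:  # empty input: nothing to parse
        return 0, []
    n = int(tokens[0])
    if len(tokens) - 1 < n:  # validate the expected token count
        raise ValueError(f"expected {n} values, found {len(tokens) - 1}")
    values = [int(t) for t in tokens[1:1 + n]]
    return n, values


# In a real solution the argument would be sys.stdin.buffer.read()
print(parse_count_and_values(b"3\n10 20 30 \n\n"))  # (3, [10, 20, 30])
print(parse_count_and_values(b""))                  # (0, [])
```

Raising on a short token list is one design choice; a contest solution might instead trust the input format and skip the check.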
Common Mistakes
- Using print in large loops instead of buffered output.
Interview Insight
Practice Problems
- Implement same solution with input(), readline, and buffer.read; benchmark differences.
- Solve large query-sum problem with batched output.
- Build reusable fast token parser template for contests.
Keep both a readline version and a high-throughput buffer.read version ready. Pick based on constraints instead of habit.
Summary
- Fast I/O reduces runtime overhead in data-heavy problems.
- Use buffered reads and batched writes when input/output volume is large.
- Algorithm complexity still remains the primary performance driver.
- Choose I/O strategy by balancing speed and memory constraints.
22.7 Modulo Tricks
Introduction
Modulo arithmetic is one of the most common tools in DSA and competitive programming. It helps prevent integer overflow, supports cyclic behavior, and enables efficient counting/combinatorics under large constraints.
This topic is not just “use % 1000000007”. It is about understanding the rules deeply so you can apply them correctly in dynamic programming, number theory, hashing, and prefix techniques.
Real-World Analogy
Think of a clock with 12 hours. After 12 comes 1 again. Modulo arithmetic works like this wrap-around system. On a clock, (10 + 5) mod 12 = 3. In programming, the same logic helps in circular indexing and bounded arithmetic.
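The clock arithmetic and circular indexing can be tried directly. This is a small sketch; `clock_add` and `circular_get` are hypothetical helper names introduced only for illustration. Note that a real clock face reads 1 to 12 rather than 0 to 11, so the helper shifts by one before and after taking the remainder.

```python
def clock_add(hour: int, delta: int) -> int:
    """Add hours on a 12-hour clock whose face reads 1..12."""
    return (hour - 1 + delta) % 12 + 1


def circular_get(items: list, index: int):
    """Read from a list as if it wrapped around (hypothetical helper)."""
    return items[index % len(items)]


print((10 + 5) % 12)                       # 3, as in the clock example
print(clock_add(10, 5))                    # 3
print(clock_add(7, 5))                     # 12 (a plain % 12 would give 0)
print(circular_get(["a", "b", "c"], 4))    # b
print(circular_get(["a", "b", "c"], -1))   # c
```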
Formal Definition
For integers a and positive m:
a mod m is the remainder when a is divided by m.
Two integers a and b are congruent modulo m if they have the same remainder: a ≡ b (mod m).
Why This Topic Matters
- Avoids overflow in large multiplication/addition chains.
- Essential in counting problems where answers are huge.
- Enables advanced techniques: modular inverse, fast exponentiation, hashing, cyclic arrays.
Mental Model
Work inside a fixed remainder space [0, m-1]
Every operation "wraps" back into this range
As long as you apply modulo rules correctly, intermediate huge values can be safely controlled.
Core Identities
- (a + b) % m = ((a % m) + (b % m)) % m
- (a - b) % m = ((a % m) - (b % m) + m) % m
- (a * b) % m = ((a % m) * (b % m)) % m
- Division is not direct: (a / b) % m needs modular inverse of b (when it exists).
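A quick way to build trust in these identities is to check them on concrete values, and to see one case where naive division fails. The sketch below assumes Python semantics, where `%` with a positive modulus is never negative; `identities_hold` is a hypothetical helper name.

```python
def identities_hold(a: int, b: int, m: int) -> bool:
    """Check the addition, subtraction, and multiplication identities."""
    return (
        (a + b) % m == ((a % m) + (b % m)) % m
        and (a - b) % m == ((a % m) - (b % m) + m) % m
        and (a * b) % m == ((a % m) * (b % m)) % m
    )


print(identities_hold(10**18 + 9, -123456789, 1_000_000_007))  # True

# Division does not follow the same pattern:
a, b, m = 8, 2, 7
print((a // b) % m)              # 4
print(((a % m) // (b % m)) % m)  # 0, because 8 % 7 == 1 and 1 // 2 == 0
```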
Brute Force → Better → Optimal
Brute Force
Compute large numbers directly then take modulo at the end. Risk: overflow in fixed-width integer languages, and slow big-integer arithmetic in Python.
Better
Take modulo after each operation to keep values bounded.
Optimal
Combine rolling modulo with fast exponentiation, modular inverse, and precomputation where required.
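One place where rolling modulo, fast exponentiation, modular inverse, and precomputation all meet is computing binomial coefficients modulo a prime. The following is a sketch under stated assumptions: `MOD` is the prime 1e9+7, `limit` is smaller than `MOD`, and `build_factorials` / `ncr` are hypothetical names. It uses Python's built-in three-argument `pow` for the Fermat inverse.

```python
from typing import List, Tuple

MOD = 1_000_000_007  # assumed prime modulus


def build_factorials(limit: int) -> Tuple[List[int], List[int]]:
    """Precompute i! and (i!)^-1 modulo MOD for 0 <= i <= limit."""
    fact = [1] * (limit + 1)
    for i in range(1, limit + 1):
        fact[i] = fact[i - 1] * i % MOD  # rolling modulo keeps values bounded
    inv_fact = [1] * (limit + 1)
    inv_fact[limit] = pow(fact[limit], MOD - 2, MOD)  # one Fermat inverse
    for i in range(limit, 0, -1):
        inv_fact[i - 1] = inv_fact[i] * i % MOD  # 1/(i-1)! = i / i!
    return fact, inv_fact


def ncr(n: int, r: int, fact: List[int], inv_fact: List[int]) -> int:
    """n choose r modulo MOD using the precomputed tables."""
    if r < 0 or r > n:
        return 0
    return fact[n] * inv_fact[r] % MOD * inv_fact[n - r] % MOD


fact, inv_fact = build_factorials(20)
print(ncr(5, 2, fact, inv_fact))    # 10
print(ncr(10, 3, fact, inv_fact))   # 120
```

Only one exponentiation is needed for the whole table; the remaining inverse factorials follow by multiplication.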
Step-by-Step Tricks You Must Know
1) Safe Subtraction
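With a and b already reduced to [0, m-1], use (a - b + m) % m to avoid negative remainders in languages where % can return a negative value. Python's % already returns a non-negative result for positive m, but the pattern stays correct there too.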
2) Fast Power (Binary Exponentiation)
Compute a^b % m in O(log b), not O(b).
3) Modular Inverse
For prime m, inverse of x is x^(m-2) % m (Fermat's little theorem), when x % m != 0.
4) Prefix Mod Trick
For subarray sum divisibility problems, track prefix sum remainders.
ASCII Diagram
Modulo 5 number line wraps:
... -2 -1 0 1 2 3 4 5 6 7 ...
remainders:
... 3 4 0 1 2 3 4 0 1 2 ...
Same remainder => same class mod 5
Python Implementation Snippets
Fast Power (mod exponentiation)
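def mod_pow(a: int, b: int, mod: int) -> int:
    # Computes (a ** b) % mod for b >= 0 using O(log b) multiplications
    a %= mod
    result = 1 % mod  # gives 0 when mod == 1
    while b > 0:
        if b & 1:  # lowest bit set: fold the current power into the result
            result = (result * a) % mod
        a = (a * a) % mod  # square the base for the next bit
        b >>= 1
    return result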
Modular Inverse (prime mod)
def mod_inv(x: int, mod: int) -> int:
    # Works when mod is prime and x % mod != 0
    return mod_pow(x, mod - 2, mod)
Count subarrays with sum divisible by k
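from collections import defaultdict

def count_divisible_subarrays(nums, k):
    freq = defaultdict(int)
    freq[0] = 1  # the empty prefix has remainder 0
    prefix = 0
    ans = 0
    for x in nums:
        prefix = (prefix + x) % k  # stays in [0, k-1] in Python for k > 0
        ans += freq[prefix]  # each earlier equal remainder gives one subarray
        freq[prefix] += 1
    return ans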
Line-by-Line Explanation
- mod_pow halves exponent each iteration using binary representation.
- mod_inv uses Fermat theorem shortcut for prime modulus.
- In prefix-divisibility, equal remainders imply divisible difference.
Worked Example
nums = [4, 5, 0, -2, -3, 1], k = 5
Prefix remainders repeat multiple times. Every repeated remainder pair contributes one valid subarray. Final answer is 7.
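The worked example can be traced step by step. This sketch re-implements the same prefix-remainder counting inline (with a plain dict instead of defaultdict) so the intermediate remainders are visible; the variable names are illustrative only.

```python
nums = [4, 5, 0, -2, -3, 1]
k = 5

seen = {0: 1}  # remainder -> number of prefixes with it (empty prefix included)
prefix = 0
total = 0
remainders = []
for x in nums:
    prefix = (prefix + x) % k
    remainders.append(prefix)
    total += seen.get(prefix, 0)  # pair this prefix with every earlier match
    seen[prefix] = seen.get(prefix, 0) + 1

print(remainders)  # [4, 4, 4, 2, 4, 0]
print(seen)        # {0: 2, 4: 4, 2: 1}
print(total)       # 7
```

Remainder 4 appears four times (6 pairs) and remainder 0 appears twice counting the empty prefix (1 pair), which gives 6 + 1 = 7.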
Time Complexity
- Fast exponentiation: O(log b)
- Prefix remainder counting: O(n)
- Mod inverse via fast power: O(log mod)
Space Complexity
- Prefix remainder map: O(min(n, k))
- Fast power/inverse: O(1) auxiliary
Edge Cases
- Negative numbers: normalize remainder as needed (language-dependent behavior).
- Non-prime modulus: modular inverse may not exist for all values.
- Division under modulo: only valid when inverse exists.
- Very large multiplication: apply modulo at each step.
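For the non-prime modulus case, the Fermat shortcut does not apply, but an inverse still exists whenever gcd(x, m) = 1. A minimal sketch, assuming Python 3.8 or newer (where the built-in `pow` accepts an exponent of -1 with a modulus); `mod_inv_general` is a hypothetical name.

```python
from math import gcd
from typing import Optional


def mod_inv_general(x: int, m: int) -> Optional[int]:
    """Inverse of x modulo m, or None when no inverse exists."""
    if gcd(x, m) != 1:
        return None  # x and m share a factor, so x has no inverse mod m
    return pow(x, -1, m)  # three-argument pow with exponent -1 (Python 3.8+)


print(mod_inv_general(3, 10))  # 7, since 3 * 7 = 21 and 21 % 10 == 1
print(mod_inv_general(4, 10))  # None, because gcd(4, 10) == 2
```

Returning None is one way to signal "division not valid here"; calling `pow(x, -1, m)` directly raises ValueError in that situation instead.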
Common Mistakes
- Using (a - b) % m and assuming non-negative result in all languages.
Pattern Recognition
Modulo tricks usually appear when:
- Problem asks for answer “mod 1e9+7” or large prime.
- Subarray divisibility or remainder-frequency patterns exist.
- Huge exponentiation/combinatorics required.
- Circular indexing behavior is involved.
Interview Insight
Practice Problems
- Implement nCr % mod with factorial + inverse factorial.
- Count subarrays with sum divisible by k.
- Compute huge power tower variants using modular exponentiation.
- Solve circular array indexing problems with modulo normalization.
Summary
- Modulo arithmetic is essential for large-number and cyclic problems.
- Use core identities carefully, especially subtraction and division.
- Fast exponentiation and modular inverse are must-know tools.
- Correct modulo usage prevents many hidden runtime and correctness failures.
22.8 Contest Strategy
Introduction
Contest strategy is the system you use to maximize score under fixed time, not just your raw algorithm knowledge. Many strong coders underperform because they solve in the wrong order, spend too long debugging one problem, or ignore scoring mechanics.
This topic teaches how to convert knowledge into consistent contest outcomes.
Real-World Analogy
In a chess tournament, winning is not only about seeing deep tactics. You also manage clock time, choose practical lines, avoid unnecessary risk, and adapt to opponent pressure. Coding contests require the same strategic control.
Formal Definition
Contest strategy is a decision framework for problem selection, time allocation, risk management, debugging priority, and submission timing to optimize final rank/score.
Why This Topic Matters
- Improves rank without increasing theoretical knowledge immediately.
- Reduces panic and poor decisions under time pressure.
- Builds repeatable habits for long contests and interview assessments.
Mental Model
Scan -> Prioritize -> Execute -> Validate -> Submit -> Replan
You should continuously re-evaluate your strategy during the contest, not just at the start.
Brute Force → Better → Optimal (Contest Approach)
Brute Force
Start from problem A and continue sequentially regardless of fit. High risk of getting stuck early.
Better
Quickly scan all problems and solve easiest first.
Optimal
Use dynamic prioritization: solve highest expected-value problems first (confidence × points / time), with strict time caps and fallback decisions.
Pre-Contest Preparation Checklist
- Set up tested templates (fast I/O, graph boilerplate, modulo utilities).
- Warm up with 1-2 short problems to activate speed and focus.
- Review common bug checklist (indices, overflow, modulo negatives, recursion limits).
- Prepare mental submission routine: sample test -> custom edge test -> submit.
In-Contest Step-by-Step Strategy
- First 5 minutes: scan all problems and tag as Easy / Medium / Hard for you (not globally).
- Solve momentum problems first: secure fast accepted submissions.
- Set time cap per attempt: e.g., 20-25 minutes max before reassessment.
- When stuck: leave concise notes and switch; return later with fresh state.
- Before every submit: run quick edge checklist.
- Final phase: prioritize bug-fixing near-complete solutions over speculative new starts.
ASCII Contest Timeline (120-minute Example)
0-----5-----30-----60-----90----110----120
|Scan| Easy | Mid | Mid/Hard | Fix | Final submit checks |
Problem Priority Scoring Heuristic
Use a practical score to decide next problem:
priority_score = confidence * points / estimated_minutes
Higher score means better expected return.
Python Utility for Priority Ranking
from dataclasses import dataclass
from typing import List

@dataclass
class ProblemOption:
    name: str
    points: int
    confidence: float  # 0.0 to 1.0
    estimated_minutes: int

def rank_problem_options(options: List[ProblemOption]) -> List[ProblemOption]:
    def score(p: ProblemOption) -> float:
        if p.estimated_minutes <= 0:
            return -1.0  # invalid estimate sorts last
        return (p.confidence * p.points) / p.estimated_minutes
    return sorted(options, key=score, reverse=True)
Line-by-Line Explanation
- Each problem has subjective confidence, potential points, and expected solve time.
- Ranking function estimates expected-value efficiency.
- This is not exact math, but it prevents random decision-making under stress.
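To see the heuristic on concrete numbers, here is a self-contained usage sketch. The three problems, their points, and the estimates are made up for illustration, and `priority_score` is a hypothetical standalone version of the scoring rule described above.

```python
from dataclasses import dataclass


@dataclass
class ProblemOption:
    name: str
    points: int
    confidence: float  # 0.0 to 1.0
    estimated_minutes: int


def priority_score(p: ProblemOption) -> float:
    """confidence * points / estimated_minutes, with invalid estimates last."""
    if p.estimated_minutes <= 0:
        return -1.0
    return p.confidence * p.points / p.estimated_minutes


# Hypothetical contest: all values below are invented
options = [
    ProblemOption("C", points=1500, confidence=0.3, estimated_minutes=60),
    ProblemOption("A", points=500, confidence=0.9, estimated_minutes=15),
    ProblemOption("B", points=1000, confidence=0.6, estimated_minutes=40),
]

ranked = sorted(options, key=priority_score, reverse=True)
for p in ranked:
    print(p.name, round(priority_score(p), 2))
# A 30.0
# B 15.0
# C 7.5
```

The highest-point problem ranks last here because low confidence and a long estimate outweigh its points.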
Time Management Inside Contest
- Use hard stop rules (for example, 20 minutes without progress -> switch).
- Track time spent vs points gained every 30 minutes.
- Reserve final 10 minutes for validation and careful submission.
Debugging Strategy Under Pressure
- Reproduce smallest failing case.
- Check assumptions/invariants before rewriting logic.
- Audit boundaries: loops, indices, empty/single inputs.
- Only then patch specific lines.
Common Mistakes
Edge Cases in Contest Decisions
- Low confidence but high points: timebox and reevaluate quickly.
- Many partial solutions: convert nearest-complete one to AC first.
- Late-stage fatigue: simplify approach and avoid risky refactors.
Pattern Recognition in Contest Setting
Faster recognition means faster solves. During scan phase, map problem cues to likely patterns immediately and shortlist candidate templates before coding.
Interview Insight
Practice Problems
- Run 3 mock contests and log time allocation decisions every 20 minutes.
- After each contest, classify misses: knowledge gap vs strategy error.
- Practice “forced switch” drill: leave a stuck problem at minute 20 and solve another.
Summary
- Contest success is a strategy problem as much as an algorithm problem.
- Use scan-first prioritization, time caps, and expected-value decisions.
- Secure points early, debug systematically, and submit with discipline.
- Strong contest strategy builds transferable interview performance.
22.9 Mock Interviews
Introduction
Mock interviews are the closest training environment to real coding interviews. They combine problem-solving, communication, time management, and stress handling in one session. Solving problems alone is necessary, but mock interviews are where performance becomes interview-ready.
If your goal is actual selection, mock interviews are non-negotiable.
Real-World Analogy
A pilot does not train only by reading manuals; they train in flight simulators under realistic scenarios. Mock interviews are your simulator: same pressure, same constraints, same evaluation style.
Formal Definition
A mock interview is a structured practice session that replicates real interview conditions (time-boxed problem, live communication, no hidden IDE support assumptions, and post-round evaluation).
Why This Topic Matters
- Reveals gaps invisible in solo practice (panic, unclear explanation, pacing breakdown).
- Builds confidence through repeated exposure to interview-like pressure.
- Transforms knowledge into reliable interview execution.
Mental Model
Simulate -> Measure -> Diagnose -> Correct -> Re-simulate
Every mock should end with specific corrective actions, not generic feedback.
Brute Force → Better → Optimal (Mock Practice)
Brute Force
Random mock sessions with no rubric, no logs, no follow-up.
Better
Regular mocks with basic qualitative feedback.
Optimal
Rubric-based mock cycle: score dimensions, identify root causes, assign targeted drills, and re-test same weakness type.
Mock Interview Session Blueprint (60 Minutes)
| Phase | Duration | Goal |
|---|---|---|
| Problem understanding + clarification | 5 min | Correct interpretation |
| Approach design | 10 min | Brute-force to optimal reasoning |
| Coding | 25 min | Working implementation |
| Dry run + edge cases | 10 min | Bug detection |
| Feedback + action items | 10 min | Improvement loop |
Scoring Rubric Dimensions
- Problem understanding (clarity, assumptions, scope).
- Algorithm quality (correctness + complexity fit).
- Code quality (structure, bugs, edge handling).
- Communication (clear reasoning, collaboration).
- Debugging and recovery (methodical correction under pressure).
- Time management (phase pacing, completion).
ASCII Feedback Loop
Mock #N result
|
v
Root-cause labels
(pattern miss / bug / pacing / communication)
|
v
Targeted drills (3-5)
|
v
Mock #N+1 validation
Python Mock Tracker Utility
from dataclasses import dataclass
from typing import List, Dict

@dataclass
class MockResult:
    date: str
    understanding: int
    algorithm: int
    code: int
    communication: int
    debugging: int
    time_management: int
    primary_issue: str

def summarize_mocks(results: List[MockResult]) -> Dict:
    if not results:
        return {"count": 0}
    n = len(results)
    # Average score per rubric dimension across all mocks
    avg = {
        "understanding": sum(r.understanding for r in results) / n,
        "algorithm": sum(r.algorithm for r in results) / n,
        "code": sum(r.code for r in results) / n,
        "communication": sum(r.communication for r in results) / n,
        "debugging": sum(r.debugging for r in results) / n,
        "time_management": sum(r.time_management for r in results) / n,
    }
    # Count how often each primary issue label occurs
    issue_freq = {}
    for r in results:
        issue_freq[r.primary_issue] = issue_freq.get(r.primary_issue, 0) + 1
    return {"count": n, "averages": avg, "issue_frequency": issue_freq}
Line-by-Line Explanation
- Each mock is scored on six interview-critical dimensions.
- Summary computes trends, not just one-off outcomes.
- Issue frequency reveals recurring bottlenecks for targeted practice.
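One way to turn the averages into an action item is to pick the lowest-scoring dimensions as the next drill targets. This is a small sketch: `weakest_dimensions` is a hypothetical helper, and the sample scores are invented, shaped like the "averages" dictionary described above.

```python
from typing import Dict, List


def weakest_dimensions(averages: Dict[str, float], top: int = 2) -> List[str]:
    """Return the `top` lowest-scoring rubric dimensions."""
    return sorted(averages, key=averages.get)[:top]


# Made-up averages on a 1-5 scale
sample_averages = {
    "understanding": 4.0,
    "algorithm": 3.5,
    "code": 3.0,
    "communication": 2.5,
    "debugging": 3.0,
    "time_management": 2.0,
}
print(weakest_dimensions(sample_averages))  # ['time_management', 'communication']
```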
Time Complexity Perspective
- Mock interviews train complexity judgment under pressure, not just in calm offline solving.
- A frequent mock failure pattern: selecting an O(n^2) plan despite large constraints due to stress.
Space Complexity Perspective
- Mock feedback should include space trade-off reasoning quality, not only runtime choices.
Edge Cases in Mock Preparation
- Over-practice with familiar partners: feedback may become predictable and less useful.
- Only easy mocks: confidence rises but selection readiness does not.
- No speaking practice: strong coder, weak interview signal.
Common Mistakes
Mock Interview Formats You Should Rotate
- Peer mock: accessible and frequent.
- Recorded self-mock: excellent for communication self-audit.
- Professional mock: high-quality calibrated feedback near final prep stage.
Interview Insight
Practice Problems
- Run 6 mocks over 3 weeks with rubric scoring and trend tracking.
- Re-attempt one failed mock problem after one week to measure recovery.
- Do one communication-only mock where coding is secondary and explanation quality is primary.
Summary
- Mock interviews are the highest-fidelity preparation for real interviews.
- Use rubric-based measurement across algorithm, coding, communication, and pacing.
- Convert every mock outcome into targeted corrective drills.
- Consistent mock feedback loops create reliable interview performance.
22.10 Revision Strategy
Introduction
Revision strategy is the system that turns solved problems into long-term interview-ready skill. Without revision, problem-solving quality decays quickly, pattern recall slows down, and previously solved questions feel new again.
This topic teaches how to revise efficiently so your preparation compounds over time.
Real-World Analogy
Learning DSA is like building muscle memory for an instrument. Practicing a piece once is not enough; spaced, structured repetition is what makes performance reliable on stage. Interviews are your stage.
Formal Definition
Revision strategy is a planned schedule for revisiting concepts, patterns, and past problems using spaced repetition, error-driven review, and timed re-implementation to maximize retention and execution speed.
Why This Topic Matters
- Prevents forgetting and keeps core patterns interview-ready.
- Improves speed and confidence under time pressure.
- Converts one-time practice into durable problem-solving intuition.
Mental Model
Solve -> Capture lessons -> Revisit at intervals -> Re-solve faster -> Internalize
Each revision cycle should reduce hint dependency and implementation time.
Brute Force → Better → Optimal (Revision)
Brute Force
Randomly revisit old problems when you remember them. Inconsistent and incomplete.
Better
Maintain topic-wise lists and occasionally re-solve.
Optimal
Use spaced revision + error buckets + timed mixed sets + mock integration.
Spaced Revision Schedule (Practical)
After solving a problem on Day 0, revisit on:
- Day 1 (quick recall check)
- Day 3 (re-implement key idea)
- Day 7 (timed re-solve)
- Day 14 (mixed set recall)
- Day 30 (retention verification)
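The schedule above translates directly into calendar dates. A minimal sketch using only the standard library; `revision_dates` and `REVISION_OFFSETS` are hypothetical names, and the solve date is an arbitrary example.

```python
from datetime import date, timedelta
from typing import List

REVISION_OFFSETS = (1, 3, 7, 14, 30)  # days after the Day 0 solve


def revision_dates(solved_on: date) -> List[date]:
    """Return the dates on which a problem should be revisited."""
    return [solved_on + timedelta(days=d) for d in REVISION_OFFSETS]


for d in revision_dates(date(2024, 1, 1)):
    print(d.isoformat())
# 2024-01-02
# 2024-01-04
# 2024-01-08
# 2024-01-15
# 2024-01-31
```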
Revision Buckets Framework
Bucket A: Pattern Misses
You solved only after hints because initial pattern identification failed.
Bucket B: Implementation Bugs
Pattern was right, but coding errors caused WA/TLE.
Bucket C: Edge Case Misses
Core logic worked, but boundary inputs failed.
Bucket D: Complexity Misjudgment
Chosen approach did not meet constraints.
Step-by-Step Weekly Revision Plan
- Pick 15-20 previously solved problems from different topics.
- Tag each problem into revision buckets (A/B/C/D).
- Re-solve 5 problems timed (no notes).
- Revisit 5 problems as explanation-only drill (verbal algorithm articulation).
- Run one mixed mock of 2-3 questions from weak buckets.
- Update notes with one-line takeaway per problem.
ASCII Revision Cycle
Week Start:
[Select old problems]
|
v
[Tag failure type]
|
v
[Timed re-solve]
|
v
[Mock integration]
|
v
[Update notebook + next schedule]
Python Revision Tracker Utility
from dataclasses import dataclass
from typing import List

@dataclass
class RevisionEntry:
    problem: str
    bucket: str  # A/B/C/D
    solve_minutes: int
    used_hint: bool
    passed_all_tests: bool

def revision_summary(entries: List[RevisionEntry]) -> dict:
    if not entries:
        return {"count": 0}
    n = len(entries)
    bucket_count = {}
    for e in entries:
        bucket_count[e.bucket] = bucket_count.get(e.bucket, 0) + 1
    return {
        "count": n,
        "avg_time": sum(e.solve_minutes for e in entries) / n,
        "hint_rate": sum(e.used_hint for e in entries) / n,  # True counts as 1
        "pass_rate": sum(e.passed_all_tests for e in entries) / n,
        "bucket_distribution": bucket_count,
    }
Line-by-Line Explanation
- Each entry tracks outcome quality, not just “solved/not solved”.
- Bucket distribution reveals dominant weakness patterns.
- Average time and hint rate show practical readiness trend.
Time Complexity Perspective
- Revision improves your ability to choose optimal complexity faster.
- Over time, you should need fewer brute-force starts before reaching correct asymptotic approach.
Space Complexity Perspective
- Revision notes should stay concise: key invariant, common bug, final complexity, one edge case.
- Large notes with low signal are hard to review and reduce efficiency.
Edge Cases in Revision Planning
- Only revising easy problems: confidence rises but interview performance stagnates.
- Only revising hard problems: fundamentals become shaky.
- No timed component: recall may exist but execution speed remains low.
Common Mistakes
Interview Insight
Practice Problems
- Create a personal revision sheet for top 10 high-frequency interview patterns.
- Run 14-day spaced revision cycle and compare time/hint metrics before vs after.
- Re-solve one old hard problem weekly under strict 35-minute timer.
Summary
- Revision strategy transforms short-term solving into long-term mastery.
- Use spaced repetition, failure buckets, and timed re-solving.
- Track progress metrics (time, hints, pass rate), not just solved count.
- Consistent revision is one of the strongest predictors of interview reliability.
23.1 SOLID Principles in Python
Introduction
SOLID is a set of five object-oriented design principles that help you write software that is easier to maintain, extend, test, and reason about. In Python, these principles are especially useful because dynamic typing and fast iteration can otherwise lead to tightly coupled, fragile code if design discipline is missing.
This topic moves from pure algorithm solving into engineering-level software design — exactly the layer needed for strong production coding and system interviews.
Real-World Analogy
Imagine building a modular home. If plumbing, wiring, and walls are all tangled together, small changes become dangerous and expensive. SOLID is like architectural standards that keep parts separated with clean interfaces, so upgrades are safe and predictable.
Formal Definition
SOLID expands to:
- S – Single Responsibility Principle (SRP)
- O – Open/Closed Principle (OCP)
- L – Liskov Substitution Principle (LSP)
- I – Interface Segregation Principle (ISP)
- D – Dependency Inversion Principle (DIP)
Why This Topic Matters
- Reduces bug risk when features evolve.
- Improves unit testing through decoupled components.
- Common in senior interviews and code review expectations.
- Builds foundation for design patterns and maintainable architecture.
Mental Model
High cohesion inside classes
Low coupling between classes
Change one feature -> minimal ripple effects
SOLID helps control two forces: where responsibility lives and how components depend on each other.
Brute Force → Better → Optimal (Design Evolution)
Brute Force
Monolithic classes with many unrelated responsibilities and hard-coded dependencies.
Better
Some helper classes introduced, but boundaries and abstractions are inconsistent.
Optimal
SOLID-aligned architecture: focused responsibilities, extension points, substitutable contracts, small interfaces, and dependency abstraction.
S — Single Responsibility Principle (SRP)
Idea
A class should have one reason to change.
Bad Example (Multiple Responsibilities)
class ReportManager:
    def create_report(self, data):
        return f"Report: {data}"

    def save_to_file(self, report, path):
        with open(path, "w") as f:
            f.write(report)

    def send_email(self, report, email):
        print(f"Sending to {email}: {report}")
This class handles creation, persistence, and notification — too many responsibilities.
Better Split
class ReportBuilder:
    def create(self, data):
        return f"Report: {data}"

class ReportRepository:
    def save(self, report, path):
        with open(path, "w") as f:
            f.write(report)

class ReportNotifier:
    def send_email(self, report, email):
        print(f"Sending to {email}: {report}")
O — Open/Closed Principle (OCP)
Idea
Software entities should be open for extension, closed for modification.
Example
Instead of editing one giant if/elif block for new discount types, introduce strategy classes implementing a common interface.
from abc import ABC, abstractmethod

class DiscountStrategy(ABC):
    @abstractmethod
    def apply(self, amount: float) -> float:
        pass

class NoDiscount(DiscountStrategy):
    def apply(self, amount: float) -> float:
        return amount

class SeasonalDiscount(DiscountStrategy):
    def apply(self, amount: float) -> float:
        return amount * 0.9
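To show the "closed for modification" half of the principle, the sketch below adds a new discount rule without touching the calling code. `FlatDiscount` and `checkout_total` are hypothetical names introduced for illustration; the strategy interface is repeated so the block runs on its own.

```python
from abc import ABC, abstractmethod


class DiscountStrategy(ABC):
    @abstractmethod
    def apply(self, amount: float) -> float:
        ...


class SeasonalDiscount(DiscountStrategy):
    def apply(self, amount: float) -> float:
        return amount * 0.9


class FlatDiscount(DiscountStrategy):
    """A rule added later (hypothetical); no existing class is edited."""

    def __init__(self, off: float) -> None:
        self.off = off

    def apply(self, amount: float) -> float:
        return max(amount - self.off, 0.0)


def checkout_total(amount: float, strategy: DiscountStrategy) -> float:
    # Depends only on the abstract interface, so new rules need no change here
    return strategy.apply(amount)


print(checkout_total(100.0, SeasonalDiscount()))  # 90.0
print(checkout_total(100.0, FlatDiscount(15.0)))  # 85.0
```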
L — Liskov Substitution Principle (LSP)
Idea
Subtypes should be usable wherever base types are expected without breaking behavior contracts.
Classic Pitfall
If a subclass changes expected behavior (for example, raising an unsupported-operation error where the parent guarantees support), LSP is violated.
Example Note
Avoid inheritance that forces invalid operations. Prefer composition or better abstractions when contracts differ.
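A common textbook illustration of this pitfall, sketched here with hypothetical class names (`Bird`, `Penguin`, `Animal`, `Sparrow`, `SafePenguin`), is a subtype that cannot honor a method its parent promises. The second half shows one possible fix: a base contract that every subtype can actually satisfy.

```python
from abc import ABC, abstractmethod
from typing import List


# Violation: the base class promises fly(), but one subtype cannot honor it
class Bird:
    def fly(self) -> str:
        return "flying"


class Penguin(Bird):
    def fly(self) -> str:
        raise NotImplementedError("penguins cannot fly")


# One possible fix: only promise what every subtype can do
class Animal(ABC):
    @abstractmethod
    def move(self) -> str:
        ...


class Sparrow(Animal):
    def move(self) -> str:
        return "flying"


class SafePenguin(Animal):
    def move(self) -> str:
        return "swimming"


def migrate(animals: List[Animal]) -> List[str]:
    # Works for any Animal because no subtype breaks the move() contract
    return [a.move() for a in animals]


print(migrate([Sparrow(), SafePenguin()]))  # ['flying', 'swimming']
```

Code written against `Bird` would crash when handed a `Penguin`, while code written against `Animal` works with every subtype.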
I — Interface Segregation Principle (ISP)
Idea
Clients should not depend on methods they do not use.
Example
Instead of one large interface with print/scan/fax, define focused interfaces so each class implements only relevant capabilities.
from abc import ABC, abstractmethod

class Printable(ABC):
    @abstractmethod
    def print_doc(self, doc: str) -> None:
        pass

class Scannable(ABC):
    @abstractmethod
    def scan_doc(self) -> str:
        pass
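The sketch below shows how concrete devices might opt into only the interfaces they support. `BasicPrinter`, `OfficeMachine`, and `print_all` are hypothetical names, and the interfaces are repeated so the block is self-contained.

```python
from abc import ABC, abstractmethod
from typing import List


class Printable(ABC):
    @abstractmethod
    def print_doc(self, doc: str) -> None:
        ...


class Scannable(ABC):
    @abstractmethod
    def scan_doc(self) -> str:
        ...


class BasicPrinter(Printable):
    """Prints only, so it implements only Printable."""

    def __init__(self) -> None:
        self.printed: List[str] = []

    def print_doc(self, doc: str) -> None:
        self.printed.append(doc)


class OfficeMachine(Printable, Scannable):
    """Prints and scans, so it implements both interfaces."""

    def __init__(self) -> None:
        self.printed: List[str] = []

    def print_doc(self, doc: str) -> None:
        self.printed.append(doc)

    def scan_doc(self) -> str:
        return "scanned page"


def print_all(device: Printable, docs: List[str]) -> None:
    # Client code asks only for the capability it actually uses
    for doc in docs:
        device.print_doc(doc)


basic = BasicPrinter()
print_all(basic, ["a.txt", "b.txt"])
print(basic.printed)                 # ['a.txt', 'b.txt']
print(OfficeMachine().scan_doc())    # scanned page
print(isinstance(basic, Scannable))  # False
```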
D — Dependency Inversion Principle (DIP)
Idea
High-level modules should depend on abstractions, not concrete implementations.
Example
from abc import ABC, abstractmethod

class MessageSender(ABC):
    @abstractmethod
    def send(self, message: str) -> None:
        pass

class EmailSender(MessageSender):
    def send(self, message: str) -> None:
        print(f"Email: {message}")

class NotificationService:
    def __init__(self, sender: MessageSender):
        self.sender = sender  # injected abstraction, not a concrete class

    def notify(self, text: str) -> None:
        self.sender.send(text)
NotificationService can now work with email, SMS, push, or mock sender without changing core logic.
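The testing benefit can be sketched with a fake sender that records messages instead of delivering them. `RecordingSender` is a hypothetical test double; the abstraction and the service are repeated so the block runs on its own.

```python
from abc import ABC, abstractmethod
from typing import List


class MessageSender(ABC):
    @abstractmethod
    def send(self, message: str) -> None:
        ...


class NotificationService:
    def __init__(self, sender: MessageSender):
        self.sender = sender

    def notify(self, text: str) -> None:
        self.sender.send(text)


class RecordingSender(MessageSender):
    """Test double: stores messages in memory instead of sending them."""

    def __init__(self) -> None:
        self.sent: List[str] = []

    def send(self, message: str) -> None:
        self.sent.append(message)


fake = RecordingSender()
NotificationService(fake).notify("build passed")
print(fake.sent)  # ['build passed']
```

No monkey-patching is needed because the service never names a concrete sender.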
Step-by-Step Refactoring Checklist
- Identify classes with multiple reasons to change.
- Extract responsibilities into focused components.
- Replace switch-heavy logic with polymorphism/strategies.
- Check subtype behavior contracts for LSP violations.
- Split fat interfaces into role-specific interfaces.
- Inject dependencies through abstractions (constructor injection).
ASCII Design Shift
Before:
App -> ConcreteA -> ConcreteB -> ConcreteC (tight coupling)
After:
App -> InterfaceX <- ConcreteA
App -> InterfaceY <- ConcreteB
App -> InterfaceZ <- ConcreteC
Time Complexity Perspective
- SOLID usually does not change algorithmic Big-O directly.
- Its performance effect is architectural: safer optimizations and easier profiling-driven changes.
Space Complexity Perspective
- Abstractions may add slight object overhead.
- Trade-off is generally worth it for maintainability and testability.
Edge Cases in Applying SOLID
- Over-abstraction: too many tiny classes can harm readability.
- Premature generalization: do not design for hypothetical future complexity only.
- Inheritance misuse: composition often gives safer flexibility in Python.
Common Mistakes
Pattern Recognition
You likely need SOLID-oriented refactoring when:
- One class keeps changing for unrelated reasons.
- Adding a feature requires editing many existing files.
- Unit tests require heavy monkey-patching due to tight coupling.
- Large interfaces force classes to implement irrelevant methods.
Interview Insight
Practice Problems
- Refactor a monolithic class into SRP-compliant components.
- Convert if-else business rules into strategy pattern (OCP).
- Inject repository and notifier abstractions for DIP-compliant service.
- Review one existing project and identify one violation per SOLID principle.
Summary
- SOLID principles improve maintainability, extensibility, and testability.
- They are design heuristics for managing change safely.
- Apply pragmatically: enough abstraction for clarity, not abstraction for its own sake.
- Mastering SOLID is essential for strong engineering-level interview performance.
23.2 Singleton Pattern
Introduction
The Singleton Pattern ensures that a class has only one instance and provides a global access point to that instance. It is commonly used for shared resources such as configuration managers, logging controllers, cache coordinators, and connection pools.
In Python, singleton design is straightforward to implement, but must be used carefully because overuse can create hidden global state and testing difficulties.
Real-World Analogy
Think of a control tower at an airport. You do not want multiple independent towers giving contradictory instructions. A singleton acts like one authoritative control point shared by all consumers.
Formal Definition
Singleton is a creational design pattern that restricts object instantiation so that exactly one instance of a class exists during application lifetime (or within a defined scope).
Why This Topic Matters
- Common in low-level design interviews and real codebases.
- Useful for centralized coordination and shared expensive resources.
- Helps understand trade-offs between convenience and testability.
Mental Model
Client A ----\
Client B ----- > Singleton.get_instance() -> same object reference
Client C ----/
No matter how many times creation is requested, same instance should be returned.
Brute Force → Better → Optimal
Brute Force
Create class normally; multiple independent objects appear. Shared state coordination becomes inconsistent.
Better
Store a module-level global object. Simple, but less explicit and harder to control initialization semantics.
Optimal (interview-quality pattern)
Encapsulate instance control in class-level logic with optional thread safety.
Python Implementation Approaches
1) __new__-based Singleton
class Singleton:
    _instance = None

    def __new__(cls, *args, **kwargs):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

class ConfigManager(Singleton):
    def __init__(self):
        # Guard to avoid reinitializing on every call
        if hasattr(self, "_initialized") and self._initialized:
            return
        self.settings = {}
        self._initialized = True
2) Metaclass-based Singleton (Reusable)
class SingletonMeta(type):
    _instances = {}

    def __call__(cls, *args, **kwargs):
        if cls not in cls._instances:
            cls._instances[cls] = super().__call__(*args, **kwargs)
        return cls._instances[cls]

class Logger(metaclass=SingletonMeta):
    def __init__(self):
        self.logs = []

    def log(self, message: str) -> None:
        self.logs.append(message)
3) Thread-safe Metaclass Variant
from threading import Lock

class ThreadSafeSingletonMeta(type):
    _instances = {}
    _lock = Lock()

    def __call__(cls, *args, **kwargs):
        with cls._lock:  # only one thread may check and create at a time
            if cls not in cls._instances:
                cls._instances[cls] = super().__call__(*args, **kwargs)
        return cls._instances[cls]
Line-by-Line Explanation
- _instance (or _instances map) stores already created object(s).
- Creation path checks cache first; creates object only once.
- __init__ may run multiple times unless guarded in __new__ approach.
- Thread-safe version uses lock to prevent race conditions during first creation.
Step-by-Step Usage Example
- a = Logger()
- b = Logger()
- id(a) == id(b) is True (same object).
- a.log("x") is visible through b.logs too.
ASCII Diagram
Request #1 -> create instance -> store in _instances
Request #2 -> return stored instance
Request #3 -> return stored instance
Time Complexity Perspective
- Instance retrieval: O(1) average dictionary lookup.
- Thread-safe locking adds small synchronization overhead.
Space Complexity Perspective
- O(1) per singleton class instance (or O(k) for k singleton classes in metaclass registry).
Edge Cases
- Multithreading: race conditions can create multiple instances without lock.
- Serialization/deserialization: may accidentally create extra objects if not handled carefully.
- Testing: global shared state can leak between tests unless reset hooks exist.
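The testing edge case can be handled with a reset hook. The sketch below adds a reset_instance method to the metaclass. That name is an assumption made for illustration, not part of any standard API. A test fixture could call it during teardown so each test starts from a fresh instance.

```python
class SingletonMeta(type):
    _instances = {}

    def __call__(cls, *args, **kwargs):
        if cls not in cls._instances:
            cls._instances[cls] = super().__call__(*args, **kwargs)
        return cls._instances[cls]

    def reset_instance(cls) -> None:
        """Hypothetical test hook: forget the cached instance for this class."""
        cls._instances.pop(cls, None)


class ConfigManager(metaclass=SingletonMeta):
    def __init__(self):
        self.settings = {}


first = ConfigManager()
first.settings["debug"] = True

# For example, called from a test fixture's teardown
ConfigManager.reset_instance()

second = ConfigManager()
print(first is second)   # False
print(second.settings)   # {}
```

Because reset_instance is defined on the metaclass, it is callable on the class itself (ConfigManager.reset_instance()) but not on instances, which keeps it out of the normal object API.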
Common Mistakes
When to Use vs Avoid
Use Singleton When
- You truly need one shared coordinator/resource.
- Centralized lifecycle management is required.
- Configuration consistency must be guaranteed.
Avoid Singleton When
- It introduces hidden global mutable state.
- Dependency injection would provide clearer design.
- Test isolation becomes difficult.
Pattern Recognition
Singleton is suitable if requirements include:
- “Exactly one shared instance”
- “Global access to central manager”
- “Avoid duplicate expensive initialization”
Interview Insight
Practice Problems
- Implement singleton logger with thread safety.
- Build singleton config manager with lazy initialization.
- Refactor singleton usage into dependency injection and compare testability.
Summary
- Singleton ensures only one instance with global access.
- Useful for shared managers and expensive one-time resources.
- Implement carefully with initialization guards and thread safety when needed.
- Apply sparingly to avoid global-state design debt.
23.3 Factory Pattern
Introduction
The Factory Pattern is a creational design pattern that centralizes object creation logic. Instead of creating objects directly throughout your codebase, you delegate creation to a factory component that decides which concrete class to instantiate.
This pattern improves flexibility, readability, and maintainability when object creation rules are conditional or likely to evolve.
Real-World Analogy
Imagine ordering a drink at a cafe. You ask for “coffee,” but you do not manually prepare espresso, add milk, and assemble the final drink. The cafe system decides the exact preparation process and returns the right product. Factory pattern does the same for objects.
Formal Definition
Factory Pattern provides an interface for creating objects while allowing subclasses or factory methods to determine the concrete type returned.
Why This Topic Matters
- Reduces scattered conditional object creation code.
- Makes extension easier when adding new implementations.
- Frequently appears in LLD interviews and production systems.
Mental Model
Client -> Factory -> Concrete Product
Client knows abstraction,
Factory knows concrete class selection
Brute Force → Better → Optimal
Brute Force
Client code directly creates concrete classes with many if/elif branches across files.
Better
Move object creation into one helper function. Creation is now centralized, but the helper is still tightly coupled to every concrete class and hard to extend cleanly.
Optimal
Use a factory abstraction (or factory method) so new product types can be added with minimal client changes.
Step-by-Step: Refactor to Factory
- Identify repeated object creation condition blocks.
- Extract common product interface (abstract base class/protocol).
- Create concrete product classes implementing that interface.
- Add factory method/class that maps input type to concrete class.
- Replace direct constructor calls in client with factory call.
Python Implementation (Simple Factory)
from abc import ABC, abstractmethod


class Notification(ABC):
    @abstractmethod
    def send(self, message: str) -> None:
        pass


class EmailNotification(Notification):
    def send(self, message: str) -> None:
        print(f"[EMAIL] {message}")


class SMSNotification(Notification):
    def send(self, message: str) -> None:
        print(f"[SMS] {message}")


class PushNotification(Notification):
    def send(self, message: str) -> None:
        print(f"[PUSH] {message}")


class NotificationFactory:
    @staticmethod
    def create(channel: str) -> Notification:
        channel = channel.lower()
        if channel == "email":
            return EmailNotification()
        if channel == "sms":
            return SMSNotification()
        if channel == "push":
            return PushNotification()
        raise ValueError(f"Unsupported channel: {channel}")


# Client
notifier = NotificationFactory.create("email")
notifier.send("Welcome!")
Factory Method Style (Extensible)
from abc import ABC, abstractmethod


class Transport(ABC):
    @abstractmethod
    def deliver(self) -> str:
        pass


class Truck(Transport):
    def deliver(self) -> str:
        return "Deliver by road"


class Ship(Transport):
    def deliver(self) -> str:
        return "Deliver by sea"


class Logistics(ABC):
    @abstractmethod
    def create_transport(self) -> Transport:
        pass

    def plan_delivery(self) -> str:
        transport = self.create_transport()
        return transport.deliver()


class RoadLogistics(Logistics):
    def create_transport(self) -> Transport:
        return Truck()


class SeaLogistics(Logistics):
    def create_transport(self) -> Transport:
        return Ship()
Line-by-Line Explanation
- Product interface (Notification/Transport) defines the behavior contract.
- Concrete classes implement behavior variants.
- Factory encapsulates type-selection logic.
- Client depends on abstraction, not concrete constructor details.
ASCII Diagram
Client
|
v
Factory.create(type)
|
+--> ConcreteA()
+--> ConcreteB()
+--> ConcreteC()
Time Complexity Perspective
- Object creation selection is typically O(1) with direct mapping/branching.
- Main benefits are architectural, not asymptotic.
Space Complexity Perspective
- Negligible additional overhead for factory layer.
- May store registry maps if using dynamic registration approach.
Edge Cases
- Invalid type key: raise a clear error or apply a fallback strategy.
- Growing product variants: avoid giant if-else by registry/dictionary mapping.
- Shared configuration: ensure factory supports dependency injection.
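The first two edge cases can be addressed together with a registry. The following is a minimal sketch, not a definitive design: NotificationRegistry and its register decorator are hypothetical names, and send returns a string instead of printing so the result is easy to check.

```python
from abc import ABC, abstractmethod
from typing import Dict, Type


class Notification(ABC):
    @abstractmethod
    def send(self, message: str) -> str:
        """Return the formatted message (returned rather than printed for easy checking)."""


class NotificationRegistry:
    """Hypothetical registry-based factory: maps a channel key to a concrete class."""

    _registry: Dict[str, Type[Notification]] = {}

    @classmethod
    def register(cls, channel: str):
        # Class decorator that records the product class under the given key
        def decorator(product_cls: Type[Notification]) -> Type[Notification]:
            cls._registry[channel.lower()] = product_cls
            return product_cls
        return decorator

    @classmethod
    def create(cls, channel: str) -> Notification:
        try:
            product_cls = cls._registry[channel.lower()]
        except KeyError:
            supported = ", ".join(sorted(cls._registry))
            raise ValueError(
                f"Unsupported channel: {channel!r} (supported: {supported})"
            ) from None
        return product_cls()


@NotificationRegistry.register("email")
class EmailNotification(Notification):
    def send(self, message: str) -> str:
        return f"[EMAIL] {message}"


@NotificationRegistry.register("sms")
class SMSNotification(Notification):
    def send(self, message: str) -> str:
        return f"[SMS] {message}"


notifier = NotificationRegistry.create("EMAIL")
print(notifier.send("Welcome!"))  # [EMAIL] Welcome!
```

Adding a new channel now means writing one decorated class. The create method never changes, and an unknown key produces an error that lists the supported channels.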
Common Mistakes
Pattern Recognition
Factory pattern is a strong fit when:
- Object type depends on runtime input/configuration.
- Creation process is non-trivial and repeated in many places.
- You expect new variants to be added over time.
Interview Insight
Practice Problems
- Refactor payment gateway selection logic into a factory.
- Create parser factory for JSON/XML/CSV handlers.
- Build plugin-style factory with class registry map.
Summary
- Factory pattern centralizes object creation and hides concrete selection details.
- It improves extensibility and keeps client code focused on abstractions.
- Use it when creation logic is conditional, repeated, or likely to evolve.
- Apply pragmatically to avoid unnecessary abstraction overhead.
23.4 Adapter Pattern
Introduction
The Adapter Pattern allows two incompatible interfaces to work together without modifying their original code. It acts as a translator between a client’s expected interface and an existing class with a different interface.
In Python systems, adapter usage is common when integrating third-party SDKs, legacy modules, and external services.
Real-World Analogy
A laptop charger plug from one country may not fit a wall socket in another country. A travel adapter bridges this mismatch without changing either the laptop or the building wiring. Software adapters solve the same compatibility problem.
Formal Definition
Adapter Pattern is a structural design pattern that converts the interface of a class into another interface clients expect.
Why This Topic Matters
- Lets you integrate legacy or third-party code safely.
- Reduces ripple effects by isolating integration differences.
- Frequently used in production microservices and SDK wrappers.
Mental Model
Client expects: target_interface()
Adapter exposes: target_interface()
Adapter internally calls: adaptee.different_interface()
Brute Force → Better → Optimal
Brute Force
Modify client everywhere to support multiple incompatible APIs. Leads to scattered conditionals and brittle code.
Better
Add conversion logic near call sites. Still duplicated and hard to maintain.
Optimal
Introduce adapter layer so client stays stable and only adapter knows external API differences.
Core Participants
- Target: interface expected by client.
- Adaptee: existing class with incompatible interface.
- Adapter: bridge converting target calls into adaptee calls.
- Client: code that depends only on target interface.
Step-by-Step Refactor to Adapter
- Define target interface used by your application.
- Identify external/legacy method mismatch.
- Create adapter implementing target interface.
- Map and transform request/response fields inside adapter.
- Inject adapter into client; remove external API calls from business logic.
Python Implementation
Scenario
Client expects send(message), but third-party notifier provides push_text(payload).
from abc import ABC, abstractmethod


class Notifier(ABC):
    @abstractmethod
    def send(self, message: str) -> None:
        pass


# Third-party / legacy class (Adaptee)
class LegacyPushService:
    def push_text(self, payload: dict) -> None:
        print(f"LEGACY_PUSH: {payload['body']}")


# Adapter
class LegacyPushAdapter(Notifier):
    def __init__(self, legacy_service: LegacyPushService):
        self.legacy_service = legacy_service

    def send(self, message: str) -> None:
        payload = {"body": message}
        self.legacy_service.push_text(payload)


# Client depends only on target interface
class AlertService:
    def __init__(self, notifier: Notifier):
        self.notifier = notifier

    def alert(self, text: str) -> None:
        self.notifier.send(text)


legacy = LegacyPushService()
adapter = LegacyPushAdapter(legacy)
service = AlertService(adapter)
service.alert("High CPU usage")
Line-by-Line Explanation
- Notifier is the stable interface for the business layer.
- LegacyPushService has an incompatible method and payload shape.
- LegacyPushAdapter translates send(text) into the legacy API format.
- AlertService remains clean and unaffected by legacy details.
ASCII Diagram
Client (AlertService)
|
v
Target: Notifier.send(msg)
|
v
Adapter (LegacyPushAdapter)
|
v
Adaptee (LegacyPushService.push_text(payload))
Class Adapter vs Object Adapter
Object Adapter (Preferred in Python)
Adapter holds adaptee instance (composition). Flexible and most common in Python.
Class Adapter
Adapter inherits from both the adaptee and the target interface. Less common and less flexible in Python, because multiple inheritance adds complexity and the adapter is tied to one concrete adaptee class.
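For comparison with the object adapter, here is a minimal sketch of the class-adapter variant. LegacyPushClassAdapter is a hypothetical name, and this LegacyPushService records payloads in a list (an assumption made for illustration) instead of printing them.

```python
from abc import ABC, abstractmethod


class Notifier(ABC):
    @abstractmethod
    def send(self, message: str) -> None:
        pass


class LegacyPushService:
    def __init__(self):
        # Illustration only: keep sent bodies so the result can be inspected
        self.sent = []

    def push_text(self, payload: dict) -> None:
        self.sent.append(payload["body"])


class LegacyPushClassAdapter(LegacyPushService, Notifier):
    """Class adapter: is both a LegacyPushService and a Notifier."""

    def send(self, message: str) -> None:
        # Inherited adaptee method is called directly on self
        self.push_text({"body": message})


adapter = LegacyPushClassAdapter()
adapter.send("High CPU usage")
print(adapter.sent)                   # ['High CPU usage']
print(isinstance(adapter, Notifier))  # True
```

Note the trade-off: the adapter also exposes push_text to clients, and it cannot wrap a different adaptee instance at runtime. The object adapter avoids both issues.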
Time Complexity Perspective
- Usually O(1) forwarding + transformation overhead.
- Main benefits are architectural, not asymptotic.
Space Complexity Perspective
- O(1) additional adapter object overhead per integration instance.
Edge Cases
- Data format mismatch: adapter must validate and map fields robustly.
- Error model mismatch: convert external exceptions to domain-specific exceptions.
- Async vs sync mismatch: adapter may need async wrappers or background execution handling.
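The first two edge cases can be sketched as follows. All names here (LegacyTimeoutError, NotificationError, FlakyLegacyPushService, SafeLegacyPushAdapter) are hypothetical and exist only to show input validation and exception translation inside an adapter.

```python
class LegacyTimeoutError(Exception):
    """Hypothetical exception raised by the legacy SDK."""


class NotificationError(Exception):
    """Hypothetical domain-level exception the business layer understands."""


class FlakyLegacyPushService:
    """Stand-in adaptee that always fails, to exercise the error path."""

    def push_text(self, payload: dict) -> None:
        raise LegacyTimeoutError("gateway timed out")


class SafeLegacyPushAdapter:
    def __init__(self, legacy_service):
        self.legacy_service = legacy_service

    def send(self, message: str) -> None:
        # Validate before mapping into the legacy payload shape
        if not isinstance(message, str) or not message.strip():
            raise NotificationError("message must be a non-empty string")
        try:
            self.legacy_service.push_text({"body": message})
        except LegacyTimeoutError as exc:
            # Translate the external error into the domain's error model
            raise NotificationError(f"push delivery failed: {exc}") from exc


adapter = SafeLegacyPushAdapter(FlakyLegacyPushService())
caught = None
try:
    adapter.send("High CPU usage")
except NotificationError as err:
    caught = err
    print(f"handled: {err}")  # handled: push delivery failed: gateway timed out
```

Using raise ... from exc keeps the original exception available as __cause__, so debugging information is not lost when the error type changes.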
Common Mistakes
Pattern Recognition
Use Adapter when:
- You must reuse an existing class with incompatible interface.
- You cannot or should not modify third-party/legacy code.
- You want business layer to depend on stable internal contracts.
Interview Insight
Practice Problems
- Wrap two payment gateway SDKs under one internal payment interface.
- Adapt legacy XML parser output into modern JSON-domain objects.
- Build adapter for async third-party API into sync internal contract (or vice versa).
Summary
- Adapter pattern bridges incompatible interfaces safely.
- It decouples business logic from external API quirks.
- Most effective when integrating legacy or third-party components.
- Use composition-focused object adapters for flexibility in Python.
23.5 Decorator Pattern
Introduction
The Decorator Pattern lets you add behavior to objects dynamically without modifying their original class. Instead of creating many subclasses for every feature combination, decorators wrap objects and extend behavior layer by layer.
In Python, this pattern is especially powerful because both object-oriented wrappers and function decorators are common in production code.
Real-World Analogy
Think of ordering coffee: you start with a basic coffee, then add milk, then sugar, then whipped cream. Each add-on changes behavior (price/description) by wrapping the previous object, not by rewriting the coffee itself.
Formal Definition
Decorator Pattern is a structural pattern that attaches additional responsibilities to an object dynamically by placing it inside wrapper objects that implement the same interface.
Why This Topic Matters
- Prevents subclass explosion for feature combinations.
- Supports flexible runtime composition of behaviors.
- Common in logging, caching, validation, retry, metrics, and middleware pipelines.
Mental Model
Client -> DecoratorN -> DecoratorN-1 -> ... -> BaseComponent
Each layer adds behavior before/after delegating
Brute Force → Better → Optimal
Brute Force
Create subclasses for every combination (e.g., CoffeeWithMilkAndSugarAndWhip). This quickly becomes unmanageable.
Better
Use flags/conditionals inside one class. Simpler initially, but violates the Single Responsibility Principle (SRP) and becomes messy.
Optimal
Use decorators that wrap a common component interface and compose features dynamically.
Core Participants
- Component: common interface used by client.
- Concrete Component: base object with default behavior.
- Decorator Base: holds wrapped component and forwards calls.
- Concrete Decorators: add specific behavior before/after delegation.
Step-by-Step Design
- Define base interface for operation(s).
- Implement core concrete component.
- Create decorator base class implementing same interface.
- Create concrete decorators for each extra feature.
- Wrap components in required order at runtime.
Python OOP Decorator Pattern Example
from abc import ABC, abstractmethod


class Coffee(ABC):
    @abstractmethod
    def cost(self) -> float:
        pass

    @abstractmethod
    def description(self) -> str:
        pass


class BasicCoffee(Coffee):
    def cost(self) -> float:
        return 50.0

    def description(self) -> str:
        return "Basic Coffee"


class CoffeeDecorator(Coffee):
    def __init__(self, coffee: Coffee):
        self._coffee = coffee

    def cost(self) -> float:
        return self._coffee.cost()

    def description(self) -> str:
        return self._coffee.description()


class MilkDecorator(CoffeeDecorator):
    def cost(self) -> float:
        return super().cost() + 15.0

    def description(self) -> str:
        return super().description() + ", Milk"


class SugarDecorator(CoffeeDecorator):
    def cost(self) -> float:
        return super().cost() + 5.0

    def description(self) -> str:
        return super().description() + ", Sugar"


class WhipDecorator(CoffeeDecorator):
    def cost(self) -> float:
        return super().cost() + 20.0

    def description(self) -> str:
        return super().description() + ", Whip"


coffee = BasicCoffee()
coffee = MilkDecorator(coffee)
coffee = SugarDecorator(coffee)
coffee = WhipDecorator(coffee)
print(coffee.description())  # Basic Coffee, Milk, Sugar, Whip
print(coffee.cost())         # 90.0
Python Function Decorator Perspective
Python’s @decorator syntax is a language-level implementation of the same idea: wrapping behavior around a callable.
import time
from functools import wraps


def timing_decorator(fn):
    @wraps(fn)
    def wrapper(*args, **kwargs):
        # perf_counter is a monotonic clock intended for measuring elapsed time
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        elapsed = time.perf_counter() - start
        print(f"{fn.__name__} took {elapsed:.6f}s")
        return result
    return wrapper


@timing_decorator
def compute(n):
    return sum(i * i for i in range(n))
Line-by-Line Explanation
- Decorator object keeps reference to wrapped component.
- Each concrete decorator augments behavior and delegates to wrapped object.
- Order of wrapping affects final output/behavior.
- Function decorators use wrapper closures to add behavior around function execution.
ASCII Diagram
Client
|
v
WhipDecorator
|
v
SugarDecorator
|
v
MilkDecorator
|
v
BasicCoffee
Time Complexity Perspective
- Each decorator layer adds O(1) extra work per call.
- Total overhead is proportional to number of layers wrapped.
Space Complexity Perspective
- O(k) wrapper objects for k decorators.
Edge Cases
- Decorator order: can change outputs and side effects.
- Too many layers: may hurt readability/debugging.
- Mutable shared state: ensure wrappers do not produce unintended interactions.
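In the coffee example every decorator adds a fixed amount, so wrapping order does not change the total. The sketch below uses hypothetical pricing decorators (FlatFee and HalfOff, assumed for illustration) where one layer adds and another multiplies, so the order does change the result.

```python
from abc import ABC, abstractmethod


class Price(ABC):
    @abstractmethod
    def total(self) -> float:
        pass


class BasePrice(Price):
    def __init__(self, amount: float):
        self._amount = amount

    def total(self) -> float:
        return self._amount


class PriceDecorator(Price):
    def __init__(self, inner: Price):
        self._inner = inner

    def total(self) -> float:
        return self._inner.total()


class FlatFee(PriceDecorator):
    """Adds a fixed 10.0 surcharge to the wrapped total."""

    def total(self) -> float:
        return super().total() + 10.0


class HalfOff(PriceDecorator):
    """Halves whatever the wrapped component reports."""

    def total(self) -> float:
        return super().total() * 0.5


fee_then_discount = HalfOff(FlatFee(BasePrice(100.0)))  # (100 + 10) * 0.5
discount_then_fee = FlatFee(HalfOff(BasePrice(100.0)))  # (100 * 0.5) + 10

print(fee_then_discount.total())  # 55.0
print(discount_then_fee.total())  # 60.0
```

The outermost decorator runs last on the way out, so it is worth documenting the intended wrapping order wherever decorators do not commute.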
Common Mistakes
Decorator vs Adapter (Quick Contrast)
| Pattern | Primary Goal | Interface |
|---|---|---|
| Decorator | Add behavior dynamically | Same interface |
| Adapter | Convert compatibility | Different -> expected interface |
Pattern Recognition
Use Decorator when:
- You need combinable optional features.
- Inheritance tree is growing too large.
- Behavior should be attachable/removable at runtime.
Interview Insight
Practice Problems
- Build API client decorators for retry, logging, and caching.
- Implement text processing pipeline with layered decorators (trim, sanitize, encrypt).
- Refactor subclass-heavy feature combinations into decorators.
Summary
- Decorator adds behavior dynamically by wrapping components.
- It avoids subclass explosion and supports flexible composition.
- Widely used in Python through both OOP wrappers and function decorators.
- Best applied when optional behaviors are combinable and evolving.
23.6 Facade Pattern
Introduction
The Facade Pattern provides a simple, unified interface over a complex subsystem. Instead of forcing clients to understand and orchestrate many low-level components, a facade offers one clean entry point for common workflows.
In Python projects, facade is useful when service orchestration grows messy and call-site complexity starts leaking everywhere.
Real-World Analogy
When planning a trip, you can book flights, hotels, insurance, and transport separately through different systems — or use a travel desk that handles all steps through one request. Facade is that travel desk for software subsystems.
Formal Definition
Facade Pattern is a structural pattern that defines a higher-level interface making a subsystem easier to use.
Why This Topic Matters
- Reduces client-side orchestration complexity.
- Improves readability and discoverability of common workflows.
- Helps enforce cleaner architectural boundaries in large systems.
Mental Model
Client -> Facade -> Subsystem A
                 -> Subsystem B
                 -> Subsystem C
Client calls one method; facade coordinates the rest.
Brute Force → Better → Optimal
Brute Force
Each client calls many subsystem classes directly and repeats orchestration logic.
Better
Shared helper utilities reduce some duplication but still expose too many subsystem details.
Optimal
Introduce a facade with clear business-level operations that internally coordinate subsystem calls.
Core Participants
- Subsystems: existing components with detailed APIs.
- Facade: simple interface that composes subsystem operations.
- Client: depends on facade for common use cases.
Step-by-Step Refactor to Facade
- Identify repeated multi-step workflows across clients.
- Define high-level operations clients actually need.
- Create facade class exposing those operations.
- Move orchestration/ordering logic into facade.
- Keep advanced subsystem access optional for special cases.
Python Implementation Example
Scenario: Order Processing Pipeline
class InventoryService:
    def reserve(self, item_id: str, qty: int) -> bool:
        print(f"Inventory reserved: {item_id} x{qty}")
        return True


class PaymentService:
    def charge(self, user_id: str, amount: float) -> bool:
        print(f"Charged {user_id}: {amount}")
        return True


class ShippingService:
    def create_shipment(self, user_id: str, item_id: str, qty: int) -> str:
        tracking_id = "TRK123"
        print(f"Shipment created: {tracking_id}")
        return tracking_id


class NotificationService:
    def send(self, user_id: str, message: str) -> None:
        print(f"Notify {user_id}: {message}")


class OrderFacade:
    def __init__(self):
        self.inventory = InventoryService()
        self.payment = PaymentService()
        self.shipping = ShippingService()
        self.notification = NotificationService()

    def place_order(self, user_id: str, item_id: str, qty: int, amount: float) -> dict:
        if not self.inventory.reserve(item_id, qty):
            return {"ok": False, "error": "Inventory unavailable"}
        if not self.payment.charge(user_id, amount):
            return {"ok": False, "error": "Payment failed"}
        tracking = self.shipping.create_shipment(user_id, item_id, qty)
        self.notification.send(user_id, f"Order placed. Tracking: {tracking}")
        return {"ok": True, "tracking_id": tracking}
Line-by-Line Explanation
- Subsystems stay focused on their own responsibilities.
- OrderFacade.place_order defines one business-level method for clients.
- Facade controls operation order and failure handling.
- Client now calls one method instead of coordinating four services manually.
ASCII Diagram
Client
|
v
OrderFacade.place_order()
|--> Inventory.reserve()
|--> Payment.charge()
|--> Shipping.create_shipment()
\--> Notification.send()
Time Complexity Perspective
- Facade usually adds O(1) orchestration overhead around subsystem calls.
- Overall complexity depends on subsystem operations, not facade itself.
Space Complexity Perspective
- Minimal extra memory for facade object and references to subsystem services.
Edge Cases
- Partial failure: facade may need compensating actions (rollback/cancel).
- Long workflows: ensure error propagation remains clear and traceable.
- Overgrown facade: split into domain-specific facades if it becomes too large.
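The partial-failure case can be sketched as below. This is a simplified variant of the order example, under stated assumptions: release is a hypothetical compensating method, DecliningPaymentService is a stub that always declines, and the services are injected through the constructor so the failure path can be exercised.

```python
class InventoryService:
    def __init__(self):
        self.reserved = {}

    def reserve(self, item_id: str, qty: int) -> bool:
        self.reserved[item_id] = self.reserved.get(item_id, 0) + qty
        return True

    def release(self, item_id: str, qty: int) -> None:
        """Hypothetical compensating action: undo a reservation."""
        self.reserved[item_id] -= qty


class DecliningPaymentService:
    """Hypothetical payment stub that always declines."""

    def charge(self, user_id: str, amount: float) -> bool:
        return False


class OrderFacade:
    def __init__(self, inventory, payment):
        self.inventory = inventory
        self.payment = payment

    def place_order(self, user_id: str, item_id: str, qty: int, amount: float) -> dict:
        if not self.inventory.reserve(item_id, qty):
            return {"ok": False, "error": "Inventory unavailable"}
        if not self.payment.charge(user_id, amount):
            # Payment failed after inventory was reserved: compensate first
            self.inventory.release(item_id, qty)
            return {"ok": False, "error": "Payment failed"}
        return {"ok": True}


inventory = InventoryService()
facade = OrderFacade(inventory, DecliningPaymentService())
result = facade.place_order("u1", "sku-42", qty=2, amount=99.0)

print(result)              # {'ok': False, 'error': 'Payment failed'}
print(inventory.reserved)  # {'sku-42': 0}
```

Keeping the compensation inside the facade means clients never see a half-completed order, which is the main reason to centralize orchestration there.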
Common Mistakes
Facade vs Adapter vs Decorator
| Pattern | Primary Goal |
|---|---|
| Facade | Simplify usage of subsystem |
| Adapter | Convert interface compatibility |
| Decorator | Add behavior dynamically |
Pattern Recognition
Use Facade when:
- Clients repeatedly execute the same multi-step subsystem sequence.
- Subsystem API is too detailed for common use cases.
- You want a clean entry point for business-level operations.
Interview Insight
Practice Problems
- Build HomeTheaterFacade over audio, projector, and streaming subsystems.
- Create DeploymentFacade for build-test-deploy-notify pipeline.
- Refactor scattered payment + inventory + notification orchestration into facade.
Tip: name facade methods after business-level operations (for example, place_order()) instead of exposing low-level subsystem verbs. This preserves the abstraction's value.
Summary
- Facade pattern provides a simple interface to complex subsystems.
- It centralizes orchestration and reduces client-side complexity.
- Best used for common workflows where many subsystem calls are repeatedly combined.
- Use carefully to avoid oversized facade classes.
23.7 Observer Pattern
Introduction
The Observer Pattern defines a one-to-many dependency between objects so that when one object (the subject) changes state, all dependent objects (observers) are notified automatically.
This pattern is fundamental in event-driven systems, UI frameworks, pub-sub architectures, and real-time notification pipelines.
Real-World Analogy
Think of a YouTube channel subscription model. The channel uploads a new video (state change), and all subscribers (observers) receive notifications automatically. Subscribers can also unsubscribe whenever they want.
Formal Definition
Observer Pattern is a behavioral design pattern where a subject maintains a list of observers and notifies them of state changes, usually by calling an update method.
Why This Topic Matters
- Enables event-driven architecture and loose coupling.
- Supports dynamic subscriber management at runtime.
- Frequently appears in GUI systems, messaging systems, and monitoring tools.
Mental Model
Subject state changes
        |
        v
    notify_all()
   |     |     |
 Obs1  Obs2  Obs3   (each reacts independently)
Brute Force → Better → Optimal
Brute Force
Subject directly calls concrete services (email, SMS, logs) with hard-coded dependencies.
Better
Move handlers to helper functions, but subject still knows too much about receivers.
Optimal
Use observer abstraction. Subject only broadcasts events; observers decide their own reactions.
Core Participants
- Subject: maintains observers and triggers notifications.
- Observer Interface: defines update contract.
- Concrete Observers: implement reaction logic.
Step-by-Step Design
- Create observer interface with update(event).
- Subject stores observer list and supports attach/detach.
- On state change, subject calls notify().
- Each observer handles event independently.
Python Implementation
from abc import ABC, abstractmethod
from typing import List


class Observer(ABC):
    @abstractmethod
    def update(self, event: str) -> None:
        pass


class Subject:
    def __init__(self):
        self._observers: List[Observer] = []

    def attach(self, observer: Observer) -> None:
        if observer not in self._observers:
            self._observers.append(observer)

    def detach(self, observer: Observer) -> None:
        if observer in self._observers:
            self._observers.remove(observer)

    def notify(self, event: str) -> None:
        for observer in self._observers:
            observer.update(event)


class EmailNotifier(Observer):
    def update(self, event: str) -> None:
        print(f"[EMAIL] {event}")


class SMSNotifier(Observer):
    def update(self, event: str) -> None:
        print(f"[SMS] {event}")


class AnalyticsTracker(Observer):
    def update(self, event: str) -> None:
        print(f"[ANALYTICS] tracked event: {event}")


class OrderService(Subject):
    def place_order(self, order_id: str) -> None:
        event = f"Order placed: {order_id}"
        self.notify(event)


service = OrderService()
service.attach(EmailNotifier())
service.attach(SMSNotifier())
service.attach(AnalyticsTracker())
service.place_order("ORD-101")
Line-by-Line Explanation
- Subject manages observer lifecycle with attach/detach methods.
- notify broadcasts the event to all current observers.
- Concrete observers implement independent side effects.
- OrderService focuses on business action, not notification specifics.
ASCII Diagram
OrderService (Subject)
| state change
v
notify(event)
|------> EmailNotifier
|------> SMSNotifier
\------> AnalyticsTracker
Push vs Pull Notification Styles
Push Model
Subject sends event data directly in update call (as in current example).
Pull Model
Subject only signals change; observers query needed state from subject afterward.
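Since the main example uses the push model, here is a minimal sketch of the pull model for contrast. Thermostat and Display are hypothetical classes: the subject passes itself to update, and each observer reads only the state it needs.

```python
class Thermostat:
    """Subject in the pull model: signals a change and exposes state via a property."""

    def __init__(self):
        self._observers = []
        self._temperature = 20.0

    def attach(self, observer) -> None:
        self._observers.append(observer)

    @property
    def temperature(self) -> float:
        return self._temperature

    def set_temperature(self, value: float) -> None:
        self._temperature = value
        for observer in list(self._observers):
            observer.update(self)  # pass the subject, not the data


class Display:
    def __init__(self):
        self.last_seen = None

    def update(self, subject: Thermostat) -> None:
        # Pull only the state this observer cares about
        self.last_seen = subject.temperature


thermostat = Thermostat()
display = Display()
thermostat.attach(display)
thermostat.set_temperature(23.5)

print(display.last_seen)  # 23.5
```

Pull keeps the update signature stable as the subject's state grows, at the cost of observers depending on the subject's read API.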
Time Complexity Perspective
- notify is O(n) where n = number of observers.
- attach/detach is O(n) with a list (can be O(1) on average with a set, if observers are hashable).
Space Complexity Perspective
- O(n) for storing observer references.
Edge Cases
- Observer failure: one failing observer should not always block others (consider try/except per observer).
- Observer removes itself during notification: iterate over snapshot copy to avoid mutation issues.
- High-frequency events: notification storms may require batching or async queues.
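The first two edge cases can be handled inside notify. The sketch below is one possible approach, assuming it is acceptable to collect failures and return them. SafeSubject, BrokenObserver, OneShotObserver, and RecordingObserver are hypothetical names used for illustration.

```python
class SafeSubject:
    def __init__(self):
        self._observers = []

    def attach(self, observer) -> None:
        if observer not in self._observers:
            self._observers.append(observer)

    def detach(self, observer) -> None:
        if observer in self._observers:
            self._observers.remove(observer)

    def notify(self, event: str) -> list:
        failures = []
        # Iterate over a snapshot so attach/detach during notification is safe
        for observer in list(self._observers):
            try:
                observer.update(event)
            except Exception as exc:
                # Isolate the failure and keep notifying the remaining observers
                failures.append((observer, exc))
        return failures


class BrokenObserver:
    def update(self, event: str) -> None:
        raise RuntimeError("SMS gateway down")


class OneShotObserver:
    """Detaches itself after the first event it receives."""

    def __init__(self, subject: SafeSubject):
        self.subject = subject
        self.received = []

    def update(self, event: str) -> None:
        self.received.append(event)
        self.subject.detach(self)


class RecordingObserver:
    def __init__(self):
        self.received = []

    def update(self, event: str) -> None:
        self.received.append(event)


subject = SafeSubject()
one_shot = OneShotObserver(subject)
recorder = RecordingObserver()
subject.attach(BrokenObserver())
subject.attach(one_shot)
subject.attach(recorder)

failures = subject.notify("Order placed: ORD-101")
subject.notify("Order placed: ORD-102")

print(len(failures))      # 1
print(one_shot.received)  # ['Order placed: ORD-101']
print(recorder.received)  # ['Order placed: ORD-101', 'Order placed: ORD-102']
```

Without the snapshot, the self-detaching observer would shrink the list mid-iteration and the observer after it would be skipped for that event.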
Common Mistakes
Observer vs Pub-Sub (Quick Contrast)
| Model | Coupling Style | Common Scope |
|---|---|---|
| Observer | Direct subject-observer references | In-process object collaboration |
| Pub-Sub | Broker/topic mediated | Distributed/event bus systems |
Pattern Recognition
Use Observer when:
- One state change should trigger multiple independent reactions.
- Receivers should be pluggable/removable at runtime.
- Sender must remain decoupled from specific side-effect handlers.
Interview Insight
Practice Problems
- Implement stock price subject with multiple subscriber dashboards.
- Add async observer dispatch using queue/thread pool.
- Implement observer priority ordering and retry policy for failures.
Summary
- Observer pattern enables one-to-many event notification with loose coupling.
- It is ideal for event-driven local object collaboration.
- Attach/detach flexibility improves runtime extensibility.
- Plan for failure handling and scalability in production use.
23.8 Strategy Pattern
Introduction
The Strategy Pattern defines a family of algorithms, encapsulates each one, and makes them interchangeable at runtime. Instead of hardcoding one algorithm in a class, you inject a strategy object that implements a shared contract.
This pattern is ideal when behavior changes based on context (for example pricing policy, payment method, sorting rule, or retry policy).
Real-World Analogy
Consider a navigation app. You can choose “fastest route,” “shortest distance,” or “avoid tolls.” The app context stays the same, but the route strategy changes based on user preference. The Strategy pattern models this behavior cleanly.
Formal Definition
Strategy Pattern is a behavioral pattern that enables selecting an algorithm’s implementation at runtime by composing with interchangeable strategy objects.
Why This Topic Matters
- Eliminates large if/elif blocks for behavior variants.
- Supports Open/Closed Principle by adding new strategies without changing context class.
- Improves unit testing by testing strategies independently.
Mental Model
Context
   |
   +--> Strategy Interface
            |--> Strategy A
            |--> Strategy B
            |--> Strategy C
Context delegates behavior to current strategy.
Brute Force → Better → Optimal
Brute Force
One class with huge conditional logic selecting behavior by flags/types.
Better
Split helper methods but still centralize branching in context.
Optimal
Extract algorithms into interchangeable strategy classes and inject one into context.
Core Participants
- Strategy Interface: common operation contract.
- Concrete Strategies: different implementations.
- Context: uses strategy to perform work; can switch strategy dynamically.
Step-by-Step Design
- Identify algorithm variants currently in conditionals.
- Create shared strategy interface.
- Move each variant into separate concrete strategy class.
- Inject strategy into context via constructor or setter.
- Delegate execution from context to strategy.
Python Implementation Example
Scenario: Payment Processing
from abc import ABC, abstractmethod


class PaymentStrategy(ABC):
    """Common contract that every payment strategy implements."""

    @abstractmethod
    def pay(self, amount: float) -> str:
        pass


class CreditCardPayment(PaymentStrategy):
    def pay(self, amount: float) -> str:
        return f"Paid {amount} using Credit Card"


class UpiPayment(PaymentStrategy):
    def pay(self, amount: float) -> str:
        return f"Paid {amount} using UPI"


class WalletPayment(PaymentStrategy):
    def pay(self, amount: float) -> str:
        return f"Paid {amount} using Wallet"


class CheckoutContext:
    def __init__(self, strategy: PaymentStrategy):
        self.strategy = strategy

    def set_strategy(self, strategy: PaymentStrategy) -> None:
        # Swap the payment behavior at runtime.
        self.strategy = strategy

    def checkout(self, amount: float) -> str:
        # Delegate to whichever strategy is currently active.
        return self.strategy.pay(amount)


checkout = CheckoutContext(CreditCardPayment())
print(checkout.checkout(1200.0))  # Paid 1200.0 using Credit Card
checkout.set_strategy(UpiPayment())
print(checkout.checkout(799.0))   # Paid 799.0 using UPI
Line-by-Line Explanation
- PaymentStrategy defines the common contract.
- Each concrete strategy encapsulates one payment behavior.
- CheckoutContext is decoupled from concrete payment implementations.
- set_strategy allows runtime behavior switching.
ASCII Diagram
CheckoutContext.checkout()
|
v
current strategy.pay()
|
[CreditCard | UPI | Wallet]
Functional Strategy Variant (Pythonic)
In Python, strategies can also be callables/functions for lightweight use cases.
def fast_shipping(cost):
    return cost + 50


def free_shipping(cost):
    return cost


class ShippingContext:
    def __init__(self, strategy):
        self.strategy = strategy

    def total(self, base_cost):
        return self.strategy(base_cost)
Time Complexity Perspective
- Pattern itself adds O(1) dispatch overhead.
- Actual complexity depends on chosen strategy algorithm.
Space Complexity Perspective
- O(k) for storing k strategy class definitions; runtime context holds one active strategy reference.
Edge Cases
- No strategy set: context should validate or provide default strategy.
- Stateful strategies: ensure reusability/thread-safety rules are clear.
- Too many tiny strategies: can overcomplicate simple domains.
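For the "no strategy set" case, one option is to fall back to a default and validate on assignment. The sketch below assumes callable strategies as in the functional variant; SafeShippingContext, standard_shipping, and the flat fee of 20 are hypothetical and chosen only for illustration.

```python
from typing import Callable, Optional

# A strategy here is any callable taking a base cost and returning the total.
ShippingStrategy = Callable[[float], float]


def standard_shipping(cost: float) -> float:
    """Hypothetical default policy: flat fee of 20."""
    return cost + 20


class SafeShippingContext:
    def __init__(self, strategy: Optional[ShippingStrategy] = None):
        # Fall back to a default so the context is never left without a strategy.
        self.set_strategy(strategy if strategy is not None else standard_shipping)

    def set_strategy(self, strategy: ShippingStrategy) -> None:
        # Reject non-callables early instead of failing later inside total().
        if not callable(strategy):
            raise TypeError("strategy must be callable")
        self._strategy = strategy

    def total(self, base_cost: float) -> float:
        return self._strategy(base_cost)


ctx = SafeShippingContext()
print(ctx.total(100))  # 120
ctx.set_strategy(lambda cost: cost)
print(ctx.total(100))  # 100
```

Whether a silent default or an explicit error is better depends on the domain; for payments, failing loudly is usually the safer choice.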
Common Mistakes
Strategy vs Factory (Quick Contrast)
| Pattern | Primary Focus |
|---|---|
| Strategy | Choosing behavior/algorithm at runtime |
| Factory | Creating object instances |
Pattern Recognition
Use Strategy when:
- You have multiple interchangeable algorithms for same task.
- Behavior needs runtime switching.
- Large conditional blocks keep growing with new policy types.
Interview Insight
Practice Problems
- Implement discount engine using strategy classes for coupon types.
- Build compression context with gzip/zip/lz strategies.
- Refactor tax calculation if-else tree into strategy pattern.
Summary
- Strategy pattern encapsulates interchangeable algorithms behind one interface.
- It removes conditional complexity and supports runtime behavior switching.
- Widely useful for policy-driven business logic in production systems.
- Use it when behavior variants are expected to grow.
23.9 State Pattern
Introduction
The State Pattern allows an object to alter its behavior when its internal state changes. Instead of one class containing large conditional state logic, each state is represented as a separate class with its own behavior.
This pattern is especially useful for workflows like vending machines, order lifecycle, document publishing states, and connection/session handling.
Real-World Analogy
A traffic light behaves differently depending on its current state: green allows go, yellow warns transition, red enforces stop. Same object, different behavior per state. State pattern models this explicitly.
Formal Definition
State Pattern is a behavioral design pattern that lets an object delegate behavior to state-specific objects and switch between them dynamically.
Why This Topic Matters
- Removes complex state conditionals from central classes.
- Makes transitions explicit and safer to extend.
- Common in domain-driven workflows and backend services.
Mental Model
Context
   |
   +--> current_state.handle(...)
            |
            +--> may trigger context state transition
Context delegates behavior to current state object; state decides next valid transitions.
Brute Force → Better → Optimal
Brute Force
Use one class with huge if/elif blocks checking state in every method.
Better
Centralize state checks in helper methods, but conditional complexity still grows rapidly.
Optimal
Model each state as a class implementing a common interface; transitions change active state object.
Core Participants
- Context: owns current state and delegates behavior.
- State Interface: declares operations for state-specific behavior.
- Concrete States: implement behavior and transitions.
Step-by-Step Design
- Identify all distinct states and allowed transitions.
- Define state interface for relevant actions.
- Implement one class per state.
- Context routes actions to active state.
- State objects trigger transitions by updating context state.
Python Implementation Example
Scenario: Order Lifecycle
from abc import ABC, abstractmethod


class OrderState(ABC):
    """Contract for every action an order can receive in any state."""

    @abstractmethod
    def pay(self, order):
        pass

    @abstractmethod
    def ship(self, order):
        pass

    @abstractmethod
    def deliver(self, order):
        pass


class CreatedState(OrderState):
    def pay(self, order):
        print("Payment received.")
        order.set_state(PaidState())

    def ship(self, order):
        print("Cannot ship before payment.")

    def deliver(self, order):
        print("Cannot deliver before shipping.")


class PaidState(OrderState):
    def pay(self, order):
        print("Order already paid.")

    def ship(self, order):
        print("Order shipped.")
        order.set_state(ShippedState())

    def deliver(self, order):
        print("Cannot deliver before shipping.")


class ShippedState(OrderState):
    def pay(self, order):
        print("Order already paid and shipped.")

    def ship(self, order):
        print("Order already shipped.")

    def deliver(self, order):
        print("Order delivered.")
        order.set_state(DeliveredState())


class DeliveredState(OrderState):
    # Terminal state: every action is rejected and no transition occurs.
    def pay(self, order):
        print("Order already completed.")

    def ship(self, order):
        print("Order already completed.")

    def deliver(self, order):
        print("Order already delivered.")


class Order:
    def __init__(self):
        self.state = CreatedState()

    def set_state(self, state: OrderState):
        self.state = state

    # Each action is delegated to the current state object.
    def pay(self):
        self.state.pay(self)

    def ship(self):
        self.state.ship(self)

    def deliver(self):
        self.state.deliver(self)
Line-by-Line Explanation
- Order context holds the current state object.
- Each state class defines valid behavior for actions.
- Invalid transitions are handled where they logically belong (inside the current state).
- Successful actions can move the order to the next state via set_state().
ASCII State Flow
Created --pay--> Paid --ship--> Shipped --deliver--> Delivered
Invalid actions in any state are blocked by that state's logic.
State vs Strategy (Quick Contrast)
| Pattern | Behavior Change Trigger |
|---|---|
| State | Internal lifecycle transitions |
| Strategy | External selection of algorithm/policy |
Time Complexity Perspective
- State method dispatch is O(1) per operation.
- Main gain is architectural clarity, not asymptotic speed.
Space Complexity Perspective
- Additional state objects/classes add small structural overhead.
Edge Cases
- Invalid transitions: ensure each state handles disallowed actions explicitly.
- State explosion: too many micro-states may hurt readability; group where appropriate.
- Persistence: long-lived systems may need serializable state identity.
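For the persistence case, one common approach is to give each state a stable string name and rebuild the state object from a registry when loading. The sketch below is a reduced version of the order lifecycle with a single pay action; the name attribute, STATE_REGISTRY, to_dict, and from_dict are assumptions added for illustration and are not part of the original example.

```python
class OrderState:
    name = "unknown"

    def pay(self, order):
        print(f"Cannot pay while {self.name}.")


class CreatedState(OrderState):
    name = "created"

    def pay(self, order):
        order.set_state(PaidState())


class PaidState(OrderState):
    name = "paid"


# Registry from a stable string identity to the state class.
STATE_REGISTRY = {cls.name: cls for cls in (CreatedState, PaidState)}


class Order:
    def __init__(self, state=None):
        self.state = state if state is not None else CreatedState()

    def set_state(self, state):
        self.state = state

    def pay(self):
        self.state.pay(self)

    def to_dict(self):
        # Persist only the state's name, not the object itself.
        return {"state": self.state.name}

    @classmethod
    def from_dict(cls, data):
        # Unknown names raise KeyError instead of silently picking a state.
        return cls(STATE_REGISTRY[data["state"]]())


order = Order()
order.pay()
saved = order.to_dict()
restored = Order.from_dict(saved)
print(saved)                          # {'state': 'paid'}
print(type(restored.state).__name__)  # PaidState
```

Storing a name rather than a pickled object keeps saved records readable and lets state classes be renamed without breaking old data.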
Common Mistakes
Pattern Recognition
Use State Pattern when:
- Object behavior changes significantly by lifecycle stage.
- You see repeated if state == ... checks across many methods.
- Transition rules are explicit business rules that may evolve.
Interview Insight
Practice Problems
- Model document workflow: Draft -> Review -> Approved -> Published.
- Implement media player states: Playing, Paused, Stopped.
- Refactor a large conditional-driven workflow into state objects.
Summary
- State pattern models behavior by lifecycle stage using dedicated state classes.
- It replaces state conditionals with explicit, maintainable transitions.
- Ideal for workflow-driven systems where valid actions depend on current state.
- Use judiciously to balance clarity with implementation complexity.
24.1 Interview Score Rubric (0 to 10)
Introduction
Top courses are not judged only by topic coverage. They are judged by measurable outcomes. This rubric gives you a repeatable way to score any solve attempt and objectively track your interview readiness.
Rubric Dimensions
| Dimension | Weight | What "Excellent" Looks Like |
|---|---|---|
| Problem Understanding | 15% | Restates constraints, edge limits, output format correctly. |
| Approach Quality | 25% | Moves from brute force to optimal with clear trade-offs. |
| Correctness | 20% | No logic bugs on dry run and custom test cases. |
| Complexity Analysis | 15% | Precise Big-O with justification and constraints alignment. |
| Communication | 15% | Structured, concise, interviewer-friendly narration. |
| Code Quality | 10% | Clean naming, robust checks, no dead branches. |
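The table gives weights but not a scoring formula. One reasonable reading, assumed here, is to rate each dimension from 0 to 10 and take the weighted sum. The function name rubric_score and the sample ratings are hypothetical.

```python
# Weights copied from the rubric table; they sum to 1.0.
WEIGHTS = {
    "Problem Understanding": 0.15,
    "Approach Quality": 0.25,
    "Correctness": 0.20,
    "Complexity Analysis": 0.15,
    "Communication": 0.15,
    "Code Quality": 0.10,
}


def rubric_score(ratings):
    """Return the weighted 0-10 score for per-dimension ratings on a 0-10 scale."""
    if set(ratings) != set(WEIGHTS):
        raise ValueError("ratings must cover exactly the rubric dimensions")
    for name, value in ratings.items():
        if not 0 <= value <= 10:
            raise ValueError(f"{name} rating must be between 0 and 10")
    return round(sum(WEIGHTS[name] * ratings[name] for name in WEIGHTS), 2)


attempt = {
    "Problem Understanding": 8,
    "Approach Quality": 7,
    "Correctness": 9,
    "Complexity Analysis": 8,
    "Communication": 6,
    "Code Quality": 8,
}
print(rubric_score(attempt))  # 7.65
```

Here the low Communication rating costs 0.15 × (10 − 6) = 0.6 points, which shows where practice time would pay off most for this attempt.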
24.2 8-Week Timed Problem Roadmap
Plan
- Weeks 1-2: Arrays, strings, hashing, two pointers, binary search (45 min/question).
- Weeks 3-4: Stack/queue, linked list, trees, heaps (50 min/question).
- Weeks 5-6: Graphs, recursion/backtracking, greedy (55 min/question).
- Weeks 7-8: Dynamic programming + mixed mocks (60 min full interview simulation).
Execution Rule
Each week: 5 timed solves + 1 revision day + 1 full mock day. Never replace revision day with new questions.
24.3 Edge-Case & Test Design Checklist
Mandatory Test Buckets
- Empty/min input and max boundary input.
- Duplicate-heavy and all-equal values.
- Strictly increasing/decreasing order patterns.
- Negative values, zeros, and sign-mix inputs.
- Single valid answer vs multiple valid answers.
- Invalid-state guard tests when applicable.
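To make the buckets concrete, the sketch below applies them to a small hypothetical function, index_of_max, which returns the index of the first maximum element. The function and the specific inputs are illustrative assumptions; the point is one test case per bucket.

```python
def index_of_max(nums):
    """Return the index of the first maximum element (hypothetical function under test)."""
    if not nums:
        raise ValueError("nums must not be empty")
    best = 0
    for i in range(1, len(nums)):
        # Strict comparison keeps the first occurrence when values tie.
        if nums[i] > nums[best]:
            best = i
    return best


# One case per bucket: (label, input, expected index).
CASES = [
    ("min input", [7], 0),
    ("max boundary input", list(range(100_000)), 99_999),
    ("all-equal values (multiple valid answers)", [5, 5, 5, 5], 0),
    ("duplicate-heavy", [1, 3, 3, 2, 3], 1),
    ("strictly increasing", [1, 2, 3, 4], 3),
    ("strictly decreasing", [4, 3, 2, 1], 0),
    ("negatives and zeros", [-3, 0, -1], 1),
    ("sign mix", [-5, 4, -2, 4], 1),
]

for label, nums, expected in CASES:
    assert index_of_max(nums) == expected, label

# Invalid-state guard: empty input must be rejected, not answered.
try:
    index_of_max([])
except ValueError:
    print("empty input rejected")
```

When several answers are valid, as in the all-equal case, the test pins down which one the function promises (here, the first), so the contract is explicit.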
24.4 Mock Interview Protocol
60-Minute Format
- 5 min: clarify requirements and constraints.
- 10 min: propose a brute-force solution, then optimize toward the target approach.
- 30 min: code with incremental dry-runs.
- 10 min: edge tests + complexity discussion.
- 5 min: reflection and alternative approach.
Post-Mock Review Card
Record: final score, major mistake category, one technical fix, one communication fix, and next-day drill.
24.5 Contest + Interview Hybrid Routine
Use contests to build speed and pressure tolerance; use interview mocks to build explanation quality and structured reasoning.
- Contest Day: speed, pattern recognition, fallback strategy.
- Interview Day: clarity, trade-offs, readable production-style code.
- Bridge Task: rewrite one contest solution as interview-grade explanation + clean code.
24.6 Portfolio & Credibility Proof Pack
What to Build
- A public problem log: question, approach, mistakes, final complexity.
- At least 3 polished writeups: one graph, one DP, one design problem.
- One mini project using DSA choices with performance comparison.
- A revision tracker with weak-pattern trendline.
24.7 Final Readiness Gate
Promotion Criteria (Ready for Interviews)
- 10 consecutive medium problems solved under time budget.
- At least 3 hard problems solved with clean explanation.
- Average mock score >= 8.2 over last 6 mocks.
- No repeated critical mistakes in the last 2 weeks.
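The four criteria can be checked mechanically from a practice log. The sketch below is one way to do that; the function name, its parameters, and the sample numbers are hypothetical, and it assumes mock scores are recorded oldest first.

```python
def is_interview_ready(medium_streak, hard_solved, mock_scores, repeated_critical_mistakes):
    """Check the four promotion criteria; all parameter names are hypothetical.

    medium_streak: consecutive medium problems solved under the time budget
    hard_solved: hard problems solved with a clean explanation
    mock_scores: rubric scores of all mocks, oldest first
    repeated_critical_mistakes: repeats counted over the last 2 weeks
    """
    last_six = mock_scores[-6:]
    return (
        medium_streak >= 10
        and hard_solved >= 3
        and len(last_six) == 6  # fewer than 6 mocks cannot satisfy the gate
        and sum(last_six) / 6 >= 8.2
        and repeated_critical_mistakes == 0
    )


scores = [7.5, 8.0, 8.5, 8.0, 8.5, 8.5, 9.0]
print(is_interview_ready(10, 3, scores, 0))  # True
print(is_interview_ready(10, 2, scores, 0))  # False
```

In the first call the last six mocks average about 8.42, which clears the 8.2 threshold; the second call fails only on the hard-problem count.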
Summary
This section shifts the course from content coverage to measurable outcomes. Learners who pass this gate consistently have shown, under timed conditions, the skills the rubric measures.