The Ultimate DSA Master Course (Python)

1.1 What is an Algorithm

Introduction

Before you write a single line of code for any problem, you need a clear, step-by-step plan. That plan—when it is precise, unambiguous, and finite—is what we call an algorithm. This is the first and most important idea in the entire DSA course: every correct program is the implementation of some algorithm. If you understand what an algorithm is and what makes one good or bad, you have the foundation for everything that follows.

Real-World Analogy

Think of an algorithm like a recipe. A recipe has:

Inputs: ingredients (flour, eggs, sugar)
Steps: mix, bake at 350°F for 30 minutes
Output: a cake

If the steps are vague (“add some flour” or “bake until it looks done”), two people can get different results. If the steps are exact (“add 2 cups flour,” “bake 30 minutes”), anyone who follows them gets the same cake. An algorithm is the same: it takes defined inputs, follows a fixed set of clear steps, and produces a defined output. The more precise the steps, the more reliable the result.

Example

Giving someone directions is also an algorithm: start at A, turn left at the gas station, go 2 miles, turn right at the red house. Input = starting point; steps = the turns and distances; output = reaching the destination. Ambiguous directions (“go that way for a bit”) are like a bad algorithm—they don’t guarantee the same outcome every time.

Formal Definition

In computer science, we define an algorithm as:

Concept Note

An algorithm is a finite, well-defined sequence of unambiguous instructions that, when followed from a given set of inputs, produces a corresponding set of outputs and terminates in a finite amount of time.

Breaking that down:

Finite: The list of steps has an end. The description itself is never infinitely long.
Well-defined: Each step is clear. There’s no “do something useful” or “maybe do this.”
Unambiguous: Only one interpretation. No “sometimes do A, sometimes B” unless we explicitly say under what condition.
Inputs: We know what we’re given (e.g., a list of numbers, a string).
Outputs: We know what we must produce (e.g., the maximum, a sorted list, yes/no).
Terminates: The process stops. We don’t require infinite time or infinite memory.
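
As a concrete check of these properties, here is a minimal sketch using Euclid's greatest-common-divisor procedure. The example is chosen only for illustration; the name `gcd_steps` and the assumption that both inputs are non-negative integers are ours, not part of the course text.

```python
def gcd_steps(a, b):
    """Greatest common divisor of two non-negative integers (Euclid's method).

    Inputs:  a, b -- non-negative integers (assumed, not validated here).
    Output:  the largest integer dividing both (0 when a == b == 0).
    """
    # Terminates: b strictly decreases on every iteration and never
    # drops below 0, so the loop cannot run forever.
    while b != 0:
        # Well-defined and unambiguous: exactly one operation per step.
        a, b = b, a % b
    return a
```

The description is finite (a loop body of one step), the inputs and output are stated up front, and the termination argument is written next to the loop it applies to.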

Why This Topic Matters

Interviews and real-world design both revolve around algorithms. Interviewers ask you to “design an algorithm” for a problem—they want to see that you can break a task into clear, correct steps before coding. In production, the algorithm you choose decides whether your system scales (e.g., O(n log n) vs O(n²)) or fails under load. Understanding what an algorithm is—and what makes one better than another—is the basis for the rest of this course.

Mental Model

Picture an algorithm as a black box with a contract:

  ┌─────────────────────────────────────┐
  │  INPUT(s)                            │
  │  e.g. array [3, 1, 4, 1, 5]          │
  └──────────────┬──────────────────────┘
                 │
                 ▼
  ┌─────────────────────────────────────┐
  │  ALGORITHM (sequence of steps)       │
  │  • Step 1: ...                       │
  │  • Step 2: ...                       │
  │  • ...                               │
  └──────────────┬──────────────────────┘
                 │
                 ▼
  ┌─────────────────────────────────────┐
  │  OUTPUT(s)                           │
  │  e.g. maximum value 5                │
  └─────────────────────────────────────┘

You don’t have to care how the box is implemented (which language, which data structures) when you’re defining the algorithm. You only care: given these inputs, these steps produce this output. Implementation comes later.

Properties of a Good Algorithm

Not every set of steps is a “good” algorithm. We usually require:

Property	Meaning
Finiteness	Stops after a finite number of steps.
Definiteness	Each step is precisely defined; no ambiguity.
Input	Zero or more well-defined inputs.
Output	At least one well-defined output.
Effectiveness	Every step is doable in finite time (e.g., no “solve the Halting Problem”).

Step-by-Step: Describing an Algorithm

When you explain an algorithm, you typically:

State the problem: What are we given, and what must we return?
State your assumptions: Say what counts as valid input (e.g., “array is non-empty”) instead of assuming it silently.
List steps in order: Numbered, clear, one idea per step.
Handle edge cases: Empty input, single element, duplicates—whatever the problem allows.

You can describe the same algorithm in plain English, pseudocode, or code. The algorithm is the idea; the code is one way to run it.

Example: Find the Maximum in a List

Problem: Given a list of numbers, return the largest number.

Input: A list arr of numbers (we’ll assume at least one element).

Output: The maximum value in arr.

Algorithm (steps):

Assume the first element is the maximum; call it max_so_far.
For each remaining element in the list: if that element is greater than max_so_far, update max_so_far to this element.
When all elements have been checked, return max_so_far.

This is finite (we do one pass), well-defined (each step is clear), and effective. It’s an algorithm.

Python Implementation

def find_max(arr):
    if not arr:
        return None  # or raise, depending on contract
    max_so_far = arr[0]
    for i in range(1, len(arr)):
        if arr[i] > max_so_far:
            max_so_far = arr[i]
    return max_so_far

The code above is one implementation of that algorithm. The algorithm itself is the three-step idea; Python is just the language we used to run it.
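
The comment “or raise, depending on contract” points at a second reasonable contract for empty input. The sketch below shows that variant under the assumption that an empty list should be treated as a caller error; the name `find_max_strict` is hypothetical.

```python
def find_max_strict(arr):
    """Same three-step algorithm, but an empty list is treated as an error."""
    if not arr:
        raise ValueError("find_max_strict() requires a non-empty list")
    max_so_far = arr[0]
    # arr[1:] copies the tail of the list; acceptable for a sketch.
    for x in arr[1:]:
        if x > max_so_far:
            max_so_far = x
    return max_so_far
```

Either contract is defensible. What matters is that the choice is stated, so callers know whether to check for `None` or to catch an exception.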

Algorithm vs Program

Algorithm: Language-independent, step-by-step procedure. It’s the “what to do” and “in what order.”
Program: An implementation of one or more algorithms in a specific language (Python, Java, etc.). It’s the “how it runs on a machine.”

One algorithm can be implemented in many languages; the core logic stays the same. When we analyze “time complexity” or “space complexity” later, we’re analyzing the algorithm, not the quirks of a single language.

Common Mistake

Confusing “writing code” with “designing an algorithm.” In interviews, always clarify the steps (and edge cases) before coding. The algorithm is the plan; the code is the execution of that plan.

Interview Insight

When asked “how would you solve X?”, start by stating the problem (inputs/outputs), then give a high-level algorithm in steps. Only then translate to code. Showing that you think in algorithms—not just syntax—signals strong problem-solving skills.

Summary

An algorithm is a finite, well-defined, unambiguous sequence of steps that maps inputs to outputs and terminates.
Good algorithms are finite, definite, have clear input/output, and are effective.
You can describe an algorithm in words, pseudocode, or code; the algorithm is the idea, the code is one implementation.
Thinking in algorithms first (then implementing) is the foundation of DSA and of strong technical interviews.

1.2 What is a Data Structure

Introduction

An algorithm tells you what steps to follow. But those steps operate on something—numbers, names, relationships. How you organize and store that “something” so that your algorithm can work efficiently is the job of a data structure. Algorithms and data structures are inseparable: the right structure makes the algorithm simple and fast; the wrong one makes it clumsy or slow.

Real-World Analogy

Imagine you need to find a book in a library.

Stack of unsorted books: You look one by one until you find it. Slow to search, but easy to add a new book (just put it on top).
Books sorted by title on a shelf: You can jump to the right section (like binary search). Fast to search, but inserting a new book may require shifting others.
Index cards (catalog): You look up “Author name” and get the shelf location. Very fast lookup if you know the key.

Same “data” (books), different ways of organizing it—each with different tradeoffs for “find a book,” “add a book,” “remove a book.” A data structure is exactly that: a chosen way to organize and store data so that the operations we care about (search, insert, delete, etc.) are as efficient as we need.

Example

A queue at a ticket counter: first come, first served. The “structure” is an ordered line; the operations are “join at the end” and “serve from the front.” A stack of plates: last in, first out. Both hold a sequence of items, but the rules for adding and removing differ, and that difference makes them different data structures.
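
In Python, one common way to model these two behaviors is `collections.deque` for the queue and a plain list for the stack. The sketch below is illustrative only; the variable names and sample values are made up.

```python
from collections import deque

# Queue: first in, first out (the ticket counter).
line = deque()
line.append("Ana")              # join at the end
line.append("Ben")
line.append("Chen")
first_served = line.popleft()   # serve from the front

# Stack: last in, first out (the pile of plates).
plates = []
plates.append("plate 1")        # put on top
plates.append("plate 2")
plates.append("plate 3")
top_plate = plates.pop()        # take from the top
```

Here `first_served` is the person who joined first, while `top_plate` is the plate that was added last.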

Formal Definition

Concept Note

A data structure is a way to store, organize, and manage data in memory so that certain operations (access, insertion, deletion, search, etc.) can be performed efficiently.

“Efficiently” depends on the problem: sometimes we need fast lookup by key (dictionary); sometimes we need order (sorted list); sometimes we need fast insert at one end (list, queue). The data structure is the organization; the algorithm is the procedure that uses it.

Why This Topic Matters

In interviews and in production, the first question after “what’s the algorithm?” is often “what data structure will you use?” Picking an array vs a hash set can change the time complexity from O(n²) to O(n). Trees, graphs, and heaps exist because certain problems are naturally expressed and solved with those shapes. You don’t just “use a list for everything”—you choose a structure that matches the operations and constraints of the problem.
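
A small sketch of that O(n²) versus O(n) difference, using duplicate detection as a hypothetical example problem. Both functions follow the same steps; only the structure holding the already-seen items changes. The set version assumes the items are hashable.

```python
def has_duplicate_list(items):
    """O(n^2) worst case: each membership test scans a list."""
    seen = []
    for x in items:
        if x in seen:        # linear scan of everything seen so far
            return True
        seen.append(x)
    return False


def has_duplicate_set(items):
    """O(n) on average: each membership test is a hash lookup."""
    seen = set()
    for x in items:
        if x in seen:        # average O(1) lookup
            return True
        seen.add(x)
    return False
```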

Mental Model

Every data structure gives you two things:

Storage layout: How the data is arranged in memory (contiguous array, linked nodes, key–value buckets, tree, graph).
Operation set: What you can do with it—and how fast. For example: “add at end O(1),” “search by value O(n),” “look up by key O(1) average.”

  ALGORITHM (steps)     +     DATA STRUCTURE (storage + operations)     =     PROGRAM
  "How to solve"               "Where to put data & how to access"           Working code

Change the data structure and the same high-level algorithm might become faster or simpler; change the algorithm and you might need a different structure. They are designed together.

Algorithm and Data Structure: A Partnership

An algorithm assumes it can do certain things with the data: “get the next element,” “check if this key exists,” “remove the smallest.” The data structure provides those operations. If your algorithm needs “find minimum quickly,” you might choose a heap; if it needs “check membership quickly,” you might choose a set. So:

Algorithm = what steps to perform.
Data structure = how data is organized so those steps are efficient.

You’ll often hear “hash map” or “two pointers” or “binary search”—the first is a data structure, the others are algorithmic ideas. In practice we combine them: e.g., “use a hash map and one pass” (structure + algorithm).

Types of Data Structures (High-Level)

We don’t need to memorize a catalog yet. It’s enough to see the landscape:

Linear: Data in a line—arrays, linked lists, stacks, queues. Order matters; access by position or by scanning.
Tree: Hierarchical—binary trees, BSTs, heaps. Good for “smallest/largest,” “split by range,” “parent–child” relationships.
Graph: Nodes and edges—networks, dependencies, maps. Good for “paths,” “connected components,” “shortest route.”
Hash-based: Store by key; access by key in (average) constant time. Dictionaries, sets.

Later sections of the course go deep on each. Here the takeaway is: different shapes support different operations and tradeoffs.

Same Data, Different Structure

Suppose you must support: “add an element at the front” and “traverse all elements.”

Array (list): Insert at front is expensive—you shift every element one position. O(n). Traverse is O(n).
Singly linked list: Insert at front is just “create a node that points to the current head, then make that node the new head.” O(1). Traverse is still O(n).

So if you insert at the front often, a linked list can be a better choice than an array. The data is the same (a sequence of items); the structure decides the cost of each operation.
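
The comparison can be sketched in a few lines. The classes `Node` and `SinglyLinkedList` below are a minimal illustration written for this section, not a library API.

```python
class Node:
    def __init__(self, value, next_node=None):
        self.value = value
        self.next_node = next_node


class SinglyLinkedList:
    def __init__(self):
        self.head = None

    def push_front(self, value):
        # O(1): the new node points at the old head, then becomes the head.
        self.head = Node(value, self.head)

    def __iter__(self):
        # O(n): follow the links from head to tail.
        node = self.head
        while node is not None:
            yield node.value
            node = node.next_node


linked = SinglyLinkedList()
array = []
for value in [3, 2, 1]:
    linked.push_front(value)   # O(1) each
    array.insert(0, value)     # O(n) each: shifts every existing element
```

Both end up holding 1, 2, 3 in that order; only the cost of getting there differs. In everyday Python, `collections.deque` offers O(1) `appendleft` without writing a linked list by hand.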

Choosing a Data Structure

Ask:

What operations do I need? (insert, delete, search, get min, range query, etc.)
How often is each used? (e.g., search often → consider hash or sorted structure)
What are the constraints? (memory, order, duplicates allowed?)

Then match the structure to the operations. This “operation-first” thinking is how experienced engineers pick structures in interviews.

Expert Tip

When stuck, write down the operations you need (e.g., “add,” “remove max,” “check if present”). Then think: “Which standard structure supports these?” Often the problem is designed to fit one—array, hash, heap, tree, or graph.

Summary

A data structure is a way to store and organize data so that the operations you need (access, insert, delete, search) can be done efficiently.
Algorithms and data structures work together: the algorithm defines the steps; the structure defines how data is stored and what operations cost.
Different structures (linear, tree, graph, hash) offer different tradeoffs; choose based on the operations and constraints of the problem.

1.3 How to Think Like a Problem Solver

Introduction

Solving a problem in an interview or in real code isn’t about memorizing solutions—it’s about a repeatable way of thinking. This section gives you a framework: what to do from the moment you read the problem until you have a clear, implementable plan. Master this and you’ll handle new problems you’ve never seen before.

Why a Framework Matters

Without a process, it’s easy to jump into code, miss edge cases, or get stuck. With a process, you: clarify the problem, test your understanding with examples, get a simple solution first, then improve. That’s how strong problem solvers work—and how you should too.

The Problem-Solving Mindset

Clarify before solving: Make sure you understand inputs, outputs, and rules. Ask “what if” (empty input? duplicates? negative numbers?).
Start simple: Get a correct solution first, even if it’s slow. Correctness is the baseline; optimization comes next.
Use examples: Work 2–3 small examples by hand. If your approach works on paper, it’s easier to translate to code.
Break it down: If the problem is big, solve a smaller version or one part first.

Step-by-Step Framework

Follow these steps every time. They work for interviews and for practice.

Step 1: Read and Restate

Read the problem fully. Then restate it in your own words: “So I’m given X, and I need to return Y, with these rules: …” If you can’t restate it, you don’t understand it yet. Ask clarifying questions (in an interview) or re-read (in practice).

Concept Note

Restating forces you to identify the inputs, the output, and any constraints. That’s the contract your solution must satisfy.

Step 2: Identify Inputs and Output

Write them down explicitly. Example: “Input: array of integers, length n. Output: indices of two numbers that add up to target, or [-1, -1] if none.” Knowing I/O prevents you from solving the wrong problem.

Step 3: Work Small Examples and Edge Cases

Pick 2–3 small examples and solve them by hand. Then think of edge cases:

Empty input, single element, two elements
No valid answer (e.g., no pair sums to target)
Duplicates, negatives, zeros
Maximum size (what if n is huge?)

If your approach breaks on an edge case, fix the approach before coding.
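
One way to keep these edge cases from being forgotten is to write them down as a small table of inputs and expected outputs before coding. The sketch below does this for the two-sum example from Step 2; the `CASES` table and the brute-force reference function `two_sum` are illustrative assumptions.

```python
def two_sum(arr, target):
    """Brute-force reference: indices of a pair summing to target, else [-1, -1]."""
    for i in range(len(arr)):
        for j in range(i + 1, len(arr)):
            if arr[i] + arr[j] == target:
                return [i, j]
    return [-1, -1]


# (description, arr, target, expected)
CASES = [
    ("empty input",        [],         5,   [-1, -1]),
    ("single element",     [5],        5,   [-1, -1]),
    ("two elements",       [2, 3],     5,   [0, 1]),
    ("no valid pair",      [1, 2, 4],  100, [-1, -1]),
    ("duplicates",         [3, 3],     6,   [0, 1]),
    ("negatives and zero", [-4, 0, 4], 0,   [0, 2]),
]

# Collect every case where the actual result differs from the expected one.
failures = []
for name, arr, target, expected in CASES:
    got = two_sum(arr, target)
    if got != expected:
        failures.append((name, expected, got))
```

An empty `failures` list means the approach survives every case in the table. The same table can be reused later to check an optimized version.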

Step 4: Brute Force First

Describe the simplest solution that definitely works—even if it’s “try every pair” or “check every possibility.” Say it out loud or write 3–5 steps. This gives you a correctness baseline and often reveals a path to a better solution.

Step 5: Optimize (If Needed)

Look for repeated work, unnecessary loops, or a better data structure. Can you use a hash map to avoid a second loop? Can you sort and use two pointers? We’ll build a toolkit of patterns; for now, the habit is: “I have a working solution; where is the waste?”

Step 6: Write the Code

Only after steps 1–5, translate your algorithm to code. You’re implementing a plan, not exploring in the dark. Test with the examples and edge cases you already thought through.

  READ → RESTATE → I/O → EXAMPLES & EDGE CASES → BRUTE FORCE → OPTIMIZE → CODE → TEST

Restate in Your Own Words

This is the single most underused habit. Before writing a single line, say: “I need to … given … and return … when …” If you do this, you’ll catch misunderstandings early and your code will match the problem.

Work Examples by Hand

Take a tiny input (e.g., array [2, 7, 11], target 9). Walk through your intended steps. Do you get [0, 1]? If yes, your logic is clear. If no, fix the logic. This “hand trace” is how you avoid bugs and how you explain your approach in an interview.
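
A hand trace can also be mirrored in code by recording the state at each step. The sketch below assumes the one-pass dictionary approach to two-sum (one possible “intended steps”) and a hypothetical helper name `two_sum_trace`.

```python
def two_sum_trace(arr, target):
    """Return (answer, trace), where trace records each step of the scan."""
    seen = {}      # value -> index of values already visited
    trace = []
    for i, x in enumerate(arr):
        need = target - x
        trace.append(f"i={i} x={x} need={need} seen={seen}")
        if need in seen:
            return [seen[need], i], trace
        seen[x] = i
    return [-1, -1], trace


answer, trace = two_sum_trace([2, 7, 11], 9)
for line in trace:
    print(line)
# i=0 x=2 need=7 seen={}
# i=1 x=7 need=2 seen={2: 0}
```

The second line of the trace is the moment the answer appears: the needed value 2 was already seen at index 0, so the result is [0, 1].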

Common Mistake

Jumping into code before understanding the problem or testing an approach on paper. You’ll waste time debugging or solving the wrong problem. Always clarify the problem and work an example first.

Interview Insight

Interviewers want to see your process. Talk through: “First I’m clarifying … here are the inputs and output … let me try a small example … my brute force would be … then I can optimize by …” That narrative is as important as the code.

Summary

Use a framework: read → restate → I/O → examples & edge cases → brute force → optimize → code → test.
Clarify the problem and restate it; identify inputs and output explicitly.
Work small examples and edge cases by hand before coding.
Get a correct (brute force) solution first, then look for optimization.

1.4 Brute Force Approach

Introduction

The brute force approach is the straightforward solution that “just tries everything” or does the most obvious thing—no clever tricks yet. It’s often slow for large inputs, but it’s simple, easy to get right, and gives you a correct baseline. In problem-solving and in interviews, you should start with brute force, then optimize only when needed.

What Is Brute Force?

Concept Note

Brute force means solving a problem by trying all relevant possibilities or by the most direct, naive method—without optimizing for time or space. Correctness first; efficiency second.

Examples:

Find two numbers that sum to target: Check every pair. Two nested loops. Slow (O(n²)) but obviously correct.
Find maximum in an array: Scan every element and keep the largest. One loop. For this problem, that “naive” scan is already optimal—so “brute force” here is also the best solution.
Find a value in an unsorted list: Use linear search, checking each element until you find it. This is brute force and, for a single lookup in unsorted data, also the best you can do: any element you skip could be the one you want.

So “brute force” doesn’t always mean “bad.” It means “no clever optimization yet.” Sometimes the brute force solution is already good enough.

Why Start With Brute Force?

Correctness: You have a solution that works. You can test it, debug it, and verify against examples.
Baseline: You know the worst-case behavior (e.g., O(n²)). Optimization then means “do better than this.”
Clarity: Simple logic is easier to explain and to implement without bugs.
Interview signal: Showing brute force first, then improving, demonstrates structured thinking—better than jumping to a half-remembered “optimal” idea and getting stuck.

Mental Model

Think: “What’s the dumbest correct solution?” That’s brute force. Then ask: “Where am I doing redundant work? Can a different data structure or a different order of operations remove it?” That’s the path from brute force to a better solution.

Example: Two Sum (Evolution)

Problem: Given an array of integers and a target, return indices of two numbers that add up to the target. Assume exactly one valid pair.

Brute Force: Try Every Pair

For each index i, for each index j > i, check if arr[i] + arr[j] == target. If yes, return [i, j].

Time: Two nested loops over n elements → O(n²).
Space: O(1) besides the input.

def two_sum_brute(arr, target):
    n = len(arr)
    for i in range(n):
        for j in range(i + 1, n):
            if arr[i] + arr[j] == target:
                return [i, j]
    return [-1, -1]

This is correct. For small n it’s fine. For large n we want to do better.

Better: One Pass With a Hash Map

As we scan the array, for each value arr[i] we need a partner target - arr[i]. If we’ve already seen that partner at some index j, we can return [j, i]. Store “value → index” in a dictionary.

Time: One pass, O(n). Each lookup/insert in the dict is O(1) average.
Space: O(n) for the dictionary.

def two_sum_better(arr, target):
    seen = {}  # value -> index
    for i, x in enumerate(arr):
        need = target - x
        if need in seen:
            return [seen[need], i]
        seen[x] = i
    return [-1, -1]

Same problem, same correctness—faster when n is large. This is the “evolution”: brute force first, then optimize by removing the inner loop with a hash map.
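
A practical use of the brute-force version is as a reference for checking the faster one. The sketch below repeats both functions so it runs on its own and compares them on small random inputs. The helper `agree` and the input ranges are assumptions made for illustration. When an input contains more than one valid pair, the two functions may legitimately return different pairs, so the check compares validity rather than exact equality.

```python
import random


def two_sum_brute(arr, target):
    n = len(arr)
    for i in range(n):
        for j in range(i + 1, n):
            if arr[i] + arr[j] == target:
                return [i, j]
    return [-1, -1]


def two_sum_better(arr, target):
    seen = {}  # value -> index
    for i, x in enumerate(arr):
        need = target - x
        if need in seen:
            return [seen[need], i]
        seen[x] = i
    return [-1, -1]


def agree(arr, target):
    """True if both functions report 'no pair', or both return a valid pair."""
    a = two_sum_brute(arr, target)
    b = two_sum_better(arr, target)
    if a == [-1, -1] or b == [-1, -1]:
        return a == b
    # Both found something: each must be a real pair, possibly a different one.
    return all(i < j and arr[i] + arr[j] == target for i, j in (a, b))


rng = random.Random(0)   # fixed seed so the run is repeatable
mismatches = 0
for _ in range(500):
    arr = [rng.randint(-10, 10) for _ in range(rng.randint(0, 8))]
    target = rng.randint(-20, 20)
    if not agree(arr, target):
        mismatches += 1
```

For example, on `[1, 2, 3, 4]` with target 5 the brute-force version returns `[0, 3]` and the one-pass version returns `[1, 2]`. Both are correct, which is why the problem statement's “exactly one valid pair” assumption matters if you compare answers directly.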

Optimization Insight

The jump from O(n²) to O(n) came from asking: “What am I doing repeatedly?” Answer: “Looking for a complement for each element.” A hash map turns that repeated search into O(1) lookup—classic trade: a bit of extra space for a lot less time.

When Is Brute Force Acceptable?

Small input: If n is tiny (e.g., ≤ 20), O(n²) or even O(2^n) is usually fast enough; 2^20 is only about a million possibilities. Brute force is fine.
Quick to code: In a contest or interview, a slow-but-correct solution can be better than no solution or a buggy “clever” one.
No better algorithm known: For some problems (e.g., many NP-hard ones) no efficient exact algorithm is known; brute force (or brute force with clever pruning) is what we use.
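
To make the small-input case concrete, here is a minimal sketch of an exponential brute force: checking whether any subset of a list adds up to a target. Subset sum is used here only as an illustrative example, and the function name `subset_sum_exists` is ours. With n elements there are 2^n subsets, which is manageable for n around 20 and hopeless for n in the hundreds.

```python
def subset_sum_exists(nums, target):
    """Brute force: try every subset of nums (2**n of them)."""
    n = len(nums)
    for mask in range(1 << n):          # each bit pattern selects one subset
        total = sum(nums[i] for i in range(n) if mask & (1 << i))
        if total == target:
            return True
    return False
```

Note that the empty subset (mask 0) is included, so a target of 0 is always reachable under this definition.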

Evolution: Brute Force → Better → Optimal

Get in the habit of this progression:

Brute force: Correct, simple, maybe slow. State it and implement it.
Better: Identify the bottleneck (extra loop? repeated work?). Use a better structure or invariant (e.g., hash, two pointers, sort).
Optimal: Often “better” is already optimal (e.g., one pass + hash for Two Sum). Sometimes you can prove “we must look at each element at least once” → O(n) is a lower bound.

Don’t skip the brute-force step. It keeps your thinking grounded and your code correct.

Common Mistake

Optimizing too early and producing a wrong or incomplete solution. Or not mentioning brute force in an interview—interviewers want to see that you can get a correct solution first, then improve.

Interview Insight

Say: “The brute force would be to try every pair / check every subset / … That’s O(…). Then we can optimize by …” This shows you prioritize correctness and then efficiency—exactly what interviewers look for.

Summary

Brute force = straightforward, try-all or naive solution. Correctness first; no clever optimization yet.
Always consider brute force first: it’s a correct baseline and often reveals how to optimize.
Evolution: brute force → find bottleneck → better data structure or algorithm → optimal (if needed).
For small inputs or when a better solution isn’t obvious, brute force is acceptable and sometimes preferred.

1.5 Optimization Strategy

Introduction

Once you have a correct solution—often brute force—the next step is to ask: where is the time or space being wasted? Optimization isn’t random cleverness; it’s a systematic way to find bottlenecks and remove them. This section gives you a repeatable optimization strategy so you can move from “it works” to “it scales.”

Why Optimize Systematically?

Without a strategy, you might optimize the wrong part (e.g., micro-tune a loop that’s already O(n) while the real cost is an O(n²) nested loop elsewhere). With a strategy, you measure or reason about where the cost is, then attack that first. One bottleneck fixed often yields a bigger win than many small tweaks.

Mental Model

Think of your program as a pipeline: some steps run once, some run inside loops. The total time is dominated by what runs most often or on the largest data. Optimization means: find that hot spot, then either do less work there or do the same work fewer times.

Step-by-Step Optimization Strategy

Step 1: Establish Correctness First

Never optimize broken code. Get a solution that passes your tests and handles edge cases. Optimization changes code; if the baseline is wrong, you’ll either hide bugs or optimize the wrong behavior.

Step 2: Identify the Bottleneck

Ask: “What is the slow part?”

By counting: Look at loops. A single loop over n → O(n). Two nested loops over n → O(n²). Three nested → O(n³). The deepest or most repeated structure usually dominates.
By profiling (when you can run code): Use a profiler to see where CPU time is spent. Focus on the top few functions or lines.

In interviews you usually reason by counting. In production, profiling confirms your guess.
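
For the profiling route, Python's standard library includes `cProfile` and `pstats`. The sketch below profiles a deliberately lopsided toy workload. The functions `count_pairs_slow`, `build_input`, and `run` are invented for the example, and the exact timings in the report will vary by machine.

```python
import cProfile
import io
import pstats


def count_pairs_slow(arr, target):
    """O(n^2): count pairs (i < j) with arr[i] + arr[j] == target."""
    count = 0
    for i in range(len(arr)):
        for j in range(i + 1, len(arr)):
            if arr[i] + arr[j] == target:
                count += 1
    return count


def build_input(n):
    """O(n): cheap setup work."""
    return list(range(n))


def run(n=300):
    arr = build_input(n)
    return count_pairs_slow(arr, n)


profiler = cProfile.Profile()
profiler.enable()
result = run()
profiler.disable()

# Write the five most expensive entries (by cumulative time) to a string.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
print(report.getvalue())
```

In the printed report, the nested-loop function should sit near the top while the setup function barely registers, which matches what counting loops predicts.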

Step 3: Ask “What Am I Doing Redundantly?”

Often the bottleneck is repeated work. Examples:

Repeated search: For each element, scanning the rest of the array to find a match → O(n²). Replace the inner scan with a hash lookup → O(n).
Repeated computation: Computing the same sum or max over and over in a loop. Compute once, reuse (e.g., prefix sum, sliding window).
Repeated traversal: Walking the whole list to find one item, many times. One pass with a hash or one sort + linear pass can often replace many passes.

Optimization Insight

Most big wins come from eliminating a loop (by a better data structure) or from reusing work (caching, prefix sums, invariants). Look for “do I need to do this every time, or can I do it once and reuse?”
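
The “compute once, reuse” idea can be sketched with a prefix-sum list. The helper names `build_prefix` and `range_sum` are illustrative, and the convention that `prefix[0]` is 0 is one common choice among several.

```python
def build_prefix(arr):
    """prefix[k] = sum of the first k elements, so prefix[0] = 0."""
    prefix = [0]
    for x in arr:
        prefix.append(prefix[-1] + x)
    return prefix


def range_sum(prefix, i, j):
    """Sum of arr[i..j] inclusive, in O(1) using the precomputed prefix list."""
    return prefix[j + 1] - prefix[i]


arr = [3, 1, 4, 1, 5]
prefix = build_prefix(arr)          # O(n) setup, done once
middle = range_sum(prefix, 1, 3)    # 1 + 4 + 1, without looping
```

After the one-time O(n) setup, every range query is a single subtraction instead of a fresh loop over the range.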

Step 4: Consider the Right Data Structure

The wrong structure forces extra work. Examples:

“Check if this value exists” in a list → O(n) per check. In a set → O(1) average. Replace list membership with a set when you need many lookups.
“Get the smallest element” in a list → O(n). In a min-heap → O(1) to peek at the minimum and O(log n) for extract-min. Use a heap when you need repeated min/max.
“Insert at front” in an array → O(n) shift. In a linked list → O(1). Choose the structure that makes your dominant operation cheap.
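
Two of these swaps are available directly in Python's standard library: `set` for membership and `heapq` for repeated minimum extraction. The data below is made up for illustration.

```python
import heapq

# Membership: build the set once, then each check is O(1) on average.
allowed_ids = [17, 4, 23, 8]
allowed_set = set(allowed_ids)
queries = [8, 15, 23]
hits = [q for q in queries if q in allowed_set]

# Repeated minimum: heapify once in O(n), then each pop is O(log n).
tasks = [5, 1, 9, 3]
heapq.heapify(tasks)
smallest_first = [heapq.heappop(tasks) for _ in range(3)]
```

The one-time conversion (building the set, heapifying the list) is the price paid so that each later operation is cheap.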

Step 5: Trade Space for Time (When Appropriate)

Often you can use extra memory to avoid recomputation: hash maps, prefix arrays, caching. If the problem allows O(n) extra space and you can turn O(n²) into O(n), that’s usually a good trade. Don’t over-optimize space when time is the constraint.

Step 6: Re-evaluate After a Change

After each optimization, confirm correctness and re-check the bottleneck. Sometimes the bottleneck moves (e.g., I/O or a different loop). Stop when you meet the required performance or when further optimization isn’t worth the complexity.

  CORRECT solution → FIND bottleneck (loops, repeated work) → REMOVE redundancy
  (data structure / cache / one pass) → VERIFY still correct → REPEAT if needed

Common Optimization Patterns

Hash for lookups: Replace “search for X in the rest of the array” with “is X in this set?” → often O(n²) to O(n).
Two pointers / sliding window: Replace “for every start, for every end” with one pass in which each pointer only ever moves in one direction (both forward for a window, or toward each other on sorted data) → O(n²) to O(n).
Sort first: Sorted data enables binary search or two-pointer scans. Sort is O(n log n); if that’s cheaper than repeated linear scans, sort once and reuse.
Prefix sum: Replace “sum from i to j” computed in a loop with precomputed prefix array → O(1) per query after O(n) setup.
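
As a sketch of how “sort first” and “two pointers” combine, the function below answers whether any two values add up to a target. It is a simplified variant chosen for illustration: because sorting discards the original positions, it returns True or False rather than indices. The name `has_pair_with_sum` is hypothetical.

```python
def has_pair_with_sum(arr, target):
    """Sort once, then scan with two pointers moving toward each other."""
    values = sorted(arr)            # O(n log n); the caller's list is untouched
    lo, hi = 0, len(values) - 1
    while lo < hi:
        total = values[lo] + values[hi]
        if total == target:
            return True
        if total < target:
            lo += 1                 # need a larger sum: advance the left pointer
        else:
            hi -= 1                 # need a smaller sum: pull back the right pointer
    return False
```

The scan after sorting is O(n), so the total cost is dominated by the O(n log n) sort. That is slower than the hash-map approach in time but needs no dictionary.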

When to Stop Optimizing

Optimize until:

You meet the time/space constraints (e.g., problem says n ≤ 10^5 and your solution is O(n log n)), or
Further optimization would make the code much harder to read or maintain without a clear need.

Don’t optimize for the sake of it. Correct and clear first; then fast enough.

Interview Insight

State your brute force and its complexity, then say: “The bottleneck is the inner loop / repeated lookup. I can remove it by using a hash map / two pointers / …” That shows you think in bottlenecks and know how to optimize systematically.

Summary

Optimize only after you have a correct solution.
Find the bottleneck (nested loops, repeated work); attack that first.
Look for redundancy: repeated search → hash; repeated range computation → prefix sum or sliding window.
Choose the right data structure so the dominant operation is cheap.
Trade space for time when it removes a bottleneck.

1.6 Writing Pseudocode

Introduction

Pseudocode is a compact, language-agnostic way to describe an algorithm using a mix of plain English and simple control structures (loops, conditionals). It’s the bridge between “I know what to do” and “here’s the code.” Writing pseudocode first helps you get the logic right before you worry about syntax, and it’s how you’re expected to communicate your approach in interviews.

Why Write Pseudocode?

Clarify logic: You focus on steps and order, not semicolons or types. Mistakes in logic show up before you write a single line of code.
Communicate: In interviews, pseudocode lets you explain your algorithm quickly. Interviewers can follow it even if they don’t use your language.
Plan: It’s an outline. You can refine it (e.g., “here I need a loop”) and only then translate to real code.

What Pseudocode Looks Like

There’s no single standard. The goal is: readable and unambiguous. Use:

Indentation for blocks (like Python).
Keywords such as if / else / for / while / return.
Short names for variables and data (e.g., arr, n, result).
Plain English for high-level steps when that’s clearer than fake code.

Avoid: language-specific syntax (e.g., list comprehensions or pointer arithmetic) unless they make the idea obvious. Prefer clarity over looking like real code.

Conventions We’ll Use

Assignment: x = value or result ← value.
Indexing: arr[i] for the i-th element (0-based unless stated).
Length: length(arr) or n when we’ve set n = length(arr).
Loops: for i from 0 to n-1 or for each element x in arr.
Return: return result.
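
To see these conventions next to real code, here is a minimal sketch: the find-maximum algorithm from section 1.1 written in pseudocode (kept in the docstring) and translated line by line into Python. The name `find_max_from_pseudocode` is ours, and the pseudocode assumes a non-empty array.

```python
def find_max_from_pseudocode(arr):
    """
    Pseudocode (assumes arr is non-empty):

        function findMax(arr):
            n = length(arr)
            result = arr[0]
            for i from 1 to n-1:
                if arr[i] > result:
                    result = arr[i]
            return result
    """
    n = len(arr)                 # n = length(arr)
    result = arr[0]              # result = arr[0]
    for i in range(1, n):        # for i from 1 to n-1 (range excludes n)
        if arr[i] > result:
            result = arr[i]
    return result
```

The only translation that needs care is the loop bound: “from 1 to n-1” is inclusive in the pseudocode, while Python's `range(1, n)` stops before `n`.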

Example: Two Sum in Pseudocode

Input: Array arr of integers, integer target.

Output: Indices i, j (with i < j) such that arr[i] + arr[j] == target, or “none” if no such pair exists.

function twoSum(arr, target):
    n = length(arr)
    for i from 0 to n-1:
        for j from i+1 to n-1:
            if arr[i] + arr[j] == target:
                return [i, j]
    return "none"

This is brute force. Here is the optimized version using a hash map:

function twoSum(arr, target):
    seen = empty map   // value -> index
    for i from 0 to length(arr)-1:
        need = target - arr[i]
        if need is in seen:
            return [seen[need], i]
        put (arr[i], i) in seen
    return "none"

Same problem; the second version makes the “lookup complement” step explicit and shows we only need one pass.

How Much Detail?

Enough that someone could implement it without guessing. Include:

Loop bounds and what the loop variable means.
Conditions for if/else and what you return in each case.
Where you update state (e.g., “add current element to set”).

You can omit: exact type names, error handling, or trivial details (“increment i”).

Expert Tip

In an interview, write 5–15 lines of pseudocode before coding. It keeps you on track and gives the interviewer a clear picture of your algorithm. If you get stuck coding, the pseudocode is your roadmap.

Common Mistake

Writing pseudocode that’s so vague (“do something with the array”) that it doesn’t constrain the implementation. Or writing full code in pseudocode—then small syntax errors distract from the logic. Aim for the middle: clear structure and key steps, not every semicolon.

Summary

Pseudocode = language-agnostic description of an algorithm using simple control flow and names.
Use it to clarify logic, communicate in interviews, and plan before coding.
Keep it readable: indentation, clear loops and conditions, enough detail to implement without guessing.

1.7 Debugging Techniques

Introduction

Debugging is the process of finding and fixing the cause of incorrect behavior—wrong output, crash, or infinite loop. It’s a core skill: even correct algorithms get implemented with off-by-one errors or missed edge cases. This section gives you a systematic way to narrow down where the bug is and fix it quickly.

Mindset

Assume the bug is in your code or assumptions, not the compiler or the problem. Form a hypothesis (“maybe the loop starts at the wrong index”), test it, and adjust. Random changes rarely help; targeted checks do.

Step-by-Step Debugging Process

Step 1: Reproduce the Failure

Get a concrete input that produces the wrong result (or crash). If you can’t reproduce it, you can’t fix it reliably. Use the problem’s examples first; then try small inputs you design (edge cases: empty, one element, two elements).

Step 2: State the Expected vs Actual

Write down: “For input X, I expected Y, but I got Z.” That makes the bug unambiguous. Sometimes in writing this you already spot the mistake (e.g., “I expected the first index to be 0 but I’m returning 1”).

Step 3: Narrow the Location

Find where the program first goes wrong. Methods:

Print / log: Print key variables at the start of the loop, after conditionals, and before return. Check: “At step 2, is i what I think it is? Is seen correct?”
Binary search: Comment out the second half of the logic (or test with half the input). If the bug disappears, it’s in the part you removed. Narrow until you’re down to a few lines.
Rubber duck: Explain the code line by line to yourself or someone else. Often you hear yourself say something wrong (“and here we add 1 … wait, we shouldn’t”).

Step 4: Find the Root Cause

Don’t stop at “it’s in this function.” Identify the exact wrong assumption or wrong operation. Examples: “I used <= but the problem says strictly less,” “I’m updating the index after the loop so it’s off by one,” “I’m not handling empty input.” Fix that assumption or line.

Step 5: Fix and Re-test

Make the minimal change that fixes the bug. Re-run the failing case and the examples you had passing. Add a test for the edge case you missed so it doesn’t come back.
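
The fix-and-re-test step can be sketched as a small set of assertions. The function index_of_max and the bug it once had (crashing on an empty list) are hypothetical, chosen only to illustrate the workflow; the early return is the “minimal change,” and the assertions cover the failing case plus cases that already passed.

```python
def index_of_max(arr):
    """Return the index of the largest element, or None for empty input."""
    if not arr:  # minimal fix: the (hypothetical) original crashed on []
        return None
    best = 0
    for i in range(1, len(arr)):
        if arr[i] > arr[best]:
            best = i
    return best


def run_regression_tests():
    assert index_of_max([]) is None        # the case that used to fail
    assert index_of_max([4]) == 0          # single element
    assert index_of_max([1, 9, 3]) == 1    # example that already passed
    assert index_of_max([5, 5, 5]) == 0    # ties: first occurrence wins
    return True
```

Keeping the once-failing input in the test list is what stops the same bug from quietly coming back.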

Techniques in Practice

Print Debugging

Insert print(...) (or logs) at critical points: loop start (loop variable, key state), after branches, before return. Inspect the output. Remove or comment out prints when done.

# Example: debugging a loop
def two_sum(arr, target):
    seen = {}  # value -> index where we saw it
    for i in range(len(arr)):
        need = target - arr[i]
        print("i:", i, "arr[i]:", arr[i], "need:", need)  # temporary
        if need in seen:
            return [seen[need], i]
        seen[arr[i]] = i
    return None  # no pair found

Use a Debugger

When you can run the code locally, use a debugger: set breakpoints, step line by line, inspect variables. You see state at each step without adding prints. Essential for larger programs.

Check Edge Cases Explicitly

Many bugs are at boundaries: empty input, one element, first/last index. Add a small test for these. If the code fails on “empty list,” the fix is usually at the start of the function (early return or different initialization).

Compare With a Working Example

Trace through your code by hand on the same input that fails. Write down the value of each variable at each step. Where does your trace first differ from what the code actually does? That’s near the bug.

Common Bug Categories

Off-by-one: Loop runs one too many or too few times; index is 0-based vs 1-based. Check loop bounds and indexing.
Wrong condition: Used < instead of <=, or forgot to handle equality. Re-read the problem and your condition.
State not updated: Forgot to add an element to a set or update a variable inside a loop. Ensure every path that should update state does.
Edge case: Empty input, single element, or “no answer” case not handled. Add explicit checks.
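
As an illustration of the off-by-one category, here is a minimal sketch with a buggy and a fixed version of the same loop. The task (largest difference between neighbouring elements) and both function names are assumptions made up for this example.

```python
def max_adjacent_diff_buggy(arr):
    best = 0
    for i in range(len(arr)):          # bug: on the last i, arr[i + 1] is out of range
        best = max(best, abs(arr[i + 1] - arr[i]))
    return best


def max_adjacent_diff(arr):
    best = 0
    for i in range(len(arr) - 1):      # fixed: the last valid pair is (n - 2, n - 1)
        best = max(best, abs(arr[i + 1] - arr[i]))
    return best
```

The buggy version raises IndexError on any non-empty list; the fixed version also handles empty and single-element input, because range(len(arr) - 1) is then empty and the loop body never runs.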

Common Mistake

Changing code randomly without a hypothesis or without re-running the same failing case. Always reproduce, locate, then fix. One fix at a time.

Interview Insight

When your code fails a test case, say: “Let me trace through with this input.” Walk through your logic step by step and state expected values. You’ll often spot the bug while explaining. That’s more impressive than silent trial-and-error.

Summary

Reproduce the failure with a concrete input; state expected vs actual.
Narrow the bug location (prints, binary search, rubber duck).
Find the root cause (wrong condition, off-by-one, missing edge case), fix it, then re-test.
Watch for off-by-one, wrong conditions, missing state updates, and unhandled edge cases.

1.8 Handling Edge Cases

Introduction

An edge case is an input or situation at the “boundary” of what your solution is designed for: empty input, a single element, maximum size, or a “no valid answer” scenario. Code that works on “normal” examples often fails on edge cases. Handling them explicitly is what separates a quick hack from a robust solution—and what interviewers look for.

Why Edge Cases Matter

In production, edge cases are where systems break: empty list causing a crash, or one user when the logic assumed many. In interviews, test cases often include edge cases on purpose. If you don’t handle them, your solution is incomplete. Thinking about edge cases up front also improves your algorithm design: you clarify the problem and avoid bugs before they happen.

What Counts as an Edge Case?

Depends on the problem. Common categories:

Size Boundaries

Empty: Empty array, empty string, zero elements. Many algorithms assume “at least one”; they crash or return nonsense on empty.
Single element: One item in the list, one character in the string. Loops that assume “first and rest” or “pair” can break.
Two elements: Minimal “non-trivial” case. Good for testing “first and last” or “pair” logic.
Large input: Maximum n. Tests performance and overflow (e.g., integer or recursion depth).

Value Boundaries

Zero, negative, or maximum values: Division by zero, negative indices, or values at the limit of the type.
Duplicates: All elements equal, or many duplicates. “Find two that sum to target” with many repeated numbers can affect indexing or “same index” bugs.
No valid answer: Problem says “return -1 if not found” or “return empty list.” Your code must explicitly handle and return that.

Structural / Problem-Specific

Already sorted / reverse sorted: For sorting or search, these can stress your algorithm.
All same: All zeros, all same character. Can break “find the different one” or “max” logic if you’re not careful.

How to Handle Edge Cases

1. List Them Before Coding

After reading the problem, write down 3–5 edge cases: “What if empty? What if one element? What if no pair exists?” Then ensure your algorithm and code account for each. In an interview, say them out loud: “I’ll need to handle empty input and the case when no two numbers sum to target.”

2. Early Returns / Guards

At the start of the function, check for edge cases and return a defined result. That keeps the main logic simple and avoids special cases inside loops.

def find_max(arr):
    if not arr:           # edge: empty
        return None
    if len(arr) == 1:     # edge: single element (optional, main loop handles it)
        return arr[0]
    # main logic
    result = arr[0]
    for i in range(1, len(arr)):
        if arr[i] > result:
            result = arr[i]
    return result

3. Design the Main Logic to Naturally Handle Boundaries

Sometimes the “normal” logic already works for one element or two (e.g., a loop that runs from 0 to n-1 and updates a result). Other times you need a special case. Prefer one clear path when possible; use early returns when the edge is truly different.

4. Test Edge Cases Explicitly

Before submitting or shipping, run: empty input, one element, two elements, no answer, all same. If any fails, fix the code and add that case to your mental (or automated) test list.

Example: Two Sum Edge Cases

Empty array: Return “no pair” (e.g., [-1, -1] or empty list) without entering the loop.
Single element: Can’t form a pair; return “no pair.”
No pair sums to target: After the loop, return the agreed “not found” value.
Duplicate values: Ensure you don’t use the same index twice (e.g., i and j must be different). Your algorithm should already enforce that (e.g., j > i or storing index when we see a value).
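
The four cases above can be checked against a one-pass hash-map version of two sum. This is a sketch, assuming the agreed “not found” value is [-1, -1]; the problem you are given may specify something else.

```python
def two_sum(arr, target):
    """Return [i, j] with i < j and arr[i] + arr[j] == target, else [-1, -1]."""
    seen = {}  # value -> index of an earlier element
    for i, value in enumerate(arr):
        need = target - value
        if need in seen:           # seen only holds indices < i, so i != j
            return [seen[need], i]
        seen[value] = i            # store after the check to avoid pairing i with itself
    return [-1, -1]                # assumed "no pair" value
```

Empty and single-element inputs need no special guard here: the loop runs zero times or finds nothing in seen, and control falls through to the “not found” return.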

Concept Note

Edge cases are not “extra”—they define the contract of your function. The problem statement usually says what to return when input is empty or when there’s no solution; your code must implement that contract.

Common Mistake

Assuming “input will always have at least two elements” or “there will always be an answer” without checking the problem. If the problem doesn’t guarantee it, handle the opposite case explicitly.

Interview Insight

After reading the problem, say: “I’ll consider edge cases: empty input, single element, and when no valid answer exists.” Then add the checks. Interviewers notice when you proactively handle boundaries.

Summary

Edge cases = boundary inputs or situations: empty, single element, no answer, duplicates, max size, etc.
List them before coding; use early returns or design main logic so boundaries are handled.
Test edge cases explicitly; they define the contract of your solution.

1.9 Problem Decomposition

Introduction

Problem decomposition is the skill of breaking a large, complex problem into smaller subproblems that are easier to solve. You solve the small pieces (often recursively or in order), then combine their results to get the full solution. It’s the same idea behind “divide and conquer,” dynamic programming, and clean system design—and it’s how you avoid feeling overwhelmed by a hard problem.

Why Decompose?

Manageable pieces: A subproblem like “find the max in the left half” is easier to think about than “find the max in the whole array” when you’re building recursion or iteration.
Reuse: The same subproblem often appears many times (e.g., “ways to reach step k” is needed by both step k+1 and step k+2). Solve it once and reuse the result; that’s the heart of memoization and dynamic programming.
Clear structure: Dependencies between subproblems (e.g., “solve A and B before C”) define the order of computation and help you avoid circular or missing steps.

Mental Model

Picture the problem as a tree: the root is the original problem; children are subproblems. You solve leaves first (base cases), then combine upward until you reach the root. The art is choosing a split that makes the subproblems simpler and the combination step cheap.

  Original: "Sort array A[0..n-1]"
       |
       v
  Subproblem 1: Sort A[0..mid-1]    Subproblem 2: Sort A[mid..n-1]
       |                                    |
       v                                    v
  (base: 1 element, done)            (base: 1 element, done)
       |                                    |
       +---------> COMBINE: merge two sorted halves --------> sorted A

How to Decompose

Step 1: Identify Natural Subproblems

Ask: “What smaller version of this problem would help?” Examples: “max of array” → “max of left half” and “max of right half,” then take the larger. “Count ways to reach step n” → “ways to reach n-1” and “ways to reach n-2,” then add. The subproblems should have the same structure as the original (same kind of input/output, smaller size or simpler case).

Step 2: Define the Dependency

Does subproblem A need the result of B? Then B must be solved first. In recursion, that’s “call B, then use its result in A.” In dynamic programming, that’s “fill table in an order so that when we compute A, B is already computed.”

Step 3: Identify Base Cases

What’s the smallest or simplest case you can solve directly? Empty array, single element, zero steps—these stop the recursion or form the first row/column of your DP table. Without clear base cases, decomposition doesn’t terminate.

Step 4: Define How to Combine

Given solutions to the subproblems, how do you get the solution to the bigger problem? Merge two sorted halves, take max of two values, add counts, etc. The combine step should be simpler than solving the whole problem from scratch.

Example: Counting Inversions

Problem: Count pairs (i, j) with i < j and arr[i] > arr[j].

Decomposition: Split array into left and right halves. Count inversions in left, count in right, and count inversions that cross the middle (one in left, one in right). The cross count can be computed during a merge-like pass: when we take an element from the right and there are remaining elements in the left, each of those left elements forms an inversion with this right element. So we get:

Subproblem 1: count inversions in left half (same problem, smaller n).
Subproblem 2: count inversions in right half.
Combine: add left count + right count + cross count (computed in O(n) during merge).

Base case: 0 or 1 element → 0 inversions. This is merge-sort with an extra counter—decomposition turned a hard count into “solve two halves + merge.”
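
A minimal Python sketch of this decomposition follows. The function name and the choice to return both the sorted list and the count are assumptions for illustration.

```python
def count_inversions(arr):
    """Return (sorted copy of arr, number of pairs i < j with arr[i] > arr[j])."""
    n = len(arr)
    if n <= 1:                                   # base case: nothing to invert
        return list(arr), 0
    mid = n // 2
    left, left_count = count_inversions(arr[:mid])
    right, right_count = count_inversions(arr[mid:])

    merged = []
    cross = 0
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            # right[j] is smaller than every remaining left element
            merged.append(right[j])
            j += 1
            cross += len(left) - i
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, left_count + right_count + cross
```

For [2, 4, 1, 3, 5] the inversions are (2, 1), (4, 1), and (4, 3), so the count is 3, all of them found in the combine step.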

Top-Down vs Bottom-Up

Top-down (recursion / memoization): Start from the full problem, recurse to subproblems, combine on the way back. Natural to think about; you must handle overlapping subproblems (memoize) to avoid exponential time.
Bottom-up (tabulation): Solve smallest subproblems first, then larger ones, in order. No recursion; good when dependency order is clear (e.g., “solve for size 1, then 2, then … n”).

Same decomposition; different order of evaluation. Choose based on what’s easier to implement and what fits the problem.
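
To make the contrast concrete, here is a sketch of both styles on the “count ways to reach step n” problem, assuming you may climb 1 or 2 steps at a time and that there is exactly one way to stand at step 0. The function names are illustrative.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def ways_top_down(n):
    """Top-down: recurse from n, caching each subproblem's answer."""
    if n <= 1:                     # base cases: step 0 and step 1
        return 1
    return ways_top_down(n - 1) + ways_top_down(n - 2)


def ways_bottom_up(n):
    """Bottom-up: build answers for 2, 3, ..., n from the base cases."""
    if n <= 1:
        return 1
    prev, curr = 1, 1              # ways(0), ways(1)
    for _ in range(2, n + 1):
        prev, curr = curr, prev + curr
    return curr
```

Both compute the same recurrence; the bottom-up version avoids recursion depth limits and needs only two variables of state.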

Expert Tip

When stuck on a problem, ask: “If I had the answer for a smaller input (n-1, n/2, or a subset), could I use it to get the answer for the full input?” If yes, you’ve found a decomposition.

Interview Insight

Say: “I can break this into …” and name the subproblems and how you’ll combine. That shows structured thinking. Then implement the base case and the combine step; the recursion or loop over subproblems often falls into place.

Summary

Problem decomposition = break a big problem into smaller subproblems, solve them, then combine.
Identify subproblems, their dependencies, base cases, and the combine step.
Same idea underlies recursion, divide-and-conquer, and dynamic programming.

1.10 Pattern Recognition in Problems

Introduction

Many problems that look different on the surface share the same underlying pattern: same kind of input, same kind of operation, same data structure or technique that fits. Once you recognize “this is a two-pointer problem” or “this is a BFS shortest-path,” you can apply a known strategy instead of inventing from scratch. Pattern recognition is what turns experience into speed and confidence.

Why Patterns Matter

Interview and contest problems are often designed to test a specific technique. The problem statement might not say “use a hash map”—you’re expected to see that “find two things that sum to target” or “check if we’ve seen this before” suggests a hash map. The more patterns you know, the faster you match the problem to the right tool and the less you waste time on the wrong approach.

Mental Model

Read the problem → notice structure (sorted? pairs? subsequences? paths?) and operations (find, count, maximize, exists?) → map to a pattern → apply the standard strategy for that pattern. Pattern recognition is the step between “what’s being asked” and “which algorithm/structure to use.”

Common Patterns (High-Level)

You’ll see these again in later sections. Here we name them so you can start building the mapping.

Two Pointers

Two indices moving over a sequence (often from both ends or both from the start). Good for: “two numbers that sum to target” in a sorted array, “remove duplicates in place,” “palindrome check,” “merge two sorted arrays.” Clue: sorted array or string, “pair,” “two indices.”
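
A minimal sketch of the pattern, for “two numbers that sum to target” in an array assumed to be sorted in ascending order (the function name is illustrative):

```python
def pair_with_sum_sorted(arr, target):
    """Return indices (lo, hi) with arr[lo] + arr[hi] == target, or None."""
    lo, hi = 0, len(arr) - 1
    while lo < hi:
        s = arr[lo] + arr[hi]
        if s == target:
            return (lo, hi)
        if s < target:
            lo += 1        # sum too small: only a larger left value can help
        else:
            hi -= 1        # sum too large: only a smaller right value can help
    return None
```

Each step discards one index for good, so the loop does at most n - 1 iterations.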

Sliding Window

A contiguous segment of fixed or variable size that moves one step at a time. Good for: “longest subarray with sum ≤ K” (when all values are non-negative), “minimum window containing all characters,” “max in every window of size k.” Clue: “subarray,” “substring,” “contiguous,” “window.”
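
Here is a sketch of a variable-size window for “longest subarray with sum ≤ K.” It assumes every value is non-negative; with negative values, shrinking the window from the left is no longer guaranteed to be safe, and this approach does not apply. The function name is illustrative.

```python
def longest_subarray_sum_at_most(arr, k):
    """Length of the longest contiguous run with sum <= k (non-negative values)."""
    best = 0
    window_sum = 0
    left = 0
    for right, value in enumerate(arr):
        window_sum += value                      # grow the window to the right
        while window_sum > k and left <= right:  # shrink until the sum fits again
            window_sum -= arr[left]
            left += 1
        best = max(best, right - left + 1)
    return best
```

Both left and right only move forward, so the total work is O(n) even though there is a nested loop.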

Hash Map / Set for Lookup

Store what you’ve seen; check membership or fetch in O(1). Good for: “two sum,” “first duplicate,” “group anagrams,” “subarray with sum 0.” Clue: “find a pair,” “have we seen this,” “count distinct,” “group by.”

Prefix Sum

Precompute cumulative sums (or other aggregates) so range queries are O(1). Good for: “subarray sum equals K,” “range sum queries.” Clue: “sum of range,” “subarray sum,” multiple range queries.
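
A minimal sketch of prefix sums for range-sum queries. The convention that prefix[i] holds the sum of the first i elements, and both function names, are choices made for this example.

```python
def build_prefix(arr):
    """prefix[i] = sum of arr[0..i-1]; prefix[0] = 0."""
    prefix = [0]
    for value in arr:
        prefix.append(prefix[-1] + value)
    return prefix


def range_sum(prefix, l, r):
    """Sum of arr[l..r], both ends inclusive, in O(1)."""
    return prefix[r + 1] - prefix[l]
```

For arr = [3, 1, 4, 1, 5], build_prefix gives [0, 3, 4, 8, 9, 14], and range_sum(prefix, 1, 3) is 9 - 3 = 6. The extra leading 0 removes the special case for ranges that start at index 0.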

Binary Search (on Answer or on Index)

When the answer or the “split point” is in a sorted space, binary search can reduce tries. Good for: “find minimum capacity,” “search in sorted array,” “kth smallest.” Clue: sorted data, “minimum maximum,” “feasibility check.”
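
The simplest instance, “search in sorted array,” can be sketched as follows. It assumes arr is sorted in ascending order and returns -1 when the target is absent; that return value is a convention picked for this example.

```python
def binary_search(arr, target):
    """Return an index of target in sorted arr, or -1 if it is not present."""
    lo, hi = 0, len(arr) - 1
    while lo <= hi:                 # search space is arr[lo..hi], inclusive
        mid = (lo + hi) // 2
        if arr[mid] == target:
            return mid
        if arr[mid] < target:
            lo = mid + 1            # target can only be to the right of mid
        else:
            hi = mid - 1            # target can only be to the left of mid
    return -1
```

The same skeleton carries over to “binary search on the answer”: replace the comparison with a feasibility check on the candidate value mid.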

BFS / DFS (Graph or Implicit Graph)

Explore neighbors level by level (BFS) or depth-first (DFS). Good for: shortest path in unweighted graph, connected components, “reachable,” “level order.” Clue: “shortest path,” “neighbors,” “grid,” “level.”
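
As a sketch of BFS on an implicit graph, the function below finds the fewest moves between two cells of a grid. The encoding (0 = open, 1 = wall), four-directional movement, and the -1 “unreachable” value are all assumptions for this example.

```python
from collections import deque


def shortest_path_length(grid, start, goal):
    """Fewest up/down/left/right moves from start to goal, or -1 if unreachable."""
    if not grid or not grid[0]:
        return -1
    rows, cols = len(grid), len(grid[0])
    queue = deque([(start, 0)])     # (cell, distance from start)
    visited = {start}
    while queue:
        (r, c), dist = queue.popleft()
        if (r, c) == goal:
            return dist
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == 0 and (nr, nc) not in visited):
                visited.add((nr, nc))   # mark when enqueued to avoid duplicates
                queue.append(((nr, nc), dist + 1))
    return -1
```

Because cells are dequeued in order of distance, the first time the goal comes off the queue its distance is the shortest one.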

Dynamic Programming

Optimal substructure + overlapping subproblems. Good for: “maximum sum,” “count ways,” “longest increasing subsequence,” “edit distance.” Clue: “maximum/minimum,” “count,” “choose or skip,” “subsequence.”

How to Get Better at Recognition

Solve by pattern: After solving a problem, label it (“two pointers,” “sliding window”). Next time you see similar wording or structure, try that pattern first.
Note keywords: “Subarray” often suggests sliding window or prefix sum; “two numbers” in sorted array suggests two pointers; “shortest path” in unweighted graph suggests BFS.
Use constraints: Large n with “find pair” often rules out O(n²) brute force and points to hash or two pointers. Small n might allow brute force or bitmask DP.

Concept Note

Patterns are not rigid. One problem might be solvable with two pointers or a hash map. The goal is to narrow the set of strategies quickly so you don’t wander. If one pattern doesn’t fit, try the next best match.

Interview Insight

After reading the problem, say: “This looks like a two-pointer / sliding-window / … problem because …” Then outline the standard approach. That signals you’ve seen similar problems and know the template.

Summary

Pattern recognition = matching problem structure and operations to a known technique (two pointers, hash, sliding window, BFS, DP, etc.).
Use keywords, constraints, and experience to narrow the strategy.
Naming the pattern and applying the template speeds you up and impresses interviewers.

1.11 Fermi Estimation for System Design

Introduction

Fermi estimation (named after physicist Enrico Fermi) is the skill of getting an order-of-magnitude answer to a question using rough assumptions and simple arithmetic—no calculator, no exact data. “How many piano tuners are in Chicago?” “How many queries per second does Google handle?” You break the question into a few factors, estimate each to the nearest power of 10 or a round number, multiply or divide, and get a number that’s right within a factor of 10 or so. In system design interviews and in real-world capacity planning, this is how you sanity-check ideas and communicate scale.

Why It Matters

Interviewers ask “how many servers do you need?” or “what’s the storage for 1 billion users?” They don’t expect a precise number—they want to see that you can reason about scale: break the problem down, estimate each piece, and combine. Fermi estimation is that process. In practice, it’s how you quickly check if a design is in the right ballpark before diving into details.

How to Do It

Step 1: Break the Question Into Factors

Turn the big question into a product or quotient of quantities you can guess. Example: “Queries per second for Google search?” → (queries per user per day) × (number of users) ÷ (seconds per day). Each factor is easier to estimate than the whole.

Step 2: Estimate Each Factor

Use round numbers and powers of 10. “Number of people who search on a given day” ≈ 10^9 (world population is ~8 billion, but not everyone is online or searches every day). “Searches per person per day” might be 1–10; say 5. “Seconds per day” = 24 × 3600 = 86,400 ≈ 10^5. It’s okay to be wrong by 2–3×; we’re aiming for order of magnitude.

Step 3: Multiply or Divide

Do the arithmetic. Prefer mental math: 5 × 10^9 / 10^5 = 5 × 10^4 = 50,000. So “on the order of 10^4 to 10^5 queries per second” is a valid answer. Stating “roughly 50k QPS” or “tens of thousands” is better than “I don’t know” or an overly precise wrong number.
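
The same arithmetic can be written out as a short sketch, which also shows how little the rounding of “seconds per day” matters. The inputs are the rough assumptions from the steps above, not measured data, and the function name is illustrative.

```python
SECONDS_PER_DAY_EXACT = 24 * 3600      # 86,400
SECONDS_PER_DAY_ROUNDED = 10**5        # Fermi rounding


def estimate_qps(users, queries_per_user_per_day, seconds_per_day):
    """Average queries per second = total daily queries / seconds in a day."""
    return users * queries_per_user_per_day / seconds_per_day


# Assumed inputs: 10^9 users, 5 searches each per day.
rounded_qps = estimate_qps(10**9, 5, SECONDS_PER_DAY_ROUNDED)   # 50,000
exact_qps = estimate_qps(10**9, 5, SECONDS_PER_DAY_EXACT)       # about 57,900
```

Both results land in the “tens of thousands” range, which is all a Fermi estimate claims.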

Step 4: Sanity Check

Does the result make sense? If you got 10^9 QPS for Google, that might be too high (global internet traffic is finite). If you got 10 QPS, that’s too low for a global product. Adjust assumptions if the result is obviously off.

Example: How Many Servers for a Video Platform?

Question: Roughly how many servers does a YouTube-scale video platform need?

Breakdown:

Assume 1 billion daily active users, each watching ~30 min of video per day → 0.5 billion hours of video streamed per day.
Assume average bitrate ~1 Mbps (mixed quality) → 0.5 × 10^9 hours × 3600 s/hour × 10^6 bits/s ≈ 1.8 × 10^18 bits per day. In bytes, that is ~2 × 10^17 bytes/day. Dividing the bits by ~10^5 seconds per day gives ≈ 2 × 10^13 bits per second (roughly 20 Tbps of average egress globally).
If one server can serve ~10 Gbps (simplified), we need on the order of 20 Tbps / 10 Gbps ≈ 2,000 servers just for average egress. That’s too low because we ignored replication, peaks, storage, etc. So we might say “order of 10^5 to 10^6 machines” when we include redundancy, storage servers, and peak load. The exact number isn’t the point; the reasoning is.

Interviewers care that you: (1) break it down, (2) state assumptions, (3) do the math, (4) sanity check.

Rules of Thumb

Round boldly: 7 billion → 10^9; 24 × 3600 → 10^5. Fewer significant figures = faster and often “good enough.”
State assumptions: “I’m assuming 10 million users …” so the interviewer can correct you or follow your logic.
One or two significant figures: “About 50k” or “on the order of 10^5” is fine. Avoid fake precision like “47,382.”

Example

“How much storage for 1 billion users with 100 MB each?” → 10^9 × 10^8 bytes = 10^17 bytes = 100 PB. Saying “around 100 petabytes” or “order of 10^17 bytes” is a valid Fermi answer.
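
A quick sketch of that storage estimate, using decimal units (1 MB = 10^6 bytes, 1 PB = 10^15 bytes). The constants and function name are illustrative.

```python
BYTES_PER_MB = 10**6
BYTES_PER_PB = 10**15


def total_storage_bytes(users, mb_per_user):
    """Raw storage with no replication or overhead (a deliberate simplification)."""
    return users * mb_per_user * BYTES_PER_MB


total = total_storage_bytes(10**9, 100)   # 10^17 bytes
petabytes = total / BYTES_PER_PB          # 100.0
```

In a real design you would then multiply by a replication factor, stated as one more explicit assumption.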

Interview Insight

When asked for scale or capacity, don’t freeze. Say: “Let me break this down. We need … I’ll assume … So that’s roughly … Does that order of magnitude make sense?” Showing the process matters more than the exact number.

Summary

Fermi estimation = order-of-magnitude answers using rough assumptions and simple arithmetic.
Break the question into factors, estimate each (powers of 10, round numbers), multiply/divide, then sanity check.
State assumptions; aim for one or two significant figures. The reasoning is what interviewers evaluate.

1.12 Invariants & Monovariants in Logical Problems

Introduction

In logical and algorithmic problems, an invariant is something that stays true throughout a process (e.g., before and after each step of a loop or each move in a game). A monovariant (or monotonic variant) is a quantity that only moves in one direction—usually it only decreases (or only increases). Invariants help you prove correctness or narrow down possible states; monovariants help you prove that a process eventually terminates. Together they’re powerful tools for reasoning about algorithms and puzzles.

Why They Matter

In interviews you might get a puzzle or a “prove this loop terminates” question. In competitive programming, some problems are solved by finding an invariant that must hold and then constructing the answer from it. In code, loop invariants are what you use to reason that your algorithm is correct. So even if you don’t hear the words “invariant” and “monovariant” every day, the ideas are central to rigorous thinking.

Invariants

Definition

An invariant is a condition or quantity that remains true (or unchanged) every time a certain operation is applied. You typically state it “at the start of the loop” and “after each iteration”—and show that if it was true before the step, it’s still true after.

Example: Sum of Array Mod 2

Suppose you have a game where in each move you pick two elements and replace them with their sum. The sum of the whole array never changes (a and b are removed and a + b is added), so in particular its parity (the sum mod 2, i.e., even or odd) never changes. So if the initial sum is odd, the final single number (if you merge everything) must be odd. That invariant restricts what the answer can be.
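
You can watch this invariant hold in a small simulation. The sketch below merges random pairs until one number is left and asserts after every move that the parity of the total is unchanged; the function name and the use of a seeded random generator are choices made for this example, and the input is assumed to be non-empty.

```python
import random


def merge_until_one(values, seed=0):
    """Repeatedly replace two random elements by their sum; return the last one."""
    rng = random.Random(seed)
    items = list(values)                 # assumed non-empty
    parity = sum(items) % 2              # the invariant we track
    while len(items) > 1:
        i, j = rng.sample(range(len(items)), 2)
        merged = items[i] + items[j]
        for idx in sorted((i, j), reverse=True):
            items.pop(idx)               # remove the higher index first
        items.append(merged)
        assert sum(items) % 2 == parity  # invariant holds after every move
    return items[0]
```

For [3, 4, 6] the total is 13, so whatever order the merges happen in, the final number is odd.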

Example: Loop Invariant in Code

In “find max in array,” a useful invariant is: “At the start of each iteration, max_so_far is the maximum of all elements we’ve seen so far.” Before the loop it’s true (we’ve seen only the first element). Each step we compare the next element and update; so after the step we’ve still seen a prefix and max_so_far is still the max of that prefix. When the loop ends we’ve seen the whole array, so max_so_far is the global max. That’s how you prove the loop correct.
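
The invariant can be written directly into the code as an assertion. This is a teaching sketch: the assert recomputes the maximum of the prefix on every iteration, which makes the function O(n²), so you would remove it once convinced. Raising ValueError on empty input is an assumption made for this example.

```python
def find_max(arr):
    if not arr:
        raise ValueError("find_max requires a non-empty list")
    max_so_far = arr[0]                      # invariant holds for the prefix arr[:1]
    for i in range(1, len(arr)):
        # Invariant at the start of each iteration:
        # max_so_far is the maximum of the elements seen so far, arr[:i].
        assert max_so_far == max(arr[:i])
        if arr[i] > max_so_far:
            max_so_far = arr[i]
    return max_so_far                        # prefix is now the whole array
```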

Monovariants

Definition

A monovariant is a quantity that (under the rules of the process) only increases or only decreases. For example: “the number of inversions in the array” might only decrease when we swap two elements in a certain way. If it’s bounded below (e.g., by 0) and it’s integer, it can only decrease a finite number of times—so the process must stop. That’s a termination argument.

Example: Distance to Goal

In a BFS over a grid, “distance from start” never decreases as we move to deeper layers, so it is a monovariant (monotonically non-decreasing in the order cells are processed). BFS itself stops when we find the goal or exhaust the graph, and a related monovariant explains why it cannot run forever: the number of visited cells only increases and is bounded above by the size of the grid.

Example: Inversion Count

In bubble sort, each swap exchanges two adjacent out-of-order elements, which reduces the total number of inversions by exactly 1. So “inversion count” is a monovariant that decreases with each swap. It’s bounded below by 0, so only finitely many swaps can happen. Once no adjacent pair is out of order, there are 0 inversions, i.e., the array is sorted. That proves bubble sort eventually terminates (and with a sorted array).
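
The sketch below checks this monovariant while sorting: after every swap it recounts inversions with a brute-force helper and asserts the count went down by exactly one. Both function names are illustrative, and the recount is for demonstration only since it makes the sort much slower.

```python
def count_inversions_naive(arr):
    """O(n^2) count of pairs i < j with arr[i] > arr[j]."""
    n = len(arr)
    return sum(1 for i in range(n) for j in range(i + 1, n) if arr[i] > arr[j])


def bubble_sort_checked(arr):
    """Return (sorted copy, number of swaps), asserting the monovariant."""
    items = list(arr)
    inversions = count_inversions_naive(items)
    swaps = 0
    swapped = True
    while swapped:
        swapped = False
        for i in range(len(items) - 1):
            if items[i] > items[i + 1]:
                items[i], items[i + 1] = items[i + 1], items[i]
                swaps += 1
                swapped = True
                new_count = count_inversions_naive(items)
                assert new_count == inversions - 1   # strictly decreases by 1
                inversions = new_count
    assert inversions == 0                           # sorted when no swaps remain
    return items, swaps
```

A side effect of the argument: the number of swaps bubble sort performs equals the initial inversion count.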

Using Invariants and Monovariants Together

In a puzzle: first find an invariant that limits what’s possible (e.g., “parity of sum”). Then, if the process has steps that change the state, find a monovariant that only decreases (or increases) and is bounded, so the process must stop. The invariant might tell you the only possible final state; the monovariant tells you you’ll get there.

Concept Note

Invariant = “X stays true (or unchanged).” Use it to prove correctness or to restrict possible answers. Monovariant = “Y only decreases (or only increases) and is bounded.” Use it to prove termination.

Example

“Prove that this loop terminates.” Identify a quantity that (1) decreases (or increases) each iteration, and (2) has a lower (or upper) bound. For a loop that halves n each time while n > 1, “n” is a monovariant: it is a positive integer that strictly decreases and is bounded below by 1, so the loop must stop. Because n is halved at every step, the loop runs about log2(n) times, i.e., O(log n).
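
A minimal sketch of the halving loop, with the monovariant asserted inside it. It assumes n is a positive integer and uses integer division; the function name is illustrative.

```python
def halving_steps(n):
    """Count how many times n can be halved (integer division) before reaching 1."""
    steps = 0
    while n > 1:
        new_n = n // 2
        assert 1 <= new_n < n    # monovariant: strictly decreasing, bounded below by 1
        n = new_n
        steps += 1
    return steps
```

For n = 8 the loop runs 3 times (8 → 4 → 2 → 1), and for n = 1000 it runs 9 times, matching floor(log2(n)).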

Interview Insight

When asked “why does this terminate?” or “what’s always true here?”, name an invariant or monovariant and state why it holds. For example: “The number of inversions decreases with each swap and is non-negative, so we can only do finitely many swaps.”

Summary

An invariant is a condition or quantity that stays true (or unchanged) through each step; use it to prove correctness or narrow possibilities.
A monovariant is a quantity that only increases or only decreases and is bounded; use it to prove termination.
Together they support rigorous reasoning about loops, games, and puzzles.

2.1 Python Data Types

Introduction

Every value in Python has a type: it’s an integer, a string, a list, a dictionary, and so on. The type determines what you can do with the value (index it, add to it, hash it) and how it behaves in memory (mutable vs immutable, shared by reference). For DSA, you need a clear picture of the built-in types so you can pick the right one and avoid subtle bugs—especially around mutability and copying.

Why Data Types Matter for DSA

Different types support different operations at different costs. You use a list when you need order and index access; you use a set or dict when you need fast membership or key lookup. Hashable types (e.g., str, or a tuple whose items are all hashable) can be used as dictionary keys and in sets; the mutable built-ins (list, dict, set) cannot. Knowing types helps you reason about time complexity (e.g., “in” on a list is O(n), on a set is O(1) average) and about correctness (e.g., “did I just mutate a shared list?”).

Numeric Types

int

Integers in Python have arbitrary precision: they can be as large as memory allows. There’s no fixed 32- or 64-bit overflow like in C or Java (though operations get slower for huge numbers). For DSA, this means you usually don’t worry about integer overflow in pure Python; in contests or when interfacing with other systems, modulo arithmetic might still be required by the problem.

x = 42
y = 10**100   # valid in Python
print(type(x))  # <class 'int'>

float

Floating-point numbers are approximate. Avoid using == for equality; use tolerance checks (e.g., abs(a - b) < 1e-9) or integer arithmetic when possible. For DSA, many problems use only integers; when floats appear, be aware of precision issues.
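
A short illustration of why == is risky for floats, and two ways to compare with a tolerance. The tolerance values are illustrative; pick ones that suit the scale of your numbers.

```python
import math

a = 0.1 + 0.2
b = 0.3

exact_equal = (a == b)                            # False: a is 0.30000000000000004
within_tolerance = abs(a - b) < 1e-9              # True: absolute tolerance check
close = math.isclose(a, b, rel_tol=1e-9)          # True: relative tolerance check
```

math.isclose uses a relative tolerance by default, so when comparing against 0 you need to pass abs_tol as well.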

bool

True and False. They’re a subtype of int (True is 1, False is 0), but for clarity use them as booleans. Used in conditionals and as results of comparisons.

Sequence Types

str (string)

Immutable sequence of Unicode characters. Indexing s[i] is O(1); slicing s[l:r] creates a new string and costs O(r − l). Building a string with + in a loop can take O(n²) total for n characters, because each + may copy everything built so far; prefer ''.join(...) for building strings. For DSA, strings often appear in pattern matching, palindromes, and parsing.

s = "hello"
# s[0] = 'H'   # TypeError: str does not support item assignment
t = s.upper()  # returns new string; s unchanged
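
For building strings, a common sketch is to collect the pieces in a list and join them once at the end. The function below is a made-up example of that idiom.

```python
def repeat_letters(word, times):
    """Return word with each character repeated `times` times."""
    parts = []                     # list append is amortized O(1)
    for ch in word:
        parts.append(ch * times)
    return "".join(parts)          # one pass to build the final string


digits = "".join(str(d) for d in range(5))   # "01234"
```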

list

Mutable, ordered sequence. Append at end is amortized O(1); insert at front is O(n). Indexing and assignment arr[i] = x are O(1). The main workhorse for arrays in DSA. Supports negative indices: arr[-1] is the last element.

arr = [3, 1, 4, 1, 5]
arr.append(9)    # O(1) amortized
arr[0] = 10     # O(1)
# arr[10]       # IndexError

tuple

Immutable sequence. Same indexing and slicing as list, but you can’t change elements. Use when you need a fixed sequence (e.g., (x, y) coordinates, return multiple values) or when you need a hashable value (e.g., as dict key or set element).

t = (1, 2, 3)
# t[0] = 10    # TypeError
point = (3, 4)  # common for (x, y) coordinates

Mapping and Set Types

dict

Key–value mapping. Keys must be hashable (immutable types: int, str, tuple, etc.). Lookup, insert, and delete by key are O(1) average. Essential for “count frequency,” “have we seen this,” “two sum” style problems. Iteration order is insertion order (Python 3.7+).

d = {}
d["a"] = 1
d["b"] = 2
if "a" in d:      # O(1) average
    print(d["a"])

set

Unordered collection of unique, hashable elements. Add, remove, and “in” are O(1) average. Use when you need fast membership or to remove duplicates. No indexing; iteration order is arbitrary.

s = {1, 2, 3}
s.add(2)         # no change; 2 already there
print(len(s))     # 3

None

None is the single value of type NoneType. Used to mean “no value” or “missing.” Functions that don’t explicitly return something return None. In DSA you’ll use it for “not found” or optional results when you don’t want to use -1 or an exception.

Mutability and Why It Matters

Mutable objects (list, dict, set) can be changed in place. When you pass them to a function, the function gets a reference to the same object—so if the function modifies it, the caller sees the change. That’s useful when you want to build or update a structure, but dangerous when you didn’t intend to share: two names pointing to the same list can cause bugs if you assume they’re independent.

Immutable objects (int, float, str, tuple) can’t be changed. “Operations” that look like changes (e.g., s += 'x') create a new object. So you can’t accidentally mutate shared state; also, immutable values are hashable (for a tuple, as long as every item in it is hashable) and can go in sets and as dict keys.
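
The difference shows up clearly in a few lines. In this sketch, append_zero is a hypothetical helper that mutates its argument.

```python
def append_zero(items):
    items.append(0)        # mutates the caller's list in place


a = [1, 2]
b = a                      # b is another name for the same list
c = a.copy()               # c is an independent (shallow) copy
append_zero(a)
# a == [1, 2, 0], b == [1, 2, 0], c == [1, 2]

s = "ab"
t = s                      # same string object for now
s += "c"                   # creates a new string; t still refers to "ab"
# s == "abc", t == "ab"
```

Note that a.copy() is shallow: nested lists inside it would still be shared.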

Common Mistake

Modifying a list while iterating over it can skip elements or behave oddly. If you need to change a list during a loop, iterate over a copy (e.g., for x in list(arr):) or use a while loop and update the index yourself. Same idea: be clear about who “owns” the data and whether you’re sharing or copying.
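
Here is a small demonstration of the problem and two ways around it. The task (removing even numbers) and the function names are chosen only for illustration.

```python
def remove_evens_buggy(arr):
    for x in arr:                  # iterating over the list being modified
        if x % 2 == 0:
            arr.remove(x)          # shifts later items left; the next one is skipped
    return arr


def remove_evens_copy(arr):
    for x in list(arr):            # iterate over a snapshot instead
        if x % 2 == 0:
            arr.remove(x)
    return arr


def remove_evens_new(arr):
    return [x for x in arr if x % 2 != 0]   # build a new list; input untouched
```

On [2, 4, 6, 1] the buggy version returns [4, 1], because removing 2 slides 4 into the slot the loop has already passed. Building a new list is usually the simplest option and is O(n), whereas repeated remove calls are O(n²).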

Type Checking

type(x) returns the type of x. isinstance(x, int) is True if x is an int (or a subclass). For branching on type, isinstance is preferred because it respects inheritance. In DSA code you often assume types from the problem statement; type hints (e.g., def f(arr: list[int]) -> int:) document and can be checked with tools.
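
One place where the inheritance point matters is bool, which is a subclass of int. The sketch below uses a made-up describe function to show the effect and why the order of checks matters.

```python
bool_is_int = isinstance(True, int)    # True: bool is a subclass of int
exact_type_is_int = type(True) is int  # False: the exact type is bool


def describe(x):
    """Classify a value; bool must be tested before int."""
    if isinstance(x, bool):
        return "bool"
    if isinstance(x, int):
        return "int"
    if isinstance(x, (list, tuple)):   # a tuple of types checks several at once
        return "sequence"
    return "other"
```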

Quick Reference: DSA-Oriented

Need ordered sequence, index access, append: list.
Need fast “have we seen this” or unique elements: set.
Need fast lookup by key or count by key: dict.
Need immutable sequence (e.g., as dict key): tuple.
Need sequence of characters, often read-only: str.

Summary

Python’s main types for DSA: int, float, bool, str, list, tuple, dict, set, None.
Mutability matters: mutable objects (list, dict, set) can be changed in place, so a function that receives one can modify the caller’s object; immutable objects (int, str, tuple) can’t be changed, are hashable, and are safe to share.
Choose the type that supports the operations you need at the right cost (e.g., set/dict for O(1) lookup).

2.2 Conditionals & Loops

Introduction

Conditionals let your program choose different paths (if this, do that; else do something else). Loops let you repeat a block of code—over a sequence, a range of indices, or until a condition is false. Together they form the control flow of almost every algorithm. For DSA you must be fluent: clean conditionals for edge cases and branches, and correct loops with the right bounds and iteration style.

Conditionals

if / elif / else

Execute a block only when its condition is true. elif is “else if”; only one branch runs. Indentation defines the block (typically 4 spaces).

if n < 0:
    print("negative")
elif n == 0:
    print("zero")
else:
    print("positive")

Truthiness

Conditions don’t have to be strictly True or False. Values are “truthy” or “falsy”: False, None, 0, 0.0, '', [], (), {}, and set() are falsy; most other values are truthy. So if arr: means “if arr is non-empty”; if not arr: means “if arr is empty.” Use this for clean guards.

if not arr:
    return 0
if key in d:   # key exists in dict
    ...

Comparison and Logical Operators

Comparisons: <, <=, >, >=, ==, !=. Chained comparisons: a < b < c is equivalent to a < b and b < c. Logical: and, or, not. Short-circuit: and stops at first falsy; or stops at first truthy. Use parentheses when it helps readability.
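Short-circuiting is what makes bounds guards safe. A minimal sketch, with an illustrative function name:

```python
def in_bounds_and_matches(arr, i, target):
    # `0 <= i < len(arr)` is a chained comparison.
    # `and` short-circuits, so arr[i] is evaluated only when i is a valid index.
    return 0 <= i < len(arr) and arr[i] == target


ok = in_bounds_and_matches([5, 6, 7], 1, 6)
out_of_range = in_bounds_and_matches([5, 6, 7], 5, 6)   # no IndexError raised
```

Swapping the two operands of and would raise IndexError for i = 5, because arr[i] would be evaluated first.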

Loops

for loop

Iterate over an iterable: a sequence (str, list, tuple) or something that yields values (e.g., range).

for x in [1, 2, 3]:
    print(x)           # 1, 2, 3

for i in range(5):     # i = 0, 1, 2, 3, 4
    print(i)

for i in range(2, 10, 2):  # start 2, stop 10 (exclusive), step 2 → 2,4,6,8
    print(i)

range(n) gives 0 to n-1 (n values). range(a, b) gives a to b-1. range(a, b, step) goes from a by step, stopping before b. So “iterate indices 0 to n-1” is for i in range(len(arr)); “iterate indices n-1 down to 0” is for i in range(len(arr)-1, -1, -1).

enumerate

When you need both index and value in a loop, use enumerate(iterable, start=0). It yields (index, value) pairs. Avoids manual index management and off-by-one errors.

arr = [10, 20, 30]
for i, x in enumerate(arr):
    print(i, x)   # prints "0 10", then "1 20", then "2 30"

for i, x in enumerate(arr, start=1):
    print(i, x)   # prints "1 10", then "2 20", then "3 30"

while loop

Repeat while a condition is true. Use when the number of iterations isn’t known in advance (e.g., “while the stack is not empty,” “while n > 0”). Ensure the condition eventually becomes false to avoid infinite loops.

n = 10
while n > 0:
    print(n)
    n -= 1

break and continue

break: exit the innermost loop immediately. Use when you’ve found what you need (e.g., found a pair that sums to target).

continue: skip the rest of the current iteration and go to the next. Use to skip unwanted elements without nesting.

for i in range(n):
    if arr[i] < 0:
        continue   # skip negative
    process(arr[i])

else on Loops

A for or while can have an else block. It runs only if the loop completes without hitting a break. Useful for “search and report if not found.”

for x in arr:
    if x == target:
        print("Found")
        break
else:
    print("Not found")  # runs only if we never broke

Nested Loops and Complexity

A loop inside a loop typically does work proportional to the product of the iteration counts. Two nested loops over n elements → O(n²). That’s why we optimize by removing inner loops (e.g., with a hash map) when possible. Be aware of what the inner loop’s body costs: if each inner iteration is O(1), the total is O(n²); if each inner iteration itself does O(n) work (e.g., a slice or an “in” check on a list), the total can be O(n³).
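To make the trade-off concrete, here is a sketch of the same check written both ways. The problem (“does the list contain a duplicate?”) and the function names are chosen for illustration:

```python
def has_duplicate_quadratic(arr):
    # Compare every pair once: about n*(n-1)/2 comparisons, O(n^2).
    n = len(arr)
    for i in range(n):
        for j in range(i + 1, n):
            if arr[i] == arr[j]:
                return True
    return False


def has_duplicate_linear(arr):
    # One pass; the set replaces the inner loop with an O(1) average lookup.
    seen = set()
    for x in arr:
        if x in seen:
            return True
        seen.add(x)
    return False
```

The linear version trades O(n) extra space for the removed inner loop, and it assumes the elements are hashable.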

Common Patterns in DSA

Iterate by index: for i in range(len(arr)) when you need to read or write arr[i] or use i in logic.
Iterate by value: for x in arr when you only need the elements.
Index and value: for i, x in enumerate(arr).
Reverse index: for i in range(len(arr)-1, -1, -1) or for x in reversed(arr) (latter doesn’t give index).

Common Mistake

Off-by-one with range: range(n) gives 0..n-1, not 0..n. To run from 0 to n inclusive you need range(n+1). Also: modifying a list while iterating with for x in arr can skip or repeat elements; iterate over a copy or use indices and update carefully.

Expert Tip

Prefer for i, x in enumerate(arr) over for i in range(len(arr)): x = arr[i] when you need both—it’s clearer and less error-prone. Use while when the stopping condition is “until structure is empty” (e.g., stack) or “until state changes” (e.g., binary search on answer).

Summary

Conditionals: if / elif / else; use truthiness (if arr:, if not arr:) for clean guards.
for: over iterables and range; use enumerate for index + value; range(n) is 0..n-1.
while: when iterations aren’t known in advance; ensure condition eventually becomes false.
break exits the loop; continue skips to the next iteration; else on a loop runs when no break occurred.

2.3 Functions

Introduction

A function is a named block of code that takes inputs (arguments), does work, and optionally returns a value. Functions let you reuse logic, structure programs into clear steps, and express algorithms in small, testable pieces. For DSA, almost every solution is written as one or more functions—so you need a solid grasp of how to define them, pass data in and out, and avoid common pitfalls (especially with mutable arguments and scope).

Why Functions Matter for DSA

In problems you’ll write a function that takes the input (e.g., an array and a target) and returns the answer. Recursion is “a function that calls itself.” Helper functions keep your main solution readable (e.g., “is_valid,” “merge”). Understanding parameters (by value vs by reference), return values, and default arguments helps you avoid bugs and write clean, interview-ready code.

Defining a Function

Use the def keyword, the function name, parentheses with parameters, and a colon. The body is indented. Execution starts at the first line of the body and ends at return or at the end of the block (then the function returns None).

def greet(name):
    return "Hello, " + name

result = greet("Alice")   # result is "Hello, Alice"

Parameters and Arguments

Parameters are the names listed in the def line; arguments are the values you pass when you call the function. In Python, arguments are passed by object reference: the parameter name is bound to the same object the caller passed. For immutable types (int, str, tuple), reassigning the parameter doesn’t affect the caller. For mutable types (list, dict), modifying the object does affect the caller—because it’s the same object.
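The difference between rebinding a parameter and mutating the object it refers to can be seen in a few lines (all function names here are illustrative):

```python
def rebind(n):
    n = n + 1            # rebinds the local name only


def mutate(items):
    items.append(99)     # changes the caller's list object


def rebind_list(items):
    items = [0]          # local name now points at a new list; caller unaffected


count = 5
rebind(count)

nums = [1, 2]
mutate(nums)
rebind_list(nums)
```

After these calls count is still 5, while nums has gained the appended 99. Reassignment never reaches the caller, even for a list.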

Positional and Keyword Arguments

Arguments can be passed by position or by name. f(1, 2) passes 1 and 2 to the first and second parameters. f(a=1, b=2) or f(1, b=2) uses names; keyword arguments must come after positional ones in a call.

def add(a, b):
    return a + b

add(3, 5)       # 8, positional
add(a=3, b=5)   # 8, keyword
add(3, b=5)     # 8, mixed

Default Parameter Values

You can give a parameter a default so callers can omit it. Defaults are evaluated once at function definition time—so avoid using mutable defaults (like def f(arr=[])); that one list is shared across all calls that omit arr. Use def f(arr=None) and if arr is None: arr = [] instead.

def power(x, n=2):
    return x ** n

power(5)     # 25
power(5, 3)  # 125

Common Mistake

Mutable default argument: def f(x, arr=[]): arr.append(x); return arr. Every call that omits arr shares the same list. Use arr=None and create a new list inside the function when arr is None.
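A runnable sketch of the pitfall and the usual fix (collect_buggy and collect are hypothetical names):

```python
def collect_buggy(x, arr=[]):
    # The default list is created once, when `def` runs.
    arr.append(x)
    return arr


def collect(x, arr=None):
    # A fresh list is created on every call that omits arr.
    if arr is None:
        arr = []
    arr.append(x)
    return arr


first = collect_buggy(1)
second = collect_buggy(2)   # same list object as `first`, now holding both values

a = collect(1)
b = collect(2)
```

Because first and second are the same object, inspecting first after the second call also shows both values.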

Return Values

return exits the function and sends a value back to the caller. You can return one value, multiple values (as a tuple—often unpacked by the caller), or nothing (returns None). Returning multiple values is just return a, b; the caller gets (a, b) and can write x, y = f().

def min_max(arr):
    if not arr:
        return None, None
    return min(arr), max(arr)

lo, hi = min_max([3, 1, 4])  # lo=1, hi=4

Scope

Variables defined inside a function are local to that function; they aren’t visible outside. Variables defined at the top level (module level) are global. Reading a global name inside a function is allowed; assigning to it (e.g., count = 0) creates a new local name unless you declare global count. For DSA, prefer passing values in and returning them—avoid global state so your functions are easy to test and reason about.
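The scope rules can be checked with a small module-level sketch. The names are illustrative, and the code assumes it runs at the top level of a script:

```python
total = 0


def shadow(x):
    total = x            # creates a local `total`; the global is unchanged
    return total


def broken_increment(x):
    total += x           # assignment makes `total` local, so the read fails


def increment_global(x):
    global total
    total += x           # now refers to the module-level name


def add(running_total, x):
    # Preferred style: no global state, value in and value out.
    return running_total + x


shadow(5)
after_shadow = total

try:
    broken_increment(1)
    failed = False
except UnboundLocalError:
    failed = True

increment_global(5)
```

The add version is the easiest to test because its result depends only on its arguments.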

Mutability and Side Effects

If you pass a list and the function appends to it or changes elements, the caller’s list is modified. That’s useful when you intentionally want to build or update a structure in place (e.g., “fill this list with results”). It’s a bug when the caller didn’t expect the input to change. When in doubt, don’t mutate the input unless the problem or API says so; if you need to return a modified copy, build a new list/dict and return it.

Expert Tip

In interviews, state your contract: “This function takes the array and target and returns the two indices; it does not modify the input array.” Then implement accordingly—copy if you need to sort or mutate internally, or work on the original if the problem allows in-place changes.
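As a sketch of such a contract in code, compare a version that leaves its input alone with one that sorts in place. The k-th smallest problem is chosen only for illustration:

```python
def kth_smallest(arr, k):
    """Return the k-th smallest element (1-based) without modifying arr."""
    ordered = sorted(arr)      # sorted() builds a new list; arr is untouched
    return ordered[k - 1]


def kth_smallest_in_place(arr, k):
    """Same result, but sorts the caller's list as a side effect."""
    arr.sort()
    return arr[k - 1]


data = [5, 2, 9, 1]
answer = kth_smallest(data, 2)
snapshot = list(data)          # still in the original order

answer_in_place = kth_smallest_in_place(data, 2)
```

The first version costs O(n) extra space for the copy; the second saves that space but changes what the caller sees.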

Recursion Preview

A function can call itself—that’s recursion. You’ll see it in depth in the next section. For now: every recursive function needs a base case (when to stop) and a recursive case (how to express the result in terms of a smaller subproblem). Example: factorial—fact(0)=1; fact(n)=n*fact(n-1).

def fact(n):
    if n <= 0:
        return 1
    return n * fact(n - 1)

Summary

Functions are defined with def; they take parameters and return values (or None).
Arguments are passed by object reference: mutating a mutable argument affects the caller; reassigning an immutable doesn’t.
Avoid mutable default arguments; use None and create a new list/dict inside if needed.
Prefer passing data in and returning results; be clear about whether your function mutates input.

2.4 Recursion Basics

Introduction

Recursion is when a function calls itself to solve a smaller instance of the same problem. Many algorithms (tree traversal, divide-and-conquer, backtracking) are naturally expressed recursively. To use it well you need a clear base case (when to stop), a correct recursive case (how to reduce the problem), and an understanding of the call stack so you can reason about correctness and space.

Why Recursion Matters for DSA

Recursion appears everywhere: merge sort (“sort left half, sort right half, merge”), binary search (“search left or right half”), tree DFS (“process root, then recurse on each child”), and backtracking (“try a choice, recurse, undo”). If you’re comfortable with recursion, these patterns are much easier to write and debug. The same logic can often be converted to an iterative loop with an explicit stack—but thinking recursively first is a core skill.

Two Parts of Every Recursive Function

Base Case

The base case is when the problem is so small that you can return an answer directly without calling yourself again. Without a base case, recursion never stops; in Python the interpreter ends it with a RecursionError once the call stack passes the recursion limit (many other languages crash with a stack overflow). Examples: “if n is 0, return 1” (factorial); “if the list is empty, return 0” (sum of list).

Recursive Case

The recursive case expresses the answer in terms of the same function on a smaller input. You must ensure the subproblem is strictly smaller (e.g., n−1 or half the array) so that repeated recursion eventually hits the base case. Example: “fact(n) = n * fact(n−1)”—each call reduces n until n is 0.

  Recursive function:
  1. Base case: return known answer for smallest input.
  2. Recursive case: call self on smaller input; combine result with current step.

Example: Factorial

Definition: fact(0) = 1; fact(n) = n × fact(n−1) for n ≥ 1. Base case: n ≤ 0 → return 1. Recursive case: return n * fact(n−1).

def fact(n):
    if n <= 0:
        return 1
    return n * fact(n - 1)

Trace for fact(3): fact(3) = 3 * fact(2), fact(2) = 2 * fact(1), fact(1) = 1 * fact(0), fact(0) = 1. The calls then return in reverse order: fact(1) = 1 * 1 = 1, fact(2) = 2 * 1 = 2, fact(3) = 3 * 2 = 6. So fact(3) = 6.

Example: Sum of an Array

Sum of arr = first element + sum of the rest. Base case: empty list → 0. Recursive case: arr[0] + sum_arr(arr[1:]).

def sum_arr(arr):
    if not arr:
        return 0
    return arr[0] + sum_arr(arr[1:])

Note: arr[1:] creates a new list each time, so this uses O(n) extra space per level and O(n²) time for the slicing. For real DSA you’d often pass indices (left, right) instead of slicing—same recursive idea, better complexity. The point here is the structure: base + recursive case.
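A sketch of the index-passing variant mentioned above (the name sum_from and the default start index are illustrative choices):

```python
def sum_from(arr, i=0):
    """Sum arr[i:] recursively without creating any slices."""
    if i == len(arr):          # base case: nothing left to add
        return 0
    return arr[i] + sum_from(arr, i + 1)
```

This removes the O(n²) copying, so the total time is O(n). The recursion depth is still n, one frame per element.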

Example: Fibonacci

F(0)=0, F(1)=1, F(n)=F(n−1)+F(n−2). Base cases: n=0 → 0, n=1 → 1. Recursive case: return fib(n−1) + fib(n−2).

def fib(n):
    if n <= 0:
        return 0
    if n == 1:
        return 1
    return fib(n - 1) + fib(n - 2)

This version is correct but exponential in n because it recomputes the same F(k) many times. Later you’ll fix that with memoization or iteration—but as a first recursive definition it’s the standard example.
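To see the blow-up, one can count the calls the naive recursion makes and compare with a simple loop. This is a rough sketch; fib_calls and fib_iterative are illustrative names:

```python
def fib_calls(n):
    """Return (F(n), number of calls the naive recursion makes)."""
    if n <= 0:
        return 0, 1
    if n == 1:
        return 1, 1
    a, calls_a = fib_calls(n - 1)
    b, calls_b = fib_calls(n - 2)
    return a + b, calls_a + calls_b + 1


def fib_iterative(n):
    """Compute F(n) with n loop iterations and O(1) extra space."""
    prev, curr = 0, 1
    for _ in range(n):
        prev, curr = curr, prev + curr
    return prev


value, calls = fib_calls(10)   # value is 55, reached through 177 calls
```

The call count grows roughly as fast as F(n) itself, while the loop does exactly n iterations.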

Call Stack Intuition

Each recursive call pushes a new frame onto the call stack (with its own parameters and local variables). When a call returns, its frame is popped and execution resumes in the caller. So recursion uses space proportional to the depth of the recursion (e.g., fact(n) has depth n; sum_arr on a list of length n has depth n). If the depth is too large, Python raises RecursionError: CPython’s default recursion limit is about 1000 frames, and Python does not perform tail-call optimization, so recursing once per element over a list of 10^5 elements fails. For deep recursion, an iterative solution, or a divide-and-conquer formulation that halves the range on each call (depth O(log n)), is often better.
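The depth limit can be observed directly. The sketch below picks a depth just past whatever limit the interpreter reports; the function names are hypothetical:

```python
import sys


def depth_sum(n):
    """Sum 1..n recursively; uses one stack frame per value of n."""
    if n == 0:
        return 0
    return n + depth_sum(n - 1)


def iterative_sum(n):
    """Same result with a loop: constant stack depth."""
    total = 0
    for k in range(1, n + 1):
        total += k
    return total


too_deep = sys.getrecursionlimit() + 100

try:
    depth_sum(too_deep)
    overflowed = False
except RecursionError:
    overflowed = True

safe_result = iterative_sum(too_deep)
```

sys.setrecursionlimit can raise the limit, but converting to a loop is usually the more robust fix.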

Common Mistakes

No base case or wrong base case: Recursion never stops, or returns wrong value for the smallest input. Always ask: “What’s the smallest input, and what should I return?”
Recursive case not smaller: If you call f(n) in terms of f(n) or f(n+1), you never reach the base case. Ensure the argument (or problem size) strictly decreases.
Wrong recurrence: The formula that combines “current step” with “result of smaller problem” must match the problem definition. Check with a small example by hand.

Common Mistake

Forgetting the base case, or writing one that some inputs never reach. For example, a Fibonacci function whose only base case is “if n == 1” breaks on fib(2): it calls fib(0), which skips past the base case and keeps recursing into negative n. Trace one small call by hand to verify that every path hits a base case.

Expert Tip

When designing recursion, write the base case first (“what’s the trivial case?”), then the recursive case (“if I had the answer for the smaller problem, how would I get the full answer?”). That order matches how the computation unfolds.

Summary

Recursion = function calls itself on a smaller instance; needs a base case and a recursive case.
Base case: return directly for smallest input. Recursive case: call self on smaller input and combine.
Recursion depth = stack space; avoid very deep recursion (e.g., one call per element of a long list) and switch to iteration when the depth could exceed Python’s recursion limit.
Naive Fibonacci is exponential; memoization or iteration fixes it (covered later).

2.5 Lists

Introduction

A list in Python is a mutable, ordered sequence of values. It’s the primary “array” type for DSA: you use it for sequences that need index access, in-place updates, and dynamic growth. Understanding how to create, index, slice, and modify lists—and what each operation costs—is essential for implementing algorithms correctly and efficiently.

Why Lists Matter for DSA

Most array-based problems give you or expect a list. You’ll index by position (arr[i]), append during a single pass, or use a list as an explicit stack or queue. Knowing that append is amortized O(1) but insert(0, x) is O(n) helps you choose the right structure (e.g., collections.deque for frequent front insert/delete).

Creating Lists

arr = []                    # empty list
arr = [1, 2, 3]             # literal
arr = list()                # same as []
arr = list(range(5))        # [0, 1, 2, 3, 4]
arr = [0] * 10              # [0, 0, ..., 0] — same object repeated (careful with mutable elements!)
arr = [x * 2 for x in range(5)]  # list comprehension: [0, 2, 4, 6, 8]

Common Mistake

[[]] * 5 creates a list of five references to the same inner list. Appending to one affects all. Use [[] for _ in range(5)] to get five separate lists.
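The aliasing is easy to verify, and the same rule applies when building a 2D grid. A short sketch with illustrative variable names:

```python
shared = [[]] * 3
shared[0].append("x")        # all three slots show the change

separate = [[] for _ in range(3)]
separate[0].append("x")      # only the first inner list changes

# A 3x4 grid of zeros: the outer comprehension creates a new row each time.
grid = [[0] * 4 for _ in range(3)]
grid[1][2] = 7
```

[0] * 4 is safe for the inner rows because ints are immutable; the danger is only in repeating a reference to a mutable object.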

Indexing and Slicing

Indices are 0-based. arr[i] is the element at position i; valid range is 0 to len(arr)-1. Negative indices count from the end: arr[-1] is the last element, arr[-2] the second-to-last. Indexing is O(1).

Slicing arr[start:stop] returns a new list from index start up to but not including stop. Omitted start defaults to 0; omitted stop defaults to the end. arr[start:stop:step] steps by step. Slicing creates a copy—O(k) where k is the slice length.

arr = [10, 20, 30, 40, 50]
arr[0]      # 10
arr[-1]     # 50
arr[1:4]    # [20, 30, 40]  — indices 1,2,3
arr[:3]     # [10, 20, 30]
arr[::2]    # [10, 30, 50]  — every 2nd element

Mutability and In-Place Operations

Lists are mutable: you can change elements, append, insert, and remove. Assigning to an index updates that slot: arr[i] = x is O(1). In-place operations modify the list and often return None (e.g., arr.sort() returns None; sorted(arr) returns a new list).

Key Operations and Time Complexity

Operation	Time
`arr[i]`, `arr[i] = x`	O(1)
`arr.append(x)`	O(1) amortized
`arr.pop()`	O(1)
`arr.insert(0, x)`, `arr.pop(0)`	O(n)
`x in arr`	O(n)
`arr + other` (concatenate)	O(n + m)

append is amortized O(1) because the list occasionally grows its backing storage; over many appends the average cost per append is constant. insert(0, x) shifts all elements, so it’s O(n). Use collections.deque when you need O(1) front and back operations.

Common Methods

arr.append(x) — add at end. Amortized O(1).
arr.extend(iterable) — add all elements from iterable at end. O(k) for k elements.
arr.insert(i, x) — insert x at index i; elements shift. O(n).
arr.pop() — remove and return last element. O(1). arr.pop(i) removes at index i, O(n).
arr.remove(x) — remove first occurrence of x. O(n).
arr.reverse() — reverse in place. O(n).
arr.sort() — sort in place. O(n log n). sorted(arr) returns new list, doesn’t change arr.
len(arr), arr.count(x), arr.index(x) — length O(1); count and index O(n).

List as Stack or Queue

Stack: Use append for push and pop() for pop—both O(1) amortized. Perfect for DFS, expression evaluation, etc.

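Queue: A list makes a poor queue. Either you enqueue with insert(0, x) and dequeue with pop(), or you enqueue with append(x) and dequeue with pop(0); in both cases one of the two operations is O(n). For a real queue use collections.deque (append and popleft are both O(1)).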

# Stack
stack = []
stack.append(1)
stack.append(2)
top = stack.pop()   # 2
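For comparison with the stack above, a queue built on collections.deque might look like this minimal sketch:

```python
from collections import deque

queue = deque()
queue.append(1)          # enqueue at the back, O(1)
queue.append(2)
queue.append(3)
front = queue.popleft()  # dequeue from the front, O(1)
```

Elements leave in the order they arrived (first in, first out), so front is the first value that was appended.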

Copying vs Reference

Assignment does not copy: b = arr makes b refer to the same list. Modifying b changes arr. To copy: arr.copy() or arr[:] — shallow copy (same elements; if elements are mutable, inner objects are shared). For a deep copy of nested structures use copy.deepcopy(arr).
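The three cases (alias, shallow copy, deep copy) behave differently on a nested list. The sketch below uses illustrative names:

```python
import copy

grid = [[1, 2], [3, 4]]
alias = grid                   # same outer list
shallow = grid.copy()          # new outer list, shared inner lists
deep = copy.deepcopy(grid)     # new outer list and new inner lists

grid[0][0] = 99                # mutate an inner list
grid.append([5, 6])            # mutate the outer list
```

The shallow copy sees the inner change (99) but not the appended row; the deep copy sees neither.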

Expert Tip

When you need to pass a list to a function that might mutate it and you want to keep the original, pass arr.copy() or arr[:]. When implementing recursion that “tries a choice then backtracks,” often you append, recurse, then pop—same list, no copy—for O(n) space instead of copying at each level.
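The append, recurse, pop pattern can be sketched with subset generation. This is one common way to write it, not the only one, and the names are illustrative:

```python
def subsets(nums):
    """Return all subsets of nums using one shared path list."""
    result = []
    path = []

    def backtrack(start):
        result.append(path.copy())    # copy only when recording an answer
        for i in range(start, len(nums)):
            path.append(nums[i])      # choose
            backtrack(i + 1)          # explore
            path.pop()                # undo the choice

    backtrack(0)
    return result


all_subsets = subsets([1, 2, 3])
```

The single path list is reused at every level; a copy is made only when a subset is recorded.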

Summary

Lists are mutable, ordered sequences; index and slice with 0-based indices; negative indices from the end.
append / pop at end are O(1) amortized; insert(0) / pop(0) / in are O(n).
Use a list as a stack (append + pop); for a queue use deque.
Assignment shares the list; use arr.copy() or arr[:] for a shallow copy when needed.

2.6 Tuples

Introduction

A tuple is an immutable, ordered sequence of values. Like a list, it supports indexing and slicing—but you cannot add, remove, or change elements after creation. That immutability makes tuples hashable: they can be used as dictionary keys and as set elements, and they safely represent fixed structures (e.g., coordinates, multi-value returns) without accidental mutation.

Why Tuples Matter for DSA

You’ll use tuples when you need a fixed pair or record (e.g., (row, col) for grid positions, (distance, node) for Dijkstra’s priority queue). Returning multiple values from a function is done with a tuple (return i, j). Storing “visited” or “seen” composite keys (e.g., (i, j) for 2D states) requires a hashable type—list won’t work in a set or as a dict key; tuple will. When you don’t need to mutate the sequence, tuple is a clear and efficient choice.

Creating Tuples

t = ()                    # empty tuple
t = (1,)                  # single-element tuple — comma required
t = (1, 2, 3)             # literal
t = 1, 2, 3               # parentheses optional when unambiguous
t = tuple()               # empty, same as ()
t = tuple([1, 2, 3])      # from iterable: (1, 2, 3)

Single-element tuple needs a trailing comma: (1,). Without it, (1) is just the integer 1 in parentheses.

Indexing and Slicing

Same as lists: 0-based indices, negative indices from the end, slicing t[start:stop] returns a new tuple. Indexing and slicing are O(1) and O(k) respectively. You cannot assign: t[0] = 5 raises TypeError.

t = (10, 20, 30, 40)
t[0]      # 10
t[-1]     # 40
t[1:3]    # (20, 30)

Unpacking

You can assign elements of a tuple to multiple names in one line. Useful for function returns and loop variables.

x, y = (3, 5)           # x=3, y=5
a, b = b, a             # swap without temp
for i, (r, c) in enumerate([(1,2), (3,4)]):
    ...                 # i=0, r=1, c=2; then i=1, r=3, c=4

Star unpacking: first, *rest = (1, 2, 3, 4) gives first=1, rest=[2, 3, 4] (rest is a list).

Hashable: Dict Keys and Set Elements

Because tuples are immutable, they can be hashed (assuming their elements are hashable). So you can use a tuple as a dict key or put it in a set—unlike a list.

seen = set()
seen.add((0, 0))        # valid
# seen.add([0, 0])      # TypeError: unhashable type: 'list'

dist = {}
dist[(1, 2)] = 5       # key is tuple (1, 2)

In DSA, “visited cells” in a 2D grid are often stored as set() of (r, c) tuples. State in memoization (e.g., DP) is often a tuple of parameters so it can be used as a key.

Tuple vs List: When to Use Which

Tuple: Fixed structure, multiple return values, keys or set members, or when you want to signal “this sequence is not meant to change.”
List: Need to append, remove, or change elements; building a sequence over time; stack/queue.

Tuples are slightly more memory-efficient and can be slightly faster for creation and access in some cases, but the main reason to choose a tuple is semantics: immutability and hashability.

Common Mistake

A tuple is only hashable if every element is hashable. ([1, 2], 3) contains a list, so it cannot be put in a set or used as a dict key. Use (tuple([1, 2]), 3) or keep the inner structure immutable.

Expert Tip

When returning multiple values, return i, j is a tuple; the caller can write i, j = f(). For BFS/DFS on a grid, store coordinates as (r, c) in the queue and in the visited set so you have one consistent, hashable type.
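A minimal BFS sketch along those lines, assuming a grid where 0 marks an open cell and 1 marks a wall, and assuming the start cell is open (both conventions are assumptions made for this example):

```python
from collections import deque


def count_reachable(grid, start):
    """Count open cells reachable from start with 4-directional moves."""
    rows, cols = len(grid), len(grid[0])
    visited = {start}                 # set of (r, c) tuples
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == 0 and (nr, nc) not in visited):
                visited.add((nr, nc))
                queue.append((nr, nc))
    return len(visited)


grid = [
    [0, 0, 1],
    [1, 0, 1],
    [0, 1, 0],
]
reachable = count_reachable(grid, (0, 0))
```

Both the queue and the visited set hold the same (r, c) tuples, so no conversion is needed between them.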

Summary

Tuples are immutable, ordered sequences; same indexing and slicing as lists, but no assignment.
Tuples are hashable (if elements are hashable)—use as dict keys and set elements; lists are not.
Use tuples for fixed structure, multi-value returns, and composite keys (e.g., (r, c), (i, j)).

2.7 Sets

Introduction

A set is an unordered collection of unique, hashable elements. No duplicates are stored; adding the same element again has no effect. Membership test (x in s), add, and remove are O(1) average. Sets are the go-to structure when you need “have I seen this?” or “distinct values” without caring about order or index.

Why Sets Matter for DSA

Many problems reduce to “check if this value exists” or “collect unique items.” With a list, x in arr is O(n); with a set it’s O(1) average—so a single pass with a set can replace nested loops (e.g., Two Sum: for each element, check if complement is in a set). Sets also give you deduplication for free: set([1,2,2,3]) → {1, 2, 3}. In graph BFS/DFS, “visited” is often a set of nodes. Use a set whenever the key operation is membership or uniqueness.

Creating Sets

s = set()                  # empty set (not {} — that's empty dict)
s = {1, 2, 3}             # literal
s = set([1, 2, 2, 3])     # from iterable: {1, 2, 3}, duplicates removed
s = set("hello")          # {'h', 'e', 'l', 'o'} — unique chars

Elements must be hashable (immutable: int, str, tuple, etc.). You cannot have a list or another set inside a set.

Key Operations and Time Complexity

x in s, x not in s — O(1) average.
s.add(x) — add element; no effect if already present. O(1) average.
s.remove(x) — remove x; raises KeyError if not in set. O(1) average.
s.discard(x) — remove x if present; no error if absent. O(1) average.
len(s) — O(1).

There is no indexing: sets are unordered. Iteration order is arbitrary (and can change). Use for x in s when you only need to process each element once.
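The difference between remove and discard shows up only when the element is absent, as in this small sketch:

```python
s = {1, 2, 3}

s.discard(99)            # absent element: nothing happens

try:
    s.remove(99)         # absent element: raises KeyError
    remove_failed = False
except KeyError:
    remove_failed = True

s.remove(2)              # present element: removed
total = sum(x for x in s)   # the visiting order does not affect the result
```

Use discard when “not present” is an acceptable outcome, and remove when absence would indicate a bug.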

Set Operations

Mathematical set operations are built in:

a = {1, 2, 3}
b = {2, 3, 4}
a | b   # union: {1, 2, 3, 4}
a & b   # intersection: {2, 3}
a - b   # difference (in a, not in b): {1}
a ^ b   # symmetric difference (in one but not both): {1, 4}

These return a new set. In-place versions: a.update(b) (union), a.intersection_update(b), a.difference_update(b). Useful for “remove all seen elements” or “keep only common elements.”
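A short sketch contrasting an operator that returns a new set with the in-place methods (variable names are illustrative):

```python
candidates = {1, 2, 3, 4, 5}
seen = {2, 4}

remaining = candidates - seen          # new set; candidates is unchanged here
size_before = len(candidates)

candidates.difference_update(seen)     # in place: removes all seen elements

common = {1, 2, 3}
common.intersection_update({2, 3, 4})  # in place: keeps only shared elements
```

The in-place forms avoid allocating a new set, which can matter inside a loop.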

No Duplicates, No Order

Adding the same element multiple times doesn’t change the set. So building a set from a list is an easy way to get unique values—but you lose order. If you need “unique but preserve insertion order” (Python 3.7+), dict.fromkeys(arr) gives you an ordered mapping; list(dict.fromkeys(arr)) is unique elements in first-seen order.

Example

“Remove duplicates from a list” → list(set(arr)) is O(n) but order is arbitrary. For “unique, keep first occurrence order”: list(dict.fromkeys(arr)).
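Both approaches side by side, on a small illustrative input:

```python
arr = [3, 1, 3, 2, 1]

unique_any_order = list(set(arr))             # order is not guaranteed
unique_first_seen = list(dict.fromkeys(arr))  # keeps first-occurrence order
```

Sort the set-based result, or compare it as a set, if the code must not depend on its order.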

When to Use a Set

Membership: “Is x in the collection?” — use a set for O(1) instead of list O(n).
Uniqueness: “Collect distinct values” — add to a set; duplicates are ignored.
Visited / seen: In BFS/DFS, store visited nodes in a set for O(1) check and add.
No indexing needed: If you don’t need order or position, set is simpler and faster for in/add/remove.

frozenset

frozenset is an immutable set. It’s hashable, so it can be used as a dict key or as an element of another set. Use it when you need a set that must not change and must be storable in a set or as a key (e.g., “set of frozensets” for distinct subsets).

fs = frozenset([1, 2, 3])
# fs.add(4)   # AttributeError
seen_subsets = {fs}       # valid — frozenset is hashable

Common Mistake

Using a list when you need membership: if x in arr in a loop makes the overall algorithm O(n²). If you’re only doing membership checks and adds, use a set for O(n) total. Also: {} is an empty dict, not an empty set—use set() for an empty set.

Expert Tip

In “find two elements that sum to target,” maintain a set of values seen so far. For each new value x, check if target - x is in the set—O(1)—then add x. One pass, O(n). Same idea for “first duplicate,” “two arrays have a common element,” etc.
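The one-pass idea can be sketched as follows (has_pair_with_sum is an illustrative name; this version only reports whether a pair exists):

```python
def has_pair_with_sum(arr, target):
    """Return True if two different positions in arr sum to target."""
    seen = set()
    for x in arr:
        if target - x in seen:    # O(1) average membership check
            return True
        seen.add(x)               # add after checking, so x is not paired with itself
    return False
```

Checking before adding is what prevents a single 3 from matching itself when the target is 6.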

Summary

Sets store unique, hashable elements; unordered; no indexing.
in, add, remove, discard are O(1) average—use sets for fast membership and deduplication.
Use a set for “seen,” “visited,” “distinct values,” or any problem where the main operation is “is this in the collection?”

2.8 Dictionaries

Introduction

A dictionary (dict) is a mapping from keys to values. Keys must be hashable (immutable); values can be anything. Lookup, insert, and delete by key are O(1) average. In Python 3.7+, dictionaries preserve insertion order. For DSA, dicts are essential: frequency counts, “value → index” or “value → count,” memoization caches, and graph adjacency (when nodes are hashable) all use dicts.

Why Dictionaries Matter for DSA

Whenever you need “for this key, what’s the value?” or “have I seen this key and what did I store?” a dict is the right structure. Two Sum: map each value to its index so you can find a complement in O(1). Frequency: map each element to its count in one pass. Memoization: map (arguments as key) to return value. Subarray with sum K: prefix sum → count of prefixes seen. Most hash-based optimizations in this course are implemented with a dict.

Creating Dictionaries

d = {}                      # empty dict
d = {"a": 1, "b": 2, "c": 3}   # literal
d = dict()                  # empty
d = dict(a=1, b=2)          # keyword args (keys are strings)
d = dict([("a", 1), ("b", 2)])  # from list of (key, value) pairs
d = dict.fromkeys([1, 2, 3], 0)  # {1: 0, 2: 0, 3: 0} — same value for all keys

Accessing and Modifying

d[key] returns the value for key; raises KeyError if the key is missing. d[key] = value sets or overwrites. del d[key] removes the key; key in d is O(1) average. Use d.get(key, default) to avoid KeyError: returns d[key] if key exists, else default (or None if no default given).

d = {"x": 10, "y": 20}
d["x"]       # 10
d["z"] = 30  # add or update
d.get("x")   # 10
d.get("w")   # None
d.get("w", 0)   # 0 — default when key missing

Key Operations and Time Complexity

d[key], d[key] = value, del d[key], key in d — O(1) average.
len(d) — O(1).
d.get(key, default) — O(1) average.

Iteration: for k in d iterates keys; d.keys(), d.values(), d.items() return view objects over the keys, values, or (key, value) pairs. The views reflect later changes to the dict; iteration order is insertion order (3.7+).

Common Patterns in DSA

Frequency count: for x in arr: d[x] = d.get(x, 0) + 1 — one pass, O(n).
Value → index (e.g., Two Sum): for i, x in enumerate(arr): store d[x] = i (or list of indices if duplicates matter). Check target - x in d before storing.
Memoization: Key = tuple of arguments (or something hashable); value = computed result. Check if key in d: return d[key] before computing.

# Frequency count
freq = {}
for x in arr:
    freq[x] = freq.get(x, 0) + 1
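The value → index pattern from the list above can be sketched in the same style. Returning None when no pair exists is an assumption made for this example:

```python
def two_sum(arr, target):
    """Return indices (j, i) with arr[j] + arr[i] == target, or None."""
    index_of = {}                      # value -> index where it was seen
    for i, x in enumerate(arr):
        j = index_of.get(target - x)   # check before storing x
        if j is not None:
            return j, i
        index_of[x] = i
    return None
```

The is not None test matters because a stored index of 0 is falsy.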

defaultdict

collections.defaultdict(factory) is a dict subclass that doesn’t raise KeyError on d[key]: if the key is missing, it calls factory(), stores the result under that key, and returns it. defaultdict(int) gives 0 for missing keys, so freq[x] += 1 works without get. defaultdict(list) gives [] for missing keys, which is useful for “group by key.” Same O(1) average behavior as dict. Note that merely reading d[key] inserts a missing key; use key in d or d.get(key) when you only want to check.

from collections import defaultdict
freq = defaultdict(int)
for x in arr:
    freq[x] += 1   # no get() needed

groups = defaultdict(list)
for item in items:
    groups[item.key].append(item)   # each key gets a list

Keys Must Be Hashable

Dictionary keys must be immutable and hashable: int, str, tuple (of hashable elements), etc. You cannot use a list or another dict as a key. For composite keys (e.g., (i, j) for 2D state), use a tuple. If you need to key by something mutable, convert to something hashable (e.g., tuple(sorted(items)) for a “canonical” set-like key).
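A canonical key combines naturally with grouping. The sketch below groups anagrams, a problem chosen only for illustration, using the sorted letters of each word as a tuple key:

```python
from collections import defaultdict


def group_anagrams(words):
    """Group words that use exactly the same letters."""
    groups = defaultdict(list)
    for w in words:
        key = tuple(sorted(w))     # hashable, identical for all anagrams of w
        groups[key].append(w)
    return list(groups.values())


grouped = group_anagrams(["eat", "tea", "tan", "ate", "nat", "bat"])
```

sorted(w) alone returns a list, which is unhashable; wrapping it in tuple (or joining it into a string) makes it usable as a key.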

Common Mistake

Using a mutable type as a key: d[[1,2]] = 3 raises TypeError: unhashable type: 'list'. Use a tuple: d[(1, 2)] = 3. Also: d[key] when key might be missing raises KeyError—use d.get(key) or check key in d first.

Expert Tip

For “first index where each value appears,” store d[x] = i only when x is not already in d, so the first index is kept. For “last index,” overwrite every time: d[x] = i. Choose based on what the problem needs.
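Both variants in one pass, as a small sketch with illustrative names:

```python
arr = ["a", "b", "a", "c", "b"]

first_index = {}
last_index = {}
for i, x in enumerate(arr):
    if x not in first_index:   # keep only the first position
        first_index[x] = i
    last_index[x] = i          # overwrite so the last position wins
```

first_index.setdefault(x, i) is an equivalent one-line form of the guarded assignment.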

Summary

Dictionaries map hashable keys to values; lookup, insert, delete by key are O(1) average.
Use d.get(key, default) to avoid KeyError; use defaultdict when you want a default value for missing keys.
Dicts are the standard tool for frequency counts, value→index (Two Sum), and memoization.

2.9 Strings

Introduction

A string in Python is an immutable sequence of Unicode characters. Like a tuple, you can index and slice it, but you cannot change a character in place. Strings are hashable (so they can be dict keys or set elements) and support many methods for searching, splitting, and transforming. For DSA, strings show up in palindromes, anagrams, pattern matching, and parsing—and building strings efficiently (e.g., with ''.join()) matters for performance.

Why Strings Matter for DSA

String problems often ask: “Is it a palindrome?” “Are two strings anagrams?” “Find pattern in text.” You’ll index by position (s[i]), slice substrings (s[l:r]), and compare or count characters. Because strings are immutable, “changing” a character means creating a new string (e.g., s[:i] + c + s[i+1:])—O(n). For heavy string building, use a list of characters and ''.join(...) at the end for O(n) total instead of O(n²) from repeated concatenation.

Indexing and Slicing

Same as lists: 0-based indices, negative indices from the end. s[i] is O(1). s[start:stop] returns a new string (substring); omitted start/stop mean “from start” / “to end.” s[::-1] is the reversed string—handy for palindrome checks. Slicing is O(k) where k is the slice length.

s = "hello"
s[0]      # 'h'
s[-1]     # 'o'
s[1:4]    # "ell"
s[::-1]   # "olleh" — reverse

Immutability

You cannot do s[0] = 'H'—that raises TypeError. To “change” a string you build a new one: s = s[:i] + new_char + s[i+1:], or use methods like replace, upper, etc., which all return new strings. The original string is never modified.

Concatenation and Building Strings

a + b creates a new string of length len(a)+len(b). Doing result = result + c in a loop is O(n²) over n characters, because each concatenation copies the whole result. For building a string from many parts, collect parts in a list and join once: ''.join(parts) is O(n).

# Slow: O(n^2)
result = ""
for c in s:
    result = result + c.upper()   # avoid in loops

# Fast: O(n)
result = ''.join(c.upper() for c in s)
# Or: parts = []; parts.append(...); ''.join(parts)

Optimization Insight

When you need to build a string character by character or from many segments, use parts.append(...) and then ''.join(parts). Append to list is amortized O(1); one join at the end is O(n). Repeated s += c in a loop is O(n²).

Common Methods

s.split(sep=None) — split by whitespace (default) or by sep; returns list of strings. "a b c".split() → ['a','b','c'].
s.strip(), s.lstrip(), s.rstrip() — remove leading/trailing whitespace (or specified chars).
s.upper(), s.lower() — return new string in upper/lower case.
s.replace(old, new) — return new string with all occurrences of old replaced by new.
s.startswith(prefix), s.endswith(suffix) — boolean.
s.find(sub) — index of first occurrence of sub, or -1. s.index(sub) same but raises ValueError if not found.
s.count(sub) — number of non-overlapping occurrences of sub.

All of these return new strings or other values; they do not modify s.
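
A quick tour of the methods above on one sample string (the string itself is just an example):

```python
s = "  Hello, World  "

t = s.strip()                          # "Hello, World"
parts = t.split(", ")                  # ["Hello", "World"]
lowered = t.lower()                    # "hello, world"
swapped = t.replace("l", "L")          # "HeLLo, WorLd"
pos = t.find("World")                  # 7
missing = t.find("xyz")                # -1 (index() would raise ValueError here)
starts = t.startswith("Hello")         # True
pairs = "aaaa".count("aa")             # 2: matches do not overlap

unchanged = s                          # s itself is still "  Hello, World  "
```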

Membership and Iteration

c in s is O(n) — Python scans the string. for c in s iterates over characters. for i, c in enumerate(s) gives index and character. To get a list of characters: list(s). To convert a list of characters back: ''.join(lst).

s = "abc"
list(s)   # ['a', 'b', 'c']
''.join(['a', 'b', 'c'])   # "abc"

Comparing Strings

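Strings are compared lexicographically (dictionary order): character by character, using Unicode code points. ==, !=, <, >, <=, >= all work. So sorted(list_of_strings) sorts by code point, which matches alphabetical order only when the case is consistent: every uppercase ASCII letter sorts before every lowercase one, so "Zebra" < "apple". For case-insensitive sort: sorted(lst, key=str.lower).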
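
A few comparisons that show the code-point rule in action. The word list is an arbitrary example.

```python
words = ["banana", "apple", "Cherry"]

by_code_point = sorted(words)                  # ['Cherry', 'apple', 'banana']
ignoring_case = sorted(words, key=str.lower)   # ['apple', 'banana', 'Cherry']

prefix_first = "app" < "apple"                 # True: a proper prefix sorts first
upper_first = "Z" < "a"                        # True: ord('Z') is 90, ord('a') is 97
```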

Common Mistake

Building a string with s += c in a loop—O(n²). Use a list and ''.join(). Also: s[i] returns a string of length 1 (a character); in Python there is no separate “char” type. So s[0] == 'h' is correct.

Expert Tip

For “check if palindrome”: s == s[::-1] is simple but uses O(n) extra space for the reversed slice. For O(1) space, use two pointers from both ends. For anagrams, sorted(s1)==sorted(s2) is O(n log n); frequency count with a dict or Counter is O(n).
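
Both O(n) approaches from the tip can be sketched in a few lines. The function names is_palindrome and is_anagram are illustrative, not from a library.

```python
from collections import Counter

def is_palindrome(s):
    """Two pointers from both ends: O(n) time, O(1) extra space."""
    left, right = 0, len(s) - 1
    while left < right:
        if s[left] != s[right]:
            return False
        left += 1
        right -= 1
    return True

def is_anagram(s1, s2):
    """Compare character frequencies: O(n) time."""
    return Counter(s1) == Counter(s2)
```

For example, is_palindrome("racecar") is True and is_anagram("listen", "silent") is True.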

Summary

Strings are immutable sequences of characters; index and slice like lists; slicing creates a new string.
Build strings with ''.join(parts), not repeated s += c in a loop, to avoid O(n²).
Use split, strip, upper/lower, replace, find, startswith/endswith as needed; all return new values.

2.10 List Comprehension

Introduction

A list comprehension is a compact syntax for building a list from an iterable (and optionally filtering). Instead of a multi-line loop that appends to a list, you write a single expression in brackets. It’s idiomatic in Python, readable once you’re used to it, and often slightly faster than an equivalent loop. For DSA you’ll use it to build lists of indices, transformed values, or filtered elements in one line.

Basic Syntax

[expression for item in iterable] — for each item in the iterable, evaluate expression and collect the results into a new list. The result has the same length as the iterable (unless you add a filter).

squares = [x * x for x in range(5)]     # [0, 1, 4, 9, 16]
uppers  = [c.upper() for c in "hello"]  # ['H', 'E', 'L', 'L', 'O']
lengths  = [len(w) for w in ["a", "bb", "ccc"]]  # [1, 2, 3]

With a Filter: `if`

[expression for item in iterable if condition] — only include expression when condition is true. The result can be shorter than the iterable.

evens = [x for x in range(10) if x % 2 == 0]   # [0, 2, 4, 6, 8]
pos   = [x for x in arr if x > 0]              # only positive elements

There is no else in the filter position. To map some items to one value and others to another, use a conditional expression in the expression part: [a if cond else b for x in ...].

# Map: even -> "e", odd -> "o"
labels = ["e" if x % 2 == 0 else "o" for x in range(5)]  # ['e','o','e','o','e']

Nested Loops

You can use multiple for clauses. They nest left to right: the rightmost iterable is the “inner” loop.

pairs = [(i, j) for i in range(2) for j in range(2)]
# [(0,0), (0,1), (1,0), (1,1)]

Equivalent to:

pairs = []
for i in range(2):
    for j in range(2):
        pairs.append((i, j))

You can add an if after the loops to filter. Keep comprehensions readable; if they get long or complex, an explicit loop is often clearer.
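
Two small illustrations of nested clauses, one with a trailing filter and one that flattens a list of lists (the data is made up for the example):

```python
# Filter after the loops: keep only pairs with i < j
pairs = [(i, j) for i in range(3) for j in range(3) if i < j]
# [(0, 1), (0, 2), (1, 2)]

# Flatten one level: the outer loop comes first, the inner loop second
matrix = [[1, 2], [3, 4], [5, 6]]
flat = [x for row in matrix for x in row]
# [1, 2, 3, 4, 5, 6]
```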

When to Use List Comprehensions

Use: Simple “build a list from an iterable” or “build with one filter”—one line, easy to read.
Avoid: Complex logic, multiple conditions, or when you need side effects (e.g., printing, updating other variables). Use an explicit for loop instead.

Concept Note

A list comprehension always produces a new list. If you don’t need to keep the list (e.g., you’re only iterating once), a generator expression (expr for x in it) saves memory—same syntax, but with parentheses. sum(x*x for x in range(5)) doesn’t build a list of squares.
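
A brief sketch of generator expressions feeding built-ins, plus one consumed by hand to show that values are produced on demand:

```python
total = sum(x * x for x in range(5))              # 30, no intermediate list
has_negative = any(x < 0 for x in [3, -1, 4])     # True, stops at the first match

gen = (x * x for x in range(3))
first = next(gen)                                 # 0
rest = list(gen)                                  # [1, 4]: the generator resumes where it stopped
```

A generator can be consumed only once; calling list(gen) again here would return an empty list.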

Dict and Set Comprehensions

Same idea with different brackets: {key_expr: value_expr for x in it} builds a dict; {expr for x in it} builds a set. Useful for “list of pairs → dict” or “unique values from a transformation.”

d = {x: x * x for x in range(5)}   # {0:0, 1:1, 2:4, 3:9, 4:16}
s = {c.upper() for c in "hello"}   # {'H', 'E', 'L', 'O'}

Common Mistake

Putting side effects (e.g., print(x)) in a comprehension—it works but is confusing. Use a loop. Also: [x for x in it if x] keeps truthy values; [x if x else 0 for x in it] maps falsy to 0. The if after the loop filters; the x if cond else y in the expression chooses between two values.

Expert Tip

In DSA, list comprehensions are handy for “indices where condition holds”: [i for i, x in enumerate(arr) if x > 0], or “transform and collect”: [x * 2 for x in arr]. Keep the expression and condition simple so the line stays readable.

Summary

List comprehension: [expr for item in iterable] or [expr for item in iterable if cond].
Use for simple “build list / filter list” in one line; use explicit loops for complex logic or side effects.
Dict comprehension: {k: v for ...}; set comprehension: {expr for ...}. Generator: (expr for ...) when you don’t need a full list.

2.11 Lambda Functions

Introduction

A lambda is an anonymous function defined with a single expression. You write lambda arguments: expression—no def, no block, no explicit return. Lambdas are useful when you need a small function as an argument (e.g., key for sorted, or a comparator). For anything longer or reusable, use a normal def function.

Syntax

lambda x: x * 2 is a function that takes one argument x and returns x * 2. You can have multiple arguments: lambda a, b: a + b. You cannot have multiple statements or assignments—only one expression. The expression is implicitly returned.

f = lambda x: x * 2
f(5)   # 10

g = lambda a, b: a - b
g(7, 3)   # 4

Common Use: `sorted` and `key`

Many built-ins accept a key function that maps each element to a value used for comparison. sorted(iterable, key=...) sorts by the result of key(item). A lambda is often the shortest way to supply that.

arr = [("b", 2), ("a", 3), ("c", 1)]
sorted(arr)                    # by first element: [('a',3), ('b',2), ('c',1)]
sorted(arr, key=lambda t: t[1])   # by second: [('c',1), ('b',2), ('a',3)]

# Sort list of numbers by absolute value
sorted([-3, 1, -2], key=lambda x: abs(x))   # [1, -2, -3]

Same idea for min, max: min(items, key=lambda x: x.weight). For custom comparison in sorting (e.g., multiple criteria), you can use key=lambda x: (x.a, -x.b) to sort by a ascending and b descending.

map and filter

map(f, iterable) returns an iterator of f(x) for each x. filter(pred, iterable) returns an iterator of elements for which pred(x) is true. Lambdas can be used as f or pred.

list(map(lambda x: x * 2, [1, 2, 3]))     # [2, 4, 6]
list(filter(lambda x: x > 0, [-1, 2, 0, 3]))  # [2, 3]

In Python, list comprehensions are often clearer: [x * 2 for x in lst] and [x for x in lst if x > 0]. Use map/filter with lambda when you prefer that style or when you need an iterator without building a list.

When to Use Lambda vs `def`

Lambda: One-off function passed to sorted, min, max, key=, or similar; expression fits on one line and is simple.
def: Reusable logic, multiple lines, default arguments, or when a name would make the code clearer. If you’re doing f = lambda ... and using f in several places, a named function is better.

Common Mistake

Using a lambda for anything that needs more than one expression. Lambdas can’t contain statements (assignments, loops, etc.). If the logic is at all involved, use def. Also: a lambda does not capture the value of an outer variable when it is defined; it looks the variable up when it is called. Lambdas created in a loop therefore all see the loop variable’s final value. To bind the current value, pass it as a default argument (lambda x=x: ...) or use a named function.
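
The loop-variable pitfall is easiest to see side by side. A minimal sketch:

```python
# Late binding: every lambda looks up i when called, after the loop has finished
late = [lambda: i for i in range(3)]
late_results = [f() for f in late]         # [2, 2, 2]

# Default argument: i=i is evaluated at definition time, once per lambda
bound = [lambda i=i: i for i in range(3)]
bound_results = [f() for f in bound]       # [0, 1, 2]
```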

Expert Tip

In DSA, the most common use of lambda is sorted(arr, key=lambda x: (x[0], -x[1])) or similar: sort by one field ascending, another descending. Heaps are different: heapq.heappush and heapq.heappop take no key argument, so you store (priority, item) tuples or define __lt__ on a custom class instead of passing a lambda (only heapq.nsmallest and heapq.nlargest accept key=). Keep lambdas short and readable.

Summary

Lambda = anonymous function: lambda args: expression; one expression only, implicitly returned.
Use with sorted(..., key=lambda x: ...), min/max key=, or map/filter when the logic is one short expression.
Use def for anything multi-line, reusable, or complex.

2.12 Sorting in Python

Introduction

Sorting arranges elements in a defined order (ascending or descending). In Python you have two main ways: sorted(iterable)—which returns a new sorted list and leaves the original unchanged—and list.sort()—which sorts the list in place and returns None. Both use the same underlying algorithm (Timsort) and support a key function and reverse flag. For DSA, you’ll sort to enable binary search, two-pointer techniques, or to order items by a custom rule; knowing the API and the cost is essential.

Why Sorting Matters for DSA

Many algorithms assume or produce sorted data: binary search requires a sorted array; “merge two sorted lists” and “find pairs with sum K” often start by sorting; greedy problems sometimes need items ordered by value or deadline. Python’s sort is O(n log n) and stable—so you get predictable, efficient ordering. Custom key lets you sort by a field, by multiple criteria, or by a computed value without writing a comparison function by hand.

Two Ways to Sort

`sorted(iterable, key=..., reverse=...)`

Built-in function. Accepts any iterable (list, tuple, string, etc.) and returns a new list in sorted order. The original is not modified. Use when you need to keep the original or when the input isn’t a list (e.g., sorted("hello") → ['e','h','l','l','o']).

arr = [3, 1, 4, 1, 5]
new_list = sorted(arr)   # [1, 1, 3, 4, 5]; arr unchanged
s = "hello"
sorted(s)                # ['e', 'h', 'l', 'l', 'o'] — returns list of chars

`list.sort(key=..., reverse=...)`

Method on lists only. Sorts the list in place and returns None. Slightly more efficient when you don’t need the original: no extra list is allocated. Use when the variable is a list and you’re fine mutating it.

arr = [3, 1, 4, 1, 5]
arr.sort()   # arr is now [1, 1, 3, 4, 5]; returns None
# new_list = arr.sort()   # wrong — new_list would be None

Feature	`sorted()`	`list.sort()`
Input	Any iterable	List only
Return value	New sorted list	`None` (mutates list)
Original	Unchanged	Modified in place

The `key` Parameter

Both sorted() and list.sort() accept key=function. For each element x, the sort compares key(x) instead of x. So you can sort by a field, by a transformation, or by a tuple for multiple criteria—without implementing a comparator. The key function is called once per element; the result is cached, so cost is O(n) key calls plus O(n log n) comparisons.

Sort by a Single Field

pairs = [(2, "b"), (1, "a"), (2, "a")]
sorted(pairs)                      # by first elem: [(1,'a'), (2,'a'), (2,'b')]
sorted(pairs, key=lambda p: p[1])  # by second: [(1,'a'), (2,'a'), (2,'b')] (ties keep input order)

Sort by Multiple Criteria

Return a tuple from key. Python compares tuples lexicographically: first element, then second, and so on. To sort by field A ascending and field B descending, use (x.a, -x.b) for numbers (negate for descending), or a custom key that returns a tuple where the “descending” part is inverted (e.g., -x.b for numeric B).

# Sort by second element ascending, then first descending
pairs = [(2, 1), (3, 1), (1, 2)]
sorted(pairs, key=lambda p: (p[1], -p[0]))  # [(3,1), (2,1), (1,2)]

Sort by Computed Value

arr = [-4, 2, -1, 3]
sorted(arr, key=abs)   # [-1, 2, 3, -4] — by absolute value
sorted(arr, key=lambda x: -x)   # [3, 2, -1, -4] — descending (key negated)

The `reverse` Parameter

reverse=True sorts in descending order. It is not quite the same as sorting ascending and then reversing the list: reverse=True keeps the sort stable, so elements that compare equal stay in their original input order, whereas reversing afterwards would flip them.

sorted([3, 1, 4], reverse=True)   # [4, 3, 1]
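
With ties in the key, reverse=True and "sort then reverse" give different results. A small sketch with made-up records:

```python
records = [("a", 1), ("b", 2), ("c", 1)]

# reverse=True: descending by key, ties ('a' and 'c') keep input order
desc = sorted(records, key=lambda r: r[1], reverse=True)
# [('b', 2), ('a', 1), ('c', 1)]

# Sort ascending, then reverse the list: ties come out flipped
flipped = sorted(records, key=lambda r: r[1])[::-1]
# [('b', 2), ('c', 1), ('a', 1)]
```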

Stability

Python’s sort is stable: when two elements compare equal, their relative order in the output is the same as in the input. So you can sort in several passes, applying the least significant key first and the most significant key last; each later sort preserves the order of ties left by the earlier ones. Example: to order people by last name and then by first name, sort by first name first, then sort by last name; people with the same last name stay in first-name order from the previous pass. Or: sort records by value only; equal values keep their original index order.

Concept Note

Sorting by a tuple key (primary_key, secondary_key) works because tuples compare lexicographically: primary is compared first, and when primary is equal, secondary breaks the tie. Stability gives you the same result in two passes: a stable sort by secondary, followed by a stable sort by primary.
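
A sketch of the two routes to the same ordering: two stable passes (secondary key first, primary key last) versus one pass with a tuple key. The (last, first) name tuples are invented for the example.

```python
people = [("Smith", "Zoe"), ("Jones", "Amy"), ("Smith", "Al"), ("Jones", "Bob")]

# Two passes: least significant key first, most significant key last
by_first = sorted(people, key=lambda p: p[1])
two_pass = sorted(by_first, key=lambda p: p[0])

# One pass: tuple key (last name, first name)
one_pass = sorted(people, key=lambda p: (p[0], p[1]))

# Both give:
# [('Jones', 'Amy'), ('Jones', 'Bob'), ('Smith', 'Al'), ('Smith', 'Zoe')]
```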

Time and Space Complexity

Python uses Timsort (a hybrid of merge sort and insertion sort). Time: O(n log n) in the average and worst case. Each key is computed once—O(n) key calls—and the comparison-based sort does O(n log n) comparisons. Space: sorted() allocates a new list—O(n) extra space. list.sort() sorts in place but Timsort uses O(n) auxiliary space for the merge step. So both are O(n log n) time, O(n) space; sort() avoids a second list.

Edge Cases

Empty iterable: sorted([]) → []; [].sort() does nothing. Both are safe.
Single element: Returns or leaves the list with one element; no error.
Duplicates: All kept; order among equal elements is stable (same as input).
Mixed types: In Python 3, comparing incompatible types (e.g., int vs str) raises TypeError. Ensure elements are comparable or supply a key that returns a comparable type (e.g., all numbers or all strings).

Sorting Custom Objects

By default, instances of a user-defined class have no ordering: sorted(list_of_objs) raises TypeError unless the class defines __lt__. To sort by an attribute, use key=lambda obj: obj.attr. For multiple attributes, key=lambda obj: (obj.a, obj.b). You can also define __lt__ (and optionally other rich comparison methods) on the class so the object is directly comparable; then sorted(list_of_objs) works without key. For one-off sorts, key is usually simpler.

class Person:
    def __init__(self, name, age):
        self.name, self.age = name, age
people = [Person("Bob", 30), Person("Alice", 25)]
sorted(people, key=lambda p: p.age)   # by age
sorted(people, key=lambda p: (p.age, p.name))  # by age, then name
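
The __lt__ route might look like the following. Task and Plain are hypothetical classes written only for this illustration.

```python
class Task:
    def __init__(self, name, priority):
        self.name, self.priority = name, priority

    def __lt__(self, other):
        # sorted() and list.sort() only need "less than"
        return self.priority < other.priority


class Plain:
    """No __lt__ defined, so instances cannot be ordered."""


tasks = [Task("write", 2), Task("test", 1), Task("ship", 3)]
names = [t.name for t in sorted(tasks)]    # ['test', 'write', 'ship']

try:
    sorted([Plain(), Plain()])
    unorderable = False
except TypeError:
    unorderable = True                     # '<' is not supported between Plain instances
```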

Common Mistakes

Expecting list.sort() to return the list: It returns None. Write arr.sort() and then use arr; don’t assign the return value.
Sorting in place when you need the original: If you still need the unsorted list, use sorted(arr) and assign to a new variable, or copy first: arr_copy = arr.copy(); arr_copy.sort().
Wrong multi-criteria order: Remember that key=lambda x: (x.a, x.b) sorts by a first, then b. For descending on one field, use negation (numbers) or a key that inverts order.

Common Mistake

arr = arr.sort() sets arr to None because sort() returns None. Use arr.sort() (no assignment) for in-place, or arr = sorted(arr) for a new list.

Interview Insight

When the problem needs “order by X, then by Y,” say: “I’ll sort with key=lambda x: (x.X, x.Y)” or “by (X, -Y) if Y should be descending.” Mention that Python’s sort is stable and O(n log n). If you need to preserve the original list, use sorted() and assign to a new variable.

Summary

sorted(iterable) returns a new sorted list; list.sort() sorts in place and returns None.
Use key=function to sort by a field, computed value, or tuple for multiple criteria; use reverse=True for descending.
Python’s sort is stable and O(n log n) time, O(n) space; choose sorted() vs sort() based on whether you need to keep the original.

2.13 Time Complexity of Built-ins

Introduction

To analyze your algorithms you need to know the cost of the operations you use. Python's built-in types have specific time complexities: list index is O(1), but x in list is O(n); dict and set lookup are O(1) average; string concatenation in a loop is O(n²). This section summarizes the main time complexities of built-in operations so you can reason correctly about your overall complexity and avoid hidden bottlenecks.

Why This Matters for DSA

If you use in on a list inside a loop, you may turn an O(n) idea into O(n²). If you use a set for membership instead, you stay O(n). Choosing the right structure and knowing the cost of each call (index, append, insert, in, sort, etc.) lets you state and optimize complexity accurately in interviews and in code.

List

Operation	Time
`arr[i]`, `arr[i] = x`	O(1)
`arr.append(x)`, `arr.pop()`	O(1) amortized
`arr.insert(i, x)`, `arr.pop(i)`	O(n)
`x in arr`, `arr.index(x)`, `arr.count(x)`	O(n)
`arr + other` (lengths n, m)	O(n + m)
`arr[i:j]` (slice of length k)	O(k)
`arr.sort()`, `sorted(arr)`	O(n log n)

Dict and Set

Average case assumes a good hash function and no excessive collisions. Worst case (many collisions) can degrade toward O(n) per operation, but in practice the average case is the one to use for analysis.

Operation	Time (average)
`d[key]`, `d[key]=v`, `del d[key]`, `key in d`	O(1)
`s.add(x)`, `s.discard(x)`, `x in s`	O(1)
Iteration over `d` or `s`	O(n)

String

Operation	Time
`s[i]`, `s[i:j]` (slice length k)	O(1), O(k)
`sub in s`	O(n) typical
`s + t` (lengths n, m)	O(n + m)
`''.join(list_of_strings)` (total length n)	O(n)

Repeated s += c in a loop does a new concatenation each time (copy of growing string), so total is O(n²). Use ''.join(parts) for O(n).

Deque (`collections.deque`)

append, appendleft, pop, popleft are O(1). Indexing d[i] in the middle is O(n). Use when you need O(1) at both ends (queue, sliding window).

Expert Tip

When analyzing a loop, multiply "number of iterations" by "cost per iteration." If the inner step is x in list (O(n)), and the loop runs n times, that's O(n²). Switching to a set makes the inner step O(1), so total O(n).
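
The same loop with a list and with a set, as a sketch of that multiplication. The function names common_slow and common_fast are made up for this example.

```python
def common_slow(a, b):
    # b is a list: each "x in b" scans up to len(b) items -> O(len(a) * len(b))
    return [x for x in a if x in b]

def common_fast(a, b):
    b_set = set(b)                         # built once: O(len(b))
    # each "x in b_set" is O(1) on average -> O(len(a) + len(b)) overall
    return [x for x in a if x in b_set]

a = [1, 2, 3, 4, 5]
b = [4, 5, 6, 7]
shared = common_fast(a, b)                 # [4, 5]
```

Both functions return the same list; only the cost of the membership test changes.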

Summary

List: index/append/pop at end O(1) (append amortized); insert/pop at front or middle, in, index/count O(n); sort O(n log n).
Dict / Set: get, set, delete, in O(1) average; iteration O(n).
String: index O(1), slice O(k); in O(n); avoid repeated +=, use ''.join().

2.14 collections Module

Introduction

The collections module provides high-performance alternatives and extensions to built-in types. For DSA the most used are: deque (double-ended queue with O(1) append/pop at both ends), Counter (count hashable elements and get frequencies), and defaultdict (dict with a default value for missing keys). Knowing when to use each keeps your code simple and efficient.

deque (Double-Ended Queue)

collections.deque is a sequence that supports O(1) append, appendleft, pop, and popleft. Use it instead of a list when you need to add or remove at both ends (e.g., BFS queue, sliding window from both sides). Indexing in the middle is O(n), so use for queue/stack style access, not random access.

from collections import deque

q = deque()
q.append(1)
q.append(2)
q.appendleft(0)   # [0, 1, 2]
x = q.popleft()   # 0, q is [1, 2]
y = q.pop()       # 2, q is [1]

BFS pattern: q = deque([start]), then while q: node = q.popleft(); ...; q.append(neighbor). This is the standard queue for level-order traversal and shortest path in unweighted graphs.
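
Written out in full, the pattern might look like this. The adjacency-dict representation, the function name bfs_distances, and the sample graph are assumptions made for the sketch.

```python
from collections import deque

def bfs_distances(graph, start):
    """Shortest edge count from start to every reachable node (unweighted graph)."""
    dist = {start: 0}                      # doubles as the visited set
    q = deque([start])
    while q:
        node = q.popleft()                 # O(1) pop from the left
        for neighbor in graph[node]:
            if neighbor not in dist:       # first visit is the shortest path
                dist[neighbor] = dist[node] + 1
                q.append(neighbor)
    return dist

graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
distances = bfs_distances(graph, "A")      # {'A': 0, 'B': 1, 'C': 1, 'D': 2}
```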

Concept Note

List insert(0, x) and pop(0) are O(n) because elements shift. Deque uses a linked structure internally, so both ends are O(1). For a queue or a stack that sometimes needs to pop from the left, always use deque.

Counter

Counter counts hashable elements. Give it an iterable (list, string) and it returns a dict-like object: keys are elements, values are counts. in, [], and iteration work like dict; missing keys return 0 (no KeyError).

from collections import Counter

c = Counter([1, 2, 2, 3, 3, 3])   # Counter({3: 3, 2: 2, 1: 1})
c = Counter("hello")              # Counter({'l': 2, 'h': 1, 'e': 1, 'o': 1})
c["x"]   # 0 (missing key)
c.most_common(2)   # [('l', 2), ('h', 1)]: the 2 most common
c.subtract(Counter("he"))   # subtract counts in place

Building a frequency map from an array: freq = Counter(arr) is one line and O(n). Use most_common(k) for “top k frequent” and elements() to iterate with repetition. For anagrams, Counter(s1) == Counter(s2) is a clean O(n) check.
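
These uses fit in a few lines each. A minimal sketch with sample data chosen for illustration:

```python
from collections import Counter

nums = [1, 1, 1, 2, 2, 3]
top2 = [value for value, count in Counter(nums).most_common(2)]   # [1, 2]

same_letters = Counter("listen") == Counter("silent")             # True

repeated = list(Counter("aab").elements())                        # ['a', 'a', 'b']

window = Counter("aab")
window.subtract("ab")          # counts become a: 1, b: 0 (zero entries are kept)
```

Note that subtract leaves keys with a zero count in the Counter, so "b" in window is still True afterwards.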

Expert Tip

Many “count frequency” or “top k frequent” problems are one-liners with Counter. If you need to add or subtract counts (e.g., sliding window frequencies), use c.subtract(iterable) or update with another Counter. For “first k most common,” most_common(k) returns a list of (elem, count) pairs.

defaultdict

defaultdict(factory) is a dict that calls factory() when a key is missing. defaultdict(int) gives 0 for missing keys—so d[x] += 1 works without get. defaultdict(list) gives []—so d[key].append(item) groups by key. Same O(1) average as dict.

from collections import defaultdict

freq = defaultdict(int)
for x in arr:
    freq[x] += 1   # no KeyError

groups = defaultdict(list)
for item in items:
    groups[item.key].append(item)

When to Use Which

deque: Queue (BFS), stack with occasional left pop, sliding window that shrinks from the left.
Counter: Frequency count from iterable, “top k frequent,” anagram check, subtract/add counts.
defaultdict: Frequency (int) or “group by key” (list) when you want to avoid get or explicit key checks.

Summary

deque: O(1) append/pop at both ends; use for BFS queue and when you need operations at both ends.
Counter: Count elements from iterable; most_common(k), subtract; ideal for frequency and anagram problems.
defaultdict: Dict with default value for missing keys; use for counts (int) or grouping (list).

2.15 heapq

Introduction

The heapq module provides a min-heap implementation on top of a list. The smallest element is always at index 0; after you pop it, the heap reorganizes so the next smallest is at the top. heappush and heappop are O(log n); heapify builds a heap from an existing list in O(n). There is no separate heap type: you use a list and call heapq functions so the list satisfies the heap invariant. For DSA, heaps are used for “top K,” “merge K sorted lists,” and priority queues (e.g., Dijkstra).

Why heapq Matters for DSA

When you need repeated “smallest (or largest) element” or “smallest among many candidates,” a heap gives O(log n) insert and O(log n) extract-min. That’s better than sorting the whole list each time (O(n log n)) or scanning for the min (O(n)). Top K elements, merge K sorted lists, and Dijkstra’s algorithm all rely on this “get min, add new candidates” pattern.

Basic API

heapq.heapify(lst) — Turn list lst into a min-heap in place. O(n). After this, lst[0] is the smallest element.
heapq.heappush(heap, x) — Add x to the heap, keeping the heap invariant. O(log n).
heapq.heappop(heap) — Remove and return the smallest element. O(log n).
heapq.heappushpop(heap, x) — Push x then pop the smallest. More efficient than push + pop when you’re replacing the min.
heapq.nsmallest(k, iterable) / heapq.nlargest(k, iterable) — Return the k smallest or k largest. Useful for one-off “top k”; for streaming or repeated updates, maintain a heap yourself.

import heapq

arr = [3, 1, 4, 1, 5]
heapq.heapify(arr)   # arr is now a min-heap; arr[0] == 1
x = heapq.heappop(arr)   # 1, heap size reduced
heapq.heappush(arr, 0)   # add 0; 0 becomes new min
smallest = arr[0]        # 0 — peek without popping (don't pop yet)

Min-Heap vs Max-Heap

heapq implements a min-heap only. For a max-heap, negate values: push -x, and when you pop, negate again to get the original maximum. heappush and heappop take no key argument, so for items with a priority, store tuples (-priority, item): “largest priority” becomes “smallest -priority” and pops first.

# Max-heap: negate on push, negate back on pop
heap = []
for x in [3, 1, 4]:
    heapq.heappush(heap, -x)     # push negative
max_val = -heapq.heappop(heap)   # 4: pop and negate back
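
The tuple form of the same idea, as a sketch. The task names and priorities are invented for the example.

```python
import heapq

jobs = [(2, "write"), (1, "plan"), (3, "ship")]

# Min-heap by priority: tuples compare by their first element first
min_heap = []
for priority, name in jobs:
    heapq.heappush(min_heap, (priority, name))
lowest = heapq.heappop(min_heap)               # (1, 'plan')

# Max-heap by priority: store the negated priority
max_heap = []
for priority, name in jobs:
    heapq.heappush(max_heap, (-priority, name))
neg_priority, top_name = heapq.heappop(max_heap)
highest = (-neg_priority, top_name)            # (3, 'ship')
```

When two priorities are equal, tuple comparison falls through to the second element, so the items themselves must be comparable; a common workaround is to store (priority, counter, item) with a unique increasing counter.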

Time Complexity

heapify: O(n). heappush, heappop: O(log n) each. Peeking the min is O(1) (heap[0]). So “push n elements then pop n times” is O(n log n). “Top k from n” with a size-k heap: O(n log k)—push each of n elements, heap never exceeds size k.

Example: Top K Elements

Keep a min-heap of size k. For each element: if heap has fewer than k elements, push it; else if the new element is larger than the heap’s minimum, pop the min and push the new one. At the end, the heap contains the K largest. To get them in order, pop repeatedly (smallest of the K first) or use nsmallest on the heap. Alternatively, push all with negated values for “K largest” and pop k times from a max-heap (negated) to get largest first.

def top_k_largest(arr, k):
    if k <= 0:
        return []            # nothing to keep; also avoids heap[0] on an empty heap
    heap = arr[:k]
    heapq.heapify(heap)   # min-heap of first k
    for x in arr[k:]:
        if x > heap[0]:
            heapq.heapreplace(heap, x)   # pop min, push x (or heappop + heappush)
    return heap   # k largest (unordered in heap; sort if needed)

heapq.heapreplace(heap, x) pops the smallest and then pushes x—equivalent to heappop then heappush, but slightly more efficient.

Common Mistake

The heap is just a list; the order of elements in the list is not fully sorted—only the heap invariant holds (parent ≤ children). So heap[0] is the min, but heap[1] is not necessarily the second smallest. Don’t iterate the list expecting sorted order; use heappop to get elements in order.

Expert Tip

For “merge K sorted lists,” push the first element of each list (with list index or iterator) into a min-heap. Pop the smallest, then push the next element from the same list. Repeat until the heap is empty. Each push/pop is O(log K); total O(N log K) for N total elements.
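
One way to write that procedure, as a sketch: each heap entry is (value, list index, position in that list). The function name merge_k_sorted is illustrative; the standard library also offers heapq.merge for the same job.

```python
import heapq

def merge_k_sorted(lists):
    """Merge K ascending lists into one ascending list in O(N log K)."""
    # Seed the heap with the first element of every non-empty list
    heap = [(lst[0], i, 0) for i, lst in enumerate(lists) if lst]
    heapq.heapify(heap)
    merged = []
    while heap:
        value, i, j = heapq.heappop(heap)      # smallest remaining candidate
        merged.append(value)
        if j + 1 < len(lists[i]):              # push the next element from the same list
            heapq.heappush(heap, (lists[i][j + 1], i, j + 1))
    return merged

result = merge_k_sorted([[1, 4, 7], [2, 5], [], [3, 6, 8]])
# [1, 2, 3, 4, 5, 6, 7, 8]
```

Including the list index in the tuple also breaks ties between equal values, so the comparison never has to go past the first two fields.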

Summary

heapq gives a min-heap on a list: heapify, heappush, heappop; min is always at heap[0].
For a max-heap, negate values when pushing and when popping.
Use for top K, merge K sorted lists, and priority-queue algorithms (e.g., Dijkstra).

2.16 bisect

Introduction

The bisect module provides binary search on sorted lists. It finds the position where an element would be inserted to keep the list sorted, or where an existing element sits. The search functions are O(log n); the insort functions add an O(n) shift for the insertion. Use it when you have a sorted sequence and need “find index of x” or “insert x and keep sorted” without writing binary search by hand.

Why bisect Matters for DSA

Binary search on a sorted array is a core pattern: search for a value, find the first position ≥ x, or the last position ≤ x. bisect_left and bisect_right give you these insertion points; from that you can check “is x present?” or “how many elements are < x?”. Many problems (search in rotated array, lower/upper bound, binary search on answer) build on the same idea—bisect is the standard library way to get the indices right.

Main Functions

bisect.bisect_left(a, x) — Leftmost index where x can be inserted so the list stays sorted. If x is already in a, returns the index of the first occurrence. So “is x in a?” → i = bisect_left(a, x); i < len(a) and a[i] == x.
bisect.bisect_right(a, x) (alias bisect.bisect(a, x)) — Rightmost index where x can be inserted. If x is in a, one past the last occurrence. So “count of elements ≤ x” → bisect_right(a, x); “count of elements < x” → bisect_left(a, x).
bisect.insort_left(a, x) — Insert x in a at the position that keeps order (before equal elements). O(n) for the shift.
bisect.insort_right(a, x) (alias bisect.insort(a, x)) — Insert x after equal elements. O(n) for the shift.

import bisect

a = [1, 2, 2, 3, 4]
bisect.bisect_left(a, 2)   # 1 — first position for 2
bisect.bisect_right(a, 2) # 3 — one past last 2
bisect.bisect_left(a, 5)  # 5 — would go at end
# Is 2 in a? i = bisect.bisect_left(a, 2); i < len(a) and a[i] == 2  → True

Left vs Right

Use bisect_left when you want “first index ≥ x” or “where to insert so x is before equals.” Use bisect_right when you want “first index > x” (one past last x) or “count of elements ≤ x” (the returned index equals that count, because every element before it is ≤ x). For “lower bound” (first ≥ x): bisect_left. For “upper bound” (first > x): bisect_right.

Time Complexity

All bisect functions are O(log n) for the search. insort is O(n) because it shifts elements to make room. So use bisect for search; use insort only when you occasionally need to maintain a sorted list with inserts (e.g., small dynamic set). For many inserts, consider a structure that supports O(log n) insert (e.g., sorted container from a library) or batch sort.

Common Mistake

The list must be sorted (ascending) for bisect to be correct. If the list is not sorted, the result is meaningless. Also: bisect_left returns an index that can be len(a) (x is greater than all elements)—check bounds before using as an index.

Expert Tip

“Lower bound” = first index i such that a[i] >= x → bisect_left(a, x). “Upper bound” = first index i such that a[i] > x → bisect_right(a, x). Number of elements in [L, R] in sorted a → bisect_right(a, R) - bisect_left(a, L).
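
These identities translate directly into small helpers. A sketch, where contains and count_in_range are hypothetical helper names and a is sample data:

```python
import bisect

def contains(a, x):
    """Membership test on a sorted list in O(log n)."""
    i = bisect.bisect_left(a, x)
    return i < len(a) and a[i] == x        # bounds check first: i may equal len(a)

def count_in_range(a, lo, hi):
    """Number of elements v in sorted a with lo <= v <= hi."""
    return bisect.bisect_right(a, hi) - bisect.bisect_left(a, lo)

a = [1, 2, 2, 3, 4, 7]
found = contains(a, 3)                     # True
absent = contains(a, 9)                    # False
in_range = count_in_range(a, 2, 4)         # 4 (the elements 2, 2, 3, 4)
```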

Summary

bisect_left(a, x) = leftmost insertion point (first ≥ x); bisect_right(a, x) = rightmost (first > x). List must be sorted.
Use for “is x in sorted list?,” “count in range [L, R],” lower/upper bound. O(log n) search; insort is O(n).

2.17 itertools

Introduction

The itertools module provides iterator building blocks: combinations, permutations, Cartesian product, chaining iterables, grouping, and more. These are lazy (one element at a time) and memory-efficient. For DSA you’ll use them for “all subsets of size k,” “all permutations,” “all pairs from two lists,” and similar enumeration tasks without writing nested loops or recursion by hand.

Why itertools Matters for DSA

Problems that ask for “all combinations,” “all permutations,” or “iterate over pairs (i, j)” can be done with itertools.combinations, permutations, or product. They’re correct, readable, and avoid off-by-one errors. For backtracking you often write your own recursion, but when the problem is “enumerate all,” the standard iterators are a quick and correct option. You can wrap them in list() if you need the full list, though that stores every result at once: C(n, r) tuples for combinations and up to n! for permutations, which is far more than O(n).

Combinations and Permutations

itertools.combinations(iterable, r) — All subsequences of length r in iterable order; no repeats, (a,b) same as (b,a).
itertools.combinations_with_replacement(iterable, r) — Same but elements can repeat (e.g., (1,1), (1,2), (2,2)).
itertools.permutations(iterable, r=None) — All orderings of length r (default: full length). (a,b) and (b,a) both appear.

import itertools

list(itertools.combinations([1,2,3], 2))   # [(1,2), (1,3), (2,3)]
list(itertools.permutations([1,2], 2))      # [(1,2), (2,1)]
list(itertools.combinations_with_replacement([1,2], 2))  # [(1,1),(1,2),(2,2)]

Product

itertools.product(*iterables, repeat=1) — Cartesian product: all pairs (or tuples) from the given iterables. product(A, B) is like nested loops “for a in A: for b in B.” With repeat=k, one iterable is used k times (e.g., all k-tuples from a set of choices).

list(itertools.product([1,2], ['a','b']))  # [(1,'a'),(1,'b'),(2,'a'),(2,'b')]
list(itertools.product([0,1], repeat=3))   # all 3-bit: (0,0,0)..(1,1,1)

Chain and Slice

itertools.chain(*iterables) — Iterate over the first iterable, then the second, and so on. Flattens one level. chain.from_iterable(iterables) takes one iterable of iterables.
itertools.islice(iterable, start, stop, step) — Slice an iterator (like list slice but lazy). No negative indices.

list(itertools.chain([1,2], [3,4]))   # [1, 2, 3, 4]
list(itertools.islice(range(10), 2, 6))   # [2, 3, 4, 5]

groupby

itertools.groupby(iterable, key=None) — Group consecutive elements that have the same key. The iterable should be sorted by the key for meaningful groups. Yields (key, group_iterator) pairs. Useful for “run-length” style grouping.

# Consecutive equal elements (sort first if needed)
for k, g in itertools.groupby([1,1,2,2,2,3]):
    print(k, list(g))   # 1 [1,1], 2 [2,2,2], 3 [3]
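To see why sorting matters for groupby, here is a small sketch that groups the same values twice, once as given and once sorted. The sample list is a made-up illustration.

```python
import itertools

data = [1, 2, 1, 1, 2]   # hypothetical input, deliberately unsorted

# groupby only merges *consecutive* equal keys, so 1 and 2 each appear twice.
unsorted_runs = [(k, len(list(g))) for k, g in itertools.groupby(data)]
print(unsorted_runs)     # [(1, 1), (2, 1), (1, 2), (2, 1)]

# Sorting first brings equal keys together, giving one group per distinct key.
sorted_runs = [(k, len(list(g))) for k, g in itertools.groupby(sorted(data))]
print(sorted_runs)       # [(1, 3), (2, 2)]
```

The first form is what you want for run-length encoding; the second is closer to "count per distinct value" (for which collections.Counter is usually simpler).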

Other Useful Tools

itertools.cycle(iterable) — Repeat the iterable forever (use with islice to limit).
itertools.repeat(x, times=None) — Yield x repeatedly (or times times).
itertools.count(start=0, step=1) — Infinite counter.
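Because cycle and count never stop on their own, they are normally paired with islice. A minimal sketch, with illustrative variable names:

```python
import itertools

# Take the first 5 items of an endless a, b, a, b, ... stream.
first_five = list(itertools.islice(itertools.cycle("ab"), 5))
print(first_five)   # ['a', 'b', 'a', 'b', 'a']

# count(0, 2) yields 0, 2, 4, ... forever; islice keeps the first 4.
evens = list(itertools.islice(itertools.count(0, 2), 4))
print(evens)        # [0, 2, 4, 6]

# repeat with a times argument is finite, so list() is safe here.
zeros = list(itertools.repeat(0, 3))
print(zeros)        # [0, 0, 0]
```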

Expert Tip

For “all subsets of size k” from a list: itertools.combinations(arr, k). For “all permutations of length k”: itertools.permutations(arr, k). For nested loops over indices: product(range(n), range(m)). Convert to list only if you need to index or reuse; otherwise iterate once to save memory.
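As a rough illustration of the tip above, the sketch below consumes combinations lazily (stopping at the first match) and checks that product visits the same index pairs as two nested loops. The sample array and the target sum of 7 are assumptions chosen for the example.

```python
import itertools

arr = [3, 1, 4]   # hypothetical sample data

# Iterate lazily over all size-2 subsets and stop at the first pair summing to 7.
first_pair = next(
    (pair for pair in itertools.combinations(arr, 2) if sum(pair) == 7),
    None,          # default if no pair qualifies
)
print(first_pair)  # (3, 4)

# product(range(n), range(m)) yields the same (i, j) pairs as nested loops.
cells = list(itertools.product(range(2), range(3)))
nested = [(i, j) for i in range(2) for j in range(3)]
print(cells == nested)   # True
```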

Summary

combinations / permutations / product — enumerate subsets, orderings, or Cartesian product; lazy iterators.
chain, islice, groupby — combine or slice iterators, group consecutive keys.
Use itertools for “all combinations/permutations/pairs” to avoid manual loops and keep code clear.

2.18 functools

Introduction

The functools module provides tools for higher-order functions: caching, partial application, and reduction. For DSA the most important is lru_cache—a decorator that memoizes function results by argument (with an optional size limit). It turns a recursive solution with repeated subproblems into an efficient one without writing a cache by hand. Other tools like partial and reduce are useful in general Python but less central to algorithm implementation.

Why functools Matters for DSA

Recursive solutions often recompute the same arguments (e.g., Fibonacci, many DP problems). Memoization stores the return value for each argument so the second time you call with the same args you get the cached value. @lru_cache does this automatically: add the decorator and ensure arguments are hashable (e.g., use tuples instead of lists for state). Then the recursive function runs in time proportional to distinct argument values instead of exploding exponentially.

lru_cache

functools.lru_cache(maxsize=128, typed=False) — Decorator that caches the most recent results of a function. Arguments must be hashable (so use tuples, not lists, for composite state). Call the function with the same arguments again and you get the cached result (O(1) lookup) instead of recomputing.

from functools import lru_cache

@lru_cache(maxsize=None)   # unbounded cache
def fib(n):
    if n <= 1:
        return n
    return fib(n - 1) + fib(n - 2)

fib(100)   # fast — each n computed once

Without the decorator, fib(n) would be exponential. With it, each distinct n is computed once; subsequent calls with the same n return the cached value. Use maxsize=None for unbounded cache (all distinct argument tuples stored). Use a number (e.g., 1000) to limit memory; LRU evicts least recently used entries when full.

Hashable Arguments Only

The cache uses arguments as dict keys, so they must be hashable. If your state includes a list, convert it to a tuple for the call: f(tuple(arr), i). For multiple arguments, lru_cache treats the whole (arg1, arg2, ...) as the key.
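A minimal sketch of the tuple conversion, using the classic "maximum sum of non-adjacent elements" recursion as a stand-in problem. The function name best_from and the input list are illustrative assumptions.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def best_from(nums, i):
    """Max sum of non-adjacent elements of nums[i:]. nums must be a tuple."""
    if i >= len(nums):
        return 0
    take = nums[i] + best_from(nums, i + 2)   # use nums[i], skip its neighbor
    skip = best_from(nums, i + 1)             # leave nums[i] out
    return max(take, skip)

arr = [2, 7, 9, 3, 1]                 # hypothetical input list
answer = best_from(tuple(arr), 0)     # convert once at the call site
print(answer)                         # 12  (2 + 9 + 1)
```

Passing arr itself instead of tuple(arr) would raise TypeError, because the list cannot be hashed into the cache key. Passing only the index i and reading the sequence from an enclosing scope is another common option.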

Concept Note

Memoization = “remember results by arguments.” lru_cache is memoization with an optional cap (LRU eviction). Same idea as a manual dict cache: if args in cache: return cache[args]; cache[args] = result. The decorator does this for you and handles thread-safety and size limit.
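For comparison, here is roughly what a hand-written dict cache looks like. This is a sketch of the idea, not how lru_cache is implemented internally; the name fib_memo is an assumption.

```python
def fib_memo(n, cache=None):
    """Fibonacci with a manual dict cache keyed by the argument n."""
    if cache is None:
        cache = {}                    # fresh cache for each top-level call
    if n in cache:
        return cache[n]               # seen before: reuse the stored result
    result = n if n <= 1 else fib_memo(n - 1, cache) + fib_memo(n - 2, cache)
    cache[n] = result                 # remember the result for this argument
    return result

print(fib_memo(50))   # 12586269025
```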

partial

functools.partial(func, *args, **kwargs) — Returns a new callable that fixes some arguments of func. Useful when you need to pass a function that takes one argument (e.g., to sorted or map) but your function takes two—fix the second with partial.

from functools import partial
def power(base, exp):
    return base ** exp
square = partial(power, exp=2)
square(5)   # 25

reduce

functools.reduce(function, iterable[, initializer]) applies a two-argument function cumulatively from left to right: function(function(...(function(initializer, first), second), ...), last). Classic example: reduce(lambda a, b: a + b, [1,2,3], 0) → 6. If you omit the initializer, the first element seeds the fold, and an empty iterable raises TypeError. In Python 3, reduce is not a builtin and must be imported from functools; for readability, a simple loop (or sum, min, max) is often clearer. Use it when it fits the problem (e.g., product of a list: reduce(operator.mul, arr, 1), which also needs import operator).
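A short sketch of reduce next to the equivalent loop, to make the folding order concrete. The sample list and variable names are illustrative.

```python
from functools import reduce
import operator

values = [2, 3, 4]   # hypothetical sample data

# Fold with multiplication, starting from 1: ((1 * 2) * 3) * 4
product = reduce(operator.mul, values, 1)
print(product)       # 24

# The same computation as an explicit loop.
running = 1
for v in values:
    running = running * v
print(running)       # 24

# With an initial value, an empty iterable simply returns that value.
empty_product = reduce(operator.mul, [], 1)
print(empty_product) # 1
```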

Common Mistake

Using lru_cache on a function that takes unhashable arguments (e.g., a list). You’ll get TypeError: unhashable type: 'list'. Convert lists to tuples when calling, or use a tuple of primitive args that describes the state. Also: clear the cache between test cases in competitive coding—use fib.cache_clear() (or your decorated function’s cache_clear()).
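Clearing the cache between test cases can be sketched as follows. cache_info() and cache_clear() are methods that lru_cache attaches to the decorated function; the fib function here is just a convenient example.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    return n if n <= 1 else fib(n - 1) + fib(n - 2)

fib(30)
info = fib.cache_info()
print(info.misses, info.currsize)    # 31 31  (one entry for each n in 0..30)

fib.cache_clear()                    # forget everything before the next test case
print(fib.cache_info().currsize)     # 0
```

Clearing matters most when the cached function reads data from outside its arguments (for example a global array), since stale entries would then give wrong answers for the next input.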

Expert Tip

In DP/recursion, define the function with the minimal set of (hashable) parameters that define the state. Decorate with @lru_cache(maxsize=None). If you have a list that’s part of the state, pass indices or a tuple of relevant values instead of the list so the key is hashable.

Summary

lru_cache — Memoization decorator; use for recursive/DP functions with hashable args to avoid recomputation.
partial — Fix some arguments of a function; useful for callbacks. reduce — Fold an iterable to a single value.
For DSA, lru_cache is the main tool; ensure arguments are hashable (e.g., tuple, int).

2.19 Python Memory Management (Reference Counting & GC)

Introduction

CPython, the reference implementation of Python, manages memory mainly through reference counting: each object keeps a count of how many references point to it. When that count drops to zero, the object is reclaimed immediately. A garbage collector (GC) also runs to break reference cycles (e.g., A refers to B, B refers to A) that reference counting alone cannot free. For DSA you rarely tune this, but understanding “variables are references” and “when is an object freed?” helps you reason about aliasing, copies, and space.

Why This Matters for DSA

You don’t usually optimize Python memory at the level of refcounts. What matters is: assignment doesn’t copy objects (a = b means both names point to the same object); mutating that object affects all names; when you need a copy, use .copy(), list(), or slicing (all of which make shallow copies). Knowing that “no more references = object can be freed” explains why a big structure can be reclaimed once you drop the only reference to it (e.g., by reassigning the variable, or by returning from a function so the local reference disappears).

Reference Counting

Every object has a reference count: how many names, containers, or other objects point to it. When you assign x = [1, 2], the list’s refcount is 1. When you do y = x, it becomes 2—same object, two names. When y is reassigned or goes out of scope, the count drops. When it reaches 0, Python frees the object (and any objects it alone referenced). This is automatic and immediate for non-cyclic structures.

import sys
a = [1, 2, 3]
sys.getrefcount(a)   # 2 (a + the argument to getrefcount)
b = a
sys.getrefcount(a)   # 3
del b
# refcount drops; when a is no longer used, list is freed

getrefcount is for curiosity; the exact number includes the temporary reference from passing a to the function. The idea: more references = higher count; when all references are gone, the object is reclaimed.

Cycles and the Garbage Collector

If object A holds a reference to B and B holds a reference to A, their refcounts never become 0 even if no external reference exists. Python’s cycle detector (in the gc module) periodically finds such cycles and reclaims them. So you can have circular references (e.g., a graph node pointing to neighbors) and they will still be collected when unreachable. You normally don’t need to call gc.collect(); the runtime runs it when needed.
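The following sketch makes the cycle collector visible. It assumes CPython, pauses automatic collection so the timing is predictable, and uses a hypothetical Node class; the weak reference is only a probe that does not keep the object alive.

```python
import gc
import weakref

class Node:
    """Hypothetical graph node used only for this demo."""
    def __init__(self):
        self.neighbor = None

def make_unreachable_cycle():
    a, b = Node(), Node()
    a.neighbor, b.neighbor = b, a      # a -> b -> a
    return weakref.ref(a)              # weak reference: does not keep a alive

gc.disable()                           # pause automatic cycle collection for the demo
probe = make_unreachable_cycle()
alive_before = probe() is not None     # refcounts never reached 0, so still alive
gc.collect()                           # run the cycle detector by hand
alive_after = probe() is not None
gc.enable()
print(alive_before, alive_after)       # True False on CPython
```

In normal programs you would leave the collector enabled and never call gc.collect() yourself; the manual call here only makes the moment of collection observable.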

Implications for Your Code

Aliasing: a = b does not copy; both refer to the same object. Changes via a are visible via b. For DSA this is why “pass by reference” for lists matters: mutating inside a function affects the caller’s list.
When objects are freed: When the last reference disappears (variable reassigned, scope left, container holding the reference cleared), the object becomes eligible for reclamation. So a large local list is freed when the function returns (assuming you don’t return it or store it elsewhere).
No explicit free: You don’t manually free memory; dropping references is enough. To release a large structure early, delete the name (del big_list) or reassign it so nothing else points to the object.
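The aliasing point can be checked with a few lines. The function name append_zero and the sample lists are illustrative.

```python
import copy

def append_zero(items):
    items.append(0)          # mutates the caller's list; no copy is made

original = [1, 2]
alias = original             # second name for the same list
snapshot = original.copy()   # independent (shallow) copy
append_zero(original)
print(alias)                 # [1, 2, 0]  (sees the mutation)
print(snapshot)              # [1, 2]     (unaffected)

# Shallow copies still share nested objects; deepcopy does not.
grid = [[0, 0], [0, 0]]
shallow = grid.copy()
deep = copy.deepcopy(grid)
grid[0][0] = 9
print(shallow[0][0], deep[0][0])   # 9 0
```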

Concept Note

“Reference” here means a pointer to an object: a variable name, an element in a list, a dict value, etc. Immutable objects (ints, tuples) can be interned or shared by the implementation, so refcount might be higher than you expect—but the model “no references → freed” still holds.

Summary

Python uses reference counting plus a garbage collector for cycles. When refcount hits 0 (and no cycle keeps it alive), the object is reclaimed.
Assignment copies the reference, not the object—so aliasing and mutation matter. Use explicit copy when you need an independent copy.
For DSA, this backs your mental model of “pass by reference” and when big structures get freed (when no reference remains).

2.20 Internals of list & dict (Dynamic Resizing & Amortized Analysis)

Introduction

Lists and dicts in Python grow dynamically: they allocate more memory when needed instead of fixing a size upfront. That growth is done in chunks (list) or by resizing the hash table (dict), so individual operations stay fast on average. Understanding dynamic resizing and amortized analysis explains why we say list.append is O(1) amortized and dict get/set is O(1) average—and why the occasional "expensive" resize doesn't make the whole sequence slow.

Why This Matters for DSA

When we state "append is O(1) amortized," we mean: over n appends, total time is O(n), so average per append is O(1). You don't need to implement resizing yourself—but knowing that lists over-allocate and resize in steps (and that dicts resize when load factor gets high) lets you trust the complexity we use in analysis and avoid micro-optimizations like preallocating list size "to be safe."

List: Dynamic Array

A list is implemented as a dynamic array: a contiguous block of pointers to objects. When you append and the current block is full, Python allocates a larger block, copies the pointers over, and frees the old one. It doesn't grow by one slot each time—it uses a growth strategy (in CPython, the new capacity is computed so that the sequence of appends does O(n) total work). So most appends just write into an existing empty slot (O(1)); occasionally one triggers a resize (O(current size)), but that cost is "spread" over the next many cheap appends. That's amortized O(1) per append.
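One way to observe over-allocation is to watch sys.getsizeof while appending. This is a CPython-specific sketch: the exact byte sizes and the number of resizes are implementation details and vary between versions, so the code prints only a count.

```python
import sys

items = []
last_size = sys.getsizeof(items)
resizes = 0
for i in range(1000):
    items.append(i)
    size = sys.getsizeof(items)
    if size != last_size:      # the underlying block was reallocated
        resizes += 1
        last_size = size

# Far fewer than 1000: most appends reuse spare capacity.
print(resizes)
```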

Concept Note

Amortized means: if you do n operations, total cost is O(n), so average cost per operation is O(1). A single operation might sometimes be O(k) (e.g., resize when size was k), but the sum of all such costs over the life of the structure is linear in the number of operations. So we say "append is O(1) amortized."

Dict: Hash Table and Resizing

A dict is a hash table: an array of slots that hold key-value entries. On get/set, the key is hashed to pick a slot, and collisions are resolved by probing or chaining (CPython uses open addressing, that is, probing for another slot). When the table gets too full (high load factor = number of entries / number of slots), it is resized to a larger table and all entries are reinserted. That resize is O(n), but it happens only when the load factor crosses a threshold, so over many insertions the average cost per insertion stays O(1). Deletes don't shrink the table immediately in CPython; the table can grow and stay large. So we say get/set/delete are O(1) average (assuming a good hash function and normal load).

Amortized Analysis Intuition

For a list: suppose every time we double the capacity we do work proportional to the current size. So we do 1 + 2 + 4 + 8 + … + (capacity at some point) units of copy work. That sum is less than 2 × final size. So for n appends, total copy work is O(n), hence O(1) per append on average. The "expensive" resizes are rare and their cost is amortized over the many cheap appends that follow. Same idea for dict: occasional O(n) resize, but averaged over all insertions, each insertion is O(1).
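The doubling argument can be simulated directly. This is a toy model that assumes capacity starts at 1 and doubles when full (CPython's real growth factor is smaller, but the same reasoning applies); total_copy_work is a hypothetical helper name.

```python
def total_copy_work(n):
    """Simulate n appends into a doubling array; return how many elements were copied."""
    capacity, size, copies = 1, 0, 0
    for _ in range(n):
        if size == capacity:   # full: allocate double and copy everything over
            copies += size
            capacity *= 2
        size += 1              # the append itself is one cheap write
    return copies

for n in (10, 1000, 100_000):
    copies = total_copy_work(n)
    print(n, copies, copies < 2 * n)   # copy work stays below 2n
```

For n = 1000 the simulated copies are 1 + 2 + 4 + ... + 512 = 1023, which is under 2 × 1000, matching the geometric-sum bound in the text.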

Expert Tip

In interviews you can say: "List append is O(1) amortized because the list over-allocates and resizes in chunks; over n appends the total work is O(n). Dict get/set is O(1) average because it's a hash table with resizing when load factor gets high." That shows you understand the internals at a level that supports complexity analysis.

Summary

List = dynamic array; grows in chunks so append is O(1) amortized (total work for n appends is O(n)).
Dict = hash table; resizes when load factor is high so get/set are O(1) average.
Amortized = total cost of n operations is O(n), so average per operation is O(1). Occasional expensive resize is spread over many cheap operations.

2.21 Object Internals (__slots__, is vs ==)

Introduction

Two concepts come up in Python's object model: is vs == (identity vs value equality), and __slots__ (restricting instance attributes to save memory and speed up attribute access). For DSA you'll mostly care about is vs ==, especially the convention of writing is None instead of == None, and rarely about __slots__ unless you create huge numbers of small objects.

is vs ==

is checks identity: do two names refer to the exact same object in memory? == checks value equality: do the objects compare equal (via __eq__)? Two different list instances with the same elements are == but not is. The same object is both is and ==.

a = [1, 2, 3]
b = [1, 2, 3]
a == b   # True  — same values
a is b   # False — different objects

c = a
a is c   # True  — same object

Use is None (and is not None) to check for None—it's the conventional and correct way, because there's only one None object. Using == None works but can be overridden by a custom __eq__; is None cannot. For other values, use == when you care about equality of content.
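To illustrate how a custom __eq__ can make == None misleading, here is a deliberately contrived sketch. The AlwaysEqual class is hypothetical and exists only for the demonstration.

```python
class AlwaysEqual:
    """Contrived class whose __eq__ claims equality with everything."""
    def __eq__(self, other):
        return True

obj = AlwaysEqual()
print(obj == None)   # True   (__eq__ was consulted and answered True)
print(obj is None)   # False  (identity check cannot be overridden)
```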

Common Mistake

Using == when you mean identity (e.g., "if x is None") or using is when you mean value equality. For numbers and strings, is can seem to work due to interning/caching of small integers, but don't rely on it—use == for comparing values.

__slots__

By default, each instance of a class has a __dict__ that stores its attributes—flexible but uses extra memory per instance. __slots__ is a class attribute that lists the only attribute names the instances are allowed to have. The class then uses a fixed-size structure (like a tuple) instead of a dict for those attributes, saving memory and giving slightly faster attribute access. You can't add new instance attributes beyond those in __slots__.

class Point:
    __slots__ = ('x', 'y')
    def __init__(self, x, y):
        self.x, self.y = x, y

p = Point(1, 2)
# p.z = 3   # AttributeError — not in __slots__

Use __slots__ when you have many small instances (e.g., graph nodes, events) and want to reduce memory. For DSA it's an optimization, not required for correctness.
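A small side-by-side sketch shows the behavioral difference. The class names PlainPoint and SlottedPoint are illustrative.

```python
class PlainPoint:
    def __init__(self, x, y):
        self.x, self.y = x, y

class SlottedPoint:
    __slots__ = ('x', 'y')
    def __init__(self, x, y):
        self.x, self.y = x, y

plain = PlainPoint(1, 2)
slotted = SlottedPoint(1, 2)

print(hasattr(plain, '__dict__'))     # True   (per-instance attribute dict)
print(hasattr(slotted, '__dict__'))   # False  (no per-instance dict)

plain.z = 3                           # fine: stored in plain.__dict__
try:
    slotted.z = 3                     # 'z' is not listed in __slots__
    rejected = False
except AttributeError:
    rejected = True
print(rejected)                       # True
```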

Summary

is = identity (same object); == = value equality. Use is None / is not None for None checks.
__slots__ = fixed set of instance attributes; saves memory and speeds access when you have many small objects.

2.22 Concurrency Basics (GIL & Asyncio Overview)

Introduction

CPython, the standard Python interpreter, has a Global Interpreter Lock (GIL): in the default build, only one thread can execute Python bytecode at a time in a single process. So multithreading doesn't give you parallel execution of CPU-bound Python code; threads take turns. For I/O-bound work (network, disk), threads can still help because the lock is released while waiting. Asyncio is a different model: cooperative concurrency with async/await, one thread, many tasks that yield during I/O. For DSA and algorithm interviews you rarely need concurrency; this section is an overview so you know the landscape.

The GIL (Global Interpreter Lock)

The GIL is a mutex that protects access to Python objects. Only one thread holds it at a time, so only one thread runs Python bytecode at a time. That simplifies the interpreter (no fine-grained locking on every object) but means:

CPU-bound: Multiple threads running pure Python computation don't run in parallel—they serialize. To use multiple CPU cores for CPU-bound work, use multiprocessing (separate processes, each with its own interpreter and GIL) or offload to C extensions that release the GIL.
I/O-bound: While a thread is waiting for I/O (network, file), it typically releases the GIL. So other threads can run. Multithreading can still speed up I/O-bound programs (e.g., many simultaneous HTTP requests) because waiting is overlapped.
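The I/O-bound case can be sketched with a thread pool. Here time.sleep stands in for a network wait (it releases the GIL while sleeping); fake_download and the 0.2 second delay are assumptions made for the demo, and exact timings depend on the machine.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_download(task_id):
    time.sleep(0.2)          # placeholder for waiting on the network
    return task_id

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=5) as pool:
    results = list(pool.map(fake_download, range(5)))
elapsed = time.perf_counter() - start

print(results)               # [0, 1, 2, 3, 4]
# Roughly 0.2 s rather than 1.0 s, because the five waits overlap.
print(round(elapsed, 1))
```

If fake_download did pure Python computation instead of sleeping, the threads would take turns holding the GIL and the total time would not improve.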

Concept Note

For algorithm contests and interviews, code is almost always single-threaded. The GIL matters when you design systems (e.g., "should we use threads or processes for this worker pool?") or when you optimize I/O-heavy scripts. For DSA problem-solving, you can ignore the GIL.

Asyncio Overview

Asyncio provides cooperative concurrency: you define coroutines with async def and await other coroutines or I/O operations. One thread runs an event loop; when a coroutine hits await (e.g., waiting for a socket), the loop can run another coroutine. So many I/O operations can be in flight without blocking the thread—useful for servers handling many connections or scripts making many HTTP requests. It's not parallelism (one thread); it's concurrency (overlapping I/O wait).

Key ideas: async def defines a coroutine; await yields until the awaited thing completes; asyncio.run(main()) runs the top-level coroutine. For DSA you don't need to write async code; just know that asyncio is for I/O-bound concurrency on one thread, not for making CPU-bound algorithms faster.
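A minimal sketch of those three pieces together. The coroutine name fetch and the delays are illustrative; asyncio.sleep stands in for real I/O.

```python
import asyncio

async def fetch(name, delay):
    await asyncio.sleep(delay)   # yields to the event loop while "waiting"
    return name

async def main():
    # Both coroutines are in flight at once; gather returns results in argument order.
    return await asyncio.gather(fetch("a", 0.02), fetch("b", 0.01))

results = asyncio.run(main())
print(results)   # ['a', 'b']
```

Even though "b" finishes first, gather preserves the order in which the awaitables were passed.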

When to Use What

Single-threaded: Default for DSA, scripts, most interview code.
Multithreading: I/O-bound work where you want to overlap waits (e.g., many URLs); limited by GIL for CPU.
Multiprocessing: CPU-bound work to use multiple cores; no GIL sharing (each process has its own).
Asyncio: I/O-bound, many concurrent operations; one thread, cooperative scheduling.

Summary

The GIL allows only one thread to run Python bytecode at a time; multithreading doesn't parallelize CPU-bound Python code.
Asyncio = cooperative concurrency with async/await; good for I/O-bound, one thread.
For DSA and interviews, single-threaded code is the norm; concurrency is for system design and I/O-heavy applications.

3.1 Big-O Notation

Introduction

Big-O notation describes how the time or space used by an algorithm grows as the input size grows. We use it to compare algorithms and to state guarantees: "this runs in O(n) time" means the running time is at most proportional to n (for large n), up to a constant factor. Big-O is the language of complexity analysis in interviews and in practice—you must be able to state it, derive it from code, and compare two Big-O expressions.

Why Big-O Matters

We care about growth rate, not the exact number of nanoseconds. When n doubles, does the time double (O(n)), quadruple (O(n²)), or stay roughly the same (O(1))? That tells us whether an algorithm will scale. Problem constraints (e.g., n ≤ 10⁵) combined with Big-O tell you if a solution will pass (e.g., O(n log n) is fine, O(n²) might be too slow). In interviews, you're expected to state complexity and justify it.

Formal Definition (Intuitive)

We say T(n) is O(g(n)) if, for large enough n, T(n) is at most a constant multiple of g(n). In symbols: there exist constants c > 0 and n₀ such that for all n ≥ n₀, T(n) ≤ c · g(n). So we ignore constant factors and lower-order terms: 5n + 3 is O(n), 2n² + n is O(n²). Big-O is an upper bound: "no worse than this growth."
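To make the constants concrete, take T(n) = 5n + 3 and g(n) = n. One valid choice (among many) is c = 6 and n₀ = 3, since 5n + 3 ≤ 6n exactly when n ≥ 3. The quick numeric check below is only a sanity test over a finite range, not a proof.

```python
def T(n):
    return 5 * n + 3

c, n0 = 6, 3   # one valid witness pair for "5n + 3 is O(n)"

holds_from_n0 = all(T(n) <= c * n for n in range(n0, 10_000))
fails_below_n0 = T(2) > c * 2      # 13 > 12, so n0 cannot be 2 with c = 6

print(holds_from_n0, fails_below_n0)   # True True
```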

Common Complexity Classes

From best to worst (for large n):

O(1) — Constant. Same cost regardless of n (e.g., hash lookup, array index).
O(log n) — Logarithmic. Doubling n adds a constant amount of work (e.g., binary search, balanced tree operations).
O(n) — Linear. Doubling n doubles the work (e.g., one pass over an array).
O(n log n) — Linearithmic. Typical for efficient sorting (e.g., merge sort, heapsort).
O(n²) — Quadratic. Nested loops over n (e.g., two loops over the array).
O(n³), O(2ⁿ), O(n!) — Higher; often too slow for large n unless the problem size is tiny.

Concept Note

By convention we state the tightest bound we can prove: a linear-time algorithm is technically also O(n²), because Big-O is only an upper bound, but we write O(n). So "runs in O(n) time" means time ≤ c·n for some c and large n. Base of logarithm doesn't matter for Big-O (log₂ n and log₁₀ n differ by a constant factor), so we write O(log n).

How to Derive Big-O from Code

Count basic steps (or representative operations) as a function of input size n.
Drop constant factors (e.g., 3n → n).
Keep the dominant term (the one that grows fastest as n grows). So 5n + 10 → O(n); n² + n → O(n²).

Single loop that does O(1) work per iteration → O(n). Two nested loops, each over n → O(n²). Loop that halves n each time → O(log n). Recursion with one call per level and n levels → O(n); with two calls per level and n levels (like naive Fibonacci) → exponential.

# O(n): one loop
for i in range(len(arr)):
    process(arr[i])

# O(n²): two nested loops
for i in range(n):
    for j in range(n):
        do_something()
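The "loop that halves n" pattern mentioned above can be counted the same way. A small sketch, with hypothetical helper names, that counts iterations instead of doing real work:

```python
def halving_steps(n):
    """Count iterations of a loop that halves n until it reaches 1."""
    steps = 0
    while n > 1:
        n //= 2
        steps += 1
    return steps

def nested_steps(n):
    """Count iterations of two nested loops over range(n)."""
    steps = 0
    for i in range(n):
        for j in range(n):
            steps += 1
    return steps

print(halving_steps(1024))        # 10   (log2 of 1024)
print(halving_steps(1_000_000))   # 19   (about log2 of a million)
print(nested_steps(100))          # 10000
```

Going from n = 1024 to n = 1,000,000 adds only 9 halving steps, while the nested loops grow with the square of n.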

Summary

Big-O = upper bound on growth rate; we ignore constants and lower-order terms.
Common: O(1), O(log n), O(n), O(n log n), O(n²); know what they mean and when they arise.
Derive from code by counting steps, dropping constants, keeping the dominant term.

3.2 Theta & Omega Notation

Introduction

Big-O gives an upper bound: "the algorithm is no worse than this growth." Sometimes we need a lower bound ("at least this much work") or a tight bound ("exactly this growth, up to constants"). Omega (Ω) is the lower bound; Theta (Θ) means "both upper and lower bound"—the growth is tightly characterized. Together, O, Ω, and Θ are the standard asymptotic notation; in interviews you'll mostly use O, but knowing Θ and Ω helps you state "optimal" precisely and understand lower-bound arguments.

Why This Topic Matters

When we say "this algorithm is optimal," we often mean its running time is Θ(f(n)) and that any algorithm for the problem must take Ω(f(n))—so we've matched the lower bound. For example, comparison-based sorting is Ω(n log n); merge sort is O(n log n), so merge sort is optimal (Θ(n log n) for that problem). Omega is also used in proofs: "you must look at every element at least once, so the algorithm is Ω(n)." Theta is the right way to say "the complexity is exactly this order of growth."

Omega (Ω): Lower Bound

We say T(n) is Ω(g(n)) if, for large enough n, T(n) is at least a constant multiple of g(n). Formally: there exist constants c > 0 and n₀ such that for all n ≥ n₀, T(n) ≥ c · g(n). So Ω describes a lower bound: "the algorithm does at least this much work." Example: finding the maximum in an unsorted array by comparison is Ω(n), because you must look at every element (otherwise an unseen element might be the max). So any correct algorithm is at least linear.

Concept Note

Big-O = "no worse than" (upper bound). Omega = "no better than" (lower bound). Saying "T(n) is Ω(n)" means T(n) grows at least as fast as n (up to a constant). So we're giving a guarantee that the cost doesn't disappear or grow slower than n.

Theta (Θ): Tight Bound

We say T(n) is Θ(g(n)) if T(n) is both O(g(n)) and Ω(g(n)). So for large n, T(n) is sandwiched between two constant multiples of g(n): c₁·g(n) ≤ T(n) ≤ c₂·g(n) for some c₁, c₂ > 0 and n ≥ n₀. That means the growth rate is exactly g(n), up to constants—no faster, no slower. Example: merge sort does Θ(n log n) comparisons—we can prove both O(n log n) and Ω(n log n) for comparison-based sorting on n elements, so Θ(n log n) is the tight bound.

Relationship: T(n) = Θ(g(n)) if and only if T(n) = O(g(n)) and T(n) = Ω(g(n)). So when you've proved both an upper and a lower bound with the same g(n), you write Θ(g(n)).
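As a worked instance of the sandwich, take T(n) = 2n² + n and g(n) = n². With c₁ = 2, c₂ = 3, and n₀ = 1 we get 2n² ≤ 2n² + n ≤ 3n², because n ≤ n² for all n ≥ 1. The check below is a finite sanity test of that inequality, not a proof.

```python
def T(n):
    return 2 * n * n + n

c1, c2, n0 = 2, 3, 1   # one valid choice of constants for Theta(n^2)

sandwiched = all(c1 * n * n <= T(n) <= c2 * n * n for n in range(n0, 5_000))
print(sandwiched)      # True
```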

Comparison of O, Ω, and Θ

Notation	Meaning	Use
O(g(n))	T(n) ≤ c·g(n) for large n (upper bound)	"Grows at most this fast"
Ω(g(n))	T(n) ≥ c·g(n) for large n (lower bound)	"Grows at least this fast"
Θ(g(n))	T(n) between c₁·g(n) and c₂·g(n) for large n (tight)	"Grows exactly this fast, up to constants"

Examples

Linear scan to find max: We do n−1 comparisons → O(n). We must touch every element → Ω(n). So the algorithm is Θ(n).
Binary search (worst case): Each step halves the range → O(log n) comparisons. When the target is absent we need about log₂ n steps to narrow the range from n elements to none → Ω(log n). So the worst case is Θ(log n).
Bubble sort: O(n²) (nested loops). The worst case (e.g., reverse order) is also Ω(n²), so the worst case is Θ(n²). The best case (already sorted, with an early-exit check) is Θ(n), not Θ(n²); a Theta bound always refers to a specific scenario (e.g., worst case).
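The linear-scan example can be instrumented to count comparisons. This sketch assumes a non-empty list; find_max is an illustrative name.

```python
def find_max(arr):
    """Return (maximum, number of comparisons) for a non-empty list."""
    best = arr[0]
    comparisons = 0
    for value in arr[1:]:
        comparisons += 1        # one comparison per remaining element
        if value > best:
            best = value
    return best, comparisons

print(find_max([3, 9, 2, 7]))        # (9, 3)
print(find_max(list(range(100))))    # (99, 99)
```

The count is always n − 1 regardless of the order of the input, which is why best, worst, and average case all coincide at Θ(n) here.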

Example

Claim: "Comparison-based sorting of n elements is Ω(n log n)." So any algorithm that only compares elements must do at least on the order of n log n comparisons in the worst case. Merge sort does O(n log n), so merge sort is optimal for that model—its worst case is Θ(n log n).

When to Use Which

O: Most common. "My algorithm runs in O(n²) time." Safe and correct as long as you've proved an upper bound.
Ω: When you're proving "you can't do better" or "at least this much work is required." Used in lower-bound proofs and to justify optimality.
Θ: When you've proved both upper and lower bounds with the same growth. "Merge sort is Θ(n log n) in the comparison model." More precise than O when you know the bound is tight.

Common Mistake

Saying "this algorithm is Θ(n)" when you've only shown it's O(n). Theta requires a matching lower bound. For example, "find max in array" is O(n) and Ω(n), so Θ(n). But "check if array has an even number" can stop as soon as it sees an even element: the best case is O(1), so the running time over all inputs is O(n) but not Ω(n). Its worst case (no even element, so every element is scanned) is still Θ(n).
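The early-exit behavior in the even-number example can be seen by counting how many elements are inspected. has_even is a hypothetical helper written for this illustration.

```python
def has_even(arr):
    """Return (found, elements_checked), stopping at the first even element."""
    checked = 0
    for value in arr:
        checked += 1
        if value % 2 == 0:
            return True, checked     # early exit
    return False, checked

print(has_even([4, 1, 3, 5]))   # (True, 1)   best case: one element checked
print(has_even([1, 3, 5, 7]))   # (False, 4)  worst case: all n elements checked
```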

Interview Insight

In interviews you'll usually state Big-O. If the interviewer asks "is that tight?" or "can we do better?", you can say: "It's O(n); and we need Ω(n) because we must look at every element, so it's Θ(n)—optimal for this problem." That shows you understand upper and lower bounds.

Summary

Ω(g(n)) = lower bound: T(n) ≥ c·g(n) for large n. "At least this much work."
Θ(g(n)) = tight bound: T(n) = O(g(n)) and T(n) = Ω(g(n)). "Exactly this growth."
Use O for upper bounds (most common); use Ω for lower-bound proofs; use Θ when you have both and they match.

3.3 Best / Worst / Average Case

Introduction

So far we've talked about Big-O, Theta, and Omega as ways to describe how an algorithm's cost grows with input size. But the same algorithm can take different amounts of time depending on what the input looks like. Linear search might find the target in the first cell (one step) or in the last (n steps). That's why we distinguish best case, worst case, and average case: they tell us how the algorithm behaves under different input scenarios. Master this and you'll know exactly what to say in interviews and how to choose algorithms in practice.

Real-World Analogy

Imagine searching for your keys at home. Best case: they're on the table by the door—you find them in one look. Worst case: they're in the last drawer you check, after searching every room. Average case: sometimes they're in the first place, sometimes the middle, sometimes the last—over many days, you do "about half" the possible search. The method (where you look) is the same; only the input (where the keys actually are) changes the cost. Algorithms work the same way: same code, different inputs → different runtimes.

Formal Definitions

Best Case

The best case is the scenario (or type of input) that minimizes the algorithm's work. For a given input size n, it's the minimum number of steps (or comparisons, or operations) the algorithm can take over all valid inputs of that size. We express it with Big-O (or Theta when the bound is tight) and call it "best-case time complexity."

Example: Linear search in an array of size n. Best case = target is at index 0 → 1 comparison → O(1) best case.

Worst Case

The worst case is the scenario that maximizes the algorithm's work. For input size n, it's the maximum number of steps over all valid inputs. We almost always care about worst case when we say "time complexity" because it's a guarantee: the algorithm will never do more than this, no matter how unlucky the input.

Example: Linear search, target not present or at last index → n comparisons → O(n) worst case.

Average Case

The average case is the expected (average) number of steps over some distribution of inputs—usually we assume all inputs of size n are equally likely, or we use a probability distribution that matches reality. Average case is harder to define and analyze than best/worst, but it often reflects "typical" performance.

Example: Linear search with target equally likely at any index (1 to n) or absent. If present: average position ≈ n/2 → about n/2 comparisons. Often we still say O(n) average, since the constant factor doesn't change the growth class.

Concept Note

Best/worst/average refer to which input we're considering, not different algorithms. We then describe each with asymptotic notation (O, Θ, Ω). So we get phrases like "worst-case O(n²)" or "average-case Θ(n log n)."

Why This Topic Matters

In interviews and in production, you need to know: (1) Worst case—so you can guarantee latency and avoid surprises. (2) Best case—so you know when an algorithm can be very fast (e.g., early exit). (3) Average case—when inputs are random or typical, average often predicts real behavior. Many algorithms (e.g., quicksort) have a bad worst case but a good average case; choosing them means accepting that worst case in exchange for speed on average.

Mental Model

Think of the algorithm as a fixed procedure. For each input size n, imagine all possible inputs of that size. Each input leads to some number of steps. Best case = minimum over those inputs; worst case = maximum; average case = average (with a defined distribution). The three cases can have different Big-O classes (e.g., best O(1), worst O(n), average O(n)).

Step-by-Step: Analyzing an Algorithm for All Three Cases

Identify the "cost" you're measuring (e.g., comparisons, swaps, or total operations).
Ask: what input minimizes this cost? That gives the best case; count steps and express in Big-O.
Ask: what input maximizes this cost? That gives the worst case.
For average case, define a distribution over inputs (e.g., uniform), compute expected cost, then simplify to Big-O.

Example: Linear Search

Problem: find index of target in list arr, or return -1 if not present.

def linear_search(arr, target):
    for i in range(len(arr)):
        if arr[i] == target:
            return i
    return -1

Best case: target at index 0. One comparison, then return. → O(1).
Worst case: target at last index or absent. n comparisons. → O(n).
Average case (target in array, uniform position): Position k (1-based) with probability 1/n; comparisons = k. Expected comparisons = (1 + 2 + … + n) / n = n(n+1)/(2n) = (n+1)/2 → Θ(n).
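The three cases can be checked empirically by counting comparisons. The sketch below is illustrative only: `count_comparisons` is a hypothetical helper (not part of the course code) that mirrors `linear_search` but returns the number of comparisons instead of the index, and it assumes the list holds distinct values.

```python
def count_comparisons(arr, target):
    """Linear search that returns how many element comparisons it made.

    Hypothetical helper for illustration; mirrors linear_search above.
    """
    comparisons = 0
    for value in arr:
        comparisons += 1
        if value == target:
            break
    return comparisons


n = 10
arr = list(range(n))                      # distinct values 0..n-1

best = count_comparisons(arr, arr[0])     # target at index 0
worst = count_comparisons(arr, -1)        # target absent
# Average over every position of a present target (uniform distribution).
average = sum(count_comparisons(arr, t) for t in arr) / n

print(best, worst, average)               # 1 10 5.5
```

The measured average, 5.5, matches (n + 1)/2 for n = 10.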

Example: Binary Search (on sorted array)

We repeatedly compare with the middle and discard half. Best case: target is the middle element → 1 comparison → O(1). Worst case: target absent or at a leaf of the "decision tree"—we halve until size 1 → about log₂ n comparisons → O(log n). Average case (assuming target equally likely in any position): also on the order of log n → O(log n). So for binary search, best is O(1), worst and average are O(log n).
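To see the logarithmic worst case concretely, the sketch below counts how many middle elements an iterative binary search examines. `binary_search_steps` is a hypothetical name introduced here for illustration, and the input is assumed to be sorted with distinct values.

```python
def binary_search_steps(arr, target):
    """Return (index, probes); probes counts the middle elements examined.

    Illustrative helper; assumes arr is sorted in ascending order.
    """
    lo, hi = 0, len(arr) - 1
    probes = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        if arr[mid] == target:
            return mid, probes
        if arr[mid] < target:
            lo = mid + 1          # discard the left half
        else:
            hi = mid - 1          # discard the right half
    return -1, probes


arr = list(range(0, 32, 2))       # 16 sorted values: 0, 2, ..., 30
best = binary_search_steps(arr, arr[(len(arr) - 1) // 2])
# Try every present and absent target and keep the largest probe count.
worst = max(binary_search_steps(arr, t)[1] for t in range(-1, 32))

print(best, worst)                # (7, 1) 5
```

For n = 16 the worst case is 5 probes, i.e. floor(log₂ n) + 1, while the best case is a single probe.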

Example: Bubble Sort

Compare adjacent pairs and swap if out of order; repeat until no swaps. Best case: array already sorted → one pass, no swaps, early exit possible → O(n). Worst case: array reverse sorted → every pass does maximum swaps, O(n) passes → O(n²). Average case: random order → still O(n²) comparisons and swaps on average. So we say: best O(n), worst O(n²), average O(n²).
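A short sketch makes the gap between the best and worst case visible. It assumes the early-exit variant described above (stop after a pass with no swaps); `bubble_sort_comparisons` is a hypothetical helper that sorts a copy and reports the number of comparisons.

```python
def bubble_sort_comparisons(values):
    """Early-exit bubble sort on a copy; returns (sorted_list, comparisons)."""
    arr = list(values)
    comparisons = 0
    for end in range(len(arr) - 1, 0, -1):
        swapped = False
        for j in range(end):
            comparisons += 1
            if arr[j] > arr[j + 1]:
                arr[j], arr[j + 1] = arr[j + 1], arr[j]
                swapped = True
        if not swapped:           # no swaps in this pass: already sorted
            break
    return arr, comparisons


n = 6
_, best = bubble_sort_comparisons(range(n))               # already sorted
_, worst = bubble_sort_comparisons(range(n - 1, -1, -1))  # reverse sorted

print(best, worst)                # 5 15
```

The sorted input costs n − 1 = 5 comparisons (one pass), while the reversed input costs n(n − 1)/2 = 15.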

ASCII Diagram: Where Do the Cases Come From?

Input size n = 5. "Cost" = number of comparisons (e.g., linear search).

  Input:   [T, ?, ?, ?, ?]  [?, ?, ?, ?, T]  [?, T, ?, ?, ?]  ...
           (target at 0)    (target at 4)    (target at 1)

  Cost:        1                 5                 2             ...

  Best case  = min(1, 5, 2, ...) = 1   →  O(1)
  Worst case = max(1, 5, 2, ...) = 5   →  O(n)
  Average    = (1+5+2+...)/#inputs     →  O(n)

Comparison Table: Best / Worst / Average

Algorithm	Best	Worst	Average
Linear search	O(1)	O(n)	O(n)
Binary search (sorted)	O(1)	O(log n)	O(log n)
Bubble sort	O(n)	O(n²)	O(n²)
Insertion sort	O(n)	O(n²)	O(n²)
Quicksort (naive pivot)	O(n log n)	O(n²)	O(n log n)
Merge sort	O(n log n)	O(n log n)	O(n log n)

Edge Cases and Assumptions

Empty input: Often O(1) or a single check. Sometimes counted separately or as part of best case.
Average case depends on distribution: If keys are usually at the start, "average" for linear search is better than n/2. We usually assume uniform or "random" when we say "average."
Worst case can be rare: Quicksort's O(n²) worst case happens for specific inputs (e.g., already sorted with first-element pivot); random shuffling makes it unlikely.

Common Mistake

Confusing "best case" with "fast algorithm." Best case only says: "in the most favorable input, we do this well." A bad algorithm can still have a great best case (e.g., one comparison) but terrible worst case. Always state which case you mean: "Linear search is O(1) in the best case and O(n) in the worst case."

Expert Tip

When you see "time complexity" without "best/worst/average," it almost always means worst case. So "sorting is O(n log n)" means worst-case O(n log n) for algorithms like merge sort. If someone says "average case," they'll say it explicitly.

Interview Insight

Interviewers often ask: "What's the time complexity?" Follow up with: "Best, worst, or average?" Then answer precisely: "Worst case O(n) because we might scan the whole array; best case O(1) if the target is first." For sorting, be ready to say why quicksort is used in practice (good average case) despite O(n²) worst case, and when merge sort is preferred (when you need a guaranteed bound).

Summary

Best case = input that minimizes work; worst case = input that maximizes work; average case = expected work over a chosen input distribution.
Same algorithm can have different O(·) for each case (e.g., linear search: O(1) best, O(n) worst and average).
Default "time complexity" usually means worst case. State best/worst/average explicitly when it matters.
Analyze by asking: what input minimizes/maximizes cost? Then express each with Big-O (or Θ when tight).

3.4 Loop & Nested Loop Analysis

Introduction

Most iterative algorithms are built from loops. To get their time complexity, we need a reliable way to count how many times the loop body runs and how much work each run does. A single loop that runs n times with O(1) work per iteration gives O(n). Two nested loops, each over n, give O(n²). This section teaches you to analyze single loops, nested loops, and common variations (loop bounds that change, multiple sequential loops, loop with a shrinking range) so you can derive Big-O from code quickly and correctly.

Why This Topic Matters

Loop structure is the main source of complexity in non-recursive code. If you can analyze loops, you can analyze most iterative algorithms—searching, sorting, matrix traversal, and many DP and greedy solutions. Interview problems often have obvious loop structure; being able to say "two nested loops over n → O(n²)" or "outer n, inner halves each time → O(n log n)" shows you understand complexity at a glance.

Mental Model

Total time = (number of iterations) × (work per iteration), where "work per iteration" is itself in Big-O. So we (1) figure out how many times the loop runs as a function of n, (2) figure out the cost per iteration (constant, or another loop, etc.), (3) multiply (or sum if different iterations do different work), then (4) simplify to Big-O by keeping the dominant term.

Single Loop Analysis

Fixed number of iterations

If the loop runs a constant number of times (e.g., 10, 100) independent of input size n, total cost is O(1). Example: a loop that runs exactly 5 times doing O(1) work each time → 5 × O(1) = O(1).

Loop runs n times

Most common: for i in range(n): or for i in range(len(arr)):. If the body does O(1) work per iteration, total is n × O(1) = O(n). If the body does O(log n) work per iteration (e.g., a binary search or a balanced-tree operation inside), total is n × O(log n) = O(n log n).

# O(n): n iterations, O(1) per iteration
for i in range(n):
    do_something_constant()

# Still O(n): body is O(1) in terms of n
for i in range(n):
    x = arr[i] + 1
    result.append(x)

Loop runs a fraction or multiple of n

If the loop runs n/2, 2n, or 5n times, we still get O(n)—constant factors are dropped. So for i in range(0, n, 2): runs n/2 times → O(n).

Loop that doubles (or halves) each time

If the loop variable doubles each time (e.g., i = 1; while i < n: i *= 2), or equivalently the remaining range halves each time, the number of iterations is about log₂ n. With O(1) work per iteration → O(log n). If the body does O(n) work (e.g., process all elements), total is O(n log n).

# O(log n): i takes values 1, 2, 4, 8, ... and the loop stops once i >= n
i = 1
while i < n:
    do_something_constant()
    i *= 2
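The iteration count of the doubling loop can be measured directly. In this sketch `doubling_iterations` is a hypothetical helper that runs the same loop with an empty body and returns how many times it executed.

```python
def doubling_iterations(n):
    """Count how many times the body of `while i < n: i *= 2` runs."""
    i, count = 1, 0
    while i < n:
        count += 1
        i *= 2
    return count


for n in (1, 2, 8, 9, 1000, 10**6):
    print(n, doubling_iterations(n))
```

Growing n from 1000 to 1,000,000 only raises the count from 10 to 20, which is the ceil(log₂ n) growth the analysis predicts.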

Concept Note

For a single loop, first ask: how many iterations as a function of n? Then: what is the cost of one iteration (in Big-O)? Multiply them. Sum only when different iterations do different amounts of work (e.g., inner loop size depends on outer index).

Nested Loop Analysis

Two loops, both run n times

Outer loop runs n times. For each outer iteration, inner loop runs n times. Inner body is O(1). So total = n × n × O(1) = O(n²). Classic pattern for "check every pair" or full matrix traversal.

# O(n²): n * n iterations, O(1) per inner iteration
for i in range(n):
    for j in range(n):
        process(i, j)

Inner loop depends on outer index

Very common: inner loop runs from 0 to i (or i to n). Example: for i in range(n): for j in range(i): .... Inner iterations: when i=0 → 0, i=1 → 1, …, i=n-1 → n-1. Total inner iterations = 0+1+2+…+(n-1) = n(n-1)/2 = Θ(n²). So still O(n²).

# O(n²): inner runs i times for each i; total 0+1+...+(n-1) = n(n-1)/2
for i in range(n):
    for j in range(i):
        do_something()

Why the sum 0+1+…+(n-1) is Θ(n²)

Sum of first k integers = 1+2+…+k = k(k+1)/2. So 0+1+…+(n-1) = (n-1)n/2 = (n² - n)/2. For large n, the n² term dominates; we drop the linear term and constant factor → Θ(n²). So any time the inner loop runs "up to the outer index" (or "from outer index to n"), expect O(n²).
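The closed form can be checked by counting inner iterations directly. This is a small sanity-check sketch; `triangular_iterations` is a hypothetical name used only here.

```python
def triangular_iterations(n):
    """Count inner-body executions of: for i in range(n): for j in range(i)."""
    count = 0
    for i in range(n):
        for j in range(i):
            count += 1
    return count


for n in (10, 100, 1000):
    count = triangular_iterations(n)
    # The ratio count / n² approaches 1/2, the leading coefficient of n(n-1)/2.
    print(n, count, n * (n - 1) // 2, count / n**2)
```

The counted value always equals n(n − 1)/2, and the ratio to n² settles near 0.5, which is why the constant factor and the linear term can be dropped.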

Three nested loops, each O(n)

n × n × n = O(n³). Same idea: multiply the number of iterations of each loop when they're independent (or sum over outer index if inner bounds depend on it).

Step-by-Step: Deriving Complexity from Loops

Identify the loops and their bounds (as functions of n or input size).
For each loop, determine how many times it runs (exact or in Big-O). If the bound depends on another loop variable, express it in terms of that variable first.
Multiply iteration counts for nested loops when the inner count doesn't depend on the outer index, or sum over the outer index when it does (e.g., inner runs 0 to i → sum i from 0 to n-1).
Multiply by work per iteration if it's not O(1).
Simplify to a single Big-O (drop constants, keep dominant term).

ASCII Diagram: Nested Loop Iterations

Outer i:    0     1     2       3      ...  n-1
Inner j:   (-)   (0)  (0,1)  (0,1,2)   ...  (0..n-2)
Count:      0     1     2       3      ...  n-1

Total inner iterations = 0 + 1 + 2 + ... + (n-1) = n(n-1)/2  →  O(n²)

Common Loop Patterns (Quick Reference)

Pattern	Time
Single loop, n iterations, O(1) body	O(n)
Single loop, doubles until n (e.g. i *= 2)	O(log n)
Two nested loops, each n, O(1) inner body	O(n²)
Outer n, inner 0..i (or i..n)	O(n²)
Three nested loops, each n	O(n³)
Outer n, inner halves each time (e.g. binary search style)	O(n log n)

Python Examples with Line-by-Line Complexity

Example 1: Sequential loops

def two_loops(arr):
    total = 0
    for x in arr:           # n iterations, O(1) each → O(n)
        total += x
    for x in arr:           # n iterations, O(1) each → O(n)
        total += x * 2
    return total            # O(n) + O(n) = O(n)

Two independent loops, each O(n). Sequential loops add rather than multiply, so the total is O(n) + O(n) = O(n).

Example 2: Nested loops (full pair check)

def count_pairs(arr):
    count = 0
    for i in range(len(arr)):      # n times
        for j in range(len(arr)):  # n times each → n * n
            if i != j and arr[i] + arr[j] == 0:
                count += 1
    return count                   # O(n²)

Outer n, inner n, body O(1). Total O(n²).

Example 3: Inner loop from 0 to i

def prefix_sums(arr):
    n = len(arr)
    result = [0] * n
    for i in range(n):              # i = 0, 1, ..., n-1
        for j in range(i + 1):      # inner loop runs i + 1 times: 1, 2, ..., n
            result[i] += arr[j]     # inner: 1+2+...+n = n(n+1)/2
    return result                   # O(n²)

Inner iterations: 1 + 2 + … + n = n(n+1)/2 → O(n²).

Edge Cases

Loop with early break/return: Complexity is the worst case over all inputs—assume the loop runs as many times as the bound allows unless you're analyzing best/average separately.
Loop bound is min(n, k) or similar: If k is a constant, effective iterations ≤ k → O(1). If k is a parameter, express complexity in both n and k (e.g., O(n × k)).
Inner loop that doesn't start at 0: Same idea: count how many times it runs for each outer value and sum (e.g., j from i to n-1 gives (n-i) iterations for outer i; sum = n + (n-1) + … + 1 = n(n+1)/2 → O(n²)).

Common Mistake

Multiplying when you should add: two sequential loops (one after the other) add: O(n) + O(n) = O(n). Two nested loops multiply: O(n) × O(n) = O(n²). Also: confusing "inner runs i times" with "inner runs n times"—the total is the sum over i (0 to n-1), which is Θ(n²), not n.

Expert Tip

When the inner loop's bound depends on the outer index, write the total as a sum (e.g., Σ i from 0 to n-1). That sum is often an arithmetic series; simplify it to a closed form (e.g., n(n-1)/2) and then to Big-O. This is how you prove "nested loop with inner 0..i" is O(n²), not O(n).

Interview Insight

Interviewers often ask "what's the time complexity?" after you write a solution. For loop-based code, say the structure clearly: "We have two nested loops over the array, so O(n²)." If the inner bound depends on the outer index, briefly say "inner runs up to i each time, so total iterations are on the order of n²." That shows you can derive it, not just memorize.

Summary

Single loop: (iterations) × (work per iteration). n iterations with O(1) body → O(n); log n iterations → O(log n).
Nested loops: Multiply when bounds are independent (n × n → O(n²)). When inner bound depends on outer index, sum over outer index (e.g., 0+1+…+(n-1) = Θ(n²)).
Sequential loops: Add their costs: O(n) + O(n) = O(n).
Derive by: count iterations per loop, multiply or sum as appropriate, multiply by work per iteration, then simplify to Big-O.

3.5 Recursion Tree Method

Introduction

Recursive algorithms are described by recurrence relations: equations that express T(n) in terms of T on smaller inputs (e.g., T(n) = 2T(n/2) + n). To find their time complexity, we need to solve the recurrence. The recursion tree method does this by drawing a tree: each node represents the non-recursive cost of one recursive call, and we sum the costs level by level (or across leaves) to get the total. It gives intuition, works for many recurrences that don't fit the Master Theorem, and is a standard tool in interviews and coursework.

Why This Topic Matters

Merge sort, quicksort, binary search, and many divide-and-conquer algorithms have runtimes given by recurrences like T(n) = 2T(n/2) + Θ(n). The recursion tree method lets you see why the total is O(n log n): you draw the tree, sum the work at each level, and observe that there are log n levels with O(n) work per level. When the Master Theorem doesn't apply (e.g., T(n) = T(n/3) + T(2n/3) + n), the tree method still works. It also builds the intuition you need for the Master Theorem and for substitution proofs.

What Is a Recurrence?

A recurrence is an equation that defines T(n) in terms of T on smaller arguments. Example: T(n) = 2T(n/2) + n says: "to solve a problem of size n, we do n work and then solve two subproblems of size n/2." We need a base case (e.g., T(1) = 1 or T(1) = O(1)) to stop the recursion. The goal is to find a closed form or Big-O for T(n).

Concept Note

In recurrences we often write T(n/2) even when n is odd; we assume n is a power of 2 for simplicity, or use floor/ceiling. The asymptotic result (e.g., O(n log n)) is the same. We also often write "+ n" or "+ Θ(n)" for the "non-recursive" work at the current level—the cost of dividing and combining.

Recursion Tree: The Idea

Imagine the recurrence as a tree:

Root = one call of size n; we charge the "extra" work at this level (the + n part) to the root.
Children = the recursive calls. So T(n) = 2T(n/2) + n gives two children, each of size n/2; we write the cost "n" at the root.
Each child again has its own cost (n/2 at that level) and its children (size n/4), and so on until we hit the base case (size 1).
Total cost = sum of the costs written at every node. We can sum by level: level 0 has 1 node with cost n; level 1 has 2 nodes with cost n/2 each → total n; level 2 has 4 nodes with cost n/4 each → total n; … So each level sums to n, and there are log₂ n + 1 levels (level 0 through level log₂ n) → total Θ(n log n).

Step-by-Step: Recursion Tree Method

Write the recurrence in the form T(n) = (sum of T(subproblems)) + (work at this level). Put the "work at this level" as the node cost.
Draw the tree (at least conceptually): root cost, then children with their costs, and so on. Identify the pattern: how many children per node? What size? What cost per node at level i?
Find the number of levels until base case. For T(n/2), size halves each level → about log₂ n levels. For T(n/3), about log₃ n levels.
Sum the cost per level, then sum over levels. Or sum over all leaves (if each leaf has the same cost and you know the count) and add the internal cost—whichever is easier.
Simplify to Big-O. Often the sum is a geometric series or "constant per level × number of levels."

Worked Example: T(n) = 2T(n/2) + n

This is the merge sort recurrence. Assume T(1) = Θ(1).

Level 0:        [n]                    cost = n
               /   \
Level 1:    [n/2]   [n/2]              cost = n/2 + n/2 = n
            /  \     /  \
Level 2:  [n/4][n/4][n/4][n/4]        cost = 4·(n/4) = n
            ...
Level k:  2^k nodes, each cost n/2^k   cost = 2^k · (n/2^k) = n

Every level has total cost n. The size goes n → n/2 → … → 1, reaching 1 at level k where n/2^k = 1, i.e. k = log₂ n. Counting levels 0 through log₂ n gives log₂ n + 1 levels, so total = n(log₂ n + 1) = Θ(n log n).
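The result can be sanity-checked by evaluating the recurrence numerically. This sketch assumes the base case T(1) = 1 and restricts n to powers of 2; `T_merge` is a hypothetical name for the evaluated recurrence, not a sorting routine.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def T_merge(n):
    """Evaluate T(n) = 2*T(n/2) + n with the assumed base case T(1) = 1."""
    if n <= 1:
        return 1
    return 2 * T_merge(n // 2) + n


for k in (0, 1, 2, 10, 20):
    n = 2 ** k
    # For n = 2^k the exact solution is n*log2(n) + n = n*k + n.
    print(n, T_merge(n), n * k + n)
```

With this base case the exact value is n log₂ n + n, so the n log n term is the one that dominates as n grows.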

Worked Example: T(n) = 2T(n/2) + Θ(1)

Same structure, but only constant work at each node. Level 0: 1; level 1: 2; level 2: 4; … level k: 2^k nodes, each Θ(1). Total = 2^0 + 2^1 + … + 2^(log₂ n) = 2^(log₂ n + 1) − 1 = 2n − 1 = Θ(n). So T(n) = Θ(n): the tree is a full binary tree with n leaves and n − 1 internal nodes, and constant work per node.

When the Per-Level Cost Changes

If the work at level i is not constant (e.g., it decreases with depth), write the cost at each level and sum. Example: T(n) = 2T(n/2) + n². Root cost n²; level 1: 2 × (n/2)² = n²/2; level 2: 4 × (n/4)² = n²/4; … level k: 2^k × (n/2^k)² = n²/2^k. Total = n²(1 + 1/2 + 1/4 + …) ≤ 2n² → O(n²). Here the root dominates; the series is geometric and sums to a constant.
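The per-level totals can be tabulated with a few lines of code. The helper below is a sketch under stated assumptions: `level_costs` is a hypothetical name, n is taken to be a power of b, and each leaf (size 1) is charged f(1).

```python
def level_costs(n, a, b, f):
    """Per-level totals of the recursion tree for T(n) = a*T(n/b) + f(n).

    Assumes n is a power of b; the last entry is the leaf level (size 1).
    """
    costs = []
    nodes, size = 1, n
    while size >= 1:
        costs.append(nodes * f(size))   # nodes at this level * cost per node
        nodes *= a
        size //= b
    return costs


n = 16
quadratic = level_costs(n, 2, 2, lambda m: m * m)
linear = level_costs(n, 2, 2, lambda m: m)

print(quadratic, sum(quadratic))   # [256, 128, 64, 32, 16] 496
print(linear, sum(linear))         # [16, 16, 16, 16, 16] 80
```

For f(n) = n² the levels shrink geometrically and the total (496) stays below 2n² = 512, whereas for f(n) = n every level contributes the same amount.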

ASCII: Summing by Levels vs by Leaves

Sum by levels (T(n)=2T(n/2)+n):
  L0: n
  L1: n
  L2: n
  ...
  L_log n: n
  Total: n * (log n + 1) = Θ(n log n)

Sum by leaves (for T(n)=2T(n/2)+Θ(1)):
  Number of leaves = 2^(log n) = n
  Work per leaf = Θ(1)
  Total = Θ(n)
  (Internal work is also Θ(n), same order.)

Common Recurrence Patterns (Recursion Tree View)

Recurrence	Result
T(n) = 2T(n/2) + n	Θ(n log n)
T(n) = 2T(n/2) + Θ(1)	Θ(n)
T(n) = 2T(n/2) + n²	Θ(n²)
T(n) = T(n/2) + n	Θ(n)
T(n) = T(n-1) + n	Θ(n²)

Edge Cases and Tips

Uneven splits: T(n) = T(n/3) + T(2n/3) + n. The tree is not full; the "depth" is determined by the branch that shrinks slowest (2n/3). After k steps, size is at most (2/3)^k n; so depth ≈ log_{3/2} n. Per-level cost is still O(n). Total O(n log n).
More than two subproblems: T(n) = 3T(n/2) + n. Each node has 3 children of size n/2. Level k has 3^k nodes, cost (n/2^k) each → total n×(3/2)^k. Sum over k = 0 to log₂ n: geometric with ratio 3/2 → dominated by last term, O(n^(log₂ 3)).
Base case: Use T(1) = c or T(0) = c. For asymptotic result, the exact base constant doesn't change Big-O.
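The uneven-split claim can be checked numerically. The sketch below evaluates T(n) = T(n/3) + T(2n/3) + n using integer sizes; the base case (T(n) = 1 for n < 3) and the name `T_uneven` are assumptions made for this illustration.

```python
import math
from functools import lru_cache


@lru_cache(maxsize=None)
def T_uneven(n):
    """Evaluate T(n) = T(n//3) + T(n - n//3) + n, assuming T(n) = 1 for n < 3."""
    if n < 3:
        return 1
    small = n // 3                     # the n/3 branch
    return T_uneven(small) + T_uneven(n - small) + n


for k in (4, 6, 8, 10):
    n = 3 ** k
    # If T(n) = Θ(n log n), this ratio should stay within constant bounds.
    print(n, T_uneven(n) / (n * math.log2(n)))
```

The ratio stays bounded rather than growing with n, which is consistent with the O(n log n) total derived from the tree.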

Common Mistake

Counting only the leaves and forgetting the internal work. T(n) = 2T(n/2) + n has Θ(n) leaves if we think of "work at base case," but the recurrence charges "+ n" at each internal node. So we must sum all node costs (or sum per level), not just the bottom level. For T(n) = 2T(n/2) + n, the internal work dominates (n log n); for T(n) = 2T(n/2) + Θ(1), leaf count gives Θ(n) and that's correct.

Expert Tip

When per-level cost is the same at every level (e.g., n at each level for T(n)=2T(n/2)+n), total = (cost per level) × (number of levels). When per-level cost forms a geometric series (increasing or decreasing), the sum is dominated by the first or last term—use the geometric sum formula and then simplify to Big-O.

Interview Insight

If asked "why is merge sort O(n log n)?" you can say: "The recurrence is T(n) = 2T(n/2) + n. In the recursion tree, each level does O(n) work and there are O(log n) levels, so total O(n log n)." Drawing a small tree (root + one level of children) is enough to show you understand the method. For recurrences that don't match the Master Theorem, saying "I'd draw the recursion tree and sum the levels" is a good approach.

Summary

Recurrence = equation like T(n) = 2T(n/2) + n; recursion tree = picture of cost at each level of recursion.
Method: Draw tree (root = current work, children = subcalls); find number of levels and cost per level; sum over levels (or use geometric series); simplify to Big-O.
Classic: T(n) = 2T(n/2) + n → log n levels, n work per level → Θ(n log n).
Use the tree when the Master Theorem doesn't apply or when you want a clear visual derivation.

3.6 Master Theorem

Introduction

The Master Theorem (or Master Method) gives a cookbook solution for recurrences of the form T(n) = aT(n/b) + f(n): we split the problem into a subproblems of size n/b, and do f(n) extra work. By comparing f(n) with the quantity n^{log_b a} (log_b a is the "critical exponent"), the theorem tells you whether the total time is dominated by the base-case work, balanced, or dominated by the top-level work, and gives the answer in one step. It's the fastest way to solve many divide-and-conquer recurrences without drawing the recursion tree.

Why This Topic Matters

Merge sort, binary search, and many recursive algorithms fit T(n) = aT(n/b) + f(n). The Master Theorem lets you state their complexity in seconds: "a=2, b=2, f(n)=n → n^(log_b a)=n, so Case 2 → Θ(n log n)." In interviews, knowing the three cases and when they apply is enough to justify divide-and-conquer runtimes. When the recurrence doesn't fit (e.g., a or b not constant, or f(n) has a different form), you fall back to the recursion tree or substitution method.

Standard Form and the Critical Exponent

Recurrence in standard form:

T(n) = aT(n/b) + f(n)

a ≥ 1: number of subproblems per step.
b > 1: factor by which the problem size shrinks (so each subproblem has size n/b).
f(n): cost of dividing and combining (the "non-recursive" work). We assume f(n) is asymptotically positive.

The critical exponent is log_b a. So the "size" of the recursive work (ignoring f) is like n^{log_b a}: that's the number of leaves in the recursion tree (a^(log_b n) = n^(log_b a)). The theorem compares f(n) with n^{log_b a} to decide which dominates.

Concept Note

We often write n/b meaning floor or ceiling; for the theorem we assume n is a power of b so that sizes are integers. The asymptotic result is unchanged. Also, f(n) is usually given as Θ(n^k), Θ(n^k log n), etc.; the theorem uses a precise "polynomial comparison" condition.

The Three Cases

Case 1: f(n) is polynomially smaller than n^{log_b a}

If f(n) = O(n^{log_b a − ε}) for some constant ε > 0, then the recursive part dominates. Result: T(n) = Θ(n^{log_b a}).

Intuition: the work at the root (and each level) is tiny compared to the growth of the leaf count; total is dominated by the leaves.

Case 2: f(n) is the same order as n^{log_b a}

If f(n) = Θ(n^{log_b a} log^k n) for some k ≥ 0 (usually k = 0 or 1), then work is balanced across levels. Result: T(n) = Θ(n^{log_b a} log^{k+1} n). For k = 0: T(n) = Θ(n^{log_b a} log n).

Intuition: every level does about the same total work; there are Θ(log n) levels, so we get an extra log factor.

Case 3: f(n) is polynomially larger than n^{log_b a}

If f(n) = Ω(n^{log_b a + ε}) for some ε > 0, and if the regularity condition holds (a·f(n/b) ≤ c·f(n) for some c < 1 and large n), then the top-level work dominates. Result: T(n) = Θ(f(n)).

Intuition: the root (and upper levels) do so much work that the sum is dominated by f(n). The regularity condition ensures that work decreases as we go down the tree.

Quick Reference Table

Condition	T(n)
f(n) = O(n^{log_b a − ε}), ε > 0	Θ(n^{log_b a})
f(n) = Θ(n^{log_b a} log^k n), k ≥ 0	Θ(n^{log_b a} log^{k+1} n)
f(n) = Ω(n^{log_b a + ε}), ε > 0, and regularity	Θ(f(n))

Step-by-Step: How to Apply the Master Theorem

Identify a, b, and f(n) from T(n) = aT(n/b) + f(n).
Compute log_b a. So n^{log_b a} is your comparison benchmark. (Tip: log_b a = (ln a)/(ln b).)
Compare f(n) with n^{log_b a}:
- If f(n) is O(n^{log_b a − ε}) for some ε > 0 → Case 1 → T(n) = Θ(n^{log_b a}).
- If f(n) is Θ(n^{log_b a} log^k n) → Case 2 → T(n) = Θ(n^{log_b a} log^{k+1} n).
- If f(n) is Ω(n^{log_b a + ε}) and a·f(n/b) ≤ c·f(n) for some c < 1 → Case 3 → T(n) = Θ(f(n)).
If none of the cases clearly applies (e.g., f(n) is not polynomial in n, or recurrence has a different form), use recursion tree or substitution.
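The steps above can be sketched as a small classifier for the common situation where f(n) = Θ(n^d log^k n). The function name `master_theorem` and its return format are assumptions made for this illustration. For driving functions of this shape, the Case 3 regularity condition holds automatically whenever d > log_b a, because a·f(n/b) ≤ (a/b^d)·f(n) and a/b^d < 1.

```python
import math


def master_theorem(a, b, d, k=0):
    """Classify T(n) = a*T(n/b) + Θ(n^d * log^k n) for a >= 1, b > 1, d >= 0, k >= 0.

    Returns (case, exponent, log_power), meaning
    T(n) = Θ(n^exponent * log^log_power n).
    Illustrative sketch: only handles f(n) of the form n^d * log^k n.
    """
    crit = math.log(a) / math.log(b)           # critical exponent log_b a
    if math.isclose(d, crit, abs_tol=1e-9):
        return 2, float(d), k + 1              # balanced: one extra log factor
    if d < crit:
        return 1, crit, 0                      # leaves dominate
    return 3, float(d), k                      # root dominates: Θ(f(n))


print(master_theorem(2, 2, 1))   # merge sort:    (2, 1.0, 1) -> Θ(n log n)
print(master_theorem(1, 2, 0))   # binary search: (2, 0.0, 1) -> Θ(log n)
print(master_theorem(2, 2, 2))   # 2T(n/2) + n²:  (3, 2.0, 0) -> Θ(n²)
```

Comparing d against log_b a with a tolerance avoids misclassifying Case 2 when the logarithm is computed in floating point.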

Worked Examples

Example 1: T(n) = 2T(n/2) + n (merge sort)

a = 2, b = 2, f(n) = n. So n^{log_b a} = n^{log₂ 2} = n¹ = n. We have f(n) = n = Θ(n) = Θ(n^{log_b a} log⁰ n). That's Case 2 with k = 0. So T(n) = Θ(n^{log_b a} log¹ n) = Θ(n log n).

Example 2: T(n) = T(n/2) + Θ(1) (binary search)

a = 1, b = 2, f(n) = Θ(1). So n^{log_b a} = n^{log₂ 1} = n⁰ = 1. We have f(n) = Θ(1) = Θ(n⁰). So f(n) = Θ(n^{log_b a} log⁰ n) → Case 2, k = 0. T(n) = Θ(n⁰ log n) = Θ(log n).

Example 3: T(n) = 2T(n/2) + Θ(1)

a = 2, b = 2, f(n) = Θ(1). n^{log_b a} = n. So f(n) = O(n^{1−ε}) for any ε in (0,1] (e.g. f(n) = O(n^{0.5})). Case 1. T(n) = Θ(n^{log_b a}) = Θ(n).

Example 4: T(n) = 2T(n/2) + n²

a = 2, b = 2, f(n) = n². n^{log_b a} = n. So f(n) = n² = Ω(n^{1+ε}) for ε = 1. Regularity: a·f(n/b) = 2·(n/2)² = n²/2 ≤ c·n² for c = 1/2 < 1. Case 3. T(n) = Θ(n²).

Example 5: T(n) = 3T(n/2) + n (e.g. Karatsuba multiplication)

a = 3, b = 2, f(n) = n. n^{log_b a} = n^{log₂ 3} ≈ n^{1.58}. So f(n) = n = O(n^{1.58 − ε}) for small ε. Case 1. T(n) = Θ(n^{log₂ 3}) ≈ Θ(n^{1.58}).
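A numeric check of this example is sketched below. It assumes the base case T(1) = 1 and powers of 2 for n; `T_three` is a hypothetical name for the evaluated recurrence.

```python
import math
from functools import lru_cache


@lru_cache(maxsize=None)
def T_three(n):
    """Evaluate T(n) = 3*T(n/2) + n with the assumed base case T(1) = 1."""
    if n <= 1:
        return 1
    return 3 * T_three(n // 2) + n


for k in (4, 8, 12, 16):
    n = 2 ** k
    # Ratio to the predicted growth n^(log2 3); it should level off.
    print(n, T_three(n) / n ** math.log2(3))
```

With this base case the exact solution for n = 2^k is 3·3^k − 2·2^k, so the ratio approaches the constant 3, confirming Θ(n^{log₂ 3}).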

When the Master Theorem Does Not Apply

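T(n) = 2T(n/2) + n log n: f(n) = n log n is larger than n but not polynomially larger (there is no ε > 0 with n log n = Ω(n^{1+ε})), so Case 3 fails, and the basic three-case statement (Case 2 with k = 0 only) does not cover it. The extended Case 2 given above does: with k = 1 it yields T(n) = Θ(n log² n).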
T(n) = T(n/2) + T(n/2) + n is really 2T(n/2)+n, so it applies. But T(n) = T(n/3) + T(2n/3) + n has two different fractions; standard form assumes one n/b. Use recursion tree.
a or b not constant: e.g. T(n) = n·T(√n) + n. Form is different; use substitution or tree.

Common Mistake

Using Case 2 when f(n) is only O(n^{log_b a}) but not Θ(n^{log_b a} log^k n). For Case 2 we need f(n) = Θ(n^{log_b a} log^k n). If f(n) is strictly smaller (e.g. f(n) = O(n^0.9) and n^{log_b a} = n), that's Case 1. Also: forgetting the regularity condition in Case 3—without it, the theorem doesn't guarantee T(n) = Θ(f(n)).

Expert Tip

For Case 2, the most common situation is f(n) = Θ(n^{log_b a})—i.e. no log factor in f(n). Then k = 0 and T(n) = Θ(n^{log_b a} log n). Merge sort is the classic: f(n)=n, n^{log_b a}=n, so Θ(n log n).

Interview Insight

You don't need to recite the theorem word-for-word. Say: "It's the form aT(n/b)+f(n); I compare f(n) to n^(log_b a). Here that's n, and f(n)=n so they're equal—Case 2—so Θ(n log n)." For 3T(n/2)+n, say "log_b a is log₂ 3, so n^(log₂ 3) is bigger than n; the leaves dominate, Case 1, so Θ(n^(log₂ 3))." That shows you understand the three cases.

Summary

Standard form: T(n) = aT(n/b) + f(n). Compute n^{log_b a} and compare with f(n).
Case 1: f(n) polynomially smaller → T(n) = Θ(n^{log_b a}).
Case 2: f(n) = Θ(n^{log_b a} log^k n) → T(n) = Θ(n^{log_b a} log^{k+1} n).
Case 3: f(n) polynomially larger and regularity → T(n) = Θ(f(n)).
When the recurrence doesn't fit (e.g. different split ratios), use the recursion tree or substitution method.

3.7 Space Complexity

Introduction

Space complexity is how much extra memory an algorithm uses, as a function of input size n (and sometimes other parameters). We express it in Big-O, just like time: O(1), O(n), O(n²), etc. It tells you whether your solution will run in limited memory (e.g. on embedded systems or with huge inputs) and whether you can improve by using less auxiliary space—e.g. in-place algorithms use O(1) extra space. This section covers what to count, how to analyze iterative and recursive code, and how to state space complexity clearly in interviews.

Why This Topic Matters

Interviews often ask "what's the space complexity?" in addition to time. Recursive solutions can be O(n) space just from the call stack; building a copy of the input adds O(n). In-place algorithms are valued when you must not use extra linear space. In practice, running out of memory can be as bad as running too long—so understanding space helps you choose and justify algorithms (e.g. iterative vs recursive DFS, merge sort vs in-place quicksort in terms of space).

Auxiliary Space vs Total Space

Total space = space for input + output + any extra data structures and call stack. Sometimes we say "the algorithm uses O(n) space" meaning total.
Auxiliary space = extra space used beyond the input (and sometimes beyond the output, depending on convention). So "O(1) auxiliary space" means we only use a constant number of extra variables; the input itself is not counted.

In DSA and interviews, "space complexity" usually means auxiliary space—we don't count the input (or the output if it's required, e.g. returning a new array). So "merge sort is O(n) space" means O(n) extra for the temporary arrays, not counting the input array. Always clarify if the problem says "space" without specifying: most of the time it's auxiliary.

Concept Note

When the output size is Θ(n) or larger (e.g. returning a list of all results), some definitions count it and some don't. In interviews, it's common to say "O(n) auxiliary space" to exclude the output, or "O(n) space including output." Stating "auxiliary" avoids ambiguity.

What to Count

Extra variables and data structures: A list of size n you build → O(n). A hash map with n keys → O(n). A few integers → O(1).
Call stack (recursion): Each recursive call uses space for parameters, return address, and local variables. Depth of recursion × space per frame. Example: recursive binary search has depth Θ(log n), so O(log n) space if each frame is O(1).
Input and output: Usually not counted in auxiliary space. If you must count total space, input is often Θ(n) and output can be too.

Iterative Code: No Recursion

Space is just the extra data structures and variables. One loop with a fixed number of variables → O(1). Building a list of size n → O(n). Two arrays of size n → O(n) (constant factors are dropped, so 2n is still O(n)). Nested loops don't add space by themselves; space grows only if you allocate on each iteration and keep the result (e.g. storing a new list of size n on each of n iterations takes O(n²)).

# O(1) auxiliary: only a few variables
def max_element(arr):
    m = arr[0]              # assumes arr is non-empty
    for x in arr:           # iterate in place; arr[1:] would copy n-1 elements
        if x > m:
            m = x
    return m

# O(n) auxiliary: copy of input
def sorted_copy(arr):
    return sorted(arr)   # returns new list of size n

Recursive Code: Call Stack

Space = (maximum depth of recursion) × (space per frame). Frame space is usually O(1) per call (parameters and locals). So if depth is Θ(n), space is O(n); if depth is Θ(log n), space is O(log n).

# O(n) space: depth n, O(1) per frame
def fact(n):
    if n <= 1:
        return 1
    return n * fact(n - 1)

# O(log n) space: depth log n, O(1) per frame
def binary_search_rec(arr, t, lo, hi):
    if lo > hi:
        return -1
    mid = (lo + hi) // 2
    if arr[mid] == t:
        return mid
    if arr[mid] > t:
        return binary_search_rec(arr, t, lo, mid - 1)
    return binary_search_rec(arr, t, mid + 1, hi)
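The stack depths claimed above can be observed by threading a depth counter through the calls. Both functions below are hypothetical instrumented variants written for this illustration; they return the deepest call level alongside the normal result.

```python
def fact_depth(n, depth=1):
    """Return (n!, deepest call depth reached)."""
    if n <= 1:
        return 1, depth
    value, deepest = fact_depth(n - 1, depth + 1)
    return n * value, deepest


def bsearch_depth(arr, t, lo, hi, depth=1):
    """Recursive binary search returning (index, call depth at return)."""
    if lo > hi:
        return -1, depth
    mid = (lo + hi) // 2
    if arr[mid] == t:
        return mid, depth
    if arr[mid] > t:
        return bsearch_depth(arr, t, lo, mid - 1, depth + 1)
    return bsearch_depth(arr, t, mid + 1, hi, depth + 1)


arr = list(range(1024))
print(fact_depth(50)[1])                                # 50
print(bsearch_depth(arr, 10**9, 0, len(arr) - 1)[1])    # 12
```

Factorial reaches depth n, while binary search on 1024 elements stops at depth 12 (log₂ n + 2, counting the final empty-range call) even for an absent target.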

Common Space Complexities (Quick Reference)

Scenario	Auxiliary Space
Few variables, no extra structures, iterative	O(1)
One extra array/list of size n	O(n)
Hash map/set with n entries	O(n)
Recursion depth n, O(1) per frame	O(n)
Recursion depth log n (e.g. binary search)	O(log n)
Matrix/table of size n×n (e.g. DP)	O(n²)

In-Place and O(1) Space

An in-place algorithm uses O(1) extra space (or at most O(log n) for recursion). It may overwrite the input. Examples: swapping two elements, reversing an array with two pointers, and some sorting algorithms (e.g. quicksort partitions within the array and needs no extra array; its recursion stack can be held to O(log n) by recursing into the smaller partition first and looping on the larger). Saying "we can do this in O(1) space" is a strong claim: it usually means no extra arrays or maps that grow with n.

Time vs Space Trade-off

Often you can use more space to get faster time: e.g. a hash set gives O(1) expected lookup instead of an O(n) scan, so O(n) extra space turns an O(n²) pairwise check into O(n) expected time. Or memoization (DP): store subproblem results to avoid recomputation, trading extra space for less time. The reverse: in-place algorithms save space but may be trickier or have the same time. In interviews, stating the trade-off ("we can do O(n) time and O(n) space with a set, or O(n²) time and O(1) space with two loops") shows good understanding.
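The set-versus-two-loops trade-off can be made concrete with a pair-sum check. Both function names below are hypothetical, chosen for this illustration; the problem assumed is "do two elements at different positions sum to target?".

```python
def has_pair_with_sum_set(nums, target):
    """O(n) expected time, O(n) auxiliary space: remember values seen so far."""
    seen = set()
    for x in nums:
        if target - x in seen:      # O(1) expected lookup
            return True
        seen.add(x)
    return False


def has_pair_with_sum_loops(nums, target):
    """O(n²) time, O(1) auxiliary space: compare every pair directly."""
    n = len(nums)
    for i in range(n):
        for j in range(i + 1, n):
            if nums[i] + nums[j] == target:
                return True
    return False


nums = [3, 8, 1, 5]
print(has_pair_with_sum_set(nums, 9), has_pair_with_sum_loops(nums, 9))      # True True
print(has_pair_with_sum_set(nums, 100), has_pair_with_sum_loops(nums, 100))  # False False
```

Both versions return the same answers; they differ only in which resource they spend.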

Common Mistake

Forgetting the call stack in recursive code. A recursive function that only uses a few variables per call still uses O(depth) space. Also: counting the input in auxiliary space—we typically don't. And saying "O(1) space" when you're building a list of results of size n; that list is O(n) unless the problem explicitly doesn't count the output.

Expert Tip

To reduce recursion space, convert to iteration with an explicit stack if needed. For example, DFS can be implemented recursively (O(h) stack space on a tree of height h, up to O(n) in the worst case) or iteratively with a stack data structure. The explicit stack still uses O(n) in the worst case, but it lives on the heap, so you avoid hitting the recursion limit, and you can sometimes optimize what you store. For "O(1) space" requirements, prefer iterative solutions. Tail recursion helps only where the language eliminates tail calls; CPython does not, so in Python a tail-recursive function still uses O(depth) stack.
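A minimal sketch of DFS with an explicit stack, assuming the graph is given as a dict that maps each node to a list of neighbors (that representation and the name dfs_iterative are assumptions for illustration):

```python
def dfs_iterative(graph, start):
    """Depth-first traversal using a list as an explicit stack."""
    visited = set()
    order = []
    stack = [start]
    while stack:
        node = stack.pop()
        if node in visited:
            continue
        visited.add(node)
        order.append(node)
        # Push neighbors in reverse so they are popped in their listed order.
        for neighbor in reversed(graph.get(node, [])):
            if neighbor not in visited:
                stack.append(neighbor)
    return order
```

On a long chain of nodes, a recursive DFS in Python would exceed the default recursion limit, while this version only grows a heap-allocated list.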

Interview Insight

When asked "what's the space complexity?", say both what you're counting and the result: "Auxiliary space is O(n) because we use a hash set of seen elements." For recursion: "O(n) space for the call stack since we have n recursive calls." If you give an O(n) space solution, you can add: "We could do O(1) space if we sort in place and use two pointers, but that would change the time to O(n log n)." That shows you understand the trade-off.

Summary

Space complexity = extra memory as a function of n, usually meaning auxiliary space (excluding input, and often output).
Count: extra data structures (arrays, maps, etc.) and call stack depth for recursion.
Iterative: space = size of extra variables and structures. Recursive: space = depth × space per frame.
O(1) = in-place style; O(n) = one extra linear structure or linear recursion depth; O(log n) = logarithmic depth (e.g. binary search recursion).
Time–space trade-offs: more space can mean faster time (e.g. hash map); less space may mean more time or a more complex algorithm.

3.8 Amortized Analysis

Introduction

Amortized analysis gives a bound on the average cost per operation over a sequence of operations, rather than the cost of a single operation in the worst case. Some operations in the sequence may be expensive (e.g. resizing an array), but if they happen rarely, the average cost per operation can still be low. We say "amortized O(1) per append" meaning: over n appends, total cost is O(n), so each append "costs" O(1) on average. This is how we justify that list.append in Python (and dynamic arrays in general) is O(1) amortized, even though a single append can trigger an O(n) resize.

Why This Topic Matters

Dynamic arrays (Python list, C++ vector, Java ArrayList), hash tables, and many data structures have operations that are cheap most of the time and occasionally expensive. Worst-case per-operation analysis would say "append can be O(n)," which is misleading—we don't pay O(n) every time. Amortized analysis tells the right story: append is O(1) amortized, so building a list of n elements is O(n) total. In interviews, saying "append is O(1) amortized" shows you understand the difference between worst-case per operation and amortized cost.

Amortized vs Worst-Case vs Average Case

Worst-case (per operation): The cost of a single operation in the worst possible scenario. One append might be O(n) when it triggers a resize.
Average case (over inputs): Expected cost of one operation when the input is random. That's a different notion—we're not averaging over a sequence of operations.
Amortized: We fix a sequence of operations (e.g. n appends). Total cost of the sequence is T. Amortized cost per operation = T / n. So we're spreading the cost of expensive operations over the whole sequence.

Concept Note

Amortized analysis does not use probability—it's deterministic. We consider the worst possible sequence of operations and show that even then, the total cost is bounded; dividing by the number of operations gives the amortized cost. So "O(1) amortized" means: for any sequence of n operations, total cost is O(n).

Classic Example: Dynamic Array (List) Append

Start with an array of capacity 1 (or some constant). When we append and the array is full, we allocate a new array of double the size, copy all elements, then append. So most appends are O(1); every time we double, we do O(current size) work. Question: over n appends, what is the total cost?

We do at most O(1) work for each of the n appends, plus the cost of copies during resizes. Copy sizes: 1, 2, 4, 8, … up to at most n (roughly). So total copy cost ≤ 1 + 2 + 4 + … + n ≤ 2n (geometric series). So total operations = n (appends) + O(n) (copies) = O(n). Hence amortized cost per append = O(n)/n = O(1).

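Append #:  1  2  3  4  5  6  7  8  9  ...
Cost:      1  2  3  1  5  1  1  1  9  ...
Copied:    0  1  2  0  4  0  0  0  8  ...

(Cost = 1 for the write + number of elements copied; capacity starts at 1 and doubles when full.)

Total after n appends: n + (1 + 2 + 4 + … + largest power of 2 below n) < n + 2n = O(n)
Amortized per append: O(1)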
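The aggregate argument can be checked with a small simulation. This is a sketch under an assumed cost model (1 unit to write the new element, plus 1 unit per element copied during a resize); append_costs is a hypothetical helper, not how CPython's list is implemented.

```python
def append_costs(n, initial_capacity=1):
    """Cost of each of n appends when capacity doubles on overflow."""
    capacity = initial_capacity
    size = 0
    costs = []
    for _ in range(n):
        cost = 1                  # writing the new element
        if size == capacity:
            cost += size          # copy every existing element
            capacity *= 2
        size += 1
        costs.append(cost)
    return costs
```

Summing the list for any n stays below 3n, which is the O(n) total that gives O(1) amortized cost per append.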

Three Methods for Amortized Analysis

Aggregate method

Sum the total cost of n operations; show it's T(n); then amortized cost = T(n)/n. We used this for dynamic array: total O(n), so O(1) amortized per append.

Accounting (banker's) method

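Assign a "charge" (amortized cost) to each operation. Some of that charge pays for the operation itself; the rest is "credit" stored for later. We require that credit never goes negative. Charge 3 units per append (so amortized O(1)): 1 unit pays for writing the new element, and 2 units are saved. When a full array of size m is doubled, the m/2 elements added since the previous resize have banked 2 units each, which is m units: exactly enough to copy all m elements. (A charge of 2 is not enough under this cost model: the credit goes negative at the fifth append.) So total charge 3n = O(n) covers all real cost.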
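To test whether a proposed flat charge is large enough, you can track the credit balance over a run of appends. This sketch assumes capacity starts at 1 and that an append costs 1 unit plus 1 unit per copied element; min_credit is a hypothetical helper.

```python
def min_credit(n, charge, initial_capacity=1):
    """Lowest credit balance reached over n appends with a flat charge per append."""
    capacity, size = initial_capacity, 0
    credit = 0
    lowest = 0
    for _ in range(n):
        real_cost = 1
        if size == capacity:
            real_cost += size     # pay for copying on resize
            capacity *= 2
        size += 1
        credit += charge - real_cost
        lowest = min(lowest, credit)
    return lowest
```

Under this cost model a charge of 3 never lets the balance drop below zero, while a charge of 2 runs out at the fifth append.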

Potential method

Define a "potential" function Φ on the data structure state. Let c_i be the real cost of the i-th operation and Φ_i the potential after it. We show that the amortized cost â_i = c_i + Φ_i − Φ_(i−1) is small (e.g. O(1)). Summing over n operations, the potentials telescope: sum of â_i = sum of c_i + Φ_n − Φ₀. So if Φ_n ≥ Φ₀ (for instance, Φ₀ = 0 and Φ never goes negative), the total real cost is at most the sum of the â_i. For a dynamic array, Φ can be "2 × (number of elements) − capacity": each cheap append raises Φ by 2, and when we double, the drop in potential pays for the copy.
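The potential argument can be checked numerically. This sketch assumes capacity starts at 1 (so the initial potential is −1) and uses the cost model of 1 unit per write plus 1 unit per copied element; amortized_costs is a hypothetical helper.

```python
def amortized_costs(n, initial_capacity=1):
    """Amortized cost c_i + phi_i - phi_(i-1) per append, with phi = 2*size - capacity."""
    capacity, size = initial_capacity, 0
    phi_prev = 2 * size - capacity
    result = []
    for _ in range(n):
        real_cost = 1
        if size == capacity:
            real_cost += size     # copy on resize
            capacity *= 2
        size += 1
        phi = 2 * size - capacity
        result.append(real_cost + phi - phi_prev)
        phi_prev = phi
    return result
```

Every entry comes out as 3: a cheap append costs 1 and raises the potential by 2, and a resizing append has a large real cost that is cancelled by the drop in potential.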

For interviews, the aggregate method is usually enough: "Total cost of n appends is O(n), so O(1) amortized."

Summary of Dynamic Array Resize

Doubling when full: copy sizes 1, 2, 4, … up to ≈ n. Sum = O(n). So n appends cost O(n) total → O(1) amortized per append.
If we grew by a constant (e.g. +10 each time): over n appends we'd copy about 10 + 20 + 30 + … + n elements, which is roughly n²/20 = Θ(n²) total → amortized Θ(n) per append. So growth by a constant factor greater than 1 (doubling, 1.5×, …) is essential for O(1) amortized.
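The gap between the two growth policies shows up clearly in a simulation. This is a sketch that counts only copied elements; total_copies and its grow parameter are assumptions for illustration.

```python
def total_copies(n, grow):
    """Elements copied over n appends; grow(capacity) returns the new capacity."""
    capacity, size, copies = 1, 0, 0
    for _ in range(n):
        if size == capacity:
            copies += size
            capacity = grow(capacity)
        size += 1
    return copies


doubling = total_copies(10_000, lambda c: c * 2)    # geometric growth
additive = total_copies(10_000, lambda c: c + 10)   # constant-size growth
```

With 10,000 appends, the doubling policy copies fewer than 2n elements in total, while the +10 policy copies several million.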

When Amortized Analysis Applies

Use it when you have a sequence of operations and some are occasionally expensive. Examples: dynamic array append/push, hash table insert (with rehashing), splay tree operations, incrementing a binary counter (flipping bits). We don't use "amortized" for a single standalone operation—we use it for the cost per operation in a long run.
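The binary counter mentioned above makes a compact example: a single increment can flip many bits, yet n increments flip fewer than 2n bits in total. A minimal sketch (total_bit_flips is a hypothetical name):

```python
def total_bit_flips(n):
    """Total bits flipped while incrementing a binary counter from 0 up to n."""
    flips = 0
    for value in range(n):
        # An increment turns the trailing 1s into 0s and one 0 into a 1;
        # those are exactly the set bits of value XOR (value + 1).
        flips += bin(value ^ (value + 1)).count("1")
    return flips
```

So the amortized cost per increment is O(1), even though the increment from 7 to 8 alone flips four bits.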

Common Mistake

Confusing amortized with average case. Amortized is "worst-case total over the sequence, divided by number of operations"—no randomness. Average case is "expected cost of one operation under a distribution over inputs." Also: saying "append is O(1)" without "amortized" is fine in practice (everyone understands), but technically it's O(1) amortized; a single append can be O(n) in the worst case.

Expert Tip

When you see "dynamic array" or "list that grows," think doubling (or 1.5× or similar) and geometric series. The sum of resize costs is O(n), so amortized O(1) per insert. Same idea applies when rehashing a hash table: if we double the table when load factor is high, insert is O(1) amortized.

Interview Insight

If asked "what's the time complexity of appending n elements to a list?" say "O(n) total, so O(1) amortized per append—occasionally we double and copy, but the total copy cost is O(n)." If they ask "why is append O(1)?" you can say "amortized O(1) because we double the capacity when full, so the total work for n appends is O(n)." That shows you know the difference between one expensive operation and amortized cost.

Summary

Amortized cost = (total cost of a sequence of operations) / (number of operations). We bound the total, then divide.
Used when some operations are occasionally expensive (e.g. resize); over the sequence, the average cost per operation is small.
Dynamic array append with doubling: total O(n) for n appends → O(1) amortized per append. Constant-size growth would give amortized O(n).
Methods: aggregate (sum total, divide), accounting (charge and credit), potential (potential function). For interviews, aggregate is usually enough.
Amortized is not average case: it's deterministic worst-case over the sequence.

3.9 Recurrence Relations

Introduction

A recurrence relation is an equation that defines a function T(n) in terms of its values on smaller inputs—e.g. T(n) = 2T(n/2) + n. Recurrences appear whenever we analyze recursive algorithms: the cost of solving a problem of size n is the cost of the "current step" plus the cost of solving smaller subproblems. To get Big-O for the algorithm, we must solve the recurrence—find a closed form or an asymptotic bound. This section ties together recurrences as a concept, when they arise, and how to solve them using the tools from earlier topics (recursion tree, Master Theorem, and substitution).

Why This Topic Matters

Merge sort, quicksort, binary search, divide-and-conquer, and many recursive DP or backtracking algorithms have runtimes that naturally express as recurrences. Writing the recurrence is step one; solving it gives you the complexity. Recurrence relations are the bridge between "what the code does" and "what is T(n) in Big-O." In interviews, you might be asked to write a recurrence for your recursive solution and then solve it (or say "it fits the Master Theorem, so Θ(n log n)").

What Is a Recurrence Relation?

Formally, a recurrence for T(n) has the form:

T(n) = (expression involving T(n₁), T(n₂), …) + (non-recursive work)

where n₁, n₂, … are smaller than n (e.g. n−1, n/2, n/3). We also need a base case: T(1) = c or T(0) = c so the recursion stops. The "non-recursive work" is the cost of dividing the problem and combining results (e.g. merging two sorted halves). We want to find a closed form or asymptotic bound for T(n).

Concept Note

We often write T(n/2) even when n is odd; we assume n is a power of 2 for simplicity, or we use floor/ceiling. The Big-O result is the same. The "non-recursive" term is usually written as f(n)—e.g. Θ(n), Θ(1), Θ(n²)—and we use the Master Theorem or recursion tree by comparing f(n) with n^(log_b a).

Where Recurrences Come From

Divide-and-conquer: Split into a subproblems of size n/b, do f(n) work. T(n) = aT(n/b) + f(n). Examples: merge sort 2T(n/2)+n, binary search T(n/2)+1.
Linear reduction: One subproblem of size n−1 plus some non-recursive work. T(n) = T(n−1) + Θ(1) (e.g. factorial, simple recursive scan) or T(n) = T(n−1) + n (e.g. recursive selection sort).
Multiple subproblems with different sizes: T(n) = T(n/3) + T(2n/3) + n. Doesn't fit the standard Master Theorem form; use recursion tree.

Common Recurrence Types and Their Solutions

Recurrence	Solution	Typical use
T(n) = T(n−1) + Θ(1)	Θ(n)	Single recursion, constant work
T(n) = T(n−1) + n	Θ(n²)	Single recursion, linear work
T(n) = T(n/2) + Θ(1)	Θ(log n)	Binary search
T(n) = T(n/2) + n	Θ(n)	Recurse on one half after a linear pass
T(n) = 2T(n/2) + Θ(1)	Θ(n)	Tree traversal style
T(n) = 2T(n/2) + n	Θ(n log n)	Merge sort
T(n) = 2T(n/2) + n²	Θ(n²)	Heavy combine step
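One way to build intuition for the table is to evaluate a recurrence numerically and compare it with the claimed closed form. The sketch below assumes T(1) = 1 and integer halving; make_recurrence is a hypothetical helper, and a numeric check like this is a sanity test, not a proof.

```python
from functools import lru_cache


def make_recurrence(a, f, base=1):
    """Return T with T(n) = a*T(n//2) + f(n) for n > 1 and T(1) = base."""
    @lru_cache(maxsize=None)
    def T(n):
        if n <= 1:
            return base
        return a * T(n // 2) + f(n)
    return T


merge_sort_T = make_recurrence(2, lambda n: n)      # 2T(n/2) + n
binary_search_T = make_recurrence(1, lambda n: 1)   # T(n/2) + 1
traversal_T = make_recurrence(2, lambda n: 1)       # 2T(n/2) + 1
```

For n = 1024 these give n·log₂ n + n, log₂ n + 1, and 2n − 1 respectively, matching the Θ(n log n), Θ(log n), and Θ(n) rows.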

How to Solve Recurrences: Method Overview

Recursion tree (3.5): Draw the tree, sum cost per level (or over leaves). Works for any recurrence; especially useful when Master Theorem doesn't apply (e.g. T(n/3)+T(2n/3)+n).
Master Theorem (3.6): For T(n) = aT(n/b) + f(n). Compare f(n) with n^(log_b a); apply Case 1, 2, or 3. Fast when it fits.
Substitution: Guess the form (e.g. T(n) = O(n log n)), then prove by induction. Useful when you have a candidate bound and need to verify, or when the recurrence is non-standard.
Expand and sum: For simple recurrences like T(n) = T(n−1) + n, unroll: T(n) = n + (n−1) + … + 2 + T(1) = n(n+1)/2 − 1 + T(1) = Θ(n²).

Quick Derivation: T(n) = T(n−1) + n

T(n) = n + T(n−1) = n + (n−1) + T(n−2) = … = n + (n−1) + … + 1 + T(0) = n(n+1)/2 + Θ(1) = Θ(n²).

Quick Derivation: T(n) = 2T(n/2) + n

Master Theorem: a=2, b=2, f(n)=n, n^(log_b a)=n. So f(n)=Θ(n)=Θ(n^(log_b a)). Case 2 → T(n)=Θ(n log n). Or recursion tree: each level costs n, log n levels → n log n.

When the Recurrence Doesn't Fit Standard Form

Uneven splits: T(n) = T(n/3) + T(2n/3) + n. Use recursion tree; depth from slowest branch (2n/3); per-level cost O(n) → O(n log n).
More than one recursive term with different arguments: Still draw the tree and sum, or try substitution with a guessed bound.
f(n) not polynomial: e.g. T(n) = 2T(n/2) + n log n. The basic Master Theorem does not cover it because f(n) exceeds n^(log_b a) = n only by a log factor; the extended Master Theorem or a recursion tree gives Θ(n log² n).
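For the uneven split, a numeric check can support the recursion-tree estimate. This sketch assumes integer sizes (n//3 and n − n//3) and a base case of T(n) = 1 for n < 3, both chosen here for illustration; uneven_T and ratio are hypothetical names.

```python
import math
from functools import lru_cache


@lru_cache(maxsize=None)
def uneven_T(n):
    """T(n) = T(n//3) + T(n - n//3) + n, with T(n) = 1 for n < 3."""
    if n < 3:
        return 1
    return uneven_T(n // 3) + uneven_T(n - n // 3) + n


def ratio(n):
    """T(n) divided by n*log2(n); it stays bounded if T(n) = Theta(n log n)."""
    return uneven_T(n) / (n * math.log2(n))
```

If the ratio stays between two positive constants as n grows, that is consistent with Θ(n log n), though only a proof (for example by substitution) settles it.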

Common Mistake

Forgetting the base case when writing a recurrence—without it, T(n) is not well-defined. Also: using the Master Theorem when the recurrence isn't in the form aT(n/b)+f(n) (e.g. T(n)=T(n/2)+T(n/3)+n). And confusing the "combine" cost with the "divide" cost—both go into f(n) as the non-recursive part.

Expert Tip

When you see a recursive algorithm, write the recurrence first: "We do f(n) work and make a calls of size n/b" → T(n)=aT(n/b)+f(n). Then decide: does the Master Theorem apply? If yes, plug in. If no (e.g. uneven split), use the recursion tree. For T(n)=T(n−1)+something, unrolling usually gives a simple sum.

Interview Insight

If you give a recursive solution, the interviewer may ask "what's the recurrence?" Say something like: "We split into two halves and do O(n) merge, so T(n)=2T(n/2)+n." Then: "That's Master Theorem Case 2, so Θ(n log n)." For a recurrence that doesn't fit, say "I'd draw the recursion tree and sum the levels." That shows you know the full toolkit.

Summary

Recurrence relation = equation defining T(n) in terms of T on smaller inputs plus non-recursive work; need a base case.
Common forms: T(n)=aT(n/b)+f(n) (divide-and-conquer), T(n)=T(n−1)+g(n) (linear reduction).
Solve with: recursion tree, Master Theorem (when form fits), substitution, or expand-and-sum for simple cases.
Know the classic solutions: T(n−1)+n → Θ(n²); 2T(n/2)+n → Θ(n log n); T(n/2)+1 → Θ(log n).
When the recurrence is non-standard (uneven splits, non-polynomial f), use the recursion tree or substitution.

3.10 Proof of Time Complexity (Basic Induction)

Introduction

So far we've derived time complexity using recursion trees and the Master Theorem. To be rigorous, we can prove that our solution is correct—e.g. that T(n) = O(n log n) for the recurrence T(n) = 2T(n/2) + n. The standard way to do that is the substitution method: guess a bound (e.g. T(n) ≤ c·n log n), then use induction to show that the recurrence implies the bound for a suitable constant c and large n. This section introduces basic induction and the substitution method so you can prove (or verify) complexity bounds when needed—and understand why the Master Theorem and recursion tree conclusions are valid.

Why This Topic Matters

In coursework and sometimes in interviews, you may need to justify that your asymptotic bound is correct—not just state it. The substitution method is the standard proof technique for recurrence solutions. It also helps when the recurrence doesn't fit the Master Theorem: you guess the answer from the recursion tree, then prove it by substitution. Understanding induction makes you confident that "T(n) = 2T(n/2) + n ⇒ T(n) = Θ(n log n)" is not just a formula but a provable fact.

Induction in One Paragraph

Induction proves a statement P(n) for all natural numbers n (or all n ≥ n₀). You show: (1) Base case: P(n₀) is true. (2) Inductive step: For every n > n₀, if P(k) is true for all n₀ ≤ k < n (or just for k = n−1, in simple induction), then P(n) is true. Then by induction, P(n) holds for all n ≥ n₀. For recurrences we often use strong induction: assume the bound holds for all smaller values (T(1), T(2), …, T(n−1)), then prove it for T(n) using the recurrence.

Concept Note

We prove Big-O bounds: "T(n) ≤ c·f(n) for n ≥ n₀." We choose c and n₀ so that the base case and the inductive step both work. Sometimes we need to subtract a lower-order term in the guess (e.g. T(n) ≤ c·n log n − dn) so that the recurrence "falls into" the bound when we substitute.

The Substitution Method (Steps)

Guess the form of the solution: e.g. T(n) ≤ c·n log n for some constant c > 0.
State the inductive hypothesis: Assume T(k) ≤ c·k log k for all k < n (and for k in the range we care about, e.g. k ≥ 2).
Plug into the recurrence (log is base 2, so log(n/2) = log n − 1): T(n) = 2T(n/2) + n ≤ 2(c·(n/2) log(n/2)) + n = c·n log(n/2) + n = c·n log n − c·n + n.
Show the right-hand side is ≤ your guess: We want c·n log n − c·n + n ≤ c·n log n. That holds if −c·n + n ≤ 0, i.e. c ≥ 1. So any c ≥ 1 makes the inductive step work.
Base case: The bound cannot hold at n = 1, because c·1·log 1 = 0 while T(1) > 0. So set n₀ = 2, check the small cases (n = 2 and n = 3) by hand, and raise c if needed until they satisfy the bound; the inductive step then covers every larger n.

Worked Example: T(n) = 2T(n/2) + n ⇒ T(n) = O(n log n)

Guess: T(n) ≤ c·n log n for n ≥ 2, for some c ≥ 1.

Inductive hypothesis: Assume T(k) ≤ c·k log k for all 2 ≤ k < n.

Recurrence: T(n) = 2T(n/2) + n. By the hypothesis (with k = n/2, and we assume n/2 ≥ 2 so n ≥ 4, or we handle n = 2, 3 separately), T(n/2) ≤ c·(n/2)·log(n/2). So:

T(n) ≤ 2·c·(n/2)·log(n/2) + n = c·n·log(n/2) + n = c·n·(log n − 1) + n = c·n log n − c·n + n.

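We need T(n) ≤ c·n log n. So we need −c·n + n ≤ 0, i.e. c ≥ 1. The base cases impose their own condition on c: for example, if T(1) = 1 then T(2) = 2T(1) + 2 = 4, and the bound at n = 2 reads 4 ≤ c·2·log 2 = 2c, so c ≥ 2. Choose c large enough for both the inductive step and the base cases (c = 2 works here when T(1) = 1 and n is a power of 2). Then T(n) ≤ c·n log n for all n ≥ 2. Hence T(n) = O(n log n).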
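A quick numeric check can show which constants survive the base case. This is a sanity test rather than a proof, and it assumes T(1) = 1 and n restricted to powers of two; bound_holds is a hypothetical helper.

```python
def T(n):
    """T(n) = 2T(n/2) + n for n a power of two, with T(1) = 1 (assumed)."""
    return 1 if n == 1 else 2 * T(n // 2) + n


def bound_holds(c, max_exp=20):
    """Check T(n) <= c * n * log2(n) for n = 2, 4, ..., 2**max_exp."""
    # For n = 2**k, log2(n) is exactly k, so no floating point is needed.
    return all(T(2 ** k) <= c * (2 ** k) * k for k in range(1, max_exp + 1))
```

With this base case, bound_holds(2) is true while bound_holds(1) fails already at n = 2, because T(2) = 4 exceeds 1·2·log₂ 2 = 2.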

When the Guess Needs a Lower-Order Term

Sometimes a "plain" guess doesn't work because the recurrence leaves an extra positive term that won't disappear. Example: T(n) = 4T(n/2) + n, whose solution is Θ(n²). Guessing T(n) ≤ c·n² gives T(n) ≤ 4·c·(n/2)² + n = c·n² + n, which is not ≤ c·n² for any c. Then we guess with a subtraction: T(n) ≤ c·n² − dn. Substituting, we get 4(c·(n/2)² − d·(n/2)) + n = c·n² − 2dn + n = c·n² − dn − (d − 1)n; we choose d so that (d − 1)n ≥ 0 (i.e. d ≥ 1), and then the right-hand side is ≤ c·n² − dn. So the inductive step goes through. The same trick applies to other forms (e.g. T(n) ≤ c·n log n − dn). The Master Theorem and recursion tree already tell us the correct form; the subtraction trick is just to make the algebra work in the proof.

Proving Θ (Upper and Lower Bounds)

To prove T(n) = Θ(n log n), we prove both T(n) = O(n log n) and T(n) = Ω(n log n). For the upper bound we use substitution as above. For the lower bound, we guess T(n) ≥ c·n log n and show that the recurrence implies it (with a suitable c > 0 and possibly a lower-order term added in the guess). The idea is the same; the inequalities are reversed.
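As a worked instance of the reversed inequalities, assume T(k) ≥ c·k log₂ k for all 2 ≤ k < n and substitute into T(n) = 2T(n/2) + n, taking n to be a power of 2 (an assumption made here for simplicity):

```latex
\begin{aligned}
T(n) &= 2\,T(n/2) + n \\
     &\ge 2 \cdot c \cdot \tfrac{n}{2} \log_2 \tfrac{n}{2} + n \\
     &= c\,n \log_2 n - c\,n + n \\
     &\ge c\,n \log_2 n \quad \text{whenever } 0 < c \le 1 .
\end{aligned}
```

So any constant 0 < c ≤ 1 that also satisfies the base cases gives T(n) = Ω(n log n); for this particular recurrence no lower-order term is needed.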

Summary of the Substitution Method

Guess the asymptotic form (from recursion tree or Master Theorem).
Assume by (strong) induction that the bound holds for all smaller inputs.
Substitute into the recurrence and show that the bound holds for n.
Choose constants (c, n₀, and sometimes a subtracted term) so that the base case and inductive step both work.

Common Mistake

Citing a theorem when a substitution proof is asked for, e.g. writing "T(n) = 2T(n/2) + n, so by the Master Theorem T(n) = O(n log n)." That's correct but not a substitution proof. In substitution you must plug your inductive hypothesis (the bound for T(n/2)) into the recurrence and show the bound for T(n) in exactly the guessed form, with the same constant c. Also: forgetting the base case; induction requires it.

Expert Tip

In interviews you usually just state the bound and cite the Master Theorem or recursion tree. Substitution is for when someone asks "can you prove it?" or in written exams. If you do prove by substitution, keep the algebra simple: write "T(n) ≤ 2·(c·(n/2) log(n/2)) + n = …" and end with "≤ c·n log n when c ≥ 1."

Interview Insight

Most interviewers are satisfied with "By the Master Theorem, Case 2, so Θ(n log n)" or "The recursion tree has log n levels and n work per level, so O(n log n)." If they ask "can you prove it?" say: "We'd use the substitution method: assume T(k) ≤ c·k log k for k < n, substitute into T(n) = 2T(n/2)+n, and show T(n) ≤ c·n log n for some c." You don't need to do the full algebra unless they insist.

Summary

Induction proves a statement for all n: base case + inductive step (assume for smaller k, prove for n).
Substitution method for recurrences: guess a bound (e.g. T(n) ≤ c·n log n), assume it holds for all smaller inputs, substitute into the recurrence, and show the bound holds for n; choose c and base case appropriately.
Sometimes the guess needs a lower-order term (e.g. c·n log n − dn) so the inequality works out.
To prove Θ, prove both O and Ω.
In practice, recursion tree and Master Theorem are enough for stating complexity; substitution is for rigor or when asked to prove.

4.1 Number Systems

Introduction

When you see the number 42, you instantly read it as “forty-two.” When a computer stores that same value, it uses a different representation: 101010 in binary. The value is the same; the way we write it depends on the number system (base) we choose. In algorithms and programming, you will constantly meet binary (bits, masks, XOR), hexadecimal (memory addresses, colors, hashes), and sometimes octal. Understanding how number systems work—and how to convert between them—is essential for bit manipulation, low-level reasoning, and many interview problems.

Real-World Analogy

Think of number systems like different languages for writing the same quantities.

Decimal (base 10): What we use every day. Ten symbols: 0–9. “42” means 4×10 + 2×1.
Time: Clocks count seconds and minutes in base 60 (60 seconds in a minute, 60 minutes in an hour) and hours on a 12-hour face. Read as minutes and seconds, “1:30” means 1×60 + 30 = 90 seconds: the same positional idea, with 60 as the base.
Binary (base 2): Only two symbols: 0 and 1. Computers use it because hardware is built from switches that are either on or off. “101010” is the same quantity as decimal 42, just written in base 2.
Hexadecimal (base 16): Sixteen symbols: 0–9 and A–F. Shorthand for binary (one hex digit = four bits). Used in memory addresses, color codes (#FF5733), and when debugging.

The value (how many) doesn’t change; only the notation (how we write it) changes with the base.

Formal Definition

A number system (or numeral system) is a way of representing numbers using a fixed set of symbols and a base (radix). In a positional system, the value of a digit depends on its position in the number.

Concept Note

In base b, a number written as d_k d_(k−1) … d₁ d₀ (where each d_i is a digit from 0 to b−1) has value:

d_k·b^k + d_(k−1)·b^(k−1) + … + d₁·b¹ + d₀·b⁰.

So in base 10, 42 = 4×10¹ + 2×10⁰. In base 2, 101010 = 1×2⁵ + 0×2⁴ + 1×2³ + 0×2² + 1×2¹ + 0×2⁰ = 32 + 8 + 2 = 42.

The radix is the number of distinct digits. Base 2 → digits 0,1. Base 10 → 0–9. Base 16 → 0–9, A(10), B(11), C(12), D(13), E(14), F(15).

Why This Topic Matters

In DSA and interviews, number systems show up everywhere:

Bit manipulation: AND, OR, XOR, shifts—all operate on the binary representation. You need to think in base 2 to set/clear/test bits.
Fast exponentiation / modular arithmetic: The binary expansion of the exponent drives “square and multiply” algorithms (covered later).
Encoding and hashing: Hex strings represent raw bytes (e.g., SHA hashes). Parsing and generating such strings requires comfort with base 16.
Problem constraints: Some problems ask for “the number of 1s in the binary representation” or “convert to base k”—direct number-system questions.

Building a clear mental model of bases and conversion will make later topics (Bit Manipulation, Fast Exponentiation, Modular Arithmetic) much easier.

Mental Model

In any base b, each position is a power of b. The rightmost digit is the “ones” place (b⁰), the next is “bs” (b¹), then “b²s,” and so on. So:

  Base 10:  ... thousands  hundreds  tens   ones
            ... 10³        10²       10¹    10⁰
  Example:  4    2     →   4×10 + 2×1 = 42

  Base 2:   ... 32  16  8  4  2  1
            ... 2⁵  2⁴  2³ 2² 2¹ 2⁰
  Example:  1   0   1  0  1  0  →  32+8+2 = 42

  Base 16:  ... 256  16  1
            ... 16²  16¹ 16⁰
  Example:  2    A       →  2×16 + 10×1 = 42

Same number, different “columns.” Conversion is just re-expressing the same total in a different column system.

Decimal (Base 10)

We use digits 0–9. Each place is a power of 10. You already do this intuitively: 347 = 3×100 + 4×10 + 7×1. Nothing new here except naming: this is our default radix.

Binary (Base 2)

Only two digits: 0 and 1. Every number is a sum of distinct powers of 2. That’s why binary is natural for computers: each bit is one “switch” (on/off).

Rightmost bit = 2⁰ = 1 (least significant bit, LSB).
Next left = 2¹ = 2, then 4, 8, 16, … (most significant bit, MSB, on the left).

Example

42 in binary: 42 = 32 + 8 + 2 = 1×2⁵ + 0×2⁴ + 1×2³ + 0×2² + 1×2¹ + 0×2⁰ → digits from MSB to LSB: 101010. So 42₁₀ = 101010₂.

Counting in binary: 0, 1, 10, 11, 100, 101, 110, 111, 1000, … (same idea as rolling over digits when you pass 9 in decimal).

Octal (Base 8)

Digits 0–7. Each octal digit corresponds to exactly three binary digits (because 8 = 2³). So conversion between binary and octal is trivial: group bits in threes from the right. Less common today than hex, but you may see file permissions (e.g., chmod 755) expressed in octal.

Example: 42₁₀ = 101010₂. Group as 101 | 010 → 5 and 2 in octal → 52₈. Check: 5×8 + 2 = 42 ✓.

Hexadecimal (Base 16)

Digits 0–9 and A–F (A=10, B=11, C=12, D=13, E=14, F=15). One hex digit = four bits (16 = 2⁴). So binary ↔ hex is a simple grouping by fours. Hex is compact and easy to read; that’s why memory addresses, RGB values (e.g., #FF5733), and many hashes are shown in hex.

Example

42₁₀ = 101010₂. Pad to four-bit groups: 0010 | 1010 → 2 and 10 (A in hex) → 0x2A or 2A₁₆. Check: 2×16 + 10 = 42 ✓.

Step-by-Step: Converting Between Bases

Two main directions: from decimal to base b, and from base b to decimal.

Decimal → Base b (e.g., Decimal → Binary)

Method: repeated division by b. The remainders come out LSB first, so reading them in reverse order gives the digits from MSB to LSB.

Divide the number by b. The remainder is the rightmost digit (LSB).
Take the quotient and divide by b again. The new remainder is the next digit to the left.
Repeat until the quotient is 0. The sequence of remainders (last to first) is the number in base b.

Example: 42 → binary (base 2). 42÷2=21 rem 0; 21÷2=10 rem 1; 10÷2=5 rem 0; 5÷2=2 rem 1; 2÷2=1 rem 0; 1÷2=0 rem 1. Remainders (bottom to top): 1,0,1,0,1,0 → 101010.
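The hand calculation above can be reproduced step by step. This is a small tracing sketch for positive n; division_steps is a hypothetical helper that records each division rather than building the final string.

```python
def division_steps(n, b):
    """Return (dividend, quotient, remainder) for each step of converting n to base b."""
    steps = []
    while n > 0:
        quotient, remainder = divmod(n, b)
        steps.append((n, quotient, remainder))
        n = quotient
    return steps


steps = division_steps(42, 2)
# Remainders are produced LSB first, so read them last to first.
binary = "".join(str(r) for _, _, r in reversed(steps))
```

Each tuple corresponds to one "÷2 … rem …" entry in the worked example, and joining the remainders in reverse yields the binary string.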

Base b → Decimal

Method: expand by place value. Multiply each digit by its place power and add: d_k·b^k + … + d₀·b⁰. For binary, that’s “add the powers of 2 where the bit is 1.”

Example: 101010₂ = 1×32 + 0×16 + 1×8 + 0×4 + 1×2 + 0×1 = 32+8+2 = 42.

Binary ↔ Hex (Shortcut)

Binary → Hex: group bits in fours from the right; replace each group with the corresponding hex digit (0–9, A–F). Hex → Binary: replace each hex digit with its 4-bit binary form (e.g., A → 1010).
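The grouping shortcut can be sketched directly on strings. The helper names bin_to_hex and hex_to_bin are illustrative, and both assume unsigned input without a 0b or 0x prefix.

```python
def bin_to_hex(bits):
    """Convert a binary digit string to hex by grouping bits in fours from the right."""
    hex_digits = "0123456789ABCDEF"
    # Left-pad with zeros so the length is a multiple of 4.
    padded = bits.zfill((len(bits) + 3) // 4 * 4)
    groups = [padded[i:i + 4] for i in range(0, len(padded), 4)]
    return "".join(hex_digits[int(group, 2)] for group in groups)


def hex_to_bin(hex_str):
    """Expand each hex digit to its 4-bit form, then drop leading zeros."""
    bits = "".join(format(int(ch, 16), "04b") for ch in hex_str)
    return bits.lstrip("0") or "0"
```

No arithmetic on the whole number is needed: each group of four bits is translated on its own, which is why the conversion is linear in the number of digits.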

ASCII Diagram: Place Value Across Bases

  Decimal 42 in different bases (same value, different digits):

  Base 10:    4    2     → 4×10¹ + 2×10⁰
  Base  2:  1  0  1  0  1  0  → 1×2⁵ + 1×2³ + 1×2¹
  Base  8:    5    2     → 5×8¹ + 2×8⁰
  Base 16:    2    A     → 2×16¹ + 10×16⁰

  Position:  (MSB) ............ (LSB)
  Value:     b^(n-1) ... b^1  b^0

Python Implementation

Python has built-in support for number systems. You can parse strings in a given base and format integers in binary, octal, or hex.

Parsing: String in Base b → Integer

# int(string, base) — base 2 to 36
int("101010", 2)    # 42
int("52", 8)        # 42
int("2A", 16)       # 42
int("2a", 16)       # 42 (hex digits case-insensitive)

# Optional prefix: 0b, 0o, 0x (base inferred)
int("0b101010", 0)  # 42
int("0o52", 0)      # 42
int("0x2A", 0)      # 42

Formatting: Integer → String in Base b

# bin(), oct(), hex() return strings with prefix
bin(42)   # '0b101010'
oct(42)   # '0o52'
hex(42)   # '0x2a'

# Without prefix: use format() or slice
format(42, 'b')   # '101010'
format(42, 'o')   # '52'
format(42, 'x')   # '2a'
format(42, 'X')   # '2A' (uppercase hex)

# Generic base (2–36) — no built-in; implement with repeated division
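A few more formatting variants that often come up, shown as a short sketch using Python's standard format specification (the variable names are illustrative):

```python
n = 42

padded_bits = format(n, "08b")   # zero-padded to 8 binary digits
prefixed_hex = format(n, "#x")   # '#' keeps the 0x prefix
from_fstring = f"{n:X}"          # f-strings accept the same format codes
sliced_bits = bin(n)[2:]         # drops '0b'; only safe for non-negative n
```

Fixed-width output such as the 8-digit form is handy when comparing bit patterns side by side.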

Custom Base Conversion (Decimal to Base b)

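def to_base(n: int, b: int) -> str:
    """Convert integer n to its string form in base b (2 to 36)."""
    if not 2 <= b <= 36:
        raise ValueError("base must be between 2 and 36")
    if n == 0:
        return "0"
    digits = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    result = []
    neg = n < 0
    n = abs(n)
    while n:
        result.append(digits[n % b])   # next digit, LSB first
        n //= b
    if neg:
        result.append("-")             # lands in front after the reversal
    return "".join(reversed(result))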

# Examples
to_base(42, 2)   # "101010"
to_base(42, 16)  # "2A"
to_base(255, 16) # "FF"

Custom Base Conversion (Base b String to Decimal)

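def from_base(s: str, b: int) -> int:
    """Parse string s written in base b (2 to 36); empty input returns 0."""
    if not 2 <= b <= 36:
        raise ValueError("base must be between 2 and 36")
    s = s.strip().upper()
    if not s:
        return 0
    start = 1 if s[0] in "-+" else 0
    sign = -1 if s[0] == "-" else 1
    n = 0
    digits = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    for c in s[start:]:
        d = digits.find(c)             # -1 if c is not a digit character
        if d < 0 or d >= b:
            raise ValueError(f"invalid digit {c!r} for base {b}")
        n = n * b + d                  # Horner step
    return sign * n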

# Examples
from_base("101010", 2)  # 42
from_base("2A", 16)     # 42

Line-by-Line Explanation: `to_base`

if n == 0: return "0" handles the edge case: zero in any base is "0".
digits = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ" holds one character per value 0 to 35, so it supports bases 2–36.
result = [] collects the remainder digits; we reverse at the end because we get LSB first.
neg = n < 0 and n = abs(n) record the sign and switch to the non-negative value, so the loop only ever sees n ≥ 0.
while n: repeats until the quotient is 0. n % b is the next digit (LSB first); n //= b is the quotient for the next iteration.
if neg: result.append("-") followed by return "".join(reversed(result)): the digits were collected LSB-first, so reversing gives MSB-first (normal written form), and the "-" appended last ends up in front.

This is exactly the “repeated division” algorithm we described earlier; the code just automates it.

Edge Cases

Zero: In any base, 0 is written "0". Your to_base(0, b) should return "0".
Negative numbers: Standard conversion is for non-negative integers. If you need negative, handle the sign separately (e.g., convert abs(n) and prepend "-").
Empty string: int("", 2) raises ValueError. In custom from_base, you might return 0 or raise; document the choice.
Invalid digits: e.g. int("12", 2) — digit 2 is invalid in base 2. Python raises ValueError. In your own parser, validate each character against the base.
Large numbers: Python integers are arbitrary-precision, so no overflow; int("1"*1000, 2) works.

Common Mistakes

Reading binary/hex backwards: The rightmost bit is the LSB (2⁰). Don’t treat the leftmost as “first” when computing value—left is MSB, highest power.
Confusing prefix with value: 0b101010 is a literal in code; the value is 42. When you bin(42) you get the string '0b101010'—the 0b is a prefix, not part of the mathematical value.
Off-by-one in bit position: “Bit 0” usually means the LSB (2⁰). “Bit i” often means the coefficient of 2ⁱ. Check problem wording (0-indexed vs 1-indexed).
Using int(string) without base for non-decimal: int("101010") is 101010 in decimal, not 42. For binary you must use int("101010", 2).

Common Mistake

int("101010") returns 101010 (one hundred one thousand ten in decimal). To interpret "101010" as binary, you must call int("101010", 2), which returns 42. Always pass the base when the string is not in decimal.

Pattern Recognition

Base conversion fits two patterns you’ll reuse:

Decimal → base b: Repeated division by b; remainders (reverse order) = digits. Same idea as “extract digits” in decimal (n % 10, n // 10).
Base b → decimal: Horner-style expansion: start at 0, then for each digit do value = value * b + digit. One pass left-to-right.

Many “digit” problems (e.g., “sum of digits in base k,” “palindrome in base b”) use these same building blocks.
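Both building blocks fit in a few lines. This is a minimal sketch for non-negative integers; digits_in_base, digit_sum, and from_digits are hypothetical helper names that work on lists of digit values rather than strings.

```python
def digits_in_base(n, b):
    """Digits of non-negative n in base b, most significant first."""
    if n == 0:
        return [0]
    out = []
    while n:
        n, remainder = divmod(n, b)   # repeated division, LSB first
        out.append(remainder)
    return out[::-1]


def digit_sum(n, b):
    """Sum of the digits of n when written in base b."""
    return sum(digits_in_base(n, b))


def from_digits(digits, b):
    """Horner's rule: rebuild the value from MSB-first digits."""
    value = 0
    for d in digits:
        value = value * b + d
    return value
```

Working with digit lists sidesteps the character table, which keeps the pattern visible: one loop peels digits off, the other folds them back in.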

Optimization Insight

Binary ↔ hex conversion is O(number of digits): group four bits per hex digit. Decimal ↔ binary via repeated division takes O(log n) division steps (each step halves the number). For very large numbers, CPython’s int(..., 2) and bin() are implemented in C and are efficient; use them instead of hand-rolled loops when possible.

Interview Insight

When a problem involves “binary representation,” “number of 1 bits,” “reverse bits,” or “base k,” state clearly: “We can get the binary representation with bin(n) or by repeated division; then operate on the string or digits.” For “count set bits,” mention that you can use bin(n).count('1') or bit operations (n & 1 and n >>= 1 in a loop). Knowing both the string and the bit-op approaches shows depth.
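The two approaches named above can be sketched side by side for non-negative integers. The third function uses the n & (n − 1) trick (often attributed to Brian Kernighan), which is an addition here and not mentioned in the text; all function names are illustrative.

```python
def popcount_string(n):
    """Count 1 bits via the binary string."""
    return bin(n).count("1")


def popcount_shift(n):
    """Count 1 bits by testing the lowest bit and shifting right."""
    count = 0
    while n:
        count += n & 1
        n >>= 1
    return count


def popcount_clear_lowest(n):
    """Count 1 bits by repeatedly clearing the lowest set bit."""
    count = 0
    while n:
        n &= n - 1    # removes exactly one set bit per iteration
        count += 1
    return count
```

The shift loop runs once per bit of n, while the last version runs once per set bit, which can matter when the number is large but sparse.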

Practice Problems

Convert a decimal number to binary and to hex by hand for small values (e.g., 0–255).
Implement to_base(n, b) and from_base(s, b) for bases 2–36 without using int(s, b) or bin/hex (practice the algorithm).
Given a positive integer, count the number of 1s in its binary representation (Hamming weight).
Check if a number’s binary representation is a palindrome (e.g., 9 = 1001 is; 10 = 1010 is not).
Convert a number from base A to base B (e.g., base 7 → base 5) by going through decimal or by direct conversion if you know the place values.

Summary

A number system is a way of writing numbers using a base (radix) and a set of digits. In a positional system, value = sum of digit × base^position.
Decimal (base 10), binary (base 2), octal (base 8), and hexadecimal (base 16) are the most common in programming. Binary is fundamental for bits; hex is a compact shorthand for binary (one hex digit = four bits).
Decimal → base b: Repeated division by b; remainders (reverse order) give the digits. Base b → decimal: Expand by place value, or use Horner: value = value * b + digit.
In Python: int(string, base) parses; bin(), oct(), hex() and format(n, 'b'/'o'/'x') format. Use these for correctness and speed; implement by hand when practicing the algorithm.
Number systems underpin bit manipulation, fast exponentiation, and encoding—master them early for a smooth path through Mathematics for Algorithms and Bit Manipulation.

4.2 Prime Numbers

Introduction

A prime number is a natural number greater than 1 that has exactly two positive divisors: 1 and itself. Primes are the building blocks of integers—every integer greater than 1 can be written uniquely (up to order) as a product of primes. In DSA, primes appear in hashing (hash table sizes), cryptography, factorization problems, and counting (e.g., divisors, coprime pairs). This section covers the definition, why primes matter, how to test if a number is prime, and how to count primes in a range—setting you up for the Sieve of Eratosthenes (next topic) and number-theoretic algorithms.

Real-World Analogy

Think of primes as indivisible building blocks. Just as you can’t split an atom into smaller pieces of the same substance (in classical chemistry), you can’t factor a prime into smaller positive integers other than 1 and itself. Composite numbers are “molecules” made of primes: 12 = 2×2×3, 42 = 2×3×7. Once you know the primes up to a limit, you can build and analyze all integers in that range—same idea as having a periodic table of elements.

Formal Definition

Concept Note

A prime number is an integer p ≥ 2 such that the only positive divisors of p are 1 and p. An integer n ≥ 2 that is not prime is composite; it has at least one divisor d with 2 ≤ d ≤ n−1.

By convention, 1 is neither prime nor composite. The first primes are 2, 3, 5, 7, 11, 13, 17, 19, 23, … Note that 2 is the only even prime; every other even number is divisible by 2.

Why This Topic Matters

Primality testing: “Is n prime?” appears in problems and interviews. You need a correct, efficient check—often trial division first, then optimizations.
Counting primes: “How many primes ≤ N?” or “List all primes ≤ N” lead to the Sieve of Eratosthenes (next section)—one of the most common algorithms in number theory.
Divisors and factorization: GCD, LCM, divisor count, and “sum of divisors” rely on prime factorization. Primes are the first step.
Hashing and randomization: Prime-sized hash tables and prime moduli reduce collisions. Choosing a “large enough prime” is a standard trick.

Mental Model

For a number n to be composite, it must have a divisor d with 2 ≤ d ≤ √n. Why? If n = a×b with a ≤ b, then a ≤ √n (otherwise a×b > n). So we only need to check divisors up to √n—that’s the core of trial-division primality testing and of “find one factor” for composites.

  n = a × b   with  a ≤ b
  ⇒  a² ≤ a·b = n
  ⇒  a ≤ √n

  So: if no divisor in [2, √n], then n is prime.

Primality Testing: Is n Prime?

We want a function is_prime(n) that returns True if n is prime and False otherwise.

Brute Force: Trial Division

Check every integer d from 2 to n−1: if d divides n, then n is composite. If none divide, n is prime. Correct but slow—O(n) divisions for n.

Better: Check Only Up to √n

If n has a divisor d > √n, then n/d is a divisor < √n. So we only need to check d from 2 to ⌊√n⌋. If n % d == 0 for any such d, n is composite; otherwise prime. This reduces the loop to O(√n) iterations.

Optimal for Trial Division: Check 2, Then Odd Numbers

If n is even, we can immediately return n == 2. Then check only odd d: 3, 5, 7, … up to √n. This halves the number of divisions (still O(√n) but with a smaller constant). For very large n, probabilistic tests (e.g., Miller–Rabin) are used in practice; for DSA and interviews, trial division up to √n is usually enough.

Optimization Insight

Brute force: O(n) checks. Better: O(√n) by checking only up to √n. Best trial division: O(√n) but only odd divisors after 2. For “list all primes ≤ N,” the Sieve of Eratosthenes (next topic) is O(N log log N)—much better than testing each number with trial division.

Python Implementation

`is_prime(n)` — Trial Division

def is_prime(n: int) -> bool:
    if n < 2:
        return False
    if n == 2:
        return True
    if n % 2 == 0:
        return False
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True

Line-by-Line Explanation

n < 2 — Primes are ≥ 2; 0 and 1 are not prime.
n == 2 — 2 is the only even prime; return early.
n % 2 == 0 — Any other even number is composite.
while d * d <= n — Check only up to √n. We use d * d <= n to avoid floating-point √n. We start d = 3 and add 2 each time (odd divisors only).
If any such d divides n, return False. If the loop finishes, no divisor was found, so return True.

Time and Space Complexity

Time: O(√n). The loop runs at most about √n times (only odd d, so roughly √n/2 iterations). Each iteration does a remainder and a multiply; both O(1) for fixed-size integers. For arbitrary-precision integers, cost per division grows with the number of digits; we still say O(√n) in terms of the value n when analyzing algorithm design.

Space: O(1). Only a few variables.

Edge Cases

n < 2: Return False. 0 and 1 are not prime.
n == 2: Return True. Handle before the “even” check so we don’t incorrectly label 2 as composite.
Large n: Trial division is fine for n up to around 10¹² in practice (√n ≈ 10⁶). For bigger n, use a probabilistic test or accept that trial division is slow.

Counting Primes and Listing Primes

To answer “how many primes ≤ N?” or “list all primes ≤ N,” testing each number with is_prime would take about O(N √N) time—too slow for large N. The Sieve of Eratosthenes (next topic, 4.3) does this in O(N log log N) by marking composites once. Here we only note the goal; the algorithm is in the next section.

Expert Tip

For a single query “is n prime?”, use trial division. For “all primes up to N” or “count primes up to N”, use the Sieve. Don’t sieve when you only need one primality check; don’t do N trial divisions when you need the full list.

Common Mistakes

Treating 1 as prime: 1 has only one positive divisor (itself). By definition, primes have exactly two divisors (1 and n). So 1 is not prime.
Checking up to n−1 instead of √n: That leaves the algorithm correct but O(n) instead of O(√n). Always use the √n bound.
Using floating-point sqrt(n): int(n**0.5) can have rounding errors for large n. Prefer while d * d <= n so all arithmetic is integer.
Forgetting to handle 2: If you do “if n % 2 == 0: return False” before “if n == 2: return True”, you’ll incorrectly say 2 is not prime.
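If you prefer an explicit upper bound over the d * d <= n loop condition, one option is math.isqrt (Python 3.8+), which returns the exact integer floor of √n without going through floating point. The function name is_prime_isqrt is only illustrative; this is a sketch of the same trial division, not a different algorithm.

```python
import math

def is_prime_isqrt(n: int) -> bool:
    """Trial division with an exact integer square-root bound."""
    if n < 2:
        return False
    if n < 4:
        return True              # 2 and 3
    if n % 2 == 0:
        return False
    limit = math.isqrt(n)        # floor(sqrt(n)), computed with integers only
    for d in range(3, limit + 1, 2):
        if n % d == 0:
            return False
    return True

print([k for k in range(30) if is_prime_isqrt(k)])
# [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```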

Common Mistake

1 is not prime. The definition requires exactly two positive divisors. 1 has only one divisor (1), so it is excluded. Your is_prime(1) must return False.

Interview Insight

When asked “how do you check if a number is prime?”, say: “Trial division: check divisors from 2 to √n. If any divide n, it’s composite; otherwise prime. We can skip evens after 2 to speed up. Time O(√n), space O(1).” If the follow-up is “count primes up to N,” say: “Then we’d use the Sieve of Eratosthenes to mark composites and count primes in O(N log log N).”

Practice Problems

Implement is_prime(n) with trial division (up to √n, odd divisors after 2).
Given N, count how many primes are in the range [2, N] using trial division for each (then compare with the Sieve in the next section).
Find the smallest prime factor of n (or return n if prime)—same loop as primality test, but return the first d that divides n.
Check if n is a “prime power” (n = p^k for some prime p and k ≥ 1): after confirming n > 1, find the smallest prime factor p; then check if n is a power of p.

Summary

A prime is an integer ≥ 2 whose only positive divisors are 1 and itself. 1 is not prime. 2 is the only even prime.
Primality test: Trial division—check if any d in [2, √n] divides n. If yes, composite; if no, prime. Optimize by checking 2 then only odd divisors. Time O(√n), space O(1).
For “list/count primes ≤ N,” use the Sieve of Eratosthenes (next topic)—O(N log log N)—not N separate trial divisions.
Edge cases: n < 2 → false; n == 2 → true; use integer comparison d*d <= n instead of floating-point √n to avoid rounding issues.

4.3 Sieve of Eratosthenes

Introduction

The Sieve of Eratosthenes is an ancient algorithm that finds all prime numbers up to a given limit N by repeatedly marking multiples of primes as composite. Instead of testing each number with trial division (which would cost about O(N √N) for the whole range), the sieve marks each composite number once—or a small number of times—yielding a total cost of O(N log log N) time and O(N) space. It is the standard way to “list all primes ≤ N” or “count primes ≤ N” and is a must-know for number theory, competitive programming, and interviews.

Real-World Analogy

Imagine a long list of numbers from 2 to N written on a board. You go through the list in order. When you see a number that hasn’t been crossed out, it’s prime—so you cross out all its multiples (4, 6, 8, … for 2; then 6, 9, 12, … for 3; and so on). When you’re done, every number still left is prime. You never “test” a number by dividing it; you only erase multiples of numbers you’ve already declared prime. That’s the sieve: “sift out” composites by marking multiples of each prime.

Formal Definition

Concept Note

The Sieve of Eratosthenes works as follows: maintain a boolean array is_prime[0..N] (or similar) where initially every index is considered “prime.” For each i from 2 to √N (or to N): if i is still marked prime, then mark all multiples of i (2i, 3i, …) as composite. When the loop finishes, every index that remains marked prime (in [2, N]) is a prime number.

The outer loop only needs to run p from 2 up to √N: any composite ≤ N has a prime factor ≤ √N, so it is already marked by the time that factor has been processed.

Why This Topic Matters

Count primes / list primes: Many problems ask “how many primes ≤ N?” or “output all primes ≤ N.” The sieve answers both in one pass.
Precomputation: Once you’ve sieved up to N, you have O(1) primality checks for any number ≤ N (just look up the array). Useful when you need many such checks.
Prime factorization, divisors: A common variant stores the smallest prime factor (SPF) for each number instead of just a boolean. That allows fast factorization and divisor enumeration.
Interview staple: “Count primes less than n” is a classic LeetCode-style problem; the expected solution is the sieve.
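The smallest-prime-factor variant mentioned above can be sketched like this. The names build_spf and factorize are illustrative, and the sketch assumes n ≥ 1 and 1 ≤ m ≤ n.

```python
def build_spf(n: int) -> list[int]:
    """spf[k] = smallest prime factor of k for 2 <= k <= n (spf[0] = spf[1] = 0)."""
    spf = [0] * (n + 1)
    for p in range(2, n + 1):
        if spf[p] == 0:                  # untouched so far, so p is prime
            for k in range(p, n + 1, p):
                if spf[k] == 0:          # keep the first (smallest) prime that reaches k
                    spf[k] = p
    return spf

def factorize(m: int, spf: list[int]) -> list[int]:
    """Prime factors of m with repetition, in non-decreasing order."""
    factors = []
    while m > 1:
        factors.append(spf[m])
        m //= spf[m]
    return factors

spf = build_spf(100)
print(factorize(84, spf))   # [2, 2, 3, 7]
```

Once the table is built, each factorization takes one division per prime factor, so at most about log₂ m steps.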

Mental Model

Think of the sieve as “every composite number has a smallest prime factor.” When we process prime p, we mark all multiples of p. The first time a composite m gets marked is when we process its smallest prime factor; it can be marked again later, but only by its other prime factors. So we’re not “testing” each number: each composite is touched once per distinct prime factor that the outer loop processes, which is a small number of times. That’s why the total work is much less than N × √N.

  For each prime p in [2, √N]:
      Mark 2p, 3p, 4p, ... (multiples of p) as composite.

  After the loop: any unmarked number in [2, N] is prime.
  Key: we only iterate p up to √N; multiples beyond N are skipped.

Step-by-Step Breakdown

Create: An array is_prime of length N+1. Set is_prime[0] = is_prime[1] = False (0 and 1 are not prime). Set is_prime[2..N] = True (assume prime until marked composite).
Outer loop: For p from 2 to N (or to √N for the optimized version): if is_prime[p] is still True, then p is prime.
Mark composites: For each multiple of p—i.e. p*2, p*3, p*4, ...—set is_prime[k] = False for k = 2*p, 3*p, ... (stop when k > N).
Optional optimization: Start marking from p*p instead of 2*p. Why? Every multiple k*p with 2 ≤ k < p has a prime factor q ≤ k < p, so it was already marked when we processed q. So we only need to mark p², p²+p, p²+2p, ... up to N.
Result: After the loop, collect all i in [2, N] such that is_prime[i] is True—those are the primes. Or count them.

ASCII Diagram

Sieving primes up to 30. We mark multiples of 2, then 3, then 5 (we don’t need to go beyond √30 ≈ 5.5). X = marked composite, . = not yet marked.

  p=2: mark 4,6,8,10,12,14,16,18,20,22,24,26,28,30
  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
  .  .  X  .  X  .  X  .  X  .  X  .  X  .  X  .  X  .  X  .  X  .  X  .  X  .  X  .  X

  p=3: newly marks 9, 15, 21, 27 (the even multiples of 3 are already marked)
  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
  .  .  X  .  X  .  X  X  X  .  X  .  X  X  X  .  X  .  X  X  X  .  X  .  X  X  X  .  X

  p=5: newly marks 25
  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
  .  .  X  .  X  .  X  X  X  .  X  .  X  X  X  .  X  .  X  X  X  .  X  X  X  X  X  .  X

  Primes: 2, 3, 5, 7, 11, 13, 17, 19, 23, 29

Evolution: Brute Force → Sieve → Optimized Sieve

Brute Force

For each number i from 2 to N, call is_prime(i) (trial division). Time: O(N √N). Space: O(1) per call. Too slow for large N.

Sieve of Eratosthenes (Basic)

One array; for each p from 2 to N, if p is prime, mark all multiples of p. Time: O(N log log N) (see below). Space: O(N). Much faster than brute force.

Optimized Sieve

(1) Outer loop only to √N—once we’ve processed primes up to √N, all composites ≤ N are already marked. (2) Start marking multiples at p²—smaller multiples were marked by smaller primes. (3) Optional: use a list of booleans for odd numbers only (half the memory, index mapping). The asymptotic time remains O(N log log N); constants improve.

Optimization Insight

Stopping the outer loop at √N doesn’t change the asymptotic time (most marking is done by small primes), but it avoids redundant work. Starting inner loop at p² is both correct and faster: we mark fewer cells per prime. For “count primes” only, you can use a bit-packed sieve (one bit per odd number) to reduce memory.
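One way the odd-only idea could look, as a sketch: index i stands for the odd number 2i + 1, and a bytearray holds one byte per odd number (not one bit, so this is a halfway step toward true bit packing). The name sieve_odd_only and the index mapping are assumptions made for this illustration.

```python
def sieve_odd_only(n: int) -> list[int]:
    """Primes <= n, storing flags for odd numbers only."""
    if n < 2:
        return []
    size = (n + 1) // 2                # indices 0..size-1 stand for 1, 3, 5, ...
    is_prime = bytearray([1]) * size
    is_prime[0] = 0                    # index 0 is the number 1, which is not prime
    i = 1                              # index 1 is the number 3
    while (2 * i + 1) ** 2 <= n:
        if is_prime[i]:
            p = 2 * i + 1
            # p*p is odd, so its index is (p*p) // 2. Stepping by 2p in value
            # (skipping even multiples) is stepping by p in index.
            for j in range((p * p) // 2, size, p):
                is_prime[j] = 0
        i += 1
    return [2] + [2 * i + 1 for i in range(1, size) if is_prime[i]]

print(sieve_odd_only(30))   # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```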

Python Implementation

Basic Sieve: List All Primes ≤ N

def sieve(n: int) -> list[int]:
    if n < 2:
        return []
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    p = 2
    while p * p <= n:
        if is_prime[p]:
            for k in range(2 * p, n + 1, p):
                is_prime[k] = False
        p += 1
    return [i for i in range(2, n + 1) if is_prime[i]]

Count Primes Only (Same Idea)

def count_primes(n: int) -> int:
    if n < 2:
        return 0
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    p = 2
    while p * p <= n:
        if is_prime[p]:
            for k in range(2 * p, n + 1, p):
                is_prime[k] = False
        p += 1
    return sum(1 for i in range(2, n + 1) if is_prime[i])

Optimized: Start Marking at p²

def sieve_optimized(n: int) -> list[int]:
    if n < 2:
        return []
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    p = 2
    while p * p <= n:
        if is_prime[p]:
            for k in range(p * p, n + 1, p):
                is_prime[k] = False
        p += 1
    return [i for i in range(2, n + 1) if is_prime[i]]

The only change is range(2 * p, ...) → range(p * p, ...). Correct because each multiple p×k with 2 ≤ k < p has a prime factor smaller than p, so it was already marked when that smaller prime was processed.

Line-by-Line Explanation (Basic Sieve)

if n < 2: return [] — No primes below 2.
is_prime = [True] * (n + 1) — Index 0..n; we’ll use indices 0 and 1 for 0 and 1, and 2..n for numbers 2..n.
is_prime[0] = is_prime[1] = False — 0 and 1 are not prime.
while p * p <= n — We only need to consider p up to √n. If p > √n, every multiple p×k ≤ n has k < p, so it has a prime factor smaller than p and is already marked; there is nothing new to mark.
if is_prime[p] — If p was already marked composite (by a smaller prime), skip; we’ve already marked its multiples.
for k in range(2*p, n+1, p) — Mark 2p, 3p, 4p, … up to n. Step size p gives exactly the multiples of p.
return [i for i in range(2, n+1) if is_prime[i]] — Collect all indices that are still True; those are the primes.

Time Complexity

For each prime p ≤ √N, we mark about N/p numbers (multiples of p). Total operations is roughly:

N/2 + N/3 + N/5 + N/7 + … (sum over primes p ≤ √N). Factor out N: N × (1/2 + 1/3 + 1/5 + …). The sum of 1/p over primes up to a bound M is known to be about log log M, and log log √N differs from log log N only by a constant. So total is O(N log log N).

Stopping the outer loop at √N doesn’t change the dominant term (most marks come from small primes). Starting the inner loop at p² reduces the number of marks per prime but the same asymptotic holds. So the sieve is O(N log log N) time.
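To get a feel for the bound, you can count the inner-loop writes directly and compare them with N · ln ln N. This is an informal experiment, not a proof; count_marks is a hypothetical helper that instruments the p²-start sieve.

```python
import math

def count_marks(n: int) -> int:
    """Number of inner-loop writes made by the p*p-start sieve up to n."""
    is_prime = [True] * (n + 1)
    marks = 0
    p = 2
    while p * p <= n:
        if is_prime[p]:
            for k in range(p * p, n + 1, p):
                is_prime[k] = False
                marks += 1
        p += 1
    return marks

for n in (10**3, 10**4, 10**5):
    marks = count_marks(n)
    ratio = marks / (n * math.log(math.log(n)))
    print(n, marks, round(ratio, 3))   # the ratio should stay bounded as n grows
```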

Space Complexity

We use one boolean array of length N+1 → O(N) space. If we pack one bit per number (or only store odd indices), we can get O(N) bits, i.e. O(N/8) bytes—same big-O, smaller constant.

Edge Cases

n < 2: No primes. Return [] or count 0.
n == 2: One prime (2). With p = 2, p×p = 4 > 2, so the while condition fails immediately and nothing is marked; is_prime[2] stays True. Correct.
Large N: O(N) space can be an issue for N in the hundreds of millions. Use a segmented sieve or bit-packed storage if needed.
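A minimal sketch of the segmented idea for a single window [lo, hi]: sieve the base primes up to √hi, then cross out their multiples inside the window only. The function name primes_in_range is illustrative, and a production version would usually process several fixed-size windows in a loop.

```python
import math

def primes_in_range(lo: int, hi: int) -> list[int]:
    """Primes p with lo <= p <= hi, using O(sqrt(hi) + (hi - lo)) memory."""
    lo = max(lo, 2)
    if hi < lo:
        return []
    # Step 1: ordinary sieve for the base primes up to sqrt(hi).
    limit = math.isqrt(hi)
    base = [True] * (limit + 1)
    base_primes = []
    for p in range(2, limit + 1):
        if base[p]:
            base_primes.append(p)
            for k in range(p * p, limit + 1, p):
                base[k] = False
    # Step 2: mark multiples of each base prime inside the window.
    segment = [True] * (hi - lo + 1)          # segment[i] stands for lo + i
    for p in base_primes:
        # First multiple of p that is >= lo, but never p itself (start at p*p).
        first = max(p * p, ((lo + p - 1) // p) * p)
        for k in range(first, hi + 1, p):
            segment[k - lo] = False
    return [lo + i for i, flag in enumerate(segment) if flag]

print(primes_in_range(10, 30))   # [11, 13, 17, 19, 23, 29]
```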

Common Mistakes

Including 0 or 1 as prime: Set is_prime[0] = is_prime[1] = False at the start.
Looping p to N instead of √N: Correct but wasteful. Stopping at √N is enough and faster.
Starting the inner loop at 2*p but forgetting p² optimization: Correct either way; starting at p² is a performance improvement.
Off-by-one in range: Use range(2, n + 1) so that n itself is included when we collect primes. Use range(p * p, n + 1, p) so the last multiple ≤ n is marked.

Common Mistake

Forgetting to mark 0 and 1 as non-prime. If you leave is_prime[0] or is_prime[1] as True, your list or count will be wrong. Always set is_prime[0] = is_prime[1] = False before the loop.

Pattern Recognition

“Mark multiples of each prime” is a recurring idea: when you want to eliminate all numbers that are “multiples of something,” iterate over the “something” (here, primes) and mark multiples in one batch. The same pattern appears in segmented sieves (sieving a range [L, R]) and in some divisor-sum precomputations.

Interview Insight

For “count primes less than n” (or “list primes ≤ n”), say: “I’ll use the Sieve of Eratosthenes. Create a boolean array, mark 0 and 1 as non-prime, then for each p from 2 to √n, if p is still prime, mark all multiples of p. Finally count or collect indices that remain True. Time O(n log log n), space O(n).” If they ask for optimization, mention starting the inner loop at p² and stopping the outer loop at √n.

Practice Problems

Implement the sieve and return the list of primes ≤ N; then implement count-only without building the list.
LeetCode 204: Count Primes. Use the sieve; note that the problem counts primes strictly less than n, so sieve up to n − 1 and return 0 for n ≤ 2.
Precompute a “smallest prime factor” (SPF) array: for each number, store its smallest prime divisor. (When marking multiples of p, if spf[k] is not yet set, set spf[k] = p.) Then use SPF to factor any number ≤ N quickly.
Segmented sieve: find primes in an interval [L, R] without building a full sieve up to R (useful when R is huge but R−L is moderate).

Summary

The Sieve of Eratosthenes finds all primes ≤ N by marking multiples of each prime; unmarked numbers are prime. Use a boolean array and iterate p from 2 to √N; for each prime p, mark multiples (start at 2p or p²).
Time: O(N log log N). Space: O(N).
Evolution: brute force (N × trial division) → sieve → optimized sieve (outer loop to √N, inner from p²).
Edge cases: n < 2 → empty list / 0; always set is_prime[0] = is_prime[1] = False.
Standard for “list/count primes ≤ N”; variant with smallest-prime-factor array enables fast factorization.

4.4 GCD & LCM

Introduction

The Greatest Common Divisor (GCD) of two integers is the largest positive integer that divides both. The Least Common Multiple (LCM) is the smallest positive integer that both divide. GCD and LCM are fundamental in number theory, modular arithmetic, simplifying fractions, and many DSA problems (e.g., “minimum steps to reach,” “repeat pattern,” “coprime pairs”). This section covers definitions, the Euclidean algorithm for GCD (and why it’s fast), the link between GCD and LCM, and clean Python implementations—including edge cases and interview-ready phrasing.

Real-World Analogy

Imagine you have two ropes of length 12 and 18. You want to cut both into equal pieces (no leftover), with each piece as long as possible. The piece length must divide both 12 and 18—so it’s a common divisor. The greatest such length is 6. That’s the GCD(12, 18) = 6. Now imagine two gears that turn every 12 and 18 seconds. When do they align (both at the start)? The first time is at the smallest positive time that both 12 and 18 divide—that’s the LCM(12, 18) = 36. So: GCD = “largest common measure”; LCM = “earliest common multiple.”

Formal Definition

Concept Note

GCD(a, b) (or gcd(a, b)) is the largest positive integer d such that d | a and d | b (i.e., d divides both a and b). For non-negative inputs, GCD(0, b) = b and GCD(a, 0) = a, because every integer divides 0. GCD(0, 0) has no largest common divisor, so it is defined as 0 by convention.

LCM(a, b) is the smallest positive integer m such that a | m and b | m. For positive a, b, we have LCM(a, b) × GCD(a, b) = a × b, so LCM(a, b) = a × b / GCD(a, b).

For negative numbers, we usually work with absolute values: GCD(−12, 18) = GCD(12, 18) = 6. So in code we often take gcd(abs(a), abs(b)).

Why This Topic Matters

Modular arithmetic and cryptography: Extended GCD (next topic) is used to compute modular inverses—essential for RSA and many algorithms.
Simplifying fractions: A fraction a/b in lowest terms is (a/gcd(a,b)) / (b/gcd(a,b)).
Periodicity and “when do two events align?”: Problems like “two runners with periods 12 and 18—when do they meet?” reduce to LCM (or GCD of periods in special cases).
Interview problems: “Minimum steps to make two numbers equal,” “number of coprime pairs,” “split array into groups with GCD K”—all use GCD/LCM.

Mental Model

Think of a and b in terms of their prime factorizations. The GCD takes the minimum exponent for each prime (the “overlap”). The LCM takes the maximum exponent (the “union”). So for 12 = 2²×3 and 18 = 2×3²: GCD = 2¹×3¹ = 6, LCM = 2²×3² = 36. And indeed 6 × 36 = 12 × 18. That’s the intuition behind GCD × LCM = a × b (for positive a, b).

  a = 12 = 2² × 3¹    b = 18 = 2¹ × 3²
  GCD = common part    = 2¹ × 3¹ = 6
  LCM = combined part  = 2² × 3² = 36
  Check: 6 × 36 = 216 = 12 × 18
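The min/max-exponent view can be checked directly with collections.Counter, whose & and | operators take per-key minimum and maximum counts. This is a sketch for intuition only (factorization is far slower than the Euclidean algorithm); prime_factors and product are illustrative helper names.

```python
from collections import Counter

def prime_factors(n: int) -> Counter:
    """Prime factorization of n >= 1 as {prime: exponent}, by trial division."""
    factors = Counter()
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors[d] += 1
            n //= d
        d += 1
    if n > 1:
        factors[n] += 1          # whatever remains is a prime
    return factors

def product(factors: Counter) -> int:
    """Multiply the primes back together."""
    result = 1
    for p, e in factors.items():
        result *= p ** e
    return result

fa, fb = prime_factors(12), prime_factors(18)
g = product(fa & fb)   # minimum exponent per prime
m = product(fa | fb)   # maximum exponent per prime
print(g, m, g * m == 12 * 18)   # 6 36 True
```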

GCD: Brute Force → Euclidean Algorithm

Brute Force

List all divisors of a and b; take the largest number that appears in both. Finding divisors of n costs O(√n), so total is about O(√a + √b). Works for small numbers but is slow and clumsy.

Euclidean Algorithm (Optimal)

Key identity: GCD(a, b) = GCD(b, a mod b). Why? Write a = q·b + r with r = a mod b. Any common divisor of a and b divides r = a − q·b, and conversely any common divisor of b and r divides a = q·b + r. So (a, b) and (b, a mod b) have exactly the same set of common divisors; hence the GCD is the same. Base case: GCD(a, 0) = a (every integer divides 0, and the largest divisor of a is a). So we repeatedly replace (a, b) with (b, a mod b) until the second number is 0; then the first number is the GCD.

Example

GCD(48, 18): 48 mod 18 = 12 → GCD(18, 12). 18 mod 12 = 6 → GCD(12, 6). 12 mod 6 = 0 → GCD(6, 0) = 6. So GCD(48, 18) = 6.

For a ≥ b, the remainder satisfies a mod b < a/2, so every two steps at least halve the larger number, and the number of steps is O(log min(a, b)). No factorization is needed, just remainders.

Step-by-Step: Euclidean Algorithm

Start with (a, b). If b = 0, return a (base case).
Otherwise compute r = a mod b and return GCD(b, r).
Repeat until the second argument is 0. The first argument at that point is the GCD.

  GCD(48, 18)
  = GCD(18, 48 mod 18) = GCD(18, 12)
  = GCD(12, 18 mod 12) = GCD(12, 6)
  = GCD(6, 12 mod 6)   = GCD(6, 0)
  = 6
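The same trace can be reproduced in code. gcd_trace is a hypothetical helper that records every (a, b) pair the algorithm visits.

```python
def gcd_trace(a: int, b: int) -> list[tuple[int, int]]:
    """All (a, b) pairs visited by the Euclidean algorithm, ending with (g, 0)."""
    a, b = abs(a), abs(b)
    steps = [(a, b)]
    while b:
        a, b = b, a % b
        steps.append((a, b))
    return steps

print(gcd_trace(48, 18))   # [(48, 18), (18, 12), (12, 6), (6, 0)]
```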

LCM from GCD

For positive integers a, b: LCM(a, b) = a × b / GCD(a, b). So compute GCD first, then LCM = (a // g) * b (or a * (b // g)) to avoid overflow: divide one number by the GCD first, then multiply by the other. In Python, integers are arbitrary size, but (a // gcd) * b keeps the intermediate values smaller and is good practice for other languages.

Common Mistake

Writing lcm = a * b // gcd(a, b) can overflow in C++/Java for large a, b. Prefer lcm = (a // gcd(a, b)) * b so the multiplication is with a smaller factor. In Python it’s less critical but still clearer and portable.

Python Implementation

GCD — Recursive

def gcd(a: int, b: int) -> int:
    a, b = abs(a), abs(b)
    if b == 0:
        return a
    return gcd(b, a % b)

GCD — Iterative

def gcd_iter(a: int, b: int) -> int:
    a, b = abs(a), abs(b)
    while b:
        a, b = b, a % b
    return a

LCM (Using GCD)

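def lcm(a: int, b: int) -> int:
    if a == 0 or b == 0:
        return 0
    a, b = abs(a), abs(b)  # work with magnitudes so the result is non-negative
    return (a // gcd(a, b)) * b  # divide first to keep the intermediate small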

We use (a // g) * b so the division happens first (no overflow). We take absolute values so LCM is defined for negatives in a consistent way (e.g., LCM(−12, 18) = LCM(12, 18) = 36).

Using the Standard Library

import math
math.gcd(a, b)   # GCD; handles 0 and negatives sensibly
# Python 3.9+:
math.lcm(a, b)   # LCM; returns 0 if either argument is 0

In practice, use math.gcd and math.lcm when available; implement by hand when practicing the algorithm or when you need extended GCD (next topic).
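A few calls that show the standard-library behavior on the edge cases discussed in this section. The list values is just sample data; math.lcm requires Python 3.9 or newer.

```python
import math
from functools import reduce

print(math.gcd(48, 18))     # 6
print(math.gcd(-12, 18))    # 6  (the result is never negative)
print(math.gcd(0, 7))       # 7
print(math.gcd(0, 0))       # 0

print(math.lcm(12, 18))     # 36
print(math.lcm(0, 5))       # 0

# Folding the two-argument function over a list gives the GCD of all values.
values = [12, 18, 30]
gcd_all = reduce(math.gcd, values)
print(gcd_all)              # 6
```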

Line-by-Line Explanation (Iterative GCD)

a, b = abs(a), abs(b) — GCD is defined for non-negative numbers; negatives are handled by taking absolute values so the result is non-negative.
while b: — Loop until the second argument is 0. When b becomes 0, the current a is the GCD.
a, b = b, a % b — Replace (a, b) with (b, a mod b). This is one step of the Euclidean algorithm; the GCD is unchanged.
return a — When b = 0, GCD(a, 0) = a, so we return a.

Time Complexity

GCD: Each step reduces the arguments. For a ≥ b > 0 we have a mod b < a/2: if b ≤ a/2 then a mod b < b ≤ a/2, and if b > a/2 then a mod b = a − b < a/2. Since the pairs go (a, b) → (b, a mod b) → (a mod b, …), the first argument is more than halved every two steps, so the number of steps is O(log min(a, b)). Each step is O(1) for fixed-size integers; for big integers, the cost per step grows with the number of digits, which gives roughly O(log² min(a, b)) bit operations in total with schoolbook arithmetic. We usually state it as O(log min(a, b)) for the number of iterations.
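Counting iterations makes the logarithmic growth visible. Consecutive Fibonacci numbers are the classic worst case for the Euclidean algorithm, because every quotient is 1. The helpers gcd_steps and fib_pair are illustrative names for this sketch.

```python
def gcd_steps(a: int, b: int) -> int:
    """Number of (a, b) -> (b, a % b) replacements until b == 0."""
    steps = 0
    while b:
        a, b = b, a % b
        steps += 1
    return steps

def fib_pair(k: int) -> tuple[int, int]:
    """Return (F(k+1), F(k)) with F(1) = F(2) = 1, for k >= 1."""
    prev, cur = 1, 1                 # F(1), F(2)
    for _ in range(k - 1):
        prev, cur = cur, prev + cur
    return cur, prev

print(gcd_steps(48, 18))             # 3
print(fib_pair(6), gcd_steps(*fib_pair(6)))   # (13, 8) 5

for k in (10, 20, 40, 80):
    a, b = fib_pair(k)
    print(k, b.bit_length(), gcd_steps(a, b))  # steps grow linearly in the bit length
```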

LCM: One GCD plus one division and one multiplication → same as GCD, O(log min(a, b)).

Space Complexity

Iterative GCD: O(1) extra space. Recursive GCD: O(log min(a, b)) stack depth. Prefer iterative if you care about stack or want to avoid recursion limits for huge inputs.

Edge Cases

One or both zero: GCD(0, b) = b, GCD(a, 0) = a, GCD(0, 0) = 0 (by convention). LCM(0, b) = LCM(a, 0) = 0 (no positive multiple of 0 in the usual sense; 0 is the standard convention).
Negatives: GCD and LCM are usually defined for positive integers; in code we take absolute values so GCD(−12, 18) = 6, LCM(−12, 18) = 36.
Equal numbers: GCD(a, a) = a, LCM(a, a) = a.

Common Mistakes

Forgetting to handle zero: GCD(a, 0) must return a, not crash or loop forever. In iterative code, while b: correctly stops when b = 0.
LCM overflow: Using a * b // gcd(a, b) in a language with fixed-width integers can overflow. Use (a // gcd(a, b)) * b.
LCM when one is zero: Define LCM(a, 0) = 0 (or avoid calling LCM with 0). Don’t divide by zero.
Assuming a and b are positive: If the problem allows negatives, use abs(a), abs(b) for a non-negative GCD.

Optimization Insight

The Euclidean algorithm takes O(log min(a, b)) iterations, which is more than fast enough for word-sized integers. Asymptotically faster (subquadratic) GCD methods exist for very large multi-precision integers, but they are rarely needed in DSA problems. The binary GCD (Stein’s algorithm) uses only shifts and subtraction and is sometimes faster in practice due to hardware. For interviews and most code, the standard Euclidean (iterative) is enough. Use math.gcd in Python for production.
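For reference, a sketch of the binary GCD in Python. In Python this is unlikely to beat math.gcd; the point is only to show the shift-and-subtract structure. The function name binary_gcd is an illustrative choice.

```python
def binary_gcd(a: int, b: int) -> int:
    """GCD using only shifts, comparisons, and subtraction (Stein's algorithm)."""
    a, b = abs(a), abs(b)
    if a == 0:
        return b
    if b == 0:
        return a
    # Factor out the power of two shared by a and b.
    shift = 0
    while ((a | b) & 1) == 0:
        a >>= 1
        b >>= 1
        shift += 1
    # Make a odd; any remaining factor of 2 in a is not shared with b.
    while (a & 1) == 0:
        a >>= 1
    while b:
        while (b & 1) == 0:      # a is odd, so 2 cannot be a common factor
            b >>= 1
        if a > b:
            a, b = b, a          # keep a <= b
        b -= a                   # odd minus odd is even, so b shrinks again
    return a << shift            # restore the shared power of two

print(binary_gcd(48, 18))   # 6
```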

Pattern Recognition

“Reduce (a, b) to (b, a mod b)” is the same idea as “reduce the problem size using the remainder.” Many number-theoretic algorithms (extended GCD, modular inverse, continued fractions) build on this. When you see “greatest common divisor,” “simplify fraction,” or “coprime,” think GCD first.

Interview Insight

When asked “how do you compute GCD?”, say: “Euclidean algorithm: GCD(a, b) = GCD(b, a mod b), with base case GCD(a, 0) = a. Iterative implementation in a loop: while b is non-zero, set (a, b) = (b, a % b); then return a. Time O(log min(a, b)), space O(1). LCM is a*b/GCD(a,b); compute as (a // gcd) * b to avoid overflow.” If the problem involves “lowest terms” or “coprime,” connect it to dividing by GCD.

Practice Problems

Implement gcd and lcm (iterative and recursive for GCD) and test with (0, b), (a, 0), negatives, and (48, 18).
Simplify a fraction (a, b) to lowest terms: divide numerator and denominator by GCD(a, b).
Count pairs (i, j) in an array such that GCD(arr[i], arr[j]) = 1 (coprime pairs)—use GCD in the inner check or use inclusion–exclusion with prime factors.
Given two numbers and a step, “minimum steps to make both equal” often involves GCD of the steps and the difference.
LCM of an array: LCM(a, b, c) = LCM(LCM(a, b), c); use a loop and the two-argument LCM.

Summary

GCD(a, b) = largest positive integer that divides both. LCM(a, b) = smallest positive integer that both divide. For positive a, b: LCM × GCD = a × b.
Euclidean algorithm: GCD(a, b) = GCD(b, a mod b); base case GCD(a, 0) = a. Iterative: while b ≠ 0, (a, b) = (b, a % b); return a. Time O(log min(a, b)), space O(1) iterative.
LCM: LCM(a, b) = (a // GCD(a, b)) × b (or use math.lcm in Python 3.9+). Handle zero: LCM(a, 0) = 0.
Use abs(a), abs(b) when inputs can be negative. Prefer math.gcd and math.lcm in production; implement by hand for learning and for extended GCD (next topic).

4.5 Extended Euclidean Algorithm

Introduction

The Extended Euclidean Algorithm not only computes GCD(a, b) but also finds integers x and y such that ax + by = gcd(a, b). This identity is called Bézout’s identity. The coefficients x, y are essential for computing modular inverses (next topic), solving linear Diophantine equations, and many cryptographic algorithms. This section builds on the ordinary Euclidean algorithm and shows how to “extend” it to recover the coefficients.

Real-World Analogy

Imagine you have two piles of coins with a and b coins. You’re allowed to add or remove multiples of each pile. The Euclidean algorithm tells you the “unit” of value you can always form (the GCD). The extended algorithm tells you how to form it: “take x copies of the first pile and y copies of the second (with removal meaning negative copies), and you get exactly gcd(a, b).” So you get both the number (GCD) and a “recipe” (x, y) to express it as a combination of a and b.

Formal Definition

Concept Note

Bézout’s identity: For any integers a, b (not both zero), there exist integers x, y such that ax + by = gcd(a, b). The Extended Euclidean Algorithm computes gcd(a, b) and one such pair (x, y). The pair is not unique: (x + k·b/g, y − k·a/g) also works for any integer k, where g = gcd(a, b).

We usually want one concrete solution (x, y). The extended algorithm gives one by propagating coefficients backward through the Euclidean steps.
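The non-uniqueness is easy to see numerically. The sketch below starts from one known solution for a = 35, b = 15 (namely x = 1, y = −2, since 35 − 30 = 5) and generates others with the formula above; bezout_family is a hypothetical helper name.

```python
def bezout_family(a: int, b: int, g: int, x0: int, y0: int, ks: range) -> list[tuple[int, int]]:
    """Shift one solution (x0, y0) of a*x + b*y = g by multiples of (b/g, -a/g)."""
    return [(x0 + k * (b // g), y0 - k * (a // g)) for k in ks]

a, b, g = 35, 15, 5
solutions = bezout_family(a, b, g, 1, -2, range(-2, 3))
for x, y in solutions:
    print((x, y), a * x + b * y)   # every line ends in 5
```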

Why This Topic Matters

Modular inverse: Finding x such that a·x ≡ 1 (mod m) reduces to solving a·x + m·y = 1. That has a solution iff gcd(a, m) = 1; the extended algorithm gives x (mod m). Critical for RSA and many algorithms.
Linear Diophantine equations: Equations like a·x + b·y = c have integer solutions (x, y) iff gcd(a, b) | c. The extended algorithm gives a particular solution when c = gcd(a, b); scale to get one for general c.
Interview and contest problems: “Compute modular inverse,” “find one solution to a·x + b·y = c”—both rely on extended GCD.

Mental Model

In the Euclidean algorithm we repeatedly replace (a, b) with (b, a mod b). At each step we have a = q·b + r, so r = a − q·b. If we already know how to write the GCD as a combination of b and r (i.e., b·x₁ + r·y₁ = g), then we can substitute r = a − q·b and get a combination of a and b: a·y₁ + b·(x₁ − q·y₁) = g. So we propagate coefficients backward: from (b, r) we get coefficients for (a, b). Base case: when b = 0, we have a·1 + 0·0 = a = gcd(a, 0).

  At step: a = q·b + r   ⇒   r = a − q·b
  If  b·x₁ + r·y₁ = g  then  b·x₁ + (a − q·b)·y₁ = g
  ⇒  a·y₁ + b·(x₁ − q·y₁) = g   →  new (x, y) = (y₁, x₁ − q·y₁)

Step-by-Step: Recursive Form

Base case: If b = 0, then gcd(a, 0) = a and a·1 + 0·0 = a. So return (a, 1, 0) meaning (gcd, x, y).
Recursive step: Compute (g, x₁, y₁) = extended_gcd(b, a % b). So g = b·x₁ + (a % b)·y₁. We have a = q·b + (a % b) with q = a // b. So (a % b) = a − q·b. Substitute: g = b·x₁ + (a − q·b)·y₁ = a·y₁ + b·(x₁ − q·y₁). So coefficients for (a, b) are x = y₁, y = x₁ − q·y₁. Return (g, x, y).

Example

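Extended GCD(35, 15): 35 = 2·15 + 5, so we need coefficients for (15, 5). Extended GCD(15, 5): 15 = 3·5 + 0, so we recurse on (5, 0). Base case: (5, 0) returns (5, 1, 0), since 5·1 + 0·0 = 5. Back in (15, 5) with q = 3: (x, y) = (0, 1 − 3·0) = (0, 1), i.e., 15·0 + 5·1 = 5. Back in (35, 15) with q = 2: (x, y) = (1, 0 − 2·1) = (1, −2). So (g, x, y) = (5, 1, −2). Check: 35·1 + 15·(−2) = 35 − 30 = 5 ✓.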

Python Implementation

Recursive Extended GCD

def extended_gcd(a: int, b: int) -> tuple[int, int, int]:
    """Returns (g, x, y) such that |a|*x + |b|*y = g = gcd(a, b).

    Signs are dropped up front, so for negative inputs the coefficients
    belong to abs(a) and abs(b), not to the original a and b.
    """
    a, b = abs(a), abs(b)
    if b == 0:
        return (a, 1, 0)
    q = a // b
    g, x1, y1 = extended_gcd(b, a % b)
    # g = b*x1 + (a % b)*y1, and a % b = a - q*b
    # so g = a*y1 + b*(x1 - q*y1)
    x, y = y1, x1 - q * y1
    return (g, x, y)

Iterative Extended GCD

Keep track of (a, b) and coefficients (x_a, y_a) for current a and (x_b, y_b) for current b, such that a = orig_a·x_a + orig_b·y_a and b = orig_a·x_b + orig_b·y_b. Initially a = orig_a, b = orig_b → (x_a, y_a) = (1, 0), (x_b, y_b) = (0, 1). When we replace (a, b) with (b, a − q·b), we update the coefficient vectors the same way: the new (x_a, y_a) is the old (x_b, y_b), and the new (x_b, y_b) is (x_a − q·x_b, y_a − q·y_b) computed from the old values. When b = 0, (g, x, y) = (a, x_a, y_a).

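def extended_gcd_iter(a: int, b: int) -> tuple[int, int, int]:
    """Returns (g, x, y) such that |a|*x + |b|*y = g = gcd(a, b).

    As in the recursive version, coefficients refer to abs(a) and abs(b).
    """
    a, b = abs(a), abs(b)
    x_a, y_a = 1, 0   # current a = orig_a*x_a + orig_b*y_a
    x_b, y_b = 0, 1   # current b = orig_a*x_b + orig_b*y_b
    while b:
        q = a // b
        a, b = b, a % b
        x_a, y_a, x_b, y_b = x_b, y_b, x_a - q * x_b, y_a - q * y_b
    return (a, x_a, y_a)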
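One way to convince yourself that the iterative update is right is to assert the invariant on every pass. The following is an instrumented sketch for non-negative inputs; the name extended_gcd_checked is made up for this illustration.

```python
def extended_gcd_checked(a: int, b: int) -> tuple[int, int, int]:
    """Iterative extended GCD (non-negative inputs) that asserts the loop invariant."""
    orig_a, orig_b = a, b
    x_a, y_a, x_b, y_b = 1, 0, 0, 1
    while b:
        # Invariant: the current a and b are integer combinations of the originals.
        assert a == orig_a * x_a + orig_b * y_a
        assert b == orig_a * x_b + orig_b * y_b
        q = a // b
        a, b = b, a % b
        x_a, y_a, x_b, y_b = x_b, y_b, x_a - q * x_b, y_a - q * y_b
    # On exit b == 0, so a is the gcd and (x_a, y_a) are its Bezout coefficients.
    assert a == orig_a * x_a + orig_b * y_a
    return a, x_a, y_a

for pair in [(35, 15), (48, 18), (0, 7)]:
    print(pair, extended_gcd_checked(*pair))
```

If any update were wrong (for example, x and y swapped), one of the assertions would fail on the first affected iteration.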

Line-by-Line Explanation (Recursive)

if b == 0: return (a, 1, 0) — Base case: a·1 + 0·0 = a = gcd(a, 0).
q = a // b — Quotient so that a = q·b + (a % b).
g, x1, y1 = extended_gcd(b, a % b) — Get GCD and coefficients for (b, a % b): g = b·x1 + (a % b)·y1.
x, y = y1, x1 - q * y1 — Substitute (a % b) = a − q·b to get g = a·y1 + b·(x1 − q·y1), so x = y1, y = x1 − q·y1.
return (g, x, y) — One solution to a·x + b·y = g.

Time and Space Complexity

Time: Same as the Euclidean algorithm—O(log min(a, b)) steps. Each step does a division and a few arithmetic operations. So O(log min(a, b)).

Space: Recursive version O(log min(a, b)) stack depth. Iterative version O(1) extra variables.

Edge Cases

b = 0: Return (a, 1, 0). Correct: a·1 + 0·0 = a.
a = 0, b ≠ 0: extended_gcd(0, b) → (b, 0, 1). Check: 0·0 + b·1 = b = gcd(0, b) ✓.
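Negative inputs: We take abs(a), abs(b) so we work with non-negative numbers. The GCD is non-negative; x, y can be negative (e.g., 35·1 + 15·(−2) = 5). Note that the returned coefficients then satisfy |a|·x + |b|·y = g; to get coefficients for the original signed inputs, negate x if a < 0 and negate y if b < 0.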
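If coefficients for the original signed inputs are needed, one possible approach is to run the algorithm on absolute values and flip signs afterward. This is a sketch under that assumption; both function names below are hypothetical.

```python
def _extended_gcd_nonneg(a: int, b: int) -> tuple[int, int, int]:
    """Extended GCD for a, b >= 0: returns (g, x, y) with a*x + b*y = g."""
    x_a, y_a, x_b, y_b = 1, 0, 0, 1
    while b:
        q = a // b
        a, b = b, a % b
        x_a, y_a, x_b, y_b = x_b, y_b, x_a - q * x_b, y_a - q * y_b
    return a, x_a, y_a

def extended_gcd_signed(a: int, b: int) -> tuple[int, int, int]:
    """Return (g, x, y) with a*x + b*y = g = gcd(|a|, |b|) for inputs of any sign."""
    g, x, y = _extended_gcd_nonneg(abs(a), abs(b))
    # |a|*x == a*(-x) when a < 0, so flipping the sign of x keeps the identity true.
    if a < 0:
        x = -x
    if b < 0:
        y = -y
    return g, x, y

g, x, y = extended_gcd_signed(-35, 15)
print(g, x, y, -35 * x + 15 * y)
```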

Application: Modular Inverse (Preview)

We want x such that a·x ≡ 1 (mod m). That means a·x + m·y = 1 for some y. This has a solution iff gcd(a, m) = 1. Run extended_gcd(a, m) to get (1, x, y). The x you get might be negative; reduce modulo m: x % m (or (x % m + m) % m) is the modular inverse of a modulo m. This is covered in detail in the next topic (Modular Inverse).

Expert Tip

When you need the modular inverse of a modulo m, call extended_gcd(a, m). If g ≠ 1, no inverse exists. Otherwise the inverse is x % m (adjust for negative). In Python 3.8+, you can use pow(a, -1, m) for the inverse when gcd(a, m) = 1.

Common Mistakes

Wrong order of (x, y): The equation is a·x + b·y = g. So the first coefficient multiplies a, the second multiplies b. Don’t swap x and y when returning or when using the result for modular inverse (inverse of a mod m uses the coefficient that multiplies a).
Forgetting to reduce x modulo m for inverse: extended_gcd returns an x that might be negative or larger than m. The modular inverse is (x % m + m) % m (or x % m in Python since % is non-negative when divisor is positive).
Assuming inverse exists: a has an inverse mod m only if gcd(a, m) = 1. Always check g == 1 before using x as the inverse.

Common Mistake

The equation is a·x + b·y = g. So when you compute the modular inverse of a modulo m, you use the coefficient that goes with a (the first coefficient x), not the one that goes with m (the second coefficient y). Inverse of a mod m = x mod m (after checking gcd(a, m) = 1).

Interview Insight

When asked “how do you find coefficients such that ax + by = gcd(a, b)?”, say: “Extended Euclidean algorithm. Recursively compute (g, x1, y1) for (b, a mod b). Then use a = q·b + (a mod b) to get coefficients for (a, b): x = y1, y = x1 − q·y1. Base case: (a, 1, 0) when b = 0. Same complexity as GCD, O(log min(a,b)).” For “modular inverse,” say: “Run extended GCD(a, m); if g = 1, the inverse is x mod m.”

Practice Problems

Implement recursive and iterative extended GCD and verify a·x + b·y = g for (35, 15), (48, 18), (0, 7).
Using extended GCD, compute the modular inverse of a modulo m when gcd(a, m) = 1 (handle negative x).
Solve a·x + b·y = c in integers: first run extended_gcd(a, b). If g does not divide c, no solution. Otherwise one solution is (x·(c//g), y·(c//g)); describe the full solution set.

Summary

Bézout’s identity: There exist integers x, y with a·x + b·y = gcd(a, b). The Extended Euclidean Algorithm computes (g, x, y).
Recursive: Base case (b = 0) → (a, 1, 0). Otherwise (g, x1, y1) = extended_gcd(b, a % b); then x = y1, y = x1 − (a//b)·y1; return (g, x, y).
Iterative: Maintain (a, b) and coefficient vectors (x_a, y_a), (x_b, y_b); update with the same recurrence as (a, b) when doing (a, b) = (b, a % b).
Time O(log min(a, b)), space O(1) iterative. Use extended GCD to get the modular inverse of a mod m (x mod m when g = 1)—next topic.

4.6 Fast Exponentiation

Introduction

Fast exponentiation (also called binary exponentiation or square-and-multiply) computes a^b using only O(log b) multiplications instead of b−1. The idea is to use the binary representation of the exponent: write b in base 2, and combine precomputed powers a^2⁰, a^2¹, a^2², … by squaring repeatedly and multiplying when the current bit is 1. This is essential for modular exponentiation (a^b mod m)—used in RSA, hashing, and many algorithms—where we must keep intermediate results modulo m to avoid overflow. This section covers the idea, the algorithm, and clean Python implementations.

Real-World Analogy

To get 3¹³, you could multiply 3 by itself 12 times. Instead, build powers by doubling: 3¹, 3², 3⁴, 3⁸ (each is the previous squared). Then 13 in binary is 1101, so 3¹³ = 3⁸ × 3⁴ × 3¹: multiply only the powers that correspond to 1-bits. That takes three squarings and two multiplications instead of 12 multiplications. Same idea as “repeated doubling” in any setting: grow exponentially, then combine.

Formal Definition

Concept Note

Write the exponent b in binary: b = b_k·2^k + … + b₁·2¹ + b₀·2⁰, where each b_i is 0 or 1. Then a^b = a^b₀ × (a²)^b₁ × (a⁴)^b₂ × … . We compute a, a², a⁴, a⁸, … by repeated squaring, and multiply into the result only when the current bit of b is 1. Total multiplications: O(log b).
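The binary decomposition can be made visible with a small helper that lists which powers of two make up the exponent. This is only an illustrative sketch; powers_used is not a name from the course code.

```python
def powers_used(b: int) -> list[int]:
    """Return the powers of two (1, 2, 4, ...) whose bits are set in non-negative b."""
    exps = []
    i = 0
    while b:
        if b & 1:
            exps.append(1 << i)  # bit i is set, so a^(2^i) contributes to a^b
        b >>= 1
        i += 1
    return exps

exps = powers_used(13)
product = 1
for e in exps:
    product *= 3 ** e  # multiply together only the powers whose bit is 1
print(exps, product)
```

For b = 13 the set bits correspond to 1, 4, and 8, which sum to 13, so the product equals 3¹³.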

For modular exponentiation we compute a^b mod m: after each multiplication or squaring, take the result mod m so numbers stay in [0, m−1].

Why This Topic Matters

Modular exponentiation: RSA and many crypto primitives need a^b mod m for huge b. Naive a × a × … mod m would take b steps; fast exponentiation does O(log b) steps.
Python’s pow(a, b, m): The built-in three-argument form does exactly this. Knowing the algorithm explains why it’s fast and how to implement it when needed.
Matrix exponentiation, recurrence relations: The same “binary exponentiation” idea applies to raising a matrix to power b (e.g., Fibonacci in O(log n))—covered later in the course.

Mental Model

Scan the bits of b from right to left (LSB first). Maintain a running power base = a^(2^i) (start with base = a, then square each time we move left). Maintain a result res (start 1). When the current bit of b is 1, multiply res by base. After each step, square base (for the next bit) and shift b right. When b becomes 0, res holds a raised to the original exponent.

  b in binary:  ... b₂ b₁ b₀
  a^b = (a^1)^b₀ × (a^2)^b₁ × (a^4)^b₂ × ...
  So: res = 1; base = a; for each bit of b (LSB first):
      if bit is 1: res *= base
      base *= base; b //= 2
  Return res.

Evolution: Naive → Fast Exponentiation

Naive (Brute Force)

Multiply res = 1 by a exactly b times. Time O(b), space O(1). For b in the millions or more, this is impractical.

Fast Exponentiation (Square-and-Multiply)

Repeated squaring + multiply when bit is 1. Number of iterations = number of bits of b = ⌊log₂ b⌋ + 1. Each iteration does one or two multiplications (squaring base, and possibly multiplying res by base). Time O(log b), space O(1) iterative.
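To compare the two approaches, the sketch below counts multiplications while running square-and-multiply. The counter and the name power_counting are additions for illustration only.

```python
def power_counting(a: int, b: int) -> tuple[int, int]:
    """Return (a**b, number of multiplications performed) for non-negative b."""
    if b < 0:
        raise ValueError("b must be non-negative")
    res, base, mults = 1, a, 0
    while b:
        if b & 1:
            res *= base   # multiply into the result for a 1-bit
            mults += 1
        base *= base      # square for the next bit (done every iteration)
        mults += 1
        b >>= 1
    return res, mults

value, fast_mults = power_counting(3, 13)
naive_mults = 13  # the naive loop multiplies res by a exactly b times
print(value, fast_mults, naive_mults)
```

The count is bounded by twice the number of bits of b, so the gap to the naive loop widens quickly as b grows.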

Optimization Insight

Any method that builds a^b from a using only multiplications needs at least ⌈log₂ b⌉ of them in general, because each multiplication can at most double the largest exponent reached so far. Fast exponentiation uses at most 2·(⌊log₂ b⌋ + 1) multiplications, so it is optimal up to a constant factor.

Step-by-Step Example

Compute 3¹³. 13 in binary is 1101 (LSB = 1).

  res=1, base=3, b=13
  b&1=1 → res=1*3=3, base=3²=9,   b=6
  b&1=0 → res=3,   base=9²=81,  b=3
  b&1=1 → res=3*81=243, base=81²=6561, b=1
  b&1=1 → res=243*6561=1594323, base=..., b=0
  Return 1594323. Check: 3^13 = 1594323 ✓

We did four “bit” steps (squaring each time, and three multiplies into res). So about 4 squarings + 3 multiplies instead of 12 multiplies.
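The same trace can be reproduced programmatically. This small sketch records the state after each loop iteration; trace_power is a hypothetical helper written for this example.

```python
def trace_power(a: int, b: int) -> list[tuple[int, int, int, int]]:
    """Return one row (bit, res, base, b) per iteration, recorded after the updates."""
    rows = []
    res, base = 1, a
    while b:
        bit = b & 1
        if bit:
            res *= base
        base *= base
        b >>= 1
        rows.append((bit, res, base, b))
    return rows

rows = trace_power(3, 13)
for bit, res, base, b in rows:
    print(f"bit={bit} res={res} base={base} b={b}")
```

The res column of the last row holds the final answer, and the number of rows equals the number of bits in the exponent.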

Modular Exponentiation

To compute a^b mod m, reduce modulo m after every multiplication and squaring. That keeps all intermediates in [0, m−1] and avoids overflow. The algorithm is the same; only the multiplication and squaring are done mod m.

# In code: res = (res * base) % m; base = (base * base) % m

Python’s pow(a, b, m) does exactly this when m is provided. Use it in production; implement by hand when learning or when you need a custom variant (e.g., matrix exponentiation).

Python Implementation

Iterative: a^b (no mod)

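def power(a: int, b: int) -> int:
    """Returns a^b for non-negative b."""
    if b < 0:
        raise ValueError("b must be non-negative")  # the loop would never end for b < 0
    if b == 0:
        return 1
    res = 1
    base = a
    while b:
        if b & 1:
            res *= base   # this bit contributes the current power a^(2^i)
        base *= base      # a^(2^i) -> a^(2^(i+1))
        b >>= 1
    return res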

Iterative: a^b mod m

def power_mod(a: int, b: int, m: int) -> int:
    """Returns (a^b) % m for non-negative b. Assumes m >= 1."""
    if b < 0:
        raise ValueError("b must be non-negative")  # the loop would never end for b < 0
    if b == 0:
        return 1 % m
    a %= m
    res = 1
    base = a
    while b:
        if b & 1:
            res = (res * base) % m
        base = (base * base) % m
        b >>= 1
    return res

Recursive (Alternative)

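def power_mod_rec(a: int, b: int, m: int) -> int:
    """Returns (a^b) % m for non-negative b. Assumes m >= 1."""
    if b < 0:
        raise ValueError("b must be non-negative")  # b // 2 never reaches 0 for b < 0
    if b == 0:
        return 1 % m
    a %= m
    half = power_mod_rec(a, b // 2, m)   # a^(b // 2) mod m
    half = (half * half) % m
    if b & 1:
        half = (half * a) % m            # odd exponent: one extra factor of a
    return half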

Recursive idea: a^b = (a^(b//2))² × a^(b%2). Same O(log b) steps; uses O(log b) stack.

Line-by-Line Explanation (Iterative with Mod)

if b == 0: return 1 % m — a⁰ = 1; reduce mod m for consistency (handles m = 1).
a %= m — Work with a in [0, m−1] so base stays small.
res = 1, base = a — Result starts 1; current power of a is a¹.
while b: — Process each bit of b until b = 0.
if b & 1: res = (res * base) % m — If the LSB is 1, multiply result by current base (that bit contributes to the exponent).
base = (base * base) % m — Square: a^{2^i} → a^{2^(i+1)}.
b >>= 1 — Shift right to process the next bit.

Time and Space Complexity

Time: O(log b) iterations. Each iteration does O(1) multiplications. With fixed-size machine integers each modular multiplication is O(1); with arbitrary-precision integers and schoolbook multiplication it is O((log m)²), because operands are kept below m. We state O(log b) in terms of the number of steps; with big integers the total for the modular version is O(log b · (log m)²), depending on the multiplication algorithm.

Space: O(1) for the iterative version (a few variables). Recursive version O(log b) stack depth.

Edge Cases

b = 0: a⁰ = 1. Return 1 (or 1 % m).
a = 0, b > 0: 0^b = 0. Here base is 0 from the start, so res becomes 0 at the first 1-bit and stays 0. Or handle explicitly: if a % m == 0 and b > 0, return 0.
m = 1: Any integer mod 1 is 0. So a^b mod 1 = 0 for any a, b. The code 1 % 1 = 0 and (res * base) % 1 = 0 is correct.
Negative exponent: a^−b = 1/(a^b). Not usually needed for modular exponentiation in DSA; if needed, use modular inverse (when gcd(a, m) = 1, a⁻¹ mod m exists and pow(a, -b, m) = pow(a⁻¹, b, m) in Python 3.8+).
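If a negative exponent does come up, one possible way to support it is to invert the base first and then run the usual loop on the positive exponent. The sketch below assumes Python 3.8+ (for pow(a, -1, m)) and m ≥ 1; power_mod_signed is a hypothetical name.

```python
def power_mod_signed(a: int, b: int, m: int) -> int:
    """(a^b) % m for any integer b; negative b requires gcd(a, m) == 1."""
    if b < 0:
        a = pow(a, -1, m)  # modular inverse; raises ValueError if it does not exist
        b = -b
    res = 1 % m
    base = a % m
    while b:
        if b & 1:
            res = (res * base) % m
        base = (base * base) % m
        b >>= 1
    return res

print(power_mod_signed(3, -1, 7))  # the inverse of 3 mod 7
print(power_mod_signed(3, -2, 7))  # square of that inverse, reduced mod 7
```

The results can be cross-checked against the built-in pow(3, -1, 7) and pow(3, -2, 7).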

Common Mistakes

Forgetting to reduce mod m at each step: If you only take mod at the end, intermediates can overflow (in other languages) or become huge (slow in Python). Always do res = (res * base) % m and base = (base * base) % m.
Using b - 1 instead of b >>= 1: We must process bits, not decrement b. Use b >>= 1 (or b //= 2).
Wrong order of operations: Check the bit (b & 1) before squaring base and shifting b. Multiply into res when bit is 1, then always square base and shift.

Common Mistake

Computing a^b first and then taking mod m only at the end. For large b, a^b is astronomically large and won’t fit in memory. Always reduce modulo m after every multiplication so that values stay bounded by m.

Pattern Recognition

“Binary decomposition of the exponent” is a recurring pattern: any operation that is associative (like multiplication, matrix multiplication) can be raised to power b in O(log b) “multiplications” by using the binary expansion of b. Same idea underlies matrix exponentiation for Fibonacci and linear recurrences.
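The pattern can be sketched generically: pass in any associative operation and its identity element. The code below is an illustration, not the course's later matrix-exponentiation implementation; power_generic, mat_mul_2x2, and fib are names chosen here.

```python
from typing import Callable, TypeVar

T = TypeVar("T")

def power_generic(x: T, n: int, op: Callable[[T, T], T], identity: T) -> T:
    """Combine n copies of x with an associative op, using O(log n) applications of op."""
    result = identity
    while n:
        if n & 1:
            result = op(result, x)
        x = op(x, x)
        n >>= 1
    return result

def mat_mul_2x2(p, q):
    """Multiply two 2x2 matrices given as nested tuples."""
    (a, b), (c, d) = p
    (e, f), (g, h) = q
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))

def fib(n: int) -> int:
    """n-th Fibonacci number (F(0) = 0, F(1) = 1) via [[1, 1], [1, 0]]^n."""
    m = power_generic(((1, 1), (1, 0)), n, mat_mul_2x2, ((1, 0), (0, 1)))
    return m[0][1]

print(power_generic(3, 13, lambda p, q: p * q, 1))
print(fib(10))
```

Only associativity is used, which is why the same loop works for integers, matrices, and other structures such as string concatenation.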

Interview Insight

When asked “how do you compute a^b mod m efficiently?”, say: “Fast exponentiation using the binary representation of b. Start with res = 1, base = a mod m. For each bit of b from LSB: if the bit is 1, multiply res by base mod m; then square base mod m and shift b right. Time O(log b), space O(1). In Python we can use pow(a, b, m).” If the problem is “a^b without mod,” same algorithm without the % m.

Practice Problems

Implement power_mod(a, b, m) iteratively and verify against pow(a, b, m) for small and large b.
Compute the last k digits of a^b (i.e., a^b mod 10^k) using fast exponentiation.
LeetCode-style: “Count good numbers” or problems that need (base)^(exponent) mod mod—use fast exponentiation.
Later: apply the same “binary exponentiation” idea to matrices (e.g., compute the n-th Fibonacci number in O(log n) using matrix power).

Summary

Fast exponentiation computes a^b in O(log b) multiplications by using the binary expansion of b: a^b = product of a^{2^i} over bits that are 1. Algorithm: res = 1, base = a; for each bit (LSB first): if bit 1, res *= base; base *= base; b //= 2.
Modular exponentiation: Reduce mod m after every multiplication and squaring so intermediates stay in [0, m−1]. Use pow(a, b, m) in Python.
Time O(log b), space O(1) iterative. Same idea extends to matrix exponentiation and other associative operations.
Edge cases: b = 0 → 1; reduce a mod m first; handle a = 0 or m = 1 if needed.

4.7 Modular Arithmetic

Introduction

Modular arithmetic is arithmetic on integers where we only care about the remainder when dividing by a fixed positive integer m (the modulus). Two integers a and b are congruent modulo m, written a ≡ b (mod m), if they leave the same remainder when divided by m—equivalently, if m divides (a − b). Addition, subtraction, and multiplication “work” modulo m: you can reduce before or after the operation and get a consistent result. Division modulo m is different: it requires the modular inverse (next topic). Modular arithmetic is foundational for cryptography, hashing, cyclic structures, and almost every problem that asks for “answer modulo 10⁹+7” in competitive programming.

Real-World Analogy

Think of a clock with 12 hours. If it’s 9 o’clock and you add 5 hours, you get 2 o’clock—not 14. So 9 + 5 ≡ 2 (mod 12). The clock “wraps around” at 12. Similarly, “what day of the week is it 100 days from now?” is modular arithmetic mod 7. The modulus is the size of the cycle; we only care which position we’re in, not how many full cycles have passed.

Formal Definition

Concept Note

For a positive integer m (the modulus), we say a ≡ b (mod m) iff m divides (a − b), i.e., (a − b) is a multiple of m. Equivalently, a and b leave the same remainder when divided by m: a mod m = b mod m.

The residue class of a modulo m is the set of all integers congruent to a mod m. A representative of that class is often chosen in the range [0, m−1]: that’s a mod m (when we define mod to return a value in [0, m−1]).

In code, “a mod m” usually means the remainder when a is divided by m. In Python, a % m returns a value in [0, m−1] when m > 0 (so −17 % 5 = 3, because −17 = (−4)·5 + 3). In C/Java, a % m can be negative when a is negative; the “canonical” representative is then (a % m + m) % m.
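The difference between the two conventions is easy to see side by side. The function c_style_rem below is a hypothetical helper that imitates the truncating remainder of C/Java (result takes the sign of the dividend) so the contrast can be shown in Python; it assumes m ≠ 0.

```python
def c_style_rem(a: int, m: int) -> int:
    """Remainder with truncation toward zero, imitating C/Java's % on integers."""
    r = abs(a) % abs(m)
    return -r if a < 0 else r

def canonical(r: int, m: int) -> int:
    """Map any remainder into [0, m-1] for m > 0."""
    return (r % m + m) % m

python_result = -17 % 5                     # floor-based: already in [0, 4]
c_result = c_style_rem(-17, 5)              # truncation-based: can be negative
print(python_result, c_result, canonical(c_result, 5))
```

After canonicalization both conventions agree on the same representative.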

Why This Topic Matters

Competitive programming and interviews: Problems often ask for the answer “modulo 10⁹+7” or “mod 998244353” to avoid big integers and focus on the algorithm. You must add, subtract, multiply (and sometimes divide) correctly under the modulus.
Cryptography: RSA and many protocols work in modular arithmetic (mod a large composite or prime).
Hashing: Hash tables use hash(key) % capacity. Understanding mod avoids off-by-one and negative-index bugs.
Cyclic behavior: Sequences that repeat (e.g., state machines, periodic signals) are naturally described with modular arithmetic.

Mental Model

Work with numbers as their “remainder when divided by m.” Every integer is equivalent to exactly one value in {0, 1, …, m−1}. When you add or multiply, do the operation and then take the remainder (or take remainders first—see below). So the universe of values is finite: only m “slots,” and everything wraps around.

  Integers mod m:  ... ≡ -2m ≡ -m ≡ 0 ≡ m ≡ 2m ≡ ...  (all same "slot")
                   ... ≡ -m+1 ≡ 1 ≡ m+1 ≡ ...         (another slot)
  We usually pick representatives 0, 1, ..., m-1.

Basic Operations Modulo m

For addition, subtraction, and multiplication, the following hold:

Addition: (a + b) mod m = ((a mod m) + (b mod m)) mod m. So you can reduce a and b first, then add, then reduce again (avoids overflow in other languages).
Subtraction: (a − b) mod m = ((a mod m) − (b mod m) + m) mod m. The +m ensures the result is non-negative when (a mod m) < (b mod m).
Multiplication: (a · b) mod m = ((a mod m) · (b mod m)) mod m. Reduce before multiplying to keep intermediates small.

Example

Mod 7: 5 + 4 = 9 ≡ 2; 5 − 4 = 1; 5 × 4 = 20 ≡ 6. For subtraction with negative result: 3 − 5 mod 7. (3 − 5) = −2 ≡ 5 (mod 7) because −2 + 7 = 5. So (3 mod 7) − (5 mod 7) = 3 − 5 = −2; then (−2 + 7) % 7 = 5.

Division and the Need for Inverses

Division modulo m is not “divide then take remainder.” In integers, we don’t have fractions. Instead, to “divide by b” mod m we multiply by the modular inverse of b: a number b⁻¹ such that b · b⁻¹ ≡ 1 (mod m). Then (a / b) mod m is defined as (a · b⁻¹) mod m. The inverse exists iff gcd(b, m) = 1 (e.g., when m is prime, every non-zero b has an inverse). So:

(a / b) mod m = (a · b⁻¹) mod m, where b⁻¹ is the modular inverse of b (next topic). Never compute (a mod m) / (b mod m) as integers and then take mod—that’s wrong.

Common Mistake

Assuming (a / b) mod m = (a mod m) / (b mod m). Division in modular arithmetic is multiplication by the inverse: (a · b⁻¹) mod m. If you don’t have the inverse, you can’t “divide” in the usual sense.
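A small numeric check makes the mistake visible. In this sketch (function names are illustrative, and pow(b, -1, m) assumes Python 3.8+), 10 / 5 = 2 exactly, so the correct answer mod 7 is 2.

```python
def div_mod_naive(a: int, b: int, m: int) -> int:
    """WRONG: integer-divides the residues, which discards the modular structure."""
    return ((a % m) // (b % m)) % m

def div_mod_correct(a: int, b: int, m: int) -> int:
    """Multiply by the modular inverse of b (requires gcd(b, m) == 1)."""
    return (a % m) * pow(b, -1, m) % m

wrong = div_mod_naive(10, 5, 7)    # residues are 3 and 5, and 3 // 5 is 0
right = div_mod_correct(10, 5, 7)  # 3 * inverse(5), with inverse(5) = 3 mod 7
print(wrong, right)
```

The naive version fails even though 5 divides 10 exactly, because reducing 10 to 3 first destroys the divisibility.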

Congruence Properties

If a ≡ b (mod m) and c ≡ d (mod m), then:

a + c ≡ b + d (mod m)
a − c ≡ b − d (mod m)
a · c ≡ b · d (mod m)
a^k ≡ b^k (mod m) for any non-negative integer k

So we can replace any term with a congruent one before doing operations. That justifies reducing mod m at each step to keep numbers small.

Python Implementation

Addition, Subtraction, and Multiplication

def add_mod(a: int, b: int, m: int) -> int:
    return (a % m + b % m) % m

def sub_mod(a: int, b: int, m: int) -> int:
    return (a % m - b % m + m) % m

def mul_mod(a: int, b: int, m: int) -> int:
    return (a % m * (b % m)) % m

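Adding m in sub_mod keeps the intermediate value non-negative even when a % m < b % m. In Python the final % m would already return a value in [0, m−1] for a negative left operand, so the + m is not strictly required there; it matters in languages such as C, C++, and Java, where % can return a negative result, and it keeps the code portable.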

Power: a^b mod m

# Use fast exponentiation; in Python:
pow(a, b, m)   # computes (a^b) % m efficiently

Negative Numbers and “Canonical” Representative

# In Python, a % m is already in [0, m-1] when m > 0
-17 % 5   # 3

# If you get a from a language where % can be negative:
def to_canonical(a: int, m: int) -> int:
    return (a % m + m) % m

Line-by-Line Explanation (sub_mod)

a % m, b % m — Bring both into [0, m−1].
a % m - b % m — Can be negative (e.g., 3 − 5 = −2).
+ m — Add one modulus so the value is non-negative: it now lies in [1, 2m−1] (e.g., −2 + 7 = 5).
% m — Final reduce back into [0, m−1]. This is needed whenever a % m ≥ b % m, because the sum is then at least m (e.g., 5 − 3 + 7 = 9, and 9 % 7 = 2).

Time and Space Complexity

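Addition, subtraction, multiplication modulo m: O(1) for fixed-size integers. For arbitrary-precision integers, addition and subtraction are linear in the number of bits, O(log a + log b), and schoolbook multiplication is O(log a · log b). Power mod m: O(log b) multiplications using fast exponentiation (previous topic).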

Edge Cases

m = 1: Every integer is ≡ 0 (mod 1). So a mod 1 = 0 for any a. Operations mod 1 always yield 0.
m < 1: Modulus is usually defined as positive. In code, avoid m ≤ 0 or handle explicitly (Python’s % with negative divisor has a different convention).
Negative a: In Python, a % m is in [0, m−1]. So (−17) % 5 = 3. No extra fix needed. In other languages, use (a % m + m) % m.
Large intermediates: Always reduce before and after multiplication so (a * b) % m is computed as ((a % m) * (b % m)) % m to avoid overflow in C++/Java; in Python it’s for speed and clarity.

Common Mistakes

Dividing without inverse: (a / b) mod m must be (a · b⁻¹) mod m. Don’t use integer division.
Subtraction giving negative: (a − b) mod m must be in [0, m−1]. Use (a % m − b % m + m) % m.
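Overflow in multiplication: In C++/Java, (a % m) * (b % m) can overflow a 32-bit int even though both factors are below m. Use a 64-bit type (long long in C++, long in Java) for the product before reducing; that is safe as long as (m−1)² fits in 64 bits (m up to about 3·10⁹ for a signed 64-bit type). For larger m, use a 128-bit type or a custom big-int approach.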
Assuming mod distributes over everything: (a − b) mod m ≠ (a mod m) − (b mod m) when the right-hand side is negative; you must add m. And (a / b) mod m ≠ (a mod m) / (b mod m).

Optimization Insight

In long expressions (e.g., sums of products), reduce after each operation so that every intermediate stays in [0, m−1]. That keeps numbers small and avoids overflow. For (a + b + c) mod m, you can do ((a + b) % m + c) % m; for (a * b * c) mod m, ((a * b) % m * c) % m.
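As an example of reducing at each step in a sum of products, here is a small sketch of a dot product modulo m. The name dot_mod and the default modulus constant are choices made for this illustration.

```python
MOD = 10**9 + 7

def dot_mod(xs: list[int], ys: list[int], m: int = MOD) -> int:
    """Sum of xs[i] * ys[i], reduced mod m after every multiply-add."""
    total = 0
    for x, y in zip(xs, ys):
        # Reduce operands first, then the running sum, so total stays in [0, m-1].
        total = (total + (x % m) * (y % m)) % m
    return total

print(dot_mod([1, 2, 3], [4, 5, 6]))
print(dot_mod([10**18, -1], [10**18, 1]))
```

In Python the stepwise reduction is mainly about speed; in C++ or Java the same structure is what prevents overflow.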

Pattern Recognition

Whenever the problem says “output modulo 10⁹+7” (or similar), all arithmetic in your solution should be done mod m. Counts, sums, products—reduce at each step. If you need to “divide” (e.g., divide by 2 or by n!), use the modular inverse. The pattern is: work in the ring of integers mod m; addition, subtraction, multiplication are safe; division is multiply by inverse.

Interview Insight

When the problem asks for “answer mod 10⁹+7,” say: “I’ll do all arithmetic modulo m. For addition and multiplication I’ll reduce after each step. For subtraction I’ll use (a - b + m) % m to keep the result non-negative. For division I’ll use the modular inverse (e.g., pow(b, -1, m) in Python or extended GCD).” Mention that a ≡ b (mod m) means (a − b) is divisible by m, and that we work with representatives in [0, m−1].

Practice Problems

Implement add_mod, sub_mod, mul_mod and test with negative numbers and large values.
Compute (a + b + c) mod m and (a · b · c) mod m by reducing at each step.
Given a, b, m, compute (a − b) mod m correctly when a < b.
Problems that ask for “number of ways mod 10⁹+7”: use modular arithmetic throughout; when you need to divide by k!, compute k! mod m and then multiply by its inverse mod m.

Summary

a ≡ b (mod m) iff m | (a − b); equivalently, same remainder when divided by m. We usually work with representatives in [0, m−1].
Addition/subtraction/multiplication: (a ± b) mod m and (a · b) mod m can be computed by reducing operands first, then doing the operation, then reducing. For subtraction use (a % m − b % m + m) % m to keep result in [0, m−1].
Division: (a / b) mod m = (a · b⁻¹) mod m; b⁻¹ is the modular inverse (exists when gcd(b, m) = 1). Never use integer division.
Congruence is preserved under +, −, ·, and powers. Reduce at each step to avoid overflow and keep intermediates small. Use pow(a, b, m) for a^b mod m.

4.8 Modular Inverse

Introduction

The modular inverse of an integer a modulo m is an integer x such that a · x ≡ 1 (mod m). We write x ≡ a⁻¹ (mod m). It lets us “divide” by a in modular arithmetic: (b / a) mod m = (b · a⁻¹) mod m. The inverse exists if and only if gcd(a, m) = 1 (a and m are coprime). When it exists, it is unique modulo m. This section covers when and why the inverse exists, two ways to compute it (Extended Euclidean and Fermat’s little theorem when m is prime), and how to use it in code.

Real-World Analogy

In normal arithmetic, the inverse of 3 is 1/3 because 3 × (1/3) = 1. Modulo m we only have integers, so we can’t use fractions. The modular inverse is the integer that “plays the role” of 1/a: when you multiply a by it, you get 1 (mod m). For example, mod 7 we have 3 × 5 = 15 ≡ 1 (mod 7), so 5 is the inverse of 3 mod 7. “Dividing by 3” mod 7 means multiplying by 5.

Formal Definition

Concept Note

Modular inverse: For integers a and m with m > 0, a⁻¹ (mod m) is an integer x in [0, m−1] such that a · x ≡ 1 (mod m). Such an x exists iff gcd(a, m) = 1, and when it exists it is unique modulo m (all solutions are x + k·m for integer k).

Why gcd(a, m) = 1? The equation a·x ≡ 1 (mod m) means a·x + m·y = 1 for some integer y. By Bézout’s identity, this has a solution iff gcd(a, m) divides 1, i.e., gcd(a, m) = 1.

Why This Topic Matters

Division mod m: To compute (a / b) mod m you need b⁻¹ mod m, then (a · b⁻¹) mod m. Essential whenever the problem asks for “answer mod 10⁹+7” and your formula involves division (e.g., n! / (k! (n−k)!) for combinations).
Combinatorics: nCr mod p, Catalan numbers, partition counts—many use factorials and require dividing by factorials mod p. Precompute factorials and inverse factorials mod p using the inverse.
Cryptography: RSA decryption and many protocols use the modular inverse.
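The factorial-table idea mentioned above might look like the following sketch. It assumes p is prime and n < p (so n! is invertible mod p) and uses Fermat's little theorem for the single inverse; the names build_factorials and ncr_mod are illustrative.

```python
MOD = 10**9 + 7  # a prime modulus (assumed)

def build_factorials(n: int, p: int = MOD) -> tuple[list[int], list[int]]:
    """Return (fact, inv_fact) with fact[i] = i! mod p and inv_fact[i] = (i!)^(-1) mod p."""
    fact = [1] * (n + 1)
    for i in range(1, n + 1):
        fact[i] = fact[i - 1] * i % p
    inv_fact = [1] * (n + 1)
    inv_fact[n] = pow(fact[n], p - 2, p)      # one modular inverse via Fermat
    for i in range(n, 0, -1):
        inv_fact[i - 1] = inv_fact[i] * i % p  # (i-1)!^(-1) = i!^(-1) * i
    return fact, inv_fact

def ncr_mod(n: int, k: int, fact: list[int], inv_fact: list[int], p: int = MOD) -> int:
    """n choose k modulo p, using the precomputed tables."""
    if k < 0 or k > n:
        return 0
    return fact[n] * inv_fact[k] % p * inv_fact[n - k] % p

fact, inv_fact = build_factorials(20)
print(ncr_mod(5, 2, fact, inv_fact), ncr_mod(20, 10, fact, inv_fact))
```

Building the inverse table backward needs only one exponentiation, after which each query is O(1).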

When the Inverse Exists

The inverse of a mod m exists iff gcd(a, m) = 1. So:

When m is prime, every a with 1 ≤ a ≤ m−1 has an inverse (since gcd(a, m) = 1). 0 has no inverse.
When m is composite, a has an inverse iff a and m are coprime. For example mod 10, 3 has inverse 7 (3·7=21≡1); 2 has no inverse because gcd(2, 10) = 2 ≠ 1.
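The existence condition can be checked by brute force for small moduli. This sketch lists the residues that have an inverse; invertible_residues is a name chosen for the example.

```python
from math import gcd

def invertible_residues(m: int) -> list[int]:
    """Residues a in [1, m-1] that have an inverse mod m, i.e. gcd(a, m) == 1."""
    return [a for a in range(1, m) if gcd(a, m) == 1]

units_10 = invertible_residues(10)
units_7 = invertible_residues(7)
print(units_10)
print(units_7)
```

For the prime modulus 7 every non-zero residue appears; for 10 only the residues coprime to 10 do.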

Two Methods to Compute the Inverse

Method 1: Extended Euclidean Algorithm

Solve a·x + m·y = 1. The coefficient x is a modular inverse of a mod m. Run extended_gcd(a, m); if g ≠ 1, no inverse. Otherwise reduce x to [0, m−1]: inv = x % m or inv = (x % m + m) % m if x might be negative. This works for any m and any a with gcd(a, m) = 1. Time O(log min(a, m)).

Method 2: Fermat’s Little Theorem (When m is Prime)

If m is prime and a is not divisible by m, then a^(m−1) ≡ 1 (mod m). So a · a^(m−2) ≡ 1 (mod m), hence a⁻¹ ≡ a^(m−2) (mod m). Compute a^(m−2) mod m with fast exponentiation. Time O(log m). Only applies when m is prime.

Example

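Inverse of 3 mod 7. Extended GCD: solve 3·x + 7·y = 1. extended_gcd(3, 7) from topic 4.5 returns (1, −2, 1), and −2 mod 7 = 5, so the inverse is 5 (equivalently, 3·5 + 7·(−2) = 1). Check: 3·5 = 15 ≡ 1 (mod 7). Or by Fermat (m = 7 prime): 3⁻¹ ≡ 3⁵ = 243 ≡ 5 (mod 7).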

Python Implementation

Using Extended GCD (Works for Any m)

def mod_inverse_gcd(a: int, m: int) -> int | None:
    """Returns a^(-1) mod m if gcd(a, m) = 1, else None."""
    g, x, _ = extended_gcd(a % m, m)
    if g != 1:
        return None
    return (x % m + m) % m

Assume extended_gcd is from topic 4.5. We take a % m so we work with a in [0, m−1]; then (x % m + m) % m puts the inverse in [0, m−1].

Using Fermat (When m is Prime)

def mod_inverse_fermat(a: int, m: int) -> int | None:
    """Returns a^(-1) mod m using a^(m-2). Only when m is prime."""
    a %= m
    if a == 0:
        return None
    return pow(a, m - 2, m)

Using Python’s Built-in (3.8+)

# When gcd(a, m) = 1:
pow(a, -1, m)   # returns a^(-1) mod m
# Raises ValueError if gcd(a, m) != 1

In practice use pow(a, -1, m) when you know a and m are coprime (e.g., m prime and a not divisible by m).

Line-by-Line Explanation (mod_inverse_gcd)

a % m — Work with a in [0, m−1]; gcd and inverse are unchanged.
extended_gcd(a % m, m) — Get (g, x, y) with (a mod m)·x + m·y = g.
if g != 1: return None — Inverse exists only when gcd is 1.
(x % m + m) % m — x might be negative; this gives the unique representative in [0, m−1].

Time and Space Complexity

Extended GCD method: O(log min(a, m)). Fermat (m prime): O(log m) for pow(a, m−2, m). Space O(1) for both iterative implementations.

Edge Cases

a = 0: 0 has no inverse (0·x ≡ 0 ≢ 1 for any x). Return None or handle before calling.
gcd(a, m) ≠ 1: No inverse. extended_gcd returns g > 1; return None. pow(a, -1, m) raises ValueError.
m = 1: Every number is ≡ 0 (mod 1); “inverse” isn’t useful. Usually m ≥ 2 in practice.
Negative a: Reduce a to [0, m−1] first with a % m; the inverse of (a mod m) is the same as the inverse of a mod m.

Common Mistakes

Not checking gcd: If you assume the inverse exists and it doesn’t, you’ll get wrong results or a crash. Always check g == 1 (or catch ValueError for pow).
Returning negative x: The inverse should be in [0, m−1]. Use (x % m + m) % m after extended GCD.
Using Fermat when m is not prime: a⁻¹ ≡ a^(m−2) is only guaranteed when m is prime. For composite m use extended GCD.

Common Mistake

Using Fermat’s little theorem (a^(m−2)) to compute the inverse when m is composite. The identity a^(m−1) ≡ 1 (mod m) is guaranteed for every a coprime to m only when m is prime. The common modulus 10⁹+7 is prime, so Fermat is fine there; but if the problem uses a composite modulus, you must use the extended Euclidean algorithm.
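A concrete failure case helps here. The sketch below applies the Fermat formula to a prime and to a composite modulus and compares against the true inverse; fermat_candidate is a hypothetical helper name.

```python
def fermat_candidate(a: int, m: int) -> int:
    """a^(m-2) mod m: a true inverse of a only when m is prime and a % m != 0."""
    return pow(a, m - 2, m)

inv7 = fermat_candidate(3, 7)     # prime modulus: this is the real inverse
bad10 = fermat_candidate(3, 10)   # composite modulus: not an inverse
good10 = pow(3, -1, 10)           # extended-GCD-based built-in (Python 3.8+)

print(inv7, 3 * inv7 % 7)
print(bad10, 3 * bad10 % 10)
print(good10, 3 * good10 % 10)
```

Only the first and third lines end in 1; the Fermat candidate for m = 10 does not satisfy 3·x ≡ 1 (mod 10).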

Application: (a / b) mod m

To compute (a / b) mod m when gcd(b, m) = 1: find inv = b⁻¹ mod m, then return (a % m * inv) % m. So “division” is multiply by the inverse.

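def div_mod(a: int, b: int, m: int) -> int | None:
    """Returns (a / b) mod m, or None if b has no inverse mod m."""
    try:
        inv = pow(b, -1, m)  # raises ValueError when gcd(b, m) != 1
    except ValueError:
        return None          # alternative: mod_inverse_gcd(b, m), which returns None itself
    return (a % m * inv) % m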

Interview Insight

When asked “how do you compute the modular inverse?”, say: “If gcd(a, m) = 1, the inverse exists. Two ways: (1) Extended Euclidean algorithm: solve a·x + m·y = 1, then x mod m is the inverse. (2) When m is prime, by Fermat’s little theorem a⁻¹ ≡ a^(m−2) (mod m), so we can use pow(a, m-2, m). In Python 3.8+ we use pow(a, -1, m). Always check that the inverse exists when m is composite.”

Practice Problems

Implement mod_inverse using extended GCD and (when m is prime) using Fermat; verify a · inv ≡ 1 (mod m).
Compute nCr mod p (p prime): nCr = n! / (r! (n−r)!); precompute factorials and inverse factorials mod p, then combine.
Given a, b, m, compute (a / b) mod m when possible; return a sentinel or raise when b has no inverse.

Summary

The modular inverse of a mod m is x with a·x ≡ 1 (mod m). It exists iff gcd(a, m) = 1; when it exists it is unique in [0, m−1].
Extended GCD: Solve a·x + m·y = 1; x mod m (adjusted for negative) is the inverse. Works for any m. Time O(log min(a, m)).
Fermat (m prime): a⁻¹ ≡ a^(m−2) (mod m). Compute with pow(a, m−2, m). Only when m is prime.
Use pow(a, -1, m) in Python 3.8+ when gcd(a, m) = 1. (a / b) mod m = (a · b⁻¹) mod m.

4.9 Combinatorics

Introduction

Combinatorics is the study of counting: the number of ways to arrange, select, or partition objects under given rules. In DSA and competitive programming you constantly need “how many ways …?” (paths, subsets, arrangements, valid configurations). This section covers the foundational counting principles—the sum rule and product rule—and the role of factorials in counting. The next topic (4.10) gives the exact formulas for permutations and combinations; here we build the mindset and the modular tools (factorials and inverse factorials mod m) needed to compute those efficiently.

Real-World Analogy

Imagine choosing an outfit: 3 shirts and 4 pants. Any shirt can pair with any pant, so total outfits = 3 × 4 = 12. That’s the product rule: when one choice doesn’t affect the other, multiply the number of options. Now imagine you can either wear a hat (2 choices) or no hat (1 choice), but not both—total hat options = 2 + 1 = 3. That’s the sum rule: when choices are mutually exclusive, add. Most counting problems combine these two rules.

Formal Definition

Concept Note

Sum rule: If a task can be done in one of n₁ ways, or n₂ ways, …, or n_k ways, and these options are mutually exclusive, then the task can be done in n₁ + n₂ + … + n_k ways.

Product rule: If a task is done in a sequence of steps: step 1 in n₁ ways, step 2 in n₂ ways (regardless of step 1), …, step k in n_k ways, then the task can be done in n₁ × n₂ × … × n_k ways.

Many problems require breaking the count into cases (sum rule) and then counting each case by a sequence of choices (product rule).

Why This Topic Matters

“Number of ways” problems: Count paths, subsets, valid sequences, placements—all use combinatorial reasoning.
Permutations and combinations: The next topic gives P(n,r) and C(n,r); both rely on factorials and on the product/sum rules for derivation.
Modular counting: Problems often ask for the answer “mod 10⁹+7.” You need factorials and inverse factorials mod m (from the previous topics) to compute nCr and nPr without overflow.

Mental Model

Ask: “Is this a sequence of independent choices?” → product rule (multiply). “Is this one of several disjoint cases?” → sum rule (add). Often you combine: “For each case i, count the ways (product rule); then sum over cases.” Also: “Order matters” vs “order doesn’t matter” will distinguish permutations from combinations (topic 4.10).

Factorials and Growth

The factorial of a non-negative integer n is n! = n × (n−1) × … × 1, with 0! = 1. It counts the number of ways to arrange n distinct objects in a line (permutations of n objects). Factorials grow very fast: 10! ≈ 3.6×10⁶, 20! ≈ 2.4×10¹⁸. So we almost always work modulo m (e.g., 10⁹+7) when n is large.

def factorial(n: int, m: int | None = None) -> int:
    """n! or n! mod m if m given."""
    if n < 0:
        return 0  # or raise
    res = 1
    for i in range(2, n + 1):
        res *= i
        if m is not None:
            res %= m
    return res

For repeated use (e.g., many nCr queries), precompute factorials 0! .. N! mod m in an array—O(N) time once, then O(1) per lookup.

Precomputing Factorials and Inverse Factorials Mod m

To compute C(n, k) = n! / (k! (n−k)!) mod m (m prime), we need n!, k!, (n−k)! mod m; instead of dividing by k! (n−k)!, we multiply by its modular inverse. Precompute:

fact[i] = i! mod m for i = 0..N
inv_fact[i] = (i!)⁻¹ mod m (inverse factorial)

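Then nCr mod m = fact[n] × inv_fact[k] × inv_fact[n−k] mod m. Building inv_fact: inv_fact[N] = pow(fact[N], -1, m), then inv_fact[i] = inv_fact[i+1] × (i+1) mod m (so inv_fact[i] = 1/(i!) mod m). This needs N < m: for N ≥ m the prime m divides N!, so fact[N] ≡ 0 has no inverse and pow raises ValueError.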

def precompute_factorials(n: int, m: int) -> tuple[list[int], list[int]]:
    fact = [1] * (n + 1)
    for i in range(1, n + 1):
        fact[i] = (fact[i - 1] * i) % m
    inv_fact = [1] * (n + 1)
    inv_fact[n] = pow(fact[n], -1, m)
    for i in range(n - 1, -1, -1):
        inv_fact[i] = (inv_fact[i + 1] * (i + 1)) % m
    return fact, inv_fact

Then C(n, k) mod m = fact[n] * inv_fact[k] % m * inv_fact[n - k] % m (for 0 ≤ k ≤ n). This is the standard setup for combinatorics mod m.
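To see the whole setup working end to end, here is a self-contained sketch that rebuilds the two tables and checks every C(n, k) for n ≤ 50 against math.comb. The names build_tables, comb_mod and MOD are illustrative only, and the sketch assumes a prime modulus larger than the table size.

```python
import math

MOD = 1_000_000_007  # prime modulus, as assumed throughout this section

def build_tables(limit: int, m: int) -> tuple[list[int], list[int]]:
    """fact[i] = i! mod m and inv_fact[i] = (i!)^(-1) mod m for 0 <= i <= limit < m."""
    fact = [1] * (limit + 1)
    for i in range(1, limit + 1):
        fact[i] = fact[i - 1] * i % m
    inv_fact = [1] * (limit + 1)
    inv_fact[limit] = pow(fact[limit], -1, m)    # the only modular inverse computed
    for i in range(limit - 1, -1, -1):
        inv_fact[i] = inv_fact[i + 1] * (i + 1) % m
    return fact, inv_fact

def comb_mod(n: int, k: int, fact: list[int], inv_fact: list[int], m: int) -> int:
    if k < 0 or k > n:
        return 0
    return fact[n] * inv_fact[k] % m * inv_fact[n - k] % m

fact, inv_fact = build_tables(50, MOD)
for n in range(51):
    for k in range(n + 1):
        assert comb_mod(n, k, fact, inv_fact, MOD) == math.comb(n, k) % MOD
```

Checking against exact values on a small range is a cheap way to catch off-by-one errors in the inverse-factorial loop before using the tables at full size.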

Sum Rule and Product Rule: Examples

Product rule

Number of k-digit strings over an alphabet of size d (repetition allowed): each of k positions has d choices → d^k. Number of ways to order n distinct items: n choices for first, n−1 for second, … → n!.

Sum rule

Number of ways to get a sum of 7 with two dice: (1,6), (2,5), (3,4), (4,3), (5,2), (6,1) → 6 ways. Each outcome is mutually exclusive, so we add.

Combined

Count 5-letter words that start with a vowel (a,e,i,o,u) or end with a consonant. Cases: (1) start vowel, end consonant; (2) start vowel, end vowel; (3) start consonant, end consonant. Each case uses the product rule; the three cases are disjoint, so add the three counts.
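The combined example can be checked in code. The sketch below counts by the three disjoint cases, cross-checks against brute-force enumeration for length 3, and against the complement for length 5. It assumes a 26-letter lowercase alphabet with exactly five vowels; all names are hypothetical.

```python
from itertools import product
from string import ascii_lowercase

VOWELS = set("aeiou")
NUM_VOWELS, NUM_CONSONANTS, ALPHABET = 5, 21, 26

def count_by_cases(length: int) -> int:
    """Words that start with a vowel OR end with a consonant (length >= 2)."""
    middle = ALPHABET ** (length - 2)                  # free middle letters (product rule)
    case1 = NUM_VOWELS * middle * NUM_CONSONANTS       # vowel ... consonant
    case2 = NUM_VOWELS * middle * NUM_VOWELS           # vowel ... vowel
    case3 = NUM_CONSONANTS * middle * NUM_CONSONANTS   # consonant ... consonant
    return case1 + case2 + case3                       # disjoint cases (sum rule)

def count_brute_force(length: int) -> int:
    return sum(
        1
        for w in product(ascii_lowercase, repeat=length)
        if w[0] in VOWELS or w[-1] not in VOWELS
    )

assert count_by_cases(3) == count_brute_force(3)       # 26^3 words, cheap to enumerate
# Complement check: all words minus (consonant start AND vowel end).
assert count_by_cases(5) == ALPHABET**5 - NUM_CONSONANTS * ALPHABET**3 * NUM_VOWELS
print(count_by_cases(5))
```

For 5-letter words this gives 10,035,896. The complement check is itself an application of the sum rule: the excluded case (consonant start, vowel end) is disjoint from the three counted cases.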

Time and Space Complexity

Single factorial mod m: O(n). Precompute factorials 0..N: O(N) time, O(N) space. Each nCr query after precomputation: O(1). So for many nCr queries (e.g., up to N), precomputation is essential.

Edge Cases

n < 0 or k < 0: Factorial and C(n,k) are usually defined for n, k ≥ 0. Return 0 or handle as invalid.
k > n: C(n, k) = 0 (no way to choose k items from n). So check 0 ≤ k ≤ n before computing.
m = 1: Everything mod 1 is 0; factorials and nCr mod 1 are 0. Usually m is a prime > 1 (e.g., 10⁹+7).

Common Mistakes

Confusing sum and product: If choices are independent (one doesn’t restrict the other), multiply. If they’re mutually exclusive (either A or B), add.
Computing n! without mod for large n: n! overflows or becomes huge. Always work mod m when the problem says “mod 10⁹+7.”
Computing nCr as fact[n] // (fact[k] * fact[n-k]): Integer division is wrong in modular arithmetic. Use fact[n] * inv_fact[k] * inv_fact[n-k] mod m.

Common Mistake

Using integer division for nCr mod m: (fact[n] // fact[k]) // fact[n-k] is wrong—division in integers doesn’t preserve congruence. You must multiply by the modular inverse of the denominator (or use precomputed inverse factorials).
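A tiny numeric case shows how badly integer division fails once values have been reduced. This sketch uses the small prime m = 13 purely for illustration and compares both approaches with the exact value from math.comb.

```python
import math

m = 13                       # small prime modulus chosen for illustration
n, k = 10, 3
fact = [math.factorial(i) % m for i in range(n + 1)]    # factorials reduced mod m

expected = math.comb(n, k) % m                           # exact value, then reduce

wrong = (fact[n] // (fact[k] * fact[n - k])) % m         # integer division on residues
right = fact[n] * pow(fact[k], -1, m) % m * pow(fact[n - k], -1, m) % m

print(expected, wrong, right)
```

Here C(10, 3) = 120 ≡ 3 (mod 13). The inverse-based formula returns 3, while integer division on the residues returns 0, because a residue such as 10! mod 13 carries no information about divisibility of the original number.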

Pattern Recognition

“In how many ways …?” usually means: identify disjoint cases (sum rule) and/or a sequence of choices (product rule). When the count involves “choose k from n” or “arrange n items,” use permutations and combinations (next topic); the formulas are built from factorials and the rules above.

Interview Insight

When the problem asks for “number of ways mod 10⁹+7,” say: “I’ll use the sum and product rules to break the count into cases and choices. For formulas that need nCr or nPr I’ll precompute factorials and inverse factorials mod m in O(N), then each query is O(1). I won’t use integer division for nCr—I’ll use inverse factorials.”

Practice Problems

Precompute fact and inv_fact for n up to 10⁶ mod 10⁹+7; implement nCr(n, k, fact, inv_fact, m) and test.
Count the number of k-digit numbers with no leading zero: first digit 1..9, remaining k−1 digits 0..9 → 9 × 10^(k−1) (product rule).
Solve a problem that asks for “number of ways” mod 10⁹+7 using sum/product rules and (when needed) nCr from precomputed factorials.

Summary

Sum rule: Mutually exclusive options → add the counts. Product rule: Sequence of independent choices → multiply the counts.
Factorial: n! = n×(n−1)×…×1, 0! = 1; grows fast—work mod m for large n. Precompute fact[0..N] and inv_fact[0..N] mod m for O(1) nCr/nPr.
nCr mod m = fact[n] × inv_fact[k] × inv_fact[n−k] mod m (for 0 ≤ k ≤ n). Never use integer division; use modular inverses.
Combinatorics problems: identify cases (sum) and choices (product); then plug in permutations/combinations (next topic) when order or selection is involved.

4.10 Permutations & Combinations

Introduction

A permutation is an arrangement of objects in a specific order; a combination is a selection of objects where order does not matter. “How many ways to arrange 3 books from 5?” is P(5, 3) = 60. “How many ways to choose 3 books from 5?” is C(5, 3) = 10. Permutations and combinations are the core counting tools in DSA: subsets, teams, passwords, placements. This section gives the definitions, formulas (with and without repetition), efficient computation mod m using precomputed factorials (from 4.9), and common identities.

Real-World Analogy

Permutation: Picking 3 people for president, vice-president, secretary—order matters (Alice–Bob–Carol is different from Bob–Alice–Carol). So we count arrangements. Combination: Picking 3 people for a committee—order doesn’t matter (the same three people are one committee). So we count subsets. Same “choose 3 from n,” but permutation counts orderings (more), combination counts subsets (fewer); and P(n, 3) = C(n, 3) × 3!.

Formal Definition

Concept Note

Permutation (without repetition): P(n, r) or P_n,r = number of ways to arrange r distinct objects chosen from n distinct objects. Order matters. Formula: P(n, r) = n! / (n−r)! = n × (n−1) × … × (n−r+1).

Combination (without repetition): C(n, r), also written ⁿC_r or (n choose r) = number of ways to choose r objects from n distinct objects. Order does not matter. Formula: C(n, r) = n! / (r! (n−r)!).

By convention, P(n, 0) = C(n, 0) = 1 (one way to arrange or choose nothing). For r > n, P(n, r) = C(n, r) = 0.

Why This Topic Matters

Subset and selection problems: “How many subsets of size k?” → C(n, k). “How many ways to assign k distinct roles from n people?” → P(n, k).
Counting paths, sequences, and configurations: Many problems decompose into “choose positions” (combination) and “assign values” (permutation or product rule).
Competitive programming: nCr and nPr mod 10⁹+7 appear constantly; you need the formulas and fast implementation with precomputed factorials.

Relation Between P and C

To arrange r objects chosen from n: first choose the r objects (C(n, r) ways), then order them (r! ways). So P(n, r) = C(n, r) × r!. Hence C(n, r) = P(n, r) / r! = n! / (r! (n−r)!).

Key Identities

Symmetry: C(n, k) = C(n, n−k). Choosing k is the same as “leaving out” n−k.
Pascal’s identity: C(n, k) = C(n−1, k−1) + C(n−1, k). Either the first element is in the subset (C(n−1, k−1)) or it isn’t (C(n−1, k)). This gives a recurrence to build Pascal’s triangle.
Sum of row: C(n, 0) + C(n, 1) + … + C(n, n) = 2ⁿ (total subsets of an n-element set).
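The three identities can be verified together by building Pascal’s triangle from the recurrence alone. The following is a minimal sketch; pascal_rows is a hypothetical helper name.

```python
import math

def pascal_rows(n_max: int) -> list[list[int]]:
    """rows[n][k] = C(n, k), built only from C(n, k) = C(n-1, k-1) + C(n-1, k)."""
    rows = [[1]]
    for n in range(1, n_max + 1):
        prev = rows[-1]
        row = [1] + [prev[k - 1] + prev[k] for k in range(1, n)] + [1]
        rows.append(row)
    return rows

rows = pascal_rows(10)
for n, row in enumerate(rows):
    assert row == row[::-1]                               # symmetry: C(n, k) = C(n, n-k)
    assert sum(row) == 2 ** n                             # row sum: 2^n subsets
    assert row == [math.comb(n, k) for k in range(n + 1)]
```

Because the recurrence uses only addition, reducing each sum mod m yields C(n, k) mod m for any modulus, including composite ones, at O(n²) cost for the whole table.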

With Repetition (Brief)

Permutations with repetition: Arrange r objects from n types, each type available unlimited times. Each of r positions has n choices → n^r.

Combinations with repetition (multiset choice): Choose r objects from n types (unlimited of each). Formula: C(n+r−1, r) = C(n+r−1, n−1) (“stars and bars”). Example: number of ways to distribute r identical candies to n children = C(n+r−1, r).
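The stars-and-bars formula is easy to sanity-check against direct enumeration for small inputs. A short sketch, with multiset_count as a hypothetical name:

```python
import math
from itertools import combinations_with_replacement

def multiset_count(n_types: int, r: int) -> int:
    """Stars and bars: number of size-r multisets drawn from n_types types."""
    return math.comb(n_types + r - 1, r)

for n_types in range(1, 6):
    for r in range(0, 6):
        brute = sum(1 for _ in combinations_with_replacement(range(n_types), r))
        assert brute == multiset_count(n_types, r)
```

Each multiset corresponds to one way of handing r identical candies to n_types children, which is why the same formula answers the distribution question.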

Python Implementation

Assume we have precomputed fact and inv_fact (from topic 4.9) for indices 0..N mod m.

nCr (combinations)

def nCr(n: int, k: int, fact: list[int], inv_fact: list[int], m: int) -> int:
    if k < 0 or k > n:
        return 0
    return fact[n] * inv_fact[k] % m * inv_fact[n - k] % m

nPr (permutations)

def nPr(n: int, r: int, fact: list[int], inv_fact: list[int], m: int) -> int:
    if r < 0 or r > n:
        return 0
    return fact[n] * inv_fact[n - r] % m

nPr(n, r) = n! / (n−r)! = fact[n] × inv_fact[n−r]. No need for inv_fact[r] unless you derive from nCr × r!.

Line-by-Line Explanation (nCr)

if k < 0 or k > n: return 0 — No way to choose k from n when k > n or k < 0. C(n, 0) = 1 is handled by the formula (inv_fact[0] = 1).
fact[n] * inv_fact[k] % m * inv_fact[n - k] % m — n! / (k! (n−k)!) mod m. Multiply by inverse factorials instead of dividing. Reduce mod m after each multiplication to keep intermediates bounded.

Time and Space Complexity

With precomputed fact and inv_fact: O(1) per nCr or nPr query. Precomputation is O(N) time and O(N) space (topic 4.9). Without precomputation, computing one nCr with a loop (multiply and divide) is O(min(k, n−k)); for many queries, precomputation is better.
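For a one-off query without tables, the loop mentioned above might look like the sketch below. It computes the exact integer C(n, k); comb_exact is a hypothetical name, and reducing mod m would be done on the final result (or with modular inverses inside the loop when m is prime).

```python
def comb_exact(n: int, k: int) -> int:
    """Exact C(n, k) using O(min(k, n-k)) multiplications, no factorial tables."""
    if k < 0 or k > n:
        return 0
    k = min(k, n - k)                 # symmetry: C(n, k) = C(n, n-k)
    result = 1
    for i in range(1, k + 1):
        # After this step result == C(n-k+i, i), so the division is always exact.
        result = result * (n - k + i) // i
    return result
```

Multiplying before dividing keeps every intermediate value an integer, so floor division here is safe, unlike integer division on values already reduced mod m.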

Edge Cases

k > n or k < 0: C(n, k) = 0. Return 0.
k = 0 or k = n: C(n, 0) = C(n, n) = 1. The formula gives fact[n] * inv_fact[0] * inv_fact[n] = 1 when inv_fact[0] = 1 and inv_fact[n] = 1/fact[n].
r > n for nPr: P(n, r) = 0. Return 0.

Common Mistakes

Using P when order doesn’t matter: “Choose a committee of 3” → C(n, 3), not P(n, 3). P counts orderings.
Using C when order matters: “Rank top 3” or “assign 3 distinct roles” → P(n, 3).
Off-by-one in formulas: P(n, r) has r factors: n, n−1, …, n−r+1. So the last factor is (n−r+1), not (n−r).
Integer division for nCr: Use inverse factorials (or modular inverse) when working mod m; never integer division.

Common Mistake

Using P(n, r) when the problem says “choose” or “select” and order doesn’t matter. That overcounts by r!. Use C(n, r). Conversely, using C(n, r) when positions or order are distinct (e.g., “first place, second place, third place”) undercounts—use P(n, r).

Pattern Recognition

Ask: “Does order matter?” Yes → permutation (arrange, rank, assign distinct roles). No → combination (choose, committee, subset). Then check: repetition allowed? If yes, use n^r (permutation) or stars-and-bars (combination with repetition). If no, use P(n, r) or C(n, r).

Interview Insight

When the problem involves “number of ways to choose/arrange,” clarify: “Order matters → permutation P(n,r) = n!/(n−r)!. Order doesn’t matter → combination C(n,r) = n!/(r!(n−r)!). I’ll precompute factorials and inverse factorials mod m so each nCr/nPr is O(1).” Mention the symmetry C(n,k) = C(n,n−k): it roughly halves the work when you compute a single C(n,k) with a loop and k > n/2 (it makes no difference with precomputed tables).

Practice Problems

Implement nCr and nPr with precomputed fact/inv_fact; verify against small values (e.g., C(5,2)=10, P(5,2)=20).
Count subsets of size k: C(n, k). Count permutations of n objects: n!. Count ways to place k distinct items in n distinct positions (each at most one): P(n, k).
LeetCode-style: “Unique paths” (grid with C(n+m-2, n-1)), “combinations” (enumerate C(n,k)), or “number of ways” mod 10⁹+7 using nCr.

Summary

Permutation P(n, r) = n!/(n−r)!: arrange r distinct objects from n; order matters. Combination C(n, r) = n!/(r!(n−r)!): choose r from n; order doesn’t matter. P(n, r) = C(n, r) × r!.
Key identities: C(n, k) = C(n, n−k); C(n, k) = C(n−1, k−1) + C(n−1, k); sum of C(n,0)..C(n,n) = 2ⁿ.
With repetition: permutations n^r; combinations (stars and bars) C(n+r−1, r).
Implement nCr/nPr mod m with precomputed fact and inv_fact; O(1) per query. Edge cases: k > n or k < 0 → 0.

4.11 Probability Basics

Introduction

Probability in DSA appears in randomized algorithms, expected-value analysis, and problems that ask “what is the probability that …?” or “expected number of steps?”. You don’t need measure theory—only discrete probability: sample space, events, and the rule P(event) = number of favorable outcomes / number of total outcomes when outcomes are equally likely. This section covers basic definitions, the addition and multiplication rules, complement, independence, and expected value (including linearity). Enough to reason about probability in interviews and to tie counting (combinatorics) to probability.

Real-World Analogy

Roll a fair die. The sample space is {1, 2, 3, 4, 5, 6}. The probability of rolling a 4 is 1/6 (one favorable outcome out of six). The probability of rolling an even number is 3/6 = 1/2 (outcomes 2, 4, 6). “Probability” here is just “favorable count / total count” when every outcome is equally likely. Same idea in algorithms: “probability a random permutation has property P” = (number of permutations with P) / (total permutations).

Formal Definition (Discrete, Equally Likely)

Concept Note

Sample space Ω: the set of all possible outcomes. An event E is a subset of Ω. When all outcomes are equally likely, the probability of event E is P(E) = |E| / |Ω| (number of outcomes in E divided by total outcomes). So probability reduces to counting: count favorable outcomes and total outcomes (often using permutations and combinations).

We have 0 ≤ P(E) ≤ 1. P(Ω) = 1 (something must happen). P(∅) = 0.

Why This Topic Matters

Expected value: “Expected number of comparisons in quicksort,” “expected steps until random walk hits a state”—linearity of expectation is used everywhere.
Randomized algorithms: “With probability at least 1/2 the algorithm succeeds”—you need to bound or compute such probabilities.
Interview problems: “Probability that two people share a birthday,” “expected value of …”—formulate as counting (favorable / total) or as expectation.

Key Rules

Complement

P(not E) = 1 − P(E). So “probability at least one” = 1 − “probability none.”

Addition (Disjoint Events)

If events A and B cannot happen together (disjoint), then P(A or B) = P(A) + P(B). For more than two disjoint events, add all.

Multiplication (Independent Events)

If A and B are independent (one occurring doesn’t change the chance of the other), then P(A and B) = P(A) × P(B). Example: two fair coin flips; P(both heads) = (1/2)×(1/2) = 1/4.

General Addition

For any two events: P(A or B) = P(A) + P(B) − P(A and B). When A and B are disjoint, P(A and B) = 0.

Expected Value

For a discrete random variable X that takes values x₁, x₂, … with probabilities p₁, p₂, …, the expected value is E[X] = Σ x_i · P(X = x_i). Intuition: long-run average if you repeat the experiment many times.

Linearity of expectation: E[X + Y] = E[X] + E[Y] for any X, Y (even if they are not independent). So E[X₁ + X₂ + … + Xₙ] = E[X₁] + … + E[Xₙ]. This is used constantly: break the quantity into indicator or simple random variables, compute each expectation, and add.

Example

Expected number of heads in n fair coin flips: Let Xᵢ = 1 if flip i is heads, 0 otherwise. E[Xᵢ] = 1/2. Total heads X = X₁ + … + Xₙ, so E[X] = n/2.
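Linearity can be cross-checked against the definition of expectation by enumerating every outcome for small n. This is a sketch using exact fractions; the function name is hypothetical.

```python
from fractions import Fraction
from itertools import product

def expected_heads_by_enumeration(n: int) -> Fraction:
    """E[number of heads] from the definition: sum of value * probability."""
    outcomes = list(product((0, 1), repeat=n))       # 1 = heads, 0 = tails
    p = Fraction(1, len(outcomes))                   # each outcome equally likely
    return sum((sum(o) * p for o in outcomes), Fraction(0))

for n in range(1, 11):
    assert expected_heads_by_enumeration(n) == Fraction(n, 2)
```

The enumeration touches 2ⁿ outcomes, while the indicator argument reaches n/2 in one line; that gap is the practical value of linearity.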

Probability as Counting

When outcomes are equally likely, P(E) = (number of outcomes in E) / (total outcomes). So:

Probability that a random subset of size k from {1..n} contains 1 = C(n−1, k−1) / C(n, k) = k/n (or: 1 is in the subset with probability k/n by symmetry).
Probability that a random permutation of n is a derangement (no fixed point) = (number of derangements) / n!. Useful in some puzzles.

So combinatorics (nCr, nPr, counting) directly gives probabilities when the sample space is “all subsets” or “all permutations” with uniform distribution.
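Both examples can be confirmed by brute-force counting on small inputs. The sketch below uses exact fractions; the function names are hypothetical and the enumeration is only practical for small n.

```python
import math
from fractions import Fraction
from itertools import combinations, permutations

def prob_subset_contains_one(n: int, k: int) -> Fraction:
    """P(a uniformly random k-subset of {1..n} contains the element 1)."""
    subsets = list(combinations(range(1, n + 1), k))
    favorable = sum(1 for s in subsets if 1 in s)
    return Fraction(favorable, len(subsets))

def prob_derangement(n: int) -> Fraction:
    """P(a uniformly random permutation of n items has no fixed point)."""
    favorable = sum(
        1 for p in permutations(range(n)) if all(p[i] != i for i in range(n))
    )
    return Fraction(favorable, math.factorial(n))

assert prob_subset_contains_one(6, 2) == Fraction(2, 6)    # k/n
assert prob_derangement(3) == Fraction(2, 6)               # 2 derangements out of 3! = 6
```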

Python: Simple Probability and Expectation

For small spaces you can enumerate and count. For expectation you can sum value × probability, or simulate (Monte Carlo) for verification.

# Example: P(sum of two fair dice = 7)
# Favorable: (1,6),(2,5),(3,4),(4,3),(5,2),(6,1) → 6 outcomes. Total 36.
p_7 = 6 / 36  # 1/6

# Expected value of one die roll (1 to 6)
E_one_die = sum(x for x in range(1, 7)) / 6  # 3.5

# E[X+Y] = E[X] + E[Y]: two dice → 3.5 + 3.5 = 7

Common Mistakes

Assuming independence when events aren’t: P(A and B) = P(A)×P(B) only when A and B are independent. If they’re not, use P(A and B) = P(A) P(B|A) (conditional) or count outcomes.
Adding non-disjoint events without subtracting overlap: P(A or B) = P(A) + P(B) − P(A and B). Forgetting the subtraction double-counts.
Confusing E[X·Y] with E[X]·E[Y]: Linearity gives E[X+Y] = E[X] + E[Y] for any X and Y. The product rule E[X·Y] = E[X]·E[Y] is not general: it holds when X and Y are independent (more precisely, whenever they are uncorrelated).

Common Mistake

Using P(A and B) = P(A) × P(B) when A and B are not independent. For example, “probability first draw is red and second draw is red” from an urn without replacement—the second probability depends on the first. Use conditional probability or count favorable outcomes directly.
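The urn example can be made concrete with a hypothetical urn of 3 red and 2 blue balls (these numbers are an assumption chosen for illustration). The sketch compares direct counting, the conditional formula, and the incorrect independence shortcut.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical urn: 3 red and 2 blue balls, two draws without replacement.
balls = ["R", "R", "R", "B", "B"]

draws = list(permutations(range(len(balls)), 2))          # ordered pairs of distinct balls
both_red = sum(1 for i, j in draws if balls[i] == "R" and balls[j] == "R")
p_counting = Fraction(both_red, len(draws))

p_conditional = Fraction(3, 5) * Fraction(2, 4)           # P(A) * P(B | A)
p_naive = Fraction(3, 5) * Fraction(3, 5)                 # wrongly assumes independence

print(p_counting, p_conditional, p_naive)
```

Counting and the conditional formula agree on 3/10; the independence shortcut gives 9/25, which would be the answer only if the first ball were put back.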

Interview Insight

When asked a probability question, say: “I’ll assume outcomes are equally likely and compute P(E) = favorable outcomes / total outcomes. I’ll use counting—combinations or permutations—for the numerator and denominator.” For expected value: “I’ll use linearity: write the quantity as a sum of simpler random variables (e.g., indicators), compute each expectation, and add.”

Practice Problems

Probability that two people in a room of n share a birthday (use complement: 1 − P(all different) = 1 − (365/365)×(364/365)×…×((365−n+1)/365)).
Expected number of trials until first success (geometric): if P(success) = p, E[trials] = 1/p.
Given n items and k chosen at random, expected number of “special” items in the chosen set (linearity with indicators).
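As a starting point for the first problem, the complement formula translates into a short loop. This sketch assumes uniformly distributed birthdays over 365 days and ignores leap years; the function name is hypothetical.

```python
def birthday_collision_probability(n: int, days: int = 365) -> float:
    """P(at least two of n people share a birthday), uniform birthdays assumed."""
    if n > days:
        return 1.0                       # pigeonhole: a collision is certain
    p_all_different = 1.0
    for i in range(n):
        p_all_different *= (days - i) / days
    return 1.0 - p_all_different
```

With 23 people the probability already exceeds one half, which is the usual way this problem is phrased in interviews.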

Summary

When outcomes are equally likely, P(E) = |E| / |Ω|—probability is counting. Use combinatorics (nCr, nPr) for numerator and denominator.
Complement: P(not E) = 1 − P(E). Disjoint: P(A or B) = P(A) + P(B). Independent: P(A and B) = P(A)×P(B). General or: P(A or B) = P(A) + P(B) − P(A and B).
Expected value: E[X] = Σ x·P(X=x). Linearity: E[X+Y] = E[X] + E[Y]; use to break into indicators or simple terms.
In DSA: randomized algorithms (probability of success), expected running time, and “probability that …” problems (count favorable / total).

4.12 Matrix Basics

Introduction

A matrix is a rectangular grid of numbers (or elements) arranged in rows and columns. An m × n matrix has m rows and n columns. In DSA, matrices appear as 2D arrays (graph adjacency, grid problems), in linear recurrences (matrix exponentiation for Fibonacci—topic 4.17), and in dynamic programming. This section covers representation (row/column indexing), basic operations—addition, scalar multiplication, matrix multiplication, and transpose—and clean Python implementations. Mastery of matrix multiplication (dimensions, loop order) is essential for matrix exponentiation and many advanced topics.

Real-World Analogy

Think of a matrix as a spreadsheet or a data table: rows are entities (e.g., users), columns are attributes (e.g., age, score). The entry in row i and column j is the value for that pair. In graphics or physics, a matrix can represent a transformation (rotation, scaling); multiplying a vector by a matrix gives a new vector. In algorithms, we often multiply matrices to combine transitions (e.g., “one step” in a recurrence becomes a matrix; “n steps” is the matrix raised to power n).

Formal Definition

Concept Note

An m × n matrix A has m rows and n columns. We write A[i][j] or A_i,j for the entry in row i and column j (often 0-indexed: rows 0..m−1, columns 0..n−1). The transpose A^T is the n×m matrix with (A^T)_j,i = A_i,j—rows and columns swapped.

Two matrices of the same dimensions can be added entry-wise. Matrix multiplication is defined when the number of columns of the first equals the number of rows of the second: (m×n)(n×p) → m×p.

Why This Topic Matters

2D arrays and grids: Matrices are the natural structure for grids (maze, board, image). Traversal, DP on grids, and adjacency matrices all use the same indexing.
Matrix multiplication: Recurrences like Fibonacci can be written as a vector updated by a fixed matrix; then F_n is obtained by “matrix to power n” (topic 4.17). So matrix multiply is the building block.
Graphs: Adjacency matrix of a graph is an n×n matrix; A[i][j] = 1 if there’s an edge from i to j. Powers of the adjacency matrix count walks of a given length.
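The claim about walks can be tried on a tiny graph. The sketch below uses a hypothetical 3-vertex directed graph and a compact integer multiply (mat_mul_int is an illustrative name, not the course’s mat_mul).

```python
def mat_mul_int(A: list[list[int]], B: list[list[int]]) -> list[list[int]]:
    """Plain (m x n)(n x p) product for integer matrices."""
    n, p = len(B), len(B[0])
    return [[sum(row[k] * B[k][j] for k in range(n)) for j in range(p)] for row in A]

# Hypothetical directed graph on 3 vertices: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0.
adj = [
    [0, 1, 1],
    [0, 0, 1],
    [1, 0, 0],
]

walks2 = mat_mul_int(adj, adj)        # entry (i, j): number of walks of length 2
walks3 = mat_mul_int(walks2, adj)     # entry (i, j): number of walks of length 3
print(walks2[0][2], walks3[0][0])
```

Both printed entries are 1: the only length-2 walk from 0 to 2 is 0 → 1 → 2, and the only length-3 walk from 0 back to 0 is 0 → 1 → 2 → 0.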

Representation in Python

A matrix is typically a list of lists: each inner list is a row. So A[i][j] is row i, column j. Ensure every row has the same length (same number of columns).

# 2×3 matrix
A = [
    [1, 2, 3],
    [4, 5, 6]
]
# A[0][1] = 2, A[1][2] = 6
rows, cols = len(A), len(A[0])

Basic Operations

Addition

For two matrices A and B of the same size (m×n), (A + B)[i][j] = A[i][j] + B[i][j]. Time O(m·n), space O(m·n) for the result.

Scalar Multiplication

(c·A)[i][j] = c × A[i][j]. Time O(m·n).

Transpose

A^T has dimensions n×m with A^T[j][i] = A[i][j]. So the j-th row of A^T is the j-th column of A. Time O(m·n).

Matrix Multiplication

Let A be m×n and B be n×p. The product C = A×B is m×p with:

C[i][j] = Σ_{k=0}^{n−1} A[i][k] · B[k][j]

So the (i,j) entry of the product is the dot product of row i of A and column j of B. The inner dimension (n) must match; the result has dimensions m×p.

Example

2×2 times 2×2: A = [[a,b],[c,d]], B = [[e,f],[g,h]]. C[0][0] = a·e + b·g, C[0][1] = a·f + b·h, C[1][0] = c·e + d·g, C[1][1] = c·f + d·h.

ASCII Diagram: Matrix Multiplication

  A (m×n)     ×     B (n×p)     =     C (m×p)
  row i  ──────────────►  dot with col j  ──►  C[i][j]
        [ ... A[i][k] ... ]   [ B[k][j] ]   =  sum_k A[i][k]*B[k][j]
                             [   ...   ]
  Inner dimension n must match; result has outer dimensions m and p.

Identity Matrix

The identity matrix I_n is the n×n matrix with I[i][j] = 1 if i = j and 0 otherwise. For any n×n matrix A, A·I = I·A = A. In exponentiation we use I as the base case (A⁰ = I).

Python Implementation

Matrix Addition

def mat_add(A: list[list[float]], B: list[list[float]]) -> list[list[float]]:
    m, n = len(A), len(A[0])
    if len(B) != m or len(B[0]) != n:
        raise ValueError("dimension mismatch")
    return [[A[i][j] + B[i][j] for j in range(n)] for i in range(m)]

Matrix Multiplication

def mat_mul(A: list[list[float]], B: list[list[float]]) -> list[list[float]]:
    m, n, p = len(A), len(A[0]), len(B[0])
    if len(B) != n:
        raise ValueError("dimension mismatch: A is m×n, B must be n×p")
    C = [[0] * p for _ in range(m)]
    for i in range(m):
        for j in range(p):
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]
    return C

Transpose

def mat_transpose(A: list[list[float]]) -> list[list[float]]:
    if not A:
        return []
    m, n = len(A), len(A[0])
    return [[A[i][j] for i in range(m)] for j in range(n)]

Identity Matrix

def identity(n: int) -> list[list[float]]:
    I = [[0] * n for _ in range(n)]
    for i in range(n):
        I[i][i] = 1
    return I

Line-by-Line Explanation (mat_mul)

m, n, p = len(A), len(A[0]), len(B[0]) — A is m×n, B must be n×p; result C is m×p.
if len(B) != n — B must have n rows (same as A’s columns) for the product to be defined.
C = [[0] * p for _ in range(m)] — Initialize m×p result to zeros.
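for i in range(m): for j in range(p): for k in range(n): — For each (i,j), C[i][j] = sum over k of A[i][k]*B[k][j]. Loop order: i, j, k follows the definition directly. Any nesting order of the three loops gives the same result; in compiled row-major code the i, k, j order is usually more cache-friendly, because the innermost loop then scans along rows of B and C instead of down a column of B.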
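The three loops can be nested in a different order without changing the result, because every term is simply accumulated into C[i][j]. Below is a sketch of the i, k, j ordering, whose innermost loop walks along one row of B; mat_mul_ikj is a hypothetical name, and any speed benefit in pure Python is an assumption to measure rather than a guarantee.

```python
def mat_mul_ikj(A: list[list[int]], B: list[list[int]]) -> list[list[int]]:
    """Same product as the i, j, k version, with the loops ordered i, k, j."""
    m, n, p = len(A), len(A[0]), len(B[0])
    if len(B) != n:
        raise ValueError("dimension mismatch: A is m×n, B must be n×p")
    C = [[0] * p for _ in range(m)]
    for i in range(m):
        row_c = C[i]
        for k in range(n):
            a_ik = A[i][k]
            row_b = B[k]                 # innermost loop walks along one row of B
            for j in range(p):
                row_c[j] += a_ik * row_b[j]
    return C
```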

Time and Space Complexity

Addition: O(m·n) time, O(m·n) space for result. Transpose: O(m·n) time and space. Matrix multiplication (naive): Three nested loops over m, p, n → O(m·n·p) time. For two n×n matrices, O(n³). Space for result O(m·p). (Strassen and other methods reduce the exponent for large matrices but are rarely needed in interviews.)

Modular Matrix Multiplication

When entries are integers and the problem asks for “result mod M,” reduce modulo M after each multiply-add. Python integers never overflow, but reducing keeps the numbers small and the arithmetic fast (in fixed-width languages it also prevents overflow). Same loop structure; add % M when accumulating C[i][j].

def mat_mul_mod(A: list[list[int]], B: list[list[int]], M: int) -> list[list[int]]:
    m, n, p = len(A), len(A[0]), len(B[0])
    if len(B) != n:
        raise ValueError("dimension mismatch")
    C = [[0] * p for _ in range(m)]
    for i in range(m):
        for j in range(p):
            for k in range(n):
                C[i][j] = (C[i][j] + A[i][k] * B[k][j]) % M
    return C

Edge Cases

Empty matrix: A = [] has zero rows, so A[0] raises IndexError; A = [[]] has one row and zero columns. Check if not A or not A[0] before using len(A[0]) or indexing entries.
Dimension mismatch: For A×B, A’s columns must equal B’s rows. Validate before looping.
Single row or column: A 1×n matrix times an n×1 matrix gives a 1×1 matrix (a single number). Handled correctly by the same loops.
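One way to centralize these checks is a small helper that reports the dimensions and rejects ragged input. This is only a sketch; the name shape and the choice to report (0, 0) for an empty list are assumptions, not course conventions.

```python
def shape(A: list[list[float]]) -> tuple[int, int]:
    """Return (rows, cols); raise ValueError for ragged input."""
    if not A:
        return 0, 0                      # A = []: no rows at all
    cols = len(A[0])
    if any(len(row) != cols for row in A):
        raise ValueError("ragged matrix: rows have different lengths")
    return len(A), cols
```

A multiply routine can then compare shape(A)[1] with shape(B)[0] once, before entering the loops.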

Common Mistakes

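Wrong indices or ranges: C[i][j] must sum over k, where k runs over A’s columns / B’s rows (range(n)). Because each term is accumulated into C[i][j], the three loops can be nested in any order; what breaks the result is looping k over range(m) or range(p), or indexing with the wrong variable (e.g., A[k][i] instead of A[i][k]).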
Using A[k][j] instead of B[k][j]: The second matrix is B; row k of A times column j of B uses A[i][k] and B[k][j]. Don’t mix A and B.
Dimension confusion: (m×n)(n×p) = m×p. The shared dimension n is the one we sum over.

Common Mistake

In matrix multiplication, the (i,j) entry uses row i of A and column j of B. So the inner loop is over k: A[i][k] and B[k][j]. Writing A[i][j] and B[i][j] or swapping A and B in the inner product is wrong.

Pattern Recognition

Many problems that look like “apply a linear recurrence n times” can be expressed as “vector × matrix^n”. The recurrence defines the matrix; matrix multiplication combines two steps into one. So implementing mat_mul (and later mat_pow using fast exponentiation) is a recurring pattern.

Interview Insight

When the problem involves matrices, state dimensions clearly: “A is m×n, B is n×p, so the product is m×p. The (i,j) entry is the dot product of row i of A and column j of B—three nested loops over i, j, k with C[i][j] += A[i][k]*B[k][j]. For matrix exponentiation we’ll use this multiply with fast exponentiation (binary exponentiation on the matrix).”

Practice Problems

Implement mat_add, mat_mul, transpose, identity and test on 2×2 examples (hand-compute expected result).
Implement mat_mul_mod for integer matrices mod M; use it as the “multiply” in matrix exponentiation (topic 4.17).
Count walks of length k from vertex i to j in a graph using the adjacency matrix: the (i,j) entry of A^k is the number of walks of length k.

Summary

A matrix is an m×n grid; A[i][j] is row i, column j. Represent in Python as list of lists (each list is a row).
Addition: entry-wise, same dimensions. Transpose: (A^T)_j,i = A_i,j. Multiplication: (m×n)(n×p) = m×p; (AB)[i][j] = Σ_k A[i][k]·B[k][j].
Naive matrix multiply: O(m·n·p) time. For mod M, reduce after each operation. Identity matrix I satisfies A·I = I·A = A.
Loop order: for i, for j, for k with C[i][j] += A[i][k]*B[k][j]. Use row of A and column of B—don’t mix indices.

4.13 Euler's Totient Function

Introduction

Euler’s totient function φ(n) counts the number of integers in {1, 2, …, n} that are coprime to n (i.e., gcd(k, n) = 1). So φ(n) is also the number of elements in the set of integers modulo n that have a multiplicative inverse—the size of the “unit group” mod n. It appears in Euler’s theorem (a^φ(n) ≡ 1 (mod n) when gcd(a, n) = 1), in RSA cryptography, and in counting problems (e.g., “how many fractions a/b in lowest terms with 0 < a < b ≤ n?”). This section defines φ(n), gives formulas (including from prime factorization and via a sieve), and shows how to compute it in code.

Real-World Analogy

Imagine n chairs in a circle, numbered 0 to n−1. Starting at chair 0, you repeatedly move k chairs clockwise. You visit every chair before returning to the start exactly when gcd(k, n) = 1, so the number of “good” step sizes k in 1..n is φ(n). Equivalently: among 1, 2, …, n, how many share no prime factor with n? That count is φ(n). For n = 10, the numbers coprime to 10 are 1, 3, 7, 9 → φ(10) = 4.

Formal Definition

Concept Note

For a positive integer n, φ(n) (Euler’s totient) is the number of integers k in the range 1 ≤ k ≤ n such that gcd(k, n) = 1. So φ(1) = 1 (since gcd(1, 1) = 1). For n ≥ 2, φ(n) is the count of elements in {1, …, n} that have a multiplicative inverse modulo n.

If n has prime factorization n = p₁^(a₁) · p₂^(a₂) · … · p_k^(a_k), then φ(n) = n × Π (1 − 1/p_i) = n × (1 − 1/p₁)(1 − 1/p₂)…(1 − 1/p_k). Equivalently, φ(n) = Π (p_i^(a_i) − p_i^(a_i − 1)) = Π p_i^(a_i − 1) · (p_i − 1).

Why This Topic Matters

Euler’s theorem: If gcd(a, n) = 1, then a^φ(n) ≡ 1 (mod n). So a⁻¹ ≡ a^(φ(n)−1) (mod n), which is another way to compute the modular inverse when n is not prime (Fermat applies only when n is prime).
RSA: The public and private exponents are chosen using φ(N) where N = p·q. Security relies on the hardness of computing φ(N) without knowing the factors.
Counting: Problems like “count pairs (a, b) with 1 ≤ a < b ≤ n and gcd(a, b) = 1” or “number of reduced fractions” use sums involving φ.

Mental Model

φ(n) is “how many numbers in 1..n don’t share any prime factor with n.” So start with n and for each distinct prime p dividing n, “remove” a fraction 1/p of the numbers (those divisible by p). What remains is n × (1 − 1/p₁)(1 − 1/p₂)… = φ(n). For a prime p, every number 1..p−1 is coprime to p, so φ(p) = p − 1.

Key Formulas

Prime: φ(p) = p − 1.
Prime power: φ(p^k) = p^k − p^(k−1) = p^(k−1)·(p − 1).
Multiplicative: If gcd(m, n) = 1, then φ(m·n) = φ(m)·φ(n). So from prime powers, φ(n) = Π φ(p_i^a_i) = Π (p_i^a_i − p_i^(a_i−1)).
Product form: φ(n) = n × Π_{p | n} (1 − 1/p), taken over the distinct primes p dividing n.
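
The key formulas can be sanity-checked against the definition with a brute-force count. This is a minimal sketch; phi_naive is a hypothetical helper introduced here for checking only, and it is far too slow for large n.

```python
from math import gcd

def phi_naive(n: int) -> int:
    """Count k in [1, n] with gcd(k, n) == 1 (direct use of the definition)."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

# Prime: phi(p) = p - 1
assert phi_naive(7) == 6
# Prime power: phi(p^k) = p^(k-1) * (p - 1)
assert phi_naive(3 ** 4) == 3 ** 3 * (3 - 1)
# Multiplicative for coprime arguments: phi(4 * 9) = phi(4) * phi(9)
assert phi_naive(4 * 9) == phi_naive(4) * phi_naive(9)
# Not multiplicative when the arguments share a factor
assert phi_naive(2 * 2) != phi_naive(2) * phi_naive(2)
```

The last assertion shows why the coprimality condition in the multiplicative rule matters: φ(4) = 2 but φ(2)·φ(2) = 1.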

Step-by-Step: Computing φ(n) from Prime Factorization

Factor n into primes: n = p₁^a₁ … p_k^aₖ.
For each distinct prime p dividing n, multiply the running result (which starts at n) by (1 − 1/p), so that φ(n) = n × Π (1 − 1/p). Alternatively, φ(n) = Π (p_i^a_i − p_i^(a_i−1)).
If you only need φ(n) for one n, factor n (trial division up to √n) then apply the formula. If you need φ(1) through φ(N), use a sieve (see below).

Computing φ(1..N) with a Sieve

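Initialize phi[i] = i for all i. For each i from 2 to N, i is prime exactly when phi[i] is still equal to i. For each such prime p and each multiple k = p, 2p, 3p, … ≤ N, do phi[k] -= phi[k] // p (the integer form of phi[k] *= (1 − 1/p), i.e., phi[k] = phi[k] // p * (p − 1)). The integer division is exact because p still divides phi[k] at that point. After processing all primes, phi[i] = φ(i). This runs in O(N log log N), like the sieve of Eratosthenes.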

Example

φ(12): 12 = 2² × 3. φ(12) = 12 × (1 − 1/2)(1 − 1/3) = 12 × (1/2)(2/3) = 4. Or φ(12) = φ(4)·φ(3) = (4−2)·(3−1) = 2·2 = 4. The integers in [1,12] coprime to 12 are 1, 5, 7, 11 → four numbers.

Euler’s Theorem

If gcd(a, n) = 1, then a^φ(n) ≡ 1 (mod n). So the order of a modulo n divides φ(n). Corollary: a⁻¹ ≡ a^(φ(n)−1) (mod n), which is useful for computing the modular inverse when n is composite. When n is prime (e.g., n = 10⁹+7), φ(n) = n − 1 and this reduces to Fermat’s a^(n−2); for composite n, use φ(n) or the extended GCD.
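
The corollary can be sketched in a few lines. The helper names phi and inverse_via_euler are assumptions made for this illustration; phi repeats the trial-division idea used elsewhere in this section so that the block runs on its own.

```python
from math import gcd

def phi(n: int) -> int:
    """Euler's totient by trial division over distinct prime factors."""
    result, d = n, 2
    while d * d <= n:
        if n % d == 0:
            while n % d == 0:
                n //= d
            result -= result // d
        d += 1
    if n > 1:
        result -= result // n
    return result

def inverse_via_euler(a: int, n: int) -> int:
    """Inverse of a modulo n (n >= 2) as a^(phi(n) - 1); needs gcd(a, n) == 1."""
    if gcd(a, n) != 1:
        raise ValueError("a has no inverse modulo n")
    return pow(a, phi(n) - 1, n)

inv = inverse_via_euler(7, 20)  # phi(20) = 8, so inv = 7^7 mod 20
assert (7 * inv) % 20 == 1
```

In practice pow(a, -1, n) (Python 3.8+) or the extended GCD is usually cheaper, because this approach has to factor n first.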

Python Implementation

φ(n) from Factorization (Single Value)

def totient(n: int) -> int:
    """Returns φ(n) for n >= 1."""
    if n <= 0:
        return 0
    if n == 1:
        return 1
    res = n
    d = 2
    while d * d <= n:
        if n % d == 0:
            while n % d == 0:
                n //= d
            res -= res // d
        d += 1
    if n > 1:
        res -= res // n
    return res

Idea: start with res = n. For each distinct prime p dividing n, multiply res by (1 − 1/p), implemented as res -= res // p (so res becomes res * (p-1) / p). We iterate d and divide n by d so we only process distinct primes.

φ(1..N) via Sieve

def totient_sieve(n: int) -> list[int]:
    """Returns list [φ(0), φ(1), ..., φ(n)]. φ(0)=0."""
    phi = list(range(n + 1))
    for i in range(2, n + 1):
        if phi[i] == i:
            for j in range(i, n + 1, i):
                phi[j] -= phi[j] // i
    return phi

If phi[i] == i, then i is prime (not yet reduced). For each prime i, update all its multiples j: phi[j] *= (1 - 1/i) via phi[j] -= phi[j] // i.
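
One use of the sieve is the counting question from the introduction: the number of fractions a/b in lowest terms with 0 < a < b ≤ n equals φ(2) + φ(3) + … + φ(n), because for each denominator b there are φ(b) valid numerators. The sketch below repeats the sieve so it is self-contained; count_reduced_fractions and the brute-force checker are hypothetical names used only for this illustration.

```python
from math import gcd

def totient_table(n: int) -> list:
    """phi[0..n] via the sieve described above."""
    phi = list(range(n + 1))
    for i in range(2, n + 1):
        if phi[i] == i:  # i is prime: not reduced by any smaller prime
            for j in range(i, n + 1, i):
                phi[j] -= phi[j] // i
    return phi

def count_reduced_fractions(n: int) -> int:
    """Number of a/b in lowest terms with 0 < a < b <= n."""
    return sum(totient_table(n)[2:])

def count_reduced_fractions_brute(n: int) -> int:
    """Same count by checking every pair; only for small n."""
    return sum(1 for b in range(2, n + 1) for a in range(1, b) if gcd(a, b) == 1)

assert count_reduced_fractions(8) == 21
assert count_reduced_fractions(8) == count_reduced_fractions_brute(8)
```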

Line-by-Line Explanation (totient)

if n <= 0: return 0 — φ is defined for positive n only.
if n == 1: return 1 — φ(1) = 1 by convention.
res = n — Start with n; we’ll multiply by (1 − 1/p) for each prime p.
while d * d <= n — Trial division up to √n. When we exit, n is 1 or a single prime.
if n % d == 0 — d is a prime factor. while n % d == 0: n //= d removes all factors of d so we process d only once.
res -= res // d — Same as res = res * (1 - 1/d) = res * (d-1) / d in integers.
if n > 1: res -= res // n — Remaining n is a prime factor; apply the same.

Time and Space Complexity

Single φ(n) from factorization: O(√n) for trial division. Totient sieve: O(N log log N) time, O(N) space—same as the sieve of Eratosthenes. Use the sieve when you need φ(1)..φ(N); use the factorization method for a single large n.

Edge Cases

n ≤ 0: φ is defined for positive integers; return 0 or handle as invalid.
n = 1: φ(1) = 1 (1 is coprime to 1).
Prime n: φ(n) = n − 1.

Common Mistakes

Using φ(n) for modular inverse when n is prime: For prime n, Fermat (a^(n−2)) is simpler than a^(φ(n)−1) (they’re equal when n is prime since φ(n) = n−1). Use φ when n is composite.
Forgetting distinct primes: In the product φ(n) = n Π (1 − 1/p), each distinct prime p appears once. When factoring, don’t apply (1 − 1/p) multiple times for the same p.
Sieve: updating at non-prime i: Apply the update phi[j] -= phi[j] // i only when i is prime (phi[i] == i), and use integer division. Doing this once for each prime factor of j gives the correct φ(j); updating at composite i as well would reduce phi[j] too far.

Common Mistake

In the product formula φ(n) = n × Π (1 − 1/p), the product is over distinct primes dividing n. So for n = 12 = 2²×3, use (1−1/2) once and (1−1/3) once, not (1−1/2) twice.

Optimization Insight

When you need φ(n) for many n in a range [1, N], the sieve is better than factoring each n (sieve is O(N log log N) total vs N × O(√n) for individual factorization). When you need a single φ(n) for large n, factorization is O(√n) and sufficient.

Interview Insight

When asked about Euler’s totient, say: “φ(n) counts integers in [1, n] coprime to n. Formula: φ(n) = n × product over distinct primes p|n of (1 − 1/p). For prime p, φ(p) = p−1. Euler’s theorem: if gcd(a,n)=1, a^φ(n) ≡ 1 (mod n). I can compute φ(n) by factoring n in O(√n) or precompute φ(1..N) with a sieve in O(N log log N).”

Practice Problems

Implement totient(n) and totient_sieve(N); verify φ(12)=4, φ(7)=6.
Compute the modular inverse of a mod n (composite n) using a^(φ(n)−1) mod n when gcd(a, n) = 1.
Count the number of integers in [1, n] coprime to n (that’s φ(n)); or count pairs (a, b) with 1 ≤ a < b ≤ n and gcd(a, b) = 1 using φ.

Summary

φ(n) = number of integers in {1, …, n} with gcd(k, n) = 1. φ(1) = 1; for prime p, φ(p) = p−1.
Formula: φ(n) = n × Π_{p | n} (1 − 1/p) = Π (p^a − p^(a−1)) over the prime powers p^a in n.
Euler’s theorem: If gcd(a, n) = 1 then a^φ(n) ≡ 1 (mod n). So a⁻¹ ≡ a^(φ(n)−1) (mod n), including for composite n.
Single φ(n): factor n, then apply formula—O(√n). Range φ(1..N): sieve in O(N log log N).

4.14 Chinese Remainder Theorem

Introduction

The Chinese Remainder Theorem (CRT) says that when we have several congruences with pairwise coprime moduli, there is a unique solution modulo the product of the moduli. Specifically: given x ≡ a₁ (mod m₁), x ≡ a₂ (mod m₂), …, x ≡ a_k (mod m_k) with gcd(m_i, m_j) = 1 for i ≠ j, there exists a unique x (mod M) where M = m₁·m₂·…·m_k. CRT lets us combine results computed modulo different primes (e.g., nCr mod several primes) into one result modulo their product, or split a problem into smaller moduli. This section states the theorem, gives the constructive formula, and implements it in Python.

Real-World Analogy

You have a number of items. When you divide by 3 the remainder is 2; when you divide by 5 the remainder is 3; when you divide by 7 the remainder is 2. Is there a number that fits all three? CRT says yes, and that all such numbers differ by a multiple of 3×5×7 = 105. So there is a unique remainder mod 105. Like solving a puzzle: each condition narrows the set; with coprime moduli, the constraints are “independent” and pin down one residue class mod the product.
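
For moduli this small, the puzzle can be checked by exhaustive search over one full period. A minimal sketch (solve_by_search is a hypothetical helper, useful only when the product of the moduli is tiny):

```python
def solve_by_search(remainders: list, moduli: list) -> list:
    """All x in [0, M) satisfying every congruence, where M is the product of the moduli."""
    M = 1
    for m in moduli:
        M *= m
    return [
        x for x in range(M)
        if all(x % m == a for a, m in zip(remainders, moduli))
    ]

solutions = solve_by_search([2, 3, 2], [3, 5, 7])
assert solutions == [23]  # exactly one residue class modulo 105
```

The search finds a single value in [0, 105), which is the uniqueness that CRT promises for pairwise coprime moduli.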

Formal Definition

Concept Note

Chinese Remainder Theorem: Let m₁, m₂, …, m_k be positive integers that are pairwise coprime (gcd(m_i, m_j) = 1 for i ≠ j). Let M = m₁·m₂·…·m_k. Then for any integers a₁, …, a_k, the system of congruences

x ≡ a₁ (mod m₁), x ≡ a₂ (mod m₂), …, x ≡ a_k (mod m_k)

has a unique solution modulo M. That is, there exists an integer x such that all congruences hold, and any two such x are congruent modulo M.

If the moduli are not pairwise coprime, a solution may not exist (e.g., x ≡ 0 (mod 2) and x ≡ 1 (mod 2) is impossible). When a solution exists (e.g., x ≡ 1 (mod 2) and x ≡ 1 (mod 4)), it is unique modulo lcm(m₁, …, m_k).
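
The non-coprime case for two congruences can be sketched as follows. This is one possible approach, not part of the standard CRT statement: a solution exists exactly when a ≡ b (mod gcd(m, n)), and it is then unique modulo lcm(m, n). The name crt2_general is an assumption made for this illustration, and pow with exponent −1 needs Python 3.8+.

```python
import math
from typing import Optional, Tuple

def crt2_general(a: int, m: int, b: int, n: int) -> Optional[Tuple[int, int]]:
    """Solve x ≡ a (mod m), x ≡ b (mod n) for any positive m, n.

    Returns (x, lcm(m, n)) with 0 <= x < lcm, or None if the system is inconsistent.
    """
    g = math.gcd(m, n)
    if (b - a) % g != 0:
        return None                  # remainders disagree modulo gcd(m, n)
    l = m // g * n                   # lcm(m, n)
    n_g = n // g
    if n_g == 1:
        t = 0                        # n divides m: second congruence adds no constraint
    else:
        # Solve (m/g) * t ≡ (b - a)/g (mod n/g); m/g is invertible modulo n/g.
        t = (b - a) // g * pow(m // g, -1, n_g) % n_g
    return (a + m * t) % l, l

assert crt2_general(1, 2, 1, 4) == (1, 4)    # consistent, unique mod lcm = 4
assert crt2_general(0, 2, 1, 2) is None      # impossible
assert crt2_general(3, 6, 7, 10) == (27, 30)
```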

Why This Topic Matters

Combining results: Compute something mod p₁, mod p₂, …, mod p_k (e.g., nCr mod each prime), then use CRT to get the result mod p₁·p₂·…·p_k or mod a large composite.
Large modulus: Instead of working mod a huge M, work mod several smaller coprime factors and recombine with CRT.
Contest problems: “Find x such that x ≡ a (mod m) and x ≡ b (mod n)”—direct CRT application.

Mental Model

Each congruence x ≡ a_i (mod m_i) says “x is in a certain residue class mod m_i.” Because the m_i are coprime, the conditions are independent: there is exactly one residue class mod M that matches all of them. So we “build” x by combining the contributions: for each i, we want a term that is a_i mod m_i and 0 mod m_j for j ≠ i; then add these terms.

Construction (Formula)

Let M = m₁·m₂·…·m_k and M_i = M / m_i. Then gcd(M_i, m_i) = 1 (since m_i is coprime to every other m_j). So M_i has an inverse mod m_i; call it y_i (so M_i·y_i ≡ 1 (mod m_i)). One solution is:

x = Σ_i a_i · M_i · y_i

Reduce x mod M to get the unique representative in [0, M−1]. Check: for each i, all terms a_j·M_j·y_j with j ≠ i are divisible by m_i (because M_j contains m_i), so x ≡ a_i·M_i·y_i ≡ a_i·1 ≡ a_i (mod m_i).

Two Moduli (Special Case)

Solve x ≡ a (mod m) and x ≡ b (mod n) with gcd(m, n) = 1. Write x = a + m·t for some integer t. Substitute into the second: a + m·t ≡ b (mod n) ⇒ m·t ≡ (b − a) (mod n). So t ≡ (b − a)·m⁻¹ (mod n). Compute t mod n, then x = a + m·t is a solution; reduce mod (m·n) for the unique solution in [0, mn−1].

Example

x ≡ 2 (mod 3), x ≡ 3 (mod 5). M = 15, M₁ = 5, M₂ = 3. y₁ = inverse of 5 mod 3 = 2 (5·2=10≡1). y₂ = inverse of 3 mod 5 = 2 (3·2=6≡1). x = 2·5·2 + 3·3·2 = 20 + 18 = 38 ≡ 8 (mod 15). Check: 8 mod 3 = 2, 8 mod 5 = 3 ✓.

Step-by-Step: General CRT

Check moduli are pairwise coprime (or handle non-coprime case separately).
Compute M = m₁·m₂·…·m_k and for each i, M_i = M / m_i.
For each i, compute y_i = modular inverse of M_i modulo m_i (e.g., pow(M_i, -1, m_i) in Python).
x = Σ a_i·M_i·y_i; then x = x % M. Python’s % already returns a value in [0, M−1] for positive M; in languages where % can be negative, use (x % M + M) % M.

Python Implementation

Two Moduli

import math

def crt2(a: int, m: int, b: int, n: int) -> int | None:
    """Solves x ≡ a (mod m), x ≡ b (mod n). Returns x mod (m*n) or None if no solution."""
    if math.gcd(m, n) != 1:
        return None  # or solve for lcm when solution exists
    t = (b - a) * pow(m, -1, n) % n
    x = a + m * t
    return x % (m * n)

General CRT (List of (remainder, modulus))

def crt(remainders: list[int], moduli: list[int]) -> int | None:
    """Solves x ≡ remainders[i] (mod moduli[i]) for all i. Moduli must be pairwise coprime."""
    if len(remainders) != len(moduli):
        return None
    M = 1
    for m in moduli:
        M *= m
    x = 0
    for a, m in zip(remainders, moduli):
        Mi = M // m
        yi = pow(Mi, -1, m)
        x = (x + a * Mi * yi) % M
    return x

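We reduce x mod M after each term (reducing once at the end gives the same result) to keep intermediate values small. Python integers do not overflow, but in fixed-width languages the product a·M_i·y_i needs care. Final x is in [0, M−1].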

Line-by-Line Explanation (General CRT)

M = product of moduli — The solution is unique mod M.
Mi = M // m — M_i = M/m_i; divisible by all m_j except m_i.
yi = pow(Mi, -1, m) — Inverse of M_i mod m_i (exists because gcd(M_i, m_i) = 1).
x = (x + a * Mi * yi) % M — Add a_i·M_i·y_i and keep x mod M so the sum doesn’t grow and we stay in [0, M−1].

Time and Space Complexity

For k congruences: k modular inverses (each O(log min(M_i, m_i))), k multiplications and additions. So O(k · log(max moduli)). Space O(1) if we don’t store all M_i, y_i (we can compute and add in one loop).

Edge Cases

Moduli not pairwise coprime: A solution may not exist (e.g., x ≡ 0 (mod 2) and x ≡ 1 (mod 2)). If it exists, it is unique mod lcm(m₁, …, m_k). For a full implementation, check pairwise gcd and either return None or use the lcm and check consistency.
Single congruence: Just return a mod m (or a if already in range).
Empty list: Return 0 or define M = 1, x = 0.
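
The edge cases above can be folded into a defensive wrapper. This is a sketch under stated assumptions: crt_checked is a hypothetical name, it returns the pair (x, M) rather than x alone, and it treats the empty system as (0, 1).

```python
from itertools import combinations
from math import gcd
from typing import List, Optional, Tuple

def crt_checked(remainders: List[int], moduli: List[int]) -> Optional[Tuple[int, int]]:
    """Returns (x, M) with 0 <= x < M, or None if the input is not a valid coprime system."""
    if len(remainders) != len(moduli):
        return None
    if any(m <= 0 for m in moduli):
        return None
    if any(gcd(m1, m2) != 1 for m1, m2 in combinations(moduli, 2)):
        return None                  # not pairwise coprime: standard CRT does not apply
    M = 1
    for m in moduli:
        M *= m
    x = 0
    for a, m in zip(remainders, moduli):
        if m == 1:
            continue                 # x ≡ a (mod 1) holds for every x
        Mi = M // m
        x = (x + a * Mi * pow(Mi, -1, m)) % M
    return x, M

assert crt_checked([], []) == (0, 1)          # empty system
assert crt_checked([10], [7]) == (3, 7)       # single congruence: a mod m
assert crt_checked([1, 1], [2, 4]) is None    # moduli share a factor
```

The pairwise check costs O(k²) gcd computations, which is negligible for the small k typical of these problems.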

Common Mistakes

Assuming CRT applies when moduli are not coprime: The uniqueness and existence hold only when moduli are pairwise coprime. For 2 and 4, a solution exists only if a ≡ b (mod 2) (consistency); then solution is unique mod 4.
Forgetting to reduce x mod M: The formula can produce a large x; the answer is x mod M. Always return x % M (and handle negative if needed).
Wrong order of arguments: (remainder, modulus) pairs must match: x ≡ a (mod m). Don’t swap a and m when calling.

Common Mistake

Using CRT when the moduli are not pairwise coprime. For example, x ≡ 2 (mod 4) and x ≡ 0 (mod 2) has solutions (e.g., x ≡ 2 (mod 4)), but x ≡ 1 (mod 2) and x ≡ 2 (mod 4) has no solution. Always ensure gcd(m_i, m_j) = 1 for i ≠ j, or implement consistency checks and use lcm.

Optimization Insight

When combining results mod p₁, p₂, …, p_k (e.g., nCr mod each prime), compute each result in parallel or in one pass, then run CRT once. The CRT step is O(k) and typically k is small. Precompute M and the inverses y_i if you solve many systems with the same moduli.
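
When the moduli are fixed, the per-modulus coefficients c_i = M_i·y_i mod M can be computed once and reused. The closure below is a minimal sketch of that idea; make_crt_solver is a hypothetical name, and the moduli are assumed to be pairwise coprime and greater than 1.

```python
from typing import Callable, List

def make_crt_solver(moduli: List[int]) -> Callable[[List[int]], int]:
    """Precompute M and c_i = M_i * y_i mod M; return a function that solves one system."""
    M = 1
    for m in moduli:
        M *= m
    coeffs = []
    for m in moduli:
        Mi = M // m
        coeffs.append(Mi * pow(Mi, -1, m) % M)  # c_i ≡ 1 (mod m_i), ≡ 0 (mod m_j) for j != i

    def solve(remainders: List[int]) -> int:
        return sum(a * c for a, c in zip(remainders, coeffs)) % M

    return solve

solve = make_crt_solver([3, 5, 7])
assert solve([2, 3, 2]) == 23
assert solve([1, 1, 1]) == 1
```

Each later call is then k multiplications and one reduction, with no modular inverses.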

Interview Insight

When asked about CRT, say: “If we have x ≡ a_i (mod m_i) for pairwise coprime m_i, there’s a unique solution mod M = product of m_i. We build it as the sum of a_i · (M/m_i) · y_i, where y_i is the inverse of M/m_i modulo m_i, and reduce the sum mod M. I’ll compute M, then for each i get the inverse of M/m_i mod m_i and add the term. For two moduli, we can also solve by writing x = a + m*t and solving for t mod n.”

Practice Problems

Implement crt2 and crt; verify with the example x ≡ 2 (mod 3), x ≡ 3 (mod 5) → 8 (mod 15).
Solve a system of three congruences with coprime moduli (e.g., mod 3, 5, 7) and check the result.
Use CRT to combine nCr mod 2, mod 3, mod 5 into nCr mod 30 (or another small composite).

Summary

CRT: For pairwise coprime m₁, …, m_k, the system x ≡ a_i (mod m_i) has a unique solution modulo M = m₁·…·m_k.
Construction: x = Σ a_i·(M/m_i)·y_i where y_i = (M/m_i)⁻¹ mod m_i; then x mod M.
Two moduli: x = a + m·t with t ≡ (b−a)·m⁻¹ (mod n); solution x mod (m·n).
Moduli must be pairwise coprime for the standard statement. Time O(k · log(max modulus)); reduce x mod M to get the unique representative.

4.15 Lucas Theorem (Large nCr % Mod)

Introduction

Lucas’s theorem computes C(n, k) mod p when p is prime and n, k can be very large (e.g., 10¹⁸). The idea: write n and k in base p; then C(n, k) ≡ Π_i C(n_i, k_i) (mod p), where n_i and k_i are the base-p digits. So instead of precomputing factorials up to n (impossible when n is huge), we only need factorials up to p−1 and then one small binomial per digit. This is the standard way to compute nCr mod p in competitive programming when n is huge and the prime p is small enough to tabulate factorials up to p−1 (e.g., p around 10⁵ or 10⁶). For a large prime such as 10⁹+7, a table of size p is usually impractical; there, n < p is the common case and direct factorials up to n suffice.

Real-World Analogy

To compute “choose k from n” mod p, we could use n! / (k!(n−k)!) but n! is astronomically large. Lucas says: break n and k into “digits” in base p (like writing numbers in base 10). Each digit is at most p−1, so “choose k_i from n_i” for each digit is a small binomial coefficient we can precompute. The answer mod p is the product of these small binomials. So we reduce a huge problem to many tiny ones.

Formal Definition

Concept Note

Let p be prime. Write n and k in base p: n = n₀ + n₁·p + n₂·p² + …, k = k₀ + k₁·p + k₂·p² + … (digits 0 ≤ n_i, k_i ≤ p−1). Then C(n, k) ≡ Π_i C(n_i, k_i) (mod p). If for some digit i we have k_i > n_i, then C(n_i, k_i) = 0, so the whole product is 0.

The theorem follows from the fact that (1+x)^p ≡ 1+x^p (mod p) (by the binomial theorem and the fact that p divides C(p,j) for 0 < j < p). So the coefficient of x^k in (1+x)ⁿ mod p factors as the product over digits.
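
The fact behind the proof, that every middle coefficient of (1+x)^p vanishes mod p, is easy to check numerically for small p. A minimal sketch (binomial_row_mod is a hypothetical helper; math.comb needs Python 3.8+):

```python
from math import comb

def binomial_row_mod(n: int, m: int) -> list:
    """Coefficients of (1 + x)^n reduced modulo m."""
    return [comb(n, j) % m for j in range(n + 1)]

# Prime modulus: only the x^0 and x^p coefficients survive.
assert binomial_row_mod(5, 5) == [1, 0, 0, 0, 0, 1]
assert binomial_row_mod(7, 7) == [1, 0, 0, 0, 0, 0, 0, 1]
# Composite modulus: the middle coefficient C(4, 2) = 6 is not divisible by 4.
assert binomial_row_mod(4, 4) == [1, 0, 2, 0, 1]
```

The composite case shows why the theorem needs a prime modulus: (1+x)⁴ is not congruent to 1 + x⁴ modulo 4.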

Why This Topic Matters

Large n, prime p: Problems may ask for nCr mod a small prime p (e.g., p < 10⁶) with n up to 10¹⁸. You cannot compute n! or tabulate factorials up to n. Lucas reduces the task to O(log_p n) binomials, each with arguments < p.
Competitive programming: Standard tool for “n choose k mod prime” when n is huge. Precompute factorials 0..p−1 once, then each query is O(log n).
When the modulus is not prime: Factor it into prime powers, compute nCr modulo each prime power (Lucas handles the primes themselves; prime powers with exponent greater than 1 need extensions of the theorem), then combine with CRT.
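
For a squarefree modulus (a product of distinct primes), the Lucas-plus-CRT combination can be sketched as below. The names lucas_small and binom_mod_squarefree are assumptions made for this illustration, and math.comb is used for the digit binomials only because the primes here are tiny; prime powers with exponent greater than 1 are not handled.

```python
from math import comb
from typing import List

def lucas_small(n: int, k: int, p: int) -> int:
    """C(n, k) mod prime p, one base-p digit at a time."""
    if k < 0 or k > n:
        return 0
    result = 1
    while k > 0:
        ni, ki = n % p, k % p
        if ki > ni:
            return 0
        result = result * comb(ni, ki) % p
        n //= p
        k //= p
    return result

def binom_mod_squarefree(n: int, k: int, primes: List[int]) -> int:
    """C(n, k) modulo the product of distinct primes: Lucas per prime, then CRT."""
    M = 1
    for p in primes:
        M *= p
    x = 0
    for p in primes:
        Mi = M // p
        x = (x + lucas_small(n, k, p) * Mi * pow(Mi, -1, p)) % M
    return x

assert binom_mod_squarefree(7, 2, [2, 3, 5]) == 21   # C(7, 2) = 21
assert binom_mod_squarefree(10, 3, [2, 3, 5]) == 0   # C(10, 3) = 120
```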

Mental Model

Think of n and k in base p. Each “digit position” contributes independently: we must choose k_i “items” from n_i available in that position. The theorem says the total number of ways (mod p) is the product of the ways at each position. So we only ever need C(a, b) for 0 ≤ a, b ≤ p−1—a small table.

Step-by-Step: Computing C(n, k) mod p (Lucas)

Precompute factorials and inverse factorials for 0..p−1 (so we can compute C(n_i, k_i) in O(1)).
Get base-p digits of n and k: n = n₀ + n₁·p + …, k = k₀ + k₁·p + … (e.g., repeatedly n % p, n //= p).
If len(k_digits) > len(n_digits), pad n with zeros (or treat missing digits of n as 0). For each digit index i, if k_i > n_i, return 0.
result = 1. For each i: result = (result * C(n_i, k_i)) % p. Return result.

Example

C(10, 3) mod 5. 10 = 2·5 + 0 → digits (0,2); 3 = 0·5 + 3 → digits (3,0). So C(10,3) ≡ C(0,3)·C(2,0) (mod 5). C(0,3) = 0 (cannot choose 3 from 0). So C(10,3) ≡ 0 (mod 5). Check: C(10,3) = 120 ≡ 0 (mod 5) ✓. Another: C(7,2) mod 5. 7 = (2,1) in base 5, 2 = (2,0). C(7,2) ≡ C(2,2)·C(1,0) = 1·1 = 1 (mod 5). C(7,2) = 21 ≡ 1 (mod 5) ✓.

Python Implementation

Assume we have precomputed fact and inv_fact for indices 0..p−1 (size p).

def nCr_mod_small(n: int, k: int, fact: list[int], inv_fact: list[int], p: int) -> int:
    """C(n,k) mod p when 0 <= n, k < p."""
    if k < 0 or k > n:
        return 0
    return fact[n] * inv_fact[k] % p * inv_fact[n - k] % p

def digits_base_p(x: int, p: int) -> list[int]:
    """Digits of x in base p (LSB first)."""
    if x == 0:
        return [0]
    d = []
    while x:
        d.append(x % p)
        x //= p
    return d

def lucas(n: int, k: int, fact: list[int], inv_fact: list[int], p: int) -> int:
    """C(n, k) mod p using Lucas's theorem. p must be prime."""
    if k < 0 or k > n:
        return 0
    nd = digits_base_p(n, p)
    kd = digits_base_p(k, p)
    if len(kd) > len(nd):
        return 0
    res = 1
    for i in range(len(kd)):
        ni = nd[i] if i < len(nd) else 0
        ki = kd[i]
        if ki > ni:
            return 0
        res = res * nCr_mod_small(ni, ki, fact, inv_fact, p) % p
    return res
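
The functions above assume the tables fact and inv_fact already exist. One common way to build them is sketched here: factorials by a forward pass, the last inverse by Fermat’s little theorem, and the remaining inverses by a backward pass. The helper name build_factorial_tables is an assumption, and p is assumed to be a prime small enough for a table of size p.

```python
from math import comb
from typing import List, Tuple

def build_factorial_tables(p: int) -> Tuple[List[int], List[int]]:
    """fact[i] = i! mod p and inv_fact[i] = (i!)^(-1) mod p for 0 <= i < p (p prime)."""
    fact = [1] * p
    for i in range(1, p):
        fact[i] = fact[i - 1] * i % p
    inv_fact = [1] * p
    inv_fact[p - 1] = pow(fact[p - 1], p - 2, p)   # Fermat: a^(p-2) is a^(-1) mod p
    for i in range(p - 1, 0, -1):
        inv_fact[i - 1] = inv_fact[i] * i % p      # 1/(i-1)! = i / i!
    return fact, inv_fact

fact, inv_fact = build_factorial_tables(7)
# Small binomials from the tables agree with exact values reduced mod 7.
for n in range(7):
    for k in range(n + 1):
        assert fact[n] * inv_fact[k] % 7 * inv_fact[n - k] % 7 == comb(n, k) % 7
```

Only one modular exponentiation is needed; the backward pass fills the rest of inv_fact in O(p).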

Line-by-Line Explanation

digits_base_p(x, p) — Extract digits of x in base p (LSB first) by repeated x % p and x //= p.
if len(kd) > len(nd) — If k has more base-p digits than n, then k > n, so C(n,k) = 0.
for i in range(len(kd)) — We only need to consider digit positions where k has a digit. Where k has no digit (higher positions), k_i = 0 and C(n_i, 0) = 1, so we skip those.
ni = nd[i] if i < len(nd) else 0 — i-th digit of n. Since we already ensured len(kd) ≤ len(nd), we have i < len(nd), so nd[i] exists. (If we didn’t return 0 earlier, n ≥ k implies n has at least as many digits as k.)
if ki > ni: return 0 — If any digit k_i > n_i, C(n_i, k_i) = 0, so the whole product is 0.
res = res * nCr_mod_small(ni, ki, ...) % p — Lucas: multiply the binomial coefficient for each digit position.

Time and Space Complexity

Precomputation: Factorials and inverse factorials for 0..p−1: O(p) time and space. Per query (Lucas): O(log_p n) digit extraction and O(log_p n) small binomial lookups (each O(1) with precomputed fact/inv_fact). So O(log_p n) per query. For example, with n ≈ 10¹⁸ and p ≈ 10⁶, log_p n is about 3, so each query is just a few multiplications. The O(p) precomputation is what limits p: a table for p = 10⁹+7 would need about 10⁹ entries.

Edge Cases

k > n or k < 0: C(n, k) = 0. Return 0.
k = 0 or k = n: C(n, 0) = C(n, n) = 1. Lucas gives product of C(n_i, 0) = 1 for each digit → 1.
Any digit k_i > n_i: C(n_i, k_i) = 0, so entire product 0. Return 0.
n = 0: digits [0]; k must be 0 (else k > n); C(0,0) = 1.

Common Mistakes

Using Lucas when p is not prime: The theorem holds only for prime p. For composite modulus, use prime-power factorization and CRT, or use a different method.
Precomputing factorials up to n: The whole point of Lucas is that n can be huge. Precompute only 0..p−1.
Digit order: Lucas uses base-p digits; product is over the same position i. LSB first is consistent as long as both n and k use the same convention.

Common Mistake

Applying Lucas when the modulus is composite (e.g., 1000 or 10⁶). Lucas only applies when the modulus is prime. For composite moduli you need to factor into prime powers, compute nCr mod each (e.g., with Lucas for primes), then combine with CRT.

Optimization Insight

Precompute fact and inv_fact once for the given prime p (e.g., at program start). Then each Lucas query is O(log n). When n and k are both less than p, Lucas is unnecessary—use direct nCr with the precomputed arrays (and then you only need arrays of size max(n)+1, but if p is small, size p is fine).

Interview Insight

When asked “how do you compute C(n, k) mod p for very large n?”, say: “Lucas’s theorem: write n and k in base p; then C(n,k) ≡ product of C(n_i, k_i) mod p. So we only need binomials with arguments less than p. I precompute factorials and inverse factorials for 0..p-1, then for each base-p digit compute C(n_i, k_i) and multiply. Time O(log_p n) per query. This only works when p is prime.”

Practice Problems

Implement digits_base_p, nCr_mod_small, and lucas; verify C(10,3) mod 5 = 0, C(7,2) mod 5 = 1.
Solve a problem that asks for C(n, k) mod p with n ≤ 10¹⁸ and a small prime p (e.g., p = 10007) using Lucas.
Compare: when n < p, direct nCr with fact[0..n] vs Lucas (both work; direct is simpler).

Summary

Lucas’s theorem (prime p): Write n, k in base p; then C(n, k) ≡ Π_i C(n_i, k_i) (mod p). If any k_i > n_i, result is 0.
Precompute fact and inv_fact for 0..p−1 (O(p)). Each query: get base-p digits of n and k, multiply C(n_i, k_i) for each digit—O(log_p n).
Only applies when p is prime. For composite modulus use prime factors + CRT.
Use when n (or k) can be much larger than p; when n, k < p, direct nCr is enough.

4.16 Inclusion-Exclusion Principle

Introduction

The inclusion-exclusion principle counts the size of a union of finite sets by adding sizes of sets, subtracting sizes of pairwise intersections, adding sizes of triple intersections, and so on—alternating signs so that each element is counted exactly once. It is one of the most useful counting tools in DSA: “how many numbers in [1, N] are not divisible by any of these primes?” “how many permutations have at least one fixed point?” “how many strings avoid certain substrings?” All reduce to union counting. This section states the formula, gives the intuition, and shows how to implement it (often by iterating over subsets of conditions).

Real-World Analogy

You want to count how many people in a room speak English or Spanish or French. If you add “English speakers” + “Spanish” + “French,” you count anyone who speaks two languages twice, and anyone who speaks all three three times. So subtract the counts of “English and Spanish,” “English and French,” “Spanish and French.” Now those who speak all three were added three times and subtracted three times (once in each pair)—so add back “English and Spanish and French.” The result is the count of people who speak at least one of the three. That’s inclusion-exclusion: add singles, subtract pairs, add triples.

Formal Definition

Concept Note

For finite sets A₁, A₂, …, A_n, the size of their union is:

|A₁ ∪ A₂ ∪ … ∪ A_n| = Σ |A_i| − Σ |A_i ∩ A_j| + Σ |A_i ∩ A_j ∩ A_k| − … + (−1)ⁿ⁺¹ |A₁ ∩ … ∩ A_n|

Equivalently: for every non-empty subset S of {1, …, n}, take the intersection of A_i for i ∈ S; add (−1)^(|S|+1) times its size. So |∪ A_i| = Σ_{∅ ≠ S ⊆ {1..n}} (−1)^(|S|+1) · |∩_{i∈S} A_i|.

The “inclusion” is adding; the “exclusion” is subtracting to correct overcounts. For the union, odd-sized subsets add and even-sized subsets subtract. The signs flip only in the complement form, where the empty subset contributes the whole universe: count in none = Σ_S (−1)^|S| · |∩_{i∈S} A_i| over all S including ∅. Either way, each element is counted exactly once.

Why This Topic Matters

“Count numbers not divisible by any of …”: Let A_i = numbers divisible by p_i. Then “not divisible by any” = total − |A₁ ∪ … ∪ A_k|. Intersection of A_i for i ∈ S is “divisible by LCM of those p_i”—size ⌊N / LCM⌋.
Derangements: Permutations with no fixed point = n! − (permutations with at least one fixed point). The latter is inclusion-exclusion over “position i is fixed.”
Contest problems: Many “count valid configurations” or “count numbers with property P” use inclusion-exclusion over violating conditions.
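
The derangement count makes a compact worked case. For a set S of j positions, the permutations fixing every position in S number (n−j)!, and there are C(n, j) such sets, so D(n) = Σ_{j=0..n} (−1)^j · C(n, j) · (n−j)!. The sketch below compares that formula with a brute-force count; both function names are hypothetical.

```python
from itertools import permutations
from math import comb, factorial

def derangements(n: int) -> int:
    """D(n) = sum over j of (-1)^j * C(n, j) * (n - j)!  (complement form, j = 0 included)."""
    return sum((-1) ** j * comb(n, j) * factorial(n - j) for j in range(n + 1))

def derangements_brute(n: int) -> int:
    """Count permutations of 0..n-1 with no fixed point; only for small n."""
    return sum(
        1 for perm in permutations(range(n))
        if all(perm[i] != i for i in range(n))
    )

assert [derangements(n) for n in range(6)] == [1, 0, 1, 2, 9, 44]
assert all(derangements(n) == derangements_brute(n) for n in range(8))
```

The j = 0 term is n!, the total, so this sum is already “total minus union” written as a single alternating sum.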

Mental Model

We want |A₁ ∪ … ∪ A_n|. If we add all |A_i|, we overcount elements in more than one set. Subtract intersections of two sets; then we undercount elements in three sets. Add intersections of three sets; and so on. The alternating sum makes every element in the union contribute exactly 1 (proved by checking how many times an element in exactly r sets is counted: C(r,1) − C(r,2) + … + (−1)^(r+1) C(r,r) = 1).
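
The counting identity in the parenthesis can be checked directly for small r. A tiny sketch (times_counted is a hypothetical helper):

```python
from math import comb

def times_counted(r: int) -> int:
    """Net count of an element lying in exactly r sets: C(r,1) - C(r,2) + ... ± C(r,r)."""
    return sum((-1) ** (j + 1) * comb(r, j) for j in range(1, r + 1))

assert all(times_counted(r) == 1 for r in range(1, 20))
```

The identity follows from (1 − 1)^r = 0: the full alternating sum Σ_{j=0..r} (−1)^j C(r, j) is zero, so the terms with j ≥ 1 must cancel the j = 0 term, which is 1.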

Two and Three Sets

Two sets: |A ∪ B| = |A| + |B| − |A ∩ B|.

Three sets: |A ∪ B ∪ C| = |A| + |B| + |C| − |A∩B| − |A∩C| − |B∩C| + |A∩B∩C|.

Pattern: sum of singles, minus sum of pairs, plus sum of triples, …
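
The three-set formula can be checked on concrete Python sets. In this sketch the sets are multiples of 2, 3 and 5 below 60; union3_size is a hypothetical helper name.

```python
def union3_size(A: set, B: set, C: set) -> int:
    """|A ∪ B ∪ C| via inclusion-exclusion: singles - pairs + triple."""
    return (
        len(A) + len(B) + len(C)
        - len(A & B) - len(A & C) - len(B & C)
        + len(A & B & C)
    )

A = set(range(0, 60, 2))   # multiples of 2
B = set(range(0, 60, 3))   # multiples of 3
C = set(range(0, 60, 5))   # multiples of 5
assert union3_size(A, B, C) == len(A | B | C) == 44
```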

Step-by-Step: Applying Inclusion-Exclusion

Define the sets: Identify conditions (e.g., “divisible by p_i”). Let A_i be the set of elements satisfying condition i.
Decide what to count: Often we want “elements in none of the sets” = total − |∪ A_i|, or “elements in at least one” = |∪ A_i|.
Compute intersection sizes: For each non-empty subset S, compute |∩_{i∈S} A_i|. This is problem-specific (e.g., “divisible by LCM of primes in S” → ⌊N / LCM⌋).
Combine with alternating signs: |∪ A_i| = Σ (−1)^(|S|+1) · |∩_{i∈S} A_i| over non-empty S. Or “none” = total − that sum.

Example: Numbers Not Divisible by 2 or 3

Example

Count integers in [1, 100] not divisible by 2 or 3. A₁ = divisible by 2, A₂ = divisible by 3. |A₁| = 50, |A₂| = 33, and A₁ ∩ A₂ = divisible by 6, so |A₁ ∩ A₂| = 16. So |A₁ ∪ A₂| = 50 + 33 − 16 = 67. Numbers divisible by neither 2 nor 3 = 100 − 67 = 33. Check: every block of 6 consecutive integers contains exactly 2 such numbers (1, 5; 7, 11; …). There are 16 full blocks in 1..96, giving 32, and among the remaining 97..100 only 97 qualifies, so the total is 33 ✓.
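
The same number can be confirmed by a direct scan, which is a useful habit when testing an inclusion-exclusion solution on small inputs. A minimal sketch (count_none_brute is a hypothetical helper):

```python
def count_none_brute(N: int, divisors: list) -> int:
    """Count x in [1, N] that are divisible by none of the given divisors."""
    return sum(1 for x in range(1, N + 1) if all(x % d != 0 for d in divisors))

# Inclusion-exclusion by hand for divisors 2 and 3 ...
by_formula = 100 - (100 // 2 + 100 // 3 - 100 // 6)
# ... agrees with the direct scan.
assert by_formula == count_none_brute(100, [2, 3]) == 33
```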

Python Implementation (Subset Iteration)

Iterate over non-empty subsets of {0, 1, …, k−1} using bitmasks 1 to 2^k−1. For each subset S, compute the size of the intersection (problem-specific) and add (−1)^(|S|+1) × size to the result.

from typing import Callable

def inclusion_exclusion_union(n_conditions: int, intersection_size: Callable[[int], int]) -> int:
    """
    Returns |A0 ∪ A1 ∪ ... ∪ A_{k-1}|.
    intersection_size(mask) returns |∩_{i in S} A_i|, where the subset S
    is represented as a bitmask: bit i of mask is set iff i is in S.
    """
    k = n_conditions
    total = 0
    for mask in range(1, 1 << k):
        pop = bin(mask).count("1")
        sign = 1 if pop % 2 == 1 else -1
        total += sign * intersection_size(mask)
    return total

# Example: count 1..N not divisible by any prime in primes
def count_not_divisible(N: int, primes: list[int]) -> int:
    from math import lcm
    def inter_size(mask: int) -> int:
        prod = 1
        for i in range(len(primes)):
            if (mask >> i) & 1:
                prod = lcm(prod, primes[i])
                if prod > N:
                    return 0
        return N // prod
    return N - inclusion_exclusion_union(len(primes), inter_size)

intersection_size(mask) must return the size of the intersection of sets A_i for which bit i is set in mask. For “divisible by prime p_i,” the intersection is “divisible by LCM of selected primes,” so size = N // LCM (or 0 if LCM > N).

Line-by-Line Explanation

for mask in range(1, 1 << k) — Non-empty subsets: mask from 1 to 2^k−1; bit i set means set A_i is in the intersection.
pop = bin(mask).count("1") — |S| = number of sets in this intersection.
sign = 1 if pop % 2 == 1 else -1 — (−1)^|S|+1: odd |S| → +1, even → −1. So we add singles, subtract pairs, add triples, …
total += sign * intersection_size(mask) — Add signed size of this intersection.
In count_not_divisible: “not divisible by any” = N − |union|. inter_size(mask) computes LCM of primes in mask and returns N // LCM (or 0 if LCM > N).

Time and Space Complexity

We iterate 2^k − 1 subsets. For each subset we call intersection_size once. So O(2^k × cost of intersection_size). For the “not divisible by primes” example, each intersection is O(k) for LCM (or O(1) if we precompute LCMs). So total O(2^k · k) for the straightforward version. Space O(1) plus the cost of intersection_size. The subset loop is practical for k up to roughly 20 (2²⁰ ≈ 10⁶ subsets); for larger k, prune subsets whose LCM already exceeds N or use a different approach.
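
One way to prune is to build subsets recursively and stop extending a subset as soon as its LCM exceeds N, since every superset then contributes 0. The sketch below is one possible implementation of that idea, not the only one; count_not_divisible_pruned is a hypothetical name, and the divisors are assumed to be positive integers.

```python
from math import gcd
from typing import List

def count_not_divisible_pruned(N: int, divisors: List[int]) -> int:
    """Count x in [1, N] divisible by none of the divisors, skipping subsets with LCM > N."""

    def signed_sum(start: int, current_lcm: int, sign: int) -> int:
        # Signed sum of floor(N / lcm(S)) over subsets S that extend the current
        # subset with indices >= start. sign is +1 when the extended subset has odd size.
        total = 0
        for i in range(start, len(divisors)):
            d = divisors[i]
            new_lcm = current_lcm // gcd(current_lcm, d) * d
            if new_lcm > N:
                continue          # this subset and all its supersets contribute 0
            total += sign * (N // new_lcm)
            total += signed_sum(i + 1, new_lcm, -sign)
        return total

    return N - signed_sum(0, 1, 1)

assert count_not_divisible_pruned(100, [2, 3]) == 33
assert count_not_divisible_pruned(100, [2, 3, 5]) == 26
```

The worst case is still 2^k, but when N is small relative to the divisors most branches are cut after one or two levels.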

Edge Cases

No conditions (k = 0): Union of zero sets is empty; |∪| = 0. Or define “not in any” = all N elements.
Single set (k = 1): |A₁| = one term; no subtraction.
LCM exceeds N: Intersection “divisible by LCM” has size 0. N // LCM already evaluates to 0 in that case, so returning 0 early is an optimization that also stops the running LCM from growing needlessly.

Common Mistakes

Wrong sign: The formula is |∪| = Σ (−1)^(|S|+1) · |∩_S|. So odd-sized subsets add, even-sized subtract. Flipping every sign gives −|∪|, not the complement; “count in none” is total − |∪| (equivalently, include the empty subset and use the sign (−1)^|S|). Double-check which quantity you want.
Mishandling the empty subset: The union formula does not include the empty subset (the empty intersection would be the whole universe, and we are not adding that). So start mask from 1.
Wrong intersection meaning: For “divisible by p_i,” the intersection over S is “divisible by the LCM of p_i for i ∈ S,” not by the product. The product equals the LCM only when the values are pairwise coprime, which holds for distinct primes.

Common Mistake

Using the wrong sign for “count elements in none of the sets.” That count = Total − |∪ A_i|. So compute |∪| with inclusion-exclusion (add odd, subtract even), then subtract from total. Don’t flip the sign inside the sum unless you’re sure you’re computing the complement correctly.

Pattern Recognition

When the problem asks “count elements that satisfy none of the bad conditions” or “avoid all of these,” define A_i = “satisfies bad condition i.” Then “satisfies none” = total − |∪ A_i|. When it asks “count elements that satisfy at least one,” you want |∪ A_i| directly. The same subset loop works; only the interpretation (and possibly the final subtraction) changes.

Interview Insight

When the problem involves “count numbers not divisible by any of these” or “count permutations avoiding all of these positions,” say: “I’ll use inclusion-exclusion. Define A_i as the set satisfying condition i (e.g., divisible by p_i). Then |union| = sum over non-empty subsets of (−1)^(|S|+1) times the size of the intersection. I’ll iterate over subsets with a bitmask (1 to 2^k−1), compute the intersection size for each (e.g., N // LCM for the chosen primes), and add with the correct sign. For ‘none of the conditions’ I subtract the union from the total.”

Practice Problems

Count integers in [1, N] not divisible by any of 2, 3, 5 using inclusion-exclusion; verify for N = 100.
Count derangements of n (permutations with no fixed point): n! − (at least one fixed point); expand “at least one fixed point” with inclusion-exclusion over positions.
Given a list of primes, count coprime integers in [1, N] (coprime to product of primes) = same as “not divisible by any prime.”

Summary

Inclusion-exclusion: |A₁ ∪ … ∪ A_n| = Σ_{∅≠S} (−1)^(|S|+1) · |∩_{i∈S} A_i|. Add singles, subtract pairs, add triples, …
Implement by iterating non-empty subsets (e.g., mask 1 to 2^k−1); for each subset compute intersection size and add with sign (−1)^(|S|+1).
“Count in none” = total − |∪|. “Count in at least one” = |∪|. For “not divisible by any of primes,” intersection over S = numbers divisible by LCM(primes in S), size ⌊N / LCM⌋.
Time O(2^k × cost of intersection). Watch sign and empty subset.

4.17 Matrix Exponentiation for Recurrence

Introduction

Many linear recurrences (e.g., Fibonacci: F_n = F_{n−1} + F_{n−2}) can be written as a state vector updated by a fixed matrix: v_n = M · v_{n−1}. Then v_n = Mⁿ · v₀, so we get the n-th term by matrix exponentiation (compute Mⁿ with fast exponentiation from topic 4.6, using matrix multiplication from topic 4.12) and then multiplying by the initial vector. This gives the n-th term in O(d³ log n) time where d is the dimension of the state (e.g., d = 2 for Fibonacci), instead of O(n) with a naive loop. This section shows how to build the matrix from a recurrence and how to implement it in code (including modulo).

Real-World Analogy

Think of the recurrence as a state machine: at each step, the current state (e.g., “last two Fibonacci numbers”) is updated by a fixed rule. That rule is linear—it’s a matrix multiplying the state vector. Doing n steps means applying the matrix n times = Mⁿ. Fast exponentiation lets us compute Mⁿ in about log n “matrix multiplications,” so we jump from “step one by one” to “double the number of steps” each time.

Formal Definition

Concept Note

A linear recurrence of order d has the form F_n = c₁·F_{n−1} + c₂·F_{n−2} + … + c_d·F_{n−d}, together with initial values F₀, …, F_{d−1}. Define the state vector v_n = (F_n, F_{n−1}, …, F_{n−d+1})^T. Then there is a d×d matrix M such that v_n = M · v_{n−1}. Starting from the first fully known state v_{d−1} = (F_{d−1}, …, F₀)^T, this gives v_n = M^(n−d+1) · v_{d−1} for n ≥ d−1. The first entry of v_n is F_n.

The matrix M encodes the recurrence: the first row is (c₁, c₂, …, c_d); the rest of the rows shift the previous state (identity-like with a shift).

Why This Topic Matters

Fibonacci and similar: F_n in O(log n) instead of O(n). Essential when n is huge (e.g., 10¹⁸).
Linear recurrences in contests: Many problems give a recurrence of order 2 or 3; matrix exponentiation is the standard solution.
Counting paths: Number of walks of length n in a graph = (adjacency matrix)ⁿ; same idea (matrix power).
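
The path-counting remark can be checked on a tiny graph. This is a minimal sketch: the helper names mat_mul and mat_pow are local to the example (they are not the course's mat_mul_mod), and it works with exact integers rather than a modulus.

```python
def mat_mul(A, B):
    """Multiply two square matrices of the same size (exact integers)."""
    d = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(d)) for j in range(d)]
            for i in range(d)]


def mat_pow(M, exp):
    """M^exp by binary exponentiation; exp >= 0."""
    d = len(M)
    res = [[int(i == j) for j in range(d)] for i in range(d)]  # identity
    base = [row[:] for row in M]
    while exp:
        if exp & 1:
            res = mat_mul(res, base)
        base = mat_mul(base, base)
        exp >>= 1
    return res


# Triangle graph: vertices 0, 1, 2, every pair connected.
adj = [[0, 1, 1],
       [1, 0, 1],
       [1, 1, 0]]

walks = mat_pow(adj, 3)
print(walks[0][0])  # closed walks of length 3 starting at vertex 0: 2
print(walks[0][1])  # walks of length 3 from vertex 0 to vertex 1: 3
```

The two closed walks from vertex 0 are 0→1→2→0 and 0→2→1→0, which matches the top-left entry.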

Fibonacci as a 2×2 Matrix

F_n = F_n−1 + F_n−2. State: v_n = (F_n, F_n−1)^T. We want v_n from v_n−1 = (F_n−1, F_n−2)^T. So F_n = 1·F_n−1 + 1·F_n−2 and F_n−1 = 1·F_n−1 + 0·F_n−2. Hence:

  [ F_n     ]   [ 1  1 ]   [ F_{n-1} ]
  [ F_{n-1} ] = [ 1  0 ] · [ F_{n-2} ]

  So M = [[1,1],[1,0]] and v_n = M · v_{n-1}, which gives v_n = M^{n-1} · v_1 with v_1 = (F_1, F_0)^T = (1, 0)^T.
  F_n = first entry of M^{n-1} · (1, 0)^T = (M^{n-1})_{0,0} · 1 + (M^{n-1})_{0,1} · 0 = (M^{n-1})_{0,0}, the top-left entry of M^{n-1}.
  Check: n = 1 gives M^0 = I and I · (1, 0)^T = (1, 0)^T, so F_1 = 1. n = 2 gives M · (1, 0)^T = (1, 1)^T, so F_2 = 1.

So F_n = first component of Mⁿ⁻¹ · (1, 0)^T, or equivalently the top-left entry of Mⁿ⁻¹. For n = 0 we define F₀ = 0; handle separately.

Step-by-Step: Building the Matrix for a Recurrence

Write the recurrence: F_n = c₁·F_{n−1} + … + c_d·F_{n−d}.
State vector: v = (F_n, F_{n−1}, …, F_{n−d+1})^T (d components).
First row of M: (c₁, c₂, …, c_d). These are the coefficients of the recurrence.
Remaining rows (counting rows and columns from 1): row i, for 2 ≤ i ≤ d, has a 1 in column i−1 and 0 elsewhere, which shifts F_{n−1} into the second slot, and so on. In 0-based terms M is: row 0 = [c₁, c₂, …, c_d]; row 1 = [1, 0, …, 0]; row 2 = [0, 1, 0, …, 0]; …; row d−1 = [0, …, 0, 1, 0].
Initial vector: v_{d−1} = (F_{d−1}, F_{d−2}, …, F₀)^T. Then v_n = M^(n−d+1) · v_{d−1} for n ≥ d.
Compute M^(n−d+1) with binary exponentiation (matrix version); multiply by v_{d−1}; the first component is F_n.
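
The construction steps above can be sketched in a few lines. The names companion_matrix and step are hypothetical helpers for this illustration, and the example applies M one step at a time (no fast exponentiation yet) so the shifting of the state is easy to see.

```python
def companion_matrix(coeffs: list[int]) -> list[list[int]]:
    """Build the d x d matrix M for F_n = c1*F_{n-1} + ... + cd*F_{n-d}.

    `coeffs` is [c1, c2, ..., cd].
    """
    d = len(coeffs)
    M = [[0] * d for _ in range(d)]
    M[0] = list(coeffs)          # row 0: the recurrence coefficients
    for i in range(1, d):
        M[i][i - 1] = 1          # row i copies component i-1 of the old state
    return M


def step(M: list[list[int]], v: list[int]) -> list[int]:
    """One application of the recurrence: returns M · v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


# Tribonacci-style example: T_n = T_{n-1} + T_{n-2} + T_{n-3}, T_0..T_2 = 0, 0, 1
M = companion_matrix([1, 1, 1])
v = [1, 0, 0]                    # v_2 = (T_2, T_1, T_0)
for _ in range(3):
    v = step(M, v)               # v_3, v_4, v_5
print(M)                         # [[1, 1, 1], [1, 0, 0], [0, 1, 0]]
print(v)                         # [4, 2, 1] = (T_5, T_4, T_3)
```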

Python Implementation

Matrix Power (Modulo)

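def mat_pow_mod(M: list[list[int]], exp: int, mod: int) -> list[list[int]]:
    """Returns M^exp mod mod. M is square (d×d); exp must be >= 0."""
    d = len(M)
    if exp == 0:
        # M^0 is the identity, reduced mod `mod` (so mod = 1 gives all zeros)
        I = [[1 % mod if i == j else 0 for j in range(d)] for i in range(d)]
        return I
    base = [row[:] for row in M]  # copy so the caller's matrix is not mutated
    res = [[1 % mod if i == j else 0 for j in range(d)] for i in range(d)]
    while exp:
        if exp & 1:  # lowest bit set: fold the current power into the result
            res = mat_mul_mod(res, base, mod)
        base = mat_mul_mod(base, base, mod)  # square for the next bit
        exp >>= 1
    return res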

Assume mat_mul_mod(A, B, mod) from topic 4.12 (multiplies two matrices and reduces mod mod).
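
The course's own mat_mul_mod from topic 4.12 is not reproduced here. A minimal stand-in with the same assumed signature (A, B, mod) might look like the following; treat it as a sketch for running the code in this section, not as the exact helper from that topic.

```python
def mat_mul_mod(A: list[list[int]], B: list[list[int]], mod: int) -> list[list[int]]:
    """Return (A · B) mod `mod` for an n x k matrix A and a k x m matrix B."""
    n, k, m = len(A), len(B), len(B[0])
    C = [[0] * m for _ in range(n)]
    for i in range(n):
        for t in range(k):
            a = A[i][t]
            if a == 0:
                continue          # skip zero entries (common in shift matrices)
            row_b = B[t]
            for j in range(m):
                C[i][j] = (C[i][j] + a * row_b[j]) % mod
    return C


print(mat_mul_mod([[1, 1], [1, 0]], [[1, 1], [1, 0]], 1000))  # [[2, 1], [1, 1]]
```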

Fibonacci F_n mod m

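def fib_mod(n: int, m: int) -> int:
    """Returns F_n mod m (F_0 = 0, F_1 = 1); n <= 0 is treated as 0."""
    if n <= 0:
        return 0
    if n == 1:
        return 1 % m  # reduce so that m = 1 yields 0
    M = [[1, 1], [1, 0]]
    P = mat_pow_mod(M, n - 1, m)
    # v_n = P * (1, 0)^T; F_n = P[0][0]*1 + P[0][1]*0 = P[0][0]
    return P[0][0]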

General Linear Recurrence (Order 2)

F_n = a·F_{n−1} + b·F_{n−2}. Matrix M = [[a, b], [1, 0]]. With P = Mⁿ⁻¹ (n ≥ 1), v_n = P · (F₁, F₀)^T, so F_n = first component of P · (F₁, F₀)^T = P[0][0]*F₁ + P[0][1]*F₀ (reduce this final sum mod m when working modulo m).
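
As a self-contained sketch of the order-2 case, the function below computes F_n mod `mod` for given a, b, F₀, F₁. The name recurrence2_mod and the compact local mat_mul_mod are assumptions made for this example so that it runs on its own.

```python
def mat_mul_mod(A, B, mod):
    """Square matrix product reduced mod `mod` (local stand-in helper)."""
    d = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(d)) % mod for j in range(d)]
            for i in range(d)]


def recurrence2_mod(a: int, b: int, f0: int, f1: int, n: int, mod: int) -> int:
    """n-th term of F_n = a*F_{n-1} + b*F_{n-2} (mod `mod`), for n >= 0."""
    if n == 0:
        return f0 % mod
    # P = M^(n-1) by binary exponentiation, with M = [[a, b], [1, 0]]
    P = [[1 % mod, 0], [0, 1 % mod]]
    base = [[a % mod, b % mod], [1 % mod, 0]]
    exp = n - 1
    while exp:
        if exp & 1:
            P = mat_mul_mod(P, base, mod)
        base = mat_mul_mod(base, base, mod)
        exp >>= 1
    # F_n = P[0][0]*F_1 + P[0][1]*F_0
    return (P[0][0] * f1 + P[0][1] * f0) % mod


MOD = 10**9 + 7
print(recurrence2_mod(1, 1, 0, 1, 10, MOD))  # Fibonacci: 55
print(recurrence2_mod(2, 3, 1, 1, 5, MOD))   # 1, 1, 5, 13, 41, 121 -> 121
```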

Line-by-Line Explanation (mat_pow_mod)

if exp == 0 — M⁰ = identity matrix.
res = identity — Accumulator for the result; we multiply by base when the current bit of exp is 1.
while exp: if exp & 1: res = mat_mul_mod(res, base, mod) — Binary exponentiation: when the LSB of exp is 1, multiply res by base.
base = mat_mul_mod(base, base, mod); exp >>= 1 — Square base and shift exp (same as integer fast exponentiation).

Time and Space Complexity

Matrix multiplication (d×d): O(d³). Matrix power: O(log n) multiplications, so O(d³ log n) time. Space O(d²) for the matrices. For Fibonacci (d = 2), each multiplication has constant size, so the total is O(log n) as long as the modulus keeps the entries small. That is much better than O(n) with a loop when n is huge.

Edge Cases

n = 0 or n = 1: Handle before matrix power (F₀ = 0, F₁ = 1 for standard Fibonacci).
Negative n: Often undefined for recurrences; return 0 or handle as invalid.
mod = 1: Every value is 0 modulo 1, so the answer is 0. Make sure the base cases are reduced as well (return 1 % m rather than 1 for F₁).

Common Mistakes

Wrong matrix: The first row must be the recurrence coefficients in order (c₁, c₂, …). Rows below shift: (1,0,…,0), (0,1,0,…), …. Swapping rows or columns gives wrong results.
Wrong initial vector or exponent (index off by one): For Fibonacci the state is (F_n, F_{n−1}) and v_n = Mⁿ⁻¹ · v₁, so F_n is the first component of Mⁿ⁻¹ · v₁ and the exponent is n−1, not n. Check with n = 2: M¹ · (1,0)^T = (1,1)^T → F₂ = 1 ✓.

Common Mistake

Using Mⁿ instead of Mⁿ⁻¹ for Fibonacci. We have v_n = M · v_{n−1}, so v_n = Mⁿ⁻¹ · v₁. So the exponent in mat_pow_mod must be n−1 to get F_n. Using n would give the first component of v_{n+1}, i.e., F_{n+1}.

Optimization Insight

For d = 2 (Fibonacci-like), the matrix is 2×2; each multiply is O(1). So total O(log n). For d = 3 or 4, still very fast. When the recurrence has constant coefficients and we need a single term F_n for huge n, matrix exponentiation is the standard; when n is small (e.g., n < 10⁶), a simple loop may be simpler and cache-friendly.

Interview Insight

When asked “compute F_n (or the n-th term of a linear recurrence) for very large n,” say: “I’ll express the recurrence as a state vector updated by a matrix M. Then the n-th state is M^{n-1} times the initial vector. I’ll compute M^{n-1} with matrix binary exponentiation (same as integer fast exponentiation but with matrix multiply). Time O(d^3 log n). For Fibonacci, M = [[1,1],[1,0]], initial (1,0), and F_n is the first component of M^{n-1} * (1,0).”

Practice Problems

Implement mat_pow_mod and fib_mod(n, m); verify F_10 = 55, F_0 = 0, F_1 = 1.
Solve F_n = 2*F_{n-1} + 3*F_{n-2} with given F_0, F_1 using matrix exponentiation.
Count the number of ways to tile a 2×n board with 2×1 dominoes (recurrence: a_n = a_{n-1} + a_{n-2} with a_0 = a_1 = 1, so a_n = F_{n+1}: Fibonacci shifted by one).

Summary

Linear recurrence F_n = c₁·F_{n−1} + … + c_d·F_{n−d} can be written v_n = M · v_{n−1}; then v_n = M^(n−d+1) · v_{d−1}. First row of M is (c₁, …, c_d); below that, shift rows.
Fibonacci: M = [[1,1],[1,0]], v₁ = (1,0)^T; F_n = first component of Mⁿ⁻¹ · v₁ = (Mⁿ⁻¹)_{0,0}.
Compute M^exp with matrix binary exponentiation (same as fast exponentiation, with mat_mul_mod). Time O(d³ log n), space O(d²).
Handle n < d (base cases) separately. Use exponent n−1 (not n) for Fibonacci to get F_n.

5.1 Array Basics

Introduction

An array is a contiguous block of memory that stores a sequence of elements of the same type, each identifiable by an index. In Python, the built-in list is the primary “array” type: it is a dynamic array of object references (so it can hold mixed types), and it supports indexing (arr[i]), length (len(arr)), and dynamic growth (append). Arrays are the foundation of most data structures and algorithms: strings are arrays of characters, matrices are arrays of arrays, and most problems involve traversing or querying an array. This section covers what an array is, 0-based indexing, basic operations (access, update, traverse), and how Python lists behave so you can reason about time complexity and edge cases.

Real-World Analogy

Think of an array like a row of lockers or parking spots numbered 0, 1, 2, … Each slot holds one item. You can go directly to slot 5 (O(1) “access”) and read or replace what’s there. You can walk the row from start to end (traversal). The “address” of each slot is computed from the base address plus the index; that’s why access by index is constant time. Adding a new slot at the end is cheap (append, amortized O(1)); inserting in the middle or at the front requires shifting every later element, which is O(n) in a true array and in a Python list alike.

Formal Definition

Concept Note

An array of size n is a sequence of n elements stored in contiguous memory, indexed by integers 0 to n−1 (or 1 to n in 1-based indexing; we use 0-based). Access by index i is O(1) because the address of the i-th element is base + i × (size of one element). Length is typically stored, so len is O(1).

In a static array, the size is fixed at creation. In a dynamic array (like Python’s list), the size can grow (and sometimes shrink); append is amortized O(1), but insert at position 0 is O(n) because elements must be shifted.

Why This Topic Matters

Foundation: Almost every data structure (strings, heaps, graphs as adjacency lists) uses arrays. Two-pointer, sliding window, and prefix-sum techniques all operate on arrays.
Interviews: “Given an array of integers…” is the most common problem start. You must be comfortable with indexing, bounds, and traversal.
Complexity: Access O(1); search by value O(n); insert at end amortized O(1); insert at front or middle O(n). Choosing the right structure (array vs linked list) depends on these costs.

Mental Model

Picture a row of boxes numbered 0, 1, …, n−1. Each box holds one value. “arr[i]” means “open box i.” “len(arr)” is the number of boxes. Traversal is “visit each box in order.” Slicing arr[start:end] is “the segment from box start up to (but not including) box end.” Negative index −1 means “last box,” −2 means “second to last,” and so on.

Indexing and Slicing in Python

0-based: First element is arr[0], last is arr[len(arr)−1] or arr[−1].
Negative indices: arr[−1] is the last element, arr[−2] the second-to-last; arr[−i] is the same as arr[len(arr)−i] for 1 ≤ i ≤ len(arr).
Slicing: arr[start:end] gives elements from index start to end−1 (end excluded). arr[:end] means start=0; arr[start:] means end=len(arr). arr[::step] can reverse (step=−1) or skip elements. A slice is a new list (copying k elements costs O(k)), and out-of-range slice bounds are clamped instead of raising IndexError.

arr = [10, 20, 30, 40, 50]
arr[0]    # 10
arr[-1]   # 50
arr[1:4]  # [20, 30, 40]
arr[::-1] # [50, 40, 30, 20, 10] — reverse

Basic Operations

Operation	Python	Time
Access by index	`arr[i]`	O(1)
Update	`arr[i] = x`	O(1)
Length	`len(arr)`	O(1)
Append	`arr.append(x)`	O(1) amortized
Insert at position	`arr.insert(i, x)`	O(n)
Search by value	`x in arr`	O(n)

Traversal

# By index
for i in range(len(arr)):
    print(arr[i])

# By value
for x in arr:
    print(x)

# With index and value
for i, x in enumerate(arr):
    print(i, x)

Edge Cases

Empty array: arr = []; len(arr) = 0. Accessing arr[0] raises IndexError. Check len(arr) or use “if arr” before indexing.
Index out of bounds: arr[i] raises IndexError when i ≥ len(arr) or i < −len(arr). Negative indices from −1 down to −len(arr) are valid and count from the end (−1 is valid if len ≥ 1).
Single element: arr[0] and arr[−1] are the same; no special case needed.
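
A short illustration of these bounds rules, written as a sketch with throwaway variable names:

```python
arr = [10, 20, 30]

print(arr[-3])        # 10: -len(arr) is the smallest valid negative index
print(arr[5:10])      # []: out-of-range slices are clamped, no error

for bad in (3, -4):   # one position past each end
    try:
        arr[bad]
    except IndexError as exc:
        print(bad, "->", exc)

empty = []
first = empty[0] if empty else None   # guard before indexing an empty list
print(first)          # None
```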

Common Mistakes

Off-by-one: Valid indices are 0 to len(arr)−1. Loop “for i in range(len(arr))” gives 0..len−1. Using range(1, len(arr)) skips the first element; range(len(arr)+1) causes IndexError on the last iteration.
Modifying list while iterating: Removing or inserting elements during “for x in arr” can skip elements or cause errors. Iterate over a copy (e.g., arr[:]) or use indices and adjust.
Assuming append returns the list: arr.append(x) returns None and mutates arr. Don’t write result = arr.append(x) and expect result to be the new list.
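
The last two mistakes are easy to reproduce. The snippet below is a small demonstration with made-up data: it shows an element being skipped when the list is mutated during iteration, a safer rebuild with a list comprehension, and the None returned by append.

```python
# Goal: remove every even number from the list.
buggy = [2, 4, 6, 1]
for x in buggy:
    if x % 2 == 0:
        buggy.remove(x)    # shifts later elements left while the loop index advances
print(buggy)               # [4, 1]: the 4 was skipped

# Safer: build a new list instead of mutating the one being iterated.
nums = [2, 4, 6, 1]
odds = [x for x in nums if x % 2 != 0]
print(odds)                # [1]

# append mutates in place and returns None.
result = nums.append(7)
print(result, nums)        # None [2, 4, 6, 1, 7]
```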

Common Mistake

Using 1-based logic when the language is 0-based. “First element” is arr[0], “second” is arr[1]. “Element at position i” in problem statements often means 1-based; convert to index i−1 when accessing, and note in a comment that the input position is 1-based.

Interview Insight

When given “an array of n integers,” clarify: “0-indexed? Can be empty? Sorted or unsorted? Any duplicates?” Then state: “I’ll use a list; access and update by index are O(1). I’ll traverse with for i in range(len(arr)) when I need the index, or for x in arr when I only need values. For in-place changes I’ll be careful not to modify while iterating.”

Practice Problems

Given an array, return the maximum element and its index (one pass).
Reverse an array in place (swap arr[i] and arr[n−1−i] for i in range(len(arr)//2)).
Check if an array is palindrome (compare arr[i] and arr[n−1−i]).

Summary

An array is a contiguous sequence of elements indexed 0 to n−1. Access and update by index are O(1).
Python list: append amortized O(1), insert O(n), “x in arr” O(n). Use 0-based and negative indices (arr[−1] = last).
Traverse with for i in range(len(arr)), for x in arr, or enumerate(arr). Watch empty array and index bounds.
Off-by-one and “modify while iterating” are common bugs; clarify 0 vs 1-based when the problem says “position.”

5.2 Searching

Introduction

Searching in an array means finding whether a given value (the target) exists and, often, at which index. The right search strategy depends on whether the array is sorted or unsorted. For an unsorted array, you have no choice but to scan elements until you find the target or reach the end—linear search, O(n). For a sorted array, you can repeatedly eliminate half of the remaining elements—binary search, O(log n). This section builds both from scratch, explains why binary search works only when the array is sorted, and shows how to implement "first index of," "last index of," and related variants so you can handle interview problems confidently.

Real-World Analogy

Imagine finding a name in a phone book (sorted A–Z). You don't read every page: you open the middle, see "M," and know the name is either in the first half or the second half. You throw away one half and repeat. That's binary search—each step halves the search space. Now imagine a pile of unsorted receipts. To find one with a certain date, you must look through them one by one until you find it or run out. That's linear search. The structure of the data (sorted vs not) determines which strategy is possible and how fast it can be.

Example

Given arr = [3, 7, 2, 9, 1] (unsorted), finding 7 requires checking 3, then 7—you might get lucky, but in the worst case you check all 5. Given arr = [1, 2, 3, 7, 9] (sorted), you compare 7 to the middle (3), then to the right half's middle (7), and you're done in 2 steps. No need to look at 1 or 2 or 9.

Formal Definition

Concept Note

Search problem: Given an array A of size n and a target value x, determine if x is in A and optionally return an index i such that A[i] = x. Linear search examines elements in order until a match or end of array; worst-case time O(n), space O(1). Binary search (for sorted arrays) repeatedly compares the target to the middle element and discards half of the remaining range; worst-case time O(log n), space O(1) (iterative) or O(log n) (recursive stack).

Binary search requires a sorted array (or an array that can be treated as sorted with a predicate). The "discard half" step is valid only when you know that all elements in one half cannot contain the target—which follows from the ordering.

Why This Topic Matters

Interviews: "Find the index of target in a sorted array" is a classic. You must code binary search correctly (bounds, loop condition, when to return) and know first/last occurrence variants.
Building block: Binary search on answer (Topic 5.9), search in rotated array, and many optimization problems use the same "narrow the range" idea.
Complexity: Going from O(n) to O(log n) when the array is sorted is a huge win for large n. Knowing when you can use binary search (sorted + comparable) is essential.

Mental Model

Linear search: "Walk from index 0 to n−1; stop when you see the target or run out." Binary search: "Keep a range [left, right] where the target might be. While the range is non-empty, look at the middle element. If it equals the target, you're done. If it's less than the target, the target must be in the right half (if at all). If it's greater, the target must be in the left half. Update the range and repeat." The key is that "left" and "right" are indices, and you always shrink the range; the loop exits when left > right (range is empty) or when you find the target.

Linear Search (Unsorted or Any Array)

Algorithm

For each index i from 0 to n−1:
If arr[i] == target, return i (or True).
If the loop finishes without returning, the target is not present; return −1 (or False).

Python Implementation

def linear_search(arr, target):
    for i in range(len(arr)):
        if arr[i] == target:
            return i
    return -1

Time O(n): in the worst case we check every element. Space O(1). This is optimal for an unsorted array—you cannot avoid looking at every element in the worst case (the target might be last or absent).

Python built-ins: target in arr returns True/False (same idea, O(n)). arr.index(target) returns the first index or raises ValueError; also O(n).

Binary Search (Sorted Array)

Why It Works

If the array is sorted in non-decreasing order, then for any index mid, every element to the left is ≤ arr[mid] and every element to the right is ≥ arr[mid]. So when we compare target to arr[mid]:

If target == arr[mid], we found it.
If target < arr[mid], the target cannot be at mid or to the right; search only [left, mid−1].
If target > arr[mid], the target cannot be at mid or to the left; search only [mid+1, right].

Each step removes at least half of the remaining indices, so after O(log n) steps the range is empty or we find the target.

ASCII Diagram: Binary Search Step

  Sorted array:  [ 2,  5,  7,  9, 12, 15 ]   target = 9
  Index:          0   1   2   3   4   5

  Step 1: left=0, right=5, mid=2 → arr[2]=7 < 9 → search right half
          [ 2,  5,  7, | 9, 12, 15 ]
                       ↑
  Step 2: left=3, right=5, mid=4 → arr[4]=12 > 9 → search left half
          [ 9, 12, 15 ]
            ↑
  Step 3: left=3, right=3, mid=3 → arr[3]=9 == 9 → found at index 3.

Standard Binary Search (Any Occurrence)

def binary_search(arr, target):
    left, right = 0, len(arr) - 1
    while left <= right:
        mid = (left + right) // 2
        if arr[mid] == target:
            return mid
        if arr[mid] < target:
            left = mid + 1
        else:
            right = mid - 1
    return -1

Line-by-Line Explanation

left <= right: When left == right, the range has one element; we must check it. So the loop condition is left <= right. Exiting when left > right means the range is empty—target not found.
mid = (left + right) // 2: Middle index (integer division). In Python this is safe because integers do not overflow; in fixed-width languages, left + right can overflow, so use left + (right - left) / 2 there.
If arr[mid] < target, every element at index ≤ mid is too small, so set left = mid + 1. If arr[mid] > target, every element at index ≥ mid is too large, so set right = mid - 1.

First Occurrence (Leftmost Index)

When duplicates are allowed, "find the first index where arr[i] == target" requires a small change: when arr[mid] == target, don't return yet—remember mid as a candidate and continue searching the left half (there might be an earlier occurrence).

def first_index(arr, target):
    left, right = 0, len(arr) - 1
    result = -1
    while left <= right:
        mid = (left + right) // 2
        if arr[mid] == target:
            result = mid
            right = mid - 1   # keep looking left
        elif arr[mid] < target:
            left = mid + 1
        else:
            right = mid - 1
    return result

When we find a match, we shrink the range to [left, mid−1] to see if there's another match to the left. If not, result holds the leftmost index we saw.

Last Occurrence (Rightmost Index)

Similarly, for the last occurrence: when arr[mid] == target, set result = mid and search the right half with left = mid + 1.

def last_index(arr, target):
    left, right = 0, len(arr) - 1
    result = -1
    while left <= right:
        mid = (left + right) // 2
        if arr[mid] == target:
            result = mid
            left = mid + 1    # keep looking right
        elif arr[mid] < target:
            left = mid + 1
        else:
            right = mid - 1
    return result

Evolution: Brute Force → Linear → Binary

Approach	When	Time	Space
Linear search	Unsorted or one-off	O(n)	O(1)
Binary search	Sorted array	O(log n)	O(1)

Optimization Insight

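If the array is sorted, prefer binary search over linear search: O(log n) vs O(n). If you need to search the same unsorted array many times, consider sorting once (O(n log n)) and then doing k binary searches (k × O(log n)); the total O((n + k) log n) beats k linear searches (O(kn)) once k grows beyond roughly log n. Sorting changes the indices, so this suits membership or counting queries rather than "index in the original array."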

Time and Space Complexity

Linear search: Time O(n), space O(1).
Binary search (iterative): Time O(log n)—each step halves the range, so at most ⌈log₂(n+1)⌉ iterations. Space O(1).
Binary search (recursive): Same time O(log n), but space O(log n) for the call stack.
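
For comparison with the iterative version, a recursive variant might look like this. It is a sketch (the name binary_search_recursive is chosen for this example); each call adds a stack frame, which is where the O(log n) space comes from.

```python
def binary_search_recursive(arr, target, left=0, right=None):
    """Return an index of `target` in sorted `arr`, or -1 if absent."""
    if right is None:
        right = len(arr) - 1
    if left > right:                 # empty range: not found
        return -1
    mid = (left + right) // 2
    if arr[mid] == target:
        return mid
    if arr[mid] < target:            # discard the left half, including mid
        return binary_search_recursive(arr, target, mid + 1, right)
    return binary_search_recursive(arr, target, left, mid - 1)


data = [2, 5, 7, 9, 12, 15]
print(binary_search_recursive(data, 9))    # 3
print(binary_search_recursive(data, 4))    # -1
print(binary_search_recursive([], 1))      # -1
```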

Edge Cases

Empty array: Linear: loop doesn't run, return −1. Binary: left=0, right=-1, so left <= right is false; return −1.
Target not present: Both return −1 (or your chosen sentinel).
Single element: Linear: one comparison. Binary: one iteration, left==right, check arr[mid].
All same value: Linear finds the first. Standard binary finds any; first_index/last_index give the correct boundary.
Unsorted array: Binary search is wrong—it can miss the target or return an arbitrary index. Always ensure the array is sorted (or use a predicate that preserves the "discard half" property) before using binary search.

Common Mistakes

Using binary search on an unsorted array: Binary search assumes sorted order. If the array isn't sorted, use linear search or sort first.
Off-by-one in loop condition: Use left <= right so that when left == right you still check that single element. Using left < right can skip the last candidate.
Wrong mid update: When arr[mid] < target, the target is in the right half, so left = mid + 1. When arr[mid] > target, right = mid - 1. Don't set left = mid or right = mid without ±1, or the range might not shrink and you can get an infinite loop.
Integer overflow for mid: In C/Java, mid = (left + right) / 2 can overflow for very large indices. Use mid = left + (right - left) / 2. In Python, (left + right) // 2 is fine.

Common Mistake

Writing while left < right and then returning left or right without verifying that arr[left] == target. The "find insertion point" variant (as in bisect) uses a half-open range [left, right) with right starting at len(arr), loops while left < right, and returns left; the "find exact match" variant should use left <= right and return when arr[mid] == target, or −1 when the loop exits.
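
To make the contrast concrete, here is a sketch of the insertion-point variant and of the check it needs before being used as an exact-match search. The names lower_bound and find_exact are chosen for this example.

```python
def lower_bound(arr, target):
    """First index i with arr[i] >= target, or len(arr) if none (arr sorted)."""
    left, right = 0, len(arr)        # half-open range [left, right)
    while left < right:
        mid = (left + right) // 2
        if arr[mid] < target:
            left = mid + 1           # answer is strictly right of mid
        else:
            right = mid              # mid itself may be the answer
    return left


def find_exact(arr, target):
    """Index of the first `target`, or -1; verifies the insertion point."""
    i = lower_bound(arr, target)
    return i if i < len(arr) and arr[i] == target else -1


data = [1, 3, 3, 5, 8]
print(lower_bound(data, 3))   # 1
print(lower_bound(data, 4))   # 3 (insertion point; 4 is not present)
print(find_exact(data, 4))    # -1
print(find_exact(data, 9))    # -1 (insertion point is len(data), out of range)
```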

Python Built-ins: bisect

The bisect module provides binary search for sorted lists:

bisect.bisect_left(arr, target): leftmost index where arr[i] >= target, or len(arr) if there is none (insertion point to keep sorted). If target is present, this is the first occurrence.
bisect.bisect_right(arr, target) (or bisect.bisect): leftmost index where arr[i] > target, or len(arr) if there is none (insertion point after any entries equal to target). If target is present, this is one past its last occurrence.

So "first index of target" is i = bisect_left(arr, target), valid only if i < len(arr) and arr[i] == target; "last index" is j = bisect_right(arr, target) - 1, valid only if j >= 0 and arr[j] == target. Count of target = bisect_right(arr, target) - bisect_left(arr, target).
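
A small sketch of these identities using the standard bisect module; the wrapper name first_last_count is made up for this example.

```python
from bisect import bisect_left, bisect_right


def first_last_count(arr, target):
    """Return (first_index, last_index, count) of `target` in sorted `arr`.

    Indices are -1 when the target is absent.
    """
    lo = bisect_left(arr, target)
    hi = bisect_right(arr, target)
    if lo == hi:                  # empty range: target not present
        return -1, -1, 0
    return lo, hi - 1, hi - lo


data = [1, 2, 2, 2, 5, 9]
print(first_last_count(data, 2))   # (1, 3, 3)
print(first_last_count(data, 4))   # (-1, -1, 0)
print(first_last_count([], 4))     # (-1, -1, 0)
```

Comparing lo and hi covers both checks at once: the range is empty exactly when the target does not occur.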

Interview Insight

Clarify: "Is the array sorted? Can there be duplicates? What should I return if the target is not found—−1 or something else?" Then implement binary search with left <= right, correct updates for left/right, and handle the "first/last index" variant if asked. Mention bisect in Python if the problem is just "find index" and you're allowed to use the standard library.

Practice Problems

Binary search: Sorted array, return index of target or −1.
First and last position: Sorted array with duplicates; return [first_index, last_index] of target or [−1, −1].
Search insert position: Sorted array, return the index where target would be inserted to keep order (same as bisect_left).
Count occurrences: Sorted array, count how many times target appears (last_index − first_index + 1, or use bisect).

Summary

Linear search: Scan from 0 to n−1; O(n) time, O(1) space. Use for unsorted arrays or when you need a simple one-off check.
Binary search: Requires sorted array. Maintain [left, right]; compare target to middle; discard half each time. O(log n) time, O(1) space (iterative).
Use left <= right and mid = (left+right)//2; update left = mid+1 or right = mid−1 so the range always shrinks.
First occurrence: when match, search left (right = mid - 1). Last occurrence: when match, search right (left = mid + 1).
Python: in and index() are linear. For sorted lists, use bisect_left / bisect_right for insertion position and range of target.

5.3 Insertion & Deletion

Introduction

Insertion means adding a new element at a given position; deletion means removing an element (by index or by value). Because array elements are stored in contiguous memory, inserting or deleting in the middle (or at the front) forces the rest of the elements to shift—that’s why these operations are O(n) in the worst case. Appending at the end is the exception: no shift is needed, so it’s O(1) amortized in a dynamic array like Python’s list. This section covers exactly when and why shifting happens, how to implement insertion and deletion correctly, and how to reason about time complexity so you can choose the right structure (array vs linked list) when the problem involves many middle insertions or deletions.

Real-World Analogy

Imagine a row of cars in a parking lot with no gaps. To add a car at spot 2, you must move the car currently at 2 (and every car after it) one spot forward to make room—that’s insertion and shifting. To remove the car at spot 2, you must move every car after it one spot backward to close the gap—that’s deletion and shifting. Adding a car at the end of the row doesn’t require moving anyone. The “contiguous, no gaps” rule is what makes middle insertions and deletions expensive; a linked list is like having each car point to the next, so you can insert or remove by changing pointers without moving everyone.

Example

arr = [10, 20, 30, 40]. Insert 25 at index 2: we need [10, 20, 25, 30, 40]. Elements at indices 2 and 3 (30, 40) shift right. Delete element at index 1 (20): we need [10, 30, 40]. Elements at indices 2 and 3 shift left. In both cases, the number of elements that move is proportional to the number of positions after the insertion or deletion point—hence O(n) in the worst case.

Formal Definition

Concept Note

Insertion at position i: Add a new element at index i. All elements at indices ≥ i must move one position to the right (and the array may need to be reallocated if it is full). That is n−i shifts, which is n in the worst case (i = 0), so O(n) time. Deletion at position i: Remove the element at index i. All elements at indices > i must move one position to the left. That is n−1−i shifts, at most n−1, so O(n) time. Append (insert at end): No shift; O(1) amortized. Space for in-place operations is O(1) extra; dynamic arrays may use extra space for growth.

In a static array (fixed size), insertion is impossible once the array is full, and deletion cannot release the slot: it shifts or overwrites elements and tracks a smaller logical length. In a dynamic array (Python list), the structure can grow and shrink; the implementation hides reallocation, but the shift cost remains when inserting or deleting anywhere other than the end.

Why This Topic Matters

Complexity reasoning: You must know that append is cheap and insert(0, x) or insert(i, x) for small i is expensive. Same for pop(0) vs pop().
Choosing data structures: If the problem has many insertions/deletions at the front or middle, an array (list) may be the wrong choice—deque or linked list can offer O(1) at ends or O(1) at a known node.
Interviews: “Implement a list that supports insert and delete” or “why is insert at front slow?”—you need to explain shifting and give the correct big-O.

Mental Model

Picture the array as a row of slots. Insert at i: Make room by shifting everything from i to the end one step to the right, then write the new element at i. Delete at i: Remove the element at i and shift everything from i+1 to the end one step to the left. The “shift” is a loop that copies elements; the cost is proportional to how many elements lie after the insertion or deletion point.

Insertion

Insert at End (Append)

No shifting: the new element goes into the next available slot. In Python, arr.append(x) is O(1) amortized (occasional reallocation when capacity is exceeded, but amortized constant).

arr = [10, 20, 30]
arr.append(40)   # arr → [10, 20, 30, 40]

Insert at Position i

Elements at indices i, i+1, …, n−1 must move one place right; then write the new element at i. So we need space for one more element (dynamic array handles this) and a loop that copies from right to left to avoid overwriting.

  Before:  [ 10, 20, 30, 40 ]   insert 25 at index 2
  Index:      0   1   2   3

  Step 1: Shift right from index 2 onward, starting at the end (40 moves from index 3 to 4, then 30 from index 2 to 3)
  Step 2: Write 25 at index 2
  After:   [ 10, 20, 25, 30, 40 ]
                       ↑ new

Python: arr.insert(i, x) inserts x at index i; all elements at i and beyond shift right. Time O(n).

arr = [10, 20, 30, 40]
arr.insert(2, 25)   # arr → [10, 20, 25, 30, 40]

Insert at beginning (insert(0, x)) shifts all n elements—O(n). Insert at end is same as append—O(1) amortized.

Deletion

Delete by Index

Remove the element at index i; elements at i+1, …, n−1 shift left by one. Time O(n) because up to n−1 elements may move.

  Before:  [ 10, 20, 30, 40 ]   delete index 1
  After:   [ 10, 30, 40 ]
               ↑ 30 and 40 moved left

Python: arr.pop(i) removes and returns the element at index i (default i = last, so pop() is O(1) at end). pop(0) is O(n)—shifts all remaining elements.

arr = [10, 20, 30, 40]
arr.pop(1)    # returns 20, arr → [10, 30, 40]
arr.pop()     # returns 40, arr → [10, 30]   (pop from end, O(1))

Delete by Value

Find the first occurrence of the value (O(n) search) and remove it (shift the rest left, O(n)). Total O(n). Python: arr.remove(x) removes the first occurrence of x; raises ValueError if not found.

arr = [10, 20, 20, 30]
arr.remove(20)   # removes first 20 → [10, 20, 30]

Step-by-Step: Manual Insert and Delete (Conceptual)

Insert x at index i (without using insert):

Ensure capacity (or append a dummy so length increases by 1).
For j from n−1 down to i: set arr[j+1] = arr[j]. (Shift right from the end so we don’t overwrite.)
Set arr[i] = x.

Delete at index i (without using pop):

For j from i to n−2: set arr[j] = arr[j+1]. (Shift left.)
Decrease length by 1 (or remove last slot).
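
The two procedures above can be sketched directly on a Python list. The function names manual_insert and manual_delete are hypothetical; the code uses append only to grow the list by one slot and del only to drop the last slot, so the shifting loops are explicit.

```python
def manual_insert(arr: list, i: int, x) -> None:
    """Insert x at index i (0 <= i <= len(arr)) by shifting, in place."""
    n = len(arr)
    arr.append(None)                 # grow by one slot (dummy value)
    for j in range(n - 1, i - 1, -1):
        arr[j + 1] = arr[j]          # shift right, starting from the end
    arr[i] = x


def manual_delete(arr: list, i: int):
    """Remove and return arr[i] (0 <= i < len(arr)) by shifting, in place."""
    removed = arr[i]
    for j in range(i, len(arr) - 1):
        arr[j] = arr[j + 1]          # shift left to close the gap
    del arr[-1]                      # drop the now-duplicated last slot
    return removed


nums = [10, 20, 30, 40]
manual_insert(nums, 2, 25)
print(nums)                          # [10, 20, 25, 30, 40]
print(manual_delete(nums, 1), nums)  # 20 [10, 25, 30, 40]
```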

Python Implementation Summary

# Insertion
arr.append(x)           # end, O(1) amortized
arr.insert(i, x)        # at index i, O(n)

# Deletion
arr.pop()               # remove last, O(1)
arr.pop(i)              # remove at index i, O(n)
arr.remove(x)           # remove first occurrence of x, O(n)

# Length changes
len(arr)                # after insert +1, after delete −1

Line-by-Line Notes

insert(0, x) and pop(0) are O(n)—avoid in a loop or use collections.deque for O(1) at both ends.
remove(x) only removes the first match. To remove all occurrences, either loop (careful: indices change) or build a new list with a list comprehension.
Neither insert nor append nor pop returns the list; they mutate and return the element (pop) or None (insert/append/remove).

Evolution: Many Insertions/Deletions

Scenario	Best choice	Why
Insert/delete only at end	List (array)	append/pop() are O(1) amortized.
Insert/delete at front (e.g. queue from front)	deque	appendleft/popleft O(1); list insert(0)/pop(0) O(n).
Insert/delete in middle by index	List or linked list	List O(n) per op; linked list O(1) if you have the node (but no random access).

Optimization Insight

If you are building a list and only ever append, a dynamic array is optimal. If you need to remove from the front frequently (e.g. queue), use collections.deque so that popleft is O(1). If you need many middle insertions and you have a reference to the node, a linked list avoids shifting—but you lose O(1) access by index.

Time Complexity

append(x): O(1) amortized.
insert(i, x): O(n); worst when i=0 (shift all).
pop(): O(1).
pop(i): O(n); worst when i=0.
remove(x): O(n) (find + shift).

Space Complexity

In-place insertion/deletion: O(1) extra space (aside from the list’s own storage). The list may over-allocate capacity so that appends stay O(1) amortized; the exact growth policy is implementation-dependent.

Edge Cases

Empty list: pop() and pop(i) raise IndexError; remove(x) raises ValueError. Check if arr before popping.
Single element: pop(0) or pop() both remove the only element and leave an empty list.
Index out of range: insert(len(arr), x) is valid (same as append). In Python, insert never raises for an out-of-range index: i > len(arr) is clamped to the end, and a negative index past the front is clamped to index 0. pop(i) for i ≥ len(arr) raises IndexError.
Value not present: remove(x) raises ValueError if x not in list. Check if x in arr first if you need to avoid the exception.
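These edge cases can be checked with a short sketch. safe_pop and discard are hypothetical helper names introduced here for illustration; they only wrap the built-in list methods.

```python
arr = [1, 2, 3]
arr.insert(100, 4)      # index past the end is clamped: behaves like append
arr.insert(-100, 0)     # index far below zero is clamped to the front
# arr is now [0, 1, 2, 3, 4]


def safe_pop(lst, i=-1, default=None):
    """Pop index i, or return default if the list is empty or i is invalid."""
    try:
        return lst.pop(i)
    except IndexError:
        return default


def discard(lst, x):
    """Remove the first x if present; report whether anything was removed."""
    try:
        lst.remove(x)
        return True
    except ValueError:
        return False
```

Catching the exception avoids the second O(n) scan that an "x in arr" check followed by remove(x) would perform.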

Common Mistakes

Using insert(0, x) or pop(0) in a loop: That’s O(n) per call, so k operations become O(kn). Use deque for queue-like behavior.
Removing while iterating: Deleting elements in a for x in arr loop skips elements (index shifts). Iterate over a copy (e.g. for x in arr[:]) or use a while loop and adjust index when you remove.
Assuming remove removes all occurrences: It removes only the first. To remove all, use list comprehension arr = [a for a in arr if a != x] or a loop over a copy.
Expecting insert/append to return the list: They return None; the list is modified in place.
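The "removing while iterating" mistake is easy to reproduce. The sketch below contrasts a buggy loop with a list-comprehension fix; both function names are hypothetical.

```python
def remove_all_buggy(arr, x):
    """Removing inside a for loop over the same list skips elements."""
    for a in arr:
        if a == x:
            arr.remove(x)     # shifts later elements left under the iterator
    return arr


def remove_all(arr, x):
    """Rebuild in one pass; slice assignment keeps the same list object."""
    arr[:] = [a for a in arr if a != x]
    return arr
```

On [1, 2, 2, 3] with x = 2, the buggy version leaves one 2 behind because the second 2 slides into the position the iterator has already passed.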

Common Mistake

Building a list by repeatedly inserting at the front: result.insert(0, x) in a loop makes each insertion O(n), so total O(n²). Instead, append in the loop and reverse at the end (result.reverse() or result[::-1]), or use a deque and appendleft, then convert to list if needed.
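Both alternatives can be sketched as follows. The function names are hypothetical, and each produces the same order that repeated insert(0, x) would.

```python
from collections import deque


def front_build_append(items):
    """Append in order, then reverse once: O(n) total."""
    result = []
    for x in items:
        result.append(x)      # O(1) amortized per element
    result.reverse()          # single O(n) pass at the end
    return result


def front_build_deque(items):
    """Push each element to the front of a deque: O(n) total."""
    result = deque()
    for x in items:
        result.appendleft(x)  # O(1) per element
    return list(result)
```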

Pattern Recognition

When a problem involves “add element” or “remove element”:

If only at the end: list append/pop is fine.
If at the front: consider deque (Topic 9.7).
If in the middle and you need to preserve order: list insert/pop is O(n) per op; acceptable for small n or few operations.

Interview Insight

When asked “how do you insert/delete in an array?”, explain the shift: “Insert at i shifts elements from i to the end right; delete at i shifts elements from i+1 to the end left. So both are O(n) in the worst case. Append and pop from the end are O(1). If the problem needs many front operations, I’d use a deque.” Mention that remove-by-value is O(n) because it combines search and shift.

Practice Problems

Implement “insert at index i” and “delete at index i” on a list without using insert/pop (use a loop to shift).
Remove all occurrences of value x from a list in place (one pass with two pointers or build new list and assign back).
Merge two sorted arrays into one sorted array (compare and push to result; no “insert in middle” needed—just append).

Summary

Insertion at i: Shift elements from i right, then write. O(n). Append is O(1) amortized.
Deletion at i: Shift elements from i+1 left. O(n). Pop from end is O(1).
Python: insert(i,x), pop(i), remove(x); avoid insert(0,x) and pop(0) in a loop—use deque for O(1) at both ends.
Don’t remove (or insert) while iterating over the same list; use a copy or index-based loop with care.

5.4 Two Pointer Technique

Introduction

The two pointer technique uses two indices (or pointers) that move through an array—often from opposite ends or both from the start—to solve a problem in one pass with O(n) time and O(1) extra space. Instead of nested loops (e.g. checking every pair), you move the pointers based on the current values and the problem condition, so each element is considered at most a constant number of times. Typical uses: finding a pair with a given sum in a sorted array, removing duplicates in place, checking if a sequence is a palindrome, or partitioning (e.g. move zeros to the end). This section covers the main patterns (converging pointers, same-direction pointers) and when to apply each.

Real-World Analogy

Imagine two people walking toward each other from opposite ends of a corridor. They meet in the middle. If the corridor is sorted by some rule (e.g. height), you can decide at each step whether the “left” person or the “right” person should move so you get closer to a goal (e.g. two people whose heights sum to a target). That’s the converging two-pointer pattern. Alternatively, imagine one person walking fast and one slow along the same path—useful for finding the “middle” or a cycle. That’s the fast/slow or same-direction pattern. In both cases, you avoid checking every possible pair by using the structure of the array (e.g. sorted) to rule out large parts of the search space.

Example

Sorted array [1, 2, 3, 4, 5], target sum 7. Put left=0, right=4. arr[0]+arr[4]=6 < 7 → the sum is too small, so advance left. left=1: arr[1]+arr[4]=7 → found. Two comparisons instead of checking all 10 pairs.

Formal Definition

Concept Note

Two pointer technique: Maintain two indices left and right (or slow and fast). At each step, update one or both based on arr[left], arr[right], and the problem condition. Converging: left starts at 0, right at n−1; they move toward each other until left ≥ right. Same direction: both start at 0 (or 0 and 1); one or both advance. Guarantee: each element is processed O(1) times, so total time O(n), space O(1).

Converging pointers work well when the array is sorted (or has a known structure) so that moving one pointer in one direction has a predictable effect (e.g. sum increases or decreases). Same-direction pointers work for in-place compaction, “read” vs “write” positions, or fast/slow for cycle or middle detection.

Why This Topic Matters

Interviews: Two Sum (sorted), 3Sum, remove duplicates, move zeros, palindrome, container with most water—all frequently asked and often solved with two pointers.
Efficiency: Replaces O(n²) “check every pair” with O(n) when the problem allows ruling out ranges using order or invariants.
In-place: Many two-pointer solutions use O(1) extra space, which is required when you cannot allocate a new array.

Mental Model

Converging: You have a “window” [left, right]. The condition (e.g. sum, or “elements between”) depends on both ends. If the current window is too small (e.g. sum too low), move the pointer that will increase it (e.g. left++ on a sorted array increases the smallest element). If too large, move the pointer that will decrease it (e.g. right--). Same direction: One pointer is the “reader” (scans the array), the other is the “writer” (next position to write a valid element). Or one is “slow” and one “fast” so that when fast has moved 2k steps, slow has moved k (e.g. find middle).

Pattern 1: Converging Pointers (Opposite Ends)

Use when the array is sorted (or can be sorted) and you are comparing or combining values at two positions. Start with left = 0, right = len(arr) - 1. Loop while left < right. Depending on the problem, increment left or decrement right so the search space shrinks.

Two Sum in Sorted Array

Find two indices such that arr[i] + arr[j] == target. Because the array is sorted, if arr[left] + arr[right] < target, we need a larger sum, so increment left. If arr[left] + arr[right] > target, decrement right. If equal, return the pair.

def two_sum_sorted(arr, target):
    left, right = 0, len(arr) - 1
    while left < right:
        s = arr[left] + arr[right]
        if s == target:
            return [left, right]
        if s < target:
            left += 1
        else:
            right -= 1
    return [-1, -1]

Each iteration either increases left or decreases right, so the number of steps is at most n−1. Time O(n), space O(1).

ASCII Diagram: Converging Pointers

  Sorted: [ 1,  2,  3,  4,  5 ]   target = 7
  Index:    0   1   2   3   4
            ↑               ↑
          left            right   sum=6 < 7 → left++
               ↑            ↑
             left         right   sum=7 → return (1, 4)

Check Palindrome (converging)

Check if arr reads the same from both ends. Move left and right toward the center; if arr[left] != arr[right], return False. Stop when left >= right.

def is_palindrome(arr):
    left, right = 0, len(arr) - 1
    while left < right:
        if arr[left] != arr[right]:
            return False
        left += 1
        right -= 1
    return True

Pattern 2: Same-Direction Pointers (Read/Write)

One pointer scans the array (read); the other marks where to write the “next valid” element. Used for in-place removal or compaction (e.g. remove duplicates, move zeros).

Remove Duplicates In Place (Sorted)

Sorted array: keep one copy of each value, in place. write is the next index to write a unique value; read scans. When arr[read] != arr[write-1] (or write==0), copy arr[read] to arr[write] and advance write. Return write as the new length.

def remove_duplicates_sorted(arr):
    if not arr:
        return 0
    write = 1
    for read in range(1, len(arr)):
        if arr[read] != arr[write - 1]:
            arr[write] = arr[read]
            write += 1
    return write

Move Zeros to End

Keep relative order of non-zeros; put all zeros at the end. write = next position for a non-zero. Scan with read; when arr[read] != 0, swap or copy to arr[write] and increment write. Fill the rest with zeros if needed (or swap and leave zeros at end).

def move_zeros(arr):
    write = 0
    for read in range(len(arr)):
        if arr[read] != 0:
            arr[write], arr[read] = arr[read], arr[write]
            write += 1

Time O(n), space O(1).
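The "copy, then fill the rest with zeros" variant mentioned in the description could look like the sketch below. The name move_zeros_fill is hypothetical; it returns the list only for convenience.

```python
def move_zeros_fill(arr):
    """Copy non-zeros forward, then overwrite the tail with zeros."""
    write = 0
    for read in range(len(arr)):
        if arr[read] != 0:
            arr[write] = arr[read]   # copy the non-zero into the next free slot
            write += 1
    for i in range(write, len(arr)):
        arr[i] = 0                   # everything after the non-zeros becomes 0
    return arr
```

The swap version does the same job in a single loop; this one does plain writes in two loops, and both are O(n) time with O(1) extra space.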

Pattern 3: Fast and Slow Pointers

Two pointers both start at the beginning; one moves one step per iteration, the other two steps (or different rates). Used for finding the middle of a list, detecting a cycle (in linked lists), or similar “position relative to length” problems. On arrays, a common use is “slow” = write, “fast” = read, which is the same as the read/write pattern above.
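A minimal sketch of the fast/slow idea on a singly linked list follows. The Node class is a hypothetical stand-in defined only for this example, not the course's linked list implementation.

```python
class Node:
    """Minimal singly linked list node (for illustration only)."""
    def __init__(self, value, next=None):
        self.value = value
        self.next = next


def middle_node(head):
    """Return the middle node; for even length, the second of the two middles."""
    slow = fast = head
    while fast is not None and fast.next is not None:
        slow = slow.next          # one step
        fast = fast.next.next     # two steps
    return slow


def has_cycle(head):
    """Floyd's cycle detection: fast can meet slow again only inside a cycle."""
    slow = fast = head
    while fast is not None and fast.next is not None:
        slow = slow.next
        fast = fast.next.next
        if slow is fast:
            return True
    return False
```

When fast reaches the end after 2k steps, slow has taken k steps and sits at the middle.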

Step-by-Step: Two Sum (Sorted)

Set left = 0, right = len(arr) - 1.
While left < right: compute s = arr[left] + arr[right].
If s == target, return (left, right).
If s < target, do left += 1 (we need a larger sum; increasing the left element increases the sum because the array is sorted).
If s > target, do right -= 1.
If the loop exits, no pair found; return a sentinel.

Evolution: Brute Force → Two Pointers

Two sum (sorted): Brute force: two nested loops, check every pair—O(n²). Two pointers: one pass from both ends—O(n). The key is using sorted order: if the current sum is too small, the only way to get a larger sum is to move the left pointer right; if too large, move the right pointer left.
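For comparison, the brute-force baseline described above can be sketched like this. The name two_sum_brute is hypothetical, and it uses the same [-1, -1] sentinel as the two-pointer version.

```python
def two_sum_brute(arr, target):
    """Check every pair (i, j) with i < j: O(n^2) time, O(1) space."""
    n = len(arr)
    for i in range(n):
        for j in range(i + 1, n):
            if arr[i] + arr[j] == target:
                return [i, j]
    return [-1, -1]
```

A baseline like this is handy for testing the two-pointer solution on small random sorted inputs. When several pairs are valid, the two approaches may return different pairs, so compare the sums, not the indices.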

Optimization Insight

Whenever you have a sorted array and need to find a pair (or triple) satisfying a condition, consider converging pointers (or one pointer + binary search). Same-direction pointers are for in-place scans (one pass, O(1) space). Don’t fall back to nested loops if a single pass with two indices is enough.
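The "one pointer + binary search" alternative can be sketched with bisect_left from the standard library. The function name two_sum_bisect is hypothetical, and the input is assumed to be sorted ascending.

```python
from bisect import bisect_left


def two_sum_bisect(arr, target):
    """For each i, binary-search the complement to the right of i: O(n log n)."""
    n = len(arr)
    for i in range(n - 1):
        need = target - arr[i]
        j = bisect_left(arr, need, i + 1, n)   # search only indices i+1..n-1
        if j < n and arr[j] == need:
            return [i, j]
    return [-1, -1]
```

This is slower than converging pointers (O(n log n) versus O(n)), but the same shape is useful when the second element must be found under a condition that a single sweep cannot track.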

Time and Space Complexity

Converging (two sum, palindrome): Each iteration does O(1) work and one of the pointers moves; total iterations O(n). Time O(n), space O(1).
Same direction (remove duplicates, move zeros): Single loop, each element read once, O(1) writes. Time O(n), space O(1).

Edge Cases

Empty or single element: For converging, left < right is false when n ≤ 1, so the loop body never runs; make sure the fall-through value suits the problem (is_palindrome returns True, two_sum_sorted returns [-1, -1]). For read/write, handle n==0 so you don’t use write-1 when write is 0.
No valid pair: Return a clear value (e.g. [-1,-1], False, or empty list).
Multiple valid pairs: Clarify whether you need one or all (e.g. two sum usually returns one pair; 3Sum returns all unique triplets).
Duplicates: In “remove duplicates,” sorted array is assumed. In two sum, if you need distinct indices, (left, right) are always distinct when left < right.

Common Mistakes

Using two pointers on an unsorted array for “pair sum”: Converging logic assumes order; sort first or use a hash map.
Wrong loop condition: Use left < right for converging (not left <= right unless you need to handle the same index twice). For “find pair,” left and right must be distinct.
Moving the wrong pointer: In an ascending sorted array, smaller index → smaller value. So to increase the sum, increment left; to decrease it, decrement right. Reverse the moves if the array is sorted descending.
Read/write: off-by-one: In remove duplicates, the first element is always kept; write starts at 1 and we compare with arr[write-1].

Common Mistake

Using left <= right and then using left and right as a pair: when left == right, you’re using the same index twice. For “two distinct indices,” keep left < right. For “palindrome,” left == right is the middle element and doesn’t need to be compared with itself, so left < right is correct.

Pattern Recognition

Sorted array + pair/triplet sum or “find two”: Think converging pointers (or one pointer + binary search for the second).
In-place remove/compact (duplicates, zeros, “remove value”): Think same-direction read/write pointer.
Palindrome, “valid from both ends”: Converging pointers from both ends.
Linked list middle or cycle: Fast/slow pointers (Topic 8).

Interview Insight

When the problem involves “find two indices” or “in-place removal” or “palindrome,” say: “I can use two pointers. If the array is sorted, I’ll start from both ends and move based on the condition. If I need to remove elements in place, I’ll use a read and a write pointer.” State the invariant (e.g. “elements in [0, write) are the valid ones”) and the time/space (O(n), O(1)).

Practice Problems

Two Sum II (sorted): Return indices (1-based) of two numbers that add to target; converging pointers.
Remove duplicates from sorted array: In place, return new length; same-direction write pointer.
Move zeros: In place; read/write with swap.
Valid palindrome: Ignore non-alphanumeric, case-insensitive; converging pointers.
Container with most water: Heights array; converging pointers (move the shorter line inward).

Summary

Two pointers = two indices moving in one pass; often O(n) time, O(1) space.
Converging: left at 0, right at n−1; move toward each other. Use for sorted array pair sum, palindrome, “two from ends.”
Same direction: read and write (or fast/slow). Use for in-place remove duplicates, move zeros, compaction.
Loop condition for “distinct pair”: left < right. Move the pointer that will fix the condition (e.g. sum too small → left++, sorted).
Recognize “sorted + pair” and “in-place remove” as two-pointer problems to avoid O(n²) or extra space.

5.5 Sliding Window

Introduction

The sliding window technique solves problems on contiguous subarrays (or substrings) by maintaining a “window” [left, right] and moving it in one pass. Instead of checking every possible subarray (O(n²)), you expand or shrink the window based on the problem condition so that each element enters the window once and leaves it at most once, giving O(n) time. There are two main types: fixed-size window (e.g. max sum of k consecutive elements) and variable-size window (e.g. smallest subarray with sum ≥ target, or longest substring with at most k distinct characters). This section covers both patterns, when to use which, and how to keep window state (e.g. sum or frequency map) updated efficiently.

Real-World Analogy

Imagine a train car with a fixed number of seats (fixed window): as the train moves, one person gets off at the front and one gets on at the back—you always see the same number of people, but the group “slides.” For a variable window, imagine a rope you pull from both ends: you lengthen it until a condition is met (e.g. “contains at least 3 red beads”), then shorten from the left until the condition fails, then lengthen again. You never go backward on the right pointer; you only adjust left. In both cases, you avoid re-scanning the whole array by reusing what you already know about the current window.

Example

Array [2, 1, 5, 1, 3, 2], k = 3. Fixed window: first window sum = 2+1+5 = 8; slide right: drop 2, add 1 → 1+5+1 = 7; drop 1, add 3 → 5+1+3 = 9; drop 5, add 2 → 1+3+2 = 6. Max sum = 9. One pass, O(n), instead of recomputing each window from scratch.

Formal Definition

Concept Note

Sliding window: Maintain a contiguous segment [left, right] (inclusive or [left, right) as convenient). Fixed-size: Window length is k. Advance right and left together (or right first until window size k, then slide both). Variable-size: Expand (right++) when the current window doesn’t satisfy the condition; shrink (left++) when it does (or the opposite, depending on the problem). Keep a running state (sum, count, frequency map) that you update in O(1) when adding/removing one element. Total time O(n); space O(1) or O(k) for a frequency map.

The key invariant: each element enters the window once and leaves at most once (when left advances). So the number of “add to window” and “remove from window” operations is O(n), and if each update is O(1), the whole algorithm is O(n).

Why This Topic Matters

Interviews: Max sum subarray of size k, smallest subarray with sum ≥ target, longest substring with at most K distinct characters, minimum window substring—all classic sliding window.
Efficiency: Turns “check every subarray” O(n²) (or O(n·k) for fixed k) into O(n) by reusing window state.
Pattern: “Contiguous subarray/substring” + “max/min length” or “satisfy condition” often suggests sliding window.

Mental Model

Fixed-size: The window is a “frame” of k elements. Slide one step: subtract the element that just left (left), add the new element (right). Keep a running sum (or other aggregate) and update it in O(1) per slide. Variable-size: Right expands the window; when the condition is met (or violated, depending on the problem), you may record a candidate answer and then shrink from the left until the condition is no longer met, then expand again. The goal is usually “smallest window that satisfies” or “largest window that satisfies”; the expansion/shrink logic depends on that.

Fixed-Size Window

Problem: Maximum Sum of K Consecutive Elements

Given an array and integer k, find the maximum sum of any contiguous subarray of length k.

Algorithm

Compute the sum of the first k elements (window [0, k−1]). This is the first candidate.
For right from k to n−1: the new window drops arr[left] and adds arr[right], where left = right − k. So new_sum = current_sum − arr[left] + arr[right]. Update current_sum and track the maximum.
Return the maximum sum seen.

def max_sum_k(arr, k):
    if not arr or k <= 0 or k > len(arr):
        return 0
    window_sum = sum(arr[:k])
    best = window_sum
    for right in range(k, len(arr)):
        left = right - k
        window_sum = window_sum - arr[left] + arr[right]
        best = max(best, window_sum)
    return best

Line-by-Line Notes

window_sum is the sum of the current window. When we slide, we subtract the element that exits (arr[left]) and add the element that enters (arr[right]).
The initial sum costs O(k), and the loop runs (n − k) iterations of O(1) each. Total O(n).

ASCII Diagram: Fixed Window

  arr: [ 2,  1,  5,  1,  3,  2 ]   k = 3
  Step 0: [ 2,  1,  5 ]            sum = 8
  Step 1:      [ 1,  5,  1 ]        sum = 7  (drop 2, add 1)
  Step 2:           [ 5,  1,  3 ]  sum = 9  (drop 1, add 3)  ← max
  Step 3:                [ 1,  3,  2 ]  sum = 6
  Result: max = 9

Variable-Size Window

Problem: Smallest Subarray with Sum ≥ Target

Given an array of positive integers and a target, find the length of the smallest contiguous subarray whose sum is ≥ target. If none, return 0.

Idea: Expand the window (right++) until the window sum ≥ target. Then we have a candidate length (right − left + 1). Shrink from the left (left++) until the sum is < target again; each time before shrinking, update the minimum length. Then expand again. Each element is added once and removed once—O(n).

def min_subarray_sum(arr, target):
    if not arr or target <= 0:
        return 0
    left = 0
    window_sum = 0
    min_len = float('inf')
    for right in range(len(arr)):
        window_sum += arr[right]
        while window_sum >= target:
            min_len = min(min_len, right - left + 1)
            window_sum -= arr[left]
            left += 1
    return min_len if min_len != float('inf') else 0

While the sum is ≥ target, we shrink from the left and update the minimum length. Time O(n): right and left each advance at most n times. Space O(1).

Problem: Longest Substring with At Most K Distinct Characters

Given a string (or array of characters) and k, find the length of the longest substring with at most k distinct characters. Idea: Expand (right++) and add the new character to a frequency map. While the number of distinct characters exceeds k, shrink from the left (decrement the count of s[left], delete the key when the count reaches 0, then left++). Once the shrink loop exits, the window has ≤ k distinct characters. Track the maximum window size. Time O(n), space O(k) for the map.

def longest_k_distinct(s, k):
    if k <= 0 or not s:
        return 0
    left = 0
    freq = {}
    max_len = 0
    for right in range(len(s)):
        c = s[right]
        freq[c] = freq.get(c, 0) + 1
        while len(freq) > k:
            c_left = s[left]
            freq[c_left] -= 1
            if freq[c_left] == 0:
                del freq[c_left]
            left += 1
        max_len = max(max_len, right - left + 1)
    return max_len

Step-by-Step: Variable Window (Min Subarray Sum)

left = 0, window_sum = 0, min_len = ∞.
For right from 0 to n−1: add arr[right] to window_sum.
While window_sum ≥ target: update min_len = min(min_len, right − left + 1); subtract arr[left] from window_sum; left++.
After the loop, return min_len (or 0 if no valid window).

The “while” shrink step ensures that when we leave the inner loop, the window [left, right] has sum < target. So the next expansion (right++) is the only way to get back to ≥ target. No need to re-scan from the beginning.

Evolution: Brute Force → Sliding Window

Max sum of k consecutive: Brute force: for each starting index i, sum arr[i..i+k−1] — O(n·k). Sliding window: one pass, add one and remove one per step — O(n). Smallest subarray sum ≥ target: Brute force: for each (i, j), sum arr[i..j] and compare — O(n²). Sliding window: expand and shrink, each element processed O(1) times — O(n).

Problem type	Brute force	Sliding window
Fixed size k (max sum)	O(n·k)	O(n)
Variable (min subarray sum)	O(n²)	O(n)
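The brute-force column of the table can be sketched as below, mainly as a reference for testing the window versions on small inputs. Both function names are hypothetical, and the second assumes target > 0.

```python
def max_sum_k_brute(arr, k):
    """Recompute every length-k window from scratch: O(n*k)."""
    if not arr or k <= 0 or k > len(arr):
        return 0
    return max(sum(arr[i:i + k]) for i in range(len(arr) - k + 1))


def min_subarray_sum_brute(arr, target):
    """For each start i, extend j until the sum reaches target: O(n^2)."""
    best = 0                              # 0 means "no valid subarray found"
    for i in range(len(arr)):
        total = 0
        for j in range(i, len(arr)):
            total += arr[j]
            if total >= target:
                length = j - i + 1
                if best == 0 or length < best:
                    best = length
                break                     # first hit is the shortest for this i
    return best
```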

Optimization Insight

Whenever you need a contiguous subarray (or substring) that maximizes or minimizes a measure (sum, length, count) or satisfies a condition (sum ≥ T, at most k distinct), consider sliding window. Fixed k → fixed-size window with running aggregate. Variable size → expand until condition met, then shrink from the left; keep the invariant that the window state is updated in O(1) when adding/removing one element.

Time and Space Complexity

Fixed-size (max sum k): One pass over the array; each element enters and leaves the window once. Time O(n), space O(1).
Variable-size (min subarray sum): Left and right each advance at most n times; inner while runs at most n times total. Time O(n), space O(1).
Variable-size with frequency map (k distinct): Time O(n), space O(k) for the map.

Edge Cases

Empty array or k > n (fixed window): Return 0 or handle (e.g. no valid window).
k = 0 or negative: No valid window; return 0 or appropriate value.
Target unreachable (min subarray): If total sum < target, return 0 (or report no solution).
All elements same (k distinct): One distinct character; window can be the whole array if k ≥ 1.
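Negative numbers: The variable-size window for “sum ≥ target” relies on all elements being positive, so that adding an element never decreases the sum and removing one never increases it. With negative numbers that monotonicity breaks, and the “shrink while sum ≥ target” loop can miss valid windows; a different approach is needed (for example, prefix sums with a monotonic deque). “Max subarray sum” also uses a different algorithm (Kadane, Topic 5.8).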
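One way to sanity-check the negative-numbers case is to run the shrink-while-sum ≥ target loop on a small input that falls outside the positive-integers precondition. The sketch below repeats that loop under a hypothetical name; the input is chosen by hand for illustration.

```python
def min_subarray_sum_window(arr, target):
    """Variable-size window; correct only when all elements are positive."""
    if not arr or target <= 0:
        return 0
    left = 0
    window_sum = 0
    min_len = float("inf")
    for right in range(len(arr)):
        window_sum += arr[right]
        while window_sum >= target:
            min_len = min(min_len, right - left + 1)
            window_sum -= arr[left]
            left += 1
    return min_len if min_len != float("inf") else 0


arr, target = [1, -5, 10], 10
window_answer = min_subarray_sum_window(arr, target)  # running sums: 1, -4, 6
true_answer = 1                                       # [10] alone already qualifies
```

The window never reaches the target because the earlier negative value drags the running sum down, so the loop reports "no subarray" even though a one-element answer exists.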

Common Mistakes

Recomputing the window from scratch each time: That gives O(n·k) or O(n²). Always update the running state (sum, frequency) in O(1) when sliding.
Shrinking too much (variable window): Shrink only while the condition is (still) satisfied (or violated, depending on the problem). For “smallest subarray with sum ≥ target,” shrink while sum ≥ target; for “longest with ≤ k distinct,” shrink while distinct > k.
Using “if” instead of “while” when shrinking: After expanding, you may need to shrink multiple steps (e.g. remove several elements from the left) to restore the invariant. Use while for the shrink loop.
Off-by-one in length: Length of [left, right] inclusive is right - left + 1. Check your problem’s definition (0-based indices vs 1-based length).

Common Mistake

Letting left run past right in a variable-size window: after shrinking, left should satisfy left ≤ right + 1 (an empty window at most). Usually left increases only until the condition fails, and the next iteration expands right again. Don’t reset left to 0 on each iteration unless the problem requires it (almost never in standard sliding window).

Pattern Recognition

“Contiguous subarray of size k” / “every consecutive k”: Fixed-size window.
“Smallest/largest subarray such that sum ≥ / ≤ target”: Variable-size window when all elements are non-negative; expand then shrink (or the reverse).
“Longest substring with at most K distinct”: Variable window + frequency map.
“Minimum window substring” (containing all chars of T): Variable window + frequency map for T and current window.

Interview Insight

Say: “This is a contiguous subarray problem, so I’ll use a sliding window. If the size is fixed (k), I’ll maintain a running sum and slide by subtracting the left element and adding the right. If the size is variable, I’ll expand with right until the condition is met, then shrink from the left with a while loop and update the answer.” State the invariant (“window sum is the sum of [left, right]”) and time O(n), space O(1) or O(k).

Practice Problems

Max sum of k consecutive elements: Fixed window; running sum.
Smallest subarray with sum ≥ target: Variable window; expand, then shrink while sum ≥ target.
Longest substring with at most K distinct characters: Variable window + freq map; shrink while distinct > k.
Maximum average subarray of length k: Same as max sum of k (fixed window); average = sum/k.
Minimum window substring: Smallest substring of s containing all characters of t; variable window + two frequency maps.

Summary

Sliding window = contiguous subarray [left, right] with state updated in O(1) when adding/removing one element; total time O(n).
Fixed-size: Window length k; slide by subtracting arr[left] and adding arr[right]; one pass O(n).
Variable-size: Expand (right++) until condition met; shrink (left++) with a while until condition fails; track min/max length (or other measure).
Use while (not if) when shrinking so the invariant is restored after multiple removals.
Recognize “contiguous subarray” + “max/min sum or length” or “at most k distinct” as sliding window to get O(n) instead of O(n²) or O(n·k).

5.6 Prefix Sum

Introduction

A prefix sum (or cumulative sum) array stores, at each index i, the sum of all elements from the start of the array up to and including i. Once built in O(n), any range sum query—"what is the sum of elements from index left to right?"—can be answered in O(1) using the identity: sum(arr[left..right]) = prefix[right] − prefix[left−1] (with a convention for left=0). This turns repeated range-sum queries from O(n) per query to O(1), so q queries take O(n + q) instead of O(n·q). Prefix sum is the basis for many problems: subarray sum, equilibrium index, and 2D range sums (matrices). This section covers building the prefix array, the range-sum formula, and common uses.

Real-World Analogy

Imagine a road with mile markers. If you record the cumulative distance from the start at each marker (0, 5, 12, 18, …), then the distance between marker 2 and marker 4 is (distance at 4) − (distance at 2). You don't re-measure the segment; you subtract two stored numbers. The prefix array is that list of cumulative distances: prefix[i] = "total from start up to i." The sum of any segment [left, right] is the total up to right minus the total just before left: prefix[right] − prefix[left−1].

Example

arr = [1, 2, 3, 4, 5]. Prefix: prefix = [1, 3, 6, 10, 15]. Sum of arr[2..4] = 3+4+5 = 12. Using prefix: prefix[4] − prefix[1] = 15 − 3 = 12. One subtraction instead of looping.

Formal Definition

Concept Note

Prefix sum array: For array arr of length n, define prefix[i] = arr[0] + arr[1] + … + arr[i] for 0 ≤ i < n. Convention: prefix[−1] = 0 (sum of zero elements); this is mathematical notation, not Python indexing, where prefix[-1] is the last element. Then range sum from index left to right (inclusive): sum(arr[left..right]) = prefix[right] − prefix[left−1]. With prefix[−1] = 0, this holds for left=0 too. Build: O(n); per query: O(1).

We can use a 1-indexed prefix array so that prefix[i] = sum of first i elements; then sum of elements from index a to b (1-based) is prefix[b] − prefix[a−1]. The same idea applies in 2D: prefix[r][c] = sum of the rectangle from (0,0) to (r,c), and a subrectangle sum becomes four prefix lookups.
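The 2D idea can be sketched as follows. This sketch assumes a padded table with an extra zero row and column, so P[r][c] holds the sum of the first r rows and first c columns, which is a shifted version of the inclusive definition above. The names build_prefix_2d and rect_sum are hypothetical.

```python
def build_prefix_2d(grid):
    """P[r][c] = sum of grid[0..r-1][0..c-1]; row 0 and column 0 are all zeros."""
    rows = len(grid)
    cols = len(grid[0]) if rows else 0
    P = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        for c in range(cols):
            # inclusion-exclusion: add the two neighbours, subtract the overlap
            P[r + 1][c + 1] = grid[r][c] + P[r][c + 1] + P[r + 1][c] - P[r][c]
    return P


def rect_sum(P, r1, c1, r2, c2):
    """Sum of grid[r1..r2][c1..c2] (inclusive) from four lookups."""
    return P[r2 + 1][c2 + 1] - P[r1][c2 + 1] - P[r2 + 1][c1] + P[r1][c1]
```

Building takes O(rows × cols); each rectangle query is O(1).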

Why This Topic Matters

Range queries: Many problems ask for "sum of subarray [L, R]" repeatedly. Naive: O(n) per query. With prefix sum: O(n) preprocess, O(1) per query.
Interviews: Subarray sum equals K (with hash map + prefix), equilibrium index, 2D range sum (matrix block sum).
Building block: Difference array (Topic 5.7) and many segment-tree problems can be understood via prefix thinking.

Mental Model

prefix[i] = "sum of everything from the start up to i." So the sum from left to right is "sum up to right" minus "sum up to (left−1)." Picture a number line: prefix marks cumulative totals; the segment [left, right] is the gap between two marks.

Building the Prefix Array

def build_prefix(arr):
    n = len(arr)
    prefix = [0] * (n + 1)   # prefix[0] = 0, prefix[i] = sum(arr[0..i-1])
    for i in range(n):
        prefix[i + 1] = prefix[i] + arr[i]
    return prefix

Here prefix[i] = sum of arr[0..i−1] (first i elements). So sum(arr[left..right]) = prefix[right+1] − prefix[left]. Alternatively, use prefix[i] = sum(arr[0..i]) and prefix[-1]=0; then sum(arr[left..right]) = prefix[right] − prefix[left−1] (treat prefix[-1]=0 in code as a special case or use a length-(n+1) array with prefix[0]=0).

Convention: Length-(n+1) with prefix[0] = 0

Let prefix[0] = 0 and prefix[i] = arr[0] + … + arr[i−1] for 1 ≤ i ≤ n. Then:

Sum of arr[left..right] (0-based, inclusive) = prefix[right+1] − prefix[left].
No special case for left=0: prefix[0]=0 gives prefix[right+1] − 0 = sum of first (right+1) elements.

Range Sum Query

def range_sum(prefix, left, right):
    # prefix has length n+1, prefix[i] = sum(arr[0..i-1])
    return prefix[right + 1] - prefix[left]

O(1) per query. Left and right are 0-based inclusive indices.

ASCII Diagram

  arr:    [  1,   2,   3,   4,   5 ]
  index:     0    1    2    3    4

  prefix: [ 0,   1,   3,   6,  10,  15 ]
  index:     0    1    2    3    4    5   (prefix[i] = sum arr[0..i-1])

  sum(arr[2..4]) = arr[2]+arr[3]+arr[4] = 3+4+5 = 12
                 = prefix[5] - prefix[2] = 15 - 3 = 12

Python Implementation (In-Place or New Array)

# Build prefix (new array, length n+1)
prefix = [0]
for x in arr:
    prefix.append(prefix[-1] + x)

# Range sum [left, right] inclusive
def query(left, right):
    return prefix[right + 1] - prefix[left]

# Example: subarray sum equals K (count)
# For each right, count how many left with prefix[left] = prefix[right+1] - K
# Use a dict: for each prefix value, how many indices seen so far
from collections import defaultdict
def subarray_sum_count(arr, K):
    prefix = 0
    seen = defaultdict(int)
    seen[0] = 1
    count = 0
    for x in arr:
        prefix += x
        count += seen[prefix - K]
        seen[prefix] += 1
    return count

Line-by-Line: Subarray Sum Equals K

We want count of (left, right) such that sum(arr[left..right]) = K. That is prefix[right+1] − prefix[left] = K, i.e. prefix[left] = prefix[right+1] − K. As we iterate right, the running variable prefix equals prefix[right+1]. So for each right, add to count the number of left ≤ right with prefix[left] = prefix − K. Maintain a frequency map of prefix values seen so far, seeded with seen[0] = 1 for the empty prefix; add seen[prefix − K] to count before adding the current prefix to the map, so that the empty subarray is not counted when K = 0.

Time and Space Complexity

Build prefix: One pass, O(n) time, O(n) space for the prefix array (or O(1) extra if you overwrite arr in place, which gives the length-n convention prefix[i] = sum(arr[0..i])).
Range sum query: O(1) per query.
q queries: O(n + q) with prefix sum vs O(n·q) naive.

Edge Cases

Empty array: prefix = [0]; any range query with left > right can return 0.
Single element: prefix = [0, arr[0]]; sum(arr[0..0]) = prefix[1] − prefix[0] = arr[0].
left = right: Sum of one element; formula still works.
left > right: Define as 0 or handle as invalid.

Common Mistakes

Off-by-one: With prefix[0]=0 and prefix[i]=sum(arr[0..i−1]), sum(arr[left..right]) = prefix[right+1] − prefix[left]. Using prefix[right] − prefix[left] gives sum(arr[left..right−1]).
Wrong convention: If prefix[i] = sum(arr[0..i]), then sum(arr[left..right]) = prefix[right] − (prefix[left−1] if left>0 else 0). Stick to one convention (e.g. length n+1 with prefix[0]=0) and use it consistently.
Index bounds: For prefix of length n+1, valid indices are 0..n. Query (left, right) must have 0 ≤ left ≤ right < n.

Common Mistake

Using prefix[right] − prefix[left] for inclusive range [left, right]. That equals the sum of arr[left..right−1]. For inclusive right you need prefix[right+1] − prefix[left] (with the standard length-(n+1) prefix where prefix[i] = sum of first i elements).

Pattern Recognition

"Sum of subarray [L,R]" or "range sum" repeatedly: Build prefix once, then O(1) per query.
"Number of subarrays with sum K": Prefix + hash map (store count of prefix values; for each right, add count of prefix value = current_prefix − K).
"Equilibrium index" (left sum = right sum): Total sum = S; at index i, left sum = prefix[i], right sum = S − prefix[i] − arr[i]; solve for i.

Interview Insight

When you see "range sum" or "subarray sum," say: "I can precompute a prefix sum array in O(n). Then each range sum is O(1) as prefix[right+1] − prefix[left]." For "count subarrays with sum K," say: "I'll use prefix and a hash map: for each right, I need the count of left with prefix[left] = current_prefix − K."

Practice Problems

Range sum query (many queries): Build prefix; answer each [L,R] in O(1).
Subarray sum equals K (count): Prefix + frequency map.
Equilibrium index: Index where sum of elements on left = sum on right; use total sum and prefix.
2D range sum (matrix): prefix[r][c] = sum of rectangle (0,0) to (r,c); block sum using four prefix values.
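Two of the practice problems above can be sketched as follows. This is a minimal illustration, not a reference solution: the function names are hypothetical, and the 2D version assumes a padded (rows+1)×(cols+1) table (extending the 1D prefix[0] = 0 convention) rather than the inclusive prefix[r][c] described in the list.

```python
def equilibrium_index(arr):
    """Return the first index i with sum(arr[:i]) == sum(arr[i+1:]), or -1."""
    total = sum(arr)
    left = 0  # running value of prefix[i] = sum(arr[0..i-1])
    for i, x in enumerate(arr):
        right = total - left - x
        if left == right:
            return i
        left += x
    return -1


def build_prefix_2d(mat):
    """P[r][c] = sum of mat[0..r-1][0..c-1]; row 0 and column 0 stay zero."""
    rows = len(mat)
    cols = len(mat[0]) if mat else 0
    P = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        for c in range(cols):
            # Add the cell, the block above, the block to the left,
            # and subtract the overlap that was counted twice.
            P[r + 1][c + 1] = mat[r][c] + P[r][c + 1] + P[r + 1][c] - P[r][c]
    return P


def block_sum(P, r1, c1, r2, c2):
    """Sum of mat[r1..r2][c1..c2] (inclusive corners) from four prefix values."""
    return P[r2 + 1][c2 + 1] - P[r1][c2 + 1] - P[r2 + 1][c1] + P[r1][c1]


print(equilibrium_index([1, 3, 5, 2, 2]))  # 2  (left 1+3 = 4, right 2+2 = 4)

mat = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
P = build_prefix_2d(mat)
print(block_sum(P, 1, 1, 2, 2))  # 28  (5 + 6 + 8 + 9)
```

The padded table avoids special cases for r1 = 0 or c1 = 0, for the same reason prefix[0] = 0 does in one dimension.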

Summary

Prefix sum lets you answer range sum queries in O(1) after O(n) build. prefix[i] = sum of arr[0..i−1] (with prefix[0]=0).
Range sum [left, right] inclusive = prefix[right+1] − prefix[left].
Use length-(n+1) array and prefix[0]=0 to avoid special cases.
"Subarray sum equals K" count: iterate with current prefix, add seen[current_prefix − K], then update seen[current_prefix].

5.7 Difference Array

Introduction

A difference array (or difference table) lets you apply many range-update operations—“add value v to every element in [left, right]”—in O(1) time per update. After all updates, you recover the final array by taking the prefix sum of the difference array—one O(n) pass. So q range updates take O(q + n) instead of O(q·n). It is the “inverse” idea of prefix sum: prefix sum answers range queries (sum over [L,R]); the difference array handles range updates (add to [L,R]). This section covers how to represent range updates as two point updates, how to build and apply the difference array, and when to use it.

Real-World Analogy

Imagine a long fence where you paint segments. Instead of walking the whole segment each time to add paint, you mark “+1 bucket” at the start of the segment and “−1 bucket” at the end. Later, you walk once from left to right, carrying a running total of “buckets so far”—that total is how much paint is on the fence at each point. The difference array is those +1 and −1 marks; the prefix sum of that array is the final “amount of paint” (or the final array after all range adds).

Example

Start with array [0, 0, 0, 0]. Updates: add 5 to [1, 2], add 3 to [0, 1]. The first update does diff[1] += 5 and diff[3] −= 5; the second does diff[0] += 3 and diff[2] −= 3. Difference array: diff[0]=3, diff[1]=5, diff[2]=−3, diff[3]=−5 (diff[4]=0 for the boundary). Prefix sum of diff: [3, 8, 5, 0]. So final array = [3, 8, 5, 0].

Formal Definition

Concept Note

Difference array diff: For an array arr, define diff[0] = arr[0] and diff[i] = arr[i] − arr[i−1] for i ≥ 1. Then arr is the prefix sum of diff. Equivalently: to add v to every element in [left, right], do diff[left] += v and diff[right+1] −= v (if right+1 is in bounds). After all such updates, arr[i] = diff[0] + diff[1] + … + diff[i] = prefix sum of diff. Each range update is O(1); recovering the array is O(n).

We use a length-(n+1) diff array so that diff[right+1] is valid when right = n−1. Initialize diff with zeros (or from an initial arr). Apply each range update with two point updates; then prefix sum gives the final array.

Why This Topic Matters

Range updates: Problems like “add v to [L,R] for many (L,R,v), then output the final array” are O(q·n) naive. With a difference array: O(q) for updates + O(n) to recover = O(q + n).
Interviews: Range add queries, “car pooling,” “meeting rooms” style “add in range,” or recover array after many range updates.
Duality with prefix sum: Prefix sum = range query (sum); difference array = range update (add). Taking prefix sum of diff recovers the array.

Mental Model

Think of diff as “how much does this index change from the previous one?” Adding v to [left, right] means: at left, the array “steps up” by v (diff[left] += v); at right+1, it “steps down” by v (diff[right+1] −= v). The prefix sum of diff accumulates these steps into the actual values.

Building the Difference Array from an Initial Array

If you start with an array arr: diff[0] = arr[0], and for i from 1 to n−1, diff[i] = arr[i] − arr[i−1]. Then arr is the prefix sum of diff. (We can use length n and define prefix sum accordingly, or use length n+1 with diff[n]=0.)

def build_diff(arr):
    n = len(arr)
    diff = [0] * (n + 1)   # diff[n] is a spare slot for updates with right = n-1
    if n > 0:              # empty input: diff stays [0]
        diff[0] = arr[0]
    for i in range(1, n):
        diff[i] = arr[i] - arr[i - 1]
    return diff

Applying a Range Update: Add v to [left, right]

Update: diff[left] += v and diff[right + 1] -= v. With 0-based indices and a length-(n+1) diff array, right = n−1 writes to diff[n]. That slot is a sentinel: recovery only sums diff[0..n−1], so the write is harmless and no special case is needed for the last index.

def range_add(diff, left, right, v):
    diff[left] += v
    if right + 1 < len(diff):
        diff[right + 1] -= v

O(1) per update.

Recovering the Array (Prefix Sum of diff)

def recover_array(diff):
    arr = []
    s = 0
    for i in range(len(diff) - 1):  # diff has length n+1; skip the spare slot diff[n]
        s += diff[i]
        arr.append(s)
    return arr

O(n). The final arr[i] is the prefix sum of diff up to i.

ASCII Diagram

  Start: arr = [0, 0, 0, 0],  diff = [0, 0, 0, 0, 0]

  Update: add 5 to [1, 2]
  diff[1] += 5, diff[3] -= 5
  diff: [0, 5, 0, -5, 0]
  Prefix sum of diff (first 4): [0, 5, 5, 0]  ✓  (indices 1,2 got +5)

  Update: add 3 to [0, 1]
  diff[0] += 3, diff[2] -= 3
  diff: [3, 5, -3, -5, 0]
  Prefix sum of diff (first 4): 3, 3+5=8, 8+(-3)=5, 5+(-5)=0  →  [3, 8, 5, 0]  ✓

Full Example in Code

# Start with zeros; apply range adds; recover array
n = 5
diff = [0] * (n + 1)

def add(left, right, v):
    diff[left] += v
    if right + 1 <= n:
        diff[right + 1] -= v

add(1, 3, 10)   # add 10 to indices 1,2,3
add(0, 2, 5)    # add 5 to indices 0,1,2

# Recover
arr = []
s = 0
for i in range(n):
    s += diff[i]
    arr.append(s)
# arr = [5, 15, 15, 10, 0]

Time and Space Complexity

Build diff from arr: O(n).
One range update (add v to [L,R]): O(1)—two index updates.
Recover array: O(n)—one prefix-sum pass.
q range updates + recover: O(q + n). Naive would be O(q·n).
Space: O(n) for the diff array.

Edge Cases

right = n−1: diff[right+1] is diff[n]; ensure diff has length n+1 so this is valid.
left > right: Treat as no-op or skip.
Empty array: diff = [0]; recover gives [].

Common Mistakes

Forgetting diff[right+1] −= v: Without it, the “step up” at left is never canceled, so every index from left to the end gets +v.
Using right instead of right+1: The “step down” must happen at the first index that should not get the update, i.e. right+1.
Diff array too short: Use length n+1 so that diff[right+1] is valid for right = n−1.

Common Mistake

Doing only diff[left] += v and forgetting diff[right+1] −= v. Then the prefix sum from left onward is increased by v forever; the range update becomes “add v to [left, end].” Always add both point updates for a bounded range [left, right].

Pattern Recognition

“Add v to all elements in [L,R]” repeated many times, then output the array: Difference array.
“Apply many range updates (add/subtract), then query” or “get final state”: Diff array + prefix sum to recover.
2D variant: Similar idea with a 2D diff matrix and 2D prefix sum to recover (four corners for each rectangle update).
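The 2D variant can be sketched as below. This is one possible layout under stated assumptions: the function name and the (r1, c1, r2, c2, v) update tuple are hypothetical, corners are inclusive, and the diff matrix is padded to (rows+1)×(cols+1) so the r2+1 and c2+1 writes are always in bounds.

```python
def apply_rect_updates(rows, cols, updates):
    """Apply 'add v to rectangle (r1,c1)..(r2,c2)' updates, return the final matrix."""
    D = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r1, c1, r2, c2, v in updates:
        # Four corner updates per rectangle, O(1) each
        D[r1][c1] += v
        D[r1][c2 + 1] -= v
        D[r2 + 1][c1] -= v
        D[r2 + 1][c2 + 1] += v

    # Recover with an in-place 2D prefix sum over the rows x cols part
    for r in range(rows):
        for c in range(cols):
            if r > 0:
                D[r][c] += D[r - 1][c]
            if c > 0:
                D[r][c] += D[r][c - 1]
            if r > 0 and c > 0:
                D[r][c] -= D[r - 1][c - 1]

    # Drop the padding row and column
    return [row[:cols] for row in D[:rows]]


result = apply_rect_updates(3, 3, [(0, 0, 1, 1, 5), (1, 1, 2, 2, 2)])
print(result)  # [[5, 5, 0], [5, 7, 2], [0, 2, 2]]
```

Cell (1, 1) lies in both rectangles, so it ends at 5 + 2 = 7. Total cost is O(q + rows × cols) for q updates.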

Interview Insight

When the problem has “add value to range [L,R]” many times and then asks for the final array (or a single query), say: “I’ll use a difference array. Each range add is two point updates: diff[L] += v and diff[R+1] -= v. Then I recover the array with one prefix-sum pass. Total O(q + n).”

Practice Problems

Range add: Given q queries (L, R, v), add v to arr[L..R] for each; output final array. Use diff.
Car pooling / booking: Trips (num_passengers, start, end); at each point, current load = prefix sum of diff (add at start, subtract at end).
2D range add: Add v to all cells in rectangle (r1,c1) to (r2,c2); use 2D difference array and 2D prefix sum to recover.
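As an illustration of the car pooling problem in the list above, here is a minimal sketch. The function name and the capacity parameter are assumptions, as is the rule that passengers leave at end, so a trip occupies stops start..end−1 and the "step down" lands exactly at index end.

```python
def car_pooling(trips, capacity):
    """trips: iterable of (num_passengers, start, end) with 0 <= start < end."""
    if not trips:
        return True
    last_stop = max(end for _, _, end in trips)
    diff = [0] * (last_stop + 1)
    for passengers, start, end in trips:
        diff[start] += passengers  # passengers board at start
        diff[end] -= passengers    # and leave at end (range is [start, end - 1])

    load = 0
    for change in diff:            # prefix sum of diff = load at each stop
        load += change
        if load > capacity:
            return False
    return True


print(car_pooling([(2, 1, 5), (3, 3, 7)], 4))  # False: load is 5 at stops 3 and 4
print(car_pooling([(2, 1, 5), (3, 3, 7)], 5))  # True
```

Because the check runs during the recovery pass, the full array of loads never needs to be stored.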

Summary

Difference array supports range updates “add v to [left, right]” in O(1) each: diff[left] += v, diff[right+1] -= v.
Recover the final array by taking the prefix sum of diff—O(n).
Use a length-(n+1) diff array so diff[right+1] is valid when right = n−1.
q range updates + recover: O(q + n) vs naive O(q·n).

5.8 Kadane's Algorithm

Introduction

Kadane's algorithm finds the maximum sum of a contiguous subarray (maximum subarray sum) in O(n) time and O(1) space with a single pass. The idea: at each position, the maximum sum ending at that position is either “extend the best subarray ending at the previous position” or “start fresh with only this element.” By keeping a running “max sum ending here” and a global “best so far,” you never need to check every subarray—so you avoid O(n²). This section covers the standard formulation, handling all-negative arrays, and how to recover the indices of the best subarray.

Real-World Analogy

Imagine you’re tracking daily profit over a month. “Best contiguous period” is the stretch of days that would have made you the most money. At each day, you decide: “Do I extend my current streak (add today’s profit) or throw it away and start from today?” If the current streak goes negative, starting from today might be better. You only need to remember “best sum ending at yesterday” and “best sum seen so far”—no need to try every possible start and end day.

Example

arr = [-2, 1, -3, 4, -1, 2, 1, -5, 4]. The maximum sum contiguous subarray is [4, -1, 2, 1] with sum 6. Kadane: at index 3 (value 4), we start fresh (previous ending sum was negative); we extend through indices 4,5,6; then -5 reduces the ending sum; 4 starts a new candidate. One pass, no nested loops.

Formal Definition

Concept Note

Maximum subarray sum (MSCS): Given array arr, find max{ sum(arr[i..j]) : 0 ≤ i ≤ j < n }. Kadane's algorithm: Define cur = maximum sum of a contiguous subarray ending at the current index. Then cur = max(arr[i], cur + arr[i]) (either start fresh at i or extend the previous best ending). Update best = max(best, cur). Initial: cur = arr[0], best = arr[0]. Time O(n), space O(1).

If the problem allows an “empty” subarray (sum 0), use cur = max(0, cur + arr[i]) and best = max(best, cur) with cur = 0, best = 0 initially. Then all-negative arrays yield 0. For “non-empty subarray,” use the formulation above so the answer is at least the maximum single element.

Why This Topic Matters

Interviews: “Maximum subarray sum” is a classic; Kadane is the expected O(n) solution. Follow-ups: return the indices, handle circular array, or maximum product subarray (similar idea).
Efficiency: Brute force is O(n²) (or O(n³) with naive sum). Kadane is O(n).
Pattern: “Best contiguous segment” with an additive criterion often reduces to “max sum ending here” + “global max.”

Mental Model

At each index, you have a “current streak” (max sum of a subarray ending at the previous index). Adding the current element might improve it or make it worse. If the current streak becomes negative, it’s better to drop it and start a new streak with only the current element (otherwise any future positive segment would be dragged down). So: cur = max(arr[i], cur + arr[i]). The global best is the maximum cur you ever see.

Algorithm (Non-Empty Subarray)

Initialize cur = arr[0], best = arr[0].
For i from 1 to n−1: cur = max(arr[i], cur + arr[i]); best = max(best, cur).
Return best.

Interpretation: cur is the maximum sum of a contiguous subarray that ends at index i. We either extend (cur + arr[i]) or start fresh (arr[i]).

Python Implementation

def max_subarray_sum(arr):
    if not arr:
        return 0  # or None, depending on problem
    cur = arr[0]
    best = arr[0]
    for i in range(1, len(arr)):
        cur = max(arr[i], cur + arr[i])
        best = max(best, cur)
    return best

Line-by-Line Explanation

cur = max(arr[i], cur + arr[i]): Extend the previous best ending (cur + arr[i]) or start a new subarray with only arr[i]. We take the max, so we never keep a negative “ending” when arr[i] alone is better.
best = max(best, cur): Track the best sum we’ve seen over all positions.
Empty array: return 0 or raise; for non-empty we guarantee at least one element is considered.

Recovering the Indices (Start and End of Best Subarray)

While updating cur and best, track when cur equals arr[i] (we started fresh → new start index) and when best is updated (we have a new best → update end index).

def max_subarray_indices(arr):
    if not arr:
        return 0, -1, -1
    cur = arr[0]
    best = arr[0]
    start = end = 0
    best_start = best_end = 0
    for i in range(1, len(arr)):
        if arr[i] > cur + arr[i]:
            cur = arr[i]
            start = i
        else:
            cur = cur + arr[i]
        end = i
        if cur > best:
            best = cur
            best_start, best_end = start, end
    return best, best_start, best_end

ASCII Diagram

  arr:  [ -2,  1, -3,  4, -1,  2,  1, -5,  4 ]
  index:   0   1   2   3   4   5   6   7   8

  i=0: cur=-2, best=-2
  i=1: cur=max(1, -2+1)=1, best=1
  i=2: cur=max(-3, 1-3)=-2, best=1
  i=3: cur=max(4, -2+4)=4, best=4   (start fresh)
  i=4: cur=max(-1, 4-1)=3, best=4
  i=5: cur=max(2, 3+2)=5, best=5
  i=6: cur=max(1, 5+1)=6, best=6   ← max sum
  i=7: cur=max(-5, 6-5)=1, best=6
  i=8: cur=max(4, 1+4)=5, best=6
  Return 6 (subarray [4,-1,2,1])

All-Negative and Empty-Subarray Variants

All-negative array

With the non-empty formulation, cur and best are always sums of actual non-empty subarrays, so for an all-negative array the answer is the maximum (least negative) element. Correct.

Allow empty subarray (sum = 0)

Use cur = max(0, cur + arr[i]) and best = max(best, cur), with cur = 0, best = 0. Then if every element is negative, we return 0 (empty subarray).

def max_subarray_sum_allow_empty(arr):
    cur = best = 0
    for x in arr:
        cur = max(0, cur + x)
        best = max(best, cur)
    return best

Evolution: Brute Force → Kadane

Brute force: For each pair (i, j), compute sum(arr[i..j])—O(n²) pairs, O(n) sum each = O(n³), or O(n²) with prefix sum. Kadane: One pass, O(n). The key is that we don’t need to try every start index; the recurrence “max sum ending at i” depends only on “max sum ending at i−1” and arr[i].
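One way to convince yourself that the recurrence matches brute force is to compare the two on small random inputs. The sketch below uses hypothetical function names and the O(n²) running-sum form of brute force; both functions assume a non-empty array.

```python
import random


def max_subarray_brute(arr):
    # O(n^2): fix the start i, extend the end j while keeping a running sum
    best = arr[0]
    for i in range(len(arr)):
        running = 0
        for j in range(i, len(arr)):
            running += arr[j]
            best = max(best, running)
    return best


def max_subarray_kadane(arr):
    # O(n): max sum ending here depends only on the previous value and x
    cur = best = arr[0]
    for x in arr[1:]:
        cur = max(x, cur + x)
        best = max(best, cur)
    return best


random.seed(0)
for _ in range(200):
    arr = [random.randint(-10, 10) for _ in range(random.randint(1, 12))]
    assert max_subarray_brute(arr) == max_subarray_kadane(arr)

print(max_subarray_kadane([-2, 1, -3, 4, -1, 2, 1, -5, 4]))  # 6
```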

Optimization Insight

Whenever you need the “best contiguous segment” by sum (or a similar additive measure), ask: “Can I compute the best segment ending at each index from the best ending at the previous index?” If yes, that’s a linear recurrence and usually O(n) with O(1) space.

Time and Space Complexity

Time: O(n)—one pass over the array.
Space: O(1)—only a few variables (cur, best, and optionally indices).

Edge Cases

Empty array: Return 0, None, or as specified. Avoid indexing arr[0].
Single element: cur and best both equal that element; correct.
All negative: Non-empty version returns the maximum element. Empty-allowed version returns 0.
All positive: The whole array is the answer; Kadane correctly extends to the end.

Common Mistakes

Initializing cur = 0, best = 0 for non-empty: Then for all-negative arrays you’d return 0, but the problem may require a non-empty subarray (answer = max element). Use cur = arr[0], best = arr[0] for non-empty.
Using cur = cur + arr[i] without the max: Then a negative prefix keeps dragging cur down; you never “restart.” You must do cur = max(arr[i], cur + arr[i]).
Confusing with “max sum subsequence”: Subarray = contiguous. Subsequence = any subset in order. Kadane is for subarray only.

Common Mistake

Using “allow empty” (cur = max(0, cur + x)) when the problem says “non-empty contiguous subarray.” For [-1, -2, -3], the non-empty answer is -1; the empty-allowed answer is 0. Always clarify and implement accordingly.

Pattern Recognition

“Maximum sum contiguous subarray”: Kadane.
“Maximum product contiguous subarray”: Similar idea but track both max and min (negative × negative = positive).
“Best contiguous segment” with a cumulative condition: Consider “best ending here” recurrence.
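The maximum product variant mentioned above can be sketched like this. It is an illustrative version with a hypothetical function name; it takes the max and min over three candidates instead of swapping on a negative, and returning 0 for an empty array is an assumption.

```python
def max_product_subarray(arr):
    if not arr:
        return 0  # assumption: define the empty-array answer as 0
    cur_max = cur_min = best = arr[0]
    for x in arr[1:]:
        # A negative x turns the smallest product ending here into the largest,
        # so both extremes are carried forward.
        candidates = (x, cur_max * x, cur_min * x)
        cur_max = max(candidates)
        cur_min = min(candidates)
        best = max(best, cur_max)
    return best


print(max_product_subarray([2, 3, -2, 4]))  # 6   (subarray [2, 3])
print(max_product_subarray([-2, 3, -4]))    # 24  (whole array)
```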

Interview Insight

Say: “I’ll use Kadane’s algorithm. I’ll keep the maximum sum of a subarray ending at the current index, and either extend the previous best or start fresh. One pass, O(n) time, O(1) space.” If asked for indices, mention tracking start when we restart and end when we update the global best.

Practice Problems

Maximum subarray sum: Standard Kadane; return the sum or the subarray indices.
Maximum product subarray: Track cur_max and cur_min (and swap on negative); update best.
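Maximum sum circular subarray: Take the larger of the max subarray in the linear array and total − min subarray (wrap-around case). If every element is negative, return the linear max, because total − min subarray would describe an empty subarray.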
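A sketch of the circular variant from the list above, assuming a non-empty array and a non-empty answer (the function name is hypothetical). It runs Kadane for the maximum and a mirrored Kadane for the minimum in the same pass.

```python
def max_circular_subarray_sum(arr):
    total = 0
    cur_max = cur_min = 0
    best_max = best_min = arr[0]
    for x in arr:
        cur_max = max(x, cur_max + x)
        best_max = max(best_max, cur_max)
        cur_min = min(x, cur_min + x)
        best_min = min(best_min, cur_min)
        total += x
    if best_max < 0:
        # All elements negative: total - best_min would be the empty subarray
        return best_max
    # Wrap-around case: everything except the minimum-sum middle segment
    return max(best_max, total - best_min)


print(max_circular_subarray_sum([5, -3, 5]))     # 10  (wraps: 5 + 5)
print(max_circular_subarray_sum([1, -2, 3, -2])) # 3
print(max_circular_subarray_sum([-3, -2, -3]))   # -2
```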

Summary

Kadane’s algorithm finds the maximum sum of a contiguous subarray in O(n) time, O(1) space.
cur = max(arr[i], cur + arr[i]) (max sum ending at i); best = max(best, cur).
Non-empty: initialize cur = arr[0], best = arr[0]. Empty allowed: cur = 0, best = 0 and cur = max(0, cur + arr[i]).
All-negative: non-empty answer = maximum element; empty allowed = 0.

5.9 Binary Search on Answer

Introduction

Binary search on the answer (also called binary search on value or answer space binary search) is used when you want to find the minimum or maximum value that satisfies a condition, and that value lies in a known range [low, high]. Instead of checking every value in the range, you binary search: pick mid; ask “is mid a valid answer?” (or “can we achieve at least/most mid?”). If the predicate is monotonic—e.g. “if x works, then every value ≥ x works”—you can discard half of the range each time. Total cost is O(log(range) × cost of one predicate check). This section covers when to use it, how to design the predicate, and classic problems (minimum capacity, split array, etc.).

Real-World Analogy

Imagine finding the minimum speed at which you must drive to reach a city within 10 hours. Speeds 1, 2, 3, … up to some max. If speed 50 works, then 51, 52, … also work. If 50 doesn’t work, then 49, 48, … don’t work. So “does speed x work?” is monotonic in x. You binary search on speed: try 50; if it works, try 25; if not, try 75; and so on. You need only about log₂(max_speed) tries instead of checking every speed.

Example

Split array largest sum: Partition array into k contiguous subarrays; minimize the largest sum among them. Answer is in [max(arr), sum(arr)]. For a candidate “max sum” S, we can greedily form segments so each has sum ≤ S and count how many segments we need. If segments ≤ k, S is feasible. If we can do it with S, we can do it with S+1. Binary search on S; each check is O(n). Total O(n log(sum)).
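The example above can be sketched in code as follows. The function name is hypothetical, and the sketch assumes non-negative integers and 1 ≤ k ≤ len(arr), which is what makes the greedy check and the [max(arr), sum(arr)] range valid.

```python
def split_array_largest_sum(arr, k):
    def feasible(limit):
        # Greedy: keep filling the current segment until the next
        # element would push its sum above limit, then start a new one.
        segments, cur = 1, 0
        for x in arr:
            if cur + x > limit:
                segments += 1
                cur = x
            else:
                cur += x
        return segments <= k

    low, high = max(arr), sum(arr)
    while low < high:
        mid = (low + high) // 2
        if feasible(mid):
            high = mid      # mid works; try a smaller largest-sum
        else:
            low = mid + 1
    return low


print(split_array_largest_sum([7, 2, 5, 10, 8], 2))  # 18  ([7,2,5] and [10,8])
```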

Formal Definition

Concept Note

Answer space: The answer is an integer (or real) in range [low, high]. Predicate: feasible(x) = “is x a valid answer?” (or “can we achieve at least x?” / “at most x?”). Monotonicity: For “minimize the answer,” typically if feasible(mid) then every value ≥ mid is feasible (so we try left half for a smaller answer). For “maximize,” if feasible(mid) then every value ≤ mid is feasible (try right half). Algorithm: Binary search on [low, high]; at each mid, call feasible(mid); narrow the range. Time O(log(range) × T), where T = cost of feasible().

The hardest part is (1) identifying that the answer lies in a range and (2) writing a correct, monotonic predicate. Once you have that, the binary search loop is standard.

Why This Topic Matters

Interviews: “Minimum capacity to ship in D days,” “Koko eating bananas,” “Split array largest sum,” “minimum time to complete trips”—all classic “binary search on answer” problems.
Optimization: Turns “find minimum/maximum x such that P(x)” into O(log(range)) iterations instead of linear scan over the answer space.
Pattern: When the problem asks “minimize the maximum” or “maximize the minimum” and you can check “is x achievable?” in reasonable time, consider binary search on x.

Mental Model

The answer is “some number” in [low, high]. You don’t search the array by index—you search the value of the answer. For each candidate value mid, you ask: “If the answer were mid, would that be valid?” (feasible(mid)). The key is that feasibility is monotonic: once you cross a threshold, all values on one side work and all on the other don’t. So you can binary search that threshold.

When to Use

Problem asks for minimum or maximum of something (capacity, speed, sum, time).
You can define a range [low, high] that contains the answer.
You can write a function feasible(x) that returns True if x is achievable/valid, and feasibility is monotonic (e.g. if x is valid, then any y ≥ x is valid for “minimize” problems).

Algorithm: Minimize the Answer

Find the smallest value x in [low, high] such that feasible(x) is True. Typical: if feasible(mid), then we can try smaller—search left (high = mid). Otherwise search right (low = mid + 1).

def minimize_answer(low, high, feasible):
    while low < high:
        mid = (low + high) // 2
        if feasible(mid):
            high = mid   # try smaller
        else:
            low = mid + 1
    return low

Loop invariant: answer is in [low, high]. When low == high, that value is the minimum feasible.

Algorithm: Maximize the Answer

Find the largest value x such that feasible(x) is True. If feasible(mid), keep mid as a candidate and try larger (low = mid); else try smaller (high = mid - 1). Because low = mid does not shrink the range when mid rounds down, compute mid with a ceiling: mid = (low + high + 1) // 2.

def maximize_answer(low, high, feasible):
    while low < high:
        mid = (low + high + 1) // 2   # ceiling to avoid infinite loop
        if feasible(mid):
            low = mid
        else:
            high = mid - 1
    return low
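A small usage sketch for the maximize template: integer square root, i.e. the largest x with x·x ≤ n. The integer_sqrt name and the choice of [0, n] as the range are illustrative assumptions; the template is repeated so the block runs on its own.

```python
def maximize_answer(low, high, feasible):
    while low < high:
        mid = (low + high + 1) // 2   # ceiling so low = mid always makes progress
        if feasible(mid):
            low = mid
        else:
            high = mid - 1
    return low


def integer_sqrt(n):
    # feasible(x) = (x * x <= n) is True up to the answer and False after it;
    # x = 0 is always feasible for n >= 0, so the range contains the answer.
    return maximize_answer(0, n, lambda x: x * x <= n)


print(integer_sqrt(15))  # 3
print(integer_sqrt(16))  # 4
```

Here the monotonic direction is "if x works, every smaller value works," which is the maximize case.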

Example: Minimum Capacity to Ship in D Days

Weights in array; ship all in at most D days; same order; minimize the maximum weight per day (capacity). Answer in [max(weights), sum(weights)]. feasible(cap): can we ship all with capacity cap in ≤ D days? Greedy: pack days until adding the next item would exceed cap; then start a new day. If days needed ≤ D, cap is feasible. Monotonic: if cap works, cap+1 works. Binary search for minimum cap.

def ship_within_days(weights, D):
    def feasible(cap):
        days = 1
        cur = 0
        for w in weights:
            if cur + w > cap:
                days += 1
                cur = w
            else:
                cur += w
            if days > D:
                return False
        return True

    low, high = max(weights), sum(weights)
    while low < high:
        mid = (low + high) // 2
        if feasible(mid):
            high = mid
        else:
            low = mid + 1
    return low

Line-by-Line Notes

feasible(cap): simulate shipping with daily capacity cap; count days. If days ≤ D, cap is valid.
low must be at least max(weights) (the heaviest item has to fit in a single day); high needs to be at most sum(weights) (everything ships in one day).
We want minimum cap, so when feasible(mid) we set high = mid to try smaller.

ASCII Diagram: Binary Search on Answer

  Range [1, 10]. Find minimum x such that feasible(x).
  feasible(1)=F, feasible(2)=F, feasible(3)=F, feasible(4)=T, feasible(5)=T, ...

  low=1, high=10 → mid=5, feasible(5)=T → high=5
  low=1, high=5  → mid=3, feasible(3)=F → low=4
  low=4, high=5  → mid=4, feasible(4)=T → high=4
  low=4, high=4  → return 4

Time and Space Complexity

Iterations: O(log(high - low + 1))—each step halves the range.
Per iteration: Cost of feasible(mid). Often O(n) for array problems.
Total: O(T × log(range)), where T = cost of feasible(). Space O(1) plus any space used by feasible().

Edge Cases

No feasible value: If even high is not feasible (minimize) or low not feasible (maximize), handle after the loop or ensure range is chosen so that at least one value is feasible.
low > high: Range is empty; no solution (or adjust low/high so the answer is in range).
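Integer overflow for mid: Not a concern in Python, whose integers have arbitrary precision. In languages with fixed-width integers, compute mid as low + (high - low) / 2 using integer division.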
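The first two edge cases can be handled with a guard before the loop. This is one possible convention, not the only one: the function name and the choice of returning None when nothing is feasible are assumptions.

```python
def minimize_answer_or_none(low, high, feasible):
    # Empty range, or even the largest candidate fails: no answer exists.
    # By monotonicity, if high is infeasible then nothing below it is feasible.
    if low > high or not feasible(high):
        return None
    while low < high:
        mid = (low + high) // 2
        if feasible(mid):
            high = mid
        else:
            low = mid + 1
    return low


print(minimize_answer_or_none(1, 10, lambda x: x >= 4))   # 4
print(minimize_answer_or_none(1, 10, lambda x: x >= 11))  # None
```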

Common Mistakes

Predicate not monotonic: If feasible(x) is true for some x and false for larger x, binary search is wrong. Verify “if x works, then x+1 works” (or the appropriate direction).
Wrong bound update (maximize with mid): When maximizing, use mid = (low + high + 1) // 2 so that when feasible(mid) and you set low = mid, you don’t get stuck (low=3, high=4, mid=3 forever).
Wrong range: low/high must include the answer. E.g. minimum capacity must be ≥ max(weights).

Common Mistake

In “maximize” binary search, writing mid = (low + high) // 2 and then low = mid when feasible: when low and high are consecutive (e.g. 4 and 5), mid stays 4, and if feasible(4) is True you set low=4 again—infinite loop. Use mid = (low + high + 1) // 2 so mid moves to high.

Evolution: Linear Scan vs Binary Search on Answer

Naive: try each value from low to high until you find the minimum feasible—O(range × T). Binary search: O(log(range) × T). When the range is large (e.g. up to 10^9), only binary search is practical.

Optimization Insight

If the problem says “minimize the maximum” or “maximize the minimum” and you can check “can we achieve value x?” in O(n) or O(n log n), binary search on the answer often gives the optimal complexity. Always check monotonicity of the predicate.

Pattern Recognition

“Minimize the maximum …” / “Maximize the minimum …”: Candidate for binary search on answer.
“What is the minimum capacity/speed/time such that …?”: Answer in a range; define feasible(cap) and binary search.
“Split into k parts, minimize the largest sum”: Binary search on the largest sum; feasible = greedy partition.

Interview Insight

When you see “minimize the maximum” or “maximize the minimum,” say: “The answer is in a range. I’ll binary search on the answer. For each candidate value, I’ll check if it’s achievable—that’s my feasible function. If feasible is monotonic, we’re done. Time O(log(range) × cost of feasible).” Then implement the loop and the predicate.

Practice Problems

Capacity to ship packages in D days: Minimize max capacity; feasible = greedy days.
Koko eating bananas: Minimize speed; feasible = can finish in H hours.
Split array largest sum: Minimize largest sum when splitting into k subarrays; feasible = greedy segments.
Minimum time to complete trips: Binary search on time; feasible = can complete all trips.
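As one worked instance from the list above, here is a sketch of the Koko eating bananas problem. The function name is hypothetical, and it assumes positive pile sizes and h ≥ len(piles), so that speed max(piles) is always feasible.

```python
def min_eating_speed(piles, h):
    def feasible(speed):
        # Each pile takes ceil(p / speed) hours; (p + speed - 1) // speed
        # is the integer form of that ceiling.
        hours = sum((p + speed - 1) // speed for p in piles)
        return hours <= h

    low, high = 1, max(piles)
    while low < high:
        mid = (low + high) // 2
        if feasible(mid):
            high = mid
        else:
            low = mid + 1
    return low


print(min_eating_speed([3, 6, 7, 11], 8))         # 4
print(min_eating_speed([30, 11, 23, 4, 20], 5))   # 30
```

Each feasibility check is O(n), so the total is O(n log(max(piles))).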

Summary

Binary search on answer finds min/max value in [low, high] such that feasible(x) is true. Requires monotonic predicate.
Minimize: If feasible(mid), try smaller (high = mid); else low = mid + 1. Return low.
Maximize: Use mid = (low + high + 1) // 2; if feasible(mid), low = mid; else high = mid - 1. Return low.
Choose [low, high] so the answer is inside; implement feasible() correctly and check monotonicity.

5.10 2D Arrays

Introduction

A 2D array (or matrix) is an array of arrays: each element is itself an array, so you access elements with two indices—mat[row][col] or mat[i][j]. Rows and columns form a grid; dimensions are “rows × columns” (e.g. 3×4). In Python, a 2D array is a list of lists. Used for matrices in math, grids in games, adjacency structures for graphs, and images (pixels). This section covers creating 2D arrays correctly in Python, indexing, traversal patterns, and common pitfalls (shallow copy).

Real-World Analogy

Think of a spreadsheet or chessboard: rows and columns. Cell (2, 3) means row 2, column 3. A 2D array is the same: first index = row, second index = column. Like a 1D array is a row of lockers, a 2D array is a grid of lockers—you need two coordinates (row, col) to open one.

Example

mat = [[1,2,3], [4,5,6], [7,8,9]] is a 3×3 matrix. mat[0][0]=1, mat[1][2]=6, mat[2][1]=8. Rows are mat[0], mat[1], mat[2]; number of rows = len(mat), number of columns = len(mat[0]) (assuming non-empty).

Formal Definition

Concept Note

2D array (matrix): A structure with rows and columns. Element at row r and column c is accessed as mat[r][c]. Dimensions: rows × cols. In memory, often stored row-major (row 0, then row 1, …) so mat[r][c] is at base + r×cols + c. In Python, mat is a list of lists; mat[r] is the r-th row (a list); mat[r][c] is the c-th element of that row. Access and update: O(1).

All rows should have the same length (rectangular matrix) unless you explicitly need a “ragged” 2D structure.

Why This Topic Matters

Foundation: Matrices in math, dynamic programming tables, graph adjacency matrices, grid problems (BFS/DFS).
Interviews: Matrix traversal, spiral, search in sorted 2D array, island count, path sum—all assume comfort with 2D indexing and bounds.
Python gotcha: Creating 2D arrays with [[0]*c]*r reuses the same row; use [[0]*c for _ in range(r)] for independent rows.

Mental Model

Rows are horizontal; columns are vertical. mat[i][j] = row i, column j. Row index i goes from 0 to rows−1; column index j from 0 to cols−1. Traversal “row by row” is for i in range(rows): for j in range(cols): mat[i][j]. “Column by column” swaps the loops (j outer, i inner).

Creating a 2D Array in Python

Correct: Independent Rows

rows, cols = 3, 4
mat = [[0] * cols for _ in range(rows)]

This creates rows separate lists, each of length cols. Changing mat[0][0] does not affect mat[1][0].

Wrong: Shallow Copy

mat = [[0] * cols] * rows   # BAD: same row repeated

Here all rows are the same list. mat[0] is mat[1] is mat[2]. So mat[0][0] = 1 makes mat[1][0] and mat[2][0] also 1. Never use this for a mutable 2D array.
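The difference is easy to see by running both constructions side by side. The variable names below are illustrative only.

```python
rows, cols = 3, 4

aliased = [[0] * cols] * rows          # one inner list, referenced three times
aliased[0][0] = 1
print(aliased)                         # [[1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]]
print(aliased[0] is aliased[1])        # True

independent = [[0] * cols for _ in range(rows)]   # a new inner list per row
independent[0][0] = 1
print(independent)                     # [[1, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
print(independent[0] is independent[1])  # False
```

The inner [0] * cols is safe in both cases because integers are immutable; the problem only arises when the repeated object is itself mutable, as a row list is.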

Common Mistake

Using [[0]*cols]*rows creates one inner list referenced by every row. Use [[0]*cols for _ in range(rows)] so each row is a new list.

Dimensions and Bounds

rows = len(mat)
cols = len(mat[0]) if mat else 0

Valid indices: 0 ≤ i < rows, 0 ≤ j < cols. For non-rectangular lists, cols might vary; then use len(mat[i]) for the i-th row.

Access and Update

Same as 1D: mat[i][j] to read or write. O(1).

mat[1][2] = 99
x = mat[0][0]

Traversal

Row by Row (Standard)

for i in range(len(mat)):
    for j in range(len(mat[0])):
        print(mat[i][j])

Column by Column

for j in range(len(mat[0])):
    for i in range(len(mat)):
        print(mat[i][j])

With enumerate

for i, row in enumerate(mat):
    for j, val in enumerate(row):
        print(i, j, val)

ASCII Diagram

  mat[3][4]:     col 0   col 1   col 2   col 3
  row 0            a       b       c       d
  row 1            e       f       g       h
  row 2            i       j       k       l

  mat[1][2] = g
  len(mat) = 3, len(mat[0]) = 4

Time and Space Complexity

Access/update mat[i][j]: O(1).
Traverse all elements: O(rows × cols).
Space: O(rows × cols) for the matrix.

Edge Cases

Empty matrix: mat = []; len(mat)=0. Don’t use len(mat[0]). Check if not mat or not mat[0] before using dimensions.
Single row: mat = [[1,2,3]]; one row, three columns.
Single column: mat = [[1],[2],[3]]; three rows, one column.

Common Mistakes

[[0]*c]*r: Same row reference; mutations affect every row. Use list comprehension.
Off-by-one in bounds: Valid (i,j) are 0 to rows−1 and 0 to cols−1. Loop conditions: range(len(mat)), range(len(mat[0])).
Assuming rectangular: If rows can have different lengths, use len(mat[i]) for the j-loop bound.

Pattern Recognition

Grid / matrix problems: 2D array; neighbors = (i±1,j), (i,j±1) or 8-direction.
DP table: Often 2D; dp[i][j] from dp[i-1][j], dp[i][j-1], etc.
Next topics: Matrix traversal (5.11), spiral (5.12) build on 2D indexing.

Interview Insight

When given a matrix, state dimensions: “It’s rows × cols. I’ll use mat[i][j] for row i, column j. I’ll traverse with nested loops—row by row unless we need column order. I’ll check for empty matrix and bounds before indexing.” If you need to create a 2D array, use [[0]*cols for _ in range(rows)].

Practice Problems

Matrix initialization: Create rows×cols matrix of zeros (correct way).
Transpose: New matrix where new[i][j] = mat[j][i].
Sum all elements: Nested loop; O(rows×cols).
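
As a hint for the transpose problem, the rule new[i][j] = mat[j][i] can be sketched as follows. This assumes a rectangular input and returns [] for an empty one; the function name `transpose` is illustrative:

```python
def transpose(mat):
    """Return a new cols×rows matrix with result[j][i] = mat[i][j]."""
    if not mat or not mat[0]:
        return []
    rows, cols = len(mat), len(mat[0])
    # The result has cols rows and rows columns; each row is a fresh list.
    result = [[0] * rows for _ in range(cols)]
    for i in range(rows):
        for j in range(cols):
            result[j][i] = mat[i][j]
    return result


print(transpose([[1, 2, 3], [4, 5, 6]]))  # [[1, 4], [2, 5], [3, 6]]
```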

Summary

2D array = list of lists; mat[i][j] = row i, column j. Dimensions: rows × cols.
Create with [[0]*cols for _ in range(rows)]—not [[0]*cols]*rows.
Traverse row-by-row or column-by-column with nested loops; check empty and bounds.

5.11 Matrix Traversal

Introduction

Matrix traversal means visiting every cell of a 2D array in a defined order. Common patterns: row-wise (left to right, top to bottom), column-wise, diagonal (main, anti-, or all diagonals), layer/ring (from outer boundary inward—used in spiral), and BFS/DFS on the grid (neighbor-based). The choice of order affects how you solve problems (e.g. DP row-by-row, spiral by layer). This section covers these patterns, how to iterate diagonals and boundaries, and 4- vs 8-neighbor movement for grid problems.

Real-World Analogy

Imagine reading a page of text: you go left to right, then next line (row-wise). Or scanning a form column by column (column-wise). Diagonal is like cutting the grid with lines of slope 1 or −1. Layer by layer is like peeling an onion—outer rectangle first, then the inner one. Each order suits different tasks (fill row-by-row for DP; peel for spiral).

Example

3×3 matrix: row-wise order is (0,0), (0,1), (0,2), (1,0), (1,1), (1,2), (2,0), (2,1), (2,2). Main diagonal: (0,0), (1,1), (2,2). Layer 0 = top row, right col, bottom row, left col; layer 1 = center. Spiral (next topic) follows layers with a specific winding order.

Formal Definition

Concept Note

Traversal: A sequence of cell indices (i, j) that visits each cell of an rows×cols matrix exactly once (or as needed). Row-wise: for i in range(rows): for j in range(cols). Column-wise: for j in range(cols): for i in range(rows). Diagonal: cells with constant (i−j) or (i+j). Layer: cells on the boundary of a sub-rectangle; repeat for inner rectangles. Neighbors: 4-direction (i±1, j), (i, j±1); 8-direction adds diagonals.

Why This Topic Matters

DP and iteration: Many DP tables are filled row by row or diagonal by diagonal.
Spiral / boundaries: Topic 5.12 uses layer-wise traversal; you need clean boundary loops.
Grid BFS/DFS: Graph-like traversal on (i,j) with 4 or 8 neighbors—islands, shortest path in maze.

Mental Model

Row-wise = “for each row, scan all columns.” Column-wise = “for each column, scan all rows.” Diagonal = “all (i,j) where i−j = d” (one diagonal) or “for each sum s, all (i,j) with i+j = s.” Layer = “for k = 0, 1, …, (min(rows,cols)+1)//2 − 1, traverse the k-th boundary (top row, right col, bottom row, left col of the k-th rectangle).”

Row-Wise and Column-Wise

# Row-wise (standard)
for i in range(rows):
    for j in range(cols):
        process(mat[i][j])

# Column-wise
for j in range(cols):
    for i in range(rows):
        process(mat[i][j])

Diagonal Traversal

Main Diagonal (i == j)

for i in range(min(rows, cols)):
    process(mat[i][i])

All Diagonals (constant i − j)

For each value d = i − j, we get one diagonal. d ranges from -(cols-1) to (rows-1). For each d, iterate (i, j) such that i − j = d and 0 ≤ i < rows, 0 ≤ j < cols.

for d in range(-(cols - 1), rows):
    for i in range(rows):
        j = i - d
        if 0 <= j < cols:
            process(mat[i][j])

Anti-Diagonals (constant i + j)

Sum s = i + j ranges from 0 to (rows−1)+(cols−1). For each s, iterate valid (i, j) with i + j = s.

for s in range(rows + cols - 1):
    for i in range(rows):
        j = s - i
        if 0 <= j < cols:
            process(mat[i][j])
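
The loops above test every row for every diagonal. A possible refinement, sketched here for anti-diagonals, computes the valid range of i for each sum s so that only real cells are visited. The function name `anti_diagonals` and the choice to return a list of lists are assumptions made for illustration:

```python
def anti_diagonals(mat):
    """Group cells by s = i + j, visiting each cell exactly once."""
    if not mat or not mat[0]:
        return []
    rows, cols = len(mat), len(mat[0])
    result = []
    for s in range(rows + cols - 1):
        # j = s - i must satisfy 0 <= j <= cols - 1, which bounds i.
        i_lo = max(0, s - (cols - 1))
        i_hi = min(rows - 1, s)
        result.append([mat[i][s - i] for i in range(i_lo, i_hi + 1)])
    return result


print(anti_diagonals([[1, 2, 3], [4, 5, 6]]))  # [[1], [2, 4], [3, 5], [6]]
```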

Layer / Boundary Traversal

Traverse the k-th “ring” (boundary of the rectangle from row k to row rows−1−k and col k to col cols−1−k). Top row (k, k) to (k, cols−1−k), right col (k+1, cols−1−k) to (rows−1−k, cols−1−k), bottom row (rows−1−k, cols−2−k) to (rows−1−k, k), left col (rows−2−k, k) to (k+1, k). Handle the case when the ring collapses to a single row or column.

def traverse_boundary(mat, k):
    rows, cols = len(mat), len(mat[0])
    top, bottom = k, rows - 1 - k
    left, right = k, cols - 1 - k
    if top > bottom or left > right:
        return
    for j in range(left, right + 1):
        process(mat[top][j])
    for i in range(top + 1, bottom + 1):
        process(mat[i][right])
    if top != bottom:
        for j in range(right - 1, left - 1, -1):
            process(mat[bottom][j])
    if left != right:
        for i in range(bottom - 1, top, -1):
            process(mat[i][left])

Repeat for k = 0, 1, … until the rectangle is empty (top > bottom or left > right). This is the structure used for spiral order.
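
The repetition over k can be sketched as below. To keep the example self-contained, `ring` is a hypothetical variant of the boundary function that returns the cells instead of calling process, and `all_rings` collects rings until one comes back empty:

```python
def ring(mat, k):
    """Return the cells of the k-th ring in clockwise order, or [] if empty."""
    if not mat or not mat[0]:
        return []
    rows, cols = len(mat), len(mat[0])
    top, bottom = k, rows - 1 - k
    left, right = k, cols - 1 - k
    if top > bottom or left > right:
        return []
    cells = [mat[top][j] for j in range(left, right + 1)]
    cells += [mat[i][right] for i in range(top + 1, bottom + 1)]
    if top != bottom:  # a second row exists
        cells += [mat[bottom][j] for j in range(right - 1, left - 1, -1)]
    if left != right:  # a second column exists
        cells += [mat[i][left] for i in range(bottom - 1, top, -1)]
    return cells


def all_rings(mat):
    """Collect rings for k = 0, 1, ... until the rectangle is empty."""
    rings = []
    k = 0
    while True:
        cells = ring(mat, k)
        if not cells:
            return rings
        rings.append(cells)
        k += 1


grid = [list("abc"), list("def"), list("ghi")]
print(all_rings(grid))  # [['a', 'b', 'c', 'f', 'i', 'h', 'g', 'd'], ['e']]
```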

4-Direction and 8-Direction Neighbors

For cell (i, j), 4-neighbors: (i−1,j), (i+1,j), (i,j−1), (i,j+1). 8-neighbors add (i−1,j−1), (i−1,j+1), (i+1,j−1), (i+1,j+1). Use for BFS/DFS on grids (maze, islands).

# 4-direction
dr = [-1, 1, 0, 0]
dc = [0, 0, -1, 1]
for d in range(4):
    ni, nj = i + dr[d], j + dc[d]
    if 0 <= ni < rows and 0 <= nj < cols:
        process(mat[ni][nj])  # (ni, nj) is a valid neighbor
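
The same idea extends to 8 directions by adding the four diagonal offsets. The sketch below wraps the bounds check in a generator; `DIRS_4`, `DIRS_8`, and `neighbors` are illustrative names, not part of any library:

```python
DIRS_4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]
DIRS_8 = DIRS_4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]


def neighbors(i, j, rows, cols, dirs=DIRS_4):
    """Yield the in-bounds neighbors of (i, j) for the given offsets."""
    for di, dj in dirs:
        ni, nj = i + di, j + dj
        if 0 <= ni < rows and 0 <= nj < cols:
            yield ni, nj


print(list(neighbors(0, 0, 3, 3)))               # [(1, 0), (0, 1)]
print(len(list(neighbors(1, 1, 3, 3, DIRS_8))))  # 8
```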

ASCII Diagram

  Matrix 3×3:
     j=0  j=1  j=2
  i=0  a    b    c
  i=1  d    e    f
  i=2  g    h    i

  Row-wise: a,b,c, d,e,f, g,h,i
  Main diagonal: a, e, i  (i=j)
  Layer 0: a,b,c, f,i, h,g, d
  Layer 1: e

Time and Space Complexity

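Any full traversal: Visits O(rows×cols) cells; time O(rows×cols) when each cell is reached in O(1). The diagonal loops above scan every row for each diagonal, so as written they perform O(rows × (rows + cols)) index checks; restricting i to the valid range for each diagonal restores O(rows×cols).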
Space: O(1) for the traversal order itself; O(rows×cols) if you store visited (e.g. BFS/DFS).

Edge Cases

Single row/column: The ring collapses to a single strip: the top row for a single row, or the top cell plus the right column for a single column. Avoid duplicating cells in boundary traversal.
Non-square: Diagonals have different lengths; main diagonal length = min(rows, cols).

Common Mistakes

Boundary traversal: When top == bottom or left == right, don’t traverse the same row/column twice (skip the “bottom” or “left” segments when the ring is a single row or column).
Neighbor bounds: Always check 0 ≤ ni < rows and 0 ≤ nj < cols before using (ni, nj).

Interview Insight

When the problem involves “traverse the matrix” or “spiral” or “layer,” say: “I’ll use row/column indices and either nested loops (row-wise) or a boundary loop (layer). For neighbors I’ll use a dr, dc array and bounds check.” For spiral, mention that you traverse layer by layer (Topic 5.12).

Practice Problems

Traverse diagonals: Print all elements in anti-diagonal order (i+j constant).
Boundary of matrix: Print only the outer ring.
BFS on grid: Shortest path in a 0/1 matrix (4-neighbors).
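
For the BFS problem, one possible shape of a solution is sketched below. It assumes 0 marks an open cell and 1 marks a wall, and that start and goal are (row, col) tuples; the problem statement you are given may define these differently. The function name `shortest_path` is hypothetical:

```python
from collections import deque


def shortest_path(grid, start, goal):
    """Fewest 4-direction steps from start to goal, or -1 if unreachable.

    Assumption: 0 = open cell, 1 = wall.
    """
    if not grid or not grid[0]:
        return -1
    rows, cols = len(grid), len(grid[0])
    (si, sj), (gi, gj) = start, goal
    if grid[si][sj] == 1 or grid[gi][gj] == 1:
        return -1
    visited = [[False] * cols for _ in range(rows)]
    visited[si][sj] = True
    queue = deque([(si, sj, 0)])  # (row, col, distance from start)
    while queue:
        i, j, dist = queue.popleft()
        if (i, j) == (gi, gj):
            return dist
        for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            ni, nj = i + di, j + dj
            if (0 <= ni < rows and 0 <= nj < cols
                    and grid[ni][nj] == 0 and not visited[ni][nj]):
                visited[ni][nj] = True  # mark on enqueue to avoid repeats
                queue.append((ni, nj, dist + 1))
    return -1


maze = [
    [0, 0, 0],
    [1, 1, 0],
    [0, 0, 0],
]
print(shortest_path(maze, (0, 0), (2, 0)))  # 6
```

Marking a cell as visited when it is enqueued, rather than when it is dequeued, keeps each cell in the queue at most once.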

Summary

Row-wise: for i: for j. Column-wise: for j: for i.
Diagonal: constant (i−j) or (i+j); iterate (i, j) with bounds check.
Layer: For each k, traverse top row, right col, bottom row, left col of the k-th rectangle; handle single row/col.
Neighbors: 4-direction (dr, dc); 8-direction adds diagonals; always check bounds.

5.12 Spiral Matrix

Introduction

Spiral matrix traversal visits elements in the order they appear when you walk the matrix in a spiral: right along the top row, down the right column, left along the bottom row, up the left column, then repeat for the inner rectangle. Same idea as Topic 5.11’s layer traversal with a fixed winding order. Two common problems: (1) Given a matrix, return elements in spiral order; (2) Generate an n×n matrix filled with 1 to n² in spiral order. Both use the same layer-by-layer boundary logic. This section covers the standard solution, corner handling, and the generate variant.

Real-World Analogy

Imagine walking around the perimeter of a rectangular field: go right along the top edge, turn and go down the right edge, turn and go left along the bottom, turn and go up the left edge. You’re back near the start but one “layer” in. Repeat for the inner rectangle until you’ve covered the whole field. That’s spiral order—one boundary at a time, clockwise (or counter-clockwise; we use clockwise here).

Example

Matrix [[1,2,3],[4,5,6],[7,8,9]]. Spiral order: 1,2,3, 6,9, 8,7, 4, 5. Layer 0: top row 1,2,3; right col 6,9; bottom row 8,7; left col 4. Layer 1: center 5.

Formal Definition

Concept Note

Spiral order (clockwise): For each layer k = 0, 1, …, traverse in order: (1) top row from (k,k) to (k, cols−1−k), (2) right column from (k+1, cols−1−k) to (rows−1−k, cols−1−k), (3) bottom row from (rows−1−k, cols−2−k) to (rows−1−k, k) if top ≠ bottom, (4) left column from (rows−2−k, k) to (k+1, k) if left ≠ right. Stop when the current rectangle is empty (top > bottom or left > right). Each cell is visited once; output length = rows×cols.

Why This Topic Matters

Interviews: “Spiral order” and “generate spiral matrix” are common; clean boundary loops and single-row/column handling are what interviewers look for.
Layer abstraction: Same pattern as general boundary traversal (5.11); spiral fixes the order (right, down, left, up).

Mental Model

Maintain four bounds: top, bottom, left, right. The current “ring” is the rectangle between these. Add all of the top row (left→right), then right column (top+1→bottom), then bottom row (right−1→left) only if there is more than one row, then left column (bottom−1→top+1) only if there is more than one column. Then shrink: top++, bottom--, left++, right--. Repeat until top > bottom or left > right.

Algorithm: Spiral Order (Read)

Initialize top=0, bottom=rows−1, left=0, right=cols−1. Result list = [].
While top ≤ bottom and left ≤ right:
Traverse top row: for j from left to right, append mat[top][j].
Traverse right column: for i from top+1 to bottom, append mat[i][right].
If top < bottom: traverse bottom row from right−1 to left, append mat[bottom][j].
If left < right: traverse left column from bottom−1 to top+1, append mat[i][left].
top++, bottom--, left++, right--.
Return result.
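
The steps above can be sketched directly, shrinking all four bounds at the end of each layer and guarding with top < bottom and left < right. This is a variant written for illustration (the name `spiral_order_rings` is hypothetical); the implementation in the next subsection instead updates each bound right after its segment:

```python
def spiral_order_rings(mat):
    """Spiral order, shrinking all four bounds once per layer."""
    if not mat or not mat[0]:
        return []
    top, bottom = 0, len(mat) - 1
    left, right = 0, len(mat[0]) - 1
    result = []
    while top <= bottom and left <= right:
        for j in range(left, right + 1):           # top row
            result.append(mat[top][j])
        for i in range(top + 1, bottom + 1):       # right column
            result.append(mat[i][right])
        if top < bottom:                           # more than one row
            for j in range(right - 1, left - 1, -1):
                result.append(mat[bottom][j])
        if left < right:                           # more than one column
            for i in range(bottom - 1, top, -1):
                result.append(mat[i][left])
        top += 1
        bottom -= 1
        left += 1
        right -= 1
    return result


print(spiral_order_rings([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
# [1, 2, 3, 6, 9, 8, 7, 4, 5]
```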

Python Implementation: Spiral Order

def spiral_order(mat):
    if not mat or not mat[0]:
        return []
    rows, cols = len(mat), len(mat[0])
    top, bottom = 0, rows - 1
    left, right = 0, cols - 1
    result = []
    while top <= bottom and left <= right:
        for j in range(left, right + 1):
            result.append(mat[top][j])
        top += 1
        for i in range(top, bottom + 1):
            result.append(mat[i][right])
        right -= 1
        if top <= bottom:
            for j in range(right, left - 1, -1):
                result.append(mat[bottom][j])
            bottom -= 1
        if left <= right:
            for i in range(bottom, top - 1, -1):
                result.append(mat[i][left])
            left += 1
    return result

Line-by-Line Notes

After the top row we do top += 1, so the right column starts at the next row (no duplicate corner).
After the right column we do right -= 1; bottom row goes from right to left (no duplicate).
if top <= bottom: when there is only one row left, we already added it as the “top” row; skip the bottom pass.
if left <= right: when there is only one column left, we already added it in the right column; skip the left pass.

Generate Spiral Matrix (n×n, 1 to n²)

Same bounds (top, bottom, left, right). Fill with a counter: do the same four segments (top row, right col, bottom row, left col), writing counter and incrementing. Initialize mat with zeros; write in spiral order.

def generate_spiral(n):
    mat = [[0] * n for _ in range(n)]
    top, bottom = 0, n - 1
    left, right = 0, n - 1
    num = 1
    while top <= bottom and left <= right:
        for j in range(left, right + 1):
            mat[top][j] = num
            num += 1
        top += 1
        for i in range(top, bottom + 1):
            mat[i][right] = num
            num += 1
        right -= 1
        if top <= bottom:
            for j in range(right, left - 1, -1):
                mat[bottom][j] = num
                num += 1
            bottom -= 1
        if left <= right:
            for i in range(bottom, top - 1, -1):
                mat[i][left] = num
                num += 1
            left += 1
    return mat

ASCII Diagram

  3×3:  [1 2 3]
        [4 5 6]
        [7 8 9]

  Layer 0: top row 1,2,3 → right 6,9 → bottom 8,7 → left 4
  Layer 1: top row 5 (then top>bottom, done)
  Order: 1,2,3,6,9,8,7,4,5

Time and Space Complexity

Spiral order (read): Each cell visited once; time O(rows×cols). Space O(1) extra plus O(rows×cols) for the output list.
Generate spiral: O(n²) time and space for the n×n matrix.

Edge Cases

Empty matrix: Return [] or empty matrix.
Single row: Top row adds all; after top += 1 the right-column loop is empty and the bottom row is skipped (top > bottom); the left-column loop is also empty. Correct.
Single column: Top row adds one cell; the right column adds the rest; after right -= 1 the bottom-row loop is empty and the left column is skipped (left > right). Correct.

Common Mistakes

Double-counting corners: After adding the top row, start the right column at top+1 (and we already did top+=1). After the right column, the bottom row should go from right−1 to left (and we did right−=1). Same for left column.
Forgetting single row/column: Without “if top <= bottom” before the bottom row, a single horizontal strip would be added twice. Same for “if left <= right” and the left column.

Common Mistake

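Traversing the bottom row or left column without a guard. When the ring collapses to one row or one column, those segments were already covered by the top row or right column; adding them again duplicates elements. The guard is top < bottom / left < right when all four bounds shrink at the end of each layer, and top <= bottom / left <= right in the implementation above, where each bound is updated right after its segment.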

Pattern Recognition

“Spiral order” / “spiral traversal”: Layer-by-layer, four segments per layer with the if-checks.
“Generate n×n spiral”: Same loop, write counter instead of read.

Interview Insight

Say: “I’ll traverse layer by layer with top, bottom, left, right. For each layer I add the top row, right column, then if there’s more than one row the bottom row, then if there’s more than one column the left column. Then shrink the bounds.” Mention the single-row/column check to avoid duplicates.

Practice Problems

Spiral order: Return elements of matrix in spiral order (as above).
Generate spiral matrix: n×n matrix with 1..n² in spiral order.
Spiral order for non-square: Same algorithm works for rows×cols.

Summary

Spiral order = layer by layer: top row (left→right), right col (top+1→bottom), bottom row (right−1→left) if more than one row remains, left col (bottom−1→top+1) if more than one column remains.
Update bounds after each segment (top++, right--, etc.) to avoid duplicate corners.
With bounds updated per segment, guard the bottom and left segments with if top <= bottom and if left <= right so a single row or column is not traversed twice.

5.13 Subarrays

Introduction

A subarray is a contiguous segment of an array—elements from index start to index end (inclusive), with start ≤ end. So a subarray is uniquely defined by the pair (start, end). There are n(n+1)/2 subarrays in an array of length n. This is different from a subsequence, which keeps order but does not require contiguity (Topic 5.14). Most “subarray” problems (max sum, subarray sum equals K, longest with property) use techniques you’ve seen: Kadane, prefix sum, sliding window, two pointers. This section clarifies the definition, count, and how to enumerate or reason about subarrays.

Real-World Analogy

Imagine a train with numbered cars. A subarray is a connected stretch of cars: cars 3, 4, 5 together. You can’t take cars 3 and 5 and skip 4 and still call it a “subarray”—that would be a subsequence. So “subarray” = one contiguous block. The number of possible contiguous blocks is the number of ways to pick a start car and an end car (with start ≤ end)—that’s n(n+1)/2 for n cars.

Example

Array [1, 2, 3]. Subarrays: [1], [1,2], [1,2,3], [2], [2,3], [3]—six total. 3(3+1)/2 = 6. [1,3] is not a subarray (not contiguous); it’s a subsequence.

Formal Definition

Concept Note

Subarray: For array arr of length n, a subarray is arr[start..end] for some 0 ≤ start ≤ end < n (0-based). It has (end − start + 1) elements. Count: For each start (0 to n−1), end can be start, start+1, …, n−1—that’s (n − start) choices. Total = n + (n−1) + … + 1 = n(n+1)/2. Subsequence (different): any subset of indices in order; not necessarily contiguous; 2^n possible.

Why This Topic Matters

Terminology: Interview problems often say “subarray”; you must not confuse with “subsequence.” Subarray ⇒ contiguous; use Kadane, prefix sum, sliding window.
Counting: “How many subarrays have sum K?” uses prefix + map; “max sum subarray” uses Kadane; “longest subarray with …” uses sliding window or two pointers.

Mental Model

Fix start; vary end from start to n−1. That gives all subarrays starting at start. Then start = 0, 1, …, n−1. So you can enumerate with two loops: for start in range(n): for end in range(start, n). The subarray is arr[start:end+1].

Enumerating All Subarrays

# All subarrays: (start, end) with 0 <= start <= end < n
for start in range(len(arr)):
    for end in range(start, len(arr)):
        sub = arr[start:end+1]
        # process sub, or use start/end to compute sum via prefix

This is O(n²) subarrays; if you do O(1) work per subarray (e.g. sum via prefix), total O(n²). Often we don’t enumerate all—we use Kadane (O(n)) or prefix+map for “count subarray sum = K” (O(n)).
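
A sketch of the O(1)-per-subarray approach mentioned above: build a prefix-sum array once, then read each range sum as a difference of two prefix values. The function name `subarray_sums` and the dictionary return type are choices made for illustration:

```python
def subarray_sums(arr):
    """Map every (start, end) pair to the sum of arr[start..end]."""
    n = len(arr)
    # prefix[i] holds the sum of the first i elements; prefix[0] = 0.
    prefix = [0] * (n + 1)
    for i, value in enumerate(arr):
        prefix[i + 1] = prefix[i] + value
    sums = {}
    for start in range(n):
        for end in range(start, n):
            sums[(start, end)] = prefix[end + 1] - prefix[start]
    return sums


sums = subarray_sums([1, 2, 3])
print(len(sums))     # 6, which is 3 * (3 + 1) / 2
print(sums[(1, 2)])  # 5
```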

Count of Subarrays

Number of pairs (start, end) with 0 ≤ start ≤ end < n = n + (n−1) + … + 1 = n(n+1)/2. So brute-force “check every subarray” is at least Ω(n²) if we must look at each once.

Subarray vs Subsequence

	Subarray	Subsequence
Definition	Contiguous segment	Any subset, order preserved
Count	n(n+1)/2	2^n
Techniques	Kadane, prefix, sliding window	DP, LIS-style

Common Subarray Problems (Recap)

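Maximum sum subarray: Kadane (5.8), O(n).
Count subarrays with sum = K: Prefix sum + frequency map (5.6), O(n).
Longest subarray with sum ≤ K (non-negative elements) or with at most K distinct values: Sliding window (5.5), O(n).
Minimum/maximum length subarray with sum ≥ K (non-negative elements): Sliding window, O(n). With negative values the window sum is no longer monotonic, so a plain sliding window does not apply.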
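
As a reminder of the prefix + frequency map idea, here is a minimal sketch for counting subarrays with sum K. The function name `count_subarrays_with_sum` is illustrative; the technique relies on the fact that arr[start..end] sums to K exactly when prefix[end+1] − prefix[start] = K:

```python
from collections import defaultdict


def count_subarrays_with_sum(arr, k):
    """Count contiguous subarrays whose elements sum to k."""
    seen = defaultdict(int)  # prefix sum -> how many times it has occurred
    seen[0] = 1              # the empty prefix
    running = 0
    count = 0
    for value in arr:
        running += value
        # Each earlier prefix equal to running - k ends a matching subarray here.
        count += seen[running - k]
        seen[running] += 1
    return count


print(count_subarrays_with_sum([1, 1, 1], 2))  # 2
print(count_subarrays_with_sum([1, 2, 3], 3))  # 2
```

Unlike a sliding window, this works with negative values as well.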

Time and Space Complexity

Enumerate all subarrays: O(n²) subarrays; O(n) per subarray if you copy → O(n³). With prefix sum for range sum: O(n²).
Optimized solutions: Kadane O(n); prefix+map for count O(n); sliding window O(n).

Edge Cases

Empty array: Zero subarrays (or define “empty subarray” if problem allows).
Single element: One subarray: the element itself.

Common Mistakes

Confusing subarray with subsequence: Subarray must be contiguous. [1,3] from [1,2,3] is a subsequence, not a subarray.
Brute force when better exists: Don’t enumerate all subarrays for “max sum” or “count sum = K”—use Kadane or prefix+map.

Common Mistake

Using “subsequence” techniques (e.g. pick/don’t pick DP) for a problem that asks for “subarray.” Subarray ⇒ contiguous; the recurrence is different (e.g. “max sum ending at i” for Kadane).
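
The “max sum ending at i” recurrence can be sketched as below. This assumes a non-empty array and a non-empty subarray; the function name `max_subarray_sum` is illustrative:

```python
def max_subarray_sum(arr):
    """Kadane: best sum of a non-empty contiguous subarray."""
    if not arr:
        raise ValueError("arr must be non-empty")
    best_ending_here = best = arr[0]
    for value in arr[1:]:
        # Either extend the subarray ending at the previous index or start fresh.
        best_ending_here = max(value, best_ending_here + value)
        best = max(best, best_ending_here)
    return best


print(max_subarray_sum([-2, 1, -3, 4, -1, 2, 1, -5, 4]))  # 6
```

The contiguity requirement shows up in the recurrence: the only options at index i are to extend the run ending at i−1 or to restart at i. There is no option to skip an element and keep the earlier run.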

Pattern Recognition

“Subarray” + max/min sum: Kadane or prefix + structure.
“Subarray” + count with sum K: Prefix + hash map.
“Subarray” + longest/shortest with condition: Sliding window or two pointers.

Interview Insight

When the problem says “subarray,” confirm: “So we need a contiguous segment.” Then choose: max sum → Kadane; count sum = K → prefix + map; longest/shortest with property → sliding window. Don’t mix up with subsequence.

Practice Problems

Max sum subarray: Kadane.
Subarray sum equals K (count): Prefix + frequency map.
Longest subarray with at most K distinct: Sliding window.

Summary

Subarray = contiguous segment arr[start..end]. Count = n(n+1)/2.
Subarray ≠ subsequence: Subsequence keeps order but not contiguity; 2^n possible.
Enumerate with for start: for end in range(start, n). Optimized: Kadane, prefix+map, sliding window.

5.14 Subsequences

Introduction

A subsequence of an array is a sequence obtained by deleting zero or more elements without changing the order of the remaining elements. Unlike a subarray, elements do not have to be contiguous—you can skip any indices. There are 2^n subsequences (each element is either included or not). Many “subsequence” problems use dynamic programming (pick/don’t pick, or “state at index i”) or greedy (e.g. longest increasing subsequence with binary search). This section defines subsequences, how to enumerate them, and how they differ from subarrays (Topic 5.13).

Real-World Analogy

Imagine a queue of people. A subsequence is any group you get by asking some people to step out—the ones left stay in the same relative order. You might keep persons 1, 3, and 5; that’s a subsequence. A subarray would require that the kept people stood next to each other in the original queue. So “subsequence” = same order, any selection; “subarray” = one contiguous block.

Example

Array [1, 2, 3]. Subsequences include: [], [1], [2], [3], [1,2], [1,3], [2,3], [1,2,3]—eight = 2^3. [1,3] is a subsequence (we skipped 2) but not a subarray. Every subarray is a subsequence; not every subsequence is a subarray.

Formal Definition

Concept Note

Subsequence: A sequence (b_0, b_1, …, b_{k−1}) is a subsequence of array arr if there exist indices 0 ≤ i_0 < i_1 < … < i_{k−1} < n such that b_j = arr[i_j] for each j. So we choose a strictly increasing sequence of indices and take those elements in order. Count: For each of the n elements, we either include it or not → 2^n subsequences (including the empty one). vs Subarray: Subarray = contiguous; count n(n+1)/2.

Why This Topic Matters

Terminology: “Subsequence” ⇒ order preserved, not contiguous. Techniques are different from subarray (DP, LIS, pick/don’t pick).
Classic problems: Longest increasing subsequence (LIS), longest common subsequence (LCS), “is s a subsequence of t?”—all use subsequence definition.

Mental Model

At each index, you have two choices: pick this element (add to the subsequence) or skip it. So generating all subsequences is like generating all subsets—2^n. For optimization (“longest subsequence with property”), you usually use DP: state = (index, maybe extra info); recurrence = pick vs don’t pick.

Generating All Subsequences

Recursion (Pick / Don’t Pick)

def subsequences(arr, i, path, result):
    if i == len(arr):
        result.append(path[:])
        return
    # Don't pick arr[i]
    subsequences(arr, i + 1, path, result)
    # Pick arr[i]
    path.append(arr[i])
    subsequences(arr, i + 1, path, result)
    path.pop()

Call with result = [] followed by subsequences(arr, 0, [], result). Each path appended to result is one subsequence. Total 2^n.
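
A possible convenience wrapper, so callers do not have to pass the index, path, and result list themselves. The name `all_subsequences` and the nested helper `visit` are illustrative:

```python
def all_subsequences(arr):
    """Return every subsequence of arr, including the empty one."""
    result = []
    path = []

    def visit(i):
        if i == len(arr):
            result.append(path[:])  # copy, because path keeps changing
            return
        visit(i + 1)                # don't pick arr[i]
        path.append(arr[i])         # pick arr[i]
        visit(i + 1)
        path.pop()                  # undo the pick before returning

    visit(0)
    return result


print(all_subsequences([1, 2, 3]))
# [[], [3], [2], [2, 3], [1], [1, 3], [1, 2], [1, 2, 3]]
```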

Using Bitmask

For small n, iterate mask from 0 to 2^n − 1; if bit j is set, include arr[j].

for mask in range(1 << len(arr)):
    sub = [arr[j] for j in range(len(arr)) if (mask >> j) & 1]
    # process sub

Count: 2^n

Each element is either in or out of the subsequence → n independent binary choices → 2^n. Including the empty subsequence.

Subsequence vs Subarray (Recap)

Subarray: Contiguous; indices [start, end]; count n(n+1)/2. Use Kadane, prefix sum, sliding window.
Subsequence: Any subset in order; count 2^n. Use DP (pick/don’t pick), LIS, LCS.

Classic Example: Longest Increasing Subsequence (LIS)

Find the length of the longest subsequence that is strictly increasing. DP: dp[i] = length of LIS ending at index i. For each i, look at all j < i with arr[j] < arr[i]; dp[i] = 1 + max(dp[j]), or 1 if no such j exists. The answer is max(dp). O(n²). Better: O(n log n) with binary search (patience sorting). This is a subsequence problem: we may skip elements.

def lis_length(arr):
    if not arr:
        return 0
    dp = [1] * len(arr)
    for i in range(1, len(arr)):
        for j in range(i):
            if arr[j] < arr[i]:
                dp[i] = max(dp[i], dp[j] + 1)
    return max(dp)
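
The O(n log n) approach can be sketched with the standard library's bisect module. Here tails[k] holds the smallest possible last value of a strictly increasing subsequence of length k+1 seen so far; the function name `lis_length_fast` is illustrative:

```python
from bisect import bisect_left


def lis_length_fast(arr):
    """Length of the longest strictly increasing subsequence, O(n log n)."""
    tails = []
    for value in arr:
        # First position whose tail is >= value (bisect_left keeps it strict).
        pos = bisect_left(tails, value)
        if pos == len(tails):
            tails.append(value)   # value extends the longest subsequence
        else:
            tails[pos] = value    # value is a smaller tail for length pos + 1
    return len(tails)


print(lis_length_fast([10, 9, 2, 5, 3, 7, 101, 18]))  # 4
```

Note that tails is not itself an increasing subsequence of arr in general; only its length is meaningful.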

Is Subsequence (Check)

Given two strings/arrays s and t, is s a subsequence of t? Two pointers: scan t; whenever t[j] == s[i], advance i. If i reaches len(s), yes. O(|s| + |t|).

def is_subsequence(s, t):
    i = 0
    for c in t:
        if i < len(s) and s[i] == c:
            i += 1
    return i == len(s)

Time and Space Complexity

Generate all: 2^n subsequences; O(n) each → O(n·2^n) time. Space O(n) recursion depth, plus O(n·2^n) if every subsequence is stored.
LIS (DP): O(n²) time, O(n) space. LIS with binary search: O(n log n).
Is subsequence: O(len(s) + len(t)).

Edge Cases

Empty array: One subsequence: [].
Empty s in “is s subsequence of t?”: Empty string is subsequence of any t; return True.

Common Mistakes

Treating as subarray: Don’t use Kadane or sliding window for “longest increasing subsequence”—it’s not contiguous.
Wrong order: Subsequence must preserve original order. [3,1,2] is not a subsequence of [1,2,3].

Common Mistake

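Using “subarray” techniques for “subsequence” problems. Max sum subsequence is different from Kadane (max sum subarray): for a subsequence you can skip negative elements, so the answer is the sum of all positive elements, or the largest single element if none is positive and the subsequence must be non-empty. A pick/don’t pick DP gives the same result. Always check the problem wording.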
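
A small sketch of the max sum subsequence case, assuming the subsequence must be non-empty. The function name `max_subsequence_sum` is illustrative:

```python
def max_subsequence_sum(arr):
    """Best sum of a non-empty subsequence (elements may be skipped)."""
    if not arr:
        raise ValueError("arr must be non-empty")
    positives = [x for x in arr if x > 0]
    # With no positive element, the best choice is the single largest value.
    return sum(positives) if positives else max(arr)


print(max_subsequence_sum([-2, 1, -3, 4, -1, 2, 1, -5, 4]))  # 12
print(max_subsequence_sum([-3, -1, -2]))                     # -1
```

On the first input Kadane's algorithm gives 6 for the best subarray, while the best subsequence sums to 12, which shows how much the contiguity requirement matters.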

Pattern Recognition

“Longest increasing subsequence”: DP O(n²) or binary search O(n log n).
“Is s a subsequence of t?”: Two pointers O(n+m).
“Count subsequences with property”: Often DP with state (index, …).

Interview Insight

When the problem says “subsequence,” confirm: “So we can skip elements but must keep order.” Then: generating all → recursion or bitmask; longest subsequence or counting subsequences → DP; “is A subsequence of B?” → two pointers. Don’t use subarray (contiguous) techniques.

Practice Problems

Longest increasing subsequence: DP or binary search.
Is subsequence: Two pointers.
Longest common subsequence (LCS): Classic 2D DP.

Summary

Subsequence = elements in original order, not necessarily contiguous. Count = 2^n.
Subarray = contiguous; count n(n+1)/2. Don’t confuse.
Generate all: recursion (pick/don’t pick) or bitmask. Optimize: DP (LIS, LCS) or two pointers (is subsequence).

6.1 Linear Search

Introduction

Linear search (also called sequential search) is the simplest search algorithm: you start at the beginning of a collection and check every element in order until you either find the target or reach the end. It makes no assumptions about the data—the array or list can be sorted, unsorted, or in any order. Because it does not exploit structure, in the worst case it must look at every element, giving O(n) time. That makes it the baseline "brute force" for search: correct for any input, but not fast when the data has useful structure (like sorted order). Mastering linear search teaches you how to scan collections safely, handle "not found," and recognize when a better algorithm (like binary search) can replace it.

Real-World Analogy

Imagine you're looking for a specific book on a shelf where books are not in any particular order. You have no choice but to start at one end and look at each spine until you find the title you want or run out of books. You cannot "jump to the middle" and conclude the book isn't in the other half—because there is no ordering, the target could be anywhere. That's linear search: one item at a time, in sequence. Contrast this with a sorted phone book, where you can open to the middle and eliminate half the names in one step—that's binary search (Topic 6.2). Linear search is what you do when you don't have that luxury.

Example

Array arr = [40, 12, 88, 5, 23, 9], target 23. You check index 0 (40 ≠ 23), index 1 (12 ≠ 23), index 2 (88 ≠ 23), index 3 (5 ≠ 23), index 4 (23 = 23) → found at index 4. If the target were 99, you would check all six elements and then report "not found."

Formal Definition

Concept Note

Linear search: Given a sequence A of n elements (e.g. array or list) and a target value x, examine A[0], A[1], …, A[n−1] in order. If for some index i we have A[i] = x, return i (or True). If no such i exists, return a "not found" sentinel (e.g. −1 or False). No assumption is made about the order of elements. The algorithm is correct for any input; its cost depends on how many comparisons are made before a match or end of sequence.

Why This Topic Matters

Foundation: Linear search is the first search strategy you should internalize. It works everywhere—unsorted arrays, linked lists, even streams—and is the fallback when no better structure exists.
Baseline for comparison: When you learn binary search (O(log n)), you'll appreciate why "halving the search space" requires sorted data. Linear search is the O(n) baseline that motivates using better algorithms when the data allows.
Interviews: Interviewers often ask "find the index of target" on an unsorted array—that's linear search. They may then ask "what if the array were sorted?" to lead you to binary search. Knowing when to use which is key.
Built-in behavior: In Python, target in lst and lst.index(target) are linear search under the hood. Understanding linear search explains how these work and when they are expensive.
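
A short illustration of the built-ins mentioned above. One difference from the index-or-−1 convention used in this topic: list.index raises ValueError when the value is absent, so the sketch catches it. The variable names are illustrative:

```python
lst = [40, 12, 88, 5, 23, 9]

print(23 in lst)      # True
print(lst.index(23))  # 4

# list.index raises ValueError instead of returning -1.
try:
    position = lst.index(99)
except ValueError:
    position = -1
print(position)       # -1
```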

Mental Model

Think of linear search as "walk from left to right; stop when you see the target or run out of elements." You maintain a single position (index). At each step, you ask: "Is this element the one I want?" If yes, you're done. If no, you move to the next index. If you've passed the last index without finding it, the target is not in the collection. No need to remember previous elements or look ahead—just one pass, one element at a time.

Step-by-Step Breakdown

Initialize: You'll traverse indices from 0 to len(arr) - 1. No extra data structure is needed.
Loop: For each index i in that range, compare arr[i] with target.
Match: If arr[i] == target, return i (or True) immediately. You've found the target.
No match: If the loop completes without returning, the target was never seen. Return a sentinel value such as −1 or False to indicate "not found."

ASCII Diagram

  arr = [ 40,  12,  88,   5,  23,   9 ]   target = 23
  i=0:   40 ≠ 23  →  continue
  i=1:   12 ≠ 23  →  continue
  i=2:   88 ≠ 23  →  continue
  i=3:    5 ≠ 23  →  continue
  i=4:   23 = 23  →  return 4

  If target = 99:
  i=0..5: no match  →  after loop, return -1

Python Implementation

Basic: Return Index or −1

def linear_search(arr, target):
    for i in range(len(arr)):
        if arr[i] == target:
            return i
    return -1

Variant: Return Boolean (Exists or Not)

def linear_search_exists(arr, target):
    for x in arr:
        if x == target:
            return True
    return False

Variant: Find Last Occurrence

To find the last index where the target appears, scan from the end or keep updating the result as you see matches.

def linear_search_last(arr, target):
    result = -1
    for i in range(len(arr)):
        if arr[i] == target:
            result = i
    return result

Line-by-Line Explanation (Basic Version)

for i in range(len(arr)): Iterates over every valid index from 0 to n−1. This guarantees we consider each element at most once, and every element if we never return early.
if arr[i] == target: The only comparison we need. For custom types, this might be replaced by a predicate (e.g. if predicate(arr[i])) or == if the type supports it.
return i: As soon as we find a match, we return that index. No need to look further for "first occurrence" — the first match we see is the first occurrence because we go left to right.
return -1: Reached only if the loop completed without returning. By convention, −1 often means "not found" for index-returning functions (since 0 is a valid index).
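
The predicate form mentioned above can be sketched as follows. The function name `linear_search_by` is hypothetical; predicate is any callable that returns a truthy value for a match:

```python
def linear_search_by(arr, predicate):
    """Return the first index whose element satisfies predicate, else -1."""
    for i, item in enumerate(arr):
        if predicate(item):
            return i
    return -1


arr = [40, 12, 88, 5, 23, 9]
print(linear_search_by(arr, lambda x: x < 10))   # 3
print(linear_search_by(arr, lambda x: x > 100))  # -1
```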

Time Complexity

Worst case: The target is not in the array, or it is at the last position. We compare arr[i] with target for every index i from 0 to n−1. That's n comparisons → O(n) time.

Best case: The target is at index 0. We do one comparison and return. So 1 comparison → O(1) time.

Average case (target in array, uniformly random position): If the target is equally likely to be at any of the n positions, the expected number of comparisons is (1 + 2 + … + n)/n = (n + 1)/2, roughly half the array. That is still O(n). If the target might not be in the array at all, the average depends on the probability of presence, but it remains linear in n.
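
To make the three cases concrete, you can count comparisons directly. The sketch below assumes distinct values and that every position is equally likely for the average; count_comparisons is a hypothetical helper, not part of the implementations above.

```python
def count_comparisons(arr, target):
    """Return how many == comparisons a left-to-right scan performs."""
    comparisons = 0
    for x in arr:
        comparisons += 1
        if x == target:
            break
    return comparisons


n = 100
arr = list(range(n))                        # distinct values 0..n-1
best = count_comparisons(arr, arr[0])       # target at index 0
worst = count_comparisons(arr, -1)          # target absent
# Average over every possible position of the target.
average = sum(count_comparisons(arr, t) for t in arr) / n
```

With n = 100 the best case is 1 comparison, the worst case is n, and the average over all positions is (n + 1)/2.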

Concept Note

We say linear search is Θ(n) in the worst case: when the target is absent or at the last position, we perform exactly n comparisons. So worst-case time is both O(n) and Ω(n), hence Θ(n). For big-O we typically state O(n) and note that it's optimal for an unsorted array: you cannot do better than O(n) in the worst case without additional structure (e.g. a hash set) or assumptions (e.g. sorted order).

Space Complexity

We only use a loop variable (index i) and the input reference. No extra array or recursive call stack that grows with n. So O(1) auxiliary space.

Edge Cases

Empty array: range(0) runs zero times; we skip the loop and return −1. Correct.
Single element: One comparison: either it's the target (return 0) or not (return −1).
Target at index 0: One comparison, return 0. Best case.
Target at last index: n comparisons, return n−1. Worst case when target exists.
Duplicate values: The basic implementation returns the first occurrence. If you need the last, use the "last occurrence" variant above.
None or non-comparable elements: In Python, == against None does not raise, but ordering comparisons such as < do, so keep the search to equality tests if elements can be None. Custom objects without __eq__ compare by identity; define __eq__ or use an explicit predicate.

Common Mistakes

Using linear search on a sorted array when you need speed: If the array is sorted, binary search (Topic 6.2) gives O(log n). Use linear search only when the array is unsorted or when you need a simple one-off check and n is small.
Forgetting the "not found" return: If you only return inside the loop, you must return something after the loop (e.g. −1). Otherwise the function returns None when the target is absent, which can cause bugs if the caller expects an integer.
Modifying the array while iterating: Don't add or remove elements from the list during the linear search loop; iteration behavior can become confusing. If you need to collect indices or modify data, do it in a separate pass or with a while-loop and explicit index management.

Common Mistake

Using for x in arr when you need the index. Then you only have the value x, not its position. Use for i in range(len(arr)) and arr[i], or for i, x in enumerate(arr) so you have both.
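
As a small sketch of the enumerate form, here is one way to collect every matching index in a single pass. The name all_indices is hypothetical and used only for illustration.

```python
def all_indices(arr, target):
    """Return every index i with arr[i] == target, in increasing order."""
    return [i for i, x in enumerate(arr) if x == target]


positions = all_indices([4, 7, 4, 1, 4], 4)   # value and index together
```

Because enumerate yields (index, value) pairs, there is no need to index back into the list inside the loop.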

Evolution: When to Use Linear vs Better Search

Scenario	Use	Reason
Unsorted array, find target	Linear search	No structure to exploit; O(n) is optimal.
Sorted array, find target	Binary search	Halve search space each step; O(log n).
Need "exists?" only, small n	Linear or `in`	Simple and fast enough for small data.
Many searches on same data	Consider sort + binary, or hash set	Amortize cost: sort once O(n log n), then k × O(log n) or O(1) per lookup.

Optimization Insight

Linear search is the brute force for search: correct and simple, but O(n). If your array is sorted, switch to binary search for O(log n). If you only need membership and will do many lookups, building a set from the array gives O(n) build and O(1) average lookup—better when k (number of lookups) is large. Use linear search when the data is unsorted, small, or you need the index and don't have (or don't want to maintain) extra structure.
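
The set-based alternative might look like the sketch below. It assumes the elements are hashable and that you only need membership, not positions; count_present is a hypothetical name.

```python
def count_present(data, queries):
    """Count how many queries occur in data, using a set for membership.

    Building the set is O(n); each lookup is O(1) on average,
    assuming the elements are hashable.
    """
    seen = set(data)
    return sum(1 for q in queries if q in seen)


data = [40, 12, 88, 5, 23, 9]
queries = [23, 99, 5, 5, 41]
hits = count_present(data, queries)
```

The trade-off is O(n) extra memory, and the set discards indices and duplicates, so keep the linear scan when you need the position.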

Pattern Recognition

Many problems are "linear scan and compare": find first/last index, count occurrences, find min/max, or check a condition over the array. The pattern is: one loop, one pass, O(n). If the problem says "unsorted" or "arbitrary order," think linear search first. If it says "sorted" or "non-decreasing," think binary search.

Python Built-ins

target in arr performs a linear search and returns True or False. arr.index(target) returns the first index of target or raises ValueError if not found. Both are O(n). For a safe "index or -1" you can use:

def index_or_minus_one(arr, target):
    try:
        return arr.index(target)   # first index of target
    except ValueError:             # raised when target is absent
        return -1

Or stick to an explicit loop so you have full control (e.g. last occurrence, custom predicate).

Interview Insight

For "find index of target in an array," clarify: Is the array sorted? If yes, say you'd use binary search (Topic 6.2) for O(log n). If unsorted, implement linear search and state O(n) time, O(1) space. Mention that for unsorted data, O(n) is optimal in the worst case—you must look at every element if the target might be last or absent. If the interviewer asks "what if we search many times?" discuss sorting once and then binary search, or using a hash set for membership.

Practice Problems

Implement linear search: return first index of target or −1.
Find the last index of target in an unsorted array.
Count how many times target appears (single linear pass).
Find the minimum (or maximum) element and its index—same O(n) scan, different comparison.

Summary

Linear search = scan elements in order from index 0 to n−1; return index when arr[i] == target, else return −1 after the loop.
Time: O(n) worst and average; O(1) best (target at front). Space: O(1).
Use for unsorted data or when you need the index and have no extra structure. For sorted data, prefer binary search.
Edge cases: empty array (return −1), duplicates (first occurrence returned by basic version), last occurrence (keep updating result in loop).
Python: in and index() are linear; use explicit loop when you need last index or custom logic.

6.2 Binary Search

Introduction

Binary search is the fast way to find a target in a sorted array: instead of checking every element (linear search, O(n)), you repeatedly compare the target with the middle element and throw away half of the remaining range. Each step halves the search space, so you need at most about log₂(n) steps—giving O(log n) time. This only works when the array is sorted (or ordered by some predicate) so that comparing with the middle tells you which half can contain the target. Binary search is a cornerstone of algorithm design: the same "narrow the range" idea appears in search-in-rotated-array, binary search on answer (Topic 5.9), and countless interview problems. Mastering the loop condition, the updates for left and right, and the first/last occurrence variants will make you interview-ready.

Real-World Analogy

Imagine a phone book sorted A–Z. To find "Smith," you don't read every page. You open to the middle: if you see "M," Smith is in the second half; if you see "T," Smith is in the first half. You throw away the half that cannot contain Smith and repeat on the remaining half. Each time you eliminate roughly half the pages. After about log₂(number of pages) steps, you're down to one page. That's binary search—divide and conquer by comparison with the middle. The critical requirement: the book must be sorted. In an unsorted pile of papers, opening to the middle tells you nothing about where "Smith" might be.

Example

Sorted array arr = [2, 5, 7, 9, 12, 15], target 9. Step 1: mid = 2, arr[2]=7 < 9 → search right half [9, 12, 15]. Step 2: mid = 4, arr[4]=12 > 9 → search left half [9]. Step 3: mid = 3, arr[3]=9 == 9 → found at index 3. Only three comparisons instead of scanning all six elements.

Formal Definition

Concept Note

Binary search (sorted array): Let A be an array of n elements sorted in non-decreasing order, and x a target. Maintain a range [left, right] of indices where x could lie. While the range is non-empty, let mid = (left + right) // 2. If A[mid] = x, return mid. If A[mid] < x, then x (if present) must be in [mid+1, right]. If A[mid] > x, then x must be in [left, mid−1]. Update the range and repeat. The loop terminates when left > right (range empty → not found) or when a match is found. Invariant: If x is in the array, it lies in [left, right] at the start of each iteration.

Why This Topic Matters

Speed: O(log n) vs O(n) for linear search on sorted data. For n = 1,000,000, that's about 20 comparisons instead of a million.
Interview staple: "Find target in sorted array," "first/last position of target," "search in rotated sorted array," "binary search on answer"—all rely on the same pattern.
Building block: Binary search on answer (Topic 5.9), lower_bound/upper_bound, and many optimization problems use "find the smallest/largest value that satisfies a condition" with a binary search over a range.
Python: The bisect module implements binary search for sorted lists. Knowing how it works helps you use bisect_left, bisect_right, and when to write a custom loop.

Mental Model

Keep a range [left, right] of indices where the target might be. Initially that's the whole array. Each step: look at the middle element. If it equals the target, you're done. If it's smaller than the target, the target (if present) must be to the right—so discard the left half and set left = mid + 1. If it's larger, the target must be to the left—set right = mid - 1. The range shrinks every time; when left > right, the range is empty and the target isn't in the array. The key is: one comparison with the middle tells you which half to keep—and that's only valid because the array is sorted.

Step-by-Step Breakdown

Initialize: left = 0, right = len(arr) - 1. The search range is the entire array.
Loop: While left <= right (range is non-empty), compute mid = (left + right) // 2.
Compare: If arr[mid] == target, return mid. If arr[mid] < target, set left = mid + 1. Otherwise arr[mid] > target, set right = mid - 1.
Termination: If the loop exits without returning, the range became empty; return −1 (not found).

ASCII Diagram

  Sorted array:  [ 2,  5,  7,  9, 12, 15 ]   target = 9
  Index:          0   1   2   3   4   5

  Step 1: left=0, right=5, mid=2 → arr[2]=7 < 9 → discard left half
          [ 2,  5,  7, | 9, 12, 15 ]
          left = 3, right = 5

  Step 2: left=3, right=5, mid=4 → arr[4]=12 > 9 → discard right half
          [ 9, 12, 15 ]
          left = 3, right = 3

  Step 3: left=3, right=3, mid=3 → arr[3]=9 == 9 → return 3
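
If you want to reproduce the diagram yourself, a tracing variant can record the range at every step. This is a sketch for experimentation only; binary_search_trace is a hypothetical name, and it returns the list of steps alongside the index.

```python
def binary_search_trace(arr, target):
    """Binary search that also records (left, right, mid) for each step."""
    steps = []
    left, right = 0, len(arr) - 1
    while left <= right:
        mid = (left + right) // 2
        steps.append((left, right, mid))
        if arr[mid] == target:
            return mid, steps
        if arr[mid] < target:
            left = mid + 1       # discard the left half, including mid
        else:
            right = mid - 1      # discard the right half, including mid
    return -1, steps


index, steps = binary_search_trace([2, 5, 7, 9, 12, 15], 9)
```

For the array in the diagram, steps holds (0, 5, 2), (3, 5, 4), (3, 3, 3), matching the three steps shown.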

Python Implementation

Standard: Any Occurrence, Return Index or −1

def binary_search(arr, target):
    left, right = 0, len(arr) - 1
    while left <= right:
        mid = (left + right) // 2
        if arr[mid] == target:
            return mid
        if arr[mid] < target:
            left = mid + 1
        else:
            right = mid - 1
    return -1

First Occurrence (Leftmost Index)

When the array may contain duplicates, "first index of target" means: when you find a match at mid, don't return yet—there might be another match to the left. Remember mid as a candidate and search [left, mid−1].

def first_index(arr, target):
    left, right = 0, len(arr) - 1
    result = -1
    while left <= right:
        mid = (left + right) // 2
        if arr[mid] == target:
            result = mid
            right = mid - 1   # keep looking left
        elif arr[mid] < target:
            left = mid + 1
        else:
            right = mid - 1
    return result

Last Occurrence (Rightmost Index)

For the last occurrence: when arr[mid] == target, set result = mid and search the right half with left = mid + 1.

def last_index(arr, target):
    left, right = 0, len(arr) - 1
    result = -1
    while left <= right:
        mid = (left + right) // 2
        if arr[mid] == target:
            result = mid
            left = mid + 1    # keep looking right
        elif arr[mid] < target:
            left = mid + 1
        else:
            right = mid - 1
    return result

Recursive Version

Same logic, expressed recursively: base case is empty range (not found) or match at mid; otherwise recurse on the left or right half.

def binary_search_rec(arr, target, left, right):
    if left > right:
        return -1
    mid = (left + right) // 2
    if arr[mid] == target:
        return mid
    if arr[mid] < target:
        return binary_search_rec(arr, target, mid + 1, right)
    return binary_search_rec(arr, target, left, mid - 1)

# Call: binary_search_rec(arr, target, 0, len(arr) - 1)

Line-by-Line Explanation (Standard Version)

left <= right: We must include the case left == right (one element left). So the loop runs while the range is non-empty. Exiting when left > right means we've considered every candidate—target not found.
mid = (left + right) // 2: Integer division so mid is a valid index. In C/Java, left + (right - left) / 2 avoids overflow; in Python (left + right) // 2 is standard.
arr[mid] < target → left = mid + 1: Every element at index ≤ mid is ≤ arr[mid], so < target. The target cannot be there; search only [mid+1, right].
else (arr[mid] > target) → right = mid - 1: Every element at index ≥ mid is ≥ arr[mid], so > target. Search only [left, mid−1].
We always exclude mid when we shrink (mid+1 or mid−1), so the range strictly shrinks and the loop cannot run forever.

Time Complexity

Why O(log n)? Each iteration compares the target with one element and then discards at least half of the current range: a range of s elements shrinks to at most ⌊s/2⌋. So the size of the range goes n → n/2 → n/4 → … → 1 → 0. The loop therefore runs at most ⌊log₂ n⌋ + 1 = ⌈log₂(n+1)⌉ times, which is O(log n).
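
You can check the iteration bound empirically. The sketch below simulates a search for a target larger than every element, which always keeps the larger half with this choice of mid; worst_case_iterations is a hypothetical helper.

```python
def worst_case_iterations(n):
    """Count loop iterations when the target exceeds every element."""
    left, right = 0, n - 1
    iterations = 0
    while left <= right:
        iterations += 1
        mid = (left + right) // 2
        left = mid + 1           # arr[mid] < target on every iteration
    return iterations


# For n >= 1, ceil(log2(n + 1)) equals n.bit_length(); using bit_length
# avoids floating-point rounding in math.log2.
counts = {n: worst_case_iterations(n) for n in (1, 2, 7, 8, 1000, 1_000_000)}
```

For n = 1,000,000 the count is 20, in line with the "about 20 comparisons" figure quoted earlier in this topic.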

Best case: Target at the first mid we check, O(1). Worst case: Target not present, or found only after the range has shrunk to a single element, O(log n). Average case: O(log n) as well.

Concept Note

We say binary search is Θ(log n) in the worst case: the worst-case number of comparisons is bounded above and below by a constant multiple of log n. No comparison-based search on a sorted array can do better than Ω(log n) in the worst case. Decision-tree argument: the search must distinguish n + 1 possible outcomes (each of the n indices, plus "not found"), and each comparison has only a constant number of results, so some path through the decision tree has length on the order of log(n + 1) or more. So binary search is asymptotically optimal for sorted array search.

Space Complexity

Iterative version: Only a few variables (left, right, mid). O(1) auxiliary space.

Recursive version: Each recursive call uses stack space. The depth is the number of halvings, so O(log n) space for the call stack.

Edge Cases

Empty array: left=0, right=-1 → left <= right is false → return −1 immediately.
Single element: left=right=0, one iteration, compare once, return 0 or −1.
Target not present: Loop eventually makes the range empty; return −1.
All elements equal to target: Standard version returns any matching index. first_index returns 0; last_index returns n−1.
Unsorted array: Binary search is incorrect—it may miss the target or return a wrong index. Always ensure the array is sorted (or that your predicate preserves the "discard half" property).

Common Mistakes

Using binary search on unsorted data: The "discard half" logic depends on sorted order. If the array isn't sorted, use linear search or sort first.
Loop condition left < right instead of left <= right: When left == right, there is still one element to check. With left < right you exit without checking it and can incorrectly return −1.
Not shrinking the range: You must set left = mid + 1 or right = mid - 1. Using left = mid or right = mid can leave the range unchanged when left == mid or right == mid, causing an infinite loop.
Integer overflow for mid (other languages): In C/Java, (left + right) / 2 can overflow. Use left + (right - left) / 2. In Python this is not an issue for list indices.

Common Mistake

Confusing "find exact index" with "find insertion point." For exact match we use left <= right and return when arr[mid] == target. For insertion point (smallest index where arr[i] >= target) we often use a half-open range: start with right = len(arr), loop while left < right, set right = mid (not mid - 1) when arr[mid] >= target, and return left at the end. That's the bisect_left style. Don't mix the two loop invariants.
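
A minimal sketch of the insertion-point loop, for comparison with the exact-match version. The name lower_bound is chosen here for illustration; the check against bisect.bisect_left uses the standard library.

```python
import bisect


def lower_bound(arr, target):
    """Smallest index i with arr[i] >= target, or len(arr) if none.

    Uses the half-open range [left, right): right starts at len(arr)
    and the loop runs while left < right.
    """
    left, right = 0, len(arr)
    while left < right:
        mid = (left + right) // 2
        if arr[mid] < target:
            left = mid + 1       # answer is strictly right of mid
        else:
            right = mid          # mid itself may be the answer
    return left


arr = [2, 5, 5, 5, 9]
agrees = all(lower_bound(arr, t) == bisect.bisect_left(arr, t) for t in range(12))
```

Note that right = mid is safe here because mid < right whenever left < right, so the range still shrinks on every iteration.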

Evolution: Linear → Binary

Approach	Requirement	Time	Space
Linear search	Any order	O(n)	O(1)
Binary search	Sorted array	O(log n)	O(1) iter / O(log n) rec

Optimization Insight

Whenever the array (or search space) is sorted and you need to find a value or the boundary of a condition, binary search is the tool. If you need to run many searches on the same array, sorting once (O(n log n)) and then doing k binary searches (k × O(log n)) beats k linear searches (k × O(n)) for moderate to large k. For "find insertion point" or "lower_bound," use the same pattern with left < right and return left—or use bisect_left.

Pattern Recognition

Binary search applies when: (1) the data is sorted (or monotonic in some sense), and (2) comparing with the middle tells you which half to keep. The pattern: maintain [left, right], compute mid, compare, then set left = mid + 1 or right = mid - 1. The same idea extends to "binary search on answer": the "array" is a range of possible answers, and you have a predicate that is false for small values and true for large (or vice versa); you binary search for the boundary.
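
The "binary search on answer" idea can be sketched with a generic boundary finder. Both names below (first_true, integer_sqrt) are hypothetical, and the sketch assumes the predicate is monotone: false for small values, then true from some point on.

```python
def first_true(lo, hi, pred):
    """Smallest x in [lo, hi] with pred(x) true, or hi + 1 if none.

    Assumes pred is monotone: false for small x, then true from some
    point onward.
    """
    left, right = lo, hi + 1
    while left < right:
        mid = (left + right) // 2
        if pred(mid):
            right = mid          # mid may be the boundary
        else:
            left = mid + 1       # boundary is strictly right of mid
    return left


def integer_sqrt(n):
    """Largest r with r * r <= n, for n >= 0."""
    # The first x whose square exceeds n is one past the answer.
    return first_true(0, n, lambda x: x * x > n) - 1


root = integer_sqrt(10)
```

Here the "array" is the range of candidate answers 0..n, and the predicate x * x > n plays the role of the comparison with the middle element.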

Python Built-ins: bisect

The bisect module provides binary search for sorted lists:

bisect.bisect_left(arr, target): Leftmost index i such that arr[i] ≥ target, or len(arr) if every element is smaller. If target is present, this is the first occurrence. Inserting target at this index keeps the list sorted.
bisect.bisect_right(arr, target) (or bisect.bisect): Smallest index i such that arr[i] > target, or len(arr) if there is none. If target is present, this is one past its last occurrence, so last index of target = bisect_right(arr, target) - 1.

To check existence and get first index: i = bisect_left(arr, target); if i < len(arr) and arr[i] == target, then index is i. Count of target: bisect_right(arr, target) - bisect_left(arr, target).
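
Putting those pieces together, one possible helper looks like this. The function name first_last_count and its return convention (-1 for absent) are assumptions made for this sketch; bisect_left and bisect_right are the standard library functions.

```python
from bisect import bisect_left, bisect_right


def first_last_count(arr, target):
    """Return (first_index, last_index, count) of target in sorted arr.

    first_index and last_index are -1 when target is absent.
    """
    lo = bisect_left(arr, target)
    hi = bisect_right(arr, target)
    if lo == hi:                 # no element equals target
        return -1, -1, 0
    return lo, hi - 1, hi - lo


result = first_last_count([1, 3, 3, 3, 7, 9], 3)
```

Comparing lo with hi handles the existence check without indexing into the list, so it is also safe when lo == len(arr).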

Interview Insight

Clarify: "Is the array sorted? Are duplicates allowed? Return first index, last index, or any?" Then implement with left <= right, correct updates, and first_index/last_index if needed. Mention that for sorted arrays, binary search is O(log n) and optimal. If the problem is "insertion position" or "lower_bound," you can implement it or use bisect_left in Python. Be ready to derive why the loop terminates and why we need left <= right.

Practice Problems

Find target in sorted array (return index or −1).
Find first and last position of target in sorted array with duplicates.
Search insert position: index where target would be inserted to keep order (same as bisect_left).
Count occurrences of target in sorted array (last_index − first_index + 1 or bisect).
Search in rotated sorted array (array is sorted then rotated; adapt the comparison logic).

Summary

Binary search requires a sorted array. Maintain [left, right]; compare target with arr[mid]; discard left or right half; repeat until found or range empty.
Time O(log n), space O(1) iterative / O(log n) recursive. Optimal for comparison-based search on sorted data.
Use left <= right and mid = (left + right) // 2; update left = mid + 1 or right = mid - 1 so the range always shrinks.
First occurrence: on match, search left (right = mid - 1). Last occurrence: on match, search right (left = mid + 1).
Python: bisect_left / bisect_right for insertion position and range; use explicit loop when you need exact "first/last index" semantics or custom predicates.

6.3 Ternary Search

Introduction

Ternary search is a divide-and-conquer technique that splits the search space into three parts (instead of two like binary search) using two interior points, then discards one of the three segments. It is most useful for finding the maximum or minimum of a unimodal function: a function that first strictly increases and then strictly decreases (or the reverse). On a plain sorted array, ternary search is not better than binary search: each step keeps one-third of the range but costs up to two comparisons, so it needs about 2·log₃ n ≈ 1.26·log₂ n comparisons versus about log₂ n for binary search. That is still O(log n), but with a worse constant. Where ternary search shines is unimodal optimization: given a function f that has a single peak (or valley), ternary search can find that peak in O(log n) evaluations of f. This appears in problems like "find the peak in a bitonic array," "minimize a cost function," or "find the maximum in a sequence that first goes up then down."

Real-World Analogy

Imagine you're standing on a hill that goes up to a single peak and then down on the other side. You can't see the whole hill; you can only check the height at positions you walk to. To find the peak efficiently, you might check two points partway along the range: if the left point is higher than the right, the peak must be to the left of the right point, so you discard the right third. If the right point is higher, discard the left third. If they're equal (or you're close enough), you're near the peak. You keep narrowing the range until you've found the top. That's ternary search on a unimodal "height" function—each step throws away at least one-third of the remaining range.

Example

Bitonic array arr = [1, 3, 8, 12, 9, 5, 2] (increases to 12, then decreases). We want the index of the maximum. Compare at indices mid1 and mid2: if arr[mid1] < arr[mid2], the peak is in the right two-thirds; if arr[mid1] > arr[mid2], the peak is in the left two-thirds. We narrow until we have a single candidate (the peak).

Formal Definition

Concept Note

Unimodal function: A function f on indices [0, n−1] is unimodal if there exists an index m such that f is strictly increasing on [0, m] and strictly decreasing on [m, n−1] (or the reverse—strictly decreasing then strictly increasing). So there is exactly one local maximum (or minimum). Ternary search: Maintain a range [left, right]. Choose two interior points mid1 = left + (right - left) // 3 and mid2 = right - (right - left) // 3. Compare f(mid1) and f(mid2). If unimodal and we want the maximum: if f(mid1) < f(mid2), the maximum cannot be in [left, mid1]; if f(mid1) > f(mid2), the maximum cannot be in [mid2, right]. Update the range and repeat until the range is small enough (e.g. one or two elements).

Why This Topic Matters

Unimodal optimization: Many real-world and contest problems ask for the maximum (or minimum) of a function that has a single peak—e.g. bitonic array peak, minimize cost with a convex structure. Ternary search is the standard tool.
Interviews: "Find peak in a bitonic array" or "find the index of the maximum in an array that increases then decreases" are classic. Knowing ternary search (or the equivalent "compare two points and discard one third") shows you understand divide-and-conquer beyond binary.
Comparison with binary search: For sorted array lookup, binary search is strictly better (fewer comparisons, same O(log n)). For unimodal functions, ternary search is natural because there is no single "middle" comparison that tells you "which half"—you need two points to decide which third to discard.

Mental Model

You have a range [left, right] and a unimodal function (one peak). Pick two points inside the range: one in the left third and one in the right third. Compare their values. If the left point is lower than the right, the peak must be to the right of the left point (otherwise the function wouldn't go up toward the right point)—so discard the left third. If the left point is higher, the peak must be to the left of the right point—discard the right third. You always throw away at least one-third of the range, so after O(log n) steps the range collapses to the peak.

Step-by-Step Breakdown (Find Maximum of Unimodal Array)

Initialize: left = 0, right = len(arr) - 1.
Loop: While the range has more than three elements (right - left > 2), compute mid1 = left + (right - left) // 3 and mid2 = right - (right - left) // 3, so that left < mid1 < mid2 < right.
Compare: If arr[mid1] < arr[mid2], the peak cannot be in [left, mid1], so set left = mid1 + 1. If arr[mid1] > arr[mid2], the peak cannot be in [mid2, right], so set right = mid2 - 1. If they are equal, the peak lies strictly between mid1 and mid2, so either update is safe (the code below takes the right = mid2 - 1 branch).
Termination: When the range is small (at most three elements), scan it and return the index of the maximum.

ASCII Diagram

  Unimodal (bitonic):  [ 1,  3,  8, 12,  9,  5,  2 ]   find index of max
  Index:               0   1   2   3   4   5   6

  left=0, right=6 → mid1 = 2, mid2 = 4
  arr[2]=8, arr[4]=9 → 8 < 9 → peak in right part → left = 3

  left=3, right=6 → mid1 = 4, mid2 = 5
  arr[4]=9, arr[5]=5 → 9 > 5 → peak in left part → right = 4

  left=3, right=4 → small range: max(arr[3], arr[4]) = 12 at index 3 → return 3

Python Implementation

Find Index of Maximum in Unimodal (Bitonic) Array

def ternary_search_max(arr):
    left, right = 0, len(arr) - 1
    while right - left > 2:
        mid1 = left + (right - left) // 3
        mid2 = right - (right - left) // 3
        if arr[mid1] < arr[mid2]:
            left = mid1 + 1
        else:
            right = mid2 - 1
    # Range has at most 3 elements; find max index
    best = left
    for i in range(left + 1, right + 1):
        if arr[i] > arr[best]:
            best = i
    return best

Ternary Search on a Function (Find x that Maximizes f(x))

When the "array" is implicit—you have a function f and a continuous or integer range—evaluate f at mid1 and mid2 and narrow the range.

def ternary_search_func(f, left, right, eps=1e-9):
    """Find x in [left, right] that maximizes f(x). f is unimodal."""
    while right - left > eps:
        mid1 = left + (right - left) / 3
        mid2 = right - (right - left) / 3
        if f(mid1) < f(mid2):
            left = mid1
        else:
            right = mid2
    return (left + right) / 2
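
One caveat with the eps-based loop: when left and right are large in magnitude, the interval can stop shrinking in floating point before it drops below eps. A common workaround, sketched here as a variant rather than as the course's method, is to run a fixed number of iterations. The name ternary_search_fixed and the default of 200 iterations are assumptions.

```python
def ternary_search_fixed(f, left, right, iterations=200):
    """Approximate the maximizer of a unimodal f on [left, right].

    Runs a fixed number of iterations instead of testing against eps,
    so it always terminates even when the interval can no longer
    shrink in floating point.
    """
    for _ in range(iterations):
        mid1 = left + (right - left) / 3
        mid2 = right - (right - left) / 3
        if f(mid1) < f(mid2):
            left = mid1          # maximum is not in [left, mid1)
        else:
            right = mid2         # maximum is not in (mid2, right]
    return (left + right) / 2


# Example: f has its single peak at x = 2.
x_best = ternary_search_fixed(lambda x: -(x - 2.0) ** 2, -10.0, 10.0)
```

Each iteration keeps two-thirds of the interval, so 200 iterations shrink it far below double precision for typical ranges.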

Line-by-Line Explanation (Array Version)

mid1 = left + (right - left) // 3: One-third of the way from left. mid2 = right - (right - left) // 3: One-third from the right. So we have three segments: [left, mid1], (mid1, mid2), [mid2, right].
arr[mid1] < arr[mid2]: On a unimodal that increases then decreases, if the value at mid1 is less than at mid2, we're still on the "increasing" side—the peak is to the right of mid1. So discard [left, mid1] by setting left = mid1 + 1.
else: arr[mid1] >= arr[mid2]. Then we're at or past the peak; the peak is to the left of mid2. Set right = mid2 - 1.
When right - left <= 2, we have at most 3 elements; a simple loop finds the maximum index. This avoids off-by-one issues at the end.

Time Complexity

Each iteration reduces the range to at most 2/3 of its size (we discard at least one-third). So the number of iterations k satisfies (2/3)^k · n ≤ 1, i.e. k ≈ log_{3/2} n ≈ 1.71 · log₂ n = O(log n). Each iteration does a constant number of comparisons and index computations. So time O(log n).

For the function version with real-valued domain and termination when right - left < eps, the number of iterations is O(log((right−left)/eps)).

Concept Note

Ternary search evaluates two points per step where binary search evaluates one. In the unimodal version above, each step keeps up to 2/3 of the range. Applied to sorted array lookup, a three-way split keeps only 1/3 of the range but still costs up to two comparisons per step, about 2·log₃ n ≈ 1.26·log₂ n in total, so binary search is better there. Use ternary search only when the problem is unimodal optimization (find the peak/valley), not when you're just searching for a target value in a sorted list.

Space Complexity

Only a constant number of variables (left, right, mid1, mid2, best). O(1) auxiliary space.

Edge Cases

Single element: left == right; skip the while loop, return left.
Two elements: right - left == 1, so the while loop (right - left > 2) is skipped; the final scan compares both elements and returns the index of the larger.
Strictly increasing array: The "peak" is at the last index. Unimodal allows this (decreasing part is empty). Ternary search still converges to the last index.
Strictly decreasing array: Peak at index 0. Similarly handled.
Non-unimodal array: Ternary search can return a wrong (local) maximum. Ensure the array or function is unimodal before using.
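
These edge cases can be checked exhaustively for small sizes. The sketch below restates the array version from this topic so that it runs on its own; mountain is a hypothetical helper that builds a strictly unimodal list with its peak at a chosen index.

```python
def ternary_search_max(arr):
    """Index of the maximum of a non-empty strictly unimodal list."""
    left, right = 0, len(arr) - 1
    while right - left > 2:
        mid1 = left + (right - left) // 3
        mid2 = right - (right - left) // 3
        if arr[mid1] < arr[mid2]:
            left = mid1 + 1
        else:
            right = mid2 - 1
    best = left
    for i in range(left + 1, right + 1):   # at most 3 elements remain
        if arr[i] > arr[best]:
            best = i
    return best


def mountain(n, peak):
    """Strictly increasing up to index peak, strictly decreasing after."""
    return [-abs(i - peak) for i in range(n)]


# peak == 0 is strictly decreasing; peak == n - 1 is strictly increasing.
failures = [
    (n, peak)
    for n in range(1, 30)
    for peak in range(n)
    if ternary_search_max(mountain(n, peak)) != peak
]
```

An empty failures list means every size from 1 to 29 and every peak position, including the single-element and two-element cases, returned the expected index.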

Common Mistakes

Using ternary search for sorted array lookup: For "find target in sorted array," use binary search. Ternary search is for finding the maximum/minimum of a unimodal function, not for equality check.
Wrong mid1/mid2 or update: mid1 and mid2 must lie strictly between left and right, and we must discard a full third. Using mid1 = (2*left + right) // 3 and mid2 = (left + 2*right) // 3 is a common alternative; it gives the same mid1, while mid2 can be one smaller because of rounding. When arr[mid1] < arr[mid2], discard the left third (left = mid1 + 1); when arr[mid1] > arr[mid2], discard the right third (right = mid2 - 1).
Infinite loop with small range: Ensure the loop condition (e.g. right - left > 2) guarantees the range shrinks, and handle the base case (small range) explicitly.

Common Mistake

Assuming ternary search is "faster" than binary search because it divides into three. For sorted array search, a three-way split shrinks the range faster per step but needs up to two comparisons per step, so the total (about 2·log₃ n ≈ 1.26·log₂ n comparisons) is higher than binary search's log₂ n. Ternary is for unimodal optimization where one comparison is not enough to decide which half to keep.

Comparison: Binary vs Ternary

Use case	Algorithm	Time
Find target in sorted array	Binary search	O(log n), 1 compare/step
Find max/min in unimodal array	Ternary search	O(log n), 2 compares/step
Minimize unimodal function f(x)	Ternary search on domain	O(log (range/eps))

Optimization Insight

For unimodal functions, ternary search is the standard O(log n) method. Alternative: if you can compute the derivative (or discrete difference), you could use binary search on the sign of the derivative to find where it crosses zero—equivalent to finding the peak. For arrays, "find peak in bitonic array" is the classic ternary search problem.
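
The discrete-difference alternative can be sketched as a binary search that compares each element with its right neighbor. The name peak_by_difference is hypothetical, and the sketch assumes a non-empty, strictly unimodal list.

```python
def peak_by_difference(arr):
    """Index of the maximum of a non-empty strictly unimodal list.

    Binary searches for the first index i where arr[i] > arr[i + 1],
    i.e. where the discrete difference turns negative.
    """
    left, right = 0, len(arr) - 1
    while left < right:
        mid = (left + right) // 2      # mid < right, so mid + 1 is valid
        if arr[mid] < arr[mid + 1]:
            left = mid + 1             # still climbing: peak is right of mid
        else:
            right = mid                # descending: peak is at mid or left of it
    return left


peak = peak_by_difference([1, 3, 8, 12, 9, 5, 2])
```

This variant halves the range on every step while still reading two array elements per step, which is why many solutions to the mountain-array problem use it in place of a three-way split.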

Pattern Recognition

Think "ternary" when: (1) you need the maximum or minimum of something, and (2) that something is unimodal (one peak or one valley). Keywords: "bitonic," "increases then decreases," "single peak," "minimize cost that first decreases then increases." If the problem is "find where this value is" in a sorted list, use binary search instead.

Interview Insight

If asked "find the peak in an array that first increases then decreases," describe ternary search: divide the range into thirds, compare the two interior points, discard one third. State O(log n) time, O(1) space. Mention that for plain "find target in sorted array," binary search is preferred. You can implement the loop with mid1/mid2 and handle the small-range base case by scanning the few remaining elements for the max.

Practice Problems

Find the index of the maximum in a bitonic (unimodal) array.
Find the minimum of a unimodal function (decreases then increases) over an integer or real range.
Peak Index in a Mountain Array (LeetCode-style: array increases then decreases, find peak index).

Summary

Ternary search divides the range into three parts using two points (mid1, mid2); compare and discard one third. Used for unimodal optimization (find peak or valley), not for sorted array lookup.
Unimodal = strictly increasing then strictly decreasing (or the reverse). One local maximum (or minimum).
Time O(log n), space O(1). For sorted array search, binary search is better (fewer comparisons per step).
Implementation: while range large, compute mid1 and mid2, compare arr[mid1] and arr[mid2], set left = mid1+1 or right = mid2−1; then handle small range by linear scan for max.

6.4 Exponential & Interpolation Search

Introduction

This topic covers two search variants that improve on standard binary search in specific settings: exponential search when the target is likely near the start or when the array is effectively unbounded, and interpolation search when the sorted data is roughly uniformly distributed. Exponential search finds a range by repeatedly doubling an index (1, 2, 4, 8, …) until the value at that index is at least the target (or the index passes the end of the array), then runs binary search within that range, giving O(log i) time where i is the target's index. Interpolation search estimates the position of the target using the values at the endpoints (assuming uniform spread), then narrows the range like binary search but with a smarter probe: average O(log log n) for uniform data, but O(n) worst case for skewed distributions. Both assume a sorted array; neither is a drop-in replacement for binary search in all cases, but they are useful tools when the problem structure matches.

Real-World Analogy

Exponential search: Like searching for a word in a dictionary when you suspect it's in the first few pages. Instead of opening to the middle, you flip 1 page, then 2, then 4, then 8—until you've passed the word. Then you binary search in the small range you just bounded. Interpolation search: Like estimating where "Newton" sits in an alphabetically sorted list by the letters: "N" is about 14/26 of the way through the alphabet, so you might open the book about 14/26 of the way through. If the names were uniformly spread, that gets you close in one shot; if not, you adjust. Both methods exploit extra structure (target near start, or uniform distribution) to reduce work.

Example

Exponential: Sorted array of 1000 elements, target at index 5. Binary search does ~10 steps over [0, 999]. Exponential: check indices 1, 2, 4, 8; at 8 the value is at or past the target, so we stop and binary search in [4, 8]. Fewer steps because we quickly bound the range. Interpolation: Array of values 10, 20, 30, …, 1000 (uniform, n = 100). Target 320. We estimate position = (320−10) × (n−1) // (1000−10) = 310 × 99 // 990 = 31, and arr[31] = 320, so the first probe hits. Often very few steps when data is uniform.

Why This Topic Matters

Exponential search: Used when the target index is small (e.g. search in an unbounded or very large sorted stream, or "find first 1" in a sorted bit array). Also the backbone of "binary search in a range we don't know yet"—find the range by doubling, then binary search.
Interpolation search: In theory and in practice, when data is uniformly distributed, interpolation search can do better than binary search (O(log log n) average). Useful in specialized settings (e.g. numeric keys in a known range).
Interviews: Less common than binary search, but "search in an unbounded sorted array" or "find position with minimal comparisons when target is near start" can lead to exponential search. Interpolation is sometimes mentioned as a "faster average case when data is uniform."

Exponential Search

Formal Definition

Concept Note

Exponential search: Given a sorted array A and target x, find the smallest power-of-two range that could contain x: start with index i = 1; while i < n and A[i] < x, double i (e.g. i = 2, 4, 8, …). Then run binary search in the range [i/2, min(i, n−1)]. If the target's index is k, we need O(log k) steps to bound the range and O(log (range size)) ≈ O(log k) for binary search, so total O(log k) where k is the index of the target (or the upper bound we reach).

Mental Model

You don't know where the target is, but you want to find a small range that contains it. Jump 1, 2, 4, 8, … until you've passed the target (or reached the end). That gives you an upper bound. The target must lie in the previous "jump" range. Then binary search inside that range.

Step-by-Step

If arr[0] == target, return 0.
Set i = 1. While i < n and arr[i] < target, set i *= 2 (exponential jump).
We have bounded the target to the range [i/2, min(i, n−1)]. Run binary search in that range and return the result.

ASCII Diagram

  Sorted array, target at index 5.  i = 1, 2, 4, 8, ...
  i=1: arr[1] < target → i=2
  i=2: arr[2] < target → i=4
  i=4: arr[4] < target → i=8
  i=8: arr[8] >= target or i >= n → stop. Range is [4, min(8,n-1)]
  Binary search in [4, 8] → find target at 5.

Python Implementation

def exponential_search(arr, target):
    n = len(arr)
    if n == 0:
        return -1
    if arr[0] == target:
        return 0
    i = 1
    while i < n and arr[i] < target:
        i *= 2
    # Binary search in range [i//2, min(i, n-1)]
    left, right = i // 2, min(i, n - 1)
    while left <= right:
        mid = (left + right) // 2
        if arr[mid] == target:
            return mid
        if arr[mid] < target:
            left = mid + 1
        else:
            right = mid - 1
    return -1

Time and Space Complexity (Exponential)

Let k be the index of the target (or the first index where we exceed the target). We need O(log k) doublings to reach or pass k. The range for binary search is at most 2× the last jump, so O(log k) for binary search. Total O(log k). If the target is at the beginning, this is much better than O(log n). Worst case (target at end): O(log n). Space O(1).
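
To make the O(log k) bound concrete, the range-finding phase can be isolated and its doublings counted. This is a minimal sketch; the helper name bound_range is hypothetical and is not part of the implementation above.

```python
def bound_range(arr, target):
    """Run only the doubling phase; return (left, right, doublings)."""
    n = len(arr)
    i, doublings = 1, 0
    while i < n and arr[i] < target:
        i *= 2
        doublings += 1
    # Binary search would then run on [i // 2, min(i, n - 1)].
    return i // 2, min(i, n - 1), doublings


arr = list(range(1000))        # arr[k] == k, so a target value is also its index
print(bound_range(arr, 5))     # (4, 8, 3)
print(bound_range(arr, 900))   # (512, 999, 10)
```

A target at index 5 is bounded after 3 doublings, while a target at index 900 needs 10, which matches the growth of log k.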

Interpolation Search

Formal Definition

Concept Note

Interpolation search: Given a sorted array A and target x, estimate the position of x assuming values are uniformly distributed between A[left] and A[right]: pos = left + (x - A[left]) * (right - left) // (A[right] - A[left]). If A[pos] == x, return pos. If A[pos] < x, search [pos+1, right]; else search [left, pos−1]. Average case (uniform distribution): O(log log n). Worst case (e.g. values increase exponentially): O(n).

Mental Model

You have a sorted range and know the values at the endpoints. If the values were evenly spaced, where would the target sit? That estimated index is your probe. If the target is there, you're done; otherwise narrow to the left or right subrange and repeat. When data is close to uniform, each step reduces the range by a large factor (not just half), giving very few steps on average.

Step-by-Step

Maintain [left, right]. If left > right or target < arr[left] or target > arr[right], return −1 (target cannot be in range).
Compute pos = left + (target - arr[left]) * (right - left) // (arr[right] - arr[left]). Clamp pos to [left, right].
If arr[pos] == target, return pos. If arr[pos] < target, set left = pos + 1; else right = pos - 1. Repeat.

Python Implementation

def interpolation_search(arr, target):
    left, right = 0, len(arr) - 1
    while left <= right:
        if arr[right] == arr[left]:
            if arr[left] == target:
                return left
            return -1
        pos = left + (target - arr[left]) * (right - left) // (arr[right] - arr[left])
        pos = max(left, min(pos, right))
        if arr[pos] == target:
            return pos
        if arr[pos] < target:
            left = pos + 1
        else:
            right = pos - 1
    return -1

Time and Space Complexity (Interpolation)

Average case (uniform distribution): Each probe reduces the range by a factor that depends on how close the estimate is; analysis gives O(log log n) expected comparisons. Worst case: When values are heavily skewed (e.g. 1, 2, 4, 8, …, 2^n), the estimate can be off and we might only eliminate one element per step → O(n). Space O(1).
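
The gap between the average and worst case can be observed by counting probes. The sketch below repeats the same search logic with a counter added; the function name interpolation_probes and the two sample arrays are illustrative assumptions.

```python
def interpolation_probes(arr, target):
    """Interpolation search that returns (index, number_of_probes)."""
    left, right = 0, len(arr) - 1
    probes = 0
    while left <= right:
        if arr[right] == arr[left]:
            probes += 1
            return (left if arr[left] == target else -1), probes
        pos = left + (target - arr[left]) * (right - left) // (arr[right] - arr[left])
        pos = max(left, min(pos, right))   # keep the probe inside the range
        probes += 1
        if arr[pos] == target:
            return pos, probes
        if arr[pos] < target:
            left = pos + 1
        else:
            right = pos - 1
    return -1, probes


uniform = list(range(10, 1001, 10))     # 10, 20, ..., 1000
skewed = [2 ** i for i in range(20)]    # 1, 2, 4, ..., 2**19

print(interpolation_probes(uniform, 320))      # (31, 1)
print(interpolation_probes(skewed, 2 ** 10))   # (10, 11)
```

On the skewed array every estimate lands on the current left endpoint, so each probe removes only one element until the target is reached.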

Common Mistake

Using interpolation search when the data is not roughly uniform (e.g. many duplicates, or geometric progression). Worst-case performance degrades to O(n). Also avoid division by zero when arr[right] == arr[left]—handle that case separately.

Comparison Table

Algorithm	Best for	Time (typical)	Worst case
Binary search	General sorted array	O(log n)	O(log n)
Exponential search	Target near start, unbounded array	O(log k), k = index	O(log n)
Interpolation search	Uniformly distributed sorted data	O(log log n) avg	O(n)

Optimization Insight

Use exponential search when the target is likely near the beginning (e.g. "find first occurrence of 1" in a sorted 0/1 array) or when the array is conceptually unbounded. Use interpolation search only when you have strong reason to believe the data is uniformly distributed; otherwise binary search is safer and predictable O(log n).

Edge Cases

Exponential: Empty array; target at index 0 (handle before loop); target larger than all elements (i grows until i ≥ n, then binary search in [i/2, n−1]).
Interpolation: arr[right] == arr[left] (avoid division by zero; check if target equals that value). Target outside [arr[left], arr[right]] (return −1). Duplicate values (the search still returns a valid matching index, but not necessarily the first or last occurrence; finding a specific occurrence requires extra scanning from the hit).

Pattern Recognition

Exponential: "Unbounded sorted array," "target likely near the start," "find range then search." Interpolation: "Sorted array with uniform spread," "numeric keys in a known range," "minimize comparisons when distribution is uniform." When in doubt, default to binary search.

Interview Insight

For "search in an unbounded sorted array," describe exponential search: double the index until you pass the target, then binary search in the last range. State O(log k) where k is the target index. Interpolation search is less commonly asked; if mentioned, say it gives O(log log n) average for uniform data but O(n) worst case, and that binary search is usually the safe choice.
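
One way to sketch the unbounded case is to hide the array behind an accessor. Here get(i) is a hypothetical function assumed to return the value at index i, or None for any index past the end; real interview problems define their own reader interface.

```python
def search_unbounded(get, target):
    """Exponential search over a sorted sequence of unknown length.

    get(i) is an assumed accessor: value at index i, or None past the end.
    """
    first = get(0)
    if first is None:
        return -1
    if first == target:
        return 0
    hi = 1
    while True:
        value = get(hi)
        if value is None or value >= target:
            break                      # hi is now an upper bound
        hi *= 2
    lo = hi // 2
    while lo <= hi:
        mid = (lo + hi) // 2
        value = get(mid)
        if value is None or value > target:
            hi = mid - 1               # past the end counts as "too large"
        elif value < target:
            lo = mid + 1
        else:
            return mid
    return -1


data = list(range(0, 200, 2))          # 0, 2, 4, ..., 198
get = lambda i: data[i] if i < len(data) else None
print(search_unbounded(get, 10))       # 5
print(search_unbounded(get, 198))      # 99
print(search_unbounded(get, 7))        # -1
```

Treating out-of-range reads as "larger than any target" lets the same binary search handle both the real data and the overshoot.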

Practice Problems

Search in an unbounded sorted array (exponential search to find range, then binary search).
Find the first 1 in a sorted binary array (0s then 1s)—exponential search + first-occurrence binary.
Implement interpolation search and compare with binary on uniformly distributed data.
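
As a starting point for the second problem, the sketch below bounds the first 1 by doubling and then uses the standard library's bisect_left for the first-occurrence step. The function name first_one is an assumption, and the input is assumed to contain only 0s followed by 1s.

```python
import bisect


def first_one(bits):
    """Index of the first 1 in a sorted 0/1 list, or -1 if there is none."""
    n = len(bits)
    if n == 0:
        return -1
    if bits[0] == 1:
        return 0
    i = 1
    while i < n and bits[i] == 0:
        i *= 2                          # exponential jump
    hi = min(i, n - 1)
    # bisect_left finds the leftmost position where 1 could be inserted,
    # which is the first 1 within bits[i // 2 : hi + 1] if one exists.
    pos = bisect.bisect_left(bits, 1, i // 2, hi + 1)
    return pos if pos <= hi and bits[pos] == 1 else -1


print(first_one([0, 0, 0, 1, 1]))   # 3
print(first_one([0, 0, 0]))         # -1
```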

Summary

Exponential search: Bound the target by doubling index (1, 2, 4, 8, …); then binary search in that range. O(log k) where k is the target index. Best when target is near the start or array is unbounded.
Interpolation search: Estimate position from value and endpoints; probe and narrow. O(log log n) average for uniform data, O(n) worst case. Use only when data is roughly uniformly distributed.
Both assume a sorted array. For general sorted search, binary search remains the default; use exponential or interpolation when the problem structure matches.

6.5 Bubble Sort

Introduction

Bubble sort is a simple comparison-based sorting algorithm that repeatedly steps through the array, compares adjacent elements, and swaps them if they are in the wrong order. Each full pass "bubbles" the largest (or smallest) unsorted element to its correct position at the end (or beginning) of the segment. It is easy to understand and implement but inefficient for large data: time is O(n²) in the worst and average case, and O(n) in the best case (when the array is already sorted and we use an early-exit optimization). It is mainly used for teaching, small arrays, or when simplicity matters more than speed. Understanding bubble sort builds intuition for comparison-based sorting and sets the stage for faster algorithms like merge sort and quicksort (Topics 6.8–6.9).

Real-World Analogy

Imagine ordering a row of bottles by height. You walk left to right: if two adjacent bottles are out of order (shorter one on the right), you swap them. You keep doing full passes along the row. After one pass, the tallest bottle has "bubbled" to the right end. You repeat, ignoring the last position (already correct), until no swaps happen in a pass—then the row is sorted. That's bubble sort: repeatedly swap adjacent inversions until none remain.

Example

Array [5, 2, 8, 1]. Pass 1: 5↔2 → [2,5,8,1]; 5<8 ok; 8↔1 → [2,5,1,8]. Largest (8) is at end. Pass 2 (last position skipped): 2,5 ok; 5↔1 → [2,1,5,8]. Pass 3 (last two positions skipped): 2↔1 → [1,2,5,8]. That was the final pass (n−1 = 3) → sorted.

Formal Definition

Concept Note

Bubble sort: For i from 0 to n−2 (outer loop over passes), for j from 0 to n−2−i (inner loop; at the start of pass i, the last i elements are already the largest and sorted), compare arr[j] and arr[j+1]. If arr[j] > arr[j+1], swap them. After pass i, the (i+1)-th largest element is in place at index n−1−i. Stability: When elements are equal, we do not swap (use >, not >=), so bubble sort is stable. In-place: O(1) extra space (aside from the array).

Why This Topic Matters

Foundation: One of the first sorting algorithms taught. Builds the idea of "compare and swap" and "one element per pass in place."
Stability: Bubble sort is stable when implemented with strict > (no swap on equal). Useful to contrast with non-stable sorts later.
Interviews: Rarely asked to implement for production, but "explain bubble sort" or "sort with only adjacent swaps" (e.g. minimum adjacent swaps to sort) can appear. Knowing why it's O(n²) and that better sorts exist is expected.

Mental Model

Each pass scans the unsorted portion and pushes the maximum to the right boundary. After k passes, the k largest elements are in their final positions at the end. So the unsorted region shrinks from the right. Alternatively: "repeatedly fix adjacent inversions until there are none"—when no swap occurs in a pass, the array is sorted (useful for early termination).

Step-by-Step Breakdown

Outer loop: For pass i = 0, 1, …, n−2 (we need at most n−1 passes; after n−1 passes the n−1 largest elements are in place at the back, so the one remaining element, the smallest, is necessarily at the front).
Inner loop: For j from 0 to n−2−i, compare arr[j] and arr[j+1]. If arr[j] > arr[j+1], swap.
Early exit (optimization): If in a full pass no swap happened, the array is sorted; break out.

ASCII Diagram

  Initial:     [ 5,  2,  8,  1 ]
  Pass 1 (i=0): j=0: 5>2 swap → [2,5,8,1]; j=1: 5<8; j=2: 8>1 swap → [2,5,1,8]
  Pass 2 (i=1): j=0: 2<5; j=1: 5>1 swap → [2,1,5,8]
  Pass 3 (i=2): j=0: 2>1 swap → [1,2,5,8]. Last pass (n−1 = 3) → done.

Python Implementation

Standard (No Early Exit)

def bubble_sort(arr):
    n = len(arr)
    for i in range(n - 1):
        for j in range(n - 1 - i):
            if arr[j] > arr[j + 1]:
                arr[j], arr[j + 1] = arr[j + 1], arr[j]

With Early Termination

def bubble_sort_optimized(arr):
    n = len(arr)
    for i in range(n - 1):
        swapped = False
        for j in range(n - 1 - i):
            if arr[j] > arr[j + 1]:
                arr[j], arr[j + 1] = arr[j + 1], arr[j]
                swapped = True
        if not swapped:
            break

Line-by-Line Explanation

for i in range(n - 1): We do at most n−1 passes. After pass i, the i+1 largest elements occupy indices n−1−i through n−1 in sorted order.
for j in range(n - 1 - i): Inner loop only goes up to n−2−i so we compare adjacent pairs in the unsorted part; we don't touch the last i elements (already in place).
if arr[j] > arr[j + 1]: Strict > keeps the sort stable (equal elements not swapped). Swap so the larger moves right.

Time Complexity

Worst case: Reverse order. Every pair is inverted. Number of comparisons = (n−1) + (n−2) + … + 1 = n(n−1)/2 = O(n²).

Best case: Already sorted. With early termination, one pass (n−1 comparisons), no swaps → O(n). Without early exit, still (n−1)+(n−2)+…+1 = O(n²).

Average case: Random order; about half the pairs may be inverted. Still on the order of n² comparisons → O(n²).
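
These counts can be checked directly by instrumenting the sort. The sketch below works on a copy and returns the tallies; the function name bubble_sort_counts and its early_exit flag are illustrative assumptions.

```python
def bubble_sort_counts(arr, early_exit=True):
    """Bubble sort a copy of arr; return (sorted_list, comparisons, swaps)."""
    a = list(arr)
    n = len(a)
    comparisons = swaps = 0
    for i in range(n - 1):
        swapped = False
        for j in range(n - 1 - i):
            comparisons += 1
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
                swaps += 1
                swapped = True
        if early_exit and not swapped:
            break
    return a, comparisons, swaps


n = 6
print(bubble_sort_counts(range(n, 0, -1))[1:])             # (15, 15)  worst case: n(n-1)/2
print(bubble_sort_counts(range(n))[1:])                    # (5, 0)    best case with early exit
print(bubble_sort_counts(range(n), early_exit=False)[1:])  # (15, 0)   sorted input, no early exit
```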

Space Complexity

Only a few variables (i, j, maybe swapped). Sorting is in-place. O(1) auxiliary space.

Edge Cases

Empty or single element: range(n - 1) is empty for n = 0 and n = 1, so no pass runs; array unchanged (already sorted).
Already sorted: With early exit, one pass then break. Without, still O(n²) comparisons.
All equal: No swaps (we use >); one pass with early exit. Stable.

Common Mistakes

Inner loop bound: Use range(n - 1 - i), not range(n - 1), so we don't re-scan already-sorted tail (correctness is fine either way, but efficiency and standard definition expect shrinking range).
Using >= for swap: That makes the sort unstable (equal elements may be reordered). Use > for stability.
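
The stability point is easier to see with records that share a key. In this sketch each record is a (key, label) pair and only the key is compared; the function name and the strict flag are assumptions made for the demonstration.

```python
def bubble_sort_by_key(records, strict=True):
    """Bubble sort (key, label) pairs by key; strict=False also swaps on ties."""
    a = list(records)
    n = len(a)
    for i in range(n - 1):
        for j in range(n - 1 - i):
            left_key, right_key = a[j][0], a[j + 1][0]
            out_of_order = left_key > right_key if strict else left_key >= right_key
            if out_of_order:
                a[j], a[j + 1] = a[j + 1], a[j]
    return a


records = [(2, "a"), (1, "x"), (2, "b")]
print(bubble_sort_by_key(records, strict=True))    # [(1, 'x'), (2, 'a'), (2, 'b')]
print(bubble_sort_by_key(records, strict=False))   # [(1, 'x'), (2, 'b'), (2, 'a')]
```

With the strict comparison the two records with key 2 keep their original order; with >= they are swapped past each other.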

Optimization Insight

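Early termination (no-swap check) gives O(n) on already-sorted data and helps on nearly sorted data when every element is close to its final position. It helps less when a small element sits far to the right: each pass moves it only one step left, so a single such element can still force close to n−1 passes. For general random data, bubble sort remains O(n²); for production use prefer O(n log n) sorts (merge sort, quicksort, or built-in sort).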
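
How much early termination saves depends on where the out-of-place element sits. The sketch below counts passes for two inputs that are each one element away from sorted; the helper name bubble_passes is hypothetical.

```python
def bubble_passes(arr):
    """Number of passes bubble sort with early exit performs on a copy of arr."""
    a = list(arr)
    n = len(a)
    passes = 0
    for i in range(n - 1):
        passes += 1
        swapped = False
        for j in range(n - 1 - i):
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
                swapped = True
        if not swapped:
            break
    return passes


print(bubble_passes([5, 1, 2, 3, 4]))   # 2: the large element reaches the end in one pass
print(bubble_passes([2, 3, 4, 5, 1]))   # 4: the small element moves left one step per pass
```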

Evolution: Bubble vs Better Sorts

Algorithm	Time (avg/worst)	Stable
Bubble sort	O(n²)	Yes
Merge sort / Quick sort	O(n log n)	Merge yes, quick no (default)

Interview Insight

If asked to implement bubble sort, write the two nested loops with adjacent compare-and-swap and mention O(n²) time, O(1) space, stability. Add early termination for best case O(n). Acknowledge that in practice we use O(n log n) sorts or the language's built-in sort.

Practice Problems

Implement bubble sort with and without early termination.
Count the number of swaps bubble sort would do (equals number of inversions).
Minimum adjacent swaps to sort an array (answer: inversion count).

Summary

Bubble sort: Repeatedly compare adjacent elements and swap if out of order; each pass bubbles the largest to the right. O(n²) worst/average, O(n) best with early exit, O(1) space, stable (use >).
Inner loop: j from 0 to n - 2 - i; swap when arr[j] > arr[j+1].
Use for teaching or tiny arrays; prefer merge/quicksort or built-in sort for real use.

6.6 Selection Sort

Introduction

Selection sort works by repeatedly finding the minimum element in the unsorted portion of the array and swapping it to the front. After i passes, the first i positions hold the i smallest elements in sorted order; the rest is still unsorted. Unlike bubble sort, it does at most one swap per pass (the minimum into place), but it still needs to scan the unsorted part to find that minimum, so time remains O(n²) in all cases (best, average, worst). It is in-place (O(1) extra space) but not stable in the typical implementation: the swap that brings the minimum forward also sends the displaced element to a later index, where it can jump over an element equal to it and change their relative order. Selection sort is easy to implement and performs at most n−1 swaps in total, far fewer than bubble sort's worst case, which is useful when swap cost is high compared to comparison cost.

Real-World Analogy

Imagine sorting a row of cards by value. You scan the whole row, find the smallest card, and put it in the first position (swap with whatever was there). Then you scan from the second position to the end, find the smallest in that range, and put it in the second position. You repeat: "find minimum of the rest, put it next." Each pass places one element in its final position with one swap. That's selection sort—select the minimum, place it, repeat.

Example

Array [64, 25, 12, 22, 11]. Pass 1: min in [0..4] is 11 at index 4 → swap with index 0 → [11, 25, 12, 22, 64]. Pass 2: min in [1..4] is 12 at index 2 → swap with index 1 → [11, 12, 25, 22, 64]. Pass 3: min in [2..4] is 22 at index 3 → swap with index 2 → [11, 12, 22, 25, 64]. Pass 4: min in [3..4] is 25 at index 3 → no swap needed. Sorted.

Formal Definition

Concept Note

Selection sort: For i from 0 to n−2: (1) Find the index min_idx of the minimum element in arr[i..n−1]. (2) Swap arr[i] with arr[min_idx]. After step i, arr[0..i] contains the i+1 smallest elements in sorted order. Stability: The standard implementation (swap minimum to front) is not stable: the element displaced from position i lands at min_idx and can end up behind an element equal to it. In-place: O(1) extra space. Swaps: At most n−1 swaps (one per pass, and none in a pass where the minimum is already at i).

Why This Topic Matters

Minimal swaps: When swapping is expensive (e.g. large records, external storage), selection sort minimizes the number of swaps (at most n−1), while still being simple.
Contrast with bubble sort: Bubble sort fixes inversions with many adjacent swaps; selection sort does one "long-distance" swap per pass. Both O(n²), but selection sort has a fixed, small swap count.
Interviews: "Implement selection sort" or "sort with minimum swaps" can come up. Knowing it's unstable and O(n²) in all cases is important.

Mental Model

Keep a "sorted region" at the front (initially empty). Each pass: look at the entire unsorted region, find the smallest element, and move it to the end of the sorted region (one swap). The sorted region grows by one element each time; after n−1 passes, the last element is automatically in place.

Step-by-Step Breakdown

Outer loop: For i = 0 to n−2 (after n−1 passes, the first n−1 positions are correct; the last is the maximum).
Find minimum: Set min_idx = i. For j from i+1 to n−1, if arr[j] < arr[min_idx], set min_idx = j.
Swap: If min_idx != i, swap arr[i] and arr[min_idx].

ASCII Diagram

  Initial:  [ 64, 25, 12, 22, 11 ]
  i=0: min in [0..4] at index 4 (11) → swap 64,11 → [11, 25, 12, 22, 64]
  i=1: min in [1..4] at index 2 (12) → swap 25,12 → [11, 12, 25, 22, 64]
  i=2: min in [2..4] at index 3 (22) → swap 25,22 → [11, 12, 22, 25, 64]
  i=3: min in [3..4] at index 3 (25) → no swap. Done.

Python Implementation

def selection_sort(arr):
    n = len(arr)
    for i in range(n - 1):
        min_idx = i
        for j in range(i + 1, n):
            if arr[j] < arr[min_idx]:
                min_idx = j
        if min_idx != i:
            arr[i], arr[min_idx] = arr[min_idx], arr[i]

Variant—selection sort by maximum (place largest at end): For i from n−1 down to 1, find max in arr[0..i], swap to arr[i]. Same O(n²), same instability.
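
The maximum-based variant described above might look like the following sketch; the function name selection_sort_by_max is an assumption.

```python
def selection_sort_by_max(arr):
    """In-place selection sort that grows the sorted region from the right."""
    for end in range(len(arr) - 1, 0, -1):
        max_idx = 0
        for j in range(1, end + 1):        # scan arr[0..end] for the maximum
            if arr[j] > arr[max_idx]:
                max_idx = j
        if max_idx != end:
            arr[end], arr[max_idx] = arr[max_idx], arr[end]


data = [64, 25, 12, 22, 11]
selection_sort_by_max(data)
print(data)   # [11, 12, 22, 25, 64]
```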

Line-by-Line Explanation

for i in range(n - 1): We need n−1 passes; after that, the first n−1 elements are the n−1 smallest in order, so the last element is the largest.
min_idx = i: Assume the element at i is the minimum in the unsorted part; we'll update min_idx if we find a smaller one.
for j in range(i + 1, n):: Scan every element after i. If arr[j] < arr[min_idx], we found a smaller element, so min_idx = j.
if min_idx != i:: Only swap if the minimum wasn't already at i (avoids redundant swap).

Time Complexity

All cases: The inner loop always runs (n−1−i) times for pass i. Total comparisons = (n−1) + (n−2) + … + 1 = n(n−1)/2 = O(n²). There is no early exit—we always scan the full unsorted portion to find the minimum. So best, average, and worst case are all O(n²).

Swaps: At most n−1 swaps (one per pass when min_idx ≠ i). So when comparisons are cheap but swaps are expensive, selection sort can be preferable to bubble sort.
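
Both counts can be confirmed with an instrumented copy of the sort. A minimal sketch, assuming a hypothetical helper named selection_sort_counts:

```python
def selection_sort_counts(arr):
    """Selection sort a copy of arr; return (sorted_list, comparisons, swaps)."""
    a = list(arr)
    n = len(a)
    comparisons = swaps = 0
    for i in range(n - 1):
        min_idx = i
        for j in range(i + 1, n):
            comparisons += 1
            if a[j] < a[min_idx]:
                min_idx = j
        if min_idx != i:
            a[i], a[min_idx] = a[min_idx], a[i]
            swaps += 1
    return a, comparisons, swaps


print(selection_sort_counts([64, 25, 12, 22, 11])[1:])   # (10, 3)
print(selection_sort_counts([1, 2, 3, 4, 5])[1:])        # (10, 0)
print(selection_sort_counts([5, 4, 3, 2, 1])[1:])        # (10, 2)
```

For n = 5 the comparison count is always n(n−1)/2 = 10, regardless of input order, while the swap count never exceeds n−1 = 4.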

Space Complexity

Only loop indices and min_idx. O(1) auxiliary space; in-place.

Edge Cases

Empty or single element: range(n - 1) is empty; no iterations; array unchanged.
Already sorted: Still O(n²) comparisons; each pass finds min at i, so no swaps (min_idx == i every time).
All equal: min_idx stays at i (we use <), so no swaps. Order among equals is preserved on this input, but the algorithm is still classified as unstable because other inputs do get reordered (e.g. two equal keys followed by a smaller one); stability must hold for all inputs.

Common Mistakes

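Leaving i out of the min search: The minimum of the unsorted part can be at i itself, so start with min_idx = i and compare against j from i+1. Starting with min_idx = i+1 without ever comparing against arr[i] can swap a larger element into position i.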
Forgetting the swap: Finding the minimum is useless if you don't swap it to position i. Always swap arr[i] and arr[min_idx] when they differ.

Common Mistake

Claiming selection sort is stable. The classic implementation (swap min to front) is not stable: the swap sends the element that was at position i to a later index, where it can jump over an element equal to it. For example, sorting [2a, 2b, 1] swaps 2a with 1 and yields [1, 2b, 2a], so the two 2s have traded order. For a stable O(n²) sort, use insertion sort (Topic 6.7).
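
The instability can be reproduced with labeled records. In this sketch each record is a (key, label) pair compared by key only, and Python's built-in sorted (which is guaranteed stable) serves as the reference; the function name is an assumption.

```python
def selection_sort_by_key(records):
    """Selection sort (key, label) pairs by key only; returns a new list."""
    a = list(records)
    n = len(a)
    for i in range(n - 1):
        min_idx = i
        for j in range(i + 1, n):
            if a[j][0] < a[min_idx][0]:
                min_idx = j
        if min_idx != i:
            a[i], a[min_idx] = a[min_idx], a[i]
    return a


records = [(2, "a"), (2, "b"), (1, "c")]
print(selection_sort_by_key(records))          # [(1, 'c'), (2, 'b'), (2, 'a')]
print(sorted(records, key=lambda r: r[0]))     # [(1, 'c'), (2, 'a'), (2, 'b')]
```

The first swap exchanges (2, "a") with (1, "c"), which moves (2, "a") behind (2, "b").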

Comparison: Selection vs Bubble

Property	Selection sort	Bubble sort
Time (all cases)	O(n²)	O(n²) worst/avg; O(n) best with early exit
Swaps	At most n−1	Up to O(n²)
Stable	No	Yes (with strict >)

Optimization Insight

Selection sort minimizes swaps, which can matter when moving large objects. For general-purpose sorting, O(n log n) algorithms (merge sort, quicksort) or the language's built-in sort are preferred. Use selection sort when you need a simple in-place O(n²) sort and care about minimizing swap count.

Pattern Recognition

"Find the minimum (or maximum), put it in place, repeat"—that's selection sort. Useful when the problem asks for "minimum number of swaps to sort" (selection sort achieves at most n−1) or when implementing a simple sort from scratch.

Interview Insight

If asked to implement selection sort: outer loop i from 0 to n−2, inner loop find min index in arr[i+1..n−1] (starting min_idx=i), then swap arr[i] and arr[min_idx]. State O(n²) for all cases, O(1) space, not stable, and at most n−1 swaps. Compare with bubble sort (many swaps, stable with care) and insertion sort (stable, good for nearly sorted).

Practice Problems

Implement selection sort and count comparisons and swaps on a few inputs.
Modify to sort by maximum (place largest at end) instead of minimum at front.
Minimum number of swaps required to sort an array (when only "swap any two" is allowed: answer is n − (number of cycles in the permutation); when only "swap with minimum" is allowed, selection sort is optimal).
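
For the "swap any two" version of the last problem, the cycle-counting answer can be sketched as follows. This assumes all elements are distinct; the function name min_swaps_to_sort is hypothetical.

```python
def min_swaps_to_sort(arr):
    """Minimum arbitrary swaps to sort arr, assuming distinct elements."""
    n = len(arr)
    # order[p] is the current index of the element that belongs at position p.
    order = sorted(range(n), key=lambda i: arr[i])
    visited = [False] * n
    cycles = 0
    for start in range(n):
        if visited[start]:
            continue
        cycles += 1
        j = start
        while not visited[j]:          # walk one cycle of the permutation
            visited[j] = True
            j = order[j]
    return n - cycles


print(min_swaps_to_sort([64, 25, 12, 22, 11]))   # 3
print(min_swaps_to_sort([1, 2, 3]))              # 0
```

A cycle of length L needs L−1 swaps, so summing over all cycles gives n minus the number of cycles.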

Summary

Selection sort: For each position i, find the minimum in arr[i..n−1] and swap it to i. O(n²) time (all cases), O(1) space, not stable, at most n−1 swaps.
Inner loop: start with min_idx = i, scan j over [i+1, n−1] to find the minimum, then swap arr[i] and arr[min_idx] if different.
Use when swap count must be minimal or for teaching; prefer O(n log n) sorts or built-in sort in practice.

6.7 Insertion Sort

Introduction

Insertion sort builds the sorted array one element at a time by repeatedly taking the next element from the unsorted portion and inserting it into its correct position among the already-sorted elements. The sorted region grows from the left: after processing index i, the subarray arr[0..i] is sorted. To insert arr[i], we compare it with elements to its left and shift larger ones right until we find the right spot. Worst and average case are O(n²) (many comparisons and shifts), but best case is O(n) when the array is already sorted—each element is compared once and no shifts. Insertion sort is stable (we insert after equal elements) and in-place (O(1) extra space). It is the algorithm of choice for small or nearly sorted data and is how many people sort cards in hand. It also underlies efficient algorithms for incremental sorting and for small subarrays in hybrid sorts (e.g. quicksort with insertion sort for small segments).

Real-World Analogy

Imagine sorting a hand of cards. You keep the cards you've already sorted in your left hand (or on the table). You pick the next card from the unsorted pile and insert it into the correct position in the sorted part—sliding larger cards to the right as you go. You repeat until all cards are in the sorted region. That's insertion sort: each new element is inserted into the already-sorted prefix.

Example

Array [12, 11, 13, 5, 6]. Sorted prefix starts as [12]. Insert 11: 11 < 12, shift 12 right → [11, 12]. Insert 13: 13 > 12, no shift → [11, 12, 13]. Insert 5: 5 < 13,12,11, shift all right → [5, 11, 12, 13]. Insert 6: 6 < 13,12,11, 6 > 5, shift 13,12,11 right → [5, 6, 11, 12, 13]. Done.

Formal Definition

Concept Note

Insertion sort: For i from 1 to n−1: assume arr[0..i−1] is sorted. Set key = arr[i]. Shift elements arr[j] (j = i−1, i−2, …) one position right while arr[j] > key. Place key in the vacated position. After iteration i, arr[0..i] is sorted. Stability: We shift only when arr[j] > key (strict), so equal elements are not moved past the inserted one—stable. In-place: O(1) extra space.

Why This Topic Matters

Best case O(n): When the array is already sorted (or nearly sorted), insertion sort does one comparison per element and little or no shifting—faster than selection sort, which always does O(n²) comparisons.
Stable and in-place: Like bubble sort, insertion sort is both stable and in-place, but it usually does less work in practice; selection sort is in-place but not stable. Useful when stability is required and n is small.
Practical use: Small arrays (e.g. n ≤ 10–50), nearly sorted data, or as the base case in merge sort / quicksort (sort small subarrays with insertion sort).
Interviews: "Implement insertion sort," "sort a stream one element at a time," or "why is insertion sort good for nearly sorted data?" are common.

Mental Model

Maintain a sorted prefix [0..i−1]. Take arr[i] and "insert" it: walk left from i−1, shifting every element that is greater than the key one step right, until you find an element ≤ key (or reach the start). Put the key in the gap. The sorted prefix is now [0..i]. Repeat for i = 1, 2, …, n−1.

Step-by-Step Breakdown

Outer loop: For i from 1 to n−1 (index 0 is trivially sorted).
Save key: key = arr[i]. We will insert key into the sorted region [0..i−1].
Shift and find position: Set j = i−1. While j ≥ 0 and arr[j] > key, set arr[j+1] = arr[j] and j = j−1. The loop stops when we hit an element ≤ key or the start.
Place key: arr[j+1] = key (the position that was vacated or is right after the element ≤ key).

ASCII Diagram

  Initial:  [ 12, 11, 13,  5,  6 ]
  i=1: key=11, shift 12 right → [ 11, 12, 13,  5,  6 ]
  i=2: key=13, 13>12, no shift → [ 11, 12, 13,  5,  6 ]
  i=3: key=5,  shift 13,12,11 right → [ 5, 11, 12, 13,  6 ]
  i=4: key=6,  shift 13,12,11 right, 6>5 stop → [ 5,  6, 11, 12, 13 ]

Python Implementation

Standard (Shift in a while-loop)

def insertion_sort(arr):
    n = len(arr)
    for i in range(1, n):
        key = arr[i]
        j = i - 1
        while j >= 0 and arr[j] > key:
            arr[j + 1] = arr[j]
            j -= 1
        arr[j + 1] = key

Variant (Swap backward instead of shift)

Instead of shifting, swap the key backward until it is in order. Same O(n²), same stability if we use strict > when swapping.

def insertion_sort_swap(arr):
    for i in range(1, len(arr)):
        j = i
        while j > 0 and arr[j - 1] > arr[j]:
            arr[j - 1], arr[j] = arr[j], arr[j - 1]
            j -= 1

Line-by-Line Explanation (Standard Version)

for i in range(1, n): We insert each element from index 1 onward into the sorted prefix [0..i−1].
key = arr[i]; j = i - 1: We'll compare key with elements to its left; j is the current "slot" we're comparing with.
while j >= 0 and arr[j] > key: As long as the element at j is greater than key, it must move right to make room. We copy it to arr[j+1] and decrement j. Strict > ensures stability.
arr[j + 1] = key: When the loop exits, j is either −1 (key is smallest) or the index of the last element that is ≤ key. The insertion position is j+1.

Time Complexity

Worst case: Array in reverse order. Each insertion shifts all existing sorted elements. Comparisons and shifts: 1 + 2 + … + (n−1) = n(n−1)/2 = O(n²).

Best case: Already sorted. For each i, we compare key with arr[i−1] once and find it's not smaller, so no shifts. Total n−1 comparisons → O(n).

Average case: Random order; on average we shift about half of the sorted prefix per insertion → O(n²).
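
The three cases can be compared by counting comparisons and shifts. This sketch restructures the while loop slightly so that each key comparison is counted; the function name insertion_sort_counts is an assumption.

```python
def insertion_sort_counts(arr):
    """Insertion sort a copy of arr; return (sorted_list, comparisons, shifts)."""
    a = list(arr)
    comparisons = shifts = 0
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0:
            comparisons += 1
            if a[j] <= key:            # found an element that stays left of key
                break
            a[j + 1] = a[j]            # shift the larger element right
            shifts += 1
            j -= 1
        a[j + 1] = key
    return a, comparisons, shifts


print(insertion_sort_counts([1, 2, 3, 4, 5, 6])[1:])     # (5, 0)    best case
print(insertion_sort_counts([6, 5, 4, 3, 2, 1])[1:])     # (15, 15)  worst case
print(insertion_sort_counts([12, 11, 13, 5, 6])[1:])     # (9, 7)
```

Each shift removes exactly one inversion, so the shift count equals the number of inversions in the input (7 for the last example).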

Space Complexity

Only variables i, j, and key. O(1) auxiliary space; in-place.

Edge Cases

Empty or single element: Loop range(1, n) is empty; array unchanged (already sorted).
Already sorted: Best case; one comparison per element, no shifts; O(n).
All equal: No element satisfies arr[j] > key, so no shifts; stable, O(n).

Common Mistakes

Using >= in the condition: arr[j] >= key would shift equal elements, making the sort unstable. Use arr[j] > key.
Forgetting to place key: After the while loop, the vacated position is arr[j+1]. Must assign arr[j+1] = key.

Common Mistake

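Starting the outer loop at 0. Index 0 is a single-element "sorted" region; we insert starting from index 1. Starting at 0 is harmless in the shift version above (the while loop is skipped because j = −1), but it is a wasted iteration and suggests the sorted-prefix invariant was misunderstood.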

Comparison: Insertion vs Bubble vs Selection

Algorithm	Best	Worst/Avg	Stable
Insertion sort	O(n)	O(n²)	Yes
Bubble sort	O(n) with early exit	O(n²)	Yes
Selection sort	O(n²)	O(n²)	No

Optimization Insight

For nearly sorted data, insertion sort is fast (O(n) best case). The insertion point can also be found with binary search in the sorted prefix—that reduces comparisons to O(n log n) but shifts remain O(n²), so overall still O(n²). Binary insertion is useful when comparisons are expensive. For small n, insertion sort often beats merge/quicksort due to low constant factors; many standard libraries use it for small subarrays.
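
A binary insertion sort can be sketched with the standard library's bisect module. Using bisect_right places the key after any equal elements, which keeps the sort stable; the function name binary_insertion_sort is an assumption.

```python
import bisect


def binary_insertion_sort(arr):
    """In-place insertion sort that finds each insertion point by binary search."""
    for i in range(1, len(arr)):
        key = arr[i]
        # Search only the sorted prefix arr[0:i]; O(log i) comparisons.
        pos = bisect.bisect_right(arr, key, 0, i)
        # Shift arr[pos..i-1] one step right; still O(i) element moves.
        arr[pos + 1:i + 1] = arr[pos:i]
        arr[pos] = key


data = [12, 11, 13, 5, 6]
binary_insertion_sort(data)
print(data)   # [5, 6, 11, 12, 13]
```

The slice assignment does the shifting in one statement, but it still moves up to i elements, so the overall running time stays O(n²).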

Pattern Recognition

"Maintain a sorted prefix; take the next element and insert it in the right place"—that's insertion sort. Good when data arrives one element at a time (online sorting), when the array is small or nearly sorted, or when you need a stable in-place O(n²) sort.

Interview Insight

Implement insertion sort: for i from 1 to n−1, key = arr[i], shift elements to the right while arr[j] > key, then arr[j+1] = key. State O(n) best (already sorted), O(n²) worst/average, O(1) space, stable. Mention that it's the preferred simple sort for nearly sorted data and for small n in hybrid sorts.

Practice Problems

Implement insertion sort and test on already-sorted, reverse-sorted, and random arrays.
Count inversions using insertion sort (each time you shift, you're fixing an inversion).
Binary insertion sort: use binary search to find the insertion position, then shift. Compare total operations.

Summary

Insertion sort: For each i from 1 to n−1, insert arr[i] into the sorted prefix [0..i−1] by shifting larger elements right and placing the key. O(n) best, O(n²) worst/average, O(1) space, stable.
Use strict arr[j] > key for stability; place key at arr[j+1] after the while loop.
Best of the simple O(n²) sorts for nearly sorted or small data; used as base case in many O(n log n) sorts.

6.8 Merge Sort

Introduction

Merge sort is a divide-and-conquer sorting algorithm: it splits the array into two halves, recursively sorts each half (using merge sort), then merges the two sorted halves into one sorted array. The merge step takes two sorted subarrays and combines them in linear time by repeatedly taking the smaller of the two front elements. Because we always divide in half and merge in O(n), the recurrence is T(n) = 2T(n/2) + O(n), which gives O(n log n) time in all cases (best, average, worst). Merge sort uses O(n) extra space for temporary arrays (or one shared temp array) and O(log n) stack space for recursion. It is stable when we take the left element when equal during merge. Merge sort is the go-to when you need guaranteed O(n log n), stability, or when sorting linked lists (where merge is natural and random access is not).

Real-World Analogy

Imagine sorting a deck of cards by splitting it in half, giving each half to a friend to sort (they do the same—split and hand off until they have one card each), then merging the two sorted piles: you always look at the top of each pile and take the smaller card, placing it face-down on the result. When one pile is empty, you put the rest of the other pile on the result. That's merge sort: divide until trivial (one element), then merge sorted halves.

Example

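Array [38, 27, 43, 3, 9, 82, 10]. Split: [38, 27, 43, 3] and [9, 82, 10] (this is the split made by the index-based version, mid = (low + high) // 2; slicing at n // 2 would give [38, 27, 43] and [3, 9, 82, 10], with the same final result). Recursively sort: [3, 27, 38, 43] and [9, 10, 82]. Merge: compare 3 and 9 → take 3; 27 and 9 → take 9; 27 and 10 → take 10; 27 and 82 → take 27; … → [3, 9, 10, 27, 38, 43, 82].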

Formal Definition

Concept Note

Merge sort: If n ≤ 1, return (already sorted). Otherwise: (1) Divide: mid = n//2; left = arr[0..mid−1], right = arr[mid..n−1]. (2) Conquer: left = merge_sort(left), right = merge_sort(right). (3) Merge: Merge the two sorted sequences left and right into one sorted sequence by repeatedly comparing the front elements and taking the smaller. Stability: When left[i] == right[j], take left[i] first → stable. Space: O(n) for the merged result (and recursion stack O(log n)).

Why This Topic Matters

Guaranteed O(n log n): Unlike quicksort, merge sort has no bad pivot; it always does O(n log n) comparisons and O(n log n) work. Predictable performance.
Stable: Important when sorting by one key and then another. For example, sort by name, then stably by age: records with the same age stay in name order.
Linked lists and external sorting: Merge sort works well on linked lists (no random access needed) and is the basis for external sorting (merge sorted chunks from disk).
Interviews: "Implement merge sort," "merge two sorted arrays," "count inversions" (using merge step), "sort linked list."
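
The stability point above can be checked with a small two-pass sort. This is a sketch under our own assumptions: merge_sort_by and the sample records are hypothetical names and data chosen for illustration, and the merge takes the left element on ties.

```python
def merge_sort_by(items, key):
    """Return a new list sorted by key(item); equal keys keep input order."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left = merge_sort_by(items[:mid], key)
    right = merge_sort_by(items[mid:], key)
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if key(left[i]) <= key(right[j]):   # <= takes left on ties: stable
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

people = [("Cara", 30), ("Abe", 25), ("Bea", 30), ("Dan", 25)]
by_name = merge_sort_by(people, key=lambda p: p[0])
by_age = merge_sort_by(by_name, key=lambda p: p[1])
print(by_age)  # people of the same age appear in name order
```

Because the second pass is stable, it never reorders two people of the same age, so the name order from the first pass survives inside each age group.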

Mental Model

Divide the array in half until each piece has 0 or 1 element (trivially sorted). Then merge adjacent sorted pieces: two pointers at the start of each piece, take the smaller, advance that pointer, until both are exhausted. The merge of two arrays of size n/2 takes O(n). The recursion tree has log n levels and O(n) work per level → O(n log n).

Step-by-Step Breakdown

Base case: If len(arr) <= 1, return the array (already sorted).
Divide: mid = len(arr) // 2; left = arr[:mid], right = arr[mid:].
Conquer: left = merge_sort(left), right = merge_sort(right).
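Merge: Create result list. While both left and right have unread elements, compare their front elements; append the smaller (the left one on ties) to result and advance past it. Track the fronts with index pointers rather than deleting from the front of a Python list, since pop(0) is O(n). Append the remainder of the non-empty list to result. Return result.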

ASCII Diagram

  Recursion (split):
  [38, 27, 43, 3, 9, 82, 10]
       /                    \
  [38,27,43,3]            [9,82,10]
    /      \                 /    \
  [38,27] [43,3]          [9,82] [10]
   /  \    /  \            /  \    |
  [38][27][43][3]        [9][82]  [10]

  Merge (bottom-up): [27,38] [3,43] → [3,27,38,43]; [9,82] [10] → [9,10,82]
  Then: [3,27,38,43] and [9,10,82] → [3,9,10,27,38,43,82]

Python Implementation

Recursive (New Lists)

def merge_sort(arr):
    if len(arr) <= 1:
        return arr
    mid = len(arr) // 2
    left = merge_sort(arr[:mid])
    right = merge_sort(arr[mid:])
    return merge(left, right)

def merge(left, right):
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:   # <= for stability
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    result.extend(left[i:])
    result.extend(right[j:])
    return result

Index-Based (One Temp Array)

To avoid creating many small lists, use indices and one temporary array of size n. Merge arr[low..mid] and arr[mid+1..high] into temp, then copy back to arr[low..high].

def merge_sort_inplace(arr):
    temp = [0] * len(arr)

    def merge_segments(lo, mid, hi):
        i, j, k = lo, mid + 1, lo
        while i <= mid and j <= hi:
            if arr[i] <= arr[j]:
                temp[k] = arr[i]; i += 1
            else:
                temp[k] = arr[j]; j += 1
            k += 1
        while i <= mid: temp[k] = arr[i]; i += 1; k += 1
        while j <= hi: temp[k] = arr[j]; j += 1; k += 1
        for i in range(lo, hi + 1): arr[i] = temp[i]

    def sort(low, high):
        if low >= high:
            return
        mid = (low + high) // 2
        sort(low, mid)
        sort(mid + 1, high)
        merge_segments(low, mid, high)

    sort(0, len(arr) - 1)

The list-slicing version above is simpler but uses O(n) space for copies at each level. The index-based version uses one O(n) temp array; the array is modified in place, but auxiliary space is still O(n).

Line-by-Line Explanation (Merge Function)

if left[i] <= right[j]: We take from the left when it's less than or equal to the right. Taking left when equal preserves the original order of equal elements (stability).
result.extend(left[i:]): After one list is exhausted, append the rest of the other. At most one of left[i:] and right[j:] is non-empty, so the order of the two extend calls does not matter.

Time Complexity

Let T(n) be the time for n elements. We split into two halves of size n/2, sort each (2 · T(n/2)), and merge in O(n). So T(n) = 2T(n/2) + O(n). By the Master Theorem (case 2: a=2, b=2, f(n)=Θ(n), n^(log_b a)=n), T(n) = Θ(n log n). This holds for best, average, and worst case—merge sort always does the same divide and merge structure. O(n log n) comparisons and O(n log n) moves.

Space Complexity

Recursive list-slicing version: Each call on a segment of size m allocates O(m) for its left/right copies and the merged result. Depth is O(log n), but the space is not multiplied across levels: a call finishes its left half and discards that half's temporaries before starting on the right half, so the lists alive along one recursion path total about n + n/2 + n/4 + … = O(n), plus O(log n) stack. Usually stated as O(n) auxiliary. Index-based with one temp: One temp array of size n plus O(log n) stack → O(n).

Edge Cases

Empty or single element: Base case returns immediately; no merge needed.
Two elements: Split into [a] and [b], merge → sorted.
Already sorted: Still O(n log n); merge sort doesn't adapt to existing order (unlike insertion sort).

Common Mistakes

Using < instead of <= in merge: For stability we must take the left element when equal. If we use <, we take the right when equal, which can reorder equal elements.
Off-by-one in index-based merge: Use mid = (low + high) // 2, left segment [low..mid], right segment [mid+1..high]. Merge into temp then copy back to arr[low..high].

Common Mistake

Assuming "in-place" means O(1) extra space. Standard merge sort uses O(n) extra space for merging. True in-place merge in O(1) space exists but is complex and slower in practice; usually "in-place merge sort" means the array is modified using one O(n) temp buffer.

Comparison: Merge Sort vs Quick Sort

Property	Merge sort	Quick sort
Time (worst)	O(n log n)	O(n²)
Time (avg)	O(n log n)	O(n log n)
Space	O(n)	O(log n) typical
Stable	Yes	No (default)

Optimization Insight

For small subarrays (e.g. n ≤ 15–20), use insertion sort instead of recurring down to size 1. This reduces constant factors and stack depth. Many production merge sorts use this hybrid. Merge sort is also ideal for linked lists: splitting is O(n) (find mid with two pointers), merge is O(n), no extra array needed if you merge by rewiring pointers.
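
The linked-list variant mentioned above can be sketched as follows. This is a minimal illustration, not a reference implementation: ListNode, merge_lists, sort_list, and the from_values/to_values helpers are hypothetical names introduced here. The split uses slow/fast pointers, and the merge rewires next pointers instead of copying into a temp array.

```python
class ListNode:
    """Minimal singly linked node (hypothetical helper for this sketch)."""
    def __init__(self, val, next=None):
        self.val = val
        self.next = next

def merge_lists(a, b):
    """Merge two sorted lists by rewiring pointers; a wins ties (stable)."""
    dummy = tail = ListNode(None)
    while a and b:
        if a.val <= b.val:
            tail.next = a
            a = a.next
        else:
            tail.next = b
            b = b.next
        tail = tail.next
    tail.next = a or b          # attach whichever list still has nodes
    return dummy.next

def sort_list(head):
    if head is None or head.next is None:
        return head
    # slow stops at the last node of the left half.
    slow, fast = head, head.next
    while fast and fast.next:
        slow = slow.next
        fast = fast.next.next
    right_head = slow.next
    slow.next = None            # cut the list in two
    return merge_lists(sort_list(head), sort_list(right_head))

def from_values(values):
    head = None
    for v in reversed(values):
        head = ListNode(v, head)
    return head

def to_values(head):
    out = []
    while head:
        out.append(head.val)
        head = head.next
    return out

print(to_values(sort_list(from_values([38, 27, 43, 3, 9, 82, 10]))))
```

No auxiliary array is allocated; the only extra space is the O(log n) recursion stack.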

Pattern Recognition

"Divide in half, sort each half, merge"—that's merge sort. Same pattern: merge two sorted arrays (two pointers), count inversions (count when taking from right in merge), sort linked list (split at mid, merge).

Interview Insight

Implement merge sort: base case len ≤ 1; split at mid; recurse on left and right; merge by comparing fronts and appending the smaller (use ≤ for stability). State O(n log n) time, O(n) space, stable. For "merge two sorted arrays," use the same merge logic without the recursion. For "count inversions," in the merge step when you take an element from the right half, add (remaining length of left) to the inversion count.
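
The inversion-counting variant described above might look like this. It is a sketch built on the list-slicing merge sort; the name count_inversions and its (sorted list, count) return convention are our own choices for illustration.

```python
def count_inversions(arr):
    """Return (sorted_copy, number of pairs i < j with arr[i] > arr[j])."""
    if len(arr) <= 1:
        return list(arr), 0
    mid = len(arr) // 2
    left, inv_left = count_inversions(arr[:mid])
    right, inv_right = count_inversions(arr[mid:])
    merged, i, j = [], 0, 0
    inversions = inv_left + inv_right
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            # right[j] is smaller than every remaining left element,
            # so it forms an inversion with each of them.
            inversions += len(left) - i
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, inversions

print(count_inversions([2, 4, 1, 3, 5]))
```

Using <= in the comparison means equal elements are not counted as inversions.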

Practice Problems

Implement merge sort recursively and with an index-based in-place style.
Merge two sorted arrays (or merge two sorted halves of one array) in O(n) time.
Count inversions in an array using merge sort (each time a right element is taken while left elements remain, add the number of remaining left elements).
Sort a linked list using merge sort (find mid with slow/fast pointers, merge by rewiring).

Summary

Merge sort: Divide array in half, recursively sort halves, merge two sorted halves. O(n log n) time (all cases), O(n) extra space, stable (take left when equal in merge).
Merge step: two pointers, append smaller (or left when equal), then append remainder.
Use when you need guaranteed O(n log n), stability, or sorting linked lists; prefer quicksort for average-case in-place when stability isn't required.

6.9 Quick Sort

Introduction

Quick sort is a divide-and-conquer algorithm that works by choosing a pivot element, partitioning the array so that all elements smaller than the pivot are to its left and all larger are to its right, then recursively sorting the left and right subarrays. The pivot ends up in its final sorted position. Average-case time is O(n log n), but worst case is O(n²) when the pivot is always the smallest or largest (e.g. already sorted array with last element as pivot). Quick sort is typically in-place (O(log n) stack space for recursion) and is not stable in the standard implementation. It is widely used in practice because of good average performance, cache friendliness, and the fact that random or median-of-three pivot selection makes worst case rare. Many language runtimes use a quicksort variant (often with insertion sort for small subarrays).

Real-World Analogy

Imagine sorting a stack of papers by picking one as a "reference" (pivot)—say the last one. You go through the rest and put everything smaller than that reference in one pile and everything larger in another. The reference goes in the middle. You then sort each pile the same way (pick a pivot, split). No merging step—the pivot is already in place. That's quicksort: partition around a pivot, then recurse on the two sides.

Example

Array [10, 80, 30, 90, 40, 50, 70], pivot = 70 (last). Partition: smaller than 70 → [10, 30, 40, 50]; larger → 80 and 90 (their order depends on the partition scheme; Lomuto leaves [90, 80]); pivot between them → [10, 30, 40, 50, 70, 90, 80]. Recursively sort [10, 30, 40, 50] and [90, 80] → [10, 30, 40, 50, 70, 80, 90]. No merge: pivot 70 is already in its final position.

Formal Definition

Concept Note

Quick sort: (1) Choose a pivot (e.g. arr[high]). (2) Partition: Rearrange so that elements < pivot are in the left segment, elements > pivot in the right segment, and the pivot is between them (or at a fixed index). (3) Recursively quick sort the left segment and the right segment. Partition invariant: After partition, pivot is in its final position; no merge step. Stability: Standard partition (Lomuto or Hoare) is not stable. In-place: Partition can be done with O(1) extra space; recursion uses O(log n) stack in the average case.

Why This Topic Matters

Practical default: Many standard libraries use quicksort (or a hybrid) for sorting because average O(n log n) with small constants and in-place operation.
Partition as a building block: The partition step is reused in "quickselect" (find kth smallest without fully sorting), "Dutch national flag," and many interview problems.
Worst case and pivot choice: Understanding why sorted input can give O(n²) with last-element pivot leads to random pivot or median-of-three, making worst case unlikely.
Interviews: "Implement quicksort," "partition an array," "find the kth largest element" (quickselect).

Mental Model

Pick a pivot. Walk through the array and group "small" and "large" around it so the pivot lands in its final position. Now the array is split into "left of pivot" and "right of pivot": both are unsorted, but every left element ≤ pivot ≤ every right element. Sort the left and right with the same process. There is no merge: when the recursion returns, the whole segment is sorted because the pivot is already in place.

Step-by-Step Breakdown

Base case: If the segment has 0 or 1 element, return (already sorted).
Choose pivot: Often the last element (or first, or random, or median-of-three). Swap pivot to the end (or keep index) for easier partitioning.
Partition: Maintain a "small" region. Scan elements; when you find one ≤ pivot, extend the small region and swap the element into it. At the end, place the pivot just after the small region. Return the pivot's index.
Recurse: Quick sort arr[low..pivot_idx−1] and arr[pivot_idx+1..high].

ASCII Diagram

  Lomuto partition (pivot = last). arr = [10, 80, 30, 90, 40, 50, 70], pivot=70
  i = index of last "small" element (init -1). j scans.
  j=0: 10<70 → swap to small region → i=0: [10, 80, 30, 90, 40, 50, 70]
  j=1: 80>70 skip
  j=2: 30<70 → swap → i=1: [10, 30, 80, 90, 40, 50, 70]
  j=3: 90>70 skip
  j=4: 40<70 → swap → i=2: [10, 30, 40, 90, 80, 50, 70]
  j=5: 50<70 → swap → i=3: [10, 30, 40, 50, 80, 90, 70]
  After the loop (j stops at 5): swap pivot with arr[i+1] → [10, 30, 40, 50, 70, 90, 80]. Pivot index = 4.

Python Implementation

Lomuto Partition (Pivot = Last)

def partition_lomuto(arr, low, high):
    pivot = arr[high]
    i = low - 1
    for j in range(low, high):
        if arr[j] <= pivot:
            i += 1
            arr[i], arr[j] = arr[j], arr[i]
    arr[i + 1], arr[high] = arr[high], arr[i + 1]
    return i + 1

def quicksort(arr, low, high):
    if low < high:
        pi = partition_lomuto(arr, low, high)
        quicksort(arr, low, pi - 1)
        quicksort(arr, pi + 1, high)

# Call: quicksort(arr, 0, len(arr) - 1)

Wrapper and Random Pivot (Avoid Worst Case)

import random

def quicksort_random(arr, low, high):
    if low < high:
        rand = random.randint(low, high)
        arr[rand], arr[high] = arr[high], arr[rand]
        pi = partition_lomuto(arr, low, high)
        quicksort_random(arr, low, pi - 1)
        quicksort_random(arr, pi + 1, high)

Hoare Partition (Two Pointers)

Two pointers from left and right; swap when left finds a large element and right finds a small one; stop when they cross. Pivot can be first or middle. Slightly more efficient (fewer swaps on average); pivot may not end at the split point—adjust recursion bounds.

def partition_hoare(arr, low, high):
    pivot = arr[low]
    left, right = low, high
    while True:
        while left <= right and arr[left] < pivot:
            left += 1
        while left <= right and arr[right] > pivot:
            right -= 1
        if left >= right:
            return right
        arr[left], arr[right] = arr[right], arr[left]
        left += 1
        right -= 1
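
To show how the recursion bounds change with this partition, here is a minimal sketch. partition_hoare is repeated from above so the block runs on its own; quicksort_hoare is a hypothetical wrapper name. Because the returned index p is only a split point (the pivot is not necessarily at p), the left call keeps p and the right call starts at p + 1.

```python
def partition_hoare(arr, low, high):
    # Same routine as above, repeated so this block is self-contained.
    pivot = arr[low]
    left, right = low, high
    while True:
        while left <= right and arr[left] < pivot:
            left += 1
        while left <= right and arr[right] > pivot:
            right -= 1
        if left >= right:
            return right
        arr[left], arr[right] = arr[right], arr[left]
        left += 1
        right -= 1

def quicksort_hoare(arr, low=0, high=None):
    if high is None:
        high = len(arr) - 1
    if low < high:
        p = partition_hoare(arr, low, high)
        quicksort_hoare(arr, low, p)        # index p belongs to the left part
        quicksort_hoare(arr, p + 1, high)
    return arr

print(quicksort_hoare([10, 80, 30, 90, 40, 50, 70]))
```

Recursing on [low, p − 1] and [p + 1, high], as with Lomuto, would be wrong here: it could skip an element that still needs sorting.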

Line-by-Line Explanation (Lomuto)

i = low - 1: The region arr[low..i] will contain elements ≤ pivot. Initially empty (i is "before" low).
if arr[j] <= pivot: We use ≤ so that elements equal to the pivot can go either side; putting them in the small region is fine. Then we extend the small region (i += 1) and swap arr[i] with arr[j].
arr[i + 1], arr[high] = arr[high], arr[i + 1]: After the loop, arr[low..i] ≤ pivot and arr[i+1..high−1] > pivot. So the pivot (at high) should go at index i+1. Swap and return i+1 as the pivot index.

Time Complexity

Best case: Pivot is always near the middle (each partition splits roughly in half). T(n) = 2T(n/2) + O(n) → O(n log n).

Average case: Random pivot (or random input). On average the pivot divides the array in a constant fraction; recurrence yields O(n log n).

Worst case: Pivot is always the smallest or largest (e.g. sorted array, pivot = last). One segment has n−1 elements, the other 0. T(n) = T(n−1) + O(n) → O(n²). Randomizing the pivot (or choosing median-of-three) makes this rare in practice.

Space Complexity

Partition uses O(1) extra space. Recursion depth: best/average O(log n), worst O(n) (unbalanced splits). So O(log n) average stack space, O(n) worst.

Edge Cases

Empty or single element: low >= high → return; no partition.
All equal: Every element ≤ pivot; Lomuto puts all in the small region, pivot at end. One segment has n−1, the other 0 → O(n²) unless we optimize (e.g. three-way partition).
Already sorted (pivot = last): Each partition puts pivot at the end; left segment has n−1 elements → O(n²). Use random pivot to avoid.

Common Mistakes

Including pivot in the partition loop: In Lomuto we iterate j from low to high-1 and keep pivot at high until the final swap. Don't compare or move the pivot during the loop.
Wrong recursion bounds: After partition, pivot is at index pi. Recurse on [low, pi−1] and [pi+1, high]; do not include pi in either subarray.

Common Mistake

Using the last element as pivot on an already-sorted array gives worst-case O(n²). Always randomize the pivot (swap a random element to the end before partitioning) or use median-of-three when implementing for production or interviews.

Comparison: Quick Sort vs Merge Sort

Property	Quick sort	Merge sort
Worst time	O(n²)	O(n log n)
Avg time	O(n log n)	O(n log n)
Space	O(log n) avg	O(n)
Stable	No	Yes

Optimization Insight

Random pivot or median-of-three (compare first, middle, last; use median as pivot) avoids worst case on sorted or nearly sorted data. For segments smaller than a threshold (e.g. 10–20), use insertion sort to reduce recursion overhead. Three-way partition (elements equal to the pivot are grouped in the middle and excluded from both recursive calls) handles many duplicates well: an all-equal array finishes in O(n), and the cost stays close to linear when there are only a few distinct values.
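
A three-way partition can be sketched as below. This is one possible arrangement (the Dutch-national-flag style loop with a random pivot), written for illustration; the name quicksort_3way and the pointer names lt, i, gt are our own.

```python
import random

def quicksort_3way(arr, low=0, high=None):
    if high is None:
        high = len(arr) - 1
    if low >= high:
        return arr
    pivot = arr[random.randint(low, high)]
    lt, i, gt = low, low, high
    # Invariant: arr[low..lt-1] < pivot, arr[lt..i-1] == pivot,
    #            arr[gt+1..high] > pivot, arr[i..gt] not yet examined.
    while i <= gt:
        if arr[i] < pivot:
            arr[lt], arr[i] = arr[i], arr[lt]
            lt += 1
            i += 1
        elif arr[i] > pivot:
            arr[i], arr[gt] = arr[gt], arr[i]
            gt -= 1             # do not advance i: the swapped-in value is unexamined
        else:
            i += 1
    quicksort_3way(arr, low, lt - 1)    # strictly smaller elements
    quicksort_3way(arr, gt + 1, high)   # strictly larger elements
    return arr

print(quicksort_3way([4, 1, 4, 2, 4, 3, 4]))
```

On an all-equal array the loop makes one pass and both recursive calls receive empty ranges, which avoids the O(n²) behaviour Lomuto shows on that input.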

Pattern Recognition

"Choose pivot, partition, recurse on both sides"—that's quicksort. The same partition idea: "reorder so that all elements with property X are before those without" (e.g. move zeros to the end), or "find kth smallest" (quickselect: partition once; if pivot index is k, done; else recurse on left or right).

Interview Insight

Implement partition (Lomuto or Hoare), then quicksort that recurses on [low, pi−1] and [pi+1, high]. State O(n log n) average, O(n²) worst, O(log n) space average. Mention random pivot to avoid worst case. For "kth largest," use quickselect: partition and recurse on the side that contains the kth position (or use the pivot index to decide).
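
Quickselect for "kth largest" might be sketched like this. It is an illustrative version under our own assumptions: it works on a copy, uses a random pivot with an inlined Lomuto partition, and loops instead of recursing. The name kth_largest is hypothetical.

```python
import random

def kth_largest(nums, k):
    """Return the k-th largest element (k = 1 is the maximum)."""
    if not 1 <= k <= len(nums):
        raise ValueError("k out of range")
    arr = list(nums)                 # leave the caller's list untouched
    target = len(arr) - k            # index of the answer in ascending order
    low, high = 0, len(arr) - 1
    while True:
        # Random pivot moved to the end, then Lomuto partition on arr[low..high].
        r = random.randint(low, high)
        arr[r], arr[high] = arr[high], arr[r]
        pivot, i = arr[high], low - 1
        for j in range(low, high):
            if arr[j] <= pivot:
                i += 1
                arr[i], arr[j] = arr[j], arr[i]
        arr[i + 1], arr[high] = arr[high], arr[i + 1]
        p = i + 1                    # pivot's final index
        if p == target:
            return arr[p]
        if p < target:
            low = p + 1              # answer lies to the right of the pivot
        else:
            high = p - 1             # answer lies to the left of the pivot

print(kth_largest([3, 2, 1, 5, 6, 4], 2))
```

Only one side is explored after each partition, which gives O(n) expected time, with O(n²) still possible in the worst case.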

Practice Problems

Implement quicksort with Lomuto partition and with random pivot.
Partition: given pivot value, reorder array so all < pivot come first, then pivot(s), then > pivot.
Find the kth largest element (quickselect: partition, then recurse on one side based on pivot index vs k).
Sort an array with many duplicates (three-way partition: less, equal, greater).

Summary

Quick sort: Choose pivot (e.g. last), partition so smaller elements are left and larger right, pivot in place; recurse on left and right. O(n log n) average, O(n²) worst, O(log n) space average, not stable.
Lomuto: pivot at high; i = last index of "small" region; for each j, if arr[j] ≤ pivot, extend small region and swap. Final swap places pivot at i+1.
Use random or median-of-three pivot to avoid worst case; use insertion sort for small subarrays in practice.

6.10 Heap Sort

Introduction

Heap sort sorts an array by treating it as a binary max-heap: first we build a max-heap (so the largest element is at the root), then we repeatedly take the maximum (root), swap it to the end of the unsorted region, and sift down to restore the heap property on the remaining elements. Building the heap can be done in O(n) time (bottom-up heapify); each of the n "extract max" steps takes O(log n), so total time is O(n log n) in all cases (best, average, worst). Heap sort is in-place (O(1) extra space) and not stable. It is useful when you need guaranteed O(n log n) with no extra space and when the heap data structure is already relevant (e.g. priority queue, k largest elements). Many embedded or memory-constrained systems use heap sort for this reason.

Real-World Analogy

Imagine a tournament bracket where the winner (largest) always rises to the top. You arrange all players in a binary tree so that each parent beats both children—that's a max-heap. The champion is at the root. You take the champion out, put them in the "sorted" seat at the end, and run a new match among the remaining players to get the next champion. Repeat until everyone is seated in order. That's heap sort: repeatedly extract the maximum from a heap and place it at the end.

Example

Array [12, 11, 13, 5, 6, 7]. Build max-heap: e.g. [13, 11, 12, 5, 6, 7] (13 at root). Swap root (13) with last (7) → [7, 11, 12, 5, 6 | 13], heapify first 5 → [12, 11, 7, 5, 6 | 13]. Swap root (12) with last (6) → [6, 11, 7, 5 | 12, 13], heapify → [11, 6, 7, 5 | 12, 13]. Continue until sorted: [5, 6, 7, 11, 12, 13].

Formal Definition

Concept Note

Max-heap: A complete binary tree (represented as an array: parent at i has left child at 2i+1, right at 2i+2) where every parent is ≥ its children. So the maximum is at index 0. Heap sort: (1) Build heap: Start from the last non-leaf (index ⌊n/2⌋−1) and work down to index 0; for each node, sift it down so the subtree rooted at that node becomes a heap. (2) For i from n−1 down to 1: swap arr[0] with arr[i], then sift down from 0 on the heap of size i (so arr[i..n−1] is sorted). Stability: Not stable. In-place: O(1) extra space.

Why This Topic Matters

Guaranteed O(n log n) in-place: Unlike quicksort, no worst-case O(n²); unlike merge sort, no O(n) extra space. Good for memory-constrained environments.
Heap as a tool: The same heapify and sift-down operations are used for priority queues (Topic 9.8), "find k largest" (min-heap of size k), and many scheduling problems.
Interviews: "Implement heap sort," "heapify an array," "k largest elements" (min-heap of size k or quickselect). Understanding parent/child indices and sift-down is essential.

Mental Model

The array is a complete binary tree: index 0 is the root; for node at index i, left child is 2i+1, right is 2i+2, parent is (i−1)//2. Sift down: If the node is smaller than the larger child, swap with that child and repeat in the subtree. Build heap: Sift down every non-leaf from bottom to top. Sort: Swap root (max) with the last element, then sift down from root on the reduced heap (excluding the last). The "last" position joins the sorted region.
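
The index arithmetic above can be exercised with a small checker. This is a sketch; parent, left_child, right_child, and is_max_heap are hypothetical helper names, not part of the course code.

```python
def parent(i):
    return (i - 1) // 2

def left_child(i):
    return 2 * i + 1

def right_child(i):
    return 2 * i + 2

def is_max_heap(arr, n=None):
    """True if arr[0..n-1] satisfies the max-heap property."""
    n = len(arr) if n is None else n
    # Every non-root node must be <= its parent.
    return all(arr[parent(i)] >= arr[i] for i in range(1, n))

print(is_max_heap([13, 11, 12, 5, 6, 7]))   # heap from the example above
print(is_max_heap([12, 11, 13, 5, 6, 7]))   # original array: 13 exceeds its parent 12
```

A checker like this is handy when testing sift_down: after building the heap, is_max_heap(arr) should hold, and after each extraction is_max_heap(arr, i) should hold for the shrunken heap.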

Step-by-Step Breakdown

Build max-heap: For i from (n//2 − 1) down to 0, call sift_down(arr, n, i) so that the subtree at i satisfies the heap property. After this, arr[0] is the maximum.
Extract max and shrink heap: For i from n−1 down to 1: swap arr[0] with arr[i]; then sift_down(arr, i, 0) to restore the heap on indices [0..i−1]. After each step, arr[i..n−1] is sorted.

ASCII Diagram

  Array as tree (indices):    0
                            /   \
                           1     2
                          / \   /
                         3   4 5
  Parent(i) = (i-1)//2, Left(i) = 2*i+1, Right(i) = 2*i+2

  After build heap: root = max. Swap root with last, heap size -= 1, sift down from 0.
  Sift down: if arr[i] < max(arr[left], arr[right]), swap with the larger child; repeat.

Python Implementation

Sift Down and Build Heap

def sift_down(arr, n, i):
    largest = i
    left = 2 * i + 1
    right = 2 * i + 2
    if left < n and arr[left] > arr[largest]:
        largest = left
    if right < n and arr[right] > arr[largest]:
        largest = right
    if largest != i:
        arr[i], arr[largest] = arr[largest], arr[i]
        sift_down(arr, n, largest)

def heap_sort(arr):
    n = len(arr)
    for i in range(n // 2 - 1, -1, -1):
        sift_down(arr, n, i)
    for i in range(n - 1, 0, -1):
        arr[0], arr[i] = arr[i], arr[0]
        sift_down(arr, i, 0)

Iterative Sift Down (Avoid Recursion)

def sift_down_iter(arr, n, i):
    while True:
        largest = i
        left = 2 * i + 1
        right = 2 * i + 2
        if left < n and arr[left] > arr[largest]:
            largest = left
        if right < n and arr[right] > arr[largest]:
            largest = right
        if largest == i:
            break
        arr[i], arr[largest] = arr[largest], arr[i]
        i = largest

Line-by-Line Explanation

left = 2 * i + 1, right = 2 * i + 2: Standard indexing for a 0-based complete binary tree. Check left < n and right < n before using (node may have no children).
if largest != i: If the current node is not the largest among itself and its children, swap with the larger child and sift down in that subtree. This restores the heap property at i (assuming both subtrees are already heaps).
for i in range(n // 2 - 1, -1, -1): The last non-leaf index is (n//2 − 1). We heapify from bottom to top so that when we sift down at a node, its children are already heaps.
arr[0], arr[i] = arr[i], arr[0]: Move the current max (root) to position i, which becomes part of the sorted suffix. Then sift_down(arr, i, 0) restores the heap on [0..i−1].

Time Complexity

Build heap: Sift down is O(height) per node. Summing over all nodes (bottom-up) gives O(n), not O(n log n), because most nodes are near the leaves and sift only a short distance. Rigorous: at height h there are at most ⌈n/2^(h+1)⌉ nodes, each sifting O(h) levels; the total is O(n · Σ h/2^(h+1)) = O(n), since Σ h/2^h converges (to 2).
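
The linear bound for the build phase can be observed empirically. The sketch below counts swaps during a bottom-up build; build_max_heap_counting is a hypothetical name, and ascending input is used because it forces many sift-down steps for a max-heap.

```python
def build_max_heap_counting(arr):
    """Bottom-up heapify in place; return the number of swaps performed."""
    n = len(arr)
    swaps = 0
    for start in range(n // 2 - 1, -1, -1):
        i = start
        while True:                       # iterative sift down from i
            largest = i
            left, right = 2 * i + 1, 2 * i + 2
            if left < n and arr[left] > arr[largest]:
                largest = left
            if right < n and arr[right] > arr[largest]:
                largest = right
            if largest == i:
                break
            arr[i], arr[largest] = arr[largest], arr[i]
            swaps += 1
            i = largest
    return swaps

for n in (10, 100, 1000, 10000):
    print(n, build_max_heap_counting(list(range(n))))
```

Each node can move down at most its own height, and the heights of all nodes in an n-node heap sum to less than n, so the swap count stays below n however large n grows.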

Sort phase: We do n−1 extractions; each involves a swap and a sift down on a heap of size at most n. So (n−1) × O(log n) = O(n log n).

Total: O(n) + O(n log n) = O(n log n). Same for best, average, and worst case.

Space Complexity

Only a few variables (indices, largest). Recursive sift_down uses O(log n) stack; iterative sift_down uses O(1). So O(1) with iterative sift-down, O(log n) with recursive.

Edge Cases

Empty or single element: Build loop and sort loop run over empty ranges or do nothing; array unchanged.
Two elements: Build heap: one sift_down at index 0 (compare with left child). Sort: one swap, then sift_down on size 1 (no-op). Correct.

Common Mistakes

Wrong parent/child formula: For 0-based indexing, left = 2*i+1, right = 2*i+2, parent = (i−1)//2. Don't use 1-based formulas (2*i, 2*i+1) without adjusting.
Sift down on wrong size: After swapping root with arr[i], the heap is only indices [0..i−1]. Call sift_down(arr, i, 0), not sift_down(arr, n, 0).

Common Mistake

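Building the heap by inserting elements one by one (sifting each new element up) costs O(n log n) for the build phase. Sifting down in top-to-bottom order is worse: it does not reliably produce a heap, because a node is sifted before its subtrees are heaps. The correct O(n) build sifts down from the last non-leaf back to the root (index n//2 − 1 down to 0), so each node is sifted only when its children are already heaps.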

Comparison with Other Sorts

Algorithm	Time	Space	Stable
Heap sort	O(n log n) all	O(1)	No
Merge sort	O(n log n)	O(n)	Yes
Quick sort	O(n log n) avg, O(n²) worst	O(log n)	No

Optimization Insight

Use iterative sift_down to avoid recursion and keep space O(1). Heap sort is a good choice when you need O(n log n) guaranteed and cannot afford O(n) extra space. For "k largest" or "k smallest," a min-heap of size k (or max-heap for k smallest) gives O(n log k); full heap sort is O(n log n) and is used when you need the entire array sorted in place.
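
The "k largest" approach can be sketched with Python's standard heapq module, which provides a min-heap. This is an illustrative version; the name k_largest is ours.

```python
import heapq

def k_largest(nums, k):
    """Return the k largest values in descending order, in O(n log k)."""
    if k <= 0:
        return []
    heap = []                         # min-heap holding the best k seen so far
    for x in nums:
        if len(heap) < k:
            heapq.heappush(heap, x)
        elif x > heap[0]:             # x beats the smallest of the current top k
            heapq.heapreplace(heap, x)
    return sorted(heap, reverse=True)

print(k_largest([3, 1, 5, 12, 2, 11], 3))
```

The heap never holds more than k items, so each push or replace costs O(log k) and the extra space is O(k).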

Pattern Recognition

"Largest at root, swap to end, restore heap"—that's heap sort. The same heap operations: build heap (O(n) when done bottom-up), sift down (O(log n)), and the tree indexing (2*i+1, 2*i+2) appear in priority queues, top-k problems, and scheduling.

Interview Insight

Implement heap sort: (1) Build max-heap from the bottom up (last non-leaf down to 0). (2) For i from n−1 to 1, swap arr[0] with arr[i], then sift_down(arr, i, 0). State O(n log n) time, O(1) space (iterative sift-down), not stable. Be able to derive or state that build heap is O(n). For "k largest," mention min-heap of size k in O(n log k) or quickselect.

Practice Problems

Implement heap sort with recursive and iterative sift-down.
Given an array, heapify it (build max-heap) in O(n).
Find the k largest elements (min-heap of size k, or quickselect).
Merge k sorted lists using a min-heap (Topic 9.8).

Summary

Heap sort: Build max-heap (bottom-up, O(n)), then repeatedly swap root with last, shrink heap, sift down. O(n log n) time (all cases), O(1) space (iterative sift-down), not stable.
Tree indexing: left = 2*i+1, right = 2*i+2; parent = (i−1)//2. Sift down: swap with larger child until heap property holds.
Use when you need guaranteed O(n log n) in-place; heap operations also power priority queues and top-k.

6.11 Counting Sort

Introduction

Counting sort is a non-comparison sorting algorithm for integers (or elements that can be mapped to small integers) in a known range, e.g. [0, k]. It counts how many times each value appears, then uses those counts to place each element in the correct position in the output. Time is O(n + k) where n is the number of elements and k is the range size; when k is O(n), this becomes O(n)—faster than any comparison-based sort. Space is O(n + k) for the output and the count array. Counting sort is stable when we iterate through the input from end to start and place each element using cumulative counts (so equal elements keep their relative order). It is the building block for radix sort (Topic 6.12) and is used whenever the key range is small and known.

Real-World Analogy

Imagine sorting a pile of papers that are only graded 1, 2, 3, 4, or 5. Instead of comparing papers to each other, you make five stacks (one per grade), drop each paper into the right stack, then read off the stacks in order: 1, then 2, then 3, 4, 5. That's counting sort: count how many of each value, then output that many of each in order. No comparisons between elements—just counting and placing.

Example

Array [4, 2, 2, 8, 3, 3, 1], range [1..8]. Count: 1→1, 2→2, 3→2, 4→1, 8→1 (5, 6, 7 → 0). Cumulative (for stable placement): 1→1, 2→3, 3→5, 4→6, 8→7. Place from end: 1 at index 0, two 2s at 1–2, two 3s at 3–4, 4 at 5, 8 at 6 → [1, 2, 2, 3, 3, 4, 8].

Formal Definition

Concept Note

Counting sort (integer keys in range [0, k] or [min, max]): (1) Count: count[x] = number of times value x appears. (2) Cumulative (for stable): pos[x] = number of elements with value ≤ x (the running sum of counts). (3) Place: for each element in the input, from end to start, decrement pos[value] and put the element in the output at the new pos[value]. Stability: Iterating backward while filling each value's block from its last slot preserves the order of equal elements. (Equivalently, let pos[x] count the elements < x, iterate forward, and increment after placing.) Time O(n + k), space O(n + k).

Why This Topic Matters

Linear time when range is small: If k = O(n), counting sort is O(n)—beats comparison-based Ω(n log n) lower bound because it doesn't compare elements; it uses the integer key as an index.
Stable and simple: Stable counting sort is the standard subroutine in radix sort (sort by digit from least to most significant).
Interviews: "Sort integers in range [0, 100]" or "sort by frequency then by value"—counting (or count then output) is natural. Often combined with "what if range is large?" (use comparison sort or radix).

Mental Model

First pass: count how many 0s, how many 1s, …, how many ks. Second pass (cumulative): for each value, how many elements are less than or equal to it, i.e. where does its block end in sorted order? Third pass: for each element in the input (back to front for stability), look up where its value's block currently ends, place it in that last free slot, and step the position back so the next equal element (which came earlier in the input) goes in the slot just before it.
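
The three passes can also be arranged as a forward traversal. The sketch below, assuming integer inputs, stores each value's starting index (the number of smaller elements) and advances it after every placement; counting_sort_forward is a hypothetical name.

```python
def counting_sort_forward(arr):
    """Stable counting sort using start positions and a forward pass."""
    if not arr:
        return []
    min_val, max_val = min(arr), max(arr)
    k = max_val - min_val + 1
    count = [0] * k
    for x in arr:                         # pass 1: frequencies
        count[x - min_val] += 1
    start = [0] * k                       # pass 2: exclusive prefix sums
    for v in range(1, k):
        start[v] = start[v - 1] + count[v - 1]
    output = [0] * len(arr)
    for x in arr:                         # pass 3: forward placement
        output[start[x - min_val]] = x
        start[x - min_val] += 1           # next equal element goes one slot later
    return output

print(counting_sort_forward([4, 2, 2, 8, 3, 3, 1]))
```

Forward traversal with start positions and backward traversal with end positions are both stable; mixing them (for example, going backward while advancing start positions) reverses the order of equal elements.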

Step-by-Step Breakdown

Find range: If not given, compute min_val and max_val of the array. Range size k = max_val - min_val + 1.
Count: Create array count of size k (index 0 for min_val). Traverse the input and increment count[arr[i] - min_val].
Cumulative: Convert counts to running totals: count[i] += count[i-1], so count[i] = number of elements with value ≤ min_val + i. Equivalently, count[i] is one past the last output index for that value.
Place (stable): Create output array. Traverse input from end to start. For each element x, let j = x − min_val; set output[count[j] − 1] = x, then count[j] -= 1.

ASCII Diagram

  Input: [4, 2, 2, 8, 3, 3, 1], range 1..9 → indices 0..8 for values 1..9
  Count:        [1,2,2,1,0,0,0,1,0]  for values 1,2,3,4,5,6,7,8,9
  Cumulative:   [1,3,5,6,6,6,6,7,7]  (count[i] += count[i-1]; count[i] = number of elements ≤ value i+1)
  Place from end: 1→idx 0; 3→idx 4; 3→idx 3; 8→idx 6; 2→idx 2; 2→idx 1; 4→idx 5
  Output: [1, 2, 2, 3, 3, 4, 8]

Python Implementation

Stable Counting Sort (Range [min_val, max_val])

def counting_sort(arr):
    if not arr:
        return []
    min_val, max_val = min(arr), max(arr)
    k = max_val - min_val + 1
    count = [0] * k
    for x in arr:
        count[x - min_val] += 1
    for i in range(1, k):
        count[i] += count[i - 1]
    output = [0] * len(arr)
    for i in range(len(arr) - 1, -1, -1):
        x = arr[i]
        idx = count[x - min_val] - 1
        output[idx] = x
        count[x - min_val] -= 1
    return output

Simple Version (Unstable, When Stability Not Needed)

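Just count, then overwrite the array: for each value v, write count[v] copies of v. Simpler, but it rebuilds the values from the counts instead of moving the original elements, so it only works for bare non-negative integers in [0, max_val] and cannot carry attached data (stability is not meaningful here).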

def counting_sort_simple(arr, max_val):
    count = [0] * (max_val + 1)
    for x in arr:
        count[x] += 1
    i = 0
    for v in range(max_val + 1):
        for _ in range(count[v]):
            arr[i] = v
            i += 1

Line-by-Line Explanation (Stable Version)

count[x - min_val] += 1: We map value x to index x − min_val so the count array has size k (not max_val+1). After the loop, count[i] = frequency of value (min_val + i).
count[i] += count[i - 1]: Cumulative sum. After this, count[i] = number of elements with value ≤ (min_val + i). With 0-based output indices, count[i] − 1 is therefore the last index where value (min_val + i) should go.
for i in range(len(arr) - 1, -1, -1): Back to front so that when we place an element, we use the current count and then decrement—the next equal element goes to the previous slot, preserving order (stable).

Time Complexity

One pass to find min/max: O(n). One pass to count: O(n). Cumulative sum over k: O(k). One pass to place: O(n). Total O(n + k). When k = O(n), this is O(n). When k is very large (e.g. range 0 to 10^9 with few elements), counting sort is impractical—use comparison sort or radix sort on digits.

Space Complexity

Count array: O(k). Output array: O(n). So O(n + k). In-place counting sort exists (by permuting cycles) but is more complex; usually we accept O(n) for the output.

Edge Cases

Empty array: Return [] or skip.
Single element: count[0] = 1, cumulative count[0] = 1, place at index 0. Correct.
All same value: Count = n for that value; cumulative gives one run of indices; all n elements placed in order (stable).
Negative numbers or large range: Use min_val offset so indices are 0..k−1. If max_val − min_val is huge, consider radix sort or comparison sort instead.

Common Mistakes

Placing from front instead of back: To keep stability, we must place elements in reverse order of the input. If we iterate forward, equal elements get reversed in the output.
Off-by-one in index: After cumulative sum, count[i] is "one past the last index" for value i (or "number of elements ≤ value i"). So the last occurrence of value i goes at index count[i]−1; then decrement count[i].
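A small experiment makes the first mistake visible. The sketch below sorts (key, label) pairs with the cumulative-count-and-decrement scheme and lets you choose the scan direction; the function name counting_sort_records and the backward flag are illustrative, not part of the implementation above.

```python
def counting_sort_records(records, k, backward=True):
    """Sort (key, label) pairs by integer key in [0, k]."""
    count = [0] * (k + 1)
    for key, _ in records:
        count[key] += 1
    for v in range(1, k + 1):
        count[v] += count[v - 1]        # count[v] = number of keys <= v

    output = [None] * len(records)
    order = reversed(records) if backward else records
    for rec in order:
        count[rec[0]] -= 1              # last free slot for this key
        output[count[rec[0]]] = rec
    return output


data = [(2, "a"), (1, "b"), (2, "c")]
print(counting_sort_records(data, 2, backward=True))   # equal keys keep input order
print(counting_sort_records(data, 2, backward=False))  # equal keys come out reversed
```

Because the whole record is moved rather than just the key, the same code also shows how to sort objects by an integer key.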

Common Mistake

Using counting sort when the range k is very large (e.g. 32-bit integers). Space and time O(n + k) become O(2^32) or similar—impractical. Use comparison sort or radix sort (sort by digit in chunks) when range is large.

When to Use: Counting vs Comparison Sorts

Scenario	Choice
Integers, range 0..k with k = O(n)	Counting sort O(n)
Integers, large range (e.g. 32-bit)	Radix sort or comparison sort
Stable sort by digit (for radix)	Stable counting sort per digit

Optimization Insight

When the range is known and small (e.g. ages 0–120, grades 1–5), counting sort is optimal in time. For sorting objects by an integer key, count by the key but move the objects themselves (or their indices) into the output, so that stability is preserved and duplicates keep any attached data. Counting sort is the stable per-digit subroutine of radix sort (Topic 6.12).

Pattern Recognition

"Integer keys in a small known range" → counting sort. "Sort by digit" (least significant to most) → counting sort per digit = radix sort. "Frequency of each value" or "how many times does x appear" → count array first.

Interview Insight

Describe counting sort: count frequencies, then cumulative counts, then place from end to start for stability. State O(n + k) time and space; mention it's non-comparison and beats Ω(n log n) when k = O(n). If the range is large, say you'd use radix sort (digits) or a comparison sort. For "sort integers in [0, 100]," counting sort is ideal.

Practice Problems

Implement stable counting sort for integers in [min, max].
Sort an array of characters (range 0..255 or 'a'..'z') using counting sort.
Use counting sort as the stable digit sort inside radix sort (Topic 6.12).

Summary

Counting sort: Count frequency of each value, compute cumulative positions, place each element from end to start for stability. O(n + k) time and space; stable when placing backward.
Use when keys are integers (or small discrete keys) in a known range; when k = O(n), sort is O(n).
Do not use when range is huge; use radix or comparison sort instead. Stable version is the building block for radix sort.

6.12 Radix Sort

Introduction

Radix sort sorts integers (or strings) by processing them digit by digit (or character by character), from the least significant to the most significant (LSD), using a stable sort for each digit. Because each digit has a small range (0–9 for decimal, 0–255 for a byte), we use counting sort per digit—giving O(d · (n + k)) time where d is the number of digits and k is the digit range (e.g. 10). For fixed-width integers, d is constant, so radix sort is O(n). It handles large value ranges (e.g. 32-bit integers) without the huge count array that pure counting sort would need. Radix sort is stable when the per-digit sort is stable, and uses O(n + k) extra space per pass (typically O(n) for the output). MSD radix sort (most significant first) is an alternative used for variable-length keys or lexicographic order; LSD is simpler and standard for fixed-length integers.

Real-World Analogy

Imagine sorting a stack of dated papers. First sort by day (1–31), then by month (1–12), then by year. Each pass uses a stable sort so that after sorting by month, papers with the same month stay in day order. After the last pass, everything is in full date order. Radix sort does the same with digits: sort by ones, then tens, then hundreds—each pass stable, so the previous order is preserved within the same digit group.

Example

Array [170, 45, 75, 90, 802, 24, 2, 66]. Sort by ones digit: [170, 90, 802, 2, 24, 45, 75, 66]. Sort by tens: [802, 2, 24, 45, 66, 170, 75, 90]. Sort by hundreds: [2, 24, 45, 66, 75, 90, 170, 802]. Each pass uses stable counting sort on that digit.

Formal Definition

Concept Note

LSD Radix sort: Assume all keys have the same number of digits (pad with leading zeros if needed). For digit position p = 0 (least significant) to d−1 (most significant): stable sort the array by the p-th digit. After all passes, the array is sorted. Why stable? When we sort by digit p, keys that agree on digit p keep their relative order from the previous pass, so the order already established by digits p−1 down to 0 is preserved within each digit-p group. Time: d passes × O(n + k) per pass = O(d · (n + k)). For integers in base 10 or base 256, d is the number of digits; k = 10 or 256. Space: O(n + k) per pass (reusable).

Why This Topic Matters

Large range, linear time: For 32-bit integers, we have at most 32/8 = 4 bytes (or 10 decimal digits). So d is small; radix sort runs in O(n) for fixed-width integers when using byte or digit chunks.
No comparisons: Like counting sort, radix sort uses the structure of the key (digits) rather than comparisons—so it can beat the Ω(n log n) comparison lower bound for integers.
Interviews: "Sort integers in O(n)," "sort by digit," or "how would you sort 1 million 32-bit integers?"—radix sort (or counting sort if range is small) is the answer.

Mental Model

Think of each number as having d digits (e.g. 802 = 8,0,2 from most to least significant). We sort one digit at a time, starting from the rightmost digit (least significant). Each pass is a stable sort (counting sort) on that digit only. After the first pass, numbers are ordered by their ones digit; after the second, by their last two digits (tens first, ties broken by ones); after the last pass, by the full key. Stability is critical: it keeps the work from previous passes intact.

Step-by-Step Breakdown (LSD)

Find max: Get the maximum value to determine the number of digits (or use a fixed width, e.g. 32-bit = 4 bytes).
For each digit position from least significant to most significant: extract that digit (or byte) from each element, run stable counting sort on that digit, and replace the array with the sorted order.
After all digit passes, the array is sorted.

ASCII Diagram

  Input:             [170, 45, 75, 90, 802, 24, 2, 66]   (ones: 0,5,5,0,2,4,2,6)
  Pass 1 (ones):     [170, 90, 802, 2, 24, 45, 75, 66]   (tens: 7,9,0,0,2,4,7,6)
  Pass 2 (tens):     [802, 2, 24, 45, 66, 170, 75, 90]   (hundreds: 8,0,0,0,0,1,0,0)
  Pass 3 (hundreds): [2, 24, 45, 66, 75, 90, 170, 802]
  Sorted.

Python Implementation

LSD Radix Sort (Decimal Digits)

def counting_sort_by_digit(arr, exp):
    n = len(arr)
    output = [0] * n
    count = [0] * 10
    for i in range(n):
        digit = (arr[i] // exp) % 10
        count[digit] += 1
    for i in range(1, 10):
        count[i] += count[i - 1]
    for i in range(n - 1, -1, -1):
        digit = (arr[i] // exp) % 10
        output[count[digit] - 1] = arr[i]
        count[digit] -= 1
    for i in range(n):
        arr[i] = output[i]

def radix_sort(arr):
    if not arr:
        return
    max_val = max(arr)
    exp = 1
    while max_val // exp > 0:
        counting_sort_by_digit(arr, exp)
        exp *= 10

LSD Radix Sort (By Byte, for Large Integers)

For 32-bit non-negative integers, process 4 bytes (or 8-bit chunks). Each pass: extract byte (x >> (8*p)) & 255, counting sort on 0..255, then copy back. Four passes; each O(n + 256) = O(n).
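The byte-wise variant described above could be sketched like this. It is a minimal version that assumes every key is a non-negative integer below 256**num_bytes; the name radix_sort_bytes and the num_bytes parameter are chosen for this example, and unlike radix_sort above it returns a new list instead of sorting in place.

```python
def radix_sort_bytes(arr, num_bytes=4):
    """LSD radix sort, one byte per pass, for non-negative ints < 256**num_bytes."""
    for p in range(num_bytes):
        shift = 8 * p
        count = [0] * 256
        for x in arr:
            count[(x >> shift) & 255] += 1
        for b in range(1, 256):
            count[b] += count[b - 1]      # count[b] = number of keys with byte <= b
        output = [0] * len(arr)
        for x in reversed(arr):           # backward scan keeps each pass stable
            b = (x >> shift) & 255
            count[b] -= 1
            output[count[b]] = x
        arr = output                      # the next pass reads this pass's result
    return arr


print(radix_sort_bytes([170, 45, 75, 90, 802, 24, 2, 66, 4_000_000_000]))
```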

Line-by-Line Explanation

(arr[i] // exp) % 10: For exp = 1 we get the ones digit; for exp = 10 the tens digit; for exp = 100 the hundreds digit. So we isolate one decimal digit per pass.
counting_sort_by_digit(arr, exp): Sorts the array in place by the digit given by exp. We use the same stable counting sort as Topic 6.11; the "key" is that digit.
while max_val // exp > 0: We stop when exp exceeds the maximum value (no more digits). For each pass we multiply exp by 10 (next digit to the left).

Time Complexity

Let d be the number of digits (or byte passes). Each pass is a counting sort: O(n + k) with k = 10 (digits) or 256 (bytes). Total O(d · (n + k)). For integers in a fixed range, d is constant (e.g. 10 decimal digits for 32-bit, or 4 bytes). So O(n) when d and k are constants. For variable-length strings, d is the length of the longest string.

Space Complexity

Per pass: count array O(k), output array O(n). So O(n + k); k is small (10 or 256). Can reuse the same output buffer for every pass.

Edge Cases

Empty array: Return or skip.
Negative numbers: Standard LSD radix sort assumes non-negative keys. For signed integers, split into negative and non-negative, radix sort each group (for the negatives, sort their absolute values, or add an offset so that every key is non-negative), then output the negatives in reverse order of magnitude followed by the non-negatives.
Variable length: For integers, pad with leading zeros so all keys have the same number of digits (for LSD). For strings, use MSD radix sort, which naturally handles variable length: a string that runs out of characters sorts before any longer string that shares its prefix.
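The split-and-recombine idea for signed integers might look like the sketch below. Both function names are hypothetical. To keep the example short, each digit pass appends to ten per-digit lists instead of using a count array; appending preserves input order, so each pass is still stable.

```python
def radix_sort_nonneg(arr):
    """LSD decimal radix sort; returns a new sorted list of non-negative ints."""
    if not arr:
        return []
    result = list(arr)
    max_val = max(result)
    exp = 1
    while max_val // exp > 0:
        buckets = [[] for _ in range(10)]
        for x in result:
            buckets[(x // exp) % 10].append(x)   # append keeps the pass stable
        result = [x for b in buckets for x in b]
        exp *= 10
    return result


def radix_sort_signed(arr):
    """Sort signed ints: negatives by magnitude (then reversed), then non-negatives."""
    magnitudes = [-x for x in arr if x < 0]
    non_negatives = [x for x in arr if x >= 0]
    sorted_negatives = [-m for m in reversed(radix_sort_nonneg(magnitudes))]
    return sorted_negatives + radix_sort_nonneg(non_negatives)


print(radix_sort_signed([3, -1, -20, 0, 15, -7, 15]))
```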

Common Mistakes

Using an unstable sort per digit: If the per-digit sort is not stable, the previous digit order is lost and the final array may be wrong. Always use stable counting sort.
Wrong digit order: LSD must process from least significant to most significant. If you sort by most significant first (MSD), you need a different algorithm (MSD radix sort, often recursive).

Common Mistake

Forgetting that radix sort is for fixed or bounded key length. If keys are arbitrary-length strings and you use LSD with padding, d equals the max length—fine. If keys are integers with unbounded size, d can grow; then radix sort is O(n · d) and may not beat comparison sort. For 32- or 64-bit integers, d is constant.

LSD vs MSD

Variant	Use
LSD	Fixed-length integers; simple; stable sort per digit from right to left.
MSD	Variable-length keys, lexicographic order; can skip empty buckets; recursive.

Optimization Insight

For 32-bit integers, using byte-based radix (4 passes, k=256) is often faster than digit-based (10 passes, k=10) because it needs fewer passes. Use counting sort for each byte. For strings, LSD with fixed padding or MSD (trie-like) are both used depending on the data.
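For fixed-length strings, the LSD idea carries over with characters in place of digits. The sketch below assumes every string has exactly width characters with code points below 256; the name lsd_string_sort is made up for this example. It uses the forward-scan form of counting sort (start positions, then increment), which is also stable.

```python
def lsd_string_sort(words, width):
    """Sort equal-length strings, processing the rightmost character first."""
    for pos in range(width - 1, -1, -1):
        count = [0] * 257                  # frequencies stored one slot to the right
        for w in words:
            count[ord(w[pos]) + 1] += 1
        for c in range(256):
            count[c + 1] += count[c]       # count[c] = first output index for char c
        output = [None] * len(words)
        for w in words:                    # forward scan with start positions
            c = ord(w[pos])
            output[count[c]] = w
            count[c] += 1
        words = output
    return words


print(lsd_string_sort(["dab", "cab", "fad", "bad", "ace", "bed"], 3))
```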

Pattern Recognition

"Sort integers in O(n)" or "sort without comparison" with large range → radix sort (or counting sort if range is small). "Sort by digit" or "sort strings lexicographically" with fixed-length or by character → radix (LSD or MSD). The key is that each digit has a small range so counting sort per digit is O(n).

Interview Insight

Describe LSD radix sort: sort by least significant digit first, then next digit, …, using a stable sort (counting sort) each time. State O(d · (n + k)) time for d digits and digit range k, which is O(d · n) when k is small. Mention that it's good for fixed-width integers and can beat O(n log n) because it's non-comparison. If asked about negatives, say you'd split into negative and non-negative and handle them separately (e.g. sort the negatives by magnitude and reverse, or add an offset so every key is non-negative).

Practice Problems

Implement LSD radix sort for non-negative integers (decimal digits then byte-based).
Sort an array of strings of equal length using LSD radix sort (by character position from right to left).
Extend to signed integers (split, radix sort, recombine).

Summary

Radix sort (LSD): Sort by least significant digit, then next, …, using a stable sort (counting sort) per digit. O(d · (n + k)) time, O(n + k) space; for fixed d and k, O(n).
Stability of the per-digit sort is essential so that the order from previous passes is preserved.
Use for fixed-width integers (or fixed-length strings) when you need O(n) sort; use counting sort when the full key range is small.

6.13 Bucket Sort

Introduction

Bucket sort assumes the input is uniformly distributed over a known range. It distributes elements into buckets (e.g. bucket i holds values in the range [i/n, (i+1)/n) when values are in [0, 1)), then sorts each bucket (typically with insertion sort, since buckets are small), and concatenates the buckets in order. When the distribution is uniform, each bucket has O(1) elements on average, so sorting all buckets takes O(n) expected time. Worst case (all elements in one bucket) is O(n²) if we use insertion sort per bucket. Space is O(n) for the buckets. Bucket sort is useful for floating-point numbers in [0, 1) or for keys that can be normalized to a range; it is a distribution sort (like counting and radix) that relies on the key structure rather than comparisons.

Real-World Analogy

Imagine sorting exam scores from 0 to 100. You set up 10 buckets: 0–9, 10–19, …, 90–100. You drop each paper into the right bucket. If scores are spread out, each bucket has only a few papers; you sort each bucket quickly (e.g. insertion sort) and then stack the buckets in order. That's bucket sort: scatter into buckets by range, sort each bucket, then concatenate.

Example

Values in [0, 1): [0.78, 0.17, 0.39, 0.26, 0.72, 0.94]. Use 6 buckets: [0,1/6), [1/6,2/6), … with bucket index = floor(6x). Bucket 0: empty; 1: [0.17, 0.26]; 2: [0.39]; 3: empty; 4: [0.78, 0.72]; 5: [0.94]. Sort each (insertion sort): [0.17, 0.26], [0.39], [0.72, 0.78], [0.94]. Concatenate → [0.17, 0.26, 0.39, 0.72, 0.78, 0.94].

Formal Definition

Concept Note

Bucket sort (values in [0, 1) or normalized range): Create n buckets (or a fixed number). For each element x, place it in bucket floor(x * n) (for [0,1)) or the bucket that covers x's range. Sort each bucket (insertion sort or another sort). Append buckets in order to get the sorted array. Expected time: O(n) when the distribution is uniform (each bucket O(1) size). Worst time: O(n²) when all elements fall in one bucket. Space: O(n).

Why This Topic Matters

Expected O(n) for uniform data: When input is uniformly distributed, bucket sort achieves linear expected time; the Ω(n log n) lower bound for comparison sorts does not apply because elements are first placed by value, not by comparison.
Floating-point and normalized keys: Counting sort needs integer indices; radix sort needs digits. Bucket sort works directly on reals in [0, 1) or any range you can map to buckets.
Interviews: "Sort numbers uniformly distributed in [0, 1)" or "what sort would you use for uniformly distributed floats?"—bucket sort is the standard answer.

Mental Model

Divide the range into n equal intervals (buckets). Scatter each element into the bucket that contains its value. Each bucket holds about one element on average (if uniform). Sort each bucket (cheap because it is small), then read the buckets in order. No merging step is needed, just concatenation.

Step-by-Step Breakdown

Create buckets: n empty buckets (e.g. lists), indexed 0 to n−1.
Scatter: For each element x in [0, 1), bucket index = int(x * n) (or min of that and n−1 to handle 1.0). Append x to that bucket.
Sort each bucket: Use insertion sort (or any sort) on each bucket.
Concatenate: Output = bucket[0] + bucket[1] + … + bucket[n−1].

ASCII Diagram

  Input [0.78, 0.17, 0.39, 0.26, 0.72, 0.94], n=6
  Bucket index = floor(x * 6). 0.78→4, 0.17→1, 0.39→2, 0.26→1, 0.72→4, 0.94→5
  Buckets: [] [0.17,0.26] [0.39] [] [0.78,0.72] [0.94]
  After sort each: [] [0.17,0.26] [0.39] [] [0.72,0.78] [0.94]
  Concatenate: [0.17, 0.26, 0.39, 0.72, 0.78, 0.94]

Python Implementation

def bucket_sort(arr):
    if not arr:
        return []
    n = len(arr)
    buckets = [[] for _ in range(n)]
    for x in arr:
        idx = min(int(x * n), n - 1)
        buckets[idx].append(x)
    for b in buckets:
        b.sort()
    return [x for b in buckets for x in b]

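For values in a range [min_val, max_val], normalize: (x - min_val) / (max_val - min_val + 1e-9) then use the same logic, or map directly to bucket index: int((x - min_val) / (max_val - min_val + 1e-9) * n). The 1e-9 term avoids division by zero when all values are equal and keeps the normalized maximum just below 1; for large-magnitude floats it can be lost to rounding, so keep the min(..., n - 1) clamp as well.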
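One way to write the general-range version is sketched below. Instead of the small epsilon it handles the all-equal case explicitly and relies on the clamp for the maximum; that substitution, and the name bucket_sort_range, are choices made for this example.

```python
def bucket_sort_range(arr):
    """Bucket sort for numbers in an arbitrary range; returns a new list."""
    if not arr:
        return []
    lo, hi = min(arr), max(arr)
    if lo == hi:                  # all values equal: already sorted
        return list(arr)
    n = len(arr)
    width = hi - lo
    buckets = [[] for _ in range(n)]
    for x in arr:
        # Clamp so that x == hi lands in the last bucket rather than index n.
        idx = min(int((x - lo) / width * n), n - 1)
        buckets[idx].append(x)
    for b in buckets:
        b.sort()
    return [x for b in buckets for x in b]


print(bucket_sort_range([29, 25, -3, 49, 9, 37, 21, 43]))
```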

Line-by-Line Explanation

idx = min(int(x * n), n - 1): For x in [0, 1), x * n is in [0, n). int(x * n) can be 0 to n−1; if x is exactly 1.0, we get n, so min(..., n-1) keeps the index valid.
b.sort(): Python's sort is O(k log k) for a bucket of size k. Total over buckets: sum O(n_i log n_i). Expected case: each bucket O(1) size → O(n) total. Worst: one bucket has n elements → O(n log n) for that bucket (or O(n²) with insertion sort).

Time Complexity

Expected (uniform distribution): Each bucket has O(1) elements on average. Sorting each bucket is O(1) on average (insertion sort on constant size). Total O(n). Worst case: All elements in one bucket; that bucket has n elements. With insertion sort per bucket → O(n²). With a comparison sort per bucket → O(n log n) for that bucket, so total O(n log n). Space: O(n) for the buckets.

Space Complexity

We need n buckets and the elements are distributed among them. O(n).

Edge Cases

Empty array: Return [].
All same value: All go to one bucket; sort that bucket (already sorted).
Value exactly 1.0: Use min(int(x * n), n - 1) so we don't index out of bounds.

Common Mistakes

Assuming O(n) always: Bucket sort is O(n) only in the expected case when the distribution is uniform. Worst case can be O(n²) with insertion sort per bucket.
Wrong bucket index for [0, 1): Use int(x * n); ensure 1.0 maps to n−1 (not n) to avoid index error.

Common Mistake

Using bucket sort when the data is not uniformly distributed (e.g. heavily skewed). Then many elements fall in few buckets and the "sort each bucket" step becomes expensive. Use comparison sort or another distribution sort (counting/radix) when the key structure fits better.

When to Use

Scenario	Choice
Floats in [0, 1), uniform	Bucket sort O(n) expected
Integers in small range	Counting sort
Skewed or unknown distribution	Comparison sort (merge/quick)

Optimization Insight

Use insertion sort for small buckets (low constant factor); use a full O(k log k) sort per bucket if you want to avoid O(n²) worst case when one bucket gets many elements. The number of buckets is often chosen as n; more buckets mean smaller buckets (faster to sort) but more overhead.
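For comparison with the b.sort() version above, here is a sketch that sorts each bucket with insertion sort. It assumes values in [0, 1] and n buckets; the function names are illustrative.

```python
def insertion_sort(bucket):
    """Sort a small list in place."""
    for i in range(1, len(bucket)):
        item = bucket[i]
        j = i - 1
        while j >= 0 and bucket[j] > item:   # strict > leaves equal items in order
            bucket[j + 1] = bucket[j]
            j -= 1
        bucket[j + 1] = item


def bucket_sort_insertion(arr):
    """Bucket sort for values in [0, 1] using insertion sort per bucket."""
    n = len(arr)
    buckets = [[] for _ in range(n)]
    for x in arr:
        buckets[min(int(x * n), n - 1)].append(x)
    for b in buckets:
        insertion_sort(b)
    return [x for b in buckets for x in b]


print(bucket_sort_insertion([0.78, 0.17, 0.39, 0.26, 0.72, 0.94]))
```

With this variant the worst case (every element in one bucket) is the O(n²) cost of a single insertion sort over all n elements.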

Pattern Recognition

"Uniformly distributed in [0, 1)" or "floats in a range" → bucket sort. "Distribute by range then sort small groups" is the pattern. When data is not uniform, bucket sort loses its advantage.

Interview Insight

Describe bucket sort: create n buckets for range [0, 1), scatter each element into bucket floor(x * n), sort each bucket (e.g. insertion sort), concatenate. State O(n) expected when uniform, O(n²) worst when all in one bucket. Mention it's good for uniformly distributed floats; for integers in small range use counting sort instead.

Practice Problems

Implement bucket sort for floats in [0, 1).
Bucket sort for integers in [min, max] by mapping to bucket index.

Summary

Bucket sort: Scatter elements into buckets by range, sort each bucket, concatenate. O(n) expected when uniform, O(n²) worst with insertion sort per bucket. O(n) space.
Use for uniformly distributed data in a known range (e.g. [0, 1)); not for skewed or arbitrary distributions.

6.14 Stability in Sorting

Introduction

A sorting algorithm is stable if whenever two elements are equal according to the sort key, their relative order in the output is the same as in the input. That is, if element A appears before element B in the input and A and B are equal, then A still appears before B in the output. Stability matters when you sort by one key first and then by another (e.g. sort by name, then by age—you want people with the same age to remain in name order), or when the sort is used as a subroutine (e.g. radix sort requires a stable per-digit sort). Not all sorts are stable: quicksort and heapsort typically are not; merge sort and insertion sort are. This topic defines stability, explains why it matters, and summarizes which algorithms are stable and how to achieve stability when needed.

Real-World Analogy

Imagine sorting a class list by last name, with ties broken by first name. You first sort by first name, then sort by last name. After the second sort, you want "Smith, Alice" and "Smith, Bob" to stay in alphabetical order by first name. If the second sort is stable, all "Smith" entries will keep their relative order from the first-name sort. If it's unstable, two "Smith" entries might be swapped even though their first names were already in order.

Example

Pairs (score, name): [(3, "Alice"), (2, "Bob"), (3, "Charlie")]. Sort by score. Stable sort → [(2, "Bob"), (3, "Alice"), (3, "Charlie")] (Alice before Charlie because they were in that order for equal score). Unstable sort might give [(2, "Bob"), (3, "Charlie"), (3, "Alice")].

Formal Definition

Concept Note

Stability: Let the sort key be a function key(x). A sort is stable if for any two elements a, b with key(a) = key(b), if a appeared before b in the input, then a appears before b in the output. Equivalently: equal elements preserve their original relative order. Why it breaks: An algorithm that swaps or moves elements without regard to original position can reorder equal elements (e.g. quicksort swapping with pivot, heapsort swapping root to end).

Why This Topic Matters

Multi-key sorting: Sort by (last name, first name) by first sorting by first name, then by last name with a stable sort. The second sort preserves first-name order within the same last name.
Radix sort: LSD radix sort requires a stable sort per digit. If the per-digit sort is unstable, the final order can be wrong.
Interviews: "What is a stable sort?" "Which sorts are stable?" "How would you make quicksort stable?" (e.g. attach original index as tiebreaker).
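The two-pass multi-key technique can be tried directly in Python, whose built-in sorted() is documented as stable. The sample names below are invented for illustration.

```python
people = [("Smith", "Bob"), ("Jones", "Carol"), ("Smith", "Alice"), ("Jones", "Adam")]

# Pass 1: sort by the secondary key (first name).
by_first = sorted(people, key=lambda p: p[1])

# Pass 2: stable sort by the primary key (last name).
# Entries with the same last name keep their first-name order from pass 1.
by_last_then_first = sorted(by_first, key=lambda p: p[0])

print(by_last_then_first)
```

The result matches a single sort on the tuple (last name, first name), which is the order the two passes were meant to produce.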

Mental Model

When two elements compare equal, a stable sort never swaps them; it leaves the one that was first in the input still first. So "equal" is treated as "no swap." In code, that usually means using strict < in comparisons (so we only swap when strictly less/greater), or when merging, taking the element from the left subarray when keys are equal (merge sort).

Which Sorts Are Stable?

Stable: Merge sort (take left when equal in merge), insertion sort (insert after equals with strict >), bubble sort (swap only when strict >), counting sort (place from end to start), radix sort (stable per-digit sort), bucket sort (if each bucket is sorted with a stable sort).
Unstable: Quick sort (partition can swap equals past each other), heap sort (swap root with end can reorder equals), selection sort (swapping min to front can move a later equal past an earlier one).
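To see instability concretely, the sketch below runs a textbook selection sort on (score, name) pairs and compares it with Python's stable built-in sort. The function name selection_sort_by_key is illustrative.

```python
def selection_sort_by_key(items, key):
    """Plain selection sort on a copy; swapping the minimum to the front is not stable."""
    items = list(items)
    n = len(items)
    for i in range(n):
        m = i
        for j in range(i + 1, n):
            if key(items[j]) < key(items[m]):
                m = j
        items[i], items[m] = items[m], items[i]   # may jump over an equal element
    return items


pairs = [(3, "Alice"), (3, "Charlie"), (2, "Bob")]
print(selection_sort_by_key(pairs, key=lambda p: p[0]))  # Charlie ends up before Alice
print(sorted(pairs, key=lambda p: p[0]))                 # Alice stays before Charlie
```

The first swap moves (3, "Alice") from the front to the back, past (3, "Charlie"), which is exactly the kind of long-distance swap that breaks stability.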

Comparison Table

Algorithm	Stable?	Note
Merge sort	Yes	Take left when equal in merge
Insertion / Bubble	Yes	Use strict > (no swap on equal)
Counting / Radix	Yes	Place backward; stable per digit
Quick sort / Heap sort / Selection	No	Swaps can reorder equals

How to Make an Unstable Sort Stable

Attach the original index (or any unique tiebreaker) to each element. When two keys are equal, compare the indices so that the element that was first in the input is considered "smaller." Then any sort becomes stable with respect to the original order, because the comparison is never truly equal—we break ties by index. Example: sort pairs (value, index); compare by value first, then by index. This uses O(n) extra space for the indices.

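# Tiebreaker technique: sort (value, original index) pairs so no two keys compare equal.
# list.sort() is already stable in Python; the same idea makes an unstable sort such as quicksort stable.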
def stable_sort(arr):
    indexed = [(x, i) for i, x in enumerate(arr)]
    indexed.sort(key=lambda p: (p[0], p[1]))
    return [x for x, i in indexed]

Common Mistakes

Using >= or <= when swapping: For stability in bubble/insertion, use strict > so we do not swap when equal. If you use >=, equal elements may be swapped.
Assuming "default" quicksort is stable: Standard in-place quicksort is not stable. Say "quicksort is unstable unless we use a tiebreaker."
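The effect of strict versus non-strict comparison can be checked with a small insertion sort. The strict flag exists only for this demonstration; a real implementation would always use the strict form.

```python
def insertion_sort_by_key(items, key, strict=True):
    """Insertion sort on a copy; strict=False shifts past equal keys and breaks stability."""
    items = list(items)
    for i in range(1, len(items)):
        cur = items[i]
        j = i - 1
        while j >= 0 and (
            key(items[j]) > key(cur) if strict else key(items[j]) >= key(cur)
        ):
            items[j + 1] = items[j]
            j -= 1
        items[j + 1] = cur
    return items


data = [(1, "a"), (1, "b"), (0, "c")]
print(insertion_sort_by_key(data, key=lambda p: p[0], strict=True))
print(insertion_sort_by_key(data, key=lambda p: p[0], strict=False))
```

With strict=True the two key-1 records stay in input order; with strict=False they come out swapped.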

Interview Insight

Define stability: equal elements keep their relative order. Give an example (sort by name then age). List stable sorts (merge, insertion, bubble, counting, radix) and unstable (quick, heap, selection). To make an unstable sort stable, add original index as secondary key so ties are broken by position.

Summary

Stable sort: If key(a) = key(b) and a was before b in the input, a is before b in the output.
Stable: merge, insertion, bubble, counting, radix (with stable digit sort), bucket (with stable bucket sort). Unstable: quick, heap, selection.
Use stable sort for multi-key sorting and as the subroutine for radix sort. To get stability with an unstable algorithm, attach original index as tiebreaker.

6.15 Inversion Count

Introduction

An inversion in an array is a pair of indices (i, j) such that i < j and arr[i] > arr[j]—i.e. two elements that are out of order. The inversion count (or number of inversions) measures how "unsorted" the array is: it is 0 when the array is sorted (ascending) and maximum when the array is reverse sorted. Counting inversions in O(n²) is trivial (check every pair); we can do it in O(n log n) by modifying merge sort: during the merge step, whenever we take an element from the right subarray, every remaining element in the left subarray is greater than it, so we add the number of remaining left elements to the inversion count. Inversion count equals the minimum number of adjacent swaps needed to sort the array (as in bubble sort). It appears in problems about "how far from sorted," "minimum swaps to sort," and in analysis of sorting algorithms.

Real-World Analogy

Imagine a queue of people that should be ordered by height, shortest in front. An inversion is a pair where the taller person is standing in front of the shorter one. Counting inversions tells you "how many pairs are in the wrong order." If you could only swap adjacent people (like bubble sort), the number of such swaps needed to sort the queue is exactly the inversion count.

Example

Array [2, 4, 1, 3, 5]. Inversions: (2,1), (4,1), (4,3) → count = 3. Sorted is [1,2,3,4,5]; we need at least 3 adjacent swaps to get there (e.g. swap 4 and 1, then 2 and 1, then 4 and 3).
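The example can be verified with the O(n²) definition and with a bubble sort that counts its swaps. Both helpers below are illustrative sketches and leave the input unchanged.

```python
def count_inversions_brute(arr):
    """Count pairs (i, j) with i < j and arr[i] > arr[j] by checking every pair."""
    n = len(arr)
    return sum(1 for i in range(n) for j in range(i + 1, n) if arr[i] > arr[j])


def bubble_sort_swaps(arr):
    """Bubble sort a copy and return how many adjacent swaps it performed."""
    a = list(arr)
    swaps = 0
    for end in range(len(a) - 1, 0, -1):
        for i in range(end):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
                swaps += 1
    return swaps


arr = [2, 4, 1, 3, 5]
print(count_inversions_brute(arr), bubble_sort_swaps(arr))
```

Both functions return 3 for this array, matching the three inversions listed above.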

Formal Definition

Concept Note

Inversion: A pair (i, j) with 0 ≤ i < j < n and arr[i] > arr[j]. Inversion count: The total number of such pairs. Relation to sorting: Minimum number of adjacent swaps to sort the array (bubble sort) equals the inversion count. Merge-sort approach: When merging two sorted halves, if we take an element from the right half, it is smaller than every remaining element in the left half—so there are (mid − i + 1) inversions involving this right element and those left elements. Sum over all such "right" picks to get the total count.

Why This Topic Matters

Minimum adjacent swaps: Problems like "sort the array using only adjacent swaps" have answer = inversion count.
Measure of disorder: Inversion count is a simple measure of how far the array is from sorted; used in rankings and similarity.
Interviews: "Count inversions in an array" is a classic merge-sort modification. Also "minimum swaps to sort" (adjacent or not—different formulas).

Mental Model

During merge sort, when we merge the left and right sorted halves, we compare the front of each. When we take an element from the right half and put it in the output, that element is smaller than every element still in the left half (because it is smaller than the current left front, and the left half is sorted, so everything after the front is at least as large). So this one right element forms an inversion with each of those remaining left elements. Add (number of remaining left elements) to the inversion count. We do not count anything when we take from the left: that element is less than or equal to every element still in the right half, so it forms no new inversions, and its inversions with right elements already placed were counted when those elements were taken.

Step-by-Step (Merge-Sort Based)

Run merge sort, but during the merge step, maintain a global (or passed) inversion count.
When merging: when we choose to take an element from the right subarray (because it is smaller than the current left front), add (number of elements remaining in the left subarray) to the inversion count. That equals (mid − i + 1) if we use index i for the current position in the left array and mid is the end of the left array.
Return the inversion count (and the sorted array if needed).

Python Implementation

def merge_and_count(arr, temp, left, mid, right):
    i, j, k = left, mid + 1, left
    inv_count = 0
    while i <= mid and j <= right:
        if arr[i] <= arr[j]:
            temp[k] = arr[i]
            i += 1
        else:
            temp[k] = arr[j]
            j += 1
            inv_count += (mid - i + 1)
        k += 1
    while i <= mid:
        temp[k] = arr[i]
        i += 1
        k += 1
    while j <= right:
        temp[k] = arr[j]
        j += 1
        k += 1
    for i in range(left, right + 1):
        arr[i] = temp[i]
    return inv_count

def count_inversions(arr, temp, left, right):
    inv_count = 0
    if left < right:
        mid = (left + right) // 2
        inv_count += count_inversions(arr, temp, left, mid)
        inv_count += count_inversions(arr, temp, mid + 1, right)
        inv_count += merge_and_count(arr, temp, left, mid, right)
    return inv_count

# Usage: temp = [0] * len(arr); total = count_inversions(arr, temp, 0, len(arr)-1)

Line-by-Line Explanation

if arr[i] <= arr[j]: We take from the left when it's less than or equal. So we only take from the right when arr[j] < arr[i]. In that case, arr[j] is smaller than arr[i] and every element after i in the left half (all ≥ arr[i]). So arr[j] forms an inversion with (mid − i + 1) elements.
inv_count += (mid - i + 1): The number of elements left in the left subarray (from i to mid inclusive). Each of these is an inversion with the current right element we're placing.
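
To make the counting rule concrete, here is a small sketch that merges two already-sorted halves and records how much each right-half element contributes. The name merge_trace and its return shape are illustrative assumptions, not part of the implementation above.

```python
def merge_trace(left, right):
    """Merge two sorted lists; return (merged, total_inversions, contributions).

    contributions holds (right_value, remaining_left_count) for every
    element taken from the right half while left elements remain.
    """
    i = j = 0
    merged, contributions = [], []
    total = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])          # taking from the left adds nothing
            i += 1
        else:
            remaining = len(left) - i       # same quantity as (mid - i + 1)
            contributions.append((right[j], remaining))
            total += remaining
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])                 # leftovers add no inversions
    merged.extend(right[j:])
    return merged, total, contributions

merged, total, contributions = merge_trace([1, 3, 5], [2, 4, 6])
print(merged, total, contributions)
# [1, 2, 3, 4, 5, 6] 3 [(2, 2), (4, 1)]
```

The element 2 jumps ahead of 3 and 5 (two inversions), and 4 jumps ahead of 5 (one inversion), for a total of 3.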

Time Complexity

Same as merge sort: we do the same splits and merges, with one extra addition per "take from right" event. There are O(log n) levels of recursion and each level does O(n) total merge work, so the running time is O(n log n).

Space Complexity

O(n) for the temporary array used in merge; O(log n) for recursion stack. O(n) total.

Brute Force vs Merge-Sort

Method	Time	Space
Brute force (all pairs)	O(n²)	O(1)
Merge sort based	O(n log n)	O(n)
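
The brute-force row of the table can be sketched as follows. It is mainly useful as a reference to cross-check the merge-sort version on small inputs; the function name count_inversions_brute is an assumption made for this example.

```python
def count_inversions_brute(arr):
    """Count pairs (i, j) with i < j and arr[i] > arr[j] by checking all pairs."""
    count = 0
    n = len(arr)
    for i in range(n):
        for j in range(i + 1, n):
            if arr[i] > arr[j]:
                count += 1
    return count

print(count_inversions_brute([2, 4, 1, 3, 5]))   # 3: pairs (2,1), (4,1), (4,3)
print(count_inversions_brute([3, 2, 1]))         # 3 = n(n-1)/2 for n = 3
```

Unlike the merge-sort version, this one does not modify the input array.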

Optimization Insight

For "minimum adjacent swaps to sort," the answer is exactly the inversion count: swapping an adjacent out-of-order pair removes exactly one inversion, and no adjacent swap can remove more than one. For "minimum swaps to sort" when any two elements can be swapped (not necessarily adjacent) and the elements are distinct, the answer is n − c, where c is the number of cycles in the permutation that maps current positions to sorted positions (fixed points count as cycles of length 1; a cycle of length k needs k−1 swaps). Don't confuse the two problems.
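
A minimal sketch of the "any swap" variant, assuming all values are distinct. The function name min_swaps_any and the helper list order are hypothetical names chosen for this example.

```python
def min_swaps_any(arr):
    """Minimum swaps of arbitrary pairs needed to sort arr (distinct values assumed)."""
    n = len(arr)
    # order[k] = index in arr of the element that belongs at sorted position k
    order = sorted(range(n), key=lambda idx: arr[idx])
    visited = [False] * n
    swaps = 0
    for start in range(n):
        if visited[start] or order[start] == start:
            continue                      # already handled, or already in place
        length = 0
        k = start
        while not visited[k]:             # walk one cycle of the permutation
            visited[k] = True
            k = order[k]
            length += 1
        swaps += length - 1               # a cycle of length k needs k - 1 swaps
    return swaps

print(min_swaps_any([4, 3, 2, 1]))   # 2 (swap the ends, swap the middle)
```

For [4, 3, 2, 1] the inversion count is 6, so sorting with adjacent swaps needs 6 moves, while arbitrary swaps need only 2. That gap is why the two problems must be kept apart.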

Edge Cases

Empty or single element: 0 inversions.
Sorted array: 0 inversions (no pair is out of order).
Reverse sorted: Every pair (i, j) with i < j is an inversion → n(n−1)/2.

Interview Insight

Define an inversion as (i, j) with i < j and arr[i] > arr[j]. Say you can count in O(n²) by checking all pairs, or in O(n log n) by modifying merge sort: when merging, whenever you take an element from the right half, add (remaining left half size) to the count. Mention that inversion count = minimum adjacent swaps to sort (bubble sort).

Practice Problems

Count inversions in an array (merge-sort method).
Minimum adjacent swaps to sort (answer: inversion count).
Count inversions where arr[i] > 2*arr[j] (modify merge step to count such pairs in O(n log n)).

Summary

Inversion: (i, j) with i < j and arr[i] > arr[j]. Inversion count = total number of such pairs.
Count in O(n log n) by merge sort: when placing an element from the right half during merge, add (mid − i + 1) to the count.
Inversion count = minimum adjacent swaps needed to sort the array. For "any swap" minimum swaps, use n − (number of cycles).

7.1 String Basics

Introduction

In the world of data structures and algorithms, a string is one of the most fundamental and frequently used types of data. Whether you're checking if a word is a palindrome, finding anagrams, searching for a pattern in text, or parsing input, you are working with strings. String basics are the foundation for the entire Strings section: once you understand how strings are represented, how to access and slice them efficiently, and how immutability affects your algorithms, you can tackle frequency counting, anagrams, palindromes, pattern matching, and advanced algorithms like KMP and hashing with confidence.

In Python, a string is an immutable sequence of Unicode characters. You can think of it as a read-only array of characters: you can read any position in O(1), but you cannot change a character in place—any "change" creates a new string. This has important consequences for how we design string algorithms and how we reason about time and space.

Real-World Analogy

Imagine a necklace of beads where each bead is a letter. The beads are fixed on the string in order: you can point to the 1st bead, the 5th bead, or the last bead. You can look at any bead or a contiguous stretch of beads (a substring). But you cannot replace a single bead without making a whole new necklace—that's immutability. When you "add" another necklace (concatenation), you're really creating a new, longer necklace by copying both. So "building" a long necklace by adding one bead at a time means making a new necklace each time, which gets expensive. The smart way is to collect all the beads (or segments) first and then thread them once—in code, that's using a list and ''.join().

Example

String s = "hello". The length is 5. The first character is s[0] → 'h'; the last is s[-1] → 'o'. The substring from index 1 to 4 (exclusive) is s[1:4] → "ell". Reversing the string with s[::-1] gives "olleh"—useful for checking palindromes. You cannot do s[0] = 'H'; to get "Hello" you must build a new string, e.g. 'H' + s[1:].

Formal Definition

Concept Note

String: A finite sequence of characters from some alphabet (e.g. ASCII, Unicode). The length n is the number of characters. We use 0-based indexing: the first character is at index 0, the last at index n−1. A substring (or slice) is a contiguous segment s[i:j] (indices i to j−1). A prefix is s[0:k]; a suffix is s[k:n]. Strings are compared lexicographically (character by character, like dictionary order). In Python, strings are immutable: no in-place modification.
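
The terms in the definition can be checked directly in Python. This is a small illustration; the variable names are chosen for this example only.

```python
s = "hello"
k = 2

prefix = s[:k]            # first k characters: "he"
suffix = s[k:]            # from index k to the end: "llo"
whole = prefix + suffix   # a prefix and the matching suffix rebuild s

# Lexicographic comparison goes character by character, by code point.
cmp_first_char = "apple" < "banana"   # True: 'a' < 'b'
cmp_prefix = "app" < "apple"          # True: a proper prefix sorts first
cmp_case = "Zebra" < "apple"          # True: 'Z' (90) < 'a' (97)

print(prefix, suffix, whole == s, cmp_first_char, cmp_prefix, cmp_case)
```

The last comparison is a common surprise: uppercase letters have smaller code points than lowercase ones, so normalize case first if you want dictionary order.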

Why This Topic Matters

Core DSA topic: String problems appear in every interview and contest: palindromes, anagrams, pattern matching, parsing, and text processing. String basics give you the vocabulary and operations you need.
Efficiency matters: Accessing s[i] is O(1), but building a string with s += c in a loop is O(n²). Knowing when to use a list and ''.join() keeps your solutions fast.
Bridges to advanced topics: Frequency counting (7.2), anagrams (7.3), and palindromes (7.4) all rely on iterating over characters, slicing substrings, and comparing strings. Pattern matching (7.5+) builds on substrings and indices.
Language-agnostic thinking: In other languages (C++, Java), strings may be mutable or stored differently, but the logical view—sequence of characters, indexing, substrings—is the same. Master the concepts here and you can adapt.

Mental Model

Think of a string as a read-only array of characters with fixed positions. You have a cursor (index) that you can move; at each position you can read the character in O(1). You can also "window" a contiguous range (slice) to get a substring—that costs O(k) where k is the length of the slice, because a new string is created. You never "edit" the string; you only create new strings (slices, concatenations, or results of methods like replace, upper). When you need to build a string from many parts, imagine collecting the parts in a list and then joining them once—that way you pay O(n) total instead of repeated copying.

Step-by-Step: How We Work With Strings in Algorithms

Read and index: Use s[i] for a single character (O(1)) and s[start:end] for a substring (O(length of slice)). Use len(s) for length.
Iterate: Use for c in s to iterate over characters, or for i in range(len(s)) when you need indices. Use enumerate(s) for both.
Compare: Use ==, <, > for lexicographic comparison. For "are these two strings equal?" use s1 == s2 (O(n)).
Build new strings: Avoid result += c in a loop. Use parts = [], parts.append(...), then ''.join(parts) for O(n) total.
Reverse and slice: s[::-1] is the reversed string. For palindrome check, s == s[::-1] is simple but uses O(n) extra space; two pointers from both ends give O(1) space.
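
The two-pointer palindrome check mentioned in the last step can be sketched like this. The function name is_palindrome is an assumption; the check is case-sensitive and treats every character as significant.

```python
def is_palindrome(s):
    """Two-pointer check: O(n) time, O(1) extra space, no new string created."""
    left, right = 0, len(s) - 1
    while left < right:
        if s[left] != s[right]:
            return False
        left += 1
        right -= 1
    return True

print(is_palindrome("racecar"))   # True
print(is_palindrome("hello"))     # False
```

The empty string and single-character strings return True because the loop body never runs.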

ASCII Diagram: String as Indexed Sequence

  s = "hello"   (length n = 5)

  Index:    0   1   2   3   4
  Char:     h   e   l   l   o
            ↑               ↑
          s[0]            s[4] or s[-1]

  Slice s[1:4]  →  indices 1,2,3  →  "ell"
  Slice s[:3]   →  indices 0,1,2  →  "hel"   (prefix)
  Slice s[2:]   →  indices 2,3,4  →  "llo"   (suffix)
  Slice s[::-1] →  step -1        →  "olleh" (reverse)

Python Implementation: Essential Operations

Creation, Length, Indexing

s = "hello"
n = len(s)        # 5
c0 = s[0]         # 'h'  — first character
c_last = s[-1]    # 'o'  — last character
sub = s[1:4]      # "ell" — substring (indices 1 to 3)
rev = s[::-1]     # "olleh" — reverse

Immutability: You Cannot Assign to s[i]

# s[0] = 'H'   # TypeError: 'str' object does not support item assignment

# To "change" one character, build a new string:
s_new = 'H' + s[1:]    # "Hello"
# Or for a generic index i:
i = 0
s_new = s[:i] + 'H' + s[i+1:]   # "Hello" when i=0

Building a String Efficiently (List + join)

# Slow: O(n^2) — avoid in loops
result = ""
for c in "hello":
    result = result + c.upper()   # each concatenation copies the entire result

# Fast: O(n)
result = ''.join(c.upper() for c in "hello")   # "HELLO"

# Or explicitly with a list:
parts = []
for c in "hello":
    parts.append(c.upper())
result = ''.join(parts)

Common Methods Relevant to DSA

s = "hello world"
s.split()           # ['hello', 'world'] — by whitespace
s.find("ell")       # 1 — index of first occurrence, or -1 if not found
s.count("l")        # 3 — number of non-overlapping occurrences
s.startswith("hel") # True
s.endswith("rld")   # True
s.replace("l", "L") # "heLLo worLd" — returns new string
s.upper()           # "HELLO WORLD"
s.lower()           # "hello world"

Line-by-Line: Why join Is O(n) and += in a Loop Is O(n²)

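Suppose you build a string by adding one character at a time in a loop. The first concatenation copies 1 character, the second copies 2, the third copies 3, … the nth copies n. Total work is 1 + 2 + 3 + … + n = n(n+1)/2 = O(n²). So repeated result += c is quadratic in the worst case. CPython can sometimes extend the string in place when no other reference to it exists, but PEP 8 advises against relying on that optimization, so treat the loop as quadratic when you analyze or write code.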

With a list: each parts.append(c) is amortized O(1). At the end, ''.join(parts) allocates one new string of length n and copies each character once: O(n). So the whole process is O(n).
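
The arithmetic above can be made tangible by counting copied characters under the simple cost model used in this section (every concatenation copies the whole new string; join writes each character once). The two function names are hypothetical, and the numbers are a model, not a measurement of any particular interpreter.

```python
def copies_with_concat(n):
    """Characters copied when the i-th concatenation copies i characters."""
    return sum(range(1, n + 1))   # 1 + 2 + ... + n = n(n+1)/2

def copies_with_join(n):
    """Characters copied when join writes each of the n characters once."""
    return n

for n in (10, 1000, 100000):
    print(n, copies_with_concat(n), copies_with_join(n))
```

At n = 1000 the model gives 500500 copies for concatenation versus 1000 for join, and the gap widens quadratically from there.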

Time Complexity of Key Operations

Operation	Time
`s[i]`, `len(s)`	O(1)
`s[i:j]` (slice of length k)	O(k)
`s1 + s2` (concatenation)	O(len(s1)+len(s2))
`c in s`, `s.find(sub)`	O(n)
`s1 == s2`	O(n)
`''.join(list_of_strings)` (total length n)	O(n)

Space Complexity

Storing a string of length n takes O(n) space. Slicing s[i:j] creates a new string of length k, so O(k) extra space. Reversing via s[::-1] uses O(n) extra space; reversing in place is not possible because strings are immutable. For O(1) extra space, use two pointers and compare s[left] and s[right] without creating a new string (e.g. for palindrome check).

Edge Cases

Empty string: "" has length 0. s[0] would raise IndexError. Check if not s or len(s) == 0 before indexing.
Single character: "a" has length 1; s[0] and s[-1] are the same. Reversing gives the same string (palindrome).
Whitespace and case: Depending on the problem, you may need to strip whitespace (strip) or normalize case (lower()) before comparing (e.g. "A man a plan a canal Panama" for palindromes).
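Unicode: In Python 3, strings are sequences of Unicode code points, so len(s) and s[i] count code points. A user-perceived character may be a single code point (e.g. 'a') or a grapheme cluster made of several code points (e.g. some emojis, or a letter followed by a combining accent), in which case len(s) is larger than the number of visible characters. For many DSA problems we assume ASCII or single code points; for full i18n, consider normalization.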
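
A short sketch of how these edge cases look in code. The helper names first_char_or_none and normalize_for_palindrome are assumptions made for this example; unicodedata.normalize is part of the standard library.

```python
import unicodedata

def first_char_or_none(s):
    """Guard against IndexError on the empty string."""
    return s[0] if s else None

def normalize_for_palindrome(s):
    """Lowercase and keep only alphanumeric characters."""
    return ''.join(c.lower() for c in s if c.isalnum())

phrase = "A man a plan a canal Panama"
cleaned = normalize_for_palindrome(phrase)   # "amanaplanacanalpanama"
is_pal = cleaned == cleaned[::-1]            # True after normalization

# One visible character, two code points: 'e' plus a combining acute accent.
decomposed = "e\u0301"
composed = unicodedata.normalize("NFC", decomposed)   # single code point

print(first_char_or_none(""), cleaned, is_pal, len(decomposed), len(composed))
```

Without normalization the phrase is not a palindrome as a raw string, because of the spaces and the capital letters.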

Common Mistakes

Building a string with += in a loop: This leads to O(n²) time. Use a list and ''.join() instead.
Trying to assign to s[i]: Strings are immutable; use s = s[:i] + new_char + s[i+1:] or build a new string another way.
Assuming s[i] is a special "char" type: In Python, s[i] is a string of length 1. Comparisons like s[0] == 'h' work; you can use it in sets or as dict keys.
Off-by-one in slices: s[i:j] includes index i and excludes j. So s[0:len(s)] is the whole string; s[0:len(s)-1] is all but the last character.

Common Mistake

Using result += c (or result = result + c) inside a loop to build a string. Each concatenation copies the entire current result, so after n steps you've done O(n²) work. Always prefer parts.append(c) and ''.join(parts) when building strings in a loop.

Optimization Insight

Brute: Build string with += in a loop → O(n²) time. Better: Use a list, append each part in O(1) amortized, then ''.join(parts) in O(n) → O(n) total. For palindrome check, s == s[::-1] is O(n) time and O(n) space; optimal for space is two pointers and compare s[left] == s[right] while moving inward—O(n) time, O(1) space.

Pattern Recognition

When you see a string problem, ask: Do I need to read characters (index, iterate), extract substrings (slicing), compare strings (equality, lexicographic), or build a new string? For reading and comparing, use indices and loops. For building, use a list and join. For "is it a palindrome?", either s == s[::-1] (simple) or two pointers (O(1) space). For "are two strings anagrams?", sorted(s1)==sorted(s2) is O(n log n); frequency count (e.g. Counter) is O(n). These patterns will recur in the next topics (7.2–7.5).

Expert Tip

Keep Python's string methods in mind: find, count, startswith, endswith, split, replace. For interviews, you're often allowed to use these; for learning, implement key logic yourself (e.g. substring search) to understand later algorithms like KMP. When in doubt, index and slice explicitly—it's clear and correct.

Interview Insight

Interviewers expect you to know: strings are immutable; s[i] is O(1), slicing s[i:j] is O(k); building with += in a loop is O(n²)—use list and ''.join(). For palindrome: s == s[::-1] or two pointers. For anagram: sort both and compare, or use a frequency map. Stating these complexities and trade-offs shows you understand string basics and are ready for harder string problems.

Practice Problems

Check if a string is a palindrome (two-pointer and slice versions).
Reverse a string (and reverse words in a sentence).
Build a string from alternating characters of two strings (e.g. "ace" and "bdf" → "abcdef").
Given a string, replace every occurrence of a character with another (without using str.replace).

Summary

A string is an immutable sequence of characters; 0-based indexing, s[i] O(1), s[i:j] O(k).
Immutability: You cannot assign to s[i]; "changes" require building a new string (e.g. slice + concatenation).
Building strings: Use parts.append(...) and ''.join(parts) for O(n) total; avoid result += c in a loop (O(n²)).
Use len(s), s[::-1] for reverse, s.find, s.count, split, strip, upper/lower as needed; all return new values or indices.
Edge cases: empty string, single character, case and whitespace. For palindromes and anagrams, these basics are the foundation for the next topics in Section 7.

7.2 Frequency Counting

Introduction

Frequency counting means counting how many times each distinct element (character, digit, or key) appears in a string, array, or sequence. It is one of the most common and powerful techniques in string and array problems. Once you have a frequency map—a dictionary (or hash map) from element to count—you can answer questions like: "Do two strings have the same character counts?" (anagrams), "Which character appears most often?", "Can we form string A from the characters of string B?", and "How many characters do we need to change?" Frequency counting turns many "compare every pair" or "search for each element" ideas into a single pass plus O(1) lookups, giving O(n) time instead of O(n²) or worse.

In this topic we focus on character frequency in strings: building the map, using it for comparison and validation, and connecting it to anagrams (7.3) and other string problems. The same pattern applies to counting digits, array elements, or any hashable keys.

Real-World Analogy

Imagine you have a bag of letter tiles (like in Scrabble). To check whether you can spell a word, you don't try every possible arrangement—you count how many of each letter you have. Then for the target word, you count how many of each letter it needs. If, for every letter, your count is at least the word's count, you can spell it. That's frequency counting: two counts (your tiles, the word) and a per-letter comparison. Similarly, to see if two words are anagrams (same letters, different order), you compare their letter counts; if every letter has the same count in both, they're anagrams.

Example

String s = "aabbbc". Frequency map: {'a': 2, 'b': 3, 'c': 1}. We can answer: "How many 'b's?" → 3. "Which character appears most?" → 'b'. For t = "abc", map {'a': 1, 'b': 1, 'c': 1}. To check "can we form t from characters of s?" we need, for each char in t, freq_s[char] >= freq_t[char]. Here s has 2 a's, 3 b's, 1 c; t needs 1, 1, 1 → yes. For anagrams: "listen" and "silent" have the same frequency map (each letter count 1), so they are anagrams.

Formal Definition

Concept Note

Frequency (count) of an element x in a sequence S: the number of indices i such that S[i] = x. A frequency map (or frequency table) is a mapping from each distinct element to its count. For a string of length n over an alphabet Σ, the map has at most min(n, |Σ|) keys; we typically use a hash table (dict) so that building the map is O(n) and lookup/update is O(1) amortized. Two strings are anagrams iff their frequency maps are equal. We say we can "form" string A from string B iff for every character c, freq_B[c] ≥ freq_A[c].

Why This Topic Matters

Anagrams (7.3): The standard way to check if two strings are anagrams is to compare their character frequencies—either build two maps and compare, or build one and decrement while scanning the other. Frequency counting is the backbone of anagram problems.
Palindrome construction: A string can be rearranged into a palindrome iff at most one character has odd frequency. So building the frequency map is the first step.
Substring / window problems: "Find the minimum window in s that contains all characters of t" uses frequency maps for t and for the current window; we update counts as we slide.
Interviews: "Are two strings anagrams?", "First unique character", "Majority element", "Group anagrams"—all rely on counting. Using a dict or Counter and stating O(n) time is expected.
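
The palindrome-construction rule from the list above is short enough to sketch directly. The function name can_form_palindrome is an assumption made for this example.

```python
from collections import Counter

def can_form_palindrome(s):
    """True if some rearrangement of s is a palindrome."""
    # At most one character may have an odd count (it would sit in the middle).
    odd = sum(1 for count in Counter(s).values() if count % 2 == 1)
    return odd <= 1

print(can_form_palindrome("carrace"))   # True, e.g. "racecar"
print(can_form_palindrome("abc"))       # False: three odd counts
```

The check runs in O(n) time and never builds the palindrome itself; it only inspects the counts.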

Mental Model

One pass over the sequence: for each element, either add it to the map with count 1 (first time) or increment its count. After the pass, the map answers "how many times does x appear?" in O(1). For comparing two strings (e.g. anagrams), you can build two maps and check equality, or build one map from the first string and then iterate over the second, decrementing the count for each character—if any count goes negative or a character is missing, they're not anagrams. Think: "count first, compare counts."

Step-by-Step: Building a Frequency Map

Initialize: Create an empty dictionary freq = {} (or defaultdict(int), or Counter()).
Single pass: For each character c in the string, do freq[c] = freq.get(c, 0) + 1 (or freq[c] += 1 with defaultdict/Counter).
Use the map: Look up freq[char] for any character; missing keys mean count 0 (use freq.get(char, 0) if not using defaultdict/Counter).

Step-by-Step: Check Anagrams via Frequency

If len(s) != len(t), they cannot be anagrams → return False.
Build frequency map for s: one pass, O(n).
Option A: Build frequency map for t and check freq_s == freq_t. Option B: Iterate over t, and for each c decrement freq_s[c]; if c not in map or count becomes negative, return False. Option B uses one map and avoids building a second.

ASCII Diagram: Frequency Map

  s = "aabbbc"

  Pass:  a   a   b   b   b   c
  freq:  a:1  a:2  b:1  b:2  b:3  c:1

  Final freq = { 'a': 2, 'b': 3, 'c': 1 }

  For t = "abc":  need a≥1, b≥1, c≥1.
  freq['a']=2≥1, freq['b']=3≥1, freq['c']=1≥1  →  can form "abc" from s.

Python Implementation

Manual Frequency Map (dict)

def count_freq(s):
    freq = {}
    for c in s:
        freq[c] = freq.get(c, 0) + 1
    return freq

# Example: count_freq("aabbbc") → {'a': 2, 'b': 3, 'c': 1}

Using collections.Counter

from collections import Counter

def count_freq_counter(s):
    return Counter(s)   # one line; Counter is a dict subclass

# Counter("aabbbc") → Counter({'b': 3, 'a': 2, 'c': 1})
# With freq = Counter(s), freq['x'] returns 0 if 'x' is not present (no KeyError)

Check Anagrams (Two Maps)

def are_anagrams(s, t):
    if len(s) != len(t):
        return False
    return count_freq(s) == count_freq(t)

Check Anagrams (One Map, Decrement)

def are_anagrams_one_map(s, t):
    if len(s) != len(t):
        return False
    freq = {}
    for c in s:
        freq[c] = freq.get(c, 0) + 1
    for c in t:
        if c not in freq or freq[c] == 0:
            return False
        freq[c] -= 1
    return True

Can We Form String t from String s?

def can_form(t, s):
    """Can we form string t using characters from s (each char used at most once)?"""
    freq_s = count_freq(s)
    for c in t:
        if freq_s.get(c, 0) < 1:
            return False
        freq_s[c] -= 1   # or: freq_s[c] = freq_s.get(c, 0) - 1
    return True

Line-by-Line Explanation (One-Map Anagram Check)

if len(s) != len(t): return False: Different lengths ⇒ different total counts ⇒ cannot be anagrams.
First loop: build freq for s. Each character increments its count.
Second loop: for each character in t, we "use" one occurrence. If the character isn't in the map, or we've already used all of them (freq[c] == 0), return False. Otherwise decrement freq[c].
If we finish the second loop without returning, every character in t was matched with a distinct occurrence in s, and lengths are equal, so the strings are anagrams.

Time Complexity

Building the map: One pass over n characters; each update is O(1) amortized. Total O(n).

Comparing two maps (anagrams): Building both maps is O(n + m). Comparing the two dicts costs time proportional to the number of distinct keys, which is at most min(n, |Σ|). So overall O(n + m). One-map decrement approach: O(n) for first string, O(m) for second → O(n + m).

Lookup: After the map is built, freq[c] or freq.get(c, 0) is O(1) amortized.

Space Complexity

The frequency map has at most min(n, |Σ|) entries (one per distinct character in the string). So O(min(n, |Σ|)). For ASCII we can say O(1) if we assume |Σ| is constant (128 or 256); for Unicode, O(k) where k is the number of distinct characters. For anagram check with one map we use one such map: O(min(n, |Σ|)).

Edge Cases

Empty string: Frequency map is {}. Two empty strings are anagrams. Forming "" from any string s is true (need zero of each character).
Single character: "a" → {'a': 1}. "a" and "a" are anagrams.
All same character: "aaaa" → {'a': 4}. "aaaa" and "aaaa" are anagrams; "aa" can be formed from "aaaa".
Case and spaces: Often the problem says "ignore case" or "alphanumeric only." Normalize first (e.g. s = s.lower(); s = ''.join(c for c in s if c.isalnum())) before counting.

Common Mistakes

Forgetting length check for anagrams: If you only decrement one map, different-length strings can still pass when the first string is longer (e.g. s = "aa" and t = "a": the map is {'a': 2}, the loop over t decrements once and never fails, and the function returns True). Always check len(s) == len(t) first.
KeyError when looking up missing character: Use freq.get(c, 0) or Counter (which returns 0 for missing keys) instead of freq[c] when the character might not be in the map.
Using a list of size 26 for "lowercase letters only" but not normalizing: If the problem says only 'a'–'z', you can use ord(c) - ord('a') as index—but ensure you've converted to lowercase first.

Common Mistake

Checking anagrams without comparing lengths. If s = "a" and t = "aa", building one map from s gives {'a': 1}, and decrementing for "aa" fails on the second 'a' (count 0), so you correctly return False. The reverse direction is the trap: with s = "aa" and t = "a", the map is {'a': 2}, the single decrement succeeds, and the function returns True even though one 'a' is left unused. Either verify that every count ends at zero or, more simply, always enforce len(s) == len(t) for anagrams.
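
A quick demonstration of why the length check matters. The function below is deliberately buggy and its name, anagram_no_length_check, is hypothetical; it is the one-map decrement approach with the guard removed.

```python
def anagram_no_length_check(s, t):
    """Buggy on purpose: omits the len(s) == len(t) guard."""
    freq = {}
    for c in s:
        freq[c] = freq.get(c, 0) + 1
    for c in t:
        if freq.get(c, 0) == 0:
            return False
        freq[c] -= 1
    return True   # never checks that every count reached zero

print(anagram_no_length_check("aa", "a"))   # True, which is wrong
print(anagram_no_length_check("a", "aa"))   # False, which happens to be right
```

The bug only shows when the first string is longer, which is exactly why it is easy to miss in testing.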

Brute Force vs Frequency Map

Approach	Anagram check	Time
Sort both and compare	`sorted(s)==sorted(t)`	O(n log n)
Frequency count (dict or Counter)	Two maps or one map + decrement	O(n)

Optimization Insight

Brute: For "are s and t anagrams?", compare every permutation of one to the other—O(n!)—or sort both and compare—O(n log n). Better: Frequency count both strings and compare maps—O(n) time, O(|Σ|) space. For "first non-repeating character," brute is O(n²) (for each position, scan to see if that char appears again); with a frequency map (one pass to count, one pass to find first with count 1) we get O(n).
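
The two-pass idea for "first non-repeating character" can be sketched as follows. The function name first_unique_index and the choice to return -1 when no such character exists are assumptions for this example.

```python
from collections import Counter

def first_unique_index(s):
    """Index of the first character that occurs exactly once, or -1 if none."""
    freq = Counter(s)              # pass 1: count every character
    for i, c in enumerate(s):      # pass 2: first position with count 1
        if freq[c] == 1:
            return i
    return -1

print(first_unique_index("loveleetcode"))   # 2 ('v')
print(first_unique_index("aabb"))           # -1
```

The second pass must walk the string, not the map, because the question is about position in the original order.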

Pattern Recognition

Think "frequency count" when you see: anagrams, permutation, same characters, rearrange, minimum window containing all characters of t, first unique / first non-repeating, majority element, can form string A from B. If the problem asks "how many of each?" or "do they have the same multiset of elements?", build a map first.

Expert Tip

In Python, Counter from collections is ideal for frequency counting: Counter(s) in one line, and freq[c] returns 0 for missing keys. For "lowercase letters only" and tight constraints, a list of 26 ints with index ord(c)-ord('a') is also O(n) and uses less overhead than a dict. Use dict/Counter when the alphabet is large or unknown.
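
A sketch of the fixed-size array alternative for lowercase letters. The function name count_lowercase is an assumption, and so is the decision to lowercase the input and silently skip anything outside 'a' to 'z'.

```python
def count_lowercase(s):
    """Return a list of 26 counts, index 0 for 'a' through index 25 for 'z'."""
    counts = [0] * 26
    for c in s.lower():                 # normalize case before indexing
        if 'a' <= c <= 'z':             # skip anything that would index out of range
            counts[ord(c) - ord('a')] += 1
    return counts

print(count_lowercase("aabbbc")[:3])                      # [2, 3, 1]
print(count_lowercase("Abc") == count_lowercase("CBA"))   # True
```

Two such lists compare in constant time (26 entries), and tuple(counts) can serve as a dict key when grouping.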

Interview Insight

For "are two strings anagrams?", state: "We can sort both and compare—O(n log n)—or use a frequency map: one pass per string, compare maps—O(n)." Implement the map approach. Mention edge case: different lengths ⇒ not anagrams. For "first non-repeating character," say: "One pass to build frequency map, second pass to find first character with count 1—O(n) time, O(1) space if alphabet is fixed."

Practice Problems

Check if two strings are anagrams (same characters, same counts).
First non-repeating character in a string (return index or character).
Given two strings s and t, can you form t using characters from s (each at most once)?
Determine if a string can be rearranged into a palindrome (at most one character with odd frequency).
Group anagrams: given a list of strings, group those that are anagrams of each other (use sorted string or frequency tuple as key).

Summary

Frequency counting: one pass over the sequence, for each element increment its count in a dict (or Counter). Time O(n), space O(distinct elements).
Use the map to compare counts (anagrams: same map), check "can form A from B" (freq_B ≥ freq_A for every char), or find first with count 1 (first non-repeating).
Anagram check: length check, then two maps and compare, or one map and decrement for the second string. O(n) time.
Edge cases: empty string, single char, case/non-alphanumeric (normalize first). Use freq.get(c, 0) or Counter to avoid KeyError.

7.3 Anagram Problems

Introduction

Two strings are anagrams of each other if they contain the same characters with the same frequencies, only in a different order. "listen" and "silent" are anagrams; "hello" and "world" are not. Anagram problems are among the most common string questions in interviews and coding rounds: check if two strings are anagrams, group a list of strings into anagram clusters, find all starting indices where a pattern's anagram appears in a text, or compute the minimum number of character changes to make two strings anagrams. All of these rest on the same idea—character frequency—but the problem shape changes (two strings vs many strings, exact match vs sliding window). This topic builds on frequency counting (7.2) and gives you a toolkit for every anagram variant.

Real-World Analogy

Think of anagrams as the same set of letter tiles arranged differently. If you have the tiles L-I-S-T-E-N and your friend has S-I-L-E-N-T, you both have exactly one L, one I, one S, one T, one E, one N. So you can spell the same words with your tiles; the two words you spell are anagrams. A "group anagrams" problem is like sorting a pile of words into buckets: words that use the same multiset of letters go in the same bucket. "Find anagram of pattern in text" is like looking for a contiguous stretch in a long string where the letter counts match the pattern—same multiset, in a window.

Example

"listen" and "silent": both have 1× l, i, s, t, e, n → anagrams. "aab" and "abb": first has 2 a's and 1 b, second has 1 a and 2 b's → not anagrams. Group anagrams: ["eat","tea","ate","tan","nat","bat"] → groups [["eat","tea","ate"], ["tan","nat"], ["bat"]] (same sorted form or same frequency map per group). Find anagrams of "ab" in "cbaebabacd": windows "ba", "ba", "ab", "ba" at indices 1, 4, 5, 6 → answer [1, 4, 5, 6].

Formal Definition

Concept Note

Anagram: Strings s and t are anagrams iff they have the same length and the same multiset of characters (i.e. the same frequency map). Equivalently, t is a permutation of s. The anagram relation is reflexive, symmetric, and transitive (an equivalence relation), which is why a list of strings can always be partitioned into anagram groups. Group anagrams: Partition a list of strings so that two strings are in the same group iff they are anagrams. Find anagrams in text: Given text T and pattern P, find every starting index i such that the substring T[i : i+|P|] is an anagram of P. This is a fixed-length sliding window with frequency comparison.

Why This Topic Matters

Interview staple: "Valid Anagram" and "Group Anagrams" are classic LeetCode-style questions. "Find All Anagrams in a String" combines anagrams with sliding window—very common.
Reuses 7.2: Every anagram solution uses frequency counts. Here we apply that in different problem shapes: pair comparison, grouping by key, and sliding window.
Pattern for other problems: "Minimum steps to make two strings anagrams" (count difference); "anagram palindrome" (at most one odd count). Same frequency logic, different output.

Mental Model

Anagrams = same character counts, different order. So compare counts, not order. For two strings: build counts and compare, or build one and decrement with the other. For grouping: assign each string a canonical key (sorted string or tuple of counts) and group by that key. For "find anagram in text": maintain a window of length |P|, keep a frequency map for the window, slide and update the map; when window's map equals pattern's map, record the start index.

Problem 1: Check If Two Strings Are Anagrams

Given strings s and t, return True if they are anagrams, False otherwise.

Approach

If len(s) != len(t), return False. Then either: (A) build frequency maps for both and check freq_s == freq_t, or (B) build one map from s and iterate over t decrementing; if any character is missing or count goes negative, return False.

from collections import Counter

def is_anagram(s, t):
    if len(s) != len(t):
        return False
    return Counter(s) == Counter(t)

# One-map variant (no Counter):
def is_anagram_one_map(s, t):
    if len(s) != len(t):
        return False
    freq = {}
    for c in s:
        freq[c] = freq.get(c, 0) + 1
    for c in t:
        if freq.get(c, 0) == 0:
            return False
        freq[c] -= 1
    return True

Time O(n+m), space O(|Σ|). With length check, n = m.

Problem 2: Group Anagrams

Given a list of strings, group them so that each group contains all anagrams of each other. Return a list of groups (lists of strings).

Approach

Every anagram has the same "signature": either the sorted string (e.g. "eat" → "aet") or a tuple of character counts. Use this as a key in a dict: key → list of strings with that key. One pass over the list: for each string, compute key, append to groups[key]. Return list(groups.values()).

from collections import defaultdict

def group_anagrams(strs):
    groups = defaultdict(list)
    for s in strs:
        key = tuple(sorted(s))   # or: key = ''.join(sorted(s))
        groups[key].append(s)
    return list(groups.values())

# Alternative key: tuple of counts for 'a'..'z' (if lowercase only)
def group_anagrams_count_key(strs):
    groups = defaultdict(list)
    for s in strs:
        count = [0] * 26
        for c in s:
            count[ord(c) - ord('a')] += 1
        groups[tuple(count)].append(s)
    return list(groups.values())

Time: O(n · k log k) with sorted key (n strings, max length k), or O(n · k) with count tuple. Space: O(n · k) for output.

Problem 3: Find All Anagrams in a String

Given string s (text) and string p (pattern), return a list of all start indices in s such that the substring of length len(p) starting at that index is an anagram of p.

Approach

Sliding window of fixed length len(p). Build frequency map need for p. Maintain a window map (or a single map that we update as we slide). For each window, check if the window's character counts match need. When we slide right, remove the character that leaves the window and add the new one. We can track a single map: when we add a char to the window, increment; when we remove, decrement. When the map equals need, append the start index. Comparing two maps each time is O(26) for lowercase; we can instead track "how many distinct chars have the correct count" to get O(1) comparison.

from collections import Counter

def find_anagrams(s, p):
    if len(p) > len(s):
        return []
    need = Counter(p)
    window = Counter(s[:len(p)])
    result = []
    if window == need:
        result.append(0)
    for i in range(len(p), len(s)):
        # add s[i], remove s[i - len(p)]
        window[s[i]] += 1   # Counter returns 0 for a missing key
        left_char = s[i - len(p)]
        window[left_char] -= 1
        if window[left_char] == 0:
            del window[left_char]
        if window == need:
            result.append(i - len(p) + 1)
    return result

Time O(n) (n = len(s)); each step does O(1) or O(26) for map compare. Space O(1) if alphabet size is constant.
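The approach above mentions tracking "how many distinct chars have the correct count" instead of comparing whole maps. A minimal sketch of that idea follows; the function name find_anagrams_matched and the variable matched are illustrative, and returning [] for an empty pattern is an assumption rather than something the problem statement fixes.

```python
from collections import Counter

def find_anagrams_matched(s, p):
    """Start indices of anagrams of p in s, with O(1) work per slide."""
    k = len(p)
    if k == 0 or k > len(s):
        return []  # assumption: an empty pattern yields no matches
    need = Counter(p)
    window = Counter()
    matched = 0  # distinct chars c in need with window[c] == need[c]
    result = []
    for i, c in enumerate(s):
        # Add the incoming character (only chars of p are tracked).
        if c in need:
            if window[c] == need[c]:
                matched -= 1   # count was correct, now it overshoots
            window[c] += 1
            if window[c] == need[c]:
                matched += 1
        # Remove the character that falls out of the window.
        if i >= k:
            out = s[i - k]
            if out in need:
                if window[out] == need[out]:
                    matched -= 1
                window[out] -= 1
                if window[out] == need[out]:
                    matched += 1
        # A full window whose tracked counts all match is an anagram.
        if i >= k - 1 and matched == len(need):
            result.append(i - k + 1)
    return result
```

Characters outside p are never counted, which is safe here: if every needed count is exact, those counts already add up to len(p), so the window has no room for anything else.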

Problem 4: Minimum Number of Steps to Make Two Strings Anagrams

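Given two strings s and t of equal length, one step replaces a single character of t with any other character. Return the minimum number of steps needed to make t an anagram of s. (Equivalently: how many characters of t cannot be paired with a character of s in the multiset view, i.e. len(t) minus the number of matched characters.) If the lengths differ, replacements alone can never make the strings anagrams.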

Approach

Count the frequency of each character in both strings, assuming the strings have equal length (as in LeetCode 1347). For a character c with freq_t[c] > freq_s[c], t holds a surplus of freq_t[c] − freq_s[c] copies that must be changed, and each change turns one surplus character into one that t is missing. So the answer is the sum of the surpluses. Because the lengths are equal, total surplus equals total deficit, so the answer is also half the sum of |freq_s[c] − freq_t[c]| over all c, which is the form the code uses.

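from collections import Counter

def min_steps_anagram(s, t):
    # Assumes len(s) == len(t); otherwise replacements alone cannot succeed.
    freq_s = Counter(s)
    freq_t = Counter(t)
    total_diff = 0
    # Sum |freq_s[c] - freq_t[c]| over every character seen in either string.
    for c in set(freq_s) | set(freq_t):
        total_diff += abs(freq_s[c] - freq_t[c])
    # Each replacement removes one surplus and one deficit, hence the halving.
    return total_diff // 2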

Time O(n+m), space O(|Σ|).
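The "sum of surpluses" formulation can be written more directly with Counter subtraction, which discards zero and negative counts. This is a small sketch under the assumption that both strings have the same length; the name min_steps_anagram_surplus is illustrative.

```python
from collections import Counter

def min_steps_anagram_surplus(s, t):
    # Counter subtraction keeps only positive counts, so this is the
    # surplus of t over s, character by character.
    surplus = Counter(t) - Counter(s)
    return sum(surplus.values())
```

For s = "bab" and t = "aba", t has one extra 'a', so one replacement is enough.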

Evolution: Brute → Sort → Frequency

Approach	Time (two strings)	Note
Check all permutations of one	O(n!)	Impractical
Sort both, compare	O(n log n)	Simple, no extra structure
Frequency count (dict/Counter)	O(n)	Optimal for comparison

Optimization Insight

For two strings: frequency count wins—O(n). For group anagrams, the key must be canonical: sorted string is O(k log k) per string; count tuple is O(k) per string (fixed alphabet). For find anagrams in text, sliding window + frequency map is O(n); avoid re-building the window map from scratch each time—update incrementally when sliding.

Edge Cases

Empty strings: Two empty strings are anagrams. Group anagrams of [""] → [[""]]. Find anagrams of "" in s: every index is valid (often problem says pattern length > 0).
Unequal length: For "are s and t anagrams?", if lengths differ, return False immediately.
Single character: "a" and "a" are anagrams; "a" and "b" are not.
Case and non-letters: Often problems say "lowercase only" or "ignore case and non-alphanumeric." Normalize before counting (e.g. s = ''.join(c.lower() for c in s if c.isalnum())).
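The normalization step from the last edge case can be sketched as follows. The helper names normalize and are_anagrams_normalized are illustrative, and the rule "keep letters and digits, lowercase everything" is only one common convention; check the problem statement for the exact rule.

```python
from collections import Counter

def normalize(text):
    # Keep letters and digits only, lowercased.
    return ''.join(c.lower() for c in text if c.isalnum())

def are_anagrams_normalized(a, b):
    # Compare character counts after normalization.
    return Counter(normalize(a)) == Counter(normalize(b))
```

With this rule, "Dormitory" and "Dirty room!" compare as anagrams, and two strings that contain no alphanumeric characters both normalize to the empty string.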

Common Mistakes

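Forgetting the length check for "are two strings anagrams?". A decrement-based check that only looks for a negative count can accept strings of different lengths: with s = "aa" and t = "a", counting s gives {a: 2}, decrementing for t leaves {a: 1}, and no count ever goes negative, so the check wrongly passes. Compare lengths first, or verify that every count ends at zero.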
In "find anagrams in string," building a new Counter for every window instead of updating the previous window—that makes each step O(k) and total O(n·k). Update in O(1): add one char, remove one char.
In group anagrams, using the string itself as key—"eat" and "tea" would get different keys. Use sorted string or count tuple.
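The length-check mistake is easy to reproduce. The sketch below contrasts a deliberately buggy one-map decrement check with a fixed one; both function names are hypothetical and exist only for this illustration.

```python
from collections import Counter

def is_anagram_buggy(s, t):
    # Buggy on purpose: no length check, only looks for negative counts.
    freq = Counter(s)
    for c in t:
        freq[c] -= 1
        if freq[c] < 0:
            return False
    return True

def is_anagram_fixed(s, t):
    # With equal lengths, "no count went negative" implies all counts are zero.
    if len(s) != len(t):
        return False
    freq = Counter(s)
    for c in t:
        freq[c] -= 1
        if freq[c] < 0:
            return False
    return True
```

The buggy version accepts s = "aa", t = "a" because the leftover 'a' in s is never inspected; the fixed version rejects the pair before counting.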

Common Mistake

In find-all-anagrams, comparing the full window map to the pattern map by rebuilding the window from the substring each time: Counter(s[i:i+len(p)]) == need in a loop. That is O(k) per index, so O(n·k). Instead, maintain one window map and update it when sliding: drop s[i-1], add s[i+len(p)-1]—O(1) per step, total O(n).

Pattern Recognition

Keywords: anagram, permutation, same characters, rearrange, group by same letters. If the problem asks "same multiset" or "same characters in any order," use frequency. For "all start indices where window is anagram of pattern," use fixed-length sliding window + frequency map.

Interview Insight

Start with "Two strings are anagrams iff they have the same character counts. So we can sort both and compare—O(n log n)—or use a Counter for each—O(n)." For group anagrams: "Use a canonical key—sorted string or tuple of counts—and group by that key." For find anagrams in string: "Sliding window of length len(p), maintain frequency of the window, update when we slide; when window count matches pattern count, add start index." Mention length check and empty/case edge cases.

Practice Problems

Valid Anagram (LeetCode 242): check if two strings are anagrams.
Group Anagrams (LeetCode 49): group list of strings into anagram groups.
Find All Anagrams in a String (LeetCode 438): return all start indices.
Minimum Number of Steps to Make Two Strings Anagram (LeetCode 1347).
Check if a string can be rearranged into a palindrome (at most one odd count).

Summary

Anagrams = same character counts, any order. Always use frequency (dict/Counter); optionally sort for a canonical key.
Check two strings: length check, then Counter(s)==Counter(t) or one-map decrement. O(n).
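Group anagrams: key = ''.join(sorted(s)) (or tuple(sorted(s))) or a tuple of counts; group by key. sorted(s) on its own is a list and cannot be a dict key. O(n·k log k) with a sorted key, O(n·k) with a count tuple.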
Find anagrams in text: sliding window of length |p|, maintain window frequency, update on slide; when window == need, record start. O(n).
Min steps to anagram: sum of |freq_s[c]-freq_t[c]| over all c, divided by 2. O(n+m).

7.4 Palindrome Problems

Introduction

A palindrome is a string that reads the same forward and backward: "aba", "racecar", "a". Palindrome problems are extremely common in string interviews: check if a string is a palindrome, find the longest palindromic substring, determine if a string can be rearranged into a palindrome, or compute the minimum number of insertions (or deletions) to make a string a palindrome. The core idea is symmetry around a center—either the string equals its reverse, or we expand from centers to find palindromic substrings. This topic ties together string basics (7.1), indexing and two pointers, and frequency counting (7.2) for the "rearrange to palindrome" variant.

Real-World Analogy

Imagine writing a word on a strip of paper and holding it up to a mirror. If the word looks the same in the mirror as it does on the paper (ignoring the mirror flip), it's a palindrome—the first letter matches the last, the second matches the second-to-last, and so on. "NOON" is a palindrome; "NOON" in the mirror still reads NOON. For "longest palindromic substring," you're looking for the longest stretch inside a string that has this mirror property. For "can we rearrange to form a palindrome?", you're asking: can we arrange the letters so that the left half is the mirror of the right? That's possible only if at most one letter appears an odd number of times (that one can go in the middle).

Example

"aba" → same forward and backward → palindrome. "abba" → palindrome. "abc" → not (a≠c). Valid palindrome (ignore non-alphanumeric, case): "A man, a plan, a canal: Panama" → normalize to "amanaplanacanalpanama" → same as reverse → valid. Longest palindromic substring in "babad": "bab" or "aba" (length 3). Can rearrange to palindrome? "aab" → 2 a's, 1 b → one odd (b) → yes, e.g. "aba". "abc" → three odds → no.

Formal Definition

Concept Note

Palindrome: String s of length n is a palindrome iff s[i] = s[n−1−i] for all i in 0..n−1. Equivalently, s = s[::-1] (reverse). A palindromic substring is any contiguous substring that is a palindrome. A string can be rearranged into a palindrome iff at most one character has odd frequency (that character can sit in the center; the rest pair off). Valid palindrome (typical problem): after removing non-alphanumeric characters and normalizing case, the string reads the same forward and backward.

Why This Topic Matters

Interview staple: "Valid Palindrome," "Longest Palindromic Substring," and "Palindrome Number" (digit version) appear constantly. "Minimum insertions to make palindrome" is a classic DP variant.
Two-pointer foundation: Checking a palindrome with two pointers (left and right moving inward) is the standard O(1)-space approach and generalizes to "expand around center" for longest palindromic substring.
Links to 7.2 and 7.11: "Can rearrange to palindrome" uses frequency counts; longest palindromic substring has an optimal linear-time algorithm (Manacher's, topic 7.11). Here we cover the core ideas and O(n²) expand-around-center.

Mental Model

Check palindrome: Compare first with last, second with second-to-last, etc. Either use s == s[::-1] (simple, O(n) space) or two pointers left, right; while left < right, if s[left] != s[right] return False; then left += 1, right -= 1. Longest palindromic substring: For each possible "center" (character or between two characters), expand outward while left and right match; track the longest. Rearrange to palindrome: Count character frequencies; allow at most one character with odd count.

Problem 1: Check If a String Is a Palindrome

Given string s, return True if it reads the same forward and backward, False otherwise.

Approach

Slice: return s == s[::-1]. Time O(n), space O(n) for the reversed copy. Two pointers: left = 0, right = len(s)-1; while left < right, if s[left] != s[right] return False; then left += 1, right -= 1. Return True. Time O(n), space O(1).

def is_palindrome_slice(s):
    return s == s[::-1]

def is_palindrome_two_pointers(s):
    left, right = 0, len(s) - 1
    while left < right:
        if s[left] != s[right]:
            return False
        left += 1
        right -= 1
    return True

Problem 2: Valid Palindrome (Ignore Non-Alphanumeric, Case)

Given a string, consider only alphanumeric characters and ignore case. Return True if the resulting string is a palindrome.

Approach

Normalize: build a string (or list) of alphanumeric characters only, lowercased. Then check that string with two pointers (or slice). Alternatively, use two pointers on the original string: skip non-alphanumeric by advancing left or right until they point to alphanumeric, then compare (lowercase); if unequal return False.

def is_valid_palindrome(s):
    # Normalize then check
    cleaned = ''.join(c.lower() for c in s if c.isalnum())
    left, right = 0, len(cleaned) - 1
    while left < right:
        if cleaned[left] != cleaned[right]:
            return False
        left += 1
        right -= 1
    return True

# O(1) extra space: two pointers on original, skip non-alphanumeric
def is_valid_palindrome_inplace(s):
    left, right = 0, len(s) - 1
    while left < right:
        while left < right and not s[left].isalnum():
            left += 1
        while left < right and not s[right].isalnum():
            right -= 1
        if s[left].lower() != s[right].lower():
            return False
        left += 1
        right -= 1
    return True

Time O(n), space O(n) for cleaned string or O(1) for in-place.

Problem 3: Longest Palindromic Substring

Given string s, return the longest contiguous substring that is a palindrome.

Approach: Expand Around Center

Every palindrome has a "center": either a single character (odd length) or between two characters (even length). For each center (2n−1 possible centers: n for character, n−1 for between), expand left and right while s[left] == s[right]; track the longest substring seen. Total O(n²) time, O(1) space. (Manacher's algorithm, topic 7.11, does O(n).)

def longest_palindrome(s):
    if not s:
        return ""
    n = len(s)
    start, max_len = 0, 1

    def expand(l, r):
        nonlocal start, max_len
        while l >= 0 and r < n and s[l] == s[r]:
            if r - l + 1 > max_len:
                max_len = r - l + 1
                start = l
            l -= 1
            r += 1

    for i in range(n):
        expand(i, i)       # odd-length: center at s[i]
        expand(i, i + 1)   # even-length: center between s[i] and s[i+1]
    return s[start:start + max_len]

Time O(n²), space O(1).

Problem 4: Can String Be Rearranged Into a Palindrome?

Given a string, determine if its characters can be rearranged to form a palindrome (e.g. "aab" → "aba").

Approach

Build character frequency. A palindrome has at most one character with odd count (the center). So: count odds; return sum(1 for c in freq.values() if c % 2 == 1) <= 1.

from collections import Counter

def can_rearrange_palindrome(s):
    freq = Counter(s)
    odd_count = sum(1 for c in freq.values() if c % 2 == 1)
    return odd_count <= 1

Time O(n), space O(|Σ|).
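The same frequency argument also shows how to construct a palindrome when one exists: put half of each character's pairs on the left, the odd character (if any) in the middle, and mirror the left half. A minimal sketch, with build_palindrome as an illustrative name and alphabetical ordering of the left half as an arbitrary choice:

```python
from collections import Counter

def build_palindrome(s):
    """Return one palindromic rearrangement of s, or None if impossible."""
    freq = Counter(s)
    odd_chars = [c for c, k in freq.items() if k % 2 == 1]
    if len(odd_chars) > 1:
        return None
    # Left half: k // 2 copies of each character, in sorted order.
    half = ''.join(c * (k // 2) for c, k in sorted(freq.items()))
    middle = odd_chars[0] if odd_chars else ''
    return half + middle + half[::-1]
```

For "aab" this yields "aba"; for "abc" it returns None because three characters have odd counts.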

Problem 5: Count Palindromic Substrings

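Given string s, return the number of palindromic substrings. Substrings with different start or end indices are counted separately, even if they contain the same characters (e.g. "aaa" has 6: three "a", two "aa", one "aaa").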

Approach

Same "expand around center" as longest palindromic substring: for each center, expand and for each valid (left, right) where s[left:right+1] is a palindrome, count one. Odd center: start with (i, i); even: (i, i+1). While expanding, each time s[l]==s[r] we have one more palindromic substring.

def count_palindromic_substrings(s):
    n = len(s)
    count = 0

    def expand(l, r):
        nonlocal count
        while l >= 0 and r < n and s[l] == s[r]:
            count += 1
            l -= 1
            r += 1

    for i in range(n):
        expand(i, i)
        expand(i, i + 1)
    return count

Time O(n²), space O(1).

Evolution: Check Palindrome

Approach	Time	Space
`s == s[::-1]`	O(n)	O(n)
Two pointers (left/right)	O(n)	O(1)

Optimization Insight

For check palindrome: two pointers give O(1) space; slice is simpler but O(n) space. For longest palindromic substring, expand-around-center is O(n²). Manacher's algorithm (topic 7.11) achieves O(n) by reusing information from previous centers. For count palindromic substrings, expand-around-center is the standard O(n²) approach; the palindrome radii computed by Manacher's algorithm also give the count in O(n).

Edge Cases

Empty string: Often considered a palindrome (length 0). Check problem statement.
Single character: Always a palindrome.
Valid palindrome: String with no alphanumeric characters (e.g. " ") → empty after cleaning → typically true (empty is palindrome).
Longest palindromic substring: If no palindrome of length > 1, return any single character (e.g. s[0]).

Common Mistakes

In "valid palindrome," forgetting to skip non-alphanumeric or to normalize case—"A" and "a" must be treated as the same.
In expand-around-center, only expanding from characters and forgetting even-length palindromes (center between two chars)—so you miss "abba". Always try both expand(i,i) and expand(i,i+1).
For "rearrange to palindrome," confusing with "is the string already a palindrome?"—we only need at most one odd count, not that the string equals its reverse.

Common Mistake

When implementing expand-around-center for longest palindromic substring, using only one type of center (e.g. odd-length only). Even-length palindromes like "abba" have their center between two characters; you must call expand(i, i+1) for each i as well as expand(i, i).

Pattern Recognition

Keywords: palindrome, reads same forward and backward, symmetric, longest palindromic, rearrange to palindrome. Check → two pointers or reverse. Longest/count palindromic substrings → expand around center (or Manacher for linear). Rearrange → frequency count, at most one odd.

Expert Tip

For "valid palindrome" with only alphanumeric and case ignored, the in-place two-pointer version avoids building a new string and is O(1) space. For longest palindromic substring, expand-around-center is interview-friendly and O(n²); mention that Manacher's gives O(n) if the interviewer asks for optimal.

Interview Insight

"Is it a palindrome?" → "We can compare with reverse—s == s[::-1]—or use two pointers from both ends, O(1) space." For longest palindromic substring: "For each center (character or between two), expand while characters match; track the longest. O(n²) time, O(1) space. There's also Manacher's algorithm for O(n)." For "can rearrange to palindrome": "Count character frequencies; we need at most one character with odd count—O(n) with a Counter."

Practice Problems

Valid Palindrome (LeetCode 125): ignore non-alphanumeric, case.
Longest Palindromic Substring (LeetCode 5): expand around center or Manacher.
Palindromic Substrings (LeetCode 647): count palindromic substrings.
Palindrome Permutation (LeetCode 266): can string be rearranged to palindrome?
Minimum Insertions to Make Palindrome (DP or longest palindromic subsequence).

Summary

Palindrome = reads same forward and backward. Check: s == s[::-1] or two pointers. O(n) time; two pointers use O(1) space.
Valid palindrome: Normalize (alphanumeric, lowercase) then check, or two pointers skipping non-alphanumeric.
Longest palindromic substring: Expand around center (odd and even centers); O(n²). Manacher's (7.11) is O(n).
Can rearrange to palindrome: At most one character with odd frequency. Build freq map, count odds ≤ 1.
Count palindromic substrings: Same expand-around-center; increment count for each valid expansion.

7.5 Pattern Matching

Introduction

Pattern matching (or string search) is the problem: given a text string T of length n and a pattern string P of length m, find all starting positions (or the first position) in T where P occurs as a contiguous substring. For example, pattern "ab" in text "cababc" occurs at indices 1 and 3. This is one of the most studied problems in string algorithms: search in a document, DNA sequence, or log file. The naive approach (try every possible start index and compare character by character) is correct and easy to implement; it runs in O(n·m) in the worst case. Faster algorithms avoid re-scanning the text unnecessarily: KMP and the Z algorithm run in O(n + m) in the worst case, and Rabin-Karp with hashing runs in O(n + m) on average. This topic introduces the problem and the naive method; topics 7.6–7.10 cover hashing and linear-time algorithms.

Real-World Analogy

Imagine searching for a phrase in a book. You slide a bookmark (the "window") along the page. At each position you check whether the characters under the window match the phrase. If they do, you've found an occurrence. If not, you move the window one position to the right and try again. The naive method does exactly that: for every starting position, compare the next m characters with the pattern. Smarter methods (KMP, etc.) use the fact that after a mismatch, you sometimes know enough to skip ahead more than one position—like remembering "we already matched 'ab', so if we fail on the next character we can try aligning the pattern so that the 'ab' we already saw is reused."

Example

Text T = "abcabc", pattern P = "abc". Start at 0: T[0:3] = "abc" matches → index 0. Start at 1: "bca" ≠ "abc". Start at 2: "cab" ≠ "abc". Start at 3: T[3:6] = "abc" matches → index 3. Result: [0, 3]. With P = "abd", no matches. Pattern longer than text (e.g. P length 10, T length 5) → no valid start index, return [].

Formal Definition

Concept Note

Pattern matching (exact): Given text T[0..n−1] and pattern P[0..m−1], find all indices i in [0, n−m] such that T[i..i+m−1] = P (character by character). We require m ≤ n; otherwise there are no valid positions. Output: list of start indices, or −1 / empty list if none. Variants: find first occurrence only, or count occurrences. The problem is also called "substring search" or "find pattern in text."

Why This Topic Matters

Fundamental problem: Search engines, editors, and bioinformatics all do pattern matching. Understanding the naive method and its cost motivates KMP and hashing (7.6–7.10).
Interview baseline: "Find first occurrence of pattern in text" can be solved with str.find in Python, but interviewers often want you to implement the naive loop or discuss how to do it in O(n + m) with KMP.
Building block: Many string problems (e.g. "repeated substring," "longest repeating substring") use substring search or similar ideas. Pattern matching is the core.

Mental Model

Slide a window of length m over the text. For each start index i, the window is T[i..i+m−1]. Check if the window equals P by comparing T[i+j] with P[j] for j = 0..m−1. If any j fails, try the next i. If all j match, record i and then try the next i. The naive method never "skips" i based on what we learned from a mismatch; that's what KMP improves on.

Step-by-Step: Naive (Brute-Force) Algorithm

If len(P) > len(T), return [] (no possible match).
For i = 0 to n - m (inclusive): the candidate start index is i.
For j = 0 to m - 1: if T[i+j] != P[j], break (mismatch at this i).
If the inner loop completed without breaking, then T[i:i+m] == P; append i to the result.
Return the list of start indices.

ASCII Diagram: Naive Search

  T = "abcabc"   P = "abc"   n=6, m=3

  i=0:  [a b c] a b c   compare T[0..2] with P → match → add 0
  i=1:    a [b c a] b c   T[1..3] = "bca" ≠ "abc" → skip
  i=2:    a b [c a b] c   T[2..4] = "cab" ≠ "abc" → skip
  i=3:    a b c [a b c]   T[3..5] = "abc" → match → add 3

  Result: [0, 3]

Python Implementation

Naive: Find All Occurrences

def find_pattern_naive(text, pattern):
    n, m = len(text), len(pattern)
    if m > n:
        return []
    result = []
    for i in range(n - m + 1):
        match = True
        for j in range(m):
            if text[i + j] != pattern[j]:
                match = False
                break
        if match:
            result.append(i)
    return result

Naive: Find First Occurrence Only

def find_first_naive(text, pattern):
    n, m = len(text), len(pattern)
    if m > n:
        return -1
    for i in range(n - m + 1):
        for j in range(m):
            if text[i + j] != pattern[j]:
                break
        else:
            return i   # no break in inner loop
    return -1

Using Python's str.find

# First occurrence
first = text.find(pattern)   # -1 if not found

# All occurrences (loop with start parameter)
def find_all_builtin(text, pattern):
    result = []
    start = 0
    while True:
        i = text.find(pattern, start)
        if i == -1:
            break
        result.append(i)
        start = i + 1   # next search after this match
    return result

Line-by-Line Explanation (Naive)

if m > n: return []: No room for pattern; no valid start index.
for i in range(n - m + 1): Valid start indices are 0 through n−m (inclusive). So n−m+1 positions.
Inner loop: for each j from 0 to m−1, compare text[i+j] with pattern[j]. If any mismatch, set match = False and break.
If we never broke, match is still True → full match at i → append i.

Time Complexity

Worst case: Pattern never matches (e.g. T = "aaa...a", P = "aab"). At each of the (n−m+1) start positions we may compare up to m characters before a mismatch. Total comparisons can be (n−m+1)·m = O(n·m). When m is small or matches are rare, behavior is closer to O(n).

Best case: First character of P never matches in T (e.g. P starts with 'x', T has no 'x'). Then at each i we do one comparison and break. Total O(n).
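The worst-case and best-case counts can be checked by instrumenting the naive loop. The sketch below uses the inputs from the two paragraphs above; count_comparisons is a hypothetical helper, not part of the earlier implementation.

```python
def count_comparisons(text, pattern):
    """Number of character comparisons made by the naive search."""
    n, m = len(text), len(pattern)
    comparisons = 0
    for i in range(n - m + 1):
        for j in range(m):
            comparisons += 1
            if text[i + j] != pattern[j]:
                break
    return comparisons

n = 1000
# Worst case: every start matches "aa" and fails on the third character.
worst = count_comparisons("a" * n, "aab")
# Best case: every start fails on the first character.
best = count_comparisons("a" * n, "xab")
```

Here worst equals (n − m + 1)·m = 998·3 = 2994 and best equals n − m + 1 = 998, matching the O(n·m) and O(n) bounds.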

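Python str.find: CPython implements the search in optimized C with skip heuristics rather than the plain double loop, so it is usually much faster than a hand-written naive loop. For complexity analysis in an interview, state the bound of the algorithm you would implement yourself rather than relying on the built-in's internals.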

Space Complexity

Only a few variables (indices, result list). The result list holds at most O(n) indices (if the pattern is length 1 and every character matches). So O(n) for the output; O(1) auxiliary space excluding output.

Edge Cases

Empty pattern: Usually defined as matching everywhere (every index is a "start"). Check problem: find("abc", "") might return [0,1,2,3] or similar. Often problems assume m ≥ 1.
Pattern longer than text: Return [] or −1.
Pattern equals text: One match at index 0.
Pattern not in text: Return [] or −1.
Overlapping matches: e.g. T = "aaa", P = "aa". Naive finds [0, 1]. Built-in find with start = i+1 after each match also finds overlapping occurrences.
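Whether overlapping matches are reported depends only on how far the search start advances after a hit. The sketch below makes that choice explicit; find_all and its allow_overlap parameter are illustrative names, and rejecting an empty pattern is an assumption made to keep the loop finite.

```python
def find_all(text, pattern, allow_overlap=True):
    """All start indices of pattern in text."""
    if not pattern:
        raise ValueError("pattern must be non-empty")  # assumption for this sketch
    # Advance by 1 to allow overlaps, or by len(pattern) to forbid them.
    step = 1 if allow_overlap else len(pattern)
    result = []
    start = 0
    while True:
        i = text.find(pattern, start)
        if i == -1:
            return result
        result.append(i)
        start = i + step
```

For T = "aaa" and P = "aa", the overlapping search returns [0, 1] and the non-overlapping one returns [0]. Python's built-in str.count counts non-overlapping occurrences, so "aaa".count("aa") is 1.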

Common Mistakes

Off-by-one in the range of i: valid i goes from 0 to n - m inclusive, so range(n - m + 1). Using range(n - m) skips the last valid start, and when n == m it checks no position at all.
Indexing the text without the offset: the inner loop must compare text[i + j] with pattern[j]. Writing text[i] (forgetting the + j) compares every pattern character against the same text character.

Common Mistake

The last valid start index for the pattern is n - m, not n - m - 1. So the loop must be for i in range(n - m + 1). If you use range(n - m), you never check the window that starts at index n−m (e.g. T of length 6, P of length 3: you must check i = 0,1,2,3; i=3 is 6-3 = 3).

Evolution: Naive → Hashing → KMP

Method	Time (typical)	Topic
Naive (try every start, compare)	O(n·m)	7.5
Rabin-Karp (rolling hash)	O(n + m) average	7.7, 7.8
KMP (failure function)	O(n + m)	7.9

Optimization Insight

Brute force is correct and often acceptable when the text and pattern are small or when you only need one match and the pattern is unlikely to match at every position. For large inputs, use Rabin-Karp (hashing; topic 7.8), which is O(n + m) on average, or KMP (topic 7.9), which guarantees O(n + m) in the worst case. KMP avoids re-scanning the text by precomputing a "failure" function on the pattern and shifting the pattern intelligently after a mismatch.

Pattern Recognition

Keywords: find pattern, substring search, needle in haystack, first occurrence of, all occurrences. When the problem is "exact match" of a contiguous substring, it's pattern matching. Use naive for interviews unless asked for O(n+m); mention KMP or hashing as the next step.

Expert Tip

In Python, text.find(pattern) returns the first index or −1; pattern in text returns a boolean. For "all occurrences," call i = text.find(pattern, start) in a loop, stop when i is −1, and otherwise record i and set start = i + 1. Test for −1 before adding 1; otherwise start resets to 0 and the loop never ends. For implementing from scratch, the naive double loop is expected; stating "we could optimize with KMP for O(n+m)" shows you know the theory.

Interview Insight

"Find pattern in text" → "We can try every start index from 0 to n−m and compare the next m characters with the pattern. That's O(n·m) worst case. In Python we'd use str.find. For O(n+m) we can use KMP or Rabin-Karp with rolling hash." Implement the naive loop; mention edge case: pattern longer than text → no match. Last valid start index is n−m, so loop is range(n - m + 1).

Practice Problems

Implement strStr() / Find First Occurrence (LeetCode 28): return first index of pattern in text, or −1.
Find all occurrences of a pattern in a long text (naive then try Rabin-Karp or KMP).
Repeated Substring Pattern: can the string be written as a shorter pattern repeated? (Uses substring check.)

Summary

Pattern matching: find start indices in text T where pattern P occurs as a contiguous substring. Require m ≤ n.
Naive algorithm: For i in 0..n−m, compare T[i..i+m−1] with P character by character. Record i if match. Time O(n·m), space O(1) auxiliary.
Valid start indices: 0 to n−m inclusive → range(n - m + 1). Use text[i+j] and pattern[j] in inner loop.
Faster methods: KMP (7.9) and the Z algorithm (7.10) achieve O(n + m) in the worst case; Rabin-Karp (7.8) achieves it on average. Naive is the baseline; use it unless linear time is required.

7.6 String Hashing

Introduction

String hashing assigns a numeric value (a hash) to a string so that we can compare two strings by comparing their hashes: if the hashes are equal, the strings are likely equal (with a small collision probability); if the hashes differ, the strings are definitely different. The standard approach is the polynomial rolling hash: treat the string as a number in base B, with each character as a digit, and take the value modulo a large prime M. With precomputed prefix hashes and powers of B, we can compute the hash of any substring in O(1)—which is the key to Rabin-Karp pattern matching (7.8) and to quickly comparing substrings in problems like "longest duplicate substring." This topic builds the hash function and the prefix structure; 7.7 Rolling Hash and 7.8 Rabin-Karp use it for sliding-window updates.

Real-World Analogy

Think of a string as a number in a strange base: 'a' = 1, 'b' = 2, … (or use character codes). The string "abc" might become the number 1·B² + 2·B + 3 for some base B. Two different strings usually produce different numbers; we then take that number modulo a large prime to keep it in a manageable range. Comparing two long strings by their hashes is like comparing two people by their ID numbers: if the IDs match, it's almost certainly the same person; if they don't match, they're different. The "almost" is because two different people could in theory have the same ID (hash collision)—we make that very unlikely by choosing a large modulus and sometimes using two hashes.

Example

String "ab" with base B=31, treating 'a'=1, 'b'=2: hash = 1·31 + 2 = 33 (before mod). With modulus M=10⁹+7: hash = 33. For "abc": 1·31² + 2·31 + 3. Substring s[1:3] = "bc": we can get it as (hash of "abc") − (hash of "a")·31², then mod M—that's the idea behind prefix hashes. So we precompute hash of every prefix s[0:i]; then hash(s[i:j]) = (prefix[j] − prefix[i]·B^(j−i)) mod M (with proper handling of mod).
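The arithmetic in this example can be reproduced in a few lines. The sketch uses the example's own mapping ('a' = 1, 'b' = 2, …) and B = 31; the helper names val and poly_hash are illustrative.

```python
B, M = 31, 10**9 + 7

def val(c):
    # Mapping used in the example: 'a' -> 1, 'b' -> 2, ...
    return ord(c) - ord('a') + 1

def poly_hash(s):
    # Horner-style evaluation of the polynomial, reduced mod M at each step.
    h = 0
    for c in s:
        h = (h * B + val(c)) % M
    return h

h_ab = poly_hash("ab")     # 1*31 + 2
h_abc = poly_hash("abc")   # 1*31**2 + 2*31 + 3
h_a = poly_hash("a")
# Substring "bc" = s[1:3]: subtract the prefix "a" shifted by B**2.
h_bc = (h_abc - h_a * pow(B, 2, M)) % M
```

The values are h_ab = 33, h_abc = 1026, and h_bc = 1026 − 961 = 65, which equals 2·31 + 3, the hash of "bc" computed directly.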

Formal Definition

Concept Note

Polynomial rolling hash: For string s of length n, assign a numeric value to each character (e.g. ord(c) or ord(c) - ord('a') + 1). The hash of s is H(s) = (s₀·B^(n−1) + s₁·B^(n−2) + … + s_(n−1)) mod M, where B is the base (e.g. 31, 131) and M is a large prime (e.g. 10⁹+7). Prefix hash: P[i] = hash of s[0..i−1] (length i). Then hash of s[i..j−1] = (P[j] − P[i]·B^(j−i)) mod M, using modular arithmetic. We need precomputed B^k for shifts.

Why This Topic Matters

Rabin-Karp (7.8): Pattern matching in O(n+m) average time by comparing pattern hash with each window hash; the window hash is updated in O(1) using a "rolling" update (topic 7.7).
Substring comparison in O(1): Problems like "longest repeated substring," "compare two substrings," or "number of distinct substrings" can use hashing to compare substrings in O(1) after O(n) preprocessing.
Interview and contests: String hashing is a standard tool when you need fast equality checks on substrings without building suffix structures (e.g. suffix array 7.12).

Mental Model

String → number in base B (with mod M). Same string → same number. Different strings → usually different numbers (collision possible but rare for good B, M). Store hash of each prefix; then substring s[i:j] has hash = (prefix[j] − prefix[i]·B^(j−i)) mod M. We precompute pow[i] = B^i % M so that multiplying by B^(j−i) is O(1).

Polynomial Hash Formula

For string s with character values c₀, c₁, …, c_(k−1) (length k):

H = (c₀·B^(k−1) + c₁·B^(k−2) + … + c_(k−1)·B^0) mod M

So we can compute H in one pass: start with 0, then for each character do H = (H * B + char_value) % M. For substring s[i:j] (length L = j−i), we need H_sub = (P[j] − P[i]·B^L) mod M. To avoid negative values: H_sub = (P[j] − (P[i] · pow[L]) % M + M) % M.

Step-by-Step: Building Prefix Hashes

Choose base B (e.g. 31, 131) and modulus M (e.g. 10**9+7).
Precompute pow[i] = (B ** i) % M for i = 0..n (or compute on the fly).
prefix[0] = 0. For i from 1 to n: prefix[i] = (prefix[i-1] * B + char_value(s[i-1])) % M. Here prefix[i] is the hash of s[0:i].
Hash of substring s[i:j] = (prefix[j] - prefix[i] * pow[j-i]) % M, then (... + M) % M to keep non-negative.

Python Implementation

Single String Hash and Prefix Hashes

def build_hashes(s, B=31, M=10**9+7):
    """Return (prefix hashes, powers of B). prefix[i] = hash of s[0:i]."""
    n = len(s)
    prefix = [0] * (n + 1)
    pow_B = [1] * (n + 1)
    for i in range(1, n + 1):
        pow_B[i] = (pow_B[i-1] * B) % M
        # char value: ord(s[i-1]) or (ord(s[i-1]) - ord('a') + 1) for lowercase
        val = ord(s[i-1])
        prefix[i] = (prefix[i-1] * B + val) % M
    return prefix, pow_B

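def substr_hash(prefix, pow_B, i, j, M=10**9+7):
    """Hash of s[i:j] (0-indexed, j exclusive). M must match the M used in build_hashes."""
    L = j - i
    # Python's % already returns a value in [0, M-1] for M > 0; the check
    # below is kept for parity with languages where % can be negative.
    h = (prefix[j] - prefix[i] * pow_B[L]) % M
    if h < 0:
        h += M
    return h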

Compute Hash of a String (No Prefix)

def string_hash(s, B=31, M=10**9+7):
    h = 0
    for c in s:
        h = (h * B + ord(c)) % M
    return h

Line-by-Line: Substring Hash Formula

prefix[i] is the hash of s[0:i], i.e. s₀·B^(i−1) + s₁·B^(i−2) + … + s_(i−1). So prefix[j] = hash of s[0:j]. The substring s[i:j] (length L = j−i) has hash = s_i·B^(L−1) + … + s_(j−1)·B^0. We have prefix[j] = prefix[i]·B^L + (s_i·B^(L−1) + … + s_(j−1)). So hash(s[i:j]) = prefix[j] − prefix[i]·B^L. Taking mod M: (prefix[j] - prefix[i] * pow_B[L]) % M. The difference may be negative before reduction, but Python's % with a positive M returns a value in [0, M−1]; an explicit "add M if negative" step is needed only in languages where % can return a negative value.
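The derivation can be checked empirically. The sketch below rebuilds the prefix tables in a self-contained form and compares the O(1) substring hash with a hash computed from scratch for every (i, j) pair. The names direct_hash and check_all_substrings are hypothetical helpers added for this check.

```python
def build_hashes(s, B=31, M=10**9 + 7):
    """prefix[i] = hash of s[0:i]; pow_B[i] = B^i mod M."""
    n = len(s)
    prefix = [0] * (n + 1)
    pow_B = [1] * (n + 1)
    for i in range(1, n + 1):
        pow_B[i] = pow_B[i - 1] * B % M
        prefix[i] = (prefix[i - 1] * B + ord(s[i - 1])) % M
    return prefix, pow_B

def substr_hash(prefix, pow_B, i, j, M=10**9 + 7):
    """Hash of s[i:j] from the prefix tables."""
    return (prefix[j] - prefix[i] * pow_B[j - i]) % M

def direct_hash(t, B=31, M=10**9 + 7):
    """Hash of t computed from scratch in O(len(t))."""
    h = 0
    for c in t:
        h = (h * B + ord(c)) % M
    return h

def check_all_substrings(s):
    """True if the O(1) formula agrees with the from-scratch hash everywhere."""
    prefix, pow_B = build_hashes(s)
    n = len(s)
    return all(
        substr_hash(prefix, pow_B, i, j) == direct_hash(s[i:j])
        for i in range(n + 1)
        for j in range(i, n + 1)
    )

print(check_all_substrings("abracadabra"))  # True
```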

Time Complexity

Building prefix hashes: One pass over the string, O(1) per character. Precomputing pow_B is O(n). Total O(n).

Single substring hash: O(1) using prefix and pow_B.

Hash of a string (no prefix): O(k) for string of length k.

Space Complexity

Prefix array: O(n). Power array: O(n). So O(n) for the precomputed structures. Single hash value is O(1).

Collisions and Double Hashing

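Two different strings can have the same hash (collision). With one modulus M ≈ 10⁹ and well-mixed hashes, a single comparison of two different strings gives a false match with probability about 1/M. When n distinct strings are hashed and compared against each other (for example, stored in a set), the birthday bound applies: the chance that some pair collides is roughly n²/(2M), which is no longer negligible once n reaches about 10⁴ to 10⁵. To reduce collision risk, use two bases/moduli and compare both hashes; only if both match do we declare equality (and optionally confirm with a character-by-character comparison).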

# Double hash: (h1, h2) with (B1, M1) and (B2, M2)
# Equal only if h1 and h2 both match. Very low collision rate.
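One possible way to package double hashing is a small class that keeps one prefix table per (base, modulus) pair and returns a tuple of hashes. This is a minimal sketch: the class name DoubleHash and the parameter pairs (131, 10**9+7) and (137, 10**9+9) are assumptions chosen for illustration, not values prescribed by the course.

```python
class DoubleHash:
    """Prefix hashes of one string under two (base, modulus) pairs."""

    # Illustrative parameters; any two large primes with different bases work.
    PARAMS = ((131, 10**9 + 7), (137, 10**9 + 9))

    def __init__(self, s):
        n = len(s)
        self.tables = []
        for B, M in self.PARAMS:
            prefix = [0] * (n + 1)
            power = [1] * (n + 1)
            for i, c in enumerate(s, start=1):
                power[i] = power[i - 1] * B % M
                prefix[i] = (prefix[i - 1] * B + ord(c)) % M
            self.tables.append((prefix, power, M))

    def get(self, i, j):
        """Return (h1, h2) for s[i:j]; equal substrings give equal tuples."""
        return tuple(
            (prefix[j] - prefix[i] * power[j - i]) % M
            for prefix, power, M in self.tables
        )

dh = DoubleHash("abcabc")
print(dh.get(0, 3) == dh.get(3, 6))  # True: both are "abc"
print(dh.get(0, 3) == dh.get(1, 4))  # False: "abc" vs "bca"
```

Tuples compare element by element, so two substrings are treated as equal only when both hashes match.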

Edge Cases

Empty substring: s[i:i] has length 0. Hash is 0 by convention (prefix[i] − prefix[i]·B^0 = 0).
Single character: s[i:i+1] has hash = (prefix[i+1] − prefix[i]·B) % M = ord(s[i]) if we define prefix and pow correctly.
Full string: s[0:n] hash = prefix[n], which is the standard hash of the whole string.
Modulo: When subtracting, make sure the result ends up in [0, M−1]. Python's % already guarantees this for a positive M; in C++ or Java use (x % M + M) % M.

Common Mistakes

Forgetting to take modulo after each operation—intermediate values can overflow. In Python integers don't overflow, but taking mod at each step keeps numbers small and matches the math.
Wrong substring hash formula: it must be prefix[j] - prefix[i] * pow_B[j-i], not prefix[j] − prefix[i] (that would ignore the shift). Length of substring is j−i, so we multiply prefix[i] by B^(j−i).
Using a small modulus (e.g. 1000) or base that's too small—collisions become likely. Use prime M ≥ 10⁹ and base B > |alphabet|.

Common Mistake

The hash of s[i:j] is not prefix[j] − prefix[i]. It is prefix[j] − prefix[i]·B^(j−i), because prefix[i] represents the hash of the first i characters, which in the full prefix[j] are shifted by (j−i) positions. So you must multiply prefix[i] by B^(j−i) (i.e. pow_B[j-i]) before subtracting.

Optimization Insight

Precompute prefix hashes and powers once in O(n). Then any substring hash is O(1). For Rabin-Karp we don't need all prefix hashes: we use a rolling hash that updates the current window hash in O(1) when sliding by one character (add new char, remove old; topic 7.7). Pattern matching then runs in O(n + m) expected time (O(m) to hash the pattern, O(n) to scan the text) with O(1) extra space for the hash state.

Pattern Recognition

Use string hashing when you need to compare substrings quickly (e.g. "are these two substrings equal?", "longest duplicate substring," "number of distinct substrings") or when implementing Rabin-Karp. If the problem only needs exact pattern match once, naive or KMP might be simpler; hashing shines when you compare many substrings or when rolling window updates are natural.
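As an illustration of the "compare many substrings" case, here is a hedged sketch of the longest-duplicate-substring idea: binary search on the length L, with prefix hashes used to detect a repeated window of that length. The function name longest_duplicate_length and the choice B = 131 are assumptions; candidates are confirmed by direct comparison so a collision cannot produce a wrong answer.

```python
def longest_duplicate_length(s, B=131, M=10**9 + 7):
    """Length of the longest substring that occurs at least twice in s."""
    n = len(s)
    prefix = [0] * (n + 1)
    power = [1] * (n + 1)
    for i, c in enumerate(s, start=1):
        power[i] = power[i - 1] * B % M
        prefix[i] = (prefix[i - 1] * B + ord(c)) % M

    def has_duplicate(L):
        """True if some length-L substring appears at two start indices."""
        seen = {}  # hash -> start indices already seen with that hash
        for i in range(n - L + 1):
            h = (prefix[i + L] - prefix[i] * power[L]) % M
            for start in seen.get(h, ()):
                if s[start:start + L] == s[i:i + L]:  # rule out collisions
                    return True
            seen.setdefault(h, []).append(i)
        return False

    # If a duplicate of length L exists, one of length L-1 exists too,
    # so the predicate is monotone and binary search applies.
    lo, hi = 0, n - 1
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if has_duplicate(mid):
            lo = mid
        else:
            hi = mid - 1
    return lo

print(longest_duplicate_length("banana"))  # 3 ("ana")
```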

Expert Tip

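Common choices: B = 31 or 131, M = 10**9+7 or 10**9+9. For lowercase letters only, you can use ord(c) - ord('a') + 1 so values are 1–26; mapping 'a' to 0 would give "a", "aa", and "aaa" the same hash. For arbitrary ASCII/Unicode, ord(c) works, but then prefer a base larger than the largest character value (e.g. 131 for 7-bit ASCII) to stay consistent with the B > |alphabet| guideline. A large prime modulus keeps collisions rare.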

Interview Insight

"We can hash a string with a polynomial rolling hash: base B, modulus M, H = (c0*B^(k-1) + ... + c_{k-1}) mod M. With prefix hashes and precomputed powers, we get the hash of any substring in O(1). That lets us compare substrings in O(1) and is the basis for Rabin-Karp. Collisions are possible; we can use two moduli to make them very rare." Implement prefix build and substring hash; mention use in pattern matching (7.8).

Practice Problems

Implement polynomial hash and prefix hashes; compute hash of any substring in O(1).
Longest Duplicate Substring: use binary search on length + hashing to check if there's a duplicate of length L.
Compare two substrings of a string in O(1) after O(n) preprocessing (hash equality).

Summary

Polynomial rolling hash: H(s) = (c0·B^(k−1) + … + c_(k−1)) mod M. Same string → same hash; different strings usually different (collision possible).
Prefix hashes: prefix[i] = hash of s[0:i]. Build in O(n). Hash of s[i:j] = (prefix[j] − prefix[i]·B^(j−i)) mod M; use pow_B for B^(j−i).
Choose prime M (e.g. 10⁹+7), base B > alphabet size. Double hashing (two moduli) reduces collisions.
Used in Rabin-Karp (7.8) and for O(1) substring comparison after O(n) preprocessing.

7.7 Rolling Hash

Introduction

A rolling hash (or sliding window hash) is a way to update the hash of a fixed-length window when the window slides by one position: instead of recomputing the hash of the new window from scratch (O(m) for a window of length m), we remove the contribution of the character that left the window, shift the remaining hash, and add the new character—all in O(1). This is the engine behind Rabin-Karp pattern matching (7.8): we compute the pattern's hash once, then compute the hash of the first window of the text; for each subsequent position we roll the hash in O(1) and compare with the pattern hash. So we scan the text in O(n) with only O(1) work per position, giving O(n + m) total (m for the pattern, n for the text).

Real-World Analogy

Imagine a train with m cars (the window). Each car has a number (character value). The "hash" of the train is a combination of those numbers in order. When the train moves forward one track position, the front car leaves and a new car joins at the back. Instead of recalculating the whole combination from the new set of cars, we subtract the contribution of the car that left, shift the rest (as if every remaining car moved one position in the formula), and add the new car's contribution. One subtraction, one shift, one addition—constant time.

Example

Text T = "abcde", window length m = 3, base B = 31, mod M. Window at index 0: "abc" → hash H0 = a·31² + b·31 + c. Window at index 1: "bcd". Naive: recompute b·31² + c·31 + d. Rolling: H0 corresponds to (a, b, c). To get (b, c, d): remove a's contribution (a·31²), multiply the rest by 31 (so b·31² + c·31), then add d. So H1 = (H0 − a·31²)·31 + d = H0·31 − a·31³ + d. Precompute 31² and 31³ (or 31^(m−1) and 31^m) so each step is O(1).
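The arithmetic of this example can be reproduced directly. The sketch below uses ord(c) as the character value (an assumption; the example above leaves the values symbolic) and checks that one roll from "abc" gives the same hash as computing "bcd" from scratch.

```python
B, M = 31, 10**9 + 7
T = "abcde"
m = 3

def direct(t):
    """Hash of t computed from scratch."""
    h = 0
    for c in t:
        h = (h * B + ord(c)) % M
    return h

H0 = direct(T[0:3])             # hash of "abc"
high = pow(B, m - 1, M)         # 31^2 mod M, the weight of the leftmost char
# Remove 'a' (weight 31^2), shift by 31, append 'd'.
H1 = ((H0 - ord(T[0]) * high) * B + ord(T[3])) % M

print(H0, H1, direct("bcd"))    # 96354 97347 97347
```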

Formal Definition

Concept Note

Rolling hash update: Let the current window be T[i..i+m−1] with hash H = (T[i]·B^(m−1) + T[i+1]·B^(m−2) + … + T[i+m−1]) mod M. The next window is T[i+1..i+m] with hash H' = (T[i+1]·B^(m−1) + … + T[i+m]) mod M. We have H' = (H − T[i]·B^(m−1))·B + T[i+m], all mod M. So: subtract the leftmost character's contribution (times B^(m−1)), multiply by B (shift), add the new rightmost character. Precompute B^(m−1) mod M once; then each roll is O(1).

Why This Topic Matters

Rabin-Karp (7.8): Pattern matching uses one hash for the pattern and a rolling hash for the text window. Without the roll, each window would cost O(m), giving O(n·m) again; with the roll, O(n + m).
Fixed-length window problems: Any problem that asks "for every contiguous block of length m" (e.g. distinct substrings of length m, or "find all positions where window equals X") can use a rolling hash to update in O(1) per step.
Efficiency: Going from O(m) per window to O(1) per window is what makes hash-based pattern matching competitive with KMP.
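For the fixed-length window case, a minimal sketch of counting distinct length-m substrings with a rolling hash might look like this. The function name count_distinct_windows and B = 131 are illustrative assumptions; windows that share a hash are compared directly, so collisions cannot change the count.

```python
def count_distinct_windows(text, m, B=131, M=10**9 + 7):
    """Number of distinct substrings of length m in text."""
    n = len(text)
    if m <= 0 or m > n:
        return 0
    high = pow(B, m - 1, M)
    h = 0
    for c in text[:m]:
        h = (h * B + ord(c)) % M
    buckets = {h: [0]}  # hash -> start indices of distinct windows seen so far
    distinct = 1
    for i in range(1, n - m + 1):
        # Roll: drop text[i-1], shift, append text[i+m-1].
        h = ((h - ord(text[i - 1]) * high) * B + ord(text[i + m - 1])) % M
        starts = buckets.setdefault(h, [])
        if not any(text[s:s + m] == text[i:i + m] for s in starts):
            starts.append(i)
            distinct += 1
    return distinct

print(count_distinct_windows("abcabc", 3))  # 3: "abc", "bca", "cab"
```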

Mental Model

Current window hash = value of the string in the window. When we slide right by one: (1) the leftmost character "leaves"—subtract its contribution, which is left_char * B^(m-1); (2) the rest of the window "moves left" in our formula—multiply the remaining hash by B; (3) the new character "enters" at the right—add it (with factor B^0 = 1). So: new_hash = (old_hash - left_char * B^(m-1)) * B + new_char, then mod M. Keep B^(m-1) % M stored.

Step-by-Step: Rolling Update

Compute the hash of the first window T[0..m−1] (one pass, O(m)). Store as cur.
Precompute base_pow = B^(m−1) % M (or use a precomputed array from 7.6).
For i from 1 to n−m (each slide): left character = T[i−1], new character = T[i+m−1]. cur = ((cur - left_char * base_pow) * B + new_char) % M. In Python the result of % M is already in [0, M−1]; in languages where % can be negative, add M after the subtraction step.
After each update, cur is the hash of T[i..i+m−1]. Compare with pattern hash as needed.

ASCII Diagram: One Roll

  Window length m = 3.  Current window: T[i..i+2] = "abc"
  Hash H = a·B² + b·B + c

  Slide right by 1:  new window = "bcd"
  - Remove a:  H - a·B²   →  b·B + c
  - Shift:     (b·B + c)·B  =  b·B² + c·B
  - Add d:     b·B² + c·B + d  =  new hash H'

  So  H' = (H - a·B²)·B + d

Python Implementation

Compute First Window Hash

def first_window_hash(s, m, B=31, M=10**9+7):
    """Hash of s[0:m] (length m)."""
    h = 0
    for i in range(m):
        h = (h * B + ord(s[i])) % M
    return h

Rolling Hash: Update One Step

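def roll_hash(cur, left_char, new_char, base_pow, B=31, M=10**9+7):
    """Given hash of window [i..i+m-1], return hash of [i+1..i+m].
    left_char = T[i], new_char = T[i+m]."""
    # Remove the leftmost character's contribution (weight B^(m-1)).
    cur = (cur - ord(left_char) * base_pow) % M
    # Shift the remaining characters by one position and append the new one.
    # Python's % with a positive modulus is never negative, so no fix-up is needed.
    return (cur * B + ord(new_char)) % M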

Full Loop: All Window Hashes (Rabin-Karp Style)

def rolling_hashes(text, m, B=31, M=10**9+7):
    """Yield hash of text[i:i+m] for i = 0, 1, ..., n-m."""
    n = len(text)
    if m > n:
        return
    base_pow = pow(B, m - 1, M)
    cur = first_window_hash(text, m, B, M)
    yield cur
    for i in range(1, n - m + 1):
        cur = roll_hash(cur, text[i-1], text[i+m-1], base_pow, B, M)
        yield cur

Line-by-Line: Roll Formula

cur is (T[i-1]·B^(m−1) + T[i]·B^(m−2) + … + T[i+m−2]). We want the hash of T[i..i+m−1], i.e. T[i]·B^(m−1) + … + T[i+m−1]. Subtract T[i−1]·B^(m−1) from cur to get (T[i]·B^(m−2) + … + T[i+m−2]). Multiplying by B gives (T[i]·B^(m−1) + … + T[i+m−2]·B). Adding T[i+m−1] gives the new hash. So cur = (cur - ord(text[i-1])*base_pow)*B + ord(text[i+m-1]), then mod M.

Time Complexity

First window: O(m). Each roll: O(1). Total for n−m+1 windows: O(m) + (n−m)·O(1) = O(n). So we get the hash of every length-m window in O(n), which is the key to Rabin-Karp's O(n + m).

Space Complexity

O(1) for the current hash, the base power, and a few variables. We don't store all window hashes unless we need them; in Rabin-Karp we only compare with the pattern hash. O(1) auxiliary space.

Edge Cases

m > n: No window; don't run the loop or return empty.
m == 1: base_pow = B^0 = 1. Roll: cur = (cur - left_char)*B + new_char; correct.
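Negative modulo: The difference cur - left_char * base_pow can be negative, but in Python (cur - left_char * base_pow) % M is always in [0, M−1], so no fix-up is needed. In languages where % can return a negative value (C++, Java), use (x % M + M) % M before multiplying by B.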
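These edge cases can be exercised with a compact, self-contained version of the window generator. This is a sketch under the same B and M defaults as above; the helper scratch_hash is a hypothetical name used only for the comparison.

```python
def rolling_hashes(text, m, B=31, M=10**9 + 7):
    """Yield the hash of text[i:i+m] for every valid start index i."""
    n = len(text)
    if m <= 0 or m > n:
        return  # no window fits
    high = pow(B, m - 1, M)
    cur = 0
    for c in text[:m]:
        cur = (cur * B + ord(c)) % M
    yield cur
    for i in range(1, n - m + 1):
        cur = ((cur - ord(text[i - 1]) * high) * B + ord(text[i + m - 1])) % M
        yield cur

def scratch_hash(t, B=31, M=10**9 + 7):
    """Hash of t computed from scratch, for cross-checking."""
    h = 0
    for c in t:
        h = (h * B + ord(c)) % M
    return h

text = "the quick brown fox"
print(list(rolling_hashes("ab", 3)))  # [] because m > n
for m in (1, 4, len(text)):           # single char, typical, whole string
    rolled = list(rolling_hashes(text, m))
    expected = [scratch_hash(text[i:i + m]) for i in range(len(text) - m + 1)]
    assert rolled == expected
    assert all(0 <= h < 10**9 + 7 for h in rolled)  # never negative in Python
```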

Common Mistakes

Using the wrong power: we subtract left_char * B^(m−1), not B^m. The leftmost character in the window has weight B^(m−1) in the polynomial.
Off-by-one in indices: when at position i, the window is T[i..i+m−1]; the character that just left is T[i−1], the new one is T[i+m−1].
Forgetting to take mod after each operation can lead to huge numbers (in other languages, overflow). In Python it's correct but slow; taking mod keeps values bounded.

Common Mistake

The leftmost character in the current window has coefficient B^(m−1), not B^m. So we subtract left_char * B^(m−1). If you use B^m you're shifting one extra time and the hash will be wrong. Precompute base_pow = pow(B, m - 1, M).

Optimization Insight

Without rolling: n−m+1 windows, each hash computed in O(m) → O(n·m). With rolling: O(m) for the first window, then O(1) per slide → O(n). The only extra storage is B^(m−1) mod M and the current hash. This is why Rabin-Karp can be O(n + m) instead of O(n·m).

Connection to Rabin-Karp (7.8)

In Rabin-Karp we compute the pattern hash once (O(m)). Then we iterate over start indices i = 0..n−m: for i=0 we get the first window hash (O(m)); for i≥1 we roll from the previous hash (O(1)). When the current window hash equals the pattern hash, we optionally verify with a character-by-character check (to handle collisions). Total O(n + m). Topic 7.8 implements the full algorithm.

Expert Tip

Use the same base and modulus as in 7.6 for consistency. When implementing Rabin-Karp, compute the pattern hash with the same formula as the first window (so that matching strings have equal hashes). For double hashing, maintain two rolling hashes with two (B, M) pairs and compare both.

Interview Insight

"When we slide the window right by one, we remove the left character and add the right. The hash update is: subtract left_char·B^(m−1), multiply by B, add new_char—all mod M. That's O(1) per position, so we can get every length-m window hash in O(n). That's the rolling hash used in Rabin-Karp." Implement first_window_hash and roll_hash; mention that B^(m−1) is precomputed.

Practice Problems

Implement rolling hash: first window, then roll step. Verify that hashes match prefix-based substring hash for the same window.
Rabin-Karp (7.8): use rolling hash to find all positions where pattern occurs in text.
Find all distinct substrings of length m in a string: use a set of rolling hashes (or double hash) to count distinct windows.

Summary

Rolling hash: Update the hash of a length-m window in O(1) when sliding right by one. Formula: new_hash = (old_hash − left_char·B^(m−1))·B + new_char (mod M).
Precompute B^(m−1) % M once. First window hash computed in O(m); then each of the (n−m) rolls is O(1) → total O(n).
Used in Rabin-Karp (7.8) for O(n + m) pattern matching. Also for any fixed-length sliding window problem where window equality is checked via hash.

7.8 Rabin-Karp Algorithm

Introduction

The Rabin-Karp algorithm is a pattern-matching algorithm that uses hashing (7.6) and a rolling hash (7.7) to find all occurrences of a pattern P in text T in O(n + m) average time, where n = |T| and m = |P|. Instead of comparing the pattern character-by-character at every start index (naive O(n·m)), we compute the hash of the pattern once, then scan the text with a sliding window of length m, updating the window hash in O(1) per step. When the window hash equals the pattern hash, we have a candidate match; we then optionally verify with a character-by-character check to rule out hash collisions. Rabin-Karp is simple to implement, extends naturally to multiple patterns or 2D matching, and is a classic interview topic alongside KMP (7.9).

Real-World Analogy

Imagine you have a stencil (the pattern) and you're sliding it along a long strip of text. Instead of comparing every letter under the stencil with the stencil at each position, you assign a "fingerprint" (hash) to the stencil and a fingerprint to whatever is under the window. You slide the window one step, update the fingerprint in constant time (rolling hash), and compare. When the fingerprints match, you double-check by looking at the letters—in case two different texts had the same fingerprint (collision). So you do fast filter by hash, then confirm when needed.

Example

Text T = "cababc", pattern P = "ab". Pattern hash H(P) = h("ab"). First window T[0:2] = "ca" → H("ca") ≠ H(P). Roll: window "ab" at index 1 → H("ab") = H(P) → candidate; verify: T[1:3] == "ab" ✓. Roll: window "ba" at index 2 → no match. Roll: window "ab" at index 3 → hash match, verify: T[3:5] == "ab" ✓. Roll: window "bc" at index 4 → no match. Result: [1, 3].

Formal Definition

Concept Note

Rabin-Karp: (1) Compute hash H(P) of pattern P (length m) using the same polynomial hash as in 7.6. (2) Compute hash of T[0..m−1]. (3) For i = 0..n−m: if hash(T[i..i+m−1]) = H(P), then either report i as a match or verify by comparing T[i..i+m−1] with P character-by-character; then, if i < n−m, update the window hash to T[i+1..i+m] using the rolling update (7.7). Same base B and modulus M for pattern and text. Verification: Each verified candidate costs up to O(m), whether it turns out to be a true match or a false positive. If almost every window is a candidate (many collisions, or many true matches, as when a text of n copies of "a" is searched for a pattern of m copies of "a"), the total is O(n·m). With a good hash and few matches, expected verification cost is low → O(n + m) average.

Why This Topic Matters

O(n + m) pattern matching: With rolling hash, we avoid O(m) work per position; combined with sparse verification, average time is linear. Alternative to KMP (7.9) with different trade-offs.
Multiple patterns: Rabin-Karp extends to searching for k patterns of the same length m at once: compute k pattern hashes, put them in a set; at each window, check if the window hash is in the set. KMP would need k passes or Aho-Corasick.
2D and other variants: Rolling hash can be extended to 2D (e.g. find a small image in a large one) by hashing rows and then columns. Conceptually the same idea.
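The multiple-pattern idea from the list above can be sketched as follows, assuming all patterns have the same length (a single rolling window cannot cover mixed lengths). The function name find_any and B = 131 are illustrative assumptions; a hash hit is confirmed by checking the actual window against the patterns that share that hash.

```python
def find_any(text, patterns, B=131, M=10**9 + 7):
    """Return (index, pattern) pairs for every occurrence of any pattern.

    All patterns must have the same length m.
    """
    patterns = set(patterns)
    if not patterns:
        return []
    m = len(next(iter(patterns)))
    if any(len(p) != m for p in patterns):
        raise ValueError("all patterns must have the same length")
    n = len(text)
    if m == 0 or m > n:
        return []

    def poly_hash(s):
        h = 0
        for c in s:
            h = (h * B + ord(c)) % M
        return h

    by_hash = {}  # hash -> set of patterns with that hash
    for p in patterns:
        by_hash.setdefault(poly_hash(p), set()).add(p)

    high = pow(B, m - 1, M)
    cur = poly_hash(text[:m])
    hits = []
    for i in range(n - m + 1):
        candidates = by_hash.get(cur)
        if candidates:
            window = text[i:i + m]
            if window in candidates:  # verify, in case of a collision
                hits.append((i, window))
        if i < n - m:
            cur = ((cur - ord(text[i]) * high) * B + ord(text[i + m])) % M
    return hits

print(find_any("abracadabra", ["abr", "cad"]))
# [(0, 'abr'), (4, 'cad'), (7, 'abr')]
```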

Mental Model

One pattern hash. One window hash that rolls. At each position: if window hash == pattern hash, we have a candidate—verify (optional but recommended) and report. Then roll the window hash to the next position. Repeat. Verification is the safety net for collisions.

Step-by-Step Algorithm

If m > n, return [] (no possible match).
Choose B and M (e.g. B=31, M=10⁹+7). Precompute base_pow = B^(m−1) % M.
Compute pattern_hash = hash of P[0..m−1] (same formula as first window: H = (H*B + ord(c)) % M).
Compute window_hash = hash of T[0..m−1].
For i from 0 to n−m: if window_hash == pattern_hash, optionally verify T[i..i+m−1] == P; if match (or skip verify), append i to result. If i < n−m, roll: window_hash = (window_hash - ord(T[i])*base_pow)*B + ord(T[i+m]), then mod M.
Return the list of start indices.

Python Implementation

def rabin_karp(text, pattern, B=31, M=10**9+7):
    n, m = len(text), len(pattern)
    if m > n:
        return []

    base_pow = pow(B, m - 1, M)

    def hash_str(s, length):
        h = 0
        for i in range(length):
            h = (h * B + ord(s[i])) % M
        return h

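    def roll(h, left_c, new_c):
        # Drop the leftmost character (weight B^(m-1)), shift by B, append new_c.
        # Python's % with a positive modulus is never negative, so no fix-up is needed.
        h = (h - ord(left_c) * base_pow) % M
        return (h * B + ord(new_c)) % M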

    pattern_hash = hash_str(pattern, m)
    window_hash = hash_str(text, m)
    result = []

    for i in range(n - m + 1):
        if window_hash == pattern_hash:
            if text[i:i+m] == pattern:   # verify to avoid false positive
                result.append(i)
        if i < n - m:
            window_hash = roll(window_hash, text[i], text[i + m])
    return result

Line-by-Line Explanation

hash_str(s, length): Same polynomial hash as 7.6—ensures H(P) and H(T[i..i+m−1]) use the same formula so equal strings have equal hashes.
roll: Subtract left character's contribution (× B^(m−1)), multiply by B, add new character; the % M after each step keeps the result in [0, M−1] (Python's % never returns a negative value for a positive modulus).
When window_hash == pattern_hash, we verify with text[i:i+m] == pattern so that we only report true matches. Without verification, collisions would produce false positives.
We roll only when i < n - m so we don't read past the end of the text.

Time Complexity

Preprocessing: Pattern hash O(m), base_pow O(log m) multiplications with pow(B, m−1, M). First window hash O(m). Main loop: n−m+1 iterations; each iteration: O(1) hash comparison, O(1) roll (when i < n−m). When we verify, we do up to O(m) character comparisons. Average case: Few hash matches (and few collisions), so verification cost is low → O(n + m). Worst case: If nearly every window is verified, because of many hash collisions (pathological input or bad M) or simply many true matches, the total is O(n·m). With a large prime M and good B, collision-driven worst cases are rare in practice.

Space Complexity

O(1) auxiliary: pattern_hash, window_hash, base_pow, and loop variables. Result list is O(number of matches). So O(1) extra space excluding output.

Verification: Why We Need It

Hash equality only implies likely string equality. Two different strings can have the same hash (collision). So when window_hash == pattern_hash, we must either trust the hash (risking false positives) or verify. Verifying takes O(m) per candidate but keeps the output correct. With a single modulus M ≈ 10⁹ and non-adversarial input, the chance of a collision per comparison is about 1/M, so over n positions the expected number of false positives is about n/M, well below 1 for typical n. Double hashing (two moduli) reduces the chance further; we can then skip verification in non-critical settings, or verify only when both hashes match.
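To see why verification matters, it can help to run the algorithm with a deliberately tiny modulus, where collisions become common. The sketch below returns both the raw hash hits and the verified hits; the function name rabin_karp_candidates and the choice M = 101 are assumptions made only for this demonstration, not recommended parameters.

```python
def rabin_karp_candidates(text, pattern, B=31, M=101):
    """Return (hash_hits, verified_hits) using a deliberately small modulus."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return [], []
    high = pow(B, m - 1, M)

    def poly_hash(s):
        h = 0
        for c in s:
            h = (h * B + ord(c)) % M
        return h

    target = poly_hash(pattern)
    cur = poly_hash(text[:m])
    hash_hits, verified_hits = [], []
    for i in range(n - m + 1):
        if cur == target:
            hash_hits.append(i)              # hash says "maybe"
            if text[i:i + m] == pattern:
                verified_hits.append(i)      # characters say "yes"
        if i < n - m:
            cur = ((cur - ord(text[i]) * high) * B + ord(text[i + m])) % M
    return hash_hits, verified_hits

text = "abcdefghijklmnopqrstuvwxyz" * 20
hash_hits, verified = rabin_karp_candidates(text, "xyz")
print(len(hash_hits), len(verified))
```

Every verified hit is also a hash hit, but not the other way round: any extra entries in hash_hits are false positives that would have been reported without the character check.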

Edge Cases

m > n: Return [] without looping.
m == n: One window; compare hash, verify once.
Empty pattern: Usually not defined; if m == 0, all indices could be considered matches—check problem.
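Negative modulo: In Python, (h - ord(left_c)*base_pow) % M is always in [0, M−1] because % with a positive modulus never returns a negative value, so no fix-up is needed. In C++ or Java the same expression can be negative; there, use ((x % M) + M) % M before multiplying by B.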

Common Mistakes

Using a different hash formula for the pattern and the text window—they must be identical so that equal strings get equal hashes.
Rolling when i == n−m (reading T[n] when 0-indexed length is n). Condition must be if i < n - m before rolling.
Skipping verification entirely when correctness is required; hashes can collide.

Common Mistake

Rolling the hash after the last valid window: when i = n−m, the window is T[n−m..n−1]. There is no "next" window, so do not call roll with text[n]. Only roll when i < n - m.

Rabin-Karp vs KMP

Aspect	Rabin-Karp	KMP (7.9)
Average time	O(n + m)	O(n + m)
Worst time	O(n·m) with many collisions or many matches (when verifying)	O(n + m)
Idea	Hash + roll; verify on match	Failure function; no backtrack in text
Multiple patterns	Easy (set of hashes)	Need Aho-Corasick or k passes

Optimization Insight

Rabin-Karp gives O(n + m) in practice with a good hash and verification. For guaranteed O(n + m) worst case with no hashing, use KMP (7.9). For "find any of k patterns" (all of length m), Rabin-Karp needs O(m·k) to build the set of pattern hashes and then one O(n) expected scan of the text, checking at each window whether its hash is in the set: O(n + m·k) in total. Double hashing reduces false positives so verification is rarely triggered by a collision.

Pattern Recognition

Use Rabin-Karp when you need exact pattern match in O(n + m) and hashing is acceptable, or when you need to search for multiple patterns in one pass. If the problem forbids hashing or requires worst-case linear time with no probability, prefer KMP.

Expert Tip

Use the same B and M as in 7.6 and 7.7. In contests, M = 10**9+7 or 10**9+9 is standard. For maximum safety use double hashing and verify only when both hashes match; or always verify for correctness. Rabin-Karp is easier to code than KMP for many people—rolling hash + one loop.

Interview Insight

"We hash the pattern once and the first window of the text. Then we slide the window one position at a time, updating the hash in O(1) with a rolling hash. When the window hash equals the pattern hash, we verify character-by-character to avoid false positives from collisions. Average time O(n + m)." Implement the loop with hash_str and roll; mention edge case m > n and rolling only when i < n−m.

Practice Problems

Find all occurrences of pattern in text (Rabin-Karp).
Implement strStr() with Rabin-Karp (return first index).
Search for multiple patterns in text: compute set of pattern hashes, scan text with rolling hash, check membership.

Summary

Rabin-Karp: Pattern matching using polynomial hash + rolling hash. Compute H(P) and H(T[0..m−1]); for each start i, if hashes match then verify and report i; then roll to next window. O(n + m) average.
Same hash formula for pattern and window. Roll only when i < n−m. Verify on hash match to avoid false positives.
Worst case O(n·m) when nearly every window is verified (many collisions or many true matches); use large prime M (and optionally double hash). KMP (7.9) gives guaranteed O(n + m) with no hashing.

7.9 KMP Algorithm

Introduction

The KMP (Knuth-Morris-Pratt) algorithm finds all occurrences of a pattern P in text T in O(n + m) time with no backtracking in the text. The naive method can re-scan the same text characters many times when a mismatch occurs; KMP avoids that by precomputing a failure function (also called lps—longest proper prefix that is also a suffix) on the pattern. When a mismatch happens at pattern index j, we don't move the text pointer back; we shift the pattern using lps so that we reuse the already-matched prefix and only advance the text pointer forward. The result is a single forward pass over the text and a bounded number of pattern shifts, giving guaranteed O(n + m) worst-case time and O(m) space.

Real-World Analogy

Imagine the pattern is a ruler with notches. You slide the ruler along the text. When the notches align with the text, you have a match. When they don't, the naive approach would lift the ruler and try the next position from scratch. KMP says: "We already know the first few characters of the ruler matched the text—so we can slide the ruler so that the next possible alignment uses that matched part again." The lps array tells us how far we can slide the ruler (how much of the pattern is a prefix that matches a suffix we already saw) so we never re-read the text backward.

Example

Pattern P = "ababc". After matching "abab" and then failing on the next character, we know "ab" is a prefix of P that also appears as a suffix of the matched "abab". So we can shift P so that the first "ab" in P aligns with the second "ab" we already matched in the text; we don't need to go back in the text. The lps array for "ababc" is [0, 0, 1, 2, 0]: lps[0]=0 ("a" has no non-empty proper border), lps[1]=0 ("ab": "a" is not a suffix), lps[2]=1 ("aba": border "a"), lps[3]=2 ("abab": border "ab"), lps[4]=0 ("ababc": no prefix ends in "c"). When we mismatch after matching "abab", the pattern index is j=4, so we set j = lps[j−1] = lps[3] = 2 and continue comparing the same text character with P[2].

Formal Definition

Concept Note

LPS (longest proper prefix that is also a suffix): For the substring P[0..i], a proper prefix is a prefix not equal to the whole string. lps[i] = length of the longest proper prefix of P[0..i] that is also a suffix of P[0..i]. Example: P = "aabaab"; for i=5 we have "aabaab"; proper prefixes "a","aa","aab","aaba","aabaa"; suffixes "b","ab","aab","baab","abaab"; the longest that appears in both is "aab" (length 3) → lps[5]=3. KMP search: Maintain text index i and pattern index j. If T[i]=P[j], increment both. If not and j>0, set j = lps[j−1] (shift pattern); if j=0, increment i. When j=m, we found a match at i−m; then set j = lps[j−1] to find next overlap.
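The definition can be turned into a direct (slow) computation that is useful for checking intuition or testing a faster implementation. This brute-force sketch is not the KMP construction; the name lps_by_definition is hypothetical and the running time is cubic in the worst case.

```python
def lps_by_definition(p):
    """lps[i] = length of the longest proper prefix of p[0..i] that is also a suffix."""
    out = []
    for i in range(len(p)):
        sub = p[:i + 1]
        best = 0
        for k in range(1, len(sub)):   # k < len(sub) keeps the prefix proper
            if sub[:k] == sub[-k:]:
                best = k               # later k is longer, so keep the last hit
        out.append(best)
    return out

print(lps_by_definition("aabaab"))  # [0, 1, 0, 1, 2, 3]
print(lps_by_definition("ababc"))   # [0, 0, 1, 2, 0]
```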

Why This Topic Matters

Guaranteed O(n + m): No hashing, no probability; worst-case linear time. Preferred when you need deterministic performance.
No backtracking in text: The text pointer i never decreases. Useful for streaming or when the text is read once.
Foundation for other algorithms: The same "failure function" idea appears in Aho-Corasick (7.14) and in problems like "repeated substring," "shortest period."

Mental Model

We have two pointers: i in the text (never goes back), j in the pattern. Match → both advance. Mismatch → we don't move i back; instead we ask: "What's the longest prefix of P[0..j−1] that is also a suffix?" That length is lps[j−1]. We set j = lps[j−1] and compare T[i] with P[j] again (so we've effectively shifted the pattern). If j=0 and still mismatch, then we advance i. When j reaches m, we found a match at i−m; then set j = lps[m−1] to continue searching for the next occurrence.

Building the LPS Array

We want lps[i] = length of longest proper prefix of P[0..i] that is also a suffix. We build it with two pointers: length = current longest border length, i = current index (1 to m−1). If P[i] == P[length], then lps[i] = length + 1, and we increment both. If not and length > 0, set length = lps[length−1] (try a shorter border) without advancing i. If length == 0, then lps[i] = 0 and i++.

def build_lps(pattern):
    """LPS[i] = length of longest proper prefix of P[0..i] that is also a suffix."""
    m = len(pattern)
    lps = [0] * m
    length = 0  # length of current longest border
    i = 1
    while i < m:
        if pattern[i] == pattern[length]:
            length += 1
            lps[i] = length
            i += 1
        else:
            if length != 0:
                length = lps[length - 1]
            else:
                lps[i] = 0
                i += 1
    return lps

KMP Search Algorithm

Build lps for the pattern (O(m)).
i = 0 (text), j = 0 (pattern).
While i < n: if T[i] == P[j], then i++, j++. If j == m, we found a match at i−m; append to result; set j = lps[j−1] to continue. Else (mismatch): if j > 0, set j = lps[j−1]; else i++.
Return list of start indices.

Python Implementation

def kmp_search(text, pattern):
    n, m = len(text), len(pattern)
    if m > n:
        return []
    if m == 0:
        return list(range(n + 1))  # problem-dependent

    lps = build_lps(pattern)
    result = []
    i = j = 0
    while i < n:
        if text[i] == pattern[j]:
            i += 1
            j += 1
            if j == m:
                result.append(i - m)
                j = lps[j - 1]
        else:
            if j > 0:
                j = lps[j - 1]
            else:
                i += 1
    return result

Line-by-Line: Build LPS

length holds the length of the longest border for the prefix we've extended so far. When P[i] == P[length], extending by one character gives a longer border, so lps[i] = length + 1.
When P[i] != P[length], we can't extend the current border. We try the next shorter candidate: the longest border of P[0..length−1] has length lps[length−1], so set length = lps[length−1] and recheck (without advancing i yet).
When length == 0, there is no border; lps[i] = 0 and we move i forward.

Line-by-Line: Search

Match: advance both i and j. When j == m, we've matched the whole pattern → record start i−m, then set j = lps[m−1] so we can find the next occurrence that might overlap (e.g. pattern "aa" in "aaa" gives matches at 0 and 1).
Mismatch and j > 0: shift pattern by setting j = lps[j−1]; we don't move i, so we compare T[i] with P[j] again.
Mismatch and j == 0: no border to fall back to; advance i.

ASCII Diagram: LPS and Shift

  P = "ababc"   (m=5)

  Proper prefix that is also suffix (borders):
  P[0..0] "a"   → none (proper prefix only "")     → lps[0]=0
  P[0..1] "ab"  → none                            → lps[1]=0
  P[0..2] "aba" → "a"                             → lps[2]=1
  P[0..3] "abab"→ "ab"                            → lps[3]=2
  P[0..4] "ababc"→ none (suffix "c" no prefix "c")→ lps[4]=0

  Text "abababc", match "abab" then mismatch at next char.
  Shift: j was 4, lps[3]=2 so j=2. Now we compare with P[2]='a' without moving text pointer.

Time Complexity

Build LPS: The fallback step (length = lps[length−1]) can run several times for the same i, but length strictly decreases each time and only increases by 1 when i advances, so the total number of fallbacks is at most m. Amortized O(m). Search: Each iteration either advances i (at most n times) or strictly decreases j via j = lps[j−1]; since j only increases on a match (at most n times), it can decrease at most n times. So the search loop runs at most 2n iterations. Overall O(n + m).
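The amortized bound can be observed by counting character comparisons in the search loop. The sketch below is an instrumented variant of the search; the function name kmp_count and the comparison counter are additions for illustration only.

```python
def build_lps(pattern):
    """Standard failure-function construction, O(m)."""
    lps = [0] * len(pattern)
    length, i = 0, 1
    while i < len(pattern):
        if pattern[i] == pattern[length]:
            length += 1
            lps[i] = length
            i += 1
        elif length:
            length = lps[length - 1]   # fall back, do not advance i
        else:
            i += 1
    return lps

def kmp_count(text, pattern):
    """Return (match start indices, number of text/pattern comparisons)."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return [], 0
    lps = build_lps(pattern)
    matches, comparisons = [], 0
    i = j = 0
    while i < n:
        comparisons += 1               # one text[i] vs pattern[j] comparison
        if text[i] == pattern[j]:
            i += 1
            j += 1
            if j == m:
                matches.append(i - m)
                j = lps[j - 1]
        elif j:
            j = lps[j - 1]
        else:
            i += 1
    return matches, comparisons

text = "a" * 200 + "b"
matches, comparisons = kmp_count(text, "a" * 10 + "b")
print(matches, comparisons <= 2 * len(text))  # [190] True
```

On this input the naive method would re-compare about 10 characters at most start positions, while the counter here stays within 2n.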

Space Complexity

LPS array is O(m). A few variables for i, j. O(m) space.

Edge Cases

m > n: Return [] without building lps or searching.
m == 0: Often "empty pattern matches everywhere"; return [0,1,...,n] or as per problem.
No match: Return [].
Overlapping matches: e.g. P = "aa", T = "aaa". After match at 0, we set j = lps[1] = 1; then we compare T[2] with P[1] and get match at 1. So overlapping matches are found correctly.

Common Mistakes

In build_lps, advancing i when we set length = lps[length−1]—we should not advance i until we've either set lps[i] or determined length==0. The standard code only increments i when pattern[i]==pattern[length] or when length==0.
In search, after finding a match (j==m), forgetting to set j = lps[j−1] and instead setting j=0—then we miss overlapping matches (e.g. "aa" in "aaa").
Confusing 0-indexed lps: lps[j−1] is the border length for P[0..j−1]; the next position to try in the pattern is index lps[j−1], so j = lps[j−1] is correct.

Common Mistake

After finding a match (j == m), set j = lps[j - 1] (i.e. lps[m−1]) so that we can detect the next occurrence that might overlap with the current one. If you set j = 0, you will miss overlaps like pattern "aa" in text "aaa" (matches at 0 and 1).

Optimization Insight

KMP never backs up in the text: the search loop makes at most 2n character comparisons in total, because each comparison either advances i or shrinks j, and j can shrink at most as many times as it has grown. So for one pattern, KMP is the standard O(n + m) solution with no hashing. For multiple patterns, use Aho-Corasick (7.14), which generalizes the failure function to a trie of patterns.

Pattern Recognition

Use KMP when you need exact pattern match with guaranteed O(n + m), when you can't use hashing, or when the problem asks for "failure function" or "border" of a string. Also for "shortest period," "repeated substring" (pattern = string, search in string+string).
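As a sketch of the "shortest period" and "repeated substring" uses: for a string of length n, the shortest period is n − lps[n−1]. The helper names below are hypothetical, and `build_lps` is written in a compact for-loop form that may differ from the course's listing:

```python
def build_lps(p):
    """Failure function: lps[i] = longest proper border length of p[:i+1]."""
    lps = [0] * len(p)
    length = 0
    for i in range(1, len(p)):
        while length > 0 and p[i] != p[length]:
            length = lps[length - 1]
        if p[i] == p[length]:
            length += 1
        lps[i] = length
    return lps


def shortest_period(s):
    """Smallest p >= 1 with s[i] == s[i + p] for all valid i (p need not divide len(s))."""
    if not s:
        return 0
    return len(s) - build_lps(s)[-1]


def is_repeated_substring(s):
    """True if s is some proper substring repeated two or more times."""
    n = len(s)
    if n == 0:
        return False
    p = shortest_period(s)
    return p < n and n % p == 0


print(shortest_period("abcabcab"))         # 3
print(is_repeated_substring("abcabcabc"))  # True
print(is_repeated_substring("aba"))        # False
```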

Expert Tip

Memorize: lps[i] = longest proper prefix of P[0..i] that is also a suffix. Build with two pointers (length, i); on match extend border; on mismatch shorten with length = lps[length−1]. Search: match → i++, j++; j==m → report, j=lps[j−1]; mismatch and j>0 → j=lps[j−1]; else i++.

Interview Insight

"KMP precomputes an lps array on the pattern: lps[i] is the length of the longest proper prefix of P[0..i] that is also a suffix. When we mismatch at j, we set j = lps[j−1] and don't move the text pointer—so we never backtrack in the text. That gives O(n + m) worst case. After a full match we set j = lps[m−1] to find overlapping matches." Implement build_lps and the search loop; mention overlap handling.

Practice Problems

Implement strStr() / Find first occurrence using KMP.
Find all occurrences of pattern in text (KMP).
Repeated Substring Pattern: is s equal to a proper substring repeated? (Concatenate s with itself, search for s in (s+s)[1:-1] with KMP.)
Shortest Palindrome: add characters in front to make palindrome (use KMP on s + '#' + reverse(s), use lps).

Summary

KMP: Pattern matching in O(n + m) using an lps (failure) array. Text pointer never goes backward.
LPS[i] = length of longest proper prefix of P[0..i] that is also a suffix. Build in O(m) with two pointers.
Search: Match → i++, j++. j==m → report i−m, j = lps[j−1]. Mismatch: j>0 → j = lps[j−1]; else i++.
After a match, set j = lps[m−1] to catch overlapping occurrences. Guaranteed O(n + m); no hashing.

7.10 Z Algorithm

Introduction

The Z algorithm (or Z-box algorithm) computes an array Z for a string S such that Z[i] is the length of the longest substring starting at position i that matches a prefix of S. So Z[0] = n (the whole string is a prefix of itself), and for i > 0, Z[i] tells us how many characters starting at i match S[0], S[1], …. We build the Z array in O(n) time using an invariant: we maintain a "Z-box" [L, R] meaning S[L..R] = S[0..R−L] (the substring from L matches the prefix of the same length). The Z algorithm is used for pattern matching: form S = P + '$' + T (with '$' a character not in P or T); then for each index i in the T part, if Z[i] = len(P), the pattern occurs at that position in T. Like KMP, it gives O(n + m) pattern matching with no hashing.

Real-World Analogy

Imagine you have a ribbon with a repeating pattern at the start. For each position along the ribbon, you ask: "How far to the right does the ribbon match the pattern starting at the beginning?" The Z array stores that length at each position. We compute it efficiently by reusing what we already know: if we've computed a stretch [L, R] that matches the prefix, then for positions inside that stretch we can copy information from the corresponding position in the prefix (with a small adjustment) and only extend when necessary. So we avoid re-comparing from scratch at every index.

Example

String S = "aabxaabxcaabxaabxay". Z[0] = len(S) = 19. Z[1] = 1 (S[1]='a' matches S[0], but S[2]='b' ≠ S[1]='a'). Z[2] = 0. Z[3] = 0. Z[4] = 4 (S[4..7] = "aabx" = S[0..3], then S[8]='c' ≠ S[4]='a'). Z[5] = 1. For pattern matching, take P = "ab" and T = "cababc", so S = "ab" + "$" + "cababc" = "ab$cababc" and m = 2. P occupies indices 0..m−1 of S, the separator sits at index m, and T starts at index m+1 = 3, so S-index i corresponds to T-index i−(m+1). Here Z[4] = 2 and Z[6] = 2, both equal to m, so S[4..5] and S[6..7] equal P and the pattern occurs in T at indices 4−3 = 1 and 6−3 = 3. In general, whenever Z[i] = m with i ≥ m+1, there is a match at T-index i−m−1.
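Z values for small examples like these can be checked against a brute-force reference. The function `z_naive` below is a hypothetical O(n²) helper meant only for verification, not the linear-time algorithm:

```python
def z_naive(s):
    """O(n^2) reference: compare s[i:] with s character by character."""
    n = len(s)
    z = [0] * n
    for i in range(n):
        k = 0
        while i + k < n and s[i + k] == s[k]:
            k += 1
        z[i] = k
    return z


print(z_naive("aabxaabxcaabxaabxay")[:6])  # [19, 1, 0, 0, 4, 1]

# Pattern matching on S = P + '$' + T with P = "ab", T = "cababc".
p, t = "ab", "cababc"
z = z_naive(p + "$" + t)
print(z)  # [9, 0, 0, 0, 2, 0, 2, 0, 0]
print([i - len(p) - 1 for i in range(len(p) + 1, len(z)) if z[i] == len(p)])  # [1, 3]
```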

Formal Definition

Concept Note

Z[i]: For string S of length n, Z[i] = length of the longest substring S[i..i+k−1] that equals the prefix S[0..k−1]. So S[i..i+Z[i]−1] = S[0..Z[i]−1]. By convention Z[0] = n (we can define it as length of the string, or sometimes we don't use Z[0] for pattern matching). Z-box [L, R]: We maintain L and R such that S[L..R] = S[0..R−L], i.e. the substring starting at L matches the prefix of length R−L+1. For each i, if i ≤ R we use the fact that S[i..R] = S[i−L..R−L]; so Z[i] ≥ min(Z[i−L], R−i+1). We then extend by comparing. If i > R we compute Z[i] from scratch. After computing Z[i], if i+Z[i]−1 > R we update L=i, R=i+Z[i]−1.

Why This Topic Matters

Pattern matching in O(n + m): With S = P + '$' + T, any index i (i ≥ m+1) where Z[i] = m gives a match of P in T at position i−m−1. One Z array computation is O(|S|) = O(n + m).
Alternative to KMP: Same linear time, different idea (prefix matching at every position). Some problems are more natural with Z (e.g. find all positions where a prefix repeats).
String structure: Z array reveals periodicity and repeated prefixes; used in "find period," "distinct substrings," and similar.

Mental Model

We have a window [L, R] that we've already verified matches the prefix S[0..R−L]. For the current index i: if i is inside [L, R], we know S[i..R] = S[i−L..R−L], so Z[i] is at least min(Z[i−L], R−i+1). We then try to extend by comparing S[R+1] with S[R−i+1], etc. If i > R, we compute Z[i] by comparing S[i] with S[0] and extending. Whenever we get a new rightmost R (i+Z[i]−1 > R), we update L=i and R=i+Z[i]−1.

Step-by-Step: Building the Z Array

Z[0] = n (or set and skip in loop). L = R = 0 initially.
For i from 1 to n−1: If i > R, no box covers i. Set L = R = i, then while R < n and S[R] == S[R−L], increment R. The loop stops with R one past the last matching character, so Z[i] = R−L; then decrement R by 1 so that [L, R] is inclusive and S[L..R] = S[0..R−L]. (If nothing matched, Z[i] = 0 and the box is empty, with R = i−1.)
If i ≤ R, we're inside the box. k = i−L. If Z[k] < R−i+1, then Z[i] = Z[k]. Else we have Z[i] ≥ R−i+1; set L = i and extend R from R+1 comparing S[R+1] with S[R−i+1], then Z[i] = R−i+1.

Python Implementation

Build Z Array

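def build_z(s):
    """Z[i] = length of longest substring starting at i that matches prefix of s."""
    n = len(s)
    z = [0] * n  # z[0] is left at 0 here; set z[0] = n if your convention needs it
    l = r = 0    # Z-box [l, r], inclusive: s[l..r] == s[0..r-l]
    for i in range(1, n):
        if i > r:
            # No box covers i: match against the prefix from scratch.
            l = r = i
            while r < n and s[r - l] == s[r]:
                r += 1
            z[i] = r - l
            r -= 1  # r was one past the last match; make it inclusive
        else:
            k = i - l  # position in the prefix that mirrors i
            if z[k] < r - i + 1:
                # The mirrored match ends strictly inside the box: copy it.
                z[i] = z[k]
            else:
                # The match reaches the box edge: extend past r.
                l = i
                while r + 1 < n and s[r + 1] == s[r + 1 - l]:
                    r += 1
                z[i] = r - i + 1
    return z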

Pattern Matching with Z

def z_pattern_match(text, pattern):
    """Return list of start indices in text where pattern occurs."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        # Empty pattern is treated as "no matches" here; adjust per problem.
        return []
    s = pattern + '\0' + text  # use \0 or any separator not in P, T
    z = build_z(s)
    result = []
    # T starts at index m + 1; a match must start no later than len(s) - m.
    for i in range(m + 1, len(s) - m + 1):
        if z[i] == m:
            result.append(i - m - 1)
    return result

T starts at index m+1 in the combined string S. When Z[i] = m for some i ≥ m+1, the substring S[i..i+m−1] equals the pattern, so the match in T starts at index i − m − 1.

Line-by-Line: Z Build (i > R case)

When i > R, no prior box covers i. We set L = R = i and use R as a scan pointer: compare S[R] with S[R−L], the character at the same offset from the start of the string, and increment R on each match. The while loop exits when R reaches n or on a mismatch, so at that point R is one past the last matching character and the match length is Z[i] = R−L, which is what the code stores. We then do r -= 1 so that R becomes the last index of the Z-box. Afterwards L = i, R = i+Z[i]−1, and S[L..R] = S[0..R−L].

Time Complexity

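Every successful character comparison moves R one position to the right, and R never decreases (when i > R the new R is at least i−1, which is ≥ the old R; when i ≤ R we only extend from R+1). Every failed comparison ends the work for the current i. So there are at most n successful and n failed comparisons: O(n) to build Z for a string of length n. Pattern matching: S has length m+1+n, so O(n + m).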

Space Complexity

Z array is O(n). L and R are O(1). O(n) space. For pattern matching, the combined string is O(n + m), so O(n + m) space.

Edge Cases

Empty string: Return Z = [] (there are no positions to fill).
Single character: The loop over i = 1..n−1 never runs, so Z has the single entry Z[0], which is 1 under the Z[0] = n convention.
Pattern longer than text: Return [] before building Z.
Separator in pattern or text: Use a character that does not appear in P or T (e.g. '\0' or '#'); otherwise Z might span across the separator incorrectly.
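When no safe separator character is available, one possible workaround is to drop the separator and test Z[i] ≥ m instead of Z[i] = m on S = P + T. The sketch below makes that assumption explicit; `z_match_no_separator` is a hypothetical name, and `build_z` here uses a compact half-open-box formulation that differs from the listing above:

```python
def build_z(s):
    """Compact Z-array build; [l, r) is the half-open box with s[l:r] == s[0:r-l]."""
    n = len(s)
    z = [0] * n
    l = r = 0
    for i in range(1, n):
        if i < r:
            z[i] = min(r - i, z[i - l])
        while i + z[i] < n and s[i + z[i]] == s[z[i]]:
            z[i] += 1
        if i + z[i] > r:
            l, r = i, i + z[i]
    return z


def z_match_no_separator(text, pattern):
    """Match on S = P + T; Z[i] >= m (not == m) because matches may run past m."""
    m = len(pattern)
    if m == 0 or m > len(text):
        return []
    z = build_z(pattern + text)
    # T starts at index m, so S-index i maps to T-index i - m.
    return [i - m for i in range(m, len(z)) if z[i] >= m]


print(z_match_no_separator("cababc", "ab"))  # [1, 3]
print(z_match_no_separator("a$b$", "$"))     # [1, 3]
```

The trade-off is that Z values in the T part are no longer capped at m, so the equality test from the separator version must become an inequality.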

Common Mistakes

Using a separator that appears in P or T—then Z values can extend across the boundary and give false matches. Use a character guaranteed absent.
Off-by-one in pattern matching: T starts at index m+1 in S (after P and the separator). So when Z[i]=m at index i, the match in T starts at i−(m+1), not i−m.
In the Z build, when i ≤ R and Z[k] ≥ R−i+1, we must extend from R+1; don't forget to update L and R after extending.

Common Mistake

In pattern matching with S = P + '$' + T, the match start index in T is i − m − 1, not i − m. The T part starts at index m+1 in S (index 0 is start of P, index m is the separator, index m+1 is start of T). So when Z[i] = m for index i ≥ m+1, the pattern occupies S[i..i+m−1], and that corresponds to T starting at position (i − (m+1)) = i − m − 1.

Z Algorithm vs KMP

Both achieve O(n + m) pattern matching. KMP: precompute LPS on the pattern, then one pass over the text with two pointers. Z: form S = P + sep + T, compute Z for S; positions in T where Z[i]=m are matches. Z gives "prefix match length at every position" in one array; KMP gives "failure function" and doesn't store prefix match at every text position. For just pattern matching, either is fine; Z can be more intuitive for problems that ask "longest prefix match at each index."

Optimization Insight

The key to O(n) Z build is that R never decreases. When we use Z[i−L] for i inside the box, we get a lower bound and sometimes the exact value without any comparison. When we extend, we only compare from R+1 onward, so each comparison increases R. So total work is linear.

Pattern Recognition

Use Z when you need length of prefix match at every position, or when pattern matching with a single combined string is convenient. Also for "find all positions where the string matches its prefix," "period of string," or "number of occurrences of prefix at each position."
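For the "period of string" use, a minimal sketch: p is a period of S exactly when the suffix starting at p matches the prefix all the way to the end, that is, p + Z[p] = n. The function name `shortest_period_z` is hypothetical, and `build_z` is again the compact half-open variant rather than the course listing:

```python
def build_z(s):
    """Compact Z-array build; z[0] is left at 0."""
    n = len(s)
    z = [0] * n
    l = r = 0
    for i in range(1, n):
        if i < r:
            z[i] = min(r - i, z[i - l])
        while i + z[i] < n and s[i + z[i]] == s[z[i]]:
            z[i] += 1
        if i + z[i] > r:
            l, r = i, i + z[i]
    return z


def shortest_period_z(s):
    """Smallest p >= 1 such that s[i] == s[i + p] for all valid i; len(s) if none is smaller."""
    n = len(s)
    z = build_z(s)
    for p in range(1, n):
        if p + z[p] == n:  # s[p:] is a prefix of s, so shifting by p is consistent
            return p
    return n


print(shortest_period_z("abcabcab"))  # 3
print(shortest_period_z("aaaa"))      # 1
print(shortest_period_z("abcd"))      # 4
```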

Expert Tip

Keep the Z-box [L, R] meaning S[L..R] = S[0..R−L]. When i ≤ R, let k = i−L: if Z[k] < R−i+1, then Z[i] = Z[k] exactly; otherwise Z[i] is at least R−i+1 and we extend past R by comparing characters. When i > R, compute Z[i] from scratch and update L, R. For pattern matching, always use a separator not in P or T.

Interview Insight

"The Z array at i gives the length of the longest substring starting at i that matches the prefix. We build it in O(n) by maintaining a box [L,R] that matches the prefix; for i inside the box we reuse Z[i−L], then extend if needed. For pattern matching we form P + sep + T and find indices i where Z[i] = len(P); those are match starts in T at i−len(P)−1." Implement build_z and mention the separator.

Practice Problems

Find all occurrences of pattern in text using Z algorithm.
Find all positions in a string where the prefix repeats (Z[i] = some value).
Shortest period of a string (smallest period p such that s is periodic with period p)—use Z or KMP.

Summary

Z[i] = length of longest substring starting at i that equals a prefix of S. Build in O(n) using a Z-box [L, R].
Pattern matching: S = P + separator + T; compute Z for S; when Z[i] = m and i ≥ m+1, pattern occurs in T at start index i−m−1.
Use a separator that does not appear in P or T. Match start in T is i−m−1, not i−m.
Same O(n + m) as KMP; Z array gives prefix-match length at every position, which is useful for period and repeat problems.

7.11 Manacher's Algorithm

Introduction

Manacher's algorithm finds the longest palindromic substring of a string in O(n) time and O(n) space. In topic 7.4 we used "expand around center" for every center (odd and even), which takes O(n²). Manacher improves this by reusing information: we maintain the rightmost boundary R of any palindrome we've seen so far and its center C. For each position i, we use the mirror position 2·C − i to get a lower bound on the palindrome radius at i, then expand only when needed. We also transform the string by inserting a separator (e.g. '#') between every pair of characters and at the ends, so that every palindrome in the new string has odd length; then we only need one type of center. The result is a single pass in which every successful comparison in the "expand" step pushes R to the right and every failed comparison ends the work for that center, giving O(n) total.

Real-World Analogy

Imagine you're measuring how far each point on a line can "see" in both directions while staying inside a symmetric corridor (palindrome). Once you've computed the corridor for a point to your left, a point inside that corridor can often reuse the same width (mirror image) instead of measuring from scratch. You only extend the measurement when you're past the known corridor or when the mirror gives a lower bound and you need to check further. So you avoid re-measuring the same stretch repeatedly—that's how Manacher gets linear time.

Example

String s = "babad". Transform with '#' → "#b#a#b#a#d#". Now every palindrome has odd length and a unique center. The longest palindrome in the original string is "bab" or "aba" (length 3). In the transformed string, "bab" corresponds to "#b#a#b#", centered on the first 'a' at index 3, and "aba" corresponds to "#a#b#a#", centered on the second 'b' at index 5. In both cases the radius (number of characters on each side of the center) is 3, so Manacher's array has P[3] = P[5] = 3. The number of original characters in the palindrome equals P[i], so we get length 3 and can recover the substring.
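The radii for this example can be reproduced with a brute-force reference that expands around every center of the transformed string. `radii_naive` is a hypothetical O(n²) helper for checking values by hand, not Manacher's algorithm itself:

```python
def radii_naive(s):
    """Return (t, p): the '#'-transformed string and the palindrome radius at each index of t."""
    t = '#' + '#'.join(s) + '#'
    p = [0] * len(t)
    for i in range(len(t)):
        # Grow the radius while both neighbours exist and match.
        while (i - p[i] - 1 >= 0 and i + p[i] + 1 < len(t)
               and t[i - p[i] - 1] == t[i + p[i] + 1]):
            p[i] += 1
    return t, p


t, p = radii_naive("babad")
print(t)       # #b#a#b#a#d#
print(p)       # [0, 1, 0, 3, 0, 3, 0, 1, 0, 1, 0]
print(max(p))  # 3
```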

Formal Definition

Concept Note

Transformed string: T = "#" + "#".join(s) + "#", so length is 2n+1. Every palindrome in T has odd length with a unique center. P[i]: the radius (half-length) of the longest palindrome centered at index i in T. The palindrome spans T[i−P[i]..i+P[i]] and has length 2·P[i]+1 in T. The number of original characters in that palindrome equals P[i]. So the longest palindromic substring in s has length max(P[i]). We maintain center C and right boundary R of the rightmost palindrome; for each i we use the mirror 2·C−i to get a lower bound, then expand. R never decreases, so total work is O(n).

Why This Topic Matters

Optimal for longest palindromic substring: Expand-around-center (7.4) is O(n²); Manacher is O(n). For large inputs or when the problem explicitly asks for linear time, Manacher is the answer.
Reuses structure: Like Z algorithm and KMP, we reuse previously computed information (the mirror palindrome under center C) to avoid redundant work.
Interview and contests: "Longest palindromic substring in O(n)" is a classic follow-up after the O(n²) solution.

Mental Model

We have a "rightmost" palindrome with center C and right boundary R (so it spans to index R). For the current position i: if i is inside [C−(R−C), R] = [2C−R, R], then the mirror of i with respect to C is j = 2C−i. Whatever part of the palindrome at j lies inside the palindrome at C is mirrored around i, so P[i] ≥ min(P[j], R−i); the cap R−i is needed because the palindrome at j may stick out past the left edge 2C−R, where the symmetry tells us nothing. We set P[i] to that minimum, then try to expand. If we expand past R, we update C = i and R = new right boundary. R only moves forward, giving O(n) total expansions.

Transform and Key Recurrence

Transform: T = "#" + "#".join(list(s)) + "#" so that every palindrome has odd length. Example: "abba" → "#a#b#b#a#".

Recurrence: Maintain C (center of the palindrome that extends farthest right) and R (its right boundary, inclusive). For each i from 1 to len(T)−2 (the two end positions are '#' with radius 0): If i ≤ R, set P[i] = min(R − i, P[2*C − i]). Then while i − P[i] − 1 ≥ 0, i + P[i] + 1 < len(T), and T[i + P[i] + 1] == T[i − P[i] − 1], increment P[i]. If i + P[i] > R, set C = i and R = i + P[i].

Python Implementation

def manacher(s):
    """Return (max_radius, center_index) in transformed string; original length = max_radius."""
    if not s:
        return 0, 0
    t = '#' + '#'.join(s) + '#'
    n = len(t)
    p = [0] * n
    c = r = 0
    for i in range(1, n - 1):
        mirror = 2 * c - i
        if i < r:
            p[i] = min(r - i, p[mirror])
        while i + p[i] + 1 < n and i - p[i] - 1 >= 0 and t[i + p[i] + 1] == t[i - p[i] - 1]:
            p[i] += 1
        if i + p[i] > r:
            c = i
            r = i + p[i]
    max_rad = max(p)
    center = p.index(max_rad)
    return max_rad, center

def longest_palindrome_manacher(s):
    """Return the longest palindromic substring of s."""
    if not s:
        return ""
    max_rad, center = manacher(s)
    t = '#' + '#'.join(s) + '#'
    start = center - max_rad
    end = center + max_rad + 1
    substring = t[start:end]
    return substring.replace('#', '')

Line-by-Line Explanation

t = '#' + '#'.join(s) + '#': Transforms "ab" into "#a#b#". So every palindrome in t has odd length (center is one character).
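if i < r: We're inside the palindrome centered at c; mirror = 2*c − i. If the palindrome at mirror lies entirely inside [2*c−r, r], we can copy P[mirror]; if it reaches or crosses the left edge, only the part inside is guaranteed to be mirrored, so we cap at r−i. Both cases are covered by min(r − i, p[mirror]).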
while ... expand: Try to extend the palindrome at i; stop at boundary or mismatch.
if i + p[i] > r: We extended past the old right boundary, so update center c and right boundary r.
To get the original substring: the segment of t from center−max_rad to center+max_rad (inclusive) has 2*max_rad+1 characters. It starts and ends with '#' and alternates, so it contains max_rad+1 separators and max_rad original characters. So the longest palindrome length is max_rad. To extract it, take t[center-max_rad : center+max_rad+1] and replace '#' with ''.
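An alternative to stripping '#' is to map indices back to s directly: the segment starts at an even index of t, and index 2k of t is the '#' just before s[k], so the palindrome starts at (center − radius) // 2 in s. The sketch below assumes that mapping; `manacher_radii` and `longest_palindrome_by_index` are hypothetical names, not the course's functions:

```python
def manacher_radii(s):
    """Radius array P for the '#'-transformed version of s."""
    t = '#' + '#'.join(s) + '#'
    n = len(t)
    p = [0] * n
    c = r = 0
    for i in range(1, n - 1):
        if i < r:
            p[i] = min(r - i, p[2 * c - i])
        while (i - p[i] - 1 >= 0 and i + p[i] + 1 < n
               and t[i - p[i] - 1] == t[i + p[i] + 1]):
            p[i] += 1
        if i + p[i] > r:
            c, r = i, i + p[i]
    return p


def longest_palindrome_by_index(s):
    """Slice the longest palindrome out of s without rebuilding the transformed string."""
    if not s:
        return ""
    p = manacher_radii(s)
    center = max(range(len(p)), key=p.__getitem__)  # first index with the largest radius
    radius = p[center]
    start = (center - radius) // 2  # t-index 2k is the '#' before s[k]
    return s[start:start + radius]


print(longest_palindrome_by_index("babad"))  # bab
print(longest_palindrome_by_index("cbbd"))   # bb
```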

Time Complexity

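Each successful expansion (increment of P[i]) reaches one position beyond the current right boundary R, so it increases the value R will be set to. R never decreases and is at most len(t) = 2n+1 for an original string of length n. So total expansions are O(n). The "min(r − i, p[mirror])" step is O(1). So O(n) total.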

Space Complexity

Transformed string t: O(n). Array P: O(n). So O(n).

Edge Cases

Empty string: Return "" or length 0.
Single character: Transform "#a#"; P[1]=1 (radius 1); longest is "a".
No palindrome of length > 1: e.g. "abc"; max P[i] = 1; return first character.

Common Mistakes

Forgetting to transform the string—then even-length palindromes don't have a single center in the middle of a character.
Wrong mirror index: mirror = 2*C − i (i and mirror are symmetric about C).
When extracting the substring, mixing up indices in t vs original s. The segment in t is [center−P[center], center+P[center]]; remove '#' to get the original substring.

Common Mistake

The mirror of index i with respect to center C is 2*C − i, not C − i or i − C. So P[i] gets a lower bound from P[2*C − i] when i is inside the current rightmost palindrome.

Expand-Around-Center vs Manacher

Method	Time	Space
Expand around center (7.4)	O(n²)	O(1)
Manacher's algorithm	O(n)	O(n)

Optimization Insight

Expand-around-center does O(n) work per center (up to n/2 expansions per center in the worst case), so O(n²). Manacher reuses the rightmost palindrome: when i is inside it, we get a free lower bound and often don't expand at all; when we do expand, we push R forward, so total expansions are O(n). Trade-off: O(n) extra space for the P array and transformed string.

Pattern Recognition

Use Manacher when the problem asks for longest palindromic substring in O(n), or when you need the radius (or length) of the longest palindrome centered at every position (e.g. count palindromic substrings in O(n) by summing (P[i]+1)//2 over all centers i of the transformed string). For "just" longest palindrome and n is small, expand-around-center is simpler.
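A minimal sketch of the counting use, assuming P[i] is the radius in the '#'-transformed string as defined in this topic. A center with radius P[i] holds palindromes of original length P[i], P[i]−2, and so on, which is (P[i]+1)//2 of them. The function names are hypothetical:

```python
def manacher_radii(s):
    """Radius array P for the '#'-transformed version of s."""
    t = '#' + '#'.join(s) + '#'
    n = len(t)
    p = [0] * n
    c = r = 0
    for i in range(1, n - 1):
        if i < r:
            p[i] = min(r - i, p[2 * c - i])
        while (i - p[i] - 1 >= 0 and i + p[i] + 1 < n
               and t[i - p[i] - 1] == t[i + p[i] + 1]):
            p[i] += 1
        if i + p[i] > r:
            c, r = i, i + p[i]
    return p


def count_palindromic_substrings(s):
    """Count palindromic substrings (with multiplicity) in O(n)."""
    return sum((radius + 1) // 2 for radius in manacher_radii(s))


print(count_palindromic_substrings("aaa"))  # 6
print(count_palindromic_substrings("abc"))  # 3
```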

Expert Tip

Transform with '#' so that the transformed string has length 2n+1 and every palindrome is odd-length. Then P[i] is the "radius": the number of characters on each side of the center, not counting the center itself. The original palindrome length equals P[i]. After the loop, max(P) gives the longest palindrome radius; extract the substring from the transformed string and remove '#'.

Interview Insight

"We can do expand-around-center in O(n²). For O(n) we use Manacher's algorithm: transform the string so every palindrome has odd length (insert '#' between characters). Then we maintain the rightmost palindrome [C, R] and for each position i we use the mirror 2*C−i to get a lower bound on P[i], then expand. R only moves forward, so total work is O(n)." Implement the transform and the main loop with mirror and expand.

Practice Problems

Longest Palindromic Substring (LeetCode 5) in O(n) with Manacher.
Count palindromic substrings in O(n) using the P array (each center i of the transformed string contributes (P[i]+1)//2 palindromes).

Summary

Manacher's algorithm finds the longest palindromic substring in O(n) by reusing the rightmost palindrome boundary.
Transform: T = "#" + "#".join(s) + "#" so every palindrome has odd length. P[i] = radius of longest palindrome centered at i.
Recurrence: If i ≤ R, P[i] = min(R−i, P[2*C−i]); then expand. If i+P[i] > R, update C=i, R=i+P[i].
R never decreases → O(n) time. O(n) space for T and P.

7.12 Suffix Array

Introduction

A suffix array of a string S of length n is an array of integers that gives the starting indices of all suffixes of S in lexicographic (sorted) order. So suffix_array[0] is the index of the smallest suffix, suffix_array[1] the next smallest, and so on. For example, for S = "banana", the suffixes are "banana"(0), "anana"(1), "nana"(2), "ana"(3), "na"(4), "a"(5). Sorted: "a"(5), "ana"(3), "anana"(1), "banana"(0), "na"(4), "nana"(2). So the suffix array is [5, 3, 1, 0, 4, 2]. Once built, we can search for a pattern P in O(m log n) by binary search: find the range of suffixes that have P as a prefix. We can also build an LCP array (longest common prefix between consecutive suffixes in the suffix array) to solve problems like "longest repeated substring" and "number of distinct substrings." Naive construction is O(n² log n) (sort n suffixes, each comparison O(n)); efficient algorithms (doubling, SA-IS) achieve O(n log n) or O(n).

Real-World Analogy

Imagine a dictionary of all suffixes of a word: "banana" gives entries "a", "ana", "anana", "banana", "na", "nana" in alphabetical order. The suffix array is the list of "page numbers" (starting indices) in that order. To find where "nan" appears, you open the dictionary to the right place (binary search) and check if the suffix at that position starts with "nan". The sorted order lets you binary search instead of scanning every suffix.

Example

S = "banana", n = 6. Suffixes: S[0:]= "banana", S[1:]= "anana", S[2:]= "nana", S[3:]= "ana", S[4:]= "na", S[5:]= "a". Sorted lexicographically: "a" < "ana" < "anana" < "banana" < "na" < "nana". So SA = [5, 3, 1, 0, 4, 2]. Pattern "na": binary search finds suffixes that start with "na"—they are at positions 4 and 5 in SA (indices 4 and 2 in S). So "na" occurs at indices 2 and 4.
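The example can be reproduced in a few lines. This sketch materializes the sorted suffixes only to keep the binary search readable, and it relies on a sentinel assumption that is stated in the comments:

```python
from bisect import bisect_left

s = "banana"
sa = sorted(range(len(s)), key=lambda i: s[i:])  # naive build, fine for tiny inputs
print(sa)  # [5, 3, 1, 0, 4, 2]

# Sorted suffixes, in the same order as sa.
suffixes = [s[i:] for i in sa]  # ['a', 'ana', 'anana', 'banana', 'na', 'nana']

pattern = "na"
# Assumption: chr(0x10FFFF) does not occur in s, so pattern + that character
# sorts after every suffix that starts with pattern.
lo = bisect_left(suffixes, pattern)
hi = bisect_left(suffixes, pattern + chr(0x10FFFF))
print(lo, hi)             # 4 6  (SA positions 4 and 5)
print(sorted(sa[lo:hi]))  # [2, 4]
```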

Formal Definition

Concept Note

Suffix array SA[0..n−1]: A permutation of {0, 1, …, n−1} such that the suffix starting at SA[0] is lexicographically smallest, the suffix at SA[1] is the next smallest, and so on. So S[SA[i]:] < S[SA[i+1]:] for all i. Pattern search: P occurs in S iff there is some suffix that has P as a prefix. Because suffixes are sorted, all such suffixes form a contiguous range in the suffix array. Binary search finds the leftmost and rightmost index in SA where the suffix has P as prefix; the occurrences of P in S are exactly SA[lo], SA[lo+1], …, SA[hi]. LCP array: LCP[i] = length of longest common prefix of S[SA[i]:] and S[SA[i−1]:]. Used for repeated substrings and distinct substring count.

Why This Topic Matters

Pattern matching: After O(n log n) or O(n) build, we can answer "does P occur in S?" and "where does P occur?" in O(m log n) per query (binary search + O(m) comparison per step).
Longest repeated substring: With the LCP array, the maximum LCP value gives the length of the longest substring that appears at least twice. The substring is S[SA[i]:SA[i]+LCP[i]].
Distinct substrings: Total substrings = n(n+1)/2; subtract sum of LCP to get distinct count (each repeated substring is "overcounted" by the common prefix length).
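The distinct-substring formula can be sketched as follows, using a naive suffix array and a naive LCP computation (both fine for small n). The helper names are hypothetical:

```python
def common_prefix_len(s, i, j):
    """Length of the longest common prefix of s[i:] and s[j:]."""
    k = 0
    while i + k < len(s) and j + k < len(s) and s[i + k] == s[j + k]:
        k += 1
    return k


def count_distinct_substrings(s):
    """n(n+1)/2 minus the sum of LCPs of adjacent suffixes in sorted order."""
    n = len(s)
    sa = sorted(range(n), key=lambda i: s[i:])  # naive suffix array
    total_lcp = sum(common_prefix_len(s, sa[k - 1], sa[k]) for k in range(1, n))
    return n * (n + 1) // 2 - total_lcp


print(count_distinct_substrings("banana"))  # 15
```

For "banana" the adjacent LCP values are 1, 3, 0, 0, 2, so the count is 21 − 6 = 15.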

Mental Model

List all suffixes (starting index 0, 1, …, n−1), sort them lexicographically. The sorted order of indices is the suffix array. To find pattern P: binary search for the first suffix that is ≥ P, and then for the first suffix after it that does not have P as a prefix (every suffix that starts with P counts as a match, so it must fall inside the range). The suffixes between those two positions (if any) are exactly the ones that start with P, and their starting indices are the occurrences.

Building the Suffix Array (Naive)

Create pairs (suffix string, index) for each starting index 0..n−1. Sort by the suffix string. Extract the indices in order. Comparing two suffixes is O(n) in the worst case, and we have O(n log n) comparisons, so O(n² log n) total. For small n this is acceptable; for large n we need O(n log n) construction (e.g. doubling with radix sort or SA-IS).

Python Implementation

Naive Suffix Array Build

def build_suffix_array_naive(s):
    """Return suffix array: indices of suffixes in lexicographic order."""
    n = len(s)
    suffixes = [(s[i:], i) for i in range(n)]
    suffixes.sort(key=lambda x: x[0])
    return [idx for _, idx in suffixes]

Pattern Search: Binary Search

def suffix_array_search(s, suffix_array, pattern):
    """Return list of start indices in s where pattern occurs."""
    n, m = len(s), len(pattern)
    if m > n or not pattern:
        return []

    def cmp_suffix(i):
        # Compare only the first m characters of s[i:] with pattern, so each
        # comparison costs O(m). Returns -1 if s[i:] < pattern, 0 if pattern
        # is a prefix of s[i:], 1 if s[i:] > pattern.
        head = s[i:i + m]
        if head == pattern:
            return 0
        return -1 if head < pattern else 1

    # Leftmost position in SA whose suffix is >= pattern.
    lo, hi = 0, n
    while lo < hi:
        mid = (lo + hi) // 2
        if cmp_suffix(suffix_array[mid]) < 0:
            lo = mid + 1
        else:
            hi = mid
    left = lo

    # First position at or after left whose suffix does not start with pattern.
    hi = n
    while lo < hi:
        mid = (lo + hi) // 2
        if cmp_suffix(suffix_array[mid]) == 0:
            lo = mid + 1
        else:
            hi = mid
    right = lo - 1

    if left <= right:
        return sorted(suffix_array[left:right + 1])
    return []

Simpler variant: one binary search for "first suffix ≥ pattern", one for "first suffix > pattern" (where we treat "prefix of suffix" as equal to pattern). Then occurrences are SA[left], ..., SA[right−1] when we use the second search as "first suffix that does not have P as prefix."

LCP Array (Brief)

After building the suffix array, the LCP array LCP[i] = length of the longest common prefix of S[SA[i]:] and S[SA[i−1]:]. We can build it in O(n) with Kasai's algorithm: process the suffixes in text order (start index 0, 1, …) and carry over the matched length from the previous suffix, which drops by at most 1 per step. Then max(LCP) is the length of the longest repeated substring; the substring is S[SA[i]: SA[i] + LCP[i]] for the i that achieves the max.
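A sketch of Kasai's algorithm for the O(n) LCP build, with the suffix array built naively for brevity. The function names are hypothetical, and LCP[0] is set to 0 by convention here:

```python
def build_lcp_kasai(s, sa):
    """lcp[i] = LCP of the suffixes at sa[i-1] and sa[i]; lcp[0] = 0."""
    n = len(s)
    rank = [0] * n
    for pos, start in enumerate(sa):
        rank[start] = pos  # rank[start] = position of suffix s[start:] in sa
    lcp = [0] * n
    h = 0  # matched length carried over from the previous suffix
    for i in range(n):  # suffixes in text order
        if rank[i] > 0:
            j = sa[rank[i] - 1]  # suffix just before s[i:] in sorted order
            while i + h < n and j + h < n and s[i + h] == s[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h > 0:
                h -= 1  # dropping the first character loses at most one match
        else:
            h = 0
    return lcp


def longest_repeated_substring(s):
    """Substring for the maximum LCP value ('' if nothing repeats)."""
    if not s:
        return ""
    sa = sorted(range(len(s)), key=lambda i: s[i:])  # naive build
    lcp = build_lcp_kasai(s, sa)
    best = max(range(len(s)), key=lcp.__getitem__)
    return s[sa[best]:sa[best] + lcp[best]]


sa = [5, 3, 1, 0, 4, 2]  # suffix array of "banana"
print(build_lcp_kasai("banana", sa))         # [0, 1, 3, 0, 0, 2]
print(longest_repeated_substring("banana"))  # ana
```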

Time Complexity

Naive build: O(n² log n)—n suffixes, sort with O(n log n) comparisons, each comparison O(n). Efficient build: O(n log n) with doubling + sort, or O(n) with SA-IS (not covered here). Pattern search: O(m log n) with binary search (log n steps, each comparison O(m)).

Space Complexity

Suffix array: O(n). The naive build materializes all n suffix strings, with total length n(n+1)/2, so it uses O(n²) temporary space. Sorting indices with key=lambda i: s[i:] does not avoid this, because Python's sort computes and stores every key before sorting. To stay at O(n) extra space, compare suffixes in place with a comparison function (e.g. via functools.cmp_to_key) that walks the two suffixes character by character; each comparison is still O(n) time in the worst case.

Edge Cases

Empty string: Suffix array is [].
Single character: SA = [0].
Pattern longer than string: Return [] from search.
Pattern empty: Usually return all indices 0..n−1 or handle per problem.

Common Mistakes

Comparing suffixes incorrectly: use lexicographic order (same as Python's string comparison).
In binary search, off-by-one in the "first suffix that has P as prefix" vs "first suffix that is > P" — the range of occurrences is [left, right] inclusive when left is first with prefix P and right is last with prefix P.

Optimization Insight

Naive: Sort suffixes as strings → O(n² log n). Better: Doubling algorithm: sort by first 1 char, then by first 2, 4, 8, … using ranks; each phase O(n) with radix sort → O(n log n). Optimal: SA-IS and others in O(n). For interviews, naive build is often acceptable when n is small; mention that production uses O(n log n) or O(n) construction.
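A sketch of the doubling idea, under one simplification: each phase uses Python's comparison sort on rank pairs instead of radix sort, so this version runs in O(n log² n) rather than O(n log n). The function name is hypothetical:

```python
def build_suffix_array_doubling(s):
    """Prefix doubling: sort by the first k characters, then 2k, 4k, ..."""
    n = len(s)
    if n == 0:
        return []
    sa = list(range(n))
    rank = [ord(ch) for ch in s]  # rank by first character
    k = 1
    while True:
        def key(i):
            # Pair of ranks: first k characters, then the next k (-1 if past the end).
            return (rank[i], rank[i + k] if i + k < n else -1)

        sa.sort(key=key)
        # Re-rank: equal pairs share a rank, otherwise the rank grows by 1.
        new_rank = [0] * n
        for idx in range(1, n):
            prev, cur = sa[idx - 1], sa[idx]
            new_rank[cur] = new_rank[prev] + (key(cur) != key(prev))
        rank = new_rank
        if rank[sa[-1]] == n - 1:  # all ranks distinct: fully sorted
            break
        k *= 2
    return sa


print(build_suffix_array_doubling("banana"))  # [5, 3, 1, 0, 4, 2]
```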

Pattern Recognition

Use suffix array when you need many pattern searches on the same text (build once, query in O(m log n)), longest repeated substring, longest common substring of two strings (concatenate with separator, build SA and LCP, find max LCP between suffixes from different strings), or distinct substring count. For a single pattern search, KMP or Rabin-Karp is simpler.

Expert Tip

In Python, sorting indices with key=lambda i: s[i:] is shorter to write than storing (s[i:], i) pairs, but it is not cheaper: the sort evaluates the key once per index up front, so all n suffix strings still exist in memory at the same time. For naive build, either form is fine for small n. For search, binary search with a comparator that compares s[SA[mid]:SA[mid]+m] with pattern avoids building a list of all suffixes and keeps each comparison at O(m).
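If memory is the concern, one option is to compare suffixes in place with a comparison function, so no suffix strings are created. The sketch below uses `functools.cmp_to_key` from the standard library; the function name `build_suffix_array_low_memory` is hypothetical, and the character-by-character loop is slow in pure Python:

```python
from functools import cmp_to_key


def build_suffix_array_low_memory(s):
    """Naive O(n^2 log n) build that uses O(n) extra space (no suffix copies)."""
    n = len(s)

    def compare(i, j):
        # Walk both suffixes until they differ or one runs out.
        while i < n and j < n:
            if s[i] != s[j]:
                return -1 if s[i] < s[j] else 1
            i += 1
            j += 1
        # The suffix that ran out is a prefix of the other, so it sorts first.
        return (i < n) - (j < n)

    return sorted(range(n), key=cmp_to_key(compare))


print(build_suffix_array_low_memory("banana"))  # [5, 3, 1, 0, 4, 2]
```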

Interview Insight

"A suffix array is the list of starting indices of all suffixes in sorted order. We can build it naively by sorting the suffixes in O(n² log n). To search for pattern P we binary search: find the range of suffixes that have P as prefix—O(m log n). With an LCP array we can get the longest repeated substring in O(n)." Implement naive build and one binary search; mention LCP for repeated substring.

Practice Problems

Build suffix array (naive) and search for a pattern.
Longest repeated substring: build SA and LCP, return substring for max LCP.
Number of distinct substrings: n(n+1)/2 − sum(LCP).
Longest common substring of two strings: S = A + '#' + B, build SA and LCP, find max LCP where the two suffixes come from A and B.

Summary

Suffix array SA: starting indices of all suffixes in lexicographic order. SA[i] = index of the i-th smallest suffix.
Naive build: sort suffixes → O(n² log n). Efficient: O(n log n) or O(n) with doubling/SA-IS.
Pattern search: binary search for range of suffixes with P as prefix → O(m log n). Occurrences are SA[lo..hi].
LCP array: longest common prefix of consecutive suffixes in SA order. Max LCP = length of longest repeated substring.

7.13 Suffix Tree

Introduction

A suffix tree of a string S is a trie (or compressed trie) that contains all suffixes of S. Each path from the root corresponds to a substring of S; each leaf is labeled with the starting index of the suffix that ends there. Once built, we can search for a pattern P in O(m) time by walking from the root along edges that match P; if we reach a node (or a point on an edge), all leaves in the subtree below give the occurrence indices. The suffix tree also supports longest repeated substring, longest common substring of two strings, and other problems. Naive construction inserts each of the n suffixes into a trie in O(n²) time and space; Ukkonen's algorithm builds the tree in O(n) time and space. This topic introduces the structure and naive build; efficient construction (Ukkonen) is often studied in advanced courses.

Real-World Analogy

Imagine a family tree where every path from the root spells a prefix of some suffix. The root has branches for each first letter of a suffix; each branch leads to more branches for the next character, and so on. When you reach a "leaf," you've read one full suffix and the leaf tells you where that suffix started in the original string. To find where "nan" appears, you walk: root → 'n' → 'a' → 'n'. The subtree under that point contains all the leaves (starting indices) where "nan" occurs. So one walk gives you the answer.

Example

S = "banana". Suffixes: "banana"(0), "anana"(1), "nana"(2), "ana"(3), "na"(4), "a"(5). In the trie, the root has edges for 'b', 'a', and 'n'. The 'a' edge leads to a node where the suffix "a" (5) ends and from which an 'n' edge continues toward "ana" (suffix 3) and "anana" (suffix 1). Pattern "na": from the root go to 'n', then 'a'; the suffixes that end at or below that node start at 2 and 4, so "na" occurs at indices 2 and 4. In a compressed suffix tree, edges are labeled with substrings instead of single characters (e.g. the whole suffix "banana" becomes one edge from the root) to reduce the number of nodes.

Formal Definition

Concept Note

Suffix tree (uncompressed): A trie where each root-to-leaf path spells a suffix of S. Each leaf stores the starting index of that suffix. Internal nodes may have multiple children (branching). Compressed suffix tree: Edges are labeled with substrings (not single chars); any internal node (except possibly the root) has at least two children. This keeps the number of nodes O(n). Pattern search: Follow the path from the root that matches P character by character (or substring by substring). If we can match all of P, every leaf in the subtree at that point is an occurrence of P. Space: Uncompressed trie can have O(n²) nodes; compressed tree O(n) nodes and O(n) space with Ukkonen.

Why This Topic Matters

O(m) pattern search: After O(n) build (Ukkonen), each pattern query is O(m)—faster than suffix array's O(m log n) binary search when many queries are needed.
Longest repeated substring: Find the deepest internal node (with at least two leaf descendants); the path from root to that node spells the longest repeated substring.
Longest common substring of two strings: Build suffix tree for A + '#' + B; find the deepest node that has leaves from both A and B (using a separator to distinguish).
Foundation: Suffix trees generalize to multiple strings and are related to suffix arrays (the suffix array can be obtained by a DFS of the suffix tree).
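
The longest-common-substring idea from the list above can be sketched with a naive trie instead of a real suffix tree. This is an illustration under stated assumptions: the separator does not occur in either string, the names (_TrieNode, longest_common_substring) are hypothetical, and insertion stops at the separator, so the trie only holds the parts of each suffix that could be shared.

```python
class _TrieNode:
    __slots__ = ("children", "origins")

    def __init__(self):
        self.children = {}  # char -> _TrieNode
        self.origins = 0    # bit 1: reached by a suffix of A, bit 2: by a suffix of B


def longest_common_substring(a, b, sep="#"):
    """Longest string that occurs in both a and b. O(n^2) time and space."""
    if sep in a or sep in b:
        raise ValueError("separator must not occur in the inputs")
    s = a + sep + b
    root = _TrieNode()
    best = ""
    for i in range(len(s)):
        if i == len(a):
            continue                      # the suffix starting at sep belongs to neither string
        origin = 1 if i < len(a) else 2
        node = root
        for j in range(i, len(s)):
            c = s[j]
            if c == sep:
                break                     # a common substring never contains the separator
            if c not in node.children:
                node.children[c] = _TrieNode()
            node = node.children[c]
            node.origins |= origin
            # A node reached from both strings spells a common substring.
            if node.origins == 3 and j - i + 1 > len(best):
                best = s[i:j + 1]
    return best
```

A node whose origins value is 3 corresponds to "a node that has leaves from both A and B"; the deepest such node gives the answer. When several answers have the same length, this sketch returns the first one it encounters.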

Mental Model

Root = empty string. Each edge is labeled with one or more characters. Insert all suffixes: S[0:], S[1:], …, S[n−1:]. Shared prefixes share the same path. Leaves store the start index. To search P: start at root, follow edges that match P; if we consume all of P, collect all leaf indices in the current subtree. In a compressed tree, one edge might say "banana", so we skip six characters in one step.

Structure: Trie of Suffixes

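Start with an empty trie. For each starting index i from 0 to n−1, insert the suffix S[i:] into the trie. When inserting, follow existing edges that match; when no edge matches the next character, create a new edge and a new node. Store the start index i at the node where the suffix ends. Without a terminator, a suffix that is a prefix of another suffix (such as "a" in "banana") ends at an internal node, which is why the implementation below marks the end with an is_leaf flag; appending a unique terminator such as '$' to S makes every suffix end at a true leaf. Every node other than the root represents a substring of S, and a node that at least two suffixes pass through represents a substring that occurs at least twice.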

Python Implementation (Naive Build)

class SuffixTreeNode:
    def __init__(self):
        self.children = {}   # char -> SuffixTreeNode
        self.start = None    # for leaf: start index of suffix
        self.is_leaf = False

def build_suffix_tree_naive(s):
    """Build uncompressed suffix tree (trie of suffixes). O(n^2) time and space."""
    root = SuffixTreeNode()
    n = len(s)
    for i in range(n):
        node = root
        for j in range(i, n):
            c = s[j]
            if c not in node.children:
                node.children[c] = SuffixTreeNode()
            node = node.children[c]
        node.is_leaf = True
        node.start = i
    return root

def suffix_tree_search(root, s, pattern):
    """Return list of start indices where pattern occurs in s."""
    node = root
    for c in pattern:
        if c not in node.children:
            return []
        node = node.children[c]
    result = []
    def collect_leaves(n):
        if n.is_leaf:
            result.append(n.start)
        for child in n.children.values():
            collect_leaves(child)
    collect_leaves(node)
    return sorted(result)

Pattern Search

Walk from the root following the first character of P, then the second, and so on. If at any step the required character is not on any edge, P does not occur in S—return []. If we finish reading P, we are at a node (or in the middle of an edge in a compressed tree). All leaves in the subtree rooted at that point are the starting indices of suffixes that have P as a prefix—i.e. occurrences of P. Collect them with a DFS. Time: O(m) to walk + O(k) to collect k leaves. If we only need to check existence, we can stop after the walk—O(m).
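
The difference between an existence check and full collection can be sketched with a nested-dict trie. This variant is an assumption of the sketch, not the class-based code above: it ends every suffix with a '$' edge so that each suffix finishes at a true leaf, and it collects leaves with an explicit stack to avoid deep recursion on long strings. All names are illustrative.

```python
END = "$"  # assumed terminator: must not occur in the text


def build_suffix_trie(s):
    """Nested-dict trie of every suffix of s; node[END] holds the suffix start index."""
    if END in s:
        raise ValueError("text must not contain the terminator")
    root = {}
    for i in range(len(s)):
        node = root
        for c in s[i:]:
            node = node.setdefault(c, {})
        node[END] = i                 # the END edge leads to a leaf holding i
    return root


def _walk(root, pattern):
    """Follow pattern from the root; return the node reached, or None on a mismatch."""
    if END in pattern:
        return None
    node = root
    for c in pattern:
        node = node.get(c)
        if node is None:
            return None
    return node


def contains(root, pattern):
    """Existence check only: O(m), no leaf collection."""
    return _walk(root, pattern) is not None


def occurrences(root, pattern):
    """All start indices of pattern: O(m) walk plus a traversal of the subtree below."""
    node = _walk(root, pattern)
    if node is None:
        return []
    found, stack = [], [node]
    while stack:                      # explicit stack instead of recursion
        cur = stack.pop()
        for key, child in cur.items():
            if key == END:
                found.append(child)   # child is the stored start index
            else:
                stack.append(child)
    return sorted(found)
```

With root = build_suffix_trie("banana"), occurrences(root, "na") returns [2, 4] and contains(root, "nab") is False.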

Longest Repeated Substring (Concept)

In the suffix tree, the longest repeated substring is the string spelled by the path from the root to the deepest internal node that has at least two leaf descendants (i.e. the substring appears at least twice). So we can do a DFS, compute the depth of each node, and among nodes with ≥2 leaves in the subtree, take the one with maximum depth. The path label from root to that node is the longest repeated substring.
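
A minimal sketch of this idea on the naive trie: instead of a separate DFS, each node counts how many suffixes pass through it during insertion, which equals the number of suffix ends in its subtree. The function name and the [count, children] node layout are choices made for this sketch only.

```python
def longest_repeated_substring_trie(s):
    """Deepest trie node that at least two suffixes pass through. O(n^2) time and space."""
    root = {}      # char -> [pass_count, children_dict]
    best = ""
    for i in range(len(s)):
        children = root
        for j in range(i, len(s)):
            entry = children.setdefault(s[j], [0, {}])
            entry[0] += 1                      # one more suffix passes through this node
            if entry[0] >= 2 and j - i + 1 > len(best):
                best = s[i:j + 1]              # path label of the deepest repeated node so far
            children = entry[1]
    return best
```

For "banana" this returns "ana", and for "aaaa" it returns "aaa" (overlapping occurrences count). A string with all distinct characters yields "".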

Time Complexity

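Naive build: n suffixes, each of length up to n; each insertion may traverse and create O(n) nodes. Total O(n²) time and O(n²) space in the worst case (e.g. all characters distinct). Ukkonen: O(n) time and O(n) space for the compressed tree. Pattern search: O(m) to walk plus the cost of collecting the k occurrences. In the compressed tree the subtree below the match has O(k) nodes, so a query is O(m + k); in the naive trie that subtree can contain long non-branching chains, so collection costs the size of the subtree rather than O(k).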
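
The quadratic worst case is easy to observe by counting nodes. The sketch below uses a hypothetical helper, count_trie_nodes, that builds the uncompressed trie as nested dicts and counts every node it creates.

```python
def count_trie_nodes(s):
    """Number of non-root nodes in the uncompressed suffix trie of s."""
    root = {}
    created = 0
    for i in range(len(s)):
        node = root
        for c in s[i:]:
            if c not in node:
                node[c] = {}
                created += 1
            node = node[c]
    return created
```

With all characters distinct, "abcdef" creates 6 + 5 + 4 + 3 + 2 + 1 = 21 nodes, while "aaaaaa" creates only 6 because every suffix shares one path. In general the count equals the number of distinct non-empty substrings (15 for "banana").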

Space Complexity

Uncompressed: O(n²) worst case. Compressed (Ukkonen): O(n) nodes and edges. Each edge stores a substring reference (start, end indices into S) so O(1) per edge.

Edge Cases

Empty string: Tree has only root (no suffixes to insert).
Single character: One suffix; one leaf.
Pattern not in string: Walk fails at some character → return [].
All same character "aaa": Heavy sharing; the naive trie is a single path of n nodes, and every node on that path marks the end of one suffix. With a terminator, each node on the path gains one extra leaf edge.

Suffix Tree vs Suffix Array

Aspect	Suffix Tree	Suffix Array
Build (efficient)	O(n) Ukkonen	O(n log n) or O(n)
Pattern search	O(m + k)	O(m log n + k)
Space	O(n) compressed	O(n)
Implementation	Complex (Ukkonen)	Simpler

Optimization Insight

Naive suffix tree: Insert n suffixes into a trie → O(n²) time and space. Ukkonen's algorithm: Builds the tree in a single left-to-right pass with suffix links and active point, achieving O(n). For interviews, the naive trie of suffixes is often enough to convey the idea; mention Ukkonen for linear-time build. In practice, suffix arrays + LCP are often preferred for simplicity and cache efficiency.

Pattern Recognition

Use suffix tree when you need very fast pattern search (O(m) per query after O(n) build), longest repeated substring (deepest branching node), or longest common substring of two strings (build for A#B, find deepest node with leaves from both). For most problems, suffix array + LCP is simpler to implement and sufficient.

Expert Tip

A compressed suffix tree uses edge labels as (start, end) indices into S instead of copying substrings, so space stays O(n). Ukkonen builds it online. If you only need pattern search and n is moderate, the naive trie is acceptable; for production or large n, use a suffix array or a library that implements Ukkonen.

Interview Insight

"A suffix tree is a trie of all suffixes. Each leaf stores the start index. To search for P we walk from the root following P; if we reach a node, all leaves in the subtree are the occurrences—O(m) search. Naive build is O(n²) by inserting each suffix; Ukkonen's algorithm builds in O(n). The longest repeated substring is the path label of the deepest internal node with at least two leaf descendants." Implement the naive trie build and search; mention Ukkonen for linear time.

Practice Problems

Build a suffix tree (naive trie) and implement pattern search.
Longest repeated substring using suffix tree (deepest branching node).
Check if pattern P occurs in S using the suffix tree (O(m) after build).

Summary

Suffix tree: Trie (or compressed trie) of all suffixes; leaves store start indices. Path from root spells a substring.
Pattern search: Walk from root following P; collect all leaf indices in the subtree → O(m + k).
Naive build: Insert n suffixes → O(n²) time and space. Ukkonen: O(n) time and space.
Longest repeated substring: Deepest internal node with ≥2 leaf descendants. Suffix tree supports many string problems; suffix array + LCP are often simpler in practice.

7.14 Aho-Corasick Algorithm

Introduction

The Aho-Corasick algorithm solves multi-pattern string matching: given a text T and a set of patterns {P₁, P₂, …, Pₖ}, find all occurrences of any pattern in T in a single pass over the text. Instead of running KMP (or Rabin-Karp) k times—once per pattern—we build a trie of all patterns and add failure links (like the KMP failure function, but on the trie): from each node, the failure link points to the longest proper suffix of the current path that is also a prefix of some pattern. Then we scan T once, at each character following the trie (or the failure link when there is no matching edge), and at each node we report any pattern that ends there. Total time is O(n + m + z), where n = |T|, m = sum of pattern lengths, and z = total number of matches.

Real-World Analogy

Imagine a single pass through a document with a "dictionary" of many keywords. At each position you're in a "state" (a node in the trie). You try to extend the state by reading the next character; if you can't, you fall back to a shorter matching state (the failure link) and try again, without moving the text pointer back. So you never backtrack in the text—one forward scan—and whenever your state corresponds to a full keyword, you report it. It's like KMP for one pattern, but the "failure" can jump to a state that matches a different pattern's prefix.

Example

Patterns: "he", "she", "his", "hers". Text: "she sells seashells". Build trie: root → 's' → 'h' → 'e' (match "she"); root → 'h' → 'e' (match "he"); from the "h" node, 'i' → 's' (match "his"); from the "he" node, 'r' → 's' (match "hers"). Failure links: e.g. for the "sh" node, the longest proper suffix of "sh" that is a prefix of some pattern is "h" (a prefix of "he", "his", and "hers"), so the "sh" node has a failure link to the "h" node. Likewise the "she" node fails to the "he" node, which is why reaching "she" also reports "he". Scan: s→h→e reports "she" at index 0 and "he" at index 1; the same pair is reported again inside "seashells", at indices 13 and 14.

Formal Definition

Concept Note

Aho-Corasick automaton: (1) Trie: One node per prefix of any pattern; edges are characters; nodes that correspond to the end of a pattern store the pattern ID(s). (2) Failure link: From node u (path spells string S), failure[u] = the node v such that the path from root to v is the longest proper suffix of S that is also a prefix of some pattern. (3) Output link (optional): From u, follow failure links until we hit a node that ends a pattern; we can precompute "which patterns end at u or at any node reachable by failure from u" to report all matches quickly. Search: Start at root; for each character c of T, while current node has no edge c, go to failure[node]; then take the edge c (or stay at root if still no edge). At each node, report patterns ending there.

Why This Topic Matters

Multi-pattern in one pass: KMP does one pattern per pass; Aho-Corasick does all patterns in O(n + m + z). Essential when you have a fixed set of keywords and many texts (e.g. spam filter, intrusion detection, bioinformatics).
No backtrack in text: Like KMP, the text pointer never goes backward. So we can stream the text.
Interview and contests: "Find all occurrences of any of these k patterns" is the classic use case; mentioning Aho-Corasick shows you know beyond single-pattern KMP.

Mental Model

Trie of patterns: each path from root spells a prefix of some pattern; mark nodes where a pattern ends. Failure link: from each node, "if the next character doesn't match, what's the longest suffix of my path that could still match something?"—that suffix is a prefix of some pattern, so we have a node for it. Search: read T character by character; at each step, follow the trie (or failure) so that the current node always represents the longest match of some pattern prefix ending at the current position. When that node (or any node reachable via failure) ends a pattern, report it.

Building the Automaton

Step 1: Build the Trie

Start with root. For each pattern, insert it into the trie (like a normal trie). At the node reached after the last character, mark that pattern ID (and length, or the pattern itself) so we know which pattern(s) end there.

Step 2: Build Failure Links (BFS)

Root's failure is root (or null). For each node u at depth 1 (children of root), failure[u] = root. For nodes at depth > 1: let u be reached by edge c from parent p. Set w = failure[p]. While w ≠ root and w has no edge c, set w = failure[w]. Then failure[u] = w.child[c] if that exists, else root. This computes the longest proper suffix of the path to u that is a prefix of some pattern.

Python Implementation (Simplified)

from collections import deque

class AhoNode:
    def __init__(self):
        self.children = {}
        self.fail = None
        self.output = []   # pattern indices ending at this node

def build_aho_corasick(patterns):
    root = AhoNode()
    for i, p in enumerate(patterns):
        node = root
        for c in p:
            if c not in node.children:
                node.children[c] = AhoNode()
            node = node.children[c]
        node.output.append(i)

    root.fail = root
    q = deque()
    for c, child in root.children.items():
        child.fail = root
        q.append(child)
    while q:
        node = q.popleft()
        for c, child in node.children.items():
            q.append(child)
            fail = node.fail
            while fail != root and c not in fail.children:
                fail = fail.fail
            child.fail = fail.children.get(c, root)
            child.output += child.fail.output

    return root

def aho_search(text, root, patterns):
    """Return list of (pattern_index, start_position_in_text) for each match."""
    result = []
    node = root
    for i, c in enumerate(text):
        while node != root and c not in node.children:
            node = node.fail
        node = node.children.get(c, root)
        for pat_id in node.output:
            start = i - len(patterns[pat_id]) + 1
            result.append((pat_id, start))
    return result

Line-by-Line: Failure and Output

child.fail = fail.children.get(c, root): Starting from node.fail (node is the parent of child), we follow failure links until we find a node that has an edge for c, or we reach root. The child's failure is the target of that edge, or root if no such edge exists.
child.output += child.fail.output: Any pattern that ends at the failure node also "ends" at the current node in the sense that we should report it when we reach the current node (because the failure path is a suffix of the current path). So we merge output lists so that at each node we report all patterns that end at this node or at any node on the failure chain.
In search: when we can't follow an edge, we go to failure until we can or we're at root. Then we take the edge (or stay at root). We report all patterns in node.output at each step.
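
To see the failure and output logic run on the example patterns, here is a compact, self-contained sketch. It is a variation, not the implementation above: states are integer ids stored in parallel lists (goto, fail, out), and matches are returned as (start_index, pattern) pairs. The names build_automaton and find_all are illustrative.

```python
from collections import deque


def build_automaton(patterns):
    """Return (goto, fail, out): per-state edge dicts, failure targets, matched pattern ids."""
    goto, fail, out = [{}], [0], [[]]           # state 0 is the root
    for pid, pat in enumerate(patterns):
        state = 0
        for c in pat:
            if c not in goto[state]:
                goto[state][c] = len(goto)
                goto.append({})
                fail.append(0)
                out.append([])
            state = goto[state][c]
        out[state].append(pid)

    queue = deque(goto[0].values())             # depth-1 states keep fail = 0 (root)
    while queue:
        u = queue.popleft()
        for c, v in goto[u].items():
            f = fail[u]
            while f and c not in goto[f]:       # climb failure links until c can be followed
                f = fail[f]
            fail[v] = goto[f].get(c, 0)
            out[v] = out[v] + out[fail[v]]      # inherit patterns that end at a suffix
            queue.append(v)
    return goto, fail, out


def find_all(text, patterns):
    """Sorted list of (start_index, pattern) for every occurrence of any pattern."""
    goto, fail, out = build_automaton(patterns)
    state, hits = 0, []
    for i, c in enumerate(text):
        while state and c not in goto[state]:
            state = fail[state]                 # the text index i does not move here
        state = goto[state].get(c, 0)
        for pid in out[state]:
            hits.append((i - len(patterns[pid]) + 1, patterns[pid]))
    return sorted(hits)
```

find_all("she sells seashells", ["he", "she", "his", "hers"]) returns [(0, 'she'), (1, 'he'), (13, 'she'), (14, 'he')]. Each "he" is found only because the "she" state inherited the output of its failure target.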

Time Complexity

Build trie: O(m) where m = total length of all patterns. Build failure links: BFS over nodes; each node we may follow failure links a bounded number of times (amortized analysis: total failures followed is O(m)). So O(m). Search: n characters; at each character we may follow failure links (each follow moves us to a strictly shorter path, so total over the whole search is O(n)) and then one transition. Reporting z matches is O(z). Total O(n + m + z).

Space Complexity

Trie has O(m) nodes (each character of each pattern creates at most one node). Failure and output pointers: O(m). So O(m).

Edge Cases

Empty pattern: Skip or treat as matching everywhere; handle per problem.
One pattern: Aho-Corasick reduces to KMP-like behavior (trie is a path + failure links).
Pattern a prefix of another: The shorter pattern's end node is an ancestor of the longer's; output lists and failure chain ensure both are reported.

Common Mistakes

Forgetting to merge output from failure node into the current node—then you only report patterns that end exactly at this node and miss patterns that end at a suffix (e.g. "he" inside "she").
In the search loop, moving the text pointer when following failure—we should not advance i when we follow failure; we only advance when we consume a character from the text.

Common Mistake

When at a node that doesn't have an edge for the current character, we follow the failure link and do not advance the text index. We only advance the text index when we actually consume a character (take an edge). So the loop is: while no edge and not root, go to failure; then if there is an edge, take it and advance i; else (at root with no edge) just advance i.

Aho-Corasick vs KMP vs Rabin-Karp

Problem	Best choice
Single pattern	KMP or Rabin-Karp O(n + m)
Multiple patterns, one text	Aho-Corasick O(n + m + z)
Multiple patterns of equal length, hashing OK	Rabin-Karp: set of hashes, one pass, O(n + m) expected

Optimization Insight

For k patterns, running KMP k times costs O(Σ(n + m_i)) = O(k·n + m), which is dominated by k·n when the patterns are short. Aho-Corasick does one O(m) build and one pass over the text, so O(n + m + z), which is better when k is large. Rabin-Karp with a set of pattern hashes also does one pass (per distinct pattern length); Aho-Corasick is deterministic and doesn't need verification (no hash collisions).
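
For comparison, the "one scan per pattern" baseline is only a few lines. The sketch below uses str.find rather than KMP for brevity, so it shows the k-pass structure, not KMP itself; the name naive_multi_search is hypothetical. It is also handy as a reference answer when testing an Aho-Corasick implementation.

```python
def naive_multi_search(text, patterns):
    """Baseline: scan the text once per pattern. Returns sorted (start_index, pattern) pairs."""
    hits = []
    for pat in patterns:
        if not pat:
            continue                        # empty patterns are skipped in this sketch
        pos = text.find(pat)
        while pos != -1:
            hits.append((pos, pat))
            pos = text.find(pat, pos + 1)   # step by 1 so overlapping matches are kept
    return sorted(hits)
```

The outer loop makes the k·n term visible: every pattern triggers its own pass over the text, whereas the automaton shares one pass among all patterns.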

Pattern Recognition

Use Aho-Corasick when you have multiple patterns and one text (or many texts with the same pattern set). Keywords: "find any of these keywords," "multi-pattern search," "dictionary matching." For a single pattern, use KMP or Rabin-Karp.

Expert Tip

Precompute output lists so that at each node we store all pattern IDs that end at this node or at any node reachable by following failure links. Then during search we only need to iterate node.output at each step—no need to follow the failure chain to collect matches. Build failure links with BFS (level order) so that when we compute failure for a node at depth d, all nodes at depth < d already have their failure set.

Interview Insight

"For multiple patterns we build a trie of all patterns and add failure links: from each node, the failure points to the longest proper suffix of the current path that is also a prefix of some pattern—like KMP on the trie. Then we scan the text once: at each character we follow the trie or failure, and at each node we report patterns in the output list. Time O(n + m + z)." Implement trie build and failure BFS; mention output list merge from failure node.

Practice Problems

Find all occurrences of any of k patterns in a text (Aho-Corasick).
Keyword matching: given a list of forbidden words and a document, find all positions where any forbidden word appears.
Multi-pattern search with pattern IDs: return (pattern_id, start_index) for each match.

Summary

Aho-Corasick: Multi-pattern matching in one pass. Trie of patterns + failure links (longest proper suffix that is a prefix of some pattern) + output lists.
Build: Trie O(m), failure links via BFS O(m). Search: one pass over text O(n + z). Total O(n + m + z).
At each node, output = patterns ending at this node plus those at nodes on the failure chain. Do not advance text index when following failure.
Use when k patterns and one text; for single pattern, KMP or Rabin-Karp is simpler.

7.15 Longest Repeating Substring

Introduction

The longest repeating substring problem asks: given a string S, find the longest substring that appears at least twice in S (the two occurrences may overlap). For example, in "banana" the substring "ana" appears at indices 1 and 3; "anana" would be longer but only appears once; so "ana" (length 3) is one valid answer. This problem ties together suffix structures (7.12, 7.13) and hashing (7.6, 7.7): we can solve it with suffix array + LCP (max LCP value gives the length; the substring is at SA[i] for that i), with a suffix tree (deepest internal node with ≥2 leaves), or with binary search on length + rolling hash (for a fixed length L, check if any length-L substring appears twice using a set of hashes). Each approach has different time/space trade-offs; we cover the main ones here.

Real-World Analogy

Imagine a long paragraph. You want to find the longest phrase that appears more than once—perhaps it's a repeated slogan or a copy-paste. You could list every possible phrase and see which repeats (too slow), or you could use structure: sort all suffixes and notice that when two suffixes share a long common prefix, that prefix is a repeating substring. The "longest common prefix" between consecutive sorted suffixes (the LCP array) directly tells you the longest such repeat.

Example

S = "banana". Suffixes sorted: "a"(5), "ana"(3), "anana"(1), "banana"(0), "na"(4), "nana"(2), so SA = [5, 3, 1, 0, 4, 2]. LCP: "ana" and "anana" share "ana" (length 3); "anana" and "banana" share "" (0); etc., giving LCP = [0, 1, 3, 0, 0, 2]. Max LCP = 3 at i = 2, where SA[2] = 1, so the substring is S[1:4] = "ana" (the other occurrence starts at SA[1] = 3). So longest repeating substring = "ana". In "aaaa" the longest repeating substring is "aaa" (appears at 0 and 1).
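
The table for this example can be reproduced with a few lines. This is a small sketch with a naive sort and a direct character comparison; sa_lcp_table is an illustrative name, and LCP[0] is set to 0 by convention.

```python
def sa_lcp_table(s):
    """Rows of (i, SA[i], LCP[i], suffix), where LCP[i] compares suffix i with suffix i-1."""
    sa = sorted(range(len(s)), key=lambda i: s[i:])
    rows = []
    for i, start in enumerate(sa):
        k = 0
        if i > 0:
            prev = sa[i - 1]
            while start + k < len(s) and prev + k < len(s) and s[start + k] == s[prev + k]:
                k += 1
        rows.append((i, start, k, s[start:]))
    return rows


for row in sa_lcp_table("banana"):
    print(row)
```

The printed rows are (0, 5, 0, 'a'), (1, 3, 1, 'ana'), (2, 1, 3, 'anana'), (3, 0, 0, 'banana'), (4, 4, 0, 'na'), (5, 2, 2, 'nana'). The largest LCP value, 3, sits in row 2.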

Formal Definition

Concept Note

Longest repeating substring: A substring T of S that appears at least twice (i.e. there exist indices i ≠ j such that S[i:i+|T|] = S[j:j+|T|], in Python slice notation) and |T| is maximum. Overlapping is allowed (e.g. "aaa" in "aaaa" has occurrences 0 and 1). Suffix array + LCP: After sorting suffixes, LCP[i] = length of common prefix of S[SA[i]:] and S[SA[i−1]:]. Any common prefix of two suffixes is a repeating substring (it appears at SA[i] and SA[i−1]). Conversely, the two suffixes that start at the occurrences of the longest repeat share it as a prefix, and so does every suffix sorted between them, so some adjacent pair attains the maximum. So max LCP over i is the length of the longest repeating substring; the substring itself is S[SA[i]: SA[i] + LCP[i]] for an i that achieves the max. Hash approach: For length L, use rolling hash to get hashes of all length-L substrings; if any hash appears at least twice, some substring of length L repeats (after verifying that it is not a collision). Binary search on L to find the maximum such L.

Why This Topic Matters

Classic string problem: Appears in interviews and contests; combines suffix array (7.12), LCP, or hashing (7.6–7.7).
Application: Plagiarism detection, data compression (repeated blocks), bioinformatics (repeated sequences).
Unifies earlier topics: You can solve with suffix array + LCP (from 7.12), suffix tree (7.13), or binary search + rolling hash (7.6, 7.7).

Approach 1: Suffix Array + LCP

Build the suffix array (e.g. naive sort of suffixes). Build the LCP array: LCP[i] = longest common prefix of S[SA[i]:] and S[SA[i−1]:]. Then max_len = max(LCP), and the substring is S[SA[i]: SA[i] + max_len] for any i where LCP[i] = max_len. If max_len is 0, there is no repeating substring of length ≥ 1 (all characters distinct).

def longest_repeating_suffix_array(s):
    n = len(s)
    if n < 2:
        return ""
    suffixes = [(s[i:], i) for i in range(n)]
    suffixes.sort(key=lambda x: x[0])
    sa = [idx for _, idx in suffixes]
    lcp = [0] * n
    for i in range(1, n):
        a, b = s[sa[i]:], s[sa[i-1]:]
        j = 0
        while j < len(a) and j < len(b) and a[j] == b[j]:
            j += 1
        lcp[i] = j
    max_len = max(lcp)
    if max_len == 0:
        return ""
    i = lcp.index(max_len)
    return s[sa[i]:sa[i] + max_len]

Time: O(n² log n) for naive SA + O(n²) for naive LCP (each LCP[i] can be O(n)). With O(n log n) SA build and O(n) LCP build, total O(n log n). Space: O(n).
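
The O(n) LCP build mentioned above is commonly done with Kasai's algorithm. The following is a sketch under the same convention as the code above (lcp[i] compares the suffixes at sa[i] and sa[i−1], and lcp[0] = 0); the function name kasai_lcp is illustrative, and it expects a suffix array that has already been built.

```python
def kasai_lcp(s, sa):
    """LCP array in O(n), given the string s and its suffix array sa."""
    n = len(s)
    rank = [0] * n
    for i, start in enumerate(sa):
        rank[start] = i                 # rank[p] = position of suffix p in sorted order
    lcp = [0] * n
    h = 0                               # matched length carried over from the previous suffix
    for p in range(n):                  # visit suffixes in text order: 0, 1, 2, ...
        if rank[p] == 0:
            h = 0                       # the smallest suffix has no predecessor
            continue
        q = sa[rank[p] - 1]             # suffix sorted just before suffix p
        while p + h < n and q + h < n and s[p + h] == s[q + h]:
            h += 1
        lcp[rank[p]] = h
        if h > 0:
            h -= 1                      # dropping the first character loses at most one match
    return lcp
```

The key observation is that when we move from suffix p to suffix p + 1, the matched length can shrink by at most one, so h is incremented O(n) times in total.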

Approach 2: Binary Search + Rolling Hash

Binary search on the length L (from 1 to n−1). This works because the property is monotone: if some substring of length L repeats, its prefix of length L−1 repeats too, so the feasible lengths form a range from 1 up to the answer. For a fixed L, compute the hash of every length-L substring using a rolling hash (7.7). Store hashes in a set (or dict: hash → list of start indices). If any hash appears at least twice, then some substring of length L repeats, so we can try larger L. Otherwise try smaller L. Return the maximum L for which a repeat exists, and optionally the substring (e.g. by storing one start index per hash and checking for a second occurrence). To avoid collisions, use double hashing or verify with a direct comparison when two hashes match.

def has_repeating_substring(s, L, B=31, M=10**9+7):
    """True if some length-L substring appears at least twice."""
    n = len(s)
    if L > n or L <= 0:
        return False
    seen = {}
    base_pow = pow(B, L - 1, M)
    h = 0
    for i in range(L):
        h = (h * B + ord(s[i])) % M
    seen[h] = [0]
    for i in range(1, n - L + 1):
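        # Drop the leftmost character, then shift in the new one.
        # Python's % already returns a non-negative result, so no extra fix-up is needed.
        h = (h - ord(s[i-1]) * base_pow) % M
        h = (h * B + ord(s[i + L - 1])) % M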
        if h in seen:
            for start in seen[h]:
                if s[start:start+L] == s[i:i+L]:
                    return True
            seen[h].append(i)
        else:
            seen[h] = [i]
    return False

def longest_repeating_binary_search(s):
    n = len(s)
    if n < 2:
        return ""
    lo, hi = 1, n - 1
    best_len = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        if has_repeating_substring(s, mid):
            best_len = mid
            lo = mid + 1
        else:
            hi = mid - 1
    if best_len == 0:
        return ""
    for i in range(n - best_len + 1):
        sub = s[i:i+best_len]
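        # str.count skips overlapping matches (e.g. "banana".count("ana") == 1),
        # so look for a second occurrence that starts later instead.
        if s.find(sub, i + 1) != -1: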
            return sub
    return ""

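Time: O(log n) binary search steps; each step is O(n) expected with a rolling hash, plus O(L) for every direct comparison made when two hashes match. So O(n log n) expected when collisions are rare, and up to O(n² log n) when many candidates must be verified. With double hashing we can often skip verification. The final loop in longest_repeating_binary_search re-scans the string for each candidate and can cost O(n²); returning the start index from the check avoids that. Space: O(n) for the set/dict.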

Approach 3: Suffix Tree (Concept)

Build the suffix tree (7.13). The longest repeating substring is the path label of the deepest internal node that has at least two leaf descendants (i.e. the substring appears at least twice). Depth = number of characters from root to that node. With Ukkonen's algorithm the tree is built in O(n); then a DFS to find the deepest such node is O(n). So total O(n).

Comparison of Approaches

Approach	Time	Note
Suffix array + LCP (naive)	O(n² log n)	Simple; efficient SA gives O(n log n)
Binary search + hash	O(n log n)	No suffix structure; verify to avoid collisions
Suffix tree (Ukkonen)	O(n)	Optimal; implementation complex

Optimization Insight

Brute force: For each length L from n−1 down to 1, check all O(n) substrings of length L and see if any appears twice. This is O(n³) when whole substrings are compared or hashed, or O(n²) overall if each length is checked in O(n) with a rolling hash. Better: Suffix array + LCP gives the answer in one pass over the LCP array after building SA. Alternative: Binary search on L + rolling hash: for each L we do one pass O(n); log n values of L → O(n log n). Suffix tree gives O(n) with Ukkonen.
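
The brute force described above fits in a few lines and makes a useful reference answer when testing the faster approaches. A minimal sketch (the function name is illustrative) that stores whole slices in a set, so each length costs up to O(n·L):

```python
def longest_repeating_brute_force(s):
    """Try lengths from n-1 down to 1; return the first substring seen twice (overlap allowed)."""
    n = len(s)
    for length in range(n - 1, 0, -1):
        seen = set()
        for i in range(n - length + 1):
            sub = s[i:i + length]
            if sub in seen:
                return sub              # second occurrence found at a different start index
            seen.add(sub)
    return ""
```

Because lengths are tried from longest to shortest, the first hit is a longest repeating substring. For "banana" this returns "ana" and for "aaaa" it returns "aaa".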

Edge Cases

All characters distinct: No repeating substring of length ≥ 1; return "".
String length < 2: No repeat possible; return "".
Periodic string: in "abab" the longest repeating substring is "ab" (length 2, at indices 0 and 2); the whole string "abab" occurs only once, so it does not count. In "aaaa" the answer is "aaa" (at indices 0 and 1, overlapping).

Common Mistakes

Confusing "longest repeating" with "longest substring that appears exactly twice"—the problem usually means "at least twice," so "aaa" in "aaaa" is valid.
In LCP approach, returning the substring from the wrong index: use SA[i] (the start of the suffix at position i in the sorted order), and length LCP[i].
In hash approach, not handling collisions: two different substrings can have the same hash; verify with direct comparison or use double hashing.

Expert Tip

For interviews, the suffix array + LCP approach is the most direct: "Sort the suffixes, compute LCP; the maximum LCP value is the length of the longest repeating substring; the substring is S[SA[i]: SA[i]+LCP[i]] for that i." If the interviewer wants O(n log n) without building a full suffix array, binary search + rolling hash is a good alternative.

Interview Insight

"We can build the suffix array and LCP array. The longest repeating substring has length max(LCP), and we get the actual substring from S[SA[i]: SA[i]+LCP[i]] for an i that achieves the max. Alternatively, binary search on the length L and for each L use a rolling hash to check if any length-L substring appears twice—O(n log n)." Implement one approach; mention the other and suffix tree for O(n).

Practice Problems

Longest Duplicate Substring (LeetCode 1044): return the longest substring that appears at least twice; often solved with binary search + hashing or suffix array.
Longest Repeating Substring (LeetCode 1062): same problem, but only the length of the answer is required.

Summary

Longest repeating substring = longest substring that appears at least twice (overlap allowed).
Suffix array + LCP: max(LCP) = length; substring = S[SA[i]: SA[i]+LCP[i]] for i with LCP[i] = max. Build SA (e.g. O(n² log n) naive) and LCP.
Binary search + hash: Binary search on length L; for each L, rolling hash + set to see if any hash repeats; verify to avoid collisions. O(n log n).
Suffix tree: Deepest internal node with ≥2 leaves; O(n) with Ukkonen.

8.1 Singly Linked List

Introduction

An array stores elements in contiguous memory: you can jump to any index in O(1), but inserting or deleting in the middle (or at the front) can cost O(n) because you must shift elements. A singly linked list is a different way to represent a sequence: each element lives in a node that holds a value and a pointer (reference) to the next node. There is no random access by index—you must walk from the head—but insertion and deletion at the front (or at a known node) can be done in O(1). This tradeoff (no random access vs cheap front/middle updates) is why linked lists appear everywhere: in low-level memory allocators, in LRU caches, in graph adjacency lists, and in countless interview problems.

In this section we build a singly linked list from scratch in Python: the node class, the list class, and the core operations—traverse, insert at head/tail, delete, search. You will see exactly why some operations are O(1) and others O(n), and how to avoid the most common bugs (losing references, off-by-one, empty list). Master this and you are ready for reverse list, cycle detection, merge lists, and LRU cache.

Real-World Analogy

Imagine a treasure hunt where each clue card says: “Your next clue is at the red mailbox.” You start at the first card (the head), read it, follow to the next card, and so on until one card says “You’re done” (no next—that’s null). You cannot jump to “the 5th card” without walking through the first four. Adding a new first clue is easy: write a new card that points to the old first card and call that the new head. Removing the first clue is easy: the new head is whatever the first card pointed to. That’s a singly linked list: each node points only forward; to go backward you’d need a doubly linked list (Section 8.2).

Example

Browser history: the list of visited pages can be modeled as a linked list. If we only need to walk back through history from the most recent page, a singly linked list is enough: each visit is a node whose next reference points to the page visited before it. Recording a new visit makes it the new head and points it to the previous head, which is O(1). No need to shift a big array. Supporting both “Back” and “Forward” needs links in both directions (a doubly linked list, Section 8.2).

Formal Definition

Concept Note

Singly linked list: A linear data structure consisting of nodes. Each node contains (1) a value (data) and (2) a next reference (pointer) to the next node, or None if it is the last node. The list is accessed via a head reference to the first node. There is no direct access by index; traversal is sequential from head to the node whose next is None. The number of nodes is the length of the list.

We do not store “where is the 3rd element” in one step—we must follow head → next → next. So access by index is O(k) for the k-th element. Insertion after a given node (or at head) is O(1) if we already have a reference to that node; insertion at tail is O(1) if we keep a tail pointer, otherwise O(n). Deletion of a node is O(1) if we have a reference to the node before it (so we can rewire before.next); otherwise we may need O(n) to find that predecessor.

Why This Topic Matters

Foundation for linked structures: Doubly linked lists, circular lists, and many graph/tree representations build on the same “node + pointer” idea. If you understand a singly linked list, you understand the pattern.
Interview staple: Reverse a list, detect cycle, merge two sorted lists, remove Nth from end, reorder list—all assume you can traverse and mutate pointers confidently.
Real systems: Free lists in memory allocators, buckets in hash tables (chaining), and LRU caches often use linked lists for O(1) front insertion and removal.

Mental Model

Picture the list as a chain of boxes. Each box has two slots: one for data, one for “next box.” The first box is the head. The last box’s “next” slot is empty (None). To traverse: start at head, look at data, then move to the box in “next”; repeat until “next” is empty. To insert at front: create a new box whose “next” is the current head, then set head to the new box. Never “lose” the head or the rest of the chain—always update pointers in an order that doesn’t drop references.

Step-by-Step: Node and List Structure

A node is the unit of the list. In Python we use a class with data (or val) and next. The list itself is represented by a head reference; optionally we keep a tail and a length for O(1) tail operations and O(1) size.

1. Define the Node

class Node:
    def __init__(self, data):
        self.data = data
        self.next = None

Every node holds one value and a link to the next. None means “no next node.”

2. Define the List (Head + Optional Tail and Length)

class LinkedList:
    def __init__(self):
        self.head = None   # empty list
        # Optional: self.tail = None, self.length = 0

An empty list is represented by head is None. We can add tail for O(1) append and length for O(1) size.

ASCII Diagram: Structure and Traversal

  Empty list:   head → None

  List [10, 20, 30]:
       head
         │
         ▼
  ┌─────┬─────┐    ┌─────┬─────┐    ┌─────┬─────┐
  │ 10  │  ●──┼───►│ 20  │  ●──┼───►│ 30  │ None│
  └─────┴─────┘    └─────┴─────┘    └─────┴─────┘
    node1              node2           node3
  (data, next)       (data, next)     (data, next)

  Traversal: curr = head → curr = curr.next → curr = curr.next → curr is None (stop).
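
The traversal in the diagram can be reproduced by wiring three nodes by hand. This is only a small sketch using the Node class from step 1; the variable names head, curr, and visited are chosen for illustration.

```python
class Node:
    def __init__(self, data):
        self.data = data
        self.next = None

# Wire up 10 -> 20 -> 30 by hand.
head = Node(10)
head.next = Node(20)
head.next.next = Node(30)

# Walk the chain: visit, then follow next, until we fall off the end.
visited = []
curr = head
while curr is not None:
    visited.append(curr.data)
    curr = curr.next

print(visited)  # [10, 20, 30]
```

After the loop, curr is None, which is exactly the stop condition shown in the diagram.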

Python Implementation: Full Singly Linked List

Below is a complete implementation with: insert at head, insert at tail (with tail pointer), delete at head, search by value, get length, and traverse/print. We use a tail pointer so that appending is O(1).

class Node:
    def __init__(self, data):
        self.data = data
        self.next = None

class LinkedList:
    def __init__(self):
        self.head = None
        self.tail = None
        self.length = 0

    def is_empty(self):
        return self.head is None

    def insert_at_head(self, data):
        new_node = Node(data)
        new_node.next = self.head
        self.head = new_node
        if self.tail is None:
            self.tail = new_node
        self.length += 1

    def insert_at_tail(self, data):
        new_node = Node(data)
        if self.tail is None:
            self.head = self.tail = new_node
        else:
            self.tail.next = new_node
            self.tail = new_node
        self.length += 1

    def delete_at_head(self):
        if self.head is None:
            return None
        value = self.head.data
        self.head = self.head.next
        if self.head is None:
            self.tail = None
        self.length -= 1
        return value

    def search(self, key):
        curr = self.head
        while curr is not None:
            if curr.data == key:
                return True
            curr = curr.next
        return False

    def get_length(self):
        return self.length

    def to_list(self):
        result = []
        curr = self.head
        while curr is not None:
            result.append(curr.data)
            curr = curr.next
        return result

Line-by-Line Explanation (Key Parts)

insert_at_head: Create a new node; set its next to current head; set head to the new node. If the list was empty, the new node is also the tail. Always update length. Time O(1).
insert_at_tail: Create a new node. If list is empty, set both head and tail to it. Otherwise set tail.next = new_node and then tail = new_node. Time O(1) because we have a tail pointer.
delete_at_head: If empty, return None. Otherwise save head.data, set head = head.next. If the list becomes empty (head is None), set tail = None. Decrement length. Time O(1).
search: Walk from head with curr = curr.next until we find key or hit None. Time O(n).
to_list: Traverse and collect values into a Python list. Time O(n), space O(n) for the result.

Common Mistake

When inserting or deleting, updating pointers in the wrong order can “lose” the rest of the list. Rule: when you need to rewire A → B to A → C → B, first set C.next = B, then set A.next = C. If you set A.next = C before setting C.next, you lose the reference to B. Similarly, in delete_at_head, saving the value or the next node before changing head avoids losing the list.

Insertion in the Middle (After a Given Node)

If you have a reference node to the node after which you want to insert:

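def insert_after(self, node, data):
    if node is None:
        return
    new_node = Node(data)
    new_node.next = node.next   # 1) new node adopts the rest of the list
    node.next = new_node        # 2) only then link the new node in
    if node is self.tail:       # inserted after the last node: tail moves
        self.tail = new_node
    self.length += 1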

Order matters: set new_node.next = node.next first, then node.next = new_node. Time O(1). If you don’t have a reference to the node and must search by index or value, finding that node is O(n).

Deletion by Reference (Delete Node Given Only That Node)

In a singly linked list, to remove a node you normally need the previous node so you can do prev.next = node.next. If you are given only the node to delete (and no reference to the head), you cannot get the previous node without traversing from head—unless you “fake” deletion by copying the next node’s value into the current node and then skipping the next node. That gives O(1) but does not work for the tail (there is no “next” to copy). We’ll see this trick again in “Delete Node in a Linked List” (LeetCode 237).

# Given only the node to delete (and it's not the tail):
def delete_node(node):
    node.data = node.next.data
    node.next = node.next.next
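
When you do have the head, the general approach is to walk the list while remembering the predecessor. The following is a minimal sketch under the same head/tail/length layout as above; delete_value is a hypothetical method name introduced here for illustration, not part of the implementation shown earlier.

```python
class Node:
    def __init__(self, data):
        self.data = data
        self.next = None

class LinkedList:
    def __init__(self):
        self.head = None
        self.tail = None
        self.length = 0

    def insert_at_tail(self, data):
        new_node = Node(data)
        if self.tail is None:
            self.head = self.tail = new_node
        else:
            self.tail.next = new_node
            self.tail = new_node
        self.length += 1

    def delete_value(self, key):
        """Remove the first node whose data == key. Returns True if a node was removed."""
        prev = None
        curr = self.head
        while curr is not None and curr.data != key:
            prev = curr
            curr = curr.next
        if curr is None:
            return False            # key not found (or the list is empty)
        if prev is None:
            self.head = curr.next   # removing the head
        else:
            prev.next = curr.next   # bypass curr
        if curr is self.tail:
            self.tail = prev        # removing the tail; prev is None if the list is now empty
        self.length -= 1
        return True

    def to_list(self):
        result, curr = [], self.head
        while curr is not None:
            result.append(curr.data)
            curr = curr.next
        return result

lst = LinkedList()
for x in (10, 20, 30):
    lst.insert_at_tail(x)
lst.delete_value(30)     # removes the tail
print(lst.to_list())     # [10, 20]
```

The search for the predecessor is what makes this O(n); the rewiring itself is still a single assignment.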

Time Complexity Summary

Operation	Time	Note
Access by index k	O(k)	Must traverse k nodes from head
Search by value	O(n)	Worst case scan entire list
Insert at head	O(1)	Just rewire head
Insert at tail (with tail ptr)	O(1)	Wire tail → new, update tail
Insert after given node	O(1)	If you have the node reference
Delete at head	O(1)	head = head.next
Delete by reference (copy trick)	O(1)	Only when node is not tail
Traverse / length (no stored length)	O(n)	One pass from head to None

Space Complexity

The list itself uses O(n) space for n nodes (each node stores a value and a next reference). Extra space per operation: insert_at_head, insert_at_tail, delete_at_head, insert_after, and search use O(1); to_list also needs only O(1) working space for the curr pointer, but it returns a new Python list of size n, so its output is O(n).

Edge Cases

Empty list: head (and tail) is None. Every operation must check: insert at head/tail sets both head and tail to the new node; delete at head returns None and leaves list empty.
Single node: After delete_at_head, set tail = None so the list is correctly empty.
Insert after tail: When you do insert_after(tail, data), update self.tail = new_node so tail stays correct.
Delete the only node: head becomes None; tail must also become None.

Common Mistakes

Reversing the order of pointer updates and losing the rest of the list. Always set the new node’s next before changing existing pointers.
Forgetting to update tail when the last node is removed or when a new node is added at the end.
Assuming you can delete a node in O(1) when given only that node and it’s the tail—you cannot without a reference to the previous node (or a doubly linked list).
Off-by-one in “get k-th node”: the 1st node is at index 0; the k-th node requires k−1 moves of curr = curr.next from head.
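
To make the off-by-one point concrete, here is a small sketch of index access. It assumes 0-based indexing, and get_node_at is a hypothetical helper name, not something defined earlier in this section.

```python
class Node:
    def __init__(self, data):
        self.data = data
        self.next = None

def get_node_at(head, index):
    """Return the node at 0-based index, or None if the list is too short."""
    if index < 0:
        return None
    curr = head
    for _ in range(index):      # exactly `index` moves of curr = curr.next
        if curr is None:
            return None
        curr = curr.next
    return curr

head = Node(10)
head.next = Node(20)
head.next.next = Node(30)

print(get_node_at(head, 0).data)  # 10 (1st node, zero moves)
print(get_node_at(head, 2).data)  # 30 (3rd node, two moves)
print(get_node_at(head, 3))       # None (past the end)
```

Reaching the node at index i takes i moves, so the k-th node (counting from 1) takes k−1 moves.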

Optimization Insight

Without tail pointer: Insert at tail is O(n) because you must traverse to the last node. With tail pointer: Insert at tail becomes O(1). The tradeoff is one extra field and updating it on every operation that changes the last node. For a queue implemented as a linked list, tail is essential for O(1) enqueue. Stored length: If you maintain self.length, get_length is O(1); otherwise it’s O(n).
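
As an illustration of why the tail pointer matters for a queue, here is a minimal sketch. LinkedQueue, enqueue, and dequeue are names chosen for this example; raising IndexError on an empty queue is also an assumption, made here to mirror Python's built-in containers.

```python
class Node:
    def __init__(self, data):
        self.data = data
        self.next = None

class LinkedQueue:
    """FIFO queue: enqueue at the tail, dequeue at the head, both O(1)."""
    def __init__(self):
        self.head = None
        self.tail = None
        self.length = 0

    def enqueue(self, data):
        new_node = Node(data)
        if self.tail is None:           # empty queue
            self.head = self.tail = new_node
        else:
            self.tail.next = new_node   # no traversal needed
            self.tail = new_node
        self.length += 1

    def dequeue(self):
        if self.head is None:
            raise IndexError("dequeue from empty queue")
        value = self.head.data
        self.head = self.head.next
        if self.head is None:           # queue became empty
            self.tail = None
        self.length -= 1
        return value

q = LinkedQueue()
for job in ("a", "b", "c"):
    q.enqueue(job)
print(q.dequeue(), q.dequeue(), q.length)  # a b 1
```

Without self.tail, enqueue would have to walk from head to the last node on every call, turning n enqueues into O(n²) work.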

Pattern Recognition

Many linked-list problems use the same patterns:

Two pointers: Slow and fast (e.g. find middle, detect cycle, remove Nth from end).
Dummy node: A dummy head (e.g. dummy = Node(0); dummy.next = head) simplifies edge cases when the head might change (insertions, deletions at front).
Prev/curr traversal: When you need to delete a node or insert before a node, keep prev and curr and advance prev = curr; curr = curr.next.
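
The three patterns can be sketched in a few lines each. The helper names below (from_list, to_list, remove_all, middle) are hypothetical and exist only for this illustration; remove_all combines the dummy node with prev/curr traversal, and middle uses slow/fast pointers.

```python
class Node:
    def __init__(self, data):
        self.data = data
        self.next = None

def from_list(values):
    """Build a linked list from a Python list and return its head."""
    dummy = Node(None)
    tail = dummy
    for v in values:
        tail.next = Node(v)
        tail = tail.next
    return dummy.next

def to_list(head):
    out = []
    while head is not None:
        out.append(head.data)
        head = head.next
    return out

def remove_all(head, key):
    """Dummy node + prev/curr: delete every node whose data == key."""
    dummy = Node(None)
    dummy.next = head
    prev, curr = dummy, head
    while curr is not None:
        if curr.data == key:
            prev.next = curr.next   # unlink curr; prev stays where it is
        else:
            prev = curr
        curr = curr.next
    return dummy.next               # the head may have changed

def middle(head):
    """Slow/fast pointers: return the middle node (the second middle if the length is even)."""
    slow = fast = head
    while fast is not None and fast.next is not None:
        slow = slow.next
        fast = fast.next.next
    return slow

print(to_list(remove_all(from_list([7, 1, 7, 7, 2, 7]), 7)))  # [1, 2]
print(middle(from_list([1, 2, 3, 4, 5])).data)                # 3
```

Because the dummy node always precedes the real head, remove_all needs no special case for deleting the first node.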

Interview Insight

Interviewers expect you to implement a linked list without hesitation and to reason about pointer updates. Always clarify: “Is the list singly or doubly linked? Do we have a tail? Can we use a dummy node?” For “delete node given only that node,” mention the copy trick (overwrite with next, skip next) and its limitation (doesn’t work for tail). For “insert at head” and “delete at head,” state O(1) and show the exact pointer updates. Drawing a small diagram (3–4 nodes) before coding helps avoid pointer bugs.

Practice Problems

LeetCode 206: Reverse Linked List (iterative and recursive).
LeetCode 141: Linked List Cycle (Floyd’s slow/fast).
LeetCode 21: Merge Two Sorted Lists.
LeetCode 19: Remove Nth Node From End of List (two pointers).
LeetCode 237: Delete Node in a Linked List (copy trick).
LeetCode 876: Middle of the Linked List (slow/fast).

Summary

A singly linked list is a sequence of nodes; each node has data and next (or None). The list is accessed via head; optionally tail and length for O(1) append and size.
Insert at head / tail (with tail) / after a node: O(1). Delete at head: O(1). Search / access by index: O(n).
Always update pointers in an order that doesn’t lose references; update tail when the last node changes.
Use dummy node, two pointers, and prev/curr as standard patterns for list problems. Master this before tackling reverse, cycle detection, and merge (Sections 8.4–8.6).

8.2 Doubly Linked List

Introduction

In a singly linked list, each node points only to the next node. To delete a node, you typically need a reference to the previous node—which can cost O(n) to find. A doubly linked list adds a prev pointer in each node, so every node knows both its predecessor and its successor. That allows O(1) deletion of a node when you have a reference to that node (you just rewire prev.next and next.prev), and backward traversal from tail to head. The cost is extra space (one more pointer per node) and slightly more complex pointer updates. Doubly linked lists are the backbone of many practical structures: LRU caches (Section 8.9), browser back/forward history, and ordered data structures that need fast removal of an arbitrary element.

In this section we define the doubly linked list node and list class, implement insert/delete at head and tail, and—importantly—delete a node given only that node in O(1). We compare with the singly linked list and show when the extra prev pointer is worth it.

Real-World Analogy

Think of a train with bidirectional links. Each carriage has a door to the next carriage and a door to the previous one. From any carriage you can walk forward or backward without starting from the engine. If you remove one carriage, you only need to connect “previous carriage → next carriage” and “next carriage → previous carriage”—you don’t need to walk from the engine to find the previous carriage. That’s the doubly linked list: each node has prev and next; deletion at a known node is just rewiring two links.

Example

Browser back/forward: each page has “previous page” and “next page.” When you click a new link, the new page is linked in after the current page: the current page’s next points to it, and its prev points back to the current page. When you click Back, you follow prev; when you click Forward, you follow next. Removing a page from the middle (e.g. deleting a single history entry) is O(1) if you have the node, which is exactly what a doubly linked list provides.

Formal Definition

Concept Note

Doubly linked list: A linear data structure of nodes. Each node has (1) data, (2) next (reference to the next node, or None at the tail), and (3) prev (reference to the previous node, or None at the head). The list is accessed via head (first node) and optionally tail (last node). Traversal can go forward (head → next → …) or backward (tail → prev → …). Given a reference to any node, that node can be removed in O(1) by updating its predecessor’s next and its successor’s prev.

The key advantage over a singly linked list: delete node in O(1) given only that node. In a singly linked list you need the previous node to do deletion; in a doubly linked list the node itself carries prev, so you have everything you need.

Why This Topic Matters

LRU cache: The standard implementation keeps “most recently used” items in a doubly linked list so the least recently used (tail) can be evicted in O(1), and moving a node to “most recent” (head) is O(1) by remove-then-insert-at-head—both require O(1) delete at a known node.
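Standard containers: Java’s LinkedList and C++’s std::list are doubly linked lists, which is what makes removal at an iterator position O(1). CPython’s collections.deque is built on a doubly linked list of fixed-size blocks; it gives O(1) appends and pops at both ends, but removing an arbitrary element (deque.remove) is still O(n).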
Interview follow-up: “How would you delete a node in O(1) given only that node?” Answer: use a doubly linked list (or the “copy next into node” trick for singly, which doesn’t work for the tail).

Mental Model

Picture a chain where each link has two hooks: one to the next link, one to the previous. The first link’s “previous” hook is empty (None); the last link’s “next” hook is empty. To remove a link: disconnect its prev link’s “next” from it, and its next link’s “prev” from it; then connect prev’s next to next, and next’s prev to prev. You never need to traverse from the head to find “who points to this node”—the node’s prev tells you.

Node and List Structure

Node: data, prev, next

class Node:
    def __init__(self, data):
        self.data = data
        self.prev = None
        self.next = None

Each node has two pointers. head.prev and tail.next are always None.

List: head and tail

class DoublyLinkedList:
    def __init__(self):
        self.head = None
        self.tail = None
        self.length = 0

ASCII Diagram

  Empty:  head → None,  tail → None

  List [10, 20, 30]:
       head                                            tail
         │                                               │
         ▼                                               ▼
  ┌──────┬────┬──────┐    ┌──────┬────┬──────┐    ┌──────┬────┬──────┐
  │ None │ 10 │   ●──┼───►│      │ 20 │   ●──┼───►│      │ 30 │ None │
  │      │    │      │◄───┼──●   │    │      │◄───┼──●   │    │      │
  └──────┴────┴──────┘    └──────┴────┴──────┘    └──────┴────┴──────┘
    prev  data  next        prev  data  next        prev  data  next

  Delete middle node (20): set 10's next = 30, set 30's prev = 10. No traversal needed.

Python Implementation

class Node:
    def __init__(self, data):
        self.data = data
        self.prev = None
        self.next = None

class DoublyLinkedList:
    def __init__(self):
        self.head = None
        self.tail = None
        self.length = 0

    def is_empty(self):
        return self.head is None

    def insert_at_head(self, data):
        new_node = Node(data)
        new_node.next = self.head
        if self.head is not None:
            self.head.prev = new_node
        self.head = new_node
        if self.tail is None:
            self.tail = new_node
        self.length += 1

    def insert_at_tail(self, data):
        new_node = Node(data)
        new_node.prev = self.tail
        if self.tail is not None:
            self.tail.next = new_node
        self.tail = new_node
        if self.head is None:
            self.head = new_node
        self.length += 1

    def delete_at_head(self):
        if self.head is None:
            return None
        value = self.head.data
        self.head = self.head.next
        if self.head is None:
            self.tail = None
        else:
            self.head.prev = None
        self.length -= 1
        return value

    def delete_at_tail(self):
        if self.tail is None:
            return None
        value = self.tail.data
        self.tail = self.tail.prev
        if self.tail is None:
            self.head = None
        else:
            self.tail.next = None
        self.length -= 1
        return value

    def delete_node(self, node):
        """Remove node in O(1) given a reference to it."""
        if node is None:
            return
        if node.prev is not None:
            node.prev.next = node.next
        else:
            self.head = node.next
        if node.next is not None:
            node.next.prev = node.prev
        else:
            self.tail = node.prev
        self.length -= 1

    def search_forward(self, key):
        curr = self.head
        while curr is not None:
            if curr.data == key:
                return True
            curr = curr.next
        return False

    def to_list_forward(self):
        result = []
        curr = self.head
        while curr is not None:
            result.append(curr.data)
            curr = curr.next
        return result

    def to_list_backward(self):
        result = []
        curr = self.tail
        while curr is not None:
            result.append(curr.data)
            curr = curr.prev
        return result

Line-by-Line Explanation (Key Operations)

insert_at_head: New node’s next = head. If list was non-empty, head.prev = new_node. Then head = new_node. If list was empty, tail = new_node. Order: wire new node into the list first, then update head. O(1).
insert_at_tail: Symmetric: new_node.prev = tail; if non-empty tail.next = new_node; tail = new_node; if empty head = new_node. O(1).
delete_at_head: Save value, head = head.next. If list becomes empty, tail = None. Else head.prev = None (new head has no predecessor). O(1).
delete_at_tail: Symmetric: tail = tail.prev; if empty head = None; else tail.next = None. O(1).
delete_node(node): If node.prev exists, set node.prev.next = node.next; else node was head, so head = node.next. If node.next exists, set node.next.prev = node.prev; else node was tail, so tail = node.prev. No traversal—O(1).

Common Mistake

When deleting a node, forgetting to handle the case when the node is head or tail. If node.prev is None, you must set self.head = node.next; if node.next is None, you must set self.tail = node.prev. Also, when delete_at_head leaves the list non-empty, the new head’s prev must be set to None; otherwise it still references the removed node, and backward traversal will walk past the head.

Singly vs Doubly Linked List: Comparison

Aspect	Singly	Doubly
Space per node	1 pointer (next)	2 pointers (prev, next)
Delete node given only that node	O(1) copy trick (not for tail), else O(n) to find prev	O(1) always
Backward traversal	Not possible without extra structure	O(n) from tail
Insert/delete at head or tail	Insert at head/tail and delete at head: O(1) (tail insert needs a tail pointer); delete at tail: O(n) to find the new tail	O(1)
Pointer updates per insert/delete	Fewer	More (prev and next both sides)

Time and Space Complexity

Time: Insert at head/tail O(1). Delete at head/tail O(1). Delete given node O(1). Search O(n). Access by index O(k). Forward or backward traversal O(n).

Space: O(n) for n nodes; each node has two pointers plus data. A doubly linked list stores one more pointer per node than a singly linked list (two instead of one), so the per-node pointer overhead doubles.

Edge Cases

Empty list: head and tail are None. Insert at head or tail sets both head and tail to the new node.
Single node: head == tail. delete_at_head or delete_at_tail must set both head and tail to None.
delete_node on head: node.prev is None → set head = node.next. If that’s None, set tail = None.
delete_node on tail: node.next is None → set tail = node.prev. If that’s None, set head = None.

Common Mistakes

Updating only next (or only prev) when deleting, leaving the other direction inconsistent. Both prev.next and next.prev must be updated (or head/tail if at boundary).
In insert_at_head, forgetting head.prev = new_node when the list was non-empty—then the old head’s prev stays None and backward traversal is wrong.
Assuming delete_node is O(1) in a singly linked list without the copy trick—in general you need O(n) to find the previous node.

Optimization Insight

Use a doubly linked list when you need O(1) removal of an arbitrary node given a reference (e.g. LRU cache: remove from current position and add to head) or O(1) deletion at the tail. Use singly when you only need insertion at either end and deletion at the front (for example, a stack or a queue) and don’t need to delete an arbitrary node by reference; it saves space and the code is simpler. For “delete node given only that node” in an interview, stating “with a doubly linked list this is O(1)” shows you know the tradeoff.
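
The “remove from current position and add to head” step can be sketched as follows. This is not a full LRU cache: the dictionary stands in for the cache’s hash map, and insert_node_at_head, unlink, and move_to_front are hypothetical method names that work on existing nodes rather than raw values.

```python
class Node:
    def __init__(self, data):
        self.data = data
        self.prev = None
        self.next = None

class DoublyLinkedList:
    def __init__(self):
        self.head = None
        self.tail = None

    def insert_node_at_head(self, node):
        node.prev = None
        node.next = self.head
        if self.head is not None:
            self.head.prev = node
        self.head = node
        if self.tail is None:
            self.tail = node

    def unlink(self, node):
        """Detach node in O(1); same rewiring as delete_node above."""
        if node.prev is not None:
            node.prev.next = node.next
        else:
            self.head = node.next
        if node.next is not None:
            node.next.prev = node.prev
        else:
            self.tail = node.prev
        node.prev = node.next = None

    def move_to_front(self, node):
        if node is self.head:
            return
        self.unlink(node)
        self.insert_node_at_head(node)

    def to_list(self):
        out, curr = [], self.head
        while curr is not None:
            out.append(curr.data)
            curr = curr.next
        return out

dll = DoublyLinkedList()
nodes = {key: Node(key) for key in ("a", "b", "c")}  # key -> node lookup
for key in ("c", "b", "a"):
    dll.insert_node_at_head(nodes[key])              # list is now a, b, c

dll.move_to_front(nodes["c"])   # "c" was just used
print(dll.to_list())            # ['c', 'a', 'b']
print(dll.tail.data)            # b (least recently used)
```

Every step is a constant number of pointer assignments, which is why this structure pairs well with a hash map that hands you the node directly.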

Interview Insight

When the problem involves “remove from the middle in O(1)” or “move this item to the front in O(1)” (e.g. LRU), a doubly linked list is the right structure. Be ready to implement delete_node(node) and to handle head/tail in that method. Draw a 3-node diagram and show updating both prev.next and next.prev (or head/tail) so the interviewer sees you handle boundaries. Comparing “singly: need prev to delete, so O(n) unless copy trick; doubly: O(1) delete given node” is a strong answer.

Practice Problems

LeetCode 146: LRU Cache (doubly linked list + hash map for O(1) get/put).
LeetCode 430: Flatten a Multilevel Doubly Linked List.
LeetCode 707: Design Linked List (you may implement it as either a singly or a doubly linked list).

Summary

A doubly linked list has nodes with data, prev, and next. head and tail give O(1) access to both ends.
Delete a node in O(1) given only that node by rewiring node.prev.next and node.next.prev (and head/tail if node is head or tail).
Insert/delete at head or tail remain O(1). Backward traversal is O(n) from tail. Use doubly when you need O(1) arbitrary-node removal or backward traversal; use singly when you don’t, to save space and simplify updates.

8.3 Circular Linked List

Introduction

In a normal (linear) linked list, the last node’s next is None—traversal stops when you hit that. In a circular linked list, the last node’s next points back to the head, so there is no “end”: from any node you can keep following next and eventually loop back. That shape is useful when the data is naturally cyclic: round-robin scheduling, Josephus problem, multiplayer turn order, or any “rotate through a fixed set” scenario. You can build a singly circular list (one pointer per node, tail → head) or a doubly circular list (head’s prev → tail, tail’s next → head). The main implementation catch is termination: a traversal must stop when it reaches the starting point again, or you’ll loop forever.

In this section we define circular singly and doubly linked lists, implement insert and delete (with care for the one-node case), and show how to traverse safely. We also touch on the Josephus problem as a classic application.

Real-World Analogy

Imagine a round table of people. Each person can point to the person to their right. The person at the “last” seat points back to the first—so there is no real last: everyone has a next. To go around the table you start at one person and keep moving “next” until you’re back where you started. That’s a circular list. Removing someone means rewiring “previous person → next person”; if you remove the “head,” the new head is whoever was second. The table never has a physical end—only a designated starting point (head) for convenience.

Example

Round-robin CPU scheduling: Ready processes are kept in a circular list. The scheduler gives the CPU to the current node, then advances to next; when it reaches the “head” again it has completed one full round. No need to check for “end of list”—the list is circular by design.
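
A toy version of that scheduler loop might look like this. It is only a sketch: make_ring and round_robin are hypothetical helpers, and the “processes” are just strings.

```python
class Node:
    def __init__(self, data):
        self.data = data
        self.next = None

def make_ring(values):
    """Build a singly circular list and return its head (None if values is empty)."""
    head = tail = None
    for v in values:
        node = Node(v)
        if head is None:
            head = tail = node
        else:
            tail.next = node
            tail = node
        tail.next = head          # keep the circle closed after every append
    return head

def round_robin(head, time_slices):
    """Return the order in which processes get the CPU for the given number of slices."""
    order = []
    curr = head
    for _ in range(time_slices):
        if curr is None:          # no processes at all
            break
        order.append(curr.data)
        curr = curr.next          # the last node leads straight back to the first
    return order

ring = make_ring(["P1", "P2", "P3"])
print(round_robin(ring, 7))  # ['P1', 'P2', 'P3', 'P1', 'P2', 'P3', 'P1']
```

The loop body never asks “am I at the end?”, which is the point of using a circular list here.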

Formal Definition

Concept Note

Circular linked list (singly): A linked list in which the last node’s next points to the first node (head), forming a ring. There is no node with next is None. The list is identified by a head (or any designated node). Traversal: start at head, follow next until you return to head (or use a counter for n steps). Circular doubly linked list: Same idea, with head.prev = tail and tail.next = head, so you can go forward or backward in a loop.

Empty list is still head is None. A list with one node has node.next = node (and in doubly, node.prev = node)—that’s the critical edge case.

Why This Topic Matters

Round-robin and rotation: Any “take turns in a circle” or “rotate through items” logic maps naturally to a circular list; advancing is just curr = curr.next with no special case for the last element.
Josephus problem: Classic problem: n people in a circle, eliminate every k-th person; who survives? Modeling the circle as a circular list and repeatedly removing the k-th node is a direct approach.
Interview awareness: Less common than linear lists, but “detect cycle” (Section 8.5) and “split a circular list” sometimes appear; understanding that the last node points to head (not None) avoids bugs.

Mental Model

Picture a ring or loop of nodes. You have a “start” marker (head). Walking “next” from any node always leads to another node; after n steps you’re back at the start. To avoid infinite loops in code, you either (1) stop when curr == head again (after at least one step), or (2) iterate exactly n times if you know the length, or (3) use a “visited” set (rarely needed if you control the structure).

Singly Circular: Node and List

Same node as singly linked list: data and next. The list keeps a head; the last node satisfies last.next == head. Optionally we keep a tail so that insert-at-tail is O(1) and we don’t have to traverse to find the last node.

class Node:
    def __init__(self, data):
        self.data = data
        self.next = None

class CircularLinkedList:
    def __init__(self):
        self.head = None
        self.tail = None   # tail.next will point to head
        self.length = 0

ASCII Diagram

  Empty:  head → None,  tail → None

  Circular list [10, 20, 30]:
       head                              tail
         │                                 │
         ▼                                 ▼
  ┌─────┬─────┐    ┌─────┬─────┐    ┌─────┬─────┐
  │ 10  │  ●──┼───►│ 20  │  ●──┼───►│ 30  │  ●──┼──┐
  └─────┴─────┘    └─────┴─────┘    └─────┴─────┘  │
     ▲                                             │
     └─────────────────────────────────────────────┘
  tail.next → head; last node (30) points back to 10.

  One node:  head → [data|next] → points to self (node.next = node).

Python Implementation: Singly Circular

class Node:
    def __init__(self, data):
        self.data = data
        self.next = None

class CircularLinkedList:
    def __init__(self):
        self.head = None
        self.tail = None
        self.length = 0

    def is_empty(self):
        return self.head is None

    def insert_at_head(self, data):
        new_node = Node(data)
        if self.head is None:
            new_node.next = new_node
            self.head = self.tail = new_node
        else:
            new_node.next = self.head
            self.tail.next = new_node
            self.head = new_node
        self.length += 1

    def insert_at_tail(self, data):
        new_node = Node(data)
        if self.head is None:
            new_node.next = new_node
            self.head = self.tail = new_node
        else:
            new_node.next = self.head
            self.tail.next = new_node
            self.tail = new_node
        self.length += 1

    def delete_at_head(self):
        if self.head is None:
            return None
        value = self.head.data
        if self.head == self.tail:
            self.head = self.tail = None
        else:
            self.tail.next = self.head.next
            self.head = self.head.next
        self.length -= 1
        return value

    def traverse(self, max_steps=None):
        """Yield each node's data starting at head; stop after one full lap, or earlier after max_steps values."""
        if self.head is None:
            return
        curr = self.head
        steps = 0
        while True:
            yield curr.data
            steps += 1
            if max_steps is not None and steps >= max_steps:
                break
            curr = curr.next
            if curr == self.head:
                break

Line-by-Line Explanation (Key Points)

insert_at_head (non-empty): New node’s next = head. Then tail.next = new_node so the circle stays closed (the last node now points to the new first node). Then head = new_node. If we didn’t update tail.next, the last node would still point to the old head, and the new node would not be part of the ring.
insert_at_head / insert_at_tail (empty): Only node in the list, so new_node.next = new_node. Set both head and tail to it.
delete_at_head: If single node, set head and tail to None. Otherwise, tail.next = head.next (close the circle around the new head), then head = head.next. Forgetting to update tail.next would leave tail pointing to the removed node and break the circle.
traverse: Stop when curr == self.head after at least one step, or when max_steps is reached. Without a stopping condition you get an infinite loop.

Common Mistake

When inserting or deleting at head (or tail), forgetting to update the last node’s next. In a circular list, the tail always points to the head. So: on insert_at_head, set tail.next = new_node before moving head; on delete_at_head, set tail.next = head.next before moving head. Also: the one-node case must set node.next = node and on delete set both head and tail to None.

Doubly Circular (Concept)

In a doubly circular list, head.prev = tail and tail.next = head. Insert/delete at head or tail require updating both the head and tail sides of the ring (four pointer updates instead of two). The same O(1) delete-given-node idea from Section 8.2 applies: node.prev.next = node.next and node.next.prev = node.prev; no need to treat “end” differently because there is no end.
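
A minimal sketch of the doubly circular variant follows. It makes two assumptions that differ from the classes above: a new node starts out pointing at itself in both directions, and the list stores only head (the tail is always head.prev). Note that even though the rewiring has no boundary cases, the head reference still has to move if the removed node was the head.

```python
class Node:
    def __init__(self, data):
        self.data = data
        self.prev = self   # a lone node is its own neighbour in both directions
        self.next = self

class DoublyCircularList:
    def __init__(self):
        self.head = None
        self.length = 0

    def insert_at_tail(self, data):
        node = Node(data)
        if self.head is not None:
            tail = self.head.prev          # the tail is always head.prev
            node.prev, node.next = tail, self.head
            tail.next = node
            self.head.prev = node
        else:
            self.head = node
        self.length += 1
        return node

    def delete_node(self, node):
        """O(1) removal; the only special case is the list becoming empty."""
        if self.length == 1:
            self.head = None
        else:
            node.prev.next = node.next
            node.next.prev = node.prev
            if node is self.head:
                self.head = node.next
        node.prev = node.next = node
        self.length -= 1

    def to_list(self):
        out, curr = [], self.head
        for _ in range(self.length):       # traverse by count to avoid looping forever
            out.append(curr.data)
            curr = curr.next
        return out

ring = DoublyCircularList()
a, b, c = (ring.insert_at_tail(x) for x in (10, 20, 30))
ring.delete_node(a)                 # delete the head
print(ring.to_list())               # [20, 30]
print(ring.head.prev.data)          # 30 (head.prev is the tail)
```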

Traversal and Termination

Ways to traverse once around without infinite loop:

By count: For i in range(length): use curr, then curr = curr.next. Stops after n steps.
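By reaching head again: Start at curr = head; process curr, advance with curr = curr.next, and stop as soon as curr is head again. Each node is processed exactly once.
Do-while style: The loop above must test after processing (do { process curr; curr = curr.next } while curr != head), because curr == head is already true before the first step. Python has no do-while, so write while True: and break once curr is head again; this ensures the head is processed.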
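
Both stopping rules can be sketched on a hand-wired ring. The function names walk_by_count and walk_until_head are made up for this illustration.

```python
class Node:
    def __init__(self, data):
        self.data = data
        self.next = None

def walk_by_count(head, length):
    """Visit exactly `length` nodes, starting at head."""
    out, curr = [], head
    for _ in range(length):
        out.append(curr.data)
        curr = curr.next
    return out

def walk_until_head(head):
    """Visit each node once, stopping when we are back at head."""
    out = []
    if head is None:
        return out
    curr = head
    while True:                 # emulates: do { ... } while (curr != head)
        out.append(curr.data)
        curr = curr.next
        if curr is head:
            break
    return out

# Ring 10 -> 20 -> 30 -> back to 10.
a, b, c = Node(10), Node(20), Node(30)
a.next, b.next, c.next = b, c, a

print(walk_by_count(a, 3))   # [10, 20, 30]
print(walk_until_head(a))    # [10, 20, 30]
```

A plain `while curr is not head:` loop would exit immediately without visiting anything, which is why the test has to come after the first step.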

Time and Space Complexity

Same as linear singly linked list when a tail pointer is kept: insert at head O(1), insert at tail O(1), delete at head O(1). Search O(n)—and you must use a count or “back at head” to stop. Space O(n) for n nodes. The only extra cost is maintaining the circle (updating tail.next on head insert/delete).

Edge Cases

Empty list: head and tail None. Insert creates the single node with node.next = node.
Single node: head == tail and head.next == head. Delete must set head = tail = None.
Two nodes: Each points to the other. Delete head: tail stays, tail.next must become tail (new head), so tail.next = head.next correctly gives tail.next = tail.

Josephus Problem (Application)

n people stand in a circle; every k-th person is eliminated until one remains. Model: circular list of n nodes. Count the current node as 1, advance k−1 steps to land on the k-th node, remove it, and resume counting from the node after it. When one node remains, its data is the survivor. In a doubly circular list the node you land on can be removed in O(1); in a singly circular list, track the predecessor while advancing (or stop one node early and delete the next node). Either way each elimination costs O(k) for the walk, so the simulation takes O(n·k) in total. Faster formulations exist: the recurrence J(1) = 0, J(n) = (J(n−1) + k) mod n (0-indexed positions) runs in O(n), and k = 2 has a closed form.
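
Here is one way the simulation might be written with a singly circular list, tracking the predecessor while advancing. It assumes people are numbered 1..n and that counting starts at person 1 with the current person counted as 1. The second function uses the well-known recurrence J(1) = 0, J(m) = (J(m−1) + k) mod m as an independent check; both function names are chosen for this sketch.

```python
class Node:
    def __init__(self, data):
        self.data = data
        self.next = None

def josephus(n, k):
    """Simulate the elimination on a singly circular list; people are numbered 1..n."""
    if n < 1 or k < 1:
        raise ValueError("n and k must be positive")
    # Build the ring 1 -> 2 -> ... -> n -> 1.
    head = Node(1)
    prev = head
    for person in range(2, n + 1):
        prev.next = Node(person)
        prev = prev.next
    prev.next = head            # prev is now the tail, i.e. the predecessor of head

    curr = head
    remaining = n
    while remaining > 1:
        for _ in range(k - 1):  # curr counts as 1, so k-1 moves reach the k-th person
            prev = curr
            curr = curr.next
        prev.next = curr.next   # unlink the k-th person
        curr = prev.next        # counting resumes at the next person
        remaining -= 1
    return curr.data

def josephus_recurrence(n, k):
    """O(n) check: J(1) = 0, J(m) = (J(m-1) + k) mod m, converted to 1-based numbering."""
    survivor = 0
    for m in range(2, n + 1):
        survivor = (survivor + k) % m
    return survivor + 1

print(josephus(7, 3))             # 4
print(josephus_recurrence(7, 3))  # 4
```

For n = 7 and k = 3 the elimination order is 3, 6, 2, 7, 5, 1, leaving person 4.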

Expert Tip

For “delete the k-th node from current” in a singly circular list, you need the predecessor of the node to delete. Keep a pointer pred at the node just before the position where counting starts, advance it (k−1) times so that pred.next is the k-th node, then skip that node with pred.next = pred.next.next. In a doubly circular list you can instead land on the node to delete and remove it in O(1) using its prev and next.

Interview Insight

If the problem says “circular list” or “nodes in a ring,” remember: there is no None at the end—traversal stops when you return to the start (or after n steps). Mention the one-node case (node.next = node) and that insert/delete at head must update tail.next to keep the circle closed. For Josephus, briefly describe “circular list, repeatedly skip k−1 and remove the next node” and note that a doubly circular list allows O(1) removal.

Practice Problems

Implement a circular linked list with insert_at_head, insert_at_tail, delete_at_head, and safe traversal.
Josephus problem: find the survivor when every k-th person is eliminated from n people in a circle.
LeetCode 708: Insert into a Sorted Circular Linked List (insert and keep order; handle wrap-around).

Summary

A circular linked list has the last node’s next (and in doubly, head’s prev) point back to the head, forming a ring. No node has next/prev None (except in the empty list).
Maintain the circle on every insert/delete: e.g. tail.next = head always; on insert_at_head set tail.next = new_node; on delete_at_head set tail.next = head.next.
One-node list: node.next = node. Traverse by stopping when you return to head or after n steps. Use circular lists for round-robin, rotation, or Josephus-style problems.

8.4 Reverse Linked List

Introduction

Reversing a linked list means flipping the direction of every next pointer so that the last node becomes the new head and the original head becomes the tail. It is one of the most frequently asked linked-list problems in interviews and appears in many variations: reverse the whole list, reverse a portion (between positions m and n), reverse in groups of k, or reverse nodes in even/odd groups. Mastering the basic “reverse the entire list” gives you the same pointer-manipulation skills you need for all of these. We will build from intuition to code, then compare iterative (in-place) and recursive solutions and see why the iterative version is usually preferred for production and interviews.

Real-World Analogy

Imagine a train with cars linked in a single direction. The engine is at the front (head); each car is coupled only to the car behind it. To “reverse” the train, you don’t move the cars—you re-couple them. You start from the engine: uncouple it from the next car, then that next car becomes the new “front” and you attach the old engine behind it. You repeat: the car that was third is now first, and you attach the previous “front” behind it. By the time you reach the last car, the entire train is reversed: the old last car is the new engine, and the old engine is at the back. Reversing a linked list is the same idea: we rewire next pointers one node at a time, without moving data.

Example

Original list: head → 1 → 2 → 3 → None. After reverse: head → 3 → 2 → 1 → None. The node that held 1 now has next = None (it’s the tail); the node that held 3 is the new head. The values don’t move—only the links change.

Formal Definition

Concept Note

Reverse linked list (in-place): Given the head of a singly linked list, reverse the list by changing each node’s next pointer to point to its previous node (instead of the next). The former last node becomes the new head; the former head becomes the last node (its next becomes None). The operation is typically done in-place—no new list is allocated; only references are updated. Return the new head of the reversed list.

We do not create new nodes or copy values. We only change next pointers. That’s why we need to keep a reference to the “previous” node as we traverse: so we can set curr.next = prev. The challenge is doing this without losing the rest of the list—we must save the next node before we overwrite curr.next.

Why This Topic Matters

Interview staple: LeetCode 206 (Reverse Linked List) is among the most common questions. Interviewers use it to check if you can manipulate pointers correctly and handle the head/tail and single-node cases.
Building block: “Reverse between m and n,” “reverse in groups of k,” “reorder list,” and “palindrome linked list” all use the same idea: reverse a segment and reattach it. Once you can reverse a full list, you can reverse a sublist.
Pointer discipline: Reversing in-place teaches you to update pointers in the right order (save next, then rewire, then advance) and to avoid losing references—a skill that transfers to merge, partition, and reorder problems.

Mental Model

Picture three pointers moving together:

prev: The node that should come “after” the current node in the new list (i.e. the already-reversed part’s head). Initially None (because the new tail will point to nothing).
curr: The node we are currently rewiring. We will set curr.next = prev.
next_node: The rest of the list. We must save this before changing curr.next, otherwise we lose the list.

In one step we: save next_node = curr.next, set curr.next = prev, then move prev = curr and curr = next_node. Repeat until curr is None; then prev is the new head.

Step-by-Step Breakdown (Iterative)

Initialize prev = None and curr = head. We have not reversed any nodes yet, so “previous” to the first node is nothing.
While curr is not None:
- Save the rest of the list: next_node = curr.next.
- Rewire: curr.next = prev. Now the current node points backward (to the already-reversed part).
- Advance: prev = curr and curr = next_node.
When the loop exits, curr is None (we passed the tail). The last node we processed is prev, which is the new head. Return prev.

Critical rule: always save curr.next before you overwrite it. Otherwise you cannot move to the next node.

ASCII Diagram: One Step of the Iterative Reverse

  Before rewiring (prev = node 1, already reversed; curr = node 2):

              prev       curr    next_node
               │          │          │
               ▼          ▼          ▼
            ┌─────┐    ┌─────┐    ┌─────┐
  None ◄────│  1  │    │  2  │───►│  3  │───► ...
            └─────┘    └─────┘    └─────┘

  curr.next still points to 3 (save it before overwriting!)

  Step 1: next_node = curr.next   (save 3)
  Step 2: curr.next = prev        (2 now points to 1)
  Step 3: prev = curr, curr = next_node   (prev=2, curr=3)

  After one step:

                         prev       curr
                          │          │
                          ▼          ▼
            ┌─────┐    ┌─────┐    ┌─────┐
  None ◄────│  1  │◄───│  2  │    │  3  │───► ...
            └─────┘    └─────┘    └─────┘

After processing the last node, curr becomes None and prev is the old last node—the new head. The list is fully reversed.

Evolution: From Extra Space to In-Place

We can reverse a linked list in several ways. Seeing the evolution helps you choose the right approach and understand why the iterative in-place method is optimal.

Approach 1: Brute Force — Use an Array (or Stack)

Idea: Traverse the list and push each node’s value onto an array. Then traverse again (or traverse the array from the end) and overwrite each node’s value with the values in reverse order. Alternatively, push references to nodes onto a stack, then pop and rewire next pointers.

Time: O(n). Space: O(n) for the array or stack. It works and is easy to reason about, but it uses extra space when we don’t need to.
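
The stack variant of this approach can be sketched as follows. The ListNode class and the helper to_list are assumptions for the example; a Python list serves as the stack of node references.

```python
class ListNode:
    """Hypothetical node with val/next, in the style used on LeetCode."""
    def __init__(self, val, next=None):
        self.val = val
        self.next = next


def reverse_list_stack(head):
    """Reverse by stacking node references, then relinking. O(n) extra space."""
    stack = []
    curr = head
    while curr is not None:      # first pass: remember every node
        stack.append(curr)
        curr = curr.next
    if not stack:                # empty list
        return None
    new_head = stack.pop()       # old tail becomes the new head
    tail = new_head
    while stack:                 # pop order is the reverse of list order
        tail.next = stack.pop()
        tail = tail.next
    tail.next = None             # old head is now the tail
    return new_head


def to_list(head):
    """Collect node values into a Python list (helper for inspection)."""
    out = []
    while head is not None:
        out.append(head.val)
        head = head.next
    return out
```

Forgetting the final tail.next = None would leave the old head pointing at the old second node and create a two-node cycle.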

Approach 2: Recursive Reverse

Idea: Assume we can reverse the sublist starting at head.next. That returns the new head of the reversed rest. The current head is still pointing to the old second node (which is now the tail of the reversed rest). We set head.next.next = head (the old second node now points back to head) and head.next = None, then return the new head we got from the recursion.

Time: O(n). Space: O(n) for the call stack. Elegant and teaches recursion; in practice the stack depth equals list length.

Approach 3: Iterative In-Place (Optimal)

Idea: The three-pointer method above. One pass, rewire as we go. No extra data structure and no recursion stack (aside from a few local variables).

Time: O(n). Space: O(1). This is what you want in interviews and production when asked to “reverse in-place.”

Approach	Time	Space	Note
Array/stack (values or refs)	O(n)	O(n)	Simple but extra space
Recursive	O(n)	O(n) stack	Elegant; stack depth = n
Iterative in-place	O(n)	O(1)	Preferred in interviews

Python Implementation

Iterative (In-Place) — Recommended

def reverse_list(head):
    prev = None
    curr = head
    while curr is not None:
        next_node = curr.next   # save rest of list
        curr.next = prev        # rewire: point backward
        prev = curr             # advance prev
        curr = next_node        # advance curr
    return prev                 # new head

Recursive

def reverse_list_rec(head):
    if head is None or head.next is None:
        return head
    new_head = reverse_list_rec(head.next)
    head.next.next = head   # reverse the link from next back to head
    head.next = None        # head becomes tail
    return new_head

Base case: empty list or single node—nothing to reverse, return head. Recursive case: reverse everything after head; that gives new_head. The node that was second (head.next) is now the tail of the reversed rest; we make it point back to head with head.next.next = head, then set head.next = None so head is the new tail. We never change new_head—it’s returned all the way up.

Line-by-Line Explanation (Iterative)

prev = None: The “previous” node for the first element is nothing; that’s why the new tail’s next will be None.
curr = head: We start at the current head; we’ll rewire it to point to prev (None).
while curr is not None: We process every node. When curr becomes None, we’ve passed the tail.
next_node = curr.next: Must save the rest of the list before we overwrite curr.next. Without this, we could not move to the next node.
curr.next = prev: Reverse the link. The current node now points to the already-reversed part (or None for the first node).
prev = curr, curr = next_node: Move both pointers forward. The node we just processed becomes the new “head” of the reversed portion; we advance curr to the next node to process.
return prev: When the loop ends, curr is None (we’ve passed the last node). The last node we processed is prev—that’s the new head of the reversed list.

Time Complexity

We visit each node exactly once: one next read, one next write, and pointer updates. So the number of operations is proportional to n (the number of nodes). Time: O(n).

Space Complexity

Iterative: Only a fixed number of variables (prev, curr, next_node). No extra data structure and no recursion. Space: O(1).
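Recursive: Each call uses a stack frame. For a list of length n there are n nested calls (the last one hits the base case). Space: O(n) for the call stack. In CPython the default recursion limit is about 1000 frames, so the recursive version raises RecursionError on longer lists unless the limit is raised.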

Edge Cases

Empty list (head is None): The while loop never runs; we return prev, which is None. Correct.
Single node: We do one iteration: next_node = None, curr.next = None, prev = head, curr = None. We return prev (the single node). Correct.
Two nodes: First iteration reverses the first node; second iteration reverses the second and we return it as the new head. Both iterative and recursive handle this without special code.

No need for a special check for empty or single node in the iterative version—the logic naturally returns None or the single node. You can add if not head or not head.next: return head as an early exit for clarity or for the recursive version (recursive needs the base case).

Common Mistakes

Common Mistake

Overwriting curr.next before saving it. If you do curr.next = prev first and then try to get “the next node” with curr = curr.next, you’ve lost the rest of the list. Always save next_node = curr.next at the start of the loop.

Returning head instead of prev: After the loop, the original head is now the tail and its next is None; the new head is prev. Returning head hands the caller what looks like a one-node list.
Using curr.next after rewiring: Once curr.next = prev, the link to the rest of the list is gone unless you saved it in next_node. Never rely on curr.next for “advance” after you’ve changed it—use the saved reference.

Optimization Insight

For “reverse the entire list,” the iterative O(1) space solution is already optimal: you must touch every node to change its pointer, so Ω(n) time, and you need only a constant number of pointers to do it in one pass, so O(1) extra space is achievable. The only “optimization” beyond the standard iterative solution is code clarity: use clear names (prev, curr, next_node) and avoid unnecessary branches (empty/single-node are handled by the same loop).

Pattern Recognition

Reversing a linked list is a pointer-rewiring pattern:

Maintain a “reversed so far” (or “previous”) reference; process one node at a time; save the rest of the list before rewiring; then advance. The same idea appears when reversing a sublist (e.g. between positions m and n): you find the node before the sublist and the node after it, reverse the middle with the same while loop, then reattach the new head and tail of the reversed segment.
In “reverse in groups of k,” you reverse the first k nodes with this pattern, then recursively or iteratively process the next segment and attach the reversed segments. The core step is always “reverse a contiguous segment by rewiring next pointers.”

Once you can reverse a full list in a loop, you can isolate that loop for a segment and add logic to find the segment boundaries and reattach—that’s the pattern for LeetCode 92 (Reverse Linked List II) and 25 (Reverse Nodes in k-Group).

Expert Tip

When reversing a sublist (e.g. from position m to n), use a dummy node to simplify: dummy.next = head. Traverse to the node before position m (call it left_prev). The sublist head is left_prev.next. Run the same iterative reverse logic for (n − m + 1) nodes, then set left_prev.next to the new head of the reversed segment and point the old head of the segment (now its tail) at the node after position n. Return dummy.next. The same three-pointer logic applies inside the segment.
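
A sketch of that tip is shown below. It assumes 1-based positions with 1 ≤ m ≤ n ≤ length and does no input validation; ListNode, reverse_between, from_values, and to_values are illustrative names.

```python
class ListNode:
    """Hypothetical node with val/next, in the style used on LeetCode."""
    def __init__(self, val, next=None):
        self.val = val
        self.next = next


def reverse_between(head, m, n):
    """Reverse nodes at 1-based positions m..n (assumes 1 <= m <= n <= length)."""
    dummy = ListNode(0, head)
    left_prev = dummy
    for _ in range(m - 1):             # stop at the node before position m
        left_prev = left_prev.next

    segment_tail = left_prev.next      # old segment head becomes its tail
    prev, curr = None, segment_tail
    for _ in range(n - m + 1):         # same three-pointer loop, bounded by count
        next_node = curr.next
        curr.next = prev
        prev = curr
        curr = next_node

    left_prev.next = prev              # new head of the reversed segment
    segment_tail.next = curr           # node after position n (may be None)
    return dummy.next


def from_values(values):
    """Build a list from Python values (helper for the example)."""
    dummy = ListNode(0)
    tail = dummy
    for v in values:
        tail.next = ListNode(v)
        tail = tail.next
    return dummy.next


def to_values(head):
    """Collect node values into a Python list (helper for the example)."""
    out = []
    while head is not None:
        out.append(head.val)
        head = head.next
    return out
```

With the list 1 → 2 → 3 → 4 → 5, m = 2 and n = 4, the result is 1 → 4 → 3 → 2 → 5. The dummy node is what makes m = 1 work without a special case.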

Interview Insight

State the problem clearly: “Reverse the list in-place by changing next pointers; return the new head.” Give the iterative solution with prev, curr, and saving next_node before rewiring. Mention edge cases: empty list (return None), single node (return that node). If asked for recursion, give the recursive version and note that it uses O(n) stack space. Interviewers often follow up with “reverse between m and n” or “reverse in groups of k”—recognize that the core is the same reverse loop applied to a segment.

Practice Problems

LeetCode 206: Reverse Linked List (implement both iterative and recursive).
LeetCode 92: Reverse Linked List II (reverse nodes from position m to n; use dummy node and the same rewiring loop).
LeetCode 25: Reverse Nodes in k-Group (reverse every k nodes; link reversed segments).
LeetCode 234: Palindrome Linked List (reverse the second half and compare with the first half, or use a stack).
LeetCode 24: Swap Nodes in Pairs (similar pointer discipline; can be seen as reverse in groups of 2).

Summary

Reverse linked list means rewiring each node’s next to point to its previous node; the old tail becomes the new head, the old head becomes the tail (next = None).
Iterative in-place: Use prev, curr, and save next_node = curr.next before setting curr.next = prev; then prev, curr = curr, next_node. Return prev when curr is None. Time O(n), space O(1).
Recursive: Base case: empty or single node. Otherwise reverse head.next, set head.next.next = head and head.next = None, return the new head from the recursion. Time O(n), space O(n) stack.
Always save the next node before overwriting curr.next. Return the new head (prev in iterative), not the original head. The same rewiring pattern extends to reversing a sublist or reversing in groups of k.

8.5 Detect Cycle (Floyd's Algorithm)

Introduction

A cycle in a linked list exists when some node’s next pointer eventually points back to an earlier node, so traversal never reaches None and instead loops forever. Detecting whether a list has a cycle—and optionally finding where the cycle starts—is a classic problem. The elegant solution is Floyd’s cycle-finding algorithm (also called the tortoise and hare): use two pointers that move at different speeds. If there is a cycle, they will eventually meet inside the cycle; if there is no cycle, the fast pointer reaches None. The algorithm runs in O(n) time and O(1) extra space, and the same slow/fast pointer idea appears in “find the middle of the list” and “find duplicate number” in an array of integers in range 1..n. This section builds the intuition, proves why the pointers meet, and shows how to find the cycle’s starting node.

Real-World Analogy

Imagine two runners on a track. They start at the same point: one runs at speed 1 (one step per second), the other at speed 2 (two steps per second). If the track is a straight line (no cycle), the fast runner reaches the end and stops. If the track is a circle (cycle), the fast runner will eventually lap the slow runner, and they meet. You don’t need to mark every position or remember where you’ve been; you only need two runners and the rule “slow moves 1, fast moves 2.” That’s Floyd’s algorithm: the tortoise (slow) and the hare (fast) both start at the head; slow advances one node per step, fast advances two. If fast hits None, no cycle. If they meet, there is a cycle.

Example

List: head → 1 → 2 → 3 → 4 → 5 → 3 (5 points back to 3). Slow and fast start at 1. After a few steps: slow is inside the cycle, fast is also inside and “catches up” from behind. They meet at some node inside the cycle. If the list were 1 → 2 → 3 → None, fast would become None and we return false.

Formal Definition

Concept Note

Cycle: A linked list has a cycle if there exists a node such that following next pointers repeatedly eventually leads back to that same node. Equivalently, no node reachable from the head has next = None. Cycle detection: Given the head of a singly linked list, determine whether the list contains a cycle. Optionally, return the node where the cycle begins (cycle start). Floyd’s algorithm: Use two pointers (slow and fast) starting at head. Slow moves one step per iteration (slow = slow.next), fast moves two (fast = fast.next.next). If fast or fast.next becomes None, there is no cycle. If slow == fast at some point after moving, there is a cycle.

We do not use a hash set to store visited nodes—that would work but uses O(n) space. Floyd’s method uses only two pointers, so O(1) extra space.

Why This Topic Matters

Interview staple: LeetCode 142 (Linked List Cycle II) and the simpler “has cycle?” (LeetCode 141) are very common. Interviewers use them to check understanding of two-pointer techniques and optional follow-up “find the start of the cycle.”
Same pattern elsewhere: “Find the middle of the linked list” (slow once, fast twice; when fast reaches the end, slow is at the middle). “Find duplicate in array 1..n” (treat each value as a next pointer, so index i links to index arr[i]; the array then behaves like a linked list with a cycle, and Floyd finds the duplicate as the cycle start).
Correctness and proof: Understanding why the tortoise and hare meet (and why moving one back to head and advancing both one step at a time finds the cycle start) separates memorized code from real understanding.

Mental Model

Picture the list as a path that is either a straight line (tail’s next is None) or a stick with a loop: a “tail” from head into the cycle, then a circle. Two pointers start at the head. Slow takes one step, fast takes two. On a straight line, fast hits None. In a loop, both eventually enter the cycle; once inside, fast gains one step per iteration relative to slow, so fast will catch slow after a number of steps at most the length of the cycle. So: if fast ever becomes None (or fast.next is None before we use fast.next.next), no cycle. If slow == fast, cycle.

Step-by-Step: Floyd’s Algorithm (Detection Only)

Initialize slow = head and fast = head. (If head is None, return False.)
Loop while fast is not None and fast.next is not None:
- Move slow: slow = slow.next.
- Move fast: fast = fast.next.next.
- If slow == fast, return True (cycle detected).
If the loop exits, fast reached None or had no next, so the list is acyclic. Return False.

We check fast.next before using fast.next.next to avoid accessing .next on None, which raises AttributeError in Python.

Why Do Slow and Fast Meet? (Intuition)

Suppose the list has a cycle. Let L be the number of nodes from head to the cycle entrance, and C the number of nodes in the cycle. After L steps, slow is at the cycle entrance. Fast might already be inside the cycle; it has moved 2L steps total. Think of fast as “ahead” of slow by some offset in the cycle. From here on, both are inside the cycle. Each step, fast moves one extra step relative to slow. So fast gains one “position” on slow per step. The cycle has C positions, so within C steps fast will have lapped slow—they meet. So total steps until meeting is O(L + C) = O(n).

Concept Note

Formally: when slow enters the cycle, fast is already somewhere in the cycle. Let d (0 ≤ d < C) be the number of forward steps fast needs to reach slow’s position. Each iteration slow moves 1 and fast moves 2, so this gap shrinks by exactly 1. After d iterations the gap is 0 and they meet. Since d < C, they always meet before slow completes one full lap of the cycle.

Finding the Cycle Start (Optional but Important)

Once slow and fast meet at some node meet, we can find the cycle’s starting node without extra space:

Keep one pointer at meet and set another pointer entry = head.
Move both one step at a time: entry = entry.next and meet = meet.next. They will meet at the cycle entrance.

Why? Let L = distance from head to the cycle start and C = cycle length. When slow and fast first meet, slow has traveled L + a steps, where a (0 ≤ a < C) is how far past the cycle start the meeting point lies. Fast has traveled twice as far, and the extra distance must be a whole number of laps: 2(L + a) = (L + a) + kC for some integer k ≥ 1. So L + a = kC, hence L = kC − a = (k − 1)C + (C − a). Walking C − a steps forward from the meeting point brings you to the cycle start (a full lap when a = 0), so walking L steps from the meeting point means k − 1 full laps plus C − a more steps, which also ends at the cycle start. Walking L steps from head ends at the cycle start too, and before that the head pointer is still outside the cycle, so the two pointers cannot meet earlier. Moving one pointer from head and one from the meeting point, both one step at a time, they therefore meet exactly at the cycle start after L steps.
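
The identity L + a = kC can also be checked empirically. The sketch below builds lists with a chosen tail length L and cycle length C, runs the first phase while counting slow's steps, and confirms that the second phase stops at the entrance. ListNode, build_with_cycle, meeting_info, and verify are illustrative names, and meeting_info assumes a cycle exists (it would fail on an acyclic list).

```python
class ListNode:
    """Hypothetical node for illustration."""
    def __init__(self, val):
        self.val = val
        self.next = None


def build_with_cycle(tail_len, cycle_len):
    """Return (head, entrance): tail_len nodes, then a cycle of cycle_len nodes."""
    nodes = [ListNode(i) for i in range(tail_len + cycle_len)]
    for a, b in zip(nodes, nodes[1:]):
        a.next = b
    entrance = nodes[tail_len]
    nodes[-1].next = entrance        # close the cycle
    return nodes[0], entrance


def meeting_info(head):
    """Run phase 1; return (meeting node, steps taken by slow)."""
    slow = fast = head
    steps = 0
    while True:                      # safe only because a cycle is guaranteed
        slow = slow.next
        fast = fast.next.next
        steps += 1
        if slow is fast:
            return slow, steps


def verify(max_tail, max_cycle):
    """Check L + a = kC and the phase-2 result for all small L and C."""
    for L in range(max_tail + 1):
        for C in range(1, max_cycle + 1):
            head, entrance = build_with_cycle(L, C)
            meet, steps = meeting_info(head)
            assert steps % C == 0    # slow's distance L + a is a multiple of C
            entry = head
            while entry is not meet:
                entry, meet = entry.next, meet.next
            assert entry is entrance
    return True
```

Calling verify(6, 6) runs the check for every tail length 0..6 and cycle length 1..6, including the self-loop case L = 0, C = 1.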

ASCII Diagram: List With Cycle

  List with cycle (the last node points back into the list):

   head
     │
     ▼
   ┌───┐     ┌───┐     ┌───┐     ┌───┐
   │ 1 │────►│ 2 │────►│ 3 │────►│ 4 │
   └───┘     └───┘     └───┘     └───┘
                         ▲         │
                         │         ▼
                       ┌───┐     ┌───┐
                       │ 6 │◄────│ 5 │
                       └───┘     └───┘

  Cycle: 3 → 4 → 5 → 6 → 3 ...
  Slow and fast both enter; fast catches slow inside the cycle.

Evolution: Hash Set vs Floyd

Two main approaches:

Approach 1: Hash Set (Visited Nodes)

Traverse from head. For each node, check if it is in a set of visited nodes; if yes, cycle (and this node is in the cycle). If no, add the node and move to next. If you reach None, no cycle. Time O(n), Space O(n). Easy to implement and to find cycle start (first repeated node).
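
A minimal sketch of the hash-set approach is given here. It assumes a hypothetical ListNode class that does not override __eq__ or __hash__, so set membership compares nodes by identity rather than by value.

```python
class ListNode:
    """Hypothetical node; default hashing is by object identity."""
    def __init__(self, val):
        self.val = val
        self.next = None


def cycle_start_with_set(head):
    """Return the first node visited twice, or None if the list is acyclic."""
    visited = set()
    curr = head
    while curr is not None:
        if curr in visited:      # seen before: this is the cycle start
            return curr
        visited.add(curr)
        curr = curr.next
    return None                  # reached the end, so no cycle
```

Storing the nodes themselves, not their values, matters: two different nodes may hold the same value without forming a cycle.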

Approach 2: Floyd’s Algorithm (Two Pointers)

Slow and fast as above. No set. Time O(n), Space O(1). Preferred when O(1) space is required. Cycle start can still be found by the “entry from head” trick after detection.

Approach	Time	Space
Hash set (visited)	O(n)	O(n)
Floyd (slow/fast)	O(n)	O(1)

Python Implementation

Detection Only (Has Cycle?)

def has_cycle(head):
    if head is None:
        return False
    slow = head
    fast = head
    while fast is not None and fast.next is not None:
        slow = slow.next
        fast = fast.next.next
        if slow == fast:
            return True
    return False

Detection + Return Cycle Start (or None)

def detect_cycle_start(head):
    if head is None or head.next is None:
        return None
    slow = head
    fast = head
    while fast is not None and fast.next is not None:
        slow = slow.next
        fast = fast.next.next
        if slow == fast:
            entry = head
            while entry != slow:
                entry = entry.next
                slow = slow.next
            return entry
    return None

After finding the meeting point, we move entry from head and slow from meet one step at a time until they are equal; that node is the cycle start.

Line-by-Line Explanation (Detection)

if head is None: return False: Empty list has no cycle.
slow = head, fast = head: Both start at the head.
while fast is not None and fast.next is not None: We need fast.next to exist so we can use fast.next.next without raising. If fast reaches the end, no cycle.
slow = slow.next, fast = fast.next.next: Tortoise moves 1 step, hare moves 2.
if slow == fast: return True: Same node → we’re in a cycle.
return False: Loop exited because fast hit the end.

Time Complexity

When there is no cycle: fast reaches None in O(n) steps (at most n/2 iterations of the loop). When there is a cycle: as argued, slow and fast meet in O(L + C) = O(n) steps. So time O(n) in both cases.

Space Complexity

Only a fixed number of pointers (slow, fast, and optionally entry). Space O(1).

Edge Cases

Empty list: head is None → return False (or None for cycle start).
Single node, no cycle: head.next is None, so the loop condition fails immediately (fast = head, fast.next is None), the body never runs, and we return False. Correct.
Single node, self-cycle: head.next = head. The loop moves first and compares afterward: after one iteration slow = head.next = head and fast = head.next.next = head, so slow == fast and we return True. Correct.

Checking fast and fast.next before advancing avoids null dereference. For “cycle start,” if there is no cycle we return None; if there is, the entry/slow phase runs in O(n) and finds the start.

Common Mistakes

Common Mistake

Using fast.next.next without checking fast.next. If the list has an odd number of nodes and no cycle, fast lands on the last node; then fast.next is None and fast.next.next raises AttributeError. With an even number of nodes, fast itself becomes None, so the next fast.next access fails instead. Always ensure fast is not None and fast.next is not None before moving fast by two.

Comparing before moving: If you check slow == fast at the start of the loop (before updating), both are head and you might return true incorrectly for a list that has no cycle but you haven’t moved yet. So: move first, then check; or start with slow at head and fast at head.next and then loop (with proper null checks). The code above moves first then checks, so we never compare the initial equal state—we only compare after at least one step.
Wrong cycle start: The cycle start is not necessarily the meeting node. You must do the second phase: one pointer from head, one from meet, step both until they meet.

Optimization Insight

Floyd’s algorithm is already optimal for cycle detection in terms of extra space: you need at least one pointer to traverse, and with two pointers you get O(1) space and O(n) time. You cannot do better than O(n) time in the worst case (you may have to enter the cycle to detect it). The hash set approach trades space for implementation simplicity; use Floyd when the problem asks for O(1) space or when you want to show the classic solution.

Pattern Recognition

Slow/fast (tortoise and hare): Same-direction two pointers with different step sizes. Use for: (1) cycle detection in a linked list, (2) finding the middle of a linked list (slow 1, fast 2; when fast reaches the end, slow is at the middle), (3) “find duplicate number” in an array where values are in 1..n and there is exactly one duplicate (treat index i → value arr[i] as next pointer; then Floyd finds the duplicate as the cycle start). The pattern is “two pointers, different speeds; use the meeting point or the relative position to infer something about the structure.”
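
Use (3) can be sketched as below, treating index 0 as the head and each value nums[i] as the next pointer from index i. The function name find_duplicate is illustrative, and the code assumes the input really has n + 1 integers in 1..n with exactly one repeated value; it would loop or misbehave on other inputs.

```python
def find_duplicate(nums):
    """Return the repeated value in nums (n + 1 integers, each in 1..n)."""
    # Phase 1: find a meeting point inside the cycle.
    slow = fast = 0                  # index 0 is the "head"; no value points to it
    while True:
        slow = nums[slow]            # one step
        fast = nums[nums[fast]]      # two steps
        if slow == fast:
            break
    # Phase 2: walk from the head and from the meeting point in lockstep.
    entry = 0
    while entry != slow:
        entry = nums[entry]
        slow = nums[slow]
    return entry                     # cycle start = index reached from two places
```

For [1, 3, 4, 2, 2] the function returns 2. The cycle start is the index that two different indices point to, which is exactly the duplicated value.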

Expert Tip

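For “find the middle node” of a linked list: slow = fast = head, then while fast and fast.next: slow = slow.next, fast = fast.next.next. When the loop ends, slow is the middle (the second of the two middle nodes in an even-length list, which is what LeetCode 876 asks for). To stop on the first middle node instead, loop while fast.next and fast.next.next, after checking that head is not None. Same loop structure as cycle detection; there is no cycle, so fast eventually reaches the end.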
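
A short sketch of this loop follows, using a hypothetical ListNode class and the illustrative name middle_node. It assumes the list is acyclic.

```python
class ListNode:
    """Hypothetical node with val/next, in the style used on LeetCode."""
    def __init__(self, val, next=None):
        self.val = val
        self.next = next


def middle_node(head):
    """Return the middle node of an acyclic list (None for an empty list)."""
    slow = fast = head
    while fast is not None and fast.next is not None:
        slow = slow.next             # one step
        fast = fast.next.next        # two steps
    return slow
```

For 1 → 2 → 3 → 4 → 5 this returns the node holding 3; for 1 → 2 → 3 → 4 it also returns the node holding 3, because fast runs off the end only after slow has taken two steps.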

Interview Insight

State the problem: “Determine if the list has a cycle and optionally return the node where the cycle begins.” Give Floyd’s algorithm: slow and fast from head; move slow by 1, fast by 2; if they meet, cycle; if fast hits None, no cycle. Mention the null check for fast.next before fast.next.next. For “find cycle start,” explain the second phase: pointer from head and pointer from meeting point, both advance one step at a time until they meet—that’s the cycle start. If asked “why do they meet?”, give the intuition: once both are in the cycle, fast gains one step per iteration, so within cycle length steps they meet.

Practice Problems

LeetCode 141: Linked List Cycle (detection only).
LeetCode 142: Linked List Cycle II (detect cycle and return the cycle start node).
LeetCode 287: Find the Duplicate Number (array of n+1 integers in 1..n; one duplicate; treat index i → nums[i] as the next pointer and apply Floyd to find the duplicate).
LeetCode 876: Middle of the Linked List (slow/fast; when fast reaches end, slow is middle).

Summary

A list has a cycle if following next never reaches None and eventually repeats a node. Floyd’s algorithm uses two pointers (slow and fast) starting at head; slow moves 1 step, fast 2 steps per iteration. If they meet, there is a cycle; if fast becomes None (or fast.next is None), there is no cycle.
Always check fast is not None and fast.next is not None before using fast.next.next. Move slow and fast first, then check equality (to avoid falsely detecting a “cycle” at head when there isn’t one).
To find cycle start: after slow and fast meet, set one pointer to head and keep one at the meeting node; advance both one step at a time until they meet; that node is the cycle entrance. Time O(n), space O(1).
The same slow/fast pattern is used for finding the middle of a list and for “find duplicate in array 1..n” by modeling the array as a linked list with a cycle.

8.6 Merge Lists

Introduction

Merging two sorted linked lists means combining them into one sorted list by repeatedly taking the smaller of the two current heads and appending it to the result. It is the linked-list version of the “merge” step in merge sort and is one of the most common list operations in interviews. You use a dummy node to avoid special-casing the first node, and two pointers (one per list) that advance as you attach nodes. The result can be built in-place (reusing the existing nodes) so that time is O(n + m) and extra space is O(1). This section covers the two-sorted-list merge in detail, then briefly extends to merge k sorted lists (using a heap or repeated two-way merges).

Real-World Analogy

Imagine two sorted stacks of cards (e.g. both sorted by number). You want one combined sorted stack. You look at the top of each stack; take the smaller card and put it face-down on the result pile; repeat until one stack is empty, then put the rest of the other stack on top. You never need to “search” for where to insert—the next smallest overall is always one of the two tops. Merging two sorted linked lists is the same: the “top” is the head of each list; you compare, take the smaller, advance that list’s pointer, and attach the node to the merged list.

Example

List A: 1 → 3 → 5. List B: 2 → 4 → 6. Compare 1 and 2, take 1; compare 3 and 2, take 2; compare 3 and 4, take 3; and so on. Result: 1 → 2 → 3 → 4 → 5 → 6. Each node is chosen in O(1) time; total O(n + m).

Formal Definition

Concept Note

Merge two sorted lists: Given the heads of two singly linked lists sorted in non-decreasing order, merge them into one sorted list. We “merge” by repeatedly choosing the smaller of the two current head values, appending that node to the result, and advancing the pointer of the list we took from. The merged list should use the existing nodes (in-place) or new nodes, as required. Return the head of the merged list. Either list may be empty; the result is the other list. This is the same logic as the merge step in merge sort, but on linked lists instead of arrays.

The key is to avoid special-casing “who is the first node?” A dummy node (a temporary node whose next will point to the real head) lets us always do “append to current tail” in the same way; at the end we return dummy.next.

Why This Topic Matters

Interview staple: LeetCode 21 (Merge Two Sorted Lists) is extremely common. It tests pointer handling and the dummy-node pattern. LeetCode 23 (Merge k Sorted Lists) is a natural follow-up.
Merge sort on lists: Merge sort for linked lists works by splitting the list (e.g. slow/fast for middle), recursively sorting the two halves, then merging with this exact algorithm. So “merge two sorted lists” is the core subroutine.
Reusable pattern: The same “two pointers, compare, take smaller, advance” pattern appears in merging sorted arrays and in two-pointer problems on sorted data. Master it once for lists and you reuse it everywhere.
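To make the merge-sort connection concrete, here is a minimal sketch of sorting a linked list with the two-way merge as its subroutine. The names ListNode, sort_list, and to_values are illustrative assumptions, and the merge function is repeated only so the block runs on its own.

```python
class ListNode:
    """Minimal singly linked node (assumed shape: .val and .next)."""
    def __init__(self, val=0, next=None):
        self.val = val
        self.next = next


def merge_two_lists(list1, list2):
    """Two-way merge with a dummy node, reusing the input nodes."""
    dummy = ListNode()
    tail = dummy
    p, q = list1, list2
    while p is not None and q is not None:
        if p.val <= q.val:
            tail.next = p
            p = p.next
        else:
            tail.next = q
            q = q.next
        tail = tail.next
    tail.next = p if p is not None else q
    return dummy.next


def sort_list(head):
    """Merge sort: split at the middle, sort each half, merge the halves."""
    if head is None or head.next is None:
        return head
    # slow stops on the last node of the first half.
    slow, fast = head, head.next
    while fast is not None and fast.next is not None:
        slow = slow.next
        fast = fast.next.next
    second = slow.next
    slow.next = None  # cut the list into two independent halves
    return merge_two_lists(sort_list(head), sort_list(second))


def to_values(head):
    """Hypothetical helper: collect node values into a Python list."""
    out = []
    while head is not None:
        out.append(head.val)
        head = head.next
    return out


unsorted = ListNode(4, ListNode(2, ListNode(1, ListNode(3))))
print(to_values(sort_list(unsorted)))  # [1, 2, 3, 4]
```

This recursive version uses O(log n) call-stack space for a list of n nodes; the merge step itself still needs only O(1) extra space.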

Mental Model

You have two “current” nodes: p for list 1, q for list 2. You also maintain a “tail” of the merged list (initially the dummy). In each step: if one list is exhausted, attach the rest of the other to the tail and stop. Otherwise, compare p.val and q.val; attach the smaller node to the tail, advance that list’s pointer (p or q), and set tail to the node you just attached. Repeat until both lists are consumed. The dummy’s next is the head of the merged list.

Step-by-Step Breakdown

Create a dummy node and set tail = dummy. We will build the merged list by doing tail.next = ... and then tail = tail.next.
Set p = list1, q = list2.
While both p and q are not None:
- If p.val <= q.val: set tail.next = p, then p = p.next.
- Else: set tail.next = q, then q = q.next.
- Set tail = tail.next (the node we just attached).
After the loop, at most one of p and q still has nodes (the other is None). Attach the remainder: tail.next = p if p is not None else q.
Return dummy.next (the real head; dummy is not part of the result).

ASCII Diagram: Merge in Progress

  list1:  1 → 3 → 5 → None        list2:  2 → 4 → 6 → None
           p                            q

  After two steps (1 and 2 attached):
  dummy → 1 → 2        p → 3 → 5 → None        q → 4 → 6 → None
              tail

  We compare p.val (3) and q.val (4); take 3, so tail.next = p, tail = p, p = p.next.
  Then compare p.val (5) and q.val (4); take 4, so tail.next = q, tail = q, q = q.next; etc.
  Finally attach remaining: tail.next = p or q (whichever is non-None).

Python Implementation

Merge Two Sorted Lists (In-Place)

def merge_two_lists(list1, list2):
    dummy = ListNode(0)   # or Node(0) with .val and .next
    tail = dummy
    p, q = list1, list2
    while p is not None and q is not None:
        if p.val <= q.val:
            tail.next = p
            p = p.next
        else:
            tail.next = q
            q = q.next
        tail = tail.next
    tail.next = p if p is not None else q
    return dummy.next

We reuse existing nodes; no new list allocation. Only dummy is extra (one node).
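To try the function, a small harness along these lines may help. ListNode, from_values, and to_values are illustrative helpers assumed for this sketch (they are not defined in the text above), and the merge function is restated so the block is self-contained.

```python
class ListNode:
    """Minimal singly linked node (assumed shape: .val and .next)."""
    def __init__(self, val=0, next=None):
        self.val = val
        self.next = next


def from_values(values):
    """Hypothetical helper: build a linked list from a Python iterable."""
    dummy = ListNode()
    tail = dummy
    for v in values:
        tail.next = ListNode(v)
        tail = tail.next
    return dummy.next


def to_values(head):
    """Hypothetical helper: collect node values into a Python list."""
    out = []
    while head is not None:
        out.append(head.val)
        head = head.next
    return out


def merge_two_lists(list1, list2):
    dummy = ListNode()
    tail = dummy
    p, q = list1, list2
    while p is not None and q is not None:
        if p.val <= q.val:
            tail.next = p
            p = p.next
        else:
            tail.next = q
            q = q.next
        tail = tail.next
    tail.next = p if p is not None else q
    return dummy.next


a = from_values([1, 3, 5])
b = from_values([2, 4, 6])
merged = merge_two_lists(a, b)
print(to_values(merged))  # [1, 2, 3, 4, 5, 6]
print(merged is a)        # True: the node holding 1 was reused, not copied
```

Note that after the call, a and b no longer describe the original lists: their nodes have been relinked into the merged list.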

Line-by-Line Explanation

dummy = ListNode(0), tail = dummy: Dummy gives us a uniform “append” point. We always set tail.next and then move tail to the new end.
p, q = list1, list2: Current heads of the two lists.
while p is not None and q is not None: As long as both lists have a node, we compare and take one.
if p.val <= q.val: Take the smaller (use <= so list1 is preferred when equal; either way is fine). tail.next = p attaches that node; p = p.next advances list1.
tail = tail.next: The new tail is the node we just attached.
tail.next = p if p is not None else q: After the loop, one list may still have nodes. Attach the rest in one shot (either p or q is None, so the other is the remainder).
return dummy.next: The first real node of the merged list; dummy is discarded.

Time Complexity

Each node from both lists is attached exactly once. We do O(1) work per node (compare, set pointers). So time O(n + m) where n and m are the lengths of the two lists.

Space Complexity

We only use a dummy node and a few pointers. No extra list or recursion. Space O(1) (the merged list reuses the input nodes; the dummy is one node).

Edge Cases

Both lists empty: p and q are None; we never enter the loop; tail.next = p if p is not None else q sets tail.next = None. We return dummy.next = None. Correct.
One list empty: We don’t enter the while loop; tail.next is set to the non-empty list’s head. Correct.
Single node in one list: Handled by the same logic; we attach that node and the other list’s remainder.

Common Mistakes

Common Mistake

Forgetting to advance the tail. After tail.next = p (or q), you must do tail = tail.next. Otherwise the next attachment overwrites the same tail.next and you lose the rest of the list. Always move tail to the node you just attached.

Returning the wrong head: Return dummy.next, not dummy. The dummy is not part of the data; it was only a handle.
Not handling empty lists: The code above handles “both empty” and “one empty” via the loop and the final tail.next = p if p else q. No need for an extra if not list1: return list2 unless you want an early exit for clarity.

Merge K Sorted Lists (Brief)

Given k sorted linked lists, merge them into one sorted list. Two main approaches:

Approach 1: Repeated Two-Way Merge

Merge list 0 with list 1, then merge that result with list 2, and so on. Each merge re-walks the growing result, so the worst case is O(k × N), where N is the total number of nodes. Simple, but slow when k is large. (Merging the lists in pairs, round by round, brings this down to O(N log k).)

Approach 2: Min-Heap (Priority Queue)

Put the head of each list into a min-heap (by value). Repeatedly pop the smallest, append it to the result, and push its next if not None. Each pop/push is O(log k); we do it for every node across all lists. Time O(N log k) where N is the total number of nodes and k is the number of lists. Space O(k) for the heap. This is the standard optimal solution for LeetCode 23.

Expert Tip

In Python for “merge k lists,” use heapq. Push (node.val, id(node), node) so that nodes are comparable (heapq compares by tuple; if values tie, id(node) breaks ties). Or use a wrapper class that implements __lt__ by comparing val. When you pop, get the node, set tail.next = node, tail = node, and if node.next is not None, push node.next.
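The heap approach can be sketched as follows. This is a minimal version under stated assumptions: ListNode, from_values, and to_values are illustrative helpers, and the tuple layout (node.val, id(node), node) follows the tip above so that the heap never has to compare two nodes directly.

```python
import heapq


class ListNode:
    """Minimal singly linked node (assumed shape: .val and .next)."""
    def __init__(self, val=0, next=None):
        self.val = val
        self.next = next


def merge_k_lists(lists):
    """Merge k sorted linked lists using a min-heap of the current heads."""
    heap = []
    for node in lists:
        if node is not None:  # skip empty lists
            heapq.heappush(heap, (node.val, id(node), node))

    dummy = ListNode()
    tail = dummy
    while heap:
        _, _, node = heapq.heappop(heap)  # smallest current head
        tail.next = node
        tail = node
        if node.next is not None:
            nxt = node.next
            heapq.heappush(heap, (nxt.val, id(nxt), nxt))
    return dummy.next


def from_values(values):
    """Hypothetical helper: build a linked list from a Python iterable."""
    dummy = ListNode()
    tail = dummy
    for v in values:
        tail.next = ListNode(v)
        tail = tail.next
    return dummy.next


def to_values(head):
    """Hypothetical helper: collect node values into a Python list."""
    out = []
    while head is not None:
        out.append(head.val)
        head = head.next
    return out


lists = [from_values([1, 4, 5]), from_values([1, 3, 4]), from_values([2, 6])]
print(to_values(merge_k_lists(lists)))  # [1, 1, 2, 3, 4, 4, 5, 6]
```

The heap never holds more than one node per input list, which is where the O(k) space bound comes from.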

Pattern Recognition

Two-pointer merge: Two sorted sequences, two pointers; compare current elements, take the smaller, advance that pointer. Same idea as merging two sorted arrays; here we only change next pointers.
Dummy node: When building a new list and the head is not known in advance (or you want to avoid “if first node” branches), use a dummy as the initial tail and return dummy.next. This pattern appears in “merge two lists,” “partition list,” and “remove elements.”

Interview Insight

State the problem: “Merge two sorted linked lists into one sorted list, reusing nodes.” Use a dummy node and a tail; in a loop, compare the two current heads, attach the smaller to tail, advance that pointer and tail. After the loop, attach the remainder. Return dummy.next. Mention edge cases: both empty, one empty. If asked to merge k lists, give the heap approach: push all heads, pop smallest, append, push next; O(N log k) time, O(k) space.

Practice Problems

LeetCode 21: Merge Two Sorted Lists (implement with dummy and in-place).
LeetCode 23: Merge k Sorted Lists (heap of k heads; pop smallest, push next).
LeetCode 2: Add Two Numbers (two lists representing digits; similar two-pointer traverse and build result).
LeetCode 148: Sort List (merge sort: find middle with slow/fast, recurse, merge two sorted halves).

Summary

Merge two sorted lists: Use a dummy node and a tail; compare the two current heads (p, q), attach the smaller to tail.next, advance that pointer and tail; then attach the remainder and return dummy.next. Time O(n + m), space O(1) (the dummy is a single extra node).
Always set tail = tail.next after attaching a node so the next attachment goes to the new end.
For merge k sorted lists, use a min-heap of the k heads; pop smallest, append to result, push next; time O(N log k), space O(k). The two-way merge pattern is the core of merge sort on linked lists.

8.7 Intersection Point

Introduction

Two singly linked lists may intersect: at some node they merge into a single list and share the same nodes until the end. The lists can have different lengths before the intersection. The problem is: given the heads of two such lists, find the intersection node (the first node that is common to both lists), or return None if they do not intersect. You cannot modify the lists. The elegant O(1) space solution uses two pointers that traverse both lists in a way that aligns their “distance from the end”: either by computing lengths and advancing the longer list’s pointer by the difference, or by having each pointer traverse list A then list B (and the other list B then list A) so they cover the same total distance and meet at the intersection. This section covers both approaches and the hash-set fallback.

Real-World Analogy

Imagine two roads that start in different places but later merge into one. Two people start at the two beginnings and walk at the same speed. If one road is longer before the merge, the walker on the shorter road reaches the merge point first, so they do not arrive together. To make them meet at the merge point, give the person on the longer road a head start of exactly the length difference. Alternatively: person A walks “road 1 then road 2,” person B walks “road 2 then road 1.” They cover the same total distance, so on the second leg they reach the merge point at the same moment. If the roads never merge, both walkers simply run out of road at the same time without ever sharing a spot. That’s the two-pointer idea for intersection.

Example

List A: 4 → 1 → 8 → 4 → 5. List B: 5 → 6 → 1 → 8 → 4 → 5. The tail 8 → 4 → 5 is shared; the intersection node is the first shared node—the one with value 8. A has 2 nodes before intersection; B has 3. If we align so both pointers are “same distance from the end,” they will meet at the intersection node.

Formal Definition

Concept Note

Intersection of two linked lists: We are given two singly linked list heads, headA and headB. The lists may have different lengths. If they intersect, they share a common tail: from some node onward, next is the same for both. The intersection node is the first node that appears in both lists when traversing from the respective heads. If the lists do not intersect (both have distinct tails), return None. We assume no cycles and do not modify the lists. Return the intersection node (the actual node object), or None.

Constraints: list lengths can differ; we want O(1) extra space if possible; we only traverse and compare nodes by reference (identity), not by value.

Why This Topic Matters

Interview staple: LeetCode 160 (Intersection of Two Linked Lists) is a common two-pointer problem. It tests whether you can “align” two traversals without extra space.
Alignment idea: The trick—making two pointers travel the same effective distance so they meet at the target—reappears in “find cycle start” (head + meet) and in problems where you need to compare or sync two sequences of different lengths.
Reference equality: Intersection is defined by node identity (same object), not value. Two nodes with the same value are not necessarily the intersection node. In Python, compare nodes with is; == gives the same answer only if the node class does not override __eq__.

Mental Model

Picture two lists: each has a “unique” part and then a “common” part. The lengths of the unique parts can differ. If we start two pointers at the two heads and move them one step at a time, they will not meet at the intersection in one pass because one list might be longer before the merge. So we either: (1) compute both lengths, advance the longer list’s pointer by |lenA − lenB| steps so both pointers are the same distance from the end, then step both until they are equal or both None; or (2) run pointer A over list A then list B, and pointer B over list B then list A—each travels lenA + lenB, so they meet at the intersection node on the second “leg” if it exists.

Step-by-Step Breakdown

Method 1: Length Difference

Traverse list A to get length lenA; traverse list B to get lenB.
If lenA > lenB, advance headA by lenA − lenB steps (use a pointer p = headA, advance it). Else advance headB by lenB − lenA steps. Now both pointers are the same distance from the end of their lists.
Move both pointers one step at a time until they point to the same node (intersection) or both become None (no intersection). Return the node where they meet, or None.

Method 2: Two Pointers (A then B, B then A)

Set p = headA, q = headB.
While p is not q: if p is None, set p = headB; else set p = p.next. Likewise, if q is None, set q = headA; else set q = q.next. So when a pointer runs off the end of its list, it continues from the other list’s head.
When the loop exits, p is q. If they are both None, there is no intersection; otherwise p (or q) is the intersection node. Return p.

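In method 2, each pointer travels at most lenA + lenB nodes. To see why they align, let a and b be the lengths of the two unique parts and c the length of the shared part. Pointer p reaches the intersection on its second leg after a + c + b moves (plus one for the switch), and q after b + c + a moves (plus one for the switch): the same count. So they meet at the intersection node (when both are on the common part) or both become None after the same number of steps (when there is no intersection).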

ASCII Diagram: Intersection

  headA:  A1 → A2 → \
                      C1 → C2 → C3 → None   (shared tail)
  headB:  B1 → B2 → B3 → /

  Unique A: 2 nodes. Unique B: 3 nodes. Common: 3 nodes.
  Intersection node = C1.

  Length method: lenA=5, lenB=6. Advance B by 1: start B at B2.
  Then move both: A1,B2 → A2,B3 → C1,C1 → meet at C1.

  Two-pointer method: p goes A1→A2→C1→C2→C3→(switch to B) B1→B2→B3→C1.
  q goes B1→B2→B3→C1→C2→C3→(switch to A) A1→A2→C1.
  They meet at C1 when both are on the common part.

Python Implementation

Method 1: Length Difference

def get_intersection_node(headA, headB):
    def length(head):
        n = 0
        while head:
            n += 1
            head = head.next
        return n

    lenA, lenB = length(headA), length(headB)
    p, q = headA, headB
    if lenA > lenB:
        for _ in range(lenA - lenB):
            p = p.next
    else:
        for _ in range(lenB - lenA):
            q = q.next
    while p is not q:
        p = p.next
        q = q.next
    return p

Method 2: Two Pointers (No Length)

def get_intersection_node(headA, headB):
    p, q = headA, headB
    while p is not q:
        p = p.next if p else headB
        q = q.next if q else headA
    return p

When p reaches the end of A, we set p = headB; when q reaches the end of B, we set q = headA. So each pointer eventually traverses A then B (or B then A). They meet at the intersection node or both become None.
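A short harness can confirm this behavior on the example lists from earlier in this section. The ListNode class and the variable names below are assumptions for illustration; the key point is that the shared tail is built once and linked from both heads, so the intersection is a matter of identity, not value.

```python
class ListNode:
    """Minimal singly linked node (assumed shape: .val and .next)."""
    def __init__(self, val=0, next=None):
        self.val = val
        self.next = next


def get_intersection_node(headA, headB):
    p, q = headA, headB
    while p is not q:
        p = p.next if p is not None else headB
        q = q.next if q is not None else headA
    return p


# Shared tail 8 -> 4 -> 5, built once and referenced by both lists.
shared = ListNode(8, ListNode(4, ListNode(5)))
headA = ListNode(4, ListNode(1, shared))               # 4 -> 1 -> [8 -> 4 -> 5]
headB = ListNode(5, ListNode(6, ListNode(1, shared)))  # 5 -> 6 -> 1 -> [8 -> 4 -> 5]

hit = get_intersection_node(headA, headB)
print(hit is shared, hit.val)  # True 8

# Equal values but no shared node objects: no intersection.
x = ListNode(1, ListNode(2))
y = ListNode(1, ListNode(2))
print(get_intersection_node(x, y))  # None
```

Both lists contain a node with value 1 before the shared part, yet the function correctly skips them because they are distinct objects.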

Line-by-Line Explanation (Method 2)

p, q = headA, headB: Start at the two heads.
while p is not q: We use is for reference equality (same node object). When they are the same node—or both None—we exit.
p = p.next if p else headB: If p is not None, advance; if p is None (we’ve passed the end of A), switch to list B’s head. Same for q with headA.
return p: After the loop, p == q. If there was no intersection, both traversed to the end and are None. If there was an intersection, both are at the intersection node. So p is the correct return value in both cases.

Time Complexity

Method 1: Two length passes O(n + m), then advancing by the difference O(|n − m|), then stepping until meet O(min(n, m)). Total O(n + m).

Method 2: Each pointer travels at most n + m nodes (list A then list B, or vice versa). So at most 2(n + m) steps. O(n + m).

Space Complexity

Both methods use only a few pointers (and method 1 uses a length helper with O(1) extra space). Space O(1).

Edge Cases

No intersection: Lists have different tails. Method 2: both pointers eventually become None (after traversing A+B and B+A); p is q when both are None, so we return None. Correct.
One or both lists empty: If headA is None, in method 2 we have p = headB immediately (since p.next if p gives headB when p is None). If both are None, p and q stay None and we return None. Correct.
Same list: If headA is headB, the loop condition p is not q is false at the start, so we return headA without iterating.
Intersection at head of one list: One list is a suffix of the other. The length method or the two-pointer method still finds the first common node.

Common Mistakes

Common Mistake

Comparing by value instead of by reference. The intersection is defined by node identity: the first node that is the same object in both lists. Two nodes with the same value are not necessarily the intersection. Use p is q (or id(p) == id(q)), not p.val == q.val, to detect the intersection node.

Modifying the lists: We must not change next pointers. The problem expects a read-only traversal. Method 1 and 2 only read; don’t reverse or mutate.
In method 2, wrong switch: When p is None we set p = headB (the other list), not headA. Mixing these up breaks the “same distance” property.

Alternative: Hash Set

Traverse list A and store every node (or node id) in a set. Then traverse list B; the first node that is in the set is the intersection node. Time O(n + m), Space O(n) or O(m). Use when O(1) space is not required; the two-pointer method is preferred for O(1) space.
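A minimal sketch of the hash-set fallback follows. The function name get_intersection_node_hashset and the ListNode class are assumptions for illustration. It stores id(node) instead of the node itself so that the check stays an identity check even if a node class were to define its own equality.

```python
class ListNode:
    """Minimal singly linked node (assumed shape: .val and .next)."""
    def __init__(self, val=0, next=None):
        self.val = val
        self.next = next


def get_intersection_node_hashset(headA, headB):
    """Return the first node of list B that also belongs to list A, else None."""
    seen = set()
    node = headA
    while node is not None:
        seen.add(id(node))  # identity, not value
        node = node.next

    node = headB
    while node is not None:
        if id(node) in seen:
            return node
        node = node.next
    return None


shared = ListNode(8, ListNode(4, ListNode(5)))
headA = ListNode(4, ListNode(1, shared))
headB = ListNode(5, ListNode(6, ListNode(1, shared)))
print(get_intersection_node_hashset(headA, headB) is shared)  # True
```

Storing the shorter list in the set keeps the extra space at O(min(n, m)), at the cost of knowing which list is shorter.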

Optimization Insight

The two-pointer method (A then B, B then A) avoids computing lengths and uses a single loop. It is usually the cleanest to code and to explain. Both the length-difference and the two-pointer method achieve O(n + m) time and O(1) space; choose the one you find clearer. The hash set is O(n + m) time but O(n) space—mention it as an alternative if the interviewer allows extra space.

Pattern Recognition

Aligning two traversals: When two sequences have different lengths but share a common suffix (or you want them to “meet” at a certain point), you can either (1) equalize the distance from the end by advancing the longer one, or (2) make each pointer traverse both sequences so they cover the same total length. The same idea appears in “find cycle start” (one pointer from head, one from meeting point; both step once until they meet).

Expert Tip

If the problem said “intersection by value” (first node where values match), you’d need to be careful: multiple nodes can have the same value. The standard LeetCode 160 problem is intersection by reference (same node object). Always clarify in an interview: “Is the intersection defined by the same node object or by equal value?”

Interview Insight

State the problem: “Find the first node that is common to both lists, or None if they don’t intersect. No cycles, don’t modify the lists.” Give the two-pointer method: p and q start at headA and headB; when p hits the end of A, set p = headB; when q hits the end of B, set q = headA; loop until p is q; return p. Explain why they meet: each travels lenA + lenB, so they align on the common part (or both become None). Mention edge cases: no intersection (return None), one list empty. If asked for another approach, give the length-difference method or the hash set.

Practice Problems

LeetCode 160: Intersection of Two Linked Lists (implement two-pointer and/or length-difference; O(1) space).
Variation: If lists could have cycles, you must first detect cycles and find cycle starts; then reason about intersection of two lists that may have cycles (more complex).

Summary

Intersection point is the first node that is common to both lists (by reference). Use two pointers and align their effective path length so they meet at that node or both become None.
Length method: Get lenA and lenB; advance the longer list’s pointer by |lenA − lenB|; then step both until they are equal. Return the meeting node or None.
Two-pointer method: p = headA, q = headB. While p is not q: p = p.next if p else headB, q = q.next if q else headA. Return p. Each pointer traverses list A then B (or B then A), so they meet at the intersection or both become None. Time O(n + m), space O(1).
Compare nodes by reference (p is q), not by value. Do not modify the lists.

8.8 Remove Nth Node

Introduction

Remove the nth node from the end of a singly linked list: given the head and an integer n, remove the node that is the n-th from the end (1-indexed from the end) and return the head. For example, n = 1 means remove the last node; n = 2 means remove the second-to-last; if the list has length L, n = L means remove the first node (head). The challenge is that we don’t know the length in advance—we need a single pass or a clever two-pointer setup. The standard solution uses a dummy node and two pointers: advance one pointer n + 1 steps ahead, then move both until the leading pointer reaches the end; the trailing pointer is then just before the node to remove, so we can do prev.next = prev.next.next. This section covers the one-pass two-pointer approach, edge cases (removing the head, single node), and why the dummy simplifies the code.

Real-World Analogy

Imagine a line of people and you must remove the person who is “n places from the end.” If you start at the front, you don’t know where the end is until you walk there. Trick: use two walkers, a leader and a follower. The follower waits just in front of the first person (the dummy’s spot) while the leader walks n + 1 places ahead. Then both walk at the same pace. When the leader steps past the last person, the follower is still n + 1 places behind, which puts the follower right before the person to remove. You only need one pass down the line. The spot at the very front (the dummy) lets you treat “removing the head” the same as removing any other node: in that case the follower never leaves the dummy, and dummy.next is the node to remove.

Example

List: 1 → 2 → 3 → 4 → 5, n = 2. We remove the 2nd from end = the node with value 4. Result: 1 → 2 → 3 → 5. If n = 5, we remove the head (1); result: 2 → 3 → 4 → 5. The dummy node lets us handle “remove head” without a special branch: the follower ends up at the dummy, and we set dummy.next = head.next.

Formal Definition

Concept Note

Remove Nth node from end: Given the head of a singly linked list and an integer n (1 ≤ n ≤ list length), remove the n-th node from the end of the list (1-indexed: 1 = last node, 2 = second-to-last, etc.) and return the head. We assume the list has at least n nodes. We do this in (at most) one pass and O(1) extra space. The removed node is no longer part of the list; the list’s length becomes one less. If we remove the head, we return the new head (the former second node).

Using a dummy node whose next is the real head allows us to have a “previous” pointer even when the node to remove is the head; then we always do prev.next = prev.next.next and return dummy.next.

Why This Topic Matters

Interview staple: LeetCode 19 (Remove Nth Node From End of List) is very common. It tests the “lead pointer by n” two-pointer pattern and the dummy node to avoid head special-case.
Same pattern elsewhere: “Find the k-th node from the end” uses the same idea (lead by k, then step both; when lead reaches end, the other pointer is at the k-th from end). “Reorder list” and “split list” sometimes use similar “distance from end” logic.
Dummy + two pointers: The combination “dummy so we have a predecessor for the head” and “lead pointer by n steps” is a reusable pattern for any “from the end” indexing.

Mental Model

We want a pointer to the node before the one we will remove, so we can do prev.next = prev.next.next. The node to remove is n steps from the end—i.e. when we are at the end (None), we go back n steps to get to that node, and one more step back to get to its predecessor. So: put a “fast” pointer n + 1 steps ahead of a “slow” pointer (slow starts at dummy, so slow is “one before” the first data node). Move both one step at a time until fast reaches None. Then slow is exactly at the predecessor of the n-th-from-end node. Remove the next node and return dummy.next.

Step-by-Step Breakdown

Create a dummy node and set dummy.next = head. Set slow = dummy and fast = dummy.
Advance fast by n + 1 steps (so fast is (n + 1) steps ahead of slow). If fast becomes None before we finish n + 1 steps, the list has fewer than n nodes—handle as needed (e.g. return head or raise).
While fast is not None: move slow = slow.next and fast = fast.next. When the loop exits, fast is None—we’ve passed the last node. So slow is at the node that is (n + 1) from the end, i.e. the predecessor of the n-th-from-end node.
Remove the next node: slow.next = slow.next.next. Return dummy.next (the head; it may have changed if we removed the original head).

Why n + 1? We need slow to land on the predecessor of the n-th-from-end node. The n-th-from-end node is n steps before None, so its predecessor is n + 1 steps before None. After the initial advance, fast is n + 1 steps ahead of slow, and moving both together keeps that gap fixed. When fast reaches None, slow is therefore exactly n + 1 steps before None, which is the predecessor.

ASCII Diagram

  List: 1 → 2 → 3 → 4 → 5 → None   n = 2 (remove 4)

  After advancing fast by n+1 = 3 steps:
  dummy → 1 → 2 → 3 → 4 → 5 → None
  slow            fast

  After moving both until fast is None:
  dummy → 1 → 2 → 3 → 4 → 5 → None
                  slow        fast

  slow is at 3; slow.next is 4 (the node to remove).
  slow.next = slow.next.next  →  3.next = 5
  Result: 1 → 2 → 3 → 5 → None

Python Implementation

def remove_nth_from_end(head, n):
    dummy = ListNode(0)
    dummy.next = head
    slow = fast = dummy
    for _ in range(n + 1):
        fast = fast.next
        if fast is None and _ < n:
            return head   # list shorter than n; optional
    while fast is not None:
        slow = slow.next
        fast = fast.next
    slow.next = slow.next.next
    return dummy.next

We advance fast n + 1 times. If we want to assume valid input (list length ≥ n), we can skip the if fast is None check and assume fast is not None after the loop. Then we step both until fast is None, remove the next node of slow, and return dummy.next.

Line-by-Line Explanation

dummy.next = head, slow = fast = dummy: Dummy gives us a predecessor for the head. Both pointers start at dummy.
for _ in range(n + 1): fast = fast.next: Move fast forward by n + 1 steps. After this, fast is (n + 1) nodes ahead of slow. So when we then move both one step at a time until fast is None, slow will be at the (n + 1)-th node from the end = predecessor of the n-th from end.
while fast is not None: slow, fast = slow.next, fast.next: Advance both until fast reaches the end. Slow ends at the node before the one to remove.
slow.next = slow.next.next: Bypass the n-th-from-end node. We assume the list has at least n nodes so slow.next exists and we are not dereferencing None.
return dummy.next: The (possibly new) head. If we removed the original head, dummy.next now points to the second node, which is correct.

Time Complexity

Let L be the list length. fast moves n + 1 steps in the for-loop and then L − n steps in the while loop, so it walks the list once from dummy to None (L + 1 moves in total); slow trails it for L − n moves. Each move is O(1), so time is O(L). This is a single pass: the two loops split one traversal between them rather than repeating it.

Space Complexity

Only the dummy and a few pointers. Space O(1).

Edge Cases

Remove the head (n = length): If the list has exactly n nodes, then after n steps from the dummy fast is at the last node, and after n + 1 steps fast is None. The while loop is never entered, so slow is still at dummy and slow.next is the head, the node to remove. slow.next = slow.next.next makes dummy.next point to head.next, the new head. Correct.
Single node, n = 1: List is [x]. We remove the only node. Dummy.next should become None. After n+1 = 2 steps, fast is None. Slow is still dummy. slow.next = slow.next.next sets dummy.next = None. Return dummy.next = None. Correct.
List shorter than n: With the check in the for-loop, fast becomes None before n + 1 steps are done and we return head unchanged. Without the check, the next iteration evaluates fast.next on None and raises AttributeError. So either assume n is valid or keep the check and return head (or raise) when fast becomes None too early.
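The edge cases above can be exercised with a small harness like the following. ListNode, from_values, and to_values are illustrative helpers assumed for this sketch, and the function is restated (with the optional length guard kept) so the block runs on its own.

```python
class ListNode:
    """Minimal singly linked node (assumed shape: .val and .next)."""
    def __init__(self, val=0, next=None):
        self.val = val
        self.next = next


def remove_nth_from_end(head, n):
    dummy = ListNode(0, head)
    slow = fast = dummy
    for step in range(n + 1):
        fast = fast.next
        if fast is None and step < n:
            return head  # list shorter than n: leave it unchanged
    while fast is not None:
        slow = slow.next
        fast = fast.next
    slow.next = slow.next.next  # unlink the n-th node from the end
    return dummy.next


def from_values(values):
    """Hypothetical helper: build a linked list from a Python iterable."""
    dummy = ListNode()
    tail = dummy
    for v in values:
        tail.next = ListNode(v)
        tail = tail.next
    return dummy.next


def to_values(head):
    """Hypothetical helper: collect node values into a Python list."""
    out = []
    while head is not None:
        out.append(head.val)
        head = head.next
    return out


print(to_values(remove_nth_from_end(from_values([1, 2, 3, 4, 5]), 2)))  # [1, 2, 3, 5]
print(to_values(remove_nth_from_end(from_values([1, 2, 3, 4, 5]), 5)))  # [2, 3, 4, 5]
print(to_values(remove_nth_from_end(from_values([7]), 1)))              # []
print(to_values(remove_nth_from_end(from_values([1, 2]), 5)))           # [1, 2]
```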

Common Mistakes

Common Mistake

Leading by n instead of n + 1. If you advance fast by only n steps, when you then move both until fast is None, slow ends at the n-th-from-end node itself, not its predecessor. To remove that node you need the previous node. So you must lead by n + 1 so that slow lands on the predecessor. Alternatively, you could lead by n and then remove the node at slow by copying slow.next’s value into slow and doing slow.next = slow.next.next (delete-next trick), but that doesn’t work if the node to remove is the last node. So “lead by n + 1 and remove slow.next” is the standard approach.

No dummy: When n = length, the node to remove is the head. Without a dummy you have no “previous” node. You’d need a special case: “if we need to remove the head, return head.next.” The dummy unifies this: the predecessor of the head is the dummy.
Off-by-one in the loop: The “n-th from end” is 1-indexed: 1 = last node. So the predecessor of the n-th from end is (n + 1) from the end. Hence advance fast by n + 1.

Alternative: Two Pass (Length First)

First pass: compute length L. The node to remove is at position L − n + 1 from the front (1-indexed). Second pass: starting from a dummy, take L − n steps to reach the predecessor and set prev.next = prev.next.next. The dummy handles L − n = 0 (removing the head). Time O(L), space O(1). Same complexity; the one-pass “lead by n + 1” is more common in interviews.
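The two-pass alternative might look like the sketch below. The function name remove_nth_from_end_two_pass and the helper names are assumptions for illustration, and returning the list unchanged for an out-of-range n is one possible policy, not a requirement of the problem.

```python
class ListNode:
    """Minimal singly linked node (assumed shape: .val and .next)."""
    def __init__(self, val=0, next=None):
        self.val = val
        self.next = next


def remove_nth_from_end_two_pass(head, n):
    # Pass 1: measure the list.
    length = 0
    node = head
    while node is not None:
        length += 1
        node = node.next
    if n < 1 or n > length:
        return head  # assumed policy: ignore an invalid n

    # Pass 2: walk length - n steps from the dummy to reach the predecessor.
    dummy = ListNode(0, head)
    prev = dummy
    for _ in range(length - n):
        prev = prev.next
    prev.next = prev.next.next
    return dummy.next


def to_values(head):
    """Hypothetical helper: collect node values into a Python list."""
    out = []
    while head is not None:
        out.append(head.val)
        head = head.next
    return out


head = ListNode(1, ListNode(2, ListNode(3, ListNode(4, ListNode(5)))))
print(to_values(remove_nth_from_end_two_pass(head, 2)))  # [1, 2, 3, 5]
```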

Optimization Insight

One pass is sufficient: we don’t need to know the total length. The lead-by-(n+1) trick gives us the predecessor in a single sweep. The two-pass solution is easier to derive but does two traversals; both are O(n) time and O(1) space. In practice, the one-pass solution is preferred for elegance and because it matches the “two pointers with a gap” pattern used in other problems.

Pattern Recognition

“K-th from end” with two pointers: To get a pointer to the k-th node from the end (or its predecessor), advance one pointer by k (or k + 1) steps, then move both until the leading pointer reaches None. The trailing pointer is then at the desired position. Use a dummy when the “previous” of the head might be needed. This fixed-gap pattern appears in “remove nth from end,” “find k-th from end,” and “rotate list.” “Find middle” and “reorder list” (find middle, reverse second half, merge) are close relatives that use two pointers at different speeds rather than a fixed gap.

Expert Tip

If the problem asked to “find” the n-th node from the end (not remove it), you’d advance fast by n steps (not n + 1), then move both until fast is None. Then slow is at the n-th-from-end node. For removal we need the predecessor, so we use n + 1 and then remove slow.next.
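The “find” variant described in this tip can be sketched as follows. The function name nth_from_end is an assumption for illustration, and returning None when the list has fewer than n nodes is one possible convention.

```python
class ListNode:
    """Minimal singly linked node (assumed shape: .val and .next)."""
    def __init__(self, val=0, next=None):
        self.val = val
        self.next = next


def nth_from_end(head, n):
    """Return the n-th node from the end (1 = last node), or None if too short."""
    fast = head
    for _ in range(n):  # lead by n, not n + 1
        if fast is None:
            return None  # fewer than n nodes
        fast = fast.next
    slow = head
    while fast is not None:
        slow = slow.next
        fast = fast.next
    return slow


head = ListNode(1, ListNode(2, ListNode(3, ListNode(4, ListNode(5)))))
print(nth_from_end(head, 2).val)  # 4
print(nth_from_end(head, 5).val)  # 1
print(nth_from_end(head, 6))      # None
```

No dummy is needed here because nothing is unlinked, so no predecessor is required.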

Interview Insight

State the problem: “Remove the n-th node from the end in one pass, return the head.” Use a dummy and two pointers: advance the fast pointer n + 1 steps from the dummy, then move both until fast is None. Slow is then at the predecessor of the node to remove; do slow.next = slow.next.next and return dummy.next. Explain why n + 1: we need the predecessor so we can unlink the node; the predecessor is (n + 1) steps from the end. Mention edge case: removing the head (n = length) is handled by the dummy—slow stays at dummy and we unlink the first node. Assume n is valid (list length ≥ n) unless told otherwise.

Practice Problems

LeetCode 19: Remove Nth Node From End of List (one-pass with dummy and lead by n + 1).
Variation: Find the n-th node from the end (lead by n, then step both; slow lands on that node).
LeetCode 61: Rotate List (rotate right by k; similar “from end” reasoning—find new tail and new head).

Summary

Remove n-th node from end: Use a dummy (dummy.next = head) and two pointers. Advance fast by n + 1 steps from the dummy; then move both slow and fast until fast is None. Slow is at the predecessor of the n-th-from-end node. Set slow.next = slow.next.next and return dummy.next.
Lead by n + 1 (not n) so that slow lands on the predecessor, allowing a single unlink. The dummy handles the case when the node to remove is the head.
Time O(n), space O(1). Alternative: two-pass (compute length, then traverse to predecessor). Same complexity; one-pass is standard in interviews.
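For comparison, the two-pass alternative mentioned above might look like this. It is a sketch that assumes 1 ≤ n ≤ list length; ListNode, from_list, to_list, and remove_nth_two_pass are hypothetical names used only for this illustration.

```python
class ListNode:
    """Minimal singly linked list node (hypothetical helper for this sketch)."""
    def __init__(self, val=0, next=None):
        self.val = val
        self.next = next


def from_list(values):
    """Build a linked list from a Python list and return its head."""
    dummy = ListNode()
    tail = dummy
    for v in values:
        tail.next = ListNode(v)
        tail = tail.next
    return dummy.next


def to_list(head):
    """Collect node values into a Python list (for easy inspection)."""
    out = []
    while head:
        out.append(head.val)
        head = head.next
    return out


def remove_nth_two_pass(head, n):
    """Remove the n-th node from the end; assumes 1 <= n <= length."""
    # Pass 1: count the nodes.
    length = 0
    node = head
    while node:
        length += 1
        node = node.next
    # Pass 2: walk (length - n) steps from the dummy to reach the predecessor.
    dummy = ListNode(0, head)
    prev = dummy
    for _ in range(length - n):
        prev = prev.next
    prev.next = prev.next.next   # unlink the target node
    return dummy.next
```

The dummy plays the same role as in the one-pass version: when n equals the length, the predecessor is the dummy itself and the head is unlinked.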

8.9 LRU Cache Implementation

Introduction

An LRU (Least Recently Used) cache is a fixed-capacity cache that evicts the least recently used item when the cache is full and a new item is inserted. It supports two operations in O(1) average time: get(key)—return the value for the key if present and mark the item as most recently used; and put(key, value)—insert or update the key and mark it most recently used, evicting the LRU item if the cache is at capacity. To achieve O(1) get and put we combine a hash map (key → node or value) for O(1) lookup with a doubly linked list to maintain access order: the list keeps items ordered by recency, with “most recent” at one end and “least recent” at the other. When we access or add an item, we move its node to the “most recent” end; when we evict, we remove from the “least recent” end. The doubly linked list allows O(1) removal of a node when we have a reference to it (Section 8.2). This section builds the design from scratch and gives a complete Python implementation.

Real-World Analogy

Imagine a clothes rack that can hold only a fixed number of hangers. When you wear an item, you put it back on the “most recently used” end of the rack. When the rack is full and you add a new item, you remove the one that hasn’t been used for the longest time (the one at the “least recently used” end). You need to (1) find an item by “name” quickly (hash map) and (2) reorder the rack when you use something (move that hanger to the front)—and remove from the back when full. The doubly linked list is the rack: each hanger has a link to the next and the previous so you can pluck one out and move it to the front in O(1) time.

Example

Capacity 2. put(1, 10): cache has [1]. put(2, 20): cache has [1, 2] (2 is most recent). get(1): return 10, now order is [2, 1] (1 moved to most recent). put(3, 30): cache full; evict LRU = 2; cache becomes [1, 3]. get(2): not found, return -1.

Formal Definition

Concept Note

LRU cache: A data structure with a fixed positive capacity that supports: (1) get(key)—return the value associated with key if the key exists in the cache, otherwise return -1 (or None). If the key exists, the item is considered “used” and becomes the most recently used. (2) put(key, value)—if the key already exists, update its value and mark it most recently used; if the key is new and the cache is at capacity, evict the least recently used item, then add (key, value) and mark it most recently used. Both operations must run in O(1) average time. “Least recently used” means the item that was accessed (get or put) least recently among all items currently in the cache.

We need O(1) lookup (hash map), O(1) “move to most recent” (remove node from current position and add to head—requires doubly linked list so we can unlink in O(1) given the node), and O(1) “remove least recent” (remove from tail end of the list).

Why This Topic Matters

Interview staple: LeetCode 146 (LRU Cache) is a classic design problem. It combines hash map and doubly linked list and is asked frequently at senior levels.
Real systems: Caches in operating systems, databases, and web servers often use LRU or variants (LRU-K, ARC). Understanding LRU is a stepping stone to cache design and eviction policies.
Data structure combination: The pattern “hash map + doubly linked list” appears whenever you need O(1) lookup and O(1) reordering or removal by reference. Same idea is used in some LFU (Least Frequently Used) implementations and in ordered maps with “move to front.”

Mental Model

Maintain two structures: (1) A hash map from key to the list node that holds (key, value). (2) A doubly linked list that represents recency order: e.g. “most recent” at the head (right after a dummy head node) and “least recent” at the tail (right before a dummy tail node). So we have head <-> [most recent] <-> ... <-> [least recent] <-> tail. On get: look up the node in the map; if found, unlink it and add it after head (make it most recent); return the value. On put: if key exists, update the node’s value and move it to most recent. If key is new and cache is full, remove the node before tail (LRU), delete its key from the map, then create a new node, add it after head, and put it in the map. If key is new and cache is not full, just add the new node after head and put it in the map.

Step-by-Step Breakdown

Structure

Node: key, value, prev, next. We need the key in the node so that when we evict the node at the tail we can remove the corresponding key from the map.
Dummy head and tail: head.next = most recent, tail.prev = least recent. This avoids null checks when adding or removing.
Map: key → node for O(1) lookup.

Helper: Add node after head (make most recent)

Link the node between head and head.next: node.prev = head, node.next = head.next, head.next.prev = node, head.next = node.

Helper: Remove node from list

node.prev.next = node.next, node.next.prev = node.prev. The node is unlinked.

get(key)

If key not in map, return -1.
Get the node from the map. Remove the node from its current position (remove_node). Add it after head (add_after_head). Return node.value.

put(key, value)

If key in map: get the node, update node.value, remove_node, add_after_head. Return.
If cache is full (len(map) == capacity): the LRU node is tail.prev. Remove it from the list, remove its key from the map.
Create a new node for (key, value). add_after_head, and map[key] = node.

ASCII Diagram: List Order

  head <-> [MRU] <-> ... <-> [LRU] <-> tail

  After get(key): unlink that node, insert it between head and head.next.
  After put(new): if full, unlink tail.prev (LRU), remove from map; then add new node after head.
  New / updated item is always at head.next (most recent).

Python Implementation

class Node:
    def __init__(self, key=0, value=0):
        self.key = key
        self.value = value
        self.prev = None
        self.next = None

class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.cache = {}          # key -> node
        self.head = Node()
        self.tail = Node()
        self.head.next = self.tail
        self.tail.prev = self.head

    def _add_after_head(self, node):
        node.prev = self.head
        node.next = self.head.next
        self.head.next.prev = node
        self.head.next = node

    def _remove_node(self, node):
        node.prev.next = node.next
        node.next.prev = node.prev

    def get(self, key: int) -> int:
        if key not in self.cache:
            return -1
        node = self.cache[key]
        self._remove_node(node)
        self._add_after_head(node)
        return node.value

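    def put(self, key: int, value: int) -> None:
        if self.capacity <= 0:
            return  # zero-capacity cache stores nothing; avoids unlinking the dummy head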
        if key in self.cache:
            node = self.cache[key]
            node.value = value
            self._remove_node(node)
            self._add_after_head(node)
            return
        if len(self.cache) == self.capacity:
            lru = self.tail.prev
            self._remove_node(lru)
            del self.cache[lru.key]
        new_node = Node(key, value)
        self.cache[key] = new_node
        self._add_after_head(new_node)

Line-by-Line Explanation

Node: Holds key (for map cleanup on evict), value, prev, next. Doubly linked.
head, tail: Dummy nodes. head.next is the first real node (MRU), tail.prev is the last (LRU). head.prev and tail.next can stay None or unused.
_add_after_head(node): Inserts node between head and head.next. Standard doubly linked insert after head.
_remove_node(node): Unlinks node by rewiring node.prev.next and node.next.prev. O(1) because we have the node reference.
get: If key not in cache, return -1. Else get node, remove it from list, add after head (move to MRU), return value.
put: If key exists: update value, move to MRU (remove + add_after_head). Else: if at capacity, evict tail.prev (remove from list, delete from map). Then create new node, put in map, add_after_head.

Time Complexity

get(key): O(1) average: one hash lookup, then a constant number of pointer updates (two to unlink, four to reinsert after head). put(key, value): O(1) average: one lookup, and either (update + move) or (evict one node + add). All operations are O(1) assuming hash map and doubly linked list operations are O(1).

Space Complexity

O(capacity)—we store at most capacity nodes and capacity map entries. The dummy head and tail are O(1). Space O(capacity).

Edge Cases

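Capacity 0: put should not add; get always returns -1. Guard: put must return immediately when capacity == 0. Without the guard, the eviction branch runs on an empty list, where tail.prev is the dummy head, and trying to unlink the dummy head fails because head.prev is None.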
put same key twice: Update value and move to MRU. Handled by “if key in self.cache” branch.
get then put same key: get moves to MRU; put updates value and moves to MRU again. No duplicate nodes; the map still points to the same node.

Common Mistakes

Common Mistake

Forgetting to store the key in the node. When we evict, we remove tail.prev and must do del self.cache[lru.key]. If the node doesn’t store the key, we cannot remove the correct key from the map. Always store (key, value) in the node.

Wrong order of pointer updates: When adding after head, set the new node’s own prev and next first, then update head.next.prev, and only then head.next. If you update head.next first, you lose the reference to the old first node unless you saved it.
Using a singly linked list: To remove a node in O(1) when you have only the node, you need to change the previous node’s next. Without a prev pointer, finding the previous node is O(n). So LRU cache needs a doubly linked list for O(1) move/remove.

Evolution: Why Hash Map + Doubly Linked List

Naive: Store (key, value) in a list. get: scan the list O(n); put: scan to update or add, and to find LRU (e.g. last element) O(n). Too slow.

Hash map only: We can lookup in O(1) but we don’t have “order of use.” To evict LRU we’d need to track timestamps and scan to find the minimum—O(capacity) per eviction. Not O(1) put.
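To make the cost of the hash-map-only design concrete, here is a rough sketch that tracks a logical timestamp per key. TimestampLRU and its tick counter are hypothetical names introduced for this illustration; the point is the min(...) scan on eviction, which touches every entry.

```python
class TimestampLRU:
    """Hash-map-only LRU sketch: O(1) get, but eviction scans all entries."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = {}   # key -> (value, last_used_tick)
        self.tick = 0    # logical clock, incremented on every access

    def _touch(self, key, value):
        self.tick += 1
        self.data[key] = (value, self.tick)

    def get(self, key):
        if key not in self.data:
            return -1
        value, _ = self.data[key]
        self._touch(key, value)          # refresh the timestamp
        return value

    def put(self, key, value):
        if self.capacity <= 0:
            return
        if key not in self.data and len(self.data) == self.capacity:
            # O(capacity) scan: find the key with the smallest timestamp.
            lru_key = min(self.data, key=lambda k: self.data[k][1])
            del self.data[lru_key]
        self._touch(key, value)
```

The behavior matches an LRU cache, but put is O(capacity) whenever it evicts, which is exactly what the linked list removes.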

Hash map + doubly linked list: Map gives O(1) lookup; list gives order. Move to MRU = unlink + add after head = O(1). Evict LRU = remove tail.prev = O(1). This meets the O(1) get/put requirement.

Optimization Insight

You could use an ordered dict (e.g. Python’s collections.OrderedDict) and move the key to the end on get/put (most recent at end), then evict from the beginning. That also gives O(1) get/put in practice. The “hash map + doubly linked list” implementation is the classic interview solution and shows you understand both structures; in Python, mentioning OrderedDict as an alternative is fine. For other languages, the manual doubly linked list is standard.
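A minimal sketch of the OrderedDict alternative, with most recent at the end. The class name LRUCacheOrdered is hypothetical; move_to_end and popitem(last=False) are the standard collections.OrderedDict methods.

```python
from collections import OrderedDict


class LRUCacheOrdered:
    """LRU cache sketch on top of OrderedDict (most recent at the end)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return -1
        self.data.move_to_end(key)         # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if self.capacity <= 0:
            return
        if key in self.data:
            self.data.move_to_end(key)     # existing key becomes most recent
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict from the front (least recent)


cache = LRUCacheOrdered(2)
cache.put(1, 10)
cache.put(2, 20)
first = cache.get(1)    # 10; key 1 becomes most recent
cache.put(3, 30)        # evicts key 2
missing = cache.get(2)  # -1
```

The trace at the bottom replays the capacity-2 example from earlier in this section.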

Pattern Recognition

Hash map + doubly linked list: Use when you need O(1) lookup by key and O(1) “move to front” or “remove by reference.” The list maintains an order; the map gives quick access to the node so you can unlink and reinsert. Same pattern: LRU cache, LFU cache (with frequency lists), and some ordered cache eviction policies.

Expert Tip

In Python 3.7+, plain dict preserves insertion order. You could implement a minimal LRU by deleting and re-inserting the key on access (to move it to the “end” as most recent) and evicting the first key when full. Note that popitem(last=False) exists only on collections.OrderedDict; a plain dict’s popitem() takes no arguments and removes the last inserted item, so evict the oldest key with del d[next(iter(d))] instead. That gives O(1) average get/put and is very short to write. The interview version with explicit doubly linked list is preferred when the goal is to demonstrate the data structure design; for production Python, OrderedDict or the dict trick is often used.
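The plain-dict trick could be sketched like this. LRUCacheDict is a hypothetical name; the oldest key is found with next(iter(...)) because a plain dict has no "pop from the front" method. Treat the O(1) cost of finding the first key as typical rather than guaranteed: in CPython it may need to skip slots left behind by earlier deletions.

```python
class LRUCacheDict:
    """LRU cache sketch using only a plain dict (insertion order = recency)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = {}   # oldest key first, most recent key last

    def get(self, key):
        if key not in self.data:
            return -1
        value = self.data.pop(key)   # remove...
        self.data[key] = value       # ...and re-insert at the end (most recent)
        return value

    def put(self, key, value):
        if self.capacity <= 0:
            return
        if key in self.data:
            del self.data[key]                # re-inserted below as most recent
        elif len(self.data) >= self.capacity:
            oldest = next(iter(self.data))    # first key = least recently used
            del self.data[oldest]
        self.data[key] = value
```

This is short to write, but it hides the list manipulation that the interview version is meant to demonstrate.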

Interview Insight

State the requirements: O(1) get and put, evict LRU when full. Say you’ll use a hash map for O(1) lookup and a doubly linked list to maintain recency order (most recent at head, least at tail). Describe the node (key, value, prev, next)—emphasize storing key so you can remove from the map on eviction. For get: lookup; if found, unlink the node and add after head, return value. For put: if key exists, update and move to head; else if full, evict tail.prev and remove from map; then create new node, add after head, put in map. Mention dummy head/tail to simplify insert/remove. If they ask “why doubly linked?”, answer: O(1) removal of a node when we have a reference—we need prev and next to unlink.

Practice Problems

LeetCode 146: LRU Cache (implement get/put with hash map + doubly linked list).
LeetCode 460: LFU Cache (least frequently used; multiple lists or heap + map).
LeetCode 588: Design In-Memory File System (not an LRU problem, but related design practice: a class built around hash-map lookups).

Summary

LRU cache supports get(key) and put(key, value) in O(1) average time; when full, put evicts the least recently used item. Combine a hash map (key → node) with a doubly linked list (order by recency: head = MRU, tail = LRU).
Store (key, value) in each node so that on eviction we can remove the key from the map. Use dummy head and tail for simpler insert/remove.
get: lookup node; if found, unlink and add after head, return value. put: if key exists, update value and move to head; else if full, evict tail.prev and delete from map; then add new node after head and put in map. Time O(1), space O(capacity).

9.1 Stack Implementation

Introduction

A stack is a linear data structure that follows LIFO (Last In, First Out): the last element added is the first one removed. It supports three core operations—push (add an element on top), pop (remove and return the top element), and peek or top (return the top element without removing it)—all in O(1) time when implemented with an array or a linked list. Stacks appear everywhere: in the call stack of your program (function calls and returns), in expression evaluation (postfix/prefix), in matching brackets and parsing, in DFS (depth-first search), and in undo/redo. This section defines the stack ADT, implements it in Python (using a list and optionally a linked list), and covers edge cases and when to reach for a stack in problem-solving.

Real-World Analogy

Think of a stack of plates. You can only add a new plate on top and remove the top plate. You cannot pull a plate from the middle without toppling the stack. The last plate you put on is the first one you take off—LIFO. Similarly, the “undo” in an editor: the last action you did is the first one that gets undone. Stacks model any situation where “most recent” matters and you only need access to the top.

Example

Push 10, push 20, push 30. Top is 30. Pop returns 30; top is now 20. Pop returns 20; the stack has only 10 left. Order of removal is always the reverse of order of insertion.

Formal Definition

Concept Note

Stack (ADT): A collection that supports: (1) push(x)—add element x on top of the stack; (2) pop()—remove and return the top element (undefined if the stack is empty); (3) peek() or top()—return the top element without removing it (undefined if empty); (4) isEmpty() (or empty())—return true if the stack has no elements. Optionally: size(). The only element that can be accessed or removed is the one most recently pushed—the top. No random access by index. LIFO order is guaranteed.

Implementation can use a dynamic array (list) or a singly linked list (push and pop at the head). The dynamic array gives O(1) amortized push and O(1) pop/peek; the linked list gives O(1) worst-case push/pop/peek.

Why This Topic Matters

Foundation for Stack & Queue section: Stacks and queues are the simplest linear ADTs after arrays and lists. Many interview problems (“valid parentheses,” “next greater element,” “min stack”) are stack-based.
Call stack: Recursion and function calls are implemented with a stack. Understanding stacks helps you reason about recursion and stack overflow.
Algorithm building block: DFS uses an explicit or implicit stack; expression evaluation (postfix), bracket matching, and monotonic stack problems all rely on the LIFO property.

Mental Model

Picture a vertical tube open at the top. You drop items in one at a time; they pile up. The only item you can see or take out is the one on top. That’s the stack. In code, we maintain a “top” (e.g. the last index in an array, or the head of a linked list) and only add or remove there. No need to shift elements—push and pop are O(1).

Step-by-Step: Operations

Using a dynamic array (list)

Push: Append the element to the end of the list. Top = last index. O(1) amortized.
Pop: Remove and return the last element (list.pop()). O(1).
Peek/Top: Return the last element (list[-1]) without removing. O(1).
isEmpty: Check if the list is empty (len(list) == 0). O(1).

Using a singly linked list

Use the head as the top. Push: create a new node, set its next to the current head, set head to the new node. Pop: save head’s value, set head = head.next, return the value. Peek: return head.data (or head.val). All O(1).

ASCII Diagram

  Stack (top on the right):

  push(10):   [10]          top = 10
  push(20):   [10, 20]      top = 20
  push(30):   [10, 20, 30]  top = 30
  pop():      [10, 20]      returns 30
  peek():     [10, 20]      returns 20
  pop():      [10]          returns 20

  Linked-list version: top = head
  head → 30 → 20 → 10 → None   (30 is top)
  push: new_node.next = head; head = new_node
  pop: val = head.val; head = head.next; return val

Python Implementation

Using list (recommended in Python)

class Stack:
    def __init__(self):
        self._data = []

    def push(self, x):
        self._data.append(x)

    def pop(self):
        if self.is_empty():
            raise IndexError("pop from empty stack")
        return self._data.pop()

    def peek(self):
        if self.is_empty():
            raise IndexError("peek from empty stack")
        return self._data[-1]

    def is_empty(self):
        return len(self._data) == 0

    def size(self):
        return len(self._data)

Using linked list (for practice)

class Node:
    def __init__(self, data):
        self.data = data
        self.next = None

class StackLinkedList:
    def __init__(self):
        self.head = None

    def push(self, x):
        new = Node(x)
        new.next = self.head
        self.head = new

    def pop(self):
        if self.head is None:
            raise IndexError("pop from empty stack")
        val = self.head.data
        self.head = self.head.next
        return val

    def peek(self):
        if self.head is None:
            raise IndexError("peek from empty stack")
        return self.head.data

    def is_empty(self):
        return self.head is None

Line-by-Line Explanation (List Version)

_data = []: Internal list; we only add/remove at the end so LIFO is preserved.
push(x): append(x): Add at the end; that becomes the new top. O(1) amortized.
pop(): _data.pop(): Remove and return the last element. Check is_empty first to avoid popping from an empty list.
peek(): _data[-1]: Return last element without removing. Must check is_empty.
is_empty(): len(_data) == 0: O(1). Alternatively return not self._data.

Time Complexity

Push: O(1) amortized (list append may occasionally resize). Pop: O(1). Peek: O(1). isEmpty, size: O(1). All core operations are constant time.

Space Complexity

O(n) where n is the number of elements in the stack. The list (or linked list) stores each element once.

Edge Cases

Pop from empty stack: Undefined behavior unless we define it. In the implementation above we raise IndexError. In problems, often the input is guaranteed non-empty, or we must check before popping.
Peek on empty stack: Similarly, raise or return a sentinel. Always consider “what if the stack is empty?” when using peek in algorithms (e.g. bracket matching).
Push None: Allowed if the stack is intended to hold any value. Some problems use the stack to store indices or nodes; None can be a valid element or a sentinel—clarify in interviews.

Common Mistakes

Common Mistake

Popping or peeking without checking if the stack is empty. In production code, always guard pop/peek with is_empty (or a size check) or document that the caller must ensure non-empty. In contest/problem code, if the problem says “non-empty,” you may skip the check; otherwise one invalid input can cause a runtime error.

Using the wrong end of the list: For a stack, push and pop must happen at the same end. If you push with append but pop from the front (pop(0)), that’s O(n) per pop and it’s a queue-like order, not LIFO. Stick to append + pop() for the list-based stack.
Confusing stack with queue: Stack = LIFO (last in, first out). Queue = FIFO (first in, first out). Use a stack when you need “most recent” or “reverse order” (e.g. DFS, undo, bracket matching).

When to Use a Stack

Use a stack when the problem has one or more of: (1) LIFO / “most recent”—e.g. undo, backtracking; (2) Matching or nesting—e.g. valid parentheses, balanced brackets, HTML tags; (3) Need to “look back” at the last relevant element—e.g. next greater element (monotonic stack), stock span; (4) DFS—explicit stack or recursion (call stack); (5) Expression evaluation—postfix (RPN) or infix with operator stack. If you need “first in, first out,” use a queue instead.
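As one illustration of the matching/nesting case, here is a small bracket checker. The function name is_balanced is hypothetical, and ignoring non-bracket characters is an assumption made for this sketch.

```python
def is_balanced(s):
    """Return True if every bracket in s is closed by the matching type, in order."""
    pairs = {')': '(', ']': '[', '}': '{'}   # closer -> expected opener
    openers = set(pairs.values())
    stack = []
    for ch in s:
        if ch in openers:
            stack.append(ch)                  # remember the most recent opener
        elif ch in pairs:
            # A closer must match the most recent unmatched opener.
            if not stack or stack.pop() != pairs[ch]:
                return False
        # any other character is ignored (assumption for this sketch)
    return not stack                          # leftover openers mean unbalanced
```

The empty-stack check before pop is the guard discussed under Edge Cases: a closer with nothing to match must not pop.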

Expert Tip

In Python, you often don’t need a custom Stack class: a list used with append and pop() is a stack. For interviews, either use a list directly (“I’ll use a list as a stack: append for push, pop for pop”) or implement a thin wrapper. Knowing the ADT and when to use it matters more than the exact class name.
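Using a list directly looks like this; the trace replays the push 10, 20, 30 example from earlier in the section. The variable names are only for illustration.

```python
stack = []            # a plain list used as a stack
stack.append(10)      # push
stack.append(20)
stack.append(30)

top = stack[-1]       # peek: 30, nothing removed
first = stack.pop()   # pop: 30
second = stack.pop()  # pop: 20
is_empty = not stack  # False, because 10 is still on the stack
```

Calling stack.pop() or stack[-1] on an empty list raises IndexError, so the same emptiness guard applies as with the wrapper class.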

Interview Insight

When the problem involves “matching pairs,” “nested structure,” “most recent,” or “reverse order,” mention a stack. Implement with a list: push = append, pop = pop(), peek = list[-1]. State the time: O(1) per operation, O(n) space. For “min stack” or “max stack,” you’ll extend this with an auxiliary structure (e.g. second stack or heap) in later topics.
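As a preview of the auxiliary-structure idea, a min stack can be sketched with a second stack of running minimums. This is one possible design, not necessarily the one used later in the course; MinStack, _mins, and get_min are names chosen for this sketch.

```python
class MinStack:
    """Stack with O(1) get_min, using a second stack of running minimums."""

    def __init__(self):
        self._data = []
        self._mins = []   # _mins[-1] is always the minimum of _data

    def push(self, x):
        self._data.append(x)
        # Use <= so that duplicate minimums are each recorded.
        if not self._mins or x <= self._mins[-1]:
            self._mins.append(x)

    def pop(self):
        if not self._data:
            raise IndexError("pop from empty stack")
        x = self._data.pop()
        if x == self._mins[-1]:
            self._mins.pop()   # that copy of the minimum has left the stack
        return x

    def peek(self):
        if not self._data:
            raise IndexError("peek from empty stack")
        return self._data[-1]

    def get_min(self):
        if not self._mins:
            raise IndexError("min of empty stack")
        return self._mins[-1]
```

All four operations stay O(1); the extra space is at most one additional entry per pushed element.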

Practice Problems

LeetCode 20: Valid Parentheses (stack to match brackets).
LeetCode 155: Min Stack (stack + auxiliary structure for O(1) getMin).
LeetCode 232: Implement Queue using Stacks (two stacks for FIFO).
LeetCode 94: Binary Tree Inorder Traversal (iterative version uses a stack).

Summary

A stack is LIFO: push (add on top), pop (remove top), peek (read top). All O(1) with list or linked list. Optionally isEmpty and size.
In Python, use a list: append for push, pop() for pop, list[-1] for peek. Or implement a small Stack class that wraps a list.
Use a stack for: matching/nesting (parentheses), “most recent” (undo), next-greater/span (monotonic stack), DFS, and expression evaluation. Guard pop/peek when the stack may be empty.

9.2 Monotonic Stack

Introduction

A monotonic stack is a stack whose elements stay in increasing or decreasing order from bottom to top (strict or non-strict, depending on how ties are handled). We use it to answer “for each element, find the next (or previous) element that is greater or smaller” in O(n) time for the entire array, instead of O(n) per element (O(n²) total) with a naive scan. The idea: as we scan the array, we push indices (or values) onto the stack; before pushing, we pop all elements that “break” the desired monotonicity. Those pops give us the answers: for example, when we pop an index because we found a larger value, that larger value is the “next greater element” for the popped index. Monotonic stacks power problems like Next Greater Element, Stock Span, and Largest Rectangle in Histogram. This section builds the pattern from intuition to code.

Real-World Analogy

Imagine a line of people by height. You walk from left to right. You want to know, for each person, “who is the next person to my right who is taller than me?” When you meet someone taller, they are the “next taller” for everyone you’re still “holding” who is shorter than them (people you passed who haven’t found their answer yet). You keep a mental stack of people who haven’t found their “next taller” person. When a new person arrives, anyone on your stack who is shorter than this new person has found their answer: this new person. You remove them from the stack and record the answer, then add the new person to the stack. The stack always has people in decreasing height (bottom to top), with the shortest unresolved person on top. That’s the monotonic stack.

Example

Array [2, 1, 4, 3, 5]. Next greater element (NGE) for each: 2→4, 1→4, 4→5, 3→5, 5→-1. We scan left to right, keep a stack of indices where we haven’t found NGE yet. When we see 4, 2 and 1 are smaller so we pop them and set their NGE = 4. Then we push 4’s index. When we see 5, we pop 4 and 3 (smaller) and set their NGE = 5. Result: [4, 4, 5, 5, -1].

Formal Definition

Concept Note

Monotonic stack: A stack (usually storing indices) that we maintain in monotonic order relative to the corresponding array values. Monotonically decreasing stack: From bottom to top, the values at the stacked indices are in non-increasing (or strictly decreasing) order. Used when we want “next greater” or “previous greater” type queries, because every stacked value smaller than the current one gets popped. Monotonically increasing stack: From bottom to top, values are in non-decreasing order. Used for “next smaller” or “previous smaller.” The key: when we push a new index, we first pop all indices whose values violate the order; each pop often corresponds to finding an “answer” (e.g. next greater element) for that index.

We typically store indices in the stack so we can both compare values (arr[stack[-1]]) and write answers (result[stack.pop()] = current value or current index).

Why This Topic Matters

Interview staple: Next Greater Element (LeetCode 496, 503), Daily Temperatures (739), Online Stock Span (901), Largest Rectangle in Histogram (84), Trapping Rain Water (42): all can be solved with monotonic stacks. Recognizing the pattern is half the solution.
O(n) where naive is O(n²): For each position, “find next greater” naively is O(n) per element. With a monotonic stack, each element is pushed once and popped at most once, so total O(n).
Reusable pattern: Same idea applies to “next smaller,” “previous greater,” “previous smaller,” and to 2D problems (e.g. maximal rectangle).

Mental Model

We scan the array (left to right for “next,” or right to left for “previous”). The stack holds “candidates” that are still waiting for their answer. For “next greater element”: we want the stack to have larger elements at the bottom and smaller at the top (so the top is the smallest among the waiting). When a new value comes in that is greater than the stack top, the stack top has found its next greater: the new value. We pop and assign the answer, and repeat until the stack is empty or the top is not smaller. Then we push the current index. So the stack stays sorted by value (bottom ≥ top in terms of arr[i]). That’s a monotonically decreasing stack (by value). For “next smaller,” we maintain a monotonically increasing stack: pop while current is smaller than top, and the current is the “next smaller” for the popped indices.

Step-by-Step: Next Greater Element (NGE)

Given array arr, find for each index i the next greater element (first index j > i such that arr[j] > arr[i]). If none, -1.

Initialize result = [-1] * n and an empty stack (of indices).
For each index i from 0 to n-1:
- While the stack is not empty and arr[stack[-1]] < arr[i]: the element at stack[-1] has found its next greater—it’s arr[i]. Set result[stack.pop()] = arr[i] (or = i if you need the index).
- Push i onto the stack.
Indices left in the stack have no next greater element; they already have -1 in result.

The stack maintains indices whose values are in non-increasing order (bottom to top): when we see a larger value, we “resolve” all smaller ones.

ASCII Diagram: Next Greater Element

  arr:  [2,  1,  4,  3,  5]
  i=0:  stack=[0]           (value 2)
  i=1:  stack=[0,1]         (2, 1 - both waiting)
  i=2:  arr[2]=4 > 1 → pop 1, result[1]=4
        arr[2]=4 > 2 → pop 0, result[0]=4
        stack=[2]
  i=3:  stack=[2,3]          (4, 3)
  i=4:  arr[4]=5 > 3 → pop 3, result[3]=5
        arr[4]=5 > 4 → pop 2, result[2]=5
        stack=[4]
  result: [4, 4, 5, 5, -1]

Monotonically Increasing vs Decreasing

Stack type	Pop condition (scan L→R)	Use for
Decreasing (bottom→top)	Pop while arr[stack[-1]] < arr[i]	Next greater element
Increasing (bottom→top)	Pop while arr[stack[-1]] > arr[i]	Next smaller element

For “previous” greater/smaller, scan right to left and the same pop logic applies (previous greater = scan from right, pop when current is greater than top, etc.).

Python Implementation

Next Greater Element (each element)

def next_greater_element(arr):
    n = len(arr)
    result = [-1] * n
    stack = []   # indices
    for i in range(n):
        while stack and arr[stack[-1]] < arr[i]:
            idx = stack.pop()
            result[idx] = arr[i]   # or result[idx] = i for index
        stack.append(i)
    return result

Next Smaller Element

def next_smaller_element(arr):
    n = len(arr)
    result = [-1] * n
    stack = []
    for i in range(n):
        while stack and arr[stack[-1]] > arr[i]:
            idx = stack.pop()
            result[idx] = arr[i]
        stack.append(i)
    return result

Only the comparison changes: < for next greater (pop when current is larger), > for next smaller (pop when current is smaller).

Line-by-Line Explanation (NGE)

result = [-1] * n: Default “no next greater.” We only update when we pop.
stack: Holds indices of elements that don’t have an answer yet. Stack values (arr[stack]) are in non-increasing order from bottom to top.
while stack and arr[stack[-1]] < arr[i]: Current value arr[i] is greater than the top’s value—so the top has found its next greater element (arr[i]). Pop and record.
stack.append(i): After resolving everyone we can, push the current index. It becomes the new top (and it’s the smallest in the stack in terms of value, preserving monotonicity).

Time Complexity

Each index is pushed exactly once and popped at most once. So total operations are O(n). Time O(n).

Space Complexity

The stack can hold up to n indices in the worst case (e.g. strictly decreasing array—nothing gets popped until the end). Space O(n).

Edge Cases

Empty array: Return []. The loop doesn’t run, and result = [-1] * 0 is already the empty list.
Strictly decreasing array: No element has a next greater. Stack grows to size n; result stays all -1.
Strictly increasing array: Each element’s next greater is the next element. We pop one per step; stack size stays 1.
Duplicate values: For “next greater” we use < so equal values don’t pop each other; both stay in the stack. For “next greater or equal” you’d use <= when popping so equals resolve each other. Clarify with the problem.
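The effect of the tie rule can be seen by making the comparison switchable. This is an illustrative sketch; the function name next_greater and the or_equal flag are hypothetical.

```python
def next_greater(arr, or_equal=False):
    """Next greater (or greater-or-equal) value for each position, -1 if none."""
    result = [-1] * len(arr)
    stack = []   # indices still waiting for an answer
    for i, x in enumerate(arr):
        # Strict version pops on <, the or-equal version pops on <=.
        while stack and (arr[stack[-1]] <= x if or_equal else arr[stack[-1]] < x):
            result[stack.pop()] = x
        stack.append(i)
    return result


strict = next_greater([2, 2, 3, 1, 1])                   # [3, 3, -1, -1, -1]
with_ties = next_greater([2, 2, 3, 1, 1], or_equal=True)  # [2, 3, -1, 1, -1]
```

In the strict version the two 2s wait together and are both resolved by 3; in the or-equal version the second 2 resolves the first.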

Common Mistakes

Common Mistake

Storing values instead of indices. If you store values, you can compare but you cannot write the result for the correct index. Always store indices in the stack (and use arr[stack[-1]] for comparison) so that when you pop, you know which position’s answer to set.

Wrong comparison direction: Next greater → pop when current is greater than top (arr[stack[-1]] < arr[i]). Next smaller → pop when current is smaller than top (arr[stack[-1]] > arr[i]). Reversing these gives wrong answers.
Previous vs next: “Next” = scan left to right; “previous” = scan right to left. The pop condition is the same; only the loop direction and which index gets the answer change.

Variants: Previous Greater, Stock Span

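Previous greater element: For each i, find the nearest previous index j < i such that arr[j] > arr[i]. Scan right to left (i from n-1 to 0). Pop while arr[stack[-1]] < arr[i], setting result[popped index] = arr[i]: the current element is the nearest greater element to the left of every index it pops. Then push i; indices still on the stack at the end keep -1. Stack stays non-increasing (bottom to top) in value. Equivalent alternative: scan left to right, pop while arr[stack[-1]] <= arr[i], then read result[i] = arr[stack[-1]] (or -1 if the stack is empty), and push i.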
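Two common ways to compute the previous greater element are sketched below: a right-to-left scan that assigns answers when popping, and a left-to-right scan that reads the answer from the stack top after popping. Both function names are hypothetical, and both return values (not indices), with -1 meaning "none".

```python
def previous_greater_rtl(arr):
    """Right-to-left scan: an index gets its answer at the moment it is popped."""
    n = len(arr)
    result = [-1] * n
    stack = []   # indices to the right that still wait for a greater value on their left
    for i in range(n - 1, -1, -1):
        while stack and arr[stack[-1]] < arr[i]:
            result[stack.pop()] = arr[i]   # arr[i] is the nearest greater to the left
        stack.append(i)
    return result


def previous_greater_ltr(arr):
    """Left-to-right scan: after popping, the stack top is the answer for i."""
    result = [-1] * len(arr)
    stack = []   # indices whose values are strictly decreasing bottom to top
    for i, x in enumerate(arr):
        while stack and arr[stack[-1]] <= x:
            stack.pop()                    # not greater than x, so never an answer again
        if stack:
            result[i] = arr[stack[-1]]
        stack.append(i)
    return result
```

For [2, 1, 4, 3, 5] both return [-1, 2, -1, 4, -1]. The left-to-right form is handy when answers are needed online, as each element arrives.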

Stock span: For each day i, find how many consecutive days up to and including today had a price ≤ today’s price. Equivalently: find the “previous greater” index j; span = i - j. Scan left to right with a stack that is strictly decreasing by value: pop while the price at stack[-1] is ≤ today’s price (those days are covered by today’s span). After the pops, stack[-1] is the previous greater index, so span = i - stack[-1] (or i + 1 if the stack is empty). Then push i.
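A compact sketch of the stock span computation, assuming the whole price list is available up front. The function name stock_spans is hypothetical.

```python
def stock_spans(prices):
    """Span for each day: consecutive days ending today with price <= today's price."""
    spans = []
    stack = []   # indices of days whose prices are strictly decreasing bottom to top
    for i, price in enumerate(prices):
        # Days with price <= today's are covered by today's span; drop them.
        while stack and prices[stack[-1]] <= price:
            stack.pop()
        # stack[-1] (if any) is the previous day with a strictly greater price.
        spans.append(i - stack[-1] if stack else i + 1)
        stack.append(i)
    return spans


example = stock_spans([100, 80, 60, 70, 60, 75, 85])   # [1, 1, 1, 2, 1, 4, 6]
```

Each day is pushed once and popped at most once, so the total work is O(n) even though a single day may pop many entries.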

Optimization Insight

The monotonic stack achieves O(n) by ensuring each element is pushed once and popped at most once. There is no faster asymptotic time for “next greater for all” because we must at least read each element. The pattern extends to “next greater in a circular array” (LeetCode 503): traverse twice or use indices modulo n so that the second pass resolves elements that didn’t find a greater in the first pass.
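The circular variant can be sketched by iterating over 2n positions and taking indices modulo n. The function name is hypothetical; pushing only during the first pass is one way to avoid stacking the same index twice.

```python
def next_greater_circular(arr):
    """Next greater value for each position when the array wraps around."""
    n = len(arr)
    result = [-1] * n
    stack = []   # indices still waiting for a greater value
    for k in range(2 * n):
        i = k % n
        while stack and arr[stack[-1]] < arr[i]:
            result[stack.pop()] = arr[i]
        if k < n:
            stack.append(i)   # second pass only resolves, it never pushes
    return result


wrapped = next_greater_circular([1, 2, 1])   # [2, -1, 2]
```

The maximum element (and any ties with it) never gets popped, so it keeps -1.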

Pattern Recognition

When the problem asks for “next/previous greater/smaller” for each element, or “nearest” such element, think monotonic stack. Keywords: next greater, next smaller, stock span, histogram rectangle, trapping rain water (nearest higher bars). Decide: (1) next or previous? (scan direction); (2) greater or smaller? (pop condition and stack order). Store indices; use arr[i] for comparisons.

Expert Tip

For “largest rectangle in histogram” (LeetCode 84): for each bar, we need the “previous smaller” and “next smaller” (indices where height is less than current). Then width = next_smaller - previous_smaller - 1 and area = height * width, taking previous_smaller = -1 and next_smaller = n when no smaller bar exists. Run both a “previous smaller” and a “next smaller” pass (or do both in one pass with a single stack and careful bookkeeping). It is the same monotonic stack pattern: for “smaller,” the stack is increasing in height (bottom to top), and we pop when the current bar is lower than the top.
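The one-pass variant can be sketched as follows. This is one common formulation, not the only one: it appends a sentinel height of 0 (an assumption of this sketch) so that every bar is eventually popped, and the function name is illustrative.

```python
def largest_rectangle_area(heights):
    """Largest rectangle in a histogram, one pass with an increasing stack."""
    best = 0
    stack = []  # indices of bars with non-decreasing heights bottom to top
    # The trailing 0 acts as a "next smaller" for every bar still on the stack.
    for i, h in enumerate(heights + [0]):
        while stack and heights[stack[-1]] > h:
            height = heights[stack.pop()]
            left = stack[-1] if stack else -1  # nearest bar to the left that is not taller
            best = max(best, height * (i - left - 1))
        stack.append(i)
    return best
```

With equal-height neighbors, a later bar may see a narrower width than the true one, but the earliest bar of that height is popped last and gets the full width, so the maximum is still found.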

Interview Insight

State: “For each element I need the next greater element. I’ll use a monotonic stack of indices: when I see a value larger than the stack top’s value, the stack top has found its next greater, so I pop and record, then push the current index. The values on the stack stay non-increasing from bottom to top, and each element is pushed once and popped at most once, so the total time is O(n).” Give the code and mention edge cases (empty input, all -1 for a decreasing array). If asked for “previous greater” or “next smaller,” explain the same idea with a different scan direction or comparison.

Practice Problems

LeetCode 496: Next Greater Element I (subset in a larger array).
LeetCode 503: Next Greater Element II (circular array; double the array or traverse twice).
LeetCode 739: Daily Temperatures (next greater index; store index, result is index - i).
LeetCode 84: Largest Rectangle in Histogram (previous and next smaller).
LeetCode 42: Trapping Rain Water (can use monotonic stack or two pointers).

Summary

A monotonic stack keeps elements (usually indices) in increasing or decreasing order by value. We pop when the current value “breaks” that order; each pop often assigns an answer (e.g. next greater = current value).
Next greater: Scan left to right, keeping a stack of indices whose values are non-increasing (bottom to top). Pop while arr[stack[-1]] < arr[i]; result[pop] = arr[i]. Then push i. Time O(n), space O(n).
Next smaller: Same scan; pop while arr[stack[-1]] > arr[i]. Store indices (not values) so we can write results correctly. Use for stock span, histogram rectangle, and “nearest” type problems.

9.3 Next Greater Element

Introduction

The Next Greater Element (NGE) problem: for each element in an array, find the first element to its right that is strictly greater. If none exists, use -1 (or a sentinel). This appears in three classic forms: (1) NGE for the whole array: return an array where result[i] = next greater of arr[i]; (2) NGE I (LeetCode 496): given a subset array nums1 and a larger array nums2, find the next greater element in nums2 for each value in nums1 and return answers in nums1’s order; (3) NGE II (LeetCode 503): same as (1) but the array is circular, so “to the right” wraps around. We solve all three with the same monotonic stack idea from Section 9.2: O(n) time, one pass (or two for circular). This section focuses on problem formulation, brute force vs optimal, and the circular variant.

Real-World Analogy

Imagine standing in a line of people with numbers on their shirts. For each person, you want to know: “who is the next person to my right with a higher number?” The first such person to the right is that element’s “next greater.” In a circular line, after the last person you wrap to the first. The monotonic stack is like remembering “everyone who hasn’t found their next higher person yet”; when someone with a higher number arrives, they are the answer for everyone you’re still holding.

Example

Array [4, 2, 1, 5, 3]. NGE: 4→5, 2→5, 1→5, 5→-1, 3→-1. So result = [5, 5, 5, -1, -1]. For circular: after 3 we wrap; 4 is the first element to the “right” of 3 and 4 > 3, so 3→4. The 5 still has no greater element anywhere. Circular result: [5, 5, 5, -1, 4].

Formal Definition

Concept Note

Next Greater Element (standard): Given array arr of length n, for each index i find the smallest index j such that j > i and arr[j] > arr[i]. The value of the next greater element is arr[j], or -1 if no such j exists. NGE I (496): nums1 is a subset of nums2. For each element x in nums1, find the next greater element of x in nums2 (i.e. the first element to the right of x’s position in nums2 that is greater than x). Return an array of the same length as nums1 with these values (or -1). NGE II (503): Same as standard but the array is circular: “next” wraps from the end to the start. So for each index i we consider indices i+1, i+2, …, n-1, 0, 1, … until we find a greater element.

Why This Topic Matters

LeetCode 496 and 503: Direct “Next Greater Element” problems. 496 tests mapping from a subset to a larger array; 503 tests the circular extension. Both are solved with the same stack.
Daily Temperatures (739): For each day, find the number of days until a warmer day, i.e. the distance to the next greater element, not the value. Same stack; store indices, and when index j is popped at the current index i, set result[j] = i - j (days with no warmer day keep 0).
Building block: Many “nearest larger” or “next/previous greater” problems reduce to one or two NGE-style passes.

Mental Model

Scan left to right. The stack holds indices of elements that haven’t found their next greater yet. When we see a value larger than the stack top’s value, the stack top has found its answer: the current value. Pop and assign, repeat, then push the current index. The values on the stack stay non-increasing (bottom to top). For circular: after the first pass, indices left in the stack might still have an answer in the “wrap-around” part, so run a second pass (or traverse indices 0 to 2n-1 with index mod n) and use the same pop logic; every unresolved element then gets a chance to be matched against the elements to its left.

Step-by-Step: Standard NGE (One Array)

result = [-1] * n, stack = [].
For i in 0..n-1: while stack and arr[stack[-1]] < arr[i]: result[stack.pop()] = arr[i]. Then stack.append(i).
Return result. Elements still in the stack have no next greater (remain -1).

Step-by-Step: NGE II (Circular)

result = [-1] * n, stack = [].
Traverse twice: for i in range(2 * n), use index j = i % n and value arr[j]. While stack and arr[stack[-1]] < arr[j]: result[stack.pop()] = arr[j]. Then, only in the first pass (i < n), stack.append(j). (We only push each index once—during the first pass.)
After two full passes, every element that has a next greater in the circular sense has been assigned. Return result.

Put differently: the first pass (i from 0 to n-1) pushes each index once and resolves what it can; the second pass (i from n to 2n-1, j = i % n) only pops and assigns, with no push.

Evolution: Brute Force → Optimal

Brute Force

For each index i, scan j from i+1 to n-1 (and for circular, then 0 to i-1) until arr[j] > arr[i]. Set result[i] = arr[j], or -1 if no such j is found. Time O(n²), O(1) extra space beyond the result array.
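The brute force is worth having as a reference for checking the stack solutions on small inputs. A sketch, where the circular flag and the function name are assumptions made for illustration:

```python
def next_greater_brute(arr, circular=False):
    """O(n^2) reference: first strictly greater value to the right, or -1."""
    n = len(arr)
    result = [-1] * n
    for i in range(n):
        # Non-circular: look only at i+1..n-1. Circular: look at the n-1 other positions.
        steps = n - 1 if circular else n - 1 - i
        for step in range(1, steps + 1):
            j = (i + step) % n
            if arr[j] > arr[i]:
                result[i] = arr[j]
                break
    return result
```

On [4, 2, 1, 5, 3] this gives [5, 5, 5, -1, -1], and with circular=True it gives [5, 5, 5, -1, 4].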

Optimal: Monotonic Stack

One pass (or two for circular); each index pushed and popped at most once. Time O(n), space O(n) for the stack. See Section 9.2 for the mechanics.

Python Implementation

Standard NGE (one array)

def next_greater_element(arr):
    n = len(arr)
    result = [-1] * n
    stack = []
    for i in range(n):
        while stack and arr[stack[-1]] < arr[i]:
            result[stack.pop()] = arr[i]
        stack.append(i)
    return result

NGE II (circular)

def next_greater_element_circular(arr):
    n = len(arr)
    result = [-1] * n
    stack = []
    for i in range(2 * n):
        j = i % n
        while stack and arr[stack[-1]] < arr[j]:
            result[stack.pop()] = arr[j]
        if i < n:
            stack.append(j)
    return result

NGE I (LeetCode 496: nums1 subset of nums2)

def next_greater_element_1(nums1, nums2):
    # Build NGE for nums2
    nge2 = {}
    stack = []
    for x in nums2:
        while stack and stack[-1] < x:
            nge2[stack.pop()] = x
        stack.append(x)
    while stack:
        nge2[stack.pop()] = -1
    return [nge2.get(x, -1) for x in nums1]

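Here we use a stack of values (since we need to map value → next greater value for nums1). This relies on the values in nums2 being distinct, which LeetCode 496 guarantees; with duplicates, a value-keyed map would be ambiguous. For each x in nums2, pop all values smaller than x and set their NGE = x; then push x. At the end, remaining values have no NGE (-1). Then for each value in nums1, look up in the map.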

Time and Space Complexity

Standard and circular: Each index is pushed once and popped at most once. Time O(n). Space O(n) for the stack and result. NGE I: One pass over nums2 to build the map, then one pass over nums1 to build the answer. Time O(len(nums2) + len(nums1)), space O(len(nums2)) for the stack and map.

Edge Cases

Empty array: Return [].
Single element: result = [-1].
Strictly decreasing array: result = [-1] * n. The stack grows to n and nothing is ever popped.
Circular, all equal: No element has a strictly greater; result = [-1] * n.
NGE I: If nums1 contains a value not in nums2, we can define NGE as -1 (as in the code with get(x, -1)).

Common Mistakes

Common Mistake

Circular: pushing the same index twice. In the second pass we only want to resolve remaining stack indices (assign their next greater from the wrap-around). We must not push the same index again, or we’ll assign wrong answers. So push only when i < n (first pass).

Strictly greater vs greater-or-equal: Standard definition is “strictly greater” (arr[j] > arr[i]). If the problem says “greater or equal,” use >= when comparing and popping.
NGE I: The problem asks for “next greater in nums2” for each value in nums1. Build the NGE map for the whole nums2, then map nums1 values to answers. Don’t scan nums2 for each nums1 value (that would be O(n*m)).

Daily Temperatures (739) Connection

For each day i, return the number of days you have to wait until a warmer day. So we need the index of the next greater element, not the value. result[i] = (that index) - i, or 0 if no warmer day, so initialize result to [0] * n rather than [-1] * n. Same monotonic stack: store indices; when we pop index j because the current day i is warmer, set result[j] = i - j. In code: j = stack.pop(); result[j] = i - j. Rest unchanged.
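Putting that together, a minimal sketch of the Daily Temperatures variant (the function name is illustrative):

```python
def daily_temperatures(temperatures):
    """Days to wait until a warmer day; 0 where no warmer day exists."""
    result = [0] * len(temperatures)
    stack = []  # indices of days still waiting for a warmer day
    for i, temp in enumerate(temperatures):
        while stack and temperatures[stack[-1]] < temp:
            j = stack.pop()      # pop once and keep the index
            result[j] = i - j    # distance, not value
        stack.append(i)
    return result
```

For [73, 74, 75, 71, 69, 72, 76, 73] the result is [1, 1, 4, 2, 1, 1, 0, 0].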

Expert Tip

For “next greater index” (e.g. Daily Temperatures), store indices in the stack, pop once into a variable, and assign the distance: j = stack.pop(); result[j] = i - j. (Writing result[stack.pop()] = i - stack.pop() would pop twice.) For “next greater value,” assign result[stack.pop()] = arr[i]. Same loop; only what you store in result changes.

Interview Insight

State: “For each element I need the first element to the right that is strictly greater. I’ll use a monotonic stack of indices: scan left to right, pop and set result when current value is greater than top’s value, then push current index. O(n) time.” For circular: “I’ll traverse 2n positions with index mod n; only push in the first n steps so each index is pushed once. Second pass resolves wrap-around.” For NGE I: “Build the next-greater map for nums2 in one pass with the same stack, then map each nums1 value to its NGE.”

Practice Problems

LeetCode 496: Next Greater Element I (nums1, nums2; build NGE for nums2, map to nums1).
LeetCode 503: Next Greater Element II (circular; two passes or 2n loop with mod).
LeetCode 739: Daily Temperatures (next greater index; result[i] = index - i).

Summary

Next Greater Element: For each index i, result[i] = first value to the right that is strictly greater, or -1. Use a stack of indices whose values are non-increasing (bottom to top); pop while current > top’s value and assign result[pop] = current value; then push the current index. O(n) time and space.
Circular (NGE II): Traverse 2n with index mod n; only push when index < n. Second pass resolves wrap-around. Same O(n).
NGE I: Build NGE map for nums2 (stack of values, pop when current > top, map[pop] = current); then answer[i] = map[nums1[i]]. Daily Temperatures: store indices, result[pop] = current_index - pop.

9.4 Expression Evaluation

Introduction

Expression evaluation is the task of computing the value of a mathematical expression given as a string (e.g. "3 + 4 * 2" or "( 1 + 2 ) * 3"). Expressions can be written in infix (operator between operands, e.g. 3 + 4), postfix (RPN—operands first, then operator, e.g. 3 4 +), or prefix (Polish—operator first). Stacks are the natural tool: postfix is evaluated with a single stack (scan left to right; push operands, when you see an operator pop two operands, compute, push result). Infix is either converted to postfix first (using an operator stack and precedence rules) or evaluated with two stacks (operands and operators) or with a single pass that respects parentheses and precedence. This section covers postfix evaluation, infix-to-postfix conversion, and a simple calculator pattern. Time is O(n) for a string of length n.

Real-World Analogy

Think of postfix like a stack of instructions: “put 3 on the table, put 4 on the table, now add the top two and replace them with the result.” You only ever need to look at the “top” numbers and the next instruction. No parentheses or “do multiplication before addition”—the order of operations is fixed by the order of tokens. Infix is how we usually write: “3 + 4 * 2.” To evaluate correctly we must defer some operations (e.g. we see + but we don’t add yet if * comes next) and use a stack to hold operators until we can apply them. The stack is the “pending work” list.

Example

Postfix 3 4 + 2 *: push 3, push 4, see + → pop 4, pop 3, push 7. See 2 → push 2. See * → pop 2, pop 7, push 14. Result 14. Infix 3 + 4 * 2: we want 3 + (4*2) = 11; * has higher precedence than +, so we evaluate * before + when building postfix or when using two stacks.

Formal Definition

Concept Note

Infix: Operators appear between operands. Parentheses and precedence (e.g. * before +) determine order. Example: (1 + 2) * 3. Postfix (RPN): No parentheses; each operator follows its operands. Example: 1 2 + 3 * means (1+2)*3. Evaluation: scan left to right; operands go on a stack; when an operator is seen, pop two operands, apply the operator, push the result. Prefix: Operator precedes operands. Example: * + 1 2 3. Evaluation is often done right to left with a stack. We focus on postfix evaluation and infix to postfix (Shunting-yard idea) as the core stack-based methods.
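Since prefix evaluation is only mentioned in passing, here is a brief sketch of the right-to-left stack approach. It assumes a valid prefix expression given as a list of string tokens; the function name is illustrative.

```python
def eval_prefix(tokens):
    """Evaluate a valid prefix expression given as a list of string tokens."""
    ops = {
        "+": lambda a, b: a + b,
        "-": lambda a, b: a - b,
        "*": lambda a, b: a * b,
        "/": lambda a, b: int(a / b),  # truncate toward zero
    }
    stack = []
    for t in reversed(tokens):
        if t in ops:
            # Scanning right to left, the left operand was pushed last, so it is on top.
            left = stack.pop()
            right = stack.pop()
            stack.append(ops[t](left, right))
        else:
            stack.append(int(t))
    return stack[0]
```

Note the operand order is the mirror image of postfix: here the first pop is the left operand. For ["*", "+", "1", "2", "3"] the result is 9.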

Why This Topic Matters

Classic stack application: Postfix evaluation is the standard “stack for expression” example. Many interview problems (basic calculator, expression parsing) build on this.
LeetCode 150 (Evaluate Reverse Polish Notation), 224 (Basic Calculator), 227 (Basic Calculator II): Direct expression evaluation. 150 is postfix; 224/227 are infix with +, -, *, / and sometimes parentheses.
Parsing and compilers: Expression parsing is a small version of what parsers do. Understanding operator precedence and stack-based evaluation transfers to more complex parsing.

Mental Model

Postfix: One stack of numbers. Scan tokens: if it’s a number, push it; if it’s an operator, pop two numbers (right first, then left), compute left op right, push the result. At the end the stack has one value—the answer. Infix to postfix: One stack for operators (and maybe “(”). Output a list of tokens (postfix). Scan infix: numbers go straight to output; for an operator, pop from the stack and output all operators that have precedence ≥ current (and are not “(”), then push current; for “(” push; for “)” pop and output until “(”. Finally pop and output the rest of the stack.

Step-by-Step: Postfix Evaluation

Split the expression into tokens (numbers and operators). Assume valid postfix.
Stack = []. For each token: if token is a number, push it. If token is an operator (+, -, *, /), pop the top two values (call them right and left, in that order). Compute left op right (e.g. left - right for “-”). Push the result.
After processing all tokens, the stack should have exactly one element—the result. Return it.

Note: For subtraction and division, the first pop is the right operand and the second pop is the left. So “left op right” gives the correct order (e.g. 5 3 - → pop 3, pop 5 → 5 - 3 = 2).

Step-by-Step: Infix to Postfix (Shunting-Yard Idea)

Output list = [], operator stack = []. Scan infix tokens left to right.
Number: append to output.
“(”: push onto operator stack.
“)”: pop from stack and append to output until we pop “(”. Discard the “(”.
Operator (+, -, *, /): while the stack is not empty and the top is an operator with precedence ≥ current (and top ≠ “(”), pop and append to output. Then push the current operator.
End of input: pop all remaining operators from the stack and append to output. Result is the postfix expression. Then evaluate the postfix with the previous algorithm.

Precedence: * and / higher than + and -. Left associativity: when precedence is equal, pop the top first (e.g. 1 - 2 - 3 → 1 2 - 3 - in postfix).
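The conversion steps above can be sketched as follows. The sketch assumes the input is already split into tokens, that all operators are binary and left-associative, and that parentheses are balanced; the function name is illustrative.

```python
def infix_to_postfix(tokens):
    """Convert a list of infix tokens to a list of postfix tokens."""
    precedence = {"+": 1, "-": 1, "*": 2, "/": 2}
    output, ops = [], []
    for t in tokens:
        if t == "(":
            ops.append(t)
        elif t == ")":
            # Emit operators until the matching "(" and then discard it.
            while ops[-1] != "(":
                output.append(ops.pop())
            ops.pop()
        elif t in precedence:
            # Pop operators of higher or equal precedence (left associativity).
            while ops and ops[-1] != "(" and precedence[ops[-1]] >= precedence[t]:
                output.append(ops.pop())
            ops.append(t)
        else:
            output.append(t)  # operand goes straight to the output
    while ops:
        output.append(ops.pop())
    return output
```

For example, infix_to_postfix("3 + 4 * 2".split()) yields ["3", "4", "2", "*", "+"], and "1 - 2 - 3" becomes ["1", "2", "-", "3", "-"].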

ASCII Diagram: Postfix Evaluation

  Postfix: 3 4 + 2 *
  Token 3: stack [3]
  Token 4: stack [3, 4]
  Token +: pop 4, 3 → 3+4=7, stack [7]
  Token 2: stack [7, 2]
  Token *: pop 2, 7 → 7*2=14, stack [14]
  Result: 14

Python Implementation

Evaluate Postfix (LeetCode 150 style)

def eval_rpn(tokens):
    stack = []
    for t in tokens:
        if t in "+-*/":
            right = stack.pop()
            left = stack.pop()
            if t == "+": stack.append(left + right)
            elif t == "-": stack.append(left - right)
            elif t == "*": stack.append(left * right)
            else: stack.append(int(left / right))  # truncate toward zero
        else:
            stack.append(int(t))
    return stack[0]

LeetCode 150 uses string tokens like "3", "4", "+". Division truncates toward zero: in Python 3, int(6 / -132) is 0, whereas floor division 6 // -132 gives -1.

Simple Infix Calculator (no parentheses, + - * /)

One common approach: treat expression as a sum of terms. Each term is a product (or single number). Scan: keep a running result and current “term”; when you see + or -, add the current term to the result and reset term; when you see * or /, update the term. Alternatively: first convert infix to postfix (with precedence), then evaluate postfix. Below is a two-stack style: numbers and operators; when we see an operator with lower or equal precedence than the top, we collapse (pop two numbers and one operator, compute, push result) until we can push the current operator.

def calculate_infix(s):
    # Remove spaces, then parse numbers and + - * /
    s = s.replace(" ", "")
    i, n = 0, len(s)
    num_stack = []
    op_stack = []
    precedence = {"+": 0, "-": 0, "*": 1, "/": 1}
    def apply_op():
        r, l = num_stack.pop(), num_stack.pop()
        op = op_stack.pop()
        if op == "+": num_stack.append(l + r)
        elif op == "-": num_stack.append(l - r)
        elif op == "*": num_stack.append(l * r)
        else: num_stack.append(int(l / r))
    while i < n:
        if s[i].isdigit():
            j = i
            while j < n and s[j].isdigit(): j += 1
            num_stack.append(int(s[i:j]))
            i = j
        else:
            op = s[i]
            while op_stack and precedence.get(op_stack[-1], -1) >= precedence[op]:
                apply_op()
            op_stack.append(op)
            i += 1
    while op_stack:
        apply_op()
    return num_stack[0]

This assumes a well-formed expression with no parentheses. For parentheses, push “(” and pop until “(” when we see “)”.
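The “sum of terms” idea mentioned earlier needs no stacks at all. A sketch under the same assumptions (non-negative integer literals, no parentheses, well-formed input); the function name and the trailing "+" used to flush the last number are choices made for this illustration:

```python
def calculate_terms(s):
    """Evaluate + - * / without parentheses using a running result and a current term."""
    result = 0   # sum of the terms that are already complete
    term = 0     # the product/quotient currently being built
    num = 0      # the number currently being read
    op = "+"     # the operator that precedes num
    for ch in s + "+":          # trailing "+" forces the last number to be folded in
        if ch == " ":
            continue
        if ch.isdigit():
            num = num * 10 + int(ch)
            continue
        # ch is an operator: fold num into the term according to the previous operator.
        if op == "+":
            result += term
            term = num
        elif op == "-":
            result += term
            term = -num
        elif op == "*":
            term *= num
        else:
            term = int(term / num)  # truncate toward zero
        op, num = ch, 0
    return result + term
```

For "3+2*2" this returns 7, and for "14-3/2" it returns 13.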

Line-by-Line: Postfix Evaluation

stack = []: We only need one stack for operands.
if t in "+-*/": Token is an operator. Pop two operands (right first, then left).
left - right, int(left / right): Order matters for - and /. Right was popped first. Integer division toward zero: use int(a / b) in Python for LeetCode 150.
else: stack.append(int(t)): Token is a number string; convert and push.
return stack[0]: Valid postfix leaves exactly one value.

Time and Space Complexity

Postfix evaluation: One pass over n tokens. Each token is pushed once; each operator causes two pops and one push. Time O(n), space O(n) for the stack (worst case: many operands before any operator). Infix to postfix: One pass; each token and each operator pushed and popped O(1) times. O(n) time and space.

Edge Cases

Single operand: Postfix with one number, e.g. ["42"]. Stack ends with [42]. Return 42.
Negative numbers: In LeetCode 150, tokens are strings; “-2” might be one token. Check problem: sometimes negative numbers are represented as “-”, “2” (two tokens). Handle according to problem.
Division by zero: Postfix can have “a 0 /”. Guard or assume valid input.
Integer division: LeetCode 150 requires truncation toward zero. In Python, int(6 / -132) is 0, while 6 // -132 is -1 because // floors. Use int(a / b) for “truncate toward zero.”

Common Mistakes

Common Mistake

Wrong operand order for subtraction and division. We pop the right operand first, then the left. So result = left op right. For “5 3 -” we want 5 - 3 = 2. If you do right - left you get -2. Same for division: “6 2 /” should be 6 / 2 = 3, not 2 / 6 (which truncates to 0).

Precedence in infix: * and / must be applied before + and -. When converting to postfix, an operator of higher precedence on the stack is popped before pushing a lower-precedence one. When two operators have equal precedence (e.g. + and -), left associativity means pop the top first.
Parentheses in infix: “(” has the effect of “starting fresh” for precedence; “)” pops until we remove the matching “(”. Don’t output “(” or “)” in the postfix.

Basic Calculator With Parentheses (LeetCode 224)

Expression may contain +, -, parentheses, and spaces. One approach: use a stack to store the result and sign for each “level” of parentheses. When we see “(”, push current result and current sign; when we see “)”, pop and combine. Alternatively: convert to postfix respecting parentheses (treat “(” as a barrier on the operator stack that no operator pops past; only “)” removes it), then evaluate. Another approach: recursive or iterative with a sign variable; when we see “(”, evaluate the subexpression (recursively or with a stack) and multiply by the sign.

Expert Tip

For “basic calculator” with +, -, (, ): keep a stack of (result_so_far, sign_before_this_level). When you see “(”, push (result, sign) and reset result=0, sign=1. When you see “)”, pop (prev, s), do result = prev + s * result. When you see “+”, sign=1; “-”, sign=-1. When you see a number, add sign*num to result. This avoids building a full postfix string.
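The tip translates almost directly into code. A minimal sketch, assuming a well-formed expression containing only digits, +, -, parentheses, and spaces (the function and variable names are illustrative):

```python
def calculate_with_parens(s):
    """Evaluate an expression with +, -, parentheses, and spaces."""
    stack = []                    # (result_so_far, sign_before_this_level) per open "("
    result, sign, num = 0, 1, 0
    for ch in s:
        if ch.isdigit():
            num = num * 10 + int(ch)
        elif ch in ("+", "-"):
            result += sign * num  # commit the number just read
            num = 0
            sign = 1 if ch == "+" else -1
        elif ch == "(":
            stack.append((result, sign))
            result, sign = 0, 1   # start a fresh level
        elif ch == ")":
            result += sign * num
            num = 0
            prev, sign_before = stack.pop()
            result = prev + sign_before * result
        # any other character (a space) is skipped
    return result + sign * num
```

For "(1+(4+5+2)-3)+(6+8)" this returns 23; a leading unary minus such as "-(2+3)" also works because the sign is pushed along with the level.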

Interview Insight

For postfix: “I’ll use one stack. Scan tokens: numbers go on the stack; for an operator I pop two operands (right then left), compute left op right, push the result. Final stack top is the answer. O(n) time.” Mention operand order for - and / and integer division. For infix: “I can convert to postfix using an operator stack and precedence, then evaluate, or use two stacks and collapse when precedence allows.”

Practice Problems

LeetCode 150: Evaluate Reverse Polish Notation (postfix evaluation).
LeetCode 224: Basic Calculator (+,-, parentheses, spaces).
LeetCode 227: Basic Calculator II (+, -, *, /, no parentheses).

Summary

Postfix evaluation: One stack. Scan tokens: push numbers; for an operator pop two (right, then left), compute left op right, push result. Return stack[0]. Order of operands matters for - and /.
Infix to postfix: Output list + operator stack. Numbers to output; “(” push; “)” pop to output until “(”; operator: pop and output while top has precedence ≥ current, then push. Evaluate the resulting postfix.
Use int(left / right) for integer division toward zero. Postfix and infix evaluation are O(n) time and space.

9.5 Queue Implementation

Introduction

A queue is a linear data structure that follows FIFO (First In, First Out): the first element added is the first one removed. It supports enqueue (add at the rear), dequeue (remove from the front), and peek (read the front without removing). Like a stack, we want these operations in O(1) time. In Python, using a list with append for enqueue and pop(0) for dequeue is wrong for performance—pop(0) is O(n) because it shifts all elements. The correct approach is collections.deque (double-ended queue), which supports O(1) append and popleft, or a linked list with head and tail pointers (enqueue at tail, dequeue at head). This section covers the queue ADT, correct Python usage, and a linked-list implementation for understanding.

Real-World Analogy

Think of a line at a ticket counter. People join at the back (enqueue) and are served from the front (dequeue). The first person in line is the first to be served—FIFO. You cannot serve someone from the middle; you only see who is at the front. Queues model task scheduling, BFS (breadth-first search), buffering, and any “first come, first served” scenario.

Example

Enqueue 10, 20, 30. Front is 10. Dequeue returns 10; front is now 20. Dequeue returns 20; the queue has only 30. Order of removal is always the same as order of insertion.

Formal Definition

Concept Note

Queue (ADT): A collection that supports: (1) enqueue(x)—add element x at the rear (back) of the queue; (2) dequeue()—remove and return the element at the front; (3) peek() or front()—return the front element without removing it; (4) isEmpty()—return true if the queue has no elements. Optionally: size(). Only the front element can be removed or read; only the rear can receive new elements. FIFO order is guaranteed. All operations should be O(1) for an efficient implementation.

Why This Topic Matters

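BFS: Breadth-first search uses a queue to process nodes level by level. You dequeue the current node and enqueue its unvisited neighbors. With a deque, BFS runs in O(V + E) on a graph with V nodes and E edges; with a list and pop(0), each dequeue costs O(V) in the worst case, so the queue work alone can reach O(V²).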
Interview and production: “Implement a queue,” “queue using stacks” (LeetCode 232), “sliding window” with a deque (Section 9.9). Correct choice of structure (deque vs list) matters.
Foundation for deque and priority queue: A queue is the basic FIFO structure; deque extends it with O(1) operations at both ends; priority queue (Section 9.8) orders by priority instead of arrival time.
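To make the BFS point concrete, here is a short sketch of breadth-first traversal driven by a deque. The adjacency-dict graph format and the name bfs_order are assumptions made for illustration.

```python
from collections import deque

def bfs_order(graph, start):
    """Return nodes in breadth-first order from start; graph maps node -> list of neighbors."""
    visited = {start}
    order = []
    q = deque([start])
    while q:
        node = q.popleft()            # O(1) dequeue from the front
        order.append(node)
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                q.append(neighbor)    # O(1) enqueue at the rear
    return order
```

With graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}, bfs_order(graph, "A") visits A, B, C, D in that order.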

Mental Model

Picture a horizontal tube: elements enter at the rear (right) and leave from the front (left). You only add at one end and remove from the other. In a linked list, we keep a head (front—where we dequeue) and a tail (rear—where we enqueue). Enqueue: create a new node, set tail.next = new_node, tail = new_node (or head = tail = new_node if empty). Dequeue: return head.data, head = head.next (and if head becomes None, set tail = None). In a deque (double-ended queue), the underlying structure allows O(1) append and popleft so we use it as a queue without implementing pointers ourselves.

Step-by-Step: Operations

Using collections.deque (recommended in Python)

Enqueue: q.append(x). O(1).
Dequeue: q.popleft(). O(1).
Peek: q[0]. O(1).
isEmpty: len(q) == 0 or not q. O(1).

Using a list (avoid for queues)

Enqueue with append is O(1), but dequeue with pop(0) is O(n)—every element shifts. Do not use a list as a queue when you expect many enqueue/dequeue operations.

Using a linked list (head + tail)

Enqueue: new node at tail; update tail. Dequeue: remove head; update head (and tail if queue becomes empty). Both O(1).

ASCII Diagram

  Queue (front left, rear right):

  enqueue(10), enqueue(20), enqueue(30):
  front → [10] → [20] → [30] ← rear

  dequeue() → returns 10
  front → [20] → [30] ← rear

  peek() → 20
  dequeue() → returns 20
  front → [30] ← rear  (head and tail point to same node)

Python Implementation

Using collections.deque (recommended)

from collections import deque

q = deque()
q.append(10)      # enqueue
q.append(20)
q.append(30)
front = q.popleft()   # dequeue → 10
peek = q[0]           # peek → 20
is_empty = len(q) == 0

Queue class wrapping deque

from collections import deque

class Queue:
    def __init__(self):
        self._data = deque()

    def enqueue(self, x):
        self._data.append(x)

    def dequeue(self):
        if self.is_empty():
            raise IndexError("dequeue from empty queue")
        return self._data.popleft()

    def peek(self):
        if self.is_empty():
            raise IndexError("peek from empty queue")
        return self._data[0]

    def is_empty(self):
        return len(self._data) == 0

    def size(self):
        return len(self._data)

Linked-list implementation

class Node:
    def __init__(self, data):
        self.data = data
        self.next = None

class QueueLinkedList:
    def __init__(self):
        self.head = None
        self.tail = None

    def enqueue(self, x):
        new = Node(x)
        if self.tail is None:
            self.head = self.tail = new
        else:
            self.tail.next = new
            self.tail = new

    def dequeue(self):
        if self.head is None:
            raise IndexError("dequeue from empty queue")
        val = self.head.data
        self.head = self.head.next
        if self.head is None:
            self.tail = None
        return val

    def peek(self):
        if self.head is None:
            raise IndexError("peek from empty queue")
        return self.head.data

    def is_empty(self):
        return self.head is None

Line-by-Line Explanation (Linked List)

enqueue: New node at tail. If queue was empty, head = tail = new. Else tail.next = new, tail = new. O(1).
dequeue: If empty, raise. Else save head.data, head = head.next. If head becomes None (was single element), set tail = None. Return saved value. O(1).
peek: Return head.data if not empty. O(1).

Time and Space Complexity

Enqueue, dequeue, peek, is_empty: O(1) with deque or linked list. List with pop(0): Dequeue is O(n). Space O(n) for n elements stored.

Edge Cases

Dequeue from empty queue: Undefined unless we define it. Raise IndexError or return None. Always check is_empty before dequeue/peek in production, or document that the caller must ensure non-empty.
Single element: After one enqueue, head == tail. One dequeue leaves head and tail both None. The linked-list code handles this with the if self.head is None: self.tail = None branch.

Common Mistakes

Common Mistake

Using a list with pop(0) for dequeue. pop(0) shifts all remaining elements left, so each dequeue is O(n). For BFS or any algorithm that does many enqueue/dequeue operations, this turns O(n) into O(n²). Always use collections.deque with append and popleft for a queue in Python.

Forgetting to update tail when queue becomes empty: In the linked-list dequeue, when you remove the last element (head becomes None after head = head.next), you must set tail = None. Otherwise tail still points to the removed node; the next enqueue then links the new node after that stale tail and leaves head as None, so the new element is unreachable.
Queue vs stack: Queue = FIFO (first in, first out); stack = LIFO (last in, first out). Use a queue for BFS and “first come first served”; use a stack for DFS and “most recent.”

Queue Using Two Stacks (LeetCode 232)

To implement a queue with two stacks: one stack for “input” (enqueue: push onto input stack) and one for “output” (dequeue: if output is empty, pop all from input and push onto output, then pop from output; else just pop from output). Peek: same as dequeue but don’t remove the top of output. Amortized O(1) per operation: each element is pushed and popped at most twice (once per stack). This is a common interview follow-up.

Expert Tip

For “implement queue using stacks,” use stack_in and stack_out. enqueue: push to stack_in. dequeue: if stack_out is empty, move everything over (while stack_in: stack_out.append(stack_in.pop())); then return stack_out.pop(). This way the oldest element in stack_in ends up at the top of stack_out. Amortized O(1) because each element moves at most once from in to out.
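A compact sketch of the two-stack queue, using Python lists as stacks. The class and method names are illustrative and mirror the Queue interface used elsewhere in this section.

```python
class QueueViaStacks:
    """FIFO queue built from two LIFO stacks (Python lists)."""

    def __init__(self):
        self._in = []    # receives new elements
        self._out = []   # serves elements in FIFO order

    def enqueue(self, x):
        self._in.append(x)

    def _refill(self):
        # Only refill when _out is empty, otherwise FIFO order would break.
        if not self._out:
            while self._in:
                self._out.append(self._in.pop())

    def dequeue(self):
        self._refill()
        if not self._out:
            raise IndexError("dequeue from empty queue")
        return self._out.pop()

    def peek(self):
        self._refill()
        if not self._out:
            raise IndexError("peek from empty queue")
        return self._out[-1]

    def is_empty(self):
        return not self._in and not self._out
```

The key design point is that elements are transferred only when the output stack is empty; transferring on every dequeue would scramble the order.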

Interview Insight

State: “A queue is FIFO—enqueue at rear, dequeue from front. In Python I use collections.deque with append for enqueue and popleft for dequeue—both O(1). I avoid list with pop(0) because that’s O(n) per dequeue.” If asked to implement from scratch, give the linked-list version with head and tail, and mention that deque is the standard library queue. If asked “queue using stacks,” describe the two-stack approach with amortized O(1).

Practice Problems

LeetCode 232: Implement Queue using Stacks (two stacks; amortized O(1)).
BFS on a graph or tree (queue for level-order).
LeetCode 225: Implement Stack using Queues (reverse problem—one queue and reorganize on push, or two queues).

Summary

A queue is FIFO: enqueue at rear, dequeue from front, peek at front. All O(1) with deque or linked list (head/tail).
In Python, use collections.deque: append to enqueue, popleft to dequeue. Do not use a list with pop(0)—it is O(n) per dequeue.
Linked-list queue: enqueue at tail (create node, tail.next = new, tail = new); dequeue at head (return head.data, head = head.next; set tail = None if empty). Queue using two stacks: enqueue pushes to one stack; dequeue pops from the other, refilling from the first when empty—amortized O(1).

9.6 Circular Queue

Introduction

A circular queue (or ring buffer) is a queue implemented with a fixed-size array where the front and rear indices wrap around to the start when they reach the end. This gives O(1) enqueue and dequeue without shifting elements and without dynamic allocation—useful in embedded systems, producer-consumer buffers, and when a bounded capacity is required. The main design choice is how to distinguish “empty” from “full”: (1) reserve one slot (never store more than capacity − 1 elements) so that “front == rear” means empty and “(rear + 1) % capacity == front” means full; or (2) store a separate size (or count) so empty is size == 0 and full is size == capacity. This section covers both approaches and a complete Python implementation (LeetCode 622 style).

Real-World Analogy

Imagine a circular track with a fixed number of stations. People board at the “rear” station and get off at the “front” station. When the rear reaches the last station, the next boarding happens at station 0—the track is circular. When the front catches up to the rear (all stations between them are empty), the queue is empty. When the rear is one step behind the front (going forward), the queue is full. The circular arrangement means we never “shift” anyone; we only move the front and rear pointers (indices) and use modulo arithmetic to wrap.

Example

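Capacity 4 (indices 0–3). Empty: front = 0, rear = 0. Enqueue 10, 20: rear moves to 2, data = [10, 20, _, _]. Dequeue: return 10, front = 1. Enqueue 30, 40: 30 is written at index 2 and 40 at index 3, so rear advances 2 → 3 and then wraps to 0; data = [_, 20, 30, 40] with front = 1, rear = 0. With one-slot reservation the queue is now full, since (rear + 1) % 4 == front; with a size counter, size = 3 and one more enqueue is still allowed. The “hole” is at index 0 (just behind front), which is where the next enqueue would write.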

Formal Definition

Concept Note

Circular queue: A queue stored in a fixed-size array of capacity k. Two indices: front (where we dequeue) and rear (next free slot for enqueue, or last enqueued—depending on convention). Indices advance with wrap: front = (front + 1) % k, rear = (rear + 1) % k. Empty: no elements (front == rear if one slot is reserved; or size == 0). Full: capacity reached ((rear + 1) % k == front with one slot reserved; or size == k). Enqueue: place element at rear, advance rear (if not full). Dequeue: return element at front, advance front (if not empty). All operations O(1).

Why This Topic Matters

LeetCode 622: Design Circular Queue: Direct implementation problem. Tests understanding of wrap-around and full/empty handling.
Bounded buffers: Producer-consumer queues, task queues with a fixed maximum size, and streaming pipelines often use a circular queue to avoid unbounded growth and to reuse a fixed block of memory.
No shifting, O(1) per operation: Unlike a linear array queue (where dequeue would shift or we’d leave a hole), the circular design reuses freed slots by wrapping the rear index.

Mental Model

Picture an array bent into a circle. front points to the next element to dequeue; rear points to the next slot where we’ll enqueue (or to the last enqueued element—convention varies). When we enqueue, we write at rear and do rear = (rear + 1) % k. When we dequeue, we read from front and do front = (front + 1) % k. The queue content is the segment from front to rear (going forward, wrapping). If we don’t reserve a slot, “front == rear” can mean both empty and full—so we either waste one slot (max capacity k−1) or maintain a separate size/count variable.

Two Conventions: One Slot Reserved vs Size Counter

Convention 1: Reserve one slot (capacity − 1 elements max)

Empty: front == rear.
Full: (rear + 1) % capacity == front. We never fill the last slot so that “front == rear” uniquely means empty.
Enqueue: if not full, write at rear, then rear = (rear + 1) % capacity.
Dequeue: if not empty, read from front, then front = (front + 1) % capacity.

Convention 2: Store size (or count)

Empty: size == 0. Full: size == capacity.
Enqueue: if size < capacity, write at rear, rear = (rear + 1) % capacity, size += 1.
Dequeue: if size > 0, read from front, front = (front + 1) % capacity, size -= 1.

Both give O(1) operations. Convention 2 uses one extra integer (size) but allows storing exactly capacity elements; Convention 1 uses no extra variable but stores at most capacity − 1 elements.

Step-by-Step (Size-Based)

Allocate array of length capacity. front = 0, rear = 0, size = 0.
Enqueue(x): if size == capacity, return False (full). Else: arr[rear] = x, rear = (rear + 1) % capacity, size += 1, return True.
Dequeue(): if size == 0, return None or error. Else: val = arr[front], front = (front + 1) % capacity, size -= 1, return val.
Front(): if size == 0 return -1; else return arr[front]. Rear(): if size == 0 return -1; else return arr[(rear - 1 + capacity) % capacity] (last enqueued).

ASCII Diagram

  Capacity 4, indices 0..3 (circular):

  Empty:  front=0, rear=0, size=0
  [ _, _, _, _ ]
    ^
    f,r

  After enqueue 10, 20: front=0, rear=2, size=2
  [10, 20, _, _ ]
   ^     ^
   f     r

  After dequeue: front=1, rear=2, size=1
  [10, 20, _, _ ]
       ^  ^
       f  r

  After enqueue 30, 40: rear advances 2→3, then wraps 3→0. front=1, rear=0, size=3
  [10, 20, 30, 40 ]   (index 0 still holds the stale 10; rear=0 is the next write slot)
   ^   ^
   r   f
  Rear element = arr[(rear-1+4)%4] = arr[3] = 40.
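
The same operation sequence can be replayed in a few lines to check the indices by hand. This is an illustrative sketch using the size-based convention; the helper name trace and the ("enq", x) / ("deq",) operation tuples are made up for this example.

```python
def trace(capacity, ops):
    """Run ("enq", x) / ("deq",) operations; record (front, rear, size) after each."""
    arr = [None] * capacity
    front = rear = size = 0
    states = []
    for op in ops:
        if op[0] == "enq" and size < capacity:
            arr[rear] = op[1]
            rear = (rear + 1) % capacity
            size += 1
        elif op[0] == "deq" and size > 0:
            front = (front + 1) % capacity   # the old cell is not cleared
            size -= 1
        states.append((front, rear, size))
    return arr, states


arr, states = trace(4, [("enq", 10), ("enq", 20), ("deq",), ("enq", 30), ("enq", 40)])
print(states)  # [(0, 1, 1), (0, 2, 2), (1, 2, 1), (1, 3, 2), (1, 0, 3)]
print(arr)     # [10, 20, 30, 40]; index 0 is a stale value outside the queue
```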

Python Implementation (LeetCode 622 Style)

class MyCircularQueue:
    def __init__(self, k: int):
        self.capacity = k
        self.arr = [0] * k
        self.front = 0
        self.rear = 0
        self.size = 0

    def enQueue(self, value: int) -> bool:
        if self.isFull():
            return False
        self.arr[self.rear] = value
        self.rear = (self.rear + 1) % self.capacity
        self.size += 1
        return True

    def deQueue(self) -> bool:
        if self.isEmpty():
            return False
        self.front = (self.front + 1) % self.capacity
        self.size -= 1
        return True

    def Front(self) -> int:
        if self.isEmpty():
            return -1
        return self.arr[self.front]

    def Rear(self) -> int:
        if self.isEmpty():
            return -1
        return self.arr[(self.rear - 1 + self.capacity) % self.capacity]

    def isEmpty(self) -> bool:
        return self.size == 0

    def isFull(self) -> bool:
        return self.size == self.capacity

LeetCode 622 uses method names enQueue, deQueue, Front, Rear, isEmpty, isFull; deQueue returns bool (success), and Front/Rear return -1 when empty. We use a size counter so we can store exactly capacity elements.

Line-by-Line Explanation

rear: Points to the next free slot. So after enqueue we write at rear, then advance rear.
enQueue: If full, False. Else arr[rear] = value, rear = (rear + 1) % capacity, size += 1. O(1).
deQueue: If empty, False. Else front = (front + 1) % capacity, size -= 1. We don’t need to clear the old cell. O(1).
Rear(): Last enqueued element is at (rear - 1 + capacity) % capacity, because rear is the next free slot. If empty, return -1.

Time and Space Complexity

Enqueue, dequeue, front, rear, isEmpty, isFull: all O(1). Space O(capacity) for the array and O(1) for the indices and size.

Edge Cases

Enqueue when full: Return False (or raise). Do not overwrite or advance rear.
Dequeue when empty: Return False (or None). Do not advance front.
Capacity 0 or 1: With size, capacity 1 is fine: one element, front == rear after one enqueue, size == 1. Rear() = arr[(rear-1+1)%1] = arr[0]. For capacity 0, enqueue should always fail.

Common Mistakes

Common Mistake

Confusing full and empty when not using a size. If you use “front == rear” for both empty and full (and don’t reserve a slot), you cannot distinguish. You must either (1) reserve one slot so full is (rear+1)%cap == front and never let rear “catch up” to front except when empty, or (2) maintain a size/count. Many bugs come from forgetting the one-slot reservation or the size update.

Wrong index for Rear(): If rear is the “next write” index, the last enqueued value is at (rear - 1 + capacity) % capacity. If rear is “last written,” then Rear() is arr[rear] and you advance rear after writing—then rear points to the last element. Be consistent with your convention.
Forgetting to wrap: Always use (rear + 1) % capacity and (front + 1) % capacity. Plain rear + 1 can go out of bounds when rear == capacity - 1.

One-Slot-Reserved Implementation (No Size)

class CircularQueueReservedSlot:
    def __init__(self, k: int):
        self.capacity = k
        self.arr = [0] * k
        self.front = 0
        self.rear = 0   # next free slot

    def enQueue(self, value: int) -> bool:
        if (self.rear + 1) % self.capacity == self.front:
            return False   # full
        self.arr[self.rear] = value
        self.rear = (self.rear + 1) % self.capacity
        return True

    def deQueue(self) -> bool:
        if self.front == self.rear:
            return False   # empty
        self.front = (self.front + 1) % self.capacity
        return True

    def Front(self) -> int:
        if self.front == self.rear:
            return -1
        return self.arr[self.front]

    def Rear(self) -> int:
        if self.front == self.rear:
            return -1
        return self.arr[(self.rear - 1 + self.capacity) % self.capacity]

    def isEmpty(self) -> bool:
        return self.front == self.rear

    def isFull(self) -> bool:
        return (self.rear + 1) % self.capacity == self.front

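Here we store at most capacity − 1 elements, so a queue created with k slots rejects the k-th enqueue. Full is detected by (rear + 1) % capacity == front. This version also needs k ≥ 1: with k = 0 the modulo raises ZeroDivisionError.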
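
If the reserved-slot convention has to hold exactly k elements (as the LeetCode 622 interface expects), one option is to allocate k + 1 slots and keep one of them permanently unused. The sketch below assumes that approach; the class name CircularQueueExtraSlot and the extra size() helper are illustrative additions.

```python
class CircularQueueExtraSlot:
    def __init__(self, k: int):
        self.n = k + 1            # one extra slot is always left unused
        self.arr = [0] * self.n
        self.front = 0
        self.rear = 0             # next free slot

    def isEmpty(self) -> bool:
        return self.front == self.rear

    def isFull(self) -> bool:
        return (self.rear + 1) % self.n == self.front

    def enQueue(self, value: int) -> bool:
        if self.isFull():
            return False
        self.arr[self.rear] = value
        self.rear = (self.rear + 1) % self.n
        return True

    def deQueue(self) -> bool:
        if self.isEmpty():
            return False
        self.front = (self.front + 1) % self.n
        return True

    def Front(self) -> int:
        return -1 if self.isEmpty() else self.arr[self.front]

    def Rear(self) -> int:
        # Python's % is non-negative for a positive modulus, so no "+ n" is needed.
        return -1 if self.isEmpty() else self.arr[(self.rear - 1) % self.n]

    def size(self) -> int:
        # Element count derived from the two indices alone.
        return (self.rear - self.front) % self.n


q = CircularQueueExtraSlot(3)
print([q.enQueue(x) for x in (1, 2, 3, 4)])  # [True, True, True, False]
print(q.Rear(), q.size())                    # 3 3
```

Because the array length is k + 1, k = 0 no longer divides by zero: the queue is simply always full and always empty.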

Expert Tip

When implementing “design circular queue,” state your convention: “I’ll use a size counter so the queue can hold exactly capacity elements. front is the dequeue index, rear is the next enqueue index. Empty when size==0, full when size==capacity.” Alternatively: “I’ll reserve one slot so front==rear means empty and (rear+1)%cap==front means full; max elements = capacity−1.” Either is acceptable; size is easier to reason about for many people.

Interview Insight

Explain: “Circular queue uses a fixed array and two indices that wrap with modulo. I use a size variable so empty is size==0 and full is size==capacity. Enqueue: if not full, write at rear, rear = (rear+1)%cap, size++. Dequeue: if not empty, front = (front+1)%cap, size--. Front and Rear read at front and (rear-1+cap)%cap.” Mention the alternative (one slot reserved) and the risk of confusing full and empty without it.

Practice Problems

LeetCode 622: Design Circular Queue (implement with size or one-slot reservation).

Summary

A circular queue uses a fixed array and indices front and rear that wrap: (front + 1) % capacity, (rear + 1) % capacity. Enqueue at rear, dequeue at front. O(1) per operation.
Empty/full: either (1) use a size counter—empty when size==0, full when size==capacity (can store exactly capacity elements), or (2) reserve one slot—empty when front==rear, full when (rear+1)%capacity==front (max capacity−1 elements).
Rear(): last enqueued is at (rear - 1 + capacity) % capacity when rear is the “next free” index. Always wrap indices with modulo to avoid out-of-bounds.

9.7 Deque

Introduction

A deque (double-ended queue) is a linear structure that supports O(1) insertion and deletion at both the front and the rear. It generalizes both the stack (LIFO) and the queue (FIFO): you can push/pop from either end. In CPython, collections.deque is implemented as a doubly linked list of fixed-size blocks and provides append, appendleft, pop, popleft, plus indexing and rotation. Use a deque when you need a queue (avoid list’s O(n) pop(0)), when you need stack-like and queue-like operations in the same structure, or when implementing sliding-window algorithms (e.g. monotonic deque for max in window). This section covers the deque ADT, the Python API, and typical use cases.

Real-World Analogy

Imagine a line where people can join or leave from either end. Someone can cut in at the front (appendleft) or join at the back (append); someone can leave from the front (popleft) or from the back (pop). That’s a deque. It’s more flexible than a strict queue (only rear in, only front out) or a stack (only one end). Real examples: browser history (back/forward can be modeled with two stacks or one deque), undo/redo with “insert at front” for new action; and algorithms that need to inspect or remove from both ends (e.g. sliding window maximum).

Example

d = deque(); d.append(10); d.append(20); d.appendleft(5). Order: [5, 10, 20]. d.popleft() → 5; d.pop() → 20. Remaining: [10]. You can use the same deque as a queue (append right, popleft) or as a stack (append and pop from the right).

Formal Definition

Concept Note

Deque (double-ended queue): A collection that supports insertion and deletion at both ends in O(1) time. Typical operations: append(x) (add at right/rear), appendleft(x) (add at left/front), pop() (remove and return from right), popleft() (remove and return from left). Optionally: peek at front or rear (e.g. d[0], d[-1]), rotate (shift elements left or right), clear, len. A pure linked-list deque has no O(1) random access; Python’s deque supports d[i], which is O(1) at either end but degrades to O(n) toward the middle. The key guarantee is O(1) append, appendleft, pop, popleft.

Why This Topic Matters

Queue and stack in one: In Python, deque is the standard choice for a queue (append + popleft) and can double as a stack (append + pop). Using a list for a queue is wrong (pop(0) is O(n)); deque is correct.
Sliding window and monotonic deque: Problems like “max in every sliding window” (LeetCode 239) use a deque to keep candidates; we remove from the front when they leave the window and from the rear when a larger element makes them useless. Both ends are accessed in O(1).
BFS, palindromes, and rotation: BFS uses a queue (deque). Checking whether a string is a palindrome can use a deque of characters, comparing popleft() with pop() until the ends meet. Rotate operations (e.g. rotate(-1) to rotate left by one) cost O(k) in Python but are useful in some problems.

Mental Model

Picture a horizontal tube open at both ends. You can add or remove from the left (front) or the right (rear). So you have four core operations: add-left, add-right, remove-left, remove-right. The deque maintains order: the first element is at index 0 (front), the last at index -1 (rear). Use it as a queue by restricting to add-right and remove-left; as a stack by restricting to add-right and remove-right.

Python: collections.deque API

Constructor: deque(), deque(iterable), deque(iterable, maxlen=k) (bounded deque; when full, appending drops from the other end).
Add: d.append(x) (right), d.appendleft(x) (left). O(1).
Remove: d.pop() (right), d.popleft() (left). O(1). Raises IndexError if empty.
Access: d[0], d[-1] (front and rear). d[i] for arbitrary i is supported but O(n) in the middle; use for small indices or when needed.
Other: len(d), d.clear(), d.rotate(k) (positive k = rotate right: last becomes first; negative = rotate left). d.extend(iterable), d.extendleft(iterable) (note: extendleft adds elements in reverse order).

ASCII Diagram

  Deque:  front (left)  ←——  [ 5, 10, 20, 30 ]  ——→  rear (right)
          index 0                  ...              index -1

  append(40)    →  [ 5, 10, 20, 30, 40 ]
  appendleft(1) →  [ 1, 5, 10, 20, 30, 40 ]
  popleft()     →  returns 1; deque = [ 5, 10, 20, 30, 40 ]
  pop()         →  returns 40; deque = [ 5, 10, 20, 30 ]

Using Deque as Queue or Stack

Use as	Add	Remove
Queue (FIFO)	append (rear)	popleft (front)
Stack (LIFO)	append (top)	pop (top)

Python Examples

from collections import deque

# As queue (BFS)
q = deque()
q.append(1)
q.append(2)
x = q.popleft()   # 1

# As stack
s = deque()
s.append(10)
s.append(20)
y = s.pop()       # 20

# Both ends
d = deque([1, 2, 3])
d.appendleft(0)   # [0, 1, 2, 3]
d.rotate(1)        # [3, 0, 1, 2]  (right rotate)
d.rotate(-1)      # [0, 1, 2, 3]  (left rotate)

Bounded Deque (maxlen)

When you create deque(iterable, maxlen=k), the deque can hold at most k elements. When it is full and you append, the leftmost element is automatically dropped; when you appendleft, the rightmost is dropped. So a bounded deque behaves like a sliding window of the last k elements (if you only append). Useful for “last k items” or a fixed-size buffer.

d = deque(maxlen=3)
d.append(1)
d.append(2)
d.append(3)   # [1, 2, 3]
d.append(4)   # [2, 3, 4]  (1 dropped)
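
As one possible use of the “last k items” behaviour, here is a small moving-average sketch. The function name moving_averages is hypothetical, and the sum over the window makes each step O(k), which is acceptable for small k.

```python
from collections import deque


def moving_averages(values, k):
    window = deque(maxlen=k)   # the oldest value is dropped automatically
    out = []
    for v in values:
        window.append(v)
        out.append(sum(window) / len(window))
    return out


print(moving_averages([1, 2, 3, 4], 3))  # [1.0, 1.5, 2.0, 3.0]
```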

Time and Space Complexity

append, appendleft, pop, popleft: O(1). Index access d[i]: O(1) at the ends, up to O(n) in the middle, so avoid repeated middle indexing in a loop. rotate(k): O(k). len(d), d[0], d[-1]: O(1). Space O(n) for n elements.

Edge Cases

pop/popleft on empty deque: Raises IndexError. Check if d: or len(d) > 0 before popping if the structure might be empty.
extendleft(iterable): Inserts elements in reverse order (each is appended to the left in turn). So deque([1,2]).extendleft([3,4]) gives [4, 3, 1, 2]: the iterable’s last element ends up leftmost. To keep the iterable’s original order at the front, call d.extendleft(reversed(iterable)).
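
Both edge cases are easy to check interactively. A short sketch:

```python
from collections import deque

d = deque([1, 2])
d.extendleft([3, 4])             # 3 goes left first, then 4 goes left of it
print(list(d))                   # [4, 3, 1, 2]

e = deque([1, 2])
e.extendleft(reversed([3, 4]))   # reverse first to keep the iterable's order
print(list(e))                   # [3, 4, 1, 2]

empty = deque()
value = empty.popleft() if empty else None   # guard instead of catching IndexError
print(value)                     # None
```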

Common Mistakes

Common Mistake

Using a list as a queue. q = []; q.append(x); q.pop(0) makes dequeue O(n). Use deque: q.append(x); q.popleft() for O(1). Similarly, don’t use list.insert(0, x) for “add to front” in a queue-like scenario—that’s O(n). Use deque.appendleft(x).

Confusing left and right: “Front” of a queue is the left (index 0); we popleft. “Rear” is the right (index -1); we append there. In a stack, “top” is the right; we append and pop.
Assuming deque supports efficient random access: d[i] for arbitrary i can be O(n). Prefer operating at the ends (d[0], d[-1], pop, popleft) when possible.

Monotonic Deque (Sliding Window Maximum)

For “max in every sliding window of size k,” we maintain a deque of indices such that their values are in decreasing order (front = index of current max in the window). When the window moves: (1) remove from front if that index has left the window; (2) from the rear, remove indices whose values are ≤ the new element (they can never be the max again); (3) append the new index at the rear. The front is always the index of the maximum in the current window. We need both popleft (index left the window) and pop (back is smaller than new)—hence a deque. See Section 9.9 (Monotonic Queue) for full detail.

Expert Tip

When you need to remove from both the front (e.g. “element left the window”) and the rear (e.g. “this element is dominated by the new one”), use a deque. When you only remove from one end (e.g. stack: only top; queue: only front), a stack or a simple queue is enough. The deque is the “both ends” structure.

Interview Insight

State: “A deque supports O(1) add and remove at both ends. In Python I use collections.deque: append and popleft for a queue, append and pop for a stack. For sliding window max I keep a deque of indices in decreasing value order and remove from front when the index is out of the window and from the rear when the new element is greater.” Mention that list with pop(0) or insert(0, x) is O(n) and should be avoided for queue-like use.

Practice Problems

LeetCode 239: Sliding Window Maximum (monotonic deque of indices).
LeetCode 232: Implement Queue using Stacks (or use deque as the queue).
BFS: use deque with append and popleft.
Palindrome checker: use a deque (popleft and pop to compare from both ends).
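
For the palindrome checker, a minimal sketch could look like this. It assumes case is ignored and non-alphanumeric characters are skipped; that normalization is a choice made for the example, not something the problem statement above fixes.

```python
from collections import deque


def is_palindrome(text: str) -> bool:
    chars = deque(ch.lower() for ch in text if ch.isalnum())
    while len(chars) > 1:
        if chars.popleft() != chars.pop():   # compare both ends, O(1) each
            return False
    return True


print(is_palindrome("A man, a plan, a canal: Panama"))  # True
print(is_palindrome("abca"))                            # False
```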

Summary

A deque allows O(1) add and remove at both ends. In Python: collections.deque with append, appendleft, pop, popleft.
Use as queue: append (enqueue), popleft (dequeue). Use as stack: append (push), pop (pop). Never use a list with pop(0) for a queue.
Bounded deque: deque(maxlen=k) drops the opposite end when full. Monotonic deque: keep candidates in order; remove from front (out of window) and from rear (dominated)—key for sliding window maximum.

9.8 Priority Queue

Introduction

A priority queue is a collection where each element has a priority (or key), and we always remove the element with the highest (or lowest) priority, rather than the oldest element as in a FIFO queue. It supports insert (add an element with a priority) and extract-max (or extract-min) in O(log n) time when implemented with a binary heap. Peek (see the max or min without removing) is O(1). Priority queues are used in Dijkstra’s algorithm (always expand the closest vertex), merge k sorted lists (always take the smallest of the k heads), “top k” and “kth largest” problems, and task scheduling. In Python, the heapq module provides a min-heap (smallest element at the top); heappush and heappop take no key or comparator argument, so for a max-heap we negate keys or wrap items in a class whose __lt__ reverses the comparison. This section covers the ADT, heap-based implementation, and the Python API.

Real-World Analogy

Think of an ER triage: patients are not served in arrival order but by urgency. The one with the highest priority (e.g. critical) is seen first. When a new patient arrives, they are inserted into the queue according to priority; when a doctor is free, the highest-priority patient is removed. Similarly, a task scheduler might always run the task with the earliest deadline or the highest priority. The priority queue is the data structure that supports “insert with priority” and “remove the best” efficiently.

Example

Insert (task1, 3), (task2, 1), (task3, 2) with higher number = higher priority. Extract-max returns task1 (3), then task3 (2), then task2 (1). Order of removal is by priority, not by insertion time.

Formal Definition

Concept Note

Priority queue (ADT): A collection of (element, priority) pairs that supports: (1) insert(x, p) or push(x)—add element x with priority p (or with x comparable); (2) extract-max() or extract-min()—remove and return the element with maximum or minimum priority; (3) peek—return the max or min without removing. Optionally: increase-priority, decrease-priority, size, isEmpty. Implementations: binary heap (O(log n) insert and extract, O(1) peek), or balanced BST (same bounds, supports more operations). We focus on the heap-based implementation.

Why This Topic Matters

Dijkstra and A*: Always extract the vertex with smallest tentative distance. The priority queue is the core; without it we’d scan all vertices on every step, which is O(V²) overall. With a binary heap we get O((V + E) log V).
Merge k sorted lists (LeetCode 23): Keep the smallest element among the k current heads; extract it and push the next from that list. Min-heap of size k gives O(N log k) for N total elements.
Top K / Kth largest (LeetCode 215, 347): Min-heap of size k: keep the k largest; the heap top is the kth largest. Or use quickselect. Priority queue is the standard “maintain top k” tool.
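
The “min-heap of size k” idea for kth largest can be sketched as below. The function name kth_largest is illustrative, and the code assumes 1 ≤ k ≤ len(nums).

```python
import heapq


def kth_largest(nums, k):
    heap = []                       # min-heap holding the k largest seen so far
    for x in nums:
        if len(heap) < k:
            heapq.heappush(heap, x)
        elif x > heap[0]:
            heapq.heapreplace(heap, x)   # pop the smallest, push x, in one step
    return heap[0]                  # smallest of the k largest = kth largest


print(kth_largest([3, 2, 1, 5, 6, 4], 2))  # 5
```

Each element costs at most O(log k), so the total is O(n log k) time and O(k) extra space.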

Mental Model

A binary min-heap is a complete binary tree where each node is ≤ its children. So the smallest element is at the root. We store the tree in an array: index 0 = root; for node at index i, left child at 2i+1, right at 2i+2, parent at (i-1)//2. Insert: add at the end and “bubble up” (swap with parent while smaller). Extract-min: save root, move last element to root, then “bubble down” (swap with the smaller child while larger than a child). Both are O(log n). For a max-heap, reverse the comparisons (each node ≥ children).
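
To make the bubble-up and bubble-down steps concrete, here is a from-scratch sketch of an array-backed min-heap. It is for illustration only (the class name MinHeap is an assumption); in practice heapq does the same work.

```python
class MinHeap:
    def __init__(self):
        self.a = []

    def push(self, x):
        a = self.a
        a.append(x)                       # add at the end
        i = len(a) - 1
        while i > 0:                      # bubble up
            parent = (i - 1) // 2
            if a[i] >= a[parent]:
                break
            a[i], a[parent] = a[parent], a[i]
            i = parent

    def pop(self):
        a = self.a
        if not a:
            raise IndexError("pop from empty heap")
        root = a[0]
        last = a.pop()
        if a:
            a[0] = last                   # move last element to the root
            i, n = 0, len(a)
            while True:                   # bubble down
                left, right, smallest = 2 * i + 1, 2 * i + 2, i
                if left < n and a[left] < a[smallest]:
                    smallest = left
                if right < n and a[right] < a[smallest]:
                    smallest = right
                if smallest == i:
                    break
                a[i], a[smallest] = a[smallest], a[i]
                i = smallest
        return root

    def peek(self):
        return self.a[0]


h = MinHeap()
for x in (5, 2, 8, 1):
    h.push(x)
print(h.pop())   # 1
print(h.peek())  # 2
```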

Python: heapq Module

heapq provides a min-heap on a list. The list is modified in place to satisfy the heap property (smallest at index 0).

heappush(heap, item): Push item onto heap; heap is a list. O(log n).
heappop(heap): Pop and return the smallest item. O(log n).
heap[0]: Peek the smallest without removing. O(1).
heapify(x): Transform list x into a heap in place. O(n). Use when you already have all elements; faster than pushing one by one.
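nlargest(k, iterable), nsmallest(k, iterable): Return the k largest or k smallest items as a list in sorted order (largest first for nlargest, smallest first for nsmallest). They work best when k is small relative to the input; for k close to the input size, sorted() is usually more efficient, and for k = 1, use max() or min().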

Items must be comparable. For (priority, value) pairs, the first element is used for comparison. So push (priority, value); the smallest priority is at the top. For a max-heap, push (-priority, value) and negate when you pop.
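
A short sketch of heapify, nsmallest/nlargest, and the negated-priority trick for (priority, value) pairs. The task names are sample data chosen for this example.

```python
import heapq

data = [5, 2, 8, 1, 9]
heapq.heapify(data)               # in place, O(n); data[0] is now the minimum
print(data[0])                    # 1
print(heapq.nsmallest(2, data))   # [1, 2]
print(heapq.nlargest(2, data))    # [9, 8]

# Max-heap of (priority, value) pairs via negated priorities
pq = []
for priority, value in [(3, "task1"), (1, "task2"), (2, "task3")]:
    heapq.heappush(pq, (-priority, value))
neg_priority, value = heapq.heappop(pq)
print(-neg_priority, value)       # 3 task1
```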

ASCII Diagram: Min-Heap

  Min-heap (smallest at root):

        1
       / \
      3   2
     / \ / \
    7  4 5  6

  Array: [1, 3, 2, 7, 4, 5, 6]
  Index:  0  1  2  3  4  5  6
  Parent of i: (i-1)//2. Children of i: 2i+1, 2i+2.

  heappop() → 1; then last (6) moves to root and bubbles down.
  heappush(0) → 0 goes at end, bubbles up to root.

Python Examples

Min-heap (default)

import heapq

h = []
heapq.heappush(h, 5)
heapq.heappush(h, 2)
heapq.heappush(h, 8)
heapq.heappush(h, 1)
x = heapq.heappop(h)   # 1 (smallest)
y = h[0]               # 2 (peek, don't remove)

Max-heap (negate keys)

import heapq

# Max-heap: store (-value, value) or just -value
h = []
heapq.heappush(h, -5)
heapq.heappush(h, -2)
heapq.heappush(h, -8)
max_val = -heapq.heappop(h)   # 8

Priority queue with payload (e.g. for Dijkstra or merge k lists)

import heapq

# (priority, payload). Smallest priority is popped first.
pq = []
heapq.heappush(pq, (0, "start"))
heapq.heappush(pq, (2, "node2"))
heapq.heappush(pq, (1, "node1"))
priority, node = heapq.heappop(pq)   # (0, "start")

When priorities tie, the second element is used to break ties (e.g. (0, "a") vs (0, "b")). If the payload is not comparable, use (priority, counter, payload) so that (priority, counter, x) is always comparable (counter = insertion order).
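
The counter trick can be sketched as follows. The Task class and the push/pop helpers are hypothetical names for this example; itertools.count supplies the insertion order.

```python
import heapq
from itertools import count


class Task:
    def __init__(self, name):
        self.name = name   # no __lt__ defined, so Task objects are not orderable


pq = []
tie = count()              # 0, 1, 2, ... in insertion order


def push(priority, task):
    heapq.heappush(pq, (priority, next(tie), task))


def pop():
    priority, _, task = heapq.heappop(pq)
    return priority, task


push(1, Task("write"))
push(0, Task("plan"))
push(1, Task("review"))    # same priority as "write", inserted later

order = [pop()[1].name for _ in range(3)]
print(order)  # ['plan', 'write', 'review']
```

Because every counter value is unique, tuple comparison never reaches the Task objects, and equal priorities come out in insertion order.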

Time and Space Complexity

heappush: O(log n). heappop: O(log n). heap[0] (peek): O(1). heapify: O(n). Space O(n) for n elements. For “merge k lists” with total N nodes: we do N pushes and N pops, so O(N log k) time when the heap has at most k elements.

Edge Cases

heappop on empty heap: Raises IndexError. Check if heap: before popping.
Duplicate priorities: heapq is not stable; it makes no promise about the order in which equal items come out. With (priority, value) tuples, comparison is lexicographic, so a tie on priority is broken by value: the smaller value comes out first. To get first-in, first-out order among equal priorities, add an insertion counter: (priority, counter, value). Don’t rely on order among equals unless you define it.
Non-comparable payloads: Use (priority, index, payload) so that (p, i, x) is comparable even when x is not. The index breaks ties.

Common Mistakes

Common Mistake

Assuming heapq is a max-heap. heapq is a min-heap: heappop returns the smallest element. For “kth largest” or “extract max,” negate the keys when pushing and negate again when popping, or use (negative priority, value).

Pushing (value, priority) instead of (priority, value): heapq compares the first element. So push (priority, value) so that the smallest priority is on top. If you push (value, priority), the smallest value will be on top, which is wrong for “process by priority.”
Using heap as a sorted list: A heap does not store elements in sorted order; only the root is guaranteed min (or max). To get sorted order, repeatedly heappop—that’s O(n log n). Don’t iterate over the list expecting sorted order.

Merge K Sorted Lists (Pattern)

Put the head of each of the k lists into a min-heap as (value, list_id, node_or_index). Pop the smallest; append to result; if that list has a next element, push (next.val, list_id, next). Repeat until the heap is empty. Each of the N total elements is pushed and popped once; heap size ≤ k. Time O(N log k), space O(k). LeetCode 23.
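
The pattern can be sketched with plain Python lists standing in for linked lists, so the heap entries are (value, list_id, index). The function name merge_k_sorted is an assumption for this example; for real linked lists you would store the node instead of the index.

```python
import heapq


def merge_k_sorted(lists):
    heap = []
    for list_id, lst in enumerate(lists):
        if lst:
            heap.append((lst[0], list_id, 0))   # (value, list_id, index in list)
    heapq.heapify(heap)

    merged = []
    while heap:
        value, list_id, idx = heapq.heappop(heap)
        merged.append(value)
        nxt = idx + 1
        if nxt < len(lists[list_id]):           # push the successor from the same list
            heapq.heappush(heap, (lists[list_id][nxt], list_id, nxt))
    return merged


print(merge_k_sorted([[1, 4, 5], [1, 3, 4], [2, 6]]))  # [1, 1, 2, 3, 4, 4, 5, 6]
```

The standard library also offers heapq.merge(*iterables), which lazily merges already-sorted inputs.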

Expert Tip

When using heapq with custom objects, make the first component of the tuple the key for ordering. For merge k lists, push (node.val, i, node) so that the smallest value is on top; when you pop, get the node and push (node.next.val, i, node.next) if node.next exists. Use a counter as the second component if you need to avoid comparing nodes: (val, counter, node) with counter incremented each push.

Interview Insight

State: “A priority queue lets me always take the element with highest or lowest priority. In Python I use heapq: it’s a min-heap, so heappop gives the smallest. For max-heap I negate the key. Push and pop are O(log n), peek is O(1). I use it for Dijkstra (smallest distance), merge k lists (smallest of k heads), and top k (min-heap of size k).” Mention that heapify(list) is O(n) when you have all elements upfront.

Practice Problems

LeetCode 23: Merge k Sorted Lists (min-heap of k heads).
LeetCode 215: Kth Largest Element in an Array (min-heap of size k or quickselect).
LeetCode 347: Top K Frequent Elements (heap of (count, value) or bucket sort).
LeetCode 373: Find K Pairs with Smallest Sums (heap of (sum, i, j)).
Dijkstra’s shortest path (priority queue of (distance, node)).
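
For the Dijkstra item, a minimal sketch with a (distance, node) heap is shown below. It assumes non-negative edge weights, a graph given as a dict of adjacency lists, and node labels that can be compared with each other when distances tie (strings here). Outdated heap entries are skipped instead of being updated in place.

```python
import heapq


def dijkstra(graph, source):
    """graph: dict mapping node -> list of (neighbor, non-negative weight)."""
    dist = {source: 0}
    pq = [(0, source)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue   # stale entry: a shorter path to u was already found
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist


g = {"A": [("B", 1), ("C", 4)], "B": [("C", 2), ("D", 5)], "C": [("D", 1)], "D": []}
print(dijkstra(g, "A"))  # {'A': 0, 'B': 1, 'C': 3, 'D': 4}
```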

Summary

A priority queue supports insert and extract-min (or extract-max) in O(log n) via a binary heap. Peek is O(1).
In Python, heapq is a min-heap: heappush(h, x), heappop(h), h[0] for peek. For max-heap, push -x and negate on pop. For (priority, payload), push (priority, payload) so smallest priority is on top.
Use for: Dijkstra, merge k sorted lists (O(N log k)), top k / kth largest (min-heap of size k), and any “always process the best” algorithm. heapify(list) is O(n) when building from a full list.

9.9 Monotonic Queue

Introduction

A monotonic queue is a deque used to maintain a sequence of indices (or values) in monotonic order while supporting removal from both ends: from the front when an element “leaves” (e.g. goes out of a sliding window) and from the rear when a new element “dominates” older ones (e.g. a larger value makes smaller values useless as future maxima). The classic application is sliding window maximum (LeetCode 239): for each window of size k, report the maximum value in O(1) amortized time, so that the total over n windows is O(n). We keep a deque of indices such that their corresponding values are in decreasing order; the front is always the index of the current window’s maximum. This section gives the full algorithm, intuition, and code.

Real-World Analogy

Imagine a line of people by height in a room with a sliding door. The door shows only k consecutive people (the “window”). You want to know the tallest person in the current view. When the door slides right, the leftmost person may leave (remove from front if their index is out of the window) and a new person enters on the right. Anyone in the line who is shorter than the new person can never be the “tallest in view” again, so we remove them from the back. The line always has people in decreasing height (front = tallest in the window). That’s the monotonic queue: we remove from the front (out of window) and from the back (dominated by the new element).

Example

Array [1, 3, -1, -3, 5, 3, 6, 7], k = 3. Window [1,3,-1] → max 3; [3,-1,-3] → max 3; [-1,-3,5] → max 5; [-3,5,3] → max 5; [5,3,6] → max 6; [3,6,7] → max 7. Result [3, 3, 5, 5, 6, 7]. The deque holds indices; we popleft when index < current window start, and pop from the back while arr[back] ≤ arr[i], then append i.

Formal Definition

Concept Note

Monotonic queue (for sliding window max): A deque that stores indices of the array such that the corresponding values are in non-increasing order (front = index of current max). Invariant: for the current window [left, right], the front of the deque is the index of the maximum in that window. When we advance the window: (1) if the front index is less than left (out of window), remove it from the front (popleft); (2) from the rear, remove all indices whose values are ≤ the new element at the right end (they are dominated); (3) append the new index at the rear. Then the front is the index of the max for the current window. Each index is pushed once and popped at most once, so total time O(n).

Why This Topic Matters

LeetCode 239: Sliding Window Maximum: The standard O(n) solution uses a monotonic deque. Naive “max of each window” is O(n k); with the deque it’s O(n).
Pattern for “range max/min” in a sliding window: Same idea applies to “minimum in each window” (use increasing order: pop from rear when new element is smaller). Also appears in problems like “longest subarray with max - min ≤ limit” (two deques for max and min).
Difference from monotonic stack: A stack only removes from one end. For a sliding window we must remove from the front (elements that left the window) and from the rear (dominated elements)—so we need a deque.

Mental Model

The deque holds “candidates” for being the maximum of the current window. A candidate is useful only if (1) it’s still inside the window (index ≥ left), and (2) no larger element has appeared after it (so we keep indices in decreasing value order). When we move the window right: first drop the front if it’s outside the window; then from the back, drop any index whose value is ≤ the new value (the new value is to the right of them and is at least as large, so they’ll never be the max again); then add the new index. The front is always the index of the maximum in [left, right].

Step-by-Step: Sliding Window Maximum

Let arr be the array, k the window size. Initialize dq = deque() (of indices) and result = [].
For each right index i from 0 to n-1:
- Remove from front: While dq is not empty and dq[0] < i - k + 1 (index is to the left of the window), dq.popleft().
- Remove from rear: While dq is not empty and arr[dq[-1]] <= arr[i], dq.pop(). (Popping on <= keeps only the newest of any equal values; popping on < keeps every equal value in the deque. Both give the correct maximum; <= is the usual choice because it keeps the deque shorter.)
- Append: dq.append(i).
- Record result: Once we have at least k elements (i ≥ k - 1), the max for the current window is arr[dq[0]]. Append it to result.
Return result. Length is n - k + 1.

ASCII Diagram

  arr = [1, 3, -1, -3, 5, 3, 6, 7], k = 3
  Window [0..2]: indices 0,1,2. dq after processing 0,1,2: [1,2] (3 evicts index 0; -1 < 3, so index 2 is appended)
  max = arr[1] = 3.

  Window [1..3]: dq[0]=1 is still in range. Process 3: arr[3]=-3, dq=[1,2,3]. max=3.
  Window [2..4]: dq[0]=1 < 2, so popleft. Process 4: arr[4]=5. Pop 3 (-3≤5), pop 2 (-1≤5). dq=[4]. max=5.
  Window [3..5]: Process 5: arr[5]=3. dq=[4,5]. max=5.
  Window [4..6]: Process 6: arr[6]=6. Pop 5, pop 4. dq=[6]. max=6.
  Window [5..7]: Process 7: arr[7]=7. Pop 6. dq=[7]. max=7.
  Result: [3, 3, 5, 5, 6, 7]

Python Implementation (LeetCode 239)

from collections import deque

def max_sliding_window(nums, k):
    dq = deque()
    result = []
    for i in range(len(nums)):
        # Remove indices that are out of the current window
        while dq and dq[0] < i - k + 1:
            dq.popleft()
        # Remove from rear: elements that are <= current (they're dominated)
        while dq and nums[dq[-1]] <= nums[i]:
            dq.pop()
        dq.append(i)
        # First window is complete when i >= k - 1
        if i >= k - 1:
            result.append(nums[dq[0]])
    return result

We use <= when comparing values so that when two values are equal, only the newer index (the one that stays in the window longer) is kept. Using < instead keeps every equal value, with the leftmost at the front; both give the correct max. For “min in each window,” use a deque with increasing order: pop from rear when nums[dq[-1]] >= nums[i].

Line-by-Line Explanation

dq[0] < i - k + 1: The current window is [i - k + 1, i]. So any index < i - k + 1 is to the left of the window and must be removed from the front.
nums[dq[-1]] <= nums[i]: The element at dq[-1] is ≤ the new element at i. Since i is to the right, the element at dq[-1] can never be the max of any future window that includes i. So we pop it from the rear.
dq.append(i): Add the current index. After the two while loops, the deque is still in decreasing order by value (front = max in window).
if i >= k - 1: The first window that is complete has right index k - 1 (indices 0..k-1). From then on, every step produces one window max.
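
To watch the invariant at work, the following sketch replays the same loop and records the deque after each index. The helper name trace_sliding_window is an assumption made for illustration; it is not part of the LeetCode solution.

```python
from collections import deque

def trace_sliding_window(nums, k):
    """Return (i, deque_indices, window_max_or_None) for every index i."""
    dq = deque()
    steps = []
    for i, value in enumerate(nums):
        while dq and dq[0] < i - k + 1:      # front index left the window
            dq.popleft()
        while dq and nums[dq[-1]] <= value:  # rear indices are dominated
            dq.pop()
        dq.append(i)
        window_max = nums[dq[0]] if i >= k - 1 else None
        steps.append((i, list(dq), window_max))
    return steps

steps = trace_sliding_window([1, 3, -1, -3, 5, 3, 6, 7], 3)
for step in steps:
    print(step)
```

Each printed tuple shows the index just processed, the indices currently in the deque, and the window maximum once the first window is complete.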

Time and Space Complexity

Each index is appended once and removed at most once (either from the front when it leaves the window or from the rear when it’s dominated). So the total number of deque operations is O(n), and the running time is O(n). The deque only ever holds indices from the current window, so it has at most k entries: O(k) auxiliary space. The result array adds n - k + 1 entries, which is why the total space is sometimes quoted as O(n).

Edge Cases

k = 1: Each window has one element; result is a copy of the array. The deque always has one element after each step. Correct.
k = n: One window; result has one element = max(arr). The deque may shrink to one index after processing all elements. Correct.
k > n: Problem usually assumes k ≤ n. If k > n, we might return [] or the max of the whole array depending on problem definition.
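Strictly increasing array: Each new element dominates all previous ones, so the deque holds a single index (the current one). Each window max is its rightmost element, so the result is arr[k-1:].
Strictly decreasing array: Nothing is ever popped from the rear, so the deque grows to k indices and shrinks only from the front. Each window max is its leftmost element, so the result is arr[:n-k+1].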
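
The edge cases above can be checked with a small sketch. The function name max_sliding_window_safe and the policy of returning [] when k is out of range are assumptions; a given problem statement may define k > n differently.

```python
from collections import deque

def max_sliding_window_safe(nums, k):
    """Sliding window maximum with an explicit guard for out-of-range k."""
    if k <= 0 or k > len(nums):
        return []                      # assumed policy: no full window, empty result
    dq, result = deque(), []
    for i, value in enumerate(nums):
        while dq and dq[0] < i - k + 1:
            dq.popleft()
        while dq and nums[dq[-1]] <= value:
            dq.pop()
        dq.append(i)
        if i >= k - 1:
            result.append(nums[dq[0]])
    return result

print(max_sliding_window_safe([4, 2, 7], 1))      # k = 1: copy of the array
print(max_sliding_window_safe([4, 2, 7], 3))      # k = n: single maximum
print(max_sliding_window_safe([4, 2, 7], 5))      # k > n: empty under this policy
print(max_sliding_window_safe([1, 2, 3, 4], 2))   # increasing: rightmost of each window
print(max_sliding_window_safe([4, 3, 2, 1], 2))   # decreasing: leftmost of each window
```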

Common Mistakes

Common Mistake

Removing from the wrong end. Elements that leave the window are at the front of the deque (they were added earliest among current candidates). So we popleft when dq[0] < left. Elements that are dominated by the new value are at the rear (values decrease from front to rear, so the smallest candidates sit at the back). So we pop from the rear when arr[dq[-1]] ≤ arr[i]. Swapping these (e.g. popping from rear for “out of window”) breaks the invariant.

Storing values instead of indices: We need indices to know when an element is out of the window (index < i - k + 1). If we stored only values, we couldn’t tell when to remove from the front. Always store indices in the deque.
Wrong comparison for “min in window”: For sliding window minimum, we want the deque in increasing order (front = min). Pop from rear when arr[dq[-1]] >= arr[i]. The new smaller value dominates.

Monotonic Queue vs Monotonic Stack

Structure	Removal	Typical use
Monotonic stack	Only from top (one end)	Next greater/smaller element (no window)
Monotonic queue	Front (out of window) and rear (dominated)	Sliding window max/min

Sliding Window Minimum

For the minimum in each window, keep the deque in increasing order (front = index of current min). Pop from front when index < i - k + 1. Pop from rear when arr[dq[-1]] >= arr[i] (the new value is smaller or equal, so the rear is dominated). Then result.append(nums[dq[0]]). Same O(n) time.
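
A minimal sketch of the minimum variant, mirroring the maximum code with the comparison flipped. The function name min_sliding_window is chosen here for illustration.

```python
from collections import deque

def min_sliding_window(nums, k):
    dq = deque()   # indices; values increase from front to rear
    result = []
    for i, value in enumerate(nums):
        while dq and dq[0] < i - k + 1:      # front index left the window
            dq.popleft()
        while dq and nums[dq[-1]] >= value:  # rear is dominated by a smaller-or-equal value
            dq.pop()
        dq.append(i)
        if i >= k - 1:
            result.append(nums[dq[0]])       # front is the window minimum
    return result

print(min_sliding_window([1, 3, -1, -3, 5, 3, 6, 7], 3))  # [-1, -3, -3, -3, 3, 3]
```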

Expert Tip

For “subarray range” problems (e.g. “number of subarrays where max - min ≤ limit”), maintain two monotonic deques: one for max (decreasing) and one for min (increasing). For each right, advance left as needed so that max - min ≤ limit (using the two fronts), then count subarrays. Same O(n) idea.
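
As one possible sketch of the two-deque idea, here is a solution shaped for LeetCode 1438 (longest subarray with max - min ≤ limit). It assumes limit ≥ 0, as in that problem's constraints; the function name is illustrative.

```python
from collections import deque

def longest_subarray_within_limit(nums, limit):
    max_dq = deque()   # indices, values decreasing: front is the window max
    min_dq = deque()   # indices, values increasing: front is the window min
    left = 0
    best = 0
    for right, value in enumerate(nums):
        while max_dq and nums[max_dq[-1]] <= value:
            max_dq.pop()
        max_dq.append(right)
        while min_dq and nums[min_dq[-1]] >= value:
            min_dq.pop()
        min_dq.append(right)
        # Shrink from the left until max - min <= limit holds again
        while nums[max_dq[0]] - nums[min_dq[0]] > limit:
            left += 1
            if max_dq[0] < left:
                max_dq.popleft()
            if min_dq[0] < left:
                min_dq.popleft()
        best = max(best, right - left + 1)
    return best

print(longest_subarray_within_limit([8, 2, 4, 7], 4))         # 2
print(longest_subarray_within_limit([10, 1, 2, 4, 7, 2], 5))  # 4
```

Every index enters and leaves each deque at most once and left only moves forward, so the whole scan stays O(n).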

Interview Insight

State: “For sliding window max I use a deque of indices keeping decreasing order by value. When the window moves: remove from front if the index is out of the window, remove from rear all indices whose values are ≤ the new element, then append the new index. The front is always the max index. Each index is pushed and popped at most once, so O(n).” Mention that we need indices (not just values) to detect “out of window.” For window min, same idea with increasing order and pop rear when arr[rear] >= arr[i].

Practice Problems

LeetCode 239: Sliding Window Maximum (decreasing deque of indices).
Sliding window minimum (increasing deque).
LeetCode 1438: Longest Continuous Subarray With Absolute Diff Less Than or Equal to Limit (two deques for max and min).

Summary

A monotonic queue is a deque that maintains indices in monotonic order (decreasing for window max, increasing for window min). We remove from the front when the index leaves the window and from the rear when the new element dominates.
Sliding window maximum: Deque of indices, values in decreasing order. Popleft when dq[0] < i - k + 1; pop when arr[dq[-1]] <= arr[i]; append i; if i ≥ k - 1, result.append(arr[dq[0]]). Time O(n), space O(k).
Store indices (not values) so we can check “out of window.” For window minimum, use increasing order and pop rear when arr[dq[-1]] >= arr[i].

10.1 Hash Tables

Introduction

A hash table (hash map, dictionary, or associative array) is a data structure that maps keys to values and supports insert, lookup, and delete by key in O(1) average time. It works by applying a hash function to the key to get an index into an array of “buckets”; each bucket can hold one or more (key, value) pairs. Collisions (two keys hashing to the same index) are handled by chaining (a list per bucket) or open addressing (probe to the next free slot). In Python, dict and set are hash-table based: dict stores key→value, set stores unique keys only. Hash tables are the go-to when you need fast “find by key” or “check membership” and don’t need order. This section covers the idea, hash functions, collision handling at a high level, and Python usage.

Real-World Analogy

Think of a library catalog: you look up a book by its call number (like a hash). The call number tells you which shelf (bucket) to go to. Several books might share the same shelf (collision); you then scan that shelf to find the exact book. The “hash function” (call number system) spreads books across many shelves so no single shelf has too many. If the hash is good, lookup is fast—you go to one shelf and do a short scan. Hash tables work the same: hash(key) → bucket index → search within that bucket (or probe) to find the key.

Example

Store ("apple", 1), ("banana", 2), ("cherry", 3). Hash of "apple" might be 3, "banana" 7, "cherry" 3 (collision with apple). Bucket 3 holds [("apple", 1), ("cherry", 3)]; bucket 7 holds [("banana", 2)]. Lookup "cherry": hash→3, scan bucket 3, find ("cherry", 3). O(1) average if buckets are small.

Formal Definition

Concept Note

Hash table: A data structure that implements a map from keys to values (or a set of keys). It uses a hash function h(key) that maps each key to an integer in [0, m−1] (bucket index), where m is the number of buckets. Operations: insert(key, value) (or set.add(key)), get(key) / lookup, delete(key). With a good hash function and load factor (n/m) kept bounded, these operations are O(1) average. Worst case (all keys in one bucket) is O(n). Keys must be hashable (immutable and with a consistent __hash__) and equality-comparable. Mutable types (list, dict) are not hashable in Python.

Why This Topic Matters

Most used structure in interviews: “Two sum,” “group anagrams,” “first non-repeating character,” “subarray with sum k”—all rely on fast lookup. Reaching for a dict or set often turns O(n²) into O(n).
Python dict and set: Both are hash tables. dict for key→value; set for unique keys and O(1) membership. Knowing when to use which (and that keys must be hashable) is essential.
Collision handling and load factor: Understanding chaining vs open addressing and why we resize (to keep load factor low) helps you reason about average vs worst case and about custom hash tables (e.g. in systems design).

Mental Model

An array of buckets. Each key is sent to a bucket by hash(key) % m. If we use chaining, each bucket is a list (or another structure) of (key, value) pairs; lookup: go to bucket, scan the list for the key. If we use open addressing, each bucket holds at most one pair; on collision we “probe” (e.g. linear probing: try next index) until we find an empty slot or the key. The load factor α = n/m (number of elements / number of buckets) should stay below a threshold (e.g. 0.7) so that average chain length or probe length is small. When α gets too high, we resize (double m, rehash all keys).

Hash Function

A good hash function: (1) is deterministic (same key → same hash), (2) spreads keys uniformly over buckets (reduces collisions), (3) is fast to compute. For integers, often h(x) = x % m (or a more sophisticated mix). For strings, we might combine character codes: e.g. h = 0; for c in s: h = (h * 31 + ord(c)) % m. Python’s built-in hash() is used by dict and set; for str and bytes it is salted per process (hash randomization, for security), so it can vary between runs, but within one run it is consistent. For user types, define __hash__ and __eq__; if two objects are equal they must have the same hash.
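
The string formula above can be written out as a small function. This is a sketch for illustration only (the name poly_hash and the default base 31 follow the formula in the text); it is not how Python's built-in hash() works.

```python
def poly_hash(s, m, base=31):
    """Polynomial hash of string s, reduced to a bucket index in [0, m)."""
    h = 0
    for c in s:
        h = (h * base + ord(c)) % m
    return h

m = 8
for word in ["apple", "banana", "cherry"]:
    print(word, "-> bucket", poly_hash(word, m))
```

Unlike hash(), this function gives the same bucket on every run, which makes it convenient for hand-traced examples.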

Collision Handling (Overview)

Chaining: Each bucket is a list (or linked list). Insert: append to the list at hash(key). Lookup/delete: scan that list. Average list length = n/m = α. So average O(1 + α) = O(1) if α = O(1).
Open addressing: One item per bucket. On collision, probe (e.g. linear: (h + i) % m, or quadratic, or double hashing). Lookup: probe until we find the key or an empty slot. Delete: can use a “tombstone” marker. Average probes also O(1) for low load factor. See Section 10.2 for detail.

Python: dict and set

Both are implemented with hash tables. dict: key → value; keys must be hashable and unique. set: unordered collection of unique hashable elements. Operations:

dict: d[key] = value (insert/update), d[key] or d.get(key) (lookup), key in d (membership), del d[key] or d.pop(key) (delete). Average O(1).
set: s.add(x), x in s, s.remove(x) or s.discard(x). Average O(1).

Keys must be immutable (or at least their hash must not change). So int, str, tuple (of hashables) are fine; list and dict are not. Use tuple(list) or frozenset if you need to hash a sequence or set.
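
A short sketch of these operations in use. The variable names and sample data are made up for illustration.

```python
inventory = {}
inventory["apple"] = 3              # insert
inventory["apple"] = 5              # update: same key, new value
missing = inventory.get("pear", 0)  # lookup with a default instead of KeyError
has_apple = "apple" in inventory    # membership test
removed = inventory.pop("apple")    # delete and return the value

seen = set()
seen.add(7)
seen.add(7)                         # adding a duplicate has no effect
seen.discard(99)                    # discard: no error if the element is absent

# Tuples and frozensets are hashable, so they can serve as keys
visits = {(0, 1): "start"}
teams = {frozenset({"ann", "bob"}): 1}

print(missing, has_apple, removed, seen)
print(visits[(0, 1)], teams[frozenset({"bob", "ann"})])
```

The frozenset key matches regardless of element order, which is useful when the key is conceptually an unordered group.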

ASCII Diagram

  Hash table (chaining), m = 4, keys "a","b","c","d"; h("a")=0, h("b")=1, h("c")=0, h("d")=2.

  Buckets:  0: [("a", val_a), ("c", val_c)]
            1: [("b", val_b)]
            2: [("d", val_d)]
            3: []

  Lookup "c": h("c")=0 → bucket 0 → scan → find ("c", val_c).

Time and Space Complexity

Average case (uniform hashing, load factor α = O(1)): insert, lookup, delete O(1). Worst case (all keys collide): O(n) per operation. Space: O(n) for n key-value pairs plus the bucket array O(m); total O(n + m). With resizing to keep α bounded, m = Θ(n), so space O(n).

Edge Cases

Mutable keys: Don’t use list or dict as a key—they’re unhashable. TypeError. Use tuple or frozenset if you need to key by a sequence or set.
Missing key: d[key] raises KeyError if key not in d; d.get(key) returns None (or a default). Use get when you’re not sure the key exists.
Empty dict/set: {} creates an empty dict and set() an empty set ({} is never a set). Check for emptiness with if not d: or len(d) == 0.

Common Mistakes

Common Mistake

Using a list as a dict key. Lists are mutable and unhashable. d[[1,2]] = 3 raises TypeError. Use tuple([1,2]) as the key if you need to store by a sequence. Same for set: s.add([1,2]) fails; use frozenset or tuple.

Assuming order in Python 3.6 and earlier: In Python 3.7+, dict preserves insertion order as a language guarantee. CPython 3.6 already did so, but only as an implementation detail; in 3.5 and earlier, dict order was arbitrary. Don’t rely on order unless you know the version or use collections.OrderedDict.
Modifying a dict while iterating: Adding or deleting keys during iteration can raise RuntimeError or give unpredictable results. Iterate over a copy of the keys (list(d)) or collect changes and apply after the loop.
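
The iteration pitfall and the usual fix can be sketched as follows. The sample data is invented for illustration.

```python
scores = {"ann": 52, "bob": 17, "cy": 88}

# Unsafe: deleting while iterating over the dict itself
raised = False
try:
    for name in scores:
        if scores[name] < 50:
            del scores[name]
except RuntimeError:
    raised = True      # the dict changed size during iteration

# Safe: iterate over a snapshot of the keys and mutate the real dict
scores = {"ann": 52, "bob": 17, "cy": 88}
for name in list(scores):
    if scores[name] < 50:
        del scores[name]

print(raised, scores)
```

Building a new dict with a comprehension is an equally common alternative when you do not need to mutate in place.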

When to Use Hash Table

Use a dict when you need: fast lookup by key, fast insert/update by key, or to count/frequency (key → count). Use a set when you need: unique elements, O(1) membership (x in s), or to remove duplicates. If you need order (sorted keys or insertion order), consider OrderedDict or sorted(d.keys()). If you need range queries (“all keys between a and b”), a hash table is not the right tool—use a balanced tree or sorted structure.

Expert Tip

For “count frequency” or “group by key,” the pattern is: one pass, for each item do d[key] = d.get(key, 0) + 1 (counting) or d.setdefault(key, []).append(item) (grouping; this appends in place, whereas d.get(key, []) + [item] copies the list on every step). For “check if seen,” use a set: test if x in seen, then seen.add(x). For “two sum” style (find a pair with sum k), store seen values in a set or store value→index in a dict and check for complement.
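
Two of these patterns, sketched under the usual LeetCode-style assumptions (two_sum returns indices of one valid pair, or [] if none exists; group_anagrams keys each word by its sorted letters). The function names are illustrative.

```python
def two_sum(nums, target):
    index_of = {}                        # value -> index of an earlier element
    for i, x in enumerate(nums):
        j = index_of.get(target - x)     # has the complement been seen?
        if j is not None:
            return [j, i]
        index_of[x] = i
    return []

def group_anagrams(words):
    groups = {}
    for w in words:
        key = "".join(sorted(w))         # anagrams share the same sorted letters
        groups.setdefault(key, []).append(w)
    return list(groups.values())

print(two_sum([2, 7, 11, 15], 9))                               # [0, 1]
print(group_anagrams(["eat", "tea", "tan", "ate", "nat", "bat"]))
```

Both run in a single pass over the input, with one dict lookup per element.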

Interview Insight

State: “I’ll use a hash map (dict) for O(1) lookup by key. I’ll store … (e.g. value → index for two sum, or character → frequency). One pass, for each element I check/update the map. Total time O(n), space O(n).” Mention that keys must be hashable. For “unique elements” or “membership,” say “I’ll use a set for O(1) add and O(1) membership check.”

Practice Problems

LeetCode 1: Two Sum (dict: value → index; check for complement).
LeetCode 49: Group Anagrams (dict: sorted string or tuple of counts → list of strings).
LeetCode 387: First Unique Character in a String (dict: char → count or first index).
LeetCode 217: Contains Duplicate (set to check seen).

Summary

A hash table maps keys to values (or stores a set of keys) with O(1) average insert, lookup, and delete. It uses a hash function to choose a bucket and handles collisions by chaining or open addressing.
In Python, dict (key→value) and set (unique keys) are hash tables. Keys must be hashable (immutable; no list/dict as key). Use dict.get(key, default) to avoid KeyError.
Keep load factor low (resize when needed) for O(1) average. Use hash tables for fast lookup, membership, counting, and grouping—the default for “find by key” and “have I seen this?” in interviews.

10.2 Collision Handling

Introduction

A collision occurs when two different keys hash to the same bucket index. Because the number of possible keys is usually much larger than the number of buckets, collisions are inevitable. The way we handle them determines the performance and implementation of the hash table. The two main strategies are chaining (each bucket holds a list of (key, value) pairs) and open addressing (each bucket holds at most one pair; on collision we “probe” for another slot). Each has tradeoffs: chaining is simple and handles high load factors well; open addressing avoids pointers and can have better cache behavior but requires careful handling of deletions and load factor. This section details both methods, probe sequences, load factor, and deletion.

Real-World Analogy

With chaining, each shelf (bucket) can hold multiple books—like a stack or a list on that shelf. When two books have the same call number (collision), both go on the same shelf; you scan that shelf to find the one you need. With open addressing, each shelf holds exactly one book. If the shelf is taken, you look at the “next” shelf (linear probe), or skip by a rule (quadratic or double hashing) until you find an empty one. The table never has more than one item per slot, but finding the right slot can take several steps when the table is crowded.

Example

Suppose keys 5 and 13 both hash to index 2 under some hash function h. (The exact function does not matter; with h(x) = x % 8 they would also collide, at index 5, since 5 % 8 = 13 % 8 = 5.) Chaining: bucket 2 = [(5, v1), (13, v2)]. Lookup 13: go to bucket 2, scan, find (13, v2). Open addressing: bucket 2 has (5, v1); we try 3, 4, … until an empty slot; we store (13, v2) there. Lookup 13: start at 2, see 5; probe until we find 13 or an empty slot.

Formal Definition

Concept Note

Chaining (separate chaining): Each bucket is a container (list, linked list) that can hold multiple (key, value) pairs. Insert: compute h(key), append to the container at that index. Lookup: go to bucket h(key), search the container for the key. Delete: go to bucket, remove the pair from the container. Open addressing: The table is a single array; each slot holds at most one (key, value) or is empty. Insert: if slot h(key) is occupied, use a probe sequence (e.g. (h(key) + i) % m for i = 0, 1, 2, …) until an empty slot is found. Lookup: follow the same probe sequence until the key is found or an empty slot is reached. Delete: requires special handling (tombstone or rehash) so that lookups for keys that probed past the deleted slot still work.

Why This Topic Matters

Implementation choice: When building a custom hash table (e.g. in systems or interviews), you must choose and implement a collision strategy. Chaining is easier to get right; open addressing is common in high-performance or embedded settings.
Worst case vs average: With a bad hash function or adversarial keys, chaining can degenerate to one long chain (O(n) per operation). Open addressing can suffer from clustering (long probe runs). Understanding both helps you reason about resizing and load factor.
Deletion in open addressing: You cannot simply clear the slot—a later key might have probed past it. Tombstones or “lazy delete” plus periodic rehash are the standard solutions.

Chaining (Separate Chaining)

Each bucket is a list (or linked list) of entries. Insert(k, v): Compute i = h(k) % m. If table[i] already contains an entry with key k, overwrite its value; otherwise append (k, v) to table[i]. Lookup(k): i = h(k) % m; scan table[i] for an entry with key k; return value or None. Delete(k): i = h(k) % m; remove the entry with key k from table[i].

Analysis: Let n = number of elements, m = number of buckets, α = n/m (load factor). Assuming uniform hashing, the expected length of a chain is α. So lookup and delete take O(1 + α) on average. If we keep α = O(1) (e.g. resize when α > 1 or 2), operations are O(1). Worst case: all keys in one bucket → O(n).

Python-style pseudocode (chaining)

def insert(table, key, value):
    i = hash(key) % len(table)
    for j, (k, v) in enumerate(table[i]):
        if k == key:
            table[i][j] = (key, value)
            return
    table[i].append((key, value))

def lookup(table, key):
    i = hash(key) % len(table)
    for k, v in table[i]:
        if k == key:
            return v
    return None
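
The pseudocode above covers insert and lookup. Delete could be sketched in the same style; the helper make_table and the boolean return value are assumptions added for illustration.

```python
def make_table(m=8):
    """Create a chained table: a list of m empty buckets."""
    return [[] for _ in range(m)]

def delete(table, key):
    """Remove key from a chained table; return True if it was present."""
    bucket = table[hash(key) % len(table)]
    for j, (k, _) in enumerate(bucket):
        if k == key:
            bucket.pop(j)      # drop the (key, value) pair from the chain
            return True
    return False

table = make_table()
table[hash("a") % len(table)].append(("a", 1))   # place one entry by hand
first = delete(table, "a")
second = delete(table, "a")
print(first, second)   # True False
```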

Open Addressing

One entry per slot. When slot i = h(key) % m is occupied by another key, we probe. The probe sequence is a sequence of indices i₀, i₁, i₂, … that must eventually cover all slots (so we can find an empty slot if the table is not full).

Linear probing

Probe sequence: (h(key) + i) % m for i = 0, 1, 2, …. Simple but causes primary clustering: consecutive occupied slots form runs, and new keys that hash into the run make it longer. Average probe length grows quickly as α approaches 1.

Quadratic probing

Probe sequence: (h(key) + c₁·i + c₂·i²) % m for i = 0, 1, 2, …. Reduces primary clustering but can fail to find an empty slot even when one exists (unless m and the constants are chosen carefully). Often used when α is kept below 0.5.

Double hashing

Probe sequence: (h₁(key) + i · h₂(key)) % m for i = 0, 1, 2, …, with a second hash function h₂. Good spread and fewer clusters. h₂(key) must be nonzero and relatively prime to m for the probe to cover all slots.
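
To compare the three probe sequences, this sketch lists which slots each one visits in its first m probes. The helper names and the sample constants (m = 8, c₁ = 0, c₂ = 1) are assumptions picked to make the coverage differences visible.

```python
from math import gcd

def linear_probe(h, m):
    return [(h + i) % m for i in range(m)]

def quadratic_probe(h, m, c1=0, c2=1):
    return [(h + c1 * i + c2 * i * i) % m for i in range(m)]

def double_hash_probe(h1, h2, m):
    return [(h1 + i * h2) % m for i in range(m)]

m = 8
print(sorted(set(linear_probe(3, m))))           # every slot
print(sorted(set(quadratic_probe(3, m))))        # only some slots for these constants
print(gcd(3, m), sorted(set(double_hash_probe(3, 3, m))))  # gcd 1: every slot
print(gcd(2, m), sorted(set(double_hash_probe(3, 2, m))))  # gcd 2: half the slots
```

With these constants the quadratic sequence reaches only three distinct slots, which illustrates why m and the constants must be chosen together.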

Lookup: Follow the probe sequence; if we find the key, return its value; if we find an empty slot, the key is not present. Insert: Follow the probe sequence; use the first empty slot (or first tombstone, depending on policy). Delete: If we simply clear the slot, a later lookup for a key that had probed past this slot might stop at the empty slot and incorrectly report “not found.” So we either (1) mark the slot as a tombstone (deleted but not empty—probe continues past it), or (2) rehash all remaining elements in the same cluster. Tombstones are common; they can be reused on insert. Too many tombstones can slow lookups; periodic rehash cleans them.

Load Factor and Resizing

The load factor is α = n / m (number of elements / number of buckets). For chaining, α can exceed 1 (average chain length = α). For open addressing, α must be < 1. To keep operations O(1) average:

Chaining: Resize (e.g. double m) when α exceeds a threshold (e.g. 1 or 2). Rehash all entries into the new table.
Open addressing: Resize when α exceeds 0.7–0.8 (or similar). Higher α causes long probe sequences. After resize, rehash all entries (tombstones are not copied).

Resizing is O(n) but amortized over many inserts, so average insert stays O(1).
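
For a chained table stored as a list of lists, resizing might look like the sketch below. The function names resize and load_factor are illustrative, and doubling is just one common growth policy.

```python
def load_factor(table):
    """alpha = n / m for a chained table (list of buckets)."""
    return sum(len(bucket) for bucket in table) / len(table)

def resize(table):
    """Return a new table with twice as many buckets, every entry rehashed."""
    new_table = [[] for _ in range(2 * len(table))]
    for bucket in table:
        for key, value in bucket:
            new_table[hash(key) % len(new_table)].append((key, value))
    return new_table

table = [[] for _ in range(4)]
for k in range(8):                       # small ints hash to themselves in CPython
    table[hash(k) % len(table)].append((k, k * k))

bigger = resize(table)
print(load_factor(table), load_factor(bigger))   # 2.0 1.0
```

Every entry is touched once during the rehash, which is the O(n) cost that amortizes over the inserts between resizes.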

Comparison: Chaining vs Open Addressing

Aspect	Chaining	Open Addressing
Slots	One list per bucket; multiple items per index	One item per slot; array only
Load factor	Can be > 1	Must be < 1
Delete	Remove from list	Tombstone or rehash cluster
Cache	Pointer chasing in lists	Sequential probe can be cache-friendly

ASCII Diagram: Linear Probing

  m = 8. Insert keys that hash to: 5→2, 13→2, 20→4, 7→7.
  After 5:   [_, _, (5,v), _, _, _, _, _]   index 2
  After 13:  [_, _, (5,v), (13,v), _, _, _, _]   collision at 2, probe to 3
  After 20:  [_, _, (5,v), (13,v), (20,v), _, _, _]
  After 7:   [_, _, (5,v), (13,v), (20,v), _, _, (7,v)]

  Lookup 13: start at 2, see 5; probe 3, see 13 → found.
  Delete 5: if we clear slot 2 → [_, _, _, (13,v), ...]
  Lookup 13: start at 2, see empty → not found (wrong!). So use tombstone at 2.
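
The diagram can be replayed with a small linear-probing table that uses tombstones. This is a sketch, not a production design: the class name, the EMPTY and TOMBSTONE sentinels, and the toy_hash lookup (which hard-codes the hash values from the diagram) are all assumptions for illustration.

```python
EMPTY = object()       # slot never used: a lookup may stop here
TOMBSTONE = object()   # slot deleted: probing must continue past it

class LinearProbingTable:
    def __init__(self, m=8, hash_fn=hash):
        self.slots = [EMPTY] * m
        self.hash_fn = hash_fn

    def _probe(self, key):
        m = len(self.slots)
        start = self.hash_fn(key) % m
        for i in range(m):
            yield (start + i) % m

    def insert(self, key, value):
        first_free = None
        for idx in self._probe(key):
            slot = self.slots[idx]
            if slot is EMPTY:
                if first_free is None:
                    first_free = idx
                break                          # key cannot be further along
            if slot is TOMBSTONE:
                if first_free is None:
                    first_free = idx           # remember it, but keep looking for key
            elif slot[0] == key:
                self.slots[idx] = (key, value) # update existing key
                return
        if first_free is None:
            raise RuntimeError("table is full")
        self.slots[first_free] = (key, value)

    def lookup(self, key):
        for idx in self._probe(key):
            slot = self.slots[idx]
            if slot is EMPTY:
                return None
            if slot is not TOMBSTONE and slot[0] == key:
                return slot[1]
        return None

    def delete(self, key):
        for idx in self._probe(key):
            slot = self.slots[idx]
            if slot is EMPTY:
                return False
            if slot is not TOMBSTONE and slot[0] == key:
                self.slots[idx] = TOMBSTONE    # do not clear the slot
                return True
        return False

toy_hash = {5: 2, 13: 2, 20: 4, 7: 7}.get      # hash values taken from the diagram
t = LinearProbingTable(8, toy_hash)
for key in (5, 13, 20, 7):
    t.insert(key, f"v{key}")
t.delete(5)
print(t.lookup(13))   # v13: the tombstone at slot 2 keeps the probe chain intact
```

A later insert of a key that hashes to 2 can reuse the tombstone slot, which is why insert remembers the first tombstone it passes.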

Common Mistakes

Common Mistake

In open addressing, clearing a slot on delete. Clearing the slot breaks the probe chain. A key that was inserted after the deleted key, and that probed past this slot, can only be found by continuing past it. Once the slot is empty, lookup stops there and incorrectly reports “not found.” Use a tombstone (a special “deleted” marker) so that lookup treats the slot as “keep probing” and insert may reuse it.

Assuming no collisions: In analysis, always account for collisions. Average case assumes uniform hashing and load factor O(1). Worst case is O(n) if all keys collide.
Forgetting to rehash after resize: When we double m, every key must be reinserted into the new table (new indices). Tombstones are not carried over.

Python's dict

CPython’s dict uses open addressing with a variant of random probing (perturbation) to reduce clustering. Deleted slots are marked (tombstone). Resizing happens when the table is about two-thirds full. You don’t implement this yourself in Python—but knowing that dict is open-addressing based explains why “one slot per entry” and “tombstones” matter in general.

Expert Tip

In interviews, you usually just say “hash table with chaining” or “open addressing” at a high level. If asked to implement, chaining is simpler: list of lists, insert/lookup/delete by scanning the list at hash(key) % m. Mention load factor and resizing (double m, rehash) to keep O(1) average.

Interview Insight

If asked “how do hash tables handle collisions?”, say: “Two main ways: chaining—each bucket is a list, we append and scan; and open addressing—one item per slot, we probe (e.g. linear or double hashing) until we find an empty slot or the key. For open addressing, delete uses a tombstone so we don’t break the probe chain. We keep load factor low and resize when needed for O(1) average.”

Summary

Chaining: Each bucket is a list of (key, value) pairs. Insert: append to list at h(key)%m. Lookup/delete: scan that list. Average O(1+α); α can be > 1.
Open addressing: One entry per slot. Collision: probe (linear, quadratic, or double hashing) until empty slot or key found. Delete: use tombstone so probe chain is not broken. α must be < 1; resize when α gets high.
Resize (double m, rehash all) to keep load factor bounded. Python’s dict uses open addressing with tombstones and resizing.

10.3 Dictionary Internals

Introduction

Understanding how Python’s dict is implemented helps you reason about performance, key requirements (hashable, immutable), and insertion order. CPython’s dict uses open addressing with a perturbed probe sequence (not plain linear probing) to reduce clustering. It stores entries in a compact table (indices + keys + values) and maintains insertion order (since Python 3.7). Resizing follows a growth policy (typically roughly 2× when about two-thirds full). This section explains the high-level layout, hash and probe, resizing, and gives concrete examples so you can predict behavior and avoid pitfalls.

Real-World Analogy

Think of a dict as a filing cabinet with numbered drawers. The hash of the key tells you which drawer to try first. If it’s full (collision), you use a fixed “perturbation” rule to try other drawers in a pseudo-random order so you don’t always pile up in one place. The cabinet can be resized (bigger cabinet, same files re-filed). There is also a separate list of keys in insertion order (like a log of who put what in when)—that’s why in Python 3.7+ the order is guaranteed. The “key must be immutable” rule is like: you can’t change the label on a file after it’s been filed, or the drawer number would no longer match.

Example

d = {"a": 1, "b": 2, "c": 3}. The keys "a", "b", "c" are hashed and stored; the order of iteration is "a", "b", "c" (insertion order). If you do d["b"] = 20, the key "b" stays in the same logical position; only the value changes. If you do d["d"] = 4, a new entry is added at the “end” of the order. So list(d.keys()) is ["a", "b", "c", "d"]. Deleting "b" and re-inserting it puts "b" at the end: the deletion removes it from the order, and the re-insertion counts as a new insertion.

Formal Definition

Concept Note

CPython dict layout (simplified): (1) A hash table (indices array): each slot holds an index into the “entries” array, or is empty/dummy. (2) An entries array: stores (hash, key, value) in a compact form, with entries in insertion order (or a variant that preserves iteration order). Lookup: compute hash(key), use perturbed probing on the indices table to find the slot that points to the entry with matching key; then return that entry’s value. Insert: probe for an empty slot or the same key; if new key, add to entries and store the index in the slot. Delete: mark slot as dummy (tombstone) so probe chains still work. Resize: When the table is about 2/3 full, allocate a larger indices table (e.g. 2× or 4×) and reinsert all entries. Insertion order: The entries array (or equivalent) is built in insertion order so that iterating the dict yields keys in the order they were first inserted.

Why This Topic Matters

Key requirements: Dict keys must be hashable (immutable and implementing __hash__ and __eq__). Understanding “why immutable?” comes from knowing that the hash is used to find the slot—if the key changed after insertion, the hash would change and we’d look in the wrong place.
Insertion order (3.7+): You can rely on dict preserving order in iteration and in **kwargs. This is part of the language spec and is used in JSON and serialization. Knowing it avoids confusion with older Python or other languages.
Performance and resizing: Insert can occasionally be O(n) when a resize happens, but it is amortized O(1). Understanding resizing also explains why you rarely need to think about a dict’s capacity in Python: there is no public way to pre-size a dict, and the implementation grows the table as needed.

Hash and Probe (High Level)

Python calls hash(key) to get an integer. For user-defined classes, the default hash is derived from the object’s identity (id()) unless you define __hash__; defining __eq__ without __hash__ makes instances unhashable. The first slot tried comes from the low bits of the hash (hash & (size − 1), with size a power of two). On a collision, the probe recurrence mixes in the higher bits of the hash (the “perturb” value), so keys that share the same low bits quickly diverge. The probe sequence is therefore not linear; it jumps around the table, which reduces clustering and keeps average probe length low. Exact details are in the CPython source (Objects/dictobject.c); for interviews, “open addressing with a smart probe and resizing” is enough.
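
The probing idea can be modeled in a few lines. This sketch follows the recurrence described in the comments of Objects/dictobject.c as commonly documented (next slot = 5·i + perturb + 1, with perturb shifted right by 5 each step); treat the constants as an assumption that may differ between CPython versions. The function name probe_slots is made up.

```python
PERTURB_SHIFT = 5   # assumed constant, taken from CPython's dictobject.c

def probe_slots(h, table_size, count):
    """Yield the first `count` slots probed for hash h (table_size must be a power of 2)."""
    mask = table_size - 1
    perturb = h & (2**64 - 1)      # treat the hash as unsigned, as the C code does
    i = perturb & mask             # first slot: low bits of the hash
    for _ in range(count):
        yield i
        perturb >>= PERTURB_SHIFT  # feed in higher bits of the hash
        i = (i * 5 + perturb + 1) & mask

print(list(probe_slots(42, 8, 8)))
print(sorted(set(probe_slots(42, 8, 64))))   # eventually every slot is visited
```

Once perturb has been shifted down to zero, the recurrence reduces to i → 5·i + 1 modulo the table size, which cycles through every slot.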

Insertion Order (Python 3.7+)

Since Python 3.7, the language guarantees that dict maintains insertion order: the order in which keys were first inserted is the order in which they are iterated. So:

d = {}
d["z"] = 1
d["a"] = 2
d["m"] = 3
print(list(d.keys()))   # ['z', 'a', 'm']  — same as insertion order

Reassigning a value does not change order:

d["a"] = 99
print(list(d.keys()))   # still ['z', 'a', 'm']

Deleting a key removes it from the order; re-inserting the same key puts it at the “end” of the current order:

del d["a"]
d["a"] = 100
print(list(d.keys()))   # ['z', 'm', 'a']  — 'a' is now last

Resizing and Growth

When the number of entries reaches about 2/3 of the indices table size, the dict is resized: a new, larger indices table is allocated (sizes follow a sequence that roughly doubles), and every entry is reinserted. This is O(n) but happens only periodically, so amortized insert is O(1). You don’t need to “pre-allocate” a dict for normal use; dict.fromkeys(iterable) or building in one pass is fine.

Example: Resize effect

Start with an empty dict. First few inserts use a small table. After enough inserts, the table is resized; subsequent lookups use the new indices. You can observe “steps” in memory growth if you measure size, but for correctness you can assume O(1) average insert and lookup.
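
One way to observe those steps is to record sys.getsizeof after each insert. This is a CPython-specific sketch: the reported sizes and the exact insert counts at which they jump depend on the interpreter version, so no particular numbers are assumed here.

```python
import sys

d = {}
sizes = []
for i in range(1000):
    d[i] = None
    sizes.append(sys.getsizeof(d))   # size of the dict object itself, in bytes

# Entry counts at which the reported size changed, i.e. where a resize happened
jumps = [n + 1 for n in range(1, len(sizes)) if sizes[n] != sizes[n - 1]]
print(jumps)
print(len(set(sizes)), "distinct sizes over", len(sizes), "inserts")
```

The handful of jumps, spaced further and further apart, is the amortization argument made visible: most inserts do not trigger any reallocation.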

Keys Must Be Hashable

An object is hashable if it has a __hash__ whose result doesn’t change during its lifetime and an __eq__ that is consistent with it, so that objects that compare equal have the same hash. Immutable built-in types (int, float, str, and tuples whose elements are all hashable) are hashable. Lists, dicts, and sets are not (they are mutable). So:

d = {}
d[[1, 2]] = 3   # TypeError: unhashable type: 'list'

# Correct: use tuple
d[tuple([1, 2])] = 3   # OK
d[(1, 2)] = 4          # overwrites; (1, 2) == tuple([1, 2])

Using a tuple as key is standard when you need to key by a sequence:

# Group by pair (a, b)
pairs = [(1, 2), (1, 3), (2, 2)]
groups = {}
for a, b in pairs:
    key = (a, b)
    groups[key] = groups.get(key, 0) + 1
print(groups)   # {(1, 2): 1, (1, 3): 1, (2, 2): 1}

Dictionary Internals: Compact Representation

Modern CPython uses a compact representation: a separate array of indices into a dense array of (hash, key, value) entries. The dense array is in insertion order; the indices array is the “hash table” that maps hash → index. This gives both O(1) lookup (via the indices) and ordered iteration (via the dense array). When you delete, the indices slot is marked as unused (dummy); the dense entry can be left in place or the table can be compacted, depending on implementation details.
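
The two-array layout can be modeled with a toy class. This is a sketch under simplifying assumptions, not CPython's implementation: the class name CompactDict is hypothetical, it uses plain linear probing instead of the perturbed probe, and it supports neither deletion nor resizing (it raises when the table is nearly full).

```python
class CompactDict:
    """Toy model of the compact layout: sparse index table + dense entries."""

    FREE = -1  # marks an empty slot in the index table

    def __init__(self, size=8):
        self.indices = [self.FREE] * size  # sparse: slot -> position in entries
        self.entries = []                  # dense: (hash, key, value) in insertion order

    def _probe(self, key):
        """Return (slot, position); position is None if key is absent."""
        h = hash(key)
        slot = h % len(self.indices)
        while self.indices[slot] != self.FREE:
            pos = self.indices[slot]
            stored_hash, stored_key, _ = self.entries[pos]
            if stored_hash == h and stored_key == key:
                return slot, pos
            slot = (slot + 1) % len(self.indices)  # linear probing (simplification)
        return slot, None

    def __setitem__(self, key, value):
        slot, pos = self._probe(key)
        if pos is not None:
            self.entries[pos] = (hash(key), key, value)  # update keeps its position
            return
        if len(self.entries) >= len(self.indices) - 1:
            raise MemoryError("toy table is full; a real dict would resize here")
        self.indices[slot] = len(self.entries)
        self.entries.append((hash(key), key, value))

    def __getitem__(self, key):
        _, pos = self._probe(key)
        if pos is None:
            raise KeyError(key)
        return self.entries[pos][2]

    def keys(self):
        return [key for _, key, _ in self.entries]  # dense array gives the order


d = CompactDict()
d["z"] = 1
d["a"] = 2
d["m"] = 3
d["a"] = 99
print(d.keys())   # ['z', 'a', 'm']
```

Lookup goes through the sparse indices list, while iteration reads the dense entries list, which is why order comes for free.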

Practical Examples

Example 1: Building a frequency map

words = ["apple", "banana", "apple", "cherry", "banana", "apple"]
freq = {}
for w in words:
    freq[w] = freq.get(w, 0) + 1
# freq = {'apple': 3, 'banana': 2, 'cherry': 1}
# Order is insertion order: apple first, then banana, then cherry.

Example 2: Inverting a dict (value → key)

d = {"a": 1, "b": 2, "c": 1}
inv = {}
for k, v in d.items():
    inv.setdefault(v, []).append(k)
# inv = {1: ['a', 'c'], 2: ['b']}
# If multiple keys have the same value, they go in a list.

Example 3: Default value and key existence

d = {"x": 10}
print(d.get("y"))       # None (key missing)
print(d.get("y", 0))    # 0 (default)
print("y" in d)         # False
d["y"] = d.get("y", 0) + 1   # safe: now d["y"] == 1

Example 4: Why mutable keys fail

# Lists are mutable — unhashable
try:
    {[1, 2]: "bad"}
except TypeError as e:
    print(e)   # unhashable type: 'list'

# Tuple of immutables is fine
{(1, 2): "ok", (3, 4): "ok"}   # OK
# Tuple containing a list is not hashable
try:
    {(1, [2, 3]): "bad"}
except TypeError as e:
    print(e)   # unhashable type: 'list'

Time and Space Complexity

Average: insert, lookup, delete O(1). Amortized: insert O(1) (resize cost spread over many inserts). Worst case: O(n) if many keys collide or during resize. Space: O(n) for n entries plus the indices table (typically similar in size to the number of entries after resizing). Iteration is O(n) and yields keys (or items) in insertion order.

Common Mistakes

Common Mistake

Using a list or dict as a key. Only hashable types can be keys. Use tuple(seq) or frozenset(s) if you need to key by a sequence or set. Remember: a tuple is hashable only if all its elements are hashable—(1, [2]) is not hashable.

Assuming order in Python 3.6 and below: In 3.6 and earlier, dict did not guarantee insertion order. Code that relies on order should require 3.7+ or use collections.OrderedDict.
Modifying dict while iterating: Adding or deleting keys during for k in d can raise RuntimeError or give undefined behavior. Iterate over list(d) or a copy if you need to mutate.
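
The iteration pitfall can be sketched as follows. The function names are hypothetical; the first version iterates over a snapshot of the keys, the second deletes from the live dict and reports the error it triggers.

```python
def drop_zero_counts(counts):
    """Remove keys whose count is 0, iterating over a snapshot of the keys."""
    for key in list(counts):      # list(counts) copies the keys first
        if counts[key] == 0:
            del counts[key]
    return counts


def unsafe_drop_zero_counts(counts):
    """Same goal, but deletes while iterating the live dict."""
    try:
        for key in counts:
            if counts[key] == 0:
                del counts[key]
    except RuntimeError as exc:
        return str(exc)           # the size change is detected on the next step
    return None


print(drop_zero_counts({"a": 0, "b": 2}))          # {'b': 2}
print(unsafe_drop_zero_counts({"a": 0, "b": 2}))   # dictionary changed size during iteration
```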

Hash Randomization

By default, Python randomizes the hashes of str and bytes objects with a seed chosen at interpreter startup (the PYTHONHASHSEED environment variable can fix or disable that seed). So hash("hello") can differ between runs. This is to prevent a class of denial-of-service attacks based on crafted keys that all collide. Within a single process, hash values are stable. Don’t rely on hash values being the same across different runs or machines.

Expert Tip

When you need “dict with default” behavior, use collections.defaultdict or d.get(key, default). For grouping, d.setdefault(key, []).append(value) or defaultdict(list) is cleaner than checking if key not in d: d[key] = [].

Interview Insight

If asked “how does Python’s dict work?”, say: “It’s a hash table with open addressing and a perturbed probe to reduce clustering. Keys must be hashable (in practice, immutable values such as str, int, or tuples of them). Since 3.7, dict preserves insertion order. Resizing happens when the table is about two-thirds full; insert and lookup are O(1) on average, with insert amortized over resizes.” Give an example of building a frequency dict or using tuple as key when you need to key by a pair.

Practice Problems

LeetCode 49: Group Anagrams (dict with key = tuple of counts or sorted string).
LeetCode 1: Two Sum (dict: value → index).
Use dict.get(key, default) and setdefault in frequency and grouping problems.

Summary

CPython’s dict uses open addressing with a perturbed probe, compact storage, and resizing when ~2/3 full. Insertion order is preserved (3.7+).
Keys must be hashable (a stable __hash__ consistent with __eq__; in practice, immutable values). Use tuple or frozenset to key by sequence or set; avoid list/dict as key.
Use d.get(key, default) for safe lookup, d.setdefault(key, []) or defaultdict for grouping. Don’t mutate a dict while iterating over it.

10.4 Frequency Problems

Introduction

Frequency problems are those where you need to count how often each element (or pattern) appears in the input—a list, string, or stream. The hash table (dict) is the natural tool: use the element as the key and the count as the value. Once you have a frequency map, you can answer questions like "which element appears most often?", "how many elements appear exactly k times?", or "are these two sequences anagrams?". This section covers building frequency maps, common patterns (max frequency, k most frequent, anagrams), and how to do it cleanly in Python with dict, defaultdict, and Counter.

Formal Definition

A frequency map (or count map) is a function from the set of distinct elements in the input to the non-negative integers: for each element x, freq(x) is the number of times x appears. Implemented as a hash table: keys are elements, values are counts. The map is built in a single pass: for each occurrence of x, set freq[x] = freq.get(x, 0) + 1. Any query that depends only on per-element counts can then be answered from this map.

Mental Model

Think of the frequency map as a scoreboard: one row per distinct item, one column for "how many." You scan the input once and, for every item you see, add 1 to its row. After the pass, the scoreboard holds all counts. "Most frequent" = row with the largest number; "anagrams?" = two scoreboards (one per string) are identical.

Real-World Analogy

Imagine counting votes in an election: each ballot has a candidate name. You go through the pile and, for each name, add one to that candidate's tally. At the end you have a "frequency map": candidate → number of votes. Finding the winner is "key with maximum value." Frequency problems in algorithms are the same: one pass (or a few) to build counts, then query or process the counts.

Example

Given arr = [1, 2, 2, 3, 2, 1], a frequency map is {1: 2, 2: 3, 3: 1}. So 2 appears most often (3 times). For strings: "hello" → {'h': 1, 'e': 1, 'l': 2, 'o': 1}. Two strings are anagrams if their character frequency maps are equal.

Diagram: Input → Frequency Map

  arr = [1, 2, 2, 3, 2, 1]
  Step:  scan 1 → scan 2 → scan 2 → scan 3 → scan 2 → scan 1

  freq:  {}  →  {1:1}  →  {1:1, 2:1}  →  {1:1, 2:2}  →  {1:1, 2:2, 3:1}  →  {1:1, 2:3, 3:1}  →  {1:2, 2:3, 3:1}
          │       │            │              │                │                    │
          └───────┴────────────┴──────────────┴────────────────┴────────────────────┴──→ one pass

  Final freq = {1: 2, 2: 3, 3: 1}   →   "most frequent" = key with max value = 2

Building a Frequency Map

For any iterable (list, string, etc.), the pattern is: iterate once; for each element, set freq[element] = freq.get(element, 0) + 1 (or use defaultdict(int) or Counter).

Using a plain dict

def build_freq(arr):
    freq = {}
    for x in arr:
        freq[x] = freq.get(x, 0) + 1
    return freq

# Example
print(build_freq([1, 2, 2, 3, 2, 1]))   # {1: 2, 2: 3, 3: 1}
print(build_freq("hello"))               # {'h': 1, 'e': 1, 'l': 2, 'o': 1}

Using defaultdict(int)

from collections import defaultdict

def build_freq_defaultdict(arr):
    freq = defaultdict(int)
    for x in arr:
        freq[x] += 1
    return dict(freq)

Using Counter

from collections import Counter

# Counter is built for this
freq = Counter([1, 2, 2, 3, 2, 1])   # Counter({2: 3, 1: 2, 3: 1})
char_freq = Counter("hello")          # Counter({'l': 2, 'h': 1, 'e': 1, 'o': 1})

# Most common n elements
print(Counter("hello").most_common(2))   # [('l', 2), ('h', 1)]

Common Frequency Patterns

1. Element with maximum frequency

After building freq, find the key with the largest value. One pass over the dict, or use max(freq, key=freq.get).

def most_frequent(arr):
    freq = {}
    for x in arr:
        freq[x] = freq.get(x, 0) + 1
    return max(freq, key=freq.get)   # key whose value is maximum

# Or with Counter
def most_frequent_counter(arr):
    return Counter(arr).most_common(1)[0][0]

2. K most frequent elements

Build frequency map, then either: (a) sort (key, count) by count and take top k — O(n log k) with a heap or O(n log n) with full sort; or (b) use bucket sort: bucket[i] = list of elements with frequency i — O(n).

def top_k_frequent(nums, k):
    freq = Counter(nums)
    # bucket[i] = elements that appear i times
    n = len(nums)
    buckets = [[] for _ in range(n + 1)]
    for x, count in freq.items():
        buckets[count].append(x)
    result = []
    for i in range(n, 0, -1):
        for x in buckets[i]:
            result.append(x)
            if len(result) == k:
                return result
    return result
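
Option (a), the heap approach, can be sketched with the standard library. This is one possible version, not the only one: the function name top_k_frequent_heap is hypothetical, and ties between equally frequent elements are returned in no guaranteed order.

```python
import heapq
from collections import Counter


def top_k_frequent_heap(nums, k):
    """Return up to k elements with the highest counts, using a heap selection."""
    freq = Counter(nums)
    # nlargest iterates over the keys of freq and ranks them by their count
    return heapq.nlargest(k, freq, key=freq.get)


print(top_k_frequent_heap([1, 1, 1, 2, 2, 3], 2))   # [1, 2]
```

If k exceeds the number of distinct elements, nlargest simply returns all of them.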

3. Elements that appear exactly k times

Build freq; then collect all keys where freq[key] == k.

def elements_with_freq_k(arr, k):
    freq = Counter(arr)
    return [x for x, c in freq.items() if c == k]

4. Checking anagrams (same character frequencies)

Two strings are anagrams if their character frequency maps are equal. So build Counter(s1) and Counter(s2) and check equality, or use sorted(s1) == sorted(s2).

def are_anagrams(s1, s2):
    return Counter(s1) == Counter(s2)

# Without Counter
def are_anagrams_manual(s1, s2):
    if len(s1) != len(s2):
        return False
    f1, f2 = {}, {}
    for c in s1:
        f1[c] = f1.get(c, 0) + 1
    for c in s2:
        f2[c] = f2.get(c, 0) + 1
    return f1 == f2

Why This Topic Matters

Interview staple: "Count frequency", "find most frequent", "group by frequency", and "anagrams" appear constantly. The hash-table-one-pass pattern is expected.
Streaming and big data: When you can't store the whole array, you can still maintain a frequency map of what you've seen so far (e.g. for approximate "heavy hitters" or exact counts if the number of distinct keys is small).
Preprocessing step: Many problems (e.g. "substring with same character counts", "permutation in string") reduce to building or comparing frequency maps over windows.
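
As an illustration of comparing frequency maps over windows, here is a sketch of the "permutation in string" check. The function name contains_permutation is hypothetical; it assumes an empty pattern counts as found, and it deletes zero counts so the comparison behaves the same on older Python versions.

```python
from collections import Counter


def contains_permutation(pattern, text):
    """Return True if some window of text is an anagram of pattern."""
    m = len(pattern)
    if m == 0:
        return True
    if m > len(text):
        return False
    need = Counter(pattern)
    window = Counter(text[:m])          # counts for the first window
    if window == need:
        return True
    for i in range(m, len(text)):
        window[text[i]] += 1            # character entering the window
        left = text[i - m]
        window[left] -= 1               # character leaving the window
        if window[left] == 0:
            del window[left]            # keep the map free of zero counts
        if window == need:
            return True
    return False


print(contains_permutation("ab", "eidbaooo"))   # True  ("ba" is a window)
print(contains_permutation("ab", "eidboaoo"))   # False
```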

Grouping by Key (Beyond Count)

Sometimes you don't need counts—you need to group elements by a key (e.g. group anagrams by "sorted string" or "tuple of counts"). Use dict with value as list; setdefault(key, []).append(item) or defaultdict(list).

# Group anagrams: key = sorted string, value = list of words
def group_anagrams(words):
    groups = defaultdict(list)
    for w in words:
        key = tuple(sorted(w))   # or "".join(sorted(w))
        groups[key].append(w)
    return list(groups.values())

# Example
words = ["eat", "tea", "tan", "ate", "nat", "bat"]
print(group_anagrams(words))
# [['eat', 'tea', 'ate'], ['tan', 'nat'], ['bat']]

Time and Space Complexity

Building frequency map: One pass over n elements; dict insert/lookup O(1) average. Time O(n), space O(k) where k = number of distinct elements.
Max frequency / k most frequent: After building the map, finding the maximum is one scan over the distinct keys, O(d) for d distinct elements (d ≤ n); the bucket approach for top-k is O(n). Overall O(n) time, O(n) space for buckets.
Anagram check: O(n) for each string, n = length. O(n) time, O(1) space if alphabet size is fixed (e.g. 26 letters).

Why O(n) for building: We touch each of the n elements exactly once. For each element we do one dict lookup (O(1) average) and one insert/update (O(1) average). So total time is n · O(1) = O(n). Space is one entry per distinct key, so O(k) where k ≤ n.

Edge Cases

Empty input: Return empty dict {} or empty list for "top k" / "elements with freq k".
Single element: freq has one key with count 1; "most frequent" is that element; anagram of two single-char strings is just equality of the two chars.
All elements same: One key with count n. "Most frequent" is that element, and top-k can return only that one element, because there is just one distinct value.
k larger than distinct count: In "top k frequent," if there are fewer than k distinct elements, return all of them (or pad as per problem).

Pattern Recognition

Use a frequency map when the problem involves: "count occurrences," "most frequent," "least frequent," "elements appearing exactly k times," "anagrams," "group by same multiset," "majority element," "first unique/non-repeating." The pattern is: one pass to build freq, then one or more passes (or a single aggregation) to answer the question.
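
The "build freq, then a second pass" shape can be sketched with the first-unique-character problem. The function name first_unique_index and the choice to return -1 when nothing is unique are assumptions for this example.

```python
from collections import Counter


def first_unique_index(s):
    """Return the index of the first character that appears exactly once, or -1."""
    freq = Counter(s)                 # pass 1: count every character
    for i, ch in enumerate(s):        # pass 2: scan in original order
        if freq[ch] == 1:
            return i
    return -1


print(first_unique_index("leetcode"))       # 0
print(first_unique_index("loveleetcode"))   # 2
print(first_unique_index("aabb"))           # -1
```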

Common Mistakes

Common Mistake

Assuming order in frequency map. Plain dict in Python 3.7+ preserves insertion order, so the order you see when iterating is the order of first occurrence. For "most frequent" you must explicitly find the key with max value—don't assume the first or last key is the answer.

Off-by-one in "first k" vs "all with frequency ≥ x": "Top k frequent" means exactly k elements; "all elements with frequency ≥ 2" is a different problem. Read the problem carefully.
Anagrams: case and spaces: Often problem says "ignore case" or "ignore spaces". Normalize (e.g. lower and remove spaces) before building the Counter.
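
A normalized anagram check might look like the sketch below. The function name and the specific rules (lowercase everything, drop whitespace) are assumptions; adapt them to what the problem statement actually says.

```python
from collections import Counter


def are_anagrams_normalized(s1, s2):
    """Anagram check that ignores case and whitespace."""
    def normalize(s):
        return Counter(ch for ch in s.lower() if not ch.isspace())
    return normalize(s1) == normalize(s2)


print(are_anagrams_normalized("Dormitory", "dirty room"))   # True
print(are_anagrams_normalized("Hello", "World"))            # False
```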

Expert Tip

Use Counter when you only need counts; it has most_common(k) and supports addition/subtraction. Use defaultdict(int) when you need to update counts in place or combine with other logic. Use plain dict with get(x, 0) + 1 when you want no extra imports.

Interview Insight

When you see "count occurrences", "most frequent", "anagram", or "group by same character set", say: "I'll use a hash table to build a frequency map in one pass. For anagrams I'll compare Counter(s1) == Counter(s2) or group by a canonical key like sorted string." Then code the one-pass loop and the query (max, top-k, or equality).

Practice Problems

LeetCode 347: Top K Frequent Elements (bucket sort or heap).
LeetCode 49: Group Anagrams (group by sorted string or tuple of counts).
LeetCode 242: Valid Anagram (Counter(s1) == Counter(s2)).
LeetCode 387: First Unique Character (build freq, then scan for first with count 1).

Summary

Frequency map: One pass: freq[x] = freq.get(x, 0) + 1 or Counter(iterable). Use for counts, max frequency, top-k, anagrams, grouping.
Max frequency: max(freq, key=freq.get) or Counter(...).most_common(1).
Anagrams: Same character counts → Counter(s1) == Counter(s2). Group anagrams by key = sorted string or tuple of counts.
Prefer Counter when you need most_common; use defaultdict(int) or plain dict when you need custom updates.

10.5 Two Sum Pattern

Introduction

The Two Sum problem asks: given an array of numbers and a target value, find two elements (by value or by index) that add up to the target. The hash-table solution is the standard approach: in a single pass, for each element x, check whether target - x has already been seen; if so, you have a pair. Store seen values (and optionally their indices) in a dict for O(1) lookup. This pattern generalizes to "find a pair satisfying a condition" and to variants like Three Sum (reduce to Two Sum) or "count pairs with given sum." Mastering Two Sum is essential—it appears in interviews constantly and is the building block for many other problems.

Real-World Analogy

Imagine you're in a store with a fixed budget. You pick up an item and check its price. To know if you can buy two items that exactly match your budget, you need to remember the prices of items you've already seen. When you look at a new item, you ask: "Have I already seen an item whose price is (budget minus this item's price)?" A hash table is like a quick lookup pad: you write down each price as you see it, and when you see a new one you instantly check if the "complement" is on your list.

Example

nums = [2, 7, 11, 15], target = 9. We need two numbers that add to 9. At index 0 we see 2; we need 9 - 2 = 7. Store 2 → 0. At index 1 we see 7; 9 - 7 = 2 is already in the dict at index 0. So indices 0 and 1 (values 2 and 7) are the answer: [0, 1].

Diagram: One-Pass Two Sum

  nums = [ 2,  7, 11, 15 ]    target = 9
  index:   0   1   2   3

  i=0:  x=2,  complement=9-2=7.  7 in seen? No.  seen = {2:0}
  i=1:  x=7,  complement=9-7=2.  2 in seen? Yes (at 0).  Return [0, 1] ✓

  Visual:
  ┌─────┬─────┬─────┬─────┐
  │  2  │  7  │ 11  │ 15  │   ← array
  └──┬──┴──┬──┴─────┴─────┘
     │     │
     │     └── at i=1: need 2 → found at index 0 → pair (0, 1)
     └──────── at i=0: store seen[2]=0, need 7 (not seen yet)

Formal Definition

Given a sequence nums[0..n-1] and integer target, find distinct indices i, j such that nums[i] + nums[j] = target. Output is typically the pair (i, j) or [i, j]. Uniqueness: exactly one valid pair is assumed in the classic problem; we do not reuse the same index twice.

Problem Statement (Classic)

Given an array of integers nums and an integer target, return indices of the two numbers such that they add up to target. Assume exactly one solution exists, and you may not use the same element twice.

Thinking Evolution: Brute Force → Better → Optimal

Brute Force: Check Every Pair — O(n²)

Try all pairs (i, j) with i < j: if nums[i] + nums[j] == target, return [i, j]. Two nested loops; no extra space. Correct but slow for large n.

def two_sum_brute(nums, target):
    for i in range(len(nums)):
        for j in range(i + 1, len(nums)):
            if nums[i] + nums[j] == target:
                return [i, j]
    return []

Better: Sort + Two Pointers — O(n log n) time, O(1) extra (if we can lose indices)

If we only needed values, we could sort and use two pointers at the ends. But we need indices, so we must keep (value, index) pairs and sort by value, then run two pointers—still O(n log n). Good when the array is already sorted (then O(n)).
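
A sketch of the sort-plus-two-pointers idea while keeping original indices: instead of sorting the values, sort the index positions by value. The function name two_sum_sort is hypothetical, and it returns [] when no pair exists.

```python
def two_sum_sort(nums, target):
    """Two Sum via sorting index positions by value, then two pointers."""
    order = sorted(range(len(nums)), key=nums.__getitem__)  # indices, by value
    lo, hi = 0, len(order) - 1
    while lo < hi:
        s = nums[order[lo]] + nums[order[hi]]
        if s == target:
            i, j = order[lo], order[hi]
            return [min(i, j), max(i, j)]   # report original indices, smaller first
        if s < target:
            lo += 1                         # need a larger sum
        else:
            hi -= 1                         # need a smaller sum
    return []


print(two_sum_sort([3, 2, 4], 6))   # [1, 2]
```

Note that the order list itself takes O(n) extra space, which is the price of keeping indices.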

Optimal: Hash Table (One Pass) — O(n) time, O(n) space

For each x = nums[i], the needed partner is target - x. If we have already seen that value at some index j, we are done. So we maintain a mapping "value → index" and check before adding the current element. One pass, O(n) time and space.

Hash-Table Solution (One Pass)

Idea: as we iterate, for each nums[i], the value we need to complete the pair is complement = target - nums[i]. If complement was seen at some earlier index j, return [j, i]. Otherwise, record nums[i] → i in a dict and continue.

def two_sum(nums, target):
    seen = {}   # value -> index
    for i, x in enumerate(nums):
        complement = target - x
        if complement in seen:
            return [seen[complement], i]
        seen[x] = i
    return []   # no solution (problem usually guarantees one)

Line-by-Line Explanation

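seen = {}: We will store "value we've seen → index where we saw it". If a value repeats before a pair is found, seen[x] = i keeps the most recent index. Every stored index is earlier than the current i, so the returned pair always lists the earlier index first.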
for i, x in enumerate(nums): Process each element once; i is the index, x is the value.
complement = target - x: The other number we need so that x + complement = target.
if complement in seen: We've already seen that value at index seen[complement]. So [seen[complement], i] is a valid pair; return it.
seen[x] = i: Record that we've seen value x at index i for future lookups. We do this after the check so we never use the same index twice.

Why one pass works: when we are at index i, all indices 0..i-1 are already in seen. So if the pair is (j, i) with j < i, we will find complement = target - nums[i] equal to nums[j] when we process i.

Edge Cases

No solution: Return [] or whatever the problem specifies; the one-pass loop will finish without returning.
Duplicate values: Storing seen[x] = i overwrites the previous index, so a repeated value keeps its latest index. That is fine: we only need one valid pair, and the loop returns at the first i whose complement has already been seen. If the problem requires "all pairs," use a list of indices per value.
target - x == x (same element would satisfy 2x = target): We check complement in seen before adding x to seen, so we only match with an earlier index. We never use the same index twice.

Two-Pass Variant

First pass: build seen (value → index). Second pass: for each i, if target - nums[i] is in seen and the stored index is not i, return the two indices i and seen[target - nums[i]]. Same O(n) time and O(n) space; the one-pass version is cleaner because it never needs the "same index" check (the current element is not yet in seen when its complement is looked up).
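
For comparison, a minimal sketch of the two-pass variant. The function name two_sum_two_pass is hypothetical. Because the first pass keeps the last index of each value, the partner found in the second pass lies at a later position, so the pair is returned as [i, j] with the smaller index first.

```python
def two_sum_two_pass(nums, target):
    """Two-pass Two Sum: build the full value -> index map, then look up complements."""
    seen = {x: i for i, x in enumerate(nums)}   # pass 1: last index of each value
    for i, x in enumerate(nums):                # pass 2: look for a partner
        j = seen.get(target - x)
        if j is not None and j != i:            # must not reuse the same index
            return [i, j]
    return []


print(two_sum_two_pass([3, 2, 4], 6))   # [1, 2]
print(two_sum_two_pass([3, 3], 6))      # [0, 1]
```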

Return Values vs Indices

Some problems ask for the values of the two numbers, not indices. Same algorithm: when you find the pair, return [complement, x] or [nums[j], nums[i]]. If the problem asks for count of pairs with sum equal to target, use a frequency map: for each x, add freq.get(target - x, 0) to the count (handle same-index and duplicates per problem rules), then update freq[x].

Why This Topic Matters

Interview staple: Two Sum is one of the most asked problems. The hash-table pattern is the expected optimal solution (O(n) time).
Pattern for "find pair": Any "find two elements such that f(a, b) = target" can sometimes be reduced to storing seen values and checking for a complement. Works when the relation can be rewritten as "need b = g(target, a)" and g is easy to compute.
Building block: Three Sum is often "fix one element, then Two Sum on the rest"; 4Sum and similar follow. Subarray sum problems use prefix-sum + hash table (different but related idea).
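
The "fix one element, then Two Sum on the rest" idea can be sketched as below. This version uses sorting and two pointers for the inner step (rather than a hash table) because that makes skipping duplicate triplets simple; the function name three_sum and the default target of 0 are assumptions for this example.

```python
def three_sum(nums, target=0):
    """Return all unique value triplets that sum to target."""
    nums = sorted(nums)
    n = len(nums)
    result = []
    for a in range(n - 2):
        if a > 0 and nums[a] == nums[a - 1]:
            continue                         # same fixed value as before: skip
        lo, hi = a + 1, n - 1                # Two Sum on the remaining suffix
        while lo < hi:
            s = nums[a] + nums[lo] + nums[hi]
            if s == target:
                result.append([nums[a], nums[lo], nums[hi]])
                lo += 1
                hi -= 1
                while lo < hi and nums[lo] == nums[lo - 1]:
                    lo += 1                  # skip duplicate left values
                while lo < hi and nums[hi] == nums[hi + 1]:
                    hi -= 1                  # skip duplicate right values
            elif s < target:
                lo += 1
            else:
                hi -= 1
    return result


print(three_sum([-1, 0, 1, 2, -1, -4]))   # [[-1, -1, 2], [-1, 0, 1]]
```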

Variants and Extensions

Two Sum II – Sorted array

If the array is sorted, use two pointers at the start and end: if nums[lo] + nums[hi] > target, decrement hi; if smaller, increment lo; if equal, return. O(n) time, O(1) extra space. No hash table needed.
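
A minimal sketch of the two-pointer scan for a sorted array. The function name two_sum_sorted is hypothetical, and it returns 0-based indices; some problem statements (LeetCode 167, for example) ask for 1-based indices instead.

```python
def two_sum_sorted(nums, target):
    """Two Sum on an ascending-sorted list; returns 0-based indices or []."""
    lo, hi = 0, len(nums) - 1
    while lo < hi:
        s = nums[lo] + nums[hi]
        if s == target:
            return [lo, hi]
        if s < target:
            lo += 1      # sum too small: move the left pointer right
        else:
            hi -= 1      # sum too large: move the right pointer left
    return []


print(two_sum_sorted([2, 7, 11, 15], 9))   # [0, 1]
```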

Count pairs with given sum

Use a frequency map. For each x, count how many times target - x has been seen (excluding current element if needed). Add that to the result, then do freq[x] += 1. Handle duplicates and "same index" per problem.
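
A sketch of the counting variant, assuming that a pair means two different positions i < j and that equal values at different positions count as separate pairs. The function name count_pairs is hypothetical.

```python
def count_pairs(nums, target):
    """Count index pairs (i, j), i < j, with nums[i] + nums[j] == target."""
    freq = {}      # value -> how many times it has appeared so far
    count = 0
    for x in nums:
        count += freq.get(target - x, 0)   # every earlier complement forms a pair
        freq[x] = freq.get(x, 0) + 1       # update after counting: no self-pairing
    return count


print(count_pairs([1, 5, 7, -1, 5], 6))   # 3
print(count_pairs([1, 1, 1], 2))          # 3
```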

Two Sum – Return all unique pairs (values)

Sort and use two pointers, or use a set of pairs and a set of seen values to avoid duplicates. Depends on whether duplicates in the array are allowed and whether (a,b) and (b,a) count as one.
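
One possible sketch of the set-based version, assuming (a, b) and (b, a) count as the same pair and that a value may pair with an equal value only if it occurs at least twice. The function name unique_pairs is hypothetical.

```python
def unique_pairs(nums, target):
    """Return the sorted list of distinct value pairs (a, b), a <= b, summing to target."""
    seen = set()     # values encountered so far
    pairs = set()    # normalized pairs, so duplicates collapse
    for x in nums:
        c = target - x
        if c in seen:
            pairs.add((min(x, c), max(x, c)))
        seen.add(x)  # add after the check so x never pairs with itself
    return sorted(pairs)


print(unique_pairs([1, 1, 2, 2, 3, 3], 4))   # [(1, 3), (2, 2)]
```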

Time and Space Complexity

Time: O(n) — one pass over the array; dict lookup and insert are O(1) average.
Space: O(n) — in the worst case we store n distinct values in the dict.

Common Mistakes

Common Mistake

Using the same element twice. In the one-pass solution we store seen[x] = i after checking for the complement, so we never use the same index twice. In a two-pass solution, you must ensure seen[complement] != i before returning.

Returning value instead of index: Read the problem: often "return indices" is required. Store index in the dict, not just presence.
Order of result: Some problems want the smaller index first. Our one-pass returns [seen[complement], i] so the first index is always smaller.

Expert Tip

When you see "find two numbers that sum to target", say: "I'll use a hash table. For each element I'll check if (target - element) was already seen; if yes, we have a pair. I'll store each value and its index so I can return indices." Then code the one-pass loop.

Interview Insight

If asked "can you do it in O(1) space?", mention that for an unsorted array, the standard optimal is O(n) time and O(n) space with a hash table. O(1) extra space would require either brute force (O(n²) time) or sorting in place and then two pointers (O(n log n) time, and the original indices are lost), so there's a time–space tradeoff. For a sorted array, two pointers give O(n) time and O(1) space.

Practice Problems

LeetCode 1: Two Sum (classic; return indices).
LeetCode 167: Two Sum II – Input Array Is Sorted (two pointers).
LeetCode 15: 3Sum (fix one, then two sum on the rest; avoid duplicate triplets).
LeetCode 18: 4Sum (similar idea with two fixed elements or one fixed + 3Sum).

Summary

Two Sum (unsorted): One pass with a dict mapping value → index. For each x, if target - x is in the dict, return [seen[target-x], i]; else seen[x] = i.
Time O(n), space O(n). Same pattern extends to "count pairs" (use frequency map) and to "find pair" for other relations when you can define a complement.
Sorted array: Two pointers at both ends, move based on sum vs target — O(n) time, O(1) space. Three Sum / Four Sum often reduce to Two Sum after fixing one or two elements.

10.6 Subarray with Given Sum

Introduction

Given an array (possibly with negative numbers) and a target sum, the problem is to find a contiguous subarray whose elements add up to that target. The efficient solution uses prefix sums plus a hash table: if the prefix sum at index i is P[i], then a subarray from j+1 to i has sum P[i] - P[j]. So we need P[i] - P[j] = target, i.e. P[j] = P[i] - target. As we compute prefix sums left to right, we store each prefix sum (and index or count) in a dict; at each step we check whether current_prefix - target was seen before. This gives O(n) time and O(n) space. The same idea applies to "count subarrays with given sum" or "longest subarray with sum k."

Formal Definition

A subarray of arr is a contiguous block arr[i], arr[i+1], ..., arr[j] for some 0 ≤ i ≤ j < n. The sum of this subarray is arr[i] + arr[i+1] + ... + arr[j]. We define prefix sum P[k] = arr[0] + arr[1] + ... + arr[k] (with P[-1] = 0 for the empty prefix). Then the sum of arr[i..j] equals P[j] - P[i-1]. Finding a subarray with sum target is equivalent to finding indices i, j with i ≤ j such that P[j] - P[i-1] = target, i.e. P[i-1] = P[j] - target. So for each ending position j, we look for an earlier prefix position i-1 (possibly -1, the empty prefix) whose prefix sum equals P[j] - target; the subarray then starts at i.

Mental Model

Imagine walking along the array and keeping a running total (prefix sum). At each step you ask: "Have I ever had a running total that is exactly (current total minus target)?" If yes, the segment from that earlier moment to now has sum target. The hash table is your memory of "prefix sum → when you had it" (index or count).

Real-World Analogy

Imagine a road with mile markers. You know the total distance from the start to each marker (that's your "prefix sum"). To find a segment that is exactly 10 miles long, you ask: "At which earlier marker was the cumulative distance exactly (current distance minus 10)?" The hash table is your quick log of "cumulative distance → marker number." When you reach a new marker, you look up whether you've already passed a point where the cumulative distance was 10 less than now—if yes, the stretch between that point and now is 10 miles.

Example

arr = [1, 2, 3, 4, 5], target = 9. Prefix sums: P[0]=1, P[1]=3, P[2]=6, P[3]=10, P[4]=15. At index 3, the prefix sum is 10, so we need an earlier prefix sum P[j] = 10 - 9 = 1. Prefix sum 1 occurred at index 0. So the subarray from index 1 to 3 (0-based) has sum 9: [2, 3, 4] → 2+3+4 = 9. The answer is the subarray [2, 3, 4], i.e. indices (1, 3). (The subarray [4, 5] at indices (3, 4) also sums to 9; a left-to-right scan reports [2, 3, 4] first.)

Prefix Sum Idea

Define prefix[i] = sum of arr[0..i] (inclusive). By convention we also define prefix[-1] = 0 (empty prefix). Then the sum of arr[j+1..i] is:

sum(arr[j+1..i]) = prefix[i] - prefix[j]

So we want prefix[i] - prefix[j] = target, i.e. prefix[j] = prefix[i] - target. As we iterate i from 0 to n-1, we maintain a running curr_sum (which is prefix[i]). We need to know: have we seen the value curr_sum - target at some earlier index? If yes, the subarray from (that index + 1) to i has sum target.
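
The identity can be checked with a small sketch. To avoid a negative index, the list below is shifted by one relative to the text: ps[k] is the sum of the first k elements, so ps[0] plays the role of prefix[-1] = 0 and ps[k+1] corresponds to prefix[k]. The helper names are hypothetical.

```python
from itertools import accumulate


def build_prefix(arr):
    """Return ps where ps[k] = sum of the first k elements (ps[0] = 0)."""
    return [0] + list(accumulate(arr))


def range_sum(ps, left, right):
    """Sum of arr[left..right] inclusive, as a difference of two prefix sums."""
    return ps[right + 1] - ps[left]


ps = build_prefix([1, 2, 3, 4, 5])
print(ps)                    # [0, 1, 3, 6, 10, 15]
print(range_sum(ps, 1, 3))   # 9  (2 + 3 + 4)
```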

Diagram: Prefix Sum and Segment

  arr   = [ 1,  2,  3,  4,  5 ]    target = 9
  index =   0   1   2   3   4

  prefix:  P[-1]=0, P[0]=1, P[1]=3, P[2]=6, P[3]=10, P[4]=15

  To get sum 9 for segment [1..3] (elements 2,3,4):
  sum(arr[1..3]) = 2+3+4 = 9  =  P[3] - P[0]  =  10 - 1 = 9 ✓

  So:  prefix[j] = P[0] = 1,  prefix[i] = P[3] = 10.
  We need prefix[j] = prefix[i] - target = 10 - 9 = 1.  Seen at j=0? Yes.
  Segment from index (0+1) to 3  →  arr[1..3] = [2,3,4], sum = 9.

  Timeline as we scan:
  i=0: curr=1,  need=1-9=-8  (not in seen).  seen = {0:-1, 1:0}
  i=1: curr=3,  need=3-9=-6  (not in seen).  seen = {..., 3:1}
  i=2: curr=6,  need=6-9=-3  (not in seen).  seen = {..., 6:2}
  i=3: curr=10, need=10-9=1  (in seen at 0).  Return (0+1, 3) = (1, 3) ✓

Return One Subarray (Indices or Yes/No)

Store in the hash table: prefix_sum → smallest index where that prefix sum occurred (so we get the longest such subarray if we care, or any valid one). Initialize with 0 → -1 so that a subarray starting at index 0 is handled (we need "prefix[-1] = 0").

def subarray_sum(nums, target):
    # Returns (start, end) 0-based indices if found, else (-1, -1)
    seen = {0: -1}   # prefix sum 0 at "index" -1 (before start)
    curr = 0
    for i, x in enumerate(nums):
        curr += x
        need = curr - target
        if need in seen:
            start = seen[need] + 1
            return (start, i)
        seen.setdefault(curr, i)   # keep only the first (smallest) index for each prefix sum
    return (-1, -1)

# Example
nums = [1, 2, 3, 4, 5]
print(subarray_sum(nums, 9))   # (1, 3)  -> arr[1:4] = [2,3,4], sum = 9

Why seen[0] = -1? So when curr == target (e.g. first few elements sum to target), we have need = curr - target = 0, and we want to say "from start (index 0) to i". The "index" for prefix sum 0 is -1, so start = -1 + 1 = 0.

Count Subarrays with Given Sum

Instead of storing one index per prefix sum, store the count of how many times each prefix sum has occurred. When at index i with prefix sum curr, the number of subarrays ending at i with sum target is the count of indices j such that prefix[j] = curr - target, i.e. seen.get(curr - target, 0). Add that to the result, then do seen[curr] += 1 (and initialize seen[0] = 1 for the empty prefix).

def count_subarray_sum(nums, target):
    seen = {0: 1}   # prefix sum 0 has occurred once (empty prefix)
    curr = 0
    count = 0
    for x in nums:
        curr += x
        need = curr - target
        count += seen.get(need, 0)
        seen[curr] = seen.get(curr, 0) + 1
    return count

# Example: [1, 2, 3], target=3. Prefix: 1, 3, 6.
# At i=0: curr=1, need=1-3=-2 -> 0; seen[1]=1.
# At i=1: curr=3, need=3-3=0 -> 1 (empty prefix); count=1; seen[3]=1.
# At i=2: curr=6, need=6-3=3 -> 1; count=2; seen[6]=1.
# Subarrays with sum 3: [1,2] and [3]. So count=2.

Handling Negative Numbers

The prefix-sum + hash table approach works with negative numbers. Sliding window with two pointers does not work when the array can have negatives: extending or shrinking the window can either increase or decrease the sum, so there is no reliable rule for which pointer to move. So for arrays with negatives, the prefix-sum + hash table method is the correct O(n) approach.

Longest Subarray with Sum K

Store prefix_sum → first index where that sum was seen. When you see curr - target in seen, the subarray from seen[curr-target]+1 to i has sum target; update the maximum length. Only store the first occurrence of each prefix sum so that the subarray length i - seen[need] is as large as possible.

def longest_subarray_sum_k(nums, k):
    seen = {0: -1}
    curr = 0
    max_len = 0
    for i, x in enumerate(nums):
        curr += x
        need = curr - k
        if need in seen:
            length = i - seen[need]
            max_len = max(max_len, length)
        if curr not in seen:   # keep first occurrence only
            seen[curr] = i
    return max_len

Why This Topic Matters

Interview staple: "Subarray with given sum" and "count subarrays with sum k" are common. The prefix-sum + hash table pattern is the standard solution when negatives are allowed.
Difference from Two Sum: Two Sum finds two elements; here we find a contiguous segment. The key is rephrasing "sum of segment = target" as "two prefix sums differ by target."
Positive-only arrays: If all elements are positive, a sliding window (expand right until sum ≥ target, then shrink left) also works in O(n). But with negatives, prefix sum + hash is the way to go.

Time and Space Complexity

Time: O(n) — one pass; each dict lookup and insert is O(1) average.
Space: O(n) — up to n distinct prefix sums in the worst case.

Edge Cases

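Empty array: No subarray exists; return False, 0, or [] depending on whether the problem asks for existence, a count or length, or indices.
Target = 0: A subarray with sum 0 exists exactly when some prefix sum repeats. That includes a later prefix sum equal to 0, which repeats the empty prefix and gives a segment starting at index 0. Initializing seen[0] = 1 or seen[0] = -1 handles that "segment from index 0" case.
First element equals target: After index 0 the prefix sum equals target, so need = curr - target = 0. The lookup succeeds only because the empty prefix (sum 0) is already in seen, so initializing seen[0] is essential.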

Pattern Recognition

Use prefix sum + hash table when you see: "contiguous subarray with sum k," "count subarrays with sum k," "longest/shortest subarray with sum k," or "subarray sum equals target" and the array may contain negative numbers. If the problem says "all positive," you can also mention sliding window as an alternative (O(1) space).

Common Mistakes

Common Mistake

Forgetting to initialize prefix sum 0. The empty prefix has sum 0. You must put seen[0] = -1 (for index version) or seen[0] = 1 (for count version) before the loop. Otherwise you miss subarrays that start at index 0.

Using sliding window for arrays with negatives: Sliding window assumes monotonic change when you move the pointer. With negatives, the sum can go up when you shrink the window, so the two-pointer shrink logic fails. Use prefix sum + hash instead.
Store last vs first occurrence: For "longest subarray with sum k" you want the first index where each prefix sum was seen. For "shortest subarray" you'd store the last occurrence (update every time).
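The "shortest" variant mentioned above can be sketched as follows. This is an illustrative helper (shortest_subarray_sum_k is a name chosen here, not from the course) and it assumes the problem asks for the length of the shortest subarray whose sum is exactly k, returning 0 when none exists.

```python
def shortest_subarray_sum_k(nums, k):
    last = {0: -1}      # prefix sum -> most recent index where it was seen
    curr = 0
    best = 0            # 0 means "no subarray found yet"
    for i, x in enumerate(nums):
        curr += x
        need = curr - k
        if need in last:
            length = i - last[need]
            if best == 0 or length < best:
                best = length
        # Overwrite every time: a later index gives a shorter subarray.
        last[curr] = i
    return best
```

The only change from the longest version is the unconditional update of the stored index, which keeps the start of the candidate segment as close to i as possible.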

Expert Tip

When you hear "subarray sum equals k" or "contiguous subarray with sum target", say: "I'll use prefix sums and a hash table. At each position I'll check if (current prefix sum minus target) was seen before; that means there's a segment from that earlier position to here with sum target. I'll initialize with prefix 0 so segments starting at index 0 are counted."

Interview Insight

If the interviewer says "all elements are positive", you can mention both approaches: (1) Sliding window: expand until sum ≥ target, then shrink from the left—O(n) time, O(1) space. (2) Prefix sum + hash: still works, O(n) time and O(n) space. For arrays with negatives, only (2) is correct.

Practice Problems

LeetCode 560: Subarray Sum Equals K (count subarrays with sum k).
LeetCode 325: Maximum Size Subarray Sum Equals k (longest subarray with sum k).
GeeksforGeeks: Subarray with given sum (return start/end indices; handle negatives).

Summary

Subarray sum = target is solved with prefix sum + hash table. At each index, check if current_prefix - target was seen; if yes, the segment from (that index + 1) to current has sum target.
Initialize seen[0] = -1 (for indices) or seen[0] = 1 (for count) so subarrays starting at index 0 are included.
Find one: store prefix_sum → index; count: store prefix_sum → count; longest: store prefix_sum → first index only. Works with negative numbers; sliding window does not.

10.7 Custom Hashing

Introduction

Custom hashing means using objects or composite values as keys in a hash table (e.g. Python dict) when built-in types are not enough. To use a custom type as a key, it must be hashable: it must implement __hash__ and __eq__ so that equal objects have the same hash and the hash does not change over the object's lifetime (so typically the object is immutable). For composite keys—e.g. "pair (a, b)" or "set of items"—we use tuple or frozenset, which are hashable when their elements are hashable. Sometimes you need to design a hash function for a domain (e.g. mapping (i, j) to a single integer) to avoid collisions and keep lookups O(1). This section covers making custom classes hashable, using tuples and frozensets as keys, and simple hash-function design.

Real-World Analogy

Think of a library where books are filed by "author + title" together. The filing system needs a single label (like a hash) for each (author, title) pair. You define a rule: e.g. "author alphabetically, then title," and that rule must always give the same label for the same pair and never change. Custom hashing is that rule: you decide how to turn your composite key into something the hash table can use, and you keep that rule consistent.

Example

You want to count how many times the pair (x, y) appears in a list of coordinate pairs. Keys must be hashable—so use (x, y) as the key: freq[(x, y)] = freq.get((x, y), 0) + 1. For "group by set of tags" (order doesn't matter), use frozenset(tags) as the key so that {'a','b'} and {'b','a'} map to the same key.

Hashable Requirements in Python

An object is hashable if:

It implements __hash__() returning an integer that does not change during the object's lifetime.
It implements __eq__() for equality. If a == b, then hash(a) == hash(b) must hold.
Immutable types (int, float, str, tuple, frozenset) are hashable when their contents are hashable. Mutable types (list, dict, set) are not hashable.
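A quick way to check these rules is to call the built-in hash() and see whether it raises TypeError. The helper name is_hashable below is an illustrative choice, not a standard library function.

```python
def is_hashable(obj):
    """Return True if hash(obj) succeeds, False if it raises TypeError."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


# Immutable containers of hashable items are hashable.
tuple_ok = is_hashable((1, 2))
frozenset_ok = is_hashable(frozenset({1, 2}))

# Mutable containers are not, and neither is a tuple that holds one.
list_ok = is_hashable([1, 2])
nested_ok = is_hashable((1, [2, 3]))

# Equal objects must hash equally: 1 == 1.0, so their hashes match.
same_hash = hash(1) == hash(1.0)
```

Because 1 and 1.0 compare equal and hash equally, they act as the same dict key.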

Diagram: Tuple vs Frozenset as Key

  Tuple (order matters):          Frozenset (order doesn't matter):
  (1, 2) and (2, 1) are          frozenset({1,2}) == frozenset({2,1})
  different keys.                same key → one bucket for "set of 1 and 2"

  Points: (1,2) → count 3        Tags: {"a","b"} → group [item1, item2]
          (2,1) → count 1                 {"b","a"} → same group

Using Tuples as Composite Keys

When the key is a fixed-length combination of values (pair, triple, etc.), use a tuple. Tuples are hashable if all elements are hashable.

# Count occurrences of (x, y) pairs
points = [(1, 2), (1, 3), (1, 2), (2, 2)]
freq = {}
for p in points:
    freq[p] = freq.get(p, 0) + 1
print(freq)   # {(1, 2): 2, (1, 3): 1, (2, 2): 1}

# Group by (row, col) in a grid
grid_data = {(0, 0): 'A', (0, 1): 'B', (1, 0): 'C'}
# (row, col) is a natural composite key

Order matters: (1, 2) and (2, 1) are different keys. Use a tuple when the order of components is part of the key.

Using Frozenset for Set-Like Keys

When the key is a set of items (order doesn't matter), use frozenset. Two sets with the same elements must map to the same key.

# Group anagrams: key = set of (char, count) or just sorted tuple
# Alternative: key = frozenset(Counter(s).items()) so "aab" and "aba" match
from collections import Counter

def group_by_char_set(words):
    groups = {}
    for w in words:
        # frozenset of (char, count) pairs - order doesn't matter
        key = frozenset(Counter(w).items())
        groups.setdefault(key, []).append(w)
    return list(groups.values())

# Or simpler for anagrams: key = tuple(sorted(s))
def group_anagrams_sorted(words):
    groups = {}
    for w in words:
        key = tuple(sorted(w))
        groups.setdefault(key, []).append(w)
    return list(groups.values())
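The "group by set of tags" case from the earlier example can be sketched the same way. The input shape (a list of (name, tags) pairs) and the function name group_by_tags are assumptions made for illustration.

```python
def group_by_tags(items):
    """Group item names by their tag set, ignoring tag order and duplicates."""
    groups = {}
    for name, tags in items:
        key = frozenset(tags)          # order-insensitive, hashable key
        groups.setdefault(key, []).append(name)
    return groups


items = [
    ("item1", ["a", "b"]),
    ("item2", ["b", "a"]),   # same tags in a different order
    ("item3", ["a"]),
]
groups = group_by_tags(items)
```

Here item1 and item2 land in the same group because frozenset(["a", "b"]) equals frozenset(["b", "a"]).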

Making a Custom Class Hashable

To use your class instances as dict keys, implement __hash__ and __eq__. The object should be immutable (or at least the fields used in __hash__ and __eq__ must not change after creation).

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        if not isinstance(other, Point):
            return False
        return self.x == other.x and self.y == other.y

    def __hash__(self):
        return hash((self.x, self.y))   # delegate to tuple's hash

# Now Point can be used as key
d = {}
d[Point(1, 2)] = "hello"
print(d[Point(1, 2)])   # "hello"

Rule: Include in __hash__ exactly the same fields you use in __eq__. Typically return hash((self.a, self.b, ...)) so that equal objects get the same hash.
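As an alternative to writing __eq__ and __hash__ by hand, the standard library's dataclasses module can generate both. The sketch below uses a hypothetical FrozenPoint class; frozen=True makes instances read-only, and together with the default eq=True it gives the class a field-based __hash__.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrozenPoint:
    x: int
    y: int


d = {FrozenPoint(1, 2): "hello"}
value = d[FrozenPoint(1, 2)]     # equal fields -> equal hash -> same key
```

Attempting to assign to a field of a frozen instance raises dataclasses.FrozenInstanceError (a subclass of AttributeError), which enforces the "fields used in the hash must not change" rule for you.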

Designing a Simple Hash for Indices (i, j)

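When you have two indices i, j in a bounded range (e.g. 0 to n-1), you can map them to a single integer to use as a key: key = i * n + j (row-major) or i + j * n (column-major). For a grid that is not square, the multiplier is the size of the inner dimension (the number of columns for row-major). This avoids building tuples and can be faster. Reverse (row-major): i, j = key // n, key % n.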

def flatten_2d(i, j, cols):
    return i * cols + j

def unflatten(key, cols):
    return key // cols, key % cols

# Use in dict for 2D memoization
n, m = 10, 10
memo = {}
def dfs(i, j):
    k = flatten_2d(i, j, m)
    if k in memo:
        return memo[k]
    # ... compute and store memo[k] = result
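The dfs above leaves the computation open. As one possible completion, the sketch below counts paths from the top-left to the bottom-right cell of a grid, moving only down or right. The problem choice and the name count_paths are assumptions for illustration; only the flattened-key memo pattern comes from the text.

```python
def flatten_2d(i, j, cols):
    return i * cols + j


def unflatten(key, cols):
    return key // cols, key % cols


def count_paths(rows, cols):
    """Count down/right paths from (0, 0) to (rows-1, cols-1)."""
    memo = {}   # flattened (i, j) -> number of paths from that cell

    def dfs(i, j):
        if i >= rows or j >= cols:
            return 0                     # stepped outside the grid
        if i == rows - 1 and j == cols - 1:
            return 1                     # reached the target cell
        k = flatten_2d(i, j, cols)
        if k in memo:
            return memo[k]
        memo[k] = dfs(i + 1, j) + dfs(i, j + 1)
        return memo[k]

    return dfs(0, 0)
```

Using (i, j) directly as the dict key would work equally well; the flattened integer is a small optimization, not a requirement.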

Why This Topic Matters

Dict keys: Many problems require grouping or lookup by a composite key (pair, triple, set). Tuples and frozensets are the standard way; custom classes are needed when you want a named type as key.
Immutability: If you use a list as part of a key, Python will raise "unhashable type: list". Convert to tuple(lst) or frozenset(lst) as appropriate.
Interview clarity: Saying "I'll use (a, b) as the key" or "I'll use frozenset for set equality" shows you understand hashability and key design.

Common Mistakes

Common Mistake

Using a list or dict as a key. d[[1,2]] = 3 raises TypeError. Use d[tuple([1,2])] or d[(1,2)]. If the key is a set (order doesn't matter), use d[frozenset([1,2])].

Mutable fields in __hash__: If __hash__ depends on a field that can change (for example, it hashes a tuple built from a list attribute), mutating that field after the object is stored as a key changes its hash, and the dict can no longer find the entry. Prefer immutable fields for hashable types.
Tuple with mutable elements: (1, [2, 3]) is not hashable because the list is mutable. Only tuples of hashable elements are hashable.

Expert Tip

For composite keys: use tuple when order matters (e.g. (i, j), (a, b)); use frozenset when you need set equality (e.g. same set of tags). For custom classes, implement __eq__ and __hash__ using the same fields and keep those fields immutable.

Interview Insight

When the problem says "group by" or "count by" a composite (pair, set, etc.), say: "I'll use a dict with a composite key. For a pair I'll use a tuple (a, b); for a set I'll use frozenset so order doesn't matter. Keys must be hashable, so I'll avoid lists." Then write the loop with key = (x, y) or key = frozenset(...) and d[key] = d.get(key, 0) + 1 (or setdefault for lists).

Practice Problems

LeetCode 49: Group Anagrams (key = tuple(sorted(s)) or frozenset(Counter(s).items())).
Problems that count pairs (i, j) or state (i, j): use (i, j) or flatten to i*n+j as key.
DP or memoization with 2D state: dict key = (i, j) or flattened index.

Summary

Hashable: Implement __hash__ and __eq__; equal objects must have the same hash; keep key state immutable.
Composite keys: Use tuple for ordered pairs/tuples (e.g. (i, j), (a, b)); use frozenset for set-like keys where order doesn't matter.
Custom class: __hash__ and __eq__ based on the same fields; hash((self.x, self.y)) is a simple pattern. For 2D indices, flatten with i * cols + j if you need a single integer key.

Section 11: Trees

This section covers trees from fundamentals to advanced structures and techniques. You will learn terminology (root, depth, height, subtree), traversals (inorder, preorder, postorder, level-order), height and diameter, balanced trees, LCA, path sum problems, serialization, and then BST, AVL, Red-Black trees, Trie, Segment Tree, Fenwick Tree, Sparse Table, Binary Lifting, Euler Tour, Heavy-Light Decomposition, and Centroid Decomposition. Master these to handle tree problems in interviews and contests.

11.1 Tree Terminology

Introduction

A tree is a hierarchical data structure consisting of nodes connected by edges, with no cycles and exactly one path between any two nodes. One node is designated the root; every other node has exactly one parent and zero or more children. Trees are fundamental in DSA: binary trees, BSTs, heaps, tries, and segment trees all build on this structure. To read problems and implement solutions correctly, you must be precise about terms like root, leaf, height, depth, ancestor, and subtree. This section defines all essential tree terminology with examples and ties it to how we represent trees in code.

Mental Model

Picture an upside-down tree: the root at the top, branches (edges) going down to children, and leaves at the bottom. "Depth" is how far down you are from the root (steps from the top). "Height" is how far down the longest branch goes from a given node (steps to the lowest leaf). Recursion on trees almost always says: "Do something at this node, then do the same thing for the left and right subtrees"—so the subtree is your unit of thinking.

What Is a Tree (Formal)

Tree: An undirected, connected, acyclic graph. So: (1) there is a path between any two nodes, (2) there are no cycles.
Rooted tree: A tree with one node chosen as the root. All edges are then thought of as directed "away from the root" (parent → child).
Node: An element of the tree that holds a value (and possibly left/right pointers in a binary tree).
Edge: A connection between two nodes (parent–child). In a rooted tree we say the edge goes from parent to child.

Real-World Analogy

Think of a company org chart: the CEO is the root; each person has one manager (parent) and possibly several reports (children). The hierarchy has no loops (no one reports to themselves through a chain), and from any person you can trace a single path up to the CEO. "Depth" is how many levels below the CEO; "height" of a person is the longest chain of reports below them.

Example

Tree with root 1, left child 2, right child 3; node 2 has left 4 and right 5. So nodes 4 and 5 are leaves; 2 and 3 are internal. The path from root to 5 is 1 → 2 → 5. Depth of 5 is 2 (if root has depth 0). Height of node 2 is 1; height of the tree (root) is 2.

Core Terminology (All Points)

1. Root

The root is the topmost node; it has no parent. All other nodes are descendants of the root. In code we usually hold a reference to the root and traverse from there.

2. Parent and Child

For an edge (u, v) in the rooted tree, if the edge is directed from u to v, then u is the parent of v, and v is a child of u. Every node has at most one parent; the root has none.

3. Sibling

Nodes that share the same parent are siblings. In a binary tree, the left and right children of a node are siblings.

4. Leaf (External Node)

A leaf is a node with no children. Leaves are the "bottom" of the tree. In a binary tree, a leaf has both left and right as null/None.

5. Internal Node

An internal node is any node that has at least one child (i.e. not a leaf).

6. Edge

An edge is a link between a parent and a child. A tree with n nodes has exactly n − 1 edges (this follows from connected + acyclic).

7. Path

A path between two nodes is the sequence of edges connecting them. In a tree there is exactly one path between any two nodes. The path length is the number of edges (or the number of nodes minus one on that path).

8. Depth (of a node)

The depth of a node is the number of edges on the path from the root to that node. The root has depth 0. Its children have depth 1, and so on. Depth is "distance from root."

9. Level

Level is sometimes used like depth: level 0 = root, level 1 = nodes at depth 1, etc. In some contexts "level" is 1-based (root at level 1); clarify in problems. Here we use depth (0-based from root) consistently.

10. Height (of a node)

The height of a node is the number of edges on the longest path from that node down to a leaf. Leaves have height 0. The height of a node is the max height of its children plus one (or 0 if no children).

11. Height of the tree

The height of the tree is the height of the root. Equivalently, it is the maximum depth of any node (since the deepest leaf is at depth = tree height). Empty tree (no root) is often defined to have height −1 so that a single-node tree has height 0.

12. Degree (of a node)

The degree of a node is the number of children it has. In a binary tree, each node has degree 0, 1, or 2 (left child, right child, or both).

13. Subtree

For any node u, the subtree rooted at u is the node u together with all its descendants and the edges between them. It is itself a tree with root u. Recursive algorithms often "process the subtree at u" by processing u and then recursively processing its left and right subtrees.

14. Ancestor and Descendant

If there is a path from node u down to node v (following parent-to-child edges), then u is an ancestor of v, and v is a descendant of u. The root is an ancestor of every node. A node is an ancestor of itself (and a descendant of itself) in many definitions; in others "proper" ancestor excludes the node itself. Be consistent: in LCA problems, a node is usually considered its own ancestor.

15. Binary tree

A binary tree is a rooted tree in which each node has at most two children, typically called left and right. This is the most common tree type in interviews. A full binary tree has every node with 0 or 2 children; a complete binary tree has every level full except possibly the last, which is filled from left to right (the shape used for heaps).

Diagram (ASCII)

           1         (root; depth 0, height 2)
          / \
         2   3       (depth 1; 2 has height 1, 3 has height 0)
        / \
       4   5         (leaves; depth 2, height 0)

  Edges: (1,2), (1,3), (2,4), (2,5).  5 nodes, 4 edges.
  Path from 1 to 5: 1 → 2 → 5 (length 2).
  Subtree at 2: nodes {2, 4, 5}.  Ancestors of 5: 5, 2, 1.

Representation in Code (Python)

We represent a binary tree node with a class that has a value and optional left and right children. The tree is referenced by its root.

class TreeNode:
    def __init__(self, val=0, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right

# Example: build the tree in the diagram
root = TreeNode(1)
root.left = TreeNode(2)
root.right = TreeNode(3)
root.left.left = TreeNode(4)
root.left.right = TreeNode(5)

# Check leaf: a node is a leaf iff it exists and has no children.
# Compare against None explicitly so the function always returns a bool.
def is_leaf(node):
    return node is not None and node.left is None and node.right is None

Key Formulas and Facts

n nodes ⇒ n − 1 edges.
Height h (max depth) ⇒ at most 2^(h+1) − 1 nodes in a binary tree (reached by a perfect tree, where every level is full). At least h+1 nodes (chain).
Depth of node = number of edges from root to node. Height of node = longest path from node to a leaf (edges).
For the root, depth = 0 and height = tree height. For a leaf, height = 0.

Edge Cases and Conventions

Empty tree (root is None): No nodes, no edges. Height is conventionally −1 so that a single-node tree has height 0 (0 = 1 + max(−1, −1)).
Single node: One node, zero edges. Depth 0, height 0. It is both root and leaf.
Skew tree (chain): Every node has at most one child. Height = n − 1 (n nodes). This is the worst case for height when doing recursion (O(n) stack depth).
Level 0 vs level 1: Some definitions put the root at "level 1." In this course we use depth 0 for the root; always check the problem statement.

Why This Topic Matters

Problem statements use "depth", "height", "subtree", "ancestor" precisely. Misreading leads to wrong solutions (e.g. confusing depth and height).
Recursion: "Process the subtree at node u" means process u and recurse on left and right. Base case is often "if node is None" or "if node is leaf."
Interviews: You may be asked "what is the height of this tree?" or "how many edges are there?"—answering correctly shows you know the definitions.

Common Mistakes

Common Mistake

Confusing depth and height. Depth is "distance from root" (root has depth 0). Height is "longest path from this node down to a leaf" (leaves have height 0). The height of the tree is the height of the root, which equals the maximum depth of any node.

Empty tree: If the root is None, the tree has no nodes. Convention: height of empty tree = −1 so that height(single node) = 0.
Level vs depth: Some problems use "level" 1-based. Always confirm: "Is the root at depth 0 or 1?"

Expert Tip

When implementing "height of tree", do: if root is None return −1; else return 1 + max(height(root.left), height(root.right)). When implementing "depth", pass current depth as a parameter and increment when going to children.
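The tip above can be sketched directly. The depths helper is an illustrative name; it records depths in a dict keyed by node value, so it assumes node values are distinct.

```python
class TreeNode:
    def __init__(self, val=0, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right


def height(root):
    """Height in edges; the empty tree has height -1."""
    if root is None:
        return -1
    return 1 + max(height(root.left), height(root.right))


def depths(root, depth=0, out=None):
    """Map each node value to its depth (root at depth 0)."""
    if out is None:
        out = {}
    if root is None:
        return out
    out[root.val] = depth
    depths(root.left, depth + 1, out)    # children are one level deeper
    depths(root.right, depth + 1, out)
    return out


# The tree from the diagram: 1 -> (2 -> (4, 5), 3)
root = TreeNode(1, TreeNode(2, TreeNode(4), TreeNode(5)), TreeNode(3))
```

Height flows upward (each call returns a value built from its children), while depth flows downward (each call passes a value to its children).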

Interview Insight

If asked to define terms, say: "The root has no parent. Depth is distance from root; height is the longest path from a node down to a leaf. A leaf has no children. The tree height is the root's height. A subtree at a node includes that node and all descendants." Then you can implement height/depth correctly in code.

Practice Problems

Any tree problem that uses depth, height, or subtree (e.g. max depth, min depth, count nodes).
LeetCode 104: Maximum Depth of Binary Tree; LeetCode 111: Minimum Depth of Binary Tree.

Summary

Tree: Connected, acyclic; n nodes, n−1 edges. Root: top node; parent/child: one edge direction; leaf: no children; internal: has at least one child.
Depth: edges from root to node (root depth 0). Height (of node): longest path from node to leaf (leaf height 0). Tree height = root height = max depth.
Subtree at u: u and all descendants. Ancestor/descendant: path from u down to v. Binary tree: at most two children per node (left, right). Code: TreeNode(val, left, right); empty tree height = −1.

11.2 Binary Tree Traversals

Introduction

Traversal means visiting every node of a binary tree in a well-defined order. The four main traversals are: inorder (left → root → right), preorder (root → left → right), postorder (left → right → root), and level-order (BFS, level by level). Recursive implementations are simple and follow the same pattern: handle the base case (null), then recurse on left and right with the "visit" step placed before, between, or after the two recursions. Traversals are the basis for many tree problems (serialization, BST validation, path sum, etc.).

Real-World Analogy

Imagine exploring a branching maze. Preorder: you mark the room (visit) as soon as you enter, then explore left branch, then right. Inorder: explore left first, then mark the room, then explore right—like reading a tree from "left to right" on the page. Postorder: explore left and right first, then mark the room when you're leaving—useful when you need info from children before processing the node. Level-order: visit all rooms on the first floor, then the second, then the third.

Example

Tree: root 1, left 2, right 3; node 2 has left 4, right 5. Preorder: 1, 2, 4, 5, 3. Inorder: 4, 2, 5, 1, 3. Postorder: 4, 5, 2, 3, 1. Level-order: 1, 2, 3, 4, 5.

Diagram: Tree and Traversal Order

           1         (root)
          / \
         2   3
        / \
       4   5

  Preorder  (Root → Left → Right):  1 → 2 → 4 → 5 → 3    (visit as you enter node)
  Inorder   (Left → Root → Right):  4 → 2 → 5 → 1 → 3    (visit between left and right)
  Postorder (Left → Right → Root):  4 → 5 → 2 → 3 → 1    (visit as you leave node)
  Level-order (BFS):                1 → 2 → 3 → 4 → 5    (by levels: 1; then 2,3; then 4,5)

Recursive Traversals

Base case: if root is None, return (or return []). Otherwise, the order of "visit root" relative to "recurse left" and "recurse right" defines the traversal.

Preorder (Root → Left → Right)

Visit the node first, then left subtree, then right subtree. Used for copying trees, prefix notation, or when you need the root before children.

def preorder(root):
    if not root:
        return
    print(root.val)      # visit
    preorder(root.left)
    preorder(root.right)

# Return list instead of print
def preorder_list(root):
    if not root:
        return []
    return [root.val] + preorder_list(root.left) + preorder_list(root.right)

Inorder (Left → Root → Right)

Recurse left, visit node, recurse right. In a BST, inorder gives values in sorted order. Used for BST validation and "sorted" output.

def inorder(root):
    if not root:
        return
    inorder(root.left)
    print(root.val)       # visit
    inorder(root.right)

def inorder_list(root):
    if not root:
        return []
    return inorder_list(root.left) + [root.val] + inorder_list(root.right)
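The "inorder of a BST is sorted" property gives a simple validity check. The sketch below assumes the strict convention (no duplicate keys, as in LeetCode 98); the names is_bst and _collect are illustrative.

```python
class TreeNode:
    def __init__(self, val=0, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right


def _collect(root, out):
    """Append values in inorder (left, root, right) to out."""
    if root is None:
        return
    _collect(root.left, out)
    out.append(root.val)
    _collect(root.right, out)


def is_bst(root):
    """True if the inorder sequence is strictly increasing."""
    vals = []
    _collect(root, vals)
    return all(a < b for a, b in zip(vals, vals[1:]))


valid = TreeNode(2, TreeNode(1), TreeNode(3))
invalid = TreeNode(5, TreeNode(1), TreeNode(4, TreeNode(3), TreeNode(6)))
```

In the invalid tree the inorder sequence is 1, 5, 3, 4, 6, and the drop from 5 to 3 exposes the violation.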

Postorder (Left → Right → Root)

Recurse left, recurse right, then visit. Used when you need children processed first (e.g. height, diameter, deleting a tree).

def postorder(root):
    if not root:
        return
    postorder(root.left)
    postorder(root.right)
    print(root.val)       # visit

def postorder_list(root):
    if not root:
        return []
    return postorder_list(root.left) + postorder_list(root.right) + [root.val]

Level-Order (BFS)

Visit nodes level by level: first the root, then all nodes at depth 1, then depth 2, etc. Use a queue: enqueue root; while queue not empty, dequeue a node, visit it, enqueue its left and right (if non-null).

from collections import deque

def level_order(root):
    if not root:
        return []
    q = deque([root])
    result = []
    while q:
        node = q.popleft()
        result.append(node.val)
        if node.left:
            q.append(node.left)
        if node.right:
            q.append(node.right)
    return result

# Level-order as list of levels (each level is a list)
def level_order_by_levels(root):
    if not root:
        return []
    result = []
    q = deque([root])
    while q:
        level = []
        for _ in range(len(q)):
            node = q.popleft()
            level.append(node.val)
            if node.left:
                q.append(node.left)
            if node.right:
                q.append(node.right)
        result.append(level)
    return result   # e.g. [[1], [2, 3], [4, 5]]

Summary of Order

Traversal	Order	Typical use
Preorder	Root → Left → Right	Copy tree, prefix expr, serialize
Inorder	Left → Root → Right	BST sorted order, validate BST
Postorder	Left → Right → Root	Height, diameter, delete tree
Level-order	BFS by level	Print by level, min depth, BFS problems

Time and Space Complexity

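Time: O(n), since each node is visited once. The list-returning recursive versions above also copy lists at every node, which adds up to O(n·h) work (O(n^2) on a skew tree); appending to one shared result list keeps the total at O(n).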
Space (recursive): O(h) call stack, where h = tree height. Worst O(n) for a skew tree.
Space (level-order queue): O(w) where w is max level width; worst O(n).
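The list-returning recursive traversals build a fresh list at every call. A common alternative, sketched here with the illustrative name inorder_fast, appends to a single shared list so each node costs O(1) extra work.

```python
class TreeNode:
    def __init__(self, val=0, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right


def inorder_fast(root):
    result = []

    def visit(node):
        if node is None:
            return
        visit(node.left)
        result.append(node.val)   # amortized O(1), no list copying
        visit(node.right)

    visit(root)
    return result


# The tree from the diagram: 1 -> (2 -> (4, 5), 3)
root = TreeNode(1, TreeNode(2, TreeNode(4), TreeNode(5)), TreeNode(3))
```

Moving the append before or after the two recursive calls gives the preorder and postorder versions of the same pattern.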

Why This Topic Matters

Most tree problems involve a traversal (or a variant). Knowing the four orders and when to use each is essential.
BST: inorder gives sorted order; a BST with distinct keys can be rebuilt from its preorder sequence alone, because its inorder sequence is just the sorted keys.
Iterative versions (next topic) avoid stack overflow for very deep trees and are sometimes required.
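As an illustration of the BST point above, here is a sketch that rebuilds a BST from its preorder sequence using value bounds. It assumes distinct keys and a valid preorder input; bst_from_preorder is an illustrative name, not a function defined elsewhere in the course.

```python
class TreeNode:
    def __init__(self, val=0, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right


def bst_from_preorder(values):
    idx = 0   # next unread position in the preorder sequence

    def build(lo, hi):
        nonlocal idx
        if idx == len(values):
            return None
        v = values[idx]
        if not (lo < v < hi):
            return None           # v belongs to some ancestor's other subtree
        idx += 1
        node = TreeNode(v)
        node.left = build(lo, v)   # smaller keys go left
        node.right = build(v, hi)  # larger keys go right
        return node

    return build(float("-inf"), float("inf"))


def preorder_list(root):
    if root is None:
        return []
    return [root.val] + preorder_list(root.left) + preorder_list(root.right)


def inorder_list(root):
    if root is None:
        return []
    return inorder_list(root.left) + [root.val] + inorder_list(root.right)


rebuilt = bst_from_preorder([8, 5, 1, 7, 10, 12])
```

Each value is consumed exactly once, so the rebuild runs in O(n) time.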

Edge Cases

Empty tree (root is None): Return [] or skip; do not visit.
Single node: All four traversals output just that one value.
Skew tree: Recursive traversals still O(n) time but O(n) stack depth; iterative avoids stack overflow.

Common Mistakes

Swapping left and right. Inorder is left → root → right; reversing to right → root → left gives reverse sorted order in a BST, not sorted.
Level-order: forgetting to check null children. Enqueue left/right only if they are non-null; otherwise the loop later dequeues None and fails on node.val.

Interview Insight

"I'll use recursive traversals: preorder visit then left then right; inorder left then visit then right (BST gives sorted); postorder left then right then visit. Level-order is BFS with a queue. All O(n) time; recursion uses O(h) stack. For iterative I'd use an explicit stack (next topic)."

Practice Problems

LeetCode 94: Binary Tree Inorder Traversal; 144: Preorder; 145: Postorder; 102: Level Order.
LeetCode 98: Validate BST (inorder gives sorted order).

Expert Tip

Remember: Pre = root first; In = root in the middle (left, root, right); Post = root last. Level-order = BFS with a queue.

Summary

Preorder: root, left, right. Inorder: left, root, right (BST → sorted). Postorder: left, right, root.
Level-order: BFS with a queue; can return flat list or list of levels.
Recursive: base case null; place "visit" before, between, or after the two recursions. Time O(n), space O(h) for recursion.

11.3 Iterative Traversal

Introduction

Iterative traversal implements inorder, preorder, and postorder using an explicit stack instead of the call stack. This avoids stack overflow on very deep trees and gives you full control over the visit order. Preorder iterative is straightforward (push root; pop, visit, push right then left). Inorder requires "go left until null, then pop/visit and go right." Postorder can be done with two stacks or by doing a "reverse preorder" (root, right, left) and reversing the result. Level-order uses a queue (already iterative). This section gives the standard iterative implementations for preorder, inorder, and postorder.

Why Iterative?

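Stack overflow: Recursion uses the call stack; a very tall tree can overflow it (in Python this surfaces as a RecursionError once the depth passes the interpreter's recursion limit, 1000 frames by default). Iterative uses an explicit stack in heap memory.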
No recursion: Some environments or style guides prefer iterative code. Iterative also makes it easier to pause/resume or process in chunks.
Same complexity: Time O(n), space O(h) for the stack—same as recursion, but the stack is under your control.
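The pause/resume point can be sketched with a generator: because the traversal state lives in an explicit stack rather than the call stack, values can be produced one at a time and the traversal resumed later. The name inorder_iter is illustrative.

```python
class TreeNode:
    def __init__(self, val=0, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right


def inorder_iter(root):
    """Yield values in inorder, one per next() call, without recursion."""
    stack = []
    curr = root
    while curr or stack:
        while curr:                 # go as far left as possible
            stack.append(curr)
            curr = curr.left
        curr = stack.pop()
        yield curr.val              # traversal pauses here until resumed
        curr = curr.right


# The tree from the diagram: 1 -> (2 -> (4, 5), 3)
root = TreeNode(1, TreeNode(2, TreeNode(4), TreeNode(5)), TreeNode(3))

# A left-leaning chain of 5000 nodes, deeper than the default recursion limit.
chain = None
for v in range(5000):
    chain = TreeNode(v, left=chain)
```

The same generator handles the 5000-node chain, where a plain recursive traversal would exceed Python's default recursion limit.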

Preorder (Iterative)

Order: root → left → right. Use a stack. Push the root. While the stack is not empty: pop a node, visit it, then push its right child (if any), then its left child (if any). Pushing right before left ensures we process left before right when we pop.

Diagram: Stack During Preorder

  Tree:    1
          / \
         2   3
  Preorder: 1, 2, 3

  Step 1: stack=[1]           → pop 1, visit 1, push 3 then 2  → stack=[3,2]
  Step 2: stack=[3,2]        → pop 2, visit 2, push nothing   → stack=[3]
  Step 3: stack=[3]          → pop 3, visit 3, push nothing   → stack=[]
  Done. Output: 1, 2, 3

def preorder_iterative(root):
    if not root:
        return []
    stack = [root]
    result = []
    while stack:
        node = stack.pop()
        result.append(node.val)
        if node.right:
            stack.append(node.right)
        if node.left:
            stack.append(node.left)
    return result

Inorder (Iterative)

Order: left → root → right. Idea: go as far left as possible, pushing nodes onto the stack. When you hit null, pop (that's the next node to visit), visit it, then set current to its right and repeat (go left from there).

def inorder_iterative(root):
    result = []
    stack = []
    curr = root
    while curr or stack:
        while curr:
            stack.append(curr)
            curr = curr.left
        curr = stack.pop()
        result.append(curr.val)
        curr = curr.right
    return result

How it works: The inner while curr pushes all left descendants. When curr becomes None, we've reached the leftmost node; we pop it (visit), then move to its right subtree and repeat. This mimics "recurse left, visit, recurse right" without recursion.

Postorder (Iterative)

Order: left → right → root. One clean approach: do a modified preorder that visits root, then right, then left. The resulting sequence is the reverse of postorder. So run that and reverse the result.

def postorder_iterative(root):
    if not root:
        return []
    stack = [root]
    result = []
    while stack:
        node = stack.pop()
        result.append(node.val)
        if node.left:
            stack.append(node.left)
        if node.right:
            stack.append(node.right)
    return result[::-1]   # reverse: now left, right, root

Alternatively, use two stacks: push root to stack1; while stack1 not empty, pop to stack2 and push left then right to stack1. Then pop everything from stack2—that's postorder. Or use a single stack with a "last visited" pointer to avoid revisiting; the reverse-preorder trick is usually simpler to remember.
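Both alternatives described above can be sketched as follows. The function names are illustrative; the logic follows the two descriptions (two stacks, and one stack with a "last visited" pointer).

```python
class TreeNode:
    def __init__(self, val=0, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right


def postorder_two_stacks(root):
    if not root:
        return []
    stack1, stack2 = [root], []
    while stack1:
        node = stack1.pop()
        stack2.append(node)              # collected in root, right, left order
        if node.left:
            stack1.append(node.left)
        if node.right:
            stack1.append(node.right)
    return [node.val for node in reversed(stack2)]


def postorder_one_stack(root):
    result, stack = [], []
    curr, last = root, None
    while curr or stack:
        while curr:                      # go left, pushing nodes
            stack.append(curr)
            curr = curr.left
        node = stack[-1]
        if node.right and node.right is not last:
            curr = node.right            # right subtree not done yet
        else:
            result.append(node.val)      # both subtrees done: visit
            last = stack.pop()
    return result


# The tree from the diagram: 1 -> (2 -> (4, 5), 3)
root = TreeNode(1, TreeNode(2, TreeNode(4), TreeNode(5)), TreeNode(3))
```

The one-stack version produces values in true postorder as it goes, so it suits cases where the output cannot be reversed at the end, at the cost of slightly trickier bookkeeping.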

Summary Table

Traversal	Iterative idea
Preorder	Stack: pop, visit, push right, push left
Inorder	Go left pushing nodes; pop & visit; go to right
Postorder	Preorder root→right→left, then reverse

Time and Space Complexity

Time: O(n) — each node pushed and popped once.
Space: O(h) for the stack, h = tree height. Worst O(n) for a skew tree.

Common Mistakes

Preorder: pushing left before right. Then the left child would be popped after the right. We want to visit left first, so push right then left (so left is on top of the stack).
Inorder: forgetting to go right after pop. After visiting the node, set curr = curr.right to process the right subtree (or exit if null).
Postorder: wrong reverse. The trick is "preorder but root → right → left"; then reverse. If you do normal preorder (root → left → right) and reverse, you get right → left → root, which is the postorder of the mirrored tree, not postorder.
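The postorder pitfall can be checked on a small tree. The sketch below uses a recursive helper for brevity (the helper name root_first and the TreeNode constructor are assumptions for illustration) and compares the two reversals.

```python
class TreeNode:
    # Minimal node type assumed for this sketch.
    def __init__(self, val=0, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right


def root_first(node, right_before_left=False):
    """Visit the root, then both subtrees in the chosen order."""
    if node is None:
        return []
    if right_before_left:
        first, second = node.right, node.left
    else:
        first, second = node.left, node.right
    return ([node.val]
            + root_first(first, right_before_left)
            + root_first(second, right_before_left))


root = TreeNode(1, TreeNode(2, TreeNode(4), TreeNode(5)), TreeNode(3))

wrong = root_first(root)[::-1]                            # reversed normal preorder
correct = root_first(root, right_before_left=True)[::-1]  # reversed root, right, left

print(wrong)    # [3, 5, 4, 2, 1]  -> right, left, root
print(correct)  # [4, 5, 2, 3, 1]  -> left, right, root (postorder)
```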

Interview Insight

If asked "traverse without recursion," say: "I'll use an explicit stack. Preorder: push root, then while stack not empty pop, visit, push right then left. Inorder: go left pushing nodes until null, then pop and visit, then go right. Postorder: modified preorder root→right→left then reverse." Mention that space is still O(h) and time O(n).

Practice Problems

LeetCode 94: Binary Tree Inorder Traversal (iterative).
LeetCode 144: Binary Tree Preorder Traversal (iterative).
LeetCode 145: Binary Tree Postorder Traversal (iterative).

Expert Tip

Preorder iterative: push right before left so left is popped first. Postorder: do "preorder" but push left then right, then reverse the output. Inorder: "go left with stack, pop and visit, then go right."

Summary

Preorder iterative: stack with root; pop → visit → push right, then left.
Inorder iterative: while curr or stack: go left pushing nodes; pop (visit); curr = curr.right.
Postorder iterative: preorder but push left then right; reverse result. All O(n) time, O(h) space.

11.4 Height & Diameter

Introduction

The height of a tree (or a node) is the number of edges on the longest path from that node down to a leaf. The diameter (or width) of a tree is the length of the longest path between any two nodes (measured in edges). Both are computed naturally with a postorder traversal: you need information from the children (their heights) before you can compute the current node's height or the best path through the current node. Height is a building block for balance checks and many other tree problems; diameter is a classic interview question (LeetCode 543). This section gives recursive definitions, code, and examples.

Formal Definitions

Height of a node: height(node) = 0 if the node is a leaf (no children). Otherwise height(node) = 1 + max(height(left_child), height(right_child)). For a null node we define height(null) = -1 so that a single-node tree has height 0. Height of the tree = height of the root.

Diameter: The diameter is the maximum over all pairs of nodes (u, v) of the length (in edges) of the unique path between u and v. Equivalently, it is the maximum over all nodes of the length of the longest path that passes through that node. For a node with left height L and right height R (in edges), the longest path through that node has length L + R + 2 (two edges from the node to the two children, then L and R edges down). So we can compute diameter by considering at each node the candidate through = L_h + R_h + 2 and taking the max over all nodes and over the diameters of the left and right subtrees.

Height (Recursive Definition)

Height of a node: Longest path (in edges) from that node to a leaf. Leaves have height 0. For an internal node, height = 1 + max(height(left_child), height(right_child)). Height of the tree = height of the root. Convention: height of an empty tree (null) = −1, so a single-node tree has height 0.

def height(root):
    if root is None:
        return -1
    return 1 + max(height(root.left), height(root.right))

Example: Height

Tree: 1 (root) with left 2 and right 3; node 2 has left 4 and right 5. Leaves 3, 4, 5 have height 0. Node 2 has height 1 + max(0, 0) = 1. Node 1 has height 1 + max(1, 0) = 2. So tree height = 2.
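The worked example can be reproduced by building the same tree and calling the height function on every node. The TreeNode constructor used here is an assumption of this sketch; height repeats the definition given above so the block runs on its own.

```python
class TreeNode:
    # Minimal node type assumed for this sketch.
    def __init__(self, val=0, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right


def height(node):
    # Edge-based height: an empty tree is -1, so a leaf is 0.
    if node is None:
        return -1
    return 1 + max(height(node.left), height(node.right))


n4, n5, n3 = TreeNode(4), TreeNode(5), TreeNode(3)
n2 = TreeNode(2, n4, n5)
n1 = TreeNode(1, n2, n3)

heights = {node.val: height(node) for node in (n1, n2, n3, n4, n5)}
print(heights)  # {1: 2, 2: 1, 3: 0, 4: 0, 5: 0}
```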

Diameter (Longest Path Between Any Two Nodes)

The diameter is the maximum number of edges on any path between two nodes. That path may or may not pass through the root. For each node, the longest path through that node runs from a deepest leaf in the left subtree up to the node and down to a deepest leaf in the right subtree. With edge-based heights (null = −1), each leg has one edge from the node to its child plus the child's height, so the path has (left_height + 1) + (right_height + 1) = left_height + right_height + 2 edges. We compute this candidate at every node and take the maximum; comparing it with the best diameter found in the left and right subtrees covers paths that do not pass through the current node.

Recursive approach: Return both (height, diameter) for the subtree. Base case: null → height = −1, diameter = 0. At a node: get (L_height, L_diam) and (R_height, R_diam). Current height = 1 + max(L_height, R_height). L_height and R_height are the heights of the children, so the longest downward path from the current node into the left subtree has L_height + 1 edges (one edge to the left child, then L_height edges below it), and likewise R_height + 1 on the right. The path through the node that joins a left leaf to a right leaf therefore has (L_height + 1) + (R_height + 1) = L_height + R_height + 2 edges. Then diameter at this node = max(L_diam, R_diam, L_height + R_height + 2).

Many sources define "height" as the number of nodes on the path (so null = 0 and leaf = 1). Then the path through a node has L_height + R_height edges, or L_height + R_height + 1 nodes. This section uses the edge-based convention: height(null) = -1, height(leaf) = 0. Then:

Path through node (edges) = (L_height + 1) + (R_height + 1) = L_height + R_height + 2.

Some problems define diameter as the number of nodes on the longest path. Then path through node = 1 + L_height + R_height (current node + left path + right path in "node count" height). Always clarify "edges" vs "nodes" in the problem.
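For problems that count nodes rather than edges, a sketch under the node-count convention (null = 0, leaf = 1) might look like this. The name diameter_in_nodes and the TreeNode constructor are assumptions made for illustration.

```python
class TreeNode:
    # Minimal node type assumed for this sketch.
    def __init__(self, val=0, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right


def diameter_in_nodes(root):
    best = 0  # largest node count seen on any path so far

    def node_height(node):
        # Node-count height: empty tree is 0, a leaf is 1.
        nonlocal best
        if node is None:
            return 0
        left = node_height(node.left)
        right = node_height(node.right)
        # Current node plus the longest downward chain on each side.
        best = max(best, 1 + left + right)
        return 1 + max(left, right)

    node_height(root)
    return best


root = TreeNode(1, TreeNode(2, TreeNode(4), TreeNode(5)), TreeNode(3))
print(diameter_in_nodes(root))  # 4 nodes (4, 2, 1, 3), i.e. 3 edges
```

With this convention an empty tree gives 0 and a single node gives 1, so the result is always one more than the edge-based diameter for a non-empty tree.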

Implementation: Diameter (Edge-Based)

def diameter_of_binary_tree(root):
    def dfs(node):
        if node is None:
            return -1, 0   # height, diameter
        L_h, L_d = dfs(node.left)
        R_h, R_d = dfs(node.right)
        height = 1 + max(L_h, R_h)
        through = L_h + R_h + 2   # edges through this node
        diam = max(L_d, R_d, through)
        return height, diam

    if root is None:
        return 0
    _, d = dfs(root)
    return d

Example: Diameter

Same tree: 1 (root), left 2 (with children 4, 5), right 3. Heights: 4,5,3 → 0; 2 → 1; 1 → 2. Candidate paths: 4 → 2 → 5 has 2 edges; 4 → 2 → 1 → 3 (or 5 → 2 → 1 → 3) has 3 edges. No path is longer, so diameter = 3. In the code: at node 1, L_h=1, R_h=0, through = 1+0+2 = 3; at node 2, through = 0+0+2 = 2. So max diameter = 3.

Diagram: Diameter "Through" a Node

           1
          / \
         2   3       At node 1: left height L_h=1 (path 2→4 or 2→5), right height R_h=0 (node 3).
        / \          Path THROUGH node 1 = (edge 1→2) + (edge 1→3) + (path in left) + (path in right)
       4   5               = 1 + 1 + L_h + R_h  =  2 + 1 + 0  =  3 edges.  So diameter ≥ 3.

  Longest path in tree:  4 --- 2 --- 1 --- 3   (edges: 4-2, 2-1, 1-3)  → length 3 ✓

Why This Topic Matters

Height is used everywhere: balance factor (AVL), level-order, "minimum depth," and as a subroutine for diameter.
Diameter is a classic problem (e.g. LeetCode 543). The pattern "postorder, combine left and right info" is reusable.
Getting the base case and the "through node" formula right (edges vs nodes) is a common interview pitfall.

Time and Space Complexity

Time: O(n) — each node visited once. We do a constant amount of work per node (compare, add, max).
Space: O(h) for the call stack, h = tree height. Worst case O(n) for a skew tree.

Edge Cases

Empty tree (root is None): Diameter 0; height −1. Return 0 for diameter from the wrapper if needed.
Single node: Height 0, diameter 0 (no path of length > 0 between two nodes).
Only one child: At a node with only a left child, right height = −1; path through = L_h + (−1) + 2 = L_h + 1. Still correct.

Common Mistakes

Common Mistake

Counting nodes vs edges. If the problem says "diameter = longest path" and measures in nodes, then at each node the path through it is 1 + L_height + R_height (if height is "number of nodes to leaf"). If it measures in edges, use L_height + R_height + 2 with edge-based height (null = −1). Check the problem statement.

Wrong base case for height: Use −1 for null so that height(leaf) = 0. If you return 0 for null, then height(leaf) becomes 1 and all formulas shift.
Forgetting to consider diameter inside subtrees: The longest path might not pass through the root. So diameter = max(left_diam, right_diam, through_current).
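A small tree where the longest path avoids the root makes the last mistake concrete. The sketch below repeats the (height, diameter) helper so it runs on its own; the tree shape, the name height_and_diameter, and the TreeNode constructor are assumptions chosen for the example.

```python
class TreeNode:
    # Minimal node type assumed for this sketch.
    def __init__(self, val=0, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right


def height_and_diameter(node):
    # Edge-based: an empty subtree has height -1 and diameter 0.
    if node is None:
        return -1, 0
    lh, ld = height_and_diameter(node.left)
    rh, rd = height_and_diameter(node.right)
    return 1 + max(lh, rh), max(ld, rd, lh + rh + 2)


# Root 1 has only a left child 2; the long path 5-3-2-4-6 lies inside that subtree.
root = TreeNode(1,
                TreeNode(2,
                         TreeNode(3, TreeNode(5)),
                         TreeNode(4, None, TreeNode(6))))

lh, _ = height_and_diameter(root.left)
rh, _ = height_and_diameter(root.right)
through_root = lh + rh + 2
_, diameter = height_and_diameter(root)

print(through_root)  # 3
print(diameter)      # 4
```

Returning only the value computed at the root would report 3 here; taking the max with the subtree diameters gives the correct 4.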

Expert Tip

For diameter, use a helper that returns (height, diameter). Height = 1 + max(L_h, R_h). Diameter through node = L_h + R_h + 2 (edges). Overall diameter = max(L_d, R_d, through). Return (height, diameter) from the helper.

Interview Insight

State clearly: "I'll use postorder so I have both children's heights. For each node I'll compute the longest path through it as left height plus right height plus two (for the two edges to the children). The diameter is the max of that over all nodes and the diameters in the subtrees." Then code the dfs returning (height, diameter).

Practice Problems

LeetCode 543: Diameter of Binary Tree (edge-based diameter).
LeetCode 110: Balanced Binary Tree (use height; check |left_h − right_h| ≤ 1 and recurse).

Summary

Height: null → −1; else 1 + max(height(left), height(right)). Tree height = height(root).
Diameter: longest path (in edges) between any two nodes. At each node, path through = L_height + R_height + 2; diameter = max(L_diam, R_diam, through).
Use postorder; return (height, diameter) from helper. O(n) time, O(h) space. Clarify edges vs nodes in the problem.

11.5 Balanced Trees

Introduction

A balanced binary tree is one in which for every node, the heights of its left and right subtrees differ by at most 1. Formally: |height(left) − height(right)| ≤ 1, and both left and right subtrees are themselves balanced. A balanced tree has height O(log n), so operations that depend on height (e.g. search in a BST) stay efficient. Checking whether a tree is balanced is a classic problem (LeetCode 110): compute height recursively and at each node verify the balance condition; if any node is unbalanced, return false (or a sentinel like −1). This section gives the definition, the O(n) check algorithm, and code.

Definition

A binary tree is height-balanced if:

For every node, |height(left_subtree) − height(right_subtree)| ≤ 1.
Both the left and right subtrees are height-balanced (recursive definition).

An empty tree is balanced (by convention). A single node is balanced (height 0; both children have height −1, difference 0).

Diagram: Balanced vs Unbalanced

  Balanced (|L_h - R_h| ≤ 1):     Unbalanced (at root: L_h=1, R_h=-1, diff=2):
       1                               1
      / \                             /
     2   3                           2
    /                                 \
   4                                   3
  (at 1: L_h=1, R_h=0, |1-0|=1 ✓)      (at 1: L_h=1, R_h=-1, |1-(-1)|=2 ✗)

Algorithm: Check Balance and Return Height

Use a helper that returns the height of the subtree if it is balanced, and a sentinel value that can never be a valid height if any subtree is unbalanced. Then the tree is balanced iff the helper returns a non-sentinel at the root. This avoids computing height separately from balance and keeps one O(n) pass.

Base case: null → return −1 with edge-based height (sentinel −2), or 0 with node-based height (sentinel −1). A tuple (is_balanced, height) also works.
Recurse on left and right. If either returns the sentinel, return sentinel (unbalanced somewhere below).
If |L_height − R_height| > 1, return sentinel.
Otherwise return 1 + max(L_height, R_height).

Common convention: use edge-based height (null = −1). So sentinel = −2 or any value that means "invalid." Then balanced check: if helper returns −2, not balanced.

def is_balanced(root):
    def height_if_balanced(node):
        if node is None:
            return -1   # edge-based: null has height -1
        L = height_if_balanced(node.left)
        R = height_if_balanced(node.right)
        if L == -2 or R == -2:
            return -2   # already found unbalanced subtree
        if abs(L - R) > 1:
            return -2
        return 1 + max(L, R)

    return height_if_balanced(root) != -2

Example

Balanced: Root with two children (both leaves). Left height 0, right height 0, difference 0. Root height 1. OK. Unbalanced: Root with only a left child; left child has only a left child (chain). At root: left height 1, right height −1, difference 2 > 1 → not balanced.

Alternative: Return (is_balanced, height)

Some prefer returning a tuple so the meaning is explicit:

def is_balanced_v2(root):
    def check(node):
        if node is None:
            return True, -1
        ok_left, h_left = check(node.left)
        ok_right, h_right = check(node.right)
        if not ok_left or not ok_right or abs(h_left - h_right) > 1:
            return False, 0   # height irrelevant
        return True, 1 + max(h_left, h_right)

    return check(root)[0]

Time and Space Complexity

Time: O(n) — each node visited once; we don't recompute height for the same node.
Space: O(h) for the call stack.

Why This Topic Matters

AVL and Red-Black trees (later topics) maintain balance so that height stays O(log n) and operations remain O(log n).
LeetCode 110: Balanced Binary Tree is a common interview question. The "return sentinel if unbalanced" trick is the standard O(n) solution.

Common Mistakes

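Computing height in a separate pass. Calling a standalone height function at every node recomputes the same subtree heights again and again: each call costs O(size of that subtree), which adds up to O(n log n) on a balanced tree and as much as O(n²) on a deep tree when every node is checked. Use one helper that returns either height or a sentinel so each node is visited once.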
Wrong sentinel value. Use a value that can't be a valid height (e.g. −2 when null returns −1). Check if L == -2 or R == -2 before checking abs(L - R) > 1.
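The cost of recomputing heights can be made visible by counting calls. The sketch below compares a naive top-down check with the single-pass helper; the stats dictionaries, the function names, and build_perfect are hypothetical additions for this illustration.

```python
class TreeNode:
    # Minimal node type assumed for this sketch.
    def __init__(self, val=0, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right


def is_balanced_naive(root, stats):
    def height(node):
        stats["height_calls"] += 1
        if node is None:
            return -1
        return 1 + max(height(node.left), height(node.right))

    def check(node):
        if node is None:
            return True
        # Heights are recomputed from scratch at every node.
        if abs(height(node.left) - height(node.right)) > 1:
            return False
        return check(node.left) and check(node.right)

    return check(root)


def is_balanced_single_pass(root, stats):
    def helper(node):
        stats["helper_calls"] += 1
        if node is None:
            return -1
        left, right = helper(node.left), helper(node.right)
        if left == -2 or right == -2 or abs(left - right) > 1:
            return -2  # sentinel: unbalanced somewhere in this subtree
        return 1 + max(left, right)

    return helper(root) != -2


def build_perfect(levels):
    """Hypothetical helper: perfect binary tree with the given number of levels."""
    if levels == 0:
        return None
    return TreeNode(0, build_perfect(levels - 1), build_perfect(levels - 1))


stats = {"height_calls": 0, "helper_calls": 0}
tree = build_perfect(4)  # 15 nodes
print(is_balanced_naive(tree, stats), is_balanced_single_pass(tree, stats))
print(stats["height_calls"] > stats["helper_calls"])  # True
```

On this 15-node tree the single pass makes 31 helper calls (15 nodes plus 16 null children), while the naive version makes more height calls because every level recomputes the heights below it.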

Interview Insight

"I'll use a helper that returns the height if the subtree is balanced, and −2 (or similar) if not. For null I return −1. For a node I recurse left and right; if either is −2 or |L−R|>1 I return −2; else return 1+max(L,R). The tree is balanced iff the helper doesn't return the sentinel."

Practice Problems

LeetCode 110: Balanced Binary Tree.
LeetCode 108: Convert Sorted Array to BST (build a balanced BST from sorted array).

Expert Tip

Use one helper that returns height when balanced and a sentinel (e.g. −2) when not, so you only traverse once. Check abs(L - R) <= 1 and that both L and R are not the sentinel.

Summary

Balanced: For every node, |height(left) − height(right)| ≤ 1, and both subtrees are balanced.
Check: Helper returns height if balanced, sentinel (e.g. −2) if not. Tree balanced iff root returns non-sentinel. O(n) time, O(h) space.

11.6 Lowest Common Ancestor

Introduction

The Lowest Common Ancestor (LCA) of two nodes p and q in a binary tree is the deepest node that has both p and q as descendants (a node can be a descendant of itself). It appears in many problems: distance between two nodes, path queries, and tree-based structures. For a general binary tree, the standard approach is a single recursive pass: if the current node is null or equals p or q, return it; otherwise recurse on left and right; if both sides return non-null, the current node is the LCA; otherwise return whichever side is non-null. For a BST, we can use the key order to descend left or right. This section gives both the general-tree and BST solutions.

Definition

LCA(p, q): The node that is an ancestor of both p and q and has the greatest depth (is "lowest" in the tree). By convention, if one of p or q is an ancestor of the other, that node is the LCA (e.g. LCA(root, leaf) = root when the leaf is in the tree).

Recursive Solution (General Binary Tree)

Idea: At each node, if the node is None or equals p or q, return the node. Recurse on left and right. If both left and right return non-null, the current node is the LCA (p and q lie in different subtrees). If only one side is non-null, that side holds the LCA (or one of p, q); propagate it up.

def lowest_common_ancestor(root, p, q):
    if root is None or root is p or root is q:
        return root
    left = lowest_common_ancestor(root.left, p, q)
    right = lowest_common_ancestor(root.right, p, q)
    if left and right:
        return root   # p and q in different subtrees
    return left if left else right

Example

Tree: 1 (root), left 2 (children 4, 5), right 3. LCA(4, 5) = 2 (both in left subtree of 1). LCA(4, 3) = 1 (4 in left, 3 in right). LCA(4, 2) = 2 (2 is ancestor of 4; return 2 when we hit 2).

Diagram: LCA Examples

           1
          / \
         2   3
        / \
       4   5

  LCA(4, 5):  Both in subtree of 2  →  LCA = 2
  LCA(4, 3):  4 in left subtree of 1, 3 in right  →  LCA = 1
  LCA(4, 2):  2 is an ancestor of 4  →  when we visit 2, we return 2 (root is p or q)  →  LCA = 2
  LCA(1, 5):  1 is ancestor of 5  →  LCA = 1

Why This Works

When we return a non-null value from a subtree, it means "this subtree contains at least one of p or q (or their LCA)." If both left and right return non-null, p and q must be in different subtrees, so the current node is the LCA. If only one side is non-null, the LCA is in that subtree (or the returned value is p or q itself); we just pass it up.

BST: Use Key Order

In a BST, if both p.val and q.val are less than root.val, the LCA is in the left subtree. If both are greater, it's in the right subtree. Otherwise (one ≤ root.val ≤ the other, or root is one of p, q), the root is the LCA.

def lowest_common_ancestor_bst(root, p, q):
    if p.val > q.val:
        p, q = q, p
    while root:
        if root.val < p.val:
            root = root.right
        elif root.val > q.val:
            root = root.left
        else:
            return root

Time and Space Complexity

General tree: O(n) time, O(h) space (call stack).
BST (iterative): O(h) time, O(1) space.

Common Mistakes

Assuming both p and q exist in the tree. The classic problem assumes they exist. If they might not, you need to verify both are found (e.g. count how many of p, q were seen in the subtree) and return None if only one is present.
BST: not normalizing p and q. Ensure p.val ≤ q.val (swap if needed) so the condition "root between p and q" is simply p.val ≤ root.val ≤ q.val.
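When p or q might be missing from the tree, one possible approach is to search the whole tree and record which targets were actually seen. The sketch below is an illustration under that assumption; the name lca_if_both_present and the found dictionary are not part of the course code.

```python
class TreeNode:
    # Minimal node type assumed for this sketch.
    def __init__(self, val=0, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right


def lca_if_both_present(root, p, q):
    found = {"p": False, "q": False}

    def dfs(node):
        if node is None:
            return None
        # Search both subtrees first, so a target below p or q is still seen.
        left = dfs(node.left)
        right = dfs(node.right)
        if node is p:
            found["p"] = True
        if node is q:
            found["q"] = True
        if node is p or node is q:
            return node
        if left and right:
            return node
        return left or right

    candidate = dfs(root)
    return candidate if found["p"] and found["q"] else None


n4, n5, n3 = TreeNode(4), TreeNode(5), TreeNode(3)
n2 = TreeNode(2, n4, n5)
n1 = TreeNode(1, n2, n3)
outsider = TreeNode(99)  # not attached to the tree

print(lca_if_both_present(n1, n4, n5).val)    # 2
print(lca_if_both_present(n1, n4, outsider))  # None
```

The trade-off is that this version cannot stop early at p or q: it always visits every node, which is still O(n).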

Interview Insight

"For a general tree I'll recurse: if root is None or p or q I return root. Then I get left and right results. If both are non-null, root is the LCA. Otherwise I return whichever is non-null. For BST I'll iterate: go left if root.val > q.val, right if root.val < p.val; when root is between p and q (or equals one), that's the LCA."

Practice Problems

LeetCode 236: Lowest Common Ancestor of a Binary Tree.
LeetCode 235: Lowest Common Ancestor of a BST.
Distance between two nodes: dist = depth(p) + depth(q) − 2*depth(LCA).
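The distance formula can be sketched directly on top of the LCA routine. This example assumes both nodes are in the tree and compares nodes by identity; depth_of and distance are illustrative names.

```python
class TreeNode:
    # Minimal node type assumed for this sketch.
    def __init__(self, val=0, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right


def lowest_common_ancestor(root, p, q):
    if root is None or root is p or root is q:
        return root
    left = lowest_common_ancestor(root.left, p, q)
    right = lowest_common_ancestor(root.right, p, q)
    if left and right:
        return root
    return left if left else right


def depth_of(root, target, depth=0):
    """Number of edges from root down to target, or -1 if target is absent."""
    if root is None:
        return -1
    if root is target:
        return depth
    found = depth_of(root.left, target, depth + 1)
    if found != -1:
        return found
    return depth_of(root.right, target, depth + 1)


def distance(root, p, q):
    # Assumes p and q are both in the tree.
    ancestor = lowest_common_ancestor(root, p, q)
    return depth_of(root, p) + depth_of(root, q) - 2 * depth_of(root, ancestor)


n4, n5, n3 = TreeNode(4), TreeNode(5), TreeNode(3)
n2 = TreeNode(2, n4, n5)
n1 = TreeNode(1, n2, n3)

print(distance(n1, n4, n5))  # 2  (4 -> 2 -> 5)
print(distance(n1, n4, n3))  # 3  (4 -> 2 -> 1 -> 3)
```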

Expert Tip

General tree: "if root in (None, p, q) return root; recurse left and right; if both non-null return root else return left or right." BST: descend left or right by comparing keys until root lies between p and q (inclusive).

Summary

LCA = deepest node that has both p and q as descendants. Node is its own descendant.
General tree: Recurse; return root if root in (None, p, q); else if both left and right non-null return root, else return the non-null side.
BST: Iterative or recursive: go left if both keys < root, right if both > root; else root is LCA. O(h) for BST, O(n) for general.

11.7 Path Sum Problems

Introduction

Path sum problems ask whether (or how many, or which) paths in a binary tree have a given sum. Common variants: (1) Root-to-leaf: Does any path from root to a leaf sum to target? (2) Return all root-to-leaf paths that sum to target. (3) Path sum III: Count paths where the sum equals target—paths can start and end at any node (not necessarily root or leaf). The first two use simple recursion with a running sum; the third uses a prefix-sum + hash table idea similar to "subarray with given sum." This section covers all three patterns.

Variant 1: Has Path Sum (Root to Leaf)

Problem: Given root and targetSum, return true if there exists a root-to-leaf path such that the sum of node values equals targetSum. At each node, subtract the node's value from the remaining target; at a leaf, check if the remaining is 0. Recurse on left and right with the new target.

def has_path_sum(root, target_sum):
    if root is None:
        return False
    rem = target_sum - root.val
    if root.left is None and root.right is None:
        return rem == 0
    return has_path_sum(root.left, rem) or has_path_sum(root.right, rem)

Example

Tree: 5 (root), left 4 (left 11 with children 7, 2), right 8 (right 4). Target 22. Path 5 → 4 → 11 → 2 has sum 5+4+11+2 = 22. So return true.
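The same check can also be written with an explicit stack, which avoids deep recursion on skewed trees. This is a sketch, not the course's reference solution; has_path_sum_iterative and the TreeNode constructor are assumed names. The tree built below is the one from the example.

```python
class TreeNode:
    # Minimal node type assumed for this sketch.
    def __init__(self, val=0, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right


def has_path_sum_iterative(root, target_sum):
    if root is None:
        return False
    # Each entry pairs a node with the remaining target after including that node.
    stack = [(root, target_sum - root.val)]
    while stack:
        node, rem = stack.pop()
        if node.left is None and node.right is None and rem == 0:
            return True
        if node.right:
            stack.append((node.right, rem - node.right.val))
        if node.left:
            stack.append((node.left, rem - node.left.val))
    return False


root = TreeNode(5,
                TreeNode(4, TreeNode(11, TreeNode(7), TreeNode(2))),
                TreeNode(8, None, TreeNode(4)))

print(has_path_sum_iterative(root, 22))  # True  (5 + 4 + 11 + 2)
print(has_path_sum_iterative(root, 20))  # False (5 + 4 + 11 ends at a non-leaf)
```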

Variant 2: All Paths With Sum (Root to Leaf)

Return a list of all root-to-leaf paths where the path sum equals targetSum. Use DFS: maintain current path (list) and running sum; at a leaf, if sum equals target, append a copy of the path to the result. Backtrack by popping after recursing.

def path_sum_ii(root, target_sum):
    result = []

    def dfs(node, rem, path):
        if node is None:
            return
        path.append(node.val)
        rem -= node.val
        if node.left is None and node.right is None and rem == 0:
            result.append(path[:])
        dfs(node.left, rem, path)
        dfs(node.right, rem, path)
        path.pop()

    if root:
        dfs(root, target_sum, [])
    return result

Diagram: Root-to-Leaf vs Any Path

  Tree:        5
              / \
             4   8
            /   / \
          11   13  4
          / \       \
         7   2       1

  Root-to-leaf (Variant 1 & 2):  Path must start at root and end at a leaf.
    Example: 5 → 4 → 11 → 2  (sum 22) ✓

  Any path (Variant 3):  Path can start and end at any node (parent → descendant).
    Example: 11 → 7  (sum 18), or 5 → 4 → 11 (sum 20). We use prefix sum on the
    current path: at node 11, running_sum = 5+4+11 = 20; if target=20, ans += count[0].

Variant 3: Count Paths With Sum (Any Start/End)

Paths can start and end at any node (parent to descendant). Same idea as "subarray with given sum": maintain a running sum from the root as we traverse, and at each node check how many times (running_sum - target) has been seen (prefix sums along the current path). Use a dict mapping prefix_sum → count; when going down, add current prefix to the dict; when going up, remove it (backtrack).

def path_sum_iii(root, target_sum):
    from collections import defaultdict
    count = defaultdict(int)
    count[0] = 1

    def dfs(node, running):
        if node is None:
            return 0
        running += node.val
        ans = count.get(running - target_sum, 0)
        count[running] += 1
        ans += dfs(node.left, running)
        ans += dfs(node.right, running)
        count[running] -= 1
        return ans

    return dfs(root, 0)

Why count[0] = 1? So when running_sum == target_sum, we have running_sum - target_sum == 0, and we count one path (from root to current node).

Edge Cases (Path Sum)

Empty tree: hasPathSum(empty, anything) = false; path_sum_ii(empty, k) = []; path_sum_iii(empty, k) = 0.
Single node: Root-to-leaf path is just the root; check if root.val == target (or rem == 0 after subtracting).
Target is zero: A path with sum 0 exists if some root-to-leaf path sums to 0 (e.g. tree with negative values that cancel). For variant 3, count[0]=1 counts "path from root to current node" when running_sum == target.
Negative values: All three variants work with negative node values; variant 3's prefix-sum approach is correct as long as we backtrack the map when leaving a node.

Pattern Recognition

Use root-to-leaf path sum when the problem says "path from root to leaf" or "root-to-leaf sum." Use prefix sum on the current path + hash table when paths can start and end at any node (variant 3). The key is: at each node, "how many paths ending here have sum target?" = count of prefix sums equal to (running_sum − target).

Time and Space Complexity

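Variant 1 & 2: O(n) time for the traversal. Space O(h) for recursion. Variant 2 also copies each matching path (O(h) per match), so its total time and output space grow with the number and length of matching paths.
Variant 3: O(n) time. Space O(h) for recursion. Only the prefix sums of the current root-to-node path have nonzero counts, but the code above leaves zero-count keys in the map; delete a key when its count reaches 0 to keep the map at O(h) entries.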
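The prefix-sum map stays proportional to the current path only if entries are removed once their count drops to zero. A sketch of Variant 3 with that cleanup follows; the names count_paths and prefix_counts are illustrative, and the test tree is the standard example for this problem with target 8.

```python
class TreeNode:
    # Minimal node type assumed for this sketch.
    def __init__(self, val=0, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right


def count_paths(root, target_sum):
    prefix_counts = {0: 1}  # the empty prefix, so paths starting at the root count

    def dfs(node, running):
        if node is None:
            return 0
        running += node.val
        # Paths ending here with the target sum start just after a matching prefix.
        found = prefix_counts.get(running - target_sum, 0)
        prefix_counts[running] = prefix_counts.get(running, 0) + 1
        found += dfs(node.left, running) + dfs(node.right, running)
        # Backtrack and drop the key entirely once it is unused.
        prefix_counts[running] -= 1
        if prefix_counts[running] == 0:
            del prefix_counts[running]
        return found

    return dfs(root, 0)


root = TreeNode(10,
                TreeNode(5,
                         TreeNode(3, TreeNode(3), TreeNode(-2)),
                         TreeNode(2, None, TreeNode(1))),
                TreeNode(-3, None, TreeNode(11)))

print(count_paths(root, 8))  # 3  (5 -> 3, 5 -> 2 -> 1, -3 -> 11)
```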

Expert Tip

Root-to-leaf: pass (remaining target) down; at leaf check rem == 0. For "all paths," backtrack with path.append/pop. For "any path" count: prefix sum on the current path + hash table; initialize count[0]=1 and backtrack the map when leaving the node.

Summary

Root-to-leaf sum: Recurse with (target - node.val); at leaf return (rem == 0).
All root-to-leaf paths: DFS with path list; at leaf if sum matches append path[:]; backtrack with pop.
Count paths (any start/end): Prefix sum along path + dict; count[running - target]; backtrack count when leaving. O(n) time.

11.8 Serialize & Deserialize

Introduction

Serialize converts a binary tree into a string (e.g. for storage or transmission); deserialize reconstructs the tree from that string. The format must encode both values and structure. A common approach is preorder with a marker for null children (e.g. "null" or "#"): we can then reconstruct uniquely by reading tokens left to right and building the tree recursively. LeetCode 297 is the classic problem. This section covers preorder-based serialize/deserialize with a simple format.

Format: Preorder with Null Markers

Serialize: preorder traversal; for each node output its value and a separator; for null output a sentinel (e.g. "null"). Example: tree 1(2, 3) with 2 having left 4 → "1,2,4,null,null,null,3,null,null". Deserialize: split the string into a list of tokens; consume one token at a time; if it's the null marker, return None; otherwise create a node, then recursively build left and right (consuming tokens in preorder order).

Diagram: Serialize (Preorder + Null)

  Tree:      1
            / \
           2   3
          /
         4

  Preorder visit: 1 → 2 → 4 → (null) → (null) → (null) → 3 → (null) → (null)
  Serialized:     "1,2,4,null,null,null,3,null,null"

  Reading back:  first token 1 → root; 2 → left of 1; 4 → left of 2;
  null → left of 4 is None; null → right of 4 is None; null → right of 2 is None;
  3 → right of 1; null, null → children of 3.  Tree restored.

class Codec:
    def serialize(self, root):
        if root is None:
            return "null"
        left = self.serialize(root.left)
        right = self.serialize(root.right)
        return str(root.val) + "," + left + "," + right

    def deserialize(self, data):
        tokens = data.split(",")
        self.i = 0

        def build():
            if self.i >= len(tokens) or tokens[self.i] == "null":
                self.i += 1
                return None
            node = TreeNode(int(tokens[self.i]))
            self.i += 1
            node.left = build()
            node.right = build()
            return node

        return build()

Example

Tree: 1 (root), left 2, right 3. Serialize: "1,2,null,null,3,null,null". Deserialize: first token 1 → root; next 2 → left child; next null → left of 2 is None; next null → right of 2 is None; next 3 → right of root; then null, null for 3's children. Tree is restored.

Why Preorder Works

Preorder (root, left, right) encodes the structure: when we read back, the first token is the root; the next tokens form the left subtree (until we've consumed the same "shape" as the left), then the right. Null markers tell us when a subtree ends, so we don't need extra delimiters.

Time and Space Complexity

Serialize: O(n) time and space (string length O(n)).
Deserialize: O(n) time (each token read once), O(n) space for the tree and token list.

Common Mistakes

Forgetting null markers for missing children. If you only output values in preorder without "null," you cannot tell whether a value is a left child, right child, or root of a subtree when reading back. The nulls define the structure.
Deserialize: not advancing the index. Each call to build() must consume exactly one token (either a value or "null"). Increment the index (or use an iterator) so the next call sees the next token.
Using a different separator or format. Serialize and deserialize must agree on the format (e.g. comma, "null" spelling).
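The iterator alternative mentioned above can be sketched with module-level functions. Each call to build consumes exactly one token through next, so there is no index to forget. The function names and the TreeNode constructor are assumptions of this example; collecting tokens in a list and joining once also avoids repeated string concatenation.

```python
class TreeNode:
    # Minimal node type assumed for this sketch.
    def __init__(self, val=0, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right


def serialize(root):
    tokens = []

    def walk(node):
        if node is None:
            tokens.append("null")
            return
        tokens.append(str(node.val))  # root first, then left, then right
        walk(node.left)
        walk(node.right)

    walk(root)
    return ",".join(tokens)


def deserialize(data):
    tokens = iter(data.split(","))

    def build():
        token = next(tokens)  # consume exactly one token per call
        if token == "null":
            return None
        node = TreeNode(int(token))
        node.left = build()
        node.right = build()
        return node

    return build()


root = TreeNode(1, TreeNode(2, TreeNode(4)), TreeNode(3))
text = serialize(root)
print(text)                                  # 1,2,4,null,null,null,3,null,null
print(serialize(deserialize(text)) == text)  # True
```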

Interview Insight

Say: "I'll use preorder and output 'null' for missing children so the structure is uniquely recoverable. For deserialize I'll split the string and use a single global index (or iterator) that advances as we consume tokens. Each node consumes one token; if it's 'null' we return None, else we build the node and recurse for left and right."

Practice Problems

LeetCode 297: Serialize and Deserialize Binary Tree.
Variants: use different delimiters or binary format; serialize to a list instead of string.

Expert Tip

Use preorder + "null" for missing children. Serialize: if null return "null"; else return str(val) + "," + serialize(left) + "," + serialize(right). Deserialize: split by comma; use an index (or iterator) to consume one token per call; if "null" return None and advance; else build node and set left/right by recursing.

Summary

Serialize: Preorder; output value or "null"; comma-separated. Deserialize: split tokens; build in preorder (consume one token per node; "null" → None).
One-to-one mapping between tree and string. O(n) time and space for both.

11.9 Binary Search Tree

Introduction

A Binary Search Tree (BST) is a binary tree where for every node, all values in the left subtree are less than the node's value, and all values in the right subtree are greater (often defined as left ≤ root < right or strict inequality, depending on problem). This property gives inorder traversal = sorted order. Search, insert, and delete can be done in O(h) time where h is height (O(log n) if balanced). BSTs support efficient lookup, range queries, and ordered iteration. This section covers the BST property, search/insert, validation, and delete (concept).

Formal Definition

A binary tree is a BST if for every node n: (1) every key in the left subtree of n is strictly less than n.key (or ≤ if duplicates allowed on the left), and (2) every key in the right subtree of n is strictly greater than n.key. An empty tree is a BST. This invariant implies that an inorder traversal visits keys in non-decreasing order.

BST Property

For every node with value val: all nodes in the left subtree have value < val; all nodes in the right subtree have value > val. (Duplicate handling: some definitions use ≤ on one side.) As a result, inorder (left → root → right) visits nodes in ascending order.

Search

Compare target with root: if equal, found; if target < root.val, search left; else search right. If we reach null, not found. O(h) time.

def search_bst(root, target):
    if root is None or root.val == target:
        return root
    if target < root.val:
        return search_bst(root.left, target)
    return search_bst(root.right, target)

Insert

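Search for the key; the null position where the search ends is where the new node is attached as a leaf. If key < root.val go left; otherwise go right. Note that the code below sends equal keys to the right, so a duplicate lands in the right subtree; skip the insert on equality if the BST must hold strictly distinct keys. O(h) time.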

def insert_bst(root, val):
    if root is None:
        return TreeNode(val)
    if val < root.val:
        root.left = insert_bst(root.left, val)
    else:
        root.right = insert_bst(root.right, val)
    return root

Validate BST

Check that every node lies in an allowed range (min, max). Root is in (-∞, +∞); left child must be in (min, root.val); right child in (root.val, max). Recurse with updated bounds. O(n) time.

def is_valid_bst(root):
    def check(node, lo, hi):
        if node is None:
            return True
        if not (lo < node.val < hi):
            return False
        return check(node.left, lo, node.val) and check(node.right, node.val, hi)

    return check(root, float("-inf"), float("inf"))

Example

Valid BST: 5 (root), left 3, right 7; 3 has left 1. Inorder: 1, 3, 5, 7 (sorted). Invalid: 5 with left 6 (6 > 5 violates left-subtree rule).
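Because inorder traversal of a BST is sorted, validation can also be sketched as an inorder walk that checks each value is strictly greater than the previous one. This is an alternative to the range check above, shown under the strict-inequality definition; is_valid_bst_inorder and the TreeNode constructor are assumed names.

```python
class TreeNode:
    # Minimal node type assumed for this sketch.
    def __init__(self, val=0, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right


def is_valid_bst_inorder(root):
    prev = None  # value of the previously visited node
    stack, curr = [], root
    while curr or stack:
        while curr:
            stack.append(curr)
            curr = curr.left
        curr = stack.pop()
        # Inorder values must be strictly increasing.
        if prev is not None and curr.val <= prev:
            return False
        prev = curr.val
        curr = curr.right
    return True


valid = TreeNode(5, TreeNode(3, TreeNode(1)), TreeNode(7))
invalid = TreeNode(5, TreeNode(6), TreeNode(7))
# Each child is fine relative to its parent, but 6 sits in the left subtree of 5.
subtle = TreeNode(5, TreeNode(3, None, TreeNode(6)), TreeNode(7))

print(is_valid_bst_inorder(valid))    # True
print(is_valid_bst_inorder(invalid))  # False
print(is_valid_bst_inorder(subtle))   # False
```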

Diagram: BST Property

  Valid BST:                    Invalid (6 > 5 in left subtree):
       5                              5
      / \                            / \
     3   7                          6   7
    /     (all left < 5,            ↑
   1       all right > 5)          left subtree must be < 5

  Inorder (L→root→R): 1, 3, 5, 7  (always sorted in a BST)
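
The sorted-inorder claim can be checked directly on the two trees from the diagram. This is a minimal sketch; the TreeNode class here is an assumption, defined only so the snippet is self-contained (its fields val, left, right match the code above).

```python
class TreeNode:
    """Minimal node type, assumed for illustration."""
    def __init__(self, val, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right

def inorder(root):
    """Return the values in left -> root -> right order."""
    if root is None:
        return []
    return inorder(root.left) + [root.val] + inorder(root.right)

# The two trees from the diagram.
valid = TreeNode(5, TreeNode(3, TreeNode(1)), TreeNode(7))
invalid = TreeNode(5, TreeNode(6), TreeNode(7))

valid_order = inorder(valid)      # [1, 3, 5, 7]
invalid_order = inorder(invalid)  # [6, 5, 7]
```

The invalid tree's inorder output is not sorted, which is exactly what the range-based validator detects.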

Delete (Concept)

To delete a node: (1) If it's a leaf, remove it. (2) If it has one child, replace it with that child. (3) If it has two children, replace its value with the inorder successor (smallest in right subtree) or inorder predecessor (largest in left subtree), then delete that successor/predecessor node (which has at most one child). O(h) time.

Edge Cases (BST)

Empty tree: search/insert/validate on null root; return None, new node, or true (empty tree is valid BST) as appropriate.
Single node: Valid BST; search returns the node if key matches; insert replaces or adds as child per implementation.
Duplicates: Problem may allow left ≤ root < right or strict inequality. Validate and insert logic must match (e.g. allow left or right for equal keys).
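Sentinel bounds in validate: In fixed-width languages, using the smallest and largest integer as the initial (lo, hi) wrongly rejects a node that holds exactly those values. Python integers do not overflow, and float('-inf') / float('inf') compare correctly with any int; alternatively use None to mean "no bound."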
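
The "None means no bound" variant mentioned above can be sketched like this. The function name is_valid_bst_none_bounds and the TreeNode class are assumptions, chosen so the snippet does not clash with the validator shown earlier.

```python
class TreeNode:
    """Minimal node type, assumed for illustration."""
    def __init__(self, val, left=None, right=None):
        self.val, self.left, self.right = val, left, right

def is_valid_bst_none_bounds(root):
    def check(node, lo, hi):
        if node is None:
            return True                      # empty subtree is valid
        if lo is not None and node.val <= lo:
            return False                     # violates the lower bound
        if hi is not None and node.val >= hi:
            return False                     # violates the upper bound
        return check(node.left, lo, node.val) and check(node.right, node.val, hi)

    return check(root, None, None)

# 6 sits in the left subtree of 5, so the tree is invalid even though
# every node is correctly ordered relative to its own parent.
sneaky = TreeNode(5, TreeNode(3, None, TreeNode(6)), TreeNode(7))
```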

Time and Space Complexity

Search, insert, delete: O(h) time; O(h) space for recursion. h = height (O(log n) if balanced, O(n) worst).
Validate: O(n) time, O(h) space.

Expert Tip

BST: left < root < right ⇒ inorder = sorted. Search/insert: compare with root and go left or right. Validate: pass (min, max) range; left in (min, root.val), right in (root.val, max).

Interview Insight

"BST: every node has left subtree < root < right subtree; inorder gives sorted order. Search and insert: compare with root, recurse left or right; O(h). To validate I pass (lo, hi) and tighten: left gets (lo, root.val), right gets (root.val, hi). For delete with two children I use inorder successor or predecessor."

Practice Problems

LeetCode 98: Validate Binary Search Tree.
LeetCode 700: Search in a BST.
LeetCode 701: Insert into a BST.
LeetCode 450: Delete Node in a BST.

Summary

BST property: Left subtree < root < right subtree ⇒ inorder gives sorted order.
Search/insert: Compare with root; recurse left or right; O(h). Delete: replace with successor or predecessor, then delete that node.
Validate: Check each node is in (lo, hi); tighten range for left/right. O(n).

11.10 AVL Tree

Introduction

An AVL tree is a self-balancing BST where for every node, the heights of the left and right subtrees differ by at most 1. The balance factor of a node is height(left) − height(right); in an AVL tree it is −1, 0, or 1. After insert or delete, we may need to rotate to restore this property. Rotations (single: left/right; double: left-right/right-left) rearrange nodes so the tree stays balanced while preserving the BST order. AVL guarantees O(log n) height, so search, insert, and delete are O(log n). This section covers the balance factor, rotations, and when to apply them.

Balance Factor

Balance factor (BF) = height(left subtree) − height(right subtree). AVL invariant: |BF| ≤ 1 for every node. If after an insert/delete some node has BF = 2 or −2, we fix it by rotating at that node (or a descendant).

Rotations

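Right rotation (RR): Used when the left subtree is too tall (BF = 2). The left child becomes the new root of the subtree; the old root becomes its right child; the previous left child's right subtree becomes the old root's left subtree. Preserves BST order. Naming note: in this section RR and LL name the rotation that is applied (right or left). Many textbooks instead name the imbalance after the heavy path, so their "left-left case" is the one fixed by a right rotation.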

Left rotation (LL): Used when the right subtree is too tall (BF = −2). The right child becomes the new root; the old root becomes its left child; the previous right child's left subtree becomes the old root's right subtree.

Diagram: Left Rotation (LL)

  Before (right-heavy, BF=-2):    After left rotate at 1:
       1                               2
        \                             / \
         2                           1   3
          \
           3
  Node 2 becomes new root; 1 becomes left child of 2; 2's old left (none) becomes 1's right.

Left-Right (LR): When BF = 2 and the left child has BF = −1 (left's right subtree is taller). First left-rotate the left child, then right-rotate the current node.

Right-Left (RL): When BF = −2 and the right child has BF = 1. First right-rotate the right child, then left-rotate the current node.

def height(node):
    return -1 if node is None else 1 + max(height(node.left), height(node.right))

def right_rotate(z):
    y = z.left
    z.left = y.right
    y.right = z
    return y

def left_rotate(z):
    y = z.right
    z.right = y.left
    y.left = z
    return y

Example

Insert 1, 2, 3 in order into an AVL. After 1, 2: BF(root)=−1. After 3: right subtree of root grows; BF(root)=−2. Apply left rotation at root: node 2 becomes root, 1 is its left child, 3 is its right child. Tree is balanced.

Insert and Delete (Concept)

Insert: Insert as in BST; then walk back up the path to the root. At each node, recompute height and BF. At the first (lowest) node where |BF| = 2, apply the appropriate rotation (LL, RR, LR, RL). That rotation restores the subtree to the height it had before the insert, so no ancestor needs a further fix: one single or one double rotation suffices per insert.

Delete: Delete as in BST; then rebalance along the path from the deleted node to the root, applying rotations when |BF| = 2.
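
The insert procedure can be sketched with heights stored in the nodes, so that balance factors are O(1) to read. This is an illustrative sketch under stated assumptions: the AVLNode class, the helper names, and the choice to send duplicates to the right are not from the text above.

```python
class AVLNode:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.height = 0              # a leaf has height 0; None counts as -1

def h(node):
    return -1 if node is None else node.height

def update(node):
    node.height = 1 + max(h(node.left), h(node.right))

def bf(node):
    return h(node.left) - h(node.right)

def rotate_right(z):
    y = z.left
    z.left, y.right = y.right, z
    update(z)                        # z is now below y, so refresh z first
    update(y)
    return y

def rotate_left(z):
    y = z.right
    z.right, y.left = y.left, z
    update(z)
    update(y)
    return y

def avl_insert(node, key):
    """Insert key and return the (possibly new) root of this subtree."""
    if node is None:
        return AVLNode(key)
    if key < node.key:
        node.left = avl_insert(node.left, key)
    else:
        node.right = avl_insert(node.right, key)   # duplicates go right (assumption)
    update(node)
    balance = bf(node)
    if balance > 1:                                # left-heavy
        if bf(node.left) < 0:                      # left child leans right: LR case
            node.left = rotate_left(node.left)
        return rotate_right(node)
    if balance < -1:                               # right-heavy
        if bf(node.right) > 0:                     # right child leans left: RL case
            node.right = rotate_right(node.right)
        return rotate_left(node)
    return node

root = None
for key in [1, 2, 3]:                # the example above: ends with 2 at the root
    root = avl_insert(root, key)
```

Storing the height in each node avoids the O(n) recursive height() call at every step, which is what keeps insert at O(log n).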

Time and Space Complexity

Search, insert, delete: O(log n) time (height is O(log n)). Space O(log n) for recursion.
Rotation: O(1) per rotation. An insert needs at most one single or one double rotation; a delete may need a rotation at each of the O(log n) nodes on the path back to the root.

Common Mistakes

Applying the wrong rotation. BF = 2 with left child's BF = 1 (or 0, which can happen after a delete) → single right rotation (RR). Left child's BF = −1 → double rotation (LR: left rotate left child, then right rotate node). Similarly, for BF = −2 check the right child's BF to choose between LL and RL.
Forgetting to update heights after rotation. After any rotation, recompute heights of the nodes that moved; then propagate height updates up the path to the root.

Interview Insight

You usually don't implement full AVL in an interview; explaining the idea is enough: "AVL keeps |BF| ≤ 1. After insert we fix with single (RR/LL) or double (LR/RL) rotation. I'd need to update heights and check the child's BF to choose the rotation." If asked to code, implement height() and one rotation (e.g. left rotate) and describe the rest.

Practice Problems

LeetCode 1382: Balance a BST (can use inorder + rebuild, or AVL-style rotations).
Concept: implement insert with rebalance; or explain when to use RR, LL, LR, RL.

Expert Tip

BF = height(left) − height(right). BF = 2: left heavy → RR or LR (check left child's BF). BF = −2: right heavy → LL or RL (check right child's BF). After rotation, update heights and continue up.

Summary

AVL: BST with |balance factor| ≤ 1 at every node. BF = height(left) − height(right).
Rebalance: Single rotations (RR when left-heavy, LL when right-heavy); double rotations (LR, RL) when the taller child is skewed the other way.
Insert/delete: BST step then rebalance along path to root. O(log n) per operation.

11.11 Red-Black Tree

Introduction

A Red-Black tree is a self-balancing BST where each node has an extra color (red or black). Invariants on colors and "black height" guarantee that the tree stays roughly balanced, so height is O(log n) and search, insert, and delete are O(log n). Red-black trees are used in many standard libraries (e.g. std::map in C++) because they balance well in practice and rebalancing after insert/delete involves a bounded number of rotations and recolorings. This section covers the invariants and the high-level idea of insert/delete fix-up.

Invariants (Rules)

Every node is either red or black.
The root is black.
Leaves (null pointers) are considered black.
A red node has only black children (no two reds in a row on any path).
For every node, all simple paths from that node down to descendant leaves contain the same number of black nodes (called "black height").

From these, the longest path (alternating red–black) is at most twice the shortest (all black), so height ≤ 2·log₂(n+1) = O(log n).
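
The color rules lend themselves to a short recursive checker. This is a sketch under assumptions: the RBNode class and the "R"/"B" color encoding are illustrative, and BST ordering is not checked here.

```python
RED, BLACK = "R", "B"

class RBNode:
    """Minimal colored node, assumed for illustration."""
    def __init__(self, key, color, left=None, right=None):
        self.key, self.color, self.left, self.right = key, color, left, right

def black_height(node):
    """Return the black height of the subtree, or -1 if a color rule is violated.

    None leaves count as black and contribute 1.
    """
    if node is None:
        return 1
    if node.color == RED:
        for child in (node.left, node.right):
            if child is not None and child.color == RED:
                return -1                  # red node with a red child
    left = black_height(node.left)
    right = black_height(node.right)
    if left == -1 or right == -1 or left != right:
        return -1                          # unequal black counts on some paths
    return left + (1 if node.color == BLACK else 0)

def is_red_black(root):
    """Check the color invariants only; BST ordering is assumed to hold."""
    if root is not None and root.color != BLACK:
        return False                       # root must be black
    return black_height(root) != -1

ok = RBNode(2, BLACK, RBNode(1, RED), RBNode(3, RED))
double_red = RBNode(3, BLACK, RBNode(2, RED, RBNode(1, RED)))
```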

Insert (High Level)

Insert as in a BST and color the new node red. This may violate "red has black children" (if the parent is red) or "root is black" (if we inserted the root). Fix-up: Walk up from the new node. If the current node is red and its parent is red, we have a "double red." Depending on the color of the uncle (parent's sibling): (1) If uncle is red: recolor parent and uncle to black, grandparent to red, and continue from grandparent. (2) If uncle is black (or null): rotate (and possibly recolor) so the double red is resolved—either a single rotation (LL/RR style) or a double rotation (LR/RL style), then recolor. After fix-up, ensure the root is black.
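
The fix-up loop can be sketched with parent pointers as below. This is an illustrative sketch, not a reference implementation: the class and method names are assumptions, None stands in for black leaves, and duplicates are sent to the right.

```python
RED, BLACK = "R", "B"

class Node:
    def __init__(self, key):
        self.key = key
        self.color = RED                        # new nodes start red
        self.left = self.right = self.parent = None

class RedBlackTree:
    def __init__(self):
        self.root = None

    def _rotate(self, x, left):
        """Rotate x down to the left (left=True) or right; its child y moves up."""
        y = x.right if left else x.left
        inner = y.left if left else y.right     # subtree that changes parent
        if left:
            x.right, y.left = inner, x
        else:
            x.left, y.right = inner, x
        if inner is not None:
            inner.parent = x
        y.parent = x.parent
        if x.parent is None:
            self.root = y
        elif x.parent.left is x:
            x.parent.left = y
        else:
            x.parent.right = y
        x.parent = y

    def insert(self, key):
        parent, cur = None, self.root
        while cur is not None:                  # ordinary BST descent
            parent, cur = cur, (cur.left if key < cur.key else cur.right)
        z = Node(key)
        z.parent = parent
        if parent is None:
            self.root = z
        elif key < parent.key:
            parent.left = z
        else:
            parent.right = z
        self._fix_double_red(z)

    def _fix_double_red(self, z):
        while z.parent is not None and z.parent.color == RED:
            p, g = z.parent, z.parent.parent    # g exists: a red p is never the root
            p_is_left = g.left is p
            uncle = g.right if p_is_left else g.left
            if uncle is not None and uncle.color == RED:
                p.color = uncle.color = BLACK   # uncle red: recolor and move up
                g.color = RED
                z = g
                continue
            if (p.left is z) != p_is_left:      # zig-zag: rotate the parent first
                self._rotate(p, left=p_is_left)
                z, p = p, z
            p.color, g.color = BLACK, RED       # straight line: recolor, rotate g
            self._rotate(g, left=not p_is_left)
        self.root.color = BLACK

def black_height(node):
    """Black height of a subtree, or -1 if a color rule is broken."""
    if node is None:
        return 1
    lh, rh = black_height(node.left), black_height(node.right)
    red_red = node.color == RED and any(
        c is not None and c.color == RED for c in (node.left, node.right))
    if lh == -1 or lh != rh or red_red:
        return -1
    return lh + (node.color == BLACK)

tree = RedBlackTree()
for key in [1, 2, 3, 4, 5, 6, 7, 8]:
    tree.insert(key)
```

Inserting sorted keys is the worst case for a plain BST; here the result stays balanced with 4 at the root, and black_height confirms the color rules hold.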

Delete (High Level)

Delete as in a BST. If the removed node was black, the "black height" on some path decreases; fix-up restores it by recolorings and rotations. The sibling of the node that "lost" a black is used to push a black down or rotate. Details are more involved than insert; the idea is to propagate the "deficit" up until we can fix it with a rotation and recolor.

Example

After inserting a red node under a red parent: if the uncle is red, we recolor (parent and uncle → black, grandparent → red). If the uncle is black, we rotate at the grandparent (e.g. left-rotate if the red nodes form a right spine) and recolor, so that the tree satisfies "no two reds in a row" and black heights stay equal on every path.

Red-Black vs AVL

AVL: Stricter balance (|BF| ≤ 1); slightly lower height; more rotations on insert/delete.
Red-Black: Looser balance; fewer rotations in practice; often faster for mixed insert/delete workloads. Both give O(log n) operations.

Time and Space Complexity

Search, insert, delete: O(log n) time. Space O(log n) for recursion.
Fix-up: O(log n) steps in the worst case. Insert needs at most 2 rotations and delete at most 3; the O(log n) part is recoloring as the fix-up moves up the path.

Expert Tip

Remember the five invariants; the key is "same black count on every path" and "no two reds adjacent." Insert: new node red; fix double red with uncle red (recolor) or uncle black (rotate + recolor). Root stays black.

Practice Problems

Implementations: rarely required in interviews; understanding invariants and when to recolor/rotate is enough.
Compare with AVL: when to prefer Red-Black (fewer rotations, good for mixed workloads) vs AVL (stricter balance, more lookups).

Summary

Red-Black tree: BST with red/black nodes; root and leaves black; red nodes have black children; same black height on all paths ⇒ height O(log n).
Insert: Insert red; fix double red (recolor if uncle red; rotate + recolor if uncle black). Delete: Fix black-height by recolor/rotate. Both O(log n).

11.12 Trie

Introduction

A Trie (prefix tree) is a tree used to store a set of strings. Each edge is labeled with a character; a path from the root to a node spells a prefix (possibly a full word). Nodes typically have a flag (e.g. is_end) to mark the end of a word. Tries support insert, search (exact word), and startsWith (prefix lookup) in O(m) time where m is the length of the word or prefix. They are used for autocomplete, spell check, and prefix-based problems (e.g. "count words with prefix"). This section covers the structure and basic operations.

Structure

Each node has:

Children: A mapping from character to child node (e.g. dict or array of size 26 for lowercase letters).
is_end: True if a word ends at this node (so the path from root spells a complete word).

The root represents the empty prefix. To insert "cat", we add edges c → a → t and set is_end at the node for "t".

Insert

Start at the root. For each character in the word, go to the corresponding child (create it if missing). After processing the last character, set is_end = True at that node. Time O(m), m = word length.

Search (Exact Word)

Follow the path for each character. If we hit a missing child, the word is not in the trie. If we finish the word, return True only if is_end is True at the final node (so "car" is not found when only "cart" was inserted). Time O(m).

startsWith (Prefix)

Follow the path for each character of the prefix. If we can traverse without missing a child, the prefix exists. We don't require is_end. Time O(m).

class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_end = False

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for c in word:
            if c not in node.children:
                node.children[c] = TrieNode()
            node = node.children[c]
        node.is_end = True

    def search(self, word):
        node = self.root
        for c in word:
            if c not in node.children:
                return False
            node = node.children[c]
        return node.is_end

    def startsWith(self, prefix):
        node = self.root
        for c in prefix:
            if c not in node.children:
                return False
            node = node.children[c]
        return True

Example

Insert "cat", "car", "card". Trie: root → c → a → t (is_end); root → c → a → r (is_end) → d (is_end). search("car") → True; search("ca") → False (no word ends at "ca"). startsWith("ca") → True.

Diagram: Trie for "cat", "car", "card"

  root
    │
    c
    │
    a
   / \
  t   r          t: is_end (word "cat")
      │          r: is_end (word "car")
      d          d: is_end (word "card")

  "car" and "card" share the path c→a→r; "cat" branches off at a.

  search("car")   → follow c,a,r  → node has is_end ✓
  search("ca")    → follow c,a    → node is_end? No ✗
  startsWith("ca")→ follow c,a    → path exists ✓

Edge Cases (Trie)

Empty string: If the problem allows "" as a word, mark root.is_end = True when inserting "". search("") then returns True.
Prefix of existing word: Insert "cat" then "ca"—you need a node at "ca" with is_end = True. search("ca") is True only if you set is_end when inserting "ca".
Duplicate insert: Inserting the same word twice: just set is_end again; no structural change. Count of words may require a separate count field per node if needed.
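
The per-node count idea from the last edge case can be sketched as follows. The class and field names (CountingTrie, pass_count, end_count) are assumptions for illustration.

```python
class CountingTrieNode:
    def __init__(self):
        self.children = {}
        self.pass_count = 0   # inserted words whose path goes through this node
        self.end_count = 0    # copies of the word that ends exactly here

class CountingTrie:
    def __init__(self):
        self.root = CountingTrieNode()

    def insert(self, word):
        node = self.root
        node.pass_count += 1              # every word passes through the root
        for c in word:
            if c not in node.children:
                node.children[c] = CountingTrieNode()
            node = node.children[c]
            node.pass_count += 1
        node.end_count += 1               # duplicates simply raise the count

    def _walk(self, s):
        """Return the node reached by following s, or None if the path breaks."""
        node = self.root
        for c in s:
            node = node.children.get(c)
            if node is None:
                return None
        return node

    def count_word(self, word):
        node = self._walk(word)
        return 0 if node is None else node.end_count

    def count_prefix(self, prefix):
        node = self._walk(prefix)
        return 0 if node is None else node.pass_count

trie = CountingTrie()
for w in ["cat", "car", "card", "car", ""]:
    trie.insert(w)
```

With these inserts, "car" is stored twice and the empty string once, so the counts distinguish duplicates that a plain is_end flag would merge.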

Pattern Recognition

Use a Trie when the problem involves: prefix matching, "all words with prefix P," autocomplete, spell check, storing a set of strings with fast prefix/word lookup, or when you need to traverse by character and share structure across strings (e.g. "longest common prefix," "word squares").
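
As an illustration of the "all words with prefix P" pattern, the sketch below walks to the prefix node and then collects every word beneath it with a DFS. The helper names insert_word and words_with_prefix are assumptions; the node layout matches the TrieNode used in this section.

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_end = False

def insert_word(root, word):
    node = root
    for c in word:
        if c not in node.children:
            node.children[c] = TrieNode()
        node = node.children[c]
    node.is_end = True

def words_with_prefix(root, prefix):
    """Return all stored words that start with prefix, in lexicographic order."""
    node = root
    for c in prefix:                       # same walk as startsWith
        if c not in node.children:
            return []
        node = node.children[c]
    results = []

    def dfs(n, suffix):
        if n.is_end:
            results.append(prefix + "".join(suffix))
        for c in sorted(n.children):       # sorted for deterministic output
            suffix.append(c)
            dfs(n.children[c], suffix)
            suffix.pop()

    dfs(node, [])
    return results

root = TrieNode()
for w in ["cat", "car", "card", "dog"]:
    insert_word(root, w)
suggestions = words_with_prefix(root, "ca")   # ["car", "card", "cat"]
```

The walk costs O(m) for the prefix; the DFS then costs time proportional to the size of the subtree under it.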

Time and Space Complexity

Insert, search, startsWith: O(m) time per operation, m = length of word/prefix.
Space: O(total characters in all words) in the worst case; nodes are shared for common prefixes.

Interview Insight

"I'll use a trie: each node has a dict of character → child and an is_end flag. Insert: walk character by character, create nodes as needed, set is_end at the last node. Search: walk and return whether the final node has is_end. startsWith: walk and return whether the path exists."

Practice Problems

LeetCode 208: Implement Trie (Prefix Tree).
LeetCode 212: Word Search II (trie of words + DFS on board).
LeetCode 14: Longest Common Prefix (trie or simple compare).

Expert Tip

Trie: one node per prefix; children by character; is_end for complete words. Insert: traverse/create; set is_end. Search: traverse; return is_end at last node. startsWith: traverse; return True if path exists.

Summary

Trie: Tree for strings; path = prefix; is_end marks word end. Insert/search/startsWith in O(m).
Use for prefix lookups, autocomplete, "count words with prefix," and string set membership.

11.13 Segment Tree

Introduction

A Segment Tree is a binary tree used for range queries (e.g. sum, min, max over [l, r]) and point updates (or range updates with lazy propagation) on an array. Each node stores an aggregate value for a segment [l, r]. The root covers [0, n−1]; left child covers the left half, right child the right half. Build takes O(n); query(l, r) and point update(index, value) take O(log n). It is useful when you have many range queries and updates. This section covers a segment tree for range sum with point update.

Structure

We use an array-based representation: root at index 1; for node at index i, left child at 2*i, right at 2*i + 1. Each node holds the aggregate (e.g. sum) for its segment. Leaves correspond to single elements. Tree size: about 4*n (or 2 * next_power_of_2(n)) to be safe.

Diagram: Segment Tree for arr[0..3] (Sum)

  arr = [a0, a1, a2, a3]   (indices 0..3)

  Logical tree (each node covers a range):
            [0..3] sum
           /        \
      [0..1]         [2..3]
      /    \         /    \
  [0..0] [1..1]  [2..2] [3..3]
   a0      a1      a2      a3

  Array representation (root at 1):
  index:  1        2        3        4    5    6    7
  value:  sum(0-3) sum(0-1) sum(2-3) a0   a1   a2   a3
  children: 2,3      4,5      6,7     -    -    -    -

Build

Fill leaves with the array values. Then fill internal nodes bottom-up: tree[i] = tree[2*i] + tree[2*i+1] (for sum). O(n) time.

Query(l, r)

Recurse from the root. If the current node's segment is entirely inside [l, r], return its value. If it doesn't overlap [l, r], return 0 (for sum). Otherwise recurse on left and right children and combine (e.g. add) the results. O(log n) time.

Point Update

Update the leaf for the given index (add delta or set value). Then update all ancestors: for index i, tree[i] = tree[2*i] + tree[2*i+1]. O(log n) time.

class SegmentTree:
    def __init__(self, nums):
        n = len(nums)
        self.n = n
        self.size = 1
        while self.size < n:
            self.size *= 2
        self.tree = [0] * (2 * self.size)
        for i in range(n):
            self.tree[self.size + i] = nums[i]
        for i in range(self.size - 1, 0, -1):
            self.tree[i] = self.tree[2*i] + self.tree[2*i + 1]

    def update(self, index, val):
        i = self.size + index
        self.tree[i] = val
        i //= 2
        while i:
            self.tree[i] = self.tree[2*i] + self.tree[2*i + 1]
            i //= 2

    def query(self, left, right):
        l, r = left + self.size, right + self.size
        s = 0
        while l <= r:
            if l % 2 == 1:
                s += self.tree[l]
                l += 1
            if r % 2 == 0:
                s += self.tree[r]
                r -= 1
            l //= 2
            r //= 2
        return s
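
The class above is the iterative, bottom-up form. The recursive formulation described under Query(l, r) and Point Update can be sketched as below; the class name RecursiveSegmentTree and the default-argument style are assumptions for illustration.

```python
class RecursiveSegmentTree:
    """Range-sum tree; node i covers [lo, hi], children are 2*i and 2*i + 1."""

    def __init__(self, nums):
        self.n = len(nums)
        self.tree = [0] * (4 * max(self.n, 1))
        if self.n:
            self._build(nums, 1, 0, self.n - 1)

    def _build(self, nums, i, lo, hi):
        if lo == hi:
            self.tree[i] = nums[lo]
            return
        mid = (lo + hi) // 2
        self._build(nums, 2 * i, lo, mid)
        self._build(nums, 2 * i + 1, mid + 1, hi)
        self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]

    def query(self, left, right, i=1, lo=0, hi=None):
        """Sum of nums[left..right], both ends inclusive."""
        if hi is None:
            hi = self.n - 1
        if right < lo or hi < left:
            return 0                       # no overlap: identity for sum
        if left <= lo and hi <= right:
            return self.tree[i]            # node segment fully inside the query
        mid = (lo + hi) // 2
        return (self.query(left, right, 2 * i, lo, mid)
                + self.query(left, right, 2 * i + 1, mid + 1, hi))

    def update(self, index, val, i=1, lo=0, hi=None):
        """Set nums[index] = val and refresh the sums on the path to the root."""
        if hi is None:
            hi = self.n - 1
        if lo == hi:
            self.tree[i] = val
            return
        mid = (lo + hi) // 2
        if index <= mid:
            self.update(index, val, 2 * i, lo, mid)
        else:
            self.update(index, val, 2 * i + 1, mid + 1, hi)
        self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]

st = RecursiveSegmentTree([1, 3, 5, 7, 9])
before = st.query(1, 3)    # 3 + 5 + 7
st.update(2, 10)
after = st.query(1, 3)     # 3 + 10 + 7
```

This version does not pad to a power of two, which is why the 4*n allocation is used; it is also the form that extends most directly to lazy propagation.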

Example

Array [1, 3, 5, 7, 9]. Build: leaves 1,3,5,7,9; parents = sum of children. query(1, 3) = 3+5+7 = 15. update(2, 10): set index 2 to 10, then update ancestors; query(1, 3) = 3+10+7 = 20.

Time and Space Complexity

Build: O(n). Query / point update: O(log n).
Space: O(n) for the tree array (about 4n nodes).

Common Mistakes

Wrong segment boundaries in query. The iterative query that merges segments (l, r with l%2, r%2) must use the same 0-based or 1-based convention as your build. Check that [left, right] is inclusive and matches the problem.
Array size too small. Use at least 4*n (or 2 * next power of 2) so that all nodes fit; otherwise index 2*i or 2*i+1 can go out of bounds.
Update: forgetting to propagate. After updating the leaf, update all ancestors (i = i//2 in a loop) with the same combine function (e.g. sum of children).

Interview Insight

"I'll use a segment tree with array representation: root at 1, children at 2*i and 2*i+1. Build bottom-up. For range sum query I merge segments that fall entirely inside [l, r]. For point update I update the leaf and then propagate to the root. Time O(n) build, O(log n) per query and update." Mention lazy propagation if the problem has range updates.

Practice Problems

LeetCode 307: Range Sum Query - Mutable (segment tree or BIT).
LeetCode 315: Count of Smaller Numbers After Self (segment tree / BIT for rank).
SPOJ RMQ or similar: range min/max query with point update.

Expert Tip

Segment tree: array representation, root at 1; children at 2*i and 2*i+1. Build bottom-up. Query: merge segments that lie inside [l, r]. Update: update leaf then propagate up. For range-update use lazy propagation.
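
For the range-update case, lazy propagation can be sketched as below for "add delta to every element in [l, r]" combined with range sum. The class and method names are assumptions, and the sketch assumes a non-empty input array.

```python
class LazySegmentTree:
    """Range add + range sum; node i covers [lo, hi], children are 2*i and 2*i + 1."""

    def __init__(self, nums):
        self.n = len(nums)                 # assumed non-empty in this sketch
        self.total = [0] * (4 * self.n)
        self.lazy = [0] * (4 * self.n)     # pending per-element add not yet pushed to children
        self._build(nums, 1, 0, self.n - 1)

    def _build(self, nums, i, lo, hi):
        if lo == hi:
            self.total[i] = nums[lo]
            return
        mid = (lo + hi) // 2
        self._build(nums, 2 * i, lo, mid)
        self._build(nums, 2 * i + 1, mid + 1, hi)
        self.total[i] = self.total[2 * i] + self.total[2 * i + 1]

    def _apply(self, i, lo, hi, delta):
        self.total[i] += delta * (hi - lo + 1)   # every element in the segment grows by delta
        self.lazy[i] += delta

    def _push(self, i, lo, hi):
        """Hand the pending add down to both children before descending."""
        if self.lazy[i]:
            mid = (lo + hi) // 2
            self._apply(2 * i, lo, mid, self.lazy[i])
            self._apply(2 * i + 1, mid + 1, hi, self.lazy[i])
            self.lazy[i] = 0

    def range_add(self, left, right, delta, i=1, lo=0, hi=None):
        if hi is None:
            hi = self.n - 1
        if right < lo or hi < left:
            return
        if left <= lo and hi <= right:
            self._apply(i, lo, hi, delta)        # stop here; children are updated lazily
            return
        self._push(i, lo, hi)
        mid = (lo + hi) // 2
        self.range_add(left, right, delta, 2 * i, lo, mid)
        self.range_add(left, right, delta, 2 * i + 1, mid + 1, hi)
        self.total[i] = self.total[2 * i] + self.total[2 * i + 1]

    def range_sum(self, left, right, i=1, lo=0, hi=None):
        if hi is None:
            hi = self.n - 1
        if right < lo or hi < left:
            return 0
        if left <= lo and hi <= right:
            return self.total[i]
        self._push(i, lo, hi)
        mid = (lo + hi) // 2
        return (self.range_sum(left, right, 2 * i, lo, mid)
                + self.range_sum(left, right, 2 * i + 1, mid + 1, hi))

lazy_tree = LazySegmentTree([1, 3, 5, 7, 9])
lazy_tree.range_add(1, 3, 10)          # array becomes [1, 13, 15, 17, 9]
```

Both operations stay O(log n) because a fully covered node records the update once instead of visiting every leaf beneath it.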

Summary

Segment tree: Binary tree over segments; each node = aggregate for [l, r]. Build O(n); query/point update O(log n).
Use for range sum/min/max and point (or range with lazy) updates. Array size ~4n; query/update by walking from leaves to root.

11.14 Fenwick Tree (BIT)

Introduction

A Fenwick Tree (Binary Indexed Tree, BIT) supports prefix sum (sum of the first i elements, i.e. indices 1 to i in the 1-indexed layout used here) and point update (add a delta to one element) in O(log n) time and O(n) space. It is simpler and often faster in practice than a segment tree for these two operations. Range sum [l, r] = prefix_sum(r) − prefix_sum(l−1). The key idea: each index i in the BIT array stores the sum of a contiguous segment ending at i; the segment length is determined by the lowest set bit of i (i & -i). This section covers the 1-indexed BIT for prefix sum and point update.

Idea

We use a 1-indexed array tree. Index i is responsible for the segment of length lsb(i) = i & -i ending at position i: that is, indices [i - lsb(i) + 1, i]. To compute prefix_sum(i), we add tree[i], then subtract lsb(i) and repeat: i -= i & -i until i is 0. To update(i, delta), we add delta to tree[i], then add lsb(i) and repeat: i += i & -i until we exceed n. Both are O(log n).

Diagram: Fenwick Tree — Which Index Covers What

  Index i   Binary   lsb(i)=i&-i   Covers range (1-indexed)
  ------    ------   -----------   ----------------------
  1         0001     1             [1..1]
  2         0010     2             [1..2]
  3         0011     1             [3..3]
  4         0100     4             [1..4]
  5         0101     1             [5..5]
  6         0110     2             [5..6]
  7         0111     1             [7..7]
  8         1000     8             [1..8]

  prefix_sum(7) = tree[7] + tree[6] + tree[4]  (7 → 6 → 4 → 0; add and subtract lsb)
  update(5, d): add d to tree[5], tree[6], tree[8]  (5 → 6 → 8; add lsb)

Operations

prefix_sum(i): Sum of original array from index 1 to i (1-indexed). Start with sum = 0, pos = i. While pos > 0: sum += tree[pos]; pos -= pos & -pos. Return sum.

update(i, delta): Add delta to the element at index i. Start with pos = i. While pos ≤ n: tree[pos] += delta; pos += pos & -pos.

range_sum(l, r): prefix_sum(r) − prefix_sum(l − 1). Use 1-indexed l, r.

class FenwickTree:
    def __init__(self, n):
        self.n = n
        self.tree = [0] * (n + 1)

    def update(self, i, delta):
        while i <= self.n:
            self.tree[i] += delta
            i += i & -i

    def prefix_sum(self, i):
        s = 0
        while i > 0:
            s += self.tree[i]
            i -= i & -i
        return s

    def range_sum(self, l, r):
        return self.prefix_sum(r) - self.prefix_sum(l - 1)

Build from array: Initialize tree to zeros, then for each index i call update(i, arr[i]) (using 1-based index). O(n log n). Or build in O(n) by filling tree[i] with sum of segment and then updating "parents" (standard O(n) build exists).
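
The O(n) build can be sketched as follows: copy the values in, then let each index push its partial sum once to the next index whose segment contains it. The function names are assumptions; the input arr is an ordinary 0-indexed Python list.

```python
def build_fenwick(arr):
    """Return a 1-indexed BIT array for arr in O(n)."""
    n = len(arr)
    tree = [0] + list(arr)             # tree[i] starts as the single element at i
    for i in range(1, n + 1):
        parent = i + (i & -i)          # next index whose segment contains i
        if parent <= n:
            tree[parent] += tree[i]    # tree[i] is final by the time we reach i
    return tree

def fenwick_prefix_sum(tree, i):
    """Sum of the first i elements (1-indexed)."""
    s = 0
    while i > 0:
        s += tree[i]
        i -= i & -i
    return s

bit = build_fenwick([1, 3, 5, 7])      # [0, 1, 4, 5, 16]
```

Each index does O(1) work, so the whole build is linear, compared with O(n log n) for n separate update calls.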

Example

Array [1, 3, 5, 7] (1-indexed: indices 1..4). After build: prefix_sum(3) = 1+3+5 = 9. update(2, 2): add 2 to index 2 (so value becomes 5 and the array is [1, 5, 5, 7]); prefix_sum(3) = 1+5+5 = 11. range_sum(2, 4) = prefix_sum(4) − prefix_sum(1) = 18 − 1 = 17.

Time and Space Complexity

Update, prefix_sum, range_sum: O(log n) per call.
Space: O(n). Build: O(n log n) with naive updates; O(n) with careful build.

Edge Cases (Fenwick Tree)

1-indexed vs 0-indexed: The standard BIT is 1-indexed. If your problem uses 0-based indices, convert: update(i+1, delta) and prefix_sum(i+1); range_sum(l, r) uses prefix_sum(r+1) − prefix_sum(l).
Empty array or n=0: Initialize with n+1 size; avoid update/query when n is 0 (or handle with a guard).
Range [l, r] when l=1: range_sum(1, r) = prefix_sum(r) − prefix_sum(0); prefix_sum(0) should be 0 (no elements).

Common Mistakes

Using 0-based index in the BIT. The lsb logic and "which index covers what" assume 1-based indices. If you use 0-based, the covering ranges and update/query logic change; stick to 1-based and convert at the interface.
Forgetting to add delta (or setting value). update(i, delta) adds delta to the element. If the problem says "set value at i to v," you need to add (v − old_value) or maintain the array and update with the difference.
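
Both points (convert indices at the interface, and turn "set" into "add the difference") can be combined in a small wrapper. This is a sketch; the class name ZeroBasedBIT and its method names are assumptions.

```python
class ZeroBasedBIT:
    """0-indexed interface over a 1-indexed Fenwick tree, with a set operation."""

    def __init__(self, nums):
        self.n = len(nums)
        self.vals = [0] * self.n           # current values, needed to compute deltas
        self.tree = [0] * (self.n + 1)
        for i, v in enumerate(nums):
            self.set(i, v)

    def add(self, i, delta):
        self.vals[i] += delta
        pos = i + 1                        # shift to the internal 1-based index
        while pos <= self.n:
            self.tree[pos] += delta
            pos += pos & -pos

    def set(self, i, value):
        self.add(i, value - self.vals[i])  # "set" = add the difference

    def prefix_sum(self, count):
        """Sum of the first count elements, i.e. indices 0 .. count-1."""
        s, pos = 0, count
        while pos > 0:
            s += self.tree[pos]
            pos -= pos & -pos
        return s

    def range_sum(self, l, r):
        """Sum of indices l..r inclusive (0-based)."""
        return self.prefix_sum(r + 1) - self.prefix_sum(l)

zb = ZeroBasedBIT([1, 3, 5, 7])
zb.set(1, 5)                               # array becomes [1, 5, 5, 7]
```

Keeping the plain values alongside the tree costs O(n) extra space but makes set a one-liner.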

Interview Insight

"Fenwick tree uses 1-indexed array; each index i stores the sum of a segment of length (i & -i) ending at i. Prefix sum: add tree[i] and do i -= i & -i. Update: add delta and do i += i & -i. Range sum is prefix_sum(r) − prefix_sum(l−1). O(log n) per operation, O(n) space. Simpler than segment tree for prefix/range sum + point update."

Practice Problems

LeetCode 307: Range Sum Query - Mutable (BIT or segment tree).
LeetCode 315: Count of Smaller Numbers After Self (BIT for rank/count).
Inversion count: use BIT to count smaller elements to the right.
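
The inversion-count idea can be sketched as follows: scan from right to left, and for each element ask the BIT how many strictly smaller values have already been seen. The function name and the coordinate-compression step are assumptions added for illustration.

```python
def count_inversions(nums):
    """Number of pairs (i, j) with i < j and nums[i] > nums[j]."""
    # Coordinate-compress values to ranks 1..m so they can index the BIT.
    rank = {v: i + 1 for i, v in enumerate(sorted(set(nums)))}
    m = len(rank)
    tree = [0] * (m + 1)
    inversions = 0
    for x in reversed(nums):           # the BIT holds the elements to the right of x
        # Prefix sum over ranks below rank[x]: strictly smaller elements to the right.
        pos = rank[x] - 1
        while pos > 0:
            inversions += tree[pos]
            pos -= pos & -pos
        # Record x itself.
        pos = rank[x]
        while pos <= m:
            tree[pos] += 1
            pos += pos & -pos
    return inversions

example = count_inversions([2, 4, 1, 3, 5])   # pairs (2,1), (4,1), (4,3)
```

The same loop, recording the per-element count instead of a running total, gives the answer shape needed for LeetCode 315.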

Expert Tip

BIT is 1-indexed. Update: i += i & -i. Prefix sum: i -= i & -i. Range sum [l, r] = prefix_sum(r) − prefix_sum(l−1). Use when you need prefix/range sum + point updates; simpler than segment tree for that.

Summary

Fenwick Tree (BIT): 1-indexed array; index i covers segment of length i & -i ending at i. prefix_sum and update in O(log n).
range_sum(l, r) = prefix_sum(r) − prefix_sum(l−1). Build by updates in O(n log n) or O(n) with proper build.

11.15 Sparse Table

Introduction

A Sparse Table is a data structure for range min/max (or other idempotent) queries on a static array. After O(n log n) preprocessing, each query is answered in O(1) time. It works for idempotent operations such as min, max, and gcd, where combining a value with itself gives the same value (so overlapping intervals are fine). It does not support point updates; the array is static. For range sum, a prefix-sum array handles the static case and a segment tree or BIT handles updates. This section covers the idea, build, and query for range minimum.

Idea

Precompute st[i][j] = minimum of the segment starting at index i with length 2^j (i.e. arr[i..i+2^j−1]). We can compute st[i][j] from st[i][j−1] and st[i + 2^(j−1)][j−1] (two halves). For a query [l, r], let k = floor(log2(r − l + 1)). The segments [l, l+2^k−1] and [r−2^k+1, r] cover [l, r] (they overlap but for min/max that's fine). So query(l, r) = min(st[l][k], st[r−2^k+1][k]).

Build

st[i][0] = arr[i] (length 1). For j from 1 to max_j: st[i][j] = min(st[i][j−1], st[i + 2^(j−1)][j−1]), for all i such that the second segment is in bounds. Precompute log2 for integers (e.g. log_table[length] = k) for O(1) query.

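def build_sparse_table(arr):
    """Return st where st[i][j] = min(arr[i .. i + 2^j - 1])."""
    n = len(arr)
    if n == 0:
        return []
    max_j = n.bit_length()  # floor(log2(n)) + 1, computed without floating point
    st = [[0] * max_j for _ in range(n)]
    for i in range(n):
        st[i][0] = arr[i]
    for j in range(1, max_j):
        step = 1 << (j - 1)
        # Only starts i whose length-2^j segment fits inside the array.
        for i in range(n - (1 << j) + 1):
            st[i][j] = min(st[i][j-1], st[i + step][j-1])
    return st

def query_min(st, l, r, log_table):
    """Minimum of arr[l..r] (inclusive); log_table[x] must equal floor(log2(x))."""
    length = r - l + 1
    k = log_table[length]
    return min(st[l][k], st[r - (1 << k) + 1][k])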

Precomputing log2 for Query

To get k = floor(log2(length)) in O(1), precompute an array log_table: for each length from 1 to n, store the exponent k such that 2^k ≤ length < 2^(k+1). Then query_min(st, l, r, log_table) uses k = log_table[r - l + 1].
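
One common way to fill log_table uses the recurrence floor(log2(x)) = floor(log2(x // 2)) + 1. A minimal sketch, where the function name build_log_table is an assumption:

```python
def build_log_table(n):
    """log_table[x] = floor(log2(x)) for 1 <= x <= n (index 0 is unused)."""
    log_table = [0] * (n + 1)
    for length in range(2, n + 1):
        log_table[length] = log_table[length // 2] + 1
    return log_table

log_table = build_log_table(8)   # [0, 0, 1, 1, 2, 2, 2, 2, 3]
```

The table costs O(n) to build and avoids floating-point log calls at query time.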

Example

arr = [3, 2, 4, 5, 1, 1, 5, 2]. st[i][0] = arr[i]. st[0][1] = min(arr[0..1]) = 2; st[0][2] = min(arr[0..3]) = 2. Query [2, 5]: length 4, k=2; min(st[2][2], st[5-4+1][2]) = min(min(arr[2..5]), min(arr[2..5])) = min(1, 1) = 1.

Time and Space Complexity

Build: O(n log n) time and space (n × log n table).
Query: O(1) time.
No updates: Sparse table is static. For point updates use segment tree.

When to Use

Use sparse table when: array is static, queries are range min/max/gcd, and you need O(1) per query. Use segment tree when you need point/range updates or non-idempotent (e.g. sum) queries.

Common Mistakes

Using for range sum. Sum is not idempotent (overlapping segments would double-count). Use segment tree or BIT for sum.
Off-by-one in query. The second segment starts at r - (1 << k) + 1 and has length 2^k, so it ends at r. Check that both segments are within [l, r].

Interview Insight

"For static RMQ I can use a sparse table: precompute st[i][j] = min over [i, i+2^j−1] in O(n log n). Query [l,r]: k = floor(log2(r-l+1)); return min(st[l][k], st[r-2^k+1][k]) in O(1). Works for min, max, gcd—idempotent only. No updates."

Practice Problems

Range Minimum Query (RMQ) on static array.
Problems where you need many range min/max queries and the array doesn't change.

Expert Tip

st[i][j] = op over arr[i..i+2^j−1]. Build: st[i][0]=arr[i]; st[i][j] = op(st[i][j-1], st[i+2^(j-1)][j-1]). Query [l,r]: k = floor(log2(r-l+1)); return op(st[l][k], st[r-2^k+1][k]). Idempotent only (min, max, gcd).
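
The generic form with a pluggable op can be sketched as below. The class name SparseTable is an assumption, and note one layout difference from the text: this sketch stores levels[j][i] (one list per power of two) rather than st[i][j], which avoids allocating unused cells.

```python
from math import gcd

class SparseTable:
    """Static range queries for an idempotent op (min, max, gcd)."""

    def __init__(self, arr, op=min):
        n = len(arr)
        self.op = op
        self.log = [0] * (n + 1)                 # log[x] = floor(log2(x))
        for length in range(2, n + 1):
            self.log[length] = self.log[length // 2] + 1
        self.levels = [list(arr)]                # levels[j][i] = op over arr[i .. i + 2^j - 1]
        j = 1
        while (1 << j) <= n:
            prev, half = self.levels[-1], 1 << (j - 1)
            self.levels.append([op(prev[i], prev[i + half])
                                for i in range(n - (1 << j) + 1)])
            j += 1

    def query(self, l, r):
        """op over arr[l..r], both ends inclusive."""
        k = self.log[r - l + 1]
        return self.op(self.levels[k][l], self.levels[k][r - (1 << k) + 1])

arr = [3, 2, 4, 5, 1, 1, 5, 2]
range_min = SparseTable(arr).query(2, 5)                     # min(4, 5, 1, 1)
range_max = SparseTable(arr, max).query(1, 3)                # max(2, 4, 5)
range_gcd = SparseTable([12, 18, 24, 36], gcd).query(0, 2)   # gcd(12, 18, 24)
```

Passing a non-idempotent op such as addition would silently double-count the overlap, so the caller is responsible for choosing a suitable op.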

Summary

Sparse table: Static array; O(n log n) build, O(1) range min/max (or gcd) query. Idempotent operations only.
st[i][j] = min over [i, i+2^j−1]. Query: two overlapping segments of length 2^k cover [l, r]. No updates.

11.16 Binary Lifting

Introduction

Binary Lifting is a technique on rooted trees that precomputes "power-of-two" steps upward from each node. With O(n log n) preprocessing, you can answer k-th ancestor (the node reached by going up k edges from a node) in O(log n) per query, and Lowest Common Ancestor (LCA) in O(log n) as well. The idea is similar to a sparse table: store up[u][j] = the node reached by moving 2^j steps up from u, then decompose any jump into binary. This section covers the precomputation, k-th ancestor, and LCA using binary lifting.

Formal Definition

Given a rooted tree with n nodes and root r, define:

parent(u) = the parent of node u (root has no parent).
up[u][0] = parent(u); up[u][j] = up[ up[u][j−1] ][j−1] for j ≥ 1 (i.e. 2^j-th ancestor = two steps of 2^(j−1)).
K-th ancestor of u: the node reached by moving exactly k edges from u toward the root (0-th ancestor = u; 1-st = parent).
LCA(u, v): the deepest node that is an ancestor of both u and v.

Mental Model

Think of climbing the tree in "powers of two": from any node you can jump 1, 2, 4, 8, … steps up. To move up k steps, write k in binary (e.g. 5 = 101) and apply the corresponding jumps (1 step + 4 steps). The precomputed table up[u][j] gives you the result of a single 2^j-step jump from u.

Precomputation

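Assume we have parent[u] and depth[u] for each node (from a BFS/DFS from the root). Let LOG = floor(log2(n)) + 1 (n.bit_length() in Python), so that LOG ≥ 1 even for a single-node tree and every possible depth (at most n − 1) fits in LOG bits.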

up[u][0] = parent[u] (or -1 / None for root).
For j from 1 to LOG−1: up[u][j] = up[ up[u][j−1] ][j−1] if up[u][j−1] is valid; else -1.

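Fill the table one level j at a time (outer loop over j, inner loop over all nodes): level j reads only level j−1, which is already complete, so the nodes can be visited in any order. If you instead fill the whole row up[u][*] node by node, visit the nodes in BFS order so that every ancestor's row is ready first. Time O(n log n), space O(n log n).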
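
The parent and depth arrays assumed above can be produced with one BFS from the root. A minimal sketch, assuming the tree is given as an adjacency list adj over nodes 0..n−1 (the function name and input format are assumptions):

```python
from collections import deque

def parents_and_depths(adj, root):
    """Return (parent, depth) lists for a tree given as an adjacency list."""
    n = len(adj)
    parent = [-1] * n                  # -1 marks "no parent" (the root)
    depth = [0] * n
    seen = [False] * n
    seen[root] = True
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if not seen[v]:            # skips the edge back to u's own parent
                seen[v] = True
                parent[v] = u
                depth[v] = depth[u] + 1
                queue.append(v)
    return parent, depth

# Edges: 0-1, 0-2, 1-3, 2-4 (undirected), rooted at 0.
adj = [[1, 2], [0, 3], [0, 4], [1], [2]]
parent, depth = parents_and_depths(adj, 0)
```

BFS is used rather than recursive DFS so that deep, path-like trees do not hit Python's recursion limit.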

K-th Ancestor Query

To find the k-th ancestor of u: if k > depth[u], no such node (return -1). Otherwise, for each bit set in k, jump by that power of two. Example: k = 5 (binary 101) → jump 2^0 then 2^2: u = up[u][0], then u = up[u][2]. Iterate j from 0 to LOG−1; if k has the j-th bit set, do u = up[u][j]. O(log n) time.

LCA Using Binary Lifting

1) Bring both nodes to the same depth: if depth[u] > depth[v], replace u with its (depth[u]−depth[v])-th ancestor (using k-th ancestor); similarly if depth[v] > depth[u]. 2) If u == v, return u. 3) Lift both u and v in large steps: for j from LOG−1 down to 0, if up[u][j] != up[v][j], set u = up[u][j], v = up[v][j]. After the loop, u and v are one step below the LCA, so parent[u] (or up[u][0]) is the LCA. O(log n) per query.

Diagram: Binary Lifting (up table)

  Tree (root at top):        up[u][j] = 2^j-th ancestor of u

        r (root)             u=3: up[3][0]=1, up[3][1]=r
       / \                   u=2: up[2][0]=r, up[2][1]=-1
      1   2                  K-th ancestor of 3, k=2: 3 -> up[3][1]=r
       \   \                 LCA(3,4): same depth (both 2).
        3   4                j=1: up[3][1]=up[4][1]=r -> no jump.
                             j=0: up[3][0]=1 != up[4][0]=2 -> u=1, v=2.
                             LCA = up[1][0] = r.

def build_binary_lifting(parent, n):
    LOG = (n).bit_length()
    up = [[-1] * LOG for _ in range(n)]
    for u in range(n):
        up[u][0] = parent[u]
    for j in range(1, LOG):
        for u in range(n):
            if up[u][j-1] != -1:
                up[u][j] = up[up[u][j-1]][j-1]
    return up

def kth_ancestor(up, u, k, depth):
    if k > depth[u]:
        return -1
    for j in range(len(up[0])):
        if (k >> j) & 1:
            u = up[u][j]
            if u == -1:
                return -1
    return u

def lca(up, depth, u, v):
    if depth[u] < depth[v]:
        u, v = v, u
    d = depth[u] - depth[v]
    u = kth_ancestor(up, u, d, depth)
    if u == v:
        return u
    LOG = len(up[0])
    for j in range(LOG - 1, -1, -1):
        if up[u][j] != up[v][j]:
            u, v = up[u][j], up[v][j]
    return up[u][0]

Example

Tree: root 0, children 1,2; 1's child 3; 2's child 4. parent = [-1,0,0,1,2], depth = [0,1,1,2,2], n = 5, so LOG = 3. up[3][0]=1, up[3][1]=0, up[3][2]=-1. kth_ancestor(up, 3, 2, depth) = up[3][1] = 0. lca(up, depth, 3, 4): both nodes are already at depth 2. j=2: up[3][2] = up[4][2] = -1, equal, so no jump. j=1: up[3][1] = up[4][1] = 0, equal, so no jump (jumping here would overshoot to the root). j=0: up[3][0]=1 and up[4][0]=2 differ, so u=1, v=2. The loop ends with u and v one step below the LCA, so the answer is up[1][0] = 0.

Time and Space Complexity

Preprocessing: O(n log n) time and space (n nodes × log n levels).
K-th ancestor: O(log n) per query.
LCA: O(log n) per query.

When to Use

Use binary lifting when you have a static tree and many k-th ancestor or LCA queries. For a single LCA query, one traversal to compute parent/depth followed by a simple climb is O(n) and needs no table; binary lifting pays off when the O(n log n) preprocessing is shared by many queries. It is also a building block in more advanced tree techniques (e.g. path aggregates with a segment tree over an Euler tour).

Edge Cases

Root: parent[root] = -1; up[root][j] = -1 for all j. kth_ancestor(root, 0) = root; kth_ancestor(root, 1) = -1.
K > depth: kth_ancestor(u, k) should return -1 (or invalid) when k exceeds depth[u].
LCA(u, u): return u. Same depth step is a no-op when depths are equal.

Common Mistakes

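Fill order in build. up[u][j] reads level j−1 of another node (the node up[u][j−1]), so complete level j−1 for all nodes before starting level j, as build_binary_lifting does; if you fill row by row instead, visit nodes in BFS/level order so that ancestors come first.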
LCA loop direction. In the "lift both" step, iterate j from LOG−1 down to 0 so you take the largest possible steps first and don't overshoot the LCA.

Interview Insight

"I'll use binary lifting: precompute up[u][j] = 2^j-th ancestor in O(n log n). K-th ancestor: break k into bits and jump. LCA: bring both to same depth with k-th ancestor, then lift both while up[u][j] != up[v][j]. Final parent is LCA. O(log n) per query."

Practice Problems

LeetCode 1483: Kth Ancestor of a Tree Node (binary lifting).
LeetCode 236: Lowest Common Ancestor of a Binary Tree (also solvable with binary lifting on general tree after parent/depth build).
Distance between two nodes: depth[u] + depth[v] − 2*depth[LCA(u,v)].
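The distance formula above can be sketched end to end as follows. This is a self-contained illustration under stated assumptions: the tree is given as an adjacency list adj, and the helper names build_up_and_depth and distance are hypothetical. Here each node's row is filled as soon as the node is discovered by BFS, which works because all of its ancestors were discovered earlier.

```python
from collections import deque

def build_up_and_depth(adj, root=0):
    """BFS from root; fill each node's jump row when the node is discovered."""
    n = len(adj)
    LOG = max(1, n.bit_length())
    up = [[-1] * LOG for _ in range(n)]
    depth = [0] * n
    seen = [False] * n
    seen[root] = True
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if not seen[v]:
                seen[v] = True
                depth[v] = depth[u] + 1
                up[v][0] = u
                for j in range(1, LOG):
                    mid = up[v][j - 1]
                    if mid == -1:
                        break              # ran past the root
                    up[v][j] = up[mid][j - 1]
                queue.append(v)
    return up, depth

def lca(up, depth, u, v):
    if depth[u] < depth[v]:
        u, v = v, u
    diff = depth[u] - depth[v]
    j = 0
    while diff:                            # lift u by diff, one bit at a time
        if diff & 1:
            u = up[u][j]
        diff >>= 1
        j += 1
    if u == v:
        return u
    for j in range(len(up[0]) - 1, -1, -1):
        if up[u][j] != up[v][j]:           # jump only while staying below the LCA
            u, v = up[u][j], up[v][j]
    return up[u][0]

def distance(up, depth, u, v):
    w = lca(up, depth, u, v)
    return depth[u] + depth[v] - 2 * depth[w]

# Tree: 0 is the root with children 1 and 2; 3 is a child of 1; 4 is a child of 2.
adj = [[1, 2], [0, 3], [0, 4], [1], [2]]
up, depth = build_up_and_depth(adj)
print(distance(up, depth, 3, 4))  # path 3-1-0-2-4 has 4 edges
```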

Expert Tip

up[u][0]=parent; up[u][j]=up[up[u][j-1]][j-1]. K-th ancestor: for each bit in k, u=up[u][j]. LCA: same depth (k-th ancestor), then for j from high to 0 if up[u][j]!=up[v][j] lift both; return up[u][0].

Summary

Binary lifting: Precompute up[u][j] = 2^j-th ancestor; O(n log n) build, O(log n) k-th ancestor and LCA.
K-th ancestor: decompose k in binary and jump. LCA: same depth, then lift both until parents match; that parent is LCA.

11.17 Euler Tour (Tree Flattening)

Introduction

An Euler Tour (or tree flattening) is a DFS order that visits each node when entering and when leaving the subtree. The result is an array where every subtree corresponds to a contiguous segment: if node u has in[u] and out[u] (first and last time we visit it), then the segment [in[u], out[u]] contains exactly the nodes in the subtree of u. This lets you answer "subtree queries" (e.g. sum or update all nodes in subtree of u) using a segment tree or Fenwick tree on the flattened array. Build: one DFS, O(n).

Formal Definition

For a rooted tree, perform a DFS from the root and keep a running index. When we first enter a node u, record in[u] = current_index and append u to the tour. When all children of u are processed and we backtrack from u, record out[u]. What out[u] stores depends on the variant. In the enter + exit variant, u is appended again on exit and out[u] is the index of that second copy, so the segment [in[u], out[u]] has length 2·subtree_size(u). In the subtree-flattening variant, each node is stored once (at first visit) and out[u] is the last index of any node in the subtree of u, so [in[u], out[u]] has length subtree_size(u) and contains exactly the subtree. Subtree queries normally use the second variant.

Two Common Variants

Enter + exit (full tour): Push node when entering and when leaving. In the segment [in[u], out[u]] every node of the subtree appears exactly twice, with u itself at the two ends. Useful for path queries (e.g. count edges on path) with a different structure.
Subtree flattening (one index per node): Store each node at the time of first visit. Fill an array ord[] so that ord[in[u]] = u and the range in[u] .. out[u] is exactly the in-times of all nodes in the subtree of u. Then subtree of u = segment [in[u], out[u]] in ord (or in a value array indexed by in-time).

Mental Model

Imagine walking along the edges of the tree: start at the root, go down to a child, and eventually backtrack. Write down a node only the first time you enter it. Because a DFS finishes the whole subtree of u before moving on to anything else, all descendants of u are written down immediately after u, with nothing from outside the subtree in between. So the block that starts at in[u] and ends at the last descendant entered, out[u], is exactly the subtree of u in the flattened array.

Build (DFS)

Initialize a timer timer = 0. For each node u: set in[u] = timer, then timer += 1 (or append u to tour). Recurse on all children. Set out[u] = timer - 1 (last index that belongs to u's subtree). So subtree of u = [in[u], out[u]] inclusive. O(n) time.

Diagram: Euler Tour (subtree = contiguous segment)

  Tree:        r          DFS order (enter only): r, 1, 3, 2, 4
              / \         in[r]=0, in[1]=1, in[3]=2, in[2]=3, in[4]=4
             1   2        out[r]=4, out[1]=2, out[3]=2, out[2]=4, out[4]=4
            /     \       Subtree of 1 = [1,2] -> nodes 1,3. Subtree of r = [0,4] -> all.
           3       4      Subtree of 2 = [3,4] -> nodes 2,4.

def euler_tour(g, root=0):
    n = len(g)
    in_time = [0] * n
    out_time = [0] * n
    timer = [0]
    def dfs(u, parent):
        in_time[u] = timer[0]
        timer[0] += 1
        for v in g[u]:
            if v != parent:
                dfs(v, u)
        out_time[u] = timer[0] - 1
    dfs(root, -1)
    return in_time, out_time

# Subtree of u in flattened array = [in_time[u], out_time[u]]
# Use with segment tree / BIT: value at index in_time[u] = value of node u

Use Case: Subtree Sum / Update

Store val[in_time[u]] = value_of_node_u. Subtree sum for u = range query [in_time[u], out_time[u]] on a segment tree or BIT. Point update at node u = update index in_time[u]. Range update on subtree of u = update segment [in_time[u], out_time[u]], which needs a segment tree with lazy propagation (or a range-update BIT variant). O(log n) per operation with segment tree/BIT.
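The subtree-sum use case can be sketched as below. This is an illustration under assumptions: the names euler_tour_iterative, Fenwick, and subtree_sum are hypothetical, the tour is computed with an explicit stack (to avoid Python's recursion limit on deep trees), and the Fenwick tree exposes a 0-based interface over a 1-based internal array.

```python
def euler_tour_iterative(adj, root=0):
    """Return (tin, tout) so that the subtree of u is positions tin[u]..tout[u]."""
    n = len(adj)
    tin = [0] * n
    tout = [0] * n
    timer = 0
    stack = [(root, -1, False)]
    while stack:
        u, p, leaving = stack.pop()
        if leaving:
            tout[u] = timer - 1            # last index used inside u's subtree
            continue
        tin[u] = timer
        timer += 1
        stack.append((u, p, True))         # revisit u after its children
        for v in reversed(adj[u]):
            if v != p:
                stack.append((v, u, False))
    return tin, tout

class Fenwick:
    """Point update, prefix/range sum; indices passed in are 0-based."""
    def __init__(self, n):
        self.tree = [0] * (n + 1)

    def add(self, i, delta):
        i += 1
        while i < len(self.tree):
            self.tree[i] += delta
            i += i & -i

    def prefix(self, i):
        """Sum of positions 0..i inclusive."""
        i += 1
        total = 0
        while i > 0:
            total += self.tree[i]
            i -= i & -i
        return total

    def range_sum(self, lo, hi):
        return self.prefix(hi) - (self.prefix(lo - 1) if lo > 0 else 0)

# Tree: 0 is the root with children 1 and 2; 3 is a child of 1; 4 is a child of 2.
adj = [[1, 2], [0, 3], [0, 4], [1], [2]]
values = [10, 20, 30, 40, 50]
tin, tout = euler_tour_iterative(adj)
bit = Fenwick(len(adj))
for u, val in enumerate(values):
    bit.add(tin[u], val)                   # node u lives at index tin[u]

def subtree_sum(u):
    return bit.range_sum(tin[u], tout[u])

print(subtree_sum(1))  # nodes 1 and 3: 20 + 40 = 60
```

A point update on node u is bit.add(tin[u], delta); every ancestor's subtree sum reflects it on the next query.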

Time and Space Complexity

Build: O(n) one DFS. in/out arrays: O(n).
Subtree query/update: O(log n) with segment tree or BIT on the flattened array of size n.

Edge Cases

Single node: in[u] = out[u]; segment has one element.
Root: Subtree of root = [0, n−1].

Common Mistakes

Confusing in/out with "enter/exit" variant. For subtree = contiguous segment, use "first visit" only and out[u] = last index in subtree (as built above).
0-based vs 1-based. Segment tree/BIT on indices 0..n−1: use in_time and out_time as 0-based; if BIT is 1-based, use in_time[u]+1.

Interview Insight

"I'll flatten the tree with a DFS: in[u] when we enter, out[u] when we leave. Subtree of u is the contiguous segment [in[u], out[u]]. Then I can use a segment tree or BIT on that array for subtree sum/update in O(log n)."

Practice Problems

Subtree sum queries / subtree update (e.g. add x to all nodes in subtree of u).
Problems that need "all nodes in subtree" as a range on an array (e.g. tree + segment tree).

Expert Tip

Euler tour: DFS, in[u] = timer++, recurse, out[u] = timer−1. Subtree of u = [in[u], out[u]]. Put node values at in[u]; use segment tree/BIT for subtree queries/updates.

Summary

Euler tour (flattening): DFS to get in[u], out[u]; subtree of u = contiguous segment [in[u], out[u]].
Use with segment tree or BIT for O(log n) subtree sum/update. Build O(n).

11.18 Heavy-Light Decomposition

Introduction

Heavy-Light Decomposition (HLD) splits a rooted tree into a set of heavy paths such that any path from a node to the root intersects at most O(log n) heavy paths. Each heavy path is a contiguous segment in a DFS order, so you can use a segment tree (or similar) on the concatenation of these paths to support path queries (e.g. max edge weight on path u–v) and sometimes path updates. HLD is used when you need many path-aggregate queries/updates on a tree; building it is O(n), and each path query is O(log² n) with a segment tree over chains.

Formal Definition

For each node u, the heavy child is the child whose subtree has the largest size (break ties arbitrarily). The heavy edge is the edge from u to its heavy child; all other edges from u to children are light edges. A heavy path is a maximal sequence of nodes connected by heavy edges. Each node belongs to exactly one heavy path. The topmost node of that path (the one closest to the root) is the head of the chain. When we DFS and assign positions, we visit the heavy child first so that each heavy path becomes a contiguous segment in the DFS order.

Mental Model

From each node, pick the "heaviest" child (largest subtree) and call that edge heavy; the rest are light. Following heavy edges from any node leads down to a leaf and forms one chain. The tree is partitioned into chains; a path from u to root goes "up" and switches chains exactly when it crosses a light edge. Crossing a light edge from v up to its parent p more than doubles the subtree size (the heavy child of p is at least as large as v, so size[p] > 2·size[v]), so you switch chains at most O(log n) times.

Steps to Build

Compute size[u] for all nodes (DFS).
For each node, mark the heavy child (child with max size).
DFS again (heavy child first) to assign pos[u] in a global array and head[u] (chain head). Each chain is contiguous in the global array.

Then map node values to the segment tree at indices pos[u]. Path from u to v: find LCA w; path u–v = path u–w + path w–v. To query path u–w: while u is not in the same chain as w, query segment [pos[head[u]], pos[u]], then move u to parent of head[u]. Repeat until u and w are in the same chain; then query [pos[w], pos[u]]. Same for the other half. O(log n) chain jumps × O(log n) segment tree query = O(log² n).

Diagram: Heavy vs light edges

  Tree (numbers = subtree sizes):   Edge labels: H = heavy, L = light
           6 (root)                        r
          / \                             / \
         3   2                           H   L
        / \   \                         / \   \
       1   1   1                       H   L   H
      Heavy child = child with largest subtree (ties broken arbitrarily; an only child is always heavy).
      Path from leaf to root crosses O(log n) light edges.

def dfs_size(g, u, p, size):
    size[u] = 1
    for v in g[u]:
        if v != p:
            dfs_size(g, v, u, size)
            size[u] += size[v]

def dfs_hld(g, u, p, size, head, pos, head_of, timer):
    head_of[u] = head
    pos[u] = timer[0]
    timer[0] += 1
    heavy = None
    for v in g[u]:
        if v != p and (heavy is None or size[v] > size[heavy]):
            heavy = v
    if heavy is not None:
        dfs_hld(g, heavy, u, size, head, pos, head_of, timer)
    for v in g[u]:
        if v != p and v != heavy:
            dfs_hld(g, v, u, size, v, pos, head_of, timer)  # new chain, head = v

# Query path u -> v: split at LCA w. For u->w: while u not in chain of w, query [pos[head_of[u]], pos[u]], u = parent[head_of[u]]; then query [pos[w], pos[u]].
# Same for v->w. Combine results (e.g. max of segments).

Path Query (u to v)

Let w = LCA(u, v). Path u–v = path u–w plus path w–v. To get aggregate from u to w: while u and w are in different chains, query the segment from pos[head_of[u]] to pos[u], then set u = parent[head_of[u]]. When u and w are in the same chain, query [pos[w], pos[u]] and combine. Do the same for v to w. Combine the two halves (e.g. take max, or concatenate).
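A compact, self-contained sketch of a path-maximum query follows. It is a variant, not the exact procedure above: instead of computing the LCA first, it repeatedly lifts whichever node has the deeper chain head until both nodes share a chain, which visits the same segments. The names MaxSegTree, build_hld, and path_max are hypothetical, the build is iterative rather than recursive, and the segment tree is read-only for brevity. Chains are contiguous in pos, which is all a path query needs.

```python
class MaxSegTree:
    """Bottom-up segment tree for range maximum (no updates in this sketch)."""
    def __init__(self, data):
        self.n = len(data)
        self.t = [float("-inf")] * self.n + list(data)
        for i in range(self.n - 1, 0, -1):
            self.t[i] = max(self.t[2 * i], self.t[2 * i + 1])

    def query(self, lo, hi):
        """Maximum of data[lo..hi], both ends inclusive."""
        best = float("-inf")
        lo += self.n
        hi += self.n + 1
        while lo < hi:
            if lo & 1:
                best = max(best, self.t[lo])
                lo += 1
            if hi & 1:
                hi -= 1
                best = max(best, self.t[hi])
            lo >>= 1
            hi >>= 1
        return best

def build_hld(adj, root=0):
    n = len(adj)
    parent, depth = [-1] * n, [0] * n
    size, heavy = [1] * n, [-1] * n
    order, stack = [], [root]
    while stack:                      # preorder without recursion
        u = stack.pop()
        order.append(u)
        for v in adj[u]:
            if v != parent[u]:
                parent[v], depth[v] = u, depth[u] + 1
                stack.append(v)
    for u in reversed(order):         # children are finished before parents
        p = parent[u]
        if p != -1:
            size[p] += size[u]
            if heavy[p] == -1 or size[u] > size[heavy[p]]:
                heavy[p] = u
    head, pos, timer = list(range(n)), [0] * n, 0
    stack = [root]
    while stack:
        h = stack.pop()               # h is the head of a new chain
        u = h
        while u != -1:                # walk down the heavy path from h
            head[u], pos[u] = h, timer
            timer += 1
            for v in adj[u]:
                if v != parent[u] and v != heavy[u]:
                    stack.append(v)   # light child: heads its own chain
            u = heavy[u]
    return parent, depth, head, pos

def path_max(u, v, parent, depth, head, pos, seg):
    best = float("-inf")
    while head[u] != head[v]:
        if depth[head[u]] < depth[head[v]]:
            u, v = v, u               # lift the node whose chain head is deeper
        best = max(best, seg.query(pos[head[u]], pos[u]))
        u = parent[head[u]]
    lo, hi = sorted((pos[u], pos[v]))
    return max(best, seg.query(lo, hi))

# Edges: 0-1, 0-2, 1-3, 1-4, 2-5, 3-6; values[u] is the value stored at node u.
adj = [[1, 2], [0, 3, 4], [0, 5], [1, 6], [1], [2], [3]]
values = [5, 1, 9, 3, 7, 2, 8]
parent, depth, head, pos = build_hld(adj)
data = [0] * len(adj)
for u, val in enumerate(values):
    data[pos[u]] = val
seg = MaxSegTree(data)
print(path_max(6, 4, parent, depth, head, pos, seg))  # path 6-3-1-4 -> 8
print(path_max(4, 5, parent, depth, head, pos, seg))  # path 4-1-0-2-5 -> 9
```

Because every node on the path is covered by exactly one queried segment in this variant, the same loop also works for sums without double-counting the LCA.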

Time and Space Complexity

Build: O(n) for size DFS + HLD DFS.
Path query/update: O(log² n) with segment tree (O(log n) chain jumps × O(log n) per segment query).
Space: O(n) for pos, head, size, plus segment tree O(n).

When to Use

Use HLD when you need path queries or updates (max/min/sum on path u–v, or update edges/nodes on a path). For subtree queries only, Euler tour + segment tree is simpler. HLD is heavier to implement but standard for path problems on trees.

Edge Cases

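Leaf: No heavy child, so its chain ends there. A leaf reached by a heavy edge is the last node of its parent's chain; a leaf reached by a light edge forms a one-node chain (head = itself).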
Path u–u: Single node; return value at u (or identity for aggregate).

Common Mistakes

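Querying the wrong segment. When u and w are in the same chain, the segment is [pos[w], pos[u]]: w is an ancestor of u, and the heavy-first DFS assigns smaller positions to nodes higher in a chain, so pos[w] ≤ pos[u]. If you skip the LCA and compare u and v directly, order the two positions first.
Forgetting to combine both halves. Path u–v = u–LCA + LCA–v; combine the two aggregates correctly (e.g. for max, take max of both; for sum, add). Both halves contain the LCA, which is harmless for max/min but double-counts it for sum, so exclude it from one half.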

Interview Insight

"I'll use heavy-light decomposition: heavy child = largest subtree. Heavy paths are contiguous in DFS order. Path from u to v: go to LCA in O(log n) steps, each step querying one segment on a segment tree. Total O(log² n) per path query. Use when we need path max/sum/update."

Practice Problems

Path maximum/minimum/sum query on a tree (nodes or edges).
Path update (e.g. add x to all nodes on path u–v).
CP/contest problems that explicitly ask for HLD or "path queries on tree."

Expert Tip

HLD: heavy child = max subtree size; DFS heavy first so each chain is contiguous. path_query(u,v) = split at LCA; climb from u (and v) by chains, query segment [pos[head[u]], pos[u]], then move to parent of head. O(log² n) with segment tree.

Summary

HLD: Partition tree into heavy paths; any path to root crosses O(log n) chains. Chains are contiguous in segment tree.
Path query: climb from u and v to LCA, query segment per chain; O(log² n). Build O(n).

11.19 Centroid Decomposition

Introduction

Centroid Decomposition is a technique that recursively splits a tree by centroids. A centroid is a node whose removal leaves no connected component of size greater than n/2. Every tree has at least one centroid (and at most two). We pick a centroid, solve for it (e.g. count paths through it), then remove it and recurse on each remaining component. The recursion depth is O(log n), so total work over all levels is often O(n log n). It is used for problems like "count pairs of nodes (u, v) such that distance(u, v) = k" or "sum of distances to all nodes."

Formal Definition

In a tree of n nodes, a node c is a centroid if every connected component of the tree after removing c has size ≤ n/2. Equivalently: for every neighbor v of c, the size of the subtree of v (when rooting at c) is ≤ n/2. To find a centroid: start at any node, then repeatedly move to the neighbor that has subtree size > n/2 until no such neighbor exists; that node is a centroid. The centroid tree is built by: pick centroid c, remove c, recurse on each component; the centroid tree has c as root and its children are the roots of the centroid trees of those components. Depth of centroid tree is O(log n).

Mental Model

Think of the centroid as the "balance point" of the tree: no single branch has more than half the nodes. After removing it, we have several smaller trees; we recursively find their centroids and make them children of the current centroid. Every node has at most O(log n) ancestors in the centroid tree, and every path u–v in the original tree passes through the lowest common ancestor of u and v in the centroid tree. That is why each path can be counted exactly once as a "path through centroid c" before we recurse.

Finding a Centroid

1) Root the tree arbitrarily; compute size[u] for all nodes (DFS). 2) Start at the root. 3) If there exists a child v such that size[v] > n/2, move to v and repeat. 4) The node where we stop is a centroid. O(n) time.

Using Centroid Decomposition (e.g. count paths of length k)

At each centroid c: count paths of length k that pass through c. Such a path has one endpoint in one component (after removing c) and the other in another (or c itself). For each component, compute distances from c to all nodes in that component; store counts by distance. Then for each distance d in one component, we need k−d in another; add to answer. Then mark c as removed and recurse on each component. Total O(n log n) if we do O(size) work per centroid.
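The counting scheme above can be sketched as a single function. This is an illustration under assumptions: count_pairs_at_distance is a hypothetical name, edges are unweighted, k ≥ 1, the tree has at least one node, and all traversals are iterative. Each branch is matched against earlier branches before its own distances are added, so no pair inside one branch is counted at this centroid.

```python
from collections import Counter

def count_pairs_at_distance(adj, k):
    """Count unordered pairs (u, v), u != v, with dist(u, v) == k (k >= 1)."""
    n = len(adj)
    removed = [False] * n
    size = [0] * n

    def find_centroid(start):
        # Preorder of the current component, skipping removed nodes.
        parent = {start: -1}
        order, stack = [], [start]
        while stack:
            u = stack.pop()
            order.append(u)
            for v in adj[u]:
                if v != parent[u] and not removed[v]:
                    parent[v] = u
                    stack.append(v)
        for u in order:
            size[u] = 1
        for u in reversed(order):          # children before parents
            if parent[u] != -1:
                size[parent[u]] += size[u]
        half = len(order) // 2
        c = start
        while True:                        # walk toward the oversized child
            big = [v for v in adj[c]
                   if not removed[v] and v != parent[c] and size[v] > half]
            if not big:
                return c
            c = big[0]

    def branch_depths(start):
        # Distances from the centroid to every node in one branch, capped at k.
        out, stack = [], [(start, -1, 1)]
        while stack:
            u, p, d = stack.pop()
            if d > k:
                continue
            out.append(d)
            for v in adj[u]:
                if v != p and not removed[v]:
                    stack.append((v, u, d + 1))
        return out

    total = 0
    pending = [0]
    while pending:
        c = find_centroid(pending.pop())
        removed[c] = True
        seen = Counter({0: 1})             # the centroid itself, at distance 0
        for v in adj[c]:
            if removed[v]:
                continue
            depths = branch_depths(v)
            for d in depths:               # pair with earlier branches (or c)
                total += seen[k - d]
            for d in depths:               # only now make this branch visible
                seen[d] += 1
            pending.append(v)
    return total

# Path graph 0-1-2-3-4: the pairs at distance 2 are (0,2), (1,3), (2,4).
path = [[1], [0, 2], [1, 3], [2, 4], [3]]
print(count_pairs_at_distance(path, 2))  # 3
```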

Diagram: Centroid and decomposition

  Tree (n=5):        Centroid: remove 2 -> components {1,4}, {3}, {5} of size 2, 1, 1 (all <= 5/2)
     1 - 2 - 3        So 2 is a centroid. Centroid tree: 2 is the root; its children are the centroid
    /     \            of {1,4} (either node), 3, and 5. Depth O(log n).
   4       5

def get_size(g, u, p, size, removed):
    size[u] = 1
    for v in g[u]:
        if v != p and not removed[v]:
            get_size(g, v, u, size, removed)
            size[u] += size[v]

def find_centroid(g, u, p, n, size, removed):
    for v in g[u]:
        if v != p and not removed[v] and size[v] > n // 2:
            return find_centroid(g, v, u, n, size, removed)
    return u

def decompose(g, u, removed, parent_centroid):
    # parent_centroid: id of the parent in the centroid tree (-1 at the top level)
    size = {}
    get_size(g, u, -1, size, removed)
    n = size[u]
    c = find_centroid(g, u, -1, n, size, removed)
    # Process paths through c here (e.g. count paths of length k), while the
    # component of c is still intact; traversals from c must skip removed nodes.
    removed[c] = True  # stays removed: deeper levels must never cross c
    for v in g[c]:
        if not removed[v]:
            decompose(g, v, removed, c)

Time and Space Complexity

Find centroid: O(n) per level. Recursion depth O(log n), so total O(n log n) for building the decomposition (if we do size DFS per centroid).
Path-count type problems: Often O(n log n) or O(n log² n) depending on how we aggregate per centroid.
Space: O(n) for size, removed, and centroid tree.

When to Use

Use centroid decomposition when the problem asks for counting paths with a property (e.g. length = k, sum of weights = k), or aggregating over all pairs (e.g. sum of distances). The key is "paths through current centroid" then recurse. For single-source or single-path queries, BFS/DFS or LCA may be simpler.

Edge Cases

n = 1: The only node is the centroid; no children.
Path graph (a chain): Centroid is the middle node(s); recursion depth is O(log n).

Common Mistakes

Size in wrong tree. When computing size for "find centroid," only count nodes in the current component (ignore removed nodes).
Counting the same path twice. When counting paths through c, ensure you count pairs (u, v) with u and v in different components (or one is c) and don't double-count.

Interview Insight

"I'll use centroid decomposition: find a node whose removal leaves no component larger than n/2, count paths through it (e.g. by distance buckets in each component), then recurse on components. Depth O(log n), so total O(n log n) for path-counting problems."

Practice Problems

Count pairs (u, v) such that dist(u, v) = k.
Sum of distances between all pairs, or from a set of nodes to all others.
Problems that say "paths in a tree" and need to aggregate over many paths.

Expert Tip

Centroid: remove it, all components size ≤ n/2. Find by moving to child with size > n/2 until none. Decompose: pick centroid, solve paths through it, recurse on components. Depth O(log n); use for path-count and pair-aggregate problems.

Summary

Centroid: Node whose removal leaves no component of size > n/2. Found in O(n).
Centroid decomposition: Recursively pick centroid, count/solve paths through it, recurse. O(log n) depth, often O(n log n) total. Use for path counting and distance problems.

Section 12: Heap

This section covers heaps: complete binary trees that satisfy the heap property. You will learn Min Heap and Max Heap, heapify, and classic patterns like Top K elements, merge K sorted lists, and median in a stream. Master these to tackle priority-queue problems in interviews and contests.

12.1 Min Heap

Introduction

A min heap is a complete binary tree where every node has a value smaller than or equal to the values of its children. The smallest element is always at the root. Min heaps are used whenever you need fast access to the minimum (e.g. priority queues, scheduling, finding the K smallest elements). Unlike a sorted array, you can insert and remove the minimum in O(log n) time while keeping the structure valid.

Real-World Analogy

Imagine a hospital emergency queue: patients are not served in arrival order but by priority (e.g. severity). The person with the smallest “priority number” (most urgent) is at the front. When someone new arrives, they are placed in the right spot; when the front is served, the next most urgent rises to the top. The min heap is exactly this: the “smallest” (highest priority) is always at the root, and updates are done in logarithmic time.

Formal Definition

Complete binary tree: All levels are fully filled except possibly the last, which is filled from left to right.
Min-heap property: For every node i, value(i) ≤ value(children of i). So the root has the minimum value in the entire tree.
We do not require ordering between siblings (e.g. left vs right); only parent ≤ children.

Why This Topic Matters

Min heaps power priority queues, which appear in scheduling, graph algorithms (Dijkstra), merge K sorted lists, Top K problems, and median-finding. Interviewers often ask you to implement a heap from scratch or to recognize when “always take the smallest/largest” suggests a heap. Understanding the array representation and the bubble-up / bubble-down operations is essential for both implementation and complexity analysis.

Mental Model

Think of the heap as a pyramid of values: the smallest sits on top. When you add a new value, drop it at the next free spot (the leftmost empty position on the bottom level, which is the end of the array), then let it float up by swapping with its parent until the parent is no larger or you hit the root. When you remove the minimum, you take the root, replace it with the last element in the tree, then let that value sink down by swapping with the smaller child until neither child is smaller or you hit a leaf. “Float up” and “sink down” keep the min-heap property with O(log n) work per operation.

Array Representation (Critical)

We store the heap in a 0-indexed array and use index arithmetic to navigate the tree:

Root at index 0.
For a node at index i: parent at (i - 1) // 2, left child at 2*i + 1, right child at 2*i + 2.

So we never need pointers; the tree structure is implicit. The array must stay “complete”: we always add at the end and remove from the end when we pop the root.
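The index formulas can be checked directly on a small array. In this sketch the helper names parent, left, right, and is_min_heap are assumptions for illustration; is_min_heap simply verifies that every non-root element is at least as large as its parent.

```python
def parent(i):
    return (i - 1) // 2

def left(i):
    return 2 * i + 1

def right(i):
    return 2 * i + 2

def is_min_heap(a):
    """True if every non-root element is >= its parent."""
    return all(a[parent(i)] <= a[i] for i in range(1, len(a)))

heap = [2, 5, 7, 9, 6, 10, 8]
print(parent(4), left(1), right(1))   # 1 3 4
print(is_min_heap(heap))              # True
print(is_min_heap([2, 5, 7, 4]))      # False: 4 sits below 5
```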

ASCII Diagram

  Min heap (tree view)          Array: [2, 5, 7, 9, 6, 10, 8]
          2 (root, min)
         / \
        5   7
       / \  / \
      9  6 10  8

  Index:  0  1  2  3  4  5  6
  Parent of 4: (4-1)//2 = 1  → value 5
  Left of 1: 2*1+1 = 3  → value 9
  Right of 1: 2*1+2 = 4 → value 6
  Every node ≤ its children.

Core Operations

1. Insert (push)

Append the new element at the end of the array (next free position in the complete tree).
Bubble up (sift-up): Compare with parent; if smaller, swap with parent and repeat until the parent is ≤ the element or we reach the root.

Cost: O(log n) because the path from leaf to root has at most ⌊log₂ n⌋ + 1 nodes.

2. Get minimum (peek)

Return the element at index 0. No structural change. O(1).

3. Remove minimum (pop)

Save the root (minimum) to return later.
Replace the root with the last element of the array and shrink the size by one.
Bubble down (sift-down): Compare the new root with both children; if it is greater than the smaller child, swap with that child and repeat until both children are ≥ current or we reach a leaf.

Cost: O(log n) because we traverse at most one path from root to leaf.

Python Implementation

class MinHeap:
    def __init__(self):
        self.heap = []

    def _parent(self, i):
        return (i - 1) // 2

    def _left(self, i):
        return 2 * i + 1

    def _right(self, i):
        return 2 * i + 2

    def _swap(self, i, j):
        self.heap[i], self.heap[j] = self.heap[j], self.heap[i]

    def push(self, x):
        self.heap.append(x)
        self._sift_up(len(self.heap) - 1)

    def _sift_up(self, i):
        while i > 0:
            p = self._parent(i)
            if self.heap[i] >= self.heap[p]:
                break
            self._swap(i, p)
            i = p

    def peek(self):
        if not self.heap:
            raise IndexError("peek from empty heap")
        return self.heap[0]

    def pop(self):
        if not self.heap:
            raise IndexError("pop from empty heap")
        n = len(self.heap)
        self._swap(0, n - 1)
        min_val = self.heap.pop()
        if self.heap:
            self._sift_down(0)
        return min_val

    def _sift_down(self, i):
        n = len(self.heap)
        while True:
            left = self._left(i)
            right = self._right(i)
            smallest = i
            if left < n and self.heap[left] < self.heap[smallest]:
                smallest = left
            if right < n and self.heap[right] < self.heap[smallest]:
                smallest = right
            if smallest == i:
                break
            self._swap(i, smallest)
            i = smallest

    def size(self):
        return len(self.heap)

Line-by-Line Explanation

_parent(i), _left(i), _right(i): Index formulas for the implicit binary tree; keep the logic in one place.
push(x): Append at end, then _sift_up from that index until the min-heap property holds (current ≥ parent or at root).
_sift_up(i): While not at root, compare with parent; if smaller, swap and move index to parent.
pop(): Swap root with last element, remove last (that’s the min), then _sift_down(0) so the new root sinks to the correct level.
_sift_down(i): Find the smallest among node i and its two children; if the smallest is a child, swap with that child and repeat from the child index; otherwise stop.

Time Complexity

push: O(log n). One append + at most O(log n) swaps along the path to the root.
peek: O(1).
pop: O(log n). Swap + pop + at most O(log n) swaps along one path downward.
Building a heap from n elements by repeated push: O(n log n). Building in place with heapify (topic 12.3) is O(n).

Space Complexity

O(n) for storing n elements in the array. No extra space proportional to n for the operations (only a few variables for indices).

Edge Cases

Empty heap: peek and pop should raise or return a sentinel; the implementation above raises IndexError.
Single element: After one push, one pop leaves the heap empty; no need to sift down.
Duplicate values: Min heap allows duplicates; any of the equal minima can be returned first. No need for stable ordering unless the problem requires it.

Common Mistake

In _sift_down, compare with both children and swap with the smaller one. Swapping with the larger child makes it the parent of the smaller child, which violates the min-heap property immediately.

Expert Tip

Python’s heapq module is a min-heap on a list. Use heapq.heappush(h, x), heapq.heappop(h), heapq.heapify(lst). These functions take no key or comparator argument, so for a max-heap negate the values, or push tuples or wrapper objects whose ordering (__lt__) is reversed. Knowing both the library and the manual implementation makes you interview-ready.
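A short sketch of the library calls named in this tip, using only heapq.heapify, heapq.heappush, and heapq.heappop. The variable names are illustrative; the second half shows the negation trick for max-first order.

```python
import heapq

nums = [7, 2, 9, 4, 2]

# Min-heap: heapify in place, push one more value, then drain in ascending order.
h = list(nums)
heapq.heapify(h)
heapq.heappush(h, 1)
ascending = [heapq.heappop(h) for _ in range(len(h))]
print(ascending)   # [1, 2, 2, 4, 7, 9]

# Max-first order: store negated values and negate again on the way out.
neg = [-x for x in nums]
heapq.heapify(neg)
descending = [-heapq.heappop(neg) for _ in range(len(neg))]
print(descending)  # [9, 7, 4, 2, 2]
```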

Evolution: From Naive to Heap

For “repeatedly get the minimum and possibly add new elements”:

Brute force: Store elements in an unsorted list; each “get min” is O(n) scan, each insert O(1). Total for n get-mins + inserts: O(n²).
Better: Keep a sorted list; get min O(1), but insert is O(n) to maintain order. Still O(n²) for n operations if we do many inserts.
Optimal: Min heap — get min O(1), insert and remove min O(log n). n operations → O(n log n).

Interview Insight

If the problem involves “K smallest/largest”, “merge K sorted”, “stream of numbers and report median”, or “schedule by priority”, think heap. Clarify whether you can use heapq or must implement from scratch; then use the array index formulas and sift-up/sift-down correctly.

Summary

Min heap: Complete binary tree with parent ≤ children; minimum at root.
Array: Index i → parent (i-1)//2, left 2*i+1, right 2*i+2.
Insert: Append, then sift up. Remove min: Swap root with last, pop last, then sift down. Peek: Return root.
Time: push O(log n), pop O(log n), peek O(1). Space: O(n).

12.2 Max Heap

Introduction

A max heap is a complete binary tree where every node has a value greater than or equal to the values of its children. The largest element is always at the root. Max heaps are used when you need fast access to the maximum: Top K largest elements, scheduling by highest priority, or any “repeatedly take the biggest” scenario. The structure and array representation are identical to the min heap; only the comparison direction changes.

Relationship to Min Heap

Max heap is the mirror of min heap: replace “smaller” with “larger” in every comparison. Parent ≥ both children; root holds the global maximum. All operations (push, pop, peek) have the same O(log n) or O(1) complexity. In Python’s heapq (which is min-heap only), a common trick for a max heap is to negate values: push -x, and when you pop, take -heapq.heappop(h) to get the real maximum.

Formal Definition

Complete binary tree: Same as min heap — all levels full except possibly the last, filled left to right.
Max-heap property: For every node i, value(i) ≥ value(children of i). The root is the maximum.

Why This Topic Matters

Many problems ask for the K largest elements, the maximum in a sliding window, or “always process the highest-priority item.” Max heap (or negated min heap) is the right tool. Interviewers may ask you to implement a max heap from scratch or to adapt a min-heap solution; knowing the single comparison flip makes this straightforward.

Mental Model

Same pyramid as min heap, but the largest is on top. On insert: append at the end, then sift up by swapping with the parent while the current value is greater than the parent. On remove max: save the root, replace root with the last element, pop the last, then sift down by swapping with the larger child until both children are ≤ current or you reach a leaf.

Array Representation

Identical to min heap: root at 0, parent (i-1)//2, left 2*i+1, right 2*i+2. Only the sift logic uses “greater than” instead of “less than.”

ASCII Diagram

  Max heap (tree view)          Array: [10, 8, 7, 5, 6, 3, 4]
          10 (root, max)
         /  \
        8    7
       / \   / \
      5  6  3  4

  Every node ≥ its children. Sift-up: swap if current > parent.
  Sift-down: swap with the *larger* child if current < that child.

Core Operations (Summary)

Push: Append, then sift up (swap with parent while heap[i] > heap[parent]).
Peek: Return heap[0]. O(1).
Pop (remove max): Swap root with last, pop last, then sift down (swap with the larger child while current is smaller than that child).

Python Implementation

class MaxHeap:
    def __init__(self):
        self.heap = []

    def _parent(self, i):
        return (i - 1) // 2

    def _left(self, i):
        return 2 * i + 1

    def _right(self, i):
        return 2 * i + 2

    def _swap(self, i, j):
        self.heap[i], self.heap[j] = self.heap[j], self.heap[i]

    def push(self, x):
        self.heap.append(x)
        self._sift_up(len(self.heap) - 1)

    def _sift_up(self, i):
        while i > 0:
            p = self._parent(i)
            if self.heap[i] <= self.heap[p]:  # stop when current <= parent
                break
            self._swap(i, p)
            i = p

    def peek(self):
        if not self.heap:
            raise IndexError("peek from empty heap")
        return self.heap[0]

    def pop(self):
        if not self.heap:
            raise IndexError("pop from empty heap")
        n = len(self.heap)
        self._swap(0, n - 1)
        max_val = self.heap.pop()
        if self.heap:
            self._sift_down(0)
        return max_val

    def _sift_down(self, i):
        n = len(self.heap)
        while True:
            left = self._left(i)
            right = self._right(i)
            largest = i
            if left < n and self.heap[left] > self.heap[largest]:   # compare with >
                largest = left
            if right < n and self.heap[right] > self.heap[largest]:
                largest = right
            if largest == i:
                break
            self._swap(i, largest)
            i = largest

    def size(self):
        return len(self.heap)

Min Heap vs Max Heap: Comparison

Aspect	Min Heap	Max Heap
Property	Parent ≤ children	Parent ≥ children
Root	Minimum	Maximum
Sift-up	Swap if current < parent	Swap if current > parent
Sift-down	Swap with smaller child	Swap with larger child
heapq	Direct use	Negate values: push -x, pop -val

Expert Tip

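To use heapq as a max heap: heapq.heappush(h, -x) and max_val = -heapq.heappop(h). For objects, push (-priority, item) to get max-first ordering. If two priorities can tie and the items themselves are not comparable, add a tie-breaker such as an insertion counter: (-priority, count, item).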
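
As a rough illustration of the negation trick with non-numeric items, the sketch below serves tasks highest-priority first. The task names and priorities are made up for the example, and the counter used as a tie-breaker is an assumption of this sketch rather than something heapq requires.

```python
import heapq
from itertools import count

# Hypothetical (name, priority) pairs; a higher priority should be served first.
tasks = [("write report", 2), ("fix outage", 9), ("reply to email", 2), ("deploy", 5)]

heap = []
order = count()  # tie-breaker: insertion order, so the names are never compared
for name, priority in tasks:
    heapq.heappush(heap, (-priority, next(order), name))

served = []
while heap:
    neg_priority, _, name = heapq.heappop(heap)
    served.append((name, -neg_priority))

print(served)
# [('fix outage', 9), ('deploy', 5), ('write report', 2), ('reply to email', 2)]
```

With the counter in the second position, equal priorities come out in insertion order.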

Time and Space Complexity

Same as min heap: push O(log n), pop O(log n), peek O(1). Space O(n).

When to Use Which

Min heap: K smallest, merge K sorted (smallest next), Dijkstra (smallest distance), upper half of median (min of upper half).
Max heap: K largest, “highest priority first,” sliding window max (often with deque or two heaps), lower half of median (max of lower half).

Interview Insight

If the problem says “K largest,” use a max heap or a min heap of size K (keep only K elements; pop the smallest when exceeding K). For “K smallest,” use a max heap of size K (pop the largest) or a min heap and pop K times. State which you’re using and why.

Summary

Max heap: Complete binary tree with parent ≥ children; maximum at root.
Same array layout as min heap; only comparisons are reversed (use > and “larger child” in sift-down).
heapq in Python is min-heap only; use negated values for a max heap.
Same complexities as min heap; choose by whether you need “smallest” or “largest” at the root.

12.3 Heapify

Introduction

Heapify is the operation of turning an arbitrary array of numbers into a valid heap (min or max) in place. You already use it when you call heapq.heapify(lst) or when building a heap from a list. The surprising part: building a heap from n elements can be done in O(n) time, not O(n log n). This lesson explains why that is true and how to implement and use heapify correctly.

Why This Topic Matters

Whenever you have a pre-existing list of values and need a heap (e.g., for Top K, merge K sorted, or a one-time priority queue), building the heap with heapify is faster than pushing each element one by one. Understanding the O(n) analysis also comes up in interviews when you're asked to optimize "build heap from array."

Two Ways to Build a Heap

Method 1: Repeated Insert (Push) — O(n log n)

Start with an empty heap. For each element in the list, call push(x). Each push does O(log n) work in the worst case (sift up), and we do n pushes, so total time is O(n log n). Simple but not optimal when you already have all the data.

Method 2: Heapify (Bottom-Up Sift-Down) — O(n)

Treat the array as a complete binary tree (same index rules: parent (i-1)//2, left 2*i+1, right 2*i+2). Then, starting from the last non-leaf node down to the root, run sift-down (bubble down) at each node. This restores the heap property in one pass and can be proven to run in O(n).

Concept Note: We sift down, not up, and we start from the bottom of the tree (last non-leaf) toward the root. That way, when we sift down at a node, its children are already valid heaps (either leaves or already fixed).

Mental Model

Leaves are already valid heaps (single nodes).
Index of the last non-leaf in a 0-based array of length n is n // 2 - 1.
For each node from that index down to 0, we "fix" the subtree rooted at that node by sifting the node down until it satisfies the heap property with respect to both children (it is ≤ both children in a min heap, or ≥ both in a max heap) or it reaches a leaf.

ASCII Diagram: Heapify Order (Min Heap)

  Array length n = 7. Last non-leaf index = 7//2 - 1 = 2.

  Tree indices:    0
                 /   \
                1     2     ← start heapify here (index 2), then 1, then 0
               / \   / \
              3   4 5   6

  Order of sift-down: 2 → 1 → 0. At each step, the node may sink down
  multiple levels until its subtree satisfies the heap property.

Formal Definition

Heapify (build heap in place): Given an array A[0..n-1], rearrange it so that the complete binary tree represented by the array satisfies the heap property (min or max). The algorithm is: for i = n//2 - 1 down to 0, perform sift-down at index i.

Step-by-Step: Heapify for Min Heap

Set n = len(arr).
Last non-leaf index: start = n // 2 - 1.
For i = start down to 0 (inclusive):
- Run sift-down at index i: compare with left and right children, swap with the smaller child if the current node is larger, and repeat until the node is ≤ both children or reaches a leaf.

Python Implementation: Heapify (Min Heap)

def heapify_min(arr):
    """Turn list into a min heap in place. O(n)."""
    n = len(arr)

    def sift_down(i):
        while True:
            left = 2 * i + 1
            right = 2 * i + 2
            smallest = i
            if left < n and arr[left] < arr[smallest]:
                smallest = left
            if right < n and arr[right] < arr[smallest]:
                smallest = right
            if smallest == i:
                break
            arr[i], arr[smallest] = arr[smallest], arr[i]
            i = smallest

    # Start from last non-leaf down to root
    for i in range(n // 2 - 1, -1, -1):
        sift_down(i)

Why Is Heapify O(n) and Not O(n log n)?

Intuition: Most nodes are near the bottom of the tree. When we sift down from a node at height h, it can move at most h steps. There are at most ⌈n/2^(h+1)⌉ nodes at height h, which is roughly n/2^(h+1). So the total work is on the order of:

\[ \sum_{h=0}^{\lfloor \log_2 n \rfloor} \frac{n}{2^{h+1}} \cdot h \;\le\; n \sum_{h \ge 0} \frac{h}{2^{h+1}} \;=\; n \cdot 1 \;=\; O(n). \]

The infinite sum $\sum_{h \ge 0} \frac{h}{2^h}$ converges to 2 (so $\sum_{h \ge 0} \frac{h}{2^{h+1}} = 1$), a constant that does not depend on n, so the whole build is O(n). This is why heapq.heapify and a proper "build heap from array" implementation are preferred when you start with a full list.
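
The bound can also be observed empirically by counting swaps. The sketch below is for illustration only: count_heapify_swaps and count_push_swaps are hypothetical helper names, and a descending input is used because it forces every pushed element to climb all the way to the root of a min heap.

```python
def count_heapify_swaps(arr):
    """Bottom-up heapify (min heap) on a copy; returns the number of swaps."""
    a = list(arr)
    n = len(a)
    swaps = 0
    for start in range(n // 2 - 1, -1, -1):
        i = start
        while True:
            left, right = 2 * i + 1, 2 * i + 2
            smallest = i
            if left < n and a[left] < a[smallest]:
                smallest = left
            if right < n and a[right] < a[smallest]:
                smallest = right
            if smallest == i:
                break
            a[i], a[smallest] = a[smallest], a[i]
            swaps += 1
            i = smallest
    return swaps


def count_push_swaps(arr):
    """Repeated push with sift-up (min heap); returns the number of swaps."""
    heap = []
    swaps = 0
    for x in arr:
        heap.append(x)
        i = len(heap) - 1
        while i > 0:
            p = (i - 1) // 2
            if heap[i] >= heap[p]:
                break
            heap[i], heap[p] = heap[p], heap[i]
            swaps += 1
            i = p
    return swaps


n = 1023
descending = list(range(n, 0, -1))  # every new element is smaller than all before it
heapify_swaps = count_heapify_swaps(descending)
push_swaps = count_push_swaps(descending)
print("heapify swaps:", heapify_swaps, "| push swaps:", push_swaps)
```

On this input the heapify count stays below n, while the push count grows like n log n.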

Time and Space Complexity

Time: O(n) for heapify on n elements.
Space: O(1) extra if done in place (only the array is modified).

Using heapq.heapify

import heapq

arr = [7, 2, 5, 1, 9, 3]
heapq.heapify(arr)   # in-place min heap
# arr is now a valid min-heap (e.g. arr[0] is minimum)
print(arr[0])        # 1
print(heapq.heappop(arr))  # 1

Expert Tip: For a max heap in Python, you can heapify a list of negated values: arr_neg = [-x for x in arr], then heapq.heapify(arr_neg). Pop with -heapq.heappop(arr_neg).

Edge Cases

Empty list: n//2 - 1 is -1 in Python (0 // 2 - 1); range(-1, -1, -1) is empty, so there are no iterations, which is correct.
Single element: n//2 - 1 = -1; loop runs zero times; single node is already a heap.
Two elements: One non-leaf at index 0; one sift-down may swap them — correct.
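
These edge cases can be confirmed by listing the indices the loop would visit. In the sketch below, sift_down_order and is_min_heap are hypothetical helper names, and heapq.heapify stands in for the hand-written heapify so the block is self-contained.

```python
import heapq


def sift_down_order(n):
    """Indices visited by bottom-up heapify for an array of length n."""
    return list(range(n // 2 - 1, -1, -1))


def is_min_heap(arr):
    """True if every node is >= its parent (0-based array layout)."""
    return all(arr[(i - 1) // 2] <= arr[i] for i in range(1, len(arr)))


for n in (0, 1, 2, 7):
    print(n, sift_down_order(n))
# 0 []
# 1 []
# 2 [0]
# 7 [2, 1, 0]

for case in ([], [42], [5, 1], [7, 2, 5, 1, 9, 3]):
    data = list(case)
    heapq.heapify(data)
    print(case, "->", data, is_min_heap(data))
```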

Common Mistakes

Common Mistake: Starting sift-down from index 0 (root) first. That doesn't guarantee children are heaps yet. You must start from the last non-leaf so that when you sift down, subtrees below are already valid.

Common Mistake: Using sift-up instead of sift-down for heapify. Sift-up gives O(n log n) when applied to every node; only the bottom-up sift-down yields O(n).

Summary

Heapify builds a heap from an array in place by running sift-down from the last non-leaf (n//2 - 1) down to the root.
Complexity is O(n), not O(n log n), because most nodes are low in the tree and move few steps.
Use heapq.heapify(lst) for a min heap in Python; for max heap, heapify a negated list.
Prefer heapify over n pushes when you already have all elements: it is faster and uses the same space.

12.4 Top K Elements

Introduction

Top K Elements is one of the most common patterns in coding interviews and real systems: given a collection of items (numbers, strings, objects with a score), find the K largest or K smallest elements. Heaps give an efficient, streaming-friendly solution that avoids sorting the entire dataset. Mastering this pattern unlocks problems like "K closest points," "K most frequent elements," and "merge K sorted lists."

Real-World Analogy

Imagine you run a music app with millions of songs. You want to show users "Top 10 most played this week." You could sort all songs by play count and take the first 10 — but sorting millions of entries is expensive and unnecessary. A smarter approach: keep only a small "candidate set" of size K (e.g., a min heap of the top 10 so far). As you scan through songs, you compare each with the smallest in your top 10; if the new song is bigger, it kicks out the smallest and joins the set. At the end, you have exactly the top K without ever fully sorting the list.

Formal Definition

Input: An array (or stream) of n elements and an integer K (1 ≤ K ≤ n).
Output: The K elements that are largest (or smallest) by some comparison key.
Goal: Achieve better than O(n log n) full sort when possible, and support streaming (process elements one by one) when needed.

Why This Topic Matters

Interviewers love "Top K" in various forms: K largest, K smallest, K most frequent, K closest.
Heaps give O(n log K) time and O(K) space — ideal when K is much smaller than n.
The same idea extends to priority queues in Dijkstra, merge K sorted lists, and finding the median of a stream.

Mental Model

K largest: Keep a min heap of size K. The root is the "smallest of the top K." If a new element is larger than the root, pop the root and push the new one. At the end, the heap contains the K largest; order inside the heap doesn't matter for the answer.
K smallest: Keep a max heap of size K (in Python: min heap of negated values). The root is the "largest of the bottom K." If a new element is smaller than that, pop the root and push the new one. Result: K smallest.

Evolution: Brute Force → Better → Optimal

Brute Force: Full Sort

Sort the entire array, then take the first K (for K smallest) or last K (for K largest).

Time: O(n log n).
Space: O(n) or O(log n) depending on sort.

Simple but wasteful when K ≪ n.

Better: Partial Sort or Quickselect

Use quickselect to find the K-th smallest (or largest) element, then partition. Or use a library partial sort.

Time: O(n) on average for quickselect, O(n²) in the worst case with unlucky pivots; for comparison, the heap approach is O(n log K).
Doesn't naturally support streaming (elements arriving one by one).
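
A minimal quickselect sketch is shown here for comparison. It is one possible formulation, not the only one: it uses a random pivot and builds new lists at each step (simpler to read than in-place partitioning, at the cost of extra memory), and kth_largest_quickselect is a hypothetical name.

```python
import random


def kth_largest_quickselect(nums, k):
    """Return the k-th largest value (k is 1-based). Average O(n), worst case O(n^2)."""
    if not 1 <= k <= len(nums):
        raise ValueError("k must be between 1 and len(nums)")
    items = list(nums)        # work on a copy so the caller's list is untouched
    target = len(items) - k   # index of the answer in ascending sorted order
    while True:
        pivot = random.choice(items)
        smaller = [x for x in items if x < pivot]
        equal = [x for x in items if x == pivot]
        larger = [x for x in items if x > pivot]
        if target < len(smaller):
            items = smaller                       # answer lies among the smaller values
        elif target < len(smaller) + len(equal):
            return pivot                          # answer is the pivot itself
        else:
            target -= len(smaller) + len(equal)   # skip everything <= pivot
            items = larger


nums = [3, 2, 1, 5, 6, 4]
print(kth_largest_quickselect(nums, 2))  # 5
```

Once the K-th largest value is known, one more pass over the input collects the K largest elements. The whole input must be available up front, which is why this approach does not fit streams.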

Optimal (for streaming and when K is small): Heap

Maintain a min heap of size K for "K largest" (or max heap of size K for "K smallest"). One pass over the data; each insertion is O(log K).

Time: O(n log K).
Space: O(K).

Works for streams and is easy to reason about in interviews.

Step-by-Step: K Largest Using Min Heap

Create an empty min heap (e.g. Python list with heapq).
For each element x in the array:
- If heap size < K: heapq.heappush(heap, x).
- Else: if x > heap[0], then heapq.heapreplace(heap, x) (or pop then push). This keeps the heap size K and drops the smallest of the current top K when a larger candidate appears.
After the loop, the heap contains exactly the K largest elements. The root is the K-th largest; to get them in sorted order (optional), pop K times or sort the heap.

Python Implementation: K Largest and K Smallest

import heapq

def k_largest(nums, k):
    """Return the K largest elements. Uses min heap of size K."""
    if k <= 0 or not nums:
        return []
    if k >= len(nums):
        return list(nums)

    heap = []
    for x in nums:
        if len(heap) < k:
            heapq.heappush(heap, x)
        elif x > heap[0]:
            heapq.heapreplace(heap, x)  # pop smallest, push x
    return heap  # order is heap order; for sorted: sorted(heap, reverse=True)

def k_smallest(nums, k):
    """Return the K smallest elements. Uses max heap of size K via negated min heap."""
    if k <= 0 or not nums:
        return []
    if k >= len(nums):
        return list(nums)

    heap = []  # min heap of -x => "max heap" of x
    for x in nums:
        if len(heap) < k:
            heapq.heappush(heap, -x)
        elif x < -heap[0]:   # x is smaller than current "max of smallest K"
            heapq.heapreplace(heap, -x)
    return [-x for x in heap]
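
For one-off queries on data that is already in memory, the standard library provides heapq.nlargest and heapq.nsmallest, which return their results in sorted order. A brief illustration follows; the word list is made up for the example.

```python
import heapq

nums = [3, 2, 1, 5, 6, 4]
print(heapq.nlargest(2, nums))                    # [6, 5]
print(heapq.nsmallest(3, [7, 10, 4, 3, 20, 15]))  # [3, 4, 7]

# key= works like it does in sorted(): here, the 2 longest words
words = ["heap", "priority", "queue", "k"]
print(heapq.nlargest(2, words, key=len))          # ['priority', 'queue']
```

In an interview you are usually expected to write the size-K heap loop yourself, but these helpers are handy for checking your own implementation.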

Examples Section

Example 1: K Largest — Walkthrough

Example: nums = [3, 2, 1, 5, 6, 4], K = 2. We want the 2 largest: 5 and 6.

Min heap of size 2 (only the top 2 candidates):

Push 3 → heap [3].
Push 2 → heap [2, 3] (min at root).
1 < 2 (the root) → do nothing; heap stays [2, 3].
5 > 2 → replace: pop 2, push 5 → heap [3, 5].
6 > 3 → replace: pop 3, push 6 → heap [5, 6].
4 < 5 → do nothing.

Final heap = [5, 6]. The 2 largest are 5 and 6. Root (5) is the 2nd largest.

Example 2: K Smallest — Walkthrough

Example: nums = [7, 10, 4, 3, 20, 15], K = 3. We want the 3 smallest: 3, 4, 7.

Max heap of size 3 implemented as min heap of negatives:

Push -7 → heap [-7].
Push -10 → heap [-10, -7]. Root -10 means "largest of small set" is 10.
Push -4 → heap [-10, -7, -4].
3 < 10 (current max of small set) → replace -10 with -3 → heap [-7, -3, -4].
20 > 7 → ignore.
15 > 7 → ignore.

Heap contains [-7, -3, -4] → values [7, 3, 4]. The 3 smallest are 3, 4, 7.

Example 3: Code Run with Output

import heapq

nums = [3, 2, 1, 5, 6, 4]
k = 2

# K largest
heap = []
for x in nums:
    if len(heap) < k:
        heapq.heappush(heap, x)
    elif x > heap[0]:
        heapq.heapreplace(heap, x)

print("K largest (heap order):", heap)           # e.g. [5, 6]
print("K-th largest value:", heap[0])           # 5 (2nd largest)
print("Sorted K largest:", sorted(heap, reverse=True))  # [6, 5]

Output:

K largest (heap order): [5, 6]
K-th largest value: 5
Sorted K largest: [6, 5]

Example 4: K Most Frequent (Top K by Frequency)

Example: Given nums = [1, 1, 1, 2, 2, 3] and K = 2, return the 2 most frequent elements: 1 (freq 3) and 2 (freq 2). Here the "score" is frequency; we want "top K by frequency."

Approach: Count frequencies, then use a min heap of size K on (frequency, element). Python compares tuples by first element, then second. So we store (freq, item) and the heap keeps the K pairs with largest freq (smallest of those K at root).

from collections import Counter
import heapq

def top_k_frequent(nums, k):
    count = Counter(nums)
    heap = []
    for num, freq in count.items():
        if len(heap) < k:
            heapq.heappush(heap, (freq, num))
        elif freq > heap[0][0]:
            heapq.heapreplace(heap, (freq, num))
    return [num for _, num in heap]

# Example run
nums = [1, 1, 1, 2, 2, 3]
print(top_k_frequent(nums, 2))  # [2, 1] (heap order; sort the result if you need a specific order)

Result: the two most frequent elements are 1 and 2.

Time and Space Complexity

Time: O(n log K) — n elements, each heap operation O(log K).
Space: O(K) for the heap. If you count a frequency map, O(n) for that; heap alone is O(K).

Edge Cases

K ≤ 0 or empty array: Return empty list.
K ≥ n: All elements are "top K"; return a copy of the array (or heap of all).
K = 1: One pass with a single variable (or a heap of size 1) to track max or min.
Duplicates: Heap approach naturally handles duplicates; they can coexist in the heap.

Common Mistakes

Common Mistake: Reaching for a max heap of all n elements for "K largest" without weighing the cost. It works (heapify in O(n), then pop K times for O(K log n)), but it needs O(n) space and the whole input up front, so it does not suit streams. The streaming-friendly pattern is a min heap of size K: the "worst" of the top K sits at the root, and you replace it only when something better comes.

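Common Mistake: Forgetting that heapq is a min heap. heapq has no key or comparator parameter, so for K smallest you must negate values (or, for non-numeric items, wrap them in a class whose __lt__ reverses the ordering) to simulate a max heap of size K.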

Pattern Recognition

Use the Top K heap pattern when you see:

"K largest," "K smallest," "K most frequent," "K closest," "K closest points to origin."
Streaming data where you can't sort the whole input.
Problems that reduce to "maintain a set of K best candidates and update as we scan."

Interview Insight

Interview Insight: State clearly: "I'll use a min heap of size K for K largest so the root is the smallest of the top K. When a larger element appears, I replace the root." Mention time O(n log K) and space O(K). For "K smallest," say you use a max heap of size K (e.g. negated min heap in Python).

Practice Problems

LeetCode 215: Kth Largest Element in an Array (use min heap of size K or quickselect).
LeetCode 347: Top K Frequent Elements (count + min heap on frequency).
LeetCode 373: Find K Pairs with Smallest Sums (heap of pairs from two sorted arrays).
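K closest points to origin: "closest" means the K smallest distances, so maintain a max heap of size K keyed by distance (in Python, a min heap of negated distances) and evict the farthest kept point when a closer one arrives.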
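
As one possible sketch of the last problem, the function below keeps the K closest points in a size-K heap of negated squared distances. The name k_closest and the sample points are assumptions made for this illustration.

```python
import heapq


def k_closest(points, k):
    """Return the k points closest to the origin, in no particular order."""
    if k <= 0:
        return []
    heap = []  # entries: (-squared_distance, x, y); the root is the farthest kept point
    for x, y in points:
        dist = x * x + y * y  # squared distance preserves the ordering, no sqrt needed
        if len(heap) < k:
            heapq.heappush(heap, (-dist, x, y))
        elif dist < -heap[0][0]:
            heapq.heapreplace(heap, (-dist, x, y))  # evict the farthest, keep the closer point
    return [(x, y) for _, x, y in heap]


points = [(1, 3), (-2, 2), (5, 8), (0, 1)]
print(sorted(k_closest(points, 2)))  # [(-2, 2), (0, 1)]
```

This is the "K smallest" pattern with distance as the key, so it runs in O(n log K) time and O(K) space.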

Summary

Top K largest: Min heap of size K; replace root when a larger element is seen. Result: heap contains K largest; root = K-th largest.
Top K smallest: Max heap of size K (in Python: min heap of negated values); replace when a smaller element is seen.
Time O(n log K), space O(K). Beats full sort when K ≪ n and supports streaming.
Same idea extends to "top K by any key" (e.g. frequency): use (key, item) in the heap and compare by key.

12.5 Merge K Sorted Lists

Introduction

In many problems, we are given K sorted lists (or arrays, or linked lists) and asked to merge them into one sorted list. A naive pairwise merge can be slow when K or the total number of elements is large. A min heap gives a clean, optimal solution that generalizes the two-list merge you saw in merge sort.

Real-World Analogy

Imagine K checkout queues in a supermarket, each already ordered by arrival time. You want to reconstruct the global order in which customers arrived across all queues. You always pick the earliest arriving customer among the queue fronts, then advance that queue. A min heap automates "find the earliest front" in O(log K) time.

Problem Definition

Input: K sorted lists, total of N elements.
Output: A single sorted list containing all N elements.
Goal: Better than O(NK) and easy to implement for arbitrary K.

Brute Force and Better Approaches

Approach 1: Concatenate + Sort (Brute Force)

Concatenate all K lists into one big list (size N).
Sort the big list using a standard sort: O(N log N).

Simple, but it ignores the fact that each list is already sorted.

Approach 2: Repeated Pairwise Merge

Merge list 1 and list 2 (like merge sort) into a new sorted list.
Merge that result with list 3, and so on.

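Each merge of two lists of total size M costs O(M). With this left-to-right (unbalanced) merging, the early elements are re-copied in every later merge, so the total can approach O(NK). If you instead balance the merges (pair the lists up in rounds, like a tournament), the complexity improves to O(N log K), the same as the heap approach, at the cost of a slightly more involved implementation.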
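
The balanced variant can be sketched as follows. This is an illustrative version under simple assumptions (plain Python lists, merging in rounds); merge_two and merge_k_balanced are hypothetical names.

```python
def merge_two(a, b):
    """Standard two-pointer merge of two sorted lists."""
    merged = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            merged.append(a[i])
            i += 1
        else:
            merged.append(b[j])
            j += 1
    merged.extend(a[i:])  # at most one of these two slices is non-empty
    merged.extend(b[j:])
    return merged


def merge_k_balanced(lists):
    """Merge in rounds; each round halves the number of lists (about log K rounds)."""
    if not lists:
        return []
    current = [list(lst) for lst in lists]
    while len(current) > 1:
        next_round = []
        for i in range(0, len(current), 2):
            if i + 1 < len(current):
                next_round.append(merge_two(current[i], current[i + 1]))
            else:
                next_round.append(current[i])  # odd one out advances unchanged
        current = next_round
    return current[0]


print(merge_k_balanced([[1, 4, 5], [1, 3, 4], [2, 6]]))  # [1, 1, 2, 3, 4, 4, 5, 6]
```

Each round touches every element at most once, and there are about log K rounds, which gives the O(N log K) total.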

Approach 3 (Optimal and Clean): Min Heap

Use a min heap of size at most K, where each heap entry represents the "current front" element from one of the lists. Repeatedly extract the smallest element from the heap and push the next element from the same list.

Time: O(N log K) — N heap operations, each O(log K).
Space: O(K) extra for the heap, plus the output list.

Mental Model

Visualize each list as a line of sorted items with a pointer at the front.
The heap always stores at most one element per list: the current front with its list index.
At each step, you remove the globally smallest front from the heap and advance that list’s pointer.

ASCII Diagram

Given 3 sorted lists:
L0: 1  4  7
L1: 2  5  8
L2: 3  6  9

Initial heap (value, list_index, element_index):
[(1, 0, 0), (2, 1, 0), (3, 2, 0)]

Pop (1,0,0) → output [1], push next from L0 → (4,0,1)
Heap: [(2,1,0), (3,2,0), (4,0,1)]

Pop (2,1,0) → output [1,2], push (5,1,1)
Heap: [(3,2,0), (4,0,1), (5,1,1)]

... continue until heap is empty ...

Final output: [1,2,3,4,5,6,7,8,9]

Python Implementation (Lists of Arrays)

import heapq

def merge_k_sorted_lists(lists):
    """
    Merge K sorted lists (Python lists) into one sorted list.
    lists: List[List[int]]
    Returns: List[int]
    """
    heap = []
    result = []

    # 1) Initialize heap with first element of each non-empty list
    for list_idx, arr in enumerate(lists):
        if arr:  # non-empty
            first_val = arr[0]
            heapq.heappush(heap, (first_val, list_idx, 0))  # (value, which list, index in that list)

    # 2) Extract-min and push next from same list
    while heap:
        val, list_idx, elem_idx = heapq.heappop(heap)
        result.append(val)

        next_idx = elem_idx + 1
        if next_idx < len(lists[list_idx]):
            next_val = lists[list_idx][next_idx]
            heapq.heappush(heap, (next_val, list_idx, next_idx))

    return result

Example: Arrays

Example:

lists = [
    [1, 4, 5],
    [1, 3, 4],
    [2, 6]
]

print(merge_k_sorted_lists(lists))

Step-by-step heap evolution (values only):

Initial heap: [1 (L0), 1 (L1), 2 (L2)] → pop 1 (L0), push 4 → output [1].
Heap: [1 (L1), 2 (L2), 4 (L0)] → pop 1 (L1), push 3 → output [1, 1].
Heap: [2 (L2), 4 (L0), 3 (L1)] → pop 2 (L2), push 6 → output [1, 1, 2].
Heap: [3 (L1), 4 (L0), 6 (L2)] → pop 3, push 4 → output [1, 1, 2, 3].
Heap: [4 (L0), 6 (L2), 4 (L1)] → pop 4 (L0), push 5 → output [1, 1, 2, 3, 4].
Heap: [4 (L1), 6 (L2), 5 (L0)] → pop 4 (L1) → output [1, 1, 2, 3, 4, 4].
Heap: [5 (L0), 6 (L2)] → pop 5 → output [1, 1, 2, 3, 4, 4, 5].
Heap: [6 (L2)] → pop 6 → output [1, 1, 2, 3, 4, 4, 5, 6].

Final result: [1, 1, 2, 3, 4, 4, 5, 6].
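
The standard library already ships this multi-way merge as heapq.merge, which can serve as a quick cross-check for a hand-written version:

```python
import heapq

lists = [[1, 4, 5], [1, 3, 4], [2, 6]]
merged = list(heapq.merge(*lists))  # heapq.merge returns a lazy iterator
print(merged)  # [1, 1, 2, 3, 4, 4, 5, 6]
```

Because the result is produced lazily, heapq.merge also suits inputs that are themselves iterators or generators rather than lists.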

Linked List Variant (LeetCode-style)

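Many interview problems use linked lists instead of arrays. The idea is identical: the heap stores (node.val, list_id, node). On pop, you link the node onto the tail of the merged list and push node.next if it exists. The list_id in the middle matters: when two values tie, Python compares list_id next and never reaches the ListNode objects, which do not define an ordering and would raise TypeError.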

import heapq

class ListNode:
    def __init__(self, val=0, next=None):
        self.val = val
        self.next = next

def merge_k_sorted_linked_lists(lists):
    """
    lists: List[ListNode] - each is the head of a sorted linked list.
    Returns: head of merged sorted linked list.
    """
    heap = []
    # Initialize heap with head of each list
    for i, node in enumerate(lists):
        if node:
            heapq.heappush(heap, (node.val, i, node))

    dummy = ListNode(0)
    tail = dummy

    while heap:
        val, i, node = heapq.heappop(heap)
        tail.next = node
        tail = tail.next
        if node.next:
            heapq.heappush(heap, (node.next.val, i, node.next))

    return dummy.next

Time and Space Complexity

N = total number of elements across all K lists.
Each element is pushed to and popped from the heap at most once.
Each heap operation costs O(log K) (heap size ≤ K).
Total time: O(N log K).
Extra space: O(K) for the heap, plus O(N) for the merged output.

Edge Cases

All lists empty: Heap is empty, result is empty list.
Some lists empty: We simply skip them during initialization.
K = 1: Return that list directly.
K = 0: Return empty list.

Common Mistakes

Common Mistake: Pushing all N elements into the heap at once. That makes heap size N and operations cost O(log N) each, giving O(N log N) overall. The optimal solution keeps only the current front of each list in the heap (size ≤ K).

Common Mistake: Forgetting to track which list an element came from, so you don't know which next element to push. Always store the list index (or list id) in the heap tuple.

Pattern Recognition

Use this pattern whenever you see:

"Merge K sorted arrays/lists/streams."
"Multi-way merge" problems (e.g. merging log files from many servers).
Anything that sounds like "always take the smallest front across multiple sorted sources."

Interview Insight

Interview Insight: Clearly state: "I'll put the first element of each list into a min heap along with its list index. Then, while the heap is not empty, I pop the smallest element, append it to the result, and push the next element from that list. This runs in O(N log K) time and O(K) extra space." This explanation shows you understand both the algorithm and its complexity.

Summary

Goal: Merge K sorted lists with total N elements efficiently.
Tool: Min heap holding the current front element of each list.
Complexity: O(N log K) time, O(K) extra space.
Extremely common pattern in system design (logs, streams) and interviews (e.g. LeetCode 23).

12.6 Median in Stream

Introduction

The Median in a Data Stream problem asks: as numbers arrive one by one (a stream), maintain a data structure so that at any moment you can quickly return the median of all elements seen so far. Re-sorting after every insertion would be too slow. The classic efficient solution uses two heaps: a max heap for the lower half and a min heap for the upper half, so the median is always at the "boundary" between them.

Real-World Analogy

Imagine a running race where times are reported live. You want to show the "middle" time so far (the median) after each finisher. You could sort all times after every new result, but that gets expensive. Instead, you keep two groups: the faster half (the smaller times; you only care about the slowest in that half, i.e. its max) and the slower half (the larger times; you only care about the fastest in that half, i.e. its min). The median is either that max, that min, or their average, depending on how many numbers you have. Two heaps give you that max and min in O(1) and updates in O(log n).

Formal Definition

Stream: A sequence of numbers arriving one at a time. You must support:

addNum(num) — add a number to the stream.
findMedian() — return the median of all numbers added so far.

Median (for sorted order of current elements):

If count n is odd: median = middle element = element at index n // 2 (0-based).
If count n is even: median = average of the two middle elements = (element at n//2 - 1 + element at n//2) / 2.

In heap terms: the lower half has the smaller elements; the upper half has the larger elements. The "middle" is at the boundary: the max of the lower half and the min of the upper half.

Why This Topic Matters

Classic interview problem (e.g. LeetCode 295). Tests understanding of heaps and invariants.
Real use: sliding-window medians, real-time analytics, load balancing (median latency).
Pattern: "maintain two halves with a clear boundary" appears in other problems too.

Mental Model: Two Heaps

Lower half (left): A max heap — we need the largest of the small numbers. In Python, implement as a min heap of negated values.
Upper half (right): A min heap — we need the smallest of the large numbers.
Invariant: Size of lower half is either equal to or one more than the size of upper half. So the median is always: the max of the lower half (when total is odd), or the average of max-of-lower and min-of-upper (when total is even).

Evolution: Brute Force → Optimal

Brute Force: Store All + Sort on Query

Keep a list. On addNum, append. On findMedian, sort the list and return the middle (or average of two middles).

addNum: O(1). findMedian: O(n log n).
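
The brute-force version might look like the sketch below. The class name BruteForceMedian is hypothetical, and raising IndexError on an empty stream is a choice made for this example.

```python
class BruteForceMedian:
    """Store everything; sort on every query. addNum O(1), findMedian O(n log n)."""

    def __init__(self):
        self.nums = []

    def addNum(self, num):
        self.nums.append(num)

    def findMedian(self):
        if not self.nums:
            raise IndexError("findMedian on empty stream")
        ordered = sorted(self.nums)
        n = len(ordered)
        mid = n // 2
        if n % 2 == 1:
            return float(ordered[mid])                  # odd count: the single middle element
        return (ordered[mid - 1] + ordered[mid]) / 2.0  # even count: average of the two middles


bf = BruteForceMedian()
bf.addNum(1)
bf.addNum(2)
print(bf.findMedian())  # 1.5
bf.addNum(3)
print(bf.findMedian())  # 2.0
```

It is too slow for a long stream with frequent queries, but it makes a convenient reference implementation when testing a faster one.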

Optimal: Two Heaps (Max-Heap + Min-Heap)

Maintain lo (max heap for lower half) and hi (min heap for upper half). After each add, rebalance so that len(lo) >= len(hi) and len(lo) - len(hi) <= 1. Because lo stores negated values, -lo[0] is the max of the lower half: Median = -lo[0] when the total is odd, or (-lo[0] + hi[0]) / 2 when it is even.

addNum: O(log n). findMedian: O(1).
Space: O(n).

Step-by-Step: addNum with Two Heaps

If the lower-half (max) heap is empty or num <= current max of lower half, push num into the lower half (push -num into the min-heap representation of the max heap). Otherwise, push num into the upper-half min heap.
Rebalance sizes: if lower half has more than one extra element than upper half, move the max of the lower half to the upper half (pop from lo, push negated value to hi). If upper half becomes larger than lower half, move the min of the upper half to the lower half (pop from hi, push negated to lo).
After rebalance: len(lo) >= len(hi) and len(lo) - len(hi) <= 1.

ASCII Diagram

After adding: 1, 2, 3, 4, 5

Lower half (max heap, stored as min heap of -x):  [-3, -2, -1]  → max = 3
Upper half (min heap):                           [4, 5]

Total count = 5 (odd). Median = max of lower = 3.

After adding 6:
Lower: [-3,-2,-1]   Upper: [4,5,6]
No rebalance needed: sizes 3 and 3 → even. Median = (3 + 4) / 2 = 3.5

Python Implementation

import heapq

class MedianFinder:
    def __init__(self):
        self.lo = []   # max heap of lower half (store -x for min-heap simulation)
        self.hi = []   # min heap of upper half

    def addNum(self, num: int) -> None:
        if not self.lo or num <= -self.lo[0]:
            heapq.heappush(self.lo, -num)
        else:
            heapq.heappush(self.hi, num)

        # Rebalance: we want len(lo) >= len(hi) and len(lo) - len(hi) <= 1
        if len(self.lo) > len(self.hi) + 1:
            move = -heapq.heappop(self.lo)
            heapq.heappush(self.hi, move)
        elif len(self.hi) > len(self.lo):
            move = heapq.heappop(self.hi)
            heapq.heappush(self.lo, -move)

    def findMedian(self) -> float:
        if len(self.lo) > len(self.hi):
            return float(-self.lo[0])  # cast so the result is always a float (e.g. 2.0)
        return (-self.lo[0] + self.hi[0]) / 2.0

Examples Section

Example 1: Step-by-Step Walkthrough

Example: Operations: addNum(1), addNum(2), findMedian(), addNum(3), findMedian().

addNum(1): lo = [-1], hi = []. Median would be 1 (odd count).
addNum(2): 2 > 1 (max of lo), so push to hi. lo = [-1], hi = [2]. Sizes 1 and 1; no rebalance. findMedian() → (1 + 2) / 2 = 1.5.
addNum(3): 3 > 1, push to hi. lo = [-1], hi = [2, 3]. Now len(hi) > len(lo); rebalance: move 2 from hi to lo. lo = [-2, -1], hi = [3]. findMedian() → -lo[0] = 2.

Example 2: Code Run with Output

mf = MedianFinder()
mf.addNum(1)
mf.addNum(2)
print(mf.findMedian())   # 1.5
mf.addNum(3)
print(mf.findMedian())   # 2.0
mf.addNum(4)
mf.addNum(5)
print(mf.findMedian())   # 3.0

Output:

1.5
2.0
3.0

Example 3: Even vs Odd Count

Example: After 1, 2, 3, 4 (even count): lower half = [1, 2], upper half = [3, 4]. Median = (2 + 3) / 2 = 2.5. After adding 5 (odd): lower half has 3 elements, upper 2; median = max of lower = 3.
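
To watch the odd and even cases alternate over a whole stream, the two-heap logic can be packed into one function that records the median after every insertion. This is a sketch: running_medians is a hypothetical name, and statistics.median from the standard library is used only as an independent check.

```python
import heapq
import statistics


def running_medians(stream):
    """Return the median after each element of the stream, using two heaps."""
    lo, hi = [], []  # lo: lower half as negated values (max heap); hi: upper half (min heap)
    medians = []
    for num in stream:
        if not lo or num <= -lo[0]:
            heapq.heappush(lo, -num)
        else:
            heapq.heappush(hi, num)
        # Rebalance so that len(lo) == len(hi) or len(lo) == len(hi) + 1
        if len(lo) > len(hi) + 1:
            heapq.heappush(hi, -heapq.heappop(lo))
        elif len(hi) > len(lo):
            heapq.heappush(lo, -heapq.heappop(hi))
        if len(lo) > len(hi):
            medians.append(float(-lo[0]))           # odd count
        else:
            medians.append((-lo[0] + hi[0]) / 2.0)  # even count
    return medians


stream = [1, 2, 3, 4, 5, 6]
print(running_medians(stream))  # [1.0, 1.5, 2.0, 2.5, 3.0, 3.5]

# Cross-check against the standard library on an unsorted stream
data = [5, 15, 1, 3, 8, 7, 9, 10, 20, 12]
expected = [statistics.median(data[: i + 1]) for i in range(len(data))]
print(running_medians(data) == expected)  # True
```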

Time and Space Complexity

addNum: O(log n), with at most three heap operations (one push, plus a pop and a push if a rebalancing move is needed).
findMedian: O(1) — just reading the root of one or two heaps.
Space: O(n) — all elements stored in the two heaps.

Edge Cases

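No elements: Define the behavior explicitly (e.g. raise an error or return a sentinel). As written, findMedian() on an empty MedianFinder fails with an IndexError from self.lo[0]. LeetCode 295 guarantees at least one add before the first findMedian.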
Single element: Median is that element; one heap has one element, the other is empty.
Duplicate values: Algorithm works; duplicates can go to either half by the <= comparison (consistent with "lower half" containing the middle).

Common Mistakes

Common Mistake: Letting the two heaps get out of balance (e.g. upper half much larger than lower). Then the "middle" is not at the roots. Always rebalance after every add so that the median is defined by the two roots.

Common Mistake: Using a single heap. One heap doesn't give you the middle element; you need the boundary between two halves, hence two heaps.

Pattern Recognition

Use two heaps when you need:

Median (or similar "middle" statistic) in a stream or dynamic set.
Fast access to both "largest of the small" and "smallest of the large" with incremental updates.

Interview Insight

Interview Insight: Say: "I'll maintain a max heap for the lower half and a min heap for the upper half. On add, I push to the appropriate heap and rebalance so the lower half has at most one more element than the upper. Median is the max of the lower half if total is odd, or the average of the two roots if even. addNum is O(log n), findMedian is O(1)."

Practice Problems

LeetCode 295: Find Median from Data Stream (exact problem).
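Sliding window median (e.g. LeetCode 480): maintain two heaps for the window and update as the window moves. Since heapq cannot remove an arbitrary element, outgoing values are typically handled with lazy deletion.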

Summary

Median in stream = two heaps: max heap (lower half) + min heap (upper half).
Invariant: len(lower) >= len(upper) and len(lower) - len(upper) <= 1.
Median: odd total → root of lower; even total → average of the two roots.
addNum O(log n), findMedian O(1), space O(n).

Section 13: Graph Theory

This section introduces graphs: nodes (vertices) connected by edges. You will learn how to represent graphs in code, traverse them using BFS and DFS, and build up to powerful algorithms like Dijkstra, Topological Sort, Minimum Spanning Tree, and Network Flow. Mastering graph representation is the first and most important step: if you choose the wrong representation, your algorithms will be harder to write, reason about, and optimize.

13.1 Graph Representation

Introduction

A graph is a set of vertices (nodes) connected by edges. Before you can run BFS/DFS, Dijkstra, or any other graph algorithm, you must decide how to store the graph in memory. In Python, the most common representations are:

Edge list
Adjacency matrix
Adjacency list (the most common in competitive programming and interviews)

Key Concepts

Directed vs Undirected: In a directed graph (digraph), edges have direction (u → v). In an undirected graph, edges are two-way (u — v).
Weighted vs Unweighted: Weighted edges have a cost/weight (e.g. distance, time). Unweighted edges are effectively weight 1.
Sparse vs Dense: A graph with few edges compared to n² is sparse; one with many edges is dense. This heavily influences representation choice.

Representation 1: Edge List

Store the graph simply as a list of edges, each edge being a pair (u, v) for unweighted, or a triple (u, v, w) for weighted graphs.

# Undirected, unweighted edge list
edges = [
    (0, 1),
    (0, 2),
    (1, 2),
    (2, 3),
]

Concept Note: Edge lists are great for algorithms that naturally iterate over edges (like Kruskal's MST), but they are inefficient for neighbor lookups like "what are all neighbors of node u?" (O(m) scan, where m is number of edges).
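
To make the O(m) neighbor lookup concrete, here is a minimal sketch that finds the neighbors of a node by scanning an undirected edge list. The helper name neighbors_from_edge_list is illustrative and not part of the course code.

```python
def neighbors_from_edge_list(edges, u):
    """Return all neighbors of u in an undirected edge list by scanning every edge."""
    result = []
    for a, b in edges:
        if a == u:
            result.append(b)
        elif b == u:
            result.append(a)
    return result

edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
print(neighbors_from_edge_list(edges, 2))  # [0, 1, 3]
```

Every call touches all m edges, which is why edge lists are a poor fit for traversals that ask for neighbors repeatedly.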

Representation 2: Adjacency Matrix

For a graph with n vertices (typically labeled 0..n-1), an adjacency matrix is an n × n 2D array where entry matrix[u][v] indicates whether there is an edge from u to v (and possibly stores its weight).

n = 4

# Unweighted directed graph: 1 means edge exists, 0 means no edge
matrix = [[0] * n for _ in range(n)]

edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
for u, v in edges:
    matrix[u][v] = 1

# For undirected: mark both directions
for u, v in edges:
    matrix[u][v] = 1
    matrix[v][u] = 1

Pros and Cons

Pros: O(1) check if edge (u, v) exists; very simple; good for dense graphs.
Cons: Uses O(n²) space even if there are few edges; iterating neighbors of u is O(n) (scan the whole row).
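
The matrix can also store weights instead of 0/1 flags. The following is a sketch under one common convention (an assumption here, not something fixed by the text): a missing edge is stored as infinity. The name build_weighted_matrix is illustrative.

```python
INF = float("inf")  # assumed sentinel meaning "no edge"

def build_weighted_matrix(n, weighted_edges):
    """Build an n x n matrix where matrix[u][v] is the weight of edge u -> v."""
    matrix = [[INF] * n for _ in range(n)]
    for u, v, w in weighted_edges:
        # If the same edge appears twice, keep the cheaper copy.
        matrix[u][v] = min(matrix[u][v], w)
    return matrix

weighted_edges = [(0, 1, 5), (0, 2, 2), (1, 2, 1), (2, 3, 7)]
matrix = build_weighted_matrix(4, weighted_edges)
print(matrix[0][2])         # 2
print(matrix[2][0] == INF)  # True: the graph is directed, so there is no edge 2 -> 0
```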

Representation 3: Adjacency List (Preferred)

The adjacency list stores, for each vertex u, a list of its neighbors. In Python, we usually use a list of lists (for 0..n-1 vertex labels) or a dictionary mapping each node to a list of neighbors.

n = 4

# Unweighted, undirected graph using list of lists
adj = [[] for _ in range(n)]

edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
for u, v in edges:
    adj[u].append(v)
    adj[v].append(u)  # for undirected

print(adj)  # e.g. [[1, 2], [0, 2], [0, 1, 3], [2]]

For weighted graphs, we store (neighbor, weight) pairs:

# Weighted directed graph
adj = [[] for _ in range(n)]

weighted_edges = [
    (0, 1, 5),   # edge 0 -> 1 with weight 5
    (0, 2, 2),
    (1, 2, 1),
    (2, 3, 7),
]

for u, v, w in weighted_edges:
    adj[u].append((v, w))

Pros and Cons

Pros: Space O(n + m) (where m is number of edges): ideal for sparse graphs; iterating all neighbors of u is O(deg(u)), which is often small.
Cons: Checking if an arbitrary edge (u, v) exists may require an O(deg(u)) scan.
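
If edge checks matter and node labels are not 0..n-1, one option is a dictionary that maps each node to a set of neighbors. This is a sketch of that variant; the name build_adj_sets and the string labels are made up for illustration.

```python
from collections import defaultdict

def build_adj_sets(edges):
    """Undirected graph as a dict mapping each node to a set of its neighbors."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
adj = build_adj_sets(edges)
print("C" in adj["A"])   # True
print("D" in adj["A"])   # False
print(sorted(adj["C"]))  # ['A', 'B', 'D']
```

Set membership is O(1) on average, so the edge check no longer scans the whole neighbor list. The trade-off is that sets do not preserve insertion order and use more memory per neighbor than lists.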

ASCII Diagram and Adjacency List Example

Graph (undirected):

   0
  / \
 1---2
      \
       3

Edges: (0,1), (0,2), (1,2), (2,3)

Adjacency list:
0: 1, 2
1: 0, 2
2: 0, 1, 3
3: 2

Python Example: Building and Traversing a Graph

Let's build an undirected, unweighted graph using an adjacency list and run a simple BFS from node 0.

from collections import deque

def build_undirected_graph(n, edges):
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj

def bfs(start, adj):
    n = len(adj)
    visited = [False] * n
    order = []
    q = deque([start])
    visited[start] = True

    while q:
        u = q.popleft()
        order.append(u)
        for v in adj[u]:
            if not visited[v]:
                visited[v] = True
                q.append(v)
    return order

n = 4
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
adj = build_undirected_graph(n, edges)
print("Adjacency list:", adj)
print("BFS from 0:", bfs(0, adj))

Example Run

Example: For the graph above, BFS from node 0 might visit nodes in the order [0, 1, 2, 3] (depending on neighbor order). The key takeaway: once you choose an adjacency list representation, algorithms like BFS and DFS become straightforward to implement.

When to Use Which Representation

Representation	Space	Best For
Edge list	O(m)	Algorithms that iterate edges (e.g. Kruskal)
Adjacency matrix	O(n²)	Dense graphs, constant-time edge checks
Adjacency list	O(n + m)	Sparse graphs, BFS/DFS, Dijkstra, most interview problems

Time and Space Complexity Summary

Edge list: Space O(m). Neighbor iteration O(m). Edge existence check O(m).
Adjacency matrix: Space O(n²). Neighbor iteration for u: O(n). Edge check (u, v): O(1).
Adjacency list: Space O(n + m). Neighbor iteration for u: O(deg(u)). Edge check (u, v): O(deg(u)).

Common Mistakes

Common Mistake: Using an adjacency matrix for a very large sparse graph (e.g. n = 10⁵). The matrix needs n² = 10¹⁰ entries, which does not fit in memory in practice; use adjacency lists instead.

Common Mistake: Forgetting to add both (u, v) and (v, u) for undirected graphs when using adjacency lists or matrices.

Interview Insight

Interview Insight: At the start of any graph problem, say explicitly: "I will represent the graph as an adjacency list of size n, where adj[u] holds all neighbors of u (and weights if needed). This gives O(n + m) space and lets BFS/DFS run in O(n + m)." This shows that you understand the trade-offs and will make your later code easier to follow.

Summary

Graphs can be represented as edge lists, adjacency matrices, or adjacency lists.
For most interview problems and competitive programming tasks, adjacency lists are the right default choice.
The representation strongly affects time and space complexity of graph algorithms; choose based on n, m, and required operations.

13.2 BFS

Introduction

Breadth-First Search (BFS) is a graph (and tree) traversal algorithm that explores all nodes at the current "distance" (number of edges) from the source before moving to nodes one step farther. It uses a queue: you process nodes in the order they were discovered, which naturally gives you level-by-level exploration. BFS is the standard tool for shortest path in unweighted graphs, finding connected components, and many grid/puzzle problems.

Real-World Analogy

Imagine a rumor spreading in a social network: it starts from one person. In the first minute, all direct friends hear it. In the second minute, all friends of those friends (who haven't heard yet) hear it, and so on. BFS does exactly this: it "spreads" from the source in waves. The first wave is distance 1, the next is distance 2, etc. So the first time you reach a node, you've found a shortest path to it (in terms of number of edges).

Formal Definition

Input: A graph (adjacency list or matrix), and a source vertex s.
Output: Depending on the problem: visitation order, distances from s, a shortest-path tree (parent pointers), or simply "all reachable nodes."
Key property: In an unweighted graph, BFS from s visits vertices in non-decreasing order of shortest-path distance (number of edges) from s. The first time a node is reached, that path is a shortest path.

Why This Topic Matters

Shortest path in unweighted graphs: BFS gives an O(V + E) solution; no need for Dijkstra.
Level-order traversal in trees; multi-source BFS (e.g. all 0s as sources in a grid).
Connected components, bipartite checking, and many interview problems (grid, word ladder, etc.).
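
As one illustration of the bipartite-checking use case, here is a minimal sketch that 2-colors an undirected graph with BFS. The function name is_bipartite and the two sample graphs are assumptions made for this example.

```python
from collections import deque

def is_bipartite(adj):
    """Return True if the undirected graph can be 2-colored with no same-color edge."""
    n = len(adj)
    color = [-1] * n  # -1 = uncolored, otherwise 0 or 1
    for s in range(n):  # loop over all vertices to cover every component
        if color[s] != -1:
            continue
        color[s] = 0
        q = deque([s])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if color[v] == -1:
                    color[v] = 1 - color[u]  # neighbors get the opposite color
                    q.append(v)
                elif color[v] == color[u]:
                    return False  # an edge joins two same-colored vertices
    return True

square = [[1, 3], [0, 2], [1, 3], [0, 2]]  # cycle of length 4
triangle = [[1, 2], [0, 2], [0, 1]]        # cycle of length 3
print(is_bipartite(square))    # True
print(is_bipartite(triangle))  # False
```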

Mental Model

Use a queue (FIFO). Start by enqueueing the source and marking it visited.
While the queue is not empty: dequeue a node u, then enqueue all unvisited neighbors of u and mark them visited. Those neighbors are "one edge farther" than u.
Because we process in FIFO order, we always finish all nodes at distance d before processing any node at distance d + 1.

Step-by-Step Algorithm

Create a queue and a visited set (or array). Enqueue s and mark s visited.
Optionally, set dist[s] = 0 and parent[s] = None.
While the queue is not empty:
- Dequeue u.
- For each neighbor v of u: if v is not visited, mark v visited, set dist[v] = dist[u] + 1 (and parent[v] = u if needed), and enqueue v.

ASCII Diagram

Graph (undirected):     BFS from 0 (queue order)
   0                         Level 0: 0
  / \                        Level 1: 1, 2
 1---2                        Level 2: 3
  \ /
   3

Queue steps: [0] → pop 0, add 1,2 → [1,2] → pop 1, add 3 → [2,3] → pop 2 → [3] → pop 3 → []
Visitation order: 0, 1, 2, 3. Distances: d[0]=0, d[1]=1, d[2]=1, d[3]=2.

Python Implementation

from collections import deque

def bfs_shortest_paths(adj, start):
    """
    adj: list of lists (adjacency list), indices 0..n-1.
    start: source vertex.
    Returns: (dist, parent) where dist[u] = shortest distance from start to u,
             parent[u] = predecessor on a shortest path (None for start).
    """
    n = len(adj)
    dist = [-1] * n
    parent = [None] * n
    dist[start] = 0
    q = deque([start])

    while q:
        u = q.popleft()
        for v in adj[u]:
            if dist[v] == -1:
                dist[v] = dist[u] + 1
                parent[v] = u
                q.append(v)

    return dist, parent

def path_from_parent(parent, start, end):
    """Reconstruct path from start to end using parent array."""
    path = []
    cur = end
    while cur is not None:
        path.append(cur)
        cur = parent[cur]
    path.reverse()
    return path if path and path[0] == start else []

Examples Section

Example 1: BFS Visitation and Distances

Example: Graph: 4 nodes, edges (0,1), (0,2), (1,2), (2,3). BFS from 0.

Adjacency list: adj = [[1, 2], [0, 2], [0, 1, 3], [2]]

Start: queue = [0], dist = [0, -1, -1, -1].
Pop 0: neighbors 1, 2 → dist[1]=1, dist[2]=1, queue = [1, 2].
Pop 1: neighbor 0 (visited), 2 (visited). Queue = [2].
Pop 2: neighbor 3 → dist[3]=2, queue = [3].
Pop 3: no new neighbors. Done.

Result: dist = [0, 1, 1, 2]. The shortest path from 0 to 3 has length 2 (0→2→3).

Example 2: Code Run with Path Reconstruction

adj = [[1, 2], [0, 2], [0, 1, 3], [2]]
dist, parent = bfs_shortest_paths(adj, 0)
print("Distances:", dist)           # [0, 1, 1, 2]
print("Path 0 -> 3:", path_from_parent(parent, 0, 3))  # [0, 2, 3]

Output:

Distances: [0, 1, 1, 2]
Path 0 -> 3: [0, 2, 3]

Example 3: Multi-Source BFS (Grid)

Example: Given a 2D grid, some cells are "sources" (e.g. value 0). Find the distance from each cell to its nearest source. Idea: start with all source cells in the queue at distance 0; then run BFS. Each step adds 1 to the distance.

from collections import deque

def grid_bfs_distances(grid, source_value=0):
    """grid: 2D list. Cells with value source_value are sources. Return 2D dist."""
    rows, cols = len(grid), len(grid[0])
    dist = [[-1] * cols for _ in range(rows)]
    q = deque()
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == source_value:
                dist[r][c] = 0
                q.append((r, c))
    while q:
        r, c = q.popleft()
        for dr, dc in [(0,1),(0,-1),(1,0),(-1,0)]:
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and dist[nr][nc] == -1:
                dist[nr][nc] = dist[r][c] + 1
                q.append((nr, nc))
    return dist

# Example: 3x3 grid, (0,0) and (2,2) are sources (value 0)
grid = [[0, 1, 1], [1, 1, 1], [1, 1, 0]]
print(grid_bfs_distances(grid))  # distances to nearest 0

Time and Space Complexity

Time: O(V + E) — each vertex is enqueued and dequeued at most once; each edge is considered at most once (for directed) or twice (for undirected).
Space: O(V) for visited/dist/parent and the queue (queue can hold up to O(V) vertices in the worst case).

Edge Cases

Disconnected graph: BFS from s only reaches nodes in the same component. To visit all nodes, run BFS from each unvisited node (or use a loop over components).
Single node or isolated source: The queue holds only s and then empties; dist[s] = 0 and every other vertex (if any) stays at -1.
Directed graph: Same algorithm; only follow outgoing edges from u (adj[u]).
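
The disconnected-graph case can be handled with an outer loop that starts a new BFS from every vertex not yet reached. A minimal sketch, assuming an undirected adjacency list; the name bfs_components is illustrative.

```python
from collections import deque

def bfs_components(adj):
    """Label each vertex with a component id. Returns (count, comp)."""
    n = len(adj)
    comp = [-1] * n  # -1 = not reached yet
    count = 0
    for s in range(n):
        if comp[s] != -1:
            continue
        comp[s] = count  # mark on enqueue
        q = deque([s])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if comp[v] == -1:
                    comp[v] = count
                    q.append(v)
        count += 1
    return count, comp

adj = [[1], [0], [3], [2], []]  # edges 0-1 and 2-3, vertex 4 isolated
print(bfs_components(adj))  # (3, [0, 0, 1, 1, 2])
```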

Common Mistakes

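Common Mistake: Marking a node as visited when you dequeue it instead of when you enqueue it. If you mark on dequeue, the same node can be enqueued several times before it is first processed, so the queue can grow to O(E) entries, and if you assign dist[v] at enqueue time, a later and longer path can overwrite a correct distance.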

Common Mistake: Using a stack instead of a queue. That gives DFS, not BFS; you no longer get level-by-level order or correct unweighted shortest paths.

Pattern Recognition

Use BFS when you need:

Shortest path in terms of number of edges (unweighted graph).
Level-by-level or "distance in steps" exploration (e.g. word ladder, grid moves).
Multi-source shortest distances (all sources in the queue at distance 0).

Interview Insight

Interview Insight: Say: "For unweighted shortest path I'll use BFS with a queue. I'll maintain a distance array and only enqueue a node when we first discover it, so each node is processed once. That gives O(V+E) and guarantees the first time we reach a node we have a shortest path."

Practice Problems

LeetCode 127: Word Ladder (BFS over "word graph").
LeetCode 542: 01 Matrix (multi-source BFS from 0s).
LeetCode 1091: Shortest Path in Binary Matrix (BFS on grid).

Summary

BFS = explore graph using a queue; visit nodes in order of increasing distance from the source.
In unweighted graphs, BFS from s computes shortest-path distances (and a shortest-path tree) in O(V + E).
Mark nodes visited when you enqueue them; use a queue (deque), not a stack.
Multi-source BFS: start with all sources in the queue at distance 0.

13.3 DFS

Introduction

Depth-First Search (DFS) is a fundamental graph traversal algorithm that explores as far as possible along one path before backtracking. You can think of it as always going "deeper" first, using a stack (either explicit or via recursion). DFS is the basis for many advanced algorithms: topological sort, cycle detection, connected components, articulation points & bridges, strongly connected components (SCC), and more.

DFS vs BFS: Mental Contrast

BFS explores in layers (level by level) using a queue → good for shortest paths in unweighted graphs.
DFS explores by going deep into the graph using a stack/recursion → good for exploring structure, detecting cycles, and topological ordering.

Formal Definition

Input: A graph (usually adjacency list), and optionally a starting node.
Output: A traversal order, discovery/finish times, connected components, or answers to questions like "is there a path between u and v?", "is the graph acyclic?", etc.

Recursive DFS: Core Idea

The recursive DFS for a starting node u is:

Mark u as visited.
For each neighbor v of u:
- If v is not visited, recursively DFS from v.
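
The recursive steps above can be extended to record the discovery and finish times mentioned in the formal definition. This is a sketch with a single shared clock; the name dfs_times and the sample graph (edges 0-1, 0-2, 2-3) are chosen for illustration.

```python
def dfs_times(adj, start):
    """Recursive DFS that records when each vertex is discovered and finished."""
    n = len(adj)
    disc = [None] * n  # time the vertex is first visited
    fin = [None] * n   # time all of its neighbors have been explored
    clock = 0

    def visit(u):
        nonlocal clock
        disc[u] = clock
        clock += 1
        for v in adj[u]:
            if disc[v] is None:
                visit(v)
        fin[u] = clock
        clock += 1

    visit(start)
    return disc, fin

adj = [[1, 2], [0], [0, 3], [2]]
disc, fin = dfs_times(adj, 0)
print(disc)  # [0, 1, 3, 4]
print(fin)   # [7, 2, 6, 5]
```

A vertex finishes only after everything reached through it has finished, which is why the start vertex has the smallest discovery time and the largest finish time.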

ASCII Diagram

Graph (undirected):

   0
  / \
 1   2
     |
     3

DFS from 0 (one possible order):
0 → 1 (backtrack) → 2 → 3 (backtrack) → done

Python Implementation (Recursive)

def dfs_recursive(adj, start, visited=None, order=None):
    """Depth-first search from start. Returns visitation order."""
    if visited is None:
        visited = set()
    if order is None:
        order = []

    visited.add(start)
    order.append(start)

    for v in adj[start]:
        if v not in visited:
            dfs_recursive(adj, v, visited, order)

    return order

Python Implementation (Iterative with Stack)

def dfs_iterative(adj, start):
    visited = set()
    order = []
    stack = [start]

    while stack:
        u = stack.pop()
        if u in visited:
            continue
        visited.add(u)
        order.append(u)
        # Push in reverse so neighbors are popped in adjacency-list order
        for v in reversed(adj[u]):
            if v not in visited:
                stack.append(v)

    return order

Examples Section

Example 1: Simple DFS Order

Example: Graph: 4 nodes, edges (0,1), (0,2), (2,3). Adjacency list: adj = [[1,2],[0],[0,3],[2]].

Using dfs_recursive(adj, 0):

Start at 0: visit 0, then neighbor 1 → visit 1 (backtrack).
Back at 0: next neighbor 2 → visit 2, then neighbor 3 → visit 3 (backtrack).
Traversal order: [0, 1, 2, 3] (one valid DFS order).

Example 2: Counting Connected Components

Example: Given an undirected graph, count how many connected components it has. DFS from each unvisited node and increment the component count.

def count_components(adj):
    n = len(adj)
    visited = [False] * n
    components = 0

    def dfs(u):
        visited[u] = True
        for v in adj[u]:
            if not visited[v]:
                dfs(v)

    for u in range(n):
        if not visited[u]:
            components += 1
            dfs(u)

    return components

For adj = [[1], [0], [3], [2]] (two separate edges 0–1 and 2–3), count_components(adj) returns 2.

Example 3: Cycle Detection in Undirected Graph

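Example: Use DFS to detect if an undirected graph contains a cycle. If during DFS you reach a neighbor that is already visited and is not the parent of the current node, you have found a cycle. (This parent check assumes a simple graph; with parallel edges between u and v, track the edge used rather than the parent vertex.)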

def has_cycle_undirected(adj):
    n = len(adj)
    visited = [False] * n

    def dfs(u, parent):
        visited[u] = True
        for v in adj[u]:
            if not visited[v]:
                if dfs(v, u):
                    return True
            elif v != parent:
                # visited neighbor that is not parent → cycle
                return True
        return False

    for u in range(n):
        if not visited[u]:
            if dfs(u, -1):
                return True
    return False

Time and Space Complexity

Time: O(V + E): each vertex is visited once, and each edge is examined once (directed) or twice (undirected, once from each endpoint).
Space: O(V) for visited + recursion stack (in recursive version) or explicit stack (iterative).

Edge Cases

Disconnected graph: A single DFS from one start will not visit all nodes; run DFS from each unvisited node to cover all components.
Deep/long path: Recursive DFS may hit recursion depth limits in Python on very deep graphs; in those cases, prefer iterative DFS with an explicit stack.
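Directed graph: The traversal itself is unchanged (follow only outgoing edges adj[u]), but the undirected tricks do not carry over: the parent check is not a valid cycle test, and reachability is no longer symmetric, so cycle detection and SCC algorithms need extra bookkeeping such as an "on the current path" state.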
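
For the deep-path case, here is a sketch of component counting with an explicit stack, so the depth of the graph no longer depends on Python's recursion limit (1000 frames by default). The name count_components_iterative is illustrative. The sketch marks vertices when they are pushed, which is enough for reachability even though the visit order can differ from recursive DFS.

```python
def count_components_iterative(adj):
    """Count connected components of an undirected graph without recursion."""
    n = len(adj)
    visited = [False] * n
    components = 0
    for s in range(n):
        if visited[s]:
            continue
        components += 1
        visited[s] = True
        stack = [s]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if not visited[v]:
                    visited[v] = True  # mark on push so each vertex is pushed once
                    stack.append(v)
    return components

# Path graph 0 - 1 - 2 - ... - (n-1), far deeper than the default recursion limit
n = 100_000
path = [[] for _ in range(n)]
for i in range(n - 1):
    path[i].append(i + 1)
    path[i + 1].append(i)
print(count_components_iterative(path))  # 1
```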

Common Mistakes

Common Mistake: Forgetting to mark nodes as visited before recursing, which can lead to infinite recursion on cycles.

Common Mistake: Assuming DFS gives shortest paths in terms of edges — that is only guaranteed for BFS in unweighted graphs, not DFS.

Pattern Recognition

Use DFS when you need:

To explore all reachable nodes and recurse on structure (trees, graphs).
To find connected components, cycles, topological order, or articulation points & bridges.
To perform backtracking-style exploration (e.g. generating paths, solving puzzles on graphs).
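
As an example of backtracking-style exploration, the sketch below lists every simple path between two nodes. Unlike a plain traversal, it unmarks a node when it backtracks, so the running time can be exponential; treat it as an illustration for small graphs. The name all_simple_paths is an assumption of this example.

```python
def all_simple_paths(adj, src, dst):
    """Return every path from src to dst that visits no vertex twice."""
    paths = []
    path = [src]
    on_path = {src}  # vertices on the current path only

    def backtrack(u):
        if u == dst:
            paths.append(path.copy())
            return
        for v in adj[u]:
            if v not in on_path:
                on_path.add(v)
                path.append(v)
                backtrack(v)
                path.pop()         # undo the choice before trying the next neighbor
                on_path.remove(v)

    backtrack(src)
    return paths

adj = [[1, 2], [0, 2], [0, 1, 3], [2]]
print(all_simple_paths(adj, 0, 3))  # [[0, 1, 2, 3], [0, 2, 3]]
```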

Interview Insight

Interview Insight: Clearly state: "I'll use DFS with a visited set to avoid infinite loops on cycles. The recursive function marks the node, explores all unvisited neighbors, and backtracks. This runs in O(V + E) time and uses O(V) extra space for visited and the recursion stack."

Summary

DFS explores depth-first using recursion or an explicit stack.
Time complexity O(V + E), space O(V).
Key for many graph algorithms: connected components, cycle detection, topological sort, SCC, and more.

13.4 Topological Sort (DFS & Kahn's Algorithm)

Introduction

A topological sort of a directed acyclic graph (DAG) is a linear ordering of its vertices such that for every directed edge u → v, u comes before v in the ordering. Topological order is the backbone of many dependency problems: task scheduling, build systems, course prerequisites, and more. In this topic, we will learn two classic algorithms: DFS-based topological sort and Kahn's algorithm (BFS + indegree).

Real-World Analogy

Imagine you have a set of courses with prerequisites: to take course B, you must first complete course A. This forms a directed edge A → B. A topological ordering is a valid sequence of courses you can follow so that all prerequisites are satisfied. Similarly, in build systems, some files or modules must be built before others; topological sort gives a valid build order.

When Topological Order Exists

The graph must be a DAG (Directed Acyclic Graph).
If there is a cycle (like A → B → C → A), no linear order can satisfy all edges.

Method 1: DFS-Based Topological Sort

Mental Model

Think of running DFS on the directed graph. For each node, you first recursively visit all nodes reachable from it, and only after exploring all its outgoing edges do you "finish" the node and append it to a list. Reversing this list (nodes in order of finishing) gives a valid topological order. Intuition: in a DAG, a node finishes after every node reachable from it, so reversing places it before all of them.

Algorithm (DFS)

Maintain a visited array and a list order.
For each vertex u:
- If u is not visited, run dfs(u).
In dfs(u):
- Mark u as visited.
- For each neighbor v (edge u → v): if v is not visited, dfs(v).
- After exploring all neighbors, append u to order.
Reverse order; the result is a topological sort.

Python Implementation (DFS)

def topo_sort_dfs(adj):
    """
    adj: adjacency list of a directed graph, vertices 0..n-1.
    Returns a list of vertices in topological order.
    Assumes the graph is a DAG (no cycles).
    """
    n = len(adj)
    visited = [False] * n
    order = []

    def dfs(u):
        visited[u] = True
        for v in adj[u]:
            if not visited[v]:
                dfs(v)
        order.append(u)

    for u in range(n):
        if not visited[u]:
            dfs(u)

    order.reverse()
    return order

Method 2: Kahn's Algorithm (BFS + Indegree)

Mental Model

Kahn's algorithm repeatedly removes nodes with indegree 0 (no incoming edges). Such nodes have no prerequisites, so they can safely come next in the topological order. When you remove a node u, you conceptually delete its outgoing edges (u → v), which may cause some neighbors v to drop to indegree 0, and thus become candidates next.

Algorithm (Kahn's)

Compute indegree[v] for all vertices v (number of incoming edges).
Push all vertices with indegree[v] == 0 into a queue.
While the queue is not empty:
- Pop u from the queue, append u to the result order.
- For each neighbor v (edge u → v): decrement indegree[v] by 1; if it becomes 0, push v to the queue.
If order contains all vertices, you found a topological ordering. If not, the graph had a cycle.

Python Implementation (Kahn's Algorithm)

from collections import deque

def topo_sort_kahn(adj):
    n = len(adj)
    indegree = [0] * n
    for u in range(n):
        for v in adj[u]:
            indegree[v] += 1

    q = deque([u for u in range(n) if indegree[u] == 0])
    order = []

    while q:
        u = q.popleft()
        order.append(u)
        for v in adj[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                q.append(v)

    if len(order) != n:
        raise ValueError("Graph has a cycle; no topological ordering exists.")
    return order

Examples Section

Example 1: Simple DAG

Example: Courses: 0 → 1 → 3, and 0 → 2 → 3 (0 must come before 1 and 2; 1 and 2 must come before 3).

adj = [
    [1, 2],  # 0
    [3],     # 1
    [3],     # 2
    []       # 3
]

A valid topological order is [0, 1, 2, 3] or [0, 2, 1, 3]. Running either topo_sort_dfs(adj) or topo_sort_kahn(adj) will produce a valid ordering.
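
Because several orderings can be valid, it helps to check the defining property directly rather than compare against one expected list. A minimal sketch of such a checker; the name is_valid_topo_order is illustrative.

```python
def is_valid_topo_order(adj, order):
    """Check that order contains every vertex once and respects every edge u -> v."""
    n = len(adj)
    if sorted(order) != list(range(n)):
        return False
    position = {u: i for i, u in enumerate(order)}
    return all(position[u] < position[v] for u in range(n) for v in adj[u])

adj = [[1, 2], [3], [3], []]
print(is_valid_topo_order(adj, [0, 1, 2, 3]))  # True
print(is_valid_topo_order(adj, [0, 2, 1, 3]))  # True
print(is_valid_topo_order(adj, [1, 0, 2, 3]))  # False: 1 appears before its prerequisite 0
```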

Example 2: Kahn's Algorithm Step-by-Step

Using the same graph as above:

Compute indegrees: indegree[0]=0, indegree[1]=1, indegree[2]=1, indegree[3]=2.
Queue starts with [0] (only node with indegree 0).
Pop 0 → order=[0]. Decrement indegree[1] to 0, indegree[2] to 0 → queue becomes [1, 2].
Pop 1 → order=[0, 1]. Decrement indegree[3] to 1 → queue=[2].
Pop 2 → order=[0, 1, 2]. Decrement indegree[3] to 0 → queue=[3].
Pop 3 → order=[0, 1, 2, 3]. Queue empty, order length = 4 = n → success.

Example 3: Detecting a Cycle

Example: Graph with edges 0 → 1, 1 → 2, 2 → 0 (a directed cycle). There is no valid topological order.

adj = [
    [1],  # 0
    [2],  # 1
    [0],  # 2
]

In Kahn's algorithm, no node will ever have indegree 0 (or after a few steps the queue becomes empty before we've processed all vertices). Our implementation checks len(order) != n and raises an error. In DFS-based approaches, you can detect a cycle by tracking recursion stack (colors: WHITE/GRAY/BLACK).
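
The WHITE/GRAY/BLACK idea can be sketched as follows: GRAY means the node is on the current recursion path, so reaching a GRAY node means a back edge and therefore a cycle. The name has_cycle_directed is illustrative, and the sketch is recursive, so very deep graphs would need an iterative variant.

```python
WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, fully explored

def has_cycle_directed(adj):
    """Return True if the directed graph contains a cycle."""
    n = len(adj)
    color = [WHITE] * n

    def dfs(u):
        color[u] = GRAY
        for v in adj[u]:
            if color[v] == GRAY:
                return True  # back edge to a node still on the path
            if color[v] == WHITE and dfs(v):
                return True
        color[u] = BLACK
        return False

    return any(color[u] == WHITE and dfs(u) for u in range(n))

print(has_cycle_directed([[1], [2], [0]]))          # True
print(has_cycle_directed([[1, 2], [3], [3], []]))   # False
```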

Time and Space Complexity

DFS-based topological sort: O(V + E) time, O(V) space (visited + recursion stack + order).
Kahn's algorithm: O(V + E) time, O(V) space (indegree array + queue + order).

Common Mistakes

Common Mistake: Trying to topologically sort graphs that contain cycles without checking for them. Always remember: topological ordering exists iff the graph is a DAG.

Common Mistake: Forgetting to reverse the DFS finishing order list; without reversing, parents can come after their children, breaking the topological property.

Pattern Recognition

Use topological sort when you see:

Tasks, jobs, or courses with prerequisites (dependencies form a DAG).
Build order of modules or packages based on dependency edges.
Any problem that says "do X before Y" for many pairs (X, Y) and asks for a valid global order.
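
A typical instance of this pattern is ordering courses from a list of prerequisite pairs. The sketch below builds the graph and applies Kahn's algorithm; the function name find_course_order and the (course, prereq) input format are assumptions for this example.

```python
from collections import deque

def find_course_order(num_courses, prerequisites):
    """prerequisites: (course, prereq) pairs. Returns a valid order, or [] on a cycle."""
    adj = [[] for _ in range(num_courses)]
    indegree = [0] * num_courses
    for course, prereq in prerequisites:
        adj[prereq].append(course)  # edge prereq -> course
        indegree[course] += 1

    q = deque(c for c in range(num_courses) if indegree[c] == 0)
    order = []
    while q:
        u = q.popleft()
        order.append(u)
        for v in adj[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                q.append(v)
    # Fewer than num_courses processed means some courses sit on a cycle.
    return order if len(order) == num_courses else []

print(find_course_order(4, [(1, 0), (2, 0), (3, 1), (3, 2)]))  # [0, 1, 2, 3]
print(find_course_order(2, [(0, 1), (1, 0)]))                  # []
```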

Interview Insight

Interview Insight: Say: "I'll model the prerequisites as a directed graph and compute a topological order. I can use DFS and append nodes on post-order then reverse, or use Kahn's algorithm with indegrees and a queue. Both run in O(V+E). If a cycle exists, there is no valid order; with Kahn's algorithm I can detect that because fewer than V nodes get processed."

Summary

Topological sort orders vertices u before v for all edges u → v in a DAG.
DFS method: run DFS, push nodes after exploring neighbors, then reverse the list.
Kahn's algorithm: repeatedly remove indegree-0 nodes, updating neighbors' indegrees.
Time O(V + E), space O(V); only defined for DAGs (no cycles).

13.5 Dijkstra

Introduction

Dijkstra's algorithm finds the shortest path distances from a single source vertex to all other vertices in a graph with non-negative edge weights. It is one of the most important algorithms in graph theory and appears in routing (GPS, networks), scheduling, and many interview problems.

Key Requirements

Graph may be directed or undirected.
Edge weights must be non-negative. If negative edges exist, Dijkstra can give wrong answers (use Bellman-Ford or other algorithms instead).
Graph is usually represented with an adjacency list storing (neighbor, weight) pairs.

Real-World Analogy

Imagine you are at a city intersection (the source) and want to know the shortest driving distance to every other intersection. Initially, distances are infinity except for your starting point (distance 0). At each step, you permanently choose the not-yet-finalized intersection with the smallest known distance and "relax" edges from it, possibly updating distances of its neighbors if going through this intersection yields a shorter route. This is exactly what Dijkstra's algorithm does using a min-priority queue.

Mental Model

Maintain an array dist where dist[v] is the current best known distance from source s to v.
Use a min-heap (priority queue) to always pick the vertex u with the smallest tentative distance.
When you "finalize" a vertex (pop it from the heap with its minimal distance), you relax its outgoing edges u → v: if dist[u] + w(u,v) < dist[v], update dist[v] and push a new pair into the heap.

Algorithm (High-Level)

Initialize dist[v] = ∞ for all v, and dist[s] = 0 for the source vertex s.
Create a min-heap and push (0, s).
While the heap is not empty:
- Pop (d, u) from the heap. If d > dist[u], skip (this is an outdated entry).
- For each edge u → v with weight w, if dist[u] + w < dist[v], update dist[v] and push (dist[v], v) into the heap.

ASCII Diagram

Graph (directed, weighted):

    (1)
  0 ----> 1
  |       |
 (4)    (2)
  |       v
  v      3
  2 --(5) ^
   \      |
   (1)   (1)
     \    |
      v   |
       4 --

Edges:
0 -> 1 (1), 0 -> 2 (4)
1 -> 3 (2)
2 -> 3 (5), 2 -> 4 (1)
4 -> 3 (1)

Shortest distances from 0:
dist[0] = 0
dist[1] = 1
dist[2] = 4
dist[4] = 5
dist[3] = 3   (0 -> 1 -> 3; the alternative 0 -> 2 -> 4 -> 3 costs 6)

Python Implementation (Adjacency List + Heap)

import heapq

def dijkstra(adj, source):
    """adj: adjacency list, adj[u] = list of (v, w) edges. Returns dist[] and parent[]."""
    n = len(adj)
    INF = float('inf')
    dist = [INF] * n
    parent = [None] * n
    dist[source] = 0

    heap = [(0, source)]  # (distance, node)

    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # outdated entry
        for v, w in adj[u]:
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                parent[v] = u
                heapq.heappush(heap, (nd, v))

    return dist, parent

def reconstruct_path(parent, s, t):
    path = []
    cur = t
    while cur is not None:
        path.append(cur)
        cur = parent[cur]
    path.reverse()
    return path if path and path[0] == s else []

Examples Section

Example 1: Shortest Paths from a Source

Example: Use the graph from the ASCII diagram and compute shortest distances from node 0.

adj = [
    [(1, 1), (2, 4)],  # 0
    [(3, 2)],          # 1
    [(3, 5), (4, 1)],  # 2
    [],                # 3
    [(3, 1)],          # 4
]

dist, parent = dijkstra(adj, 0)
print("dist:", dist)
print("path 0 -> 3:", reconstruct_path(parent, 0, 3))

Output:

dist: [0, 1, 4, 3, 5]
path 0 -> 3: [0, 1, 3]

Example 2: Step-by-Step Heap Evolution (Intuition)

Start: dist[0]=0, others ∞. Heap = [(0,0)].
Pop (0,0): relax 0→1 (1) → dist[1]=1, push (1,1); relax 0→2 (4) → dist[2]=4, push (4,2). Heap=[(1,1),(4,2)].
Pop (1,1): relax 1→3 (2) → dist[3]=3, push (3,3). Heap=[(3,3),(4,2)].
Pop (3,3): no outgoing edges. Heap=[(4,2)].
Pop (4,2): relax 2→3 (5) → new distance 9 > current dist[3]=3, ignore; relax 2→4 (1) → dist[4]=5, push (5,4).
Pop (5,4): relax 4→3 (1) → new distance 6 > 3, ignore. Done.

Example 3: Unreachable Nodes

Example: If some nodes cannot be reached from the source, their dist[v] will remain ∞. You can treat that as "unreachable" in problem statements.
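
A small sketch of this case: node 2 has an outgoing edge to the source but no incoming path from it, so its distance stays infinite. The compact dijkstra_dist helper is a distance-only variant written for this example, and mapping ∞ to -1 is just one common output convention.

```python
import heapq
import math

def dijkstra_dist(adj, source):
    """Distance-only Dijkstra; adj[u] is a list of (v, w) pairs with w >= 0."""
    dist = [math.inf] * len(adj)
    dist[source] = 0
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # outdated entry
        for v, w in adj[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))
    return dist

adj = [[(1, 4)], [], [(0, 1)]]  # edges 0 -> 1 (4) and 2 -> 0 (1)
dist = dijkstra_dist(adj, 0)
print(dist)                                        # [0, 4, inf]
print([-1 if math.isinf(d) else d for d in dist])  # [0, 4, -1]
```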

Time and Space Complexity

Let V = number of vertices, E = number of edges.
Each edge is relaxed at most once in the main loop.
Each relaxation may push a new entry into the heap. Heap operations cost O(log V).
Time: O(E log V) using a binary heap (Python's heapq).
Space: O(V + E) for the adjacency list and O(V) for dist/parent. With lazy deletion (stale entries are skipped rather than removed), the heap can hold up to O(E) entries.

Edge Cases

Negative weights: Dijkstra is not valid if any edge weight is negative. Use Bellman-Ford or other algorithms instead.
Disconnected graph: Some nodes may remain at distance ∞, meaning unreachable from the source.
Multiple edges / self-loops: The algorithm still works. A self-loop with non-negative weight never improves dist[u], and among parallel edges only the cheapest one can produce an improvement.

Common Mistakes

Common Mistake: Using Dijkstra on graphs with negative edge weights; the algorithm assumes that once a node is popped with distance d, no shorter path will appear later.

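Common Mistake: Not checking if d > dist[u] when popping from the heap. Outdated entries then rescan all outgoing edges of u, which can degrade the running time well beyond O(E log V); any per-pop bookkeeping (such as counting settled nodes) also sees u more than once.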

Pattern Recognition

Use Dijkstra when you see:

"Shortest path" or "minimum cost" in a graph with non-negative weights.
Grid problems where moving between cells has different positive costs (e.g. terrain costs).
Network routing, travel planning, or any path-finding with positive distances.
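
The grid pattern can be sketched as follows: each cell is a node, moves go in four directions, and entering a cell costs its value. The name min_cost_path, the choice to charge the start cell, and the sample grid are all assumptions made for this example.

```python
import heapq

def min_cost_path(cost):
    """Minimum total cost from top-left to bottom-right; cost[r][c] > 0 is paid on entry."""
    rows, cols = len(cost), len(cost[0])
    INF = float("inf")
    dist = [[INF] * cols for _ in range(rows)]
    dist[0][0] = cost[0][0]  # assumption: the start cell's cost counts too
    heap = [(cost[0][0], 0, 0)]
    while heap:
        d, r, c = heapq.heappop(heap)
        if d > dist[r][c]:
            continue  # outdated entry
        if (r, c) == (rows - 1, cols - 1):
            return d  # first pop of the target is final with non-negative costs
        for dr, dc in ((0, 1), (0, -1), (1, 0), (-1, 0)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                nd = d + cost[nr][nc]
                if nd < dist[nr][nc]:
                    dist[nr][nc] = nd
                    heapq.heappush(heap, (nd, nr, nc))
    return INF

grid = [
    [1, 3, 1],
    [1, 5, 1],
    [4, 2, 1],
]
print(min_cost_path(grid))  # 7  (cells 1 -> 3 -> 1 -> 1 -> 1 along the top and right edges)
```

No adjacency list is built here: neighbors are generated on the fly from the cell coordinates, which is the usual approach for grid graphs.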

Interview Insight

Interview Insight: Say: "I'll use Dijkstra's algorithm with a min-heap. I keep a dist array initialized to ∞ except dist[source]=0. Each step I pop the node with smallest distance, relax its outgoing edges, and push neighbors with improved distances. This runs in O(E log V) with adjacency lists and a binary heap, and requires non-negative edge weights."

Summary

Dijkstra computes single-source shortest paths in graphs with non-negative edge weights.
Uses a min-heap priority queue and edge relaxation: if going through u shortens dist[v], update it.
Time complexity O(E log V), space O(V + E).
Do not use when negative edge weights are present; use Bellman-Ford or other algorithms instead.

13.6 Bellman-Ford

Introduction

Bellman-Ford is a single-source shortest-path algorithm that, unlike Dijkstra, works with negative edge weights. It can also detect negative cycles (cycles whose total weight is negative), in which case no finite shortest path exists from the source to nodes reachable through that cycle. The trade-off is higher time complexity: O(V · E) instead of O(E log V).

When to Use Bellman-Ford

Graph has negative edge weights (Dijkstra is invalid).
You need to detect negative cycles (e.g. arbitrage in currency graphs).
Dense graphs where V is small; O(V · E) may be acceptable.

Real-World Analogy

Imagine currency exchange rates: each edge (A → B) has a "cost" (e.g. −log(rate)). A path from currency A back to A with total negative cost means you can make money by cycling (arbitrage). Bellman-Ford can find shortest paths and, with one extra pass, tell you if such a "negative cycle" exists.
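
The arbitrage analogy can be sketched in code. This is a rough illustration, assuming a hypothetical rates matrix where rates[i][j] is the amount of currency j received for one unit of currency i; the function name has_arbitrage and the tolerance eps are assumptions made for this example.

```python
import math

def has_arbitrage(rates):
    """Return True if some cycle of exchanges multiplies to more than 1.

    Assumption: every rate is positive. The edge weight is -log(rate), so a
    profitable cycle becomes a cycle with negative total weight.
    """
    n = len(rates)
    edges = [(i, j, -math.log(rates[i][j]))
             for i in range(n) for j in range(n) if i != j]
    # Starting every vertex at 0 acts like a virtual source connected to all
    # currencies, so a negative cycle anywhere in the graph is detected.
    dist = [0.0] * n
    eps = 1e-12  # tolerance for floating-point noise
    for _ in range(n - 1):
        for u, v, w in edges:
            if dist[u] + w < dist[v] - eps:
                dist[v] = dist[u] + w
    # Extra pass: any further improvement means a negative cycle.
    return any(dist[u] + w < dist[v] - eps for u, v, w in edges)

print(has_arbitrage([[1.0, 2.0], [0.5, 1.0]]))  # False: 2.0 * 0.5 == 1
print(has_arbitrage([[1.0, 2.0], [0.6, 1.0]]))  # True:  2.0 * 0.6 > 1
```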

Algorithm (High-Level)

Initialize dist[s] = 0 and dist[v] = ∞ for all other vertices.
Repeat V − 1 times: for every edge (u, v) with weight w, relax: if dist[u] + w < dist[v], set dist[v] = dist[u] + w (and optionally update parent).
Negative cycle check: Run one more relaxation pass. If any edge (u, v) still improves dist[v], then the graph contains a negative cycle reachable from the source.

Why V − 1 rounds? When no negative cycle is reachable, a shortest path from s to any vertex can be chosen to visit each vertex at most once, so it has at most V − 1 edges. Each round relaxes all edges once; after V − 1 rounds, shortest paths of up to V − 1 edges have been propagated. If a distance still improves after that, the improving path has V or more edges and must therefore use a negative cycle.

Mental Model

Think of "waves" of relaxation: round 1 fixes shortest paths of length 1 edge, round 2 fixes paths of length 2, and so on. After V − 1 rounds, all finite shortest paths are correct.
If after V − 1 rounds you can still relax an edge, that relaxation is "driven" by a negative cycle.

ASCII Diagram

Directed graph (can have negative weights):

      (2)
  0 ------> 1
  |         |
 (-1)      (1)
  |         |
  v         |
  2 <-------+
  |
 (3)
  |
  v
  3

Edges: 0→1(2), 0→2(-1), 1→2(1), 2→3(3)
No negative cycle. Shortest from 0: dist[0]=0, dist[1]=2, dist[2]=-1, dist[3]=2.

Python Implementation

def bellman_ford(edges, n, source):
    """
    edges: list of (u, v, w) directed edges. Vertices 0..n-1.
    Returns (dist, parent, has_negative_cycle).
    """
    INF = float('inf')
    dist = [INF] * n
    parent = [None] * n
    dist[source] = 0

    # V - 1 relaxation rounds
    for _ in range(n - 1):
        for u, v, w in edges:
            if dist[u] != INF and dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                parent[v] = u

    # Negative cycle detection: one more round
    has_negative_cycle = False
    for u, v, w in edges:
        if dist[u] != INF and dist[u] + w < dist[v]:
            has_negative_cycle = True
            break

    return dist, parent, has_negative_cycle

Examples Section

Example 1: Graph Without Negative Cycle

Example: n = 4, source = 0. Edges: (0,1,2), (0,2,-1), (1,2,1), (2,3,3).

edges = [(0, 1, 2), (0, 2, -1), (1, 2, 1), (2, 3, 3)]
dist, parent, has_neg = bellman_ford(edges, 4, 0)
print("dist:", dist)           # [0, 2, -1, 2]
print("negative cycle:", has_neg)  # False

Output: dist: [0, 2, -1, 2], negative cycle: False. Shortest path 0→2→3 has length 2.
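
To recover the actual route, the parent array can be walked backwards from the target. The helper below is a small sketch; reconstruct_path is a name introduced here, and the hard-coded parent list is what bellman_ford produces for Example 1 (each vertex's predecessor on its shortest path).

```python
def reconstruct_path(parent, source, target):
    """Walk parent pointers back from target; return [] if target is unreachable.

    Assumption: parent comes from a run with no negative cycle, so the
    chain of parents is finite.
    """
    path = []
    node = target
    while node is not None:
        path.append(node)
        if node == source:
            return path[::-1]  # collected target-to-source, so reverse it
        node = parent[node]
    return []  # chain ended without reaching the source

parent = [None, 0, 0, 2]  # parent array for Example 1
print(reconstruct_path(parent, 0, 3))  # [0, 2, 3]
```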

Example 2: Graph With Negative Cycle

Example: Add edge (3, 0, -5). Now 0→2→3→0 is a cycle with total weight (−1) + 3 + (−5) = −3 < 0. Bellman-Ford detects it.

edges = [(0, 1, 2), (0, 2, -1), (1, 2, 1), (2, 3, 3), (3, 0, -5)]
dist, parent, has_neg = bellman_ford(edges, 4, 0)
print("negative cycle:", has_neg)  # True

Output: negative cycle: True. Distances may be incorrect for nodes in or reachable through the cycle.

Example 3: Edge List vs Adjacency List

Bellman-Ford is typically implemented over an edge list (list of (u, v, w)) so that each round iterates over all edges once. If your graph is stored as an adjacency list, build the edge list first or iterate adj[u] for each u and relax (u, v, w) for each neighbor.
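
A short sketch of the conversion, assuming the adjacency list has the shape adj[u] = [(v, w), ...]; the helper name adj_to_edges is introduced here for illustration.

```python
def adj_to_edges(adj):
    """Flatten adj[u] = [(v, w), ...] into a list of (u, v, w) triples."""
    return [(u, v, w) for u, neighbors in enumerate(adj) for v, w in neighbors]

adj = [[(1, 2), (2, -1)], [(2, 1)], [(3, 3)], []]
print(adj_to_edges(adj))  # [(0, 1, 2), (0, 2, -1), (1, 2, 1), (2, 3, 3)]
```

The resulting list has the (u, v, w) format that an edge-list relaxation loop expects.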

Time and Space Complexity

Time: O(V · E) — V − 1 rounds, each round O(E) relaxations.
Space: O(V) for dist and parent; O(E) for the edge list.

Edge Cases

Negative cycle reachable from source: Algorithm reports it; distances to nodes reachable via the cycle are meaningless (can be made arbitrarily negative).
Disconnected graph: Nodes unreachable from the source stay at ∞; they are not affected by negative cycles in other components.
Multiple edges: Include each edge in the edge list; relaxation handles them naturally.

Common Mistakes

Common Mistake: Using Bellman-Ford when all weights are non-negative; Dijkstra is faster (O(E log V)). Use Bellman-Ford only when you need negative weights or negative-cycle detection.

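Common Mistake: Forgetting to check dist[u] != INF before relaxing u→v. In languages where ∞ is represented by a large integer sentinel, INF + w with a negative w is smaller than INF and would wrongly update dist[v] (or overflow). In Python, float('inf') + w is still inf, so the comparison fails on its own, but the explicit check keeps the logic clear and portable.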

Pattern Recognition

Use Bellman-Ford when you see:

Shortest path with negative edge weights.
Negative cycle detection (e.g. arbitrage, fault detection in networks).
Constraints like "at most K edges" (modified Bellman-Ford can be used for limited-hop shortest path).
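
The "at most K edges" variant can be sketched as below. This is one common way to adapt the relaxation loop, not the only one: each round reads from a snapshot of the previous round so that a path grows by at most one edge per round. The function name and the sample graph are illustrative.

```python
def shortest_with_at_most_k_edges(n, edges, source, k):
    """dist[v] = cheapest cost of a path from source to v using at most k edges."""
    INF = float('inf')
    dist = [INF] * n
    dist[source] = 0
    for _ in range(k):
        prev = dist[:]  # snapshot: this round may extend paths by one edge only
        for u, v, w in edges:
            if prev[u] != INF and prev[u] + w < dist[v]:
                dist[v] = prev[u] + w
    return dist

edges = [(0, 1, 1), (1, 2, 1), (0, 2, 5)]
print(shortest_with_at_most_k_edges(3, edges, 0, 1))  # [0, 1, 5]
print(shortest_with_at_most_k_edges(3, edges, 0, 2))  # [0, 1, 2]
```

Without the snapshot, an update made early in a round could be reused later in the same round, letting a path use more than one new edge per round.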

Interview Insight

Interview Insight: Say: "For graphs with negative weights I'll use Bellman-Ford. I do V−1 rounds of relaxing all edges, then one more round to detect if any distance still improves—if so, there's a negative cycle. Time O(V·E), space O(V)."

Summary

Bellman-Ford computes single-source shortest paths and detects negative cycles.
V − 1 rounds of full edge relaxation; one extra round to detect negative cycle.
Time O(V · E), space O(V). Use when edges can be negative or you need cycle detection.

13.7 Floyd-Warshall

Introduction

Floyd-Warshall is an all-pairs shortest path algorithm: it computes the shortest distance between every pair of vertices in a graph. It works with negative edge weights and can detect negative cycles. The algorithm is simple to implement (three nested loops) but takes O(V³) time and O(V²) space, so it is practical only when the number of vertices V is moderate (typically up to a few hundred).

When to Use Floyd-Warshall

You need shortest path between every pair of nodes (e.g. distance matrix for a small graph).
Graph may have negative edge weights (unlike Dijkstra).
V is not too large (V² or V³ is acceptable).
Dense graphs: one run gives all pairs; running V times Dijkstra would be O(V · E log V), which can be worse for dense E ≈ V².

Real-World Analogy

Imagine a table of distances between every pair of cities. Initially you have direct road distances. Floyd-Warshall asks: "For each pair (A, B), could we get a shorter distance by going through city C?" It tries every intermediate city and updates the table. After considering all intermediates, the table holds the true shortest distances (and can reveal if any city acts as a "negative cycle" hub).

Dynamic Programming Idea

Define dist[i][j][k] = shortest path from i to j using only vertices {0, 1, …, k} as intermediates. Then:

Base: dist[i][j][−1] = direct edge weight (or ∞ if no edge). We use a 2D table and overwrite it: let dist[i][j] mean "shortest from i to j using intermediates 0..k" in round k.
Transition: Either we don't use vertex k, so dist[i][j] stays as is; or we go i → k → j, so dist[i][j] = min(dist[i][j], dist[i][k] + dist[k][j]).

Algorithm (High-Level)

Initialize an n×n matrix dist: dist[i][j] = 0 if i == j, else weight of edge (i, j) or ∞ if no edge.
For k = 0 to n − 1: for each pair (i, j), set dist[i][j] = min(dist[i][j], dist[i][k] + dist[k][j]).
Negative cycle: If after the loops any dist[i][i] < 0, the graph contains a negative cycle (you can go from i to i with negative cost).

Mental Model

In round k, we "allow" vertex k as an intermediate. For every pair (i, j), we ask: is it shorter to go i → k → j than our current best? The order of the k-loop matters (k must be the outer loop); i and j can be in any order.

ASCII Diagram

Graph (4 nodes, directed, weighted):

  0 --(2)--> 1
  |          |
 (4)        (1)
  v          v
  2 <--(2)-- 3

Edges: 0→1(2), 0→2(4), 1→3(1), 3→2(2).

Initial dist (direct edges, rest ∞):
    0   1   2   3
0   0   2   4   ∞
1   ∞   0   ∞   1
2   ∞   ∞   0   ∞
3   ∞   ∞   2   0

After Floyd-Warshall, dist[0][3] = 3 (0→1→3) and dist[1][2] = 3 (1→3→2). dist[0][2] stays 4: the direct edge beats 0→1→3→2, which costs 5.

Python Implementation

def floyd_warshall(n, edges, directed=True):
    """
    n: number of vertices (0..n-1).
    edges: list of (u, v, w). If directed=False, add both (u,v,w) and (v,u,w).
    Returns: 2D list dist, and has_negative_cycle (bool).
    """
    INF = float('inf')
    dist = [[INF] * n for _ in range(n)]
    for i in range(n):
        dist[i][i] = 0
    for u, v, w in edges:
        dist[u][v] = min(dist[u][v], w)
        if not directed:
            dist[v][u] = min(dist[v][u], w)

    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] != INF and dist[k][j] != INF:
                    dist[i][j] = min(dist[i][j], dist[i][k] + dist[k][j])

    has_negative_cycle = any(dist[i][i] < 0 for i in range(n))
    return dist, has_negative_cycle

Examples Section

Example 1: Small Graph, All-Pairs Distances

Example: n = 4, edges: (0,1,2), (0,2,4), (1,3,1), (3,2,2). Find shortest distance from every node to every other.

n = 4
edges = [(0, 1, 2), (0, 2, 4), (1, 3, 1), (3, 2, 2)]
dist, neg_cycle = floyd_warshall(n, edges)
for i in range(n):
    print(dist[i])

Output:

[0, 2, 4, 3]
[inf, 0, 3, 1]
[inf, inf, 0, inf]
[inf, inf, 2, 0]

Interpretation: dist[0][3]=3 via 0→1→3; dist[0][2]=4 uses the direct edge, since 0→1→3→2 costs 5; dist[1][0]=∞ (no path from 1 to 0).

Example 2: Negative Cycle Detection

Example: Add edge (2, 0, -10). Then 0→1→3→2→0 has length 2+1+2−10 = −5. After Floyd-Warshall, dist[i][i] will be negative for nodes on this cycle (e.g. dist[0][0] < 0).

edges_neg = [(0, 1, 2), (0, 2, 4), (1, 3, 1), (3, 2, 2), (2, 0, -10)]
dist, neg_cycle = floyd_warshall(4, edges_neg)
print("has_negative_cycle:", neg_cycle)  # True

Example 3: Transitive Closure (Unweighted)

For an unweighted graph, the same structure computes reachability. The simplest option is to run Floyd-Warshall unchanged with weight 1 on every edge; afterwards dist[i][j] < ∞ means "j is reachable from i". Alternatively, use a boolean matrix: set reach[i][j] = True if i == j or there is an edge i→j, False otherwise, and replace "min" and "+" with logical OR and AND: reach[i][j] = reach[i][j] or (reach[i][k] and reach[k][j]). After the loops, reach[i][j] is True exactly when j is reachable from i.
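
A boolean version of the reachability computation might look like the sketch below. The function name transitive_closure and the choice to treat every vertex as reaching itself are assumptions made for this example.

```python
def transitive_closure(n, edges):
    """reach[i][j] is True when j can be reached from i (every i reaches itself)."""
    reach = [[i == j for j in range(n)] for i in range(n)]
    for u, v in edges:
        reach[u][v] = True
    for k in range(n):              # intermediate vertex must be the outer loop
        for i in range(n):
            if not reach[i][k]:
                continue            # i cannot reach k, so k cannot help row i
            for j in range(n):
                if reach[k][j]:
                    reach[i][j] = True
    return reach

closure = transitive_closure(4, [(0, 1), (0, 2), (1, 3), (3, 2)])
print(closure[1])  # [False, True, True, True]
```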

Time and Space Complexity

Time: O(V³) — three nested loops over V.
Space: O(V²) for the distance matrix.

Edge Cases

No edge between i, j: Initialize to ∞; after the algorithm, still ∞ means no path.
Negative cycle: Some dist[i][i] < 0; distances to nodes reachable through the cycle are not well-defined.
Undirected graph: Store each edge in both directions (or set dist[u][v] = dist[v][u] = w).

Common Mistakes

Common Mistake: Using Floyd-Warshall when you only need single-source shortest path; Dijkstra or Bellman-Ford is more efficient. Use Floyd-Warshall when you need all pairs.

Common Mistake: Wrong loop order: k (intermediate) must be the outer loop. If i or j is outer, the DP recurrence is incorrect.

Pattern Recognition

Use Floyd-Warshall when you see:

"Shortest path between all pairs of vertices."
Small graph (V ≤ few hundred) and need a full distance matrix.
Reachability / transitive closure with the same triple-loop structure.

Interview Insight

Interview Insight: Say: "I'll use Floyd-Warshall for all-pairs shortest path. I maintain a V×V matrix and in the outer loop over intermediate k, I relax dist[i][j] = min(dist[i][j], dist[i][k] + dist[k][j]). Time O(V³), space O(V²). I can detect negative cycles by checking if any dist[i][i] < 0."

Summary

Floyd-Warshall computes all-pairs shortest paths in O(V³) time and O(V²) space.
Uses dynamic programming: allow intermediates 0..k; relax dist[i][j] via k.
Works with negative weights; negative cycle detected if dist[i][i] < 0 for some i.
Use when you need every pair; otherwise prefer Dijkstra or Bellman-Ford for single-source.

13.8 Prim's Algorithm

Introduction

Prim's algorithm finds a Minimum Spanning Tree (MST) of a connected, undirected, weighted graph. An MST is a spanning tree (connects all vertices, no cycles) whose total edge weight is as small as possible. Prim's grows a single tree from a starting vertex by repeatedly adding the cheapest edge that connects a vertex already in the tree to a vertex outside it—very similar in structure to Dijkstra, but the "key" is the minimum edge weight to reach a node, not the path length.

Minimum Spanning Tree (MST)

Spanning tree: Subgraph that is a tree and includes all vertices.
Minimum: Sum of edge weights is minimum among all spanning trees.
If the graph is not connected, there is no spanning tree; we can run Prim's on each component to get an MST forest.

Real-World Analogy

Imagine laying cable to connect all houses in a neighborhood at minimum cost. You start at one house. At each step, you extend the cable to the nearest house not yet connected (minimum cost edge from the current network to a new house). When all houses are connected, you have an MST—no redundant links, minimum total cost.

Algorithm (High-Level)

Start with a single vertex (e.g. 0); mark it "in the tree."
Maintain the minimum cost to add each vertex to the tree (initially ∞ except neighbors of the start).
Repeat until all vertices are in the tree: pick the vertex v with minimum cost that is not yet in the tree; add it and the edge that achieved that cost to the MST; update the minimum cost for neighbors of v (if an edge v→u has weight w and w < current cost for u, set cost[u] = w).

This is exactly like Dijkstra, but the "distance" is replaced by "minimum edge weight from the current tree to this vertex."

Mental Model

You have a growing set T of vertices in the MST. For each vertex not in T, track the cheapest edge from T to that vertex.
Each step: add the vertex with the cheapest such edge; that edge becomes part of the MST; then update the cheapest edge for its neighbors.

ASCII Diagram

Undirected weighted graph:

     1
  0 --- 1
  |\    |
 4| \2  |3
  |  \  |
  2 --- 3
     1

Edges: 0-1(1), 0-2(4), 0-3(2), 1-3(3), 2-3(1).

Prim from 0: add 0; cheapest known edge to 1 is 1, to 3 is 2, to 2 is 4.
Add 1 (edge 0-1); then add 3 (edge 0-3 with weight 2, which beats 1-3 with weight 3); then add 2 (edge 2-3 with weight 1, which beats 0-2 with weight 4).
MST edges: (0,1), (0,3), (2,3). Total weight = 1+2+1 = 4.

Python Implementation (Min-Heap)

import heapq

def prim(n, adj, start=0):
    """
    n: number of vertices (0..n-1).
    adj: adjacency list, adj[u] = list of (v, w) for undirected edges.
    Returns: (mst_total_weight, mst_edges).
    """
    in_mst = [False] * n
    min_cost = [float('inf')] * n
    min_cost[start] = 0
    parent = [None] * n
    heap = [(0, start, -1)]  # (cost, node, parent)
    mst_edges = []
    mst_weight = 0

    while heap:
        c, u, p = heapq.heappop(heap)
        if in_mst[u]:
            continue
        in_mst[u] = True
        mst_weight += c
        if p != -1:
            mst_edges.append((p, u, c))
        for v, w in adj[u]:
            if not in_mst[v] and w < min_cost[v]:
                min_cost[v] = w
                parent[v] = u
                heapq.heappush(heap, (w, v, u))

    return mst_weight, mst_edges

Examples Section

Example 1: Small Graph MST

Example: n = 4, undirected edges: (0,1,1), (0,2,4), (0,3,2), (1,3,3), (2,3,1). Find MST from node 0.

def build_adj_undirected(n, edges):
    adj = [[] for _ in range(n)]
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    return adj

n = 4
edges = [(0, 1, 1), (0, 2, 4), (0, 3, 2), (1, 3, 3), (2, 3, 1)]
adj = build_adj_undirected(n, edges)
weight, mst_edges = prim(n, adj, 0)
print("MST weight:", weight)   # 4
print("MST edges:", mst_edges) # e.g. [(0, 1, 1), (0, 3, 2), (3, 2, 1)]

Output: MST weight: 4, and three edges (e.g. 0-1, 0-3, 2-3) forming the MST.

Example 2: Step-by-Step (Intuition)

Start: in_mst = [F,F,F,F], heap = [(0,0,-1)].
Pop (0,0,-1): add 0; push (1,1,0), (4,2,0), (2,3,0).
Pop (1,1,0): add 1, edge (0,1). Edge 1-3 has weight 3, but 3 already has cost 2 from 0, so nothing is pushed.
Pop (2,3,0): add 3, edge (0,3); push (1,2,3).
Pop (1,2,3): add 2, edge (3,2). All in MST; total weight 0+1+2+1 = 4. The leftover entry (4,2,0) is popped and skipped because 2 is already in the MST.

Example 3: Disconnected Graph

Example: If the graph has two components, Prim from one vertex only spans that component. Run Prim (or a loop) from each unvisited vertex to get a minimum spanning forest.
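
One way to sketch the "loop over unvisited vertices" idea is shown below. The function name prim_forest and its return shape are assumptions for illustration. For brevity this version pushes every crossing edge and relies on the in_tree check to skip outdated entries, instead of tracking min_cost.

```python
import heapq

def prim_forest(n, adj):
    """Minimum spanning forest: grow one Prim tree per connected component.

    adj[u] = list of (v, w) for undirected edges.
    Returns (total_weight, forest_edges, number_of_components).
    """
    in_tree = [False] * n
    total, forest_edges, components = 0, [], 0
    for start in range(n):
        if in_tree[start]:
            continue  # already covered by an earlier tree
        components += 1
        heap = [(0, start, -1)]  # (edge weight, node, parent)
        while heap:
            w, u, p = heapq.heappop(heap)
            if in_tree[u]:
                continue  # already attached through a cheaper edge
            in_tree[u] = True
            total += w
            if p != -1:
                forest_edges.append((p, u, w))
            for v, wv in adj[u]:
                if not in_tree[v]:
                    heapq.heappush(heap, (wv, v, u))
    return total, forest_edges, components

# Two components: {0, 1, 2} and {3, 4}
adj = [[(1, 1), (2, 3)], [(0, 1), (2, 2)], [(1, 2), (0, 3)], [(4, 7)], [(3, 7)]]
print(prim_forest(5, adj))  # (10, [(0, 1, 1), (1, 2, 2), (3, 4, 7)], 2)
```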

Time and Space Complexity

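Time: O(E log V) with a binary heap. Each undirected edge is examined at most twice (once from each endpoint), so there are O(E) heap operations, each costing O(log E) = O(log V).
Space: O(V) for in_mst, min_cost, parent; O(E) for the heap in the worst case, since outdated entries stay in the heap until they are popped; O(V + E) for the adjacency list.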

Prim vs Kruskal

Prim: Grows one tree from a source; uses a min-heap of "crossing" edges; good for dense graphs (can be O(V²) with an array instead of heap).
Kruskal: Sorts all edges and adds the smallest that doesn't create a cycle (Union-Find); O(E log E); often simpler and good for sparse graphs.
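
The O(V²) array-based variant of Prim mentioned above can be sketched as follows. It assumes a connected undirected graph stored as an adjacency matrix with None for missing edges; the name prim_dense is illustrative.

```python
def prim_dense(weight):
    """Array-based Prim on an adjacency matrix; O(V^2) time, no heap.

    weight[u][v] is the edge weight, or None when there is no edge.
    Assumption: the graph is connected and the matrix is symmetric.
    Returns the total MST weight.
    """
    n = len(weight)
    INF = float('inf')
    in_mst = [False] * n
    min_cost = [INF] * n
    min_cost[0] = 0
    total = 0
    for _ in range(n):
        # Linear scan for the cheapest vertex not yet in the tree.
        u = min((v for v in range(n) if not in_mst[v]), key=lambda v: min_cost[v])
        in_mst[u] = True
        total += min_cost[u]
        for v in range(n):
            w = weight[u][v]
            if w is not None and not in_mst[v] and w < min_cost[v]:
                min_cost[v] = w
    return total

N = None
W = [
    [N, 1, 4, 2],
    [1, N, N, 3],
    [4, N, N, 1],
    [2, 3, 1, N],
]
print(prim_dense(W))  # 4
```

For dense graphs, where E is close to V², the linear scan avoids the log factor of the heap version.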

Edge Cases

Disconnected graph: Prim from one node gives an MST of that component only; run for each component to get an MST forest.
Single node: MST has weight 0 and no edges.
Multiple edges between same pair: Use the minimum weight; the algorithm naturally uses the smallest when relaxing.

Common Mistakes

Common Mistake: Confusing Prim with Dijkstra: in Prim the key is the minimum edge weight from the tree to the node; in Dijkstra it is the total path length. Both use a min-heap but with different keys.

Common Mistake: Forgetting to add both (u,v,w) and (v,u,w) for undirected graphs when building the adjacency list.

Pattern Recognition

Use Prim (or Kruskal) when you see:

"Minimum spanning tree," "connect all nodes at minimum cost," "minimum wiring/cabling."
Problems that reduce to MST (e.g. clustering with minimum total distance).

Interview Insight

Interview Insight: Say: "I'll use Prim's algorithm: start from a vertex, maintain a min-heap of (cost, node) where cost is the minimum edge weight from the current tree to that node. Each step I add the smallest-cost node and update neighbors. Time O(E log V) with a heap. Alternatively, Kruskal with sort + Union-Find is O(E log E)."

Summary

Prim's algorithm builds an MST by growing a single tree, always adding the minimum-weight edge to a new vertex.
Implementation is similar to Dijkstra; key = min edge weight from tree to node, not path length.
Time O(E log V) with heap, space O(V + E) including the adjacency list and heap. For disconnected graphs, run on each component.

13.9 Kruskal's Algorithm

Introduction

Kruskal's algorithm finds a Minimum Spanning Tree (MST) by considering edges in increasing order of weight and adding an edge to the MST only if it does not create a cycle. It uses a Union-Find (Disjoint Set Union) data structure to check in nearly constant time whether adding an edge would connect two vertices that are already in the same connected component. Kruskal is simple to implement and often preferred for sparse graphs.

Why Kruskal Works

The greedy choice: the minimum-weight edge that does not form a cycle is always part of some MST (cut property). So we sort all edges, then process them from smallest to largest; for each edge (u, v, w), if u and v are not yet in the same component, add the edge and merge their components. After processing all edges, we have exactly V − 1 edges (for a connected graph) and they form an MST.

Real-World Analogy

Imagine you have a list of possible road segments between cities, each with a cost. You want to connect all cities at minimum total cost without building redundant roads (no cycles). Sort the segments by cost, then add the cheapest segment that doesn't already connect two cities that are connected (directly or indirectly). That's Kruskal.

Algorithm (High-Level)

Sort all edges by weight (ascending).
Initialize a Union-Find structure with each vertex in its own set.
For each edge (u, v, w) in sorted order:
- If find(u) != find(v), add the edge to the MST and union(u, v).
- Otherwise skip (u and v are already in the same component; adding this edge would create a cycle).
Stop when you have added V − 1 edges (connected graph) or when no more edges remain.

Union-Find (Disjoint Set) Recap

We need two operations: find(x) — which set does x belong to? and union(x, y) — merge the sets containing x and y. With path compression and union by rank, both are nearly O(1) amortized. We use this to check "are u and v in the same component?" before adding an edge.

Mental Model

Start with each vertex as its own "island." Edges are bridges. Sort bridges by cost.
Pick the cheapest bridge; if it connects two different islands, build it and merge the islands. Repeat until you have one island (one connected component) and exactly V − 1 bridges (MST).

ASCII Diagram

Same graph as Prim (4 nodes):

  0 --- 1      Edges sorted by weight: (0,1,1), (2,3,1), (0,3,2), (1,3,3), (0,2,4)
  |\    |
  | \   |      Kruskal: add (0,1,1), add (2,3,1), add (0,3,2). Now 0,1,2,3 connected.
  2 --- 3      Skip (1,3,3) — 1 and 3 same component. Skip (0,2,4) — 0 and 2 same component.
               MST: (0,1), (2,3), (0,3). Total weight = 4.

Python Implementation (with Union-Find)

class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])
        return self.parent[x]

    def union(self, x, y):
        px, py = self.find(x), self.find(y)
        if px == py:
            return False
        if self.rank[px] < self.rank[py]:
            px, py = py, px
        self.parent[py] = px
        if self.rank[px] == self.rank[py]:
            self.rank[px] += 1
        return True

def kruskal(n, edges):
    """
    n: number of vertices (0..n-1).
    edges: list of (u, v, w) for undirected edges.
    Returns: (mst_total_weight, mst_edges).
    """
    edges_sorted = sorted(edges, key=lambda e: e[2])
    uf = UnionFind(n)
    mst_edges = []
    mst_weight = 0

    for u, v, w in edges_sorted:
        if uf.find(u) != uf.find(v):
            uf.union(u, v)
            mst_edges.append((u, v, w))
            mst_weight += w
            if len(mst_edges) == n - 1:
                break

    return mst_weight, mst_edges

Examples Section

Example 1: Same Graph as Prim

Example: n = 4, edges: (0,1,1), (0,2,4), (0,3,2), (1,3,3), (2,3,1). Find MST with Kruskal.

n = 4
edges = [(0, 1, 1), (0, 2, 4), (0, 3, 2), (1, 3, 3), (2, 3, 1)]
weight, mst = kruskal(n, edges)
print("MST weight:", weight)   # 4
print("MST edges:", mst)        # [(0, 1, 1), (2, 3, 1), (0, 3, 2)]

Output: MST weight: 4, MST edges: [(0, 1, 1), (2, 3, 1), (0, 3, 2)]. Same total weight as Prim; edge set may differ but cost is the same.

Example 2: Step-by-Step

Sorted edges: (0,1,1), (2,3,1), (0,3,2), (1,3,3), (0,2,4).
Add (0,1,1): components {0,1}, {2}, {3}. MST = [(0,1,1)].
Add (2,3,1): components {0,1}, {2,3}. MST = [(0,1,1), (2,3,1)].
Add (0,3,2): merge {0,1} and {2,3} → one component. MST = [(0,1,1), (2,3,1), (0,3,2)].
We have 3 edges = n−1; stop. (1,3,3) and (0,2,4) would create cycles; skip.

Example 3: Disconnected Graph (MST Forest)

Example: If the graph is disconnected, Kruskal naturally produces a minimum spanning forest: we add edges until no more can be added without creating a cycle. The number of edges in the result is n − number_of_components.

No code change needed: just stop when no more edges can be added; the result may have fewer than n−1 edges.
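
A self-contained sketch of the forest case is shown below. To keep it short it uses a bare parent list with path halving and no union by rank, which is a simplification assumed for this example; the function name kruskal_forest is also illustrative.

```python
def kruskal_forest(n, edges):
    """Minimum spanning forest; returns (total_weight, forest_edges, components)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    total, forest_edges = 0, []
    for u, v, w in sorted(edges, key=lambda e: e[2]):
        ru, rv = find(u), find(v)
        if ru != rv:            # different components: safe to add the edge
            parent[rv] = ru
            forest_edges.append((u, v, w))
            total += w
    # Every accepted edge merges two components, so this count is exact.
    components = n - len(forest_edges)
    return total, forest_edges, components

edges = [(0, 1, 1), (1, 2, 2), (0, 2, 3), (3, 4, 7)]
print(kruskal_forest(5, edges))  # (10, [(0, 1, 1), (1, 2, 2), (3, 4, 7)], 2)
```

A result with components greater than 1 signals that the input graph was disconnected.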

Time and Space Complexity

Time: O(E log E) for sorting edges; O(E · α(V)) for the Union-Find operations (α is inverse Ackermann, effectively constant). So O(E log E) overall, which is the same as O(E log V) because E ≤ V² in a simple graph and therefore log E ≤ 2 log V.
Space: O(V) for Union-Find; O(E) for the sorted edge list (or sort in place).

Kruskal vs Prim

Kruskal: Sort edges once; no need for adjacency list; easy to implement; excellent for sparse graphs (E ≈ V).
Prim: Uses a heap and adjacency list; good when you have one source or dense graphs (with array: O(V²)).

Edge Cases

Disconnected graph: Result is a spanning forest; number of edges = V − number of components.
Multiple edges between same pair: Include all in the edge list; the sort will consider the smallest first; Union-Find prevents duplicates in the MST.
Equal weights: Any order among equal-weight edges is fine; the MST may not be unique but total weight is.

Common Mistakes

Common Mistake: Forgetting to sort edges by weight. Processing in arbitrary order does not guarantee an MST.

Common Mistake: Adding an edge when find(u) == find(v); that would create a cycle. Always check before union.

Pattern Recognition

Use Kruskal when you see:

"Minimum spanning tree," "connect all nodes at minimum cost," especially when the input is an edge list.
Sparse graphs (E not much larger than V); sorting E edges is cheap.

Interview Insight

Interview Insight: Say: "I'll use Kruskal: sort edges by weight, then iterate and add each edge if it doesn't connect two nodes already in the same component, using Union-Find. Time O(E log E), space O(V). Same MST weight as Prim; different algorithm, same result."

Summary

Kruskal's algorithm builds an MST by adding edges in increasing order of weight, skipping edges that would create a cycle (Union-Find).
Time O(E log E), space O(V) for Union-Find plus O(E) for the edge list. Simple and ideal for sparse graphs.
Requires a working Union-Find (Disjoint Set) for cycle detection.

13.10 Disjoint Set (Union-Find)

Introduction

A Disjoint Set Union (DSU), also called Union-Find, is a data structure that maintains a partition of elements into disjoint sets. It supports two main operations: find(x) — which set does x belong to? — and union(x, y) — merge the sets containing x and y. It is essential for Kruskal's MST, cycle detection in graphs, connected components, and many problems that ask "are x and y in the same group?" with dynamic merging.

Operations

find(x): Return a representative (e.g. root) of the set containing x. If find(x) == find(y), then x and y are in the same set.
union(x, y): Merge the sets containing x and y. After this, find(x) == find(y).
Often we also want number of sets or size of set containing x; both can be maintained with minor extra bookkeeping.

Representation: Parent Array

We represent each set as a tree: each node points to its parent; the root points to itself. The "representative" of a set is its root. We store parent[i] = parent of element i (or i if it is the root). Initially parent[i] = i for all i (each element is its own set).

Find with Path Compression

To find the root of x, walk up the parent chain until we reach a node that points to itself. Path compression: while traversing, set parent[x] = root for every node along the path so that future finds are O(1) for those nodes. This keeps the tree flat and gives amortized near-constant time.
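
Path compression can also be written without recursion, which avoids hitting Python's recursion limit on very long parent chains. The two-pass function below is a sketch operating on a plain parent list; it is an alternative formulation, not a replacement for the class shown later in this section.

```python
def find(parent, x):
    """Iterative find with full path compression (no recursion)."""
    root = x
    while parent[root] != root:   # first pass: locate the root
        root = parent[root]
    while parent[x] != root:      # second pass: point each node at the root
        nxt = parent[x]
        parent[x] = root
        x = nxt
    return root

parent = [0, 0, 1, 2, 3]          # chain 4 -> 3 -> 2 -> 1 -> 0
print(find(parent, 4))            # 0
print(parent)                     # [0, 0, 0, 0, 0]
```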

Union by Rank (or Size)

When merging two trees, attach the smaller tree under the larger tree's root so that the depth doesn't grow unnecessarily. Union by rank: maintain a rank[i] (upper bound on height); when merging, make the root with smaller rank point to the root with larger rank; if equal, increment the new root's rank. Alternatively, union by size uses set size instead of rank. Both yield amortized O(α(n)) per operation, where α is the inverse Ackermann function (effectively a constant).
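
Union by size can be sketched as follows. The class name SizedUnionFind and the set_size method are introduced here for illustration, and find uses path halving, a lightweight variant of path compression.

```python
class SizedUnionFind:
    """Union-Find using union by size; also answers 'how big is x's set?'."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n  # size[r] is meaningful only when r is a root

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return False
        if self.size[rx] < self.size[ry]:
            rx, ry = ry, rx          # ensure rx is the root of the larger tree
        self.parent[ry] = rx
        self.size[rx] += self.size[ry]
        return True

    def set_size(self, x):
        return self.size[self.find(x)]

uf = SizedUnionFind(5)
uf.union(0, 1)
uf.union(2, 3)
uf.union(1, 3)
print(uf.set_size(0), uf.set_size(4))  # 4 1
```

Tracking sizes costs the same as tracking ranks and gives the size-of-set query at no extra cost.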

Mental Model

Each set is a tree; the root is the "representative." find(x) = go to root; union(x,y) = make one root point to the other.
Path compression flattens the tree on every find; union by rank keeps trees short. Together they make operations extremely fast in practice.

ASCII Diagram

Initial: parent = [0,1,2,3,4]  (5 singletons)

After union(0,1), union(2,3):  sets {0,1}, {2,3}, {4}
  parent might be [0,0,2,2,4]  (1→0, 3→2)

After union(1,3): merge sets containing 1 and 3
  find(1)=0, find(3)=2; union(0,2) → e.g. parent[2]=0
  parent = [0,0,0,2,4]  so find(4)=4, find(0)=find(1)=find(2)=find(3)=0

Python Implementation

class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n
        self.count = n  # number of disjoint sets

    def find(self, x):
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])  # path compression
        return self.parent[x]

    def union(self, x, y):
        px, py = self.find(x), self.find(y)
        if px == py:
            return False
        if self.rank[px] < self.rank[py]:
            px, py = py, px
        self.parent[py] = px
        if self.rank[px] == self.rank[py]:
            self.rank[px] += 1
        self.count -= 1
        return True

    def same(self, x, y):
        return self.find(x) == self.find(y)

Examples Section

Example 1: Basic Union and Find

Example: n = 5. Union(0,1), union(2,3), union(1,3). Then find(0), find(2), find(4).

uf = UnionFind(5)
uf.union(0, 1)
uf.union(2, 3)
uf.union(1, 3)
print(uf.find(0), uf.find(2), uf.find(4))  # 0, 0, 4 (0 and 2 in same set after merge)
print(uf.same(0, 2), uf.same(0, 4))        # True, False
print("Number of sets:", uf.count)          # 2

Output: 0 0 4, True False, Number of sets: 2 (sets {0,1,2,3} and {4}).

Example 2: Counting Connected Components (Graph)

Example: Given n vertices and a list of edges, count connected components by starting with n sets and unioning each edge's endpoints.

def count_components(n, edges):
    uf = UnionFind(n)
    for u, v in edges:
        uf.union(u, v)
    return uf.count

n, edges = 5, [(0, 1), (1, 2), (3, 4)]
print(count_components(n, edges))  # 2  (components {0,1,2} and {3,4})

Example 3: Cycle Detection in Undirected Graph

Example: For each edge (u, v), if find(u) == find(v) then u and v are already connected — adding (u, v) would create a cycle. Otherwise union(u, v).

def has_cycle(n, edges):
    uf = UnionFind(n)
    for u, v in edges:
        if uf.same(u, v):
            return True
        uf.union(u, v)
    return False

edges = [(0, 1), (1, 2), (2, 0)]  # triangle
print(has_cycle(3, edges))  # True

Time and Space Complexity

Amortized time per find or union: O(α(n)), where α(n) is the inverse Ackermann function (≤ 5 for any practical n). Effectively constant.
Space: O(n) for parent and rank arrays.

Applications

Kruskal's MST: Check if adding an edge connects two different components (find) and merge them (union).
Connected components: Start with n sets; for each edge, union the two endpoints; number of sets = number of components.
Cycle detection: If for edge (u,v) we have find(u)==find(v), we have a cycle.
Dynamic connectivity: Answer "are u and v connected?" as edges are added (or removed, with more advanced structures).

Edge Cases

Single element: find(x) returns x; union(x,x) does nothing (already same set).
Union same set twice: union(x,y) after they're already in the same set is a no-op; return False if you use a boolean to indicate "did we merge?".

Common Mistakes

Common Mistake: Forgetting path compression in find. Without it, find costs O(log n) with union by rank, and can degrade to O(n) on a long chain if union by rank is missing as well. Always set parent[x] = root when recursing.

Common Mistake: Merging in the wrong direction (e.g. always making py point to px without comparing rank). Use union by rank or size so the smaller tree is attached under the larger.

Pattern Recognition

Use Union-Find when you see:

"Same group," "connected," "merge sets," "detect cycle" as edges or relations are added.
Kruskal's algorithm, connected components, or "dynamic connectivity" style problems.

Interview Insight

Interview Insight: Say: "I'll use a Disjoint Set with path compression and union by rank. find(x) returns the root and compresses the path; union(x,y) merges the roots by rank. Amortized O(α(n)) per operation. Good for cycle detection and Kruskal."

Summary

Union-Find maintains disjoint sets with find (representative) and union (merge).
Path compression + union by rank give amortized O(α(n)) per operation and O(n) space.
Essential for Kruskal, connected components, and cycle detection in graphs.

13.11 Tarjan's Algorithm (SCC & Bridges)

Introduction

Tarjan's algorithm refers to a family of DFS-based methods that use discovery time and low-link (or "low") values to find important structures in graphs. Two central applications are: (1) Strongly Connected Components (SCC) in directed graphs, which are maximal sets of vertices where every pair can reach each other; and (2) Bridges in undirected graphs, which are edges whose removal increases the number of connected components. Each is found in O(V + E) time with a single DFS.

Part 1: Strongly Connected Components (SCC)

Definition

In a directed graph, an SCC is a maximal set of vertices such that for every pair u, v in the set, there is a path from u to v and from v to u. Each vertex belongs to exactly one SCC. Tarjan finds all SCCs in one DFS using a stack and "low-link" values.

Idea (Tarjan for SCC)

During DFS, assign each vertex a discovery time (disc) and compute a low-link (low): the smallest disc reachable from the current vertex by following tree edges and at most one back/cross edge.
Maintain a stack of vertices that might be in the current SCC. When we finish a vertex u and low[u] == disc[u], u is the "root" of an SCC; pop from the stack until u is popped — those vertices form one SCC.

Python: Tarjan SCC

def tarjan_scc(adj):
    n = len(adj)
    disc = [-1] * n
    low = [-1] * n
    on_stack = [False] * n
    stack = []
    time = [0]
    sccs = []

    def dfs(u):
        disc[u] = low[u] = time[0]
        time[0] += 1
        stack.append(u)
        on_stack[u] = True
        for v in adj[u]:
            if disc[v] == -1:
                dfs(v)
                low[u] = min(low[u], low[v])
            elif on_stack[v]:
                low[u] = min(low[u], disc[v])
        if low[u] == disc[u]:
            scc = []
            while True:
                v = stack.pop()
                on_stack[v] = False
                scc.append(v)
                if v == u:
                    break
            sccs.append(scc)

    for u in range(n):
        if disc[u] == -1:
            dfs(u)
    return sccs

Example: SCC

Example: Directed graph: 0→1, 1→2, 2→0 (cycle); 2→3, 3→4, 4→3. SCCs: {0,1,2} and {3,4}.

adj = [[1], [2], [0, 3], [4], [3]]
print(tarjan_scc(adj))  # [[4, 3], [2, 1, 0]] (the SCC with no outgoing edges is completed first)

Part 2: Bridges (Undirected)

Definition

A bridge is an edge whose removal increases the number of connected components. In an undirected graph, a DFS tree edge (u, v), with u the parent of v, is a bridge if and only if no back edge from the subtree of v reaches u or an ancestor of u. Equivalently, (u, v) is a bridge if and only if low[v] > disc[u]. Non-tree (back) edges are never bridges because they lie on a cycle. Unlike articulation points, bridges need no special handling for the DFS root.

Idea (Tarjan for Bridges)

Run DFS from each unvisited node. For each vertex v, compute low[v] = minimum of disc[v] and disc[w] for all w reachable from v by tree edges followed by at most one back edge. For tree edge (parent, v): if low[v] > disc[parent], then (parent, v) is a bridge.

Python: Bridges

def find_bridges(n, edges):
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    disc = [-1] * n
    low = [-1] * n
    time = [0]
    bridges = []

    def dfs(u, parent):
        disc[u] = low[u] = time[0]
        time[0] += 1
        for v in adj[u]:
            if disc[v] == -1:
                dfs(v, u)
                low[u] = min(low[u], low[v])
                if low[v] > disc[u]:
                    bridges.append((u, v))
            elif v != parent:  # skip the tree edge back to the parent (assumes no parallel edges)
                low[u] = min(low[u], disc[v])

    for u in range(n):
        if disc[u] == -1:
            dfs(u, -1)
    return bridges

Example: Bridges

Example: Undirected: 0-1, 1-2, 2-0 (cycle), 2-3, 3-4. Edges 2-3 and 3-4 are bridges (removing either disconnects the graph).

edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4)]
print(find_bridges(5, edges))  # [(3, 4), (2, 3)] (deepest bridge is reported first)
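As a sanity check on the definition, a brute-force version can remove each edge in turn and recount the components. The sketch below is only meant for testing small graphs; the name bridges_brute_force is an assumption for this illustration, and it runs in O(E · (V + E)) rather than O(V + E).

```python
def bridges_brute_force(n, edges):
    """Return the edges whose removal increases the number of components."""
    def count_components(edge_list):
        adj = [[] for _ in range(n)]
        for u, v in edge_list:
            adj[u].append(v)
            adj[v].append(u)
        seen = [False] * n
        count = 0
        for s in range(n):
            if seen[s]:
                continue
            count += 1
            seen[s] = True
            stack = [s]
            while stack:  # iterative traversal of one component
                u = stack.pop()
                for v in adj[u]:
                    if not seen[v]:
                        seen[v] = True
                        stack.append(v)
        return count

    base = count_components(edges)
    result = []
    for i, e in enumerate(edges):
        remaining = edges[:i] + edges[i + 1:]  # drop exactly one edge
        if count_components(remaining) > base:
            result.append(e)
    return result

edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4)]
print(bridges_brute_force(5, edges))  # [(2, 3), (3, 4)]
```

The brute-force result follows the input order of the edges, so compare it with the DFS result as a set of unordered pairs rather than as a list.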

Time and Space Complexity

SCC: O(V + E) — one DFS; each vertex and edge processed once.
Bridges: O(V + E) — one DFS with parent check.
Space: O(V) for disc, low, stack/recursion.

Common Mistakes

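Common Mistake (Bridges): Mixing up the two low updates. After returning from a tree edge to a child v, use low[u] = min(low[u], low[v]). For a back edge to an already visited vertex w (other than the parent), use low[u] = min(low[u], disc[w]), which matches the "at most one back edge" definition. The bridge test itself is the strict inequality low[v] > disc[u].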

Common Mistake (SCC): Updating low[u] from a neighbor that is already visited but no longer on the stack. Such a neighbor belongs to an SCC that has already been completed, so it must not affect low[u]. Check on_stack[v] before using disc[v] to update low[u].

Pattern Recognition

SCC: "Strongly connected," "mutually reachable," condensation graph, 2-SAT.
Bridges: "Critical connections," "edges whose removal disconnects," "articulation points" (related: vertex removal).

Interview Insight

Interview Insight: For SCC: "I'll use Tarjan's algorithm: DFS with disc and low, and a stack. When low[u]==disc[u], pop until u to get one SCC. O(V+E)." For bridges: "DFS with disc and low; tree edge (u,v) is a bridge if low[v] > disc[u]. O(V+E)."

Summary

Tarjan SCC: One DFS with disc, low, stack; when low[u]==disc[u], pop to form an SCC.
Tarjan Bridges: DFS with disc and low; (parent, v) is a bridge if low[v] > disc[parent].
Both run in O(V + E). Use for strongly connected components and critical edges.

13.12 Bridges & Articulation Points

Introduction

In an undirected graph, two key concepts describe "critical" structure: bridges (edges whose removal increases the number of connected components) and articulation points, or cut vertices (vertices whose removal increases the number of connected components). Both can be found in O(V + E) with one DFS using discovery time and low-link values, similar to Tarjan's approach in the previous topic.

Bridges (Recap)

A bridge is an edge (u, v) such that removing it disconnects the graph (increases the number of connected components). In a DFS tree, a tree edge (parent, v) is a bridge if and only if low[v] > disc[parent] — meaning no back edge from the subtree of v reaches parent or above. See topic 13.11 for full Tarjan bridges implementation.

Articulation Points (Cut Vertices)

Definition

An articulation point is a vertex whose removal (together with its incident edges) increases the number of connected components. So the graph becomes "more disconnected" if we delete that vertex. Finding all articulation points helps identify single points of failure in networks.
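The definition can be checked directly on small graphs by deleting each vertex and recounting components. This is a hedged brute-force sketch (the function name is an assumption for illustration) that is useful for validating a faster implementation, not a replacement for it.

```python
def articulation_points_brute_force(n, edges):
    """Return vertices whose removal increases the number of components."""
    def count_components(removed):
        # Count components among all vertices except `removed` (None keeps all).
        adj = [[] for _ in range(n)]
        for u, v in edges:
            if u != removed and v != removed:
                adj[u].append(v)
                adj[v].append(u)
        seen = [False] * n
        count = 0
        for s in range(n):
            if s == removed or seen[s]:
                continue
            count += 1
            seen[s] = True
            stack = [s]
            while stack:
                u = stack.pop()
                for v in adj[u]:
                    if not seen[v]:
                        seen[v] = True
                        stack.append(v)
        return count

    base = count_components(None)
    return [u for u in range(n) if count_components(u) > base]

edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4)]
print(articulation_points_brute_force(5, edges))  # [2, 3]
```

Removing a vertex that is not an articulation point leaves the count unchanged, or lowers it by one if the vertex was isolated, so the strict comparison with the original count is sufficient.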

Conditions (DFS Tree)

Root of the DFS tree: The root is an articulation point if and only if it has at least two children in the DFS tree. (Removing it disconnects those subtrees.)
Non-root vertex u: u is an articulation point if there exists a child v of u in the DFS tree such that low[v] ≥ disc[u]. That means no back edge from v's subtree reaches above u, so removing u would disconnect the subtree of v from the rest.

Mental Model

During DFS, for each vertex u we compute low[u] (earliest reachable discovery time). For a non-root u, if some child v has low[v] ≥ disc[u], then v's subtree has no back edge to u's ancestors — so u "splits" the graph. For the root, we simply count its children in the DFS tree.

Single DFS: Bridges and Articulation Points Together

One DFS can compute disc, low, and then determine both bridges and articulation points. We need to count children of the root separately to classify the root as an articulation point.

Python Implementation

def bridges_and_articulation_points(n, edges):
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    disc = [-1] * n
    low = [-1] * n
    time = [0]
    bridges = []
    is_articulation = [False] * n

    def dfs(u, parent):
        disc[u] = low[u] = time[0]
        time[0] += 1
        children = 0
        for v in adj[u]:
            if disc[v] == -1:
                children += 1
                dfs(v, u)
                low[u] = min(low[u], low[v])
                if parent != -1 and low[v] >= disc[u]:
                    is_articulation[u] = True
                if low[v] > disc[u]:
                    bridges.append((u, v))
            elif v != parent:
                low[u] = min(low[u], disc[v])
        if parent == -1 and children >= 2:
            is_articulation[u] = True

    for u in range(n):
        if disc[u] == -1:
            dfs(u, -1)

    articulation_points = [u for u in range(n) if is_articulation[u]]
    return bridges, articulation_points

Examples Section

Example 1: Bridges and Articulation Points

Example: Graph: 0-1, 1-2, 2-0 (triangle), 2-3, 3-4. Vertex 2 is an articulation point (removing it disconnects 0,1 from 3,4). Vertex 3 is an articulation point (disconnects 4). Edges (2,3) and (3,4) are bridges.

edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4)]
bridges, ap = bridges_and_articulation_points(5, edges)
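print("Bridges:", bridges)              # [(3, 4), (2, 3)]
print("Articulation points:", ap)      # [2, 3]

Output: Bridges: [(3, 4), (2, 3)], Articulation points: [2, 3]. The deeper bridge (3, 4) is appended first because it is detected when the DFS returns from vertex 4.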

Example 2: Root as Articulation Point

Example: Star graph: center 0 connected to 1, 2, 3. If we run DFS from 0, the root 0 has three children; removing 0 disconnects the graph. So 0 is an articulation point.

edges = [(0, 1), (0, 2), (0, 3)]
_, ap = bridges_and_articulation_points(4, edges)
print(ap)  # [0]

Example 3: No Articulation Point

Example: Cycle 0-1-2-3-0: no vertex is an articulation point (removing any one leaves a path). No bridges either.

edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
bridges, ap = bridges_and_articulation_points(4, edges)
print("Bridges:", bridges, "AP:", ap)  # Bridges: [] AP: []

Time and Space Complexity

Time: O(V + E) — one DFS.
Space: O(V) for disc, low, and the articulation flag.

Edge Cases

Disconnected graph: Run DFS from each unvisited vertex; each DFS root is checked for ≥2 children. Vertices in other components are not reachable and are correctly not affected.
Single vertex or two vertices: A single vertex has no bridges and no articulation points. For two vertices joined by one edge, that edge is a bridge, but neither vertex is an articulation point (the root has only one child, and the other vertex is a leaf).

Common Mistakes

Common Mistake: Marking the root as an articulation point when it has only one child. Only roots with two or more children are articulation points.

Common Mistake: Using low[v] > disc[u] for articulation points. The correct condition for a non-root u is low[v] ≥ disc[u] (≥, not >). Equality occurs when v's subtree has a back edge to u itself but to nothing above u; removing u still cuts that subtree off from the rest of the graph.

Pattern Recognition

Use this when you see:

"Critical nodes," "vertices whose removal disconnects," "single point of failure," "articulation points."
"Critical edges," "bridges," "edges whose removal disconnects" (see also 13.11).

Interview Insight

Interview Insight: Say: "I'll use one DFS with disc and low. For bridges: tree edge (u,v) is a bridge if low[v] > disc[u]. For articulation points: root is AP if it has ≥2 children; non-root u is AP if some child v has low[v] ≥ disc[u]. O(V+E)."

Summary

Bridge: edge (u,v) with low[v] > disc[u] (in DFS tree with u as parent).
Articulation point: root with ≥2 children, or non-root u with a child v such that low[v] ≥ disc[u].
One DFS finds both in O(V + E).

13.13 Bipartite Graph

Introduction

A graph is bipartite if its vertices can be partitioned into two sets (say A and B) such that no edge has both endpoints in the same set — every edge goes between A and B. Equivalently, the graph is 2-colorable: we can assign two "colors" to vertices so that no two adjacent vertices share the same color. Bipartite graphs are exactly those that contain no odd-length cycle. Checking whether a graph is bipartite is done by BFS or DFS coloring in O(V + E).

Formal Definition

Partition: V = A ∪ B, A ∩ B = ∅, and every edge (u, v) has u in A and v in B (or vice versa).
2-colorable: There exists a function color : V → {0, 1} such that for every edge (u, v), color(u) ≠ color(v).
No odd cycle: The graph has no cycle of odd length. (If it had an odd cycle, 2-coloring would be impossible.)

Real-World Analogy

Imagine people and jobs: edges mean "person can do job." You want to split people and jobs into two groups so that every "can do" link is between the two groups (people on one side, jobs on the other). That's a bipartite structure. Scheduling conflicts (e.g. events that can't share a room) also model as edges; 2-coloring means assigning two time slots so that conflicting events get different slots — possible only if the conflict graph is bipartite.

Algorithm: BFS/DFS 2-Coloring

Start from an arbitrary unvisited vertex; assign it color 0. Then traverse the graph (BFS or DFS). For each edge (u, v), if v is unvisited, assign v the opposite color of u (1 − color[u]). If v is already visited, check that color[v] ≠ color[u]; if color[v] == color[u], we have found an edge inside the same "side" — the graph is not bipartite. If we finish without conflict, the graph is bipartite.

Mental Model

Think of "layers": from a start vertex, layer 0 gets color 0, layer 1 gets color 1, layer 2 gets color 0, and so on. If any edge connects two vertices of the same layer (same color), that edge creates an odd cycle — not bipartite.

Python Implementation (BFS)

from collections import deque

def is_bipartite(n, edges):
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    color = [-1] * n  # -1 = unvisited; 0 and 1 = two colors

    for start in range(n):
        if color[start] != -1:
            continue
        color[start] = 0
        q = deque([start])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if color[v] == -1:
                    color[v] = 1 - color[u]
                    q.append(v)
                elif color[v] == color[u]:
                    return False
    return True
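When the two sides themselves are needed (not just a yes/no answer), the same coloring can return the partition. The sketch below is a stack-based variant under the same input assumptions as is_bipartite; the name bipartition and the choice to return None on failure are illustrative.

```python
def bipartition(n, edges):
    """Return (side_a, side_b) if the graph is bipartite, otherwise None."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    color = [-1] * n  # -1 = unvisited
    for start in range(n):
        if color[start] != -1:
            continue
        color[start] = 0
        stack = [start]  # explicit stack avoids Python's recursion limit
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if color[v] == -1:
                    color[v] = 1 - color[u]
                    stack.append(v)
                elif color[v] == color[u]:
                    return None  # edge inside one side: odd cycle
    side_a = [u for u in range(n) if color[u] == 0]
    side_b = [u for u in range(n) if color[u] == 1]
    return side_a, side_b

print(bipartition(4, [(0, 1), (1, 2), (2, 3), (3, 0)]))  # ([0, 2], [1, 3])
print(bipartition(3, [(0, 1), (1, 2), (2, 0)]))          # None
```

Within a connected component the coloring is forced once the start vertex is colored, so the traversal order (queue or stack) does not change the answer.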

Examples Section

Example 1: Bipartite Graph

Example: Graph: 0-1, 1-2, 2-3, 3-0 (square). This is a cycle of length 4 (even). Color 0,1,2,3 as 0,1,0,1 — no edge inside same color. So the graph is bipartite.

edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(is_bipartite(4, edges))  # True

Example 2: Not Bipartite (Odd Cycle)

Example: Triangle: 0-1, 1-2, 2-0. This is an odd cycle. Whatever color we give 0 and 1, the third vertex 2 is adjacent to both — we cannot 2-color. So not bipartite.

edges = [(0, 1), (1, 2), (2, 0)]
print(is_bipartite(3, edges))  # False

Example 3: Disconnected Graph

Example: Two components: component 1 is a square (bipartite); component 2 is a triangle (not bipartite). The whole graph is not bipartite because at least one component fails.

# Square 0-1-2-3-0 and triangle 4-5-6-4
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (4, 5), (5, 6), (6, 4)]
print(is_bipartite(7, edges))  # False

Time and Space Complexity

Time: O(V + E) — each vertex and edge is processed once (we may start BFS/DFS from each unvisited vertex).
Space: O(V) for color array and the queue (or recursion stack).

Edge Cases

Single vertex or empty graph: Trivially bipartite (one or zero colors needed).
Disconnected: Must check every component; if any component is not bipartite, the whole graph is not.
No edges: Every vertex can get the same color in principle, but we can still assign alternating colors per component; the graph is bipartite.

Common Mistakes

Common Mistake: Only running BFS/DFS from one vertex. In a disconnected graph, other components might contain an odd cycle; you must iterate over all unvisited vertices and run the coloring from each.

Common Mistake: Forgetting to check the "already visited" case: when you see an edge (u, v) and v is already colored, you must verify color[v] ≠ color[u]. If they are equal, return false immediately.

Pattern Recognition

Think "bipartite" when you see:

"Two groups," "no two adjacent same type," "2-colorable," "schedule with two slots so no conflict."
Problems that use bipartite matching (maximum matching in bipartite graphs) or "can we split into two sets with no internal edges?"

Interview Insight

Interview Insight: Say: "A graph is bipartite iff it has no odd cycle. I'll run BFS (or DFS) and 2-color: assign the start 0, then each neighbor the opposite color. If I ever find an edge between two vertices with the same color, it's not bipartite. O(V+E), one pass per component."

Summary

Bipartite = vertices can be split into two sets with no edge inside a set = 2-colorable = no odd cycle.
Algorithm: BFS/DFS with colors 0 and 1; conflict (same color on both ends of an edge) ⇒ not bipartite.
Time O(V + E), space O(V). Check every component.

13.14 Eulerian Path

Introduction

An Eulerian path is a path in a graph that visits every edge exactly once. An Eulerian circuit (or Eulerian cycle) is an Eulerian path that starts and ends at the same vertex. The famous "Seven Bridges of Königsberg" problem asked whether such a walk exists. Euler showed that it depends on the degrees of the vertices. We will see the conditions for undirected and directed graphs and a simple algorithm (Hierholzer's) to build an Eulerian circuit or path in O(E).

Definitions

Eulerian path: Walk that uses every edge exactly once (vertices may repeat).
Eulerian circuit: Eulerian path that is closed (start = end).

Conditions: Undirected Graph

Eulerian circuit exists iff the graph is connected (except isolated vertices) and every vertex has even degree.
Eulerian path exists iff the graph is connected (except isolated vertices) and exactly 0 or 2 vertices have odd degree. With 0 odd-degree vertices the path is a circuit; with 2, the path is open and must start at one odd-degree vertex and end at the other.
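The undirected conditions translate into a short check: count odd degrees and verify that all vertices with edges are in one component. This is a sketch with an assumed function name; treating a graph with no edges as a trivial circuit is a convention chosen for this example.

```python
def euler_type_undirected(n, edges):
    """Return 'circuit', 'path', or 'none' for an undirected (multi)graph."""
    deg = [0] * n
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
        deg[u] += 1
        deg[v] += 1  # a self-loop (u == v) contributes 2 to the degree
    start = next((u for u in range(n) if deg[u] > 0), None)
    if start is None:
        return 'circuit'  # assumption: no edges counts as a trivial circuit
    # Every vertex that has an edge must be reachable from `start`.
    seen = [False] * n
    seen[start] = True
    stack = [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if not seen[v]:
                seen[v] = True
                stack.append(v)
    if any(deg[u] > 0 and not seen[u] for u in range(n)):
        return 'none'
    odd = sum(d % 2 for d in deg)
    if odd == 0:
        return 'circuit'
    if odd == 2:
        return 'path'
    return 'none'

print(euler_type_undirected(3, [(0, 1), (1, 2), (2, 0)]))  # circuit
print(euler_type_undirected(4, [(0, 1), (1, 2), (2, 3)]))  # path
print(euler_type_undirected(4, [(0, 1), (2, 3)]))          # none
```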

Conditions: Directed Graph

Eulerian circuit exists iff every vertex has in-degree = out-degree and all vertices with non-zero degree lie in a single strongly connected component. (When in-degree = out-degree everywhere, being connected after ignoring direction is enough, because it then implies strong connectivity.)
Eulerian path exists iff: at most one vertex has out_degree − in_degree = 1 (start), at most one has in_degree − out_degree = 1 (end), all others have in_degree = out_degree, and the underlying graph (ignoring direction and isolated vertices) is connected.

Hierholzer's Algorithm

To build an Eulerian circuit, start from any vertex that has an edge (for an open path, start from the designated "start" vertex) and walk along unused edges, removing each edge as it is used. When the current vertex has no unused edges left, append it to the output list and backtrack to the previous vertex. The output list is built in reverse order, so reverse it at the end to get the Eulerian circuit (or path). Time O(E).

Mental Model

Imagine tracing a pencil along edges without lifting it, using each edge once. The first time you get stuck (no unused edge at the current vertex), you must be at the end vertex (for an open path) or back at the start vertex (for a circuit). Hierholzer simulates this by going as far as possible, then backtracking and recording vertices that have no way out, which gives the reverse of the Eulerian order.

Python Implementation (Undirected Eulerian Circuit)

from collections import defaultdict

def eulerian_circuit_undirected(n, edges):
    """
    n: vertices 0..n-1. edges: list of (u, v).
    Returns list of vertices in order of Eulerian circuit, or [] if none exists.
    """
    deg = [0] * n
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
        deg[u] += 1
        deg[v] += 1
    if any(d % 2 != 0 for d in deg):
        return []
    # Start from a vertex with at least one edge
    start = next((i for i in range(n) if deg[i] > 0), None)
    if start is None:
        return []  # no edges, so there is nothing to traverse
    path = []
    stack = [start]
    while stack:
        u = stack[-1]
        if adj[u]:
            v = adj[u].pop()
            adj[v].remove(u)  # remove reverse edge
            stack.append(v)
        else:
            path.append(stack.pop())
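    path.reverse()
    # If some edge was never reached, the edges do not form one connected component
    if len(path) != len(edges) + 1:
        return []
    return path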

Note: Using a multiset or keeping an index per vertex for adjacency list avoids slow remove. For simplicity we show the idea; in practice use a list of pairs and a "next" pointer per vertex, or a multiset.
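In the directed case the slow remove disappears, because there is no reverse entry to delete: popping from the end of adj[u] is O(1). The following sketch combines the directed degree conditions with the same stack-based construction. The function name and the convention of returning [] for "no Eulerian path" (including the no-edges case) are assumptions for this illustration.

```python
def eulerian_path_directed(n, edges):
    """Return a vertex sequence using every directed edge once, or []."""
    if not edges:
        return []
    adj = [[] for _ in range(n)]
    diff = [0] * n  # out-degree minus in-degree
    for u, v in edges:
        adj[u].append(v)
        diff[u] += 1
        diff[v] -= 1
    starts = [u for u in range(n) if diff[u] == 1]
    ends = [u for u in range(n) if diff[u] == -1]
    if any(abs(d) > 1 for d in diff) or len(starts) > 1 or len(ends) > 1:
        return []
    # Open path: start at the vertex with one extra outgoing edge.
    start = starts[0] if starts else edges[0][0]
    path = []
    stack = [start]
    while stack:
        u = stack[-1]
        if adj[u]:
            stack.append(adj[u].pop())  # use (and remove) one outgoing edge
        else:
            path.append(stack.pop())    # stuck: record vertex and backtrack
    path.reverse()
    # Unused edges mean the edges were not all reachable from `start`.
    return path if len(path) == len(edges) + 1 else []

edges = [(0, 1), (1, 2), (2, 0), (0, 3)]
print(eulerian_path_directed(4, edges))  # [0, 1, 2, 0, 3]
```

Checking the final length doubles as the connectivity test: if some edge is unreachable from the start vertex, the path comes out shorter than len(edges) + 1.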

Finding Start Vertex for Eulerian Path

For an Eulerian path (not circuit), pick the start vertex as follows: if there are two vertices with odd degree, start at one of them (the path will end at the other). If all degrees are even, start at any vertex with non-zero degree. Then run the same "remove edge and recurse, push on backtrack" logic; reverse the resulting list to get the path order. For a clean O(E) implementation, use an adjacency list with a "next index" per vertex so you don't scan already-used edges.
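The start-vertex rule and the O(E) edge handling described above can be sketched together. Here each edge gets an id and a used flag, and stale adjacency entries are discarded lazily from the end of each list, which plays the role of the "next index". The function name and the [] return for "impossible" (including no edges) are assumptions for this example.

```python
def eulerian_trail_undirected(n, edges):
    """Return a vertex sequence using every undirected edge once, or []."""
    if not edges:
        return []
    adj = [[] for _ in range(n)]  # adj[u] holds (neighbor, edge_id) pairs
    for i, (u, v) in enumerate(edges):
        adj[u].append((v, i))
        adj[v].append((u, i))
    odd = [u for u in range(n) if len(adj[u]) % 2 == 1]
    if len(odd) not in (0, 2):
        return []
    start = odd[0] if odd else edges[0][0]
    used = [False] * len(edges)
    path = []
    stack = [start]
    while stack:
        u = stack[-1]
        # Drop entries whose edge was already used from the other endpoint.
        while adj[u] and used[adj[u][-1][1]]:
            adj[u].pop()
        if adj[u]:
            v, i = adj[u].pop()
            used[i] = True
            stack.append(v)
        else:
            path.append(stack.pop())
    path.reverse()
    # A short path means some edges were in a different component.
    return path if len(path) == len(edges) + 1 else []

print(eulerian_trail_undirected(4, [(0, 1), (1, 2), (2, 3)]))  # [0, 1, 2, 3]
```

Every adjacency entry is popped at most once, so the total work is proportional to the number of edges.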

Examples Section

Example 1: Eulerian Circuit (All Even Degrees)

Example: Graph: triangle 0-1, 1-2, 2-0. Every vertex has degree 2 (even). An Eulerian circuit is 0 → 1 → 2 → 0 (or any cyclic permutation).

Degrees: 0:2, 1:2, 2:2. Condition satisfied; circuit exists.

Example 2: Eulerian Path (Two Odd Degrees)

Example: Graph: 0-1, 1-2, 2-3 (a path). Vertices 0 and 3 have degree 1 (odd); 1 and 2 have degree 2. Eulerian path must start at 0 and end at 3 (or vice versa): 0 → 1 → 2 → 3.

Two odd-degree vertices ⇒ path exists; start at one, end at the other.

Example 3: No Eulerian Path (Four Odd Degrees)

Example: Graph: two separate edges (0-1) and (2-3). Vertices 0,1,2,3 each have degree 1 (odd). We have four odd-degree vertices, so no Eulerian path or circuit exists.

Time and Space Complexity

Time: O(E) to check degrees and O(E) for Hierholzer (each edge used once). Total O(E), provided the data structure avoids an O(degree) scan for each edge removal.
Space: O(V + E) for graph and path.

Edge Cases

Disconnected graph: If more than one component has edges, no Eulerian path/circuit uses all edges. Check connectivity (or degree conditions only for the component that has edges).
Zero edges: Single vertex is a trivial circuit.
Multiple edges / self-loops: Conditions and algorithm extend; count multiplicity in degrees.

Common Mistakes

Common Mistake: Forgetting that an Eulerian path needs exactly 0 or 2 odd-degree vertices (with 0, the path is a circuit). With 4 or more odd-degree vertices, no Eulerian path exists.

Common Mistake: In Hierholzer, removing an edge from an adjacency list in O(degree) time per edge can make the whole algorithm O(E²). Use a "next index" per vertex or store edges with a used flag for O(E).

Pattern Recognition

Think "Eulerian" when you see:

"Use every edge exactly once," "trace without lifting the pencil," "postman route."
Problems that reduce to finding a closed or open walk covering all edges.

Interview Insight

Interview Insight: Say: "Eulerian circuit exists iff connected and all even degrees; Eulerian path iff 0 or 2 odd degrees. I'll check degrees first, then use Hierholzer: DFS and when stuck, push vertex and backtrack; reverse the sequence for the circuit. O(E) time."

Summary

Eulerian circuit: All vertices even degree (undirected); in = out (directed).
Eulerian path: Exactly 0 or 2 odd-degree vertices (undirected); one start and one end vertex (directed).
Hierholzer: DFS, remove edges, push vertex when stuck; reverse to get the path. O(E).

13.15 Hamiltonian Path

Introduction

A Hamiltonian path is a path in a graph that visits every vertex exactly once. A Hamiltonian cycle (or circuit) is a Hamiltonian path that starts and ends at the same vertex. Unlike the Eulerian problem (which has a simple degree-based characterization and O(E) algorithm), determining whether a graph has a Hamiltonian path or cycle is NP-complete in general. We typically use backtracking or DP with bitmask for small graphs (small V).

Hamiltonian vs Eulerian

Eulerian: Visit every edge exactly once. Polynomial-time check and construction (degree conditions + Hierholzer).
Hamiltonian: Visit every vertex exactly once. NP-complete; no simple necessary-and-sufficient condition for general graphs.

Definitions

Hamiltonian path: Permutation (v₁, v₂, …, vₙ) of the vertices such that every consecutive pair is adjacent (there is an edge between vᵢ and vᵢ₊₁).
Hamiltonian cycle: Hamiltonian path with an edge from the last vertex back to the first.

Why It's Hard

There is no known condition like "all degrees even" that characterizes Hamiltonian graphs. Some sufficient conditions exist (e.g. Dirac's theorem: if the graph has n ≥ 3 vertices and every vertex has degree ≥ n/2, then the graph has a Hamiltonian cycle), but they are not necessary. In practice we use backtracking or DP for small n.
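To make "sufficient but not necessary" concrete, the sketch below tests Dirac's degree condition on two small graphs. The helper name satisfies_dirac is an assumption for illustration; it only checks the hypothesis of the theorem and does not search for a cycle.

```python
def satisfies_dirac(adj, n):
    """True if n >= 3 and every vertex has at least n/2 distinct neighbors."""
    if n < 3:
        return False
    # Ignore self-loops and duplicate entries when counting the degree.
    return all(2 * len(set(adj[u]) - {u}) >= n for u in range(n))

k4 = [[1, 2, 3], [0, 2, 3], [0, 1, 3], [0, 1, 2]]  # complete graph on 4 vertices
c5 = [[1, 4], [0, 2], [1, 3], [2, 4], [3, 0]]      # cycle on 5 vertices
print(satisfies_dirac(k4, 4))  # True: a Hamiltonian cycle is guaranteed
print(satisfies_dirac(c5, 5))  # False, yet 0-1-2-3-4-0 is a Hamiltonian cycle
```

The 5-cycle fails the degree test (degree 2 < 5/2) even though the graph itself is a Hamiltonian cycle, which is why a failed check says nothing either way.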

Approach 1: Backtracking (DFS)

Try to build a path vertex by vertex. From the current vertex u, try each unvisited neighbor v; recurse. If we ever have visited all n vertices, we have a Hamiltonian path. If we need a cycle, also check that the last vertex is adjacent to the start. Backtrack when we hit a dead end. Time in the worst case is O(n!) for n vertices; with pruning it can be faster for sparse graphs.

Approach 2: DP with Bitmask

For small n (e.g. n ≤ 20), we can use dynamic programming: state is (mask, last) where mask is a bitmask of visited vertices and last is the last vertex in the path. dp[mask][last] = true if there is a path that visits exactly the vertices in mask and ends at last. Transition: extend by a neighbor v not in mask. Base: mask has one bit set (start vertex). Answer for a path: any true state with mask = (1<<n)−1. For a cycle, the state does not record where the path started, so fix the start (say vertex 0) by using only dp[1][0] as the base case, then require an edge from last back to that start. Time O(n² · 2ⁿ), space O(n · 2ⁿ).

Python Implementation: Backtracking

def hamiltonian_path_exists(adj, n):
    """Returns True if the graph has a Hamiltonian path. adj: adjacency list."""
    def dfs(path, visited):
        if len(path) == n:
            return True
        u = path[-1]
        for v in adj[u]:
            if not visited[v]:
                visited[v] = True
                path.append(v)
                if dfs(path, visited):
                    return True
                path.pop()
                visited[v] = False
        return False

    for start in range(n):
        visited = [False] * n
        visited[start] = True
        if dfs([start], visited):
            return True
    return False

Python Implementation: DP Bitmask (Path)

def hamiltonian_path_dp(adj, n):
    """Returns True if graph has Hamiltonian path. adj: list of sets or lists (neighbors)."""
    # dp[mask][last] = can we visit all in mask and end at last?
    dp = [[False] * n for _ in range(1 << n)]
    for i in range(n):
        dp[1 << i][i] = True

    for mask in range(1 << n):
        for last in range(n):
            if not dp[mask][last]:
                continue
            for v in adj[last]:
                if mask & (1 << v):
                    continue
                new_mask = mask | (1 << v)
                dp[new_mask][v] = True

    full = (1 << n) - 1
    return any(dp[full][v] for v in range(n))

Examples Section

Example 1: Graph With Hamiltonian Path

Example: Path graph 0-1-2-3: a Hamiltonian path is 0→1→2→3 (or the reverse). Every vertex appears exactly once.

adj = [[1], [0, 2], [1, 3], [2]]
print(hamiltonian_path_exists(adj, 4))  # True

Example 2: Graph With Hamiltonian Cycle

Example: Complete graph on 4 vertices (every pair connected). Any permutation is a path; and we can start at 0, visit 1,2,3, and return to 0 — so a Hamiltonian cycle exists.

For cycle: after finding a path that contains all n vertices, check if the last vertex is adjacent to the start.
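A hedged sketch of the cycle check using the bitmask DP: because a cycle passes through every vertex, the start can be fixed at vertex 0 without loss of generality. The function name is illustrative, the graph is assumed undirected with adj as lists or sets of neighbors, and requiring n ≥ 3 is a convention chosen for this example.

```python
def hamiltonian_cycle_exists(adj, n):
    """True if the undirected graph has a Hamiltonian cycle (start fixed at 0)."""
    if n < 3:
        return False  # assumption: a simple cycle needs at least 3 vertices
    # dp[mask][last]: some path starts at 0, visits exactly mask, ends at last.
    dp = [[False] * n for _ in range(1 << n)]
    dp[1][0] = True
    for mask in range(1, 1 << n, 2):  # odd masks are those that contain vertex 0
        for last in range(n):
            if not dp[mask][last]:
                continue
            for v in adj[last]:
                if not mask & (1 << v):
                    dp[mask | (1 << v)][v] = True
    full = (1 << n) - 1
    # Close the cycle: the last vertex must be adjacent to the start.
    return any(dp[full][last] and 0 in adj[last] for last in range(1, n))

k4 = [[1, 2, 3], [0, 2, 3], [0, 1, 3], [0, 1, 2]]
path4 = [[1], [0, 2], [1, 3], [2]]
print(hamiltonian_cycle_exists(k4, 4))     # True
print(hamiltonian_cycle_exists(path4, 4))  # False
```

Fixing the start also halves the number of masks that need to be processed compared with the path version.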

Example 3: No Hamiltonian Path

Example: Graph: 0-1-2 (path of 3) and isolated vertex 3. We have 4 vertices but no path can visit all 4 (vertex 3 is disconnected). So no Hamiltonian path.

adj = [[1], [0, 2], [1], []]  # 3 is isolated
print(hamiltonian_path_exists(adj, 4))  # False

Time and Space Complexity

Backtracking: Worst case O(n!) (try all orderings); with pruning, often much better.
DP bitmask: O(n² · 2ⁿ) time, O(n · 2ⁿ) space. Practical for n up to about 20.

Edge Cases

Single vertex: Trivially has a Hamiltonian path; whether it also counts as a Hamiltonian cycle depends on the convention used.
Disconnected graph: No Hamiltonian path that visits all vertices (cannot reach other components).
Two vertices with one edge: The edge is a Hamiltonian path. There is no Hamiltonian cycle in a simple graph, because returning to the start would reuse the same edge.

Common Mistakes

Common Mistake: Confusing Hamiltonian (visit every vertex once) with Eulerian (visit every edge once). They are different problems with different complexity.

Common Mistake: Using backtracking or naive enumeration for large n. Bitmask DP extends the feasible range to roughly n ≤ 20, but its 2ⁿ factor still grows exponentially; beyond that, exact solutions are intractable in practice for general graphs.

Pattern Recognition

Think "Hamiltonian" when you see:

"Visit every city exactly once," "TSP (Traveling Salesman)" — TSP asks for a minimum-weight Hamiltonian cycle.
"Order all nodes with constraints," "permutation of vertices with adjacency constraints."

Interview Insight

Interview Insight: Say: "Hamiltonian path visits every vertex exactly once; it's NP-complete. For small n I can use backtracking or DP with bitmask: state (mask, last) and extend by unvisited neighbors. DP is O(n² · 2ⁿ). For large n there's no efficient exact algorithm."

Summary

Hamiltonian path visits every vertex exactly once; Hamiltonian cycle is a closed such path.
Problem is NP-complete. Use backtracking (small/sparse) or DP bitmask (n ≤ ~20) for exact solution.
DP: state (mask, last), transition by unvisited neighbors; O(n² · 2ⁿ) time.

13.16 Network Flow

Introduction

Network flow is a powerful model for problems where something "flows" through a network: water through pipes, cars through roads, data through links, or goods in a supply chain. The classic problem is Maximum Flow: given a directed graph with capacities on edges, a source s, and a sink t, find the maximum amount of flow that can be sent from s to t without exceeding capacities or violating conservation at intermediate nodes.

Basic Definitions

Capacity c(u, v): Maximum allowed flow on edge (u, v) (often a non-negative integer).
Flow f(u, v): Actual flow along (u, v), with 0 ≤ f(u, v) ≤ c(u, v).
Conservation: For every vertex u except source s and sink t, total incoming flow equals total outgoing flow.
Value of flow: Total flow out of s (or into t).
Residual graph: Graph that indicates how we can adjust flow: residual capacity c_f(u, v) = c(u, v) − f(u, v) for forward edges, and c_f(v, u) = f(u, v) for backward edges.

Max-Flow / Min-Cut Theorem

The Max-Flow / Min-Cut Theorem states that the value of the maximum s–t flow equals the capacity of the minimum s–t cut (a partition of the vertices into two sets S and T = V \ S with s in S and t in T, capacity = sum of capacities of edges from S to T). Many network design and scheduling problems can be reduced to max-flow or min-cut.
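For tiny networks the cut side of the theorem can be evaluated by brute force: enumerate every set S that contains s but not t and sum the capacities leaving S. The sketch below is exponential in the number of vertices and is meant only as an illustration; the function name and the four-vertex example network are assumptions.

```python
def min_cut_brute_force(n, edges, source, sink):
    """Minimum s-t cut capacity by trying every vertex subset (tiny n only)."""
    best = float('inf')
    for mask in range(1 << n):
        in_s = lambda x: (mask >> x) & 1  # 1 if vertex x is on the source side
        if not in_s(source) or in_s(sink):
            continue  # S must contain the source and exclude the sink
        # Only edges going from S to T count toward the cut capacity.
        cap = sum(c for u, v, c in edges if in_s(u) and not in_s(v))
        best = min(best, cap)
    return best

edges = [(0, 1, 3), (0, 2, 2), (1, 2, 1), (1, 3, 2), (2, 3, 3)]
print(min_cut_brute_force(4, edges, 0, 3))  # 5
```

By the theorem, the maximum flow from 0 to 3 in this network is also 5; for example, the cut S = {0} has outgoing capacity 3 + 2 = 5.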

Ford-Fulkerson and Edmonds-Karp

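Ford-Fulkerson: Repeatedly find an augmenting path from s to t in the residual graph and push as much flow as possible along it. Complexity depends on how paths are chosen; with integer capacities every augmentation adds at least 1 unit, so it terminates after at most |f*| augmentations, giving O(E · |f*|) time, where |f*| is the max-flow value.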
Edmonds-Karp: A specific implementation of Ford-Fulkerson that always chooses the shortest augmenting path in terms of number of edges using BFS. It runs in O(V · E²) time.

Mental Model

Think of starting with zero flow. You repeatedly find a path from s to t along edges that have remaining capacity (residual capacity > 0) and push as much flow as possible along that path (the bottleneck). When no such path remains, you have a maximum flow. The residual graph tells you where you can still push more flow (forward edges) or cancel some you previously sent (backward edges).

Python Implementation: Edmonds-Karp (Capacity Matrix + Adjacency List)

from collections import deque

def edmonds_karp(n, edges, source, sink):
    """
    n: number of vertices (0..n-1)
    edges: list of (u, v, c) directed edges with capacity c
    Returns: max_flow value
    """
    # Build capacity matrix and adjacency list
    capacity = [[0] * n for _ in range(n)]
    adj = [[] for _ in range(n)]
    for u, v, c in edges:
        capacity[u][v] += c  # allow parallel edges by accumulating
        adj[u].append(v)
        adj[v].append(u)  # add reverse edge for residual graph

    def bfs():
        parent = [-1] * n
        parent[source] = source
        q = deque([source])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if parent[v] == -1 and capacity[u][v] > 0:
                    parent[v] = u
                    if v == sink:
                        return parent
                    q.append(v)
        return parent

    max_flow = 0
    while True:
        parent = bfs()
        if parent[sink] == -1:
            break  # no more augmenting paths
        # find bottleneck capacity along path
        flow = float('inf')
        v = sink
        while v != source:
            u = parent[v]
            flow = min(flow, capacity[u][v])
            v = u
        # update residual capacities
        v = sink
        while v != source:
            u = parent[v]
            capacity[u][v] -= flow
            capacity[v][u] += flow
            v = u
        max_flow += flow

    return max_flow

Examples Section

Example 1: Simple Flow Network

Example: Consider a network with 4 nodes (0 = s, 3 = t) and edges: 0→1 (capacity 3), 0→2 (2), 1→2 (1), 1→3 (2), 2→3 (3). What is the maximum flow from 0 to 3?

n = 4
edges = [
    (0, 1, 3),
    (0, 2, 2),
    (1, 2, 1),
    (1, 3, 2),
    (2, 3, 3),
]
print(edmonds_karp(n, edges, 0, 3))  # 5

One maximum flow: 0→1→3 with 2 units, 0→2→3 with 2 units, and 0→1→2→3 with 1 unit, total 5. This matches the cut around the source, whose capacity is 3 + 2 = 5.

Example 2: Bottleneck Intuition

Example: If the edge 2→3 had capacity 1 instead of 3, then the flow through 0→2→3 would be limited to 1, and total max flow would be 3 (2 via 0→1→3, 1 via 0→2→3).

Time and Space Complexity

Edmonds-Karp: O(V · E²) time, O(V²) space for capacity matrix + O(V + E) for adjacency.
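Ford-Fulkerson (generic): O(E · |f*|) for integer capacities, where |f*| is the maximum flow value; each augmentation adds at least 1 unit and costs O(E) to find (the actual count depends on capacities and path choices).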

Edge Cases

No path from s to t: BFS never reaches t; max flow = 0.
Parallel edges: Handled by summing capacities between the same u, v.
Self-loops: Typically irrelevant for s–t flow and can be ignored or left with zero capacity.

Common Mistakes

Common Mistake: Forgetting to add reverse edges in the residual graph; without them, you cannot "undo" flow and re-route, which is essential for correctness.

Common Mistake: Miscomputing residual capacities when updating after augmentation (must decrease forward and increase backward capacities).

Pattern Recognition

Use max-flow / network flow when you see:

"Maximum number of disjoint paths," "maximum matching" (in bipartite graphs), "assign people to tasks," "route as many units as possible."
Capacity constraints on edges or nodes, and conservation of some quantity except at sources/sinks.
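
As an illustration of the "assign people to tasks" pattern, the sketch below reduces bipartite matching to max-flow with unit capacities. The function name, the vertex numbering, and the can_do input format are assumptions made for this example; the augmenting loop is a compact BFS-based variant written to keep the block self-contained.

```python
from collections import deque

def max_bipartite_matching(num_people, num_tasks, can_do):
    """can_do: list of (person, task) pairs. Returns the size of a maximum matching."""
    # Vertex layout (arbitrary choice): 0 = source, then people, then tasks, last = sink.
    n = num_people + num_tasks + 2
    s, t = 0, n - 1
    cap = [[0] * n for _ in range(n)]
    for p in range(num_people):
        cap[s][1 + p] = 1                   # each person takes at most one task
    for k in range(num_tasks):
        cap[1 + num_people + k][t] = 1      # each task goes to at most one person
    for p, k in can_do:
        cap[1 + p][1 + num_people + k] = 1

    flow = 0
    while True:
        # BFS for a shortest augmenting path in the residual graph.
        parent = [-1] * n
        parent[s] = s
        q = deque([s])
        while q and parent[t] == -1:
            u = q.popleft()
            for v in range(n):
                if parent[v] == -1 and cap[u][v] > 0:
                    parent[v] = u
                    q.append(v)
        if parent[t] == -1:
            return flow
        # Every path leaves s on a capacity-1 edge, so the bottleneck is 1.
        v = t
        while v != s:
            u = parent[v]
            cap[u][v] -= 1
            cap[v][u] += 1
            v = u
        flow += 1

# Person 0 can do tasks 0 and 1; person 1 can only do task 0.
print(max_bipartite_matching(2, 2, [(0, 0), (0, 1), (1, 0)]))  # 2
```

In this input the first augmenting path may pair person 0 with task 0; the second path then uses a backward edge to move person 0 to task 1 so that person 1 can take task 0.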

Interview Insight

Interview Insight: Say: "I'll model this as a max-flow problem with source, sink, and capacities. I'll use Edmonds-Karp: repeatedly BFS in the residual graph to find the shortest augmenting path, push the bottleneck flow, and update residual capacities. This runs in O(V·E²)."

Summary

Network flow models flows with capacities and conservation; max-flow seeks the largest s–t flow.
Residual graph and augmenting paths underpin Ford-Fulkerson and Edmonds-Karp.
Edmonds-Karp: BFS for augmenting paths, O(V·E²); foundation for many advanced flow algorithms.

13.17 Edmonds-Karp

Introduction

Edmonds-Karp is the name for the max-flow algorithm that implements Ford–Fulkerson by always choosing the shortest augmenting path (in number of edges) from source to sink in the residual graph, using BFS. This choice guarantees at most O(V · E) augmentations and total time O(V · E²), making it a standard, easy-to-code algorithm for maximum flow.

Why “Shortest” Augmenting Path?

In generic Ford–Fulkerson, we only require some augmenting path; the number of augmentations can be huge (even infinite for irrational capacities). Edmonds–Karp fixes this: by always taking a shortest s–t path in the residual graph, one can prove that the distance from s to any vertex (in edges) never decreases. Each edge can be “critical” (bottleneck on a shortest path) at most O(V) times, so there are at most O(V · E) augmentations. Each BFS costs O(E), hence O(V · E²) total.

Algorithm Summary

Initialize flow to zero; residual capacities = original capacities.
Repeat: run BFS from s to t in the residual graph (only edges with residual capacity > 0). If t is unreachable, stop.
Reconstruct the path via parent pointers; find bottleneck = minimum residual capacity on the path.
Augment: subtract bottleneck from forward residual capacities, add bottleneck to backward residual capacities. Add bottleneck to total flow.

Python Implementation

from collections import deque

def edmonds_karp(n, edges, source, sink):
    capacity = [[0] * n for _ in range(n)]
    adj = [[] for _ in range(n)]
    for u, v, c in edges:
        capacity[u][v] += c
        adj[u].append(v)
        adj[v].append(u)

    def bfs():
        parent = [-1] * n
        parent[source] = source
        q = deque([source])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if parent[v] == -1 and capacity[u][v] > 0:
                    parent[v] = u
                    if v == sink:
                        return parent
                    q.append(v)
        return parent

    max_flow = 0
    while True:
        parent = bfs()
        if parent[sink] == -1:
            break
        flow_inc = float('inf')
        v = sink
        while v != source:
            u = parent[v]
            flow_inc = min(flow_inc, capacity[u][v])
            v = u
        v = sink
        while v != source:
            u = parent[v]
            capacity[u][v] -= flow_inc
            capacity[v][u] += flow_inc
            v = u
        max_flow += flow_inc
    return max_flow

Examples Section

Example 1: Two Paths

Example: Vertices 0 (s), 1, 2, 3 (t). Edges: 0→1 (10), 0→2 (10), 1→3 (10), 2→3 (10). Two disjoint paths; max flow = 20.

n = 4
edges = [(0, 1, 10), (0, 2, 10), (1, 3, 10), (2, 3, 10)]
print(edmonds_karp(n, edges, 0, 3))  # 20

Example 2: Bottleneck in the Middle

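Example: Same 4 nodes; edges: 0→1 (100), 0→2 (100), 1→2 (1), 1→3 (100), 2→3 (100). The edge 1→2 has capacity 1, so at most 1 unit can go 0→1→2→3. Max flow = 200 (100 via 0→1→3 and 100 via 0→2→3), which matches the cut around the source: 100 + 100. The middle edge cannot add anything here: reaching 200 requires both 1→3 and 2→3 to be saturated, and saturating 1→3 already uses all 100 units that can enter vertex 1, so every maximum flow leaves 1→2 unused.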

edges2 = [(0, 1, 100), (0, 2, 100), (1, 2, 1), (1, 3, 100), (2, 3, 100)]
print(edmonds_karp(4, edges2, 0, 3))  # 200

Example 3: No Path

Example: Source and sink in different components: edges only 0→1 and 2→3. Max flow from 0 to 3 = 0.

edges3 = [(0, 1, 5), (2, 3, 5)]
print(edmonds_karp(4, edges3, 0, 3))  # 0

Time and Space Complexity

Time: O(V · E²) — O(V · E) augmentations, each requiring O(E) BFS.
Space: O(V²) for capacity matrix; O(V + E) for adjacency list and BFS.

Common Mistakes

Common Mistake: Omitting reverse edges in the adjacency list. Without them, the residual graph cannot represent “canceling” flow, and the algorithm can fail to find the maximum flow.

Common Mistake: Forgetting to update both directions when augmenting: decrease capacity[u][v] and increase capacity[v][u].

Summary

Edmonds-Karp = Ford–Fulkerson with BFS for shortest augmenting path.
O(V · E²) time, O(V²) space; deterministic and easy to implement.
Use whenever you need max-flow in practice for moderate-sized graphs; for very large graphs, consider Dinic or other algorithms.

13.18 Dinic

Introduction

Dinic's algorithm (also written Dinitz) is a faster maximum flow algorithm than Edmonds-Karp. It uses level graphs and blocking flows: in each phase, it builds a layered graph with BFS and then pushes as much flow as possible along many paths in that layer using DFS, until no more flow can be sent (blocking flow). For general graphs it runs in O(V² · E); for unit capacities or bipartite matching it is often even faster in practice.

Core Ideas

Level graph: Run BFS from the source in the residual graph. Assign each vertex a level = distance in edges from source. Keep only edges (u, v) where level[v] = level[u] + 1. This gives a DAG (no backward edges in the level graph).
Blocking flow: In the level graph, send flow from s to t until no s–t path remains in the level graph (every path has at least one saturated edge). That flow is a blocking flow for this phase.
Phases: Each phase: (1) BFS to build level graph; (2) repeated DFS to find and push blocking flow; then update residual and repeat until no augmenting path exists.

Mental Model

Edmonds-Karp sends flow along one shortest path per BFS. Dinic "locks in" the current distances and sends flow along many shortest paths in the same level graph before recomputing levels. That reduces the number of BFS phases and improves performance.

Edge Representation (Forward + Reverse)

Store each logical edge as two directed edges: forward (u→v with capacity c) and backward (v→u with capacity 0). Each edge stores a pointer to its reverse so we can update both when sending flow: forward.cap −= f, reverse.cap += f.
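
The paired-edge bookkeeping can be sketched in isolation before looking at the full class. The helper names add_edge and push below are illustrative, and the sketch assumes u != v (a self-loop would need different index handling).

```python
def add_edge(adj, u, v, c):
    """Add u -> v with capacity c plus its zero-capacity reverse edge (assumes u != v)."""
    adj[u].append([v, c, len(adj[v])])      # rev = index the reverse edge will get in adj[v]
    adj[v].append([u, 0, len(adj[u]) - 1])  # rev = index of the forward edge just added

def push(adj, u, i, f):
    """Send f units along the i-th edge out of u and update its reverse edge."""
    v, cap, rev = adj[u][i]
    assert f <= cap
    adj[u][i][1] -= f     # forward.cap -= f
    adj[v][rev][1] += f   # reverse.cap += f

adj = [[] for _ in range(3)]
add_edge(adj, 0, 1, 5)
add_edge(adj, 1, 2, 4)
push(adj, 0, 0, 3)
print(adj[0][0], adj[1][0])  # [1, 2, 0] [0, 3, 0]
```

After pushing 3 units on 0→1, the forward edge has 2 units left and the reverse edge 1→0 has gained 3 units of residual capacity.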

Python Implementation

from collections import deque

class Dinic:
    def __init__(self, n):
        self.n = n
        self.adj = [[] for _ in range(n)]  # each: [to, cap, rev_index]

    def add_edge(self, u, v, c):
        fwd = [v, c, None]
        bwd = [u, 0, None]
        fwd[2] = len(self.adj[v])
        bwd[2] = len(self.adj[u])
        self.adj[u].append(fwd)
        self.adj[v].append(bwd)

    def bfs_level(self, s, t):
        self.level = [-1] * self.n
        q = deque()
        self.level[s] = 0
        q.append(s)
        while q:
            u = q.popleft()
            for v, cap, rev in self.adj[u]:
                if cap > 0 and self.level[v] == -1:
                    self.level[v] = self.level[u] + 1
                    q.append(v)
        return self.level[t] != -1

    def dfs_flow(self, u, t, f, it):
        if u == t:
            return f
        for i in range(it[u], len(self.adj[u])):
            it[u] = i
            v, cap, rev = self.adj[u][i]
            if cap > 0 and self.level[v] == self.level[u] + 1:
                d = self.dfs_flow(v, t, min(f, cap), it)
                if d > 0:
                    self.adj[u][i][1] -= d
                    self.adj[v][rev][1] += d
                    return d
        return 0

    def max_flow(self, s, t):
        flow = 0
        INF = 10**18
        while self.bfs_level(s, t):
            it = [0] * self.n
            while True:
                f = self.dfs_flow(s, t, INF, it)
                if f == 0:
                    break
                flow += f
        return flow

Examples Section

Example 1: Same Network as Edmonds-Karp

Example: Vertices 0 (s), 1, 2, 3 (t). Edges: 0→1 (3), 0→2 (2), 1→2 (1), 1→3 (2), 2→3 (3). Max flow = 5 (2 via 0→1→3, 2 via 0→2→3, 1 via 0→1→2→3).

n = 4
dinic = Dinic(n)
dinic.add_edge(0, 1, 3)
dinic.add_edge(0, 2, 2)
dinic.add_edge(1, 2, 1)
dinic.add_edge(1, 3, 2)
dinic.add_edge(2, 3, 3)
print(dinic.max_flow(0, 3))  # 5

Example 2: Two Disjoint Paths

Example: Edges: 0→1 (10), 1→3 (10), 0→2 (10), 2→3 (10). Two disjoint paths; max flow = 20.

dinic2 = Dinic(4)
dinic2.add_edge(0, 1, 10)
dinic2.add_edge(1, 3, 10)
dinic2.add_edge(0, 2, 10)
dinic2.add_edge(2, 3, 10)
print(dinic2.max_flow(0, 3))  # 20

Time and Space Complexity

Time: O(V² · E) for general graphs. On unit-capacity networks the bound improves to O(min(V^(2/3), √E) · E), and for bipartite matching to O(√V · E).
Space: O(V + E) for adjacency lists and edge structures.

Common Mistakes

Common Mistake: Not using an iterator array it[] in DFS. Without it, we rescan from the start of each vertex's list and performance degrades; the iterator ensures we don't revisit saturated edges in the same phase.

Common Mistake: Incorrect reverse-edge indices when adding edges; wrong rev breaks residual updates and gives wrong flow.

Pattern Recognition

Use Dinic when you see:

Max-flow on larger graphs where Edmonds-Karp may be too slow.
Bipartite matching, scheduling, or routing modeled as flow.
Competitive programming problems with strict time limits.

Summary

Dinic uses level graphs (BFS) and blocking flows (DFS) for maximum flow.
O(V² · E) in general; faster on many special cases; preferred over Edmonds-Karp for large graphs.

13.19 Johnson's Algorithm

Introduction

Johnson's algorithm finds all-pairs shortest paths in a directed graph that may have negative edge weights (but no negative cycles). It combines Bellman-Ford (to compute a "potential" function) with Dijkstra (run once per vertex on reweighted edges). The reweighting makes all edge weights non-negative so Dijkstra is valid, and the final distances are corrected to get the true shortest paths. Total time is O(V E log V) with a binary heap, or O(V E + V² log V) with a Fibonacci heap.

Why Not Just Bellman-Ford or Floyd-Warshall?

Bellman-Ford from each source: O(V² · E) — slow for all pairs.
Floyd-Warshall: O(V³) and handles negative weights, but no benefit from sparse graphs.
Johnson: One Bellman-Ford O(V·E) plus V runs of Dijkstra O(E log V) each ⇒ O(V·E + V·E log V) = O(V E log V) for sparse graphs, which can beat Floyd-Warshall when E is small.

Algorithm Overview

Add a dummy vertex s with zero-weight edges to every other vertex. Run Bellman-Ford from s. Let h[v] = shortest distance from s to v. If Bellman-Ford detects a negative cycle, stop (no finite all-pairs distances).
Reweight edges: For each edge (u, v) with weight w, set w'(u,v) = w(u,v) + h[u] − h[v]. The key property: w' ≥ 0 (triangle inequality from shortest paths), and any path's length in w' differs from its length in w by a constant that depends only on endpoints: dist'(u,v) = dist(u,v) + h[u] − h[v].
Run Dijkstra from each vertex u in the graph with the new weights w'. Get d'[u][v] = shortest distance from u to v under w'.
Convert back: True shortest distance dist[u][v] = d'[u][v] − h[u] + h[v].

Mental Model

The potential h[v] is the shortest distance from the dummy source to v, so for every edge (u, v) it satisfies h[v] ≤ h[u] + w(u, v). Rearranged, w(u, v) + h[u] − h[v] ≥ 0, which is exactly the reweighted edge cost, so Dijkstra works. Along any path the h terms telescope and leave only h[start] − h[end]; the correction − h[u] + h[v] in the final step removes that offset so we get real distances.

Python Implementation (Sketch)

import heapq

def johnson(n, edges):
    """
    n: vertices 0..n-1. edges: list of (u, v, w).
    Returns: 2D list dist, or None if negative cycle exists.
    """
    # Step 1: Add dummy source n, run Bellman-Ford
    adj_bf = [[] for _ in range(n + 1)]
    for u, v, w in edges:
        adj_bf[u].append((v, w))
    for v in range(n):
        adj_bf[n].append((v, 0))

    INF = float('inf')
    h = [INF] * (n + 1)
    h[n] = 0
    for _ in range(n):
        for u in range(n + 1):
            for v, w in adj_bf[u]:
                if h[u] != INF and h[u] + w < h[v]:
                    h[v] = h[u] + w
    # Negative cycle check
    for u in range(n + 1):
        for v, w in adj_bf[u]:
            if h[u] != INF and h[u] + w < h[v]:
                return None  # negative cycle

    h = h[:n]  # discard dummy

    # Step 2 & 3: Build adj with reweighted edges, run Dijkstra from each u
    adj = [[] for _ in range(n)]
    for u, v, w in edges:
        adj[u].append((v, w + h[u] - h[v]))

    dist = [[INF] * n for _ in range(n)]
    for s in range(n):
        dist[s][s] = 0
        heap = [(0, s)]
        while heap:
            d, u = heapq.heappop(heap)
            if d != dist[s][u]:
                continue
            for v, w in adj[u]:
                if dist[s][u] + w < dist[s][v]:
                    dist[s][v] = dist[s][u] + w
                    heapq.heappush(heap, (dist[s][v], v))
        for v in range(n):
            if dist[s][v] != INF:
                dist[s][v] = dist[s][v] - h[s] + h[v]
    return dist

Examples Section

Example 1: Graph With Negative Edge (No Negative Cycle)

Example: n = 3, edges: (0,1,1), (1,2,-2), (2,0,1). No negative cycle. After reweighting, all edges non-negative; run Dijkstra from 0, 1, 2 and convert back to get all-pairs distances.

Dummy source 3 → 0,1,2 with weight 0. Bellman-Ford gives h[0], h[1], h[2]. Reweight; then Dijkstra from each vertex; correct with −h[u]+h[v].
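
The numbers for this example can be worked out with a short sketch. The helper name potentials is an assumption for illustration; it starts every h[v] at 0, which has the same effect as relaxing the zero-weight edges from the dummy source first, and it assumes the graph has no negative cycle.

```python
def potentials(n, edges):
    """Bellman-Ford potentials h, as if from a dummy source with 0-weight edges to all."""
    h = [0] * n
    for _ in range(n - 1):
        for u, v, w in edges:
            if h[u] + w < h[v]:
                h[v] = h[u] + w
    return h

edges = [(0, 1, 1), (1, 2, -2), (2, 0, 1)]
h = potentials(3, edges)
reweighted = [(u, v, w + h[u] - h[v]) for u, v, w in edges]
print(h)           # [-1, 0, -2]
print(reweighted)  # [(0, 1, 0), (1, 2, 0), (2, 0, 0)]

# Under the new weights the shortest 0 -> 2 path costs 0; converting back:
print(0 - h[0] + h[2])  # -1, the true distance along 0 -> 1 -> 2
```

Every reweighted edge is non-negative (here all are 0), and the corrected value −1 agrees with the original path weight 1 + (−2).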

Example 2: Negative Cycle

Example: Edges (0,1,1), (1,2,1), (2,0,-3). Cycle 0→1→2→0 has weight −1. Bellman-Ford detects this in the extra pass; Johnson returns None.

Time and Space Complexity

Time: O(V · E) for Bellman-Ford + V × O(E log V) for V Dijkstras ⇒ O(V E log V) with binary heap. With Fibonacci heap: O(V E + V² log V).
Space: O(V²) for the distance matrix; O(V + E) for adjacency and heaps.

Edge Cases

Negative cycle: Bellman-Ford step must detect it; return None or signal that all-pairs distances are undefined.
Disconnected: Unreachable pairs remain at INF after Dijkstra; correction formula still applies (INF stays INF).

Common Mistakes

Common Mistake: Forgetting to use the reweighted graph for Dijkstra. You must run Dijkstra on w'(u,v) = w(u,v) + h[u] − h[v], not on original weights.

Common Mistake: Forgetting to convert back: final dist[u][v] must be d'[u][v] − h[u] + h[v], not d'[u][v].

Summary

Johnson's algorithm solves all-pairs shortest paths with possible negative weights (no negative cycle).
Uses one Bellman-Ford to get potentials h, reweights to non-negative, then V Dijkstras; correct with −h[u]+h[v].
Time O(V E log V) with binary heap; good for sparse graphs compared to Floyd-Warshall O(V³).

13.20 0-1 BFS

Introduction

0-1 BFS is a variant of BFS that finds the shortest path (in terms of total edge weight) in a graph where every edge has weight either 0 or 1. Instead of a priority queue (Dijkstra), we use a double-ended queue (deque): push vertices reached by a 0-weight edge to the front and vertices reached by a 1-weight edge to the back. This keeps the deque ordered by distance, so we always process the smallest-distance node next. Time complexity is O(V + E) — linear, like BFS.

When to Use 0-1 BFS

Graph has only 0 and 1 edge weights (e.g. "free" vs "cost 1" moves).
You need shortest path from a source; Dijkstra would work but 0-1 BFS is simpler and faster (no log factor).
Common in grid problems: moving to an empty cell = 0, breaking a wall or paying cost = 1.

Algorithm

Initialize dist[s] = 0 for source s, dist[v] = ∞ for others. Use a deque; push s at the front.
While the deque is not empty:
- Pop a vertex u from the front of the deque.
- For each neighbor v with edge weight w (0 or 1):
  - If dist[u] + w < dist[v], update dist[v] = dist[u] + w. If w == 0, push v to the front of the deque; else push v to the back.

Why this works: vertices in the deque are always in non-decreasing order of distance. So the front always has the smallest distance; we never need a heap.

Mental Model

Think of the deque as two "layers": the front contains nodes at the current minimum distance; the back contains nodes at current distance + 1. Processing from the front keeps the invariant. When we add a 0-weight edge, we don't increase distance, so we put the node at the front; when we add a 1-weight edge, we put it at the back.

Python Implementation

from collections import deque

def bfs_01(n, adj, source):
    """
    n: vertices 0..n-1.
    adj: adj[u] = list of (v, w) where w is 0 or 1.
    Returns: dist[0..n-1] from source.
    """
    INF = float('inf')
    dist = [INF] * n
    dist[source] = 0
    dq = deque([source])

    while dq:
        u = dq.popleft()
        for v, w in adj[u]:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                if w == 0:
                    dq.appendleft(v)
                else:
                    dq.append(v)
    return dist

Examples Section

Example 1: Simple 0-1 Graph

Example: n = 4, source = 0. Edges: 0→1 (0), 0→2 (1), 1→3 (1), 2→3 (0). Shortest path from 0 to 3: 0→1 (0) → 3 (1) = cost 1; or 0→2 (1) → 3 (0) = cost 1. So dist[3] = 1.

adj = [
    [(1, 0), (2, 1)],  # 0
    [(3, 1)],           # 1
    [(3, 0)],           # 2
    []                  # 3
]
print(bfs_01(4, adj, 0))  # [0, 0, 1, 1]

Output: [0, 0, 1, 1] — dist[0]=0, dist[1]=0 (via 0→1 weight 0), dist[2]=1, dist[3]=1.

Example 2: Grid With 0 and 1 Costs

Example: 2D grid: moving to adjacent cell costs 0 if the cell is empty, 1 if it is a wall (or you pay to remove it). 0-1 BFS from (0,0) gives minimum cost to reach each cell. Each cell is a vertex; edges to neighbors have weight 0 or 1.

# Conceptual: grid[r][c] = 0 or 1 (cost to enter)
# Vertex index = r * cols + c. Neighbors: up, down, left, right.
# Edge weight = grid[nr][nc]. Use bfs_01 on that graph.
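
The conceptual outline above can be turned into a runnable sketch that works on (row, column) pairs directly instead of building an explicit adjacency list. The function name min_cost_grid is illustrative, and the sketch assumes the start cell (0, 0) is never charged.

```python
from collections import deque

def min_cost_grid(grid):
    """grid[r][c] is 0 (free) or 1 (costs 1 to enter). Returns costs from (0, 0)."""
    rows, cols = len(grid), len(grid[0])
    INF = float('inf')
    dist = [[INF] * cols for _ in range(rows)]
    dist[0][0] = 0  # assumption: entering the start cell is free
    dq = deque([(0, 0)])
    while dq:
        r, c = dq.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                w = grid[nr][nc]  # edge weight = cost of the cell being entered
                if dist[r][c] + w < dist[nr][nc]:
                    dist[nr][nc] = dist[r][c] + w
                    if w == 0:
                        dq.appendleft((nr, nc))
                    else:
                        dq.append((nr, nc))
    return dist

grid = [
    [0, 1, 0],
    [1, 1, 0],
    [0, 0, 0],
]
print(min_cost_grid(grid))  # [[0, 1, 1], [1, 2, 1], [1, 1, 1]]
```

The start is walled in, so every other cell costs at least 1; the center cell costs 2 because it is itself a wall reached through another wall.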

Time and Space Complexity

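Time: O(V + E). A vertex's distance is final the first time it is popped, and a vertex can enter the deque at most twice (its tentative distance can only drop from d + 1 to d while level d is being processed), so each adjacency list is scanned a constant number of times. No log factor.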
Space: O(V) for dist and the deque.

Edge Cases

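All weights 0: Every reachable node gets distance 0. Because every push goes to the front of the deque, the visiting order is stack-like (DFS-style) rather than level by level, but the distances are still correct.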
All weights 1: Same as standard BFS; distance = number of edges.
Disconnected: Unreachable vertices remain at INF.

Common Mistakes

Common Mistake: Pushing to the wrong end: 0-weight must go to the front, 1-weight to the back. Reversing them breaks the distance ordering.

Common Mistake: Using 0-1 BFS when edge weights can be 2 or more; the deque ordering only works for 0 and 1. For general non-negative weights use Dijkstra.

Pattern Recognition

Use 0-1 BFS when you see:

"Minimum cost" or "shortest path" with only two types of cost (e.g. free vs 1, or 0 vs 1).
Grid problems: empty cell = 0, wall/cost = 1; or "minimum walls to break" to reach target.

Summary

0-1 BFS finds shortest path when all edge weights are 0 or 1, using a deque (0 → front, 1 → back).
O(V + E) time, O(V) space; no priority queue needed.
Ideal for grid and graph problems with binary costs.

13.21 2-SAT

Introduction

2-SAT is the problem of deciding whether a Boolean formula in conjunctive normal form (CNF) where each clause has exactly two literals can be satisfied. Each clause is of the form (a ∨ b) (at least one of a, b is true). We want an assignment to all variables that makes every clause true, or we report that no such assignment exists. The classic solution reduces 2-SAT to strongly connected components (SCC) in an "implication graph" and runs in O(V + E) (linear in the number of variables and clauses).

From Clauses to Implications

A clause (a ∨ b) is equivalent to: if ¬a then b, and if ¬b then a. So we have two implications: ¬a → b and ¬b → a. We build a directed graph with one node per literal (so for variable x we have nodes for x and ¬x). For each clause (a ∨ b), add directed edges (¬a, b) and (¬b, a).
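
The clause-to-edge translation can be sketched as a small helper. The function name implication_edges is illustrative; the literal encoding (i for x_i, i + n for ¬x_i) is the same convention used by the implementation later in this section.

```python
def implication_edges(n, clauses):
    """Return the directed edges of the implication graph for the given clauses."""
    def neg(lit):
        return lit + n if lit < n else lit - n

    edges = []
    for a, b in clauses:
        edges.append((neg(a), b))  # NOT a -> b
        edges.append((neg(b), a))  # NOT b -> a
    return edges

# With n = 2: literal 0 = x0, 1 = x1, 2 = NOT x0, 3 = NOT x1.
print(implication_edges(2, [(0, 1)]))  # [(2, 1), (3, 0)]
```

The single clause (x0 ∨ x1) yields the two edges ¬x0 → x1 and ¬x1 → x0.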

Implication Graph and Satisfiability

Implication graph: 2n nodes (n variables × 2 for literal and negation). Edges from implications.
Key fact: The 2-SAT formula is satisfiable if and only if no variable x has both a path from x to ¬x and a path from ¬x to x — i.e. x and ¬x are not in the same strongly connected component (SCC).
If x and ¬x lie in the same SCC, then we must have both x → ¬x and ¬x → x, so x must be both true and false — impossible.

Algorithm

Build the implication graph from all clauses (each clause (a ∨ b) gives edges (¬a→b) and (¬b→a)).
Find all SCCs (e.g. Tarjan's algorithm or two DFS Kosaraju).
For each variable x, check: if scc_id[x] == scc_id[¬x], return "unsatisfiable."
Otherwise, assign each variable: one common method is to assign x = false if the SCC of x appears before the SCC of ¬x in the topological order of the condensation graph (and x = true otherwise). Alternatively: assign x so that the literal in the "later" SCC is true (so the implication chain is satisfied).

Assignment Construction

After computing SCCs, set each variable so that the literal whose SCC comes later in the topological order of the condensation graph is true: x is true iff the SCC of x comes after the SCC of ¬x. How this translates to SCC ids depends on the SCC algorithm. Tarjan's algorithm completes sink components first, so it numbers SCCs in reverse topological order: a smaller scc id means later in topological order, and the rule becomes value[x] = (scc[x] < scc[¬x]). If the ids are assigned in topological order instead (as in the usual Kosaraju implementation), the comparison flips to value[x] = (scc[x] > scc[¬x]).

Python Implementation (Using Tarjan SCC)

def two_sat(n, clauses):
    """
    n: number of variables (0..n-1). Each variable i has literals i (true) and i+n (false/¬i).
    clauses: list of (a, b) meaning (literal_a ∨ literal_b). Literal: 0..n-1 = variable true, n..2n-1 = variable false.
    Returns: assignment [True/False] for each variable, or None if unsatisfiable.
    """
    # Build implication graph: 2n nodes
    N = 2 * n
    adj = [[] for _ in range(N)]
    def neg(lit):
        return lit + n if lit < n else lit - n
    for a, b in clauses:
        adj[neg(a)].append(b)  # ¬a → b
        adj[neg(b)].append(a)  # ¬b → a

    # Tarjan SCC
    disc = [-1] * N
    low = [-1] * N
    on_stack = [False] * N
    stack = []
    time = [0]
    scc_id = [-1] * N
    scc_count = [0]

    def dfs(u):
        disc[u] = low[u] = time[0]
        time[0] += 1
        stack.append(u)
        on_stack[u] = True
        for v in adj[u]:
            if disc[v] == -1:
                dfs(v)
                low[u] = min(low[u], low[v])
            elif on_stack[v]:
                low[u] = min(low[u], disc[v])
        if low[u] == disc[u]:
            while True:
                v = stack.pop()
                on_stack[v] = False
                scc_id[v] = scc_count[0]
                if v == u:
                    break
            scc_count[0] += 1

    for u in range(N):
        if disc[u] == -1:
            dfs(u)

    # Check x and ¬x in same SCC?
    for i in range(n):
        if scc_id[i] == scc_id[i + n]:
            return None
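    # Tarjan numbers SCCs in reverse topological order (sink components first),
    # so a smaller id means "later" in topological order.
    # Assign: variable i true iff scc[i] < scc[¬i]
    return [scc_id[i] < scc_id[i + n] for i in range(n)]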

Examples Section

Example 1: Satisfiable Formula

Example: Variables x0, x1. Clauses: (x0 ∨ x1), (¬x0 ∨ x1), (x0 ∨ ¬x1). So we have implications: ¬x0→x1, ¬x1→x0; x0→x1, ¬x1→¬x0; ¬x0→¬x1, x1→x0. Build graph and run SCC. If no variable has x and ¬x in same SCC, assign as above. One solution: x0=true, x1=true (all clauses satisfied).

# Literals: 0=x0, 1=x1, 2=¬x0, 3=¬x1. Clauses (a∨b): (0,1), (2,1), (0,3)
n = 2
clauses = [(0, 1), (2, 1), (0, 3)]
ans = two_sat(n, clauses)
print(ans)  # e.g. [True, True]

Example 2: Unsatisfiable Formula

Example: Clauses (x0 ∨ x0), (¬x0 ∨ ¬x0) simplify to requiring x0 and ¬x0 both true — impossible. In implication graph: edges (¬x0→x0) and (x0→¬x0), so x0 and ¬x0 are in the same SCC. Algorithm returns None.
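
Both examples are small enough to verify by exhaustive search. The sketch below is a brute-force checker, useful only for validating a solver on tiny inputs since it tries all 2^n assignments; the function names are illustrative, and literals follow the encoding i = x_i, i + n = ¬x_i.

```python
from itertools import product

def lit_value(lit, assignment, n):
    """Truth value of a literal under an assignment of the n variables."""
    return assignment[lit] if lit < n else not assignment[lit - n]

def satisfies(assignment, clauses, n):
    """True if every clause has at least one true literal."""
    return all(lit_value(a, assignment, n) or lit_value(b, assignment, n)
               for a, b in clauses)

def brute_force_2sat(n, clauses):
    """Return every satisfying assignment (exponential; small n only)."""
    return [list(bits) for bits in product([False, True], repeat=n)
            if satisfies(bits, clauses, n)]

# Example 1: (x0 or x1), (not x0 or x1), (x0 or not x1)
print(brute_force_2sat(2, [(0, 1), (2, 1), (0, 3)]))  # [[True, True]]
# Example 2 with n = 1: (x0 or x0), (not x0 or not x0)
print(brute_force_2sat(1, [(0, 0), (1, 1)]))          # []
```

Example 1 has exactly one satisfying assignment, and Example 2 has none, which is what an SCC-based solver should report.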

Time and Space Complexity

Time: O(n + m) where n = number of variables, m = number of clauses. Building graph O(m), Tarjan O(2n + 2m) = O(n + m).
Space: O(n + m) for the implication graph and SCC data.

Common Mistakes

Common Mistake: Adding only one edge per clause. Each clause (a ∨ b) must yield two edges: (¬a→b) and (¬b→a).

Common Mistake: Wrong literal encoding: ensure a consistent convention (e.g. variable i: literal i = true, literal i+n = false) and that negation is computed correctly when building edges.

Pattern Recognition

Use 2-SAT when you see:

Constraints that are "at least one of two choices" or "if not A then B" type.
Binary assignments (true/false, 0/1) with pairwise constraints.

Summary

2-SAT is solved by building the implication graph and checking that no variable has x and ¬x in the same SCC.
Assignment: for each variable x, set x = true iff the SCC of x comes later in topological order than the SCC of ¬x; with Tarjan's numbering (sink components get the smallest ids) that means scc_id[x] < scc_id[¬x].
O(n + m) time with Tarjan (or Kosaraju) for SCC.

13.22 Stable Matching (Gale-Shapley)

Introduction

The stable matching problem (also known as the stable marriage problem) has two sets of agents (e.g. men and women, or students and schools). Each agent has a strict preference list over the other set. We want a perfect matching (everyone matched exactly once) with no blocking pair — no two agents who prefer each other over their current partners. Gale-Shapley is an algorithm that always finds a stable matching in O(n²) time.

Definitions

Matching: A set of pairs (a, b) with one from each set; each agent appears in at most one pair. Perfect matching: everyone appears exactly once.
Blocking pair: Two agents x and y (from different sets) who are not matched to each other but each prefers the other to their current partner. So (x, y) would "run off" together.
Stable matching: A perfect matching with no blocking pair.
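
The blocking-pair definition translates directly into a checker. This is a sketch under assumed conventions: agents on each side are numbered 0..n-1, preference lists are ordered best first, and partner_proposer[m] is the acceptor matched to proposer m. The function name find_blocking_pair is illustrative.

```python
def find_blocking_pair(n, pref_proposer, pref_acceptor, partner_proposer):
    """Return a blocking pair (m, w), or None if the perfect matching is stable."""
    partner_acceptor = [None] * n
    for m, w in enumerate(partner_proposer):
        partner_acceptor[w] = m
    # rank_acc[w][m] = position of proposer m in acceptor w's list (0 = best)
    rank_acc = [{m: r for r, m in enumerate(prefs)} for prefs in pref_acceptor]
    for m in range(n):
        for w in pref_proposer[m]:
            if w == partner_proposer[m]:
                break  # everyone further down is less preferred than m's partner
            # m prefers w to his partner; check whether w prefers m to hers.
            if rank_acc[w][m] < rank_acc[w][partner_acceptor[w]]:
                return (m, w)
    return None

# Both proposers prefer acceptor 0; both acceptors prefer proposer 0.
pref_proposer = [[0, 1], [0, 1]]
pref_acceptor = [[0, 1], [0, 1]]
print(find_blocking_pair(2, pref_proposer, pref_acceptor, [0, 1]))  # None
print(find_blocking_pair(2, pref_proposer, pref_acceptor, [1, 0]))  # (0, 0)
```

In the second matching, proposer 0 and acceptor 0 are each other's first choice but are matched to others, so they form a blocking pair.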

Gale-Shapley Algorithm (Proposer-Acceptor)

One set is the proposers (e.g. men); the other is the acceptors (e.g. women). Each proposer has a list of acceptors in order of preference; each acceptor has a list of proposers. The algorithm:

Every proposer is initially free. Each acceptor has no partner.
While there exists a free proposer who has not proposed to every acceptor:
- Pick such a proposer m. He proposes to his most preferred acceptor w to whom he has not yet proposed.
- If w is free, she tentatively accepts (m, w). If w is matched to m', she accepts the one she prefers (m or m'); the rejected proposer becomes free again.
When no free proposer has any remaining choices, the tentative matching is final and is stable.

Key Properties

Termination: Each proposer proposes at most n times, so at most n² proposals; the algorithm always terminates.
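Stable: The output matching has no blocking pair. Proof by contradiction: suppose (m, w) is a blocking pair. Since m prefers w to his final partner, he proposed to w before proposing to that partner, and w rejected him (immediately or later) in favor of someone she prefers. Acceptors only ever trade up, so w's final partner is someone she prefers to m, contradicting the assumption that w prefers m.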
Proposer-optimal: Every proposer gets the best partner they can have in any stable matching. Acceptors get their worst valid partner in any stable matching (acceptor-pessimal).

Mental Model

Proposers "propose" in order of preference; acceptors keep the best offer so far and reject the rest. Rejected proposers move down their list. Because acceptors only trade up, no one who was rejected can later form a blocking pair with an acceptor who already has a better (for her) partner.

Python Implementation

def gale_shapley(n, pref_proposer, pref_acceptor):
    """
    n: number of proposers and acceptors (each 0..n-1).
    pref_proposer[i] = list of n acceptor indices in order of preference for proposer i.
    pref_acceptor[j] = list of n proposer indices in order of preference for acceptor j.
    Returns: list partner_proposer[0..n-1] where partner_proposer[i] = acceptor matched to proposer i.
    """
    # rank_acceptor[j][m] = rank of proposer m in acceptor j's list (0 = best)
    rank_acceptor = [[0] * n for _ in range(n)]
    for j in range(n):
        for r, m in enumerate(pref_acceptor[j]):
            rank_acceptor[j][m] = r

    partner_acceptor = [-1] * n  # partner_acceptor[j] = proposer matched to j
    next_proposal = [0] * n      # next index in pref_proposer[i] to try
    free = list(range(n))        # free proposers

    while free:
        m = free.pop()
        if next_proposal[m] >= n:
            continue
        w = pref_proposer[m][next_proposal[m]]
        next_proposal[m] += 1
        if partner_acceptor[w] == -1:
            partner_acceptor[w] = m
        else:
            m_prime = partner_acceptor[w]
            if rank_acceptor[w][m] < rank_acceptor[w][m_prime]:  # w prefers m
                partner_acceptor[w] = m
                free.append(m_prime)
            else:
                free.append(m)

    # Build result: partner_proposer[m] = w such that partner_acceptor[w] == m
    partner_proposer = [0] * n
    for w in range(n):
        m = partner_acceptor[w]
        if m != -1:
            partner_proposer[m] = w
    return partner_proposer

Examples Section

Example 1: Two-by-Two

Example: n = 2. Proposers 0, 1; acceptors 0, 1. Preferences: proposer 0: [1, 0], proposer 1: [0, 1]; acceptor 0: [0, 1], acceptor 1: [1, 0]. So proposer 0 prefers acceptor 1 then 0; proposer 1 prefers 0 then 1; acceptor 0 prefers proposer 0 then 1; acceptor 1 prefers proposer 1 then 0.

n = 2
pref_proposer = [[1, 0], [0, 1]]   # 0 prefers 1>0, 1 prefers 0>1
pref_acceptor = [[0, 1], [1, 0]]   # 0 prefers 0>1, 1 prefers 1>0
result = gale_shapley(n, pref_proposer, pref_acceptor)
print(result)  # [1, 0] -> proposer 0 gets acceptor 1, proposer 1 gets acceptor 0

The resulting matching pairs proposer 0 with acceptor 1 and proposer 1 with acceptor 0. Both proposers get their first choice, so neither would switch, and there is no blocking pair.

Example 2: Three-by-Three

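Example: n = 3. Proposers' preferences: 0: [1,0,2], 1: [2,0,1], 2: [0,1,2]. Acceptors' preferences: 0: [2,1,0], 1: [0,2,1], 2: [1,0,2]. Each proposer has a different first choice (0 → 1, 1 → 2, 2 → 0), so every first proposal is accepted and never displaced. Gale-Shapley returns [1, 2, 0], which is the proposer-optimal stable matching.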
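
To confirm that a matching really has no blocking pair, a small verifier can help. The sketch below is an illustrative helper rather than part of the implementation above: the name is_stable and its argument layout are assumptions chosen to mirror the inputs and return value of gale_shapley, and it assumes the matching it receives is perfect. It tests every (proposer, acceptor) pair and reports whether any pair would both prefer each other over their assigned partners.

```python
from typing import List


def is_stable(
    pref_proposer: List[List[int]],
    pref_acceptor: List[List[int]],
    partner_proposer: List[int],
) -> bool:
    """Return True if the perfect matching has no blocking pair (hypothetical helper)."""
    n = len(partner_proposer)

    # rank_p[m][w] = position of acceptor w in proposer m's list (0 = best)
    # rank_a[w][m] = position of proposer m in acceptor w's list (0 = best)
    rank_p = [[0] * n for _ in range(n)]
    rank_a = [[0] * n for _ in range(n)]
    for m in range(n):
        for r, w in enumerate(pref_proposer[m]):
            rank_p[m][w] = r
    for w in range(n):
        for r, m in enumerate(pref_acceptor[w]):
            rank_a[w][m] = r

    # Invert the matching: partner_acceptor[w] = proposer matched to w
    partner_acceptor = [-1] * n
    for m, w in enumerate(partner_proposer):
        partner_acceptor[w] = m

    for m in range(n):
        for w in range(n):
            if partner_proposer[m] == w:
                continue  # already matched to each other
            m_prefers_w = rank_p[m][w] < rank_p[m][partner_proposer[m]]
            w_prefers_m = rank_a[w][m] < rank_a[w][partner_acceptor[w]]
            if m_prefers_w and w_prefers_m:
                return False  # (m, w) is a blocking pair
    return True


# Example 1 preferences
ex1_p = [[1, 0], [0, 1]]
ex1_a = [[0, 1], [1, 0]]
print(is_stable(ex1_p, ex1_a, [1, 0]))  # True

# Example 2 preferences
ex2_p = [[1, 0, 2], [2, 0, 1], [0, 1, 2]]
ex2_a = [[2, 1, 0], [0, 2, 1], [1, 0, 2]]
print(is_stable(ex2_p, ex2_a, [1, 2, 0]))  # True
print(is_stable(ex2_p, ex2_a, [0, 1, 2]))  # False: proposer 0 and acceptor 1 block
```

In Example 1 the swapped matching [0, 1] also passes this check: it is the stable matching in which the acceptors, rather than the proposers, get their first choices.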

Time and Space Complexity

Time: O(n²) — each proposer proposes at most n times; each proposal is O(1) with rank lookup.
Space: O(n²) for preference and rank arrays; O(n) for matching state.

Common Mistakes

Common Mistake: Comparing preferences by value instead of rank. Acceptors choose between two proposers by rank (lower rank = preferred). Precompute rank_acceptor[j][m] so comparison is O(1).

Common Mistake: Letting a proposer propose again to the same acceptor after being rejected. Track next_proposal[m] so each proposer moves down their list and never re-proposes to the same acceptor.

Pattern Recognition

Use Gale-Shapley when you see:

"Stable matching," "stable marriage," "college admissions," "internship matching" with two sides and preference lists.
Problems asking for a matching with no "blocking pair" or "no one would switch."

Summary

Stable matching: perfect matching with no blocking pair. Gale-Shapley always finds one.
Proposers propose in preference order; acceptors keep best offer. O(n²) time.
Output is proposer-optimal and acceptor-pessimal among all stable matchings.

14.1 Recursion Deep Dive

Introduction

Recursion is one of the most powerful and elegant ideas in computer science, but it is also one of the most misunderstood by beginners. In this deep dive, you will build a rock-solid mental model for how recursion works, how Python actually executes recursive functions under the hood, how to design your own recursive solutions, and how to analyze their time and space complexity. By the end, recursion will feel less like “magic” and more like a predictable, mechanical process you can control.

Real-World Analogy

Imagine a line of people, each holding an envelope. The first person is asked:

If your envelope says “STOP”, open it and read the number inside.
Otherwise, pass the question to the next person in line, wait for their answer, then add 1 to it and say that.

Eventually, someone’s envelope says “STOP”; that person answers with, say, 0. Then the person just before them adds 1 and answers 1, the one before answers 2, and so on back to the start. Each person only does a tiny bit of work and relies on the next person for the rest. This is exactly how recursion works: each call handles a tiny part of the job and delegates the rest to a “smaller” version of the same problem.

Formal Definition

Informally, a function is recursive if it calls itself (directly or indirectly). Formally:

Concept Note: A recursive definition describes an object or function in terms of smaller instances of itself, together with one or more base cases that do not recurse.

Every well-formed recursive function has:

Base case(s): Simple input(s) where the answer is known immediately and no further recursion is needed.
Recursive case(s): Rule(s) that reduce the current problem to one or more smaller subproblems of the same type.

Why This Topic Matters

Interview relevance: Many classic interview problems (trees, backtracking, divide-and-conquer, DP) are naturally recursive.
Expressiveness: Recursive code often mirrors the mathematical or combinatorial definition of the problem, making it easier to reason about.
Foundation for advanced topics: Backtracking, dynamic programming, and many graph and tree algorithms build directly on recursion.

Mental Model: The Call Stack

When you call a function in Python, the interpreter allocates a stack frame that stores:

Parameter values
Local variables
Return address (where to continue after the function returns)

With recursion, each recursive call gets its own frame. Think of the call stack as a stack of plates: each call pushes a new plate; when a call returns, its plate is popped off.

The stack is drawn growing downward: the bottom line of each snapshot is the most recent call (the top of the stack).

Before any calls:
  [main]

Call fact(3):
  [main]
  [fact(3)]

fact(3) calls fact(2):
  [main]
  [fact(3)]
  [fact(2)]

fact(2) calls fact(1):
  [main]
  [fact(3)]
  [fact(2)]
  [fact(1)]

fact(1) calls fact(0), which pushes one more frame, hits the base case (0! = 1), and returns 1.
Then fact(1) returns 1 * 1, fact(2) returns 2 * 1, fact(3) returns 3 * 2, and each frame is popped as its call returns.
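
To watch the stack grow and shrink, one option is to print each call indented by its depth. This is only a tracing sketch: the depth parameter and the name fact_traced are additions for illustration and are not needed by a real factorial function.

```python
def fact_traced(n: int, depth: int = 0) -> int:
    """Factorial that prints one line when a frame is pushed and one when it is popped."""
    indent = "  " * depth
    print(f"{indent}push fact({n})")
    if n == 0:
        result = 1  # base case: 0! = 1
    else:
        result = n * fact_traced(n - 1, depth + 1)
    print(f"{indent}pop  fact({n}) -> {result}")
    return result


fact_traced(3)
```

The most deeply indented lines correspond to the moment when every frame from fact(3) down to the base case is on the stack at once; the "pop" lines then appear in the reverse order of the "push" lines.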

Step-by-Step Breakdown: Factorial

The factorial of a non-negative integer n, written n!, is defined as:

0! = 1
n! = n × (n − 1)! for n ≥ 1

Notice that the definition itself is recursive: n! is defined in terms of (n − 1)!.

Recursive Design Recipe

Define the problem clearly: Input: integer n ≥ 0. Output: n!.
Find the base case: What is the simplest n you can handle directly? Here, 0! = 1.
Assume the recursive call works: Assume you already have a function that correctly computes (n − 1)!.
Write the recursive step: Using (n − 1)!, how do you get n!? Answer: n! = n × (n − 1)!
Make progress: Ensure each recursive call moves toward the base case (n gets smaller).

Python Implementation: Factorial

def factorial(n: int) -> int:
    """Compute n! for n >= 0 using recursion."""
    if n < 0:
        raise ValueError("n must be non-negative")

    # Base case
    if n == 0:
        return 1

    # Recursive case: n! = n * (n - 1)!
    return n * factorial(n - 1)

Line-by-Line Explanation

if n < 0: ... – we first guard against invalid input.
if n == 0: return 1 – this is the base case; no more recursion.
return n * factorial(n - 1) – we trust that factorial(n - 1) works (inductive assumption), multiply by n, and return.

Execution Trace for factorial(3)

factorial(3)
  -> 3 * factorial(2)
                 |
                 v
            factorial(2)
              -> 2 * factorial(1)
                             |
                             v
                        factorial(1)
                          -> 1 * factorial(0)
                                         |
                                         v
                                    factorial(0)
                                      -> 1  (base case)

Unwinding:
  factorial(0) returns 1
  factorial(1) returns 1 * 1 = 1
  factorial(2) returns 2 * 1 = 2
  factorial(3) returns 3 * 2 = 6

Another Example: Sum of an Array

Problem: Given a list of numbers, return their sum.

Recursive idea:

Base case: The sum of an empty list is 0.
Recursive case: sum of [x] + rest is x + sum(rest).

from typing import List


def recursive_sum(arr: List[int]) -> int:
    if not arr:  # base case: empty list
        return 0

    first = arr[0]
    rest = arr[1:]
    return first + recursive_sum(rest)

Common Mistake: Beginners sometimes forget to shrink the problem. Here, if you wrote return arr[0] + recursive_sum(arr), the list would never get smaller, and you would get infinite recursion until Python raises RecursionError.
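
This failure mode is easy to reproduce. The sketch below deliberately repeats the bug (the name broken_sum is hypothetical) and catches the resulting RecursionError. sys.getrecursionlimit() reports the depth at which the interpreter gives up, which is 1000 by default in CPython.

```python
import sys
from typing import List


def broken_sum(arr: List[int]) -> int:
    if not arr:
        return 0
    # Bug: the same list is passed again, so the base case is never reached.
    return arr[0] + broken_sum(arr)


print("recursion limit:", sys.getrecursionlimit())
try:
    broken_sum([1, 2, 3])
except RecursionError as exc:
    print("RecursionError:", exc)
```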

Time and Space Complexity

Factorial and Recursive Sum

Time complexity: For both factorial(n) and recursive_sum(arr), we make exactly one recursive call with a “smaller” input each time.

Let T(n) be the time to compute factorial(n). We have:

T(0) = O(1)         # base case
T(n) = T(n - 1) + O(1)   # one recursive call plus constant work

If you expand this recurrence:

T(n) = T(n - 1) + c
     = T(n - 2) + c + c
     = ...
     = T(0) + n * c
     = O(n)

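Time: O(n) for factorial(n), counting each multiplication as O(1). recursive_sum also makes n + 1 calls on a list of length n, but as written each call copies the remaining elements with arr[1:], so its total time is O(n²); passing an index instead of slicing brings it back to O(n).
Space (call stack): O(n), because up to n + 1 calls are on the stack at once before unwinding. For recursive_sum as written, the slices kept alive by those frames add up to O(n²) extra memory.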
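
One detail worth noting for recursive_sum: the slice arr[1:] copies the remaining elements on every call, which is extra work on top of the recurrence above. A possible variant, sketched here under the hypothetical name recursive_sum_from, passes a start index instead so that each call does constant work.

```python
from typing import List


def recursive_sum_from(arr: List[int], i: int = 0) -> int:
    """Sum arr[i:] recursively without copying the list."""
    if i == len(arr):  # base case: nothing left to add
        return 0
    return arr[i] + recursive_sum_from(arr, i + 1)


print(recursive_sum_from([3, 1, 4, 1, 5]))  # 14
```

This version still uses one stack frame per element, so it remains subject to the recursion limit on long lists.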

Brute Force → Better → Optimal (Recursion vs Iteration)

Brute Force Thinking

A beginner might first write factorial using a loop (iterative):

def factorial_iterative(n: int) -> int:
    if n < 0:
        raise ValueError("n must be non-negative")
    result = 1
    for k in range(1, n + 1):
        result *= k
    return result

This is already efficient (O(n) time, O(1) extra space). Recursion does not automatically make things faster—it makes them clearer when the problem is naturally recursive (trees, backtracking, divide-and-conquer).

Recursive “Better” for Structure

For problems with a recursive structure (e.g., binary trees: process root, left subtree, right subtree), the recursive version is often the most natural and concise. For example, a recursive tree traversal is usually much easier to write correctly than a version that manages its own explicit stack.
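
As a small illustration of that point, here is a sketch of an in-order traversal. The Node class is a minimal stand-in defined only for this example; real code would use whatever tree type the problem provides.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    val: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def inorder(root: Optional[Node]) -> List[int]:
    """Return the values in left, root, right order."""
    if root is None:  # base case: empty subtree
        return []
    return inorder(root.left) + [root.val] + inorder(root.right)


#     2
#    / \
#   1   3
tree = Node(2, Node(1), Node(3))
print(inorder(tree))  # [1, 2, 3]
```

The function body mirrors the definition of the traversal almost word for word. List concatenation keeps the sketch short; appending to a shared result list would avoid the repeated copying on large trees.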

Optimization Insight: In Python specifically, recursion adds overhead and uses the call stack. For simple linear problems (like summing a list or computing factorial), an iterative solution is typically more memory-efficient and can be faster. Use recursion when it matches the natural structure of the problem (trees, divide-and-conquer, backtracking) or when clarity matters more than a small constant-factor overhead.

Tail Recursion (Preview)

A recursive call is in tail position if it is the very last operation in the function (nothing remains to do after the recursive call returns). For example:

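def tail_recursive_factorial(n: int, acc: int = 1) -> int:
    if n < 0:
        raise ValueError("n must be non-negative")  # otherwise n never reaches 0
    if n == 0:
        return acc
    return tail_recursive_factorial(n - 1, acc * n)  # tail call: nothing left to do afterwards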

In some languages (e.g., Scheme, some C compilers), the compiler can optimize tail recursion to reuse the same stack frame (tail call optimization), making space O(1). Python does not perform tail call optimization, so tail-recursive functions in Python still use O(n) stack space and are subject to the recursion limit.

Common Mistakes

Common Mistake: Missing or incorrect base case. If your base case never triggers (or is wrong), the recursion keeps going until Python hits its recursion limit and raises RecursionError.

Common Mistake: Not making progress toward the base case. Always ensure each recursive call moves closer to the base case (e.g., n - 1, index + 1, smaller subarray, smaller tree).

Common Mistake: Doing extra work after recursion when not needed. Sometimes you can simplify and reduce overhead by pushing more of the work into the parameters (accumulators) instead of doing complex work while unwinding.

Interview Insight

Interview Insight: When asked to solve a recursive problem in an interview:

Start by clearly stating the base case(s) and what they return.
Define the recursive step verbally (“Assume I can solve the smaller problem; here’s how I use it”).
Draw a small recursion tree or call stack for a tiny input (n = 3, 4) to verify correctness.
Then write code that exactly matches your definition.

Practice Problems

Compute the nth Fibonacci number recursively (then think about why the naive version is slow).
Given a string, return its reverse using recursion.
Count how many times a target value appears in a list using recursion.
Given a binary tree, compute its height and node count recursively.

Summary

Every recursive function needs clear base case(s) and recursive case(s) that shrink the problem.
The call stack holds one frame per active call, giving recursion an extra O(depth) space cost.
Linear recursive algorithms like factorial run in O(n) time and O(n) stack space; a recursive sum matches that only if it passes an index instead of slicing, since each arr[1:] copy costs O(n).
Use recursion when it matches the natural structure of the problem (trees, divide-and-conquer, backtracking) and favor iteration for simple linear tasks in Python.

14.2 Tail Recursion

Introduction

Not all recursive calls are created equal. When the recursive call is the last thing your function does—with no further computation after it returns—we say the call is in tail position. Such functions are called tail-recursive. In languages that support tail call optimization (TCO), tail recursion can use constant stack space instead of growing the stack with every call. Python does not perform TCO, but understanding tail recursion sharpens your reasoning about recursion, stack usage, and how to convert recursive ideas into efficient iterative code.

Real-World Analogy

Imagine passing a baton in a relay. In ordinary recursion, the runner receives the baton, runs a leg, then waits for the next runner to finish the rest of the race, and only then does something with the result (e.g., adds their own time). In tail recursion, the runner passes the baton and is done—they do nothing after the handoff. The “result” is carried forward in the baton itself (like an accumulator). When the last runner crosses the line, the final answer is already in the baton; no one needs to “add up” work on the way back.

Formal Definition

Concept Note: A function is tail-recursive if every recursive call it makes is in tail position: that is, the return value of the recursive call is returned directly (or is the entire result of the function) with no further computation performed after the call returns.

More precisely:

Tail position: A call is in tail position if it is the last action of the function, so its result is returned as-is with no extra work.
Tail call: A function call that is in tail position.
Tail call optimization (TCO): A compiler/runtime optimization that reuses the current stack frame for a tail call, so the stack does not grow.

Why This Topic Matters

Stack safety: In TCO-supporting languages, tail-recursive code can run in O(1) stack space, avoiding stack overflow on large inputs.
Conversion to loops: Any tail-recursive function can be mechanically converted to an equivalent loop with no recursion—useful in Python where TCO is absent.
Interview clarity: Interviewers sometimes ask “can you make this tail-recursive?” or “convert this to iterative”; knowing the pattern helps.

Mental Model: Accumulator Pattern

A common way to make a recursive function tail-recursive is to add an accumulator parameter that carries the “result so far.” The base case then returns the accumulator (or a function of it); the recursive case updates the accumulator and passes it down, with no work left to do after the recursive call returns.

Ordinary recursion (factorial):
  fact(n) = n * fact(n-1)   →  work AFTER the call returns (multiply by n)

Tail recursion (factorial with accumulator acc):
  fact_tail(n, acc) = fact_tail(n-1, n * acc)   →  no work after the call; result is in acc when n=0

Step-by-Step: Converting Factorial to Tail-Recursive Form

Original: fact(n) = n * fact(n-1), base case fact(0) = 1.
Introduce accumulator: Define fact_tail(n, acc) meaning “compute n! × acc.” So fact(n) = fact_tail(n, 1).
Recursive rule: fact_tail(n, acc) = fact_tail(n-1, n * acc)—the “n ×” is folded into acc.
Base case: fact_tail(0, acc) = acc (we have accumulated the full n! in acc).
The recursive call is now the last operation; no multiplication after return.

Python Implementation

Tail-recursive factorial

def factorial_tail(n: int, acc: int = 1) -> int:
    """Tail-recursive factorial. fact(n) = factorial_tail(n, 1)."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return acc
    return factorial_tail(n - 1, n * acc)

Tail-recursive sum of list (using index to avoid slicing)

To avoid O(n) slicing per call, we pass an index and the list:

from typing import List


def sum_tail(arr: List[int], i: int = 0, acc: int = 0) -> int:
    """Tail-recursive sum: sum(arr) = sum_tail(arr, 0, 0)."""
    if i == len(arr):
        return acc
    return sum_tail(arr, i + 1, acc + arr[i])

Line-by-Line Explanation (factorial_tail)

acc holds the partial product so far: by the time n reaches 0, acc equals (the original n)! × acc_initial. With acc_initial = 1, that is exactly n!.
if n == 0: return acc — base case; no more recursion; the answer is in acc.
return factorial_tail(n - 1, n * acc) — single return of the recursive call; nothing is done after the call returns. So this is a tail call.

Ordinary vs Tail Recursion: Comparison

Aspect	Ordinary recursion	Tail recursion
After recursive call returns	More work (e.g. multiply by n)	Nothing; return that result
Stack in Python	O(n) frames	O(n) frames (no TCO)
Stack with TCO	Still O(n)	O(1)
Conversion to loop	Need explicit stack or different rewrite	Straightforward: loop that updates (n, acc) until n=0

Tail Recursion to Iteration (Python-Friendly)

Because Python does not do TCO, the “optimal” way to get constant stack space is to convert the tail-recursive function into a loop. The transformation is mechanical:

Replace the recursive function with a loop.
Loop variables = the parameters that change (e.g. n, acc).
Base case → loop exit condition; return value = accumulator (or final state).
Recursive case → update variables and continue loop.

def factorial_iterative(n: int) -> int:
    if n < 0:
        raise ValueError("n must be non-negative")
    acc = 1
    while n != 0:
        acc = n * acc
        n = n - 1
    return acc

This is exactly the same computation as factorial_tail(n, 1), but with no recursive calls and O(1) extra space.
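
The same recipe applies to sum_tail. In this sketch (the name sum_iterative is an assumption), the parameters i and acc become loop variables, the base case i == len(arr) becomes the exit condition, and the recursive call becomes the variable update.

```python
from typing import List


def sum_iterative(arr: List[int]) -> int:
    i, acc = 0, 0             # initial arguments of sum_tail(arr, 0, 0)
    while i != len(arr):      # base case i == len(arr) ends the loop
        acc = acc + arr[i]    # same updates as sum_tail(arr, i + 1, acc + arr[i])
        i = i + 1
    return acc


print(sum_iterative([3, 1, 4, 1, 5]))  # 14
```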

Time and Space Complexity

Time: Same as the ordinary recursive version—O(n) for factorial and for list sum (with index, no slicing).
Space in Python: Tail-recursive factorial_tail and sum_tail still use O(n) stack space because there is no TCO. The iterative version uses O(1) extra space.
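
A quick experiment makes the difference visible. The sketch below redefines the two factorial variants compactly (input guards omitted for brevity, and factorial_loop is a name chosen for this example) so that it can run on its own, then calls both with an input deeper than the interpreter's recursion limit. The choice of limit + 100 is arbitrary.

```python
import sys


def factorial_tail(n: int, acc: int = 1) -> int:
    if n == 0:
        return acc
    return factorial_tail(n - 1, n * acc)


def factorial_loop(n: int) -> int:
    acc = 1
    while n != 0:
        acc, n = n * acc, n - 1
    return acc


deep = sys.getrecursionlimit() + 100

try:
    factorial_tail(deep)
    print("tail-recursive version finished")
except RecursionError:
    print("tail-recursive version hit the recursion limit")

# The loop has no such limit. Print only the bit length because the value is huge.
print("loop version bit length:", factorial_loop(deep).bit_length())
```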

Edge Cases

n < 0: Define behavior (e.g. raise); tail version handles it the same as ordinary version.
n = 0 or empty list: Base case returns accumulator; ensure initial accumulator is correct (1 for factorial, 0 for sum).

Common Mistakes

Common Mistake: Assuming Python optimizes tail calls. It does not. Writing tail-recursive code in Python does not by itself reduce stack usage; convert to a loop if you need O(1) stack.

Common Mistake: Wrong initial accumulator. For factorial, factorial_tail(n, 0) would always return 0. The public API should call the tail helper with the correct initial value (e.g. factorial_tail(n, 1)).

Optimization Insight: When you need constant stack space in Python, prefer rewriting the tail-recursive logic as a while loop over relying on tail recursion. The loop is the “TCO by hand” and is the standard Pythonic approach.

Interview Insight

Interview Insight: If asked to “make it tail-recursive” or “use O(1) space”: (1) Add an accumulator (and possibly an index) so that no work is done after the recursive call. (2) In Python, mention that you’d convert the tail-recursive version to a loop for real O(1) stack usage. Showing both the tail-recursive form and the equivalent loop demonstrates depth.

Practice Problems

Write a tail-recursive version of “reverse a string” (e.g. pass index and an accumulator string or list).
Write a tail-recursive “length of list” and then convert it to a loop.
Implement “power(a, b)” (a^b) tail-recursively using an accumulator, then as a loop.

Summary

Tail recursion means the recursive call is in tail position—nothing is done after it returns.
Use an accumulator (and sometimes an index) to carry partial results and make the last step a single recursive call.
With TCO, tail recursion uses O(1) stack; Python does not do TCO, so stack remains O(n).
Convert tail-recursive functions to a loop in Python for O(1) stack and to avoid recursion limits.

14.3 Subsets

Introduction

Given a set (or list) of distinct elements, the subsets problem asks: generate all possible subsets. The empty set and the set itself count as subsets. This is a fundamental backtracking pattern: at each step you have a choice—include the current element or exclude it—and you explore both options recursively. Mastering this pattern unlocks subset-sum, combination, and many “generate all possibilities” interview problems.

Real-World Analogy

Imagine packing a suitcase from a row of items. For each item you ask: “Do I take it or leave it?” You don’t need to decide the order—only which items are in the bag. Every possible combination of “take/leave” gives one subset. Empty bag = empty set; all items = full set. Recursion walks through the row: at each item you branch into “include” and “exclude,” then recurse on the rest. When you’ve passed every item, the current bag is one subset—record it and return.

Formal Definition

Concept Note: A subset of a set S is any set whose elements all belong to S (including ∅ and S itself). If S has n elements, there are exactly 2ⁿ subsets. The subsets problem: given an array (or list) of n distinct elements, output a collection of all 2ⁿ subsets (each subset typically as a list).

We do not consider order: [1, 2] and [2, 1] are the same subset. Duplicates in the input are often disallowed so that subsets are uniquely defined.

Why This Topic Matters

Backtracking foundation: The include/exclude choice at each index is the core of many recursive enumeration problems.
Interview staple: “Subsets,” “Subsets II” (with duplicates), and “Subset Sum” appear frequently.
Exponential size: The output has 2ⁿ subsets containing n · 2ⁿ⁻¹ elements in total, so just writing them out takes Ω(n · 2ⁿ) time; the goal is to generate them in O(n · 2ⁿ) without extra waste.

Mental Model: Decision Tree

For nums = [1, 2, 3], think of a tree where:

Level i corresponds to index i (element nums[i]).
Each node has two children: “include nums[i]” and “exclude nums[i]”.
Each leaf is a complete subset (after processing all n indices).

                    start
                   /      \
            include 1    exclude 1
             /    \       /    \
         inc 2  exc 2  inc 2  exc 2
          ...   ...     ...   ...
(Leaves: [1,2,3], [1,2], [1,3], [1], [2,3], [2], [3], []  → 8 = 2^3)

Step-by-Step Breakdown

State: Current index i, and a current subset (path) path built so far.
Base case: When i == len(nums), we have fixed include/exclude for every element; path is one subset—append a copy to the result.
Recursive case: For index i, two choices:
- Exclude: Recurse with i + 1 and same path.
- Include: Append nums[i] to path, recurse with i + 1, then backtrack (pop) so the same path can be reused for other branches.
Order of exploration (exclude then include, or vice versa) only affects the order of subsets in the output; both are correct.

Python Implementation

from typing import List


def subsets(nums: List[int]) -> List[List[int]]:
    result: List[List[int]] = []

    def backtrack(i: int, path: List[int]) -> None:
        if i == len(nums):
            result.append(path[:])  # copy of current subset
            return
        # Choice 1: exclude nums[i]
        backtrack(i + 1, path)
        # Choice 2: include nums[i]
        path.append(nums[i])
        backtrack(i + 1, path)
        path.pop()  # backtrack

    backtrack(0, [])
    return result

Line-by-Line Explanation

result collects all subsets; path is the current partial subset (modified and restored during recursion).
if i == len(nums): result.append(path[:]) — base case: we’ve decided for every element; append a copy of path (so later backtracking doesn’t change stored subsets).
backtrack(i + 1, path) — exclude nums[i]; recurse on the rest.
path.append(nums[i]); backtrack(i + 1, path); path.pop() — include nums[i], recurse, then undo so the next sibling branch sees the same path.

Common Mistake: Forgetting path.pop() or appending path instead of path[:]. Without the copy, every entry in result references the same list, which is empty once backtracking finishes. Without the pop, elements added in an include branch leak into branches explored later.

Example Walkthrough

For nums = [1, 2]:

Start: i=0, path=[].
Exclude 1: backtrack(1, []). Exclude 2 → base case at i=2 → add []; include 2 → add [2], then pop 2.
Include 1: path=[1], then backtrack(1, [1]). Exclude 2 → add [1]; include 2 → add [1,2]. Pop 2 then pop 1.
Result: [[], [2], [1], [1, 2]] (order may vary by implementation).

Time and Space Complexity

Time: O(n · 2ⁿ). There are 2ⁿ leaves; each base case does O(n) work to copy path. So total O(n · 2ⁿ).
Space (excluding output): O(n) for recursion stack and path. Space for output is O(n · 2ⁿ) to store all subsets.

Edge Cases

Empty input: nums = [] → one subset []. Our base case at i=0 appends path[:] = [], so result = [[]]. Correct.
Single element: nums = [1] → [[], [1]]. Handled by same logic.

Subsets II (With Duplicates)

If the array may contain duplicates and we want unique subsets (as multisets of values), we must avoid generating the same subset more than once through different copies of an equal value. Standard approach: sort the array so equal values are adjacent; then, when choosing the next element to add at a given recursion level, skip any candidate that equals the one just before it at that same level. Each distinct value then starts at most one branch per level.

def subsets_with_dup(nums: List[int]) -> List[List[int]]:
    result: List[List[int]] = []
    nums = sorted(nums)

    def backtrack(i: int, path: List[int]) -> None:
        result.append(path[:])
        for j in range(i, len(nums)):
            if j > i and nums[j] == nums[j - 1]:
                continue  # skip duplicate
            path.append(nums[j])
            backtrack(j + 1, path)
            path.pop()

    backtrack(0, [])
    return result

Here we use a “for-loop over next start index” style: at each step we choose the next index j to include (and skip duplicate values). Result still contains all unique subsets.
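
One way to sanity-check a Subsets II implementation is to compare it with a brute-force reference. The sketch below is such a reference, not the backtracking method itself: it enumerates every combination with itertools.combinations and removes repeats through a set of tuples. The name unique_subsets_bruteforce is hypothetical.

```python
from itertools import combinations
from typing import List


def unique_subsets_bruteforce(nums: List[int]) -> List[List[int]]:
    """All distinct subsets (each as a sorted list), found by brute force."""
    nums = sorted(nums)  # sorting makes equal subsets produce equal tuples
    seen = set()
    for size in range(len(nums) + 1):
        for combo in combinations(nums, size):
            seen.add(combo)
    return sorted(list(c) for c in seen)


print(unique_subsets_bruteforce([1, 2, 2]))
# [[], [1], [1, 2], [1, 2, 2], [2], [2, 2]]
```

For [1, 2, 2] there are 6 unique subsets rather than 2³ = 8. Sorting the output of a backtracking solution should give the same list.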

Alternative: Bitmask

Each subset can be represented by a bitmask of length n: bit i is 1 if the element at index i is included. We can iterate over all integers from 0 to 2ⁿ − 1 and decode each into a subset. Time still O(n · 2ⁿ); no recursion, but the pattern is less flexible for “subset sum” or “skip duplicates” variants.
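
A minimal sketch of the bitmask idea follows. The function name subsets_bitmask is an assumption; the logic is just the decoding step described above, with bit i of the mask deciding whether nums[i] is included.

```python
from typing import List


def subsets_bitmask(nums: List[int]) -> List[List[int]]:
    n = len(nums)
    result: List[List[int]] = []
    for mask in range(1 << n):  # masks 0 .. 2^n - 1
        # Keep nums[i] exactly when bit i of mask is set.
        subset = [nums[i] for i in range(n) if mask & (1 << i)]
        result.append(subset)
    return result


print(subsets_bitmask([1, 2, 3]))
# [[], [1], [2], [1, 2], [3], [1, 3], [2, 3], [1, 2, 3]]
```

The output order follows the binary counting order of the masks, which differs from the order produced by the recursive version.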

Common Mistakes

Common Mistake: Appending path instead of path[:]. You must append a copy so that later backtracking doesn’t mutate the subsets already stored in result.

Common Mistake: In Subsets II, skipping duplicates incorrectly. The rule: when building from index i, allow the first occurrence of a value in the remaining segment, and skip j > i where nums[j] == nums[j-1] so you don’t start two branches that would form the same subset.

Interview Insight: State clearly: “At each index I have two choices—include or exclude—and I backtrack after include.” Mention copying path when recording a subset. For duplicates, say “sort and skip duplicate starts.”

Practice Problems

LeetCode 78: Subsets (distinct elements).
LeetCode 90: Subsets II (with duplicates).
Subset sum: list all subsets that sum to a target (same tree, add a sum parameter and optionally prune).

Summary

Subsets: generate all 2ⁿ subsets via include/exclude at each index; base case appends a copy of path.
Always path.pop() after the “include” branch to backtrack.
Time O(n · 2ⁿ); space O(n) for recursion and path, excluding the O(n · 2ⁿ) output.
With duplicates (Subsets II): sort and skip duplicate “next element” choices to get unique subsets.

14.4 Permutations

Introduction

A permutation of a sequence is a rearrangement of its elements. Unlike subsets, order matters: [1, 2, 3] and [3, 2, 1] are different permutations. Given n distinct elements, there are n! permutations. The permutations problem asks: generate all n! permutations. This is another core backtracking pattern: at each step you “place” one element in the next position by choosing from the remaining elements, recurse, then undo the choice. Mastering this pattern is essential for ordering problems, anagrams, and “arrange all” style questions.

Real-World Analogy

Imagine lining up n people in a row. The first position: you can pick any of n people. The second position: any of the remaining n−1. Then n−2, and so on. Each full arrangement is one permutation. Backtracking does exactly this: “place person A in position 0, then recursively arrange the rest; then try person B in position 0, and so on.” When everyone is placed, you have one permutation—record it. Then undo the last placement and try the next candidate.

Formal Definition

Concept Note: A permutation of a set (or list) of n elements is an ordered arrangement of all n elements. Two permutations are the same only if every position has the same element. For n distinct elements there are n! = n × (n−1) × … × 1 permutations. The permutations problem: given an array of n elements, output a collection of all n! permutations (each as a list).

Order distinguishes permutations; duplicates in the input complicate “unique” permutations (handled in Permutations II).

Why This Topic Matters

Backtracking core: “Choose one unused element for the next slot, recurse, then unchoose” is the standard way to enumerate orderings.
Interview staple: LeetCode 46 (Permutations), 47 (Permutations II), and many “arrange” / “anagram” problems.
Size: Output has n! permutations; time is at least Ω(n · n!) to write them; we aim for O(n · n!) without extra waste.

Mental Model: Placement Tree

For nums = [1, 2, 3]:

Level 0: choose which element goes in position 0 (three choices).
Level 1: choose which remaining element goes in position 1 (two choices).
Level 2: one element left → one choice; then we have a full permutation.

Position:  0    1    2
Choice 1:  1  → 2  → 3   [1,2,3]
            \   \→ 3 → 2   [1,3,2]
Choice 2:  2  → 1  → 3   [2,1,3]
         ... etc. (6 leaves = 3!)

Step-by-Step Breakdown

State: Current position pos (0 to n), current partial permutation path, and which elements are still available (e.g. a list or a “used” boolean array).
Base case: When pos == n, every position is filled; path is one permutation—append a copy to the result.
Recursive case: For position pos, try every available element: put it at path[pos], mark it used, recurse to pos + 1, then unmark and backtrack.
Alternatively, swap elements in the original array so that “used” items are in the prefix and “available” in the suffix; then swap back after recursing.

Python Implementation (Using a “Used” Array)

from typing import List


def permutations(nums: List[int]) -> List[List[int]]:
    result: List[List[int]] = []
    n = len(nums)
    used = [False] * n
    path: List[int] = []

    def backtrack(pos: int) -> None:
        if pos == n:
            result.append(path[:])
            return
        for i in range(n):
            if used[i]:
                continue
            path.append(nums[i])
            used[i] = True
            backtrack(pos + 1)
            used[i] = False
            path.pop()

    backtrack(0)
    return result

Line-by-Line Explanation

used[i] is True if nums[i] is already in path.
if pos == n: result.append(path[:]) — base case: all positions filled; append a copy of path.
for i in range(n) — try every index; if used[i]: continue skips already-placed elements.
path.append(nums[i]); used[i] = True; backtrack(pos + 1); used[i] = False; path.pop() — place element, recurse, then undo so other branches can use it.

Python Implementation (Swap-Based, In-Place)

We can keep “available” elements in nums[pos .. n-1]. For each j >= pos, swap nums[pos] and nums[j], recurse on pos + 1, then swap back.

def permutations_swap(nums: List[int]) -> List[List[int]]:
    result: List[List[int]] = []
    n = len(nums)

    def backtrack(pos: int) -> None:
        if pos == n:
            result.append(nums[:])
            return
        for j in range(pos, n):
            nums[pos], nums[j] = nums[j], nums[pos]
            backtrack(pos + 1)
            nums[pos], nums[j] = nums[j], nums[pos]

    backtrack(0)
    return result

Here “choose element at index j for position pos” is done by swapping; after recursion we restore the array so the next j gets the original arrangement.

Time and Space Complexity

Time: O(n · n!). There are n! leaves; each base case does O(n) to copy the permutation. So total O(n · n!).
Space (excluding output): O(n) for the recursion stack, path, and used. The swap-based version needs no path or used array, but its recursion stack is still O(n) deep. Output space is O(n · n!).
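
The n! growth can be confirmed with the standard library. This sketch uses itertools.permutations and math.factorial purely as a reference; it is not the backtracking implementation above, and the alias iter_permutations is chosen here only to avoid clashing with the permutations function defined earlier.

```python
from itertools import permutations as iter_permutations
from math import factorial

for n in range(5):
    items = list(range(1, n + 1))
    count = sum(1 for _ in iter_permutations(items))
    print(n, count, factorial(n))  # the two counts agree for every n

print([list(p) for p in iter_permutations([1, 2, 3])])
# [[1, 2, 3], [1, 3, 2], [2, 1, 3], [2, 3, 1], [3, 1, 2], [3, 2, 1]]
```

Already at n = 10 there are 3,628,800 permutations, so these algorithms are only practical for small inputs.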

Edge Cases

Empty input: nums = [] → one permutation []. Base case at pos=0 appends path[:] = []; result = [[]]. Correct.
Single element: nums = [1] → [[1]]. One leaf.

Permutations II (With Duplicates)

If the array contains duplicates, we must output unique permutations, with no repeated sequences. Strategy: sort the array so equal values are adjacent, then skip duplicate choices for the current position. Rule: when considering index i, if nums[i] == nums[i-1] and nums[i-1] is not currently in the path (used[i-1] is False), skip i. This forces equal values to be placed in index order, so each distinct arrangement is generated exactly once.

def permutations_ii(nums: List[int]) -> List[List[int]]:
    result: List[List[int]] = []
    nums = sorted(nums)
    n = len(nums)
    used = [False] * n
    path: List[int] = []

    def backtrack(pos: int) -> None:
        if pos == n:
            result.append(path[:])
            return
        for i in range(n):
            if used[i]:
                continue
            if i > 0 and nums[i] == nums[i - 1] and not used[i - 1]:
                continue  # skip duplicate: same value and previous not used
            path.append(nums[i])
            used[i] = True
            backtrack(pos + 1)
            used[i] = False
            path.pop()

    backtrack(0)
    return result

The condition not used[i - 1] ensures we don’t start two branches that would place the same value at the same position (which would yield duplicate permutations).
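If the index-based skip rule feels easy to get wrong, a different technique reaches the same result: count how many copies of each value remain and branch once per distinct value. This is a minimal sketch of that alternative, not the method above; the name permutations_unique_counter is hypothetical.

```python
from collections import Counter
from typing import List


def permutations_unique_counter(nums: List[int]) -> List[List[int]]:
    """Unique permutations by branching on distinct values, not indices."""
    result: List[List[int]] = []
    counts = Counter(nums)  # value -> copies still available
    path: List[int] = []
    n = len(nums)

    def backtrack() -> None:
        if len(path) == n:
            result.append(path[:])
            return
        for value in counts:  # each distinct value is tried once per position
            if counts[value] == 0:
                continue
            counts[value] -= 1
            path.append(value)
            backtrack()
            path.pop()
            counts[value] += 1  # undo so sibling branches see the copy again

    backtrack()
    return result


print(permutations_unique_counter([1, 1, 2]))
# [[1, 1, 2], [1, 2, 1], [2, 1, 1]]
```

Because the loop runs over distinct values, two equal elements can never start two separate branches at the same position, so no sorting or skip condition is needed.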

Comparison: Subsets vs Permutations

Aspect	Subsets	Permutations
Order	Does not matter	Matters
Count	2ⁿ	n!
Choice at step	Include or exclude current element	Pick one unused element for current position
State	Index + path	Position + path + used (or swap range)

Common Mistakes

Common Mistake: Appending path instead of path[:]. You must append a copy so backtracking doesn’t mutate stored permutations.

Common Mistake: Forgetting to unmark used[i] or to path.pop() after the recursive call. Without undoing, the next iteration reuses the same element or leaves the path in a wrong state.

Common Mistake: In Permutations II, using the wrong duplicate-skip condition. The standard is: skip when nums[i] == nums[i-1] and not used[i-1] (so we don’t place the same value at the same position twice via different indices).

Optimization Insight: The swap-based implementation avoids an extra used array and keeps “available” elements in a contiguous suffix. Same asymptotic time; slightly less auxiliary space.

Interview Insight: Say: “At each position I try every unused element, recurse, then backtrack.” For duplicates: “Sort and skip when the same value was already considered for this position (e.g. not used[i-1] when nums[i]==nums[i-1]).”

Practice Problems

LeetCode 46: Permutations (distinct elements).
LeetCode 47: Permutations II (with duplicates).
Next Permutation (single next in lex order)—different idea but good contrast.
Letter combinations / permutations of digits (e.g. phone keypad).

Summary

Permutations: all n! orderings; order matters; at each position choose one unused element, recurse, backtrack.
Track “used” elements (array or swap-based); base case appends a copy of the current path/array.
Time O(n · n!), space O(n) for recursion and working storage.
Permutations II: sort and skip duplicate choices (same value, previous not used) to get unique permutations.

14.5 Combination Sum

Introduction

The Combination Sum family asks: given an array of candidates and a target value, find all unique combinations of candidates whose sum equals the target. A combination is a multiset (order does not matter); [2, 2, 3] and [3, 2, 2] are the same. Two main variants: I—each candidate may be used unlimited times; II—each candidate may be used at most once, and the array may contain duplicates. Both are solved by backtracking with a “start index” to avoid listing the same combination in different orders, and (for II) skipping duplicate values correctly.

Real-World Analogy

You have coins of given denominations and must make exact change for a target amount. Each way to make change is a “combination”: which coins and how many of each. You don’t care about the order you pick coins—only the multiset. Backtracking: “Use one more coin of type A (if it keeps us ≤ target), recurse; then stop using A and try the next coin type.” That “next coin type” is the start index: we only consider candidates from index i onward so we never build [3, 2] and [2, 3] as two different combinations.

Formal Definition

Concept Note: Combination Sum I: Given distinct positive candidates and a target, find all unique combinations where the same number may be used multiple times and the sum equals target. Combination Sum II: Given candidates (possibly with duplicates) and a target, find all unique combinations where each candidate is used at most once and the sum equals target. “Unique” means no two combinations are the same when viewed as multisets (or sorted lists).

Why This Topic Matters

Classic backtracking: Combines “subset” style (which elements) with a constraint (sum = target) and optional reuse.
Interview staple: LeetCode 39 (Combination Sum), 40 (Combination Sum II), and variants (e.g. k numbers that sum to target).
Pattern: “Start index” + “use or skip” (or “use 0, 1, … times”) + prune when sum exceeds target.

Mental Model: Decision Tree with Start Index

We build combinations by deciding, for each “slot,” which candidate to use next—but we only consider candidates at or after a start index so we never generate [2, 3] and [3, 2] separately.

At each call: current sum, current path (combination so far), start index i.
Base: sum == target → record path; sum > target → return (prune).
For each j from i to n−1: add candidates[j] to path, recurse (with same j for unlimited use, or j+1 for use-once), then backtrack.

Candidates [2, 3, 5], target 5.
Start at 0: use 2 → sum=2, recurse from 0 again (unlimited) or from 1 (once).
            use 3 → sum=3, recurse...
            use 5 → sum=5 → record [5].
Pruning: if sum > target, stop that branch.

Step-by-Step Breakdown

State: start (index into candidates), path (current combination), cur_sum (sum of path).
Base cases: If cur_sum == target, append path[:] to result and return. If cur_sum > target, return (prune).
Recursive case: For j from start to len(candidates)-1:
- I (unlimited): Add candidates[j], recurse with start = j (can use j again), then pop and continue to next j.
- II (once): If duplicate (j > start and candidates[j]==candidates[j-1]), skip. Else add candidates[j], recurse with start = j+1, pop.

Python Implementation: Combination Sum I (Unlimited Use)

from typing import List


def combination_sum(candidates: List[int], target: int) -> List[List[int]]:
    result: List[List[int]] = []
    path: List[int] = []

    def backtrack(start: int, cur_sum: int) -> None:
        if cur_sum == target:
            result.append(path[:])
            return
        if cur_sum > target:
            return
        for j in range(start, len(candidates)):
            path.append(candidates[j])
            backtrack(j, cur_sum + candidates[j])  # same j: can reuse
            path.pop()

    backtrack(0, 0)
    return result

Python Implementation: Combination Sum II (Use at Most Once)

def combination_sum_ii(candidates: List[int], target: int) -> List[List[int]]:
    result: List[List[int]] = []
    path: List[int] = []
    candidates = sorted(candidates)

    def backtrack(start: int, cur_sum: int) -> None:
        if cur_sum == target:
            result.append(path[:])
            return
        if cur_sum > target:
            return
        for j in range(start, len(candidates)):
            if j > start and candidates[j] == candidates[j - 1]:
                continue  # skip duplicate: same value, avoid same combination
            path.append(candidates[j])
            backtrack(j + 1, cur_sum + candidates[j])  # j+1: use once
            path.pop()

    backtrack(0, 0)
    return result

Line-by-Line Explanation

I: backtrack(j, ...) keeps start = j so we can pick the same candidate again. cur_sum > target prunes.
II: candidates is sorted so duplicates are adjacent. if j > start and candidates[j] == candidates[j-1]: continue avoids using the same value twice in the same “slot” (which would create duplicate combinations). backtrack(j+1, ...) ensures each candidate is used at most once.

Common Mistake: In Combination Sum II, forgetting to sort or using the wrong duplicate condition. You must sort so that “skip when same as previous” works; the condition j > start (not j > 0) is what prevents duplicate combinations from the same value appearing in two different positions.

Time and Space Complexity

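I: Time depends on target and the candidates. Let n be the number of candidates and m the smallest candidate. The recursion is roughly target/m levels deep and each call branches up to n ways, so a loose upper bound is O(n^(target/m + 1)) calls; pruning on cur_sum > target usually keeps the real count far lower. Space is O(target/m) for the recursion stack and path.
II: There are at most 2^n subsets to consider, and copying a recorded path costs O(n), so the worst case is O(n · 2^n) time. Space is O(n) for recursion and path.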

Edge Cases

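Target 0: One valid combination, the empty list []. Both implementations above already handle this: the first call sees cur_sum == target and records the empty path, so the result is [[]].
Empty candidates: With a positive target there are no combinations; the loop body never runs and the result is [].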
All candidates > target (II): Pruning will cut all branches; result [].

Common Mistakes

Common Mistake: Reusing the same index when you meant “use once” (II): passing j instead of j+1 in the recursive call gives multiple uses of the same candidate.

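Common Mistake: Forgetting to prune when cur_sum > target. In variant I the same candidate can be reused indefinitely, so without this check the recursion never terminates and Python eventually raises RecursionError. In variant II the search still ends, but it explores subsets that can no longer reach the target.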

Optimization Insight: You can prune earlier: before recursing, check if cur_sum + candidates[j] > target and skip that j (or break if candidates are positive and sorted, since later j are larger).
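The early-break idea can be sketched as follows for variant I. This is an illustrative version under the same assumption as the problem statement (all candidates positive); the name combination_sum_pruned and the remaining parameter are choices made here, not part of the implementation above.

```python
from typing import List


def combination_sum_pruned(candidates: List[int], target: int) -> List[List[int]]:
    """Combination Sum I with sort + break; assumes positive candidates."""
    result: List[List[int]] = []
    path: List[int] = []
    cands = sorted(candidates)  # sorting is what makes the break below valid

    def backtrack(start: int, remaining: int) -> None:
        if remaining == 0:
            result.append(path[:])
            return
        for j in range(start, len(cands)):
            if cands[j] > remaining:
                break  # every later candidate is at least as large
            path.append(cands[j])
            backtrack(j, remaining - cands[j])  # same j: unlimited reuse
            path.pop()

    backtrack(0, target)
    return result


print(combination_sum_pruned([2, 3, 6, 7], 7))  # [[2, 2, 3], [7]]
```

Tracking remaining instead of cur_sum means the overshoot check happens before the recursive call, so the function never descends into a branch that has already exceeded the target.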

Interview Insight: Say: “I use a start index so each combination is built in one order only. For variant I, I recurse with start=j to allow reuse; for variant II, I sort first, recurse with start=j+1, and skip duplicates.” Mention pruning when the sum exceeds target.

Practice Problems

LeetCode 39: Combination Sum (unlimited use).
LeetCode 40: Combination Sum II (at most once, with duplicates).
Combination Sum III: use exactly k numbers from 1..9 that sum to n.
Subset Sum (count or list subsets that sum to target)—same tree, different base case.

Summary

Combination Sum I: Backtrack with start index; recurse with start=j to allow reusing the same candidate; prune when cur_sum > target.
Combination Sum II: Sort candidates; recurse with start=j+1; skip when j > start and candidates[j]==candidates[j-1] to avoid duplicate combinations.
Always append path[:] and pop after recursion to backtrack correctly.
Pruning on sum is essential for efficiency.

14.6 N Queens

Introduction

The N Queens problem: place n queens on an n×n chessboard so that no two queens attack each other. Queens attack along their row, column, and both diagonals. We place exactly one queen per row (or per column); at each row we choose a column, check that it is safe with respect to all previously placed queens, then recurse to the next row. Backtrack when no column is valid. This is a classic constraint satisfaction problem and a standard backtracking interview question.

Real-World Analogy

Imagine placing one token per row on a grid. Each token “blocks” its entire column and both diagonal directions. Your job: fill every row without any two tokens sharing a column or diagonal. You try the first row—pick a column; then the second row—pick a column that isn’t blocked; and so on. If you reach a row where every column is blocked, you undo the last row’s choice and try another column. That’s exactly N Queens: one queen per row, try each column, check safety, recurse, backtrack.

Formal Definition

Concept Note: On an n×n chessboard, a queen attacks any square on the same row, same column, or same diagonal (both main and anti-diagonals). The N Queens problem: place n queens so that no two attack. Equivalently: choose n squares, one per row and one per column, with no two on the same diagonal. Output can be all valid board configurations or a compact representation (e.g. list of column indices per row).

Why This Topic Matters

Constraint backtracking: Combines “choose one option per step” with a non-trivial safety check (column + diagonals).
Interview staple: LeetCode 51 (N-Queens) and 52 (N-Queens II count). Tests recursion, pruning, and encoding state.
Pattern: One decision per row/column; efficient “is this cell safe?” is key (sets or arrays for columns and diagonals).

Mental Model: Row-by-Row Placement

We fill the board row by row. Row 0: try column 0, 1, …, n−1. For each choice, mark that column and the two diagonals as “used,” then recurse to row 1. At row 1 we only try columns that are still safe. If we reach row n, we have placed n queens—record the solution. If for some row no column is safe, backtrack: unmark the last placement and try the next column.

Row 0: place Q at col 1 → mark col 1, diag (0-1), anti (0+1)
Row 1: try col 0 (safe?), col 2 (safe?), ...
Row 2: ...
...
Row n-1: place last Q → solution. Backtrack to find more.

Diagonal Indexing

For a cell (row, col):

Main diagonal (↘): cells with the same (row − col). Use index row - col, which ranges from −(n−1) to n−1; add n−1 to get an array index from 0 to 2n−2.
Anti-diagonal (↙): cells with same (row + col). Use index row + col (0 to 2n−2).

So we maintain three sets (or boolean arrays): cols, main_diag (row−col), anti_diag (row+col). A placement (r, c) is safe iff c not in cols, (r−c) not in main_diag, (r+c) not in anti_diag.
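To make the indexing concrete, the following small sketch groups every cell of a 4×4 board by its two diagonal keys. The variable names are illustrative only.

```python
from typing import Dict, List, Tuple

n = 4
main_groups: Dict[int, List[Tuple[int, int]]] = {}  # key: row - col
anti_groups: Dict[int, List[Tuple[int, int]]] = {}  # key: row + col

for r in range(n):
    for c in range(n):
        main_groups.setdefault(r - c, []).append((r, c))
        anti_groups.setdefault(r + c, []).append((r, c))

print(main_groups[0])  # [(0, 0), (1, 1), (2, 2), (3, 3)]
print(anti_groups[3])  # [(0, 3), (1, 2), (2, 1), (3, 0)]
print(len(main_groups), len(anti_groups))  # 7 7, i.e. 2n - 1 diagonals each way
```

Two queens attack diagonally exactly when they fall in the same group, which is why storing one key per placed queen is enough.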

Step-by-Step Breakdown

State: Current row r (0 to n), current placement (e.g. list of column indices or a 2D board), and sets/arrays for columns and diagonals.
Base case: If r == n, all rows have a queen; record the solution (e.g. append a copy of the board or the column list).
Recursive case: For each column c, if (r, c) is safe, mark column and both diagonals, add queen to placement, recurse to r+1, then unmark and remove queen (backtrack).

Python Implementation

from typing import List


def solve_n_queens(n: int) -> List[List[str]]:
    result: List[List[str]] = []
    # placement: board[r] = column index of queen in row r
    board: List[int] = [-1] * n
    cols: set = set()
    main_diag: set = set()   # row - col
    anti_diag: set = set()   # row + col

    def safe(r: int, c: int) -> bool:
        return c not in cols and (r - c) not in main_diag and (r + c) not in anti_diag

    def format_board() -> List[str]:
        rows = []
        for c in board:
            rows.append("." * c + "Q" + "." * (n - c - 1))
        return rows

    def backtrack(r: int) -> None:
        if r == n:
            result.append(format_board())
            return
        for c in range(n):
            if not safe(r, c):
                continue
            board[r] = c
            cols.add(c)
            main_diag.add(r - c)
            anti_diag.add(r + c)
            backtrack(r + 1)
            anti_diag.discard(r + c)
            main_diag.discard(r - c)
            cols.discard(c)
            board[r] = -1

    backtrack(0)
    return result

Line-by-Line Explanation

board[r] = c means the queen in row r is in column c. We only need one index per row.
safe(r, c): no conflict in column c, main diagonal (r−c), or anti-diagonal (r+c).
backtrack(r): if r == n, format the board (e.g. [".Q..", "...Q", ...]) and append to result.
In the loop: if safe, set board[r]=c, add c and diagonals to sets, recurse backtrack(r+1), then remove from sets and reset board[r] so the next column can be tried.

Time and Space Complexity

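Time: Because each queen must take a column that no earlier queen uses, row 0 has n choices, row 1 at most n−1, and so on, which bounds the number of complete placements by n!. Diagonal pruning cuts this further, so O(n!) is a loose upper bound on the search. Formatting each recorded solution adds O(n²).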
Space: O(n) for recursion stack, board, and the three sets. Excluding output, O(n).

Edge Cases

n = 1: One queen, one cell; one solution.
n = 2 or 3: No solution; result is [].
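n = 4: Two solutions in total, which are mirror images of each other: [".Q..", "...Q", "Q...", "..Q."] and ["..Q.", "Q...", "...Q", ".Q.."].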

Common Mistakes

Common Mistake: Forgetting to unmark the column and diagonals after the recursive call. You must remove (r, c) from all three sets and reset board[r] so the next column is tried with a clean state.

Common Mistake: Wrong diagonal indices. Main diagonal is (row − col); anti-diagonal is (row + col). Using (row, col) or swapped signs gives incorrect pruning.

Optimization Insight: Using sets (or boolean arrays) for columns and diagonals gives O(1) safety check. Checking by iterating over previous queens is correct but O(n) per check; the set approach is preferred.
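As a sketch of the boolean-array option, here is a counting variant (the shape of LeetCode 52) that never builds a board. The function name count_n_queens and the n − 1 shift for the main diagonal are choices made for this example.

```python
from typing import List


def count_n_queens(n: int) -> int:
    """Count N-Queens solutions using boolean arrays for O(1) safety checks."""
    cols: List[bool] = [False] * n
    main_diag: List[bool] = [False] * (2 * n - 1)  # index: r - c + (n - 1)
    anti_diag: List[bool] = [False] * (2 * n - 1)  # index: r + c

    def backtrack(r: int) -> int:
        if r == n:
            return 1  # every row has a queen
        total = 0
        for c in range(n):
            m, a = r - c + n - 1, r + c
            if cols[c] or main_diag[m] or anti_diag[a]:
                continue
            cols[c] = main_diag[m] = anti_diag[a] = True
            total += backtrack(r + 1)
            cols[c] = main_diag[m] = anti_diag[a] = False  # undo
        return total

    return backtrack(0)


print([count_n_queens(k) for k in range(1, 9)])
# [1, 0, 0, 2, 10, 4, 40, 92]
```

Returning the count from each call avoids a shared counter variable, and the arrays replace hashing with direct indexing.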

Interview Insight: Say: “Place one queen per row; for each row try every column. Track columns and both diagonals (row−col and row+col) in sets. If a cell is safe, place the queen, recurse to the next row, then backtrack.” Mention the diagonal indexing clearly.

Practice Problems

LeetCode 51: N-Queens (return all distinct board configurations).
LeetCode 52: N-Queens II (return the number of distinct solutions).
Print one valid configuration only (stop after first solution if desired).

Summary

N Queens: Place n queens, one per row; try each column per row; ensure no two share column or diagonal.
Track columns and both diagonals (row−col, row+col) for O(1) safety check.
Backtrack: after recursing, unmark column and diagonals and remove the queen so the next column can be tried.
Output can be list of strings (one per row, "Q" and ".") or list of column indices per row.

14.7 Sudoku Solver

Introduction

The Sudoku Solver problem: given a 9×9 grid partially filled with digits 1–9 (and empty cells marked as '.' or 0), fill the grid so that every row, every column, and every 3×3 sub-box contains the digits 1–9 exactly once. We solve it by backtracking: pick an empty cell, try each valid digit (1–9), check that it doesn’t violate row, column, or box constraints, place it and recurse; if no digit leads to a solution, backtrack and try another digit (or another cell order). It’s a classic constraint satisfaction problem and a standard interview question.

Real-World Analogy

Imagine a 9×9 grid with some numbers already written. Your job: fill the blanks so that in every row, every column, and every 3×3 block, the numbers 1–9 each appear exactly once. You pick one empty cell, try 1, then 2, …; for each try you check “is this allowed here?” (no same number in row, column, or block). If it’s allowed, you write it and move to the next empty cell. If you ever get stuck (no valid digit), you erase the last number you wrote and try the next option. That’s backtracking: try, check, recurse, undo.

Formal Definition

Concept Note: A valid Sudoku solution satisfies: (1) Each row contains 1–9 exactly once. (2) Each column contains 1–9 exactly once. (3) Each of the nine 3×3 sub-boxes (non-overlapping) contains 1–9 exactly once. The solver receives a board with some cells filled and some empty; it must fill empty cells in place (or return a filled board) so the grid satisfies all three rules. The input is guaranteed to have exactly one solution in typical problem statements.

Why This Topic Matters

Constraint backtracking: Multiple constraints (row, column, box) and many choices per cell make it a strong test of backtracking and pruning.
Interview staple: LeetCode 37 (Sudoku Solver). Tests recursion, validity checks, and in-place modification.
Pattern: “Find empty cell → try each valid value → recurse → backtrack” generalizes to many grid puzzles.

Mental Model: Cell-by-Cell Fill

We process the grid in some order (e.g. row-major). For each cell:

If the cell is already filled, move to the next cell (or return true if no more cells—solved).
If the cell is empty, try digits 1–9. For each digit, check if it is valid in this cell (not already in the same row, column, or 3×3 box). If valid, place it, recurse to the next cell; if the recursion returns true, we’re done. Otherwise remove the digit and try the next.
If no digit works, return false so the caller can backtrack.

Grid order: (0,0), (0,1), ... (8,8).
At empty cell (r,c): try d in 1..9
  if valid(r, c, d): board[r][c]=d, recurse next cell; if true return true; else board[r][c]='.'
return false

Box Index

The nine 3×3 boxes are indexed by (row // 3, col // 3). So box index for cell (r, c) is box = (r // 3, c // 3). To check “digit d in box containing (r, c)”: look at cells (r//3)*3 + i, (c//3)*3 + j for i, j in 0..2. Alternatively, use a single index box_id = (r // 3) * 3 + (c // 3) (0–8).
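Both forms of the box index can be checked with a few lines. The helper names box_id and box_cells are hypothetical, introduced only for this illustration.

```python
from typing import List, Tuple


def box_id(r: int, c: int) -> int:
    """Single index 0..8 for the 3x3 box containing (r, c)."""
    return (r // 3) * 3 + (c // 3)


def box_cells(r: int, c: int) -> List[Tuple[int, int]]:
    """All nine cells of the box containing (r, c), top-left first."""
    br, bc = (r // 3) * 3, (c // 3) * 3
    return [(br + i, bc + j) for i in range(3) for j in range(3)]


print(box_id(4, 7))         # 5
print(box_cells(4, 7)[0])   # (3, 6)
print(box_cells(4, 7)[-1])  # (5, 8)
```

Cell (4, 7) sits in the middle band of rows and the right band of columns, so its box starts at (3, 6) and has index 1 · 3 + 2 = 5.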

Step-by-Step Breakdown

Find next empty cell: Scan row by row (and column by column); if no empty cell, the board is solved—return True.
Try digits 1–9: For each digit d, check valid(board, r, c, d) (not in row r, not in column c, not in the 3×3 box containing (r, c)).
Place and recurse: Set board[r][c] = d, call solver (next cell). If it returns True, return True. Else set board[r][c] back to empty and try next d.
Backtrack: If no digit works, return False.

Python Implementation

from typing import List  # board: List[List[str]] with '1'..'9' or '.'


def solve_sudoku(board: List[List[str]]) -> None:
    def valid(r: int, c: int, d: str) -> bool:
        for i in range(9):
            if board[r][i] == d or board[i][c] == d:
                return False
        br, bc = (r // 3) * 3, (c // 3) * 3
        for i in range(3):
            for j in range(3):
                if board[br + i][bc + j] == d:
                    return False
        return True

    def solve() -> bool:
        for r in range(9):
            for c in range(9):
                if board[r][c] != '.':
                    continue
                for d in "123456789":
                    if not valid(r, c, d):
                        continue
                    board[r][c] = d
                    if solve():
                        return True
                    board[r][c] = '.'
                return False  # no digit worked
        return True  # no empty cell

    solve()

Line-by-Line Explanation

valid(r, c, d): Check row r (all columns), column c (all rows), and the 3×3 box starting at (br, bc) = (r//3*3, c//3*3). If d appears anywhere, return False.
solve(): Double loop finds first empty cell (board[r][c] == '.'). If none, return True. For that cell, try each d; if valid, place d, recurse; if solve() returns True, propagate True. Else reset cell to '.' and try next d. If no d works, return False.

Common Mistake: Forgetting to reset board[r][c] to '.' when the recursive call fails. Without resetting, the board is left in an invalid state and later checks are wrong.

Time and Space Complexity

Time: Worst case we try up to 9 choices per empty cell; number of empty cells can be large. Loosely O(9^m) where m is the number of empty cells, but pruning (valid check) reduces it. Often cited as exponential in the number of blanks.
Space: O(m) for the recursion stack, where m is the number of empty cells (at most 81 on a 9×9 board, so effectively constant). No storage is needed beyond the board itself, which is modified in place.

Edge Cases

Already solved: No empty cells; solve() returns True immediately.
Invalid initial board: Same digit twice in a row/col/box; our solver may still run but won’t find a solution. Problem usually guarantees valid input.
Character representation: Use strings '1'..'9' and '.' (or 0) as in LeetCode; ensure comparison is consistent.

Optimization: Track Rows, Columns, Boxes

Instead of scanning row/column/box every time, maintain three 9×9 boolean arrays (or sets of digits): rows[r][d], cols[c][d], boxes[b][d] indicating whether digit d is in row r, column c, or box b. When placing d at (r,c), set these to True; when removing, set to False. Then valid(r,c,d) is O(1). Initialize from the given board.
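A minimal sketch of this optimization follows, using sets of digits rather than boolean arrays. It also collects the empty cells once up front so each call does not rescan the grid; that detail, the function name solve_sudoku_tracked, and the boolean return value are assumptions added for this example rather than part of the solver above.

```python
from typing import List, Set, Tuple


def solve_sudoku_tracked(board: List[List[str]]) -> bool:
    """Fill board in place; return True if a solution was found."""
    rows: List[Set[str]] = [set() for _ in range(9)]
    cols: List[Set[str]] = [set() for _ in range(9)]
    boxes: List[Set[str]] = [set() for _ in range(9)]
    empties: List[Tuple[int, int]] = []

    # Initialize the trackers from the given digits.
    for r in range(9):
        for c in range(9):
            d = board[r][c]
            if d == '.':
                empties.append((r, c))
            else:
                rows[r].add(d)
                cols[c].add(d)
                boxes[(r // 3) * 3 + c // 3].add(d)

    def solve(k: int) -> bool:
        if k == len(empties):
            return True  # every empty cell has been filled
        r, c = empties[k]
        b = (r // 3) * 3 + c // 3
        for d in "123456789":
            if d in rows[r] or d in cols[c] or d in boxes[b]:
                continue  # O(1) validity check
            board[r][c] = d
            rows[r].add(d)
            cols[c].add(d)
            boxes[b].add(d)
            if solve(k + 1):
                return True
            board[r][c] = '.'  # undo the placement and the bookkeeping
            rows[r].discard(d)
            cols[c].discard(d)
            boxes[b].discard(d)
        return False

    return solve(0)
```

Call it with a board built as a list of character lists, for example board = [list(row) for row in puzzle_rows] where puzzle_rows is your own list of nine 9-character strings. The one new obligation compared with the scanning version is that every placement and every undo must update all three trackers together.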

Common Mistakes

Common Mistake: Returning False too early: you must return False only after trying all 9 digits for the current empty cell. Returning False as soon as one digit fails would skip other valid options.

Common Mistake: Wrong box bounds. The box for (r, c) starts at row (r//3)*3 and column (c//3)*3, and spans 3 rows and 3 columns. Off-by-one or wrong multiplier is a frequent bug.

Interview Insight: Say: “Find the next empty cell. Try digits 1–9; for each, check row, column, and 3×3 box. If valid, place and recurse; if solve() returns true we’re done. Else unplace and try next digit. If none work, return false.” Mention box index (r//3)*3, (c//3)*3.

Practice Problems

LeetCode 37: Sudoku Solver (solve in place).
LeetCode 36: Valid Sudoku (check if a filled board is valid—no need to solve).
Variants: larger grids (e.g. 16×16), or “find all solutions.”

Summary

Sudoku Solver: Fill empty cells so each row, column, and 3×3 box has 1–9 exactly once.
Backtrack: find empty cell → try 1–9 → valid (row, col, box) → place, recurse, unplace if recurse fails.
Box for (r, c): top-left at (r//3)*3, (c//3)*3; check that 3×3 region for duplicates.
Reset cell to empty when backtracking; use row/col/box arrays for O(1) validity if optimizing.

14.8 Rat in Maze

Introduction

The Rat in a Maze problem: given an n×n grid where some cells are open (1) and some blocked (0), find a path from the top-left (0, 0) to the bottom-right (n−1, n−1). The rat can move only up, down, left, or right (typically one step at a time). We use backtracking: from the current cell, try each allowed direction; if the next cell is in bounds, open, and unvisited, mark it visited, recurse; if we reach the destination, record or return the path; otherwise unmark and try the next direction. Variations ask for one path, all paths, or the shortest path (BFS is better for shortest).

Real-World Analogy

Imagine a mouse in a maze drawn on graph paper. Some squares are walls; the mouse can only step on open squares and move one step up, down, left, or right. Starting at the top-left corner, the mouse tries one direction: if the next square is open and not yet visited, it steps there and continues. If it reaches the bottom-right, it has found a path. If it gets stuck (all neighbors blocked or visited), it steps back and tries another direction. That try-step-back-try-again is backtracking.

Formal Definition

Concept Note: Input: an n×n matrix maze with maze[i][j] == 1 (open) or 0 (blocked). Start at (0, 0), destination (n−1, n−1). Valid move: one step in one of four directions (up, down, left, right) to a cell that is in bounds, open, and (in the standard formulation) not already on the current path. Output: a path (sequence of moves or coordinates) from start to destination, or report that no path exists. Sometimes the problem allows visiting a cell multiple times; then we only need to avoid going back to the previous cell to prevent trivial loops.

Why This Topic Matters

Grid backtracking: Same “try each option, recurse, backtrack” pattern on a 2D grid; foundation for many puzzle and path problems.
Interview staple: Common in coding rounds; sometimes “print all paths” or “count paths” (with possible DP overlap).
Pattern: Direction array (dx, dy), bounds check, “open and unvisited” check, mark/unmark, recurse.

Mental Model: Try All Directions

At current cell (r, c):

If (r, c) is the destination, we have a path—record it and return (or continue to find more paths).
Otherwise, try each of the four neighbors (down, up, right, left—or any order). For each neighbor (nr, nc): if in bounds, open, and unvisited, mark visited, add to path, recurse, then unmark and remove from path.

Directions: D(down), U(up), R(right), L(left) → (1,0), (-1,0), (0,1), (0,-1)
At (r,c): for each (dr,dc), next = (r+dr, c+dc)
  if in bounds and maze[nr][nc]==1 and not visited[nr][nc]:
    mark, path += move, recurse, unmark, path pop

Step-by-Step Breakdown

State: Current position (r, c), current path (list of moves or cells), visited matrix (or mark/unmark on the maze).
Base case: If (r, c) == (n−1, n−1), destination reached—append current path to result (or return true for “one path” version).
Recursive case: Define direction vectors (e.g. down, up, right, left). For each direction, compute next cell (nr, nc). If nr, nc in [0, n−1], maze[nr][nc] is open, and (nr, nc) not visited: mark visited, append move to path, recurse, then unmark and pop path.

Python Implementation

from typing import List, Tuple

# maze: 1 = open, 0 = blocked. Find path from (0,0) to (n-1,n-1).
# Moves: D U R L → (1,0), (-1,0), (0,1), (0,-1)
DIRS = [(1, 0, "D"), (-1, 0, "U"), (0, 1, "R"), (0, -1, "L")]


def rat_in_maze(maze: List[List[int]], n: int) -> List[str]:
    result: List[str] = []
    path: List[str] = []
    visited = [[False] * n for _ in range(n)]

    def in_bounds(r: int, c: int) -> bool:
        return 0 <= r < n and 0 <= c < n

    def backtrack(r: int, c: int) -> None:
        if r == n - 1 and c == n - 1:
            result.append("".join(path))
            return
        for dr, dc, move in DIRS:
            nr, nc = r + dr, c + dc
            if not in_bounds(nr, nc) or maze[nr][nc] == 0 or visited[nr][nc]:
                continue
            visited[nr][nc] = True
            path.append(move)
            backtrack(nr, nc)
            path.pop()
            visited[nr][nc] = False

    if maze[0][0] == 1:
        visited[0][0] = True
        backtrack(0, 0)
    return result

Line-by-Line Explanation

DIRS: (delta row, delta col, move name) for down, up, right, left.
backtrack(r, c): If (r, c) is (n−1, n−1), save path string and return. For each direction, compute (nr, nc); skip if out of bounds, blocked, or visited. Else mark (nr, nc) visited, append move, recurse, then pop move and unmark.
Start: if (0,0) is open, mark it visited and call backtrack(0, 0). Result list holds all path strings (e.g. ["DDRR", "DRDR"]).

Common Mistake: Forgetting to unmark visited[nr][nc] after the recursive call. Without unmarking, a cell stays “visited” after the search backs out of it, so other valid paths that pass through that cell are never found when enumerating all paths.

Time and Space Complexity

Time: In the worst case we explore many paths; each cell can be tried in multiple paths. Upper bound is exponential (e.g. O(4^(n²)) in theory); in practice pruning (blocked/visited) reduces it.
Space: O(n²) for the visited matrix. The recursion stack and path list grow with the current path length, which is at most n² cells. For “all paths” we also store each path string in the result.

Edge Cases

Start or destination blocked: If maze[0][0]==0 or maze[n-1][n-1]==0, no path; return [] or false.
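1×1 grid: Start is the destination, so the only path has no moves. The implementation above handles this without special-casing: backtrack(0, 0) hits the base case immediately and the result is [""] (one path, the empty move string), provided the cell is open.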
No path: All paths get stuck; result is [].

Variation: Count Paths or One Path

For “count all paths,” increment a counter (or len(result)) when reaching the destination instead of storing path strings. For “find one path,” return True as soon as you reach the destination and pass the path back; no need to try other directions.
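Both variations can be sketched with small changes to the same skeleton. The names count_paths and find_one_path are hypothetical, and taking n from len(maze) instead of a separate parameter is a simplification made here.

```python
from typing import List, Optional

# Same direction table as the main implementation: (row delta, col delta, move).
DIRS = [(1, 0, "D"), (-1, 0, "U"), (0, 1, "R"), (0, -1, "L")]


def count_paths(maze: List[List[int]]) -> int:
    """Count simple paths from (0, 0) to (n-1, n-1) without storing them."""
    n = len(maze)
    if n == 0 or maze[0][0] == 0:
        return 0
    visited = [[False] * n for _ in range(n)]

    def backtrack(r: int, c: int) -> int:
        if r == n - 1 and c == n - 1:
            return 1
        total = 0
        for dr, dc, _ in DIRS:
            nr, nc = r + dr, c + dc
            if 0 <= nr < n and 0 <= nc < n and maze[nr][nc] == 1 and not visited[nr][nc]:
                visited[nr][nc] = True
                total += backtrack(nr, nc)
                visited[nr][nc] = False
        return total

    visited[0][0] = True
    return backtrack(0, 0)


def find_one_path(maze: List[List[int]]) -> Optional[str]:
    """Return the first path found as a move string, or None if there is none."""
    n = len(maze)
    if n == 0 or maze[0][0] == 0:
        return None
    visited = [[False] * n for _ in range(n)]
    path: List[str] = []

    def backtrack(r: int, c: int) -> bool:
        if r == n - 1 and c == n - 1:
            return True
        for dr, dc, move in DIRS:
            nr, nc = r + dr, c + dc
            if 0 <= nr < n and 0 <= nc < n and maze[nr][nc] == 1 and not visited[nr][nc]:
                visited[nr][nc] = True
                path.append(move)
                if backtrack(nr, nc):
                    return True  # stop searching; leave path intact
                path.pop()
                visited[nr][nc] = False
        return False

    visited[0][0] = True
    return "".join(path) if backtrack(0, 0) else None


maze = [[1, 0, 0],
        [1, 1, 0],
        [0, 1, 1]]
print(count_paths(maze))    # 1
print(find_one_path(maze))  # DRDR
```

The early return in find_one_path is the whole difference: once a recursive call reports success, no further directions are tried and nothing is undone, so path still holds the winning moves. The path it returns is the first one found in direction order, not necessarily the shortest.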

Common Mistakes

Common Mistake: Not marking the start cell (0,0) as visited before the first backtrack call. If it is left unmarked, the search can step back onto the start from a neighbor and produce paths that revisit it.

Common Mistake: Wrong direction order or wrong delta values. (1,0) is “down” if row increases downward; (-1,0) is “up.” Check problem’s coordinate system.

Optimization Insight: For “shortest path” in an unweighted grid, BFS is better than backtracking (first time reaching (n-1,n-1) gives shortest). Use backtracking when you need all paths or a simple path.
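For contrast, a BFS version for the shortest path might look like the sketch below. It is a different technique from the backtracking solution above; the function name shortest_path_bfs and the parent dictionary used to rebuild the move string are assumptions of this example.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, int]


def shortest_path_bfs(maze: List[List[int]]) -> Optional[str]:
    """Return a shortest move string from (0, 0) to (n-1, n-1), or None."""
    n = len(maze)
    if n == 0 or maze[0][0] == 0 or maze[n - 1][n - 1] == 0:
        return None
    dirs = [(1, 0, "D"), (-1, 0, "U"), (0, 1, "R"), (0, -1, "L")]
    # parent[cell] = (previous cell, move used to arrive); None marks the start.
    parent: Dict[Cell, Optional[Tuple[Cell, str]]] = {(0, 0): None}
    queue = deque([(0, 0)])

    while queue:
        r, c = queue.popleft()
        if (r, c) == (n - 1, n - 1):
            break  # first arrival in BFS order is a shortest one
        for dr, dc, move in dirs:
            nr, nc = r + dr, c + dc
            if (0 <= nr < n and 0 <= nc < n
                    and maze[nr][nc] == 1 and (nr, nc) not in parent):
                parent[(nr, nc)] = ((r, c), move)
                queue.append((nr, nc))

    if (n - 1, n - 1) not in parent:
        return None

    # Walk parent links back to the start, then reverse the moves.
    moves: List[str] = []
    cell: Cell = (n - 1, n - 1)
    while parent[cell] is not None:
        cell, move = parent[cell]
        moves.append(move)
    return "".join(reversed(moves))


grid = [[1, 1, 1],
        [1, 0, 1],
        [1, 1, 1]]
print(len(shortest_path_bfs(grid)))  # 4
```

Here cells are marked once and never unmarked, so each cell is processed at most once and the running time is O(n²), in contrast to the exponential worst case of enumerating all paths.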

Interview Insight: Say: “From each cell I try all four directions. Check in bounds, open, and unvisited. Mark visited, recurse, unmark. When I reach (n-1,n-1) I record the path.” Mention direction array and that unmarking is required for “all paths.”

Practice Problems

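GfG / classic: Rat in a Maze (print all paths in lexicographic order). Trying directions in the order D, L, R, U produces the paths already sorted; the D, U, R, L order used above does not, so sort the result if order matters.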
LeetCode 79: Word Search (path in grid spelling a word—similar try-directions backtracking).
Count number of paths; shortest path (BFS).

Summary

Rat in Maze: Find path from (0,0) to (n−1,n−1) on a grid; move to open neighbors only; backtrack over directions.
Use direction vectors (dr, dc), bounds check, and “open and unvisited” check; mark before recurse, unmark after.
Base case: at destination, record path (or return true).
For shortest path use BFS; use backtracking for all paths or one path.

14.9 Branch & Bound

Introduction

Branch and Bound is a search strategy that extends backtracking with bounds to prune the search tree. We explore the same kind of "decision tree" (branch), but at each node we compute a bound (e.g. a lower bound for a minimization problem) that tells us the best possible value we can get from that subtree. If the bound shows that no descendant can beat our current best solution, we prune that branch and do not explore it. This turns "find all solutions" into "find an optimal solution" and often reduces the number of nodes visited compared to plain backtracking.

Real-World Analogy

Imagine searching for the cheapest route that visits several cities. Plain backtracking would try every ordering and then pick the best. Branch and bound: as you build a partial route, you estimate the minimum cost any completion could have (e.g. using a simple lower bound). If that estimate is already higher than the best full route you've found so far, you stop extending that partial route—you "prune" that branch. You still branch (try next city), but you bound (estimate) and cut off hopeless branches.

Formal Definition

Concept Note: Branch: Split the problem into subproblems (e.g. "include this item" vs "exclude it"), each corresponding to a child node in the search tree. Bound: For each node, compute a lower bound (minimization) or upper bound (maximization) on the best objective value achievable in that subtree. If the bound is worse than the current best solution, prune the node (do not explore its descendants). The algorithm keeps a global "best so far" and updates it when a full solution is found.

Why This Topic Matters

Optimization over backtracking: When you need the best solution (not all), bounds can drastically reduce the search space.
Classic problems: 0/1 Knapsack (max profit), Traveling Salesman (min cost), Job Assignment (min cost).
Interview relevance: Less common than pure backtracking, but understanding "bound + prune" helps in optimization and DP discussions.

Mental Model: Tree + Bound + Prune

Think of the search as a tree:

Each node = partial solution (e.g. first k items chosen for knapsack).
Children = extensions (e.g. include item k+1 or exclude it).
At each node: compute a bound (e.g. lower bound for min TSP = current cost + minimum remaining edge cost).
If bound >= current best (for minimization), prune. Otherwise recurse into children.
When we reach a leaf (full solution), update "best" if this solution is better.

Minimization example:
  best = infinity
  at node: bound = lower_bound(partial)
  if bound >= best: return   (prune)
  if leaf: best = min(best, cost); return
  for each child: recurse

Step-by-Step Breakdown

State: Partial solution (e.g. current weight, value, or path), current index or level, and global best value (and optionally best solution).
Bound: Compute a bound for the current partial solution. For minimization: lower bound = best possible completion; for maximization: upper bound.
Prune: If bound cannot beat the current best, return without exploring children.
Base case: If partial solution is complete (e.g. all items decided), update best if better and return.
Branch: Generate children (e.g. include next item / exclude next item); for each child, recurse.
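
The five steps above can be sketched for a minimization problem. The example below is a hypothetical job-assignment solver: the function name assign_min_cost, the cost matrix layout, and the "cheapest free job per remaining worker" bound are illustrative assumptions, not part of the original text. It is one possible valid lower bound, not the only one.

```python
import math


def assign_min_cost(cost: list[list[int]]) -> float:
    """Minimum total cost of assigning n workers to n distinct jobs.

    cost[i][j] is the cost of giving job j to worker i (assumed layout).
    """
    n = len(cost)
    best = math.inf              # global "best so far"
    used = [False] * n           # used[j] is True once job j is taken

    def lower_bound(worker: int, cur: int) -> int:
        # Optimistic completion: every remaining worker takes its cheapest
        # free job, ignoring conflicts. This never exceeds the true minimum.
        total = cur
        for i in range(worker, n):
            total += min(cost[i][j] for j in range(n) if not used[j])
        return total

    def solve(worker: int, cur: int) -> None:
        nonlocal best
        if worker == n:                          # base case: full assignment
            best = min(best, cur)
            return
        if lower_bound(worker, cur) >= best:     # prune: cannot beat best
            return
        for j in range(n):                       # branch: try each free job
            if not used[j]:
                used[j] = True
                solve(worker + 1, cur + cost[worker][j])
                used[j] = False

    solve(0, 0)
    return best
```

Calling assign_min_cost with a 4x4 cost matrix returns the cheapest total; the state is (worker, cur, used), the bound is computed before branching, and best is only updated at leaves.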

Backtracking vs Branch and Bound

Aspect	Backtracking	Branch & Bound
Goal	Find all solutions or one solution	Find optimal solution (min or max)
Pruning	Only when constraint violated (e.g. invalid)	When bound shows subtree cannot beat current best
Extra work per node	None (or simple feasibility)	Compute bound (lower/upper)

Example: 0/1 Knapsack (Maximization)

Given weights and values of n items and capacity W, maximize total value with total weight <= W. Branch: at each item, include or exclude. Bound: an upper bound for the current partial solution (e.g. current value + fractional knapsack value of remaining items—greedy upper bound). If this upper bound <= best value so far, prune.

# Sketch: w[i], v[i] = weight and value of item i; n = number of items; W = capacity.
# upper_bound(i, cur_w, cur_v) is assumed to return the fractional-knapsack bound.
best_value = 0

def knapsack_bb(i: int, cur_w: int, cur_v: int) -> None:
    global best_value  # declared once, before any use in the function
    if cur_w > W:
        return  # infeasible: over capacity
    if i == n:
        best_value = max(best_value, cur_v)  # leaf: every item decided
        return
    # Upper bound: cur_v + fractional knapsack value of items i..n-1 with remaining weight (W - cur_w)
    ub = upper_bound(i, cur_w, cur_v)
    if ub <= best_value:
        return  # prune
    knapsack_bb(i + 1, cur_w, cur_v)           # exclude
    knapsack_bb(i + 1, cur_w + w[i], cur_v + v[i])  # include

A good upper bound is critical: a loose bound gives little pruning, while an invalid bound (one that can fall below the true best value of the subtree) can prune away the optimum. The fractional knapsack bound is valid and often strong.
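
A self-contained version of this idea might look like the following. It is a minimal sketch, assuming positive integer weights; the names knapsack_branch_and_bound, upper_bound, and solve are illustrative. Items are sorted by value per unit weight so that the greedy fractional fill is a valid upper bound, and the include branch is tried first.

```python
def knapsack_branch_and_bound(weights: list[int], values: list[int], capacity: int) -> int:
    """0/1 knapsack via DFS branch and bound (assumes every weight > 0)."""
    # Sort by value density so the greedy fractional fill below is a valid bound.
    items = sorted(zip(weights, values), key=lambda wv: wv[1] / wv[0], reverse=True)
    n = len(items)
    best = 0

    def upper_bound(i: int, cur_w: int, cur_v: int) -> float:
        # Fill the remaining room greedily; allow a fraction of one item.
        room = capacity - cur_w
        bound = float(cur_v)
        for w, v in items[i:]:
            if w <= room:
                room -= w
                bound += v
            else:
                bound += v * room / w
                break
        return bound

    def solve(i: int, cur_w: int, cur_v: int) -> None:
        nonlocal best
        if cur_w > capacity:                      # constraint prune
            return
        best = max(best, cur_v)                   # any feasible partial pick counts
        if i == n:
            return
        if upper_bound(i, cur_w, cur_v) <= best:  # bound prune
            return
        w, v = items[i]
        solve(i + 1, cur_w + w, cur_v + v)        # include first (promising branch)
        solve(i + 1, cur_w, cur_v)                # exclude

    solve(0, 0, 0)
    return best
```

Updating best at every feasible node is safe here because "exclude everything that remains" is always a legal completion of a feasible partial selection.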

Time and Space Complexity

Time: Still exponential in the worst case (no pruning), but in practice a good bound can reduce the tree significantly. Depends heavily on the problem and bound quality.
Space: O(depth of tree) for recursion; O(1) or O(n) for best solution and state.

Edge Cases

No feasible solution: best remains at initial value (e.g. -infinity for max, +infinity for min); report "no solution."
Bound computation: Ensure the bound is valid (a lower bound must never exceed the true minimum of the subtree; an upper bound must never fall below the true maximum) or pruning may discard the optimal solution.

Common Mistakes

Common Mistake: Using an invalid bound (e.g. a lower bound that is too high, or an upper bound that is too low). The bound must be mathematically valid so that pruning never cuts off a node that could lead to a better solution than the current best.

Common Mistake: Forgetting to update the best solution when reaching a leaf. The algorithm relies on "current best" to prune; if you never update it, pruning never activates or is wrong.

Optimization Insight: Order of branching can matter: exploring "promising" branches first (e.g. higher value-per-weight in knapsack) often improves the best solution earlier and increases pruning. Some implementations use a priority queue (best-first) instead of DFS.

Interview Insight: Say: "Branch and bound is like backtracking but we compute a bound at each node. If the bound shows we can't do better than the best solution so far, we prune. Used for optimization—knapsack, TSP, assignment." Emphasize valid bounds.

Practice Problems

0/1 Knapsack (max profit) with fractional upper bound.
Traveling Salesman: lower bound = current cost + MST of remaining cities (or similar).
Job Assignment (min cost): bound = current cost + minimum possible cost of assigning remaining jobs.

Summary

Branch and Bound: Branch (split into subproblems) and bound (estimate best value in subtree); prune if bound cannot beat current best.
Used for optimization (min or max); keep a global "best" and update at leaves.
Bounds must be valid (lower bound <= true minimum; upper bound >= true maximum) or pruning may miss the optimum.
Strong bounds and good branching order improve pruning and speed.

14.10 Pruning Techniques

Introduction

Pruning in backtracking and search means cutting off branches of the search tree that cannot lead to a solution (or to a better solution, in optimization). Without pruning, we may explore millions of useless nodes; with good pruning, we often reduce the search space dramatically. This topic summarizes pruning techniques: when and how to detect "this branch is hopeless" and skip it. The ideas apply across subsets, permutations, combination sum, N Queens, Sudoku, and branch-and-bound.

Real-World Analogy

Imagine searching a maze by trying every path. As soon as you see a sign "dead end" or "no exit," you stop walking that way and try another. Pruning is that sign: we compute something (constraint violated? bound too bad? duplicate?) and decide "don't go further here." The earlier and more accurate the sign, the less walking we do.

Formal Definition

Concept Note: Pruning is the act of not exploring a branch of the search tree because we can prove (or safely assume) that no descendant of that branch will be a solution we care about. Techniques include: constraint pruning (partial solution already violates a constraint), bound pruning (bound shows we can't beat current best), symmetry pruning (skip equivalent branches), ordering heuristics (explore promising branches first to improve bounds earlier), and early termination (stop as soon as one solution is found).

Why This Topic Matters

Efficiency: Good pruning can turn an infeasible brute force into an acceptable solution.
Interview: Saying "I'll prune when..." shows you think about search space and optimization.
Reuse: The same ideas (constraint check, bound, skip duplicate) apply to many problems.

Mental Model

Before or after extending the current node, ask:

Is the current (partial) solution already invalid? → Constraint prune.
Can this subtree never beat the best we have? → Bound prune.
Is this branch equivalent to one we already explored? → Symmetry / duplicate prune.
Do we only need one solution? → Early terminate when we find it.

Types of Pruning

1. Constraint Pruning (Feasibility)

If the partial solution already violates a problem constraint, no extension can fix it. Examples:

Combination Sum: If cur_sum > target, stop—adding more will only increase the sum.
N Queens: If placing a queen at (r, c) attacks an existing queen, don't place it (or don't recurse).
Knapsack: If cur_weight > W, return; no need to add more items.

Implement by checking the constraint at the start of the recursive call or before recursing; if not feasible: return.

2. Bound Pruning (Optimization)

Used when we want the best solution. Compute a bound (lower for min, upper for max). If bound cannot beat the current best, prune. See Branch & Bound (14.9).

3. Symmetry and Duplicate Pruning

Avoid exploring branches that are equivalent to ones already explored. Examples:

Subsets / Combination Sum: Use a "start index"—only consider candidates from index i onward so we don't get [2,3] and [3,2] as two different branches.
Subsets II / Combination Sum II: Sort and skip: if j > start and candidates[j] == candidates[j-1], skip j so we don't generate the same combination twice.
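Permutations II: Sort first, then skip a value equal to its left neighbor when that neighbor is not currently in the path (i > 0 and nums[i] == nums[i-1] and not used[i-1]), so equal values are never placed at the same position twice.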
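
As an illustration of duplicate pruning, here is a minimal sketch of Permutations II. The function name permute_unique and its internal layout are assumptions for this example; the skip condition is the standard "sort, then require the left twin to be in use" rule.

```python
def permute_unique(nums: list[int]) -> list[list[int]]:
    """All distinct permutations of nums, which may contain duplicates."""
    nums = sorted(nums)                 # equal values become adjacent
    used = [False] * len(nums)
    path: list[int] = []
    result: list[list[int]] = []

    def backtrack() -> None:
        if len(path) == len(nums):
            result.append(path[:])
            return
        for i in range(len(nums)):
            if used[i]:
                continue
            # Duplicate prune: an equal value may be placed only after its
            # left twin is already in the path, which fixes their order.
            if i > 0 and nums[i] == nums[i - 1] and not used[i - 1]:
                continue
            used[i] = True
            path.append(nums[i])
            backtrack()
            path.pop()
            used[i] = False

    backtrack()
    return result
```

For example, permute_unique([1, 1, 2]) yields three permutations rather than six, because the two 1s are never swapped.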

4. Ordering Heuristics

Order in which we try choices can affect how much we prune:

Try promising branches first: In branch-and-bound, trying "include high-value item" before "exclude" may improve the best solution early, so later bound pruning is more effective.
Try constrained options first: In Sudoku, choosing the cell with fewest valid digits (most constrained) can reduce branching.

5. Early Termination

When the problem asks for "one solution" or "any valid path," return as soon as you find it. Don't continue to explore other branches. Example: Rat in Maze "find one path"—if reached destination: return True and propagate True up so the caller stops trying other directions.
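
Early termination can be sketched as a boolean search. The example assumes a grid where 1 means open and 0 means blocked, and the name has_path is hypothetical. Because only existence matters here, visited cells are deliberately left marked: a cell that failed once cannot succeed later.

```python
def has_path(grid: list[list[int]]) -> bool:
    """True if open cells (value 1) connect the top-left to the bottom-right."""
    if not grid or not grid[0] or grid[0][0] != 1:
        return False
    rows, cols = len(grid), len(grid[0])
    visited = [[False] * cols for _ in range(rows)]

    def dfs(r: int, c: int) -> bool:
        if (r, c) == (rows - 1, cols - 1):
            return True                       # found one path: stop searching
        visited[r][c] = True                  # no unmarking: existence only
        for dr, dc in ((1, 0), (0, -1), (0, 1), (-1, 0)):   # D, L, R, U
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == 1 and not visited[nr][nc]):
                if dfs(nr, nc):
                    return True               # propagate success upward
        return False                          # all directions failed

    return dfs(0, 0)
```

The key line is the inner "return True": once any recursive call succeeds, the loop over the remaining directions is abandoned at every level of the call stack.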

Example: Pruning in Combination Sum

def backtrack(start: int, cur_sum: int) -> None:
    if cur_sum == target:
        result.append(path[:])
        return
    if cur_sum > target:   # constraint prune
        return
    for j in range(start, len(candidates)):   # start index = symmetry/order prune
        path.append(candidates[j])
        backtrack(j, cur_sum + candidates[j])  # pass j (not j + 1): the same candidate may be reused
        path.pop()

Here cur_sum > target is constraint pruning; start ensures we don't generate the same combination in different orders.
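
The snippet above relies on names from an enclosing scope. A self-contained variant, offered as a sketch that assumes positive candidates, is shown below. It also sorts the candidates so the loop can break as soon as one candidate is too large, which is a slightly stronger form of the same constraint prune. The name combination_sum and the use of a remaining counter instead of cur_sum are choices made for this example.

```python
def combination_sum(candidates: list[int], target: int) -> list[list[int]]:
    """All combinations (with reuse) of positive candidates that sum to target."""
    candidates = sorted(set(candidates))   # sorting enables the break below
    result: list[list[int]] = []
    path: list[int] = []

    def backtrack(start: int, remaining: int) -> None:
        if remaining == 0:
            result.append(path[:])
            return
        for j in range(start, len(candidates)):
            if candidates[j] > remaining:
                break                      # constraint prune: later ones are larger
            path.append(candidates[j])
            backtrack(j, remaining - candidates[j])   # pass j: reuse allowed
            path.pop()

    backtrack(0, target)
    return result
```

With candidates [2, 3, 6, 7] and target 7, this returns [[2, 2, 3], [7]].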

Example: Pruning in N Queens

We don't "prune" in the sense of returning early from a node—we simply never recurse into invalid cells. The "safety" check (column and diagonals) acts as pruning: we only branch into safe columns. So "try each column, recurse only if safe" is constraint pruning at the branch level.
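
A compact sketch of "recurse only if safe" follows. It counts solutions rather than printing boards, and the names count_n_queens, cols, diag, and anti are illustrative. Each set lookup is the safety check, so unsafe columns are never entered.

```python
def count_n_queens(n: int) -> int:
    """Number of ways to place n non-attacking queens on an n x n board."""
    cols: set[int] = set()   # columns that already hold a queen
    diag: set[int] = set()   # r - c identifies a top-left to bottom-right diagonal
    anti: set[int] = set()   # r + c identifies a top-right to bottom-left diagonal

    def place(r: int) -> int:
        if r == n:
            return 1                          # all rows filled: one solution
        count = 0
        for c in range(n):
            if c in cols or (r - c) in diag or (r + c) in anti:
                continue                      # unsafe: never branch here
            cols.add(c)
            diag.add(r - c)
            anti.add(r + c)
            count += place(r + 1)
            cols.remove(c)
            diag.remove(r - c)
            anti.remove(r + c)
        return count

    return place(0)
```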

Common Mistakes

Common Mistake: Pruning too aggressively with an invalid rule. For example, skipping a duplicate in Permutations II with the wrong condition can drop valid permutations. Always ensure your prune condition only removes branches that are truly redundant or infeasible.

Common Mistake: Forgetting to prune when it's easy. In Combination Sum, checking if cur_sum > target: return is simple and avoids many useless recursions. Without it, a branch that reuses the same candidate never stops once cur_sum passes target, so you get a RecursionError (or TLE) instead of an answer.

Optimization Insight: Combine techniques: use constraint pruning (feasibility) always; add bound pruning for optimization; add symmetry/duplicate pruning when the problem has equivalent branches; use early termination when one solution is enough.

Interview Insight: When asked to optimize a backtracking solution, say: "I'll add pruning: (1) constraint check so we don't extend invalid partial solutions, (2) if we need the best solution, a bound to cut bad branches, (3) if there are duplicates or symmetric choices, skip redundant branches." Give one concrete example per type.

Practice Problems

Revisit Subsets II, Combination Sum II, Permutations II and name the pruning used (duplicate/symmetry).
Revisit N Queens: no explicit "prune" return, but only branching into safe columns is constraint pruning.
0/1 Knapsack with branch-and-bound: bound pruning + optional constraint prune (weight > W).

Summary

Pruning = not exploring branches that can't lead to a (or a better) solution.
Constraint pruning: Partial solution violates a constraint → return.
Bound pruning: Bound shows subtree can't beat current best → return (optimization).
Symmetry/duplicate pruning: Start index, sort+skip duplicates, so we don't explore equivalent branches.
Early termination: Return as soon as one solution is found when that's all we need.

Section 15: Dynamic Programming

This section covers dynamic programming (DP): solving problems by breaking them into overlapping subproblems and storing results so each subproblem is solved once. You will learn memoization (top-down: recurse and cache) and tabulation (bottom-up: fill a table), then apply them to classic problems like Fibonacci, Knapsack, LCS, LIS, and Coin Change. State design and transition design are the keys: once you define "what to store" and "how to combine," the code follows.

15.1 Memoization

Introduction

Memoization is a technique to make recursive algorithms efficient by caching the results of function calls. When a function is called with the same arguments again, we return the stored result instead of recomputing. This is the top-down approach to dynamic programming: you write the natural recurrence (e.g. fib(n) = fib(n-1) + fib(n-2)), then add a cache so each subproblem is solved only once. Memoization turns exponential-time recursion into time proportional to the number of distinct subproblems.

Real-World Analogy

Imagine solving a puzzle that has many identical sub-puzzles. The first time you solve a sub-puzzle, you write the answer on a sticky note and stick it on that piece. The next time you need that same sub-puzzle, you read the note instead of solving it again. Memoization is that sticky note: the cache stores answers for "already solved" inputs so we never recompute them.

Formal Definition

Concept Note: Memoization (top-down DP): Before computing the result for a given input, check if we have already computed it (e.g. in a dictionary or array). If yes, return the cached value. If no, compute the result (typically using recursive calls that may hit the cache), store it, and return it. The "state" is the function arguments; the cache key is usually those arguments (or a tuple of them).

Why This Topic Matters

Foundation of DP: Many DP solutions are easiest to derive as recurrences; memoization implements them with minimal change.
Interview standard: "First write the recurrence, then add memoization" is a common approach; tabulation can follow.
Complexity: Reduces time from exponential (repeated subproblems) to O(number of states × cost per state).

Mental Model

For a recursive function f(args):

If args is in the cache, return cache[args].
Otherwise compute the result (using base cases and recursive calls to f).
Store the result in cache[args], then return it.

Every distinct set of arguments is a "state"; we compute each state at most once.

Step-by-Step: Fibonacci with Memoization

Recurrence: fib(0)=0, fib(1)=1, fib(n)=fib(n-1)+fib(n-2) for n≥2. Without cache, many repeated calls (e.g. fib(2) computed many times). With a cache keyed by n, we compute fib(n) only once per n.

Base: n=0 or n=1 → return n.
Check cache: if n in cache, return cache[n].
Compute: result = fib(n-1) + fib(n-2) (these calls may use cache).
Store cache[n] = result; return result.

Python Implementation

Explicit cache (dict)

from typing import Optional

def fib_memo(n: int, cache: Optional[dict] = None) -> int:
    if cache is None:
        cache = {}
    if n <= 1:
        return n
    if n in cache:
        return cache[n]
    result = fib_memo(n - 1, cache) + fib_memo(n - 2, cache)
    cache[n] = result
    return result

Using functools.lru_cache

from functools import lru_cache

@lru_cache(maxsize=None)
def fib_lru(n: int) -> int:
    if n <= 1:
        return n
    return fib_lru(n - 1) + fib_lru(n - 2)

lru_cache automatically stores return values keyed by arguments. maxsize=None means unbounded cache. Arguments must be hashable.
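
One way to see the cache at work is the cache_info() method that lru_cache attaches to the wrapped function. The sketch below uses a separate function name, fib_cached, chosen for this example so its counters start from zero.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def fib_cached(n: int) -> int:
    if n <= 1:
        return n
    return fib_cached(n - 1) + fib_cached(n - 2)


value = fib_cached(30)
# cache_info() returns a named tuple: hits, misses, maxsize, currsize.
info = fib_cached.cache_info()
print(value, info.misses, info.hits)
```

Each of the 31 distinct arguments 0..30 is a miss exactly once; every other call is a hit that returns without recursing. Calling fib_cached.cache_clear() resets the cache if you want to measure again.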

Line-by-Line Explanation (Explicit Cache)

cache maps n → fib(n). Default cache=None so we create one dict per top-level call.
if n <= 1: return n — base case.
if n in cache: return cache[n] — already computed; no recursion.
result = fib_memo(n-1, cache) + fib_memo(n-2, cache) — recurse; nested calls will use the same cache and may return cached values.
cache[n] = result; return result — store and return.

Time and Space Complexity

Time: Each distinct argument (state) is computed once. For Fibonacci, states are 0..n, so O(n) calls with O(1) work each → O(n) time.
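Space: O(n) for the cache (up to n+1 entries) and O(n) for the recursion stack in the worst case. So O(n) total. In CPython the default recursion limit is 1000, so a first call with n near or above that raises RecursionError unless the limit is raised or the function is rewritten iteratively.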

Without memoization, fib(n) makes exponentially many recursive calls (roughly 1.618^n, which is at most 2^n); memoization reduces this to O(n).
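
The gap can be measured directly by counting calls. The two helper names below, count_calls_naive and count_calls_memo, are hypothetical instruments written for this comparison; the memoized one mirrors the structure of fib_memo (base case first, then cache check).

```python
def count_calls_naive(n: int) -> int:
    """Number of calls the uncached recursion makes to compute fib(n)."""
    if n <= 1:
        return 1
    return 1 + count_calls_naive(n - 1) + count_calls_naive(n - 2)


def count_calls_memo(n: int) -> int:
    """Number of calls a dict-cached recursion makes to compute fib(n)."""
    cache: dict[int, int] = {}
    calls = 0

    def fib(k: int) -> int:
        nonlocal calls
        calls += 1
        if k <= 1:
            return k
        if k in cache:
            return cache[k]
        cache[k] = fib(k - 1) + fib(k - 2)
        return cache[k]

    fib(n)
    return calls
```

For n = 20 the naive recursion makes 21891 calls, while the memoized one makes 39 (that is, 2n - 1).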

Edge Cases

n < 0: Define behavior (e.g. raise or extend definition).
n = 0 or 1: Base case; no cache needed for these if you handle them before the cache check.

Cache Key Design

For functions with multiple arguments, the cache key must uniquely identify the state. Use a tuple of arguments: cache[(i, j)]. Ensure arguments are hashable (no lists; use tuples). For recursive DP on arrays, the state might be (index, extra_param).
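
A tuple key can be sketched with a small two-parameter recurrence. The problem used here (count the ways to put + or - in front of each number so the total equals a target) and the name count_target_sum_ways are illustrative choices, not taken from the text; the point is only the (index, extra_param) key.

```python
def count_target_sum_ways(nums: list[int], target: int) -> int:
    """Count sign assignments (+/-) over nums whose total equals target."""
    cache: dict[tuple[int, int], int] = {}   # key = (index, running total)

    def ways(i: int, total: int) -> int:
        if i == len(nums):
            return 1 if total == target else 0
        key = (i, total)                     # hashable tuple identifies the state
        if key in cache:
            return cache[key]
        result = ways(i + 1, total + nums[i]) + ways(i + 1, total - nums[i])
        cache[key] = result
        return result

    return ways(0, 0)
```

Neither i nor total alone identifies the subproblem: the same index can be reached with different running totals, so both must be part of the key.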

Common Mistakes

Common Mistake: Forgetting to return the cached value. You must return cache[key] after the lookup; otherwise you compute again and may overwrite the cache without using it.

Common Mistake: Using a mutable value (e.g. a list) as part of the cache key. Keys must be hashable. Use tuple(indices) or tuple(tuple(row) for row in grid) if you need to key by structure.

Optimization Insight: Memoization only computes states that are actually reached by the recurrence. In problems where many states are unreachable, memoization can be more efficient than tabulation (which might fill a full table). When all states are reachable and order is clear, tabulation can be simpler and avoid recursion stack.

Memoization vs Tabulation

Memoization: Top-down; recurse and cache; computes only the states the recurrence reaches; uses the recursion stack.
Tabulation: Bottom-up; fills a table in dependency order, usually with loops; no recursion.
Both have the same asymptotic time and space when they solve the same subproblems. Choose memoization when the recurrence is natural and the state space is sparse or the fill order is tricky.

Interview Insight: Say: "I'll write the recurrence first (base case + recursive case), then add a cache. Before computing, check if the state is already in the cache; after computing, store and return." Mention that time becomes O(states) instead of exponential.

Practice Problems

Fibonacci (above).
Climbing stairs: ways to reach step n (same recurrence as fib).
0/1 Knapsack: memoize (index, remaining_weight).
Longest Common Subsequence: memoize (i, j) on two string indices.

Summary

Memoization = cache results of recursive calls; before compute, check cache; after compute, store and return.
Transforms exponential recursion into O(states × work per state) time.
Cache key = function arguments (tuple if multiple); must be hashable.
Use explicit dict or @lru_cache; same idea applies to any recurrence.

15.2 Tabulation

Introduction

Tabulation is the bottom-up approach to dynamic programming. Instead of recursing and caching, we fill a table (usually an array or 2D array) in a fixed order so that when we compute dp[i], all the values it depends on (e.g. dp[i-1], dp[i-2]) are already computed. We use loops, not recursion. The recurrence is the same as in memoization—only the order of evaluation changes: we solve "smaller" subproblems first, then build up to the answer.

Real-World Analogy

Building a wall brick by brick: you don't build the top row first and then ask "what's under it?" You build the bottom row, then the next, then the next. Each row depends only on the row below. Tabulation is like that: fill the table in an order where every cell you write only needs cells you've already written.

Formal Definition

Concept Note: Tabulation (bottom-up DP): Define a table dp[...] where each entry corresponds to a state (e.g. dp[i] = answer for subproblem of size i). Determine an order of filling so that when computing dp[state], all states it depends on are already computed. Initialize base cases (e.g. dp[0], dp[1]), then iterate and fill the rest. No recursion; the final answer is typically dp[n] or dp[0][m] etc.

Why This Topic Matters

No recursion stack: Avoids stack overflow when the "depth" of subproblems is large.
Same recurrence: Once you have the recurrence (from memoization or reasoning), tabulation is a mechanical rewrite: loops in dependency order.
Space optimization: Often only a few previous rows or columns are needed, so we can reduce space (e.g. O(n) to O(1) for Fibonacci).

Mental Model

For a 1D recurrence like Fibonacci:

Define dp[i] = value for subproblem i.
Set base cases: dp[0], dp[1], ...
For i from first "unknown" index to n: dp[i] = f(dp[i-1], dp[i-2], ...) using the recurrence.
Return dp[n] (or the relevant entry).

Critical: fill in an order such that every dependency is already in the table.

Step-by-Step: Fibonacci with Tabulation

Recurrence: fib(0)=0, fib(1)=1, fib(i)=fib(i-1)+fib(i-2). Table: dp[i] = fib(i). Order: i = 0, 1, 2, ..., n. So we need dp[0] and dp[1] first; then for i ≥ 2, dp[i] depends only on dp[i-1] and dp[i-2], both already computed.

Python Implementation

Full table

def fib_tab(n: int) -> int:
    if n <= 1:
        return n
    dp = [0] * (n + 1)
    dp[0] = 0
    dp[1] = 1
    for i in range(2, n + 1):
        dp[i] = dp[i - 1] + dp[i - 2]
    return dp[n]

Space-optimized (only previous two)

def fib_tab_opt(n: int) -> int:
    if n <= 1:
        return n
    prev2, prev1 = 0, 1
    for i in range(2, n + 1):
        curr = prev1 + prev2
        prev2, prev1 = prev1, curr
    return prev1

We only need the last two values, so O(1) extra space instead of O(n).

Line-by-Line Explanation

dp[i] stores fib(i). Base: dp[0]=0, dp[1]=1.
Loop i from 2 to n: dp[i] = dp[i-1] + dp[i-2] — recurrence; i-1 and i-2 are already filled.
Return dp[n]. In the optimized version, prev2 and prev1 roll forward so we never need the full array.

Time and Space Complexity

Time: O(n) — one loop, O(1) work per iteration.
Space (full table): O(n) for dp. Space (optimized): O(1) for two variables.

Order of Filling (2D Example)

For a 2D table (e.g. dp[i][j] depends on dp[i-1][j] and dp[i][j-1]), we must fill so that when we compute dp[i][j], those cells are already done. Typical: fill row by row, or column by column. For dp[i][j] depending on dp[i+1][j] and dp[i][j+1], fill in reverse order (bottom-right to top-left).
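
The first case (dependencies above and to the left) can be sketched with a grid path count. The problem and the name count_grid_paths are assumptions chosen for illustration; the example assumes rows and cols are at least 1.

```python
def count_grid_paths(rows: int, cols: int) -> int:
    """Paths from top-left to bottom-right moving only right or down."""
    dp = [[0] * cols for _ in range(rows)]
    for i in range(rows):                 # row by row, top to bottom
        for j in range(cols):             # left to right within a row
            if i == 0 or j == 0:
                dp[i][j] = 1              # base: one way along the top or left edge
            else:
                # dp[i-1][j] was filled in the previous row,
                # dp[i][j-1] earlier in this row.
                dp[i][j] = dp[i - 1][j] + dp[i][j - 1]
    return dp[rows - 1][cols - 1]
```

Swapping the two loops (column by column) would work equally well here, since both dependencies are still filled before they are read.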

Edge Cases

n <= 1: Return n without building the table.
Table size: Use n+1 for 0-indexed fib(n); ensure indices stay in bounds.

Tabulation vs Memoization

Aspect	Memoization	Tabulation
Direction	Top-down (recurse to smaller)	Bottom-up (fill from base)
Structure	Recursion + cache	Loops + table
States computed	Only those reached by recurrence	All in the fill order (unless optimized)
Stack	Uses recursion stack	No recursion

Common Mistakes

Common Mistake: Filling the table in the wrong order. If dp[i] depends on dp[i+1], you must fill from high index to low index, or the dependency is not yet computed.

Common Mistake: Off-by-one in table size or indices. Use dp = [0] * (n + 1) if you need indices 0..n; ensure every access is within bounds.
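
Both mistakes can be seen in one small sketch where dp[i] depends on larger indices. The example uses a suffix formulation of House Robber (dp[i] = best total from houses i onward); the name rob_suffix and this formulation are assumptions made for illustration.

```python
def rob_suffix(nums: list[int]) -> int:
    """Max sum of non-adjacent values, with dp[i] = best from index i onward."""
    n = len(nums)
    # Two extra slots so dp[i + 1] and dp[i + 2] are always in bounds;
    # dp[n] = dp[n + 1] = 0 are the base cases.
    dp = [0] * (n + 2)
    for i in range(n - 1, -1, -1):        # high index to low: dependencies first
        dp[i] = max(dp[i + 1],            # skip house i
                    nums[i] + dp[i + 2])  # take house i, skip i + 1
    return dp[0]
```

Iterating i upward instead would read dp[i + 1] and dp[i + 2] while they still hold their initial zeros, giving a wrong answer without raising any error.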

Interview Insight: Say: "I'll define dp[i] as ... and base cases dp[0], dp[1]. Then I'll iterate from 2 to n and apply the recurrence. If we only need the last few values, we can use O(1) space." For 2D, mention the fill order (row by row or by dependency).

Practice Problems

Fibonacci (above); climbing stairs (same table).
0/1 Knapsack: 2D table dp[i][w]; fill by i and w.
LCS: 2D table dp[i][j]; fill row by row or by (i+j).

Summary

Tabulation = bottom-up: fill a table in dependency order so each cell's dependencies are already computed.
Same recurrence as memoization; evaluation order is explicit (loops).
No recursion → no stack overflow; often allows space optimization (e.g. keep only last row or last two values).
Ensure correct fill order and indices; for 2D, state the order clearly.

15.3 State Design

Introduction

In dynamic programming, the state is the set of parameters that uniquely define a subproblem. Choosing the right state is the first and most important step: it determines the dimensions of your table (or the keys in your memo), the recurrence, and the complexity. State design means deciding: "What do I need to know to answer this subproblem?" Once the state is clear, the transition (how to combine smaller states) often follows naturally.

Real-World Analogy

Imagine solving "best way to get from A to B" with choices along the way. The "state" is your current situation: where you are and what resources you have left (e.g. time, money). Two people in the same place with the same resources face the same subproblem—so we store the answer keyed by (place, resources). State design is asking: "What exactly is 'same situation'?" so we don't store redundant or insufficient information.

Formal Definition

Concept Note: A state in DP is a tuple of parameters that uniquely identifies a subproblem. The subproblem for state S is: "Given that we are in situation S, what is the optimal (or feasible) answer from here?" The state space is the set of all possible S. We want the state to be sufficient (enough to define the subproblem) and minimal (no redundant parameters that don't affect the answer).

Why This Topic Matters

First step in any DP: Wrong state → wrong or inefficient solution; right state → clean recurrence.
Interview: "What is your state?" is a standard question; answering clearly shows you understand the problem.
Complexity: Number of states = table size (or cache size); minimizing state dimensions keeps complexity manageable.

Mental Model: What Must We Remember?

Ask: "If I'm in the middle of the problem, what information do I need to compute the rest without redoing the past?" That information is your state. Examples:

Fibonacci: "How many steps left?" → state = n (one parameter).
0/1 Knapsack: "Which items are left and how much weight capacity?" → state = (index, remaining_weight).
LCS: "How much of each string is left?" → state = (i, j) (prefix lengths or indices).

Step-by-Step: How to Choose State

Identify the "decisions" or "progress": What choices are we making? (e.g. which item to take, which character to match.)
Ask what we need to know after each decision: Typically "how much of the input is left" (indices) and any "resource" (capacity, count, flag).
Express as parameters: State = (index, ...) or (i, j, ...). Ensure that situations needing different answers never map to the same state, and that equivalent situations always map to the same state.
Check base cases: Smallest state(s) should have known answers (e.g. empty input, zero capacity).

Common State Patterns

1D State

Single parameter: position, index, or "length." Examples: Fibonacci (n), climbing stairs (step), house robber (index). Table: dp[i].

2D State

Two parameters: often two indices (two strings, two sequences) or (index, capacity). Examples: LCS (i, j), knapsack (i, w), edit distance (i, j). Table: dp[i][j] or dp[i][w].

State with Extra Dimension

Sometimes we need an extra parameter: "number of items taken," "whether we took the previous," "remaining k." Examples: "at most k transactions" → (index, k); "paint n houses with no two adjacent same color" → (house, color). Table: dp[i][k] or dp[i][color].
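
The (house, color) state can be sketched as follows. The cost-matrix input and the name min_paint_cost are assumptions for this example, and it assumes at least two colors whenever there is more than one house. Only the previous row of the table is kept, since dp[i][c] depends only on dp[i-1][...].

```python
def min_paint_cost(costs: list[list[int]]) -> int:
    """costs[i][c] = cost of painting house i with color c.

    Returns the cheapest total such that adjacent houses differ in color.
    """
    if not costs:
        return 0
    k = len(costs[0])                      # number of colors
    prev = costs[0][:]                     # dp[0][c] = costs[0][c]
    for i in range(1, len(costs)):
        # dp[i][c] = costs[i][c] + min over p != c of dp[i-1][p]
        curr = [
            costs[i][c] + min(prev[p] for p in range(k) if p != c)
            for c in range(k)
        ]
        prev = curr
    return min(prev)
```

Without the color dimension, the state "house index" alone could not tell which colors are allowed next, which is exactly the insufficient-state problem.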

Example: 0/1 Knapsack State

Problem: Items with weight and value; capacity W. Maximize value with total weight ≤ W.

State: (i, w) = "considering items from index i onward, with w weight remaining." Answer for (i, w) = max value we can get from items i..n-1 with capacity w. Base: i = n (no items) → 0; or w = 0 → 0. Transition: skip item i → (i+1, w); take item i → value[i] + (i+1, w - weight[i]) if w ≥ weight[i]. State is 2D: index and remaining capacity.
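
A top-down sketch makes the state visible as the function's arguments. The name knapsack_memo is illustrative, and the w == 0 base case assumes positive weights, as in the description above.

```python
from functools import lru_cache


def knapsack_memo(weights: list[int], values: list[int], capacity: int) -> int:
    """0/1 knapsack, memoized on the state (i, w)."""
    n = len(weights)

    @lru_cache(maxsize=None)
    def best(i: int, w: int) -> int:
        # Max value from items i..n-1 with w capacity remaining.
        if i == n or w == 0:
            return 0
        skip = best(i + 1, w)
        if weights[i] > w:
            return skip                    # item i does not fit
        take = values[i] + best(i + 1, w - weights[i])
        return max(skip, take)

    return best(0, capacity)
```

The answer to the whole problem is the state (0, capacity): all items still available and the full capacity remaining.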

Example: LCS State

Problem: Longest common subsequence of two strings A, B.

State: (i, j) = "LCS of A[0..i-1] and B[0..j-1]" (prefixes of length i and j). Base: i=0 or j=0 → 0. Transition: if A[i-1]==B[j-1] then 1 + dp(i-1,j-1); else max(dp(i-1,j), dp(i,j-1)). State is 2D: two indices.
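
The prefix-length state translates directly into a table with one extra row and column for the empty prefixes. The following is a minimal sketch; the name lcs_length is chosen for this example.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b."""
    m, n = len(a), len(b)
    # dp[i][j] = LCS length of a[:i] and b[:j]; row 0 and column 0 stay 0.
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:       # last characters of the prefixes match
                dp[i][j] = 1 + dp[i - 1][j - 1]
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]
```

Note the index shift: state i refers to a prefix of length i, so the character compared is a[i - 1], not a[i].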

Common Mistakes

Common Mistake: Redundant state: including a parameter that can be derived from others (e.g. "current sum" when it's determined by "which items we took" and we already have index). Sometimes redundant state simplifies the transition—then it's a tradeoff.

Common Mistake: Insufficient state: forgetting a dimension that affects the answer. Example: in "max profit with at most k transactions," if we don't track k, we can't enforce the limit. State must include k.

Optimization Insight: After defining state, check if the number of distinct states is acceptable. If state has many dimensions and the range of each is large, the table may be too big; look for a more compact state or a different formulation.

Interview Insight: Say: "My state is (i, j) meaning ... The base case is ... The transition is: we have two choices, ... so we take the max." Stating state clearly before writing code helps you and the interviewer stay aligned.

Practice Problems

Climbing stairs: state = step index (1D).
0/1 Knapsack: state = (index, remaining weight) (2D).
LCS: state = (i, j) (2D).
House robber: state = index; sometimes "did we take previous?" (index, 0/1).

Summary

State = parameters that uniquely define a subproblem; must be sufficient and ideally minimal.
Ask: "What do I need to know to solve the rest from here?" → that's your state.
Common: 1D (index/length), 2D (two indices or index + capacity), or 2D + extra (index, k or index, flag).
Wrong state → wrong recurrence; right state makes the transition natural.

15.4 Transition Design

Introduction

Once the state is defined, the transition is the rule that expresses dp[state] in terms of other states. It answers: "Given that I'm in this state, what choices do I have, and how do the results of those choices combine?" Transition design is the heart of the recurrence: base cases handle the smallest states; for every other state we write an equation (or min/max/sum over choices) that uses only "smaller" or "already computed" states. Getting the transition right is what makes the DP correct.

Real-World Analogy

At every step you have a few choices. Each choice leads to a new situation (a new state). The "transition" is the rule: "My best outcome from here = (best outcome if I choose A) combined with (best outcome if I choose B)"—e.g. take the max, or add the cost of the choice and recurse. You're not inventing new math; you're writing down "what happens if I do this" and "how do I combine the outcomes."

Formal Definition

Concept Note: A transition (recurrence) for state S is an expression that gives dp[S] using only base cases and dp[S'] for states S' that are "smaller" or "already computed." Typically: dp[S] = combine( choice_1(S), choice_2(S), ... ) where each choice may involve a cost/reward and a recursive dp[next_state]. Combine is often max, min, or + (sum). The transition must only reference states that are computed before S in the fill order (tabulation) or that eventually hit base cases (memoization).

Why This Topic Matters

Correctness: Wrong transition → wrong answer; the transition encodes the problem logic.
Interview: "What are the choices at this state?" and "How do you combine them?" are the next questions after state.
Fill order: In tabulation, the transition dictates which states must be computed first (dependencies).

Mental Model: Choices and Combine

For each state, ask:

What are my choices? (e.g. take item / skip item; use character i / skip i.)
What state does each choice lead to? (e.g. (i+1, w) or (i+1, w - weight[i]).)
What is the "value" of each choice? (immediate cost or reward + value of the next state.)
How do I combine? Usually max (optimization), min (minimization), or + (counting/sum).

Common Transition Patterns

Linear combination (Fibonacci-style)

dp[i] = dp[i-1] + dp[i-2] (or a linear combination of a few previous terms). No explicit "choice"; the recurrence is the rule. Base: dp[0], dp[1].

Take or skip (knapsack-style)

At state (i, w): choice 1 — skip item i → go to (i+1, w), value = dp(i+1, w). Choice 2 — take item i (if w ≥ weight[i]) → value = value[i] + dp(i+1, w - weight[i]). Then dp(i,w) = max(choice1, choice2). Base: when i = n or w = 0.

Match or skip (LCS / edit distance-style)

At state (i, j): if A[i-1] == B[j-1] then we can "match" → 1 + dp(i-1, j-1). Else we "skip" one character: max(dp(i-1, j), dp(i, j-1)). So dp(i,j) = 1 + dp(i-1,j-1) if match, else max(dp(i-1,j), dp(i,j-1)). Base: i=0 or j=0 → 0.

Min/max over many options

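Sometimes we take the best over several next states: dp[i] = min over k of (cost(i,k) + dp[k]). Examples: segment splits such as palindrome partitioning. Matrix chain multiplication uses the two-index version, dp[i][j] = min over k of (dp[i][k] + dp[k+1][j] + cost(i,k,j)). Ensure k runs over valid indices only and that every state on the right-hand side is computed before the state on the left (or use memoization).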
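As a small sketch of the min-over-options pattern, consider an illustrative problem that is not part of the text above: stones have heights, a jump from stone j to stone i costs |heights[i] - heights[j]|, and one jump may cover at most k stones. The function name min_jump_cost and its parameters are assumptions made for this example, and k is assumed to be at least 1.

```python
def min_jump_cost(heights: list, k: int) -> int:
    """Minimum total cost to go from stone 0 to the last stone,
    jumping forward at most k stones at a time (illustrative problem)."""
    n = len(heights)
    if n == 0:
        return 0
    INF = float("inf")
    dp = [INF] * n      # dp[i] = min cost to reach stone i
    dp[0] = 0           # base case: we start on stone 0 for free
    for i in range(1, n):
        # Options: the previous stone j is any of the (up to) k stones before i.
        for j in range(max(0, i - k), i):
            dp[i] = min(dp[i], dp[j] + abs(heights[i] - heights[j]))
    return dp[n - 1]
```

Every dp[j] on the right-hand side has j < i, so a plain left-to-right fill respects the dependencies.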

Example: 0/1 Knapsack Transition

# dp[i][w] = max value from items i..n-1 with capacity w
# Base: dp[n][w] = 0 for all w; dp[i][0] = 0
# Transition:
if w < weight[i]:
    dp[i][w] = dp[i+1][w]                    # can't take
else:
    dp[i][w] = max( dp[i+1][w], value[i] + dp[i+1][w - weight[i]] )

Two choices: skip (same w, next i) or take (value plus state with reduced w and next i). Take the max.
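The transition above can be wrapped into a complete function. This is a minimal sketch under the same "items i..n-1" definition; the name knapsack_suffix is an assumption for illustration, and weights are assumed to be positive integers.

```python
def knapsack_suffix(weight: list, value: list, W: int) -> int:
    n = len(weight)
    # dp[i][w] = max value using items i..n-1 with capacity w.
    # Row n is the base case (no items left), already all zeros.
    dp = [[0] * (W + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):       # dp[i] needs dp[i+1], so go from high i to low i
        for w in range(W + 1):
            dp[i][w] = dp[i + 1][w]      # choice 1: skip item i
            if weight[i] <= w:           # choice 2: take item i if it fits
                dp[i][w] = max(dp[i][w], value[i] + dp[i + 1][w - weight[i]])
    return dp[0][W]
```

The answer sits at dp[0][W] because state (0, W) means "all items still available, full capacity."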

Example: LCS Transition

# dp[i][j] = LCS length of A[0..i-1] and B[0..j-1]
# Base: dp[0][j] = dp[i][0] = 0
if A[i-1] == B[j-1]:
    dp[i][j] = 1 + dp[i-1][j-1]
else:
    dp[i][j] = max(dp[i-1][j], dp[i][j-1])

Match (both advance) or skip one of the two characters; take max of the two skip options.

Dependency Order

In tabulation, we must fill the table so that when we compute dp[i][j], every state it depends on is already computed. For the knapsack definition above, dp[i][w] depends on dp[i+1][...], so we fill i from n-1 down to 0 (with the prefix definition "items 0..i-1", dp[i][w] depends on dp[i-1][...] and we fill i from 1 up to n instead). For LCS, dp[i][j] depends on dp[i-1][j-1], dp[i-1][j], and dp[i][j-1], all of which have a smaller i+j, so we can fill row by row, column by column, or by increasing i+j.

Common Mistakes

Common Mistake: Using a state in the transition that is not yet computed (in tabulation). For example, if dp[i] depends on dp[i+1], fill from high i to low i, not the other way around.

Common Mistake: Forgetting a choice. In knapsack you must consider both "take" and "skip"; in LCS you must consider both "skip A[i]" and "skip B[j]" when characters don't match. Missing a choice gives a suboptimal or wrong answer.

Optimization Insight: Sometimes the transition can be simplified by grouping equivalent choices or by algebraic manipulation (e.g. prefix sums). After writing the obvious recurrence, check if it can be rewritten for fewer lookups or simpler code.
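One way to see this kind of simplification is a sketch on the "climb stairs taking 1 to k steps" variant, where dp[i] is the sum of the previous k values. The obvious recurrence re-sums k entries per state; a running window sum gives the same numbers with O(1) work per state. Both function names are assumptions for illustration, and k is assumed to be at least 1.

```python
def climb_ways_naive(n: int, k: int) -> int:
    dp = [0] * (n + 1)
    dp[0] = 1                                  # one way to stand at the bottom
    for i in range(1, n + 1):
        dp[i] = sum(dp[max(0, i - k):i])       # O(k) lookups per state
    return dp[n]


def climb_ways_window(n: int, k: int) -> int:
    dp = [0] * (n + 1)
    dp[0] = 1
    window = 1                                 # sum of dp[max(0, i-k) .. i-1]; equals dp[0] when i == 1
    for i in range(1, n + 1):
        dp[i] = window
        window += dp[i]                        # dp[i] enters the window for state i+1
        if i - k >= 0:
            window -= dp[i - k]                # dp[i-k] is too far back for state i+1
    return dp[n]
```

With k = 2 both functions reduce to the Fibonacci-style recurrence dp[i] = dp[i-1] + dp[i-2].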

Interview Insight: Say: "At this state I have two choices: ... and .... For the first I get ... and go to state ...; for the second I get ... and go to .... So I take the max/min/sum." Write the recurrence in one line, then code it.

Practice Problems

Fibonacci: transition dp[i] = dp[i-1] + dp[i-2].
0/1 Knapsack: transition = max(skip, take) with correct indices.
LCS: transition = match or max(skip A, skip B).
Climbing stairs: dp[i] = dp[i-1] + dp[i-2] (or the sum of the previous k values when 1 to k steps may be taken at a time).

Summary

Transition = how dp[state] is computed from other states; encodes choices and combine rule (max/min/sum).
For each state: list choices → next state and value for each → combine (usually max, min, or +).
Patterns: linear combo, take/skip, match/skip, min over options; base cases handle smallest states.
In tabulation, fill order must respect dependencies; in memoization, recursion handles order.

15.5 Fibonacci

Introduction

The Fibonacci sequence is the classic first example of dynamic programming. Defined by F(0)=0, F(1)=1, and F(n)=F(n-1)+F(n-2) for n≥2, it has overlapping subproblems: computing F(n) naively by recursion recomputes F(n-2), F(n-3), ... many times, leading to exponential time. Fibonacci illustrates state (single parameter n), transition (add two previous), and the full progression: brute force → memoization → tabulation → space-optimized tabulation.

Why Fibonacci for DP

Overlapping subproblems: F(n-2) is needed for both F(n) and F(n-1); without storing it, we recompute.
Simple state: One parameter n; state space is 0..n.
Simple transition: F(n) = F(n-1) + F(n-2); no choices, just combine.
Same pattern everywhere: Climbing stairs, tile problems, and many counting problems use the same recurrence.

State and Transition

State: n (or index i) — "Fibonacci value at position n."

Base cases: dp[0] = 0, dp[1] = 1.

Transition: dp[i] = dp[i-1] + dp[i-2] for i ≥ 2. No "choice"; it's a direct recurrence.

Brute Force → Memoization → Tabulation → Space-Optimized

1. Brute force (recursion only)

def fib_naive(n: int) -> int:
    if n <= 1:
        return n
    return fib_naive(n - 1) + fib_naive(n - 2)

Time: O(2^n): each call branches into two, and the same subproblems are recomputed many times (the tight bound is about 1.618^n calls). Space: O(n) stack.

2. Memoization (top-down)

def fib_memo(n: int, cache: dict = None) -> int:
    if cache is None:
        cache = {}
    if n <= 1:
        return n
    if n in cache:
        return cache[n]
    cache[n] = fib_memo(n - 1, cache) + fib_memo(n - 2, cache)
    return cache[n]

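Time: O(n). Space: O(n) cache + O(n) stack. Because the recursion is n levels deep, n around 1000 or more exceeds CPython's default recursion limit.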
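The same top-down idea can also be written with the standard library's functools.lru_cache, which stores results keyed by the arguments so no explicit dictionary is needed. This is a minimal sketch; the name fib_cached is an assumption for illustration.

```python
from functools import lru_cache


@lru_cache(maxsize=None)        # unbounded cache: every computed F(n) is kept
def fib_cached(n: int) -> int:
    if n <= 1:
        return n
    return fib_cached(n - 1) + fib_cached(n - 2)
```

The cache lives on the function object, so later calls reuse values computed by earlier ones; the recursion depth is still n on the first call.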

3. Tabulation (bottom-up, full table)

def fib_tab(n: int) -> int:
    if n <= 1:
        return n
    dp = [0] * (n + 1)
    dp[0], dp[1] = 0, 1
    for i in range(2, n + 1):
        dp[i] = dp[i - 1] + dp[i - 2]
    return dp[n]

Time: O(n). Space: O(n) table; no recursion stack.

4. Space-optimized tabulation

def fib_opt(n: int) -> int:
    if n <= 1:
        return n
    prev2, prev1 = 0, 1
    for i in range(2, n + 1):
        curr = prev1 + prev2
        prev2, prev1 = prev1, curr
    return prev1

Time: O(n). Space: O(1) — only two variables. We only need the last two values to compute the next.

Time and Space Summary

Approach	Time	Space
Brute force	O(2^n)	O(n) stack
Memoization	O(n)	O(n)
Tabulation	O(n)	O(n)
Space-optimized	O(n)	O(1)

Edge Cases

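n < 0: Usually undefined; raise an error, or extend the definition to negative indices if the problem requires it. Note that the functions above return n unchanged for negative n (because of the n <= 1 check), so validate the input first.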
n = 0 or 1: Return n; handle before building any table.
Large n: Values grow fast; use modulo if problem asks for F(n) mod M to avoid overflow (Python ints are big, but modulo is common in contests).
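The modulo and negative-input cases can be sketched on top of the space-optimized version. The name fib_mod and the default modulus 1_000_000_007 (a common contest choice) are assumptions for illustration, not requirements from the text.

```python
def fib_mod(n: int, mod: int = 1_000_000_007) -> int:
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return 0
    prev2, prev1 = 0, 1 % mod                   # 1 % mod keeps mod == 1 correct
    for _ in range(2, n + 1):
        # Reduce at every step so intermediate values never exceed 2 * mod.
        prev2, prev1 = prev1, (prev1 + prev2) % mod
    return prev1
```

Reducing at each step is valid because (a + b) mod M depends only on a mod M and b mod M.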

Common Mistakes

Common Mistake: Off-by-one: returning dp[n-1] when the problem expects the nth Fibonacci number (0-indexed: F(0)=0, F(1)=1, so F(n) is at index n). Ensure your loop runs to n and you return the correct index.

Common Mistake: In space-optimized version, updating variables in the wrong order. You must do curr = prev1 + prev2 then prev2, prev1 = prev1, curr. Updating prev1 before computing curr loses the previous value.

Interview Insight: "Fibonacci has overlapping subproblems, so I use DP. State is n; transition is F(n)=F(n-1)+F(n-2). I can do memoization or tabulation; for constant space we only need the last two values." Often asked as a warm-up before harder DP.

Summary

Fibonacci: F(0)=0, F(1)=1, F(n)=F(n-1)+F(n-2); state = n, transition = add previous two.
Brute force is O(2^n); memoization and tabulation are O(n) time; space-optimized tabulation is O(1) space.
Same pattern applies to climbing stairs and many counting problems.
Handle n ≤ 1; watch off-by-one and variable update order in the O(1)-space version.

15.6 Knapsack (0/1 & Unbounded)

Introduction

The knapsack problem: given items with weights and values, and a capacity W, choose items to maximize total value without exceeding capacity. There are two main variants: in 0/1 Knapsack each item can be used at most once; in Unbounded Knapsack each item can be used any number of times. Both are solved by DP: 0/1 uses state (index, capacity) with a "take or skip" transition, while unbounded can use capacity alone with a "try one more copy of any item" transition. Knapsack is the template for many "choose subset with constraint" problems.

Problem Definition

Input: n items; item i has weight wt[i] and value val[i]; capacity W.
0/1: Each item at most once. Maximize sum of values of chosen items such that sum of weights ≤ W.
Unbounded: Each item can be taken any number of times. Same objective.

0/1 Knapsack: State and Transition

State: dp[i][w] = maximum value we can get from items 0..i-1 with capacity w (or: from items i..n-1 with remaining capacity w — definition can vary; below we use "items 0..i-1, capacity w").

Base: dp[0][w] = 0 for all w (no items); dp[i][0] = 0 (no capacity).

Transition: For item i-1 (0-indexed), we have two choices:

Skip: dp[i][w] = dp[i-1][w].
Take: If wt[i-1] ≤ w, dp[i][w] = val[i-1] + dp[i-1][w - wt[i-1]].
dp[i][w] = max(skip, take) (take only if valid).

Answer: dp[n][W].

0/1 Knapsack: Python (Tabulation)

def knapsack_01(wt: list, val: list, W: int) -> int:
    n = len(wt)
    dp = [[0] * (W + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for w in range(1, W + 1):
            dp[i][w] = dp[i - 1][w]  # skip
            if wt[i - 1] <= w:
                take = val[i - 1] + dp[i - 1][w - wt[i - 1]]
                dp[i][w] = max(dp[i][w], take)
    return dp[n][W]

0/1 Knapsack: Space Optimization

dp[i][w] only depends on dp[i-1][...]. So we can use a single row (or two rows) and fill w from right to left so we don't overwrite values we still need: dp[w] = max(dp[w], val[i-1] + dp[w - wt[i-1]]) for w from W down to wt[i-1].

def knapsack_01_opt(wt: list, val: list, W: int) -> int:
    dp = [0] * (W + 1)
    for i in range(len(wt)):
        for w in range(W, wt[i] - 1, -1):  # reverse so we don't use updated dp
            dp[w] = max(dp[w], val[i] + dp[w - wt[i]])
    return dp[W]

Unbounded Knapsack: State and Transition

State: dp[w] = maximum value we can get with capacity w, using any item any number of times.

Base: dp[0] = 0.

Transition: For each capacity w, try including one copy of each item i: if wt[i] ≤ w, dp[w] = max(dp[w], val[i] + dp[w - wt[i]]). We iterate w from 0 to W so that dp[w - wt[i]] may already include more of item i (unbounded).

Answer: dp[W].

Unbounded Knapsack: Python

def knapsack_unbounded(wt: list, val: list, W: int) -> int:
    dp = [0] * (W + 1)
    for w in range(1, W + 1):
        for i in range(len(wt)):
            if wt[i] <= w:
                dp[w] = max(dp[w], val[i] + dp[w - wt[i]])
    return dp[W]

Note: We iterate w forward (0 to W) so that the same item can be used multiple times (we use updated dp[w - wt[i]]). In 0/1 we iterate w backward to avoid using the same item twice.

0/1 vs Unbounded: Key Difference

Aspect	0/1 Knapsack	Unbounded Knapsack
Item use	At most once	Unlimited
State	(i, w) or 1D with reverse w loop	(w) — 1D
Loop order (1D)	w from W down to wt[i]	w from 1 to W (forward)
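To make the loop-order row concrete, here is a sketch that runs the same 1D update in both directions. It uses the item-outer loop for both variants (the unbounded code above uses the capacity-outer loop instead); the function name best_value_1d and the reuse flag are assumptions for illustration, and weights are assumed positive.

```python
def best_value_1d(wt: list, val: list, W: int, reuse: bool) -> int:
    dp = [0] * (W + 1)
    for i in range(len(wt)):
        if reuse:
            # Forward: dp[w - wt[i]] may already contain item i, so it can be taken again.
            capacities = range(wt[i], W + 1)
        else:
            # Backward: dp[w - wt[i]] still holds the value from before item i was considered.
            capacities = range(W, wt[i] - 1, -1)
        for w in capacities:
            dp[w] = max(dp[w], val[i] + dp[w - wt[i]])
    return dp[W]
```

With a single item of weight 1 and value 10 and W = 3, the backward loop yields 10 (item used once) and the forward loop yields 30 (item used three times).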

Time and Space Complexity

0/1: Time O(n×W), space O(n×W) full table or O(W) with 1D and reverse loop.
Unbounded: Time O(n×W), space O(W).

Edge Cases

W = 0 or n = 0: Answer 0.
All weights > W: Answer 0.
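Negative weight/value: The standard formulation assumes positive weights and non-negative values; if zero or negative weights are allowed, the problem changes (for example, an unbounded item with weight 0 and positive value has no finite optimum).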

Common Mistakes

Common Mistake: In 0/1 space-optimized, iterating w from 0 to W instead of W down to wt[i]. Forward order would let the same item be used multiple times (unbounded behavior).

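Common Mistake: In unbounded knapsack written with the item loop on the outside (one pass over w per item), iterating w backward. Then dp[w - wt[i]] has not yet been updated for the current item, so each item is used at most once (0/1 behavior). The capacity-outer version shown above must also go forward, for a different reason: dp[w - wt[i]] has to be final before dp[w] is computed.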

Interview Insight: "State is (index, capacity) for 0/1, or just (capacity) for unbounded. Transition: take or skip; take means add value and reduce capacity. For 0/1 we need previous row only so we can do 1D with reverse w loop; for unbounded we iterate w forward so the same item can be reused."

Practice Problems

LeetCode 416: Partition Equal Subset Sum (0/1: can we reach sum/2?).
LeetCode 518: Coin Change 2 (unbounded: number of ways to make amount).
LeetCode 322: Coin Change (unbounded: minimum number of coins).

Summary

0/1 Knapsack: Each item once; state (i, w); transition max(skip, take); 1D with w loop reversed.
Unbounded Knapsack: Each item unlimited; state (w); transition try each item, dp[w] = max(val[i] + dp[w-wt[i]]); w loop forward.
Reverse w in 0/1 to avoid using same item twice; forward w in unbounded to allow reuse.
Time O(n×W); space O(W) with optimization for both.

15.7 LCS

Introduction

The Longest Common Subsequence (LCS) of two strings is the longest sequence of characters that appears in both strings in the same relative order (not necessarily contiguous). For example, LCS of "abcde" and "ace" is "ace" (length 3). LCS is a classic 2D DP: state (i, j) = LCS length of the prefixes A[0..i-1] and B[0..j-1]; transition is "match if equal, else skip one character from either string." It appears in diff tools, bioinformatics, and many string problems.

Formal Definition

Concept Note: A subsequence of a string is obtained by deleting zero or more characters without changing the order of the remaining characters. The LCS of strings A and B is a subsequence of both A and B of maximum length. There may be more than one LCS; we often compute only the length. If we need one actual sequence, we can backtrack through the DP table.

State and Transition

State: dp[i][j] = length of LCS of A[0..i-1] and B[0..j-1] (prefixes of length i and j).

Base: dp[0][j] = dp[i][0] = 0 for all i, j (empty prefix has LCS length 0).

Transition:

If A[i-1] == B[j-1]: we can match this character → dp[i][j] = 1 + dp[i-1][j-1].
Else: we skip one character — either from A or from B → dp[i][j] = max(dp[i-1][j], dp[i][j-1]).

Answer: dp[len(A)][len(B)].

Python Implementation

def lcs(A: str, B: str) -> int:
    m, n = len(A), len(B)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if A[i - 1] == B[j - 1]:
                dp[i][j] = 1 + dp[i - 1][j - 1]
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]

Line-by-Line Explanation

dp[i][j] uses 1-based length for i, j; indices into A, B are i-1, j-1.
Match: A[i-1] == B[j-1] → extend LCS by 1, so 1 + dp[i-1][j-1].
No match: best of "skip A[i-1]" (dp[i-1][j]) or "skip B[j-1]" (dp[i][j-1]).
Fill order: row by row (or column by column); when we compute dp[i][j], dp[i-1][j-1], dp[i-1][j], dp[i][j-1] are already computed.

Reconstructing One LCS (Backtrack)

To recover one LCS string, backtrack from dp[m][n]: if A[i-1]==B[j-1], include that character and go to (i-1, j-1); else go to (i-1, j) or (i, j-1) depending on which gave the max (or either if equal). Build the string in reverse, then reverse at the end.

def lcs_string(A: str, B: str) -> str:
    m, n = len(A), len(B)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if A[i - 1] == B[j - 1]:
                dp[i][j] = 1 + dp[i - 1][j - 1]
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    # Backtrack
    i, j = m, n
    res = []
    while i and j:
        if A[i - 1] == B[j - 1]:
            res.append(A[i - 1])
            i, j = i - 1, j - 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(res))

Time and Space Complexity

Time: O(m×n) — two nested loops over m and n.
Space: O(m×n) for the table. Can be reduced to O(min(m,n)) by keeping only two rows (current and previous), since dp[i][j] only depends on row i-1 and current row.
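The two-row reduction can be sketched as follows. The name lcs_two_rows is an assumption for illustration; the strings are swapped if needed so that each row has length min(m, n) + 1.

```python
def lcs_two_rows(A: str, B: str) -> int:
    if len(B) > len(A):
        A, B = B, A                      # make B the shorter string
    prev = [0] * (len(B) + 1)            # row i-1 (starts as row 0: all zeros)
    for i in range(1, len(A) + 1):
        curr = [0] * (len(B) + 1)        # row i; curr[0] = 0 is the base case
        for j in range(1, len(B) + 1):
            if A[i - 1] == B[j - 1]:
                curr[j] = 1 + prev[j - 1]
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[len(B)]
```

This keeps only the length; recovering an actual LCS string by backtracking needs the full table (or a more elaborate technique).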

Edge Cases

Empty string: LCS of "" and anything is ""; length 0. Our base case handles this.
No common character: LCS length 0; dp stays 0.
One string is subsequence of the other: LCS length = length of the shorter string.

Common Mistakes

Common Mistake: Off-by-one: using A[i] and B[j] instead of A[i-1] and B[j-1] when dp[i][j] corresponds to prefixes of length i and j. Ensure consistency between "length" indices (1..m) and "array" indices (0..m-1).

Common Mistake: Confusing subsequence with substring. Substring is contiguous; subsequence can skip characters. LCS is about subsequences.

Interview Insight: "State is (i, j) for prefix lengths; if characters match we take 1 + LCS(i-1, j-1); else we take max of skip A or skip B. Fill row by row. O(m×n) time and space."

Practice Problems

LeetCode 1143: Longest Common Subsequence (length).
Print one LCS (backtrack above).
LeetCode 72: Edit Distance (related 2D DP).

Summary

LCS: longest subsequence common to two strings; state (i, j) = LCS length of prefixes of length i, j.
Transition: match → 1 + dp[i-1][j-1]; else max(dp[i-1][j], dp[i][j-1]).
Time O(m×n), space O(m×n) or O(min(m,n)) with two rows.
Subsequence ≠ substring; backtrack table to recover one LCS string.

15.8 LIS

Introduction

The Longest Increasing Subsequence (LIS) problem: given an array of numbers, find the length of the longest subsequence that is strictly (or non-strictly) increasing. For example, in [10, 9, 2, 5, 3, 7, 101, 18], one LIS is [2, 3, 7, 101] with length 4. LIS can be solved in O(n²) with DP (dp[i] = longest increasing subsequence ending at index i) or in O(n log n) using a "tails" array and binary search. It is a classic DP and appears in many forms (e.g. Russian doll envelopes, building bridges).

Formal Definition

Concept Note: A subsequence is obtained by deleting some elements without changing the order of the rest. An increasing subsequence is one where each element is strictly greater than the previous (strict) or greater than or equal (non-strict). The LIS is an increasing subsequence of maximum length. We often compute only the length; reconstructing one LIS is done by backtracking through the DP or the tails structure.

O(n²) DP: State and Transition

State: dp[i] = length of the longest increasing subsequence that ends at index i (and includes nums[i]).

Base: dp[i] = 1 for all i (each element alone is a subsequence of length 1).

Transition: For each i, consider all j < i such that nums[j] < nums[i]. We can extend the LIS ending at j by appending nums[i], so dp[i] = 1 + max{ dp[j] : j < i and nums[j] < nums[i] }. If no such j exists, dp[i] stays 1.

Answer: max(dp[0], dp[1], ..., dp[n-1]).

O(n²) Python Implementation

def lis_n2(nums: list) -> int:
    if not nums:
        return 0
    n = len(nums)
    dp = [1] * n
    for i in range(1, n):
        for j in range(i):
            if nums[j] < nums[i]:
                dp[i] = max(dp[i], 1 + dp[j])
    return max(dp)

O(n log n) Approach: Tails + Binary Search

We maintain an array tails where tails[k] = smallest ending value of an increasing subsequence of length k+1 seen so far. For each new value x, we either extend the longest chain (append to tails) or replace the first element in tails that is ≥ x (so we keep a "better" candidate for that length). Finding that position is a binary search. The length of LIS is the length of tails at the end.

import bisect

def lis_nlogn(nums: list) -> int:
    tails = []
    for x in nums:
        pos = bisect.bisect_left(tails, x)
        if pos == len(tails):
            tails.append(x)
        else:
            tails[pos] = x
    return len(tails)

bisect_left(tails, x) returns the index where x would be inserted to keep tails sorted. If pos == len(tails), all current endings are < x, so we extend. Otherwise we replace tails[pos] with x (same length, better ending value for future).

Comparison: O(n²) vs O(n log n)

Aspect	O(n²) DP	O(n log n) Tails
Time	O(n²)	O(n log n)
Space	O(n)	O(n) for tails
Reconstruct LIS	Easy: backtrack parent pointers from max dp[i]	Tails doesn't store indices; need extra structure to recover
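The parent-pointer reconstruction mentioned in the table can be sketched on top of the O(n²) DP. The name lis_sequence and the parent array are assumptions for illustration; when several LIS exist, this returns just one of them.

```python
def lis_sequence(nums: list) -> list:
    if not nums:
        return []
    n = len(nums)
    dp = [1] * n          # dp[i] = length of the LIS ending at index i
    parent = [-1] * n     # parent[i] = previous index in that LIS, or -1 if none
    for i in range(1, n):
        for j in range(i):
            if nums[j] < nums[i] and dp[j] + 1 > dp[i]:
                dp[i] = dp[j] + 1
                parent[i] = j
    end = max(range(n), key=lambda idx: dp[idx])   # index where the best LIS ends
    seq = []
    while end != -1:      # walk the parent pointers back to the start
        seq.append(nums[end])
        end = parent[end]
    return seq[::-1]      # collected back to front, so reverse
```

On the array [10, 9, 2, 5, 3, 7, 101, 18] from the introduction this returns [2, 5, 7, 101], a different LIS of the same length 4 as the one quoted there.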

Time and Space Complexity

O(n²): Time O(n²), space O(n) for dp array.
O(n log n): Time O(n log n) (n elements, binary search each), space O(n) for tails (length at most n).

Edge Cases

Empty array: LIS length 0.
Single element: LIS length 1.
Strictly decreasing: LIS length 1 (each element alone).
Non-strict (≤): If equal elements are allowed, use bisect_right so that x replaces the first tail that is > x (rather than ≥ x); in the O(n²) version, compare with nums[j] <= nums[i].
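A minimal sketch of the non-strict variant follows; the only change from the strict tails method is the bisect call. The function name lnds_length is an assumption for illustration.

```python
import bisect


def lnds_length(nums: list) -> int:
    """Length of the longest non-decreasing subsequence."""
    tails = []
    for x in nums:
        # bisect_right skips past tails equal to x, so an equal value extends the chain.
        pos = bisect.bisect_right(tails, x)
        if pos == len(tails):
            tails.append(x)
        else:
            tails[pos] = x
    return len(tails)
```

For [2, 2, 2] this gives 3, whereas the strict version with bisect_left gives 1.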

Common Mistakes

Common Mistake: Defining dp[i] as the LIS length of the prefix nums[0..i], which need not include nums[i]. With that definition you cannot tell whether nums[i] may be appended to the stored subsequence. Defining dp[i] as the LIS that ends at i (and includes nums[i]) makes the transition correct: we extend only from j with nums[j] < nums[i].

Common Mistake: In O(n log n), using bisect_right instead of bisect_left for strict LIS. We want to replace the first element that is ≥ x so the sequence stays strictly increasing; bisect_left gives that position.

Interview Insight: "DP: dp[i] = 1 + max dp[j] over j < i with nums[j] < nums[i]; answer max(dp). O(n²). For O(n log n) we maintain the smallest tail for each length and binary search; length of that array is the LIS length."

Practice Problems

LeetCode 300: Longest Increasing Subsequence (length).
LeetCode 354: Russian Doll Envelopes (sort + LIS on second dimension).
Number of LIS (count; different DP).

Summary

LIS: longest increasing (strict or non-strict) subsequence; we usually compute length.
O(n²) DP: dp[i] = LIS ending at i; dp[i] = 1 + max dp[j] for j < i, nums[j] < nums[i]; answer max(dp).
O(n log n): Tails array + binary search; length of tails = LIS length.
Empty/single element edge cases; use bisect_left for strict LIS in the O(n log n) method.

15.9 Coin Change

Introduction

The Coin Change problem: given coins of certain denominations and a target amount, find the minimum number of coins needed to make that amount (each coin can be used unlimited times), or find the number of distinct combinations that make the amount. Both are unbounded DP: state is amount (or amount + some index depending on formulation); we iterate over amounts and try each coin. Coin Change is a direct application of the unbounded knapsack idea—minimize coins (min instead of max) or count ways (sum over choices).

Two Classic Variants

Minimum number of coins (LeetCode 322): Return the fewest number of coins that sum to amount. If impossible, return -1.
Number of combinations (LeetCode 518): Return the number of combinations that make amount. Order does not matter (1+2 and 2+1 are the same).

Variant 1: Minimum Number of Coins

State: dp[a] = minimum number of coins needed to make amount a (using coins unlimited times).

Base: dp[0] = 0 (no coins needed for 0). Initialize dp[a] = infinity for a > 0 (or a large number) to represent "not yet reachable."

Transition: For each amount a from 1 to amount, try each coin c: if c <= a, dp[a] = min(dp[a], 1 + dp[a - c]). After the loop, if dp[amount] is still infinity, return -1.

def coin_change_min(coins: list, amount: int) -> int:
    INF = float("inf")
    dp = [INF] * (amount + 1)
    dp[0] = 0
    for a in range(1, amount + 1):
        for c in coins:
            if c <= a:
                dp[a] = min(dp[a], 1 + dp[a - c])
    return dp[amount] if dp[amount] != INF else -1

We iterate a forward so that the same coin can be used multiple times (unbounded).

Variant 2: Number of Combinations

State: dp[a] = number of ways to make amount a (order of coins doesn't matter—combinations).

Base: dp[0] = 1 (one way to make 0: use no coins).

Transition: To avoid counting (1,2) and (2,1) as different, we iterate by coin first, then amount. For each coin c, for each amount a from c to amount: dp[a] += dp[a - c]. This way we build combinations in a fixed order (e.g. use coin 1 first, then coin 2, ...).

def coin_change_ways(coins: list, amount: int) -> int:
    dp = [0] * (amount + 1)
    dp[0] = 1
    for c in coins:
        for a in range(c, amount + 1):
            dp[a] += dp[a - c]
    return dp[amount]

If we iterated amount first then coin, we would count permutations (order matters). For combinations, coin-first is correct.

Key Difference: Loop Order for Combinations

Concept Note: For number of combinations (order doesn't matter), loop over coins on the outside, amount on the inside. So we consider "use coin c" for all amounts before moving to the next coin; each combination is counted once. For number of permutations (order matters), loop amount on the outside, coins on the inside; then dp[a] counts all ordered ways to form a.
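The effect of loop order can be checked side by side. This sketch repeats the coins-outer counter next to an amount-outer counter so the block is self-contained; both function names are assumptions for illustration.

```python
def count_combinations(coins: list, amount: int) -> int:
    dp = [0] * (amount + 1)
    dp[0] = 1
    for c in coins:                        # coins outer: each coin type is introduced once
        for a in range(c, amount + 1):
            dp[a] += dp[a - c]
    return dp[amount]


def count_permutations(coins: list, amount: int) -> int:
    dp = [0] * (amount + 1)
    dp[0] = 1
    for a in range(1, amount + 1):         # amount outer: any coin may come last
        for c in coins:
            if c <= a:
                dp[a] += dp[a - c]
    return dp[amount]
```

For coins [1, 2] and amount 3, the first returns 2 (1+1+1 and 1+2) and the second returns 3 (1+1+1, 1+2, 2+1).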

Time and Space Complexity

Minimum coins: Time O(amount × len(coins)), space O(amount).
Number of ways: Time O(amount × len(coins)), space O(amount).

Edge Cases

amount = 0: Min coins = 0; number of ways = 1.
No solution (min coins): e.g. amount = 3, coins = [2]. Return -1.
Empty coins: Min coins for amount > 0 is impossible (-1); ways = 0.

Common Mistakes

Common Mistake: For "number of combinations," putting the amount loop on the outside and coins on the inside. That counts permutations. Use coins outer, amount inner for combinations.

Common Mistake: For min coins, initializing dp[0] to 0 but forgetting to initialize the other dp[a] to infinity. If they start at 0, min(dp[a], 1 + dp[a-c]) always keeps 0, so every amount, reachable or not, reports 0 coins.

Interview Insight: "Unbounded DP: dp[amount] is min coins or number of ways. For min coins, try each coin and take min(1 + dp[a-c]). For number of ways (combinations), loop coins first then amount so we don't count order. Return -1 if unreachable for min."

Practice Problems

LeetCode 322: Coin Change (minimum number of coins).
LeetCode 518: Coin Change 2 (number of combinations).
Number of permutations to make amount (amount loop outer, coins inner).

Summary

Min coins: dp[a] = min(1 + dp[a-c]) over coins c; dp[0]=0, else init infinity; return -1 if dp[amount] still infinity.
Number of combinations: dp[0]=1; loop coins outer, amount inner; dp[a] += dp[a-c].
Loop order matters for counting: coins-first counts combinations, amount-first counts permutations. For min coins, either order works.
Time O(amount × |coins|), space O(amount).

15.10 Matrix Chain Multiplication

Introduction

Matrix Chain Multiplication: given a chain of matrices with compatible dimensions, find the order of multiplying them (where to put parentheses) that minimizes the total number of scalar multiplications. Multiplying an (a×b) matrix by a (b×c) matrix costs a×b×c scalar multiplications. The order of multiplication can change the total cost dramatically. This is a classic interval DP or partition DP: we define dp[i][j] = min cost to multiply matrices i through j, and try every split point k between i and j.

Problem Definition

We have n matrices A₀, A₁, ..., A_{n-1}. Matrix A_i has dimensions d[i] × d[i+1]. So we are given an array d of length n+1: d[0], d[1], ..., d[n]. The product A₀×A₁×...×A_{n-1} is well-defined and results in a d[0]×d[n] matrix. We want the minimum number of scalar multiplications to compute this product (by choosing the order of operations).

Example: Three matrices: 10×20, 20×30, 30×40. (A×B)×C costs 10×20×30 + 10×30×40 = 6000 + 12000 = 18000. A×(B×C) costs 20×30×40 + 10×20×40 = 24000 + 8000 = 32000. So the first order is better.

State and Transition

State: dp[i][j] = minimum number of scalar multiplications to compute the product A_i×A_{i+1}×...×A_j (i and j 0-indexed, i ≤ j).

Base: dp[i][i] = 0 (single matrix—no multiplication needed).

Transition: For i < j, we must split at some k where i ≤ k < j: multiply A_i...A_k (cost dp[i][k]), multiply A_{k+1}...A_j (cost dp[k+1][j]), then multiply the two results. The two results have dimensions d[i]×d[k+1] and d[k+1]×d[j+1], so that final multiplication costs d[i] * d[k+1] * d[j+1]. So:

dp[i][j] = min over k in [i, j) of { dp[i][k] + dp[k+1][j] + d[i]*d[k+1]*d[j+1] }

Fill Order

dp[i][j] depends on dp[i][k] and dp[k+1][j] where both segments [i,k] and [k+1,j] are shorter than [i,j]. So we fill by length L = j - i + 1: first L=1 (already base), then L=2, 3, ..., n. So: for L from 2 to n, for i from 0 to n-L, j = i+L-1, then for k from i to j-1 compute the candidate and take min.

Python Implementation

def matrix_chain_order(d: list) -> int:
    n = len(d) - 1  # number of matrices
    if n <= 0:
        return 0
    dp = [[0] * n for _ in range(n)]
    for L in range(2, n + 1):       # chain length
        for i in range(n - L + 1):
            j = i + L - 1
            dp[i][j] = float("inf")
            for k in range(i, j):
                cost = dp[i][k] + dp[k + 1][j] + d[i] * d[k + 1] * d[j + 1]
                dp[i][j] = min(dp[i][j], cost)
    return dp[0][n - 1]

d[i]*d[k+1]*d[j+1]: result of A_i...A_k is d[i]×d[k+1], result of A_{k+1}...A_j is d[k+1]×d[j+1]; multiplying them costs d[i]*d[k+1]*d[j+1].

Line-by-Line Explanation

n = len(d) - 1: n matrices, dimensions d[0]×d[1], d[1]×d[2], ..., d[n-1]×d[n].
Base: diagonal dp[i][i] = 0 (initialized); we only fill when L ≥ 2.
L = length of segment (number of matrices); i = start index; j = i+L-1 = end index.
k is the split: we multiply (A_i...A_k) × (A_{k+1}...A_j). Cost = dp[i][k] + dp[k+1][j] + cost of final multiply.
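If the actual parenthesization is wanted, one option is to remember which k achieved the minimum for each (i, j) and rebuild the expression recursively. This is a sketch under that assumption; the names matrix_chain_parens, split, and build, and the "A0 x A1" output format, are illustrative choices.

```python
def matrix_chain_parens(d: list) -> tuple:
    n = len(d) - 1                           # number of matrices
    if n <= 0:
        return 0, ""
    dp = [[0] * n for _ in range(n)]
    split = [[0] * n for _ in range(n)]      # split[i][j] = best k for segment i..j
    for L in range(2, n + 1):                # chain length
        for i in range(n - L + 1):
            j = i + L - 1
            dp[i][j] = float("inf")
            for k in range(i, j):
                cost = dp[i][k] + dp[k + 1][j] + d[i] * d[k + 1] * d[j + 1]
                if cost < dp[i][j]:
                    dp[i][j] = cost
                    split[i][j] = k          # remember where the minimum came from

    def build(i: int, j: int) -> str:
        if i == j:
            return f"A{i}"
        k = split[i][j]
        return f"({build(i, k)} x {build(k + 1, j)})"

    return dp[0][n - 1], build(0, n - 1)
```

For d = [10, 20, 30, 40] this returns (18000, "((A0 x A1) x A2)"), matching the three-matrix example in the problem definition.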

Time and Space Complexity

Time: O(n³). Three nested loops: length L (O(n)), start i (O(n)), split k (O(n)).
Space: O(n²) for the dp table.

Edge Cases

n = 0 or 1: Zero or one matrix; return 0 (no multiplication).
n = 2: Two matrices; only one way to multiply; cost d[0]*d[1]*d[2].

Common Mistakes

Common Mistake: Wrong cost formula. The last multiplication is (A_i...A_k) × (A_{k+1}...A_j). Left result is d[i]×d[k+1], right is d[k+1]×d[j+1], so cost is d[i]*d[k+1]*d[j+1]. Using d[i]*d[k]*d[j] or similar is a common off-by-one error.

Common Mistake: Filling in wrong order. dp[i][j] must be computed only after all dp[i][k] and dp[k+1][j] for i ≤ k < j. Filling by length (L from 2 to n) ensures that.

Interview Insight: "State dp[i][j] = min cost to multiply matrices i..j. Try every split k; cost = dp[i][k] + dp[k+1][j] + d[i]*d[k+1]*d[j+1]. Fill by chain length so smaller segments are ready. O(n³) time, O(n²) space."

Practice Problems

Classic: Matrix chain multiplication (min scalar multiplications).
Print optimal parentheses (store split point k in another table and backtrack).
Variants: burst balloons (similar interval DP), optimal BST.

Summary

Matrix chain: Given dimensions d[0..n], minimize scalar multiplications for A₀×...×A_{n-1}.
State dp[i][j] = min cost for matrices i..j; base dp[i][i]=0; transition try split k: dp[i][k] + dp[k+1][j] + d[i]*d[k+1]*d[j+1].
Fill by length L = 2..n so dependencies are computed first.
Time O(n³), space O(n²).

15.11 Partition DP

Introduction

Partition DP (also called interval DP or split DP) is the pattern where the state is a range [i, j] and the transition tries every way to partition that range into subranges (e.g. [i, k] and [k+1, j]), solves the subranges recursively or from smaller intervals, and combines their results with a cost or value that depends on the split. Matrix chain multiplication (15.10) is the canonical example; others include burst balloons, palindrome partitioning, merge stones, and optimal BST. The hallmark is: state = (i, j), transition = try all k, combine.

Formal Definition

Concept Note: In partition DP, we define dp[i][j] as the optimal value (min or max cost, count, etc.) for the subproblem on the interval [i, j] (array segment, string substring, or range of indices). The transition: for each partition point k in [i, j), we split into [i, k] and [k+1, j] (or similar), get dp[i][k] and dp[k+1][j], and combine them with a cost that may depend on i, k, j. We take min or max over all k. Base cases: single-element or empty intervals (e.g. dp[i][i]).

Why This Topic Matters

Pattern recognition: Many "choose where to split" or "parenthesize" problems are partition DP.
Fill order: We must compute dp[i][j] only after all shorter intervals are done—typically by increasing length L = j - i + 1.
Complexity: Often O(n³): n² states, O(n) partition points per state.

Mental Model

For a segment [i, j]:

Base: if i == j (or i > j), return the base value (e.g. 0 or known cost).
For each split k from i to j-1: left = [i, k], right = [k+1, j]. Cost = combine(dp[i][k], dp[k+1][j], i, k, j).
dp[i][j] = min (or max) over all k of that cost.

Common Examples

1. Matrix Chain (already seen)

dp[i][j] = min cost to multiply matrices i..j. Split at k: cost = dp[i][k] + dp[k+1][j] + d[i]*d[k+1]*d[j+1]. Fill by length.

2. Burst Balloons (LeetCode 312)

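Given balloons with values nums[0..n-1], bursting balloon k earns nums[left] * nums[k] * nums[right], where left and right are the nearest balloons still unburst on either side (a missing neighbor counts as 1). Maximize total coins. Pad the array with a 1 at each end so the boundaries are real indices. State: dp[i][j] = max coins from bursting every balloon strictly between i and j (indices i+1 to j-1), with i and j themselves still standing as boundaries. Split: choose k as the last balloon burst in (i, j); at that moment its neighbors are exactly i and j, so dp[i][j] = max over k in (i, j) of (dp[i][k] + dp[k][j] + nums[i]*nums[k]*nums[j]). Base: dp[i][i+1] = 0 (no balloon in between).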
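
A minimal sketch of the open-interval formulation, assuming non-negative balloon values (as in LeetCode 312) so that 0 is a safe initial value for every dp cell. The name max_coins and the padded array vals are illustrative.

```python
def max_coins(nums):
    # Pad with 1 on both sides so boundary indices always exist.
    vals = [1] + list(nums) + [1]
    m = len(vals)
    # dp[i][j] = max coins from bursting everything strictly between i and j.
    dp = [[0] * m for _ in range(m)]
    for gap in range(2, m):               # gap = j - i, smallest first
        for i in range(m - gap):
            j = i + gap
            for k in range(i + 1, j):     # k = last balloon burst in (i, j)
                coins = dp[i][k] + dp[k][j] + vals[i] * vals[k] * vals[j]
                dp[i][j] = max(dp[i][j], coins)
    return dp[0][m - 1]
```

Filling by gap plays the same role as filling by length: both dp[i][k] and dp[k][j] span a smaller gap than dp[i][j].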

3. Palindrome Partitioning (min cuts)

Given a string, partition it into palindromic substrings using the minimum number of cuts. A 1D state is enough: dp[i] = min cuts for the prefix s[0..i-1]. Transition: choose the start j of the last piece so that s[j..i-1] is a palindrome; dp[i] = min(1 + dp[j]) over all such j. Base: dp[0] = -1, so a prefix that is itself a palindrome costs 1 + (-1) = 0 cuts. Precomputing the palindrome checks in a table keeps the total at O(n²). (A 2D interval formulation, dp[i][j] = min cuts for s[i..j], is also possible but costs O(n³).)
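
The prefix formulation can be sketched like this. It is one possible implementation, assuming the dp[0] = -1 convention so that a palindromic prefix needs zero cuts; the names min_cut, is_pal, and cuts are illustrative.

```python
def min_cut(s):
    n = len(s)
    if n == 0:
        return 0
    # is_pal[j][i] is True when s[j..i] (inclusive) is a palindrome.
    is_pal = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, -1, -1):
            if s[j] == s[i] and (i - j < 2 or is_pal[j + 1][i - 1]):
                is_pal[j][i] = True
    # cuts[i] = min cuts for prefix s[0..i-1]; cuts[0] = -1 by convention.
    cuts = [-1] + [0] * n
    for i in range(1, n + 1):
        # j = i - 1 always qualifies (single character), so min() is never empty.
        cuts[i] = min(cuts[j] + 1 for j in range(i) if is_pal[j][i - 1])
    return cuts[n]
```

Both the palindrome table and the cuts array take O(n²) time, so this version is cheaper than the generic O(n³) interval fill.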

4. Merge Stones / Minimum Cost to Merge

Merge two adjacent piles at a time; each merge costs the combined size of the two piles. Minimize the total cost to merge everything into one pile. dp[i][j] = min cost to merge segment [i,j] into one pile. Split: merge [i,k] and [k+1,j] into one pile each first, then merge the two; cost = dp[i][k] + dp[k+1][j] + sum(i..j). Use prefix sums so that sum(i..j) is O(1). Fill by length.
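
A minimal sketch of the two-piles-at-a-time version described above (the general K-pile variant of LeetCode 1000 needs extra state and is not covered here). The name min_merge_cost and the prefix array are illustrative.

```python
def min_merge_cost(piles):
    n = len(piles)
    if n <= 1:
        return 0
    # prefix[t] = piles[0] + ... + piles[t-1], so any segment sum is O(1).
    prefix = [0] * (n + 1)
    for t, p in enumerate(piles):
        prefix[t + 1] = prefix[t] + p
    dp = [[0] * n for _ in range(n)]      # dp[i][i] = 0: a single pile costs nothing
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            total = prefix[j + 1] - prefix[i]
            # The final merge always costs sum(i..j), whatever the split.
            dp[i][j] = total + min(dp[i][k] + dp[k + 1][j] for k in range(i, j))
    return dp[0][n - 1]
```

For piles [3, 2, 4, 1] the result is 20: merge 3+2 (cost 5), merge 4+1 (cost 5), then merge 5+5 (cost 10).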

Fill Order (General)

dp[i][j] depends on dp[i][k] and dp[k+1][j] for k between i and j-1. Both [i,k] and [k+1,j] have length smaller than [i,j]. So we iterate by length L from 2 to n: for each L, for each start i, j = i + L - 1, then for each k from i to j-1. This guarantees dependencies are ready.

for L in 2..n:
  for i in 0..(n-L):
    j = i + L - 1
    dp[i][j] = +infinity        # use -infinity when maximizing
    for k in i..(j-1):
      dp[i][j] = min(dp[i][j], combine(dp[i][k], dp[k+1][j], i, k, j))

Time and Space Complexity

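Time: Typically O(n³): three nested loops (length, start, split). Sometimes O(n²), for example when the state is a prefix rather than an interval (palindrome partitioning) or when the range of split points can be narrowed (Knuth optimization, 15.17).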
Space: O(n²) for the 2D dp table.

Common Mistakes

Common Mistake: Filling in wrong order (e.g. by i, j without regard to length). Then dp[i][k] or dp[k+1][j] might not be computed yet. Always fill by increasing length (or ensure k creates strictly smaller intervals).

Common Mistake: Off-by-one in the combine formula (e.g. which index to use for "cost of merging" or "boundary"). Check with a small example (e.g. length 2).

Interview Insight: "This is partition DP: state is interval [i,j], we try every split k and combine subresults. Fill by interval length so smaller intervals are done first. Often O(n³) time."

Practice Problems

Matrix chain multiplication (15.10).
LeetCode 312: Burst Balloons.
LeetCode 132: Palindrome Partitioning II (min cuts).
Merge stones / minimum cost to merge consecutive elements.

Summary

Partition DP: state = interval [i, j]; transition = try every split k, combine dp[i][k] and dp[k+1][j] with a cost.
Fill by length L = 2 to n so that shorter intervals are computed first.
Matrix chain, burst balloons, merge stones, palindrome partitioning are classic examples.
Typically O(n³) time, O(n²) space.

15.12 DP on Trees

Introduction

DP on Trees means defining subproblems on the tree structure: typically, the state is "at node u" (and possibly an extra dimension like "did we take this node" or "how many nodes chosen"). We compute the answer for a node using the answers for its children—so we process in post-order (children before root). The recurrence is natural: each subtree is a subproblem; we combine results from subtrees. Classic problems include maximum path sum, house robber III (take/skip nodes with no two adjacent), tree diameter, and "best outcome in subtree" with constraints.

Why Trees Work Well for DP

No cycles: Once the tree is rooted, the subtrees of different children share no nodes, so each child's result can be computed independently and is used exactly once, by its parent.
Natural subproblems: "Subtree rooted at u" is a clear subproblem; state = node (and maybe a small extra state).
Order: Post-order DFS ensures we visit children before parent, so when we compute for node u, we already have results for all children.

Mental Model

For each node u:

Recursively get results for all children (e.g. left and right for binary tree).
Combine children's results with the current node's value (or constraint) to get the result for the subtree rooted at u.
Return one or more values (e.g. "best path in subtree," "best if we take u," "best if we don't take u").

State and Transition (General)

State: Often (node u) or (node u, flag). For example: dp[u] = best value for subtree at u; or (take[u], skip[u]) = best if we take u / skip u. We don't always store in a table—we can return values from a DFS and use them in the parent.

Transition: Combine children. For binary tree: left_result, right_result = dfs(left), dfs(right); then result_at_u = f(node.val, left_result, right_result). For "take/skip" (house robber): take[u] = val[u] + skip[left] + skip[right]; skip[u] = max(take[left], skip[left]) + max(take[right], skip[right]).

Example: Maximum Path Sum (Binary Tree)

Find the maximum path sum (any path: node-to-node, not necessarily root-to-leaf). For each node we need: (1) the best downward path that starts at this node and descends into at most one child, so the parent can extend it; call it the "chain"; (2) the best path that bends at this node, using the left chain, the node, and the right chain, as a candidate for the global max.

def max_path_sum(root):
    best = [float("-inf")]

    def dfs(node):
        if not node:
            return 0
        left = max(0, dfs(node.left))   # ignore negative chains
        right = max(0, dfs(node.right))
        # path through this node: node.val + left + right
        best[0] = max(best[0], node.val + left + right)
        # chain ending at this node (for parent)
        return node.val + max(left, right)

    dfs(root)
    return best[0]

We return the "chain" (the max-sum path that starts at the current node and goes down into one subtree); the global best path might be "left chain + node + right chain," which we update at each node.

Example: House Robber III (Take / Skip)

Tree version: we can't rob two adjacent nodes (parent and child). For each node return (take, skip): take = node.val + skip(left) + skip(right); skip = max(take(left), skip(left)) + max(take(right), skip(right)). Answer at root = max(take(root), skip(root)).

def rob(root):
    def dfs(node):
        if not node:
            return (0, 0)
        left_take, left_skip = dfs(node.left)
        right_take, right_skip = dfs(node.right)
        take = node.val + left_skip + right_skip
        skip = max(left_take, left_skip) + max(right_take, right_skip)
        return (take, skip)

    t, s = dfs(root)
    return max(t, s)

Time and Space Complexity

Time: O(n)—we visit each node once and do O(1) or O(children) work per node. Total O(n) for a tree with n nodes.
Space: O(h) for recursion stack (height h); O(n) if we store a dp table per node.

Edge Cases

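Empty tree (null root): Return 0 or the identity value for the problem. Note that max_path_sum above returns -inf for an empty tree because no path exists; add an explicit check if the caller expects 0.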
Single node: Base case; return node value or (val, 0) for take/skip.
Negative values: In path sum, "ignore negative" by max(0, chain) so we don't drag negative into the parent.

Common Mistakes

Common Mistake: Computing parent before children. We must use post-order: first recurse to all children, then use their return values at the current node. In-order or pre-order won't have children's results ready.

Common Mistake: In path sum, forgetting that the "best path through this node" can be the global answer (left + node + right), not just the chain we return to the parent. Update a global (or nonlocal) max at each node.

Interview Insight: "DP on tree: state is subtree at node; we need results from children first so use post-order DFS. Return value(s) for subtree; combine at current node. For take/skip, return (take, skip) and combine with skip from children when we take parent."

Practice Problems

LeetCode 124: Binary Tree Maximum Path Sum.
LeetCode 337: House Robber III.
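Tree diameter (longest path between two nodes): let depth(child) be the number of nodes on the longest downward chain from that child (0 for a missing child). The longest path through a node then has depth(left) + depth(right) edges, or one more than that if you count nodes; return the depth to the parent and track the best diameter globally.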
Sum of all paths (root to leaf) or path count.
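
The tree diameter exercise follows the same "return a chain, update a global" shape as maximum path sum. A minimal sketch, assuming a hypothetical TreeNode class with val, left, and right fields and measuring the diameter in edges:

```python
class TreeNode:
    # Hypothetical node type for this example.
    def __init__(self, val=0, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right


def diameter_edges(root):
    best = 0

    def depth(node):
        # Returns the number of nodes on the longest downward chain from node.
        nonlocal best
        if node is None:
            return 0
        left = depth(node.left)
        right = depth(node.right)
        # Longest path bending at this node uses left + right edges.
        best = max(best, left + right)
        return 1 + max(left, right)

    depth(root)
    return best
```

An empty tree and a single node both give 0 under this edge-counting convention.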

Summary

DP on Trees: State = subtree rooted at node (and maybe take/skip or other flag); compute after children (post-order).
Transition = combine children's results with current node; return value(s) for subtree.
Time O(n), space O(h) or O(n).
Max path sum: chain + update global with "through node"; house robber III: (take, skip) with take = val + skip(children).

15.13 DP on Graph

Introduction

DP on Graph means defining subproblems on the vertices (and sometimes edges) of a graph and computing them in an order that respects dependencies. On a DAG (directed acyclic graph), we have a natural order—topological sort—so we can compute dp[v] after all predecessors of v. That gives shortest path, longest path, count of paths, and similar problems in one or two passes. On graphs with cycles, "DP" in the classic sense is harder (no fixed order); we use shortest-path algorithms (Bellman-Ford, Dijkstra) or state that includes more information (e.g. bitmask of visited nodes). This topic focuses on DAG-based DP, which is the standard "DP on graph" in interviews.

Real-World Analogy

Imagine tasks with dependencies: task B can only start after task A finishes. The graph has an edge A → B meaning "A before B." To compute the earliest finish time at each task (a kind of DP), we must process tasks in an order where every dependency is done first—that's topological order. DP on a DAG is exactly that: we assign a value to each node using the values of the nodes that point into it, in topo order.

Formal Definition

Concept Note: In DP on a DAG, we define dp[v] as the optimal value for node v (e.g. shortest path from source to v, longest path, or number of paths to v). The transition: dp[v] is computed from dp[u] for all predecessors u of v (u → v), plus the edge cost or weight. We must process nodes in topological order so that when we compute dp[v], all dp[u] for u → v are already computed. If the graph has cycles, there is no topo order; we need other techniques (shortest path algorithms or expanded state).

Why This Topic Matters

DAGs everywhere: Dependency graphs, project scheduling, compilation order, and many problem constraints form DAGs.
Single-source shortest path in DAG: Can be solved in O(V + E) with one topo pass—simpler than Dijkstra when the graph is a DAG.
Interview: "Longest path in a DAG," "number of paths from source to target," "critical path" are common.

Mental Model

Initialize dp for every node first (e.g. 0 for the source, infinity for all others in shortest path). Then, for each node u in topological order:

For each outgoing edge (u, v) with weight w: update dp[v] from dp[u] (e.g. dp[v] = min(dp[v], dp[u] + w) for shortest path).
By the time the pass reaches v, every predecessor of v has already pushed its value, so dp[v] is final.
After the pass, dp[v] is the answer for node v (e.g. shortest distance from source to v).

Prerequisite: Topological Sort

We need nodes in an order such that for every edge (u, v), u appears before v. Algorithms: DFS (push to stack when leaving) or Kahn's (in-degree zero queue). Without topo order we cannot guarantee that dp[u] is ready when we compute dp[v].
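
A minimal sketch of Kahn's algorithm with cycle detection, assuming the graph is given as an adjacency list where graph[u] lists the successors of u. The function name topo_order and the choice to return None on a cycle are conventions for this example.

```python
from collections import deque


def topo_order(graph):
    n = len(graph)
    indeg = [0] * n
    for u in range(n):
        for v in graph[u]:
            indeg[v] += 1
    # Start from every node that has no prerequisites.
    queue = deque(u for u in range(n) if indeg[u] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in graph[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    # Nodes on a cycle never reach in-degree 0, so they are missing from order.
    return order if len(order) == n else None
```

Checking len(order) == n is what turns the sort into a cycle detector: any DP that relies on the order can bail out early when it gets None.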

Example 1: Shortest Path in DAG (Single Source)

Problem: Given a DAG with edge weights and a source s, find the shortest distance from s to every vertex. Negative weights are allowed (unlike Dijkstra).

State: dist[v] = shortest distance from s to v.

Base: dist[s] = 0; dist[v] = ∞ for v ≠ s.

Transition: For each node u in topo order, for each neighbor v of u: dist[v] = min(dist[v], dist[u] + weight(u,v)). This is "relaxation" in topo order.

def shortest_path_dag(graph, weights, source, n):
    # graph: list of lists, graph[u] = list of neighbors v
    # weights: dict mapping (u, v) -> w; a missing entry defaults to weight 1
    from collections import deque
    indeg = [0] * n
    for u in range(n):
        for v in graph[u]:
            indeg[v] += 1
    topo = []
    q = deque([i for i in range(n) if indeg[i] == 0])
    while q:
        u = q.popleft()
        topo.append(u)
        for v in graph[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                q.append(v)

    INF = float("inf")
    dist = [INF] * n
    dist[source] = 0
    for u in topo:
        if dist[u] == INF:
            continue
        for v in graph[u]:
            w = weights.get((u, v), 1)
            dist[v] = min(dist[v], dist[u] + w)
    return dist

Example 2: Longest Path in DAG

Same idea: for longest path from s to v, initialize dist[s] = 0 and dist[v] = -∞ for every other v, then use dist[v] = max(dist[v], dist[u] + w). Or negate weights and run shortest path. Application: critical path in project scheduling.
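
A minimal sketch of the longest-path variant, assuming the same input convention as the shortest-path example (adjacency list plus a dict of edge weights keyed by (u, v)) and assuming the graph really is a DAG. The name longest_path_dag is illustrative.

```python
from collections import deque


def longest_path_dag(graph, weights, source):
    n = len(graph)
    # Kahn's algorithm for a topological order.
    indeg = [0] * n
    for u in range(n):
        for v in graph[u]:
            indeg[v] += 1
    queue = deque(u for u in range(n) if indeg[u] == 0)
    topo = []
    while queue:
        u = queue.popleft()
        topo.append(u)
        for v in graph[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)

    NEG_INF = float("-inf")
    dist = [NEG_INF] * n
    dist[source] = 0
    for u in topo:
        if dist[u] == NEG_INF:
            continue                      # u is not reachable from source
        for v in graph[u]:
            dist[v] = max(dist[v], dist[u] + weights[(u, v)])
    return dist
```

Nodes that cannot be reached from the source keep the value -inf, mirroring the role of +inf in the shortest-path version.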

Example 3: Number of Paths from Source to Target

State: paths[v] = number of paths from source to v.

Base: paths[source] = 1; paths[v] = 0 for others initially.

Transition: For each u in topo order, for each neighbor v of u: paths[v] += paths[u] (add paths that come through u).

def count_paths_dag(graph, source, target, n):
    from collections import deque
    indeg = [0] * n
    for u in range(n):
        for v in graph[u]:
            indeg[v] += 1
    topo = []
    q = deque([i for i in range(n) if indeg[i] == 0])
    while q:
        u = q.popleft()
        topo.append(u)
        for v in graph[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                q.append(v)

    paths = [0] * n
    paths[source] = 1
    for u in topo:
        for v in graph[u]:
            paths[v] += paths[u]
    return paths[target]

Graphs with Cycles

If the graph has cycles, there is no topological order. Options:

Shortest path: Use Dijkstra when all weights are non-negative, or Bellman-Ford when negative edges are present.
State expansion: e.g. dp[v][mask] = best way to reach v having visited set mask (Hamiltonian path style); then we can iterate in mask order.
Memoization: DFS with a cache, dp(v) = f(dp of v's neighbors). This is safe only when the dependencies themselves are acyclic. For example, the shortest path from v to a target in a DAG depends on v's successors, which amounts to processing nodes in reverse topological order. On a graph with a real cycle the recursion would re-enter a node that is still being computed, so plain memoization does not apply.
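
The memoized-DFS style can be sketched on the path-counting problem. This is only valid under the assumption that the graph is a DAG; on a cyclic graph the recursion below would not terminate. The names count_paths_memo and paths_from are illustrative, and very deep graphs may need an iterative version because of Python's recursion limit.

```python
from functools import lru_cache


def count_paths_memo(graph, source, target):
    # graph[u] = list of successors of u; assumed acyclic.
    @lru_cache(maxsize=None)
    def paths_from(v):
        if v == target:
            return 1
        # Each successor's count is computed once and then served from cache.
        return sum(paths_from(nxt) for nxt in graph[v])

    return paths_from(source)
```

The recursion visits successors before finishing v, which is the same dependency order a reverse topological pass would use.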

Time and Space Complexity

Topological sort: O(V + E).
One pass over nodes and edges: O(V + E). So total O(V + E) for shortest/longest path or count paths in a DAG.
Space: O(V) for dist/paths and topo list.

Edge Cases

Source not in topo order first: Ensure we only relax from nodes that are reachable from source (check dist[u] != INF before relaxing, or run BFS/DFS from source first and only consider those nodes).
Multiple components: Nodes unreachable from source stay at ∞ or 0 paths; that's correct.
Cycle in graph: Topo sort fails (we won't get all nodes, or Kahn's will leave some with positive in-degree). Detect and handle (e.g. report "no order" or use a different algorithm).

Common Mistakes

Common Mistake: Processing nodes in the wrong order. If we iterate by node index (0 to n-1) instead of topological order, we might compute dp[v] before some predecessor u has been processed, so dp[v] is wrong. Always use topo order.

Common Mistake: Forgetting that the source might not be the first in topo order. We initialize dist[source]=0 and only relax from nodes u where dist[u] is finite; that way unreachable nodes stay at ∞ and we don't incorrectly update from uninitialized values.

Optimization Insight: For "single source shortest path" in a DAG, one forward topo pass is enough. For "single destination" (shortest path from every node to target), reverse every edge and run the same pass from target on the reversed graph; equivalently, process the original graph in reverse topological order and pull values from successors. No need for multiple passes like Bellman-Ford.

Interview Insight: "On a DAG we can do DP in topological order: one pass, compute dp[v] from all predecessors. Shortest path: relax in topo order, O(V+E). Longest path: same with max. Count paths: paths[v] += paths[u] for each edge u→v. If the graph has cycles, we need Bellman-Ford/Dijkstra or state that includes more info."

Practice Problems

Shortest path in DAG (single source, possibly negative weights).
Longest path in DAG (critical path).
Number of paths from source to target in DAG.
LeetCode 329: Longest Increasing Path in a Matrix (build DAG from grid, then longest path).

Summary

DP on Graph (DAG): State = value at each node (e.g. dist, path count); transition = combine from all predecessors; order = topological sort.
Shortest path: dist[v] = min(dist[v], dist[u] + w) in topo order; longest path: use max; count paths: paths[v] += paths[u].
Time O(V + E), space O(V). With cycles, use other algorithms or expanded state.
Always process in topo order; only relax from reachable nodes (dist[u] finite) when applicable.

15.14 Bitmask DP

Introduction

Bitmask DP uses an integer (or bitmask) to represent a subset of a set of n elements: bit i is 1 if element i is in the subset, 0 otherwise. So we have 2ⁿ possible subsets encoded as 0 to 2ⁿ−1. The state is often (mask, ...) where mask is this integer, plus optional dimensions (e.g. "last visited node"). This lets us solve "visit each city exactly once" (TSP), "assign n tasks to n people with cost," and other subset-based optimization in O(2ⁿ × poly(n)) time. The key is: state = subset (mask) + optional context; transition = try adding or removing one element.

Real-World Analogy

Imagine a checklist of n places. Instead of storing a list of "visited" or "not visited," you carry a single number: the binary digits tell you which places are checked (1) and which are not (0). For example, 1011 (binary) = 11 means places 0, 1, and 3 are visited. When you visit one more place, you flip one bit. Bitmask DP is that idea: the mask is the checklist, and we build up from smaller masks (fewer places visited) to the full mask (all visited).

Formal Definition

Concept Note: A bitmask of length n is an integer in [0, 2ⁿ−1]. We interpret it as a subset of {0, 1, ..., n−1}: element i is in the subset iff the i-th bit of the mask is 1. Operations: add i → mask | (1 << i); remove i → mask & ~(1 << i); check i → (mask >> i) & 1 or mask & (1 << i); iterate elements of mask → iterate i where bit i is 1. In bitmask DP, state includes a mask (and often another parameter like "current position"); we iterate over masks and/or over elements to add/remove.

Why This Topic Matters

Exponential but feasible: 2ⁿ is manageable for n up to ~20; bitmask DP is the standard for "subset of items" optimization.
TSP and assignment: Traveling salesman (visit all, min cost), job assignment (min cost to assign n jobs to n workers), and many "choose one per slot" problems.
Interview: Less common than linear/2D DP, but appears in harder rounds; knowing the pattern is a plus.

Bit Operations Cheat Sheet

1 << i — bit i set (2ⁱ).
mask | (1 << i) — add i to subset.
mask & ~(1 << i) — remove i from subset.
(mask >> i) & 1 or bool(mask & (1 << i)) — is i in subset?
mask.bit_count() (Python 3.10+) or bin(mask).count("1") — size of subset.
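
The cheat sheet can be exercised with a few small helpers. These are illustrative wrappers (the names add, remove, contains, elements, and submasks are not from any library), plus the common (sub - 1) & mask idiom for walking through every submask of a mask.

```python
def add(mask, i):
    return mask | (1 << i)


def remove(mask, i):
    return mask & ~(1 << i)


def contains(mask, i):
    return (mask >> i) & 1 == 1


def elements(mask):
    """Yield the indices of the set bits, lowest first."""
    i = 0
    while mask:
        if mask & 1:
            yield i
        mask >>= 1
        i += 1


def submasks(mask):
    """Yield every submask of mask, from mask itself down to 0."""
    sub = mask
    while True:
        yield sub
        if sub == 0:
            break
        sub = (sub - 1) & mask
```

For example, list(elements(0b1011)) gives [0, 1, 3], matching the checklist analogy above.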

Mental Model

State = (mask, optional extra). For "visit all and end somewhere": dp[mask][v] = best cost to have visited exactly the set in mask and be at v. Transition: we got to v from some u in mask; so dp[mask][v] = min over u in mask, u ≠ v of (dp[mask without v][u] + cost(u, v)). Base: dp[1<<s][s] = 0 (start at s, only s visited).

Example: Traveling Salesman (TSP)

Problem: n cities, distance/cost matrix. Start at city 0, visit every city exactly once, return to 0. Minimize total cost.

State: dp[mask][v] = minimum cost to have visited all cities in mask and be at city v (v must be in mask). We will finally return to 0, so answer = min over v of (dp[full_mask][v] + cost[v][0]).

Base: dp[1 << 0][0] = 0 (start at 0, only 0 visited).

Transition: For each mask containing city 0 and each v in mask with v ≠ 0, we arrived at v from some u in mask with u ≠ v. So dp[mask][v] = min over u in mask, u ≠ v of (dp[mask without v][u] + cost[u][v]). City 0 is the current city only in the base state dp[1][0]; every other dp[mask][0] stays at infinity. Iterate masks in increasing numeric order (or by size); for each mask, for each v in mask, for each u in mask with u ≠ v, relax.

Python: TSP (Bitmask DP)

def tsp(n, cost):
    # cost[i][j] = cost from i to j; n cities 0..n-1
    if n == 1:
        return 0  # a single city needs no travel
    INF = float("inf")
    full = (1 << n) - 1
    dp = [[INF] * n for _ in range(1 << n)]
    dp[1][0] = 0  # mask with only 0, at city 0

    for mask in range(1 << n):
        for v in range(n):
            if not (mask & (1 << v)):
                continue
            prev = mask & ~(1 << v)
            for u in range(n):
                if not (prev & (1 << u)):
                    continue
                if dp[prev][u] != INF:
                    dp[mask][v] = min(dp[mask][v], dp[prev][u] + cost[u][v])

    ans = INF
    for v in range(1, n):
        if dp[full][v] != INF:
            ans = min(ans, dp[full][v] + cost[v][0])
    return ans

Note: We iterate all masks; for each v in mask we look at prev = mask without v and all u in prev. We need dp[prev][u] to be final before it is read. Clearing a bit always yields a numerically smaller integer, so prev < mask, and iterating mask from 0 to 2ⁿ−1 guarantees every prev has been fully processed first.

Line-by-Line Explanation (TSP)

dp[mask][v]: min cost to visit all cities in mask and end at v.
Base: dp[1][0] = 0 (only city 0 visited, we're at 0).
For each mask and v in mask: prev = mask without v. We came from some u in prev; so dp[mask][v] = min( dp[prev][u] + cost[u][v] ) over u in prev.
Answer: after visiting all (mask = full), go back to 0: min over v of dp[full][v] + cost[v][0].

Time and Space Complexity

Time: O(2ⁿ × n²) — for each of 2ⁿ masks and each v (n), we iterate over u (n).
Space: O(2ⁿ × n) for the dp table.

Edge Cases

n = 1: Only one city; no travel; return 0 (or cost[0][0] if self-loop).
Disconnected: If some cost[i][j] is infinite, ensure we don't use it; INF check in transition.

Common Mistakes

Common Mistake: Wrong bit operations: (1 << i) not (1 < i); use | to add bit, & ~(1<<i) to remove. Check "is i in mask" with (mask >> i) & 1 or mask & (1 << i).

Common Mistake: Iterating masks in an order where a "larger" mask (more bits) is computed before a "smaller" one that it depends on. For TSP, dp[mask] depends on dp[prev] where prev has one fewer bit—so iterating mask from 0 to 2^n-1 is correct (prev < mask when we remove a bit). For other problems, iterate by number of bits (size of subset) if needed.

Interview Insight: "State is (mask, current); mask encodes which items we've used. Transition: try adding one more item (flip a bit). For TSP, dp[mask][v] = min over u in mask of dp[mask without v][u] + cost(u,v). Iterate masks so smaller subsets are done first. O(2^n * n^2) for TSP."

Practice Problems

TSP (visit all, return to start).
Assignment: n tasks to n people, cost[i][j]; min total cost. Same structure: dp[mask] = min cost to give the tasks in mask to the first popcount(mask) people; the next person, index popcount(mask), tries each task not yet in mask.
LeetCode 847: Shortest Path Visiting All Nodes (graph; state (mask, node)).
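
The assignment exercise can be sketched with a one-dimensional mask table. This is a minimal illustration, assuming cost[p][t] is the cost of person p doing task t and that people are assigned in index order; the name min_assignment_cost is illustrative.

```python
def min_assignment_cost(cost):
    n = len(cost)
    INF = float("inf")
    # dp[mask] = min cost to give the tasks in mask to the first popcount(mask) people.
    dp = [INF] * (1 << n)
    dp[0] = 0
    for mask in range(1 << n):
        if dp[mask] == INF:
            continue
        person = bin(mask).count("1")     # index of the next person to assign
        if person == n:
            continue
        for task in range(n):
            if not mask & (1 << task):    # task is still free
                new_mask = mask | (1 << task)
                dp[new_mask] = min(dp[new_mask], dp[mask] + cost[person][task])
    return dp[(1 << n) - 1]
```

Because the person is implied by the number of set bits, the state needs no second dimension: O(2ⁿ × n) time and O(2ⁿ) space.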

Summary

Bitmask DP: State includes a mask (subset) encoded as an integer; use bit operations to add/remove/check elements.
TSP: dp[mask][v] = min cost to visit mask and end at v; transition from dp[mask\v][u] + cost[u][v]; iterate masks in order so dependencies are ready.
Time typically O(2ⁿ × poly(n)); space O(2ⁿ × ...).
Check bit ops (1<<i, |, &, ~) and iteration order.

15.15 Digit DP

Introduction

Digit DP (digit dynamic programming) solves problems about numbers in a range that satisfy a property on their digits. Examples: count numbers in [L, R] with digit sum equal to S, with no digit 4, or with digits in non-decreasing order. We build the number digit by digit (from most significant). The key state is (position, tight, ...): tight means the prefix we've chosen equals the prefix of the upper bound, so we cannot pick a digit larger than the bound's digit at the current position. We use memoization over (pos, tight, optional params) and iterate over valid digits at each step.

Problem Type

Typically: count integers in [0, N] (or [L, R] as count(R) − count(L−1)) that satisfy a digit constraint. We convert N to a list of digits (e.g. "1234" → [1,2,3,4]) and process from left to right.

State and Transition

State: (pos, tight, ...). pos = current position (0 = most significant). tight = True if the prefix built so far equals the prefix of the bound (so we are "tight" and the next digit cannot exceed bound[pos]). Extra state depends on the problem: digit sum, "has digit 4," "last digit," etc.

Base: When pos == len(digits), we have built a full number; return 1 if it satisfies the property, else 0.

Transition: At pos, try each digit d from 0 to (bound[pos] if tight else 9). New_tight = tight and (d == bound[pos]). Recurse to (pos+1, new_tight, updated_extra_state). Sum (for count) or combine results.

Example: Count Numbers in [0, N] With No Digit 4

State: (pos, tight). If we've built a prefix without 4 and we're at pos, try d in 0..9 (or 0..bound[pos] if tight); skip d==4. Base: pos == len → return 1.

def count_no_four(n: int) -> int:
    s = list(map(int, str(n)))
    from functools import lru_cache
    @lru_cache(maxsize=None)
    def dfs(pos: int, tight: bool) -> int:
        if pos == len(s):
            return 1
        up = s[pos] if tight else 9
        res = 0
        for d in range(0, up + 1):
            if d == 4:
                continue
            res += dfs(pos + 1, tight and (d == s[pos]))
        return res
    return dfs(0, True)

The lru_cache decorator memoizes on (pos, tight), so each of the O(len(N)) states is computed once. Without it, the recursion explores every valid number individually, which is exponential in the number of digits.

Example: Count Numbers With Digit Sum = Target

State: (pos, tight, sum_so_far). Try digit d; new_sum = sum_so_far + d; recurse (pos+1, new_tight, new_sum). Base: pos == len and sum_so_far == target → 1, else 0. Memo on (pos, tight, sum_so_far).

def count_digit_sum(n: int, target: int) -> int:
    s = list(map(int, str(n)))
    from functools import lru_cache
    @lru_cache(maxsize=None)
    def dfs(pos: int, tight: bool, sum_so_far: int) -> int:
        if pos == len(s):
            return 1 if sum_so_far == target else 0
        if sum_so_far > target:
            return 0
        up = s[pos] if tight else 9
        res = 0
        for d in range(0, up + 1):
            res += dfs(pos + 1, tight and (d == s[pos]), sum_so_far + d)
        return res
    return dfs(0, True, 0)

Range [L, R]

Count in [L, R] = count(R) − count(L−1), where count(N) counts the valid numbers in [0, N]. When L = 0, L−1 is −1; define count(N) = 0 for any negative N so the formula still holds.
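
A minimal sketch of the range wrapper, using the digit-sum property as the example and returning 0 for a negative bound so that L = 0 needs no special case. The names count_upto and count_in_range are illustrative.

```python
from functools import lru_cache


def count_upto(n, target):
    """Count integers x in [0, n] whose digit sum equals target (0 if n < 0)."""
    if n < 0:
        return 0
    digits = list(map(int, str(n)))

    @lru_cache(maxsize=None)
    def dfs(pos, tight, total):
        if total > target:
            return 0
        if pos == len(digits):
            return 1 if total == target else 0
        up = digits[pos] if tight else 9
        return sum(
            dfs(pos + 1, tight and d == up, total + d) for d in range(up + 1)
        )

    return dfs(0, True, 0)


def count_in_range(lo, hi, target):
    # Prefix-count subtraction: [lo, hi] = [0, hi] minus [0, lo - 1].
    return count_upto(hi, target) - count_upto(lo - 1, target)
```

Each call to count_upto builds its own cache, since the memoized states depend on the digits of that particular bound.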

Time and Space Complexity

Time: O(digits × 2 × (domain of extra state) × 10) with memo, where the factor 10 is the digit loop inside each state. For "no 4," states are (pos, tight), so O(len(N)) states. For digit sum, the extra dimension is the sum (bounded by 9×len(N)). So typically O(poly(log N)) states.
Space: Same as state space + recursion depth O(len(N)).

Edge Cases

N = 0: Digits = [0]; count is 0 or 1 depending on whether 0 is allowed.
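Leading zeros: Numbers shorter than N are generated with the same number of digits as N, padded with leading zeros (e.g. 7 as 007 when N has three digits). That is harmless when the property ignores extra zeros (digit sum, no digit 4). When zeros matter (counting zeros, distinct digits, adjacent-digit rules), add a "started" or "leading zero" flag to the state so padding zeros are not treated as real digits.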
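
The "started" flag can be illustrated with a property where padding zeros would otherwise cause wrong answers: counting integers in [0, N] whose digits are all different. This is a sketch under that assumed problem statement; the name count_distinct_digits and the used bitmask are choices made for the example.

```python
from functools import lru_cache


def count_distinct_digits(n):
    """Count integers in [0, n] whose decimal digits are all different."""
    if n < 0:
        return 0
    digits = list(map(int, str(n)))

    @lru_cache(maxsize=None)
    def dfs(pos, tight, started, used):
        if pos == len(digits):
            return 1                      # every completed number is valid (0 included)
        up = digits[pos] if tight else 9
        total = 0
        for d in range(up + 1):
            next_tight = tight and d == up
            if not started and d == 0:
                # Still in the leading-zero padding: do not mark 0 as used.
                total += dfs(pos + 1, next_tight, False, used)
            elif not used & (1 << d):
                total += dfs(pos + 1, next_tight, True, used | (1 << d))
        return total

    return dfs(0, True, False, 0)
```

Without the started flag, a number like 7 (padded to 007) would be rejected for repeating the digit 0.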

Common Mistakes

Common Mistake: Forgetting to pass and update tight correctly. When tight is True and we choose d < bound[pos], the next state has tight = False (we're below the bound). When we choose d == bound[pos], tight remains True only if we were tight.

Common Mistake: Not memoizing. Without cache, the same (pos, tight, ...) is recomputed many times; digit DP is exponential without memo.

Interview Insight: "Digit DP: build number digit by digit. State is (position, tight, ...). Tight means we're still matching the bound so we can't exceed the current digit. Try digits 0 to bound[pos] (if tight) or 0 to 9; recurse with new tight = tight and (d == bound[pos]). Memoize on state."

Practice Problems

Count numbers in [1, N] with no digit 4 (or 9).
Count numbers in [L, R] with digit sum = S.
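Sum of digits of all numbers in [L, R] (either carry "sum so far" as extra state and return it at the base case, or have each state return a pair (count, sum); avoid a global accumulator, which cached states would skip).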
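
The (count, sum) idea from the last exercise can be sketched as follows. Each state returns how many completions exist and the total of the digits placed from that position onward, so no "sum so far" dimension is needed. The names digit_sum_upto and digit_sum_range are illustrative.

```python
from functools import lru_cache


def digit_sum_upto(n):
    """Sum of the decimal digits of every integer in [0, n] (0 if n < 0)."""
    if n < 0:
        return 0
    digits = list(map(int, str(n)))

    @lru_cache(maxsize=None)
    def dfs(pos, tight):
        # Returns (count, total): number of completions from pos, and the
        # sum of digits placed at positions >= pos across all of them.
        if pos == len(digits):
            return 1, 0
        up = digits[pos] if tight else 9
        count = total = 0
        for d in range(up + 1):
            sub_count, sub_total = dfs(pos + 1, tight and d == up)
            count += sub_count
            total += sub_total + d * sub_count   # d appears once per completion
        return count, total

    return dfs(0, True)[1]


def digit_sum_range(lo, hi):
    return digit_sum_upto(hi) - digit_sum_upto(lo - 1)
```

Leading zeros add nothing to a digit sum, so this property needs no "started" flag.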

Summary

Digit DP: Count (or optimize) numbers in a range satisfying a digit property; build digit by digit from MSB.
State: (pos, tight, ...); tight = prefix equals bound prefix; try d from 0 to bound[pos] (if tight) or 9.
Memoize on (pos, tight, extra); [L,R] = count(R) − count(L−1).
Time O(poly(log N)) with memo; watch tight update and leading zeros.

15.16 Divide & Conquer DP

Introduction

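Divide & Conquer DP, as presented in this section, is an optimal-split optimization for certain partition DP recurrences in which the optimal split point k has a monotonicity property: if opt[i][j] is the best k for segment [i, j], then opt[i][j-1] ≤ opt[i][j] ≤ opt[i+1][j]. That lets us search only in a narrow range of k instead of trying all k in [i, j), reducing time from O(n³) to O(n²) for problems like optimal BST, matrix chain variants whose cost satisfies the conditions, and some "merge" problems. This opt-range technique is the one usually called Knuth (or Knuth-Yao) optimization, covered again in 15.17. The name "divide and conquer optimization" more commonly refers to a related trick for layered recurrences of the form dp[t][j] = min over k < j of (dp[t-1][k] + cost(k, j)), where monotone optimal split points let each layer be filled in O(n log n) by recursing on the midpoint. For the interval form described here, the cost function must satisfy the quadrangle inequality (and monotonicity on intervals) for the optimization to be valid.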

When It Applies

Recurrence of the form:

dp[i][j] = min over k in [i, j) of { dp[i][k] + dp[k+1][j] + cost(i, j, k) }

Often cost(i, j, k) is just w(i, j) (it doesn't depend on k), e.g. the sum of frequencies in [i, j] for optimal BST. For the optimization to hold, w should satisfy the following for all a ≤ b ≤ c ≤ d:

Quadrangle inequality (QI): w(a, c) + w(b, d) ≤ w(a, d) + w(b, c), so that the best k doesn't "jump" arbitrarily.
Monotonicity on intervals: w(b, c) ≤ w(a, d), i.e. a sub-interval never costs more than an interval that contains it.

Together these imply monotonicity of opt: opt[i][j-1] ≤ opt[i][j] ≤ opt[i+1][j], so when we fill by length L, we know opt[i][j] lies between opt[i][j-1] and opt[i+1][j].

Idea: Restrict k Range

Instead of looping k from i to j-1, we loop k only from opt[i][j-1] to opt[i+1][j] (clamped to the valid range of k for the recurrence). For the shortest segments, where opt[i][j-1] or opt[i+1][j] would refer to an empty interval, fall back to the full range; for a single element, opt[i][i] = i. When we fill by increasing length L = j−i+1, opt[i][j-1] and opt[i+1][j] are already known from previous lengths.

Example: Optimal BST (Sketch)

Given keys and frequencies, build a BST that minimizes total search cost (sum of depth×frequency). dp[i][j] = min cost for keys i..j; root is some k in [i, j]; cost = dp[i][k-1] + dp[k+1][j] + sum(freq[i..j]). The sum doesn't depend on k. With QI, opt[i][j-1] ≤ opt[i][j] ≤ opt[i+1][j]. So we iterate k only in that range.

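def optimal_bst_cost(freq):
    # Fill by length L; opt[i][j] = best root for keys i..j
    n = len(freq)
    prefix = [0] * (n + 1)                # prefix[t] = freq[0] + ... + freq[t-1]
    for t, f in enumerate(freq):
        prefix[t + 1] = prefix[t] + f
    dp = [[0] * n for _ in range(n)]
    opt = [[0] * n for _ in range(n)]
    for L in range(1, n + 1):
        for i in range(0, n - L + 1):
            j = i + L - 1
            lo = opt[i][j - 1] if j > i else i   # length 1: the only root is i
            hi = opt[i + 1][j] if j > i else j
            w = prefix[j + 1] - prefix[i]        # sum(freq[i..j]) in O(1)
            dp[i][j] = float("inf")
            for k in range(lo, hi + 1):
                cost = (dp[i][k-1] if k > i else 0) + (dp[k+1][j] if k < j else 0) + w
                if cost < dp[i][j]:
                    dp[i][j], opt[i][j] = cost, k
    return dp[0][n - 1] if n else 0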

Time and Space Complexity

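Time: O(n²). For a fixed length L, segment (i, j) tries opt[i+1][j] − opt[i][j-1] + 1 values of k, and both bounds come from length L−1. Summed over i, the differences telescope to at most n, so each length costs O(n) and all n lengths together cost O(n²). This assumes cost(i, j) is available in O(1), e.g. from prefix sums.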
Space: O(n²) for dp and opt tables.

Common Mistakes

Common Mistake: Using this optimization when the cost does not satisfy the required inequalities. Then opt[i][j] can lie outside [opt[i][j-1], opt[i+1][j]] and the restricted k loop gives the wrong answer.

Common Mistake: Off-by-one in the bounds for lo/hi (e.g. k must be in [i, j] for optimal BST as root index; for partition DP k is often in [i, j) so hi = j-1). Match the recurrence.

Interview Insight: "When the recurrence is dp[i][j] = min over k of dp[i][k] + dp[k+1][j] + cost, and the cost has quadrangle inequality, the best k is monotonic: opt[i][j-1] ≤ opt[i][j] ≤ opt[i+1][j]. So we only try k in that range and get O(n²) instead of O(n³)."

Practice Problems

Optimal BST (minimize expected search cost).
Matrix chain with certain cost structures.
Merge stones / minimum cost to merge (when the cost satisfies QI).

Summary

Divide & Conquer DP: Optimize partition DP by restricting the split k using monotonicity of the optimal split point.
Requires quadrangle inequality (and often monotonicity); then opt[i][j-1] ≤ opt[i][j] ≤ opt[i+1][j].
Loop k only in [opt[i][j-1], opt[i+1][j]]; fill by length so these opt values are known.
Time O(n²), space O(n²).

15.17 Knuth Optimization

Introduction

Knuth Optimization (also called the Knuth-Yao speedup) is a technique to speed up certain partition DP recurrences from O(n³) to O(n²). It applies when the cost function in the recurrence satisfies Knuth's conditions (the quadrangle inequality plus monotonicity), which imply that the optimal split point k for segment [i, j] lies between the optimal split for [i, j−1] and the optimal split for [i+1, j]. We maintain an opt[i][j] table and only try k in that range instead of all k in [i, j), which reduces the inner loop from O(n) to O(1) amortized. It is the standard technique for optimal BST and applies to other interval DPs whose cost term depends only on (i, j), such as merging adjacent piles; the standard matrix-chain recurrence does not qualify because its cost depends on k.

Real-World Analogy

Imagine cutting a rod at different positions to minimize cost. If "the best cut for a longer piece is never to the left of the best cut for a shorter piece starting at the same place," then when we solve for a longer segment we only need to look near where we cut the slightly shorter segments. Knuth optimization formalizes this: the best split doesn't jump around; it moves monotonically, so we narrow the search.

Formal Definition

Concept Note: Consider the recurrence dp[i][j] = min_{i ≤ k < j} { dp[i][k] + dp[k+1][j] + w(i, j) } for i < j, with base dp[i][i] = 0. Let opt[i][j] be the value of k that achieves this minimum (smallest k if tie). Knuth's conditions on the weight w(i, j) are: (1) Quadrangle inequality (QI): w(i, j) + w(i′, j′) ≤ w(i, j′) + w(i′, j) for i ≤ i′ ≤ j ≤ j′. (2) Monotonicity: w(i, j) ≤ w(i, j+1) and w(i, j) ≤ w(i−1, j). Under these, we have opt[i][j−1] ≤ opt[i][j] ≤ opt[i+1][j], so we can restrict the k loop.
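Before relying on the restricted range, it can help to test a candidate weight function against both conditions on small inputs. The following is a minimal brute-force sketch, not a proof: the helper name satisfies_knuth_conditions and the sample weights w_sum and w_bad are made up for this illustration, and the check runs in O(n⁴), so it is only meant for tiny n.

```python
from itertools import combinations_with_replacement


def satisfies_knuth_conditions(w, n):
    """Brute-force check of QI and monotonicity for w(i, j) on 0 <= i <= j < n."""
    # Quadrangle inequality: w(a, c) + w(b, d) <= w(a, d) + w(b, c) for a <= b <= c <= d.
    for a, b, c, d in combinations_with_replacement(range(n), 4):
        if w(a, c) + w(b, d) > w(a, d) + w(b, c):
            return False
    # Monotonicity: extending an interval on either side never lowers its weight.
    for i in range(n):
        for j in range(i, n):
            if j + 1 < n and w(i, j) > w(i, j + 1):
                return False
            if i > 0 and w(i, j) > w(i - 1, j):
                return False
    return True


# Illustrative weights (assumptions for this example only).
freq = [3, 1, 4, 1, 5]
pref = [0]
for f in freq:
    pref.append(pref[-1] + f)


def w_sum(i, j):
    return pref[j + 1] - pref[i]  # range sum of non-negative frequencies


def w_bad(i, j):
    return -(j - i) ** 2  # violates QI, e.g. for (a, b, c, d) = (0, 1, 1, 2)


print(satisfies_knuth_conditions(w_sum, len(freq)))
print(satisfies_knuth_conditions(w_bad, len(freq)))
```

Passing this check on small cases is evidence, not a guarantee; the conditions still have to hold for every input size the DP will see.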

Why This Topic Matters

Speedup: O(n³) → O(n²) for a class of partition DP problems; critical when n is large.
Optimal BST and variants: Standard solution uses Knuth optimization (or the same idea).
Competitive programming: Common in advanced DP problems; knowing when and how to apply it is valuable.

Mental Model

When filling dp[i][j] for a segment [i, j]:

We know opt[i][j−1] and opt[i+1][j] from previous iterations (shorter segments).
So the best k for [i, j] lies in [opt[i][j−1], opt[i+1][j]] (clamped to [i, j−1]).
Loop only over k in that range; update dp[i][j] and set opt[i][j] to the best k.
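The three steps above can be sketched for the split form of the recurrence (k in [i, j−1]). This is a minimal illustration under stated assumptions: the problem (merging adjacent piles where each merge costs the sum of the merged range), the function name min_merge_cost, and the convention opt[i][i] = i for length-1 segments are choices made for this example, and the weights are assumed non-negative so that Knuth's conditions hold.

```python
def min_merge_cost(weights):
    """Min total cost to merge adjacent piles into one pile.

    Recurrence: dp[i][j] = min over i <= k < j of dp[i][k] + dp[k+1][j] + sum(weights[i..j]).
    """
    n = len(weights)
    if n <= 1:
        return 0  # nothing to merge
    pref = [0]
    for x in weights:
        pref.append(pref[-1] + x)

    dp = [[0] * n for _ in range(n)]
    opt = [[i] * n for i in range(n)]  # opt[i][i] = i is a placeholder for length 1

    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            # Restricted range, clamped to the valid splits [i, j-1].
            lo = max(i, opt[i][j - 1])
            hi = min(j - 1, opt[i + 1][j])
            best, best_k = float("inf"), lo
            for k in range(lo, hi + 1):
                cost = dp[i][k] + dp[k + 1][j]
                if cost < best:  # strict '<' keeps the smallest k on ties
                    best, best_k = cost, k
            dp[i][j] = best + pref[j + 1] - pref[i]
            opt[i][j] = best_k
    return dp[0][n - 1]


print(min_merge_cost([1, 2, 3]))
print(min_merge_cost([4, 1, 1, 4]))
```

For [1, 2, 3] the best plan merges 1 and 2 first (cost 3) and then everything (cost 6), for a total of 9.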

Knuth's Conditions in Practice

For optimal BST: w(i, j) = sum of frequencies from i to j, computed in O(1) from prefix sums. A range sum satisfies QI with equality, and it is monotone whenever the frequencies are non-negative. For matrix chain, the cost d[i]*d[k+1]*d[j+1] depends on k, so the standard recurrence is not of the form w(i, j) and Knuth's conditions do not apply to it directly; only a reformulation whose cost term depends on (i, j) alone could qualify. Many convex costs of the segment length, such as (j − i)², satisfy QI, but always verify both conditions before relying on the restricted range.

Step-by-Step: Optimal BST with Knuth

Input: Keys 0..n−1 (or 1..n), frequency f[i] for key i. We want a BST minimizing the sum of (depth of key i × f[i]), with the root at depth 1.
Precompute: Prefix sum of frequencies so w(i, j) = sum(f[i..j]) in O(1).
Base: dp[i][i] = f[i] (a single node sits at depth 1) and opt[i][i] = i; an empty range (k−1 < i or k+1 > j) costs 0.
Fill by length L: For L from 2 to n, for i from 0 to n−L, j = i+L−1. Set lo = opt[i][j−1] and hi = opt[i+1][j]; both are known because those segments have length L−1. For k from lo to hi: cost = dp[i][k−1] + dp[k+1][j] + w(i,j). If cost < dp[i][j], update dp[i][j] and set opt[i][j] = k.

Python Implementation (Optimal BST)

def optimal_bst(freq):
    n = len(freq)
    if n == 0:
        return 0  # empty tree
    # prefix sum: w(i,j) = pref[j+1] - pref[i]
    pref = [0]
    for f in freq:
        pref.append(pref[-1] + f)

    # Base case: a single key is a root at depth 1, so dp[i][i] = freq[i].
    dp = [[0] * n for _ in range(n)]
    for i in range(n):
        dp[i][i] = freq[i]
    opt = [[i] * n for i in range(n)]  # opt[i][i] = i

    for L in range(2, n + 1):
        for i in range(0, n - L + 1):
            j = i + L - 1
            dp[i][j] = float("inf")
            # Both neighbours have length L-1 >= 1, so they are already filled.
            lo = opt[i][j - 1]
            hi = opt[i + 1][j]
            w = pref[j + 1] - pref[i]  # does not depend on k
            for k in range(lo, hi + 1):
                left = dp[i][k - 1] if k > i else 0    # empty left subtree costs 0
                right = dp[k + 1][j] if k < j else 0   # empty right subtree costs 0
                cost = left + right + w
                if cost < dp[i][j]:
                    dp[i][j] = cost
                    opt[i][j] = k
    return dp[0][n - 1]

Line-by-Line Explanation

pref: w(i, j) = pref[j+1] − pref[i] in O(1).
opt[i][j]: best root k for keys i..j; initialized to i (for length 1, opt[i][i]=i).
Length L from 2 to n; for segment [i, j] with j = i+L−1, lo = opt[i][j−1], hi = opt[i+1][j].
Cost for root k: dp[i][k−1] + dp[k+1][j] + w(i,j). Update dp and opt when we get a better cost.
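When experimenting with the restricted range, an unoptimized reference is useful for cross-checking results on small inputs. The sketch below is one such reference, written as a memoized recursion that tries every root; the name optimal_bst_naive is an assumption for this example, and the cost convention (root at depth 1, empty subtree costs 0) follows the recurrence described above.

```python
from functools import lru_cache


def optimal_bst_naive(freq):
    """O(n^3) reference: try every root k in [i, j] for every segment."""
    n = len(freq)
    pref = [0]
    for f in freq:
        pref.append(pref[-1] + f)

    @lru_cache(maxsize=None)
    def solve(i, j):
        if i > j:
            return 0  # empty subtree
        # Choosing a root pushes every key in i..j one level deeper: add w(i, j).
        w = pref[j + 1] - pref[i]
        return w + min(solve(i, k - 1) + solve(k + 1, j) for k in range(i, j + 1))

    return solve(0, n - 1)


print(optimal_bst_naive([34, 8, 50]))
```

For frequencies [34, 8, 50] the best tree puts the third key at the root, the first key below it, and the second key at depth 3, for a cost of 50 + 2·34 + 3·8 = 142.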

Time and Space Complexity

Time: O(n²). For each (i, j) we try opt[i+1][j] − opt[i][j−1] + 1 values of k. For a fixed length L these ranges telescope: the upper end for segment i is the lower end for segment i+1, so one whole diagonal costs O(n) and the n diagonals cost O(n²) in total.
Space: O(n²) for dp and opt.

Edge Cases

n = 0 or 1: For an empty key set return 0; for a single key the cost is f[0] (one node at depth 1).
lo > hi: Clamp lo to i and hi to j−1 (or j for optimal BST where k is root index in [i, j]); ensure the loop runs correctly when opt[i][j−1] > opt[i+1][j] (shouldn't happen if conditions hold, but defensive coding helps).

Common Mistakes

Common Mistake: Applying Knuth optimization when the cost does not satisfy QI/monotonicity. Verify the weight function for your problem (e.g. prefix sum for optimal BST is fine; arbitrary cost is not).

Common Mistake: Using opt[i+1][j] when i+1 > j. For length-2 segment [i, i+1], opt[i+1][i+1] = i+1; lo = opt[i][i] = i, hi = opt[i+1][i+1] = i+1. So k runs from i to i+1. For root index in [i, j], k can be i or i+1; both are valid. Ensure bounds are clamped to [i, j].

Optimization Insight: Without the opt table we would iterate k from i to j−1 (or i to j for BST) for every (i, j), giving O(n³). The key is that the total number of (k, i, j) triples where k is tried for (i, j) is O(n²): along each diagonal (fixed length) the ranges [opt[i][j−1], opt[i+1][j]] telescope to O(n) work, and there are n diagonals.

Interview Insight: "For partition DP with recurrence dp[i][j] = min over k of dp[i][k] + dp[k+1][j] + w(i,j), if w satisfies Knuth's conditions (QI + monotonicity), then opt[i][j-1] ≤ opt[i][j] ≤ opt[i+1][j]. Store opt[i][j] and only loop k in that range; total time becomes O(n²)."

Practice Problems

Optimal BST (classic).
Matrix chain with cost that fits the form (e.g. some variants).
Merge stones / minimum cost to merge when cost is additive (sum) over the segment.

Summary

Knuth Optimization: Restrict the split k in partition DP to [opt[i][j−1], opt[i+1][j]] when the cost w(i,j) satisfies quadrangle inequality and monotonicity.
Maintain opt[i][j] = best k for segment [i, j]; fill by length so opt[i][j−1] and opt[i+1][j] are known.
Time O(n²), space O(n²). Optimal BST is the standard example.
Do not apply when cost doesn't satisfy the conditions.

15.18 Convex Hull Trick

Introduction

The Convex Hull Trick (CHT) solves the problem: given many linear functions f_k(x) = m_k·x + c_k, for a query value x find the minimum (or maximum) of f_k(x) over all k. The "trick" is that we only need to maintain the lower envelope (for min) or upper envelope (for max)—the set of line segments that actually win for some x. In DP, recurrences of the form dp[i] = min_j (m[j]*x[i] + c[j]) + const can be evaluated in O(log n) per state using CHT, turning O(n²) into O(n log n). Used when the transition is linear in the "query" variable.

Problem Formulation

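We have lines L_k: y = m_k·x + c_k. For query x, compute min_k L_k(x) (or max). The lower envelope is the pointwise minimum of the lines: a concave piecewise-linear function on which each useful line wins on one contiguous interval of x, with the winning slopes decreasing from left to right. We add lines one by one and maintain the hull so that we can query in O(log n), or O(1) amortized when the additions and queries are monotonic.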

When Slopes and Queries Are Monotonic

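For min queries, if we add lines in decreasing slope order and query increasing x, we can use a deque. (For max queries the matching order is increasing slope with increasing x.) The hull is maintained by removing lines that are never minimal: when adding a new line, pop from the back while the new line overtakes the second-from-back line at or before the point where the back line does, because the back line then never wins. Query: pop from the front while the next line is at least as good at the current x; since x only grows, a popped line can never win again. Each line is pushed and popped at most once, so the total is O(n) when pops are O(1) (use collections.deque, not list.pop(0)).

Deque Implementation (Min Hull, Decreasing Slope, Increasing Query)

from collections import deque

# Lines as (m, c). query(x) returns the min over all added lines.
# Assumes lines are added in non-increasing slope m and queries come in non-decreasing x.
def cht_deque():
    lines = deque()  # hull lines (m, c); slopes strictly decreasing from front to back

    def useless(l1, l2, l3):
        # Slopes m1 > m2 > m3. l2 never wins if l3 overtakes l1 at or before l2 does:
        # x(l1, l3) <= x(l1, l2), compared by cross-multiplication (no division).
        (m1, c1), (m2, c2), (m3, c3) = l1, l2, l3
        return (c3 - c1) * (m1 - m2) <= (c2 - c1) * (m1 - m3)

    def add(m, c):
        if lines and lines[-1][0] == m:
            if lines[-1][1] <= c:
                return       # parallel and not better: ignore the new line
            lines.pop()      # parallel and better: replace the back line
        while len(lines) >= 2 and useless(lines[-2], lines[-1], (m, c)):
            lines.pop()
        lines.append((m, c))

    def query(x):
        # x is non-decreasing, so a front line that loses now never wins again.
        while len(lines) >= 2:
            m1, c1 = lines[0]
            m2, c2 = lines[1]
            if m1 * x + c1 >= m2 * x + c2:
                lines.popleft()
            else:
                break
        return lines[0][0] * x + lines[0][1] if lines else float("inf")

    return add, query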

DP Application

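Suppose dp[i] = min_{j < i} (m[j]*x[i] + c[j]) + const[i], where m[j] and c[j] depend only on j (and on already computed values such as dp[j]), and x[i] is a value known at i. Then we can treat each (m[j], c[j]) as a line, add it once dp[j] is computed, and call query(x[i]) to get the min. If the slopes m[j] are added in monotone order and x[i] is monotone in the matching direction (for a min hull: non-increasing slopes with non-decreasing x, e.g. a prefix sum of non-negative values), the deque version applies and we get O(n) total.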
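As a concrete instance of this pattern, consider the illustrative recurrence dp[i] = min over j < i of dp[j] + (x[i] − x[j])² + C with x strictly increasing. Expanding the square gives a line with slope −2·x[j] and intercept dp[j] + x[j]², queried at x[i], plus the constant x[i]² + C. The sketch below is one possible implementation under those assumptions; the function names and the recurrence itself are chosen for this example, and a quadratic brute force is included for comparison.

```python
from collections import deque


def min_jump_cost(x, C):
    """dp[i] = min over j < i of dp[j] + (x[i] - x[j])**2 + C; x strictly increasing."""
    hull = deque()  # lines (m, c); slopes strictly decreasing from front to back

    def value(line, q):
        return line[0] * q + line[1]

    def useless(l1, l2, l3):
        # l2 never wins if l3 overtakes l1 no later than l2 does (integer-only test).
        return (l3[1] - l1[1]) * (l1[0] - l2[0]) <= (l2[1] - l1[1]) * (l1[0] - l3[0])

    dp = [0] * len(x)
    for i in range(len(x)):
        if i > 0:
            # Queries x[i] increase, so a front line that loses now never wins later.
            while len(hull) >= 2 and value(hull[0], x[i]) >= value(hull[1], x[i]):
                hull.popleft()
            dp[i] = value(hull[0], x[i]) + x[i] * x[i] + C
        line = (-2 * x[i], dp[i] + x[i] * x[i])  # line contributed by j = i
        while len(hull) >= 2 and useless(hull[-2], hull[-1], line):
            hull.pop()
        hull.append(line)
    return dp[-1] if dp else 0


def min_jump_cost_brute(x, C):
    """O(n^2) reference for the same recurrence."""
    dp = [0] * len(x)
    for i in range(1, len(x)):
        dp[i] = min(dp[j] + (x[i] - x[j]) ** 2 + C for j in range(i))
    return dp[-1] if dp else 0


print(min_jump_cost([1, 2, 3, 4, 5], 6))
```

With x = [1, 2, 3, 4, 5] and C = 6 the cheapest route is 1 → 3 → 5, costing (4 + 6) + (4 + 6) = 20. Because x is strictly increasing, no two lines are parallel, so this sketch skips the equal-slope case.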

General Case: Li Chao Tree

When query x is not monotonic or lines are added in arbitrary order, we need a structure that supports insert(line) and query(x). The Li Chao segment tree stores in each segment the line that is "best" at the segment's midpoint; updates and queries are O(log U) where U is the range of x (after coordinate compression if needed).
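The idea can be sketched for integer query points as follows. This is a minimal illustration, not a tuned implementation: the class name LiChaoTree, the dictionary used to store nodes sparsely, and the sample lines are all assumptions for this example. Each node keeps the line that is best at its midpoint, and the losing line is pushed into the one half where it can still win.

```python
class LiChaoTree:
    """Min Li Chao tree over integer x in [lo, hi]; lines are (m, c) tuples."""

    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi
        self.line = {}  # node id -> line kept at that node (sparse storage)

    @staticmethod
    def _eval(line, x):
        return line[0] * x + line[1]

    def insert(self, new):
        node, lo, hi = 1, self.lo, self.hi
        while True:
            cur = self.line.get(node)
            if cur is None:
                self.line[node] = new
                return
            mid = (lo + hi) // 2
            # Keep the line that is better at mid in this node; carry the loser down.
            if self._eval(new, mid) < self._eval(cur, mid):
                self.line[node], new = new, cur
                cur = self.line[node]
            if lo == hi:
                return
            # Two lines cross at most once, so the loser can win on one side only.
            if self._eval(new, lo) < self._eval(cur, lo):
                node, hi = 2 * node, mid
            elif self._eval(new, hi) < self._eval(cur, hi):
                node, lo = 2 * node + 1, mid + 1
            else:
                return  # the loser is never better on [lo, hi]

    def query(self, x):
        node, lo, hi = 1, self.lo, self.hi
        best = float("inf")
        while True:
            cur = self.line.get(node)
            if cur is None:
                return best  # nodes are created top-down, so nothing lies below
            best = min(best, self._eval(cur, x))
            if lo == hi:
                return best
            mid = (lo + hi) // 2
            if x <= mid:
                node, hi = 2 * node, mid
            else:
                node, lo = 2 * node + 1, mid + 1


# Lines arrive in arbitrary slope order; queries may come in any order too.
lines = [(2, 3), (-1, 10), (0, 5), (1, -4), (-3, 30)]
tree = LiChaoTree(-10, 10)
for ln in lines:
    tree.insert(ln)
print(tree.query(0), tree.query(10), tree.query(-10))
```

Both operations walk a single root-to-leaf path, which is where the O(log U) bound comes from.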

Time and Space Complexity

Deque (monotonic): O(n) total for n add and n query (amortized O(1) per operation).
Li Chao: O(log U) per insert and per query; space O(U) or O(n log U) with dynamic structure.

Common Mistakes

Common Mistake: Using the deque CHT when slopes or queries are not monotonic. Then the "pop front" or "pop back" logic is wrong and the hull is incorrect.

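Common Mistake: Division in cross: when m1 == m2 (parallel lines) the intersection is undefined, so handle that case before dividing: of two parallel lines, keep only the one with the smaller intercept (for a min hull). With integer coefficients, comparing intersections by cross-multiplication avoids both division by zero and floating-point error.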

Interview Insight: "When dp[i] = min_j (m[j]*x[i] + c[j]) + const, we have linear functions in x. Convex hull trick: maintain the lower envelope of lines. If slopes and queries are monotonic, use a deque for O(n) total; otherwise Li Chao tree for O(n log U)."

Practice Problems

DP problems where the transition is linear in a variable (e.g. max profit with linear cost per item).
Classic: "Commando" style (dp[i] = max a[i]*b[j] + c[j] + ...).

Summary

Convex Hull Trick: Query min (or max) of linear functions f_k(x) = m_k*x + c_k at given x.
Maintain lower (or upper) envelope; add lines and query. Deque when slopes and x are monotonic; Li Chao for general.
DP: when transition is dp[i] = min_j (m[j]*x[i] + c[j]) + const, CHT gives O(n) or O(n log n) instead of O(n²).
Watch monotonicity assumptions and division by zero in line intersection.

15.19 Profile DP

Introduction

Profile DP (also called contour DP or plug DP) is a technique for counting or optimizing on a grid by processing row by row (or column by column). The profile is the state of the "boundary" between the current row and the previous one—typically which cells are filled, or connectivity information along the contour. The state is often encoded as a mask or a string of length m (the width). We compute dp[row][profile]: number of ways (or best cost) to fill the grid up to the current row with the given profile at the boundary. Transition: try all valid ways to extend the profile to the next row. Used for domino tiling, counting fillings with shapes, and path problems on grids. State space can be exponential in m (e.g. 2^m or larger with connectivity).

What Is a Profile?

When we have filled rows 0..r−1 and are about to fill row r, the profile describes the interface between row r−1 and row r. For simple tiling (e.g. 1×2 dominoes): profile might be a bitmask of length m where bit j = 1 if the cell (r−1, j) is already covered by a domino that "sticks out" into row r (vertical domino). So we know which cells in row r are blocked from above. For connectivity problems (e.g. Hamiltonian path on grid), the profile stores which cells on the contour are "connected" (same path component)—more complex encoding (e.g. bracket sequence or state of m+1 "plugs").

State and Transition

State: dp[r][profile] = number of ways (or min cost) to fill rows 0..r−1 such that the boundary between row r−1 and row r is in state profile.

Base: Row 0: profile usually "empty" or "all cells need to be filled"; dp[0][initial_profile] = 1.

Transition: From (r, profile), try all valid ways to place tiles in row r that are consistent with profile (e.g. vertical domino from above fills a cell; we can place horizontal dominoes in row r). This yields a new profile for the boundary between row r and row r+1. Add to dp[r+1][new_profile].

Example: Domino Tiling (1×2) on n×m Grid

Profile = mask of m bits: bit j = 1 if (r−1, j) is covered by a domino that extends down (so (r, j) is "blocked" from above). We iterate over row r: for each cell we can place a horizontal domino (covers (r,j) and (r,j+1)) if both are free, or a vertical domino (covers (r,j) and (r+1,j)) if (r,j) is free. The state is (column, profile); we go column by column and update the profile (which cells in current row are now covered). Final answer: dp[n][0] (full grid filled, no dangling from last row). Implementation details depend on whether we process cell-by-cell or row-by-row; the key is encoding "what the previous row leaves for the current row."
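One way to turn this description into code is to process a whole row at a time and enumerate, column by column, every way to fill it. The sketch below follows that row-by-row variant; the function names next_row_masks and count_domino_tilings are assumptions for this example. Bit j of the mask is 1 when a vertical domino from the row above already covers column j.

```python
def next_row_masks(mask, m, col=0, nxt=0):
    """Yield every profile the next row can see after the current row is filled."""
    if col == m:
        yield nxt
    elif mask & (1 << col):
        # Cell already covered by a vertical domino from the row above.
        yield from next_row_masks(mask, m, col + 1, nxt)
    else:
        # Vertical domino: covers this cell and sticks out into the next row.
        yield from next_row_masks(mask, m, col + 1, nxt | (1 << col))
        # Horizontal domino: the right neighbour must exist and be free.
        if col + 1 < m and not mask & (1 << (col + 1)):
            yield from next_row_masks(mask, m, col + 2, nxt)


def count_domino_tilings(n, m):
    """Number of ways to tile an n x m grid with 1x2 dominoes."""
    dp = [0] * (1 << m)
    dp[0] = 1  # nothing sticks into row 0
    for _ in range(n):
        new_dp = [0] * (1 << m)
        for mask, ways in enumerate(dp):
            if ways:
                for nxt in next_row_masks(mask, m):
                    new_dp[nxt] += ways
        dp = new_dp
    return dp[0]  # nothing may stick out below the last row


print(count_domino_tilings(2, 3))
```

Only the current row of the table is kept, so memory is O(2^m); the table dp[r][profile] from the text is recovered by storing each row instead of overwriting it.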

Time and Space Complexity

State space: O(n × |profiles|). For simple mask, |profiles| = 2^m. With connectivity (bracket/plug), it can be larger but often still manageable for small m.
Transition: Per state we try several choices (e.g. place domino or not); total time O(n × |profiles| × choices). Often used when m is small (e.g. m ≤ 10–20).

Common Mistakes

Common Mistake: Wrong profile encoding: the profile must capture everything the next row needs to know. Missing "which cells are covered from above" or connectivity can make the DP incorrect.

Common Mistake: Off-by-one in row/column: when moving to the next row, the "new" profile is the state after finishing the current row, not before. Be consistent about whether profile is "at the start of row r" or "at the end of row r−1."

Interview Insight: "Profile DP is for grid problems: state is (row, profile) where profile encodes the boundary between rows. We fill row by row and transition by trying all valid ways to extend. State space is often 2^m or similar; use when m is small."

Practice Problems

Domino tiling: count ways to tile n×m with 1×2 (or 2×1) dominoes.
Count ways to fill grid with L-shapes or other fixed shapes.
Hamiltonian path on grid (connectivity profile / plug DP).

Summary

Profile DP: Grid problems; state = (row, profile); profile = boundary state between current and previous row (mask or connectivity).
Process row by row; transition = try all valid ways to fill current row consistent with profile, get new profile.
State space O(n × 2^m) or more; use for small m. Domino tiling is the classic example.
Encode profile correctly; be consistent about row boundaries.

15.20 SOS DP

Introduction

SOS DP (Sum Over Subsets DP) computes, for each mask m (0 to 2ⁿ−1), the sum of A[s] over all submasks s of m (i.e. s ⊆ m, or s & m == s). Naively that is O(3ⁿ) (each mask has 2^(popcount(m)) submasks). SOS DP does it in O(n × 2ⁿ) by processing one bit at a time: after processing bit i, F[m] contains the sum over all submasks of m that differ from m only in bits 0..i. Used in problems like "for each mask m count pairs (i, j) with i | j = m" or "sum of values over all submasks."

Formal Definition

Concept Note: Given an array A of length 2ⁿ (indexed by mask), define F[m] = sum of A[s] for all s such that s ⊆ m (s is a submask of m, i.e. the set of bits in s is a subset of the set of bits in m, or (s | m) == m equivalently (s & m) == s). SOS DP computes F for all m in O(n × 2ⁿ). Variant: sum over supermasks (m ⊆ s) uses a similar loop with the opposite direction.
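For comparison, the definition can be evaluated directly by enumerating every submask of every mask. The sketch below is that O(3ⁿ) baseline (the function name is an assumption for this example); it uses the standard step s = (s − 1) & m to walk through the submasks of m in decreasing order, and it is handy as a reference when testing the faster version.

```python
def sum_over_submasks_naive(A, n):
    """F[m] = sum of A[s] over all s with s & m == s, by direct enumeration."""
    F = [0] * (1 << n)
    for m in range(1 << n):
        s = m
        while True:
            F[m] += A[s]
            if s == 0:
                break
            s = (s - 1) & m  # next smaller submask of m
    return F


print(sum_over_submasks_naive([1, 2, 3, 4], 2))
```

With A = [1, 2, 3, 4], mask 3 (binary 11) has submasks 0, 1, 2, 3, so F[3] = 1 + 2 + 3 + 4 = 10, while F[2] = A[0] + A[2] = 4.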

Why This Topic Matters

Fast submask/supermask aggregation: Many problems ask "for each mask, sum over submasks or count pairs with OR = mask." Naive is O(3^n); SOS is O(n × 2^n).
Competitive programming: Common in problems involving bitmasks and subset convolution.
Interview: Less common but appears in bitmask-heavy problems.

Mental Model

Start with F[m] = A[m]. Then for each bit i (0 to n−1): for each mask m that has bit i set, add F[m without bit i] to F[m]. After processing bit i, F[m] = sum of A[s] over all submasks s of m that differ from m only in bits 0..i (they agree with m on every higher bit). After all bits, F[m] = sum over all submasks of m.

Algorithm

Initialize F[m] = A[m] for all m. Then:

for i in 0..n-1:
  for m in 0..(2^n - 1):
    if m has bit i set:
      F[m] += F[m ^ (1 << i)]

Order matters between the loops, not within them: the bit loop must be the outer loop. Within one round i, we only write to masks that have bit i set and only read from masks that do not, so every value read, F[m ^ (1<<i)], is still the result of the previous rounds, and the masks can be visited in any order. Each submask of m is therefore counted exactly once.

Python Implementation

def sos_submask(A, n):
    """F[m] = sum of A[s] for all s ⊆ m."""
    F = A[:]
    for i in range(n):
        for m in range(1 << n):
            if m & (1 << i):
                F[m] += F[m ^ (1 << i)]
    return F

Iterating m from 0 to 2^n−1 is correct, and so is any other order: when we update F[m], we read F[m ^ (1<<i)], a mask without bit i, which is never modified during round i. So we are adding the sum over the submasks that do not include bit i.

Line-by-Line Explanation

F = A[:]: start with F[m] = A[m] (each mask is its own submask).
For each bit i: for each mask m with bit i set, F[m] += F[m ^ (1<<i)]. So we add the sum of all submasks of m that don't have bit i (which is F[m ^ (1<<i)] after previous bits) to F[m]. Thus F[m] becomes the sum over submasks that may or may not have bit i.
After all bits, F[m] = sum of A[s] for all s ⊆ m.

Sum Over Supermasks

To get G[m] = sum of A[s] for all s such that m ⊆ s (s is a supermask of m): iterate bits and, for each m that does not have bit i, add G[m | (1<<i)] to G[m]. As with submasks, the bit loop must be the outer loop; within a round the masks can be visited in any order, because the values read (masks with bit i set) are not modified in that round. Alternatively, reduce to the submask case by complementing: with A'[t] = A[~t] (complement within n bits) and F' the submask sums of A', we get G[m] = F'[~m].
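The supermask variant described above can be sketched as a mirror image of the submask loop. The function name sos_supermask is an assumption for this example.

```python
def sos_supermask(A, n):
    """G[m] = sum of A[s] for all s with s & m == m (s is a supermask of m)."""
    G = A[:]
    for i in range(n):               # bit loop outside
        for m in range(1 << n):      # mask loop inside, any order works
            if not m & (1 << i):
                # Read from the mask with bit i set; it is not written in this round.
                G[m] += G[m | (1 << i)]
    return G


print(sos_supermask([1, 2, 3, 4], 2))
```

With A = [1, 2, 3, 4], mask 0 is a submask of everything, so G[0] = 10, while G[1] sums A[1] and A[3] to give 6.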

Time and Space Complexity

Time: O(n × 2ⁿ) — two nested loops.
Space: O(2ⁿ) for F (can do in-place over A).

Edge Cases

n = 0: 2^0 = 1 mask; F[0] = A[0].
In-place: If we update A in place, each round reads only masks without bit i and writes only masks with bit i, so every read sees the value from the previous rounds regardless of the order in which m is visited.

Common Mistakes

Common Mistake: Wrong loop nesting. The bit loop must be outside and the mask loop inside. If the loops are swapped (for each m, for each bit i), F[m ^ (1<<i)] may already contain contributions from every bit, so submasks reachable by removing bits in different orders are counted more than once. The direction in which m is visited inside a round does not matter.

Common Mistake: Confusing submask (s ⊆ m) with supermask (m ⊆ s). Submask: s & m == s. Supermask: s & m == m. The SOS loop for submasks: add from m without bit i; for supermasks the direction is different.

Interview Insight: "SOS DP: for each mask m we want sum of A[s] over all submasks s of m. Do F[m]=A[m], then for each bit i and each m with bit i set, add F[m without i] to F[m]. O(n*2^n). Used when we need fast submask or supermask aggregation."

Practice Problems

For each mask m, count pairs (i, j) such that a[i] | a[j] = m (use SOS on frequency array).
Sum over submasks: given A, compute F[m] = sum of A[s] for s ⊆ m.
Subset convolution (more advanced, builds on SOS idea).

Summary

SOS DP: F[m] = sum of A[s] over all submasks s of m; computed in O(n × 2ⁿ).
Initialize F[m] = A[m]; for each bit i, for each m with bit i set: F[m] += F[m ^ (1<<i)].
Loop order: bits in the outer loop, masks in the inner loop (any mask order works). Sum over supermasks adds from m | (1<<i) into masks without bit i.
Use for submask/supermask aggregation in bitmask problems.

Section 16: Bit Manipulation

This section covers bit manipulation: working with numbers at the level of individual bits (0s and 1s). You will learn bit basics (representation, positions, and operators), XOR tricks, single-number and subset/mask problems, Gray code, and bit DP. Mastery of bits is essential for low-level optimization, encoding, and many interview problems that ask for O(1) space or constant-time checks.

16.1 Bit Basics

Introduction

Bit basics are the foundation of bit manipulation: understanding how integers are stored as sequences of bits, how to read and set individual bit positions, and how the core bitwise operators (AND, OR, XOR, NOT, and shifts) work. Every other topic in this section builds on these ideas. Without a solid grasp of bit basics, XOR tricks, bitmasks, and bit DP will feel like magic; with it, they become systematic tools.

Real-World Analogy

Think of an integer as a row of light switches: each switch is either on (1) or off (0). The rightmost switch is the "ones" place, the next is "twos," then "fours," and so on—each position is a power of 2. Flipping a switch changes that bit; bitwise operators are rules for combining two rows of switches (e.g., "turn on a light only if both rows have it on" for AND). Bit basics are learning how these switches are numbered and how the rules work.

Formal Definition

Concept Note: A bit is a binary digit: 0 or 1. Non-negative integers in a computer are represented in binary: as a sum of powers of 2. For example, 13 = 8 + 4 + 1 = 1·2³ + 1·2² + 0·2¹ + 1·2⁰, so in binary we write 1101 (from highest power to lowest). The least significant bit (LSB) is the rightmost bit (position 0); the most significant bit (MSB) is the leftmost. Bitwise operators act on corresponding bits of two numbers (or one number for NOT) to produce a result bit-by-bit.

Why This Topic Matters

Interviews: Many problems (single number, subset generation, power-of-two checks) rely on bit operations. Interviewers often expect you to know AND/OR/XOR/NOT and shifts by heart.
Performance: Bit operations are among the fastest CPU instructions; using them can replace branches and arithmetic in hot loops.
Encoding and flags: Bits are used to represent sets (each bit = one element), permissions, and compact state in DP (e.g., bitmask DP).

Mental Model

For any non-negative integer n:

Binary: Write n as a sum of powers of 2; the coefficient of 2ⁱ is the bit at position i (0 = LSB).
Position i: "Is 2ⁱ included in n?" → check (n >> i) & 1. Set bit i: n | (1 << i). Clear bit i: n & ~(1 << i).
Operators: AND = both 1 → 1; OR = at least one 1 → 1; XOR = exactly one 1 → 1; NOT = flip; left shift = multiply by 2; right shift = integer divide by 2.

Bit Positions and Powers of 2

In binary, the bit at position 0 (rightmost) has value 2⁰ = 1; position 1 has value 2¹ = 2; position 2 has value 4; and so on. So the integer value is the sum of (bit at position i) × 2ⁱ over all i.

  Position:    3  2  1  0   (0 = LSB)
  Power of 2:  8  4  2  1
  n = 13:      1  1  0  1   → 8+4+0+1 = 13
  n = 6:       0  1  1  0   → 0+4+2+0 = 6

To read the bit at position i: shift n right by i so that bit moves to position 0, then mask with 1 to keep only that bit: (n >> i) & 1. To set bit i to 1: n | (1 << i). To clear bit i (set to 0): n & ~(1 << i). To toggle bit i: n ^ (1 << i).

Bitwise Operators (Step-by-Step)

We compare two numbers bit-by-bit. Assume 8-bit width for clarity (Python uses arbitrary precision; the idea is the same).

AND (`&`)

Result bit is 1 only when both input bits are 1. Use: extract a subset of bits (mask), check if a bit is set, clear bits.

  a     = 12  →  1 1 0 0
  b     = 10  →  1 0 1 0
  a & b =  8  →  1 0 0 0   (only position 3 has 1 in both)

OR (`|`)

Result bit is 1 when at least one input bit is 1. Use: set bits, merge sets.

  a     = 12  →  1 1 0 0
  b     = 10  →  1 0 1 0
  a | b = 14  →  1 1 1 0

XOR (`^`)

Result bit is 1 when exactly one input bit is 1 (one or the other, not both). Use: toggle bits, detect difference, cancel duplicates (a ^ a = 0).

  a     = 12  →  1 1 0 0
  b     = 10  →  1 0 1 0
  a ^ b =  6  →  0 1 1 0

NOT (`~`)

Flips every bit. In Python, integers are arbitrary-precision and behave like two's-complement numbers with infinitely many sign bits, so ~n equals -(n+1). For a fixed width w, NOT gives (2^w - 1) - n, which in Python is ~n & ((1 << w) - 1). To clear bit i we use n & ~(1 << i): ~(1 << i) is a mask with every bit 1 except bit i.
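A short sketch makes the difference between Python's unbounded NOT and a fixed-width NOT concrete. The helper name not_fixed_width and the default width of 8 bits are assumptions for this example.

```python
def not_fixed_width(n, width=8):
    """Bitwise NOT of n restricted to the lowest `width` bits."""
    mask = (1 << width) - 1  # width ones, e.g. 0b11111111 for width 8
    return ~n & mask


print(~12)                                  # unbounded NOT: -(12 + 1)
print(not_fixed_width(12))                  # 8-bit NOT: 255 - 12
print(format(not_fixed_width(12), "08b"))   # the flipped bit pattern
```

Masking with (1 << width) - 1 discards the infinite run of sign bits, which is why the result is a non-negative number that matches what a fixed-width register would hold.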

Left shift (`<<`)

n << k shifts all bits of n left by k positions; new right bits are 0. Equivalent to multiplying n by 2^k (for non-negative n in range).

  5 << 1  →  10    (5×2)    5 << 2  →  20    (5×4)

Right shift (`>>`)

n >> k shifts all bits right by k; in Python for non-negative n this is integer division by 2^k (floor).

  13 >> 1  →  6    13 >> 2  →  3

Python Implementation: Reading, Setting, Clearing, Toggling

def get_bit(n: int, i: int) -> int:
    """Return the bit at position i (0 = LSB)."""
    return (n >> i) & 1

def set_bit(n: int, i: int) -> int:
    """Set bit at position i to 1."""
    return n | (1 << i)

def clear_bit(n: int, i: int) -> int:
    """Set bit at position i to 0."""
    return n & ~(1 << i)

def toggle_bit(n: int, i: int) -> int:
    """Flip bit at position i."""
    return n ^ (1 << i)

def is_power_of_two(n: int) -> bool:
    """True iff n is 1, 2, 4, 8, ... (exactly one bit set)."""
    return n > 0 and (n & (n - 1)) == 0

Line-by-Line Explanation

get_bit: n >> i moves bit i to position 0; & 1 keeps only that bit (0 or 1).
set_bit: 1 << i is a number with only bit i set; n | ... forces that bit to 1 without changing others.
clear_bit: ~(1 << i) is a mask with 0 only at position i; n & ... clears that bit.
toggle_bit: n ^ (1 << i) flips bit i (0→1, 1→0).
is_power_of_two: Powers of 2 have form 100...0; subtracting 1 gives 011...1. So n & (n-1) is 0 only when n has at most one bit set. We need n > 0 because 0 & (-1) is 0 but 0 is not a power of 2.

ASCII Diagram: AND / OR / XOR at One Position

  Bit A   Bit B   A&B   A|B   A^B
    0       0      0     0     0
    0       1      0     1     1
    1       0      0     1     1
    1       1      1     1     0

Time and Space Complexity

For the basic operations above (get/set/clear/toggle bit, is_power_of_two):

Time: O(1) — a fixed number of arithmetic/bit operations.
Space: O(1) — no extra structures (Python integers are immutable; we return new integers for set/clear/toggle).

If we iterate over all bits of n (e.g., count set bits by looping i = 0 to bit length), the number of iterations is O(log n) for positive n, since the bit length is about log₂(n).

Edge Cases

Negative numbers: In Python, negative integers have a "virtual" infinite sign extension in two's complement. Right shift n >> k is arithmetic for negative n (sign-extending). For bit tricks, we often work with non-negative n or use n & 0xFFFF... to restrict to a fixed width.
Shift amount: n >> k when k is large (e.g. k ≥ bit length of n) gives 0 for non-negative n. n << k with large k can produce very large numbers; no overflow in Python, but be aware of magnitude.
n = 0: 0 & (0-1) is 0, but 0 is not a power of two—hence the n > 0 check in is_power_of_two.

Common Mistakes

Common Mistake: Using 1 << i and forgetting that bit positions are 0-indexed from the LSB. Position 0 is the rightmost bit; position 1 is the second from the right. So "the first bit" often means position 0.

Common Mistake: Confusing logical AND (and) with bitwise AND (&). 5 and 3 is 3 (truthy value); 5 & 3 is 1. Use & for bit masks.

Common Mistake: Assuming n % 2 and n & 1 are equivalent in every language. In Python they always agree: n % 2 is 0 or 1 by the floor-mod rule, and n & 1 is the LSB of the two's-complement representation, even for negative n. In C, C++, and Java, however, -3 % 2 is -1, so a test like n % 2 == 1 misses negative odd numbers. For bit extraction, n & 1 is the standard.

Expert Tip: To count how many bits are set in n: use bin(n).count('1') for clarity, or a loop with n & 1 and n >>= 1, or n.bit_count() in Python 3.10+. For "clear the lowest set bit," use n & (n - 1) — this is the same trick as in is_power_of_two and appears in many problems.
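The "clear the lowest set bit" trick from the tip can be sketched as a counting loop that runs once per set bit rather than once per bit position. The function names are assumptions for this example, and the second helper uses n & -n, which isolates the lowest set bit; that identity is not introduced in the text above, so treat it as an extra.

```python
def count_set_bits(n):
    """Count the 1 bits of a non-negative integer by clearing the lowest one each step."""
    if n < 0:
        raise ValueError("n must be non-negative")
    count = 0
    while n:
        n &= n - 1  # clears the lowest set bit
        count += 1
    return count


def lowest_set_bit_position(n):
    """Position (0 = LSB) of the lowest set bit, or -1 if n == 0."""
    if n == 0:
        return -1
    return (n & -n).bit_length() - 1  # n & -n keeps only the lowest set bit


print(count_set_bits(13), lowest_set_bit_position(12))
```

For 13 (binary 1101) the loop runs three times; for 12 (binary 1100) the lowest set bit is at position 2.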

Optimization Insight: In compiled languages, n & 1, n << k, and n >> k map to single cheap instructions, although optimizing compilers usually make the same substitution for n % 2, n * 2^k, and n / 2^k on their own. In CPython the difference is small because interpreter overhead dominates, so measure before relying on it. Prefer bit operations when the intent is bit-level, and arithmetic when the intent is arithmetic.

Interview Insight: "Bit basics: know how to get/set/clear/toggle the i-th bit with shift and mask. Know AND, OR, XOR, NOT, and left/right shift. Power-of-two check: n > 0 and (n & (n-1)) == 0. Clear lowest set bit: n & (n-1). These show up everywhere in bit manipulation and bitmask DP."

Practice Problems

Implement count_set_bits(n) (number of 1s in binary).
Check if the i-th bit is set without using get_bit (directly with one expression).
Given n, return the position of the LSB that is 1 (e.g. n=12 → 2; 12 = 1100, lowest set bit is at position 2).
Turn off the rightmost 1 in n using one expression (same as clear lowest set bit).

Summary

Bits: Binary digits 0 and 1; position i has value 2ⁱ; LSB is position 0.
Read/set/clear/toggle: (n >> i) & 1, n | (1 << i), n & ~(1 << i), n ^ (1 << i).
Operators: AND (both 1), OR (at least one 1), XOR (exactly one 1), NOT (flip), << (×2^k), >> (÷2^k).
Power of two: n > 0 and (n & (n - 1)) == 0. Clear lowest set bit: n & (n - 1).
Master these before XOR tricks, bitmasks, and bit DP.

16.2 XOR Tricks

Introduction

XOR tricks are patterns that use the exclusive-or operator (^) to solve problems in elegant, often constant-space ways. XOR has special algebraic properties—cancelling duplicates, toggling bits, and being reversible—that make it ideal for "find the unique element," "swap without extra variable," and "find missing number" type problems. This topic builds directly on Bit Basics: you use the same shift-and-mask ideas, but the focus is on when and why XOR is the right tool.

Real-World Analogy

Imagine two identical keys: if you "XOR" them together, they cancel out and you get nothing (0). If you XOR a key with nothing, you get the same key back. Now suppose you have a pile of duplicate keys and one unique key: XORing every key together cancels all pairs; what remains is the unique one. That's the core idea behind "single number" and "find the missing number" problems: pairing cancels, the odd one out remains.

Formal Definition

Concept Note: XOR (exclusive-or), written a ^ b, is a bitwise operation: the result bit is 1 when exactly one of the two input bits is 1. Key properties: (1) a ^ a = 0 (same value cancels); (2) a ^ 0 = a (identity); (3) XOR is commutative (a ^ b = b ^ a) and associative ((a ^ b) ^ c = a ^ (b ^ c)). So the XOR of a multiset of numbers depends only on the parity of how many times each bit appears: even occurrences cancel, odd ones remain.

Why This Topic Matters

Interview staple: "Single Number," "Missing Number," and "Two numbers that appear once" are common; the expected solution is often O(n) time, O(1) space using XOR.
No extra space: XOR lets you accumulate a "signature" in one variable, so you avoid hashing or extra arrays when finding the unique or missing element.
Foundation for 16.3: Single Number problems (next topic) are direct applications of these tricks.

Mental Model

Cancel pairs: x ^ x = 0. So XORing all numbers in an array where every value appears twice except one → the result is the single number.
Identity: x ^ 0 = x. XORing with 0 leaves the number unchanged; 0 is the "neutral" element.
Order doesn't matter: Because of commutativity and associativity, a ^ b ^ c ^ a ^ b = c (the two a's and two b's cancel).
Reversible: (a ^ b) ^ b = a. So XOR can be used to "encode" and "decode" with the same key (e.g. swap by triple-XOR).

Core XOR Properties (Step-by-Step)

These four facts are the entire foundation. Memorize them.

(1) Self-cancel: a ^ a = 0. Same number XORed with itself gives 0.
(2) Identity: a ^ 0 = a. XOR with 0 does nothing.
(3) Commutative: a ^ b = b ^ a. Order of operands doesn't matter.
(4) Associative: (a ^ b) ^ c = a ^ (b ^ c). So we can write a ^ b ^ c without parentheses and compute in any order.

From (1)–(4) it follows that the XOR of a multiset is 0 if every value appears an even number of times, and equals the XOR of all values that appear an odd number of times. If exactly one value appears once and the rest twice, the XOR of the whole array is that single value.
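As a quick sanity check, the four properties and the parity rule can be verified on small numbers. This is only a sketch; the helper name xor_all is illustrative and not part of the course code.

```python
from functools import reduce
from operator import xor


def xor_all(values):
    """XOR of every value in the iterable (0 for an empty iterable)."""
    return reduce(xor, values, 0)


a, b, c = 13, 7, 42
assert a ^ a == 0                    # self-cancel
assert a ^ 0 == a                    # identity
assert a ^ b == b ^ a                # commutative
assert (a ^ b) ^ c == a ^ (b ^ c)    # associative

# Values with an even count cancel; values with an odd count remain.
print(xor_all([a, b, a, c, b]))      # 42: a and b appear twice, c once
print(xor_all([a, a, b, b]))         # 0: every value appears an even number of times
```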

Classic Tricks

1. Swap two variables without a temporary

Using (a ^ b) ^ b = a and (a ^ b) ^ a = b:

a = a ^ b   # a now holds a^b
b = a ^ b   # b = (a^b)^b = a
a = a ^ b   # a = (a^b)^a = b

After these three steps, a and b are swapped without an extra variable. The trick only works for integers, and only when a and b are distinct storage locations: if both refer to the same cell (e.g. arr[i] and arr[j] with i == j), the first step zeroes it. Equal values stored in different locations are fine.
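A minimal sketch of the swap applied to list cells, where the same-cell hazard actually arises. The function name xor_swap and the i == j guard are illustrative choices; in everyday Python the idiomatic swap is simply a, b = b, a.

```python
def xor_swap(arr: list[int], i: int, j: int) -> None:
    """Swap arr[i] and arr[j] in place using XOR; does nothing when i == j."""
    if i == j:          # guard: XORing a cell with itself would zero it
        return
    arr[i] ^= arr[j]    # arr[i] now holds old_i ^ old_j
    arr[j] ^= arr[i]    # arr[j] = old_j ^ (old_i ^ old_j) = old_i
    arr[i] ^= arr[j]    # arr[i] = (old_i ^ old_j) ^ old_i = old_j


data = [5, 9, 12]
xor_swap(data, 0, 2)
print(data)             # [12, 9, 5]
xor_swap(data, 1, 1)
print(data)             # [12, 9, 5] (unchanged thanks to the guard)
```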

2. Find the single number (every other appears twice)

Given an array where every element appears exactly twice except one that appears once, XOR all elements. Pairs cancel; the result is the single number.

def single_number(nums: list[int]) -> int:
    result = 0
    for x in nums:
        result ^= x
    return result

Time O(n), space O(1). This is the core of the next topic (16.3).

3. Find the missing number (0 to n)

Given an array of size n containing distinct integers from 0 to n (inclusive) with one number missing, we can XOR (1) all indices from 0 to n, and (2) all elements in the array. The XOR of (1) and (2) cancels every index that has a matching element; the only "unpaired" value is the missing number.

def missing_number(nums: list[int]) -> int:
    n = len(nums)
    xor_all = 0
    for i in range(n + 1):
        xor_all ^= i
    for x in nums:
        xor_all ^= x
    return xor_all

Alternatively: xor_all = 0; for i in range(n): xor_all ^= i ^ nums[i]; then xor_all ^= n. Same idea: indices and values pair except the missing one.
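The single-loop alternative can be sketched as follows. The function name missing_number_one_loop is illustrative; the accumulator starts at n because n is the one index the loop never visits.

```python
def missing_number_one_loop(nums: list[int]) -> int:
    """Missing value in 0..n, where nums holds n distinct values from that range."""
    acc = len(nums)                 # start with n, the index the loop does not cover
    for i, x in enumerate(nums):
        acc ^= i ^ x                # pair each index with each value
    return acc


print(missing_number_one_loop([3, 0, 1]))  # 2
print(missing_number_one_loop([0]))        # 1
print(missing_number_one_loop([]))         # 0
```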

4. Toggle a bit (from Bit Basics)

n ^ (1 << i) flips the i-th bit. XOR with 1 toggles; XOR with 0 keeps unchanged.

5. Check if two numbers have opposite signs (optional)

For integers a and b, (a ^ b) < 0 is true exactly when one is negative and the other is non-negative (in two's complement the sign bits differ, so the sign bit of a ^ b is 1). Note that 0 counts as non-negative here. Rarely asked, but it shows XOR applied to sign bits.
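A small sketch of the sign check. The function name opposite_signs is illustrative; Python integers behave like two's complement with unlimited sign extension, so the test carries over unchanged.

```python
def opposite_signs(a: int, b: int) -> bool:
    """True when exactly one of a, b is negative (0 is treated as non-negative)."""
    return (a ^ b) < 0


print(opposite_signs(5, -3))    # True
print(opposite_signs(-5, -3))   # False
print(opposite_signs(0, -1))    # True
print(opposite_signs(0, 7))     # False
```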

ASCII Diagram: Why XOR Cancels Pairs

  Array: [4, 1, 2, 1, 2]   →  only 4 appears once

  4 ^ 1 ^ 2 ^ 1 ^ 2
  = 4 ^ (1^1) ^ (2^2)     (reorder: commutative + associative)
  = 4 ^  0   ^  0
  = 4

Python Implementation: Two Numbers That Appear Once

Variation: every element appears twice except two numbers that each appear once. Find those two.

XOR all elements → xor_all = a ^ b (the two unknowns).
Pick any set bit in xor_all (e.g. the lowest: low_bit = xor_all & -xor_all, or equivalently xor_all & ~(xor_all - 1)). That bit is 1 in exactly one of a, b and 0 in the other.
Partition the array: XOR all elements where that bit is set → one of the numbers; XOR all where that bit is 0 → the other. (Same as: one group has a, the other has b; pairs within each group still cancel.)

def two_single_numbers(nums: list[int]) -> tuple[int, int]:
    xor_all = 0
    for x in nums:
        xor_all ^= x
    # xor_all = a ^ b (the two singles)
    low_bit = xor_all & (-xor_all)  # or: xor_all & ~(xor_all - 1)
    first = 0
    for x in nums:
        if x & low_bit:
            first ^= x
    second = xor_all ^ first
    return (first, second)

Line-by-Line Explanation (Two Singles)

xor_all: After the loop, xor_all = a ^ b (the two numbers that appear once).
low_bit = xor_all & (-xor_all): In two's complement, -xor_all flips bits and adds 1; xor_all & -xor_all keeps only the lowest set bit of xor_all. So low_bit is a power of 2 where a and b differ.
if x & low_bit: We split numbers into two groups—those with that bit set and those without. One of a,b is in the first group, the other in the second; duplicates still come in pairs in the same group.
first ^= x: XOR of all elements in the first group gives one of the two singles (the other group gives the other). second = xor_all ^ first recovers the second from a ^ b ^ a = b.

Time and Space Complexity

Single number: O(n) time, O(1) space.
Missing number: O(n) time, O(1) space.
Two single numbers: O(n) time, O(1) space (two passes over the array, a few variables).

Edge Cases

Empty array: Single-number and two-singles should define behavior (e.g. return 0 or raise). For missing number, n=0 means the only "index" is 0 and the array is empty, so the missing number is 0.
Single element: For single number, return that element. For missing number with n=1, array has one element (0 or 1); the missing one is the other.
Swap: If a and b refer to the same variable (e.g. arr[i] ^= arr[j] when i == j), you zero that cell. Always check i != j before swapping with XOR.

Common Mistakes

Common Mistake: Using XOR when elements don't appear in pairs. The "single number" trick only works when every other element appears exactly twice. If one element appears three times, the XOR of the array is single ^ x (where x is the triple), not the single.

Common Mistake: For "two numbers that appear once," forgetting to use a bit where a and b differ. If you split by a bit where both have 0 (or both 1), the two singles end up in the same group and you get a ^ b instead of a or b. Always use a set bit of xor_all = a ^ b to partition.

Common Mistake: In "missing number," off-by-one in the range. The set is 0..n (size n+1) with one missing, so the array length is n. Make sure you XOR indices 0..n (inclusive) and all n elements, so every index except the missing number is paired with a value.

Expert Tip: To get the lowest set bit of n: n & (-n) (two's complement) or n & ~(n - 1). Both yield the largest power of 2 that divides n (and 0 when n is 0); use this to "split" by one differing bit in the two-singles problem.
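To make the lowest-set-bit expression concrete, here is a short sketch that checks that both forms agree and then uses the bit to partition a small two-singles array. The helper name lowest_set_bit and the sample array are illustrative.

```python
def lowest_set_bit(n: int) -> int:
    """Return the lowest set bit of n as a power of two (0 when n == 0)."""
    return n & -n


for n in (12, 10, 7, 64):
    assert lowest_set_bit(n) == n & ~(n - 1)   # both forms agree
    print(n, bin(n), lowest_set_bit(n))

# Two-singles partition: the singles are 3 and 5, and 3 ^ 5 = 6 (0b110).
nums = [1, 2, 1, 3, 2, 5]
bit = lowest_set_bit(3 ^ 5)                    # 2
print([x for x in nums if x & bit])            # [2, 3, 2]: the 2s cancel, 3 remains
print([x for x in nums if not x & bit])        # [1, 1, 5]: the 1s cancel, 5 remains
```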

Optimization Insight: Missing-number can be done in a single loop by XORing index and value together (acc ^= i ^ nums[i] for i in range(n), then acc ^= n). The complexity is still O(n) and the number of XORs is the same; the gain is one loop instead of two. Single-number is already a single pass with one accumulator.

Interview Insight: "XOR tricks: (1) a^a=0, a^0=a; order doesn't matter. (2) Single number in pairs: XOR all. (3) Missing number in 0..n: XOR [0..n] and array, result is missing. (4) Two singles: XOR all → a^b; take a differing bit, partition and XOR each group to get a and b. (5) Swap: a^=b; b^=a; a^=b; (ensure different variables)."

Practice Problems

Single Number I: every element twice except one — XOR all.
Missing Number: array of size n with distinct 0..n, one missing — XOR indices and values.
Single Number III: every element twice except two — partition by a bit of (a^b), XOR each group.
Swap two integers without a temporary variable using XOR.
Given an array where every element appears 3 times except one that appears 1 time: think about why XOR alone isn't enough (you need a per-bit count mod 3 or similar).

Summary

Properties: a ^ a = 0, a ^ 0 = a; XOR is commutative and associative, so pairs cancel.
Single number (pairs): XOR every element → result is the one that appears once.
Missing number (0..n): XOR all indices 0..n and all array elements → result is the missing number.
Two singles: XOR all → a^b; pick a set bit, partition by that bit, XOR each half to get the two numbers.
Swap: a^=b; b^=a; a^=b; (only when a and b are distinct variables).
These patterns are the basis for the next topic (Single Number problems).

16.3 Single Number Problems

Introduction

Single Number problems ask you to find the element that appears a different number of times than the others—usually once while every other element appears twice (or three times), or to find two elements that each appear once while the rest appear twice. These are direct applications of XOR tricks from 16.2: the "single number I" and "single number III" variants are solved with the patterns you already learned. The "single number II" variant (one appears once, rest three times) cannot be solved by plain XOR alone; it needs a bit-count or state-machine approach. This topic ties everything together and adds the generalization to three occurrences.

Real-World Analogy

Imagine a room of people where everyone has a twin except one person (Single Number I): if you "cancel" every pair, the one left without a pair is the answer. If instead two people have no twin (Single Number III), you first find "how the two differ" (e.g. one has a red badge, one doesn't), split the room into two groups by that trait, then in each group cancel pairs—the leftover in each group is one of the two. For "everyone has two twins except one" (Single Number II), pairing doesn't work; you need to count "at each position, how many people have that trait" and take the remainder mod 3 to isolate the single.

Formal Definition

Concept Note: Single Number I: Array of integers; every element appears exactly twice except one that appears once. Find the single one. Single Number II: Every element appears exactly three times except one that appears once. Find the single one. Single Number III: Every element appears exactly twice except two numbers that each appear once. Find both. Constraints typically allow O(n) time and O(1) extra space; the intended solution uses bit manipulation.

Why This Topic Matters

Interview frequency: Single Number I and III are classic; II appears as a follow-up. Knowing the progression (XOR for pairs → partition for two singles → bit-count/mod-3 for triplets) shows depth.
Reinforces XOR: Single Number I and III are the main use cases for the XOR tricks from 16.2; practicing them cements the "cancel pairs" mental model.
Bit-count pattern: Single Number II introduces counting bits mod k (here k=3), which generalizes to "every other appears k times."

Mental Model

I (pairs): XOR everything → pairs cancel, result is the single. One variable, one pass.
III (two singles): XOR everything → get a ^ b. Use one differing bit to split the array; XOR each half to get a and b.
II (triplets): XOR doesn't cancel triplets. For each bit position, count how many numbers have that bit set; take count % 3. The resulting bits form the single number (or use a state machine: "ones" and "twos" per bit).

Single Number I (One Single, Rest Twice)

This is the direct application of a ^ a = 0 and commutativity: XOR of the entire array equals the single element.

def single_number_i(nums: list[int]) -> int:
    result = 0
    for x in nums:
        result ^= x
    return result

Time: O(n). Space: O(1).

Single Number III (Two Singles, Rest Twice)

After XORing all elements we have xor_all = a ^ b. We need to separate a and b. They differ in at least one bit; pick the lowest set bit of xor_all (so one of a,b has that bit, the other doesn't). Partition the array by that bit and XOR each partition: one partition gives a, the other gives b.

def single_number_iii(nums: list[int]) -> list[int]:
    xor_all = 0
    for x in nums:
        xor_all ^= x
    low = xor_all & (-xor_all)  # lowest set bit
    a, b = 0, 0
    for x in nums:
        if x & low:
            a ^= x
        else:
            b ^= x
    return [a, b]

Time: O(n). Space: O(1). Order of a and b in the output may vary; the problem usually accepts any order.

Single Number II (One Single, Rest Three Times)

XOR cancels pairs, not triplets: if x appears three times, x ^ x ^ x = x, so the XOR of the whole array is not the single. We need to exploit "count mod 3" per bit.

Idea: Bit-count mod 3

For each bit position i, count how many numbers in the array have bit i set. Every value other than the single appears 3 times, so that count is 3 × (number of distinct tripled values with bit i set) plus 0 or 1, depending on whether the single has bit i set. The remainder mod 3 is therefore exactly the i-th bit of the single number. So: for each bit i, set the i-th bit of the result to (count of numbers with bit i set) % 3.

def single_number_ii_bitcount(nums: list[int]) -> int:
    result = 0
    for i in range(32):  # assume 32-bit integers
        count = 0
        for x in nums:
            if (x >> i) & 1:
                count += 1
        result |= (count % 3) << i
    return result

This works for non-negative 32-bit integers. For Python's arbitrary-precision integers, you can use i up to max(nums).bit_length() or a fixed upper bound (e.g. 32 or 64) if the problem states a range.
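If the input may contain negative values, one possible adjustment is to count bits over a fixed width and then reinterpret the top bit as a sign bit. The sketch below assumes signed 32-bit inputs; the function name single_number_ii_signed and the bits parameter are illustrative.

```python
def single_number_ii_signed(nums: list[int], bits: int = 32) -> int:
    """Bit-count mod 3 over a fixed width, returning a signed result."""
    result = 0
    for i in range(bits):
        # >> on a negative Python int sign-extends, so (x >> i) & 1 reads bit i
        # of the two's complement pattern.
        count = sum((x >> i) & 1 for x in nums)
        if count % 3:
            result |= 1 << i
    # Bit (bits - 1) is the sign bit: convert the unsigned pattern to signed.
    if result >= 1 << (bits - 1):
        result -= 1 << bits
    return result


print(single_number_ii_signed([2, 2, 3, 2]))      # 3
print(single_number_ii_signed([-2, -2, 1, -2]))   # 1
print(single_number_ii_signed([7, 7, -4, 7]))     # -4
```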

Optimization: State machine (ones and twos)

We can simulate "count mod 3" per bit with two variables ones and twos: ones holds bits that have been set 1 time mod 3, twos holds bits set 2 times mod 3. After processing all numbers, ones is the single. The update rules (for each number x):

def single_number_ii_state(nums: list[int]) -> int:
    ones, twos = 0, 0
    for x in nums:
        ones = (ones ^ x) & ~twos   # add x to ones, but remove from ones if already in twos
        twos = (twos ^ x) & ~ones   # add x to twos, but remove from twos if now in ones
    return ones

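Intuition: track one bit of x at a time. On the first occurrence the bit is XORed into ones (twos is still 0 there, so the mask ~twos keeps it), and the mask ~ones keeps it out of twos. On the second occurrence ones ^ x clears the bit from ones, so twos ^ x is free to set it in twos. On the third occurrence ones ^ x would set the bit again, but ~twos blocks it; then twos ^ x clears it from twos, and both are back to 0. So each bit is in ones only when its count mod 3 is 1. One pass, O(1) space.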
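Tracing the two variables step by step can make the state machine easier to follow. The sketch below records (ones, twos) after every element; the function name trace_ones_twos is illustrative, and the update rules are the same two lines as in the state-machine solution.

```python
def trace_ones_twos(nums: list[int]) -> list[tuple[int, int]]:
    """Return the (ones, twos) pair after processing each element."""
    ones, twos = 0, 0
    history = []
    for x in nums:
        ones = (ones ^ x) & ~twos
        twos = (twos ^ x) & ~ones
        history.append((ones, twos))
    return history


# Three copies of 5: the value moves ones -> twos -> cleared.
print(trace_ones_twos([5, 5, 5]))        # [(5, 0), (0, 5), (0, 0)]

# 5 three times plus a single 9: ones ends up holding 9.
print(trace_ones_twos([5, 9, 5, 5])[-1]) # (9, 0)
```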

Evolution: Brute Force → Better → Optimal

Brute force: For each element, count how many times it appears with a nested loop. Time O(n²), space O(1).
Better (I & III): Hash map to count frequencies, then find the element with count 1. O(n) time, O(n) space.
Optimal (I & III): XOR solution — O(n) time, O(1) space. For II, bit-count or state machine — O(n) time, O(1) space (vs. hash).
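For comparison, the hash-map baseline might look like the sketch below. It uses collections.Counter from the standard library; the function name singles_by_counting is illustrative. It handles all three variants with the same code, at the cost of O(n) extra space.

```python
from collections import Counter


def singles_by_counting(nums: list[int]) -> list[int]:
    """Return every value that appears exactly once (O(n) time, O(n) space)."""
    counts = Counter(nums)
    return [value for value, c in counts.items() if c == 1]


print(singles_by_counting([4, 1, 2, 1, 2]))      # [4]      (Single Number I)
print(singles_by_counting([2, 2, 3, 2]))         # [3]      (Single Number II)
print(singles_by_counting([1, 2, 1, 3, 2, 5]))   # [3, 5]   (Single Number III)
```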

Comparison Table

  Problem           | Frequency of others | Approach                | Key idea
  ------------------|---------------------|-------------------------|------------------------
  Single Number I   | twice               | XOR all                 | Pairs cancel
  Single Number II  | three times         | Bit-count % 3 or state  | Triplets don't cancel
  Single Number III | twice               | XOR all, then partition | a^b, split by diff bit

Time and Space Complexity

Single Number I: O(n) time, O(1) space.
Single Number III: O(n) time, O(1) space.
Single Number II (bit-count): O(n × bits) time (e.g. O(32n)), O(1) space.
Single Number II (state): O(n) time, O(1) space.

Edge Cases

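Smallest inputs: For I, an array of length 1 returns that element. For III, an array of length 2 has no pairs, so both elements are the "singles."
Negative numbers (II): In Python, >> and & treat negative integers as infinitely sign-extended two's complement, so the state machine works unchanged. The bit-count version over a fixed width (e.g. range(32)) produces an unsigned bit pattern, so a negative single comes out as a large positive number; if the result is ≥ 2³¹, subtract 2³² to recover the signed value.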
Order of output (III): Returning [a, b] vs [b, a] is usually acceptable; sort if the problem requires sorted order.

Common Mistakes

Common Mistake: Using the "XOR all" solution for Single Number II. XOR of three copies of x is x, so the total XOR is single ^ (xor of all distinct triple values), which is not the single number.

Common Mistake: In Single Number III, splitting by a bit that is 0 in both a and b. You must use a bit where a ^ b is 1, i.e. a set bit of xor_all, so that a and b go to different groups.

Interview Insight: "Single Number I: XOR all. Single Number III: XOR all → a^b; get lowest set bit with xor_all & -xor_all, partition array by that bit, XOR each group to get the two numbers. Single Number II: either count set bits per position mod 3, or use the ones/twos state machine. State the constraint (twice vs three times) and choose the right tool."

Practice Problems

LeetCode 136 — Single Number (I).
LeetCode 137 — Single Number II.
LeetCode 260 — Single Number III.
Generalize: every element appears k times except one that appears once (use count mod k per bit, or a state machine with about ⌈log₂ k⌉ variables).
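As a starting point for the generalization exercise, the per-bit count extends directly from mod 3 to mod k. This sketch assumes non-negative integers, k ≥ 2, and that exactly one value appears once while all others appear exactly k times; the function name single_number_k is illustrative.

```python
def single_number_k(nums: list[int], k: int) -> int:
    """Find the value that appears once when every other value appears k times.

    Assumes non-negative integers and k >= 2.
    """
    width = max(nums, default=0).bit_length()   # highest bit position that can be set
    result = 0
    for i in range(width):
        count = sum((x >> i) & 1 for x in nums)
        # Under the stated assumption the remainder is 0 or 1.
        result |= (count % k) << i
    return result


print(single_number_k([6, 6, 6, 6, 9], 4))   # 9
print(single_number_k([2, 2, 3, 2], 3))      # 3
print(single_number_k([4, 1, 2, 1, 2], 2))   # 4
```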

Summary

Single Number I: XOR every element; pairs cancel → result is the single. O(n), O(1).
Single Number III: XOR all → a^b; pick lowest set bit of that; partition by that bit, XOR each half → the two singles. O(n), O(1).
Single Number II: XOR doesn't work (triplets don't cancel). Use bit-count mod 3 per bit, or ones/twos state machine. O(n), O(1).
Recognize "pairs" (XOR) vs "two singles" (XOR + partition) vs "triplets" (mod 3 or state).

16.4 Subsets via Bitmask

Introduction

Subsets via bitmask is the technique of representing every subset of a set of n elements as an integer from 0 to 2ⁿ − 1: the i-th bit is 1 if and only if the i-th element is in the subset. By iterating over all such integers, we generate all 2ⁿ subsets without recursion or backtracking—just a loop over masks and bit checks. This is the foundation for subset enumeration, brute-force over choices, and bitmask DP (Section 15.14 / 16.6).

Real-World Analogy

Imagine a row of n light switches; each switch controls whether one item is "in" or "out" of a subset. Every possible on/off configuration corresponds to one subset. If we label configurations by counting in binary (0 = all off, 1 = only first on, 2 = only second on, …, 2ⁿ−1 = all on), then we have a one-to-one mapping between integers 0 to 2ⁿ−1 and subsets. Subsets via bitmask is: "for each number in that range, read the bits and build the subset."

Formal Definition

Concept Note: Given a set of n elements (typically indexed 0 to n−1), a bitmask is an integer m in the range 0 ≤ m < 2ⁿ. The subset represented by m is { i : the i-th bit of m is 1 }. So mask 0 is the empty set, mask 2ⁿ−1 is the full set. Subset enumeration via bitmask means iterating m = 0 to 2ⁿ−1 and, for each m, building the subset of indices (or elements) where (m >> i) & 1 is 1.

Why This Topic Matters

Exhaustive search: Many problems ask for "all subsets" or "try every combination of choices"; bitmask gives a simple, non-recursive way to iterate all 2ⁿ subsets.
Bitmask DP: In DP, state is often "subset of items chosen so far"; the state is stored as an integer mask. Subsets via bitmask is how you interpret and iterate those states.
Interview staple: Subset Sum, Partition, "generate all subsets," and TSP-style problems often use or build on this idea.

Mental Model

Mask = subset: Integer m from 0 to 2ⁿ−1 ↔ subset of {0, 1, …, n−1}. Bit i set ⇔ element i in subset.
Iterate: Loop m = 0 to 2ⁿ − 1; for each m, loop i = 0 to n−1 and if (m >> i) & 1, include element i.
Cardinality: The number of elements in the subset is the number of set bits in m (popcount). Subset of size k ↔ mask with exactly k bits set.

Step-by-Step: Generate All Subsets

Let arr be the list of n elements (or use indices 0..n−1).
For m = 0 to 2ⁿ − 1 (inclusive):
Build a list for this mask: for each i in 0..n−1, if (m >> i) & 1, append arr[i] (or i) to the current subset.
Yield or collect this subset (or process it directly).

ASCII Diagram: n = 3

  Elements: [A, B, C]  (indices 0, 1, 2)
  n = 3  →  2^3 = 8 subsets

  mask  binary   subset
  0     000      {}
  1     001      {A}
  2     010      {B}
  3     011      {A,B}
  4     100      {C}
  5     101      {A,C}
  6     110      {B,C}
  7     111      {A,B,C}

Python Implementation

def subsets_bitmask(arr: list) -> list[list]:
    """Generate all subsets of arr using bitmask. Order of subsets follows mask 0..2^n-1."""
    n = len(arr)
    result = []
    for m in range(1 << n):  # 1 << n = 2^n
        subset = []
        for i in range(n):
            if (m >> i) & 1:
                subset.append(arr[i])
        result.append(subset)
    return result

To get only subsets of a given size k, add a check: if bin(m).count('1') == k (or m.bit_count() == k in Python 3.10+) before appending.
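The size filter can be sketched as below. The function name subsets_of_size is illustrative; the popcount uses bin(m).count("1") so the code also runs on Python versions before 3.10.

```python
def subsets_of_size(arr: list, k: int) -> list[list]:
    """All subsets of arr with exactly k elements, in increasing mask order."""
    n = len(arr)
    result = []
    for m in range(1 << n):
        if bin(m).count("1") != k:    # popcount; m.bit_count() works on Python 3.10+
            continue
        result.append([arr[i] for i in range(n) if (m >> i) & 1])
    return result


print(subsets_of_size(["A", "B", "C"], 2))   # [['A', 'B'], ['A', 'C'], ['B', 'C']]
```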

Line-by-Line Explanation

1 << n: Same as 2ⁿ; the number of masks.
for m in range(1 << n): m takes values 0, 1, …, 2ⁿ−1.
(m >> i) & 1: 1 if the i-th bit of m is set, 0 otherwise—so we include arr[i] exactly when bit i is 1.
Each m produces one subset; order is by numeric value of the mask, starting with the empty set. This is not an ordering by size: e.g. mask 3 = {0,1} comes before mask 4 = {2}.

Time and Space Complexity

Time: O(2ⁿ × n): we iterate 2ⁿ masks and for each mask iterate n bits and build a list of size at most n. So total O(n · 2ⁿ).
Space: O(n · 2ⁿ) for storing all subsets (2ⁿ lists of average size n/2); if we only process one subset at a time and don't store all, space is O(n) for the current subset plus O(1) extra.

Generating all subsets is inherently exponential; the factor of n is from building each subset. For "process each subset without storing" use a generator to keep space O(n).
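A generator version might look like the sketch below. The name iter_subsets is illustrative; only one subset list exists at a time, so memory stays O(n).

```python
from typing import Iterator


def iter_subsets(arr: list) -> Iterator[list]:
    """Yield the subsets of arr one at a time, in increasing mask order."""
    n = len(arr)
    for m in range(1 << n):
        yield [arr[i] for i in range(n) if (m >> i) & 1]


# Count the subsets of [3, 5, 8] with an even sum, without storing them all.
even = sum(1 for s in iter_subsets([3, 5, 8]) if sum(s) % 2 == 0)
print(even)   # 4: {}, {3,5}, {8}, {3,5,8}
```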

Edge Cases

n = 0: range(1) gives one mask (0); the only subset is the empty list []. Correct.
Large n: 2ⁿ grows very fast (n=20 → about 1e6, n=25 → about 3.4e7). Use full enumeration only when n is small enough; for larger n, prefer backtracking with pruning.
Subsets of indices vs elements: The code above uses arr[i] so subsets are over the actual elements; if you need indices, append i instead.

Common Mistakes

Common Mistake: Getting the upper bound wrong. range(2**n) and range(1 << n) are equivalent; both stop at 2ⁿ − 1 because the upper bound is exclusive. The real pitfalls are looping one mask too far (e.g. to (1 << n) + 1) and operator precedence: 1 << n - 1 parses as 1 << (n - 1), so the full mask must be written (1 << n) - 1.

Common Mistake: Off-by-one in the bit check: the i-th element corresponds to bit i, not bit n−1−i, unless you deliberately use the other convention. So "first element" is bit 0 (LSB), "last element" is bit n−1.

Expert Tip: To iterate only over masks with exactly k set bits (subsets of size k), you can use for m in range(1 << n): if m.bit_count() == k: .... This still scans all 2ⁿ masks. When that is too slow, use Gosper's hack or a recursive/backtracking generator, which visit only the C(n, k) masks of the wanted size.
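Gosper's hack, mentioned in the tip above, steps from one mask to the next larger mask with the same number of set bits. The sketch below is one common way to write it; the function name masks_with_k_bits is illustrative, and k = 0 is handled separately because the update divides by the lowest set bit.

```python
from typing import Iterator


def masks_with_k_bits(n: int, k: int) -> Iterator[int]:
    """Yield every n-bit mask with exactly k set bits, in increasing order."""
    if k == 0:
        yield 0                      # only the empty mask; avoids division by zero below
        return
    m = (1 << k) - 1                 # smallest mask with k bits set
    limit = 1 << n
    while m < limit:
        yield m
        c = m & -m                   # lowest set bit of m
        r = m + c                    # carry the lowest run of 1s one position up
        m = (((r ^ m) >> 2) // c) | r   # push the remaining 1s of the run to the bottom


print(list(masks_with_k_bits(4, 2)))   # [3, 5, 6, 9, 10, 12]
```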

Optimization Insight: If you only need to process each subset (e.g. check a condition) and don't need to store all, build one subset per iteration and discard it afterwards: for m in range(1 << n): subset = [arr[i] for i in range(n) if (m >> i) & 1]; .... Wrapping the loop in a generator function with yield has the same effect. This keeps memory O(n). For bitmask DP you don't build the subset list at all; you use the integer mask as the state key.

Interview Insight: "Subsets via bitmask: mask from 0 to 2^n−1, bit i set means element i in subset. Loop m, for each i check (m>>i)&1 and build the subset. Time O(n·2^n), space O(n) if you don't store all. Use when n is small (e.g. ≤20); for 'all subsets of size k' either filter by popcount or use backtracking."

Pattern Recognition

Whenever the problem asks for "all subsets," "every combination of choices from n items," or "try all 2ⁿ possibilities," consider bitmask enumeration. If the problem then asks for the best subset satisfying a constraint (e.g. maximum sum, feasibility), you iterate masks and evaluate each—or use bitmask DP to avoid redoing work (Section 16.6).

Practice Problems

Generate all subsets of an array (LeetCode 78 — Subsets).
Generate all subsets of size k (combinations).
Subset Sum: is there a subset that sums to target? (Iterate masks, sum selected elements.)
Partition Equal Subset Sum: can the array be split into two subsets with equal sum? (Check if any subset sums to half total.)
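As a starting point for the Subset Sum exercise, a brute-force sketch that tries every mask could look like this. The function name subset_sum_bitmask is illustrative; it returns the first matching subset in mask order, or None, and is only practical for small n.

```python
from typing import Optional


def subset_sum_bitmask(arr: list[int], target: int) -> Optional[list[int]]:
    """Return some subset of arr that sums to target, or None. O(n * 2^n) time."""
    n = len(arr)
    for m in range(1 << n):
        chosen = [arr[i] for i in range(n) if (m >> i) & 1]
        if sum(chosen) == target:
            return chosen
    return None


print(subset_sum_bitmask([3, 34, 4, 12, 5, 2], 9))   # [4, 5]
print(subset_sum_bitmask([1, 2, 5], 4))              # None
```

For Partition Equal Subset Sum, the same function can be called with target equal to half the total when the total is even.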

Summary

Bitmask = subset: Integer m in [0, 2ⁿ−1]; bit i set ⇔ element i in subset.
Enumerate all: For m from 0 to 2ⁿ−1, build subset = [ arr[i] for i in range(n) if (m >> i) & 1 ].
Time: O(n · 2ⁿ); space: O(n) per subset if not storing all.
Foundation for brute-force subset problems and bitmask DP (state = mask).

16.5 Gray Code

Introduction

Gray code (also called reflected binary code) is an ordering of the 2ⁿ n-bit binary numbers such that consecutive numbers differ in exactly one bit. This property is useful in error correction, analog-to-digital conversion, and puzzles (e.g. Tower of Hanoi). In competitive programming and interviews, you may be asked to generate the n-bit Gray code sequence (LeetCode 89) or to convert between binary and Gray encoding. The key formula: the i-th Gray code value (0-indexed) is i ^ (i >> 1).

Real-World Analogy

Imagine a circular dial with 2ⁿ positions, each labeled with an n-bit pattern. If two adjacent positions differed in several bits, a slight misalignment could read a completely wrong value. Gray code ensures that when you move from one position to the next, only one "digit" flips—so small errors cause at most a small misread. It's like a ruler where neighboring marks are as similar as possible so you never jump from 011 to 100 by a tiny error.

Formal Definition

Concept Note: An n-bit Gray code is a sequence of 2ⁿ distinct n-bit integers (usually 0 to 2ⁿ−1) such that every two consecutive values differ in exactly one bit position. The binary reflected Gray code is the standard construction: the i-th term (i from 0 to 2ⁿ−1) is gray(i) = i ⊕ (i >> 1) (XOR of i with i right-shifted by 1). Equivalently, to convert binary b to Gray g: g = b ^ (b >> 1). To convert Gray g back to binary b: start with the MSB of b equal to the MSB of g, then each lower bit is b[i] = g[i] ⊕ b[i+1] (or use a loop: b = g; while g: g >>= 1; b ^= g).

Why This Topic Matters

Interview: LeetCode 89 "Gray Code" asks for the n-bit Gray code sequence; the one-liner i ^ (i >> 1) for i in range(2ⁿ) is the expected solution.
Single-bit change: When you need to enumerate all 2ⁿ states but want consecutive states to differ by one bit (e.g. in some DP or search), Gray code gives that order.
Conversion: Binary ↔ Gray conversion appears in encoding and hardware; the XOR formula is simple and O(1) per value.

Mental Model

Sequence: List of 2ⁿ numbers; each pair of neighbors differs in exactly one bit.
Formula: The i-th Gray code number (0-indexed) = i ^ (i >> 1). No need to build the sequence recursively—just iterate i and apply the formula.
Binary → Gray: g = b ^ (b >> 1). Gray → binary: repeatedly XOR with shifted copy of current value until shift becomes 0.

Construction: Reflected Binary Gray Code

For n=1, the Gray code is just [0, 1]. For n=2, take the n=1 sequence, reflect it (so 1, 0), prefix the first half with 0 and the reflected half with 1: [00, 01, 11, 10] → decimals [0, 1, 3, 2]. For n=3, repeat: reflect [0,1,3,2] to [2,3,1,0], prefix first four with 0 and next four with 1 → [0,1,3,2,6,7,5,4]. The formula i ^ (i >> 1) gives exactly this sequence: for i = 0,1,2,…,7 we get 0,1,3,2,6,7,5,4.
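The reflect-and-prefix construction can be sketched iteratively and compared against the formula. The function name gray_reflect is illustrative; "prefix with 1" becomes setting the next higher bit.

```python
def gray_reflect(n: int) -> list[int]:
    """Build the n-bit reflected Gray code by repeated mirroring."""
    seq = [0]
    for bit in range(n):
        # First half: the current sequence (new bit is 0).
        # Second half: the mirrored sequence with the new bit set to 1.
        seq += [x | (1 << bit) for x in reversed(seq)]
    return seq


print(gray_reflect(2))   # [0, 1, 3, 2]
print(gray_reflect(3))   # [0, 1, 3, 2, 6, 7, 5, 4]
assert gray_reflect(3) == [i ^ (i >> 1) for i in range(8)]
```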

ASCII Diagram: n = 3

  i    binary(i)   i>>1   gray = i^(i>>1)   gray (decimal)
  0    000         000    000               0
  1    001         000    001               1
  2    010         001    011               3
  3    011         001    010               2
  4    100         010    110               6
  5    101         010    111               7
  6    110         011    101               5
  7    111         011    100               4

  Gray sequence (order): 0, 1, 3, 2, 6, 7, 5, 4
  Consecutive pairs differ in exactly one bit.

Python Implementation

Generate n-bit Gray code sequence (LeetCode 89)

def gray_code(n: int) -> list[int]:
    """Return the n-bit Gray code sequence (0 to 2^n - 1 in Gray order)."""
    return [i ^ (i >> 1) for i in range(1 << n)]

Binary to Gray

def binary_to_gray(b: int) -> int:
    return b ^ (b >> 1)

Gray to binary

def gray_to_binary(g: int) -> int:
    b = g
    while g:
        g >>= 1
        b ^= g
    return b

Line-by-Line Explanation

gray_code(n): For each i from 0 to 2ⁿ−1, i ^ (i >> 1) produces the Gray code value at position i. The list is already in the required order. Reason: going from i to i+1 flips a run of trailing bits, so d = i ^ (i+1) is a block of 1s ending at bit 0 (e.g. 0111); the two Gray values then differ by d ^ (d >> 1), which keeps only the top bit of that block.
binary_to_gray: One XOR with the right-shifted value. Each bit of the result is the XOR of that bit and the next higher bit of b, which is the standard binary-to-Gray rule.
gray_to_binary: We recover b from g. The MSB of b equals the MSB of g. For the next bit, b[i] = g[i] ⊕ b[i+1]. The loop does: repeatedly shift g right and XOR into b, so b accumulates the decoded binary. (Starting with b=g, then b ^= (g>>1), then b ^= (g>>2), … recovers b.)
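A quick self-check of the round trip and the one-bit property might look like the sketch below. The two conversion functions are repeated from above only so the snippet runs on its own; the choice n = 4 is arbitrary.

```python
def binary_to_gray(b: int) -> int:
    return b ^ (b >> 1)


def gray_to_binary(g: int) -> int:
    b = g
    while g:
        g >>= 1
        b ^= g
    return b


n = 4
codes = [binary_to_gray(i) for i in range(1 << n)]

# Round trip: decoding the i-th Gray value gives back the index i.
for i, g in enumerate(codes):
    assert gray_to_binary(g) == i

# Neighbors differ in exactly one bit: the XOR is a nonzero power of two.
for prev, cur in zip(codes, codes[1:]):
    diff = prev ^ cur
    assert diff != 0 and (diff & (diff - 1)) == 0

print(codes[:8])   # [0, 1, 3, 2, 6, 7, 5, 4]
```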

Time and Space Complexity

Generate sequence: O(2ⁿ) time (one operation per value), O(2ⁿ) space for the list. If we stream values (generator), space is O(1).
Binary ↔ Gray (single value): O(1) for binary-to-Gray; O(number of bits) for Gray-to-binary, i.e. O(log g) for a value g or O(n) for n-bit codes (the while loop runs once per bit).

Edge Cases

n = 0: 2⁰ = 1; the sequence is [0]. 0 ^ (0 >> 1) = 0. Correct.
n = 1: Sequence [0, 1]; consecutive differ in one bit. Correct.
Gray to binary for large g: In Python integers are arbitrary-precision; the while loop runs until g becomes 0, so it runs over the bit length of g.

Common Mistakes

Common Mistake: Confusing the index i with the value. The Gray code sequence is [ gray(0), gray(1), …, gray(2ⁿ−1) ]. So the value at position i is i ^ (i >> 1), not the other way around. We iterate i (binary 0..2ⁿ−1) and compute the Gray value for that index.

Common Mistake: Using i ^ (i << 1) instead of i ^ (i >> 1). Gray code uses right shift: we XOR with the value that has bits shifted right by 1, so each Gray bit is (binary bit i) ⊕ (binary bit i+1).

Expert Tip: If the problem asks for Gray code as zero-padded binary strings (e.g. ["00","01","11","10"] for n=2), generate the integers with the formula and then format them to width n: [format(i ^ (i >> 1), '0' + str(n) + 'b') for i in range(1 << n)].

Interview Insight: "Gray code: consecutive n-bit numbers differ in one bit. Generate with [i ^ (i >> 1) for i in range(2**n)]. Binary to Gray: b ^ (b >> 1). Gray to binary: b = g; while g: g >>= 1; b ^= g. LeetCode 89 is the standard problem."

Practice Problems

LeetCode 89 — Gray Code: generate the sequence for n.
Convert a given binary number to Gray and back; verify round-trip.
Find the position (index) of a given Gray code value in the n-bit sequence (equivalent to Gray-to-binary).

Summary

Gray code: Ordering of 0..2ⁿ−1 so consecutive values differ in exactly one bit.
Generate: [ i ^ (i >> 1) for i in range(1 << n) ].
Binary → Gray: g = b ^ (b >> 1).
Gray → binary: b = g; while g: g >>= 1; b ^= g.
Uses only XOR and shift; no recursion needed for the sequence.

16.6 Bit DP

Introduction

Bit DP (bitmask dynamic programming) is the technique of using a bitmask, an integer whose bits represent a subset of n elements, as (part of) the state in dynamic programming. Instead of storing "which items we've chosen" as a list or set, we encode it as a single integer from 0 to 2ⁿ−1. The DP state is then something like dp[mask] or dp[mask][j], and transitions correspond to flipping bits (adding or removing one element). This gives O(2ⁿ × poly(n)) solutions for problems like Traveling Salesman (TSP), assignment (n tasks to n people), and "best subset" optimization. Bit DP is the natural combination of Section 16.4 (Subsets via Bitmask) and DP (Section 15): state = subset encoded as mask; transition = try one more element. See also Section 15.14 (Bitmask DP) for more examples.

Real-World Analogy

Imagine you have a checklist of n cities to visit. Instead of writing down the list of cities you've already visited, you keep a single number: its binary digits are 1 for "visited" and 0 for "not visited." So the number 13 (binary 1101) means cities 0, 2, and 3 are visited. When you move to a new city, you flip one bit. In Bit DP we ask: "For every possible checklist (every mask), what is the best cost (or score) we can achieve?" We fill the table from small checklists (few 1s) to the full checklist (all 1s), so when we compute a state we have already computed the smaller states it depends on.

Formal Definition

Concept Note: In Bit DP, the state space includes a mask m ∈ [0, 2ⁿ−1], where bit i = 1 means "element i is included" in the subset we are building. Common state shapes: dp[mask] = best value for the subset mask; dp[mask][i] = best value for subset mask with some extra context (e.g. current position i). Transitions: from a state (mask, …), try adding one new element j (set bit j): new_mask = mask | (1 << j); then dp[new_mask][…] is updated from dp[mask][…] plus the cost or reward of including j. We iterate masks so that when we compute dp[mask], all states for subsets of mask (fewer bits) are already computed.

Why This Topic Matters

Exponential but feasible: For n up to about 20, 2ⁿ is around 1e6; with a small polynomial factor (e.g. n or n²), Bit DP is the standard solution for "choose a subset and optimize."
TSP and assignment: Traveling Salesman (visit all cities once, minimize cost) and "assign n tasks to n people" (minimize total cost) are classic Bit DP problems; interviews and contests ask them.
Unifies bits and DP: You use bit operations (Section 16.1) to encode subsets (Section 16.4) and DP (Section 15) to avoid recomputing; mastering Bit DP shows you can combine these tools.

Bit Operations Recap (State = Mask)

These are the operations you need to implement Bit DP. Let mask be the current subset and i an element (index 0 to n−1).

Include i: new_mask = mask | (1 << i)
Exclude i / remove i: mask & ~(1 << i)
Is i in mask? (mask >> i) & 1 or mask & (1 << i)
Size of subset (popcount): mask.bit_count() (Python 3.10+) or bin(mask).count('1')
Full set (all n elements): full = (1 << n) - 1
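
A short sketch of these operations on the mask 13 (binary 1101) from the analogy; the variable names are illustrative only.

```python
n = 4
mask = 0b1101  # 13: elements 0, 2 and 3 are in the subset

members = [i for i in range(n) if (mask >> i) & 1]  # membership test per bit
with_1 = mask | (1 << 1)        # include element 1
without_2 = mask & ~(1 << 2)    # remove element 2
size = bin(mask).count("1")     # popcount, works on any Python 3 version
full = (1 << n) - 1             # all n elements

print(members)         # [0, 2, 3]
print(bin(with_1))     # 0b1111
print(bin(without_2))  # 0b1001
print(size, full)      # 3 15
```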

Mental Model

State: dp[mask] or dp[mask][j] = "best value when the subset chosen so far is mask" (and optionally "we are at j" or "last chosen was j").
Base case: Usually the empty subset (mask = 0) or a subset with one element (mask = 1 << i).
Transition: To extend mask by one element j (where j is not in mask), set new_mask = mask | (1 << j) and update dp[new_mask][…] from dp[mask][…] plus the cost/reward of adding j.
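Order: Iterate masks in increasing order (0 to 2ⁿ−1). Every proper subset of mask is obtained by clearing bits, which only lowers the numeric value, so all subsets of mask are already computed when we process it. (The converse does not hold: a mask with fewer 1s can be numerically larger, e.g. 100 > 011, but such a mask is not a subset.) For TSP we need "mask without v," which is smaller than mask, so the same order works.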

Step-by-Step: Designing a Bit DP

Identify the "elements": What are the n items? (e.g. n cities, n tasks.)
Define state: What do we need to remember? Usually "which subset" (mask) plus one more thing (e.g. current city, last task assigned). So state = (mask, j) → dp[mask][j].
Base case: Smallest mask (e.g. mask with one element). Set dp[1<<i][i] = 0 or given value.
Transition: For each mask and each "current" j in mask, how did we get here? From some previous state (smaller mask, different j). Write the recurrence (min/max/sum over possible predecessors).
Answer: Usually dp[full_mask][…] or min/max over dp[full_mask][j] for all j, possibly plus a final step (e.g. return to start).
Loop order: Iterate mask from 0 to 2ⁿ−1 so that any "mask without one element" is already computed.

Example 1: Traveling Salesman (TSP)

Problem: n cities (0 to n−1). Cost matrix cost[u][v]. Start at city 0, visit every city exactly once, return to 0. Minimize total cost.

State: dp[mask][v] = minimum cost to have visited exactly the set of cities in mask and currently be at city v (v must be in mask).

Base: dp[1 << 0][0] = 0 (we start at city 0; only city 0 is visited).

Transition: We reached (mask, v) by having been at some city u in the set "mask without v," then moving from u to v. So:

dp[mask][v] = min over u in (mask \ {v}) of ( dp[mask & ~(1<<v)][u] + cost[u][v] )

Answer: After visiting all cities, we are at some v; we need to go back to 0. So ans = min over v of ( dp[full][v] + cost[v][0] ).

Python: TSP (Bit DP)

def tsp(n, cost):
    """cost[i][j] = cost from i to j. Start at 0, visit all once, return to 0. Min total cost."""
    if n <= 1:
        return 0  # zero or one city: no travel needed (also avoids indexing dp[1][0] when n == 0)
    INF = float("inf")
    full = (1 << n) - 1
    dp = [[INF] * n for _ in range(1 << n)]
    dp[1 << 0][0] = 0

    for mask in range(1 << n):
        for v in range(n):
            if not (mask & (1 << v)):
                continue
            prev_mask = mask & ~(1 << v)  # mask without v
            for u in range(n):
                if not (prev_mask & (1 << u)):
                    continue
                if dp[prev_mask][u] != INF:
                    dp[mask][v] = min(dp[mask][v], dp[prev_mask][u] + cost[u][v])

    ans = INF
    for v in range(n):
        if dp[full][v] != INF:
            ans = min(ans, dp[full][v] + cost[v][0])
    return ans

Line-by-Line Explanation (TSP)

full = (1 << n) - 1: The mask with all n bits set (all cities visited).
dp[mask][v]: Minimum cost to visit exactly the cities in mask and end at v.
Base: dp[1<<0][0] = 0: only city 0 visited, we're at 0, cost 0.
For each mask and each v in mask: prev_mask = mask & ~(1<<v) is the set "mask without v." We must have come from some city u in prev_mask; the cost is dp[prev_mask][u] + cost[u][v]. We take the minimum over such u.
Answer: For the full mask, we are at some v; add cost[v][0] to return to 0, and take the minimum over v.
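
For small n, a brute-force search over all visiting orders is a useful sanity check for the DP. The sketch below is an illustration only: tsp_brute_force and the 4-city sample_cost matrix are made up here, and the result can be compared with tsp(4, sample_cost) from the implementation above.

```python
from itertools import permutations


def tsp_brute_force(cost):
    """Try every order of cities 1..n-1, starting and ending at city 0."""
    n = len(cost)
    if n <= 1:
        return 0
    best = float("inf")
    for order in permutations(range(1, n)):
        total = 0
        prev = 0
        for city in order:
            total += cost[prev][city]
            prev = city
        total += cost[prev][0]  # return to the start
        best = min(best, total)
    return best


sample_cost = [
    [0, 10, 15, 20],
    [10, 0, 35, 25],
    [15, 35, 0, 30],
    [20, 25, 30, 0],
]
print(tsp_brute_force(sample_cost))  # 80 (tour 0 -> 1 -> 3 -> 2 -> 0)
```

Brute force is O((n−1)! × n), so it is only practical for roughly n ≤ 10, but that is enough to test the O(2ⁿ × n²) DP on random small inputs.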

Example 2: Assignment (Min Cost to Assign n Tasks to n People)

Problem: n tasks, n people. cost[i][j] = cost if person j does task i. Each person does exactly one task; each task assigned to exactly one person. Minimize total cost.

State: Let mask be the set of people already used (bit j = 1 if person j has been given a task). We assign tasks in the fixed order 0, 1, 2, …, so when mask has k set bits, exactly tasks 0..k−1 have been assigned. Then dp[mask] = minimum total cost to assign tasks 0..k−1 to the people in mask, where k = popcount(mask). Because the task order is fixed, popcount(mask) tells us which task was assigned last, so no second state dimension is needed.

Transition: The last task assigned is task k−1, and it went to some person j in mask. So dp[mask] = min over j in mask of ( dp[mask without j] + cost[k−1][j] ).

Base: dp[0] = 0. Answer: dp[full].

def assignment_min_cost(cost):
    """cost[i][j] = cost for task i to be done by person j. n tasks, n people. Min total cost."""
    n = len(cost)
    full = (1 << n) - 1
    dp = [float("inf")] * (1 << n)
    dp[0] = 0

    for mask in range(1 << n):
        k = mask.bit_count()   # number of tasks assigned so far (tasks 0..k-1)
        for j in range(n):
            if not (mask & (1 << j)):
                continue
            prev = mask & ~(1 << j)
            dp[mask] = min(dp[mask], dp[prev] + cost[k - 1][j])

    return dp[full]

Here mask = set of people already assigned; k = number of people in mask = number of tasks we've assigned (tasks 0 to k−1). So the last task assigned was task k−1, and it was assigned to some person j in mask. Cost for that is cost[k−1][j]; the rest is dp[prev]. We minimize over j.
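
If the actual assignment is needed and not just its cost, one option is to record the person chosen for each mask and walk back from the full mask. This is a sketch under the assumption that all costs are finite; assignment_with_plan and the 3×3 matrix are illustrative names and data, not part of the original code.

```python
def assignment_with_plan(cost):
    """Return (min total cost, plan) where plan[i] = person assigned to task i."""
    n = len(cost)
    full = (1 << n) - 1
    dp = [float("inf")] * (1 << n)
    choice = [-1] * (1 << n)  # choice[mask] = person who took the last task
    dp[0] = 0

    for mask in range(1, 1 << n):
        k = bin(mask).count("1")  # tasks 0..k-1 are assigned
        for j in range(n):
            if mask & (1 << j):
                cand = dp[mask & ~(1 << j)] + cost[k - 1][j]
                if cand < dp[mask]:
                    dp[mask] = cand
                    choice[mask] = j

    # Walk back from the full mask, peeling off one person per step.
    plan = [0] * n
    mask = full
    while mask:
        j = choice[mask]
        plan[bin(mask).count("1") - 1] = j
        mask &= ~(1 << j)
    return dp[full], plan


cost = [
    [9, 2, 7],
    [6, 4, 3],
    [5, 8, 1],
]
print(assignment_with_plan(cost))  # (9, [1, 0, 2])
```

Here task 0 goes to person 1 (cost 2), task 1 to person 0 (cost 6), and task 2 to person 2 (cost 1), for a total of 9.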

Time and Space Complexity

TSP: O(2ⁿ × n²) time — for each of 2ⁿ masks and each v (n), we loop over u (n). Space O(2ⁿ × n).
Assignment: O(2ⁿ × n) time — for each mask we iterate over n bits to find j in mask; popcount is O(1) in Python 3.10+. Space O(2ⁿ).

In general, Bit DP is O(2ⁿ × (number of states per mask) × (transition cost)). The 2ⁿ factor is fixed; the rest depends on the problem.

Edge Cases

n = 0 or n = 1: TSP with one city: no travel; return 0. Assignment with n=1: one task, one person; return cost[0][0].
Unreachable / INF: If some costs are infinite, initialize dp with INF and only relax when the previous state is not INF; the answer may be INF if no valid tour or assignment exists.
Base case mask: Ensure the base state (e.g. dp[1<<0][0] for TSP) is set before any state that depends on it; iterate from mask=0 so that small masks are computed first.

Common Mistakes

Common Mistake: Wrong bit operations: writing 1 < i instead of 1 << i; using + to add an element instead of | (1 << i); using - to remove instead of & ~(1 << i). Always use the correct bit ops so the mask truly represents a subset.

Common Mistake: Iteration order: computing dp[mask] before dp[prev] when prev is a subset of mask. For "mask without v," prev is always smaller than mask (fewer bits), so iterating mask from 0 to 2ⁿ−1 ensures prev is already computed. If your transition used "mask with one more element," you'd iterate and add; then you must ensure the state you're extending is already computed (again, smaller mask first).
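
The "mask with one more element" direction can be sketched as a push-style TSP: each finished state relaxes its successors. The name tsp_push is hypothetical, and the sample matrix is made up for illustration.

```python
def tsp_push(n, cost):
    """Push-style TSP: extend each reachable (mask, u) by one unvisited city v."""
    if n <= 1:
        return 0
    INF = float("inf")
    full = (1 << n) - 1
    dp = [[INF] * n for _ in range(1 << n)]
    dp[1][0] = 0  # only city 0 visited, standing at city 0

    for mask in range(1, 1 << n):
        for u in range(n):
            cur = dp[mask][u]
            if cur == INF:
                continue  # unreachable state, nothing to extend
            for v in range(n):
                if mask & (1 << v):
                    continue  # v already visited
                new_mask = mask | (1 << v)
                if cur + cost[u][v] < dp[new_mask][v]:
                    dp[new_mask][v] = cur + cost[u][v]

    return min(dp[full][v] + cost[v][0] for v in range(n))


sample_cost = [
    [0, 10, 15, 20],
    [10, 0, 35, 25],
    [15, 35, 0, 30],
    [20, 25, 30, 0],
]
print(tsp_push(4, sample_cost))  # 80
```

Increasing mask order is still safe here: every state that can push into new_mask has a numerically smaller mask, so dp[mask][u] is final by the time it is used as a source.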

Common Mistake: Off-by-one in assignment: mask represents "which people assigned"; the number of assigned people is popcount(mask), and they did tasks 0, 1, …, popcount-1. So the last task is task (popcount(mask)-1) and the person who did it is one of the set bits in mask. Double-check cost indices (task index vs person index).

Expert Tip: When the state is (mask, i) and transitions add one element, iterating mask from 0 to 2ⁿ−1 is usually correct because "mask without j" clears a bit and therefore has a strictly smaller numeric value. If you need to iterate by "size" of subset (number of 1s), you can generate masks by popcount: for size in range(n+1), then for each mask with popcount(mask)==size. For TSP we don't need that; numeric order is enough.
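
One simple way to iterate by subset size is to bucket all masks by popcount first. This is a sketch with a hypothetical helper name (masks_by_size); it costs O(2ⁿ) extra memory for the buckets.

```python
def masks_by_size(n):
    """buckets[k] = all n-bit masks with exactly k set bits, in increasing order."""
    buckets = [[] for _ in range(n + 1)]
    for mask in range(1 << n):
        buckets[bin(mask).count("1")].append(mask)
    return buckets


for size, masks in enumerate(masks_by_size(3)):
    print(size, [format(m, "03b") for m in masks])
# 0 ['000']
# 1 ['001', '010', '100']
# 2 ['011', '101', '110']
# 3 ['111']
```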

Optimization Insight: Memory: TSP uses dp[mask][v]. If you iterate by subset size (all masks with k bits), layer k depends only on layer k−1, so you can keep just two layers at a time. A layer holds C(n, k) × n entries, so the peak (around k = n/2) is still exponential: on the order of 2ⁿ × √n entries instead of 2ⁿ × n, a saving of only about a √n factor. Often the full table is simpler to code. For assignment, dp[mask] is 1D; space O(2ⁿ) is standard.

Interview Insight: "Bit DP: state = (mask, optional). Mask = subset of n elements (bit i = 1 if chosen). Transition = add one element: new_mask = mask | (1<<j). Iterate mask 0 to 2^n-1 so subsets are computed before supersets. TSP: dp[mask][v] = min cost to visit mask and end at v; from dp[mask\v][u] + cost[u][v]. Assignment: dp[mask] = min cost to assign tasks to people in mask; last task (popcount-1) assigned to some j in mask. O(2^n * poly(n))."

Pattern Recognition

Use Bit DP when: (1) You need to "choose a subset" of n elements (n small, e.g. ≤ 20). (2) The objective is min/max/count over valid subsets. (3) The constraint is "each element used at most once" or "visit each exactly once." (4) Subproblems naturally decompose by "which subset we've chosen so far" plus optional context (current position, last item). If the problem asks for "minimum cost to visit all" or "assign all with min cost," think Bit DP.

Practice Problems

TSP: visit all cities once, return to start (min cost).
Assignment: n tasks to n people, cost[i][j], min total cost.
LeetCode 847 — Shortest Path Visiting All Nodes: state (mask, node), BFS or DP.
Partition to K equal sum subsets (can use mask for "which elements in current subset" in backtracking; Bit DP if K and n are small).

Summary

Bit DP: DP state includes a bitmask encoding a subset; transitions add or remove one element (flip one bit).
State: dp[mask] or dp[mask][j]; mask = subset of {0,…,n−1}; use | (1<<i) to add, & ~(1<<i) to remove, (mask>>i)&1 to check.
TSP: dp[mask][v] = min cost to visit mask and end at v; transition from dp[mask\v][u] + cost[u][v]; answer min_v dp[full][v] + cost[v][0].
Assignment: dp[mask] = min cost to assign tasks 0..popcount(mask)-1 to people in mask; transition from dp[mask\j] + cost[k-1][j].
Order: Iterate mask from 0 to 2ⁿ−1 so smaller subsets are ready when computing larger ones. Time O(2ⁿ × poly(n)), space O(2ⁿ × …).

Section 17: Greedy Algorithms

This section covers greedy algorithms: making the best local choice at each step in the hope that it leads to a global optimum. You will learn Activity Selection (maximum non-overlapping intervals), Fractional Knapsack, Huffman Coding, Job Sequencing, Interval Merging, and the Gas Station problem. Greedy works when the problem has optimal substructure and the greedy choice property—recognizing when that holds is a key skill for interviews and contests.

17.1 Activity Selection

Introduction

The Activity Selection problem: given n activities, each with a start time and finish time, select the maximum number of activities that can be performed by a single person (or resource) assuming only one activity at a time. Two activities are compatible if they do not overlap—i.e. the finish time of one is less than or equal to the start time of the other. This is one of the classic problems where a greedy algorithm is optimal: sort by finish time and repeatedly pick the next activity that doesn't overlap the last chosen one.

Real-World Analogy

Imagine you have a single meeting room and many meeting requests with fixed start and end times. You want to schedule as many meetings as possible. The greedy strategy: always choose the meeting that ends earliest among those that start at or after the end of the last meeting you picked. Ending early frees the room sooner, leaving more room for later meetings. This "earliest finish first" rule turns out to be optimal for maximizing the count.

Formal Definition

Concept Note: Input: n activities; activity i has start time s[i] and finish time f[i] (assume s[i] < f[i]). Two activities i and j are compatible if they do not overlap: f[i] ≤ s[j] or f[j] ≤ s[i]. Goal: Select a set of mutually compatible activities of maximum size. Greedy choice: Sort activities by finish time; then in order, add an activity to the solution if its start time is at least the finish time of the last chosen activity.

Why This Topic Matters

Classic greedy: Activity Selection is the standard first example of "greedy works here"; it teaches the pattern of sorting by one key (finish time) and scanning once.
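Interview staple: "Non-overlapping intervals" reuses the finish-time sort directly. Related problems such as "merge intervals," "minimum rooms for meetings," and "maximum events you can attend" build on the same sort-then-scan pattern, but with a different sort key (usually start time) or an added heap.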
Optimal substructure: The problem has the property that an optimal solution contains an optimal solution to a smaller subproblem (activities that start after some time t).

Mental Model

Sort by finish time: So we always consider "what ends first." That way we free the resource as early as possible.
Greedy rule: After choosing some activities, the next valid choice is any activity whose start ≥ last finish. Among those, picking the one that finishes earliest (first in sorted order) is optimal.
Why it works: If an optimal solution passes over an activity that ends earlier than the one it chose at some step, we can swap: the earlier-finishing activity is also compatible and leaves more room for the rest. So there is always an optimal solution that follows our greedy choices.
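
The exchange argument can be backed by an empirical check: compare the greedy count with an exhaustive search on small random inputs. This is a sketch only; greedy_count and brute_force_count are illustrative names, and intervals are treated as compatible when one finishes at or before the other starts.

```python
import random
from itertools import combinations


def greedy_count(intervals):
    """Earliest-finish-first greedy; intervals are (start, finish) pairs."""
    count = 0
    last_finish = float("-inf")
    for s, f in sorted(intervals, key=lambda iv: iv[1]):
        if s >= last_finish:
            count += 1
            last_finish = f
    return count


def brute_force_count(intervals):
    """Largest subset whose members are pairwise non-overlapping (exponential)."""
    n = len(intervals)
    for size in range(n, 0, -1):
        for subset in combinations(intervals, size):
            ordered = sorted(subset, key=lambda iv: iv[1])
            # Sorted by finish, checking neighbours is enough for pairwise compatibility.
            if all(ordered[i][1] <= ordered[i + 1][0] for i in range(size - 1)):
                return size
    return 0


random.seed(0)
for _trial in range(200):
    sample = []
    for _item in range(random.randint(0, 7)):
        s = random.randint(0, 10)
        sample.append((s, s + random.randint(1, 5)))
    assert greedy_count(sample) == brute_force_count(sample)
print("greedy matched brute force on all trials")
```

A passing check is not a proof, but a mismatch would immediately expose a wrong sort key or comparison.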

Step-by-Step Algorithm

Sort activities by finish time (ascending). If finish times tie, any order is fine (or sort by start as tiebreaker).
Initialize: last finish = −∞ (or a value before any start), count = 0 (or list of chosen indices).
For each activity in sorted order: if activity.start ≥ last_finish, then choose it: set last_finish = activity.finish, increment count (or append to list).
Return the count (or the list of chosen activities).

ASCII Diagram

  Activities (start, finish): (1,3), (2,4), (3,5), (0,6), (5,7), (8,9)

  Sorted by finish: (1,3) (2,4) (3,5) (0,6) (5,7) (8,9)
                     f=3   f=4   f=5   f=6   f=7   f=9

  Greedy: pick (1,3) → last=3. (2,4) start 2 < 3 → skip. (3,5) start 3 ≥ 3 → pick, last=5.
          (0,6) start 0 < 5 → skip. (5,7) start 5 ≥ 5 → pick, last=7. (8,9) start 8 ≥ 7 → pick.

  Chosen: (1,3), (3,5), (5,7), (8,9) → 4 activities. Optimal.

Python Implementation

def activity_selection(starts: list[int], finishes: list[int]) -> list[int]:
    """Return indices of a maximum-size set of non-overlapping activities.
    starts[i], finishes[i] = start and finish time of activity i."""
    n = len(starts)
    # Sort by finish time; keep original index
    activities = [(finishes[i], starts[i], i) for i in range(n)]
    activities.sort()

    result = []
    last_finish = -1
    for f, s, idx in activities:
        if s >= last_finish:
            result.append(idx)
            last_finish = f
    return result

Line-by-Line Explanation

We sort by (finish, start, index) so that activities are processed in order of increasing finish time; we keep the original index to return.
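last_finish = -1: initially no activity is chosen. This sentinel assumes start times are non-negative, so the first activity always passes s >= last_finish; if start times can be negative, use float("-inf") instead.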
For each activity: if its start is at least last_finish, it doesn't overlap the last chosen one; we add it and update last_finish to this activity's finish.
We only need to compare with the last chosen activity (not all previous) because we sorted by finish—the last chosen has the maximum finish among all chosen, so compatibility with it implies compatibility with the whole set.

Time and Space Complexity

Time: O(n log n) — dominated by sorting. One pass over the sorted list is O(n).
Space: O(n) for the list of (finish, start, index) and the result. If we only need the count, we can avoid storing indices and use O(1) extra beyond the sort.

Edge Cases

Empty input: Return empty list (or count 0).
Single activity: Return that one (start ≥ last_finish with last_finish = −1).
All overlap: Only one activity can be chosen; the one that finishes first is selected.
No overlap: All activities are compatible; we take all (each has start ≥ previous finish after sorting).

Common Mistakes

Common Mistake: Sorting by start time instead of finish time. If you sort by start, you might pick a long activity that starts early and blocks many shorter ones that could all fit. The correct greedy choice is "earliest finish first."
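
A small instance makes the difference concrete. The sketch below runs the same scan with two different sort keys; the scan helper and the meetings list are made up for illustration.

```python
def scan(intervals, key):
    """Sort by the given key, then keep each interval that starts at or after the last kept finish."""
    chosen = []
    last_finish = float("-inf")
    for s, f in sorted(intervals, key=key):
        if s >= last_finish:
            chosen.append((s, f))
            last_finish = f
    return chosen


meetings = [(0, 10), (1, 2), (3, 4), (5, 6)]
by_start = scan(meetings, key=lambda iv: iv[0])
by_finish = scan(meetings, key=lambda iv: iv[1])

print(by_start)   # [(0, 10)]
print(by_finish)  # [(1, 2), (3, 4), (5, 6)]
```

Sorting by start picks the long meeting (0, 10) first and blocks everything else; sorting by finish keeps three meetings.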

Common Mistake: Ignoring whether intervals are half-open [start, end) or closed [start, end]. For half-open intervals, or whenever the problem allows touching (one ends at 3, the next starts at 3), the compatibility test is s >= last_finish. For closed intervals, where both endpoints are occupied, touching counts as overlap and the test must be the strict s > last_finish. Check the problem statement before choosing the comparison.

Interview Insight: "Activity Selection: sort by finish time, then take each activity if its start ≥ last chosen finish. O(n log n). Same idea for 'maximum non-overlapping intervals' or 'maximum events you can attend.'"

Practice Problems

LeetCode 435 — Non-overlapping Intervals (min intervals to remove so rest are non-overlapping: equivalent to max intervals to keep).
LeetCode 452 — Minimum Number of Arrows to Burst Balloons (interval overlap variant).
Maximum number of meetings in one room (same as Activity Selection).

Summary

Problem: Maximum number of non-overlapping activities (each has start, finish).
Greedy: Sort by finish time; repeatedly pick the next activity with start ≥ last chosen finish.
Why optimal: Earliest-finish-first leaves the resource free as early as possible; there is always an optimal solution that includes the greedy choice.
Time: O(n log n); space: O(n).

17.2 Fractional Knapsack

Introduction

The Fractional Knapsack problem: given n items, each with a weight and a value, and a knapsack of capacity W, you may take any fraction of each item (e.g. half of an item). The goal is to maximize the total value in the knapsack without exceeding capacity. Because fractions are allowed, a greedy strategy is optimal: sort items by value per unit weight (value/weight) in descending order, then repeatedly take as much as possible of the next item until the knapsack is full. This contrasts with the 0/1 Knapsack (Section 15.6), where each item is taken whole or not at all; that problem requires DP, whereas Fractional Knapsack is solved by greedily choosing the "most valuable per pound" first.

Real-World Analogy

Imagine you can fill a bag with grains: rice, gold dust, and sand. Each has a value per kilogram. You want to maximize the total value in the bag. The best strategy is to take as much as you can of the most valuable-per-kg material first (e.g. gold), then the next (e.g. rice), and only use the least valuable (sand) to top off the bag. That's exactly the fractional knapsack greedy: value per unit weight decides the order; you take whole items until capacity runs out, then a fraction of the next item if needed.

Formal Definition

Concept Note: Input: n items; item i has weight w[i] and value v[i]; knapsack capacity W. We may take a fraction x[i] of item i (0 ≤ x[i] ≤ 1), contributing weight x[i]*w[i] and value x[i]*v[i]. Constraint: Total weight ≤ W. Goal: Maximize total value. Greedy: Sort items by v[i]/w[i] (value per unit weight) descending. In that order, take each item fully if capacity allows; otherwise take the fraction that fills the remaining capacity.

Why This Topic Matters

Greedy vs DP: Fractional Knapsack is the classic example where greedy works because we can "take a fraction"; 0/1 Knapsack cannot be solved by this greedy (counterexample: one heavy high-value item vs many light medium-value items).
Interview: Often asked as "maximum value you can get with capacity W if you can take fractions of items."
Key idea: When the choice is continuous (fractions), "value per unit" is the right ordering; when the choice is discrete (whole items only), we need DP.

Mental Model

Value density: value/weight = how much value we get per unit capacity. We want to fill the knapsack with the "densest" value first.
Greedy order: Process items from highest value/weight to lowest. For each item, take min(remaining capacity, item weight)—i.e. take the whole item or a fraction that fills the rest.
Why optimal: If an optimal solution used less of a high-density item and more of a lower-density item, we could swap: replace some of the lower-density with high-density and get more value for the same weight. So an optimal solution always uses items in order of decreasing value/weight until capacity is used.

Step-by-Step Algorithm

Compute value per unit weight for each item: ratio[i] = v[i] / w[i]. Handle w[i] == 0 (infinite ratio: take that item fully; or skip if no weight).
Sort items by ratio descending (highest value-per-weight first).
Initialize: remaining = W, total_value = 0.
For each item in sorted order: let take = min(remaining, w[i]); add take * (v[i]/w[i]) to total_value; subtract take from remaining. If remaining == 0, break.
Return total_value (and optionally which fractions of which items were taken).

ASCII Diagram

  Items: (weight, value)  (2, 40), (3, 50), (5, 60)   Capacity W = 5
  Ratio (v/w): 40/2=20, 50/3≈16.67, 60/5=12
  Sorted by ratio: (2,40), (3,50), (5,60)

  Take (2,40) fully: weight 2, value 40, remaining = 3.
  Take (3,50) fully: weight 3, value 50, remaining = 0. Stop.
  Total value = 40 + 50 = 90. (Item 3 not taken; we're full.)

Python Implementation

def fractional_knapsack(weights: list[float], values: list[float], capacity: float) -> float:
    """Maximize value with capacity W; fractions of items allowed. Returns max total value."""
    n = len(weights)
    total_value = 0.0
    # (ratio = value/weight, weight, value); sorted by ratio descending
    items = []
    for i in range(n):
        w, v = weights[i], values[i]
        if w <= 0:
            # Zero-weight item uses no capacity, so take it whole.
            # (Going through take * ratio would compute 0 * inf = nan.)
            total_value += v
            continue
        items.append((v / w, w, v))
    items.sort(key=lambda x: x[0], reverse=True)

    remaining = capacity
    for ratio, w, v in items:
        if remaining <= 0:
            break
        take = min(remaining, w)
        total_value += take * ratio   # take * (v/w) = value taken
        remaining -= take
    return total_value

Line-by-Line Explanation

ratio = v / w: value per unit weight. A zero-weight item has no finite ratio; it costs no capacity, so take it fully (or handle it however the problem specifies, e.g. skip).
sort(..., reverse=True): highest ratio first so we fill the knapsack with the most "valuable per kg" items first.
take = min(remaining, w): take the whole item or only the fraction that fits. take * ratio is the value we get (take * (v/w) = (take/w) * v).
We stop when remaining <= 0 (knapsack full) or when we've processed all items.

Time and Space Complexity

Time: O(n log n) — sorting by ratio. The scan is O(n).
Space: O(n) for the list of (ratio, w, v). In-place sorting of indices by ratio would also work and keep the same complexity.

Edge Cases

Capacity 0: Return 0 (we take nothing).
Weight 0: The item contributes no weight but has value, so taking it fully gives free value. Do not route it through take * ratio: with w = 0 the take is 0 and the ratio is infinite, and 0 × inf evaluates to nan in floating point. Add the item's full value directly as a special case (problem-dependent). Often problems assume w[i] > 0.
Total weight < W: We take all items; remaining stays positive; total value is sum of all values.

Common Mistakes

Common Mistake: Sorting by value or weight instead of by value/weight. A very valuable heavy item might have lower value-per-pound than a lighter, less valuable item; taking the heavy one first can block better choices. Always sort by ratio (value per unit weight).

Common Mistake: Using this greedy for 0/1 Knapsack (whole items only). In 0/1 you cannot take a fraction: each item is all or nothing. Greedy by value/weight can be wrong: e.g. W=4, items (3, 18), (2, 10), and (2, 10), with ratios 6, 5, and 5. Taking whole items in ratio order picks (3,18) first, leaving capacity 1, so nothing else fits → value 18. The 0/1 optimum takes both (2,10) items → value 20. (The fractional version would take (3,18) plus half of a (2,10) item → value 23.) So for 0/1 Knapsack use DP (Section 15.6), not this greedy.
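
The gap between ratio-greedy and the true 0/1 optimum can be reproduced on a tiny instance. This is a sketch: the instance and both function names are chosen here for illustration, weights are assumed positive, and the exhaustive search is only meant for very small inputs.

```python
from itertools import combinations


def greedy_01_by_ratio(items, capacity):
    """Take whole items in value/weight order whenever they still fit."""
    total = 0
    remaining = capacity
    for w, v in sorted(items, key=lambda it: it[1] / it[0], reverse=True):
        if w <= remaining:
            total += v
            remaining -= w
    return total


def best_01_brute_force(items, capacity):
    """Try every subset of whole items (exponential; small inputs only)."""
    best = 0
    for size in range(len(items) + 1):
        for subset in combinations(items, size):
            if sum(w for w, _ in subset) <= capacity:
                best = max(best, sum(v for _, v in subset))
    return best


items = [(3, 18), (2, 10), (2, 10)]  # (weight, value)
print(greedy_01_by_ratio(items, 4))   # 18
print(best_01_brute_force(items, 4))  # 20
```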

Expert Tip: If the problem asks for "which items and what fraction," store in the loop: for each item record (index, take_amount) and return (total_value, list of (index, fraction)). Fraction = take / w[i].
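
A sketch of that variant, assuming every weight is positive; fractional_knapsack_with_plan is a hypothetical name and the three-item instance is illustrative.

```python
def fractional_knapsack_with_plan(weights, values, capacity):
    """Return (total value, [(item index, fraction taken), ...]) in the order items were taken."""
    # Sort indices, not items, so the original index survives for the plan.
    order = sorted(range(len(weights)),
                   key=lambda i: values[i] / weights[i],
                   reverse=True)
    total_value = 0.0
    remaining = capacity
    plan = []
    for i in order:
        if remaining <= 0:
            break
        take = min(remaining, weights[i])
        fraction = take / weights[i]
        total_value += fraction * values[i]
        remaining -= take
        plan.append((i, fraction))
    return total_value, plan


total, plan = fractional_knapsack_with_plan([10, 20, 30], [60, 100, 120], 50)
print(round(total, 6))                     # 240.0
print([(i, round(f, 3)) for i, f in plan])  # [(0, 1.0), (1, 1.0), (2, 0.667)]
```

Items 0 and 1 are taken whole (value 160), and the remaining capacity of 20 takes two thirds of item 2 (value 80).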

Interview Insight: "Fractional Knapsack: sort by value/weight descending, then take each item fully or a fraction that fills remaining capacity. O(n log n). Only works when fractions are allowed; for 0/1 Knapsack use DP."

Practice Problems

Standard Fractional Knapsack (given weights, values, W; return max value).
Same but return the list of (item index, fraction taken).
Compare with 0/1 Knapsack: same instance, show that fractional optimal ≥ 0/1 optimal, and that greedy fractional can exceed 0/1 optimal.

Summary

Problem: Maximize value with capacity W; any fraction of each item allowed.
Greedy: Sort by value/weight (desc); take each item fully or the fraction that fills remaining capacity.
Why optimal: Value-per-weight ordering ensures we never prefer a lower-density item over a higher-density one; swapping would improve.
Time: O(n log n); space: O(n). Does not work for 0/1 Knapsack—use DP there.

17.3 Huffman Coding

Introduction

Huffman Coding is a greedy algorithm used for lossless data compression. Given symbols and their frequencies, it builds a binary prefix code so that frequent symbols get shorter codes and rare symbols get longer codes. The objective is to minimize the total number of bits needed to encode the data. Huffman coding is widely used in compression systems (for example, parts of ZIP, PNG, and JPEG pipelines).

Real-World Analogy

Imagine sending messages where some words appear very often ("the", "is") and others are rare. If every word had a fixed-length code, you waste bits on common words. A smarter strategy is to assign very short codes to common words and longer codes to rare words. Huffman coding does exactly this in an optimal way while ensuring the decoding process is unambiguous.

Formal Definition

Concept Note: Given symbols S = {s1, s2, ..., sk} with frequencies f1, f2, ..., fk, find a prefix-free binary code (no code is a prefix of another) minimizing: sum(fi * code_length(si)). Huffman's greedy solution repeatedly combines the two lowest-frequency nodes into one parent node until a single tree remains. Left edge is often labeled 0, right edge 1.

Why This Topic Matters

Core greedy proof pattern: "pick smallest two repeatedly" is a classic example of greedy choice plus optimal substructure.
Practical systems: Understanding Huffman helps you reason about real compression formats and encoding tradeoffs.
Interview-ready: Commonly appears with heaps/priority queues and tree construction questions.

Mental Model

Each symbol starts as a leaf with weight = frequency.
Always merge the two lightest trees first (greedy choice).
The merged node has weight equal to the sum of child weights.
Repeat until one tree remains; path from root to leaf is the symbol's code.
Frequent symbols tend to stay near the root, so they get shorter codes.

Step-by-Step Breakdown

Create a min-heap (priority queue) of nodes: each node is (frequency, symbol/tree).
While heap size > 1:
- Pop the two smallest nodes x and y.
- Create a parent node with frequency x.freq + y.freq.
- Set parent.left = x (bit 0), parent.right = y (bit 1).
- Push parent back into heap.
The remaining node is the root of the Huffman tree.
DFS from root to assign codes: append '0' on left, '1' on right.

ASCII Diagram

Example frequencies:
  A:5  B:9  C:12  D:13  E:16  F:45

Merge smallest repeatedly:
  5+9=14
  12+13=25
  14+16=30
  25+30=55
  45+55=100 (root)

One possible tree (0=left,1=right):

                 [100]
                /     \
             F:45     [55]
                     /    \
                   [25]   [30]
                  /   \   /   \
               C:12 D:13 [14] E:16
                         /  \
                       A:5  B:9

Codes (one valid assignment):
  F:0
  C:100
  D:101
  A:1100
  B:1101
  E:111
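
The code table above can be checked mechanically: it should be prefix-free, and its weighted length should beat a fixed-length encoding. This sketch hard-codes the table from the diagram; is_prefix_free is an illustrative helper name.

```python
freqs = {"A": 5, "B": 9, "C": 12, "D": 13, "E": 16, "F": 45}
codes = {"F": "0", "C": "100", "D": "101", "A": "1100", "B": "1101", "E": "111"}


def is_prefix_free(code_map):
    """After sorting, any code that is a prefix of another is a prefix of its successor."""
    words = sorted(code_map.values())
    return all(not words[i + 1].startswith(words[i])
               for i in range(len(words) - 1))


huffman_bits = sum(freqs[s] * len(codes[s]) for s in freqs)
fixed_bits = sum(freqs.values()) * 3  # 6 symbols need 3 bits each at fixed length

print(is_prefix_free(codes))  # True
print(huffman_bits)           # 224
print(fixed_bits)             # 300
```

The total of 224 bits also equals the sum of the internal node weights in the merge list (14 + 25 + 30 + 55 + 100).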

Python Implementation

import heapq
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    freq: int
    ch: Optional[str] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def huffman_codes(freq_map: dict[str, int]) -> dict[str, str]:
    """
    Build Huffman codes for symbols in freq_map.
    Returns mapping: symbol -> binary code.
    """
    heap = []
    uid = 0  # tie-breaker for stable heap ordering
    for ch, freq in freq_map.items():
        heapq.heappush(heap, (freq, uid, Node(freq=freq, ch=ch)))
        uid += 1

    if not heap:
        return {}

    # Special case: one symbol -> assign "0"
    if len(heap) == 1:
        only = heap[0][2]
        return {only.ch: "0"}

    while len(heap) > 1:
        f1, _, n1 = heapq.heappop(heap)
        f2, _, n2 = heapq.heappop(heap)
        parent = Node(freq=f1 + f2, left=n1, right=n2)
        heapq.heappush(heap, (parent.freq, uid, parent))
        uid += 1

    root = heap[0][2]
    codes = {}

    def dfs(node: Node, path: str) -> None:
        if node.ch is not None:
            codes[node.ch] = path
            return
        dfs(node.left, path + "0")
        dfs(node.right, path + "1")

    dfs(root, "")
    return codes

Line-by-Line Explanation

Use a min-heap so we can always extract the smallest remaining frequency in O(log k) time, where k is the number of heap entries; two pops give the two smallest nodes.
uid avoids comparison errors when frequencies tie (heap entries stay comparable).
Each merge creates an internal node with summed frequency.
After all merges, one root remains: that root encodes all symbols.
DFS builds codes from root to leaves; left adds '0', right adds '1'.
Single-symbol input is a special edge case: assign code "0" so encoding isn't empty.
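
Once a code table exists, encoding is a lookup and decoding is a left-to-right scan that emits a symbol as soon as the buffered bits match a code. This sketch uses the table from the ASCII diagram; encode and decode are illustrative names, and a real decoder would usually walk the tree instead of growing a string.

```python
codes = {"F": "0", "C": "100", "D": "101", "A": "1100", "B": "1101", "E": "111"}


def encode(text, code_map):
    return "".join(code_map[ch] for ch in text)


def decode(bits, code_map):
    """Works only because the code is prefix-free: the first match is always the right one."""
    reverse = {code: ch for ch, code in code_map.items()}
    out = []
    current = ""
    for bit in bits:
        current += bit
        if current in reverse:
            out.append(reverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a complete code")
    return "".join(out)


bits = encode("FACADE", codes)
print(bits)                 # 011001001100101111
print(decode(bits, codes))  # FACADE
```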

Brute Force → Better → Optimal

Brute Force (impractical)

Enumerate all prefix-free binary code assignments and pick the best weighted length. This search space is enormous and not feasible.

Better Intuition

We suspect low-frequency symbols should be deeper and high-frequency symbols shallower, but manually enforcing this for all symbols is still hard.

Optimal Greedy (Huffman)

Merge the two least frequent symbols/subtrees first, reduce the problem size by one, and repeat. Using a min-heap gives O(k log k), where k is the number of distinct symbols.
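As a small illustration of the merge loop, the sketch below computes only the total encoded length in bits, without building the tree. Each merge adds the combined frequency once, which corresponds to adding one bit to every symbol beneath that merge. The name huffman_total_bits is hypothetical, and the sketch assumes at least two symbols: with a single symbol it returns 0, whereas the implementation above assigns that symbol the one-bit code "0".

```python
import heapq

def huffman_total_bits(freqs: list[int]) -> int:
    """Total bits of an optimal prefix code for the given frequencies."""
    heap = list(freqs)          # copy so the caller's list is untouched
    heapq.heapify(heap)
    total = 0
    while len(heap) > 1:
        a = heapq.heappop(heap)  # smallest remaining weight
        b = heapq.heappop(heap)  # second smallest
        total += a + b           # every symbol under this merge gains one bit
        heapq.heappush(heap, a + b)
    return total

print(huffman_total_bits([5, 9, 12, 13, 16, 45]))  # 224
```

The same loop shape appears in other "repeatedly combine the two cheapest items" problems, so it is worth recognizing on its own.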

Time Complexity

Building heap: O(k) to heapify or O(k log k) with repeated push (both acceptable).
Merges: k−1 merges, each with pop/pop/push = O(log k), total O(k log k).
DFS code generation: O(k) nodes/leaves traversal.
Overall: O(k log k).

Space Complexity

Heap stores up to O(k) nodes.
Tree has O(k) leaves and O(k) internal nodes, so O(k) total auxiliary space.
Code map stores one code per symbol: O(k).

Edge Cases

Empty frequency map: return empty code map.
One symbol: assign "0" (or "1") by convention.
Tied frequencies: multiple valid Huffman trees/codes may exist; total encoded length remains optimal.

Common Mistakes

Common Mistake: Forgetting prefix-free requirement. If one code is prefix of another, decoding becomes ambiguous.
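One way to guard against this mistake is to test a code table directly. The sketch below is a hypothetical helper (is_prefix_free is not part of the implementation above) that relies on a simple fact: after sorting the codes lexicographically, any code that is a prefix of another is also a prefix of its immediate successor.

```python
def is_prefix_free(codes: dict[str, str]) -> bool:
    """Return True if no code is a prefix of another code."""
    ordered = sorted(codes.values())
    # Adjacent comparison is enough once the codes are sorted.
    for shorter, longer in zip(ordered, ordered[1:]):
        if longer.startswith(shorter):
            return False
    return True

print(is_prefix_free({"a": "0", "b": "10", "c": "11"}))  # True
print(is_prefix_free({"a": "0", "b": "01"}))             # False
```

Note that two identical codes also fail this check, which is the desired behavior for a decodable table.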

Common Mistake: Using a max-heap or sorting descending and merging largest frequencies first. That increases average code length.

Common Mistake: Ignoring equal-frequency tie handling in Python heap tuples, causing comparison errors between custom objects.

Optimization Insight: If frequencies are already sorted and static, a two-queue linear-time merge strategy can build the tree in O(k) after sorting. In practice, min-heap O(k log k) is simpler and standard.
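A minimal sketch of the two-queue idea, assuming the frequencies arrive sorted in ascending order. It computes only the total encoded length; the function name and the choice to return the cost rather than a tree are illustrative assumptions. Merged weights are produced in non-decreasing order, so a plain queue keeps them sorted and no heap is needed.

```python
from collections import deque

def huffman_cost_two_queues(sorted_freqs: list[int]) -> int:
    """Total encoded bits, given frequencies sorted ascending."""
    leaves = deque(sorted_freqs)  # original weights, already sorted
    merged = deque()              # merged weights, appended in sorted order

    def pop_min() -> int:
        # Take the smaller front element of the two queues.
        if not merged or (leaves and leaves[0] <= merged[0]):
            return leaves.popleft()
        return merged.popleft()

    total = 0
    while len(leaves) + len(merged) > 1:
        a = pop_min()
        b = pop_min()
        total += a + b
        merged.append(a + b)
    return total

print(huffman_cost_two_queues([5, 9, 12, 13, 16, 45]))  # 224
```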

Interview Insight: "Huffman coding: push all symbol frequencies into a min-heap, repeatedly merge the two smallest nodes, then DFS for 0/1 codes. This greedy is optimal for minimum weighted path length in a prefix code. Complexity O(k log k)."

Pattern Recognition

If a problem says "combine two minimum-cost elements repeatedly" and each merge contributes to future cost, think min-heap greedy. Huffman is the canonical version of this pattern.

Practice Problems

Given symbol frequencies, build Huffman codes and compute total encoded bits.
Given a Huffman tree, output all symbol codes.
Given text, build frequency map, compress with Huffman codes, and decode back.

Summary

Goal: Minimum weighted prefix-free binary encoding.
Greedy rule: Always merge the two least frequent nodes.
Data structure: Min-heap for efficient repeated smallest extraction.
Result: Frequent symbols get shorter codes, rare symbols longer codes.
Complexity: O(k log k) time, O(k) space.

17.4 Job Sequencing

Introduction

Job Sequencing with Deadlines is a classic greedy scheduling problem. Each job takes exactly one unit of time, has a deadline (latest time slot by which it must be finished), and gives a profit if completed by deadline. We can do at most one job per time slot. The goal is to choose and schedule jobs to maximize total profit.

The key greedy strategy is: sort jobs by profit descending, and for each job place it in the latest available slot not exceeding its deadline. This preserves earlier slots for other jobs and leads to optimal profit.

Real-World Analogy

Imagine you are a freelancer with limited daily slots and many client tasks. Each task pays differently and has a latest acceptable submission day. You want maximum earnings. Intuitively, you pick the highest-paying task first. But you schedule it as late as possible before its deadline so earlier slots remain open for other tasks. That is exactly the job sequencing greedy rule.

Formal Definition

Concept Note: Input is a list of jobs, each as (id, deadline, profit). Every job requires one unit time. Time slots are usually 1..D, where D = max(deadline). A job with deadline d must be scheduled in some slot t ≤ d. Objective: maximize total profit. Greedy algorithm:

Sort jobs by decreasing profit.
For each job, assign it to the latest free slot ≤ its deadline.
If no such slot exists, skip the job.

Why This Topic Matters

Interview classic: Frequently asked in greedy sections, often with "return max jobs and max profit".
Greedy intuition: Shows how local choices (highest profit first) combine with careful placement (latest valid slot).
Foundation for scheduling: Builds intuition for interval scheduling and deadline-based planning problems.

Mental Model

Think of deadlines as limited parking spaces in time.
High-profit jobs are "more valuable cars" that should get priority.
Park each chosen job in the rightmost spot it can legally occupy.
Rightmost placement keeps left spots available for jobs with earlier deadlines.

Brute Force → Better → Optimal

Brute Force

Try all subsets of jobs, and for each subset check if it can be scheduled before deadlines, then compute profit. This is exponential (O(2^n * n log n) or worse), not feasible for large n.
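For small inputs, a brute-force checker is handy for validating a greedy implementation. The sketch below assumes jobs are given as (deadline, profit) pairs, which is a simplification of the (id, deadline, profit) form used in this topic. A subset is feasible exactly when, after sorting its deadlines, the i-th job (1-based) has deadline at least i.

```python
from itertools import combinations

def brute_force_max_profit(jobs: list[tuple[int, int]]) -> int:
    """Best total profit over all feasible subsets of (deadline, profit) jobs."""
    best = 0
    for r in range(1, len(jobs) + 1):
        for subset in combinations(jobs, r):
            deadlines = sorted(d for d, _ in subset)
            # Earliest-deadline-first order: the job in slot i needs deadline >= i.
            if all(d >= slot for slot, d in enumerate(deadlines, start=1)):
                best = max(best, sum(p for _, p in subset))
    return best

jobs = [(2, 100), (1, 19), (2, 27), (1, 25), (3, 15)]
print(brute_force_max_profit(jobs))  # 142
```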

Better

Sort by profit and for each job linearly scan backward for a free slot. This is the classic greedy implementation: O(n log n + n * D), where D is max deadline (or O(n^2) in worst case when D ~ n).

Optimal (with DSU / Disjoint Set)

Use union-find to locate the latest free slot ≤ deadline in near-constant amortized time instead of scanning. The slot arrays still take O(D) to initialize, so the total is O(n log n + D), typically written O(n log n) when D is O(n). This is preferred over the backward scan when deadlines are large.

Step-by-Step Greedy (Simple Slot Array)

Sort all jobs by profit descending.
Compute max_deadline.
Create slots array of size max_deadline + 1 initialized empty.
For each job, scan from min(deadline, max_deadline) down to 1.
If a free slot is found, place job there and add its profit.
Return selected job sequence (or count) and total profit.

ASCII Diagram

Jobs: (id, deadline, profit)
  J1(2,100), J2(1,19), J3(2,27), J4(1,25), J5(3,15)

Sort by profit:
  J1(2,100), J3(2,27), J4(1,25), J2(1,19), J5(3,15)

Slots: [1] [2] [3]
Start empty:  _   _   _

Place J1 at latest <=2  -> slot2
  _  J1  _
Place J3 at latest <=2  -> slot1 (slot2 filled)
 J3  J1  _
J4 deadline1 -> slot1 occupied, skip
J2 deadline1 -> slot1 occupied, skip
Place J5 at latest <=3 -> slot3
 J3  J1  J5

Total profit = 27 + 100 + 15 = 142

Python Implementation (Simple Greedy)

from dataclasses import dataclass

@dataclass
class Job:
    job_id: str
    deadline: int
    profit: int

def job_sequencing(jobs: list[Job]) -> tuple[list[str], int]:
    """
    Returns (scheduled_job_ids_in_slot_order, max_profit).
    Each job takes 1 unit time and must be done by its deadline.
    """
    if not jobs:
        return ([], 0)

    jobs.sort(key=lambda j: j.profit, reverse=True)
    max_deadline = max(job.deadline for job in jobs)
    slots = [None] * (max_deadline + 1)  # index 0 unused

    total_profit = 0
    for job in jobs:
        for t in range(min(job.deadline, max_deadline), 0, -1):
            if slots[t] is None:
                slots[t] = job
                total_profit += job.profit
                break

    schedule = [slots[t].job_id for t in range(1, max_deadline + 1) if slots[t] is not None]
    return (schedule, total_profit)

Line-by-Line Explanation

Sort by highest profit first so most valuable jobs get first chance.
Slots represent time 1..max_deadline; one job per slot.
For each job, scan backward from its deadline to find the latest free valid slot.
When scheduled, add profit once and move to next job.
Backward scan is crucial; forward scan can block future jobs with tighter deadlines.

Optimization Insight (DSU Approach)

Optimization Insight: To avoid backward scanning for every job, use DSU (union-find). Let parent[t] point to the latest available slot ≤ t. When slot s is occupied, union it with s-1, meaning next query for s should jump directly to the next free candidate. This reduces scheduling lookup to near-constant time.

Python Implementation (DSU Optimized)

def job_sequencing_dsu(jobs: list[Job]) -> tuple[list[str], int]:
    if not jobs:
        return ([], 0)

    jobs.sort(key=lambda j: j.profit, reverse=True)
    max_deadline = max(job.deadline for job in jobs)
    parent = list(range(max_deadline + 1))  # parent[t] = best available slot <= t
    slot_job = [None] * (max_deadline + 1)

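    def find(x: int) -> int:
        # Iterative find with path compression; a recursive version can
        # exceed Python's recursion limit on long chains of used slots.
        root = x
        while parent[root] != root:
            root = parent[root]
        while parent[x] != root:
            nxt = parent[x]
            parent[x] = root
            x = nxt
        return root

    total_profit = 0
    for job in jobs:
        if job.deadline < 1:
            continue  # unschedulable; also avoids negative list indexing
        s = find(min(job.deadline, max_deadline))
        if s > 0:
            slot_job[s] = job
            total_profit += job.profit
            parent[s] = find(s - 1)  # mark s as used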

    schedule = [slot_job[t].job_id for t in range(1, max_deadline + 1) if slot_job[t] is not None]
    return (schedule, total_profit)

Time Complexity

Simple greedy: Sorting O(n log n), placement O(n * D) in worst case, where D=max deadline.
If D ~ n: often simplified as O(n^2).
DSU optimized: Sorting O(n log n), O(D) to initialize the slot arrays, each find near-constant amortized; total O(n log n + D), which is O(n log n) when D is O(n).

Space Complexity

Simple version: O(D) for slots.
DSU version: O(D) for parent + slot tracking arrays.
Additional sorting overhead depends on language implementation, typically O(n).

Edge Cases

No jobs: answer is empty schedule and profit 0.
Deadlines <= 0: such jobs are unschedulable in 1-based slot model; skip or prefilter.
Same deadlines/profits: multiple optimal schedules may exist with same profit.
Very large deadlines: if max deadline is huge but n is small, coordinate-compress deadlines or cap relevant slots to n.
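The capping idea can be sketched as follows, assuming jobs are given as (deadline, profit) pairs (a simplification of the Job class above; the function name is hypothetical). At most n jobs can ever be scheduled, and any feasible set fits into slots 1..n, so treating every deadline as min(deadline, n) does not change the answer while keeping the slot array at size n + 1.

```python
def max_profit_capped(jobs: list[tuple[int, int]]) -> int:
    """Greedy job sequencing with deadlines capped at the number of jobs."""
    n = len(jobs)
    used = [False] * (n + 1)  # index 0 unused; slots 1..n are enough
    total = 0
    for deadline, profit in sorted(jobs, key=lambda j: j[1], reverse=True):
        # Scan backward from the capped deadline for the latest free slot.
        for t in range(min(deadline, n), 0, -1):
            if not used[t]:
                used[t] = True
                total += profit
                break
    return total

print(max_profit_capped([(10**9, 5), (1, 3)]))  # 8, with only 3 slot entries allocated
```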

Common Mistakes

Common Mistake: Placing each selected job at the earliest free slot instead of latest free slot. That can block jobs with tighter deadlines and reduce total profit.
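A tiny counterexample makes the difference concrete. The sketch below assumes a non-empty list of (deadline, profit) pairs with positive deadlines; the latest flag and the function name are illustrative, not part of the implementations above.

```python
def greedy_profit(jobs: list[tuple[int, int]], latest: bool = True) -> int:
    """Profit-first greedy; places each job in the latest or earliest free slot."""
    max_d = max(d for d, _ in jobs)
    used = [False] * (max_d + 1)  # index 0 unused
    total = 0
    for d, p in sorted(jobs, key=lambda j: j[1], reverse=True):
        order = range(d, 0, -1) if latest else range(1, d + 1)
        for t in order:
            if not used[t]:
                used[t] = True
                total += p
                break
    return total

jobs = [(2, 100), (1, 50)]
print(greedy_profit(jobs, latest=True))   # 150: the 100 job takes slot 2, leaving slot 1 free
print(greedy_profit(jobs, latest=False))  # 100: the 100 job takes slot 1 and blocks the 50 job
```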

Common Mistake: Sorting by deadline first. For this objective (maximize profit), primary sort should be by profit descending.

Common Mistake: Confusing this with interval scheduling where jobs have arbitrary durations. Here each job takes exactly one unit time.

Interview Insight: "Job sequencing with deadlines: sort jobs by profit descending, then assign each to the latest free slot <= deadline. Simple implementation uses backward slot scan; optimized version uses DSU to find slots fast."

Pattern Recognition

When you see tasks with unit duration, deadlines, and profits where objective is maximize total profit, think of this exact greedy pattern: highest profit first + latest valid slot placement.

Practice Problems

Standard Job Sequencing Problem (maximize count and profit).
Return scheduled job IDs in slot order and total profit.
Implement both backward-scan and DSU versions; compare runtime on large random tests.

Summary

Goal: Maximize profit under deadlines with unit-time jobs.
Greedy choice: Process jobs by highest profit first.
Placement rule: Put each chosen job in the latest available slot <= deadline.
Complexity: O(n log n + nD) simple, O(n log n) with DSU optimization.

17.5 Interval Merging

Introduction

Interval Merging means combining overlapping intervals into disjoint intervals that cover the same ranges. Given intervals like [1,3] and [2,6], they overlap, so they merge into [1,6]. This is a core greedy pattern used in scheduling, calendar systems, event conflict detection, and range normalization.

Real-World Analogy

Think of booked times on a calendar. If one meeting is 10:00–11:00 and another is 10:30–12:00, you can summarize occupied time as 10:00–12:00. If a third meeting is 13:00–14:00, that remains separate. Merging intervals creates a clean "occupied blocks" view.

Formal Definition

Concept Note: Input: list of intervals [start, end] with start <= end. Output: a list of non-overlapping intervals where:

Union of output intervals equals union of input intervals.
No two output intervals overlap.

Standard greedy approach:

Sort intervals by start time.
Scan from left to right, merging into the last output interval when overlap exists.

Why This Topic Matters

Interview frequent: LeetCode 56 (Merge Intervals) is one of the most common interval problems.
Foundational pattern: Sorting + linear scan appears in many interval tasks (insert interval, erase overlap, meeting rooms).
Data cleaning: Practical when normalizing time ranges, IP ranges, or numeric segments.

Mental Model

After sorting by start, any future overlap can only happen with the most recently merged interval.
Keep a result list; compare current interval with result[-1].
If overlap: extend end boundary.
If no overlap: start a new merged block.

Step-by-Step Breakdown

If input is empty, return empty list.
Sort intervals by start ascending (tie-break by end ascending is fine).
Initialize result with first interval.
For each next interval [s, e]:
- If s <= last_end, overlap exists; update last_end = max(last_end, e).
- Else, append [s, e] as a new interval.
Return result.

ASCII Diagram

Input:
  [1,3]  [2,6]  [8,10]  [15,18]

Sorted (already sorted):
  [1,3], [2,6], [8,10], [15,18]

Scan:
  result = [1,3]
  [2,6] overlaps with [1,3] -> merge -> [1,6]
  [8,10] no overlap with [1,6] -> append
  [15,18] no overlap with [8,10] -> append

Output:
  [1,6], [8,10], [15,18]

Python Implementation

def merge_intervals(intervals: list[list[int]]) -> list[list[int]]:
    """
    Merge overlapping intervals.
    Intervals are inclusive ranges [start, end].
    """
    if not intervals:
        return []

    intervals.sort(key=lambda x: x[0])
    merged = [intervals[0][:]]  # copy first interval

    for s, e in intervals[1:]:
        last = merged[-1]
        if s <= last[1]:  # overlap
            last[1] = max(last[1], e)
        else:
            merged.append([s, e])

    return merged

Line-by-Line Explanation

Sort by start so potential overlaps become adjacent.
merged stores normalized disjoint intervals built so far.
Only compare current interval with merged[-1], not all previous intervals.
Overlap condition s <= last_end merges touching/inclusive intervals.
Non-overlap starts a new block in output.

Brute Force → Better → Optimal

Brute Force

Repeatedly compare every pair, merge overlaps, restart until stable. This can devolve into O(n^2) or worse.

Better

Sort first, then only compare neighbors conceptually during one pass. This avoids repeated global scans.

Optimal (comparison model)

O(n log n) due to sorting lower bound. After sorting, the merge scan is O(n). Total O(n log n) is optimal for arbitrary unsorted input.

Time Complexity

Sorting: O(n log n)
Single pass merge: O(n)
Overall: O(n log n)

Space Complexity

Output: O(n) in worst case (no intervals overlap).
Extra: O(1) beyond the output, apart from the sort itself. The sort's auxiliary space is language-dependent: O(log n) stack for in-place quicksort variants, up to O(n) for Python's Timsort.

Edge Cases

Empty input: return [].
Single interval: return it unchanged.
Fully nested: [1,10] and [2,3] -> keep [1,10].
Touching intervals: [1,2] and [2,3] merge if inclusive endpoints are intended.

Common Mistakes

Common Mistake: Forgetting to sort first. Without sorting, local comparisons are unreliable and intervals can be missed.

Common Mistake: Using wrong overlap rule. Decide whether touching intervals should merge: s <= last_end (inclusive merge) vs s < last_end (strict overlap only).
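To make the choice explicit rather than accidental, the rule can be a parameter. This is a sketch under that assumption; the merge_touching flag and the function name are hypothetical and not part of the implementation above.

```python
def merge_intervals_with_rule(
    intervals: list[list[int]], merge_touching: bool = True
) -> list[list[int]]:
    """Merge intervals; merge_touching decides whether [1,2] and [2,3] combine."""
    merged: list[list[int]] = []
    for s, e in sorted(intervals):  # sorted() leaves the input list untouched
        if merged:
            last_end = merged[-1][1]
            overlaps = s <= last_end if merge_touching else s < last_end
        else:
            overlaps = False
        if overlaps:
            merged[-1][1] = max(merged[-1][1], e)
        else:
            merged.append([s, e])   # fresh list, so inputs are never mutated
    return merged

print(merge_intervals_with_rule([[1, 2], [2, 3]]))                        # [[1, 3]]
print(merge_intervals_with_rule([[1, 2], [2, 3]], merge_touching=False))  # [[1, 2], [2, 3]]
```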

Common Mistake: Appending references directly and mutating original data unexpectedly. Copy intervals when needed.

Optimization Insight: If intervals are already sorted by start, skip sorting and merge in O(n). In streaming systems with ordered incoming ranges, this gives linear-time incremental merging.
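For ordered input, the merge can be written as a generator that holds only the current block in memory. The sketch below assumes intervals arrive as (start, end) tuples already sorted by start and uses the inclusive overlap rule; the function name is illustrative.

```python
from typing import Iterable, Iterator

def merge_sorted_stream(
    intervals: Iterable[tuple[int, int]],
) -> Iterator[tuple[int, int]]:
    """Yield merged blocks from intervals that are already sorted by start."""
    current = None
    for s, e in intervals:
        if current is None:
            current = (s, e)
        elif s <= current[1]:
            # Overlaps (or touches) the open block: extend its end.
            current = (current[0], max(current[1], e))
        else:
            yield current          # the open block can no longer grow
            current = (s, e)
    if current is not None:
        yield current

print(list(merge_sorted_stream([(1, 3), (2, 6), (8, 10), (15, 18)])))
# [(1, 6), (8, 10), (15, 18)]
```

A block is emitted only once a later interval starts beyond its end, so a consumer sees each merged block exactly once.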

Interview Insight: "Merge intervals: sort by start, initialize answer with first interval, then for each interval either extend the last merged interval if overlap exists, or append as new. O(n log n) time."

Pattern Recognition

Whenever input is intervals/ranges and you need normalized non-overlapping blocks, start with sort by start + linear scan. This pattern also appears in Insert Interval and many event timeline problems.

Practice Problems

LeetCode 56 — Merge Intervals.
LeetCode 57 — Insert Interval.
LeetCode 435 — Non-overlapping Intervals (related greedy variant).
LeetCode 452 — Minimum Number of Arrows to Burst Balloons.

Summary

Core idea: Sort intervals by start, then merge overlaps in one pass.
Overlap check: compare current start with last merged end.
Complexity: O(n log n) time, O(n) output space.
Key caution: define inclusive vs strict overlap correctly.

17.6 Gas Station Problem

Introduction

The Gas Station Problem asks: given two arrays gas and cost, where gas[i] is fuel available at station i and cost[i] is fuel needed to go from station i to i+1 (circularly), can we complete one full cycle? If yes, return a valid starting station index; otherwise return -1.

This is a famous greedy problem because a naive "try every start" approach is O(n^2), while the optimal greedy solution is O(n): one pass, constant extra space, and a clear correctness argument.

Real-World Analogy

Imagine driving around a circular highway with fuel pumps at checkpoints. At each checkpoint you get some fuel, then you spend fuel to reach the next checkpoint. If your tank becomes negative at some point, your chosen start clearly fails. The greedy insight is stronger: if start s fails at station t, then no station between s and t can be a valid start either.

Formal Definition

Concept Note: Input: arrays gas[0..n-1], cost[0..n-1]. Define net gain at i as delta[i] = gas[i] - cost[i]. We need an index start such that cumulative fuel from start around the full circle never drops below 0.

If sum(gas) < sum(cost), solution does not exist.
If sum(gas) >= sum(cost), at least one valid start exists. (LeetCode 134 additionally guarantees that the answer, when it exists, is unique.)

Why This Topic Matters

Interview staple: LeetCode 134 is one of the most common greedy interview questions.
Greedy proof skill: Teaches elimination arguments ("if this segment fails, all starts inside it fail").
Linear optimization: Great example of reducing O(n^2) brute force to O(n).

Mental Model

Track running tank while scanning from left to right.
If tank drops below 0 at i, current start cannot reach i+1.
Any start between current start and i also fails: the tank was non-negative on arrival at that station, so starting there with an empty tank leaves no more fuel for reaching i+1.
So move start to i+1 and reset local tank to 0.
Global feasibility is checked by total sum.

Brute Force → Better → Optimal

Brute Force

Try each station as start, simulate full cycle each time. Complexity O(n^2), too slow for large n.
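A sketch of the brute-force simulation, useful mainly as a reference when testing a faster solution. The function name is hypothetical; it assumes gas and cost have the same length and returns the smallest valid start index.

```python
def can_complete_circuit_brute(gas: list[int], cost: list[int]) -> int:
    """Try every start and simulate one full lap: O(n^2) time, O(1) space."""
    n = len(gas)
    for start in range(n):
        tank = 0
        for step in range(n):
            i = (start + step) % n   # wrap around the circle
            tank += gas[i] - cost[i]
            if tank < 0:
                break                # this start cannot reach station i+1
        else:
            return start             # loop finished without running dry
    return -1

print(can_complete_circuit_brute([1, 2, 3, 4, 5], [3, 4, 5, 1, 2]))  # 3
print(can_complete_circuit_brute([2, 3, 4], [3, 4, 3]))              # -1
```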

Better Intuition

Observe that failures eliminate whole ranges of starts, not just one index. We should skip impossible candidates in bulk.

Optimal Greedy

One pass with two accumulators: total += gas[i]-cost[i] for feasibility and tank += gas[i]-cost[i] for current candidate start. If tank < 0, reset start to i+1 and tank to 0. At end, if total < 0 return -1; else return start.

Step-by-Step Breakdown

Initialize total = 0, tank = 0, start = 0.
For each station i:
- delta = gas[i] - cost[i]
- total += delta, tank += delta
- If tank < 0, set start = i + 1 and tank = 0.
After loop: if total < 0, return -1 else return start.

ASCII Diagram

gas  = [1, 2, 3, 4, 5]
cost = [3, 4, 5, 1, 2]
delta= [-2,-2,-2,+3,+3]

i=0: tank=-2 -> fail, start=1, tank=0
i=1: tank=-2 -> fail, start=2, tank=0
i=2: tank=-2 -> fail, start=3, tank=0
i=3: tank=+3
i=4: tank=+6
total = 0 (feasible)

Answer: start = 3
Route from 3: tank never goes negative over full cycle.

Python Implementation

def can_complete_circuit(gas: list[int], cost: list[int]) -> int:
    """
    Return start index to complete circular route once, or -1 if impossible.
    """
    total = 0
    tank = 0
    start = 0

    for i in range(len(gas)):
        delta = gas[i] - cost[i]
        total += delta
        tank += delta

        # Current candidate start cannot reach i+1
        if tank < 0:
            start = i + 1
            tank = 0

    return start if total >= 0 else -1

Line-by-Line Explanation

total tracks global feasibility. If negative at end, impossible for all starts.
tank tracks fuel for current candidate start segment.
When tank < 0 at i, current start fails before i+1.
Set start = i+1: all indices from old start..i are discarded as impossible.
Reset tank = 0 and continue scan.
Final answer is candidate start only if total is non-negative.

Correctness Intuition

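Suppose we start at s with an empty tank and first fail at station i (the tank becomes negative after paying cost[i]). Then the cumulative sum of delta from s to i is negative, while every shorter prefix from s up to k-1 (for s < k ≤ i) is non-negative, because i is the first failure. Starting at k instead drops that non-negative prefix, so the sum from k to i is no larger than the sum from s to i and is also negative. None of those k can be valid starts, so skipping directly to i+1 is safe. For the final candidate, the tank never goes negative from start to n-1, and when total ≥ 0 the surplus carried past station n-1 is enough to cover the earlier discarded segments, so the wrap-around part of the trip also succeeds.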

Time Complexity

Single scan: O(n)
Each index processed once: O(n) overall

Space Complexity

Only a few scalar variables (total, tank, start)
Space: O(1)

Edge Cases

No feasible cycle: if total gas < total cost, answer is -1.
Single station: return 0 if gas[0] >= cost[0], else -1.
Multiple valid starts: some variants may allow multiple; this algorithm returns one valid candidate.
All zeros: total = 0, start 0 is valid.

Common Mistakes

Common Mistake: Forgetting global check total >= 0. Local resets alone are not enough to prove feasibility.

Common Mistake: Resetting start incorrectly (e.g., to i instead of i+1) after tank becomes negative.

Common Mistake: Assuming negative tank means "no solution". It only invalidates current candidate segment; another start may still work.

Optimization Insight: No extra data structures are required. This is already optimal O(n) time and O(1) space; focus on clear invariants and correctness argument during interviews.

Interview Insight: "Gas Station: maintain total and tank. If tank drops below zero at i, start cannot be in previous segment, so set start=i+1 and reset tank. At end, total>=0 implies start is valid; else return -1."

Pattern Recognition

When a circular traversal asks for a feasible start and failures invalidate contiguous ranges of starts, look for a greedy one-pass elimination strategy with a global feasibility check.

Practice Problems

LeetCode 134 — Gas Station.
Circular tour with petrol pumps (classic variant).
Adaptation: return all feasible starts (harder; requires additional reasoning).

Summary

Goal: Find start index to complete one circular trip, or -1.
Greedy: Reset start to i+1 whenever current tank becomes negative.
Feasibility: Total gas must be at least total cost.
Complexity: O(n) time, O(1) space.

Section 18: Computational Geometry

This section introduces computational geometry: solving algorithmic problems involving points, lines, vectors, polygons, and spatial relationships. You will learn core building blocks such as points and vectors, cross product, orientation tests, line intersection, polygon area, convex hull, and sweep line. The focus is on clean geometric intuition plus robust formulas that work in code.

18.1 Points & Vectors

Introduction

Points and vectors are the alphabet of computational geometry. Most geometry algorithms reduce to a few vector operations: subtraction, addition, scaling, dot product, and later cross product. If you deeply understand what a point is, what a vector is, and how to compute with them, topics like line intersection, convex hull, polygon area, and sweep line become much easier.

Real-World Analogy

A point is like a location pin on a map (where). A vector is like a movement instruction (how far and in which direction). For example, "start at (2,3)" is a point; "move by (+4, -1)" is a vector. Applying the vector to the point gives a new point: (6,2).

Formal Definition

Concept Note: In 2D geometry:

A point is a coordinate pair P = (x, y).
A vector is a directed displacement v = (vx, vy).
Vector from A to B is B - A = (Bx-Ax, By-Ay).
Point translation: P + v = (Px+vx, Py+vy).
Vector length: |v| = sqrt(vx^2 + vy^2).
Squared length (often preferred in code): |v|^2 = vx^2 + vy^2.

Why This Topic Matters

Foundation layer: Orientation, intersection, area, hulls all depend on point/vector operations.
Implementation reliability: Many bugs in geometry come from weak coordinate modeling and sign mistakes.
Interview readiness: Geometry questions often test whether you can convert picture intuition into vector math.

Mental Model

Points are positions; vectors are movements/directions.
You can subtract two points to get a vector.
You can add a vector to a point to get another point.
You can add/subtract vectors to combine movements.
Distance between points is length of their difference vector.

Step-by-Step Breakdown

Represent each point as (x, y).
To get direction from A to B, compute AB = B - A.
To move point P by vector v, compute P' = P + v.
To compare distances, prefer squared distance to avoid unnecessary square roots.
Use dot product to reason about projection/angle-type checks (perpendicular, acute, obtuse).

ASCII Diagram

Coordinate plane:

   y
   ^
 5 |
 4 |              * B(5,4)
 3 |
 2 |     * A(2,2)
 1 |
 0 +----------------------> x
   0  1  2  3  4  5  6

Vector AB = B - A = (5-2, 4-2) = (3,2)

Interpretation:
Start at A, move +3 in x and +2 in y to reach B.

Core Operations

1) Vector Addition and Subtraction

For vectors u=(ux,uy) and v=(vx,vy): u+v=(ux+vx, uy+vy), u-v=(ux-vx, uy-vy).

2) Scalar Multiplication

k*v = (k*vx, k*vy). This changes magnitude; sign of k can reverse direction.

3) Dot Product

u·v = ux*vx + uy*vy. Uses:

Length: |v|^2 = v·v
Orthogonal check: for non-zero u and v, u·v = 0 means they are perpendicular.
Angle type (non-zero vectors): dot > 0 acute, dot = 0 right, dot < 0 obtuse.
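The sign test can be sketched with plain integer tuples, so no trigonometry or floating point is involved. The function name angle_type is illustrative, and both vectors are assumed to be non-zero.

```python
def angle_type(u: tuple[int, int], v: tuple[int, int]) -> str:
    """Classify the angle between two non-zero vectors by the sign of the dot product."""
    d = u[0] * v[0] + u[1] * v[1]
    if d > 0:
        return "acute"
    if d < 0:
        return "obtuse"
    return "right"

print(angle_type((1, 0), (1, 1)))   # acute
print(angle_type((1, 0), (0, 1)))   # right
print(angle_type((1, 0), (-1, 1)))  # obtuse
```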

4) Distance Between Points

For points A and B, distance: dist(A,B)=sqrt((Bx-Ax)^2 + (By-Ay)^2). When an algorithm only compares distances, use squared distance to avoid floating-point rounding error and the cost of sqrt.
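Because sqrt is monotonic, ordering by squared distance gives the same result as ordering by distance. A minimal sketch, assuming points are integer (x, y) tuples and the list is non-empty; the function name is hypothetical.

```python
def nearest_to_origin(points: list[tuple[int, int]]) -> tuple[int, int]:
    """Return the point closest to (0, 0), comparing squared distances only."""
    return min(points, key=lambda p: p[0] * p[0] + p[1] * p[1])

print(nearest_to_origin([(3, 4), (1, 1), (-2, 0)]))  # (1, 1)
```

With integer inputs the comparison is exact, since no square root or float conversion ever happens.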

Python Implementation

from dataclasses import dataclass
import math

@dataclass(frozen=True)
class Point:
    x: float
    y: float

def vector(a: Point, b: Point) -> Point:
    """Return vector AB = B - A."""
    return Point(b.x - a.x, b.y - a.y)

def add(u: Point, v: Point) -> Point:
    return Point(u.x + v.x, u.y + v.y)

def sub(u: Point, v: Point) -> Point:
    return Point(u.x - v.x, u.y - v.y)

def scale(v: Point, k: float) -> Point:
    return Point(v.x * k, v.y * k)

def dot(u: Point, v: Point) -> float:
    return u.x * v.x + u.y * v.y

def norm2(v: Point) -> float:
    """Squared length."""
    return dot(v, v)

def norm(v: Point) -> float:
    """Length."""
    return math.sqrt(norm2(v))

def dist2(a: Point, b: Point) -> float:
    return norm2(vector(a, b))

def dist(a: Point, b: Point) -> float:
    return math.sqrt(dist2(a, b))

Line-by-Line Explanation

Point holds coordinates; immutable (frozen=True) avoids accidental mutation bugs.
vector(a,b) computes displacement from a to b by subtraction.
add/sub/scale implement fundamental vector algebra operations.
dot computes scalar product used for projections and angle checks.
norm2 uses dot(v,v), preferred when only comparing distances.
dist2 and dist are point-to-point distance helpers.

Brute Force → Better → Optimal Thinking

Brute Force Habit

Beginners often compute Euclidean distance with sqrt everywhere, even when only ordering/comparing is needed.

Better Practice

Compare squared distances to avoid sqrt and reduce floating error exposure.

Optimal Geometry Style

Use integer arithmetic where possible (especially with input integers), defer floating operations until final output, and structure all geometry around reusable vector primitives.

Time Complexity

Each primitive operation (add/sub/dot/norm2/vector) is O(1).
Distance with sqrt is O(1) but more expensive constant factor than squared distance.
Geometry algorithms built from these primitives depend on number of points/segments (covered in later topics).

Space Complexity

Each operation uses O(1) extra space.
Algorithm-level memory depends on problem (e.g. hull arrays, sweep structures).

Edge Cases

Coincident points: A = B gives zero vector and zero distance.
Large coordinates: squared terms may overflow in fixed-width integer languages; use 64-bit types.
Floating precision: equality comparisons on floats should use epsilon tolerance.
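A short sketch of tolerance-based comparison. The tolerance value 1e-9 and the helper name almost_equal are assumptions chosen for illustration; the right tolerance depends on the scale of your coordinates. Python's standard library also provides math.isclose, shown for comparison.

```python
import math

EPS = 1e-9  # assumed tolerance; tune to the scale of your coordinates

def almost_equal(a: float, b: float, eps: float = EPS) -> bool:
    """Absolute-tolerance comparison for floats."""
    return abs(a - b) <= eps

print(0.1 + 0.2 == 0.3)              # False: binary rounding error
print(almost_equal(0.1 + 0.2, 0.3))  # True
print(math.isclose(0.1 + 0.2, 0.3))  # True (relative tolerance by default)
```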

Common Mistakes

Common Mistake: Confusing point subtraction direction: B-A is vector from A to B, not vice versa.

Common Mistake: Overusing sqrt in comparisons. Prefer squared distances for speed and stability.

Common Mistake: Treating points and vectors as the same concept everywhere. They share coordinate representation but have different meaning.

Expert Tip: Create tiny helper functions/classes early. Clean primitives dramatically reduce bugs in later topics like cross product, orientation, and segment intersection.

Interview Insight: "Model points as (x,y), vector AB as B-A, distance via norm of difference, and use dot product for angle/projection reasoning. Keep squared distances when possible."

Pattern Recognition

If a geometry question involves movement, direction, distance, collinearity, turn direction, or projection, convert it to vector operations first. This algebraic translation is the key skill.

Practice Problems

Given two points A and B, compute vector AB, distance, and squared distance.
Given vectors u and v, determine if they are perpendicular using dot product.
Find the nearest point to origin from a list of points using squared distance.

Summary

Points represent location; vectors represent displacement.
Core operations: subtraction, addition, scaling, dot product, norm, distance.
Prefer squared distance for comparisons.
Strong point/vector fundamentals make all later computational geometry topics easier.

18.2 Cross Product

Introduction

The cross product is one of the most powerful tools in computational geometry. In 2D, it helps answer questions like:

Did we turn left or right?
Are three points clockwise, counterclockwise, or collinear?
What is the area of a triangle/parallelogram?

Even though "cross product" is a 3D vector operation in full linear algebra, in 2D geometry we use its scalar z-component. This scalar sign (+/−/0) is the heart of many geometry algorithms.

Real-World Analogy

Imagine walking from point A to B, then deciding where C lies relative to your direction. If C is to your left, that's a positive turn. If C is to your right, that's a negative turn. If C is straight ahead on the same line, turn is zero. Cross product gives this left/right/straight information instantly.

Formal Definition

Concept Note: For 2D vectors u = (ux, uy) and v = (vx, vy), define: cross(u, v) = ux*vy - uy*vx. This equals the z-component of the 3D cross product if vectors are embedded as (x, y, 0).

cross(u, v) > 0 : v is to the left of u (counterclockwise turn).
cross(u, v) < 0 : v is to the right of u (clockwise turn).
cross(u, v) = 0 : u and v are collinear.

Magnitude relation: |cross(u,v)| = |u||v||sin(theta)|, where theta is the angle between u and v; this equals the area of the parallelogram formed by u and v.

Why This Topic Matters

Orientation test basis: Next topic (18.3) directly uses cross signs.
Area computations: Triangle and polygon area formulas use cross products.
Core in advanced geometry: Convex hull, segment intersection, winding checks all rely on cross product.

Mental Model

Cross product is a signed area measurement.
Sign tells direction of turn (left/right).
Absolute value is the area of the parallelogram spanned by the two vectors (twice the triangle area).
Zero means no turn (collinear points/vectors).

Step-by-Step: Cross for Three Points

Given points A, B, C, define vectors: AB = B - A, AC = C - A. Then orientation value: cross(AB, AC).

Compute AB = (Bx-Ax, By-Ay).
Compute AC = (Cx-Ax, Cy-Ay).
Evaluate AB.x * AC.y - AB.y * AC.x.
Interpret sign:
- > 0: A→B→C is counterclockwise
- < 0: A→B→C is clockwise
- = 0: A, B, C are collinear

ASCII Diagram

Case 1: Left turn (positive cross)

      C
     *
    /
   /
A *------* B

AB points right, AC points up-right -> cross(AB, AC) > 0

Case 2: Right turn (negative cross)

A *------* B
    \
     \
      * C

AB points right, AC points down-right -> cross(AB, AC) < 0

Area Interpretation

For vectors u and v:

Parallelogram area = |cross(u, v)|
Triangle area (with sides u and v from same origin) = |cross(u, v)| / 2

For triangle ABC: area = |cross(B-A, C-A)| / 2.

Python Implementation

from dataclasses import dataclass

@dataclass(frozen=True)
class Point:
    x: float
    y: float

def cross(u: Point, v: Point) -> float:
    """2D cross product (scalar z-component)."""
    return u.x * v.y - u.y * v.x

def vector(a: Point, b: Point) -> Point:
    return Point(b.x - a.x, b.y - a.y)

def orientation(a: Point, b: Point, c: Point) -> float:
    """Positive: left turn, Negative: right turn, Zero: collinear."""
    return cross(vector(a, b), vector(a, c))

def triangle_area2(a: Point, b: Point, c: Point) -> float:
    """Twice the signed area of triangle ABC."""
    return orientation(a, b, c)

def triangle_area(a: Point, b: Point, c: Point) -> float:
    """Unsigned area of triangle ABC."""
    return abs(triangle_area2(a, b, c)) / 2.0

Line-by-Line Explanation

cross(u, v) computes the determinant of [u; v].
orientation(a,b,c) is cross of AB and AC, the standard turn test primitive.
triangle_area2 returns signed double area (very common in geometry to avoid division).
triangle_area takes absolute value and divides by 2 for actual area.

Brute Force → Better → Optimal Thinking

Brute Force Habit

Using trigonometry (angles, arccos, atan2) for left/right checks is slower and numerically fragile.

Better Practice

Use determinant/cross directly for orientation and area checks.

Optimal Geometry Style

Prefer integer arithmetic cross computations when coordinates are integers; avoid floating operations unless final answer requires decimals.
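To see why integer arithmetic is preferred, the sketch below evaluates the same orientation twice: once with Python ints and once after converting the coordinates to float. The helper name cross_at and the coordinates are assumptions picked so that the float conversion loses the low-order digits.

```python
def cross_at(a, b, c):
    """Cross product of AB and AC for points given as (x, y) tuples."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

big = 10**17
a = (0, 0)
b = (big, big + 1)
c = (big + 1, big + 2)

exact = cross_at(a, b, c)   # Python ints are arbitrary precision

# Same points as floats: big + 1 and big + 2 both round to 1e17.
fa, fb, fc = [(float(x), float(y)) for x, y in (a, b, c)]
rounded = cross_at(fa, fb, fc)

print(exact, rounded)
```

The integer version reports a clockwise turn (a value of -1), while the float version reports 0.0 and would wrongly classify the points as collinear.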

Time Complexity

Cross product of two vectors: O(1).
Orientation for three points: O(1).
Triangle area from cross: O(1).

Space Complexity

All primitive calculations use O(1) extra space.

Edge Cases

Collinear points: cross = 0 (or very close to 0 with floats).
Large coordinates: products may overflow in 32-bit ints; use 64-bit in C++/Java. Python ints are arbitrary precision, so integer inputs do not overflow.
Floating inputs: compare against epsilon, e.g., abs(val) < 1e-9.
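One possible way to apply the epsilon rule is a small sign helper, sketched below. The names sign, orientation_sign, and EPS are hypothetical, and the 1e-9 tolerance is simply the value quoted above; a suitable tolerance depends on the coordinate scale.

```python
EPS = 1e-9  # tolerance from the text; tune it to the coordinate scale

def sign(value, eps=EPS):
    """Return -1, 0, or +1, treating |value| < eps as zero."""
    if abs(value) < eps:
        return 0
    return 1 if value > 0 else -1

def orientation_sign(a, b, c, eps=EPS):
    """Sign of cross(AB, AC) for float points given as (x, y) tuples."""
    value = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return sign(value, eps)

# A point a hair above the x-axis is treated as collinear with it.
print(orientation_sign((0.0, 0.0), (1.0, 0.0), (0.5, 1e-12)))
```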

Common Mistakes

Common Mistake: Swapping vector order accidentally. cross(u,v) = -cross(v,u), so sign flips.

Common Mistake: Using points directly without forming vectors from a common origin in orientation checks.

Common Mistake: Treating near-zero floating result as exact zero without epsilon.

Expert Tip: In geometry code, define one reusable function like orient(a,b,c) early and use it everywhere (hull, intersection, polygon routines). Consistency prevents sign bugs.

Interview Insight: "For 2D vectors u and v, cross = ux*vy - uy*vx. Sign tells turn direction, abs gives area scale. Orientation(a,b,c) = cross(b-a, c-a). This primitive powers many geometry problems."

Pattern Recognition

If a question asks left/right turn, clockwise/counterclockwise order, collinearity, or area of triangle/ polygon pieces, cross product is usually the first tool to try.

Practice Problems

Given A, B, C, determine orientation (CW/CCW/collinear).
Compute area of triangle from three points using cross product.
Given a polyline of points, count how many left turns occur.

Summary

Formula: cross(u,v)=ux*vy-uy*vx.
Sign meaning: positive left turn, negative right turn, zero collinear.
Area meaning: |cross| = parallelogram area, |cross|/2 = triangle area.
Complexity: O(1) time and space for primitive operations.

18.3 Orientation Test

Introduction

The orientation test determines whether three points A, B, C make a left turn (counterclockwise), a right turn (clockwise), or are collinear. This tiny O(1) primitive is one of the most used checks in computational geometry and appears in convex hulls, segment intersection, polygon algorithms, and sweep-line methods.

Real-World Analogy

Imagine standing at A, walking to B, and then deciding where C lies relative to your facing direction. If C is to your left, it is a counterclockwise turn. If it is to your right, clockwise. If straight ahead on the same line, collinear. The orientation test computes this exactly from coordinates.

Formal Definition

Concept Note: For points A(ax, ay), B(bx, by), C(cx, cy), define: orient(A,B,C) = (bx-ax)*(cy-ay) - (by-ay)*(cx-ax). This is the 2D cross product of vectors AB and AC.

orient > 0 => counterclockwise (left turn)
orient < 0 => clockwise (right turn)
orient = 0 => collinear

Geometrically, |orient|/2 is area of triangle ABC.

Why This Topic Matters

Core geometry primitive: Many geometry algorithms reduce to repeated orientation checks.
Intersection logic: Segment intersection is mostly orientation comparisons + boundary checks.
Convex hull decisions: Hull construction repeatedly removes points that cause wrong turn direction.

Mental Model

Orientation is the signed "turn amount" from AB to AC.
Positive sign means C is to the left of directed line AB.
Negative sign means C is to the right.
Zero means A, B, C lie on one line.

Step-by-Step Breakdown

Given points A, B, C, compute vector AB and vector AC.
Compute determinant: (Bx-Ax)*(Cy-Ay) - (By-Ay)*(Cx-Ax).
Check sign:
- positive -> CCW
- negative -> CW
- zero -> collinear
Use this result as building block in larger algorithms.

ASCII Diagram

1) Counterclockwise (left turn)

      C
     *
    /
   /
A *------* B

orient(A,B,C) > 0

2) Clockwise (right turn)

A *------* B
    \
     \
      * C

orient(A,B,C) < 0

3) Collinear

A *-----*-----* C
        B

orient(A,B,C) = 0

Python Implementation

from dataclasses import dataclass

@dataclass(frozen=True)
class Point:
    x: int
    y: int

def orient(a: Point, b: Point, c: Point) -> int:
    """
    Returns:
      >0 for counterclockwise
      <0 for clockwise
       0 for collinear
    """
    return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x)

def orientation_label(a: Point, b: Point, c: Point) -> str:
    v = orient(a, b, c)
    if v > 0:
        return "CCW"
    if v < 0:
        return "CW"
    return "COLLINEAR"

Line-by-Line Explanation

orient computes determinant/cross of AB and AC.
No trigonometry is needed; only integer arithmetic for integer coordinates.
orientation_label maps numeric sign to readable category for debugging/interview output.

Application: Segment Intersection Skeleton

For segments AB and CD, compute: o1 = orient(A,B,C), o2 = orient(A,B,D), o3 = orient(C,D,A), o4 = orient(C,D,B). General intersection occurs when o1 and o2 have opposite signs and o3 and o4 have opposite signs. Collinear edge cases require on-segment checks.

Brute Force → Better → Optimal Thinking

Brute Force Habit

Using slope comparisons (dy/dx) can fail on vertical lines and introduces floating precision issues.

Better Practice

Replace slope logic with orientation determinant. No division required, robust for vertical/horizontal lines.
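The contrast can be sketched with a collinearity check on a vertical line. Both function names are hypothetical, and the slope version is deliberately naive to show the failure mode.

```python
def collinear_by_slope(a, b, c):
    """Slope comparison: divides by dx, so vertical lines raise an error."""
    return (b[1] - a[1]) / (b[0] - a[0]) == (c[1] - a[1]) / (c[0] - a[0])

def collinear_by_orient(a, b, c):
    """Orientation determinant: no division, works for any direction."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0]) == 0

# Three points on the vertical line x = 2.
a, b, c = (2, 0), (2, 3), (2, 7)

try:
    slope_result = collinear_by_slope(a, b, c)
except ZeroDivisionError:
    slope_result = "division by zero"

orient_result = collinear_by_orient(a, b, c)
print(slope_result, orient_result)
```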

Optimal Geometry Style

Build all geometry predicates from orient and avoid floating arithmetic unless unavoidable. This improves correctness and performance.

Time Complexity

Single orientation query: O(1).
Four orientation checks for segment intersection core: O(1).

Space Complexity

Orientation computation uses O(1) extra space.

Edge Cases

Duplicate points: if any two of A, B, C coincide, orientation is always 0 and needs problem-specific handling.
Collinearity: orientation=0 alone does not imply overlap/intersection; bounding-box checks may be needed.
Large coordinates: use 64-bit integer types in fixed-width languages to avoid overflow.
Floating coordinates: use epsilon threshold for near-zero values.

Common Mistakes

Common Mistake: Reversing argument order accidentally. orient(A,B,C) = -orient(A,C,B); sign flips and logic breaks.

Common Mistake: Using slope method and dividing by zero on vertical lines.

Common Mistake: Ignoring collinear edge handling in intersection or hull problems.

Optimization Insight: Since orientation is O(1), performance bottlenecks usually come from how often you call it (algorithm design), not from the formula itself. Cache repeated orientation calls in heavy loops if needed.

Interview Insight: "Orientation(A,B,C) via cross(B-A, C-A) tells left/right/collinear in O(1). Prefer it over slopes because it avoids division and precision pitfalls. It's the backbone of segment intersection and convex hull."

Pattern Recognition

If a geometry problem asks any of these: clockwise vs counterclockwise, turn direction, collinearity, or relative side of a directed line, immediately think orient(a,b,c).

Practice Problems

Given triples of points, classify each as CW, CCW, or collinear.
Implement segment intersection using orientation + on-segment checks.
Use orientation to filter non-left turns in a convex hull stack routine.

Summary

Orientation formula: (Bx-Ax)*(Cy-Ay) - (By-Ay)*(Cx-Ax).
Sign tells direction: + CCW, - CW, 0 collinear.
Complexity: O(1) time and O(1) space.
This primitive powers intersection tests, hull algorithms, and many geometry predicates.

18.4 Line Intersection

Introduction

Line Intersection problems ask whether two lines or line segments intersect, and sometimes where they intersect. In competitive programming and interviews, this usually means:

Do two segments AB and CD intersect?
If they intersect at one point, what is that point?
How do we handle collinear overlaps and touching endpoints?

The key tool is the orientation test from 18.3. With a few orientation checks plus boundary checks, we can robustly detect intersection in O(1).

Real-World Analogy

Think of two roads drawn on a map. They may cross at an interior point, just touch at one endpoint, run parallel forever, or overlap partially if they are on the same line. Segment intersection logic is exactly classifying these possibilities using coordinate math.

Formal Definition

Concept Note: For segments AB and CD:

General intersection occurs when C and D lie on opposite sides of line AB, and A and B lie on opposite sides of line CD.
Using orientation: o1 = orient(A,B,C), o2 = orient(A,B,D), o3 = orient(C,D,A), o4 = orient(C,D,B).
General case intersect if o1*o2 < 0 and o3*o4 < 0.
Special collinear cases require "point on segment" checks.

Why This Topic Matters

Geometry backbone: Segment intersection is used in polygon algorithms, clipping, collision detection, and sweep-line methods.
Interview frequent: Often appears directly or as a subroutine in harder geometry tasks.
Precision practice: Teaches careful handling of edge cases (collinear, endpoints, overlap).

Mental Model

Each segment defines a directed line and a side test (orientation sign).
If endpoints of one segment fall on different sides of the other line, they "straddle" it.
Mutual straddling implies proper intersection.
If orientations are zero, points are collinear and you must check interval overlap.

Step-by-Step Segment Intersection Test

Compute o1=orient(A,B,C), o2=orient(A,B,D), o3=orient(C,D,A), o4=orient(C,D,B).
If o1 and o2 have opposite signs and o3 and o4 have opposite signs, return intersect.
Handle special cases when any orientation is 0:
- If o1==0 and C is on segment AB -> intersect.
- If o2==0 and D is on segment AB -> intersect.
- If o3==0 and A is on segment CD -> intersect.
- If o4==0 and B is on segment CD -> intersect.
Else, no intersection.

ASCII Diagram

1) Proper intersection (crossing)

A *       * D
   \     /
    \   /
      X
    /   \
   /     \
C *       * B

Segments AB and CD cross at interior point X.

2) Touching at endpoint

A *-----* B
        |
        |
        * C

Segments AB and BC share endpoint B -> intersection exists.

3) Collinear overlap

A *---------* B
      C *---------* D

Same line, overlapping range -> intersection exists.

Python Implementation (Boolean Intersection)

from dataclasses import dataclass

@dataclass(frozen=True)
class Point:
    x: int
    y: int

def orient(a: Point, b: Point, c: Point) -> int:
    return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x)

def on_segment(a: Point, b: Point, p: Point) -> bool:
    """Assumes p is collinear with segment ab; checks bounding box inclusion."""
    return (min(a.x, b.x) <= p.x <= max(a.x, b.x) and
            min(a.y, b.y) <= p.y <= max(a.y, b.y))

def segments_intersect(a: Point, b: Point, c: Point, d: Point) -> bool:
    o1 = orient(a, b, c)
    o2 = orient(a, b, d)
    o3 = orient(c, d, a)
    o4 = orient(c, d, b)

    # General case: strict opposite signs
    if ((o1 > 0 and o2 < 0) or (o1 < 0 and o2 > 0)) and \
       ((o3 > 0 and o4 < 0) or (o3 < 0 and o4 > 0)):
        return True

    # Special collinear cases
    if o1 == 0 and on_segment(a, b, c):
        return True
    if o2 == 0 and on_segment(a, b, d):
        return True
    if o3 == 0 and on_segment(c, d, a):
        return True
    if o4 == 0 and on_segment(c, d, b):
        return True

    return False

Line-by-Line Explanation

orient gives relative side/turn information in O(1).
on_segment uses bounding box to verify collinear point lies within segment endpoints.
General case checks strict sign opposition for mutual straddling.
Special cases catch endpoint touching and collinear overlap.
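The collinearity precondition on on_segment matters: a bounding-box test alone accepts points that are nowhere near the segment. The sketch below illustrates this with tuple-based helpers; in_bounding_box and point_on_segment are hypothetical names, not functions from the implementation above.

```python
def orient(a, b, c):
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def in_bounding_box(a, b, p):
    """Bounding-box test only; says nothing about collinearity."""
    return (min(a[0], b[0]) <= p[0] <= max(a[0], b[0]) and
            min(a[1], b[1]) <= p[1] <= max(a[1], b[1]))

def point_on_segment(a, b, p):
    """Full test: p is collinear with ab AND inside its bounding box."""
    return orient(a, b, p) == 0 and in_bounding_box(a, b, p)

a, b = (0, 0), (4, 4)
off_line = (1, 3)   # inside the box of ab, but not on the diagonal
on_line = (2, 2)    # on the diagonal

print(in_bounding_box(a, b, off_line), point_on_segment(a, b, off_line))
print(in_bounding_box(a, b, on_line), point_on_segment(a, b, on_line))
```

This is why segments_intersect only calls on_segment after the matching orientation has already been found to be 0.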

Computing Exact Intersection Point (Non-Parallel Lines)

For infinite lines AB and CD, if they are not parallel, you can compute the intersection using determinants. With A = (x1, y1), B = (x2, y2), C = (x3, y3), D = (x4, y4), let:

den = (x1-x2)*(y3-y4) - (y1-y2)*(x3-x4)

If den == 0, lines are parallel (or coincident). Otherwise:

px = ((x1*y2 - y1*x2)*(x3-x4) - (x1-x2)*(x3*y4 - y3*x4)) / den
py = ((x1*y2 - y1*x2)*(y3-y4) - (y1-y2)*(x3*y4 - y3*x4)) / den

This gives line-line intersection point. For segment-segment intersection, additionally ensure point lies on both segments.
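A minimal sketch of these formulas for integer coordinates, taking A = (x1, y1), B = (x2, y2), C = (x3, y3), D = (x4, y4). The function name line_intersection is hypothetical, and using fractions.Fraction to keep the result exact is a choice made here, not something the text prescribes.

```python
from fractions import Fraction

def line_intersection(a, b, c, d):
    """
    Intersection of infinite lines AB and CD for integer (x, y) tuples.
    Returns (px, py) as Fractions, or None if the lines are parallel
    or coincident.
    """
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = a, b, c, d
    den = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    if den == 0:
        return None
    ab = x1 * y2 - y1 * x2   # determinant shared by both formulas (line AB)
    cd = x3 * y4 - y3 * x4   # determinant shared by both formulas (line CD)
    px = Fraction(ab * (x3 - x4) - (x1 - x2) * cd, den)
    py = Fraction(ab * (y3 - y4) - (y1 - y2) * cd, den)
    return px, py

# The two diagonals of the unit square meet at its center.
print(line_intersection((0, 0), (1, 1), (0, 1), (1, 0)))
```

For a segment-segment answer, the returned point would still need to be checked against both segments, as noted above.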

Brute Force → Better → Optimal Thinking

Brute Force Habit

Solve line equations with floating-point slopes/intercepts and compare ranges manually. This is error-prone for vertical lines and precision corner cases.

Better Practice

Use orientation signs for robust boolean intersection, then compute exact point only when necessary.

Optimal Predicate Style

Integer orientation + bounding checks gives fast, robust O(1) predicate suitable for high-volume geometry tasks.

Time Complexity

Constant number of orientation and min/max operations.
Overall: O(1) per segment pair query.

Space Complexity

Only fixed temporary variables.
Overall: O(1).

Edge Cases

Shared endpoint: counts as intersection in most definitions.
Collinear disjoint: all four orientations are 0, but no endpoint lies on the other segment, so there is no intersection.
Collinear overlapping: intersection exists as a segment (not unique point).
Parallel non-collinear lines: no intersection.

Common Mistakes

Common Mistake: Checking only general case and forgetting collinear endpoint/overlap cases.

Common Mistake: Using slope-based formulas that divide by zero for vertical lines.

Common Mistake: Comparing floating values directly without tolerance when coordinates are non-integers.

Optimization Insight: In algorithms needing many intersection tests (e.g., sweep line), keep orientation and bounding checks as branch-light helper functions. Predicate efficiency matters more than one-time formulas.

Interview Insight: "For segment intersection, compute four orientations. Opposite signs on both pairs => proper intersection. Then handle collinear on-segment cases. This O(1) predicate is standard and robust."

Pattern Recognition

If a problem involves collision/contact of line segments, polygon edge crossing, ray casting, or planar graph edge checks, line intersection predicate is a core building block.

Practice Problems

Implement boolean segment intersection for integer points.
Return exact intersection point for two non-parallel infinite lines.
Given many segments, count intersecting pairs (naive O(n^2), then optimize later with sweep line).

Summary

Use orient signs to test relative sides and mutual straddling.
Always include collinear on-segment edge cases.
Boolean segment intersection runs in O(1) time and O(1) space per query.
This predicate is foundational for advanced computational geometry algorithms.

18.5 Polygon Area

Introduction

The Polygon Area problem asks you to compute the area enclosed by a polygon given its vertices in order. The standard and most important method is the Shoelace Formula (also called Gauss's area formula), which runs in O(n) for n vertices and works for any simple polygon (convex or concave).

This topic is foundational in computational geometry because many advanced tasks (centroid, winding, clipping, lattice geometry) build on the same cross-product summation pattern.

Real-World Analogy

Imagine tracing a boundary on a map with GPS points in order. Instead of decomposing manually into many triangles and adding areas one by one, shoelace gives one clean loop: pair each point with the next point, compute a signed cross term, sum them, then divide by 2.

Formal Definition

Concept Note: For polygon vertices P0, P1, ..., P(n-1) in cyclic order, with Pi = (xi, yi), define: signed_area2 = sum(i=0..n-1) (xi * y(i+1) - yi * x(i+1)), where index i+1 wraps around modulo n.

Signed area: A_signed = signed_area2 / 2
Actual area: A = |A_signed|

Sign indicates orientation: counterclockwise ordering gives positive signed area, clockwise gives negative.

Why This Topic Matters

Interview frequent: "Compute area of polygon from vertices" is a common geometry prompt.
Cross-product pattern: Reinforces orientation/cross ideas from 18.2 and 18.3.
Algorithmic utility: Needed in polygon processing, GIS, graphics, and robotics mapping.

Mental Model

Area can be seen as sum of signed triangle/parallelogram contributions from each edge.
Each edge Pi -> P(i+1) contributes cross(Pi, P(i+1)) / 2.
Positive/negative contributions cancel correctly for concave shapes.
Absolute value at the end gives geometric area.
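To make the signed contributions concrete, the sketch below lists the per-edge terms for a concave polygon. The helper name shoelace_terms and the L-shaped polygon are illustrative assumptions.

```python
def shoelace_terms(points):
    """Per-edge cross terms xi*yj - yi*xj for (x, y) tuples in boundary order."""
    n = len(points)
    terms = []
    for i in range(n):
        xi, yi = points[i]
        xj, yj = points[(i + 1) % n]
        terms.append(xi * yj - yi * xj)
    return terms

# L-shaped (concave) polygon, counterclockwise:
# the square [1,5] x [1,5] with the corner [3,5] x [3,5] removed.
l_shape = [(1, 1), (5, 1), (5, 3), (3, 3), (3, 5), (1, 5)]

terms = shoelace_terms(l_shape)
area = abs(sum(terms)) / 2
print(terms, area)
```

Two of the six terms are negative, yet the total is 24, giving area 12, which matches 16 - 4 from the square-minus-corner view.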

Step-by-Step Shoelace Computation

Initialize area2 = 0 (this stores twice signed area).
For each vertex i, let j = (i+1) mod n.
Add term: xi * yj - yi * xj to area2.
After loop, signed area = area2 / 2.
Return abs(area2) / 2 for actual polygon area.

ASCII Diagram

Example polygon (rectangle):
P0(0,0), P1(4,0), P2(4,3), P3(0,3)

Shoelace table:
 i   (xi,yi)    (xj,yj)      xi*yj - yi*xj
 0   (0,0)   -> (4,0)        0*0 - 0*4 = 0
 1   (4,0)   -> (4,3)        4*3 - 0*4 = 12
 2   (4,3)   -> (0,3)        4*3 - 3*0 = 12
 3   (0,3)   -> (0,0)        0*0 - 3*0 = 0

area2 = 24
Area = |24| / 2 = 12

Python Implementation

from dataclasses import dataclass

@dataclass(frozen=True)
class Point:
    x: float
    y: float

def polygon_area(points: list[Point]) -> float:
    """
    Returns absolute area of a simple polygon.
    Points must be given in boundary order (clockwise or counterclockwise).
    """
    n = len(points)
    if n < 3:
        return 0.0

    area2 = 0.0
    for i in range(n):
        j = (i + 1) % n
        area2 += points[i].x * points[j].y - points[i].y * points[j].x

    return abs(area2) / 2.0

def signed_polygon_area(points: list[Point]) -> float:
    """
    Positive for counterclockwise ordering, negative for clockwise.
    """
    n = len(points)
    if n < 3:
        return 0.0

    area2 = 0.0
    for i in range(n):
        j = (i + 1) % n
        area2 += points[i].x * points[j].y - points[i].y * points[j].x

    return area2 / 2.0

Line-by-Line Explanation

n < 3 returns 0 because fewer than 3 points cannot enclose positive area.
j = (i + 1) % n wraps last vertex back to first, closing polygon.
Each loop adds one shoelace cross term.
polygon_area uses absolute value to return geometric area.
signed_polygon_area preserves orientation information (useful in many algorithms).

Brute Force → Better → Optimal

Brute Force

Triangulate manually from a fixed point and sum triangle areas. Works but can be more error-prone and verbose.

Better

Use cross products edge-by-edge to accumulate signed contributions.

Optimal (for ordered vertices)

Shoelace formula gives O(n) time and O(1) extra space, which is optimal since every vertex must be read.

Time Complexity

Single pass over n vertices.
Overall: O(n).

Space Complexity

Only constant extra variables (area2, indices).
Overall: O(1) extra space.

Edge Cases

Less than 3 points: area is 0.
Collinear polygon points: area can be 0.
Clockwise vertex order: signed area negative; absolute still correct.
Repeated first point at end: harmless for the area itself, because the extra term cross(P0, P0) is 0; strip the duplicate if you also count vertices or edges.
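The clockwise and repeated-point cases can be checked quickly with a tuple-based sketch. The helper name signed_area2 is hypothetical; the polygon is a 4-by-3 rectangle.

```python
def signed_area2(points):
    """Twice the signed area for (x, y) tuples in boundary order."""
    n = len(points)
    return sum(points[i][0] * points[(i + 1) % n][1]
               - points[i][1] * points[(i + 1) % n][0]
               for i in range(n))

ccw = [(0, 0), (4, 0), (4, 3), (0, 3)]   # counterclockwise rectangle
cw = list(reversed(ccw))                  # same rectangle, clockwise
closed = ccw + [ccw[0]]                   # first point repeated at the end

print(signed_area2(ccw), signed_area2(cw), signed_area2(closed))
```

Reversing the order only flips the sign, and the repeated closing point adds a zero term, so the absolute area is 12 in all three cases.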

Common Mistakes

Common Mistake: Forgetting wrap-around term from last point to first point.

Common Mistake: Using unordered points. Shoelace requires vertices in boundary order.

Common Mistake: Returning signed area without absolute value when problem asks geometric area.

Optimization Insight: If coordinates are integers, keep area2 as integer for exact arithmetic and divide at the end. This avoids floating precision drift in large polygons.
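A sketch of that integer-only variant, assuming vertices are given as integer (x, y) tuples. The function name polygon_area2_int is hypothetical, and it returns twice the area so that no division happens inside the function.

```python
def polygon_area2_int(points):
    """Twice the unsigned area, exact for integer (x, y) tuples."""
    n = len(points)
    if n < 3:
        return 0
    area2 = 0
    for i in range(n):
        xi, yi = points[i]
        xj, yj = points[(i + 1) % n]
        area2 += xi * yj - yi * xj
    return abs(area2)

triangle = [(0, 0), (3, 0), (0, 1)]
area2 = polygon_area2_int(triangle)   # stays an int
whole, half = divmod(area2, 2)        # area = whole + half/2
print(area2, f"{whole}.{5 * half}")
```

Because area2 is an integer, the true area is either a whole number or a whole number plus one half, so it can be printed exactly without ever converting to float.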

Interview Insight: "Polygon area in O(n): sum cross terms xi*y(i+1) - yi*x(i+1), wrap around with modulo, area = abs(sum)/2. Signed area sign also reveals vertex orientation."

Pattern Recognition

Whenever you are given polygon vertices in order and asked for area (or orientation), think shoelace immediately. The same cross-sum appears in centroid and winding-based formulas.

Practice Problems

Compute area of convex and concave polygons from ordered vertices.
Determine whether vertex order is clockwise or counterclockwise via signed area.
Given polygon points, remove duplicate final closing point and recompute area robustly.

Summary

Main formula: Area = |sum(xi*y(i+1) - yi*x(i+1))| / 2.
Works for simple convex/concave polygons with ordered vertices.
Complexity: O(n) time, O(1) extra space.
Signed area gives orientation; absolute gives geometric area.

18.6 Convex Hull

Introduction

The Convex Hull of a set of points is the smallest convex polygon that contains all points. A classic visualization is: put nails on a board at each point and stretch a rubber band around them; when released, the band outlines the convex hull.

Convex hull is a cornerstone geometry problem. Many higher-level tasks (diameter of points, rotating calipers, collision boundaries, farthest pair, shape simplification) start by computing the hull.

Real-World Analogy

Imagine plotting all delivery locations on a map. If you want a minimal outer boundary that encloses all locations, you draw the boundary around the outermost points only. Inner points do not matter for the boundary. That outer boundary is the convex hull.

Formal Definition

Concept Note: Given points in 2D, the convex hull is the minimal convex set containing all points. In polygon form, hull vertices are a subset of input points, ordered around the boundary. A point is a hull vertex (an extreme point) if it cannot be expressed as a convex combination of the other points.

Why This Topic Matters

Foundational geometry primitive: Used by many optimization and distance problems.
Interview significance: Tests sorting + orientation + stack-like construction.
Algorithmic pattern: "Maintain valid boundary, pop when turn is wrong" appears in several geometry routines.

Mental Model

Sort points left-to-right.
Build lower boundary from left to right.
Build upper boundary from right to left.
While adding a point, if last turn is not counterclockwise (or not strictly, depending on collinear policy), remove middle point.
Combine boundaries to form full hull.

Approach Choice

There are multiple hull algorithms:

Jarvis March (Gift Wrapping): O(nh), where h = hull points.
Graham Scan: O(n log n).
Monotonic Chain (Andrew): O(n log n), easy to implement and interview-friendly.

We use Monotonic Chain because it is concise, robust, and directly uses orientation tests.

Brute Force → Better → Optimal

Brute Force

For each pair of points (A, B), check if all other points lie on one side of line AB. If yes, AB is hull edge. Complexity O(n^3), too slow for large n.
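The brute-force idea can be sketched as follows, mainly as a reference to cross-check faster implementations on small inputs. The function name brute_force_hull_edges is hypothetical, and the sketch assumes no three input points are collinear; with collinear boundary points it would report overlapping edges.

```python
def cross(o, a, b):
    """Cross of OA x OB for (x, y) tuples."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def brute_force_hull_edges(points):
    """
    O(n^3) hull edges: a pair (a, b) is an edge when every other point
    lies on one side of line ab. Assumes no three points are collinear.
    """
    pts = sorted(set(points))
    edges = []
    for i, a in enumerate(pts):
        for b in pts[i + 1:]:
            sides = [cross(a, b, p) for p in pts if p != a and p != b]
            if all(s >= 0 for s in sides) or all(s <= 0 for s in sides):
                edges.append((a, b))
    return edges

# Square corners plus one interior point.
sample = [(0, 0), (4, 0), (4, 4), (0, 4), (1, 2)]
print(brute_force_hull_edges(sample))
```

The interior point (1, 2) appears in none of the four reported edges.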

Better Intuition

Hull is boundary-only, so interior points should be eliminated quickly when they cause inward turns.

Optimal Practical Method

Sort points and use orientation-based stack popping. Complexity O(n log n) due to sorting; linear pass after sort.

Step-by-Step (Monotonic Chain)

Sort unique points lexicographically by (x, y).
Build lower hull:
- Iterate sorted points.
- While last two points + new point make non-left turn, pop last point.
- Append new point.
Build upper hull similarly using reversed order.
Concatenate lower and upper, excluding duplicate endpoints.

ASCII Diagram

Points:
        *           *
    *       *   *
  *   *  *      *
        *

Hull uses only outer boundary points:
        H-----------H
      /               \
    H                   H
      \               /
        H-----------H

Interior points are ignored.

Python Implementation

from dataclasses import dataclass

@dataclass(frozen=True, order=True)
class Point:
    x: int
    y: int

def cross(o: Point, a: Point, b: Point) -> int:
    """
    Cross of OA x OB where O is origin point 'o'.
    Positive => counterclockwise turn.
    """
    return (a.x - o.x) * (b.y - o.y) - (a.y - o.y) * (b.x - o.x)

def convex_hull(points: list[Point]) -> list[Point]:
    """
    Monotonic chain convex hull.
    Returns hull vertices in counterclockwise order without repeating first point.
    """
    pts = sorted(set(points))
    n = len(pts)
    if n <= 1:
        return pts

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # Remove duplicate endpoints
    return lower[:-1] + upper[:-1]

Line-by-Line Explanation

sorted(set(points)) removes duplicates and sorts by x, then y.
cross(lower[-2], lower[-1], p) checks turn direction of last edge with new point.
<= 0 pops non-left turns, producing strictly convex boundary (collinear middle points removed).
Upper hull repeats same logic in reverse order.
Last point of lower and upper duplicates first point of the other boundary, so we slice with [:-1].

Collinear Policy (Important)

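The condition cross <= 0 removes collinear points on edges, keeping only extreme endpoints. If you want to keep all boundary collinear points, change the condition to cross < 0. Note that when all input points are collinear, this variant picks up the interior points in both the lower and upper chains, so deduplicate the result. Interviewers may ask this variant explicitly.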
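Both policies can be sketched in one tuple-based function. The name hull and the keep_collinear flag are hypothetical, and the final order-preserving deduplication is an added safeguard for the all-collinear input, not part of the implementation above.

```python
def cross(o, a, b):
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def hull(points, keep_collinear=False):
    """Monotonic chain on (x, y) tuples with a selectable collinear policy."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def must_pop(turn):
        # Strict hull pops collinear points too; the relaxed one keeps them.
        return turn < 0 if keep_collinear else turn <= 0

    def half(seq):
        chain = []
        for p in seq:
            while len(chain) >= 2 and must_pop(cross(chain[-2], chain[-1], p)):
                chain.pop()
            chain.append(p)
        return chain

    lower = half(pts)
    upper = half(reversed(pts))
    merged = lower[:-1] + upper[:-1]
    # Order-preserving dedupe: only matters when all points are collinear.
    return list(dict.fromkeys(merged))

# Square with an extra point (1, 0) on the bottom edge and (1, 1) inside.
sample = [(0, 0), (1, 0), (2, 0), (2, 2), (0, 2), (1, 1)]
print(hull(sample))
print(hull(sample, keep_collinear=True))
```

The strict call drops (1, 0), while the relaxed call keeps it; the interior point (1, 1) is removed in both.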

Time Complexity

Sorting: O(n log n)
Lower + Upper passes: O(n) amortized (each point pushed/popped at most once per pass)
Overall: O(n log n)

Space Complexity

Hull stacks and sorted list require O(n) space.
Overall: O(n).

Edge Cases

0 or 1 point: hull is the same set.
2 points: hull is both points.
All points collinear: with cross <= 0, result has two endpoints only.
Duplicate points: remove before processing.

Common Mistakes

Common Mistake: Forgetting to sort points first. Monotonic chain relies on sorted order.

Common Mistake: Wrong turn condition sign (< vs >) causing inside-out hull.

Common Mistake: Returning duplicated start/end points or missing endpoint slices in concatenation.

Optimization Insight: If points are already sorted and deduplicated, hull construction is O(n). In repeated-query settings, preprocessing sort once can save significant runtime.

Interview Insight: "Convex hull (Andrew): sort points, build lower and upper hull with orientation/cross checks and stack pops. Use cross<=0 for strict hull endpoints only, or cross<0 to keep collinear boundary points. O(n log n)."

Pattern Recognition

If a problem asks for outer boundary, minimal enclosing polygon from points, or wants to ignore interior points before further computation (farthest pair, perimeter), convex hull is usually the first step.

Practice Problems

LeetCode 587 — Erect the Fence (convex hull with collinear boundary points kept).
Compute perimeter/area of convex hull of given points.
Given points, find farthest pair by first computing hull (rotating calipers extension).

Summary

Goal: smallest convex polygon containing all points.
Method: sort + lower/upper hull with orientation-based popping.
Complexity: O(n log n) time, O(n) space.
Key detail: choose collinear policy via cross condition.

18.7 Sweep Line Algorithm

Introduction

A Sweep Line Algorithm is a design technique for geometry problems where we move an imaginary line (usually left to right) across the plane and process events in sorted order. Instead of checking all pairs (often O(n^2)), we maintain only the currently relevant objects in an active set, which often reduces complexity to O(n log n).

Sweep line is not one single algorithm; it is a pattern used in many tasks: segment intersection detection, union of intervals, closest pair of points, skyline, rectangle overlap, and area/coverage computations.

Real-World Analogy

Imagine a vertical scanner moving across a map from left to right. At each x-position where something changes (an object starts or ends), you update your current "active" objects. You do not care about objects far away from the scanner because they cannot interact right now. This local focus is what makes sweep line fast.

Formal Definition

Concept Note: Sweep line framework has three components:

Events: Critical x (or y) coordinates where state changes (e.g., segment start/end).
Active set: Data structure of objects intersecting current sweep position.
Update/query rules: Insert/remove objects at events and check only local neighbors or aggregate state.

Typical data structures: sorted list, balanced BST / ordered set, Fenwick/segment tree (for range sweeps), priority queue (for dynamic event generation).

Why This Topic Matters

Major complexity drop: Converts many O(n^2) geometric checks into O(n log n).
Interview and contest value: Appears in advanced geometry and interval problems.
Reusable paradigm: Same event-sorting + active-structure idea works across domains.

Mental Model

Sort events by sweep coordinate.
At each event, update active set.
Only nearby/adjacent items in active set can create new interactions.
Maintain an invariant about what active set represents at current sweep position.

Core Sweep Template

events = sorted(all_events)
active = OrderedStructure()

for event in events:
    if event.type == "start":
        active.insert(event.object)
        # check possible interactions with neighbors
    elif event.type == "end":
        # check neighbors that become adjacent after removal
        active.remove(event.object)
    else:
        # custom event handling (if problem generates extra events)
        process(event)

Step-by-Step Example: Detect Any Overlapping Intervals

Problem: given intervals on a number line, determine whether any two overlap. This is a 1D sweep line version that clearly shows the pattern.

Convert each interval [l, r] into two events: (l, start), (r, end).
Sort events by position; if equal position, process starts before ends (or define based on closed/open interval policy).
Maintain active_count = number of currently open intervals.
On start: if active_count > 0, overlap exists; increment count.
On end: decrement count.

ASCII Diagram

Intervals:
  I1: [1,5]
  I2: [3,7]
  I3: [8,10]

Events sorted:
  x=1 start(I1)
  x=3 start(I2)  -> active already non-empty => overlap found
  x=5 end(I1)
  x=7 end(I2)
  x=8 start(I3)
  x=10 end(I3)

Python Implementation (1D Sweep)

def has_overlap(intervals: list[tuple[int, int]]) -> bool:
    """
    Return True if any two closed intervals overlap.
    """
    events = []
    for l, r in intervals:
        events.append((l, 0))  # 0 = start
        events.append((r, 1))  # 1 = end

    # For closed intervals [l,r], start before end at same coordinate means touching counts as overlap.
    events.sort(key=lambda x: (x[0], x[1]))

    active = 0
    for _, typ in events:
        if typ == 0:  # start
            if active > 0:
                return True
            active += 1
        else:         # end
            active -= 1

    return False

Line-by-Line Explanation

Each interval contributes two events.
Sorting determines the exact sweep processing order.
active tracks how many intervals are currently open.
When a new interval starts while one is active, overlap is detected.
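The same event loop extends from a yes/no answer to an aggregate one. As a sketch under the same closed-interval convention, the hypothetical function max_overlap_depth below tracks the largest number of intervals open at once.

```python
def max_overlap_depth(intervals):
    """Largest number of closed intervals [l, r] covering a single point."""
    events = []
    for l, r in intervals:
        events.append((l, 0))  # 0 = start, sorts before end at equal position
        events.append((r, 1))  # 1 = end
    events.sort()

    active = best = 0
    for _, typ in events:
        if typ == 0:
            active += 1
            best = max(best, active)
        else:
            active -= 1
    return best

print(max_overlap_depth([(1, 5), (3, 7), (4, 6), (8, 10)]))
```

Only the bookkeeping inside the loop changes; the event construction and sort are the same as in has_overlap.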

2D Sweep Insight: Segment Intersection (High Level)

In 2D segment intersection (Bentley-Ottmann style), events are segment endpoints (and sometimes discovered intersections), and active set stores segments ordered by y-coordinate at current sweep x. Only neighboring segments in this order need intersection checks, which is the key optimization from O(n^2) naive checks.

Brute Force → Better → Optimal

Brute Force

Check all pairs of objects (e.g., all segment pairs) -> O(n^2).

Better

Sort by one coordinate and process incrementally while keeping active candidates.

Optimal Pattern (for many sweep problems)

Event sorting O(n log n) + logarithmic active structure operations per event -> O(n log n) (or O((n+k) log n) when k intersections/events are produced).
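As a small illustration of the same pattern, the sketch below computes the maximum number of simultaneously open intervals (the "meeting rooms" variant). The function name max_overlap is hypothetical, and the code assumes closed intervals, so an interval starting exactly where another ends counts as overlapping it.

```python
def max_overlap(intervals: list[tuple[int, int]]) -> int:
    """Maximum number of closed intervals [l, r] that cover a common point."""
    events = []
    for l, r in intervals:
        events.append((l, 0))  # 0 = start; sorts before an end at the same x
        events.append((r, 1))  # 1 = end
    events.sort()              # O(n log n): this dominates the running time

    active = 0  # invariant: number of intervals currently open
    best = 0
    for _, typ in events:
        if typ == 0:
            active += 1
            best = max(best, active)
        else:
            active -= 1
    return best


print(max_overlap([(1, 5), (3, 7), (8, 10)]))  # 2
```

Only the counter changes compared with overlap detection; the event list, the sort, and the tie-break rule stay the same.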

Time Complexity

General sweep pattern: O(E log E + updates/queries), where E is number of events.
Typical case: O(n log n) for static event sets.
Intersection-reporting variants: often O((n + k) log n), where k is reported intersections.

Space Complexity

Events list + active structure usually require O(n) space.
Output-sensitive variants add O(k) output storage.

Edge Cases

Tied events: tie-break order (start/end/intersection) must match problem semantics.
Touching boundaries: decide whether touching counts as overlap/intersection.
Duplicate objects: ensure deterministic handling to avoid double counting.
Floating geometry: precision issues can break event ordering and active comparisons.
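The tie-break and touching-boundary cases can be made concrete with a small sketch. The function name has_overlap_with_policy and its touching_counts flag are hypothetical, introduced only for illustration. The sketch assumes l < r for every interval when touching_counts is False (a zero-length interval would process its end before its start).

```python
def has_overlap_with_policy(intervals, touching_counts=True):
    """Detect overlap; the flag decides whether [1, 2] and [2, 3] overlap."""
    # Closed intervals: starts sort before ends at the same coordinate.
    # Half-open intervals: ends sort first, so touching is not an overlap.
    START, END = (0, 1) if touching_counts else (1, 0)

    events = []
    for l, r in intervals:
        events.append((l, START))
        events.append((r, END))
    events.sort()

    active = 0
    for _, typ in events:
        if typ == START:
            if active > 0:
                return True
            active += 1
        else:
            active -= 1
    return False


print(has_overlap_with_policy([(1, 2), (2, 3)], touching_counts=True))   # True
print(has_overlap_with_policy([(1, 2), (2, 3)], touching_counts=False))  # False
```

The only difference between the two policies is the numeric code given to each event type, which is why the tie-break rule deserves an explicit statement in any sweep line solution.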

Common Mistakes

Common Mistake: Wrong tie-breaking at same coordinate, causing missed or extra intersections/overlaps.

Common Mistake: Using unsorted or unstable active representation when ordered neighbor checks are required.

Common Mistake: Forgetting to remove ended objects from active set, causing false positives.

Optimization Insight: Keep event objects lightweight and precompute comparable keys used in sorting/active ordering. In heavy test cases, data-structure choice (balanced BST vs array/bisect) can dominate runtime.

Interview Insight: "Sweep line = sort events, maintain active set, process local interactions only. Complexity typically drops from O(n^2) to O(n log n). Clearly state event type, active invariant, and tie-break rules."

Pattern Recognition

If a problem has many objects along one axis and asks for overlaps/intersections/coverage while scanning position, think sweep line. Keywords: events, active intervals/segments, sorted endpoints, local neighbor checks.

Practice Problems

Detect if any intervals overlap (1D sweep).
Compute maximum number of overlapping intervals (meeting rooms variant).
Count/report segment intersections (advanced sweep with ordered active set).
Rectangle union area with line sweep + segment tree (advanced).

Summary

Idea: move an imaginary line, process sorted events, maintain active set.
Benefit: avoids all-pairs checks by focusing on currently relevant objects.
Typical complexity: O(n log n) or output-sensitive O((n+k) log n).
Key correctness detail: define and preserve event ordering + active-set invariant.

Section 19: Advanced Data Structures

This section covers advanced structures used when arrays, hash maps, and basic trees are not enough. You will learn ordered sets/maps, Treaps, Skip Lists, B-Trees, KD Trees, Persistent Segment Trees, and Rope. The common theme is maintaining order or rich query power with efficient updates.

19.1 Ordered Set / Map

Introduction

An Ordered Set stores unique keys in sorted order. An Ordered Map stores key-value pairs with keys kept in sorted order. Unlike hash-based structures, ordered structures support powerful operations such as:

find smallest/largest key quickly
find predecessor/successor of a key
iterate keys in sorted order
query/count in key ranges efficiently

These are typically implemented using self-balancing BSTs (Red-Black Tree, AVL, etc.), giving O(log n) update/query times.

Real-World Analogy

Think of a dictionary shelf arranged alphabetically. You can quickly find words near a target word, and easily browse in order. A hash table is like a random box with labels: great for exact lookup, but poor for "next bigger word" or "all words between A and B". Ordered set/map is the alphabetically organized shelf.

Formal Definition

Concept Note:

Ordered Set: collection of distinct keys from a totally ordered domain.
Ordered Map: mapping key -> value with unique ordered keys.
Core ops (n = number of keys): insert/delete/search = O(log n), next/prev = O(log n), in-order traversal = O(n).
Typical backend: balanced BST, so tree height stays O(log n).

Why This Topic Matters

Range and neighbor queries: Essential in problems involving closest values, interval boundaries, and sorted dynamic sets.
Sweep line active sets: Many geometry/interval algorithms need ordered active structures.
Interview depth: Knowing when hash map is insufficient and ordered map is required is a key design skill.

Mental Model

Hash map: fast exact key lookup, no order guarantees.
Ordered map/set: slightly slower exact lookup (log n), but rich order-aware operations.
Use ordered structures when "next/previous/range" appears in requirements.

Core Operations (Conceptual)

Ordered Set

add(x), remove(x), contains(x)
first(), last()
lower(x) (greatest key < x), higher(x) (smallest key > x)
floor(x) (greatest key <= x), ceiling(x) (smallest key >= x)
iterate in sorted order

Ordered Map

put(key, value), get(key), remove(key)
All ordered-key neighbor operations from set apply to keys.
Range views/queries on key intervals in many language libraries.

Step-by-Step Usage Pattern

Insert elements as stream arrives.
For each new element x, query predecessor/successor to find nearest existing values.
Optionally update answer from range query on [L, R].
Delete elements when sliding window or active interval moves.

This pattern appears in nearest-neighbor updates, balanced window statistics, and sweep line active sets.
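A minimal sketch of this usage pattern, assuming a plain sorted Python list stands in for the ordered set (so each insertion costs O(n), unlike a balanced tree). The function name nearest_gap_stream is hypothetical.

```python
import bisect


def nearest_gap_stream(stream):
    """For each arriving x, record its distance to the closest earlier value."""
    seen = []  # kept sorted; plays the role of the ordered set
    gaps = []
    for x in stream:
        i = bisect.bisect_left(seen, x)
        best = None
        if i > 0:                      # predecessor exists
            best = x - seen[i - 1]
        if i < len(seen):              # successor (or equal key) exists
            d = seen[i] - x
            if best is None or d < best:
                best = d
        gaps.append(best)
        seen.insert(i, x)              # O(n) shift in a list
    return gaps


print(nearest_gap_stream([10, 3, 8, 9, 20]))  # [None, 7, 2, 1, 10]
```

With a tree-backed ordered set, the two neighbor lookups and the insertion would each be O(log n).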

ASCII Diagram

Ordered set keys:
  2, 5, 8, 13, 21

For x = 10:
  predecessor (lower/floor) = 8
  successor   (higher/ceiling) = 13

For range [5, 13]:
  keys in range -> 5, 8, 13

Python Implementation Notes

Python standard library has no built-in tree-based ordered set/map. Common options:

bisect + sorted list: simple, but insertion/deletion O(n) due to list shifts.
third-party SortedContainers: near O(log n) operations, very practical.
heapq: good for min/max only, not general ordered predecessor/successor.

Example with bisect (educational baseline)

import bisect

class OrderedSetList:
    def __init__(self):
        self.a = []

    def add(self, x: int) -> None:
        i = bisect.bisect_left(self.a, x)
        if i == len(self.a) or self.a[i] != x:
            self.a.insert(i, x)  # O(n) shift

    def remove(self, x: int) -> None:
        i = bisect.bisect_left(self.a, x)
        if i < len(self.a) and self.a[i] == x:
            self.a.pop(i)       # O(n) shift

    def contains(self, x: int) -> bool:
        i = bisect.bisect_left(self.a, x)
        return i < len(self.a) and self.a[i] == x

    def lower(self, x: int):
        i = bisect.bisect_left(self.a, x)
        return self.a[i - 1] if i > 0 else None

    def higher(self, x: int):
        i = bisect.bisect_right(self.a, x)
        return self.a[i] if i < len(self.a) else None

Line-by-Line Explanation

bisect_left finds insertion/search position in sorted order via binary search O(log n).
Membership test is O(log n) by checking that position.
lower/higher come naturally from neighboring indices.
Insertion/deletion in a Python list is O(n) due to element shifts, so this class is an educational baseline and does not match balanced-tree update complexity.
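The remaining conceptual operations (floor, ceiling, and range queries) follow from the same two bisect calls. The sketch below uses standalone helper functions on a sorted list; the names floor_key, ceiling_key, and keys_in_range are hypothetical.

```python
import bisect


def floor_key(a, x):
    """Greatest key <= x, or None."""
    i = bisect.bisect_right(a, x)
    return a[i - 1] if i > 0 else None


def ceiling_key(a, x):
    """Smallest key >= x, or None."""
    i = bisect.bisect_left(a, x)
    return a[i] if i < len(a) else None


def keys_in_range(a, lo, hi):
    """All keys k with lo <= k <= hi, in sorted order: O(log n + k)."""
    return a[bisect.bisect_left(a, lo):bisect.bisect_right(a, hi)]


keys = [2, 5, 8, 13, 21]
print(floor_key(keys, 10), ceiling_key(keys, 10))  # 8 13
print(floor_key(keys, 8), ceiling_key(keys, 8))    # 8 8
print(keys_in_range(keys, 5, 13))                  # [5, 8, 13]
```

Note the asymmetry: floor uses bisect_right and ceiling uses bisect_left so that an exact match is included, while lower and higher use the opposite calls to exclude it.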

Complexity Table (Typical Implementations)

Structure                          Search   Insert   Delete   Lower/Higher   Ordered Iteration
-----------------------------------------------------------------------------------------------
Hash Set / Hash Map                O(1)*    O(1)*    O(1)*    Not supported  Unordered
Balanced BST Ordered Set/Map       O(log n) O(log n) O(log n) O(log n)       O(n)
Python sorted list + bisect        O(log n) O(n)     O(n)     O(log n)       O(n)
(* average-case for hash structures)

Brute Force → Better → Optimal

Brute Force

Keep unsorted list and scan linearly for predecessor/successor/range each time: O(n) per query.

Better

Keep sorted list + binary search for lookup/neighbors: O(log n) query but O(n) updates.

Optimal (dynamic ordered data)

Use balanced BST ordered set/map for O(log n) both queries and updates.

Time Complexity

Balanced BST ordered set/map: insert, erase, find, lower/higher all O(log n).
In-order iteration: O(n).
Range query traversal: O(log n + k) where k results returned.

Space Complexity

Store n keys (and n values for map): O(n).
Balanced tree pointers/metadata add constant-factor overhead per node.

Edge Cases

Empty set/map: predecessor/successor queries should return None/null safely.
Duplicate inserts in set: ignored by set semantics.
Key overwrite in map: insert same key updates value instead of creating duplicate key.
Boundary queries: lower(min_key) or higher(max_key) has no answer.

Common Mistakes

Common Mistake: Using hash maps when problem needs ordered neighbors or range traversal.

Common Mistake: Confusing lower vs floor and higher vs ceiling semantics.

Common Mistake: Assuming Python bisect list gives O(log n) insertion/deletion. Search is O(log n), updates are O(n).

Optimization Insight: In problems with many updates and neighbor queries, choose a true ordered-tree structure (or a high-quality ordered container library) rather than repeatedly sorting or list insertion.

Interview Insight: "If I need predecessor/successor or range queries with dynamic updates, I choose ordered set/map (balanced BST): O(log n) insert/delete/find/neighbor. Hash map cannot do ordered operations."

Pattern Recognition

Keywords that strongly indicate ordered set/map: "closest smaller/greater", "next element in dynamic set", "range count/sum over keys", "maintain sorted active elements", "floor/ceiling".

Practice Problems

Maintain a dynamic set of numbers and answer predecessor/successor queries online.
Sliding window with nearest value lookup (active ordered set).
Given stream of numbers, report count of keys in [L, R] after each update.

Summary

Ordered set/map stores keys in sorted order with dynamic updates.
Main strength: predecessor/successor in O(log n) and range queries in O(log n + k) for k reported keys.
Backed by balanced BST in typical language libraries.
Use when order-aware queries are required; use hash structures for pure exact-lookup workloads.

19.2 Treap

Introduction

A treap (tree + heap) is a randomized binary search tree where each node stores a key (BST order: left < root < right) and a random priority (heap order: parent priority ≥ children in max-heap convention, or the reverse for min-heap). The random priorities make the tree balanced in expectation: height is O(log n) with high probability, so search, insert, and delete run in expected O(log n) time without the case-by-case rebalancing rules of AVL or Red-Black trees.

Real-World Analogy

Think of organizing people in a line by height (BST key order), but each person also draws a random “VIP number” (priority). Higher VIP numbers float up like in a heap. The combination forces a unique shape that stays shallow on average—no one has to manually rebalance by complex rules; the randomness does the work.

Formal Definition

Concept Note: A treap node holds (key, priority). Invariants:

BST property: all keys in left subtree < node.key < all keys in right subtree.
Heap property: each node’s priority is ≥ (or ≤, by convention) both children’s priorities.

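Priorities are usually drawn uniformly at random when inserting. If all keys are distinct and all priorities are distinct, the shape of the treap is uniquely determined: the node with the highest priority must be the root, and the same argument applies recursively to each side. Expected height is Θ(log n).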

Why This Topic Matters

Simple balancing: Easier to implement than AVL/RB in contest settings; split/merge are powerful primitives.
Implicit treap extension: Same idea supports sequence operations (rope-like) by keying by index.
Interview depth: Shows you understand randomized data structures and BST invariants.

Mental Model

BST alone can degenerate to a chain; heap priority “pulls” high-priority nodes up, keeping depth small in expectation.
Insert: treat as BST insert, then rotate (or use split/merge) until heap property holds—or build via split/merge directly.
Split and merge are the two core operations from which everything else can be built.
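The rotation-based insert described in the second bullet can be sketched as follows. This is an illustrative version with hypothetical names (RNode, rotate_left, rotate_right, insert_rot, contains, inorder), using the max-heap convention and ignoring duplicate keys.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class RNode:
    key: int
    pri: float
    left: Optional["RNode"] = None
    right: Optional["RNode"] = None


def rotate_right(y: RNode) -> RNode:
    x = y.left            # x moves up, y becomes its right child
    y.left = x.right
    x.right = y
    return x


def rotate_left(x: RNode) -> RNode:
    y = x.right           # y moves up, x becomes its left child
    x.right = y.left
    y.left = x
    return y


def insert_rot(root: Optional[RNode], key: int) -> RNode:
    """BST insert, then rotate the new node up while it outranks its parent."""
    if root is None:
        return RNode(key, random.random())
    if key < root.key:
        root.left = insert_rot(root.left, key)
        if root.left.pri > root.pri:
            root = rotate_right(root)
    elif key > root.key:
        root.right = insert_rot(root.right, key)
        if root.right.pri > root.pri:
            root = rotate_left(root)
    return root           # equal key: already present, nothing to do


def contains(root: Optional[RNode], key: int) -> bool:
    while root is not None:
        if key == root.key:
            return True
        root = root.left if key < root.key else root.right
    return False


def inorder(root: Optional[RNode]) -> list:
    out, stack = [], []
    while stack or root:
        while root:
            stack.append(root)
            root = root.left
        root = stack.pop()
        out.append(root.key)
        root = root.right
    return out


root = None
for k in range(200):      # sorted input would degenerate a plain BST
    root = insert_rot(root, k)
print(inorder(root) == list(range(200)), contains(root, 57))  # True True
```

Rotations preserve the BST order, so only the heap property needs repairing on the way back up the recursion.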

Split and Merge

Split(root, key)

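Splits the treap into two treaps: (L, R) where all keys in L are ≤ key (or < key, by convention) and all keys in R are > key. Walk down the tree comparing keys only; priorities are never inspected. The heap property survives automatically because every node is reattached below one of its original ancestors, which already has a higher priority.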

Merge(L, R)

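Assumes all keys in L < all keys in R. Combines into one treap: the root with the higher priority becomes the root of the result, and the other treap is merged recursively into the subtree on its side (L's root keeps its left subtree and merges R into its right; R's root keeps its right subtree and merges L into its left).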

Insert and Delete (via Split/Merge)

Insert(k): Split root at k, create node (k, random priority), merge(merge(left, node), right).
Delete(k): Split to isolate k, split again to remove node, merge remaining parts.

ASCII Diagram (Conceptual)

  Treap (max-heap on priority, BST on key)

        (40, prio 9)
       /              \
  (20, 7)          (50, 8)
  /     \              \
(10,6) (30,5)       (60,4)

Keys follow BST; priorities decrease down the heap (example numbers illustrative).

Python Implementation (Split / Merge Treap)

import random
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Node:
    key: int
    pri: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def split(root: Optional[Node], key: int) -> Tuple[Optional[Node], Optional[Node]]:
    """All keys <= key go left, > key go right."""
    if root is None:
        return (None, None)
    if root.key <= key:
        l, r = split(root.right, key)
        root.right = l
        return (root, r)
    else:
        l, r = split(root.left, key)
        root.left = r
        return (l, root)

def merge(a: Optional[Node], b: Optional[Node]) -> Optional[Node]:
    """All keys in a < all keys in b."""
    if not a:
        return b
    if not b:
        return a
    if a.pri > b.pri:
        a.right = merge(a.right, b)
        return a
    else:
        b.left = merge(a, b.left)
        return b

def insert(root: Optional[Node], key: int) -> Optional[Node]:
    # split: left has keys <= t, right has > t
    left, right = split(root, key - 1)       # left: keys < key, right: keys >= key
    mid_left, mid_right = split(right, key)  # mid_left: keys == key, mid_right: keys > key
    if mid_left is not None:               # key already present (unique keys)
        return merge(merge(left, mid_left), mid_right)
    node = Node(key=key, pri=random.randint(1, 10**9))
    return merge(merge(left, node), mid_right)

def erase(root: Optional[Node], key: int) -> Optional[Node]:
    left, right = split(root, key - 1)
    mid_left, mid_right = split(right, key)  # drop mid_left (subtree of nodes with this key)
    return merge(left, mid_right)

Here split(root, t) puts keys <= t in the left treap and > t in the right treap. Double-split isolates keys equal to key for insert/delete; the key - 1 boundary assumes integer keys. Duplicate-key policy should match the problem statement.

Line-by-Line Explanation

split: If root.key ≤ key, the root belongs to the left treap; split right child and attach left part back.
merge: Higher priority becomes parent; recursively merge the other side.
insert: Isolate range for key, insert new leaf if missing, merge three parts.
erase: Isolate the node with key and drop it by merging around it.

Time and Space Complexity

Expected time: O(log n) per split, merge, insert, delete, search (walk).
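Worst case (very unlikely): O(n) if the random priorities happen to produce a chain. This can only be forced by an adversary who can predict the priorities, so draw them from a random source that is independent of the keys.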
Space: O(n) nodes, O(log n) expected recursion depth.

Edge Cases

Duplicate keys: Decide whether BST allows duplicates; may store count in node.
Empty tree: split/merge base cases return None.
Repeated priorities: If two nodes receive the same priority, tie-break consistently to avoid structural ambiguity.

Common Mistakes

Common Mistake: Violating BST order during merge (merging when max(L) ≥ min(R)).

Common Mistake: Confusing split boundary (≤ vs < key); off-by-one breaks invariants.

Expert Tip: For sequence problems, use implicit treap (index as implicit key) with split by size; same merge logic, powerful for range reverse/substring operations.
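A compact sketch of the implicit treap idea, under stated assumptions: positions are 0-indexed, subtree sizes act as the implicit key, and a lazy rev flag marks a subtree whose children still need swapping. All names here (INode, split_by_size, merge_seq, reverse_range, and the helpers) are hypothetical.

```python
import random


class INode:
    def __init__(self, val):
        self.val = val
        self.pri = random.random()
        self.size = 1
        self.rev = False       # lazy flag: subtree is pending reversal
        self.left = None
        self.right = None


def size(t):
    return t.size if t else 0


def pull(t):
    t.size = 1 + size(t.left) + size(t.right)


def push(t):
    """Apply a pending reversal to t and pass it down to the children."""
    if t.rev:
        t.left, t.right = t.right, t.left
        for c in (t.left, t.right):
            if c:
                c.rev ^= True
        t.rev = False


def split_by_size(t, k):
    """Return (first k elements, remaining elements)."""
    if t is None:
        return None, None
    push(t)
    if size(t.left) < k:   # t itself belongs to the first k elements
        a, b = split_by_size(t.right, k - size(t.left) - 1)
        t.right = a
        pull(t)
        return t, b
    a, b = split_by_size(t.left, k)
    t.left = b
    pull(t)
    return a, t


def merge_seq(a, b):
    """Concatenate sequence a followed by sequence b."""
    if a is None:
        return b
    if b is None:
        return a
    if a.pri > b.pri:
        push(a)
        a.right = merge_seq(a.right, b)
        pull(a)
        return a
    push(b)
    b.left = merge_seq(a, b.left)
    pull(b)
    return b


def build_sequence(values):
    root = None
    for v in values:
        root = merge_seq(root, INode(v))
    return root


def reverse_range(root, l, r):
    """Reverse positions l..r inclusive."""
    a, rest = split_by_size(root, l)
    b, c = split_by_size(rest, r - l + 1)
    if b:
        b.rev ^= True
    return merge_seq(merge_seq(a, b), c)


def to_list(t):
    if t is None:
        return []
    push(t)
    return to_list(t.left) + [t.val] + to_list(t.right)


root = reverse_range(build_sequence(range(10)), 2, 5)
print(to_list(root))  # [0, 1, 5, 4, 3, 2, 6, 7, 8, 9]
```

The merge logic matches the keyed treap; only split changes, because it counts elements instead of comparing keys.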

Interview Insight: "Treap = BST on keys + heap on random priorities. Implement with split and merge; expected O(log n). Good when you need a balanced ordered structure without writing AVL rotations."

Practice Problems

Implement treap with insert, delete, and search.
Extend with subtree size for k-th order statistic.
Implicit treap: reverse subarray, cut/paste sequence.

Summary

Treap: randomized BST balanced by heap priorities.
Core ops: split, merge; insert/delete compose from them.
Expected complexity: O(log n) height and operations.
Alternative to AVL/RB; especially popular in competitive programming.

19.3 Skip List

Introduction

A skip list is a probabilistic data structure that maintains a sorted sequence of elements and supports search, insert, and delete in expected O(log n) time. Instead of a single linked list (O(n) search), skip lists stack several levels of sparse express lanes: higher levels skip over many nodes so you can “fast-forward” toward the target, then drop down level by level. It is a practical alternative to balanced BSTs and is used in real systems (for example Redis sorted sets use a skip list–like design; Java’s ConcurrentSkipListMap is skip-list based).

Real-World Analogy

Imagine a train line with a local track that stops at every station and an express track that skips stops. To reach a distant station, you ride express until you would overshoot, then switch to a slower line that stops more often. Skip lists are the same idea in a linked structure: higher levels are express, lower levels are local.

Formal Definition

Concept Note: A skip list has a header node and levels 0 (bottom, full list) up to some max level L. Each node stores a key (and optional value), a forward array of pointers forward[i] to the next node at level i, and a level (height) chosen at insert time. Invariant: level i lists are sorted by key; level i+1 is a subsequence of level i. Search starts at the highest level of the header, walks forward while next key < target, then steps down one level, repeating until level 0. Expected number of levels per node is O(1) if level is chosen with geometric distribution (e.g. coin flips until tails).

Why This Topic Matters

Engineering reality: Easier to implement lock-free or concurrent variants than many tree rebalancing schemes.
Same asymptotics as balanced BST: Expected O(log n) search/insert/delete without rotations.
Interviews: Tests understanding of randomized structures and layered linked lists.

Mental Model

Bottom level (0) is an ordinary sorted linked list.
Higher levels are shortcuts: a node at level k appears in lists 0..k.
Search never goes backward along a level—only forward and down.
Random height keeps shortcuts sparse so expected path length is logarithmic.

Random Level (Typical Rule)

On insert, set level = 1, then while random() < p (often p = 1/2 or 1/4) and level < MAX_LEVEL, increment level. Expected level is small; cap MAX_LEVEL to O(log n) to bound pointers.
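The rule can be checked empirically with a short simulation. This sketch uses the 1-based height convention from the paragraph above; the function name random_level and its parameters are hypothetical. For an uncapped geometric distribution the expected height is 1 / (1 - p), which is 2 for p = 1/2.

```python
import random


def random_level(p=0.5, max_level=16, rng=random):
    """Height in 1..max_level: keep promoting while the coin comes up heads."""
    level = 1
    while rng.random() < p and level < max_level:
        level += 1
    return level


rng = random.Random(0)  # seeded so the experiment is repeatable
samples = [random_level(rng=rng) for _ in range(10_000)]
average = sum(samples) / len(samples)
print(min(samples), max(samples) <= 16, round(average, 1))
```

The average number of forward pointers per node is therefore a small constant, which is why total space stays O(n).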

Step-by-Step: Search

Start at header, current level = max level in structure.
While forward pointer at this level points to a node with key < target, move forward.
If next key ≥ target or null, go down one level.
At level 0, forward is either the node with key or the insertion position.

Step-by-Step: Insert

Search path: record, at each level, the last node before dropping (the “update” array).
Choose random level for new node; extend list height if needed.
Splice new node: for each level ≤ new level, set new.forward[i] = update[i].forward[i] and update[i].forward[i] = new.

Step-by-Step: Delete

Find node at level 0 (same search).
For each level where the node participates, redirect predecessors’ forward pointers to skip the node.

ASCII Diagram

Levels (higher = more express):

  L2:  HEAD --------> 30 -----------------------------> NULL
  L1:  HEAD --> 15 --> 30 --------> 45 --> NULL
  L0:  HEAD --> 10 --> 15 --> 20 --> 30 --> 45 --> NULL

Search for 20: at L2, the next node is 30 (≥ 20), so stay at HEAD and go down; at L1, move to 15, the next node is 30 (≥ 20), so go down;
at L0, continue from 15: the next node is 20, which is the target.
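To make the walk concrete, the sketch below wires up the three-level list from the diagram by hand and records every forward move. The class DNode and the helpers link and trace_search are hypothetical and exist only for this trace.

```python
class DNode:
    def __init__(self, key, height):
        self.key = key
        self.forward = [None] * height  # forward[i] = next node at level i


def link(level, nodes):
    """Chain the given nodes left to right on one level."""
    for a, b in zip(nodes, nodes[1:]):
        a.forward[level] = b


head = DNode(None, 3)
n10, n15, n20 = DNode(10, 1), DNode(15, 2), DNode(20, 1)
n30, n45 = DNode(30, 3), DNode(45, 2)
link(0, [head, n10, n15, n20, n30, n45])
link(1, [head, n15, n30, n45])
link(2, [head, n30])


def trace_search(head, target):
    """Return (list of (level, key) forward moves, found flag)."""
    moves = []
    cur = head
    for lvl in range(len(head.forward) - 1, -1, -1):
        # Move right only while the next key is strictly smaller.
        while cur.forward[lvl] is not None and cur.forward[lvl].key < target:
            cur = cur.forward[lvl]
            moves.append((lvl, cur.key))
    nxt = cur.forward[0]
    return moves, (nxt is not None and nxt.key == target)


print(trace_search(head, 20))  # ([(1, 15)], True)
print(trace_search(head, 45))  # ([(2, 30)], True)
print(trace_search(head, 12))  # ([(0, 10)], False)
```

A single forward move on level 1 is enough to land just before 20; the search never revisits nodes it has already passed.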

Python Implementation (Educational)

import random
from dataclasses import dataclass, field
from typing import List, Optional

MAX_LEVEL = 16
P = 0.5

@dataclass
class Node:
    key: int
    forward: List[Optional["Node"]] = field(default_factory=list)

class SkipList:
    def __init__(self):
        self.header = Node(key=-10**18, forward=[None] * MAX_LEVEL)
        self.level = 0

    def _random_level(self) -> int:
        lvl = 0
        while random.random() < P and lvl < MAX_LEVEL - 1:
            lvl += 1
        return lvl

    def search(self, key: int) -> bool:
        cur = self.header
        for i in range(self.level, -1, -1):
            while cur.forward[i] is not None and cur.forward[i].key < key:
                cur = cur.forward[i]
        cur = cur.forward[0]
        return cur is not None and cur.key == key

    def insert(self, key: int) -> None:
        update: List[Optional[Node]] = [None] * MAX_LEVEL
        cur = self.header
        for i in range(self.level, -1, -1):
            while cur.forward[i] is not None and cur.forward[i].key < key:
                cur = cur.forward[i]
            update[i] = cur
        cur = cur.forward[0]
        if cur is not None and cur.key == key:
            return
        new_level = self._random_level()
        if new_level > self.level:
            for i in range(self.level + 1, new_level + 1):
                update[i] = self.header
            self.level = new_level
        # Allocate only the levels this node takes part in (indices 0..new_level).
        x = Node(key=key, forward=[None] * (new_level + 1))
        for i in range(new_level + 1):
            x.forward[i] = update[i].forward[i]
            update[i].forward[i] = x

    def delete(self, key: int) -> None:
        update: List[Optional[Node]] = [None] * MAX_LEVEL
        cur = self.header
        for i in range(self.level, -1, -1):
            while cur.forward[i] is not None and cur.forward[i].key < key:
                cur = cur.forward[i]
            update[i] = cur
        cur = cur.forward[0]
        if cur is None or cur.key != key:
            return
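        for i in range(self.level + 1):
            # Identity check: stop at the first level where cur is absent,
            # since it cannot appear on any higher level either.
            if update[i].forward[i] is not cur:
                break
            update[i].forward[i] = cur.forward[i]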
        while self.level > 0 and self.header.forward[self.level] is None:
            self.level -= 1

The sentinel header uses a key smaller than any real key. Production code uses a proper comparator and optional values per node.

Line-by-Line Explanation

_random_level: geometric distribution; expected O(1) levels per node.
search: same walk as insert until level 0, then check key equality.
insert: update[i] is the predecessor at level i before splicing in the new node.
delete: unlink at every level where the target appears; shrink self.level if top levels empty.

Time and Space Complexity

Expected time: O(log n) for search, insert, delete (with p = 1/2 and MAX_LEVEL = O(log n)).
Worst case (unlikely): O(n) if the random heights degenerate, for example when every node ends up with the same height so that no level skips anything.
Space: O(n) nodes; expected O(n) extra pointers total (each node ~2 pointers on average with p = 1/2).

Edge Cases

Duplicate keys: Decide whether insert is no-op or multiset; above code skips duplicate.
Empty list: header only; level may be 0.
MIN/MAX key: sentinel must be smaller than any inserted key (or use separate head/tail sentinels).

Common Mistakes

Common Mistake: Forgetting to extend update entries to header when new node’s level exceeds current list height.

Common Mistake: Off-by-one on level indices when allocating forward arrays (level 0..L vs length L+1).

Expert Tip: Choosing p = 1/4 reduces the expected pointer count per node (about 1.33 instead of 2) and the number of levels, at the cost of more forward steps per level; tune p and MAX_LEVEL for memory vs speed in your workload.

Interview Insight: "Skip list: sorted linked list plus random-height express pointers. Search walks right while next < key, else down. Insert records predecessors per level and splices. Expected O(log n) like balanced BST, simpler than rotations for some concurrent designs."

Pattern Recognition

When you need sorted order, predecessor/successor, or range iteration with expected logarithmic updates and want a linked structure (or concurrency-friendly design), skip list is a strong candidate next to treap or balanced tree libraries.

Practice Problems

Implement skip list with insert, delete, search, and optional lower_bound.
Compare average path length vs theoretical O(log n) by simulation.
Extend nodes with satellite data (sorted map semantics).

Summary

Skip list: multi-level sorted linked lists with random node heights.
Core idea: search forward on high levels, drop down when next key would overshoot.
Expected complexity: O(log n) time, O(n) space.
Practical alternative to balanced BSTs; used in real concurrent and database-adjacent structures.

19.4 B-Tree

Introduction

A B-tree is a self-balancing search tree designed so that each node can hold many keys (not just one like a typical binary node). That high branching factor keeps the tree shallow, which is ideal when each node read/write corresponds to an expensive disk block or page. B-trees (and the closely related B+ tree, where all records live in leaves) are the standard index structure in relational databases and many file systems.

Real-World Analogy

A phone book split into thick chapters is easier to search than a single endless scroll: you jump to the right chapter (wide internal node), then narrow inside it. B-tree nodes are those chapters—each stores several separators so one disk read brings many routing decisions at once.

Formal Definition

Concept Note: Fix an integer order (minimum degree) t ≥ 2 (definitions vary by textbook; some use maximum children m instead). A B-tree of order t satisfies:

Every node has at most 2t − 1 keys and at most 2t children (common convention).
Every node except root has at least t − 1 keys (at least t children for internal nodes).
The root is exempt from the minimum: in a non-empty tree it may have as few as 1 key, and it may itself be a leaf.
All leaves appear at the same depth (perfectly balanced).
Internal node with k keys has k + 1 children; keys partition child subtrees.

Search compares key against sorted keys in node, chooses child interval, descends until leaf or hit.

B-Tree vs B+ Tree (Practical Note)

In a B+ tree, keys in internal nodes are only separators; all actual records (or pointers to rows) sit in leaves, often linked in order for range scans. Database indexes are usually B+ trees. Classic B-tree may store values in internal nodes too. Algorithm courses often teach B-tree properties; production systems emphasize B+ tree behavior.

Why This Topic Matters

Disk and cache locality: Fewer levels ⇒ fewer random I/Os for large datasets.
Industry standard: Explaining indexes as “B+ tree” is expected in system design interviews.
Contrast with BST: Binary trees minimize comparisons in RAM; B-trees minimize node accesses when nodes are huge.

Mental Model

Each node = one block/page of sorted keys + child pointers.
Height stays O(log n) but base of logarithm is large (number of keys per node), so height is small.
Insert may split a full node; delete may merge or borrow from sibling.

Search Algorithm

Start at root.
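Find the smallest index i such that key ≤ keys[i], using linear or binary search within the node (i = number of keys if no such index exists).
If such an i exists and key equals keys[i], found (unless internal nodes hold only separators, as in B+).
Else descend to child i, whose subtree holds the keys between keys[i − 1] and keys[i].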
Repeat until leaf or match.

Cost per level: O(t) comparisons inside node if linear scan, O(log t) if binary search on keys within node.

Insert (High Level)

Descend to the leaf where the new key belongs.
If leaf has room (< max keys), insert in sorted order.
If leaf is full, split into two nodes with median key promoted to parent (or handled per variant).
If parent becomes overfull, split propagates upward; new root may appear if old root splits.
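The split step can be sketched as follows, assuming the minimum-degree convention above (a full node has 2t − 1 keys) and a parent that still has room. The BNode class and split_child function are hypothetical names for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class BNode:
    keys: list = field(default_factory=list)
    children: list = field(default_factory=list)  # empty list for a leaf


def split_child(parent: BNode, i: int, t: int) -> None:
    """Split the full child parent.children[i] around its median key."""
    full = parent.children[i]
    assert len(full.keys) == 2 * t - 1, "only full nodes are split"

    median = full.keys[t - 1]
    # Right half gets the t - 1 largest keys (and the last t children).
    right = BNode(keys=full.keys[t:], children=full.children[t:])
    # Left half keeps the t - 1 smallest keys (and the first t children).
    full.keys = full.keys[:t - 1]
    full.children = full.children[:t]

    # The median moves up and separates the two halves in the parent.
    parent.keys.insert(i, median)
    parent.children.insert(i + 1, right)


# t = 2: a full leaf [10, 20, 30] under a parent holding [40].
parent = BNode(keys=[40], children=[BNode([10, 20, 30]), BNode([50])])
split_child(parent, 0, t=2)
print(parent.keys, [c.keys for c in parent.children])
# [20, 40] [[10], [30], [50]]

# Root split: put an empty new root above the old one, then split it.
old_root = BNode([10, 20, 30])
new_root = BNode(children=[old_root])
split_child(new_root, 0, t=2)
print(new_root.keys, [c.keys for c in new_root.children])
# [20] [[10], [30]]
```

The root split is the only place where the tree gains a level, which is why all leaves stay at the same depth.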

Delete (High Level)

Remove key from leaf (or swap with inorder predecessor/successor if stored in internal node—B+ usually deletes from leaf).
If node has too few keys, borrow from left/right sibling via parent rotation, or merge with sibling and pull down parent key.
If root becomes empty after merge, shrink height.

ASCII Diagram (Order t = 2, max 3 keys per node — illustrative)

                [ 40 | 70 ]
               /    |    \
         [10|20] [50|60] [80|90]

Internal node splits key range; children are subtrees with keys in (..,40), (40,70), (70,..).
Actual B-trees use larger t in practice (many keys per page).

Python Sketch: Search in a Simplified Node

def find_child_index(keys: list[int], k: int) -> int:
    """Return child slot for key k (0..len(keys))."""
    i = 0
    while i < len(keys) and k > keys[i]:
        i += 1
    return i

# Conceptual: node has keys[] and children[] length len(keys)+1
# Recursively descend until leaf or match.
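A fuller version of the descent, run on the tree from the diagram. This is a minimal sketch: the BTreeNode class and btree_search function are hypothetical names, nodes live in memory (no disk pages), and the standard-library bisect_left replaces the linear scan inside each node.

```python
from bisect import bisect_left


class BTreeNode:
    def __init__(self, keys, children=None):
        self.keys = keys                 # sorted keys in this node
        self.children = children or []   # len(keys) + 1 children, or [] for a leaf


def btree_search(node, k):
    """Return True if k is stored anywhere in the subtree rooted at node."""
    while node is not None:
        i = bisect_left(node.keys, k)    # smallest i with k <= keys[i]
        if i < len(node.keys) and node.keys[i] == k:
            return True
        if not node.children:            # reached a leaf without a match
            return False
        node = node.children[i]          # descend into the i-th interval
    return False


root = BTreeNode(
    [40, 70],
    [BTreeNode([10, 20]), BTreeNode([50, 60]), BTreeNode([80, 90])],
)
print(btree_search(root, 60), btree_search(root, 70), btree_search(root, 65))
# True True False
```

Each loop iteration corresponds to one node (page) access, so the number of iterations is bounded by the tree height.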

Time Complexity

Let n = number of keys; each node holds Θ(t) keys, so height h = O(log_t n).
Search: O(h · t) with linear scan per node, or O(h · log t) with binary search inside node.
Since t is constant for a fixed page size, this is O(log n) comparisons, and the number of node (page) accesses is only h, which is small in practice.
Insert/Delete: O(h) node visits; each operation triggers at most O(h) splits, merges, or borrows, each costing O(t) work.
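A quick calculation shows how much the logarithm base matters. The sketch below uses the standard textbook bound h ≤ log_t((n + 1) / 2) for a B-tree with minimum degree t (root at height 0) and compares it with the smallest possible height of a binary tree on the same number of keys. The function names and the choice t = 100 are assumptions for illustration.

```python
import math


def btree_max_height(n: int, t: int) -> int:
    """Upper bound on B-tree height for n keys and minimum degree t."""
    return math.floor(math.log((n + 1) / 2, t))


def binary_min_height(n: int) -> int:
    """Height of a perfectly balanced binary tree holding n keys."""
    return math.ceil(math.log2(n + 1)) - 1


n = 10**9
print(btree_max_height(n, t=100))  # 4
print(binary_min_height(n))        # 29
```

With one page read per level, a billion keys need at most 5 node accesses in the B-tree, versus about 30 in even the best-balanced binary tree.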

Space Complexity

O(n) keys plus O(n) child pointers overall (tree structure).
Internal fragmentation in real disks adds constant-factor overhead per page.

Edge Cases

Root split: tree grows in height when root is full and splits.
Root merge: height may shrink when root’s children merge.
Duplicate keys: policy varies (disallow, or chain in leaf for B+).

Common Mistakes

Common Mistake: Confusing order t with maximum children m across textbooks—always check definitions.

Common Mistake: Assuming B-tree is “binary tree for disk”—branching is multi-way; depth is much smaller than AVL for same n when pages are wide.

Optimization Insight: Choosing page size and key size to maximize keys per node improves fanout and reduces tree height—this is a major tuning knob in real databases.

Interview Insight: "B-tree: wide nodes, all leaves same depth, search by scanning keys in node then child index. Insert splits full nodes upward. Used because disk I/O is block-sized; B+ tree puts data in leaves for range scans."

Pattern Recognition

Questions about database indexes, filesystem metadata, “why not binary tree on disk,” or “log base” of index depth → think B-tree / B+ tree and fanout.

Practice Problems

Trace insertions that cause leaf split and then internal split on paper.
Compare max height of AVL vs B-tree for same n and large page fanout.
Explain why B+ tree is preferred for range queries in SQL indexes.

Summary

B-tree: balanced multi-way search tree with min/max keys per node.
Goal: minimize tree height for block-oriented storage.
Ops: search; insert/delete with split, merge, borrow.
B+ tree: internal separators only, records in leaves—common in DB engines.

19.5 KD Tree

Introduction

A k-d tree (k-dimensional tree) is a binary space-partitioning tree that stores points in k-dimensional space. Each internal node splits space with an axis-aligned hyperplane: at depth d, we often split on axis d mod k (cycle through dimensions). Points in the “left” subtree lie on one side of the split, points in the “right” subtree on the other. k-d trees support nearest-neighbor search, range queries (report points in a box or ball), and approximate geometry queries more efficiently than brute-force O(n) scans when dimensionality is modest and data is not adversarial.

Real-World Analogy

Imagine organizing a map of cities: first draw a vertical line dividing east and west halves, then within each half draw horizontal lines, then vertical again, and so on. Each region shrinks until it holds one city (or a small bucket). To find the city closest to a query point, you descend the tree like a decision tree, then backtrack and check whether the other side of a split could hold a closer city.

Formal Definition

Concept Note: A k-d tree node stores a point p, a split dimension axis ∈ {0,…,k−1}, and left/right children containing points with coordinate on that axis less than or greater than (or ≤ / >) p[axis] according to convention. Leaves may store one point or a small bucket. The tree is binary and partitions ℝ^k recursively. Common construction: sort by current axis and split at median for balance, giving O(n log n) build time for n points.

Why This Topic Matters

Nearest neighbor: Core in graphics, ML (k-NN), mapping, and collision broad-phase.
Range search: “All points in axis-aligned rectangle” in O(n^(1−1/k) + m) worst-case time on a balanced tree, where m is the number of reported points (O(√n + m) in 2D).
Curriculum bridge: Connects BST ideas to computational geometry and higher dimensions.

Mental Model

Each node = one splitting plane perpendicular to one coordinate axis.
Depth cycles axis: x, then y, then z, then x again in 3D.
Search prunes subtrees when query cannot possibly beat current best distance.

Construction (Median Split)

Choose axis = depth mod k.
Sort points by that axis (or use nth_element / quickselect for median in O(n) per level).
Median point becomes node; recursively build left from lower half, right from upper half.
Stop when no points remain (or bucket size ≤ 1).

Balanced median builds yield height O(log n) for n points (assuming distinct coordinates in typical analysis).

Nearest Neighbor Search (Sketch)

Recursive descent: at each node, go to the child on the same side of the split as the query point.
Update best distance when visiting a point.
On unwind, if the distance from the query to the splitting plane is less than the current best distance, search the other subtree too (the best-distance sphere crosses the plane).

Pruning is the key: if the query is far from the plane compared to best distance, skip the other side.

ASCII Diagram (k = 2)

Points in plane:

  y
  |     C
  |   B   D
  | A     E
  +---------- x

First split on x (vertical line through the median point C, which becomes the root):
  left region: A, B   right region: D, E
Next split on y within each region, and so on.

Python Implementation (Build + NN Search Sketch)

from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class KDNode:
    point: Tuple[float, ...]
    axis: int
    left: Optional["KDNode"] = None
    right: Optional["KDNode"] = None

def build_kdt(points: List[Tuple[float, ...]], depth: int = 0) -> Optional[KDNode]:
    if not points:
        return None
    k = len(points[0])
    axis = depth % k
    points = sorted(points, key=lambda p: p[axis])  # sorted copy; the caller's list is not reordered
    mid = len(points) // 2
    return KDNode(
        point=points[mid],
        axis=axis,
        left=build_kdt(points[:mid], depth + 1),
        right=build_kdt(points[mid + 1 :], depth + 1),
    )

def dist2(a: Tuple[float, ...], b: Tuple[float, ...]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(node: Optional[KDNode], target: Tuple[float, ...],
            best: Tuple[float, Optional[Tuple[float, ...]]]) -> Tuple[float, Optional[Tuple[float, ...]]]:
    """Returns (best_dist2, best_point)."""
    if node is None:
        return best
    d = dist2(node.point, target)
    if d < best[0]:
        best = (d, node.point)
    axis = node.axis
    diff = target[axis] - node.point[axis]
    first, second = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(first, target, best)
    if diff * diff < best[0]:  # hyperplane might intersect best sphere
        best = nearest(second, target, best)
    return best

Production code adds tie-breaking, bucket leaves, and iterative stacks; high dimensions often use approximate NN or other structures (LSH, ball trees) because k-d tree performance can degrade.
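The introduction mentions range queries, but the code above only covers nearest neighbor. Below is a minimal sketch of an axis-aligned box query. It assumes a compact tuple layout (point, axis, left, right) instead of KDNode, and the names build and range_search are hypothetical helpers for this illustration only.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, ...]
# Hypothetical node layout for this sketch: (point, axis, left, right)
Node = Optional[tuple]


def build(points: List[Point], depth: int = 0) -> Node:
    """Median-split build on a sorted copy, cycling through the axes."""
    if not points:
        return None
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return (pts[mid], axis,
            build(pts[:mid], depth + 1),
            build(pts[mid + 1:], depth + 1))


def range_search(node: Node, lo: Point, hi: Point, out: List[Point]) -> None:
    """Collect every point p with lo[i] <= p[i] <= hi[i] on all axes."""
    if node is None:
        return
    point, axis, left, right = node
    if all(l <= c <= h for l, c, h in zip(lo, point, hi)):
        out.append(point)
    # Left subtree holds coordinates <= point[axis]:
    # skip it when the box starts strictly above the split.
    if lo[axis] <= point[axis]:
        range_search(left, lo, hi, out)
    # Right subtree holds coordinates >= point[axis]:
    # skip it when the box ends strictly below the split.
    if hi[axis] >= point[axis]:
        range_search(right, lo, hi, out)


pts = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
found: List[Point] = []
range_search(build(pts), (3, 1), (8, 5), found)
print(sorted(found))  # [(5, 4), (7, 2), (8, 1)]
```

The two comparisons against point[axis] are the pruning step: a subtree is visited only if the query box overlaps its side of the splitting plane.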

Line-by-Line Explanation

build_kdt: sorts on current axis, median as root—simple but O(n log² n) if sorting each level naively; median-of-axis via quickselect improves to O(n log n) total.
nearest: visits near child first, then optionally the far child if the splitting hyperplane is close enough.
diff * diff < best[0]: squared distance to plane along axis (exact plane distance in axis-normal direction).

Time and Space Complexity

Build: O(n log² n) naive per-level sort; O(n log n) with linear-time median partition per level.
NN query (average, low k): often O(log n) with good pruning; worst case O(n).
Space: O(n) nodes.

In high dimensions, “curse of dimensionality” makes pruning weak—many queries degrade toward linear scan.
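To see the pruning claim in practice, the sketch below counts how many nodes a nearest-neighbor query touches and checks the result against a brute-force scan. It is a self-contained illustration under assumptions: the tuple node layout, the mutable best list, and the visited counter are hypothetical choices for this example, not part of the implementation above.

```python
import random
from typing import List, Optional, Tuple

Point = Tuple[float, ...]


def build(points: List[Point], depth: int = 0) -> Optional[tuple]:
    """Median-split build; node layout is (point, axis, left, right)."""
    if not points:
        return None
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return (pts[mid], axis,
            build(pts[:mid], depth + 1),
            build(pts[mid + 1:], depth + 1))


def dist2(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(node: Optional[tuple], target: Point, best: list, visited: list) -> None:
    """best = [best_dist2, best_point]; visited = [node_count]. Both are updated in place."""
    if node is None:
        return
    point, axis, left, right = node
    visited[0] += 1
    d = dist2(point, target)
    if d < best[0]:
        best[0], best[1] = d, point
    diff = target[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    nearest(near, target, best, visited)
    if diff * diff < best[0]:  # the best sphere crosses the splitting plane
        nearest(far, target, best, visited)


random.seed(0)
pts = [(random.random(), random.random()) for _ in range(2000)]
query = (0.5, 0.5)
best = [float("inf"), None]
visited = [0]
nearest(build(pts), query, best, visited)
brute = min(dist2(p, query) for p in pts)
print(best[0] == brute, f"visited {visited[0]} of {len(pts)} nodes")
```

On uniformly random 2D data the visited count is typically far below n. Repeating the experiment with more dimensions is one way to observe the degradation described above.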

Edge Cases

Empty set: return None.
Duplicate coordinates: may need duplicate handling or bucket nodes.
All points collinear: tree still valid but splits may be degenerate.

Common Mistakes

Common Mistake: Forgetting to visit the “far” subtree when the hypersphere around the query intersects the splitting plane.

Common Mistake: Expecting O(log n) NN in high k; performance often breaks down as k grows.

Expert Tip: For static point sets and many queries, k-d tree is great. For dynamic inserts/deletes, rebalancing or alternate structures (R-tree, LSH) may be preferred.

Interview Insight: "k-d tree: binary tree partitioning k-space by cycling axes; median split balances. Nearest neighbor: recurse to near side, then if distance to splitting plane < best radius, search far side. Watch high-dimensional degradation."

Pattern Recognition

Keywords: 2D/3D nearest point, “points in rectangle,” k-NN with small k and moderate n, spatial indexing without full database engine.

Practice Problems

LeetCode 973 — K Closest Points to Origin (can use heap; try k-d tree for learning).
Implement range count: points inside axis-aligned box.
Compare runtime of brute-force vs k-d tree NN on random 2D points.

Summary

k-d tree: binary tree splitting k-dimensional space with axis-aligned planes.
Build: median on rotating axes; NN: recurse + prune across split plane.
Good for: low–moderate dimension, static or lightly dynamic point sets.
Caveat: high dimension ⇒ curse of dimensionality; consider other methods.

19.6 Persistent Segment Tree

Introduction

A persistent data structure preserves old versions after updates: you can still query “what did the array look like at time t?” or “sum elements in [L, R] in version t?” A persistent segment tree is a segment tree where each point update creates a new root and only allocates O(log n) new nodes along the path from root to leaf, reusing unchanged subtrees from the previous version. This is path copying (structural sharing).

Real-World Analogy

Think of version control: when you edit one file, Git does not duplicate the entire repository—it stores a delta or new blob and reuses unchanged trees. Persistent segment trees do the same: only the path from root to the changed leaf is “new”; sibling subtrees are shared pointers to old nodes.

Formal Definition

Concept Note: A segment tree node covers an index interval [l, r) (or [l,r] by convention). It stores an aggregate (sum, min, count, etc.). Persistent update: to set position pos, create a new root, then recursively create new nodes only along the path to the leaf containing pos; at each step the other child pointer is copied from the previous version’s node. Each version is identified by a root pointer. Space over time: O(n) nodes for the initial build plus O(log n) new nodes per update, so O(n + q log n) after q updates.

Why This Topic Matters

Offline/historical queries: “Range sum at version k” without storing full array snapshots.
Competitive programming: Classic for K-th order statistic on subarray, mergeable history problems.
Functional persistence: Illustrates immutable structure sharing clearly.

Mental Model

Normal segment tree: one tree, mutable.
Persistent: each update yields a new root; old root still valid for old queries.
Unchanged branches are literally the same nodes in memory (shared).

Brute Force → Better → Optimal

Brute Force

After each update, copy the entire array or the entire segment tree: O(n) time and space per version → O(nq) total for q updates.

Better

Store only deltas per version (difference arrays)—works for some queries but not universal.

Optimal (Persistent Segment Tree)

Path copy only: O(log n) new nodes per update, O(log n) query per version.

Step-by-Step: Persistent Point Update

Start from previous version’s root and interval [0, n).
Create a new node for the current interval; the child whose range does not contain pos is taken directly from the old node (shared, not copied).
Recurse into the half that contains pos; the other child reuses previous pointer unchanged.
At leaf, write new value and propagate sums upward in cloned nodes only.
Return new root as new version id.

ASCII Diagram (Path Copy)

Version 0 tree:          Version 1 after updating index i:

       R0                      R1 (new root)
      /  \                    /  \
    A0    B0                A1   B0   <- B subtree reused
   / \                     / \
 ... ...                 ... ...   only left spine cloned

Python Implementation (Sum, Persistent Update)

from dataclasses import dataclass
from typing import Optional, List

@dataclass
class Node:
    left: Optional["Node"]
    right: Optional["Node"]
    sum: int

def build(a: List[int], l: int, r: int) -> Node:
    if l + 1 == r:
        return Node(None, None, a[l])
    m = (l + r) // 2
    left = build(a, l, m)
    right = build(a, m, r)
    return Node(left, right, left.sum + right.sum)

def update(prev: Node, l: int, r: int, pos: int, val: int) -> Node:
    if l + 1 == r:
        return Node(None, None, val)
    m = (l + r) // 2
    if pos < m:
        new_left = update(prev.left, l, m, pos, val)
        new_right = prev.right
    else:
        new_left = prev.left
        new_right = update(prev.right, m, r, pos, val)
    return Node(new_left, new_right, new_left.sum + new_right.sum)

def query(node: Node, l: int, r: int, ql: int, qr: int) -> int:
    if qr <= l or r <= ql:
        return 0
    if ql <= l and r <= qr:
        return node.sum
    m = (l + r) // 2
    return query(node.left, l, m, ql, qr) + query(node.right, m, r, ql, qr)

build creates version 0. Each update(prev_root, …) returns a new root for the next version. The code assumes prev is a non-null node covering the same interval [l, r) at every call, which holds when all versions descend from build over the same n. Production code adds null checks and dynamic size.
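A short usage sketch may help make versions concrete. It repeats the three functions so it runs on its own; the roots list used as a version table is an assumption of this example, not something the implementation requires.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    left: Optional["Node"]
    right: Optional["Node"]
    sum: int


def build(a: List[int], l: int, r: int) -> Node:
    if l + 1 == r:
        return Node(None, None, a[l])
    m = (l + r) // 2
    left, right = build(a, l, m), build(a, m, r)
    return Node(left, right, left.sum + right.sum)


def update(prev: Node, l: int, r: int, pos: int, val: int) -> Node:
    if l + 1 == r:
        return Node(None, None, val)
    m = (l + r) // 2
    if pos < m:
        new_left, new_right = update(prev.left, l, m, pos, val), prev.right
    else:
        new_left, new_right = prev.left, update(prev.right, m, r, pos, val)
    return Node(new_left, new_right, new_left.sum + new_right.sum)


def query(node: Node, l: int, r: int, ql: int, qr: int) -> int:
    if qr <= l or r <= ql:
        return 0
    if ql <= l and r <= qr:
        return node.sum
    m = (l + r) // 2
    return query(node.left, l, m, ql, qr) + query(node.right, m, r, ql, qr)


a = [5, 2, 7, 1]
n = len(a)
roots = [build(a, 0, n)]                      # version 0
roots.append(update(roots[0], 0, n, 2, 10))   # version 1: a[2] = 10
roots.append(update(roots[1], 0, n, 0, 0))    # version 2: a[0] = 0

print([query(root, 0, n, 0, n) for root in roots])  # [15, 18, 13]
print(roots[1].left is roots[0].left)    # True: left half shared with version 0
print(roots[2].right is roots[1].right)  # True: right half shared with version 1
```

The identity checks with is show structural sharing directly: the untouched half of each new version is the same object as in the version it was derived from.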

Line-by-Line Explanation

update clones only the side containing pos; the other child is shared via reference.
sum recomputed from children in new nodes along the path.
query is identical to normal segment tree but uses the root for the chosen version.

Time and Space Complexity

Build: O(n) nodes for initial array.
Update: O(log n) time, O(log n) new nodes.
Query: O(log n) per version.
Total space after q updates: O(n + q log n) nodes stored.

Edge Cases

First version: build from initial array before persistent updates.
Out of range index: reject or define behavior.
Zero-length range: query returns 0.

Common Mistakes

Common Mistake: Mutating nodes in place—this would corrupt older versions. Always allocate new nodes along the update path.

Common Mistake: Forgetting to reuse the unchanged child pointer from prev when cloning.

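Expert Tip: For “K-th smallest in subarray,” a persistent segment tree is often built over value frequencies: version i counts the values in the prefix a[0..i), and a query subtracts the counts of two prefix versions while descending by value. A merge sort tree is a separate, non-persistent alternative for the same kind of query.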
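The prefix-version idea can be sketched as follows. This is an illustrative outline under assumptions: values are coordinate-compressed with bisect_left, roots[i] is the version covering a[:i], the query range is half-open, and all names (Node, insert, build_versions, kth_smallest) are hypothetical.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple


class Node:
    __slots__ = ("left", "right", "count")

    def __init__(self, left: Optional["Node"], right: Optional["Node"], count: int):
        self.left, self.right, self.count = left, right, count


def insert(prev: Optional[Node], l: int, r: int, pos: int) -> Node:
    """New version with one more occurrence of rank pos inside [l, r)."""
    count = (prev.count if prev else 0) + 1
    if l + 1 == r:
        return Node(None, None, count)
    m = (l + r) // 2
    pl = prev.left if prev else None
    pr = prev.right if prev else None
    if pos < m:
        return Node(insert(pl, l, m, pos), pr, count)
    return Node(pl, insert(pr, m, r, pos), count)


def build_versions(a: List[int]) -> Tuple[List[Optional[Node]], List[int]]:
    values = sorted(set(a))
    roots: List[Optional[Node]] = [None]  # roots[i] counts the values of a[:i]
    for x in a:
        roots.append(insert(roots[-1], 0, len(values), bisect_left(values, x)))
    return roots, values


def kth_smallest(roots, values, lo: int, hi: int, k: int) -> int:
    """k-th smallest (1-indexed) of a[lo:hi]; assumes 1 <= k <= hi - lo."""
    old, new = roots[lo], roots[hi]
    l, r = 0, len(values)
    while l + 1 < r:
        m = (l + r) // 2
        old_left = old.left if old else None
        new_left = new.left if new else None
        # Occurrences inside a[lo:hi] whose rank falls in the left half [l, m)
        in_left = (new_left.count if new_left else 0) - (old_left.count if old_left else 0)
        if k <= in_left:
            old, new, r = old_left, new_left, m
        else:
            k -= in_left
            old = old.right if old else None
            new = new.right if new else None
            l = m
    return values[l]


a = [3, 1, 4, 1, 5, 9, 2, 6]
roots, values = build_versions(a)
print(kth_smallest(roots, values, 2, 6, 2))  # 4, second smallest of [4, 1, 5, 9]
```

Each insert allocates O(log n) nodes, so all n prefix versions together use O(n log n) nodes, and each query walks a single root-to-leaf path.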

Interview Insight: "Persistent segtree: each update clones O(log n) nodes on path to leaf; other subtrees shared. New root = new version. Query old version by old root pointer. Space O(n + q log n), time O(log n) per op."

Pattern Recognition

Problems asking for range queries on historical array states, K-th number in subarray, or “if we only changed one element per step, answer across timeline” often map to persistent segment tree or related persistent structures (Fenwick with copies is not standard—segment tree persistence is the usual tool).

Practice Problems

Maintain versions of an array with point updates; answer range sum for arbitrary past version.
K-th smallest value in subarray L..R (persistent segment tree over values, sometimes called a chairman tree; a merge sort tree is a non-persistent alternative).
Compare memory of full copy vs persistent after many updates.

Summary

Persistent segment tree: immutable path copying on update; shared unchanged subtrees.
Version = root pointer; update O(log n) time and nodes.
Use for: historical range queries and advanced order-statistics on subarrays.
Never mutate old nodes in place when older versions must remain valid.

19.7 Rope Data Structure

Introduction

A rope is a tree-based string data structure optimized for very large text and frequent edits (insert, delete, split, concatenate) at arbitrary positions. Instead of storing one giant contiguous string (where middle edits can be expensive due to shifting), a rope stores text in leaves and keeps internal nodes with subtree length metadata. This gives near O(log n) edits on average for balanced ropes.

Real-World Analogy

Think of a book made of many pages clipped together. Inserting text near page 500 does not require rewriting all pages after it; you split one page and insert new pages, then reconnect. A rope is that “editable pages” model for strings.

Formal Definition

Concept Note: A rope is a balanced binary tree where:

Leaves hold short string chunks.
Each internal node stores weight = total length of left subtree (or full subtree size by variant).
In-order traversal of leaves gives the full string.

Fundamental operations are split and concatenate; insert/delete/substring are built from them.

Why This Topic Matters

Text editors: Classic structure for efficient mutable text buffers.
Large strings: Avoids O(n) copying for every middle insertion/deletion in long documents.
Tree-operation pattern: Reinforces split/merge ideas from treaps and persistent structures.

Mental Model

String is represented as concatenation of leaf chunks.
Internal weights help route by index (like order-statistics in trees).
To edit at position i, split around i, modify middle, then concatenate back.

Core Operations

1) Index / Character Access

Compare index i with node weight. If i < weight, go left; else go right with i - weight. Reach leaf and index into chunk.
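The routing rule above can be sketched in a few lines. This is a minimal illustration, assuming the convention that an internal node's weight is the length of its left subtree; leaf, join, and char_at are hypothetical helper names, and join recomputes lengths naively for brevity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RopeNode:
    left: Optional["RopeNode"] = None
    right: Optional["RopeNode"] = None
    text: str = ""
    weight: int = 0  # leaf: len(text); internal: total length of the left subtree


def leaf(s: str) -> RopeNode:
    return RopeNode(text=s, weight=len(s))


def length(node: Optional[RopeNode]) -> int:
    """Total length; O(size of subtree) here, real ropes cache this value."""
    if node is None:
        return 0
    if node.left is None and node.right is None:
        return len(node.text)
    return length(node.left) + length(node.right)


def join(a: RopeNode, b: RopeNode) -> RopeNode:
    return RopeNode(left=a, right=b, weight=length(a))


def char_at(node: RopeNode, i: int) -> str:
    """Character at index i; assumes 0 <= i < total length."""
    while node.left is not None or node.right is not None:
        if i < node.weight:
            node = node.left        # index lies in the left subtree
        else:
            i -= node.weight        # skip the left subtree's characters
            node = node.right
    return node.text[i]


rope = join(join(leaf("He"), leaf("llo")), join(leaf("World"), leaf("!!!")))
print(char_at(rope, 0), char_at(rope, 7), char_at(rope, 12))  # H r !
```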

2) Split(rope, i)

Returns two ropes: left contains first i characters, right contains the rest.

3) Concat(a, b)

Create parent whose left = a, right = b, and recompute weight/size. Rebalance if needed.

4) Insert/Delete/Substr

Insert at i: split at i -> concat(left, new_text_rope) -> concat(result, right).
Delete [l, r): split at l, split second part at (r-l), drop middle, concat remaining.
Substring [l, r): same split pattern but keep middle.

ASCII Diagram

Rope for "HelloWorld!!!" (chunked):

             [len=13]
             /      \
        [len=5]    [len=8]
        /    \      /    \
    "He"   "llo" "World" "!!!"

In-order leaves -> "He" + "llo" + "World" + "!!!"

Python Sketch (Conceptual Rope Node)

from dataclasses import dataclass
from typing import Optional

@dataclass
class RopeNode:
    left: Optional["RopeNode"] = None
    right: Optional["RopeNode"] = None
    text: str = ""            # non-empty only for leaves in this sketch
    weight: int = 0           # length of left subtree (or len(text) for leaf convention)
    total_len: int = 0

def recalc(node: Optional[RopeNode]) -> int:
    if node is None:
        return 0
    if node.left is None and node.right is None:
        node.weight = len(node.text)
        node.total_len = len(node.text)
    else:
        left_len = recalc(node.left)
        right_len = recalc(node.right)
        node.weight = left_len
        node.total_len = left_len + right_len
    return node.total_len

Production ropes include balancing (often via AVL/Red-Black/Treap-like priorities), leaf-size constraints, and efficient split/concat implementations.

Step-by-Step Example: Insert

Insert "XYZ" into "abcdef" at position 3:

Split at 3 -> left "abc", right "def".
Create rope for "XYZ".
Concat(left, new) -> "abcXYZ".
Concat(result, right) -> "abcXYZdef".
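The four steps above can be reproduced with a small, unbalanced rope. This is a sketch under assumptions: nodes are treated as immutable, no rebalancing or leaf-size control is done, and the names Rope, len_of, concat, split, insert, delete, and flatten are hypothetical.

```python
from typing import Optional, Tuple


class Rope:
    """A leaf holds text; an internal node holds two non-empty children."""

    def __init__(self, left: Optional["Rope"] = None,
                 right: Optional["Rope"] = None, text: str = ""):
        self.left, self.right, self.text = left, right, text
        is_leaf = left is None and right is None
        self.length = len(text) if is_leaf else len_of(left) + len_of(right)


def len_of(node: Optional[Rope]) -> int:
    return node.length if node else 0


def concat(a: Optional[Rope], b: Optional[Rope]) -> Optional[Rope]:
    if a is None or a.length == 0:
        return b
    if b is None or b.length == 0:
        return a
    return Rope(a, b)  # no rebalancing in this sketch


def split(node: Optional[Rope], i: int) -> Tuple[Optional[Rope], Optional[Rope]]:
    """Return (first i characters, the rest); assumes 0 <= i <= length."""
    if node is None:
        return None, None
    if node.left is None and node.right is None:
        head = Rope(text=node.text[:i]) if i > 0 else None
        tail = Rope(text=node.text[i:]) if i < node.length else None
        return head, tail
    left_len = len_of(node.left)
    if i < left_len:
        a, b = split(node.left, i)
        return a, concat(b, node.right)
    a, b = split(node.right, i - left_len)
    return concat(node.left, a), b


def insert(node: Optional[Rope], i: int, s: str) -> Optional[Rope]:
    left, right = split(node, i)
    return concat(concat(left, Rope(text=s)), right)


def delete(node: Optional[Rope], l: int, r: int) -> Optional[Rope]:
    left, rest = split(node, l)
    _, right = split(rest, r - l)
    return concat(left, right)


def flatten(node: Optional[Rope]) -> str:
    if node is None:
        return ""
    if node.left is None and node.right is None:
        return node.text
    return flatten(node.left) + flatten(node.right)


base = Rope(text="abcdef")
edited = insert(base, 3, "XYZ")
print(flatten(edited))                # abcXYZdef
print(flatten(delete(edited, 3, 6)))  # abcdef
print(flatten(base))                  # abcdef (original rope untouched)
```

Because split and concat only create new nodes and never modify existing ones, the original rope stays valid, which is also what makes versioned ropes possible.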

Brute Force → Better → Optimal

Brute Force

Plain immutable string edits in middle repeatedly: each edit may copy O(n) characters.

Better

Gap buffer/piece table (editor-specific alternatives) reduce some costs depending on edit patterns.

Optimal (for tree-based random-position edits)

Rope with balanced tree gives O(log n) navigation/split/concat plus chunk-size effects, making large-document random edits practical.

Time Complexity (Balanced Rope)

Index access: O(log n)
Split/concat: O(log n)
Insert/delete: O(log n + k), where k is the number of characters copied while building the inserted text's leaves or splitting leaf chunks
Flatten to full string: O(n)

Without balancing, rope height can degrade and operations become slower (up to O(n) in worst case).

Space Complexity

O(n) for characters plus tree node overhead.
Persistent/versioned variants can share unchanged subtrees similarly to persistent trees.

Edge Cases

Insert at 0 or end: reduces to prepend/append concat.
Delete empty range: no change.
Very small text: plain string may be faster due to lower constants.

Common Mistakes

Common Mistake: Forgetting to update weight/length metadata after structural changes.

Common Mistake: Never rebalancing; repeated concatenations can create skewed trees and slow operations.

Expert Tip: Keep leaf chunks within a target size range (e.g. 256–1024 chars). This improves cache locality and limits tree height/churn for typical editor operations.

Interview Insight: "Rope stores text as balanced tree of chunks. Split and concat are primitives; insert/delete are compositions. Middle edits become O(log n) instead of O(n) full-string shifts in large documents."

Pattern Recognition

If requirements include huge mutable text, many random-position edits, undo/versioning, or fast substring operations, rope/piece-table/gap-buffer discussion is expected. Rope is strong for tree-based split/merge workloads.

Practice Problems

Implement split and concat for a simplified rope with metadata updates.
Build insert/delete using only split + concat.
Benchmark repeated middle insertions: plain string vs rope-style chunk tree.

Summary

Rope: balanced tree of string chunks with length metadata.
Core idea: split + concat compose efficient text edits.
Best for: large, mutable text with frequent middle edits.
Requires metadata maintenance and balancing for promised performance.

20.1 Meet in the Middle

Introduction

Meet in the Middle (MITM) is a powerful algorithmic strategy used when brute force is too slow, but classic dynamic programming is not feasible because constraints are large or values are huge. The core trick is simple: instead of solving a problem over n elements at once, split it into two halves of roughly n/2 each, solve each half separately, and then combine the results efficiently.

For beginners, this pattern can feel magical at first. But once you see the complexity math, it becomes one of the most practical tools for interview and competitive programming problems involving subsets, sums, or combinations.

Real-World Analogy

Imagine trying to open a safe where the code has 8 digits. Brute force means trying all 10^8 possibilities. Instead, suppose the check can be split so that the left 4 digits and the right 4 digits each produce a partial result, and the safe opens when the two partial results match. You precompute all 10^4 left results and all 10^4 right results, then look for a matching pair, which is far less work than 10^8 full attempts. You did not reduce correctness, only restructured the search.

Formal Definition

Meet in the Middle is a divide-and-combine strategy where the search space is partitioned into two parts, all possibilities of each part are enumerated, and a fast merge/search technique (sorting + binary search, hashing, two pointers, etc.) is used to produce the final answer.

Concept Note: MITM is most useful when direct enumeration is O(2^n), and splitting gives roughly O(2^(n/2)) work per side plus merge overhead.

Why This Topic Matters

Transforms impossible brute force into feasible solutions for n around 30 to 45.
Appears in subset-sum style interview questions when constraints block standard DP table methods.
Builds strong pattern recognition for balancing time and memory in exponential problems.
Teaches a reusable design principle: precompute partial state, then merge smartly.

Mental Model

All subsets of n elements:

2^n combinations (too many)
        |
        v
Split into two halves:
Left half (n/2)      Right half (n/2)
2^(n/2) subsets      2^(n/2) subsets
        |                    |
        +------ combine via search -------+
                       |
                       v
                 final answer

The key benefit is that 2^(n/2) is dramatically smaller than 2^n. For example, if n = 40, brute force is about 1 trillion subsets, while MITM uses about 1 million subsets per side.

Evolution: Brute Force → Better → Optimal

Brute Force

Enumerate every subset of all n elements and check condition.

Time: O(2^n) (usually too slow for n ~ 40)
Space: small if streamed, but time kills feasibility.

Better

Use pruning/backtracking if constraints allow monotonic cuts. This helps some cases, but worst case can still be near 2^n.

Optimal (for medium n exponential search problems)

Use MITM: split array into 2 halves, generate all subset results for each half, then combine using binary search/hash/two pointers.

Time often near O(2^(n/2) * poly(n))
Space often O(2^(n/2))

Optimization Insight: MITM trades memory for massive speedup. You store many partial results so merge operations avoid redoing exponential work.
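The merge step can use two pointers instead of binary search. The sketch below shows that variant for the "maximum subset sum at most a limit" problem. The names subset_sums and max_sum_leq_two_pointers are hypothetical, and returning None when no subset qualifies is a choice made for this example.

```python
from typing import List, Optional


def subset_sums(nums: List[int]) -> List[int]:
    """All 2^k subset sums of nums (the empty subset contributes 0)."""
    sums = [0]
    for x in nums:
        sums += [s + x for s in sums]
    return sums


def max_sum_leq_two_pointers(arr: List[int], limit: int) -> Optional[int]:
    """Largest subset sum <= limit, or None if no subset qualifies."""
    mid = len(arr) // 2
    left = sorted(subset_sums(arr[:mid]))
    right = sorted(subset_sums(arr[mid:]))
    best: Optional[int] = None
    j = len(right) - 1
    for x in left:  # ascending, so the allowed bound limit - x only shrinks
        while j >= 0 and x + right[j] > limit:
            j -= 1
        if j < 0:
            break   # every larger x would fail as well
        candidate = x + right[j]
        if best is None or candidate > best:
            best = candidate
    return best


print(max_sum_leq_two_pointers([3, 34, 4, 12, 5, 2], 10))  # 10
```

After both lists are sorted, the merge itself is linear in their combined size because j only ever moves left. The sort still dominates, so the overall bound stays around O(2^(n/2) * n).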

Step-by-Step Breakdown (Classic Subset Sum: Max Sum ≤ S)

Problem: Given an array arr and a limit S, find the maximum subset sum that is less than or equal to S.

Split arr into left and right.
Generate all subset sums of left, store in list L.
Generate all subset sums of right, store in list R.
Sort R.
For each value x in L, we need the largest y in R such that x + y ≤ S.
Use binary search on R to find this best y.
Track the global maximum x + y.

ASCII Diagram

arr = [3, 34, 4, 12, 5, 2], S = 10

Split:
left  = [3, 34, 4]
right = [12, 5, 2]

L = all subset sums(left)  -> [0, 3, 34, 37, 4, 7, 38, 41]
R = all subset sums(right) -> [0, 12, 5, 17, 2, 14, 7, 19]
Sort R -> [0, 2, 5, 7, 12, 14, 17, 19]

For each x in L:
  find largest y <= S - x in R (binary search)
  candidate = x + y
best candidate <= S is answer

Python Implementation

from bisect import bisect_right
from typing import List


def all_subset_sums(nums: List[int]) -> List[int]:
    """
    Returns list of sums of all subsets of nums.
    If len(nums) = k, result size is 2^k.
    """
    sums = [0]
    for num in nums:
        new_sums = []
        for current in sums:
            new_sums.append(current + num)
        sums.extend(new_sums)
    return sums


def max_subset_sum_leq_s(arr: List[int], s: int) -> int:
    """
    Meet in the Middle solution.
    Returns max subset sum <= s.
    """
    n = len(arr)
    mid = n // 2
    left = arr[:mid]
    right = arr[mid:]

    left_sums = all_subset_sums(left)
    right_sums = all_subset_sums(right)
    right_sums.sort()

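    best = 0  # empty subset; assumes s >= 0 so that a sum of 0 is feasible

    for x in left_sums:
        # Need maximum y such that x + y <= s.
        # No early skip when x > s: with negative values a right-half sum
        # could still bring the total back under s. For non-negative input
        # the binary search simply finds nothing (idx == -1).
        target = s - x
        idx = bisect_right(right_sums, target) - 1
        if idx >= 0:
            best = max(best, x + right_sums[idx])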

    return best


# Example
if __name__ == "__main__":
    arr = [3, 34, 4, 12, 5, 2]
    s = 10
    print(max_subset_sum_leq_s(arr, s))  # 10

Line-by-Line Explanation

Function `all_subset_sums`

Start with [0] because empty subset sum is always 0.
For each number, create new sums by adding it to every existing sum.
Append those new sums to original list; count doubles each step.
After k numbers, there are exactly 2^k sums.

Function `max_subset_sum_leq_s`

Split array into two halves to reduce exponent from n to n/2.
Compute all subset sums for both halves.
Sort right sums so we can binary search quickly.
For each left sum x, find best right sum y ≤ s - x.
Use bisect_right to get index of last valid y.
Update answer with x + y.

Time Complexity

Let left size be n1 and right size be n2, with n1 + n2 = n and usually n1 ≈ n2 ≈ n/2.

Generate left_sums: O(2^n1)
Generate right_sums: O(2^n2)
Sort right_sums: O(2^n2 log(2^n2)) = O(2^n2 * n2)
Loop over left_sums and binary search each: O(2^n1 * log(2^n2)) = O(2^n1 * n2)

When halves are equal, complexity is roughly O(2^(n/2) * n), far better than O(2^n).

Space Complexity

left_sums stores 2^n1 values.
right_sums stores 2^n2 values.
Total: O(2^(n/2)) when split evenly.

Edge Cases

Empty array: answer is 0 (empty subset).
All numbers greater than S: answer may still be 0 if empty subset allowed.
Negative numbers present: MITM still works, but shortcuts that assume non-negative sums (such as skipping every left sum x > S, or treating 0 as always feasible when S < 0) are no longer valid.
Large duplicates: valid; no special correctness issue, only many repeated sums.

Common Mistakes

Common Mistake: Forgetting to include empty subset sum 0; this can break correctness for small S.

Common Mistake: Using linear scan instead of binary search during merge; this can degrade performance badly.

Common Mistake: Splitting unevenly in a way that creates very different exponent sizes; keep halves close for best performance.

Pattern Recognition

Suspect MITM when you see:

n around 30 to 45 (too large to enumerate all 2^n subsets, but 2^(n/2) per half is still affordable).
Subset/combinational decisions (pick or skip each element).
Need to optimize sum/count/closest value by combining partial results.
Constraints where DP by sum is impossible (sum values too large).

Interview Insight

Interview Insight: When you propose MITM, explicitly mention the exponent drop: "I am reducing search from 2^n to roughly 2 * 2^(n/2) plus merge. That makes n ~ 40 practical." Interviewers look for this complexity-driven reasoning, not just the final code.

Practice Problems

Maximum subset sum ≤ S (classic MITM foundation).
Count subsets with sum in range [A, B] using sorted list + binary search bounds.
Closest subsequence sum (find subset sum closest to target).
Partition with minimum difference for moderate n and large values.

Example: If constraints say n = 40 and values up to 10^12, DP by sum is impossible, and 2^40 brute force is too big. This is the textbook signal for Meet in the Middle.
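As a starting point for the counting practice problem, here is a minimal sketch that counts subsets whose sum lies in [A, B] using two binary-search bounds per left sum. The function names are hypothetical; subsets are counted by which positions are chosen, so equal values at different positions count separately, and the empty subset is included.

```python
from bisect import bisect_left, bisect_right
from typing import List


def subset_sums(nums: List[int]) -> List[int]:
    """All 2^k subset sums of nums (the empty subset contributes 0)."""
    sums = [0]
    for x in nums:
        sums += [s + x for s in sums]
    return sums


def count_subsets_in_range(arr: List[int], a: int, b: int) -> int:
    """Number of subsets (by chosen positions) whose sum s satisfies a <= s <= b."""
    mid = len(arr) // 2
    left = subset_sums(arr[:mid])
    right = sorted(subset_sums(arr[mid:]))
    total = 0
    for x in left:
        lo = bisect_left(right, a - x)    # first y with x + y >= a
        hi = bisect_right(right, b - x)   # one past the last y with x + y <= b
        total += hi - lo
    return total


print(count_subsets_in_range([1, 2, 3], 3, 4))  # 3: {3}, {1, 2}, {1, 3}
```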

Summary

MITM splits one exponential search into two smaller exponential parts.
Main gain is exponent reduction from n to n/2.
Typical flow: split -> generate all half-results -> sort/hash -> merge by search.
Best suited for subset-style problems with medium n and large value ranges.
In interviews, always justify MITM with complexity math.

20.2 Mo's Algorithm

Introduction

Mo's Algorithm is an offline query optimization technique for range queries on arrays. It is used when you are asked to answer many queries like "compute something on subarray [L, R]" and a direct approach per query is too slow. Instead of solving each query from scratch, Mo's Algorithm reorders queries so that the current window changes only a little from one query to the next.

For beginners, think of it as "smart scheduling of queries." The data does not change, queries are known in advance, and we process them in an order that minimizes repeated work.

Real-World Analogy

Suppose you manage a camera sliding on a rail to inspect different segments of a long pipeline. If requests come in random order, the camera keeps jumping back and forth, wasting time. If you reorder requests by nearby segments, camera movement is reduced. Mo's Algorithm does exactly this for the query window boundaries L and R.

Formal Definition

Mo's Algorithm is an offline algorithm that sorts range queries in block order so that a current interval can be adjusted incrementally using add() and remove() operations, reducing total complexity compared to recomputing each query independently.

Concept Note: Mo's Algorithm requires queries to be known beforehand (offline). It does not naturally handle interleaved updates unless extended variants are used.

Why This Topic Matters

Solves many query problems where O(q * n) is too slow.
Teaches a powerful engineering idea: reduce total work by changing processing order.
Frequently appears in coding contests and advanced interview rounds.
Builds intuition for pointer movement costs and amortized analysis.

Mental Model

At any time, maintain one active range [curL, curR]. For next query [L, R], move boundaries step by step:

If curL > L, decrement curL and add(arr[curL]).
If curR < R, increment curR and add(arr[curR]).
If curL < L, remove(arr[curL]) then increment curL.
If curR > R, remove(arr[curR]) then decrement curR.

Current Window: [curL........curR]
Target  Window:     [L..........R]

Move left and right ends gradually.
Each movement triggers add/remove in O(1) (or near O(1)).

Evolution: Brute Force → Better → Optimal

Brute Force

For each query, scan from L to R and compute answer.

Per query: up to O(n)
Total: O(q * n)

Better

Prefix sums help only for additive/invertible functions (like sum). But for richer metrics (distinct count, frequency-based power, mode-like conditions), prefix sum may not work.
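For contrast, this is what the prefix-sum approach looks like when the function is additive. It is a small sketch with hypothetical names build_prefix and range_sum, and inclusive [l, r] ranges to match the queries in this section.

```python
from itertools import accumulate
from typing import List


def build_prefix(arr: List[int]) -> List[int]:
    """prefix[i] = sum of arr[0..i-1], so prefix has len(arr) + 1 entries."""
    return [0] + list(accumulate(arr))


def range_sum(prefix: List[int], l: int, r: int) -> int:
    """Sum of arr[l..r], inclusive, in O(1)."""
    return prefix[r + 1] - prefix[l]


arr = [1, 1, 2, 1, 3, 4, 2, 3]
prefix = build_prefix(arr)
print(range_sum(prefix, 0, 4), range_sum(prefix, 2, 6))  # 8 12
```

The subtraction works because sum has an inverse. A distinct count has no such inverse: knowing the distinct counts of two prefixes does not determine the distinct count of the range between them.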

Optimal (for static offline range queries with local updateable state)

Use Mo's Algorithm with block decomposition and incremental add/remove updates.

Typical total complexity: O((n + q) * sqrt(n) * F), where F is cost of one add/remove.
When F = O(1), this is usually around O((n + q) * sqrt(n)).

Optimization Insight: You are not optimizing one query; you are optimizing total pointer movement across all queries.

Step-by-Step Breakdown (Count Distinct in Range)

Problem: Given array arr and queries [L, R], return number of distinct elements in each range.

Choose block size B = int(sqrt(n)).
Represent each query as (L, R, index) to restore output order later.
Sort queries by (L // B, R) (with optional odd-even R optimization).
Maintain:
- freq[value] = count of value in current window
- distinct = number of values with frequency > 0
Adjust current window to each query using add/remove boundary moves.
Store distinct as answer for that query's original index.

ASCII Diagram

arr index:  0 1 2 3 4 5 6 7
arr value: [1,1,2,1,3,4,2,3]

Queries:
Q0: [0,4]
Q1: [1,3]
Q2: [2,6]
Q3: [4,7]

After Mo sorting, process in movement-friendly order.

Window moves:
Start: empty
-> expand to Q?
-> shrink/expand slightly to next Q
-> ...
Instead of recomputing counts from scratch each time.

Python Implementation

from math import isqrt
from typing import List, Tuple


def mos_distinct_count(arr: List[int], queries: List[Tuple[int, int]]) -> List[int]:
    """
    Returns distinct element count for each query [L, R] (inclusive)
    using Mo's Algorithm.
    """
    n = len(arr)
    q = len(queries)
    if n == 0:
        return [0] * q

    block = max(1, isqrt(n))

    indexed_queries = []
    for i, (l, r) in enumerate(queries):
        indexed_queries.append((l, r, i))

    # Standard Mo ordering with odd-even optimization on R
    indexed_queries.sort(
        key=lambda x: (
            x[0] // block,
            x[1] if ((x[0] // block) % 2 == 0) else -x[1]
        )
    )

    freq = {}
    answers = [0] * q
    distinct = 0

    cur_l, cur_r = 0, -1  # empty window

    def add(value: int) -> None:
        nonlocal distinct
        old = freq.get(value, 0)
        freq[value] = old + 1
        if old == 0:
            distinct += 1

    def remove(value: int) -> None:
        nonlocal distinct
        old = freq[value]
        if old == 1:
            distinct -= 1
            del freq[value]
        else:
            freq[value] = old - 1

    for l, r, idx in indexed_queries:
        while cur_l > l:
            cur_l -= 1
            add(arr[cur_l])

        while cur_r < r:
            cur_r += 1
            add(arr[cur_r])

        while cur_l < l:
            remove(arr[cur_l])
            cur_l += 1

        while cur_r > r:
            remove(arr[cur_r])
            cur_r -= 1

        answers[idx] = distinct

    return answers


if __name__ == "__main__":
    arr = [1, 1, 2, 1, 3, 4, 2, 3]
    queries = [(0, 4), (1, 3), (2, 6), (4, 7)]
    print(mos_distinct_count(arr, queries))  # [3, 2, 4, 3]

Line-by-Line Explanation

Block Size and Query Ordering

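block = isqrt(n) (about sqrt(n)) balances the number of blocks against the pointer movement inside each block.
Sort by the block of L first, then by R within a block, so L stays inside one block while R sweeps in a single direction.
The odd-even trick reverses the direction of R in every other block, so R does not jump back to the start at each block change.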
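
To make the ordering concrete, the sketch below applies the same sort key to the four demo queries. It assumes n = 8 (the length of the demo array in the implementation above), which gives block = isqrt(8) = 2. The helper name mo_order is hypothetical and exists only for this illustration.

```python
from math import isqrt
from typing import List, Tuple


def mo_order(n: int, queries: List[Tuple[int, int]]) -> List[int]:
    """Return query indices in Mo order (hypothetical helper for illustration)."""
    block = max(1, isqrt(n))

    def key(i: int) -> Tuple[int, int]:
        l, r = queries[i]
        b = l // block
        # Even blocks sweep R upward, odd blocks sweep R downward.
        return (b, r if b % 2 == 0 else -r)

    return sorted(range(len(queries)), key=key)


queries = [(0, 4), (1, 3), (2, 6), (4, 7)]
print(mo_order(8, queries))  # [1, 0, 2, 3]
```

Under these assumptions the window visits [1, 3], then [0, 4], then [2, 6], then [4, 7], so each transition moves the pointers only a few steps.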

State Maintenance

freq dictionary tracks frequency of each element in current window.
distinct is the live answer for current window.
add increases frequency and updates distinct when frequency becomes 1.
remove decreases frequency and updates distinct when frequency becomes 0.

Pointer Movement

Expand/shrink window one step at a time until current range equals query range.
Every pointer step performs exactly one add or remove.
Once aligned, current distinct is answer for that query.

Additional Worked Example

Example: For arr = [5, 5, 6, 7, 5], query [1, 4] includes values [5, 6, 7, 5]. Frequencies become {5: 2, 6: 1, 7: 1}, so distinct count is 3. If next query is [2, 4], you remove leftmost 5; frequencies become {5: 1, 6: 1, 7: 1}, distinct stays 3. This shows how one update can reuse nearly all previous work.
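
The same transition can be replayed in a few lines. This is a minimal sketch that uses collections.Counter in place of the freq dictionary from the implementation; the variable names are chosen for illustration only.

```python
from collections import Counter

arr = [5, 5, 6, 7, 5]

# Window for query [1, 4] (inclusive), built once
freq = Counter(arr[1:5])
distinct = len(freq)
print(dict(freq), distinct)  # {5: 2, 6: 1, 7: 1} 3

# Move to query [2, 4]: only arr[1] leaves the window
freq[arr[1]] -= 1
if freq[arr[1]] == 0:
    del freq[arr[1]]
    distinct -= 1
print(dict(freq), distinct)  # {5: 1, 6: 1, 7: 1} 3
```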

Time Complexity

Let n be array size and q number of queries.

Sorting queries: O(q log q).
Pointer movement: O((n + q) * sqrt(n)) steps in the standard analysis. L moves at most about sqrt(n) per query, and R sweeps at most n positions per block of L.
If each add/remove is O(1), total time is O(q log q + (n + q) * sqrt(n)).
If add/remove costs F, multiply the movement term by F.

Compared to brute force O(q * n), this is a major improvement for large q.

Space Complexity

Query storage: O(q).
Frequency map/array: O(U) where U is number of distinct values in active domain.
Answer array: O(q).

Edge Cases

Single-element query (L == R): the window shrinks or grows to one element and the answer is 1.
All values same: distinct is always 1 for non-empty range.
Large value domain: use dictionary or coordinate compression.
Empty array: guard and return zeros for all queries.

Common Mistakes

Common Mistake: Treating ranges as half-open in one place and inclusive in another, causing off-by-one errors.

Common Mistake: Forgetting to store original query index, leading to answers returned in sorted-order instead of input-order.

Common Mistake: Writing expensive add/remove logic; Mo's succeeds only when per-step updates are cheap.

When Mo's Algorithm Is Not Ideal

Online queries that must be answered immediately in given order.
Problems with frequent point/range updates unless using advanced Mo-with-updates variant.
Query functions that are hard to maintain incrementally on add/remove.

Pattern Recognition

Think "Mo's Algorithm" when:

You have many static array range queries.
Direct per-query recomputation is too slow.
You can maintain answer with local add/remove operations.
Queries can be processed offline.

Interview Insight

Interview Insight: Say this explicitly: "I will process queries offline in Mo order and maintain a sliding window with O(1) add/remove. This avoids O(length) recomputation per query." This demonstrates both algorithm knowledge and implementation strategy.

Practice Problems

Count distinct numbers in each range.
Frequency of value x in each range (first for one fixed x, then for a different x per query).
Range power metric like sum(freq[v]^2 * v) using update formulas.
Number of pairs with equal values inside each query range.

Expert Tip: Start by implementing Mo for distinct count first. Once that is stable, adapt only the add/remove formulas for new query functions.
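
As an illustration of swapping only the add/remove formulas, here is a sketch for the range power metric sum(freq[v]^2 * v) from the practice list. The class name PowerWindow is hypothetical; in a full solution these two methods would replace add and remove in the Mo loop, and the answer would be read from power.

```python
from collections import defaultdict
from typing import Dict


class PowerWindow:
    """Maintains sum(freq[v]**2 * v) under single-element add/remove."""

    def __init__(self) -> None:
        self.freq: Dict[int, int] = defaultdict(int)
        self.power = 0

    def add(self, v: int) -> None:
        c = self.freq[v]
        # (c + 1)^2 - c^2 = 2c + 1
        self.power += (2 * c + 1) * v
        self.freq[v] = c + 1

    def remove(self, v: int) -> None:
        c = self.freq[v]
        # c^2 - (c - 1)^2 = 2c - 1
        self.power -= (2 * c - 1) * v
        self.freq[v] = c - 1


w = PowerWindow()
for v in [1, 1, 2, 1]:
    w.add(v)
print(w.power)  # 11  (3*3*1 + 1*1*2)
w.remove(1)
print(w.power)  # 6   (2*2*1 + 1*1*2)
```

Both updates are O(1), which is the property Mo's Algorithm depends on.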

Summary

Mo's Algorithm is an offline range-query optimization via smart query ordering.
Core mechanism is incremental window adjustment with add/remove hooks.
Best for static arrays with many range queries and cheap local updates.
Typical complexity is around O((n + q) * sqrt(n)) when updates are O(1).
It is a pattern for minimizing total work, not just speeding one query.

20.3 Square Root Decomposition

Introduction

Square Root Decomposition is a technique to speed up range queries (and sometimes updates) by dividing an array into blocks of size about sqrt(n). Instead of scanning every element for every query, you precompute block-level summaries, then answer a query using a mix of full blocks and small leftover parts.

For beginners, this is one of the best "bridge topics" between brute force and advanced trees (Fenwick Tree / Segment Tree). It teaches how to trade preprocessing and memory for faster repeated operations.

Real-World Analogy

Think of a warehouse inventory stored shelf-by-shelf. If someone asks for total items between shelf 12 and shelf 487, counting each item one by one is slow. If you already know the total for each shelf section, you can add full section totals quickly and only manually count the partial start/end sections.

Formal Definition

Square Root Decomposition partitions an array of size n into ~sqrt(n) blocks, each of size ~sqrt(n), and stores summary information per block (sum/min/max/count, depending on problem) to process operations faster than naive scanning.

Concept Note: The "sqrt" choice balances two costs: number of blocks touched and work inside a block. This balance often leads to O(sqrt(n)) per query/update.
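
The balance can be checked numerically. The sketch below uses a deliberately rough cost model (number of blocks plus one block's worth of element scans); the function names and the model itself are assumptions made for illustration, not a precise operation count.

```python
def query_cost(n: int, block: int) -> int:
    """Rough cost model: blocks touched plus elements scanned in partial blocks."""
    blocks_touched = -(-n // block)  # ceil(n / block)
    return blocks_touched + block


def best_block(n: int) -> int:
    """Block size that minimizes the rough cost model."""
    return min(range(1, n + 1), key=lambda b: query_cost(n, b))


n = 10_000
for b in (10, 100, 1000):
    print(b, query_cost(n, b))  # prints 10 1010, then 100 200, then 1000 1010
print(best_block(n))            # 100
```

Blocks that are too small make the block loop long; blocks that are too large make the boundary scans long. The minimum sits at sqrt(n).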

Why This Topic Matters

Simple to implement compared to segment trees.
Gives strong intuition for block-based optimization strategies.
Works well for medium constraints and static/semi-dynamic arrays.
Frequently appears in interviews as a stepping stone to advanced structures.

Mental Model

Array of size n
|----block0----|----block1----|----block2----| ... |
     ~sqrt(n)       ~sqrt(n)       ~sqrt(n)

Query [L, R]:
1) Consume left partial block element-by-element
2) Consume full middle blocks using precomputed summary
3) Consume right partial block element-by-element

You do small manual work at the boundaries and fast aggregated work in the middle.

Evolution: Brute Force → Better → Optimal

Brute Force

For each range query, iterate from L to R.

Range sum query: O(n) worst case per query.
With many queries, total can become very large.

Better

Prefix sums solve static range sum queries in O(1) but do not handle point updates efficiently (O(n) rebuild if naive). So for mixed queries + updates, prefix-only approach is limited.

Optimal (for this pattern and simplicity target)

Use square root decomposition with block sums.

Point update: O(1) (adjust one block summary)
Range sum query: O(sqrt(n))

Optimization Insight: You compress repeated work into block summaries once, then reuse that work across all future queries.

Step-by-Step Breakdown (Range Sum + Point Update)

Choose block size B = ceil(sqrt(n)).
Create array block_sum where block_sum[i] stores sum of block i.
Build: iterate array once and add each value to its block sum.
Point update index idx to new_val:
- Find block b = idx // B.
- Adjust block_sum[b] += (new_val - arr[idx]).
- Update arr[idx] = new_val.
Range query [L, R]:
- If within same block, direct iterate.
- Else process left partial block, then full middle blocks using block_sum, then right partial block.

ASCII Diagram

n = 16, B = 4
Index:   0  1  2  3 | 4  5  6  7 | 8  9 10 11 | 12 13 14 15
Block:   0  0  0  0 | 1  1  1  1 | 2  2  2  2 |  3  3  3  3

Query [2, 13]
Left partial:  indices 2..3
Full blocks:   block 1, block 2 (use block_sum directly)
Right partial: indices 12..13

Python Implementation

from math import isqrt
from typing import List


class SqrtDecompositionRangeSum:
    """
    Supports:
      - point update: arr[idx] = value
      - range sum query: sum(arr[l:r+1])
    """
    def __init__(self, arr: List[int]) -> None:
        self.n = len(arr)
        self.arr = arr[:]  # keep internal copy
        self.block_size = max(1, isqrt(self.n))
        if self.block_size * self.block_size < self.n:
            self.block_size += 1  # ceil(sqrt(n))

        self.block_count = (self.n + self.block_size - 1) // self.block_size
        self.block_sum = [0] * self.block_count

        for i, val in enumerate(self.arr):
            self.block_sum[i // self.block_size] += val

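    def update(self, idx: int, new_val: int) -> None:
        if not 0 <= idx < self.n:
            raise IndexError("idx out of range")  # negative indices would silently wrap
        block_idx = idx // self.block_size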
        delta = new_val - self.arr[idx]
        self.arr[idx] = new_val
        self.block_sum[block_idx] += delta

    def query(self, left: int, right: int) -> int:
        if left > right:
            return 0
        if left < 0 or right >= self.n:
            raise IndexError("query range out of bounds")

        total = 0
        start_block = left // self.block_size
        end_block = right // self.block_size

        if start_block == end_block:
            for i in range(left, right + 1):
                total += self.arr[i]
            return total

        # Left partial block
        end_of_start_block = (start_block + 1) * self.block_size - 1
        for i in range(left, min(end_of_start_block, self.n - 1) + 1):
            total += self.arr[i]

        # Full middle blocks
        for b in range(start_block + 1, end_block):
            total += self.block_sum[b]

        # Right partial block
        start_of_end_block = end_block * self.block_size
        for i in range(start_of_end_block, right + 1):
            total += self.arr[i]

        return total


if __name__ == "__main__":
    nums = [2, 1, 5, 3, 4, 7, 6, 2, 9, 8]
    ds = SqrtDecompositionRangeSum(nums)

    print(ds.query(2, 7))   # 27
    ds.update(3, 10)        # index 3: 3 -> 10
    print(ds.query(2, 7))   # 34
    print(ds.query(0, 9))   # 54

Line-by-Line Explanation

Construction

Compute block size close to sqrt(n).
Allocate block_sum for all blocks.
One pass over array to fill block sums.

Update Operation

Find block containing index.
Compute delta between new and old value.
Apply delta to both array and block summary.
No full recomputation needed.

Query Operation

If both indices in same block, direct loop.
Otherwise split into three segments: left partial, full blocks, right partial.
Use block_sum only for complete blocks.
Add remaining elements manually at boundaries.

Additional Worked Examples

Example: For arr = [4, 8, 1, 6, 3, 7, 2, 5, 9], if block size is 3, block sums are:

Block 0 (0..2): 4 + 8 + 1 = 13
Block 1 (3..5): 6 + 3 + 7 = 16
Block 2 (6..8): 2 + 5 + 9 = 16

Query [1, 7]:

Left partial: indices 1..2 -> 8 + 1 = 9
Full middle: block 1 -> 16
Right partial: indices 6..7 -> 2 + 5 = 7
Total = 9 + 16 + 7 = 32

Example: After update index 4 from 3 to 11, only block 1 changes by +8. New block 1 sum becomes 24. This shows why updates are efficient.
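
The two worked examples can be replayed with plain lists. This is a minimal sketch that fixes the block size at 3, as the example assumes; variable names are illustrative.

```python
arr = [4, 8, 1, 6, 3, 7, 2, 5, 9]
B = 3  # block size assumed in the worked example

block_sum = [sum(arr[i:i + B]) for i in range(0, len(arr), B)]
print(block_sum)  # [13, 16, 16]

# Query [1, 7]: left partial (1..2) + full block 1 + right partial (6..7)
total = sum(arr[1:3]) + block_sum[1] + sum(arr[6:8])
print(total)  # 32

# Point update: index 4 changes from 3 to 11, so only block 1 shifts by +8
block_sum[4 // B] += 11 - arr[4]
arr[4] = 11
print(block_sum)  # [13, 24, 16]
```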

Time Complexity

Build: O(n)
Point update: O(1)
Range query:
- At most two partial blocks: about O(sqrt(n)) in worst case due to block size.
- Middle full blocks: about O(sqrt(n)).
Total per query: O(sqrt(n))

Space Complexity

Array copy: O(n)
Block summary: about O(sqrt(n))
Total auxiliary (excluding input storage choice): O(sqrt(n))

Edge Cases

n = 0: guard behavior (no valid updates/queries).
left > right: return neutral value (0 for sum).
Single-element range: direct access works naturally.
Last block smaller: always clamp loop bounds by n - 1.

Common Mistakes

Common Mistake: Forgetting to update block summary during point update, causing stale query answers.

Common Mistake: Incorrect block boundary math (especially end index of partial blocks).

Common Mistake: Assuming all blocks have equal size in loops; last block can be shorter.

Comparison With Nearby Approaches

Approach	Query	Update	When Useful
Brute Force	O(n)	O(1)	Very small inputs
Prefix Sum	O(1)	O(n) naive rebuild	Static arrays
Sqrt Decomposition	O(sqrt(n))	O(1)	Balanced simple solution
Segment Tree	O(log n)	O(log n)	High performance + flexibility

Pattern Recognition

Think of square root decomposition when:

You need many range queries and occasional updates.
Segment tree feels heavy for current constraints/time.
A block-level summary can combine answers quickly.
Problem constraints are around 10^5 with moderate operations and relaxed time limits.

Interview Insight

Interview Insight: In an interview, say: "I can solve this with sqrt decomposition in O(sqrt(n)) query and O(1) point update, then upgrade to segment tree if stricter complexity is required." This shows practical judgment and scalability awareness.

Practice Problems

Range sum query with point update.
Range minimum query with point update (store block minimums).
Count numbers greater than k in range (block sorting variant).
Jump-game style queries with block precomputed jump pointers (advanced sqrt decomposition).

Expert Tip: First implement only range sum + point update. Once correct, swap block summary logic (sum -> min/max/count) to adapt quickly to many problem variants.
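
A sketch of that swap for range minimum follows. The class name SqrtDecompositionRangeMin is hypothetical. One difference from the sum version is worth noting: a minimum cannot be patched with a delta, so this sketch rescans the affected block on update, which makes the update O(sqrt(n)) instead of O(1).

```python
from math import isqrt
from typing import List


class SqrtDecompositionRangeMin:
    """Point update and range-minimum query with per-block minimums."""

    def __init__(self, arr: List[int]) -> None:
        self.arr = arr[:]
        self.n = len(arr)
        self.block_size = max(1, isqrt(self.n))
        self.block_min = [
            min(self.arr[i:i + self.block_size])
            for i in range(0, self.n, self.block_size)
        ]

    def update(self, idx: int, new_val: int) -> None:
        self.arr[idx] = new_val
        b = idx // self.block_size
        start = b * self.block_size
        # Rescan this one block to recompute its minimum.
        self.block_min[b] = min(self.arr[start:start + self.block_size])

    def query(self, left: int, right: int) -> int:
        """Minimum of arr[left..right] (inclusive); assumes 0 <= left <= right < n."""
        best = self.arr[left]
        i = left
        while i <= right:
            at_block_start = i % self.block_size == 0
            if at_block_start and i + self.block_size - 1 <= right:
                # Whole block lies inside the range: use its summary.
                best = min(best, self.block_min[i // self.block_size])
                i += self.block_size
            else:
                best = min(best, self.arr[i])
                i += 1
        return best


ds = SqrtDecompositionRangeMin([2, 1, 5, 3, 4, 7, 6, 2, 9, 8])
print(ds.query(2, 7))  # 2
ds.update(7, 10)
print(ds.query(2, 7))  # 3
```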

Summary

Square Root Decomposition divides data into blocks of size about sqrt(n).
Query combines boundary scans with fast full-block summaries.
For range-sum + point-update, common complexity is O(sqrt(n)) query and O(1) update.
It is easier than segment trees and great for medium-sized constraints.
This technique strengthens your understanding of preprocessing and block-based trade-offs.

20.4 Randomized Algorithms

Introduction

Randomized algorithms intentionally use randomness during execution to improve average performance, simplify logic, or avoid worst-case adversarial inputs. Instead of following exactly the same path for the same problem shape, the algorithm makes random choices (for example, choosing a random pivot in quicksort), which often leads to strong expected performance.

For beginners, the key mindset is this: randomness is not "guessing without logic." It is a controlled design tool with mathematical guarantees such as expected runtime, probability of error, or high-probability success.

Real-World Analogy

Imagine searching for one specific card in a large deck. If you always search from the top and someone who knows your strategy arranges the deck, they can place the card at the bottom every time. If you check positions in a random order instead, no fixed arrangement is bad for you on average. Randomized algorithms similarly reduce dependence on input order patterns.

Formal Definition

A randomized algorithm is an algorithm that has access to random bits and may produce different internal execution paths for the same input. Its performance/correctness is analyzed probabilistically (expected time, error probability, success probability, etc.).

Concept Note: "Expected O(n log n)" means average over random choices made by the algorithm, not necessarily average over input distributions.

Why This Topic Matters

Many classic high-performance algorithms are randomized (randomized quicksort, randomized selection, hashing techniques).
Randomization often gives simpler implementations than deterministic worst-case-optimal approaches.
Prevents predictable worst-case behavior in adversarial settings.
Important for interviews, contests, distributed systems, and probabilistic data structures.

Mental Model

Deterministic Algorithm:
Input -> fixed path -> fixed output/runtime behavior

Randomized Algorithm:
Input + random bits -> one of many possible paths
                       -> output/runtime analyzed by probability

The output may still be always correct (Las Vegas), while runtime varies; or runtime may be bounded but output has tiny error chance (Monte Carlo).

Core Subtopics

1) Las Vegas vs Monte Carlo

Las Vegas: Always correct output, random runtime (example: randomized quicksort pivot choice affects runtime, not correctness).
Monte Carlo: Bounded runtime, but small probability of wrong answer (example: probabilistic primality tests with configurable error probability).

2) Expected Value in Algorithm Analysis

Expected runtime is a weighted average over all random choices. Even if some runs are slow, if they are rare, the expected runtime can still be excellent.

3) Amplification

For Monte Carlo methods, repeating the algorithm independently can reduce error probability exponentially.

Optimization Insight: If one trial fails with probability p and a single successful trial is enough, then k independent trials all fail with probability p^k.
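
A small Monte Carlo example shows amplification at work. The sketch below uses Freivalds' check for matrix products, a technique brought in here purely as an illustration (the text above does not name it). Each trial misses a wrong product with probability at most 1/2, so k trials miss it with probability at most 2^-k. The function names are hypothetical.

```python
import random
from typing import List, Optional

Matrix = List[List[int]]


def mat_vec(m: Matrix, v: List[int]) -> List[int]:
    """Multiply a matrix by a vector."""
    return [sum(x * y for x, y in zip(row, v)) for row in m]


def probably_equal_product(a: Matrix, b: Matrix, c: Matrix,
                           trials: int = 20,
                           rng: Optional[random.Random] = None) -> bool:
    """
    Monte Carlo check of A * B == C for square matrices.
    False is always right; True is wrong with probability at most 2**-trials.
    """
    rng = rng or random.Random()
    n = len(c)
    for _ in range(trials):
        r = [rng.randint(0, 1) for _ in range(n)]
        # Compare A(Br) with Cr: O(n^2) work instead of an O(n^3) product.
        if mat_vec(a, mat_vec(b, r)) != mat_vec(c, r):
            return False
    return True


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
print(probably_equal_product(a, b, [[19, 22], [43, 50]]))  # True
```

Raising trials from 20 to 40 squares the error bound while only doubling the running time.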

Evolution: Brute Force → Better → Optimal

Brute Force

Try all possibilities deterministically (often infeasible for large inputs).

May guarantee correctness, but time can explode.

Better

Use deterministic heuristics; may improve average behavior but can still be vulnerable to crafted worst-case inputs.

Optimal (practical perspective)

Use randomized design with probabilistic guarantees: strong expected performance, lower implementation complexity, robust behavior against adversarial patterns.

Step-by-Step Breakdown: Randomized Quicksort

Randomized quicksort picks a random pivot index in each recursive call, then partitions around it.

If subarray has 0 or 1 elements, return.
Pick random pivot index in current subarray.
Swap pivot to end (or start) for partition convenience.
Partition elements into less-than pivot and greater-or-equal regions.
Recursively sort left and right partitions.

ASCII Diagram

Array: [9, 1, 7, 3, 8, 2, 5]
Random pivot chosen: 3

Partition result (order within each side depends on the partition scheme):
[1, 2]  [3]  [9, 7, 8, 5]
  left  pivot   right

Recurse on left and right similarly.

Python Implementation

import random
from typing import List


def randomized_quicksort(arr: List[int]) -> List[int]:
    """
    Returns a new sorted list using randomized quicksort.
    Expected runtime: O(n log n) for distinct values; worst case O(n^2)
    (unlikely with random pivots, but many equal values trigger it).
    """
    nums = arr[:]  # avoid mutating caller data

    def partition(left: int, right: int) -> int:
        # Choose random pivot and move it to end
        pivot_idx = random.randint(left, right)
        nums[pivot_idx], nums[right] = nums[right], nums[pivot_idx]
        pivot = nums[right]

        store = left
        for i in range(left, right):
            if nums[i] < pivot:
                nums[store], nums[i] = nums[i], nums[store]
                store += 1

        nums[store], nums[right] = nums[right], nums[store]
        return store

    def sort(left: int, right: int) -> None:
        if left >= right:
            return
        p = partition(left, right)
        sort(left, p - 1)
        sort(p + 1, right)

    if nums:
        sort(0, len(nums) - 1)
    return nums


if __name__ == "__main__":
    data = [9, 1, 7, 3, 8, 2, 5]
    print(randomized_quicksort(data))

Line-by-Line Explanation

pivot_idx = random.randint(left, right) ensures pivot choice does not depend on input order pattern.
Partition places pivot in final sorted position.
Elements less than pivot go left, others go right.
Recursive calls sort subproblems independently.
Correctness remains deterministic: final sorted output is always correct.
Randomness affects recursion shape and thus runtime.

Additional Example: Randomized Quickselect (k-th Smallest)

Quickselect finds the k-th smallest element. Random pivot gives expected O(n) time.

import random
from typing import List


def randomized_quickselect(arr: List[int], k: int) -> int:
    """
    Returns k-th smallest (1-indexed k).
    Expected O(n), worst O(n^2).
    """
    if not 1 <= k <= len(arr):
        raise ValueError("k out of range")

    nums = arr[:]
    target = k - 1
    left, right = 0, len(nums) - 1

    while left <= right:
        pivot_idx = random.randint(left, right)
        nums[pivot_idx], nums[right] = nums[right], nums[pivot_idx]
        pivot = nums[right]

        store = left
        for i in range(left, right):
            if nums[i] < pivot:
                nums[store], nums[i] = nums[i], nums[store]
                store += 1
        nums[store], nums[right] = nums[right], nums[store]

        if store == target:
            return nums[store]
        if store < target:
            left = store + 1
        else:
            right = store - 1

    raise RuntimeError("Unexpected state")

Example: For [7, 10, 4, 3, 20, 15] and k = 3, quickselect returns 7 (3rd smallest) without fully sorting all elements.

Time Complexity

Randomized Quicksort

Expected: O(n log n)
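Worst case: O(n^2) (very unlikely across random pivots when values are mostly distinct; with many equal values the two-way partition shown above degrades regardless of pivot choice)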

Randomized Quickselect

Expected: O(n)
Worst case: O(n^2)

Expected bounds come from probability of balanced-enough partitions over random pivot choices.
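
The gap between the two regimes can be measured directly. The sketch below counts element comparisons of a Lomuto-style quicksort on already sorted input, once with a fixed last-element pivot and once with a random pivot. The function name count_comparisons and the seed value are assumptions for illustration; an explicit stack replaces recursion so the fixed-pivot run does not hit Python's recursion limit.

```python
import random
from typing import List


def count_comparisons(arr: List[int], random_pivot: bool, seed: int = 0) -> int:
    """Lomuto quicksort that returns how many element comparisons it made."""
    nums = arr[:]
    rng = random.Random(seed)
    comparisons = 0
    stack = [(0, len(nums) - 1)]  # explicit stack avoids deep recursion

    while stack:
        left, right = stack.pop()
        if left >= right:
            continue
        if random_pivot:
            p = rng.randint(left, right)
            nums[p], nums[right] = nums[right], nums[p]
        pivot = nums[right]
        store = left
        for i in range(left, right):
            comparisons += 1
            if nums[i] < pivot:
                nums[store], nums[i] = nums[i], nums[store]
                store += 1
        nums[store], nums[right] = nums[right], nums[store]
        stack.append((left, store - 1))
        stack.append((store + 1, right))
    return comparisons


data = list(range(500))  # already sorted, distinct values
print(count_comparisons(data, random_pivot=False))  # 124750 = 500 * 499 / 2
print(count_comparisons(data, random_pivot=True))   # far fewer; exact value depends on the seed
```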

Space Complexity

Quicksort recursion stack: expected O(log n), worst O(n).
Quickselect iterative version: O(1) extra (not counting the copied input array).

Edge Cases

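Duplicate values: the `<` partition above sends every element equal to the pivot to the same side, so an array of identical values costs O(n^2) no matter which pivot is picked. A three-way partition (less / equal / greater) avoids this.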
Already sorted input: deterministic bad pivot quicksort suffers, randomized pivot protects expected performance.
Very small arrays: overhead can dominate; hybrid with insertion sort is common in production.
Reproducibility needs: set random seed when deterministic testing is required.
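
Two of these edge cases can be handled together. The sketch below is one possible variant, not the implementation shown earlier: it partitions into less / equal / greater regions so that equal values are never recursed on, and it accepts an optional seed for reproducible runs. The name quicksort_3way and the seed parameter are assumptions for illustration.

```python
import random
from typing import List, Optional


def quicksort_3way(arr: List[int], seed: Optional[int] = None) -> List[int]:
    """Randomized quicksort with a less / equal / greater partition."""
    nums = arr[:]
    rng = random.Random(seed)  # pass a seed for reproducible pivot choices

    def sort(left: int, right: int) -> None:
        if left >= right:
            return
        pivot = nums[rng.randint(left, right)]
        lt, i, gt = left, left, right
        # Invariant: nums[left:lt] < pivot, nums[lt:i] == pivot, nums[gt+1:right+1] > pivot
        while i <= gt:
            if nums[i] < pivot:
                nums[lt], nums[i] = nums[i], nums[lt]
                lt += 1
                i += 1
            elif nums[i] > pivot:
                nums[i], nums[gt] = nums[gt], nums[i]
                gt -= 1
            else:
                i += 1
        sort(left, lt - 1)   # strictly smaller values
        sort(gt + 1, right)  # strictly larger values

    sort(0, len(nums) - 1)
    return nums


print(quicksort_3way([3, 1, 3, 3, 2, 3, 1], seed=42))  # [1, 1, 2, 3, 3, 3, 3]
print(quicksort_3way([7] * 10_000) == [7] * 10_000)    # True
```

With all values equal, a single partition pass finishes the sort, because both recursive calls receive empty ranges.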

Common Mistakes

Common Mistake: Assuming randomized means "always faster." It means better expected behavior, not guaranteed best single run.

Common Mistake: Using non-random pivot accidentally (for example fixed first element), losing probabilistic protection.

Common Mistake: Forgetting to analyze error probability in Monte Carlo algorithms.

When to Prefer Randomized Algorithms

Input can be adversarial or highly patterned.
Deterministic worst-case-optimal method is too complex to implement under time pressure.
Small probability of error is acceptable (Monte Carlo scenarios).
You need practical speed with simple implementation.

Pattern Recognition

Suspect randomization when you hear:

"Average performance is fine, worst-case patterns hurt."
"Need robust behavior across unknown or hostile test data."
"Can trade tiny failure probability for big performance gain."
"Need random pivot/hash/sampling for stability."

Interview Insight

Interview Insight: Clearly state the guarantee type: "This is Las Vegas (always correct, expected runtime bound)" or "This is Monte Carlo (bounded runtime, controllable error probability)." Interviewers value this precision.

Practice Problems

Implement randomized quicksort and compare against deterministic pivot on sorted inputs.
Implement randomized quickselect for k-th smallest.
Simulate repeated Monte Carlo trials and compute empirical error reduction.
Design a random sampling approach to estimate majority candidate before verification.

Expert Tip: In production or contests, combine randomness with verification steps when possible. Randomization gives speed; verification gives confidence.

Summary

Randomized algorithms use controlled randomness to improve expected behavior.
Las Vegas: always correct, random runtime. Monte Carlo: bounded runtime, small error chance.
Classic examples include randomized quicksort and randomized quickselect.
The right analysis is probabilistic (expected time, error probability, high probability bounds).
Randomization is a practical engineering tool, not a shortcut around correctness.

20.5 Reduction Techniques

Introduction

Reduction is one of the most important problem-solving techniques in computer science. The idea is simple but powerful: instead of solving a new problem directly, transform it into another problem you already know how to solve efficiently. If the transformation is correct and efficient, you inherit the known solution.

For beginners aiming at mastery, reduction changes how you think about hard problems. You stop asking only "How do I solve this from scratch?" and start asking "Which known problem does this resemble, and how can I map it there?"

Real-World Analogy

Suppose you have a document in a rare file format. You do not build a custom printer driver for that format. You convert it to PDF, then use standard printing tools. You solved your original task by reducing it to a widely supported one.

Formal Definition

A problem A is reducible to problem B (written informally as A -> B) if any instance of A can be transformed into an instance of B such that solving that transformed instance yields a correct solution for A, and the transformation itself is computationally efficient.

Concept Note: In algorithm design, reduction is often used to reuse known algorithms. In complexity theory, reductions are used to compare hardness between problem classes.

Why This Topic Matters

Speeds up problem solving by reusing known patterns and algorithms.
Essential for interview questions where direct solution looks messy.
Foundation of NP-hardness/NP-completeness proofs.
Improves code quality: cleaner logic, less reinvention, fewer bugs.

Mental Model

Original Problem A
      |
      | transform(input)
      v
Known Problem B  -- solve with known algorithm --> result_B
      |
      | map back (if needed)
      v
Answer for Problem A

The transformation and mapping back must preserve correctness.

Core Types of Reductions

1) Algorithmic Reduction (Practical)

Reduce to a standard data structure or algorithm you already know, then solve quickly in interviews/contests.

2) Complexity Reduction (Theory)

Use polynomial-time transformations to show relative hardness (for example, prove new problem is NP-hard by reducing a known NP-hard problem to it).

3) Decision vs Optimization Reduction

Sometimes an optimization problem can be solved via repeated decision checks (often with binary search on answer).
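
As a minimal sketch of this pattern, consider a shipping problem chosen here for illustration: packages must ship in the given order within a fixed number of days, and we want the smallest daily capacity. The decision check asks "is capacity c enough?", and binary search finds the smallest c for which the answer is yes. The function names are hypothetical, and the sketch assumes a non-empty list and days >= 1.

```python
from typing import List


def can_ship(weights: List[int], days: int, capacity: int) -> bool:
    """Decision version: can all packages ship in order within `days`?"""
    used_days, load = 1, 0
    for w in weights:
        if load + w > capacity:
            used_days += 1  # start a new day
            load = 0
        load += w
    return used_days <= days


def min_capacity(weights: List[int], days: int) -> int:
    """Optimization version solved by binary search over the decision check."""
    lo, hi = max(weights), sum(weights)  # answer always lies in this range
    while lo < hi:
        mid = (lo + hi) // 2
        if can_ship(weights, days, mid):
            hi = mid        # mid works, try smaller
        else:
            lo = mid + 1    # mid fails, need more capacity
    return lo


print(min_capacity([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 5))  # 15
```

The search is valid because feasibility is monotone: if capacity c works, every larger capacity works too.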

Evolution: Brute Force → Better → Optimal

Brute Force

Design custom algorithm directly on original statement, often ending in high complexity and complex case handling.

Better

Spot partial structure but still perform significant manual logic for special cases.

Optimal (thinking approach)

Recognize canonical target problem, reduce cleanly, apply proven solver, and map result back.

Optimization Insight: A good reduction is both correctness-preserving and complexity-improving compared to direct brute force.

Step-by-Step Reduction Framework

State source problem clearly: input, output, constraints.
Identify target known problem: sorting, hashing, graph shortest path, bipartite matching, etc.
Define transformation: exactly how source input becomes target input.
Prove correctness: show answer equivalence in both directions when needed.
Analyze complexity: transformation cost + target solver cost + reverse mapping cost.
Handle edge cases: ensure transformation remains valid on boundaries.

Reduction Example 1: Pair Sum → Hash Lookup

Problem

Given array nums and target T, determine if any pair sums to T.

Reduction Idea

Reduce pair search to repeated lookup problem: for each x, check whether T - x has been seen before.

Pair Sum
  -> For each element x
  -> Need existence of (T - x)
  -> Hash set membership query

Python Implementation

from typing import List


def has_two_sum(nums: List[int], target: int) -> bool:
    seen = set()
    for x in nums:
        if target - x in seen:
            return True
        seen.add(x)
    return False

Line-by-Line Explanation

seen stores numbers processed so far.
For current x, complement is target - x.
If complement exists in seen, valid pair found.
Otherwise add x and continue.

Reduction result: from naive O(n^2) pair checking to expected O(n).

Reduction Example 2: Scheduling With Deadlines → Sort + Heap

Problem

Each task has duration and deadline. Maximize number of tasks completed before deadlines.

Reduction Idea

Reduce to ordered processing by deadlines, maintaining chosen durations in a max-heap. If total time exceeds current deadline, remove longest task.

Python Implementation

import heapq
from typing import List, Tuple


def max_tasks(tasks: List[Tuple[int, int]]) -> int:
    """
    tasks[i] = (duration, deadline)
    """
    # Reduce to deadline order; sorted() leaves the caller's list unchanged
    ordered = sorted(tasks, key=lambda x: x[1])

    total = 0
    max_heap = []  # store negative durations (heapq is a min-heap)

    for duration, deadline in ordered:
        total += duration
        heapq.heappush(max_heap, -duration)

        if total > deadline:
            longest = -heapq.heappop(max_heap)
            total -= longest

    return len(max_heap)

Why Reduction Works

Sorting by deadline makes each prefix a local feasibility checkpoint.
When the running total exceeds the current deadline, dropping the longest chosen task keeps the task count as high as possible while freeing the most time for later tasks.
Heap gives fast access to longest chosen task.

ASCII Diagram

Source input
   |
   | transform
   v
Sorted/graph/heap/hash representation
   |
   | apply known algorithm
   v
intermediate solution
   |
   | interpret/map back
   v
final answer

Time Complexity (How to Derive Properly)

Always split analysis into components:

Transformation cost: build target instance.
Solver cost: run known algorithm on transformed instance.
Mapping-back cost: convert result to original problem output format.

Total complexity is the sum of these components.

Example: If transformation is O(n), solver is O(n log n), and mapping back is O(n), total is O(n log n).

Space Complexity

Depends on target representation (hash set, heap, graph, DP table).
Include extra memory introduced by transformation, not just solver memory.
In interviews, explicitly mention whether transformation is in-place or auxiliary.

Edge Cases

Empty input: transformed problem must remain valid.
Duplicate elements: mapping should preserve multiplicity if needed.
Signed/large values: avoid assumptions that break hashing/sorting logic.
Constraint mismatches: ensure target algorithm assumptions actually hold.

Common Mistakes

Common Mistake: Performing a transformation that loses information required to reconstruct the final answer.

Common Mistake: Claiming reduction without proving correctness relation between original and transformed instances.

Common Mistake: Ignoring transformation cost and only quoting target algorithm complexity.

Pattern Recognition

Reduction is likely when problem statements include clues like:

"Can this be sorted first?"
"Can each query become membership lookup?"
"Can this interval/task/string problem become graph traversal?"
"Can optimization be solved via binary search + feasibility check?"

Interview Insight

Interview Insight: Say your reduction explicitly: "I will reduce this to X by transforming each input element into Y. Then I'll apply algorithm Z. Correctness follows because ...". This communication style strongly improves interview performance.

Practice Problems

Reduce interval overlap checks to sorting + linear scan.
Reduce duplicate detection to hash set membership.
Reduce shortest transformation sequence problem to BFS on implicit graph.
Reduce capacity minimization problem to binary search on answer + greedy feasibility.
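
As a starting point for the first practice problem, here is a sketch of the interval-overlap reduction. It assumes closed intervals given as (start, end) pairs; the function name has_overlap is hypothetical.

```python
from typing import List, Tuple


def has_overlap(intervals: List[Tuple[int, int]]) -> bool:
    """True if any two closed intervals [start, end] share a point."""
    ordered = sorted(intervals)  # transformation: order by start
    # After sorting, any overlap must show up between neighbours.
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start <= prev_end:
            return True
    return False


print(has_overlap([(1, 3), (6, 8), (2, 5)]))  # True
print(has_overlap([(1, 2), (3, 4)]))          # False
```

Checking neighbours is enough: if no interval overlaps its successor, every later interval starts after the current one ends. The cost is O(n log n) for the sort plus O(n) for the scan, versus O(n^2) for checking all pairs.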

Expert Tip: Build a "target toolbox" in your head: sort, prefix sums, hash map, heap, union-find, BFS/DFS, shortest path, DP. Most strong reductions map into one of these.

Summary

Reduction means solving a new problem by transforming it into a known one.
A valid reduction must preserve correctness and stay computationally efficient.
It is both a practical interview technique and a core complexity-theory concept.
Always analyze transformation + solve + mapping-back costs together.
Mastering reduction greatly accelerates advanced problem-solving ability.

20.6 NP-Completeness Basics

Introduction

NP-Completeness is a foundational concept in theoretical computer science that helps us classify problems by computational difficulty. It tells us why some problems likely do not have fast exact algorithms and guides us toward practical alternatives like approximation, heuristics, or special-case optimization.

For beginners, this topic is not about memorizing heavy theory symbols. It is about understanding a powerful decision framework: when to keep searching for a fast algorithm, and when to stop and change strategy.

Real-World Analogy

Imagine packing a truck with limited capacity from many packages, each with its own size and value. Finding the perfect combination might be possible for small inputs, but becomes explosively hard at scale. NP-Completeness is the formal language for this "combinatorial explosion" and helps engineers choose realistic methods under time limits.

Formal Definition

P (Polynomial Time)

Class of decision problems solvable in polynomial time by a deterministic algorithm.

NP (Nondeterministic Polynomial Time)

Class of decision problems for which every "yes" instance has a certificate (a proposed solution) that can be verified in polynomial time.

NP-Hard

A problem is NP-hard if every problem in NP can be polynomial-time reduced to it. NP-hard problems are at least as hard as every problem in NP, and may not even be decision problems.

NP-Complete

A decision problem is NP-complete if it is both:

In NP, and
NP-hard.

Concept Note: If any NP-complete problem has a polynomial-time exact algorithm, then all problems in NP do. That would imply P = NP.

Why This Topic Matters

Explains why many optimization problems resist efficient exact algorithms.
Prevents wasted effort chasing asymptotic improvements that would imply P = NP.
Guides practical choices: approximation, randomized methods, branch-and-bound, ILP, or constraint solvers.
Frequently tested in higher-level interviews and competitive programming discussions.

Mental Model

P  ⊆  NP
      |
      | hardest problems inside NP
      v
  NP-Complete

NP-Hard includes NP-Complete and possibly even harder/non-decision problems.

Think of NP-complete problems as "universal hard cores" of NP: if you solve one efficiently, you unlock all.

Core Subtopics

1) Decision vs Optimization Form

NP-completeness is defined on decision problems (yes/no). Many optimization problems are handled by converting them to decision versions.

2) Polynomial-Time Reduction

To show problem B is hard, reduce known hard problem A to B (A -> B). This means: if we could solve B fast, we could solve A fast.

3) Verification Perspective

For NP membership, you do not need to find a solution quickly; you only need to verify a candidate quickly.

Evolution: Brute Force → Better → Optimal

Brute Force

Enumerate all possibilities (subsets, permutations, assignments). Usually exponential.

Better

Use pruning, memoization, or special constraints. Works for medium-sized instances but not worst-case scalable.

Optimal (strategy-level, not always exact algorithm)

Classify hardness first. If NP-complete, choose best practical approach: approximation, heuristic search, or exact algorithms on small/structured cases.

Optimization Insight: In NP-complete settings, the biggest optimization is often choosing the right problem-solving paradigm, not micro-optimizing code.

Step-by-Step: How to Prove NP-Completeness

Convert your target problem to a decision version (if needed).
Show target is in NP (polynomial-time verifier).
Pick a known NP-complete source problem.
Build polynomial-time reduction from source to target.
Prove correctness of transformation (yes-instance maps to yes-instance, no to no).
Conclude target is NP-hard; with step 2, target is NP-complete.

ASCII Diagram

Known NP-Complete Problem A
          |
          | polynomial-time reduction
          v
Target Problem B

If B were easy (poly-time), A would be easy.
So B is at least as hard as A.
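
As one concrete instance of the picture above, here is a minimal sketch of a classic textbook reduction (not taken from this section): Independent Set to Vertex Cover. It relies on the fact that a vertex set S is independent exactly when the remaining vertices touch every edge, so (G, k) is a yes-instance of Independent Set if and only if (G, n - k) is a yes-instance of Vertex Cover. The function names and the brute-force checkers are illustrative assumptions, included only to sanity-check the mapping on tiny graphs; the sketch assumes 0 <= k <= n.

```python
from itertools import combinations
from typing import List, Tuple

Edge = Tuple[int, int]


def reduce_independent_set_to_vertex_cover(n: int, edges: List[Edge], k: int):
    """Map an Independent Set instance (G, k) to a Vertex Cover instance (G, n - k).

    The graph is passed through unchanged, so the transformation is polynomial.
    """
    return n, edges, n - k


def has_independent_set(n: int, edges: List[Edge], k: int) -> bool:
    """Brute force (exponential): do k vertices exist with no edge between them?"""
    for subset in combinations(range(n), k):
        chosen = set(subset)
        if not any(u in chosen and v in chosen for u, v in edges):
            return True
    return False


def has_vertex_cover(n: int, edges: List[Edge], size: int) -> bool:
    """Brute force (exponential): do `size` vertices exist that touch every edge?"""
    if size < 0:
        return False
    for subset in combinations(range(n), size):
        chosen = set(subset)
        if all(u in chosen or v in chosen for u, v in edges):
            return True
    return False


# Path graph 0 - 1 - 2 - 3: yes-instances map to yes, no-instances map to no.
path_edges = [(0, 1), (1, 2), (2, 3)]
for k in range(5):
    n2, edges2, size = reduce_independent_set_to_vertex_cover(4, path_edges, k)
    print(k, has_independent_set(4, path_edges, k), has_vertex_cover(n2, edges2, size))
```

The two boolean columns agree for every k, which is the correctness condition a reduction must satisfy. Only the transformation has to be fast; the checkers here are deliberately naive.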

Example 1: Subset Sum (Decision Form)

Problem: Given integers and target T, is there a subset with sum exactly T?

Given candidate subset, verification is polynomial (compute sum and compare).
Hence in NP.
Known NP-complete problem (decision form).

Example: nums = [3, 34, 4, 12, 5, 2], T = 9 -> yes (4 + 5).

Example 2: Traveling Salesman Problem (Decision Form)

Decision TSP: Is there a tour with total cost <= K?

Candidate tour can be verified quickly by summing edge costs and checking validity.
Decision version is NP-complete; optimization version is NP-hard.

Common Mistake: Mixing decision and optimization versions during complexity classification.
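
To make the decision/optimization distinction concrete, the following sketch recovers the optimal tour cost using only yes/no answers from a decision procedure, via binary search on K. It assumes non-negative integer edge costs and at least one city. The names `tsp_decision` and `tsp_optimum_via_decision` are hypothetical, and the decision procedure is a brute-force stand-in, not an efficient algorithm.

```python
from itertools import permutations
from typing import List


def tour_cost(order: List[int], dist: List[List[int]]) -> int:
    """Cost of visiting cities in `order` and returning to the start."""
    n = len(order)
    return sum(dist[order[i]][order[(i + 1) % n]] for i in range(n))


def tsp_decision(dist: List[List[int]], k: int) -> bool:
    """Decision form: is there a tour with total cost <= k? (brute force, O(n!))"""
    n = len(dist)
    for rest in permutations(range(1, n)):
        if tour_cost([0, *rest], dist) <= k:
            return True
    return False


def tsp_optimum_via_decision(dist: List[List[int]]) -> int:
    """Optimization form answered with O(log(upper bound)) decision queries."""
    lo = 0
    # Every tour leaves each city exactly once, so the sum of row maxima bounds any tour.
    hi = sum(max(row) for row in dist)
    while lo < hi:
        mid = (lo + hi) // 2
        if tsp_decision(dist, mid):
            hi = mid        # a tour of cost <= mid exists
        else:
            lo = mid + 1    # every tour costs more than mid
    return lo


dist = [
    [0, 2, 9, 10],
    [2, 0, 6, 4],
    [9, 6, 0, 3],
    [10, 4, 3, 0],
]
print(tsp_optimum_via_decision(dist))  # 18
```

The optimization answer costs only logarithmically many decision calls, which is why hardness is usually stated for the decision version.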

Python Helper (Verifier Mindset Demonstration)

This code does not solve NP-complete problems efficiently; it demonstrates polynomial-time verification for a candidate certificate.

from typing import List


def verify_subset_sum(nums: List[int], chosen_indices: List[int], target: int) -> bool:
    """
    Verifier for Subset Sum certificate:
    chosen_indices represent one proposed subset.
    Runs in polynomial time.
    """
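    n = len(nums)

    # A subset may use each position at most once, so reject repeated indices.
    if len(set(chosen_indices)) != len(chosen_indices):
        return False

    total = 0
    for idx in chosen_indices:
        if idx < 0 or idx >= n:
            return False
        total += nums[idx]

    return total == target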


if __name__ == "__main__":
    nums = [3, 34, 4, 12, 5, 2]
    cert = [2, 4]  # nums[2]=4, nums[4]=5
    print(verify_subset_sum(nums, cert, 9))  # True

Line-by-Line Explanation

Certificate is list of indices representing a proposed subset.
Verifier checks index validity and accumulates selected values.
Final equality check to target decides acceptance.
This is polynomial-time verification, illustrating NP membership intuition.

Time Complexity Perspective

No polynomial-time exact algorithm is known for any NP-complete problem in the general case.
Exact methods often exponential: O(2^n), O(n!), etc.
Verification of certificates for NP problems is polynomial.

Space Complexity Perspective

Depends on chosen method: brute force recursion, DP pseudo-polynomial tables, branch-and-bound, SAT encoding, etc.
When discussing NP-completeness, focus primarily on asymptotic hardness class and reduction correctness.

Edge Cases and Clarifications

Pseudo-polynomial algorithms: may exist (for example DP on subset sum values), but do not imply problem is in P.
Special graph/input structures: many NP-hard problems become easy on restricted cases.
Approximation possibility: NP-hard does not always mean no good approximation, but guarantees vary by problem.
Practical solvability: small input sizes can still be solved exactly using optimized exponential methods.
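
The pseudo-polynomial point above can be illustrated with the classic Subset Sum table DP. This is a minimal sketch that assumes non-negative integers; the function name `subset_sum_dp` is illustrative. Its O(n * T) running time is polynomial in the numeric value of T, but T is written with only about log2(T) bits, so the running time is exponential in the input length.

```python
from typing import List


def subset_sum_dp(nums: List[int], target: int) -> bool:
    """O(n * target) time, O(target) space; assumes non-negative integers."""
    if target < 0:
        return False
    reachable = [False] * (target + 1)
    reachable[0] = True  # the empty subset sums to 0
    for x in nums:
        # Iterate downward so each number is used at most once.
        for s in range(target, x - 1, -1):
            if reachable[s - x]:
                reachable[s] = True
    return reachable[target]


print(subset_sum_dp([3, 34, 4, 12, 5, 2], 9))   # True  (4 + 5)
print(subset_sum_dp([3, 34, 4, 12, 5, 2], 30))  # False
```

Doubling the number of digits in the target squares its value, and the table grows with it. That is why this DP does not place Subset Sum in P.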

Common Mistakes

Common Mistake: Saying "NP means not polynomial." Correct meaning: verifiable in polynomial time.

Common Mistake: Claiming NP-complete proof without providing a valid polynomial-time reduction.

Common Mistake: Assuming NP-hard automatically implies NP-complete (it must also be in NP and usually decision-form).

Pattern Recognition

Suspect NP-complete territory when you see:

Combinatorial selection with global constraints (subsets, partitions, schedules, tours).
Decision form asks "Does there exist ... ?".
Huge search space with no obvious greedy/local-optimal guarantee.
Problems resembling SAT, 3-SAT, Clique, Vertex Cover, Subset Sum, Hamiltonian Cycle, TSP decision.

Interview Insight

Interview Insight: Strong answers say: "This looks NP-hard/NP-complete in general, so I will either solve a constrained version exactly, or provide approximation/heuristic with clear trade-offs." This shows both theory awareness and engineering practicality.

Practice Problems

Classify decision vs optimization versions of Knapsack, TSP, and Set Cover.
Write a verifier for Clique certificate.
Explain why Subset Sum has pseudo-polynomial DP but is still NP-complete.
Map a new problem to a known NP-complete problem via reduction sketch.

Expert Tip: Keep a "source catalog" of classic NP-complete problems (3-SAT, Vertex Cover, Clique, Subset Sum). In proofs, selecting the right source problem often makes the reduction much cleaner.

Summary

NP-completeness classifies hardest decision problems in NP.
To prove NP-complete: show in NP and show NP-hard via polynomial-time reduction.
Decision vs optimization distinction is essential.
This theory directly guides practical strategy choices for hard real-world problems.
Mastering basics here prepares you for P vs NP and advanced complexity topics.

20.7 P vs NP

Introduction

The P vs NP question is one of the most famous open problems in computer science and mathematics. It asks whether every problem whose solution can be verified quickly can also be solved quickly. This sounds simple, but its implications affect optimization, cryptography, AI, scheduling, logistics, and many real-world engineering systems.

As a beginner, your goal is not to "solve" P vs NP. Your goal is to deeply understand what the question means, why it matters, and how it changes practical algorithm design decisions.

Real-World Analogy

Suppose someone gives you a completed Sudoku. Checking that it is valid is fast. But completing a partially filled grid can be much harder, and on generalized n^2 x n^2 boards the completion problem is NP-complete. P vs NP asks whether this "easy to verify but hard to find" gap is fundamental, or whether we just have not discovered the right fast algorithms yet.

Formal Definitions

P

Set of decision problems solvable in polynomial time by deterministic algorithms.

NP

Set of decision problems for which a proposed solution (certificate) can be verified in polynomial time.

The P vs NP Question

Is P = NP ?

If yes: every efficiently verifiable problem is also efficiently solvable.
If no: there exist problems that are easy to verify but inherently hard to solve.

Concept Note: We know P ⊆ NP. The unknown part is whether this inclusion is strict.

Why This Topic Matters

Explains why many practical optimization problems remain computationally difficult.
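Underlies cryptographic assumptions: schemes based on factoring or discrete logarithms rely on problems that are easy to verify but believed hard to solve, so a practical polynomial-time algorithm for NP-complete problems would break them.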
Helps engineers choose realistic strategies: exact, approximate, heuristic, or probabilistic.
Improves interview communication when discussing hard problem classes.

Mental Model

All easy-to-solve decision problems  -> P
All easy-to-verify decision problems -> NP

Known: P is inside NP.
Unknown: Is NP strictly bigger?

Think of P as "construct answer quickly" and NP as "check answer quickly."

Core Subtopics

1) Verification vs Construction

NP focuses on verification efficiency, not construction efficiency. Many learners confuse these.

2) NP-Complete Problems

NP-complete problems are the hardest problems in NP. If one NP-complete problem is in P, then P = NP.

3) Practical Consequence

Because no polynomial-time algorithm is known for any NP-complete problem, and P ≠ NP is widely believed but unproven, practitioners assume hard instances remain hard and build robust approximate/heuristic pipelines.

Evolution: Brute Force → Better → Optimal (Practical Strategy)

Brute Force

Search all possible solutions and verify each. Correct but usually exponential.

Better

Use pruning, branch-and-bound, memoization, or constraint propagation to shrink search space.

Optimal (engineering mindset under current knowledge)

Classify complexity, then choose constrained exact methods, approximations, or heuristics rather than chasing unknown polynomial-time exact solutions for NP-complete cases.

Optimization Insight: Correct complexity classification often saves more time than any low-level optimization effort.

Step-by-Step Reasoning Workflow for New Problems

Convert to decision version if needed.
Check if candidate answers can be verified in polynomial time (NP membership clue).
Look for reduction to/from known NP-complete problems.
If likely NP-hard, decide practical path: exact for small n, approximation, or heuristic.
Communicate trade-offs clearly (quality vs runtime vs guarantees).

ASCII Diagram

P ?= NP

Case 1: P = NP
  easy-to-verify  => easy-to-solve

Case 2: P != NP
  some problems remain verification-easy but solution-hard

Concrete Example: SAT Intuition

SAT (Boolean satisfiability) asks whether there exists an assignment of variables that makes a boolean formula true.

Given an assignment, verification is fast (evaluate formula).
Finding a satisfying assignment may be hard for large instances.
SAT was the first problem proven NP-complete (the Cook-Levin theorem).

Example: Formula (x1 OR x2) AND (NOT x1 OR x3). If someone gives x1=False, x2=True, x3=True, checking truth is quick.

Python Implementation (Verification Demonstration)

This snippet demonstrates fast verification (NP perspective), not fast universal solving.

from typing import Dict, List, Tuple

# Clause represented as list of literals:
# (var_name, is_positive)
Clause = List[Tuple[str, bool]]


def verify_cnf(clauses: List[Clause], assignment: Dict[str, bool]) -> bool:
    """
    Verifies if a given assignment satisfies a CNF formula.
    Runs in polynomial time in formula size.
    """
    for clause in clauses:
        clause_ok = False
        for var, is_positive in clause:
            if var not in assignment:
                # Treat an incomplete assignment as an invalid certificate.
                return False
            value = assignment[var]
            literal_value = value if is_positive else (not value)
            if literal_value:
                clause_ok = True
                break
        if not clause_ok:
            return False
    return True


if __name__ == "__main__":
    # (x1 OR x2) AND (NOT x1 OR x3)
    formula = [
        [("x1", True), ("x2", True)],
        [("x1", False), ("x3", True)]
    ]
    assignment = {"x1": False, "x2": True, "x3": True}
    print(verify_cnf(formula, assignment))  # True

Line-by-Line Explanation

Each clause is checked independently.
A clause is satisfied if at least one literal evaluates true.
Formula is satisfied only if all clauses are satisfied.
This verification is polynomial in number of literals and clauses.

Time Complexity Perspective

Verification for NP problems is polynomial by definition.
Exact solving for NP-complete problems has no known polynomial-time algorithm in general.
Brute force for SAT-style problems tries up to 2^n variable assignments; with an O(m) check per assignment for a formula of size m, that is O(2^n * m).
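
To contrast verification with search, here is a minimal brute-force SAT sketch that enumerates all 2^n assignments and runs a polynomial-time check on each. It uses the same (var_name, is_positive) clause encoding as the verifier above; the names `satisfies` and `brute_force_sat` are illustrative, and this is not how production SAT solvers work.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple

Clause = List[Tuple[str, bool]]


def satisfies(clauses: List[Clause], assignment: Dict[str, bool]) -> bool:
    """Polynomial-time check: every clause has at least one true literal."""
    return all(
        any(assignment[var] == is_positive for var, is_positive in clause)
        for clause in clauses
    )


def brute_force_sat(clauses: List[Clause]) -> Optional[Dict[str, bool]]:
    """Try all 2^n assignments; return a satisfying one, or None if none exists."""
    variables = sorted({var for clause in clauses for var, _ in clause})
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if satisfies(clauses, assignment):
            return assignment
    return None


# (x1 OR x2) AND (NOT x1 OR x3)
formula = [
    [("x1", True), ("x2", True)],
    [("x1", False), ("x3", True)],
]
print(brute_force_sat(formula))

# (x1) AND (NOT x1) has no satisfying assignment.
print(brute_force_sat([[("x1", True)], [("x1", False)]]))  # None
```

Each call to `satisfies` is cheap; the cost comes entirely from the number of candidates, which doubles with every added variable.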

Space Complexity Perspective

Verification often needs only input + assignment storage.
Exact search methods may require recursion stacks, memo tables, or solver state structures.

Edge Cases and Clarifications

Small inputs: NP-hard problems can still be solved exactly in practice.
Structured instances: special constraints can make hard problems tractable.
Pseudo-polynomial algorithms: do not automatically place a problem in P.
Randomized/approximate solvers: useful in practice without resolving P vs NP.

Common Mistakes

Common Mistake: Interpreting NP as "non-polynomial." NP means polynomial-time verifiable.

Common Mistake: Saying "hard in worst case means always hard." Many practical instances are easier due to structure.

Common Mistake: Forgetting that NP-complete discussion applies to decision versions.

Pattern Recognition

You are in P vs NP territory when a problem has:

Existential decision phrasing: "Does there exist...?"
Easy candidate checking but difficult candidate finding.
Large combinatorial search spaces (assignments/subsets/tours/schedules).
Connections to SAT, Clique, Vertex Cover, Hamiltonian Cycle, TSP decision, etc.

Interview Insight

Interview Insight: A strong answer is: "This resembles NP-hard/NP-complete structure. I will provide exact approach for small constraints and an approximation/heuristic for scale, with explicit trade-offs." This shows theory + practical judgment.

Practice Problems

Differentiate P, NP, NP-hard, NP-complete with examples.
Write verifiers for SAT and Clique candidate certificates.
Convert optimization formulations into decision forms.
For a known NP-hard problem, design one exact-small and one approximate-large strategy.

Expert Tip: In system design and production engineering, "hardness awareness" helps set realistic SLAs and choose appropriate solver technology early.

Summary

P vs NP asks whether fast verification implies fast solving.
We know P ⊆ NP; equality remains unknown.
NP-complete problems are central: one polynomial algorithm there would collapse the gap.
In practice, treat NP-complete problems with strategy diversity: exact, approximate, heuristic.
Understanding this topic improves both interview communication and real-world algorithm decisions.

20.8 Simulated Annealing & Heuristic Search

Introduction

Simulated Annealing and Heuristic Search are practical strategies for solving hard optimization problems where exact algorithms are too slow. Instead of guaranteeing the perfect global optimum every time, these methods aim to find very good solutions quickly, especially for large NP-hard search spaces.

For beginners, this topic is the bridge between theory and real-world engineering: when exact methods are unrealistic, you still need smart, measurable ways to produce high-quality solutions.

Real-World Analogy

Imagine trying to find the lowest valley in a huge mountain range at night. A purely greedy strategy that always walks downhill can get stuck in a nearby valley (local minimum). Simulated Annealing sometimes allows uphill moves early, so you can escape shallow valleys and eventually settle into deeper ones as the "temperature" cools down.

Formal Definitions

Heuristic Search

A search strategy guided by domain-specific rules or scoring functions to find good solutions faster than exhaustive search.

Simulated Annealing (SA)

A probabilistic local-search metaheuristic inspired by thermal annealing in metallurgy. At each step, it explores a neighboring solution and may accept worse moves with probability based on temperature and score difference.

Concept Note: SA is a metaheuristic: a general-purpose search template that you instantiate with a problem-specific solution representation, neighbor-generation method, and objective function.

Why This Topic Matters

Provides practical tools for NP-hard optimization where exact methods do not scale.
Useful in routing, scheduling, layout optimization, hyperparameter tuning, and game AI.
Teaches trade-offs between solution quality and runtime budget.
Frequently discussed in advanced interviews as "what would you do at scale?"

Mental Model

Current solution S
   |
   | generate neighbor S'
   v
If better -> accept
If worse  -> maybe accept with probability exp(-delta / T)
   |
   v
Lower temperature T gradually

High T: explore widely
Low  T: exploit best regions

Core Subtopics

1) Objective Function

A numeric score to optimize (minimize cost or maximize reward). The search can only be as good as this score: if the objective does not capture what you actually want, SA will optimize the wrong thing.

2) Neighborhood Design

How to move from one candidate solution to a nearby one (swap two cities, reassign one task, flip one bit, etc.).

3) Cooling Schedule

How temperature decreases over time (geometric cooling, linear cooling, adaptive cooling).

4) Acceptance Rule

If neighbor is better, accept. If worse by delta, accept with probability exp(-delta / T).

Evolution: Brute Force → Better → Optimal (Practical)

Brute Force

Try every possible solution. Correct but impossible for large combinatorial spaces.

Better

Greedy/local search quickly finds decent solutions but often gets trapped in local optima.

Optimal (under large-scale constraints)

Use heuristic/metaheuristic methods (SA, tabu search, genetic algorithms, beam search) with quality-time trade-off controls.

Optimization Insight: For hard optimization, "best solution within time budget" is often more valuable than "provably optimal solution that never finishes."

Step-by-Step: Simulated Annealing Workflow

Pick initial solution S (random or heuristic seed).
Set initial temperature T0 and cooling rate alpha.
Generate neighbor S'.
Compute delta = cost(S') - cost(S) for minimization.
If delta <= 0, accept S'.
Else accept with probability exp(-delta / T).
Track best solution seen so far.
Cool temperature: T = T * alpha.
Stop when T low enough or iteration/time limit reached.

ASCII Diagram

Search Landscape (cost):

High cost  \        /\            /
            \      /  \          /
             \____/    \        /
           local min    \      /
                         \____/   <- global minimum
Low cost

Greedy: falls and gets stuck at local min
SA: can jump out early (high T), then settle later (low T)

Python Implementation (TSP-Style Demonstration)

import math
import random
from typing import List, Tuple

Point = Tuple[float, float]


def distance(a: Point, b: Point) -> float:
    dx = a[0] - b[0]
    dy = a[1] - b[1]
    return math.sqrt(dx * dx + dy * dy)


def route_cost(route: List[int], points: List[Point]) -> float:
    total = 0.0
    n = len(route)
    for i in range(n):
        p = points[route[i]]
        q = points[route[(i + 1) % n]]  # cycle
        total += distance(p, q)
    return total


def random_neighbor(route: List[int]) -> List[int]:
    # Swap two random positions
    n = len(route)
    i, j = random.sample(range(n), 2)
    new_route = route[:]
    new_route[i], new_route[j] = new_route[j], new_route[i]
    return new_route


def simulated_annealing_tsp(
    points: List[Point],
    initial_temp: float = 1000.0,
    cooling_rate: float = 0.995,
    min_temp: float = 1e-3,
    iterations_per_temp: int = 200
) -> Tuple[List[int], float]:
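    n = len(points)
    if n < 2:
        # Nothing to optimize; random_neighbor needs at least two positions to swap.
        return list(range(n)), 0.0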
    current = list(range(n))
    random.shuffle(current)
    current_cost = route_cost(current, points)

    best = current[:]
    best_cost = current_cost

    temp = initial_temp

    while temp > min_temp:
        for _ in range(iterations_per_temp):
            candidate = random_neighbor(current)
            candidate_cost = route_cost(candidate, points)
            delta = candidate_cost - current_cost

            # Always accept better; sometimes accept worse
            if delta <= 0 or random.random() < math.exp(-delta / temp):
                current = candidate
                current_cost = candidate_cost

                if current_cost < best_cost:
                    best = current[:]
                    best_cost = current_cost

        temp *= cooling_rate

    return best, best_cost


if __name__ == "__main__":
    pts = [(0, 0), (1, 5), (5, 2), (6, 6), (8, 3), (2, 1)]
    best_route, best_cost = simulated_annealing_tsp(pts)
    print("Best route:", best_route)
    print("Best cost:", round(best_cost, 3))

Line-by-Line Explanation

route_cost measures objective (tour length).
random_neighbor defines neighborhood by swapping two cities.
delta compares candidate and current cost.
Acceptance rule balances exploration and exploitation.
best is tracked independently from current to avoid losing strong solutions.
Temperature cooling gradually reduces probability of accepting worse moves.

Additional Worked Example (Acceptance Probability)

Example: Suppose current cost is 100 and candidate cost is 105, so delta = 5.

At T = 100: accept probability exp(-5/100) ≈ 0.951 (very likely).
At T = 1: accept probability exp(-5/1) ≈ 0.0067 (very unlikely).

This is exactly the exploration-to-exploitation transition.
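
The arithmetic above can be reproduced with a few lines of code. This sketch assumes a minimization problem and a strictly positive temperature; `acceptance_probability` is an illustrative helper name, not part of the implementation shown earlier.

```python
import math


def acceptance_probability(delta: float, temperature: float) -> float:
    """Metropolis-style rule for minimization: improving moves are always accepted."""
    if delta <= 0:
        return 1.0
    return math.exp(-delta / temperature)


# Same uphill move (delta = 5) at three temperatures.
for t in (100.0, 10.0, 1.0):
    print(f"T={t}: p={acceptance_probability(5.0, t):.4f}")
```

For delta = 5 this gives roughly 0.9512 at T = 100, 0.6065 at T = 10, and 0.0067 at T = 1, so the same uphill move goes from almost always accepted to almost never accepted as the system cools.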

Time Complexity

Let:

K = number of temperature levels
I = iterations per temperature
N_eval = cost to evaluate one candidate

Total complexity is roughly O(K * I * N_eval).

For TSP-like full route recomputation, N_eval = O(n).
With incremental delta evaluation, this can be reduced in many problems.
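
As a sketch of incremental delta evaluation, consider the 2-opt move (reverse a contiguous segment of the tour). Note that this is a different neighborhood from the swap used in the implementation above. With symmetric distances, only two edges change, so the cost difference can be computed in O(1) instead of O(n). The helper name `two_opt_delta` is an assumption for illustration; `distance` and `route_cost` mirror the earlier functions so the block runs on its own.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def distance(a: Point, b: Point) -> float:
    return math.hypot(a[0] - b[0], a[1] - b[1])


def route_cost(route: List[int], points: List[Point]) -> float:
    n = len(route)
    return sum(distance(points[route[i]], points[route[(i + 1) % n]]) for i in range(n))


def two_opt_delta(route: List[int], points: List[Point], i: int, j: int) -> float:
    """Cost change from reversing route[i+1 : j+1], computed in O(1).

    Assumes symmetric distances and 0 <= i < j < len(route).
    Old edges: (route[i], route[i+1]) and (route[j], route[j+1]).
    New edges: (route[i], route[j])   and (route[i+1], route[j+1]).
    """
    n = len(route)
    a, b = points[route[i]], points[route[i + 1]]
    c, d = points[route[j]], points[route[(j + 1) % n]]
    return distance(a, c) + distance(b, d) - distance(a, b) - distance(c, d)


pts = [(0, 0), (1, 5), (5, 2), (6, 6), (8, 3), (2, 1)]
route = list(range(len(pts)))
i, j = 1, 4
new_route = route[: i + 1] + route[i + 1 : j + 1][::-1] + route[j + 1 :]
full_difference = route_cost(new_route, pts) - route_cost(route, pts)
print(abs(full_difference - two_opt_delta(route, pts, i, j)) < 1e-9)  # True
```

Inside an annealing loop, the delta would be computed first and the segment reversed only if the move is accepted, which removes the O(n) recomputation from every rejected candidate.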

Space Complexity

Store current and best solution: typically O(n).
Extra overhead depends on representation and neighbor generation.

Edge Cases

Temperature too low initially: behaves almost greedy, poor exploration.
Cooling too fast: freezes before discovering strong regions.
Cooling too slow: high runtime.
Weak neighborhood: search cannot escape structural traps effectively.
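
To judge whether a cooling rate is too fast or too slow, it helps to compute how many temperature levels a geometric schedule produces before the loop stops. The sketch below assumes the loop shape used in the implementation above (`while temp > min_temp`, then `temp *= cooling_rate`) and a cooling rate strictly between 0 and 1; `geometric_levels` is an illustrative name.

```python
import math


def geometric_levels(initial_temp: float, cooling_rate: float, min_temp: float) -> int:
    """Number of iterations of: while temp > min_temp: ...; temp *= cooling_rate."""
    if initial_temp <= min_temp:
        return 0
    # Smallest k with initial_temp * cooling_rate**k <= min_temp.
    return math.ceil(math.log(min_temp / initial_temp) / math.log(cooling_rate))


levels = geometric_levels(1000.0, 0.995, 1e-3)
print(levels)        # 2757
print(levels * 200)  # 551400 candidate evaluations at 200 iterations per level
```

With the default parameters shown earlier, that is about 2,757 levels. Changing the cooling rate to 0.9 drops this to 132 levels, which is a quick way to see how sharply the rate controls the runtime budget.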

Common Mistakes

Common Mistake: Not defining a meaningful objective function. Bad objective means bad optimization regardless of algorithm.

Common Mistake: Using one fixed parameter setting for all problem sizes/instances.

Common Mistake: Reporting one run only; stochastic methods should be evaluated over multiple runs/seeds.
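
One way to avoid the single-run mistake is a small harness that runs a stochastic solver once per seed and summarizes the results. This is a minimal sketch: `evaluate_over_seeds` is a hypothetical helper, and `toy_solver` is a stand-in (plain random search on a one-dimensional function), not the annealing routine above. To plug in a real solver, it would need to draw its randomness from the `random.Random` instance passed in.

```python
import random
import statistics
from typing import Callable, Dict, List


def evaluate_over_seeds(
    solver: Callable[[random.Random], float], seeds: List[int]
) -> Dict[str, object]:
    """Run the solver once per seed and summarize the final costs."""
    costs = [solver(random.Random(seed)) for seed in seeds]
    return {
        "best": min(costs),
        "mean": statistics.mean(costs),
        "stdev": statistics.stdev(costs) if len(costs) > 1 else 0.0,
        "costs": costs,
    }


def toy_solver(rng: random.Random) -> float:
    """Hypothetical stand-in for one run: random search minimizing (x - 3)^2."""
    return min((rng.uniform(-10, 10) - 3) ** 2 for _ in range(50))


report = evaluate_over_seeds(toy_solver, seeds=[0, 1, 2, 3, 4])
print(report["best"], report["mean"], report["stdev"])
```

Passing an explicit generator per run keeps each seed reproducible and independent of any other code that uses the global `random` module.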

Comparison: Greedy vs SA vs Exact

Approach	Quality	Speed	Guarantee
Greedy	Often decent	Very fast	Rarely global optimal
Simulated Annealing	Usually better with tuning	Moderate	No exact guarantee
Exact Search/DP	Optimal	Often too slow at scale	Optimality proof

Pattern Recognition

Use SA/heuristic search when:

Problem is NP-hard and input size is large.
Exact optimum is less important than high-quality solution quickly.
Objective function is easy to evaluate.
You can design meaningful neighborhood transitions.

Interview Insight

Interview Insight: Say explicitly: "I cannot guarantee global optimum efficiently for this scale, so I will use a metaheuristic (Simulated Annealing) with measurable quality metrics, multiple seeds, and runtime budget constraints." This is a strong senior-level response.

Practice Problems

Implement SA for Traveling Salesman with swap and 2-opt neighborhoods.
Apply SA to maximize score in a scheduling/assignment toy problem.
Compare greedy vs SA quality across 20 random seeds.
Tune cooling schedules and plot cost vs iteration.

Expert Tip: Always benchmark stochastic algorithms with fixed random seeds for reproducibility, then evaluate robustness using multiple random restarts.

Summary

Heuristic search aims for strong solutions quickly in hard search spaces.
Simulated Annealing escapes local minima by probabilistically accepting worse moves early.
Quality depends on objective design, neighborhood design, and cooling schedule.
No exact optimality guarantee, but often excellent practical performance.
Essential for large-scale optimization where exact methods are infeasible.

21.1 Design LRU Cache

Introduction

Designing an LRU (Least Recently Used) Cache is one of the most important data structure design problems for interviews and real-world systems. The cache stores key-value pairs with fixed capacity. When capacity is full and a new key must be inserted, we evict the least recently used key.

The core challenge is not just implementing get/set behavior. The challenge is achieving both operations in constant time, O(1).

Real-World Analogy

Imagine a study desk with space for only 3 books. The books you use often stay on the desk. If a new book comes and desk is full, you remove the book that has not been touched for the longest time. That is exactly LRU behavior.

Formal Definition

An LRU cache supports:

get(key): return value if key exists; otherwise return -1.
put(key, value): insert/update key-value pair.

Constraint: both operations should run in O(1) average time.

Concept Note: "Recently used" means any successful get/put on that key moves it to most-recent position.

Why This Topic Matters

Classic interview design question testing data-structure composition.
Used, exactly or in approximate form, in CPU caches, DB buffer pools, API caching layers, and web backends.
Teaches how to combine fast lookup with fast order maintenance.
Builds foundation for advanced designs like LFU and ARC caches.

Mental Model

Need two abilities together:
1) Find by key quickly         -> Hash Map (dict)
2) Track usage order quickly   -> Doubly Linked List

Most recent <-> ... <-> Least recent

Hash map gives direct node access. Doubly linked list gives O(1) move-to-front and O(1) remove-from-end eviction.

Evolution: Brute Force → Better → Optimal

Brute Force

Use list of keys by recency. Lookup is O(n); updates may also be O(n).

Better

Use hash map for key lookup and separate array/list for order. Still costly to move keys in middle due to shifting.

Optimal

Use Hash Map + Doubly Linked List:

get: O(1) lookup + O(1) move node to front.
put: O(1) insert/update + O(1) tail eviction if needed.

Optimization Insight: The key design trick is storing references to linked-list nodes in the hash map. Without node references, removing/moving arbitrary nodes becomes slow.

Data Structure Blueprint

Doubly Linked List Role

Head side: most recently used (MRU).
Tail side: least recently used (LRU).
Supports O(1) remove and O(1) insert at front if node pointer is known.

Hash Map Role

map[key] = node_reference
Provides O(1) direct access to node for get/update.

Step-by-Step Operations

get(key)

If key not in map, return -1.
Get node from map.
Remove node from current place in list.
Insert node right after head (MRU position).
Return node value.

put(key, value)

If key already exists:
- Update node value.
- Move node to MRU position.
If key does not exist:
- Create node and insert at MRU position.
- Add key -> node in map.
- If size exceeds capacity, remove tail's previous node (LRU) and delete from map.

ASCII Diagram

Head <-> [K3] <-> [K1] <-> [K7] <-> Tail
          MRU                    LRU

get(K1):
Head <-> [K1] <-> [K3] <-> [K7] <-> Tail

put(new) when full:
Evict K7 (near tail), insert new near head

Python Implementation

class Node:
    def __init__(self, key: int, value: int):
        self.key = key
        self.value = value
        self.prev = None
        self.next = None


class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.cache = {}  # key -> node

        # Dummy head and tail to simplify edge cases
        self.head = Node(0, 0)  # MRU side next to head
        self.tail = Node(0, 0)  # LRU side prev to tail
        self.head.next = self.tail
        self.tail.prev = self.head

    def _remove(self, node: Node) -> None:
        prev_node = node.prev
        next_node = node.next
        prev_node.next = next_node
        next_node.prev = prev_node

    def _add_to_front(self, node: Node) -> None:
        first_real = self.head.next
        self.head.next = node
        node.prev = self.head
        node.next = first_real
        first_real.prev = node

    def get(self, key: int) -> int:
        if key not in self.cache:
            return -1

        node = self.cache[key]
        self._remove(node)
        self._add_to_front(node)
        return node.value

    def put(self, key: int, value: int) -> None:
        if key in self.cache:
            node = self.cache[key]
            node.value = value
            self._remove(node)
            self._add_to_front(node)
            return

        new_node = Node(key, value)
        self.cache[key] = new_node
        self._add_to_front(new_node)

        if len(self.cache) > self.capacity:
            # Evict least recently used node (just before tail)
            lru = self.tail.prev
            self._remove(lru)
            del self.cache[lru.key]

Line-by-Line Explanation

Node and Sentinels

Node stores key, value, prev, next.
Dummy head and tail avoid null-check complexity at boundaries.

Private Helpers

_remove(node): unlink node in O(1).
_add_to_front(node): place node after head in O(1).

Public API

get: check map, move accessed node to front, return value.
put existing key: update + move to front.
put new key: insert front; if overflow, evict tail-prev.

Worked Example

Example: Capacity = 2

put(1, 10) -> cache: [1]
put(2, 20) -> cache: [2, 1] (2 is MRU)
get(1) returns 10 -> order becomes [1, 2]
put(3, 30) -> capacity exceeded, evict LRU key 2 -> order [3, 1]
get(2) returns -1

Time Complexity

get: O(1) average
put: O(1) average
Why: hashmap lookup O(1) average, linked-list pointer operations O(1)

Space Complexity

O(capacity) for hash map + linked list nodes.

Edge Cases

Capacity = 1: every new key evicts previous key.
Put existing key: must update value and mark as most recent.
Get missing key: return -1 without modifying order.
Repeated gets: the key is moved to the front each time and stays there until another key is accessed.

Common Mistakes

Common Mistake: Forgetting to move node to front on get, which breaks recency semantics.

Common Mistake: Using singly linked list, making arbitrary node removal O(n).

Common Mistake: Evicting wrong side (MRU instead of LRU).

Pattern Recognition

Use this design pattern when problem asks for:

Fast lookup by key + fast recency/frequency order maintenance.
O(1) operations under fixed capacity policy.
Eviction by usage order (LRU/LFU variants).

Interview Insight

Interview Insight: In interviews, state design first: "I will combine hashmap for O(1) key lookup and doubly linked list for O(1) recency updates/eviction." Then walk through one get and one put-with-eviction example.

Practice Problems

Implement LRU Cache with same API and test edge cases.
Add delete(key) operation in O(1).
Implement thread-safe LRU conceptually (locks/read-write strategy).
Compare LRU vs LFU behavior on repeated access patterns.

Expert Tip: In production Python, consider collections.OrderedDict (move_to_end and popitem) for a simpler implementation, but understand manual DLL + hashmap deeply for interview mastery and custom policy extensions.
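
For comparison with the manual design, here is a minimal sketch of the same get/put contract built on `collections.OrderedDict`. The class name `OrderedDictLRUCache` is illustrative. Unlike the linked-list version, the most recently used key sits at the right end of the ordering and the least recently used key at the left end.

```python
from collections import OrderedDict


class OrderedDictLRUCache:
    """Same get/put contract as the manual version, backed by OrderedDict."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data: "OrderedDict[int, int]" = OrderedDict()

    def get(self, key: int) -> int:
        if key not in self.data:
            return -1
        self.data.move_to_end(key)  # mark as most recently used (right end)
        return self.data[key]

    def put(self, key: int, value: int) -> None:
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # drop the least recently used (left end)


# Replays the worked example with capacity 2.
cache = OrderedDictLRUCache(2)
cache.put(1, 10)
cache.put(2, 20)
print(cache.get(1))  # 10
cache.put(3, 30)     # evicts key 2
print(cache.get(2))  # -1
```

The trade-off is less control: custom policies such as LFU tie-breaking or per-entry expiry are easier to add when you own the node structure.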

Summary

LRU cache evicts the least recently used key when full.
Optimal design is Hash Map + Doubly Linked List.
Both get and put achieve O(1) average time.
Correct recency updates are as important as correct lookup.
This pattern is a cornerstone of data-structure design interviews.

21.2 Design LFU Cache

Introduction

An LFU (Least Frequently Used) cache evicts the key with the lowest access frequency when the cache is full. If several keys share the same minimum frequency, the usual tie-break is LRU among those keys (evict the least recently used among the least frequent).

This topic builds directly on LRU thinking, but the eviction policy depends on counts of gets and puts, not only recency.

Real-World Analogy

Imagine a small bookshelf that fits only a few books. You track how often each book is opened. When you need space for a new book, you remove the one opened least often. If two books were opened equally rarely, you remove the one you have not touched for the longest time.

Formal Definition

Support:

get(key): return the value if present; otherwise -1. A successful get increments that key’s frequency.
put(key, value): set or insert. Inserting a new key sets its frequency to 1. Updating an existing key also counts as a use (frequency increases by 1).

When capacity is exceeded, evict the LFU key; break ties by LRU.

Concept Note: Problem statements sometimes differ slightly (e.g. whether put on existing key increments frequency). Always confirm the spec in an interview.

Why This Topic Matters

Common follow-up after LRU in FAANG-style interviews.
Models “popularity” better than pure recency for some workloads (hot keys stay longer).
Teaches composing multiple structures: key map, per-frequency lists, global min frequency.

Mental Model

key -> Node(key, value, freq)

freq 1:  head <-> ... <-> tail   (LRU order within this freq)
freq 2:  head <-> ... <-> tail
freq 3:  ...

min_freq  ->  points to smallest freq that still has keys
Eviction: remove LRU node from list at min_freq

Evolution: Brute Force → Better → Optimal

Brute Force

Store keys in a dict with (freq, last_used_time). On eviction, scan all keys to find minimum — O(n) per eviction.
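
A minimal sketch of this brute-force idea, mainly useful as a reference to test a faster version against. The class name BruteForceLFU and the logical clock field are assumptions made for the illustration.

```python
class BruteForceLFU:
    """O(n) eviction: scan every key for the smallest (freq, last_used) pair."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.values: dict = {}
        self.meta: dict = {}  # key -> (freq, last_used_time)
        self.clock = 0        # logical time, bumped on every use

    def _use(self, key: int) -> None:
        self.clock += 1
        freq = self.meta[key][0] if key in self.meta else 0
        self.meta[key] = (freq + 1, self.clock)

    def get(self, key: int) -> int:
        if key not in self.values:
            return -1
        self._use(key)
        return self.values[key]

    def put(self, key: int, value: int) -> None:
        if self.capacity <= 0:
            return
        if key not in self.values and len(self.values) >= self.capacity:
            # Linear scan: lowest freq wins, oldest last use breaks ties
            victim = min(self.meta, key=lambda k: self.meta[k])
            del self.values[victim]
            del self.meta[victim]
        self.values[key] = value
        self._use(key)


slow = BruteForceLFU(2)
slow.put(1, 1)
slow.put(2, 2)
slow.get(1)     # key 1 now has freq 2
slow.put(3, 3)  # scan finds key 2 (freq 1, older use) and evicts it
```

Comparing tuples (freq, last_used_time) gives the LFU-then-LRU ordering in one step, at the cost of touching every key on each eviction.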

Better

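Keep a min-heap of (freq, time, key). Finding the victim no longer needs a full scan, but heap pushes and pops cost O(log n), and every frequency bump leaves a stale entry that must be skipped or removed later, so the operations are not O(1).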

Optimal

Hash map + doubly linked lists per frequency + min_freq pointer. Each get/put touches only local list operations — O(1) average.

Optimization Insight: Group keys by frequency so you never scan all keys to find the minimum frequency bucket — you only maintain min_freq and move it when a bucket empties.

Data Structure Blueprint

key_to_node: key -> Node for O(1) lookup.
freq_to_list: freq -> DoublyLinkedList storing nodes at that frequency in recency order (head.next = MRU, tail.prev = LRU).
min_freq: smallest frequency that currently has at least one key.

Step-by-Step Operations

get(key)

If key missing, return -1.
Remove node from its current frequency list.
If that list becomes empty and freq == min_freq, increment min_freq (next bucket exists because we are about to add at freq+1).
Increment node’s frequency; append to MRU side of the new frequency list.
Return value.

put(key, value)

If key exists: update value, then same frequency bump as get (one use).
If new key and cache full: evict LRU node from freq_to_list[min_freq], delete from key_to_node.
Insert new node with frequency 1; set min_freq = 1 (a new key always has the lowest possible frequency).

ASCII Diagram

freq 1:  [A] <-> [B]     (B is LRU at freq 1)
freq 2:  [C]

min_freq = 1  ->  evict B if we must evict one key at freq 1

Python Implementation

class Node:
    def __init__(self, key: int, value: int, freq: int = 1):
        self.key = key
        self.value = value
        self.freq = freq
        self.prev = None
        self.next = None


class DoublyLinkedList:
    """Maintains LRU order: head side = MRU, before tail = LRU."""

    def __init__(self) -> None:
        self.head = Node(0, 0, 0)
        self.tail = Node(0, 0, 0)
        self.head.next = self.tail
        self.tail.prev = self.head

    def add_to_front(self, node: Node) -> None:
        nxt = self.head.next
        self.head.next = node
        node.prev = self.head
        node.next = nxt
        nxt.prev = node

    def remove(self, node: Node) -> None:
        node.prev.next = node.next
        node.next.prev = node.prev

    def pop_lru(self) -> Node:
        """Remove node just before tail (LRU)."""
        if self.head.next is self.tail:
            raise ValueError("empty list")
        lru = self.tail.prev
        self.remove(lru)
        return lru

    def is_empty(self) -> bool:
        return self.head.next is self.tail


class LFUCache:
    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.key_to_node: dict = {}
        self.freq_to_list: dict = {}
        self.min_freq = 0

    def _ensure_list(self, freq: int) -> DoublyLinkedList:
        if freq not in self.freq_to_list:
            self.freq_to_list[freq] = DoublyLinkedList()
        return self.freq_to_list[freq]

    def _touch(self, node: Node) -> None:
        f = node.freq
        self.freq_to_list[f].remove(node)
        if self.freq_to_list[f].is_empty():
            del self.freq_to_list[f]
            if self.min_freq == f:
                # Moved key will be inserted at f+1, so new minimum frequency is f+1
                self.min_freq = f + 1
        node.freq = f + 1
        self._ensure_list(node.freq).add_to_front(node)

    def get(self, key: int) -> int:
        if key not in self.key_to_node:
            return -1
        node = self.key_to_node[key]
        self._touch(node)
        return node.value

    def put(self, key: int, value: int) -> None:
        if self.capacity <= 0:
            return
        if key in self.key_to_node:
            node = self.key_to_node[key]
            node.value = value
            self._touch(node)
            return

        if len(self.key_to_node) >= self.capacity:
            lst = self.freq_to_list[self.min_freq]
            ev = lst.pop_lru()
            del self.key_to_node[ev.key]
            if lst.is_empty():
                del self.freq_to_list[self.min_freq]

        node = Node(key, value, 1)
        self.key_to_node[key] = node
        self._ensure_list(1).add_to_front(node)
        self.min_freq = 1

Line-by-Line Explanation

DoublyLinkedList holds all keys sharing the same frequency; order encodes LRU tie-breaking.
_touch removes the node from old freq list; if that list empties and was min_freq, bump min_freq.
Node moves to freq + 1 at MRU position.
put on new key: if full, evict LRU at min_freq via pop_lru.
New keys start at freq 1 and reset min_freq to 1.

Worked Example

Example: Capacity = 2

put(1, 1), put(2, 2) — both freq 1, min_freq = 1.
get(1) — key 1 goes to freq 2; only key 2 remains at freq 1, min_freq = 1.
put(3, 3) — full; evict LRU among freq 1 → key 2 removed.
Cache holds keys 1 and 3.
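
To check this trace by running it, here is a compact variant that swaps the hand-written doubly linked lists for collections.OrderedDict buckets (a different technique from the implementation above, chosen only to keep the sketch short). The names CompactLFU, entries, and buckets are assumptions for this illustration.

```python
from collections import OrderedDict, defaultdict


class CompactLFU:
    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.entries: dict = {}                  # key -> (value, freq)
        self.buckets = defaultdict(OrderedDict)  # freq -> keys, oldest first
        self.min_freq = 0

    def _touch(self, key: int) -> None:
        value, freq = self.entries[key]
        del self.buckets[freq][key]
        if not self.buckets[freq]:
            del self.buckets[freq]
            if self.min_freq == freq:
                self.min_freq = freq + 1  # key is about to land at freq + 1
        self.entries[key] = (value, freq + 1)
        self.buckets[freq + 1][key] = None  # newest insertion = MRU

    def get(self, key: int) -> int:
        if key not in self.entries:
            return -1
        self._touch(key)
        return self.entries[key][0]

    def put(self, key: int, value: int) -> None:
        if self.capacity <= 0:
            return
        if key in self.entries:
            self.entries[key] = (value, self.entries[key][1])
            self._touch(key)
            return
        if len(self.entries) >= self.capacity:
            # Oldest key in the lowest-frequency bucket is the LRU tie-break
            victim, _ = self.buckets[self.min_freq].popitem(last=False)
            del self.entries[victim]
            if not self.buckets[self.min_freq]:
                del self.buckets[self.min_freq]
        self.entries[key] = (value, 1)
        self.buckets[1][key] = None
        self.min_freq = 1


lfu = CompactLFU(2)
lfu.put(1, 1)
lfu.put(2, 2)
lfu.get(1)     # key 1 moves to freq 2
lfu.put(3, 3)  # key 2 is the only key at freq 1, so it is evicted
remaining = sorted(lfu.entries)
```

After the last call, remaining holds the keys 1 and 3, matching the trace.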

Time Complexity

get and put: O(1) average — hash map lookup plus constant linked-list work per list.

Space Complexity

O(capacity) for nodes and map entries; frequency lists share those nodes.

Edge Cases

capacity == 0: put should no-op (LeetCode-style).
Single key repeated access: frequency grows; min_freq updates when lower buckets empty.
Tie on eviction: must evict LRU within minimum frequency list.

Common Mistakes

Common Mistake: Not updating min_freq after removing the last key at the current minimum frequency.

Common Mistake: Confusing LRU order inside a frequency list (eviction must take the correct end).

Common Mistake: Assuming LFU is “strictly better” than LRU — workload-dependent.

LRU vs LFU (Quick Comparison)

Policy	Evicts based on	Good when
LRU	Recency	Temporal locality
LFU	Frequency (tie: LRU)	Stable hot keys

Pattern Recognition

Reach for LFU when the problem mentions:

Evict least frequent key, or “count” of accesses.
Tie-breaking by recency among equal frequency.

Interview Insight

Interview Insight: State the design in one sentence: “Map from key to node; map from frequency to a DLL of nodes at that frequency in LRU order; track min_freq for O(1) eviction.” Then trace get and a full put that triggers eviction.

Practice Problems

Implement LFU with the LeetCode 460 API and test against examples.
Compare behavior with LRU on the same access sequence.
Extend with a max total memory size in bytes (variable value sizes).

Expert Tip: After LFU, study LFU with aging (decay counters over time) — real caches sometimes combine frequency and recency to avoid “sticky” keys.

Summary

LFU evicts the least frequently used key; ties broken by LRU within that frequency.
Use key -> node, freq -> DLL, and min_freq for O(1) operations.
Moving a key to a higher frequency may empty the old bucket and advance min_freq.
Clarify problem details for put on existing keys in interviews.

21.3 Design Twitter

Introduction

The "Design Twitter" problem is a classic object-oriented + data structure design question. You need to support posting tweets, following/unfollowing users, and returning a personalized news feed containing recent tweets from the user and people they follow.

This problem tests whether you can combine clean API design, efficient data modeling, and practical complexity trade-offs.

Real-World Analogy

Think of each user as a newspaper publisher. If user A subscribes to users B, C, and D, A's feed should show the newest headlines across all these publishers, mixed by timestamp. You do not need to sort the entire history each time; you only need the most recent few items.

Formal Definition

Support operations:

postTweet(userId, tweetId)
getNewsFeed(userId): return up to 10 most recent tweet IDs from user + followees.
follow(followerId, followeeId)
unfollow(followerId, followeeId)

Ordering should be by most recent first.

Concept Note: In this standard version, each tweet has a globally increasing timestamp, making recency comparisons straightforward.

Why This Topic Matters

Common interview problem that mixes hash maps, sets, lists, and heaps.
Introduces real feed-aggregation thinking used in large social systems.
Teaches k-way merge pattern for "top recent items from multiple streams."

Mental Model

Each user has:
  - own tweet list (most recent at end)
  - set of followees

Feed for user U:
  Merge recent tweets from:
    U + all users U follows
  Keep top 10 by timestamp

Evolution: Brute Force → Better → Optimal

Brute Force

Collect all tweets in system, filter by follow relations, sort, take top 10. Very expensive.

Better

Collect only tweets from relevant users (self + followees), sort combined list, take top 10.
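
A minimal sketch of this collect-and-sort approach, assuming tweets are stored per user as (timestamp, tweetId) pairs with unique timestamps. The function name feed_by_sorting and its parameters are illustrative, not part of the problem's API.

```python
from typing import Dict, List, Set, Tuple


def feed_by_sorting(
    tweets: Dict[int, List[Tuple[int, int]]],
    followees: Dict[int, Set[int]],
    user_id: int,
    limit: int = 10,
) -> List[int]:
    """Gather tweets from self + followees, sort newest first, keep `limit`."""
    users = set(followees.get(user_id, ())) | {user_id}
    candidates = [pair for u in users for pair in tweets.get(u, [])]
    candidates.sort(reverse=True)  # tuples sort by timestamp first
    return [tweet_id for _, tweet_id in candidates[:limit]]


tweets = {1: [(1, 5), (2, 6)], 2: [(3, 7)]}
feed_following = feed_by_sorting(tweets, {1: {2}}, 1)  # user 1 follows user 2
feed_alone = feed_by_sorting(tweets, {}, 1)            # user 1 follows nobody
```

The cost is O(T log T) per request, where T is the total number of tweets across the relevant users, even though only a handful of items are returned.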

Optimal (for interview constraints)

Use per-user tweet lists and max-heap k-way merge over latest tweet from each followed stream.

Avoid sorting all candidate tweets from scratch each request.
Extract only up to 10 feed items.

Optimization Insight: Feed asks only top 10, so do partial extraction via heap instead of full global sort.

Data Structure Design

followees: dict[int, set[int]] -> who each user follows.
tweets: dict[int, list[tuple[int, int]]] -> per-user list of (timestamp, tweetId).
time counter -> incremented each post to maintain global order.

Step-by-Step Operations

postTweet(userId, tweetId)

Increment global timestamp.
Append (time, tweetId) to that user's tweet list.

follow(followerId, followeeId)

If same user, ignore (or safely no-op).
Add followee to follower's set.

unfollow(followerId, followeeId)

Remove followee from follower set if present.
If absent, no-op.

getNewsFeed(userId)

Build candidate user set: self + followees.
Push each candidate user's latest tweet into max-heap (by timestamp).
Pop most recent tweet, append to answer.
From same user stream, push the previous tweet (if exists).
Repeat until 10 tweets or heap empty.

ASCII Diagram

User 1 follows: {2, 3}

Tweets:
U1: (t=8, id=101), (t=12, id=102)
U2: (t=11, id=201)
U3: (t=9, id=301), (t=10, id=302)

Heap starts with latest from each stream:
[(12,102,U1), (11,201,U2), (10,302,U3)]
Pop top repeatedly and push previous from same user stream.

Python Implementation

import heapq
from collections import defaultdict
from typing import List


class Twitter:
    def __init__(self):
        self.time = 0
        self.followees = defaultdict(set)  # follower -> set of followees
        self.tweets = defaultdict(list)    # user -> list of (time, tweetId)

    def postTweet(self, userId: int, tweetId: int) -> None:
        self.time += 1
        self.tweets[userId].append((self.time, tweetId))

    def getNewsFeed(self, userId: int) -> List[int]:
        # .get avoids creating empty defaultdict entries on a read-only call
        users = set(self.followees.get(userId, ()))
        users.add(userId)  # user should see own tweets

        # Max-heap using negative time in Python min-heap
        heap = []
        for u in users:
            user_tweets = self.tweets.get(u)
            if user_tweets:
                idx = len(user_tweets) - 1  # latest index
                t, tid = user_tweets[idx]
                heapq.heappush(heap, (-t, tid, u, idx))

        feed = []
        while heap and len(feed) < 10:
            neg_t, tid, u, idx = heapq.heappop(heap)
            feed.append(tid)

            prev_idx = idx - 1
            if prev_idx >= 0:
                pt, ptid = self.tweets[u][prev_idx]
                heapq.heappush(heap, (-pt, ptid, u, prev_idx))

        return feed

    def follow(self, followerId: int, followeeId: int) -> None:
        if followerId == followeeId:
            return
        self.followees[followerId].add(followeeId)

    def unfollow(self, followerId: int, followeeId: int) -> None:
        self.followees[followerId].discard(followeeId)

Line-by-Line Explanation

time gives unique increasing order for tweets.
tweets[user] acts as an append-only personal timeline.
getNewsFeed initializes heap with latest tweet from each relevant user.
Each pop gives globally most recent remaining tweet among streams.
After pop, pushing previous tweet from same stream creates k-way merge behavior.

Worked Example

Example:

User 1 posts 5, posts 6.
User 1 follows user 2.
User 2 posts 7.
getNewsFeed(1) returns [7, 6, 5] (newest first).
User 1 unfollows user 2.
getNewsFeed(1) returns [6, 5].

Time Complexity

postTweet: O(1)
follow/unfollow: O(1) average (set operations)
getNewsFeed:
- Heap init over F followed users + self: O(F log F) worst-case (or O(F) with heapify variant).
- Up to 10 pop/push operations: O(10 log F) = O(log F), because the feed size is a fixed constant.
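
The heapify variant mentioned above could look roughly like this. It is a sketch under the same storage assumptions as the implementation (per-user lists of (time, tweetId), oldest first); the function name merge_recent and its parameters are made up for the example.

```python
import heapq
from typing import Dict, Iterable, List, Tuple


def merge_recent(
    tweets: Dict[int, List[Tuple[int, int]]],
    users: Iterable[int],
    limit: int = 10,
) -> List[int]:
    # One entry per non-empty stream: (-time, tweetId, user, index)
    heap = []
    for u in users:
        timeline = tweets.get(u)
        if timeline:
            idx = len(timeline) - 1
            t, tid = timeline[idx]
            heap.append((-t, tid, u, idx))
    heapq.heapify(heap)  # O(F) build instead of F separate pushes

    feed = []
    while heap and len(feed) < limit:
        _, tid, u, idx = heap[0]  # peek at the most recent remaining tweet
        feed.append(tid)
        if idx > 0:
            pt, ptid = tweets[u][idx - 1]
            # heapreplace pops the top and pushes the next-older tweet in one step
            heapq.heapreplace(heap, (-pt, ptid, u, idx - 1))
        else:
            heapq.heappop(heap)  # this user's stream is exhausted
    return feed


streams = {
    1: [(8, 101), (12, 102)],
    2: [(11, 201)],
    3: [(9, 301), (10, 302)],
}
merged = merge_recent(streams, [1, 2, 3])
top_two = merge_recent(streams, [1, 2, 3], limit=2)
```

With the data from the ASCII diagram, merged comes out as [102, 201, 302, 301, 101].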

Space Complexity

Follow graph storage: O(total follow relations).
Tweet storage: O(total tweets).
Feed heap: O(F).

Edge Cases

User with no tweets and no follows: empty feed.
Self-follow requests: usually ignored.
Unfollow non-followed user: no-op.
Multiple users with same logical time: avoided by global incrementing counter.

Common Mistakes

Common Mistake: Sorting all tweets every feed request instead of using top-k merge.

Common Mistake: Forgetting to include user's own tweets in feed.

Common Mistake: Removing follow relation with expensive list operations instead of set discard.

System Design Extension Insight

At real scale, pull-based feed generation can become expensive. Systems often use hybrid fan-out strategies:

Fan-out on write: push tweet to followers' timelines at post time.
Fan-out on read: compute feed when requested.
Hybrid: push for normal users, pull for celebrity accounts.
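
To make the contrast concrete, here is a toy fan-out-on-write sketch. It is an illustration under simplifying assumptions, not how any real system is built: the class name PushFeed, the snake_case method names, and the bounded in-memory timelines are all invented for the example, and it does not backfill older tweets on follow or remove entries on unfollow.

```python
from collections import defaultdict, deque
from itertools import count


class PushFeed:
    """Fan-out on write: each post is copied into every follower's timeline."""

    def __init__(self, feed_size: int = 10) -> None:
        self.clock = count(1)
        self.followers = defaultdict(set)  # followee -> set of followers
        # Bounded timeline per user; the oldest entry falls off automatically
        self.timelines = defaultdict(lambda: deque(maxlen=feed_size))

    def follow(self, follower_id: int, followee_id: int) -> None:
        if follower_id != followee_id:
            self.followers[followee_id].add(follower_id)

    def post_tweet(self, user_id: int, tweet_id: int) -> None:
        entry = (next(self.clock), tweet_id)
        # Write cost grows with the follower count (the "celebrity" problem)
        for reader in self.followers[user_id] | {user_id}:
            self.timelines[reader].append(entry)

    def get_news_feed(self, user_id: int) -> list:
        # Entries were appended in time order, so reversing gives newest first
        return [tweet_id for _, tweet_id in reversed(self.timelines[user_id])]


push = PushFeed()
push.follow(1, 2)
push.post_tweet(1, 5)
push.post_tweet(1, 6)
push.post_tweet(2, 7)
feed_user_1 = push.get_news_feed(1)
feed_user_2 = push.get_news_feed(2)
```

Reads become a simple copy of a small list, while each post costs one write per follower, which is the trade-off the hybrid strategy tries to balance.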

Pattern Recognition

This problem pattern appears when you need:

Top-k recent items from multiple sorted streams.
Social graph style follow/unfollow relations.
Efficient API operations with evolving user activity.

Interview Insight

Interview Insight: First solve cleanly for coding interview constraints (hash maps + heap merge). Then mention production-level feed fan-out trade-offs to show senior design awareness.

Practice Problems

Extend feed size from 10 to configurable k.
Add likeTweet and return top liked recent tweets.
Implement pagination for older feed pages.
Add blocked users constraint to feed filtering.

Expert Tip: For interview code, prioritize correctness and clean APIs first. Once stable, discuss scaling choices (storage sharding, write amplification, cache invalidation) as advanced follow-up.

Summary

Design Twitter combines graph relationships and top-k feed aggregation.
Use per-user tweet lists, follow sets, and heap-based k-way merge for feed.
Main feed optimization: extract only needed recent items, not full history.
This problem is a strong bridge between DSA design and system design thinking.

21.4 Design MinStack

Introduction

MinStack is a stack data structure that supports normal stack operations and can also return the minimum element in constant time. The key requirement is: push, pop, top, and getMin should all be O(1).

This is a classic interview design problem because it looks simple, but naive approaches often fail on time complexity.

Real-World Analogy

Imagine a pile of books where each book has a weight label. Besides normal push/pop behavior, you want to instantly answer: “What is the lightest book currently in the pile?” If you scan the full pile every time, it is slow. MinStack keeps extra tracking so the answer is immediate.

Formal Definition

Design a stack that supports:

push(x): push element x.
pop(): remove top element.
top(): return top element.
getMin(): return minimum element currently in stack.

All operations must run in O(1).

Concept Note: "O(1)" for getMin is the core constraint that rules out scanning the full stack.

Why This Topic Matters

Very common interview question and often asked as a warm-up for advanced design questions.
Teaches augmentation pattern: enrich a base data structure with auxiliary state.
Same pattern appears in max-stack, queue-with-min, and monotonic data structures.

Mental Model

Main stack: stores actual values
Min stack : stores minimum so far at each depth

Depth i in min stack = minimum among first i elements in main stack

When you pop from main stack, you also pop from min stack, so both remain synchronized.

Evolution: Brute Force → Better → Optimal

Brute Force

Keep one normal stack; for getMin, scan all elements.

push/pop/top: O(1)
getMin: O(n)

Better

Maintain one variable current_min. Easy for push, but pop becomes hard when the popped element equals current min (you no longer know next minimum without scanning).
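
A short sketch of the single-variable idea shows where the cost hides. The class name SingleMinStack is illustrative; returning None from getMin on an empty stack is an assumption of this sketch.

```python
class SingleMinStack:
    """Tracks one current_min; pop must rescan when the minimum leaves."""

    def __init__(self) -> None:
        self.stack = []
        self.current_min = None

    def push(self, val: int) -> None:
        self.stack.append(val)
        if self.current_min is None or val < self.current_min:
            self.current_min = val

    def pop(self) -> None:
        val = self.stack.pop()
        if val == self.current_min:
            # The O(n) rescan that the two-stack design avoids
            self.current_min = min(self.stack) if self.stack else None

    def getMin(self):
        return self.current_min


s = SingleMinStack()
for v in (4, 1, 3):
    s.push(v)
min_before = s.getMin()
s.pop()  # removes 3, minimum unchanged
s.pop()  # removes 1, forcing a rescan of the remaining values
min_after = s.getMin()
```

The answers are correct, but a sequence that keeps popping the current minimum pays O(n) on each such pop.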

Optimal

Use two stacks:

Main stack stores values.
Min stack stores minimum-so-far after each push.

Now all operations are O(1).

Optimization Insight: Instead of recomputing minima, precompute and store them incrementally at each depth.

Step-by-Step Operations

push(x)

Push x to main stack.
If min stack empty, push x.
Else push min(x, min_stack[-1]) to min stack.

pop()

Pop from main stack.
Pop from min stack.

top()

Return main stack top.

getMin()

Return min stack top.

ASCII Diagram

After pushes: 5, 3, 7, 2

Main: [5, 3, 7, 2]
Min : [5, 3, 3, 2]

top()    -> 2
getMin() -> 2

After pop():
Main: [5, 3, 7]
Min : [5, 3, 3]
getMin() -> 3

Python Implementation

class MinStack:
    def __init__(self):
        self.stack = []
        self.min_stack = []

    def push(self, val: int) -> None:
        self.stack.append(val)
        if not self.min_stack:
            self.min_stack.append(val)
        else:
            self.min_stack.append(min(val, self.min_stack[-1]))

    def pop(self) -> None:
        self.stack.pop()
        self.min_stack.pop()

    def top(self) -> int:
        return self.stack[-1]

    def getMin(self) -> int:
        return self.min_stack[-1]

Line-by-Line Explanation

stack holds actual values in LIFO order.
min_stack[i] stores the minimum of stack[0..i], that is, of the bottom i + 1 elements currently in the stack.
push computes new running minimum instantly from previous minimum.
pop removes both stacks together to keep depths aligned.
getMin is O(1) because minimum is always at min_stack[-1].

Alternative Compact Approach

You can also store pairs in one stack: (value, min_so_far). This avoids maintaining two separate lists but uses the same core idea.

class MinStackPairs:
    def __init__(self):
        self.stack = []

    def push(self, val: int) -> None:
        current_min = val if not self.stack else min(val, self.stack[-1][1])
        self.stack.append((val, current_min))

    def pop(self) -> None:
        self.stack.pop()

    def top(self) -> int:
        return self.stack[-1][0]

    def getMin(self) -> int:
        return self.stack[-1][1]

Worked Example

Example:

push(4) -> min=4
push(1) -> min=1
push(3) -> min=1
getMin() returns 1
pop() removes 3
getMin() still 1
pop() removes 1
getMin() now 4

Time Complexity

push: O(1)
pop: O(1)
top: O(1)
getMin: O(1)

Space Complexity

O(n) extra space for min tracking.
Two-stack version stores n values + n minima.
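
One common way to shrink the extra storage, sketched here as an optional variant rather than a replacement: push onto the min stack only when the new value is less than or equal to the current minimum. The class name LeanMinStack is illustrative. Worst-case extra space is still O(n) (for example, strictly decreasing input).

```python
class LeanMinStack:
    def __init__(self) -> None:
        self.stack = []
        self.min_stack = []  # only values that were a minimum when pushed

    def push(self, val: int) -> None:
        self.stack.append(val)
        # <= rather than < so each duplicate minimum gets its own entry
        if not self.min_stack or val <= self.min_stack[-1]:
            self.min_stack.append(val)

    def pop(self) -> None:
        val = self.stack.pop()
        if val == self.min_stack[-1]:
            self.min_stack.pop()

    def top(self) -> int:
        return self.stack[-1]

    def getMin(self) -> int:
        return self.min_stack[-1]


lean = LeanMinStack()
for v in (2, 2, 3, 5):
    lean.push(v)
extra_entries = len(lean.min_stack)  # only the two 2s were recorded
lean.pop()  # removes 5
lean.pop()  # removes 3
lean.pop()  # removes one 2; the other 2 is still the minimum
```

Using a strict < comparison here would break on duplicate minima, because popping one copy would discard the record that the other copy still needs.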

Edge Cases

Duplicate minima: min stack still works because each depth stores min-so-far.
All decreasing values: min stack mirrors main values.
All increasing values: min stack repeats first minimum.
Operations on empty stack: define behavior (exception/no-op) based on problem specification.

Common Mistakes

Common Mistake: Updating min only on push, but not synchronizing pop for min stack.

Common Mistake: Tracking a single global min variable without handling pop of current minimum.

Common Mistake: Scanning stack in getMin, violating O(1) requirement.

Pattern Recognition

Use this augmentation pattern when:

A base data structure needs an extra query in constant time.
You can precompute “state so far” incrementally (min/max/gcd prefix-like metadata).
Push/pop operations naturally maintain aligned metadata stacks.

Interview Insight

Interview Insight: Start by saying: "I will maintain a second stack for running minima synchronized with the main stack." Then quickly trace a duplicate-minimum case to prove correctness.

Practice Problems

Design MaxStack with O(1) max retrieval.
Queue with amortized O(1) min using two MinStacks.
Support getSecondMin() with reasonable complexity trade-offs.
Build a stack that supports getMin and getMax in O(1).

Expert Tip: In interview code, choose clarity over clever compression tricks. The two-stack design is simple, robust, and easy to explain under pressure.

Summary

MinStack augments a stack to answer minimum queries in O(1).
Optimal idea: maintain synchronized running-min metadata.
All required operations become O(1) with O(n) extra space.
This is a foundational design pattern for augmented data structures.

21.5 Design HashMap

Introduction

Designing a HashMap means building a data structure that stores key-value pairs and supports fast insert, search, and delete operations. In interviews, this problem checks whether you understand hashing fundamentals instead of only using built-in dictionaries.

The objective is to achieve average-case O(1) for put, get, and remove.

Real-World Analogy

Think of a large set of mailboxes. A hash function is like a rule that maps each person's name to one mailbox number. If two names map to the same mailbox, you need a strategy to handle that collision without losing data.

Formal Definition

Design a map with operations:

put(key, value) – insert or update key.
get(key) – return value if key exists, else -1.
remove(key) – delete key if present.

Target complexity: average O(1) per operation.

Concept Note: Worst-case can degrade to O(n) with heavy collisions, which is why good hash functions and resizing policies matter.

Why This Topic Matters

HashMap is one of the most used data structures in real software systems.
Understanding internals helps debug performance and collision issues.
Frequently asked in coding interviews as “Design HashMap from scratch”.

Mental Model

key --hash()--> bucket index

table[index] holds entries that hash to same index
Collision handling required when multiple keys share index

Hashing gives quick bucket location; collision strategy gives correctness.

Evolution: Brute Force → Better → Optimal

Brute Force

Store key-value pairs in a list and linearly search for every operation.

put/get/remove: O(n)

Better

Use fixed-size array with direct index for small key ranges. Fast but not general for large/sparse keys.
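
A minimal sketch of direct indexing, assuming integer keys in a known range 0..key_range-1 and non-negative values so that -1 can mean "absent". The class name DirectAddressMap and the key_range parameter are illustrative.

```python
class DirectAddressMap:
    """Direct indexing: the key itself is the array index."""

    _EMPTY = -1  # assumes stored values are never -1

    def __init__(self, key_range: int) -> None:
        self.slots = [self._EMPTY] * key_range

    def put(self, key: int, value: int) -> None:
        self.slots[key] = value

    def get(self, key: int) -> int:
        return self.slots[key]

    def remove(self, key: int) -> None:
        self.slots[key] = self._EMPTY


small = DirectAddressMap(key_range=16)
small.put(3, 30)
small.put(3, 31)  # overwrite in place
small.remove(7)   # removing an absent key is harmless
```

Memory is proportional to the key range rather than to the number of stored entries, which is why this stops being practical for large or sparse key spaces.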

Optimal (general-purpose)

Use hashing + collision handling + resizing (rehashing) for stable average O(1).

Optimization Insight: Maintain load factor (entries / bucket count). Resize when too high to keep bucket chains short.

Collision Handling Approaches

1) Separate Chaining

Each bucket stores a list of entries. Colliding keys go into the same list.

2) Open Addressing (Linear/Quadratic/Double Hashing)

Store entries directly in table and probe for next free slot on collision.

In this course implementation, we use separate chaining because it is simpler and interview-friendly.
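
For comparison, a rough sketch of open addressing with linear probing follows. It keeps a fixed capacity and omits resizing to stay short; the class name ProbingMap and the two sentinel objects are assumptions made for the illustration.

```python
class ProbingMap:
    """Open addressing with linear probing; fixed capacity, no resizing."""

    _EMPTY = object()    # slot never used
    _DELETED = object()  # tombstone: slot was used, keep probing past it

    def __init__(self, capacity: int = 8) -> None:
        self.capacity = capacity
        self.slots = [self._EMPTY] * capacity

    def _probe(self, key: int):
        """Yield slot indices in probe order, starting at the home slot."""
        start = hash(key) % self.capacity
        for step in range(self.capacity):
            yield (start + step) % self.capacity

    def put(self, key: int, value: int) -> None:
        first_free = None
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is self._EMPTY:
                if first_free is None:
                    first_free = i
                break  # key cannot be further along the probe sequence
            if slot is self._DELETED:
                if first_free is None:
                    first_free = i  # reusable, but the key may still be ahead
            elif slot[0] == key:
                self.slots[i] = (key, value)  # update in place
                return
        if first_free is None:
            raise OverflowError("table full; a real map would resize here")
        self.slots[first_free] = (key, value)

    def get(self, key: int) -> int:
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is self._EMPTY:
                return -1
            if slot is not self._DELETED and slot[0] == key:
                return slot[1]
        return -1

    def remove(self, key: int) -> None:
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is self._EMPTY:
                return
            if slot is not self._DELETED and slot[0] == key:
                self.slots[i] = self._DELETED  # leave a tombstone
                return


pm = ProbingMap(8)
pm.put(1, 100)
pm.put(9, 900)  # same home slot as key 1, so it lands in the next slot
pm.remove(1)    # tombstone keeps key 9 reachable
```

The tombstone is the detail that chaining does not need: clearing the slot outright would cut the probe sequence and make key 9 look absent.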

Step-by-Step Design (Separate Chaining)

Create array of buckets.
For a key, compute bucket index with hash function.
put: search bucket; update if key exists else append new pair.
get: search bucket and return value if found.
remove: search bucket and delete pair if found.
Resize and rehash when load factor exceeds threshold (e.g., 0.75).

ASCII Diagram

bucket_count = 8

index: 0   1     2   3         4   5     6   7
       []  [k1]  []  [k2->k9]  []  [k3]  []  []

k2 and k9 collided to same bucket (index 3)

Python Implementation

from typing import List, Tuple


class MyHashMap:
    def __init__(self):
        self.capacity = 8
        self.size = 0
        self.load_factor_threshold = 0.75
        self.buckets: List[List[Tuple[int, int]]] = [[] for _ in range(self.capacity)]

    def _index(self, key: int) -> int:
        return hash(key) % self.capacity

    def _rehash(self) -> None:
        old_buckets = self.buckets
        self.capacity *= 2
        self.buckets = [[] for _ in range(self.capacity)]
        self.size = 0

        for bucket in old_buckets:
            for key, value in bucket:
                self.put(key, value)

    def put(self, key: int, value: int) -> None:
        idx = self._index(key)
        bucket = self.buckets[idx]

        for i, (k, v) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)
                return

        bucket.append((key, value))
        self.size += 1

        if self.size / self.capacity > self.load_factor_threshold:
            self._rehash()

    def get(self, key: int) -> int:
        idx = self._index(key)
        bucket = self.buckets[idx]
        for k, v in bucket:
            if k == key:
                return v
        return -1

    def remove(self, key: int) -> None:
        idx = self._index(key)
        bucket = self.buckets[idx]

        for i, (k, v) in enumerate(bucket):
            if k == key:
                bucket.pop(i)
                self.size -= 1
                return

Line-by-Line Explanation

buckets is an array where each entry is a list of (key, value) pairs.
_index maps a key to bucket index.
put updates existing key if found, else appends new pair.
get linearly checks only one bucket chain, not whole map.
remove deletes matching key from its bucket.
_rehash doubles capacity and reinserts all pairs to new buckets.

Worked Example

Example:

put(1, 100)
put(9, 900) (collides with key 1 at the initial capacity of 8, since hash(9) % 8 == hash(1) % 8 == 1 in CPython)
get(1) returns 100
put(1, 111) updates existing key
get(1) returns 111
remove(9), then get(9) returns -1

Time Complexity

Average: put/get/remove = O(1)
Worst case: O(n) if many keys collide into same bucket.
Rehash: O(n) occasionally, but amortized cost per operation remains O(1) average.

Space Complexity

O(n + m) where n = number of entries, m = number of buckets.

Edge Cases

Update existing key: size should not increase.
Remove missing key: no-op.
Negative/large keys: hash function should still map safely.
Frequent inserts: ensure resizing is implemented or performance degrades.

Common Mistakes

Common Mistake: Forgetting to rehash existing entries after resizing capacity.

Common Mistake: Increasing size on key update.

Common Mistake: Assuming no collisions and storing one key per index.

Pattern Recognition

Use HashMap when you need:

Fast lookup by key.
Frequent insert/delete/search operations.
No requirement for sorted order.

Interview Insight

Interview Insight: Always mention collisions and load factor. A complete answer is not just “use modulo index” — it must explain collision handling and resizing.

Practice Problems

Implement HashSet using same hashing framework.
Build frequency counter from scratch using custom HashMap.
Implement open addressing version and compare with chaining.
Add iterator over key-value pairs.

Expert Tip: In real systems, hash quality and memory layout strongly affect performance. Algorithmic O(1) is only part of the story; cache locality and collision distribution matter too.

Summary

HashMap provides average O(1) put/get/remove using hashing.
Collision handling is essential for correctness.
Resizing keeps load factor controlled and operations fast in practice.
This design is foundational for many higher-level algorithms and systems.

21.6 Design Rate Limiter

Introduction

A rate limiter controls how many requests a user/client can make in a given time window. It protects systems from abuse, traffic spikes, accidental overload, and unfair resource usage.

In interview settings, this topic tests both algorithmic data structure skills and practical backend design thinking.

Real-World Analogy

Think of a building elevator that allows only a fixed number of people per minute for safety. If too many people arrive, some must wait. A rate limiter is the software version of this control gate.

Formal Definition

A rate limiter answers this decision repeatedly:

allow(user, timestamp) -> True if request allowed, else False.

Example policy: "Allow at most 3 requests per 10 seconds per user."

Concept Note: Correctness depends on exactly how you define the window (fixed window, sliding window, token bucket, etc.).

Why This Topic Matters

Critical for API reliability and abuse prevention in production systems.
Common in backend and system design interviews.
Teaches windowing, counters, queues, and trade-offs between precision and cost.

Mental Model

Incoming request
      |
      v
Lookup user state (counter/timestamps/tokens)
      |
Check policy
  allow? -> Yes: consume capacity
           No : reject/throttle

Core Approaches

1) Fixed Window Counter

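Count requests in discrete windows (e.g., per minute). Fast and simple, but a client can send a full quota at the end of one window and another full quota at the start of the next, so up to twice the limit can pass in a short span around a boundary.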
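
A minimal fixed-window sketch, assuming integer timestamps in seconds and windows aligned to multiples of window_seconds. The class name FixedWindowLimiter and the per-user (window index, count) state are illustrative choices.

```python
from typing import Dict, Tuple


class FixedWindowLimiter:
    def __init__(self, limit: int, window_seconds: int) -> None:
        self.limit = limit
        self.window = window_seconds
        self.state: Dict[str, Tuple[int, int]] = {}  # user -> (window index, count)

    def allow(self, user_id: str, timestamp: int) -> bool:
        window_index = timestamp // self.window
        index, count = self.state.get(user_id, (window_index, 0))
        if index != window_index:
            count = 0  # a new window started: the counter resets
        if count >= self.limit:
            self.state[user_id] = (window_index, count)
            return False
        self.state[user_id] = (window_index, count + 1)
        return True


fixed = FixedWindowLimiter(limit=3, window_seconds=10)
# Three requests late in window 0, three more early in window 1
burst = [fixed.allow("a", t) for t in (8, 9, 9, 10, 11, 12)]
```

All six requests in burst are allowed even though they fall within five seconds of each other, which is the boundary effect of resetting the counter at t=10.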

2) Sliding Window Log

Store timestamps of recent requests; remove outdated ones each call. Accurate, but memory heavier.

3) Token Bucket

Tokens refill at steady rate; request consumes one token. Supports burst tolerance with smooth long-term rate control.
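
A token bucket can be sketched with lazy refill, where tokens are topped up only when a request arrives. This version tracks a single bucket and takes timestamps as arguments; the class name TokenBucket and its parameters are assumptions for the example, and a per-user limiter would keep one bucket per key.

```python
class TokenBucket:
    def __init__(self, capacity: float, refill_per_second: float, now: float = 0.0) -> None:
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.tokens = capacity  # start full so an initial burst is allowed
        self.last_refill = now

    def allow(self, timestamp: float) -> bool:
        # Lazily add the tokens earned since the previous call, capped at capacity
        elapsed = max(0.0, timestamp - self.last_refill)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.last_refill = max(self.last_refill, timestamp)
        if self.tokens < 1:
            return False
        self.tokens -= 1
        return True


bucket = TokenBucket(capacity=3, refill_per_second=0.5)
initial_burst = [bucket.allow(0) for _ in range(4)]  # fourth call finds no token
after_wait = bucket.allow(2)                         # 2 s * 0.5 = one new token
right_after = bucket.allow(2)                        # that token is already spent
```

The capacity bounds the burst size, while the refill rate sets the long-term average, so the two can be tuned independently.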

Evolution: Brute Force → Better → Optimal

Brute Force

Store all historical request timestamps forever and scan all on every request. Correct but very slow and memory-heavy.

Better

Use fixed window counters. O(1) operations, but edge burst artifacts can violate fairness expectations.

Optimal (for accuracy + interview clarity)

Sliding window log using deque per user: keeps only relevant timestamps; accurate per-window enforcement.

Optimization Insight: Keep only data needed for current decision window. Old events should be evicted eagerly or lazily during access.

Step-by-Step (Sliding Window Log)

Policy: max limit requests per window_seconds.

For incoming request at time t, get user's deque.
Remove timestamps <= t - window_seconds (outside active window).
If deque size is already >= limit, reject request.
Else append t and allow request.

ASCII Diagram

Window = 10s, Limit = 3
User A timestamps deque:
[12, 15, 19]

Request at t=22:
Remove <= 12  -> deque becomes [15, 19]
size=2 < 3 -> allow and append 22
new deque: [15, 19, 22]

Python Implementation (Sliding Window)

from collections import defaultdict, deque
from typing import Deque, Dict


class RateLimiter:
    def __init__(self, limit: int, window_seconds: int):
        self.limit = limit
        self.window = window_seconds
        self.user_requests: Dict[str, Deque[int]] = defaultdict(deque)

    def allow(self, user_id: str, timestamp: int) -> bool:
        q = self.user_requests[user_id]
        window_start = timestamp - self.window

        # Remove requests outside current sliding window
        while q and q[0] <= window_start:
            q.popleft()

        if len(q) >= self.limit:
            return False

        q.append(timestamp)
        return True

Line-by-Line Explanation

user_requests[user] stores only recent timestamps relevant to policy.
Cleanup loop keeps queue minimal by removing expired requests.
Queue length equals current request count inside active window.
Accepting request appends timestamp, updating future state.

Additional Example (Boundary Behavior)

Example: limit=2, window=5s. Requests at t=1 and t=2 are allowed. A request at t=4 is rejected because the active window (-1, 4] already holds 2 requests. At t=7 the window start is 2, so the timestamps 1 and 2 both expire (each is <= 2) and the request is allowed.

Time Complexity

Per request: amortized O(1) for deque cleanup + append/check.
Each timestamp enters and leaves deque once.

Space Complexity

Per user: O(number of requests within current window).
Total: sum across active users.

Edge Cases

Out-of-order timestamps: simple deque approach assumes non-decreasing request time per user.
Very high cardinality users: need state eviction (TTL cleanup) for inactive users.
Clock skew across servers: distributed systems require synchronized or logical time strategy.
Limit = 0: reject all requests by definition.
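For the high-cardinality case, one possible cleanup is a periodic sweep that drops users whose newest timestamp has already left the window. This is a sketch under assumptions: `evict_idle_users` is a hypothetical helper, and it scans every user, so it should run occasionally rather than on each request.

```python
from collections import deque
from typing import Deque, Dict


def evict_idle_users(user_requests: Dict[str, Deque[int]], now: int, window: int) -> int:
    """Delete users with no timestamps inside (now - window, now]; return how many."""
    cutoff = now - window
    idle = [uid for uid, q in user_requests.items() if not q or q[-1] <= cutoff]
    for uid in idle:
        del user_requests[uid]
    return len(idle)


state = {"a": deque([1, 2]), "b": deque([9]), "c": deque()}
removed = evict_idle_users(state, now=12, window=10)
print(removed, sorted(state))  # 2 ['b']
```

Checking only the newest timestamp (`q[-1]`) is enough because each deque is kept in non-decreasing order.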

Common Mistakes

Common Mistake: Forgetting to remove expired timestamps, causing false rejections and memory growth.

Common Mistake: Confusing fixed-window behavior with sliding-window behavior in explanations.

Common Mistake: Using local in-memory limiter in distributed setup without shared state, leading to inconsistent limits across instances.

Distributed System Considerations

Use Redis or centralized store for shared counters/timestamps.
Prefer atomic operations/Lua scripts to avoid race conditions.
Choose key granularity: per user, per IP, per API key, per endpoint.
Decide fail-open vs fail-closed behavior when rate-limit store is unavailable.

Pattern Recognition

Rate limiter design appears when requirements mention:

"X requests per Y seconds"
Traffic shaping / abuse prevention
Per-client fairness under high throughput

Interview Insight

Interview Insight: First provide a correct single-machine design (sliding window or token bucket), then proactively discuss distributed consistency, atomic updates, and storage strategy.

Practice Problems

Implement fixed-window and compare behavior with sliding-window on bursty traffic.
Implement token bucket rate limiter.
Add endpoint-specific policies (different limits per API route).
Build distributed limiter using Redis sorted sets or counters.

Expert Tip: In production, combine rate limiting with observability: track allow/reject metrics, per-key hot spots, and retry-after guidance.

Summary

Rate limiters protect systems by controlling request frequency.
Sliding window log offers accurate policy enforcement with manageable complexity.
Choose strategy (fixed/sliding/token) based on fairness, precision, and cost needs.
Distributed correctness requires shared atomic state and careful clock assumptions.

21.7 Design Vending Machine

Introduction

Designing a Vending Machine is a classic object-oriented design problem that tests how you model states, transitions, inventory, and payments in a clean and extensible way. The challenge is not just “dispense item” — it is handling all business rules correctly.

This problem is excellent practice for modeling real-world workflows with robust error handling.

Real-World Analogy

You select a product, insert money, and expect one of two outcomes: either product + change, or clear reason for rejection/refund. Internally, the machine moves through states like “waiting for selection”, “waiting for payment”, and “dispensing”.

Formal Definition

A vending machine should support:

Load inventory with item code, price, and quantity.
Select an item by code.
Insert money.
Dispense item if payment is sufficient and stock exists.
Return change/refund when needed.

Concept Note: This is a stateful system; behavior of the same method call depends on current machine state.

Why This Topic Matters

Highly common in low-level design interviews.
Teaches state machine thinking and class responsibility separation.
Builds habits for transactional correctness and edge-case handling.

Mental Model

States:
IDLE -> ITEM_SELECTED -> PAYMENT_COLLECTED -> DISPENSING -> IDLE
  \_______________________________________/
                cancel/refund path

Every action should be validated against current state and machine invariants (stock, balance, item validity).
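One way to make "validate against current state" concrete is a transition table. The sketch below is illustrative only: the action names ("select", "insert_money", "dispense", "done", "cancel") and the helper `next_state` are assumptions chosen to mirror the diagram.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    ITEM_SELECTED = auto()
    PAYMENT_COLLECTED = auto()
    DISPENSING = auto()


# (current state, action) -> next state; any pair not listed is rejected.
TRANSITIONS = {
    (State.IDLE, "select"): State.ITEM_SELECTED,
    (State.ITEM_SELECTED, "insert_money"): State.PAYMENT_COLLECTED,
    (State.PAYMENT_COLLECTED, "insert_money"): State.PAYMENT_COLLECTED,
    (State.PAYMENT_COLLECTED, "dispense"): State.DISPENSING,
    (State.DISPENSING, "done"): State.IDLE,
    (State.ITEM_SELECTED, "cancel"): State.IDLE,
    (State.PAYMENT_COLLECTED, "cancel"): State.IDLE,
}


def next_state(state: State, action: str) -> State:
    """Return the next state, or raise if the action is not allowed here."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"{action!r} is not allowed in state {state.name}") from None


s = State.IDLE
for action in ("select", "insert_money", "dispense", "done"):
    s = next_state(s, action)
print(s.name)  # IDLE
```

Keeping the allowed transitions in one table makes illegal moves (for example, dispensing from IDLE) fail loudly instead of slipping through scattered if-else checks.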

Evolution: Brute Force → Better → Optimal

Brute Force

Put all logic in one giant function with many if-else checks. Works for tiny demos, becomes hard to maintain.

Better

Use one class with helper methods for inventory and payment checks. Cleaner, but state transitions can still become messy.

Optimal (for interview-quality design)

Model:

Clear entities (Item, InventorySlot, VendingMachine).
Explicit state variable (or State pattern for larger systems).
Deterministic transitions and guarded operations.

Optimization Insight: Explicit state modeling reduces bugs more than micro-optimizing code in design-heavy problems.

Core Components

Item

Represents product details: code, name, price.

Inventory Slot

Maps item to available quantity.

VendingMachine

Inventory storage
Current selected item
Inserted balance
State transitions and business rules

Step-by-Step Flow

User selects item code.
Machine validates code and stock.
User inserts money (possibly multiple times).
When balance >= price, machine dispenses item.
Machine returns change if balance > price.
Machine resets session state for next customer.

ASCII Diagram

[IDLE]
  | select(valid, in-stock)
  v
[ITEM_SELECTED]
  | insert money
  v
[PAYMENT_COLLECTED]
  | enough balance?
  | yes -> dispense + change -> reset
  | no  -> wait for more / cancel-refund

Python Implementation

from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Item:
    code: str
    name: str
    price: int  # price in smallest unit (e.g., cents)


@dataclass
class InventorySlot:
    item: Item
    quantity: int


class VendingMachine:
    IDLE = "IDLE"
    ITEM_SELECTED = "ITEM_SELECTED"

    def __init__(self):
        self.inventory: Dict[str, InventorySlot] = {}
        self.state = VendingMachine.IDLE
        self.selected_code: Optional[str] = None
        self.balance = 0

    def load_item(self, item: Item, quantity: int) -> None:
        if quantity <= 0:
            return
        if item.code in self.inventory:
            self.inventory[item.code].quantity += quantity
        else:
            self.inventory[item.code] = InventorySlot(item=item, quantity=quantity)

    def select_item(self, code: str) -> str:
        if code not in self.inventory:
            return "Invalid item code."
        slot = self.inventory[code]
        if slot.quantity <= 0:
            return "Item out of stock."

        self.selected_code = code
        self.state = VendingMachine.ITEM_SELECTED
        self.balance = 0
        return f"Selected {slot.item.name}. Price: {slot.item.price}"

    def insert_money(self, amount: int) -> str:
        if self.state != VendingMachine.ITEM_SELECTED:
            return "Select an item first."
        if amount <= 0:
            return "Insert a positive amount."

        self.balance += amount
        slot = self.inventory[self.selected_code]
        if self.balance < slot.item.price:
            remaining = slot.item.price - self.balance
            return f"Inserted {amount}. Remaining: {remaining}"
        return f"Inserted {amount}. Ready to dispense."

    def dispense(self) -> Tuple[str, int]:
        if self.state != VendingMachine.ITEM_SELECTED or self.selected_code is None:
            return ("No item selected.", 0)

        slot = self.inventory[self.selected_code]
        price = slot.item.price
        if slot.quantity <= 0:
            # Capture the refund first; _reset_session zeroes the balance.
            refund = self._refund_all()
            self._reset_session()
            return ("Item became unavailable.", refund)
        if self.balance < price:
            return ("Insufficient balance.", 0)

        slot.quantity -= 1
        change = self.balance - price
        item_name = slot.item.name
        self._reset_session()
        return (f"Dispensed: {item_name}", change)

    def cancel(self) -> int:
        refund = self._refund_all()
        self._reset_session()
        return refund

    def _refund_all(self) -> int:
        refund = self.balance
        self.balance = 0
        return refund

    def _reset_session(self) -> None:
        self.state = VendingMachine.IDLE
        self.selected_code = None
        self.balance = 0

Line-by-Line Explanation

Item and InventorySlot separate immutable product info from mutable stock count.
select_item validates existence and stock before entering purchase session.
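insert_money updates balance only in valid state. The class keeps just two explicit states (IDLE and ITEM_SELECTED); PAYMENT_COLLECTED and DISPENSING from the mental model are implicit in the balance check and the dispense call.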
dispense enforces payment and stock checks, decrements inventory, and computes change.
cancel returns full refund and resets machine session.

Worked Example

Example:

Load Coke (code C1, price 120, qty 2).
Select C1 -> state ITEM_SELECTED.
Insert 50 -> remaining 70.
Insert 100 -> balance 150 (enough).
Dispense -> returns "Dispensed: Coke", change 30, quantity reduces by 1.

Time Complexity

load_item, select_item, insert_money, dispense, cancel: O(1) average (hash map access).

Space Complexity

O(n) where n is number of item codes loaded into machine.

Edge Cases

Invalid item code: reject immediately.
Out-of-stock item: prevent selection/dispense.
Insufficient balance: keep waiting or allow cancel.
Cancel mid-transaction: full refund.
Concurrent users: real system needs session locking/isolation.

Common Mistakes

Common Mistake: Forgetting to reset state after dispense/cancel, causing transaction leakage into next user.

Common Mistake: Deducting stock before confirming payment sufficiency.

Common Mistake: Using floating-point money instead of integer smallest units (cents/paise), causing precision issues.
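The precision issue is easy to reproduce. A minimal illustration follows; `format_cents` is a hypothetical display helper, not part of the class above.

```python
# Float arithmetic: three 10-cent coins do not sum to exactly 0.30.
float_balance = 0.10 + 0.10 + 0.10
print(float_balance == 0.30)  # False

# Integer cents: the same sum is exact.
cents_balance = 10 + 10 + 10
print(cents_balance == 30)  # True


def format_cents(cents: int) -> str:
    """Render a non-negative integer amount of cents for display only."""
    return f"{cents // 100}.{cents % 100:02d}"


print(format_cents(120))  # 1.20
```

Store and compare integers everywhere; convert to a decimal string only at the display boundary.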

Design Extensions

Coin inventory for exact change-making.
Multiple payment methods (cash/card/UPI).
Admin mode for refill/pricing updates.
Telemetry: sales logs, low-stock alerts, fault reporting.

Pattern Recognition

This pattern appears when systems involve:

Clear workflow states and transitions.
Inventory/resources and transactional updates.
User actions that can fail at different checkpoints.

Interview Insight

Interview Insight: Interviewers value explicit state transition reasoning. Narrate state changes for each action and mention invariants (non-negative stock, no stale selection, accurate refunds).

Practice Problems

Add exact change-making using limited coin inventory.
Support multiple item selection cart before checkout.
Add timeout auto-cancel with refund.
Model this using full State Design Pattern classes.

Expert Tip: For LLD interviews, prioritize invariant safety and clear contracts first. Fancy patterns help only after correctness and state clarity are solid.

Summary

Vending machine design is a stateful workflow + inventory + payment problem.
Clean object model with guarded transitions yields maintainable behavior.
Correctness depends on state reset, stock handling, and money accounting.
This is a strong practice problem for real-world low-level design interviews.

21.8 Design TinyURL

Introduction

Design TinyURL is a classic system design + data structure problem where long URLs are converted into short unique aliases. The core requirements are correctness, uniqueness, fast lookup, and scalability.

In interviews, this problem checks whether you can move from simple mapping logic to production concerns like collisions, key space size, distributed ID generation, and analytics.

Real-World Analogy

Imagine replacing long home addresses with short locker numbers. Instead of writing the full address every time, you hand out a compact code that maps back to the real destination when needed.

Formal Definition

Provide two main APIs:

encode(longUrl) -> returns short URL.
decode(shortUrl) -> returns original long URL.

Typical constraints:

Same short URL should decode to exactly one long URL.
Generated keys should avoid collisions (or handle them safely).
Operation latency should be low.

Concept Note: TinyURL is fundamentally a bidirectional mapping problem plus reliable key generation strategy.

Why This Topic Matters

Frequently asked in backend/system design interviews.
Combines hashing, encoding, storage, and distributed architecture choices.
Teaches how small API surfaces can hide large scaling complexity.

Mental Model

Long URL --(generate short key)--> short key
short key --(store mapping)------> long URL

decode:
short key --(lookup)-------------> long URL

Core Approaches

1) Auto-increment ID + Base62 Encoding

Generate numeric IDs sequentially, then encode into Base62 (0-9, a-z, A-Z). Deterministic and collision-free if ID uniqueness is guaranteed.

2) Random Key Generation

Generate random 6-8 char keys and retry on collision. Simple, but collision checks are required.
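A minimal sketch of the retry loop, assuming an in-memory dict stands in for the key store. `new_random_key`, the default length of 7, and the attempt cap are illustrative choices, not fixed requirements.

```python
import secrets
import string

ALPHABET = string.digits + string.ascii_letters  # 62 characters


def new_random_key(existing: dict, length: int = 7, max_attempts: int = 10) -> str:
    """Return a key not present in `existing`, retrying on collision."""
    for _ in range(max_attempts):
        key = "".join(secrets.choice(ALPHABET) for _ in range(length))
        if key not in existing:
            return key
    raise RuntimeError("no free key found; consider a longer key length")


short_to_long = {}
key = new_random_key(short_to_long)
short_to_long[key] = "https://example.com/very/long/path"
print(len(key))  # 7
```

In a shared store, the membership check and the write would need to be one atomic operation (for example, an insert-if-absent), otherwise two servers could claim the same key.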

3) Hash-based Key

Use hash of URL and truncate. Needs collision resolution and often does not guarantee uniqueness by itself.

Evolution: Brute Force → Better → Optimal

Brute Force

Store full URL and search linearly for decode/encode relations. Too slow.

Better

Use hash map with random key generation and collision retries.

Optimal (common production-friendly baseline)

Use unique numeric IDs (from DB sequence/snowflake-like service), Base62 encode for short token, and store direct mapping in key-value storage.

Optimization Insight: Base62 gives dense compact keys; each additional character multiplies key space by 62.
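The growth is worth computing once. The helper names below (`keyspace`, `min_token_length`) are hypothetical and exist only to show the arithmetic.

```python
def keyspace(length: int, base: int = 62) -> int:
    """Number of distinct tokens of exactly `length` characters."""
    return base ** length


def min_token_length(num_ids: int, base: int = 62) -> int:
    """Smallest token length whose key space holds `num_ids` distinct IDs."""
    length = 1
    while base ** length < num_ids:
        length += 1
    return length


for n in (6, 7, 8):
    print(n, keyspace(n))
# 6 56800235584
# 7 3521614606208
# 8 218340105584896

print(min_token_length(1_000_000_000))  # 6
```

Six characters already cover about 56.8 billion tokens, so one billion URLs fit with room to spare.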

Step-by-Step Design (ID + Base62)

Generate unique numeric ID.
Convert ID to Base62 token.
Store token -> longUrl.
Optionally store longUrl -> token to return same short URL for duplicate long URLs.
Return short URL as baseDomain + "/" + token.
For decode, parse token and lookup original long URL.

ASCII Diagram

ID Service -> 125
Base62(125) -> "21"   (125 = 2*62 + 1, alphabet 0-9, a-z, A-Z)
Store: "21" -> "https://example.com/very/long/path"

Decode "21" -> lookup -> original URL

Python Implementation (Interview-Scale)

class Codec:
    ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
    BASE = 62

    def __init__(self):
        self.id_counter = 1
        self.short_to_long = {}
        self.long_to_short = {}
        self.domain = "https://tiny.url/"

    def _encode_base62(self, num: int) -> str:
        if num == 0:
            return "0"
        chars = []
        while num > 0:
            num, rem = divmod(num, self.BASE)
            chars.append(self.ALPHABET[rem])
        return "".join(reversed(chars))

    def encode(self, longUrl: str) -> str:
        # Optional dedup: same long URL returns same short URL
        if longUrl in self.long_to_short:
            return self.domain + self.long_to_short[longUrl]

        token = self._encode_base62(self.id_counter)
        self.id_counter += 1

        self.short_to_long[token] = longUrl
        self.long_to_short[longUrl] = token
        return self.domain + token

    def decode(self, shortUrl: str) -> str:
        token = shortUrl.rsplit("/", 1)[-1]
        return self.short_to_long.get(token, "")

Line-by-Line Explanation

id_counter ensures unique ID generation in this single-instance demonstration.
_encode_base62 converts numeric IDs to short human-friendly tokens.
short_to_long is required for decode path.
long_to_short is optional but useful for idempotent encode behavior.
decode extracts token and does O(1) average map lookup.

Worked Example

Example:

encode("https://abc.com/page1") -> https://tiny.url/1 (the first ID is 1, which encodes to the token "1").
encode("https://abc.com/page2") -> another unique token.
decode("https://tiny.url/1") -> original page1 URL.
Re-encoding page1 can return same token if dedup map is enabled.

Time Complexity

encode: O(log62 N) for Base62 conversion + O(1) average hash ops.
decode: O(1) average hash lookup.

Space Complexity

O(n) for stored mappings, where n is number of unique URLs.

Edge Cases

Duplicate URLs: choose dedup or always-new-token policy.
Invalid short token: return error/empty response/404.
URL normalization: decide whether equivalent URLs map to same key.
Token exhaustion: increase token length or expand key generation space.
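If you choose to normalize before dedup, a conservative policy might look like the sketch below. `normalize_url` and its rules (lowercase scheme and host, drop default ports and fragments, empty path becomes "/") are assumptions for illustration. It deliberately ignores userinfo and IPv6 literals and leaves path and query untouched.

```python
from urllib.parse import urlsplit, urlunsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize_url(url: str) -> str:
    """Return a canonical form so trivially equivalent URLs share one key."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port
    # Keep the port only when it is not the default for the scheme.
    if port is None or port == DEFAULT_PORTS.get(scheme):
        netloc = host
    else:
        netloc = f"{host}:{port}"
    path = parts.path or "/"
    # The fragment is dropped: it is never sent to the server.
    return urlunsplit((scheme, netloc, path, parts.query, ""))


print(normalize_url("HTTPS://Example.COM:443/Path?q=1#frag"))
# https://example.com/Path?q=1
```

The path keeps its case because paths are case-sensitive on many servers; only the scheme and host are safe to lowercase.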

Common Mistakes

Common Mistake: Assuming truncated hash is collision-free.

Common Mistake: Not planning globally unique ID generation in distributed deployments.

Common Mistake: Ignoring malicious URLs and security filtering in real systems.

System Design Considerations

Storage: key-value DB for token -> URL, optional secondary index for dedup.
ID generation: DB auto-increment, distributed ID service, or pre-allocated ranges.
Caching: cache hot token lookups for low-latency decode.
Analytics: click counts, referrers, geolocation, time buckets.
Security: phishing detection, malware scanning, abuse rate limiting.
Expiration: optional TTL-based link expiry and cleanup jobs.

Pattern Recognition

TinyURL-style design appears when you need:

Compact externally visible identifiers for long internal data.
Fast reverse lookup by short token.
Large-scale unique ID allocation across distributed systems.

Interview Insight

Interview Insight: Give a clean MVP first (ID + Base62 + map), then discuss distributed uniqueness, storage, cache, and abuse prevention. This progression shows practical senior-level thinking.

Practice Problems

Add custom alias support while preserving uniqueness constraints.
Implement expiration and soft-delete of links.
Add analytics counters with eventual consistency discussion.
Implement random-key strategy and collision retry logic.

Expert Tip: In large systems, design decode path to be extremely fast and resilient — read-heavy workloads dominate URL shortener traffic.

Summary

TinyURL is a bidirectional mapping + key generation design problem.
ID + Base62 is a strong baseline: compact, predictable, collision-free with unique IDs.
Decode speed, uniqueness guarantees, and abuse controls are critical in production.
This problem is a strong blend of DSA fundamentals and practical system design.

21.9 Design Logging System

Introduction

A Logging System stores log entries and supports time-based retrieval. In interview versions, each log usually has an ID and timestamp, and queries request IDs between two timestamps at a given granularity (Year, Month, Day, etc.).

This problem tests how well you handle temporal data, string/timestamp normalization, and query boundaries.

Real-World Analogy

Think of a surveillance archive where every event has exact date-time. If someone asks “show all events in June 2017” or “show events between these two days,” you should quickly filter by time range and precision level.

Formal Definition

Support operations:

put(logId, timestamp) – store a log.
retrieve(start, end, granularity) – return log IDs with timestamps in the requested range after applying granularity.

Typical timestamp format: "YYYY:MM:DD:HH:MM:SS".

Concept Note: Granularity means you compare only a prefix of timestamp fields. Example: Day granularity compares up to YYYY:MM:DD.

Why This Topic Matters

Common interview problem combining data structure and string/time handling.
Very relevant for backend systems, observability tools, and monitoring platforms.
Teaches precision-aware range queries and indexing trade-offs.

Mental Model

Store logs as (timestamp, id)

Query:
1) truncate start/end by granularity
2) expand end to inclusive upper bound for that granularity
3) return logs with timestamp in [start, end]

Evolution: Brute Force → Better → Optimal

Brute Force

Store unsorted logs; for each query scan all logs and compare processed timestamps.

put: O(1)
retrieve: O(n)

Better

Keep logs sorted by timestamp and use binary search for range boundaries.

Optimal (for this problem style)

Sorted timestamp list + binary search + granularity prefix transformation.

Fast range extraction.
Simple correctness around timestamp string comparisons.

Optimization Insight: The given timestamp format is lexicographically ordered the same as chronological order, so string comparison works directly.
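A quick check of that claim: sorting the strings directly gives the same order as sorting parsed datetimes. The sample timestamps are made up for illustration.

```python
from datetime import datetime

FMT = "%Y:%m:%d:%H:%M:%S"
stamps = [
    "2017:02:10:12:00:00",
    "2017:01:02:00:00:00",
    "2016:12:31:23:59:59",
    "2017:01:01:23:59:59",
]

by_string = sorted(stamps)
by_time = sorted(stamps, key=lambda s: datetime.strptime(s, FMT))
print(by_string == by_time)  # True

# The property depends on zero-padded, fixed-width fields:
print("2017:9:01" < "2017:10:01")  # False, although September precedes October
```

The second comparison shows what breaks without zero padding: "9" sorts after "1", so the unpadded month 9 lands after month 10.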

Granularity Mapping

Prefix lengths:

Year -> 4
Month -> 7
Day -> 10
Hour -> 13
Minute -> 16
Second -> 19

Step-by-Step Query Logic

Convert start and end to granularity-aware lower/upper bounds.
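For lower bound: keep prefix, fill rest with minimal suffix (:00:00:00...).
For upper bound: keep prefix, fill rest with maximal suffix (:99:99:99...). No real field reaches 99, so this is a safe lexical upper bound. Keep the ':' separators in the padding, because ':' sorts after '9' in ASCII.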
Use binary search over sorted timestamps to find matching interval.
Return IDs in that index range.

ASCII Diagram

Stored (sorted):
2017:01:01:23:59:59 -> id1
2017:01:02:00:00:00 -> id2
2017:02:10:12:00:00 -> id3

Query:
start=2017:01:01:00:00:00
end  =2017:01:31:23:59:59
granularity=Month

Range becomes:
[2017:01:00:00:00:00, 2017:01:99:99:99:99]
Matches id1, id2

Python Implementation

from bisect import bisect_left, bisect_right
from typing import List, Tuple


class LogSystem:
    def __init__(self):
        self.logs: List[Tuple[str, int]] = []  # sorted by timestamp
        self.gidx = {
            "Year": 4,
            "Month": 7,
            "Day": 10,
            "Hour": 13,
            "Minute": 16,
            "Second": 19,
        }

    def put(self, log_id: int, timestamp: str) -> None:
        # For interview simplicity: append then sort.
        # In production/high-volume inserts, use better indexing/storage.
        self.logs.append((timestamp, log_id))
        self.logs.sort(key=lambda x: x[0])

    def retrieve(self, start: str, end: str, granularity: str) -> List[int]:
        idx = self.gidx[granularity]

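        # Build lexical lower/upper bounds for chosen granularity.
        # The padding keeps the ':' separators; padding with bare digits
        # would break the upper bound because ':' sorts after '9'.
        low = start[:idx] + "0000:00:00:00:00:00"[idx:]
        high = end[:idx] + "9999:99:99:99:99:99"[idx:]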

        left = bisect_left(self.logs, (low, -10**18))
        right = bisect_right(self.logs, (high, 10**18))

        return [log_id for _, log_id in self.logs[left:right]]

Line-by-Line Explanation

logs stores tuples sorted by timestamp for range binary search.
gidx maps granularity to prefix cutoff index.
low and high are generated by keeping prefix and relaxing suffix.
bisect_left finds first timestamp >= low; bisect_right finds first > high.
Slice between indices yields all logs in the time range.

Worked Example

Example:

put(1, "2017:01:01:23:59:59")
put(2, "2017:01:02:00:00:00")
put(3, "2017:02:01:00:00:00")
retrieve("2017:01:01:00:00:00", "2017:01:31:23:59:59", "Month") -> [1, 2]

Time Complexity

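put in this simple version: O(n log n) worst case for a general comparison sort after each insert. Python's sort (Timsort) detects the already-sorted prefix, so appending one element and re-sorting is closer to O(n) in practice.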
retrieve: O(log n + k), where k is number of returned logs.

With always-sorted insertion via bisect + list insert, put becomes O(n) shift cost and retrieval stays O(log n + k).
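A sketch of that variant using `bisect.insort` from the standard library. The module-level `logs` list and the function name `put_sorted` are illustrative stand-ins for the class attribute and method.

```python
from bisect import insort
from typing import List, Tuple

logs: List[Tuple[str, int]] = []  # kept sorted by (timestamp, log_id)


def put_sorted(log_id: int, timestamp: str) -> None:
    """Insert while preserving order: O(log n) search plus O(n) element shift."""
    insort(logs, (timestamp, log_id))


put_sorted(3, "2017:02:01:00:00:00")
put_sorted(1, "2017:01:01:23:59:59")
put_sorted(2, "2017:01:02:00:00:00")
print([log_id for _, log_id in logs])  # [1, 2, 3]
```

The list stays sorted after every call, so retrieve can run its binary searches without any extra sort step.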

Space Complexity

O(n) to store logs.

Edge Cases

No logs: retrieval returns empty list.
start > end: should return empty or validate input.
Multiple logs same timestamp: all matching IDs should be returned.
Granularity handling: apply the same truncation to both start and end, then widen end to the top of its granularity unit.

Common Mistakes

Common Mistake: Comparing full timestamps without granularity truncation.

Common Mistake: Incorrect inclusive upper-bound handling, missing logs at boundary.

Common Mistake: Assuming numeric date parsing is required; lexical compare works due to fixed-width format.

System Design Extensions

Partition logs by date/hour shards for scalable retrieval.
Use time-series databases or index trees for high-volume ingestion.
Add retention policies (TTL), compression, and archival storage.
Support filters by service, severity, region in addition to time range.

Pattern Recognition

This pattern appears when you need:

Time-based range retrieval.
Precision/granularity-aware queries.
Large append-heavy data with selective reads.

Interview Insight

Interview Insight: Mention why lexical ordering works for this timestamp format. This single observation often simplifies the entire solution and impresses interviewers.

Practice Problems

Optimize put using bisect insertion instead of full sort.
Add severity filtering (INFO/WARN/ERROR) alongside time retrieval.
Implement rolling log retention (delete entries older than X days).
Design distributed logging ingestion pipeline with query index service.

Expert Tip: In production observability stacks, ingestion write throughput and retention cost are often bigger challenges than single-query algorithm complexity.

Summary

Logging System design combines timestamp storage and granularity-aware range querying.
Fixed-format timestamp strings allow direct lexical ordering and binary search.
Correct boundary handling is the key correctness detail.
This problem is an excellent bridge between DSA and practical backend observability design.

22.1 Pattern Recognition Framework

Introduction

Pattern recognition is the skill of mapping a new problem to a known solution template quickly and correctly. Most interview problems look different on the surface, but underneath they repeat the same families: sliding window, two pointers, binary search on answer, graph traversal, DP, greedy, and so on.

This topic gives you a practical framework to identify those patterns under time pressure.

Real-World Analogy

A good doctor does not memorize every possible symptom combination separately. They identify clinical patterns and then apply proven treatment protocols. DSA interviews are similar: strong candidates recognize problem signatures and apply the right algorithmic protocol.

Formal Definition

Pattern recognition in DSA is the process of extracting structural cues from a problem statement and selecting the most appropriate algorithm/data structure template with complexity justification.

Concept Note: Pattern recognition is not guessing. It is evidence-based mapping from constraints + objective + input structure to algorithm class.

Why This Topic Matters

Reduces solve time dramatically in interviews and contests.
Prevents brute-force dead ends by forcing early complexity thinking.
Improves communication: you can explain “why this pattern” clearly.

Mental Model

Problem Statement
    |
    v
Signal Extraction
 (constraints, input form, operation type, objective)
    |
    v
Pattern Candidate Set
    |
    v
Complexity + Correctness Check
    |
    v
Chosen Template + Edge Cases + Implementation

The 6-Step Pattern Recognition Framework

Classify input shape: array/string, matrix, tree, graph, intervals, stream.
Identify task verb: search, count, optimize, shortest path, connectivity, partition.
Read constraints first: they eliminate impossible complexities.
List 2-3 candidate patterns: do not lock in too early.
Pick by complexity + invariant: choose method with a provable invariant.
Validate with edge cases: empty input, duplicates, negatives, boundaries.
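Step 3 can be turned into a rough budget check. The sketch below assumes a budget of about 10**8 simple operations, which is a common rule of thumb rather than a guarantee for any judge or language. `feasible_complexities` is a hypothetical helper.

```python
import math
from typing import List


def feasible_complexities(n: int, budget: int = 10**8) -> List[str]:
    """List complexity classes whose rough operation count fits the budget."""
    log_n = math.log2(max(n, 2))
    estimates = [
        ("O(log n)", log_n),
        ("O(n)", n),
        ("O(n log n)", n * log_n),
        ("O(n^2)", n ** 2),
        ("O(n^3)", n ** 3),
        # Avoid building huge integers; 2^n is out of reach long before n = 64.
        ("O(2^n)", 2 ** n if n < 64 else math.inf),
    ]
    return [name for name, ops in estimates if ops <= budget]


print(feasible_complexities(20))
print(feasible_complexities(2_000))
print(feasible_complexities(100_000))
```

For n = 20 even exponential search fits; for n = 2,000 the list stops at O(n^2); for n = 100,000 it stops at O(n log n). Treat the output as a first filter, not a verdict.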

Pattern Cue Table

Problem Signal	Likely Pattern	Typical Complexity
Contiguous subarray/substring	Sliding Window / Prefix Sum	O(n)
Sorted data + target condition	Binary Search / Two Pointers	O(log n) / O(n)
Tree/graph reachability	DFS / BFS / Union-Find	O(V+E)
Optimal with overlapping subproblems	Dynamic Programming	Varies
Intervals scheduling/merging	Sort + Greedy	O(n log n)

Evolution: Brute Force → Better → Optimal

Brute Force

Start with complete enumeration to understand state space and correctness baseline.

Better

Use data structures to remove repeated work (hash map, prefix sums, sorting).

Optimal

Recognize the dominant pattern early and implement invariant-driven solution with target complexity.

Optimization Insight: In interviews, optimal path often comes from removing recomputation, not inventing a brand-new algorithm.

Step-by-Step Example (Framework in Action)

Problem: “Find length of longest substring without repeating characters.”

Input shape: string.
Task verb: longest contiguous segment.
Constraint: typically O(n) desired.
Pattern candidates: brute-force substrings, sliding window.
Choose sliding window with frequency/last-seen map.
Invariant: window always has unique characters.
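The chosen pattern can be sketched as follows, using a last-seen index map. This is one standard way to maintain the invariant, not the only one.

```python
def length_of_longest_substring(s: str) -> int:
    """Sliding window: s[left..right] never contains a repeated character."""
    last_seen = {}  # character -> most recent index
    left = 0
    best = 0
    for right, ch in enumerate(s):
        # If ch already occurs inside the window, move left just past it.
        if ch in last_seen and last_seen[ch] >= left:
            left = last_seen[ch] + 1
        last_seen[ch] = right
        best = max(best, right - left + 1)
    return best


print(length_of_longest_substring("abcabcbb"))  # 3
print(length_of_longest_substring("abba"))      # 2
```

The check `last_seen[ch] >= left` matters: in "abba", the final "a" was last seen at index 0, which is already outside the window, so left must not move backward.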

Python Mini-Framework Helper

This helper is not a solver; it demonstrates a checklist-style classifier to train your intuition.

from typing import List, Optional


def suggest_patterns(problem_text: str, n_hint: Optional[int] = None) -> List[str]:
    text = problem_text.lower()
    patterns = []

    if "substring" in text or "subarray" in text or "contiguous" in text:
        patterns.append("Sliding Window / Prefix Sum")
    if "sorted" in text or "monotonic" in text:
        patterns.append("Binary Search / Two Pointers")
    if "graph" in text or "node" in text or "edge" in text:
        patterns.append("BFS / DFS / Union-Find / Shortest Path")
    if "tree" in text:
        patterns.append("DFS / BFS / Tree DP")
    if "minimum" in text or "maximum" in text or "count ways" in text:
        patterns.append("Dynamic Programming / Greedy")
    if "interval" in text:
        patterns.append("Sort + Merge / Greedy")

    if n_hint is not None:
        if n_hint <= 2000:
            patterns.append("O(n^2) may be acceptable")
        else:
            patterns.append("Target near O(n) or O(n log n)")

    # Remove duplicates while preserving order
    seen = set()
    ordered = []
    for p in patterns:
        if p not in seen:
            seen.add(p)
            ordered.append(p)
    return ordered

Line-by-Line Explanation

Uses keyword cues from statement text to propose candidate patterns.
n_hint injects complexity sanity check.
Returns ordered unique recommendations, mimicking interview thought flow.

Time Complexity Perspective

Pattern recognition itself is fast; main gain is avoiding wrong-path implementations.
Primary goal: pick a pattern whose target complexity fits constraints.

Space Complexity Perspective

Many optimal patterns trade space for speed (hash maps, DP tables, heaps).
Always state this trade-off explicitly in interviews.

Edge Cases Checklist (Universal)

Empty input / single element.
Duplicates / all equal values.
Negative values / zero handling.
Boundary indices and overflow-prone operations.

Common Mistakes

Common Mistake: Choosing pattern based on memorized keywords only, without validating constraints and invariants.

Common Mistake: Jumping to DP for everything; many problems are simpler with greedy or two pointers.

Common Mistake: Not articulating why alternatives were rejected.

Interview Insight

Interview Insight: Say this early: "I’ll quickly classify the problem by input shape, constraints, and required operation, then choose the best-fitting pattern." This signals senior problem-solving discipline.

Practice Problems

Take 20 random problems and label primary + secondary pattern before coding.
For each solved problem, write one-sentence “pattern signature” for revision.
Redo medium problems by forcing an alternative valid pattern and compare trade-offs.

Expert Tip: Build a personal "pattern notebook" with trigger signals, invariants, and template skeletons. Revision speed improves dramatically.

Summary

Pattern recognition is the fastest path from problem statement to correct algorithm family.
Use a repeatable framework: classify -> shortlist -> justify -> implement.
Constraints + invariants should drive pattern choice, not keyword guessing.
This skill is the backbone of interview consistency.

22.2 Problem Difficulty Ladder

Introduction

A problem difficulty ladder is a structured progression system for DSA practice where you intentionally move from easy patterns to advanced combinations. Instead of solving random questions, you follow a sequence that builds transferable skills layer by layer.

This topic helps you train like an athlete: controlled progression, measurable checkpoints, and targeted weakness repair.

Real-World Analogy

You do not begin gym training with maximum weight on day one. You build movement quality first, then strength, then complexity under fatigue. DSA mastery works the same way: foundations first, then harder pattern combinations, then interview simulation pressure.

Formal Definition

Problem Difficulty Ladder is a staged practice framework where each level introduces stricter constraints, deeper pattern composition, and higher implementation precision requirements.

Concept Note: Difficulty is not only algorithm complexity; it also includes ambiguity, edge-case density, and implementation fragility.

Why This Topic Matters

Prevents random practice and skill plateaus.
Builds confidence through repeatable progression milestones.
Improves interview readiness by matching practice to target company bar.

Mental Model

Level 1: Pattern Recognition
Level 2: Pattern Execution
Level 3: Pattern Mixing
Level 4: Constraint-Driven Optimization
Level 5: Interview Simulation

Each level assumes mastery of the previous one. Skipping levels usually creates hidden weakness.

The 5-Level Difficulty Ladder

Level 1: Foundational Pattern Identification

Goal: detect core pattern quickly.
Problem type: easy variants of arrays/strings/hash maps.
Target: explain why chosen pattern fits constraints.

Level 2: Clean Implementation Under Time

Goal: implement correctly without template copy-paste dependency.
Problem type: medium single-pattern questions.
Target: pass edge cases in first/second attempt.

Level 3: Hybrid Pattern Problems

Goal: combine two or more patterns (e.g., binary search + greedy check, DFS + DP).
Problem type: medium-hard composition problems.
Target: reason about interaction between sub-techniques.

Level 4: Optimization and Trade-offs

Goal: move from acceptable solution to optimal complexity.
Problem type: hard constraints, advanced data structures.
Target: justify why alternatives fail complexity limits.

Level 5: Interview Simulation

Goal: communicate, code, debug, and optimize in one realistic session.
Problem type: unseen mixed-difficulty questions under strict time.
Target: complete solution with clear explanation and test strategy.

Evolution: Brute Force → Better → Optimal (Training Strategy)

Brute Force Practice

Solve random questions without pattern tracking. Progress feels slow and inconsistent.

Better Practice

Group by topic, but still no progression gates or readiness metrics.

Optimal Practice

Use ladder with entry criteria, exit criteria, and review cycles.

Optimization Insight: What you measure improves. Track not only solved count, but solve time, hint dependency, bug types, and communication quality.

Step-by-Step Weekly Ladder Plan

Select one core topic cluster (e.g., sliding window + prefix sum).
Solve 6-8 level-1/2 problems for recognition and execution.
Solve 4-6 level-3 hybrid problems.
Solve 2-3 level-4 optimization problems.
Run one level-5 mock interview session.
Perform retrospective: errors, time leaks, communication gaps, pattern confusion.

Difficulty Ladder Scorecard

Metric	Target	Why It Matters
Pattern identification time	< 3 minutes	Reduces interview dead time
First correct implementation rate	>= 70%	Reliability under pressure
Hint dependency	Decreasing trend	Independence growth
Edge-case misses	<= 1 per problem	Code robustness
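
The scorecard targets can be checked mechanically once you log your numbers. The sketch below is one possible encoding, not a prescribed tool: the function name check_scorecard, its parameter names, and the reading of "decreasing trend" as "latest week lower than first week" are all assumptions made for illustration.

```python
from typing import Dict, List


def check_scorecard(
    avg_identify_minutes: float,
    first_correct_rate: float,
    weekly_hint_rates: List[float],
    avg_edge_case_misses: float,
) -> Dict[str, bool]:
    """Return pass/fail for each scorecard row (hypothetical helper)."""
    # Assumption: a "decreasing trend" means the most recent week's hint rate
    # is strictly lower than the first recorded week's rate.
    hint_trend_ok = (
        len(weekly_hint_rates) >= 2
        and weekly_hint_rates[-1] < weekly_hint_rates[0]
    )
    return {
        "pattern_identification_time": avg_identify_minutes < 3,
        "first_correct_rate": first_correct_rate >= 0.70,
        "hint_dependency": hint_trend_ok,
        "edge_case_misses": avg_edge_case_misses <= 1,
    }


report = check_scorecard(2.5, 0.75, [0.5, 0.4, 0.3], 0.8)
failing = [name for name, ok in report.items() if not ok]
print(failing)
```

Any row that lands in failing is a candidate focus area for the next weekly cycle.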

ASCII Progress Board

Week N:
L1 [#####]
L2 [#### ]
L3 [###  ]
L4 [##   ]
L5 [#    ]

Goal: push weakest filled bar each week
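
A board like the one above can be generated from per-level solve rates. This is a minimal sketch under stated assumptions: render_board and weakest_level are hypothetical helper names, and the input is assumed to be a dict mapping level number to a solve rate between 0 and 1.

```python
def render_board(progress: dict, width: int = 5) -> str:
    """Render one bar per level; progress maps level -> ratio in [0, 1]."""
    lines = []
    for level in sorted(progress):
        ratio = min(1.0, max(0.0, progress[level]))  # clamp bad inputs
        filled = round(ratio * width)
        lines.append(f"L{level} [{'#' * filled}{' ' * (width - filled)}]")
    return "\n".join(lines)


def weakest_level(progress: dict) -> int:
    """Return the level with the lowest ratio (the bar to push next)."""
    return min(progress, key=progress.get)


rates = {1: 1.0, 2: 0.8, 3: 0.6, 4: 0.4, 5: 0.2}
print(render_board(rates))
print("Focus next week on level", weakest_level(rates))
```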

Python Tracker Utility (Practice Analytics)

from dataclasses import dataclass
from typing import List


@dataclass
class Attempt:
    level: int
    solved: bool
    used_hint: bool
    minutes: int
    edge_case_bugs: int


def summarize(attempts: List[Attempt]) -> dict:
    if not attempts:
        return {"total": 0}

    total = len(attempts)
    solved = sum(a.solved for a in attempts)
    hint_count = sum(a.used_hint for a in attempts)
    avg_minutes = sum(a.minutes for a in attempts) / total
    avg_bugs = sum(a.edge_case_bugs for a in attempts) / total

    by_level = {}
    for lv in range(1, 6):
        group = [a for a in attempts if a.level == lv]
        if not group:
            continue
        by_level[lv] = {
            "count": len(group),
            "solve_rate": sum(a.solved for a in group) / len(group),
            "avg_minutes": sum(a.minutes for a in group) / len(group),
        }

    return {
        "total": total,
        "solve_rate": solved / total,
        "hint_rate": hint_count / total,
        "avg_minutes": avg_minutes,
        "avg_edge_case_bugs": avg_bugs,
        "by_level": by_level,
    }

Line-by-Line Explanation

Each attempt records level, correctness, hint usage, time, and bug count.
summarize provides global and per-level health metrics.
Use this to identify bottlenecks (for example level-3 hybrid weakness).

Time Complexity Perspective

Ladder planning does not change the complexity of any algorithm; it improves how quickly you pick an approach of the right complexity.
The real complexity gain is strategic: fewer brute-force starts and faster convergence to optimal patterns.

Space Complexity Perspective

Training logs are lightweight; keeping detailed notes gives disproportionate long-term payoff.

Edge Cases in Preparation

Overfitting to one platform: solve from multiple sources.
Skipping review: solved count rises but skill stagnates.
Only hard problems: weak fundamentals remain hidden.

Common Mistakes

Common Mistake: Measuring progress only by number of solved questions.

Common Mistake: Jumping to hard problems before mastering medium execution.

Common Mistake: Not re-solving previously failed problems after 1-2 weeks.

Interview Insight

Interview Insight: Interview consistency comes from ladder depth, not random exposure. A candidate with fewer but well-progressed problems often performs better than one with high unsystematic volume.

Practice Problems

Create your own 5-level ladder for one topic (e.g., graphs).
Run a 2-week cycle and analyze by-level solve rate changes.
For each failed problem, classify failure type: pattern miss, implementation bug, edge case miss, time panic.

Expert Tip: If your level-3 and level-4 progress stalls, reduce difficulty briefly and focus on speed + invariant articulation. Precision under moderate load beats chaotic hard-problem grinding.

Summary

Difficulty ladders convert random practice into systematic skill growth.
Use staged progression with measurable exit criteria per level.
Track quality metrics (time, hints, bugs), not only solved count.
This framework builds interview reliability and long-term mastery.

22.3 Whiteboard Coding

Introduction

Whiteboard coding is not just coding without an IDE. It is a structured communication exercise where interviewers evaluate how you think, decompose problems, reason about edge cases, and recover from mistakes in real time.

Strong candidates treat whiteboard coding as a collaborative design-and-implementation session, not silent puzzle solving.

Real-World Analogy

Imagine a pilot simulation: evaluators are not checking only whether the destination is reached, but how decisions are made under constraints, how checklists are followed, and how anomalies are handled calmly. Whiteboard interviews test similar discipline.

Formal Definition

Whiteboard coding is a constrained problem-solving format where the candidate explains approach, writes code manually, validates with examples, and analyzes complexity without relying on IDE tooling.

Concept Note: Interview success depends on both correctness and communication quality. Perfect code with poor explanation can still underperform.

Why This Topic Matters

Many companies still use whiteboard or whiteboard-like collaborative coding rounds.
Builds clarity of thought, algorithm articulation, and debugging confidence.
Improves on-the-spot reasoning when syntax support is limited.

Mental Model

Understand -> Clarify -> Plan -> Code -> Trace -> Analyze -> Improve

The order matters. Jumping directly into code usually causes avoidable errors.

The Whiteboard Coding Workflow (7 Steps)

Restate the problem: confirm input/output and objective.
Ask clarifying questions: constraints, duplicates, edge conditions, return format.
Propose brute force briefly: show baseline understanding.
Derive optimal approach: explain key invariant/data structure choice.
Write clean skeleton first: function signature, helpers, core loop.
Dry run with sample: trace variable changes aloud.
Complexity + edge cases: finalize confidently.

Brute Force → Better → Optimal (Communication Style)

Brute Force

“A direct approach is X with complexity O(...). It is correct but too slow because ...”

Better

“We can remove repeated work by using ... and reduce complexity to ...”

Optimal

“Final approach uses invariant ... with data structure ... giving O(...) time and O(...) space.”

Optimization Insight: Interviewers reward clear evolution of thinking more than immediate final answer dumping.

Whiteboard-Friendly Code Structure

Use short meaningful variable names (left, right, freq).
Split tricky logic into helper functions when possible.
Avoid deeply nested code if a guard clause can simplify flow.
Write comments only for non-obvious invariants.

ASCII Whiteboard Layout Strategy

+-----------------------------------------------+
| Problem Notes / Constraints                   |
|-----------------------------------------------|
| Example + Dry Run Table                       |
|-----------------------------------------------|
| Final Code                                    |
|-----------------------------------------------|
| Complexity + Edge Cases                       |
+-----------------------------------------------+

This layout keeps your thinking visible and easy for the interviewer to follow.

Step-by-Step Demonstration Pattern

Use this sequence for almost any medium DSA question:

“I’ll restate: ...”
“Assumptions: ...”
“Naive approach: ... O(...)”
“Better insight: ...”
“Final algorithm steps: 1..2..3..”
Write code and narrate critical lines.
Dry run + complexity + edge case checks.

Python Template for Whiteboard Communication

def solve(nums, is_valid):
    # is_valid(freq) -> bool: problem-specific window check passed in by the caller
    # 1) Guard clauses
    if not nums:
        return 0

    # 2) State initialization
    left = 0
    best = 0
    freq = {}

    # 3) Main loop with invariant narration:
    #    window [left..right] always valid
    for right, x in enumerate(nums):
        freq[x] = freq.get(x, 0) + 1

        while not is_valid(freq):  # placeholder condition
            y = nums[left]
            freq[y] -= 1
            if freq[y] == 0:
                del freq[y]
            left += 1

        best = max(best, right - left + 1)

    return best

This is a generic communication template: guard clauses, state, invariant loop, and final result.
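
To see the template with the placeholder condition filled in, here is one possible concrete instance: the longest contiguous subarray containing at most k distinct values. The problem choice and the function name longest_at_most_k_distinct are illustrative assumptions, not part of the template itself.

```python
def longest_at_most_k_distinct(nums, k):
    # 1) Guard clauses
    if not nums or k <= 0:
        return 0

    # 2) State initialization
    left = 0
    best = 0
    freq = {}  # value -> count inside the current window

    # 3) Invariant: after the inner loop, nums[left..right]
    #    holds at most k distinct values.
    for right, x in enumerate(nums):
        freq[x] = freq.get(x, 0) + 1

        while len(freq) > k:  # window invalid: shrink from the left
            y = nums[left]
            freq[y] -= 1
            if freq[y] == 0:
                del freq[y]
            left += 1

        best = max(best, right - left + 1)

    return best


print(longest_at_most_k_distinct([1, 2, 1, 2, 3], 2))  # window [1, 2, 1, 2]
```

Each index enters and leaves the window at most once, so the two nested loops together do O(n) work. This is the amortized argument worth stating aloud.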

Line-by-Line Explanation

Guard clause shows immediate boundary awareness.
Initialization makes data dependencies explicit.
Main loop updates state progressively.
Invariant-preserving while-loop demonstrates correctness control.
best update indicates objective tracking.

Time Complexity Checklist During Interview

State complexity before coding when possible.
Name each dominant loop/data-structure operation.
Mention amortized behavior if relevant (e.g., two pointers, deque pops).

Space Complexity Checklist

Differentiate input space vs extra auxiliary space.
Mention recursion stack for DFS/backtracking solutions.

Edge Cases to Always Ask/Check

Empty input, single element, all equal values.
Negative values, large constraints, duplicates.
Index boundaries and overflow-sensitive operations.

Common Mistakes

Common Mistake: Starting to code before clarifying constraints and expected behavior.

Common Mistake: Silent coding for long periods; interviewer loses signal about your reasoning.

Common Mistake: Not doing a full dry run after coding.

Common Mistake: Panic on bug; strong candidates debug methodically and communicate calmly.

Interview Recovery Strategy (When Stuck)

Pause and summarize current state in one sentence.
State where uncertainty is (pattern, edge case, implementation detail).
Propose smallest possible next test case.
Adjust with explicit reasoning instead of random edits.

Pattern Recognition in Whiteboard Context

When under pressure, use quick cue mapping:

Contiguous region -> sliding window/prefix.
Sorted + threshold -> binary search/two pointers.
Reachability/path -> BFS/DFS.
“Ways/optimal with overlap” -> DP.

Interview Insight

Interview Insight: The best whiteboard performance looks like pair programming: clear assumptions, visible reasoning, clean code, and controlled debugging.

Practice Problems

Re-solve 10 medium problems on paper without running code.
Record yourself explaining approach in under 2 minutes before coding.
Practice “live dry-run” for each solution with at least two edge cases.
Run timed mock interviews with a friend and feedback rubric.

Expert Tip: Train syntax muscle memory for your interview language. Whiteboard rounds should test algorithm thinking, not basic syntax hesitation.

Summary

Whiteboard coding evaluates reasoning + communication + correctness.
Follow a repeatable workflow: clarify, plan, code, trace, analyze.
Narrated brute-force-to-optimal progression increases interviewer confidence.
Calm debugging and explicit invariants often differentiate top candidates.

22.4 Communication Strategy

Introduction

In coding interviews, communication is a core technical skill, not a soft add-on. Interviewers evaluate not just what solution you reach, but how clearly and reliably you reason toward it.

A strong communication strategy helps you make your thinking visible, reduce misunderstandings, and recover smoothly from mistakes.

Real-World Analogy

A senior engineer in production incidents does not silently type fixes. They narrate assumptions, risks, and next steps so the team can coordinate. Interview communication follows the same principle: make your internal model externally understandable.

Formal Definition

Interview communication strategy is a structured way to articulate problem understanding, algorithm decisions, implementation plan, validation, and trade-offs throughout the session.

Concept Note: Communication quality improves the interviewer’s confidence in your engineering maturity, even before final code is complete.

Why This Topic Matters

Prevents solving the wrong interpretation of the problem.
Signals algorithmic clarity and collaboration style.
Creates opportunities for hints and alignment instead of silent failure.
Can differentiate two candidates with similar coding ability.

Mental Model

Understand -> Align -> Decide -> Implement -> Validate -> Reflect

At each stage, communicate the minimum essential information clearly and briefly.

The 6-Phase Communication Framework

Phase 1: Problem Alignment

Restate input, output, and objective in your own words.
Confirm assumptions and ambiguous requirements.

Phase 2: Constraint Anchoring

Mention expected complexity targets based on n limits.
Rule out infeasible brute force clearly.

Phase 3: Approach Narration

Present brute force briefly, then improved and final approach.
Name invariants/data structures explicitly.

Phase 4: Implementation Signposting

Before coding each block, say what it does.
Call out tricky lines and boundary handling.

Phase 5: Validation Loop

Dry run with one normal and one edge case.
Narrate state transitions (pointers/maps/queues).

Phase 6: Final Technical Wrap

Time and space complexity.
Trade-offs and possible optimizations.

Brute Force → Better → Optimal Communication Script

Brute Force

“A straightforward method is ..., complexity is O(...), but this may fail for large constraints.”

Better

“We can avoid repeated work by ..., improving to O(...).”

Optimal

“Final approach uses ... invariant with ... data structure; complexity becomes O(...).”

Optimization Insight: A crisp progression narrative often matters more than speaking continuously. Be concise but structured.

High-Value Sentence Templates

“Let me restate to confirm we are aligned...”
“Given n up to ..., I should target around O(...).”
“The invariant I maintain is ...”
“I’ll quickly dry run this on ...”
“One edge case here is ... and this line handles it.”
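
The "Given n up to ..." template relies on a quick mapping from input size to a feasible complexity class. The sketch below encodes a common rule of thumb; the budget of roughly 10^8 simple operations is an assumption (real limits vary by judge and language, and Python typically affords fewer), and feasible_complexities is a hypothetical helper name.

```python
import math
from typing import List


def feasible_complexities(n: int, budget: int = 10**8) -> List[str]:
    """List complexity classes whose rough operation count fits the budget."""
    candidates = [
        ("O(log n)", lambda v: max(1.0, math.log2(v))),
        ("O(n)", lambda v: v),
        ("O(n log n)", lambda v: v * max(1.0, math.log2(v))),
        ("O(n^2)", lambda v: v**2),
        ("O(n^3)", lambda v: v**3),
        # Guard against computing astronomically large powers.
        ("O(2^n)", lambda v: 2**v if v <= 64 else float("inf")),
    ]
    return [name for name, cost in candidates if cost(n) <= budget]


for size in (20, 1000, 10**5):
    print(size, feasible_complexities(size))
```

The output is only a starting point for the conversation: constant factors and the interviewer's stated limits take precedence over the estimate.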

ASCII Interview Timeline

0-3 min   : Clarify + constraints
3-8 min   : Approach evolution
8-20 min  : Code with signposting
20-25 min : Dry run + complexity + refinements

Mini Python Example + Narration Style

The code is simple; the focus is on how to narrate intent while writing.

def two_sum(nums, target):
    # Narration: map stores value -> index seen so far
    seen = {}
    for i, x in enumerate(nums):
        need = target - x
        # Narration: if complement already seen, pair found
        if need in seen:
            return [seen[need], i]
        seen[x] = i
    return []

Line-by-Line Communication Notes

State data structure purpose before writing it (seen lookup table).
State the key equation (need = target - x) aloud.
Point out return behavior when solution is found.
Mention fallback return and assumptions about existence.
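
For dry-run practice, it can help to record the state you would narrate at each step. The variant below is a sketch for rehearsal only: two_sum_traced and the shape of the trace tuples are assumptions made here, and you would not write this version in an interview.

```python
def two_sum_traced(nums, target):
    """Same algorithm as two_sum, plus a per-step trace for narration practice."""
    seen = {}
    trace = []  # (index, value, need, snapshot of seen, complement found?)
    for i, x in enumerate(nums):
        need = target - x
        found = need in seen
        trace.append((i, x, need, dict(seen), found))
        if found:
            return [seen[need], i], trace
        seen[x] = i
    return [], trace


result, steps = two_sum_traced([2, 7, 11, 15], 9)
for i, x, need, snapshot, found in steps:
    print(f"i={i} x={x} need={need} seen={snapshot} found={found}")
print("result:", result)
```

Reading the trace aloud line by line mirrors what a spoken dry run should sound like: current element, what you are looking for, what the map holds, and the decision.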

Time Complexity Communication Checklist

Name dominant operations and loop counts.
Distinguish average vs worst case when hashing is used.
Avoid vague terms like “fast” without Big-O.

Space Complexity Communication Checklist

Mention auxiliary structures explicitly (maps, stacks, queues).
Include recursion depth if recursion is used.

Common Mistakes

Common Mistake: Talking too little (silent coding), so interviewer cannot assess reasoning.

Common Mistake: Talking too much without structure, causing confusion and time loss.

Common Mistake: Defending a failing approach too long instead of pivoting early.

Common Mistake: Skipping complexity discussion at the end.

Recovery Strategy When You Make a Mistake

Acknowledge quickly: “I see a bug in boundary handling.”
Localize precisely: “Issue is in while-loop condition.”
Patch with rationale: “I’ll change this to preserve invariant ...”
Re-run one small test case to verify fix.

Pattern Recognition + Communication

When stating a chosen pattern, always attach evidence:

Input form (string/array/graph).
Objective type (max/min/count/path).
Constraint target (O(n), O(n log n), etc.).
Invariant that proves correctness.

Interview Insight

Interview Insight: Great interview communication sounds like collaborative engineering: concise alignment, explicit invariants, test-driven validation, and calm adaptation when needed.

Practice Problems

Solve 10 known problems while recording a 2-minute approach explanation before coding.
Practice one mock where you are graded only on clarity, not code correctness.
Create a personal checklist card: clarify, constraints, invariant, dry run, complexity.

Expert Tip: Use short structured checkpoints every few minutes (“Plan, progress, next step”). This keeps interviewer alignment high and reduces miscommunication risk.

Summary

Communication strategy is a technical multiplier in coding interviews.
Use a repeatable phase-based framework from alignment to wrap-up.
Narrate invariants and decisions, not every keystroke.
Clear recovery behavior after mistakes often improves interviewer confidence.

22.5 Time Management

Introduction

Time management in DSA interviews is the skill of allocating minutes intentionally across understanding, planning, coding, testing, and optimization. Many candidates fail not because they lack knowledge, but because they spend too long in one phase and run out of time for crucial final steps.

This topic teaches a practical pacing framework that improves completion rate and interview consistency.

Real-World Analogy

In a marathon, running too fast in the first few kilometers can destroy performance later. In interviews, over-investing early (for example, 20 minutes on one edge case before writing core logic) creates the same failure pattern.

Formal Definition

Interview time management is the deliberate budgeting of limited session time across problem-solving phases, with checkpoint-based adjustments to maximize the probability of a complete, correct, and communicable solution.

Concept Note: Good pacing is dynamic. You should re-evaluate progress at checkpoints, not follow a rigid script blindly.

Why This Topic Matters

Increases probability of shipping a full solution within interview limits.
Prevents “almost done but no dry run” outcomes.
Improves interviewer confidence through controlled execution.

Mental Model

Budget -> Execute -> Checkpoint -> Adjust -> Deliver

The goal is not perfection in every phase; the goal is high-confidence delivery before time expires.

Standard 45-Minute Coding Round Budget

Phase	Target Time	Outcome
Understand + clarify	3-5 min	Aligned problem statement
Approach design	7-10 min	Chosen algorithm + complexity
Implementation	18-22 min	Working code skeleton complete
Dry run + edge cases	6-8 min	Bug fixes + correctness confidence
Final wrap	2-3 min	Complexity + optional optimization

Brute Force → Better → Optimal (Pacing Strategy)

Brute Force Pacing

No time checkpoints; candidate gets stuck and notices too late.

Better Pacing

Rough phase targets, but no active adjustment if delayed.

Optimal Pacing

Checkpoint-driven pacing with explicit pivot decisions when phase exceeds budget.

Optimization Insight: In interviews, a complete near-optimal solution usually beats an incomplete perfect solution.

Checkpoint Rules (Practical)

If no clear approach by minute 10, state fallback approach and start coding.
If core skeleton not done by minute 25, reduce optional abstractions and finish main path first.
If debugging crosses 5 minutes, run smallest failing test and isolate one variable/invariant at a time.
Reserve final 2-3 minutes for complexity and trade-off summary no matter what.
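
The checkpoint rules above can be encoded as a small decision helper for mock sessions. This is a sketch under assumptions: checkpoint_advice, its flag parameters, and the message wording are hypothetical; only the minute thresholds come from the rules.

```python
from typing import List

FALLBACK = "State a fallback approach and start coding."
TRIM = "Drop optional abstractions and finish the main path first."
ISOLATE = "Run the smallest failing test; isolate one variable or invariant."
WRAP = "Stop and give the complexity and trade-off summary."


def checkpoint_advice(
    minute: int,
    has_approach: bool,
    skeleton_done: bool,
    debug_minutes: int,
    total_minutes: int = 45,
) -> List[str]:
    """Return the checkpoint rules that currently apply."""
    advice = []
    if minute >= 10 and not has_approach:
        advice.append(FALLBACK)
    if minute >= 25 and not skeleton_done:
        advice.append(TRIM)
    if debug_minutes > 5:
        advice.append(ISOLATE)
    if minute >= total_minutes - 3:  # reserve the final minutes for the wrap
        advice.append(WRAP)
    return advice


print(checkpoint_advice(minute=26, has_approach=True,
                        skeleton_done=False, debug_minutes=0))
```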

ASCII Pace Tracker

00----05------15------35-----42----45
| U/C | Design | Coding | Test | Wrap |

U/C = Understand + Clarify

Time-Aware Execution Template (Python)

This utility models phase tracking during mock practice sessions.

from dataclasses import dataclass
from typing import List


@dataclass
class PhaseLog:
    phase: str
    planned_minutes: int
    actual_minutes: int


def pacing_report(logs: List[PhaseLog]) -> dict:
    total_planned = sum(x.planned_minutes for x in logs)
    total_actual = sum(x.actual_minutes for x in logs)

    overruns = []
    for x in logs:
        delta = x.actual_minutes - x.planned_minutes
        if delta > 0:
            overruns.append((x.phase, delta))

    return {
        "total_planned": total_planned,
        "total_actual": total_actual,
        "overrun_minutes": max(0, total_actual - total_planned),
        "phase_overruns": overruns,
    }

Line-by-Line Explanation

PhaseLog captures planned vs actual time per phase.
pacing_report highlights where time leaks repeatedly occur.
Use these signals to adjust future budgets (for example, reduce design overthinking).

Time Complexity Perspective

Pacing decisions should be complexity-aware: avoid spending long time polishing non-viable O(n^2) plans when constraints need O(n log n) or better.

Space Complexity Perspective

When short on time, choose simpler implementation with slightly higher space if it is easier to code correctly and explain.

Edge Cases in Time Management

Hard problem spike: switch to clear baseline + discuss optimization path.
Unexpected bug late: patch a minimal safe fix first, then mention what you would refine if more time remained.
Interviewer interruptions: answer briefly, then restate where you paused.

Common Mistakes

Common Mistake: Spending 15+ minutes silently searching for perfect approach before writing anything.

Common Mistake: Not budgeting time for dry run and complexity explanation.

Common Mistake: Over-engineering helper abstractions that consume coding time.

Interview Insight

Interview Insight: Interviewers often reward controlled progress. Even when the solution is not perfect, strong pacing + transparent trade-offs can still produce a good evaluation.

Practice Problems

Run 5 timed mocks with fixed 45-minute budget and log phase splits.
For each mock, identify one recurring time leak and one correction rule.
Practice “minute-10 decision”: commit to a viable approach and move to coding.

Expert Tip: Maintain a personal pacing script. Under stress, pre-decided checkpoints reduce decision fatigue and help you stay composed.

Summary

Time management is a first-class interview skill, not an afterthought.
Use phase budgets and checkpoints to avoid late-stage incompletion.
Prioritize complete, validated solutions over perfection paralysis.
Consistent pacing dramatically improves interview outcomes.

22.6 Fast I/O

Introduction

Fast I/O (Input/Output) is the technique of reading and writing data efficiently when input size is huge. In many coding contests and some interview-style assessments, a solution with the right algorithmic complexity still times out because its I/O method is slow.

In Python, understanding when and how to optimize I/O can be the difference between AC (Accepted) and TLE (Time Limit Exceeded).

Real-World Analogy

Suppose you can sort packages quickly in a warehouse, but the loading gate is narrow and slow. Overall throughput is still poor. Similarly, even an O(n) algorithm can underperform if each line read/write is expensive.

Formal Definition

Fast I/O means minimizing overhead in data ingestion and output emission by using buffered operations and reduced per-call overhead.

Concept Note: Fast I/O is a throughput optimization layer. It does not replace algorithmic optimization; it complements it.

Why This Topic Matters

Critical for competitive programming and large-batch coding tests.
Prevents TLE in otherwise correct solutions.
Builds awareness of runtime bottlenecks beyond algorithm Big-O.

Mental Model

Total runtime = Algorithm time + I/O overhead

If input/output volume is huge:
I/O overhead can dominate

Optimize both compute path and data movement path.

Brute Force → Better → Optimal

Brute Force

Use repeated input() and print() in loops.

Simple but high per-call overhead.

Better

Use sys.stdin.readline and buffer outputs in list, then join once.

Optimal (Python contest baseline)

Read raw bytes using sys.stdin.buffer.read(), parse tokens once, and write output using sys.stdout.write() in bulk.

Optimization Insight: Reducing the number of Python-level function calls often gives major speedup in high-volume I/O scenarios.

Step-by-Step Fast I/O Strategy

Read entire input once using buffered method.
Split into tokens and parse with pointer/index.
Avoid repeated string conversions where possible.
Collect outputs in list and write once at end.
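
The four steps above can be sketched as a single template. The input format here is an assumption chosen for illustration (a count q followed by q pairs of integers, printing each pair's sum), and keeping solve as a pure function of the raw bytes is a design choice that makes it easy to test without touching stdin.

```python
import sys


def solve(data: bytes) -> str:
    tokens = data.split()          # steps 1-2: one read, one tokenization
    if not tokens:                 # guard: empty input
        return ""

    pos = 0                        # pointer into the token list
    q = int(tokens[pos])           # int() parses bytes tokens directly
    pos += 1

    out = []                       # step 4: collect outputs, write once
    for _ in range(q):
        a = int(tokens[pos])
        b = int(tokens[pos + 1])
        pos += 2
        out.append(str(a + b))
    return "\n".join(out)


# In a contest submission you would wire it up like this:
#     sys.stdout.write(solve(sys.stdin.buffer.read()))
print(solve(b"2\n1 2\n3 4\n"))
```

Because split() ignores any run of whitespace, the same parser handles trailing spaces, blank lines, and values spread across lines.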

ASCII Diagram

Slow path:
input() -> parse -> print()
input() -> parse -> print()
... repeated many times

Fast path:
read all -> tokenize -> compute -> join outputs -> single write

Python Implementations

Approach A: Practical Fast Enough (Most Cases)

import sys

def solve():
    input = sys.stdin.readline
    n = int(input().strip())
    arr = list(map(int, input().split()))
    ans = sum(arr)
    sys.stdout.write(str(ans) + "\n")

if __name__ == "__main__":
    solve()

Approach B: High-Volume Fast I/O Template

import sys

def solve():
    data = sys.stdin.buffer.read().split()
    it = iter(data)

    n = int(next(it))
    total = 0
    for _ in range(n):
        total += int(next(it))

    sys.stdout.write(str(total) + "\n")

if __name__ == "__main__":
    solve()

Line-by-Line Explanation

sys.stdin.buffer.read() reads bytes in one buffered call.
split() tokenizes by whitespace quickly; the tokens are bytes objects, which int() parses directly.
Iterator over tokens avoids manual index tracking complexity.
sys.stdout.write avoids repeated print overhead.

Additional Example: Batch Output

Example: If you must output results for 200,000 queries, do not call print each time. Append to list and do:

out = []
for x in answers:
    out.append(str(x))
sys.stdout.write("\n".join(out))

Time Complexity Perspective

Algorithm complexity remains the same asymptotically.
Fast I/O reduces constant factors significantly in input-heavy tasks.

Space Complexity Perspective

Bulk read approach uses more memory (stores full input tokens).
readline-based approach uses less memory but may be slightly slower.
Choose based on input size and memory limits.

When to Use Which Approach

Small/medium input: normal input() is often fine.
Large input, moderate memory: readline + batched output.
Very large input, tight time: buffered read().split() style.

Edge Cases

Trailing spaces/newlines: use robust parsing (split() handles whitespace).
Empty input: guard before reading expected tokens.
Mixed token types: parse carefully and validate expected count.

Common Mistakes

Common Mistake: Optimizing I/O before fixing an O(n^2) algorithm that needs O(n log n) or O(n).

Common Mistake: Calling print in large loops instead of buffered output.

Common Mistake: Using full-input read on memory-constrained tasks without checking limits.

Interview Insight

Interview Insight: In most whiteboard interviews, fast I/O code is not required. Mention it only when discussing online judges or performance constraints; focus interview time on algorithm correctness first.

Practice Problems

Implement same solution with input(), readline, and buffer.read; benchmark differences.
Solve large query-sum problem with batched output.
Build reusable fast token parser template for contests.

Expert Tip: Keep two templates ready: a clean readline version and a high-throughput buffer.read version. Pick based on constraints instead of habit.

Summary

Fast I/O reduces runtime overhead in data-heavy problems.
Use buffered reads and batched writes when input/output volume is large.
Algorithm complexity still remains the primary performance driver.
Choose I/O strategy by balancing speed and memory constraints.

22.7 Modulo Tricks

Introduction

Modulo arithmetic is one of the most common tools in DSA and competitive programming. It helps prevent integer overflow, supports cyclic behavior, and enables efficient counting/combinatorics under large constraints.

This topic is not just “use % 1000000007”. It is about understanding the rules deeply so you can apply them correctly in dynamic programming, number theory, hashing, and prefix techniques.

Real-World Analogy

Think of a clock with 12 hours. After 12 comes 1 again. Modulo arithmetic works like this wrap-around system. On a clock, (10 + 5) mod 12 = 3. In programming, the same logic helps in circular indexing and bounded arithmetic.

Formal Definition

For integers a and positive m:

a mod m is the remainder when a is divided by m.

Concept Note: Two numbers are congruent modulo m if they have the same remainder: a ≡ b (mod m).

Why This Topic Matters

Avoids overflow in large multiplication/addition chains.
Essential in counting problems where answers are huge.
Enables advanced techniques: modular inverse, fast exponentiation, hashing, cyclic arrays.

Mental Model

Work inside a fixed remainder space [0, m-1]
Every operation "wraps" back into this range

As long as you apply modulo rules correctly, intermediate huge values can be safely controlled.

Core Identities

(a + b) % m = ((a % m) + (b % m)) % m
(a - b) % m = ((a % m) - (b % m) + m) % m
(a * b) % m = ((a % m) * (b % m)) % m
Division is not direct: (a / b) % m needs modular inverse of b (when it exists).
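
A quick way to build trust in these identities is to check them on random values, and to see concretely why division needs an inverse. The sketch below assumes a prime modulus; check_identities and div_mod are hypothetical helper names.

```python
import random

M = 10**9 + 7  # a common prime modulus


def check_identities(trials: int = 1000, seed: int = 0) -> bool:
    """Verify the add/subtract/multiply identities on random integers."""
    rng = random.Random(seed)
    for _ in range(trials):
        a = rng.randrange(-10**18, 10**18)
        b = rng.randrange(-10**18, 10**18)
        assert (a + b) % M == ((a % M) + (b % M)) % M
        assert (a - b) % M == ((a % M) - (b % M) + M) % M
        assert (a * b) % M == ((a % M) * (b % M)) % M
    return True


def div_mod(a: int, b: int, m: int) -> int:
    """(a / b) mod m, assuming m is prime and b % m != 0."""
    return (a % m) * pow(b, m - 2, m) % m


print(check_identities())
# 20 / 4 = 5, so the answer mod 7 should be 5.
print((20 % 7) // (4 % 7))   # naive "division of remainders" gives 1 (wrong)
print(div_mod(20, 4, 7))     # multiplying by the inverse gives 5
```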

Brute Force → Better → Optimal

Brute Force

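Compute large numbers directly, then take the modulo at the end. Risk: overflow in fixed-width languages; in Python, integers never overflow, but intermediate values grow huge and arithmetic on them becomes slow.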

Better

Take modulo after each operation to keep values bounded.

Optimal

Combine rolling modulo with fast exponentiation, modular inverse, and precomputation where required.

Optimization Insight: Apply modulo at every arithmetic step in loops/DP transitions, not just at final return.
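
As one illustration of combining precomputation with a modular inverse, the sketch below precomputes factorials and inverse factorials so that each nCr query costs O(1). It assumes a prime modulus larger than the table limit; build_factorials and n_choose_r are hypothetical names.

```python
MOD = 10**9 + 7


def build_factorials(limit: int, mod: int = MOD):
    """Return (fact, inv_fact) tables for 0..limit; needs limit < prime mod."""
    fact = [1] * (limit + 1)
    for i in range(1, limit + 1):
        fact[i] = fact[i - 1] * i % mod

    inv_fact = [1] * (limit + 1)
    # One modular inverse via Fermat's little theorem, then walk downward:
    # 1/(i-1)! = (1/i!) * i
    inv_fact[limit] = pow(fact[limit], mod - 2, mod)
    for i in range(limit, 0, -1):
        inv_fact[i - 1] = inv_fact[i] * i % mod
    return fact, inv_fact


def n_choose_r(n: int, r: int, fact, inv_fact, mod: int = MOD) -> int:
    if r < 0 or r > n:
        return 0
    return fact[n] * inv_fact[r] % mod * inv_fact[n - r] % mod


fact, inv_fact = build_factorials(10)
print(n_choose_r(10, 5, fact, inv_fact))
```

Only one exponentiation is needed for the whole table, which is why this layout is preferred over calling an inverse routine per query.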

Step-by-Step Tricks You Must Know

1) Safe Subtraction

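With a and b already reduced to [0, m-1], use (a - b + m) % m to avoid negative remainders. Python's % already returns a non-negative result for positive m, but C, C++, and Java can return a negative one.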
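
The language difference is easy to demonstrate. In the sketch below, truncated_mod is a hypothetical helper that imitates the remainder operator of C, C++, and Java (result takes the sign of the dividend), so you can compare it with Python's own %.

```python
def truncated_mod(a: int, m: int) -> int:
    """Remainder with the sign of the dividend, as in C, C++, and Java."""
    r = abs(a) % m
    return -r if a < 0 else r


def safe_sub(a: int, b: int, m: int) -> int:
    """(a - b) mod m, always in [0, m-1], portable across languages."""
    return ((a % m) - (b % m) + m) % m


print((2 - 5) % 7)              # Python floors: already non-negative
print(truncated_mod(2 - 5, 7))  # C-style result is negative
print(safe_sub(2, 5, 7))        # the safe form agrees with Python
```

In Python the + m is redundant but harmless; writing it anyway keeps the habit intact when you switch languages.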

2) Fast Power (Binary Exponentiation)

Compute a^b % m in O(log b), not O(b).

3) Modular Inverse

For prime m, the inverse of x is x^(m-2) % m (Fermat's little theorem), when x % m != 0.

4) Prefix Mod Trick

For subarray sum divisibility problems, track prefix sum remainders.

ASCII Diagram

Modulo 5 number line wraps:
... -2 -1 0 1 2 3 4 5 6 7 ...
remainders:
...  3  4 0 1 2 3 4 0 1 2 ...

Same remainder => same class mod 5

Python Implementation Snippets

Fast Power (mod exponentiation)

def mod_pow(a: int, b: int, mod: int) -> int:
    a %= mod
    result = 1
    while b > 0:
        if b & 1:
            result = (result * a) % mod
        a = (a * a) % mod
        b >>= 1
    return result

Modular Inverse (prime mod)

def mod_inv(x: int, mod: int) -> int:
    # Works when mod is prime and x % mod != 0
    return mod_pow(x, mod - 2, mod)
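
In practice, Python's built-in three-argument pow already performs modular exponentiation, and from Python 3.8 onward it accepts an exponent of -1 to compute a modular inverse, including for non-prime moduli when the inverse exists. The wrapper try_inverse below is a hypothetical convenience, shown only to make the failure case explicit.

```python
def try_inverse(x: int, m: int):
    """Return the inverse of x modulo m, or None if it does not exist."""
    try:
        return pow(x, -1, m)   # requires Python 3.8+
    except ValueError:         # raised when gcd(x, m) != 1
        return None


print(pow(2, 10, 1000))   # 1024 % 1000
print(try_inverse(3, 7))  # 3 * 5 = 15 = 1 (mod 7)
print(try_inverse(3, 10)) # works for a non-prime modulus: 3 * 7 = 21
print(try_inverse(2, 10)) # no inverse, since gcd(2, 10) = 2
```

Hand-written mod_pow and mod_inv remain worth knowing for interviews and for languages without an equivalent built-in.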

Count subarrays with sum divisible by k

from collections import defaultdict

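def count_divisible_subarrays(nums, k):
    # freq[r] = number of prefixes seen so far whose sum has remainder r
    freq = defaultdict(int)
    freq[0] = 1  # the empty prefix
    prefix = 0
    ans = 0

    for x in nums:
        prefix = (prefix + x) % k  # Python keeps this in [0, k-1] for k > 0
        ans += freq[prefix]  # pair with every earlier prefix sharing this remainder
        freq[prefix] += 1
    return ans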

Line-by-Line Explanation

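mod_pow halves the exponent each iteration and multiplies the current power into result whenever the low bit of b is 1.
mod_inv uses the Fermat's little theorem shortcut, which is valid only for a prime modulus.
In prefix-divisibility, two prefix sums with equal remainders differ by a multiple of k, so the subarray between them has a sum divisible by k.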
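
A quick way to build trust in these helpers is to compare them with Python's built-in pow. The sketch below repeats the two functions so that it runs on its own; the test values (3, 13, 1000, 123456) are arbitrary, and MOD is assumed to be the usual prime 1e9+7.

```python
def mod_pow(a: int, b: int, mod: int) -> int:
    a %= mod
    result = 1
    while b > 0:
        if b & 1:
            result = (result * a) % mod
        a = (a * a) % mod
        b >>= 1
    return result


def mod_inv(x: int, mod: int) -> int:
    # Valid only when mod is prime and x % mod != 0
    return mod_pow(x, mod - 2, mod)


MOD = 1_000_000_007  # assumed prime modulus

small = mod_pow(3, 13, 1000)
print(small)  # 323, since 3**13 = 1594323

# Cross-check against the built-in three-argument pow on a huge exponent.
matches_builtin = mod_pow(2, 10**18, MOD) == pow(2, 10**18, MOD)
print(matches_builtin)  # True

x = 123456
inverse_check = x * mod_inv(x, MOD) % MOD
print(inverse_check)  # 1
```

In practice, pow(a, b, mod) is usually the faster choice in Python. Writing mod_pow by hand is mainly useful for interviews and for languages without a built-in.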

Worked Example

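Example: nums = [4, 5, 0, -2, -3, 1], k = 5

Prefix sums: 4, 9, 9, 7, 4, 5
Prefix remainders (with 0 for the empty prefix): 0, 4, 4, 4, 2, 4, 0

Remainder 4 appears 4 times (6 pairs) and remainder 0 appears 2 times (1 pair). Every pair of equal remainders contributes one valid subarray, so the final answer is 6 + 1 = 7.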
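
The worked example can be traced in code. This sketch counts pairs of equal remainders directly rather than accumulating the answer in one pass. It is an equivalent formulation chosen for readability, not the implementation shown earlier.

```python
from collections import Counter

nums, k = [4, 5, 0, -2, -3, 1], 5

remainders = [0]  # remainder of the empty prefix
prefix = 0
for x in nums:
    prefix = (prefix + x) % k  # Python's % is non-negative for k > 0
    remainders.append(prefix)

freq = Counter(remainders)
# Any two prefixes with the same remainder bound one divisible subarray,
# so a remainder seen c times contributes c * (c - 1) // 2 subarrays.
total = sum(c * (c - 1) // 2 for c in freq.values())

print(remainders)  # [0, 4, 4, 4, 2, 4, 0]
print(total)       # 7
```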

Time Complexity

Fast exponentiation: O(log b)
Prefix remainder counting: O(n)
Mod inverse via fast power: O(log mod)

Space Complexity

Prefix remainder map: O(min(n, k))
Fast power/inverse: O(1) auxiliary

Edge Cases

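Negative numbers: Python's % already returns a value in [0, m-1] for positive m, but C++ and Java can return a negative remainder, so normalize with ((x % m) + m) % m when porting code.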
Non-prime modulus: modular inverse may not exist for all values.
Division under modulo: only valid when inverse exists.
Very large multiplication: apply modulo at each step.
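
For the non-prime case, the Fermat shortcut no longer applies, but an inverse still exists whenever x and the modulus are coprime. A minimal sketch follows; the helper name mod_inv_general is made up for illustration, and it relies on the built-in pow(x, -1, mod) available from Python 3.8.

```python
from math import gcd
from typing import Optional


def mod_inv_general(x: int, mod: int) -> Optional[int]:
    """Return the inverse of x modulo mod, or None when it does not exist."""
    if gcd(x, mod) != 1:
        return None  # x shares a factor with mod, so no inverse exists
    return pow(x, -1, mod)  # built-in modular inverse, Python 3.8+


print(mod_inv_general(3, 10))  # 7, because 3 * 7 = 21 and 21 % 10 == 1
print(mod_inv_general(4, 10))  # None, because gcd(4, 10) == 2
```

Returning None instead of raising is a design choice for this sketch. It forces the caller to handle the "division is not valid here" case explicitly.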

Common Mistakes

Common Mistake: Writing (a - b) % m and assuming non-negative result in all languages.

Common Mistake: Doing modular division directly instead of multiplying by inverse.

Common Mistake: Forgetting modulo in intermediate DP transitions, causing overflow or wrong values.

Pattern Recognition

Modulo tricks usually appear when:

Problem asks for answer “mod 1e9+7” or large prime.
Subarray divisibility or remainder-frequency patterns exist.
Huge exponentiation/combinatorics required.
Circular indexing behavior is involved.

Interview Insight

Interview Insight: Say explicitly: “I will keep all transitions modulo M to prevent overflow and maintain correctness.” This signals implementation maturity and edge-case awareness.

Practice Problems

Implement nCr % mod with factorial + inverse factorial.
Count subarrays with sum divisible by k.
Compute huge power tower variants using modular exponentiation.
Solve circular array indexing problems with modulo normalization.

Expert Tip: Keep a trusted modulo utility snippet (pow, inv, safe subtract). Reusing a tested template reduces silent arithmetic bugs under pressure.
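
One possible piece of such a utility file is an nCr helper built on factorial and inverse-factorial tables. This is a sketch under stated assumptions: MOD is prime, n never exceeds the precomputed limit, and the names build_factorials and ncr are illustrative rather than standard.

```python
MOD = 1_000_000_007  # assumed prime modulus


def build_factorials(limit: int, mod: int = MOD):
    """Precompute factorials and inverse factorials for 0..limit."""
    fact = [1] * (limit + 1)
    for i in range(1, limit + 1):
        fact[i] = fact[i - 1] * i % mod

    inv_fact = [1] * (limit + 1)
    inv_fact[limit] = pow(fact[limit], mod - 2, mod)  # Fermat inverse
    for i in range(limit, 0, -1):
        # 1/(i-1)! = (1/i!) * i
        inv_fact[i - 1] = inv_fact[i] * i % mod
    return fact, inv_fact


def ncr(n: int, r: int, fact, inv_fact, mod: int = MOD) -> int:
    if r < 0 or r > n:
        return 0
    return fact[n] * inv_fact[r] % mod * inv_fact[n - r] % mod


fact, inv_fact = build_factorials(1000)
print(ncr(5, 2, fact, inv_fact))   # 10
print(ncr(10, 0, fact, inv_fact))  # 1
```

Only one modular exponentiation is needed for the whole table. The remaining inverse factorials come from a single backward pass, so precomputation is O(limit + log mod) and each query is O(1).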

Summary

Modulo arithmetic is essential for large-number and cyclic problems.
Use core identities carefully, especially subtraction and division.
Fast exponentiation and modular inverse are must-know tools.
Correct modulo usage prevents many hidden runtime and correctness failures.

22.8 Contest Strategy

Introduction

Contest strategy is the system you use to maximize score under fixed time, not just your raw algorithm knowledge. Many strong coders underperform because they solve in the wrong order, spend too long debugging one problem, or ignore scoring mechanics.

This topic teaches how to convert knowledge into consistent contest outcomes.

Real-World Analogy

In a chess tournament, winning is not only about seeing deep tactics. You also manage clock time, choose practical lines, avoid unnecessary risk, and adapt to opponent pressure. Coding contests require the same strategic control.

Formal Definition

Contest strategy is a decision framework for problem selection, time allocation, risk management, debugging priority, and submission timing to optimize final rank/score.

Concept Note: In contests, solving fewer problems correctly can beat attempting many problems partially.

Why This Topic Matters

Improves rank without increasing theoretical knowledge immediately.
Reduces panic and poor decisions under time pressure.
Builds repeatable habits for long contests and interview assessments.

Mental Model

Scan -> Prioritize -> Execute -> Validate -> Submit -> Replan

You should continuously re-evaluate your strategy during the contest, not just at the start.

Brute Force → Better → Optimal (Contest Approach)

Brute Force

Start from problem A and continue sequentially regardless of fit. High risk of getting stuck early.

Better

Quickly scan all problems and solve easiest first.

Optimal

Use dynamic prioritization: solve highest expected-value problems first (confidence × points / time), with strict time caps and fallback decisions.

Optimization Insight: Contest performance depends on expected-value decisions, not ego-driven “hardest problem first” behavior.

Pre-Contest Preparation Checklist

Set up tested templates (fast I/O, graph boilerplate, modulo utilities).
Warm up with 1-2 short problems to activate speed and focus.
Review common bug checklist (indices, overflow, modulo negatives, recursion limits).
Prepare mental submission routine: sample test -> custom edge test -> submit.

In-Contest Step-by-Step Strategy

First 5 minutes: scan all problems and tag as Easy / Medium / Hard for you (not globally).
Solve momentum problems first: secure fast accepted submissions.
Set time cap per attempt: e.g., 20-25 minutes max before reassessment.
When stuck: leave concise notes and switch; return later with fresh state.
Before every submit: run quick edge checklist.
Final phase: prioritize bug-fixing near-complete solutions over speculative new starts.

ASCII Contest Timeline (120-minute Example)

0----5------30-----60---------90----110-----------------120
|Scan| Easy | Mid  | Mid/Hard | Fix | Final submit checks |

Problem Priority Scoring Heuristic

Use a practical score to decide next problem:

priority_score = confidence * points / estimated_minutes

Higher score means better expected return.

Python Utility for Priority Ranking

from dataclasses import dataclass
from typing import List


@dataclass
class ProblemOption:
    name: str
    points: int
    confidence: float       # 0.0 to 1.0
    estimated_minutes: int


def rank_problem_options(options: List[ProblemOption]) -> List[ProblemOption]:
    def score(p: ProblemOption) -> float:
        if p.estimated_minutes <= 0:
            return -1.0
        return (p.confidence * p.points) / p.estimated_minutes

    return sorted(options, key=score, reverse=True)

Line-by-Line Explanation

Each problem has subjective confidence, potential points, and expected solve time.
Ranking function estimates expected-value efficiency.
This is not exact math, but it prevents random decision-making under stress.
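
To see the ranking in action, here is a self-contained run. The dataclass and function are repeated so the snippet executes on its own. The three problems and all of their numbers are made-up estimates, not data from any real contest.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProblemOption:
    name: str
    points: int
    confidence: float       # 0.0 to 1.0
    estimated_minutes: int


def rank_problem_options(options: List[ProblemOption]) -> List[ProblemOption]:
    def score(p: ProblemOption) -> float:
        if p.estimated_minutes <= 0:
            return -1.0  # invalid estimate: push to the end
        return (p.confidence * p.points) / p.estimated_minutes

    return sorted(options, key=score, reverse=True)


# Hypothetical contest snapshot, deliberately listed out of order.
options = [
    ProblemOption("C", points=1500, confidence=0.3, estimated_minutes=45),  # score 10
    ProblemOption("A", points=500, confidence=0.9, estimated_minutes=10),   # score 45
    ProblemOption("B", points=1000, confidence=0.6, estimated_minutes=25),  # score 24
]

order = [p.name for p in rank_problem_options(options)]
print(order)  # ['A', 'B', 'C']
```

Problem C is worth the most points, yet it ranks last because low confidence and a long estimate drag its expected return per minute down.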

Time Management Inside Contest

Use hard stop rules (for example, 20 minutes without progress -> switch).
Track time spent vs points gained every 30 minutes.
Reserve final 10 minutes for validation and careful submission.

Debugging Strategy Under Pressure

Reproduce smallest failing case.
Check assumptions/invariants before rewriting logic.
Audit boundaries: loops, indices, empty/single inputs.
Only then patch specific lines.

Common Mistakes

Common Mistake: Spending 45+ minutes on one hard problem early while easy points remain unsolved.

Common Mistake: Submitting without quick edge checks, causing avoidable WA/TLE penalties.

Common Mistake: Rewriting entire solution during panic instead of targeted debugging.

Common Mistake: Ignoring scoreboard/penalty model in decision-making.

Edge Cases in Contest Decisions

Low confidence but high points: timebox and reevaluate quickly.
Many partial solutions: convert nearest-complete one to AC first.
Late-stage fatigue: simplify approach and avoid risky refactors.

Pattern Recognition in Contest Setting

Faster recognition means faster solves. During scan phase, map problem cues to likely patterns immediately and shortlist candidate templates before coding.

Interview Insight

Interview Insight: Contest strategy improves interviews too: quick problem triage, disciplined pacing, and calm debugging are directly transferable to coding rounds.

Practice Problems

Run 3 mock contests and log time allocation decisions every 20 minutes.
After each contest, classify misses: knowledge gap vs strategy error.
Practice “forced switch” drill: leave a stuck problem at minute 20 and solve another.

Expert Tip: Keep a post-contest review sheet with three columns: wrong decisions, root cause, and future rule. Strategy improves fastest when converted into explicit rules.

Summary

Contest success is a strategy problem as much as an algorithm problem.
Use scan-first prioritization, time caps, and expected-value decisions.
Secure points early, debug systematically, and submit with discipline.
Strong contest strategy builds transferable interview performance.

22.9 Mock Interviews

Introduction

Mock interviews are the closest training environment to real coding interviews. They combine problem-solving, communication, time management, and stress handling in one session. Solving problems alone is necessary, but mock interviews are where performance becomes interview-ready.

If your goal is actual selection, mock interviews are non-negotiable.

Real-World Analogy

A pilot does not train only by reading manuals; they train in flight simulators under realistic scenarios. Mock interviews are your simulator: same pressure, same constraints, same evaluation style.

Formal Definition

A mock interview is a structured practice session that replicates real interview conditions: a time-boxed problem, live communication, no reliance on IDE assistance the real interview would not provide, and post-round evaluation.

Concept Note: The value of a mock is not the question solved; it is the behavioral data you collect and improve.

Why This Topic Matters

Reveals gaps invisible in solo practice (panic, unclear explanation, pacing breakdown).
Builds confidence through repeated exposure to interview-like pressure.
Transforms knowledge into reliable interview execution.

Mental Model

Simulate -> Measure -> Diagnose -> Correct -> Re-simulate

Every mock should end with specific corrective actions, not generic feedback.

Brute Force → Better → Optimal (Mock Practice)

Brute Force

Random mock sessions with no rubric, no logs, no follow-up.

Better

Regular mocks with basic qualitative feedback.

Optimal

Rubric-based mock cycle: score dimensions, identify root causes, assign targeted drills, and re-test same weakness type.

Optimization Insight: A smaller number of high-quality, reviewed mocks beats a large number of unstructured mocks.

Mock Interview Session Blueprint (60 Minutes)

Phase	Duration	Goal
Problem understanding + clarification	5 min	Correct interpretation
Approach design	10 min	Brute-force to optimal reasoning
Coding	25 min	Working implementation
Dry run + edge cases	10 min	Bug detection
Feedback + action items	10 min	Improvement loop

Scoring Rubric Dimensions

Problem understanding (clarity, assumptions, scope).
Algorithm quality (correctness + complexity fit).
Code quality (structure, bugs, edge handling).
Communication (clear reasoning, collaboration).
Debugging and recovery (methodical correction under pressure).
Time management (phase pacing, completion).

ASCII Feedback Loop

Mock #N result
   |
   v
Root-cause labels
(pattern miss / bug / pacing / communication)
   |
   v
Targeted drills (3-5)
   |
   v
Mock #N+1 validation

Python Mock Tracker Utility

from dataclasses import dataclass
from typing import List, Dict


@dataclass
class MockResult:
    date: str
    understanding: int
    algorithm: int
    code: int
    communication: int
    debugging: int
    time_management: int
    primary_issue: str


def summarize_mocks(results: List[MockResult]) -> Dict:
    if not results:
        return {"count": 0}

    n = len(results)
    avg = {
        "understanding": sum(r.understanding for r in results) / n,
        "algorithm": sum(r.algorithm for r in results) / n,
        "code": sum(r.code for r in results) / n,
        "communication": sum(r.communication for r in results) / n,
        "debugging": sum(r.debugging for r in results) / n,
        "time_management": sum(r.time_management for r in results) / n,
    }

    issue_freq = {}
    for r in results:
        issue_freq[r.primary_issue] = issue_freq.get(r.primary_issue, 0) + 1

    return {"count": n, "averages": avg, "issue_frequency": issue_freq}

Line-by-Line Explanation

Each mock is scored on six interview-critical dimensions.
Summary computes trends, not just one-off outcomes.
Issue frequency reveals recurring bottlenecks for targeted practice.
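
Once averages exist, a natural next step is to pick the dimensions to drill first. The helper below is a hypothetical add-on, not part of the tracker above. It works on any dictionary shaped like the "averages" entry of the summary, and the sample scores assume a 1-5 scale, which the rubric itself does not prescribe.

```python
from typing import Dict, List, Tuple


def weakest_dimensions(averages: Dict[str, float], top: int = 2) -> List[Tuple[str, float]]:
    """Return the `top` lowest-scoring rubric dimensions, weakest first."""
    return sorted(averages.items(), key=lambda item: item[1])[:top]


# Hypothetical averages on an assumed 1-5 scale.
averages = {
    "understanding": 4.2,
    "algorithm": 3.8,
    "code": 3.5,
    "communication": 2.9,
    "debugging": 3.1,
    "time_management": 2.6,
}

focus = weakest_dimensions(averages)
print(focus)  # [('time_management', 2.6), ('communication', 2.9)]
```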

Time Complexity Perspective

Mock interviews train complexity judgment under pressure, not just in calm offline solving.
A frequent mock failure pattern: selecting an O(n^2) plan despite large constraints due to stress.

Space Complexity Perspective

Mock feedback should include space trade-off reasoning quality, not only runtime choices.

Edge Cases in Mock Preparation

Over-practice with familiar partners: feedback may become predictable and less useful.
Only easy mocks: confidence rises but selection readiness does not.
No speaking practice: strong coder, weak interview signal.

Common Mistakes

Common Mistake: Treating mock as pass/fail event instead of diagnostic tool.

Common Mistake: Ignoring feedback logs and repeating same errors in later mocks.

Common Mistake: Focusing only on algorithm score while communication/time-management scores stay weak.

Mock Interview Formats You Should Rotate

Peer mock: accessible and frequent.
Recorded self-mock: excellent for communication self-audit.
Professional mock: high-quality calibrated feedback near final prep stage.

Interview Insight

Interview Insight: Real improvement comes from “feedback-to-drill” conversion. Every mock should produce 2-3 concrete rules for next week’s practice.

Practice Problems

Run 6 mocks over 3 weeks with rubric scoring and trend tracking.
Re-attempt one failed mock problem after one week to measure recovery.
Do one communication-only mock where coding is secondary and explanation quality is primary.

Expert Tip: Schedule mocks by objective: early phase for diagnostic breadth, mid phase for weakness repair, final phase for confidence + consistency rehearsal.

Summary

Mock interviews are the highest-fidelity preparation for real interviews.
Use rubric-based measurement across algorithm, coding, communication, and pacing.
Convert every mock outcome into targeted corrective drills.
Consistent mock feedback loops create reliable interview performance.

22.10 Revision Strategy

Introduction

Revision strategy is the system that turns solved problems into long-term interview-ready skill. Without revision, problem-solving quality decays quickly, pattern recall slows down, and previously solved questions feel new again.

This topic teaches how to revise efficiently so your preparation compounds over time.

Real-World Analogy

Learning DSA is like building muscle memory for an instrument. Practicing a piece once is not enough; spaced, structured repetition is what makes performance reliable on stage. Interviews are your stage.

Formal Definition

Revision strategy is a planned schedule for revisiting concepts, patterns, and past problems using spaced repetition, error-driven review, and timed re-implementation to maximize retention and execution speed.

Concept Note: Revision is not re-reading notes passively. Effective revision requires active recall and re-solving under constraints.

Why This Topic Matters

Prevents forgetting and keeps core patterns interview-ready.
Improves speed and confidence under time pressure.
Converts one-time practice into durable problem-solving intuition.

Mental Model

Solve -> Capture lessons -> Revisit at intervals -> Re-solve faster -> Internalize

Each revision cycle should reduce hint dependency and implementation time.

Brute Force → Better → Optimal (Revision)

Brute Force

Randomly revisit old problems when you remember them. Inconsistent and incomplete.

Better

Maintain topic-wise lists and occasionally re-solve.

Optimal

Use spaced revision + error buckets + timed mixed sets + mock integration.

Optimization Insight: Reviewing your mistakes yields higher return than re-solving only favorite problems.

Spaced Revision Schedule (Practical)

After solving a problem on Day 0, revisit on:

Day 1 (quick recall check)
Day 3 (re-implement key idea)
Day 7 (timed re-solve)
Day 14 (mixed set recall)
Day 30 (retention verification)
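
The schedule above is easy to turn into calendar dates. A minimal sketch using the standard datetime module follows. The function name revision_dates and the sample solve date are illustrative assumptions.

```python
from datetime import date, timedelta
from typing import List

REVISION_OFFSETS = [1, 3, 7, 14, 30]  # days after the Day 0 solve


def revision_dates(solved_on: date) -> List[date]:
    """Return the calendar dates on which a problem should be revisited."""
    return [solved_on + timedelta(days=offset) for offset in REVISION_OFFSETS]


schedule = revision_dates(date(2024, 1, 1))
for day in schedule:
    print(day.isoformat())
# 2024-01-02
# 2024-01-04
# 2024-01-08
# 2024-01-15
# 2024-01-31
```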

Revision Buckets Framework

Bucket A: Pattern Misses

You solved only after hints because initial pattern identification failed.

Bucket B: Implementation Bugs

Pattern was right, but coding errors caused WA/TLE.

Bucket C: Edge Case Misses

Core logic worked, but boundary inputs failed.

Bucket D: Complexity Misjudgment

Chosen approach did not meet constraints.

Step-by-Step Weekly Revision Plan

Pick 15-20 previously solved problems from different topics.
Tag each problem into revision buckets (A/B/C/D).
Re-solve 5 problems timed (no notes).
Revisit 5 problems as explanation-only drill (verbal algorithm articulation).
Run one mixed mock of 2-3 questions from weak buckets.
Update notes with one-line takeaway per problem.

ASCII Revision Cycle

Week Start:
[Select old problems]
      |
      v
[Tag failure type]
      |
      v
[Timed re-solve]
      |
      v
[Mock integration]
      |
      v
[Update notebook + next schedule]

Python Revision Tracker Utility

from dataclasses import dataclass
from typing import List


@dataclass
class RevisionEntry:
    problem: str
    bucket: str          # A/B/C/D
    solve_minutes: int
    used_hint: bool
    passed_all_tests: bool


def revision_summary(entries: List[RevisionEntry]) -> dict:
    if not entries:
        return {"count": 0}

    n = len(entries)
    bucket_count = {}
    for e in entries:
        bucket_count[e.bucket] = bucket_count.get(e.bucket, 0) + 1

    return {
        "count": n,
        "avg_time": sum(e.solve_minutes for e in entries) / n,
        "hint_rate": sum(e.used_hint for e in entries) / n,
        "pass_rate": sum(e.passed_all_tests for e in entries) / n,
        "bucket_distribution": bucket_count,
    }

Line-by-Line Explanation

Each entry tracks outcome quality, not just “solved/not solved”.
Bucket distribution reveals dominant weakness patterns.
Average time and hint rate show practical readiness trend.
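
The hint_rate and pass_rate lines work because Python's bool is a subclass of int, so True counts as 1 and False as 0 when summed. A tiny illustration with made-up values:

```python
# Hypothetical hint usage across four revision attempts.
used_hint = [True, False, True, True]

hint_rate = sum(used_hint) / len(used_hint)  # True adds 1, False adds 0
print(hint_rate)  # 0.75
```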

Time Complexity Perspective

Revision improves your ability to choose optimal complexity faster.
Over time, you should need fewer brute-force starts before reaching correct asymptotic approach.

Space Complexity Perspective

Revision notes should stay concise: key invariant, common bug, final complexity, one edge case.
Large notes with low signal are hard to review and reduce efficiency.

Edge Cases in Revision Planning

Only revising easy problems: confidence rises but interview performance stagnates.
Only revising hard problems: fundamentals become shaky.
No timed component: recall may exist but execution speed remains low.

Common Mistakes

Common Mistake: Passive reading of old solutions without active re-implementation.

Common Mistake: Not tracking why you failed a problem, so same mistakes repeat.

Common Mistake: Ignoring solved problems for months, leading to steep recall decay.

Interview Insight

Interview Insight: Candidates who revise systematically explain patterns faster and code cleaner because they’ve compressed repeated lessons into strong mental templates.

Practice Problems

Create a personal revision sheet for top 10 high-frequency interview patterns.
Run 14-day spaced revision cycle and compare time/hint metrics before vs after.
Re-solve one old hard problem weekly under strict 35-minute timer.

Expert Tip: End every study week with a “revision day” instead of new problems only. Retention compounds; random volume does not.

Summary

Revision strategy transforms short-term solving into long-term mastery.
Use spaced repetition, failure buckets, and timed re-solving.
Track progress metrics (time, hints, pass rate), not just solved count.
Consistent revision is one of the strongest predictors of interview reliability.

23.1 SOLID Principles in Python

Introduction

SOLID is a set of five object-oriented design principles that help you write software that is easier to maintain, extend, test, and reason about. In Python, these principles are especially useful because dynamic typing and fast iteration can otherwise lead to tightly coupled, fragile code if design discipline is missing.

This topic moves from pure algorithm solving into engineering-level software design — exactly the layer needed for strong production coding and system interviews.

Real-World Analogy

Imagine building a modular home. If plumbing, wiring, and walls are all tangled together, small changes become dangerous and expensive. SOLID is like architectural standards that keep parts separated with clean interfaces, so upgrades are safe and predictable.

Formal Definition

SOLID expands to:

S – Single Responsibility Principle (SRP)
O – Open/Closed Principle (OCP)
L – Liskov Substitution Principle (LSP)
I – Interface Segregation Principle (ISP)
D – Dependency Inversion Principle (DIP)

Concept Note: SOLID principles are guidelines, not rigid rules. Use them to improve design quality, not to force unnecessary abstraction.

Why This Topic Matters

Reduces bug risk when features evolve.
Improves unit testing through decoupled components.
Common in senior interviews and code review expectations.
Builds foundation for design patterns and maintainable architecture.

Mental Model

High cohesion inside classes
Low coupling between classes

Change one feature -> minimal ripple effects

SOLID helps control two forces: where responsibility lives and how components depend on each other.

Brute Force → Better → Optimal (Design Evolution)

Brute Force

Monolithic classes with many unrelated responsibilities and hard-coded dependencies.

Better

Some helper classes introduced, but boundaries and abstractions are inconsistent.

Optimal

SOLID-aligned architecture: focused responsibilities, extension points, substitutable contracts, small interfaces, and dependency abstraction.

Optimization Insight: SOLID optimizes long-term change cost, not just immediate line count.

S — Single Responsibility Principle (SRP)

Idea

A class should have one reason to change.

Bad Example (Multiple Responsibilities)

class ReportManager:
    def create_report(self, data):
        return f"Report: {data}"

    def save_to_file(self, report, path):
        with open(path, "w") as f:
            f.write(report)

    def send_email(self, report, email):
        print(f"Sending to {email}: {report}")

This class handles creation, persistence, and notification — too many responsibilities.

Better Split

class ReportBuilder:
    def create(self, data):
        return f"Report: {data}"


class ReportRepository:
    def save(self, report, path):
        with open(path, "w") as f:
            f.write(report)


class ReportNotifier:
    def send_email(self, report, email):
        print(f"Sending to {email}: {report}")
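
After the split, something still has to wire the pieces together. One way to do that, sketched here with a hypothetical publish_report function and a temporary directory so the example leaves no files behind:

```python
import os
import tempfile


class ReportBuilder:
    def create(self, data):
        return f"Report: {data}"


class ReportRepository:
    def save(self, report, path):
        with open(path, "w") as f:
            f.write(report)


class ReportNotifier:
    def send_email(self, report, email):
        print(f"Sending to {email}: {report}")


def publish_report(data, path, email):
    """Hypothetical coordinator: each step is delegated to one focused class."""
    report = ReportBuilder().create(data)
    ReportRepository().save(report, path)
    ReportNotifier().send_email(report, email)
    return report


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "report.txt")
    publish_report("Q3 sales", path, "team@example.com")
    with open(path) as f:
        saved = f.read()

print(saved)  # Report: Q3 sales
```

A change to the storage format now touches only ReportRepository, and a change to the delivery channel touches only ReportNotifier.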

O — Open/Closed Principle (OCP)

Idea

Software entities should be open for extension, closed for modification.

Example

Instead of editing one giant if/elif block for new discount types, introduce strategy classes implementing a common interface.

from abc import ABC, abstractmethod

class DiscountStrategy(ABC):
    @abstractmethod
    def apply(self, amount: float) -> float:
        pass


class NoDiscount(DiscountStrategy):
    def apply(self, amount: float) -> float:
        return amount


class SeasonalDiscount(DiscountStrategy):
    def apply(self, amount: float) -> float:
        return amount * 0.9
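
The payoff shows up when a new rule arrives. In the sketch below, FlatDiscount and checkout_total are hypothetical names added for illustration. The point is that checkout_total stays untouched while a new strategy class is added next to the existing ones.

```python
from abc import ABC, abstractmethod


class DiscountStrategy(ABC):
    @abstractmethod
    def apply(self, amount: float) -> float:
        ...


class NoDiscount(DiscountStrategy):
    def apply(self, amount: float) -> float:
        return amount


class FlatDiscount(DiscountStrategy):
    """Hypothetical new rule: subtract a fixed amount, never going below zero."""

    def __init__(self, off: float):
        self.off = off

    def apply(self, amount: float) -> float:
        return max(0.0, amount - self.off)


def checkout_total(amount: float, strategy: DiscountStrategy) -> float:
    # This function never changes when a new strategy class is added.
    return strategy.apply(amount)


print(checkout_total(100.0, NoDiscount()))        # 100.0
print(checkout_total(100.0, FlatDiscount(15.0)))  # 85.0
```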

L — Liskov Substitution Principle (LSP)

Idea

Subtypes should be usable wherever base types are expected without breaking behavior contracts.

Classic Pitfall

If a subclass changes expected behavior (for example, raising "unsupported operation" errors where the parent guarantees support), LSP is violated.

Example Note

Avoid inheritance that forces invalid operations. Prefer composition or better abstractions when contracts differ.
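
A common illustration uses birds. If a base Bird class promised fly() and a Penguin subclass raised an error from it, any code written against Bird would break. The sketch below shows one possible fix: move the flying capability into a narrower subtype so that every subclass can honor the contract it inherits. All class names here are hypothetical.

```python
class Bird:
    def move(self) -> str:
        return "walks"


class FlyingBird(Bird):
    def fly(self) -> str:
        return "flies"


class Sparrow(FlyingBird):
    pass


class Penguin(Bird):  # not a FlyingBird, so there is no fly() to break
    def move(self) -> str:
        return "waddles"


def describe(birds):
    # Relies only on the Bird contract, so any subtype is safe here.
    return [b.move() for b in birds]


print(describe([Sparrow(), Penguin()]))  # ['walks', 'waddles']
```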

I — Interface Segregation Principle (ISP)

Idea

Clients should not depend on methods they do not use.

Example

Instead of one large interface with print/scan/fax, define focused interfaces so each class implements only relevant capabilities.

from abc import ABC, abstractmethod

class Printable(ABC):
    @abstractmethod
    def print_doc(self, doc: str) -> None:
        pass


class Scannable(ABC):
    @abstractmethod
    def scan_doc(self) -> str:
        pass
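
With the interfaces split, each device implements only what it supports. The following sketch repeats the two interfaces so it runs standalone. BasicPrinter, MultiFunctionDevice, and print_all are hypothetical names used for illustration.

```python
from abc import ABC, abstractmethod
from typing import List


class Printable(ABC):
    @abstractmethod
    def print_doc(self, doc: str) -> None:
        ...


class Scannable(ABC):
    @abstractmethod
    def scan_doc(self) -> str:
        ...


class BasicPrinter(Printable):
    """Prints only; it is never forced to stub out scanning."""

    def __init__(self) -> None:
        self.printed: List[str] = []

    def print_doc(self, doc: str) -> None:
        self.printed.append(doc)


class MultiFunctionDevice(Printable, Scannable):
    def __init__(self) -> None:
        self.printed: List[str] = []

    def print_doc(self, doc: str) -> None:
        self.printed.append(doc)

    def scan_doc(self) -> str:
        return "scanned page"


def print_all(device: Printable, docs: List[str]) -> None:
    # Depends only on the Printable role, not on any scanner methods.
    for doc in docs:
        device.print_doc(doc)


printer = BasicPrinter()
print_all(printer, ["a.pdf", "b.pdf"])
print(printer.printed)                 # ['a.pdf', 'b.pdf']
print(isinstance(printer, Scannable))  # False
```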

D — Dependency Inversion Principle (DIP)

Idea

High-level modules should depend on abstractions, not concrete implementations.

Example

from abc import ABC, abstractmethod

class MessageSender(ABC):
    @abstractmethod
    def send(self, message: str) -> None:
        pass


class EmailSender(MessageSender):
    def send(self, message: str) -> None:
        print(f"Email: {message}")


class NotificationService:
    def __init__(self, sender: MessageSender):
        self.sender = sender

    def notify(self, text: str) -> None:
        self.sender.send(text)

NotificationService can now work with email, SMS, push, or mock sender without changing core logic.
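
The testing benefit can be shown with a recording fake. In this sketch, RecordingSender is a hypothetical test double. The abstraction and service are repeated so the snippet runs by itself.

```python
from abc import ABC, abstractmethod
from typing import List


class MessageSender(ABC):
    @abstractmethod
    def send(self, message: str) -> None:
        ...


class NotificationService:
    def __init__(self, sender: MessageSender):
        self.sender = sender

    def notify(self, text: str) -> None:
        self.sender.send(text)


class RecordingSender(MessageSender):
    """Hypothetical test double: records messages instead of sending them."""

    def __init__(self) -> None:
        self.sent: List[str] = []

    def send(self, message: str) -> None:
        self.sent.append(message)


fake = RecordingSender()
NotificationService(fake).notify("build passed")
print(fake.sent)  # ['build passed']
```

No monkey-patching is needed: the test simply passes a different implementation of the same abstraction.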

Step-by-Step Refactoring Checklist

Identify classes with multiple reasons to change.
Extract responsibilities into focused components.
Replace switch-heavy logic with polymorphism/strategies.
Check subtype behavior contracts for LSP violations.
Split fat interfaces into role-specific interfaces.
Inject dependencies through abstractions (constructor injection).

ASCII Design Shift

Before:
App -> ConcreteA -> ConcreteB -> ConcreteC (tight coupling)

After:
App -> InterfaceX <- ConcreteA
App -> InterfaceY <- ConcreteB
App -> InterfaceZ <- ConcreteC

Time Complexity Perspective

SOLID usually does not change algorithmic Big-O directly.
Its performance effect is architectural: safer optimizations and easier profiling-driven changes.

Space Complexity Perspective

Abstractions may add slight object overhead.
Trade-off is generally worth it for maintainability and testability.

Edge Cases in Applying SOLID

Over-abstraction: too many tiny classes can harm readability.
Premature generalization: do not design for hypothetical future complexity only.
Inheritance misuse: composition often gives safer flexibility in Python.

Common Mistakes

Common Mistake: Treating SOLID as mandatory boilerplate in every small script.

Common Mistake: Using inheritance where behavior contracts do not align (LSP break).

Common Mistake: Hard-coding dependencies inside business logic, making tests difficult.

Pattern Recognition

You likely need SOLID-oriented refactoring when:

One class keeps changing for unrelated reasons.
Adding a feature requires editing many existing files.
Unit tests require heavy monkey-patching due to tight coupling.
Large interfaces force classes to implement irrelevant methods.

Interview Insight

Interview Insight: In LLD interviews, do not just name SOLID principles. Demonstrate one concrete refactor from bad design to improved design and explain the trade-off.

Practice Problems

Refactor a monolithic class into SRP-compliant components.
Convert if-else business rules into strategy pattern (OCP).
Inject repository and notifier abstractions for DIP-compliant service.
Review one existing project and identify one violation per SOLID principle.

Expert Tip: Start with SRP + DIP first. These two usually deliver the biggest practical quality improvement in Python codebases.

Summary

SOLID principles improve maintainability, extensibility, and testability.
They are design heuristics for managing change safely.
Apply pragmatically: enough abstraction for clarity, not abstraction for its own sake.
Mastering SOLID is essential for strong engineering-level interview performance.

23.2 Singleton Pattern

Introduction

The Singleton Pattern ensures that a class has only one instance and provides a global access point to that instance. It is commonly used for shared resources such as configuration managers, logging controllers, cache coordinators, and connection pools.

In Python, singleton design is straightforward to implement, but must be used carefully because overuse can create hidden global state and testing difficulties.

Real-World Analogy

Think of a control tower at an airport. You do not want multiple independent towers giving contradictory instructions. A singleton acts like one authoritative control point shared by all consumers.

Formal Definition

Singleton is a creational design pattern that restricts object instantiation so that exactly one instance of a class exists during application lifetime (or within a defined scope).

Concept Note: “One instance” can mean one per process, one per thread, or one per context depending on implementation requirements.

Why This Topic Matters

Common in low-level design interviews and real codebases.
Useful for centralized coordination and shared expensive resources.
Helps understand trade-offs between convenience and testability.

Mental Model

Client A ----\
Client B ----- > Singleton.get_instance() -> same object reference
Client C ----/

No matter how many times creation is requested, same instance should be returned.

Brute Force → Better → Optimal

Brute Force

Create class normally; multiple independent objects appear. Shared state coordination becomes inconsistent.

Better

Store a module-level global object. Simple, but less explicit and harder to control initialization semantics.

Optimal (interview-quality pattern)

Encapsulate instance control in class-level logic with optional thread safety.

Optimization Insight: Singleton optimizes resource coordination and initialization control, not algorithmic complexity.

Python Implementation Approaches

1) __new__-based Singleton

class Singleton:
    _instance = None

    def __new__(cls, *args, **kwargs):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance


class ConfigManager(Singleton):
    def __init__(self):
        # Guard to avoid reinitializing on every call
        if getattr(self, "_initialized", False):
            return
        self.settings = {}
        self._initialized = True

2) Metaclass-based Singleton (Reusable)

class SingletonMeta(type):
    _instances = {}

    def __call__(cls, *args, **kwargs):
        if cls not in cls._instances:
            cls._instances[cls] = super().__call__(*args, **kwargs)
        return cls._instances[cls]


class Logger(metaclass=SingletonMeta):
    def __init__(self):
        self.logs = []

    def log(self, message: str) -> None:
        self.logs.append(message)

3) Thread-safe Metaclass Variant

from threading import Lock


class ThreadSafeSingletonMeta(type):
    _instances = {}
    _lock = Lock()

    def __call__(cls, *args, **kwargs):
        with cls._lock:
            if cls not in cls._instances:
                cls._instances[cls] = super().__call__(*args, **kwargs)
        return cls._instances[cls]

Line-by-Line Explanation

_instance (or _instances map) stores already created object(s).
Creation path checks cache first; creates object only once.
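In the __new__-based approach, __init__ still runs on every call to the class unless it is guarded. The metaclass approach avoids this because __call__ returns the cached instance without invoking __init__ again.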
Thread-safe version uses lock to prevent race conditions during first creation.
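
The __init__ pitfall is easiest to understand by watching it happen. The sketch below uses a hypothetical UnguardedConfig class that omits the _initialized guard on purpose.

```python
class Singleton:
    _instance = None

    def __new__(cls, *args, **kwargs):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance


class UnguardedConfig(Singleton):
    """Hypothetical class without the _initialized guard."""

    def __init__(self):
        self.settings = {}  # runs on every UnguardedConfig() call


first = UnguardedConfig()
first.settings["debug"] = True

second = UnguardedConfig()  # same object, but __init__ ran again

print(first is second)  # True
print(first.settings)   # {} (the earlier setting was wiped)
```

The identity check passes, yet state is silently lost. This is why the ConfigManager version above guards its initializer.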

Step-by-Step Usage Example

Example:

a = Logger()
b = Logger()
a is b evaluates to True (both names refer to the same object).
After a.log("x"), the message is also visible through b.logs.
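
The usage steps above can be run directly, and the same check can be repeated across threads. This sketch attaches Logger to the thread-safe metaclass instead of the plain SingletonMeta, which is a substitution made here for demonstration. The thread count and number of requests are arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor
from threading import Lock


class ThreadSafeSingletonMeta(type):
    _instances = {}
    _lock = Lock()

    def __call__(cls, *args, **kwargs):
        with cls._lock:
            if cls not in cls._instances:
                cls._instances[cls] = super().__call__(*args, **kwargs)
        return cls._instances[cls]


class Logger(metaclass=ThreadSafeSingletonMeta):
    def __init__(self):
        self.logs = []

    def log(self, message: str) -> None:
        self.logs.append(message)


a = Logger()
b = Logger()
a.log("x")
print(a is b, b.logs)  # True ['x']

# Request the instance from several threads; all should get the same object.
with ThreadPoolExecutor(max_workers=8) as pool:
    ids = set(pool.map(lambda _: id(Logger()), range(100)))
print(len(ids))  # 1
```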

ASCII Diagram

Request #1 -> create instance -> store in _instances
Request #2 -> return stored instance
Request #3 -> return stored instance

Time Complexity Perspective

Instance retrieval: O(1) average dictionary lookup.
Thread-safe locking adds small synchronization overhead.

Space Complexity Perspective

O(1) per singleton class instance (or O(k) for k singleton classes in metaclass registry).

Edge Cases

Multithreading: race conditions can create multiple instances without lock.
Serialization/deserialization: may accidentally create extra objects if not handled carefully.
Testing: global shared state can leak between tests unless reset hooks exist.
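
For the testing edge case, one option is a reset hook on the metaclass. The method name reset_instance is an assumption made for this sketch and not a standard API. It is meant to be called only from test setup or teardown.

```python
class SingletonMeta(type):
    _instances = {}

    def __call__(cls, *args, **kwargs):
        if cls not in cls._instances:
            cls._instances[cls] = super().__call__(*args, **kwargs)
        return cls._instances[cls]

    def reset_instance(cls) -> None:
        """Hypothetical test-only hook: forget the cached instance."""
        cls._instances.pop(cls, None)


class Cache(metaclass=SingletonMeta):
    def __init__(self):
        self.data = {}


first = Cache()
first.data["k"] = 1

Cache.reset_instance()  # for example, called from a test teardown
second = Cache()

print(first is second)  # False
print(second.data)      # {}
```

Methods defined on a metaclass are available on the class itself, which is why Cache.reset_instance() works without adding anything to Cache.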

Common Mistakes

Common Mistake: Forgetting that __init__ may run repeatedly in some singleton implementations.

Common Mistake: Using singleton for unrelated objects just to avoid dependency injection.

Common Mistake: Ignoring thread safety in concurrent applications.

When to Use vs Avoid

Use Singleton When

You truly need one shared coordinator/resource.
Centralized lifecycle management is required.
Configuration consistency must be guaranteed.

Avoid Singleton When

It introduces hidden global mutable state.
Dependency injection would provide clearer design.
Test isolation becomes difficult.

Pattern Recognition

Singleton is suitable if requirements include:

“Exactly one shared instance”
“Global access to central manager”
“Avoid duplicate expensive initialization”

Interview Insight

Interview Insight: In interviews, mention both sides: implementation and trade-offs. Saying “I’ll use singleton, but I’ll watch for testability and hidden global state” shows mature design judgment.

Practice Problems

Implement singleton logger with thread safety.
Build singleton config manager with lazy initialization.
Refactor singleton usage into dependency injection and compare testability.

Expert Tip: Prefer explicit dependency passing where possible. Use singleton only when single-instance semantics are truly part of the domain, not just convenience.
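
As a rough illustration of explicit dependency passing, the sketch below creates one shared Logger at the application entry point and hands it to the code that needs it. OrderService and its place method are hypothetical names used only for this example.

```python
class Logger:
    """Plain class: nothing prevents multiple instances."""

    def __init__(self):
        self.logs = []

    def log(self, message: str) -> None:
        self.logs.append(message)


class OrderService:
    """Hypothetical consumer that receives its logger instead of fetching a global."""

    def __init__(self, logger: Logger):
        self.logger = logger

    def place(self, order_id: str) -> None:
        self.logger.log(f"order placed: {order_id}")


# Application wiring: create one instance and share it explicitly.
shared_logger = Logger()
orders = OrderService(shared_logger)
orders.place("A-1")
print(shared_logger.logs)  # ['order placed: A-1']

# In a test: pass a fresh logger, so no state leaks between tests.
test_logger = Logger()
OrderService(test_logger).place("T-1")
print(test_logger.logs)    # ['order placed: T-1']
```

The application still uses a single logger, but that is a wiring decision rather than a constraint baked into the class.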

Summary

Singleton ensures only one instance with global access.
Useful for shared managers and expensive one-time resources.
Implement carefully with initialization guards and thread safety when needed.
Apply sparingly to avoid global-state design debt.

23.3 Factory Pattern

Introduction

The Factory Pattern is a creational design pattern that centralizes object creation logic. Instead of creating objects directly throughout your codebase, you delegate creation to a factory component that decides which concrete class to instantiate.

This pattern improves flexibility, readability, and maintainability when object creation rules are conditional or likely to evolve.

Real-World Analogy

Imagine ordering a drink at a cafe. You ask for “coffee,” but you do not manually prepare espresso, add milk, and assemble the final drink. The cafe system decides the exact preparation process and returns the right product. Factory pattern does the same for objects.

Formal Definition

Factory Pattern provides an interface for creating objects while allowing subclasses or factory methods to determine the concrete type returned.

Concept Note: Main goal is to separate “what to create” from “how to create,” reducing direct coupling between client code and concrete classes.

Why This Topic Matters

Reduces scattered conditional object creation code.
Makes extension easier when adding new implementations.
Frequently appears in LLD interviews and production systems.

Mental Model

Client -> Factory -> Concrete Product

Client knows abstraction,
Factory knows concrete class selection

Brute Force → Better → Optimal

Brute Force

Client code directly creates concrete classes with many if/elif branches across files.

Better

Move object creation into one helper function. Creation is now centralized, but the helper is still tightly coupled to every concrete class and is hard to extend cleanly.

Optimal

Use a factory abstraction (or factory method) so new product types can be added with minimal client changes.

Optimization Insight: Factory pattern optimizes changeability and architecture clarity, not runtime complexity.

Step-by-Step: Refactor to Factory

Identify repeated object creation condition blocks.
Extract common product interface (abstract base class/protocol).
Create concrete product classes implementing that interface.
Add factory method/class that maps input type to concrete class.
Replace direct constructor calls in client with factory call.

Python Implementation (Simple Factory)

from abc import ABC, abstractmethod


class Notification(ABC):
    @abstractmethod
    def send(self, message: str) -> None:
        pass


class EmailNotification(Notification):
    def send(self, message: str) -> None:
        print(f"[EMAIL] {message}")


class SMSNotification(Notification):
    def send(self, message: str) -> None:
        print(f"[SMS] {message}")


class PushNotification(Notification):
    def send(self, message: str) -> None:
        print(f"[PUSH] {message}")


class NotificationFactory:
    @staticmethod
    def create(channel: str) -> Notification:
        channel = channel.lower()
        if channel == "email":
            return EmailNotification()
        if channel == "sms":
            return SMSNotification()
        if channel == "push":
            return PushNotification()
        raise ValueError(f"Unsupported channel: {channel}")


# Client
notifier = NotificationFactory.create("email")
notifier.send("Welcome!")

Factory Method Style (Extensible)

from abc import ABC, abstractmethod


class Transport(ABC):
    @abstractmethod
    def deliver(self) -> str:
        pass


class Truck(Transport):
    def deliver(self) -> str:
        return "Deliver by road"


class Ship(Transport):
    def deliver(self) -> str:
        return "Deliver by sea"


class Logistics(ABC):
    @abstractmethod
    def create_transport(self) -> Transport:
        pass

    def plan_delivery(self) -> str:
        transport = self.create_transport()
        return transport.deliver()


class RoadLogistics(Logistics):
    def create_transport(self) -> Transport:
        return Truck()


class SeaLogistics(Logistics):
    def create_transport(self) -> Transport:
        return Ship()

Line-by-Line Explanation

Product interface (Notification/Transport) defines behavior contract.
Concrete classes implement behavior variants.
Factory encapsulates type-selection logic.
Client depends on abstraction, not concrete constructor details.
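
To show the last point for the factory method style, here is a small usage sketch. It repeats trimmed versions of the classes so it runs on its own; Plane, AirLogistics, and the dispatch function are hypothetical additions that illustrate extending the hierarchy without touching client code.

```python
from abc import ABC, abstractmethod


class Transport(ABC):
    @abstractmethod
    def deliver(self) -> str:
        pass


class Truck(Transport):
    def deliver(self) -> str:
        return "Deliver by road"


class Plane(Transport):  # hypothetical new product
    def deliver(self) -> str:
        return "Deliver by air"


class Logistics(ABC):
    @abstractmethod
    def create_transport(self) -> Transport:
        pass

    def plan_delivery(self) -> str:
        # The base class uses the product without knowing its concrete type.
        return self.create_transport().deliver()


class RoadLogistics(Logistics):
    def create_transport(self) -> Transport:
        return Truck()


class AirLogistics(Logistics):  # hypothetical new creator
    def create_transport(self) -> Transport:
        return Plane()


def dispatch(logistics: Logistics) -> str:
    """Client code: depends only on the Logistics abstraction."""
    return logistics.plan_delivery()


road_plan = dispatch(RoadLogistics())
air_plan = dispatch(AirLogistics())
print(road_plan)  # Deliver by road
print(air_plan)   # Deliver by air
```

Adding air delivery required two new classes and no change to dispatch or to Logistics.plan_delivery.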

ASCII Diagram

Client
  |
  v
Factory.create(type)
  |
  +--> ConcreteA()
  +--> ConcreteB()
  +--> ConcreteC()

Time Complexity Perspective

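Object creation selection is O(1) on average with a dictionary mapping. An if/elif chain is linear in the number of branches, which is negligible for a handful of product types.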
Main benefits are architectural, not asymptotic.

Space Complexity Perspective

Negligible additional overhead for factory layer.
May store registry maps if using dynamic registration approach.

Edge Cases

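Invalid type key: raise a clear error (as the ValueError in the simple factory does) or apply an explicit, documented fallback.
Growing product variants: avoid a giant if-else by using a registry/dictionary mapping.
Shared configuration: let the factory accept dependencies (clients, credentials, settings) and pass them to the products it builds, rather than having products reach for globals.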

Common Mistakes

Common Mistake: Introducing factory for trivial one-class creation where direct construction is clearer.

Common Mistake: Keeping large conditional logic in many places instead of centralizing in one factory.

Common Mistake: Having clients call methods that exist only on one concrete product, which ties them to that class and defeats the abstraction the factory returns.

Pattern Recognition

Factory pattern is a strong fit when:

Object type depends on runtime input/configuration.
Creation process is non-trivial and repeated in many places.
You expect new variants to be added over time.

Interview Insight

Interview Insight: In LLD interviews, explain why factory reduces coupling and mention extension path (adding new class without editing client logic). That “why” matters more than pattern name-dropping.

Practice Problems

Refactor payment gateway selection logic into a factory.
Create parser factory for JSON/XML/CSV handlers.
Build plugin-style factory with class registry map.

Expert Tip: For Python, a dictionary-based registry factory often stays cleaner than long if-elif chains as product count grows.
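
One way to sketch such a registry is shown below. NotificationRegistry and its register decorator are hypothetical names chosen for this illustration; the product classes mirror the earlier simple factory example.

```python
from abc import ABC, abstractmethod
from typing import Dict, Type


class Notification(ABC):
    @abstractmethod
    def send(self, message: str) -> None:
        pass


class NotificationRegistry:
    """Hypothetical registry-based factory: maps channel name -> product class."""

    _creators: Dict[str, Type[Notification]] = {}

    @classmethod
    def register(cls, channel: str):
        def decorator(product_cls: Type[Notification]) -> Type[Notification]:
            cls._creators[channel.lower()] = product_cls
            return product_cls
        return decorator

    @classmethod
    def create(cls, channel: str) -> Notification:
        try:
            product_cls = cls._creators[channel.lower()]
        except KeyError:
            raise ValueError(f"Unsupported channel: {channel}") from None
        return product_cls()


@NotificationRegistry.register("email")
class EmailNotification(Notification):
    def send(self, message: str) -> None:
        print(f"[EMAIL] {message}")


@NotificationRegistry.register("sms")
class SMSNotification(Notification):
    def send(self, message: str) -> None:
        print(f"[SMS] {message}")


notifier = NotificationRegistry.create("EMAIL")  # lookup is case-insensitive
notifier.send("Welcome!")                        # [EMAIL] Welcome!
```

Adding a new channel now means defining one class with a register line; the create method itself never changes.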

Summary

Factory pattern centralizes object creation and hides concrete selection details.
It improves extensibility and keeps client code focused on abstractions.
Use it when creation logic is conditional, repeated, or likely to evolve.
Apply pragmatically to avoid unnecessary abstraction overhead.

23.4 Adapter Pattern

Introduction

The Adapter Pattern allows two incompatible interfaces to work together without modifying their original code. It acts as a translator between a client’s expected interface and an existing class with a different interface.

In Python systems, adapter usage is common when integrating third-party SDKs, legacy modules, and external services.

Real-World Analogy

A laptop charger plug from one country may not fit a wall socket in another country. A travel adapter bridges this mismatch without changing either the laptop or the building wiring. Software adapters solve the same compatibility problem.

Formal Definition

Adapter Pattern is a structural design pattern that converts the interface of a class into another interface clients expect.

Concept Note: Adapter is about interface compatibility, not adding new business features.

Why This Topic Matters

Lets you integrate legacy or third-party code safely.
Reduces ripple effects by isolating integration differences.
Frequently used in production microservices and SDK wrappers.

Mental Model

Client expects: target_interface()
Adapter exposes: target_interface()
Adapter internally calls: adaptee.different_interface()

Brute Force → Better → Optimal

Brute Force

Modify client everywhere to support multiple incompatible APIs. Leads to scattered conditionals and brittle code.

Better

Add conversion logic near call sites. Still duplicated and hard to maintain.

Optimal

Introduce adapter layer so client stays stable and only adapter knows external API differences.

Optimization Insight: Adapters optimize change isolation: when external API changes, you usually update one adapter instead of many client call sites.

Core Participants

Target: interface expected by client.
Adaptee: existing class with incompatible interface.
Adapter: bridge converting target calls into adaptee calls.
Client: code that depends only on target interface.

Step-by-Step Refactor to Adapter

Define target interface used by your application.
Identify external/legacy method mismatch.
Create adapter implementing target interface.
Map and transform request/response fields inside adapter.
Inject adapter into client; remove external API calls from business logic.

Python Implementation

Scenario

Client expects send(message), but third-party notifier provides push_text(payload).

from abc import ABC, abstractmethod


class Notifier(ABC):
    @abstractmethod
    def send(self, message: str) -> None:
        pass


# Third-party / legacy class (Adaptee)
class LegacyPushService:
    def push_text(self, payload: dict) -> None:
        print(f"LEGACY_PUSH: {payload['body']}")


# Adapter
class LegacyPushAdapter(Notifier):
    def __init__(self, legacy_service: LegacyPushService):
        self.legacy_service = legacy_service

    def send(self, message: str) -> None:
        payload = {"body": message}
        self.legacy_service.push_text(payload)


# Client depends only on target interface
class AlertService:
    def __init__(self, notifier: Notifier):
        self.notifier = notifier

    def alert(self, text: str) -> None:
        self.notifier.send(text)


legacy = LegacyPushService()
adapter = LegacyPushAdapter(legacy)
service = AlertService(adapter)
service.alert("High CPU usage")

Line-by-Line Explanation

Notifier is stable interface for business layer.
LegacyPushService has incompatible method and payload shape.
LegacyPushAdapter translates send(text) into legacy API format.
AlertService remains clean and unaffected by legacy details.

ASCII Diagram

Client (AlertService)
      |
      v
Target: Notifier.send(msg)
      |
      v
Adapter (LegacyPushAdapter)
      |
      v
Adaptee (LegacyPushService.push_text(payload))

Class Adapter vs Object Adapter

Object Adapter (Preferred in Python)

Adapter holds adaptee instance (composition). Flexible and most common in Python.

Class Adapter

Adapter inherits from both the adaptee and the target interface. This is less common and less flexible in Python: it binds the adapter to one concrete adaptee class, exposes every adaptee method to clients, and relies on multiple inheritance.
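
For comparison with the object adapter shown earlier, a class adapter could look roughly like this. The delivered list on LegacyPushService is an assumption added so the effect can be inspected; LegacyPushClassAdapter is a hypothetical name.

```python
from abc import ABC, abstractmethod


class Notifier(ABC):
    @abstractmethod
    def send(self, message: str) -> None:
        pass


class LegacyPushService:
    """Stand-in adaptee; records payloads instead of printing (assumption)."""

    def __init__(self):
        self.delivered = []

    def push_text(self, payload: dict) -> None:
        self.delivered.append(payload)


class LegacyPushClassAdapter(Notifier, LegacyPushService):
    """Class adapter: IS-A Notifier and IS-A LegacyPushService."""

    def send(self, message: str) -> None:
        # Calls the inherited adaptee method directly; no wrapped instance.
        self.push_text({"body": message})


adapter = LegacyPushClassAdapter()
adapter.send("High CPU usage")
print(adapter.delivered)  # [{'body': 'High CPU usage'}]
```

Note that push_text is still callable on the adapter, so clients can bypass the target interface. The object adapter avoids this by keeping the adaptee as a private attribute.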

Time Complexity Perspective

Usually O(1) forwarding + transformation overhead.
Main benefits are architectural, not asymptotic.

Space Complexity Perspective

O(1) additional adapter object overhead per integration instance.

Edge Cases

Data format mismatch: adapter must validate and map fields robustly.
Error model mismatch: convert external exceptions to domain-specific exceptions.
Async vs sync mismatch: adapter may need async wrappers or background execution handling.
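
The first two edge cases can be handled inside the adapter, as in the sketch below. Everything vendor-side here is hypothetical: LegacyPushError, the 20-character limit, and StrictLegacyPushService stand in for whatever a real third-party SDK raises and enforces. NotificationError is an assumed domain exception.

```python
class NotificationError(Exception):
    """Domain-level error the business layer knows how to handle (assumed)."""


class LegacyPushError(Exception):
    """Hypothetical vendor exception."""


class StrictLegacyPushService:
    """Hypothetical adaptee that rejects long bodies with its own error type."""

    def __init__(self):
        self.delivered = []

    def push_text(self, payload: dict) -> None:
        if len(payload["body"]) > 20:
            raise LegacyPushError("body too long")
        self.delivered.append(payload)


class ValidatingPushAdapter:
    def __init__(self, legacy_service: StrictLegacyPushService):
        self._legacy = legacy_service

    def send(self, message: str) -> None:
        # Validate input before it reaches the external API.
        if not isinstance(message, str) or not message.strip():
            raise NotificationError("message must be a non-empty string")
        try:
            self._legacy.push_text({"body": message})
        except LegacyPushError as exc:
            # Translate the vendor error; keep the original as __cause__.
            raise NotificationError(f"push delivery failed: {exc}") from exc


legacy = StrictLegacyPushService()
adapter = ValidatingPushAdapter(legacy)
adapter.send("Disk almost full")

try:
    adapter.send("x" * 21)
except NotificationError as err:
    translated = err
print(type(translated.__cause__).__name__)  # LegacyPushError
```

Client code only ever catches NotificationError, so swapping the vendor later does not change its error handling.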

Common Mistakes

Common Mistake: Putting business rules inside adapter; adapter should focus on translation/integration.

Common Mistake: Letting client depend on adaptee-specific types, which defeats abstraction.

Common Mistake: Creating one giant adapter for many unrelated services instead of focused adapters.

Pattern Recognition

Use Adapter when:

You must reuse an existing class with incompatible interface.
You cannot or should not modify third-party/legacy code.
You want business layer to depend on stable internal contracts.

Interview Insight

Interview Insight: Clearly say: “I will isolate external API incompatibility behind an adapter so core logic remains unchanged if provider changes.” This demonstrates strong maintainability thinking.

Practice Problems

Wrap two payment gateway SDKs under one internal payment interface.
Adapt legacy XML parser output into modern JSON-domain objects.
Build adapter for async third-party API into sync internal contract (or vice versa).

Expert Tip: Keep adapter thin and deterministic. If translation logic grows large, split mapping/validation helpers to maintain clarity.

Summary

Adapter pattern bridges incompatible interfaces safely.
It decouples business logic from external API quirks.
Most effective when integrating legacy or third-party components.
Use composition-focused object adapters for flexibility in Python.

23.5 Decorator Pattern

Introduction

The Decorator Pattern lets you add behavior to objects dynamically without modifying their original class. Instead of creating many subclasses for every feature combination, decorators wrap objects and extend behavior layer by layer.

In Python, this pattern is especially powerful because both object-oriented wrappers and function decorators are common in production code.

Real-World Analogy

Think of ordering coffee: you start with a basic coffee, then add milk, then sugar, then whipped cream. Each add-on changes behavior (price/description) by wrapping the previous object, not by rewriting the coffee itself.

Formal Definition

Decorator Pattern is a structural pattern that attaches additional responsibilities to an object dynamically by placing it inside wrapper objects that implement the same interface.

Concept Note: Decorator is an alternative to subclassing for extending behavior.

Why This Topic Matters

Prevents subclass explosion for feature combinations.
Supports flexible runtime composition of behaviors.
Common in logging, caching, validation, retry, metrics, and middleware pipelines.

Mental Model

Client -> DecoratorN -> DecoratorN-1 -> ... -> BaseComponent

Each layer adds behavior before/after delegating

Brute Force → Better → Optimal

Brute Force

Create subclasses for every combination (e.g., CoffeeWithMilkAndSugarAndWhip). This quickly becomes unmanageable.

Better

Use flags/conditionals inside one class. Simpler initially, but violates SRP and becomes messy.

Optimal

Use decorators that wrap a common component interface and compose features dynamically.

Optimization Insight: Decorators optimize extensibility by letting you compose behavior at runtime instead of fixing every combination in an inheritance tree ahead of time.

Core Participants

Component: common interface used by client.
Concrete Component: base object with default behavior.
Decorator Base: holds wrapped component and forwards calls.
Concrete Decorators: add specific behavior before/after delegation.

Step-by-Step Design

Define base interface for operation(s).
Implement core concrete component.
Create decorator base class implementing same interface.
Create concrete decorators for each extra feature.
Wrap components in required order at runtime.

Python OOP Decorator Pattern Example

from abc import ABC, abstractmethod


class Coffee(ABC):
    @abstractmethod
    def cost(self) -> float:
        pass

    @abstractmethod
    def description(self) -> str:
        pass


class BasicCoffee(Coffee):
    def cost(self) -> float:
        return 50.0

    def description(self) -> str:
        return "Basic Coffee"


class CoffeeDecorator(Coffee):
    def __init__(self, coffee: Coffee):
        self._coffee = coffee

    def cost(self) -> float:
        return self._coffee.cost()

    def description(self) -> str:
        return self._coffee.description()


class MilkDecorator(CoffeeDecorator):
    def cost(self) -> float:
        return super().cost() + 15.0

    def description(self) -> str:
        return super().description() + ", Milk"


class SugarDecorator(CoffeeDecorator):
    def cost(self) -> float:
        return super().cost() + 5.0

    def description(self) -> str:
        return super().description() + ", Sugar"


class WhipDecorator(CoffeeDecorator):
    def cost(self) -> float:
        return super().cost() + 20.0

    def description(self) -> str:
        return super().description() + ", Whip"


coffee = BasicCoffee()
coffee = MilkDecorator(coffee)
coffee = SugarDecorator(coffee)
coffee = WhipDecorator(coffee)

print(coffee.description())  # Basic Coffee, Milk, Sugar, Whip
print(coffee.cost())         # 90.0

Python Function Decorator Perspective

Python’s @decorator syntax is a language-level implementation of the same idea: wrapping behavior around a callable.

import time
from functools import wraps


def timing_decorator(fn):
    @wraps(fn)
    def wrapper(*args, **kwargs):
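        # perf_counter() is monotonic and high-resolution, so it suits interval
        # timing better than time.time(), which can jump if the clock is adjusted.
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        elapsed = time.perf_counter() - start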
        print(f"{fn.__name__} took {elapsed:.6f}s")
        return result
    return wrapper


@timing_decorator
def compute(n):
    return sum(i * i for i in range(n))

Line-by-Line Explanation

Decorator object keeps reference to wrapped component.
Each concrete decorator augments behavior and delegates to wrapped object.
Order of wrapping affects final output/behavior.
Function decorators use wrapper closures to add behavior around function execution.

ASCII Diagram

Client
  |
  v
WhipDecorator
  |
  v
SugarDecorator
  |
  v
MilkDecorator
  |
  v
BasicCoffee

Time Complexity Perspective

Each decorator layer adds O(1) extra work per call.
Total overhead is proportional to number of layers wrapped.

Space Complexity Perspective

O(k) wrapper objects for k decorators.

Edge Cases

Decorator order: can change outputs and side effects.
Too many layers: may hurt readability/debugging.
Mutable shared state: ensure wrappers do not produce unintended interactions.
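
Decorator order matters most when one layer is not simply additive. The sketch below uses a hypothetical HalfPriceDecorator (not part of the earlier example) alongside a trimmed version of the coffee classes to show two orderings producing different totals.

```python
from abc import ABC, abstractmethod


class Coffee(ABC):
    @abstractmethod
    def cost(self) -> float:
        pass


class BasicCoffee(Coffee):
    def cost(self) -> float:
        return 50.0


class CoffeeDecorator(Coffee):
    def __init__(self, coffee: Coffee):
        self._coffee = coffee

    def cost(self) -> float:
        return self._coffee.cost()


class MilkDecorator(CoffeeDecorator):
    def cost(self) -> float:
        return super().cost() + 15.0


class HalfPriceDecorator(CoffeeDecorator):  # hypothetical promotion layer
    def cost(self) -> float:
        return super().cost() * 0.5


# Discount applied last: milk is discounted too.
discount_outside = HalfPriceDecorator(MilkDecorator(BasicCoffee()))
# Discount applied first: milk is charged at full price.
discount_inside = MilkDecorator(HalfPriceDecorator(BasicCoffee()))

print(discount_outside.cost())  # 32.5  -> (50 + 15) * 0.5
print(discount_inside.cost())   # 40.0  -> 50 * 0.5 + 15
```

When layers are purely additive (as with Milk, Sugar, and Whip) the total is order-independent, but descriptions and side effects such as logging still follow wrapping order.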

Common Mistakes

Common Mistake: Confusing decorator pattern with inheritance-based extension.

Common Mistake: Implementing decorators that do not preserve interface contract.

Common Mistake: Adding unrelated business logic in wrappers, reducing cohesion.

Decorator vs Adapter (Quick Contrast)

Pattern	Primary Goal	Interface
Decorator	Add behavior dynamically	Same interface
Adapter	Make an incompatible interface usable	Different -> expected interface

Pattern Recognition

Use Decorator when:

You need combinable optional features.
Inheritance tree is growing too large.
Behavior should be attachable/removable at runtime.

Interview Insight

Interview Insight: Mention that decorators keep base classes closed for modification but open for extension (aligns with OCP). This connection shows mature design thinking.

Practice Problems

Build API client decorators for retry, logging, and caching.
Implement text processing pipeline with layered decorators (trim, sanitize, encrypt).
Refactor subclass-heavy feature combinations into decorators.

Expert Tip: Keep each decorator focused on one concern. Stacking small focused decorators is easier to test and reason about than one giant wrapper.
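
The same advice applies to function decorators. Below is a minimal sketch of two single-purpose decorators stacked on one function. The names retry, count_calls, and flaky_fetch are hypothetical, and the retry logic is deliberately simplified (no backoff or exception filtering).

```python
from functools import wraps


def retry(times: int):
    """Retry the wrapped call up to `times` attempts (simplified sketch)."""
    if times < 1:
        raise ValueError("times must be >= 1")

    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            last_exc = None
            for _ in range(times):
                try:
                    return fn(*args, **kwargs)
                except Exception as exc:
                    last_exc = exc
            raise last_exc
        return wrapper
    return decorator


def count_calls(fn):
    """Count how many times the decorated callable is invoked."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return fn(*args, **kwargs)
    wrapper.calls = 0
    return wrapper


attempts = {"n": 0}


@count_calls          # outermost: sees one call from the client
@retry(times=3)       # innermost: may invoke the function several times
def flaky_fetch() -> str:
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("temporary failure")
    return "payload"


result = flaky_fetch()
print(result)             # payload
print(flaky_fetch.calls)  # 1
print(attempts["n"])      # 3
```

Swapping the two decorators would make count_calls report 3 instead of 1, which is the function-decorator version of the ordering effect noted under Edge Cases.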

Summary

Decorator adds behavior dynamically by wrapping components.
It avoids subclass explosion and supports flexible composition.
Widely used in Python through both OOP wrappers and function decorators.
Best applied when optional behaviors are combinable and evolving.

23.6 Facade Pattern

Introduction

The Facade Pattern provides a simple, unified interface over a complex subsystem. Instead of forcing clients to understand and orchestrate many low-level components, a facade offers one clean entry point for common workflows.

In Python projects, facade is useful when service orchestration grows messy and call-site complexity starts leaking everywhere.

Real-World Analogy

When planning a trip, you can book flights, hotels, insurance, and transport separately through different systems — or use a travel desk that handles all steps through one request. Facade is that travel desk for software subsystems.

Formal Definition

Facade Pattern is a structural pattern that defines a higher-level interface making a subsystem easier to use.

Concept Note: Facade simplifies usage; it does not hide the subsystem completely. Advanced clients may still access subsystem classes directly when needed.

Why This Topic Matters

Reduces client-side orchestration complexity.
Improves readability and discoverability of common workflows.
Helps enforce cleaner architectural boundaries in large systems.

Mental Model

Client -> Facade -> Subsystem A
                 -> Subsystem B
                 -> Subsystem C

Client calls one method; facade coordinates the rest.

Brute Force → Better → Optimal

Brute Force

Each client calls many subsystem classes directly and repeats orchestration logic.

Better

Shared helper utilities reduce some duplication but still expose too many subsystem details.

Optimal

Introduce a facade with clear business-level operations that internally coordinate subsystem calls.

Optimization Insight: Facade optimizes cognitive load and change impact by localizing orchestration logic in one layer.

Core Participants

Subsystems: existing components with detailed APIs.
Facade: simple interface that composes subsystem operations.
Client: depends on facade for common use cases.

Step-by-Step Refactor to Facade

Identify repeated multi-step workflows across clients.
Define high-level operations clients actually need.
Create facade class exposing those operations.
Move orchestration/ordering logic into facade.
Keep advanced subsystem access optional for special cases.

Python Implementation Example

Scenario: Order Processing Pipeline

class InventoryService:
    def reserve(self, item_id: str, qty: int) -> bool:
        print(f"Inventory reserved: {item_id} x{qty}")
        return True


class PaymentService:
    def charge(self, user_id: str, amount: float) -> bool:
        print(f"Charged {user_id}: {amount}")
        return True


class ShippingService:
    def create_shipment(self, user_id: str, item_id: str, qty: int) -> str:
        tracking_id = "TRK123"
        print(f"Shipment created: {tracking_id}")
        return tracking_id


class NotificationService:
    def send(self, user_id: str, message: str) -> None:
        print(f"Notify {user_id}: {message}")


class OrderFacade:
    def __init__(self):
        self.inventory = InventoryService()
        self.payment = PaymentService()
        self.shipping = ShippingService()
        self.notification = NotificationService()

    def place_order(self, user_id: str, item_id: str, qty: int, amount: float) -> dict:
        if not self.inventory.reserve(item_id, qty):
            return {"ok": False, "error": "Inventory unavailable"}

        if not self.payment.charge(user_id, amount):
            return {"ok": False, "error": "Payment failed"}

        tracking = self.shipping.create_shipment(user_id, item_id, qty)
        self.notification.send(user_id, f"Order placed. Tracking: {tracking}")
        return {"ok": True, "tracking_id": tracking}

Line-by-Line Explanation

Subsystems stay focused on their own responsibilities.
OrderFacade.place_order defines one business-level method for clients.
Facade controls operation order and failure handling.
Client now calls one method instead of coordinating four services manually.

ASCII Diagram

Client
  |
  v
OrderFacade.place_order()
  |--> Inventory.reserve()
  |--> Payment.charge()
  |--> Shipping.create_shipment()
  \--> Notification.send()

Time Complexity Perspective

Facade usually adds O(1) orchestration overhead around subsystem calls.
Overall complexity depends on subsystem operations, not facade itself.

Space Complexity Perspective

Minimal extra memory for facade object and references to subsystem services.

Edge Cases

Partial failure: facade may need compensating actions (rollback/cancel).
Long workflows: ensure error propagation remains clear and traceable.
Overgrown facade: split into domain-specific facades if it becomes too large.
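
The partial-failure case can be sketched as follows. This is a simplified variant of the order example, not a replacement for it: shipping and notification are omitted, the release method on the inventory service is a hypothetical compensating action, and services are passed into the facade so a failing payment can be simulated.

```python
class InventoryService:
    def __init__(self):
        self.reserved = []

    def reserve(self, item_id: str, qty: int) -> bool:
        self.reserved.append((item_id, qty))
        return True

    def release(self, item_id: str, qty: int) -> None:
        """Hypothetical compensating action for a prior reserve()."""
        self.reserved.remove((item_id, qty))


class DecliningPaymentService:
    """Test double that always declines the charge."""

    def charge(self, user_id: str, amount: float) -> bool:
        return False


class OrderFacade:
    def __init__(self, inventory, payment):
        # Subsystems are injected, so tests can substitute doubles.
        self.inventory = inventory
        self.payment = payment

    def place_order(self, user_id: str, item_id: str, qty: int, amount: float) -> dict:
        if not self.inventory.reserve(item_id, qty):
            return {"ok": False, "error": "Inventory unavailable"}

        if not self.payment.charge(user_id, amount):
            # Undo the earlier step so stock is not held by a failed order.
            self.inventory.release(item_id, qty)
            return {"ok": False, "error": "Payment failed"}

        return {"ok": True}


inventory = InventoryService()
facade = OrderFacade(inventory, DecliningPaymentService())
outcome = facade.place_order("u1", "sku-42", 2, 199.0)
print(outcome)             # {'ok': False, 'error': 'Payment failed'}
print(inventory.reserved)  # []
```

Keeping the rollback inside the facade means clients do not need to know which earlier steps must be undone when a later one fails.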

Common Mistakes

Common Mistake: Turning facade into a “god class” containing all business rules and unrelated workflows.

Common Mistake: Hiding too much and blocking legitimate advanced subsystem usage.

Common Mistake: Treating facade as replacement for good subsystem design — facade should simplify, not compensate for broken boundaries.

Facade vs Adapter vs Decorator

Pattern	Primary Goal
Facade	Simplify usage of subsystem
Adapter	Make an incompatible interface usable
Decorator	Add behavior dynamically

Pattern Recognition

Use Facade when:

Clients repeatedly execute the same multi-step subsystem sequence.
Subsystem API is too detailed for common use cases.
You want a clean entry point for business-level operations.

Interview Insight

Interview Insight: In LLD interviews, explain facade in terms of “workflow simplification and coupling reduction at call sites.” This framing shows architecture-level understanding.

Practice Problems

Build HomeTheaterFacade over audio, projector, and streaming subsystems.
Create DeploymentFacade for build-test-deploy-notify pipeline.
Refactor scattered payment + inventory + notification orchestration into facade.

Expert Tip: Keep facade methods business-oriented (e.g., place_order()) instead of exposing low-level subsystem verbs. This preserves abstraction value.

Summary

Facade pattern provides a simple interface to complex subsystems.
It centralizes orchestration and reduces client-side complexity.
Best used for common workflows where many subsystem calls are repeatedly combined.
Use carefully to avoid oversized facade classes.

23.7 Observer Pattern

Introduction

The Observer Pattern defines a one-to-many dependency between objects so that when one object (the subject) changes state, all dependent objects (observers) are notified automatically.

This pattern is fundamental in event-driven systems, UI frameworks, pub-sub architectures, and real-time notification pipelines.

Real-World Analogy

Think of a YouTube channel subscription model. The channel uploads a new video (state change), and all subscribers (observers) receive notifications automatically. Subscribers can also unsubscribe whenever they want.

Formal Definition

Observer Pattern is a behavioral design pattern where a subject maintains a list of observers and notifies them of state changes, usually by calling an update method.

Concept Note: Observer decouples sender from receivers — subject does not need to know concrete observer types.

Why This Topic Matters

Enables event-driven architecture and loose coupling.
Supports dynamic subscriber management at runtime.
Frequently appears in GUI systems, messaging systems, and monitoring tools.

Mental Model

Subject state changes
      |
      v
notify_all()
  |      |      |
Obs1   Obs2   Obs3   (each reacts independently)

Brute Force → Better → Optimal

Brute Force

Subject directly calls concrete services (email, SMS, logs) with hard-coded dependencies.

Better

Move handlers to helper functions, but subject still knows too much about receivers.

Optimal

Use observer abstraction. Subject only broadcasts events; observers decide their own reactions.

Optimization Insight: Observer optimizes extensibility — adding new listeners usually requires no subject code changes.

Core Participants

Subject: maintains observers and triggers notifications.
Observer Interface: defines update contract.
Concrete Observers: implement reaction logic.

Step-by-Step Design

Create observer interface with update(event).
Subject stores observer list and supports attach/detach.
On state change, subject calls notify().
Each observer handles event independently.

Python Implementation

from abc import ABC, abstractmethod
from typing import List


class Observer(ABC):
    @abstractmethod
    def update(self, event: str) -> None:
        pass


class Subject:
    def __init__(self):
        self._observers: List[Observer] = []

    def attach(self, observer: Observer) -> None:
        if observer not in self._observers:
            self._observers.append(observer)

    def detach(self, observer: Observer) -> None:
        if observer in self._observers:
            self._observers.remove(observer)

    def notify(self, event: str) -> None:
        # Iterate over a snapshot so observers can attach/detach during notification.
        for observer in list(self._observers):
            observer.update(event)


class EmailNotifier(Observer):
    def update(self, event: str) -> None:
        print(f"[EMAIL] {event}")


class SMSNotifier(Observer):
    def update(self, event: str) -> None:
        print(f"[SMS] {event}")


class AnalyticsTracker(Observer):
    def update(self, event: str) -> None:
        print(f"[ANALYTICS] tracked event: {event}")


class OrderService(Subject):
    def place_order(self, order_id: str) -> None:
        event = f"Order placed: {order_id}"
        self.notify(event)


service = OrderService()
service.attach(EmailNotifier())
service.attach(SMSNotifier())
service.attach(AnalyticsTracker())

service.place_order("ORD-101")

Line-by-Line Explanation

Subject manages observer lifecycle with attach/detach methods.
notify broadcasts event to all current observers.
Concrete observers implement independent side effects.
OrderService focuses on business action, not notification specifics.

ASCII Diagram

OrderService (Subject)
   | state change
   v
notify(event)
   |------> EmailNotifier
   |------> SMSNotifier
   \------> AnalyticsTracker

Push vs Pull Notification Styles

Push Model

Subject sends event data directly in update call (as in current example).

Pull Model

Subject only signals change; observers query needed state from subject afterward.
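
For contrast with the push-style example, here is a minimal sketch of the pull model. StockTicker, PullObserver, and PriceDisplay are hypothetical names chosen for illustration; the key difference is that update receives the subject rather than event data.

```python
from abc import ABC, abstractmethod
from typing import List, Optional


class PullObserver(ABC):
    @abstractmethod
    def update(self, subject: "StockTicker") -> None:
        """Called with the subject itself, not with event data."""


class StockTicker:
    """Hypothetical subject that exposes its state through a read-only property."""

    def __init__(self) -> None:
        self._observers: List[PullObserver] = []
        self._price = 0.0

    @property
    def price(self) -> float:
        return self._price

    def attach(self, observer: PullObserver) -> None:
        if observer not in self._observers:
            self._observers.append(observer)

    def set_price(self, price: float) -> None:
        self._price = price
        # Signal only: pass the subject and let each observer pull what it needs.
        for observer in list(self._observers):
            observer.update(self)


class PriceDisplay(PullObserver):
    def __init__(self) -> None:
        self.last_seen: Optional[float] = None

    def update(self, subject: StockTicker) -> None:
        self.last_seen = subject.price  # the pull step


ticker = StockTicker()
display = PriceDisplay()
ticker.attach(display)
ticker.set_price(101.5)
print(display.last_seen)  # 101.5
```

The trade-off is that pull observers depend on the subject's read API, while push observers depend on the shape of the event payload.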

Time Complexity Perspective

notify is O(n) where n = number of observers.
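attach/detach is O(n) with a list because of the membership check and remove. A set gives O(1) average if observers are hashable, but loses notification order; a dict keyed by observer keeps insertion order.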
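
One possible way to get O(1) average attach/detach while keeping notification order is to store observers as dict keys. This is a sketch under the assumption that observers are hashable (plain Python objects are by default); Recorder and DictSubject are hypothetical names.

```python
from typing import Dict, List


class Recorder:
    """Hypothetical observer that remembers every event it receives."""

    def __init__(self) -> None:
        self.events: List[str] = []

    def update(self, event: str) -> None:
        self.events.append(event)


class DictSubject:
    def __init__(self) -> None:
        # Keys are observers; values are unused. Dicts preserve insertion order.
        self._observers: Dict[Recorder, None] = {}

    def attach(self, observer: Recorder) -> None:
        self._observers[observer] = None  # O(1) average, duplicate-safe

    def detach(self, observer: Recorder) -> None:
        self._observers.pop(observer, None)  # O(1) average, no error if absent

    def notify(self, event: str) -> None:
        # Snapshot the keys so observers may detach during notification.
        for observer in list(self._observers):
            observer.update(event)
```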

Space Complexity Perspective

O(n) for storing observer references.

Edge Cases

Observer failure: one failing observer should not block the others; wrap each update call in try/except and log the error.
Observer removes itself during notification: iterate over snapshot copy to avoid mutation issues.
High-frequency events: notification storms may require batching or async queues.
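
The first two edge cases can be handled together in notify. The sketch below is one possible approach, not the only one: it assumes observers are plain callables (a simplification of the Observer interface above), and SafeSubject and the three listener functions are hypothetical names.

```python
import logging
from typing import Callable, List

logger = logging.getLogger(__name__)

Listener = Callable[[str], None]


class SafeSubject:
    def __init__(self) -> None:
        self._observers: List[Listener] = []

    def attach(self, observer: Listener) -> None:
        if observer not in self._observers:
            self._observers.append(observer)

    def detach(self, observer: Listener) -> None:
        if observer in self._observers:
            self._observers.remove(observer)

    def notify(self, event: str) -> int:
        """Deliver the event to every observer; return how many succeeded."""
        delivered = 0
        # Snapshot copy: observers may detach themselves while we iterate.
        for observer in list(self._observers):
            try:
                observer(event)
                delivered += 1
            except Exception:
                # Isolate the failure so later observers still run.
                logger.exception("observer failed for event %r", event)
        return delivered


subject = SafeSubject()
received: List[str] = []


def failing(event: str) -> None:
    raise ValueError("boom")


def one_shot(event: str) -> None:
    received.append(event)
    subject.detach(one_shot)  # removes itself during notification


def steady(event: str) -> None:
    received.append(f"steady:{event}")


for listener in (failing, one_shot, steady):
    subject.attach(listener)

first_round = subject.notify("e1")   # failing raises; the other two succeed
second_round = subject.notify("e2")  # one_shot is gone; failing still raises
```

Swallowing every exception is a design choice that suits fire-and-forget notifications; a stricter system might collect the errors and re-raise them after all observers have run.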

Common Mistakes

Common Mistake: Subject depending on concrete observer classes directly, defeating decoupling.

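Common Mistake: Never detaching observers that are no longer needed. The subject holds strong references, so forgotten observers stay alive and keep receiving events, which leaks memory in long-running apps.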

Common Mistake: Doing heavy blocking work inside observer updates on main thread.

Observer vs Pub-Sub (Quick Contrast)

Model	Coupling Style	Common Scope
Observer	Direct subject-observer references	In-process object collaboration
Pub-Sub	Broker/topic mediated	Distributed/event bus systems

Pattern Recognition

Use Observer when:

One state change should trigger multiple independent reactions.
Receivers should be pluggable/removable at runtime.
Sender must remain decoupled from specific side-effect handlers.

Interview Insight

Interview Insight: Mention failure isolation and async delivery options. Interviewers like it when you go beyond the textbook observer and discuss real-world notification reliability.

Practice Problems

Implement stock price subject with multiple subscriber dashboards.
Add async observer dispatch using queue/thread pool.
Implement observer priority ordering and retry policy for failures.

Expert Tip: Keep observer callbacks lightweight. For expensive side effects, publish lightweight events and let background workers process heavy tasks.

Summary

Observer pattern enables one-to-many event notification with loose coupling.
It is ideal for event-driven local object collaboration.
Attach/detach flexibility improves runtime extensibility.
Plan for failure handling and scalability in production use.

23.8 Strategy Pattern

Introduction

The Strategy Pattern defines a family of algorithms, encapsulates each one, and makes them interchangeable at runtime. Instead of hardcoding one algorithm in a class, you inject a strategy object that implements a shared contract.

This pattern is ideal when behavior changes based on context, for example a pricing policy, payment method, sorting rule, or retry policy.

Real-World Analogy

Consider a navigation app. You can choose “fastest route,” “shortest distance,” or “avoid tolls.” The app context stays the same, but the route strategy changes based on user preference. The Strategy pattern models this behavior cleanly.

Formal Definition

Strategy Pattern is a behavioral pattern that enables selecting an algorithm’s implementation at runtime by composing with interchangeable strategy objects.

Concept Note: Strategy emphasizes replacing conditional branching with polymorphism for cleaner extension.

Why This Topic Matters

Eliminates large if/elif blocks for behavior variants.
Supports Open/Closed Principle by adding new strategies without changing context class.
Improves unit testing by testing strategies independently.

Mental Model

Context
  |
  +--> Strategy Interface
          |--> Strategy A
          |--> Strategy B
          |--> Strategy C

Context delegates behavior to current strategy.

Brute Force → Better → Optimal

Brute Force

One class with huge conditional logic selecting behavior by flags/types.

Better

Split helper methods but still centralize branching in context.

Optimal

Extract algorithms into interchangeable strategy classes and inject one into context.

Optimization Insight: Strategy pattern optimizes maintainability and extension speed, especially when behavior variants grow over time.

Core Participants

Strategy Interface: common operation contract.
Concrete Strategies: different implementations.
Context: uses strategy to perform work; can switch strategy dynamically.

Step-by-Step Design

Identify algorithm variants currently in conditionals.
Create shared strategy interface.
Move each variant into separate concrete strategy class.
Inject strategy into context via constructor or setter.
Delegate execution from context to strategy.

Python Implementation Example

Scenario: Payment Processing

from abc import ABC, abstractmethod


class PaymentStrategy(ABC):
    @abstractmethod
    def pay(self, amount: float) -> str:
        pass


class CreditCardPayment(PaymentStrategy):
    def pay(self, amount: float) -> str:
        return f"Paid {amount} using Credit Card"


class UpiPayment(PaymentStrategy):
    def pay(self, amount: float) -> str:
        return f"Paid {amount} using UPI"


class WalletPayment(PaymentStrategy):
    def pay(self, amount: float) -> str:
        return f"Paid {amount} using Wallet"


class CheckoutContext:
    def __init__(self, strategy: PaymentStrategy):
        self.strategy = strategy

    def set_strategy(self, strategy: PaymentStrategy) -> None:
        self.strategy = strategy

    def checkout(self, amount: float) -> str:
        return self.strategy.pay(amount)


checkout = CheckoutContext(CreditCardPayment())
print(checkout.checkout(1200.0))

checkout.set_strategy(UpiPayment())
print(checkout.checkout(799.0))

Line-by-Line Explanation

PaymentStrategy defines the common contract.
Each concrete strategy encapsulates one payment behavior.
CheckoutContext is decoupled from concrete payment implementations.
set_strategy allows runtime behavior switching.

ASCII Diagram

CheckoutContext.checkout()
         |
         v
   current strategy.pay()
         |
   [CreditCard | UPI | Wallet]

Functional Strategy Variant (Pythonic)

In Python, strategies can also be callables/functions for lightweight use cases.

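from typing import Callable


def fast_shipping(cost: float) -> float:
    # Fast delivery adds a flat surcharge.
    return cost + 50


def free_shipping(cost: float) -> float:
    # No surcharge.
    return cost


class ShippingContext:
    def __init__(self, strategy: Callable[[float], float]):
        self.strategy = strategy

    def total(self, base_cost: float) -> float:
        # Any callable with this signature works as a strategy.
        return self.strategy(base_cost)


print(ShippingContext(fast_shipping).total(200))  # 250
print(ShippingContext(free_shipping).total(200))  # 200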

Time Complexity Perspective

Pattern itself adds O(1) dispatch overhead.
Actual complexity depends on chosen strategy algorithm.

Space Complexity Perspective

O(k) for storing k strategy class definitions; runtime context holds one active strategy reference.

Edge Cases

No strategy set: context should validate or provide default strategy.
Stateful strategies: ensure reusability/thread-safety rules are clear.
Too many tiny strategies: can overcomplicate simple domains.
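
The "no strategy set" case can be handled with a default plus validation. This is a minimal sketch assuming function-based strategies; ShippingCalculator, standard_shipping, and the flat fee of 20 are hypothetical values chosen for illustration.

```python
from typing import Callable, Optional

ShippingStrategy = Callable[[float], float]


def standard_shipping(cost: float) -> float:
    # Hypothetical default policy: flat fee of 20.
    return cost + 20.0


class ShippingCalculator:
    def __init__(self, strategy: Optional[ShippingStrategy] = None) -> None:
        # Fall back to a sensible default instead of storing None.
        self._strategy = strategy if strategy is not None else standard_shipping

    def set_strategy(self, strategy: ShippingStrategy) -> None:
        # Validate early so the failure happens at configuration time,
        # not later in the middle of a checkout.
        if not callable(strategy):
            raise TypeError("strategy must be callable")
        self._strategy = strategy

    def total(self, base_cost: float) -> float:
        return self._strategy(base_cost)


calculator = ShippingCalculator()
print(calculator.total(100.0))  # 120.0
```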

Common Mistakes

Common Mistake: Keeping big conditional blocks in context even after introducing strategies.

Common Mistake: Exposing concrete strategy internals to clients, reducing abstraction value.

Common Mistake: Creating strategy pattern where only one fixed behavior will ever exist.

Strategy vs Factory (Quick Contrast)

Pattern	Primary Focus
Strategy	Choosing behavior/algorithm at runtime
Factory	Creating object instances

Pattern Recognition

Use Strategy when:

You have multiple interchangeable algorithms for same task.
Behavior needs runtime switching.
Large conditional blocks keep growing with new policy types.
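
When a growing if/elif chain only selects which policy to run, one common refactoring is a lookup table from key to strategy. The sketch below assumes function-based strategies; the coupon codes and discount rules are hypothetical.

```python
from typing import Callable, Dict

DiscountStrategy = Callable[[float], float]


def no_discount(amount: float) -> float:
    return amount


def ten_percent_off(amount: float) -> float:
    return amount * 0.9


def flat_fifty_off(amount: float) -> float:
    # Never let the total go below zero.
    return max(amount - 50.0, 0.0)


# Adding a new policy means adding one entry here; apply_discount stays unchanged.
DISCOUNTS: Dict[str, DiscountStrategy] = {
    "NONE": no_discount,
    "SAVE10": ten_percent_off,
    "FLAT50": flat_fifty_off,
}


def apply_discount(code: str, amount: float) -> float:
    # Unknown codes fall back to the no-op strategy.
    strategy = DISCOUNTS.get(code, no_discount)
    return strategy(amount)
```

The table replaces the branching, while each strategy remains independently testable.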

Interview Insight

Interview Insight: In interviews, explicitly connect strategy to OCP: "new behaviors can be added as new strategy classes without editing context logic." This demonstrates design maturity.

Practice Problems

Implement discount engine using strategy classes for coupon types.
Build compression context with gzip/zip/lz strategies.
Refactor tax calculation if-else tree into strategy pattern.

Expert Tip: Keep context small and policy-free. If context starts containing strategy-specific logic, abstraction is leaking.

Summary

Strategy pattern encapsulates interchangeable algorithms behind one interface.
It removes conditional complexity and supports runtime behavior switching.
Widely useful for policy-driven business logic in production systems.
Use it when behavior variants are expected to grow.

23.9 State Pattern

Introduction

The State Pattern allows an object to alter its behavior when its internal state changes. Instead of one class containing large conditional state logic, each state is represented as a separate class with its own behavior.

This pattern is especially useful for workflows like vending machines, order lifecycle, document publishing states, and connection/session handling.

Real-World Analogy

A traffic light behaves differently depending on its current state: green means go, yellow warns that the light is about to change, and red means stop. The object is the same, but its behavior depends on the state. The State pattern models this explicitly.

Formal Definition

State Pattern is a behavioral design pattern that lets an object delegate behavior to state-specific objects and switch between them dynamically.

Concept Note: State pattern is often described as “Strategy + state transition awareness.”

Why This Topic Matters

Removes complex state conditionals from central classes.
Makes transitions explicit and safer to extend.
Common in domain-driven workflows and backend services.

Mental Model

Context
  |
  +--> current_state.handle(...)
            |
            +--> may trigger context state transition

Context delegates behavior to current state object; state decides next valid transitions.

Brute Force → Better → Optimal

Brute Force

Use one class with huge if/elif blocks checking state in every method.

Better

Centralize state checks in helper methods, but conditional complexity still grows rapidly.

Optimal

Model each state as a class implementing a common interface; transitions change active state object.

Optimization Insight: State pattern optimizes change management in behavior-rich workflows by localizing logic per state.

Core Participants

Context: owns current state and delegates behavior.
State Interface: declares operations for state-specific behavior.
Concrete States: implement behavior and transitions.

Step-by-Step Design

Identify all distinct states and allowed transitions.
Define state interface for relevant actions.
Implement one class per state.
Context routes actions to active state.
State objects trigger transitions by updating context state.

Python Implementation Example

Scenario: Order Lifecycle

from abc import ABC, abstractmethod


class OrderState(ABC):
    @abstractmethod
    def pay(self, order):
        pass

    @abstractmethod
    def ship(self, order):
        pass

    @abstractmethod
    def deliver(self, order):
        pass


class CreatedState(OrderState):
    def pay(self, order):
        print("Payment received.")
        order.set_state(PaidState())

    def ship(self, order):
        print("Cannot ship before payment.")

    def deliver(self, order):
        print("Cannot deliver before shipping.")


class PaidState(OrderState):
    def pay(self, order):
        print("Order already paid.")

    def ship(self, order):
        print("Order shipped.")
        order.set_state(ShippedState())

    def deliver(self, order):
        print("Cannot deliver before shipping.")


class ShippedState(OrderState):
    def pay(self, order):
        print("Order already paid and shipped.")

    def ship(self, order):
        print("Order already shipped.")

    def deliver(self, order):
        print("Order delivered.")
        order.set_state(DeliveredState())


class DeliveredState(OrderState):
    def pay(self, order):
        print("Order already completed.")

    def ship(self, order):
        print("Order already completed.")

    def deliver(self, order):
        print("Order already delivered.")


class Order:
    def __init__(self):
        self.state = CreatedState()

    def set_state(self, state: OrderState):
        self.state = state

    def pay(self):
        self.state.pay(self)

    def ship(self):
        self.state.ship(self)

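    def deliver(self):
        self.state.deliver(self)


order = Order()
order.ship()     # prints: Cannot ship before payment.
order.pay()      # prints: Payment received.
order.ship()     # prints: Order shipped.
order.deliver()  # prints: Order delivered.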

Line-by-Line Explanation

Order context holds current state object.
Each state class defines valid behavior for actions.
Invalid transitions are handled where they logically belong (inside current state).
Successful actions can move order to next state via set_state().

ASCII State Flow

Created --pay--> Paid --ship--> Shipped --deliver--> Delivered

Invalid actions in any state are blocked by that state's logic.

State vs Strategy (Quick Contrast)

Pattern	Behavior Change Trigger
State	Internal lifecycle transitions
Strategy	External selection of algorithm/policy

Time Complexity Perspective

State method dispatch is O(1) per operation.
Main gain is architectural clarity, not asymptotic speed.

Space Complexity Perspective

Additional state objects/classes add small structural overhead.

Edge Cases

Invalid transitions: ensure each state handles disallowed actions explicitly.
State explosion: too many micro-states may hurt readability; group where appropriate.
Persistence: long-lived systems may need serializable state identity.

Common Mistakes

Common Mistake: Leaving transition logic in context and state classes simultaneously, creating confusion.

Common Mistake: Using state pattern for trivial 2-case toggles where simple conditionals are sufficient.

Common Mistake: Not documenting allowed transitions, leading to hidden illegal states.

Pattern Recognition

Use State Pattern when:

Object behavior changes significantly by lifecycle stage.
You see repeated if state == ... checks across many methods.
Transition rules are explicit business rules that may evolve.

Interview Insight

Interview Insight: In LLD interviews, draw state-transition diagram first, then map each state to class behavior. This shows strong domain modeling skill.

Practice Problems

Model document workflow: Draft -> Review -> Approved -> Published.
Implement media player states: Playing, Paused, Stopped.
Refactor a large conditional-driven workflow into state objects.

Expert Tip: Start with explicit transition table before coding states. This prevents missing edge transitions and keeps implementation consistent.
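
An explicit transition table for the order lifecycle might look like the sketch below. Status, TRANSITIONS, and next_status are hypothetical names; the table mirrors the Created -> Paid -> Shipped -> Delivered flow and treats every pair not listed as an invalid transition.

```python
from enum import Enum
from typing import Dict, Tuple


class Status(Enum):
    CREATED = "created"
    PAID = "paid"
    SHIPPED = "shipped"
    DELIVERED = "delivered"


# (current status, action) -> next status. Anything missing is disallowed.
TRANSITIONS: Dict[Tuple[Status, str], Status] = {
    (Status.CREATED, "pay"): Status.PAID,
    (Status.PAID, "ship"): Status.SHIPPED,
    (Status.SHIPPED, "deliver"): Status.DELIVERED,
}


def next_status(current: Status, action: str) -> Status:
    try:
        return TRANSITIONS[(current, action)]
    except KeyError:
        raise ValueError(f"cannot {action} an order that is {current.value}") from None


print(next_status(Status.CREATED, "pay"))  # Status.PAID
```

Writing the table first makes it easy to review allowed transitions with stakeholders and to check that each state class handles exactly the rows that start from it.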

Summary

State pattern models behavior by lifecycle stage using dedicated state classes.
It replaces state conditionals with explicit, maintainable transitions.
Ideal for workflow-driven systems where valid actions depend on current state.
Use judiciously to balance clarity with implementation complexity.

24.1 Interview Score Rubric (0 to 10)

Introduction

Top courses are not judged only by topic coverage. They are judged by measurable outcomes. This rubric gives you a repeatable way to score any solve attempt and objectively track your interview readiness.

Rubric Dimensions

Dimension	Weight	What "Excellent" Looks Like
Problem Understanding	15%	Restates constraints, edge limits, output format correctly.
Approach Quality	25%	Moves from brute force to optimal with clear trade-offs.
Correctness	20%	No logic bugs on dry run and custom test cases.
Complexity Analysis	15%	Precise Big-O with justification and constraints alignment.
Communication	15%	Structured, concise, interviewer-friendly narration.
Code Quality	10%	Clean naming, robust checks, no dead branches.

Expert Tip: A "passed interview-level solve" is usually 8.0+ with no hard correctness failure.
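
One way to turn the rubric into a single number is a weighted average. The sketch below assumes each dimension is scored from 0 to 10 and combined using the weights in the table; that scoring method is an assumption, and the dictionary keys and rubric_score are hypothetical names.

```python
from typing import Dict

# Weights copied from the rubric table; they sum to 1.0.
WEIGHTS: Dict[str, float] = {
    "problem_understanding": 0.15,
    "approach_quality": 0.25,
    "correctness": 0.20,
    "complexity_analysis": 0.15,
    "communication": 0.15,
    "code_quality": 0.10,
}


def rubric_score(scores: Dict[str, float]) -> float:
    """Return the weighted 0-10 score, rounded to two decimals."""
    if set(scores) != set(WEIGHTS):
        raise ValueError("scores must cover exactly the rubric dimensions")
    for name, value in scores.items():
        if not 0 <= value <= 10:
            raise ValueError(f"{name} must be between 0 and 10")
    return round(sum(WEIGHTS[name] * scores[name] for name in WEIGHTS), 2)


attempt = {
    "problem_understanding": 9,
    "approach_quality": 8,
    "correctness": 9,
    "complexity_analysis": 8,
    "communication": 7,
    "code_quality": 8,
}
print(rubric_score(attempt))  # 8.2
```

Here the contributions are 1.35 + 2.00 + 1.80 + 1.20 + 1.05 + 0.80 = 8.20, which clears the 8.0 bar mentioned above.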

24.2 8-Week Timed Problem Roadmap

Plan

Weeks 1-2: Arrays, strings, hashing, two pointers, binary search (45 min/question).
Weeks 3-4: Stack/queue, linked list, trees, heaps (50 min/question).
Weeks 5-6: Graphs, recursion/backtracking, greedy (55 min/question).
Weeks 7-8: Dynamic programming + mixed mocks (60 min full interview simulation).

Execution Rule

Each week: 5 timed solves + 1 revision day + 1 full mock day. Never replace revision day with new questions.

24.3 Edge-Case & Test Design Checklist

Mandatory Test Buckets

Empty/min input and max boundary input.
Duplicate-heavy and all-equal values.
Strictly increasing/decreasing order patterns.
Negative values, zeros, and sign-mix inputs.
Single valid answer vs multiple valid answers.
Invalid-state guard tests when applicable.

Common Mistake: Solving only the sample tests and assuming correctness.
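
The buckets above can be written down as a small table of cases before coding the solution. The sketch below applies them to a deliberately simple, hypothetical function, index_of_max, which returns the first index of the largest value; the case list and run_cases helper are illustrative only.

```python
from typing import List, Tuple


def index_of_max(nums: List[int]) -> int:
    """Return the first index of the largest value; reject empty input."""
    if not nums:
        raise ValueError("nums must not be empty")
    best = 0
    for i in range(1, len(nums)):
        if nums[i] > nums[best]:
            best = i
    return best


# One case per bucket: (bucket name, input, expected index).
CASES: List[Tuple[str, List[int], int]] = [
    ("min input (single element)", [7], 0),
    ("all equal values", [3, 3, 3], 0),
    ("strictly increasing", [1, 2, 3, 4], 3),
    ("strictly decreasing", [4, 3, 2, 1], 0),
    ("negatives, zero, sign mix", [-5, 0, -1], 1),
    ("multiple valid answers (tie)", [2, 9, 9, 1], 1),
    ("large input", list(range(10**5)), 10**5 - 1),
]


def run_cases() -> List[str]:
    """Return the names of failing buckets; an empty list means all passed."""
    failures = [name for name, nums, expected in CASES
                if index_of_max(nums) != expected]
    # Invalid-state guard: empty input must raise instead of returning garbage.
    try:
        index_of_max([])
        failures.append("empty input should raise")
    except ValueError:
        pass
    return failures


print(run_cases())  # []
```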

24.4 Mock Interview Protocol

60-Minute Format

5 min: clarify requirements and constraints.
10 min: propose a brute-force approach, then optimize toward the target approach.
30 min: code with incremental dry-runs.
10 min: edge tests + complexity discussion.
5 min: reflection and alternative approach.

Post-Mock Review Card

Record: final score, major mistake category, one technical fix, one communication fix, and next-day drill.

24.5 Contest + Interview Hybrid Routine

Use contests to build speed and pressure tolerance; use interview mocks to build explanation quality and structured reasoning.

Contest Day: speed, pattern recognition, fallback strategy.
Interview Day: clarity, trade-offs, readable production-style code.
Bridge Task: rewrite one contest solution as interview-grade explanation + clean code.

24.6 Portfolio & Credibility Proof Pack

What to Build

A public problem log: question, approach, mistakes, final complexity.
At least 3 polished writeups: one graph, one DP, one design problem.
One mini project using DSA choices with performance comparison.
A revision tracker with weak-pattern trendline.

Concept Note: Proof of process and consistency often differentiates equal-skill candidates.

24.7 Final Readiness Gate

Promotion Criteria (Ready for Interviews)

10 consecutive medium problems solved under time budget.
At least 3 hard problems solved with clean explanation.
Average mock score >= 8.2 over last 6 mocks.
No repeated critical mistakes in the last 2 weeks.
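
The four criteria can be checked mechanically from a practice log. This is a minimal sketch; the function name and parameters are hypothetical, and it assumes you track a streak of timed medium solves, a count of hard solves, mock scores in chronological order, and a count of repeated critical mistakes in the last two weeks.

```python
from typing import List


def ready_for_interviews(
    medium_streak: int,
    hard_solved: int,
    mock_scores: List[float],
    repeated_critical_mistakes: int,
) -> bool:
    last_six = mock_scores[-6:]
    if len(last_six) < 6:
        return False  # not enough mocks to evaluate the average
    average = sum(last_six) / len(last_six)
    return (
        medium_streak >= 10
        and hard_solved >= 3
        and average >= 8.2
        and repeated_critical_mistakes == 0
    )


print(ready_for_interviews(10, 3, [7.0, 8.5, 8.5, 8.5, 8.5, 8.5, 8.5], 0))  # True
```

Only the last six mocks count, so an early weak score (7.0 in the example) does not block promotion once recent results are consistently strong.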

Summary

This section shifts the course from content coverage to measurable outcomes. The rubric scores each attempt, the roadmap and mock protocol structure your practice, and the readiness gate gives an objective signal for when to start interviewing.