Greedy Algorithms
Definition
Makes locally optimal choices at each step, hoping to find a global optimum; it never backtracks once a decision is made.
When to Use
When the problem has optimal substructure and the greedy choice property. Not every problem qualifies (activity selection and coin change with canonical denominations do; general TSP doesn't).
Greedy Choice Property
A globally optimal solution can be arrived at by making a locally optimal choice at each step.
Optimal Substructure
An optimal solution contains optimal solutions to its subproblems.
Complexity
Usually O(n log n) or O(n), depending on the sorting step or data structure used.
Minimum Spanning Trees (MST)
Prim's Algorithm
Start from any vertex; repeatedly add the minimum-weight edge connecting the tree to an unvisited vertex, building the tree incrementally.
| Aspect | Value |
|---|---|
| Time (simple) | O(V^2) |
| Time (with heap) | O((V+E) log V) |
| Space | O(V) |
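A minimal Python sketch of the heap-based ("lazy") variant; the adjacency-list format and the function name are illustrative assumptions, not fixed by these notes.

```python
import heapq

def prim_mst(adj, start=0):
    """Lazy Prim's MST. adj[u] is a list of (weight, v) pairs for an undirected graph."""
    visited = {start}
    heap = list(adj[start])            # candidate edges leaving the current tree
    heapq.heapify(heap)
    total, tree_edges = 0, []
    while heap and len(visited) < len(adj):
        w, v = heapq.heappop(heap)     # cheapest edge crossing the cut (may be stale)
        if v in visited:
            continue                   # stale entry: endpoint already in the tree
        visited.add(v)
        total += w
        tree_edges.append((w, v))
        for edge in adj[v]:
            if edge[1] not in visited:
                heapq.heappush(heap, edge)
    return total, tree_edges

# Example: prim_mst({0: [(1, 1), (4, 2)], 1: [(1, 0), (2, 2)], 2: [(4, 0), (2, 1)]})
# -> (3, [(1, 1), (2, 2)])
```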
Kruskal's Algorithm
Sort the edges by weight, then add each edge in order if it doesn't create a cycle (checked with Union-Find).
| Aspect | Value |
|---|---|
| Time | O(E log E) |
| Dominant cost | Sorting the edges |
| Space | O(V) |
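A short sketch with a path-compressing Union-Find; the edge-list input format is an illustrative assumption.

```python
def kruskal_mst(num_vertices, edges):
    """edges: list of (weight, u, v) tuples. Returns (total_weight, chosen_edges)."""
    parent = list(range(num_vertices))

    def find(x):                       # Union-Find root with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    total, chosen = 0, []
    for w, u, v in sorted(edges):      # sorting the edges dominates: O(E log E)
        ru, rv = find(u), find(v)
        if ru != rv:                   # edge joins two components, so no cycle
            parent[ru] = rv            # union
            total += w
            chosen.append((w, u, v))
    return total, chosen

# Example: kruskal_mst(3, [(1, 0, 1), (2, 1, 2), (4, 0, 2)]) -> (3, [(1, 0, 1), (2, 1, 2)])
```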
Key Insight: Both produce the same total MST weight, though possibly different trees when edge weights repeat. Prim is better for dense graphs, Kruskal for sparse ones.
Network Flow
Ford-Fulkerson Method
Find augmenting paths from source to sink and add flow along them until no more paths exist.
Edmonds-Karp
Ford-Fulkerson with BFS to find shortest augmenting paths. Time: O(VE^2).
Residual Graph
Shows the remaining capacity on each edge. If an edge has flow f and capacity c, its residual capacity is c - f forward and f backward.
Augmenting Path
A path from source to sink in the residual graph along which every edge has positive residual capacity. The flow pushed along it equals the path's bottleneck (minimum residual) capacity.
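A sketch of Edmonds-Karp (Ford-Fulkerson with BFS); the dict-of-dicts capacity format and names are illustrative assumptions.

```python
from collections import deque

def edmonds_karp(capacity, source, sink):
    """Max flow. capacity: dict-of-dicts, capacity[u][v] = capacity of edge u->v."""
    nodes = set(capacity)
    for u in capacity:
        nodes.update(capacity[u])
    # residual graph: forward edges start at full capacity, backward edges at 0
    residual = {u: dict(capacity.get(u, {})) for u in nodes}
    for u in nodes:
        for v in capacity.get(u, {}):
            residual[v].setdefault(u, 0)

    max_flow = 0
    while True:
        # BFS for the shortest augmenting path (the Edmonds-Karp rule)
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return max_flow            # no augmenting path left: flow is maximum
        # bottleneck = minimum residual capacity along the path found
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        # push the bottleneck along the path, updating residual capacities
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        max_flow += bottleneck

# Example: edmonds_karp({"s": {"a": 3, "b": 2}, "a": {"t": 2}, "b": {"t": 3}}, "s", "t") -> 4
```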
Linear Programming
Standard Form
Maximize c^T x subject to Ax <= b, x >= 0. The feasible region is a convex polyhedron in R^n.
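A small worked instance of the standard form, assuming SciPy is available; the numbers are illustrative, and since linprog minimizes, the objective is negated to maximize.

```python
from scipy.optimize import linprog

# Maximize 3*x1 + 5*x2  subject to  x1 + 2*x2 <= 14,  3*x1 - x2 <= 0,  x >= 0.
c = [-3, -5]                     # pass -c because linprog minimizes
A_ub = [[1, 2], [3, -1]]
b_ub = [14, 0]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print(res.x, -res.fun)           # optimal vertex (2, 6) with maximized value 36
```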
Simplex Algorithm
Moves between vertices of the feasible region. Each step improves the objective value. Worst case exponential, but polynomial on average.
Basis & Non-basis
Basic variables (one per constraint, m of them) are solved for from the constraint equations; non-basic variables are set to 0. A pivot swaps one variable into the basis and one out.
Duality
Every LP has a dual. If the primal is unbounded, the dual is infeasible. When both are feasible, their optimal values are equal (strong duality).
Pattern Matching: Knuth-Morris-Pratt (KMP)
Time Complexity
O(m + n), where m = pattern length and n = text length. Preprocessing the pattern takes O(m).
Failure Function
A precomputed array that tells how far to shift the pattern when a mismatch occurs, so text characters are never re-checked (sketched in code after the steps below).
Algorithm Steps
- Compute failure function for pattern
- Match characters until mismatch
- Use failure function to skip ahead
- No backtracking in text pointer
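A compact sketch of the steps above; the function name and return format (all match positions) are illustrative choices.

```python
def kmp_search(text, pattern):
    """Return the start indices of every occurrence of pattern in text."""
    if not pattern:
        return []
    m = len(pattern)
    # failure[i] = length of the longest proper prefix of pattern[:i+1]
    # that is also a suffix of it
    failure = [0] * m
    k = 0
    for i in range(1, m):
        while k > 0 and pattern[i] != pattern[k]:
            k = failure[k - 1]         # mismatch: fall back via the failure function
        if pattern[i] == pattern[k]:
            k += 1
        failure[i] = k

    matches, k = [], 0
    for i, ch in enumerate(text):      # the text pointer i never moves backwards
        while k > 0 and ch != pattern[k]:
            k = failure[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == m:                     # full match ending at position i
            matches.append(i - m + 1)
            k = failure[k - 1]
    return matches

# kmp_search("abxabababa", "abab") -> [3, 5]
```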
Data Compression
Lossless Compression
The original data is perfectly recoverable. Achieves better rates on structured or repetitive data.
Lossy Compression
Information is discarded, but the loss is often imperceptible to humans. Used for audio, images, and video.
Run-Length Encoding (RLE)
Replace runs of the same character with (char, count) pairs. Good for data with long runs; in the worst case (no runs) the output expands.
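A tiny sketch showing both the win on runs and the no-runs worst case; the output format is an illustrative choice.

```python
def rle_encode(data):
    """Encode a string as a list of (char, run_length) pairs."""
    runs = []
    for ch in data:
        if runs and runs[-1][0] == ch:
            runs[-1][1] += 1           # extend the current run
        else:
            runs.append([ch, 1])       # start a new run
    return [(c, n) for c, n in runs]

# rle_encode("aaabbc") -> [('a', 3), ('b', 2), ('c', 1)]
# rle_encode("abc")    -> [('a', 1), ('b', 1), ('c', 1)]   (the no-runs worst case expands)
```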
Lempel-Ziv (LZ)
Dictionary-based. LZ77 uses a sliding window; LZ78 builds a dictionary dynamically. The basis for ZIP (via DEFLATE) and GIF (via LZW).
Huffman Coding
Algorithm
Build a binary tree by repeatedly merging the two lowest-frequency nodes. Assign codes along root-to-leaf paths: left = 0, right = 1.
Properties
Optimal prefix code: no codeword is a prefix of another, so decoding is unambiguous. Average code length is close to the entropy.
Time Complexity: O(n log n) with a heap. Space: O(n) for the tree.
Advantage: Provably optimal for symbol-by-symbol encoding. Easy to implement and decode.
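A sketch of the heap-based construction; it assumes at least one symbol and uses tuples for internal nodes purely as an illustrative representation (so symbols are assumed not to be tuples).

```python
import heapq

def huffman_codes(freq):
    """freq: dict mapping symbol -> frequency. Returns a dict symbol -> bit string."""
    # heap entries are (frequency, tie_breaker, node); internal nodes are (left, right) tuples
    heap = [(f, i, sym) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)     # two lowest-frequency nodes...
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next_id, (left, right)))  # ...merged into one
        next_id += 1

    codes = {}
    def assign(node, prefix):
        if isinstance(node, tuple):           # internal node: left = 0, right = 1
            assign(node[0], prefix + "0")
            assign(node[1], prefix + "1")
        else:
            codes[node] = prefix or "0"       # single-symbol edge case
    assign(heap[0][2], "")
    return codes

# huffman_codes({"a": 5, "b": 2, "c": 1, "d": 1})
# -> a prefix code such as {'a': '1', 'b': '00', 'c': '010', 'd': '011'} (ties may vary)
```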
Streaming Algorithms
Problem Constraint
Process data that arrives in a single pass, using limited memory (o(n), often polylogarithmic); the full stream cannot be stored.
Reservoir Sampling
Uniformly sample k items from a stream of unknown length. Keep the first k items; when item i arrives (i > k), with probability k/i replace a uniformly random element of the reservoir with it.
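A sketch of this scheme (Algorithm R); the function name is illustrative.

```python
import random

def reservoir_sample(stream, k):
    """Return k items chosen uniformly at random from an iterable of unknown length."""
    reservoir = []
    for i, item in enumerate(stream, start=1):
        if i <= k:
            reservoir.append(item)             # keep the first k items
        else:
            j = random.randint(1, i)           # item i survives with probability k/i ...
            if j <= k:
                reservoir[j - 1] = item        # ... replacing a uniformly random slot
    return reservoir
```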
Frequency Moments
F_0: number of distinct elements. F_1: total number of elements. F_2: sum of squared frequencies. All can be approximated with small-space sketches.
Count-Min Sketch
Approximate frequency counts using O((1/epsilon) * log(1/delta)) counters (error at most epsilon*N with probability at least 1 - delta). Hash each element to one cell per row and increment; the estimate is the minimum over those cells.
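A minimal sketch; the width/depth defaults and the salted hashlib digests (standing in for independent hash functions) are illustrative assumptions.

```python
import hashlib

class CountMinSketch:
    """depth x width grid of counters; estimates can only overcount, never undercount."""
    def __init__(self, width=272, depth=5):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _cells(self, item):
        for row in range(self.depth):          # one salted hash per row
            digest = hashlib.md5(f"{row}:{item}".encode()).hexdigest()
            yield row, int(digest, 16) % self.width

    def add(self, item, count=1):
        for row, col in self._cells(item):
            self.table[row][col] += count

    def estimate(self, item):
        return min(self.table[row][col] for row, col in self._cells(item))

# cms = CountMinSketch(); cms.add("x", 3); cms.estimate("x") -> 3 (possibly more under collisions)
```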
Bloom Filters
Structure
A bit array of size m and k hash functions. To add an element: hash it to k positions and set those bits to 1. To query: hash it and check whether all k positions are 1.
False Positives
May answer "might be in the set" for an element that was never added; after n insertions the probability is ≈ (1 - e^(-kn/m))^k. There are no false negatives.
Space: O(m) bits. Lookup: O(k). The trade-off is between memory and the false-positive rate.
Applications: Web crawlers (visited URLs), caches, databases, duplicate detection.
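A minimal sketch of the structure above; the m and k defaults and the seeded hashlib hashes are illustrative assumptions, not a tuned implementation.

```python
import hashlib

class BloomFilter:
    """m-bit array with k hash functions: false positives possible, false negatives impossible."""
    def __init__(self, m=1024, k=3):
        self.m, self.k = m, k
        self.bits = [False] * m

    def _positions(self, item):
        for seed in range(self.k):             # k salted hashes stand in for independent hash functions
            digest = hashlib.sha256(f"{seed}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = True

    def might_contain(self, item):
        return all(self.bits[pos] for pos in self._positions(item))
```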
Randomized Algorithms
Las Vegas
Always produces correct output; the running time is random. Example: randomized quicksort. The time varies, but correctness is guaranteed.
Monte Carlo
Always runs in bounded time, but the output may be wrong with small probability. Example: Miller-Rabin primality test.
Randomized Quicksort
Choose the pivot uniformly at random instead of from a fixed position. Expected time is O(n log n) on every input; the O(n^2) worst case still exists, but no fixed adversarial input can force it.
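A short sketch of the idea; this version builds new lists for brevity, whereas real implementations usually partition in place.

```python
import random

def randomized_quicksort(items):
    """Expected O(n log n) on any input because the pivot is chosen at random."""
    if len(items) <= 1:
        return items
    pivot = random.choice(items)                   # random pivot, not a fixed position
    smaller = [x for x in items if x < pivot]
    equal = [x for x in items if x == pivot]
    larger = [x for x in items if x > pivot]
    return randomized_quicksort(smaller) + equal + randomized_quicksort(larger)
```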
Miller-Rabin Primality
A probabilistic test. Run k independent rounds for error probability at most 4^(-k). Much faster than deterministic tests for large numbers.
Concentration Inequalities: Chernoff and Markov bounds limit the probability of large deviations.
Exam Traps & Common Mistakes
Greedy Doesn't Always Work: Check both optimal substructure AND greedy choice property. TSP is greedy-resistant.
MST Uniqueness: Different algorithms give the same total weight but may pick different edge sets when weights aren't unique; with all-distinct weights, the MST is unique.
Flow = Capacity: When every edge across some s-t cut is saturated, no augmenting path can cross it. The max flow is reached exactly when the residual graph has no s-t path.
Huffman Decoding: You need the tree (or code table) to decode; it can't be reconstructed from the encoded bitstream alone, since the codewords are variable-length.
Bloom Filter Deletion: A standard Bloom filter doesn't support deletion (clearing a bit can break other elements). Use a counting Bloom filter if deletions are needed.
Randomized Complexity: Distinguish between worst-case, average-case, and expected time. Las Vegas is always correct, Monte Carlo not.