Advanced Algorithms — Weeks 6-11 Cheat Sheet

UFCFYR-15-2 · BSc Computer Science · Dr Kun Wei · Weeks 6–11


Greedy Algorithms

Definition

Makes the locally optimal choice at each step, hoping to reach a global optimum. Never backtracks once a decision is made.

When to Use

Use when the problem has optimal substructure and the greedy-choice property. Not all problems qualify: activity selection and coin change (with canonical coin systems) do; general TSP doesn't.

Greedy Choice Property

A globally optimal solution can be arrived at by making a locally optimal choice.

Optimal Substructure

An optimal solution contains optimal solutions to subproblems.

Complexity

Usually O(n log n) or O(n), depending on sorting/data structure.
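A minimal sketch of the greedy pattern, using activity selection (sort by finish time, then pick greedily); the function name and intervals are illustrative:

```python
def select_activities(intervals):
    """Return a maximum set of non-overlapping (start, finish) intervals."""
    chosen = []
    last_finish = float("-inf")
    # Greedy choice: always take the compatible activity that finishes earliest.
    for start, finish in sorted(intervals, key=lambda iv: iv[1]):
        if start >= last_finish:      # compatible with everything chosen so far
            chosen.append((start, finish))
            last_finish = finish
    return chosen

acts = [(1, 4), (3, 5), (0, 6), (5, 7), (3, 9), (5, 9), (6, 10), (8, 11)]
print(select_activities(acts))  # [(1, 4), (5, 7), (8, 11)]
```

The O(n log n) cost is the sort; the scan itself is O(n).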

Minimum Spanning Trees (MST)

Prim's Algorithm

Start from any vertex, repeatedly add closest unvisited vertex. Build tree incrementally.
Time: O(V^2)
With heap: O((V + E) log V)
Space: O(V)

Kruskal's Algorithm

Sort edges by weight, add edges in order if they don't create cycle (using Union-Find).
Time: O(E log E)
Dominant cost: sorting the edges
Space: O(V)
Key Insight: Both produce the same total weight but possibly different trees. Prim is better for dense graphs, Kruskal for sparse ones.

Network Flow

Ford-Fulkerson Method

Find augmenting paths from source to sink, add flow along paths until no more paths exist.

Edmonds-Karp

Ford-Fulkerson with BFS to find shortest augmenting paths. Time: O(VE^2)

Residual Graph

Shows remaining capacity on edges. If edge has flow f and capacity c, residual capacity is c - f forward and f backward.

Augmenting Path

Path from source to sink in residual graph where all edges have positive capacity. Max flow equals bottleneck capacity.
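A compact Edmonds-Karp sketch on an adjacency-matrix residual graph (the 4-vertex example network is illustrative):

```python
from collections import deque

def edmonds_karp(cap, s, t):
    """Max flow where cap[u][v] is the residual capacity of edge (u, v)."""
    n = len(cap)
    flow = 0
    while True:
        # BFS in the residual graph: shortest s-t augmenting path.
        parent = [-1] * n
        parent[s] = s
        q = deque([s])
        while q and parent[t] == -1:
            u = q.popleft()
            for v in range(n):
                if parent[v] == -1 and cap[u][v] > 0:
                    parent[v] = u
                    q.append(v)
        if parent[t] == -1:           # no augmenting path: flow is maximal
            break
        # Bottleneck = minimum residual capacity along the path.
        bottleneck, v = float("inf"), t
        while v != s:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        # Push flow: shrink forward residuals, grow backward ones.
        v = t
        while v != s:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u
        flow += bottleneck
    return flow

cap = [[0, 3, 2, 0],
       [0, 0, 1, 2],
       [0, 0, 0, 3],
       [0, 0, 0, 0]]
print(edmonds_karp(cap, 0, 3))  # 5
```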

Linear Programming

Standard Form

Maximize c^T x subject to Ax <= b, x >= 0. The feasible region is a convex polytope in R^n.

Simplex Algorithm

Moves between vertices of feasible region. Each step improves objective value. Worst case exponential but polynomial on average.

Basis & Non-basis

Basic variables take the values determined by the m binding constraints; non-basic variables are set to 0. A pivot swaps one variable between the basis and non-basis, moving to an adjacent vertex.

Duality

Every LP has a dual. If the primal is unbounded, the dual is infeasible. If both have optimal solutions, their optimal values are equal (strong duality).

Pattern Matching: Knuth-Morris-Pratt (KMP)

Time Complexity

O(m + n) where m = pattern length, n = text length. Preprocessing takes O(m).

Failure Function

Precomputed array that tells how far to shift pattern when mismatch occurs. Avoids re-checking characters.

Algorithm Steps

  1. Compute failure function for pattern
  2. Match characters until mismatch
  3. Use failure function to skip ahead
  4. No backtracking in text pointer
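The steps above can be sketched as follows; the pattern "ababaca" is a standard worked example, not from the lecture notes:

```python
def failure_function(pattern):
    """fail[i] = length of the longest proper prefix of pattern[:i+1]
    that is also a suffix of it. Computed in O(m)."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]           # shift the pattern, never the text
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail

def kmp_search(text, pattern):
    """Indices of all occurrences of pattern in text, O(m + n) total."""
    fail, k, hits = failure_function(pattern), 0, []
    for i, ch in enumerate(text):     # the text pointer only moves forward
        while k > 0 and ch != pattern[k]:
            k = fail[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):         # full match ending at index i
            hits.append(i - k + 1)
            k = fail[k - 1]
    return hits

print(failure_function("ababaca"))     # [0, 0, 1, 2, 3, 0, 1]
print(kmp_search("abababab", "abab"))  # [0, 2, 4]
```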

Data Compression

Lossless Compression

Original data perfectly recoverable. Achieves better rates on structured/repetitive data.

Lossy Compression

Information discarded, but often imperceptible to humans. Used for audio, images, video.

Run-Length Encoding (RLE)

Replace runs of same character with (char, count). Good for data with long runs. Worst case: expands if no runs.
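A minimal RLE sketch (function names are illustrative):

```python
from itertools import groupby

def rle_encode(s):
    """Run-length encode a string into (char, count) pairs."""
    return [(ch, len(list(run))) for ch, run in groupby(s)]

def rle_decode(pairs):
    return "".join(ch * count for ch, count in pairs)

encoded = rle_encode("aaabbbbcd")
print(encoded)              # [('a', 3), ('b', 4), ('c', 1), ('d', 1)]
print(rle_decode(encoded))  # 'aaabbbbcd'
```

Note the worst case: a run-free string like "abcd" encodes to one pair per character, which is larger than the input.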

Lempel-Ziv (LZ)

Dictionary-based. LZ77: sliding window (underlies DEFLATE, used in ZIP). LZ78: builds a dictionary dynamically (LZW, an LZ78 derivative, underlies GIF).

Huffman Coding

Algorithm

Build binary tree by repeatedly combining two smallest frequency nodes. Assign codes: left = 0, right = 1.

Properties

Optimal prefix code. No code is prefix of another. Unique decoding. Average code length close to entropy.

Time Complexity: O(n log n) with heap. Space: O(n) for tree.

Advantage: Provably optimal for symbol-by-symbol encoding. Easy to implement and decode.
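A sketch of the tree-building step with a heap; the tuple-based tree representation and tiebreaking counter are implementation choices, not part of the algorithm:

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Build a Huffman code table {symbol: bitstring} for text."""
    freq = Counter(text)
    if len(freq) == 1:                       # degenerate: one distinct symbol
        return {next(iter(freq)): "0"}
    # Heap of (frequency, tiebreak, tree); a tree is a symbol or (left, right).
    heap = [(f, i, sym) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)    # two smallest-frequency nodes
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, count, (left, right)))
        count += 1
    codes = {}
    def assign(node, prefix):                # left edge = 0, right edge = 1
        if isinstance(node, tuple):
            assign(node[0], prefix + "0")
            assign(node[1], prefix + "1")
        else:
            codes[node] = prefix
    assign(heap[0][2], "")
    return codes

codes = huffman_codes("aaaabbbccd")          # freqs: a=4, b=3, c=2, d=1
print(codes)                                 # code lengths 1, 2, 3, 3
```

Frequent symbols get short codes, and no code is a prefix of another, so decoding is unambiguous.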

Streaming Algorithms

Problem Constraint

Process the data in a single pass as it arrives. Memory is limited to o(n), often polylog(n). Cannot store the whole stream.

Reservoir Sampling

Uniformly sample k items from unknown stream size. Keep first k, then with probability k/i replace random element when item i arrives.
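The replacement rule above in a minimal sketch:

```python
import random

def reservoir_sample(stream, k):
    """Uniformly sample k items from an iterable of unknown length."""
    reservoir = []
    for i, item in enumerate(stream, start=1):
        if i <= k:
            reservoir.append(item)    # keep the first k items
        else:
            j = random.randrange(i)   # uniform in 0 .. i-1
            if j < k:                 # i.e. with probability k/i
                reservoir[j] = item   # evict a uniformly random slot
    return reservoir

random.seed(0)
print(reservoir_sample(range(1000), 5))
```

An induction argument shows every item survives with probability exactly k/n when the stream ends at length n.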

Frequency Moments

F_0: number of distinct elements. F_1: total elements. F_2: sum of squares of frequencies. Approximate with sketches.

Count-Min Sketch

Approximate frequency counts with O((1/ε) · log(1/δ)) space: width O(1/ε), depth O(log(1/δ)). Hash each element into one cell per row. Take the minimum count over rows.

Bloom Filters

Structure

Bit array of size m, k hash functions. To add: hash to k positions, set bits to 1. To check: hash, see if all k positions are 1.

False Positives

Can say "might be in set" when not present. Probability ≈ (1 - e^(-kn/m))^k. No false negatives.

Space: O(m) bits. Lookup: O(k). Trade-off between memory and false positive rate.

Applications: Web crawlers (visited URLs), caches, databases, duplicate detection.

Randomized Algorithms

Las Vegas

Always correct output, random running time. Example: Randomized Quicksort. Time varies but correctness guaranteed.

Monte Carlo

Always runs in bounded time, but output might be wrong with small probability. Example: Miller-Rabin primality test.

Randomized Quicksort

Choose the pivot uniformly at random instead of a fixed position. Expected O(n log n) on every input; no fixed adversarial input can force the O(n^2) worst case.
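A minimal (non-in-place) sketch of the random-pivot idea:

```python
import random

def randomized_quicksort(a):
    """Las Vegas: always returns a sorted list; only the running time is random."""
    if len(a) <= 1:
        return a
    pivot = random.choice(a)          # the random choice defeats adversarial inputs
    less = [x for x in a if x < pivot]
    equal = [x for x in a if x == pivot]
    greater = [x for x in a if x > pivot]
    return randomized_quicksort(less) + equal + randomized_quicksort(greater)

print(randomized_quicksort([5, 2, 9, 1, 5, 6]))  # [1, 2, 5, 5, 6, 9]
```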

Miller-Rabin Primality

Probabilistic test. Run k times for error probability 4^(-k). Much faster than deterministic for large numbers.
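A sketch of the test (the small-prime pre-check is a common shortcut, not part of the core algorithm):

```python
import random

def miller_rabin(n, k=20):
    """Monte Carlo: a composite passes all k rounds with probability <= 4**-k."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7):
        if n % p == 0:
            return n == p
    # Write n - 1 = d * 2^r with d odd.
    d, r = n - 1, 0
    while d % 2 == 0:
        d //= 2
        r += 1
    for _ in range(k):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(r - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False              # a witnesses that n is composite
    return True

print(miller_rabin(561))        # False: Carmichael number, still detected
print(miller_rabin(10**9 + 7))  # True
```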

Concentration Inequalities: Markov and Chernoff bounds limit the probability of large deviations from the expectation.

Exam Traps & Common Mistakes

Greedy Doesn't Always Work: Check both optimal substructure AND the greedy-choice property. TSP is greedy-resistant.
MST Uniqueness: Different algorithms give the same total weight but possibly different edge sets if weights aren't unique.
Flow = Capacity: Max flow is reached exactly when the residual graph has no s-t path, i.e. every augmenting path is exhausted.
Huffman Decoding: You need the tree (or code table) to decode; a variable-length bitstream alone doesn't determine the tree.
Bloom Filter Deletion: A standard Bloom filter doesn't support deletion (clearing a bit breaks other elements). Use a counting Bloom filter for deletions.
Randomized Complexity: Distinguish worst-case, average-case, and expected time. Las Vegas is always correct; Monte Carlo may err.