CS140-lecture-20211027

Variable length encoding #

If we want to compress a file, we need to represent each symbol with a string of binary digits. If our strings are variable length, then that means the representation for any symbol should not be the prefix of the representation of another symbol.

image_2021-10-27-16-33-25 image_2021-10-27-16-34-39

Huffman codes #

image_2021-10-27-16-35-34

The value at each node is the frequency of symbols in the subtree. The leaves represent symbols, and each left or right child represents a 0 or 1. So, if we walk directly to a leaf, the values on each link represent the encoded symbol.

image_2021-10-27-16-37-21

So, the above tree shows that e is represented as 1101.

So how can we optimize this tree using a greedy algorithm? First off, we want the lowest frequency symbols to as low as possible in the tree (lowest frequency items have longer encodings).

image_2021-10-27-16-40-45

  • n n is the number of symbols
  • creates a heap with frequencies
  • each iteration extracts 2 nodes from heap, and inserts a new node that has the sum of frequencies

image_2021-10-27-16-42-16 image_2021-10-27-16-45-44 image_2021-10-27-16-48-00

So how do we prove that this is the optimal binary tree? We need to prove

  • the greedy choice property
  • the optimal substructure property

At each point, we are always choosing the 2 lowest frequency symbols.

image_2021-10-27-16-52-01

If during each swap, the subtree frequency does not change, then each tree is optimal.

image_2021-10-27-16-53-04

The reason that it is 0 \geq 0 is because it will have either lower or the same frequency.

image_2021-10-27-16-59-29 image_2021-10-27-17-02-04 image_2021-10-27-17-04-00

Based on our assumption, we think that T T' is optimal, whereas T T''' may not be optimal. That is why the cost of T T'' is \geq to C(T) C(T) .

Minimum spanning trees #

Finding a minimum spanning tree in a graph is also a greedy algorithm.

image_2021-10-28-10-09-47

  • spanning means it covers every node in the graph
  • tree means there is no cycles
  • minimum means that the total cost of the tree is the lowest possible

image_2021-10-28-10-11-13

  • “safe” means that when adding an edge to A A , it is still a subset of the MST

So how do we find the safe edge? That leads to different algorithms.

image_2021-10-28-10-12-22 image_2021-10-28-10-13-59

  • an edge that “crosses the cut”, means it has 1 endpoint in the first set, and 1 endpoint in the other.

image_2021-10-28-10-19-46

  • there should be at least 1 edge that crosse the cut, for example (x,y) (x,y) .
  • if we remove that edge, and instead add the light edge (u,v) (u,v) , that means that it is the smallest edge that crosses the cut
  • this leaves us with another tree, with a total weight that is the same minus w(x,y)+w(u,v) w(x,y) + w(u,v)

image_2021-10-28-10-21-42 image_2021-10-28-10-26-53

  • “CC” = connected component

MST algorithms #

image_2021-10-28-10-27-01

  • the algorithm starts by assuming all nodes in the graph are individual connected components (nothing is connected to each other, no nodes in the MST yet)

image_2021-10-28-10-29-24 image_2021-10-28-10-38-10

  • π[r] \pi[r] represents the “parent of node r r
  • extract-min in a binary heap has a runtime of O(lgn) O(\lg n)
  • decrease-key operation has a complexity of O(lgV) O(\lg V)
  • the amount of edges in a connected graph is E=V1 E = V-1 , or Ω(V) \Omega(V)
  • overall runtime complexity of Prim’s is O(ElgV) O(E \lg V)

Example of Prim’s #

image_2021-10-28-10-46-05 image_2021-10-28-10-46-52 image_2021-10-28-10-47-28

Note: This is not a directed graph, the arrows just show what nodes are directly being updated when a new node is added.

image_2021-10-28-10-50-21 image_2021-10-28-10-50-53 image_2021-10-28-10-50-55 image_2021-10-28-10-51-15

Example of Kruskal’s #

In comparison, Kruskal’s starts with all nodes being their own connected components. It chooses the minimum edge between any 2 connected components:

Note: Disregard the \infty signs in each node here, we're just reusing the graph from the Prim's example to show Kruskal's.

image_2021-10-28-10-52-30 image_2021-10-28-10-52-51 image_2021-10-28-10-53-10 image_2021-10-28-10-53-21 image_2021-10-28-10-53-35 image_2021-10-28-10-54-05