# Feb 4. Kruskal’s and Prim’s algorithms for MST

## Minimum Weight Spanning Tree

Given $G=(V,E)$ and a weight function $w: E \rightarrow \mathbb{R}$, find a minimum cost spanning tree of $G$. Recall the following lemma for Kruskal’s algorithm:
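The notes recall Kruskal's algorithm without restating it. For reference, a minimal Python sketch of the greedy loop (the graph representation and helper names are my own; a bare union–find stands in for the component check discussed later):

```python
def kruskal(n, edges):
    """Kruskal's greedy MST on vertices 0..n-1.
    edges: list of (weight, u, v) tuples."""
    parent = list(range(n))          # union-find parent pointers

    def find(x):                     # representative of x's component
        while parent[x] != x:
            x = parent[x]
        return x

    tree = []
    for w, u, v in sorted(edges):    # scan edges in increasing weight
        ru, rv = find(u), find(v)
        if ru != rv:                 # endpoints in different components
            parent[ru] = rv          # merge the two components
            tree.append((w, u, v))   # e is safe to add: no cycle formed
    return tree
```

On the 4-vertex graph with edges $(0,1)$ of weight 1, $(1,2)$ of weight 2, $(0,2)$ of weight 3 and $(2,3)$ of weight 4, the edge $(0,2)$ is skipped (it would close a cycle) and the tree has total weight 7.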

Lemma: Every edge $e$ added by the greedy (Kruskal's) algorithm is a minimum weight edge across some cut $(S,\bar{S})$ of $G$.

Proof: Let $e=(u,v)$. Initially $u$ and $v$ lie in separate components, and since the algorithm outputs a tree, at some point it must have added the edge $e$ that connected the components containing $u$ and $v$. Let $S$ be the component containing $u$ in the step before $e$ was added, so that $v \in \bar{S}$. Since the algorithm considers edges in increasing order of weight, any edge crossing $(S,\bar{S})$ lighter than $e$ would have been added earlier (it joins two distinct components, so it cannot create a cycle), contradicting the fact that $S$ is still a component. Hence $e$ is a minimum weight edge across $(S,\bar{S})$.

Theorem: Let $T^*$ be a minimum spanning tree and let $T_G$ be the tree output by the greedy algorithm. Then $w(T_G) = w(T^*)$.

Proof: Let $e=(u,v)$ be an edge in $T_G$ that is not in $T^*$. If no such edge exists, there is nothing to prove. Consider $T^* \cup \{e\}$ and let $C$ be the unique cycle it contains. We claim there is an edge $e' \in C$ with $e' \ne e$ that is at least as heavy as $e$. By the previous lemma there exists a cut $(S,\bar{S})$ such that $e$ is a minimum weight edge crossing $(S,\bar{S})$. The cycle $C$ crosses the cut $(S,\bar{S})$ an even number of times, hence at least twice, so there must be another crossing edge $e' \in T^*$, and the minimality of $e$ across the cut gives $w(e') \ge w(e)$. Let $T' = T^* \cup \{e\} \setminus \{e'\}$; this is again a spanning tree, and $w(T') \le w(T^*)$. Repeating this exchange for every edge of $T_G$ missing from $T^*$, we finally obtain $w(T_G) \le w(T^*)$. Since $w(T^*)$ was optimal, the claim follows.
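The conclusion of the theorem can be sanity-checked by brute force on a small graph: enumerate all edge subsets of size $n-1$, keep those that span, and compare the minimum weight to what the greedy exchange argument predicts. A throwaway check (not part of the notes; all names are mine):

```python
from itertools import combinations

def is_spanning_tree(n, chosen):
    """True iff the n-1 chosen (w, u, v) edges connect all n vertices."""
    if len(chosen) != n - 1:
        return False
    adj = {v: [] for v in range(n)}
    for w, u, v in chosen:
        adj[u].append(v)
        adj[v].append(u)
    seen, stack = {0}, [0]           # DFS from vertex 0
    while stack:
        x = stack.pop()
        for y in adj[x]:
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return len(seen) == n            # n-1 edges + connected => tree

def brute_force_mst_weight(n, edges):
    """Minimum total weight over all spanning trees (exponential time)."""
    return min(sum(w for w, _, _ in c)
               for c in combinations(edges, n - 1)
               if is_spanning_tree(n, c))
```

On the 4-vertex graph with weighted edges $(0,1,w{=}1)$, $(1,2,w{=}2)$, $(0,2,w{=}3)$, $(2,3,w{=}4)$, the brute-force optimum is 7, matching the greedy choice that drops the weight-3 edge.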

Caveat: What if at some point we remove an edge $e'$ of $T_G$ that we had added in an earlier exchange?

We claim that there exists a cut $(S,\bar{S})$ of $G$ for which $e$ is a minimum weight edge crossing the cut and, moreover, $e$ is the only edge of $T_G$ crossing it. This can be seen by taking $S$ and $\bar{S}$ to be the two components of $T_G \setminus \{e\}$. Thus when we remove an edge from $T^* \cup \{e\}$, the removed edge $e'$ (the second cycle edge crossing $(S,\bar{S})$) cannot have come from $T_G$.

## Prim’s Algorithm

• Initialize $S = \{v\}$ for some $v \in V$ and $T = \emptyset$.
• While $|T| < n-1$ do:
• Let $e=(u,u')$ be a minimum cost edge with $u \in S$ and $u' \in \bar{S}$.
• Set $T$ to $T \cup \{e\}$ and $S$ to $S \cup \{u'\}$.
• EndWhile
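The loop above can be translated directly into Python, keeping the candidate crossing edges in a heap so the minimum cost edge is found quickly (a sketch under my own conventions; the notes do not fix a graph representation, so an adjacency-list dict is assumed):

```python
import heapq

def prim(adj, root=0):
    """Prim's MST. adj[u] is a list of (weight, v) pairs."""
    in_tree = {root}                     # the vertex set S
    tree = []                            # the edge set T
    heap = [(w, root, v) for w, v in adj[root]]
    heapq.heapify(heap)                  # candidate edges leaving S
    while heap and len(tree) < len(adj) - 1:
        w, u, v = heapq.heappop(heap)    # lightest candidate edge
        if v in in_tree:                 # both endpoints inside S: stale entry
            continue
        in_tree.add(v)
        tree.append((w, u, v))
        for w2, x in adj[v]:             # new edges now leaving S
            if x not in in_tree:
                heapq.heappush(heap, (w2, v, x))
    return tree
```

Each edge is pushed and popped at most twice, so this runs in $O(m \log m)$ time, matching the sorting bound for Kruskal's algorithm.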

Notice that the lemma proved above also holds for Prim’s algorithm: if edge $e$ was added, then by construction it is a minimum cost edge crossing the cut between the vertices already spanned by the tree and the remaining vertices.

## Running Time for Kruskal’s Algorithm

Sorting the edges takes $O(m \log m)$ time. If we naively check for cycles by running DFS on the current forest, each check takes $O(n)$ time (the forest has at most $n-1$ edges), and we perform one check for each of the $m$ edges. This gives a running time bound of $O(mn)$. Can we do better?

We can maintain, for each vertex, the label of the component containing it at any point in time. Checking whether adding an edge $e=(u,v)$ creates a cycle then reduces to looking up the component labels of $u$ and $v$ and checking whether they are the same (in which case we get a cycle) or different (in which case we don’t). How much time does it take to update the labels when an edge $e$ merges two components? We relabel only the vertices of the smaller component. Observe that every time a vertex is relabeled, the component containing it at least doubles in size. Since a component can grow to size at most $n$, each vertex is relabeled at most $\log_2{n}$ times, so the total time spent updating labels is $O(n\log{n})$. Together with the $O(m)$ lookups and the initial sort, Kruskal’s algorithm runs in $O(m\log{m})$ time.
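The component-list scheme above can be sketched as follows: `comp[v]` holds the label of $v$'s component, `members[c]` lists that component's vertices, and on a merge the smaller list is relabeled (a sketch; the function and variable names are my own):

```python
def make_components(n):
    comp = list(range(n))                  # comp[v] = component label of v
    members = {v: [v] for v in range(n)}   # label -> list of member vertices
    return comp, members

def same_component(comp, u, v):
    return comp[u] == comp[v]              # O(1) cycle check

def merge(comp, members, u, v):
    cu, cv = comp[u], comp[v]
    if len(members[cu]) < len(members[cv]):
        cu, cv = cv, cu                    # always relabel the smaller list
    for x in members[cv]:                  # each vertex relabeled <= log2(n) times
        comp[x] = cu
    members[cu].extend(members.pop(cv))
```

Used inside Kruskal's loop, `same_component` replaces the DFS cycle check, and the doubling argument above bounds the total cost of all `merge` calls by $O(n \log n)$.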