4.2 - Algorithm - Prim's minimum spanning tree algorithm - [dsa 2] by tim roughgarden

Okay, so it's time to discuss our first minimum spanning tree algorithm namely Prim's algorithm. Definitely a candidate for the greatest hits compilation. And again remember even though it's called Prim's algorithm, it was actually discovered earlier by Jarnik. So how's it work? Well before showing you any pseudo code, let's first illustrate it on an example. As we go through the example, I hope that the similarities to Dijkstra's shortest path algorithm will be evident. I'm going to work with the same example graph from the previous video with four vertices and five edges. The plan is to grow a tree one edge at a time. And we're going to keep growing this tree like a mold. We're going to start from just a seed vertex. And then we're going to suck up one new vertex with each iteration of the algorithm. So, this is similar to Dijkstra's Algorithm. In Dijkstra's Algorithm, it was clear where we should grow the initial mold from, because we were given a source vertex, that they're trying to compute the shortest paths out of. We have no source vertex in the minimum spanning tree problem, but it turns out that we can just pick an arbitrary vertex to start. Doesn't matter which one, which is cool. So the plan is in E generation we're going to add one edge and span one new vertex adjacent to the ones we're already spanning. Now as a greedy algorithm Prim is simply going to select the cheapest edge that allows it to span one additional new vertex. Now the start of the algorithm here we're not really spamming anything. We are sort of thinking of ourselves as growing from and currently spanning the vertex in the upper right. So what are the edges in which we can span an adjacent vertex? Well, there is two inches. There is the top inch that costs one then we'll span an addition in the upper left vertex or the is the edge with cost two on the right. If we include that, we'll be able to span the vertex in the bottom right. So we're not going to be greedy, we're just going to choose the cheaper edge, the edge of cost one. Now, the vertices that our tree thus far spans are the top two vertices. So, in the next iteration, we want to add one more edge [COUGH] to span one additional new vertex. And now we see that there are three edges sticking out of what we're spanning thus far that will allow us to span a new edge. There's the edges that have cost two, three, and four. The two and the three will allow us to span the vertex in the bottom right. If we pick the four, that will allow us to span the vertex in the bottom left. Yeah, and we're going to be greedy, so of these three candid edges, we're going to pick the cheapest one which is the edge of cost two. So now the mold that we've been growing is in effect, covers all of the verticies except for the one in the bottom left. So now in the final iteration we want to include one more edge so that we span that final remaining vertex. The one in the bottom left. Note that there's there was this edge of cause three that we never added. But it got sucked up into the tree that we grew anyways. So we're going to go ahead and ignore that. Adding the three wouldn't allow us to span any more vertices. In fact, it would create a loop which we don't want. So we're going to say, okay. We'll have the two edges that would allow us to span an extra vertex. There's the four and there's the five. We're going to be greedy, we're going to pick the four. And once we have the edges of the cost one and two and three and four we have a spanning tree there's no loops there's a path from any vertex to any other vertex along the pink edges, the cost is seven you might recall from the previous video this is indeed the minimum cost spanning tree of this graph. Of course, the fact that we have this simple procedure that works correctly in this toy example, which is four vertices and five edges, really means nothing. I mean you shouldn't draw any immediate conclusions that this is a good algorithm in general even though that is going to be the case. So let's next go and actually define the algorithm generally. So if you have a general graph, what does it mean to start somewhere and grow a mold, span one more vertex each iteration, always proceeding greedily until you are done. So lets spell out the pseudo code on the next slide. So here is Prim's minimum spanning tree algorithm. We're going to start with just two lines of initialization. We're going to maintain a set of vertices, capital X. This is meant to the be the vertices that we span so far. Again, we need some seed vertex from which to start the process. It doesn't matter where, which one we pick. We're going to get the same tree no matter what, so just call it little s. That's an arbitrary vertex from which we start growth. The other thing we're maintaining is, of course, the tree. So that's initially going to be empty. We're going to add one edge to it in each iteration. An invarient that we are going to maintain throughout the algorithm is that the edges that currently reside in the set capital T span the verticies that currently reside in the set capital X. Then we're going to have our main while loop. this is the workhorse of the algorithm. And it's very similar to the one in Dijkstra's algorithm. Namely, each iteration is responsible for picking one edge crossing the current frontier. advancing to include one new vertex. And again, it's going to be greed. The criterion's going to be different, in fact, simpler, than with Dijkstra's Algorithm instead of looking at links. We're just going to say, what's the cheapest edge that allows us to span a new vertex? So the loop's going to keep going, as long as there are vertices that we don't yet span. And then what we do is we search to the edges that allow us to span a new vertex. So which edges are those? Well we want there to be one endpoint in the set X of vertices we already have our tree spanning and we want the other end point to be non-redundant, so we want it to be outside of X. So if we have an edge that crosses the frontier in this sense, one endpoint in X, in endpoint outside that's how we increase the number of spanned vertices by one in an iteration. So if E is the cheapest edge amongst all of those that cross the front here with one end point on either side, that's the one we're going to add to our tree so far capital T in this iteration, it's end point that's not already in capital X, that's going to be the very text that we add to X in this iteration. And again the semantics of an iteration is that we're trying to increase the number of spanned vertices while paying as little as possible, that's the sense in which a prim's algorithm is a greedy algorithm. So as usual with a greedy algorithm, this seems natural enough, but it's not at all clear that it's correct, that it always computes in minimum spanning tree. In fact, if you think about it's not even obvious, it actually computes a spannin tree at all, minimum or otherwise, but it is correct. Let's make that statement precise on the next slide. So the key claim is that Prim's Algorithm is correct. Given any connected input graph, it is guaranteed to output a spanning tree with minimum possible cost. So before we delve into any details, let me just finish this video by telling you about the proof plan. We're going to prove this theorem in two parts. First, we're going to establish that it outputs some spanning tree. Maybe, maybe not minimum. Even that's non trivial. Then we'll worry about arguing that the spanning tree output actually is one of minimum cost. Both parts of the proof are interesting. For part one to argue that we output some spanning tree, we're going to review some preliminaries about graphs and about cuts and about spanning trees and graphs. For part two to argue optimality, we're going to rely on a very neat property of spanning trees, minimum spanning trees called the cut property. I'm happy to report so that the work that we do here and in both parts will bear further fruit later we're going to reuse these ingredients when we prove the correctness of another MST algorithm named McCrustgrals algorithm. For those of you who would much rather talk about running time than correctness don't worry your time will come after we wrap up this correctness proof I'll address how do you implement prim's algorithm quickly in particular using heaps we'll get the running time down to the near linear bound of O of M log n.