2 3 2b Random Graphs And Alternative Models 2004

Let's continue learning about how other training models can be useful, and also what some of their limitations are. So previously, we figured out the degree distribution. And we noted that for Erdos-Renyi Random Graphs, you don't really have hubs. That is, extremely well connected nodes. What we'll talk about next is the emergence of the giant component in Erdos-Renyi Random Graphs and also what their average shortest path is. What I'd like you to do is to go to this app which is a net logo app, to look at how the giant component comes about. In this model, when you click setup, it's just going to start with a number of nodes. So, here I've selected 80, maybe we should up that to. Well, 500, yes that's a nice number. So we are going to the max, and if you just click on go once, you'll be adding. Edges one by one, and this can take quite a while. [laugh] So what you'd like to do it just hit the go button with the little repeat and that will just continuously add edges. And if this is, for whatever reason, too slow for you, you can turn the layout off for a while, you can turn it back on, or you can accelerate by moving the speed slider as well. And what you're looking for is, at what point is the giant component going to start to emerge. So that takes us to the next quiz. Which is just what is the average degree of a node in the graph at the point where the giant component emerges? And, if you remember from week one, the giant component is a component that occupies a finite fraction of the graph as the graph grows. So the idea is that, even for very large graphs, if you have a significant, well, a fraction of the nodes being connected, then this is the giant component. This is somewhat related to percolation, which is studied in physics. So if you look at a 2D lattice, a percolation might look something like this. If this is soil and, the, the gray areas are actually where there's no soil, and so water can go through. The question is, at what density of these empty cells, that is what fraction of all the spots do they need to occupy, such that you start seeing the, the water seeping through? So I'd like you to go to this model as well, and I'll try to find it here, so you can, you know set, of a particular value for P. Here we're going to have 25 percent of the node, of the sites occupied, and we can set up the lattice, and here I can tell that it's just too small, right? It doesn't, the water can't percolate through. You can, instead of trying all possible values, you can also systematically vary p from zero to one. And so, here it is. So, what I'd like you to do is to answer the following quiz question, which is just, what is the percolation threshold in a two D lattice? What fraction of the sites need to be occupied? And this then, is related to our problem of the percolation threshold in the random graph, which is just going to tell us, at what density of the network. And here we're really talking about, by density, in this case, we mean the average degree of the, of each node. At what average degree do you start to see a giant component, and in fact, you should have seen that right around degree one is when the giant component starts to merge, at first its going to look thoroughly stringy and tree-like, and then over time its going to look more and more dense, over time meaning that you're adding more and more edges. Another way of looking at this, which is interesting from an [laugh] epidemiological perspective is that, you know. You want to know how can the disease spread through the network if people are connected, even if it's through several hops. It means that, theoretically, the disease could spread from person to person to person. And there's a nice property of the degree distribution. Because it has an exponential tail and an exponential distribution, will tell you that the average degree of a node that you are connected to excluding yourself. So, how many friends do they have besides you is the average degree in the network. They might say, well, wait a second. How can they have the average number of friends plus me? That's, like, average number plus one. Well. The people you're connected to are actually not average. If you think about it, the probability that you're connected to someone who has three friends is three times as much as the probability of being connected to someone who has just one friend. So, indeed, your, [laugh], your friends do tend to be more popular than average. Simply because more people knew them, and you're more likely to be one of those people. And so, then when we return to this idea of the giant component and the, and the percolation threshold what we're really asking is, you know, I have a friend, what is, what is the average number of other people that they know? If it's at least one other person on average, then this chain can continue. And, you know, were likely to be a part of this giant component through which information or disease etcetera can spread. However, if my friend has fewer than one other friend on average, then it's likely that this chain is just going to terminate sooner rather than later. You might be asking, why is there always just one giant component? Why can't the network accommodate several? In the Erdos-Renyi Random Graph and many other networks, the giant component will occupy some fraction of the network that scales linearly with the size of the, the graph. It means it's some fraction of n. And, all the other smaller components will occupy at least some factor times log n nodes. That they'll incorporate that many nodes. And the question is why, why is this? Well let's look at the counter-example. What if we have two large components. If we were to continue adding edges how long could they really stay separate before merging into one giant component? So what I'd like you to do is to look at this particular net logo app with two components. And what you'll be doing at first is, you're going to keep them separate. So this switch is going to be on. And you can select some number of nodes, 80 is fine. And then we're going to create the nodes and you're going to have the circles and the squares. And while you're keeping them separate, and you're adding edges, say one by one, they're only going to be occurring between the circles and between the squares, right? And so you're going to do this for a while. You can kind of [laugh] let it evolve, right? And what I'd like you to do then is to flip this switch, and this, and then add edges one at a time and see how many edges do you need to add before these components merge and hopefully that will help with some of the intuition about why we have just one giant component. So this is basically the quiz I'd like you to take, which is, tell me how many quakes it took before the components merge. The next thing that we can figure out about our Erdos-Renyi Random Graph is the average shortest path. And this is just how many hops on average does it take between any pairs of nodes. So here I've circled two nodes in blue and this is their average shortest path, it takes three hops. One, two, three. And we would do this for all pairs of nodes. Now what we want to know is can we use knowledge about the average degree and properties of the order Erdos-Renyi Random Graph to figure out how the average tortoise path should scale with the size of the network. So, the calculation that we're going to do, and its going to be an approximation. It's actually extremely difficult to derive exact relationships for the. For, you know, the exact shortest path for a particular model. But, for many of them you can figure out what the general scaling is going to be. So the approach here is that you have a node and it has some number of friends on average, so it might have say two friends at distance one and four friends on average at distance four, and eight friends on average at distance three if the average number of friends is, two. And here I've, I've you know, I've made each branching point to two, but in general, you know, some friends are going to have one other friend and other friends are going to have three other friends. It's the basic idea is that at distance L, you're going to have a number of L-th degree friends that's, that's proportional to c to the l. And this makes the scaling of the average shortest path equal to log n over log c. And again, this isn't an exact quantity but a scaling and approximation. So, there's something very cool about this, which is that. The graph can grow very, very large meaning that it has lots and lots of nodes, and yet any two nodes are going to be connected through a short number of hops. So, here on the X axis I've plotted the number of nodes and on the Y axis we have the average shortest path, and you can see that even as the network gets very large, you get the shortest path growing slowly. It's growing only logarithmically. So let's talk about logarithms for those of you who aren't as familiar about them. As you exponentiate the number for example with two, two to zero power is one. Two to the one is two. Two to the two is four. Two to the three is eight. That exponent is what would happen [laugh] is basically if you were to take the log base two of eight it would be three. Right? So when you place numbers on a logarithmic axis, what happens is that things that are powers of each others. So, one and ten, and ten and 100 will be equally spaced. And, so you'll be able to see exponential and logarithmic relationships between the variables. So if we plot out the number of nodes on the logarithmic axis, on the X axis, what we can see is then a straight line. Meaning that this is the growth of the, of the average shortest path. It's something like log n. And so, just to quiz you, I'd like to know what happens to the, the average shortest path in an Erdos-Renyi graph when it, when the number of nodes increases 100 fold. So, for example, you go from 100 nodes to 10,000 nodes. What will be the proportional change in the average shortest path? The second aspect that I'd like to discuss in addition to all of the neat things that we can predict with our Erdos-Renyi model is the question of how realistic it is. So. We're going to consider some alternative models, alternative ways of generating graphs and we're going to compare them in a series of little quizzes to what you would see with an equivalent Erdos-Renyi graph. So, I would like you to go to this particular [inaudible] model. And so let me just go to that model in fact. Okay, so in this model, you can you know, generate your regular Erdos-Renyi random graph. It's reporting several things back to you. So here's the degree distribution, the size of the giant component, the average degree and the average shortest path within the giant component. So probably you want to start off with using probabilities, p in stead of the number which is the number, average number of neighbors that you want in your network. And you can lay out this network and there's some parameters you can change. For example the, the length of the script. Things or the repulsion strand, and even the spring constant. And you can see how all of those work out for you. The main point is just that you see a little bit of the network structure and you can evaluate what the model is doing. Now, the first model that we'll be looking at is the introduction model. In the Erdos-Renyi model what you have is just nodes, forming edges at random in you might think well in a social net work may be that's not that realistic certainly some encounters are random however lot of people we meet through our friends. So. Keeping the same probability of linking, what I'd like you to do is, C asked you vary this probability of being introduced to your friends. So if it's zero, it means all of your edges are purely random. So it should look really similar to any Erdos-Renyi graph. However, if you increase this probability, most of the introductions most of the edges that you get should occur through introductions. And so, you may want to compare the structures, so let's see what the quiz asks. This is our model. We want to know relative to the Erdos-Renyi random graph whether there is difference in the number of edges, in the number of clues triads that, these guys were all three people linked to each other. The average shortest path, the degree distribution and the size of the giant component. The next model, we'll contain, we'll consider, is the static geographical model. So each node is going to try to connect to some number of it's closest neighbors. Some nodes may end up with more neighbors, simply because a lot of others were close to it and not, you know, but it already had it's three closest neighbors. And for this one, we'll want to turn off the layout. Just so we can actually see, the, the geography at work. I'm going to also switch to the null neighbors of determining the density of the graph. And so I'm just going to just click on stat to you and just highlighting the largest connected component here in red. And what we'd like to know about this one let's see what our, oops. What our quiz is. We would like to know relative to the Erdos-Renyi random graph, whether this kind of static geographical, geographically generated graph has longer, shorter, average shortest paths broader and narrower degree distribution and smaller or larger giant components. So, to compare something like that all you would do is, without changing any other parameters, you would click on Erdos-Renyi to generate an equivalent Erdos-Renyi random graph and then you would look at the size of its giant component, average degree, average shortest path, to contrast with the other model. So going back. The next one that we will consider is the random encounter model. This is somewhat similar to the static geographic model, [laugh]. Except that it's not static. So, people moved around but those encounters have to do whether they were proximate to other people. In fact, if they kind of bump into each other, it's when they form an edge. And then they can kind of like wander around from there, but they, they still retain that edge. And again, we want to compare with the Erdos-Renyi Random Graph. So for this, I also want to, at least initially, turn off the layout function. I'm going to click on random encounter, and see, for example, if the number of neighbors is two, what is the, what does the resulting structure look like. I can apply the layout algorithm and I want to, you know, for this one, you may want to see what happens if you have an average degree of three. Right, and we want to look at this network relative to the equivalent Erdos-Renyi, right? And so again, we want to answer very similar questions about, now are there more fewer clues triodes, is there a smaller or a larger giant component relative to the Erdos-Renyi. The next model we'll look at is a growth model. Here, instead of having all the nodes present upfront, as we do with the Erdos-Renyi model, we're going to be adding them over time just, intermittently, along with the edges that are being added. So you might ask, well, okay, what does that mean? Nodes that have been around longer, do they get more edges? We shall see. So in the growth model you're just clicking I won't switch to that model well I guess I might as well. Okay so you just click on growth and you're going to be similarly looking for things such as the presence or absence of hubs. So again looking at the degree distribution and also looking at the size of the giant component relative to the Erdos-Renyi Random Graph. What we've learned from these other models I hope is that in some instances the Erdos-Renyi model might be possible and it might match these other dynamics. So maybe the size of the giant component wasn't that different. Maybe the degree distribution was similar. However sometimes these different dynamics actually led to different graph properties. And so what we'll look at next is some alternative models more formally that can explain some of the things such as degree distributions observed in real world networks that don't quite match the Erdos-Renyi binomial degree distribution.