Tip:
Highlight text to annotate it
X
Let's continue learning about how other training models can be useful, and also
what some of their limitations are. So previously, we figured out the degree
distribution. And we noted that for Erdos-Renyi Random
Graphs, you don't really have hubs. That is, extremely well connected nodes.
What we'll talk about next is the emergence of the giant component in
Erdos-Renyi Random Graphs and also what their average shortest path is.
What I'd like you to do is to go to this app which is a net logo app, to look at
how the giant component comes about. In this model, when you click setup, it's
just going to start with a number of nodes.
So, here I've selected 80, maybe we should up that to.
Well, 500, yes that's a nice number. So we are going to the max, and if you
just click on go once, you'll be adding. Edges one by one, and this can take quite
a while. [laugh] So what you'd like to do it just
hit the go button with the little repeat and that will just continuously add edges.
And if this is, for whatever reason, too slow for you, you can turn the layout off
for a while, you can turn it back on, or you can accelerate by moving the speed
slider as well. And what you're looking for is, at what
point is the giant component going to start to emerge.
So that takes us to the next quiz. Which is just what is the average degree
of a node in the graph at the point where the giant component emerges?
And, if you remember from week one, the giant component is a component that
occupies a finite fraction of the graph as the graph grows.
So the idea is that, even for very large graphs, if you have a significant, well, a
fraction of the nodes being connected, then this is the giant component.
This is somewhat related to percolation, which is studied in physics.
So if you look at a 2D lattice, a percolation might look something like
this. If this is soil and, the, the gray areas
are actually where there's no soil, and so water can go through.
The question is, at what density of these empty cells, that is what fraction of all
the spots do they need to occupy, such that you start seeing the, the water
seeping through? So I'd like you to go to this model as
well, and I'll try to find it here, so you can, you know set, of a particular value
for P. Here we're going to have 25 percent of the
node, of the sites occupied, and we can set up the lattice, and here I can tell
that it's just too small, right? It doesn't, the water can't percolate
through. You can, instead of trying all possible
values, you can also systematically vary p from zero to one.
And so, here it is. So, what I'd like you to do is to answer
the following quiz question, which is just, what is the percolation threshold in
a two D lattice? What fraction of the sites need to be
occupied? And this then, is related to our problem
of the percolation threshold in the random graph, which is just going to tell us, at
what density of the network. And here we're really talking about, by
density, in this case, we mean the average degree of the, of each node.
At what average degree do you start to see a giant component, and in fact, you should
have seen that right around degree one is when the giant component starts to merge,
at first its going to look thoroughly stringy and tree-like, and then over time
its going to look more and more dense, over time meaning that you're adding more
and more edges. Another way of looking at this, which is
interesting from an [laugh] epidemiological perspective is that, you
know. You want to know how can the disease
spread through the network if people are connected, even if it's through several
hops. It means that, theoretically, the disease
could spread from person to person to person.
And there's a nice property of the degree distribution.
Because it has an exponential tail and an exponential distribution, will tell you
that the average degree of a node that you are connected to excluding yourself.
So, how many friends do they have besides you is the average degree in the network.
They might say, well, wait a second. How can they have the average number of
friends plus me? That's, like, average number plus one.
Well. The people you're connected to are
actually not average. If you think about it, the probability
that you're connected to someone who has three friends is three times as much as
the probability of being connected to someone who has just one friend.
So, indeed, your, [laugh], your friends do tend to be more popular than average.
Simply because more people knew them, and you're more likely to be one of those
people. And so, then when we return to this idea
of the giant component and the, and the percolation threshold what we're really
asking is, you know, I have a friend, what is, what is the average number of other
people that they know? If it's at least one other person on
average, then this chain can continue. And, you know, were likely to be a part of
this giant component through which information or disease etcetera can
spread. However, if my friend has fewer than one
other friend on average, then it's likely that this chain is just going to terminate
sooner rather than later. You might be asking, why is there always
just one giant component? Why can't the network accommodate several?
In the Erdos-Renyi Random Graph and many other networks, the giant component will
occupy some fraction of the network that scales linearly with the size of the, the
graph. It means it's some fraction of n.
And, all the other smaller components will occupy at least some factor times log n
nodes. That they'll incorporate that many nodes.
And the question is why, why is this? Well let's look at the counter-example.
What if we have two large components. If we were to continue adding edges how
long could they really stay separate before merging into one giant component?
So what I'd like you to do is to look at this particular net logo app with two
components. And what you'll be doing at first is,
you're going to keep them separate. So this switch is going to be on.
And you can select some number of nodes, 80 is fine.
And then we're going to create the nodes and you're going to have the circles and
the squares. And while you're keeping them separate,
and you're adding edges, say one by one, they're only going to be occurring between
the circles and between the squares, right?
And so you're going to do this for a while.
You can kind of [laugh] let it evolve, right?
And what I'd like you to do then is to flip this switch, and this, and then add
edges one at a time and see how many edges do you need to add before these components
merge and hopefully that will help with some of the intuition about why we have
just one giant component. So this is basically the quiz I'd like you
to take, which is, tell me how many quakes it took before the components merge.
The next thing that we can figure out about our Erdos-Renyi Random Graph is the
average shortest path. And this is just how many hops on average
does it take between any pairs of nodes. So here I've circled two nodes in blue and
this is their average shortest path, it takes three hops.
One, two, three. And we would do this for all pairs of
nodes. Now what we want to know is can we use
knowledge about the average degree and properties of the order Erdos-Renyi Random
Graph to figure out how the average tortoise path should scale with the size
of the network. So, the calculation that we're going to
do, and its going to be an approximation. It's actually extremely difficult to
derive exact relationships for the. For, you know, the exact shortest path for
a particular model. But, for many of them you can figure out
what the general scaling is going to be. So the approach here is that you have a
node and it has some number of friends on average, so it might have say two friends
at distance one and four friends on average at distance four, and eight
friends on average at distance three if the average number of friends is, two.
And here I've, I've you know, I've made each branching point to two, but in
general, you know, some friends are going to have one other friend and other friends
are going to have three other friends. It's the basic idea is that at distance L,
you're going to have a number of L-th degree friends that's, that's proportional
to c to the l. And this makes the scaling of the average
shortest path equal to log n over log c. And again, this isn't an exact quantity
but a scaling and approximation. So, there's something very cool about
this, which is that. The graph can grow very, very large
meaning that it has lots and lots of nodes, and yet any two nodes are going to
be connected through a short number of hops.
So, here on the X axis I've plotted the number of nodes and on the Y axis we have
the average shortest path, and you can see that even as the network gets very large,
you get the shortest path growing slowly. It's growing only logarithmically.
So let's talk about logarithms for those of you who aren't as familiar about them.
As you exponentiate the number for example with two, two to zero power is one.
Two to the one is two. Two to the two is four.
Two to the three is eight. That exponent is what would happen [laugh]
is basically if you were to take the log base two of eight it would be three.
Right? So when you place numbers on a logarithmic
axis, what happens is that things that are powers of each others.
So, one and ten, and ten and 100 will be equally spaced.
And, so you'll be able to see exponential and logarithmic relationships between the
variables. So if we plot out the number of nodes on
the logarithmic axis, on the X axis, what we can see is then a straight line.
Meaning that this is the growth of the, of the average shortest path.
It's something like log n. And so, just to quiz you, I'd like to know
what happens to the, the average shortest path in an Erdos-Renyi graph when it, when
the number of nodes increases 100 fold. So, for example, you go from 100 nodes to
10,000 nodes. What will be the proportional change in
the average shortest path? The second aspect that I'd like to discuss
in addition to all of the neat things that we can predict with our Erdos-Renyi model
is the question of how realistic it is. So.
We're going to consider some alternative models, alternative ways of generating
graphs and we're going to compare them in a series of little quizzes to what you
would see with an equivalent Erdos-Renyi graph.
So, I would like you to go to this particular [inaudible] model.
And so let me just go to that model in fact.
Okay, so in this model, you can you know, generate your regular Erdos-Renyi random
graph. It's reporting several things back to you.
So here's the degree distribution, the size of the giant component, the average
degree and the average shortest path within the giant component.
So probably you want to start off with using probabilities, p in stead of the
number which is the number, average number of neighbors that you want in your
network. And you can lay out this network and
there's some parameters you can change. For example the, the length of the script.
Things or the repulsion strand, and even the spring constant.
And you can see how all of those work out for you.
The main point is just that you see a little bit of the network structure and
you can evaluate what the model is doing. Now, the first model that we'll be looking
at is the introduction model. In the Erdos-Renyi model what you have is
just nodes, forming edges at random in you might think well in a social net work may
be that's not that realistic certainly some encounters are random however lot of
people we meet through our friends. So.
Keeping the same probability of linking, what I'd like you to do is, C asked you
vary this probability of being introduced to your friends.
So if it's zero, it means all of your edges are purely random.
So it should look really similar to any Erdos-Renyi graph.
However, if you increase this probability, most of the introductions most of the
edges that you get should occur through introductions.
And so, you may want to compare the structures, so let's see what the quiz
asks. This is our model.
We want to know relative to the Erdos-Renyi random graph whether there is
difference in the number of edges, in the number of clues triads that, these guys
were all three people linked to each other.
The average shortest path, the degree distribution and the size of the giant
component. The next model, we'll contain, we'll
consider, is the static geographical model.
So each node is going to try to connect to some number of it's closest neighbors.
Some nodes may end up with more neighbors, simply because a lot of others were close
to it and not, you know, but it already had it's three closest neighbors.
And for this one, we'll want to turn off the layout.
Just so we can actually see, the, the geography at work.
I'm going to also switch to the null neighbors of determining the density of
the graph. And so I'm just going to just click on
stat to you and just highlighting the largest connected component here in red.
And what we'd like to know about this one let's see what our, oops.
What our quiz is. We would like to know relative to the
Erdos-Renyi random graph, whether this kind of static geographical,
geographically generated graph has longer, shorter, average shortest paths broader
and narrower degree distribution and smaller or larger giant components.
So, to compare something like that all you would do is, without changing any other
parameters, you would click on Erdos-Renyi to generate an equivalent Erdos-Renyi
random graph and then you would look at the size of its giant component, average
degree, average shortest path, to contrast with the other model.
So going back. The next one that we will consider is the
random encounter model. This is somewhat similar to the static
geographic model, [laugh]. Except that it's not static.
So, people moved around but those encounters have to do whether they were
proximate to other people. In fact, if they kind of bump into each
other, it's when they form an edge. And then they can kind of like wander
around from there, but they, they still retain that edge.
And again, we want to compare with the Erdos-Renyi Random Graph.
So for this, I also want to, at least initially, turn off the layout function.
I'm going to click on random encounter, and see, for example, if the number of
neighbors is two, what is the, what does the resulting structure look like.
I can apply the layout algorithm and I want to, you know, for this one, you may
want to see what happens if you have an average degree of three.
Right, and we want to look at this network relative to the equivalent Erdos-Renyi,
right? And so again, we want to answer very
similar questions about, now are there more fewer clues triodes, is there a
smaller or a larger giant component relative to the Erdos-Renyi.
The next model we'll look at is a growth model.
Here, instead of having all the nodes present upfront, as we do with the
Erdos-Renyi model, we're going to be adding them over time just,
intermittently, along with the edges that are being added.
So you might ask, well, okay, what does that mean?
Nodes that have been around longer, do they get more edges?
We shall see. So in the growth model you're just
clicking I won't switch to that model well I guess I might as well.
Okay so you just click on growth and you're going to be similarly looking for
things such as the presence or absence of hubs.
So again looking at the degree distribution and also looking at the size
of the giant component relative to the Erdos-Renyi Random Graph.
What we've learned from these other models I hope is that in some instances the
Erdos-Renyi model might be possible and it might match these other dynamics.
So maybe the size of the giant component wasn't that different.
Maybe the degree distribution was similar. However sometimes these different dynamics
actually led to different graph properties.
And so what we'll look at next is some alternative models more formally that can
explain some of the things such as degree distributions observed in real world
networks that don't quite match the Erdos-Renyi binomial degree distribution.