>> JOHN HEIDEMAN: Okay. My name is John. These are my colleagues at USC, and this work is sponsored
by the Department of Homeland Security. So we started with this question of what can
pings tell us about damage. So this is a way we use to diagnose problems on the
Internet, what happens to the infrastructure. But we actually have a broader goal which is not
just to track outages here but in all the world. We want to know
about natural disasters, but we also want to know about, unfortunately, human disasters:
Egypt, and Libya, and Syria. We want to know about the shape of outages, long-term
outages or things that may affect people for a shorter time. And not just this core routing
infrastructure, but also home networks and end systems. Because we actually believe that
those end systems are where a lot of the outages happen.
So as some background, what pings do is tell us if an IP address is active. And so you
have the ping command and, over time, you get back your replies. Those are green dots. You can
interpret this. So it tells you how that address is being used. You can also get negative replies.
So if we send a ping and get no reply, that's a black dot. And so by combining these you
can tell sort of what's going on in the network.
So pings tell you something about the network, because if you get a bunch of positive replies
that says that part of the network is up or at least something in that part of the network
is up. That's pretty simple. And you would like to think that if you get negative replies
that says that part of the network is down, but unfortunately, they don't quite tell you
enough. Because there's a bunch of ambiguity in negative replies. Could be the network's
down, it could be that somebody shut their laptop or it could be that the addresses get
reassigned dynamically and change or have a firewall or all this other stuff. Okay?
So the challenge is how do we disambiguate negative replies, to tell a hurricane striking
from someone closing his laptop, because those have different importance, of course. And so this
ambiguity is the challenge that we deal with. And to deal with this, we look at not just
one address, which would be very difficult to disambiguate, but at a bunch of addresses
that are adjacent to each other. And so if you probe a bunch of addresses you'll get
a bunch of responses, but if over time a bunch respond and then a bunch stop responding, that's
probably a good indication that the network, which used to be healthy and happy and up,
is no longer up. That's what we call an outage. That's how we detect outages.
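The idea just described, judging a block by how many of its adjacent addresses reply rather than by any single address, can be sketched roughly as follows. This is only an illustration, not the method used in the work: the function names, the `warmup` window, and the `drop_ratio` threshold are all assumptions made up for the example.

```python
from typing import List, Sequence


def responding_fraction(replies: Sequence[bool]) -> float:
    """Fraction of addresses in the block that replied in one probing round."""
    return sum(replies) / len(replies)


def outage_rounds(
    rounds: Sequence[Sequence[bool]],
    warmup: int = 3,          # hypothetical: rounds used to learn "normal"
    drop_ratio: float = 0.1,  # hypothetical: below 10% of normal counts as down
) -> List[int]:
    """Return the indices of rounds in which the whole block looks down.

    Each element of `rounds` holds one True/False per address in the block
    (True = positive reply, False = no reply).
    """
    if len(rounds) < warmup:
        raise ValueError("need at least `warmup` rounds to form a baseline")
    fractions = [responding_fraction(r) for r in rounds]
    baseline = sum(fractions[:warmup]) / warmup
    if baseline == 0:
        # A block that never answered gives us nothing to compare against.
        return []
    return [
        i
        for i, f in enumerate(fractions)
        if i >= warmup and f < drop_ratio * baseline
    ]


# Toy 8-address block: one address going quiet is not an outage,
# but every address going quiet at once is.
up = [True] * 6 + [False] * 2
one_laptop_closed = [True] * 5 + [False] * 3
down = [False] * 8
history = [up, up, up, one_laptop_closed, down, down, up]
print(outage_rounds(history))
```

A single address going silent barely moves the block's responding fraction, which is why the block-level view helps separate a closed laptop from a network that is actually down.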
So we do this in blocks around the Internet. So this bar here represents 256 addresses
in one /24; that's a block of consecutive addresses on the Internet. And you can see
a bunch of green and black dots. This is real data and you can see an outage in the middle
of there, that little black line. And if we do this to a whole bunch of blocks, you can
see sometimes those outages are correlated: adjacent blocks have an outage at the
same time, possibly caused by the same thing. Sometimes we see outages in other blocks at
different times. So we map the raw ping data to the outages we detect.
We map each block into a single line and then we -- and then we cluster those lines, and
you can see patterns in here -- well, I can see patterns in here of lines that show up
next to each other, and they represent different outages. And just to give you a better -- so
we see a couple of different outages in data that we took in the past. And what
I'm going to show you is plots like this, where we have time going across the X-axis,
we have blocks going up the Y-axis, and you see colored splotches in the middle. Those
are outages, and in fact, this data here is our data on Hurricane Sandy. So I'll come
back to that in a little bit. I don't want to get ahead of myself, but -- so for the Hurricane
Sandy data, we reanalyzed an existing data set that we've been collecting. We've been
collecting data in this way since 2006. So we probe a random sample of 41,000 /24 blocks
in the Internet. Each survey runs for two weeks, and the data is available. And the technical
details about how we collect it are also available on this slide.
So, getting very specific about Sandy, we took one of our data sets, 41,000 blocks, and we looked
at about 12,000 of those in the U.S. Of those 12,000, about 4,000 are ones we
can analyze, and this picture on the right shows where we geolocate those blocks to.
Some of the blocks -- this big blob in the Atlantic is not sand. These are blocks where we can't
tell the location. This is our data and this is Sandy. Three days before, you see a pretty calm network.
Then when Sandy makes landfall, which was very conveniently at midnight UTC, you can
see a bunch of blocks go out. Okay? A bunch of networks go out in our sample. So, the
next step you want to take is to quantify this. It's easy to guess, yes, of course,
networks went down. We look at the marginal distribution here. If you sum up the number
of colored dots in each column, that's the number that appears in this thing here. It
represents the percentage of the world, the percentage of the U.S., that's down at any
given instant.
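The column sum described above can be sketched like this, assuming the plot is stored as a grid with one row per block and one column per measurement round, where True means the block was in outage. The grid layout and the function name are assumptions for illustration; the values are a toy example, not the Sandy data.

```python
from typing import List, Sequence


def percent_down_per_round(outage_grid: Sequence[Sequence[bool]]) -> List[float]:
    """Marginal distribution over time: percent of blocks down in each round.

    outage_grid[b][t] is True when block b is in outage during round t.
    """
    n_blocks = len(outage_grid)
    n_rounds = len(outage_grid[0])
    return [
        100.0 * sum(row[t] for row in outage_grid) / n_blocks
        for t in range(n_rounds)
    ]


# Toy grid: 4 blocks (rows) observed over 3 rounds (columns).
toy_grid = [
    [False, True, False],
    [False, True, True],
    [False, False, False],
    [False, False, False],
]
print(percent_down_per_round(toy_grid))
```

Each output value corresponds to one column of the plot, so the list as a whole is the curve of "percent of sampled blocks down" over time.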
This plots that marginal distribution. The blue line is the median per day. Each red X is
a measurement over 11 minutes. And what you can see is, first, the Internet's a big place.
Some of it is always down. That's a fact of life. Right? And our rate, which is reasonably
steady, is about two-tenths of a percent. And you can see that very clearly in the three
days before landfall. After landfall we see that rate double. So if you just look at
the U.S. -- there was twice as much outage the day after Sandy hit. Okay?
And then the final thing you can see in our data is that after four days it pretty much came
back to the baseline. So, the neat thing is, by doing these stupid things, these simple
measurements, and analyzing them in the right way, you can get quite a bit of information
and understanding about what's going on on the ground. So, the next step is, you
know, is this really Sandy, or is this just noise, because we do see all kinds of stuff in our
data. So to demonstrate that this was actually correlated with Sandy we
geolocated all these IP addresses, all these blocks, I'm sorry, and you can see the colored
bars in the middle. The light colored bars are New York and New Jersey. So the big uptick
is correlated with outages in New York and New Jersey, which is pretty compelling that what we
are seeing here is the effects of Sandy. And we actually geolocated and plotted these on the
map, and so you can see three days before, not much in the Northeast; three days, four days
after you can see -- I should stop waving my hand. You can see a lot of stuff in the
New York, New Jersey area. And then it tapers off. So we find this is pretty compelling
evidence of Sandy and a quantification of how much damage was seen, at least in our
sample.
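The per-day median and the comparison against the days before landfall can be sketched as below. This is an illustrative reading of the plot, not the analysis code behind it: the function names are assumptions, and the sample values are made up for the example rather than taken from the measurements.

```python
from statistics import median
from typing import Dict, Iterable, List, Tuple


def daily_medians(samples: Iterable[Tuple[int, float]]) -> Dict[int, float]:
    """Median percent-down per day.

    Each sample is (day, percent_down), where day is an integer offset
    relative to landfall (negative = before landfall).
    """
    by_day: Dict[int, List[float]] = {}
    for day, pct in samples:
        by_day.setdefault(day, []).append(pct)
    return {day: median(values) for day, values in sorted(by_day.items())}


def ratio_to_baseline(
    medians: Dict[int, float], baseline_days: Iterable[int]
) -> Dict[int, float]:
    """Express each day's median as a multiple of the pre-event baseline."""
    baseline = median(medians[d] for d in baseline_days)
    return {day: m / baseline for day, m in medians.items()}


# Made-up samples: a calm day before landfall, then a worse day.
toy_samples = [
    (-1, 0.2), (-1, 0.2), (-1, 0.3),
    (0, 0.4), (0, 0.4), (0, 0.5),
]
per_day = daily_medians(toy_samples)
print(ratio_to_baseline(per_day, baseline_days=[-1]))
```

Using the median per day rather than the mean keeps a few noisy short measurements from moving the daily figure very much.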
So, this is Sandy. Just as evidence of the generality of this approach,
we actually have data from a number of other major events. This is the Japanese
earthquake in March 2011. This is the Egyptian revolution; of course, we started
our survey just after they shut off the network, but we can see it come back on.
We can see the big world events, but the neat thing is that we also see less publicized events.
There is also a pretty big outage in Australia. It didn't make the news because there was
no Australian revolution and because they have a much bigger footprint than Egypt. And
so most of us know what's up. But it's important to know these smaller events as well, particularly
if we try to improve the resiliency of the infrastructure over time. This is a two-week sample where
you see outages in America. So we can start to get a handle on those as well. Two American
carriers in one number. Okay. So the bottom line is, our goal is not just big world
events but to try to understand the resiliency of the Internet as a whole, so that once we can
measure something we can do a better job of improving it.
So we're actually on our way to trying to accomplish this task. And the first step to
understanding the U.S. infrastructure as a whole is to track all IP blocks, not just a
random sample. The challenge here is our current approach takes -- is quite traffic-intensive.
And the neat thing is, we think we can get our traffic rates down to about 20 probes
an hour, which is less than 1% of the background radiation. It's less -- it would be a tiny
increase in the amount of traffic that you would see just by being on the Internet at all.
So when we get to this point, a single machine can track outages in the entire IP address space,
and we hope to show results on that very shortly.
So just to summarize, we showed that with pings you can track the effects of natural
disasters on critical infrastructure like the Internet. Details about this are
in our technical work, and the data is available, and I'd love your feedback. So, you want to
hold the questions to the end. So I think I'll hand off to our next speaker.