>> [MUSIC]
>> DAVID J. MALAN: All right.
So this is CS50 and this is the end of week 10.
So some of you might have seen this already, but being circulated of late
is an article that I thought I'd read an excerpt from and then show you a
three minute video that paints the same picture.
It was really a touching story, I thought, of this intersection of the
real world with genuinely compelling uses of technology.
>> So the article was entitled, "A boy oversleeps on train, uses Google Maps
to find family 25 years later." And the first couple of paragraphs were,
"When Saroo was five years old he went with his older brother to scrounge for
change on a passenger train in a town about two hours
from his small hometown.
Saroo became tired and hopped on a nearby train where he thought his
brother was, then fell asleep.
When he woke up he was in Calcutta, nearly 900 miles away.
Saroo tried to find his way back, but he didn't know
the name of his hometown.
And as a tiny illiterate boy in a vast city full of forgotten children he had
virtually no chance of getting home.
>> He was a street child for a while until a local adoption agency hooked
him up with an Australian couple who brought him to
live in Hobart, Tasmania.
Saroo moved there, learned English, and grew up.
But he never stopped looking for his family and his hometown.
>> Decades later, he discovered Google Earth and followed rail tracks.
And giving himself a prescribed radius based on how long he thought he was
asleep and how fast he thought the train was going, he knew he'd grown up
in a warm climate, he knew he spoke Hindi as a child, and he'd been told
that he looked like he was from East India.
>> Finally, after years of scouring the satellite photos, he
recognized a few landmarks.
And after chatting with an administrator of a nearby town's
Facebook page, he realized he'd found home."
>> So here then is the video telling that tale from his perspective.
>> [VIDEO PLAYBACK]
>> -It was 26 years ago and I was just about to turn five.
We got to the train station and we boarded a train together.
My brother just said I'll stay here and I'll come back.
And I just thought, well, you know, I might as well just go to sleep and
then he'll just wake me up.
And when I wake up the next day, the whole carriage was empty on a runaway
train, a ghost train taking me I don't know where.
>> I was adopted out to Australia to a Australian family.
And Mom had decorated my room with the map of India, which she
put next to my bedside.
I woke up every morning seeing that map, and hence, it sort of kept the
memories alive.
>> People would say, you're trying to find a needle in a haystack.
Saroo, you'll never find it.
I'd have flashes of the places that I used to go, the flashes
of my family's faces.
There was the image of my mother sitting down with her legs crossed
just watching her cry.
Life is just so hard.
That was my treasure.
>> And I was looking in Google Map and realized there's Google Earth as well.
In a world where you could zoom into I started to have all these thoughts and
what possibilities that this could do for me.
I said to myself, well, you know, you've got all the photographic
memories and landmarks where you're from and you know what
the town looks like.
This could be an application that you can use to find your way back.
>> I thought, well, I'll put a dot on Calcutta Train Station in a radius
line that you should be searching around this area.
I came across these train tracks.
And I started following it and I came to a train station which reflected the
same image that was in my memories.
>> Everything matched.
I just thought, yep.
I know where I'm going.
I'm just going to let the map that I have in my head to lead me and take me
back to my hometown.
>> I came to the doorstep of the house that I was born and walked around
about fifteen meters around the corner.
There was three ladies standing outside adjacent to each other.
And the middle one stepped forward.
And I just thought, this is your mother.
She came forward, she hugged me, and we were there for about five minutes.
>> She grabbed my hand and she took me to the house and got on the phone and she
rang my sister and my brother to say that your brother has just all of the
sudden appeared like a ghost.
>> And then the family was reunited again.
Everything's all good.
I help my mother out.
She doesn't have to be slaving away.
She can lead the rest of her life in peace.
>> It was a needle in a haystack, but the needle was there.
Everything's there.
Everything we have in the world is the tap of a button.
But you've got to have the will and the determination to wanting it.
>> [END VIDEO PLAYBACK]
>> So a really sweet story.
And it actually reminds me of quite a topic that's been getting quite a bit
of attention of late in The Crimson, more nationally in general.
Especially as MOOCs are taking the stage of late.
MOOCs being these massive and open online courses of which CS50 is one.
>> And people talking about how, for instance, the humanities aren't really
catching up or aren't nearly as in vogue as they once were.
And I would encourage you guys, much like Jonathan did on Monday, to think
about as you exit 50, and we know already about 50% of you will not
continue on to take another computer science course, and that's totally
fine and expected.
Because one of the overarching goals of a class like this is really to
empower you guys with just an understanding of how all of this stuff
works and how this world of technology works.
>> So that when you are back in your own worlds, whether it's pre-med or
whether it's the humanities or the social sciences or some other field
altogether, that you guys are bringing some technical savvy to the table and
helping to make smart decisions when it comes to the use of and
introduction of technology into your world.
>> For instance I was reminded of late too of two of the undergraduate
classes I took two years ago, which were such simple uses of technology
but ever so compelling.
First Nights with Professor Tom Kelly if you've taken the class.
It's a class on classical music on this stage here where you learn a
little something about music.
It's actually from First Nights that CS50 borrowed the idea of tracks for those
less comfortable, in between, and more comfortable.
>> In my time they had different tracks for kids with absolutely no music
experience like me, and then kids who had been performing since they were
five years old.
And that class, for instance, just had a website like most any other, but it
was a website that allowed you to explore music on it and play back
musical clips from class, from the web, and just use technology in a very
seamless way.
>> Another class years later that I audited, essentially, in grad school,
Anthro 1010, Introduction to Archaeology here.
It was amazing.
And one of the most compelling yet super obvious, in retrospect, uses of
software was that the professors in that class used Google Earth.
We were sitting across the street in some lecture hall.
And you couldn't travel, for instance, to the Middle East to the dig that one
of the professors had just come back from, but we could do that virtually by
flying around in Google Earth and looking at a bird's eye view at the
dig site he had just returned from a week ago.
>> So I would encourage you guys, especially in the humanities, to go
back to those departments after this class bringing your final projects
with you or ideas of your own, and see just what you can do to infuse your
own fields in humanities or beyond with a little bit of this sort of
thing that we've explored here in CS50.
>> So with that picture painted, thought we'd try to tackle two things today.
One, try to give you a sense of where you can go after 50.
And in particular, if you choose to tackle a web based project as is
incredibly common, how you can go about taking off all of CS50's
training wheels and going out there on your own and not having to rely on a
PDF or a specification of a pset,
not having to rely on a CS50 appliance anymore,
but really pulling yourself up by your bootstraps.
>> With that said, C-based final projects are welcome.
Things that use the Stanford Portable Library and
graphics are welcome.
We just know that statistically a lot of people bite off projects in PHP and
Python and Ruby and MySQL and other environments, so we'll bias some of
our remarks toward that.
>> But a quick look back.
So we took for granted in pset7 the fact that $_SESSION existed.
This was a super global, a global, associative array.
And what does this let you do?
Functionally, what's the feature this gives us?
Yeah?
To track the user's ID.
And why is this useful?
To be able to store inside of this super global JHarvard or Skroob's
or Malan's user ID when he or she visits a site.
>> Exactly.
So you don't have to log in again and again.
It would be a really lame world wide web if every time you clicked a link
on a site like Facebook or every time you clicked on an email in Gmail you
had to re-authenticate to prove that it's still you and not your roommate
who might have walked up to your computer in your absence.
>> So we use SESSION to just remember who you are.
And how is this implemented underneath the hood?
How does a website that uses HTTP, the protocol that web browsers and servers
speak, how does HTTP, which is a stateless protocol, let's say.
>> And by stateless I mean, once you connect to a website, download some
HTML, some JavaScript, some CSS, your browser's icon stops spinning.
You don't have a constant connection to the server typically.
That's it.
There's no state maintained constantly.
So how is SESSION implemented in such a way that every time you do visit a
new page, the website remembers who you are?
What's the underlying implementation detail?
Shout it out.
It's one word.
>> Cookies.
All right.
So cookies.
Well, how are cookies used?
We'll recall that a cookie is generally just a piece of information.
And it's often a big random number, but not always.
And a cookie is planted on your hard drive or in your computer's RAM so
that every time you revisit that same website, your browser reminds the
server, I am user 1234567.
I am user 1234567.
>> And so long as the server has remembered that user 1234567 is
JHarvard, the website will just assume that you are who you say you are.
And recall that we present these cookies sort of in the form of a
virtual hand stamp.
It's sent in the HTTP headers just to remind the server that you are who it
thinks you are.
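As a rough illustration of the mechanism just described, here is a minimal sketch in Python (not the PHP that pset7 used). The cookie name `session_id`, the in-memory `sessions` table, and both function names are assumptions made up for this example; a real framework handles these details for you.

```python
import secrets
from http.cookies import SimpleCookie

# Hypothetical server-side table: session ID -> user name.
sessions = {}


def log_in(username):
    """Create a session and return the cookie string the server would set."""
    session_id = secrets.token_hex(16)  # the "big random number"
    sessions[session_id] = username
    cookie = SimpleCookie()
    cookie["session_id"] = session_id
    # OutputString() yields "session_id=<value>", without the header name.
    return cookie["session_id"].OutputString()


def who_is_this(cookie_header):
    """Look up the user from the Cookie header a browser sends back."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("session_id")
    if morsel is None:
        return None
    # The server simply trusts whoever presents a known session ID.
    return sessions.get(morsel.value)
```

The lookup in `who_is_this` checks only that the ID is known, not who is presenting it, which is why the value of the cookie is worth protecting.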
>> Of course, there's a threat.
What threat does this open us up to if we're essentially using sort of a club
or an amusement park mechanism for remembering who we are?
>> If you copy someone's cookie and hijack their session, so to speak, you
can pretend to be someone else and the website most likely is just going to
believe you.
So we'll come back to that.
Because the other theme for today beyond empowerment is also talking
about the very scary world we live in and just how much of what you do on
the web, how much of what you do even on your cell phones today can be
tracked really by anyone between you and point B.
>> And Ajax, recall.
We looked only briefly at this, although you've been using it
indirectly in pset8 because you're using Google Maps and because you're
using Google Earth.
Google Maps and Google Earth don't download the entire world to your
desktop, obviously, the moment you load pset8.
It only downloads a square of the world or a bigger square of the earth.
And then every time you sort of steer out of range you might notice--
especially if on a slow connection-- you might see some gray for a moment
or a bit of fuzzy imagery as the computer downloads more such tiles,
more such imagery from the world or the earth.
>> And Ajax is generally the technique by which websites are doing that.
Once you need more of the map, your browser is going to use Ajax, which is
not itself a language or technology, it's just a technique.
It's the use of JavaScript to go get more information from a server that
allows your browser to go get what's to the east or what's to the west of
what's otherwise currently being shown in that map.
So this is a topic that many of you will encounter either directly or
indirectly via final projects if you choose to make something that's
similarly dynamic that's pulling data from some third party website.
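The bookkeeping behind "download more tiles when you steer out of range" can be sketched as follows. In a browser this logic would live in JavaScript and the fetches would be background HTTP requests; Python is used here only to show the idea. The integer tile grid and the function names are assumptions for illustration, not how Google Maps is actually implemented.

```python
def tiles_in_view(x_min, y_min, x_max, y_max):
    """Return the set of (x, y) tile coordinates covering a viewport."""
    return {
        (x, y)
        for x in range(x_min, x_max + 1)
        for y in range(y_min, y_max + 1)
    }


def tiles_to_fetch(viewport, loaded):
    """Return only the tiles in the viewport that are not yet downloaded."""
    return tiles_in_view(*viewport) - loaded


# Usage: the map starts on a 2x2 block of tiles, then the user pans east.
loaded = tiles_in_view(0, 0, 1, 1)
missing = tiles_to_fetch((1, 0, 2, 1), loaded)
print(sorted(missing))  # only the new column of tiles needs to be requested
```

Only the newly exposed column is requested; everything already on screen is reused, which is why the gray patches appear just at the edge you are moving toward.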
>> So we've got a really exciting next Wednesday ahead.
Quiz one, the information for which is on CS50.net already.
Know that there'll be a review session this coming Monday at 5:30.
The date and time is already posted on CS50.net in that About sheet.
And do let us know if you have any questions.
Pset8 meanwhile is already in your hands.
>> And let me just address one FAQ to save folks some stress.
For the most part a lot of the chatter we see at office hours and a lot of
the bugs we see reported on Discuss are indeed bugs in a student's code.
But when you've encountered something like the Google Earth plug-in crashing
or not even working and you are confident it's not you, it's not a
chmod issue, it is not a bug you introduced into the
distribution code.
>> Realize just FYI--
this is sort of plan Z--
that the last time we used this problem set and we ran into similar
issues, there's a line of code in service.js that essentially is this,
that says, turn buildings on.
And the workaround the last time we did this in, again, corner cases where
students just couldn't get the darn thing to work is change true to false
in that one line of code.
And you'll find it if you search through service.js.
>> I don't recommend this because you will create the most barren landscape
of Cambridge, Massachusetts.
This will literally flatten your world so that all you see are the teaching
fellows and course assistants on the horizon and no buildings.
But realize for whatever reason the Google Earth plug-in seems still to be
buggy a year later, so this might be your fail-safe.
So rather than resort to tears, resort to turning buildings off if you know
it's the plug-in that's not cooperating on your Mac or PC.
But, this is again last resort if you're sure it's not a bug.
>> So the Hackathon.
A couple of teasers just to get you excited.
We had quite a few RSVPs.
And just to paint a picture of what awaits, I thought I'd give you a few
seconds recall of this imagery from last year.
>> [MUSIC]
>> DAVID J. MALAN: Wait, oh.
We even have our literal CS50 shuttles.
>> [MUSIC]
>> DAVID J. MALAN: So that's what awaits you in terms of the Hackathon.
And this will be an opportunity, to be clear, not to start your final
projects but to continue working on your final projects alongside
classmates and staff and lots of food.
And again, if you're awake at 5:00 AM we'll take you down the road to IHOP.
>> The CS50 fair, meanwhile, is the climax for the entire class where
you'll bring your laptops and friends, maybe even family to a room on campus
down the street to exhibit your projects on laptops, on tall tables
like this with lots of food and friends and music in the background,
as well as our friends from industry.
Companies like Facebook and Microsoft and Google and Amazon and bunches of
others so that if interested in just hearing about the real world or
chatting with folks about real world internship or full time opportunities,
know that some of our friends from industry will be there.
And a couple of pictures we can paint here are as follows.
>> [MUSIC]
>> DAVID J. MALAN: All right.
So that then is the CS50 fair.
So let's now proceed to tell a story that really will empower you hopefully
for things like final projects.
So a few little things to seed your mind, either for final projects
or just more generally for projects that you might decide to tackle after
the course. These are all documented on manual.cs50.net, the CS50
manual, where we have lots of techniques documented.
>> And this is just shorthand notation for saying that there exists in the
world things called SMS to email gateways, which is a fancy way of
saying, there's servers in the world that know how to convert emails to
text messages.
So if for your final project you want to create some sort of mobile themed
service that allows you to alert friends or users to events on campus
or what's being served in the D Hall that night or any such alert feature,
know that it's as simple as sending an email, as with PHPMailer, which you
might have used for pset7 or we saw briefly a week or so ago, to
addresses like this.
>> And in fact you can text this assuming your friend has an unlimited texting
plan and you don't want to charge them $0.10.
But if you send an email to your friend who you know to have Verizon or
AT&T using Gmail and just sending it to their phone number at whatever the
sub domain there is, realize you will send a text message.
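A minimal sketch of the idea, in Python rather than PHPMailer: the text message is just an ordinary email whose recipient is the phone number at the carrier's gateway domain. The gateway domain is carrier-specific and is not given in this transcript, so it is a parameter here; the function names and the SMTP settings are likewise assumptions for illustration.

```python
import smtplib
from email.message import EmailMessage


def build_text_message(phone_number, gateway_domain, sender, body):
    """Build an email addressed to <digits>@<carrier gateway domain>."""
    # Keep only the digits, so "(617) 555-0100" becomes "6175550100".
    digits = "".join(ch for ch in phone_number if ch.isdigit())
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = f"{digits}@{gateway_domain}"
    msg.set_content(body)
    return msg


def send(msg, smtp_host, smtp_port=587, username=None, password=None):
    """Send the message through an SMTP server you are allowed to use."""
    with smtplib.SMTP(smtp_host, smtp_port) as server:
        server.starttls()
        if username is not None:
            server.login(username, password)
        server.send_message(msg)
```

Building the message is kept separate from sending it so that the addressing logic can be checked without actually texting anyone.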
>> But this is one of those things to be careful of.
If you troll through last year's CS50 videos I think it was, a horrific,
horrific, horrific bug I wrote in code ended up sending about 20,000 text
messages live to our students in class.
And only because someone noticed that they were getting multiple text
messages from me did I have the wherewithal to hit Control C quickly
and stop that process.
Control C, you recall, is your friend in instances of infinite loop.
So beware the power we have just given to you rather irresponsibly, most
likely, based on my own experience.
But that's on the web and has been there for some time.
>> All right.
So textmarks.com.
So this is a website.
And there's bunches of others out there as well that we've actually used
as a class for years to be able to receive text messages.
Unfortunately, sending text messages is as easy as sending emails like that.
Receiving's a little harder, especially if you want to have one of
those sexy short codes that's only five or six digits long.
>> So for instance, for years you've been able to send a text message-- and you
can try this as well--
to 41411.
And that's the phone number for this particular startup.
And if you send a message to 41411--
I'll just write it up here, so 41411--
and then send them a message like SBOY for Shuttle Boy.
And then type in something like mather quad.
So you send that text message to that phone number.
Within a few seconds you should get back a response from the CS50 Shuttle
Boy service, which is the shuttle scheduling software that we've had out
there on the web for some time.
And it will respond to you via text message.
>> Because what we have done as a class, as programmers, is to write software and
configure our free account with TextMarks to listen for text messages sent
to SBOY at that number.
And what they do is forward those text messages to our PHP-based website as
HTTP parameters saying, here.
This user with this phone number sent you this text message.
Do with it what you want.
>> So we wrote some software that upon receiving a string like SBOY mather
quad, we parse it.
We figure out where the spaces are between words.
And we as a class decide how to respond to that.
And if you try that now, for instance, you should see, via response within a
few seconds, the next few shuttles going from mather to the quad if any.
And there's other stops.
You can type in Boylston or other such stops on campus, and it should
recognize those words.
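The parsing step might look something like the sketch below. This is not the actual Shuttle Boy code (which was PHP-based); the function name is hypothetical, and the set of stops contains only the three names mentioned in lecture.

```python
# Only the stops named in lecture; the real service knows more.
KNOWN_STOPS = {"mather", "quad", "boylston"}


def parse_sboy(text):
    """Split 'SBOY <origin> <destination>' into an (origin, destination) pair.

    Returns None if the message is not in that form or names an unknown stop.
    """
    # split() with no arguments breaks on any run of whitespace.
    words = text.lower().split()
    if len(words) != 3 or words[0] != "sboy":
        return None
    origin, destination = words[1], words[2]
    if origin not in KNOWN_STOPS or destination not in KNOWN_STOPS:
        return None
    return origin, destination
```

Lowercasing first means "SBOY Mather Quad" and "sboy mather quad" are treated the same, which matters when people are typing on phones.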
>> So parse.com.
This is another service that we've been pointing some students at for
final projects that's wonderful in that it's free for a
reasonable amount of usage.
And if I go to parse.com you'll see that this is an alternative to
actually having something like your own MySQL database.
And frankly, it's just kind of mesmerizing.
This is what's inside of the cloud even on a cloudy day.
>> So parse.com allows you to do a bunch of interesting things.
And there's other alternatives to this out there.
For instance, you can use them as your back end database.
So you don't need to have a web hosting company.
You don't need to have a MySQL database.
You can instead use their back end.
>> If you're doing a mobile project for Android or iOS or the like, know that
there exists things like push services so you can push alerts to your friends
or your users' home screens.
And then a bunch of other features as well.
>> So if you have interest, check out these websites and websites like them
to just see how many other people's shoulders you can stand on to make
really cool software of your own.
>> Now in terms of authentication, an FAQ, is how do you actually guarantee
that your users are people on campus, Harvard students or faculty or staff?
So CS50 has its own authentication service called CS50 ID.
Go to that URL and you can restrict your website to anyone with a Harvard
ID, for instance.
So know that we can handle that.
You guys should not be in the business of saying, what's your Harvard ID?
What's your Harvard PIN?
Let me now do something with it.
We'll do all of that.
And what we'll give you back is someone's name and email address, but
not anything sensitive.
>> An app on a mobile device, it can be made to work on a mobile device, but
it's not quite designed for that.
So you'll end up spending a non trivial amount of time doing so.
So I would discourage that route for now.
This is really intended for web based applications.
>> So web hosting.
So if you haven't seen on the course's homepage--
and here's where we'll begin a story--
web hosting is all about paying, usually, for a service: a server owned
by someone else on the web that has an IP address, and you then put your
website on it.
And they usually give you email accounts and databases
and other such features.
>> Know that if you don't want to actually pay for such, go to that URL
there and CS50 actually has a non-profit account that you can use to
actually have not http://project inside of the appliance
for your final project.
If you actually want it to be something like, isawyouharvard.com,
you can buy that domain name-- although not that particular one-- and
then you can go about hosting it on a public web server like we can offer
you guys through here.
>> And in fact if unfamiliar, if you've never been to
isawyouharvard.com, one, go there.
But two, know that that was by a young woman named Tej Toor two
years ago, three years ago, who was a CS50 alumna who, a day or two
before the CS50 fair, sent out an email to her house mailing list and voila.
Two days later by the CS50 fair, she had hundreds of users all creeping on
each other on her website and saying how they had seen
her or him on campus.
So that's one of CS50's favorite success stories from
a CS50 final project.
>> So how do you go about putting a website like that on the internet?
Well, there's a few such ingredients here.
So one, you have to buy a domain name.
There are bunches of places in the world from which you can
buy a domain name.
And for instance, one that we recommend only because it's popular
and it's cheap is called namecheap.com.
But you can go godaddy.com and dozens of others out there.
You can read up on reviews.
>> But for the most part it doesn't matter from whom you
buy a domain name.
And they vary in price and they vary in suffix.
The suffixes like .com, .net, .org, .io, .tv, those
actually vary in price.
But if we wanted to do something like cats.com we can go to this website,
click Search.
Presumably this one is taken.
But apparently, catsagainst.com is available.
pluscats.com is available.
Lovecats, catscorner, dampcats.net.
All of this hopefully pseudo randomly generated.
If you want cats.pw, $1,500 only, which is a bit insane.
So someone has really snatched up all the cat related domain names here for
varying prices.
>> As an aside, let's see.
Who has cats.com?
Know that you guys have at your disposal fairly
sophisticated commands now.
Like I can type literally whois cats.com.
And because of the way the internet is structured you can actually see who
has registered this.
Apparently this person is [INAUDIBLE] using a proxy service.
So whoever owns cats.com doesn't want the world to know who they are.
So they've registered it through some
But sometimes you actually get actual owners.
>> And this is to say, especially if you're pursuing some startup and you
really want some domain name and you're willing to pay someone else for
it, you can figure out contact information in that way.
>> But also interesting is this.
Let me scroll up to this portion.
So this is that same output.
And this is just tacky.
So apparently cats.com can be yours for the right price.
But what's interesting here is that the name servers--
this is total abuse of what a name server's supposed to be-- your name
server is not supposed to be thisdomainforsale.com.
If we actually choose something like--
let's choose something a little more legitimate like whois google.com,
and scroll up here.
So here--
what happened there?
Interesting.
Beyond who is--
let's keep it more low key.
>> whois mit.edu.
OK.
This is helpful.
So this is what I was hoping for.
Legitimate use of the DNS service.
Name servers here indicate the following.
This is MIT's way of saying, whenever someone in the world, wherever they
are, types in mit.edu and hits Enter, your laptop, whether Mac or PC, will
somehow eventually figure out that the people in the world that know what the
IP address is for mit.edu or any of the sub domains at mit.edu are any of
these servers here-- and it actually looks like MIT's infrastructure is
pretty robust as you would expect.
They have multiple name servers, which is good for redundancy.
And in fact, they seem to be globally distributed across the world.
A bunch of those seem to be in the US, a couple in Asia, one in Europe, two
in somewhere else.
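For the curious, the whois command is a thin wrapper around a very simple protocol: open a TCP connection to port 43, send the domain followed by a line ending, and read text back until the server closes the connection. The sketch below assumes you already know which WHOIS server to ask (the `server` parameter), and it assumes name servers appear on lines labeled `Name Server:`. Output formats differ between registries, so that label will not match every server's output.

```python
import socket


def whois(domain, server, port=43, timeout=10):
    """Send one WHOIS query and return the server's text response."""
    with socket.create_connection((server, port), timeout=timeout) as conn:
        conn.sendall(domain.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # the server closes the connection when done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def name_servers(whois_text):
    """Pull out values from lines of the (assumed) form 'Name Server: x'."""
    servers = []
    for line in whois_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "name server" and value.strip():
            servers.append(value.strip().lower())
    return servers
```

Calling `name_servers(whois("example.com", server))` would then list whichever machines the domain's owner has designated to answer questions about it.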
>> But the point here is that DNS that we've been taking for granted and
generally described as a big Excel table that has IP addresses and domain
names is actually a fairly sophisticated hierarchical service so that in the
world there's actually a finite number of servers that essentially know where
all of the .coms are or all of the .nets are, all of the
.orgs are, and so forth.
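A toy model of that hierarchy, using plain dictionaries in place of real servers, might look like this. Every name and address below is made up for illustration (the IP addresses come from ranges reserved for documentation), and real DNS adds caching, multiple record types, and much more.

```python
# Level 1: the root knows who is responsible for each top-level domain.
ROOT_SERVERS = {"com": "com-registry", "edu": "edu-registry"}

# Level 2: each TLD registry knows the name server for each domain under it.
TLD_SERVERS = {
    "com-registry": {"example.com": "ns1.example.com"},
    "edu-registry": {"example.edu": "ns1.example.edu"},
}

# Level 3: each domain's own name server knows the actual IP addresses.
AUTHORITATIVE = {
    "ns1.example.com": {
        "example.com": "192.0.2.10",
        "www.example.com": "192.0.2.10",
    },
    "ns1.example.edu": {"example.edu": "198.51.100.7"},
}


def resolve(hostname):
    """Walk root -> TLD -> authoritative server; return an IP or None."""
    name = hostname.lower().rstrip(".")
    labels = name.split(".")
    registry = ROOT_SERVERS.get(labels[-1])
    if registry is None:
        return None
    domain = ".".join(labels[-2:])
    name_server = TLD_SERVERS[registry].get(domain)
    if name_server is None:
        return None
    return AUTHORITATIVE[name_server].get(name)
```

No single table holds every answer: each level only knows whom to ask next, which is what lets the system scale.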
>> So when you go ahead and buy a domain name from a place like Namecheap or
GoDaddy or any other website, one of the key steps that you'll have to do,
if you do this even for your final project, is tell the registrar
from whom you're buying the domain name, who in the world knows your
website's IP addresses, who your name servers are.
>> So if you use, for instance CS50's hosting account-- we happen to have
this account through dreamhost.com which is a
popular web hosting company--
they will tell you that you should buy your domain and tell the world that
your domain's name server is ns1.dreamhost.com, ns2.dreamhost.com,
and ns3.dreamhost.com.
>> But that's it.
Buying a domain name means giving them the money and getting ownership of the
domain, but it's more like a rental though.
You get it for a year and then they bill you recurringly for the rest of
your life until you cancel the domain name.
And then you tell them who the name servers are.
But then you're done with your registrar.
And from there you'll interact only with your web hosting company, which
in CS50's case will be DreamHost.
But again, more documentation will be provided to you if you decide to go
that route.
>> So if you do this after the course's end, simply googling web hosting
company will turn up thousands of options.
And I would generally encourage you to ask friends who might have used a
company before if they recommend them and had a good experience.
>> Because there's a lot of fly by night web hosting companies, like a guy in
his basement with a server that has an IP address.
He has some extra RAM and hard disk space and just sells web hosting
accounts even though there's no way that server could handle hundreds of
users or thousands of users.
So realize you will get what you pay for.
>> For quite a while for my personal home page-- and this was totally acceptable
because I had, like, two visitors a month--
I was paying, like, $2.95 a month.
And I'm pretty sure it was in someone's basement.
But again, you don't get necessarily any guarantees of uptime or
scalability.
So again, you're typically looking at something more than that.
>> Well, what about SSL?
So what's SSL used for?
Let's now start to steer in the directions of security and things that
can harm us.
Especially as you venture out on your own.
>> What's SSL, or what's SSL used for?
Security, OK.
So it's used for security.
What does that mean?
So it stands for Secure Sockets Layer.
And it is indicated by a URL that starts with https://.
Many of us have probably never typed https://, but you'll often find that
your browser is redirected from HTTP to HTTPS so that everything is
thereafter encrypted.
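The redirect itself is simple: the server answers a plain HTTP request with a pointer to the same address under https://. A minimal sketch of computing that target follows; the function name is an assumption, and a real server would send the result in the Location header of a redirect response.

```python
from urllib.parse import urlsplit, urlunsplit


def https_redirect_target(url):
    """Return the https:// equivalent of an http:// URL, else None."""
    parts = urlsplit(url)
    if parts.scheme != "http":
        return None  # already secure, or not a web URL at all
    # Host, path, query, and fragment are kept as-is. An explicit port
    # such as :80 would also be kept, which a real server must handle.
    return urlunsplit(
        ("https", parts.netloc, parts.path, parts.query, parts.fragment)
    )
```

Note that the very first request still travels unencrypted before the redirect happens.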
>> FYI, using SSL requires typically that you have a unique IP address.
And typically to get a unique IP address you need to pay a web hosting
company a few dollars more per month.
So realize this is very easily implemented these days by buying an IP
address and by buying what's called an SSL certificate.
But realize that it does come at some additional cost.
And, as we'll try to scare in just a bit, it's not even necessarily 100%
protective of whatever it is you're trying to protect.
>> So for security, I'd thought I'd do sort of a random segue here.
As you might know from CS50's lecture videos, our production team has been a
fan as I have of taking really nice photography of campus, and aerial
photography most recently.
If you ever look up and you see something flying with a little camera,
it may actually be CS50.
And I just thought I'd share a minute of some of the footage the team has
gathered, particularly as we look to the spring semester and next fall.
If any of you have a knack for photography, videography, we would
love to get you involved behind the scenes.
But more on those details in a week.
>> [MUSIC]
>> DAVID J. MALAN: Turns out there's a miniature golf course on the top of
the stadium that we never knew about.
>> [MUSIC]
>> DAVID J. MALAN: You can see the outline of the drone there.
>> [MUSIC]
>> DAVID J. MALAN: The best part here is, watch the jogger on the left.
>> [MUSIC]
>> DAVID J. MALAN: Another example of what you can do with technology that's
only tangentially, frankly, related to security.
But I thought that would be a more fun way of just saying, security.
So let's see if we can't scare you guys now with not only a bit of a few
threats, but also an underlying understanding of what these threats
are so that moving forward you can decide how and whether to defend
yourself against these things and at least to be mindful of them as you
make decisions as to whether or not to send that email, whether or not to log
into that website, whether or not to use that cyber cafe's Wi-Fi access
point so that you know what the threats are indeed around you.
>> So Jonathan referred to something like this on Monday.
He had a Windows screenshot.
This one is of a Mac.
How many of you have ever installed software on your Mac or PC?
Obviously everyone.
How many of you have given much thought to typing in your password
when prompted?
I mean, even I don't, frankly.
So a couple of us are good at being paranoid.
But consider what you're actually doing here.
>> On a typical Mac or PC you have an administrator account.
And typically you're the only one using a laptop at least these days.
So your account, Malan or JHarvard or whatever it is, is the
administrator account.
And what that means is you have root access to your computer.
You can install anything you want, delete anything you want.
>> And typically these days, because of dated design decisions from years ago,
the way most software gets installed is as an administrator.
And even if your Mac or PC has at least gotten smart enough over the
years with the latest incarnations of Mac OS and Windows to not run your
username by default as the administrator, when you download some
new program off the internet and try to install it, you're probably going
to be prompted for your password.
But the catch is at that point, you're literally handing the keys of your
computer over to whatever random program you just downloaded and
allowing it to install whatever it wants.
>> And as Jonathan alluded to, realize that it might say that it wants to
install your software that you care about, Spotify or iTunes or whatever
it is you're trying to install.
But you're literally trusting the author or authors of the software to
only do what the program is supposed to do.
>> But there is absolutely nothing stopping most programs on most
operating systems from deleting files, from uploading them to some company's
website, from trolling around, from encrypting things.
And again, we've sort of built an entire infrastructure over
the years on trust.
And so realize that you've just been trusting random people and random
companies for the most part.
>> And as Jonathan alluded to too, sometimes those companies themselves are sort of
knowingly malicious, all right?
Sony caught a lot of flak a few years ago for installing what was called a
rootkit on people's computers without their knowledge.
And the gist of this was that when you bought a CD for instance that they
didn't want you to be able to copy or rip the music off of, the CD would
install, without your knowing, a rootkit on your computer.
Rootkit just meaning software that runs as administrator that potentially
does bad things.
>> But among the things this thing did was it hid itself.
So some of you might be pretty savvy with your computer and know, well, I
can just open the Task Manager or the Activity Monitor and I can look at all
of the arcanely named programs that are running.
And if anything looks suspicious I'll just kill it or delete it.
But that's what the rootkit did.
It essentially said, if running Task Manager, don't show yourself.
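The hiding trick described above can be pictured with a toy model. This is only a sketch under stated assumptions: the process names and both functions are hypothetical, and a real rootkit interferes with the operating system itself rather than filtering a Python list.

```python
# Toy model of a process listing that has been tampered with.
# All names here are hypothetical; nothing touches the real operating system.

def honest_process_list(running):
    """Return every running process name, as a trustworthy tool would."""
    return sorted(running)

def tampered_process_list(running, hidden_names):
    """Return the listing with the hidden names silently left out."""
    return sorted(name for name in running if name not in hidden_names)

running = {"browser", "music_player", "copy_protection_helper"}
print(honest_process_list(running))
print(tampered_process_list(running, {"copy_protection_helper"}))
```

The point of the sketch is that the user only ever sees the output of the listing tool, so a program that controls the tool controls what the user believes is running.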
>> So the software was there.
And only if you really, really looked hard could you even find it.
And this was done in the name of copy protection.
But just imagine what could have been done otherwise.
>> Now in terms of protecting yourself.
A lot of websites are wonderfully gracious in that they put these
padlock icons on their homepage which means that the website is secure.
This is from bankofamerica.com this morning.
So what does that little padlock icon there mean next to the Sign In button?
>> Absolutely nothing.
It means someone knows how to use Photoshop to make a picture of a
padlock icon.
Like quite literally, the fact that it's there is meant to be a positive
signal to the user like, ooh, secure website.
I should trust this website and now type in my username and password.
And this has been conventional for years, as recently as this morning.
>> But consider the habits that this is getting us into.
Consider the implicit message that all of these banks in this case have been
sending us for years.
If you see padlock, then secure.
All right?
>> So how can you abuse that system of trust if you're the bad guy?
Put a padlock on your website, and logically, the users have been
conditioned for years to assume padlock means secure.
And it might actually be secure.
You might have a wonderfully secure SSL HTTPS connection to
fakewebsite.com.
And no one else in the world can see that you're about to hand him or her
your username and password to your account.
>> This though, perhaps, is a little more reassuring.
So this is a screen shot of the top of my browser this morning at
bankofamerica.com.
And notice here too we have a padlock icon.
What does it mean in this context in Chrome at least?
>> So this is now using SSL.
So this is actually a better thing.
And the fact that Chrome is making it green is meant to draw our attention
to the fact that this is not only over SSL.
This is a company that someone out there has verified is actually
bankofamerica.com.
And that means that Bank of America, when buying their so-called SSL
certificate, essentially big random, somewhat random numbers that implement
security for them, they have been verified by some independent third
party that says, yep.
This is actually the CEO of Bank of America trying to buy the certificate.
Chrome will therefore trust that certification authority and say in
green, this is bankofamerica.com.
And Bank of America just pays a few hundred dollars for that or a few
thousand as opposed to a few tens of dollars.
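For a sense of what that third-party verification looks like from a program's side, here is a minimal sketch that inspects the default policy of Python's standard `ssl` module. It makes no network connection, and the function name is an assumption made for illustration; browsers such as Chrome apply their own, more elaborate rules.

```python
import ssl

def describe_default_tls_policy():
    """Report whether a default client context insists on a verified certificate."""
    context = ssl.create_default_context()
    return {
        # The server must present a certificate signed by a trusted authority.
        "verifies_certificate": context.verify_mode == ssl.CERT_REQUIRED,
        # The name in the certificate must match the host being contacted.
        "checks_hostname": context.check_hostname,
    }

print(describe_default_tls_policy())
```

Both checks matter: a certificate that is valid but issued for a different domain should be rejected just like one that nobody trusted has signed.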
>> But here too, how many of you have ever behaved any differently because
the URL in your browser is green instead of black?
Right?
So a couple of us.
And that's good to be paranoid.
But even then, those of you who even notice these things, do you actually
stop logging into an otherwise secure website if the URL is not green?
All right, so probably not, right?
At least most of us, if it's not green, most likely you're just going
to be like, whatever.
Like, I want to log into this website.
That's why I'm here.
I'm going to log in nonetheless.
>> As an aside, Chrome is a little better about this.
But there's a lot of browsers like Firefox for instance, at least for
some time, where that padlock icon is, you can actually put any
icon of your own.
Let me see what the latest version of Firefox looks like.
So if we go to CS50.net.
>> OK, so they've gotten better as well.
What the browsers used to do is like, here's for instance [? SAAS's ?]
crest up here.
That's the so-called favorite icon for a website.
Years ago--
actually not that long ago-- that little shield would have been right
here next to the URL.
Because some genius decided that it would just look pretty classy to have
your graphical logo right next to your URL.
And design wise, that actually is pretty compelling.
>> So what did bad guys start doing?
They started changing their favorite icons, or their default icon for a
homepage to be not a crest but a padlock, which had
absolutely no meaning.
Other than that their favorite icon was a padlock, it had no
indication of security.
>> So the lessons here are a couple I think.
One is that there are actually some well intentioned mechanisms for
teaching us users about security even if you weren't even aware what green
meant or what even HTTPS meant.
But if those mechanisms get us into the bad habit of trusting websites
when we see those positive signals, they're very easily abused as we saw
just a moment ago with something silly like this.
>> So session hijacking comes into play, as we said before,
with cookies for instance.
And what does this actually mean?
Well with session hijacking this is all about stealing someone's cookies.
So if I open up Chrome here, for instance, and I open up the Inspector
down here and I go to the Network Tab--
and we've done this before--
and I go to something like http://facebook.com Enter, a whole
bunch of stuff goes across the screen because of all the images and CSS and
JavaScript files.
>> But if I look at this one here notice that Facebook is indeed planting one
or more cookies on my browser right here.
So these are essentially the hand stamps that represent me.
And now hopefully my browser will present this again and again when
revisiting that website.
But that only is secure, we said a couple weeks ago, if you're using SSL.
>> But even SSL itself can be compromised.
Consider after all the way SSL works.
When your browser connects to a remote server via https://, long story short,
cryptography is involved.
It's not as simple as Caesar or Vigenère or even DES, D-E-S, from a
while back in pset2.
It's more sophisticated than that.
It's called public key cryptography.
But really big and really random numbers are used to scramble
information between point A, you, and point B, like facebook.com.
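To make the idea of public key cryptography concrete, here is a toy sketch in the style of RSA. RSA is chosen only for illustration, since the lecture does not name an algorithm, and the primes are tiny assumed values; real keys use numbers hundreds of digits long, along with padding schemes omitted here.

```python
# Toy RSA-style example. The numbers are far too small to be secure.
P, Q = 61, 53                # secret primes (assumed toy values)
N = P * Q                    # public modulus, shared with everyone
PHI = (P - 1) * (Q - 1)      # kept secret; used to derive the private key
E = 17                       # public exponent, chosen coprime to PHI
D = pow(E, -1, PHI)          # private exponent (modular inverse, Python 3.8+)

def encrypt(message, public_exponent=E, modulus=N):
    """Anyone holding the public key (E, N) can scramble a number below N."""
    return pow(message, public_exponent, modulus)

def decrypt(ciphertext, private_exponent=D, modulus=N):
    """Only the holder of the private exponent D can unscramble it."""
    return pow(ciphertext, private_exponent, modulus)

secret = 65
scrambled = encrypt(secret)
print(scrambled, decrypt(scrambled))
```

The design point is the asymmetry: the values needed to encrypt can be published, while decryption depends on a number derived from primes that never leave the server.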
>> But the problem is, how many of us again ever type in https:// to start
our website connection in that secure mode?
I mean, how many of you even type http://facebook.com?
All right, if you do, like, hello.
You don't need to do that anymore, right?
The browser will figure it out.
>> But most of us do indeed just type facebook.com.
Because if we're using a browser, the browsers have gotten smart enough by
2013 to assume if you're using a browser, you type in an address, you
probably want to access it not via email or instant message.
You mean HTTP and Port 80.
Those conventions have been adopted.
>> But how does redirection work?
Well, notice what happens here.
If I go back to Chrome--
and let's do this in incognito mode so that all of my
cookies are thrown away.
And let me go here to, again, facebook.com.
And let's see what happens.
>> Recall that the first request was indeed just for facebook.com.
But what was the response that I got?
It wasn't a 200 OK.
It was 300, or 301, which is a redirect telling me to go to
http://www.facebook.com, which is where Facebook wants me to go.
But then if we look at the next request, and we've seen this before,
notice what their second response is.
Specifically that they want me now to go to the SSL version of Facebook.
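The chain of redirects just described can be traced with a small simulation. This is a sketch, not live data: the response table is hypothetical, modeled on the demo, and the status code of the second redirect is an assumption because the lecture does not state it.

```python
from urllib.parse import urlsplit

# Hypothetical responses modeled on the demo: URL -> (status, redirect target).
SIMULATED_RESPONSES = {
    "http://facebook.com/": (301, "http://www.facebook.com/"),
    "http://www.facebook.com/": (302, "https://www.facebook.com/"),  # status assumed
    "https://www.facebook.com/": (200, None),
}

def trace_redirects(url, responses, max_hops=10):
    """Follow redirects through the table, recording each (url, status) hop."""
    hops = []
    for _ in range(max_hops):
        status, location = responses[url]
        hops.append((url, status))
        if location is None:
            break
        url = location
    return hops

def cleartext_hops(hops):
    """Return the URLs that were requested over plain HTTP."""
    return [url for url, _status in hops if urlsplit(url).scheme == "http"]

hops = trace_redirects("http://facebook.com/", SIMULATED_RESPONSES)
print(hops)
print(cleartext_hops(hops))
```

Every URL in the second list is a request that traveled unencrypted before the secure version of the site was ever reached.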
>> So here is an opportunity.
This is a wonderfully useful feature of just the web and HTTP.
If a website like Facebook wants me to stay on the secure version of their
website, great.
They will redirect me themselves.
And so I don't have to even think about that.
>> But what if between point A and B, between you and Facebook, there's some
bad guy, there's some system administrator at Harvard who's curious
to see who your friends are.
Or there's some--
years ago, this used to sound crazy--
but there's some government entity like the NSA who's actually interested
in who you're poking on Facebook.
Where's the opportunity there?
Well, so long as someone has enough technical savvy and they have access
to your actual network over Wi-Fi or some physical wire,
what could they do?
>> Well, if they're on the same network as you and they know something about
TCP/IP and IP addresses and DNS and how all of that works, what if that
man in the middle, what if that National Security Agency, whatever it
may be, but what if that entity simply responds more quickly than Facebook to
your HTTP request and says, oh, I am Facebook.
Go ahead, and here's the HTML for facebook.com.
>> Computers are pretty darn fast.
So you could write a program running on a server like nsa.gov that when it
hears a request from you for facebook.com, very quickly behind the
scenes gets the real facebook.com making a perfectly [? esque ?] secure
SSL connection between NSA and between Facebook, getting that HTML very
securely for the login page, and then the NSA server just responds to you
with a login page for facebook.com.
>> Now how many of you would even notice that you're using Facebook over HTTP
still at that point because you've accidentally connected to nsa.gov and
not Facebook?
The URL's not changing.
All of this is being done behind the scenes.
But most of us, myself included, probably wouldn't notice
such a minor detail.
>> So you might have a perfectly workable connection between you and what you
think is Facebook, but there's a so-called man in the middle.
And this is a general term, man in the middle attack, where you have some
entity between you and point B that's somehow manipulating, stealing, or
watching your data.
So even SSL is not surefire, especially if you've been tricked into
not turning it on because of how these underlying mechanisms actually work.
>> So a lesson today then too is if you really want to be paranoid--
and even here there are threats--
you should really start getting into the habit of typing in https://www
whatever domain name you actually care about.
>> And as an aside too there's yet another threat with
regard to session hijacking.
Very often when you first visit a website like facebook.com, unless the
server has been configured to say that that hand stamp it put on you
yesterday should be secure itself, your browser might very well, upon
visiting things like facebook.com, google.com, twitter.com, your browser
might be presenting that hand stamp only to be slapped down and told, no.
Use SSL.
>> But it's too late at that point.
If you have already sent your hand stamp, your cookie, in the clear with
no SSL, you have a split second vulnerability where someone sniffing
your traffic, whether roommate or NSA, can then use that same cookie, and
with a bit of technical savvy, present it as his or her own.
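The server-side setting alluded to here is the cookie's Secure attribute. Below is a minimal sketch using Python's standard `http.cookies` module; the cookie name, the session value, and the `browser_would_send` helper are assumptions that only approximate how a browser decides.

```python
from http.cookies import SimpleCookie

def build_session_cookie(session_id, secure):
    """Build the hand stamp a server would plant, optionally marked Secure."""
    cookie = SimpleCookie()
    cookie["session"] = session_id
    if secure:
        cookie["session"]["secure"] = True
    return cookie

def browser_would_send(cookie, scheme):
    """Simplified browser rule: a Secure cookie is never sent over plain HTTP."""
    morsel = cookie["session"]
    if morsel["secure"] and scheme != "https":
        return False
    return True

plain = build_session_cookie("abc123", secure=False)
locked = build_session_cookie("abc123", secure=True)
print(locked.output())
print(browser_would_send(plain, "http"), browser_would_send(locked, "http"))
```

Without the attribute, the first request to the plain HTTP address carries the cookie in the clear, which is the split-second exposure described above.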
>> Another attack you might not have thought about.
This one is really on you if you screw this up in writing some website that
somehow uses SQL.
So here, for instance, is a screen shot of Harvard's login.
And this is a general example of something with a
username and password.
Super common.
So let's assume that SSL exists and there's no man in the middle or
anything like that.
Now we're focusing on the server's code that you might write.
>> Well, when I type in a username and password, suppose that the PIN service
is implemented in PHP.
And you might have some code on that server like this.
Get the user name from the $_POST superglobal and get the password, and then
if they're using some pset7-like code there's a query function
that might do this.
SELECT * FROM users WHERE username equals that and password equals that.
>> That looks, at first glance, totally reasonable.
This is syntactically valid PHP code.
Logically there's nothing wrong with this.
Presumably there's some more lines that actually do something with the
result that comes back from the database.
But this is vulnerable for the following reason.
>> Notice that, like a good citizen, I have put in quotes, single
quotes, the user name.
And I put in single quotes the password.
And that's a good thing because they're not supposed to be numbers.
Typically they're going to be text.
So I'm quoting them like strings.
>> And if I now advance further what if-- and I've removed the bullets from the
PIN service temporarily--
what if I try to log in as President Skroob
but I claim that my password is 12345' OR '1'='1, and notice
what I haven't done.
I did not close the other single quote.
Because I'm pretty sharp here as the bad guy.
And I'm assuming you're not very good with your
PHP and MySQL code.
I'm guessing that you're not checking for the presence of quotes.
>> So what just happened is that when your user has typed in that string,
the query you're about to create looks like this.
And long story short, if you AND something together or you OR something
together this is going to return a row from the database.
Because it is always the case that 1 equals 1.
>> And just because you didn't anticipate that your users, good or bad, might
have an apostrophe in their name you have created a SQL query that's still
valid, and will return now more results than you might have intended.
And so this bad guy now has potentially logged in to your server
because your database is returning a row even if he or she has no idea what
Skroob's actual password is.
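The attack can be reproduced safely against an in-memory database. This is a sketch under stated assumptions: the lecture's code is PHP with MySQL, whereas this uses Python's standard `sqlite3` module, and the table layout, stored password, and function names are hypothetical.

```python
import sqlite3

def make_db():
    """Create a throwaway in-memory users table with one hypothetical account."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE users (username TEXT, password TEXT)")
    db.execute("INSERT INTO users VALUES ('skroob', 'hunter2')")
    return db

def build_unsafe_query(username, password):
    """Paste user input straight into the SQL text. This is the vulnerable step."""
    return ("SELECT * FROM users WHERE username = '" + username +
            "' AND password = '" + password + "'")

def unsafe_login(db, username, password):
    """Treat any returned row as a successful login."""
    rows = db.execute(build_unsafe_query(username, password)).fetchall()
    return len(rows) > 0

db = make_db()
attack = "12345' OR '1'='1"
print(build_unsafe_query("skroob", attack))
print(unsafe_login(db, "skroob", "wrong guess"), unsafe_login(db, "skroob", attack))
```

Because AND binds more tightly than OR, the injected condition `'1'='1'` stands on its own, so the WHERE clause is true for every row no matter what password was stored.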
>> Oh, I realized a typo here.
I should've said password equals 12345 like the previous
example or 1 equals 1.
I'll fix that online.
>> So why did we have you using the query function with question marks?
One of the things the query function does for you is it makes sure that
when you pass in arguments after the commas here like this that the query
that's actually sent to the database looks like this.
A lot uglier to look at, but backslashes have been automatically
inserted to avoid precisely that injection attack that I showed a
moment ago.
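The question-mark style looks like this in a sketch with Python's standard `sqlite3` module. Note the substitution: the pset7 query function is described as inserting backslashes, whereas `sqlite3` binds the values separately from the SQL text, a different mechanism with the same goal. The table contents and function names are hypothetical.

```python
import sqlite3

def make_db():
    """Create a throwaway in-memory users table with one hypothetical account."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE users (username TEXT, password TEXT)")
    db.execute("INSERT INTO users VALUES (?, ?)", ("skroob", "hunter2"))
    return db

def safe_login(db, username, password):
    """Pass user input as bound parameters so it is never parsed as SQL."""
    rows = db.execute(
        "SELECT * FROM users WHERE username = ? AND password = ?",
        (username, password),
    ).fetchall()
    return len(rows) == 1

db = make_db()
print(safe_login(db, "skroob", "hunter2"))
print(safe_login(db, "skroob", "12345' OR '1'='1"))
```

With placeholders, the attacker's string is compared as a literal password, quotes and all, so it simply fails to match.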
>> Now a fun XKCD that I thought I'd pull up here that hopefully should now be a
little more understandable is this one here.
>> A little bit?
Maybe we need a little more discussion on that.
So this is alluding to a little kid named Bobby who has somehow taken
advantage of a website that is just trusting that what the user has typed
in is not, in fact, SQL code, but is in fact a string.
>> Now you may recall that drop--
you might have seen this-- drop means delete a table, delete a database.
So if you essentially claim that your name is Robert"; DROP TABLE students;
or something like that,
you might very well trick the database not only into checking that you're
indeed Robert, but semicolon also proceed to drop the table.
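The comic's scenario can be sketched against an in-memory database. Assumptions are labeled: this uses Python's standard `sqlite3` module rather than the school's unknown system, the table and function names are hypothetical, and `executescript` is used deliberately because it runs every statement in a string, which is the behavior the attack depends on.

```python
import sqlite3

def make_db():
    """Create a throwaway in-memory students table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE students (name TEXT)")
    return db

def table_exists(db, table_name):
    """Check SQLite's catalog for a table with the given name."""
    row = db.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' AND name = ?",
        (table_name,),
    ).fetchone()
    return row is not None

def unsafe_enroll(db, student_name):
    """Vulnerable: builds SQL by concatenation and runs all resulting statements."""
    db.executescript(
        "INSERT INTO students (name) VALUES ('" + student_name + "');"
    )

db = make_db()
unsafe_enroll(db, "Alice")
print(table_exists(db, "students"))
unsafe_enroll(db, "Robert'); DROP TABLE students;--")
print(table_exists(db, "students"))
```

The quote and parenthesis in the name close the INSERT early, the semicolon starts a second statement, and the trailing `--` comments out what is left of the original query.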
>> And so SQL injection attacks can actually be as threatening as this
whereby you can delete someone's data, you can select more data than
intended, you can insert or update data.
And you can actually see this with an at-home exercise, not for malicious
purposes but just for instructional ones: any time you're prompted to log
into a website, especially some sort of not very public, not very popular website,
try logging in as John O'Reilly or someone with an
apostrophe in their name.
Or literally just type apostrophe, hit Enter, and see what happens.
>> And all too often, tragically, people have not sanitized their inputs and
made sure that things like quotes or semicolons are escaped.
Which is why in pset7 we give you this query function.
But do not underappreciate exactly what it is doing for you.
>> So with that said, enjoy using the web this week.
And we will see you on Monday.
>> At the next CS50.
>> [MUSIC]