Okay, let us begin today's workshop. A couple of things to note
before we begin. I posted this earlier and I'm going to repost it again
just in case: here is a link to the GitHub account we'll be using today.
Everything in there can be used to create everything that we're
talking about tonight. So you should be able to find all the files
needed for it within Jupyter, and also the files needed to create the graph
network, and also shapefiles and all the other material needed to create maps and
networks. Also, if you go to that link you'll see another link that goes
through the presentation itself. For today what we'll be doing specifically
is looking at creating a network graph and also a map. One of the
things I learned from yesterday's conversation with the
group here, at the last presentation, was to limit some of the detailed
discussion of the coding involved and to focus on making
sure people understand the thought process behind actually making a
visualization. So there might be a couple of small changes if you happened to look at
the code. Earlier today I made some small changes on GitHub, so for the most
part a lot of it is the same, but there are some small changes here and
there, and I'll talk about those as we go forward. I'm going to share my screen here.
A couple of things to note, like I said. The first expectation
is that you are able to install the Anaconda program. Anaconda is a
way for us to put some code together and run it within the browser
itself. If you weren't at the first workshop, one of the issues
you can run into is that if we were trying to develop
some of these D3 visualizations on their own, because D3 is a JavaScript library that
runs within HTML, we would have to set up an
actual web server and set up configurations for that.
Jupyter lets us avoid a lot of that setup. So whenever you
open up Jupyter,
you're going to see a couple of tabs. If you downloaded one of the newer versions you'll see
something called JupyterLab. It's
relatively new and
might be something we'll use for future workshops, but for the most part
we're going to be using the Jupyter Notebook itself, so you're going to click on that and launch
it directly from there. The way to
download the material and get it working for you is to go,
under dudaspm, to data workshops. You're going to see a couple of folders listed there,
one for the last workshop and one for this workshop. All you need to do is
click on clone or download and you can download it as a zip. Once you
open up Anaconda, if you don't know exactly where to put that
folder, you can go into Projects, and whenever you add
a new project (yesterday I added one called test) it will tell you the home
directory. For me that's Users, PMD 19, and then I have a folder I specifically
created called Anaconda projects, and anything that I need
I'll put in there. For the actual presentation I would
go and say clone or download, download as zip, it'll download for
me, and then I would just take that, unzip it,
and drop it into that Anaconda folder that I just mentioned... let's
throw it into...
and start the copy over. And so you're
going to see a new folder called DataWorkshop - Master, and in it you're going to see Workshop 1 and
Workshop 2. One thing I'll note, just in case you're just joining the
conversation. Everyone is muted at least from what I see here.
So if you have questions please use the chat and I'll
keep an eye on that throughout the conversation. Even if you just
raise your hand or something like that, I can unmute people to make sure we get all
questions answered. But we're going to be going through and using Workshop 2
specifically for today. Once you load up Jupyter it will open up in a
new tab, and if I go to the home directory, like I said, I have a
folder that I specifically created called Anaconda projects. You're going
to go into there and find DataWorkshop - Master, and everything is
going to be inside here for you to use today. The first thing we
are going to be talking about is creating the network graph itself and I should actually
get to my slides real quick just to keep on target. I should be
doing a quick introduction of myself. This is the
second workshop that I've put together. I also would like to say special
thanks to the Institute for CyberScience, who
sponsored these workshops for me, and Teaching and Learning with Technology has
helped as well, specifically in finding a location here on campus for
the live ones. I specialize in data visualization, social
media, and data analytics, and I provide support here at the Huck Institutes of the Life Sciences,
the Institute for CyberScience, and the Eberly College of Science as well. And
like I've mentioned before, these workshops incorporate the
art, the data, the coding, and the UI/UX, which is user interface and user
experience design. So I've already talked about the startup screen and environment
variables. Oh, one of the things we're going to be
doing today is we'll probably be skipping this step, which
is specifically with Python, but I'll come back to that in one second. So for
us we're going to be in the network folder. Originally yesterday when I went
through these I actually had a script here - in this first cell here and
to actually go and run some of this information you can actually see that
it's going to have different cells so this first one here that is highlighted green
is one cell. One of the things you're going to see is the fact that you can go
Cell, and then Run Cell. If you happen to run this first cell it's not
going to work, because I was originally going to
showcase how to install a couple of libraries. This is a Python script right
here, and what it creates is a random graph. Just to talk
through some of these things, one of the things you need to install is something
called NetworkX. It's an open source library that lets us create these random
graphs. But for today, instead of creating a random one, I
downloaded a pretty popular data set for network graphs, and I'll be using
that as a way to showcase how to create the
visualizations themselves. So this first part here we won't be
using too much today, because I want to speak to something
that's a little bit better known. This is a data set
based on Les Misérables that looks at characters and their co-occurrence within the
story. So you'll see main characters
highlighted, and you'll see the back and forth between these
characters. This is one of the default data sets that a lot
of people use for network graphs, so I decided to use it again. A couple
things to note: it's a lot of information here, but for the most part
what I'll highlight is that you have two sections,
and I'll break those out for a second: nodes and links. These are
the basics of network graphs. It's a very crude drawing
that I'm showcasing here, but a lot of times when you think about
networks it's specifically something like a social network or a conversation
network or an email network of some sort. You have individuals, and at some point
these people converse with one another. So we're assuming Sally
talks to Matt, Sally talks to Bob, and Bob
talks to Dave. How we're going to represent that today is
by
using something called nodes (another term that people use is vertices) and links,
or edges. And so we'd have four nodes for our four
individuals and then links connecting pairs of individuals. In this
case up here I have a directed graph, so that means Sally is talking
directly to Matt, but for our case today we're just going to assume that the links are
equal in both directions, undirected if you will. And so we have
nodes and links set up for that. Another thing that
we see within this data set is something called clusters or groups.
All this means is that if you look at this network as a
whole, certain people tend to converse or talk with one
another. So in this case, if you have a network like this,
because these people are talking a lot with each other
and they're separate from the rest of the graph, they would more than likely
be placed into a group: a community, cluster, or clique. Like I
said, there are different terms for this. Today I'll just be referring to them
as a group. In the original code,
in my little script up here, I also included a
community detection algorithm. Again, I'm not going to go into too much detail
because it's a little bit outside the scope of what we're trying to accomplish
today, but for the most part it'll do this
automatically for you. This is just partitioning
the network, so it says these people are highly coupled or highly
communicating with each other, so they should probably be put in a group, versus ones that
are separate from that particular group and
conversation. So for our nodes we're going to have names, so the
individuals within the story, and then group, which is already
calculated for us. So each of these nodes has those two
pieces of information. Our links are going to have three pieces of
information: a source, a target, and a value. Value is technically not needed;
value in this case is how often these people
co-occur or talk with one another, and so that's
like meta information. The two most important things are the source and
the target. The source here is a one, which refers to one of our nodes by position. So if
you go through here we have Node Zero, Node One, so the source would be
Napoleon and the target, zero, would be Myriel. You
work through the list like this, and so this is your way of connecting the dots if you will.
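
As a rough sketch of the nodes-and-links layout just described, here is a tiny graph object using the Sally, Matt, Bob, and Dave drawing rather than the Les Misérables file. The group numbers, the value of 1 on each link, and the helper name describeLinks are assumptions made up for illustration; only the field names (name, group, source, target, value) come from the talk.

```javascript
// Illustrative graph in the nodes/links layout described above.
// Groups and values are invented for this sketch.
const graph = {
  nodes: [
    { name: "Sally", group: 1 }, // index 0
    { name: "Matt", group: 1 },  // index 1
    { name: "Bob", group: 2 },   // index 2
    { name: "Dave", group: 2 },  // index 3
  ],
  links: [
    { source: 0, target: 1, value: 1 }, // Sally - Matt
    { source: 0, target: 2, value: 1 }, // Sally - Bob
    { source: 2, target: 3, value: 1 }, // Bob - Dave
  ],
};

// Hypothetical helper: resolve each link's node indices to readable names.
function describeLinks(g) {
  return g.links.map(
    (link) => `${g.nodes[link.source].name} - ${g.nodes[link.target].name}`
  );
}

console.log(describeLinks(graph));
```

Each link stores only positions in the nodes array, which is why the order of the nodes matters when you read a source or target number.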
Literally and figuratively as well. So where we're actually going to
start is on this script part here. What you're going to see is that
if you run this, so you go to Cell and click Run Cell, nothing's really going
to happen at this point, and to be quite honest with you this subsection here
is kind of redundant. I do this specifically to showcase how you can add
additional little pieces of meta information or other types of information that
you might be interested in adding. This is going to be the
structure for the rest of the visualization part of it. One question I
got from last time as well is, within D3,
are there ways to actually import files? There are actually a couple of different
ways of doing this. The file format that you're seeing here is something called
JSON, and so what we can use in D3 is a JSON reader. JSON
is a way of formatting data so you can remove some of
the redundancy that can occur with something like a CSV, comma
separated values, or a TSV, tab separated values. And so what this will
do is produce two different things for us. It creates the
graph object with two parts: nodes and links. And those are basically
subsections. So from there we're just running through each of these; that's
what that forEach means, it's for each of those nodes. We're going to
create, and again this is very much redundant and I acknowledge that, but
this is just making sure that we have all our
information up front. So first you have the graph's
forEach, where we're going to create a new array, so it's the same thing, and
then for source and target it's the same idea: we're just going to make a reference to
the nodes up here to make those connections
going forward for D3. And if you want to you can change this back to
graph.json and it should work as well. For the most part, if you follow
this script, where you have a naming structure, maybe a group
(you don't necessarily need this, but it's a group), and then source and
target, you can just use this script over and over again. So
for us this is where we start getting to the visualization part of
it. If I run this you're going to see at the bottom here our
network picture, so this is the Les Misérables data set being
visualized. As you can see, if I rerun this it will,
based on the data that it currently has, lay out this
information again for us. So for the most part this is going to be the same thing
throughout, and we'll make small changes as we go, but
I want to spend a few moments making sure
that we see what's going on.
I'll spend a few
moments here because this is the main script, and there are going to be
some parts of this that I'm
not going to go into a lot of detail on, because a lot of it is handled
in the background. I've added links that you can see throughout the code
that tell you, if you want to change this variable, what
you can change it to, and things of that nature. The first thing you'll notice is
that we have the %%javascript line. All that does is tell Jupyter that what's
going on here is JavaScript, and we just have to declare that once. The next two
parts here are again specifically for Jupyter. We're adding an
element called a div. A div is what HTML calls a
divider; it's basically a placeholder for us. As you can see here we have a div with
id equals graph 1, and all that means is that we're going to add an
element, which happens to be right here where the graph is for the moment. It's
going to be our placeholder and it's going to have an ID called graph 1.
We can put some style information on it in terms of the
width and the height, and margin: 0 auto means we're going to
center that specific div. So all that is is a placeholder for
us, and this is something you'd probably copy and paste into any type of project.
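
To make the placeholder idea concrete, here is a minimal sketch that only builds the div markup as a string. The helper name placeholderDiv, the id graph1, and the 960 by 600 size are assumptions for illustration, not necessarily what the workshop notebook uses.

```javascript
// Hypothetical helper: build the placeholder markup for a graph container.
// The id, width, and height values are assumptions for this sketch.
function placeholderDiv(id, width, height) {
  // margin:0 auto centers the div horizontally within its parent.
  const style = `width:${width}px; height:${height}px; margin:0 auto;`;
  return `<div id="${id}" style="${style}"></div>`;
}

const markup = placeholderDiv("graph1", 960, 600);
console.log(markup);
// In a Jupyter %%javascript cell, a string like this is typically handed to
// element.append(markup), where `element` is the cell's output area.
```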
The next is a require statement. All this does is go out and find the
D3 library, which is hosted online, and lets us use it as a
JavaScript library. One thing to note is that D3 is an open source library, so
you can use it for projects, for profit, things of that
nature: commercial work, academic research as well.
It's open in that regard as long as you include, I
believe, the MIT license agreement, so you just specify that. The require statement gives
you a pathway to say that anytime you reference d3, as you
see here, we're referring to that specific library. One of the things I
also have here is the original example, so if you're interested you can
go check out this link; this is where the original project is, and there are a number
of different ones. Mike Bostock is the creator of D3. He used to work
for the New York Times and did a lot of those New York Times visualizations
that you might have seen in the past. He was the creator of
D3 and Protovis, which was the predecessor of D3. The best
part is there are a number of different examples that you can go through
and try out and play with, some for curved edges, fisheye distortion, we'll go into a lot
of these for today, but it's a great resource to use if you're interested
in looking at different projects. So for us the first thing we did is, like I said,
we create that div and we give it an ID. The second part here is the select statement.
The select statement basically says, "On this webpage I want you to find me
the div", and this hashtag or pound sign, depending on what you want to call it,
says to the browser "ID equals". So this reads
out as: div with an ID equal to graph 1, and that's what we've created at this
point. All this line does is say, "I want you to go select that div,
and anything that's inside of it I want you to remove." The reason I have
that is that if we didn't have this statement, and I can actually show
that now, if I remove this and run this again, we would start
getting a little bit of redundancy. I'd have to scroll to show all
of it, since it's towards the bottom of the screen
unfortunately, but it's going to create duplicate copies of
our graph, and if the graphs overlapped with one another you'd actually
see this. I won't go into too much detail, basically there's a few [inaudible] that you'd see each
time. The main things are some of the key characteristics, or some
of the variables, like width and height. We set those ourselves and you can change
them as you see fit. The first thing is that we're going
to be creating a substructure in terms of an SVG. A lot of D3 uses
something called Scalable Vector Graphics, or SVG, and I'll be
referring to it that way from here on out. The nice thing about SVG is
that it uses an XY coordinate plot, and no matter how far
we zoom into this, so if I go back to my example here and
start zooming in, you can see that the graph itself doesn't
deteriorate at all, because what it's actually doing is just redrawing the
XY coordinate plot. It knows that these circles are supposed to be at
a specific location and those links are supposed to be at a specific location, so
it just redraws that specific space. That's really nice, especially if
you want to use things like this for publications or posters. Basically
it can be scaled to almost any size, to whatever
specifications you need for that particular document or that
particular poster. So the first thing we do is this:
we have our div to place our graphic in, and we need to have the
SVG specified so that we know where to put the SVG object. So
one of the things I also added to the documentation here, as you can
see, is a link to something called W3Schools. W3Schools is a
resource I use a lot of times to get some basic ideas of how to
create or develop code for things like SVG or HTML or CSS. It has
a wide library of examples for you to try out and play with in
terms of some of the basic code. I have a couple of specific
links for the circle and the line objects, considering that's what we
are using today to create the nodes and links on these graphs.
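
For reference, here is a small sketch of those two SVG elements, built as plain strings so it runs without a browser. The helper names, coordinates, sizes, and colors are all made up for illustration; the attribute names (cx, cy, r, x1, y1, x2, y2, stroke, fill, stroke-width) are standard SVG.

```javascript
// Build a minimal SVG document as a string: one line (a link) and two
// circles (nodes). All coordinates and colors below are illustrative.
function circle(cx, cy, r) {
  return `<circle cx="${cx}" cy="${cy}" r="${r}" stroke="black" stroke-width="1" fill="red" />`;
}

function line(x1, y1, x2, y2) {
  return `<line x1="${x1}" y1="${y1}" x2="${x2}" y2="${y2}" stroke="gray" stroke-width="2" />`;
}

function svg(width, height, children) {
  return `<svg xmlns="http://www.w3.org/2000/svg" width="${width}" height="${height}">${children.join("")}</svg>`;
}

// The line is listed first so the circles are drawn on top of it.
const picture = svg(200, 100, [
  line(50, 50, 150, 50),
  circle(50, 50, 5),
  circle(150, 50, 5),
]);
console.log(picture);
```

Saving the printed string to a file with an .svg extension and opening it in a browser should show two red dots joined by a gray line.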
So if we go into the circle page, like I mentioned before, it gives you
a template of what you need to create this particular object. Again
you see that SVG element, which is your outer container, so you have to have
that first object, and then inside there you put whatever you're drawing.
The SVG area is called a viewport, but basically it's a
canvas, a way for you first to say that there's going to
be an SVG object here, along with the basics in terms of the
height and width. The other thing it mentions is
the circle we're creating. We're not going to worry too much about
the CX and CY, which are the location,
because we're using the force-directed algorithm and these will be
calculated and set for us pretty much automatically. The other thing
is this R, which is the radius, and we can hard code that or we
can have it based on number of connections or things of that nature.
The rest basically just lets you stylize. So stroke is
the outside boundary, which would be a black color, and then fill will be red,
and then stroke width is how pronounced that stroke is.
The other key piece of information we're going to need is the
line. Again it shows you just the basic example. In this case, for a line
we're going to need four pieces of information: the
starting x and y position and the ending x and y position, designated
as x1, y1, x2, y2, and again some styling as well. For the most part
this is what we're going to need, and again the force-directed graph
algorithm is going to take care of the positions for us. We don't have to worry too
much about that. Going back in, that gives you an idea of what
key characteristics we're going to need. So we have here circle, radius five,
and then line, just some basic information. At the bottom is where
it's going to be specific about where the locations are going to be,
and that's going to help us represent those nodes
and those links. So we're going to start here in terms of some
of these characteristics, and then we'll come back to this part that says
simulation, which is going to run the simulation for the force-directed graph
itself. For now we're just going to start here with the links, and what
we're doing is, once we create that SVG object, that outside container, we're
going to append a new group of
objects, and specifically we're going to be
adding lines. This point is where we're going to start
feeding in the data. Again, this links is referring back to links up here. We
did this a little bit redundantly, but we're using those
links because that's the
array of information: each link has the source, the target, and the value. So we're
just feeding that into D3 to say we need you to make as many lines
as we have links in our data set. If we go in here you have... I'm not going to count
all these, but we'll say 200 to 300 links. We don't want
to manually encode all this, so we're just letting it
say that for each of these values in our array (I'm highlighting them one
at a time) we need to create a new line.
Then it's the same idea for nodes: for each of these we're
going to create a new circle. So we did this for links
and we do this for the nodes themselves. We have circles for our nodes and lines
for our links. The one thing you might notice at this point is
that I don't declare their locations. All I'm saying at this
point is we're going to have 200 links, 100 nodes, let's put it that way, and
one other thing you might notice is this coloring, and I'll come back
to that in one second. If we just ran this at this
point, and I can show you what happens, nothing would
be shown on the screen, because I have not given the
nodes and the lines a location. That's where
this simulation part comes into play, and this is one that's built
natively into D3. The force-directed graph specifically uses the forceManyBody
approach. Think about it in these terms: each of the
nodes in our graph is going to have a repulsion, or
charge, between it and the others, so they're going to push each other apart.
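
To give a feel for that charge idea, here is a toy repulsion step written from scratch. It is a simplified illustration only, not D3's actual algorithm; the function name repel, the strength of 30, and the starting positions are assumptions.

```javascript
// Toy repulsion step: every pair of nodes pushes apart with a strength that
// falls off with distance. A simplified illustration of the "charge" idea
// only; it is not D3's implementation, and `strength` is made up.
function repel(nodes, strength) {
  for (let i = 0; i < nodes.length; i++) {
    for (let j = i + 1; j < nodes.length; j++) {
      const a = nodes[i];
      const b = nodes[j];
      const dx = b.x - a.x;
      const dy = b.y - a.y;
      const distSq = Math.max(dx * dx + dy * dy, 1e-6); // avoid dividing by zero
      const dist = Math.sqrt(distSq);
      const push = strength / distSq;
      // Move each node directly away from the other.
      a.x -= (dx / dist) * push;
      a.y -= (dy / dist) * push;
      b.x += (dx / dist) * push;
      b.y += (dy / dist) * push;
    }
  }
  return nodes;
}

const toyNodes = [{ x: 0, y: 0 }, { x: 10, y: 0 }];
const gapBefore = toyNodes[1].x - toyNodes[0].x;
for (let step = 0; step < 5; step++) repel(toyNodes, 30);
const gapAfter = toyNodes[1].x - toyNodes[0].x;
console.log(gapBefore, gapAfter); // the gap grows with each step
```

With repulsion alone the nodes would drift apart forever, which is why a layout also needs something pulling connected nodes together.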
Links are going to be set in terms of how long we want them to be,
say 10 pixels long each. You can set and change
those, and I'll show you how to do that moving forward,
but for the most part we just need to declare that we're
using this simulation and give it some basic information at this
point. One of the things you might see is this center part. All it does is say
where on the screen you want to put
your force-directed graph. So we set the center to half the width and half the height,
basically the center of that SVG object.
From there, we just have to tell the simulation where
the nodes are, and again this nodes here refers to our array up here, this
array, and we tell it where all the links are; again we're going to pass the links
from the array as well. So it knows that it needs to work with these
nodes that are going to be on the screen, roughly a hundred, and it knows
that these links are connecting these nodes.
So for us the simulation takes care of a lot of the
thinking about where to place these objects. For the most part
we've covered most of what's going on; we just have to touch
on the fact that we need to know where the placement is. That's where this
ticked function comes into play. You can see here: on tick, we want you to do
something. When you're running a simulation it's going to try
to get you a final position; it's going to try to get you
an optimal layout for this network graph. Again, I'm not going to go into too
much detail on the algorithm that's running in the background,
but we're just using this to say that for each of these ticks (and
these ticks happen very rapidly, so it's going to run about 200 to 260 of
them within a second or so), as it gets closer to
a final layout, we want you
to move our links and our nodes to their current
locations. So this is taking care of this for
us: for each of our nodes it's setting the x value and the y value to
wherever the simulation needs to put it,
and the attributes x1 and y1, x2 and y2, so our starting XY position and our
final XY position, are going to be determined by where the source and the target
currently are. And again this is something that you can
copy and paste into numerous projects, because this is very standard
for a lot of these projects. So for the most part you're going to use
this as a good starting point. If you run this, like I said, you're going to
get this output, and if you want to you can
change some values just to see what it looks like.
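
The tick step described above can be sketched without any drawing library: on each tick, the current node positions are copied into the attributes the circles and lines need. The object shapes, the sample coordinates, and the return value are assumptions for illustration; in the notebook the same values would be written onto the SVG elements instead.

```javascript
// Sketch of what a tick handler does, without any drawing library: copy the
// current node positions into the attributes each shape needs. The object
// shapes and field names here are assumptions for illustration.
function ticked(nodes, links) {
  const circles = nodes.map((n) => ({ cx: n.x, cy: n.y }));
  const lines = links.map((l) => ({
    x1: l.source.x, y1: l.source.y, // the line starts at the source node
    x2: l.target.x, y2: l.target.y, // and ends at the target node
  }));
  return { circles, lines };
}

const sally = { name: "Sally", x: 10, y: 20 };
const matt = { name: "Matt", x: 40, y: 60 };
const frame = ticked([sally, matt], [{ source: sally, target: matt }]);
console.log(frame.lines[0]);
```

This only works because each link's source and target are references to the node objects themselves, so a link always sees its nodes' latest positions.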
So if I change the stroke width, which is how
thick the lines are for each of the links, I can rerun this and
now they're really, really thick, because I changed them from two pixels up to ten
pixels. And if I come back here, go to one pixel, and rerun
this again, now they're obviously a lot thinner. You
can tie the width to values, and I'll show that in a couple of seconds
here, and you can base a lot of things on the data itself. D3
stands for Data-Driven Documents. A lot of the time it's going to help you
with making sure that you're taking data or information and
repurposing it for the visualization that you're trying to put together.
It's going to take care of a lot of the things that let you use the
data to create the graph, so there's less manual processing
going on and less human error, and you're representing the
data very accurately in that regard. So it's taking
a lot of that detail work and helping you through that process, and
again you can control a lot of these variables and make these changes. One
thing you might have noticed, as I mentioned before, is the coloring.
D3 has a standard way of doing this. These are the
default values; I just need to include this line here, and what it'll do is
create an ordinal scale. It lets me say, for
each data point from 1 to 20 (this is category 20,
so there are 20 colors that I can use), that it's going to automatically
take care of the color assignment for me. So I don't have to
specifically say for group zero I want you to be this color, for group one I
want you to be this color, and so on. It's going to use this
color palette, and for each new data point it's going to
color it appropriately within 20 colors, because I'm using that
specific scheme. If you're interested, check out the website;
there are a number of different color schemes that you can use. If
I want to I can set this to 10 and run this again,
and you're going to see that we have a little bit
of repetition at this point. So we have a group here that's blue and a blue one over
here. All that means is that in our data set we more than likely have
groups that go past 10, maybe up to 11 or 12, so that's
why it wraps around past that threshold. So I'll set that back to 20... okay, if I
click, let me run it, and it should go back to
having those distinct colors again. So that's how you start off with
creating these network graphs. From here we can make some small changes
here and there to highlight certain things, to try different variations,
to change styling, to help bring out some of the information
here, or to highlight certain things and style certain things
as well. So at this point, if we run the next cell,
you should see something that looks similar to this. It's going to
be slightly off, maybe a little here and there, but for the most
part you should see a network that looks like this. Again this is using
that same data set. One of the other things we can add within our
simulation, and we've
mentioned this before, is distances for each of these links, so
it knows how far to push all these nodes
apart, and we can be a little bit clever in this regard. So in this
case, all I'm asking in this function here is whether the group of the source
node, the source circle, is the same as the group of the target one. So in this case,
if we go down here to these orange ones, there's a link between these two oranges,
and for the most part we want to say that if that's true, we want to bring
them really close together, so I'm going to change this number to five. And
this is a sort of true/false statement, so if this is false,
this number here will apply. I'm gonna make this a hundred, and so if I rerun
this we should see that basically all the ones in the same group sort of pull
themselves closer together, so everybody inside a
group is really close together, and then for links
that go outside of the group, so if there's a light blue here linked to a peach color here,
basically that's gonna be a hundred pixels apart; they're going to push each
other apart as much as they can. And you can sort of, if you can imagine sort of playing
around with this to, again, sort of highlight groupings. Or if you wanted
to do it the opposite way, so if I change this to a hundred this way and, say, ten
this way, you're going to see it all kind of meld together,
if you will, because of the fact that now we're biasing
towards pushing group nodes away from each other and then pulling
nodes that are outside of groups or memberships closer together.
And there's a couple of other functions we can use as well. For the most
part, we can bias where the x position and the y position are. At this point in time, so
if I wanted to... Oh, one other thing I'll mention as well:
you can not only set a distance between each of these nodes,
you can actually set a strength as well. So let's assume that we set
our strength for all these links to zero. If you run this again, you're going
to find that you have a very relaxed network,
if you will, because there's no force pulling each of these nodes
together; it's sort of ignoring all these links on the screen.
And you can set that between zero and one, so I'm going to set it to
point five, and for the most part it should look like this. Back to 20... and 50. We can sort
of set it to that. A couple other things you can also see as well: you can sort of
set an x value and the y value so you can kind of specifically try to be in a
specific location, and also things like charge so charge is basically how
much the actual force directed graph sort of pushes nodes away from each other.
And I have a note here that basically covers if it's a positive value. The
strength at this point is negative, so it's actually using a charge that's
trying to push all the nodes away from each other. If we set that to a positive, it's
basically -- I'll do that now, show you what happens. Basically all the nodes are
trying to pull each other closer together towards the centre position.
It's not very informative, obviously. If you want to, you can go and say, let's set
this to negative 300, and it's really going to start pushing all the nodes
away from each other.
So that's the charge factor. The collide factor basically
is the actual node repulsion itself, so if I set this to, let's say, 10,
again it's going to deal specifically with the nodes themselves and push
all the nodes away from each other. And so if I set this... and again,
you can sort of play around with this if you want, just to see how these different
values change what's going on. So if I set this to 100, they're really trying to
push each other apart at this point. Again, you can change things as you see fit.
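The same-group distance rule described a moment ago can be sketched in plain JavaScript. This is a minimal sketch, not the notebook's actual code: the helper name makeLinkDistance is made up for illustration, and it assumes each link's source and target have already been resolved to node objects carrying a group field, as in the Les Misérables data.

```javascript
// Hypothetical helper (not from the workshop notebook): builds a distance
// accessor that returns a short distance for links inside one group and a
// long distance for links that cross groups.
function makeLinkDistance(sameGroupDistance, otherGroupDistance) {
  return function (link) {
    // Assumes link.source / link.target are node objects with a `group`.
    return link.source.group === link.target.group
      ? sameGroupDistance
      : otherGroupDistance;
  };
}

// Small made-up nodes, just to exercise the rule.
const orangeA = { id: "a", group: 3 };
const orangeB = { id: "b", group: 3 };
const lightBlue = { id: "c", group: 1 };

const linkDistance = makeLinkDistance(5, 100);
const withinGroup = linkDistance({ source: orangeA, target: orangeB });
const acrossGroups = linkDistance({ source: orangeA, target: lightBlue });
console.log(withinGroup, acrossGroups);

// In a D3 (v4 or later) simulation this would plug in roughly as:
//   d3.forceLink(links).id(d => d.id).distance(linkDistance).strength(0.5)
```

Swapping the two numbers, makeLinkDistance(100, 10), gives the opposite bias, where groups get pushed apart and everything melds together.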
Something kind of cool and fun to play with, if you will.
Moving on to the next cell, the only thing that's going on that's a little
bit different here is that I changed up the stroke width to
utilize that value that we've talked about before. So all this is
basically saying is, for this stroke width I want you to actually use the data to
sort of showcase how thick the lines are. So if we run this, we're going to see
that we have a lot of dark lines between some of these individuals, and for the
most part all this is telling us is that these individuals are really
communicating or interacting with each other within this Les Misérables
dataset, if you will. And these thinner lines are sort of saying that maybe
they co-occur only once throughout the story. One last thing I'll
mention is you can actually do something called dragging, and this dragging
specifically allows you to click on a node, so I'll do that here, and
drag it around, so you can make it a movable object. And all you
need to do is add a set of criteria in
terms of drag started, dragged, which is while you're dragging, and then drag ended as
well. And this is something where, a lot of times, you're going to
have a default. If you go back to this force directed graph here, these are
basically just default values. It's not something I've put together;
it's basically their calculations and things of that nature,
so a lot of times I'll just copy and paste this. But for the most part,
all it does is allow you to add a little bit of functionality
in terms of, now we can set up our graph, so we present our graph, and now
somebody can go and click on it, drag it off this position here,
move it around, see what happens as you move it around. And if I change, say, the
collide... so everybody's really trying to push each other away. Back down here, I
click on this node, and as I move it around you see basically all the other
nodes are trying to get away from it, if you will. Think about this in
terms of two magnets with the same positive charge as one another:
they're going to repulse each other at that point. So that's kind of heading
into the first sort of point, in terms of this is how you create that network
graph. And for the most part, again, try to go through and just maybe change some of
the values, see what's going on, maybe put together your own data set to see what
happens with some of this, you know, how this should look in this regard, and
then just sort of play around with changing some of these
values as you move forward. So that's the network graph portion
of the conversation. What I would like to do now is actually head
over to the map structure at this point. So network graphs basically
are handled in this data set, and if you go into our dataset here, in
terms of the Workshop 2-, you're going to see basically that we have another file
called Map, and I'll open that up here, and from here basically there is
some of this going on, and I'll walk through this as we
need to. The first thing I'll make note of before I move on to that part is.. go
back to my presentation here.. I skipped over a little bit of this information, a couple
of topics in terms of network graph theory... you know, ways you can add
information to this stuff. What I recommend is basically to
walk through this and take a look at it at some point. For the most part it's
just basically some centrality measures, so you can add that information
and insert it in the graph, and then make it more interesting visually.
There are some social science topics as well that
you can look into, and again I'll kind of let that go just for the
conversation for today. With that, we're gonna actually go through and
talk about... I'm going to skip a couple slides for a moment here, and this
should sort of outline today's project, if you
will. Our project is basically going to be: I want to create a map of
the US including Alaska, Hawaii, Puerto Rico... basically anything that's sort
of under the U.S. umbrella, if you will. And for the most part we're going to then
limit this map to Pennsylvania and this is going to be by County. I'm going to go
out and find data for each of these counties and then basically color code
our counties by population. So this is sort of outlining what we're
going to be doing today. A couple things I'll make note of is the
fact that one of the main topics we'll be talking about is something called a
projection. One thing you have to consider when you're creating maps
is the fact that you're trying to recreate an obviously three-dimensional
object, which is the Earth, and you're trying to project it onto a
two-dimensional plot, which is your screen. So you have to think about how do
we project that onto that screen. And I kind of like this diagram in terms of, if
you put a light bulb inside the earth, how this would actually
project out, and how people, you know, typically will see a normal map, and how
that can lead to some incorrect proportions a lot of times,
especially on the northern and southern parts of the hemispheres. So there's
actually an interesting project I recommend going and trying out. It's
called The True Size. It's kind of a cool little thing: basically you can
type in a country or a state and it'll give you the size, and again this
is sort of a default map that we typically would see in a Google map or
anything like that. The one I chose here basically has the United States
and then I also have Greenland. So this small little blue object here is
Greenland. A lot of times people assume that Greenland is this
massive area, which it is, but not this size. Basically
the issue is that you're projecting the top part of that sphere onto that
two-dimensional map, if you will, and its actual size is, you know, smaller than the
United States. Well, about a quarter the size, if you will, of the continental United States. So
check it out; it's kind of a neat little way to showcase the
proportions that are actually true in terms of the
actual sizes of these continents or countries. For us, the first thing
you're going to do is go out and try to find something called a
ShapeFile. Now one thing I'd like to note is the
fact that when we're talking about creating a map, one of the
questions I got yesterday was, "Can't I just use something like
ArcGIS or QGIS to create a static map?" And that's very true. What
we're going to see today is not going to be groundbreaking in terms of
things you might do in software packages like ArcGIS. For the most part,
this is just sort of a starting point in terms of trying to
meld two pieces of information together, that being the ShapeFile of the
United States and then some additional information like the
county population information, and you can sort of build out from there. But for
the most part, if you're just looking to create a static image, ArcGIS is going to
be one of your best places to start and put this together, or QGIS
is another resource as well. But I'm still going to walk through this just because,
again, you can sort of build out more dynamic maps, some interesting things
that you can kind of put together. But the first thing that we need to do is
actually find something called a ShapeFile. A ShapeFile is basically
how these maps are being created, and it's sort
of an agreed-upon standard in terms of a lot of mapping structures. So if I
use a ShapeFile from this location,
specifically I'm using one from the Census Bureau, basically the map of the
US, we can have an agreed-upon way that, you know, the United
States is going to be formatted and how it's going to look and its dimensions,
and it's going to use things like latitudes and
longitudes to actually etch out all the states. Again, that includes Alaska and
Hawaii as well. But it's sort of an agreed-upon standard, if you will.
If you go to this website, and again I'll make a note of the fact that if you go
into the folder that you downloaded you should have all the information, so we
don't just sit here and wait for all these things to download. If you go
to that website you can see that you can download different things in
terms of the congressional districts, or a zip code map, which is a very
large file because it actually has a lot of detail, though people don't really know
that. I've created zip maps before, but for the most part County has enough
small details for us to play around with. And you can download these
and try these all out, but for the most part I'm going to talk about the
County one. And that's what this file called cb_2016_us_counties
zip folder is. Again, if you were using something like ArcGIS you would just
open this up and, you know, add the additional information,
like another outside folder, like a raster folder or file of some
sort, but for us we're gonna walk through the ways to actually
create the map using d3. There's a couple resources that you can use
through ArcGIS to create the data structure we
need, but I actually have a couple of links in my presentation, a
couple of ways to convert this so we can handle it. Basically these
are open source projects that allow us to do it.
Pretty straightforward. With this first one, I'll show you what
I mean... you would just take that zip folder of the ShapeFile, drop it right here, and
click convert to GeoJSON. And for the most part, all you need to do is
copy and paste what's on this screen. For the most part, that's going to be enough
to get us started, in terms of this is the structure of the
data, so it's taking care of placing
out the actual states and also providing that sort of guidance in that
regard, so we don't have to start from scratch, if you will. So all
I did was drop that ShapeFile in and it exports this out for me automatically. A
nice little resource, again, a free resource online for us to utilize. The
other one I have listed here as well is something called mapshaper, and
mapshaper is really nice to help us, especially with very high detail maps, if
you will. And so if I want to, and you go back in here -- so
this output is the one that's called us_json --
I can just drop it into here
and it'll basically give you an outline of what you would
actually see in the ShapeFile itself. Again you're gonna see Alaska, Hawaii, and
so on and so forth, the continental United States as well. You're going to
notice something as well over Alaska: because it sort of wraps around
the world, it actually has parts of it sticking out on the other side here.
What this can let you do is simplify these maps if you would
like to. There's a button here called 'Simplify' and I'll
just show you what happens if you simplify them... Going in, as I
simplify it more and more and more, you're seeing a lot of those lines
become straight lines. So basically, it loses a lot of its details if you will.
But this is just nice, again, if you were looking at a larger zip map, if you
will, with a lot of different lines and things of that nature;
you're going to need to simplify these maps a little bit. Again, it's just a
free resource to utilize if you ever need to, and then we can just export that
as need be. So at this point, what you have here is basically, we're
going to be using this us.all.json file, and that's going to be basically
this map here. And that's gonna be our starting point for creating these
maps. The first thing I'll make note of, now
that we're back in Jupyter and under that Map file, the first thing you're
gonna notice is a script here. I call this "the map scaling magic". This is
a function that I've put together through my time working with
maps. I don't want to go into a lot of detail with it. It basically creates a
function called createScope. We're just going to use that throughout here. And
what this allows us to do is basically say, based on the width and height
of the window we're creating our map in, so whatever this width is, so let's say
800 by 1000 or 800 by 800 or whatever this screen size is, we want to
create the map so it maximizes that space.
If you don't have this, you're gonna have to play around with the scaling
back and forth. Again, it's something I'm not gonna go into in too much detail, but it's
something I utilize a lot of times, because if I have a map that's taking up,
you know, again, an 800 by 800 space, I don't wanna have to think about how
do I scale that to that particular screen, or if I'm using the entire screen,
again, I don't want to have to think about handling that. So this
is just sort of a magic function that I've used before and, again, I'll
leave it here just so you can utilize it, but I'm not going to go into too much
detail on it. But it can be used throughout most of these
cells, if you will. So what we have next is basically the actual sort
of outline of [inaudible] recreating the actual map itself. One thing I'll make a
note of is the fact that I've also included a link to where I got
the original inspiration for this project, this link up here... As you can
see here, it gives you an idea of, obviously, the United States
here with Alaska and Hawaii at the bottom left-hand corner. And
again, we're not going to follow this word for word here, but
for the most part, this is the inspiration for the actual graph that
I'll be talking about here in a moment. So the first thing I'll make note of, as
I already mentioned before, is that createScope, and again this is
adhering to whatever width and height we have, so in this case we have an 800
by 1000 window that we're going to be creating this in. So
basically it'll automatically tailor the shape file to
maximize that space for us. And so, I call it just a variable scale. So going
through this, I mentioned before we're gonna be referencing us.all.json. This will be
a JSON file that will allow us to create the entire United States, including, you
know, Alaska in its sort of correct positioning, if you will, and Hawaii as
well. So when you bring that in, we first have to declare
a specific type of projection for this, and there's a number
that are available for us to choose from. I'll just open
this up real quick here and let you go through and see some of
the available ones that you could change to, and I'm going to skip to
the bottom here. There, these are actually the projections
themselves. So what I chose was one of the default ones that work for
laying out this map, but if you want to go through and check it out in terms of
other ones, you can change that and it'll actually change that perspective for you.
And again, what we do first is set up a projection, so this
projection basically says, for that given ShapeFile, how should this be
projected onto the screen itself. And for this first part here, what we're doing is
literally just setting up a 1, 0, 0, so a scale of 1 and a translate of 0, 0, which doesn't change it at all. And what we can
then do is tie the projection we're creating here to the
screen itself, and that's what this path is.
It's basically a path function, if you will, and so all that means is when we're
reading the JSON object into it, we're saying, okay, I want you to create
this based on this outline of, let's say, Pennsylvania.
Pennsylvania has, you know, a specific outline, and we want you to
change that perspective to the one that we have set here and prepare the
projection for the actual screen itself. So just kind of include these lines. One
thing I'll make note of is the fact that once we set this initially, we
actually have to rescale it. And this is where that createScope comes into play.
So you're sending the ShapeFile to it; it says, okay, I know the ShapeFile, I know
the width and height you're looking for, because these are
sort of standard throughout, 800 by 1,000. I'm going to create something called
"scale" -- again this is just a variable that we put together at this point --
and it lets you set the scale, set the translate, and again you just kind of
copy and paste a lot of these specific things. So from there, that'll
basically correct that shapefile so it fits that 800 by 1,000 screen.
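The rescaling step can be sketched in plain JavaScript. The internals of createScope aren't walked through here, so this is only the commonly used bounding-box fit, written under that assumption: start from a unit projection (scale 1, translate 0, 0), measure the shape's bounds, then work out the scale and translate that centre it in the window. The function name fitToWindow and the sample bounds are made up for illustration.

```javascript
// Hypothetical helper (an assumption, not the workshop's createScope).
// bounds: [[x0, y0], [x1, y1]] of the shape under a unit projection,
// which is the shape of value d3.geoPath().bounds(feature) returns.
// fraction: how much of the window to fill (1 = edge to edge).
function fitToWindow(bounds, width, height, fraction) {
  const [[x0, y0], [x1, y1]] = bounds;
  // The tighter of the two dimensions decides the scale.
  const scale = fraction / Math.max((x1 - x0) / width, (y1 - y0) / height);
  // Shift so the centre of the bounds lands on the centre of the window.
  const translate = [
    (width - scale * (x1 + x0)) / 2,
    (height - scale * (y1 + y0)) / 2,
  ];
  return { scale, translate };
}

// Made-up bounds: a shape 2 units wide and 1 unit tall, in an 800 x 1000 window.
const fit = fitToWindow([[0, 0], [2, 1]], 800, 1000, 1);
console.log(fit.scale, fit.translate);

// With D3 the result would be applied roughly as:
//   projection.scale(fit.scale).translate(fit.translate);
```

Width is the limiting dimension in this example, so the shape fills the window horizontally and is centred vertically.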
From there we have a lot of similarities with the other ones we saw before.
Originally we're going to get rid of whatever graph is there, and then we're
going to create a new SVG object, and the only new part to it
specifically is adding this path function. So we create a group, again,
just so we have it under a specific group, in this
case we're going to call it County, and then we just append the path from
there. So we feed in the shapefile and its features. One of the things
I'll make note of as well: when you look at the shapefile itself, there's going to be
a couple things. One is going to be features, which is basically the data used to
create the shapes. And then there is properties; properties is basically
metadata about those specific states or counties or whatever that shapefile
specifically has. And that terminology, features and properties,
is pretty standard with GeoJSON files. So we just feed it in, but all we want to
give it is the features themselves, so it knows to create these shapes where the
features are. Basically, it's going to know the XY coordinates
for all these features, all these different, in this case, states or
counties, so we know it has that information available
to it, and then we just read it into the path which, again, we've created up
here. So it uses this projection that we gave to it, and we sized the
projection in terms of the screen width and height at this point. So if you
run this, what you're going to see is what we saw before, basically,
you know, here's Alaska, here's the United States, here's Hawaii,
and you can also see that Alaska kind of trails off, because it goes
around the world, so it has some bits over here as well; parts of the state are
islands off to the side there. For us, we're going to be focusing on
specifically creating a map for Pennsylvania, and this is where that
properties comes into play that we mentioned before. So this next line here
basically goes through a lot of the same things that we've talked about
before, but one thing we want to change is, instead of using all the
features, so all the states, Alaska, Hawaii, you know, the rest of the United States as well,
we just want to focus specifically on Pennsylvania. And so
what we can actually do is use something called a filter. A filter basically says,
out of all this information, I want you to filter on
basically one key characteristic or one key feature. And in this case, one of
the properties that is given within the dataset that we were able to put together
is something called STATEFP. And all this is is
an ID for each of the states. And it just so happens, 42 is Pennsylvania. So I
kind of jumped ahead a little bit, but, you know, this is
the main thing, in terms of saying, well, we don't want to focus on the
entire United States, we want to focus in on Pennsylvania. So we use a filter, and
what this actually returns, it's looking for some boolean logic,
so it's a true or false situation. So in this case, as it's going through each of
the states: is the STATEFP equal to 42? Nope. Let's say for New York,
it says, "Okay, whatever that ID is, whatever the STATEFP is for it, it's not 42, so we're
not going to use it." So it's only going to return the true scenarios. So in that
case, it's only going to find the state that is equal to 42, which I mentioned before is
Pennsylvania. So instead of an entire data set, we're now focused in on
specifically Pennsylvania and its counties. Nothing changes in terms of a
lot of what else is going on here at all. So all it's doing is taking the data
set that we brought in, that shapefile, that us.json file, and filtering it
specifically for Pennsylvania. So if we look at that, you should see Pennsylvania.
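As a small sketch of that filter step, here is the true/false test applied to a tiny hand-made feature collection. The three features below are stand-ins for the real file (geometry left empty for brevity), and the helper name countiesForState is made up for illustration; the STATEFP property and the value 42 for Pennsylvania come from the walkthrough above.

```javascript
// A tiny stand-in for the county GeoJSON: each feature carries its
// metadata under `properties`. Geometry is omitted (null) for brevity.
const sampleCollection = {
  type: "FeatureCollection",
  features: [
    { type: "Feature", properties: { STATEFP: "42", NAME: "Allegheny" }, geometry: null },
    { type: "Feature", properties: { STATEFP: "36", NAME: "Albany" }, geometry: null },
    { type: "Feature", properties: { STATEFP: "42", NAME: "Centre" }, geometry: null },
  ],
};

// Hypothetical helper: keep only the features whose STATEFP matches.
// String() on both sides so "42" and 42 are treated the same.
function countiesForState(collection, stateFp) {
  return collection.features.filter(
    (feature) => String(feature.properties.STATEFP) === String(stateFp)
  );
}

const paCounties = countiesForState(sampleCollection, 42);
console.log(paCounties.map((f) => f.properties.NAME));
```

The filtered array is what would then be handed to the path in place of the full list of features.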
So here are all the counties within Pennsylvania. From there,
we're going to need additional information to actually put all
these pieces together at this point, and this is basically looking at
a resource that I also included. You go in here basically -- actually let's
go to their site real quick -- say Penn State Sites. Pennsylvania State Data
Center, and what it has is just some metadata about various information
across the counties, and so, you know, one of the things we're looking at is
population specifically. So I went through the data and actually
went through some of the census information and found per county information
for population, and that's what we're going to be using for this map. So it's
called pacountypop.txt, and from there what I have here is a couple
different things in terms of just taking the data as it originally was
and reformatting it so I have population per county.
And these IDs are similar to what we talked about before; the IDs are
for each of the counties. And so what it creates at the end
is actually an object which sort of says, "Okay, for each of these, we're going to
have, you know, a population." So for each of these counties, based on their IDs,
we're going to have a specific population. So we just use this section
here to bring in that file, parse through it, and so now we can actually
utilize it for the rest of the code here. Okay, so in this step again, we're gonna
have a lot of similarities between the last one and this one. Shape features is
basically, again, focused in on Pennsylvania. We're creating the same
projections, doing all that the same as well. A couple of new things we have here:
basically, we want to color in these specific counties. So we're going to
add a couple things to this. And so for that,
one of the things we're gonna do is
use that population data to say, okay,
based on all the populations of all the counties, we want to create a range from
zero to one, because what we're going to have is this color palette, and this
color palette goes from zero to one. So what we want to do is actually say, "Okay,
for each County, we want to take the population, and set it so that the lowest
population is labeled or sort of mapped specifically to zero and the highest
population is going to be mapped to one." And so that will give us sort of that
spectrum that we're looking for. I use a power scale and I'll mention why
here in a moment. But for the most part, all this does is create a number between
zero and one, so it can also be decimals as well, obviously, where it maps the
lowest to the zero and the highest to the one. And then the palette is sort of a
way to sort of express that within the colors themselves. And if you go to this
particular site as well you will see that there are a number of different
sort of color palettes you can use. Diverging, basically is one sort of
color palette, we have single hues, so for sequential information we also have
multi-hues, and I specifically just chose one, just to kind of take a look
at different colorings and things of that nature. But, for the most part you can
kind of mix and match in terms of what you want to have. So if I want to change
it to this one here, all you'd have to do is go find it and change that specific
value. For the sake of argument I'm going to leave it on plasma, which is the color
spectrum I'll be using. So now, we basically recreate what we did before, as
we saw before, and we just read in the fill, so that's going to be
the color inside the actual counties themselves. You're going to go through
the data, we're going to find the IDs, and for each of these IDs it's going to have
a value, so it's going to have a value for each of the
counties. So if it's Allegheny County, it'll have the population there for it,
as we discussed up here. And for all
the other counties within Pennsylvania specifically, it's going to say, "Okay,
whatever that value is, go use our scaling to map it to a number between 0
and 1", and then the palette will take care of it from there. So the
palette's looking for a number between 0 and 1 and it'll apply the color
specifically for that given value. So this is what we end up with
initially. So we can see here, the Philadelphia area and the Pittsburgh area
obviously have the highest of those values, so that's why
they're obviously a different color versus the other ones, which are
kind of uniform in terms of their populations. Obviously not all the
way. And I'll mention this moving forward. If you've got to remember one thing
at this particular step, it's that it's using populations, so because the population
of the Philadelphia area or the population of the Pittsburgh area is
much, much higher than the other ones, it's going to very much bias a lot of those
values specifically to those areas and sort of mute the other ones.
And again, there's a way to correct this moving forward. So again,
this is our first map, if you will, at this point. One thing I'll
make a note of is I don't include a legend here. Legends are actually
something that requires a little bit more than what I'm looking to
focus on here. But there is a great library that I would just
recommend checking out that lets you create these in a pretty
straightforward way. It's external to D3;
it's open source, it's just not an official part of the D3 library, if
you will, but it's something I've used before for a lot of different projects. I'd check
it out, though, just to see how you can put in legends using this additional
library, if you will. So like I said, we got Pennsylvania here, but we have a
little bit -- a lot of bias, specifically towards the
Philadelphia and Pittsburgh areas. So one thing we can
actually do is change that bias. So what we can actually do, and there's
one step I'm skipping just for a moment just to talk about the color and its
variation. So back here, where we have our next plot, we have the
power scale, and what we're going to actually do is change the exponent. The
exponent basically says how we're going to, you know, move the values,
the smaller values, closer together and along toward the upper part of
the tail, if you will. And so what that allows us to do is actually
highlight a lot more of the larger populations, just so it's not
100% or majority biased towards Philadelphia or Pittsburgh, and so what
we get here is basically a little bit more density in terms of some
of these color areas, if you will. So we can actually change that from the saved value
to 0.5... The default value is 1, which basically uses a linear scale, but let's
say we change it to 2: it's going to very much bias our data set
to specifically those larger values, so obviously the Pittsburgh and
Philadelphia areas get elevated, if you will. So what we want to do is
actually, you know, create a more even scale. So let's say we set it down to 0.3, we
run this, and now we're actually getting more variation in color.
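To see why the exponent changes the colors, here is a minimal plain-JavaScript sketch of a power scale that maps a population onto the 0 to 1 range the palette expects. It is written to mirror what a D3 power scale with a 0 to 1 range computes for positive values, but the function name powScale and the population figures are made up for illustration; they are not the Pennsylvania data.

```javascript
// Hypothetical stand-in for a power scale with range [0, 1].
// Assumes positive domain values: lowest maps to 0, highest to 1.
function powScale(domainMin, domainMax, exponent) {
  const low = Math.pow(domainMin, exponent);
  const high = Math.pow(domainMax, exponent);
  return (value) => (Math.pow(value, exponent) - low) / (high - low);
}

// Made-up populations: one small county, one mid-sized, one very large.
const smallest = 5000;
const midSized = 50000;
const largest = 1500000;

const linear = powScale(smallest, largest, 1);    // exponent 1: linear
const squared = powScale(smallest, largest, 2);   // favors the largest values
const flattened = powScale(smallest, largest, 0.3); // spreads out the small ones

console.log(linear(midSized), squared(midSized), flattened(midSized));

// The 0..1 result would then be handed to the palette, for example
// d3.interpolatePlasma(flattened(population)).
```

With exponent 1 or 2 the mid-sized county lands very close to 0, so it gets nearly the same color as the smallest one; with 0.3 it moves noticeably up the range, which is the extra variation described above.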
We obviously have some populations that are very low in
Pennsylvania. State College is one of the higher ones, but not
relative to Pittsburgh and Philadelphia. We're going to play around with that. But
one thing I'll make a note of here is adding some points
as well. Basically, we want to add a point to this given graph.
So for this case you can use one of the
Google API calls. The Google API lets you
say, given an address, give me the latitude and longitude of
that address. And so live lab was where I was giving
a live version of this, so I thought, why don't I just use that location, and
that's what's listed here, and then it gives you all this meta information. So
for that location it gives you a bounding box, if you're interested in that, or
specifically a location, so this is the latitude and longitude of that location.
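As a rough sketch of that step, the snippet below pulls the latitude and longitude out of a geocoding response. It makes no network call: `sample_response` is a trimmed, made-up sample in the general shape of a Google Geocoding API JSON reply, the coordinates are approximate values for State College, and `extract_lat_lng` is a hypothetical helper name, not something from the workshop notebook.

```python
import json

# Made-up sample in the shape of a geocoding JSON response (assumption:
# results[0].geometry holds a "location" and a bounding "viewport").
sample_response = """
{
  "status": "OK",
  "results": [
    {
      "formatted_address": "State College, PA, USA",
      "geometry": {
        "location": {"lat": 40.79, "lng": -77.86},
        "viewport": {
          "northeast": {"lat": 40.83, "lng": -77.80},
          "southwest": {"lat": 40.75, "lng": -77.92}
        }
      }
    }
  ]
}
"""


def extract_lat_lng(response_text):
    """Return (lat, lng) of the first result, or raise if there is none."""
    data = json.loads(response_text)
    if data.get("status") != "OK" or not data.get("results"):
        raise ValueError("geocoding response contained no usable result")
    location = data["results"][0]["geometry"]["location"]
    return location["lat"], location["lng"]


lat, lng = extract_lat_lng(sample_response)
print(lat, lng)
```

The rest of the response (the formatted address, the viewport corners) is the meta information mentioned above; for plotting a point only the `location` pair is needed.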
Change this a bit... So there's our point, and all we're doing here is again
following the same procedure as we did before with Pennsylvania, where you can
change the bias a little bit with the exponent, and the only
thing we're going to add is a location for State College, and that's again
being computed here using the Google API call. So now we just add a point, so a
circle, and give it the cx and cy, so its location, its x,y position. One
thing I'll make a note of as well is that
latitude and longitude are basically universal to any map.
Again, the idea is that the shapefiles are set
up so you can apply them to whatever projection you're looking at.
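The idea of running a latitude and longitude through a projection to get a screen position can be sketched like this. This is a simplified stand-in, not the projection from the notebook: it uses a plain Mercator projection fitted to an approximate Pennsylvania bounding box, and `make_projection`, `PA_BOUNDS`, and the 1000 by 800 display size are assumptions made for the example.

```python
import math


def mercator(lon_deg, lat_deg):
    """Spherical Mercator: degrees in, unitless (x, y) out."""
    lam = math.radians(lon_deg)
    phi = math.radians(lat_deg)
    return lam, math.log(math.tan(math.pi / 4 + phi / 2))


def make_projection(bounds, width, height):
    """Build a function mapping (lon, lat) to pixel (cx, cy) inside width x height."""
    (west, south), (east, north) = bounds
    x0, y0 = mercator(west, south)
    x1, y1 = mercator(east, north)
    # One scale for both axes so the map is not stretched.
    scale = min(width / (x1 - x0), height / (y1 - y0))
    # Center the fitted map inside the display.
    x_off = (width - scale * (x1 - x0)) / 2
    y_off = (height - scale * (y1 - y0)) / 2

    def project(lon, lat):
        x, y = mercator(lon, lat)
        cx = x_off + scale * (x - x0)
        cy = y_off + scale * (y1 - y)  # screen y grows downward
        return cx, cy

    return project


# Approximate bounding box for Pennsylvania: (west, south), (east, north).
PA_BOUNDS = ((-80.52, 39.72), (-74.69, 42.27))
WIDTH, HEIGHT = 1000, 800  # assumed display size

project = make_projection(PA_BOUNDS, WIDTH, HEIGHT)
cx, cy = project(-77.86, 40.79)  # approximate State College coordinates
print(f'<circle cx="{cx:.1f}" cy="{cy:.1f}" r="5" />')
```

The raw longitude of about negative 77 is not a usable pixel position on its own; only after the projection does the point fall inside the display, which is why the circle's cx and cy come from the projected values.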
The projection we created up at the top, the projection magic
that I talked about before, is already set up at this point, and
so we have to run that latitude and longitude through it. If you think about this,
if we were placing a point on our screen here directly, with a longitude of
negative 77, it would be off the page somewhere. And so what this does is
basically use the projection that we've created, for seeing the United States
and also for Pennsylvania, and map the point to the 800 by 1,000
screen. So it'll say, oh, I know this location: based on this projection, this
latitude and longitude maps to this actual location within the 800 by 1,000
display. That's all we're doing on that step, and so now
it knows specifically where to put that circle. So, we have a point and this
should be mapped specifically to the Dollhouse Laboratory, and I
just changed the opacity of the fill so you can actually see the point. And so
you can add a little bit more back if you want to, let's say 7, refresh this, and
you can still see your point. And you can do not just
one point, you can do multiple different points, but this is just a simple
step-by-step way of showing how to capture an actual latitude and longitude
using something like a Google Maps API and map it to one
of our plots, as you see here. So that's basically what I wanted to cover today
in terms of both the network plots and the map plots as well. So hopefully
you can follow through, and if you run into questions, please always
feel free to contact me and I can follow up on these particular documents. It's all
listed on GitHub, and you're more than welcome to share these resources if
you'd like to. So I thank everybody for their time, and
if there are any follow-up questions please let me know. Thank you, have a good
day