Data Visualization Workshop - Let's make a map and network graph!

Okay, let's begin today's workshop. A couple of things to note before we start. I'm reposting the link to the GitHub account we'll be using today, just in case. Everything in there can be used to create what we're talking about tonight: the files you need for Jupyter, the files for the network graph, and the shapefiles and everything else needed to create the maps. If you go to that link, you'll also see another link to the presentation itself. Today we'll be creating a network graph and a map. One thing I learned from yesterday's session with the group is to limit some of the detailed discussion of the code and focus on making sure people understand the thought process behind making a visualization. So there may be a few small changes if you looked at the code earlier, because I made some small changes on GitHub today. Most of it is the same, and I'll talk about the changes as we go. I'm going to share my screen here. The first expectation is that you are able to install Anaconda. Anaconda gives us a way to put some code together and run it within the browser itself.
If you weren't at the first workshop: one issue you run into when developing D3 visualizations is that, because D3 is a JavaScript library that runs within HTML, you normally have to set up an actual web server and configure it. Jupyter lets us skip a lot of that. When you open Anaconda, you're going to see a few options. If you downloaded one of the newer versions, you'll see something called JupyterLab. It's relatively new and may be something we use for future workshops, but today we're going to use the Jupyter Notebook itself, so click Launch on that one. To download the materials, go to GitHub under dudaspm, then data workshops. You're going to see a couple of folders listed there, one for the last workshop and one for this one. All you need to do is click Clone or download and download it as a ZIP. If you don't know where to put that folder once you've opened Anaconda, you can go into Projects and add a new project (yesterday I added one called test), and it will tell you the home directory. For me that's Users, PMD 19, and there I have a folder I created called Anaconda projects, where I put anything I need for the presentation. So I would click Clone or download, then Download ZIP, and then I would unzip it and drop it into that Anaconda folder I just mentioned... let's throw it in there... and start the copy.
For us, you're going to see a new folder called DataWorkshop - Master, and inside it Workshop 1 and Workshop 2. One note in case you're just joining: everyone is muted, at least from what I see here, so if you have questions please use the chat, and I'll keep an eye on it throughout. Even if you just raise your hand or something like that, I can unmute people to make sure we get all questions answered. We're going to be using Workshop 2 today. Once you launch Jupyter it will open in a new browser tab. If I go to the home directory, there's the folder I created called Anaconda project. Go in there and find DataWorkshop - Master, and everything you need for today is inside. The first thing we're going to talk about is creating the network graph, and I should get to my slides quickly to stay on target. I should also give a quick introduction of myself. This is the second workshop I've put together. I'd like to say special thanks to the Institute for CyberScience, which sponsored these workshops, and to Teaching and Learning with Technology, which has helped as well, specifically with finding a location here on campus for them. I specialize in data visualization, social media, and data analytics, and I provide support here at the Huck Institutes of the Life Sciences, the Institute for CyberScience, and the Eberly College of Science. As I've mentioned before, these workshops incorporate the art, the data, the coding, and the UI/UX, which is user interface and user experience design. So I've already talked about the startup screen, environment variables, and in terms of...
Oh, one thing for today: we'll probably be skipping this step, which is specific to Python, but I'll come back to that in a second. We're going to be in the network folder. When I went through this yesterday, I had a script here in this first cell. To run any of this, notice that the notebook is divided into cells; this first one, highlighted in green, is one cell. You can go to Cell, then Run Cells. If you run this first cell, it's not going to work, because I was going to showcase how to install a couple of libraries first. This is a Python script, and what it creates is a random graph. One of the things you'd need to install is NetworkX, an open-source library that lets us create these random graphs. For today, though, instead of a random one I downloaded a pretty popular data set for network graphs, and I'll use that to showcase how to create the visualizations. So we won't be using this first part much today. I wanted to work with something a little better known: this is a data set based on Les Misérables that looks at the characters and their co-occurrence within the story. You'll see the main characters highlighted and the back-and-forth between characters. It's one of the default data sets that a lot of people use for network graphs, so I decided to use it again.
A couple of things to note. It's a lot of information, but the main thing I'll highlight is that you have two sections, nodes and links. That's the basic one-two of network graphs. This is a crude drawing, but when you think about networks, it's often something like a social network, a conversation network, or an email network. You have individuals, and at some point these people converse with one another. So let's assume Sally talks to Matt, Sally talks to Bob, and Bob talks to Dave. We're going to represent that today using nodes (another term people use is vertices) and links, or edges. So we'd have four nodes for our four individuals and links connecting the individuals who talk. In the drawing up here I have a directed graph, which means Sally is talking directly to Matt, but for our case today we're going to assume the relationship is equal in both directions, that is, undirected. So we have nodes and links set up for that. Another thing we see in this data set is clusters, or groups. If you look at a network as a whole, some sets of people converse with one another a lot. If those people talk a lot with each other and are separate from the rest of the graph, they would more than likely be placed into a group, community, cluster, or clique; there are different terms for this. Today I'll just refer to them as groups.
My original script up here also included a community detection algorithm. Again, I'm not going to go into too much detail, because it's outside the scope of what we're trying to accomplish today, but it does this grouping automatically for you. It partitions the network: it says these people are highly coupled, or communicate heavily with each other, so they probably belong in one group, versus the ones that are separate from that group and conversation. For our nodes, we have a name, which is the individual within the story, and a group, which is already calculated for us. So each node has those two pieces of information. Our links have three pieces of information: a source, a target, and a value. The value is technically not needed; in this case it's how often these characters co-occur or talk with one another, so it's meta information. The two most important things are the source and the target. Each one is a number that points to one of our nodes. If you go through here we have node zero and node one, so in this first link the source would be Napoleon and the target would be Myriel. That's your way of connecting the dots, literally and figuratively. Where we're actually going to start is this script part here. If you run it (go to Cell, then Run Cells), nothing really happens at this point, and to be quite honest with you this subsection is kind of redundant.
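As a rough sketch of the nodes-and-links structure just described, here is a tiny graph in the same shape. It uses the Sally, Matt, Bob, and Dave example from earlier in the talk instead of the real Les Misérables file; the group numbers, the values, and the describeLink helper are invented for illustration.

```javascript
// Toy graph in the nodes/links shape described in the talk.
// Names come from the Sally/Matt/Bob/Dave example; groups and values are made up.
const graph = {
  nodes: [
    { name: "Sally", group: 1 }, // index 0
    { name: "Matt", group: 1 },  // index 1
    { name: "Bob", group: 2 },   // index 2
    { name: "Dave", group: 2 }   // index 3
  ],
  links: [
    { source: 0, target: 1, value: 3 }, // Sally - Matt
    { source: 0, target: 2, value: 1 }, // Sally - Bob
    { source: 2, target: 3, value: 2 }  // Bob - Dave
  ]
};

// Hypothetical helper: look up the names at each end of a link by index.
function describeLink(g, link) {
  return g.nodes[link.source].name + " - " + g.nodes[link.target].name;
}

console.log(graph.links.map(function (l) { return describeLink(graph, l); }));
```

The point of the sketch is only that source and target are positions in the nodes array, so a link carries no names of its own.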
I do this just to showcase how you can add extra pieces of meta information, or other information you might be interested in; this is going to be the structure for the rest of the visualization. One question I got last time was whether D3 has ways to import files. There are actually a few different ways of doing this. The file format you're seeing here is JSON, so what we can use in D3 is a JSON reader. JSON is a way of formatting data that removes some of the redundancy that can occur with something like CSV (comma-separated values) or TSV (tab-separated values). What this gives us is a graph with two parts, nodes and links; those are its subsections. From there we run through each of them; that's what forEach means: for each of those nodes, do something. This is very much redundant, and I acknowledge that, but it makes sure we have all our information up front. So first, for the graph's nodes, we loop with forEach and build a new array, so it's the same thing. Then for source and target it's the same idea: we make a reference to the nodes up here, to make those connections going forward for D3. If you want, you can change this back to graph.json and it should work as well. For the most part, if you follow this naming structure, where nodes have a name and maybe a group (you don't necessarily need the group) and links have a source and a target, you can use this script over and over again.
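The copy-and-connect step described above can be sketched in plain JavaScript roughly as follows. This is not the workshop's actual cell: the small graph object stands in for the parsed JSON file, and the variable names are assumptions.

```javascript
// Hypothetical stand-in for the parsed JSON file.
const graph = {
  nodes: [
    { name: "A", group: 1 },
    { name: "B", group: 1 },
    { name: "C", group: 2 }
  ],
  links: [
    { source: 0, target: 1, value: 2 },
    { source: 1, target: 2, value: 1 }
  ]
};

const nodes = [];
const links = [];

// Copy each node into a new array (redundant, but a place to add extra fields).
graph.nodes.forEach(function (d) {
  nodes.push({ name: d.name, group: d.group });
});

// Replace each numeric source/target with a reference to the node object itself.
graph.links.forEach(function (d) {
  links.push({ source: nodes[d.source], target: nodes[d.target], value: d.value });
});

console.log(links[0].source.name, "->", links[0].target.name);
```

Once the links hold node objects rather than numbers, anything that later reads a link can reach the node's name, group, or position directly.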
This is where we start getting to the visualization. If I run this, you're going to see our network picture at the bottom here. This is the Les Misérables data set being visualized, and if I rerun it, it lays the graph out again from the data it currently has. This is going to be largely the same script throughout, with small changes along the way, so I want to spend a few moments making sure we see what's going on. This is the main script, and there are some parts I'm not going to go into in detail, because a lot of it is handled in the background. I've added links throughout that tell you, for example, if you want to change this variable, what you can change it to. The first thing you'll notice is %%javascript. All that does is tell Jupyter that what's going on here is JavaScript, and we just have to declare that once. The next two parts are again specific to Jupyter: we're adding an element called a div. A div in HTML is short for division; it's basically a placeholder for us. As you can see here, we have div id equals graph1. All that means is that we're going to add an element, which ends up right here where the graph is at the moment, as our placeholder, and it's going to have an ID of graph1.
We can also put some style information on the div, such as the width and the height, and margin: 0 auto, which centers it. So all that is is a placeholder, and it's something you'd probably copy and paste into any project. Next is a require statement. All this does is go out and find the D3 library, which is hosted online, and let us use it as a JavaScript library. One note: D3 is an open-source library, so you can use it for commercial or for-profit projects as well as academic research, as long as you include its license notice. The require statement gives you a path, so that any time you reference d3, as you see here, you're referring to that specific library. I also have a link here to the original example, so if you're interested you can check out where the original project was done. There are a number of different ones. Mike Bostock is the creator of D3. He used to work for the New York Times and did many of the New York Times visualizations you might have seen in the past. He created D3 and also Protovis, which was the predecessor of D3. The best part is that there are a number of different examples you can go through and play with: curved edges, fisheye distortion, and so on. We won't go into a lot of these today, but it's a great resource if you're interested in looking at different projects. So for us, the first thing we did, like I said, is create that div and give it an ID.
The second part here is the select statement. The select statement says, "On this web page, find me the div," and this hashtag or pound sign, whichever you want to call it, tells the browser "ID equals." So this reads as: the div with an ID equal to graph1, which is what we've created at this point. All this line does is say, "Go select that div, and remove anything that's inside of it." The reason I have that is that without it, and I can show that now if I remove it and run this again, we would start getting duplicates. It's toward the bottom of the screen, unfortunately, but each run adds another copy of the graph, and if the graphs overlapped one another you'd see it. I won't go into too much detail; basically there's a few [inaudible] that you'd see each time. The main variables are things like width and height; we set those ourselves, and you can change them as you see fit. The first thing we create is a substructure, an SVG. A lot of D3 uses something called Scalable Vector Graphics, which I'll refer to as SVG from here on out.
The nice thing about SVG is that it uses an x-y coordinate system, so no matter how far we zoom in (if I go back to my example here and start zooming in) you can see that the graph itself doesn't deteriorate at all. What it's actually doing is redrawing from the coordinates: it knows these circles are supposed to be at specific locations and those links are supposed to be at specific locations, so it just redraws them at the new scale. That's really nice if you want to use these for publications or posters: they can be scaled to almost any size, to whatever specifications you need for that document or poster. So we have our div to place our graphic in, and next we need the SVG itself, so that we have somewhere to put our SVG objects. One of the things I also added to the documentation here is a link to W3Schools. W3Schools is a resource I use a lot to get basic ideas of how to write code for things like SVG, HTML, or CSS; it's a wide library of basic code for you to try out and play with. I have a couple of specific links for the circle and line objects, since those are what we're using today to create the nodes and links on these graphs.
If we go into the circle page, like I mentioned, it gives you a template of what you need to create this object. You see that svg tag, which is your outer container, so you have to have that first, and then your shapes go inside it. It's called a viewport, but basically it's a canvas: a way to say there's going to be an SVG object here, along with the basics in terms of its height and width. As for the circle we're creating, we're not going to worry too much about cx and cy, which are the location of its center, because we're using the force-directed algorithm, which calculates and sets those for us pretty much automatically. The other attribute is r, which is the radius. We can hard-code that, or base it on the number of connections or things of that nature. The rest just lets you style it: stroke is the outside boundary, so that would be a black color; fill will be red, obviously; and stroke-width is how pronounced that stroke is. The other key object we're going to need is the line. Again it shows you a basic example. For a line we need four specific pieces of information: the starting x and y position and the ending x and y position, designated x1, y1, x2, y2. There's some styling as well, but that's mostly what we need. And again, the force-directed graph algorithm is going to take care of this for us, so we don't have to worry too much about it.
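To make the circle and line attributes concrete, here is a small sketch that builds the SVG markup as text in plain JavaScript. The attribute names (cx, cy, r, x1, y1, x2, y2, stroke, fill, stroke-width) are standard SVG; the helper function names, coordinates, and colors are made up for illustration and are not from the workshop notebook.

```javascript
// Build SVG markup strings for the two shapes used in the network graph.
// Helper names and the specific colors and coordinates are illustrative only.
function circleTag(cx, cy, r) {
  return '<circle cx="' + cx + '" cy="' + cy + '" r="' + r +
    '" stroke="black" stroke-width="1" fill="red" />';
}

function lineTag(x1, y1, x2, y2) {
  return '<line x1="' + x1 + '" y1="' + y1 + '" x2="' + x2 + '" y2="' + y2 +
    '" stroke="gray" stroke-width="2" />';
}

// The outer svg element is the container; shapes are listed inside it.
function svgTag(width, height, children) {
  return '<svg width="' + width + '" height="' + height +
    '" xmlns="http://www.w3.org/2000/svg">\n  ' +
    children.join("\n  ") + "\n</svg>";
}

// Two nodes joined by one link. The line is listed first so the circles draw on top.
const markup = svgTag(200, 100, [
  lineTag(50, 50, 150, 50),
  circleTag(50, 50, 5),
  circleTag(150, 50, 5)
]);

console.log(markup);
```

Saving the printed text to a file with an .svg extension and opening it in a browser should show two red circles connected by a gray line.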
Going back to the notebook, that gives you an idea of the key attributes we're going to need. So we have here circle, with a radius of five, and then line, with just some basic information. The bottom of the script is where their locations get set, which is what lets us represent those nodes and links. That's where we'll start, and then we'll come back to the part that says simulation, which runs the simulation for the force-directed graph itself. For now we're going to start here with our links. Once we've created that SVG object, the outside container, we append a new group of objects, and specifically we're going to be adding lines. This is the point where we start feeding in the data. This links refers back to links up here; again, we did this a little redundantly, but it's the array of information: each link's source, target, and value. We're feeding that into D3 to say we need you to make as many lines as we have links in our data set. If we go in here... I'm not going to count all of these, but we'll say 200 to 300 links. We don't want to encode all of this manually, so we're letting D3 say, for each of these values in our array (I'm highlighting them one at a time), create a new line. It's the same idea for nodes: for each of these we create a new circle. So we do this for the links and we do this for the nodes themselves.
So we have circles for our nodes and lines for our links. One thing you might notice at this point is that I don't actually declare their locations. All I'm saying so far is that we're going to have, let's put it this way, 200 links and 100 nodes. Another thing you might notice is the coloring, and I'll come back to that in a second. If we just ran this at this point (I can show you what happens) nothing would be shown on the screen, because I haven't given the nodes and lines a location. That's where the simulation comes into play, and it's built natively into D3. This is the force-directed graph, specifically using the many-body force. Think about it this way: each of the nodes in our graph has a repulsive charge, so they push each other apart. The links are set to a target length, say 10 pixels each. You can change those, and I'll show you how going forward, but for now we just need to declare that we're using this simulation and give it some basic information. One thing you'll see is this center part. All it does is say where on the screen you want to put your force-directed graph. We set the center to half the width and half the height, which is the center of that object. From there, we just have to tell the simulation where the nodes are, and this nodes refers to our array up here, and where the links are, which again come from the array.
So it knows that it needs to work with these nodes that are going to be on the screen, roughly a hundred, and that these links connect those nodes. The simulation takes care of a lot of the thinking about where to place these objects. We've covered most of what's going on; we just have to touch on how the placement is applied. That's where this ticked function comes into play. You can see here: on tick, we want you to do something. When you run a simulation, it tries to work toward a final, close-to-optimal position for this network graph. Again, I'm not going to go into too much detail about the algorithm that's running in the background. These ticks happen very rapidly, a couple hundred of them in a short span, and on each tick, as it gets closer to a final layout, we want it to move our links and our nodes to their current simulated locations. This function takes care of that for us. For each of our nodes, it sets the x and y values to wherever the simulation needs to put it. For the lines, the attributes x1 and y1, our starting position, and x2 and y2, our ending position, are set by where the source and the target currently are. And again, this is something you can copy and paste into lots of projects, because it's very standard.
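What the tick step computes can be sketched without D3 at all. In the snippet below, the x and y positions are invented stand-ins for the values a simulation would supply, and the function names lineAttrs and circleAttrs are hypothetical; only the idea (lines follow their source and target, circles follow their node) comes from the talk.

```javascript
// Plain-JavaScript sketch of what a tick handler computes. No D3 is used here:
// the x/y positions are made-up stand-ins for what a simulation would provide.
const nodes = [
  { name: "A", x: 10, y: 20 },
  { name: "B", x: 40, y: 60 }
];
const links = [{ source: nodes[0], target: nodes[1] }];

// Attributes for a line: start at the source node, end at the target node.
function lineAttrs(link) {
  return {
    x1: link.source.x, y1: link.source.y,
    x2: link.target.x, y2: link.target.y
  };
}

// Attributes for a circle: centered on the node's current position.
function circleAttrs(node) {
  return { cx: node.x, cy: node.y };
}

// One "tick": recompute every shape's attributes from the current positions.
function ticked() {
  return { lines: links.map(lineAttrs), circles: nodes.map(circleAttrs) };
}

console.log(ticked());

// Pretend the simulation nudged node B; the line's end point follows it.
nodes[1].x = 45;
console.log(ticked().lines[0].x2);
```

Because each link holds references to its node objects, moving a node is enough: the next tick reads the new position for both the circle and every line attached to it.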
So you can use this as a good starting point. If you run it, like I said, you get this output, and if you want, you can change some values just to see what it looks like. Say I want to change the stroke width, which is how thick the lines are for each of the links. I can rerun this, and now they're really thick, because I changed them from two pixels up to ten pixels. If I come back here, go to one pixel, and rerun again, now they're obviously a lot thinner. You can tie the width to values in the data, and I'll show that in a little while; a lot of things can be based on the data itself. D3 stands for Data-Driven Documents. It helps you take data and repurpose it for the visualization you're trying to put together. It lets you use the data to create the graph, so there's less manual processing going on and less human error, and you're representing the data accurately. And you can control a lot of these variables and make these changes. One thing you might have noticed, as I mentioned before, is the coloring. D3 has a standard way of doing this.
These are the default values. I just need to include this line here, and what it does is create an ordinal scale. I'm using category 20, so there are 20 colors I can use, and it automatically takes care of the assignment for me. I don't have to say, for group zero I want this color, for group one I want this color, and so on. It uses this color palette, and for each new value it sees, it assigns the next of the 20 colors. If you're interested, check out the website; there are a number of different color schemes you can use. If I want, I can set this to 10 and run it again, and you're going to see some repetition: we have a group here that's blue and another blue group over here. All that means is that our data set more than likely has more than 10 groups, maybe 11 or 12, so it runs past the end of the palette and starts reusing colors. So I'll set that back to 20... okay, let me run it, and it should go back to having distinct colors again. So that's how you start off creating these network graphs. From here we can make small changes to highlight certain things, try different variations, and change the styling to help bring out what's in the data.
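The repeated blue seen with a 10-color palette can be reproduced with a minimal stand-in for an ordinal scale. This is not D3's implementation, just a sketch of the same idea: each new value gets the next color, and the palette wraps around when it runs out. The makeOrdinalScale name and the placeholder color names are assumptions.

```javascript
// Minimal stand-in for an ordinal color scale (not D3's own code).
// Each previously unseen value gets the next palette entry; the palette wraps around.
function makeOrdinalScale(palette) {
  const seen = new Map();
  return function (value) {
    if (!seen.has(value)) {
      seen.set(value, seen.size); // position in order of first appearance
    }
    return palette[seen.get(value) % palette.length];
  };
}

// Placeholder color names, not the real category palette.
const tenColors = Array.from({ length: 10 }, function (_, i) { return "color" + i; });
const color = makeOrdinalScale(tenColors);

// Eleven groups (0 through 10) but only ten colors.
const groups = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
const assigned = groups.map(function (g) { return color(g); });

console.log(assigned[0], assigned[10]); // the first and eleventh groups share a color
```

With a 20-entry palette the same eleven groups would each get their own color, which matches what happened when the setting was put back to 20.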
At this point, if you run the next cell, you should see something similar to this. It may be slightly off here and there, but you should see a network that looks like this. Again, this is using the same data set. Something else we can add to our simulation, and we've mentioned this before, is a distance for each of these links, so it knows how far apart to hold the nodes at each end, and we can be a little bit clever here. All I'm saying in this function is: if the source node is in the same group as the target node (so if we go down here, there's a link between these two orange ones) we want to bring them really close together, so I'm going to change this number to five. This is a true/false test, so if it's false, this other number applies, and I'm going to make that a hundred. If I rerun this, we should see that the nodes in the same group pull themselves closer together, while links that go outside a group, say from a light blue node here to a peach one here, aim for a hundred pixels, so those nodes push apart as much as they can.
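The same-group test described above comes down to a one-line function. Here is a sketch in plain JavaScript, assuming each link's source and target already point at node objects that carry a group field; in D3 a function like this would typically be handed to the link force as its distance setting. The node objects are invented for the example.

```javascript
// Distance rule from the talk: short links inside a group, long links across groups.
// Assumes link.source and link.target are node objects with a group property.
function linkDistance(link) {
  return link.source.group === link.target.group ? 5 : 100;
}

// Made-up nodes for illustration.
const a = { name: "A", group: 1 };
const b = { name: "B", group: 1 };
const c = { name: "C", group: 2 };

console.log(linkDistance({ source: a, target: b })); // same group
console.log(linkDistance({ source: b, target: c })); // different groups
```

Swapping the two numbers gives the opposite bias mentioned next in the talk, where grouped nodes are held apart and cross-group nodes are pulled together.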
And you can imagine playing around with this to highlight groupings, or you could do it the opposite way. If I change this to a hundred this way and ten this way, you're going to see it all kind of meld together, because now we're biasing towards pushing nodes in the same group away from each other and pulling nodes that are outside of each other's groups closer together. There are a couple of other functions we can use as well; for the most part, we can bias where the x position and the y position are. Oh, one other thing I meant to mention as well: you can not only set a distance for each of these links, you can set a strength as well. Let's say we set the strength for all these links to zero. If you run this again, you're going to find you have a very relaxed network, because there's no force pulling these nodes together; it's ignoring all the links on the screen. You can set that between zero and one, so I'm going to set it to 0.5, and for the most part it should look like this. Back to 20... and 50. We can set it to that. A couple of other things you can see as well: you can set an x value and a y value, so you can try to pull nodes towards a specific location, and there are also things like charge. Charge is basically how much the force-directed graph pushes nodes away from each other. I have a note here about what a positive value does. At this point the strength is negative, so it's using a charge that tries to push all the nodes away from each other. If we set that to a positive value... I'll do that now and show you what happens.
All the nodes are trying to pull each other closer together, towards the center position. It's not very informative, obviously. If you want to, you can go and say, let's set this to 300, and it's really going to start pushing all the nodes away from each other. That's the charge factor. The collide factor is the node repulsion itself, so if I set this to, let's say, 10, it deals specifically with the nodes themselves and pushes them away from each other. And again, you can play around with this if you want, just to see how these different values change what's going on. If I set this to 100, they're really trying to push each other apart at this point. Again, you can change things as you see fit. It's something kind of cool and fun to play with. Moving on to the next cell, the only thing that's a little bit different here is that I changed the stroke width to use that value we've talked about before. All this is saying is: for the stroke width, I want you to use the data to show how thick the lines are. If we run this, we're going to see a lot of dark lines between some of these individuals, and all that's telling us is that these individuals are really communicating or interacting with each other within this Les Misérables dataset. The thinner lines are saying that maybe they only co-occur once throughout the story. One last thing I'll mention is that you can do something called dragging, which lets you click on a node, I'll do that here, and drag it around, so it becomes a movable object.
All you need to do is add a few handlers: one for when the drag starts, one for while you're dragging, and one for when the drag ends. A lot of this, again, is default. If you go back to this force-directed graph example, these are basically just default values. It's not something I put together; they're calculations and things of that nature, so a lot of times I'll just copy and paste this. All it does is add a little bit of functionality: we set up our graph, and now somebody can click on a node, drag it off its position, move it around, and see what happens as they move it. And if I change, say, the collide... so everybody's really trying to push each other away. Back down here, I click on this node, and as I move it around you see that all the other nodes are trying to get away from it. Think about it like two magnets with the same positive charge: they repulse each other. So that's the first point: this is how you create that network graph. For the most part, again, try to go through and change a few of the values and see what's going on, maybe put together your own data set to see how it looks, and just play around with changing these values as you move forward. So that's the network graph portion of the conversation. What I would like to do now is head over to the map structure.
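The three drag handlers mentioned above follow a common pattern: pin the node while it is being dragged, then release it. The sketch below assumes that pattern rather than reproducing the notebook's cell. The stub simulation, the handler names, and the explicit arguments are hypothetical so that the code runs without D3; `fx` and `fy` are the node properties d3-force treats as a fixed position, and 0.3 is an illustrative reheating value.

```javascript
// Stub standing in for a force simulation; it only records what was requested.
function makeStubSimulation() {
  const sim = {
    target: 0,
    restarts: 0,
    alphaTarget(value) { sim.target = value; return sim; },
    restart() { sim.restarts += 1; return sim; },
  };
  return sim;
}

// Drag start: reheat the simulation and pin the node where it currently is.
function dragStarted(sim, node) {
  sim.alphaTarget(0.3).restart();
  node.fx = node.x;
  node.fy = node.y;
}

// While dragging: the pinned position follows the pointer.
function dragged(node, pointer) {
  node.fx = pointer.x;
  node.fy = pointer.y;
}

// Drag end: let the simulation cool down and unpin the node.
function dragEnded(sim, node) {
  sim.alphaTarget(0);
  node.fx = null;
  node.fy = null;
}

const sim = makeStubSimulation();
const node = { x: 10, y: 20, fx: null, fy: null };
dragStarted(sim, node);
dragged(node, { x: 50, y: 60 });
const whileDragging = [node.fx, node.fy];
dragEnded(sim, node);
console.log(whileDragging, node.fx, node.fy); // [ 50, 60 ] null null
```

Because the node is only pinned for the duration of the drag, the other forces (charge, collide, links) take over again as soon as it is released.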
So the network graphs are handled in this notebook, and if you go into our folder here, the Workshop 2- one, you're going to see that we have another file called Map. I'll open that up here, and there's quite a bit going on, which I'll walk through as we need to. The first thing I'll note before I move on to that part... let me go back to my presentation here. I skipped over a little bit of information, a couple of topics in network graph theory, ways you can add information to these graphs. What I recommend is to walk through this and take a look at it at some point. For the most part it's some centrality measures, so you can add that information into the graph and make it more interesting visually. There are some social science topics as well that you can look into, and again I'll let that go for today's conversation. With that, I'm going to skip a couple of slides for a moment, and this should outline today's project. Our project is: I want to create a map of the US, including Alaska, Hawaii, Puerto Rico... basically anything under the U.S. umbrella, if you will. Then we're going to limit this map to Pennsylvania, and this is going to be by county. I'm going to go out and find data for each of these counties and then color code our counties by population. So that outlines what we're going to be doing today. One thing I'll note is that one of the main topics we'll be talking about is something called a projection.
One thing you have to consider when you're creating maps is that you're trying to recreate a three-dimensional object, the Earth, and project it onto a two-dimensional surface, which is your screen. So you have to think about how we project it onto that screen. I like this diagram: imagine you put a light bulb inside the Earth and looked at how it would project out. That's how people typically see a normal map, and it can lead to incorrect proportions, especially in the far northern and southern parts of the hemispheres. There's an interesting project I recommend trying out called The True Size. It's a cool little thing: you type in a country and it gives you its size, and again this is the default kind of map we'd typically see in Google Maps or anything like that. In the one I put in here I have the United States and I also have Greenland. This small blue object here is Greenland. A lot of times people assume that Greenland is this massive area, which it is, but not this size. The issue is that you're projecting the top part of that sphere onto a two-dimensional map, and its actual size is smaller than the United States. Well, a quarter of the size, if you will, of the continental United States. So check it out; it's a neat little way to show the true proportions of these continents or countries. For us, the first thing you're going to do is go out and find something called a ShapeFile.
Now, one thing I'd like to note when we're talking about creating a map: one of the questions I got yesterday was, "Can't I just use something like ArcGIS or QGIS to create a static map?" And that's very true. What we're going to see today is not going to be groundbreaking compared with what you might do in software packages like ArcGIS. This is just a start at melding two pieces of information together, the ShapeFile of the United States and some additional information like the county population, and you can build out from there. If you're just looking to create a static image, ArcGIS is going to be one of the best places to start, and QGIS is another resource as well. But I'm still going to walk through this, because you can build out more dynamic maps and put together some interesting things. The first thing we need to do is find something called a ShapeFile. A ShapeFile is how these maps are created; it's an agreed-upon standard across a lot of mapping tools. So if I use a ShapeFile from this location, specifically I'm using one from the Census Bureau, the map of the US, we have an agreed-upon way that the United States is going to be formatted, how it's going to look, and its dimensions. It uses latitudes and longitudes to etch out all the states, again including Alaska and Hawaii. So it's an agreed-upon standard, if you will.
If you go to this website, and again I'll note that the folder you downloaded should have everything, so we don't have to sit here and wait for all these things to download, you can see that you can download different things: the congressional districts, or a ZIP code map, which is a very large file because it has a lot of detail. I've created ZIP code maps before, but for the most part county has enough small details for us to play around with. You can download these and try them all out, but I'm going to talk about the county one. That's what this file called cb_2016_us_counties, the zip folder, is. Again, if you were using something like ArcGIS, you would just open this up and add the additional information, like an outside raster file of some sort, but for us, we're going to walk through the way to create the map using d3. There are a couple of resources you can use through ArcGIS to create the data structure we need, but I also have a couple of links in my presentation, a couple of ways to convert this ourselves. These are open source projects that allow us to do it. Pretty straightforward. This first one, I'll show you what I mean... you just take that zip folder of the ShapeFile, drop it right here, and click convert to GeoJSON. And for the most part, all you need to do is copy and paste what's on this screen.
For the most part, that's going to be enough to get us started. This is the structure of the data, so it takes care of laying out the states and gives us that guidance, so we don't have to start from scratch. All I did was drop that ShapeFile in, and it exports this for me automatically. It's a nice, free resource online for us to use. The other one I have listed here is something called mapshaper, and mapshaper is really nice to help us, especially with very high-detail maps. If I go back in here, this output is the one that's called us_json. I can just drop it in here, and it'll give you an outline of what you would see in the ShapeFile itself. Again you're going to see Alaska, Hawaii, and so on, and the continental United States as well. You're going to notice something about Alaska: because it wraps around the world, it has parts sticking out on the other side here. What mapshaper lets you do is simplify these maps if you'd like. There's a button here called 'Simplify', and I'll show you what happens if you simplify them... As I simplify it more and more, you see a lot of those lines become straight lines. So it loses a lot of its detail. But this is nice if you're looking at a larger ZIP code map, with a lot of different lines and things of that nature, where you're going to need to simplify the map a little bit. Again, it's just a free resource to use if you ever need to, and then we can export the result as need be.
So at this point, what you have here is this us.all.json file, and that's going to be this map here. That's our starting point for creating these maps. Now that we're back in jupyter, under that maps file, the first thing you're going to notice in this first cell is a script. I call this "the map scaling magic". This is a function I've put together through my time working with maps, and I don't want to go into a lot of detail on it. It creates a function called createScope, and we're going to use that throughout. What it allows us to do is say, based on the width and height of the window we're creating our map in, let's say 800 by 1000, or 800 by 800, or whatever screen size we want, create the map so it maximizes that space. If you don't have this, you're going to have to play around with the scaling back and forth. It's something I use a lot, because if I have a map that's taking up, say, an 800 by 800 space, or the entire screen, I don't want to have to think about how to scale it to that particular size. So this is just a magic function that I've used before. I'll leave it here so you can use it, but I'm not going to go into too much detail on it. It can be used throughout most of these cells. What we have next is the outline of [inaudible] recreating the map itself.
One thing I'll note is that I've also included a link to where I got the original inspiration for this project, this link up here... As you can see, it gives you an idea: the United States here, with Alaska and Hawaii at the bottom left-hand corner. We're not going to follow it word for word, but this is where the inspiration came from for the map I'll be talking about in a moment. The first thing I'll note, as I already mentioned, is createScope. Again, it adheres to whatever width and height we give it, so in this case we have an 800 by 1000 window that we're going to be creating this in, and it will automatically tailor the shapefile to maximize that space for us. I keep the result in a variable I just call scale. Going through this, I mentioned before that we're going to be referencing us.all.json. This is a JSON file that allows us to create the entire United States, including Alaska in its correct position, and Hawaii as well. When you bring these in, we first have to declare a specific type of projection, and there are a number available for us to choose from. I'll open this up real quick so you can see some of the available ones, and I'm going to skip to the bottom here. There, these are the projections themselves. What I chose was one of the default ones for laying out this map, but if you want to go through and check out the other ones, you can change that, and it'll change the perspective for you.
What we do first is set up a projection. The projection says, for that given ShapeFile, how it should be projected onto the screen. For this first part, what we're doing is literally just setting up a scale of 1 and a translate of 0, 0, which doesn't change it at all. What we can then do is tie the projection we chose here to the screen itself, and that's what this path is: it's a path function. All that means is that when we read the JSON object into it, for an outline, let's say Pennsylvania, which has a specific outline, it converts that outline using the projection we've set here and prepares it for the screen. So just include these lines. One thing I'll note is that once we set this initially, we have to rescale it, and this is where createScope comes into play. You send the ShapeFile to it, and it says: okay, I know the ShapeFile, I know the width and height you're looking for, because these are standard throughout, 800 by 1,000. I'm going to create something called "scale", which again is just a variable we put together at this point, and it lets you set the scale and set the translate. Again, you just copy and paste a lot of these specific things. From there, that rescales the shapefile so it fits the 800 by 1,000 screen.
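The transcript does not show what createScope does internally. One common way to do this kind of fit is to measure the shape's bounding box while the projection is at scale 1 and translate 0, 0, then compute the scale and translate that center the box in the window. The sketch below illustrates that idea only: `fitToBox` and the 0.95 padding factor are assumptions, not the workshop's function, and the window is assumed to be 800 wide by 1000 tall.

```javascript
// bounds: [[x0, y0], [x1, y1]] of the shape as drawn with scale 1 and
// translate [0, 0]. Returns the scale and translate that center the shape
// in a width x height window, leaving a small margin.
function fitToBox(bounds, width, height, padding = 0.95) {
  const [[x0, y0], [x1, y1]] = bounds;
  const scale = padding / Math.max((x1 - x0) / width, (y1 - y0) / height);
  const translate = [
    (width - scale * (x0 + x1)) / 2,
    (height - scale * (y0 + y1)) / 2,
  ];
  return { scale, translate };
}

// A made-up bounding box, twice as wide as it is tall.
const fit = fitToBox([[0, 0], [2, 1]], 800, 1000);
console.log(fit.scale, fit.translate);
```

The width is the limiting dimension here, so the shape is scaled to 95% of the 800-pixel width and centered vertically in the remaining space.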
From there, we have a lot of similarities with the ones we saw before. First we get rid of whatever graph is there, and then we create a new SVG object. The only new part is adding this path function. We create a group again, just so everything sits under a specific group, in this case we're going to call it county, and then we append the paths from there. We feed in the shapefile and its features. One thing I'll note: when you look at the shapefile itself, there are going to be a couple of things. One is features, which is the data used to create the shapes. And then there are properties, which are metadata about the specific states or counties or whatever that shapefile has. That terminology, features and properties, is pretty standard with GeoJSON files. All we want to give it is the features, so it knows the x, y coordinates for all of these, in this case, states or counties. It has that information available to it, and then we just read it into the path, which, again, we created up here.
So it uses the projection we gave to it, which we set up in terms of the screen width and height. If you run this, what you're going to see is what we saw before: here's Alaska, here's the United States, here's Hawaii. You can also see that Alaska trails off, because it goes around the world; it has some bits over here as well, parts of the state that are islands off to the side. For us, we're going to be focusing on creating a map for Pennsylvania, and this is where the properties we mentioned before come into play. This next cell has a lot of the same things we've talked about, but one thing we want to change: instead of using all the features, for all the states, Alaska, Hawaii, the rest of the United States, we want to focus specifically on Pennsylvania. What we can do is use something called a filter. A filter says: out of all this information, I want you to filter on one key characteristic or one key feature. In this case, one of the properties given within the dataset is something called STATEFP. This is an ID for each of the states, and it just so happens that 42 is Pennsylvania. I jumped ahead a little bit, but this is the main thing: we don't want to focus on the entire United States, we want to focus in on Pennsylvania. So we use a filter, and what the return is looking for is some Boolean logic, a true or false. As it goes through each of the features: is the STATEFP equal to 42? Nope. Let's say for New York it says, "Okay, whatever the STATEFP is for it, it's not 42, so we're not going to use it."
So it's only going to return the true cases. In this case, it's only going to find the features where the state is equal to 42, which, as I mentioned, is Pennsylvania. So instead of the entire data set, we're now focused in on Pennsylvania and its counties. Nothing else changes in what's going on here. All it's doing is taking the data set we brought in, that shapefile, that us.json file, and filtering it for Pennsylvania. If we look at that, you should see Pennsylvania. So here are all the counties within Pennsylvania. From there, we're going to need additional information to put all these pieces together, and this comes from a resource that I also included. Actually, let's go to their site real quick... Penn State sites, the Pennsylvania State Data Center. What it has is data on various things across the counties. One of the things we're looking at is population, so I went through the data, got some of the census information, and found per-county population, and that's what we're going to be using for this map. The file is called pacountypop.txt. What I have here is a couple of steps that take the data as it originally was and reformat it, so I have population per county. These IDs are similar to what we talked about before; they're the IDs for each of the counties. What it creates at the end is an object which says: for each of these counties, based on its ID, we have a specific population.
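The filter step described above can be sketched with a handful of trimmed-down feature records. This is a minimal sketch: the records are placeholders rather than the contents of the workshop's file, `featuresForState` is a hypothetical helper, and the comparison assumes STATEFP is stored as a string (if the file stores it as a number, the comparison value would need to be a number too).

```javascript
// Trimmed-down stand-ins for GeoJSON county features; geometry is omitted.
const features = [
  { properties: { STATEFP: "36", NAME: "Albany" } },
  { properties: { STATEFP: "42", NAME: "Centre" } },
  { properties: { STATEFP: "42", NAME: "Allegheny" } },
  { properties: { STATEFP: "02", NAME: "Juneau" } },
];

// Keep only the features whose state ID matches; the callback returns
// true or false for each feature, and only the true cases survive.
function featuresForState(allFeatures, stateFp) {
  return allFeatures.filter((f) => f.properties.STATEFP === stateFp);
}

const pennsylvania = featuresForState(features, "42");
console.log(pennsylvania.map((f) => f.properties.NAME)); // [ 'Centre', 'Allegheny' ]
```

Nothing about the drawing code has to change after this; it simply receives a shorter list of features.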
So we just use this section here to bring in that file and parse through it, and now we can use it in the rest of the code. Okay, in this step, again, we're going to have a lot of similarities with the last one. The shape features are again focused in on Pennsylvania, and we create the same projection and do all of that the same way. The new thing here is that we want to color in these counties, so we're going to add a couple of things. We're going to use that population data to say: based on the populations of all the counties, we want to create a range from zero to one, because what we're going to have is this color palette, and this color palette goes from zero to one. So what we want to do is say, "Okay, for each county, we want to take the population and set it so that the lowest population is mapped to zero and the highest population is mapped to one." That gives us the spectrum we're looking for. I use a power scale, and I'll mention why in a moment. For the most part, all this does is create a number between zero and one, decimals included, obviously, where it maps the lowest to zero and the highest to one. And then the palette is a way to express that in the colors themselves. If you go to this particular site, you'll see that there are a number of different color palettes you can use. Diverging is one kind of color palette; we have single hues for sequential information; we also have multi-hues. I just chose one, to take a look at different colorings and things of that nature.
But for the most part, you can mix and match in terms of what you want to have. If I want to change it to this one here, all you'd have to do is find it and change that specific value. For the sake of argument I'm going to leave it on the one called plasma; that's the color spectrum I'll be using. So now we recreate what we did before, and we just read in the fill, which is going to be the color inside the counties themselves. We go through the data and find the IDs, and each of these IDs has a value, so there's a value for each of the counties. If it's Allegheny County, it'll have the population for it, as we discussed up here. And for all the other counties within Pennsylvania, it's going to say, "Okay, whatever that value is, use our scaling to map it to a number between 0 and 1", and then the palette takes care of it from there. The palette is looking for a number between 0 and 1, and it'll apply the color for that given value. So this is what we end up with initially. We can see here that the Philadelphia area and the Pittsburgh area obviously have the highest values, so that's why they're a different color from the other counties, which are kind of uniform in terms of their populations. Not all the way, obviously, and I'll come back to this moving forward. The one thing to remember at this step is that it's using populations, so because the population of the Philadelphia area or the Pittsburgh area is much, much higher than the others, it's going to very much bias the colors towards those areas and mute the other ones. And again, there's a way to correct this moving forward. So this is our first map, if you will.
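The fill step described above chains three things: look up the county's population by ID, scale it to a number between 0 and 1, and hand that number to the palette. The sketch below uses stand-ins throughout. The IDs and populations are invented, `GEOID` as the ID property is an assumption, the scale is a plain linear one, and a gray ramp replaces the plasma palette so the code runs without D3 (where the palette would be something like `d3.interpolatePlasma`).

```javascript
// Invented populations keyed by placeholder county IDs (not census figures).
const popById = { "id-1": 1000, "id-2": 250, "id-3": 4000 };

// Linear stand-in for the scale: lowest population -> 0, highest -> 1.
function makeUnitScale(values) {
  const lo = Math.min(...values);
  const hi = Math.max(...values);
  return (v) => (v - lo) / (hi - lo);
}

// Stand-in palette: a gray ramp from black (0) to white (1).
function grayPalette(t) {
  const level = Math.round(255 * t);
  return `rgb(${level}, ${level}, ${level})`;
}

// Fill for one feature; counties with no population record get a neutral gray.
function fillFor(feature, populations, scale, palette) {
  const population = populations[feature.properties.GEOID];
  return population === undefined ? "#ccc" : palette(scale(population));
}

const unitScale = makeUnitScale(Object.values(popById));
const fills = ["id-2", "id-3", "id-missing"].map((id) =>
  fillFor({ properties: { GEOID: id } }, popById, unitScale, grayPalette)
);
console.log(fills); // [ 'rgb(0, 0, 0)', 'rgb(255, 255, 255)', '#ccc' ]
```

With a linear scale like this one, a single very large value pushes every other county towards the low end of the palette, which is the bias described for Philadelphia and Pittsburgh.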
One thing I'll note is that I don't include a legend here. Legends require a little bit more than what I'm looking to focus on here. But there is a great library that I'd recommend checking out that makes creating them pretty straightforward. It's external to D3: it's open source, it's just not an official part of the D3 library, but it's something I've used before for a lot of different projects. I'd check it out just to see how you can add legends using this additional library. So like I said, we have Pennsylvania here, but we have a lot of bias towards the Philadelphia and Pittsburgh areas. One thing we can do is change that bias. There's one step I'm skipping for a moment, just to talk about the color and its variation. Back here, in our next plot, where we have the power scale, what we're going to do is change the exponent. The exponent changes how the values are spread across the range: it can move the smaller values up, closer to the upper part of the tail, if you will. What that allows us to do is highlight a lot more of the populations besides the very largest, so it's not 100%, or majority, biased towards Philadelphia or Pittsburgh, and what you get is a little more density in some of these color areas. We can change that from the saved 0.5... the default value is 1, which is basically a linear scale. Let's say we change it to 2: it's going to very much bias our data set towards those larger values, so the Pittsburgh and Philadelphia areas get elevated even more. What we want to do is create a more even scale.
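The effect of the exponent can be checked numerically with a small stand-in for a power scale. This is a minimal sketch assuming positive values and a 0-to-1 output range: `makePowScale` is a hypothetical helper that raises the value and both ends of the domain to the exponent before interpolating, which is the general idea behind `d3.scalePow().exponent(k)`. The three populations are invented.

```javascript
// Power scale from [domainMin, domainMax] onto [0, 1], for positive values.
// The value and both domain ends are raised to the exponent first, then the
// result is interpolated linearly between the transformed ends.
function makePowScale(exponent, domainMin, domainMax) {
  const lo = Math.pow(domainMin, exponent);
  const hi = Math.pow(domainMax, exponent);
  return (value) => (Math.pow(value, exponent) - lo) / (hi - lo);
}

// Invented populations: one small county, one mid-sized, one very large.
const minPop = 5000;
const midPop = 150000;
const maxPop = 1500000;

// Where does the mid-sized county land for each exponent?
const midByExponent = {};
for (const exponent of [0.5, 1, 2]) {
  const scale = makePowScale(exponent, minPop, maxPop);
  midByExponent[exponent] = scale(midPop);
  console.log(exponent, scale(midPop).toFixed(3));
}
```

The mid-sized county sits higher in the 0-to-1 range with an exponent of 0.5 than with 1, and lower still with 2, so exponents below 1 give the smaller counties more of the palette while exponents above 1 reserve it for the largest ones.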
So let's say we set it down to 0.3 and run this, and now we're getting more variation in color, because we obviously have some populations that are very low in Pennsylvania. State College, obviously, is one of the higher ones, but not relative to Pittsburgh and Philadelphia. You can play around with that. The next thing I'll note is adding points as well: we want to add a point to this map. For this you can use one of the Google API calls. The Google API lets you say, given an address, give me the latitude and longitude of that address. The lab was where I was giving a live version of this, so I thought, why don't I just use that location, and that's what's listed here. It gives you all this meta information: it gives you a bounding box, if you're interested in that, or, specifically, a location, which is the latitude and longitude of that address. Let me change this a bit... So there's our point. All we're doing here is following the same procedure as before with Pennsylvania, where you can change the bias a little with the exponent, and the only thing we add is a location for State College, which again comes from the Google API call. So now we just add a point, a circle, and give it cx and cy, its x, y position. One more thing I'll note is that latitude and longitude are common to any map. It's the same idea as the shapefiles, which are set up so that you can apply them to whatever projection you're looking at.
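Placing the circle means converting a longitude and latitude into pixel coordinates with the same projection the map uses, because the raw numbers are not screen positions. The sketch below uses a simplified equirectangular projection as a stand-in for whichever D3 projection the notebook uses. `makeProjection`, the hand-picked scale and translate, and the approximate State College coordinates are all assumptions for illustration. The input order, longitude first and then latitude, matches how D3 projections are called.

```javascript
// Simplified equirectangular projection (a stand-in, not the notebook's).
// Input is [longitude, latitude] in degrees; output is [x, y] in pixels.
function makeProjection(scale, translate) {
  const toRadians = Math.PI / 180;
  return ([longitude, latitude]) => [
    translate[0] + scale * longitude * toRadians,
    translate[1] - scale * latitude * toRadians, // screen y grows downward
  ];
}

// Scale and translate picked by hand so this region lands in an 800 x 1000 window.
const projection = makeProjection(6000, [8600, 4600]);

// Approximate coordinates for State College, PA: [longitude, latitude].
const stateCollege = [-77.86, 40.79];
const [cx, cy] = projection(stateCollege);

console.log(stateCollege[0] < 0); // true: the raw longitude would be off the page
console.log(cx >= 0 && cx <= 800 && cy >= 0 && cy <= 1000); // true: projected point is on screen
```

The resulting `cx` and `cy` are what the circle's position attributes would be set to.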
We set up the projection at the top, the projection magic I talked about before, so we have to run that latitude and longitude through it. If you think about it, if we placed a point directly on our screen with a longitude of negative 77, it would be off the page somewhere. What this does is use the projection we created, for seeing the United States and also for Pennsylvania, and map it to the 800 by 1,000 screen. It says: I know this location based on this projection, this latitude and longitude, and I can map it to the actual location within the 800 by 1,000 display. That's all we're doing on that step, and now it knows where to put that circle. So we have a point, and this should be mapped specifically to the Dollhouse Laboratory, and I just changed the opacity of the fill so you can actually see the point. You can add a little bit more back if you want to, let's say 7, refresh this, and you can still see your point. You can do not just one point but multiple different points; this is just a simple, step-by-step way of capturing a latitude and longitude using something like the Google Maps API and mapping it onto one of our plots, as you see here. So that's basically what I wanted to cover today, both the network plots and the map plots. Hopefully you can follow along, and if you run into questions, please always feel free to contact me and I can follow up on these particular documents. It's all listed on GitHub, and you're more than welcome to share these resources.
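The idea of running a longitude and latitude through a projection can be sketched without D3 at all. The example below uses a simple linear mapping from a bounding box to pixels, which is a simplification and not the projection used in the workshop. `makeProjection` is a hypothetical helper, the Pennsylvania bounds are approximate, and treating the display as 800 wide by 1,000 tall is an assumption.

```javascript
// Hypothetical stand-in for a fitted map projection: a linear mapping from a
// longitude/latitude bounding box onto a width x height pixel area.
// A real D3 projection would also preserve the map's shape; this one does not.
function makeProjection(bounds, width, height) {
  const { west, east, south, north } = bounds;
  return ([lon, lat]) => [
    ((lon - west) / (east - west)) * width,
    ((north - lat) / (north - south)) * height, // screen y grows downward
  ];
}

// Approximate bounding box for Pennsylvania (assumed values).
const paBounds = { west: -80.52, east: -74.69, south: 39.72, north: 42.27 };

// Assumption: the display is 800 pixels wide and 1,000 pixels tall.
const project = makeProjection(paBounds, 800, 1000);

// Placeholder coordinates near State College, as [longitude, latitude].
const [cx, cy] = project([-77.86, 40.79]);
console.log(cx.toFixed(1), cy.toFixed(1));
```

The raw longitude of about negative 77 would sit off the page, while the projected `cx` and `cy` fall inside the display and can be used as the circle's position.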
So I thank everybody for their time, and if there are any follow-up questions please let me know. Thank you, have a good day.