6. Day 2 Homework
Here is a CSV file that contains all of the countries that medalled in the 2018 Winter Olympic Games, along with their latitudes and longitudes, medals won, GDP (2018), and population (2018). This time, we are going to plot only the countries that medalled and add more information to the plot.
6.1. Initial Instructions:
Download the CSV file from the link above.
Open Google Colab and click “New Notebook” in the corner to create a new notebook.
Click the file icon on the left toolbar, then click the upload icon (the leftmost button).
Choose the downloaded .csv file to upload it into the notebook so the data can be imported.
Here’s the actual homework!
6.2. Assignment
Open the CSV file and examine its contents. What variable types does it contain: strings, floats, or integers?
In your Colab notebook, write a script that imports the data from the file and makes a scatter plot.
Hint: pandas offers a read_csv function that makes it easy to load the file.
As you parse the data, you’ll need to tally up the total medals for each country. You will also need to calculate GDP per 5000 capita (GDP per 5000 people, i.e. GDP divided by population, times 5000).
Which of the techniques we have reviewed can you use to accomplish these?
This time, when you plot the data, color the countries according to the number of medals won, and set the size of each point based on GDP per capita (the GDP per 5000 people value you calculated).
See the plt.scatter documentation (in particular its c and s parameters) to figure out how to implement this.
Add the equator (latitude 0) to the plot as well.
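A minimal sketch of the steps above, assuming the CSV has one row per country with columns named Country, Gold, Silver, Bronze, GDP, Population, Latitude, and Longitude. Those column names, and the helpers add_derived_columns and plot_medal_map, are hypothetical; check them against the header row of the real file. It tallies medals by summing the medal columns row by row, computes GDP per 5000 people, and passes the medal total to the c parameter and the GDP value to the s parameter of scatter.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical column names: adjust these to match the header row of the real CSV.
MEDAL_COLUMNS = ["Gold", "Silver", "Bronze"]


def add_derived_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with total medals and GDP per 5000 people added."""
    out = df.copy()
    # Tally the medals for each country (assumes one row per country).
    out["Total"] = out[MEDAL_COLUMNS].sum(axis=1)
    # GDP per 5000 people = GDP / population * 5000.
    out["GDP_per_5000"] = out["GDP"] / out["Population"] * 5000
    return out


def plot_medal_map(df: pd.DataFrame):
    """Scatter countries by longitude/latitude, colored by medals, sized by GDP."""
    fig, ax = plt.subplots(figsize=(10, 5))
    points = ax.scatter(
        df["Longitude"],
        df["Latitude"],
        c=df["Total"],         # color encodes total medals
        s=df["GDP_per_5000"],  # marker area encodes GDP per 5000 people
        cmap="viridis",
        alpha=0.7,
    )
    fig.colorbar(points, ax=ax, label="Total medals")
    # The equator is the horizontal line at latitude 0.
    ax.axhline(0, color="gray", linestyle="--", label="Equator")
    ax.set_xlabel("Longitude")
    ax.set_ylabel("Latitude")
    ax.legend()
    return fig, ax
```

In Colab you might call plot_medal_map(add_derived_columns(pd.read_csv("your_file.csv"))), where the file name is a placeholder for whatever you uploaded. The marker sizes depend on the units of the GDP column, so if the points come out far too large or too small, scale the s values by a constant.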
Congrats! You are all done.
The plot will have fewer points this time, since the countries that did not medal are left out, but it will contain more information about each country. If you want a challenge, the advanced homework adds more complexity and brings back the other countries!
6.3. Answer
6.4. Additional Assignment
If you were able to fly through the previous assignment, here is an additional exercise that is a bit more challenging.
Your solution for the first exercise should already be able to parse all of the data from the file.
Let’s examine three factors: GDP per capita, latitude, and raw population, and see how each of these affects medal results. While this is a complex question, we are going to assume a linear relationship for the sake of simplicity.
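Written out, the linear assumption gives each factor x (GDP per capita, latitude, or population) its own straight-line model for the medal total y. These are the standard least-squares formulas for a simple linear regression with one predictor, where the bars denote means over all countries; with a single predictor, R squared equals the square of Pearson's correlation coefficient r.

```latex
\hat{y}_i = m\,x_i + b

m = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2},
\qquad
b = \bar{y} - m\,\bar{x}

R^2 = r^2
```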
Let’s start with GDP per capita. There are many ways to do this next step, but one of the easiest is to import the scipy.stats package and use the linregress() function:
import scipy.stats as ss
Take a look at the documentation for linregress; the examples in particular may help.
What should we pass into the function? What will it return, and how many values should we expect?
Once you have successfully fit the data, plot the results. Make a scatter plot with the x (GDP per capita) and y (medal total) values.
Next, add the fit line to the scatter plot:
1) Define a linear function. Note that the order of the variables matters.
2) Create an array of x values to plot (np.linspace() may be helpful) and pass them into the function you just wrote.
3) Make sure you use the slope and intercept returned by ss.linregress().
4) To draw a line instead of a scatter plot, use the plt.plot() function.
Print out the R-squared value as well. Note that linregress reports the correlation coefficient r (its rvalue), so R squared is rvalue**2.
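A minimal sketch of the fit-and-plot steps, assuming x and y are array-like data you have already pulled out of the file (for example, GDP per capita and total medals). The helper names linear and fit_and_plot are hypothetical. It calls ss.linregress(x, y), scatters the data, evaluates the fitted line on np.linspace values, draws it with plot, and squares rvalue to get R squared.

```python
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as ss


def linear(x, slope, intercept):
    """Straight line y = slope * x + intercept."""
    return slope * x + intercept


def fit_and_plot(x, y, xlabel, ylabel="Total medals"):
    """Fit y against x, plot the data and the fit line, return (result, R^2)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # linregress(x, y): x is the independent variable, y the dependent one.
    result = ss.linregress(x, y)

    fig, ax = plt.subplots()
    ax.scatter(x, y, label="Countries")
    # Evaluate the fitted line on evenly spaced x values spanning the data.
    x_line = np.linspace(x.min(), x.max(), 100)
    ax.plot(x_line, linear(x_line, result.slope, result.intercept),
            color="red", label="Linear fit")
    ax.set_xlabel(xlabel)
    ax.set_ylabel(ylabel)
    ax.legend()

    # linregress reports Pearson's r; R squared is its square.
    r_squared = result.rvalue ** 2
    print(f"{xlabel}: slope={result.slope:.4g}, "
          f"intercept={result.intercept:.4g}, R^2={r_squared:.3f}")
    return result, r_squared
```

For example, fit_and_plot(df["GDP_per_capita"], df["Total"], "GDP per capita"), where the column names are placeholders for whatever your DataFrame uses. Passing slope before intercept into linear matches its parameter order, which is the kind of ordering the instructions warn about.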
Now we have a visualization of the relationship between GDP per capita and Olympic medals!
Repeat this process for latitude and population (you can copy and paste most of your work and just change the data you use).
Based on the results, which of these factors seems to be most impactful on Winter Olympic success?
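If you would rather not copy and paste, a loop can fit each factor and collect the R squared values side by side. This is a sketch assuming a DataFrame whose factor columns and medal-total column use hypothetical names; rank_factors is a hypothetical helper. A higher R squared only means a straight line explains more of the variation in medal totals; it does not show that the factor causes Olympic success.

```python
import pandas as pd
import scipy.stats as ss


def rank_factors(df, factors, target="Total"):
    """Fit target against each factor separately; return R^2 values, largest first."""
    r_squared = {}
    for factor in factors:
        result = ss.linregress(df[factor], df[target])
        r_squared[factor] = result.rvalue ** 2
    # Sort so the factor whose linear fit explains the most variance comes first.
    return pd.Series(r_squared).sort_values(ascending=False)
```

For example, rank_factors(df, ["GDP_per_capita", "Latitude", "Population"]) with your own column names.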
Congrats, you’ve successfully learned to use Python to parse, plot, and statistically analyze data! This is the end of the Python Intensive Training base project.