5. Appendix#
Learning Objectives
Create and access elements of tuples
Create and access elements of dictionaries
Explain what HTTP is, including request methods
Use the requests package to make HTTP requests
Explain what a web API is
Load and access elements of JSON data
Write loops to do things repeatedly
Write comprehensions to do things repeatedly
Save data to a CSV or other file format
This appendix contains an optional part 5 of the workshop which demonstrates how to use Python to download some non-tabular data from the internet, convert it into a tabular format, and save it as a CSV file.
5.1. Containers#
Python provides a variety of containers for data. This section describes the three most widely-used containers. You can find even more types of containers in the collections module.
5.1.1. Lists#
Lists were introduced in Section 2.2.3. A list is an ordered collection of values. Lists are mutable, which means that the elements can be changed.
Here’s a quick recap of things you can do with lists:
# Create a list with square brackets [ ]
x = [1, 3, 5]
# Get an element by position with square brackets [ ]
x[0]
# Set an element
x[2] = -1
You can also remove an element of a list with the del keyword. For instance:
del x[2]
x
[1, 3]
The del keyword can also be used with dictionaries, which are described in Section 5.1.3.
5.1.2. Tuples#
A tuple is an ordered collection of values, such as the coordinates of a point. Tuples are immutable, which means that the elements of a tuple can’t be changed.
y = ("hi", 1, 3.7)
y
('hi', 1, 3.7)
type(y)
tuple
You can get elements of a tuple with indexing, just like a list:
y[0]
'hi'
If you try to change the elements, Python raises an error:
y[1] = 3
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[7], line 1
----> 1 y[1] = 3
TypeError: 'tuple' object does not support item assignment
You can also get the elements of a tuple by unpacking them. Unpacking is a way to assign the elements of a tuple or list to multiple variables. Use = to unpack values, making sure that the structure on the left-hand side matches the structure on the right-hand side:
u, v, w = y
u
'hi'
v
1
w
3.7
Unpacking also works with lists.
Tuples are slightly more efficient than lists, and are generally the best way to return a fixed number of results from a function (see Section 4.2).
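As a minimal sketch of returning a fixed number of results from a function, the example below returns a tuple and unpacks it at the call site. The function name min_max and its input are made up for illustration and are not part of the workshop code.

```python
def min_max(values):
    """Return the smallest and largest value as a tuple."""
    # The comma creates a tuple; parentheses are optional here
    return min(values), max(values)

# Unpack the returned tuple into two variables
lowest, highest = min_max([3, 1, 4, 1, 5])
print(lowest, highest)
```

Because the function always returns exactly two values, the caller can rely on the unpacking pattern working every time.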
5.1.3. Dictionaries#
A dictionary (or dict) is a map from keys to values, where each key corresponds to exactly one value. That means that you use keys to look up values in a dictionary. Like lists, dictionaries are mutable.
You can make a dictionary by enclosing comma-separated key: value pairs in curly braces { }, like this:
x = {"hello": 1, 3: 5}
x
{'hello': 1, 3: 5}
type(x)
dict
The dictionary x has two keys: "hello" and 3. Most of the time, the keys in a dictionary will be strings, but you can use other types of keys. However, the keys are required to be unique.
You can get and set elements of a dictionary by indexing with keys:
x["hello"]
1
You can also get elements with the .get method, which gets an element by key or returns a default value if the key isn’t in the dictionary:
x.get("hello", 10)
1
x.get("goodbye", 10)
10
You can get the keys or the values in a dictionary with the .keys and .values methods, respectively:
x.keys()
dict_keys(['hello', 3])
x.values()
dict_values([1, 5])
You can use the list function to convert either of these to an ordinary list:
list(x.keys())
['hello', 3]
Common use cases for dictionaries include creating lookup tables and returning named results from a function.
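Here is a small sketch of a dictionary used as a lookup table. The table contents and variable names are made up for illustration. It also shows setting an element by key and removing one with del, which work the same way as for lists.

```python
# Hypothetical lookup table: two-letter state codes to state names
state_names = {"CA": "California", "NV": "Nevada", "OR": "Oregon"}

# Look up a value by key
print(state_names["CA"])

# .get returns the default when the key is not in the table
missing = state_names.get("WA", "unknown")
print(missing)

# Set a new element by key, then remove another with del
state_names["WA"] = "Washington"
del state_names["NV"]
print(state_names)
```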
5.2. Surfing the Web#
The hypertext transfer protocol (HTTP) is a set of rules for communication between computers on a network. HTTP has a request-response model:
The client sends a request to a server. For example, the client might request a file.
The server sends a response to the request.
Your web browser goes through this process every time you visit a website. HTTP requests are also one way to download data sets from the web.
There are several different HTTP request methods:
GET requests data from the server.
Example: Visiting a web site
You can send a small amount of data with a GET request through the URL
POST sends data to the server.
Example: Submitting a web form
The response to a POST request often contains data too
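To make the idea of sending data through the URL concrete, the sketch below builds a URL with a query string using Python’s built-in urllib.parse module. The address https://example.com/search and the parameter names are placeholders chosen for illustration, not a real service described in this workshop.

```python
from urllib.parse import urlencode

# Hypothetical endpoint and parameters, for illustration only
base_url = "https://example.com/search"
params = {"q": "star wars", "page": 2}

# urlencode turns the dictionary into a query string, escaping
# characters such as spaces that are not allowed in a URL
query = urlencode(params)
url = base_url + "?" + query
print(url)
```

Everything after the ? in the resulting URL is data sent along with the GET request.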
Tip
If you browse the web with Chrome or Firefox, you can inspect the HTTP requests your browser makes when you visit a website. Go to a website and press Ctrl-Shift-i (macOS: Cmd-Option-i) to open your browser’s web developer tools. Click on the “Network” tab and reload the website. You should see the tab populate with all of the requests your browser makes as it loads the website.
After you make an HTTP request, the server will send a response. Every HTTP response includes a 3-digit status code to summarize the meaning of the response:
Range | Meaning | Name |
---|---|---|
100s (100 - 199) | Request received, still processing | Informational |
200s | Success | Success |
300s | The requested data is somewhere else | Redirect |
400s | Something’s wrong with the request | Client Error |
500s | Something’s wrong with the server | Server Error |
You can find a list of common status codes in Mozilla’s developer documentation.
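The ranges in the table can be expressed as a short function. This is only a sketch: status_category is a hypothetical helper, not part of any package. The built-in http module is used to look up the standard phrase for one specific code.

```python
from http import HTTPStatus

def status_category(code):
    """Return the name of the range a 3-digit status code falls in."""
    categories = {1: "Informational", 2: "Success", 3: "Redirect",
                  4: "Client Error", 5: "Server Error"}
    # Integer division by 100 gives the first digit of the code
    return categories.get(code // 100, "Unknown")

print(status_category(200))
print(status_category(404), "-", HTTPStatus(404).phrase)
```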
A response usually also contains content as text or raw bytes. Some common formats for the content are:
Hypertext Markup Language (HTML)
Extensible Markup Language (XML)
JavaScript Object Notation (JSON)
5.2.1. The requests Package#
The requests package provides functions for making HTTP requests from Python. You can use requests to make a GET request with the get function:
import requests
response = requests.get("https://library.ucdavis.edu")
The status code of the response is in the .status_code attribute:
response.status_code
200
If you want Python to raise an error any time the status code indicates a problem, you can use the .raise_for_status method. It doesn’t do anything if the status code is okay:
response.raise_for_status()
The content of the response is stored as bytes in the .content attribute and as text in the .text attribute. Choose which one to use based on what kind of site or file you requested. Web pages (.html) are usually text files, so for the Library site it makes more sense to use .text to access the content:
text = response.text
# Only display the first 200 characters
print(text[:200])
<!-- This page is cached by the Hummingbird Performance plugin v3.7.2 - https://wordpress.org/plugins/hummingbird-performance/. -->
<!DOCTYPE html>
<html lang="en-US">
<head>
<meta charset="UTF
5.2.2. Request Etiquette#
Making an HTTP request is not free. It has a real cost in CPU time and electricity. Server administrators will not appreciate it if you make too many requests or make requests too quickly. So:
Use the time.sleep function to slow down any requests you make in a loop (Section 5.3 explains loops). Aim for no more than 20-30 requests per second.
Install and use the requests_cache package to avoid downloading extra data when you make the same request twice.
Failing to be polite can get you banned from websites!
# In a Terminal (not Python console!):
# conda install -c conda-forge requests-cache
#
import requests_cache
requests_cache.install_cache("my_cache")
5.2.3. Web APIs#
An application programming interface (API) is a collection of functions and data structures for communicating with other software. For instance, whenever you use a Python package, you’re using the API created by the package’s developers.
Websites sometimes provide a web API so that programmers can create applications that access data and settings. The most common kind of web API is representational state transfer (REST). The key ideas of REST are that:
You can “call” functions by making HTTP requests.
URLs called endpoints serve as function names.
Separate function calls are handled independently of each other. The server doesn’t remember previous calls.
For example, the Star Wars API (SWAPI) is a REST API with endpoints that return data about the Star Wars universe. The best way to learn how to use a web API is to read its documentation, and fortunately the Star Wars API is well-documented.
Try making a request to SWAPI’s “people” endpoint (the endpoint URL comes from the SWAPI documentation):
url = "https://swapi.dev/api/people/1/"
response = requests.get(url)
Don’t forget to check the status of the response:
response.status_code
200
A 200 status code means the request was successful, so the next step is to take a look at the content of the response.
Unless you’ve already read through the SWAPI documentation, you probably don’t know whether the content will be bytes or text. When you’re unsure, you can try printing the .text value to check whether the content looks like text. Byte content will usually look like gibberish if you try to display it as text.
Here’s the content of the response:
response.text
'{"name":"Luke Skywalker","height":"172","mass":"77","hair_color":"blond","skin_color":"fair","eye_color":"blue","birth_year":"19BBY","gender":"male","homeworld":"https://swapi.dev/api/planets/1/","films":["https://swapi.dev/api/films/1/","https://swapi.dev/api/films/2/","https://swapi.dev/api/films/3/","https://swapi.dev/api/films/6/"],"species":[],"vehicles":["https://swapi.dev/api/vehicles/14/","https://swapi.dev/api/vehicles/30/"],"starships":["https://swapi.dev/api/starships/12/","https://swapi.dev/api/starships/22/"],"created":"2014-12-09T13:50:51.644000Z","edited":"2014-12-20T21:17:56.891000Z","url":"https://swapi.dev/api/people/1/"}'
This is text about Luke Skywalker. In fact, according to the SWAPI documentation, all of the SWAPI endpoints return text.
Although the response content is text, it isn’t ordinary English text. It’s full of curly braces, square brackets, quotes, and other punctuation. If you think back to the beginning (Section 5.1) of this chapter, the way the punctuation is arranged might start to look familiar…
5.2.4. JavaScript Object Notation#
The response content looks a lot like Python lists and dictionaries! The response is not actually Python code. It is in a popular non-tabular data format called JavaScript Object Notation (JSON), which borrows its syntax from JavaScript, where lists and dictionaries are written almost the same way as in Python.
JSON looks and works a lot like Python lists and dictionaries. Lists are surrounded with [ ], and dictionaries are surrounded with { }. There are many Python packages for converting JSON text into Python lists and dictionaries.
Tip
You can use Python’s built-in json module to read a JSON file you didn’t get from an HTTP request. For example, Jupyter notebooks (.ipynb files) are in JSON format.
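As a brief sketch of the built-in json module, the example below parses a JSON string that was made up for this illustration. The json.loads function parses text, while json.load does the same for an open file.

```python
import json

# A small JSON document as text (made up for this example)
text = '{"name": "R2-D2", "films": [1, 2, 3], "pilot": null}'

# json.loads converts JSON text into Python dictionaries and lists
droid = json.loads(text)
print(droid["name"])
print(droid["films"])

# JSON's null becomes Python's None
print(droid["pilot"] is None)
```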
When you know the content of a response is in JSON format, you can use the .json method on the response to convert it into Python lists and dictionaries:
luke = response.json()
luke
{'name': 'Luke Skywalker',
'height': '172',
'mass': '77',
'hair_color': 'blond',
'skin_color': 'fair',
'eye_color': 'blue',
'birth_year': '19BBY',
'gender': 'male',
'homeworld': 'https://swapi.dev/api/planets/1/',
'films': ['https://swapi.dev/api/films/1/',
'https://swapi.dev/api/films/2/',
'https://swapi.dev/api/films/3/',
'https://swapi.dev/api/films/6/'],
'species': [],
'vehicles': ['https://swapi.dev/api/vehicles/14/',
'https://swapi.dev/api/vehicles/30/'],
'starships': ['https://swapi.dev/api/starships/12/',
'https://swapi.dev/api/starships/22/'],
'created': '2014-12-09T13:50:51.644000Z',
'edited': '2014-12-20T21:17:56.891000Z',
'url': 'https://swapi.dev/api/people/1/'}
type(luke)
dict
In this case, the outer object is a dictionary. The keys in the dictionary are:
list(luke.keys())
['name',
'height',
'mass',
'hair_color',
'skin_color',
'eye_color',
'birth_year',
'gender',
'homeworld',
'films',
'species',
'vehicles',
'starships',
'created',
'edited',
'url']
These keys describe the included information about Luke. For example, to get his eye color, you can use indexing:
luke["eye_color"]
'blue'
Sometimes data in JSON format will contain several layers of dictionaries and lists. Remember that you can use indexing with square brackets [ ] to navigate these.
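For instance, here is a sketch of navigating nested data one layer at a time. The data dictionary below is made up to resemble a parsed JSON response. It is not the output of a real SWAPI request.

```python
# Made-up nested data shaped like a parsed JSON response
data = {
    "count": 2,
    "results": [
        {"name": "Luke Skywalker", "vehicles": [14, 30]},
        {"name": "C-3PO", "vehicles": []},
    ],
}

# Dictionary key, then list position, then dictionary key
second_name = data["results"][1]["name"]
print(second_name)

# One more layer: the first vehicle of the first result
first_vehicle = data["results"][0]["vehicles"][0]
print(first_vehicle)
```

Reading the brackets from left to right mirrors the layers of the data from the outside in.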
5.2.5. Using Endpoints#
The SWAPI endpoint /people/ returns information in JSON format about different people in the Star Wars universe. An ID number after /people/ determines the person for which the endpoint returns information. Think of the ID number as an argument to the endpoint. For instance, Luke Skywalker’s ID number is 1, so the URL for information about Luke is:
https://swapi.dev/api/people/1/
A good way to work with an endpoint that accepts arguments is to use Python to paste the arguments onto the end. For example, this code pastes the number 2 onto the end of the string endpoint:
endpoint = "https://swapi.dev/api/people/"
endpoint + str(2)
'https://swapi.dev/api/people/2'
The str function converts the ID number into a string, and + pastes the two strings together.
Now try getting the information for the person with ID number 2:
response = requests.get(endpoint + str(2))
# Raise an error if the request failed
response.raise_for_status()
person = response.json()
person
{'name': 'C-3PO',
'height': '167',
'mass': '75',
'hair_color': 'n/a',
'skin_color': 'gold',
'eye_color': 'yellow',
'birth_year': '112BBY',
'gender': 'n/a',
'homeworld': 'https://swapi.dev/api/planets/1/',
'films': ['https://swapi.dev/api/films/1/',
'https://swapi.dev/api/films/2/',
'https://swapi.dev/api/films/3/',
'https://swapi.dev/api/films/4/',
'https://swapi.dev/api/films/5/',
'https://swapi.dev/api/films/6/'],
'species': ['https://swapi.dev/api/species/2/'],
'vehicles': [],
'starships': [],
'created': '2014-12-10T15:10:51.357000Z',
'edited': '2014-12-20T21:17:50.309000Z',
'url': 'https://swapi.dev/api/people/2/'}
So the “person” with ID number 2 is actually the droid C-3PO.
You can make using the endpoint more convenient by writing a Python function that takes an ID number as input, makes a request to the endpoint, and then returns the result. Here’s the code:
def get_swapi_person(id_num):
    endpoint = "https://swapi.dev/api/people/"
    # Note that 'id_num' replaces '2'
    response = requests.get(endpoint + str(id_num))
    # Raise an error if the request failed
    response.raise_for_status()
    return response.json()
As always, when you write a function, it’s important to test it out. For example:
get_swapi_person(50)
{'name': 'Ben Quadinaros',
'height': '163',
'mass': '65',
'hair_color': 'none',
'skin_color': 'grey, green, yellow',
'eye_color': 'orange',
'birth_year': 'unknown',
'gender': 'male',
'homeworld': 'https://swapi.dev/api/planets/41/',
'films': ['https://swapi.dev/api/films/4/'],
'species': ['https://swapi.dev/api/species/19/'],
'vehicles': [],
'starships': [],
'created': '2014-12-20T10:08:33.777000Z',
'edited': '2014-12-20T21:17:50.417000Z',
'url': 'https://swapi.dev/api/people/50/'}
When you’re working with a web API, it’s generally a good idea to create wrapper functions, like get_swapi_person, that turn each endpoint into an actual Python function by wrapping up (or encapsulating) the requests code.
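One way to extend this idea to several endpoints is to separate building the URL from making the request. The sketch below covers only the URL-building step, so it runs without network access. The name swapi_url is a hypothetical helper, and the resource names are assumed to match those that appear in the URLs of the responses above, such as people, planets, and vehicles.

```python
def swapi_url(resource, id_num):
    """Build the URL for one item from a SWAPI endpoint."""
    base = "https://swapi.dev/api/"
    # str converts the ID number so it can be pasted onto the string
    return base + resource + "/" + str(id_num) + "/"

print(swapi_url("people", 1))
print(swapi_url("vehicles", 14))
```

A wrapper function for any endpoint could then pass the result of swapi_url to requests.get, in the same way that get_swapi_person does for people.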
5.3. Loops#
Now suppose you want to get the information for the first 15 people in SWAPI. You could do this by writing 15 calls to get_swapi_person, but that would be very tedious.
One major benefit of using a programming language like Python is that repetitive tasks can be automated. A loop is a way to automate doing a similar operation several times, so that you don’t have to repeat the code several times. Loops are a feature of almost all modern programming languages, so it’s useful to understand them.
In Python, there are two kinds of loops: for-loops and while-loops. We’ll focus on for-loops.
For-loops iterate over some object, and compute something for each element. Each one of these computations is one iteration. A for-loop begins with the for keyword, followed by:
A placeholder variable, which will automatically be assigned an element at the beginning of each iteration
The in keyword
An object with elements
A colon :
Code in the body of the loop must be indented. The convention is to indent by 4 spaces.
For example, to print out the names in a list names, you can write:
names = ["Arthur", "Nick", "Cameron", "Pamela"]
for name in names:
    print(name)
Arthur
Nick
Cameron
Pamela
The code in the body of the loop is evaluated once for each name in the list (that is, once for each iteration).
A common technique when programming with for-loops is to iterate over a sequence of numbers. Then you can use the numbers to index other objects. In Python, the range function generates a sequence of numbers starting from 0.
As an example, here’s a version of the loop above that uses indexes:
for i in range(4):
    print(names[i])
Arthur
Nick
Cameron
Pamela
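The range function also accepts a starting value, and you can pair it with len so the number of elements is not hard-coded. The following sketch illustrates both. The variable ids is a made-up name for this example.

```python
names = ["Arthur", "Nick", "Cameron", "Pamela"]

# len(names) avoids hard-coding the number of elements
for i in range(len(names)):
    print(i, names[i])

# range(start, stop) begins at start and stops just before stop
ids = list(range(1, 16))
print(ids)
```

A range that starts at 1 is convenient for ID numbers that begin at 1 rather than 0.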
Sometimes you might want both the indexes and the values for an object. In that case, you can use the enumerate function. When you use enumerate in a loop, the loop syntax is slightly different:
for i, name in enumerate(names):
    print("This is iteration " + str(i))
    print(name)
This is iteration 0
Arthur
This is iteration 1
Nick
This is iteration 2
Cameron
This is iteration 3
Pamela
Most of the time, you’ll want to save the results from the iterations into a variable for use later on, rather than print them all out. The best way to do this is to create an empty list before the loop, and then append to it with the .append method. For example, here’s a loop that computes the cumulative sums from left to right for the numbers in a list:
nums = [1, 3, 2, -2, 10]
sums = []
total = 0
for num in nums:
    total = total + num
    sums.append(total)
sums
[1, 4, 6, 4, 14]
You can use a loop to get the information about the first 15 people in SWAPI. When you send HTTP requests (or call a function that sends HTTP requests) in a loop, be careful to follow the guidelines for request etiquette described in Section 5.2.2. For loops, limiting the number of requests per second is especially important. Here’s the code to get the information for the first 15 people:
import time
people = []
for i in range(15):
    # i + 1 because range starts from 0
    person = get_swapi_person(i + 1)
    people.append(person)
    # Do nothing for 1/10 of a second, to prevent making requests too quickly
    time.sleep(0.1)
After running the loop, you can inspect the returned information in the list people:
people[0]
{'name': 'Luke Skywalker',
'height': '172',
'mass': '77',
'hair_color': 'blond',
'skin_color': 'fair',
'eye_color': 'blue',
'birth_year': '19BBY',
'gender': 'male',
'homeworld': 'https://swapi.dev/api/planets/1/',
'films': ['https://swapi.dev/api/films/1/',
'https://swapi.dev/api/films/2/',
'https://swapi.dev/api/films/3/',
'https://swapi.dev/api/films/6/'],
'species': [],
'vehicles': ['https://swapi.dev/api/vehicles/14/',
'https://swapi.dev/api/vehicles/30/'],
'starships': ['https://swapi.dev/api/starships/12/',
'https://swapi.dev/api/starships/22/'],
'created': '2014-12-09T13:50:51.644000Z',
'edited': '2014-12-20T21:17:56.891000Z',
'url': 'https://swapi.dev/api/people/1/'}
5.3.1. Comprehensions#
Python provides an efficient and powerful alternative to loops called comprehensions. Comprehensions automatically create a list or dictionary. A list comprehension creates a list, and a dictionary comprehension creates a dictionary.
Comprehensions are especially helpful if you have a list or dictionary and want to transform each element in the same way. For instance, suppose you want to get the eye color for each person in the people list. You can get the eye color for the first person with the code:
people[0]["eye_color"]
'blue'
You could use this code with a loop that changes the first index (0) in order to get the eye color of every person. However, you can do the same thing as a loop more concisely with a list comprehension. Here’s the code:
eye_colors = [person["eye_color"] for person in people]
eye_colors[0]
'blue'
The syntax for a list comprehension includes the keywords for and in, just like a for-loop. The difference is that in the list comprehension, the repeated code comes before the for keyword rather than after it, and the entire expression is enclosed in square brackets [ ].
You can use list comprehensions to get other information as well. For instance, here’s the code to get lists of names and heights for each person in the list:
names = [person["name"] for person in people]
heights = [person["height"] for person in people]
You can learn more about comprehensions in the official Python documentation.
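The section above mentions dictionary comprehensions without showing one, so here is a minimal sketch. It uses a short stand-in list called sample_people, with values copied from the first three people, so that it runs without any HTTP requests. The variable names are made up for this example.

```python
# Stand-in for the first three entries of the people list
sample_people = [
    {"name": "Luke Skywalker", "height": "172", "eye_color": "blue"},
    {"name": "C-3PO", "height": "167", "eye_color": "yellow"},
    {"name": "R2-D2", "height": "96", "eye_color": "red"},
]

# Dictionary comprehension: curly braces and a key: value pair
height_by_name = {p["name"]: float(p["height"]) for p in sample_people}
print(height_by_name)

# A list comprehension can also filter elements with an if condition
tall_names = [p["name"] for p in sample_people if float(p["height"]) > 150]
print(tall_names)
```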
5.4. Exporting Data#
The extracted names, heights, and eye_colors lists are like columns in a DataFrame where each row is one person, so you might as well put them in a DataFrame. You can use the pd.DataFrame function and a dictionary of columns to do this:
import pandas as pd
people_df = pd.DataFrame({"name": names, "height": heights,
"eye_color": eye_colors})
people_df
 | name | height | eye_color |
---|---|---|---|
0 | Luke Skywalker | 172 | blue |
1 | C-3PO | 167 | yellow |
2 | R2-D2 | 96 | red |
3 | Darth Vader | 202 | yellow |
4 | Leia Organa | 150 | brown |
5 | Owen Lars | 178 | blue |
6 | Beru Whitesun lars | 165 | blue |
7 | R5-D4 | 97 | red |
8 | Biggs Darklighter | 183 | brown |
9 | Obi-Wan Kenobi | 182 | blue-gray |
10 | Anakin Skywalker | 188 | blue |
11 | Wilhuff Tarkin | 180 | blue |
12 | Chewbacca | 228 | blue |
13 | Han Solo | 180 | brown |
14 | Greedo | 173 | black |
Pandas provides a variety of methods for saving DataFrames to a file. Most of these begin with .to_. For example, to save a DataFrame to a CSV file, use the .to_csv method:
people_df.to_csv("swapi_people.csv")
It’s a good idea to save your data sets like this whenever you reach a checkpoint in the problem you’re trying to solve.
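If the data is still a list of dictionaries rather than a DataFrame, Python’s built-in csv module is another option. The sketch below is one possible approach, not the workshop’s method. It writes to an in-memory text buffer so that the result can be printed, and the rows are a two-person sample copied from the table above.

```python
import csv
import io

rows = [
    {"name": "Luke Skywalker", "height": "172", "eye_color": "blue"},
    {"name": "C-3PO", "height": "167", "eye_color": "yellow"},
]

# io.StringIO stands in for a file here. To write a real file, use
# open("some_file.csv", "w", newline="") instead (file name is made up)
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["name", "height", "eye_color"])
writer.writeheader()      # first line: the column names
writer.writerows(rows)    # one line per dictionary
csv_text = buffer.getvalue()
print(csv_text)
```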
5.5. Practice Exercises#
5.5.1. Exercise 1#
Create a CSV file for the first 15 vehicles in SWAPI. The file should include columns for cargo capacity, cost, length, manufacturer, model, and passengers.
5.5.2. Exercise 2#
Create a histogram from the heights of the first 30 people in SWAPI.
Hint
You can use the float function to convert a string to a decimal number.