3 Hands-On with SQL Code
Remember, SQL is a language that can help answer questions about the data in a database.
3.1 Viewing Data
What if we want to know, who are the users in the database?
3.1.1 SELECT & FROM
Let’s see how we can view the users table with SQL by writing our first query! Type this into the SQL Editor Pane:
SELECT * FROM users;
Place your cursor anywhere before the semicolon and then click the Execute query button or F9 key on your keyboard.
This is the most common query you’ll write as you start exploring your data. SELECT
is used to display the columns of tables you want and FROM
is used to specify which table to get the columns from.
The query asks the database to select everything (*
means “everything”) from
the table users
. It ends with a semicolon to tell the database that this is
the end of the request.
SQL ignores capitalization, spaces, and new lines in a query. Some tools which use SQL also ignore semicolons. However, it’s conventional to:
- Write SQL keywords (
SELECT
,FROM
, and so on) in UPPERCASE - Write table and column names in lowercase
- Write a semicolon
;
at the end of the query
You can use new lines to help organize queries and make them easier to read. For instance, this query does the same thing as the query above:
SELECT *
FROM users;
3.1.2 Selecting Columns
What if we don’t want to see all of the columns in the table? We can ask for just the columns we want to see. Let’s get the first name, birth date, phone, city, and zip code.
SELECT
first_name,
birth_date,
phone,
city,
zip_code
FROM users;
NOTE:
You can rearrange the columns however you’d like in theSELECT
statement and select a column multiple times.
3.1.3 Unique Values
What if we now want to know, in what cities do the people live in? We could scroll through the results and try to keep track of unique values, but that is tedious.
Instead we can use the SELECT DISTINCT
keywords on one or more columns to show
all the unique values.
Let’s look at the users table again and see which cities our users live in.
SELECT DISTINCT city
FROM users;
If we wanted to see unique combinations across multiple columns, we just add the columns we’re interested in. The query below shows unique city and zip code combinations.
SELECT DISTINCT
city,
zip_code
FROM users;
3.1.4 Ordering Results
What if we want to know, are the users young or old? We could scroll through the table and look their birth dates.
While we continue our data exploration, we also might want to sort the results in a specific way.
With SQL, you can sort on one or more columns with a combination of ascending or descending order using the ORDER BY
keyword.
Let’s sort the users table by birth date.
SELECT *
FROM users
ORDER BY birth_date;
By default, ORDER BY
sorts in ascending order. We can sort in descending order to get users born more recently by adding DESC
after the column.
SELECT *
FROM users
ORDER BY birth_date DESC;
CHALLENGE:
How would you sort users by zip code in descending order and then birth date in ascending order?
3.1.5 Limiting Number of Rows
Sometimes you’ll be working with a large table to analyze with lots of columns and rows. You can use LIMIT
to reduce the number of rows the query returns to give you a snapshot of the data you’re working with.
SELECT *
FROM users
LIMIT 10;
3.1.6 Commenting
As we’re writing queries, sometimes we want to write helpful comments to ourselves and others. There’s two ways to write comments so that text won’t be interpreted as SQL.
- Single line comments: text following two dashses “–”, like so
-- comment here
- Multiline comments: text between the characters “/* */”, like so
/* comment here */
/*
all users in the library system
sorted by most recent birth date
*/
SELECT *
FROM users
ORDER BY birth_date DESC; -- sort most recent here
CHECKPOINT:
Try and use all viewing keywords together to give you insight to the dataset. Write your insights in comments.
3.2 Filtering Data
Now that we’ve seen some ways to view our data, let’s learn how we can filter our data. This is really the core of SQL, where we can start to answer our own questions about the data!
We use the WHERE
clause to filter rows of a query by specifying one or more
conditions. The table below shows comparison operators that can be used and
combined with WHERE
to create conditions, some of which you may have seen
before in other programming languages.
Comparison Operator | Description |
---|---|
= |
equals |
> |
greater than |
>= |
greater than or equal to |
< |
less than |
<= |
less than or equal to |
<> or != |
not equal to |
In general, the type of data on each side of the operator needs to be the same: compare numbers to numbers, text to text, and so on. When working with text or date data, it’s necessary to wrap values in single quotes. For instance:
'text_value'
is a text value'2000-01-01'
is a date
Now let’s write a query to find all users that live in Davis:
SELECT *
FROM users
WHERE city = 'Davis';
SQL also provides a variety of arithmetic operators for working with numeric data:
Arithmetic Operators | Description |
---|---|
+ |
addition |
- |
subtraction |
* |
multiplication |
/ |
division |
% |
modulus |
3.2.1 AND & OR Operators
Often we’ll need to filter data based on more than one condition. We can ask
WHERE
to check multiple conditions with the keywords:
AND
, meaning a row must satisfy both conditionsOR
, meaning a row must satisfy at least one of two conditions
These are logical operators in SQL that evaluate to TRUE or FALSE.
Let’s find all users who live in Davis and were born before 1980.
SELECT *
FROM users
WHERE city = 'Davis' AND
zip_code < '1980-01-01';
Let’s now find all users who live in Davis or Sacramento.
SELECT *
FROM users
WHERE city = 'Davis' OR
city = 'Sacramento';
NOTE:
AND
&OR
will follow the order of operations. To useAND
&OR
in the same query, wrap parenthesis around theOR
conditions.
3.2.2 IN Operator
Sometimes we find ourselves writing multiple OR
conditions on the same column which can make our query look more complex than it really is. In this scenario we can condense multiple OR
conditions using IN
.
Let’s rewrite our previous query to use IN
.
SELECT *
FROM users
WHERE city IN ('Davis', 'Sacramento');
3.2.3 BETWEEN Operator
We can also also filter using ranges of values with BETWEEN
. This is handy when you’re working with numerical or date values and you don’t want to list out all possible values to meet your conditions.
Let’s write a query to find all users born in the 1990’s.
SELECT *
FROM users
WHERE birth_date BETWEEN '1990-01-01' AND '1999-12-31';
CHALLENGE:
Find all users with a zip code from 95000 - 96000.
3.2.4 LIKE Operator
The LIKE
keyword tests whether text values match a given pattern. There are
two different wildcard characters that you can use in the pattern:
_
matches exactly one character%
matches zero or more characters
Let’s look at an example to indicate we want to match the beginning of a string, but the end is allowed to vary. To show this, let’s find users that have a phone number area code that begins with 530 and can end with anything.
SELECT *
FROM users
WHERE phone LIKE '530%';
NOTE:
The wildcard%
can be used multiple times in one pattern.You can also use regular expressions in SQLite to match in more complicated situations. Read more about using regular expressions here.
The LIKE keyword differs between dialects of SQL, so it’s also a good idea to check the documentation for your RDBMS before using LIKE.
CHALLENGE:
How would you write a query so that the wild card finds users with an email that ends with icloud.com?
3.2.5 IS NULL Operator
So far we’ve worked with complete data, but how do we work with missing data?
In databases, NULL
means missing data. IS NULL
is used to test whether
there is missing data in a column.
Let’s look at an example to find users with a missing address.
SELECT *
FROM users
WHERE address IS NULL;
CHALLENGE:
How would you write a query to find users with a missing phone number or missing email?
3.2.6 NOT Operator
There will be times where we want to find only the rows that do not satisfy
some condition. To do this, use NOT
combined with other operators:
NOT IN
NOT BETWEEN
NOT LIKE
IS NOT NULL
Below is a query to find users that do not have a phone number area code that begins with 530.
SELECT *
FROM users
WHERE phone NOT LIKE '530%';
3.3 Aggregating Data
We’ve just looked at a number of ways to filter data, but now let’s look at some ways to aggregate data.
3.3.1 Count
Suppose we want to find out how many books have been checked out.
We can count the total number of rows in a table using the COUNT
function.
The function takes a column or *
as an argument, but the argument doesn’t
actually affect the count.
Here’s how we can use COUNT
to answer our question:
SELECT
COUNT(id)
FROM checkouts;
NOTE:
You can combineDISTINCT
withCOUNT
usingCOUNT(DISTINCT column_name)
to get a unique count of values in a column when duplicate values exist.CHALLENGE:
Find the total number of users that have checked out a book.
3.3.1.1 Renaming/Aliasing Columns
In the previous query, notice that the name of the column in the result is
COUNT(id)
, which isn’t easy to use in subsequent SQL queries or with other
data programming tools.
We can use AS
to rename or alias a column in the result of the query.
This is handy if you’re planning to export the result for future use,
especially if you’re sending it to someone else.
Let’s rename the column to total_checkouts
:
SELECT
COUNT(id) AS total_checkouts
FROM checkouts;
We can also include WHERE
. Let’s see how many books were checked out just today.
SELECT
COUNT(id) AS total_checkouts
FROM checkouts
WHERE checkout_date = '2022-04-11';
3.3.2 Average
The AVG
function returns the average value of a numeric column. Let’s find
the average number of days a book is checked out for:
SELECT
AVG(days_checking_out) AS avg_days_checking_out
FROM checkouts;
3.3.3 Sum
We can also sum the values in a numeric column with the SUM
function. Let’s
find the total number of books that were returned with damage:
SELECT
SUM(returned_with_damage) AS books_returned_with_damage
FROM checkouts;
Here’s another way to accomplish the same task using COUNT
and WHERE
:
SELECT
COUNT(id) AS books_returned_with_damage
FROM checkouts
WHERE returned_with_damage = 1;
3.3.4 Grouping Data
So now you’ve seen several functions working on a single column, but we sometimes want to summarize our data in more sophisticated ways. Let’s see what grouping can do for our data. Let’s write a query that counts the number of books checked out per day:
SELECT
checkout_date,
COUNT(book_id) AS books_checked_out
FROM checkouts
GROUP BY checkout_date;
Notice here how we asked for two columns - the checkout_date
and the count of
book_id
.
CHALLENGE:
You can alsoGROUP BY
more than one column. How would you find the total number of times a person checked out a book on a given day?
3.3.5 Having
HAVING
is similar to WHERE
, but it specifically works with GROUP BY
. Perhaps we’re only interested in days that had more than 5 checkouts. Let’s see what that looks like:
SELECT
checkout_date,
COUNT(book_id) AS books_checked_out
FROM checkouts
GROUP BY checkout_date
HAVING COUNT(book_id) > 5;
Now we’ve seen how we can use functions to aggregate data and how grouping data can give us meaningful insights. There are, of course, other functions available in SQL and we can’t go over all of them here, but now you’ve seen how they work and can apply your knowledge to new functions you find.
3.4 Joining Data
Joining tables allows us to combine information from more than one table into a new table. The tables need to have a key column to be able to link the tables together. In our Library Checkouts ERD, the id column in books is a key column that links to
book_id in checkouts
and
book_id in book_genre_link
3.4.1 JOIN Types
SQL has 4 main kinds of joins:
NOTE:
The above images come from the W3Schools’ SQL join page, an excellent resource for learning more about SQL.
What kinds of joins are there?
INNER JOIN
: Returns rows that have matching values in both tables; it gets you what’s in the middle of the venn diagram.LEFT JOIN
: Returns all rows from the left table, and the matched rows from the right table; the “left” table is the first table you write in the query.RIGHT JOIN
: Returns all rows from the right table, and the matched rows from the left table; the “right” table is the second table you write in the query or the “join” table.NOTE:
This is not supported in SQLite, however switching the tables in aLEFT JOIN
will emulate aRIGHT JOIN
.FULL OUTER JOIN
: Returns all rows when there is a match in either left or right table; nulls are generated in the table when a row in one table doesn’t have a match in the other table.NOTE:
This is not supported in SQLite, however you can emulate aFULL OUTER JOIN
described here.
3.4.2 JOIN steps
The anatomy of a JOIN:
SELECT
left_table.column1,
right_table.column1,
...
FROM left_table
[INNER | LEFT] JOIN right_table ON left_table.key_column = right_table.key_column;
Below are the steps for writing a JOIN:
SELECT
the columns we want in the output (using column references).
NOTE:
Column referencing helps you distinguish from what table the columns come from. Use the table name followed by a period to reference the column:table_name.column_name
.
- Then we have the
FROM
statement to tell it which table to start with (this is our “left” table). - Then we need our
JOIN
statement to say which table should get joined (this is our “right” table) - Finally, we have to say which columns the join should be based on with
ON
.
3.4.3 INNER JOIN
Let’s try an INNER JOIN
to see how this works:
SELECT
books.title AS book,
checkouts.checkout_date
FROM books
INNER JOIN checkouts ON books.id = checkouts.book_id;
We interpret the INNER JOIN
query as, “all books that have been checked out.”
3.4.4 LEFT JOIN
Now let’s try a LEFT JOIN
:
SELECT
books.title AS book,
checkouts.checkout_date
FROM books
LEFT JOIN checkouts ON books.id = checkouts.book_id;
We interpret the LEFT JOIN
query as, “all books and if they have been checked out or not.”
You might be thinking, what would happen if the tables in the LEFT JOIN
were flipped? We would get the same result as the INNER JOIN
query! That’s because there’s no instances where a checkout without a book could ever happen!
CHALLENGE:
Can you write a query that contains the title of the books and the names of the genres they’re categorized to?
3.5 Subqueries
So far we’ve been working with one SELECT
statement, but we can actually combine multiple SELECT
statements using subqueries. Subqueries are nested queries enclosed in parentheses that can be used with other keywords like JOIN
and WHERE
. Below are 2 examples of these use cases.
Let’s first look at a subquery in the WHERE
clause:
-- main query
SELECT *
FROM checkouts
WHERE book_id IN (
-- subquery
SELECT id
FROM books
WHERE format = 'Hardcover'
);
We retrieve hardcover books in a subquery. We then use the results of the subquery to filter the checkouts table using checkouts.book_id. In the end we get, “all checked out hardcover books.”
NOTE:
When writing a subquery withWHERE
andIN
, the subquery must select only one column forIN
to filter on.
Now let’s look at a subquery with JOIN
:
-- main query
SELECT
checkouts.user_id,
hardcover_books.title AS book,
checkouts.checkout_date,
checkouts.return_date
FROM checkouts
INNER JOIN (
-- subquery
SELECT *
FROM books
WHERE format = 'Hardcover'
) AS hardcover_books ON checkouts.book_id = hardcover_books.id;
We retrieve hardcover books in a subquery, just like before. Next, we write the subquery after JOIN
and alias it hardcover_books. Finally, we join on the keys and return the columns we want in the main SELECT
statement.
NOTE:
If you’re still a bit confused, just remember that every query results in a table. Subqueries give you the ability to create a “new” table on the fly even if that table didn’t exist in the database before.Also, note that both subquery examples are just different ways to get to the same result, “checked out hardcover books.”
3.6 Saving Query Results
There will be times when we want to save the results of a query so we can reuse it later when needed. Two commands to save a query as a new database object follow:
CREATE TEMPORARY TABLE
- This is a new table added to the database, just like the tables you’ve been working with, except it is only available in the current session. You typically do this to break down a complex problem into intermediate steps and pass your saved results to the final query.
CREATE VIEW
- A view is simply a saved query that can be executed when called. The query you save will usually be made up of multiple tables with added conditions if needed. You can use it in pretty much the same way you would a table. The only major difference is that a view, because it is updating from other tables, is not able to be edited.
If we want to create a temporary table, we just need to add CREATE TEMPORARY TABLE our_new_table_name AS
at the beginning of the query (adding in our own table name, of course). This is what it looks like:
CREATE TEMPORARY TABLE davis_resident_checkouts AS
SELECT
users.id AS user_id,
users.first_name,
users.last_name,
books.title AS book,
checkouts.checkout_date,
checkouts.return_date
FROM users
INNER JOIN checkouts ON users.id = checkouts.user_id
INNER JOIN books ON checkouts.book_id = books.id
WHERE users.city = 'Davis';
In much the same way we made the new table, we can make a view:
CREATE VIEW davis_resident_checkouts AS
SELECT
users.id AS user_id,
users.first_name,
users.last_name,
books.title AS book,
checkouts.checkout_date,
checkouts.return_date
FROM users
INNER JOIN checkouts ON users.id = checkouts.user_id
INNER JOIN books ON checkouts.book_id = books.id
WHERE users.city = 'Davis';
3.7 Data Management
3.7.1 Update Tables
You might have noticed at the beginning the users table has NULL values across different columns. We can fix this fairly easily, but we need to be careful. It’s challenging to undo something in a database so we want to be sure we’re doing it right. Let’s update the NULL values in the users.address to “N/A”.
It’s first helpful to write a query to be sure these are the rows you want to update:
SELECT *
FROM users
WHERE address IS NULL;
Once you’ve confirmed, the below statement updates the NULL values to “N/A”
UPDATE users
SET address = 'N/A'
WHERE address IS NULL;
The SET
keyword specifically targets just the address column and replaces NULL values with “N/A” when the condition is met in the WHERE
clause. It leaves the other values alone. If the WHERE
clause is removed, it will set all values in the whole column to “N/A” overwriting the users address, so proceed with caution!
3.7.2 Add & Populate a Column
Sometimes we want to make a new column and add data into it. Let’s make a new column called country in the users table and populate it with “USA” if there is a value in the column state.
First we’ll add the column and set the default value to “N/A”:
ALTER TABLE users
ADD country TEXT DEFAULT 'N/A';
NOTE:
TheDEFAULT
argument is optional, but if you leave it blank, it will make the default value NULL.
Now we update all values in the column to be “USA” where the state is not null.
UPDATE users
SET country = 'USA'
WHERE state IS NOT NULL;