Basic SQL Management - CompTIA Linux+ / LPIC-1 Cert Guide (Exams LX0-103 & LX0-104/101-400 & 102-400) (2016)

CompTIA Linux+ / LPIC-1 Cert Guide (Exams LX0-103 & LX0-104/101-400 & 102-400) (2016)

Chapter 13. Basic SQL Management

This chapter covers the following topics:

Image Database Basics

Image Learning SQL

This chapter covers the following objectives:

Image SQL data management: 105.3

Data is usually stored in some kind of database and the most popular databases are queried with a language called Structured Query Language.

As an administrator you’ll be responsible for maintaining the servers and often will be called to troubleshoot connection problems or help people use the database.

Additionally, many open source projects use a database to store their data, so learning how to query the database lets you get information out of the application that the developers didn’t anticipate.

“Do I Know This Already?” Quiz

The “Do I Know This Already?” quiz enables you to assess whether you should read this entire chapter or simply jump to the “Exam Preparation Tasks” section for review. If you are in doubt, read the entire chapter. Table 13-1 outlines the major headings in this chapter and the corresponding “Do I Know This Already?” quiz questions. You can find the answers in Appendix A, “Answers to the ‘Do I Know This Already?’ Quizzes and Review Questions.”

Image

Table 13-1 “Do I Know This Already?” Foundation Topics Section-to-Question Mapping

1. Which of the following is incorrect about relational databases?

a. Tables are arranged in rows and columns.

b. New data elements can be added to a row without changing the schema.

c. SQL is used to query the data.

d. Rows are given unique identifiers.

2. What do databases use to look up data quickly?

a. Binary object

b. Client-server

c. Embedded database

d. Index

3. Which of the following are case sensitive when writing SQL queries?

a. Names of tables

b. Keywords such as SELECT and INSERT

c. Logic operators such as AND and OR

d. Names of database servers

4. Which of the following queries will successfully return people from the employee table who are at least 30 years old?

a. SELECT * FROM Employee WHERE age >= 30;

b. SELECT * WHERE age >= 30

c. SELECT * FROM employee WHERE age >= 30;

d. SELECT * WHERE age >=30 FROM employee;

5. Given two tables representing cars and their owners:

cars: id, model, owner_id
owners: id, name

How would you display a list of all cars and their owners, including cars that don’t have owners?

a. SELECT * FROM owners LEFT JOIN cars ON owners.id = cars.owner_id;

b. SELECT * FROM cars, owners WHERE cars.owner_id = owners.id;

c. SELECT * FROM cars JOIN owners ON owners.id = cars.owner_id;

d. SELECT * FROM cars LEFT JOIN owners ON owners.id = cars.owner_id;

6. Given a table of cars and a table recalled models, which of the following uses a subselect correctly to find cars where the model has been recalled since January 1, 2010?

cars: id, model, owner_name
recalls: id, model, recall_date

a. SELECT * FROM cars LEFT JOIN recalls ON (recalls.model=cars.model) WHERE (recall_date >= ‘2010-01-01’);

b. SELECT * FROM cars WHERE model IN (SELECT * FROM recalls WHERE recall_date >= ‘2010-01-01’);

c. SELECT * FROM cars WHERE model IN (SELECT model FROM recalls WHERE recall_date >= ‘2010-01-01’);

d. SELECT * FROM cars WHERE model IN (SELECT model FROM recalls) WHERE recall_date >= ‘2010-01-01’;

7. Your employee table has a column for last name and office location. How would you produce a report of people sorted first by office and then by last name?

a. SELECT * FROM employee ORDER BY office_location, last_name;

b. SELECT * FROM employee ORDER BY last_name GROUP BY office_location;

c. SELECT * FROM employee ORDER BY last_name, office_location;

d. SELECT * FROM employee ORDER BY office_location GROUP BY last_name;

8. Your employee table has a column called years_of_service that tracks how many years each employee has worked with the company. It is year end and you’ve been asked to increase this number by one for every employee. Which command does this?

a. INSERT INTO employee SET years_of_service=years_of_service+1;

b. UPDATE employee SET years_of_service=years_of_service+1;

c. UPDATE years_of_service SET years_of_service+1;

d. UPDATE employee(years_of_service) += 1;

Foundation Topics

Database Basics

Most databases follow a client-server model where an application, called the client, connects over the network to a database server that stores the data. The client issues requests to the server and reads the response back over the network. Sometimes the client and the server are on the same machine but are separate entities and must still talk over some kind of connection.

Sometimes the client is an application, such as something graphical running on a user’s desktop. Other times a web server issues the requests when a browser asks and renders web pages based on the data returned from the database. Another possibility is that a user connects to the database and issues queries directly, perhaps importing into a spreadsheet or other analytical tool.

Types of Databases

Data differs based on the application, so no one database solution fits all. However, most databases fall into one of three camps: key-value, relational, and schemaless.

Key-Value Databases

A key-value database acts much like a dictionary. The keys are the words and the values are the definitions. If you know the key you can quickly find the value associated with it. Figure 13-1 shows what a key-value database could look like. The values can be simple words, a data structure that the client can understand, or a binary object like a picture.

Image

Figure 13-1 A key-value database

Key-value databases can be so simple that they don’t even need to have a server running. They can be simple files on disk that a local client opens and reads. Tools like rpm and apt store their data in key-value databases because there is no overhead in running a server, and the application benefits from the speed of looking up keys and indexes that can make searches of the values themselves fast.

Image

An index helps a database find content quickly. If you wanted to search a key-value database by value, you would have to visit every key-value pair to find what you want, which is time consuming. An index provides a quick lookup of keys that contain a certain value. There is a performance penalty when writing to the database because each index must be updated, but the speed gain when querying usually makes up for this.

Examples of key-value databases that operate on local files are Berkeley DB and the original Unix DBM, which is the database manager. Network-aware servers that operate in a key-value format include Redis, Dynamo, and Riak.

Relational Databases

Relational databases are given their name because they deal with both data and the relationships between the data. A table stores information about a single entity such as a book with multiple columns storing different pieces of information such as the title, price, and date of first print. The database manages relationships between tables such as between a book and its authors. The database also provides a way to piece the tables together and perform queries, which is almost always the Structured Query Language (SQL).

Figure 13-2 shows a sample set of tables and their relationships.

Image

Figure 13-2 A set of tables and their relationships

Image

In Figure 13-2 a book contains columns such as the title, description, price, and first print date. The table also has an identifier. Each book forms a row in the table much like rows in a spreadsheet. Each row is uniquely identified by the ID, though the database is free to search for rows by column such as all books costing between $30 and $40.

Figure 13-2 also shows a table for invoice items, which would represent a sale of a particular book to a particular person. Each invoice item holds a reference to the book so that the database can look up which book was sold by relating the invoice item’s book_id to the book’s id. If a book were to be sold five times, the database would hold one row in the book table and five rows in the invoice items table. This is called a many-to-one relationship.

The author table is related to the book table with a many-to-many relationship. A book might have been written by several authors, and any author may have written multiple books.

SQL lets you run complicated reports on your data. A bookstore would be able to produce inventory reports, or to figure out which categories contribute the most to sales.

MySQL, PostgreSQL, Oracle, and SQLite are examples of relational databases. SQLite is notable in that it is an example of an embedded database: a SQL compliant database that runs as part of the application that uses it instead of connecting over a network to a server.

Schemaless Databases

The database schema is the description of all the tables in the database including the makeup of the columns and any relationships between the objects.

If you need to make a change to the schema, such as to add a middle name to the author table or track a different tax in a sale table, you need to modify every row even if they don’t need the tax or a middle name. In some databases, this involves downtime to lock the tables against change while the update is running.

A schemaless database does away with the notion of database enforced relationships and moves to document storage. A document is a free-form description of the entity. If an author doesn’t have a middle name, it doesn’t need one. If you want to track a new piece of data, you just start using it and the new documents will have it.

There are tradeoffs to all design decisions. Schemaless databases are not good for transactional data where you are tracking contents of bank accounts or sales receipts. They are good at huge databases, such as if you were running a large social network.

Schemaless databases are also called NoSQL databases because they typically don’t use the SQL language.

Examples of schemaless databases are MongoDB, ElasticSearch, and Couchbase.

Learning SQL

The Structured Query Language, SQL, is a language designed to work with relational databases. In SQL you have commands to create your database, query data, and insert, update, and remove data.

SQL is a standardized language, though each implementation tries to differentiate itself in some way. This book uses only standard commands that work on a variety of databases.

As a Linux systems administrator you may be called on to help people with their queries. Some companies employ a special role called a database administrator (DBA) to do this, but people are often afraid to talk to the DBA. Knowing a bit of SQL can come in handy when the DBA is not available or your company doesn’t have one.

Almost every web-based monitoring tool uses a database in the backend, and a bit of SQL knowledge helps you work with these applications and even be able to run custom reports on your own data.

If you have a SQL database installed, such as MySQL, you can use it for these exercises. Otherwise, we suggest you install SQLite.

Using SQLite

SQL is an embedded database rather than a network-aware database. It’s designed to give applications a way to build in a SQL database without the overhead of running a separate server. SQLite is used on everything from Android phones to most web browsers.

Check to see whether your system has the sqlite3 command by typing which sqlite3. If you get back a path to the binary, you have it. If not, install it with apt-get install sqlite3 for Debian and Ubuntu systems or yum install sqlite on Red Hat and Fedora.

A SQLite database is a single file. Any SQLite client can open this file and manipulate it by either using the shared libraries or the command line client.

Run sqlite3 with the path to a database file to get started. If the file doesn’t exist, it is created.

$ sqlite3 newdatabase
SQLite version 3.7.13 2012-07-17 17:46:21
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite>

The database used for this book can be found in the downloads section at the book’s home page: http://www.pearsonitcertification.com/title/9780789754554. You can download this file and have some sample data to play with.

SQL Basics

SQL is a programming language and it is therefore important to be precise when writing code to get the results you want. It is unlike other programming languages in that you rarely use loops and iterators. Instead you are writing expressions to filter and manipulate data and then looking at that data.

Image

A semicolon (;) terminates each statement in SQL. Commands may span multiple lines and often do for the sake of readability. If you forget to terminate your command, the interpreter thinks you’re going to type in some more commands.

Image

Comments start with two dashes (--) and can appear anywhere on a line except within quotes. The following two examples show comments in both styles:

SELECT * -- I want all the columns
FROM t1; -- t1 holds sales data

And

-- t1 holds sales data and it's important to get all columns
SELECT *
FROM t1;

Image

The SQL commands themselves are case-insensitive, but table names and data strings are case-sensitive. Therefore the following statements are all the same:

INSERT INTO name VALUES('Sean');
insert into name values('Sean');
InSeRt INto name VALues('Sean');

However, values and data are case sensitive, so these are all different:

insert into name values('SEAN'); -- The name will be stored all in
capitals
insert into name values('sean'); -- The name will be stored in
lowercase
insert into NAME values('Sean'); -- The table is called name, not NAME

Keywords Versus Data

In Bash a line often starts with some kind of command and is then followed by some data, which is either a string or a variable or both. SQL commands start with a keyword that is followed by a series of clauses to modify that keyword. For example, you might have a keyword to indicate that you want to read data from the database, and then some clauses to filter and sort the data before displaying it. While SQL does have variables they’re not used as often, and the keywords and data are mixed together.

A SQL command might look like this:

SELECT age
FROM person
WHERE name='Sean' AND age > 20
LIMIT 15

In this example the capitalized words are all SQL keywords. The word age refers to a column name and person is the name of a table. SQL has some particular rules about when to use double and single quotes. Single quotes are for user-entered data, and double quotes are for table and column names. Unless the table or column name contains spaces or symbols, the double quotes are optional.

In the preceding example age, person, and name are database artifacts and are either left bare or double quoted. The string being searched for in the name column is single quoted because it’s user-entered data.

The numbers are also left bare because they’re integers. The query is asking for ages greater than 20 and limiting the results to 15 rows.

Image

Note that the word Sean is single quoted, indicating it’s a user-entered string. If it were not quoted, it would be referring to a column. WHERE last_name = maiden_name is an example of a filter that matches rows where the values of the last_name and maiden_name columns are identical.

If you have some database experience, you may have come across situations where different quoting rules work. Many databases exhibit nonstandard behavior that can be surprising as you move between systems. In particular, MySQL and SQLite allow you to use double and single quotes for strings interchangeably, where a stricter database such as PostegreSQL would flag the use of double quotes for strings as an error. Just remember: Double quotes or nothing at all for column and table names, and single quotes for data you enter.

Selecting Data

The SELECT statement is the command you use to extract information from your database. The simplest way to use it is to ask for a single string:

sqlite> SELECT 'Hello, world!';
Hello, world!

You can also run expressions:

sqlite> SELECT 1 + 1;
2

It would be far more interesting to get information out of a table. This requires the FROM keyword to indicate the name of the table. The FROM clause happens after the SELECT clause.

sqlite> SELECT * FROM book;
id title year author_id
---------- ----------------------------------- ---------- ----------
1 LPIC 1 Exam Cram 2 2004 2
2 Linux and Windows 2000 Integration 2001 2
3 Wireless All In One For Dummies 2009 1
4 Check Point CCSA Exam Cram 2 2005 1

Image

SELECT * asks for all the available columns, and FROM book indicates that the database should search the book table. In the preceding example, the table has two rows with identifiers of 1 and 2, respectively.

Naming things is hard. Should the table be called book or books? Tables are usually named after the singular version of the entity they contain. In this case each row is a book; therefore, the table is called book. If you’re working in a database that already uses the plural form, then stay with that.

You can get more specific columns by asking for them individually:

sqlite> SELECT title, year FROM book;
title year
----------------------------------- ----------
LPIC 1 Exam Cram 2 2004
Linux and Windows 2000 Integration 2001
Wireless All In One For Dummies 2009
Check Point CCSA Exam Cram 2 2005

Here the query was only for two columns: title and year.

Being Choosy

Tables may have thousands or millions of records and looking at them all would become impossible. A WHERE clause lets you filter results based on one or more conditions. This clause comes after the FROM clause.

sqlite> SELECT year, title FROM book WHERE year >= 2005;
year title
---------- -----------------------------------
2009 Wireless All In One For Dummies
2005 Check Point CCSA Exam Cram 2

Here the WHERE clause asks for anything where the year column is greater than or equal to 2005. Two books are returned. The book written in 2005 is included because the query uses >=, which means greater than or equal to. A > by itself would mean greater than 2005 and would have only included the one row.

As the year is a bare word, it is referring to a column name. If it were single quoted, it would be referring to the literal string year. No column satisfies the condition that the word year is the same as the number 2005, so no rows would be returned.

Suppose that the book table also tracked the year the book was started in a column called written. A query could check for books started in the year before they were published by looking for rows where written is not the same as the published year.

sqlite> SELECT title, year, written FROM book WHERE year <> written;
title year written
----------------------------------- ---------- ----------
Linux and Windows 2000 Integration 2001 2000
Check Point CCSA Exam Cram 2 2005 2004

In this example, the WHERE clause compares two columns and only returns those where they are not equal using the <> operator.

Image

The basic numeric comparison operators are

= - equals

<> - not equals

< - less than

<= - less than or equal to

> - greater than

>= - greater than or equal to

Searching ranges is possible through the BETWEEN operator, which takes two numbers and matches values between the two:

sqlite> SELECT title, year FROM book WHERE year BETWEEN 2004 AND 2005;
title year
----------------------------------- ----------
LPIC 1 Exam Cram 2 2004
Check Point CCSA Exam Cram 2 2005

Image

You can also search for specific numbers in a set, such as to find books published in 2001 or 2005 using the IN keyword. This keyword expects a list of values to check, enclosed in parentheses.

sqlite> SELECT title, year FROM book WHERE year IN (2001, 2005);
title year
----------------------------------- ----------
Linux and Windows 2000 Integration 2001
Check Point CCSA Exam Cram 2 2005

Finally you can perform a substring match on a column with the LIKE operator. While the equality operator, =, only matches exact strings, LIKE lets you look for strings within the column by using one of two metacharacters:

Image _ matches a single character.

Image % matches zero or more characters.

LIKE behaves the same as = if there are no metacharacters. The following two statements have the same output:

SELECT * FROM author WHERE first_name = "Sean";
SELECT * FROM author WHERE first_name LIKE "Sean";

The statements are the same because they both only find rows where the first_name column contains Sean and nothing else.

To find all the Exam Cram 2 books you can ask the database for anything ending in Exam Cram 2.

sqlite> SELECT title FROM book WHERE title LIKE '% Exam Cram 2';
title
-----------------------------------
LPIC 1 Exam Cram 2
Check Point CCSA Exam Cram 2

Multiple Conditions

You’re not limited to one question in your WHERE clause. Conditions may be put together with AND and OR. An AND requires that both conditions are true, an OR requires that only one is true. So you can find all the Exam Cram 2 books published in 2005 or later, or any book where the writing started in 2000:

sqlite> SELECT title, year
FROM book WHERE (title LIKE '%Exam Cram 2' AND year >= 2005)
OR written = 2000;
title year
----------------------------------- ----------
Linux and Windows 2000 Integration 2001
Check Point CCSA Exam Cram 2 2005

Image

If the precedence of ANDs and ORs is confusing, use parentheses to group the conditions. In the preceding case, since AND has higher precedence than OR, the parentheses are unnecessary but do add clarity.

Sorting

Sorting your results helps you to make more sense of the report. Rows normally come back in an indeterminate manner as the query gets more complicated. The ORDER BY clause comes toward the end of the query.

ORDER BY typically expects a column name, or set of column names, on which to sort:

sqlite> SELECT * FROM book ORDER BY written, year;
id title year author_id written
----- ----------------------------------- ---- --------- -------
2 Linux and Windows 2000 Integration 2001 2 2000
1 LPIC 1 Exam Cram 2 2004 2 2004
4 Check Point CCSA Exam Cram 2 2005 1 2004
3 Wireless All In One For Dummies 2009 1 2009

Image

In the previous example, the query first sorts by the year of writing, with the publishing year column used for a tie breaker. Sorts are performed in ascending order by default, which can be overridden with the ASC and DESC keywords:

sqlite> SELECT title from book ORDER by title DESC;
title
-----------------------------------
Wireless All In One For Dummies
LPIC 1 Exam Cram 2
Linux and Windows 2000 Integration
Check Point CCSA Exam Cram 2

The preceding result set was sorted by title in descending order, so the Wireless book is displayed before the Linux books.

Another form of ORDER BY refers to the column number rather than the name:

sqlite> SELECT title, year from book ORDER by 2 DESC;
title year
----------------------------------- ----
Wireless All In One For Dummies 2009
Check Point CCSA Exam Cram 2 2005
LPIC 1 Exam Cram 2 2004
Linux and Windows 2000 Integration 2001

Rather than ORDER BY year DESC the command orders by column 2, which is the publishing year.

Limiting Results

Even with a WHERE clause you may end up with more results than you need. You could be trying to find the top 10 most sold computer books. A WHERE clause would restrict your search to only computer books, and an ORDER BY sales DESC puts the highest seller at the top, but you’re still getting all the results.

Image

LIMIT returns a shortened list of results. LIMIT is the last clause in a query, so it comes after ORDER BY.

The last two books in the database to be published are found with this query:

Image

If you’re trying to perform some kind of pagination, where you show a certain number of results on one page and more on the next, the LIMIT keyword optionally takes an offset that goes before the number of results you want:

sqlite> SELECT title, year from book ORDER by 2 DESC LIMIT 2, 2;
title year
----------------------------------- ----
LPIC 1 Exam Cram 2 2004
Linux and Windows 2000 Integration 2001

Here, LIMIT 2, 2 asks for the database to skip two results and then show two results. For those of us who can’t remember whether the offset comes before the number of results or the other way around, the same clause can be written as LIMIT 2 OFFSET 2.

Working with Multiple Tables

All the commands so far have operated on a single table. A database has many tables because of a concept called normalization.

Consider the books we’ve been looking at so far. A book has a title, author, and a few other elements. But an author may write more than one book, and a book may have more than one author. (We conveniently ignore the book with multiple authors scenario.)

If the row containing the book also contains the author information, we have duplicated data:

Image

This is inefficient especially when you have to track other items about the book and author. If the author changes his address, you need to update each book. What if there are two authors with the same name? How much information would you need to uniquely identify them?

The solution is to split up books and authors into separate tables and link the two of them.

Authors

Image

Books

Image

Each book has a link to the author through the identifier column. Picking book ID 2, Wireless All In One For Dummies, the author_id is 1. ID 1 in the author table is Sean Walberg.

Writing Queries with Joins

A join is part of a query that tells the database how to match rows between two tables. Tables that are joined are also available to the other clauses such as WHERE, ORDER BY, and SELECT.

Image

A join is performed with the JOIN keyword, which comes after the FROM clause. The query is built from left to right, so you need to join something onto the table you specified in the FROM clause.

In the simple book database you can obtain a list of books and their authors:

sqlite> SELECT * FROM author JOIN book ON book.author_id = author.id;
id first_name last_name id title year author_id written
-- ---------- --------- -- -------------------- ---- --------- -------
2 Ross Brunson 1 LPIC 1 Exam Cram 2 2004 2 2004
2 Ross Brunson 2 Linux and Windows 20 2001 2 2000
1 Sean Walberg 3 Wireless All In One 2009 1 2009
1 Sean Walberg 4 Check Point CCSA Exa 2005 1 2004

The key here is the JOIN book ON book.author_id = author.id. The query already has author through the FROM clause and this joins in book based on the condition that the book’s author_id is the same as the author’s id.

The condition specified after the ON keyword is important. Think of the two tables as sitting next to each other and you’ve asked the database to join them together. The database goes down the rows of one table and tries to make any matches it can to the other table.

Image

The database doesn’t know how to match up the rows if you don’t give it conditions. It will match up every row on the left with every row on the right:

sqlite> select * from author join book;
id first_name last_name id title year author_id written
-- ---------- --------- -- ------------------- ---- --------- -------
1 Sean Walberg 1 LPIC 1 Exam Cram 2 2004 2 2004
1 Sean Walberg 2 Linux and Windows 20 2001 2 2000
1 Sean Walberg 3 Wireless All In One 2009 1 2009
1 Sean Walberg 4 Check Point CCSA Exa 2005 1 2004
2 Ross Brunson 1 LPIC 1 Exam Cram 2 2004 2 2004
2 Ross Brunson 2 Linux and Windows 20 2001 2 2000
2 Ross Brunson 3 Wireless All In One 2009 1 2009
2 Ross Brunson 4 Check Point CCSA Exa 2005 1 2004

There are eight rows in the result, including rows indicating the wrong author wrote the book. Eight rows comes from two authors times four books for eight different author-book combinations. The ON clause helps the database to match up the rows so that an author is linked to all his books and not someone else’s.

Cleaning Up the Query

Looking carefully at the headers of the preceding results you can see that there are two columns called id. This is because the joined table contains an id column from both author and book.

The join clause used in the preceding example always mentions the name of the table when talking about the column. book.author_id = author.id means the author_id from book and the id from author. If we forget the names of the tables, the database has problems figuring out where the id column comes from, as there are two:

sqlite> SELECT * FROM author JOIN book ON author_id=id;
Error: ambiguous column name: id

Image

Typing in column names can be exhausting, so SQL lets you alias tables within the query with the AS keyword. The AS is used right after the first mention of the table name and is directly followed by the alias.

The query used so far can be rewritten as follows:

sqlite> SELECT * FROM author AS a JOIN book AS b ON b.author_id = a.id;
id first_name last_name id title year author_id written
-- ---------- --------- -- ---------------------- ---- ------- -------
2 Ross Brunson 1 LPIC 1 Exam Cram 2 2004 2 2004
2 Ross Brunson 2 Linux and Windows 2000 In 2001 2 2000
1 Sean Walberg 3 Wireless All In One For D 2009 1 2009
1 Sean Walberg 4 Check Point CCSA Exam Cra 2005 1 2004

Here the author table was renamed to a, the book table was renamed to b, and the JOIN was able to make use of the shorter names. The output is otherwise identical.

The AS is optional, so the query can be shortened even further:

SELECT * FROM author a JOIN book b ON b.author_id = a.id;

Be careful when aliasing columns to give it a name that makes sense. Also, as you alias column names you have to use that name for the rest of the query so it will also appear in any WHERE or GROUP BY clauses.

Advanced Joins

The data earlier was ideal: Every book had an author and every author had a book. But the real world is not so ideal. A book may have no sales, or the author isn’t in the system.

The query used in the previous section is called an inner join. It requires that each result row have a row from both the left and right tables; otherwise, the row doesn’t make it to the result.

Consider a new author table where there’s an author who hasn’t written any books:

sqlite> SELECT * FROM author;
id first_name last_name
--- ---------- ----------
1 Sean Walberg
2 Ross Brunson
3 Ada Lovelace
sqlite> SELECT * FROM book WHERE author_id = 3;
sqlite>

The report on books and authors doesn’t even mention Ms. Lovelace!

sqlite> SELECT first_name, last_name, title FROM author
JOIN book ON author_id=author.id;
first_name last_name title
---------- --------------- -----------------------------------
Ross Brunson LPIC 1 Exam Cram 2
Ross Brunson Linux and Windows 2000 Integration
Sean Walberg Wireless All In One For Dummies
Sean Walberg Check Point CCSA Exam Cram 2

Image

This is because the properties of an inner join require rows to match on both the left and right for something to appear in the result.

In terms of a Venn diagram, see Figure 13-3.

Image

Figure 13-3 A Venn diagram of an INNER JOIN

The result is the intersection of the left and right sides.

Getting all the authors even if they haven’t written any books requires a left join. Expressed as a Venn diagram, the left join and its partner the right join look like that shown in Figure 13-4.

Image

Figure 13-4 A Venn diagram of a LEFT JOIN and RIGHT JOIN

Image

The left join includes all elements from the left and anything from the right where there’s a match. If there’s no match on the right, the missing data is filled in with a special NULL value.

sqlite> SELECT first_name, last_name, title
FROM author
LEFT JOIN book ON author_id=author.id;
first_name last_name title
---------- --------------- -----------------------------------
Sean Walberg Check Point CCSA Exam Cram 2
Sean Walberg Wireless All In One For Dummies
Ross Brunson LPIC 1 Exam Cram 2
Ross Brunson Linux and Windows 2000 Integration
Ada Lovelace

The output is the same as before except that Ada Lovelace has an empty title next to her name.

Left Versus Right Joins

It is difficult to know when to use a left or right join. First you should visualize your query. Which rows have already been added to the query through the FROM clause and any previous JOINs, and which are you joining on? That’s the left and the right, respectively. Next, which side do you want all the rows included?

Most of the time the answer is going to be a left join only because that’s the way queries are often constructed. In the preceding example, the query started from the author table not the book table.

Null

When you query for authors and books and get back an author who hasn’t written a book, the database needs to fill in that blank spot somehow. Similarly, a field that has no value, such as the publication date of an unpublished book, needs some kind of placeholder. This placeholder is called aNULL.

sqlite> SELECT first_name, last_name, title
FROM author
LEFT JOIN book ON author_id=author.id
WHERE author.id = 3;
first_name last_name title
---------- --------------- -----------------------------------
Ada Lovelace

The NULL in the preceding example is in the title column. Some database systems explicitly write the word NULL and some leave a blank.

Image

NULL is more than an empty value. NULL means “information is missing.” Therefore you can’t compare anything to NULL:

sqlite> SELECT 1=1;
1=1
----------
1
sqlite> SELECT NULL=NULL;
NULL=NULL
----------

If you can’t compare null to anything, how can you write queries like one to find the authors with no books? It would be reasonable to think that this would work:

sqlite> SELECT first_name, last_name, title
FROM author
LEFT JOIN book ON author_id=author.id
WHERE title = NULL;

However, this won’t work because even if the title were NULL it can’t be compared to NULL. SQL implements a keyword named IS NULL for this case, with a corresponding IS NOT NULL to look for values that are not NULL.

sqlite> SELECT first_name, last_name, title
FROM author
LEFT JOIN book ON author_id=author.id
WHERE title IS NULL;
first_name last_name title
---------- --------------- -----------------------------------
Ada Lovelace

When writing queries, especially with joins, take care to make sure you’re handling NULL correctly.

Subselects

Subselects are another form of join. Instead of one large query, you’re inserting the results of a second query, called the child query, into the original, or parent query. This is similar to the command substitution in shell scripts you learned about in Chapter 12, “Shell Scripting.”

As a simple example, one can retrieve the author IDs where the author’s name is Sean with a simple query:

sqlite> SELECT id FROM author WHERE first_name = "Sean";
id
-----------------------------------
1

Those results can be injected into another query, such as to find all the books written by that author:

sqlite> SELECT title FROM book
WHERE author_id
IN (SELECT id FROM author WHERE first_name = "Sean");
title
-----------------------------------
Wireless All In One For Dummies
Check Point CCSA Exam Cram 2

Image

The subselects are enclosed in parentheses and usually given as an argument to a WHERE column IN clause. The subselect must return only one column; otherwise, you might get an error like this:

Error: only a single result allowed for a SELECT that is part of an
expression

Grouping Data

So far you’ve seen how to manipulate several tables into one larger table, and to sort and filter the resulting data. The SQL GROUP BY command lets you roll up rows based on similar columns and to perform calculations.

Image

GROUP BY expects a comma-separated list of columns. Rows that have the same values in those columns are rolled up and counted by aggregate functions in the SELECT clause. This query provides a list of authors and the number of books in the database:

sqlite> SELECT first_name, last_name, COUNT(title) AS books
FROM author
LEFT JOIN book ON author_id=author.id
GROUP BY first_name, last_name;
first_name last_name books
---------- --------------- -----
Ada Lovelace 0
Ross Brunson 2
Sean Walberg 2

The elements from the previous queries in this chapter are still there. The query selects some columns from author and joins in book with a left join. The difference is the GROUP BY clause and the COUNT(title).

The grouping is what rolls up all rows with similar authors into one. COUNT(title) is an aggregate function that asks the database for the number of non-NULL titles in each grouping.

Image

There are more aggregate functions, depending on your database:

Image AVG(column)—Returns the average of the column (NULLs are removed.)

Image COUNT(column)—Counts the number of non-NULL instances of column, or total rows if * is used instead of a column name

Image MIN(column)—Returns the minimum non-NULL value in the column

Image MAX(column)—Returns the maximum non-NULL value in the column

Image SUM(column)—Adds up all the non-NULL values in the column

Inserting Data

The INSERT command is the primary way to get new rows into your tables. The format of the INSERT command is

INSERT INTO tablename (columns) VALUES (values);

So to insert a new book:

sqlite> INSERT INTO book (title, year, author_id)
VALUES ('Sketch of the Analytical Engine', 1842, 3);
sqlite> SELECT * FROM book WHERE author_id = 3;
id title year author_id written
----- ------------------------------- ----- --------- --------
5 Sketch of the Analytical Engine 1842 3

In the preceding example, the command is telling the database to expect three pieces of data, which will go into the title, year, and author_id columns of the book table.

Notice that the id was set to 5 even though the value wasn’t specified. This is because that column is the primary key of the table and is filled in by the database automatically if it’s not provided.

The written column was unspecified, so it’s left as NULL to indicate the value is unknown.

Image

An alternate form of INSERT is to specify values for all the columns, which eliminates the need to tell the database which columns you will use.

sqlite> INSERT INTO book VALUES (NULL, "LPIC-1/CompTIA Linux+ Cert
Guide", NULL, 3, 2015);

As there are two unknown columns, the primary key and the publication year, those are entered in as NULL. The database adds its own primary key.

If you have many rows to insert you can add them all in one command:

INSERT INTO book (title) VALUES ('title1'), ('title2');

Updating Data

Once data is in your database you may need to make changes to existing rows. People change their names, books change their prices, and relationships between different entities change.

Image

The UPDATE command changes existing rows. The format of the command is

UPDATE table SET column1=value1, column2=value2 WHERE conditions.

To set a publication year for the book entered in the last section:

sqlite> UPDATE book SET year=2015 WHERE id=6;

It’s also possible to set a value based on another column, such as if you want to set the year the book was written to the same as the publication year if it’s unspecified:

sqlite> UPDATE book SET written=year WHERE written IS NULL;

This works because of the quoting rules discussed earlier. Year is a bare word; therefore, it refers to a column name. If it were single quoted, it would try to set the value of the column to the word year. This would fail because the column is set to only accept integers.

It’s important to get the command right on the first try, especially the WHERE clause. There’s no undo button!

Deleting Data

Deleting data is similar to querying data as you’re deleting the results of a query:

sqlite> DELETE FROM book WHERE author_id IS NULL;

This query deletes any book where the author_id is missing.

Just like the UPDATE, be careful! If you forget the WHERE clause, all your rows will be deleted.

Creating Tables

Unless you’re a developer, there’s a good chance you’ll never have to create a table. However, for completeness, tables are created with the CREATE TABLE command. The general form of this command is

CREATE TABLE tablename (
ColumnName1 type1 options1,
ColumnName2 type2 options2
);

Where the ColumnNames are the names you give the column, the type describes what is stored in the column, and the options are some optional adjustments to what is stored in the table.

Common data types are

Image int—An integer

Image smallint, mediumint, bigint—Integer types with various limits

Image character, varchar—Character strings, which require a length to be passed such as varchar(255)

Image text—A longer string or document that is larger than a database dependent limitation

Image blob—A binary object such as a picture

Image float—A floating point number

Image decimalA number with a decimal place stored in fixed precision format

Image Boolean—A true or false value

Image Date and datetime—Either a date or a date with a timestamp

The options give some more context to the column and allow for better data consistency:

Image NOT NULL—Do not allow NULL values to be stored.

Image PRIMARY KEY—This column is a primary key that is indexed and must be unique for each row.

Image UNIQUE—This column is not the primary key but is not allowed to have any duplicates.

The statements used to create the tables for this book are (with formatting added for clarity):

sqlite> .schema
CREATE TABLE author (
id integer primary key,
first_name varchar(255),
last_name varchar(255));
CREATE TABLE book (
id integer primary key,
title varchar(255),
year integer,
author_id integer,
written integer);

Summary

Databases are used to store information for easier retrieval and reporting. A relational database is one where various entities are stored in separate tables, and the database provides the Structured Query Language (SQL) with which to query and change the data.

SQL involves a keyword, such as SELECT, INSERT, UPDATE, or DELETE, followed by a series of clauses to provide additional information. For example, you might SELECT some data and filter it with a WHERE clause and aggregate it with GROUP BY.

Exam Preparation Tasks

As mentioned in the section “How to Use This Book” in the Introduction, you have a couple of choices for exam preparation: the exercises here, Chapter 21, “Final Preparation,” and the practice exams on the DVD.

Review All Key Topics

Review the most important topics in this chapter, noted with the Key Topics icon in the outer margin of the page. Table 13-2 lists a reference of these key topics and the page numbers on which each is found.

Image

Image

Image

Table 13-2 Key Topics for Chapter 13

Define Key Terms

Define the following key terms from this chapter and check your answers in the glossary:

client-server

key-value database

indexes

relational database

many-to-one relationship

many-to-many relationship

embedded database

database schema

normalization

inner join

left join

null value

aggregate function

primary key

table

Structured Query Language

Review Questions

The answers to these review questions are in Appendix A.

1. A table with rows and columns can be found in a(n):

a. Flat file database

b. Key-value database

c. Relational database

d. Schemaless database

2. Statements in SQL are separated by:

a. ;

b. A blank line

c. A new line

d. .

3. You have a table containing employee data, and one of the fields is years_of_service. How would you find everyone who has worked for at least 5 years, but no more than 20?

a. SELECT * FROM employee WHERE years_of_service > 5 AND years_of_service < 20;

b. SELECT * FROM employee where years_of_service BETWEEN 5, 20;

c. SELECT * FROM employee where years_of_service BETWEEN 5 AND 21;

d. SELECT * FROM employee where years_of_service BETWEEN 5 AND 20;

4. Given a table for managers and employees, how would you produce a report of the managers with their employees, ignoring managers with no employees?

a. SELECT * FROM managers, employees;

b. SELECT * FROM managers AS m JOIN employees AS ee ON (ee.manager_id = m.id);

c. SELECT * FROM managers AS m LEFT JOIN employees AS ee ON (ee.manager_id = m.id);

d. SELECT * FROM managers JOIN employees(id=manager_id);

5. Consider this query:

SELECT *
FROM products
LEFT JOIN sales on products.name = sales.product_name

Which of the following are true? (Choose two.)

a. All products will be included even if they have no sales.

b. All sales will be included even if the product is not missing.

c. The sales table contains the product’s name in the product_name column.

d. This query will calculate total sales per product.

6. Which of the following queries properly use table aliases?

a. SELECT * FROM employee WHERE employee AS ee.hire_date > ‘2010-01-01’ AND ee.status IS NOT NULL.

b. SELECT * FROM employee WHERE employee.hire_date > ‘2010-01-01’ AND ee.status IS NOT NULL.

c. SELECT * FROM employee AS ee WHERE ee.hire_date > ‘2010-01-01’ AND employee.status IS NOT NULL.

d. SELECT * FROM employee ee WHERE ee.hire_date > ‘2010-01-01’ AND ee.status IS NOT NULL.

7. A NULL in a column means:

a. The value is missing.

b. It is the same as zero.

c. It is the same as a blank string.

d. The column will match anything.

8. You have an employee table that includes a job description and a salary column. How do you calculate the average salary per position using SQL?

a. SELECT *, SUM(salary) FROM employee GROUP BY description;

b. SELECT description, SUM(salary) FROM employee;

c. SELECT description, SUM(salary) FROM employee GROUP BY description;

d. SELECT description, SUM(salary) FROM employee GROUP BY salary;

9. Which of the following statements is correct about the following SQL statement:

INSERT INTO employee (name, age) VALUES ('Alice', 30), ('Bob', 32);

a. The values have the rows and columns mixed; it should be (‘Alice’, ‘Bob’), (30, 22);.

b. The statement needs to specify all the columns in employee.

c. The statement needs to be split into two.

d. The statement inserts two rows.

10. Your employee table tracks years of service, but there are many employees where the years are not known and have a NULL in that column. How could you change everyone in this situation to have 20 years of experience?

a. UPDATE employee SET years_of_service = 20 WHERE years_of_service = NULL;

b. UPDATE employee SET years_of_service = 20 WHERE years_of_service IS NULL;

c. UPDATE employee SET years_of_service = 20;

d. UPDATE employee.years_of_service = 20 WHERE years_of_service IS NULL;