MySQL Basics - Introduction and Tutorials - MySQL in a Nutshell (2008)

MySQL in a Nutshell (2008)

Part I. Introduction and Tutorials

Chapter 3. MySQL Basics

Although the bulk of this new edition of MySQL in a Nutshell contains reference information, which you can read in small segments as needed, this chapter presents a basic MySQL tutorial. It explains how to log in to the MySQL server through the mysql client, create a database, create tables within a database, and enter and manipulate data in tables.

This tutorial does not cover MySQL in depth. Instead, it’s more of a sampler; it’s meant to show you what’s possible and to get you thinking about how to approach tasks in MySQL.

The mysql Client

There are various methods of interacting with the MySQL server to develop or work with a MySQL database. The most basic interface that you can use is the mysql client. With it, you can interact with the server from either the command line or within an interface environment.

If MySQL was installed properly on your server, mysql should be available for use. If not, see Chapter 2. On Unix-based systems, you can type whereis mysql. Windows, Macintosh, and other GUI-type systems have a program location utility for finding a program. If you used the default installation method, the mysql program probably resides at /usr/local/mysql/bin/mysql. On Unix systems, if /usr/local/mysql/bin/ is in your default path (the PATH environment variable), you can specify mysql without the full pathname. If the directory is not in your path, you can add it by entering:

PATH=$PATH:/usr/local/mysql/bin

export PATH

Assuming that everything is working, you will need a MySQL username and password. If you’re not the administrator, you must obtain these from her. If MySQL was just installed and the root password is not set yet, its password is blank. To learn how to set the root password and to create new users and grant them privileges, see Chapter 2 for starting points and Chapter 4 for more advanced details.

From a shell prompt, log in to MySQL like this:

mysql -h host -u user -p

If you’re logging in locally—that is, from the server itself—either physically or through a remote login method, such as SSH (secure shell), you can omit the -h host argument. This is because the default host is localhost, which refers to the system you are on. In other circumstances, where your commands actually have to travel over a network to reach the server, replace the argument host with either a hostname that is translatable to an IP address or the actual IP address of the MySQL server. You should replace the argument user with your MySQL username. This is not necessarily the same as your filesystem username.

The -p option instructs mysql to prompt you for a password. You can also add the password to the end of the -p option (e.g., enter -prover where rover is the password); if you do this, leave no space between -p and the password. However, entering the password on the command line is not a good security practice, because it displays the password on the screen and transmits the password as clear text through the network, as well as making it visible whenever somebody gets a list of processes running on the server.

When you’re finished working on the MySQL server, to exit mysql, type quit or exit, and press the Enter key.

Creating a Database and Tables

Assuming that you have all of the privileges necessary to create and modify databases on your server, let’s look at how to create a database and then tables within a database. For the examples in this chapter, we will build a database for a fictitious bookstore:

CREATE DATABASE bookstore;

In this brief SQL statement, we have created a database called bookstore. You may have noticed that the commands or reserved words are printed here in uppercase letters. This isn’t necessary; MySQL is case-insensitive with regard to reserved words for SQL statements and clauses. Database and table names are case-sensitive on operating systems that are case-sensitive, such as Linux systems, but not on systems that are case-insensitive, such as Windows. As a general convention, though, reserved words in SQL documentation are presented in uppercase letters and database names, table names, and column names in lowercase letters. You may have also noticed that the SQL statement shown ends with a semicolon. An SQL statement may be entered over more than one line, and it’s not until the semicolon is entered that the client sends the statement to the server to read and process it. To cancel an SQL statement once it’s started, enter \c instead of a semicolon.

With our database created, albeit an empty one, we can switch the default database for the session to the new database like this:

USE bookstore

This saves us from having to specify the database name in every SQL statement. MySQL by default will assume the current database, the one we last told it to use. No semicolon is given with the USE statement because it’s a client-based SQL statement.

Next, we will create our first table, in which we will later add data. We’ll start by creating a table that we’ll use to enter basic information about books, because that’s at the core of a bookstore’s business:

CREATE TABLE books (

book_id INT,

title VARCHAR(50),

author VARCHAR(50));

This SQL statement creates the table books with three columns. Note that the entire list of columns is contained within parentheses.

The first column is a simple identification number for each record, which represents one book. You can specify the data type either as INTEGER or as INT like the example. The second and third columns consist of character fields of variable width, up to 50 characters each.

To see the results of the table we just created, enter a DESCRIBE statement, which displays a table as output:

DESCRIBE books;

+---------+-------------+------+-----+---------+-------+

| Field | Type | Null | Key | Default | Extra |

+---------+-------------+------+-----+---------+-------+

| book_id | int(11) | YES | | NULL | |

| title | varchar(50) | YES | | NULL | |

| author | varchar(50) | YES | | NULL | |

+---------+-------------+------+-----+---------+-------+

Considering our bookstore a bit more, we realize that we need to add a few more columns for data elements: publisher, publication year, ISBN number, genre (e.g., novel, poetry, drama), description of book, etc. We also realize that we want MySQL to automatically assign a number to thebook_id column so that we don’t have to bother creating one for each row or worry about duplicates. Additionally, we’ve decided to change the author column from the actual author’s name to an identification number that we’ll join to a separate table containing a list of authors. This will reduce typing, and will make sorting and searching easier, as the data will be uniform. To make these alterations to the table that we’ve already created, enter the following SQL statement:

ALTER TABLE books

CHANGE COLUMN book_id book_id INT AUTO_INCREMENT PRIMARY KEY,

CHANGE COLUMN author author_id INT,

ADD COLUMN description TEXT,

ADD COLUMN genre ENUM('novel','poetry','drama'),

ADD COLUMN publisher_id INT,

ADD COLUMN pub_year VARCHAR(4),

ADD COLUMN isbn VARCHAR(20);

After the opening line of this SQL statement, notice that each clause in which we change or add a column is separated from the following one by a comma. On the second line here, we’re changing the book_id column. Even though we are keeping the column name and the data type the same, we have to restate them. We’re adding the AUTO_INCREMENT flag, which carries out the task mentioned in the previous paragraph, assigning a unique and arbitrary value to each book in the table. We’re also making the column the PRIMARY KEY for indexing, which allows faster data retrieval.

The first CHANGE clause may look confusing because it lists the column name (book_id) twice. This makes sense when you understand the syntax of a CHANGE clause: the first book_id names the existing column you want to change, and the rest of the clause specifies the entire new column. To understand this better, examine the second CHANGE clause: it replaces the existing author column with a new author_id column. There will no longer be a column named author.

In the third line, we’re changing the author column so that its label and data type align with the authors table that we’ll create later. The authors table will have an indexed column to represent the author, just as the books table has an indexed column to represent the books. To figure out which author the author_id column in the books table is pointing to, we’ll join the books table to the authors table in queries. Because the corresponding column in the authors table will have a data type of INT, so must this one.

The fourth line adds a column for each book’s description. This has a data type of TEXT, which is a variable-length data type that can hold very large amounts of data, up to 64 kilobytes. There are other factors, though, that can limit a TEXT column further. See Appendix A for a list of data types, their limits, and other limiting factors.

For genre, we’re enumerating a list of possible values to ensure uniformity. A blank value and a NULL value are also possible, although they’re not specified.

Before moving on to adding data to our books table, let’s quickly set up the authors table. This table will be what is known as a reference table. We need to enter data into the authors table, because when we enter data into the books table, we will need to know the identification number for the authors of the books:

CREATE TABLE authors

(author_id INT AUTO_INCREMENT PRIMARY KEY,

author_last VARCHAR(50),

author_first VARCHAR(50),

country VARCHAR(50));

This table doesn’t require too many columns, although we might add other columns to it for an actual bookstore. As mentioned before, as needed, we’ll join the books table to the authors table through the author_id in both tables.

In the authors table, we’ve separated the first and last name of each author into two columns so that we can easily sort and search on the last name. We’ve also added a column for the author’s country of origin so that we can search for works by authors of a particular country when asked by customers. For production use, it might be better to use a country code and then have yet another reference table listing the full names of countries. But we’re trying to keep this tutorial simple and include detail only when it has educational value.

Show Me

Let’s take a moment to admire our work and see what we’ve done so far. To get a list of databases, use the SHOW DATABASES statement:

SHOW DATABASES;

+-----------+

| Database |

+-----------+

| bookstore |

| mysql |

| test |

+-----------+

The result of the SHOW DATABASES statement lists not only the database we’ve created, but also two others. One is the mysql database, which contains data about user privileges and was covered in Chapter 2. The third database is the test database, which is set up by default when MySQL is installed. It’s there as a convenience for you to be able to add tables or run SQL statements for testing.

To see a list of tables in the bookstore database, once we select the bookstore database with the USE statement shown earlier, we would enter the following statement:

SHOW TABLES;

+---------------------+

| Tables_in_bookstore |

+---------------------+

| authors |

| books |

+---------------------+

The result of the SHOW TABLES statement provides a list containing our two tables, just as we expected. If you want to see a list of tables from another database while still using the bookstore database, add a FROM clause to the previous statement:

SHOW TABLES FROM mysql;

This displays a list of tables from the mysql database, even though the default database for the client session is the bookstore database.

Inserting Data

Now that we’ve set up our first two tables, let’s look at how we can add data to them. We’ll start with the simplest method: the INSERT statement. With the INSERT statement we can add one or more records. Before adding information about a book to our books table, because it refers to a field in our authors table, we need to add the author’s information to the latter. We’ll do this by entering these SQL statements through the mysql client:

INSERT INTO authors

(author_last, author_first, country)

VALUES('Greene','Graham','United Kingdom');

SELECT LAST_INSERT_ID( );

+-------------------+

| LAST_INSERT_ID( ) |

+-------------------+

| 1 |

+-------------------+

INSERT INTO books

(title, author_id, isbn, genre, pub_year)

VALUES('The End of the Affair', 1,'0099478447','novel','1951');

Our first SQL statement added a record, or row, for Graham Greene, an author who wrote the book The End of the Affair. The standard INSERT syntax names the columns for which the values are to be inserted, as we’re doing here. If you’re going to enter values for all of the columns, you don’t need to name the columns, but you must list the data in the same order in which the columns are listed in the table.

In the second SQL statement, we retrieved the identification number assigned to the row we just entered for the author by using the LAST_INSERT_ID() function. We could just as easily have entered SELECT author_id FROM authors;.

In the third SQL statement, we added data for a Graham Greene book. In that statement, we listed the columns in an order that’s different from their order in the table. That’s acceptable to MySQL; we just have to be sure that our values are in the same order.

Selecting Data

Now that we have one row of data in each of our two tables, let’s run some queries. We’ll use the SELECT statement to select the data that we want. To get all of the columns and rows from the books table, enter the following:

SELECT * FROM books;

The asterisk, which acts as a wildcard, selects all columns. We did not specify any criteria by which specific rows are selected, so all rows are displayed from the books table. To select specific columns, we name the columns we want. To select specific rows, we add a WHERE clause to the end of the SELECT statement:

SELECT book_id, title, description

FROM books

WHERE genre = 'novel';

This SQL statement displays just the book’s identification number, the book’s title, and the description of the book from the books table for all books where the genre column has a value of novel. The results will be more meaningful, of course, when we have data on more books in the database. So, let’s assume that we’ve entered data for a few dozen more books, and proceed.

If we want to get a list of novels from the database along with the author’s full name, we need to join the books table to the authors table. We can join the two tables with a JOIN clause like this:

SELECT book_id, title, pub_year,

CONCAT(author_first, ' ', author_last) AS author

FROM books

JOIN authors USING(author_id)

WHERE author_last = 'Greene';

In the FROM clause, we join the books table to the authors table using the author_id columns in both tables. If the columns had different labels, we would have to use a different clause or method in the JOIN clause to join the tables (e.g., ON (author_id = writer_id)). Notice in the second line of this SQL statement that we’ve employed a string function, CONCAT(). With this function you can take bits of data and merge them together with text to form more desirable-looking output. In this case, we’re taking the author’s first name and pasting a space (in quotes) onto the end of it, followed by the author’s last name. The results will appear in the output display as one column, which we’ve given a column heading of author. The keyword AS creates this column title with our chosen name, called an alias.

In the WHERE clause, we’ve specified that we want data on books written by authors with the last name Greene. If the books table did not contain books by Greene, nothing would be displayed. The results of the previous query are as follows:

+---------+-----------------------+----------+---------------+

| book_id | title | pub_year | author |

+---------+-----------------------+----------+---------------+

| 1 | The End of the Affair | 1951 | Graham Greene |

| 2 | Brighton Rock | 1937 | Graham Greene |

+---------+-----------------------+----------+---------------+

As you can see, a second book by Graham Greene was found and both have been displayed. The column heading was changed for the output of the author’s name per the AS clause. We could change the column headings in the display for the other columns with the keyword AS as well. Theauthor alias can be reused in a SELECT statement, but not in the WHERE clause, unfortunately. You can find more information on AS in Chapter 6.

Ordering, Limiting, and Grouping

For times when we retrieve a long list of data, it can be tidier to sort the data output in a specific order. To do this, we can use the ORDER BY clause. Suppose that we want a list of plays written by William Shakespeare from our database. We could enter the following SQL statement to retrieve such a list and to sort the data by the play title:

SELECT book_id, title, publisher

FROM books

JOIN authors USING(author_id)

JOIN publishers USING(publisher_id)

WHERE author_last = 'Shakespeare'

AND genre = 'play'

ORDER BY title, pub_year;

The ORDER BY clause comes at the end, after the WHERE clause. Here the ORDER BY clause orders the data results first by the title column and then, within title, by the pub_year column, or the year that the particular printing of the play was published. By default, data is sorted in ascending alphanumeric order. If we want to order the results in descending order for the titles, we can just add a DESC flag immediately after the title column in the ORDER BY clause and before the comma that precedes pub_year:

...

ORDER BY title DESC, pub_year;

A large bookstore will have many editions of Shakespeare’s plays, possibly a few different printings for each play. If we want to limit the number of records displayed, we could add a LIMIT clause to the end of the previous SQL statement:

SELECT book_id, title, publisher

FROM books

JOIN authors USING(author_id)

JOIN publishers USING(publisher_id)

WHERE author_last = 'Shakespeare'

AND genre = 'play'

ORDER BY title DESC, pub_year

LIMIT 20;

This addition limits the number of rows displayed to the first 20. The count starts from the first row of the result set after the data has been ordered according to the ORDER BY clause. If we want to retrieve the next 10, we would adjust the LIMIT clause to specify the number of rows to skip, along with the number of records to retrieve. So if we want to skip the first 20 rows and list the next 10 rows from our sort, we replace the LIMIT clause in the SQL statement with this one:

...

LIMIT 20, 10;

As you can see, in a two-argument LIMIT clause, the first argument specifies the number of rows to skip or the point to begin (i.e., 20) and the second argument states the number of rows to display (i.e., 10).

If we want to get just a list of titles by Shakespeare, and we are not concerned with which printing or publisher—that is to say, if we want one row for each title and are satisfied with the first row found for each—we could use the GROUP BY clause like this:

SELECT book_id, title

FROM books

JOIN authors USING(author_id)

WHERE author_last = 'Shakespeare'

GROUP BY title;

The result of this SQL statement is a list of titles by Shakespeare from the database, displaying the record identification number of the first one found for each title.

Analyzing and Manipulating Data

With MySQL you can not only retrieve raw data, but also analyze and format the data retrieved. For instance, suppose we want to know how many titles we stock by Leo Tolstoy. We could enter a SELECT statement containing a COUNT() function like this:

SELECT COUNT(*)

FROM books

JOIN authors USING(author_id)

WHERE author_last = 'Tolstoy';

+----------+

| COUNT(*) |

+----------+

| 12 |

+----------+

As another example, suppose that after setting up our database and putting it to use we have another table called orders that contains information on customer orders. We can query that table to find the total sales of a particular book. For instance, to find the total revenues generated from, say, William Boyd’s book Armadillo, we would enter the following SQL statement in the mysql client:

SELECT SUM(sale_amount) AS 'Armadillo Sales'

FROM orders

JOIN books USING(book_id)

JOIN authors USING(author_id)

WHERE title = 'Armadillo'

AND author_last = 'Boyd';

+-----------------+

| Armadillo Sales |

+-----------------+

| 250.25 |

+-----------------+

Here we are joining three tables together to retrieve the desired information. MySQL selects the value of the sale_amount column from each row in the orders table that matches the criteria of the WHERE clause. Then it adds those numbers and displays the sum with the column heading given.

For columns that contain date or time information, we can decide on the format for displaying the data using a variety of functions. For instance, suppose that we want to extract from the orders table the date that a customer made a particular purchase, based on his receipt number (e.g., 1250), which in turn is the record identification number, or sale_id. We could simply enter the following statement and get the default format as shown in the last line of results:

SELECT purchase_date AS 'Purchase Date'

FROM orders

WHERE sale_id = '1250';

+---------------+

| Purchase Date |

+---------------+

| 2004-03-01 |

+---------------+

This format (year-month-day) is understandable. However, if we want the month displayed in English rather than numerically, we have to use some date functions:

SELECT CONCAT(MONTHNAME(purchase_date), ' ',

DAYOFMONTH(purchase_date), ', ',

YEAR(purchase_date)) AS 'Purchase Date'

FROM orders

WHERE sale_id = '1250';

+---------------+

| Purchase Date |

+---------------+

| March 1, 2004 |

+---------------+

To put the date together in the typical human-readable format used in the United States, we’re using the CONCAT() function in conjunction with a few date functions. It may be a little confusing at first glance, because we’re inserting a space between the month and the day at the end of the first line and a comma and a space after the day at the end of the second line. As for the date functions, the first one extracts the month from the purchase_date column and formats it so its full name is displayed. The second date function on the second line extracts just the day, after which we explicitly specify a comma. The third date function on the third line extracts just the year.

As you can see in the results, our combination of functions works. However, it’s not the cleanest method by which the date can be assembled. We could use the DATE_FORMAT() function instead:

SELECT DATE_FORMAT(purchase_date, "%M %d, %Y")

AS 'Purchase Date'

FROM orders

WHERE sale_id = '1250';

This is a much more efficient method, and it provides the same output as the previous statement. You just have to know the formatting codes to be able to use this function properly. They’re listed in Chapter 12, along with several more formatting codes and many more date and time functions.

Changing Data

You can change data in a table using a few different methods. The most basic and perhaps the most common method is to use the UPDATE statement. With this statement, you can change data for all rows or for specific records based on a WHERE clause.

Looking back on the results displayed from an earlier query, we can see that Graham Greene’s book Brighton Rock has a copyright year of 1937. That’s not correct; it should be 1938. To change or update that bit of information, we would enter the following SQL statement:

UPDATE books

SET pub_year = '1938'

WHERE book_id = '2';

Query OK, 1 row affected (0.00 sec)

Rows matched: 1 Changed: 1 Warnings: 0

First, we state the name of the table that’s being updated. Next, we include the SET keyword with the column to change and its corresponding new value. If we wanted to change the values of more than one column, we would provide a comma-separated list of each column along with the equals sign operator and the new respective values. SET is given only once.

The preceding SQL statement has a WHERE clause limiting the rows that will be updated by specifying a condition the row must meet. In this case, our condition is for a specific value of a unique column, so only one row will be changed. The results of the query show that one row was affected, one row was matched, one row was changed, and there were no problems to generate warnings.

Sometimes inserting data into a table will cause a duplicate row to be created because a row for the data already exists. For instance, suppose that we want to run an SQL statement that inserts data on a few books into the books table and one of the books is already in the table. If we useINSERT, a duplicate row will generally be rejected. To prevent this, we can use the REPLACE statement, which inserts rows that are new and replaces existing rows with new data.

From MySQL’s perspective, duplicates occur only when columns defined as unique contain the same value. Because the book_id column is assigned automatically, it’s unlikely that we would duplicate it, because we wouldn’t tend to assign its value when adding records. What’s unique about each book in the book business is its ISBN number, which is the bar code number on the back of the book. To ensure that we do not have rows with the same ISBN number, we’ll alter our books table again and change the isbn column to a UNIQUE column, a column that requires a unique value. This way we won’t be able to enter data inadvertently on a book more than once:

ALTER TABLE books

CHANGE COLUMN isbn isbn VARCHAR(20) UNIQUE;

Now we’re ready to insert data for more books without worrying about duplicate rows for books with the same ISBN number. Here is an example in which we attempt to add two more books by Graham Greene, one of which is already in the table:

REPLACE INTO books

(title, author_id, isbn, genre, pub_year)

VALUES('Brighton Rock',1,'0099478471','novel','1938'),

('The Quiet American',1,'0099478393','novel','1955');

The syntax for the REPLACE statement is the same as the INSERT statement. Notice that we’ve added two rows here in one statement. This is the same syntax that you would use if you want to add more than one row using INSERT. Just list each row’s data within parentheses and separate them by commas, as shown. In this example, there is already a row for the book containing the ISBN number 0099478471 (i.e., Brighton Rock), so its data will be replaced and a new row will not be added. There is currently no row for Greene’s book The Quiet American, though, so it will be added.

Deleting Data

To delete specific rows of data, you can use the DELETE statement. For example, if we want to delete all rows of data from our books table for the author J. K. Rowling, because we’ve decided not to carry Harry Potter books, we could issue the following statement:

DELETE FROM books

WHERE author_id =

(SELECT authors.author_id FROM authors

WHERE author_last = 'Rowling'

AND author_first = 'J. K.');

DELETE FROM authors

WHERE author_last = 'Rowling'

AND author_first = 'J. K.';

Here, we’re deleting only rows from the books table where the author identification number is whatever is selected from the authors table based on the specified author’s last name and first name. That is to say, the author_id must be whatever value is returned by the SELECT statement, the subquery contained in the parentheses. This statement involves a subquery, so it requires version 4.1 or later of MySQL. To delete these same rows with an earlier version of MySQL, you would need to run the SELECT statement shown in parentheses here separately (not as a subquery), make note of the author’s identification number, and then run the first DELETE statement, manually entering the identification number at the end instead of the parenthetical SELECT statement shown.

An alternative to the previous SQL statements would be to utilize user-defined variables. Here is the same example using variables:

SET @potter =

(SELECT author_id FROM authors

WHERE author_last = 'Rowling'

AND author_first = 'J. K.');

DELETE FROM books

WHERE author_id = @potter;

DELETE FROM authors

WHERE author_id = @potter;

In the first part, we use the SET statement to establish a variable called @potter that will contain the results of the SELECT statement that follows in parentheses, another subquery. Incidentally, although this subquery is not available before version 4.1, user-defined variables are. The second SQL statement deletes the rows from books where the author identification number matches the value of the temporary variable. Next, we delete the data from the authors table, still making use of the variable. A user-defined variable will last until it’s reset or until the MySQL session is closed.

Searching Data

Once our database is loaded with large amounts of data, it can be cumbersome to locate data simply by scrolling through the results of SELECT statements. Also, sometimes we don’t have the exact or complete text for a column we’re examining. For these situations, we can use the LIKEoperator. Suppose that our books table now has thousands of entries. Suppose further that a customer says he’s looking for a specific book. He can’t remember the name of the author or the title of the book, but he does remember that the words traveler and winter are in the title. We could enter this SQL statement to search the database based on this minimal information:

SELECT book_id, title,

CONCAT(author_first, ' ', author_last) AS author

FROM books

JOIN authors USING(author_id)

WHERE title LIKE '%traveler%'

AND title LIKE '%winter%';

+---------+-----------------------------------+---------------+

| book_id | title | author |

+---------+-----------------------------------+---------------+

| 1400 | If on a winter's night a traveler | Italo Calvino |

+---------+-----------------------------------+---------------+

With the LIKE operator, we use the percent sign wildcard twice to indicate that we’re searching for all rows in which the title column’s data contains the string traveler with zero or more characters before it (the preceding percent sign), and zero or more characters after it (the terminating percent sign). Put another way, the word traveler must be contained somewhere in the column’s data to have a pattern match. The next part of the clause indicates that winter must also be found in the same column. Incidentally, the LIKE keyword is an operator like the equals sign.

If another customer asks us to search the database for a Graham Greene book with either the word Stamboul or the word Orient in the title, we could use OR within an expression like this:

SELECT book_id, title

FROM books

WHERE author_id = 1

AND title LIKE '%Stamboul%'

OR author_id = 1

AND title LIKE '%Orient%';

Since we already know the author’s identification number, this statement is more succinct and includes only one table. Notice that we have to specify the author_id in each expression; otherwise we might get results by other authors that match the words for which we’re searching. For more information on operators, see Appendix B. You can find more examples and possibilities for searching data in Chapter 6.

Importing Data in Bulk

Often, when setting up a new database, you will need to migrate data from an old database to MySQL. In the case of our bookstore, let’s suppose that a vendor has sent us a disk with a list of all of their books in a simple text file. Each record for each book is on a separate line, and each field of each record is separated by a vertical bar. Here’s what the fictitious vendor’s data text file looks like:

ISBN|TITLE|AUTHOR LAST|AUTHOR FIRST|COPYRIGHT DATE|

067973452X|Notes from Underground|Dostoevsky|Fyodor|August 1994|

...

Obviously, an actual vendor file would contain more fields and records than are shown here, but this is enough for our example. The first line contains descriptions of the fields in the records that follow. We don’t need to extract the first line; it’s just instructions for us. So, we’ll tell MySQL to ignore it when we enter our SQL statement.

As for the data, we must consider a few problems. First, the fields are not in the order that they are found in our tables. We’ll have to tell MySQL the order in which the data will be coming so that it can make adjustments. The other problem is that this text table contains data for both ourbooks table and our authors table. This is going to be a bit tricky, but we can deal with it. What we’ll do is extract the author information only in one SQL statement, then we’ll run a separate SQL statement to import the book information. Before starting, we’ve copied the vendor’s file called books.txt to a temporary directory (e.g., /tmp). Here we run a LOAD DATA INFILE statement from the mysql client:

LOAD DATA INFILE '/tmp/books.txt' REPLACE INTO TABLE authors

FIELDS TERMINATED BY '|' LINES TERMINATED BY '\r\n'

TEXT_FIELDS(col1, col2, col3, col4, col5)

SET author_last = col3, author_first = col4

IGNORE col1, col2, col5, 1 LINES;

First, I should point out that the TEXT_FIELDS and the IGNORE clause for columns are not available before version 4.1 of MySQL. The IGNORE n LINES clause has been around for a while, though. With IGNORE 1 LINES, the first line of the text file containing the column headings will be ignored. Going back to the first line in this SQL statement, we’ve named the file to load and the table in which to load the data. The REPLACE flag has the effect of the REPLACE statement mentioned earlier. Of course, since the name fields aren’t set to unique, there won’t be any duplicates as far as MySQL is concerned. In a real-life situation, you would have to alter your table to prevent duplicates based on the author’s name.

In the second line, we specify that fields are terminated by a vertical bar and that lines are terminated by a carriage return (\r) and a newline (\n). This is the format for an MS-DOS text file. Unix files have only a newline to terminate the line.

In the third line of the SQL statement, we create aliases for each column. In the fourth line, we name the table columns to receive data and set their values based on the aliases given in the previous line. In the final line, we tell MySQL to ignore the columns that we don’t want, as well as the top line, because it doesn’t contain data.

If you’re using an older version of MySQL that isn’t able to ignore unwanted columns, you will have to perform a couple of extra steps. There are a few different ways of doing this. One simple way, if the table into which we’re loading data isn’t too large, is to add three extra, temporary columns to authors that will take in the unwanted fields of data from the text file and drop them later. This would look like the following:

ALTER TABLE authors

ADD COLUMN col1 VARCHAR(50),

ADD COLUMN col2 VARCHAR(50),

ADD COLUMN col5 VARCHAR(50);

LOAD DATA INFILE '/tmp/books.txt' REPLACE INTO TABLE authors

FIELDS TERMINATED BY '|' LINES TERMINATED BY '\r\n'

IGNORE 1 LINES

(col1, col2, author_last, author_first, col5);

ALTER TABLE authors

DROP COLUMN col1,

DROP COLUMN col2,

DROP COLUMN col5;

These statements will work, but they’re not as graceful as the more straightforward statement shown earlier. In the second SQL statement here, notice that the IGNORE clause specifies one line to be ignored. The last line of the same statement lists the columns in the authors table that are to receive the data and the sequence in which they will be imported. In the third SQL statement, having finished importing the data from the vendor’s text file, we now delete the temporary columns with their unnecessary data by using a DROP statement. There’s usually no recourse from DROP, no undo. So take care in using it.

Once we manage to copy the list of authors into the authors table from the text file, we need to load the data for the books and find the correct author_id for each book. We do this through the following:

LOAD DATA INFILE '/tmp/books.txt' IGNORE INTO TABLE books

FIELDS TERMINATED BY '|' LINES TERMINATED BY '\r\n'

TEXT_FIELDS(col1, col2, col3, col4, col5)

SET isbn = col1, title = col2,

pub_year = RIGHT(col5, 4),

author_id =

(SELECT author_id

WHERE author_last = col3

AND author_first = col4)

IGNORE col3, col4, 1 LINES;

In this SQL statement, we’ve added a couple of twists to get what we need. On the fifth line, to extract the year from the copyright field—which contains both the month and the year—we use the string function RIGHT(). It captures the last four characters of col5 as specified in the second argument.

The sixth line starts a subquery that determines the author_id based on data from the authors table, where the author’s last and first names match what is found in the respective aliases. The results of the column selected within the parentheses will be written to the author_id column.

Finally, we’re having MySQL ignore col3 and col4, as well as the column heading line. The IGNORE flag on the first line instructs MySQL to ignore error messages, not to replace any duplicate rows, and to continue executing the SQL statement. Doing this maneuver with earlier versions of MySQL will require temporary columns or a temporary table along the lines of the previous example. Actually, using a temporary table is still a prudent method for staging data. After you’ve verified it, you can execute an INSERT...SELECT statement (see Chapter 6).

Command-Line Interface

It’s not necessary to open the mysql interface to enter SQL statements into the MySQL server. In fact, sometimes you may have only a quick query to make in MySQL, and you’d rather just do it from the shell or command line. For instance, suppose we have a table called vendors in our database, and we want to get a quick list of vendors in Louisiana and their telephone numbers. We could enter the following from the command line in Linux (or an equivalent operating system) to get this list:

mysql --user='paola' --password='caporale1017' \

-e "SELECT vendor, telephone FROM vendors \

WHERE state='LA'" bookstore

We’re still using the mysql client, but we’re not entering the interface. As shown earlier, we provide the username paola and her password caporale1017 as arguments to the command. This line ends with a backslash to let the Unix shell know that there are more parameters to come. Otherwise, we would need to put all of the information shown on one line.

On the second line, we use the -e option to indicate that what follows it in double quotes is to be executed by the mysql client. Notice that what’s in double quotes is the same SQL statement with the same syntax as what we would enter if we were logged in to the interface. The syntax doesn’t change because we’re entering the SQL statement from the command line. We don’t need a terminating semicolon, though, because the mysql client knows where the SQL statement ends.

Finally, after the SQL statement, we provide the name of the database to be used. We could eliminate this argument by adding the database name before the table name, separated by a dot (i.e., bookstore.vendors).

There are other command-line options with the mysql client. There are also other command-line utilities available for accessing and manipulating data in MySQL. You can use some of these utilities for backing up the database or for performing server maintenance and tuning. They are covered in Chapters 15 and 16.

Conclusion

Obviously, you can do plenty more with MySQL. This tutorial was designed to give you an idea of how to create a database and manage the data in some very basic ways. The remaining sections of this book provide details on MySQL statements, clauses, arguments, options, and functions. If you’re new to MySQL, you can begin with the statements and clauses highlighted in this chapter, and refer to the chapters that follow for more options and to learn about other functions and features as needed.