Aggregate and Numeric Functions - Built-In Functions - Learning MySQL and MariaDB (2015)

Learning MySQL and MariaDB (2015)

Part IV. Built-In Functions

Chapter 12. Aggregate and Numeric Functions

Databases will always include numbers: there’s always something to value, count, or calculate. And you may want to round the results from those numbers to conform to personal preferences. There are numeric and arithmetic functions to do these things in MySQL and MariaDB. Some are known as aggregate functions. We will cover almost all of the aggregate and many numeric functions in this chapter. We won’t cover the more advanced functions related to statistics, or the mathematical functions related to calculus and geometry. Instead, we will cover the most useful and most used functions and leave the others for you to learn later on your own as you need them.

Aggregate Functions

Statistics can provide us with useful information about a database. If a database includes information about the activities of an organization, we can determine some statistical information about those activities. If a database includes numeric values associated with items an organization sells or tracks, statistics can provide us with information for making decisions about those items.

In our birdwatchers database, we can use aggregate functions to understand the behavior of our members in relation to our bird-watchers website, the events they attend, and other activities. For our rookery database, we can ascertain some information about birds using aggregate functions. That can be useful to our members related to searching for birds in the wild, as well as their concerns for the well-being of birds. We can ascertain information about where birds are seen in the wild by our members.

In this section, we will look at aggregate functions that will help us to determine this kind of information. In order to aggregate data together to calculate statistical values of sorts, we sometimes must use the GROUP BY clause. Some of the aggregate functions, such as the COUNT() function we’ve used in earlier chapters for counting rows in a table, do not require this clause, at least under certain conditions. We’ll start with COUNT() and then look at functions for simple statistics, such as determining an average for a set of numbers.

Counting Values

One of the simplest calculations we can do is to count. We learn it as children as an introduction to mathematics. So let’s start with counting, the COUNT() function.

Suppose we want to know how many birds are in the birds table. To do that, enter the folowing in mysql:

SELECT COUNT(*)

FROM birds;

+----------+

| COUNT(*) |

+----------+

| 28891 |

+----------+

Notice that we didn’t have to include the GROUP BY clause for this simple SQL statement. That’s because we wanted MySQL to count all of the rows in the table. We didn’t need GROUP BY because we didn’t want it to separate the rows into separate groups — there’s just one group here. Notice also that we’re giving COUNT() an asterisk as the argument. That’s a wildcard to tell MySQL that we want to count all of the rows found. Because we don’t have a WHERE clause, all of the rows will be selected.

Many of the bird species lack common names. So the common_name column in birds is blank for these species. COUNT() has a special convention: if you pass a column name instead of an asterisk as its argument, it counts only the columns that are not NULL. Let’s change that data and then see how it might look. Enter these two SQL statements:

UPDATE birds

SET common_name = NULL

WHERE common_name = '';

SELECT COUNT(common_name)

FROM birds;

+--------------------+

| COUNT(common_name) |

+--------------------+

| 9553 |

+--------------------+

That’s the number of birds with a common name in the table. We could have gotten the same results with a WHERE clause, and without having modified the data as we did. This lets us select only rows where the common_name does not equal ''. We’ve changed those values to NULL, though, so let’s use the WHERE clause to see how that would look based on NULL values. Enter the following:

SELECT COUNT(*) FROM birds

WHERE common_name IS NULL;

+----------+

| COUNT(*) |

+----------+

| 19338 |

+----------+

This gave us a different number. That’s because we’re counting the rows where the common_name is NULL — we used the operator IS NULL. Before, we counted the rows where the common_name was not NULL. We can count those with the WHERE clause like so:

SELECT COUNT(*) FROM birds

WHERE common_name IS NOT NULL;

+----------+

| COUNT(*) |

+----------+

| 9553 |

+----------+

That’s the answer we got before. It just required us to use the IS NOT NULL operator.

As useful as all of this may be, let’s get some more interesting results. Let’s count the number of birds within each family of birds. To do that, we have to use the GROUP BY clause. We’ll enter the following to get a count of the number of birds in each family:

SELECT COUNT(*)

FROM birds

GROUP BY family_id;

+----------+

| COUNT(*) |

+----------+

| 5 |

| 6 |

| 248 |

| 119 |

| 168 |

| 39 |

| 223 |

| ... |

+----------+

227 rows in set (0.15 sec)

In this example, we told MySQL to GROUP BY the family_id. So it sorted the rows by the family_id and counted the number of rows for each group. Because the results here would take up 227 rows, I’ve removed some of the results to save space. This SQL statement did what we asked, but it’s not very useful or interesting. It would be better to get the name of the bird families to go with these counts. To do this, we’ll have to use a JOIN to include the bird_families table. Here’s how we would do that:

SELECT bird_families.scientific_name AS 'Bird Family',

COUNT(*) AS 'Number of Species'

FROM birds JOIN bird_families USING(family_id)

GROUP BY birds.family_id;

+--------------------+-------------------+

| Bird Family | Number of Species |

+--------------------+-------------------+

| Gaviidae | 6 |

| Anatidae | 248 |

| Charadriidae | 119 |

| Laridae | 168 |

| Sternidae | 39 |

| Caprimulgidae | 223 |

| Sittidae | 92 |

| ... | |

+--------------------+-------------------+

225 rows in set (0.17 sec)

That’s nicer looking, and the results are more interesting. I’ve shortened the results again, but notice that we now have only 225 rows. That’s because we have some rows in the birds table in which the family_id is NULL. When using a database, watch for discrepancies like this; don’t ignore them just because you weren’t looking for problems. They can help you catch problems you overlooked.

Let’s modify the SELECT statement to show the number of rows in birds that do not have matching values in bird_families. We’ll do this with a LEFT JOIN (covered in Joining Tables, which included examples, but let’s apply that concept again here):

SELECT bird_families.scientific_name AS 'Bird Family',

COUNT(*) AS 'Number of Species'

FROM birds LEFT JOIN bird_families USING(family_id)

GROUP BY birds.family_id;

+--------------------+-------------------+

| Bird Family | Number of Species |

+--------------------+-------------------+

| NULL | 4 |

| NULL | 1 |

| Gaviidae | 6 |

| Anatidae | 248 |

| Charadriidae | 119 |

| Laridae | 168 |

| Sternidae | 39 |

| Caprimulgidae | 223 |

| Sittidae | 92 |

| ... | |

+--------------------+-------------------+

225 rows in set (0.17 sec)

Some of these rows may have a family_id of NULL, and one may have a family_id not contained in bird_families. To resolve this problem, we would run a SELECT to list rows where the bird_id is not included in bird_families. But this is getting away from learning about aggregate functions. Let’s assume that we’ve found the rows with missing data and fixed them so that we can move on.

In the results for the last two examples, you may have noticed that the names of the bird families are not listed alphabetically. That’s because GROUP BY orders rows based on the columns by which it is grouping (i.e., family_id). If we want to order the results based on the family name, thescientific_name in the bird_families table, we’d have to change the GROUP BY clause to group by that column. Try entering this:

SELECT bird_families.scientific_name AS 'Bird Family',

COUNT(*) AS 'Number of Species'

FROM birds LEFT JOIN bird_families USING(family_id)

GROUP BY bird_families.scientific_name;

+--------------------+-------------------+

| Bird Family | Number of Species |

+--------------------+-------------------+

| Acanthisittidae | 9 |

| Acanthizidae | 238 |

| Accipitridae | 481 |

| Acrocephalidae | 122 |

| Aegithalidae | 49 |

| Aegithinidae | 20 |

| Aegothelidae | 21 |

| Alaudidae | 447 |

| ... | |

+--------------------+-------------------+

That’s better. What would be nicer is if those results also showed the total number of birds at the bottom. We can get that from a separate SQL statement, but to get the total in the same results set, we would add WITH ROLLUP to the GROUP BY clause like so:

SELECT bird_families.scientific_name AS 'Bird Family',

COUNT(*) AS 'Number of Species'

FROM birds JOIN bird_families USING(family_id)

GROUP BY bird_families.scientific_name WITH ROLLUP;

+--------------------+-------------------+

| Bird Family | Number of Species |

+--------------------+-------------------+

| Acanthisittidae | 9 |

| Acanthizidae | 238 |

| Accipitridae | 481 |

| Acrocephalidae | 122 |

| Aegithalidae | 49 |

| Aegithinidae | 20 |

| Aegothelidae | 21 |

| Alaudidae | 447 |

| ... | |

| NULL | 28891 |

+--------------------+-------------------+

The total is on the last line and is equal to the count we did in the first example of this section. In the results here, the NULL value for the first field doesn’t refer to rows that don’t have a value for family_id. Instead, this is the total line. MySQL just doesn’t have a value to put in that field as a label, so it uses NULL. We can tweak that, though, to give it a label. While we’re doing that, let’s include counts by orders of birds. Enter the following:

SELECT IFNULL( bird_orders.scientific_name, '') AS 'Bird Order',

IFNULL( bird_families.scientific_name, 'Total:') AS 'Bird Family',

COUNT(*) AS 'Number of Species'

FROM birds

JOIN bird_families USING(family_id)

JOIN bird_orders USING(order_id)

GROUP BY bird_orders.scientific_name, bird_families.scientific_name

WITH ROLLUP;

+---------------------+--------------------+-------------------+

| Bird Order | Bird Family | Number of Species |

+---------------------+--------------------+-------------------+

| Anseriformes | Anhimidae | 3 |

| Anseriformes | Total: | 3 |

| Apodiformes | Apodidae | 316 |

| Apodiformes | Hemiprocnidae | 16 |

| Apodiformes | Trochilidae | 809 |

| Apodiformes | Total: | 1141 |

| Caprimulgiformes | Aegothelidae | 21 |

| Caprimulgiformes | Caprimulgidae | 224 |

| Caprimulgiformes | Nyctibiidae | 17 |

| Caprimulgiformes | Podargidae | 26 |

| ... | | |

| | Total: | 28890 |

+---------------------+--------------------+-------------------+

Besides adding another field to get the number of birds within an order of birds, we used the IFNULL() function to wrap the fields for the bird order counts and family counts. This function tells MySQL that if the value for the field will be NULL, it should be replaced with the value or string given — else it should return the count. Because the statement calculates first the primary totals (i.e., the totals for each family of birds), and then calculates the secondary totals (i.e., the totals for each order of birds), this works.

The results in the previous example aren’t marvelous, but you can easily use this method in conjunction with a script that will display these results on a web page. You can use an API to check for a value of Total: in the second field and then adjust for that. You could instead do these simple calculations in an API script, rather than have MySQL do them. However, sometimes it’s better to do calculations at the database system level. I have found often that better SQL statements make for tighter and easier to maintain API scripts. All right; enough of that. Let’s move on to more aggregate functions, besides just counting the number of rows.

Calculating a Group of Values

In Chapter 11, we created a new table, bird_identification_tests, for recording fun tests members could do online to try their skills at identifying birds. Suppose we want to tell the member how long it takes them on average to identify birds. A simple calculation would be to get the total time elapsed (i.e., subtracting id_end from id_start) for each row and then adding those differences together to get the sum of all rows. We would then divide that sum by the number of rows. To get the sum, we can use the SUM() function.

Before we jump too far ahead, though, let’s look at some entries for one of the members to remember and know what to do. We’ll use the TIMEDIFF() function to determine the difference between the time the test started and when it ended (covered in the section Comparing Dates and Times). Enter the following:

SELECT common_name AS 'Bird',

TIME_TO_SEC( TIMEDIFF(id_end, id_start) )

AS 'Seconds to Identify'

FROM bird_identification_tests

JOIN humans USING(human_id)

JOIN rookery.birds USING(bird_id)

WHERE name_first = 'Ricky' AND name_last = 'Adams';

+--------------------+---------------------+

| Bird | Seconds to Identify |

+--------------------+---------------------+

| Crested Shelduck | 81 |

| Moluccan Scrubfowl | 174 |

| Indian Pond-Heron | 181 |

+--------------------+---------------------+

Because we need the total number of seconds for each test in order to add the values together to get to an average, we used TIME_TO_SEC() to convert the results from TIMEDIFF() (e.g., to convert from 121, for 1 minute and 21 seconds, to 81 seconds). We did this extra step just to see how these values come more easily together with SUM() and to better understand the time functions in the following SQL statement:

SELECT CONCAT(name_first, SPACE(1), name_last)

AS 'Birdwatcher',

SUM(TIME_TO_SEC( TIMEDIFF(id_end, id_start) ) )

AS 'Total Seconds for Identifications'

FROM bird_identification_tests

JOIN humans USING(human_id)

JOIN rookery.birds USING(bird_id)

WHERE name_first = 'Ricky' AND name_last = 'Adams';

+-------------+-----------------------------------+

| Birdwatcher | Total Seconds for Identifications |

+-------------+-----------------------------------+

| Ricky Adams | 436 |

+-------------+-----------------------------------+

That gives us the correct number of seconds that Ricky Adams spent identifying three birds. Notice that this is another aggregate function that doesn’t require the GROUP BY clause. Now let’s change the SQL statement to calculate the average time (e.g., 426 seconds divided by 3 entries). To do this, we’ll use an absurdly complex and inefficient method. We’ll create a subquery to get each value to calculate the average. You don’t have to enter this one. Just look it over:

SELECT Identifications, Seconds,

(Seconds / Identifications) AS 'Avg. Seconds/Identification'

FROM

( SELECT human_id, COUNT(*) AS 'Identifications'

FROM bird_identification_tests

JOIN humans USING(human_id)

JOIN rookery.birds USING(bird_id)

WHERE name_first = 'Ricky' AND name_last = 'Adams')

AS row_count

JOIN

( SELECT human_id, CONCAT(name_first, SPACE(1), name_last)

AS 'Birdwatcher',

SUM(TIME_TO_SEC(TIMEDIFF(id_end, id_start)))

AS 'Seconds'

FROM bird_identification_tests

JOIN humans USING(human_id)

JOIN rookery.birds USING(bird_id) )

AS second_count

USING(human_id);

+-----------------+---------+-----------------------------+

| Identifications | Seconds | Avg. Seconds/Identification |

+-----------------+---------+-----------------------------+

| 3 | 436 | 145.3333 |

+-----------------+---------+-----------------------------+

That was a lot of work for something that should be simple — and it can be. Let’s change that to use AVG():

SELECT CONCAT(name_first, SPACE(1), name_last)

AS 'Birdwatcher',

AVG( TIME_TO_SEC( TIMEDIFF(id_end, id_start)) )

AS 'Avg. Seconds per Identification'

FROM bird_identification_tests

JOIN humans USING(human_id)

JOIN rookery.birds USING(bird_id)

WHERE name_first = 'Ricky' AND name_last = 'Adams';

+-------------+---------------------------------+

| Birdwatcher | Avg. Seconds per Identification |

+-------------+---------------------------------+

| Ricky Adams | 145.3333 |

+-------------+---------------------------------+

That was much easier, and without any subqueries. If we remove the WHERE clause, we would get the average time for all of the members. Let’s do that and change the formatting of the time to minutes and seconds, not just the average of total seconds. We’ll use SEC_TO_TIME() to do that, reversing the results of TIME_TO_SEC() now that we’ve calculated the average. Enter this on your server:

SELECT CONCAT(name_first, SPACE(1), name_last)

AS 'Birdwatcher',

COUNT(*) AS 'Birds',

TIME_FORMAT(

SEC_TO_TIME(AVG( TIME_TO_SEC( TIMEDIFF(id_end, id_start)))),

'%i:%s' )

AS 'Avg. Time'

FROM bird_identification_tests

JOIN humans USING(human_id)

JOIN rookery.birds USING(bird_id)

GROUP BY human_id LIMIT 3;

+---------------+-------+-----------+

| Birdwatcher | Birds | Avg. Time |

+---------------+-------+-----------+

| Rusty Osborne | 2 | 01:59 |

| Lexi Hollar | 3 | 00:23 |

| Ricky Adams | 3 | 02:25 |

+---------------+-------+-----------+

This time we included more members — but limited the results to three — and include the number of birds that each member identified. We also formatted the average time better. We can see that Ricky Adams took much longer on average than Lexi Hollar. It may be that Lexi is quicker or that Ricky was distracted when he was identifying birds.

Because we used the LIMIT clause, we can’t determine the longest and quickest average times from these results. To know that, we need to remove the LIMIT and then make the SQL statement a subquery of another in which we will add an ORDER BY clause. Essentially, the inner SELECT returns a list with each bird-watcher and their average time, which the outer SELECT puts in the order we want:

SELECT Birdwatcher, avg_time AS 'Avg. Time'

FROM

(SELECT CONCAT(name_first, SPACE(1), name_last) AS 'Birdwatcher',

COUNT(*) AS 'Birds',

TIME_FORMAT( SEC_TO_TIME( AVG(

TIME_TO_SEC( TIMEDIFF(id_end, id_start)))

),'%i:%s' ) AS 'avg_time'

FROM bird_identification_tests

JOIN humans USING(human_id)

JOIN rookery.birds USING(bird_id)

GROUP BY human_id) AS average_times

ORDER BY avg_time;

+-------------------+-----------+

| Birdwatcher | Avg. Time |

+-------------------+-----------+

| Lexi Hollar | 00:23 |

| Geoffrey Dyer | 00:25 |

| Katerina Smirnova | 00:48 |

| Rusty Osborne | 01:59 |

| Ricky Adams | 02:25 |

| Anahit Vanetsyan | 03:20 |

+-------------------+-----------+

Now we know that Lexi is the quickest and Anahit was the slowest. We had to use a subquery because you can’t generally put a GROUP BY and an ORDER BY clause in the same SQL statement. You have to do what we did here instead.

If we don’t want to know the names of who had the minimum average and who had the maximum, we could use the MAX() and MIN() functions. Let’s redo the previous SQL statement to include those aggregate functions. Try this on your server:

SELECT MIN(avg_time) AS 'Minimum Avg. Time',

MAX(avg_time) AS 'Maximum Avg. Time'

FROM humans

JOIN

(SELECT human_id, COUNT(*) AS 'Birds',

TIME_FORMAT(

SEC_TO_TIME( AVG(

TIME_TO_SEC( TIMEDIFF(id_end, id_start)))

), '%i:%s' ) AS 'avg_time'

FROM bird_identification_tests

JOIN humans USING(human_id)

JOIN rookery.birds USING(bird_id)

GROUP BY human_id ) AS average_times;

+-------------------+-------------------+

| Minimum Avg. Time | Maximum Avg. Time |

+-------------------+-------------------+

| 00:23 | 03:20 |

+-------------------+-------------------+

Comparing these results to the previous ones, we can see that they are correct. If we want to see the minimum and maximum time for each person, instead of the averages, we could do this:

SELECT CONCAT(name_first, SPACE(1), name_last) AS 'Birdwatcher',

TIME_FORMAT(SEC_TO_TIME(

MIN(TIME_TO_SEC( TIMEDIFF(id_end, id_start)))

),%i:%s' ) AS 'Minimum Time',

TIME_FORMAT(SEC_TO_TIME(

MAX(TIME_TO_SEC( TIMEDIFF(id_end, id_start)))

), '%i:%s' ) AS 'Maximum Time'

FROM bird_identification_tests

JOIN humans USING(human_id)

JOIN rookery.birds USING(bird_id)

GROUP BY Birdwatcher;

+-------------------+--------------+--------------+

| Birdwatcher | Minimum Time | Maximum Time |

+-------------------+--------------+--------------+

| Anahit Vanetsyan | 00:20 | 08:48 |

| Geoffrey Dyer | 00:09 | 00:42 |

| Katerina Smirnova | 00:22 | 01:02 |

| Lexi Hollar | 00:11 | 00:39 |

| Ricky Adams | 01:21 | 03:01 |

| Rusty Osborne | 01:50 | 02:08 |

+-------------------+--------------+--------------+

This shows an alphabetic list of members and each one’s minimum and maximum time to identify a bird. Essentially, once you group items by the bird-watcher, you can run aggregate functions such as AVG() and MAX() on them. We removed the field counting the number of identifications they made.

We could play with this more to see which birds take the longest to identify and which take the least amount of time. We could mark ones that are most difficult to identify for more advanced members. Some members may have a low average time if it were not for one bird that was particularly difficult to identify. For those entries, we could use the aggregate functions for more advanced statistical calculations to remove them, functions like STDDEV() and VARIANCE(). As a beginner, you probably won’t need to know them. Just know that they exist in case one day you do.

Before moving on, let’s look at one more example using MIN() and MAX(), an example that uses values other than time values. The bird_sightings table contains information on birds that our members saw in the field. It includes the GPS coordinates where each bird was seen: thelocation_gps column. This column contains two 11-digit numbers: the latitude and the longitude on the globe. Because birds tend to migrate between north and south, suppose we want to know the farthest north and south that birds were seen. We could use SUBSTRING() to extract the latitude, the MAX() function to determine which value is farthest north, and MIN() to determine which is the farthest south. We would do this like so:

SELECT common_name AS 'Bird',

MAX(SUBSTRING(location_gps, 1, 11)) AS 'Furthest North',

MIN(SUBSTRING(location_gps, 1, 11)) AS 'Furthest South'

FROM birdwatchers.bird_sightings

JOIN rookery.birds USING(bird_id)

WHERE location_gps IS NOT NULL

GROUP BY bird_id LIMIT 3;

+-----------------+----------------+----------------+

| Bird | Furthest North | Furthest South |

+-----------------+----------------+----------------+

| Eskimo Curlew | 66.16051056 | 66.16051056 |

| Whimbrel | 30.29138551 | 30.29138551 |

| Eurasian Curlew | 51.70469364 | 42.69096856 |

+-----------------+----------------+----------------+

In these results, because there was only one sighting of the first two birds, the values for both fields are the same. But for the Eurasian Curlew, you can see that it shows the farthest north and south that the bird was seen by our members.

Concatenating a Group

There is one more aggregate function that I want to cover before finishing with them. The GROUP_CONCAT() function is not used much, but it can be handy for particular situations. It’s used to concatenate together the values for a group into a comma-separated list. Without it, you would need to do a subquery and use CONCAT_WS() to concatenate the results of a field.

To list the bird families for a particular order of birds, we could issue a simple SELECT statement. Now suppose we want a list of bird orders and bird families together, but we want one of the fields in the results to contain all of the bird families for each bird order. That would be cumbersome to do without GROUP_CONCAT(). Let’s see what it can do for us, using this supposition. Enter the following on your server:

SELECT bird_orders.scientific_name AS 'Bird Order',

GROUP_CONCAT(bird_families.scientific_name)

AS 'Bird Families in Order'

FROM rookery.bird_families

JOIN rookery.bird_orders USING(order_id)

WHERE bird_orders.scientific_name = 'Charadriiformes'

GROUP BY order_id \G

*************************** 1. row ***************************

Bird Order: Charadriiformes

Bird Families in Order:

Charadriidae,Laridae,Sternidae,Burhinidae,Chionidae,Pluvianellidae,

Dromadidae,Haematopodidae,Ibidorhynchidae,Recurvirostridae,

Jacanidae,Scolopacidae,Turnicidae,Glareolidae,Pedionomidae,

Thinocoridae,Rostratulidae,Stercorariidae,Alcidae

I limited the results to one particular family to save space here. To get lists of orders for all families, just remove the WHERE clause:

SELECT bird_orders.scientific_name AS 'Bird Order',

GROUP_CONCAT(bird_families.scientific_name SEPARATOR ', ')

AS 'Bird Families in Order'

FROM rookery.bird_families

JOIN rookery.bird_orders USING(order_id)

GROUP BY order_id \G

If you tried that, you saw that the SEPARATOR clause of the GROUP_CONCAT() added a comma and a space after each family name.

Numeric Functions

Numeric functions are functions that change numbers in some way. They don’t do a calculation, per se. That would be arithmetic functions. Instead, they help you simplify the numeric result of a query. You might want to round a number up or down, or get the absolute value. These actions can be done easily with numeric functions. We’ll look at them in this section.

Rounding Numbers

Computers are very precise, so when we ask them to do a calculation, they will sometimes return a number with many decimal places. That may not matter to you, especially if the number is not displayed and used just by other functions for processing, either now or later. However, as humans, we tend to be more comfortable with rounded numbers. We’re usually not as precise as computers. To that end, there are a few numeric functions that may be used for rounding.

In Dynamic Columns, we created some tables with dynamic columns in MariaDB. These included surveys of members about their bird-watching preferences. Let’s use those tables and the data they contain to test some numeric functions. If you didn’t create those survey tables or if you aren’t using MariaDB, you won’t be able to participate in these examples.

To start, let’s look at one of the SQL statements we used in that section. We’ll run it again, but with more data from my site:

SELECT IFNULL(COLUMN_GET(choices, answer AS CHAR), 'total')

AS 'Birding Site', COUNT(*) AS 'Votes'

FROM survey_answers

JOIN survey_questions USING(question_id)

WHERE survey_id = 1

AND question_id = 1

GROUP BY answer WITH ROLLUP;

+--------------+-------+

| Birding Site | Votes |

+--------------+-------+

| forest | 30 |

| shore | 42 |

| backyard | 14 |

| total | 86 |

+--------------+-------+

This shows us the number of votes from members for the types of locations they prefer for watching birds. Let’s calculate the percentages to go with these results. To do this, we need first to count the number of votes for all of the choices. We could put that in a subquery, but let’s keep it simpler by executing a SELECT statement first to get that value. We’ll create a user-defined variable in which to temporarily store that number. A user variable is temporary and will last only for the current client session. It can be accessed only by the user that creates it. You would use the SETstatement to create a user variable. The variable name must start with @, followed by the equals sign, and then a value, an expression, or an SQL statement that will determine the value of the user-defined variable. Let’s create one now for our example. Enter the following on your MariaDB server:

SET @fav_site_total =

(SELECT COUNT(*)

FROM survey_answers

JOIN survey_questions USING(question_id)

WHERE survey_id = 1

AND question_id = 1);

SELECT @fav_site_total;

+-----------------+

| @fav_site_total |

+-----------------+

| 86 |

+-----------------+

Because I added plenty more rows to the survey_answers table, this result is now higher than previously. You’ll see that the total is correct in the results of the next example. Let’s use the variable we created as the denominator for calculating the percentage of votes for each choice:

SELECT COLUMN_GET(choices, answer AS CHAR)

AS 'Birding Site',

COUNT(*) AS 'Votes',

(COUNT(*) / @fav_site_total) AS 'Percent'

FROM survey_answers

JOIN survey_questions USING(question_id)

WHERE survey_id = 1

AND question_id = 1

GROUP BY answer;

+--------------+-------+---------+

| Birding Site | Votes | Percent |

+--------------+-------+---------+

| forest | 30 | 0.3488 |

| shore | 42 | 0.4884 |

| backyard | 14 | 0.1628 |

+--------------+-------+---------+

In this example, we’re dividing the number of votes for each choice by the variable containing the total number of votes. That gives us numbers with four decimal places. Let’s change those numbers to read as percentages by multiplying them by 100 and using the ROUND() function to get rid of the decimals. We’ll use CONCAT() to paste a percent sign to the end of the number:

SELECT COLUMN_GET(choices, answer AS CHAR)

AS 'Birding Site',

COUNT(*) AS 'Votes',

CONCAT( ROUND( (COUNT(*) / @fav_site_total) * 100), '%')

AS 'Percent'

FROM survey_answers

JOIN survey_questions USING(question_id)

WHERE survey_id = 1

AND question_id = 1

GROUP BY answer;

+--------------+-------+---------+

| Birding Site | Votes | Percent |

+--------------+-------+---------+

| forest | 30 | 35% |

| shore | 42 | 49% |

| backyard | 14 | 16% |

+--------------+-------+---------+

Notice that the ROUND() function rounded the first two numbers up and the last one down. That’s how rounding goes. Let’s change the results to show one decimal place:

SELECT COLUMN_GET(choices, answer AS CHAR)

AS 'Birding Site',

COUNT(*) AS 'Votes',

CONCAT( ROUND( (COUNT(*) / @fav_site_total) * 100, 1), '%') AS 'Percent'

FROM survey_answers

JOIN survey_questions USING(question_id)

WHERE survey_id = 1

AND question_id = 1

GROUP BY answer;

+--------------+-------+---------+

| Birding Site | Votes | Percent |

+--------------+-------+---------+

| forest | 30 | 34.9% |

| shore | 42 | 48.8% |

| backyard | 14 | 16.3% |

+--------------+-------+---------+

The ROUND() function rounded up and down to the first decimal place based on the true value, which includes multiple decimal places. Suppose we want to be conservative and round all values down, or all values up. For that, we need other functions.

Rounding Only Down or Up

To round only down, use the FLOOR() function. To round only up, use the CEILING() function. Let’s use the previous example to see how we would round down the results:

SELECT COLUMN_GET(choices, answer AS CHAR)

AS 'Birding Site',

COUNT(*) AS 'Votes',

CONCAT( FLOOR( (COUNT(*) / @fav_site_total) * 100), '%')

AS 'Percent'

FROM survey_answers

JOIN survey_questions USING(question_id)

WHERE survey_id = 1

AND question_id = 1

GROUP BY answer;

+--------------+-------+---------+

| Birding Site | Votes | Percent |

+--------------+-------+---------+

| forest | 30 | 34% |

| shore | 42 | 48% |

| backyard | 14 | 16% |

+--------------+-------+---------+

In this example, we replaced ROUND() with FLOOR() so that the results would be rounded down. The FLOOR() function doesn’t allow for specifying the number of decimal places. Instead, it rounds down to the integer value.

If we want to round only up, we would use the CEILING() function like so:

SELECT COLUMN_GET(choices, answer AS CHAR)

AS 'Birding Site',

COUNT(*) AS 'Votes',

CONCAT( CEILING( (COUNT(*) / @fav_site_total) * 100), '%') AS 'Percent'

FROM survey_answers

JOIN survey_questions USING(question_id)

WHERE survey_id = 1

AND question_id = 1

GROUP BY answer;

+--------------+-------+---------+

| Birding Site | Votes | Percent |

+--------------+-------+---------+

| forest | 30 | 35% |

| shore | 42 | 49% |

| backyard | 14 | 17% |

+--------------+-------+---------+

That rounded up all of the values. If a value has no decimal places, it wouldn’t change the value.

Truncating Numbers

If we don’t want to round a number up or down, but we just want to eliminate the extra decimal places, we can use TRUNCATE(). Let’s see how that looks with the same SQL statement we’ve been modifying:

SELECT COLUMN_GET(choices, answer AS CHAR)

AS 'Birding Site',

COUNT(*) AS 'Votes',

CONCAT( TRUNCATE( (COUNT(*) / @fav_site_total) * 100, 1), '%')

AS 'Percent'

FROM survey_answers

JOIN survey_questions USING(question_id)

WHERE survey_id = 1

AND question_id = 1

GROUP BY answer;

+--------------+-------+---------+

| Birding Site | Votes | Percent |

+--------------+-------+---------+

| forest | 30 | 34.8% |

| shore | 42 | 48.8% |

| backyard | 14 | 16.2% |

+--------------+-------+---------+

As the name of the function implies, it truncated the value after the number of decimal places specified (i.e., 1 in this example).

Eliminating Negative Numbers

Sometimes when we’re working with numbers in functions, we get them in the wrong order and the result is a number with a negative sign. If we’re trying to find only the difference between two numbers, we can use ABS() to return the absolute value, the value without a negative sign. Absolute values are also important for certain mathematical calculations.

We’ll try this function by using part of some examples from the previous section, where we determined the total number of seconds each member took to identify birds. This time we’ll just calculate a total for all rows, not grouping by human_id:

SELECT

SUM( TIME_TO_SEC( TIMEDIFF(id_start, id_end) ) )

AS 'Total Seconds for All',

ABS( SUM( TIME_TO_SEC( TIMEDIFF(id_start, id_end) ) ) )

AS 'Absolute Total'

FROM bird_identification_tests;

+-----------------------+----------------+

| Total Seconds for All | Absolute Total |

+-----------------------+----------------+

| -1689 | 1689 |

+-----------------------+----------------+

There’s not much to this function and example. The first field has a negative sign because we put the id_start before the id_end within TIMEDIFF(). We could just reverse the order, but there will be situations in which you won’t know which value will be greater than the other. For this, you may need ABS().

In other situations, you want to know whether a value is positive or negative. For this, you can use the SIGN() function. It returns a value of 1 if the argument given results in a positive number, -1 if it results in a negative number, and 0 if it’s given a value of zero.

As an example, let’s go back to our bird identification tests. Suppose we want a list of birds that members identified in less time than the average. We calculated the minimum average earlier in Calculating a Group of Values. We’ll reuse part of that SQL statement, but save the results to a user-defined variable and use that variable to compare each row in bird_identification_tests so we can list only rows in which the time it took to identify the bird was less than average. Set up that variable and test it by entering this on your server:

SET @min_avg_time =

(SELECT MIN(avg_time) FROM

(SELECT AVG( TIME_TO_SEC( TIMEDIFF(id_end, id_start)))

AS 'avg_time'

FROM bird_identification_tests

GROUP BY human_id) AS average_times);

SELECT @min_avg_time;

+---------------+

| @min_avg_time |

+---------------+

| 23.6667 |

+---------------+

That’s about right. We had a value of 23 seconds before, but that’s because we rounded it with TIME_FORMAT(). This is more accurate. Let’s use that variable now to do a comparison using SIGN() in the WHERE clause. Enter this on your server:

SELECT CONCAT(name_first, SPACE(1), name_last)

AS 'Birdwatcher',

common_name AS 'Bird',

ROUND(@min_avg_time - TIME_TO_SEC( TIMEDIFF(id_end, id_start) ) )

AS 'Seconds Less than Average'

FROM bird_identification_tests

JOIN humans USING(human_id)

JOIN rookery.birds USING(bird_id)

WHERE SIGN( TIME_TO_SEC( TIMEDIFF(id_end, id_start) - @min_avg_time)) = -1;

+-------------------+----------------------+---------------------------+

| Birdwatcher | Bird Identified | Seconds Less than Average |

+-------------------+----------------------+---------------------------+

| Lexi Hollar | Blue Duck | 3 |

| Lexi Hollar | Trinidad Piping-Guan | 13 |

| Geoffrey Dyer | Javan Plover | 15 |

| Katerina Smirnova | Blue Duck | 2 |

| Anahit Vanetsyan | Great Crested Grebe | 4 |

+-------------------+----------------------+---------------------------+

The use of SIGN() in the WHERE clause selects rows in which the member took less than the average time. That’s a function that would be difficult to duplicate in MySQL by any other method.

Summary

Although we didn’t cover all of the aggregate and numeric functions, we covered most of them — including the ones that are used most often. We primarily skipped the statistics functions. We didn’t cover many arithmetic functions, but those are pretty straightforward (e.g., POWER(2, 8)returns 2 to the eighth power, or 256), or they’re specialized (e.g., PI() returns π, or 3.141593). What’s important is that you feel comfortable with aggregate functions and using the GROUP BY clause — you’ll use them often — and that you have a firm grasp on the numeric functions covered in this chapter. Several other numeric functions exist, in case you ever need them. If you want to learn about these other functions, you can check the MySQL documentation or the MariaDB documentation.

Exercises

Numeric functions are pretty easy, once you know what each does. You probably didn’t have any trouble following the sections on them in this chapter. Aggregate functions, though, can be a little bothersome. Therefore, while some exercises in this section require you to use numeric functions, most include aggregate functions. Some call for you to combine numeric and aggregate functions. These should help you to retain what you learned in this chapter. There aren’t many exercises for this chapter, though, so it shouldn’t take you much time to complete all of them.

1. Construct a simple SELECT statement that counts the number of rows in the birds table where the common_name contains the word Least. Execute that to make sure you did it correctly. Next, modify that SQL statement to count the rows in which the common_name contains the word Great. You’ll do this by using the LIKE operator in the WHERE clause.

2. In Calculating a Group of Values, we covered how to group columns for counting. Combine the two SQL statements you constructed in the previous exercise and make one SQL statement using GROUP BY to produce one field in the results that shows the number of birds with acommon_name containing Least and another field that shows the number of birds that are Great.

3. In some of the examples in this chapter (see Counting Values), we had MySQL count the number of birds in each species and in each bird family. For this exercise, you may want to refer to those examples.
Construct a SELECT statement to query the birds table, with three fields in the results set: the name of the bird species, the number of birds in that species, and the percentage that species represents of the total number of species. Let MySQL calculate the total number of species; don’t enter that value manually in the SQL statement.
After you’ve successfully executed this SQL statement, modify the SQL statement using one of the numeric functions to round to one decimal place the field that contains the percentage value.

4. Do the previous exercise again, but this time create another SELECT statement that retrieves only the total number of bird species. With the SET statement, create a user variable to store that value taken by MySQL from the SELECT statement. You may give any name you want for that variable.
Now change the SELECT statement you created in the previous exercise, but use the variable you created for determining the percentage of total birds in the table. Once you have it executed correctly, exit the mysql client and log back in.
Run the same SQL statement to create the user variable and the second SQL statement for this exercise again. Notice the time it takes to execute in the results statement. Then execute again the SELECT statement from the previous exercise, the one that doesn’t use a user variable. Notice how long it took to execute compared to the SELECT statement that uses a user variable.

5. In the humans table, the membership_expiration column contains date values. Put together a SELECT statement in which you determine the number of months after the date 2014-01-01 until each member’s membership will expire. If you’re not sure how to do this, refer to Comparing Dates and Times. Use the SIGN() function in the WHERE clause to determine whether a membership has expired. List only unexpired memberships. This was covered in Eliminating Negative Numbers. Remember to use the IF NOT NULL operator in the WHERE clause to exclude those members who don’t have a paid membership (i.e., no expiration date). Label the field as Months to Expiration.

6. Modify the SQL statement you created for the previous exercise. Don’t exclude expired memberships this time — but still exclude those without a paid membership. Use the CONCAT() function to append " - expired" to the number of months remaining or past due. Don’t append the text if the membership hasn’t expired. You’ll have to use the IF() function to test that for the field containing the number of months. You’ll also have to use the ABS() function to remove the negative sign from the value.

7. Based on the SQL statement you constructed in the previous exercises, create a new one to determine the average number of months until expiration for all paid members in one field, and the average number of months past expiration, based on the date of 2014-01-01. You will need to use the AVG() function to calculate these averages. Once you have that working, add fields to determine the minimum and maximum number of months, using MIN() and MAX(), and the GROUP BY clause.