Joins with Aggregate Functions
Learn to use aggregate functions.
We'll cover the following
Let’s start answering some more complex queries.
How many customers do we have with orders?
Suppose that we had a lot of customers with lots of orders. Some customers are referred to as leads because they haven’t yet placed an order. So, we have customers that have orders
and customers that don’t have an order.
How can we count the customers that have orders?
We tried the following query earlier:
mysql> select * from customers join orders on orders.customer_id = customers.id;
+----+-----------+----------+----+--------------+---------------------+-------------+
| id | name | identity | id | order_number | ordered_at | customer_id |
+----+-----------+----------+----+--------------+---------------------+-------------+
| 4 | John Woo | JW001 | 2 | ABC001 | 2016-09-12 21:36:56 | 4 |
| 5 | Maria Foo | MF001 | 3 | ABC002 | 2016-09-12 21:40:20 | 5 |
| 5 | Maria Foo | MF001 | 4 | ABC003 | 2016-09-12 21:44:34 | 5 |
| 6 | Jim Papas | JP001 | 5 | ABC004 | 2016-09-12 21:46:38 | 6 |
| 6 | Jim Papas | JP001 | 6 | ABC005 | 2016-09-12 21:47:41 | 6 |
| 6 | Jim Papas | JP001 | 7 | ABC006 | 2016-09-12 21:49:31 | 6 |
+----+-----------+----------+----+--------------+---------------------+-------------+
6 rows in set (0.01 sec)
This joins the customers
with the orders
and won’t include any customer that doesn’t have an order. This is a start to the answer to our question, but it isn’t the full answer. It shows six rows. Let’s look at the following:
mysql> select count(*) from customers join orders on orders.customer_id = customers.id;
+----------+
| count(*) |
+----------+
| 6 |
+----------+
1 row in set (0.01 sec)
We get 6
, which isn’t the correct number of customers that have placed at least one order. The correct number should be 3
. The above query repeats the same customer for every order that the customer has. That’s why we don’t get the correct result.
Use of distinct
One correct SQL statement to count the customers that have placed an order is the following:
mysql> select count(distinct customers.id) from customers join orders on orders.customer_id = customers.id;
+------------------------------+
| count(distinct customers.id) |
+------------------------------+
| 3 |
+------------------------------+
1 row in set (0.00 sec)
So, instead of count(*)
, we tell MySQL that we want to count the distinct
customers.id values. The customers.id
values are 4
, 5
, 5
, 6
, 6
, and 6
. The distinct, unique set is 4
, 5
, and 6
, which means that the count would return 3
. This is the correct result.
Remember that we can alias a column header with a label of our own to make the header of the column read easier:
mysql> select count(distinct customers.id) as number_of_customers_with_orders from customers join orders on orders.customer_id = customers.id;
+---------------------------------+
| number_of_customers_with_orders |
+---------------------------------+
| 3 |
+---------------------------------+
1 row in set (0.00 sec)
Another easier way to count the number of customers that have placed orders is to use only the order’s customer and count the distinct orders.customer_id
values instead:
mysql> select count(distinct orders.customer_id) from orders;
+------------------------------------+
| count(distinct orders.customer_id) |
+------------------------------------+
| 3 |
+------------------------------------+
1 row in set (0.00 sec)
Since every customer that has placed an order appears inside the orders
table, there’s no reason to use the customers
table. The SQL statement above consumes fewer resources and is faster than the count that uses both the customers
and orders
tables since it doesn’t have to join information from different tables. Joins are costly. We should avoid them if we can.
Practice our queries in the code widget below:
Get hands-on with 1400+ tech skills courses.