3.5 Group Data
Learning Objectives
- perform additional aggregations using
GROUP BY
andHAVING
clause - discuss how
NULL
s are or aren’t affected by theGROUP BY
andHAVING
clauses - use the
GROUP BY
andORDER BY
clauses together to better sort your data
A lot of times, we’ll be looking at the average price for different types of products, or total perchasing for different customer segments. Then we need to aggregate data by groups. In this example, assume we want to know the number of customers we have by each region.
SELECT Region, COUNT(CustomerID) AS total_customers
FROM Customers
GROUP BY Region;
If we were to just have our SELECT
statement with Region, COUNT(CustomerID) AS total_customers
, but without specifying the variable to GROUP BY
, we’re going to get an error return. Because the computer doesn’t know how to count the customer IDs. So we put the GROUP BY
clause in the end. There are three things to pay attention:
GROUP BY
clauses can contain multiple columns.- Every column in your
SELECT
statement must be present in aGROUP BY
clause, except for aggregated calculations. NULL
s will be grouped together if yourGROUP BY
column containsNULL
s.
HAVING
clause works for groups. But WHERE
filters on rows not groups.
SELECT CustomerID, COUNT (*) AS orders
FROM Orders
GROUP BY CustomerID
HAVING COUNT (*) >=2;
To sum up, WHERE
filters before the data is grouped and then HAVING
filters after the data is grouped. Rows eliminated by the WHERE
clause will not be included in the GROUP BY
clause. It is important to know when you should use WHERE
versus HAVING
.
Another thing to note about GROUP BY
is that it’s always a good practice to use the ORDER BY
clause. The GROUP BY
does not sort the data in any fashion. It only groups it together. In our previous examples, we have a list of states, a list of regions. It’s not going to sort those regions in alphabetical order. It’s just going to group them by different regions. It is recommended to use ORDER BY
in this situation. It makes the results a little easier to read.
SELECT SupplierID, COUNT(*) AS Num_Prod
FROM Products
WHERE UnitPrice >= 4
GROUP BY SupplierID
HAVING COUNT (*) >= 2;