4.1 Subqueries
- Define subqueries
- Discuss advantages and disadvantages of using subqueries
- Explain how subqueries help us merge data from two or more tables
- Write more efficient subqueries
Subqueries are queries embedded into other queries. Since data is stored in multiple tables, you may need to get information from multiple sources. Data scientists often use subqueries to:
- Select specific records or columns and then use the criteria as filtering criteria for the next thing they want to select.
- Add additional criteria like filtering criteria that are not in your current table from another table into your query.
Example: Need to know the region each customer is from who has had an order with freight over 100. The freight information is in table Orders
and the customer information is in table Customers
.
SELECT customerID, CompanyName, Region
FROM Customers
WHERE customerID in (SELECT customerID FROM Orders WHERE Freight > 100);
DBMS is performing two operations:
- Get the customer ID for the orders selected
- Pass that to the
WHERE
clause and processing the overallSELECT
statement
Note that it always perform the innermost SELECT
portion first.
- Best Practices and Considerations
- There is no limit to the number of subqueries you can have
- Performance slows when you nest too deeply.
- Subquery selects can only retrieve a single column
Also, when you use subqueries, the code gets harder to understand. One of the best practices for writing subqueries is to be clear and consistent with indenting to make it easier to read. This website will pre-format SQL code for you: www.poorsql.com
Subqueries can also be used as calculations.
Example: Get the total number of orders by each customer
SELECT customer_name,
customer_state (SELECT COUNT(*) AS orders FROM Orders WHERE
Orders.customer_id = Customer.customer_id) AS orders
FROM customers
ORDER BY Customer_name
As you can see, we’re selecting customer name, the region and then just as if we were going to select a different column name we have a whole subquery in there. We have (SELECT COUNT(*) AS orders FROM Orders
, and count these based on the customer IDs. It’s aggregating the orders based on the customer IDs ( Orders.customer_id = Customer.customer_id
). In the end, we order the table by customer name.