4.2 Joins
Benefits of breaking data into tables:
- Efficient storage
- Easier manipulation
- Greater scalability
- Logically models a process
- Tables are linked by common values (keys)
Joins associate corrects from each table and allow data retrieval from multiple tables in one query. Note that joins are not physical. They persist for the duration of the query execution.
- Cartesian (cross) joins: each row from the first table joins with all the rows of another table
If table 1 has \(n_1\) rows and table 2 has \(n_2\) rows, then after cartesian join, the resulted table will have \(n_1 \times n_2\) rows. It is not frequently used and is computationally taxing.
Example: - Table 1: vendor_name - Table 2: product_name, product_price
SELECT vender_name, product_name, product_price
FROM Vendors, Products
WHERE Vendors.vendor_id = Products.vendor_id;
It does not match anything. It’s just multiplying what you had in the first table with those records in the second table.
- INNER JOIN selects records that have matching values in both tables
It is one of the most frequently used joins in SQL.
Example:
SELECT suppliers.CompanyName, ProductName, UnitPrice
FROM Suppliers INNER JOIN Products
ON Suppliers.supplierid = Products.supplierid
Inner join syntax:
- Join type is specified (INNER JOIN)
- Join condition is in the FROM clause and uses the On clause
- Joining more tables together affects overall database performance
- You can join multiple tables, no limit
- List all the tables, then define conditions
Example:
SELECT o.OrderID, c.CompanyName, e.LastName
FROM ( (Orders o INNER JOIN Customer c ON o.CustomerID = c.CustomerID)
INNER JOIN Employees e ON o.EmployeeID = e.EmployeeID
);
Be careful about the names and make sure you don’t make unnecessary joins.
Aliases and self-joins
SQL aliases give a table or column a temporary name. It can make column names more readable. An alias only exists for the duration of the query.
- WITHOUT alias
SELECT vendor_name, product_name, product_price
FROM Vendors, Products
WHERE Vendors.vendor_id = Products.vendor_id;
- WITH alias
SELECT vendor_name, product_name, product_price
FROM Vendors AS v, Products AS p
WHERE v.vendor_id = p.vendor_id;
Self-join is to join table to itself. For example, assume you want to match customers from the same city. You can take the table and treat it like two separate tables. Join the original table to itself.
SELECT column_name(s)
FROM table1 T1, table2 T2
WHERE condition;
SELECT A.CustomerName AS CustomerName1, B.CustomerName AS CustomerName2, A.City
FROM Customers A, Customers B
WHERE A.CustomerID = B.CustomerID
AND A.City = B.City
ORDER BY A.City;
- Left/Right/Full outer joins
LEFT JOIN returns all rows from the left table, even if there are no matches in the right table. RIGHT JOIN returns all rows from the right table. FULL JOIN returns rows in either table.
- Union
The UNION operator is used to combine the result-set of two or more SELECT statements. Each SELECT statement within UNION must have the same number of columns. Columns must have similar data types. The columns in each SELECT statement must be in the same order.
Example:
SELECT column_name(s) FROM
table1
UNION
SELECT column_name(s) FROM
table2;
- Best Practices
- Check the number of records
- Does it seem logical given the kind of join you are performing?
- Check for duplicates
- Check the number of records each time you make a new join
- Are you getting the expected results?
- Start small: one table at a time
- Think about what you are trying to do and map how you are joining data tables first
- Use a join condition
- Summary Diagram