This is the sixth in a series of blog posts on SQL, the first covered creating a database, the second selecting information from a database, the third commands to modify the structure and contents of an existing database, the fourth, advanced selection. The fifth post covered database design. This post covers multi-table database operations. No claim of authority is made for these posts, they are mainly intended as my notes on the topic.
Good database design leads us to separate information into separate tables, the information we require from a SELECT statement may reside in multiple tables. There are keywords and methods in SQL to help with extracting data from multiple tables. To assist with clarity aliases, indicated using the AS keyword, allow tables to be given shorter, or clearer, names temporarily. Various JOIN keywords enable lookups between tables, as with other aspects of SQL there are multiple ways of achieving the same results – in this case ‘subqueries’.
The AS keyword can be used to populate a new table with the results of a SELECT statement, or it can be used to alias a table name. In it’s aliasing guise it can be dropped, in shorthand. This is AS being used in table creation:
CREATE TABLE profession
(
id INT(11) NOT NULL AUTO_INCREMENT PRIMARY KEY,
profession VARCHAR(20)
) AS
SELECT profession
FROM my_contacts
GROUP BY profession
ORDER BY profession;
The following two forms are equivalent, the first uses the AS to alias, the second uses an implicit alias:
SELECT profession AS mc_prof
FROM my_contacts AS mc
GROUP BY mc_prof
ORDER BY mc_prof;
SELECT profession mc_prof
FROM my_contacts mc
GROUP BY mc_prof
ORDER BY mc_prof;
The following examples use two tables boys which is a three column table {boy_id, boy, toy_id} and toys a two column table {toy_id, toy}.
boy_id | boy | toy_id |
---|---|---|
1 | Davey | 3 |
2 | Bobby | 5 |
3 | Beaver | 2 |
4 | Richie | 1 |
toy_id | toy |
---|---|
1 | Hula hoop |
2 | balsa glider |
3 | Toy soldiers |
4 | Harmonica |
5 | Baseball cards |
Cross, cartesian, comma join are all names for the same, relatively little used operation which returns every row from one table crossed with every row from a second table, that’s to say two 6 row tables will produce a result with 36 rows. Although see here for an application.
SELECT t.toy,
b.boy
FROM toys AS t
CROSS JOIN boys AS b;
Notice the use of the period and aliases to reference columns, this query will produce a 20 row table.
Inner join combines the rows from two tables using comparison operators in a condition, an equijoin returns rows which are the same, a non-equijoin returns rows that are different. These are carried out with the same keywords, the condition is different. This is an equijoin:
SELECT boys.boy,
toys.toy
FROM boys
INNER JOIN toys
ON boys.toy_id = toys.toy_id;
The ON and WHERE keywords can be used interchangeable; in this instance we do not use aliases furthermore since the columns in the two tables (toys and boys) have the same name we could use a natural join:
SELECT boys.boy,
toys.toy
FROM boys
NATURAL JOIN toys;
Natural join is a straightforward lookup operation, a key from one table is used to extract a matching row from a second table, where the key column has the same name in each table. Both of these versions produce the following table:
boy | toy |
---|---|
Richie | hula hoop |
Beaver | balsa glider |
Davey | toy soldiers |
Bobby | harmonica |
A non-equijoin looks like this:
SELECT boys.boy,
toys.toy
FROM boys
INNER JOIN toys
ON boys.toy_id<>toys.toy_id
ORDER BY boys.boy;
the resultant in this instance is four rows for each boy containing the four toys he does not have.
Outer joins are quite similar to inner joins, with the exception that they can return rows when no match is found, inserting a null value. The following query
SELECT b.boy,
t.toy
FROM boys b
LEFT OUTER JOIN toys t
ON b.toy_id = t.toy_id;
produces this result
Boy | toy |
---|---|
Richie | Hula hoop |
Beaver | balsa glider |
Davey | Toy soldiers |
NULL | Harmonica |
Bobby | Baseball cards |
That’s to say each row of the toys table is taken and matched to the boys table, where there is no match (for toy_id=4, harmonica) a null value is inserted in the boy column. Both LEFT OUTER JOIN and RIGHT OUTER JOIN are available but the same effect can be achieved by swapping the order in which tables are used in the query.
In some instances a table contains a self-referencing foreign key which is the primary key of the table. An example might be a three column table, clown_info, of “clowns” {id, name, boss_id} where each id refers to a clown name and the bosses identified by boss_id are simply other clowns in the same table. To resolve this type of key a self-join is required this uses two aliases of the same table.
SELECT c1.name,
c2.name AS boss
FROM clown_info c1
INNER JOIN clown_info c2
ON c1.boss_id = c2.id;
Notice both c1 and c2 alias to clown_info.
Keywords: AS, ON, INNER JOIN, NATURAL JOIN, LEFT OUTER JOIN, RIGHT OUTER JOIN, UNION, INTERSECT, EXCEPT