Author's posts
Jul 25 2010
Some notes on SQL: 6 – Multi-table operations
This is the sixth in a series of blog posts on SQL, the first covered creating a database, the second selecting information from a database, the third commands to modify the structure and contents of an existing database, the fourth, advanced selection. The fifth post covered database design. This post covers multi-table database operations. No claim of authority is made for these posts, they are mainly intended as my notes on the topic.
Good database design leads us to split information across several tables, so the information we require from a SELECT statement may reside in more than one of them. SQL provides keywords and methods to help with extracting data from multiple tables. For clarity, aliases, indicated using the AS keyword, allow tables to be given shorter, or clearer, names temporarily. Various JOIN keywords enable lookups between tables; as with other aspects of SQL there are multiple ways of achieving the same results, in this case ‘subqueries’.
The AS keyword can be used to populate a new table with the results of a SELECT statement, or it can be used to alias a table or column name. In its aliasing guise it can be dropped as a shorthand. This is AS being used in table creation:
CREATE TABLE profession
(
id INT(11) NOT NULL AUTO_INCREMENT PRIMARY KEY,
profession VARCHAR(20)
) AS
SELECT profession
FROM my_contacts
GROUP BY profession
ORDER BY profession;
The following two forms are equivalent, the first uses the AS to alias, the second uses an implicit alias:
SELECT profession AS mc_prof
FROM my_contacts AS mc
GROUP BY mc_prof
ORDER BY mc_prof;
SELECT profession mc_prof
FROM my_contacts mc
GROUP BY mc_prof
ORDER BY mc_prof;
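To convince myself that the two forms really are equivalent, here is a minimal sketch. It assumes Python's built-in sqlite3 module rather than MySQL, and the my_contacts rows (and the name column) are made up for illustration.

```python
import sqlite3

# In-memory SQLite database; the rows are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE my_contacts (name TEXT, profession TEXT);
INSERT INTO my_contacts VALUES
    ('Anne', 'Teacher'), ('Bill', 'Chef'), ('Cleo', 'Teacher');
""")

# Explicit aliases using AS.
explicit_alias = conn.execute("""
    SELECT profession AS mc_prof
    FROM my_contacts AS mc
    GROUP BY mc_prof
    ORDER BY mc_prof
""").fetchall()

# Implicit aliases: the AS keyword is simply dropped.
implicit_alias = conn.execute("""
    SELECT profession mc_prof
    FROM my_contacts mc
    GROUP BY mc_prof
    ORDER BY mc_prof
""").fetchall()

print(explicit_alias)
print(implicit_alias == explicit_alias)
```

Grouping by a column alias works in SQLite and MySQL, but it is not accepted by every database engine.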
The following examples use two tables: boys, a three column table {boy_id, boy, toy_id}, and toys, a two column table {toy_id, toy}.
boy_id | boy | toy_id |
---|---|---|
1 | Davey | 3 |
2 | Bobby | 5 |
3 | Beaver | 2 |
4 | Richie | 1 |
toy_id | toy |
---|---|
1 | Hula hoop |
2 | balsa glider |
3 | Toy soldiers |
4 | Harmonica |
5 | Baseball cards |
Cross join, cartesian join and comma join are all names for the same, relatively little used operation, which returns every row from one table crossed with every row from a second table; that’s to say two 6 row tables will produce a result with 36 rows. Although see here for an application.
SELECT t.toy,
b.boy
FROM toys AS t
CROSS JOIN boys AS b;
Notice the use of the period and aliases to reference columns. With the 5 row toys table and the 4 row boys table this query will produce a 20 row result.
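The row count is easy to check. This is a minimal sketch assuming Python's built-in sqlite3 module (not MySQL), with the boys and toys tables loaded from the listings above; the column types are my own guess.

```python
import sqlite3

# In-memory SQLite database holding the boys and toys tables from the text.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE toys (toy_id INTEGER PRIMARY KEY, toy TEXT);
CREATE TABLE boys (boy_id INTEGER PRIMARY KEY, boy TEXT, toy_id INTEGER);
INSERT INTO toys VALUES (1, 'Hula hoop'), (2, 'balsa glider'),
    (3, 'Toy soldiers'), (4, 'Harmonica'), (5, 'Baseball cards');
INSERT INTO boys VALUES (1, 'Davey', 3), (2, 'Bobby', 5),
    (3, 'Beaver', 2), (4, 'Richie', 1);
""")

# Every toy paired with every boy, whether or not he owns it.
cross_rows = conn.execute("""
    SELECT t.toy, b.boy
    FROM toys AS t
    CROSS JOIN boys AS b
""").fetchall()

print(len(cross_rows))  # 5 toys x 4 boys
```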
An inner join combines the rows from two tables using comparison operators in a condition: an equijoin returns the pairs of rows for which the compared columns are equal, a non-equijoin returns the pairs for which they differ. Both are carried out with the same keywords, only the condition is different. This is an equijoin:
SELECT boys.boy,
toys.toy
FROM boys
INNER JOIN toys
ON boys.toy_id = toys.toy_id;
For an inner join the condition can equally be written in a WHERE clause, so ON and WHERE can be used interchangeably here. In this instance we do not use aliases. Furthermore, since the key column has the same name in both tables (toys and boys), we could use a natural join:
SELECT boys.boy,
toys.toy
FROM boys
NATURAL JOIN toys;
Natural join is a straightforward lookup operation: a key from one table is used to extract a matching row from a second table, where the key column has the same name in each table. Both of these versions produce the following table:
boy | toy |
---|---|
Richie | Hula hoop |
Beaver | balsa glider |
Davey | Toy soldiers |
Bobby | Baseball cards |
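As a check that the ON, WHERE and NATURAL JOIN versions agree, here is a minimal sketch, again assuming Python's built-in sqlite3 module rather than MySQL. I have added an ORDER BY (not in the queries above) so the three results can be compared directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE toys (toy_id INTEGER PRIMARY KEY, toy TEXT);
CREATE TABLE boys (boy_id INTEGER PRIMARY KEY, boy TEXT, toy_id INTEGER);
INSERT INTO toys VALUES (1, 'Hula hoop'), (2, 'balsa glider'),
    (3, 'Toy soldiers'), (4, 'Harmonica'), (5, 'Baseball cards');
INSERT INTO boys VALUES (1, 'Davey', 3), (2, 'Bobby', 5),
    (3, 'Beaver', 2), (4, 'Richie', 1);
""")

# Equijoin with the condition in ON.
on_rows = conn.execute("""
    SELECT boys.boy, toys.toy
    FROM boys
    INNER JOIN toys
    ON boys.toy_id = toys.toy_id
    ORDER BY boys.boy
""").fetchall()

# The same condition expressed with WHERE (comma join plus filter).
where_rows = conn.execute("""
    SELECT boys.boy, toys.toy
    FROM boys, toys
    WHERE boys.toy_id = toys.toy_id
    ORDER BY boys.boy
""").fetchall()

# Natural join: matches on the only shared column name, toy_id.
natural_rows = conn.execute("""
    SELECT boys.boy, toys.toy
    FROM boys
    NATURAL JOIN toys
    ORDER BY boys.boy
""").fetchall()

for row in on_rows:
    print(row)
print(on_rows == where_rows == natural_rows)
```

The harmonica never appears because no boy has toy_id 4.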
A non-equijoin looks like this:
SELECT boys.boy,
toys.toy
FROM boys
INNER JOIN toys
ON boys.toy_id<>toys.toy_id
ORDER BY boys.boy;
The result in this instance is four rows for each boy, containing the four toys he does not have.
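A quick way to see the non-equijoin in action is the following minimal sketch, assuming Python's built-in sqlite3 module rather than MySQL, with the same boys and toys data as above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE toys (toy_id INTEGER PRIMARY KEY, toy TEXT);
CREATE TABLE boys (boy_id INTEGER PRIMARY KEY, boy TEXT, toy_id INTEGER);
INSERT INTO toys VALUES (1, 'Hula hoop'), (2, 'balsa glider'),
    (3, 'Toy soldiers'), (4, 'Harmonica'), (5, 'Baseball cards');
INSERT INTO boys VALUES (1, 'Davey', 3), (2, 'Bobby', 5),
    (3, 'Beaver', 2), (4, 'Richie', 1);
""")

# Non-equijoin: keep the pairs where the keys do NOT match.
non_equi_rows = conn.execute("""
    SELECT boys.boy, toys.toy
    FROM boys
    INNER JOIN toys
    ON boys.toy_id <> toys.toy_id
    ORDER BY boys.boy
""").fetchall()

# The toys Davey does not have (he owns the toy soldiers).
davey_missing = sorted(toy for boy, toy in non_equi_rows if boy == 'Davey')

print(len(non_equi_rows))
print(davey_missing)
```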
Outer joins are quite similar to inner joins, with the exception that they can return rows when no match is found, inserting a null value. The following query
SELECT b.boy,
t.toy
FROM toys t
LEFT OUTER JOIN boys b
ON b.toy_id = t.toy_id;
produces this result
boy | toy |
---|---|
Richie | Hula hoop |
Beaver | balsa glider |
Davey | Toy soldiers |
NULL | Harmonica |
Bobby | Baseball cards |
That’s to say each row of the toys table is taken and matched to the boys table; where there is no match (for toy_id=4, harmonica) a null value is inserted in the boy column. Both LEFT OUTER JOIN and RIGHT OUTER JOIN are available, but a right outer join gives the same result as a left outer join with the order of the tables swapped.
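The effect of table order in a left outer join can be seen in this minimal sketch. It assumes Python's built-in sqlite3 module rather than MySQL, and I only use LEFT OUTER JOIN since older SQLite versions lack RIGHT OUTER JOIN. The ORDER BY clauses are my addition to make the output predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE toys (toy_id INTEGER PRIMARY KEY, toy TEXT);
CREATE TABLE boys (boy_id INTEGER PRIMARY KEY, boy TEXT, toy_id INTEGER);
INSERT INTO toys VALUES (1, 'Hula hoop'), (2, 'balsa glider'),
    (3, 'Toy soldiers'), (4, 'Harmonica'), (5, 'Baseball cards');
INSERT INTO boys VALUES (1, 'Davey', 3), (2, 'Bobby', 5),
    (3, 'Beaver', 2), (4, 'Richie', 1);
""")

# toys on the left: every toy is kept, unmatched toys get a NULL boy.
toys_left = conn.execute("""
    SELECT b.boy, t.toy
    FROM toys t
    LEFT OUTER JOIN boys b
    ON b.toy_id = t.toy_id
    ORDER BY t.toy_id
""").fetchall()

# boys on the left: every boy is kept, and every boy has a toy here.
boys_left = conn.execute("""
    SELECT b.boy, t.toy
    FROM boys b
    LEFT OUTER JOIN toys t
    ON b.toy_id = t.toy_id
    ORDER BY b.boy_id
""").fetchall()

for row in toys_left:
    print(row)  # SQL NULL shows up as Python None
print(len(boys_left))
```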
In some instances a table contains a self-referencing foreign key, that is a foreign key which refers to the primary key of the same table. An example might be a three column table, clown_info, of “clowns” {id, name, boss_id} where each id refers to a clown name and the bosses identified by boss_id are simply other clowns in the same table. To resolve this type of key a self-join is required, which uses two aliases of the same table.
SELECT c1.name,
c2.name AS boss
FROM clown_info c1
INNER JOIN clown_info c2
ON c1.boss_id = c2.id;
Notice both c1 and c2 alias to clown_info.
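Here is a minimal sketch of the self-join, assuming Python's built-in sqlite3 module rather than MySQL. The clown names and boss assignments are hypothetical, invented only to have something to join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE clown_info (
    id INTEGER PRIMARY KEY,
    name TEXT,
    boss_id INTEGER REFERENCES clown_info(id)
);
-- Hypothetical data: Snuggles is the boss and has no boss of his own.
INSERT INTO clown_info VALUES
    (1, 'Elsie', 3), (2, 'Pickles', 3), (3, 'Snuggles', NULL);
""")

# c1 plays the role of "employee", c2 the role of "boss".
clown_bosses = conn.execute("""
    SELECT c1.name, c2.name AS boss
    FROM clown_info c1
    INNER JOIN clown_info c2
    ON c1.boss_id = c2.id
    ORDER BY c1.id
""").fetchall()

for row in clown_bosses:
    print(row)
```

Because this is an inner join, a clown with a NULL boss_id drops out of the result; a left outer join would keep him with a NULL boss.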
Keywords: AS, ON, INNER JOIN, NATURAL JOIN, LEFT OUTER JOIN, RIGHT OUTER JOIN, UNION, INTERSECT, EXCEPT
Jul 22 2010
The Periodic Table
Understanding the Periodic Table is very much like making love to a beautiful woman, there’s no point rote-learning the location of the different elements if you don’t know what they do… langtry_girl*
The Periodic Table of the Elements is a presentation of the known elements which provides information on the relationships between those elements in terms of their chemical and physical properties. An element is a type of atom: iron, helium, sulphur, aluminium are all examples of elements. Elements cannot be broken down chemically into other elements, but elements can change into one another through nuclear processes such as radioactive decay. An atom is composed of electrons, protons and neutrons.
This is all very nice, but if you look around you: at the wallpaper, the computer screen, the table – very little of what you see is made from pure elements. They’re made from molecules (pure elements joined together), and the molecules are arranged in different ways which may be completely invisible. So in a sense the periodic table represents the bottom of the tree of knowledge for people interested in materials, other scientists may be more interested in what makes up the elements.
As a design, shown above, the periodic table is a cultural icon which everyone knows. Even if they don’t understand what it means, they know what it stands for – it stands for science. How to make sure people know your scene is set in a lab or your character is a scientist? Bung in a periodic table. It has been purloined to organise other sorts of information, such as Crispian Jago’s rather fine “Periodic Table of Irrational Nonsense“, some more examples here. There is a song.
At various times in my life I’ve been able to name and correctly locate all the elements in the periodic table; it normally takes a bit of effort and some mnemonics to help. Increasingly now, I can remember the mnemonics but not the elements they refer to.
Different parts of the periodic table are important to different sorts of scientists. For organic chemists carbon (C), hydrogen (H), oxygen (O) and nitrogen (N) hold the majority of their interest, with walk on parts for some of the transition metals (the pink ones in a block in the middle) which act as catalysts. Inorganic chemists are more wide ranging, only really forbidden from the Noble Gases (helium (He), neon (Ne), argon (Ar), krypton (Kr), xenon (Xe)) which refuse to react with almost anything. Semi-conductor physicists are after the odd “semi-metals”: silicon (Si), indium (In), gallium (Ga), germanium (Ge), arsenic (As). For magnets there’s iron (Fe), cobalt (Co), nickel (Ni) along with other transition metals and the Lanthanides. The actinides are for nuclear physicists, radiation scientists and atomic bomb makers. Hydrogen is for cosmologists. In this view, as a soft condensed matter physicist, I am closest to the organic chemists.
I’m rather fond of the periodic table, it is the scientist’s badge, but I’m scared of fluorine.
*To be fair to langtry_girl, I pondered on twitter: “Trying to finish the sentence: ‘Understanding the Periodic Table is very much like making love to a beautiful woman…’” and I think hers was the best reply. It is, of course, a reference to Swiss Toni.
Jul 20 2010
Some notes on SQL: 5 – database design
This is the fifth in a series of blog posts on SQL, the first covered creating a database, the second selecting information from a database, the third commands to modify the structure and contents of an existing database, the fourth on advanced selection. This post covers database design, as such it is a little lighter on the code examples. No claim of authority is made for these posts, they are mainly intended as my notes on the topic. These notes are based largely on Head First SQL.
The goal of database design is to produce a database which is straightforward and efficient to search. This is done by splitting data into a set of tables, with lookups between those tables used to build the desired output results.
Efficient database design is normally discussed with reference to “normal forms“, the goal being to reach the highest order normal form. In practice, pragmatism is applied which means it may be sensible to hold back a little on this.
CREATE TABLE customer
(
sid INTEGER,
last_name VARCHAR(30),
first_name VARCHAR(30),
PRIMARY KEY (sid)
);
For a composite key, this form is used:
PRIMARY KEY (column_1,column_2,column_3)
In a normalised relation a non-key field must provide a fact about the key, the whole key and nothing but the key.
Relationships between tables in a database are indicated like this:
CREATE TABLE orders -- table name assumed
(
order_id INTEGER,
order_date DATE,
customer_sid INTEGER,
amount DOUBLE,
PRIMARY KEY (order_id),
FOREIGN KEY (customer_sid) REFERENCES customer(sid)
);
(Example borrowed from here). PRIMARY KEY and FOREIGN KEY are examples of ‘constraints’: primary keys must be unique, and a foreign key value cannot be used in a table if it does not exist as a primary key in the referenced table. The CONSTRAINT keyword is used to give a name to a constraint (a constraint being one of NOT NULL, UNIQUE, CHECK, PRIMARY KEY, FOREIGN KEY). MySQL accepts CHECK in a table definition but, in versions before 8.0.16, does not enforce it.
Keywords: PRIMARY KEY, FOREIGN KEY, REFERENCES, CONSTRAINT
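To see a foreign key constraint doing its job, here is a minimal sketch. It assumes Python's built-in sqlite3 module rather than MySQL; the table name orders, the constraint name fk_orders_customer and the sample rows are my own inventions. SQLite only enforces foreign keys once the foreign_keys pragma is switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite leaves foreign key enforcement off unless asked.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE customer (
    sid INTEGER,
    last_name VARCHAR(30),
    first_name VARCHAR(30),
    PRIMARY KEY (sid)
);
CREATE TABLE orders (            -- hypothetical table name
    order_id INTEGER,
    order_date DATE,
    customer_sid INTEGER,
    amount DOUBLE,
    PRIMARY KEY (order_id),
    CONSTRAINT fk_orders_customer   -- hypothetical constraint name
        FOREIGN KEY (customer_sid) REFERENCES customer(sid)
);
INSERT INTO customer VALUES (1, 'Smith', 'Ann');
""")

# This order refers to an existing customer, so it is accepted.
conn.execute("INSERT INTO orders VALUES (1, '2010-07-20', 1, 9.99)")

# This one refers to customer 42, who does not exist.
try:
    conn.execute("INSERT INTO orders VALUES (2, '2010-07-20', 42, 5.00)")
    fk_rejected = False
except sqlite3.IntegrityError as err:
    fk_rejected = True
    print("rejected:", err)

order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(order_count)
```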
Jul 15 2010
On choice
Choose life. Choose a job. Choose a career. Choose a family. Choose a big fucking television. Choose washing machines, cars, compact disc players and electric tin openers. Choose good health, low cholesterol and dental insurance. Choose fixed interest mortgage payments. Choose a starter home. Choose your friends. Choose leisure wear and matching luggage. Choose a three-piece suite on hire purchase in a range of fucking fabrics. Choose D.I.Y. and wondering who the fuck you are on Sunday morning. Choose sitting on a large couch watching mind-numbing spirit-crushing game shows stuffing fucking junk food in your mouth. Choose rotting away at the end of it all, pissing your last in a miserable home, nothing more than an embarrassment to the selfish fucked-up brats you’ve spawned to replace yourself. Choose your future, choose life. But why would you want to do a thing like that? I chose not to choose life. I chose something else. – Trainspotting by Irvine Welsh (Screenplay by John Hodge)
For the last 20 years or so politicians have been keen on offering us choice, my message is “I don’t want choice”!
Choice of schools is something of an academic question for me since I don’t have any children but I grew up in rural Dorset and there the offer of choice would have been hollow. There were two primary schools in my village : one Roman Catholic and one Church of England, following that we went to the local “Middle School” one mile away – next nearest offering five miles away, followed by an upper school five miles away and the nearest alternative 10 miles and above away (to be honest I don’t even know where the alternative would be)… and this in an area with a rural transport system, not an urban one. A great deal of effort is expended in trying to rank schools, there’s evidence showing this process is not very accurate – the vast majority of schools are statistically indistinguishable. And who says schools are so important for education? My educational success is down, in large part, to the support of my parents but no-one seems to mention that. No one wants to say: actually your child’s education is very much down to you.
We get choice in medical care these days too but how am I supposed to judge the quality of a doctor or a hospital? Set some bright people a target and they’ll do a fine job of hitting it but is the target really representing the thing you want? People are actually quite keen to go to the hospital that’s close to them. Do we really expect patients to make an informed choice of which hospital is best for them from a medical point of view? I’m pretty sure I couldn’t make an accurate choice of the best hospital for medical care. Best hospital for me is easy: it’s the one about half a mile from my house. And what’s the message you’re sending when you’re offering a choice of hospital or doctor and providing data that purports to represent quality?
“Here’s a bunch of hospitals – make sure you choose the best one. Do you feel lucky?”
I’d much rather you made sure that it didn’t matter which hospital I went to.
People don’t actually like lots of choice: academic research on jam shows that consumers are more likely to buy jam from a choice of 6 types than from a selection of 24 types; too much choice confuses and causes unhappiness. This chimes with my experience. To a large extent I’ve given up being a rational economic agent, life’s too short to sweat over a choice of 100 different TVs.
This problem of ranking difficult-to-rank things is quite general, I experience it myself at work in my targets. I’ve come to the tentative conclusion that for people working in areas without clearly quantifiable outputs (number of strawberries picked, widgets sold, football games won), ranking really amounts to three buckets: sack, ok, promote. Your sack and promote buckets should really be pretty small. Yet we expend great effort on making more precise gradings. More interestingly, I remember sitting through an interminable college meeting and discussing the marking of students with a fellow in English. Normally for degree courses there’s a certain amount of second marking; in physics, where there are definite answers, second marking works fairly well, but for my colleague in English one marker could mark a First and the other a 2.2/3rd, for the same essay!
Don’t give me choice, give me uniformity!
Jul 14 2010
Some notes on SQL: 4 – Advanced select
This is the fourth in a series of blog posts on SQL, the first covered creating a database, the second selecting information from a database, the third commands to modify the structure and contents of an existing database. This post covers more advanced commands for selecting information from a database and ways of manipulating the results returned. No claim of authority is made for these posts, they are mainly intended as my notes on the topic.
SQL supports CASE statements, similar to those which are found in a range of programming languages, they are used to write multiple comparison sequences more compactly:
UPDATE my_table
SET new_column = CASE
WHEN column1 = somevalue1 THEN newvalue1
WHEN column2 = somevalue2 THEN newvalue2
ELSE newvalue3
END;
The CASE statement can also be used in a SELECT:
SELECT title,
price,
CASE
WHEN price > 20.00 THEN 'Expensive'
WHEN price BETWEEN 10.00 AND 20.00 THEN 'Moderate'
WHEN price < 10.00 THEN 'Inexpensive'
ELSE 'Unknown'
END AS budget
FROM titles;
(This second example is from here)
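A CASE expression in a SELECT can be tried out with a minimal sketch like this one, which assumes Python's built-in sqlite3 module rather than MySQL. The titles rows are hypothetical, and I use the searched form of CASE (conditions after each WHEN) with the result named via AS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE titles (title TEXT, price REAL);
-- Hypothetical rows, including one with no price at all.
INSERT INTO titles VALUES
    ('A', 25.00), ('B', 15.00), ('C', 5.00), ('D', NULL);
""")

budget_rows = conn.execute("""
    SELECT title,
           price,
           CASE
               WHEN price > 20.00 THEN 'Expensive'
               WHEN price BETWEEN 10.00 AND 20.00 THEN 'Moderate'
               WHEN price < 10.00 THEN 'Inexpensive'
               ELSE 'Unknown'
           END AS budget
    FROM titles
    ORDER BY title
""").fetchall()

for row in budget_rows:
    print(row)
```

A NULL price fails every comparison, so that row falls through to the ELSE branch.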
The way in which results are returned from a SELECT statement can be controlled by the ORDER BY keyword with the ASC (ascending) and DESC (descending) modifiers. Results can be ordered by multiple keys. The sort order depends on the collation in use; with a binary, case-sensitive collation numbers sort before letters, and uppercase letters before lowercase letters.
SELECT title,purchased
FROM movie_table
ORDER BY title ASC, purchased DESC;
Ascending order is assumed in the absence of an explicit keyword.
There are various functions that can be applied to sets of rows returned in a query to produce a single value; these include MIN, MAX, AVG, COUNT and SUM. The functions are used like this:
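Ordering by two keys can be checked with this minimal sketch, assuming Python's built-in sqlite3 module rather than MySQL. The movie_table rows are hypothetical, with purchase dates stored as ISO text so that they sort correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE movie_table (title TEXT, purchased TEXT);
-- Hypothetical rows; 'Alien' appears twice to exercise the second key.
INSERT INTO movie_table VALUES
    ('Brazil', '2008-03-03'), ('Alien', '2009-01-01'), ('Alien', '2010-05-05');
""")

# Title ascending, then newest purchase first within each title.
ordered = conn.execute("""
    SELECT title, purchased
    FROM movie_table
    ORDER BY title ASC, purchased DESC
""").fetchall()

# No modifier given: ascending is the default.
default_order = conn.execute("""
    SELECT title
    FROM movie_table
    ORDER BY title
""").fetchall()

for row in ordered:
    print(row)
```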
SELECT SUM(sales)
FROM cookie_sales
WHERE first_name = 'Nicole';
This returns a sum of all of the “sales” values in the rows selected by the WHERE clause. Related is DISTINCT, which is a keyword rather than a function, so the syntax is slightly different:
SELECT DISTINCT sale_date
FROM cookie_sales
ORDER BY sale_date;
This returns a set of unique dates in the sale_date column.
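Both the aggregate function and DISTINCT can be tried with this minimal sketch, assuming Python's built-in sqlite3 module rather than MySQL. The cookie_sales rows are hypothetical, and I use whole numbers for sales to keep the arithmetic obvious.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cookie_sales (first_name TEXT, sales INTEGER, sale_date TEXT);
-- Hypothetical rows.
INSERT INTO cookie_sales VALUES
    ('Nicole', 10, '2010-03-06'),
    ('Nicole', 5, '2010-03-07'),
    ('Britney', 7, '2010-03-06');
""")

# SUM collapses all of Nicole's rows into a single value.
nicole_total = conn.execute("""
    SELECT SUM(sales)
    FROM cookie_sales
    WHERE first_name = 'Nicole'
""").fetchone()[0]

# DISTINCT removes the repeated date.
sale_dates = [row[0] for row in conn.execute("""
    SELECT DISTINCT sale_date
    FROM cookie_sales
    ORDER BY sale_date
""")]

print(nicole_total)
print(sale_dates)
```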
The GROUP BY keyword is used with aggregate functions such as SUM, which take the values from multiple rows and produce a single output, so that the function is applied separately to each group of rows; it can also be used to reduce a list to distinct elements (in these circumstances it gives the same result as the DISTINCT keyword, although execution speed may differ). The format for GROUP BY is shown by example below:
SELECT first_name, SUM(sales)
FROM cookie_sales
GROUP BY first_name;
This will return a sum of the “sales” for each person identified by “first_name”. A final keyword used to control the output of a SELECT statement is the LIMIT keyword, which can take one or two parameters; the behaviour of the two forms is quite different. One parameter form:
SELECT * FROM your_table LIMIT 5;
This returns the first five results from a SELECT. Two parameter form:
SELECT * FROM your_table LIMIT 5, 5;
This returns results 6,7,8,9 and 10 from the SELECT. The first parameter is the index of the first result to return (starting at 0 for the first position) and the second parameter is the number of results to return.
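GROUP BY and the two forms of LIMIT can be checked with a minimal sketch like this, assuming Python's built-in sqlite3 module rather than MySQL (SQLite accepts the same "LIMIT offset, count" form). The data is hypothetical, and I have added ORDER BY clauses because without one the order of rows is not guaranteed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cookie_sales (first_name TEXT, sales INTEGER);
INSERT INTO cookie_sales VALUES ('Nicole', 10), ('Britney', 7), ('Nicole', 5);
CREATE TABLE your_table (n INTEGER);
""")
# Fill your_table with the numbers 1 to 12.
conn.executemany("INSERT INTO your_table VALUES (?)",
                 [(n,) for n in range(1, 13)])

# One summed row per distinct first_name.
totals = conn.execute("""
    SELECT first_name, SUM(sales)
    FROM cookie_sales
    GROUP BY first_name
    ORDER BY first_name
""").fetchall()

# One parameter: just a row count.
first_five = [row[0] for row in conn.execute(
    "SELECT * FROM your_table ORDER BY n LIMIT 5")]

# Two parameters: skip 5 rows, then return 5 rows.
next_five = [row[0] for row in conn.execute(
    "SELECT * FROM your_table ORDER BY n LIMIT 5, 5")]

print(totals)
print(first_five)
print(next_five)
```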
Keywords: CASE, WHEN, THEN, ELSE, ORDER BY, ASC, DESC, DISTINCT, MIN, MAX, AVG, COUNT, SUM, GROUP BY, LIMIT