Some notes on SQL: 2 – Basic SELECT

Part 1 of this sequence of blog posts provided a preamble and showed how to create databases. This post introduces the basic SELECT command, which shows you what lies within your database and as it’s name implies allows you to select only parts of the data contained within.

The basic form of SELECT is:

SELECT*FROMmy_contacts
WHEREfirst_name=‘Anne’;

* indicates that all fields should be returned from the table ‘my_contacts’, where the first_name field = ‘Anne’. We don’t have to take all the fields from a table:

SELECTfirst_name,last_name,emailFROMmy_contacts
WHEREfirst_name=‘Anne’;

As well as the equivalence operator =, we can also use comparison operators <> (not equal), <, >, <=, >= these work not only with numerical values, but also with text values. WHERE clauses can also be combined with AND and OR operators.

SELECTdrink_nameFROMdrink_info
WHEREcalories>=30
ANDcalories<=60;


SELECTdrink_nameFROMdrink_info
WHEREcaloriesBETWEEN30AND60;

The second select using the BETWEEN keyword is equivalent to the first.
In addition there are wildcards, % meaning ‘any number of characters’ and _ meaning ‘one character’ which are accessed via the LIKE keyword:

SELECTfirst_nameFROMmy_contacts
WHEREfirst_nameLIKE‘%im’;

This first search will return ‘Tim’, ‘Slim’, and ‘Ephraim’.

SELECTfirst_nameFROMmy_contacts
WHEREfirst_nameLIKE‘_im’;

This second search will only return ‘Tim’. NULL is special, nothing equals NULL but you can check if something is NULL:

SELECTfirst_nameFROMmy_contactsWHEREflagISNULL;

Comparisons can be made to a list with the IN keyword:

SELECT drink_name FROM drink_info
WHERE rating IN ( ‘good’, ‘excellent’, ‘average’ );

Finally, the NOT operator can be used to find the inverse of the selection made, the NOT keyword goes directly after IN but otherwise goes after WHERE:

SELECTdrink_nameFROMdrink_info
WHEREratingNOT IN (‘good’,‘excellent’,‘average’);


SELECTfirst_nameFROMmy_contacts
WHERE NOTfirst_nameLIKE‘_im’;

Keywords: AND, OR, BETWEEN, IS NULL, NOT, LIKE, IN

Some notes on SQL: 1 – creation

These are some notes on SQL a language for creating and querying databases, I’m learning it because a couple of programming projects I have in mind for work and home will need it. The source for these notes is  Head First SQL book. I’ve used a previous book in this series and I quite like the presentational style. I’m using MySQL via it’s command line client to do the exercises, because it’s about as straightforward as you can get. The code is formatted for display using this SQL to HTML formatter. Notepad++ recognises SQL as a language and will do syntax highlighting, so I type my commands into it and copy them into the MySQL command line client.

SQL is quite an old language and the convention is to write keywords in block capitals (reminds me of FORTRAN!). Command sequences are terminated by a semi-colon.

To start, this sequence creates a database, sets it as active and then adds a table containing a range of fields of different types, the command DESC shows the layout of a table:


CREATE DATABASE my_database;
USE my_database
CREATE TABLE contacts
(
contact_id INT NOT NULL,
first_name VARCHAR(20),
last_name VARCHAR(20),
birthday DATE,
life_story BLOB,
weight DEC(3, 2) NOT NULL DEFAULT 80.00,
state_code CHAR(2),
appointment DATETIME
);

Desc contacts;



NOT NULL keywords are used if a field must be specified on INSERT. Once created data can be added to the table using the INSERT command:

INSERT INTO contacts (contact_id,first_name,last_name,birthday,life_story,weight,
appointment)
VALUES
(1,‘Ian’,‘Hopkinson’,‘1970-24-04’,‘A very long text string’,80.0,
’10:30am 2010-21-06′
);

INSERT can be used with no specified fields (in which case values for all supplied fields must be provided for all fields), or with a subset of fields. In order to add the ‘ character we can either use ” or \’ (where ” is two single quote characters, rather than a double quote).

To delete a table:

DROP TABLE contacts;

This command should be used with care since it deletes the table whether or not it contains data. The next post should be on the SELECT command. 

Of course you can find SQL cheatsheets elsewhere.

Keywords: CREATE, DATABASE, TABLE, USE, DESC, DROP, INSERT INTO, VALUES, INT, VARCHAR, BLOB, CHAR, DATETIME, DATE, DEC

A sceptical look at the economy

This blog post was written partly because I’d got fed up with hearing about how all cuts were evil, without hearing an alternative plan. It has ended up more a collection of interesting data sources, and some mild ranting.

First to define some terms: The gross domestic product (GDP) is a measure of the total economic output of the country. It’s handy because we can use it to compare any other number we come up with to see how big they are. £1billion may sound like a lot, but the GDP is £1.5trillion, so £1billion is a less than 0.1% of GDP.

The debt is the total amount of money that the state owes; the deficit is the annual gap between what the state takes in taxes and what it spends. A debt is sustainable in the long term but running an annual deficit above a certain size, for a period, is not sustainable. The deficit can be divided into two components: a cyclical component which goes up and down with current economic conditions and a structural component which is on top of this. The structural component is the bad bit. There is some dispute over the validity of this division since economic cycles are not easy to define.

As Mr Micawber says in David Copperfield: “Annual income twenty pounds, annual expenditure nineteen nineteen six, result happiness. Annual income twenty pounds, annual expenditure twenty pounds ought and six, result misery.”

I’m a big fan of the Guardian’s government spending chart (see the image at the top of page), it’s a rather pretty way of seeing where government spending goes. For the year 2008/9 the total spending is £621bn, in this year the Treasury gets an enormous, anomalous amount in financial stabilisation (18%) – this will not recur in future years. Beyond this there’s the Department of Health, spending about 18%, Department of Work and Pensions (22%) with the largest fraction of this going on the state pension, Department for Children, Schools and Families (10%). The key point here is that a very large chunk of the money spent is spent on things that people very vocally want (schools, health care, state pensions).

The figures for where tax comes from are perhaps a little surprising, extracting the data from table 1 in this report by the Institute for Fiscal Studies. The major chunks are shown in the piechart below:

The “other” category is made up of various minor indirect taxes (tobacco, alcohol etc) and capital taxes (3%). The surprising thing to me was the relatively low level of corporation tax. There seems to be evidence of tax avoidance by corporations amounting to something in the region of £10bn, but this would only be roughly 1.6% of the tax take.

By the way, as a physicist, I look down on piecharts!
The total tax take of around 36% of GDP puts the UK roughly in the middle of the OECD table of taxes, with countries like New Zealand and Germany taking very similar levels of tax, France, Italy and Finland taking rather more (at around 44%) and countries like Ireland, Switzerland and the United States taking rather less at ~30%. The full OECD data is here, and wikipedia has a sortable list for all the countries of the world. An interesting exercise is to consider this list, and think in which countries you might want to live.
Of income tax the top 1% of earners pay nearly 25% of all income tax, you can see the full breakdown in this article on the BBC website. Put another way, the 40% tax band covering perhaps 15% of tax payers provides nearly 40% of income (here). This does exclude National Insurance contributions though. I was surprised by these figures, I’d assumed that the relatively small number of higher tax rate payers would result in a much lower total take.

The net result of these incomings and outgoings is that we had a national debt of 68.1% of GDP at the end of 2009, and a total deficit of 11.4% of GDP. (According to the Office of National Statistics). According to the newly formed Office of Budget Responsibility  (table 4.5) the national debt (which they call net debt) is 53.5% this year and the deficit (net borrowing) is 11.1% of which 8.8% is structural. 

Numbers without context are meaningless: a priori I have no idea how these numbers for debt and deficit compare to the past. However, help is at hand: this report shows how they have varied over the past 100 or so years. I’ve copied the key figure for national debt below:

At the end of the second world war the UK had a national debt of around 250% of GDP, much larger than our current debt (and even our predicted debt over the next few years). Interestingly we see in the same report (figure 1) that the deficit is rarely negative (i.e income greater than expenditure), hovering around 2% (i.e. still a deficit) debt is still paid off via growth in the economy and inflation.

I suppose the purpose of all of this preamble is a discussion of cuts, or if you prefer tax increases. Prior to the election everyone seemed to agree on the size of the gap to be filled but none of the parties managed to fill more than 25% of the gap, as evidenced in this report by the Institute of Fiscal Studies, the Guardian’s data blog had a nice breakdown of the measures proposed by the three main parties. The major political point of departure was when cuts should start (not if cuts would start), and my view prior to the election was that whoever won broadly similar levels of cuts would be made although there was some evidence that the balance between taxation and cutting would be different depending on party but since none of them revealed (or had) much of their plan it’s rather difficult to say.  It is a very minority view that no cuts are required, although I see the unions are trying that one out today along with threats if there are any cuts.

YouGov,  for the Sunday Times, helpfully asked the public:

“The government has asked for public advice on where it should cut public spending. Which, if any, of the following areas do you think should be targeted for cuts? Please tick up to three”. 

And the public demonstrated that if you ask a stupid question, you get a stupid answer (or, being generous to the public, if you ask a question without providing contextual information you get a stupid answer). I considered trying to find a fancy way of presenting this information, but in a nutshell: by far the most popular area for cuts (61%) is in international aid whose total budget is 0.8% of total spending (i.e. pretty much the smallest bit of the budget you can find).

To be fair to the public, many of them will be working in the private sector and will have variously experienced pay cuts (or at least freezes), reduced working, redundancies, budget reductions and frozen recruitment and they may well be feeling it’s someone else’s turn.

You can experiment with cuts yourself with this handy tool from the FT, have a play and think about how you’d stand up and justify the cuts you’ve made. To paraphrase Polly Toynbee: “Don’t be young, old, vulnerable, one of our brave boys, sick etc”.

The alternative to cuts are tax increases, but nobody seems keen to talk about them. Prior to the election there was a report stating that the deficit was equivalent to about 6p on the basic rate of income tax. Proposals to raise tax were normally described as a “Tax on jobs” or “Death tax”, which is unhelpful to say the least. Another popular idea is to tax the bankers, one option here is the Tobin or Robin Hood Tax which puts a small tax (typically fractions of a percent) on every financial transaction, because there are very many of these transactions potentially the amount raised could be large this would seem to require international coordination and it isn’t clear where the money raised would be spent (climate change, international aid, fund for future bank collapses have all been suggested). The banking sector contributes approximately £70bn to GDP, or 6.8%. The structural deficit isn’t about any money spent rescuing banks though, it’s about an ongoing gap between spending and taxation.

My personal view is that we should be talking about taxation, and where the balance between cuts and increased taxation should lie (currently it looks like 80/20 cuts to taxes). There should be some discussion of where tax rises are best levied : “on someone else” isn’t really a proper answer. Income tax seems like the best place to me, probably at basic rate with uplift of the lower threshold to protect some of the least well off, but possibly lowering the threshold to the upper tax bracket. In the longer term making the public sector more flexible to economic hardship would be nice, this time there seems to have been much more flexibility in how companies have approached recession – not necessarily painless, but better than losing your job. One element of this could be variable pay in the public sector (or bonuses as we colloquially call it) this provides two things to an employer: the ability to vary pay when income to the company is poor and some decoupling of current salaries from pension entitlements (since bonuses are typically not counted towards pension payments).

So to end on a happy note: I propose bonuses for the public sector!

* Update: hat-tip to AlexConner who pointed out that it is Mr Micawber not Uriah Heep who is responsible for the quote from David Copperfield

Compare and contrast

I thought I might try describing my job as an academic in a physics department, and comparing that to my current work as an industrial scientist.

Some scene setting: in the UK undergraduates are students who study taught degree courses lasting 3-4 years, typically they start at age 18 or 19. Postgraduates are studying for PhD’s, research courses lasting 3-4 years (after which research councils start getting nasty). After PhD. level there are postdoctoral workers who typically do contract research lasting 2-3 years per contract – they may do multiple contracts at an institution but it’s a rather unstable existence. Permanent academic staff are lecturers, senior lecturers, readers and professors in increasing order of seniority/pay.

As a lecturer-level academic, the shape of the year revolves around teaching, if not the effort involved. Undergraduate students start their year in September, with breaks over Christmas and Easter followed by exams in May/June. The teaching year amounts to about 30 weeks. Should you be lecturing the students, you will spend time preparing and giving lectures; how long this takes depends on your conscientiousness, the number of times you have lectured the course and the number of other things you have to do. In addition you will probably give tutorials, small groups of students working through questions set by other lecturers, practical classes and manage final year undergraduate projects and literature surveys. Compared to a school teacher or further education college lecturer your “contact” time with students will be relatively low – maybe 10 hours a week.

Final year projects are of particular interest to you as a researcher since there’s always vigorous competition amongst academics to attract the best undergraduates to do PhD.’s as postgraduates. A final year project done by a good student can be an excellent way to try an idea out. To be fair to students though, their performance in a final project and talking about that final year project can be the strongest part of a CV – since it demonstrates the ability to work individually in an unknown area.

In between undergraduate teaching there’s grant application writing, doing research of your own, writing papers, and then, come the end of term, the possibility of conferences.

In the end it was the apparently endless futility of writing grant applications which did for me as an academic. My success rate was zero, furthermore I had this terrible feeling that even after successfully winning a grant I would struggle to recruit postdocs or PhD students to do the work and there was little chance that having started a fruitful line of research there would be a good chance of continuing it with further successful grants.

I was recruited to my current company by a recruitment agency, who found my webpage still hanging around at Cambridge University a couple of years after I had left. I didn’t actually end up doing the job they nominally recruited me for but what I do is relevant to my research background and can be rather interesting.

I turned up to my new workplace on the Friday before I started and was shown my desk – in a shared office. I did wonder at that point whether I had done the right thing, back in academia I had an office roughly the size of a squash court and could go days without seeing anyone. As it turns out sharing an office isn’t too bad, you get to find out what’s going on, but it can be a pain when your neighbour decides to have a long, detailed meeting next to you.

Another novel aspect to working in industry is that someone seems interested in my career within the company. In getting on for 15 years as an academic I can remember rarely ever talking about my career with anyone who might have influence on its direction whilst in a company it’s at least an annual occasion. It’s true that the company’s enthusiasm for management-speak can be excessive (and changeable) as new human resources fads come and go.

I get to go to lots of meetings.

Relevant to current discussions on the public sector we have regular restructuring, and in the past year or so: pay freezes, arbitrary cuts in travel budget mid-year, a change to pensions for new recruits, and redundancies – the latest round equivalent to losing about 15% of the people on the site I work at. It’s fair to say that we are not necessarily models of efficiency internally: I heard on the news that it takes 5 signatures for someone in the NHS to buy a new bed costing about £1000 – sounds about par for the course.

One noticeable difference is that largely I feel much more wanted, inasmuch that if I’m put on a project then the project leader will be keen to get some sort of intellectual exertion on my part and will even appear quite pleased when this is achieved. Even better, people for whom I do “a bit on the side” are even more grateful. This is a big difference from being an academic, where the odd student (undergraduate or postgraduate) may appreciate your efforts but largely nobody shows much sign of caring about your research.

Looking back on my time as an academic: I think I would have benefited from some sort of master plan and career direction. I’d quite liked to have carried on as a postdoc, i.e. actually doing research work rather than trying to manage other people doing research. However, this isn’t a career option and is a rather unstable existence.

That’s nice, dear

This blog post is about programming, for people that don’t program – at least that’s the effect I’m aiming for. The title is in recognition of my tolerant wife, The Inelegant Gardener, who has learnt the appropriate response to my enthusiastic displays of the results of my programming: “That’s nice, dear”!.

I started programming a long time ago – in around 1980, at the school computer club, when I was 10. Since then I’ve been taught odd bits of programming by scientists, and done quite a lot of programming as part of my scientific job. I’ve started to get more interested in proper software engineering in the last few years. This is a roundabout way of saying I am an enthusiastic amateur.

People associate programming with the mathematically minded, but this isn’t necessarily the case: the codebreakers at Bletchley Park, who were amongst the first users of electronic computers, had a range of skills – amongst them were linguists and crossword wizards. I was talking to a Fellow in linguistics, who’d helped write his college’s library software – as he pointed out: a very logical view of language is a great benefit for a programmer. Programming is about giving an idiot very exact instructions, if the instructions concern maths then you need to know maths – otherwise you don’t.

The core of programming is still what I learned years ago, data (numbers or letters) is stored in “variables” that have names. There are conditional statements: “If [something is true] Then [do this] or else [do the other]”. There are looping statements: “Do this 100 times”. And there are functions: “add 2 to this number, square it, add the number you first thought of and tell me the answer” or “how many times does the letter a occur in this sentence”.

These simple statements are being buried under an increasing depth of additional ideas. Since the 80’s the big thing in programming has been “object-orientation”. In object-orientated programming you package up data of a particular sort with functions that relate to that data. So if you had data modelling an octopus you would include functions such as “wave-tentacles” and “change colour”, such functions would be useless for data describing a horse. The real benefit to this is comprehending larger software systems, because a sea of functions and data is grouped together into logical islands. Beyond this there are design patterns – reoccurring systems of objects which I haven’t entirely go the hang of.

In addition to the changes in language, there are changes in the tools used to program: syntax highlighting is nice, it amounts to colouring the verbs, nouns and proper names in programming in different colours – makes it easier to spot mistakes. Auto-completion is another handy tool, in a well-designed language there are only a limited number of next possible statements when you are programming – auto-completion presents you with them as you type. Sites like Stackoverflow are great for asking programming questions, and there no end of function libraries available on the web to help you out.

I have a number of little software projects on the go, you can see them in much the same way as woodworking projects, suduko or crosswords: they keep me out the way, muttering quietly to myself and exercising my brain. It doesn’t matter that what I’m doing isn’t groundbreaking and new.

Programming does lead to some odd habits; when I started programming it was useful to know binary and hexadecimal number systems, as a consequence I believe that numbers such as 1024 and 128 are nice and round. I’ve come to appreciate a wide range of bracket styles [] (){} since they are all used for different things and the semi-colon is one of the most important pieces of punctuation in my life. If I program for too long in a stretch I start to forget how to speak to people.

And just to show off the results of my latest fiddlings: maps of the UK election results. I got interested in doing this just after the General Election. The Guardian has published a lovely spreadsheet of election results, including data on every single candidate. You see lots of maps of data of this sort, I wanted to know how it was done. (Technical details beyond the maps.)

First of all the gender of MP’s by constituency: constituencies represented by ladies are marked pink, those by men marked blue:

The black constituency in northern England is Thirsk and Malton, which held its election on 27th May, following the death of one of the candidates during the general election campaign.
The population of each constituency is also interesting, here I have coloured the constituencies with 9 different shades of green, the palest shade corresponds to a voting population of between 20,000 and 30,000, the darkest shade corresponds to a population of between 100,000 and 110,000:
The Western Isles (now known by it’s Gaelic name: Na h-Eileanan an Iar) has the smallest population at about 22,000 and the Isle of Wight has the largest population with just under 110,000 potential voters. I used ColorBrewer to find a nice set of colours.
Finally here’s a map of which party came second in each constituency in the 2010 General Election:
Red for Labour, blue for the Conservatives, orange for Liberal Democrats, yellow for Scottish Nationalists, pale green for Plaid Cymru, dark green for Sinn Fein, blue for Ulster Conservatives and Unionists, and there are a few independents and minor Northern Island parties which are all coloured white.

Footnotes

So the task is to get the spreadsheet data into a map: To get started I did a bit of memory trawling and googling, a couple of people have written about colouring in maps: this one uses shapefile format map data and the R programming language, whilst this one uses SVG format map data and Python (another programming language). It turns out the shapefile format data for constituencies is a little difficult to get – you have to fill in forms! However enterprising people on Wikipedia have made SVG format constituency maps available. SVG stands for Scaleable Vector Graphics, it’s an XML format which means it’s plaintext and there are standard means to extract data from it and manipulate it. The only real problem is that the constituency names in the spreadsheet don’t exactly match the names inside the SVG format map – I had to resort to some horrible constituency by constituency coding for a load of them. To do this I used the C# programming language, largely because Visual Studio Express C# is a very nice, free development environment which I’ve used before. To view the SVG maps inside my application I used the Webkit .NET library to provide a webbrowser control (which wraps up the rendering engine used in the Safari and Google Chrome browsers) – the native C# webbrowser control is based on Internet Explorer – which doesn’t render SVG. Output to bitmaps is a bit clumsy, Inkscape (a free SVG editor) wasn’t keen on displaying the original constituency map, so I resorted to viewing the map in Google Chrome and taking a screen shot (a terrible bodge).