Category: Book Reviews

Reviews of books, each featuring a summary of the book and links to related material

Book review: The Subterranean Railway by Christian Wolmar

To me the London underground is an almost magical teleportation system which brings order to the chaos of London. This is because I rarely visit London and know it only via Harry Beck’s circuit diagram map of the underground. To find out more about the teleporter, I have read The Subterranean Railway by Christian Wolmar.

London’s underground system was the first in the world, predating any other by nearly 40 years. Being first had drawbacks: for the first 30 years of its existence the system ran exclusively on steam engines, which are not well suited to an enclosed, underground environment. In fact travel in the early years of the Underground sounds really rather grim, despite its success.

The context for the foundation of the Underground was the burgeoning British rail network, which had started with a single line between Manchester and Liverpool in 1830; by 1850 the network spanned the country. It did not, however, penetrate to the heart of London, having been stopped by a combination of landowner interests and expense. This exclusion was enshrined in the report of the 1846 Royal Commission on Metropolis Railway Termini. It left London with an ever-growing transport problem, now compounded by the railways’ ability to get people to the perimeter of the city but no further.

The railways were the largest human endeavours since Roman times; as well as the engineering challenges there were significant financial challenges in raising capital and political challenges in getting approval. This despite the fact that railway projectors were exempted from the restrictions on raising capital from groups of more than five people introduced after the South Sea Bubble.

The first underground line, the Metropolitan, opened in 1863, running from Paddington to Farringdon. It had been 20 years in the making, although construction took only 3 years. The tunnels were made by the cut-and-cover method, which works as described: a large trench is dug, the railway is built in the bottom and the trench is then covered over. This meant the tunnels were relatively shallow, mainly followed the line of existing roads and involved immense disruption on the surface.

In 1868 the first section of the District line opened; this was always to be the Metropolitan’s poorer relative but would form part of the Circle line, finally completed in 1884 despite the animosity between James Staats Forbes and Edward Watkin, the heads of the respective companies at the time. It’s worth noting that it wasn’t until 1908 that the first London Underground maps were published; in its early days the underground “system” was the work of disparate private companies which were frequently at loggerheads and certainly not focussed on cooperating to the benefit of their passengers.

The underground railways rarely provided the returns their investors were looking for, but they had an enormous social impact: for the first time poorer workers in the city could live out of town in relatively cheap areas and commute in, something the railway companies positively encouraged. The Metropolitan also invested in property in what are now the suburbs of London; areas such as Golders Green were open fields before the underground came. This also reflects the expansion of the underground into the surrounding country.

The first deep line, the City and South London, opened in 1890; it was also the first electric underground line. The deep lines were tunnelled beneath the city using the tunnelling shield developed by Marc Brunel earlier in the 19th century. Following this first electrification the District and Metropolitan eventually electrified their own lines, although it took some time (and a lot of money). The finance for the District line came via the American Charles Tyson Yerkes, who would generously be described as a colourful character, engaging in the sort of financial engineering we might imagine to be a recent invention.

Following the First World War the underground was tending towards a private monopoly, and government was looking to invest to create work; ultimately the underground was nationalised, at arm’s length, to form London Transport in 1933, led by the same men (Lord Ashfield and Frank Pick) who had run the private monopoly.

The London underground reached its zenith in the years leading up to the Second World War, gaining its identity (roundel, font and iconic map) and forming a coherent, widespread network. After the war it was starved of funds and declined, overtaken by the private car. Further lines were added, such as the Victoria and Jubilee lines, but activity was much reduced from the early years.

More recently it has seen something of a revival with the ill-fated Public-Private Partnership running into the ground, but not before huge amounts of money had been spent, substantially on improvements. As I write, the tunnelling machines are building Crossrail.

I felt the book could have done with a construction timeline, something like this one on Wikipedia (link). Early on there is a barrage of new line openings, sometimes not in strictly chronological order, and to someone like me, unfamiliar with London, it is all a bit puzzling. Despite this The Subterranean Railway is an enjoyable read.

Book review: Clean Code by Robert C. Martin


This review was first published at ScraperWiki.

Following my revelations regarding sharing code with other people I thought I’d read more about the craft of writing code in the form of Clean Code: A Handbook of Agile Software Craftsmanship by Robert C. Martin.

Despite the appearance of the word Agile in the title this isn’t a book explicitly about a particular methodology or technology. It is about the craft of programming, perhaps encapsulated best by the aphorism that a scout always leaves a campsite tidier than he found it. A good programmer should leave any code they touch in a better state than they found it. Martin has firm ideas on what “better” means.

After a somewhat sergeant-majorly introduction in which Martin tells us how hard this is all going to be, he heads off into his theme.

Martin doesn’t like comments, he doesn’t like switch statements, he doesn’t like flag arguments, he doesn’t like multiple arguments to functions, he doesn’t like long functions, he doesn’t like long classes, he doesn’t like Hungarian* notation, he doesn’t like output arguments…

This list of dislikes generally isn’t unreasonable; for example comments in code are in some ways an anachronism from when we didn’t use source control and were perhaps limited in the length of our function names. The compiler doesn’t care about the comments and does nothing to police them so comments can be actively misleading (Guilty, m’lud). Martin prefers the use of descriptive function and variable names with a clear hierarchical structure to the use of comments.
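To illustrate the point about names doing the work of comments, here is a minimal sketch of my own, in Python rather than the book’s Java (the function and constant names are invented for the example):

```python
# Comment-reliant style: the comment duplicates the code and can
# silently drift out of date; here it already contradicts the test.
def check(v):
    # returns True if the user is old enough to vote
    return v >= 21   # 21 is a drinking age in some places, not a voting age


# The style Martin prefers: descriptive names carry the meaning,
# leaving nothing for the compiler to ignore and the reader to mistrust.
LEGAL_VOTING_AGE = 18

def is_old_enough_to_vote(age):
    return age >= LEGAL_VOTING_AGE
```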

The Agile origins of the book are seen in the strong emphasis on testing, particularly Test Driven Development. As a new convert to testing I learnt a couple of things here: clearly written tests are as important as clearly written code, and test coverage (how much of your code is exercised by tests) matters.
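As a flavour of what a clearly written test looks like, here is a small example of my own in Python with pytest, rather than the book’s Java and JUnit (the fizzbuzz module under test is hypothetical):

```python
# test_fizzbuzz.py: each test asserts one behaviour and is named for it,
# so a failure report reads like documentation.
from fizzbuzz import fizzbuzz   # hypothetical module under test


def test_multiples_of_three_become_fizz():
    assert fizzbuzz(9) == "Fizz"


def test_multiples_of_five_become_buzz():
    assert fizzbuzz(10) == "Buzz"


def test_multiples_of_fifteen_become_fizzbuzz():
    assert fizzbuzz(30) == "FizzBuzz"
```

Coverage of the kind mentioned above can then be measured by running the tests under a tool such as coverage.py.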

I liked the idea of structuring functions in a code file hierarchically and trying to ensure that each function operates at a single layer of abstraction. I’m fairly sold on the idea that a function should do one thing, and one thing only, although to my mind the difficulty is in the definition of “thing”.
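A minimal sketch of what I understood by this, with invented names: the top-level function reads as a list of steps, each of which hides its detail one layer down.

```python
def publish_report(raw_records):
    # Every line here sits at the same level of abstraction: no parsing,
    # counting or HTML detail leaks into the top-level story.
    records = clean(raw_records)
    summary = summarise(records)
    return render_as_html(summary)


def clean(raw_records):
    return [r.strip().lower() for r in raw_records if r.strip()]


def summarise(records):
    return {"count": len(records), "unique": len(set(records))}


def render_as_html(summary):
    return "<p>{count} records, {unique} unique</p>".format(**summary)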

It seems odd to use Java as the central, indeed only, programming language in this book. I find it endlessly cluttered by keywords used in the specification of functions and variables, so that any clarity in the structure and naming that the programmer introduces is hidden in the fog. The book also goes into excruciating detail on specific aspects of Java in a couple of chapters. As a testament to the force of the PEP8 coding standard, used for Python, I now find Java’s prevailing use of CamelCase visually disturbing!

There are a number of lengthy examples in the book, demonstrating code before and after cleaning with a detailed description of the rationale for each small change. I must admit I felt a little sleight of hand was involved here: Martin takes chunks of what he considers messy code, typically involving longish functions, and breaks them down into smaller functions; we are then typically presented with the highest-level function with its neat list of function calls. The tripling of the size of the code in function declaration boilerplate is elided.

The book finishes with a chapter on “[Code] Smells and Heuristics” which summarises the various “code smells” (as introduced by Martin Fowler in his book Refactoring: Improving the Design of Existing Code) and other indicators that your code needs a cleaning. This is the handy quick reference to the lessons to be learned from the book. 

Despite some qualms about the style, and the fanaticism of it all, I did find this an enjoyable read and felt I’d learnt something. Fundamentally I like the idea of craftsmanship in coding, and it fits with code sharing.

*Hungarian notation is the practice of prefixing variable names with a letter or letters indicating their type.

Book review: Chasing Venus by Andrea Wulf

I’ve been reading more of the adventurous science of the Age of Enlightenment; more specifically, Andrea Wulf’s book Chasing Venus: The Race to Measure the Heavens, on the scientific missions to measure the transits of Venus in 1761 and 1769.

Transits occur when a planet, typically Venus, lies directly between the earth and the sun. During a transit Venus appears as a small black disc on the face of the sun. Since Mercury’s orbit also lies inside that of the earth, it too transits the sun. Solar eclipses are similar, but in this case the obscuring body is the moon, and since it is much closer to earth it completely covers the face of the sun.

Transits of Venus occur in pairs, 8 years apart, with each pair separated by 100 or so years; they are predictable astronomical events. Edmond Halley predicted the 1761/1769 pair in 1716 and, in addition, proposed that the right type of observation would give a measure of the distance from the earth to the sun. Once this distance is known the distances of all the other planets from the sun can be calculated. Much as a solar eclipse can only be observed from a limited number of places on earth, so too can a transit of Venus. The observations required are the time at which Venus starts to cross the face of the sun (ingress) and the time at which it leaves (egress); these events are separated by several hours. In order to calculate the distance to the sun, observations must be made at widely separated locations.
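To make the geometry concrete, here is a toy calculation of my own (the numbers are mine, not Wulf’s, and this is not the method as practised, which recovered the shift from ingress and egress timings rather than measuring it directly): two observers a known baseline apart see Venus projected onto slightly different parts of the sun’s face, and the size of that shift fixes the earth-sun distance.

```python
import math

AU_TRUE_KM = 149.6e6   # modern value, used here only to fabricate a "measurement"
baseline_km = 10_000   # assumed north-south separation of the two observers
d_venus_au = 0.28      # earth-Venus distance at a transit, in astronomical units

# Venus's parallax exceeds the sun's, so its disc appears shifted against
# the sun's face by an angle of baseline/AU * (1/d_venus - 1/d_sun) radians.
measured_shift = (baseline_km / AU_TRUE_KM) * (1 / d_venus_au - 1 / 1.0)

# An astronomer who measured that shift could invert the relation to get the AU.
au_estimate_km = (baseline_km / measured_shift) * (1 / d_venus_au - 1 / 1.0)

print(f"shift on the sun's face: {math.degrees(measured_shift) * 3600:.0f} arcseconds")
print(f"recovered earth-sun distance: {au_estimate_km / 1e6:.1f} million km")
```

The shift comes out at a few tens of arcseconds, which gives a sense of why timing errors of tens of seconds (see below) mattered so much.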

These timings had to be globally calibrated: someone in, say, London had to be able to convert the times measured in Tahiti to London time. This amounts to knowing precisely where the measurement was made; it is the problem of the longitude. At this time the problem of the longitude was solved, given sufficient time, for land-based locations. It was still a challenge at sea.

At the time of the 1761/69 transits globe-spanning travel was no easy matter: when Captain Cook landed on Tahiti in 1769 his was only the third European vessel to have done so, the other ships having arrived in the two previous years; travel to the East Indies, although regular, was still hazardous. Even travel to the far north of Europe was a challenge, as was the journey across Russia to the extremes of Siberia. Therefore much of the book is given over to stories of long, arduous travel, not infrequently ending in death.

Most poignant for me was the story of Jean-Baptiste Chappe d’Auteroche, who managed to observe the entirety of both transits, in Siberia and California, but died of typhus shortly after observing the lunar eclipse critical to completing his observations of Venus. His fellow Frenchman, Guillaume Joseph Hyacinthe Jean-Baptiste Le Gentil, observed the first transit on board a ship on the way to Mauritius (his measurements were useless), remained in the area of the Indian Ocean until the second transit, which he failed to observe because of cloud cover, and returned to France after 10 years to find his relatives had declared him dead and the Académie des Sciences had ceased to pay him, assuming the same. Charles Green, observing for the Royal Society from Tahiti with Captain Cook and Joseph Banks, died after falling ill in Jakarta (then Batavia) after he had made his observations.

The measurements of the first transit in 1761 were plagued by uncertainty: astronomers had anticipated that they would be able to measure the times of ingress and egress with high precision but found that even observers at the same location with the same equipment measured times differing by tens of seconds. We often see sharp, static images of the sun, but viewed live through a telescope the picture is quite different; particularly close to the horizon, the sun boils and shimmers. This is a result of thermal convection in the earth’s atmosphere, and is known as “seeing”. It’s not something I’d appreciated until I’d looked at the sun myself through a telescope. This “seeing” is what caused the problems with measuring the transit times: the disc of Venus did not cross a sharp boundary onto the face of the sun, it slid slowly into a turbulent mess.

The range of calculated earth-sun distances from the 1761 measurements was 77,100,000 to 98,700,000 miles, which spans the modern value of 92,960,000 miles but represents a 22% spread. By 1769 astronomers had learned from their experience: the central estimate for the earth-sun distance by Thomas Hornsby was 93,726,000 miles, a discrepancy of less than 1% compared to the modern value, and the range of the 1769 measurements was 4,000,000 miles, only 4% of the earth-sun distance.

By the time of the second transit there was a great deal of political and public interest in the project. Catherine the Great was very keen to see Russia play a full part in the transit observations; in England George III directly supported the transit voyages, and other European monarchs were equally keen.

Chasing Venus shares its theme with a number of books I have reviewed previously: The Measure of the Earth, The Measure of All Things, Map of a Nation, and The Great Arc. The first two of these are on the measurement of the size, and to a degree the shape, of the Earth: the first in Ecuador in 1735, the second in revolutionary France. The Great Arc and Map of a Nation are the stories of the mapping by triangulation of India and Great Britain. In these books it is the travel, and the difficult conditions, that form the central story. The scientific tasks involved are simply explained, although they were challenging to conduct with accuracy at the time and technically complex in practice.

There is a small error in the book which caused me initial excitement: the first transit of Venus was observed in 1639 by Jeremiah Horrocks and William Crabtree, Horrocks being located in Hoole, Cheshire according to Wulf. Hoole, Cheshire is a suburb of Chester about a mile from where I am typing this. Sadly, Wulf is wrong: Horrocks appears to have made his observations either at Carr House in Bretherton or at Much Hoole (a neighbouring village), both in Lancashire and 50 miles from where I sit.

Perhaps unfairly, I found this book a slightly repetitive list of difficult journeys, conducted first in 1761 and then in 1769. It brought home to me the level of sacrifice involved in these early scientific missions, and indeed in global trade: at the least, separation from one’s family for extended periods, and quite often death.

Posting abroad: my book reviews at ScraperWiki

It’s been a bit quiet on my blog this year, partly because I’ve got a new job at ScraperWiki. This has reduced my blogging for two reasons: the first is that I am now much busier, and the second is that I write for the ScraperWiki blog. I thought I’d summarise here what I’ve done there, just to keep everything in one place.

There’s a lot of programming and data science in my new job, so I’ve been reading programming and data analysis books on the train into work. The book reviews are linked below:

I seem to have read quite a lot!

Related to this is a post I did on Enterprise Data Analysis and visualisation: An interview study, an academic paper published by the Stanford Visualization Group.

Finally, I’ve been on the stage – or at least presenting at a meeting – I spoke at Data Science London a couple of weeks ago about Scraping and Parsing PDF files. I wrote a short summary of the event here.


Book review: Natural Language Processing with Python by Steven Bird, Ewan Klein & Edward Loper


This post was first published at ScraperWiki.

I bought Natural Language Processing with Python by Steven Bird, Ewan Klein & Edward Loper for a couple of reasons. Firstly, ScraperWiki are part of the EU Newsreader Project, which seeks to make a “history recorder” using natural language processing to convert large streams of news articles into a more structured form. ScraperWiki’s role in this project is to scrape open sources of news-related material, such as parliamentary records, and to drive exploitation of the results of this work both commercially and through our contacts in the open source community. Although we’re not directly involved in the natural language processing work, it seems useful to get a better understanding of the area.

Secondly, I’ve recently given a talk at Data Science London, and my original interpretation of the brief was that I should talk a bit about natural language processing. I know little of this subject so thought I should read up on it; as it turned out, no natural language processing was required on my part.

This is the book of the Natural Language Toolkit Python library, which contains a wide range of linguistic resources, methods for processing those resources, methods for accessing new resources and small applications to give a user-friendly interface for various features. In this context “resources” means the full text of various books, corpora (large collections of text which have been marked up to varying degrees with grammatical and other data) and lexicons (dictionaries and the like).
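Getting at these resources is pleasingly direct; the following is standard NLTK usage (the corpus download is a one-off step):

```python
import nltk
nltk.download("gutenberg")          # fetch the corpus the first time round

from nltk.corpus import gutenberg

emma = gutenberg.words("austen-emma.txt")
print(len(emma))                    # number of word tokens in Austen's Emma
print(" ".join(emma[:8]))           # the opening tokens of the text
```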

Natural Language Processing is didactic: it is intended as a text for undergraduates, with extensive exercises at the end of each chapter. As well as teaching the fundamentals of natural language processing it also seeks to teach readers Python. I found this second theme quite useful; I’ve been programming in Python for quite some time but my default style is FORTRANic. The authors are a little scornful of this approach: they present some code I would have been entirely happy to write and describe it as little better than machine code! Their presentation of Python starts with list comprehensions, which is unconventional, but goes on to cover the language more widely.
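For anyone who hasn’t met list comprehensions, the contrast looks something like this (my own example, not the book’s):

```python
words = ["colourless", "green", "ideas", "sleep", "furiously"]

# The FORTRANic loop I would have written unprompted...
lengths = []
for word in words:
    lengths.append(len(word))

# ...and the comprehensions the authors lead with.
lengths = [len(word) for word in words]
long_words = [word for word in words if len(word) > 5]
```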

The natural language processing side of the book progresses from the smallest language structures (the structure of words), through part-of-speech labelling and phrases, to sentences and ultimately to deriving logical statements from natural language.

Perhaps surprisingly, tokenization and segmentation, the processes of dividing text into words and sentences respectively, are not trivial. For example, acronyms may contain full stops which are not sentence terminators. Less surprisingly, part-of-speech (POS) tagging (i.e. labelling words as verbs, nouns, adjectives and so forth) is more complex, since words become different parts of speech in different contexts. Even experts sometimes struggle with part-of-speech labelling. The process of chunking, identifying noun and verb phrases, is of a similar character.
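The toolkit wraps these tasks in one-liners; this is the standard NLTK API (the tokenizer and tagger models are downloaded once):

```python
import nltk
nltk.download("punkt")                       # sentence/word tokenizer models
nltk.download("averaged_perceptron_tagger")  # POS tagger model

text = "Dr. Smith arrived late. The meeting had already begun."
sentences = nltk.sent_tokenize(text)  # 2 sentences; the stop in "Dr." is not a break
tokens = nltk.word_tokenize(sentences[0])
print(nltk.pos_tag(tokens))
# e.g. [('Dr.', 'NNP'), ('Smith', 'NNP'), ('arrived', 'VBD'), ('late', 'RB'), ('.', '.')]
```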

Both chunking and part-of-speech labelling are tasks which can be handled by machine learning. The zero-order POS labeller assumes everything is a noun; the next simplest method is a majority-voting one, which assigns each word its most frequent tag in an already labelled body of text, with n-gram variants also taking the POS tags of the previous word(s) into account. Beyond these are the machine learning algorithms which take feature sets, including the tags of neighbouring words, to provide a best estimate of the tag for the word of interest. These algorithms include Bayesian classifiers, decision trees and the like, as discussed in Machine Learning in Action, which I have previously reviewed. Natural Language Processing covers these topics fairly briefly but provides pointers for taking things further, in particular highlighting that for performance reasons one may want to use external libraries alongside the Natural Language Toolkit.
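This progression maps directly onto NLTK classes. A sketch using the Brown corpus, with a backoff chain from the zero-order labeller upwards (note that the accuracy method is called evaluate in older NLTK releases):

```python
import nltk
nltk.download("brown")

from nltk.corpus import brown

tagged = brown.tagged_sents(categories="news")
train, test = tagged[:4000], tagged[4000:]

everything_is_a_noun = nltk.DefaultTagger("NN")                    # zero-order labeller
unigram = nltk.UnigramTagger(train, backoff=everything_is_a_noun)  # most frequent tag per word
bigram = nltk.BigramTagger(train, backoff=unigram)                 # conditions on the previous tag

for tagger in (everything_is_a_noun, unigram, bigram):
    print(tagger.accuracy(test))   # accuracy climbs with each layer
```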

The final few chapters on context free grammars exceeded the limits of my understanding for casual reading, although the toy example of using grammars to translate natural language queries to SQL clarified the intention of these grammars for me. The book also provides pointers to additional material, and to where the limits of the field of natural language processing lie.
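The grammar chapters do at least start gently; here is a toy grammar and parser in the book’s style, using the real nltk.CFG and ChartParser interfaces:

```python
import nltk

grammar = nltk.CFG.fromstring("""
    S  -> NP VP
    NP -> Det N
    VP -> V NP
    Det -> 'the'
    N  -> 'dog' | 'cat'
    V  -> 'chased'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("the dog chased the cat".split()):
    print(tree)
    # (S (NP (Det the) (N dog)) (VP (V chased) (NP (Det the) (N cat))))
```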

I enjoyed this book and recommend it; it’s well written, with a style which is just the right level of formality. I read it on the train so didn’t try out as many of the code examples as I would have liked – more of this in future. You don’t have to buy this book, since it is available online in its entirety, but I think it is well worth the money.