Book review: A History of the 20th Century in 100 Maps by Tim Bryars and Tom Harper

A History of the 20th Century in 100 Maps by Tim Bryars and Tom Harper is similar in spirit to The Information Capital by James Cheshire and Oliver Uberti: a coffee table book pairing largely geographic visualisations with relatively brief text. Looking back I see that The Information Capital was a birthday present from Mrs SomeBeans; 100 Maps was a Christmas present from the same!

The book is published by the British Library. The maps are wide ranging in scale, scope and origin. Ordnance Survey maps of the trenches around the Battle of the Somme nestle against tourist maps of the Yorkshire coast, and tourist maps of Second World War Paris produced for the Nazi occupiers. The military maps are based on the detailed work of organisations like the Ordnance Survey, whilst other maps are cartoons of varying degrees of refinement, “crude” being particularly relevant in the case of the Viz map of Europe!

War and tourism are recurring themes of the book. In war, maps are used to plan battles, to conduct post mortems, and as propaganda. It is not uncommon to see newspaper cartoon style maps with countries represented by animals, or breeds of dog. In tourism, maps are used as marketing – the presentation and “incidental” details are at least as important as the navigation. Sometimes the “tourism” relates to state events. Not quite fitting into either war or tourism is the crudest map in the book: a sketch of Antarctica on the back of a menu card, made by Shackleton as he tried to persuade his neighbour at a formal dinner to help fund his expedition.

A third category is the economic map showing, for example, the distribution of new factories around the UK in the post-war period or the exploitation of oil in the Caribbean or the North Sea. These maps were published sometimes for education, and sometimes as part of a prospectus for investment or as a tool of persuasion.

The focus of the book is really on the history of Britain rather than maps; this is a useful bait-and-switch for people like me who are ardent followers of maps but not so much of history. I was struck by the impact that the two world wars of the 20th century had on the UK. Not simply the effect of war itself but the rise of women in the workforce, independence movements in the now former colonies, homes for heroes, nationalisation – the list goes on. It seems the appetite for all these things was generated during the First World War but they were not satisfied until after the Second. The First World War also saw the introduction of prohibitions on opium and cocaine; prior to this they were seen as the safe recreation of the upper middle classes, but during the war they were seen as a risk to the war effort. America had led the way with the prohibition of drugs, and later alcohol, seeing them as vices of immigrants and the working classes.

I didn’t realise there were no controls on immigration to the UK until the 1905 Aliens Act; this nugget is found in a commentary on a map entitled “Jewish East London”. This map was based on Charles Booth’s social maps of London and published by the Toynbee Trust. The Jewish population had been increasing over the preceding 25 years following pogroms and anti-Semitic violence in the Russian Empire. The Trust published the map alongside articles for and against immigration controls but, as the authors point out, the map is designed with its use of colouring and legends to make the case against.

100 Maps is not all about the real world: E.H. Shepard’s map of the Hundred Acre Wood makes it in, as well as Tolkien’s map of Middle Earth. In both cases the map provides an overview of the story inside. I rather like the Hundred Acre Wood map, with “Eeyores Gloomy Place – rather boggy and sad” and a compass rose which spells out P-O-O-H.

The book is well-made and presented but in these times of zoomable online maps physical media struggle to keep up. The maps are difficult to study in detail at the A4 size in which they are presented. 100 maps makes some comment on the technical evolution of maps, not so much in the collection of data, but in the printing and presentation process. It is only in the final decades of the century that full colour printed maps become commonplace. It is around 1990 that the Ordnance Survey becomes fully digital in its data collection.

All in all a fine present – thank you Mrs SomeBeans!

Review of the year: 2015

Another year comes to an end and it is time to write my annual review. As usual my blog has been a mixture, with book reviews the most frequent item. I also wrote a bit about politics and some technology blog posts. You can see a list of my posts this year on the index page. My technology blog posts are about programming, and the tools that go with it – designed as much to remind me of how I did things as anything else.

My most read blog post written this year was a technical one on setting up Docker to work on a Windows 10 PC – it appears to have gone out in an email to the whole Docker community. For the non-technical reader, Docker is like a little pop-up workshop which a programmer can take with them wherever they go; all their familiar tools will be found in their Docker container. It makes sharing the development of software, and deploying it in different places, much easier.

Actually, my most read blog post overall was the review of my telescope, which I wrote a few years ago – it clearly has enduring appeal! Sadly, I haven’t made much use of my telescope recently but I did reuse my experience to photograph the partial eclipse, visible in north west England in March. I took a whole pile of photographs and wrote a short blog post. It is a montage of my eclipse photos which graces the top of this post. I think the surprising thing for me was how long the whole thing took.

In book reading there was a mixture of technical books, which I read in relation to my work and because I am interested. My favourite of these was High Performance Python by Micha Gorelick and Ian Ozsvald, which led me to think more deeply about my favoured programming language. I also read a number of books relating to the history of science. The Values of Precision, edited by M. Norton Wise, stood out – an edited collection about the evolution of precision in the sciences, starting with population studies in pre-Revolutionary France. Many of the themes spoke directly to my experience as a scientist, and it was interesting to read about them from the point of view of historians. Andrea Wulf’s biography of Alexander von Humboldt was also very good.

There was a General Election this year, which led to a little blogging on my part and then substantial trauma (as a Liberal Democrat). I stood for the local council in the “Chester Villages” ward, where I beat UKIP and the Greens (full results here); sadly the Chester Liberal Democrats lost their only seat on the Cheshire West and Chester Council.

I did a couple of little technical projects for my own interest over the year. I made my London Underground – Can I walk it? tool, which helps the user decide whether to walk between London Underground stations, the distance between them often being surprisingly short in the central part of London. The distinguishing features of this tool are that it is dynamic, and that it covers walking distances which are not just between nearest neighbours on the current line. You can find the website here. This little project incorporated a number of bits of technology I’d learnt about over the past few years, and featured help from David Hughes on the design side – you can see the result below.

[Screenshot of the London Underground – Can I walk it? tool]
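
At its heart a tool like this only needs the straight-line distance between two stations’ coordinates, inflated a little to allow for real streets. Here is a minimal Python sketch under my own assumptions – the coordinates, the 1.3 “wiggle factor” and the 5 km/h walking speed are purely illustrative, not taken from the actual site:

```python
import math

# Illustrative sketch only: station coordinates, the "wiggle factor" and the
# walking speed are assumptions, not values from the Can I walk it? site.
STATIONS = {
    "Covent Garden": (51.5129, -0.1243),
    "Leicester Square": (51.5113, -0.1281),
}

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * 6371 * math.asin(math.sqrt(h))

def walking_minutes(name1, name2, wiggle=1.3, speed_kmh=5.0):
    """Rough walking time, inflating the straight-line distance for real streets."""
    km = haversine_km(STATIONS[name1], STATIONS[name2]) * wiggle
    return 60 * km / speed_kmh

print(f"{walking_minutes('Covent Garden', 'Leicester Square'):.0f} minutes on foot")
```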

My second project was looking at the recently released LIDAR data from the Environment Agency, which I wrote about here. LIDAR is a laser technique for determining the height of the land surface (or buildings, if they are in the way) to a high resolution – typically 1 metre but down to 25cm in some places. The data cover about 85% of England. The Environment Agency use the data to help plan flood and coastal defences, amongst other applications. I had fun overlaying the LIDAR imagery onto maps and rendering it in 3D; below you can see St Paul’s Cathedral.
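
If you want a quick look at the data yourself, a single tile needs nothing more than NumPy and matplotlib. A rough sketch, assuming the tile has been downloaded as a plain-text Esri ASCII grid (.asc) – the filename is a placeholder, and the six header lines and the -9999 NODATA value are that format’s usual conventions rather than anything specific to my data:

```python
import numpy as np
import matplotlib.pyplot as plt

# Rough sketch: load one LIDAR tile and draw it as a shaded height map.
# "my_tile.asc" is a placeholder filename; Esri ASCII grids carry six header
# lines (ncols, nrows, xllcorner, yllcorner, cellsize, NODATA_value) before
# the height values themselves.
heights = np.loadtxt("my_tile.asc", skiprows=6)
heights[heights == -9999] = np.nan        # mask the usual NODATA value

plt.imshow(heights, cmap="terrain")
plt.colorbar(label="Height above sea level (m)")
plt.title("Environment Agency LIDAR tile")
plt.show()
```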

I changed job in the Autumn, moving from ScraperWiki in Liverpool to GB Group in Chester. In my new job I’m spending my days playing with data, and attending virtually no meetings – so all good there! Also my commute to work is a 25 minute cycle which I really enjoy. But I really value the experience I got at ScraperWiki. At a startup with an open source mentality I learnt lots of new things and could talk about them. I also got to work with some really interesting customers. It brought home to me how difficult it is to make a business work: it’s not enough just to do something clever – somebody has to pay you enough to do it – and that’s actually the really hard part.

I wrote the now obligatory holiday blog post. We stayed in Portinscale, just outside Keswick, for our holiday, at a time when the weather was rather better. The highlight of the trip for me was the Threlkeld Mining Museum, a place where older men collect old mining equipment for their own entertainment and that of small children. Allan Bank in Grasmere was a close second – a laid back, hippy commune style National Trust property. Below you can see a view across Derwent Water to Catbells from Keswick.

A couple of things I haven’t blogged about: I started running in May, and since then I’ve gone from running 5km in 34 minutes to 5km in 24 minutes; I’ve also lost 10kg. I should probably write a blog post about this, since it involves data collection. There are some technical bits and pieces I’d quite like to write about (Python modules and sqlite), either because I use them so often or because they’ve turned out to be useful. The other thing I haven’t written about is my CBT.

Book review: Risk Assessment and Decision Analysis with Bayesian Networks by N. Fenton and M. Neil

As a new member of the Royal Statistical Society, I felt I should learn some more statistics. Risk Assessment and Decision Analysis with Bayesian Networks by Norman Fenton and Martin Neil is certainly a mouthful, but despite its dry title it is a remarkably readable book on Bayes’ theorem and how it can be used in risk assessment and decision analysis via Bayesian networks.

This is the “book of the software”: the reader gets access to the “lite” version of the authors’ AgenaRisk software, the website is here. The book makes heavy use of the software, both in presenting Bayesian networks and in the features discussed. This is no bad thing; the book is about helping people who analyse risk or build models to do their job, rather than providing a deeply technical presentation for those who might be building tools or doing research in the area of Bayesian networks. With access to AgenaRisk the reader can play with the examples provided and make a rapid start on their own models.

The book is divided into three large sections. The first six chapters provide an introduction to probability and to the assessment of risk (essentially working out the probability of a particular outcome). The writing is pretty clear; I think it’s the best explanation of the null hypothesis and p-values that I’ve read. The notorious “Monty Hall” problem is introduced, and the section then goes into Bayes’ theorem in more depth.

Bayes’ theorem originates in the writings of the Reverend Thomas Bayes, published posthumously in 1763. It concerns conditional probability, that is to say the likelihood that a hypothesis H is true given evidence E, written P(H|E). The core point is that we often have the inverse of what we want: an understanding of the likelihood of the evidence given a hypothesis, P(E|H). Bayes’ theorem gives us a route to calculate P(H|E) from P(E|H), P(E) and P(H). The second benefit here is that we can codify our prejudices (or not) using priors. Other techniques deny the existence of such priors.
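
Written out (this is just the standard statement of the theorem, not a quotation from the book), the relationship is

$$P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E)}$$

so, taking purely illustrative numbers, if P(H) = 0.01, P(E|H) = 0.9 and P(E) = 0.1 then the posterior P(H|E) works out at 0.9 × 0.01 / 0.1 = 0.09.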

Bayesian statistics are often put in opposition to “frequentist” statistics. This division is sufficiently pervasive that when you start to type “frequentist”, Google autocompletes it with “vs Bayesian”! There is also an xkcd cartoon. Fenton and Neil are Bayesians and put the Bayesian viewpoint. As a casual observer of this argument I get the impression that the Bayesian view is prevailing.

Bayesian networks are structures (graphs) in which we connect together multiple “nodes” of Bayes’ theorem. That is to say, we have multiple hypotheses with supporting (or not) evidence which lead to a grand “outcome” or hypothesis. Such a grand outcome might be the probability that someone is guilty in a criminal trial, or that your home might flood. These outcomes are conditioned on multiple pieces of evidence, or events, that need to be combined. The neat thing about Bayesian networks is that we can plug in what data we have to make estimates of the things we don’t know – regardless of whether or not they are the “grand outcome”.
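
To give a flavour of the machinery, here is a tiny two-node network solved by brute-force enumeration in Python – my own illustration rather than anything from the book or AgenaRisk, with made-up probabilities:

```python
# Tiny "Rain -> Flood" Bayesian network, solved by brute-force enumeration.
# All probabilities are invented for illustration.
p_rain = {True: 0.2, False: 0.8}                 # prior P(Rain)
p_flood_given_rain = {True: 0.3, False: 0.02}    # CPT: P(Flood=True | Rain)

def p_flood():
    """Marginal P(Flood=True), summing over the unknown parent."""
    return sum(p_rain[r] * p_flood_given_rain[r] for r in (True, False))

def p_rain_given_flood():
    """Posterior P(Rain=True | Flood=True), via Bayes' theorem."""
    return p_rain[True] * p_flood_given_rain[True] / p_flood()

print(f"P(Flood)        = {p_flood():.3f}")              # 0.076
print(f"P(Rain | Flood) = {p_rain_given_flood():.3f}")   # 0.789
```

Real networks have many more nodes, and the brute-force sum over every unknown quickly becomes infeasible – which is where the computational developments mentioned below come in.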

The “Naive Bayesian Classifier” is a special case of a Bayesian network in which the nodes are all independent, leading to a simple hub and spoke network.
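
A concrete example of the hub and spoke shape is spam filtering, with the class at the hub and each word as an independent spoke – again a sketch of my own with invented numbers, not an example from the book:

```python
# Naive Bayes "hub and spoke": class node (spam) with independent word nodes.
# All numbers are invented for illustration.
p_spam = 0.4                                          # prior P(spam)
p_word_given_spam = {"offer": 0.6, "meeting": 0.1}
p_word_given_ham = {"offer": 0.05, "meeting": 0.5}

def p_spam_given_words(words):
    """P(spam | words), assuming words are independent given the class."""
    like_spam, like_ham = p_spam, 1 - p_spam
    for w in words:
        like_spam *= p_word_given_spam[w]
        like_ham *= p_word_given_ham[w]
    return like_spam / (like_spam + like_ham)

print(f"{p_spam_given_words(['offer']):.3f}")   # ~0.889: "offer" looks spammy
```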

Bayesian networks were relatively little used until computational developments in the 1980s meant that arbitrary networks could be “solved”. I was interested to see David Spiegelhalter’s name appear in this context; arguably he is one of the few publicly recognisable mathematicians in the UK.

The second section, covering four chapters, goes into some practical detail on how to construct Bayesian networks. This includes recurring idioms in Bayesian networks, which the authors name the cause-consequence idiom, the measurement idiom, the definitional/synthesis idiom and the induction idiom. The idea is that when you address a problem, rather than starting with a blank sheet of paper, you select the appropriate idiom as a starting point. The typical problem is that the “node probability tables” can quickly become very large for a carelessly constructed Bayesian network – a node with four parents, each taking five states, already needs 5^4 = 625 columns in its table – and the book’s idioms help reduce this complexity.

Along with the idioms, this section also covers how ranked and continuous scales are handled, and in particular the use of dynamic discretization schemes for continuous scales. There is also a discussion of confidence levels which highlights the difference in thinking between Bayesians and frequentists: essentially the Bayesians are seeking the best answer given the circumstances, whilst the frequentists are obsessing about the reliability of the evidence.

The final section of three chapters gives some concrete examples in specific fields: operational risk, reliability and the law. Of these I found the law examples the most pertinent. Bayesian analysis fits very comfortably with legal cases: in theory, a legal case is about assigning a probability to the guilt or otherwise of a defendant by evaluating the strength (or probability of being true) of the evidence. In practice one gets the impression that faulty “common sense” can prevail in emotive cases, and that experts in Bayesian analysis are only brought in at appeal.

I don’t find this surprising; you only have to look at the amount of discussion arising from the Monty Hall problem to see that even “trivial” problems in probability can be remarkably hard to reason clearly about. I struggle with this topic myself, despite substantial mathematical training.

Overall, a readable book on a complex topic. If you want to know about Bayesian networks and want to apply them then it is definitely worth getting, but it is not an entertaining book for the casual reader.

Book review: Spark GraphX in Action by Michael S. Malak and Robin East

I wrote about Spark not so long ago when I reviewed Learning Spark; at the time I noted that Learning Spark did not cover the graph processing component of Spark, GraphX. Spark GraphX in Action by Michael S. Malak and Robin East fills that gap.

I read the book via Manning’s Early Access Program (MEAP): they approached me and gave me access to the whole book for free. This meant I read it on my Kindle, which I tend not to do these days for technical books because I still find paper a more flexible medium. Early Access means the book is still a little rough around the edges, but it is complete.

The authors suggest that readers should be comfortable reading Scala code to enjoy the book. Scala is the language Spark is written in, and the best way to access GraphX. In fact access via Python (my favoured route) is impossible, and via Java it sounds ugly. Scala is a functional language which runs on the Java virtual machine. It seems to be motivated by a desire to remove Java’s verbosity, but perhaps goes a little too far. The `return` keyword is not needed to identify the return value of a function – the last expression evaluated is what gets returned. Its affectation is to overload the meaning of the underscore `_`. As it was, I felt comfortable enough reading Scala code. I was interested to read that the two “variable” definitions are `val` and `var`: `val` is immutable and is preferred, `var` is mutable. This is probably a lesson for my Python programming – immutable “variables” can provide higher performance (and using immutability for things that you intend to be immutable aids clarity and debugging).

From the point of view of someone who has read about Spark and graph theory in the past, the book is pitched at the right level: there is some introductory material about Spark and also about graph theory, and then a set of examples. The book finishes with some material on inspecting running jobs in Spark using the Spark web interface. If you have never heard of Spark, then this book probably isn’t a good place to start.

The examples start with basic algorithms: measuring shortest paths across a graph, connectedness, and the PageRank algorithm on which Google was originally built. These are followed by simple implementations of some further algorithms, including shortest paths with weighted edges (essential for route finding) and the travelling salesman problem. There then follows a chapter on some machine learning algorithms, including recommendation engines, spam detection and document clustering. Where appropriate the authors cite the original papers for algorithms, including PageRank, Pregel (Google’s graph processing framework) and SVD++ (a key component of the winning entry for the Netflix recommendation prize), which is very welcome. The examples are outlines rather than full implementations of these sophisticated algorithms.
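
For a sense of how simple the heart of PageRank is, here is the basic power iteration in NumPy on a made-up four-page graph – an illustration of the algorithm itself rather than of the GraphX implementation, with 0.85 as the conventional damping factor:

```python
import numpy as np

# Illustrative sketch of the PageRank iteration on a tiny made-up graph;
# edges point from a page to the pages it links to.
links = {0: [1, 2], 1: [2], 2: [0], 3: [0, 2]}
n, d = 4, 0.85                      # number of pages, damping factor

# Column-stochastic transition matrix: M[j, i] = probability of moving i -> j.
M = np.zeros((n, n))
for i, outs in links.items():
    for j in outs:
        M[j, i] = 1.0 / len(outs)

rank = np.full(n, 1.0 / n)          # start from a uniform distribution
for _ in range(100):                # power iteration until (roughly) converged
    rank = (1 - d) / n + d * M @ rank

print(np.round(rank, 3))            # higher rank = more "important" page
```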

Finally, there is a chapter titled “The Missing Algorithms”. This is more a discussion of utility functions for GraphX, in terms of importing from other formats such as RDF, and operations such as merging two graphs or trimming away stray vertices.

The book gives the impression that GraphX is not ready for the big time yet: in a couple of places the authors say “this bit has only just started working”, and when they move on to talking about using SVD++ in GraphX they explain that the algorithm is only half implemented there. Full implementations are available in other languages.

It seemed to me on my original reading about Spark that the big benefit was that you could write machine learning systems in a familiar language which ran on a single machine in Spark, and then scale up effortlessly to a computing cluster, if required. Those benefits are not currently present in GraphX: you need to worry about coding in a foreign language and about the quality of the underlying implementation. It feels like the appropriate approach (for me) would be to prototype using Python/Neo4J, and likely discover that that is all that is needed. Only if you have a very large graph do you need to consider switching to a Spark-based solution, and I’m not convinced GraphX is how you would do it even then.

The code samples are poorly formatted but you can fix this by downloading the source code and viewing it in the editor of your choice with nice syntax highlighting and consistent indenting – this makes things much clearer. The figures are clear enough but I find the Kindle approach of embedding thumbnail scale figures unhelpful – you need to double click them to make them readable. A reasonable solution would be to make figures full page by default, if that is possible.

This is one of the better “* in Action” books I’ve read. It hasn’t convinced me to use GraphX – quite the reverse – but that’s no bad thing, and I’ve learnt a little about recommender algorithms and Scala.

Book review: The Invention of Nature by Andrea Wulf

The Invention of Nature by Andrea Wulf is subtitled The Adventures of Alexander von Humboldt – this is his biography.

Alexander von Humboldt was born in Berlin in 1769 and died in 1859, the year in which On the Origin of Species was published. He was a naturalist of a Romantic tendency, born into an aristocratic family which gave him access to the Prussian court.

He made a four year journey to South America in 1800 which he reported (in part) in his Personal Narrative, which was highly influential – inspiring Charles Darwin amongst many others. On this South American trip he made a huge number of observations across the natural and social sciences, and he was sought after by the newly formed US government as the Spanish colonies started to gain independence. Humboldt was a bit of a revolutionary at heart, looking for the liberation of countries, and also of slaves. This was one of his bones of contention with his American friends.

His key scientific insight was to see nature as an interconnected web, a system, rather than a menagerie of animals created somewhat arbitrarily by God. As part of this insight he saw the impact that man made on the environment, and in some ways inspired what was to become the environmentalist movement.

For Humboldt the poetry and art of his observations were as important as the observations themselves. He was a close friend of Goethe, who found him a great inspiration, as did Henry David Thoreau. This was the time when Erasmus Darwin was publishing his “scientific poems”. It is curious to the eye of the modern working scientist: modern science is not seen as a literary exercise. Perhaps a little more effort is spent on the presentation of visualisations, but in large part scientific presentations are not works of beauty.

Humboldt was to go voyaging again in 1829, conducting a whistle-stop 15,000 mile, 25 week journey across Russia, sponsored by the Russian government. On this trip he built on his earlier observations in South America, as well as carrying out some mineral prospecting observations for his sponsors.

Despite a paid position at the Prussian court in Berlin he much preferred to spend his time in Paris, only pulled back to Berlin as the political climate in Paris became less liberal and his paymaster more keen to see value for money.

Personally he seemed to be a mixed bag: he was generous in his support of other scientists, but in conversation he seems to have been a force of nature – Darwin came away from a meeting with him rather depressed, having not managed to get a word in edgewise!

I’m increasingly conscious of how the climate of the times influences the way we write about the past. This seems particularly the case with The Invention of Nature. Humboldt’s work on what we would now call environmentalism and ecology is highly relevant today. He was the first to talk so explicitly about nature as a system, rather than a garden created by God. He prefigures the study of ecology, and the more radical Gaia hypothesis of James Lovelock. He was already alert to the damage man could do to the environment, and potentially how he could influence the weather if not the climate. There is a brief discussion of his potential homosexuality, which seems to me another theme in keeping with modern times.

The Invention of Nature is sub-subtitled “The Lost Hero of Science”; this type of claim is always a little difficult. Humboldt was not lost: he was famous in his lifetime. His name is captured in the Humboldt Current and the Humboldt penguin, plus many further plants, animals and geographic features. He is not as well known as he might be for his theories of the interconnectedness of nature; in this area he was eclipsed by Charles Darwin. In the epilogue Wulf suggests that part of his obscurity is due to anti-German sentiment in the aftermath of two World Wars. I suspect the area of the “appropriate renownedness of scientific figures of the past” is ripe for investigation.

The Invention of Nature is very readable. There are chapters illustrating Humboldt’s interactions with particular people (Johann Wolfgang von Goethe, Thomas Jefferson, Simon Bolivar, Charles Darwin, Henry David Thoreau, George Perkins Marsh, Ernst Haeckel and John Muir). Marsh was involved in the early environmental movement in the US, Muir in the founding of the Yosemite National Park (and other National Parks). At first I was a little offended by this: I bought a book on Humboldt, not these other chaps! However, then I remembered that I actually prefer biographies which drift beyond the core character, and this approach is very much in the style of Humboldt himself.