Category: Book Reviews

Reviews of books featuring a summary of the book and links to related material

Book review: The Invention of Nature by Andrea Wulf

The Invention of Nature by Andrea Wulf is subtitled The Adventures of Alexander von Humboldt – this is his biography.

Alexander von Humboldt was born in Berlin in 1769 and died in 1859, the year in which On the Origin of Species was published. He was a naturalist of a Romantic tendency, born into an aristocratic family, which gave him access to the Prussian court.

He made a four-year journey to South America in 1800 which he reported (in part) in his book Personal Narrative, which was highly influential – inspiring Charles Darwin amongst many others. On this South American trip he made a huge number of observations across the natural and social sciences and was sought after by the newly formed US government as the Spanish colonies started to gain independence. Humboldt was a bit of a revolutionary at heart, looking for the liberation of countries, and also of slaves. This was one of his bones of contention with his American friends.

His key scientific insight was to see nature as an interconnected web, a system, rather than a menagerie of animals created somewhat arbitrarily by God. As part of this insight he saw the impact that man made on the environment, and in some ways inspired what was to become the environmentalist movement.

For Humboldt the poetry and art of his observations were as important as the observations themselves. He was a close friend of Goethe, who found him a great inspiration, as did Henry David Thoreau. This was at the time when Erasmus Darwin was publishing his “scientific poems”. This is curious to the eye of the modern working scientist: modern science is not seen as a literary exercise. Perhaps a little more effort is spent on the technical method of presentation for visualisations but in large part scientific presentations are not works of beauty.

Humboldt was to go voyaging again in 1829, conducting a whistle-stop 15,000-mile, 25-week journey across Russia sponsored by the government. On this trip he built on his earlier observations in South America as well as carrying out some mineral prospecting observations for his employers.

Despite a paid position in the Prussian court in Berlin he much preferred to spend his time in Paris, only pulled back to Berlin as the climate in Paris became less liberal and his paymaster more keen to see value for money.

Personally he seemed to be a mixed bag: he was generous in his support of other scientists but in conversation seems to have been a force of nature. Darwin came away from a meeting with him rather depressed – he had not managed to get a word in edgewise!

I’m increasingly conscious of how the climate of the time influences the way we write about the past. This seems particularly the case with The Invention of Nature. Humboldt’s work on what we would now call environmentalism and ecology is highly relevant today. He was the first to talk so explicitly about nature as a system, rather than a garden created by God. He prefigures the study of ecology, and the more radical Gaia Hypothesis of James Lovelock. He was already alert to the damage man could do to the environment, and potentially how he could influence the weather if not the climate. There is a brief discussion of his potential homosexuality which seems to me another theme in keeping with modern times.

The Invention of Nature is sub-subtitled “The Lost Hero of Science”; this type of claim is always a little difficult. Humboldt was not lost: he was famous in his lifetime. His name is captured in the Humboldt Current and the Humboldt Penguin, plus many further plants, animals and geographic features. He is not as well-known as he might be for his theories of the interconnectedness of nature; in this area he was eclipsed by Charles Darwin. In the epilogue Wulf suggests that part of his obscurity is due to anti-German sentiment in the aftermath of two World Wars. I suspect the area of the “appropriate renownedness of scientific figures of the past” is ripe for investigation.

The Invention of Nature is very readable. There are seven chapters illustrating Humboldt’s interactions with particular people (Johann Wolfgang von Goethe, Thomas Jefferson, Simon Bolivar, Charles Darwin, Henry David Thoreau, George Perkins Marsh, Ernst Haeckel and John Muir). Marsh was involved in the early environmental movement in the US, Muir in the founding of the Yosemite National Park (and other National Parks). At first I was a little offended by this: I bought a book on Humboldt, not these other chaps! However, then I remembered I actually prefer biographies which drift beyond the core character and this approach is very much in the style of Humboldt himself.

Book Review: Canals: The making of a nation by Liz McIvor

Canals: The Making of a Nation by Liz McIvor is a tie-in with a BBC series of the same name, presented by the author. It is about canals in England from the mid-18th century through to the present day, although most of the action takes place before the end of the 19th century.

The chapters of the book match the episodes of the series which are thematic, rather than chronological. Each chapter introduces a different topic, loosely tied to a particular canal.

The book starts with a discussion of the growth of London, and the Grand Junction canal linking it to Birmingham. The guild system was a factor in limiting the growth of the capital until the mid-18th century. The “Bubble Act” of 1720, enacted in the aftermath of the South Sea Bubble, likely also had an impact. It prevented the formation of any joint stock company without an act of parliament to approve it. It was repealed in 1825, before the railways saw their enormous growth. The Grand Junction canal was built as Birmingham became a manufacturing hub and London a great city with many requirements for daily life, and also a showroom to at least the United Kingdom, if not the world.

I was chastened to discover that the Bridgewater canal, one of the earliest of the “canal boom” projects of the 18th century, is only just up the road from me in Chester. I’d always assumed it was close to the town of a very similar name in Somerset! Bridgewater is named for the Duke of Bridgewater, Francis Egerton, for whom the pub just over the road from me is presumably named. The Bridgewater canal was built around 1760, linking the Duke’s coal mines at Worsley to Manchester. With this revelation I realise that the Bridgewater canal and the Liverpool to Manchester railway, the first exclusively steam railway, are sited very close to each other.

Support for manufacture was the theme of canal building in the North of England, and also around Birmingham, with canals built to move bulky raw materials to factories placed to benefit from hydraulic power, and benevolent climates for the processing of materials such as cotton. Manufacturers such as Josiah Wedgwood were keen to see their fragile wares safely make the outward journey to the showrooms of London.

The Kennet and Avon canal was built to provide navigable water access from Bristol to London. William Smith, who produced the first geological map of Great Britain, is introduced in this chapter. I read more about him in The Map that Changed the World by Simon Winchester. The digging of canal cuts and tunnels reveals the local geology. Nowadays we see canals as bucolic thoroughfares but when they were built they were raw cuts indicating industrialisation.

The Manchester Ship Canal was opened in 1894 to bypass the port of Liverpool; these were the dying days of canal building. 154 died in its construction and 1404 were seriously injured from a workforce of 16,361. For comparison, projects such as the 2012 London Olympics and the close-to-completion Crossrail project are of similar scale yet have casualty numbers hovering around zero, although these are best-in-class projects for health and safety. In this chapter McIvor talks more of the Irish “navigators” who built the canals, and something of the early trade union movement.

The families that worked the canals were seen as outsiders: once the long networks were set up they led an itinerant lifestyle with no fixed church or school for their children. The Victorian moralists arguing for improved conditions for the boat families seem to do so from the point of view of pointing out how bloody awful they were!

It’s interesting to see the likes of Thomas Telford and John Rennie cropping up repeatedly in this book. They have the air of rockstar engineers, not a niche found these days. Perhaps this is a result of the work of the Victorian writer, Samuel Smiles, who was very keen on self-improvement and wrote biographies of these men to promote his ideas.

To me the book lacks a little prehistory. The great boom for canal building in the UK was at the end of the 18th century but the very first “pound lock” in England was built in 1566 on the Exeter canal. What went on between these two times? And what was happening elsewhere in the world? Perhaps the answer here is that the canals in Britain never represented a technological revolution; they were always about the social and commercial climate being right.

Canals: The Making of a Nation is an unchallenging read, well-suited to a holiday. If you’re on a canal boat it won’t tell you much about the particular bridges and tunnels you pass over but it will give you a strong feeling for the lives of the people that built and used the canals, and why they were built in the first place.

Book review: Effective Computation in Physics by Anthony Scopatz & Kathryn D. Huff

This next review, of “Effective Computation in Physics” by Anthony Scopatz & Kathryn D. Huff, arose after a brief discussion on Twitter with Mike Croucher after my review of “High Performance Python” by Micha Gorelick and Ian Ozsvald. This was in the context of introducing students, primarily in the sciences, to programming and software development.

I use the term “software development” deliberately. Scientists have been taught programming (badly, in my view*) for many years. Typically they are given a short course in the first year of their undergraduate training, where they are taught the crude mechanics of a programming language (typically FORTRAN, C, Matlab or Python). They are then left to it, perhaps taking up projects requiring significant coding as final year projects or in PhDs. The thing they have lacked is the wider skillset around programming – what you might call “software development”. The value of this is two-fold. Firstly, it is good training for a scientist to have for a career in science. Secondly, the wider software industry is full of scientists, so providing students with a good grounding in this field is no bad thing for their future employability.

The book covers in at least outline all the things a scientist or engineer needs to know about software development. It is inspired by the Software Carpentry and The Hacker Within programmes.

The restriction to physics in the title seems needless to me. The material presented is mostly applicable to anyone undertaking programming work in any science, or in the digital humanities. The examples have a physics basis but not to any great depth, and the decorative historical anecdotes are all physics based. Perhaps the only exception to this is the chapter on HDF5, which is a specialised data storage system; some coverage of SQL databases would make a reasonable substitute for a more general course. The chapter on parallel computing could likewise be dropped for a wider audience.

The book is divided into four broad sections. Included in these are chapters on:

  • Command line operations;
  • Programming in Python;
  • Build systems, version control, debugging and testing;
  • Documentation, publication, collaboration and licensing;

Command line operations are covered in two chunks, firstly in the basic navigation of the file system and files followed by a second chapter on “Regular Expressions” which covers find, grep, sed and awk – at a very basic level.
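As a rough illustration of the kind of task those tools handle, here is a minimal sketch using Python's standard `re` module rather than the command line tools themselves. The log lines and field names are made up for the example and do not come from the book.

```python
import re

# Hypothetical log lines, invented for this example.
lines = [
    "2016-01-04 run=17 energy=3.2e-3",
    "# comment line",
    "2016-01-05 run=18 energy=4.1e-3",
]

# grep-like: keep only the lines that match a pattern.
data_lines = [line for line in lines if re.search(r"run=\d+", line)]

# sed-like: substitute text within each line.
renamed = [re.sub(r"energy=", "E=", line) for line in data_lines]

# awk-like: pull out a field and do arithmetic on it.
energies = [float(re.search(r"E=(\S+)", line).group(1)) for line in renamed]

print(data_lines)
print(renamed)
print(sum(energies))
```

The same patterns would work largely unchanged with grep, sed and awk, since the basic regular expression syntax is shared.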

The introduction to Python is similarly staged with initial chapters covering the fundamentals of the core language, with sufficient detail and explanation that I learnt some new things**. Further chapters introduce core Python libraries for data analysis including NumPy, Pandas and matplotlib.

Beyond these core chapters on Python those on version control, debugging and testing are a welcome addition. Our dearest wish at ScraperWiki, a small software company where I worked until recently, was that new recruits and interns would come with at least some knowledge of, and the habit of using, source control (preferably Git). It is also nice to see some wider discussion of GitHub and the culture of Pull Requests and issue tracking. Systematic testing is also a useful skill to have. In fact my experience has been that formal testing is most useful for the most physics-like functions, where the right answer can be worked out independently.
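To show why physics-like functions suit formal testing, here is a minimal sketch. The function `kinetic_energy` and its test are hypothetical examples of my own, not taken from the book. The point is only that a formula comes with known answers and known symmetries to check against.

```python
import math


def kinetic_energy(mass_kg, speed_m_s):
    """Classical kinetic energy in joules: E = 1/2 * m * v**2."""
    if mass_kg < 0:
        raise ValueError("mass must be non-negative")
    return 0.5 * mass_kg * speed_m_s ** 2


def test_kinetic_energy():
    # A 2 kg mass at 3 m/s carries 0.5 * 2 * 9 = 9 J.
    assert math.isclose(kinetic_energy(2.0, 3.0), 9.0)
    # Energy depends on speed squared, so direction does not matter.
    assert kinetic_energy(2.0, -3.0) == kinetic_energy(2.0, 3.0)
    # Doubling the speed quadruples the energy.
    assert math.isclose(kinetic_energy(1.0, 4.0), 4 * kinetic_energy(1.0, 2.0))


test_kinetic_energy()
```

A test runner such as pytest would pick up `test_kinetic_energy` by name; calling it directly, as here, works without any extra tooling.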

The final section covers documentation, publication and licensing. I found the short chapter on licensing rather useful: I’ve been working on some code to analyse LIDAR data and have made it public on GitHub, which helpfully asks which license I would like to use. As it turns out I chose the MIT license and this seems to be the correct one for the application. On publication the authors are LaTeX evangelists but students can choose to ignore their monomania on this point. LaTeX has a cult-like following in physics which I’ve never understood. I have written papers in LaTeX but much prefer Microsoft Word for creating documents, although Google Docs is nice for collaborative work. The view that a source control repository issue tracker might work for collaboration beyond coding is optimistic unless academics have changed radically in the last few years.

I’d say the only thing lacking was any mention of pair programming, although to be fair that is more a teaching method than course material. I found I learnt most when I had a goal of my own to work towards, and I had the opportunity to pair with people with more knowledge than I had. Actually, pairing with someone equally clueless in a particular technology can work pretty well.

There is a degree to which the book, particularly in this section, strays into a fantasy of how the authors wish computational physics was undertaken, rather than describing how it is actually undertaken.

To me this is the ideal “Software development for scientists” undergraduate text. It is opinionated in places and I occasionally found the style grating but nevertheless it covers the right bases.

*I’m happy to say this since I taught programming badly to physics undergraduates some years ago!

**People who know my Python skills will realise this is not an earthshattering claim.

Book review: High Performance Python by Micha Gorelick & Ian Ozsvald

highperformancepythonHigh Performance Python by Micha Gorelick and Ian Ozsvald is nominally a book about improving the speed and memory performance of your programs. Along the way it provides insight into some more advanced aspects of Python programming, including how the language works under the hood.

The book starts with tools for analysing the speed and memory performance of programs at global, function and line level. The authors emphasise the importance of making these measurements, and of using unit testing to ease the process of optimisation. Blindly optimising where you *think* the problem lies is never a good idea.
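A minimal sketch of measuring before optimising, using only the standard library's `timeit` and `cProfile` modules. The two functions are hypothetical stand-ins of my own, not examples from the book, and I make no claim about which one wins. Finding that out is the purpose of the measurement.

```python
import cProfile
import io
import pstats
import timeit


def list_sum_of_squares(n):
    """Builds a full list of squares before summing it."""
    squares = []
    for i in range(n):
        squares.append(i * i)
    return sum(squares)


def generator_sum_of_squares(n):
    """Same result using a generator expression, with no intermediate list."""
    return sum(i * i for i in range(n))


# Function-level timing: run each candidate several times and compare.
for func in (list_sum_of_squares, generator_sum_of_squares):
    seconds = timeit.timeit(lambda: func(100_000), number=20)
    print(f"{func.__name__}: {seconds:.3f} s for 20 calls")

# Call-level profile: which calls account for the time?
profiler = cProfile.Profile()
profiler.enable()
list_sum_of_squares(100_000)
profiler.disable()

report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
print(report.getvalue())
```

The assertion that both functions return the same value is exactly the sort of unit test that makes it safe to swap one implementation for another.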

The next set of chapters talk about some core Python data structures, with a little about their implementation and relative performance. These include lists and tuples, dictionaries and sets, iterators and generators, and matrices and vectors. It is here that the numpy library is introduced, and it is treated almost as a core Python library in its importance.

The difference between range and xrange in Python 2 is striking: if you wish to execute a loop some number of times then range builds a list with that many elements, whereas xrange produces the values lazily, one at a time, and therefore uses far less memory.
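The contrast can be reproduced in Python 3, where `range` is already lazy in the way `xrange` was in Python 2. In this minimal sketch, forcing the values into a list stands in for what Python 2's `range` returned. Note that `sys.getsizeof` reports only the list's own array of references, not the integer objects it points to, so the true gap is larger than the figures printed.

```python
import sys

n = 1_000_000

# Python 3's range produces its values on demand, like Python 2's xrange.
lazy = range(n)
# Materialising a list mimics what Python 2's range returned.
eager = list(range(n))

print("lazy range object:", sys.getsizeof(lazy), "bytes")
print("materialised list:", sys.getsizeof(eager), "bytes")

# Both support the same loop; only the memory footprint differs.
assert sum(lazy) == sum(eager)
```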

The next few chapters cover compiling Python to C for speed increases, concurrency, the multiprocessing module and clusters. Typically chapters take an example (Julia sets, diffusion equations, estimating pi, finding primes) and demonstrate the speedups which can be made, from the routine to the ridiculous. The authors point out when further optimisation is a bit pointless.

For compiling to C, there are a number of options of varying coverage and maturity. Cython is the most mature option with the widest coverage, but at the cost of making breaking changes to code. Other newer solutions include Numba and PyPy; they do not require breaking changes to code but they are less mature and, in the case of PyPy, do not support the important numpy library.

Concurrency is about making better use of a single processor using asynchronous methods; here there are libraries such as gevent and tornado.
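The idea can be sketched with the standard library's `asyncio` module, which I am substituting here for gevent and tornado so that the example needs no third-party packages. The function `fake_fetch` is a hypothetical stand-in for a network request.

```python
import asyncio


async def fake_fetch(name, delay_s):
    """Stand-in for a network request; the sleep yields control to other tasks."""
    await asyncio.sleep(delay_s)
    return f"{name} done"


async def main():
    # All three waits overlap on a single thread, so the total time is
    # roughly the longest delay rather than the sum of the delays.
    return await asyncio.gather(
        fake_fetch("a", 0.03),
        fake_fetch("b", 0.02),
        fake_fetch("c", 0.01),
    )


results = asyncio.run(main())
print(results)
```

`asyncio.gather` returns results in the order the tasks were passed in, not the order they finished, which keeps the bookkeeping simple.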

For parallel processing most focus is on the multiprocessing library. Most of the book is platform agnostic but this chapter is based on the Linux implementation of Python. I hadn’t realised before that “embarrassingly parallel” had a specific meaning, i.e. that there is no need for interprocess communication for the problem at hand.
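A minimal sketch of an embarrassingly parallel job with `multiprocessing.Pool`, assuming a prime-counting task split into independent ranges. The function names and the chunking scheme are my own, not the book's code.

```python
from multiprocessing import Pool


def count_primes(bounds):
    """Count primes in [lo, hi) by trial division; needs no data from other chunks."""
    lo, hi = bounds
    count = 0
    for n in range(max(lo, 2), hi):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def chunked(limit, n_chunks):
    """Split [0, limit) into at most n_chunks contiguous (lo, hi) ranges."""
    step = -(-limit // n_chunks)  # ceiling division
    return [(lo, min(lo + step, limit)) for lo in range(0, limit, step)]


if __name__ == "__main__":
    chunks = chunked(20_000, 4)
    # Each worker handles its own chunk; the only communication is
    # sending the bounds out and getting the counts back.
    with Pool(processes=4) as pool:
        total = sum(pool.map(count_primes, chunks))
    print(total)
```

The `if __name__ == "__main__":` guard matters on platforms that start workers by re-importing the script; without it each worker would try to start a pool of its own.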

The coverage of computing clusters is fairly cursory. This isn’t really the focus of this book and, as the authors highlight, running clusters of machines can bring a significant administration overhead.

The book finishes with a chapter on reducing RAM usage, either by choice of intrinsic data types or using probabilistic data structures such as the Bloom filter, which offer an approximate answer for vastly less memory usage. Also included is the Morris Counter, which provides an approximate counter in 1 byte of storage; I must admit to being bemused as to when I would need such a thing.
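To make the two structures concrete, here is a minimal sketch of each. These are simplified illustrations under my own assumptions (salted SHA-256 for the Bloom filter's hashes, an exponent capped at 255 so that it fits in one byte) and not the implementations given in the book.

```python
import hashlib
import random


class BloomFilter:
    """Set-membership sketch: no false negatives, occasional false positives."""

    def __init__(self, n_bits=1024, n_hashes=3):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray(n_bits // 8 + 1)

    def _positions(self, item):
        # Derive several bit positions by salting one hash function.
        for salt in range(self.n_hashes):
            digest = hashlib.sha256(f"{salt}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


class MorrisCounter:
    """Approximate counter that stores only a small exponent."""

    def __init__(self, rng=None):
        self.exponent = 0
        self.rng = rng or random.Random()

    def increment(self):
        # Bump the exponent with probability 2**-exponent.
        if self.exponent < 255 and self.rng.random() < 2.0 ** -self.exponent:
            self.exponent += 1

    def estimate(self):
        return 2 ** self.exponent - 1


seen = BloomFilter()
for word in ("humboldt", "canal", "python"):
    seen.add(word)
print("canal" in seen)  # True: added items are always reported present

counter = MorrisCounter(random.Random(0))
for _ in range(10_000):
    counter.increment()
print(counter.exponent, counter.estimate())
```

The Bloom filter above answers membership queries from 1024 bits however long the stored strings are. The Morris counter trades accuracy for size: its estimate is only right to within a factor of a few, which is tolerable when the order of magnitude is all that is needed.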

Finally, there are what I refer to as “war stories” from practitioners in the field. I really liked these. One of the difficulties in working in technology is the constant stream of options to choose from, often with no clear frontrunner, so learning how others have approached problems is really handy. Here Celery, the Distributed Task Queue, and ElasticSearch get multiple mentions.

Overall the book is well-produced and readable. It occasionally lapses into the problem of inviting the reader to admire the colour in a greyscale printed plot. In other places I felt the example code could have been presented up front in its entirety rather than being dribbled out in bits. In the chapter on Using less RAM, I felt important things were discussed (tries and directed-acyclic word graphs (DAWGs)) before they were introduced, which was a bit confusing. Tries and DAWGs are systems for the compact storage of text, and are both tree-like structures – I hadn’t come across these before.
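For readers who, like me, had not met a trie before, here is a minimal sketch built from nested dictionaries. The class and the word list are my own illustration, not the book's code. It assumes that no stored word contains the `$` character, which is used as the end-of-word marker.

```python
class Trie:
    """Prefix tree: words sharing a prefix share the nodes for that prefix."""

    _END = "$"  # marker key meaning "a word ends here"

    def __init__(self):
        self.root = {}

    def insert(self, word):
        node = self.root
        for char in word:
            node = node.setdefault(char, {})
        node[self._END] = True

    def __contains__(self, word):
        node = self.root
        for char in word:
            if char not in node:
                return False
            node = node[char]
        return self._END in node

    def node_count(self):
        """Number of character nodes, a rough proxy for storage used."""
        def count(node):
            return sum(1 + count(child)
                       for key, child in node.items() if key != self._END)
        return count(self.root)


words = ["canal", "canals", "canoe", "cane"]
trie = Trie()
for word in words:
    trie.insert(word)

print("canoe" in trie, "can" in trie)
print(trie.node_count(), "nodes for", sum(map(len, words)), "characters")
```

The saving comes from the shared prefix "can" being stored once. A DAWG goes a step further by also merging shared endings, which this sketch does not attempt.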

In some ways this book is more about productionisation than performance. For the straightforward non-production data analysis work I’ve done, I can imagine, as a result of this book, being a bit smarter about my choice of data structures and using profiling to find where the slow points are. In the repeated reanalysis cycle it is nice to have something run in a minute rather than 20, but is it worth a day of development time? I would likely only turn to compilation, concurrency and multiprocessing if I were going to use a particular analysis regularly, or my anticipated run time was going to be measured in days without optimisation.

I recommend this book to anyone looking to advance their understanding of Python, and speed up their code.

Book review: The Son also Rises by Gregory Clark

The Son also Rises by Gregory Clark is a book about social mobility, as traced through surnames. Clark prefaces his work by saying that what he is to say might be considered radical and controversial. Other studies of social mobility have found modest “inheritability” between generations. This study finds high levels of inheritability spanning hundreds of years.

The theme for the early chapters is to find some source of high status individuals – be it graduation from prestigious universities such as Oxford, Cambridge or the American Ivy League, membership of professional bodies such as those for doctors or attorneys or from financial records such as occasional tax releases or records of wills (probate). Next a cohort of names is tracked through these systems and their level of incidence is compared against the background level of incidence for that surname. For example, “Smythe” is a relatively rare surname in the general UK population but it is found at a much higher level in records of registered doctors.
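The comparison described above boils down to a ratio of two shares. Here is a minimal sketch; the function name is my own and the counts are made up purely for illustration, not figures from the book.

```python
def relative_representation(elite_with_name, elite_total,
                            population_with_name, population_total):
    """Ratio of a surname's share among the elite to its share in the population.

    A value of 1 means the surname is no more common among the elite than
    in general; values above 1 mean over-representation.
    """
    elite_share = elite_with_name / elite_total
    population_share = population_with_name / population_total
    return elite_share / population_share


# Made-up counts: 30 doctors out of 100,000 carry the surname, against
# 5,000 people out of 50 million in the general population.
ratio = relative_representation(30, 100_000, 5_000, 50_000_000)
print(ratio)
```

With these invented numbers the surname is about three times as common among doctors as in the population at large. Tracking how that ratio drifts towards 1 over successive generations gives the measure of mobility.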

The selected cohort of surnames may be from a distinctive ethnic population – e.g. Japanese in America, Native Americans or French settlers. Or it may be selected from a set of high status individuals at a point in time, e.g. the Normans who came to England with William the Conqueror, or Swedish nobles.

Clark’s discovery is that for all of these many cohorts across multiple measures of status the persistence over time is strong. The Smythes of 200 years ago had relatively high status then and they still do now. After nearly 1000 years those with surnames associated with the Norman conquest are still a little over-represented in the intake of Oxford and Cambridge Universities. Similar behaviour is found for low status groups: Baldrick’s character through the several series of Blackadder is not far from the truth. In both cases these groups are “regressing towards the mean” but it is a long, slow process.

Following these initial demonstrations of social mobility, Clark states his general law, which is that the correlation of status over generations is high compared to previously measured parent-child measures and remarkably constant across multiple countries, periods in history and cohorts. The magic number for the correlation is 0.75. He argues that the reason that his estimate is higher than others is that he models social status with an underlying constant and a random fluctuation; the methods of calculation for earlier figures mean that this random fluctuation is much more apparent and brings down the measured correlation. I don’t feel he demonstrates the origin of this discrepancy very clearly.
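A small worked sketch shows what a correlation of 0.75 implies, and how noise could pull a measured figure down. The first two functions assume a status advantage simply shrinks by the same factor each generation. The third is the textbook attenuation formula for a measure equal to underlying status plus independent noise. This is my reading of the argument, not a calculation taken from the book, and the variances used are invented.

```python
import math

PERSISTENCE = 0.75  # the intergenerational correlation quoted above


def remaining_advantage(generations, b=PERSISTENCE):
    """Expected fraction of an initial status advantage left after n generations."""
    return b ** generations


def generations_until(fraction, b=PERSISTENCE):
    """Smallest whole number of generations after which less than `fraction` remains."""
    return math.floor(math.log(fraction) / math.log(b)) + 1


def measured_correlation(b, var_underlying, var_noise):
    """Correlation seen in a noisy status measure (classical attenuation).

    Assumes observed status = underlying status + independent noise.
    """
    return b * var_underlying / (var_underlying + var_noise)


for n in (1, 5, 10):
    print(n, round(remaining_advantage(n), 3))
print(generations_until(0.1))
print(measured_correlation(PERSISTENCE, 1.0, 1.0))
```

At 0.75 per generation it takes nine generations for an advantage to fall below a tenth of its starting value. With as much noise as signal in a single measure (the invented variances of 1.0 each), an underlying 0.75 would be measured as 0.375, which shows how a one-generation study of income alone could report much greater mobility.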

Subsequent chapters go on to look at some cases where one really expects deviations from this general rule: in the Indian caste system, where low mobility is expected, and also in China, where the post-revolution period is expected to be a time of high social mobility. It turns out that in India, despite laws aimed at reducing caste based discrimination, social mobility has not improved dramatically. In China social mobility seems to have been little bothered by the revolution. The odd groups that do break the rule of constant social mobility seem to do so by preferential recruitment, e.g. in the past in Muslim countries non-Muslims were tolerated but charged a poll tax, which meant that lower status/income people were more likely to convert to Islam, leaving a more persistently high status non-Muslim population. A second route is by strong preference for “in group” marriage, which is seen in the Indian Brahmin caste. It turns out that the surnames identified with British parliamentarians are particularly immobile.

As for the origin of this constant social mobility, Clark ascribes it to what he calls “social competence”. There is a confused discussion of the balance of nature and nurture, not helped by a table where nature and nurture headings are accidentally swapped (I think). I believe that technically it is all nurture, and Clark is trying to work out whether it is all about money. It strikes me that your wider family is where you learn about what the possibilities are for you and, while every family has its black sheep, the fact that your father and two out of three uncles went to Cambridge University means that your expectation is that you should aspire to that. Your family sets what is “normal”.

I suspect that this is particularly the case for British parliamentarians, where there seem to be a lot of sibling (Milibands, Johnsons, Eagles), husband-wife (Cooper/Balls) and parent-child (Kinnock, Benn) combinations. Being a politician is an odd sort of job. There is not really a class at school for it, so seeing your family working in the “family business” must be a big influence.

“The Son also Rises” is an interesting read but turning it into a 300 page book seems to belabour the point somewhat. I liked the incidental details of the origins of surnames, and the various sources of information on social status.

I got this as a Kindle edition but I wish I’d bought it as a paperback: there are numerous figures, tables and equations which didn’t render at a reasonable size in the first instance.