Category: Book Reviews

Reviews of books featuring a summary of the book and links to related material

Book Review: Stargazers–Copernicus, Galileo, the Telescope and the Church by Allan Chapman

It’s been a while since my last book review here, but I’ve just finished reading Stargazers: Copernicus, Galileo, the Telescope and the Church by Allan Chapman.

The book covers the period from the 16th century, the time of Copernicus and Tycho Brahe, to the early 18th century and Bradley’s measurement of stellar aberration, passing Galileo, Newton and others on the way. Conceptually this spans the full transition from a time when people believed in a Classical universe with earth at its centre, and stars and planets plastered onto crystal spheres, to the modern view of the solar system with the earth and other planets orbiting the sun.

This development parallels that in Arthur Koestler’s classic book “The Sleepwalkers”. However, Chapman’s style is much more readable; his coverage is broader but not so deep. Chapman introduces a wealth of little personal anecdotes and experiments. For instance, on visiting Tycho Brahe’s island observatory he recounts a meeting with a local farmer who had in his living room a marked stone from Brahe’s observatory (which had been dismantled by the locals on Brahe’s death). Brahe was hated by his tenants for his treatment of them, a hatred that was handed down through the generations. Illustrations are provided in the author’s own hand, which is surprisingly effective. He discusses his own work in reconstructing historical apparatus and observations.

Astronomy was an active field from well before the start of this period for a couple of reasons. Firstly, astrology had been handed down from Classical times as a way of divining the future. It was believed that improving the accuracy of astrological predictions required better data on the locations of heavenly bodies over time. Secondly, the Christian Church required accurate astronomical measurement to determine when Easter fell, across increasingly large spans of the Earth.

The period covered by the book marks a time when new technology made increasingly accurate measurements of the heavens possible, and the telescope made features such as mountains on the moon, sunspots and the moons of Jupiter visible for the first time. Galileo was a principal protagonist in this revolution.

Amongst scientists there is something of the view that the Catholic Church suppressed scientific progress, with Galileo the poster boy for the scientists’ case. Historians of science don’t share this view, and haven’t for quite some time. Looking back on Sleepwalkers, written in 1959, I noted the same thing – the historians’ view is generally that Galileo brought it on himself in the way he dismissed those that did not share his views in rather offensive terms. Galileo lived in a time when the well-entrenched Classical view of the universe was coming under increased pressure from new observations using new instruments. In some senses it was the collision with the long-held Classical view of the universe which led to his problems, the Church being more committed to this Classical view of the physical universe than to anything proposed in Scripture.

The role of the Church in promoting and fostering science is something Chapman returns to frequently – emphasising the scientific work that members of the Church did, and also the often good relationships that lay “scientists” of different faiths had with Church authorities.

Chapman introduces some of the lesser known English (and Welsh) contributors to the story: Thomas Harriot, who made the earliest known sketches of the moon; the Lancashire astronomers, who made the first observations of the transit of Venus; and John Wilkins, whose meetings were to lead to the foundation of the Royal Society. He also notes the precedent of the Royal College of Physicians, formed in 1518. The novelty of the Royal Society when compared with earlier organisations of similar character was that the Fellows were responsible for new appointments, rather than them being imposed by a patron. This seems to have been an English innovation, repeated in the Oxbridge colleges and Guilds.

Relating to these English astronomers was the development of precision instruments in England. This seems to have been spurred by the Dissolution of the monasteries. The glut of land, seized by Henry VIII, became available to purchase. The purchase of land meant a requirement for accurate surveying, and legal documents. Hence an industry was born of skilled men wielding high technology to produce maps.

I was distracted by the presence of Martin Durkin in the acknowledgements to this book. He was the architect of the “polemical” Channel 4 documentary “The Great Global Warming Swindle”, so it cast doubt in my mind as to whether I should take this book seriously. On reflection Chapman’s position as presented in this book seems respectable, but it is interesting how a short statement in the acknowledgements made me consider this more deeply.

Overall, Stargazers is rather more readable than Sleepwalkers, not quite so single-tracked in its defence of the Catholic Church as God’s Philosophers, and a different proposition to Fred Watson’s book of the same name, which is all about telescopes.

Book review: Docker Up & Running by Karl Matthias and Sean P. Kane

This review was first published at ScraperWiki.

This last week I have been reading Docker Up & Running by Karl Matthias and Sean P. Kane, a newly published book on Docker – a container technology which is designed to simplify the process of application testing and deployment.

Docker is a very new product, first announced in March 2013, although it is based on older technologies. It has seen rapid uptake by a number of major web-based companies who have open-sourced their tooling for using Docker. We have been using Docker at ScraperWiki for some time, and our most recent projects use it in production. It addresses a common problem for which we have tried a number of technologies in search of a solution.

For a long time I have thought of Docker as providing some sort of cut-down virtual machine. From this book I realise this is the wrong mindset – it is better to think of it as a “process wrapper”. The “Advanced Topics” chapter of this book explains how this is achieved technically. This makes Docker a much lighter-weight, faster proposition than a virtual machine.

Docker is delivered as a single binary containing both client and server components. The client gives you the power to build Docker images and query the server which hosts the running Docker images. The client part of this system will run on Windows, Mac and Linux systems. The server will only run on Linux due to the specific Linux features that Docker utilises in doing its stuff. Mac and Windows users can use boot2docker to run a Docker server. boot2docker uses a minimal Linux virtual machine to run the server, which removes some of the performance advantages of Docker but allows you to develop anywhere.

The problem Docker and containerisation are attempting to address is that of capturing the dependencies of an application and delivering them in a convenient package. It allows developers to produce an artefact, the Docker image, which can be handed over to an operations team for deployment without to-ing and fro-ing to get all the dependencies and system requirements fixed.

Docker can also address the problem of a development team onboarding a new member who needs to get the application up and running on their own system in order to develop it. Previously such problems were addressed with a flotilla of technologies with varying strengths and weaknesses, things like Chef, Puppet, Salt, Juju, virtual machines. Working at ScraperWiki I saw each of these technologies causing some sort of pain. Docker may or may not take all this pain away but it certainly looks promising.

The Docker image is compiled from instructions in a Dockerfile which has directives to pull down a base operating system image from a registry, add files, run commands and set configuration. The “image” language is probably where my false impression of Docker as virtualisation comes from. Once we have made the Docker image there are commands to deploy and run it on a server, inspect any logging and do debugging of a running container.

Docker is not a “total” solution: it has nothing to say about triggering builds, bringing up hardware or managing clusters of servers. At ScraperWiki we’ve been developing our own systems to do this, which is clearly the approach that many others are taking.

Docker Up & Running is pretty good at laying out what it is you should do with Docker, rather than what you can do with Docker. For example, the book makes clear that Docker is best suited to hosting applications which have no state. You can copy files into a Docker container to store data but then you’d need to work out how to preserve those files between instances. Docker containers are expected to be volatile – here today gone tomorrow, or even here now, gone in a minute. The expectation is that you should preserve state outside of a container using environment variables, Amazon’s S3 service or an externally hosted database etc – depending on the size of the data. The material in the “Advanced Topics” chapter highlights the possible Docker runtime options (and then advises you not to use them unless you have very specific use cases). There are a couple of whole chapters on Docker in production systems.
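As a rough illustration of keeping state out of the container, here is a minimal Python sketch of an application that takes all of its configuration from environment variables rather than from files baked into the image. The variable names (DATABASE_URL, S3_BUCKET, DEBUG) and the example values are my own assumptions for illustration; they do not come from the book or from Docker itself.

```python
import os


def load_settings(environ=None):
    """Build the app's settings from environment variables.

    The variable names here are hypothetical, chosen for illustration only.
    Passing a dict as `environ` makes the function easy to exercise
    without touching the real process environment.
    """
    env = os.environ if environ is None else environ
    return {
        # Where persistent data lives: outside the container.
        "database_url": env.get("DATABASE_URL", "sqlite:///:memory:"),
        # Bucket for larger files; empty string means "not configured".
        "s3_bucket": env.get("S3_BUCKET", ""),
        # Flags arrive as strings, so convert explicitly.
        "debug": env.get("DEBUG", "0") == "1",
    }


# Simulate the environment a container might be started with.
settings = load_settings({
    "DATABASE_URL": "postgres://db.example.internal/app",
    "DEBUG": "1",
})
print(settings)
```

Because nothing is written inside the container, any instance can be thrown away and replaced by another started with the same environment.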

If my intention was to use Docker “live and in anger” then I probably wouldn’t learn how to do so from this book since the landscape is changing so fast. I might use it to identify what it is that I should do with Docker, rather than what I can do with Docker. For the application side of ScraperWiki’s business the use of Docker is obvious; for the data science side it is not so clear. For our data science work we make heavy use of Python’s virtualenv system, which captures most of our dependencies without being opinionated about data (state).

The book has information in it up until at least the beginning of 2015. It is well worth reading as an introduction and overview of Docker.

Book Review: Learning Spark by Holden Karau, Andy Konwinski, Patrick Wendell and Matei Zaharia

This post was first published at ScraperWiki.

Apache Spark is a system for doing data analysis which can be run on a single machine or across a cluster. It is pretty new technology – initial work was in 2009 and Apache adopted it in 2013. There’s a lot of buzz around it, and I have a problem for which it might be appropriate. The goal of Spark is to be faster and more amenable to iterative and interactive development than Hadoop MapReduce, a sort of IPython of Big Data. I took my traditional approach to learning more: buying a dead-tree publication, Learning Spark by Holden Karau, Andy Konwinski, Patrick Wendell and Matei Zaharia, and then reading it on my commute.

The core of Spark is the resilient distributed dataset (RDD), a data structure which can be distributed over multiple computational nodes. Creating an RDD is as simple as passing a file URL to a constructor (the file may be located on some Hadoop-style system) or parallelizing an in-memory data structure. To this data structure are added transformations and actions. Transformations produce another RDD from an input RDD; for example, filter() returns an RDD containing only the rows of the input RDD that satisfy a condition. Actions produce a non-RDD output; for example, count() returns the number of elements in an RDD.
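To make the transformation/action distinction concrete, here is a toy, single-machine sketch in plain Python. It is emphatically not Spark’s API: ToyRDD and its methods are names made up for illustration, and there is no distribution or resilience here. It only shows the shape of the idea, namely that transformations build up a recipe and return another dataset, while actions run the recipe and return an ordinary value.

```python
from typing import Callable, Iterable, List


class ToyRDD:
    """A single-machine stand-in for an RDD (illustrative only, not Spark)."""

    def __init__(self, source: Callable[[], Iterable]):
        # Store a recipe for producing the rows, not the rows themselves.
        self._source = source

    @classmethod
    def parallelize(cls, data: List) -> "ToyRDD":
        # Build a dataset from an in-memory list.
        items = list(data)
        return cls(lambda: iter(items))

    def filter(self, predicate: Callable) -> "ToyRDD":
        # Transformation: returns a new ToyRDD; nothing is computed yet.
        return ToyRDD(lambda: (row for row in self._source() if predicate(row)))

    def map(self, func: Callable) -> "ToyRDD":
        # Transformation: also lazy, also returns a ToyRDD.
        return ToyRDD(lambda: (func(row) for row in self._source()))

    def count(self) -> int:
        # Action: runs the whole chain and returns a plain int.
        return sum(1 for _ in self._source())

    def collect(self) -> list:
        # Action: runs the whole chain and returns a plain list.
        return list(self._source())


lines = ToyRDD.parallelize([
    "INFO starting up",
    "ERROR disk full",
    "INFO request served",
    "ERROR timeout",
])

errors = lines.filter(lambda line: line.startswith("ERROR"))  # still a ToyRDD
error_count = errors.count()                                  # a plain int
error_words = errors.map(lambda line: line.split()[1]).collect()

print(error_count, error_words)
```

In this sketch the filter is re-run each time an action is called, which is roughly why a real system offers ways to cache intermediate results.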

Spark provides functionality to control how parts of an RDD are distributed over the available nodes, for example by key. In addition there is functionality to share data across multiple nodes using “Broadcast Variables”, and to aggregate results in “Accumulators”. The behaviour of Accumulators in distributed systems can be complicated since Spark might preemptively execute the same piece of processing twice because of problems on a node.
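The accumulator problem can be illustrated with a small simulation in plain Python. This is a sketch under my own assumptions, not Spark’s implementation or API: the partitions, the execution log and the de-duplication rule are all invented for illustration. It shows how a shared counter is inflated when the same piece of work runs twice, and one way a system could guard against that by accepting only one update per partition.

```python
# Three partitions of a dataset; we want to count the blank rows.
partitions = {
    0: ["a", "", "b"],
    1: ["", "c"],
    2: ["", "", ""],
}


def count_blank(rows):
    """The per-partition work: how many rows are empty strings."""
    return sum(1 for row in rows if row == "")


# Hypothetical execution log: partition 1 is processed twice, as might
# happen if a scheduler re-ran work from a slow or failed node.
execution_log = [0, 1, 1, 2]

# Naive accumulator: every execution adds to the shared total.
naive_total = 0
for partition_id in execution_log:
    naive_total += count_blank(partitions[partition_id])

# Guarded accumulator: keep only the first update from each partition.
accepted = {}
for partition_id in execution_log:
    if partition_id not in accepted:
        accepted[partition_id] = count_blank(partitions[partition_id])
guarded_total = sum(accepted.values())

true_total = sum(count_blank(rows) for rows in partitions.values())
print(naive_total, guarded_total, true_total)
```

The naive total over-counts by exactly the blank rows in the repeated partition, which is the kind of surprise that makes accumulators better suited to debugging counters than to results you depend on.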

In addition to Spark Core there are Spark Streaming, Spark SQL, MLlib machine learning, GraphX and SparkR modules. Learning Spark covers the first three of these. The Streaming module handles data such as log files which are continually growing over time, using a DStream structure which is composed of a sequence of RDDs with some additional time-related functions. Spark SQL introduces the DataFrame data structure (previously called SchemaRDD) which enables SQL-like queries using HiveQL. The MLlib library introduces a whole bunch of machine learning algorithms such as decision trees, random forests, support vector machines, naive Bayesian and logistic regression. It also has support routines to normalise and analyse data, as well as clustering and dimension reduction algorithms.

All of this functionality looks pretty straightforward to access; example code is provided for Scala, Java and Python. Scala is a functional language which runs on the Java virtual machine, so it appears to get equivalent functionality to Java. Python, on the other hand, appears to be a second-class citizen. Functionality, particularly in I/O, is missing Python support. This does raise the question as to whether one should start analysis in Python and make the switch as and when required, or whether to start in Scala or Java, where you may well be forced anyway. Perhaps the intended usage is Python for prototyping and Java/Scala for production.

The book is pitched at two audiences, data scientists and software engineers, as is Spark. This would explain support for Python and (more recently) R, to keep the data scientists happy, and Java/Scala for the software engineers. I must admit, looking at examples in Python and Java together, I remember why I love Python! Java requires quite a lot of class declaration boilerplate to get it into the air, and brackets.

Spark will run on a standalone machine; I got it running on Windows 8.1 in short order. Analysis programs appear to be deployable to a cluster unaltered, with the changes handled in configuration files and command line options. The feeling I get from Spark is that it would be entirely appropriate to undertake analysis with Spark which you might do using pandas or scikit-learn locally, and if necessary you could scale up onto a cluster with relatively little additional effort rather than having to learn some fraction of the Hadoop ecosystem.

The book suffers a little from covering a subject area which is rapidly developing. Spark is at version 1.4 as of early June 2015, the book covers version 1.1, and things are happening fast. For example, GraphX and SparkR, more recent additions to Spark, are not covered. That said, this is a great little introduction to Spark. I’m now minded to go off and apply my new-found knowledge to the Kaggle – Avito Context Ad Clicks challenge!

Book review: Your Inner Fish by Neil Shubin

I’m on holiday so I’ve managed some more reading! This time it is Your Inner Fish by Neil Shubin, as recommended by my colleague, David Jones, at ScraperWiki.

This is ostensibly a story of a particular distant ancestor of humans, the first to walk on land 375 million years ago, but in practice it is broader than that. It is more generally about what it is to be a modern palaeontologist, and about taxonomy – the classification of living organisms.

Your Inner Fish is a personal account based around the work Shubin and his colleagues did in discovering the Tiktaalik species, the first walker, in the high Canadian Arctic. It turns out the distinguishing features of such animals are the formation of shoulders and a neck. Underwater a fish can easily reorient its whole body to get its head facing the right way; on land a neck to move the head independently and shoulders to mount the front legs become beneficial. Shubin hypothesises that animals such as Tiktaalik evolved to walk on dry land to evade ever larger and more aggressive aquatic predators.

Shubin recounts the process that led him to the Arctic, starting with his earlier fossil hunting in road cuts in Pennsylvania. The trick to fossil hunting is finding bedrock of the right age exposed in moderate amounts. Road cuts are second best in this instance, being rather small in scale. Palaeontologists find their best hunting grounds in deserts and the barren landscape of the north. Finding the right site is a combination of identifying where rocks of the right age are likely to be exposed and knowing whether someone has looked there already.

Once you are in the field, the tricky part comes: finding the fossils. This is a skill akin to being able to resolve a magic eye puzzle, and one which is learnt practically in the field rather than theoretically in the classroom. I’m struck by how small some of the most important fossil sites are. Shubin shows a photo of the Tiktaalik site where 6 people basically fill it. The Walcott Quarry in the Burgess Shale is similarly compact.

The central theme of the book is the one-ness of life, in the sense that humans share a huge amount of machinery with all living things to do with the business of building a body. These days the focus of such interest is on DNA, and the similarity of genes and the proteins they encode across huge spans of the tree of life. In earlier times these similarities were identified in developmental processes and anatomy. It is significant that researchers such as Shubin span the fossil, development and genetic domains.

Anatomically fish, lizards, mammals and birds represent the reshuffling of the same components. The multiple jaw bones found in sharks and skates turn into the bones of the inner ear in mammals. The arches which form gills in fish morph and adapt in mammals to leave a weird layout of nerves in the face and skull. These similarities in gross anatomical features are reflected in the molecular machinery which drives development, the formation of complex bodies from a single fertilised cell. Organiser molecules are common across vertebrates.

It’s worth noting the contribution of Hilde Mangold to the development story. Her supervisor Hans Spemann won the 1935 Nobel Prize for medicine based in part on the work on differentiation in amphibian embryos she had presented in her 1923 thesis. She died at the age of 26 in 1924 as the result of an explosion in her apartment building. Nobel Prizes are only awarded to the living.

Why study this taxonomy? The reasons are two-fold. First there is the purely intellectual argument of “because it is there”: the shared features of life are one of the pieces of evidence underpinning the theory of evolution. The second reason is utilitarian: linking all of life into a coherent structure gives us a better understanding of our own bodies, and how to fix them if they go wrong.

As examples of our faulty body Shubin highlights hiccups and hernias. Hiccups because the reflexes leading to hiccups are the descendants of the reflexes of tadpoles which allowed them to breathe through gills as well as lungs. Hernias because the placement of the testes outside the abdomen is an evolution from our fish ancestors who kept gonads internally – external placement is a botched job which leads to a weakness in the abdominal wall, particularly in men.

This book is shorter and more personal than Richard Dawkins’ and Stephen Jay Gould’s work in a similar vein.

I liked it.

Book review: Gut by Giulia Enders

It seems a while since I last reviewed a book here. Today I bring you Gut: The Inside Story of our Body’s Most Underrated Organ by Giulia Enders.

The book does exactly what it says on the tin: it tells us about the gut. It is divided into three broad sections: firstly, the mechanics of it all, including going to the toilet and how to do it better; secondly, the nervous system and the gut; and finally, the bacterial flora that help the gut do its stuff.

The writing style seems to be directed at the early to mid-teenager, which gets a bit grating in places. Sometimes things end up outright surreal: salmonella wear hats and I still don’t quite understand why. The text is accompanied by jaunty little illustrations.

From the mechanical point of view several things were novel to me: the presence of an involuntary internal sphincter shortly before the well-known external one. The internal sphincter allows “sampling” of what is heading for the outside world giving the owner the opportunity to decide what to do with their external sphincter.

The immune tissue in the tonsillar ring was also new to me; its job is to sample anything heading towards the gut. This is most important in young children before their immune systems are fully trained. Related to the tonsils, the appendix also contains much immune tissue and has a role in repopulating the large intestine with more friendly sorts of bacteria following a bout of diarrhoea.

The second section, on the nervous system of the gut, covers things such as vomiting, constipation and the links between the gut and depression.

The section on the bacterial flora of the gut gathers together some of the stories you may have already heard. For example, the work by Marshall on Helicobacter pylori and its role in the formation of stomach ulcers. What I hadn’t realised is that H. pylori is not thought to be all bad. Its benefits are in providing some defence against asthma and autoimmune diseases. Also in this section is toxoplasmosis, the cat-borne parasite which can affect rats and humans, making them more prone to risk-taking behaviour.

I was delighted to discover the use to which sellotape is put in the detection of threadworms – potential sufferers are asked to collect threadworm eggs from around the anus using sellotape. I can imagine this is an unusual experience which I don’t intend to try without good reason.

There is a small amount of evangelism for breast-feeding and organic food which I found a little bit grating.

As usual with electronic books I hit the references section somewhat sooner than I expected, and here there is a clash with the casual style of the body of the book. Essentially, it is referenced as a scientific paper would be – to papers in the primary literature.

I don’t feel this book has left me with any great and abiding thoughts but on the other hand learning more about the crude mechanics of my body is at least a bit useful.