Category: Book Reviews

Reviews of books featuring a summary of the book and links to related material

Book review: Data Visualization: a successful design process by Andy Kirk

This post was first published at ScraperWiki.

My next review is of Andy Kirk’s book Data Visualization: a successful design process. Those of you on Twitter might know him as @visualisingdata, where you can follow his progress around the world as he delivers training. He also blogs at Visualising Data.

Previously in this area, I’ve read Tufte’s book The Visual Display of Quantitative Information and Nathan Yau’s Visualize This. Tufte’s book is based around a theory of effective visualisation whilst Visualize This is a more practical guide featuring detailed code examples. Kirk’s book fits between the two: it contains some material on the more theoretical aspects of effective visualisation as well as an annotated list of software tools, but the majority of the book covers the end-to-end design process.

Data Visualization introduced me to Anscombe’s Quartet. The Quartet is four small datasets, each of eleven (x,y) coordinate pairs. The Quartet is chosen so that the common statistical properties (e.g. the mean values of x and y, their standard deviations and the linear regression coefficients) are nearly identical for each set, but when plotted they look very different. The numbers are shown in the table below.

[Image: table of the Anscombe’s Quartet data]

Plotted they look like this:

[Image: plots of the four Anscombe’s Quartet datasets]

Aside from set 4, the numbers look unexceptional. However, the plots look strikingly different. We can easily classify their differences visually, despite the sets having the same gross statistical properties. This highlights the power of visualisation. As a scientist, I am constantly plotting the data I’m working on to see what is going on and as a sense check: eyeballing columns of numbers simply doesn’t work. Kirk notes that the design criteria for such exploratory visualisations are quite different from those for visualisations highlighting particular aspects of a dataset, more abstract “data art” presentations, or interactive visualisations prepared for others to use.
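The claim that the four sets share their summary statistics is easy to check. The following is a minimal Python sketch, assuming the values from Anscombe's 1973 paper (they stand in for the table referred to above); the names `QUARTET`, `summarise` and `SUMMARY` are purely illustrative.

```python
# Anscombe's Quartet (values as published by Anscombe, 1973).
# Sets 1-3 share the same x values; set 4 has its own.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    1: (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    2: (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    3: (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    4: ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
        [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def mean(values):
    return sum(values) / len(values)


def summarise(xs, ys):
    """Return the means of x and y plus the least-squares slope and intercept."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {"mean_x": mx, "mean_y": my, "slope": slope, "intercept": my - slope * mx}


SUMMARY = {name: summarise(xs, ys) for name, (xs, ys) in QUARTET.items()}

# Print each set's statistics rounded to two decimal places for comparison.
for name, stats in SUMMARY.items():
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

The printed rows agree to two decimal places, which is exactly why the numbers alone give no hint of how different the four plots are.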

In contrast to the books by Tufte and Yau, this book is much more about how to do data visualisation as a job. It talks pragmatically about getting briefs from the client and their demands. I suspect much of this would apply to any design work.

I liked Kirk’s “Eight Hats of data visualisation design” metaphor, which names the skills a visualiser requires: Initiator, Data Scientist, Journalist, Computer Scientist, Designer, Cognitive Scientist, Communicator and Project Manager. In part, this covers what you will require to do data visualisation, but it also gives you an idea of whom you might turn to for help – someone with the right hat.

The book is scattered with examples of interesting visualisations, alongside a comprehensive taxonomy of chart types. Unsurprisingly, the chart types are classified in much the same way as statistical methods: in terms of the variable categories to be displayed (i.e. continuous, categorical and subdivisions thereof). There is a temptation here though: I now want to make a Sankey diagram… even if my data doesn’t require it!

In terms of visualisation creation tools, there are no real surprises. Kirk cites Excel first, but this is reasonable: it’s powerful, ubiquitous, easy to use and produces decent results as long as you don’t blindly accept defaults or get tempted into using 3D pie charts. He also mentions the use of Adobe Illustrator or Inkscape to tidy up charts generated in more analysis-oriented packages such as R. With a programming background, the temptation is to fix problems with layout and design programmatically, which can be immensely difficult. Listed under programming environments is the D3 JavaScript library; this is a system I’m interested in using, having had some fun with Protovis, a D3 predecessor.

Data Visualization works very well as an ebook. The figures are in colour (unlike the printed book) and references are hyperlinked from the text. It’s quite a slim volume which I suspect complements Andy Kirk’s “in-person” courses well.

Book review: The Dinosaur Hunters by Deborah Cadbury

A rapid change of gear for my book reviewing: having spent several months reading “The Eighth Day of Creation” I have completed “The Dinosaur Hunters” by Deborah Cadbury in only a couple of weeks. Is this a bad thing? Yes, and no – it’s been nice to read a book that rattles along at a good pace, is gripping and doesn’t have me leaping to make notes at every page – the downside is that I feel I have consumed a literary snack rather than a meal.

The Dinosaur Hunters covers the initial elucidation of the nature of large animal fossils, principally of dinosaurs, from around the beginning of the 19th century to just after the publication of Darwin’s “On the Origin of Species” in 1859. The book is centred around Gideon Mantell (1790-1852), who first described the Iguanodon and was an expert in the geology of the Weald, at the same time running a thriving medical practice in his home town of Lewes. Playing the part of Mantell’s nemesis is Richard Owen (1804-1892), who formally described the group of species, the Dinosauria, and was to be the driving force in the founding of the Natural History Museum in the later years of the 19th century. Smaller parts are played by Mary Anning (1799-1847), a fossil collector based in Lyme Regis; William Buckland (1784-1856), who described Megalosaurus, the first of the dinosaurs, and spent much of his life trying to reconcile his Christian faith with new geological findings; Georges Cuvier (1769-1832), the noted French anatomist who related fossil anatomy to modern animal anatomy and identified the existence of extinctions (although he was a catastrophist who saw this as evidence of different epochs of extinction rather than a side effect of evolution); Charles Lyell (1797-1875), a champion of uniformitarianism (the idea that modern geology is the result of processes visible today continuing over great amounts of time); Charles Darwin (1809-1882), who really needs no introduction; and Thomas Huxley (1825-1895), a muscular proponent of Darwin’s evolutionary theory.

For me a recurring theme was that of privilege and power in science. Often this is portrayed as something which disadvantaged women, but in this case Mantell is something of a victim too, as was William Smith as described in “The Map that Changed the World”. Mantell was desperate for recognition but held back by his full-time profession as a doctor in a minor town and his faith that his ability would lead automatically to recognition. Owen, on the other hand, with a similar background (and prodigious ability), went first to St Bartholomew’s hospital and then the Royal College of Surgeons, where he appears to have received better patronage but was also brutal and calculating in his ambition. Ultimately Owen over-reached himself in his scheming, and although he satisfied his desire to create a Natural History Museum, in death he left little personal legacy – his ability trumped by his dishonesty in trying to obliterate his opponents.

From a scientific point of view the thread of the book runs from the growing understanding of stratigraphy (i.e. the consistent sequence of rock deposits through Great Britain and into Europe), through the discovery of large fossil animals which had no modern equivalent and the discovery of an increasing range of these prehistoric remnants, each with their place in the stratigraphy, to the synthesis of these discoveries in Darwin’s theory of evolution. Progress in the intermediate discovery of fossils was slow because, in contrast to the early fossils of marine species such as ichthyosaurus and plesiosaurus, which were discovered substantially intact, later fossils of large land animals were found fragmented in Southern England. This made it difficult to identify the overall size of such species, and even the number of species present in your pile of fossils.

These scientific discoveries collided with a social thread which saw the clergy, deeply involved in scientific discovery at the beginning, becoming increasingly discomforted as the account of the genesis of life in Scripture proved incompatible with the findings in the stone. This ties in with a scientific community trying to make their discoveries compatible with Scripture and what they perceived to be the will of God, with the schism between the two eventually coming to a head with the publication of Darwin’s Origin of Species.

Occasionally the author drops into a bit of first person narration, which I must admit to finding a bit grating, perhaps because for people long dead it is largely inference. I’d have been very happy to have chosen this book for a long journey or a holiday; I liked the wider focus on a story rather than an individual.

References

My Evernotes

Book review: The Eighth Day of Creation by Horace Freeland Judson

My reading moves seamlessly from the origins of cosmology (in Koestler’s Sleepwalkers) to the origins of molecular biology in “The Eighth Day of Creation” by Horace Freeland Judson. The book covers the revolution in biology starting with the elucidation of the structure of DNA through to how this leads to the synthesis, by organisms, of proteins – this covers a period from just before the Second World War to the early 1960s, although in the Epilogue and Afterwords Judson comments on the period up to the mid-nineties. Although the book does provide basic information on the core concepts (What is DNA? What is a protein?), I suspect it requires a degree of familiarity with these ideas to make much sense on a casual reading – the same applies to this blog post.

The first third or so of the book covers the elucidation of the structure of DNA. Three groups were working on this problem – that of Linus Pauling in the US, Franklin and Wilkins at King’s College in London and Crick and Watson in Cambridge. Key to the success of Crick and Watson was their collaboration: a willingness to talk to people who knew stuff they needed to know, and piecing the bits together. The structural features of their model were the helix form (this wasn’t news), specific and strong hydrogen bonding between bases, and the presence of two DNA chains (running in opposite directions). On the whole this wasn’t a new story to me, although I wasn’t familiar with the surrounding work which established DNA as the genetic material. Judson returns to the part played by Rosalind Franklin in the discovery in one of the Afterwords. It has been said that Franklin was greatly wronged over the discovery of the structure of DNA, but Judson does not hold this view and I tend to agree with him. The core of the problem is that the Nobel Prize is not awarded posthumously, and with her death at 37 from cancer, Franklin therefore missed out. Watson’s book The Double Helix was a rather personalised view of the characters involved, most of whom were alive to carry out damage limitation, whilst Franklin was not – so here she was poorly treated, but by Watson rather than a whole community of scientists. Perhaps the thing that said the most to me about the situation is that after she was diagnosed with cancer she stayed with the Cricks at their home.

In parallel with the elucidation of the structure of DNA, work had been ongoing on understanding protein synthesis and genetics in viruses and bacteria. This included how information was coded into DNA, with much effort expended in trying to establish overlapping codes. There are 20 amino acids and four bases in DNA, so three base pairs are required to specify an amino acid if the amino acid sequence is to be unconstrained (two base pairs give only 16 combinations). It was conceivable that two consecutive amino acids are coded by fewer than 6 base pairs, but in this case there is a restriction on the possible amino acid sequences. This area was initiated by the physicist George Gamow. I struggle a bit to see how it gained so much traction: this type of model was quickly ruled out by consideration of the amino acid sequences that were being established for proteins at the time. It turns out that amino acids are coded by three consecutive base pairs with redundancy (so several different base pair triplets code for the same amino acid). Also covered was the mechanism by which data passed from DNA to the ribosomes, where protein synthesis takes place; important here are adaptor molecules which carry the appropriate amino acid to the site of synthesis.
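The counting argument behind the triplet code is short enough to sketch in Python. This is only the arithmetic described above, not anything taken from Judson's book, and the function name `min_codon_length` is a hypothetical one chosen for illustration.

```python
def min_codon_length(n_amino_acids, n_bases):
    """Smallest number of consecutive bases giving at least one code per amino acid."""
    length = 1
    # n_bases ** length is the number of distinct "words" of that length.
    while n_bases ** length < n_amino_acids:
        length += 1
    return length


codon_length = min_codon_length(20, 4)
# Words left over once every amino acid has one: the room for redundancy.
spare_words = 4 ** codon_length - 20
print(codon_length, spare_words)
```

With four bases, pairs fall short of the 20 amino acids while triplets overshoot, and the surplus triplets are what make the redundancy in the real code possible.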

Compared to the structure of DNA this work was a long difficult slog, involving intricate experiments with bacteria, bacteriophage viruses, bacterial sex, ultracentrifugation, chromatography and radiolabelling.

The final part of the book is on the elucidation of the structure of proteins. This was done using x-ray crystallography, with the very first clear scattering patterns measured in the 1930s and the first full elucidation made in the late fifties. X-ray crystallography of proteins, containing many thousands of atoms, is challenging. Fundamentally there is an issue, the “phase problem”, which means you don’t have quite enough information to determine the structure from the scattering pattern. This issue was resolved by heavy atom labelling: here you try to chemically attach a heavy atom such as mercury to your protein, then compare the scattering pattern of this modified protein with that of the unmodified protein, which resolves the phase problem. Nowadays measuring the thousands of spots in an x-ray scattering pattern and carrying out the thousands and thousands of calculations required to resolve the structure is relatively straightforward, but in the early days it was a massive manual labour.
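To get a feel for the phase problem, here is a toy one-dimensional sketch in Python. It is not real crystallography: it assumes a made-up “electron density” on eight grid points and treats the scattering pattern as the squared amplitudes of its discrete Fourier transform. All names and values are hypothetical.

```python
import cmath


def dft(values):
    """Plain discrete Fourier transform of a list of numbers."""
    n = len(values)
    return [
        sum(v * cmath.exp(-2j * cmath.pi * k * idx / n) for idx, v in enumerate(values))
        for k in range(n)
    ]


def intensities(values):
    """What a detector would record: squared amplitudes, with the phases discarded."""
    return [abs(f) ** 2 for f in dft(values)]


# Hypothetical 1-D "electron density" and its mirror image on the same grid.
density = [0.0, 1.0, 3.0, 0.0, 0.0, 2.0, 0.0, 0.0]
mirrored = [density[-idx % len(density)] for idx in range(len(density))]

# The two structures differ, yet their recorded intensities coincide.
same_pattern = all(
    abs(a - b) < 1e-9 for a, b in zip(intensities(density), intensities(mirrored))
)
print(density != mirrored, same_pattern)
```

Two different structures giving the same intensities is the ambiguity in miniature; extra information, such as the comparison with a heavy-atom derivative, is needed to pin down the phases.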

As well as resolving structure, a key discovery was made regarding the mode of action of proteins: essentially they work as adaptors between chemically distinct systems – when a molecule binds to one site on a protein it affects the ability of another type of molecule to bind to another site on the protein, through changes in the protein structure induced by the first molecule’s binding. This feature opens up huge possibilities for cell biology – in the absence of this feature, interactions between chemical systems can only occur if the participants in those systems interact with each other chemically.

It isn’t something I’d really appreciated properly, but molecular biologists are quite organised in the organisms that they generally agree to work on. The truth is that there are uncountably many organisms, so to aid the progress of science one needs to select which ones to study: E. coli, the T series bacteriophages, C. elegans, D. melanogaster and, more recently, the zebrafish. They almost play the part of an extra author.

Molecular biology was apparently dominated by physicists. I must admit I found this confusing in the past, but Judson highlights the field as defined by its practitioners: biochemistry is about energy and matter (and typically small molecules), molecular biology is about information (and typically macromolecules) – a more natural home for physicists.

I found the first and third parts an enjoyable read; my scientific background is in scattering, so the technical material was at least familiar. The central section on genetics I found fascinating but a bit of a slog. I’m somewhat in awe of the complexity of the experiments (and their apparent difficulty).

Looking back on my earlier book reviews, I read my comment on R.J. Evans’ book on historiography that history is a literary exercise as well as anything else. As a trained scientist this was something of an alien concept, but in common with Koestler’s book the style of this book shines through.

 

Footnotes

My Evernotes

Book review: The Sleepwalkers: A History of Man’s Changing Vision of the Universe by Arthur Koestler

Another result of my plea for reading suggestions on Twitter: this is a review and summary of Arthur Koestler’s book “The Sleepwalkers: A History of Man’s Changing Vision of the Universe”. The book is a history of cosmology running from Pythagoras, in the 6th century BC, to Galileo, whose life spanned the end of the 16th century and the start of the 17th, just touching lightly on Newton. It traces a revolution from a time when the cosmos, beyond the earth, was considered different, stable and perfect, to a time when it was shown to be subject to earthly physics, changeable and not perfect by any reasonable definition.

Kuhn’s language of paradigm shifts seems rather overused to me, but here is an example of a true paradigm shift. The “sleepwalkers” of the title refers to the idea that the protagonists didn’t really know where they were headed with their ideas and quite often were lucky with errors which cancelled each other out.

The book starts with a cursory look at Babylonian and early Greek astronomy; despite considerable observational acumen their models of the universe were outright mythical. The Pythagorean Brotherhood, although in many senses still mystical, started to think about the physics of the universe. I have a tendency to think of the ancient Greeks as one blob, but as the book makes clear there is a huge span of time, and outlook, between Pythagoras, Plato, Aristotle and Ptolemy. Koestler is quite clearly disappointed with the Greeks: they make a promising start with Pythagoras, Aristarchus develops a heliocentric model for the solar system, and then with Plato, Aristotle and Ptolemy they regress back to a geocentric model.

Following on from the Greeks, the Middle Ages are covered. James Hannam in his book “God’s Philosophers” has covered why this period wasn’t all that bad in terms of intellectual development. Koestler is less sympathetic: his key accusations are that the philosophers of the Middle Ages were in thrall to the later Greeks and, furthermore, that there were elements of Christian theology that abjured the pleasure of knowledge for knowledge’s sake.

After these preliminaries, Koestler turns to the core of his work: the cosmological developments of Copernicus, Tycho Brahe, Johannes Kepler and Galileo Galilei.

The model of the universe handed down from the ancient Greeks was one of circles (often referred to in this context as epicycles). They believed that motion in a circle was perfect, that the heavens were a separate, perfect realm and that therefore all motion in the heavens must be based on circular motion. Further, the model dominating at the end of their period held that the earth lay at the centre of these circular motions. The only problem with this model is that it doesn’t fit well the observed motions of the sun, moon, Mercury, Venus, Mars, Jupiter and Saturn – the observable solar system, which lay against an unchanging starry background. Or rather, you can get a rough fit at the expense of stacking together a great number of epicycles – something like 50.
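The idea of stacking circles can be made concrete with a few lines of Python. This is a minimal sketch under my own assumptions, not a reconstruction of any historical model: each circle is a hypothetical (radius, angular speed, phase) triple, and the radii and speeds below are invented for illustration.

```python
import cmath
import math


def epicycle_position(circles, t):
    """Position at time t of a point carried by a stack of circles.

    Each circle is (radius, angular_speed, phase); the centre of each
    circle rides on the rim of the previous one.
    """
    z = sum(r * cmath.exp(1j * (speed * t + phase)) for r, speed, phase in circles)
    return z.real, z.imag


# Hypothetical large circle plus a single epicycle (values chosen for illustration).
CIRCLES = [(10.0, 1.0, 0.0), (2.0, 5.0, 0.0)]

# Distance from the centre, sampled at one-degree steps around the large circle.
distances = [
    math.hypot(*epicycle_position(CIRCLES, 2 * math.pi * step / 360))
    for step in range(360)
]
print(round(min(distances), 2), round(max(distances), 2))
```

A single circle keeps the point at a fixed distance from the centre; adding a second makes the distance wobble, and each further circle appended to the list adds another adjustable wobble, which is how a rough fit to the observations can be bought at the cost of ever more circles.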

Copernicus’ contribution, published on his death in 1543, was to put the sun back at the centre of the universe. Copernicus led a rather uneventful life, was no sort of astronomical observer and only published his thesis at the end of his life at the strong urging of Georg Joachim Rheticus. He’d discussed his model fairly freely during his life, and his reasons for not publishing were more to do with fear of ridicule from his contemporaries than theological pressure. After his death his work, with the exception of the astronomical tables, sank into obscurity, partly because it was a difficult read and partly because he managed to alienate his former cheerleader, Rheticus. Copernicus’ model still holds to the epicycles of the Greeks, and only marginally reduces the complexity of the model.

Next up comes Johannes Kepler, interspersed with Tycho Brahe. Brahe was an astronomical observer and nobleman, funded very well by the Danish king; given his own island Hveen where he built his observatory. As a keen astrologer he began his observation programme when he found a conjunction of Jupiter and Saturn was poorly predicted by current astronomical tables – how can you cast an accurate fortune under these circumstances?

Kepler was a theoretician rather than an observer but also a keen astrologer. I emphasise this because these days astrology is not held in high regard but it is the father of observational astronomy. He had started to develop a model of the solar system based on the Platonic solids – something of a mystical exercise – but realised he needed better data to support his model. Brahe was the man with the data. Kepler was only just in time though: he travelled to work with Brahe when Brahe moved to Prague, and less than 2 years later Brahe was dead. Nowadays we know Kepler for his three laws of planetary motion (it’s worth noting that Kepler’s laws are labelled retrospectively).

He left copious records of his progress, which Koestler traces in great detail. Kepler’s struggle to recognise that planetary orbits were ellipses was heroic and has something of a pantomime air to it – “They’re right in front of you!”. His approach was unprecedented in the sense that he sought to accurately model the very best, most recent measurements. Kepler also made some attempts at a physical model to describe the motions of the planets, but ultimately he is remembered for the detailed description of their motion. Since it is not central to his theme, Koestler makes only passing reference to Kepler’s work on optics.

The penultimate figure in the story is Galileo; despite Kepler’s best efforts, Galileo pretty much ignored him. Galileo gets quite short shrift from Koestler, who feels that he brought his troubles with the Catholic Church upon himself. Reading this account, Koestler’s position is not unreasonable. Galileo’s two big contributions to the story are his promotion and use of the telescope, and his work on the motion of terrestrial bodies, the generalisation of which and application to the solar system was Newton’s great triumph. Cosmologically he was only later in his life a supporter of the somewhat retro Copernican model, which was a cul-de-sac in terms of theoretical developments. At the time the Catholic Church, particularly the Jesuits, were interested in astronomy and not particularly hardline about the interpretation of Scripture to fit observations. Galileo wound them up both by claiming all newly observed celestial phenomena as his own and by putting the words of the Pope in the mouth of an idiot in one of his Dialogues.

This highlights two of the wider themes that Koestler brings to his book. At one point he describes his cast of characters as “moral dwarves”; he states this is relative to their scientific achievements but returns to this theme in the epilogue, where he feels that our scientific developments have not been matched by our spiritual development. The second is the schism between science and the Church that began in this period; Koestler seems to put much of the blame for this on Galileo’s head, feeling that it was by no means inevitable. In the epilogue he also draws a comparison between biological evolution and scientific developments, highlighting specifically that there are long periods of not that much happening and many diversions from the “true” path.

The book finishes with a brief mention of Newton’s synthesis of Kepler’s laws and Galileo’s dynamics to produce a model of the solar system which is close to that which we hold today.

This really is a rollicking good read! This is a relatively old book, published in 1959, and one might anticipate that it has not fully caught up with modern historiography; however, a brief look around the internet suggests that Koestler is not criticised in any great sense. He does tend to focus on a limited number of “great” individuals and goes for “firsts”, but this perhaps is what makes it a good read.

Footnotes

My Evernotes for the book are here, last page of the book at the top!

Book Review: Alan Turing: The Enigma by Andrew Hodges

A brief panic over running out of things to read led me to poll my Twitter followers for suggestions; Andrew Hodges’ biography of Alan Turing, Alan Turing: The Enigma, was one result of that poll. Turing is most famous for his cryptanalysis work at Bletchley Park during the Second World War. He was born 23rd June 1912, so this is his 100th anniversary year. He was the child of families in the Indian Civil Service, with a baronetcy in another branch of the family.

The attitude of his public school, Sherborne, was very much classics first; this attitude seems to have been common and perhaps persists today. Turing was something of an erratic student, outstanding in the things that interested him (although not necessarily at all tidy) and very poor in those things that did not interest him.

After Sherborne he went to King’s College, Cambridge University on a scholarship for which he had made several attempts (one for my old college, Pembroke). The value of the scholarship, £80 per annum, is quite striking: it is double the value of unemployment benefit and half the wage of a skilled worker. He started study in 1931, on the mathematics Tripos. His scholarship examination performance was not outstanding. Significant at this time is the death of his close school friend, Christopher Morcom, in 1930.

King’s is a notorious hotbed of radicals, and at this time Communism was somewhat in vogue. A likely stimulus for this was the Great Depression: capitalism was seen to be failing and Communism offered, at the time, an attractive alternative. Turing does not appear to have been particularly politically active though.

During his undergraduate degree, in 1933, he provided a proof of the Central Limit Theorem – it turns out a proof had already been made but this was his first significant work. He then went on to answer Hilbert’s Entscheidungsproblem (German for “Decision Problem”) in mathematics with his paper, “On computable numbers”1. This is the work in which he introduced the idea of a universal machine that could read symbols from a tape, adjust its internal state on the basis of those symbols and write symbols on the tape. The revelation for me in this work was that mathematicians of Turing’s era were considering numbers and the operations on numbers to have equivalent status. It opens the floodgates for a digital computer of the modern design: data and instructions that act on data are simply bits in memory; there is nothing special about either of them. In the period towards the Second World War a variety of specialised electromechanical computing devices were built, analogue hardware which attacked just one problem. Turing’s universal machine, whilst proving that it could not solve every problem, highlighted the fact that an awful lot of problems could be solved with a general computing machine – to switch to a different problem, simply change the program.
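The read, adjust state, write cycle can be illustrated with a very small simulator. This is a minimal Python sketch of a Turing-style machine, not Turing’s own formulation: the rule-table layout, the function name `run_turing_machine` and the two example programs are all assumptions made for illustration.

```python
def run_turing_machine(rules, tape, state="start", halt="halt", blank="_", max_steps=1000):
    """Run a rule table on a tape and return the final tape contents.

    rules maps (state, symbol) to (symbol_to_write, move, next_state),
    where move is "L" or "R". The tape is unbounded in both directions.
    """
    cells = dict(enumerate(tape))
    head = 0
    steps = 0
    while state != halt:
        if steps >= max_steps:
            raise RuntimeError("machine did not halt within max_steps")
        symbol = cells.get(head, blank)                 # read
        write, move, state = rules[(state, symbol)]     # adjust internal state
        cells[head] = write                             # write
        head += 1 if move == "R" else -1
        steps += 1
    if not cells:
        return ""
    lo, hi = min(cells), max(cells)
    return "".join(cells.get(i, blank) for i in range(lo, hi + 1)).strip(blank)


# Two hypothetical programs for the same machine: each is just a table of data.
INVERT_BITS = {
    ("start", "0"): ("1", "R", "start"),
    ("start", "1"): ("0", "R", "start"),
    ("start", "_"): ("_", "R", "halt"),
}
UNARY_INCREMENT = {
    ("start", "1"): ("1", "R", "start"),
    ("start", "_"): ("1", "R", "halt"),
}

print(run_turing_machine(INVERT_BITS, "1011"))
print(run_turing_machine(UNARY_INCREMENT, "111"))
```

The simulator itself never changes; swapping one rule table for another is all it takes to attack a different problem, and the table is passed around as ordinary data.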

Alonzo Church, at Princeton University, produced an answer for the Entscheidungsproblem at the same time; Turing went to Princeton to study for his doctorate with Church as his supervisor.

Turing had been involved in a minor way in codebreaking before the outbreak of World War II and he was assigned to Bletchley Park as soon as war started. His work on the “Turing machine” provides a clear background for attacking German codes based on the Enigma machine. This is not the place to relate in detail the work at Bletchley: Turing’s part in it was as something of a mathematical guru but also someone interested in producing practical solutions to problems. The triumph of Bletchley was not the breaking of individual messages but the systematic breaking of German systems of communication. Frequently, it was the breaking of a system which was critical: in principle the Enigma machine (or variants of it) could offer practically unbreakable codes, but in practice the way it was used offered a way in. Towards the end of the war Turing was no longer needed at Bletchley and he moved to a neighbouring establishment, Hanslope Park, where he built a speech encrypting system, Delilah, with Don Bayley – again a very practical activity.

Following the war Turing was seconded to the National Physical Laboratory, where it was intended he would help build ACE (a general purpose computer). However, this was not to be – in contrast to work during the war, building ACE was a slow, frustrating process and ultimately he left for Manchester University, who were building their own computer. Again Turing shows a high degree of practicality: he worked out that an alcohol-water mixture close to the composition of gin would be almost as good as mercury for delay line memory. Philosophically, Turing’s vision for ACE was different from the American vision for electronic computing led by von Neumann: Turing sought the simplest possible computing machinery, relying on programming to carry out complex tasks – the American vision tended towards more complex hardware. Turing was thinking about software, a frustrating process in the absence of any but the most limited working hardware, and also thinking more broadly about machine intelligence.

It was after the war that Turing also became interested in morphogenesis2 – how complex forms emerge from undifferentiated blobs in the natural world, based on the kinetics of chemical reactions. He used the early Manchester computer to carry out simulations in this area. This work harks back to some practical calculations on chemical kinetics which he did before going to university.

Turing’s suicide comes rather abruptly towards the end of the book. Turing had been convicted of indecency in 1952, and had undergone hormone therapy as an alternative to prison to “correct” his homosexuality. This treatment had ended a year before his suicide in 1954. By this time the UK government had tacitly moved to a position where no homosexual could work in sensitive government areas such as GCHQ. However, there is no direct evidence that this was putting pressure on Turing personally. Reading the book, there is no sick feeling of inevitability as Turing approaches the end you know is coming.

Currently there are calls for Turing to be formally pardoned for his 1952 indecency conviction. Personally I’m ambivalent about this – a personal pardon for Turing is irrelevant: legal sanctions against homosexual men, in particular, were widespread at the time. An individual pardon for Turing seems to say, “all those other convictions were fine, but Turing did great things so should be pardoned”. Arnold Murray, the man with whom Turing was convicted, was nineteen at the time, an age at which their activities were illegal in the UK until 1994.

What struck me most about Turing from this book was his willingness to engage with practical, engineering solutions to the results his mathematical studies produced.

Hodges’ book is excellent: it’s thorough, demonstrates deep knowledge of the areas in which Turing worked and draws on personal interviews with many of the people Turing worked with.

Footnotes

1. “On computable numbers, with an application to the Entscheidungsproblem”, A.M. Turing, Proceedings of the London Mathematical Society 42:230-265 (1936).

2. “The Chemical Basis of Morphogenesis”, A.M. Turing, Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences, Vol. 237, No. 641. (Aug. 14, 1952), pp. 37-72.

3. My Evernotes for the book

4. Andrew Hodges’ website to accompany the book