Category: Book Reviews

Reviews of books featuring a summary of the book and links to related material

Book review: Effective Computation in Physics by Anthony Scopatz & Kathryn D. Huff

This next review, of “Effective Computation in Physics” by Anthony Scopatz & Kathryn D. Huff, arose after a brief discussion on Twitter with Mike Croucher following my review of “High Performance Python” by Micha Gorelick and Ian Ozsvald. The discussion was in the context of introducing students, primarily in the sciences, to programming and software development.

I use the term “software development” deliberately. Scientists have been taught programming (badly, in my view*) for many years. Typically they are given a short course in the first year of their undergraduate training, where they are taught the crude mechanics of a programming language (typically FORTRAN, C, Matlab or Python). They are then left to it, perhaps taking on projects requiring significant coding as final year projects or in PhDs. What they have lacked is the wider skillset around programming – what you might call “software development”. The value of this is two-fold. Firstly, it is good training for a scientist pursuing a career in science. Secondly, the wider software industry is full of scientists, so giving students a good grounding in this field is no bad thing for their future employability.

The book covers in at least outline all the things a scientist or engineer needs to know about software development. It is inspired by the Software Carpentry and The Hacker Within programmes.

The restriction to physics in the title seems needless to me. The material presented is mostly applicable to any science, and to those working in the digital humanities who undertake programming work. The examples have a physics basis but not to any great depth, and the decorative historical anecdotes are all physics based. Perhaps the only exception to this is the chapter on HDF5, a specialised data storage system; some coverage of SQL databases would make a reasonable substitute in a more general course. The chapter on parallel computing could likewise be dropped for a wider audience.

The book is divided into four broad sections. Included in these are chapters on:

  • Command line operations;
  • Programming in Python;
  • Build systems, version control, debugging and testing;
  • Documentation, publication, collaboration and licensing.

Command line operations are covered in two chunks: first, basic navigation of the file system and files, followed by a second chapter on “Regular Expressions” which covers find, grep, sed and awk – at a very basic level.

The introduction to Python is similarly staged with initial chapters covering the fundamentals of the core language, with sufficient detail and explanation that I learnt some new things**. Further chapters introduce core Python libraries for data analysis including NumPy, Pandas and matplotlib.

Beyond these core chapters on Python, those on version control, debugging and testing are a welcome addition. Our dearest wish at ScraperWiki, a small software company where I worked until recently, was that new recruits and interns would come with at least some knowledge of, and habit of using, source control (preferably Git). It is also nice to see some wider discussion of GitHub and the culture of Pull Requests and issue tracking. Systematic testing is also a useful skill to have; in fact my experience has been that formal testing is most useful for the most physics-like functions.
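As an illustration of testing a “physics-like” function, here is a minimal sketch using Python’s standard unittest module. The function projectile_range and its tests are hypothetical examples of mine, not taken from the book. The tests check physical properties with known answers (maximum range at 45 degrees, equal range for complementary angles) rather than particular numbers.

```python
import math
import unittest


def projectile_range(speed, angle_deg, g=9.81):
    """Horizontal range (m) of a projectile launched from and landing on flat
    ground, ignoring air resistance: R = v**2 * sin(2 * theta) / g."""
    theta = math.radians(angle_deg)
    return speed ** 2 * math.sin(2 * theta) / g


class TestProjectileRange(unittest.TestCase):
    def test_maximum_at_45_degrees(self):
        # For a fixed launch speed the range should peak at 45 degrees.
        best = projectile_range(10.0, 45.0)
        for angle in (30.0, 40.0, 50.0, 60.0):
            self.assertLess(projectile_range(10.0, angle), best)

    def test_complementary_angles(self):
        # Launch angles theta and 90 - theta give the same range.
        self.assertAlmostEqual(projectile_range(20.0, 30.0),
                               projectile_range(20.0, 60.0))

    def test_vertical_launch_lands_at_origin(self):
        # Straight up means straight back down: zero horizontal range.
        self.assertAlmostEqual(projectile_range(15.0, 90.0), 0.0)
```

Saved in a file whose name starts with test (for example test_projectile.py, a hypothetical name), this can be run with `python -m unittest`.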

The final section covers documentation, publication and licensing. I found the short chapter on licensing rather useful: I’ve been working on some code to analyse LIDAR data and have made it public on GitHub, which helpfully asks which license I would like to use. As it turns out I chose the MIT license, and this seems to be the correct one for the application. On publication the authors are LaTeX evangelists, but students can choose to ignore their monomania on this point. LaTeX has a cult-like following in physics which I’ve never understood. I have written papers in LaTeX but much prefer Microsoft Word for creating documents, although Google Docs is nice for collaborative work. The view that a source control repository’s issue tracker might work for collaboration beyond coding is optimistic unless academics have changed radically in the last few years.

I’d say the only thing lacking was any mention of pair programming, although to be fair that is more a teaching method than course material. I found I learnt most when I had a goal of my own to work towards, and I had the opportunity to pair with people with more knowledge than I had. Actually, pairing with someone equally clueless in a particular technology can work pretty well.

There is a degree to which the book, particularly in the final section, strays into a fantasy of how the authors wish computational physics were undertaken, rather than describing how it is actually undertaken.

To me this is the ideal “Software development for scientists” undergraduate text. It is opinionated in places and I occasionally found the style grating, but nevertheless it covers the right bases.

*I’m happy to say this since I taught programming badly to physics undergraduates some years ago!

**People who know my Python skills will realise this is not an earthshattering claim.

Book review: High Performance Python by Micha Gorelick & Ian Ozsvald

High Performance Python by Micha Gorelick and Ian Ozsvald is nominally a book about improving the speed and memory performance of your programs. Along the way it provides insight into some more advanced aspects of Python programming, including how the language works under the hood.

The book starts with tools for analysing the speed and memory performance of programs at global, function and line level. The authors emphasise the importance of making these measurements, and of using unit testing to ease the process of optimisation. Blindly optimising where you *think* the problem lies is never a good idea.
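As a small illustration of measuring before optimising, the sketch below profiles a naive function with the standard library’s cProfile and pstats modules, then checks that a faster replacement gives identical answers, which is the role unit tests play in this approach. The function names are hypothetical, and this is my example rather than one from the book.

```python
import cProfile
import io
import pstats


def sum_of_squares_loop(n):
    # Deliberately naive: builds an intermediate list element by element.
    squares = []
    for i in range(n):
        squares.append(i * i)
    return sum(squares)


def sum_of_squares_formula(n):
    # Closed form for 0**2 + 1**2 + ... + (n - 1)**2.
    return (n - 1) * n * (2 * n - 1) // 6


def profile_call(func, *args):
    """Run func(*args) under cProfile; return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(5)  # top 5 entries only
    return result, stream.getvalue()
```

Calling `profile_call(sum_of_squares_loop, 10_000)` shows where the time goes; before swapping in `sum_of_squares_formula`, a test comparing the two over a range of inputs guards against the optimisation changing the answer.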

The next set of chapters talk about some core Python data structures, with a little about their implementation and relative performance. These include lists and tuples, dictionaries and sets, iterators and generators, and matrices and vectors. It is here that the numpy library is introduced, and it is treated as almost a core Python library in its importance.
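The performance difference between data structures is easy to demonstrate. This sketch, my own rather than the book’s, uses timeit to compare a membership test on a list, which scans element by element, with the same test on a set, which uses a hash lookup. The sizes are arbitrary.

```python
import timeit


def membership_timings(n=100_000, repeats=20):
    """Time looking up one element in a list versus a set of n integers."""
    data_list = list(range(n))
    data_set = set(data_list)
    target = n - 1  # worst case for the list: a linear scan to the very end
    list_time = timeit.timeit(lambda: target in data_list, number=repeats)
    set_time = timeit.timeit(lambda: target in data_set, number=repeats)
    return list_time, set_time
```

The list lookup grows with n, while the set lookup stays roughly constant on average, which is why sets and dictionaries are the natural choice for repeated membership tests.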

The difference between range and xrange in Python 2 is striking: if you wish to execute a loop some number of times then range builds a list containing that number of elements, whereas xrange produces the values lazily, one at a time, and therefore uses far less memory.
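In Python 3, xrange no longer exists and range itself behaves lazily, so the same memory difference can be seen by comparing a range object with an explicit list. A minimal sketch, assuming CPython, where sys.getsizeof reports the size of the container object itself (for the list, its array of references, not the integers it refers to):

```python
import sys

n = 1_000_000
lazy = range(n)          # stores only start, stop and step
eager = list(range(n))   # materialises a reference to every integer

print(sys.getsizeof(lazy))   # small, and the same whatever n is
print(sys.getsizeof(eager))  # grows in proportion to n
```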

The next few chapters cover compiling Python to C for speed increases, concurrency, the multiprocessing module and clusters. Typically chapters take an example (Julia sets, diffusion equations, estimating pi, finding primes) and demonstrate the speedups which can be made, from the routine to the ridiculous. The authors point out when further optimisation is a bit pointless.

For compiling to C, there are a number of options of varying coverage and maturity. Cython is the most mature and has the widest coverage, but at the cost of requiring breaking changes to the code. Newer solutions include Numba and PyPy; they do not require breaking changes to the code, but they are less mature and, in the case of PyPy, do not support the important numpy library.

Concurrency is about making better use of a single processor using asynchronous methods, so that work can continue while a program waits on I/O such as network requests; here there are libraries such as gevent and tornado.
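The book discusses gevent and tornado; as a stand-in, the sketch below uses the standard library’s asyncio instead, which is a swap on my part rather than what the book covers. Simulated I/O waits (asyncio.sleep) overlap on a single thread, so five 0.2 second waits take roughly 0.2 seconds in total rather than a full second.

```python
import asyncio
import time


async def fetch(name, delay):
    # Stand-in for a network request: asyncio.sleep hands control back to
    # the event loop instead of blocking the thread.
    await asyncio.sleep(delay)
    return name


async def fetch_all(delays):
    # Start every "request" at once and wait for all of them together;
    # gather returns results in the order the coroutines were passed.
    jobs = [fetch(f"job-{i}", d) for i, d in enumerate(delays)]
    return await asyncio.gather(*jobs)


def run(delays):
    start = time.perf_counter()
    results = asyncio.run(fetch_all(delays))
    return results, time.perf_counter() - start
```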

For parallel processing most of the focus is on the multiprocessing library. Most of the book is platform agnostic, but this chapter is based on the Linux implementation of Python. I hadn’t realised before that “embarrassingly parallel” had a specific meaning, i.e. that there is no need for interprocess communication for the problem at hand.
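Estimating pi, one of the book’s examples, is a good case of an embarrassingly parallel problem. The sketch below is my own Monte Carlo version using multiprocessing.Pool, not the book’s code; the sample and worker counts are arbitrary. The `__main__` guard is needed on platforms that start worker processes by importing the main module.

```python
import random
from multiprocessing import Pool


def count_hits(args):
    """Count random points in the unit square that land inside the quarter circle."""
    n_samples, seed = args
    rng = random.Random(seed)  # independent, reproducible generator per job
    hits = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(total_samples=400_000, n_workers=4):
    per_worker = total_samples // n_workers
    jobs = [(per_worker, seed) for seed in range(n_workers)]
    # Each job runs independently; the only communication is summing the
    # counts at the end, which is what makes this embarrassingly parallel.
    with Pool(processes=n_workers) as pool:
        hits = sum(pool.map(count_hits, jobs))
    # The quarter circle covers pi/4 of the unit square.
    return 4.0 * hits / (per_worker * n_workers)


if __name__ == "__main__":
    print(estimate_pi())
```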

The coverage of computing clusters is fairly cursory; this isn’t really the focus of the book, and as the authors highlight, running clusters of machines can bring a significant administration overhead.

The book finishes with a chapter on reducing RAM usage, either by choice of intrinsic data types or by using probabilistic data structures such as the Bloom filter, which offer an approximate answer for vastly less memory. Also included is the Morris counter, which provides an approximate count in 1 byte of storage; I must admit to being bemused as to when I would need such a thing.
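To make these two structures concrete, here is a minimal sketch of each, written for clarity rather than memory efficiency, so it is not the book’s implementation. The Bloom filter derives its bit positions by salting SHA-256, an arbitrary choice on my part. The Morris counter stores only an exponent and increments it with probability 2^-exponent, so that 2^exponent - 1 is an unbiased estimate of the true count; in Python the exponent is an ordinary int, the point being that its value would fit in one byte.

```python
import hashlib
import random


class BloomFilter:
    """Minimal Bloom filter: may give false positives, never false negatives."""

    def __init__(self, n_bits=1024, n_hashes=3):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray((n_bits + 7) // 8)

    def _positions(self, item):
        # Derive n_hashes bit positions by salting a SHA-256 hash of the item.
        for i in range(self.n_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


class MorrisCounter:
    """Approximate counter keeping only an exponent; estimate is 2**exponent - 1."""

    def __init__(self, seed=None):
        self.exponent = 0  # values up to 255 would fit in a single byte
        self.rng = random.Random(seed)

    def increment(self):
        # Bump the exponent with probability 2**-exponent.
        if self.rng.random() < 2.0 ** -self.exponent:
            self.exponent += 1

    def estimate(self):
        return 2 ** self.exponent - 1
```

A single Morris counter is noisy; averaging several independent counters (or using a smaller base than 2) trades a little more storage for accuracy.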

Finally, there are what I refer to as “war stories” from practitioners in the field. I really liked these: one of the difficulties in working in technology is the constant stream of options to choose from, often with no clear frontrunner, so learning how others have approached problems is really handy. Here Celery, the Distributed Task Queue, and ElasticSearch get multiple mentions.

Overall the book is well-produced and readable. It occasionally lapses into inviting the reader to admire the colour in a greyscale printed plot. In other places I felt the example code could have been presented up front in its entirety rather than dribbled out in bits. In the chapter on Using less RAM, I felt important things (tries and directed acyclic word graphs, or DAWGs) were discussed before they were introduced, which was a bit confusing. Tries and DAWGs are systems for the compact storage of text, and are both tree-like structures – I hadn’t come across these before.
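For readers who, like me, hadn’t met tries before, here is a minimal sketch: each node maps a character to a child node, so words with a common prefix share the nodes for that prefix. A DAWG goes further by also merging common suffixes, which this sketch does not attempt. The class names are my own.

```python
class TrieNode:
    __slots__ = ("children", "is_word")

    def __init__(self):
        self.children = {}    # maps a character to the next TrieNode
        self.is_word = False  # True if a stored word ends at this node


class Trie:
    """Minimal prefix tree: words sharing a prefix share the nodes for it."""

    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for ch in word:
            child = node.children.get(ch)
            if child is None:
                child = node.children[ch] = TrieNode()
            node = child
        node.is_word = True

    def _find(self, prefix):
        # Follow the prefix from the root; None if it leaves the trie.
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def __contains__(self, word):
        node = self._find(word)
        return node is not None and node.is_word

    def has_prefix(self, prefix):
        return self._find(prefix) is not None

    def node_count(self):
        # Number of nodes below the root, to show the effect of prefix sharing.
        stack, count = [self.root], 0
        while stack:
            node = stack.pop()
            for child in node.children.values():
                count += 1
                stack.append(child)
        return count
```

Storing “physics”, “physical” and “physician” this way needs 12 character nodes rather than the 24 characters of the three words written out separately, because “physic” is stored once.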

In some ways this book is more about productionisation than performance. For the straightforward non-production data analysis work I’ve done, I can imagine being a bit smarter about my choice of data structures, and using profiling to find where the slow points are, as a result of this book. In the repeated reanalysis cycle it is nice to have something run in a minute rather than 20 minutes, but is it worth a day of development time? I would likely only turn to compilation, concurrency and multiprocessing if I were going to use a particular analysis regularly, or if my anticipated run time was going to be measured in days without optimisation.

I recommend this book to anyone looking to advance their understanding of Python, and speed up their code.

Book review: The Son also Rises by Gregory Clark

The Son also Rises by Gregory Clark is a book about social mobility, as traced through surnames. Clark prefaces his work by saying that what he is about to say might be considered radical and controversial. Other studies of social mobility have found modest “inheritability” between generations; this study finds high levels of inheritability spanning hundreds of years.

The theme for the early chapters is to find some source of high status individuals – be it graduation from prestigious universities such as Oxford, Cambridge or the American Ivy League, membership of professional bodies such as those for doctors or attorneys, or financial records such as occasional tax releases or records of wills (probate). Next, a cohort of surnames is tracked through these systems and its level of incidence is compared against the background level of incidence for that surname. For example, “Smythe” is a relatively rare surname in the general UK population but it is found at a much higher level in records of registered doctors.

The selected cohort of surnames may be from a distinctive ethnic population, e.g. Japanese in America, Native Americans or French settlers. Or it may be selected from a set of high status individuals at a point in time, e.g. the Normans who came to England with William the Conqueror, or Swedish nobles.

Clark’s discovery is that for all of these many cohorts, across multiple measures of status, the persistence over time is strong. The Smythes of 200 years ago had relatively high status then and they still do now. After nearly 1,000 years, those with surnames associated with the Norman conquest are still a little over-represented in the intake of Oxford and Cambridge Universities. Similar behaviour is found for low status groups; Baldrick’s character through the several series of Blackadder is not far from the truth. In both cases these groups are “regressing towards the mean”, but it is a long, slow process.

Following these initial demonstrations of social mobility, Clark states his general law: the correlation of status over generations is high compared to previous parent-child measurements, and remarkably constant across multiple countries, periods in history and cohorts. The magic number for the correlation is 0.75. He argues that his estimate is higher than others because he models status as an underlying component plus a random fluctuation; the methods of calculation used for earlier figures mean that this random fluctuation is much more apparent, and it brings down the measured correlation, so that social mobility appears higher than it really is. I don’t feel he demonstrates the origin of this discrepancy very clearly.
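The measurement-noise argument can be illustrated with a small simulation. This is my own sketch of the idea, not Clark’s method: underlying status persists from parent to child with correlation 0.75, each generation’s status is measured with independent noise, and the noise level (noise_sd) is an arbitrary assumption.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulate(n_families=20_000, b=0.75, noise_sd=1.0, seed=1):
    """Return (underlying, observed) parent-child correlations of status.

    Underlying status has unit variance and persists with coefficient b;
    each measurement adds independent noise with standard deviation noise_sd.
    """
    rng = random.Random(seed)
    parents = [rng.gauss(0.0, 1.0) for _ in range(n_families)]
    # Child status regresses towards the mean (0) at rate b; the extra
    # variance 1 - b**2 keeps the population variance at 1 each generation.
    children = [b * p + rng.gauss(0.0, math.sqrt(1.0 - b * b)) for p in parents]
    observed_parents = [p + rng.gauss(0.0, noise_sd) for p in parents]
    observed_children = [c + rng.gauss(0.0, noise_sd) for c in children]
    return (pearson(parents, children),
            pearson(observed_parents, observed_children))
```

With the noise variance equal to the variance of underlying status, the observed correlation is expected to be about b / (1 + noise_sd**2) = 0.375, half the underlying value, even though the underlying persistence is unchanged.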

Subsequent chapters go on to look at some cases where one really expects deviations from this general rule: the Indian caste system, where low mobility is expected, and China, where the post-revolutionary period is expected to have been a time of high social mobility. It turns out that in India, despite laws aimed at reducing caste-based discrimination, social mobility has not improved dramatically. In China social mobility seems to have been little affected by the revolution. The odd groups that do break the rule of constant social mobility seem to do so by preferential recruitment; e.g. in the past in Muslim countries non-Muslims were tolerated but charged a poll tax, which meant that lower status/income people were more likely to convert to Islam, leaving a more persistently high status non-Muslim population. A second route is a strong preference for “in group” marriage, which is seen in the Indian Brahmin caste. It turns out that the surnames identified with British parliamentarians are particularly immobile.

As for the origin of this constant social mobility, Clark ascribes it to what he calls “social competence”. There is a confused discussion of the balance of nature and nurture, not helped by a table where the nature and nurture headings are accidentally swapped (I think). I believe that technically it is all nurture, and Clark is trying to work out whether it is all about money. It strikes me that your wider family is where you learn about what the possibilities are for you and, while every family has its black sheep, the fact that your father and two out of three uncles went to Cambridge University means that your expectation is that you should aspire to that. Your family sets what is “normal”.

I suspect that this is particularly the case for British parliamentarians, where there seem to be a lot of sibling (Milibands, Johnsons, Eagles), husband-wife (Cooper/Balls) and parent-child (Kinnock, Benn) combinations. Being a politician is an odd sort of job; there is not really a class at school for it, so seeing your family working in the “family business” must be a big influence.

“The Son also Rises” is an interesting read but turning it into a 300-page book seems to belabour the point somewhat. I liked the incidental details of the origins of surnames, and the various sources of information on social status.

I got this as a Kindle edition, and I wish I’d bought it as a paperback: there are numerous figures, tables and equations which didn’t render at a reasonable size in the first instance.

Book review: The Values of Precision edited by M. Norton Wise

The Values of Precision edited by M. Norton Wise is a collection of essays from the Princeton Workshop in the History of Science held in the early 1990s.

The essays cover the period from the mid-18th century to the early 20th century. The early action is in France and moves to Germany, England and the US as time progresses. The topics vary widely, starting with population censuses, then moving on to measurement standards both linear and electrical, calculating methods and error analysis.

I’ve written some notes on each essay; skip to the end of the bullet points if you want the overview:

  • The first article is about the measurement of population, mainly in pre-revolutionary France. This was spurred by two motivations: firstly, monarchs were increasingly seeing the number of their subjects as a measure of their power and, secondly, there was a concern that France was experiencing depopulation. In the 17th century the systematic recording of births, deaths and marriages was mandated by royal direction. In the period after this, populations were estimated either from a count of “hearths” or from the number of births, the idea being that you could take either of these indirect measures and multiply it by some factor to get a true measure of population.
  • The second article is by Ken Alder, he of “The Measure of All Things”, and is another trip to revolutionary France and its efforts to introduce a metric system of measurement. The revolutionary attempt failed, but the system of standards it created prevailed in the middle of the 19th century, though not without some effort. Alder highlights the resistance of France to metrification, and also how the revolution bred a will to introduce a rational system based on natural measurements rather than a physical object created by man. He also discusses some of the benefits of the pre-metric system: local control, the ability for workers to take a cut without varying the price, and a connection to effort expended and quality. This last point arises because land was measured in terms of the amount of grain used to seed it, or the area one person could harvest in a day, and this varies with the quality of the land.
  • Jan Golinski writes on Lavoisier (again in France at the turn of the Revolution) regarding “exactness” and its almost political nature. Lavoisier made much of his exact measurements in determining the masses of what are now called hydrogen and oxygen needed to produce a known mass of water. This caused some controversy, since other experimenters of the time saw his claims of exactness in measurement as being misused in support of his theory of chemical reactions. There were reasons to be sceptical of some of his claims: he often cited weighed amounts to more significant figures than were justified by the precision of his measurements, and there are signs his recorded measurements are a little too good to be true. These could be seen as the birthing pains of a new way of doing science which didn’t just apply to chemical measurements of the time, but also to surveying and the measurement of population. These days the inappropriateness of quoting more significant figures than are justified by the measurement is drummed into students at an early age.
  • Next we move from France to Germany and a discussion by Kathryn M. Olesko of the method of least squares and the authority of measurements. Characters such as Legendre and Laplace had started to put the formal analysis of error and uncertainty in measurement on the map. This work was carried forward by Gauss with the method of least squares; essentially this says that the best estimate of the “true” value of a quantity is the one which minimises the sum of the squared differences between it and all the measurements made of that quantity. It is an idea related to probability, and it is still deeply embedded in how we make measurements today and also in how we compare measurement to theory. In common with events in France, the drive for better measurement in Germany came with a drive to standardise weights and measures for the purposes of trade. The action here takes place in the first half of the 19th century.
  • The trek through the 19th century continues with Simon Schaffer’s essay on the work in England and Germany on electrical units, with a particular view to establishing whether the speed of light and the speed of propagation of electromagnetic waves were the same. This involved the standardisation of units of electrical resistance, and it was work that went on for some time. Interesting from a practising scientist’s point of view was the need for bench scientists and instrument makers to work closely together.
  • The next chapter is a step away from the physical sciences with a look at life insurance and the actuarial profession in the first half of the 19th century. Theodore Porter describes the attitude of this industry to precision and calculation, noting that it fended off attempts to regulate the industry too tightly by arguing that its business could not be reduced to blind calculation. The skill, judgement and character of the actuary were important.
  • The Image of Precision is about Helmholtz’s work on muscle physiology around 1850. He used an apparatus which showed the extension of a muscle graphically following stimulation, and measured the speed of nerve impulses using similar methods. The graphical method was in some senses less precise than an alternative method, but it was a more compelling explanatory tool and provided for better understanding of the phenomena under study.
  • Next up is a discussion of the introduction of so-called “direct-reading” ammeters and voltmeters by Ayrton and Perry around 1870. This was an area of some dispute, with physicists insisting that determinations of volts and amps should be made by reference to the basic units of length, time and mass. Ayrton and Perry were interested in training electrical engineers whose measurements would be made in environments not conducive to these physicist-preferred measurements, neither in a technical sense (stray magnetic fields, vibration and so forth) nor in a practical sense (an answer within 1 percent in 10 minutes was far superior to one within 0.5 percent in 2 hours).
  • As we approach the end of the book we learn of Henry Rowland and his diffraction gratings, made at Johns Hopkins University. Rowland had toured Europe, and on his return set to making high quality diffraction gratings to measure optical spectra. This is a challenging technical task: to be useful a diffraction grating needs many very closely spaced lines of the same profile. Rowland sent out his diffraction gratings for a nominal price, making no profit, but did not reveal the details of his methods. It took many years for his work to be bettered, and even longer for better diffraction gratings to be generally available.
  • The collection finishes with the construction of mathematical tables, starting with a somewhat philosophical discussion of the limits of calculation but moving on to more pragmatic issues of calculating and sharing tables. The need for these tables came originally from the computationally intensive calculations for determining longitude by the method of lunar distances. The 19th century saw the growth of mathematical analysis in a range of areas, spreading the need for mathematical tables. Towards the end of the century machine calculation was used to help build these tables, and to do the analysis they supported. Students of my generation will likely just about remember using tables of trigonometric and other functions; these days, in my practical work, they are entirely replaced by computer calculations done on demand.
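For the least squares idea in Olesko’s essay, the simplest case is n repeated measurements x_1 to x_n of a single quantity: minimising the sum of squared differences S(μ) gives the arithmetic mean. This is the standard textbook derivation, not one taken from the essay.

```latex
S(\mu) = \sum_{i=1}^{n} (x_i - \mu)^2

\frac{dS}{d\mu} = -2 \sum_{i=1}^{n} (x_i - \mu) = 0
\quad\Longrightarrow\quad
\hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} x_i

\frac{d^2 S}{d\mu^2} = 2n > 0 \quad \text{(so } \hat{\mu} \text{ is a minimum)}
```

The same principle, applied to the parameters of a model rather than a single value, is what underlies fitting a theoretical curve to measured data.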

There is a lot in here which will speak to those with a training in science, physics in particular. We will recognise the techniques discussed, and the concerns of the day, from our own training. The essays hold a slight distance from practitioners in these arts, but that brings the benefit of a different view. Core to this view is the way in which precision in measurement is a social as well as a technical affair. To propagate standards of measurement requires the community to build trust in the work of others, and this does not happen automatically.

I like this style of presentation: each essay has its own character and interest. The range covered is much larger than one might find in a book-length biography, and there is a degree of urgency in the authors getting their key points across in the space allocated.

In this book the various chapters do not overlap in their topics and cover a substantial period in time and space with the editor providing some short linking chapters to tie things together. All in all very well done.

Book Review: Stargazers–Copernicus, Galileo, the Telescope and the Church by Allan Chapman

It’s been a while since my last book review here but I’ve just finished reading Stargazers: Copernicus, Galileo, the Telescope and the Church by Allan Chapman.

The book covers the period from the 16th century, the time of Copernicus and Tycho Brahe, to the early 18th century and Bradley’s measurement of stellar aberration, passing Galileo, Newton and others on the way. Conceptually this spans the full transition from a time when people believed in a Classical universe with the earth at its centre, and stars and planets plastered onto crystal spheres, to the modern view of the solar system with the earth and other planets orbiting the sun.

This development parallels that in Arthur Koestler’s classic book “The Sleepwalkers”; however, Chapman’s style is much more readable, and his coverage is broader but not so deep. Chapman introduces a wealth of little personal anecdotes and experiments. For instance, on visiting Tycho Brahe’s island observatory he recounts a meeting with a local farmer who had in his living room a marked stone from Brahe’s observatory (which had been dismantled by the locals on Brahe’s death). Brahe was hated by his tenants for his treatment of them, a hate that was handed down through the generations. Illustrations are provided in the author’s own hand, which is surprisingly effective. He discusses his own work in reconstructing historical apparatus and observations.

Astronomy was an active field from well before the start of this period, for a couple of reasons. Firstly, astrology had been handed down from Classical times as a way of divining the future, and it was believed that better data on the locations of heavenly bodies over time was required to improve the accuracy of astrological predictions. Secondly, the Christian Church required accurate astronomical measurement to determine when Easter fell, across increasingly large spans of the Earth.

The period covered by the book marks a time when new technology made increasingly accurate measurements of the heavens possible, and the telescope made features such as mountains on the moon, sunspots and the moons of Jupiter visible for the first time. Galileo was a principal protagonist in this revolution.

Amongst scientists there is something of a view that the Catholic Church suppressed scientific progress, with Galileo the poster boy for the scientists’ case. Historians of science don’t share this view, and haven’t for quite some time. Looking back on The Sleepwalkers, written in 1959, I noted the same thing: the historians’ view is generally that Galileo brought it on himself by the way he dismissed those who did not share his views in rather offensive terms. Galileo lived in a time when the well-entrenched Classical view of the universe was coming under increased pressure from new observations using new instruments. In some senses it was the collision with this long-held Classical view which led to his problems, the Church being more committed to the Classical view of the physical universe than to anything proposed in Scripture.

The role of the Church in promoting and fostering science is something Chapman returns to frequently, emphasising the scientific work that members of the Church did, and also the often good relationships that lay “scientists” of different faiths had with Church authorities.

Chapman introduces some of the lesser known English (and Welsh) contributors to the story. Thomas Harriot, who made the earliest known telescopic sketches of the moon. The Lancashire astronomers, who made the first observations of the transit of Venus. John Wilkins, whose meetings were to lead to the foundation of the Royal Society. He also notes the precedent of the Royal College of Physicians, formed in 1518. The novelty of the Royal Society when compared with earlier organisations of similar character was that the Fellows were responsible for new appointments, rather than them being imposed by a patron. This seems to have been an English innovation, repeated in the Oxbridge colleges and the Guilds.

Relating to these English astronomers was the development of precision instruments in England. This seems to have been spurred by the Dissolution of the monasteries. The glut of land, seized by Henry VIII, became available to purchase. The purchase of land meant a requirement for accurate surveying, and legal documents. Hence an industry was born of skilled men wielding high technology to produce maps.

I was distracted by the presence of Martin Durkin in the acknowledgements to this book. He was the architect of the “polemical” Channel 4 documentary “The Great Global Warming Swindle”, so his presence cast doubt in my mind as to whether I should take this book seriously. On reflection Chapman’s position as presented in this book seems respectable, but it is interesting how a short statement in the acknowledgements made me consider this more deeply.

Overall, Stargazers is rather more readable than The Sleepwalkers, not quite so single-tracked in its defence of the Catholic Church as God’s Philosophers, and a different proposition to Fred Watson’s book of the same name, which is all about telescopes.