Review of the year: 2016

Another year passes and once more it is time to write the annual review of my blogging. I no longer have an hour and a half or so of commuting on the train every day, so I thought my reading rate might have dropped. However, I see that in the last year I have 21 book reviews on my blog, as opposed to 22 last year. As usual my reading is split between technical books, the history of science and various odds and ends.

In terms of technical books, Pro Git by Scott Chacon and Ben Straub and Test-driven Development with Python by Harry J.W. Percival probably had the biggest impact on the way I do my job. But Beautiful JavaScript, edited by Anton Kovalyov, was the most thought-provoking: it is a collection of essays by a set of skilled JavaScript developers. Lab Girl by Hope Jahren is an autobiography describing what it is like to be a scientist, and it is beautifully written. Maphead by Ken Jennings is about those obsessed with maps rather than science. Of the more directly science-related books I think The Invention of Science by David Wootton was the best in terms of provoking thought, and it is also very readable. The Invention covers the Scientific Revolution from 1500-1700 in terms of the language available to, and used by, its practitioners.

A second contender for the “sweeping overview” award goes to A New History of Life by Peter D. Ward and Joe Kirschvink which focuses particularly on the work over the last 20 years on the very earliest life on earth. I read some economic history in the form of The Honourable Company by John Keay (about the East India Company) and the more general The Company by John Micklethwait and Adrian Wooldridge. I also read about the Romans, in the form of Mary Beard’s SPQR which is a history of ancient Rome, and Roman Chester by David J.P. Mason which is about my home city.

You can see all I’ve read on Goodreads. I don’t blog about my fiction reading, I think because for me blogging is mostly about reminding myself of facts and ideas I’ve read about, and I struggle to see how I’d do that with fiction. Perhaps I should try. In fiction, I’ve been making some effort to read books not written by middle-aged white men, which has been rewarding.

This year’s holiday was to Benllech on the Isle of Anglesey, an embarrassingly short drive from home – our holiday bungalow had leaflets describing attractions in our home city! We took in a number of castles, the beach on a daily basis, and the Anglesey Sea Zoo. The photo at the top is from Amlwch, which was once the port for the Copper Mountain.

The year has been momentous politically, with Jeremy Corbyn’s re-election as leader of the Labour Party, the Leave vote in the EU referendum, David Cameron stepping down as Prime Minister and then leaving parliament, Theresa May taking over as Prime Minister and the election of Donald Trump as president in the US. I haven’t written much about all of these things. I wrote a blog post shortly before the EU referendum, setting out my reasons for voting Remain. I accidentally wrote that I thought Leave would win – which was strangely prophetic. In the aftermath of the vote I was dazed and disturbed, much as I thought I would be. I half-wrote many blog posts after the vote, but the only one I published was on the unsuitability of Boris Johnson for pretty much anything, let alone the delicate role of Foreign Secretary.

Things are looking up a little for my party, the Liberal Democrats, who seem the only ones prepared to oppose the government over their Brexit “plans”, and the only ones prepared to vote against the “Snooper’s Charter”. We’re the only ones making significant gains in local elections and have made significant showings in Westminster by-elections, getting a 23.5% swing in Witney and winning Richmond Park with a 30.4% swing. The Labour party seems to be marching itself into the wilderness with considerable enthusiasm.

David Laws’ Coalition was my only political reading of the year.

I’ve written a couple of times on exercise-related things: The Running Man was on my newfound enthusiasm for running. Since writing it I have a fancier running watch (a Garmin Forerunner 235); I read Bob Glover’s The Runner’s Handbook and decided I had to have a heart rate monitor. As it is, I don’t pay a huge amount of attention to the heart rate monitor, but it is nice that the GPS is ready to go by the time I reach the end of my walk up the drive rather than five minutes later. I also wrote about cycling to work in Ride; as others struggle to find parking at work, I have a 12-space bike shed mostly to myself (particularly in the winter)!

I’ve been trying out Headspace recently, an app for guided meditation; it seems helpful for the gloomy winter. I realise that I used to get some of the elements of meditation from our long walks in the country.

Work has been fun, I have built something which is now being sold to customers, and I made something of an impact with my sequinned jacket and willingness to dance the night away at the office Christmas party. 

Book review: Elasticsearch–The Definitive Guide by Clinton Gormley & Zachary Tong

Back to technology with this blog post and a review of Elasticsearch – The Definitive Guide by Clinton Gormley and Zachary Tong. The book is available for free online, where it is probably more up to date (here); that said, Elasticsearch seems to be quite stable now. I have a dead tree copy because I’m old-fashioned.

Elasticsearch is a full-text search engine based on the Apache Lucene project. I was first made aware of it when I was working at ScraperWiki, where we used it for a proof-of-concept system for analysing legislation from many countries (I wasn’t involved hands-on with this work). Recently, I used it to make a little auto-completion web form for company names using the Companies House dataset. From download to implementing a solution which was some 1000 times faster than a naive SQL querying system took less than a day – the default configuration and system is that good!

You can treat Elasticsearch like a SQL database to a fair degree: what it calls indexes correspond to separate databases on a SQL server. Elasticsearch has document types instead of tables, and what would be rows in a SQL database are called “documents”. There are no joins as such in Elasticsearch, but there are a number of workarounds such as parent-child relationships, nested objects or plain old denormalisation. I suspect one needs to be a bit cautious of treating Elasticsearch as a funny-looking SQL database.

The preferred way to interact with Elasticsearch is via its HTTP API, which means that once it is installed you can prod away at your Elasticsearch database using curl from the command line or the Sense plugin for Google Chrome. The book is liberally scattered with examples written as HTTP requests, and online these can be launched from the browser (given a bit of configuration). To my mind the only downside of this is that queries are written in JSON, which introduces a lot of extraneous brackets and quoting. For my experiments I moved quickly to using the Python interface, which seems well-supported and complete (as do other language bindings).
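Since the JSON query bodies map directly onto Python dicts, the move to the Python client is a small one. As a rough sketch (the index and field names “companies” and “name” are invented for illustration, and the commented-out calls assume the official elasticsearch-py client):

```python
import json

# A full-text match query, as you would POST it to /companies/_search via
# the HTTP API. In Python it is just a dict; the index and field names are
# made up for this example.
match_query = {
    "query": {
        "match": {
            "name": "widget factory"
        }
    },
    "size": 10,  # return at most ten hits
}

print(json.dumps(match_query, indent=2))

# With the Python client the same body is passed through unchanged,
# something like (needs a running cluster, so not executed here):
#
#   from elasticsearch import Elasticsearch
#   es = Elasticsearch("http://localhost:9200")
#   results = es.search(index="companies", body=match_query)
```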

Elasticsearch: The Definitive Guide is divided into 7 sections: Getting started, Search in Depth, Dealing with Human Language, Aggregations, Geolocation, Modelling your data, and finishes with Administration, Monitoring and Deployment.

The Getting Started section of the book covers everything you need to get going, but no single topic in any depth. The subsequent sections are largely about filling in that detail. The query language is completely different from SQL, and queries come back with results ranked by a relevance score. I suspect this is where I’ll find myself working a lot in future; currently my queries give me a set of results which I filter in Python. I suspect I could write better queries which would return relevance scores matched to my application (and that I would trust). As it stands my queries always return *something*, which may or may not be what I want.

I found the material regarding analyzers (which are applied to searchable fields and, symmetrically, search terms) very interesting and applicable to wider search problems where Elasticsearch is not necessarily the technology to be used. There is an overlap here with natural language processing in the sense that analyzers can include tokenizers, stemmers, and synonym lookups which are all part of the NLP domain. This is expanded on further in the “Dealing with human language” section.
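To make this concrete, here is a sketch of what an analyzer definition looks like: the settings you would PUT when creating an index, expressed as a Python dict. The analyzer chains the standard tokenizer with lowercase and synonym token filters; the names company_name and my_synonyms (and the synonyms themselves) are invented for illustration.

```python
# Index settings defining a custom analyzer: tokenize, lowercase, then map
# synonyms, so e.g. "Acme Ltd" and "acme limited" index to the same terms.
analyzer_settings = {
    "settings": {
        "analysis": {
            "filter": {
                "my_synonyms": {
                    "type": "synonym",
                    "synonyms": ["ltd, limited", "plc, public limited company"],
                }
            },
            "analyzer": {
                "company_name": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "my_synonyms"],
                }
            },
        }
    }
}

# The token filter chain runs in order: lowercase first, then synonyms.
print(analyzer_settings["settings"]["analysis"]["analyzer"]["company_name"]["filter"])
```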

The section on aggregations explains Elasticsearch’s “group by”-like functionality, and that on geolocation touches on spatial extension-like behaviour. Elasticsearch handles geohashes which are a relatively recent innovation in encoding spatial coordinates.
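A terms aggregation is the closest analogue of SQL’s GROUP BY; as a minimal sketch (the “status” field is invented for illustration):

```python
# Roughly: SELECT status, COUNT(*) FROM companies GROUP BY status.
# "size": 0 suppresses the ordinary search hits so only the aggregation
# results come back.
agg_query = {
    "size": 0,
    "aggs": {
        "by_status": {
            "terms": {"field": "status"}
        }
    },
}

print(agg_query["aggs"]["by_status"])
```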

The book mentions very briefly the ELK stack, which is Elasticsearch, Logstash and Kibana (all available from the Elastic website). This is used to analyse log files: Logstash funnels the log data into Elasticsearch, where it is visualised using Kibana. I tried out Kibana briefly; it’s an easy-to-use visualisation front end.

Elasticsearch is a Big Data technology from the start which means it supports sharding, replication and distribution over nodes out of the box but it runs fine on a simple single node such as my laptop.

Elasticsearch is a fairly big book but the individual chapters are short and to the point. As I’d expect from O’Reilly, Elasticsearch is well-edited and readable. I found it great for working out what all the parts of Elasticsearch are, and I now know what exists when it comes to solving live problems. The book is pretty good at telling you which things you can do, and which things you should do.

Book review: Roman Chester by David J.P. Mason

I recently realised that I live in a city with rather remarkable Roman roots. Having read Mary Beard’s book, SPQR, about the Romans in Rome, I turn now to Roman Chester: Fortress at the Edge of the World by David J.P. Mason.

The book starts with a chapter on the origins of the study of the Roman origins of Chester, and some background on Roman activities in Britain. The study of the Roman history of Chester began back in the 18th century, with the hypocaust under the old Feathers Inn on Bridge Street a feature promoted by its owner. The Spud-u-like on the site now similarly boasts of its Roman remains. The original Roman east gate was still standing in the 18th century, and there exist several drawings of it from that period. The Victorians were keen excavators of the Roman archaeology, forming the Chester Archaeological Society in 1849 and building the Grosvenor Museum in 1883.

A recurring theme of the book is the rather wilful destruction of substantial remains in the 1960s to build a couple of shopping centres. The Roman remains on the current Forum Shopping Centre site were destroyed after the rather fine Old Market Hall had been knocked down.

The core Roman activity in Chester was the fortress, established in 75AD under the reign of Vespasian. The fort is somewhat larger than other similar forts in England, and the author suggests this was because it was, at one time, intended as the provincial governor’s base. Vespasian died shortly after the building of the Chester fortress started, and the work paused. At the time of its Roman occupation Chester had a very fine harbour; the local sandstone was suitable for building; a brickworks was set up at Holt, further up the River Dee; there was metal mining in North Wales; and salt was sourced from Northwich – all very important resources at the time.

Standing on the River Dee meant Chester could serve as a base for the further conquest of Britain and Ireland – although these plans did not come to fruition. The evidence for this includes some unusual buildings in the centre of the old fortress, original walls rather more impressive than those of the average Roman fort, and the discovery of classier-than-usual lead piping.

The book continues with a detailed examination of the various parts of the Roman fortress and the buildings it contained: the public baths, granaries and barracks. This is followed by a discussion of the surrounding canabae legionis, including the amphitheatre, the supporting Roman settlement and the more detached vicus. This includes the settlement at Heronbridge which was excavated relatively recently.

The third part of the book travels through time, looking at the periods c90-c120, in which the fortress was rebuilt; c120-c210, when the legion stationed at Chester was sent elsewhere to fight, leaving the fortress to decline significantly; c210-c260, when the original impressive buildings at the heart of the fortress, not initially completed, were finally built; and c260-c350, when the fortress fell and rose again. It finishes with the period c350-c650, when Britain became detached from Rome and fell into decline. The Roman fortress was robbed to provide building stone for the medieval walls and other structures, including the cathedral.

Roman remains are visible throughout modern Chester. The north and east parts of the modern city walls follow the line of the walls of the Roman fortress. Some pillars are on display in front of the library; the hypocaust found under the Grosvenor shopping centre can now be found in the Roman Gardens; the amphitheatre is half exposed; parts of the walls, particularly near Northgate and parallel to Frodsham Street, contain Roman elements; and the mysterious “quay wall” can be found down by the racecourse.

The book finishes with some comments on the general character of the investigations of Roman remains in Chester, and suggestions for further investigations and how to better exploit Chester’s Roman history. On the whole Chester has done moderately well in its treatment of the past: study started relatively early, but much material has not been published. These days archaeology is mandated for new developments in the city, but these tend to be rapid, keyhole operations with little coherent design.

Roman Chester is rather a dry read; it is written much as I would expect an article in a specialist archaeology journal to be written. The book could have done with a full double-page map of modern, central Chester with the archaeological sites marked on it. As it was, I was flicking between text descriptions and Google Maps to work out where everything was. Perhaps a project for the Christmas holiday!

If you are a resident of Chester then the book is absolutely fascinating.

Update

I’ve started making a map of Roman Chester on Google Maps.

The Logging module in Python

In the spirit of improving my software engineering practices I have been trying to make more use of the Python logging module. In common with many programmers my first instinct when debugging a programming problem is to use print statements (or their local equivalent) to provide an insight into what my program is up to. Obviously, I should be making use of any debugger provided but there is something reassuring about the immediacy and simplicity of print.

A useful evolution of the print statement in Python is the logging module, which can be used as a simple print replacement but can do so much more: you can configure loggers for different packages and modules whose behaviour can be controlled centrally, and you can vary the verbosity of your logging messages. If you decide to switch to logging to a file rather than the terminal this can be achieved too, and you can even post your log messages to a website using HTTPHandler. Obviously logging is about much more than debugging.

I am writing this blog post because, as most of us have discovered, using logging is not quite as straightforward as we were led to believe. In particular you might find yourself in the situation where you feel you have set up your logging yet when you run your code nothing appears in your terminal window. Print doesn’t do this to you!

Loggers are arranged in a hierarchy. Loggers have handlers, which are the things that cause a logger to generate output to a device. If no logger is specified then a default logger, called the root logger, is used. A logger has a name, and the hierarchy is defined by the dots in the name, all the way “up” to the root logger. Any logger can have a handler attached to it; if no handler is attached then any log message is passed to the parent logger.
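The hierarchy is easy to see in an interpreter session; the logger names here are of course made up:

```python
import logging

# Creating "myapp" and then "myapp.db": the dots define the parentage.
parent = logging.getLogger("myapp")
child = logging.getLogger("myapp.db")

print(child.parent is parent)                 # True
print(parent.parent is logging.getLogger())   # True: the root logger

# A fresh logger has no handlers of its own, so its records propagate up
# the hierarchy until a handler (usually on the root logger) is found.
print(child.handlers)                         # []
```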

A log record has a message (the thing you would have printed) and a “level”, which indicates the severity of the message. Levels are specified by integers, for which the logging module provides convenient labels. The levels, in order of severity, are logging.DEBUG, logging.INFO, logging.WARNING, logging.ERROR and logging.CRITICAL. A log handler will output a message if the level of the message is equal to or higher than the level the handler has been set to. So a handler set to WARNING will show messages at the WARNING, ERROR and CRITICAL levels, but not the INFO and DEBUG levels.
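This filtering can be demonstrated with a handler that writes to a StringIO buffer instead of the terminal (the logger name level_demo is arbitrary):

```python
import io
import logging

stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setLevel(logging.WARNING)   # the handler drops anything below WARNING

demo = logging.getLogger("level_demo")
demo.setLevel(logging.DEBUG)        # the logger itself passes everything on
demo.propagate = False              # keep the demo away from the root logger
demo.addHandler(handler)

demo.debug("debug message")
demo.info("info message")
demo.warning("warning message")
demo.error("error message")

output = stream.getvalue()
print(output)   # only the warning and error messages appear
```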

The simplest way to use the logging module is to import the library:

import logging

Then carry out some minimal configuration,

logging.basicConfig(level=logging.INFO)

and then put logging.info statements in our code, just as we would have done with print statements:

logging.info("This is a log message that takes a parameter = {}".format(a_parameter_value))

logging.debug, logging.warning, logging.error and logging.critical are used to publish log messages with different levels of severity. These are all convenience methods which remove the need to explicitly give the level as found in the logging.log function:

logging.log(logging.INFO, "This is a log message")

If we are writing a module, or other code that we anticipate others importing and running, then we should create a logger using logging.getLogger(__name__) but leave configuring it to the caller. In this instance we use the logger we have created instead of the module-level “logging”. So to publish a message we would do:

logger = logging.getLogger(__name__)
logger.info("Hello")

In the module importing this library you would do something like:

import logging
import some_library

logging.basicConfig(level=logging.INFO)

# if you wanted to tweak the level of another logger
logger = logging.getLogger("some other logger")
logger.setLevel(logging.DEBUG)

basicConfig() configures the root logger, which is where all messages end up in the absence of any other handler. The behaviour of logging.basicConfig() is downright obstructive at times. The core of the problem is that it can only be invoked once in a session; any subsequent invocations are ignored. Worse than this, it can be invoked implicitly. So if, for example, you do:

import logging
logging.warning("Hello")

You’ll see a message because behind the scenes logging has run logging.basicConfig() for you, leaving the root logger at its default WARNING level. This means that if you were to then naively go ahead and run basicConfig yourself:

logging.basicConfig(level=logging.INFO)

You would see no message when you subsequently ran logging.info("Hello"), because the “second” invocation of logging.basicConfig is ignored.
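The whole trap can be seen in a fresh interpreter session:

```python
import logging

# The module-level convenience function secretly configures the root logger
# (it attaches a StreamHandler because none exists yet).
logging.warning("Hello")

root = logging.getLogger()
print(len(root.handlers))               # 1: a StreamHandler was attached

# An explicit basicConfig() is now silently ignored, because the root
# logger already has a handler, so the level stays at the default.
logging.basicConfig(level=logging.INFO)
print(root.level == logging.WARNING)    # True
```

(For what it is worth, later versions of Python, from 3.8 onwards, added a force=True argument to basicConfig which removes any existing root handlers and applies the new configuration.)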

We can explicitly set the properties of the root logger by doing:

root_logger = logging.getLogger()
root_logger.setLevel(logging.INFO)

You can debug issues like this by checking the handlers to a logger. If you do:

import logging
lgr = logging.getLogger()
lgr.handlers

You get the empty list []. Issue a logging.warning() message and you will see that a handler has been added to the root logger; lgr.handlers then returns something like [<logging.StreamHandler at 0x44327f0>].

If you want to see a list of all the loggers in the hierarchy then do:

logging.Logger.manager.loggerDict

So there you go: the logging module is great – you should use it instead of print. But beware of the odd behaviour of logging.basicConfig(), which I’ve spent most of this post griping about. This post is mainly so that I have all my knowledge of logging in one place, rather than trying to remember in which piece of code I pulled off a particular trick.

I used the logging documentation here, blog posts by Fang (here) and Praveen Gollakota (here) and tab completion in the ipython REPL in the preparation of this post.

Book review: The Invention of Science by David Wootton

Back to the history of science with The Invention of Science by David Wootton, which covers the period of the Scientific Revolution.

Wootton’s central theme is how language tracked the arrival of what we see as modern science in a period from about 1500 to 1700, and how this modern science was an important thing that has persisted to the present day. I believe he is a little controversial in denying the ubiquity of the Kuhnian paradigm shift and in his dismissal of what he refers to as the postmodern, “word-games” approach to the history of science, which sees scientific statements as entirely equivalent to statements of belief. This approach is exemplified by Leviathan and the Air-Pump by Steven Shapin and Simon Schaffer, which gets several mentions.

Wootton argues, contrary to Kuhn, that sometimes “paradigm shifts” happen almost silently. He also points out that Kuhn’s science is post-Scientific Revolution. One of the silent revolutions that he cites is the model of the world. “Flat-earth” in no way describes the pre-Columbus model of the world, which originated from classical Greek scholarship. In this theoretical context the sphere is revered and the universe is built from the four elements: earth, air, fire and water. The model for the “earth” is therefore a variety of uncomfortable attempts to superimpose spheres of water and earth. The Ancients got away with this because in Classical times the known world did not cover enough of the earth’s sphere to reveal embarrassing discrepancies between theory and actuality. With Columbus’s “discovery” of America and other expeditions crossing the equator and reaching the Far East over land, these elemental sphere models were no longer viable. The new model of the earth which we hold to today entered quietly over the period 1475 to 1550.

Columbus’s “discovery” also marks one of the key themes of the book: the development of new language to describe the fruits of scientific investigation. Prior to Columbus the idea of an original discovery was poorly expressed in Western European languages; writers had to specifically emphasise that they were the first to find something or somewhere out, rather than having a word to hand that expressed this. Prior to this time, Western European scholarship was very much focused on the “re-discovery” and re-interpretation of the lost wisdom of the Ancients. Words like “fact”, “laws” (of nature), “theories”, “hypotheses”, “experiment” and “evidence” also evolved over this period. This happened because the world was changing: the printing press had arrived (which changed communication and collaboration entirely), machines and instruments were being invented, and the application of maths was widening from early forms of banking to surveying and perspective drawing. These words morphed to their modern meanings across the European languages in a loosely coupled manner.

Experimentation is about more than just the crude mechanics of doing the experiment, it is about reporting that work to others so that they can replicate and extend the work. The invention of printing is important in this reporting process. This is why alchemy dies out sometime around the end of the 17th century. Although alchemy has experiments, clearly communicating your experiments to others is not part of the game. Alchemy is not a science, it is mysticism with scientific trappings.

As a sometime practising scientist, all of these elements – discovery, facts, evidence, laws, hypotheses and theories – are things whose definitions I take for granted. They are very clear to me now, and I know they are shared with other working scientists. What The Invention of Science highlights is that there was a time when this was not so.

The central section of the book finishes with some thoughts on whether the Industrial Revolution required the Scientific Revolution on which to build. The answer is ultimately “yes”, although the time it takes is considerable. It flows from the work of Denis Papin on a steam digester in the late 17th century to Newcomen’s invention of the steam engine in the early 18th century; steam engines don’t become ubiquitous until much later in the 18th century. The point here is that Papin’s work is very much in the spirit of an “academic” scientist (he had worked with Robert Boyle), whereas Newcomen sits in the world of industrial engineering and commerce.

I’ve not seen such an analysis of language in the study of the Scientific Revolution before; the author notes that much of this study is made possible by the internet.

The editor clearly had a permissive view of footnotes, since almost every page has a footnote and more than a few pages are half footnote. The book also has endnotes, and some “afterthoughts”. Initially I found this a bit irritating, but some of the footnotes are quite interesting. For example, the Matses tribe in the Amazon include provenance in their verb forms; using the incorrect verb form is seen as a lie. In my day-to-day work with data this “provenance required” approach is very appealing.

The Invention of Science is very rich and thought-provoking, and presents a thesis which I had not seen presented before, although the “facts” of the Scientific Revolution are well known. I’m off to read Leviathan and the Air-Pump, partly on the recommendation of the author of this book.