Author's posts
Nov 20 2020
Unit testing in Python using the unittest module
The aim of this blog post is to capture some simple “recipes” on testing code in Python that I can return to in the future. I thought it would also be worth sharing some of my thinking around testing more widely. The code in this GitHub gist illustrates the testing features I mention below.
My journey with more formal code testing started about 10 years ago when I was programming in Matlab. It only really picked up a couple of years later when I moved to work at a software startup, coding in Python. I’ve read a couple of books on testing (BDD in Action by John Ferguson Smart, Test-Driven Development with Python by Harry J.W. Percival) as well as Working Effectively with Legacy Code by Michael C. Feathers, which talks quite a lot about testing. I wrote a blog post a number of years ago about testing in Python when I had just embarked on the testing journey.
As it stands, I now use unit testing fairly regularly, although the test coverage in my code is not great.
Python has two built-in mechanisms for carrying out tests, the doctest and the unittest modules. Doctests are written into the docstrings at the top of function definitions (see lines 27-51 in the gist above), as examples of an interactive Python session along with their expected output. They are run either by calling doctest.testmod() in a script, or by adding -m doctest to the Python command line as shown below.
python -m doctest -v tests.py
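As a minimal sketch of what a doctest looks like, here is a hypothetical mean() function (not the code from the gist) whose docstring holds two examples, one of which checks that an exception is raised. Saved as tests.py, it can be run directly or with the command above.

```python
def mean(values):
    """Return the arithmetic mean of a non-empty list of numbers.

    Lines starting with >>> are run by doctest and their output is
    compared with the line(s) that follow.

    >>> mean([1, 2, 3, 4])
    2.5
    >>> mean([])
    Traceback (most recent call last):
        ...
    ValueError: mean() of an empty list
    """
    if not values:
        raise ValueError("mean() of an empty list")
    return sum(values) / len(values)


if __name__ == "__main__":
    import doctest

    # Runs every docstring example in this module; pass -v on the
    # command line (or verbose=True) to list each example as it runs.
    doctest.testmod()
```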
Personally I’ve never used doctest – I don’t like the way the tests are scattered around the code rather than being in one place, and “replicating the REPL” seems a fragile process – but I include it here for completeness.
That leaves us with the unittest module. In Python it is not unusual to use a 3rd party testing library which runs on top of unittest; popular choices include nosetests and, more recently, pytest. These typically offer syntactic sugar, making tests slightly easier to write and read, as well as additional functionality for writing and running test suites. Unittest is based on the Java testing framework JUnit, and as such it inherits an object-oriented approach that demands tests are methods of a class derived from unittest.TestCase. This is not particularly Pythonic, hence the popularity of 3rd party libraries.
I’ve used nosetests for a while now, but it looks like its use is no longer recommended since it is no longer being developed; pytest is the new favoured 3rd party library. Personally, as a result of writing this blog post, I will probably stop using nosetests as a test runner and revert to writing and running tests with pure unittest.
The core of unittest is to call the function under test with a set of parameters, and check that the function returns the correct response. This is done using one of the assert* methods of the unittest.TestCase class. I nearly always end up using assertEqual (assertEquals is an older, deprecated alias for it). This is shown in minimal form in lines 67-76 of the gist above.
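A minimal sketch of a test case, assuming a hypothetical celsius_to_fahrenheit() function rather than the code in the gist:

```python
import unittest


def celsius_to_fahrenheit(celsius):
    """Hypothetical function under test."""
    return celsius * 9 / 5 + 32


class TestConversion(unittest.TestCase):
    # Every method whose name starts with "test" is collected and run.
    def test_freezing_point(self):
        self.assertEqual(celsius_to_fahrenheit(0), 32)

    def test_boiling_point(self):
        self.assertEqual(celsius_to_fahrenheit(100), 212)

    def test_body_temperature(self):
        # Floating-point results are safer compared approximately.
        self.assertAlmostEqual(celsius_to_fahrenheit(37), 98.6)


if __name__ == "__main__":
    unittest.main()
```

assertAlmostEqual rounds the difference to 7 decimal places by default, which avoids spurious failures from floating-point representation.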
With data science work we often have a list of quite similar tests to run, calling the same function with a list of arguments and checking the response against the expected value. Writing a separate function for each test case is a bit laborious; unittest has a couple of features to help with this:
- subTest puts all the test cases into a single test function and executes them all, reporting only those that fail (see lines 82-90). This is a compact approach, though the output is not verbose since passing cases are not listed individually. Note that nosetests does not run subTest correctly, it being a feature of unittest only introduced in Python 3.4 (2014);
- alternatively we can use a functional programming trick to generate test functions programmatically and add them to the unittest.TestCase class we have derived; this is shown on lines 105-116.
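The two approaches can be sketched side by side. This is an illustrative version rather than the gist’s code: is_valid_postcode_area() and the CASES list are hypothetical, and the generated tests use a closure so that each test function captures its own arguments.

```python
import unittest


def is_valid_postcode_area(code):
    """Hypothetical function under test: one or two letters, e.g. "BS"."""
    return code.isalpha() and 1 <= len(code) <= 2


# Each case is (argument, expected result).
CASES = [("B", True), ("BS", True), ("BS8", False), ("", False)]


class TestWithSubTest(unittest.TestCase):
    def test_cases(self):
        # A single test method: each case runs in its own subTest, so a
        # failure is reported with its parameters and the loop carries on.
        for code, expected in CASES:
            with self.subTest(code=code):
                self.assertEqual(is_valid_postcode_area(code), expected)


class TestGenerated(unittest.TestCase):
    """Test methods are attached below, one per case."""


def make_test(code, expected):
    # The closure captures this case's values; defining the function
    # directly in the loop body would leave every test seeing the last case.
    def test(self):
        self.assertEqual(is_valid_postcode_area(code), expected)
    return test


for index, (code, expected) in enumerate(CASES):
    # Names must start with "test" for unittest to collect the methods.
    name = f"test_case_{index}_{code or 'empty'}"
    setattr(TestGenerated, name, make_test(code, expected))


if __name__ == "__main__":
    unittest.main(verbosity=2)
```

Run with -v, the subTest version is listed as a single test, whereas each generated method appears under its own name, which makes it easier to see at a glance which case failed.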
Sometimes you write tests that you don’t always want to run, either because they are slow, or because you wrote them to address a particular problem and now want to keep them for the purposes of documentation but not run them. unittest provides decorators to skip tests; @unittest.skip() is the simplest of these and acts as an unconditional “opt-out”.
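Besides @unittest.skip(), unittest provides @unittest.skipIf() and @unittest.skipUnless() for conditional skipping. A sketch, where the RUN_SLOW_TESTS environment variable is a hypothetical opt-in switch rather than anything unittest itself recognises:

```python
import os
import sys
import time
import unittest


class TestSkipping(unittest.TestCase):
    @unittest.skip("kept to document an old bug fix; not run")
    def test_old_regression(self):
        self.assertAlmostEqual(sum([0.1] * 10), 1.0)

    # Opt-in: runs only when the hypothetical RUN_SLOW_TESTS variable is set.
    @unittest.skipUnless(os.environ.get("RUN_SLOW_TESTS"), "slow; set RUN_SLOW_TESTS=1")
    def test_slow(self):
        time.sleep(2)  # stand-in for a slow computation
        self.assertTrue(True)

    # Conditional opt-out; the condition is evaluated when the class is defined.
    @unittest.skipIf(sys.platform.startswith("win"), "expects POSIX path separators")
    def test_posix_join(self):
        self.assertEqual(os.path.join("data", "raw"), "data/raw")


if __name__ == "__main__":
    unittest.main(verbosity=2)
```

Skipped tests are still reported (marked with an "s", or as "skipped" with the reason when run with -v), so they do not silently disappear from the results.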
Once you’ve written your tests, you need to run them. I liked using nosetests for this: if you ran it in a directory it would trundle off, find any files that looked like they contained tests, run them and report back on the results.
Unittest has some test discovery functionality which I haven’t yet explored. The simplest way of running tests, provided the script calls unittest.main(), is simply to run the script file:
python tests.py -v
The -v flag indicates that output should be “verbose”: the name of each test is shown with a pass/fail status, along with debug output if a test fails. By default unittest shows print messages from the test functions and the code being tested on the console, as well as logging messages, which can confuse the test output. Printed output can be suppressed (it is still shown for failing tests) by running tests with the -b flag at the command line, or by setting the buffer argument to True in the call to unittest.main(). Logging messages can be suppressed by adding a NullHandler, as shown in the gist above on lines 188-119.
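A sketch of both ideas, assuming a hypothetical noisy_add() function that prints and logs, and assuming no other logging configuration (for example, no call to logging.basicConfig()): a NullHandler only stops logging’s last-resort output to stderr and does not silence handlers configured elsewhere.

```python
import logging
import unittest

logger = logging.getLogger(__name__)

# With no logging configured anywhere, WARNING and above would go to
# logging's "last resort" handler (stderr). A NullHandler on this logger
# means a handler is found, so that fallback is never used.
logger.addHandler(logging.NullHandler())


def noisy_add(a, b):
    """Hypothetical function under test that both prints and logs."""
    print(f"adding {a} and {b}")
    logger.warning("noisy_add was called")
    return a + b


class TestNoisyAdd(unittest.TestCase):
    def test_add(self):
        self.assertEqual(noisy_add(2, 3), 5)


if __name__ == "__main__":
    # buffer=True (the -b flag) captures stdout/stderr during each test and
    # discards it for passing tests; verbosity=2 is the same as -v.
    unittest.main(buffer=True, verbosity=2)
```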
The only functionality I’ve used in nosetests that I can’t reproduce in pure unittest is re-running only those tests that failed. This limitation could be worked around using the -k command-line flag together with a naming convention to mark the tests that are still failing.
Not covered in this blog post are the setUp and tearDown methods, which are run before and after each test method respectively.
I hope you found this blog post useful. I found writing it helpful in clarifying my thoughts, and I now have a single point of reference for the future.
Sep 22 2020
Book review: The clock and the camshaft by John Farrell
The clock and the camshaft by John Farrell is the story of technology through the Middle Ages which went on to support the Renaissance and the Scientific Revolution.
The book is structured by invention, and although some of the inventions are technologies as we would generally understand them, there are also chapters on universities and monasteries, and on languages. Each chapter looks at the ancient antecedents of a technology, where there is one, before looking at its place in the Middle Ages and how it played into the Renaissance that followed. The antecedents are typically in the Roman Empire, China and the Middle East. The overall structure of the book is reminiscent of the technology “trees” one finds in a certain sort of computer game (Civilisation/Age of Empires).
There was a huge drop in population in Europe from the end of the Roman Empire in the 5th century CE until the 9th or 10th century. People no longer lived in towns or cities, and the art of building with stone appears to have been lost across much of Europe.
Food is a core concern at any time, and there were a couple of technological developments during the Middle Ages which helped here. The plough used in the Mediterranean was developed to better suit heavy Northern European soils, and horses were adopted to pull ploughs through the development of horseshoes and suitable harnesses.
In the Middle East water wheels were used in irrigation from several centuries BCE. In Northern Europe irrigation was not such a concern, but water wheels for power, in the first instance for milling wheat, were important. This is not a straightforward technological development: for most individuals working the land it is convenient to hand-mill wheat for their own consumption, and a water-powered mill is not worth the effort in maintenance or the initial capital outlay. This is where feudalism and monasteries get involved: feudal barons and monasteries can build and maintain a mill economically, and they have subjects whose grain can be milled, for a price. Feudal masters obliged their subjects to use their mills and pay a tariff to do so, under threat of punishment if they were found milling their own grain.
Once you have something that goes round and round, driven by a water or wind mill, the next step is something that goes forwards and backwards or, more prosaically, converting rotational motion to linear motion. This might be to power a saw or, more often, to hammer things. Hammering things is important in the production of cloth (fulling), paper (pulping) and metal (crushing ore). Who would have thought hammering things was so important?
Paper is another key technology. The earliest writing is found on clay, which was later superseded by papyrus, produced almost exclusively in Egypt. For rough notes, codexes were used: thin parallel pieces of wood tied together. In Europe, after the fall of the Roman Empire, parchment made from the skins of goats or calves was used, but this required a lot of dead animals. Meanwhile in China paper made from rags was being developed. This innovation later reached Europe, and its arrival was key for new businesses: now tradespeople could write things down relatively freely, which was critical for banking and important in other businesses.
The challenge with clocks is to allow a power source to release its energy at a steady rate; this is done using an “escapement” mechanism. The first mechanical clocks were recorded in Europe towards the end of the 13th century.
Europeans had forgotten how to build with stone at the end of the Roman Empire, so the cathedrals of the Middle Ages, built mostly in the 12th and 13th centuries, were a sign that the skill had been rediscovered. They were an evolution of Roman designs for grand buildings which let in much more light through the insertion of windows. They followed the stone castles of the Norman period around 1000 CE; cathedrals are rather more complex buildings than castles, but castles provided a good training ground.
Religion provided the impetus for collecting manuscripts from the Arab world during the 12th and 13th centuries, with a view to improving astronomical determinations of the date of Easter. Along the way other manuscripts were collected too, and brought back to Spain and Italy to be translated.
Eye lenses were introduced in the first half of the 12th century, and appear to have evolved from glass used to display relics. Antecedents of lenses have been found in ancient Egypt, going back as far as the Bronze Age. The Venetians were early specialists in glass making, founding a guild in 1320. There was also expertise north of the Alps in Nuremberg, but the quality of ground lenses dropped from 1500, and the first telescope makers, towards the end of that century, made their own lenses rather than buying them.
Monasteries and monks played an important role in carrying knowledge across the Middle Ages after the fall of the Roman Empire. They were also important players in the material world, in some instances taking the part of a sort of feudal lord. Universities were in some senses a spin-off from the collision between the Church and the secular state: they arose originally as places to study law, a topic which came to the fore in disputes between the Church and secular states over which had legal authority. Universities and monasteries are both examples of legal entities which were not people, an important innovation in law.
The book finishes with a chapter covering lodestones, which led to the development of the compass for navigation, along with astrolabes and boats. Astrolabes were designed for astronomical measurement but also served as timekeepers, and their design fed into the layout of the clock face. Boats were another technology which evolved as it moved north; the key innovation was switching to a skeleton-based design where the keel and ribs were laid down first, and then planks attached to them.
I liked this little book. Much of what I’ve read in the history of science covers a later period, from the 17th century onward, so The Clock and the Camshaft provides useful background, and it is also very readable.
Aug 22 2020
Book review: A house through time by David Olusoga & Melanie Backe-Hansen
I’ve recently enjoyed watching A house through time, a series presented by David Olusoga tracking the history of a single house and its inhabitants across the years. The most recent series looked at a house in Bristol, the city where I was an undergraduate. A house through time by David Olusoga and Melanie Backe-Hansen is the book of the series.
Rather than focus on a single house, as the TV series does, the book takes a much broader sweep, looking at the history of the domestic dwelling back to Roman times, at research methods, and at some of the social history which gives the “why” behind the houses.
This is a busman’s holiday for me: a large chunk of my job over the last few years has been to build a property database to help answer buildings insurance application questions. One of these questions is the property age, and it has been the cause of the greatest pain for me. A house is good background for this type of work; it provides the sort of context which can be really helpful in understanding the data I come across. The issue for me, though, is that A house is written for those wishing to understand their own homes rather than to work out the property age of 25 million or so dwellings, but this is a niche interest and shouldn’t be taken as a criticism.
The book starts with a chapter on methods: how do you find out about your house? This is supported by an extensive set of links and a bibliography which strikes the happy medium between not providing any references and referencing alternate words. The Census and various surveys conducted before and during World War II are core to this: although these are ostensibly about people, they provide evidence that an address existed at a point in time, give or take variability in addresses and in the level of detail recorded. The numbering of houses, as opposed to naming them, only started to rise in the middle of the 18th century. Also relevant are Ordnance Survey’s historical maps.
I was a bit surprised that there was very little mention of listed building data. English Heritage and its partner organisations in Wales and Scotland aim to list all buildings built in the Georgian period and before, and the data provide descriptions of the listed structures; there is, for example, an entry for 10 Guinea Street, Bristol, which featured in one of the TV programmes.
There then follows a set of chapters on different periods, working forward in time covering the pre-Georgian, Georgian, Victorian, Interwar and post-war periods. These are the divisions I use in my work with the insurance industry (with the addition of a modern period starting in 1980).
There are a number of themes threaded through the book. The first is that much of the technological development of home building was relatively early. After the Romans left, Britons reverted to living in wattle-and-daub or timber buildings for 400 years. The next significant technological developments were the discovery and widening use of the chimney in the late 14th century, followed by the rediscovery of brick making in the later 15th century. After that, the next clear developments in building were prefabricated and high-rise buildings after the Second World War.
A second theme is the legislative framework in which buildings were built. This is two-fold. There are “public safety” acts used to try to ensure safer buildings are built; these include the laws put in place after the Great Fire and those used to address the unsanitary conditions in Victorian slums in the later 19th century. These acts often specified a limited number of “model” properties, and I wonder whether these can be used for dating. There were also acts relating to taxation: window and brick taxes. It is the brick taxes that led to the standardisation of bricks; originally bricks were taxed by number, so people made larger bricks to reduce their tax bills!
It is perhaps inevitable that the Victorian period, running from 1837 to 1901, takes up a large chunk of the book. This was a time of great movement to the cities in support of the industrial revolution, with a degree of “push” from the Inclosure Acts. Slum dwelling grew common, sanitation and urban clearances were initiated to relieve the slum conditions, and the suburbs grew, supported first by omnibuses and then by railways. Although overcrowding and insanitary conditions were recognised early in the Victorian period, addressing them took some time, with major improvements in the sewerage system happening towards the end of the 19th century. Often “improvement” schemes were more about sweeping aside the poor with no regard as to where they might live.
Towards the end of the Victorian period the suburbs started to grow, enabled by omnibus and then rail transport. It is at this time that semi-detached properties started to become common. The early suburbs gave me the impression of being more rural than modern suburbs. Some of the homes built in the late 19th century are very similar to those built in great numbers between the wars. It was only after the First World War that state intervention in building homes became widespread, although the green shoots of this movement appeared in the late 19th century.
Sadly there is little scope for me to apply these methods to my own homes: I have nearly always lived in late-sixties or seventies homes and, oddly, they have had house numbers clustered around 30. In Bristol, as a student, I lived in a basement flat close to the developments by Benjamin Stickland built around 1850.
I found A house really readable; it would be a great starting point if you were looking into the history of your own house, or were just interested in understanding how the domestic built environment came into being in the United Kingdom.
Aug 14 2020
Book review: Your voice speaks volumes by Jane Setter
I have a habit of reading the books written by people I follow on Twitter, and Your voice speaks volumes by Jane Setter falls into that category. It is a book about how we speak (if we speak English, and largely if we are British).
Your voice is divided into seven chapters which cover seven separate themes.
It starts with a description of the mechanics of speech, and how we annotate sounds. I particularly liked the chart of when children typically manage to produce different sounds: the earliest sounds of English come between 18 months and two years of age, with the last appearing between 5 and 8. I dutifully touched my larynx to feel the difference between the voiceless /s/ and the voiced /z/. Setter underestimates my ignorance by not explaining the difference between vowels and consonants – I can tell you which letters are vowels but not why those letters are vowels.
The second chapter, on accents, is the one I found most fascinating: it turns out that certain features of accents, such as so-called rhotic and non-rhotic pronunciation, follow the lines of the Anglo-Saxon occupation of Britain. Coming from the West Country, my accent is probably a bit rhotic – I pronounce r’s more strongly. It is interesting to see features of accents persist across a thousand years. This chapter also talks about how we are judged by our accents; a recurring theme is that women are more often criticised for their voices.
Chapter three talks about how we make judgements of a person on the basis of how they speak, and how we might try to change those perceptions. Here we get an anecdote about Setter’s partner at university, who had a masculine voice that did not match his slight physique! As in the previous chapter, it is women who get the brunt of criticism for being perceived to have changed their voices. Men struggle to change their voices to sound more masculine/sexy; this is probably an evolutionary side effect, since a vocal indication of fitness that could be faked would not be very helpful. Included in this chapter are “uptalk” and vocal fry. Uptalk is lifting the intonation at the end of a sentence; this one I understand, and I always associate it with Australians. Vocal fry is probably best understood by searching YouTube; it makes me think of Britney Spears.
I was shocked to discover that actors are still expected to have “received pronunciation” (RP) as their default voice, and the ability to do General American as a “second language”. This is reflected in newsreaders, where accents are largely notable by their absence. The chapter starts with some comments about Alesha Dixon singing “God Save the Queen”: she was criticised by certain sections of the press, purportedly for Americanising her pronunciation. Setter highlights that Dixon’s pronunciation is only slightly Americanised and is most likely a result of her background in R&B music. It is an unwritten rule that different styles of music are conventionally sung in different accents; Country music and R&B “sound” better with an American accent. Hence bands like The Rolling Stones often have vocals with a hint of an American accent. Singing is a performance rather than speech, and so singers tend to learn a song, accent and all, rather than sing with their speaking voice.
The chapter on forensic speaker analysis is based on Setter’s work in court. This divides into auditory analysis, done by ear and focused on the larger-scale features of the voice, and acoustic analysis, which is done using software and looks at the frequency spectrum. It was interesting to learn how voice line-ups are constructed. The message of this chapter is that voice matches are indicative rather than absolute: analysis can show that two voice recordings could be from the same speaker, but not confirm that fact.
The penultimate chapter talks about the importance of voice to the transgender community. To a degree trans men have an easier job, since taking testosterone leads to a natural lowering of the voice, but the same is not true in reverse for trans women. Although pitch is the primary discriminating feature between male and female voices, it is not the only one.
The book finishes with a chapter on English as a second language. Setter has worked with call centre staff from India to help them provide a better service. Some of the complaints about such call centre staff boil down simply to customers not wanting to speak to foreigners, but in other cases the way in which English is spoken in England and India leads to misunderstanding in the manner of a conversation. What is normal to an Indian English speaker may sound like annoyance or frustration to a speaker from England, which makes me think of The Culture Map by Erin Meyer.
Your voice is written in quite a chatty style, with a number of anecdotes to move the story along. It provides a useful overview of at least part of the work of a phonetician. The accompanying web page is a bit sparse but includes a PDF of the introduction, so you can try before you buy. Your voice is endorsed by David Crystal, whose books on the English language I feel I grew up on – to be honest I was pleased to discover he was still alive!
Aug 01 2020
Book review: 1491 by Charles C. Mann
I read 1491: New revelations of the Americas before Columbus by Charles C. Mann as a follow up to How the States got their Shapes by Mark Stein. I had been frustrated at how this latter book had focussed on the colonial period, with native Americans almost completely elided.
The book tracks three broad themes, looking at each as it relates to Amazonia in the south, through Peru into Central America, and then into North America. The emphasis is on Central America and the more northern parts of South America; I think this is a result of where the archaeological remains are found.
The first theme is the number of native Americans in the time before colonisation. The broad picture is that the population was large prior to colonisation, and in places still high as colonisation started, but it was reduced dramatically by disease brought by the colonists; figures as high as a 90% reduction are discussed. The native Americans were highly vulnerable to the diseases the Europeans brought, both because they were genetically less diverse than the Europeans and because they had never experienced anything like them before. Initial contact between Europeans and Americans took place in the 16th century, with more serious attempts at colonisation not getting under way for a century, by which point disease had taken its toll. In an appendix Mann discusses whether syphilis travelled to Europe from the Americas; the evidence here is not clear.
The European view of the pre-colonial population of the Americas has varied. The initial explorers recognised the sizeable populations in the lands they found, but as colonisation continued those memories were lost as the native population plummeted. Furthermore, it was in the interests of the colonists to see the Americas as empty land, rather than land they took from others. During the 20th century these views have slowly been revised, although there is considerable debate over the exact scale of death through disease.
The second theme is origins: when were humans first found in the Americas? During the early part of the 20th century the view developed that the first human settlers in the Americas arrived over a land bridge from Asia around 15,000 years ago: the so-called Clovis people. Towards the end of the 20th century this view was revised, with the earliest origins pushed back to 30,000 or so years ago. In any case, there were significant civilisations leaving behind ruins and burials dating back 7,000 years.
The final theme is landscape: to what degree is the landscape we see in the Americas human-made? Here it seems that Europeans have one view of a human-made landscape which does not match what is found in the Americas. The Americas are the origin of a huge variety of important agricultural crops (potatoes, maize, peppers, cassava/manioc and squashes), but these are not found in the fields associated with European agriculture so much as in cultivated woodland. This dichotomy is perhaps starkest in Amazonia, where there is still considerable dispute as to what civilisations once lived there, if any, and to what degree the Amazonian rainforest is human-made. Again this is tinged with modern European and colonial sentiments; in particular, from a conservationist point of view it is preferable to see the Amazon as an untouched wilderness rather than a landscape shaped by humans, since this provides a strong argument against allowing (renewed) development.
American cities, both North and South, were often different from European cities. Commerce has always been important in European cities, but in the Americas cities were often great ritual centres, with living accommodation for farmers and other workers but little sign of trade.
In a coda Mann discusses the potential impact that the egalitarian Haudenosaunee alliance had on the founding fathers of the United States. Mann is clear that his view on this is not yet mainstream, but highlights that the principles of the founding fathers were closer to those of the Haudenosaunee than to the class-based hierarchy of European countries. Early colonists had extensive interactions with the native Americans, and it wasn’t unknown for colonists to join native communities, seeing them as more congenial than their own.
The book finishes with appendices on names, the khipu system of writing with knots, syphilis and the Mesoamerican calendars. I must admit I bristled early on at the use of the term “Indian” rather than “native American”, but as Mann highlights, “Indians” usually use the term “Indian” themselves without objection in the relevant context, and he uses “Indian” and “native American” interchangeably simply to introduce a degree of variety. More generally, he attempts to use the name that a member of a group would prefer for that group. This parallels my preference to be called European, British, English, from Dorset or from Cheshire depending on context.
I must admit I was hoping for a book that covered North American pre-colonial history in more detail. That said, 1491 is readable and covers a great deal of ground. My next step is probably to look for a history of the Haudenosaunee.