Ian Hopkinson

Author's posts

Book review: Railways and The Raj by Christian Wolmar

Two interests combine with this book, Railways and The Raj by Christian Wolmar. I picked it up after a recommendation in Empireland by Sathnam Sanghera, which is about the British Empire from an Indian perspective, but I’m also interested in railways. I have reviewed Wolmar’s Fire & Steam and The Subterranean Railway in the past. The Indian railway system has been sold as a benefit of colonialism, so I was interested to find out more.

The first railways in India were built as early as 1836, not long after those elsewhere and for similar purposes: shifting heavy loads short distances at mines or similar. However, it wasn’t until the middle of the 19th century that railway building started in earnest. This followed two reports written by the Governor-General of India, Lord Dalhousie, in 1850 and 1853. In contrast to the chaotic growth of railways in Britain and elsewhere, Dalhousie’s reports, formulated a little after the first rush of railway building, presented a rational and coherent plan for the development of Indian railways.

The start to railway building was slow, with opposition from the East India Company in the first instance. Furthermore, physical conditions in India were challenging, particularly the monsoon season, which played havoc with railway bridges over rivers; railway embankments also disturbed the irrigation and drainage in surrounding areas. There were also serious mountain ranges to address.

The Indian railways were built very much for the benefit of the British: most of the rail companies were run from Britain, the levels of return on investment (made from Britain) were guaranteed by the Indian taxpayer, most of the equipment (including rails and often sleepers) was sourced from Britain, and the economic benefits of the freight transported by the railways were largely in Britain. Not only this, under the Raj the senior positions in managing and running the railways were held by British people or Eurasians, and this extended to the train staff, with drivers predominantly British or Eurasian. The British travelling on the railways did so in luxurious first and second class carriages whereas the great majority of Indians travelled in a fairly grim third class.

Class, religious and gender differences were built into the fabric of the railway with various facilities provided separately for Muslim and Hindu passengers, and various castes. I struggle to decide how much this was a deliberate "divide and rule" policy of the British (which was later to have terrible consequences during Partition) or whether it was the right thing to do to respect local sensibilities (although it is fair to say "respecting local sensibilities" was not greatly in evidence during Britain’s colonial period).

There was some development of railways for famine relief – a recurring issue in India, where millions died through famine in parts of the country. Beyond about 50 miles, oxen, the main alternative for transporting food, consume more food than they can carry. The Victorian view was that the railway would carry food to be sold at the market rate from areas of surplus to those suffering famine, which did not greatly help the many poor unable to afford food.

There were lines built for military purposes, particularly in the north west in the direction of Afghanistan from where it was feared a Russian threat would come. More generally, as the railways developed the Indian Rebellion of 1857 was still fresh in the mind of the British and it was felt the railway could help move troops around to quell future rebellions – many early stations were built like fortresses. The railways were important during the two world wars but suffered in these periods from overuse and under-investment.

In a book with a number of shocks for white British sensibilities, I think I found the part on Partition most shocking, most probably because it is not something I had thought about before: I knew India had gained independence after the Second World War and that Pakistan and Bangladesh were part of that. I had not absorbed that it meant the displacement of between 10 and 20 million people, and the deaths of up to 2 million. 20 million people is a third of the population of the United Kingdom and 2 million people is the population of Liverpool, Manchester and Birmingham combined.

After Independence and Partition, the successful running of the railways was seen as an important symbol of the success of Independence. Despite the rather hasty British exit, and the lack of home-grown talent and supply chains the post-Independence Indian Railway was quickly much improved.

One recurring theme of the book is the enormous scale of Indian Railways: it currently employs 1.3 million people – globally ranking alongside various Chinese state bodies, McDonald’s, Walmart and the NHS. In the early days the Indian Railways set up company towns, in part to service white British employees but also for Indian employees, because the railway works were often in otherwise isolated areas. Even now Indian Railways owns huge amounts of property in which its employees live, and also hospitals and schools. It remains central to transport in India, where the capacity of the airline routes is limited and the road network is relatively under-developed.

I enjoyed this book as a story of the development of the railway in India, but also as a sketch of Indian history from the middle of the 19th century. To answer my original question, the railway did benefit India ultimately, after Independence, but under colonial rule it was largely a benefit to Britain.

Book review: Software Design Decoded by Marian Petre and André van der Hoek

Software Design Decoded: 66 Ways Experts Think by Marian Petre and André van der Hoek is my next read.

I picked it up as a recommendation from The Programmer’s Brain by Felienne Hermans. It is an odd little book, something like A6 format with 66 pages containing a short paragraph or two on the behaviours of experts in software design. Each page is dedicated to a single thought. There are sketches scattered liberally through the book by Yen Quach, who is credited in the author biographies.

Although it does not have a contents page or index, Software Design Decoded is divided into "chapters":

  • Experts keep it simple
  • Experts collaborate
  • Experts borrow
  • Experts break rules
  • Experts sketch
  • Experts work with uncertainty
  • Experts are not afraid
  • Experts iterate
  • Experts test
  • Experts reflect
  • Experts keep going

I found this book reassuring as much as anything, and it also gave me some things to think about. Reassuring because it turns out I share habits with experts in software design, which must be a start to being an expert! I write quite a lot of software (for data analysis and data builds) but design tends to come as an afterthought.

I think the things I already do are to build something even if it isn’t the final form, I was interested in the comment about avoiding over-generalisation. The element I am missing here is to learn from this initial form and build something better (potentially discarding what I’ve already done). I also do a fair bit of testing, although in this book testing is wider than just software unit tests or even integration tests, it is about testing preconceptions and testing with the user.

I also liked the comment on focusing on the needs of the key stakeholders, where the key stakeholders are the end users. This is a recurring theme – that the end users are the key focus, and the job is done when they are using the product/software.

Always learning gets a recommendation as well as not being afraid to use things in manners other than that intended.

I was interested to note the comments on experts forever sketching since it is something I scarcely do; sometimes I write sequences of tricky bits of code with the odd arrow. I remember learning how to draw flow charts in the late seventies but rarely use the skill (certainly not with all the proper symbols). Software Design Decoded is slightly contradictory on this: in one place experts sketch abstractly as an aid to thought, with the sketches meaningless beyond the moment, and in another the sketches are kept for reference later and hence clear and well-labelled.

Notation also gets a couple of mentions. I take this as a formalised system for naming things – something popular with physicists, where the right notation is the difference between a page of formulae and a single line. I’m not really aware of using this in my own practice. Despite repeated attempts at object-oriented design I still tend to be quite "procedural".

I’m still in the "learning" phase of collaboration: for the first time in a while I’m working on code with other people (and it is a bit of a shock for all concerned). I still can’t abide meetings, but then the experts can’t abide some of them either (the ones with no direction).

I found this a bit of a "feel good" book, I share at least some of the habits of software design experts! I probably wouldn’t buy it for a personal read but if you have a coffee table in your software company this book would fit right in.

Book review: Ask a historian by Greg Jenner

Ask a historian by Greg Jenner is a bit of a change of tack for me. It is a list of 50 questions to a historian, Greg Jenner. Each answer is conversational in style, a couple of thousand words at most, pitched at a level that my fairly bright 10 year old would understand although the content is such that I would be judicious in just sharing it with him. Jenner works on the TV series Horrible Histories which, amongst other things, puts historical incidents to modern pop tunes. It is highly educational and a firm favourite for all ages in our household!

Fifty questions is more than I can review individually, so I will simply outline the style of the questioning and highlight some of my favourites. They are divided into 12 thematic chapters with 4 or 5 questions in each chapter.

Chapter 1 – Fact or Fiction

2 – Is it true they put a dead pope on trial? Yes, it is true, a subsequent pope dug him up in order to do this! The papacy was a fairly wild institution, particularly in the 9th century AD, with a total of 24 popes in the period 896-904, in contrast to a total of 5 in my 50 year life. The 9th century popes did not die of natural causes, their successors helped them along the way.

3 – Atlantis proves aliens are real? – There are questions that make Jenner angry (not at the questioner), and this is one of them. Jenner’s concern is two-fold. The first is the implication that non-Europeans couldn’t possibly have done all of these magnificent things – it must have been aliens – which is rather insulting. Secondly, the alien conspiracy theories often have their roots in Nazism.

Chapter 2 – Origins and Firsts

6 – When was the first Monday? No historian likes to be pinned down on a "first" but the origins of the days of the week go back a long way. There is some evidence that the Babylonians used a seven day cycle, which fits neatly into the lunar month, but the seven day week was definitely in place by 2,500 years ago with the Jewish religion celebrating a Sabbath every seven days. There were other options: the ancient Egyptians kept a ten day week, and the Etruscans and early Romans followed an eight day week (labelled with letters A to H).

8 – When did birthdays start being celebrated? It is comforting to realise that we’ve been celebrating our birthdays for at least 2500 years. A birthday party invitation was found at Vindolanda, a Roman fort on Hadrian’s Wall.

Chapter 4 – Food

15 – How old is curry? I found it interesting that the heat we most associate with curry, produced by chillies, is the result of an import from South America. Also it is a bit chastening that "curry" is largely an invention of the British, a bastardisation of a very diverse Indian cuisine.

Chapter 5 – Historiography

19 – Who names historical periods? This turns out to be a surprisingly difficult question, historians don’t necessarily agree on the extents of a period (like the Long Eighteenth Century), periods do not neatly delineate time – they overlap, and vary across the world. Periods like "Victorian" are ridiculously large and encompass massive changes in social and economic conditions. Finally, the inhabitants of a period may be unhappy with where they have been placed – the Tudors would not have liked being called Tudors.

Chapter 6 – Animals & Nature

23 – When did we start keeping hamsters as pets? All I can say on this question is that hamsters are creatures full of rage.

Chapter 11 – Language & Communications

45 – Where do names for places in other languages come from? I liked this question, in large part because I remember travelling out of Pisa on a bus wondering why I’d never heard of the obviously large city of Firenze which I kept seeing on signs (it is the city I know as Florence). The names locals give places are endonyms and those that foreigners provide are exonyms. In the days of rapid communication, essentially since the beginning of the 19th century, there has been a tendency for exonyms and endonyms to be one and the same, give or take a bit of pronunciation. Bécs is the Hungarian name for Vienna, known as Wien by the Austrians. Vienna was at the border of the Magyar empire, and basically they called it "gateway".

Chapter 12 – History in Pop Culture

49 – Why do we care so much about the Tudors? I liked this question because it hints at something I have seen elsewhere about Newton, and it occurs regarding Anne Boleyn’s purported 3rd nipple in an earlier question in this book. These stories were promoted by supporters or opponents in the years after a dynasty or person had died because they supported a preferred narrative and their influence persists for centuries.

The book finishes with a rather nicely crafted Recommended Reading section, and perhaps this is the point of the book – not as an end in itself but an introduction to a range of books for a more in depth view. Ask a historian would be an excellent holiday read, I must admit I prefer something more substantial on a single subject.

Understanding setup.py, setup.cfg and pyproject.toml in Python

This blog post is designed to clarify my thinking around installing Python packages and the use of setup.py, setup.cfg and pyproject.toml files. Hopefully it will be a useful reference for other people, and future me.

It is stimulated by my starting work on a new project where we have been discussing best practices in Python programming, and how to lay out a git repository containing Python code. More broadly it is relevant to me as someone who programs a lot in Python, mainly for my own local use, but increasingly for other people to consume my code. Prior to writing this the layout of my Python repositories was by a system of random inheritance dating back a number of years.

The subject of Python packaging, installation and publication is a bit complicated for mainly historic reasons – the original distutils module was created over 20 years ago. A suite of tools has grown up, either as part of the standard library or as de facto standards, and has evolved over time. Some elements are contentious in the sense that projects will have lengthy arguments over whether or not to support a particular method of configuration. A further complication for people whose main business is not distributing their code is that packaging isn’t necessarily a concern at the start of a project and may never be relevant.

Update: I updated this blog post on 5th May 2023. The change is that project settings formerly in setup.cfg can now go in pyproject.toml, as per PEP-621 – described in more detail in the PyPA documentation. Currently I only use setup.cfg for flake8 configuration. A reader from Mastodon commented that setup.py is not required for installation of a package but is required for build/publication.

tl;dr

Structure your Python project like this with setup.py and pyproject.toml in the top level with a tests directory and a src directory with a package subdirectory inside that:
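
As a minimal sketch of that layout, the script below creates empty placeholder files in a temporary directory and lists them. The names mypackage, example.py and test_example.py are hypothetical placeholders of mine, not names the layout requires.

```python
from pathlib import Path
import tempfile

# Hypothetical file names: "mypackage", "example.py" and "test_example.py"
# are placeholders chosen for illustration.
LAYOUT = [
    "setup.py",
    "pyproject.toml",
    "src/mypackage/__init__.py",
    "src/mypackage/example.py",
    "tests/__init__.py",
    "tests/test_example.py",
]


def create_skeleton(root: Path) -> list[str]:
    """Create empty placeholder files for the layout and return their relative paths."""
    for relative in LAYOUT:
        path = root / relative
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()
    return sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    )


# Build the skeleton in a throwaway directory and show what was created.
with tempfile.TemporaryDirectory() as tmp:
    created = create_skeleton(Path(tmp))
    for name in created:
        print(name)
```

In a real project you would point create_skeleton at the repository root rather than a temporary directory.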

The minimal setup.py file simply contains an invocation of the setuptools setup function. If you do not intend to publish your project then no setup.py file is required at all; pip install -e . will work without it.

setup.cfg is no longer required for configuring a package but third-party tools may still use it. Put at least this in pyproject.toml:
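
A hedged sketch of what the two files might contain, written out from a short script so the contents can be checked without running setuptools. The project name, the version and the setuptools>=61.0 bound are assumptions of mine for illustration (as far as I know that is roughly when setuptools began reading the [project] table), so adjust them to your own project.

```python
from pathlib import Path
import tempfile

# Minimal setup.py: just an invocation of the setuptools setup function.
SETUP_PY = """\
from setuptools import setup

setup()
"""

# Minimal pyproject.toml. The name, version and setuptools bound are
# placeholder assumptions, not values taken from a real project.
PYPROJECT_TOML = """\
[build-system]
requires = ["setuptools>=61.0"]
build-backend = "setuptools.build_meta"

[project]
name = "mypackage"
version = "0.0.1"
"""


def write_config(root: Path) -> None:
    """Write the two configuration files into the project root."""
    (root / "setup.py").write_text(SETUP_PY, encoding="utf-8")
    (root / "pyproject.toml").write_text(PYPROJECT_TOML, encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    write_config(root)
    # Check that setup.py is syntactically valid without executing it.
    compile((root / "setup.py").read_text(encoding="utf-8"), "setup.py", "exec")
    written = sorted(p.name for p in root.iterdir())
    print(written)
```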

 

Then install the project locally:

pip install -e .

If you don’t do this “editable installation” then your tests won’t run because the package will not be installed. An editable install means that changes in the code will be immediately reflected in the functionality of the package.

It is common and recommended practice to use virtual environments for work in Python. I use the Anaconda distribution of Python, in which we set up and activate a virtual environment using the following, to be run before the pip install statement:

conda create -n tmp python=3.9
conda activate tmp

There is a copy of this code, including some Visual Studio Code settings, and a .gitignore file in this GitHub repository: https://github.com/IanHopkinson/mypackage

Setup.py and setup.cfg

But why should we do it this way? It is worth stepping back a bit and defining a couple of terms:

module – a module is a file containing Python functions.

package – a package is a collection of modules intended to be installed and used together.

Basically this blog post is all about making sure import and from ... import ... works in a set of distinct use cases. Possibilities include:

  1. Coding to solve an immediate problem with no use outside of the current directory anticipated – in this case we don’t need to worry about pyproject.toml, setup.cfg, setup.py or even __init__.py.
  2. Coding to solve an immediate problem with the potential to spread code over several files and directories – we should now make sure we put an empty __init__.py in each directory containing module files.
  3. Coding to provide a local library to reuse in other projects locally – this will require us to run python setup.py develop or, better, pip install -e .
  4. Coding to provide a library which will be used on other systems you control – again using pip install -e .
  5. Coding to provide a library which will be published publicly, here we will need to additionally make use of something like the packaging library.
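
To make the import problem concrete, here is a small sketch that builds a throwaway package and shows that it only becomes importable once its parent directory is visible to Python. Adding the directory to sys.path by hand is only a rough stand-in for what an editable install arranges, and the package and module names are hypothetical.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical package name, chosen so it cannot clash with anything installed.
PACKAGE = "demo_pkg_for_import"

with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "src"
    package_dir = src / PACKAGE
    package_dir.mkdir(parents=True)
    (package_dir / "__init__.py").write_text("", encoding="utf-8")
    (package_dir / "greetings.py").write_text(
        "def hello(name):\n    return f'Hello, {name}'\n", encoding="utf-8"
    )

    # Before the src directory is on sys.path the import fails.
    try:
        importlib.import_module(PACKAGE)
        importable_before = True
    except ModuleNotFoundError:
        importable_before = False

    # Stand-in for installation: make the src directory visible to the import system.
    sys.path.insert(0, str(src))
    importlib.invalidate_caches()
    try:
        greetings = importlib.import_module(f"{PACKAGE}.greetings")
        message = greetings.hello("world")
    finally:
        sys.path.remove(str(src))

print(importable_before, message)
```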

I am primarily interested in cases 3 and 4, and my projects tend to be pure Python so I don’t need to worry about compiling code. More recently I have been publishing packages to a private PyPI repository but that is a subject for another blog post.

The setup.py and setup.cfg files are artefacts of the setuptools module which is designed to help with the packaging process. It is used by pip whose purpose is to install a package either locally or remotely. If we do not configure setup.py/setup.cfg correctly then pip will not work. In the past we would have written a setup.py file which contained a bunch of configuration information but now we should put that configuration information into setup.cfg which is effectively an ini format file (i.e. does not need to be executed to be read). This is why we now have the minimal setup.py file.
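
The point about setup.cfg being readable without execution can be sketched with the standard library configparser module. The file contents below are hypothetical placeholders, not settings from a real project.

```python
import configparser

# Hypothetical setup.cfg contents; the values are placeholders.
SETUP_CFG = """\
[metadata]
name = mypackage
version = 0.0.1

[flake8]
max-line-length = 100
"""

# Parse the text as plain data: nothing in the file is executed.
parser = configparser.ConfigParser()
parser.read_string(SETUP_CFG)

name = parser["metadata"]["name"]
max_line_length = parser.getint("flake8", "max-line-length")
print(name, max_line_length)
```

Contrast this with a setup.py full of configuration, which a tool has to run as a Python program to discover the same values.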

It is worth noting that setup.cfg is an ini format file, and pyproject.toml uses TOML, a slightly more formal ini-like format.

What is pyproject.toml?

The pyproject.toml file was introduced in PEP-518 (2016) as a way of separating configuration of the build system from a specific, optional library (setuptools) and also enabling setuptools to install itself without already being installed. Subsequently PEP-621 (2020) introduces the idea that the pyproject.toml file be used for wider project configuration and PEP-660 (2021) proposes finally doing away with the need for setup.py for editable installation using pip.

Although it is a relatively new innovation, there are a number of projects that support the use of pyproject.toml for configuration including black, pylint and mypy. More are listed here:

https://github.com/carlosperate/awesome-pyproject

Where do tests go?

Tests go in a tests directory at the top-level of the project with an __init__.py file so they are discoverable by applications like pytest. The alternative of placing them inside the src/mypackage directory means they will get deployed into production which may not be desirable.

Why put your package code in a src/ subdirectory?

Using a src directory ensures that you must install a package to test it, just as your users would. It also prevents tools like pytest accidentally importing it from the working directory.

Conclusions

I found researching this blog post a useful exercise: the initial setup of a Python project is something I rarely consider and have previously done by rote. Now I have a clear understanding of what I’m doing, and I also understand the layout of Python projects. One of my key realisations is that this is a moving target: what was standard practice a few years ago is no longer standard, and in a few years’ time things will have changed again.

Book review: The Programmer’s Brain by Felienne Hermans

I picked up The Programmer’s Brain by Felienne Hermans as a result of a thread on Twitter. I’ve been following Hermans for quite a while, and knew the areas of computer science she worked in, but my interest in Programmer’s Brain was stimulated by a lengthy thread she posted over the Christmas break.

The book is based around the idea of the brain as having long term (LTM), short term (STM) and working memory and how these different sorts of memory come into play in programming tasks, how we can improve our memories, and how we can write code that supports our use of them. It cites a fair number of academic studies in each area it looks at.

The book is divided into four parts.

The first part covers the reading of code. We do a lot of training on how to write code but none on reading it, yet as developers we spend a lot of time reading code, either our own code from the past, the code of our colleagues or library code.

Perhaps most traumatic for me was the suggestion that I should learn syntax. Hermans suggests flash cards to learn syntax, as an aid to reading code (and writing it), highlighting that going and looking up syntax is likely to break our flow, by the time we have checked out Twitter and some pictures of kittens. Thinking about my own behaviour, this is definitely true. My first flash cards would all be around Python – set syntax, format statements, unittest boilerplate and the options for sort and sorted.

An idea I hadn’t come across before was refactoring code for readability which may be at odds with how code currently stands; you might, for example, inline functions to remove the need to go look them up and potentially lose your place in code. Or replace lambdas, list comprehensions or ternary operators – all of which take a bit more effort to parse – with their more verbose, conventional alternatives.
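
As a small illustration of my own (not an example from the book), here is the same logic written first as a list comprehension containing a ternary expression, and then in the more verbose form that takes less effort to parse:

```python
numbers = [3, -1, 4, -1, 5]

# Compact form: a list comprehension containing a ternary expression.
labels_compact = ["negative" if n < 0 else "non-negative" for n in numbers]

# Verbose form: the same logic as an explicit loop with an if/else statement.
labels_verbose = []
for n in numbers:
    if n < 0:
        label = "negative"
    else:
        label = "non-negative"
    labels_verbose.append(label)

# Both forms produce the same list.
print(labels_compact == labels_verbose)
```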

Two things aid reading code. The first is "chunking": experts in a field, like chess or programming, don’t remember every detail but they know the rules of possibility, so they can break up a program or a chess position into larger pieces (or chunks). They thus have better recall than novices.

The second aid to reading code is beacons: variable names and comments that hint at the higher purpose of code, enabling you to recall the right chunks. That’s to say, if you are implementing code that uses a binary tree you use the conventional names of root, branch, node, left and right rather than trying to be individualistic.

I suspect a lot of programmers, like me, will look at the rote learning exercises that Hermans proposes and immediately start to think about how to automate them! I think there is scope for IDE extensions that allow you to set up some flashcards or little code exercises. Also Hermans proposes quite a lot of printing out code and annotating it; again this is something I’d quite like IDE support for.

The second part is on understanding code more deeply, how it works. I was interested to learn that our natural language abilities are a better predictor of how good we are at comprehending what code does, than our mathematical abilities. In terms of understanding code, Hermans talks about marking up listings of code to highlight the occurrence of functions and variables. And, furthermore, to label variables by role following the work of Sajaniemi that is to say into the categories of fixed value, stepper, flag, walker (like a stepper), most recent holder, most wanted holder, gatherer, container, follower, organiser, and temporary. The co-occurrence of these roles provides strong clues as to what code does – in the same manner as design patterns. If we spot a design pattern we can access our long term memory as to what a design pattern does.
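
A sketch of my own showing how a few of these roles might be labelled in a short function. The function and variable names are hypothetical, and the role assigned in each comment is my reading of the categories listed above.

```python
def summarise(readings):
    """Return the total, the highest value and whether any reading is negative."""
    total = 0               # gatherer: accumulates the values seen so far
    highest = None          # most wanted holder: best value found so far
    found_negative = False  # flag: records whether a condition has occurred
    for index in range(len(readings)):  # stepper: walks through the positions
        value = readings[index]         # most recent holder: latest value read
        total += value
        if highest is None or value > highest:
            highest = value
        if value < 0:
            found_negative = True
    return total, highest, found_negative


result = summarise([2, -1, 7])
print(result)
```

Seeing a gatherer, a most wanted holder and a flag together in one loop is a fair clue that the code is summarising a collection, before reading any of the detail.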

Following on from the idea of labelling roles of variables is the somewhat deprecated "Hungarian notation" proposed by Simonyi. This is where you include some type or role information in a variable name such as "strMyName" or "lb_textbox". Simonyi’s original proposal was to name variables with their roles rather than just their types; naming by type is rather less useful in strongly typed languages and modern IDEs with syntax highlighting.

The third part is on writing code, starting with the importance of naming things. The key here is consistency in naming (i.e. stick with either snake case or camel case, don’t mix), and agreeing a "name mould" – a pattern for compiling parts of a name. Martin Fowler’s "code smells" are also covered in this section, highlighting how they interact with the model and how bad code smells prevent us accessing our long term memories. 

The final part is on collaborating on code, including the developer’s great bugbear, "the interruption". It turns out this annoyance is well-founded, with research showing that an interruption typically requires 15-20 minutes for recovery. I was also interested to see that we are not very good at multi-tasking, although we might think we are.

Also in this part is a discussion of the cognitive dimensions of code bases (CDCB), these are ideas like the error proneness of code, how easy it is to modify, how easy it is to test in parts applied at the level of an application or library. There is an implication here that the language you use to build a library may change over the course of time, perhaps starting with Python when you are roughing things out quickly, adding in type hinting when the library is more mature and shifting to Scala or Java when the design is stable and better performance is needed.

Finally, there is a small piece on onboarding new developers to a project, here the ideas of cognitive load repeat. Often when we are onboarding a new developer we show them the code, introduce a load of people, draw diagrams and so forth – all very fast. Under these circumstances our ideas about cognitive load tell us anyone will be overwhelmed.

I enjoyed this book, it feels like a guide to getting better at doing something I spend a lot of my time on. It is an area, learning in the field of programming, that I have not seen written about elsewhere.

Hopefully this book will change the way I work a bit, I’ll try to learn more syntax, I’ll not worry about reusing the same variable names, or even using Hungarian notation. I’ll try to remember the roles of variables. And I’ll try Hedy out with my son, Hedy is the teaching language Hermans wrote while also writing this book.