Jan 16 2023
Book review: The Wood Age by Roland Ennos
My first book of 2023 is The Wood Age: How wood shaped the whole of human history by Roland Ennos, a history of wood and human society.
The book is divided into four parts: “pre-human” history, history up to the industrial era, the industrial era, and “now and the future”.
Part one covers our ancestors’ life in the trees and their descent from them. Ennos argues that nest building as practised by, for example, orangutans is a sophisticated and little-recognised form of tool use which involves an understanding of the particular mechanical properties of wood. For the descent from the trees, Ennos sees digging sticks and fire as important. Digging sticks are effective for rummaging roots out of the earth, which is handy if you are moving away from the leaves and fruits of the canopy. Wood becomes harder as it dries (hence making better digging sticks), and the benefits of cooking food with (wood-based) fire are well reported. The start of the controlled use of fire is unknown but could be as long as 2,000,000 years ago. The final step, hair loss in humans, Ennos attributes to the ability to build wooden shelters; this seems rather far-fetched to me. I suspect this part of the book is the most open to criticism since it covers a period well before writing, with very little fossilised evidence of the key component: wood.
The pre-human era featured some use of tools made from wood, and this continued into the “stone” age, but on the whole wood is poorly preserved over even thousands of years. The oldest wooden tool discovered dates to 450,000 years ago – a spear found in Essex. The peak of tool making in the Neolithic is the bow and arrow, as measured by the number of steps and materials required.
The next part of the book covers the period from the Neolithic through to the start of the Industrial Revolution. In this period ideas about farming spread to arboriculture with the introduction of coppicing, which produces high yields of firewood and of wood for wicker, a new way of crafting with wood. There is some detailed discussion of how wood burns, and of how charcoal, which burns hotter, was essential to the success of the “metal” ages and to the progression from earthenware pottery (porous and weak) to stoneware, which is basically glassy and requires a firing temperature of over 1,000 °C. As an aside, I found it jarring that Ennos quoted all temperatures in Fahrenheit!
This section reads rather like the technology tree of a computer game. The ability to make metal tools, initially copper, then bronze, then iron, then steel, opens up progressively better tools and more ways of working with wood, like sawing planks, which can be used to make better boats than those made by hollowing out logs or splitting tree trunks. Interestingly, the boats made by the Romans were not surpassed in size until the 17th century.
Wheels turn out to be more complicated than I first thought: slicing a tree trunk into discs doesn’t work because the discs split in use (and in any case cutting cleanly across the grain of wood is hard without a steel-bladed saw). The first wheels, three planks cut into a circle and held together with battens, are not great. The peak of wheel building is the spoked wheel, which requires a steam-bent rim, turned spokes and a turned central hub, with moderately sophisticated joints. Ennos argues that the reason South America never really took to wheels, and the Polynesians did not build plank-built boats, was a lack of metals suitable for making tools.
Harder steel tools also enabled carpentry with seasoned timber, which is better for making furniture than green wood, which splits and deforms as it dries.
Ultimately the use of wood was not limited by the production of wood but rather by transport and skilled labour. The Industrial Revolution picks up when coal becomes the fuel of choice – making manufacturing easier, and allowing cities to grow larger.
The final substantive part of the book covers the Industrial Revolution up to the present. This is largely the story of the replacement of wood as fuel with coal, of charcoal (used in smelting) with coke (which is to coal what charcoal is to wood), and of many small wooden items with metal, ceramic, glass and, more recently, plastic. It is not a uniform story, though: England moved to coal as a fuel early in the 19th century, driven by an abundance of coal, a relative shortage of wood and the growth of large cities. Other countries in Europe and the US moved more slowly. The US built its railways with wooden infrastructure (bridges and sleepers), rather than the stone used in Britain, at a much lower cost. The US still tends to build domestic buildings in wood. The introduction of machine-made nails and screws in the late 18th century made construction in wood a lower-skilled activity. Paper made from wood was invented around 1870, making newspapers and books much cheaper.
In the 21st century wood and processed-wood like plywood or chipboard are still used for many applications.
The final part of the book is a short look into the future, mainly from the point of view of reforestation. I found this a bit odd because it starts by complaining about the “deforestation myth” but then goes on to describe occasions when humans caused significant deforestation and soil erosion damage!
Ennos sees wood as an under-reported factor in the evolution of humanity, but authors often feel their topic is under-reported. I suppose this is inevitable since these are people so passionate about their topic that they have devoted their energy to writing a whole book about it.
This is a nice read, not too taxing but interesting.
Dec 29 2022
Review of the year: 2022
As is traditional here, I present an annual review of my blog, which largely consists of book reviews but this year includes some technical posts as I learnt some new software engineering skills.
In book terms I started the year with Natives by Akala, an autobiography that fits into the Black Lives Matter theme I started the previous year. Railways and the Raj by Christian Wolmar also has something of this air: the way the British ran the Raj, and the subsequent violence of Partition, are a salutary lesson.
I read a couple of books about scripts: one specifically focussed on Chinese script, Kingdom of Characters by Jing Tsu, and a second, very short book on all scripts, Writing and Script: A Very Short Introduction by Andrew Robinson.
From a technical point of view I read Felienne Hermans’ The Programmer’s Brain, which definitely provided a lot of food for thought, Software Design Decoded by Marian Petre and André van der Hoek, and Data Mesh by Zhamak Dehghani. The topic of this last book, the data mesh, has been a central theme of my work this year.
My favourite book of the year was Pale Rider – The Spanish Flu of 1918 by Laura Spinney, which was written before the covid pandemic. It was interesting to see the differences (no effective vaccines, or even a clear understanding of viruses) and the similarities (arguments over schools remaining open). I also read The Art of More by Michael Brooks, a history of maths; it turns out accounting and bureaucracy were important drivers in the invention of maths. The last book of the year was Dutch Light by Hugh Aldersey-Williams, a biography of Christiaan Huygens and the second I have read.
On a more general history front I read Ask a Historian by Greg Jenner and Curious devices and mighty machines by Samuel J.M.M. Alberti, which is about science museums.
I continue to learn how to play the guitar, and Play It Loud by Brad Tolinski and Alan Di Perna fits in with this: it is a history of the electric guitar, broader than The Birth of Loud by Ian S. Port, which I read a few years ago. I have stopped learning to play the (electronic) drums.
My posting this year was a bit more varied than it has been for a while. I started a thread of technical posts written as I clarified my thinking for a project I am working on at work. One of these, Understanding setup.py, setup.cfg and pyproject.toml in Python, has been my most popular blog post by a large margin and boosted traffic to my blog to its highest level ever! That’s not to say traffic is particularly high: I had about 20,000 visitors this year. Versioning in Python was in a similar vein, technical information about some very specific technology. A way of working: data science and Software engineering for Data Scientists were a bit more general and philosophical, and have received rather less traffic.
In the summer the whole family joined Chester’s mid-Summer Parade as pirates which was a great deal of fun.
On the holiday front, we went to Ambleside in the Lake District for a week in July. The photos below are from Allan Bank by Grasmere – an exceedingly relaxed National Trust property. I was impressed by my new phone’s ability to take reasonable photos through windows; normally the inside of the room would be under-exposed. The photo album for the trip, with many more photographs, is here.
We also went to Dorset, where I grew up, in October, stopping off at the gardens at Stourhead on the way down (pictured below). I scattered the ashes of my dad and stepmother with my stepbrothers in the New Forest. I was surprised by the quantity of ashes: roughly a large bag of flour’s worth for each of them. Dad would have been proud that two parties, working from an X on an Ordnance Survey map, converged from two directions on the same location in the middle of the Forest; he would probably have been less impressed by me getting lost in a bog on the way back! Although, as Mrs H said, getting lost having said a final farewell to my dad was rather symbolic. I posted a eulogy for my dad here.
More photos from Dorset, including the Tank Museum, Monkey World and the Slimbridge Wetland Centre on the way back, are here.
The winter brought more entertainment: on the left you see me in my suit for the office Christmas party. It is difficult to appreciate the sparkliness of the shoes, but they are still out since I enjoy seeing them sparkle. On the right is the chief Roman from Chester’s Saturnalia celebration.
We all got covid earlier in the year, and I still haven’t got back to my former running form (10km in 50 minutes): I can only manage 3km in 15 minutes and struggle to run much further without post-exercise malaise setting in. My Garmin running watch generously tells me I still have the body of a 31-year-old, 21 years younger than my calendar age!
I have had quite a lot of counselling for anxiety this year, featuring Eye Movement Desensitization and Reprocessing (EMDR), which I insisted on referring to as “disco lights”. It appears to have worked to some degree, although in the depths of winter, when I’m not doing anything that induces anxiety, it is difficult to tell.
Dec 17 2022
Book review: Dutch Light by Hugh Aldersey-Williams
It’s taken me a while, but my next review is of Dutch Light: Christiaan Huygens and the Making of Science in Europe by Hugh Aldersey-Williams.
I have previously read a biography of Christiaan Huygens, Huygens: The Man Behind the Principle by C.D. Andriesse. This was a little over 10 years ago, so it says something about my memory that I came to Aldersey-Williams’ book fairly fresh!
Huygens was born in 1629 and died in 1695, so after Galileo (1564 – 1642) and René Descartes (1596-1650) but before Isaac Newton (1642-1726).
Huygens came from a relatively prestigious family: his father, Constantijn, was an important diplomat, as was his brother (also Constantijn; the Huygens family reused forenames heavily!). The family had a broad view of education, and Christiaan and his brothers were brought up to appreciate, and make, art and music, and to draw, as well as learning more academic subjects. Christiaan’s scientific collaboration with his brother continued throughout his life, mainly focussed on lens grinding.
This practical turn had an impact on Huygens’ scientific work: he made the lenses and telescopes that he used to discover the rings of Saturn, and his discovery was sealed with beautifully drafted illustrations of Saturn’s rings seen at varying orientations relative to Earth. It had been known since Galileo’s time that there was something odd about Saturn, but telescope technology was such that the rings were not clearly resolved. Furthermore, as Earth changes position relative to Saturn we view the rings at different angles, which changes their appearance and added to the confusion over their nature. Having hypothesised that the structures around Saturn were rings, Huygens was able to predict (successfully) when the rings would be oriented edge-on to Earth and hence disappear.
The Netherlands has produced more than its share of astronomers, and Aldersey-Williams discusses whether this is down to the landscape (big open skies with reflecting water), material resources (abundant high-quality sand for glass and lens making) or the culture, in particular the Dutch school of art of the period. He doesn’t come to a firm conclusion on this, but the discussion gives the book its title.
Huygens’ work on telescopes and Saturn also led to his more theoretical work on a wave theory of optics and the “Huygens Principle”, something I learnt at school.
Aside from his practical work on astronomy, Huygens was a very capable mathematician, respected by Newton and Leibniz. His work prefigured some of Newton’s later work, and he led the way in describing nature, and observations, with mathematical equations. He was a transitional figure at the cusp of the Scientific Revolution, a pioneer of describing observed phenomena using maths, diverging from Descartes, who believed that nature could be explained by the power of pure thought.
Huygens also worked on clocks, largely in relation to the problem of longitude; again this is an example of his combination of practical design skills and mathematical understanding. His main contributions in this area were modifications that made pendulum clocks more accurate, and the invention of a spring-driven oscillator, more robust at sea than a pendulum. In the end his contributions were not sufficient to solve the problem of longitude, and he also fell out with Hooke over the invention of the spring drive. He also had a dispute with Thuret, the clockmaker who implemented his designs. But if you were working in science in the 17th century and didn’t fall out with Hooke, what sort of scientist were you?!
“…the making of science in Europe” in the title of this book refers to Huygens’ international activities. He was a founding member of the French Académie des Sciences, courted specifically by its prime mover, Jean-Baptiste Colbert, and lived in Paris from 1666 until 1681. Colbert died in 1683 and his successor was not as favourably disposed towards Huygens, who left the Académie. Huygens also met and corresponded with scientists in London, at the Royal Society and elsewhere, and across the rest of Europe. This was a time when discoveries, and experimental techniques, were being shared more often, if not universally.
Andriesse and Aldersey-Williams both ask why Huygens is not more famous, particularly when compared to Newton. I’ve thought about this a bit since reading Andriesse’s book and have come to the tentative conclusion that figures like Galileo, Newton, Einstein and Hawking are not famous scientists. They are famous people who happen to be scientists; they are symbols for a period, and their fame is not necessarily rooted in scientific achievement. Newton was promoted very heavily after his death by the English, and in his lifetime he was not only a scientist but also Warden of the Royal Mint, and briefly an MP.
I enjoyed this book more than the Andriesse biography. In both cases it felt that there was perhaps a scarcity of material on Huygens’ life, which led to a great deal of discussion of Huygens’ father, to the extent that in the early pages it wasn’t clear whether references to Huygens were to Christiaan or to his father Constantijn.
Dec 05 2022
Versioning in Python
I have recently been thinking about versioning in Python, both of Python itself and of Python packages. This is a record of how it is done for a current project and the reasoning behind it.
Python Versioning
At the beginning of the project we made a conscious decision to use Python 3.9. However, our package is also used by our Airflow code, which does integration tests, and Airflow provides reference Docker images based on Python 3.7 (their strategy is to use the oldest version of Python still in support). This approach is documented here, and the end-of-life dates for recent Python versions are listed here.
Since we started the project, Python 3.11 has been released so it makes sense to extend our testing from just Python 3.9 to include Python 3.7 and 3.11 too.
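Airflow’s rule of thumb, build against the oldest Python still in support, is easy to express in code. A minimal sketch: the end-of-life dates below are approximate and included only for illustration (the official Python release status page is the source of truth), and the helper names are hypothetical.

```python
from datetime import date
from typing import List, Tuple

# Approximate end-of-life dates, for illustration only; the official Python
# release status page is the source of truth.
EOL_DATES = {
    "3.7": date(2023, 6, 27),
    "3.8": date(2024, 10, 7),
    "3.9": date(2025, 10, 31),
    "3.10": date(2026, 10, 31),
    "3.11": date(2027, 10, 31),
}


def version_key(version: str) -> Tuple[int, ...]:
    """Turn "3.10" into (3, 10) so versions sort numerically, not as text."""
    return tuple(int(part) for part in version.split("."))


def supported_versions(today: date) -> List[str]:
    """Versions whose end-of-life date has not passed, oldest first."""
    live = [v for v, eol in EOL_DATES.items() if eol >= today]
    return sorted(live, key=version_key)


def oldest_supported(today: date) -> str:
    """The Airflow-style choice: the oldest version still in support."""
    return supported_versions(today)[0]


print(oldest_supported(date(2022, 12, 5)))   # 3.7
print(supported_versions(date(2024, 1, 1)))  # ['3.8', '3.9', '3.10', '3.11']
```

The numeric sort key matters: compared as plain strings, "3.10" sorts before "3.7".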
The project uses an Azure Pipeline to run continuous integration / continuous delivery (CI/CD) tests, and it is easy to add tests for multiple versions of Python using the following stanza in the configuration file for the pipeline.
Extending testing resulted in only a small number of minor issues, typically around the Python versions supported by our dependencies, which were easily addressed by allowing more flexible versions in our requirements.txt rather than pinning to a specific version. We needed to fix one failing test where it appears Python 3.11 handles escaping of characters in Windows-like path strings differently from Python 3.9.
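The exact change behind that failing test isn’t covered here, but the general hazard with Windows-style paths is that backslashes in ordinary string literals are read as escape sequences. A small illustration, using a made-up path, of two ways to avoid relying on escape handling at all: raw strings and pathlib’s PureWindowsPath.

```python
from pathlib import PureWindowsPath

# In an ordinary string literal "\n" is a newline, so this made-up path is
# silently corrupted: it contains a line break, not a backslash.
broken = "C:\new_folder"

# A raw string keeps every backslash exactly as typed.
raw = r"C:\new_folder"

# PureWindowsPath accepts forward slashes and renders them as backslashes,
# so the literal never needs escaping.
built = PureWindowsPath("C:/new_folder")

print(len(broken), len(raw))              # 12 13
print("\n" in broken, str(built) == raw)  # True True
```

PureWindowsPath does no filesystem access, so it behaves the same on a Windows laptop and a Linux build agent.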
Package Versioning
Our project publishes a package to a private PyPI repository. This process fails if we attempt to publish the same version of the package twice, where the version is the one specified in the “pyproject.toml”* configuration file rather than the state of the code.
Python has views on package version numbering, which are described in PEP-440; this describes the permitted formats. It is flexible enough to allow either Calendar Versioning (CalVer – https://calver.org/) or Semantic Versioning (SemVer – https://semver.org/) but does not prescribe how the versioning process should be managed or which of these schemes should be used.
I settled on Calendar Versioning with the format YYYY.MM.Micro. This is a considered personal preference. I like to know at a glance how old a package is, and I worry about spending time working out whether I need to bump the major, minor or patch part of a semantic version number, whilst with Calendar Versioning I just need to look at the date! I use .Micro rather than .DD (meaning day) because the day to be used is ambiguous in my mind, i.e. is it the day when we open a pull request to make a release or the day when it is merged?
It is possible to automate the version numbering process using a package such as bumpversion, but this is complicated in a CI/CD environment since it requires the pipeline to make a git commit to update the version.
My approach is to use a pull request template to prompt me to update the version in pyproject.toml, since this is where I have stored version information to date. (As noted below, I moved project metadata from setup.cfg to pyproject.toml, as recommended by PEP-621, during the writing of this blog post.) The package version can be obtained programmatically using the importlib.metadata.version function introduced in Python 3.8. In the past, projects defined __version__ in code, but this is optional and is likely to fall out of favour since the version defined in setup.cfg/pyproject.toml is compulsory.
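The YYYY.MM.Micro rule is simple enough to write down. A minimal sketch: next_calver is a hypothetical helper, not part of any packaging tool, and it assumes the version is exactly three dot-separated integers with no pre-release or dev suffix.

```python
from datetime import date


def next_calver(current: str, today: date) -> str:
    """Return the next YYYY.MM.Micro version after `current`.

    If `current` was already released this month, bump the micro number;
    otherwise start the new month at micro 0. Months are not zero-padded,
    matching the form PEP 440 normalises a number like "01" to.
    """
    year, month, micro = (int(part) for part in current.split("."))
    if (year, month) == (today.year, today.month):
        return f"{year}.{month}.{micro + 1}"
    return f"{today.year}.{today.month}.0"


print(next_calver("2022.11.3", date(2022, 12, 5)))  # 2022.12.0
print(next_calver("2022.12.0", date(2022, 12, 5)))  # 2022.12.1
```

Because the micro number resets each month, it never matters which day a pull request is opened or merged.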
Should you wish to use Semantic Versioning then there are libraries that can help with this, as long as you follow commit format conventions such as those promoted by the Angular project.
Once again I am struck by how this type of activity is a moving target: PEP-621 was only adopted in November 2020.
* Actually, when this blog post was started, version information and other project metadata were stored in setup.cfg, but PEP-621 recommends putting them in pyproject.toml, and this is preferred by the packaging library. Setuptools has parallel instructions for using pyproject.toml or setup.cfg, although some elements to do with package and data discovery are in beta.
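Those tools typically look for commits in the Conventional Commits style, which grew out of the Angular convention: `fix:` implies a patch bump, `feat:` a minor bump, and a `!` after the type or a `BREAKING CHANGE:` footer a major bump. A simplified sketch of that decision, not the logic of any particular library; bump_level and bump are hypothetical helpers and many edge cases are ignored.

```python
import re
from typing import List, Optional

# type, optional "(scope)", optional "!" and then the colon, e.g. "feat(api)!:"
HEADER = re.compile(r"^(?P<type>\w+)(?:\([^)]*\))?(?P<bang>!)?:")
RANK = {"patch": 1, "minor": 2, "major": 3}


def bump_level(messages: List[str]) -> Optional[str]:
    """Return "major", "minor", "patch" or None for a batch of commit messages."""
    level = None
    for message in messages:
        match = HEADER.match(message)
        if match is None:
            continue  # not a conventional commit; ignore it
        if match["bang"] or "BREAKING CHANGE:" in message:
            found = "major"
        elif match["type"] == "feat":
            found = "minor"
        elif match["type"] == "fix":
            found = "patch"
        else:
            continue  # docs:, chore: etc. do not trigger a release here
        if level is None or RANK[found] > RANK[level]:
            level = found
    return level


def bump(version: str, level: Optional[str]) -> str:
    """Apply a bump level to a MAJOR.MINOR.PATCH version string."""
    if level is None:
        return version
    major, minor, patch = (int(part) for part in version.split("."))
    if level == "major":
        return f"{major + 1}.0.0"
    if level == "minor":
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"


commits = ["fix: handle empty input", "feat(cli): add --dry-run flag", "docs: typo"]
print(bump("1.4.2", bump_level(commits)))  # 1.5.0
```

The highest-ranked change in a batch wins, which is why a single `feat:` among several fixes produces a minor release.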
Nov 16 2022
Software Engineering for Data Scientists
For a long time I have worked as a data scientist, and before that a physical scientist – writing code to do data processing and analysis. I have done some work in software engineering teams but only in a relatively peripheral fashion – as a pair programmer to proper developers. As a result I have picked up some software engineering skills – in particular unit testing and source control. This year, for the first time, I have worked as a software engineer in a team. I thought it was worth recording the new skills and ways of working I have picked up in the process. It is worth pointing out that this was a very small team with only three developers working about 1.5 FTE.
This blog assumes some knowledge of Python and source control systems such as git.
Coding standards
At the start of the project I did some explicit work on Python project structure, which resulted in this blog post (my most read by a large margin). At this point we also discussed which Python version would be our standard, and which linters (syntax/code style enforcers) we would use (Black, flake8 and pylint). Previously I had not used any linters/syntax checkers other than those built into my preferred editor (Visual Studio Code). My Python project layout used to be a result of rote learning; working in a team forced me to clarify my thinking in this area.
Agile development
We followed an Agile development process, with work specified in JIRA tickets which were refined and executed in 2 week sprints. Team members were subjected to regular rants (from me) on the non-numerical “story points” which have the appearance of numbers BUT REALLY THEY ARE NOT! Also the metaphor of sprinting all the time is exhausting. That said I quite like the structure of working against tickets and moving them around the JIRA board. Agile development is the subject of endless books, I am not going to attempt to describe it in any detail here.
Source control and pull requests
To date my use of source control (mainly git these days) had been primitive; effectively I worked on a single branch to which I committed all of my code. I was fairly good at committing regularly, and my commit messages were reasonably useful. I used source control to delete code with confidence and as a record of what I was doing when.
This project was different – as is common we operated on the basis of developing new features on branches which were merged to the main branch by a process of “pull requests” (GitHub language) / “merge requests” (GitLab language). For code to be merged it needed to pass automated tests (described below) and review by another developer.
I now realise we were using the GitHub Flow strategy (a description of source control branching strategies is here) which is relatively simple, and fits our requirements. It would probably have been useful to talk more explicitly about our strategy here since I had had no previous experience in this way of working.
I struggled a bit with the code review element: my early pull requests were massive and took ages for the team to review (partly because they were massive, and partly because the team was small and had limited time for the project). At one point I Googled for advice on dealing with slow code review and read articles starting “If it takes more than a few hours for code to be reviewed….” – mine were taking a couple of weeks! My colleagues took a very hard line on comments in code (they absolutely did not want any comments in code!)
On the plus side I learnt a lot from having my code reviewed, often in being pushed to do things I knew I should have done. I also learned from reviewing others’ code; often I would review someone else’s code and then go and change my own.
Automated pipelines
As part of our development process we used Azure Pipelines to run tests on pull requests. Azure is our corporate preference – very similar pipeline systems can be found in GitHub and GitLab. This was all new to me in practical, if not theoretical, terms.
Technically, configuring the pipeline involved a couple of components. The first is optional: we used Linux “make” targets to specify actions such as running installation, linters, unit tests and integration tests. Make targets are specified in a Makefile, and are invoked with simple commands like “make install”. I had a simple Makefile which looked something like this:
The make targets can be run locally as well as in the pipeline. In practice we could fix all issues raised by black and flake8 linters but pylint produced a huge list of issues which we considered then ignored (so we forced a pass for pylint in the pipeline).
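Forcing a pass is the bluntest option. Pylint’s exit status is bit-encoded (1 fatal, 2 error, 4 warning, 8 refactor, 16 convention, 32 usage error), so one alternative, sketched below as hypothetical helpers rather than anything in the project’s Makefile, is to fail the step only on the fatal, error and usage-error bits and let style messages through.

```python
from typing import List

# Bit flags in pylint's exit status, OR-ed together when several apply.
PYLINT_BITS = {
    1: "fatal",
    2: "error",
    4: "warning",
    8: "refactor",
    16: "convention",
    32: "usage error",
}
BLOCKING = 1 | 2 | 32  # fatal, error and usage error


def decode(status: int) -> List[str]:
    """Names of the message categories set in a pylint exit status."""
    return [name for bit, name in PYLINT_BITS.items() if status & bit]


def should_fail(status: int) -> bool:
    """Fail the pipeline step only for fatal, error or usage-error bits."""
    return bool(status & BLOCKING)


print(decode(28))       # ['warning', 'refactor', 'convention']
print(should_fail(28))  # False: style messages only
print(should_fail(30))  # True: includes an error (2)
```

A small wrapper could run pylint, pass its return code to should_fail, and exit non-zero only when that returns True.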
The Azure Pipeline was defined using a YAML file, this is a simple example:
This YAML specifies that the pipeline will be triggered on attempting a pull request against the main branch. The pipeline is run on an Ubuntu image (the latest one) with Python 3.9 installed. Three actions are done: first installation of the Python package specified in the git repo, then unit tests, and finally a set of linters. Each of these actions is run regardless of the status of previous actions. Azure Pipelines offers a lot of pre-built tasks, but they are not portable to other providers, hence the use of make targets.
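The “run every step, report at the end” behaviour can be imitated locally in a few lines. A minimal sketch, not part of the actual pipeline: run_all is a hypothetical helper, and the stand-in commands simply succeed or fail so it runs anywhere; in practice they would be make targets such as `make test`.

```python
import subprocess
import sys
from typing import Dict, List


def run_all(steps: Dict[str, List[str]]) -> Dict[str, int]:
    """Run every step even if an earlier one fails; return each exit code.

    Nothing is skipped after a failure: each command runs in turn and the
    verdict is left to the caller once all the codes are in.
    """
    results = {}
    for name, command in steps.items():
        results[name] = subprocess.run(command).returncode
    return results


# Stand-in commands so the sketch runs anywhere.
steps = {
    "install": [sys.executable, "-c", "print('installing')"],
    "unit tests": [sys.executable, "-c", "import sys; sys.exit(1)"],
    "lint": [sys.executable, "-c", "print('linting')"],
}
results = run_all(steps)
print(results)  # {'install': 0, 'unit tests': 1, 'lint': 0}
print(all(code == 0 for code in results.values()))  # False
```

Collecting every result before deciding means one run reports both failing tests and lint problems, rather than hiding the second behind the first.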
The pipeline is configured by navigating to the Azure Pipeline interface and pointing at the GitHub repo (and specifically this YAML file). The pipeline is triggered when a new commit is pushed to the branch on GitHub. The results of these actions are shown in a pretty interface with extensive logging.
The only downside of using a pipeline from my point of view was that my standard local operating environment is Windows, with the git-bash prompt providing a Linux-like command-line interface. The pipeline was run on an Ubuntu image, which meant that certain tests would pass locally but not in the pipeline, and were consequently quite difficult to debug. Regular issues were around checking file sizes (line endings mean that file sizes on Linux and Windows differ) and file paths, which even with Python’s pathlib are different between Windows and Linux systems. Using a pipeline forces you to ensure your installation process is solid, since the pipeline image is built on every run.
We also have a separate pipeline to publish the Python package to a private PyPI repository but that is the subject of another blog post.
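One way to make a file-size check portable is to normalise line endings before counting. A minimal sketch: normalised_size is a hypothetical helper, not what the project’s tests actually did, and it assumes text files in which a CRLF pair only ever means a line break.

```python
import tempfile
from pathlib import Path


def normalised_size(path: Path) -> int:
    """Size in bytes after converting Windows CRLF line endings to LF.

    A file checked out with CRLF endings (as git can do on Windows) and the
    same file with LF endings (as on an Ubuntu build agent) then agree.
    """
    return len(path.read_bytes().replace(b"\r\n", b"\n"))


with tempfile.TemporaryDirectory() as tmp:
    unix_file = Path(tmp, "unix.txt")
    windows_file = Path(tmp, "windows.txt")
    unix_file.write_bytes(b"a\nb\n")
    windows_file.write_bytes(b"a\r\nb\r\n")
    raw_sizes = (unix_file.stat().st_size, windows_file.stat().st_size)
    norm_sizes = (normalised_size(unix_file), normalised_size(windows_file))

print(raw_sizes)   # (4, 6)
print(norm_sizes)  # (4, 4)
```

When a test writes the file itself, passing `newline="\n"` to `open()` in text mode avoids the difference at source, since Python then writes `\n` unchanged on every platform.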
Conclusions
I learnt a lot working with other, more experienced, software engineers and as a measure of the usefulness of this experience I have retro-fitted the standard project structure and make targets to my legacy projects. I have started using pipelines for other applications.