Ian Hopkinson


Understanding setup.py, setup.cfg and pyproject.toml in Python

This blog post is designed to clarify my thinking around installing Python packages and the use of setup.py, setup.cfg and pyproject.toml files. Hopefully it will be a useful reference for other people, and future me.

It is stimulated by my starting work on a new project where we have been discussing best practices in Python programming, and how to lay out a git repository containing Python code. More broadly it is relevant to me as someone who programs a lot in Python, mainly for my own local use, but increasingly for other people to consume my code. Prior to writing this the layout of my Python repositories was by a system of random inheritance dating back a number of years.

The subject of Python packaging, installation and publication is a bit complicated for mainly historic reasons – the original distutils module was created over 20 years ago. A suite of tools has grown up, either as part of the standard library or as de facto standards, and has evolved over time. Some elements are contentious in the sense that projects will have lengthy arguments over whether or not to support a particular method of configuration. A further complication for people whose main business is not distributing their code is that distribution isn’t necessarily a concern at the start of a project and may never be relevant.

tl;dr

Structure your Python project like this with setup.py, setup.cfg and pyproject.toml in the top level with a tests directory and a src directory with a package subdirectory inside that:

2022-02-13-tree

The minimal setup.py file simply contains an invocation of the setuptools setup function:

setup.py

Put at least this in setup.cfg, most of this is to do with setup finding the source files in the src directory:

Put at least this in pyproject.toml:

pyproject.toml
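A minimal sketch of what these three files might contain, written as a small Python script that lays out the skeleton project. The file contents are an assumption based on a typical setuptools configuration for a src/ layout, not a copy of the files in the repository; the package name mypackage, the version number, the setuptools version pin and the scaffold function are all illustrative.

```python
from __future__ import annotations

from pathlib import Path

# Assumed contents: a typical minimal setuptools configuration for a src/ layout.
SETUP_PY = """\
from setuptools import setup

setup()
"""

# Most of this tells setuptools to look for packages inside src/.
SETUP_CFG = """\
[metadata]
name = mypackage
version = 0.0.1

[options]
package_dir =
    = src
packages = find:

[options.packages.find]
where = src
"""

# The build-system table names the tools needed to build the project.
PYPROJECT_TOML = """\
[build-system]
requires = ["setuptools>=42", "wheel"]
build-backend = "setuptools.build_meta"
"""


def scaffold(root: Path, package: str = "mypackage") -> list[str]:
    """Write the skeleton project under root and return the relative paths created."""
    files = {
        "setup.py": SETUP_PY,
        "setup.cfg": SETUP_CFG.replace("mypackage", package),
        "pyproject.toml": PYPROJECT_TOML,
        f"src/{package}/__init__.py": "",
        "tests/__init__.py": "",
    }
    for relative_path, text in files.items():
        path = root / relative_path
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(text)
    return sorted(files)


if __name__ == "__main__":
    import tempfile

    # Write the skeleton into a throwaway directory and list what was created.
    with tempfile.TemporaryDirectory() as tmp:
        for name in scaffold(Path(tmp)):
            print(name)
```

Running the script against a real directory instead of a temporary one gives a starting point that can then be installed as described next.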

Then install the project locally in “editable” mode, which makes the “live” version of the code available for local use, by running this:

pip install -e .

If you don’t do this “editable installation” then your tests won’t run because the package will not be installed.

There is a copy of this code, including some Visual Studio Code settings and a .gitignore file, in this GitHub repository: https://github.com/IanHopkinson/mypackage

Setup.py and setup.cfg

But why should we do it this way? It is worth stepping back a bit and defining a couple of terms:

module – a module is a file containing Python code, such as functions and classes.

package – a package is a collection of modules intended to be installed and used together.

Basically this blog post is all about making sure “import” and “from … import …” works in a set of distinct use cases. Possibilities include:

  1. Coding to solve an immediate problem with no use outside of the current directory anticipated – in this case we don’t need to worry about setup.cfg, setup.py or even __init__.py.
  2. Coding to solve an immediate problem with the potential to spread code over several files and directories – we should now make sure we put an empty __init__.py in each directory containing module files.
  3. Coding to provide a local library to reuse in other projects locally – this will require us to run “python setup.py develop” or, better, “pip install -e .”
  4. Coding to provide a library which will be used on other systems you control – again using “pip install -e .”
  5. Coding to provide a library which will be published publicly – here we will additionally need packaging and publishing tools such as build and twine.

I am primarily interested in cases 3 and 4, and my projects tend to be pure Python so I don’t need to worry about compiling code.

The setup.py and setup.cfg files are artefacts of the setuptools module which is designed to help with the packaging process. It is used by pip whose purpose is to install a package either locally or remotely. If we do not configure setup.py/setup.cfg correctly then pip will not work. In the past we would have written a setup.py file which contained a bunch of configuration information but now we should put that configuration information into setup.cfg which is effectively an ini file (i.e. does not need to be executed to be read). This is why we now have the minimal setup.py file.

It is worth noting that setup.cfg is an ini format file, and pyproject.toml uses TOML, a more formally specified ini-like format.
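To illustrate the point that setup.cfg can be read without being executed, here is a minimal sketch using the standard library configparser module. The setup.cfg text and the read_metadata helper are hypothetical, made up for this example.

```python
from __future__ import annotations

import configparser

# Hypothetical setup.cfg contents, used only for illustration.
EXAMPLE_SETUP_CFG = """\
[metadata]
name = mypackage
version = 0.0.1

[options.packages.find]
where = src
"""


def read_metadata(cfg_text: str) -> dict[str, str]:
    """Parse setup.cfg text and return its [metadata] section as a dict."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    return dict(parser["metadata"])


print(read_metadata(EXAMPLE_SETUP_CFG))
```

Nothing in the file is run as code: the parser only reads sections, keys and values, which is what makes this style of configuration safer for tools to inspect than a setup.py full of logic.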

What is pyproject.toml?

The pyproject.toml file was introduced in PEP-518 (2016) as a way of separating configuration of the build system from a specific, optional library (setuptools) and also enabling setuptools to install itself without already being installed. Subsequently PEP-621 (2020) introduces the idea that the pyproject.toml file be used for wider project configuration and PEP-660 (2021) proposes finally doing away with the need for setup.py for editable installation using pip.

Although it is a relatively new innovation, there are a number of projects that support the use of pyproject.toml for configuration including black, pylint and mypy. More are listed here:

https://github.com/carlosperate/awesome-pyproject

Where do tests go?

Tests go in a tests directory at the top-level of the project with an __init__.py file so they are discoverable by applications like pytest. The alternative of placing them inside the src/mypackage directory means they will get deployed into production which may not be desirable.

Why put your package code in a src/ subdirectory?

Using a src/ directory ensures that you must install a package to test it, just as your users would. It also prevents tools like pytest from accidentally importing it from the working directory rather than from the installed copy.
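The effect of the src/ layout can be sketched with a throwaway project. This is only a simulation: putting the src directory on sys.path stands in, roughly, for what an editable install arranges, and the package name demo_src_pkg and both helper functions are hypothetical.

```python
from __future__ import annotations

import importlib
import sys
import tempfile
from pathlib import Path

PACKAGE = "demo_src_pkg"  # hypothetical package name, chosen to avoid clashes


def can_import(path_entry: Path) -> bool:
    """Return True if PACKAGE is importable with path_entry at the front of sys.path."""
    sys.path.insert(0, str(path_entry))
    importlib.invalidate_caches()
    try:
        importlib.import_module(PACKAGE)
        return True
    except ModuleNotFoundError:
        return False
    finally:
        sys.path.remove(str(path_entry))
        sys.modules.pop(PACKAGE, None)  # forget the module so each check starts fresh


def demo() -> tuple[bool, bool]:
    """Build a tiny src/ layout and test importability from the root and from src/."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        package_dir = root / "src" / PACKAGE
        package_dir.mkdir(parents=True)
        (package_dir / "__init__.py").write_text("")
        # First value: project root on the path; second: src/ on the path.
        return can_import(root), can_import(root / "src")


if __name__ == "__main__":
    from_root, from_src = demo()
    print(f"importable from project root: {from_root}")
    print(f"importable once src/ is on the path: {from_src}")
```

With the package tucked inside src/, running code from the project root does not find it by accident; it only becomes importable once something, normally the installation step, makes src/ visible.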

Conclusions

I found researching this blog post a useful exercise: the initial setup of a Python project is something I rarely consider and have previously done by rote. Now I have a clear understanding of what I’m doing, and I also understand the layout of Python projects. One of my key realisations is that this is a moving target: what was standard practice a few years ago is no longer standard, and in a few years’ time things will have changed again.

Book review: The Programmer’s Brain by Felienne Hermans

I picked up The Programmer’s Brain by Felienne Hermans as a result of a thread on Twitter. I’ve been following Hermans for quite a while, and knew the areas of computer science she worked in, but my interest in The Programmer’s Brain was stimulated by a lengthy thread she posted over the Christmas break.

The book is based around the idea of the brain as having long term (LTM), short term (STM) and working memory and how these different sorts of memory come into play in programming tasks, how we can improve our memories, and how we can write code that supports our use of them. It cites a fair number of academic studies in each area it looks at.

The book is divided into four parts.

The first part covers the reading of code. We do a lot of training on how to write code but none on reading it, yet as developers we spend a lot of time reading code, either our own code from the past, the code of our colleagues or library code.

Perhaps most traumatic for me was the suggestion that I should learn syntax. Hermans suggests flash cards to learn syntax, as an aid to reading code (and writing it), highlighting that going and looking up syntax is likely to break our flow, by the time we have checked out Twitter and some pictures of kittens. Thinking about my own behaviour, this is definitely true. My first flash cards would all be around Python – set syntax, format statements, unittest boilerplate and the options for sort and sorted.

An idea I hadn’t come across before was refactoring code for readability which may be at odds with how code currently stands; you might, for example, inline functions to remove the need to go look them up and potentially lose your place in code. Or replace lambdas, list comprehensions or ternary operators – all of which take a bit more effort to parse – with their more verbose, conventional alternatives.

Two things aid reading code. The first is "chunking": experts in a field, like chess or programming, don’t remember every detail but they know the rules of possibility, so they can break up a program or a chess position into larger pieces (or chunks). They thus have better recall than novices.

The second aid to reading code is beacons: variable names and comments that hint at the higher purpose of code, to enable you to recall the right chunks. That’s to say, if you are implementing code that uses a binary tree you use the conventional names of root, branch, node, left and right rather than trying to be individualistic.

I suspect a lot of programmers, like me, will look at the rote learning exercises that Hermans proposes and immediately start to think about how to automate them! I think there is scope for IDE extensions that allow you to set up some flashcards or little code exercises. Hermans also proposes quite a lot of printing out code and annotating it; again this is something I’d quite like IDE support for.

The second part is on understanding code more deeply, how it works. I was interested to learn that our natural language abilities are a better predictor of how good we are at comprehending what code does, than our mathematical abilities. In terms of understanding code, Hermans talks about marking up listings of code to highlight the occurrence of functions and variables. And, furthermore, to label variables by role following the work of Sajaniemi that is to say into the categories of fixed value, stepper, flag, walker (like a stepper), most recent holder, most wanted holder, gatherer, container, follower, organiser, and temporary. The co-occurrence of these roles provides strong clues as to what code does – in the same manner as design patterns. If we spot a design pattern we can access our long term memory as to what a design pattern does.

Following on from the idea of labelling roles of variables is the somewhat deprecated "Hungarian notation" proposed by Simonyi. This is where you include some type or role information in a variable name such as "strMyName" or "lb_textbox". Simonyi’s original proposal was to name variables with their roles, rather than just their types, which is rather less useful in strongly typed languages and modern IDEs with syntax highlighting.

The third part is on writing code, starting with the importance of naming things. The key here is consistency in naming (i.e. stick with either snake case or camel case, don’t mix), and agreeing a "name mould" – a pattern for compiling parts of a name. Martin Fowler’s "code smells" are also covered in this section, highlighting how they interact with the model and how bad code smells prevent us accessing our long term memories. 

The final part is on collaborating on code, including the developer’s great bugbear "the interruption". It turns out this annoyance is well-founded, with research showing that an interruption typically requires 15-20 minutes for recovery. I was also interested to see that we are not very good at multi-tasking, although we might think we are.

Also in this part is a discussion of the cognitive dimensions of code bases (CDCB), these are ideas like the error proneness of code, how easy it is to modify, how easy it is to test in parts applied at the level of an application or library. There is an implication here that the language you use to build a library may change over the course of time, perhaps starting with Python when you are roughing things out quickly, adding in type hinting when the library is more mature and shifting to Scala or Java when the design is stable and better performance is needed.

Finally, there is a small piece on onboarding new developers to a project, here the ideas of cognitive load repeat. Often when we are onboarding a new developer we show them the code, introduce a load of people, draw diagrams and so forth – all very fast. Under these circumstances our ideas about cognitive load tell us anyone will be overwhelmed.

I enjoyed this book, it feels like a guide to getting better at doing something I spend a lot of my time on. It is an area, learning in the field of programming, that I have not seen written about elsewhere.

Hopefully this book will change the way I work a bit, I’ll try to learn more syntax, I’ll not worry about reusing the same variable names, or even using Hungarian notation. I’ll try to remember the roles of variables. And I’ll try Hedy out with my son, Hedy is the teaching language Hermans wrote while also writing this book.

Book review: Natives by Akala

A return to the Black Lives Matter theme with Natives by Akala. Natives is an autobiography which illustrates many of the points made in Why I’m no longer talking to white people about race by Reni Eddo-Lodge and Black and British by David Olusoga. Akala highlights that his working class origins are as much an issue as his race.

Akala is a rapper, poet, journalist, songwriter, author and activist – see his Wikipedia page. I don’t know what the etiquette is for using someone’s "birth name" when they publish under a pen name. Although I had not heard of Akala previously, I am familiar with the work of his older sister Ms Dynamite.

Akala has a white Scottish-German mother and a black Jamaican father. He grew up in Camden in the late Eighties. As he points out, this was when some of the overt racism in Britain, which his father’s generation had experienced, had started to recede. He went to Jamaica once as a child but subsequently has visited many times. Alongside Black America, Jamaica and Shakespeare are his major cultural influences. He also visited the family in the Outer Hebrides, finding Scotland less racist than England.

He clearly remembers the occasion on which he realised that his mother was white, talking about coming home from school having been racially abused at the age of five by another child. For racists there is no mixed-race, no being a little bit Black – for them it is all or nothing. This is reflected in the South African apartheid era laws. So although Akala is mixed-race this is pretty much meaningless since he is considered Black by the white world. Interestingly there are gradations in the Black community where in the Caribbean the paler skinned are seen as a higher social class (I think the same may be true in India), and in South Africa being successful is "acting the white man". At secondary school a teacher once stated to him in an argument that "The Ku Klux Klan stopped crime by killing black people" – this incident gets a whole chapter, and you perhaps won’t be surprised that there were no adverse consequences for the teacher.

As a child Akala was academically gifted, going to various extra classes and a pan-African school at the weekend. This was a result of his mother’s drive but does not seem to have been uncommon for Black families.

I think the thing that really hit me was that when my (white, middle-class) son, aged 9, demonstrates his academic ability we get an email from his teacher praising him. When Akala achieved academically at school he was criticised (and was actually in a special needs class at one point). A recent incident with a friend of ours suggests this attitude for children who are not white has not completely gone from the teaching profession.

Despite these academic talents he still fell into something of the gang culture for a period; as he describes it, he simply snapped out of it at the age of about 25 – something he says is typical. His less fortunate cohort were either imprisoned or killed by this point. This is an odd juxtaposition: someone who has friends who are classical music composers and hospital consultants, but at the same time knows people who are in prison or have been killed in street violence.

Why were Akala and his cohort susceptible to gang culture? He sees it as a working class problem, rather than a race problem – citing the high levels of gang violence elsewhere in the UK where the black population is small. A second factor is the utter distrust of the police in the black community, driven by years of prejudice. They simply don’t see the police as there for them (with pretty good reason).

You can see this happening today in the UK. There is a steady stream of stories in the press of successful black people stopped in their cars (car not registered here, this car looks too expensive for you to own it), and stopped in the street (crime by a man matching your description). As a middle aged white man I don’t get stopped by the police because I don’t look right.

Tony Blair was happy to talk about "black on black" violence, although he would never describe violence in Northern Ireland or in Glasgow or Newcastle as white on white violence. In fact I was surprised to learn that violence in Glasgow is a bigger problem than in London but the media like to report the violence in London and imply it is about the black population. The Labour party are happy to talk about the difficulties of "white working class boys", ignoring the fact that this is largely down to class not colour.

Akala talks a bit about South Africa and Cuba, it’s interesting the emphasis that he puts on the role of Cuba in ending apartheid with their military support against the South African regime in neighbouring countries. Overall his view of Cuba is more positive than mine. I think I have been corrupted by 50 years of anti-Castro propaganda. On Mandela and the ANC he is not quite so positive as your average middle-aged white man.

I found Natives a useful complement to Black and British by David Olusoga and Why I’m no longer talking to white people about race by Reni Eddo-Lodge because it talks of the individual impacts of what these other books described in a more abstract way.

Review of the year: 2021

The year started much in the manner of 2020 with Thomas, now aged 9, going back to school for a single day before returning to home schooling because of the covid pandemic. Despite this he’s doing well, surprising his RE teacher by explaining what omnipotent and omniscient meant! In the Autumn term his class had a covid outbreak which he managed to dodge despite sharing a table with three of the infected. Sharon and I have both been vaccinated three times.

We managed to go on holiday to Anglesey in the hot week at the beginning of the summer; much time was spent on the beach and we all got a bit sunburnt. We also travelled down to Dorset to see my mum in October – the first time we have seen her in over two years. We stopped off at Stonehenge on the way. It is the first time I remember visiting Stonehenge, although I probably did as a child since I grew up only an hour’s drive away. Things have changed at Stonehenge since I was young: you used to approach by parking at a visitor centre and crossing under a main road, but now the road has been closed and grassed over and the visitor centre has been moved away, so you approach by a half hour walk which reflects the approach our prehistoric ancestors are thought to have made.

stonehenge-2

 

stonehenge-1

Overall we rather enjoyed the experience, it felt like a bit of a pilgrimage and once we got to the stones it didn’t feel too busy.

In domestic news we now have two new cats, Lily and Michelle – adopted from Warrington Animal Welfare which incidentally has a fine view of the Warrington Transporter Bridge. We adopted our previous cats, Bill and Ted, when we moved to Chester in 2004; Ted died a few years ago and Bill disappeared, presumed dead, in April. Lily (on the arm of the chair below) is three, and her daughter Michelle is 8 months old – it’s a bit of a novelty having young, playful cats again. Unlike Bill and Ted, Lily and Michelle actually use the cat beds and cat tree we have generously provided, and they have not tried to dig their way out of the lounge where they are currently largely confined (looking at no cat in particular, Ted).

cats

For a while over the summer and autumn we had a trail camera set up in the back garden. This followed on from taking part in a hedgehog study run by Chester Zoo – they loaned us a trail camera to look for hedgehogs. None showed up until after we had returned the camera at the end of the study and bought our own! The highlights of our trail camera experiments were the evening a fox cub poked a hedgehog with their nose, and discovered it was prickly, and the tawny owl – an unexpected visitor for a suburban garden. We had quite a few foxes, in what looked like a family with three or four cubs – sometimes even showing up in the day time. And we have video proof that "yes, that is fox poo"!

IM_01846

IM_09904

I started and finished the year with books where the media was the message with History of Britain in Maps by Philip Parker to start, and Index, A history of the by Dennis Duncan which was about indexes specifically but also something of the history of the book. I have learned that indexes have been used as both satire and fiction.

I read a number of guitar related books this year, Guitar Method – Music Theory by Tom Kolb, How to Write Songs on Guitar by Rikky Rooksby, Guitar Pedals by Rob Thorpe, Riffs: How to Create and Play Great Riffs by Rikky Rooksby, Guitar Looping – The Creative Guide by Kristof Neyens and related to this Audio Production Basics with Ableton Live by Eric Kuehnl. My favourite of these is How to Write Songs on Guitar by Rikky Rooksby which is a bit of a "bigger picture" book, and rather more book-like than the others. I still play guitar badly but I treated myself to a new guitar (a Gretsch G2622 Streamliner DC in Ocean Turquoise, see below), it is a modest step up price-wise on my Squier Stratocaster and I think this shows.

guitar

On the more technical side I read Exercises in Programming Style by Cristina Videira Lopes which I really enjoyed, it shows the same program written in 33 different styles, all in the Python programming language. I also read Data Pipelines with Apache Airflow by Bas P Harenslak and Julian R De Ruiter, as well as writing a blog post on Python Documentation with Sphinx.

My history reading tended more towards the prehistoric with Hidden Histories by Mary-Ann Ochota which talks mainly about Prehistoric British landscapes, The Goddess & The Bull by Michael Balter – about Çatalhöyük one of the oldest settlements in the world, and Ancestors by Professor Alice Roberts – about prehistoric British burials.

On the Black Lives Matter front I read Why I’m no longer talking to white people about race by Reni Eddo-Lodge, Precolonial Black Africa by Cheikh Anta Diop, and Empireland by Sathnam Sanghera.

There were a couple of natural history books, Entangled Life by Merlin Sheldrake about fungus and Much Ado About Mothing by James Lowen, about moths. Sort of related since it pertains to the biological sciences was The Code Breaker by Walter Isaacson which is a biography of Jennifer Doudna, and the CRISPR gene-editing technology for which she won a Nobel Prize. And finally, Eye of the Beholder by Laura J. Snyder which is a joint biography of Johannes Vermeer and Antonie van Leeuwenhoek who were both born in Delft in 1632.

The year ends largely as it started with covid once again running wild in the UK, this time with the Omicron variant and we are unsure whether Thomas will return to school at the beginning of the year. Prospects are a little better because we now have a vaccine, and this seems effective against all variants although Thomas, as a 9 year old, does not receive any covid vaccinations.

Book review: Index, A history of the by Dennis Duncan

I came to Index, A History of the by Dennis Duncan via a review in New Scientist. It is broadly a history of the book centred on indexes.

Duncan starts by talking about alphabetical order, and how it first came about – people have been writing their ABCs for approaching four millennia. The first catalogues or subject indexes date to the 3rd century BCE by Callimachus in the Great Library of Alexandria. The Greeks were more keen on alphabetisation than the ancient Romans. At this time writing was in scrolls, so there were no page numbers – making indexing somewhat difficult. An interesting language aside, the Greek "sittybos" was a parchment tag used to indicate the contents of a scroll from which we get the word syllabus, index is the Latin word for the same thing.

The next phase of evolution of the index was driven by the Church during the Middle Ages. Monasteries and nunneries valued reading, and the pope decreed that cathedrals should teach in 1079 which led to the creation of universities. The codex (essentially a book) had displaced the scroll as the primary format for writing by 600AD. Reading and teaching, and preaching, led to a need to find specific parts of large volumes of text hence the index.

The Bible was divided into chapters around 1200, verses were added in 1550. This is important because in the age of the handwritten manuscript pagination is variable – chapters and verses are a substitute for page numbering. Distinctios were created which drew together Bible references to make a theme for a sermon as well as more complete subject indexes, and word indexes.

A word index, or concordance, lists every occurrence of each word in a document. Here we are seeing the struggle to find the right size for an index: a concordance is too big; a table of contents, a sort of index, is not big enough. Hugh at the Dominican friary of St Jacques in Paris produced the first concordance of the Latin Bible in around 1230. The subject index is in between the concordance and table of contents in size but finding the right size is a job for skilled humans to this day.

At the same time as the Bible was being indexed and concordance-d Robert Grosseteste was creating a great index which spanned multiple books. It is always a bit difficult to determine whether historical figures have been enormously before their time, or whether an author is casting that historical figure with the present very much in mind. This struck most firmly with Grosseteste’s great index which looks to us very much like Google’s index of the internet.

Although there were a couple of experiments with page numbers on manuscript pages prior to the invention of the printing press in around 1440, it was a while before they were commonplace. Page numbers are a bit awkward to print because they fall outside the main body of the text, so there was some experimentation with using the folio marks used by the printer to compile a book properly. Sometimes in the 15th century readers were instructed to write the page numbers into the index themselves!

A recurring theme is whether reading an index is "cheating", whether it takes away from the activity of reading a book fully. Duncan also cites Plato’s Phaedrus, where Socrates argues that speaking is superior to writing/reading. We see a similar argument today as to whether Google has replaced our ability to read. Thinking of my own reading of non-fiction, I don’t make much use of indexes but I do have an Evernote of books like this with page number references – as you can see here – I use them when writing these reviews, so in a sense they are my own index.

In the 17th century the index as satire was invented; it was a time when political pamphlets were all the rage, and a format was found for the satirical index. I was entertained by the story of William Bromley, and the election to Speaker of the House in 1705. Just prior to the election his enemies published an index of his travelogue Remarks in the Grand Tour which cast him in a poor light, highlighting errors of fact, statements of the obvious and hints of popery. He subsequently lost the election.

A genre I’d never seen before is the fictional index, that is a work of fiction which features a fictional index or even a work that is entirely a fictional index such as Nabokov’s Pale Fire or The Index by JG Ballard. Erasmus started this in 1532. Although there were experiments with indexing fiction, ultimately these never really took off.

Universal indexes in the manner of Grosseteste’s made something of a return in the 19th century, with Jacques-Paul Migne’s collected works of the Church Fathers. Later in the century there was an abortive attempt to index everything stemming from J. Ashton Cross’s presentation to the Conference of Librarians in 1877. A longer lasting effort was William Poole’s An Alphabetical Index to Subjects Treated in the Reviews and Other Periodicals; perhaps this worked because its scope was smaller.

The final chapter covers the impact of computers on indexes. So far computers have been dumb companions rather than creators in making indexes. They are able to generate concordances very quickly but even the best software struggles to identify appropriate subjects: what to index and what not to index. The problem is that computerised search provides an adequate index, so publishers are less inclined to spend money on an index – which is a separate, specialised activity to the authoring of a book.

I was worried that a history of the index would be a bit dull but I really enjoyed this book.