Python Documentation with Sphinx

I’ve been working on a proof of concept project at work, and the time has come to convert it into a production system. One of the things it was lacking was documentation, principally for the developers who would continue work on it. For software projects there is a solution to this type of problem: automated documentation systems which take the structure of the code and the comments in it (written in a particular way) and generate human readable documentation from it – typically in the form of webpages.

For Python the “go to” tool in this domain is Sphinx.

I used Sphinx a few years ago, and although I got to where I wanted in terms of the documentation it felt like a painful process. This time around I progressed much more quickly and was happier with the results. This blog post is an attempt to summarise what I did for the benefit of others (including future me). Slightly perversely, although I use a Windows 10 laptop, I use Git Bash as my command prompt but I believe everything here will apply regardless of environment.

There are any number of Sphinx guides and tutorials around. I used this one by Sam Nicholls as a basis, supplemented with a lot of Googling for answers to more esoteric questions. My aim here is to introduce some pragmatic solutions to features I wanted, and to clarify some things that might seem odd if you are holding the wrong concept of how Sphinx works in your head.

I was working on a pre-existing project. To make all of the following work I ran “pip install …” for the following libraries: sphinx, sphinx-rtd-theme, sphinx-autodoc-typehints, and m2r2. In real life these additional libraries were added progressively. sphinx-rtd-theme gives me the popular “Readthedocs” theme; Readthedocs is a site that publishes documentation, and the linked example shows what can be achieved with Sphinx. sphinx-autodoc-typehints pulls in type-hints from the code (I talked about these in another blog post), and m2r2 allows the import of Markdown (md) format files, since Sphinx uses reStructuredText (rst) format by default. These are both simple formats that are designed to translate easily into HTML, which is a pain to edit manually.

With these preliminaries done the next step is to create a “docs” subdirectory in the top level of your repository and run the “sphinx-quickstart” script there from the command line. This will ask you a bunch of questions; you can usually accept the default or provide an obvious answer. The only exception to this, to my mind, is when asked “Separate source and build directories”, where you should answer “yes”. When this process finishes sphinx-quickstart will have generated a couple of directories beneath “docs”: “source” and “build”, along with a Makefile and a make.bat file in “docs” itself. The build directory is empty; the source directory contains a conf.py file, which holds all the configuration information you just provided, and an index.rst file. I show the full directory structure of the repository further down this post.

I made minor changes to conf.py, switching the theme with html_theme = 'sphinx_rtd_theme', and adding the extensions I’m using:

extensions = [
    'sphinx.ext.autodoc',
    'sphinx_autodoc_typehints',
    'm2r2',
]

In the past I added these lines to conf.py but as of 2022-12-26 this seems not to be necessary:

import os
import sys
# conf.py lives in docs/source, so the repository root is two levels up
sys.path.insert(0, os.path.abspath('../..'))

This allows Sphinx to “see” the rest of your repository from the docs directory.
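
If the path tweak does turn out to be needed, one way to avoid depending on the working directory is to anchor it to the location of conf.py itself. The following is a minimal sketch, assuming the layout used in this post (conf.py stored in docs/source, so the repository root is two directories above it). The helper names repo_root_from_conf and add_repo_to_path are hypothetical and not part of Sphinx.

```python
import sys
from pathlib import Path


def repo_root_from_conf(conf_file: str) -> Path:
    """Return the repository root for a conf.py stored at <root>/docs/source/conf.py.

    Hypothetical helper: the "two levels up" rule is an assumption about the layout.
    """
    # parents[0] is docs/source, parents[1] is docs, parents[2] is the repository root
    return Path(conf_file).resolve().parents[2]


def add_repo_to_path(conf_file: str) -> str:
    """Put the repository root at the front of sys.path (only once) and return it."""
    root = str(repo_root_from_conf(conf_file))
    if root not in sys.path:
        sys.path.insert(0, root)
    return root
```

Inside conf.py this would be called as add_repo_to_path(__file__).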

The documentation can now be built using the “make html” command but it will be a bit dull.

To generate the documentation from code, a command like “sphinx-apidoc -o source/ ../project_code”, run from the docs directory, will generate .rst files in the source directory which reflect the code you have. sphinx-apidoc uses the presence of the __init__.py file to discover which directories are packages, and the code itself is imported later, when the documentation is built. It is happy to treat subdirectories of the main module as submodules. These will go into files of the form module.submodule.rst.
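
The role of __init__.py in this discovery can be illustrated with a few lines of Python. This is only a rough approximation for building intuition, not sphinx-apidoc's actual implementation, and the function name expected_rst_names is hypothetical. It walks a package directory and lists the module.submodule.rst names you would expect to see, skipping any directory that lacks an __init__.py file.

```python
from pathlib import Path
from typing import List


def expected_rst_names(package_dir: str) -> List[str]:
    """List the module.submodule.rst names implied by the __init__.py files."""
    names: List[str] = []

    def walk(directory: Path, dotted_name: str) -> None:
        # A directory without __init__.py is not treated as a package,
        # and nothing beneath it is visited.
        if not (directory / "__init__.py").is_file():
            return
        names.append(dotted_name + ".rst")
        for child in sorted(p for p in directory.iterdir() if p.is_dir()):
            walk(child, dotted_name + "." + child.name)

    root = Path(package_dir)
    walk(root, root.name)
    return names
```

The real tool also writes a modules.rst file which ties the per-package files together.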

The rst files pull in information from the docstrings in your code files (those comments enclosed in triple double-quotes, “””I’m a docstring”””). A module or submodule will get the comments from the __init__.py file as an overview, then for each code file the comments at the top of the file are included. Finally, each function gets an entry based on its definition and some specially formatted documentation comments. If you use type-hinting, the sphinx-autodoc-typehints library will include that information in the documentation. The following fragment shows most of the different types of annotation I am using in docstrings.

import os
from typing import Optional, Union


def initialise_logger(output_file: Union[str, bytes, os.PathLike], mode: Optional[str] = "both") -> None:
    """
    Set up logging to console and file simultaneously. The process is described here:
    Logging to Console and File In Python

    :param output_file: log file to use. Frequently we set this to:
    .. highlight:: python
    .. code-block:: python

            logname = __file__.replace("./", "").replace(".py", "")
            os.path.join("logs", "{}.log".format(logname))

    :param mode: `both` or `file only` selects whether output is sent to file and console, or file only

    :return: No return value
    """
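
To see the raw material that ends up in the documentation, the docstring and type hints of a function can be inspected with the standard library. This is a sketch for exploration only and makes no claim about how autodoc or sphinx-autodoc-typehints work internally. The function initialise_logger_stub is a cut-down, hypothetical stand-in for the fragment above.

```python
import inspect
import os
from typing import Optional, Union, get_type_hints


def initialise_logger_stub(output_file: Union[str, bytes, os.PathLike],
                           mode: Optional[str] = "both") -> None:
    """
    Set up logging to console and file.

    :param output_file: log file to use
    :param mode: `both` or `file only`
    :return: No return value
    """


# getdoc() strips the common indentation, much as a documentation tool would
docstring = inspect.getdoc(initialise_logger_stub)

# get_type_hints() resolves the annotations into real typing objects
hints = get_type_hints(initialise_logger_stub)

# Field-list lines such as ":param mode: ..." are what the rst markup is built from
param_lines = [line for line in docstring.splitlines() if line.startswith(":param")]
```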

My main complaint regarding the formatting of these docstrings is that reStructuredText (and I suspect all flavours of Markdown) is very sensitive to whitespace in a manner I don’t really understand. Sphinx can support other flavours of docstring but I quite like this default. The docstring above, when it is rendered, looks like this:

In common with many developers my first level of documentation is a set of Markdown files in the top level of my repository. It is possible to include these in the Sphinx documentation with a little work. Two issues need to be addressed. Firstly, such files are commonly written in Markdown rather than reStructuredText; this can be fixed by using the m2r2 library. Secondly, the top level of a repository is outside the Sphinx source tree, so you need to put rst files in the source directory which include the Markdown files from the root of the repository. For the CONTRIBUTIONS.md file the contributions.rst file looks like this:

.. mdinclude:: ../../CONTRIBUTIONS.md
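
With several Markdown files to pull in, the one-line stubs could be generated rather than typed. The sketch below assumes the layout described in this post (Markdown files at the repository root, stubs in docs/source, lower-cased stub names). The function write_mdinclude_stubs is hypothetical and is not something m2r2 provides.

```python
import os
from pathlib import Path
from typing import Iterable, List


def write_mdinclude_stubs(repo_root: str, source_dir: str,
                          markdown_names: Iterable[str]) -> List[Path]:
    """Write one <name>.rst stub per Markdown file, each holding an mdinclude line."""
    written = []
    for name in markdown_names:
        stub = Path(source_dir) / (Path(name).stem.lower() + ".rst")
        # Path from the stub's directory back to the Markdown file, with forward slashes
        relative = Path(os.path.relpath(Path(repo_root) / name, start=source_dir)).as_posix()
        stub.write_text(".. mdinclude:: {}\n".format(relative), encoding="utf-8")
        written.append(stub)
    return written
```

Called with the repository root, docs/source and a list such as ["OVERVIEW.md", "CONTRIBUTIONS.md"], this writes overview.rst and contributions.rst.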

Putting this all together, the (edited) structure for my project looks like the following. I’ve included the top level of the repository, which contains the Markdown files; the docs directory, where all the Sphinx material lives; and stubs for the directories containing the module code, with their __init__.py files.

.
├── CONTRIBUTIONS.md
├── INSTALLATION.md
├── OVERVIEW.md
├── USAGE.md
├── andromeda_dq
│   ├── __init__.py
│   ├── scripts
│   │   ├── __init__.py
│   ├── tests
│   │   ├── __init__.py
├── docs
│   ├── Makefile
│   ├── make.bat
│   └── source
│       ├── _static
│       ├── _templates
│       ├── andromeda_dq.rst
│       ├── andromeda_dq.scripts.rst
│       ├── andromeda_dq.tests.rst
│       ├── conf.py
│       ├── contributions.rst
│       ├── index.rst
│       ├── installation.rst
│       ├── modules.rst
│       ├── overview.rst
│       └── usage.rst
├── setup.py

The index.rst file pulls together documentation in other rst files; these are referenced by their name excluding the rst extension (so myproject pulls in a link to myproject.rst). By default the index file does not pull in all of the rst files generated by apidoc, so these might need to be added (specifically the modules.rst file). The index.rst file for my project looks like this; all I have done manually to this file is add overview, installation, usage, contributions and modules to the “toctree” section. Note that the indentation for these entries needs to be the same as for the preceding :caption: option.

.. Andromeda Data Quality documentation master file, created by
   sphinx-quickstart on Wed Sep 15 08:33:59 2021.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

Andromeda Data Quality
======================

Documentation built using Sphinx. To re-build run `make html` in the `docs`
directory of the project.

The OVERVIEW.md, INSTALLATION.md, USAGE.md, and CONTRIBUTIONS.md files are imported 
from the top level of the repo.

Most documentation is from type-hinting and docstrings in source files.

.. toctree::
   :maxdepth: 3
   :caption: Contents:

   overview
   installation
   usage
   contributions
   modules
   


Indices and tables
==================

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`
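
Since the toctree entries are just file names without the rst extension, a small script can check that each one has a matching file before building. This is a deliberately simple sketch: the parser only understands the plain layout shown above (indented entries, options starting with a colon) and both function names are hypothetical.

```python
from pathlib import Path
from typing import List


def toctree_entries(index_text: str) -> List[str]:
    """Return the document names listed under any toctree directive in the text."""
    entries = []
    in_toctree = False
    for line in index_text.splitlines():
        stripped = line.strip()
        if stripped.startswith(".. toctree::"):
            in_toctree = True
        elif in_toctree and stripped:
            if not line.startswith((" ", "\t")):
                in_toctree = False          # an unindented line ends the directive
            elif not stripped.startswith(":"):
                entries.append(stripped)    # skip options such as :maxdepth:
    return entries


def missing_documents(source_dir: str) -> List[str]:
    """List toctree entries in index.rst that have no matching .rst file."""
    source = Path(source_dir)
    text = (source / "index.rst").read_text(encoding="utf-8")
    return [name for name in toctree_entries(text)
            if not (source / (name + ".rst")).is_file()]
```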

The (edited) HTML index page for the documentation looks like this:

For some reason Sphinx puts the text from the __init__.py files, which it describes as “Module contents”, at the bottom of the relevant package description. This can be fixed by manually moving the “Module contents” section to the top of the relevant package rst file.
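
sphinx-apidoc also has a --module-first option (short form -M) which puts the module documentation before the submodule documentation, and which may save the manual edit. The sketch below only builds the command line. The function name and the example paths are hypothetical, and behaviour may vary between Sphinx versions.

```python
from typing import List


def apidoc_command(output_dir: str, package_dir: str,
                   module_first: bool = True, force: bool = False) -> List[str]:
    """Build a sphinx-apidoc command line as a list suitable for subprocess.run."""
    command = ["sphinx-apidoc"]
    if module_first:
        command.append("--module-first")   # module docs before submodule docs
    if force:
        command.append("--force")          # overwrite rst files generated earlier
    command += ["-o", output_dir, package_dir]
    return command
```

From the docs directory this could be run with subprocess.run(apidoc_command("source/", "../project_code", force=True), check=True). Note that --force overwrites previously generated rst files, including any manual edits made to them.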

There is a little bit of support for Sphinx in Visual Studio Code; I’ve installed the reStructuredText Syntax highlighting extension and the Python Sphinx Highlighter extension. The one thing I haven’t managed to do is automate the process of running “make html”, either on commit of new code or when code is pushed to a remote. I suspect this will be one of the drawbacks in using Sphinx. I’m getting a bit better at adding type-hinting and docstrings as I code now.
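
One possible route to automation is a Git hook written in Python. The following is an untested sketch of the idea rather than a working setup for this project: it assumes the docs/source and docs/build/html layout, and the function names are hypothetical. It uses “python -m sphinx”, which is the module form of the sphinx-build command.

```python
import subprocess
import sys
from typing import List


def sphinx_html_command(source_dir: str = "docs/source",
                        output_dir: str = "docs/build/html") -> List[str]:
    """Build the command that renders the HTML documentation."""
    # sys.executable keeps the build inside the currently active Python environment
    return [sys.executable, "-m", "sphinx", "-b", "html", source_dir, output_dir]


def build_docs() -> int:
    """Run the build and return its exit code (non-zero means the build failed)."""
    return subprocess.run(sphinx_html_command()).returncode
```

Saved as an executable .git/hooks/pre-push script that ends with sys.exit(build_docs()), a failed build would stop the push, because Git aborts a push when that hook exits with a non-zero status.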

If you have any comments, or suggestions, or find any errors in this blog post feel free to contact me on twitter (@ianhopkinson_).

Book review: Precolonial Black Africa by Cheikh Anta Diop

My next book follows on from reading Black and British by David Olusoga. It is Precolonial Black Africa by Cheikh Anta Diop. I was looking for an overview of African history from an African perspective. Diop’s relatively short book focuses on West Africa. It turns out he is a very interesting figure in himself, building several political parties, doing research in history as well as physics and chemistry, and having a university named after him. Some of his ideas on African history are controversial (you can see the Wikipedia page relating to him here).

The core of the controversy is twofold: one part is his claim that ancient Egyptians were black, and the second is that there is a historical unity in West African civilisation, with migration from the east of Africa populating the continent. The basis for this thesis relies quite heavily on similarities in totemic names across the region as well as cultural similarities. These days there is some support from DNA evidence for the migration of populations out of the Nile basin to West Africa.

Most of the discussion in this book is oriented around the area of West Africa where Diop grew up, in Senegal, with some mentions of Egypt and Sudan. Diop draws parallels in the internal organisations across the empires of Ghana, Mossi, Mali and Songhai. The Empire of Ghana stretched beyond the boundaries of the modern country, and stood for 1250 years. Mossi was to the east and south, in the area of modern Burkina Faso; Mali and Songhai were a little to the north, encompassing the modern Timbuktu. Looking at Wikipedia these empires appear to have overlapped to a degree both in time and space. Precolonial Black Africa covers the period from about 300AD to the 17th century although it does not make much reference to dates.

There is almost no mention even of the area of Nigeria, a little to the east, or Southern Africa. I was nearly half way through the book before I realised that Sudan referred to two different places: Sudan the modern state in North East Africa, and the Sudan Empire which stretches across the southern margin of the Sahara in the West of Africa.

The book starts with a description of the caste system, emphasising the two-way nature of the system and contrasting it to a degree with the caste system in India.

Precolonial Black Africa contrasts Africa with Europe. In the period covered by the book Europe was based on city-states which evolved into feudal structures, with Roman geographical divisions, where defence from marauders by the lord in the castle was important. Land ownership was the core of this political system, whereas Africa evolved more along Egyptian lines which saw countries divided into regions with regional governance and no tradition of land ownership.

These empires were led by kings with a small cabinet of advisors who had both a regional responsibility and a specialism (like a minister for finance, or the army). Although not republics, nor democratic in the modern Western sense, Diop claims that these governments were more representative than their Western European equivalents of the time.

The technological expertise of the ancient Romans and Greeks was carried through the Middle Ages by the Arab world. It is no coincidence that Spain was once a technology leader, given the Muslim rule of Spain. Islamization of West Africa is a recurring theme of the book, and Arab writers feature regularly in the lists of sources for the early history of Africa. Islam has been important in education through to the present day, and this is in part responsible for slowed technological progress in the region. Islamic schools did not place a great emphasis on what they consider pagan history, nor so much on modern science.

Precolonial Black Africa covers technology relatively briefly, mentioning architecture and the Great Zimbabwe – a significant stone-built city in present day Zimbabwe whose early excavation was plagued by the then Rhodesian government’s view that it could not have been constructed by Black Africans. Coins and metalworking are also mentioned – West Africa made relatively little use of the familiar coinage of Europe. Gold dust was used as currency, as were cowrie shells. The Benin Bronzes dating from the 13th century demonstrate there was significant metalworking skill in West Africa (the Bronzes are currently in the news as the UK refuses to return them to Benin). Little technology and writing seems to have survived from precolonial times. I suspect this is a combination of the environment, which is not conducive to the preservation of paper (or even metal), successive colonisations by Islam and then Europeans, and relatively little archaeological activity. Trade seemed quite significant across West Africa, even in the absence of conventional coinage.

The interesting thing reading this book is the contrast with flaws that Western history has had in the past, being focussed on great men, the idea of the natural superiority of the white man, and leaning heavily on Classical heritage for legitimacy. I suspect these points of view are generally not prevalent in modern academic history but they certainly hold sway with the current UK government and a coterie of right-wing historians. To a degree Diop suffers the same types of prejudices but from a different perspective – the superiority of the Black African. My view of African history is still heavily influenced by those old Western European foundations.  

After a rocky start I came to enjoy this book. I found it alien in a couple of respects: firstly in its discussion of history from an African perspective, and also simply in that it is African history. What I know of Africa is largely through a colonial lens.

Book review: Ancestors by Professor Alice Roberts

Somewhat unintentionally my next book, Ancestors by Alice Roberts, follows on from Hidden Histories by Mary-Ann Ochota and The Goddess and The Bull by Michael Balter. Ancestors is an investigation of the transition from early Stone Age people in Britain through the Neolithic, to the Bronze Age and finally the Iron Age, through the medium of seven burials around Britain. As well as the facts of various burials Roberts talks too about changes in archaeological methodology over time.

The broad context of the book is a project on recording ancient DNA in which Roberts is involved, a project on hold due to covid. Motivation for this is that we can observe the movement of ancient peoples and relationships between people in burials using DNA. These techniques have not been applied extensively to Neolithic remains to date.

The first burial discussed is of the "Red Lady" in a cave in the Paviland Cliffs on the Gower in Wales. It dates back to the Paleolithic (old Stone Age), 34,000 years ago and is the oldest burial discovered in Britain, from a period before the last Ice Age. William Buckland was the first to scientifically describe the burial, and his descriptions reflect the opinions of the time. He sought to reconcile such burials with biblical knowledge, and social mores, initially describing the burial as of a "Red Lady" because of the decorative grave goods (and the body being caked in red ochre). It turns out the burial is actually of a man!

As far as we can tell deliberate burials by Homo sapiens date back about 100,000 years. The evidence is mixed as to whether Neanderthals practiced burials. This rubicon is seen as important since burial rites represent a move to modern human thinking which distinguishes us from other animals (so far!). I particularly enjoyed the description of the "flower people" where, in a burial in Iraq, it has never been quite clear whether Neanderthal people buried people in flowers or whether it was actually the work of gerbils that, by the way, also gnawed on the body.

Returning to the UK we meet Cheddar Man, who was buried after the last Ice Age about 10,000 years ago. Incidentally we learn how to wind up an anatomist with fake skeletons: in real life the pelvis and ribcage of a skeleton collapse because the ligaments don’t hold them together after they’ve been in the ground for a bit – fake skeletons don’t show this. At 14,700 years old other skeletons in Gough’s Cave, where Cheddar Man was found, are the earliest post-Ice Age human remains found in Britain.

Cheddar Man was from the Paleolithic or old Stone Age; the next burials discussed are from the Neolithic or New Stone Age. The defining feature of the New Stone Age is the move from hunter/gathering to agriculture and settlement. The key question is whether this change in behaviour was a transmission of ideas, or an influx of people with these new habits. This transition to agriculture 11,000 years ago is one of the central themes of The Goddess and The Bull; in Britain the transition takes place later – about 6,000 years ago.

Farming arrived in Britain with people, rather than just ideas. There’s evidence of violence in some of the burials discovered (about 10% of skulls show signs of traumatic injury) but as far as can be ascertained this was farmer/farmer violence rather than hunter-gatherer/farmer violence. It seems that hunter-gathering died out with its practitioners rather than its practitioners converting to farming. Something that I hadn’t heard of before was the idea of a "house burial" – some Neolithic burial barrows are on the site of dwellings, longhouses, which have been ritually burnt. Neolithic burial sites are often reused in the Bronze and Iron Age, perhaps to maintain contact with the land. Perhaps burial becomes more important once we start to stake a claim on particular pieces of land.

There’s a small diversion at this point to discuss Pitt Rivers, a 19th century archaeologist whose methodology was ahead of his time in the sense that he made meticulous records of what he had dug. He was born Augustus Lane-Fox but changed his surname to Pitt Rivers as a condition of receiving a substantial inheritance. He spent his later years in detailed excavation of his inherited Rushmore Estate which lies close to Salisbury and is incredibly rich in archaeology (or perhaps if you are rich, an archaeologist and inherited a large estate it turns out there is a lot of archaeology you can do).

Next we move to Bronze Age burials, where things get exciting in terms of grave goods. Starting gently with some arrows and so forth we move on to whole, upright chariots including the horses in the Iron Age!! The Bronze Age is also marked by an influx of people. I recall from my Seventies childhood the Beaker People (identified by a particular type of pottery).

At this point, in the late Iron Age we transition from prehistory to history with Roman writings on Britain. Such records need to be treated with a little care since they are often second hand and are the viewpoint of a conqueror. It is interesting to see the names of Iron Age tribes carried forward to the present day, for example in the Parisi in Northern France (turning into Paris) and Durotriges turning into Dorset.

Roberts notes at the end of the book that burial practices don’t have to be universal across a period we consider to be discrete, such as the Bronze Age. To the people living at the time they were not "Bronze Age"; they were people of a much narrower place and time. Large changes in burial practices are not necessarily indicative of religious changes – Britain shifted from burial to cremation from the end of the 19th century to the Sixties with no change in religion.

The writing of the book stretched into the covid pandemic. It is an interesting mix of topics written in an engaging style. There are a couple of places where the editing slips a bit. Overall I found it an engaging read.

Book review: The Code Breaker by Walter Isaacson

For my summer holiday reading I have The Code Breaker by Walter Isaacson; the author was recommended by a friend. It is the story of CRISPR gene editing, and Jennifer Doudna, one of the central characters in the development of this system and winner of the Nobel Prize for Chemistry in 2020 with Emmanuelle Charpentier for this work.

CRISPR is an acronym for "clustered regularly interspaced short palindromic repeats", a name derived from the DNA sequences that prompted its discovery. CRISPR are the basis of a type of immune system for bacteria against viruses. The CRISPR repeats form a fingerprint which matches the viral DNA and the associated system of enzymes allows a bacterium to snip out viral DNA which matches this sequence.

Whilst CRISPR is interesting in itself, it has applications in gene editing as a cure for disease in humans. CRISPR simply requires a short piece of RNA to match the target DNA in a gene to carry out its editing job. Short RNA sequences are easy to synthesise making CRISPR superior to earlier gene editing techniques. In addition there is potential to use CRISPR as a diagnostic tool for identifying infections such as covid and even as a cure for viral diseases. The Code Breaker does a good job of explaining CRISPR to a fair depth.

There is a section of the book on gene editing in humans and the moral issues this raises. Perhaps central to this is the story of He Jiankui, the Chinese scientist who led the work to carry out germ line edits to add a gene protective against HIV. Germ line gene edits mean editing the genes in an early stage embryo, which means that all the cells in the child it gives rise to have the edit, including reproductive cells; hence the gene edit will be passed on to descendants. This is considered more radical than somatic cell gene editing where the changes stop with the person treated. I must admit to having some sympathy for He Jiankui. Principally Western scientists had made a great show of considering the moral issues in germ line editing, eventually deciding that the time was not yet right, but coming out against a moratorium or regulation in the area. This seems an ambiguous position to me, and the associated comments that Jiankui had done his work for publicity are a bit rich from a group of scientists who have been so competitive in the research over CRISPR. Jiankui conducted his research with the approval of his local ethics board but was subsequently disavowed by the Chinese authorities and then convicted.

Coronavirus is woven through the book because the work on CRISPR is very relevant here from a scientific point of view, and the key characters including the author are involved, as we all are! As far as I can tell Doudna et al have been involved heavily in conventional covid19 testing and have done research on CRISPR-based diagnostic tests which have great potential for the future – essentially they would allow any viral illness to be definitively tested at home (rather than a sample being sent off for a PCR test) – but are not yet used in production. Similarly there is the potential for CRISPR-based vaccines but these have not yet been deployed in anger. The Pfizer and Moderna vaccines are based on RNA but use older technology.

A chunk of the book covers the patent battles over CRISPR, principally involving Doudna and her co-workers and Feng Zhang, a scientist at the Broad Institute. The core of the patent dispute is how obvious the step is from understanding the operation of the CRISPR system (which Doudna’s team demonstrated first) to applying it to human cells (which Zhang did first). I think my key learning from this part of the book is that I’m not very interested in patent battles! Tied up with the patent issue is the question of the great science prizes, which similarly give a winner-takes-all reward to a small group. The Nobel Prizes have a limit of three on the number of winners, as do more recently instituted prizes. Science simply isn’t done this way, and hasn’t been for a long time. There’s a group of at least a dozen scientists at the core of the CRISPR story and probably more; singling out a couple of people for a reward is invidious. It made me wonder whether the big science prizes are really about the prize giver rather than the winner.

The book is written in the more journalistic style that has arisen in scientific biography relatively recently; that’s to say there is a lot more incidental detail about where Isaacson met people and their demeanour than in older scientific biographies. I must admit I find this a bit grating; I’ve tended towards collective biographies recently rather than single person biographies, which have a bit of a "great man" feel to them. However, I’m starting to make my peace with this new style – it makes science feel like a more human process, and makes for a more readable book. It’s fair to say that this is in no way a "great man" biography: although Doudna and her life and personality are a recurring theme, other people get a similar treatment.

Book review: Guitar Looping – The Creative Guide by Kristof Neyens

For completeness I include my review of Guitar Looping: The Creative Guide by Kristof Neyens. This is in the same series as Guitar Pedals by Rob Thorpe. These are both quite short books but I’ve found them useful.

A looper pedal is a simple recording device which is started and stopped using a footswitch. A loop can be built up by making successive recordings, or layers, one on top of another. Typically loops are only a few bars long at most but modern looper pedals can record for tens of minutes.

I bought a looper pedal a year or so ago (reviewed here) and, to be honest, it has languished a bit on my pedalboard. I think the problem is a lack of education in the right format. Also I probably should have started with the simplest looper available: the author uses a tc electronic ditto rather than a step up (my Boss RC-3).

In common with Guitar Pedals, Guitar Looping contains lots (117) of short examples annotated in normal musical notation and guitar tab notation with accompanying audio files downloadable from the website. There is a brief text introduction to each example. I find these nice exercises in ear training, it’s good to be able to follow along with the tune.

The author is quite fond of the volume swell as part of a loop, this has got me thinking I need a volume pedal – previously I couldn’t see the point of them. This presents a problem because I’ve run out of space on my pedalboard!

Aside from the technical skill of starting and stopping loops at the appropriate point, there is also the skill of controlling the volume of your play within a layer and also getting the volume of different layers right. The loops illustrated often contain a percussive layer made by playing with strings muted, a rhythm/bass layer and a melodic layer which may be single notes or simple chords. Neyens talks about providing both harmonic space and dynamic space in layers. That’s to say there is no point in recording a layer loud and filled with sound because there is nowhere to put additional layers. This means that individual layers can sound quite simple and sparse. To get harmonic space you might play low notes with an octave pedal, on the lower three strings and melodies on the higher three strings, further up the neck.

The other useful piece of information I picked up was how to make your guitar sound like a clarinet! You pick the string 12 frets from where you are fretting – so if you are holding down the low E string on the third fret you need to pluck it at the 15th fret. Try it and see.

After reading this book I’m using my looper pedal a bit more. There are a lot of ideas in here and perhaps the most important thing is a stimulus to play around a bit – it doesn’t cost anything!