September 2021 archive

Python Documentation with Sphinx

I’ve been working on a proof of concept project at work, and the time has come to convert it into a production system. One of the things it was lacking was documentation, principally for the developers who would continue work on it. For software projects there is a solution to this type of problem: automated documentation systems which take the structure of the code and the comments in it (written in a particular way) and generate human readable documentation from it – typically in the form of webpages.

For Python the “go to” tool in this domain is Sphinx.

I used Sphinx a few years ago, and although I got to where I wanted in terms of the documentation it felt like a painful process. This time around I progressed much more quickly and was happier with the results. This blog post is an attempt to summarise what I did for the benefit of others (including future me). Slightly perversely, although I use a Windows 10 laptop, I use Git Bash as my command prompt but I believe everything here will apply regardless of environment.

There are any number of Sphinx guides and tutorials around; I used this one by Sam Nicholls as a basis, supplemented with a lot of Googling for answers to more esoteric questions. My aim here is to introduce some pragmatic solutions to features I wanted, and to clarify some things that might seem odd if you are holding the wrong concept of how Sphinx works in your head.

I was working on a pre-existing project. To make all of the following work I ran “pip install …” for the following libraries: sphinx, sphinx-rtd-theme, sphinx-autodoc-typehints, and m2r2. In real life these additional libraries were added progressively. sphinx-rtd-theme gives me the popular “Readthedocs” theme; Readthedocs is a site that publishes documentation, and the linked example shows what can be achieved with Sphinx. sphinx-autodoc-typehints pulls in type-hints from the code (I talked about these in another blog post), and m2r2 allows the import of Markdown (md) format files, since Sphinx uses reStructuredText (rst) format by default. Both are simple formats designed to translate easily into HTML, which is a pain to edit manually.

With these preliminaries done the next step is to create a “docs” subdirectory in the top level of your repository and run the “sphinx-quickstart” script from the command line inside it. This will ask you a bunch of questions; you can usually accept the default or provide an obvious answer. The only exception to this, to my mind, is when asked “Separate source and build directories”, where you should answer “yes”. When this process finishes sphinx-quickstart will have generated a couple of directories beneath “docs”: “source” and “build”. The build directory is empty; the source directory contains a conf.py file, which holds all the configuration information you just provided, and an index.rst file. A Makefile (and a make.bat for Windows) sits in the docs directory itself. I show the full directory structure of the repository further down this post.

I made minor changes to conf.py, switching the theme with html_theme = 'sphinx_rtd_theme', and adding the extensions I’m using:

extensions = [
    'sphinx.ext.autodoc',
    'sphinx_autodoc_typehints',
    'm2r2',
]

In the past I added these lines to conf.py but as of 2022-12-26 this seems not to be necessary:

import os 
import sys
sys.path.insert(0, os.path.abspath('..'))

This allows Sphinx to “see” the rest of your repository from the docs directory.
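If the path tweak does turn out to be needed, it is worth remembering that with separate source and build directories conf.py lives in docs/source, so the repository root is two levels above it. The following is a minimal sketch under that assumption; the helper names are illustrative and not part of Sphinx.

```python
import sys
from pathlib import Path


def repo_root_for(conf_file: str) -> Path:
    """Return the repository root for a conf.py stored at <repo>/docs/source/conf.py."""
    # parents[0] is docs/source, parents[1] is docs, parents[2] is the repository root.
    return Path(conf_file).resolve().parents[2]


def add_repo_to_path(conf_file: str) -> None:
    """Put the repository root at the front of sys.path if it is not already listed."""
    root = str(repo_root_for(conf_file))
    if root not in sys.path:
        sys.path.insert(0, root)
```

Anchoring the calculation on the location of conf.py, rather than on a relative path such as '..', avoids depending on which directory the build happens to be started from.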

The documentation can now be built using the “make html” command but it will be a bit dull.

To generate documentation from the code, run a command like “sphinx-apidoc -o source/ ../project_code” from the docs directory. This generates .rst files in the source directory which reflect the code you have. Sphinx uses the presence of an __init__.py file to discover which directories are packages, and it imports your code when the documentation is built. It is happy to treat subdirectories of the main module as submodules. These go into files of the form module.submodule.rst.
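To make the discovery rule concrete, here is a small sketch that mimics it: a directory counts as a package only if it contains an __init__.py, and each package maps to one rst file name. This is an illustration of the rule as described above, not sphinx-apidoc's actual implementation, and the function names are hypothetical.

```python
from pathlib import Path
from typing import List


def find_packages(package_dir: str) -> List[str]:
    """Return dotted names for package_dir and every nested package inside it."""
    root = Path(package_dir)
    found: List[str] = []

    def visit(directory: Path, dotted: str) -> None:
        if not (directory / "__init__.py").is_file():
            return  # no __init__.py: not a package, so do not descend further
        found.append(dotted)
        for child in sorted(p for p in directory.iterdir() if p.is_dir()):
            visit(child, f"{dotted}.{child.name}")

    visit(root, root.name)
    return found


def rst_names(packages: List[str]) -> List[str]:
    """Map each dotted package name to the rst file name of the form module.submodule.rst."""
    return [f"{name}.rst" for name in packages]
```

A directory without an __init__.py (a data folder, say) is skipped along with everything beneath it, which is a common reason for a subpackage to be missing from the generated documentation.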

The rst files pull in information from the docstrings in your code files (those comments enclosed in triple double-quotes: """I'm a docstring"""). A module or submodule gets the comments from the __init__.py file as an overview, then for each code file the comments at the top of the file are included. Finally, each function gets an entry based on its definition and some specially formatted documentation comments. If you use type-hinting, the sphinx-autodoc-typehints library will include that information in the documentation. The following fragment shows most of the different types of annotation I am using in docstrings.

import os
from typing import Optional, Union


def initialise_logger(output_file: Union[str, bytes, os.PathLike], mode: Optional[str] = "both") -> None:
    """
    Set up logging to console and file simultaneously. The process is described here:
    Logging to Console and File In Python

    :param output_file: log file to use. Frequently we set this to:
    .. highlight:: python
    .. code-block:: python

            logname = __file__.replace("./", "").replace(".py", "")
            os.path.join("logs", "{}.log".format(logname))

    :param mode: `both` or `file only` selects whether output is sent to file and console, or file only

    :return: No return value
    """

My main complaint regarding the formatting of these docstrings is that reStructuredText (and I suspect all flavours of Markdown) is very sensitive to whitespace in a manner I don’t really understand. Sphinx can support other flavours of docstring but I quite like this default. The docstring above, when it is rendered, looks like this:

[Image: the rendered documentation for initialise_logger]
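It can help to look at the raw material the documentation is built from. The sketch below uses only the standard library to read a function's docstring and type hints; it is a way of inspecting what is there, and makes no claim about the calls Sphinx itself uses. The shortened docstring and the describe helper are illustrative.

```python
import inspect
import os
from typing import Optional, Union, get_type_hints


def initialise_logger(
    output_file: Union[str, bytes, os.PathLike], mode: Optional[str] = "both"
) -> None:
    """
    Set up logging to console and file simultaneously.

    :param output_file: log file to use
    :param mode: `both` or `file only`
    :return: No return value
    """


def describe(func):
    """Return the cleaned-up docstring and the type hints of a function."""
    return inspect.getdoc(func), get_type_hints(func)


doc, hints = describe(initialise_logger)
print(doc)
for name, hint in hints.items():
    print(name, "->", hint)
```

inspect.getdoc strips the common indentation, which makes it easier to spot whitespace that is genuinely part of the docstring rather than an artefact of where the function sits in the file.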

In common with many developers my first level of documentation is a set of Markdown files in the top level of my repository. It is possible to include these in the Sphinx documentation with a little work. Two issues need to be addressed. First, such files are commonly written in Markdown rather than reStructuredText; this can be fixed by using the m2r2 library. Second, the top level of a repository is outside the Sphinx source tree, so you need to put rst files in the source directory which include the Markdown files from the root of the repository. For the CONTRIBUTIONS.md file the contributions.rst file looks like this:

.. mdinclude:: ../../CONTRIBUTIONS.md
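With several Markdown files the wrapper rst files are repetitive, so they could be generated. The following is a sketch only, assuming the Markdown files sit in the repository root and the wrappers are named after them in lower case; write_mdinclude_stubs is a hypothetical helper, not part of m2r2 or Sphinx.

```python
import os
from pathlib import Path
from typing import List


def write_mdinclude_stubs(repo_root: str, source_dir: str) -> List[Path]:
    """Write one <name>.rst wrapper per top-level Markdown file and return the paths written."""
    root = Path(repo_root).resolve()
    source = Path(source_dir).resolve()
    written: List[Path] = []
    for md_file in sorted(root.glob("*.md")):
        # Path from the source directory back to the Markdown file, e.g. ../../USAGE.md
        relative = Path(os.path.relpath(md_file, source)).as_posix()
        stub = source / f"{md_file.stem.lower()}.rst"
        stub.write_text(f".. mdinclude:: {relative}\n", encoding="utf-8")
        written.append(stub)
    return written
```

Called as write_mdinclude_stubs(".", "docs/source") from the repository root, this would overwrite any existing wrapper of the same name, so it is best run before hand-editing those files.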

Putting this all together, the (edited) structure for my project looks like the following. I’ve included the top level of the repository, which contains the Markdown files; the docs directory, where all the Sphinx material lives; and stubs for the directories containing the module code, with their __init__.py files.

.

├── CONTRIBUTIONS.md
├── INSTALLATION.md
├── OVERVIEW.md
├── USAGE.md
├── andromeda_dq
│   ├── __init__.py
│   ├── scripts
│   │   ├── __init__.py
│   ├── tests
│   │   ├── __init__.py
├── docs
│   ├── Makefile
│   ├── make.bat
│   └── source
│       ├── _static
│       ├── _templates
│       ├── andromeda_dq.rst
│       ├── andromeda_dq.scripts.rst
│       ├── andromeda_dq.tests.rst
│       ├── conf.py
│       ├── contributions.rst
│       ├── index.rst
│       ├── installation.rst
│       ├── modules.rst
│       ├── overview.rst
│       └── usage.rst
├── setup.py

The index.rst file pulls together documentation in other rst files, which are referenced by their name excluding the rst extension (so myproject pulls in a link to myproject.rst). By default the index file does not pull in all of the rst files generated by apidoc, so these might need to be added (specifically the modules.rst file). The index.rst file for my project looks like this; all I have done manually to this file is add overview, installation, usage, contributions and modules to the “toctree” section. Note that the indentation for these file entries needs to be the same as for the preceding :caption: option.

.. Andromeda Data Quality documentation master file, created by
   sphinx-quickstart on Wed Sep 15 08:33:59 2021.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

Andromeda Data Quality
======================

Documentation built using Sphinx. To re-build run `make html` in the `docs`
directory of the project.

The OVERVIEW.md, INSTALLATION.md, USAGE.md, and CONTRIBUTIONS.md files are imported 
from the top level of the repo.

Most documentation is from type-hinting and docstrings in source files.

.. toctree::
   :maxdepth: 3
   :caption: Contents:

   overview
   installation
   usage
   contributions
   modules
   


Indices and tables
==================

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`
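Forgetting to add a generated file such as modules.rst to the toctree is an easy mistake, so a small check can be useful. The sketch below reads the entries of the first toctree in a piece of rst text and reports which wanted names are absent. It handles only the simple layout shown above, and the function names are illustrative.

```python
from typing import Iterable, List


def toctree_entries(rst_text: str) -> List[str]:
    """Return the document names listed in the first toctree directive of rst_text."""
    entries: List[str] = []
    in_toctree = False
    for line in rst_text.splitlines():
        stripped = line.strip()
        if not in_toctree:
            in_toctree = stripped.startswith(".. toctree::")
            continue
        if line and not line[0].isspace():
            break  # an unindented line ends the directive body
        if stripped and not stripped.startswith(":"):
            entries.append(stripped)  # skip blank lines and options such as :caption:
    return entries


def missing_from_toctree(rst_text: str, wanted: Iterable[str]) -> List[str]:
    """Return the names in wanted that the toctree does not list."""
    listed = set(toctree_entries(rst_text))
    return [name for name in wanted if name not in listed]
```

For the index.rst above, missing_from_toctree(text, ["modules"]) would return an empty list once modules has been added by hand.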

The (edited) HTML index page for the documentation looks like this:

[Image: the HTML index page of the documentation]

For some reason Sphinx puts the text from the __init__.py files, which it describes as “Module contents”, at the bottom of the relevant package description. This can be fixed by manually moving the “Module contents” section to the top of the relevant package rst file.
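The manual move could be scripted. The sketch below assumes the generated file has a title underlined with “=” and that “Module contents” is the last section in the file; the function name is hypothetical. The sphinx-apidoc command also documents a --module-first (-M) option aimed at this ordering, which may be worth trying before post-processing files.

```python
def move_module_contents_first(rst_text: str) -> str:
    """Move the trailing "Module contents" section to just below the page title."""
    lines = rst_text.splitlines()
    try:
        start = lines.index("Module contents")
    except ValueError:
        return rst_text  # no such section, nothing to do
    # The page title is followed by an underline made only of "=" characters.
    title_end = next(
        (i for i, line in enumerate(lines) if line and set(line) == {"="}), None
    )
    if title_end is None or title_end >= start:
        return rst_text
    section = lines[start:]            # assumed to run to the end of the file
    middle = lines[title_end + 1:start]
    # Trim blank lines at the edges so the blocks can be joined with one blank line.
    while section and not section[-1].strip():
        section.pop()
    while middle and not middle[0].strip():
        middle.pop(0)
    while middle and not middle[-1].strip():
        middle.pop()
    result = lines[:title_end + 1] + [""] + section
    if middle:
        result += [""] + middle
    return "\n".join(result) + "\n"
```

Running sphinx-apidoc again with its overwrite option would regenerate the files and undo the move, so a script like this would need to be rerun after each regeneration.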

There is a little bit of support for Sphinx in Visual Studio Code: I’ve installed the reStructuredText Syntax highlighting extension and the Python Sphinx Highlighter extension. The one thing I haven’t managed to do is automate the process of running “make html”, either on commit of new code or when code is pushed to a remote. I suspect this will be one of the drawbacks in using Sphinx. I’m getting a bit better at adding type-hinting and docstrings as I code now.
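One possible route to automation is a small script that a Git hook or a CI job could call. The sketch below is untested against this project and assumes the docs/source and docs/build layout described earlier, with sphinx-build available on the PATH. It calls sphinx-build directly, which is roughly what “make html” does; the function names are illustrative.

```python
import subprocess
from pathlib import Path
from typing import List


def html_build_command(docs_dir: str = "docs") -> List[str]:
    """Build the sphinx-build command line for an HTML build."""
    docs = Path(docs_dir)
    return [
        "sphinx-build",
        "-b", "html",                   # select the HTML builder
        str(docs / "source"),           # where conf.py and the rst files live
        str(docs / "build" / "html"),   # where the generated pages are written
    ]


def build_docs(docs_dir: str = "docs") -> int:
    """Run the build and return its exit code (0 means success)."""
    completed = subprocess.run(html_build_command(docs_dir), check=False)
    return completed.returncode
```

A hook script could call build_docs() and exit with the returned code, so that a failed documentation build stops the commit or push.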

If you have any comments, or suggestions, or find any errors in this blog post feel free to contact me on twitter (@ianhopkinson_).

Book review: Precolonial Black Africa by Cheikh Anta Diop

My next book follows on from reading Black and British by David Olusoga. It is Precolonial Black Africa by Cheikh Anta Diop. I was looking for an overview of African history from an African perspective. Diop’s relatively short book focuses on West Africa. It turns out he is a very interesting figure in himself, building several political parties, doing research in history as well as physics and chemistry, and having a university named after him. Some of his ideas on African history are controversial (you can see the Wikipedia page relating to him here).

The core of the controversy is twofold: one part is his claim that ancient Egyptians were black, and the second is that there is a historical unity in West African civilisation, with migration from the east of Africa populating the continent. This thesis relies quite heavily on similarities in totemic names across the region as well as cultural similarities. These days there is some support from DNA evidence for the migration of populations out of the Nile basin to West Africa.

Most of the discussion in this book is oriented around the area of West Africa where Diop grew up, in Senegal, with some mentions of Egypt and Sudan. Diop draws parallels in the internal organisations across the empires of Ghana, Mossi, Mali and Songhai. The Empire of Ghana stretched beyond the boundaries of the modern country, and stood for 1250 years. Mossi was to the east and south, in the area of modern Burkina Faso; Mali and Songhai were a little to the north, encompassing the modern Timbuktu. Looking at Wikipedia these empires appear to have overlapped to a degree both in time and space. Precolonial Black Africa covers the period from about 300AD to the 17th century, although it does not make much reference to dates.

There is almost no mention even of the area of Nigeria, a little to the east, or of Southern Africa. I was nearly halfway through the book before I realised that Sudan referred to two different places: Sudan the modern state in North East Africa, and the Sudan Empire which stretches across the southern margin of the Sahara in the West of Africa.

The book starts with a description of the caste system, emphasising the two-way nature of the system and contrasting it to a degree with the caste system in India.

Precolonial Black Africa contrasts Africa with Europe. In the period covered by the book Europe was based on city-states which evolved into feudal structures, with Roman geographical divisions, where defence from marauders by the lord in the castle was important. Land ownership was the core of this political system, whereas Africa evolved more along Egyptian lines, which saw countries divided into regions with regional governance and no tradition of land ownership.

These empires were led by kings with a small cabinet of advisors who had both a regional responsibility and a specialism (like a minister for finance, or the army). Although not republics, nor democratic in the modern Western sense, Diop claims that these governments were more representative than their Western European equivalents of the time.

The technological expertise of the ancient Romans and Greeks was carried through the Middle Ages by the Arab world. It is no coincidence that Spain was once a technology leader, given the Muslim rule of Spain. Islamization of West Africa is a recurring theme of the book, and Arab writers feature regularly in the lists of sources for the early history of Africa. Islam has been important in education through to the present day, and this is in part responsible for slowed technological progress in the region: Islamic schools did not place a great emphasis on what they considered pagan history, nor so much on modern science.

Precolonial Black Africa covers technology relatively briefly, mentioning architecture and Great Zimbabwe, a significant stone-built city in present-day Zimbabwe whose early excavation was plagued by the then Rhodesian government’s view that it could not have been constructed by Black Africans. Coins and metalworking are also mentioned; West Africa made relatively little use of the familiar coinage of Europe. Gold dust was used as currency, as were cowrie shells. The Benin Bronzes, dating from the 13th century, demonstrate there was significant metalworking skill in West Africa (the Bronzes are currently in the news as the UK refuses to return them to Benin). Little of technology and writing seems to have survived from precolonial times. I suspect this is a combination of an environment which is not conducive to the preservation of paper (or even metal), successive colonisations by Islam and then Europeans, and relatively little archaeological activity. Trade seems to have been quite significant across West Africa, even in the absence of conventional coinage.

The interesting thing about reading this book is the contrast with the flaws that Western history has had in the past: a focus on great men, the idea of the natural superiority of the white man, and a heavy reliance on Classical heritage for legitimacy. I suspect these points of view are generally not prevalent in modern academic history but they certainly hold sway with the current UK government and a coterie of right-wing historians. To a degree Diop suffers the same types of prejudices but from a different perspective, that of the superiority of the Black African. My view of African history is still heavily influenced by those old Western European foundations.

After a rocky start I came to enjoy this book. I found it alien in a couple of respects: firstly in its discussion of history from an African perspective, and also simply in that it is African history. What I know of Africa is largely through a colonial lens.