Category: Technology

Programming, gadgets (reviews thereof) and computers

Rosetta Stone – TypeScript

In an earlier blog post I explained the motivation for a series of “Rosetta Stone” posts which described the ecosystem for different programming languages. This post covers TypeScript, the associated GitHub repository is here. This blog post aims to provide some discussion around technology choices whilst the GitHub repository provides details of what commands to execute and what files to create.

I was curious to try this exercise on a language, TypeScript, which I had not previously used or even considered. It so happens that TypeScript arose in a recent discussion so I thought I would look at it.

So for this post in particular, factual errors are down to ignorance on my part! I welcome corrections.

About TypeScript

TypeScript was developed by Microsoft, with the first release in October 2012. Its aim is to provide a language suited to large applications by adding a type system to JavaScript. TypeScript is compiled to JavaScript for execution, in common with a number of other languages.

TypeScript ranks quite low on the TIOBE index but ranks fourth on the GitHub Top Programming Languages and fifth in the StackOverflow rankings. This is likely because TIOBE is based on search rankings and for many TypeScript queries the answer will be found in the more extensive JavaScript corpus.

How is the language defined?

The homepage for TypeScript is https://www.typescriptlang.org/. There appears to be no formal, up to date language specification. The roadmap for the language is published by Microsoft, and develops through a process of Design Proposals. TypeScript releases a new minor version every 3 months, and once 10 minor versions have been released the major version is incremented. There is some debate about this strategy since it does not follow either conventional semantic or date-based versioning.

The JavaScript runtime on which compiled TypeScript code is run will also have an evolution and update process.

TypeScript Compilers and runtimes

The TypeScript compiler, tsc, is typically installed using the node.js package manager, npm. Node.js is a runtime engine which runs JavaScript without the need for a browser. It is downloaded and installed from here. Node.js is just one of a number of options for running JavaScript compiled from TypeScript; it can also be run in the browser or using other systems such as Deno or Bun.

The install of node.js appears trivial but on Windows machines there is a lengthy install of Visual Studio build tools, the chocolatey package manager and Python after node.js has installed!

Once node.js is installed, installing TypeScript is a simple package installation; it can be installed globally or at a project level. Getting started guides typically assume a global install, which simplifies paths.

Tsc is configured with a `tsconfig.json` file – a template can be generated by running `tsc --init`.

It is possible to compile TypeScript to JavaScript using Babel which is a build tool originally designed to transpile between versions of JavaScript.

Details of installation can be found here in the accompanying GitHub repository.

Package/library management

Npm, part of the node.js ecosystem, is the primary package management system for TypeScript. Yarn and pnpm are alternatives that have a very similar (identical?) interface.

A TypeScript project is defined by a `package.json` file which holds the project metadata both user-generated and technical such as dependencies and scripts – in this sense it mirrors the Python `pyproject.toml` file. Npm generates a `package-lock.json` file which contains exact specifications of the installed packages for a project (somewhat like the manually generated requirements.txt file in Python). The JavaScript/TypeScript standard library is defined here. I note there is no CSV library 😉. More widely third party libraries can be found in the node registry here.

To use the standard JavaScript library in TypeScript the type definitions need to be installed with:

npm install @types/node --save-dev

Local packages can be installed for development, as described here.

Npm has neat functionality whereby scripts for executing the project tests, linting, formatting and whatever else you want, can be specified in the `package.json` file.

Virtual environments

Python has long had the concept of a virtual environment, where the Python interpreter and installed packages can be specified at a project level to simplify dependency management. Npm essentially has the same functionality through saved dependencies, which are installed into a `node_modules` folder. The node.js version can be specified in the `package.json` file, completing the isolation from global installation.

Project layout for package publication

There is no formally recommended project structure for TypeScript/npm packages. However, Microsoft has published a Node starter project which we must assume reflects best practice. An npm project will contain a `package.json` file at the root of the project and will put local, project-level packages into a `node_modules` directory.

Based on the node starter project, a reasonable project structure would contain the following folders, with configuration files in the project root directory:

  • dist – contains compiled JavaScript, `build` is another popular name for this folder;
  • docs – contains documentation output from documentation generation packages;
  • node_modules – created by npm, contains copies of the packages used by a project;
  • src – contains TypeScript source files;
  • tests – contains TypeScript test files.

How this works in practice is shown in the accompanying GitHub repository. TypeScript is often used for writing web applications in which case there would be separate directories for web assets like HTML, CSS and images.

Testing

According to State of JS, Jest has recently become the most popular testing framework for JavaScript; mocha.js was the most popular until quite recently. The headline chart on State of JS is a little confusing – there is a selector below the chart that allows you to switch between Awareness, Usage, Interest and Retention. These results are based on a survey of a self-selecting audience, so “Buyer beware”. Jest was developed by Facebook to test React applications, and there is a very popular Microsoft Visual Code plugin for Jest.

Jest is installed with npm alongside ts-jest and the Jest TypeScript types, and configured using a jest.config.json file in the root of the project. In its simplest form this configuration file provides a selector for finding tests, and a transform rule to apply ts-jest to TypeScript files before execution. Details can be found in the accompanying GitHub repository.

Static analysis and formatting tools

Static analysis, linting, for TypeScript is generally done using ESLint, the default JavaScript linter, although a special variant is installed as linked here. Previously there was a separate TSLint linter although this has been deprecated. Installation and configuration details can be found in the accompanying GitHub repo.

There is an ESLint extension for Microsoft Visual Code.

The Prettier formatter seems to be the go to formatter for TypeScript (as well as JavaScript and associated file formats). Philosophically, in common with Python’s black formatter, prettier intends to give you no choice in the matter of formatting. So I am not going to provide any configuration, the accompanying GitHub repository simply contains a .prettierignore file which lists directories (such as the compiled JavaScript) that prettier is to ignore.

There is a prettier extension for Visual Code.

Documentation Generation

TypeDoc is the leading documentation generation system for TypeScript although Microsoft are also working on TSDoc which seeks to standardise the comments used in JSDoc style documentation. TypeDoc aims to support the TSDoc standard. TypeDoc is closely related to the JavaScript JSDoc package.

Wrapping up

I was struck by the similarities between Python and TypeScript tooling, particularly around configuring a project. The npm package.json configuration file is very similar in scope to the Python pyproject.toml file. Npm has the neat additional features of adding packages to package.json when they are installed and generating the equivalent of the requirements.txt file automatically. It also allows the user to specify a set of “scripts” for running tests, linting and so forth – in Python I typically use a separate tool, `make`, to do this.

In both Python and TypeScript third party tools have multiple configuration methods, including JSON or ini/toml format, Python or JavaScript files. At least one tool has a lengthy, angry thread on their GitHub repository arguing about allowing configuration to be set in the default package configuration file! I chose the separate json file method where available because it clearly separates out the configuration for a particular tool in the project, and is data rather than executable code. In production I have tended to use a single combined configuration file to limit the number of files in the root of a project.

I welcome comments, probably best on Mastodon where you can find me here.

Rosetta Stone – Python

Python Logo, interlocked blue and yellow stylised snakes

In an earlier blog post I explained the motivation for a series of “Rosetta Stone” posts which described the ecosystem for different programming languages. This post covers Python, the associated GitHub repository is here. This blog post aims to provide some discussion around technology choices whilst the GitHub repository provides details of what commands to execute and what files to create.

For Python my knowledge of the ecosystem is based on considerable experience as a data scientist working predominantly in Python rather than as a full-time software developer or a computer scientist, although much of what I learned about the Python ecosystem was as a result of working on a data mesh project as, effectively, a developer.

About Python

Python is a dynamically typed language, invented by Guido van Rossum with the first version released in 1991. It was intended as a scripting language which fell between shell scripting and compiled languages like C. As of 2023 it is the most popular language in the TIOBE index, and also on GitHub.

How is Python defined?

The home for Python is https://www.python.org/ where it is managed by the Python Software Foundation. The language is defined in the Reference although this is not a formal definition. Python has a regular release schedule with a new version appearing every year or so and a well-defined life cycle process. As of writing (October 2023) Python 3.12 has just been released. In the past the great change was from Python 2 to Python 3 which was released in December 2008 – this introduced breaking changes. The evolution of the language is through the PEP (Python Enhancement Proposal) – PEP documents are an excellent resource for understanding new features.

Python Interpreters

The predominant Python interpreter is CPython which is what you get if you download Python from the official website. Personally, I have tended to use the Anaconda distribution of Python for local development. I started doing this 10 years or so ago when installing some libraries on Windows machines was a bit tricky and Anaconda made it possible/easy. It also has nice support for virtual environments – in particular it allows the Python version for the virtual environment to be defined. However, I keep thinking I should review this decision since Anaconda includes a lot of things I don’t use, they recently changed their licensing model which makes it more difficult to use in organisations and the issues with installing libraries are less these days.

CPython is not the only game in town though, there is Jython which compiles Python to Java-bytecode, IronPython which compiles it to the .NET intermediate language, and PyPy which is written in Python. These alternatives generally have the air of being interesting technical demonstrations rather than fully viable alternatives to CPython.

Typically I invoke Python scripts using a command line in Git Bash like:

./my_script.py

This works because I start all of my Python scripts with:

#!/usr/bin/env python

More generally Python scripts are invoked like:

python my_script.py

Package/Library Management

Python has always come with a pretty extensive built-in library – “batteries included” is how it is often described. I am a data scientist, and rather harshly I often judge programming languages as to whether they include a built-in library for reading and writing CSV files (Python does)!
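As a small illustration of that built-in CSV support, here is a minimal sketch that writes and reads CSV text in memory using only the standard library. The column names and the helper names `write_csv` and `read_csv` are hypothetical, chosen just for this example.

```python
import csv
import io

# Hypothetical example data; every value is a string, as CSV has no types.
FIELDNAMES = ["language", "first_release"]
ROWS = [
    {"language": "Python", "first_release": "1991"},
    {"language": "TypeScript", "first_release": "2012"},
]


def write_csv(rows: list) -> str:
    """Serialise a list of dictionaries to CSV text with a header row."""
    # newline="" stops the buffer translating the line endings csv writes.
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def read_csv(text: str) -> list:
    """Parse CSV text back into a list of dictionaries keyed by the header."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text, newline=""))]


if __name__ == "__main__":
    csv_text = write_csv(ROWS)
    print(csv_text)
    print(read_csv(csv_text))
```

With real files the same reader and writer classes are used with `open(path, newline="")` in place of the in-memory buffer.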

The most common method for managing third party libraries is the `pip` package. By default this installs packages from the Python Package Index repository. The Anaconda distribution includes the `conda` package manager, which I have occasionally used to install tricky packages, and there are `pipenv` and `poetry` tools which also handle virtual environments as well as dependencies.

With pip installing a package is done using a command like:

pip install scikit-learn

If required, a specific version can be requested, or any version newer than a given version. A list of dependencies can be installed from a plain text file:

pip install -r requirements.txt

The dependencies of a project are defined in the `pyproject.toml` file which configures the project. These are often described as being abstract – i.e. they indicate which packages are required, and perhaps version limits if the host project requires functionality only available after a certain version. The `requirements.txt` file is often found in projects; this should be a concrete specification of package versions on the developer machine. It is the “Works for me(TM)” file. I must admit I only understood this distinction after looking at the node.js package manager, npm, where the `pyproject.toml` equivalent, `package.json`, is updated when a new package is installed, and the `requirements.txt` equivalent, `package-lock.json`, is updated with the exact version of the package actually installed.

In Python local code can be installed as a package like:

pip install -e .

This so called “editable” installation means that a package can be used elsewhere on the same machine whilst keeping up to date with the latest changes to the code.

Virtual environments

Python has long supported the idea of a “virtual environment” – a project level installation of Python which largely isolates it from other projects on the same machine by installing packages locally.

This very nearly became mandatory, see PEP-0704 – however, virtual environments don’t work very well for certain use cases (for example continuous development pipelines) and it turns out that `pip` sits outside the PEP process so the PEP had no authority to mandate a change in `pip`!

The recommended approach to creating virtual environments is the built-in `venv` library. I use the Anaconda package manager since it allows the base version of Python to be varied on a project by project basis (or even allowing multiple versions for the same project). virtualenv, pipenv and poetry are alternatives.
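The built-in `venv` library is usually run from the command line as `python -m venv`, but it can also be called from Python. This is a minimal sketch, assuming a throwaway directory; the `create_environment` helper and the `demo-env` name are hypothetical, and pip is deliberately left out to keep the example quick.

```python
import tempfile
import venv
from pathlib import Path


def create_environment(parent: Path, name: str = "demo-env") -> Path:
    """Create a virtual environment under `parent` and return its path."""
    env_dir = parent / name
    # with_pip=False skips bootstrapping pip, which keeps this sketch fast;
    # a real project would normally want with_pip=True.
    venv.create(env_dir, with_pip=False)
    return env_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        env_path = create_environment(Path(tmp))
        # pyvenv.cfg is the marker file that identifies a virtual environment.
        print((env_path / "pyvenv.cfg").read_text())
```

The `pyvenv.cfg` file records which base Python installation the environment was created from.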

IDEs like Visual Code allow the developer to select which virtual environment a project runs in.

Project layout for package publication

Tied in with the installation of packages is the creation and publication of packages. This is quite a complex topic, and I wrote a whole blog post on it. Essentially Python is moving to a package publication strategy which stores all configuration in a `pyproject.toml` file (toml is a simple configuration file format) rather than an executable Python file (typically called setup.py). This position has evolved over a number of years, and the current state is still propagating through the ecosystem. An example layout is shown below, setup.py is a legacy from former package structuring standards. The __init__.py files are an indication to Python that a directory contains package code.

Testing

Python has long included the `unittest` package as a built-in package – it is inspired by the venerable JUnit test library for Java. `Pytest` is an alternative I have started using recently which has better support for reusable fixtures and a simpler, implicit syntax (which personally I don’t like). Readers will note that I have a tendency to use built-in packages where at all possible, this is largely to limit the process of picking the best of a range of options, and hedging against a package falling into disrepair. Typically I use Visual Code to run tests which has satisfying green tick marks for passing tests and uncomfortable red crosses for failing tests.
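To make the difference in style concrete, here is a minimal sketch with a hypothetical `add` function under test. The class follows the `unittest` conventions, while the final function shows the bare `assert` form that pytest would discover. It is written so that it runs without pytest installed.

```python
import unittest


def add(a: int, b: int) -> int:
    """Hypothetical function under test."""
    return a + b


class TestAdd(unittest.TestCase):
    """unittest style: tests are methods on a TestCase subclass."""

    def test_add_positive_numbers(self) -> None:
        self.assertEqual(add(2, 3), 5)

    def test_add_negative_numbers(self) -> None:
        self.assertEqual(add(-2, -3), -5)


def test_add_pytest_style() -> None:
    """pytest style: a plain function using a bare assert."""
    assert add(2, 3) == 5


if __name__ == "__main__":
    # Runs only the TestCase classes; pytest would also collect the function.
    unittest.main()
```

The `unittest` version is more verbose but makes the test structure explicit, which is the trade-off described above.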

Integrated Development Environments

The choice of Integrated Development Environment is a personal one, Python is sufficiently straightforward that it is easy to use a text editor and commandline to complete development related tasks. I use Microsoft Visual Code, having moved from the simpler Sublime Text. Popular alternatives are the PyCharm IDE from JetBrains and the Spyder editor. There is even a built-in IDE called IDLE. The Jupyter Notebook is used quite widely particularly amongst data scientists (personally I hate the notebook paradigm, having worked with it extensively in Matlab) but this is more suited to exploratory data analysis and visualisation than code development. I use IPython, a simple REPL, a little to confirm syntax.

Static Analysis and Formatting Tools

I group static analysis and formatting tools together because for Python static analysers tend to creep into formatting. I have started using static analysis tools and a formatter since using Visual Code whose Python support builds it in, and using development pipelines when working with others. For static analysis I use a combination of pyflakes and pylint which are pretty standard choices, and for formatting I use black.

For Python a common standard for formatting is PEP-8 which describes the style used in the Python built-in library and C codebase.

Documentation Generation

I use sphinx for generating documentation; the process is described in detail in this blog post. There is a built-in library, pydoc, which I didn’t realise existed! Doxygen, the de facto standard for C++ documentation generation, will also work with Python.

Type-hinting

Type-hinting was formally added to Python in version 3.5 in 2015, it allows tools to carry out static analysis for compliance with the type-hints provided but is ignored by the interpreter. I wrote about this process in Visual Code in this blog post. I thought that type-hinting was a quirk of Python but it turns out that a number of dynamically typed languages allow some form of type-hinting, and TypeScript is a whole new language which adds type-hints to JavaScript.
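A minimal sketch of what "ignored by the interpreter" means in practice, using a hypothetical `double` function: the hints are stored on the function but nothing checks them at run time, so a call with the wrong type still executes. A static checker such as mypy would be expected to flag that call.

```python
def double(value: int) -> int:
    """Return twice the value; the hints say int but nothing enforces that."""
    return value * 2


if __name__ == "__main__":
    print(double(2))  # an int in, an int out, as the hints promise
    # The interpreter runs this without complaint and repeats the string;
    # only a static analysis tool would report the mismatch with the hint.
    print(double("ab"))
    # The hints are kept as plain metadata on the function object.
    print(double.__annotations__)
```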

Wrapping up

In writing this blog post I discovered a couple of built-in libraries that I was not currently using (pydoc and venv). In searching for alternatives I also saw that over a period of a few years packages go in and out of favour, or at least support.

I welcome comments, probably best on Mastodon where you can find me here.

A Rosetta Stone for programming ecosystems

Rosetta Stone By © Hans Hillewaert, CC BY-SA 4.0, Link

The Rosetta Stone is a stone slab dating to 196BC on which the same decree is written in three scripts: Egyptian hieroglyphs, Egyptian Demotic and Ancient Greek. It was key to deciphering Egyptian hieroglyphs in the modern era.

It strikes me that learning a new programming language is not really an exercise in learning the syntax of a new language, for vast swathes of languages those things are very similar. For an experienced programmer the learning is in the ecosystem. What we need is a Rosetta Stone for software development in different languages that tells us which tools to use for different languages, or at least gives us a respectable starting point.

To my mind the ecosystem specific to a programming language includes the language specification and evolution process, compiler/interpreter options, package/dependency management, virtual environments, project layout, testing, static analysis and formatting tools, and documentation generation. Package management, virtual environments and project layout are inter-related, certainly in Python (my primary programming language).

In researching these tools I was curious about their history. Compilers have been around since nearly the beginning of electronic computing in the late forties and early fifties. Modern testing frameworks generally derive from Smalltalk’s SUnit – published in 1989. Testing clearly went on prior to this – I am sure it is referenced in The Mythical Man-Month and Grace Hopper is cited for her work in testing components and computers.

I tend to see package/dependency management as being the system by which I install packages from an internet repository such as Python’s PyPI repository – in which case the first of these was CPAN, for the Perl language first online in 1995, not long after the birth of the World Wide Web.

Separate linters date back to 1978. Indent was the first formatter, written in 1976. The first documentation generation tools arose towards the end of the eighties (link), with JavaDoc, which I suspect inspired many subsequent implementations, appearing in the mid-nineties.

Tool choices are not as straightforward as they seem, in nearly all cases there are multiple options as a result of an evolution in the way programming is done more generally, or developers seeking to improve what they see as pain points in current implementations.  Some elements are down to personal choice.

For my first two Rosetta Stone blog posts I look at Python and TypeScript. My aim is that the blog post will discuss the options and a GitHub repository will demonstrate one set of options in action. I am guided by my experience of working in a team on a Python project where we needed to agree a tool set and best practices. The use of development pipelines which run linters, formatters and tests automatically before code changes are merged, drove a lot of this work. The aim of these blog posts is, therefore, not to simply get an example of a programming language running but to create a project that software developers would be content to work with. The code itself is minimal, although I may add some more involved code in future.

I wrote the TypeScript in my “initial release” to see how the process would work for a language with which I was not familiar – it helped me understand the Python ecosystem better and gave me “feature envy”!

I found myself referencing numerous separate blog posts in writing these first two blog posts which suggests this Rosetta Stone is a worthwhile exercise. I also found my search results were not great, contaminated by a great deal of poorly written perhaps automatically generated material.

There are other, generic, parts of the ecosystem such as the operating system on which the code will run, the source control system and the Integrated Development Environment the developer uses which I will not generally discuss. I work almost exclusively on Windows but I prefer Git Bash as my shell. I use git with GitHub for source control and Visual Code as my editor/IDE.

When I started this exercise I thought that there might be specific Integrated Development Environments used for specific languages. In the eighties and nineties when you bought a programming language the Integrated Development Environment was often part of the deal. This seems not to be the case anymore; most IDEs these days can be extended with plugins specific to a language so which IDE you start with is immaterial. In any case any language can be used with a combination of a text editor and command line tools.

I have been programming since I was a child in the early eighties. First in BASIC, then at university in FORTRAN, in industry in MATLAB before moving to Python. During that time I have also dabbled in C++ and Java but largely from a theoretical point of view. Although I have been programming for a long time it has generally been in the role of scientist / data scientist producing code for my own use, only in the last few years have I written code intended to be consumed by others.

These are my first two “Rosetta Stone” blog posts:

Versioning in Python

I have recently been thinking about versioning in Python, both of Python and also of the Python packages. This is a record of how it is done for a current project and the reasoning behind it.

Python Versioning

At the beginning of the project we made a conscious decision to use Python 3.9, however our package is also used by our Airflow code which does integration tests, and provides reference Docker images based on Python 3.7 (their strategy is to use the oldest version of Python still in support). This approach is documented here. And the end of life dates for recent Python versions are listed here:

Since we started the project, Python 3.11 has been released so it makes sense to extend our testing from just Python 3.9 to include Python 3.7 and 3.11 too.

The project uses an Azure Pipeline to run continuous integration / continuous development tests, it is easy to add tests for multiple versions of Python using the following stanza in the configuration file for the pipeline.

Extending testing resulted in only a small number of minor issues, typically around Python version support for dependencies which were easily addressed by allowing more flexible versions in Python’s requirements.txt rather than pinning to a specific version. We needed to address one failing test where it appears Python 3.11 handles escaping of characters in Windows-like path strings differently from Python 3.9.

Package Versioning

Our project publishes a package to a private PyPI repository. This process fails if we attempt to publish the same version of the package twice, where the version is that specified in the “pyproject.toml”* configuration file rather than the state of the code.

Python’s views on package version numbering are described in PEP-440, which sets out the permitted formats. It is flexible enough to allow either Calendar Versioning (CalVer – https://calver.org/) or Semantic Versioning (SemVer – https://semver.org/) but does not prescribe how the versioning process should be managed or which of these schemes should be used.

I settled on Calendar Versioning with the format YYYY.MM.Micro. This is a considered personal taste. I like to know at a glance how old a package is, and I worry about spending time working out whether I need to bump major, minor or patch parts of a semantic version number whilst with Calendar Versioning I just need to look at the date! I use .Micro rather than .DD (meaning Day) because the day to be used is ambiguous in my mind i.e. is the day when we open a pull request to make a release or when it is merged?
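The bump rule for this scheme can be sketched in a few lines. This is an illustration rather than the code used in the project: the `next_calver` function is hypothetical, and it assumes the Micro counter starts at zero each month and that the month is written without zero-padding.

```python
from datetime import date


def next_calver(current: str, today: date) -> str:
    """Return the next YYYY.MM.Micro version after `current`."""
    year, month, micro = (int(part) for part in current.split("."))
    if (year, month) == (today.year, today.month):
        # Another release in the same month: bump the micro counter.
        return f"{year}.{month}.{micro + 1}"
    # First release in a new month: restart the micro counter at zero.
    return f"{today.year}.{today.month}.0"


if __name__ == "__main__":
    print(next_calver("2023.10.0", date(2023, 10, 20)))
    print(next_calver("2023.10.3", date(2023, 11, 1)))
```

Because only the year and month come from the calendar, the question of which day counts as the release day never arises.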

It is possible to automate the versioning numbering process using a package such as bumpversion but this is complicated when working in a CI/CD environment since it requires the pipeline to make a git commit to update the version.

My approach is to use a pull request template to prompt me to update the version in pyproject.toml, since this is where I have stored version information to date. As noted below, I moved project metadata from setup.cfg to pyproject.toml as recommended by PEP-621 during the writing of this blog post. The package version can be obtained programmatically using the importlib.metadata.version function introduced in Python 3.8. In the past projects defined __version__ programmatically but this is optional and is likely to fall out of favour since the version defined in setup.cfg/pyproject.toml is compulsory.
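A minimal sketch of reading an installed package's version this way, assuming Python 3.8 or later. The `installed_version` wrapper is a hypothetical helper; it returns `None` rather than raising when the distribution is not installed.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        # Reads the version recorded in the installed package metadata,
        # which comes from pyproject.toml or setup.cfg at build time.
        return version(distribution_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "pip" is used only because it is present in most environments.
    print(installed_version("pip"))
```

The name passed in is the distribution name used when installing the package, which is not always the same as the import name.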

Should you wish to use Semantic Versioning then there are libraries that can help with this, as long as you follow commit format conventions such as those promoted by the Angular project.

Once again I am struck by how this type of activity is a moving target – PEP-621 was only adopted in November 2020.

* Actually when this blog post was started version information and other project metadata were stored in setup.cfg but PEP-621 recommends it is put in pyproject.toml and is preferred by the packaging library. Setuptools has parallel instructions for using pyproject.toml or setup.cfg, although some elements to do with package and data discovery are in beta.

Software Engineering for Data Scientists

For a long time I have worked as a data scientist, and before that a physical scientist – writing code to do data processing and analysis. I have done some work in software engineering teams but only in a relatively peripheral fashion – as a pair programmer to proper developers. As a result I have picked up some software engineering skills – in particular unit testing and source control. This year, for the first time, I have worked as a software engineer in a team. I thought it was worth recording the new skills and ways of working I have picked up in the process. It is worth pointing out that this was a very small team with only three developers working about 1.5 FTE.

This blog assumes some knowledge of Python and source control systems such as git.

Coding standards

At the start of the project I did some explicit work on Python project structure, which resulted in this blog post (my most read by a large margin). At this point we also discussed which Python version would be our standard, and which linters (syntax/code style enforcers) we would use (Black, flake8 and pylint) – previously I had not used any linters/syntax checkers other than those built in to my preferred editor (Visual Studio Code). My Python project layout used to be a result of rote learning – working in a team forced me to clarify my thinking in this area.

Agile development

We followed an Agile development process, with work specified in JIRA tickets which were refined and executed in 2 week sprints. Team members were subjected to regular rants (from me) on the non-numerical “story points” which have the appearance of numbers BUT REALLY THEY ARE NOT! Also the metaphor of sprinting all the time is exhausting. That said I quite like the structure of working against tickets and moving them around the JIRA board. Agile development is the subject of endless books, I am not going to attempt to describe it in any detail here.

Source control and pull requests

To date my use of source control (mainly git these days) has been primitive; effectively I worked on a single branch to which I committed all of my code. I was fairly good at committing regularly, and my commit messages were reasonably useful. I used source control to delete code with confidence and as a record of what I was doing when.

This project was different – as is common we operated on the basis of developing new features on branches which were merged to the main branch by a process of “pull requests” (GitHub language) / “merge requests” (GitLab language). For code to be merged it needed to pass automated tests (described below) and review by another developer.

I now realise we were using the GitHub Flow strategy (a description of source control branching strategies is here) which is relatively simple, and fits our requirements. It would probably have been useful to talk more explicitly about our strategy here since I had had no previous experience in this way of working.

I struggled a bit with the code review element, my early pull requests were massive and took ages for the team to review (partly because they were massive, and partly because the team was small and had limited time for the project). At one point I Googled for dealing with slow code review and read articles starting “If it takes more than a few hours for code to be reviewed….” – mine were taking a couple of weeks! My colleagues had a very hard line on comments in code (they absolutely did not want any comments in code!)

On the plus side I learnt a lot from having my code reviewed – often in pushing me to do stuff I knew I should have done. I also learned from reviewing others’ code; often I would review someone else’s code and then go change my own code.

Automated pipelines

As part of our development process we used Azure Pipelines to run tests on pull requests. Azure is our corporate preference – very similar pipeline systems can be found in GitHub and GitLab. This was all new to me in practical, if not theoretical, terms.

Technically configuring the pipeline involved a couple of components. The first is optional: we used Linux “make” targets to specify actions such as running installation, linters, unit tests and integration tests. Make targets are specified in a Makefile, and are invoked with simple commands like “make install”. I had a simple Makefile which looked something like this:

The make targets can be run locally as well as in the pipeline. In practice we could fix all issues raised by black and flake8 linters but pylint produced a huge list of issues which we considered then ignored (so we forced a pass for pylint in the pipeline).

The Azure Pipeline was defined using a YAML file, this is a simple example:

This YAML specifies that the pipeline will be triggered on attempting a pull request against a main branch. The pipeline is run on an Ubuntu image (the latest one) with Python 3.9 installed. Three actions are done, first installation of the Python package specified in the git repo, then unit tests are run and finally a set of linters is run. Each of these actions is run regardless of the status of previous actions. Azure Pipelines offers a lot of pre-built tasks but they are not portable to other providers, hence the use of make targets.

The pipeline is configured by navigating to the Azure Pipeline interface and pointing at the GitHub repo (and specifically this YAML file). The pipeline is triggered when a new commit is pushed to the branch on GitHub. The results of these actions are shown in a pretty interface with extensive logging.

The only downside of using a pipeline from my point of view was that my standard local operating environment is Windows with the git-bash prompt providing a Linux-like commandline interface. The pipeline was run on an Ubuntu image, which meant that certain tests would pass locally but not in the pipeline, and were consequently quite difficult to debug. Regular issues were around checking file sizes (line endings mean that file sizes on Linux and Windows differ) and file paths, which – even with Python’s pathlib – are different between Windows and Linux systems. Using a pipeline forces you to ensure your installation process is solid, since the pipeline image is built on every run.
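Both of these platform differences can be reproduced on any machine. The sketch below is illustrative only: the file content, the path parts and the `encoded_size` helper are hypothetical, and pathlib's "pure" path classes are used so that Windows behaviour can be shown without running on Windows.

```python
from pathlib import PurePosixPath, PureWindowsPath

LINES = ["id,value", "1,10", "2,20"]  # hypothetical file content


def encoded_size(lines: list, line_ending: str) -> int:
    """Size in bytes of the lines when each is terminated by `line_ending`."""
    return len("".join(line + line_ending for line in lines).encode("utf-8"))


unix_size = encoded_size(LINES, "\n")
windows_size = encoded_size(LINES, "\r\n")

parts = ("data", "raw", "file.csv")
windows_path = PureWindowsPath(*parts)
posix_path = PurePosixPath(*parts)

if __name__ == "__main__":
    # Windows line endings add one extra byte (the carriage return) per line.
    print(unix_size, windows_size)
    # The same path parts render with different separators on each platform.
    print(str(windows_path), str(posix_path))
    # as_posix() gives a separator-independent form that is safer in tests.
    print(windows_path.as_posix() == posix_path.as_posix())
```

Tests that compare string forms of paths or exact byte counts of text files are the ones most likely to pass on one platform and fail on the other.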

We also have a separate pipeline to publish the Python package to a private PyPI repository but that is the subject of another blog post.

Conclusions

I learnt a lot working with other, more experienced, software engineers and as a measure of the usefulness of this experience I have retro-fitted the standard project structure and make targets to my legacy projects. I have started using pipelines for other applications.