Tag: software development

Book review: A Philosophy of Software Design by John Ousterhout

Next for review is A Philosophy of Software Design by John Ousterhout. This is a book about a big idea for software design. The big idea is that good software design is about managing complexity. In Ousterhout’s view complexity leads to a set of bad things: (1) change amplification – a seemingly simple change requires code modifications in many different places, (2) cognitive load – complex software is difficult to understand, and (3) unknown unknowns – complex software can spring the unexpected on us.

Ousterhout covers this big idea in 20 short chapters with frequent reference to projects that he has run with students repeatedly (including a GUI text editor and an HTTP server) – providing a testbed for reviewing many design choices. He also uses the RAMCloud project as an example, as well as features of Java and the UNIX operating system. This makes a refreshing change from artificial examples.

Decreasing complexity requires developers to think strategically rather than tactically, which goes against the flow of some development methodologies. Ousterhout suggests spending 10-20% of time on strategic thinking – this will pay off in the long term. He cites Facebook as a company that worked tactically, and Google and VMware as companies that worked more strategically.

At the core of reducing complexity is the idea of “deep modules”, that’s to say systems that have a relatively small interface (the width) which hides information about a potentially complex process (the depth). The Java garbage collector is the limiting case for this – having no user accessible interface. The aim of the deep modules is to hide implementation details (information) from users whilst presenting an interface that only takes what is required. This means deciding what matters to the user – and the best answer is as little as possible.
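As a rough illustration of a deep module, here is a minimal sketch of my own rather than an example from the book. The `SettingsStore` class and its file format are hypothetical; the point is that two small methods hide the file handling, JSON parsing and missing-file case from the caller.

```python
import json
from pathlib import Path


class SettingsStore:
    """Hypothetical deep module: two methods hide file handling and JSON parsing."""

    def __init__(self, path: str) -> None:
        self._path = Path(path)

    def get(self, key: str, default: str = "") -> str:
        """Return the stored value for key, or default if it has never been set."""
        return self._load().get(key, default)

    def put(self, key: str, value: str) -> None:
        """Store value under key, creating the backing file if it does not exist."""
        data = self._load()
        data[key] = value
        # Write to a temporary file first so a crash cannot leave a half-written store.
        tmp = self._path.with_name(self._path.name + ".tmp")
        tmp.write_text(json.dumps(data), encoding="utf-8")
        tmp.replace(self._path)

    def _load(self) -> dict:
        # A missing file is treated as an empty store; callers never see this case.
        if not self._path.exists():
            return {}
        return json.loads(self._path.read_text(encoding="utf-8"))
```

A caller only needs `store.put("theme", "dark")` and `store.get("theme")`; everything else could change without touching the calling code.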

This goes somewhat against the ideas of the Agile development movement, as expressed in Clean Code by Robert C. Martin (which I read 10 years ago) – who was a big fan of very short functions. I noticed in my review of Clean Code that I have some sympathy with Ousterhout’s view – small functions introduce a complexity overhead in function definitions.

Also on the theme of Agile development, Martin (in Clean Code) sees comments as a failing whilst Ousterhout is a fan of comments, covering them in four chapters. I recently worked on a project where the coding style was to rigorously exclude comments, which I found unhelpful. That said, I look at my code now and see orphaned comments – no longer accurate or relevant. The first of Ousterhout’s chapters on comments talks about four excuses to not provide comments, and his response to them:

  1. Good code is self-documenting – some things cannot be said in code (like motivations and quirks)
  2. I don’t have time to document – 10% of time on comments will pay off in future
  3. Comments get out of date and are misleading – see later chapter
  4. The comments I have seen are useless – do better!

The later chapters focus on the considered use of comments – thinking about where and what to comment rather than sprinkling comments around at a whim. The use of auto-documentation systems (like Sphinx for Python) is a large part of realising this since they force you to follow standard formats for comments – typically after the first line of a function definition. Comments on implementation details should be found through the body of a function (and definitely not in source control commit messages). He also introduces the idea of a central file for recording design decisions that don’t fit naturally into the code. I include the chapter on “Choosing names” under “comments” – Ousterhout observes that if you are struggling to find a good name for something there is a good chance that what you are trying to name is complex and needs simplification.
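To make the split between interface comments and implementation comments concrete, here is a small sketch of my own. The `moving_average` function is hypothetical; the docstring uses the reStructuredText field lists that Sphinx understands, and the implementation note sits in the body next to the code it explains.

```python
def moving_average(values: list[float], window: int) -> list[float]:
    """Return the simple moving average of ``values``.

    :param values: Samples in time order.
    :param window: Number of consecutive samples in each average; must be at least 1.
    :returns: One average per full window, so the result has
        ``len(values) - window + 1`` entries (empty if there are too few samples).
    :raises ValueError: If ``window`` is less than 1.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    averages = []
    # Implementation note: keep a running total rather than re-summing each
    # window, so the cost stays linear in the number of samples.
    total = sum(values[:window])
    for i in range(window, len(values) + 1):
        averages.append(total / window)
        if i < len(values):
            total += values[i] - values[i - window]
    return averages
```

The docstring says what a caller needs to know and nothing about how the result is computed; the running-total trick is only of interest to someone reading the body.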

Certain types of code, Ousterhout cites event-driven programming, are not amenable to producing easy to understand code. He also dedicates a chapter to handling errors – arguing that errors should be defined out of existence (for example deleting a file that doesn’t exist shouldn’t cause an error, because the definition of such a function should be “make sure a file does not exist” rather than “delete a file”). Any remaining exceptions should be handled in one place, as far as possible.
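The file deletion example can be sketched in Python as below. This is my own illustration, with a hypothetical `ensure_absent` function, and it assumes Python 3.8 or later for the `missing_ok` argument.

```python
from pathlib import Path


def ensure_absent(path: str) -> None:
    """Make sure no file exists at path; a missing file is not an error."""
    # missing_ok=True (Python 3.8+) suppresses FileNotFoundError.
    Path(path).unlink(missing_ok=True)
```

Calling `ensure_absent` twice in a row is harmless, so callers have no "file not found" case to handle.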

There is a chapter on modern software development ideas and where they fit, or don’t, with the central theme. Object-orientation he sees as good in general, with information hidden inside classes, but he warns against overuse of inheritance which leads to complexity (just where is this method defined?). He also comments that although design patterns are generally good their over-application is bad. He is in favour of unit tests but not test-driven development. This seems to be related to his central issue around Agile development – it encourages tactical coding in an effort to produce features rapidly (per sprint). He believes Agile can work if the “features” developed in sprints are replaced with “abstractions”. He doesn’t like Java’s getters and setters, nor its complex serialisation system which requires you to set up buffering separately from opening a file as a stream – I remember finding this puzzling.

I enjoyed this book – it provides some support for continuing to do things I currently do although they are a little against the flow of Agile development and food for thought in improving further.

Rosetta Stone – TypeScript

In an earlier blog post I explained the motivation for a series of “Rosetta Stone” posts which described the ecosystem for different programming languages. This post covers TypeScript, the associated GitHub repository is here. This blog post aims to provide some discussion around technology choices whilst the GitHub repository provides details of what commands to execute and what files to create.

I was curious to try this exercise on a language, TypeScript, which I had not previously used or even considered. It so happens that TypeScript arose in a recent discussion so I thought I would look at it.

So for this post in particular, factual errors are down to ignorance on my part! I welcome corrections.

About TypeScript

TypeScript was developed by Microsoft with the first release in October 2012 with the aim of providing a language suited for use in large applications by adding a type system to JavaScript. TypeScript is compiled to JavaScript for execution, in common with a number of other languages.

TypeScript ranks quite low on the TIOBE index but ranks fourth on the GitHub Top Programming Languages and fifth in the StackOverflow rankings. This is likely because TIOBE is based on search rankings and for many TypeScript queries the answer will be found in the more extensive JavaScript corpus.

How is the language defined?

The homepage for TypeScript is https://www.typescriptlang.org/. There appears to be no formal, up to date language specification. The roadmap for the language is published by Microsoft, and develops through a process of Design Proposals. TypeScript releases a new minor version every 3 months, and once 10 minor versions have been released the major version is incremented. There is some debate about this strategy since it does not follow either conventional semantic or date-based versioning.

The JavaScript runtime on which compiled TypeScript code is run will also have an evolution and update process.

TypeScript Compilers and runtimes

The TypeScript compiler, tsc, is typically installed using the node.js package manager, npm. Node.js is a runtime engine which runs JavaScript without the need for a browser. It is downloaded and installed from here. Node.js is just one of a number of options for running JavaScript compiled from TypeScript; the compiled code can also be run in the browser or using other systems such as Deno or Bun.

The install of node.js appears trivial but on Windows machines there is a lengthy install of Visual Studio build tools, the chocolatey package manager and Python after node.js has installed!

Once node.js is installed, installing TypeScript is a simple package installation; it can be installed globally or at a project level. Typically getting started guides assume a global install which simplifies paths.

The compiler, tsc, is configured with a `tsconfig.json` file – a template can be generated by running `tsc --init`.

It is possible to compile TypeScript to JavaScript using Babel which is a build tool originally designed to transpile between versions of JavaScript.

Details of installation can be found here in the accompanying GitHub repository.

Package/library management

Npm, part of the node.js ecosystem, is the primary package management system for TypeScript. Yarn and pnpm are alternatives that have a very similar (identical?) interface.

A TypeScript project is defined by a `package.json` file which holds the project metadata both user-generated and technical such as dependencies and scripts – in this sense it mirrors the Python `pyproject.toml` file. Npm generates a `package-lock.json` file which contains exact specifications of the installed packages for a project (somewhat like the manually generated requirements.txt file in Python). The JavaScript/TypeScript standard library is defined here. I note there is no CSV library 😉. More widely third party libraries can be found in the node registry here.

To use the Node.js standard library in TypeScript, the type definitions need to be installed with:

npm install @types/node --save-dev

Local packages can be installed for development, as described here.

Npm has neat functionality whereby scripts for executing the project tests, linting, formatting and whatever else you want, can be specified in the `package.json` file.

Virtual environments

Python has long had the concept of a virtual environment, where the Python interpreter and installed packages can be specified at a project level to simplify dependency management. Npm essentially has the same functionality by the use of saved dependencies which are installed into a `node_modules` folder. The node.js version can be specified in the `package.json` file, completing the isolation from global installation.

Project layout for package publication

There is no formally recommended project structure for TypeScript/npm packages. However, Microsoft has published a Node starter project which we must assume reflects best practice. An npm project will contain a `package.json` file at the root of the project and puts local, project-level packages into a `node_modules` directory.

Based on the node starter project, a reasonable project structure would contain the following folders, with configuration files in the project root directory:

  • dist – contains compiled JavaScript, `build` is another popular name for this folder;
  • docs – contains documentation output from documentation generation packages;
  • node_modules – created by npm, contains copies of the packages used by a project;
  • src – contains TypeScript source files;
  • tests – contains TypeScript test files.

How this works in practice is shown in the accompanying GitHub repository. TypeScript is often used for writing web applications in which case there would be separate directories for web assets like HTML, CSS and images.

Testing

According to State of JS, Jest has recently become the most popular testing framework for JavaScript; Mocha was the most popular until quite recently. The headline chart on State of JS is a little confusing – there is a selector below the chart that allows you to switch between Awareness, Usage, Interest and Retention. These results are based on a survey of a self-selecting audience, so “Buyer beware”. Jest was developed by Facebook to test React applications, and there is a very popular Visual Studio Code plugin for Jest.

Jest is installed with npm alongside ts-jest and the Jest TypeScript types, and configured using a `jest.config.json` file in the root of the project. In its simplest form this configuration file provides a selector for finding tests, and a transform rule to apply ts-jest to TypeScript files before execution. Details can be found in the accompanying GitHub repository.

Static analysis and formatting tools

Static analysis, linting, for TypeScript is generally done using ESLint, the default JavaScript linter, although a special variant is installed as linked here. Previously there was a separate TSLint linter although this has been deprecated. Installation and configuration details can be found in the accompanying GitHub repo.

There is an ESLint extension for Visual Studio Code.

The Prettier formatter seems to be the go-to formatter for TypeScript (as well as JavaScript and associated file formats). Philosophically, in common with Python’s black formatter, Prettier intends to give you no choice in the matter of formatting. So I am not going to provide any configuration; the accompanying GitHub repository simply contains a `.prettierignore` file which lists directories (such as the compiled JavaScript) that Prettier is to ignore.

There is a Prettier extension for Visual Studio Code.

Documentation Generation

TypeDoc is the leading documentation generation system for TypeScript although Microsoft are also working on TSDoc which seeks to standardise the comments used in JSDoc style documentation. TypeDoc aims to support the TSDoc standard. TypeDoc is closely related to the JavaScript JSDoc package.

Wrapping up

I was struck by the similarities between Python and TypeScript tooling, particularly around configuring a project. The npm package.json configuration file is very similar in scope to the Python pyproject.toml file. Npm has the neat additional features of adding packages to package.json when they are installed and generating the equivalent of the requirements.txt file automatically. It also allows the user to specify a set of “scripts” for running tests, linting and so forth – in Python I typically use a separate tool, `make`, to do this.

In both Python and TypeScript third party tools have multiple configuration methods, including JSON or ini/toml format, Python or JavaScript files. At least one tool has a lengthy, angry thread on their GitHub repository arguing about allowing configuration to be set in the default package configuration file! I chose the separate json file method where available because it clearly separates out the configuration for a particular tool in the project, and is data rather than executable code. In production I have tended to use a single combined configuration file to limit the number of files in the root of a project.

I welcome comments, probably best on Mastodon where you can find me here.

Rosetta Stone – Python


In an earlier blog post I explained the motivation for a series of “Rosetta Stone” posts which described the ecosystem for different programming languages. This post covers Python, the associated GitHub repository is here. This blog post aims to provide some discussion around technology choices whilst the GitHub repository provides details of what commands to execute and what files to create.

For Python my knowledge of the ecosystem is based on considerable experience as a data scientist working predominantly in Python rather than as a full-time software developer or a computer scientist, although much of what I learned about the Python ecosystem was as a result of working on a data mesh project as, effectively, a developer.

About Python

Python is a dynamically typed language, invented by Guido van Rossum with the first version released in 1991. It was intended as a scripting language which fell between shell scripting and compiled languages like C. As of 2023 it is the most popular language in the TIOBE index, and also on GitHub.

How is Python defined?

The home for Python is https://www.python.org/ where it is managed by the Python Software Foundation. The language is defined in the Reference although this is not a formal definition. Python has a regular release schedule with a new version appearing every year or so and a well-defined life cycle process. As of writing (October 2023) Python 3.12 has just been released. In the past the great change was from Python 2 to Python 3, which was released in December 2008 – this introduced breaking changes. The evolution of the language is through the PEP (Python Enhancement Proposal) process – PEP documents are an excellent resource for understanding new features.

Python Interpreters

The predominant Python interpreter is CPython which is what you get if you download Python from the official website. Personally, I have tended to use the Anaconda distribution of Python for local development. I started doing this 10 years or so ago when installing some libraries on Windows machines was a bit tricky and Anaconda made it possible/easy. It also has nice support for virtual environments – in particular it allows the Python version for the virtual environment to be defined. However, I keep thinking I should review this decision since Anaconda includes a lot of things I don’t use, they recently changed their licensing model which makes it more difficult to use in organisations and the issues with installing libraries are less these days.

CPython is not the only game in town though, there is Jython which compiles Python to Java-bytecode, IronPython which compiles it to the .NET intermediate language, and PyPy which is written in Python. These alternatives generally have the air of being interesting technical demonstrations rather than fully viable alternatives to CPython.

Typically I invoke Python scripts using a command line in Git Bash like:

./my_script.py

This works because I start all of my Python scripts with:

#!/usr/bin/env python

More generally Python scripts are invoked like:

python my_script.py

Package/Library Management

Python has always come with a pretty extensive built-in library – “batteries included” is how it is often described. I am a data scientist, and rather harshly I often judge programming languages as to whether they include a built-in library for reading and writing CSV files (Python does)!

The most common method for managing third party libraries is the `pip` package. By default this installs packages from the Python Package Index repository. The Anaconda distribution includes the `conda` package manager, which I have occasionally used to install tricky packages, and there are `pipenv` and `poetry` tools which also handle virtual environments as well as dependencies.

With pip installing a package is done using a command like:

pip install scikit-learn

If required, a specific version can be requested (for example `scikit-learn==1.3.0`) or a minimum version (for example `scikit-learn>=1.3`). A list of dependencies can be installed from a plain text file:

pip install -r requirements.txt

The dependencies of a project are defined in the `pyproject.toml` file which configures the project. These are often described as being abstract – i.e. they indicate which packages are required, and perhaps version limits if the host project requires functionality only available after a certain version. The `requirements.txt` file is often found in projects; this should be a concrete specification of package versions on the developer machine. It is the “Works for me(TM)” file. I must admit I only understood this distinction after looking at the node.js package manager, npm, where the `pyproject.toml` equivalent, `package.json`, is updated when a new package is installed. The `requirements.txt` equivalent, `package-lock.json`, is updated with the exact version of a package actually installed.
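To show what a concrete specification looks like, here is a minimal sketch using only the built-in `importlib.metadata` library. The `freeze_lines` function is my own hypothetical helper; it produces roughly the `name==version` lines that `pip freeze` reports, and is an illustration rather than a replacement for it.

```python
from importlib import metadata


def freeze_lines() -> list[str]:
    """Return 'name==version' lines for the distributions visible in this environment."""
    pins = set()
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions with broken metadata
            pins.add(f"{name}=={dist.version}")
    return sorted(pins, key=str.lower)
```

Each line pins one exact version, which is the concrete counterpart to an abstract entry such as `scikit-learn>=1.3` in `pyproject.toml`.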

In Python local code can be installed as a package like:

pip install -e .

This so called “editable” installation means that a package can be used elsewhere on the same machine whilst keeping up to date with the latest changes to the code.

Virtual environments

Python has long supported the idea of a “virtual environment” – a project level installation of Python which largely isolates it from other projects on the same machine by installing packages locally.

This very nearly became mandatory, see PEP-0704 – however, virtual environments don’t work very well for certain use cases (for example continuous development pipelines) and it turns out that `pip` sits outside the PEP process so the PEP had no authority to mandate a change in `pip`!

The recommended approach to creating virtual environments is the built-in `venv` library. I use the Anaconda package manager since it allows the base version of Python to be varied on a project by project basis (or even allowing multiple versions for the same project). virtualenv, pipenv and poetry are alternatives.
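The built-in `venv` library can also be driven from Python, as in the sketch below. The `create_environment` function is my own hypothetical wrapper, and I have assumed `with_pip=False` to keep it quick; the usual command line route is `python -m venv .venv`.

```python
import venv
from pathlib import Path


def create_environment(directory: str) -> Path:
    """Create a virtual environment in directory and return the path to its config file."""
    # with_pip=False keeps the sketch fast and offline; pass True to bootstrap pip.
    venv.create(directory, with_pip=False)
    return Path(directory) / "pyvenv.cfg"
```

The `pyvenv.cfg` file written into the environment records which base Python installation it was created from.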

IDEs like Visual Studio Code allow the developer to select which virtual environment a project runs in.

Project layout for package publication

Tied in with the installation of packages is the creation and publication of packages. This is quite a complex topic, and I wrote a whole blog post on it. Essentially Python is moving to a package publication strategy which stores all configuration in a `pyproject.toml` file (toml is a simple configuration file format) rather than an executable Python file (typically called setup.py). This position has evolved over a number of years, and the current state is still propagating through the ecosystem. An example layout is shown below, setup.py is a legacy from former package structuring standards. The __init__.py files are an indication to Python that a directory contains package code.

Testing

Python has long included the `unittest` package as a built-in package – it is inspired by the venerable JUnit test library for Java. `pytest` is an alternative I have started using recently which has better support for reusable fixtures and a simpler, implicit syntax (which personally I don’t like). Readers will note that I have a tendency to use built-in packages where at all possible; this is largely to limit the process of picking the best of a range of options, and to hedge against a package falling into disrepair. Typically I use Visual Studio Code to run tests, which has satisfying green tick marks for passing tests and uncomfortable red crosses for failing tests.
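The difference in style between the two is easiest to see side by side. This is a minimal sketch with a hypothetical `slugify` function under test; the class follows the `unittest` conventions and the final function is what the same check looks like in the pytest style.

```python
import unittest


def slugify(title: str) -> str:
    """Hypothetical function under test: lower-case a title and join its words with hyphens."""
    return "-".join(title.lower().split())


class SlugifyTests(unittest.TestCase):
    """unittest style: a TestCase subclass with explicit assert methods."""

    def test_spaces_become_hyphens(self):
        self.assertEqual(slugify("Rosetta Stone Python"), "rosetta-stone-python")

    def test_extra_whitespace_is_collapsed(self):
        self.assertEqual(slugify("  Rosetta   Stone "), "rosetta-stone")


def test_slugify_pytest_style():
    """pytest style: a plain function and a bare assert, discovered by its test_ prefix."""
    assert slugify("Rosetta Stone Python") == "rosetta-stone-python"
```

Saved in a file whose name starts with `test_`, the class can be run with `python -m unittest`, and pytest will collect both the class and the plain function.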

Integrated Development Environments

The choice of Integrated Development Environment is a personal one; Python is sufficiently straightforward that it is easy to use a text editor and command line to complete development related tasks. I use Microsoft Visual Studio Code, having moved from the simpler Sublime Text. Popular alternatives are the PyCharm IDE from JetBrains and the Spyder editor. There is even a built-in IDE called IDLE. The Jupyter Notebook is used quite widely, particularly amongst data scientists (personally I hate the notebook paradigm, having worked with it extensively in Matlab), but this is more suited to exploratory data analysis and visualisation than code development. I use IPython, a simple REPL, a little to confirm syntax.

Static Analysis and Formatting Tools

I group static analysis and formatting tools together because for Python static analysers tend to creep into formatting. I have started using static analysis tools and a formatter since using Visual Studio Code, whose Python support builds them in, and since using development pipelines when working with others. For static analysis I use a combination of pyflakes and pylint which are pretty standard choices, and for formatting I use black.

For Python a common standard for formatting is PEP-8 which describes the style used in the Python built-in library and C codebase.

Documentation Generation

I use Sphinx for generating documentation; the process is described in detail in this blog post. There is a built-in library, pydoc, which I didn’t realise existed! Doxygen, the de facto standard for C++ documentation generation, will also work with Python.

Type-hinting

Type-hinting was formally added to Python in version 3.5 in 2015. It allows tools to carry out static analysis for compliance with the type-hints provided but is ignored by the interpreter. I wrote about this process in Visual Studio Code in this blog post. I thought that type-hinting was a quirk of Python but it turns out that a number of dynamically typed languages allow some form of type-hinting, and TypeScript is a whole new language which adds type-hints to JavaScript.
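A short sketch shows what "ignored by the interpreter" means in practice. The `double` function is a hypothetical example of mine, and mypy is named only as one example of a static type checker.

```python
def double(value: int) -> int:
    """Return value doubled; the hints document intent but are not enforced at runtime."""
    return value * 2


# The interpreter stores the hints as metadata...
hints = double.__annotations__

# ...but does not check them, so this call runs even though a type checker
# such as mypy would flag the str argument.
result = double("ab")
```

Here `result` ends up as the string `"abab"` rather than raising an error, which is exactly the kind of mistake the static analysis is there to catch before the code runs.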

Wrapping up

In writing this blog post I discovered a couple of built-in libraries that I was not currently using (pydoc and venv). In searching for alternatives I also saw that over a period of a few years packages go in and out of favour, or at least support.

I welcome comments, probably best on Mastodon where you can find me here.

Book review: The Mythical Man-month by Frederick Brooks Jr.

Next up is The Mythical Man-Month: Essays on Software Engineering by Frederick Brooks Jr.

This is a classic in software engineering which I’ve not previously read, I guess because it is more about software project management rather than programming itself. That said it contains the best description I have seen as to why we love to program, it is a page long so I won’t quote it in full but essentially it divides into 5 parts.

  1. the joy of making things;
  2. the joy of making things which other people find useful;
  3. the joy of solving puzzles;
  4. the joy of learning;
  5. the joy of working in such a malleable medium;

The majority of the book was written in the mid-seventies, following the author’s experiences in the previous decade, delivering the IBM OS/360 system. This means it reads like Asimov’s Foundation in places, dated technology, dated prose, but at the same time insightful. This is the 20th anniversary edition, published in 1995, which includes 4 new chapters tacked on the end of the original book. Two of these – No Silver Bullet and "No Silver Bullet" Refired – are a couple of essays from the eighties around the idea that there are no silver bullets to making software production greatly more efficient – this is in the context of hardware improvements which were progressing at unimaginable speed – Brooks was looking for routes to similar evolution in software development.

The other two new chapters are a bullet point summary of the original book and a retrospective on the first publication. The bullet point summary chapter removes the need for my usual style of review!

The core of the book is the observation that large software projects frequently fail to be delivered as scheduled. There then follow some chapters on addressing this issue. The Mythical Man-Month chapter is the most famous of these, it essentially says the enticing idea of throwing more people at a problem will speed up delivery is wrong. In some cases this is trivially true – you may have seen the example from the book that two women do not produce a baby in 4.5 months rather than 9 months for a single woman. The reason increasing team numbers for software development similarly fails is the cost in time of effective communication between more people, and the cost in time of onboarding new people. Despite our knowledge we still routinely under-estimate programming tasks, largely through mis-placed optimism.
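The communication cost can be made concrete with the pairwise count n(n - 1)/2 that Brooks uses for tasks where every worker must coordinate with every other. The sketch below is my own and the `communication_paths` name is hypothetical; it simply tabulates that formula for a few team sizes.

```python
def communication_paths(team_size: int) -> int:
    """Number of distinct pairs in a team of team_size people: n(n - 1) / 2."""
    return team_size * (team_size - 1) // 2


for size in (2, 5, 10, 20):
    print(size, communication_paths(size))
```

Doubling a team from 10 to 20 people roughly quadruples the number of pairs, from 45 to 190, which is why adding people can slow a project down.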

As mentioned above, The Mythical Man-Month is quite dated. The anachronisms come in several forms. There is the excitement over computer text-editing systems – a revelation at the time OS/360 was being developed. There is a whole chapter devoted to memory/storage usage which I am pretty sure is no longer a concern except in a few niche cases. There is quite a lot of discussion of batch versus time share systems, from a time when there were one or maybe two computers in a building rather than one on every desk, even one in every pocket. There are role anachronisms, where a team has two secretaries and a "Program Clerk" whose role it is to type stuff into the computer!

There are some places where Brooks describes practices which sound quite modern but differ slightly from the current sense. So "architecture" in his parlance is more about interface design than "architecture" as we would describe it now. There is some pair-like programming but it has a leader and a follower rather than equals. Agile is mentioned but not in the modern sense.

I was interested to read Brooks’s disdain of flowcharts. I remember learning about these – including proper stencils – from my father, a programmer trained in the early sixties. Brooks’s argument is that the flowchart is useful for programming in low-level languages like assembler but not useful for high level languages – particularly those using the "structured programming" approach which was new at the time. Structured programming replaces the GOTO statements found in earlier languages with blocks of code executed on if – then – else conditions and similar.

In a chapter entitled Plan to throw one away Brooks talks about the inevitability of the first iteration of a product being thrown away to be replaced by a better thought out version although he caveats this a little by allowing for progressive replacement. He notes that maintenance costs can be 40% of original cost and each new version fixing bugs has a 20% chance of introducing new bugs. This means the progressive replacement approach is a losing game.

In some ways this book is depressing: nearly 50 years after it was written software projects are still being delivered late with great regularity. On a more positive note I believe that the widespread availability of web APIs and online module libraries (such as PyPI for Python) can produce the sort of large uptick in programmer productivity that Brooks felt was out of reach. Perhaps this will not be seen as a productivity boost since it simply means the systems we aim to build are more complex, and the lines-of-code measure does not capture the complexity of external libraries. The consistent user interfaces provided by Mac OS and Windows are also something Brooks was looking for in his original publication.

Is The Mythical Man-Month still worth reading? I’d say a qualified "yes": the issues it describes are still pertinent and it draws on quantitative research about teams of software developers which I believe will still be broadly relevant. It is a relatively short, easy to read, book. It gives a glimpse into the history of computing. On the downside, much of the incidental detail is so far out of date as to be distracting.

Book review: Software Design Decoded by Marian Petre and André van der Hoek

Software Design Decoded: 66 Ways Experts Think by Marian Petre and André van der Hoek is my next read.

I picked it up as a recommendation from The Programmer’s Brain by Felienne Hermans. It is an odd little book, something like A6 format with 66 pages containing a short paragraph or two on the behaviours of experts in software design. Each page is dedicated to a single thought. There are sketches scattered liberally through the book by Yen Quach who is credited in the author biographies.

Although it does not have a contents page or index, Software Design Decoded is divided into "chapters":

  • Experts keep it simple
  • Experts collaborate
  • Experts borrow
  • Experts break rules
  • Experts sketch
  • Experts work with uncertainty
  • Experts are not afraid
  • Experts iterate
  • Experts test
  • Experts reflect
  • Experts keep going

I found this book reassuring as much as anything, and it also gave me some things to think about. Reassuring because it turns out I share habits with experts in software design, which must be a start to being an expert! I write quite a lot of software (for data analysis and data builds) but design tends to come as an afterthought.

I think the things I already do are to build something even if it isn’t the final form, I was interested in the comment about avoiding over-generalisation. The element I am missing here is to learn from this initial form and build something better (potentially discarding what I’ve already done). I also do a fair bit of testing, although in this book testing is wider than just software unit tests or even integration tests, it is about testing preconceptions and testing with the user.

I also liked the comment on focusing on the needs of the key stakeholders, where the key stakeholders are the end users. This is a recurring theme – that the end users are the key focus, and that the job is done when they are using the product/software.

Always learning gets a recommendation as well as not being afraid to use things in manners other than that intended.

I was interested to note the comments on experts forever sketching since it is something I scarcely do; sometimes I write sequences of tricky bits of code with the odd arrow. I remember learning how to draw flow charts in the late seventies but rarely use the skill (certainly not with all the proper symbols). Software Design Decoded is slightly contradictory on this: in one place experts sketch abstractly as an aid to thought with the sketches meaningless beyond the moment, and in another the sketches are kept for reference later and hence clear and well-labelled.

Notation also gets a couple of mentions. I take this as a formalised system for naming things – something popular with physicists where the right notation is the difference between a page of formulae and a single line. I’m not really aware of using this in my own practice. Despite repeated attempts at object-oriented design I still tend to be quite "procedural".

I’m still in the "learning" phase of collaboration: for the first time in a while I’m working on code with other people (and it is a bit of a shock for all concerned). I still can’t abide meetings, but then the experts can’t abide some of them either (the ones with no direction).

I found this a bit of a "feel good" book, I share at least some of the habits of software design experts! I probably wouldn’t buy it for a personal read but if you have a coffee table in your software company this book would fit right in.