Tag: software development

Jan 06 2025

Rosetta Stone – Rust

By Ian Hopkinson in Deva Data, Technology

In an earlier blog post I explained the motivation for a series of “Rosetta Stone” posts which described the ecosystem for different programming languages. This post covers Rust, the associated GitHub repository with some minimal code is here. This blog post aims to provide some discussion around technology choices whilst the GitHub repository provides details of what commands to execute and a minimal example project.

I became interested in Rust because a number of new tools such as uv and ruff for the Python ecosystem are written in Rust. It’s fair to say there has been a buzz around the language in programming circles for some years.

About Rust

Rust started out as a personal project by Graydon Hoare, an employee of Mozilla, in 2006. It was adopted by Mozilla in 2009 and in 2012 the first public release was made. It was originally designed with a focus on memory management and memory safety. It stands with C and assembly as the only languages supported in the development of the Linux kernel. I read the The Rust Programming Language book recently, the distinguishing feature of Rust is the borrow checker and ownership model which is what gives it strong memory safety credentials.

Rust is named for the rust fungi, not the corrosion of iron but it seems few people realise this!

Rust currently ranks 13^th on the TIOBE index but has been regularly voted the most admired language on the Stackoverflow rankings.

How is the language defined?

The home for the Rust language is here. Rust evolves through an RFC (request for comments) process, which can be found in the Rust RFC Book. Currently there is no formal specification for the language but there is an exercise to write one.

New versions of the language are released every 6 week (see here). Breaking changes can be introduced in editions (once every 3 years with the next edition, 2024, due in February 2025) but upgrading between editions is optional and there are tools to support version migrations. For more about the edition process see here.

There is a roadmap for the 2024 edition (see here) and presumably once that is released the next roadmap will be developed.

Rust Compilers

The Rust compiler is rustc, there are some alternatives but they are experiments rather than production systems. Typically rustc is invoked with the cargo package manager, see below. My first impression was that the compiler provided really good error messages.

Details of installation can be found here in the accompanying GitHub repository. Installing the whole Rust toolchain is ridiculously simple.

Package management

cargo is the Rust package manager, it is also typically used to compile code to produce a “crate” which may be a binary or library.

Rather neatly invoking cargo to produce a new package also creates a new Git repository with the appropriate layout:

cargo new hello_world --bin

This generates a cargo.toml file which defines the project and dependencies, and on compile a cargo.lock file is produced which specifies exactly the versions of all the dependences. This parallels the NPM system for JavaScript/TypeScript. There is also a --lib flag which generates a layout for a library package. In practice this is just a starting point, your code may include both binaries and libraries for which you will need to edit the cargo.toml file. However, this cargo new process reflects how I would start a project in Python.

The standard library is here, there are no built-in CSV or JSON libraries – regular readers will recall I use this as a measure of a language’s completeness since my favoured language, Python, has them! Others may have different opinions.

Third party Rust crates can be found on the crate registry.

Virtual environments

Rust does not require virtual environments since cargo fetches the appropriate packages on compile, it uses a local cache of packages but this is a true cache – if the version of a package is not in the cache then it is fetched.

Project layout for package publication

cargo new generates a minimal project layout. More generally the conventional project layout is shown below (see here) when code is compiled a target directory is created for the compiled code.

It is rather nice having a mandated project layout!

Testing

A testing framework is built-in and a skeleton test is included when a --lib project is created by cargo The tests look like this:

This is essentially a unit test and the intention in Rust is that the unit tests sit next to the functions which they test, in Python unit tests tend to be separated into tests directory. In a Rust project the tests directory holds integration tests. There is a third type of test, doctests which are embedded in the comments intended for documentation.

All three types of test can be run with cargo test.

The built-in test functionality does not include features such as mocks or parameterized tests but there are third party crates to provide this functionality.

Static analysis and formatting tools

Clippy is the built-in linter, used in addition to rustfmt, the formatter. The are invoked using cargo with cargo fmt and cargo clippy. As a compiled language the compiler provides static analysis too. I would not expect alternatives to clippy or rustfmt to gain much traction since these appear to be the “official” tools.

Documentation Generation

rustdoc is the standard documentation generator tool (see here), working from comments in code and invoked using cargo doc. It feels to me that there is scope for alternatives to rustdoc although perhaps I am mixing up the documentation generation engine with the standards for code comments for documentation which have arisen in Python.

Wrapping up

I was struck by how everything is pretty much built-in for Rust; compiling, packaging, testing, formatting, and linting. All of these tools can be driven from the package management tool, cargo. It is almost like the design team was trying to make the job of writing this post as easy as possible! In writing Rust code in associated with this post I found the built-in test framework missing features (like mocks and parameterization which I’m used to in Python), similarly the documentation system is quite simple so perhaps in time alternatives will start to predominate.

My “home” language of Python has acquired these features over a long period of time, utilizing third party libraries and tools that eventually became de facto standards.

It would be nice to make a comparison with Go which occupies a similar space in programming as a relatively recent arrival with what I believe is a similarly full tool suite.

I welcome comments, probably best on Mastodon where you can find me here.

Deva Data, rosetta stone, rust, software development

Dec 12 2024

Book review: The Rust Programming Language by Steve Klabnik and Carol Nichols

By Ian Hopkinson in Book Reviews, Deva Data

My next review is a small departure, it is for The Rust Programming Language by Steve Klabnik and Carol Nichols but rather than buy the physical book I read this interactive version, which has quizzes and embedded, runnable code.

The Rust programming language is the target for my next Rosetta Stone blog post which identifies all the tooling for a language. For Rust these tools (the compiler, formatter, linter, package management and automated documentation) are pretty much all “built-in” so my job will be easy. The challenge in this case has been learning the language since I try to write a little code for the Rosetta Stone posts to demonstrate what I’ve learned.

Rust is a relatively new language, very highly regarded amongst developers and one of only two approved for developing Linux. Its focus is on “safety” and speed. It has been used to make new, highly performant tools for the Python ecosystem, like ruff and uv, which is how I came to know of it.

For a Python programmer Rust does not look alien in the way that Haskell and Lisp do; admittedly it uses curly braces for scoping, is strongly-typed and supports pointers and references which are not features of Python but are common in other languages.

To me it feels like C but with some object-oriented features; the authors talk explicitly about it having features of functional languages including use of the Option value which contains something or nothing, the compiler enforces the handling of nothing. I think I might start using this more explicitly in Python which allows type-hinting to indicate an Option-like value. The other explicitly referenced functional feature is the use of closures, in Python closures are functions defined within functions or anonymous / lambda functions whilst in Rust they seem to be closer to functions passed as arguments to other functions.

Rust makes what seems an odd distinction between functions and macros in its standard library. Macros are identified with an exclamation mark, for example println! As I understand this is because a macro is implemented using code generation at runtime which allows the developer to supply, for example, a variable number of arguments to println! which would not be possible for a function println. Python doesn’t make this distinction and is very permissive in the arguments that a function can take, allowing both position and keyword arguments of variable number.

I was struck by the way that different languages use the same word for different things. For example in Rust a struct is both a data container but can also carry methods in the way that “classes” in other languages do. In Python an enum is a closed list of values but in Rust those values can have user-defined associated values so that an enum for IP address protocols can contain V4 and V6 (as it would in Python) but in addition it can hold an actual IP address associated with each entry in the enumeration.

My understanding is that Rust is considered an object-oriented language by some but not all. In practical terms the only object-oriented feature missing is inheritance although this can be approximated by the use of generics and traits.

Since I have scarcely used pointers and references in 40 years of programming I found these concepts a bit challenging but I have made progress in my understanding through reading this book. Excited by this new found understanding I was then confused by Rust’s ownership model! Ownership and borrowing are the big, unique conceptual features of Rust. The aim of ownership is to rigorously ensure that code is safe – no writing past the end of arrays, or dereferencing null points. It also means that the performance hit of a garbage collector is not required, this task is pretty much entirely handled at compile time by the borrow checker. The strong ownership model makes concurrent programming easier too.

In trying to understand ownership and borrowing I found some useful tools, the best entry point was the BORIS tool by Christian Schott which lists other visualization tools including the Aquascope tool used in this book. I found the first such tool in RustViz which is used in teaching, it needs the user to put annotations into the code rather than working out the annotations itself. I also read this article by Chris Morgan.

As language books go, this one is pretty readable, and I found the built-in quizzes handy, if only to illustrate my ignorance particularly of the ownership and borrowing model. I think for my next technical book I might read Category Theory for Programmers by Bartosz Milewski. I have felt when applying type-hints to Python, and learning Rust and Haskell that I was missing out by not understanding at least the basics of category theory.

I enjoyed this book and the format worked pretty well for me, although I need to find a way of reading online content like this in more comfort. I’m keen to give Rust a go now!

Deva Data, rust, software development

Aug 31 2024

Book review: Beautiful Code edited by Andy Oram & Greg Wilson

By Ian Hopkinson in Book Reviews, Deva Data

This review is of Beautiful Code edited by Andy Oram and Greg Wilson, a collection of 33 essays by 41 authors about computer code that the authors consider beautiful. A number of the authors are very well known, including Brian Kernighan, Charles Petzold, Douglas Crockford and Yukihiro Matsumoto.

The chapters vary considerably in length but average a little under 20 pages which works well for me – I find 20 pages is a reasonable chunk to read in one go. Although the chapters are presented without any organization, they are actually grouped into themes.

A few chapters are on algorithms, which is what I think of when people talk about beautiful code. A few chapters are on applications, and their architecture, a couple are on assistive technologies, a couple are on libraries/frameworks. A few are on operating system code: device driver architecture and handling threads. It is fair to say that beauty is in the eye of the beholder, and generally I felt authors wrote about their favourite piece of work rather than something that might be broadly considered beautiful code.

Several chapters I was more interested in the subject area than the actual code. There are a couple of chapters on bioinformatics (I was a lecturer in Biological Physics for a couple of years), a couple on Python (my primary programming language for the last 10 years or so) . The Python chapters are on the implementation of the dictionary in the core language and multidimensional iterators in Numpy (a widely used numerical library). The NASA chapter was a bit of a let-down since it involved strictly ground based code. However, I was excited to learn about the challenges of two Martian time zones as well as earth based time zones!

Pretty much all of the chapters contain moderate amounts of code. The implementing language varies, there are a number written using examples in Lisp. Ruby and Perl have a couple of outings, as does Haskell, also present are Java, C#, Python and C. Reading through the author biographies it seems some of them were involved in the creation or ongoing development of some of these languages.

I gave up on one chapter in part because it was in Lisp (not a language I’ve ever used) with no support except a suggestion to go read the first few chapters of the author’s book on Scheme (a dialect of Lisp) but also because it was about macro expansion which I’ve always considered a nasty kludge. Brian Hayes’ chapter also used Lisp but provides a little schematic diagram showing the structure of a Lisp function which was really helpful. Lisp programmers really like their brackets!

It was interesting to learn about Lisp’s advice functionality which is like Python decorators. Haskell is neat in its very clean separation between pure functions and functions with side effects. I can’t help thinking that both languages are best suited to highly mathematical developers working alone, their notation is exceedingly concise and impenetrable to outsiders. However, ideas from these more esoteric languages are usefully incorporated to more mainstream ones and programming styles.

Binary search and sorting are a feature of several of the early chapters. One author points out it took 12 years after its invention for a correct implementation of binary search to be written, and only 10% of developers get it right first time when implementing it themselves. The core error is an issue with numeric overflow which highlights that the difficulty of coding comes not just from algorithmic design but also lower level implementation details. The later chapter on the numerical analysis libraries from CERN (BLAS, LINPACK, LAPACK) highlight this again, optimum algorithms changed as the underlying machine CPU, memory and network architectures changed. The book finishes with a chapter on algorithms for checking the collinearity of three points on a plane, I liked this one. The twist at the end is that the best algorithm is to measure the area of the triangle the three points form, if it is zero then the points are colinear. This algorithm has the benefit if numerical stability, again a imposition of underlying numerical representation.

I liked the chapter on a logging framework, it made frequent references to design patterns and seemed like a nice example of beauty in higher level design.

In my view Charles Petzold’s chapter describes eldritch code, rather than beautiful! He shows how to generate C# intermediate language (IL) at runtime in order to speed up image processing operations. This involves line by line generation of raw IL using C#’s reflection functions. I’m not saying it is not very cunning or interesting but it isn’t pretty. It is also a personal interest of mine since I spent a number of years working on image analysis.

Some of the applications discussed have stood the test of time, ERP5 is still around as is Emacspeak (accessibility software for the blind). I can find no trace of Cryptonite (an email client) or Elocutor (accessibility software originally designed for Professor Stephen Hawking). Components of the Subversion and Perforce source control applications are included. Obsolesce seems to be a combination of the language used, the change in the web and competition. Beautiful Code was written in 2007, nearly 20 years ago and the web was a very different place then.

A couple of chapters talked about how code appeared on the page – I particularly liked the idea of “bookish” code, code laid out in the manner of a book or magazine to aid readability – interestingly this chapter was in favour of shorter variable names for readability rather than longer ones for description. A recurring theme is that the code is never beautiful in the first instance, it normally reaches beauty by a process of iteration and refinement.

The book was first published in 2007, and its age shows in some places. It finishes with biographies of the authors which could have more usefully been put with their respective chapters. I was sad to see that only one author appears to be a woman, I suspect this would be the case if the book was written now.

I enjoyed this book, most of the chapters struck some sort of cord with me.

Deva Data, software development

Feb 08 2024

Book review: A Philosophy of Software Design by John Ousterhout

By Ian Hopkinson in Book Reviews, Deva Data

Next for review is A Philosophy of Software Design by John Ousterhout. This a book about a big idea for software design. The big idea is that good software design is about managing complexity. In Ousterhout’s view complexity leads to a set of bad things: (1) change amplification – small changes in code lead to big effects, (2) cognitive load – complex software is difficult to understand, and (3) unknown unknowns – complex software can spring the unexpected on us.

Ousterhout covers this big idea in 20 short chapters with frequent reference to a projects that he has run with students repeatedly (including a GUI text editor and a HTTP server) – providing a testbed for reviewing many design choices. He also uses the RAMCloud project as an example, as well as features of Java and the UNIX operating system. This makes a refreshing change from artificial examples.

To decrease complexity requires developers to think strategically rather than tactically which goes against the flow of some development methodologies. Ousterhout suggests spending 10-20% of time on strategic thinking – this will pay off in the long term. He cites Facebook as a company who worked tactically and Google and VMWare as companies who worked more strategically.

At the core of reducing complexity is the idea of “deep modules”, that’s to say systems that have a relatively small interface (the width) which hides information about a potentially complex process (the depth). The Java garbage collector is the limiting case for this – having no user accessible interface. The aim of the deep modules is to hide implementation details (information) from users whilst presenting an interface that only takes what is required. This means deciding what matters to the user – and the best answer is as little as possible.

This goes somewhat against the ideas of the Agile development movement, as expressed in Clean Code by Robert C. Martin (which I read 10 years ago) – who was a big fan of very short functions. I noticed in my review Clean Code that I have some sympathy with Ousterhout’s view – small functions introduce a complexity overhead in function definitions.

Also on the theme of Agile development, Martin (in Clean Code) sees comments as a failing whilst Ousterhout is a fan of comments, covering them in four chapters. I recently worked on a project where the coding style was to rigorously exclude comments which I found unhelpful, that said I look at my code now and see orphaned comments – no longer accurate or relevant. The first of Ousterhout’s chapters on comments talks about four excuses to not provide comments, and his response to them:

Good code is self-documenting – some things cannot be said in code (like motivations and quirks)
I don’t have time to document – 10% of time on comments will pay off in future
Comments get out of date and are misleading – see later chapter
The comments I have seen are useless – do better!

The later chapters focus on the considered use of comments – thinking about where and what to comment rather than sprinkling comments around at a whim. The use of auto-documentation systems (like Sphinx for Python) is a large part of realising this since they force you to follow standard formats for comments – typically after the first line of a function definition. Comments on implementation details should be found through the body of a function (and definitely not in source control commit messages). He also introduces the idea of a central file for recording design decisions that don’t fit naturally into the code. I include the chapter on “Choosing names” under “comments” – Ousterhout observes that if you are struggling to find a good name for something there is a good chance that what you are trying to name is complex and needs simplification.

Certain types of code, Ousterhout cites event-driven programming, are not amenable to producing easy to understand code. He also dedicates a chapter to handling errors – arguing that errors should be defined out of existence (for example deleting a file that doesn’t exist shouldn’t cause an error, because the definition of such a function should be “make sure a file does not exist” rather than “delete a file”). Any remaining exceptions should be handled in one place, as far as possible.

There is a chapter on modern software development ideas and where they fit, or don’t, with the central theme. Object-orientation he sees as good in general, with information hidden inside classes but warns against over use of inheritance which leads to complexity (just where is this method defined?). He also comments that although design patterns are generally good their over-application is bad. He is in favour of unit tests but not test-driven development. This seems to be related to his central issue around Agile development – it encourages tactical coding in an effort to produce features rapidly (per sprint). He believes Agile can work if the “features” developed in sprints are replaced with “abstractions”. He doesn’t like Java’s getters and setters, nor its complex serialisation system which requires you to setup up buffering separately from opening a file as a stream – I remember finding this puzzling.

I enjoyed this book – it provides some support for continuing to do things I currently do although they are a little against the flow of Agile development and food for thought in improving further.

Deva Data, software development

Oct 24 2023

Rosetta Stone – TypeScript

By Ian Hopkinson in Deva Data, Technology

In an earlier blog post I explained the motivation for a series of “Rosetta Stone” posts which described the ecosystem for different programming languages. This post covers TypeScript, the associated GitHub repository is here. This blog post aims to provide some discussion around technology choices whilst the GitHub repository provides details of what commands to execute and what files to create.

I was curious to try this exercise on a language, TypeScript, which I had not previously used or even considered. It so happens that TypeScript arose in a recent discussion so I thought I would look at it.

So for this post in particular, factual errors are down to ignorance on my part! I welcome corrections.

About TypeScript

TypeScript was developed by Microsoft with the first release in October 2012 with the aim of providing a language suited for use in large applications by adding a type system to JavaScript. TypeScript is compiled to JavaScript for execution, in common with a number of other languages.

TypeScript ranks quite low on the TIOBE index but ranks fourth on the GitHub Top Programming Languages and fifth in the StackOverflow rankings. This is likely because TIOBE is based on search rankings and for many TypeScript queries the answer will be found in the more extensive JavaScript corpus.

How is the language defined?

The homepage for TypeScript is https://www.typescriptlang.org/. There appears to be no formal, up to date language specification. The roadmap for the language is published by Microsoft, and develops through a process of Design Proposals. TypeScript releases a new minor version every 3 months, and once 10 minor versions have been released the major version is incremented. There is some debate about this strategy since it does not follow either conventional semantic or date-based versioning.

The JavaScript runtime on which compiled TypeScript code is run will also have an evolution and update process.

TypeScript Compilers and runtimes

The TypeScript compiler, tsc, is typically installed using the node.js package manager, npm. Node.js is a runtime engine which runs JavaScript without the need for a browser. It is downloaded and installed from here. Node.js is just one of a number of options for running JavaScript compiled from TypeScript, it can be run in the browser or using other systems such as Deno or Bun.

The install of node.js appears trivial but on Windows machines there is a lengthy install of Visual Studio build tools, the chocolatey package manager and Python after node.js has installed!

Once node.js is installed installing TypeScript is a simple package installation, it can be installed globally or at a project level. Typically getting started guides assume a global install which simplifies paths.

Tsc is configured with a file, tsconfig.json file – a template can be generated by running `tsc –init`

It is possible to compile TypeScript to JavaScript using Babel which is a build tool originally designed to transpile between versions of JavaScript.

Details of installation can be found here in the accompanying GitHub repository.

Package/library management

Npm, part of the node.js ecosystem is the primary package management system for TypeScript. Yarn and pnpm are alternatives that have a very similar (identical?) interface.

A TypeScript project is defined by a `package.json` file which holds the project metadata both user-generated and technical such as dependencies and scripts – in this sense it mirrors the Python `pyproject.toml` file. Npm generates a `package-lock.json` file which contains exact specifications of the installed packages for a project (somewhat like the manually generated requirements.txt file in Python). The JavaScript/TypeScript standard library is defined here. I note there is no CSV library 😉. More widely third party libraries can be found in the node registry here.

To use the standard JavaScript library in TypeScript the type definitions need to be installed with:

Npm install @types/node –save-dev

Local packages can be installed for development, as described here.

Npm has neat functionality whereby scripts for executing the project tests, linting, formatting and whatever else you want, can be specified in the `package.json` file.

Virtual environments

Python has long had the concept of a virtual environment, where the Python interpreter and installed packages can be specified at a project level to simplify dependency management. Npm essentially has the same functionality by the use of saved dependences which are installed into a `node_modules` folder. The node.js version can be specified in the `package.json` file, completing the isolation from global installation.

Project layout for package publication

There is no formally recommended project structure for TypeScript/npm packages. However, Microsoft has published a Node starter project which we must assume reflects best practice. An npm project will contain a `package.json` file at the root of the project and put locally, project-level packages into a node_modules directory.

Based on the node starter project, a reasonable project structure would contain the following folders, with configuration files in the project root directory:

dist – contains compiled JavaScript, `build` is another popular name for this folder;
docs – contains documentation output from documentation generation packages ;
node_modules – created by npm, contains copies of the packages used by a project;
src – contains TypeScript source files ;
tests – contains TypeScript test files;

How this works in practice is shown in the accompanying GitHub repository. TypeScript is often used for writing web applications in which case there would be separate directories for web assets like HTML, CSS and images.

Testing

According to State of JS, Jest has recently become the most popular testing framework for JavaScript, mocha.js is was the most popular until quite recently. The headline chart on State of JS is a little confusing – there is a selector below the chart that allows you to switch between Awareness, Usage, Interest and Retention. These results are based on a survey of a self-selecting audience, so “Buyer beware”. Jest was developed by Facebook to test React applications, there is a very popular Microsoft Visual Code plugin for Jest.

Jest is installed with npm alongside ts-jest and the Jest TypeScript types, and configured using a jest.config.json file in the root of the file. In its simplest form this configuration file provides a selector for finding tests, and a transform rule to apply ts-jest to TypeScript files before execution. Details can be found in the accompanying GitHub repository.

Static analysis and formatting tools

Static analysis, linting, for TypeScript is generally done using ESLint, the default JavaScript linter, although a special variant is installed as linked here. Previously there was a separate TSLint linter although this has been deprecated. Installation and configuration details can be found in the accompanying GitHub repo.

There is an ESLint extension for Microsoft Visual Code.

The Prettier formatter seems to be the go to formatter for TypeScript (as well as JavaScript and associated file formats). Philosophically, in common with Python’s black formatter, prettier intends to give you no choice in the matter of formatting. So I am not going to provide any configuration, the accompanying GitHub repository simply contains a .prettierignore file which lists directories (such as the compiled JavaScript) that prettier is to ignore.

There is a prettier extension for Visual Code

Documentation Generation

TypeDoc is the leading documentation generation system for TypeScript although Microsoft are also working on TSDoc which seeks to standardise the comments used in JSDoc style documentation. TypeDoc aims to support the TSDoc standard. TypeDoc is closely related to the JavaScript JSDoc package.

Wrapping up

I was struck by the similarities between Python and TypeScript tooling, particularly around configuring a project. The npm package.json configuration file is very similar in scope to the Python pyproject.toml file. Npm has the neat additional features of adding packages to package.json when they are installed and generating the equivalent of the requirements.txt file automatically. It also allows the user to specify a set of “scripts” for running tests, linting and so forth – in Python I typically use a separate tool, `make`, to do this.

In both Python and TypeScript third party tools have multiple configuration methods, including JSON or ini/toml format, Python or JavaScript files. At least one tool has a lengthy, angry thread on their GitHub repository arguing about allowing configuration to be set in the default package configuration file! I chose the separate json file method where available because it clearly separates out the configuration for a particular tool in the project, and is data rather than executable code. In production I have tended to use a single combined configuration file to limit the number of files in the root of a project.

I welcome comments, probably best on Mastodon where you can find me here.

Deva Data, rosetta, software development, typescript

Tag: software development