Ian Hopkinson

Author's posts

Book review: Data Points by Nathan Yau

I picked up Data Points by Nathan Yau because it was recommended in Storytelling with data as a book on exploratory data analysis. I have previously read Nathan Yau’s book Visualize This.

Visualize This was very focussed on the technical side of producing data visualisations, with code samples and so forth. Data Points is a “bigger picture” book, divided into three sections: context, exploration and presentation.

Context can be summarised as: who, how, what, when, where, why. It is covered explicitly in the first chapter, using Yau’s wedding photos as an example. Spinning off from this is a mention of the Quantified Self movement, which was popular a few years ago: people would record aspects of their lives in great detail and build visualisations from them. This was enabled by the first generations of smartphones, which made this sort of data collection easy. Yau points out – and I think all data scientists can agree with this – that most of the job is actually collecting together the data required for a project and getting it into a shape to visualise.

The “Exploration” chapters start with an overview of what a data visualisation is. One of the strengths of this book is its many examples of visualisations, in this case going as far back as William Playfair’s invention of the bar chart in 1786. This chapter also highlights that a data visualisation can be a flow chart, or an abstract piece of art based on data. Yau cites John Tukey’s Exploratory Data Analysis a number of times; it was published in the 1970s, at a time when the author felt the need to explain that a “bold” effect can be achieved by using a pen rather than a pencil. The point being that we now have immense power in readily available software to produce, at the click of a button, visualisations which would have taken an expert many hours of manual labour in the relatively recent past.

The next chapters provide a summary of how we build a data visualisation starting with the fundamental building blocks: title, visual cues (the data), coordinate system, scale and context elements. The visual cues are further broken down into attributes like position, length, angle, direction, shapes and so forth.

Once this groundwork has been done, there is an extensive taxonomy of chart types, including more esoteric plots such as the cartogram (where geographic areas are distorted to show the relative sizes of variables), and radar or polar plots which, along with calendar heatmaps, are useful for showing periodic time series data.

The “Visualising with clarity” chapter starts to talk about presentation, and how the purpose of visualisations is to allow comparisons. The most useful takeaway from this chapter for me was that distribution plots are rather more difficult for the lay viewer to interpret than practitioners realise.

I found the penultimate chapter on “Designing for an audience” a little brief. A handy hint here was to design presentations for the audience at the back of the room – nobody likes to hear “this is probably too small for you to see” from a speaker. Another useful tip for interactive presentations is that people like to find out about themselves, so if you have data on people then make it easy for viewers to “look themselves up”, because that’s the first thing they are going to do.

The book finishes with a chapter on technologies. Some of them, such as R, Adobe Illustrator, Microsoft Excel, Google Sheets and Tableau, are still around and remain good choices. Yau’s favoured combination is R, with Adobe Illustrator used to polish the results. The JavaScript library Data-Driven Documents (D3) and Processing are still active. Other systems, such as IBM’s Many Eyes project and MapBox’s TileMill, have disappeared. The JavaScript libraries Raphael and the JavaScript InfoVis Toolkit appear dormant, in the sense that the activity on their GitHub repositories is minimal. Nobody talks about Flash and ActionScript anymore.

Data Points is much more a book about exploratory data visualisation than Storytelling with data, although I think Yau believes that exploratory data analysis is an exercise in storytelling. The strength of this book is the wide range of examples used to illustrate the points made throughout. The style is chatty and it is not a difficult read. It is less focussed on delivering specific lessons in making data visualisations than Storytelling with data.

Book review: Storytelling with data by Cole Nussbaumer Knaflic

This book, Storytelling with data by Cole Nussbaumer Knaflic, fits in with both my work and my interests. It relates to data visualisation, an area in which I have read a number of books, including The Visual Display of Quantitative Information by Edward R. Tufte, Visualize This by Nathan Yau, Data Visualization: a successful design process by Andy Kirk and Interactive Data Visualization for the web by Scott Murray. These range from the intensely theoretical (Tufte) to the deeply technical (Murray).

Storytelling with data is closest in content to Andy Kirk’s book, and his website is cited in the (very good) additional resources list. A second similarity with Andy Kirk’s book is that Storytelling is “the book of the course”: it is derived from the author’s training courses.

The differentiating factor with Knaflic’s book is the focus on storytelling, presenting a case to persuade, rather than on the production of a data visualisation, although that is part of the process. The book is divided into 6 key lessons, each of which gets a chapter; with a couple of chapters of examples, an introduction and an epilogue, this makes 10 chapters. The six key lessons are:

1. understand the context
2. choose an appropriate visual display
3. eliminate clutter
4. focus attention where you want it
5. think like a designer
6. tell a story

I think I got the most out of the understand the context and tell a story chapters. Technically I am quite experienced, but my knowledge is around how to make charts, and how to process data to make charts, rather than how to tell a story. The understand the context chapter talks about the “Big Idea” and the “3-minute story”. The Big Idea is the single idea you are trying to get across in a presentation, and the 3-minute story is the elevator pitch: how you would tell your story in 3 minutes. I liked a callout box with a list of verbs (accept, agree, begin, believe…) used to prompt you to think about what action you want your audience to take having seen your presentation.

The chapter on choosing an appropriate visual display is quite straightforward: Knaflic presents the 12 types of display she finds herself using frequently (which include simple text and text tables). This is a fairly small set, since variations of the bar chart (vertical, vertical stacked, horizontal, horizontal stacked and waterfall) cover off 5 of them. This is appropriate: if you are telling a story to persuade, you don’t want to spend your time explaining how your esoteric display works. Knaflic steers away from specific technology, only mentioning at the beginning of the book that all the charts shown were made in Microsoft Excel, with Adobe Illustrator sometimes used to get a chart looking just right at the end of the process.

There is a list of sins in data visualisation, including the reviled pie chart and 3D plots but also, perhaps surprisingly, the use of secondary axes to plot data on different scales together.

The chapters on eliminate clutter, focus attention where you want it, and think like a designer are all about making sure that the viewer pays attention where you want them to. Some of this is the Tuftian “eliminate clutter”, since much clutter creeps into charts through the default behaviour of software. Some is about using the gestalt principles of visual perception to group items together through similarity, proximity and so forth, and some is about using pre-attentive attributes such as colour and typeface to draw attention to certain elements. This reminded me of The Programmer’s Brain by Felienne Hermans, which links theories of how our brain works with the practices of programming.

The chapter on tell a story introduces some resources on storytelling from playwrights and screenwriters, basically the idea of the three-act play with a setup, conflict and resolution. This is a different way of thinking for me; my presentations tend to follow the traditional structure of a scientific paper, but it is interesting to see the link with creative writing and drama, which are generally excluded from scientific writing.

One of the lessons I learnt from this book was to make better use of chart titles and PowerPoint titles. I tend to go for descriptive chart titles (“Ticket Trend”, to use an example from the book) and PowerPoint titles which simply label a section of a talk (“Methodology”). Knaflic encourages us to use this valuable “real estate” in a presentation for a call to action: “Please Approve the Hire of 2 FTEs”.

The six lessons are reinforced with a chapter which covers a single worked example from beginning to end, and another chapter of case studies which looks at fixing particular issues with single charts.

I enjoyed this book; it’s beautifully produced and fairly easy reading. It also led me to buy two more books, Resonate by Nancy Duarte and Data Points by Nathan Yau, and so the “to be read” pile grows again!

Book review: Richard Trevithick – Giant of Steam by Anthony Burton

A second-hand book to review this time, Richard Trevithick – Giant of Steam by Anthony Burton, which I bought in Malvern. Richard Trevithick is best known as the inventor of the steam railway locomotive: the first person to put a steam engine on a carriage with wheels and put that carriage on metal rails. This followed his demonstration of a steam road carriage in 1801, with the railway locomotives following over the next couple of years.

Richard Trevithick was born near Camborne in Cornwall in 1771, to Ann Teague (a miner’s daughter) and Richard Trevithick Senior, a mine “captain”. He died in 1833. His wife, Jane, could well be described as “long-suffering”: Trevithick had little interest in providing a steady income for his family, or at least, if he had the desire, he was inept at achieving it, and he was briefly bankrupt in 1815. Furthermore, he left for South America for 11 years, from 1816 to 1827, with little communication with his wife and friends back home in England during that period. Despite this, his six children and his wife seem to have held him in at least some regard, and his son Francis certainly held him in high regard. Jane Trevithick lived until 1868.

The Cornish mining milieu is a key feature of his upbringing and subsequent career. The mine “captains” were very hands-on managers who led mining operations at the Cornish mines, and they often had significant financial interests in the mines. Cornwall in the 18th century was seen as a bit of an English Wild West, with a degree of opposition to ideas developed outside the area. Steam engines had been born in the South West to drain mines, with the first made by Thomas Savery in 1698, followed by Thomas Newcomen’s more practicable engine in 1712. Both Savery and Newcomen were from the neighbouring county of Devon.

The James Watt / Matthew Boulton steam engine was to dominate the market for steam engines in the United Kingdom from 1775 until the end of the 18th century. It was a more efficient engine than those that went before, and commercially it was protected aggressively by Watt and Boulton using patents, which suppressed other developments in the area until they expired.

Trevithick had a fairly minimal education but seems to have been a very adept calculator. He was a large, strong man with something of a temper, which caused him problems later in life: some of his inventions essentially failed because he fell out badly with his backers or potential customers and stopped work on them. He had a lifelong friendship with Davies Gilbert, who was more scientifically inclined. Trevithick quickly moved to working in the local mines, first as a helper to his father but then in his own right. It’s interesting that steam engines would have been a regular part of the Cornish mining industry for seventy or so years before Trevithick entered the scene; developments were clearly relatively slow until the arrival of the Watt/Boulton engine. The key scientific development in the area, the discovery of latent heat (the energy required to take water from the liquid to the gaseous state), was only published by Joseph Black in 1763.

On railway locomotives, it turned out Trevithick was a little before his time. George Stephenson was to successfully kick off the railway revolution with the Stockton and Darlington Railway in 1825 and the Liverpool and Manchester line in 1829, twenty or so years after Trevithick’s demonstration. Trevithick’s effort suffered from two issues. The first, systematic, issue was Trevithick’s approach, which was to demonstrate many ideas but never to follow them through to successful commercial exploitation. The second, technical, issue was that iron rails at the time were not tough enough to handle the weight of a steam engine and soon fractured. Interestingly, Robert Stephenson, George’s son and a significant railway engineer in his own right, met Trevithick in Colombia in 1826.

Trevithick’s real innovation was in developing a high pressure steam engine, operating at pressures ultimately in excess of 150 psi, compared with the Watt-Boulton engine operating at less than 10 psi. This gave Trevithick a compact and flexible power source that could be used for a variety of purposes and, according to his vision, could actually propel itself to new work. Essentially he had invented the traction engine, which wasn’t to be successfully patented and exploited until the 1860s.

Trevithick moved to London with his family in 1803. He had initially demonstrated his railway locomotive and a road steam carriage there, but he moved on to work on dredging for the new docks, and also on a tunnel under the Thames. He was frustrated that the Admiralty were unwilling to take on any of his ideas. Ultimately nothing came of his London stay, other than a brief bankruptcy. That said, he actually did a pretty good job on a tunnel under the Thames, a task only successfully completed by the Brunels after nearly 20 years of work from 1824.

Soon after returning to Cornwall from London he left again, this time without his family, for Peru, where he had been taken on to supply and install steam engines for the mint in Lima and a mine in Cerro de Pasco. His plans in Peru were foiled by revolution. He then moved on to Costa Rica, where he started a pearl-fishing business using a diving bell he had designed a few years earlier. He also attempted to start a gold mine but was unable to raise sufficient finance.

He died in 1833, 6 years after having returned from South America.

I’ve not mentioned Trevithick’s threshing machine, his ideas for steam-powered boats, or his idea of using iron containers to carry liquids on boats!

I found this book fascinating. I’ve previously read books on Thomas Telford, George and Robert Stephenson, Matthew Boulton, Isambard Kingdom Brunel, and William Armstrong, who collectively span the Industrial Revolution in England; Trevithick fits into the earlier part of this story.

It has led me to wonder a little about being “before their time”. This was very apparent in the Trevithick story, with so many of his ideas only coming to fruition decades after he died. Was he exceptional, or is this not so uncommon, and we simply don’t hear about those whose ideas required other developments for them to work? The names that have been prominent from the Industrial Revolution are those who not only invented but were also commercially successful, at least some of the time, leaving lasting monuments to their ideas.

Book review: The Wood Age by Roland Ennos

My first book of 2023 is The Wood Age: How wood shaped the whole of human history by Roland Ennos, a history of wood and human society.

The book is divided into four parts: “pre-human” history, up to the industrial era, the industrial era, and “now and the future”.

Part one covers our ancestors’ life in the trees and their descent from them. Ennos argues that nest building, as practised by orangutans for example, is a sophisticated and little-recognised form of tool use which involves an understanding of the particular mechanical properties of wood. For the descent from the trees, Ennos sees digging sticks and fire as important. Digging sticks are effective for rummaging roots out of the earth, which is handy if you are moving away from the leaves and fruits of the canopy. Wood becomes harder with drying (hence making better digging sticks), and the benefits of cooking food with (wood-based) fire are well-reported. The start of the controlled use of fire is unknown but could be as long ago as 2,000,000 years. Ennos attributes the final step, hair loss in humans, to the ability to build wooden shelters; this seems rather far-fetched to me. I suspect this part of the book is the most open to criticism, since it covers a period well before writing, with very little fossilised evidence of the key material, wood.

The pre-human era featured some use of tools made from wood, and this continued into the “stone” age, but on the whole wood is poorly preserved over even thousands of years. The oldest wooden tool discovered dates to 450,000 years ago: a spear found in Essex. The peak of tool making in the Neolithic is the bow and arrow, as measured by the number of steps and materials required.

The next part of the book covers the period from the Neolithic through to the start of the Industrial Revolution. In this period ideas about farming spread to arboriculture, with the introduction of coppicing, which produces high yields of firewood and of wood for wicker, a new way of crafting with wood. There is some detailed discussion of how wood burns, and how the introduction of charcoal, which burns hotter, was essential to the success of the “metal” ages and to the progression from earthenware pottery (porous and weak) to stoneware, which is basically glassy and requires a firing temperature of over 1,000 Celsius. As an aside, I found it jarring that Ennos quoted all temperatures in Fahrenheit!

This section has the air of describing a technology tree in a computer game. The ability to make metal tools (initially copper, then bronze, then iron, then steel) opens up progressively better tools and more ways of working with wood, like sawing planks, which can be used to make better boats than those constructed by hollowing out logs or splitting tree trunks. Interestingly, the boats made by the Romans were not surpassed in size until the 17th century.

Wheels turn out to be more complicated than I first thought: slicing a tree trunk into disks doesn’t work because the disks split in use (and in any case cutting cleanly across the grain of wood is hard without a steel-bladed saw). The first wheels, three planks cut into a circle and held together with battens, are not great. The peak of wheel building is the spoked wheel, which requires a steam-bent rim, turned spokes and a turned central hub, with moderately sophisticated joints. Ennos argues that the reason South America never really took to wheels, and the Polynesians did not build plank-built boats, was a lack of metals suitable for making tools.

Harder steel tools also enabled the carpentry of seasoned timber, which is better for making furniture than green wood, since green wood splits and deforms as it dries.

Ultimately the use of wood was not limited by the production of wood but rather by transport and skilled labour. The Industrial Revolution picks up when coal becomes the fuel of choice – making manufacturing easier, and allowing cities to grow larger.

The final substantive part of the book covers the Industrial Revolution up to the present. This is largely the story of the replacement of wood as fuel with coal, of wood as charcoal (used in smelting) with coke (which is to coal what charcoal is to wood), and of many small wooden items with metal, ceramic, glass and more recently plastic. It is not a uniform story, though: England moved to coal as a fuel early in the 19th century, driven by an abundance of coal, a relative shortage of wood, and the growth of large cities. Other countries in Europe and the US moved more slowly. The US built its railways with wooden infrastructure (bridges and sleepers), rather than the stone used in Britain, for a much lower cost. The US still tends to build domestic buildings in wood. The introduction of machine-made nails and screws in the late 18th century makes construction in wood a lower-skilled activity. Paper based on wood was invented around 1870, making newspapers and books much cheaper.

In the 21st century wood and processed-wood like plywood or chipboard are still used for many applications.

The final part of the book is a short look into the future, mainly from the point of view of reforestation. I found this a bit odd, because it starts by complaining about the “deforestation myth” but then goes on to outline times when humans caused significant deforestation and soil erosion!

Ennos sees wood as an under-reported factor in the evolution of humanity, but authors often feel their topic is under-reported. I suppose this is inevitable since these are people so passionate about their topic that they have devoted their energy to writing a whole book about it.

This is a nice read, not too taxing but interesting.

Review of the year: 2022

Chester Cathedral on Christmas Eve

As is traditional here, I present an annual review of my blog, which is largely composed of book reviews but this year also includes some technical posts, written as I learnt some new software engineering skills.

In book terms I started the year with Natives by Akala, his autobiography, which fits into the Black Lives Matter theme I started in the previous year. Railways and the Raj by Christian Wolmar also has something of this air: the way the British ran the Raj, and the violence that followed Partition, are a salutary lesson.

I read a couple of books about scripts: one specifically focussed on Chinese script, Kingdom of Characters by Jing Tsu, and a second, very short book on all scripts, Writing and Script – A Very Short Introduction by Andrew Robinson.

From a technical point of view, I read Felienne Hermans’ The Programmer’s Brain, which definitely provided a lot of food for thought, Software Design Decoded by Marian Petre and André van der Hoek, and Data Mesh by Zhamak Dehghani. The topic of this last book, the data mesh, has been a central theme of my work this year.

My favourite book of the year was Pale Rider – The Spanish Flu of 1918 by Laura Spinney, which was written before the covid pandemic. It was interesting to see the differences (no effective vaccines, or even a clear understanding of viruses) and the similarities (arguments over schools remaining open). I also read The Art of More by Michael Brooks, a history of maths; it turns out accounting and bureaucracy were important drivers in the invention of maths. The last book of the year was Dutch Light by Hugh Aldersey-Williams, a biography of Christiaan Huygens and the second I have read.

On a more general history front I read Ask a Historian by Greg Jenner and Curious devices and mighty machines by Samuel J.M.M. Alberti, which is about science museums.

I continue to learn to play the guitar, and Play it Loud by Brad Tolinski and Alan Di Perna fits in with this: it is a history of the electric guitar, broader than The Birth of Loud by Ian S. Port, which I read a few years ago. I have stopped learning to play the (electronic) drums.

My posting this year was a bit more varied than it has been for a while. I started a thread of technical posts, written as I clarified my thinking for a project I am working on at work. One of them, Understanding setup.py, setup.cfg and pyproject.toml in Python, has been my most popular blog post by a large margin and boosted traffic to my blog to its highest level ever! That’s not to say traffic is particularly high: I had about 20,000 visitors this year. Versioning in Python was in a similar vein, technical information about some very specific technology. A way of working: data science and Software engineering for Data Scientists were a bit more general and philosophical, and have received rather less traffic.

In the summer the whole family joined Chester’s Midsummer Parade as pirates, which was a great deal of fun.

Thomas, Sharon and I (from the right) with two other pirates!

On the holiday front, we went to Ambleside in the Lake District for a week in July. The photos below are from Allan Bank by Grasmere, an exceedingly relaxed National Trust property. I was impressed by my new phone’s ability to take reasonable photos through windows; normally the inside of the room would be under-exposed. The photo album for the trip, with many more photographs, is here.

We also went to Dorset, where I grew up, in October, stopping off at the gardens at Stourhead on the way down (pictured below). I scattered the ashes of my dad and stepmother with my stepbrothers in the New Forest. I was surprised by how much ash was involved: about the quantity of a large bag of flour for each of them. Dad would have been proud that two parties, navigating from an X on an Ordnance Survey map, converged from two directions on the same location in the middle of the Forest, but probably less impressed by my getting lost in a bog on the way back! Although, as Mrs H said, getting lost having said a final farewell to my dad was rather symbolic. I posted a eulogy for my dad here.

More photos from Dorset, including the Tank Museum, Monkey World and the Slimbridge Wetland Centre on the way back, here.

The Winter brought more entertainment. On the left you see me in my suit for the office Christmas Party. It is difficult to appreciate the sparkliness of the shoes, but they are still out since I enjoy seeing them sparkle. On the right is the chief Roman from Chester’s Saturnalia celebration.

We all got covid earlier in the year, and I still haven’t got back to my former running form of 10km in 50 minutes; I can only manage 3km in 15 minutes and struggle to run much further without post-exercise malaise setting in. My Garmin running watch generously tells me I still have the body of a 31-year-old, 21 years younger than my calendar age!

I’ve had quite a lot of counselling for anxiety this year, featuring Eye Movement Desensitization and Reprocessing (EMDR), which I insisted on referring to as “disco lights”. It appears to have worked to some degree, although in the depths of winter, when I’m not doing anything that induces anxiety, it is difficult to tell.