Book review: Data Points by Nathan Yau

I picked up Data Points by Nathan Yau because it was recommended as a book on exploratory data analysis in Storytelling with data. I have previously read Nathan Yau’s book Visualize This.

Visualize This was very focussed on the technical side of producing data visualisations, with code samples and so forth. Data Points is a “bigger picture” book divided into three sections: context, exploration and presentation.

Context can be summarised as: who, how, what, when, where, why. Context is covered explicitly in the first chapter, using Yau’s wedding photos as an example. Spinning off from here is a mention of the Quantified Self movement. There was a time a few years ago when this was popular – people would record aspects of their lives in great detail and build visualisations from them. This was enabled by the growth of the first generations of smartphones, which made this sort of data collection easy. Yau points out – and I think all data scientists can agree with this – that most of the job is actually collecting together the data required for a project and getting it into a shape to visualise.

The “Exploration” chapters start with an overview of what a data visualisation is. One of the strengths of this book is the many examples of visualisations, in this case going as far back as William Playfair in 1786 with the invention of the bar chart. This chapter also highlights that a data visualisation can be a flow chart, or it can be an abstract piece of art which is based on data. Yau cites John Tukey’s Exploratory Data Analysis a number of times. It was published in the 1970s, at a time when the author felt the need to explain that a “bold” effect can be achieved using a pen rather than a pencil. The point is that readily available software now gives us immense power to produce, at the click of a button, visualisations which would have taken an expert many hours of manual labour in the relatively recent past.

The next chapters provide a summary of how we build a data visualisation starting with the fundamental building blocks: title, visual cues (the data), coordinate system, scale and context elements. The visual cues are further broken down into attributes like position, length, angle, direction, shapes and so forth.
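As a rough illustration of how these building blocks fit together (this is not taken from the book), here is a minimal sketch in Python that draws a bar chart as plain text. The function name `text_bar_chart`, its parameters and the example figures are all made up for the purpose; the comments mark where the title, visual cue, coordinate system, scale and context elements appear.

```python
def text_bar_chart(title, data, width=20, unit=""):
    """Render a horizontal bar chart as plain text.

    title -> the chart title
    data  -> mapping of category label to a non-negative value
    width -> number of characters used for the longest bar (the scale)
    unit  -> optional unit appended to each value (a context element)
    """
    if not data:
        raise ValueError("data must not be empty")

    largest = max(data.values())
    label_width = max(len(label) for label in data)

    # Title: tells the reader what the chart is about.
    lines = [title, "-" * len(title)]

    # Coordinate system: one row per category, value running left to right.
    for label, value in data.items():
        # Scale: map the data value onto a bar length in characters.
        length = round(value / largest * width) if largest else 0
        # Visual cue: the length of the bar encodes the value.
        bar = "#" * length
        # Context: category label on the left, raw value on the right.
        lines.append(f"{label.ljust(label_width)} | {bar} {value}{unit}")

    return "\n".join(lines)


if __name__ == "__main__":
    # Invented example data, purely for illustration.
    books = {"2021": 12, "2022": 18, "2023": 9}
    print(text_bar_chart("Books read per year", books, unit=" books"))
```

Swapping the visual cue (say, position instead of length) or the scale (logarithmic instead of linear) would change only one or two lines, which is roughly the point of breaking a chart down into components in this way.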

Once this groundwork has been done, there is an extensive taxonomy of chart types, including more esoteric plots such as the cartogram (where geographic areas are distorted to show the relative sizes of variables), and radar or polar plots which, along with calendar heatmaps, are useful for showing periodic timeseries data.

The “Visualising with clarity” chapter starts to talk about presentation, and how the purpose of visualisations is to allow comparisons. The useful takeaway from this chapter for me was that distribution plots are rather more difficult for the lay viewer to interpret than practitioners realise.

I found the penultimate chapter on “Designing for an audience” a little brief. A handy hint here was to design presentations for the audience at the back of the room – nobody likes to hear “this is probably too small for you to see” from a speaker. Another useful tip for making interactive presentations is that people like to find out about themselves, so if you have data on people then make it easy for viewers to “look themselves up” because that’s the first thing they are going to do.

The book finishes with a chapter on technologies. Some of them, such as R, Adobe Illustrator, Microsoft Excel, Google Sheets and Tableau, are still around and remain good choices. Yau’s favoured combination is R, with Adobe Illustrator used to polish the results. The JavaScript library Data-Driven Documents (D3) and Processing are still active. Other systems, like IBM’s Many Eyes project and MapBox’s TileMill, have disappeared. The JavaScript libraries Raphael and the JavaScript InfoVis Toolkit appear dormant, in the sense that the activity on their GitHub repositories is minimal. Nobody talks about Flash and ActionScript anymore.

Data Points is much more a book about exploratory data visualisation than Storytelling with data. I think Yau believes that exploratory data analysis is an exercise in storytelling. The strength of this book is the wide range of examples used to illustrate the points being made. The style is chatty and it is not a difficult read. It is less focussed on delivering specific lessons in making data visualisations than Storytelling with data.