Tag: data visualisation

Book review: Storytelling with you by Cole Nussbaumer Knaflic

I recently reviewed Cole Nussbaumer Knaflic’s Storytelling with data; as a result, the storytelling team sent me a copy of Storytelling with you to review. Storytelling with you is the next step in the journey which started with Storytelling with data, widening the scope to cover the whole process of presenting, from inception to delivery, rather than being concerned specifically with presenting data.

I’m a data scientist, previously an academic and then an industrial research scientist. Presenting has been a constant throughout my career, both as an audience member and as a presenter. Yet it is something in which I have had relatively little training – and, given the quality of the presentations I have witnessed, I am not alone!

Those with a scientific background will be used to a standard way of presenting results that effectively replicates a scientific paper (introduction, methodology, results, discussion, conclusions). Knaflic’s earlier book proposed a break from this format: using ideas from storytelling to shape presentations. She cites Resonate by Nancy Duarte as a reference for this approach. Storytelling with you is similar in content to Resonate but feels like a shorter, more focussed book.

The book is divided into three parts: plan, create and deliver. Each part comprises four chapters. Each chapter ends with an instalment of the “TRIX Case study”. TRIX is a trail mix product which requires revision; the presentation is about options for this revision. I really liked this: it enables Knaflic to provide examples of the material in each chapter without having to restate the context for each new outing. I have learnt that macadamia nuts are really important to the TRIX mix!

Planning starts with the audience, not the content. Who are they? What do they want? I find LinkedIn is great for getting a quick view of audience members. In terms of content, the plan starts with the Big Idea – the sentence that captures what the presentation is about. This is expanded into a full story using a storyboard based on Post-its.

Knaflic is keen on Post-its for planning and organising material. My tendency when creating a presentation is to open up a PowerPoint file, but this forces me into choices on format and so forth that I don’t need to make at the beginning. There is also the challenge of being unwilling to delete slides so carefully and laboriously created!

The section on the theory of storytelling is quite brief. One takeaway for me was to think of the children’s books you know as templates for storytelling. Over the last 10 years or so I have read a lot to my son, so I am very familiar with a range of children’s books. I like Dr Seuss and Julia Donaldson’s books – The Gruffalo, for example. Not only do they provide a template for stories, they are designed to be read aloud and so provide some ideas for delivery. For fun, you can even think about your presentation in the style of Dr Seuss!

The create section is very practical, including a walkthrough of how to use PowerPoint’s Slide Master – I found this welcome since, whilst I am aware of master slides, my use of them is rather primitive. It also talks about font selection – picking a font which has a distinct bold form – and colour selection.

The appendix containing the completed slides for the TRIX case study is quite telling when I compare them to my own: the case study slides contain far less text and effectively no bullet points. The story of the presentation is read from the titles, which summarise the slide they sit on rather than indicating the function of the slide.

In terms of content, I found the section on images most interesting: corporate templates tend to have a bunch of images included, and I always feel the need to add an image to each slide – which is wrong.

There is a substantial section on delivery. I found the part on introducing yourself quite striking: it talks about picking out the characteristics you wish to present and relating anecdotes that support them. I found this a bit calculated but realise I probably do this intuitively – I am notorious for my anecdotes!

I was bemused by the vision of Knaflic striking power poses in conference centre restrooms in preparation for presenting! She provides a lot of detail on how she prepares to deliver a presentation. I learnt long ago that practising the opening is very important; I find it helps me to relax. Knaflic points out that practising your ending is equally important – it sends your audience off into action.

In common with Resonate and Storytelling with data, the assumption is that you are preparing for a high-stakes meeting and are going to commit a lot of time to this process. Typically I find I make lots of low-stakes presentations, so there is a degree to which I would adapt the lessons in this book to that scenario. In fact the storytelling team have recognised this, and produced a blog post on a reduced process.

If you’re looking for a readable guide to planning, creating and delivering presentations then this is the book for you!

Book review: Data Points by Nathan Yau

I picked up Data Points by Nathan Yau as a recommended book on exploratory data analysis in Storytelling with data. I have previously read Nathan Yau’s book Visualize This.

Visualize This was very focussed on the technical side of producing data visualisations, with code samples and so forth. Data Points is a “bigger picture” book divided into three sections: context, exploration and presentation.

Context can be summarised as: who, how, what, when, where, why. Context is covered explicitly in the first chapter using the medium of Yau’s wedding photos as an example. Spinning off from here is a mention of the Quantified Self movement; there was a time a few years ago when this was popular – people would record aspects of their lives in great detail and build visualisations from them. This was enabled by the growth of the first generations of smartphones, which made this sort of data collection easy. Yau points out – and I think all data scientists can agree with this – that most of the job is actually collecting together the data required for a project and getting it into a shape to visualise.

The “Exploration” chapters start with an overview of what a data visualisation is. One of the strengths of this book is the many examples of visualisations, in this case going as far back as William Playfair in 1786 with the invention of the bar chart. This chapter also highlights that a data visualisation can be a flow chart, or it can be an abstract piece of art based on data. Yau cites John Tukey’s Exploratory Data Analysis a number of times; it was published in the 1970s, at a time when the author felt the need to explain that a “bold” effect can be achieved using a pen rather than a pencil. The point being that we now have immense power in readily available software to produce visualisations at the click of a button which would have taken an expert many hours of manual labour in the relatively recent past.

The next chapters provide a summary of how we build a data visualisation starting with the fundamental building blocks: title, visual cues (the data), coordinate system, scale and context elements. The visual cues are further broken down into attributes like position, length, angle, direction, shapes and so forth.

Once this groundwork has been done, there is an extensive taxonomy of chart types, including more esoteric plots such as the cartogram (where geographic areas are distorted to show the relative sizes of variables), and radar or polar plots which, along with calendar heatmaps, are useful for showing periodic timeseries data.
The “Visualising with clarity” chapter starts to talk about presentation, and how the purpose of visualisations is to allow comparisons. I think the useful takeaway from this chapter for me was that distribution plots are rather more difficult for the lay viewer to interpret than practitioners realise.
I found the penultimate chapter on “Designing for an audience” a little brief. A handy hint here was to design presentations for the audience at the back of the room – nobody likes to hear “this is probably too small for you to see” from a speaker. Another useful tip for making interactive presentations is that people like to find out about themselves, so if you have data on people then make it easy for viewers to “look themselves up” because that’s the first thing they are going to do.

The book finishes with a chapter on technologies. Some of them, such as R, Adobe Illustrator, Microsoft Excel, Google Sheets and Tableau, are still around and remain good choices. Yau’s favoured combination is R, with Adobe Illustrator used to polish the results. The JavaScript library Data Driven Documents (d3) and Processing are still active. Other systems, like IBM’s Many Eyes project and MapBox’s TileMill, have disappeared. The JavaScript libraries Raphael and the JavaScript InfoVis Toolkit appear dormant, in the sense that activity on their GitHub repositories is minimal. Nobody talks about Flash and ActionScript anymore.

Data Points is much more a book about exploratory data visualisation than Storytelling with data; I think Yau believes that exploratory data analysis is an exercise in storytelling. The strength of this book is the wide range of examples used to illustrate the points being made throughout. The style is chatty; it is not a difficult read. It is less focussed on delivering specific lessons in making data visualisations than Storytelling with data.

Book review: Storytelling with data by Cole Nussbaumer Knaflic

This book, Storytelling with data by Cole Nussbaumer Knaflic, fits in with my work and my interests. It relates to data visualisation, an area in which I have read a number of books including The Visual Display of Quantitative Information by Edward R. Tufte, Visualize This by Nathan Yau, Data Visualization: a successful design process by Andy Kirk and Interactive Data Visualization for the web by Scott Murray. These range from the intensely theoretical (Tufte) to the deeply technical (Murray).

Storytelling with data is closest in content to Andy Kirk’s book, and his website is cited in the (very good) additional resources list. A second similarity with Andy Kirk’s book is that Storytelling is “the book of the course” – the book is derived from the author’s training courses.

The differentiating factor with Knaflic’s book is the focus on storytelling, presenting a case to persuade rather than focussing on the production of a data visualisation, although that is part of the process. The book is divided into 6 key lessons, each of which gets a chapter; with a couple of chapters of examples, an introduction and an epilogue, this makes 10 chapters. The six key lessons are:

1. understand the context
2. choose an appropriate visual display
3. eliminate clutter
4. focus attention where you want it
5. think like a designer
6. tell a story

I think I got the most out of the understand the context and tell a story chapters; technically I am quite experienced, but my knowledge is around how to make charts and process the data to make charts rather than telling a story. The understanding the context chapter talks about the “Big Idea” and the “3-minute story”. The Big Idea is the single idea you are trying to get across in a presentation, and the 3-minute story is the elevator pitch – how you would tell your story in 3 minutes. I liked a callout box with a list of verbs (accept, agree, begin, believe…) used to prompt you for what action you want your audience to take having seen your presentation.

The chapter on choosing an appropriate visual display is quite straightforward: Knaflic presents the 12 types of display she finds herself using frequently (which include simple text and text tables). This is a fairly small set, since variations of bar charts – horizontal, vertical, stacked and waterfall – cover off 5 types. This is appropriate: if you are telling a story to persuade, then you don’t want to be spending your time explaining how your esoteric display works. Knaflic steers away from specific technology, only mentioning at the beginning of the book that all the charts shown were made in Microsoft Excel, and that Adobe Illustrator was sometimes used to get a chart looking just right at the end of the process.

There is a list of sins in data visualisation, including the reviled pie chart and 3D plots, but also – perhaps surprisingly – the use of secondary axes to plot data on different scales together.

The chapters on eliminate clutter, focus attention where you want it, and think like a designer are all about making sure that the viewer is paying attention where you want them to pay attention. Some of this is about the Tuftian “eliminate clutter” idea, much of which creeps into charts through default behaviour in software. Some is about using gestalt theories of attention to group items together through similarity, proximity and so forth, and some is about using pre-attentive attributes such as colour and typeface to draw attention to certain elements. This reminded me of The Programmer’s Brain by Felienne Hermans, which links theories of how our brain works with the practices of programming.

The chapter on tell a story introduces some resources on storytelling from playwrights and screenwriters – basically the idea of the three-act play with a setup, conflict and resolution. This is a different way of thinking for me: my presentations tend to follow the traditional structure of a scientific paper, but it is interesting to see the link with creative writing and drama – which is generally excluded from scientific writing.

One of the lessons I learnt from this book was to make better use of chart titles and PowerPoint titles. I tend to go for descriptive chart titles (“Ticket Trend”, to use an example from the book) and PowerPoint titles which simply label a section of a talk (“Methodology”). Knaflic encourages us to use this valuable “real estate” in a presentation for a call to action: “Please Approve the Hire of 2 FTEs”.

The six lessons are reinforced with a chapter which covers a single worked example from beginning to end, and another chapter of case studies which looks at fixing particular issues with single charts.

I enjoyed this book; it’s beautifully produced and fairly easy reading. It also led me to buy two more books, Resonate by Nancy Duarte and Data Points by Nathan Yau, and so the “to be read” pile grows again!

Book review: The Information Capital by James Cheshire and Oliver Uberti

Today I review The Information Capital by James Cheshire and Oliver Uberti – a birthday present. This is something of a coffee table book containing a range of visualisations pertaining to data about London. The book has a website where you can see what I’m talking about (here), and many of the visualisations can be found on James Cheshire’s mappinglondon.co.uk website.

This type of book is very much after my own heart – see, for example, my visualisation of the London Underground. The Information Capital isn’t just pretty: the text is sufficient to tell you what’s going on and to help you find out more.

The book is divided into five broad themes “Where We Are”, “Who We Are”, “Where We Go”, “How We’re Doing” and “What We Like”. Inevitably the majority of the visualisations are variants on a coloured map but that’s no issue to my mind (I like maps!).

Aesthetically I liked the pointillist plots of the trees in Southwark: each tree gets a dot, coloured by species, and the collection of points marks out the roads and green spaces of the borough. The Twitter map of the city, with the dots coloured by the country of origin of the tweeter, is in a similar style, with a great horde evident around the heart of London in Soho.

The visualisations of commuting look like thistledown, white on a dark blue background, and as a bonus you can see all of southern England, not just London. You can see it on the website (here). A Voronoi tessellation showing the capital divided up by the area of influence of (or at least the distance to) different brands of supermarket is very striking. To the non-scientist this visualisation probably has a Cubist feel to it.

Some of the charts are a bit bewildering; for instance, a tree diagram linking wards by the prevalent profession is confusing, and the colouring doesn’t help. The mood of Londoners is shown using Chernoff faces, based on data from the ONS, who have been asking questions on life satisfaction, purpose, happiness and anxiety since 2011. At first glance this chart is difficult to read, but the legend clarifies things, and we discover that people are stressed, anxious and unhappy in Islington but perky in Bromley. You can see this visualisation on the website of the book (here).

The London Guilds as app icons chart is rather nice; there’s not a huge amount of data in it, but I was intrigued to learn that guilds are still being created, the most recent being the Art Scholars, created in February 2014. Similarly, the protected views of London chart is simply a collection of watercolour vistas.

I have mixed feelings about London: it is packed with interesting things and has a long and rich history. There are even islands of tranquillity – I enjoyed glorious breakfasts on the terrace of Somerset House last summer and lunches in Lincoln’s Inn Fields. But I’ve no desire to live there. London sucks everything in from the rest of the country; government sits there, and siting civic projects outside London seems a great and special effort for it. There is an assumption that you will come to London to serve. The inhabitants seem to live miserable lives, with overpriced property and hideous commutes; these things are reflected in some of the visualisations in this book. My second London Underground visualisation measured the walking time between Tube station stops, mainly to help me avoid that hellish place at rush hour. There is a version of such a map in The Information Capital.

For those living outside London, The Information Capital is something we can think about implementing in our own area. For some charts this is quite feasible, based as they are on government data which covers the nation, such as the census or GP prescribing data. Visualisations based on social media are likely also doable, although they will lack weight of numbers. The visualisations harking back to classics, such as John Snow’s cholera map or Charles Booth’s poverty maps, are more difficult, since there is no comparison to be made in other parts of the country. And other regions of the UK don’t have Boris Bikes (or Boris, for that matter) or the Millennium Wheel.

It’s completely unsurprising to see Tufte credited in the end papers of The Information Capital. There are also some good references there for the history of London, places to get data and data visualisation.

I loved this book; it’s full of interesting and creative visualisations – an inspiration!

Inordinately fond of beetles… reloaded!


This post was first published at ScraperWiki.

Some time ago, in the era before I joined ScraperWiki, I had a play with the Science Museum’s object catalogue. You can see my previous blog post here. It was at a time when I was relatively inexperienced with the Python programming language and had no access to Tableau, the visualisation software. It’s a piece of work I like to talk about when meeting customers, since it’s interesting and I don’t need to worry about commercial confidentiality.

The title comes from a quote by J.B.S. Haldane, who was asked what his studies in biology had told him about the Creator. His response was that, if He existed, then He was “inordinately fond of beetles”.

The Science Museum catalogue comprises three CSV files containing information on objects, media and events. I’m going to focus on the object catalogue, since it’s the biggest one by a large margin – 255,000 objects in a 137MB file. Each object has an ID number, which often encodes the year in which the object was added to the collection; a title; some description; often an “item name”, which is a description of the type of object; sometimes information on the date made, the maker and measurements; and an indication of whether it represents part or all of an object. Finally, the objects are labelled according to which collection they come from and which broad group in that collection; the catalogue contains objects from the Science Museum, National Railway Museum and National Media Museum collections.
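As a rough sketch of working with rows of this shape, here is how one might pull the accession year out of the ID field. The column names and sample rows are made up for illustration – the real CSV headers may differ.

```python
import csv
import io
import re

# Hypothetical sample rows in the shape described above; the real
# Science Museum CSV uses its own column names and ID formats.
SAMPLE = """id,title,item_name,collection
1878-12,Model locomotive,locomotive model,Science Museum
1925-343,Specimen bottle,specimen bottle,Science Museum
Y1980.45,Railway poster,poster,National Railway Museum
"""

def accession_year(object_id):
    """Extract a plausible accession year from an object ID, if present.

    Many IDs encode the year the object entered the collection
    (e.g. '1925-343'); for IDs with no recognisable year, return None.
    """
    match = re.search(r"(1[6-9]\d\d|20\d\d)", object_id)
    return int(match.group(1)) if match else None

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
years = [accession_year(row["id"]) for row in rows]
```

The regex simply looks for the first four-digit run that resembles a year, which is about as much structure as an uncontrolled ID field reliably offers.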

The problem with most of these fields is that they don’t appear to come from a controlled vocabulary.

Dusting off my 3-year-old code, I was pleased to discover that the SQL I had written to upload the CSV files into a database worked almost first time, bar a little character encoding. The Python code I’d used to clean the data, do some geocoding, analysis and visualisation was not in such a happy state. Or rather, having looked at it, I was not in such a happy state. I had paid no attention to PEP 8, the Python style guide, used no source control, written no tests, and I was clearly confused as to how to save a dictionary (I pickled it).
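For a plain mapping of strings and numbers, JSON is a simpler and more portable way to save a dictionary than pickling it. A minimal sketch (the cache contents here are illustrative, not from the project):

```python
import json

# An illustrative dictionary of the kind the project might save,
# e.g. a cache of geocoded place names.
geocoded = {"London": [51.5074, -0.1278], "Manchester": [53.4808, -2.2426]}

# json.dump writes human-readable text; unlike pickle, the file can be
# inspected, diffed under source control, and read from other languages.
with open("geocoded.json", "w") as f:
    json.dump(geocoded, f, indent=2)

with open("geocoded.json") as f:
    restored = json.load(f)
```

Pickle remains the right tool for arbitrary Python objects, but for simple data it mostly adds opacity.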

In the first iteration I eyeballed the data as a table and identified a whole bunch of stuff I thought I needed to tidy up. This time around I loaded everything into Tableau and visualised everything I could – typically as bar charts. This revealed that my previous clean-up efforts were probably not necessary, since the things I was tidying impacted a relatively small number of items. I did need to repeat the geocoding I had done. I used geocoding to clean up the place of manufacture field, which was encoded inconsistently. Using the Google API via a Python library, I could normalise the place names and get their locations as latitude–longitude pairs to plot on a map. I also made sure I had a link back to the original place name description.
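The normalisation step can be sketched as below. In the real pipeline the lookup would be populated by calls to the Google geocoding API via a Python library; here a hand-made table stands in for it so the example is self-contained, and the place names and coordinates are illustrative.

```python
# Stand-in for a geocoder response cache: maps a lowercased free-text
# place name to (normalised name, latitude, longitude). In practice
# this would be filled by API calls rather than written by hand.
GEOCODE_CACHE = {
    "london": ("London", 51.5074, -0.1278),
    "london, england": ("London", 51.5074, -0.1278),
    "manchester": ("Manchester", 53.4808, -2.2426),
}

def normalise_place(raw_place):
    """Normalise a free-text place name, keeping the original description.

    Returns (original text, normalised name, lat, lon), or None when the
    place is not in the cache.
    """
    hit = GEOCODE_CACHE.get(raw_place.strip().lower())
    if hit is None:
        return None
    name, lat, lon = hit
    return (raw_place, name, lat, lon)

records = [normalise_place(p) for p in ["London, England", "MANCHESTER "]]
```

Keeping the original string in each record preserves the link back to the source description, which matters when the geocoder guesses wrong.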

The first time around I was excited to discover the Many Eyes implementation of bubble charts; this time I realise bubble charts are not so useful, as you can see below in these charts showing the number of items in each subgroup. In a sorted bar chart it is very obvious which subgroup is most common, and what the relative sizes of the subgroups are. I’ve coloured the bars by the major collection to which they belong: red is the Science Museum, green is the National Railway Museum and orange is the National Media Museum.


Less discerning members of ScraperWiki still liked the bubble charts.


We can see what’s in all these collections from the item name field. This is where we discover that the Science Museum is inordinately fond of bottles. The most common items in the collection are posters, mainly from the National Railway Museum, but after that there are bottles, specimen bottles, specimen jars, shops rounds (also bottles), bottle, drug jars, and albarellos (also bottles). This is no doubt because bottles are typically made of durable materials like glass and ceramics, they have been ubiquitous in many milieux, and they may contain many and various interesting things.


Finally, I plotted the place made for objects in the collection. This works by grouping objects by location and then finding the latitude and longitude for each group’s location. I then plot a disk, sized by the number of items originating at that location. I filtered out items whose place made was simply “England” or “London”, since these made enormous blobs that dominated the map.
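The counting and filtering step is straightforward; a minimal sketch, using made-up place names and counts rather than the real catalogue data:

```python
from collections import Counter

# Illustrative place-made values, one per object; in the real data
# these would come from the geocoded place of manufacture field.
places = ["London", "England", "Birmingham", "London", "Glasgow",
          "Birmingham", "England", "England"]

# Count objects per place, then drop the catch-all locations that
# would otherwise dominate the map as enormous blobs.
counts = Counter(places)
EXCLUDE = {"England", "London"}
plot_counts = {place: n for place, n in counts.items()
               if place not in EXCLUDE}
```

Each remaining (place, count) pair then maps to a disk on the chart, with the disk area driven by the count.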




You can see a live version of these visualisations, and more, on Tableau Public.

It’s an interesting pattern that my first action on uploading any data like this into Tableau is to make bar chart frequency plots for each column in the data; this could probably be automated.
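The automation might look something like this: compute value frequencies for every column of a CSV, ready to plot as bars or simply eyeball. The sample data and column names are invented for the example.

```python
import csv
import io
from collections import Counter

# Hypothetical sample in the shape of the object catalogue; the real
# file would be read from disk rather than an inline string.
SAMPLE = """item_name,collection
poster,National Railway Museum
bottle,Science Museum
poster,National Railway Museum
specimen jar,Science Museum
"""

def column_frequencies(csv_text):
    """Return {column name: Counter of values} for CSV data given as text."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return {column: Counter(row[column] for row in rows)
            for column in rows[0].keys()}

freqs = column_frequencies(SAMPLE)
```

Sorting each Counter by count reproduces the sorted bar chart view, one chart per column, which is exactly the first-look pass described above.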

In summary, the Science Museum is full of bottles and posters, and Tableau wins for initial visualisations of a large and complex dataset.