Nabokov’s Favourite Word Is Mauve by Ben Blatt explores language through numbers. To set the scene, Blatt discusses the attribution of The Federalist Papers, a set of essays written anonymously by one or more of Alexander Hamilton, James Madison and John Jay in support of ratifying the new US Constitution. The authorship of most of the essays was settled, but twelve were claimed by both Hamilton and Madison. The problem was solved in 1963 by Frederick Mosteller and David Wallace, who compared the frequencies of different words in the essays with their frequencies in writings known to be by each of the three authors. They concluded that Madison had written all twelve of the disputed essays. An example of their approach: Madison used the word “whilst” in many of his known works but never the word “while”, whereas Hamilton never used “whilst”. Combining the frequencies of a number of such words provides a fingerprint of an author’s writing style. What struck me was that the “fingerprint” words are ordinary, everyday words rather than anything distinctive.
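To make the fingerprint idea concrete, here is a minimal sketch that assumes the marker-word rates (uses per 1,000 words) have already been counted. It is not Mosteller and Wallace’s method, which was a much more careful Bayesian analysis; it swaps in a simple nearest-profile comparison using Euclidean distance. The rates in `PROFILES`, the choice of marker words and the function name `closest_author` are all invented for illustration.

```python
import math

# Invented marker-word rates (uses per 1,000 words) for illustration only;
# these are NOT Mosteller and Wallace's measured figures.
PROFILES: dict[str, dict[str, float]] = {
    "Hamilton": {"upon": 3.0, "whilst": 0.0, "while": 0.3, "enough": 0.6},
    "Madison": {"upon": 0.2, "whilst": 0.5, "while": 0.0, "enough": 0.0},
}


def closest_author(unknown: dict[str, float],
                   profiles: dict[str, dict[str, float]]) -> str:
    """Return the author whose marker-word rates are nearest (Euclidean) to `unknown`."""
    words = sorted(unknown)  # fix an order so the two vectors line up word by word

    def distance(profile: dict[str, float]) -> float:
        return math.dist([unknown[w] for w in words],
                         [profile.get(w, 0.0) for w in words])

    return min(profiles, key=lambda author: distance(profiles[author]))


# A hypothetical disputed essay: little "upon", some "whilst", no "while".
disputed = {"upon": 0.3, "whilst": 0.4, "while": 0.0, "enough": 0.1}
print(closest_author(disputed, PROFILES))  # Madison
```

No single word decides the answer; the distance pools the evidence from every marker word, which is the sense in which the frequencies combine into a fingerprint.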
In the sixties this kind of frequency analysis was exceedingly tedious: Mosteller and Wallace physically cut up the essays and made little piles of words in order to count them! Such heroic manual analysis was not uncommon across many quantitative sciences before computers became widely available. These days it is straightforward. The full texts of many books can be downloaded freely, and programming libraries such as the Natural Language Toolkit (NLTK) for Python provide functions for word counting and more sophisticated analyses.
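As a rough illustration of how little code the counting now takes, the sketch below tokenises a text and counts words. NLTK’s `FreqDist` class does this kind of counting, but this version sticks to the Python standard library so that it runs without downloading any tokeniser data. The crude tokeniser (runs of letters and apostrophes) and the sample sentence are assumptions made for the example.

```python
import re
from collections import Counter


def word_counts(text: str) -> Counter:
    """Lower-case the text and count word tokens (runs of letters and apostrophes)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def rate_per_thousand(counts: Counter, word: str) -> float:
    """Occurrences of `word` per 1,000 tokens; a missing word counts as zero."""
    total = sum(counts.values())
    return 1000 * counts[word] / total if total else 0.0


sample = "Whilst the debate went on, the states waited whilst Congress argued."
counts = word_counts(sample)
print(counts["whilst"], counts["while"], sum(counts.values()))  # 2 0 11
print(round(rate_per_thousand(counts, "whilst"), 1))  # 181.8
```

Rates per 1,000 words, rather than raw counts, are what make texts of different lengths comparable.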
Blatt takes the opportunity to extend word-counting analysis to more topics and a much larger collection of texts. These include bestselling novels, fan fiction, classic fiction, and US and UK English corpora (large bodies of expertly selected text). The books are all in English, a few of them in translation, and the selection is biased towards the US market.
The topics covered include: the overuse of adverbs, particularly those ending in -ly; “he” vs “she”, and how male authors sometimes write almost entirely without mentioning “she” whilst even the most extreme female authors still use “he” about 20% of the time; differences between US and UK writers, which come down to blokes, blimey and brilliant; and how the reading age of popular fiction has dropped over the years. Here there is a diversion into Dr Seuss’ The Cat in the Hat and its 220-word vocabulary, given to Dr Seuss by Rudolf Flesch, who was interested in readability. In fact, I have recently used the Flesch-Kincaid readability index, which Flesch helped develop.
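Three of these measures can be sketched in a few lines each. The Flesch-Kincaid grade level uses the standard formula 0.39 × (words/sentences) + 11.8 × (syllables/words) − 15.59. Everything else here is an assumption for illustration rather than Blatt’s procedure: the syllable counter is a crude vowel-group heuristic, the -ly test also catches non-adverbs such as “only” and “family”, and the per-10,000-words scaling and the function names are my own choices.

```python
import re
from typing import Optional


def words(text: str) -> list[str]:
    """Lower-cased word tokens (runs of letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def he_share(text: str) -> Optional[float]:
    """Fraction of he/she pronouns that are "he"; None if neither appears."""
    ws = words(text)
    he, she = ws.count("he"), ws.count("she")
    return he / (he + she) if he + she else None


def ly_per_10k(text: str) -> float:
    """Words ending in -ly per 10,000 words (crude: also counts "only", "family")."""
    ws = words(text)
    return 10_000 * sum(w.endswith("ly") for w in ws) / len(ws) if ws else 0.0


def syllables(word: str) -> int:
    """Heuristic syllable count: vowel groups, minus a silent final 'e'."""
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and n > 1:
        n -= 1
    return max(n, 1)


def fk_grade(text: str) -> float:
    """Flesch-Kincaid grade: 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59."""
    ws = words(text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not ws or not sentences:
        raise ValueError("need at least one word and one sentence")
    return (0.39 * len(ws) / len(sentences)
            + 11.8 * sum(syllables(w) for w in ws) / len(ws)
            - 15.59)


print(he_share("He said she would go, and he left."))  # 0.6666666666666666
print(ly_per_10k("He quickly and quietly left."))  # 4000.0
print(round(fk_grade("The cat sat on the mat."), 2))  # -1.45
```

Very simple text can score below zero, as the last line shows; the grade-level formula is calibrated for ordinary prose.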
The title of the book comes from an analysis of authors’ favourite words, those which they use significantly more often than other authors do. Nabokov is an interesting case since he uses colour words in general significantly more often than other authors. This is likely linked to his synaesthesia, which he wrote about. Ray Bradbury, on the other hand, is a fan of “cinnamon”, whilst Michael Connelly likes “nodding” and its variants. The chapter on favourite words also covers repeated words and clichés. Blatt is not judgemental about these habits; sometimes they have a dramatic effect.
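One way to find words an author uses “significantly more often” is to compare each word’s rate in the author’s books with its rate in a reference corpus. This is a minimal sketch under my own assumptions, not Blatt’s exact criteria: the add-one smoothing, the `min_count` and `top` parameters, the function name `favourite_words` and the toy counts are all hypothetical.

```python
from collections import Counter


def favourite_words(author: Counter, reference: Counter,
                    min_count: int = 5, top: int = 3) -> list[str]:
    """Rank an author's words by how much more often they appear than in a reference corpus."""
    author_total = sum(author.values())
    reference_total = sum(reference.values())
    if not author_total or not reference_total:
        return []

    def ratio(word: str) -> float:
        # Add-one smoothing: a word missing from the reference corpus gets
        # a small nonzero rate instead of causing a division by zero.
        author_rate = author[word] / author_total
        reference_rate = (reference[word] + 1) / reference_total
        return author_rate / reference_rate

    # Ignore words the author uses only a handful of times.
    candidates = [w for w, c in author.items() if c >= min_count]
    return sorted(candidates, key=ratio, reverse=True)[:top]


# Toy counts, invented for illustration.
author = Counter({"the": 50, "said": 10, "cinnamon": 6, "rocket": 2})
reference = Counter({"the": 600, "said": 100, "cinnamon": 1, "rocket": 1})
print(favourite_words(author, reference))  # ['cinnamon', 'said', 'the']
```

The minimum-count filter matters: without it, a word the author happens to use once or twice, and which is rare elsewhere, could top the list on very little evidence.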
Almost as an aside, Blatt reveals some of the commercial side of the publishing industry. I was struck by the “Big Name Author with …” phenomenon, where a big-name author such as James Patterson or Tom Clancy publishes with a lesser-known or unknown author. Analysis along the lines of Mosteller and Wallace shows that these co-authors write the books, with the Big Name providing story outlines (Patterson is open about this). Another example is the Stratemeyer Syndicate, which produced The Hardy Boys and Nancy Drew series that I recall from my childhood in the seventies. These books appear under a named author, but that author is a fiction: the books were written to a formula by a variety of writers, over more years than a single living author could manage. Finally, there is the phenomenon of the gigantic author credit on the front cover. Stephen King suffered from this: his name covered only 3% of the front cover of his first book, but by the end of the eighties it approached 40%!
The book finishes with an analysis of first and last lines.
The book’s emphasis is very much on the numbers, with a fairly cursory examination of the reasons behind them. That said, it is an easy and thought-provoking read.