Category: Book Reviews

Reviews of books featuring a summary of the book and links to related material

Book review: Designing Data-Intensive Applications by Martin Kleppmann

Designing Data-Intensive Applications by Martin Kleppmann does pretty much what it says in the title. The book provides a lot of detail on how various types of databases and database functionality work, and how these can be plumbed together to build applications. It is authoritative and well-written. It is reminiscent of Seven Databases in Seven Weeks by Eric Redmond and Jim R. Wilson, in the sense that it provides a broad overview of a range of different data systems which are specialised for different applications. Seven Databases is more concerned with the specifics of particular NoSQL databases, whilst Designing Data-Intensive Applications is concerned with data applications rather than just the underlying database.

The book is divided into three broad sections covering foundations of data systems, distributed data and derived data. Each chapter starts with a cartoon map of the territory, which I thought would be a bit gimmicky but it serves as a nice summary of what the chapter covers particularly in terms of the software available.

The section on data systems talks about reliability, scalability and maintainability before going on to discuss types of database (i.e. relational, graph and so forth) and some of the low-level implementation of data storage systems such as hash indexes and B-trees.

Reliability is about a system returning responses in a timely fashion: Amazon have observed sales drop by 1% for every 100ms of delay, and others have reported a drop in consumer satisfaction of 16% with a 1 second slowdown. The old academic in me twitches at providing these statistics without citing the reference. However, Designing Data-Intensive Applications is heavily referenced.

There is some interesting historical detail, including the IMS database which IBM built for the Apollo space program in the late 1960s (which is still available today), and the CODASYL database model for graph-like databases from a little later. It's interesting to see how some of these models have been revisited recently in light of the advent of fast, large memory in place of slow disk or even tape drives.

I was introduced to databases rather late in my career; they are not really a core part of the scientific computing background I have. Learning the distinction between OLAP (analytics) and OLTP (transactions) databases was useful. Briefly, transactional databases are optimised to work on single rows and provide fairly strong guarantees on transactional integrity. The access pattern for analytics databases is different: typically analytical workflows want to take the contents of an entire column and carry out aggregations and calculations over the whole column. Transactions are not so important on such databases but consistency is important: a query may take a long time to run but it should provide results as if it ran on the database at a single point in time. These workflows are better serviced by so-called column-stores such as Vertica.
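As a rough illustration of the two access patterns described above, here is a minimal Python sketch. The table, field names and values are invented, and plain dictionaries and lists stand in for real storage engines; it only shows why a single-record lookup suits a row layout and a whole-column aggregate suits a column layout.

```python
# Toy illustration only: the table, field names and values are invented.
rows = [
    {"order_id": 1, "customer": "alice", "amount": 20.0},
    {"order_id": 2, "customer": "bob", "amount": 35.5},
    {"order_id": 3, "customer": "alice", "amount": 12.5},
]

# Row-oriented layout (OLTP style): each record is kept together, so
# fetching or updating one order touches a single entry.
rows_by_id = {row["order_id"]: row for row in rows}


def get_order(order_id):
    """Look up one complete record by its unique ID."""
    return rows_by_id[order_id]


# Column-oriented layout (OLAP style): each column is stored as its own
# list, so an aggregate reads only the column it needs.
columns = {
    "order_id": [row["order_id"] for row in rows],
    "customer": [row["customer"] for row in rows],
    "amount": [row["amount"] for row in rows],
}


def total_amount():
    """Aggregate over the whole 'amount' column."""
    return sum(columns["amount"])


print(get_order(2))
print(total_amount())
```

In a real column-store the benefit comes from reading far less data from disk and compressing each column well, which this in-memory toy does not attempt to model.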

The section on distributed data systems covers replication, partitioning, transactions and consensus. The problem with distributed systems is that you never know things have failed forever, and it's difficult to know what order things have happened in. This reminds me a bit of teaching special relativity to physics undergraduates long ago.

It is hard to even be able to rely on timekeeping on servers. I found this a bit surprising: when we put our minds to measuring time we can be incredibly accurate. GPS time signals have an accuracy significantly better than microseconds, yet servers synced well using NTP (Network Time Protocol) achieve something like 100 milliseconds – many thousands of times poorer. And this accuracy is only achieved if everything is configured correctly. This is important because we therefore cannot rely on timestamps to provide a unique order for events across multiple servers, nor can we even rely on timestamps synced with NTP to be always increasing!
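The ordering problem described above can be shown with a small simulation. This is a sketch under stated assumptions: the server names and clock offsets are invented, and the logical counter is a Lamport-style clock, a well-known alternative to wall-clock timestamps that is brought in here for illustration rather than taken from this review.

```python
# Toy simulation: server names and clock offsets are invented for illustration.
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    clock_offset_ms: int  # how far this server's clock is from true time
    logical: int = 0      # Lamport-style logical counter

    def wall_clock(self, true_time_ms):
        """Timestamp this server would record at a given true time."""
        return true_time_ms + self.clock_offset_ms

    def local_event(self):
        """Advance the logical counter for an event on this server."""
        self.logical += 1
        return self.logical

    def receive(self, sender_logical):
        """On receiving a message, move past everything the sender had seen."""
        self.logical = max(self.logical, sender_logical) + 1
        return self.logical


a = Server("a", clock_offset_ms=80)   # clock runs 80 ms fast
b = Server("b", clock_offset_ms=-50)  # clock runs 50 ms slow

# Server a writes at true time 1000 ms; server b reacts to it at 1010 ms.
write_ts = a.wall_clock(1000)  # 1080
react_ts = b.wall_clock(1010)  # 960, apparently before the write

write_lc = a.local_event()      # 1
react_lc = b.receive(write_lc)  # 2, correctly after the write

print(write_ts, react_ts)
print(write_lc, react_lc)
```

Sorting by wall-clock timestamp would place the reaction before the write that caused it, whereas the logical counters preserve the causal order. The counters say nothing about how much real time elapsed.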

The two big themes in terms of databases are transactions and consensus. These are the concepts that provide the best assurance on the integrity of operations and their success over distributed systems. I used the word “assurance” rather than “guarantee” deliberately because reading Designing Data-Intensive Applications it becomes clear that perfection is hard to achieve and there are always trade-offs. It also highlights the problems of the language used to describe features. Some terms are used in different ways in different contexts.

The derived data section starts with praise for the Unix way of piping data between simple command line scripts; Data Science at the Command Line covers this area in much more detail. It then goes on to discuss the MapReduce ecosystem and the differences between batch and stream processing. This feels like a section I will be returning to.
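The idea of chaining small single-purpose steps can be sketched in Python as well as in the shell. The log lines and function names below are invented for illustration; each function plays the role of one stage in a pipeline, and the final count is a batch-style aggregation over the whole input.

```python
from collections import Counter

# Invented sample log lines in the form "<client> <method> <path>".
LOG_LINES = [
    "10.0.0.1 GET /index.html",
    "10.0.0.2 GET /about.html",
    "10.0.0.1 GET /index.html",
    "10.0.0.3 GET /index.html",
    "10.0.0.2 GET /contact.html",
    "10.0.0.4 GET /about.html",
]


def extract_path(lines):
    """Stage 1: pull out the third whitespace-separated field of each line."""
    for line in lines:
        yield line.split()[2]


def count(items):
    """Stage 2: count how often each item occurs."""
    return Counter(items)


def top(counter, n):
    """Stage 3: keep the n most frequent items."""
    return counter.most_common(n)


# Each stage consumes the output of the previous one, like a shell pipeline.
result = top(count(extract_path(LOG_LINES)), 2)
print(result)  # [('/index.html', 3), ('/about.html', 2)]
```

Because the first stage is a generator, lines are processed one at a time rather than loaded all at once, which is loosely analogous to data streaming through a pipe.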

The book finishes with some speculation as to the future of the field. Two thoughts stuck with me. The first is the idea of federated databases: systems which use a common query language to interface with multiple different datastores. The second is that of unbundling functionality so that, for example, data may be stored in a standard SQL database for unique ID based queries and in Elasticsearch for full-text search queries – in some ways these are simply different facets of the same idea.
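A minimal sketch of the unbundling idea follows. It makes loud assumptions: a plain dictionary stands in for the SQL table, a hand-rolled inverted index stands in for the full-text search engine, and all function names are hypothetical. No real database or Elasticsearch API is used.

```python
from collections import defaultdict

documents_by_id = {}               # stand-in for a SQL table keyed by unique ID
inverted_index = defaultdict(set)  # stand-in for a full-text search index


def write_document(doc_id, text):
    """Apply one write to both stores so they stay in step."""
    old_text = documents_by_id.get(doc_id)
    if old_text is not None:
        # Remove index entries for the version being overwritten.
        for word in old_text.lower().split():
            inverted_index[word].discard(doc_id)
    documents_by_id[doc_id] = text
    for word in text.lower().split():
        inverted_index[word].add(doc_id)


def get_by_id(doc_id):
    """ID-based query, answered by the primary store."""
    return documents_by_id[doc_id]


def search(word):
    """Full-text query, answered by the derived index."""
    return sorted(inverted_index.get(word.lower(), set()))


write_document(1, "Designing data systems")
write_document(2, "Data pipelines with Unix tools")

print(get_by_id(2))
print(search("data"))
print(search("unix"))
```

The point of the sketch is that the search index is derived from the same writes as the primary store; in a real system the hard part is keeping the two consistent when writes fail or arrive out of order.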

Designing Data-Intensive Applications is a big book with no padding; it is packed with detail, including many references, but remains readable. Across a fair number of titles this is definitely one of the better technology books I have read.

Book review: Sprint by Jake Knapp

Sprint: How to Solve Big Problems and Test New Ideas in Just Five Days by Jake Knapp with John Zeratsky and Braden Kowitz is another book in my business-oriented stream of reading.

The sprint is a 5 day programme for planning and running a consumer test of a prototype, starting on Monday with the consumer test on Friday. The programme is laid out in huge detail: even lunch times and break times are specified, menus are suggested, and a maximum size of 7 is proposed for the sprint team. It has something of the spirit of a “sprint” in the Agile sense but is not the same thing.

The book arises from the authors’ experiences at Google Ventures, a venture capital firm, and their work with startups for the most part. I suspect this has a bearing on the cited success of the process: startups are typically compact organisations and typically at the beginning they really need to get something in front of customers. This looks like a great way of doing that, but I can see it being more challenging in a mature organisation. Knapp does provide some examples from more mature organisations as well, and mentions at the end that college lecturers have adapted it for courses.

Knapp sets up the sprint as being in contrast to a conventional brainstorming session where everyone has an equal say and no idea is too stupid. The drawback of the brainstorming method is that typically a huge number of ideas are generated in the session, many of questionable quality and then nothing happens afterwards.

Sprint is strong on the idea of a Decider, someone who will make the ultimate decision at points through the programme. The Decider is typically someone like the CEO but if the CEO can’t be available all week then they can delegate to someone else. The Decider can be influenced by spot-votes of other participants but they have the casting vote. Spot-voting is when participants indicate preferences by placing sticky spots on items. The higher level implication of the Decider is that there is someone committed to the sprint who has the power to make things happen after the sprint has happened.

The five days of the Sprint are as follows:

  • Monday – define the challenge and come up with a target;
  • Tuesday – come up with solutions;
  • Wednesday – plan out the prototype;
  • Thursday – build the prototype;
  • Friday – run the consumer test.

My experience of brainstorming is that typically the challenge / target stage is done elsewhere, and the main action is in the “come up with solutions” stage. In this programme the “come up with solutions” part is more of an individual exercise than a group one.

The prototype is planned out as a storyboard of around 15 frames which represent the screens someone might see on a website or app as they conducted the core task. The key initial frame might be a fake news article linking to the prototype website.

The prototype is typically implemented as a facade: it is a fake of a website or app built largely in Keynote (Apple’s presentation software). Initially I bristled at this since my special skill is building fairly functional prototypes in short order, but even I would struggle to do that in one working day. Knapp provides a few examples where the prototype is something else: they worked with a health clinic in the US which tried out a family friendly clinic arrangement in one of their existing clinics, a pump manufacturer who made a sales brochure for a new pump rather than a model pump, and a robotics company who had the majority of a prototype hotel delivery robot already built.

The commitment of time is large, attendees are expected 10am-5pm all five days, actually 9am-5pm on Friday. There is some scope to allow the Decider to make appearances intermittently, and Monday includes an “Ask the experts” session where outsiders can be brought in for 30 minutes or so. I can see in a larger company that it would be hard to carve out the required time. Also in a larger company it is unlikely you would get a genuine Decider on board, the output of the sprint process would go into competition with other priorities.

The book finishes with a summary of the 5 day programme, a shopping list – indicating the exact number of packs of Post-It notes you should provide – and some questions and answers. To a degree I like this, these are my type of people, but I can imagine for many the level of detail and control will be oppressive.

Sprint is a quick and easy read; it is chatty in style and is littered with little stories from sprints Knapp and his team have taken part in. I’m probably not in a position where I’d be able to implement the sprint programme in its entirety but it provides a lot of food for thought, and little ways of changing things.

Book review: Matthew Boulton: Selling What All the World Desires by Shena Mason

Matthew Boulton: Selling What All the World Desires by Shena Mason is a rather sumptuous book featuring a collection of articles and a catalogue of objects relating to Matthew Boulton, organised by Birmingham City Council on the bicentenary of his death in 2009.

Boulton was famous for his Soho Manufactory built a couple of miles from the centre of modern Birmingham. There he started making “toys”, following in the footsteps of his father. At the time “toys” were small metal objects such as buttons, buckles, watch chains and the like for which Birmingham was famous. Over time he brought a high degree of mechanisation and productionisation to the process.

But “toys” were only the start of his business interests. He soon moved into making higher value objects such as vases, candle holders and tableware made from silver, Sheffield plate (silver plated copper) and ormolu (gold gilded bronze or brass), aiming to supply a growing middle class clientele by producing objects at scale with a high degree of mechanisation to reduce cost. For this he cultivated connections in well-to-do society, and employed the best designers.

I was interested to read the article on Picturing Soho by Val Loggie which talks about how the architectural design of the factory was essentially part of Boulton’s marketing strategy. The Soho site drew many visitors; it was a feature of the late Enlightenment that facilities such as these attracted visitors from across Europe and America. Boulton even installed tea rooms and a show room to furnish their needs. A continuing concern, though, was the risk of industrial espionage, which led ultimately to the curtailment of such visits in the early years of the 19th century.

As part of his silver work he campaigned for Birmingham to have its own Assay Office to hallmark silver goods. Previously silver items needed to go to Chester to be assayed and receive a hallmark which was a lengthy journey, costing money and risking damage to items. Gaining an assay office required an act of parliament for which Boulton lobbied in the face of opposition from London silver and goldsmiths. The London case was damaged when a “secret shopper” investigation showed that most silverware passing through the London assay office was below standard, and furthermore they were caught trying to bribe Boulton’s former employees to speak against him. An assay office was granted to Sheffield in the same act.

Boulton also built a mint at Soho, pretty much fully mechanising the process of producing coinage, trade tokens and decorative medals. This work seems to have been one of his more profitable enterprises. Towards the end of the 18th century the government had not minted new copper coinage for quite some time which caused problems because it was often pennies and tuppences that workers needed to buy essentials. Ultimately Boulton was given the contract to mint a large quantity of copper coinage, and was selling minting machinery around the world.

Finally, there was his work on steam engines with James Watt. Watt invented an improvement to the Newcomen steam engine in use at the time which made it much more efficient, in terms of the amount of coal required to produce the same power. Watt also developed engines that produced reliable rotary motion, essential for driving factory machinery rather than just pumping water out of mines. In the first instance Watt and Boulton acted as consultants, designing engines for specific customers and buying in parts from various suppliers to construct them. They charged a fraction of the cost saving from reduced coal use, which sounds like it was rather difficult to administer. In the engine business they maintained their income by lobbying parliament to extend their patent. Later they built a foundry at Soho which made all of the parts of the engine.

Actually, there was one more thing, Watt and Boulton produced a system for mechanical reproduction of letters and paintings.

Boulton’s businesses were continued after his death by his son, and the son of James Watt. The silver plate company and foundry lasted longest but by the end of the 19th century they were gone. The Soho Manufactory made it to the dawn of photography but was demolished in 1863. Boulton’s Soho House remains on the site but the rest of the works, and the parkland in which they sat, have been overtaken by housing.

In some ways he was the metalworking equivalent of Josiah Wedgwood, with whom he was well-acquainted through their membership of The Lunar Society; you can read more about them in Jenny Uglow’s The Lunar Men. He was also interested in the science of the time.

Many of Boulton’s ventures seem to have been of limited commercial value, they often required significant investment which he raised via loans, and revenue typically fell below expectations.

This is a beautiful book; the articles cover the key parts of Boulton’s work at Soho but it is not a biography. The catalogue, which makes up half the book, is worth reading too – the photographs are gorgeous and there are descriptive text boxes which explain the wider context of the objects.

Book review: Lost in Math by Sabine Hossenfelder

It is physics for my next read; although my background is in physics and chemistry I don’t read much physics. Lost in Math by Sabine Hossenfelder is a journey through modern fundamental physics and how it has lost its way over the last few years in a quest for beauty rather than relevance.

My background is actually in a different part of physics, the physics of squishy things like plastics, proteins and plants. I stopped being an academic physicist nearly twenty years ago but even at that time there was a definite feeling that some areas of physics felt themselves superior to others. Experimental soft matter physicists, like myself, were at the bottom of the pile.

This background does mean that I’ve talked to actual string theorists about string theory, and been intrigued that when you asked them where the extra (20 or so) dimensions the theory requires were, the fallback answer was always “curled up very small” – they were unable to express it differently.

The problem in fundamental physics is that theory is running well ahead of what can be experimentally confirmed. The Higgs boson found at CERN in 2012 was predicted in the early sixties, some 50 years previously. Gravitational waves, first detected in 2015 with the result announced in 2016, were predicted by Einstein 100 years previously. Theories today are generating hypotheses which may never be experimentally accessible; on current technology they require accelerators the size of galaxies and Jupiter sized detectors.

With theory running so far ahead of experiment, how does one decide whether a theory is correct, an accurate model of the universe? The answer of choice for a number of years has been beauty, and naturalness. Distinctly unphysical concepts. Defining beauty is a difficult business, in physics as well as elsewhere. For physicists it means beautiful maths. I wonder whether there is a link with music here: Westerners have trained their ears to find particular note combinations harmonious or beautiful but in other traditions different combinations are considered beautiful. Naturalness is a related idea, which has a technical meaning: naturalness abhors taking one very large number from another very large number to leave a number of just the right size. What are the chances of that happening?

Hossenfelder embarks on a world tour to address these issues, talking to scientists across the US and Europe. The style of her writing is journalistic and confessional. This is refreshing to see in a book about physics.

An interesting point raised is that the point of a Kuhnian revolution is as much that our perception of beauty shifts when there is a paradigm shift, as anything else.

The pain for particle physicists is that there is this zoo of 25 particles from which all the matter we can see is constructed but they seem so arbitrary, there is no rhyme or reason to their masses or deep reason for their number. Really, particle physicists want an equation from which these features simply appear rather than find themselves in the position of having to set the values of masses and so forth. This is why physicists are physicists and not biologists or chemists. Chemists revel in mess, biologists are even worse.

The hope was that the LHC at CERN would reveal new particles after the Higgs boson, which would confirm that there was something beyond the Standard Model, this would provide some meat for them to gnaw at and the prospect of planning the next big facility to find out more. But so far there has been nothing, leaving particle physics at a loss.

Cosmology is suffering from a similar problem, although the problem in cosmology is linking up general relativity which explains black holes and the like with quantum mechanics. No one really knows what quantum mechanics means, just that it allows you to explain the values measured in certain experiments really well for reasons best not inspected too closely.

It is sometimes thought that scientists collect loads of data and then come up with a theory that explains it all, this hasn’t been the case in physics for a long time. For the best part of the last 400 years physics has been about coming up with plausible theories and checking to see if they are correct.

Hossenfelder finishes with some thoughts on other types of cognitive and social bias, and even provides an appendix of remedies to address them.

Lost in Math has the air of a disenchanted author making a final tour of the topic she loves before leaving for a job in industry, so it is heartening to find Hossenfelder still in fundamental physics. It seems to me that this level of introspection and the personal touch is something that is needed in academic research.

Fortunately for British readers the phrase “lost in math” is scarcely used in the text.

Book review: The Culture Map by Erin Meyer

More work-related reading for this post with The Culture Map by Erin Meyer, which has the lengthy subtitle “Decoding how people think, lead, and get things done across cultures”.

Meyer’s thesis is that there are national cultures which can be described by a country’s location on a set of eight axes, and managing fruitful international collaborations requires recognition of this fact and an appreciation of where the team members lie on these scales.

The book is divided into eight chapters, each concerning one of the axes. Typically a chapter will start with what one might term an anecdote or case study which introduces an incident which illustrates the wider point of the chapter. These are all very personal and individual, there are names of people and companies, and specific meetings and scenarios. This is followed by a summary table which lists out where different countries fall on this particular axis and then goes on to suggest some strategies to address potential issues in multicultural teams.

The eight axes are:

  • Communicating – is communication high-context (i.e. implicit) or low-context;
  • Evaluating – is negative feedback provided directly or indirectly;
  • Persuading – principles-first or applications-first? To convince someone do you describe a concrete instance (application) or recommendation or start with a theoretical model (principles)?
  • Leading – hierarchical or egalitarian;
  • Deciding – are decisions made consensually or top-down?
  • Trusting – is trust based on tasks (i.e. work successfully completed) or relationships (sharing meals and drinks);
  • Disagreeing – is disagreement confrontational or non-confrontational;
  • Scheduling – is scheduling linear-time (i.e. on time) or flexible-time?

In most cases the themes are considered in isolation but in a couple of cases there are interactions. The first is between communication styles (high and low context) and negative feedback styles (direct and indirect). The US, and to some degree the UK and Canada, are unusual in that they favour low-context, explicit communication but indirect negative feedback. The second case is in the disagreeing style (confrontational or non-confrontational) where a ninth axis is slipped in: emotional expressiveness.

As someone with a background in the physical sciences this type of book can be a bit challenging. Physical scientists expect theoretical models, such as the one presented here, to represent an underlying physical truth. The model is therefore, crudely, right or wrong. Outside the physical sciences a model can be something else: a framework for exploration and discussion. That’s to say the important thing is not the “correctness” of a model but the opportunity it presents in framing discussions. I suspect this makes us principles-first on the persuading axis.

In this case the physical scientist in me wants to argue about whether there really are 8 axes or whether there should be fewer (or more), and how well-established the evidence is for each of these axes. For some axes Meyer cites academic work in support. She also provides some rationalisation for where countries fall on an axis on the basis of history or prevalent religion.

The book presents itself as a manual for working between cultures but I wondered from the start whether it was more generally applicable. Individual styles vary within a national culture, if I look at my approach to timekeeping then I fall on the positively Germanic end of the scale, whilst other English people I work with have a much more Italian view of timekeeping. Arguably software developers as a group are on the “low context” end of the communication scale, computers are pretty much the definition of low context communicators – everything is absolutely explicit.

Meyer does touch on this idea briefly at the beginning of the book, talking about how the national scores on a scale represent the average across the distribution of individuals’ scores for a nation but doesn’t really pick it up as an idea.

Some themes arise in these solutions, the first of which is that recognising difference is half the battle. The second is about being explicit about how you will handle areas of potential misunderstanding. Finally, there is a warning about not trying too much to ape characteristics that are not your own. For example, if you come from a culture where criticism is typically indirect, don’t go all out to be direct in your criticism because it really is possible to go too far and you won’t be a good judge of what “too far” is.

I’ve noted when reading books on marketing that the style they use has a distinct marketing air, and I wonder whether the same is true for this book. Are the anecdotes about dinner to appeal to our relationship-trust side, and the summary tables our task-based trust side?

This is really a book which I wish I’d read long ago, in part because I’ve worked in international teams as an academic and commercially in both small and large companies. But also because I see this book as a guide to working with people more generally, even those in the same culture.