Author's posts
Oct 23 2014
Book review: Pompeii by Mary Beard
For a change I have been reading about Roman history, in the form of Pompeii: The Life of a Roman Town by Mary Beard.
Mary Beard is a Cambridge classicist. I think it helps having seen her on TV, jabbing her finger at a piece of Roman graffiti, explaining what it meant and why it was important with obvious enthusiasm. For me it gave the book a personality.
I imagine I am not unusual in gaining my knowledge of Roman culture via some poorly remembered caricature presented in pre-16 history classes at school and films including the Life of Brian, Gladiator and Up Pompeii.
Pompeii is an ancient Italian town which was covered in a 4-6 metre blanket of ash by an eruption of nearby Vesuvius in 79 AD. Beneath the ash the town was relatively undamaged. It was rediscovered in 1599 but excavations only started in the mid 18th century. These revealed a very well-preserved town including much structure, artwork and the remains of the residents. The bodies of the fallen left voids in the ash which were reconstructed by filling them with plaster.
The book starts with a salutary reminder that Pompeii wasn’t a town frozen in normal times but one in extremis as it succumbed to a volcanic eruption. We can’t assume that the groups of bodies found or the placement of artefacts represent how they might have been found in normal daily life.
There are chapters on the history of the city, the streets, homes, painting, occupations, administration, various bodily pleasures (food, wine, sex and bathing), entertainment (theatre and gladiators) and temples.
I’ve tended to think of the Romans as a homogeneous blob who occupied a chunk of time and space. But this isn’t the case: the pre-Roman history of the town features writing in the Oscan language. The Greek writer Strabo, working in the first century BC, wrote about a sequence of inhabitants: Oscans, Etruscans, Pelasgians and then Samnites – who also spoke Oscan.
Much of what we know of Pompeii seems to stem from the graffiti found all about the remains. It would be nice to learn a bit more about this evidence, since it seems important, and clearly something different is going on from what we find in modern homes and cities. If I look around homes I know today, none feature graffiti; granted, there is much writing on paper, but not on the walls.
From the depths of my memory I recall the naming of the various rooms in the Roman bath house, but it turns out these names may not have been in common usage amongst the Romans. Furthermore, the regimented progression from hottest to coldest bath may also be somewhat fanciful. Something I also didn’t appreciate was that the meanings of some words in ancient Latin are not known, or are uncertain. It’s obvious in retrospect that this might be the case, but caveats on such things are rarely heard.
Beard emphasises that there has been a degree of “over-assumption” in the characterisation of the various buildings in Pompeii, so on some reckonings there are huge numbers of bars and brothels. Anything with a counter and some storage jars gets labelled a bar; anything with phallic imagery gets labelled a brothel – and the Pompeiians were very fond of phallic imagery. A more conservative treatment brings these numbers down enormously.
I am still mystified by garum, the fermented fish sauce apparently loved by many. It features moderately in the book, since the house of a local manufacturer is one of the better preserved ones, and one which features very explicit links to his trade. It sounds absolutely repulsive.
The degree of preservation in Pompeii is impressive; the scene that struck me most vividly was in the House of the Painters at Work. In this case the modern label for the house describes exactly what was going on; other houses are labelled with the names of dignitaries present when a house was uncovered, or after key objects found in the house. It is not known what the inhabitants called the houses, or even the streets. Deliveries seemed to go by proximity to prominent buildings.
I enjoyed Pompeii; the style is readable and it goes to some trouble to explain the uncertainty and subtlety in interpreting ancient remains.
Once again I regret buying a non-fiction book in ebook form: the book has many illustrations, including a set of colour plates, and I still find it clumsy looking at them in more detail or flicking backwards and forwards in an ereader.
Oct 20 2014
Book review: Graph Theory and Complex Networks by Maarten van Steen
This review was first published at ScraperWiki.
My last read, on the Gephi graph visualisation package, was a little disappointing but gave me an enthusiasm for graph theory. So I picked up one of the books that it recommended: Graph Theory and Complex Networks: An Introduction by Maarten van Steen to learn more. In this context a graph is a collection of vertices connected by edges, and the edges may be directed or undirected. The road network is an example of a graph: the junctions between roads are vertices, the roads are edges, and a one-way street is a directed edge – two-way streets are undirected edges.
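As a sketch of that road-network example (my own illustration, not from the book), a graph can be stored as a mapping from each junction to the set of junctions its roads lead to; an undirected edge is recorded in both directions, a directed one only once:

```python
def add_road(graph, a, b, one_way=False):
    """Add a road between junctions a and b to an adjacency-set graph."""
    graph.setdefault(a, set()).add(b)
    if not one_way:
        graph.setdefault(b, set()).add(a)
    else:
        graph.setdefault(b, set())  # ensure the destination vertex exists

roads = {}
add_road(roads, "Town Hall", "Market Square")               # two-way street
add_road(roads, "Market Square", "Station", one_way=True)   # one-way street

print(sorted(roads["Market Square"]))  # junctions reachable from Market Square
```

The junction names are invented; the point is only that the directed edge leaves "Station" with no way back to "Market Square".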
Why study graph theory?
Graph theory underpins a bunch of things like route finding, timetabling, map colouring, communications routing, sol-gel transitions, ecologies, parsing mathematical expressions and so forth. It’s been a staple of Computer Science undergraduate courses for a while, and more recently there’s been something of a resurgence in the field, with systems on the web providing huge quantities of graph-shaped data, both in terms of the underlying hardware networks and the activities of people – the social networks.
Sometimes the links between graph theory and an application are not so obvious. For example, project planning can be understood in terms of graph theory. A task can depend on another task – the tasks being two vertices in a graph. The edge between such vertices is directed, from one to the other, indicating dependency. To give a trivial example: you need a chicken to lay an egg. As a whole, a graph of tasks cannot contain loops (or cycles), since this would imply that a task depended on a task that could only be completed after it had itself been completed. To return to my example: if you need an egg in order to get a chicken to lay an egg then you’re in trouble! Generally, networks of tasks should be directed acyclic graphs (DAGs), i.e. they should not contain cycles.
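The chicken-and-egg check can be done mechanically. Here is a minimal sketch (my own, not from the book) using Kahn’s algorithm: repeatedly schedule any task with no unmet dependencies; if the process stalls before every task is scheduled, there must be a cycle:

```python
from collections import deque

def topological_order(tasks):
    """Return a valid task order, or None if the dependency graph has a cycle.
    tasks maps each task to the set of tasks it depends on."""
    indegree = {t: len(deps) for t, deps in tasks.items()}  # unmet dependencies
    dependents = {t: [] for t in tasks}
    for t, deps in tasks.items():
        for d in deps:
            dependents[d].append(t)
    ready = deque(t for t, n in indegree.items() if n == 0)
    order = []
    while ready:
        t = ready.popleft()
        order.append(t)
        for u in dependents[t]:   # t is done; its dependents lose a dependency
            indegree[u] -= 1
            if indegree[u] == 0:
                ready.append(u)
    return order if len(order) == len(tasks) else None

# An egg depends on a chicken: fine, there is a valid order...
print(topological_order({"chicken": set(), "egg": {"chicken"}}))
# ...but if the chicken also depends on the egg, there is a cycle.
print(topological_order({"chicken": {"egg"}, "egg": {"chicken"}}))
```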
The book’s target audience is 1st or 2nd year undergraduates with a moderate background in mathematics; it was developed for Computer Science undergraduates. The style is quite mathematical but fairly friendly. The author’s intention is to introduce the undergraduate to mathematical formalism. I found this useful, since mathematical symbols are difficult to search for, and shorthands such as operator overloading even more so. This said, it is still an undergraduate text rather than a popular account: don’t expect an easy read or pretty pictures.
The book divides into three chunks. The first provides the basic language for describing graphs, both in words and in equations. The second part covers theorems arising from some of the basic definitions, including the ideas of “walks” – traversals of a graph which take in all vertices – and “tours”, which take in all edges. This includes long-standing problems such as Dijkstra’s algorithm for route finding and the travelling salesman problem. Also included in this section are “trees” – networks with no cycles – where a cycle is a closed walk which visits each vertex just once.
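To make the route-finding part concrete, here is a short sketch of Dijkstra’s algorithm (my own, not the book’s presentation) using a priority queue: always expand the unvisited vertex currently closest to the source.

```python
import heapq

def dijkstra(graph, source):
    """Shortest distances from source in a graph of {vertex: {neighbour: weight}}."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist.get(v, float("inf")):
            continue  # a stale heap entry; v was already reached more cheaply
        for u, w in graph.get(v, {}).items():
            nd = d + w
            if nd < dist.get(u, float("inf")):
                dist[u] = nd
                heapq.heappush(heap, (nd, u))
    return dist

# A toy directed road network with travel times as weights.
roads = {
    "A": {"B": 2, "C": 5},
    "B": {"C": 1, "D": 4},
    "C": {"D": 1},
    "D": {},
}
print(dijkstra(roads, "A"))  # {'A': 0, 'B': 2, 'C': 3, 'D': 4}
```

Note the greedy step only works because edge weights are non-negative; the travelling salesman problem, by contrast, has no comparably cheap solution.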
The third section covers the analysis of graphs. This starts with metrics for measuring graphs, such as vertex degree distributions, distance statistics and clustering measures. I found this section rather brief, and poorly illustrated. However, it is followed by an introduction to various classes of complex networks, including the original random graphs, small-world and scale-free networks. What struck me about complex graphs is that they are each complex in their own way. Random, small-world and scale-free networks are all methods for constructing a network in order to try to represent a known real-world situation. Small-world networks arise from one of Stanley Milgram’s experiments: sending post across the US via social networks. The key feature is that there are clusters of people who know each other, but these clusters are linked by the odd “longer range” contact.
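Two of the metrics mentioned at the start of this section are easy to sketch on a toy undirected graph (my illustration, not the book’s): the degree distribution counts vertices by how many neighbours they have, and the clustering coefficient of a vertex asks what fraction of its neighbours know each other.

```python
from collections import Counter
from itertools import combinations

def degree_distribution(graph):
    """Count how many vertices have each degree (graph: {vertex: set of neighbours})."""
    return Counter(len(nbrs) for nbrs in graph.values())

def clustering(graph, v):
    """Fraction of pairs of v's neighbours that are themselves connected."""
    nbrs = graph[v]
    if len(nbrs) < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in graph[a])
    return links / (len(nbrs) * (len(nbrs) - 1) / 2)

# A triangle a-b-c, plus a pendant vertex d hanging off c.
g = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
print(degree_distribution(g))  # two vertices of degree 2, one of 3, one of 1
print(clustering(g, "a"))      # 1.0: both of a's neighbours know each other
```

In a small-world network, high clustering like this coexists with short paths created by the occasional long-range link.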
The book finishes with some real world examples relating to the world wide web, peer-to-peer sharing algorithms and social networks. What struck me in social networks is that the vertices (people!) you identify as important can depend quite sensitively on the metric you use to measure importance.
I picked up Graph Theory after I’d been working with Gephi, wanting to learn more about the things that Gephi will measure for me. It serves that purpose pretty well. In addition I have a better feel for situations where the answer is “graph theory”. Furthermore, Gephi has a bunch of network generators to create random, small-world and scale-free networks so that you can try out what you’ve learned.
Sep 22 2014
Book review: Network Graph Analysis and Visualization with Gephi by Ken Cherven
This review was first published at ScraperWiki.
I generally follow the rule that if I haven’t got anything nice to say about something then I shouldn’t say anything at all. Network Graph Analysis and Visualization with Gephi by Ken Cherven challenges this principle.
Gephi is a system for producing network visualisations; as such it doesn’t have a great many competitors. Fans of Unix will have used Graphviz for this purpose in the past, but Gephi offers greater flexibility in a more user-friendly package. Graph theory and network analysis have been growing in importance over the past few years, in part because of developments in the analysis of various complex systems using network science. As a physical scientist I’ve been aware of this trend, and it clearly also holds in the social sciences. Furthermore, there is a much increased availability of network information from social media such as Twitter and Facebook.
I’ve used Gephi a few times in the past, and to be honest there has been an air of desperate button clicking to my activities. That’s to say I felt Gephi could provide the desired output but I could only achieve it by accident. I have an old-fashioned enthusiasm for books, even for learning about modern technology. Hence Network Graph Analysis and Visualization with Gephi – the only book I could find with Gephi in the title. There is substantial online material to support Gephi, but I hoped that this book would give me a better insight into how Gephi worked and some wider understanding of graph theory and network analysis.
On the positive side I now have a good understanding of the superficial side of the interface, a feel for how a more expert user thinks about Gephi, and some tricks to try.
I discovered from Network Graph Analysis that the “Overview” view in Gephi is what you might call “Draft”, a place to prepare visualisations which allows detailed interaction. And the “Preview” view is what you might call “Production”, a place where you make a final, beautiful version of your visualisations.
The workflow for Gephi is to import data and then build a visualisation using one of a wide range of layout algorithms. For example, force-based layouts assume varying forces between nodes, for which an arrangement of nodes can be calculated by carrying out a pseudo-physical simulation. These algorithms can take a while to converge, and may get trapped in local minima. The effect of these layout algorithms is to reveal some features of the network. For example, the force layouts can reveal clusters of nodes which might also be discovered by a more conventional statistical clustering algorithm. The concentric layout allows a clearer visualisation of hierarchy in a network.
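The idea behind a force-based layout can be shown in a deliberately crude sketch (my own, emphatically not the algorithm Gephi actually uses): treat edges as springs pulling connected nodes together, let every pair of nodes repel, and step each node a little way along the net force until things settle.

```python
import math, random

def force_layout(edges, nodes, steps=200, seed=1):
    """Tiny force-directed layout: springs along edges, repulsion between all pairs."""
    rng = random.Random(seed)
    pos = {n: [rng.random(), rng.random()] for n in nodes}
    for _ in range(steps):
        force = {n: [0.0, 0.0] for n in nodes}
        for a in nodes:               # repulsion between every pair of nodes
            for b in nodes:
                if a == b:
                    continue
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                d2 = dx * dx + dy * dy + 1e-9
                force[a][0] += 0.01 * dx / d2
                force[a][1] += 0.01 * dy / d2
        for a, b in edges:            # spring attraction along each edge
            dx = pos[b][0] - pos[a][0]
            dy = pos[b][1] - pos[a][1]
            force[a][0] += 0.1 * dx
            force[a][1] += 0.1 * dy
            force[b][0] -= 0.1 * dx
            force[b][1] -= 0.1 * dy
        for n in nodes:               # damped step in the force direction
            pos[n][0] += 0.1 * force[n][0]
            pos[n][1] += 0.1 * force[n][1]
    return pos

nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("b", "c"), ("a", "c")]  # a triangle, with "d" unattached
pos = force_layout(edges, nodes)
```

After a couple of hundred steps the triangle nodes sit close together while the unconnected node drifts away – which is exactly the clustering effect the paragraph above describes, in miniature.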
It’s clear that the plugin ecosystem is important to the more experienced user of Gephi. Plugins provide layout algorithms, data helpers, new import and export functionality, analysis and so forth. You can explore them in the Gephi marketplace.
Cherven recommends a fairly small, apparently well-chosen set of references to online resources and books. The Visual Complexity website looks fabulous. You can read the authors’ complete pre-publication draft of Networks, Crowds and Markets: Reasoning about a highly connected world by David Easley and Jon Kleinberg here. It looks good but it’s nearly 800 pages! I’ve opted for the rather shorter Graph Theory and Complex Networks: An Introduction by Maarten van Steen.
On the less positive side, this is an exceedingly short book. I read it in a couple of 40 minute train journeys. It’s padded with detailed descriptions of how to install Gephi and plugins, including lots of screenshots. The coverage is superficial, so whilst features may be introduced the explanation often tails off into “…and you can explore this feature on your own”.
Network Graph Analysis is disappointing; it does bring a little enlightenment to a new user of Gephi, but not very much. A better book would have provided an introduction to network and graph analysis, with Gephi the tool used to provide practical experience and examples, in the manner that Data Mining does for Weka and Natural Language Processing with Python does for the NLTK library.
This book may be suitable for someone who is thinking about using Gephi and isn’t very confident about getting started. The best alternative that I’ve found is the online material on GitHub (here).
Sep 21 2014
Book review: Falling Upwards by Richard Holmes
I read Richard Holmes’ book The Age of Wonder some time ago; in it he made a brief mention of balloons in the 18th century. It pricked my curiosity, so when I saw his book Falling Upwards, all about balloons, I picked it up.
The chapters of Falling Upwards cover a series of key points in the development of ballooning, typically hydrogen balloons, from the last couple of decades of the 18th century to the early years of the 20th century. One of the early stories is a flight from my own home city, Chester. Thomas Baldwin recorded his flight in Airopaidia: Containing the Narrative of a Balloon Excursion from Chester, the eighth of September, 1785. The book does not have the air of a rigorous history of ballooning: it introduces technical aspects, but not systematically. It is impressionistic to a degree, and as a result a rather pleasant read. For Holmes the artistic and social impact of balloons is as important as the technical.
In the beginning there was some confusion as to the purposes to which a balloon might be put. Early suggestions included an aid to fast messengers, who would stay on the ground but use a small balloon to give them “10 league boots”; there were similar suggestions for helping heavy goods vehicles.
In practice, for much of the period covered, balloons were used mainly for entertainment – both for pleasure trips and for aerial displays involving acrobatics and fireworks. Balloons were also used for military surveillance. Holmes provides chapters on their use in the American Civil War by the Union side (and very marginally by the Confederates), and in the Franco-Prussian War, when they were used to break the Prussian siege of Paris (or at least bend it). The impression gained, though, is that they were something like novelty items for surveillance. Even by the time of the American Civil War in the 1860s it wasn’t routine or obvious that one must use balloon surveillance; it wasn’t a well-established technique. This was likely a limitation of both the balloons themselves and the infrastructure required to get them into the air.
Balloons gave little real utility themselves, except in exceptional circumstances, but they made a link to heavier-than-air flight. They took man into the air and showed the possibilities, but for practical purposes generally didn’t deliver – largely due to their unpredictability. To a large extent you have little control over where you will land in a balloon once you have gone up. Note, for example, that balloons were used to break the Prussian siege of Paris in the outbound direction only. A city the size of Paris is too small a target to hit, even for highly motivated fliers.
Nadar (pseudonym of Gaspard-Félix Tournachon), who lived in Paris, was one of the big promoters of just about anything. He fought a copyright battle with his brother over his adopted signature. Ballooning was one of his passions, and he inspired Jules Verne to start writing science fiction. His balloon, Le Géant, launched in 1863, was something of a culmination in ballooning – it was enormous, 60 metres high – but served little purpose other than to highlight the limitations of the form, as was Nadar’s intent.
From a scientific point of view Falling Upwards covers James Glaisher and Henry Coxwell’s flights in the mid-nineteenth century. I was impressed by Glaisher’s perseverance in taking manual observations at a rate of one every 9 seconds throughout a 90 minute flight. Glaisher had been appointed by the British Association for the Advancement of Science to do his work; he was Superintendent for Meteorology and Magnetism at the Royal Greenwich Observatory. With his pilot Henry Coxwell he made a record-breaking ascent to approximately 8,800 metres in 1862, a flight they were rather lucky to survive. Later in the 19th century other scientists were to begin to identify the layers in the atmosphere, discovering that only a thin shell – 5 miles or so thick – is suitable for life.
The final chapter is on Salomon Andrée’s attempt to reach the North Pole by balloon; as with so many polar stories it ends in cold, lonely, perhaps avoidable death for Andrée and his two colleagues. Their story was discovered when the photos and journals were recovered from White Island in the Arctic Circle, some 30 years after they died.
Falling Upwards is a rather conversational history. Once again I’m struck by the long periods taken for technology to reach fruition. It’s true that from a technology point of view heavier-than-air flight is very different from ballooning, but it’s difficult to imagine doing the former without the latter.
Sep 18 2014
Of Matlab and Python
I’ve been a scientist and data analyst for nearly 25 years: originally as an academic physicist, then as a research scientist in a large fast-moving consumer goods company, and now at a small technology company in Liverpool. In common with many scientists of my age, I came to programming in the early eighties, when a whole variety of home computers briefly flourished. My first formal training in programming was FORTRAN, after which I have made my own way.
I came to Matlab in the late nineties, frustrated by the complexities of producing a smooth workflow with FORTRAN involving interaction, analysis and graphical output.
Matlab is widely used in academic circles and a number of industries because it provides a great deal of analytical power in a user-friendly environment. Its notation for handling matrix (array) calculations is slick. Its functionality is extended by a range of toolboxes, and there is a community of scientists sharing new functionality. It shares this feature set with systems such as IDL and PV-WAVE.
However, there are a number of issues with Matlab:
- as a programming language it has the air of new things being bolted onto a creaking frame. Support for unit testing is an afterthought; there is some integration of source control into the Matlab environment, but it is with Source Safe. It doesn’t support namespaces. It doesn’t support common data structures such as dictionaries, lists and sets;
- the toolbox ecosystem is heavily focused on scientific applications, generally in the physical sciences. So there is no support for natural language processing, for example, or building a web application based on the powerful analysis you can do elsewhere in the ecosystem;
- the licensing is a nightmare. Once you’ve got core Matlab, additional toolboxes containing really useful functionality (statistics, database connections, a “compiler”) all come at an additional cost. You can investigate pricing here. In my experience you often find yourself needing a toolbox for just a couple of functions. For an academic things are a bit rosier: universities get lower-price licenses, and the process by which this is achieved is opaque to end-users. As an industrial user, involved in the licensing process, it is up there with line management and sticking needles in your eyes in the “not much fun to do” stakes;
- running Matlab with network licenses means that your code may stop running part way through because you’ve made a call to a function to which you can’t currently get the license. It is difficult to describe the level of frustration and rage this brings. Now of course one answer is to buy individual licenses for all, or at least a significant surplus of network licenses. But tell that to the budget holder particularly when you wanted to run the analysis today. The alternative is to find one of the license holders of the required toolbox and discover if they are actually using it or whether they’ve gone off for a three hour meeting leaving Matlab open;
- deployment to users who do not have Matlab is painful. They need to download a more-than-500MB runtime of exactly the right version, and the likelihood is they will be installing it just for your code.
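For contrast, the data structures noted above as missing from Matlab are all Python built-ins (the variable names here are just illustrative):

```python
samples = [0.3, 0.1, 0.3]                  # list: ordered, allows duplicates
unique_values = set(samples)               # set: duplicates collapse to {0.1, 0.3}
calibration = {"gain": 1.2, "offset": 0}   # dictionary: key -> value lookup
calibration["offset"] = 0.05               # update a value in place

print(sorted(unique_values), calibration["gain"])
```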
I started programming in Python at much the same time as I started on Matlab. At the time I scarcely used it for analysis, but even then, when I wanted to parse the HTML table of contents for Physical Review E, Python was the obvious choice. I have written scrapers in Matlab, but it involved interfering with the Java underpinnings of the language.
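A table-of-contents scraper of that sort needs nothing beyond the standard library. Here is a sketch (the HTML snippet and article titles are invented for illustration, not the real Physical Review E page) using html.parser to pull out link targets and titles:

```python
from html.parser import HTMLParser

class TocParser(HTMLParser):
    """Collect (href, title) pairs from the anchor tags in a table of contents."""
    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:   # only collect text while inside an anchor
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None

toc_html = """<ul>
  <li><a href="/abstract/PRE.1">Phase behaviour of model colloids</a></li>
  <li><a href="/abstract/PRE.2">Sol-gel transitions revisited</a></li>
</ul>"""

parser = TocParser()
parser.feed(toc_html)
print(parser.links)
```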
Python has matured since my early use. It now has a really great system of libraries which can be installed pretty much trivially, and they extend far beyond those offered by Matlab. In my view they are of very good quality. Innovations like the IPython notebook take the Matlab interactive style of analysis and extend it to be natively web-based. If you want a great example of this, take a look at the examples provided by Matthew Russell for his book, Mining the Social Web.
Python is a modern language undergoing slow, considered improvement. That’s to say it doesn’t carry a legacy stretching back decades, and changes are small and directed towards providing a more consistent language. It’s used by many software developers, who provide a source of help and support, and an impetus for a decent infrastructure.
Ubuntu users will find Python pre-installed. For Windows users, such as myself, there are a number of distributions which bundle up a whole bunch of libraries useful for scientists and sometimes an IDE. I like python(x,y). New libraries can generally be installed almost trivially using the pip package management system. I actually use Python in Ubuntu and Windows almost equally often. There are a small number of libraries which are a bit more tricky to install in Windows – experienced users turn to Christoph Gohlke’s fantastic collection of precompiled binaries.
In summary, Matlab brought much to data analysis for scientists but its time is past. An analysis environment built around Python brings wider functionality, a better coding infrastructure and freedom from licensing hell.