Author's posts
Mar 05 2015
Book review: Engineering Empires by Ben Marsden and Crosbie Smith
Commonly I read biographies of dead white men in the field of science and technology. My next book is related but a bit different: Engineering Empires: A Cultural History of Technology in Nineteenth-Century Britain by Ben Marsden and Crosbie Smith. This is a more academic tome and, rather than focussing on a particular dead white man, it gathers many of them into a broader story. A large part of the book is about steam engines, with chapters on static steam engines, steamships and railways, but alongside these are chapters on telegraphy and on mapping and measurement.
The book starts with a chapter on mapping and measurement; there’s a lot of emphasis here on measuring the earth’s magnetic field. In the eighteenth and nineteenth centuries there was some hope that maps of magnetic field variation might help in determining the longitude. The subject makes a reprise later on in the discussion of steamships. The problem isn’t so much the steam but that steamships were typically iron-hulled, which throws compass measurements awry unless careful precautions are taken. This was important because steamships were promoted for their claimed superior safety over sailing vessels, but risked running aground on the reef of dodgy compass behaviour in inshore waters. The social context for this chapter is the rise of learned societies to promote such work; the British Association for the Advancement of Science is central here, and is a theme through the book. In earlier centuries the Royal Society had been more important.
The next three chapters cover steam power, first in the factory and the mine, then in boats and trains. Although James Watt plays a role in the development of steam power, the discussion here is broader, covering Ericsson’s caloric engine amongst many other things. Two themes of steam are the professionalisation of the steam engineer, and efficiency. “Professionalisation” in the sense that when businessmen made investments in these relatively capital-intensive devices they needed confidence in what they were buying into. A chap who appeared to have just knocked something up in his shed didn’t cut it. Students of physics will be painfully aware of thermodynamics and the theoretical efficiency of engines. The 19th century was when this field started, and it was of intense economic importance. For a static engine efficiency matters because it reduces running costs. For steamships efficiency is crucial: less coal for the same power means you don’t run out of steam mid-ocean!
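(A reminder of the textbook result the physicists have in mind, rather than anything from the book itself: the Carnot limit says that no heat engine working between a hot source at absolute temperature Thot and a cold sink at Tcold can do better than an efficiency of 1 - Tcold/Thot, which is a large part of why pushing up boiler temperatures and pressures was worth serious money.)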
Switching the emphasis of the book from people to broader themes casts the “heroes” in a new light. It becomes more obvious that Isambard Kingdom Brunel is a bit of an outlier, pushing technology to the limits and sometimes falling off the edge. The Great Eastern was a commercial disaster, only gaining a small redemption when it came to laying transatlantic telegraph cables. Success in this area came with the builders of more modest steamships dedicated to particular tasks such as the transatlantic mail and trips to China.
The book finishes with a chapter on telegraphy; my previous exposure to this was via Lord Kelvin, who had been involved in the first transatlantic electric telegraphs. The precursor to electric telegraphy was optical telegraphy, which had started to be used in France towards the end of the 18th century. Transmission speeds for optical telegraphy were surprisingly high: Paris to Toulon (on the Mediterranean coast), a distance of more than 800km, in 20 minutes. In Britain the telegraph took off when it was linked with the railways, which provided a secure, protected route along which to run the lines. Although the first inklings of electric telegraphy came in the mid-18th century it didn’t get going until 1840 or so, but by 1880 it was a globe-spanning network crossing the Atlantic and reaching the Far East overland. It’s interesting to see the mention of Julius Reuter and Associated Press back at the beginning of electric telegraphy; they are still important names now.
In both steamships and electric telegraphy Britain led the way because it had an Empire to run, and communication is important when you’re running an empire. Electric telegraphy was picked up quickly on the eastern seaboard of the US as well.
I must admit I was a bit put off by the introductory chapter of Engineering Empires, which seemed a bit heavy and spoke in historiographical jargon, but once underway I really enjoyed the book. I don’t know whether this was simply because I got used to the style or because the style changed. As proper historians, Marsden and Smith do not refer to scientists in the earlier years of the 19th century as such; they are “gentlemen of science” and later “men of science”. They sound a bit contemptuous of the “gentlemen of science”. The book is a bit austere and worthy-looking. Overall I much prefer this manner of presenting the wider context to a focus on a particular individual.
Feb 10 2015
Book review: Data Science at the Command Line by Jeroen Janssens
This review was first published at ScraperWiki.
In the mixed environment of ScraperWiki we make use of a broad variety of tools for data analysis. Data Science at the Command Line by Jeroen Janssens covers tools available at the Linux command line for doing data analysis tasks. The book is divided thematically into chapters on Obtaining, Scrubbing, Modeling, Interpreting Data with “intermezzo” chapters on parameterising shell scripts, using the Drake workflow tool and parallelisation using GNU Parallel.
The original motivation for the book was a desire to move away from purely GUI-based approaches to data analysis (I think he means Excel and the Windows ecosystem). This is a common desire for data analysts: GUIs are very good for a quick look-see, but once you start wanting to repeat an analysis, or even repeat a visualisation, they become more troublesome. And launching Excel just to remove a column of data seems a bit laborious. Windows does have its own command line, PowerShell, but it’s little used by data scientists. This book is about the Linux command line, and the examples are all available on a virtual machine populated with all of the tools discussed in the book.
The command line is at its strongest in the early steps of the data analysis process: getting data from places, carrying out relatively minor acts of tidying and answering the question “does my data look remotely how I expect it to look?”. Janssens introduces the battle-tested tools sed, awk, and cut, which we use around the office at ScraperWiki. He also introduces jq (the JSON parser); this is a more recent arrival but it’s great for poking around in JSON files as commonly delivered by web APIs. An addition I hadn’t seen before was csvkit, which provides a suite of tools for processing CSV at the command line; I particularly like the look of csvstat. csvkit is a Python tool and I can imagine using it directly in Python as a library.
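By way of illustration, this is the sort of one-liner I have in mind (my own sketch rather than an example from the book; the URL and file name are made up):
curl -s 'https://api.example.com/results.json' | jq '.results[].name'   # pull one field out of a JSON API response
csvstat example.csv   # summary statistics for each column of a CSV file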
The style of the book is to provide a stream of practical examples for different command line tools, and to illustrate their application when strung together. I must admit to finding shell commands deeply cryptic in their presentation, with chunks of options effectively looking like someone typing a strong password. Data Science at the Command Line is not an attempt to clear up the mystery of these options, more an indication that you can work great wonders on finding the right incantation.
Next up is the Rio tool for using R at the command line, principally to generate plots. I suspect this is about where I part company with Janssens on his quest to use the command line for all the things. Systems like R, ipython and the ipython notebook all offer a decent REPL (read-eval-print loop) which will convert seamlessly into an actual program. I find I use these REPLs for experimentation whilst I build a library of analysis functions for the job at hand. You can write an entire analysis program using the shell but that doesn’t mean you should!
Weka provides a nice example of smoothing the command line interface to an established package. Weka is a machine learning library written in Java; it is the code behind Data Mining: Practical Machine Learning Tools and Techniques. The edge to be smoothed is that the bare command line for Weka is somewhat involved, requiring a whole pile of boilerplate. Janssens demonstrates nicely how to do this, automatically generating autocompletion hints for the parts of Weka which are accessible from the command line.
The book starts by pitching the command line as a substitute for GUI-driven applications, which is something I can agree with to at least some degree. It finishes by proposing the command line as a replacement for a conventional programming language, with which I can’t agree. My tendency would be to move from the command line to Python fairly rapidly, perhaps using ipython or the ipython notebook as a stepping stone.
Data Science at the Command Line is definitely worth reading if not following religiously. It’s a showcase for what is possible rather than a reference book as to how exactly to do it.
Feb 09 2015
Book review: Remote Pairing by Joe Kutner
This review was first published at ScraperWiki.
Pair programming is an important part of the Agile process, but sometimes the programmers are not physically co-located. At ScraperWiki we have staff who do both scheduled and ad hoc remote working, so methods for working together remotely are important to us. As a result of a casual comment on Twitter, I picked up Remote Pairing by Joe Kutner, which covers just this subject.
Remote Pairing is a short volume, less than 100 pages. It starts with the motivation for pair programming and some presentation of the evidence for its effectiveness. It then goes on to cover some of the more social aspects of pairing – how do you tell your partner you need a “comfort break”? This theme makes a slight reprise in the final chapter with some case studies of remote pairing. And then it’s into the technical aspects.
The first systems mentioned are straightforward audio/visual packages including Skype and Google Hangouts. I’d not seen ScreenHero previously but it looks like it wouldn’t be an option for ScraperWiki since our developers work primarily in Ubuntu; ScreenHero only supports Windows and OS X currently. We use Skype regularly for customer calls, and Google Hangouts for our daily standup. For pairing we typically use appear.in which provides audio/visual connections and screensharing without the complexities of wrangling Google’s social ecosystem which come into play when we try to use Google Hangouts.
But these packages are not about shared interaction; for this Kutner starts with the vim/tmux combination. This is venerable technology built into Linux systems, or at least easily installable. Vim is the well-known editor; tmux allows a user to access multiple terminal sessions inside one terminal window. The combination allows programmers to work fully collaboratively on code: both partners can type into the same workspace (there’s a minimal sketch of the idea below). You might even want to use vim and tmux when you are standing next to one another. The next chapter covers proxy servers and tmate (a fork of tmux), which make the process of sharing a session easier by providing tunnels through the Cloud.
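The simplest version of this, assuming both programmers can ssh into the same machine as the same user, is just a named tmux session (my sketch of the idea, not a recipe from the book):
tmux new-session -s pairing   # the first programmer starts a named session
tmux attach-session -t pairing   # the second programmer, logged in over ssh, attaches to the same session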
Remote Pairing then goes on to cover interactive screensharing using vnc and NoMachine, these look like pretty portable systems. Along with the chapter on collaborating using plugins for IDEs this is something we have not used at ScraperWiki. Around the office none of us currently make use of full blown IDEs despite having used them in the past. Several of us use Sublime Text for which there is a commercial sharing product (floobits) but we don’t feel sufficiently motivated to try this out.
The chapter on “building a pairing server” seems a bit out of place to me; the content is quite generic. Perhaps because at ScraperWiki we have always written code in the Cloud we take it for granted. The scheme Kutner follows uses Vagrant and Puppet to configure servers in the Cloud (a sketch of the basic Vagrant workflow is below). This is a fairly effective scheme. We have been using Docker extensively, which is a slightly different thing, since a Docker container is not a virtual machine.
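For anyone who hasn’t met Vagrant, the basic workflow looks something like this (my own sketch, not the book’s exact configuration; the box name is just an example):
vagrant init ubuntu/trusty64   # writes a Vagrantfile describing the virtual machine
vagrant up   # boots the VM and runs any provisioning, such as Puppet manifests
vagrant ssh   # log in to the running machine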
Are we doing anything different in the office as a result of this book? Yes – we’ve got a good quality external microphone (a Blue Snowball), and it’s so good I’ve got one for myself. Managing audio is still something that seems a challenge for modern operating systems. To a human it seems obvious that if we’ve plugged in a headset and opened up Google Hangouts then we might want to talk to someone and that we might want to hear their voice too. To a computer this seems unimaginable. I’m looking to try out NoMachine when a suitable occasion arises.
Remote Pairing is a handy guide for those getting started with remote working, and it’s a useful summary for those wanting to see if they are missing any tricks.
Jan 30 2015
Git–notes
I’ve discovered that my blog is actually a good place to put things I need to remember; see, for example, my blog post on running Ubuntu in a VM on Windows 8.
In this spirit here are my notes on using git, the distributed version control system (DVCS). These are things I picked up around the office at ScraperWiki; I wrote something there about the scheme we use for Git. This is more a compendium of useful git commands.
I use Git on both Windows and Ubuntu and I have accounts with both GitHub and Bitbucket. I’ve configured ssh on my Windows and Ubuntu machines and use that for authentication. On Windows I interact with Git using Git Bash.
Installation
On installing Git I do the following setup, obviously using my own name and email:
git config --global user.name "John Doe"
git config --global user.email johndoe@example.com
git config --global core.editor vim
I can list my config settings using:
git config -l
Starting a repo
To start a new repo we do:
git init
These days I feel bereft if I’m not “pushing” my local repository to an online repository like GitHub or Bitbucket. To add a remote repository, create one using the service of your choice, which will probably then ask you to do:
git remote add origin [url]
Alternatively you can clone an existing repository into a subdirectory of your current directory with the name of the repo:
git clone [url]
This one clones into the current directory, making a mess if that’s not what you intended!
git clone [url] .
A variant, if you are using a repo with submodules in it:
git clone --recursive [url]
If you forgot to do the above on first cloning then you can do:
git submodule update --init
Adding and committing files
If you’ve started a new repository then you need to add some files to track:
git add [filename]
You don’t have to commit all the changes you’ve made since the last commit; you can select them using the -p option:
git add -p
And commit them to the repository with a commit command like:
git commit -m [message]
Alternatively you can write the commit message in your favoured editor, with the difference from the previous commit shown below it:
git commit -a -v
I tend to use a remote repository as a backup, so I regularly do:
git push origin HEAD
If someone else is working on the same repository as you then things get more complicated but that’s out of the scope of this post.
Undoing things
If you get your commit message wrong you can edit it with:
git commit --amend
If you change your mind about staging a file for commit:
git reset HEAD [filename]
If you change your mind about the modifications you have made to a file since the last commit then you can revert to the last commit using this **destructive** command:
git checkout -- [filename]
You should be careful doing that since it will obliterate any changes you’ve made to a file, even if you saved them from the editor.
Working out where you are
You can list files in the repo with:
git ls-tree --full-tree -r HEAD
The general command for seeing what is going on is:
git status
This tells you if you have made edits which have not been staged, which branch you are on, and which files are not being tracked. Whilst you are working you can see the difference from the previous commit using:
git diff
If you’ve already added files to commit then you need to do:
git diff --cached
You can see a list of all your changes using:
git log
This command gives you more information, in a more compact form:
git log --oneline --graph --decorate
This is a good way of seeing the status of your branch and the other branches in the repository. I have aliased this set of log options as:
git lg
To do this I added the following to my ~/.gitconfig file:
[alias]
    lg = log --oneline --graph --decorate
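Alternatively the same alias can be set without editing the file directly:
git config --global alias.lg "log --oneline --graph --decorate"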
Once you’ve committed a bunch of changes you might want to push them to a remote server. This pushes to the remote called origin, and HEAD ensures you push to your current branch. HEAD is Git’s shorthand for the latest commit on the current branch:
git push origin HEAD
Branches
The preceding commands are how you’d work using a single master branch, if you were working alone on something simple, for example. If you are working with other people or on something more complicated then you probably want to work on a branch; you can make a new branch by doing:
git checkout -b [branch name]
You can find out what other branches are available by doing:
git branch -v -a
Once you are on a branch you can commit changes, and push them onto your remote server, just as if you were on the master branch.
Merging and rebasing
The excitement comes when you want to merge your changes onto the master branch, or you want to get changes onto your own branch that someone else has made and pushed to the remote repository. The quick and dirty way to do this is using:
git pull
This does a fetch and merge all at the same time. The better way is to fetch the changes and then merge them:
git fetch --prune --all
git merge origin/master
If you are working with someone else then you may prefer to merge changes onto the master branch by making a pull request on GitHub or BitBucket.
Accepting Pull Requests from Forks
If someone makes a Pull Request based on their forked copy of a repo then you can download it for testing by doing:
git fetch origin pull/ID/head:BRANCHNAME
Jan 20 2015
Book review: Sextant by David Barrie
The longitude and navigation at sea have been a recurring theme over the last year of my reading. Sextant by David Barrie may be the last in the series. It is subtitled “A Voyage Guided by the Stars and the Men Who Mapped the World’s Oceans”.
Barrie’s book is something of a travelogue: each chapter starts with an extract from his diary of crossing the Atlantic in a small yacht as a (late) teenager in the early seventies. Here he learnt something of celestial navigation. The chapters themselves are a mixture of those on navigational techniques and those on significant voyages. Included in the latter are voyages such as those of Cook and Flinders, Bligh, various French explorers including Bougainville and La Pérouse, Fitzroy’s expeditions in the Beagle, and Shackleton’s expedition to the Antarctic. These are primarily voyages from the second half of the 18th century, exploring the Pacific coasts.
Celestial navigation relies on being able to measure the location of various bodies such as the sun, moon, Pole star and other stars. Here “location” means the angle between the body and some other point such as the horizon. Such measurements can be used to determine latitude and, in a rather more complex manner, longitude. Devices such as the back-staff and cross-staff were in use during the 16th century. During the latter half of the 17th century it became obvious that one method to determine the longitude would be to measure the location of the moon relative to the immobile background of stars, the so-called lunar distance method. To determine the longitude to the precision required by the Longitude Act of 1714 would require those measurements to be made to a high degree of accuracy.
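(To give a flavour, and this is my gloss rather than Barrie’s: the simplest case is the noon sight, where an observer with the sun due south measures its altitude above the horizon, and then latitude = 90° − altitude + the sun’s declination, the declination being looked up in printed tables. Longitude is far harder, which is what the lunar distance method and, later, the chronometer were for.)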
Newton invented a quadrant device somewhat similar to the sextant in the late 17th century, but the design was not published until 1742, after his death; in the meantime Hadley and Thomas Godfrey made independent inventions. The “quadrant” is actually an eighth of a circle segment but, thanks to its double-reflection optics, allows measurements up to 90 degrees. A sextant subtends a sixth of a circle and, similarly, allows measurements up to 120 degrees.
The sextant of the title was first made by John Bird in 1757, commissioned by a naval officer who had made the first tests on the lunar distance method for determining the longitude at sea using Tobias Mayer’s lunar distance tables.
Both quadrant and sextant are more sophisticated devices than their cross- and back-staff precursors. They comprise a graduated angular scale and optics to bring the target object and reference object together, and to prevent the user gazing at the sun with an unprotected eye. The design of the sextant has changed little since its invention. As a scientist who has worked with optics, I find they look like pieces of modern optical equipment in terms of their materials, finish and mechanisms.
Alongside the sextant the chronometer was the second essential piece of navigational equipment, used to provide the time at a reference location (such as Greenwich) to compare with local time to get the longitude. Chronometers took a while to become a reliable piece of equipment: at the end of the Beagle’s four-year voyage in 1830 only half of the 22 chronometers were still running well. Shackleton’s expedition in 1914 suffered even more, with the final stretch of their voyage to South Georgia relying on the last working one of 24 chronometers. Granted, his ship, the Endurance, had been broken up by ice and they had escaped to Elephant Island in a small, open boat! Note the large numbers of chronometers taken on these voyages of exploration.
Barrie is of the more subtle persuasion in the interpretation of the history of the chronometer. John Harrison certainly played a huge part in this story but his chronometers were exquisite, expensive, unique devices*. Larcum Kendall’s K1 chronometer was taken by Cook on his 1769 voyage. Kendall was paid a total of £500 for this chronometer, made as a demonstration that Harrison’s work could be repeated. This cost should be compared with the £2800 the navy paid for HMS Endeavour, the ship in which the voyage was made!
An amusing aside: when the Ordnance Survey located the Scilly Isles by triangulation in 1797 they discovered their location was 20 miles from that which had previously been assumed, meaning that prior to this measurement the location of Tahiti was better known, through the astronomical observations made by Cook’s mission, than that of the Scillies.
The risks the 18th century explorers ran are pretty mind-boggling. Even if the expedition was not lost entirely, as La Pérouse’s was, losing 25% of the crew was not exceptional. It’s reminiscent of the Apollo moon missions: thankfully casualties there were remarkably low, but the crews of the earlier missions had a pretty pragmatic view of the serious risks they were running.
This book is different from the others I have read on marine navigation, more relaxed and conversational but with more detail on the nitty-gritty of the process of marine navigation. Perhaps my next reading in this area will be the accounts of some of the French explorers of the late 18th century.
*In the parlance of modern server management Harrison’s chronometers were pets not cattle!