SomeBeans

Apr 05 2014

Book review: Darwin’s Ghosts by Rebecca Stott

By SomeBeans in Book Reviews

Charles Darwin’s On the Origin of Species was rushed into print after a very long gestation when it became clear that Alfred Russel Wallace was close to publishing the same ideas on evolution. Lacking from the first edition was a historical overview of earlier work pertinent to the ideas of evolution. On the occasion of the publication of the first American edition, Darwin took the opportunity to address the lack. Darwin’s Ghosts: In search of the first evolutionists by Rebecca Stott is a modern look at those influences.

After an introductory, motivating chapter Darwin’s Ghosts works in approximately chronological order. Each chapter introduces a person, or group of people, who did early work in areas of biology which ultimately related to evolution. The first characters introduced are Aristotle, and then Jahiz, a Persian scholar working around 860AD. Aristotle brought systematic observation to biology, a seemingly basic concept which was not then universal. He wrote The History of Animals in about 350BC. The theme of systematic observation and experimentation continues through the book. Jahiz extended Aristotle’s ideas to include interactions of species, or webs. His work is captured in The Book of Living Beings.

Next up was a curiosity over fossils, and the inklings that things had not always been as they were now. Leonardo da Vinci (1452-1519) and, some time later, Bernard Palissy (1510-1590) are used to illustrate this idea. Everyone has heard of da Vinci. Palissy was a Huguenot who lived in the second half of the 16th century. He was a renowned potter, commissioned by Catherine de Medici to build the Tuileries gardens in Paris, but in addition he lectured on natural sciences.

I must admit to being a bit puzzled at the introduction of Abraham Trembley (1710-1784), who was the tutor of two sons of a prominent Dutch politician. He worked on hydra, a very simple aquatic organism, and his Wikipedia page credits him as being one of the first experimental zoologists. He discovered that a whole hydra could regenerate from parts of a “parent”.

Conceptually the next developments were in hypothesising a great age for the earth, coupled to ideas that species were not immutable: they change over time. Benoît de Maillet (1656-1739) wrote on this, but his work was only published posthumously. Similarly Robert Chambers (1802-1871) was to write anonymously about evolution in Vestiges of the Natural History of Creation, first published in 1844. Note that this publication date is only 15 years before the first publication of the Origin of Species.

The reason for this reticence on the part of a number of writers is that these ideas of mutability and change collide with major religions; they are “blasphemous”. This becomes a serious issue over the years spanning 1800. Erasmus Darwin, Charles’s grandfather, was something of an evolutionist but wrote relatively cryptically about it for fear of his career as a doctor. I reviewed Desmond King-Hele’s biography of Erasmus Darwin some time ago. At the time when Erasmus wrote, evolution was considered a radical idea, both in political and religious senses. This was at a time when the British establishment was feeling vulnerable following the Revolution in France and the earlier American revolution.

I have some sympathy with the idea that religion suppressed evolutionary theory, however it really isn’t as simple as that. The part religion plays is as a support to wider cultural and political movements.

The core point of Darwin’s Ghosts is that a scientist working in the first half of the 19th century was standing on the shoulders of giants or at least on top of a pile of people the lowest strata of which date back a couple of millennia. Not only this, they are not on an isolated pinnacle, around them are others also standing. Culturally we are fond of stories of lone geniuses but practically they don’t exist.

In fact the theory of evolution is a nice demonstration of this interdependence – Darwin was forced to publish his theory because Wallace had essentially got the gist of it entirely independently – his story is the final chapter in the book. For Wallace the geographic ranges of species were a key insight into forming the theory, a feature very apparent in the area of southeast Asia where he was working as a freelance specimen collector.

Once again I am caught out by my Kindle – the book proper ends at 66% of the way through, although Darwin’s original essay is included as an appendix taking us to 70%. Darwin’s words are worth reading, if only for his put-down of Richard Owen for attempting to claim credit for evolutionary theory, despite being one of those who had argued against it previously.

I enjoyed this book. Much of my reading is scientific mono-biography, which misses the ensemble nature of science that this book demonstrates.

biology, evolution, history of science, women writers

Mar 23 2014

The Third Way

By SomeBeans in Technology

Operating Systems were the great religious divide of our age.

A little over a year ago I was writing about my experiences setting up my Sony Vaio Windows 8 laptop to run Ubuntu on a virtual machine. Today I am exploring the Third Way – I’m writing this on a MacBook Air. This is the result of a client requirement: I’m working with the Government Digital Service who are heavily Mac oriented.

I think this makes me a secularist in computing terms.

My impressions so far:

Things got off to a slightly rocky start, the MacBook I’m using is a hand-me-down from a departing colleague. We took the joint decision to start from scratch on this machine and as a result of some cavalier disk erasing we ended up with a non-booting MacBook. In theory we should have been able to do a reinstall over the internet, in practice this didn’t work. So off I marched to our local Apple Store to get things sorted. The first time I’d entered such an emporium. I was to leave disappointed, it turned out I needed to make an appointment for a “Genius” to triage my laptop and the next appointment was a week hence, and I couldn’t leave the laptop behind for a “Genius Triage”. Alternatively, I could call Apple Care.

As you may guess this Genius language gets my goat! My mate Zarino was an Apple College Rep – should they have called him a Jihadi? Could you work non-ironically with a job title of Genius?

Somewhat bizarrely, marching the Air to the Apple Store and back fixed the problem, and an hour or so later I had a machine with an operating system. Perhaps it received a special essence from the mothership. On successfully booting my first actions were to configure my terminal. For the uninitiated the terminal is the thing that looks like computing from the early 80s – you type in commands at a prompt and are rewarded with more words in return. The reason for this odd choice was the intended usage. This MacBook is for coding, so next up was installing Sublime Text. I now have an environment for coding which superficially looks like the terminal/editor combination I use in Windows and Ubuntu!

It’s worth noting that for the MacBook the bash terminal I am using is a native part of the operating system, as it is for the Ubuntu VM; on Windows the bash terminal is bolted on to make various open source tools work.

Physically the machine is beautiful. My Vaio is quite pretty but compared to the Air it is fat and heavy. It has no hard disk indicator light. It has no hard disk, rather a 256GB SSD which means it boots really fast. 256GB is a bit small for me these days, with a title of data scientist I tend to stick big datasets on my laptop.

So far I’ve been getting used to using cmd+c and cmd+v to copy and paste, having overwritten stuff repeatedly with “v” having done the Windows ctrl+v. I’m getting used to the @ and ” keys being in the wrong place. And the menu bar for applications always appearing at the top of the screen, not the top of the application window. Fortunately the trackpad I can configure to simulate a two button mouse, rather than the default one button scheme. I find the Apple menu bar at the top a bit too small and austere and the Dock at the bottom is a bit cartoony. The Notes application is a travesty, a little faux notebook although I notice in OS X Mavericks it is more business-like.

For work I don’t anticipate any great problems in working entirely on a Mac, we use Google Apps for email and make extensive use of Google Docs. We use online services like Trello, GitHub and Pivotal in place of client side applications. Most of the coding I do is in Python. The only no go area is Tableau which is currently only available on Windows.

I’ve never liked the OS wars, perhaps it was a transitional thing. I grew up in a time when there were a plethora of home computers. I’ve written programs on TRS-80s, Commodore VIC20, Amstrad CPC464s, Sinclair ZX81 and been aware of many more. At work I’ve used Dec Alphas, VAX/VMS and also PCs and Macs. Latterly everything is on the web, so the OS is just a platform for a browser.

I’m thinking of strapping the Air and the Vaio back to back to make a triple booting machine!

data science, MacBook Air, OS X

Feb 26 2014

Messier and messier

By SomeBeans in Science

Regular readers with a good memory will recall I bought a telescope about 18 months ago. I bemoaned the fact that I bought it in late Spring, since it meant it got dark rather late. I will note here that astronomy is generally incompatible with a small child who might wake you up in the middle of the night, requiring attention and early nights.

Since then I’ve taken pictures of the sun, the moon, Jupiter, Saturn and as a side project I also took wide angle photos of the Milky Way and star trails (telescope not required). Each of these brought their own challenges, and awe. The sun because it’s surprisingly difficult to find the thing in your viewfinder with the serious filter required to stop you blinding yourself when you do find it. The moon because it’s just beautiful and fills the field of view, rippling through the “seeing” or thermal turbulence of the atmosphere. Jupiter because of its Galilean moons, first observed by Galileo in 1610. Saturn because of its tiny ears; I saw Saturn on my first night of proper viewing. As the tiny image of Saturn floated across my field of view I was hopping up and down with excitement like a child.

I’ve had a bit of a hiatus in the astrophotography over the past year but I’m ready to get back into it.

My next targets for astrophotography are the Deep Sky Objects (DSOs), these are largish faint things as opposed to planets which are smallish bright things. My accidental wide-angle photos clued me into the possibilities here. I’d been trying to photograph constellations, which turn out to be a bit dull, at the end of the session I put the sensitivity of my camera right up and increased the exposure time and suddenly the Milky Way appeared! Even in rural Wales it was only just visible to the naked eye.

Now I’m keen to explore more of these faint objects. The place to start is the Messier Catalogue of objects. This was compiled by Charles Messier and Pierre Méchain in the latter half of the 18th century. You may recognise the name Méchain, he was one of the two French men who surveyed France on the cusp of the Revolution to define a value for the metre. Ken Alder’s book The Measure of All Things describes their adventures.

Messier and Méchain weren’t interested in the deep sky objects, they were interested in comets and compiled the list in order not to be distracted from their studies by other non-comety objects. The list comprises star clusters, nebulae and galaxies. I must admit to being a bit dismissive of star clusters. The Messier list is by no means exhaustive, observations were all made in France with a small telescope so there are no objects from the Southern skies. But they are ideal for amateur astronomers in the Northern hemisphere since the high tech, professional telescope of the 18th century is matched by the consumer telescope of the 21st.

I’ve known of the Messier objects since I was a child but I have no intuition as to where they are, how bright and how big they are. So to get me started I found some numbers and made some plots.

The first plot shows where the objects are in the sky. They are labelled, somewhat fitfully with their Messier number and common name. Their locations are shown by declination, how far away from the celestial equator an object is, towards the North Pole and right ascension, how far around it is along a line of celestial latitude. I’ve added the moon to the plot in a fixed position close to the top left. As you can see the majority of the objects are North of the celestial equator. The size of the symbols indicates the relative size of the objects. The moon is shown to the same scale and we can see that a number of the objects are larger than the moon, these are often star clusters but galaxies such as Andromeda – the big purple blob on the right and the Triangulum Galaxy are also bigger than the moon. As is the Orion nebula.
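To put objects on a plot like this, catalogue coordinates usually need converting from hours/minutes/seconds (right ascension) and degrees/arcminutes (declination) into plain decimal degrees. The following is a minimal sketch of that conversion, not the code used for the plot; the function names are my own, and the coordinates for Andromeda (M31) and the Orion nebula (M42) are approximate catalogue values quoted from memory rather than taken from the data set described here.

```python
def ra_to_degrees(hours, minutes=0.0, seconds=0.0):
    """Convert right ascension from h/m/s to decimal degrees.

    24 hours of right ascension span the full 360 degrees,
    so one hour corresponds to 15 degrees.
    """
    return (hours + minutes / 60.0 + seconds / 3600.0) * 15.0


def dec_to_degrees(degrees, arcminutes=0.0, arcseconds=0.0, south=False):
    """Convert declination from d/m/s to decimal degrees.

    `south=True` marks objects below the celestial equator; a separate
    flag avoids losing the sign for declinations between 0 and -1 degree.
    """
    value = abs(degrees) + arcminutes / 60.0 + arcseconds / 3600.0
    return -value if south else value


# Approximate positions (assumed values, for illustration only)
m31 = (ra_to_degrees(0, 42, 44), dec_to_degrees(41, 16))
m42 = (ra_to_degrees(5, 35, 17), dec_to_degrees(5, 23, south=True))

print(f"M31: RA {m31[0]:.2f} deg, Dec {m31[1]:+.2f} deg")
print(f"M42: RA {m42[0]:.2f} deg, Dec {m42[1]:+.2f} deg")
```

The resulting pairs can be passed straight to a scatter plot, with declination on the vertical axis and right ascension on the horizontal.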

So why aren’t we as familiar with these objects as we are with the moon? The second plot shows how bright the Messier objects are and their size. The horizontal axis shows their apparent size – it’s a linear scale so that an object twice as far from the vertical axis is twice as big. Note that these are apparent sizes, some things appear larger than others because they are closer. The vertical axis shows the apparent brightness; in astronomy brightness is measured in units of “magnitude”, which is a logarithmic scale on which lower numbers are brighter and a difference of 5 magnitudes corresponds to a factor of 100 in brightness. This means that although the sun is roughly magnitude –26 and the moon is roughly magnitude –13, the sun is not twice as bright as the moon but roughly 160,000 times brighter. The Messier objects are all much dimmer than Venus, Jupiter and Mercury and generally dimmer than Saturn.

So the Messier objects are often bigger but dimmer than things I have already photographed. But wait, the moon fills the field of view of my telescope. And not only that, my telescope has an aperture of f/10 – a measure of its light gathering power. This is actually rather “slow” for a camera lens; my “fastest” lens is f/1.4, which represents a 50 fold larger light gathering power.
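The magnitude scale is easier to get a feel for with a little arithmetic. The sketch below uses the standard definition, in which a difference of 5 magnitudes is a factor of 100 in brightness and lower magnitudes are brighter; the function name is my own and the magnitudes are the rounded values quoted above.

```python
def brightness_ratio(magnitude_a, magnitude_b):
    """Return how many times brighter object A is than object B.

    Each magnitude step is a factor of 100 ** (1/5), about 2.512,
    and a lower (more negative) magnitude means a brighter object.
    """
    return 10 ** (0.4 * (magnitude_b - magnitude_a))


# Rounded magnitudes for the sun and the full moon
sun, moon = -26, -13
ratio = brightness_ratio(sun, moon)
print(f"The sun is about {ratio:,.0f} times brighter than the moon")
```

With these rounded magnitudes the 13 magnitude gap works out to a ratio of the order of 10^5, which shows how quickly a logarithmic scale runs away from everyday intuition.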

For these two reasons I have ordered a new lens for my camera, a Samyang 500mm f/6.3. This is going to give me a bigger field of view than my telescope, which has a focal length of 1250mm, and also more light gathering power – my new lens should have more than double the light gathering power!
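The “50 fold” and “more than double” figures come from the usual rule of thumb that the brightness of the image scales with the inverse square of the f-number, and that, for the same camera sensor, the width of the field of view scales roughly inversely with focal length. Here is a minimal sketch of both calculations; the function names are my own, and the field of view estimate assumes small angles and the same sensor on both instruments.

```python
def light_gathering_ratio(slow_f_number, fast_f_number):
    """Relative image brightness of the faster optic compared to the slower one.

    Image brightness per unit sensor area scales as 1 / f_number**2.
    """
    return (slow_f_number / fast_f_number) ** 2


def field_of_view_ratio(long_focal_length_mm, short_focal_length_mm):
    """Approximate factor by which the shorter focal length widens the view.

    Assumes the same sensor and the small-angle approximation.
    """
    return long_focal_length_mm / short_focal_length_mm


print(f"f/1.4 lens vs f/10 telescope: {light_gathering_ratio(10, 1.4):.0f}x")
print(f"f/6.3 lens vs f/10 telescope: {light_gathering_ratio(10, 6.3):.1f}x")
print(f"500mm vs 1250mm field of view: {field_of_view_ratio(1250, 500):.1f}x wider")
```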

Watch this space for the results of my new purchase!

astronomy, data science, data visualisation

Feb 15 2014

Sublime

By SomeBeans in Technology

Sublime Text

Coders can be obsessive about their text editors, dividing into relatively good natured camps. It is text editors, not development environments, over which they obsess and the great schism is between the followers of vim and those of Emacs. The line between text editor and development environment can be a bit fuzzy. A development environment is designed to help you do all the things required to make working software (writing, testing, compiling, linking, debugging, organising projects and libraries), whilst a text editor is designed to edit text. But sometimes text editors get mission creep.

vim and emacs are both editors with a long pedigree on Unix systems. vim’s parent, vi, came into being in 1976, with vim being born in 1991; vim stands for “Vi Improved”. Emacs was also born in 1976. Glancing at the emacs Wikipedia page I see there are elements of religiosity in the conflict between them.

To users of OS X and Windows, vim and emacs look and feel, frankly, bizarre. They came into being when windowed GUI interfaces didn’t exist. In basic mode they offer a large blank screen with no icons or even text menu items. There is a status line and a command line at the bottom of the screen. Users interact by issuing keyboard commands, they are interfaces with only keyboard shortcuts. It’s said that the best way to generate a random string of characters is to put a class of naive computer science undergraduates down in front of vim and tell them to save the file and exit the program! In fact to demonstrate the point, I’ve just trapped myself in emacs whilst trying to take a screen shot.

vim, image by Hermann Uwe

emacs, image by David Mundy

vim and emacs are both incredibly extensible, they’re written by coders for coders. As a measure of their flexibility: you can get twitter clients which run inside them.

I’ve used both emacs and vim but not warmed to either of them. I find them ugly to look at and confusing, I don’t sit in front of an editor enough of the day to make remembering keyboard shortcuts a comfortable experience. I’ve used the Matlab, Visual Studio and Spyder IDEs but never felt impassioned enough to write a blog post about them. I had a bad experience with Eclipse, which led to one of my more valued Stack Overflow answers.

But now I’ve discovered Sublime Text.

Sublime Text is very beautiful, particularly beside vim and emacs. I like the little inset in the top right of my screen which shows the file I’m working on from an eagle’s perspective, and the nice rounded tabs. The colour scheme is subtle and muted, and I can get a panoply of variants on the theme. At Unilever we used to talk about trying to delight consumers with our products – Sublime Text does this. My only wish is that it went the way of Google Chrome and got rid of the Windows bar at the top.

Not only this, as with emacs and vim, I can customise Sublime Text with code or use packages other people have written, and in my favoured language, Python.

I use Sublime Text mainly to code in Python, using a Git Bash prompt to run code and to check it into source control. At the moment I have the following packages installed:

* Package Control – for some reason the thing that makes it easy to add new packages to Sublime Text comes as a separate package which you need to install manually;
* PEP8 Autoformat – languages have style guides: soft guidelines to ensure consistent use of whitespace, capitalisation and so forth. Some people get very uptight about style. PEP8 is the Python style guide, and PEP8 Autoformat allows you to effortlessly conform to the style guide and so avoid friction with your colleagues;
* Cheat Sheets – I can’t remember how to do anything, cheat sheets built into the editor make it easy to find things, and you can add your own cheat sheets too;
* Markdown Preview – Markdown is a way of writing HTML without all the pointy brackets, this package helps you view the output of your Markdown;
* SublimeRope – a handy package that tells you when your code won’t run and helps with autocompletion. Much better than cryptic error messages when you try to run faulty code. I suspect this is the most useful one so far;
* Git and GitGutter – integrating Git source control into the editor. Git provides all the Git commands on a menu whilst GitGutter adds markers in the margin (or gutter) showing the revision status. These work nicely on Ubuntu but I haven’t worked out how to configure them on Windows;
* SublimeREPL – brings a Python prompt into the editor. There are some configuration subtleties here when working with virtual environments.

I know I’ve only touched the surface of Sublime Text but unlike other editors I want to learn more!

data science, Python, Sublime Text

Feb 13 2014

Face ReKognition

By SomeBeans in Technology

This post was first published at ScraperWiki. The ReKognition API has now been withdrawn.

I’ve previously written about social media and the popularity of our Twitter Search and Followers tools. But how can we make Twitter data more useful to our customers? Analysing the profile pictures of Twitter accounts seemed like an interesting thing to do since they are often the faces of the account holder and a face can tell you a number of things about a person, such as their gender, age and race. This type of demographic information is useful for marketing, and understanding who your product appeals to. It could also be a way of tying together public social media accounts since people like me use the same image across multiple accounts.

Compact digital cameras have offered face recognition for a while, and on my PC, Picasa churns through my photos identifying people in them. I’ve been doing image analysis for a long time, although never before on faces. My first effort at face recognition involved using the OpenCV library. OpenCV provides a whole suite of image analysis functions which do far more than just detect faces. However, getting it installed and working with the Python bindings on a PC was a bit fiddly, and both the documentation and the built-in face analysis capabilities were poor.

Fast forward a few months, and I spotted that someone had cast the ReKognition API over the images that the British Library had recently released, a dataset I’ve been poking around at too. The ReKognition API takes an image URL and a list of characteristics in which you are interested. These include gender, race, age, emotion, whether or not you are wearing glasses or, oddly, whether you have your mouth open. Besides this summary information it returns a list of feature locations (i.e. locations in the image of eyes, mouth, nose and so forth). It’s straightforward to use.

But who should be the first targets for my image analysis? Obviously, the ScraperWiki team! The pictures are quite small but ReKognition identified I was a “Happy, white, male, age 46 with no glasses on and my mouth shut”. Age 46 is a bit harsh – I’m actually 39 in my profile picture. A second target came out “Happy, Indian, male, age 24.7, with glasses on and mouth shut”. This was fairly accurate, Zarino was 25 when the photo was taken, he is male, has his glasses on but is not Indian. Two (male) members of the team, have still not forgiven ReKognition for describing them as female, particularly the one described as a 14 year old.

Fun as it was, this doesn’t really count as an evaluation of the technology. I investigated further by feeding in the photos of a whole load of famous people. The results of this are shown in the chart below. The horizontal axis is someone’s actual age, the vertical axis shows their age predicted by ReKognition. If the predictions were correct the points representing the celebrities would fall on the solid line. The dotted line shows a linear regression fit to the data. The equation of the line y = 0.673x (I constrained it to pass through zero) tells us that the age is consistently under-predicted by a third, or perhaps celebrities look younger than they really are! The R² parameter tells us how good the fit is: a value of 0.7591 is not too bad.
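For anyone wanting to repeat this kind of check, a straight line constrained to pass through zero has a simple closed form for its slope: the sum of x·y divided by the sum of x². The sketch below is an illustration rather than the code behind the chart; the ages in it are made up, and the R² shown is one common definition (1 minus residual over total sum of squares), which may not be the one the original fitting tool reported for a zero-intercept fit.

```python
def slope_through_origin(actual, predicted):
    """Least-squares slope for the model predicted = slope * actual."""
    sum_xy = sum(x * y for x, y in zip(actual, predicted))
    sum_xx = sum(x * x for x in actual)
    return sum_xy / sum_xx


def r_squared(actual, predicted, slope):
    """Goodness of fit as 1 - SS_residual / SS_total (one common convention)."""
    mean_y = sum(predicted) / len(predicted)
    ss_res = sum((y - slope * x) ** 2 for x, y in zip(actual, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in predicted)
    return 1.0 - ss_res / ss_tot


# Hypothetical data: real ages and the ages a face API might return
actual_ages = [25, 40, 55, 70]
predicted_ages = [18, 27, 36, 47]

slope = slope_through_origin(actual_ages, predicted_ages)
fit_quality = r_squared(actual_ages, predicted_ages, slope)
print(f"slope = {slope:.3f}, R^2 = {fit_quality:.3f}")
```

A slope below 1 means ages are under-predicted on average, which is the pattern described above.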

I also tried out ReKognition on a couple of class photos – taken at reunions, graduations and so forth. My thinking here being that I would get a cohort of people aged within a year of each other. These actually worked quite well; for older groups of people I got a standard deviation of only 5 years across a group of, typically, 10 people. A primary school class came out at 16 ± 9 years, which wasn’t quite so good. I suspect the performance here is related to the fact that such group photos are taken relatively carefully and the lighting and setup for each face in the photo is, by its nature, the same.

Looking across these experiments: ReKognition is pretty good at finding faces in photos, and not finding faces where there are none (about 90% accurate). It’s fairly good with gender (getting it right about 80% of the time, typically struggling a bit with younger children), and it detects glasses pretty well. I don’t feel I tested it well on race. On age, results are variable: for the ScraperWiki set the R² value for linear regression between actual and detected ages is about 0.5, whilst for famous people it is about 0.75. In both cases it tends to under-estimate age and has never given an age above 55 despite being fed several more mature celebrities and grandparents. So on age, it definitely tells you something and under certain circumstances it can be quite accurate. Don’t forget the images we’re looking at are completely unconstrained, they’re not passport photos.
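The “mean ± standard deviation” summary for a cohort is quick to compute with Python’s standard library. This is a minimal sketch with invented ages standing in for the values returned for one class photo; the function name is my own.

```python
from statistics import mean, stdev


def cohort_summary(predicted_ages):
    """Return the mean and sample standard deviation of predicted ages."""
    return mean(predicted_ages), stdev(predicted_ages)


# Hypothetical predicted ages for one group photo of ten people
class_photo = [12, 25, 9, 30, 14, 8, 21, 11, 16, 14]
average, spread = cohort_summary(class_photo)
print(f"{average:.0f} ± {spread:.0f} years")
```

Since everyone in a class photo should be within a year or so of each other, the spread is a direct measure of how noisy the age estimates are.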

Finally, I applied face recognition to Twitter followers for the ScraperWiki account, and my personal account. The Summarise This Data tool on the ScraperWiki Platform provides a quick overview of the data added by face recognition.

It turns out that a little over 50% of the followers of both accounts have a picture of a human face as their profile picture. It’s clear the algorithm makes the odd error mis-identifying things that are not human faces as faces (including the back of a London Taxi Cab). There’s also the odd sketch or cartoon of a face, rather than a photo and some accounts have pictures of famous people, rather than obviously the account holder. Roughly a third of the followers of either account are identified as wearing glasses, three quarters of them look happy. Average ages in both cases were 30. The breakdown in terms of race is 70:13:11:7 White:Asian:Indian:Black. Finally, my followers are approximately 45% female, and those of ScraperWiki are about 30% female.

We’re now geared up to apply this to lists of Twitter followers – are you interested in learning more about your followers? Then send us an email and we’ll be in touch.

data science, scraperwiki

I've worked as a scientist for the last 30 years, at various universities, a large home and personal care company, a startup in Liverpool called The Sensible Code Company (formerly ScraperWiki Ltd), GBG and now as a consultant in data science.

I write about:
* the books I have read, typically science and history (or both), partly as a reminder to myself and partly as a review;
* science, things I have done or things I find interesting;
* technology, programming and gadgets;
* politics, and current affairs;
* ...and other stuff as it takes my fancy - holidays, photographs and things I want to remember.

