Author's posts
Mar 14 2013
Book review: The Eighth Day of Creation by Horace Freeland Judson
My reading moves seamlessly from the origins of cosmology (in Koestler’s Sleepwalkers) to the origins of molecular biology in “The Eighth Day of Creation” by Horace Freeland Judson. The book covers the revolution in biology, starting with the elucidation of the structure of DNA and continuing through to how DNA leads to the synthesis, by organisms, of proteins. This covers a period from just before the Second World War to the early 1960s, although in the Epilogue and Afterwords Judson comments on the period up to the mid-nineties. Although the book does provide basic information on the core concepts (What is DNA? What is a protein?), I suspect it requires a degree of familiarity with these ideas to make much sense on a casual reading – the same applies to this blog post.
The first third or so of the book covers the elucidation of the structure of DNA. Three groups were working on this problem – that of Linus Pauling in the US, Franklin and Wilkins at King’s College in London and Crick and Watson in Cambridge. Key to the success of Crick and Watson was their collaboration: a willingness to talk to people who knew stuff they needed to know, and piecing the bits together. The structural features of their model were the helix form (this wasn’t news), specific and strong hydrogen bonding between bases, and the presence of two DNA chains (running in opposite directions). On the whole this wasn’t a new story to me, although I wasn’t familiar with the surrounding work which established DNA as the genetic material. Judson returns to the part played by Rosalind Franklin in the discovery in one of the Afterwords. It has been said that Franklin was greatly wronged over the discovery of the structure of DNA, but Judson does not hold this view and I tend to agree with him. The core of the problem is that the Nobel Prize is not awarded posthumously, and with her death at 37 from cancer, Franklin therefore missed out. Watson’s book The Double Helix was a rather personalised view of the characters involved, most of whom were alive to carry out damage limitation, whilst Franklin was not – so here she was poorly treated, but by Watson rather than a whole community of scientists. Perhaps the thing that said the most to me about the situation is that after she was diagnosed with cancer she stayed with the Cricks at their home.
In parallel with the elucidation of the structure of DNA, work had been ongoing on understanding protein synthesis and genetics in viruses and bacteria. This included how information was coded into DNA, with much effort expended in trying to establish overlapping codes. There are 20 amino acids and four bases in DNA, so three base pairs are required to specify an amino acid if the amino acid sequence is to be unconstrained (two give only 4 × 4 = 16 combinations, three give 4 × 4 × 4 = 64). It was conceivable that two consecutive amino acids were coded by fewer than 6 base pairs, but in this case there is a restriction on the possible amino acid sequences. This area was initiated by the physicist George Gamow. I struggle a bit to see how it gained so much traction: this type of model was quickly ruled out by consideration of the amino acid sequences that were being established for proteins at the time. It turns out that amino acids are coded by three consecutive base pairs with redundancy (so several different base pair triplets code for the same amino acid). Also covered was the mechanism by which data passed from DNA to the ribosomes where protein synthesis takes place; important here are adaptor molecules, which carry the appropriate amino acid to the site of synthesis.
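The counting argument can be checked with a few lines of arithmetic. This is only an illustrative sketch: the function names are my own, and the "fully overlapping" scheme (successive triplets shifted by a single base) is one simple overlapping code chosen for the example rather than a faithful reconstruction of any particular proposal in the book.

```python
BASES = "ACGT"  # the four DNA bases


def min_codon_length(n_bases: int = 4, n_amino_acids: int = 20) -> int:
    """Smallest word length k such that n_bases**k >= n_amino_acids."""
    k = 1
    while n_bases ** k < n_amino_acids:
        k += 1
    return k


def overlapping_successors(codon: str) -> list[str]:
    """Triplets that can follow `codon` if successive codons share two bases.

    Assumes a fully overlapping code, where the reading frame moves on by
    one base at a time (an assumption made for illustration).
    """
    return [codon[1:] + base for base in BASES]


if __name__ == "__main__":
    k = min_codon_length()
    print(f"codon length needed: {k} ({len(BASES) ** k} possible triplets)")
    print("triplets allowed after ACG:", overlapping_successors("ACG"))
```

With a non-overlapping code any of the 64 triplets can follow any other, so any amino acid can follow any other. In the overlapping sketch only four triplets can follow a given one, which is the kind of restriction that real protein sequences could be tested against.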
Compared to the structure of DNA this work was a long difficult slog, involving intricate experiments with bacteria, bacteriophage viruses, bacterial sex, ultracentrifugation, chromatography and radiolabelling.
The final part of the book is on the elucidation of the structure of proteins. This was done using x-ray crystallography, with the very first clear scattering patterns measured in the 1930s and the first full elucidation made in the late fifties. X-ray crystallography of proteins, which contain many thousands of atoms, is challenging. Fundamentally there is an issue, the “phase problem”, which means you don’t have quite enough information to determine the structure from the scattering pattern. This issue was resolved by heavy atom labelling: you try to chemically attach a heavy atom such as mercury to your protein, then compare the scattering pattern of this modified protein with that of the unmodified protein, which resolves the phase problem. Nowadays measuring the thousands of spots in an x-ray scattering pattern and carrying out the thousands and thousands of calculations required to resolve the structure is relatively straightforward, but in the early days it was a massive manual labour.
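The phase problem can be seen in a toy calculation. The sketch below is a one-dimensional cartoon, not a real crystallographic computation: the "density" values are invented, and a plain discrete Fourier transform stands in for the scattering experiment. A detector records only intensities (the squared magnitudes of the Fourier coefficients), and two different densities can give exactly the same intensities.

```python
import cmath
import math


def dft(values):
    """Plain discrete Fourier transform of a list of real numbers."""
    n = len(values)
    return [
        sum(v * cmath.exp(-2j * cmath.pi * k * j / n) for j, v in enumerate(values))
        for k in range(n)
    ]


def intensities(density):
    """What a detector would record: |F|^2, with the phase of F discarded."""
    return [abs(f) ** 2 for f in dft(density)]


def same_intensities(a, b, tol=1e-9):
    """Compare two densities by their intensities only, within a tolerance."""
    return all(
        math.isclose(x, y, abs_tol=tol)
        for x, y in zip(intensities(a), intensities(b))
    )


# Invented toy "electron density" along one repeat of a 1D crystal
density = [0, 3, 1, 0, 0, 2, 0, 0]
mirrored = density[::-1]  # a genuinely different arrangement of the same atoms

if __name__ == "__main__":
    print("densities differ:", density != mirrored)
    print("intensities match:", same_intensities(density, mirrored))
```

The two arrangements differ only in the phases of their Fourier coefficients, which is exactly the information the measurement throws away. Heavy atom labelling supplies extra measurements from which those phases can be recovered.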
As well as resolving structure a key discovery was made regarding the mode of action of proteins: essentially they work as adaptors between chemically distinct systems – when a molecule binds to one site on a protein it affects the ability of another type of molecule to bind to another site on the protein through changes in the protein structure induced by the first molecule’s binding. This feature opens up huge possibilities for cell biology – in the absence of this feature interactions between chemical systems can only occur if the participants in those systems interact with each other chemically.
It isn’t something I’d really appreciated properly but molecular biologists are quite organised in the organisms that they generally agree to work on. The truth is that there are uncountably many organisms, so to aid the progress of science one needs to select which ones to study: E. coli, the T series bacteriophages, C. elegans, D. melanogaster and more recently the zebrafish. They almost play the part of an extra author.
Molecular biology was apparently dominated by physicists. I must admit I found this confusing in the past, but Judson highlights the field as defined by its practitioners: biochemistry is about energy and matter (and typically small molecules), molecular biology is about information (and typically macromolecules) – a more natural home for physicists.
I found the first and third parts an enjoyable read; my scientific background is in scattering so the technical material was at least familiar. The central section on genetics I found fascinating but a bit of a slog. I’m somewhat in awe of the complexity of the experiments (and their apparent difficulty).
Looking back on my earlier book reviews, I read my comment on R.J. Evans’s book on historiography that history is a literary exercise as well as anything else. As a trained scientist I found this something of an alien concept, but in common with Koestler’s book the style of this book shines through.
Jan 29 2013
Enterprise data analysis and visualization
This post was first published at ScraperWiki.
The topic for today is a paper[1] by members of the Stanford Visualization Group on interviews with data analysts, entitled “Enterprise Data Analysis and Visualization: An Interview Study”. This is clearly relevant to us here at ScraperWiki, and thankfully their analysis fits in with the things we are trying to achieve.
The study is compiled from interviews with 35 data analysts across a range of business sectors including finance, health care, social networking, marketing and retail. The respondents were recruited via personal contacts and are predominantly from Northern California; as such this is not a random sample and we should consider the results to be qualitatively indicative rather than quantitatively accurate.
The study identifies three classes of analyst, whom they refer to as Hackers, Scripters and Application Users. The Hacker role was defined as those chaining together different analysis tools to reach a final data analysis. Scripters, on the other hand, conducted most of their analysis in one package such as R or Matlab and were less likely to scrape raw data sources. Scripters tended to carry out more sophisticated analysis than Hackers, with analysis and visualisation all in the single software package. Finally, Application Users worked largely in Excel with data supplied to them by IT departments. I suspect a wider survey would show a predominance of Application Users and a relatively smaller population of Hackers.
The authors divide the process of data analysis into 5 broad phases: Discovery – Wrangle – Profile – Model – Report. These phases are generally self-explanatory – wrangling is the process of parsing data into a format suitable for further analysis and profiling is the process of checking the data quality and establishing fully the nature of the data.
This is all summarised in the figure below; each column represents an individual, so we can see that in this sample Hackers predominate.
At the bottom of the table are identified the tools used, divided into database, scripting and modeling types. Looking across the tools in use SQL is key in databases, Java and Python in scripting, R and Excel in modeling. It’s interesting to note here that even the Hackers make quite heavy use of Excel.
The paper goes on to discuss the organizational and collaborative structures in which data analysts work; frequently an IT department is responsible for internal data sources and the productionising of analysis workflows.
It’s interesting to highlight the pain points identified by interviewees and interviewers:
- scripts and intermediate data not shared;
- discovery and wrangling are time consuming and tedious processes;
- workflows not reusable;
- ingesting semi-structured data such as log files is challenging.
Why does this happen? Typically the wrangling/scraping phase of the operation is ad hoc, the scripts used are short, practitioners don’t see this as their core expertise and they’ll typically draw from a limited number of data sources, meaning there is little scope to build generic tools. Revision control tends not to be used, even for the scripting tools where it is relatively straightforward, perhaps because practitioners have not been introduced to revision control or simply see the code they write as too insignificant to bother with revision control.
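To make the last pain point in the list concrete, here is a minimal sketch of wrangling a semi-structured log file into records using only Python’s standard library. The log format, field names and sample lines are all invented for the example; none of them come from the paper.

```python
import re
from collections import Counter

# Hypothetical log format: "<date> <time> <LEVEL> <free-text message>"
LINE_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) (?P<message>.*)$"
)


def parse_log(lines):
    """Return (records, rejects): parsed dicts and the lines that did not match."""
    records, rejects = [], []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            records.append(match.groupdict())
        else:
            # Keep the failures: profiling the rejects is part of the job
            rejects.append(line)
    return records, rejects


# Invented sample data, including one line that does not fit the format
SAMPLE = [
    "2013-01-29 09:15:02 INFO user=alice action=login",
    "2013-01-29 09:15:07 ERROR user=bob action=upload size=too_large",
    "garbled line with no timestamp",
]

if __name__ == "__main__":
    records, rejects = parse_log(SAMPLE)
    print(Counter(record["level"] for record in records))
    print(len(rejects), "line(s) could not be parsed")
```

Even a script this short is worth keeping under revision control and sharing: returning the rejected lines alongside the parsed records is a small step from wrangling towards profiling.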
ScraperWiki has its roots in data journalism, open source software and community action but the tools we build are broadly applicable, as one of the respondents to the survey said:
“An analyst at a large hedge fund noted their organization’s ability to make use of publicly available but poorly-structured data was their primary advantage over competitors.”
References
Jan 27 2013
More Shiny – Sony Vaio T13 laptop with Windows 8
I thought I’d mix together a review of my shiny new laptop (a Sony Vaio T13) with one of Windows 8 which came pre-installed on the laptop.
The laptop
Six years after buying my last laptop I have replaced it with another Sony Vaio. At the time I bought the first one I didn’t think I would do this: my old Sony Vaio (VGN-SZ2M) is a nice machine but it was infested with Sony cruftware, which added little functionality (and what it did try to add didn’t seem to work), and the couriers Sony selected left it with a neighbour without asking whether this was appropriate. It had a weird black plastic finish which was probably described as "carbon fibre". It’s worked fine, although I found the 80GB hard disk a little cramped and as the years went by it felt slower and slower when compared to the other machines I use.
After poking around extensively I finally decided on another Sony Vaio. Other contenders were the Lenovo Yoga 13 (limited availability and would that hinge really hold out?), the Acer Aspire S7 (more pricey for a poorer config and apparently no option for a big conventional drive) and offerings from Samsung, Toshiba and Dell – the bar for being a contender in this limited set was the touchscreen. I did look at non-touchscreen variants too and particularly liked the look of the Lenovo IdeaPad U410.
Having decided, I bought direct from Sony to get a bit more configuration flexibility, adding 8GB RAM, an i7 processor and going for the 32GB SSD/500GB conventional hard drive combination. This is an ultrabook class laptop with a 13.3" touchscreen, no optical drive, and Windows 8. I liked the idea of getting a pure SSD system but the price Sony charges for the upgrade is about double the price of the highly regarded Samsung 840 Pro series SSDs, so maybe I’ll be opening the thing up soon. It weighs 1.5kg, which is light but not the lightest in this class. I decided on a touchscreen since it didn’t seem to add hugely to the cost and it isn’t something you can retrofit should the desire arise.
It is a very beautiful thing: brushed metal with chromed highlights, and in its pristine state it comes out of hibernate very quickly.
Compared to my old laptop it has the same footprint, unsurprising since the screen is the same size. The keyboard is narrower though, losing a column of keys, but the device is about half the thickness – having lost the optical drive.
I worried a little about the monolithic touchpad with no separate left and right mouse buttons but it has a positive click in these two locations so I’ve not noticed the lack of separate buttons.
The screen resolution may be a little deficient (1366×768) but it is comparable with most of the laptops in its class and I intend using it on an external monitor anyway.
There is a small infestation of cruftware, featuring an update centre which seems to struggle to provide the necessary bandwidth and an update-able electronic manual which I can’t seem to get hold of because the instructions for downloading it take you around in a loop.
As if in pique my old desktop PC failed shortly after I got the new Vaio, so I’m using it as my sole computer for now. This works fine except it is a pain to install CD based software for various bits of hardware (quite why my video camera shipped with 4 CDs of software I don’t understand).
So overall – the Sony Vaio gets an A, a tick or some number of stars between 5 and 10.
Windows 8
I have a bit of a habit for getting computers with brand new Microsoft operating systems, although fortunately I skipped Windows Vista. Windows 8 takes a bit of getting used to, the best way of thinking about it is as Windows 7 with a mobile phone interface dropped on top of it. This is both good and bad. Personally I rather like Windows 7, and I’m also rather pleased with the Android-based touchscreen interface on my HTC Desire phone but the combination of the two is a bit disturbing.
Actually "a bit disturbing" is wrong, "crap" would be better: the new style apps follow very different UI rules from conventional Windows apps and major in form over content – for example the pre-installed twitter app, although pretty and swooshy with the touchscreen, is utterly useless as a twitter client. Not only does it have limited functionality but in order to view anything but the briefest of timelines you need to flap your arm about like a deranged semaphorist. The twitter app from twitter is marginally more functional but looks like a portrait aspect ratio phone screen placed in the middle of a wide laptop screen. Comparing my Android phone and tablet, it strikes me few people have cracked scaling apps from phone to tablet size screens, let alone all the way to laptop screen sizes.
Live tiles offer interesting possibilities but they are constrained to one of two sizes, and I’ve yet to find one which does anything particularly interesting.
Microsoft is very keen for developers to write the mobile phone style apps, at one point the (free) Express version of Visual Studio was only going to allow developers to target the mobile phone style apps.
The only real redeeming feature of the new Windows 8 additions is that, once you’ve accepted the concept, the Start screen is better than the old Start button.
Not so long ago I would have "struck down upon thee with great vengeance and furious anger those who" touched the screen of any device I owned, these days I’m a little bit more relaxed: I find the touchscreen a nice adjunct to more conventional input but I have a smeary screen now.
It seems to me there are a limited number of things you need to "get" about an operating system in order to use it with a peaceful mind, for Windows 7 a big one was that you didn’t need to go stumbling through a cascade of entries in the Start menu – you just start typing the name of your desired application into the search box and it was revealed fairly promptly. Start typing when you are on the Windows 8 Start screen and you launch just such a search – how the hell you’re supposed to know this is a mystery to me. And this seems like one of the core problems with Windows 8 – there are some nice little interface features but there’s no way you would guess they were there or find them by accident.
Windows 8 is keen for you to login using a Microsoft account, it is possible to just use a local account but I thought “in for a penny, in for a pound” and went ahead and set one up. Interestingly you can see the benefit of this approach when using Google Chrome, when I installed Chrome it automatically installed the plugins I have on other PCs, my autocorrect settings and so forth – instantly I was at home. I guess this is the longer term plan for Windows 8. It also wants me to have an xbox account to buy music and video.
Some hints for new users of Windows 8:
- To shift tiles around on the Start page, hold them and drag them up or down initially (not left-right), to zoom out drag them towards the bottom of the screen;
- If you use Google Chrome as your default browser the title bar icons (minimise, maximise and close) disappear, to fix this don’t use it as your default browser;
- There exist both new style and old style applications, some things are available in both formats, for example Dropbox. The new-style apps resemble phone apps but offer limited functionality;
- New-style apps don’t have an "exit" button, simply navigate away from them as you would a phone app;
- The Start screen replaces the Start menu on the old Windows 7 desktop, to search for anything just start typing!
- Windows 8 style apps cannot play MPEG2 files, this is only available for Windows 8 Pro with added Windows Media Centre. Windows Media Player will play them (suitable codecs installed – I used Shark007) and VLC player works fine.
On the last item: this seems a bit bonkers – the video app on the mobile-style interface can see your video library, perhaps containing an unrelenting series of videos of your growing child which will almost inevitably be in MPEG2 format as a default, so crippling this functionality seems a bit stupid.
Bottom line: Windows 8 is very pretty and the Start screen is, in my view, better than the old Windows 7 Start menu once you’ve got your head around it. The idea of putting a mobile phone interface, with mobile phone style apps, on top of a desktop interface is stupid – my opinion on this may change if I see some apps that are optimised for laptops. Mobile interfaces such as iOS and Android are optimised for consumption which is fine, but many people will still be getting PC class devices to do “work” and for the main the new mobile interface in Windows 8 gets in the way of that.
And now to install Ubuntu on it… a process so exciting I have made it the subject of a second blog post.
Jan 18 2013
Windows 8 and Ubuntu 12.10 on a Sony Vaio T13 laptop
I wanted to dual boot my new Sony Vaio T13 laptop with Windows 8 and Ubuntu 12.10. As it turned out I found it challenging to set up a true dual boot but I have a satisfactory solution.
This process is not straightforward because the T13 uses the Insyde H2O UEFI instead of an old-style BIOS; furthermore, since Windows 8 was pre-installed, SecureBoot is switched on. These factors mean that only the most recent, 64-bit version of Ubuntu (12.10) has any chance of installing. Also the T13 has no optical drive so I would need to boot from a USB memory stick.
I’ve installed various Linux distributions over the years but they tend not to be my primary OS, I considered three methods for this operation.
Method 1 – install using Wubi
The Wubi installer is a way of installing a Linux distribution effectively as an application in Windows, but apparently this doesn’t work because of incompatibilities with UEFI. I’ve used Wubi in the past – I like it because it reduces the chances of me rendering my Windows install inoperative via a partitioning mistake.
Method 2 – conventional dual boot installation
As of the 64-bit 12.10 version of Ubuntu it should be possible to do a fairly conventional dual boot installation of Ubuntu onto a machine preloaded with Windows 8. The instructions for this are here, essentially they are:
1. Download the appropriate ISO
2. Transfer the ISO to a USB stick using Universal USB Installer
3. Boot from the USB stick (Shift-restart in Windows 8 gives you lots of options for the necessary fiddling to achieve this) and follow the installation instructions (here).
However when I did this I kept getting this error:
(initramfs) unable to find a medium containing a live file system.
This error persisted through various combinations of enabled/disabled SecureBoot and boot orderings. I don’t know why this doesn’t work; I suspect that the Universal USB Installer is not creating an appropriate boot device, and perhaps if I flagged the USB drive as legacy rather than UEFI it might work. I was feeling slightly nervous about this because there were some indications (here) that if I had succeeded in producing a new disk partition for Ubuntu then I may have lost my Windows partition! Doing clean installs of both Windows 8 and Ubuntu onto a machine looks like it might be a bit simpler (here).
Maybe I should have followed the instructions here, the trick seems to be to create your Ubuntu partition using Windows 8 rather than trying to do it with the Ubuntu installer.
In some ways the problem here is finding an excess of instructions!
Method 3 – install on a virtual machine
Following a suggestion on twitter my third method was to try installing Ubuntu onto a virtual machine inside Windows 8. If I’d splashed out on Windows 8 Pro then I could have used Hyper-V as my virtual machine. However, I’m using VirtualBox. The instructions for installing Ubuntu inside VirtualBox are here; I switched on hardware virtualization support, which was disabled by default.
This worked pretty smoothly: you don’t even need to produce a USB stick from which to boot, simply mount the ISO you downloaded as a virtual optical drive in VirtualBox. After initial installation Ubuntu was rather slow and unresponsive; I think this might have been due to downloading updates but I’m not sure. The only problem was that Ubuntu inside the VirtualBox couldn’t display at full screen resolution. This problem should be fixed by installing “Guest Additions” – this is software that lives on the guest operating system (the one inside the VirtualBox) and helps it interface with the host operating system. You can install the Guest Additions from an ISO image supplied with VirtualBox; the instructions for this are here. I failed to do this by not reading the instructions, in particular I didn’t install Dynamic Kernel Module Support (DKMS) properly. This was a recoverable mistake though, I learnt here that I needed to run this command first:
sudo apt-get install build-essential linux-headers-$(uname -r)
and then I re-installed using this command:
sudo apt-get install virtualbox-guest-utils
And it worked nicely on rebooting the virtual machine. So now I have Ubuntu 12.10 running in a VirtualBox inside Windows 8; aside from a hint of the VirtualBox menu bar at the bottom of the screen I could just as well be dual booting. Theoretically I might experience reduced performance by not running Ubuntu natively but I have 8GB of RAM in my laptop and an i7 processor so I suspect this won’t be an issue.
Now my eyes have been opened to the magic of virtual machines I want to install more! Sadly Apple’s OS X is not supported for such ventures.
I don’t claim to be an expert in this sort of thing so any comments on my understanding and technique are welcome!
Update
The Ubuntu in the virtual machine doesn’t find my monitor resolution (1920×1080), so I apply this fix (link)
Also I use this technique, adding vboxvideo to modules, to improve performance (link).
Possibly I need to do this thing in the host machine (link):
VBoxManage setextradata global GUI/MaxGuestResolution any
Dec 28 2012
Review of the year: 2012
It has become a tradition for me to review my posts at the end of each year, OK I’ve done it twice before and now I find myself sounding like a teenage diarist.
Clearly the main event of this year has been The Arrival; Thomas was born on 4th February, as I write he is systematically throwing all his books on the floor whilst muttering to himself, it is 6am. I haven’t written much about Thomas but he fills my real life, looking after a small child is very much like conducting an experiment at a central facility.
I’ve managed to keep reading although at a somewhat reduced rate. I read about geodesy in “The Great Arc” and “Measure of the Earth”, both tales of considerable derring-do conducted in the jungles of India and Ecuador respectively. I read about scientific instruments, in Stargazers, “Decoding the Heavens”, "A computer called Leo" and "The History of Clocks & Watches". The subjects of the last two of these are obvious, the first is on telescopes and the second on the Antikythera mechanism, an astoundingly complex mechanical model of the heavens. I read about Alan Turing, Christiaan Huygens and Benjamin Franklin.
If I was forced to pick a favourite book I think I would go for Arthur Koestler’s "The Sleepwalkers", which traces the development of cosmology from the ancient Greeks to Isaac Newton with its focus on the journey from Copernicus, still obsessed with celestial circles, to Kepler who started to sound like a modern physicist. Kepler’s attempts to identify elliptic orbits take on a pantomime air at some points… “They’re right in front of you!”. Or perhaps my favourite should be Stargazers since after reading this I bought a telescope – more of which below.
Slightly more miscellaneously I read Tim Harford’s "Adapt" about trial and error as an approach to public policy and management, "The Geek Manifesto" on science and politics and "The Etymologicon" – a casual journey through where words come from. Finally, I also read "Visualize This", capturing the essence of my data twiddling and cluing me into tidying up my plots using Inkscape (or Adobe Illustrator if you have the cash).
Another new thing this year was a telescope. Rather than appear to be some sort of dedicated follower of fashion, rushing out to buy one in the wake of a celebrity astronomonothon, I delayed until May. This turned out to be a bad idea: it doesn’t get properly dark until two hours after sunset and starts to get light two hours before dawn, which is difficult at the best of times and impossible when combined with childcare responsibilities. Consequently I got little star viewing action for quite some time, except for the Sun. My telescope review post (including video) was my most read post of the year. It has been magical though, my first view of Saturn with its rings had me hopping up and down like a small child! More recently I got Jupiter and the four moons discovered by Galileo. I’m still trying for a deep sky object, I don’t count my pictures of the whole Milky Way taken through a normal camera lens.
Not much else in the way of photography this year. Obviously I have an enormous collection of photos of Thomas but I won’t bore you with them; I will say to expecting parents who are also keen photographers that a 50mm f/1.4 lens is ideal for photographing small children since you are often indoors operating in relatively low light. I also took some pictures of Chester Cathedral, Beeston Castle and in the area of Harlech, where we took our first holiday with Thomas.
I did a little bit of fiddling with data this year, plotting the spending of the Board of Longitude, finding that they did a great deal to support John Harrison through his life, and looking at how quarterly GDP growth figures are revised – basically they’re all over the place!
I also pottered around a little with science policy and politics. “I am Dr Faustus” was an oft-read post, in which I disagreed with Ananyo Bhattacharya’s assertion that basic research in the UK had been corrupted by the idea of showing some application. “GCSE results through the ages” also got a lot of hits, it showed the changes in grades for GCSE and A levels over the years.
And as the year came to an end I handed in my notice to go to a new job – starting in March. I used some of my blog posts in support of my application!