A Docker environment for Windows (October 2015 edition)

This blog post provides an outline method for installing a nice environment for developing in Python using Docker on a Windows 10 machine. Hopefully I have provided sufficient of the error messages I encountered that both myself and others will find this post when in distress!

The past three years I’ve been working for ScraperWiki as a data scientist, this has meant a degree of coding in Python and interacting with my colleagues, and some customers, who use Linux (principally Ubuntu) or OS X. I have continued to use a Windows laptop. You can see my review of it here.

Until recently my setup was based on a core installation of Python and a whole bunch of handy libraries using Python(x,y). I also installed Git for Windows which gave me a shell prompt, and the command-line git commands along with some fraction of the bash environment. I also installed msysgit which provided further Linux style enhancements to my shell. I configured my shell so I could get ssh access to ScraperWiki servers in the cloud. For reasons I can’t recall I also installed ansicon.exe which gives the Windows Command prompt some of the colour highlighting of a modern shell prompt.

With this setup I could do most of what I needed from Windows, and if I had to I could fire up my Ubuntu VM and work in there. Typically I did this when I had some tricky libraries to install, or I wanted to be sure I could deploy onto ScraperWiki’s servers in the cloud. I never really got virtual environments working nicely on Windows – virtualenvwrapper, which makes such things, nicer is challenging to configure on Windows.

Students of this sort of thing will appreciate that the configuration described above is reached with a degree of trial and error, and a lot of Googling of error messages.

Times have changed and this setup was getting a bit long in the tooth, the environment around me was also changing – we started using Docker. I couldn’t get code using the Python requests library to run because of problems with SSL. Also, all the cool kids were talking about Python 3 and how new projects should all be in Python 3. I couldn’t work out how to add Python 3 to my Python(x,y) installation, and furthermore I was currently tied to 32-bit rather than 64-bit Python. ScraperWiki had recently done some work on making an easily deployable Python application and identified that the Anaconda Python distribution by ContinuumIO was the way to go.

Installing Python 3 and 2 using Anaconda

This worked very smoothly, there is an installer here. I had Python 3.4.3 (64-bit) working in the twinkling of an eye, and from my bash prompt I could now run that Python code which was previously broken due to OpenSSL problems. However, all was not rosy since it turns out my latest project was accidently Python 3 compatible, whilst my older projects were not. I therefore needed Python 2 as well. In principle, with Anaconda this is as simple as doing:

conda create -n python2 python=2.7 anaconda

and then

source activate python2

This puts you into a Python 2 virtual environment which will run your old code. However, it doesn’t work from the Git Bash prompt, you need to use a Windows Command prompt, as discussed here. But at least I now have the latest whizzy Python 3 installation and I can also run Python 2, when required. It’s worth noting that installing new libraries on Python under Windows has become rather easier with newer versions of pip, I believe due to the introduction of pip wheel. In the past installing some libraries was a pain because of a need to compile binary components.

Using Docker on Windows with Docker Toolbox and the Git SDK

The next task was to get support for Docker, the container system. You can find out more about Docker in my blog post here. Essentially it is a method for running an application in an isolation unit which is defined by a simple Dockerfile, largely removing problems of dependencies and versions. Docker is intrinsically a Linux technology, it relies on several deeply embedded components of the operating system and so does not run on Windows. However, you can boot up a very lightweight Linux-based VM and run Docker images on that from either Windows or OS X. Until recently this was done using boot2docker. The new way is to use the Docker Toolbox. I held off installing this until it became Windows 10 compatible since as a neophile I have obviously upgraded to Windows 10 at earliest opportunity. Docker Toolbox installs VirtualBox to run a VM to host Docker and Git for Windows to provide a bash shell prompt, as well as the Docker commandline tools.

I found installing Docker Toolbox relatively smooth although I had a problem with it finding ssh key files with an error message “open <filepath\ca.pem : The system cannot find the file specified” which was fixed by regenerating the key files:

docker-machine regenerate-certs default

But this alone does not give me the right workflow since ScraperWiki make heavy use of Make to build and run containers and Git for Windows does not come configured with Make. You can see this in action for the Simple API we made for the NewsReader Project. I used the Git for Windows SDK to provide Make and other build tools. This is designed for use by Git for Windows developers, it’s based on msys2 which I also tried to install but which errored on a couple of steps. The Git SDK is more verbose in its installation appeared to install cleanly.

Once we have Git for Windows SDK we need to use its git-bash to launch Docker Quickstart Terminal (rather than the version provided by the Toolbox), this means changing the command executed by the Docker Quickstart Terminal shortcut from:

"C:\Program Files\Git\git-bash.exe" "C:\Program Files\Docker Toolbox\start.sh"

to

C:\git-sdk-64\git-bash.exe "C:\Program Files\Docker Toolbox\start.sh"

Update 2016-03-21: I modified start.sh to give the docker-machine binary an absolute path, this means I can launch a plain Git Bash shell and run the start.sh script later, if required. This change requires further modification to make sure paths were properly escaped. You can see my version of start.sh here: https://gist.github.com/IanHopkinson/85453a90212eb6627f29

Simply trying to run the Git SDK version of the make tool does not seem to work, you get an error like “unable to make temporary trusted Dockerfile”.

We’re into the final straight now!

My final problem was that when I tried to make my previously working application it failed with an error message:

IOError: [Errno 2] No usable temporary directory found in ['/tmp', '/var/tmp', '/usr/tmp', '/home/newsreader-demo']

The problem seems to be the way in which msys2 handles paths in Windows it needs to have two preceding //, rather than one, as described here. So all I need to do is change this line in my Makefile

@docker run -p 8000:8000 --read-only --rm --volume /tmp -e NEWSREADER_PUBLIC_API_KEY ianhopkinson/newsreader_demo

To this:

@docker run -p 8000:8000 --read-only --rm --volume //tmp -e NEWSREADER_PUBLIC_API_KEY ianhopkinson/newsreader_demo

Can you see what I did there?

Update 2015-10-14 – interactive shells into docker

If you try to get an interactive shell on a container then you get an error like:

cannot enable tty mode on non tty input

To avoid this you can use winpty:

winpty docker exec -i -t [CONTAINER_NAME] bash

There’s some discussion of this on the Docker Toolbox issue tracker

Update 2015-10-22 – Which Python are you using?

It turns out I was accidentally using the Python shipped with Git for Windows SDK, rather than the Anaconda version I had so carefully installed. I fixed this by adding this to my .profile file:

export PATH=/c/anaconda3/:$PATH

I didn’t spot it earlier because I checked Python version by running ipython rather than python.

Concluding thoughts

I wrote this partly in frustration at the amount of time I spent getting this all fixed up, and the fact that I couldn’t stop until I had fixed it. The scheme above worked for me but I suspect it is quicker and easier to do on a laptop with no history.

There’s no doubt that the situation is better than I found it 3 years ago but it is still a painful process involving much trial and error. Docker brings great benefits for developers, and once it is working makes sharing your work across multiple users very straightforward.

Book review: The Son also Rises by Gregory Clark

downloadThe Son also Rises by Gregory Clark is a book about social mobility, as traced through surnames. Clark prefaces his work by saying that what he is to say might be considered radical and controversial. Other studies of social mobility have find modest “inheritability” between generations. This study finds high levels of inheritability spanning hundreds of years.

The theme for the early chapters is to find some source of high status individuals – be it graduation from prestigious universities such as Oxford, Cambridge or the American Ivy League, membership of professional bodies such as those for doctors or attorneys or from financial records such as occasional tax releases or records of wills (probate). Next a cohort of names is tracked through these systems and their level of incidence is compared against the background level of incidence for that surname. For example, “Smythe” is a relatively rare surname in the general UK population but it is found at a much higher level in records of registered doctors.

The selected cohort of surnames may be from a distinctive ethnic population – i.e. Japanese in America, Native Americans or French settlers. Or it may be selected from a set of high status individuals at a point in time i.e. the Normans who came the England with William the Conqueror, or Swedish nobles.

Clark’s discovery is that for all of these many cohorts across multiple measures of status the persistence over time is strong. The Smythes of 200 years ago had relatively high status then and they still do now. After nearly a 1000 years those with surnames associated with the Norman conquest are still a little over-represented in the intake of Oxford and Cambridge University. Similar behaviour is found for low status groups, Baldrick’s character through the several series of Blackadder is not far from the truth. In both cases these groups are “regressing towards the mean” but it is a long, slow process.

Following these initial demonstrations of social mobility, Clark states his general law which is that the correlation of status over generations is high compared to previously measured parent-child measures and remarkably constant across multiple countries, periods in history and cohorts. The magic number for the correlation is 0.75. He argues that the reason that his estimate is higher than others is that he models social mobility with an underlying constant and a random fluctuation, the methods of calculation for early figures mean that this random fluctuation is much more apparent and brings down the measured social mobility. I don’t feel he demonstrates the origin of this discrepancy very clearly.

Subsequent chapters go on to look at some cases where one really expects deviations from this general rule, in the Indian caste system where low mobility is expected and also in China, where post-revolution is expected to be a time of high social mobility. It turns out that in India, despite laws aimed at reducing caste based discrimination, social mobility is has not improved dramatically. In China social mobility seems to have been little bothered by the revolution. The odd groups that do break the rule of constant social mobility seem to do so by preferential recruitment i.e. in the past in Muslim countries non-Muslims were tolerated but charged a poll tax which meant that lower status/income people were more likely to convert to Islam leaving a more persistently high status non-Muslim population. A second route is by strong preference for “in group” marriage which is seen in the Indian Brahmin caste. It turns out that the surnames identified with British parliamentarians are particularly immobile.

As for the origin of this constant social mobility, Clark ascribes it to what he calls “social competence”. There is a confused discussion of the balance of nature and nurture, not helped by a table where nature and nurture headings are accidently swapped (I think). I believe that technically it is all nurture, and Clark is trying to work out whether it is all about money. It strikes me that your wider family is where you learn about what the possibilities are for you and, while every family has it’s black sheep, the fact that your father, two out of three uncles went to Cambridge University means that your expectation is that you should aspire to that. Your family sets what is “normal”.

I suspect that this is particularly the case for British parliamentarians where there seems to be a lot of siblings (Milibands, Johnsons, Eagles), husband wife (Cooper/Balls) and parent-child (Kinnock, Benn) combinations. Being a politician is an odd sort of job, there is not really a class at school for it, seeing your family working in the “family business” must be a big influence.

“The Son also Rises” is an interesting read but turning it into a 300 page book seems to belabour the point somewhat. I liked the incidental details of the origins of surnames, and the various sources of information on social status.

I got this as a Kindle edition, I wish I’d bought it as a paperback, there are numerous figures, tables and equations which didn’t render at a reasonable size in the first instance.

Book review: The Values of Precision edited by M. Norton Wise

valuesofprecisionThe Values of Precision edited by M. Norton Wise is a collection of essays from the Princeton Workshop in the History of Science held in the early 1990s.

The essays cover the period from the mid-18th century to the early 20th century. The early action is in France and moves to Germany, England and the US as time progresses. The topics vary widely, starting with population censuses, then moving on to measurement standards both linear and electrical, calculating methods and error analysis.

I’ve written some notes on each essay, skip to the end of the bullet points if you want the overview:

  • The first article is about the measurement of population, mainly in pre-revolutionary France. This was spurred by two motivations: firstly, monarchs were increasingly seeing the number of their subjects as a measure of their power and secondly, there was a concern that France was experiencing depopulation. In the 17th century the systematic recording of births, deaths and marriages was mandated by royal direction. In the period after this populations were either estimated from a count of “hearths” or from the number of births. The idea being that you could take either of these indirect measures and multiple them by some factor to get a true measure of population.
  • The second article is by Ken Alder, he of “The Measure of All Things” and is another trip to revolutionary France and their efforts to introduce a metric system of measurement. The revolutionary attempt failed but the system of standards they created prevailed in the middle of the 19th century but not without some effort. Alder highlights the resistance of France to metrification, and also how the revolution bred a will to introduce a rational system based on natural measurements rather than a physical object created by man. He also discusses some of the benefits of the pre-metric system: local control, the ability for workers to take a cut without varying price, connection to effort expended/quality. This last because land was measured in terms of the amount of grain used to seed it or the area one person could harvest in a day – this varies with the quality of the land.
  • Jan Golinski writes on Lavoisier (again from France at the turn of the Revolution) regarding “exactness” and its almost political nature. Lavoisier made much of his exact measurements in the determination of the masses of what are now called hydrogen and oxygen in producing a known mass of water. This caused some controversy since other experimenters of the time saw his claims of exactness in measurement to be mis-used in supporting his theory for chemical reactions. There were reasons to be sceptical of some of his claims, he often cited weighed amounts to more significant figures than were justified by the precision of his measurements and there are signs his recorded measurements are a little too good to be true. These could be seen as the birthing pains of a new way of doing science which didn’t just apply to chemical measurements of the time, but also to surveying and the measurement of population. These days the inappropriateness quoting of more significant figures than are justified by the measurement is drummed into students at an early age.
  • Next we move from France to Germany and a discussion of the method of least squares, and the authority of measurements by Kathryn M. Olesko. Characters such as Legrendre and Laplace had started to put the formal analysis of error and uncertainty in measurement on the map. This work was carried forward by Gauss with the method of least squares, essentially this says that the “true” value of a measurement is that which minimises the squared difference of all the measurements made of that value. It is an idea related to probability, and it is still deeply embedded in how we make measurements today and also how we compare measurement to theory. In common with events in France, the drive for better measurement came in Germany with a drive to standardise weights and measures for the purposes of trade. The action here takes place in the first half of the 19th century.
  • The trek through the 19th century continues with Simon Schaffer’s essay on the work in England and Germany on electrical units with a particular view to establishing whether the speed of light and the speed of propagation of electromagnetic waves were the same. This involved the standardisation of units of electrical resistance. It was work that went on for some time. Interesting from a practicing scientists point of view was the need for the bench scientist and instrument makers to work closely together.
  • The next chapter is a step away from the physical sciences with a look at life insurance and the actuarial profession in the first half of the 19th century. Theodore Porter describes the attitude of this industry to precision and calculation, noting that they fended off attempts to regulate the industry too tightly by arguing that there business could not be reduced to blind calculation. The skill, judgement and character of the actuary was important.
  • The Image of Precision is about Helmholtz’s work on muscle physiology in around 1850, he used an apparatus which showed the extension of a muscle graphically following stimulation, and measured the speed of nerve impulses using similar methods. The graphical method was in some senses less precise than an alternative method but it was a more compelling explanatory tool and provided for better understanding of the phenomena under study.
  • Next up is a discussion of the introduction of so-called “direct-reading” ammeters and voltmeters by Ayrton and Perry in around ~1870. This was an area of some dispute, with physicists claiming that determinations of volts and amps be made by reference to the basic units of length, time and mass. Ayrton and Perry were interested in training electrical engineers whose measurements would be made in environments not conducive to these physicist-preferred measurements. Not conducive in both a technical sense (stray magnetic fields, vibration and so forth) nor in the practical sense (an answer within 1 percent in 10 minutes was far superior to one within 0.5 percent in 2 hours).
  • As we approach the end of the book we learn of Henry Rowland, and his diffraction gratings, made at John Hopkins university. Rowland had toured Europe, and on his return set to making high quality diffraction gratings to measure optical spectra. This is a challenging technical task, to be useful a diffraction grating needs many very closely spaced lines of the same profile. Rowland sent out his diffraction gratings for a nominal price, making no profit, but did not reveal the details of his methods. It took many years for his work to be better, and even longer yet for better diffraction gratings to be available generally.
  • The collection finishes with the construction of mathematical tables, starting with a somewhat philosophical discussion of the limits of calculation but moving onto more pragmatic issues of the calculation and sharing tables. The need for these tables came original with the computationally intensive calculations for determining the longitude by the method of lunar distances. The 19th century saw the growth in mathematical analysis in a range of areas, spreading the need to make mathematical tables. Towards the end of the century machine calculation was used to help build these tables, and do the analysis they supported. Students of my generation will likely just about remember using tables of trigonometric and other functions, these days in my practical work they are entirely replaced by computer calculations done on demand.

There is a lot in here which will speak to those with a training in science, physics in particular. The techniques discussed and the concerns of the day we will recognise in our own training. The essays hold a slight distance from practitioners in this arts but that brings the benefit of a different view. Core to which is the way in which precision in measurement is a social as well as technical affair. To propagate standards of measurement requires the community to build trust in the work of others, this does not happen automatically.

I like this style of presentation, each essay has its own character and interest. The range covered is much larger than one might find in a book length biography, and there is a degree of urgency in the authors getting their key points across in the space allocated.

In this book the various chapters do not overlap in their topics and cover a substantial period in time and space with the editor providing some short linking chapters to tie things together. All in all very well done.

Book Review: Stargazers–Copernicus, Galileo, the Telescope and the Church by Allan Chapman

stargazersIt’s been a while since my last book review here but I’ve just finished reading Stargazers: Copernicus, Galileo, the Telescope and the Church by Allan Chapman.

The book covers the period from the end of the 16th century, the time of Copernicus and Tycho Brahe, to the early 18th century and Bradley’s measurement of stellar aberration passing Galileo, Newton and others on the way. Conceptually this spans the full transition from a time when people believed in a Classical universe with earth at its centre, and stars and planets plastered onto crystal spheres, to the modern view of the solar system with the earth and other planets orbiting the sun.

This development parallels that in Arthur Koestler’s classic book "The Sleepwalkers”, however Chapman’s style is much more readable, his coverage is broader but not so deep. Chapman introduces a wealth of little personal anecdotes and experiments. For instance on visiting Tycho Brahe’s island observatory he recounts a meeting with a local farmer who had in his living room a marked stone from the Brahe’s observatory (which had been dismantled by the locals on Brahe’s death). Brahe was hated by his tenants for his treatment of them, a hate that was handed down through the generations. Illustrations are provided in the author’s own hand, which is surprisingly effective. He discusses his own work in reconstructing historical apparatus and observations.

Astronomy was an active field from well before the start of this period for a couple of reasons: firstly, astrology had been handed down from Classical times as a way of divining the future. To was believed that to improve the accuracy of astrological predictions better data on the locations of heavenly bodies over time was required. Similarly, the Christian Church required accurate astronomical measurement to determine when Easter fell, across increasingly large spans of the Earth.

The period covered by the book marks a time when new technology made increasingly accurate measurements of the heavens possible, and the telescope revealed features such as mountains on the moon, sunspots and the moons of Jupiter visible for the first time. Galileo was a principle protagonist in this revolution.

Amongst scientists there is something of the view that the Catholic Church suppressed scientific progress with Galileo the poster boy for the scientist’s case. Historians of science don’t share this view, and haven’t for quite some time. Looking back on Sleepwalkers, written in 1959 I noted the same thing – the historians view is generally that Galileo brought it on himself in the way he dismissed those that did not share his views in rather offensive terms. Galileo lived in a time when the well-entrenched Classical view of the universe was coming under increased pressure from new observations using new instruments. In some senses it was the collision with the long-held Classical view of the universe which led to his problems, the Church being more committed to this Classical view of the physical universe rather than to anything proposed in Scripture.

The role of the Church in promoting, and fostering science, is something Chapman returns to frequently – emphasising the scientific work that members of the Church did, and also the often good relationships that lay “scientists” of different faiths had with Church authorities.

Chapman introduces some of the lesser known English (and Welsh) contributors to the story. Harriet who made the earliest known sketches of the moon. The Lancashire astronomers, who made the first observations of the transit of Venus. John Wilkins whose meetings were to lead to the foundation of the Royal Society. He also notes the precedent of the Royal College of Physicians, formed in 1518. The novelty of the Royal Society when compared with earlier organisations of similar character was that the Fellows were responsible for new appointments, rather than them being imposed by a patron. This seems to have been an English innovation, repeated in the Oxbridge colleges, and Guilds.

Relating to these English astronomers was the development of precision instruments in England. This seems to have been spurred by the Dissolution of the monasteries. The glut of land, seized by Henry VIII, became available to purchase. The purchase of land meant a requirement for accurate surveying, and legal documents. Hence an industry was born of skilled men wielding high technology to produce maps.

I was distracted by the presence of Martin Durkin in the acknowledgements to this book, he was the architect of “polemical” Channel 4 documentary “The Great Global Warming Swindle”, so it cast doubt in my mind as to whether I should take this book seriously. On reflection Chapman’s position as presented in this book seems respectable, but it is interesting how a short statement in the acknowledgements made me consider this more deeply.

Overall, Stargazers is rather more readable than Sleepwalkers, not quite so single-tracked in it’s defence of the Catholic Church as God’s Philosophers and a different proposition to Fred Watson’s book of the same name, which is all about telescopes.

The London Underground – Can I walk it?

caniwalkitThere are tube strikes planned for 25th August 2015 and 28th August 2015 with disruption through the week. The nature of the London Underground means that it is not all obvious that walks between stations can be quite short. This blog post introduces a handy tool to help you work out “Can I walk it?

You can find the tool here:

http://www.caniwalkit.co.uk/

To use it start by selecting the station you want to walk from, either by using the “Where am I?” dropdown or by clicking one of the coloured station symbols (or close to it). The map will then refresh, the station you selected is marked by a red disk, the stations within 1.5 miles of the starting station are marked by an orange disk and those more than 1.5 miles away are marked by a blue disk. 1.5 miles is my “walkable” threshold, it takes me about 25 minutes to walk that far. You can enter your own “walkable” threshold in the “I will walk” box and press refresh or select a new starting station to refresh the map.

The station markers will show the station names on mouseover, and the distances to the starting station once it has been selected.

This tool comes with no guarantees, the walking distances are estimated and these estimates may be faulty, particularly for river crossings. Weather conditions may make walking an unpleasant or unwise decision. The tool relies on the user to supply their own reasonable walking threshold. Your mileage may vary.

To give a little background to this project: I originally made this tool using Tableau. It was OK but tied to the Tableau Public platform. I felt it was a little slow and unresponsive. It followed some work I’d done visualising data relating to the London Underground which you can read about here.

As an exercise I thought I’d try to make a “Can I walk it?” web application, re-writing the original visualisation in JavaScript and Python. I’ve been involved with projects like this at ScraperWiki but never done the whole thing for myself. I used the leaflet.js library to provide the mapping, the Flask library in Python to serve the data, Boostrap to make it look okay and Docker containers on Digital Ocean to deploy the application.

The underlying data for this tool comes from Open Street Map, where the locations of all the London Underground stations are encoded as latitude and longitude. With this information in hand it is possible to calculate the distances between stations. Really I want the “walking distance” between stations rather than the crow flies distance which is what this data gives me. Ideally to get the walking distance I’d use Google Directions API but unfortunately this has a rate limit of 2500 calls per day and I need to make about 36000 calls to get all the data I need!

The code is open source and available in this BitBucket repository:

https://bitbucket.org/ian_hopkinson/london-underground-app

Comments and feedback are welcome!