Dr Administrator

Author's posts

Book review: An Introduction to Geographical Information Systems by Ian Heywood et al

I’ve been doing quite a lot of work around Geographical Information Systems recently, so I thought I should get some background understanding to avoid repeating the mistakes of others. I turned to An Introduction to Geographical Information Systems by Ian Heywood, Sarah Cornelius and Steve Carver, now in its fourth edition.

This is an undergraduate text, and the number of editions suggests it is a good one. The first edition of Introduction was published in 1998 and this shows in the content: much of the material is rooted in that time, with excursions into more recent matters. There is mention of CRT displays and Personal Digital Assistants (PDAs). This edition was published in 2011; obviously quite a lot of new material has been added since the first edition, but the original material clearly forms the core of the book.

I quite deliberately chose a book that didn’t mention the latest shiny technologies I am currently working with (QGIS, OpenLayers 3, spatial extensions in MariaDB) since that sort of stuff ages fast and the best, i.e. most up to date, information is on the web.

GIS allows you to store spatially related data with the ability to build maps using layers of different content and combine this spatial data with attributes stored in databases.

Early users were governments both local and national and their agencies, who must manage large amounts of land. These were followed by utility companies who had geographically distributed infrastructure to manage. More recently retail companies have become interested in GIS as a way of optimising store location and marketing. The application of GIS is frequently in the area of “decision support”, along the lines of “where should I site my…?” Although, “how should I get around these locations?” is also a frequent question. And with GPS for route finding arguably all of us carry around a GIS, and they are certainly important to logistics companies.

From the later stages of the book we learn how Geographic Information Systems were born in the mid to late 1960s, became of increasing academic interest through the 1970s, started to see wider uptake in the eighties and became a commodity in the nineties. With the advent of Google Maps and navigation apps on mobile phones GIS is now ubiquitous.

I find it striking that the Douglas-Peucker algorithm for line simplification, born in the early seventies, is recently implemented in my favoured spatially enabled database (MariaDB/MySQL). These spatial extensions in SQL appear to have grown out of a 1999 standard from the OGC (Open Geospatial Consortium). Looking at who has implemented the standards is a good way of getting an overview of the GIS market.
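
For anyone who hasn’t met it, the idea behind Douglas-Peucker is simple enough to sketch in a few lines of plain Python. This is only an illustration of the technique, not the code used in MariaDB or MySQL; the function names and the tolerance parameter are made up for this example, and the points are assumed to be planar (x, y) pairs rather than latitude and longitude.

```python
import math


def perpendicular_distance(point, start, end):
    """Distance from point to the straight line through start and end."""
    (x, y), (x1, y1), (x2, y2) = point, start, end
    dx, dy = x2 - x1, y2 - y1
    length = math.hypot(dx, dy)
    if length == 0.0:
        # start and end coincide, so fall back to point-to-point distance
        return math.hypot(x - x1, y - y1)
    return abs(dy * (x - x1) - dx * (y - y1)) / length


def douglas_peucker(points, tolerance):
    """Simplify a polyline, keeping points further than tolerance from the chord."""
    if len(points) < 3:
        return list(points)

    start, end = points[0], points[-1]

    # Find the interior point furthest from the chord joining the endpoints
    index, max_dist = 0, 0.0
    for i in range(1, len(points) - 1):
        dist = perpendicular_distance(points[i], start, end)
        if dist > max_dist:
            index, max_dist = i, dist

    # Everything is close enough to the chord: keep just the endpoints
    if max_dist <= tolerance:
        return [start, end]

    # Otherwise keep the furthest point and simplify each half recursively
    left = douglas_peucker(points[:index + 1], tolerance)
    right = douglas_peucker(points[index:], tolerance)
    return left[:-1] + right  # drop the duplicated split point
```

A larger tolerance throws away more points; with a tolerance bigger than any deviation in the line, only the two endpoints survive.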

The book is UK-centric but not overwhelmingly so, we learn about the Ordnance Survey mapping products and the UK postcode system, and the example of finding a site for a nuclear waste repository in the UK is a recurring theme.

Issues in GIS have not really changed a great deal: projection and coordinate transforms are still important, and a source of angst (I have experienced this angst personally!). We still see data quality issues in digitised data, although perhaps the source is no longer the process of manual digitisation from paper but inconsistency in labelling and GPS errors.

One of the challenges not discussed in Introduction is the licensing of geographic data. This has recently been in the news, with the British government spending £5 million to rebuild an open address database for the UK, having sold off the current one with the Royal Mail in 2013 (£5 million is likely just the start). UN-OCHA faces similar issues in coordinating aid in disaster areas: the UK is fairly open in making details of its administrative boundaries available electronically, but this is not the case globally.

I have made some use of conventional GIS software in the form of QGIS which, although powerful, flexible and capable, I find slow and ugly. I find it really handy for a quick look-see at data in common geospatial formats. For more in-depth analysis and visualisation I use a combination of spatial extensions in SQL, Python and browser technology.

I found the case studies the most useful part of this book, these are from a wide range of authors and describe real life examples of the ideas discussed in the main text. The main text uses the hypothetical ski resort of Happy Valley as a long running example. As befits a proper undergraduate introduction there are lots of references to further reading.

Despite its sometimes dated feel Introduction to Geographic Information Systems does exactly what it says on the tin.

Ride

A new job has brought me a new mode of transport for my daily commute. No longer do I spend an hour and a half on Merseyrail each day, instead I cycle across Chester (8 miles or 50 minutes a day). This isn’t a novelty to me, we lived in Cambridge for nearly 10 years and everybody cycles there. Although I passed my test many years ago, I don’t drive. So when it came to my new job cycling was the obvious way to work. Some people take the bus but for me that would mean one bus into the centre of town and one bus back out again – it’s quicker to cycle.

I’m a cycling commuter, rather than a dedicated cyclist who dons lycra and takes up a fancy road bike for the cycle to work. I wear a running top and cycling windproof but work shoes and trousers. The greatest innovation since my Cambridge days is the “transformer panniers” which convert from panniers to rucksack, ideal for day trips to London when I cycle to Chester’s main station. I treated myself to a new bike for the commute – a Raleigh Loxley – which cost the equivalent of 3 months of Merseyrail travel to Liverpool.

I get wet surprisingly infrequently, and I’ve flipped between shorts and rainproof trousers for rainy days. Winter rain on bare legs gives one an expensive-spa tingling sensation. I find waterproof trousers a bit clammy. Snow and ice haven’t been a problem this year. My dad was a lifelong cycling commuter, and recommended keeping a full change of clothes at work in case of unexpected rain.

I try to be as visible as possible, my jacket is acid green – slightly short of full high viz, I have a high viz helmet cover, two sets of lights front and back and lights on my spokes. I considered going for the super-blingy spoke lights.

I’m well catered for at work, whilst car drivers are squeezed into a space which seems to be 25% too small, I have a bike shed pretty much to myself except for an occasional motorbike and at most two other bikes. Not only that my bike is guarded through the day by a steady stream of smokers! If I wanted I could have a shower.

Amongst my colleagues I’m viewed as something of a novelty, of the 100 or so people on site I’m the only one who cycles regularly and there are rarely as many as three bikes in the shed. A few people have asked about my ride, their main concern seems to be safety.

I’ve found optimising my route has taken a while. There’s a chunk of cycle route on the way out of Chester towards the Business Park, that bit’s fine. The route expires shortly before I reach my destination, which is inconvenient: cycling three sides of the Business Park to get to my office seems excessive and it’s on dual carriageway with poor crossings – there is no cycle route. The piece of the road out of town which gets me fairly directly to my office is a bit narrow, as is the fragment of pavement that I’d need to traverse. A twisty path across the end of the Business Park is clearly not designed for cycling, and is unlit with an awkward gate at the end. A rather lovely looking route along Duke’s Drive is blocked at the Business Park end, which is a pity.

The rest of the route is more a case of finding the quietest roads, cycling through the town centre is not great – it’s cobbled, has a one way system and the pedestrians dodge backward and forward unpredictably. The pavements on the Grosvenor bridge are a bit too narrow, and the roadway is too. So you either menace pedestrians or have large vehicles itching to get past you.

It seems my cycling is more reliable than driving, two or three times in the last few months a large fraction of the people I work with have been held up by up to an hour by traffic.

The only thing I really miss about the train is the lost reading time.

(Click here for a Google Map of my route)

The Running Man

Over the last 8 months or so I’ve started running. Thomas has just turned four, and before he was born I used to go to the gym three times a week; this lapsed on Thomas’ arrival since life became so much busier. Long ago I used to run at school over middle distance, and I’m quite a keen walker – so running isn’t an alien concept to me.

Feeling the need to lose a little bit of weight, I got nagged by the doctor about it whenever I visit, and also finding exercise good for my mental health, I got started. It’s telling that the first thing I did was get a GPS watch so I could record my running, I did this before buying a decent pair of trainers! This is just the right motivation for me: I can see where I’ve been on a map (tick) and I can collect data (tick)! As a consequence I ran a bit too enthusiastically to start off and wore myself out.

I got the Garmin Forerunner 10, which is a fairly basic GPS watch. It takes up to a couple of minutes to find location from GPS when you switch on. Then all I do is press go and run. It reports time and distance travelled in kilometers, and flashes up a lap time at the end of each mile. At the end of my run I upload the data to the Garmin Connect website, where I can see where I’ve run on a map with timings and so forth, and I get an animated winner’s cup when I break one of my records! It will do more complicated things but to be honest when I’m running I can’t be doing with fiddling around with a tiny display, and the audible prompts it produces are pretty quiet.

Pain was an early problem, I had intense pain in my shins when I started running. This was fixed by buying some proper running shoes rather than simply using the Marks and Spencer’s trainers I’d fished out of the back of the cupboard. I got the new ones from Up and Running in Chester, who measured my gait on a treadmill with a video camera. It took a while before the pain subsided but now I sometimes get a rather atavistic feeling of being able to run for ever. This was how man won on the African savannah: he wasn’t a sprinter but he could keep going to the point his prey gave up, preferring to be eaten than run any further!

Collecting so much data, it’s obvious I must decorate this post with at least one chart. I didn’t aim for a big weight loss, I went from 85 kg to 75 kg. This means my belts all need an extra hole punching in them, and my waist has room to move in my trousers but I don’t need to replace my wardrobe. My general problem with losing more weight is that I see cake as a suitable reward to myself for losing weight, which is self-defeating. Mrs H rewards herself with sparkly nail varnish, I can’t say I’m not tempted to do the same. Below you see my weight loss over time, with a perceptible bump around Christmas wherein I converted Christmas cake efficiently to body mass.

[Chart: weight over time]

When I started running I was running for a bit then walking, running a bit more, walking. After about a month I was able to run for 5km without a break, I fairly regularly run 8km and once the weather and light have improved I may start doing a weekly 10km run, passing through Chester Zoo. I have some pleasant runs from my door, running along the Shropshire Union canal, although in the depths of winter this became hazardous due to the dark, so I ran the suburban roads. It’s sad that my wife doesn’t feel safe running along the canal at any time, or on the streets after dark.

You can see how my pace has quickened in the chart below showing my average time to run a mile. My increasing speed has tracked my loss in weight. I’ve been at a plateau for a while now. Interestingly, I started cycling 8 miles a day to and from work in November and this appears to have had no effect on my weight or speed.

[Chart: average time to run a mile]

Book review: A New History of Life by Peter D. Ward and Joe Kirschvink

This next book is a Christmas present, A New History of Life by Peter D. Ward and Joe Kirschvink.

The theme of the book is the evolution of life from the very early periods of life on earth with a particular emphasis on that which has been discovered over the last 20 or so years, they cite Life: A Natural History of the First Four Billion Years of Life on Earth by Richard Fortey as the last comparable work.

A New History follows a long thread of books I’ve read, some I’ve reviewed here such as Neil Shubin’s Your Inner Fish, others in the prehistory of my blogging such as Stephen Jay Gould’s Wonderful Life: the Burgess Shale and the Nature of History. I’ve also written about First Life, a program narrated by David Attenborough on the earliest life.

It turns out quite a lot has happened in the last 20 or so years, radio-dating has improved in sensitivity allowing us to probe the early years of life on earth in more detail, new fossil fields have opened up in China, the chemistry of the early earth is better understood, early “Snowball Earth” episodes have been identified, and the discovery of exoplanets has led to more interest in the very earliest stages of life. Indeed, Tiktaalik at the core of Shubin’s Your Inner Fish, one of the first vertebrates to walk on the land, was discovered this century.

I always feel a little cautious approaching popular books such as this, promising great and revolutionary things: the risk is they present a minority view unsupported by other experts in the field, and as an outsider you would be completely unaware of this. Thankfully, this isn’t the case here. Ward and Kirschvink are experts in early life but they are explicit where their own theories come into play and present alternative viewpoints in a fairly balanced way.

Half of the book covers the earliest life on earth up to the Cambrian Explosion 600-500 million years ago, when a huge diversity of life suddenly appeared. It seems that this is where the greatest new research activity has taken place. This includes work on ancient atmospheric composition (the relative amounts of carbon dioxide and oxygen); the “Snowball Earth” periods of complete glaciation, where there was only liquid water at the surface due to volcanic activity; the chemistry of early life; and the precursors to the Cambrian Explosion such as the Ediacaran fauna and other life such as Grypania and acritarchs, of which I had not heard. Vernanimalcula is also mentioned as the first bilateral animal, a microscopic fossil found in rocks 600 million years old, although I see from Wikipedia that this attribution is disputed.

The origins of life have the air of the physicist’s dark matter: there must be something there but we have little direct evidence for what it is and so it is ripe for a wide range of hypotheses. The big problem is the formation of RNA and DNA. Experiments have long shown that the basic building blocks of life can form in plausible early conditions stimulated by heat and lightning, but DNA and RNA are large, complex molecules, and not particularly heat stable. One of the authors (Kirschvink) is keen on a Martian genesis for these molecules, which were then transported by asteroid to earth. I’ve always found these extra-terrestrial origins proposals unlikely.

A New History highlights the dispute between Stephen Jay Gould and Simon Conway Morris over the Burgess Shale. Gould was a proponent of the idea that the Burgess Shale assemblage represented a massive diversification of forms, many of which are now extinct, whilst Morris sees the forms as precursors to modern forms.

Following on from the earliest times the rest of the book is a story of successive mass extinctions followed by diversifications. Aside from the K-T extinction 65 million or so years ago, caused largely by a massive asteroid impact, extinction events were caused by changes in atmospheric chemistry. Typically this involved high levels of carbon dioxide leading to global warming and lower levels of oxygen. These changes in atmospheric chemistry were driven by large scale geology and life. Other than microbes, life struggles to survive when oxygen levels are much below the 21 percent we currently enjoy; conversely, when oxygen levels are high large animals can evolve. The dinosaurs prevailed because they could survive at relatively low oxygen levels, and then became giants when oxygen levels rose above current levels.

I was amused to discover that reptiles and amphibians can’t run and breathe effectively at the same time, their splayed gait compresses the ribcage inconveniently. Creatures such as the dinosaurs resolved this problem by moving the legs beneath the body, our bipedal stance is even better since breathing and running can work entirely independently.

The book starts a little bombastically with comments on how boring history is perceived to be and how new and revolutionary this book is, but once it settles into its stride it’s rather readable.

Book review: Artificial intelligence for Humans: Volume 3 Deep Learning and Neural Networks by Jeff Heaton

Deep learning and neural networks are receiving more attention these days, you may have seen the nightmarish images generated using this technology by Google Research. I picked up Artificial Intelligence for Humans: Volume 3 Deep Learning and Neural Networks by Jeff Heaton to find out more since the topic fits in with my interests in data science and machine learning. There doesn’t seem to be much in the way of accessible, book length treatments of this relatively new topic. Most other offerings on Amazon have publication dates in the future.

It turns out that Artificial Intelligence for Humans is the result of a Kickstarter campaign. So far the author has funded three volumes on artificial intelligence by this route: two of them for around $18,000 and one for around $10,000. I paid £16 for the physical book, which seems like a reasonable price. I think it is a pretty well polished product: it doesn’t quite reach the editing and production levels of a publisher like O’Reilly but it is at least as good as other technical publishers. The accompanying code examples and web site are really nicely done.

Neural networks have been around for a long time, since the 1940s, and see periodic outbreaks of interest and enthusiasm. They are modelled loosely on the workings of biological brains, with “neurons” connected together with linkages of different weights which can be trained to perform tasks such as image recognition, classification and regression. The “neurons” are grouped into layers with an input layer, where data enters, feeding into potentially multiple successive “hidden” layers, finally leading to an output layer of neurons where results are read off. The output of a neuron is calculated by summing its inputs multiplied by the weights of the inputs and feeding the result through an “activation function”. The training process is used to optimise the weights, and may also evolve the structure of the network.
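
The neuron calculation described above fits in a few lines of plain Python. This is a minimal sketch rather than anything from the book’s accompanying code: the function names are invented for illustration, the sigmoid is just one common choice of activation function, and the bias term is an assumption of mine that the description above doesn’t mention.

```python
import math


def sigmoid(x):
    """A common activation function, squashing any value into the range 0 to 1."""
    return 1.0 / (1.0 + math.exp(-x))


def neuron_output(inputs, weights, bias=0.0, activation=sigmoid):
    """Sum the inputs multiplied by their weights, then apply the activation."""
    # bias is an assumed extra term; with the default of 0.0 it has no effect
    weighted_sum = sum(i * w for i, w in zip(inputs, weights)) + bias
    return activation(weighted_sum)


def layer_output(inputs, weight_rows, biases, activation=sigmoid):
    """Outputs of a whole layer: one row of weights and one bias per neuron."""
    return [
        neuron_output(inputs, weights, bias, activation)
        for weights, bias in zip(weight_rows, biases)
    ]
```

Feeding the list returned by layer_output into another call to layer_output gives a network with a hidden layer; training is then the business of finding weights that make the final outputs useful.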

I remember playing with neural networks in the 1980s, typing a program into my Amstrad CPC464 from a magazine which recognised handwritten digits; funnily enough this is still the go-to demonstration of neural networks! In the past neural networks have not gained traction because of the computational demands of training. This problem appears to have been solved with new algorithms and GPU-based computation. A second innovation is the introduction of techniques to evolve the structure of neural networks to do “deep learning”.

Much of what is presented is familiar to me from my reading on machine learning (supervised and unsupervised learning, regression and classification), image analysis (convolution filters), and old-fashioned optimisation (stochastic gradient descent, Levenberg-Marquardt, genetic algorithms and simulated annealing). It does lead me to wonder sometimes whether there is nothing new under the sun and that many of these techniques are simply different fields of investigation re-casting the same methods in their own language. For example, the LeNet-5 networks used in image analysis contain convolution layers which act exactly like convolution filters in normal image analysis, and the max pool layers have the effect of downscaling the image. One would anticipate the combination of these to give the same effect as multi-scale image processing techniques.
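
To make the comparison concrete, here is a rough sketch of the two operations in plain Python, working on an image held as a list of rows. It is my own illustration rather than code from the book or from any neural network library: the function names are made up, the filter is applied without flipping the kernel (as is usual in convolution layers), and there is no padding, so the output shrinks slightly.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image, summing the element-wise products."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    return [
        [
            sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(k_rows)
                for j in range(k_cols)
            )
            for c in range(out_cols)
        ]
        for r in range(out_rows)
    ]


def max_pool(image, size=2):
    """Downscale by keeping the largest value in each size x size block."""
    out_rows = len(image) // size
    out_cols = len(image[0]) // size
    return [
        [
            max(
                image[r * size + i][c * size + j]
                for i in range(size)
                for j in range(size)
            )
            for c in range(out_cols)
        ]
        for r in range(out_rows)
    ]
```

In a convolution layer the numbers in the kernel are learned during training, whereas in conventional image analysis they are chosen by hand, for example to pick out edges or to blur.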

The book provides a good summary on the fundamentals of neural networks, how they are built and trained, what different variants are called and then goes on to talk in more detail about the new stuff in deep learning. It turns out the label “deep” is applied to neural networks with more than two layers, which isn’t a particularly high bar. It isn’t clear whether this is two layers including the input and output layers or two layers of hidden neurons. I suspect it is the latter. These “deep” networks are typically generated automatically.

As the author highlights, with the proliferation of easy to use machine learning and neural network libraries the problem is no longer the core algorithm, rather it is the selection of the right model for your particular problem and optimising the learning and evaluation strategy. As a Pythonista it looks like the way to go is to use the nolearn and Lasagne libraries. A measure of this book is that when I go to look at the documentation for these projects the titles at least make sense.

The author finishes off with a description of his experience with doing a Kaggle challenge. I’ve done this, it’s a great way of getting some experience in machine learning techniques on nearly real problems. I thought the coverage was a bit brief but it highlighted how neural networks are used in combination with other techniques.

This isn’t an in-depth book, but it introduces all the useful vocabulary and the appropriate libraries to start work in this area. As a result I’m off to try t-SNE on a problem I’m working on, and then maybe try some analysis using the Lasagne library.