Tag: computing

Book review: The Programmer’s Brain by Felienne Hermans

programmers_brainI picked up The Programmer’s Brain by Felienne Hermans, as a result of a thread on Twitter. I’ve been following Hermans for quite a while, and knew the areas of computer science she worked in but my interest in Programmer’s Brain was stimulated by a lengthy thread she posted over the Christmas break.

The book is based around the idea of the brain as having long term (LTM), short term (STM) and working memory and how these different sorts of memory come into play in programming tasks, how we can improve our memories, and how we can write code that supports our use of them. It cites a fair number academic studies in each area it looks at.

The book is divided into four parts.

The first part covers the reading of code. We do a lot of training on how to write code but none on reading it, yet as developers we spend a lot of time reading code, either our own code from the past, the code of our colleagues or library code.

Perhaps most traumatic for me was the suggestion that I should learn syntax. Hermans suggests flash cards to learn syntax, as an aid to reading code (and writing it), highlighting that going and looking up syntax is likely to break our flow, by the time we have checked out twitter and some pictures of kittens. Thinking about my own behaviour, this is definitely true. My first flash cards would all be around Python – set syntax, format statements, unittests boilerplate and the options for sort and sorted.

An idea I hadn’t come across before was refactoring code for readability which may be at odds with how code currently stands; you might, for example, inline functions to remove the need to go look them up and potentially lose your place in code. Or replace lambdas, list comprehensions or ternary operators – all of which take a bit more effort to parse – with their more verbose, conventional alternatives.

Two things that aid reading code are "chunking", experts in a field, like chess or programming, don’t learn remember every detail but they know the rules of possibility so they can break up a programme or a chess position into larger pieces (or chunks). They thus have better recall than novices.

The second aid to reading code are beacons, variable names and comments that hint about the higher purpose of code, to enable you to recall the right chunks. That’s to say if you are implementing code that uses a binary try you use the conventional names of root, branch, node, left and right rather than trying to be individualistic.

I suspect a lot of programmers, like me, will be looking at the rote learning exercises that Hermans proposes and starts to think immediately about how to automate them! I think there is scope for IDE extensions that allow you set up some flashcards or little code exercises. Also Hermans proposes quite a lot of printing out code and annotating it, again this was something I’d quite like IDE support for.

The second part is on understanding code more deeply, how it works. I was interested to learn that our natural language abilities are a better predictor of how good we are at comprehending what code does, than our mathematical abilities. In terms of understanding code, Hermans talks about marking up listings of code to highlight the occurrence of functions and variables. And, furthermore, to label variables by role following the work of Sajaniemi that is to say into the categories of fixed value, stepper, flag, walker (like a stepper), most recent holder, most wanted holder, gatherer, container, follower, organiser, and temporary. The co-occurrence of these roles provides strong clues as to what code does – in the same manner as design patterns. If we spot a design pattern we can access our long term memory as to what a design pattern does.

Following on from the idea of labelling roles of variables is the somewhat depreciated "Hungarian notation" proposed by Simonyi. This is where you include some type or role information in a variable name such as "strMyName" or "lb_textbox", Simonyi’s original proposal was to name variables with their roles, rather than just their types which is rather less useful in strongly typed languages and modern IDEs with syntax highlighting.

The third part is on writing code, starting with the importance of naming things. The key here is consistency in naming (i.e. stick with either snake case or camel case, don’t mix), and agreeing a "name mould" – a pattern for compiling parts of a name. Martin Fowler’s "code smells" are also covered in this section, highlighting how they interact with the model and how bad code smells prevent us accessing our long term memories. 

The final part is on collaborating on code, including the developer’s great bugbear "the interruption", it turns out this annoyance is well-founded with research showing that an interruption typically requires 15-20 minutes for recovery. I was also interested to see that we are not very good at multi-tasking, although we might think we are.

Also in this part is a discussion of the cognitive dimensions of code bases (CDCB), these are ideas like the error proneness of code, how easy it is to modify, how easy it is to test in parts applied at the level of an application or library. There is an implication here that the language you use to build a library may change over the course of time, perhaps starting with Python when you are roughing things out quickly, adding in type hinting when the library is more mature and shifting to Scala or Java when the design is stable and better performance is needed.

Finally, there is a small piece on onboarding new developers to a project, here the ideas of cognitive load repeat. Often when we are onboarding a new developer we show them the code, introduce a load of people, draw diagrams and so forth – all very fast. Under these circumstances our ideas about cognitive load tell us anyone will be overwhelmed.

I enjoyed this book, it feels like a guide to getting better at doing something I spend a lot of my time on. It is an area, learning in the field of programming, that I have not seen written about elsewhere.

Hopefully this book will change the way I work a bit, I’ll try to learn more syntax, I’ll not worry about reusing the same variable names, or even using Hungarian notation. I’ll try to remember the roles of variables. And I’ll try Hedy out with my son, Hedy is the teaching language Hermans wrote while also writing this book.

Book review: Exercises in Programming Style by Cristina Videira Lopes

exercises_in_programming_styleRecently our CIO has allowed us to claim one technical book per quarter on expenses as part of our continuing professional development. Needless to say since I was buying these books already I leapt at the opportunity! The first fruit of this policy is Exercises in Programming Style by Cristina Videira Lopes.

The book is modelled on Raymond Queneau’s book Exercises in Style which writes the same story in 99 different ways.

Exercises in Programming Style takes a simple exercise: counting the frequency of words in a file and reporting the top 25 words, and writes a program to do this in forty different styles spread across 10 sections.

The sections are historical, basic styles, function composition, objects and object interaction, reflection and metaprogramming, adversity, data-centric, concurrency, interactivity, and neural networks. The section on neural networks breaks the pattern with example programmes only handling small elements of the word frequency problem. The sections vary in size, the objects and object interaction is the largest.

Lopes talks about styles in terms of constraints, for example in the "Good old times" historical style there are no named variables and limited memory, in the "Letterbox" style objects pass messages to one another to prompt actions.

The shortest implementation of the example is in the "Code Golf" chapter with just six lines, other examples run to a couple of pages – a hundred lines or so. Lopes is somewhat opinionated as to style but quite balanced providing reasoning where unusual styles may be appropriate. This was most striking for me in the section on "Adversity" which discussed error-handling. Lopes suggests that a "Passive Aggressive" style with error handling all occurring at the top level in a try-except block is better than my error handling to date which has been more in the "Constructivist" (trapping errors but proceeding with defaults) or "Tantrum"(catching errors and refusing to proceed) style.

Sometimes the fit to the style format feels slightly forced, in particular in the chapters relating to neural networks but in the Data-Centric chapter I learnt how to implement spreadsheet-like functionality in Python which is interesting.

I’ve been programming for about 40 years but as a physical scientist analysing data or trying out numerical models rather than a professional developer. Exercises  brings together many bits and pieces of things I’ve learnt, often in the context of different languages. For a while I’ve had the feeling that I didn’t need to learn new languages, I needed to learn how to apply new techniques in my favoured language (Python) and this book does exactly that.

Once again I was bemused to see Python’s "gentleman’s agreement" methodology over certain matters. By convention methods of a class whose name start with an underscore are considered private but this isn’t enforced so if you really want to use a "private" method just go ahead. Similarly many object-oriented languages support a "this" keyword for the members of a class to refer to themselves. Python uses "self" but only by convention, you can specify "self" is "me" or whatever other name you please. The style format provides a nice way of demonstrating a feature of Python in a non-trivial but minimal functioning manner.

It is somewhat chastening to discover that many of the styles in this book had their abstract origins in the 1960s, shortly before I was born, entered experimental languages such as Smalltalk in the seventies where I would have read about them in computer magazines and became mainstream in the eighties and nineties in languages like C++, Java and Python, not long after the start of my programming career. Essentially, most of the action in this book has taken place during my lifetime! In physics we are used to the figures in our eponymous laws (Newton, Maxwell etc) being very long dead. In computing the same does not apply.

What I take away from Exercises is that to a fair degree modern programming languages can be used to implement a wide range of the ideas generated in computer science over the last 50 or so years so in improving your skill as a programmer learning new languages is not the highest priority. There is a benefit to learning new techniques in a language in which your are familiar. Clearly some languages are designed heavily to support a certain style, for example Haskell and functional programming but I found it easier to understand monads explained in the context of Python than in Haskell where everything was alien.

Exercises is surprisingly readable, the programs are well-documented and Lopes’ text is short but clear with references to further reading. It stands alongside Seven databases in Seven Weeks by Eric Redmond and Jim R. Wilson as a book that I will rave about and recommend to everyone!

Far, far away

This week I have journeyed into the heart of darkness.

Actually it was my company’s IT outsourcing system. I work for a very big company: it has about 150,000 employees spread across the world. I work in north west England amongst other things I look after a little unit which uses a particular piece of bespoke software, the unit involves seven people in an office a couple of hundred metres from where I sit at work. The tale of our new bespoke software is long and tortuous and I won’t go into it here but to relate my adventures in getting the test version of the software copied onto the live system today.

The servers on which this software resides are located in North Wales (15 miles away) and a spot down the road about 8 miles away. The outsourcing of our IT services means that the manager for this process is located in the Netherlands, and the person actually doing the process, Supriya, is in India. I can tell she is in India because she has an Indian phone number. Her e-mail signature says her “office base” is in North Wales, it must be a bit inconvenient having your “office base” in North Wales, a location I suspect Supriya has never visited, and a phone in India. Do my company think I am some sort of dribbling BNP little Englander who would dissolve in rage if I thought I was dealing with someone in India? I regularly work with people from China, France and even the US, trying to obfuscate where someone works is frankly patronising and offensive – particularly if you do it so ineptly.

I’ve spoken to Supriya before – she’s a friendly and helpful lass but she doesn’t half ask some odd questions: “Could I confirm that Ireland was not going to be impacted by the change I had requested?”. “Had I notified NL service mfgpro(users)?” Just to be clear: I have no idea how Ireland might be affected or who the “NL service mfgpro(users)” are, these aren’t recognised code words for me. I clearly provided the right answer in these cases because I was informed that both Ireland and the Benelux countries had given their approval. But the fear arises in my mind: I’ve not cleared things with the Austro-Hungarian Empire – could I have inadvertently started World War III? This is yet to be determined.

The process doesn’t go entirely smoothly, largely because Supriya is too polite to tell me that the procedure she’d been asked to carry out throws up some errors. I can’t help because I’m not given permissions to see the servers where the software resides, Supriya has a difficult time because she has no absolutely idea what the software does. However, with the help of  James, who wrote the software, based in Manchester but whose boss is in Sunderland we do manage to get everything sorted out by the end of the day (or about 10pm in Supriya’s time zone).

This is not an isolated incident: receipts for my travel claims are sent to Iron Mountain (a company just outside Birmingham) where they are converted to electronic form before being sent to Manila (I can’t help thinking this may have been due to a misunderstanding involving envelopes) and paid via India. In a fit of tidiness I once decided to get a stash of 6 computers removed from a desk in my office: they’d been left by a sequence of unnamed, and now forgotten contractors. I received endless fractious e-mails from a centre in Bulgaria, belonging to the leasing company, demanding to know who all these computers belonged to, or why I appeared to be in possession of 6 computers.

The old way of doing things involved a prescriptive system of doing stuff where you filled in a form and it went through a process and something got done. But actually it didn’t, actually you learnt who was going to do what you wanted, went over for a little chat whereby you found out what incantation you needed to inject into the system in order to get your job squared with the system whilst they got on and did the job. Outsourcing frequently loses this human contact, in fact it purposefully eliminates it.

A set of blog posts on SQL

This is a roundup post for a rather specialist set of posts I wrote on SQL (Structured Query Language), a computer language for creating and querying databases. Basically the posts are my notes on the language which I’m learning because a couple of programming projects I have in mind will need it. The main source for these notes is the Head First SQL book. I’ve used a another book in this series (Head First Design Patterns) – I quite like the presentational style. The code in the posts is formatted for display using this SQL to HTML formatter.

Topics covered:
Some notes on SQL: 1 – Creation
Some notes on SQL: 2 – Basic SELECT
Some notes on SQL: 3 – Changing a table
Some notes on SQL: 4 – Advanced SELECT
Some notes on SQL: 5 – Database design
Some notes on SQL: 6 – Multi-table operations
Some notes on SQL: 7 – Subqueries and views

Of course you can find SQL cheatsheets elsewhere.

The Head First SQL book also has material on transactions and security, if I get a renewed bout of enthusiasm I will add a post on these items.

I used MySQL via its command line client to do the exercises, because it’s about as straightforward as you can get. Notepad++ recognises SQL as a language and will do syntax highlighting, so I type my commands into it and copy them into the MySQL command line client. MySQL is pretty straightforward to install. I also installed Microsoft SQL Server Express 2008, which turned out to be a bit more of a struggle but on the plus side integration the C# .NET, which is what I normally program in, looks better than for MySQL.

I’ve been using with the SQL Server via SQL Management Studio (a graphical interface to databases) on the general election data compiled by The Guardian. First contact with actual data, as opposed to learning exercises has proved interesting! A lot of things that are fiddly to do in a spreadsheet are straightforward using  SQL.

SQL was designed in the early 1970’s, with commercial implementations appearing towards the end of the decade. It’s influence visible is visible in more modern languages, such as the LINQ extensions to C# (this influence is pretty explicitly stated). Some of the ideas of database design (normalisation) seem relevant to object-oriented programming.

It’s been an interesting learning experience, my scientific background in programming has me stuffing pretty much any sort of data into an array in the first instance. SQL and a database look like a much better solution for many situations.

Some notes on SQL: 7 – Subqueries, combining selections and views

This is the seventh, and final, in a series of blog posts on SQL, the first covered creating a database, the second selecting information from a database, the third commands to modify the structure and contents of an existing database, the fourth, advanced selection. The fifth post covered database design. The sixth post covered multi-table database operations. This post covers subqueries and views. No claim of authority is made for these posts, they are mainly intended as my notes on the topic.
This is largely a wrapping up blog post to cover a couple of items.As an alternative to the joins described in a previous post, “subqueries” can often be used. A subquery is essentially an entire query embedded in another query. Subqueries can be used with UPDATE, INSERT and DELETE statements, whilst joins cannot. However, joins can be used to bring columns from multiple tables. There are no special keywords involved in creating a subquery. There is some discussion on the pros and cons of subqueries and joins here on Stackoverflow.

In an uncorrelated subquery the so-called inner query can be evaluated with no knowledge of the contents of the outer query. This expression contains an uncorrelated subquery:

UPDATE raw_candidate_data
SET    gender = ‘f’
WHERE  firstname IN (SELECT firstname
FROM   firstname_gender_lookup
WHERE  gender = ‘f’); 

The inner query is the clause in brackets, in this instance it is a shorthand way of building a list for an “IN” comparison. Often an inner query returns a single value, i.e. for an average or maximum in a list.

This expression contains a correlated subquery:

UPDATE raw_candidate_data
SET    party_id = (SELECT party_id
FROM   parties
WHERE  raw_candidate_data.party = parties.party_name); 

The inner query requires information from the outer query to work, this expression acts as a look up.

Complex queries can be given aliases using the VIEW keyword, for example:

CREATE VIEW web_designer AS SELECT mc.first_name, mc.last_name, mc.phone,
mc.email FROM my_contacts mc NATURAL JOIN job_desired jd WHERE jd.title=‘Web Designer’; 

Can subsequently be used by:

SELECT * FROM   web_designers; 

The view web_designers can be treated just as any other table.

The results of multiple select commands can be combined using the keywords: UNION, UNION ALL, INTERSECT and EXCEPT. UNION returns the distinct union of the results of all the selects, i.e. with no repeats, UNION ALL includes the repeats. INTERSECT returns items that are common to both SELECTs and EXCEPT returns those items that are returned by the first SELECT but not the second. The general form for these combination operations is as follows:

SELECT title FROM   job_current
UNION
SELECT title FROM   job_desired
UNION
SELECT title FROM   job_listings
ORDER  BY title; 

Each SELECT statement must return the same number of columns and each column must be coercible to the same datatype. Only a single ORDER BY statement for the set of SELECTs can be used.

Keywords: UNION, UNION ALL, INTERSECT, EXCEPT, VIEW