
Book review: Pro Git by Scott Chacon and Ben Straub

Pro Git by Scott Chacon and Ben Straub is available to download for free, or to read online at the website, but you can buy a paper copy if you prefer. I downloaded it and read it on my tablet. Pro Git is the bible of all things relating to git, the distributed version control system. This is an application to record the history of changes to your computer code, or any other plain text file. Such applications are essential if you are a software company producing code commercially, or if you are collaborating on an open source project. They are also useful if you use code in analysis or modelling, as I do.

Git is most famous as the creation of Linus Torvalds in support of the development of the Linux operating system. For developers, version control is a fundamental activity which crosses all boundaries of domain and language. Git is one of the more recent examples in a line of version control systems; my former colleague Francis Irving wrote very nicely about this subject.

My adventures with source control extend over 20 years, although it is fair to say that I didn’t really use it in anger until I worked at ScraperWiki. There my usage moved from being a safety line for work that only really impacted me, to a collaborative tool. I picked up my usage of git through pairing with other people, and through explicitly stated conventions for using git in a development team. Essentially one of the other developers told us off if he thought our commit messages were not up to scratch! This is a good thing. This culturally determined use of git is important in collaborative environments.

My interest in git has recently been re-awoken for a couple of reasons: my new job means I’m doing a lot of coding, and I discovered GitKraken which is a blingy new git client. I’ve not used a graphical git client before but GitKraken is very pretty and the GUI invites you to discover more git functionality. Sadly it doesn’t work on my work PC, and once it leaves beta I have no idea what the price might be.

Pro Git starts with an introduction to git and the basics of getting up and running. It then goes on to describe how to use git in collaborative environments, how to use git with GitHub and then more advanced topics such as how to write hooks, and use git as a client to Subversion (an earlier source control system). Coverage feels pretty complete to me. It’s true that you might resort to Stack Overflow to answer some questions, but that’s universally true in coding.

The book finishes with a chapter on git internals, what is going on under the hood as you issue commands. Git has a famous division between “porcelain” and “plumbing” commands. Plumbing is what really gets things done, low level commands with somewhat opaque meaning, whilst porcelain is the stuff you use day to day. The internals chapter starts by showing how the plumbing works by reproducing the effects of some of the porcelain commands. This is surprisingly informative, and built my confidence a bit – I always have some worry that I will lose something irrevocably by issuing the wrong command in git. These dangers exist but Pro Git is clear where they lie.

Here are a couple of things I’ve already started using on reading this book:

git log --since=1.week

– filter the log to just show the commits made in the last week, other time options are available. Invaluable for weekly reporting!

git describe

– make a human readable (sort of) build number based on the most recent tag and how far you are along from it.
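Both commands can be tried safely in a throwaway repository. This is a minimal sketch rather than anything from the book: the tag name v1.0, the commit messages and the backdated commit are all made up for illustration.

```shell
set -e
# Work in a scratch repository so nothing real is touched.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# One commit dated well over a week ago, which we tag, then two recent ones.
GIT_AUTHOR_DATE="2015-01-01T12:00:00" GIT_COMMITTER_DATE="2015-01-01T12:00:00" \
    git commit -q --allow-empty -m "old work"
git tag -a v1.0 -m "first release"
git commit -q --allow-empty -m "recent work 1"
git commit -q --allow-empty -m "recent work 2"

# Only the two recent commits fall inside the last week.
recent=$(git log --since=1.week --oneline | wc -l | tr -d ' ')

# Most recent tag, number of commits since it, then "g" and a short hash.
build=$(git describe)
echo "$recent commits this week, build $build"
```

The describe output has the form v1.0-2-g followed by an abbreviated commit hash, which will differ on every run.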

And there are some things I used to wonder about. First of all I should consider commits as a tree structure, with branches as pointers to particular commits. In this context HEAD^ refers to the parent commit of the current HEAD, or latest commit. HEAD~2 refers to the grandparent of the current commit, and so on. I now have some appreciation of soft, mixed and hard resets. Hard is bad, it could lose your work!
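To make the ancestry notation and the three kinds of reset concrete, here is a small sketch in a scratch repository. The file notes.txt and its contents are hypothetical, and it assumes a reasonably recent git.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# Three commits, each rewriting the same file.
for n in 1 2 3; do
    echo "version $n" > notes.txt
    git add notes.txt
    git commit -q -m "commit $n"
done

first=$(git rev-parse HEAD~2)    # grandparent of the latest commit
second=$(git rev-parse HEAD^)    # parent of the latest commit

# --soft moves the branch back one commit but leaves "version 3" staged.
git reset -q --soft HEAD^
staged=$(git diff --cached --name-only)

# --mixed (the default) also unstages it; the edit survives in the file.
git reset -q --mixed HEAD
unstaged=$(git diff --name-only)

# --hard throws the edit away too: the file goes back to "version 2".
git reset -q --hard HEAD
contents=$(cat notes.txt)
echo "now at $(git log -1 --format=%s) with contents: $contents"
```

The "version 3" edit survives the soft and mixed resets and is only lost at the hard reset, which is why hard is the one to be wary of.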

I now know why git filter-branch was spoken of in hushed tones in the ScraperWiki office, basically because it allows you to systematically rewrite the history of a repository which is sort of really wrong in source control.

Pro Git is good in outlining not only what you can do but also what you should do. For example, one has the choice with git to merge different branches or to carry out a rebase. I’d always been a bit vague on the difference between these two things but Pro Git explains clearly, and also tells you when you shouldn’t use rebase (when other people have seen the commits you are rebasing).

My electronic edition on Kindle does suffer from the occasional glitch with some paragraphs appearing twice but the writing is clear and natural. Pro Git can’t be beaten for the price and it is probably worth the £32 Amazon charge for a paper copy.


I’ve discovered that my blog is actually a good place to put things I need to remember; see, for example, my blog post on running Ubuntu in a VM on Windows 8.

In this spirit here are my notes on using git, the distributed version control system (DVCS). These are things I picked up around the office at ScraperWiki, where I wrote something about the scheme we use for Git. This is more a compendium of useful git commands.

I use Git on both Windows and Ubuntu and I have accounts with both GitHub and Bitbucket. I’ve configured ssh on my Windows and Ubuntu machines and use that for authentication. On Windows I interact with Git using Git Bash.


On installing Git I do the following setup, obviously using my own name and email:

git config --global user.name "John Doe"
git config --global user.email johndoe@example.com
git config --global core.editor vim

I can list my config settings using:

git config -l

Starting a repo

To start a new repo we do:

git init

These days I feel bereft if I’m not “pushing” my local repository to an online repository like GitHub or Bitbucket. To add a remote repository, create one using the service of your choice, which will probably ask you to do:

git remote add origin [url]

Alternatively you can clone an existing repository into a subdirectory of your current directory with the name of the repo:

git clone [url]

This one clones into the current directory, making a mess if that’s not what you intended!

git clone [url] .

A variant, if you are using a repo with submodules in it:

git clone --recursive [url]

If you forgot to do the above on first cloning then you can do:

git submodule update --init

Adding and committing files

If you’ve started a new repository then you need to add some files to track:

git add [filename]

You don’t have to commit all the changes you made since the last commit; you can select them using the -p option:

git add -p

And commit them to the repository with a commit command like:

git commit -m [message]

Alternatively you can add the commit message in your favoured editor with the difference from previous commit shown below:

git commit -a -v

I tend to use a remote repository as a backup so I regularly do:

git push origin HEAD

If someone else is working on the same repository as you then things get more complicated but that’s out of the scope of this post.

Undoing things

If you get your commit message wrong you can edit it with:

git commit --amend

If you change your mind about staging a file for commit:

git reset HEAD [filename]

If you change your mind about the modifications you have made to a file since the last commit then you can revert to the last commit using this **destructive** command:

git checkout -- [filename]

You should be careful doing that since it will obliterate any changes you’ve made to a file, even if you saved them from the editor.
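The three undo commands above can be rehearsed in a scratch repository before trusting them on real work. This is only a sketch; report.txt and the commit messages are invented for the purpose.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "first draft" > report.txt
git add report.txt
git commit -q -m "Add reprot"

# Fix the typo in the last commit message (-m avoids opening an editor).
git commit -q --amend -m "Add report"
message=$(git log -1 --format=%s)

# Stage an edit, then change our mind: the edit stays in the file
# but is removed from the staging area.
echo "second draft" > report.txt
git add report.txt
git reset -q HEAD report.txt
staged=$(git diff --cached --name-only)

# Destructive: discard the edit and restore the committed version.
git checkout -- report.txt
contents=$(cat report.txt)
echo "message: $message, contents: $contents"
```

After the last command "second draft" is gone for good, since it was never committed.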

Working out where you are

You can list files in the repo with:

git ls-tree --full-tree -r HEAD

The general command for seeing what is going on is:

git status

This tells you if you have made edits which have not been staged, which branch you are on and files which are not being tracked. Whilst you are working you can see the difference from the previous commit using:

git diff

If you’ve already added files to commit then you need to do:

git diff --cached
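The difference between the two forms of diff is easiest to see with one staged and one unstaged edit. A small sketch, with a.txt and b.txt as hypothetical file names:

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "one" > a.txt
echo "one" > b.txt
git add a.txt b.txt
git commit -q -m "start"

echo "two" >> a.txt
git add a.txt            # a.txt: edited and staged
echo "two" >> b.txt      # b.txt: edited but not staged

unstaged=$(git diff --name-only)             # only sees b.txt
staged=$(git diff --cached --name-only)      # only sees a.txt
echo "unstaged: $unstaged, staged: $staged"
```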

You can see a list of all your changes using:

git log

This command gives you more information, in a more compact form:

git log --oneline --graph --decorate

It is a good way of seeing the status of your branch and the other branches in the repository. I have aliased this set of log options as:

git lg

To do this I added the following to my ~/.gitconfig file:

[alias]
        lg = log --oneline --graph --decorate

Once you’ve committed a bunch of changes you might want to push them to a remote server. This pushes to the remote called origin, and HEAD ensures you push to your current branch. HEAD is Git’s shorthand for the latest commit on the current branch:

git push origin HEAD


The preceding commands are how you’d work using a single master branch, if you were working alone on something simple, for example. If you are working with other people or on something more complicated then you probably want to work on a branch. You can make a new branch by doing:

git checkout -b [branch name]

You can find out what other branches are available by doing:

git branch -v -a

Once you are on a branch you can commit changes, and push them onto your remote server, just as if you were on the master branch.
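A quick sketch of this in a scratch repository shows that a commit made on the branch leaves master alone. The branch name experiment and the file code.txt are made up for illustration.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is called master
git config user.name "Example User"
git config user.email "user@example.com"

echo "stable" > code.txt
git add code.txt
git commit -q -m "Work on master"

# Create a branch and switch to it in one step, then commit there.
git checkout -q -b experiment
echo "risky idea" >> code.txt
git commit -q -a -m "Try something out"

# List the branches; the current one is marked with an asterisk.
git --no-pager branch -v -a

current=$(git rev-parse --abbrev-ref HEAD)
on_master=$(git rev-list --count master)
on_branch=$(git rev-list --count experiment)
echo "on $current: master has $on_master commit(s), experiment has $on_branch"
```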

Merging and rebasing

The excitement comes when you want to merge your changes onto the master branch or you want to get changes on your own branch made by someone else and pushed to the remote repository. The quick and dirty way to do this is using:

git pull

This does a fetch and merge all at the same time. The better way is to fetch the changes and then merge them:

git fetch --prune --all
git merge origin/master
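The fetch-then-merge sequence can be rehearsed entirely on one machine by letting a local bare repository stand in for GitHub or Bitbucket. This is a sketch under that assumption; the directory names seed, mine and theirs and all the file names are hypothetical.

```shell
set -e
# Identity for every repository created below.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

work=$(mktemp -d) && cd "$work"

# Build a one-commit repository and turn it into the shared "remote".
git init -q seed
cd seed
git symbolic-ref HEAD refs/heads/master
echo "hello" > README.txt
git add README.txt
git commit -q -m "Initial commit"
cd "$work"
git clone -q --bare seed origin.git

# Two clones stand in for me and a colleague.
git clone -q "$work/origin.git" mine
git clone -q "$work/origin.git" theirs

# The colleague pushes a commit to master.
cd "$work/theirs"
echo "their change" > theirs.txt
git add theirs.txt
git commit -q -m "Colleague's change"
git push -q origin HEAD

# Meanwhile I commit on my own branch.
cd "$work/mine"
git checkout -q -b my-branch
echo "my change" > mine.txt
git add mine.txt
git commit -q -m "My change"

# Fetching only updates origin/master; my working files are untouched.
git fetch -q --prune --all
if [ -e theirs.txt ]; then fetch_changed_files=yes; else fetch_changed_files=no; fi

# Merging is the step that actually brings their work into my branch.
git merge -q --no-edit origin/master
git --no-pager log --oneline --graph --decorate
```

Keeping the two steps separate gives you the chance to look at origin/master, for example with git log, before anything changes on your own branch.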

If you are working with someone else then you may prefer to merge changes onto the master branch by making a pull request on GitHub or Bitbucket.

Accepting Pull Requests from Forks

If someone makes a Pull Request based on their forked copy of a repo then you can download it for testing by doing:

git fetch origin pull/ID/head:BRANCHNAME



This post was first published at ScraperWiki.

As a software company, use of some sort of software source control system is inevitable; indeed our CEO wrote TortoiseCVS, a file system overlay for the early CVS source control system. For those uninitiated in the joys of software engineering: source control is a system for recording the history of file revisions, allowing programmers to edit their code safe in the knowledge that they can always revert to a previous good state of code if it all goes horribly wrong. We use Git for source control, hosted either on GitHub or on Bitbucket. The differing needs of our platform and data services teams fit the payment plans of the two different sites.

Git is a distributed source control system created by Linus Torvalds, to support the development of Linux. Git is an incredibly flexible system which allows you to do pretty much anything. But what should you do? What should be your strategy for collective code development? It’s easy to look up a particular command to do a particular thing, but less is written on how you should string your git commands together. Here we hope to address this lack.

We use the “No Switch Yard” methodology. This involves creating branches from the master branch on which to develop new features, and regularly rebasing against the master branch so that when the time comes the feature branch can be merged into the master branch via a pull request with little fuss. We should not be producing a byzantine system by branching feature branches from other feature branches. The aim of “No Switch Yard” is to make the history as simple as possible and make merging branches back onto master as easy as possible.

How do I start?

Assuming that you already have some code in a repository, create a local clone of that repository:

git clone git@github.com:scraperwiki/myproject.git

Create a branch:

git checkout -b my-new-stuff

Start coding…adding files and committing changes as you go:

git add -u
git commit -m "everything is great"

The -u switch to git add stages the changes to all the files git is already tracking; it does not pick up new, untracked files. Depending on your levels of paranoia you can push your branch back to the remote repository:

git push

How do I understand what’s going on?

For me the key revelation for workflow was to be able to find out my current state and feel pleasure when it was good! To do this, fetch any changes that may have been made on your repository:

git fetch

and then run:

git log --oneline --graph --decorate --all

This shows an ASCII art history diagram for your repository. What you are looking for here is a relatively simple branching structure without too many parallel tracks and with the tips of each branch lined up between your local and the remote copy.

You can make an alias to simplify this inspection:

git config --global alias.lg 'log --oneline --graph --decorate'

Then you can just do:

git lg --all

I know someone else has pushed to the master branch from which I branched – what should I do?

If stuff is going on on your master branch, perhaps because your changes are taking a while to complete, you should rebase. You should also do this just before submitting a pull request to merge your work with the master branch.

git rebase -i

This allows you to rebase interactively, which means you can combine multiple commits into a single larger commit. You might want to do this if you made lots of little commits whilst achieving a single goal. Rebasing brings you up to date with another branch, without actually merging your changes into that branch.
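The effect of a rebase on history can be seen in a scratch repository. The sketch below uses a plain git rebase master rather than the interactive form, because the interactive form opens an editor; the branch name my-new-stuff is borrowed from earlier in this post and the file names are made up.

```shell
set -e
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
echo "base" > base.txt
git add base.txt
git commit -q -m "Base"

# Start a feature branch and commit on it.
git checkout -q -b my-new-stuff
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Feature work"

# Meanwhile master moves on.
git checkout -q master
echo "fix" > fix.txt
git add fix.txt
git commit -q -m "Someone else's fix"

# Replay the feature commit on top of the new master.
git checkout -q my-new-stuff
git rebase -q master

parent=$(git rev-parse HEAD^)
tip=$(git rev-parse master)
merges=$(git rev-list --merges --count HEAD)
git --no-pager log --oneline --graph --decorate --all
```

After the rebase the feature commit sits directly on top of master's latest commit, so the graph is a single straight line and master itself has not moved.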

I’m done, how do I give my colleagues the opportunity to work on my great new features?

You need to rebase against the remote branch onto which you wish to merge your code and then submit a pull request for your changes. You can submit a pull request from the web interface at GitHub or Bitbucket, or you can use a command line tool such as hub. The idea of using a pull request is that it makes your changes visible to your colleagues, and keeps a clear record of those changes. If you’ve been rebasing regularly you should be able to merge your code automatically.

An important principle here is “ownership”, in social terms you own your local branch on which you are developing a feature, so you can do what you like with it. The master branch from which you started work is in collective ownership so you should only merge changes onto it with the permission of your colleagues and ideally you want others to look at your changes and approve the pull themselves.

I started doing some fiddling around with my code and now I realise it’s serious and I want to put it on a branch, what do I do?

You need to stash your code, using:

git stash

Then create a branch, as described above, and then retrieve the contents of the stash:

git stash pop
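The whole stash, branch, pop sequence looks like this in a scratch repository. It is a sketch only; app.txt and the branch name serious-work are hypothetical.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

echo "v1" > app.txt
git add app.txt
git commit -q -m "Working version"

# Fiddling around on master without committing.
echo "experiment" >> app.txt

# Stash the edit: the working directory is clean again.
git stash -q
clean=$(git status --porcelain)

# Make the branch, then bring the edit back onto it.
git checkout -q -b serious-work
git stash pop -q
dirty=$(git status --porcelain)
branch=$(git rev-parse --abbrev-ref HEAD)
echo "on $branch with changes: $dirty"
```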

That’s how we use git – what do you do?