Best-practice workflow with git
A quick look around on the web will bring you up to speed on pretty much all you need to know about git. There are some great introductions to what it is, detailed manuals, and best of all an explanation of how it works aimed at people who understand computer science (and if you can’t follow that, you’re not going to earn much working as a programmer). However, there is something missing from all these pages, and that’s some best practices on how you should actually use git: what workflow should you use, and what conventions should you follow?
This is really important. The problem with subversion was not that some of its operations could be slow. Personally I never found myself staring at my screen, twiddling my thumbs, or going for a quick round of Mario Kart while waiting for subversion to finish something. The problem I always had with subversion was that my team and I were always treading on each other’s toes. We had a number of releases that were late because we were all committing code over each other’s work, and introducing unnecessary complexity.
However, there is a solution, and we found it. To fully understand it though, you need a good understanding of the problems that need solving.
The real problem with subversion
When I first came across continuous integration, I thought it was an absolutely great idea. If, as is often suggested, integrating work from different developers is hard, with the difficulty increasing roughly quadratically with time since the last integration, it makes a huge amount of sense to integrate as often as you can. But I learned the hard way that it’s not true. At least not at the small scale. When someone else in my team is working on code, they, like all developers, often go through a phase of sketching out their solution in code. This is normally pretty bad code from a production point of view. Their next step is to tidy this up and make it into production-quality code. It’s at this point that integration is good. Any fool can see that trying to integrate my production code with a colleague’s sketch code is bad. This is what happens with subversion though. Everything gets committed, otherwise you risk losing your work if there is an issue (you wouldn’t believe how often developers leave their laptops in bars). Branches exist for exactly this reason, of course, and I will talk about them later. For now let’s just say I don’t know anyone using them successfully in subversion.
The problem with subversion, then, is that there is a tension between trying to integrate your code with the rest of your team, and trying to get far enough down the route of maturing your code that you don’t create a bottleneck. I experienced at least one case personally where one developer was doing a major chunk of refactoring, and it acted as a bottleneck, preventing any bug fixes from other developers being committed and deployed. Subversion makes avoiding this too expensive.
The solution is branches, merging, and testing at each stage
Hopefully you knew that already, but does your team actually do it? This was always the answer I would give if asked how to manage a code base, but no team I worked in ever managed it.
Why? Well quite simply, merging is hard. Subversion merges don’t work well with code that moves. If I move a chunk of code from one directory to another, subversion no longer tracks it well between branches. This is something I do a lot when refactoring code, and it breaks subversion.
On the other hand, because git expects merges and moves to be regular events, it handles them very well. This is probably because under the hood it tracks the contents of your files rather than the files themselves, but at this level of understanding, all I care about is that it works. I can create a branch, work on it, and merge, and apart from some annoying glitches between how editors handle whitespace, most things just work.
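To see what I mean, here’s a minimal sketch you can run in a throwaway repository (the file and directory names are made up): git follows the file’s history across a move, which is exactly what subversion kept losing for me.

```shell
# Throwaway repo demonstrating that git tracks a moved file's history.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email demo@example.com
git config user.name demo
mkdir lib app
echo "def helper; end" > lib/helper.rb
git add lib/helper.rb
git commit -q -m "add helper"
# Move the file, exactly the kind of refactoring that breaks subversion:
git mv lib/helper.rb app/helper.rb
git commit -q -m "move helper into app"
# --follow walks the history across the rename; both commits show up:
git log --follow --oneline -- app/helper.rb
```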
This is the workflow we chose (and if you’re skimming this article, this is the best bit to read)
First of all, we had a centrally hosted repository. I’m of the opinion that trying to run git with no authoritative repository can work, but adds various complications and pretty much no benefits. It might be cool, but that’s what the kids who give you cigarettes at school always said.
Then we had an authoritative branch on that repository. We all set our authoritative repository to be called “origin”, and the branch was called master. Thus origin/master represented the state-of-the-art production code. However, no one, absolutely no one, was allowed to work on master. Most of the time everyone in the team (except the gatekeeper) did not even have master set up as a local branch.
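Setting a developer up under this convention might look like the sketch below. It uses a local bare repository to stand in for the central host, and the developer name ‘laurie’ is just an example (the `-b` option to git init assumes a reasonably recent git).

```shell
# Sketch: a fresh clone where work happens on a personal branch, not master.
set -e
work=$(mktemp -d)
git init -q --bare -b master "$work/origin.git"
# Seed origin with an initial commit on master:
git clone -q "$work/origin.git" "$work/seed" 2>/dev/null
cd "$work/seed"
git config user.email demo@example.com
git config user.name demo
git commit --allow-empty -q -m "initial"
git push -q origin master
# A new developer clones and immediately moves to a personal branch:
git clone -q "$work/origin.git" "$work/laurie"
cd "$work/laurie"
git checkout -q -b laurie origin/master
# Drop the local master so no work can land on it by accident:
git branch -q -d master
```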
Secondly, we had another branch on origin, called stable. Stable was always an ancestor of origin/master, but lagged behind a bit. Stable had various tags placed on it, which represented the actual public releases we made. More on this later.
Next, each developer had as many branches as they wanted. Foremost though, each developer had a branch named after themselves. So I mostly worked in the ‘laurie’ branch, which was also on origin as origin/laurie. Along with this, each developer had a copy of the deployment platform on their workstation, and one on a staging server; in my case this was called laurie-stage. Each developer then works there. They write their tests, modify their code, and so on, making lots of commits along the way (the local nature of git commits makes regular small commits a very easy habit to get into, and it’s a very good one when you need to debug something that went wrong a while ago). When I am happy with my work, and it’s tested and working locally, I merge master into it:
git fetch && git merge origin/master && rake spec
This command gets the latest version of master and applies any new changes to my code base. Master itself is not changed. I run all my tests again, and then deploy to laurie-stage. I then pass this over to my quality assurance guys (which could be me in another hat, but we were lucky enough to have a secondary team of people who were in a position to do the testing instantly). They test the product, checking that the feature has been added correctly, or the bug fixed, and that no new bugs have been introduced (though your unit tests will catch that – right?). This is continuous integration happening right here.
After I have gotten my code to a point where this all passes, I push the state of my local branch to origin/laurie, and I go and talk to the gatekeeper.
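Publishing the branch is a single push; the sketch below shows the whole round trip in a throwaway repository (again, the bare repository stands in for the central host, and ‘laurie’ and the file name are illustrative).

```shell
# Sketch: commit on a personal branch, then publish it as origin/laurie.
set -e
work=$(mktemp -d)
git init -q --bare -b master "$work/origin.git"
git clone -q "$work/origin.git" "$work/laurie" 2>/dev/null
cd "$work/laurie"
git config user.email demo@example.com
git config user.name demo
git commit --allow-empty -q -m "initial"
git push -q origin master
# Day-to-day work happens on the personal branch:
git checkout -q -b laurie
echo "fixed" > fix.txt
git add fix.txt
git commit -q -m "bug fix"
# Publish the branch so the gatekeeper can fetch it:
git push -q -u origin laurie
```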
In our team the gatekeeper was a person, though if you’re brave you could automate him. The gatekeeper has a local master branch. After I have told him that the changes in laurie are good to go, he asks round the rest of the team: are any other branches good to go? Generally there will be about 2–3 branches ready to go at any one time. He then gets a summary of what the changes are, and orders them by business value. Then, starting with the most critical change, he merges it into master and runs all the tests. He then does this for the next most critical fix, and so on. If at any point one of the merges results in code that fails the tests, he can simply un-merge (moving the post-it note mentioned in the Git for Computer Scientists article – you did read it, I hope).
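The gatekeeper’s merge-then-maybe-un-merge step is where git really shines. Here’s a runnable sketch in a throwaway repository; the test-failure scenario is simulated (the real test suite – rake spec for us – sits where the comment is), and names are illustrative.

```shell
# Sketch: gatekeeper merges a candidate branch, then backs it out again.
set -e
work=$(mktemp -d)
git init -q --bare -b master "$work/origin.git"
git clone -q "$work/origin.git" "$work/gatekeeper" 2>/dev/null
cd "$work/gatekeeper"
git config user.email demo@example.com
git config user.name demo
git commit --allow-empty -q -m "initial"
git push -q origin master
# Simulate a developer branch arriving on origin:
git checkout -q -b laurie
echo "fixed" > fix.txt
git add fix.txt
git commit -q -m "bug fix"
git push -q origin laurie
# The gatekeeper merges the candidate; --no-ff keeps a merge commit
# that is easy to back out of:
git checkout -q master
git fetch -q origin
git merge -q --no-ff -m "merge laurie" origin/laurie
# ... run the test suite here (rake spec in our case); if it fails,
# un-merge by moving master back to where it was before the merge:
git reset -q --hard ORIG_HEAD
```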
Other developers can help with this process on their workstations. After my changes (which are of course the most important) are successfully merged into master, everyone else can pull master again and merge it back into their branch, preempting any conflicts and fixing them.
Once the gatekeeper has merged in all the changes, or at least all the ones that don’t conflict or break tests, he deploys this to a master staging server. Once again the quality team takes a look, this time concentrating on making sure that no existing functionality has been blatted by any of the changes. Assuming that passes, the gatekeeper then merges origin/master into origin/stable, tags it with the latest release revision number, and deploys onto our production environment.
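The release step itself can be sketched like this, again in a throwaway repository: stable is brought up to the tested master with a fast-forward merge, tagged, and pushed (the version number v1.2.0 is illustrative, and the deploy step is left out).

```shell
# Sketch: promote master to stable and tag the public release.
set -e
work=$(mktemp -d)
git init -q --bare -b master "$work/origin.git"
git clone -q "$work/origin.git" "$work/gatekeeper" 2>/dev/null
cd "$work/gatekeeper"
git config user.email demo@example.com
git config user.name demo
git commit --allow-empty -q -m "initial"
# stable starts life as an ancestor of master, lagging behind it:
git branch stable
git commit --allow-empty -q -m "tested feature work"
git push -q origin master stable
# Release time: bring stable up to master, tag, and push both:
git checkout -q stable
git merge -q --ff-only master
git tag -a v1.2.0 -m "Release 1.2.0"
git push -q origin stable v1.2.0
```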
We found this flow worked really well. Conflicts and merge-related issues did occur, but always when merging the master branch into a developer’s local branch, so at most one team member was held up by this.
We took the policy of releasing as often as possible, so we would often release a new production code-base 2–3 times a day, each time with fully tested code. Sure, there were a few mistakes, but even when we were making big changes and developing the workflow, no bug serious enough to make us roll back the production code got through the safety nets.
If you need to guarantee that there are no mistakes, then like any project, you need to increase the depth of your test phase. Ours was relatively fast, as most of the users were alpha/beta testers :)
Releasing production code that often was a great asset too. The management team could see that work was progressing. Even if it wasn’t going at the speed they wanted (is such a thing possible?), they were greatly comforted to know that the users would see several improvements per day. As a team we had the freedom to allow one or two developers to pick up a slightly longer-scale project, such as refactoring an important sub-system, while the rest of the team got on with pushing out live improvements, and of course the users got the experience of a system constantly being updated. As we were sensible about listening to the users before choosing the next piece of work, they also got the feeling that the application was very responsive to any change request they made.