Or how to achieve bi-weekly releases at SaaS scale
Part II – Tools
This is Part II of a series, see Other Parts in the Series
Code is King
A certain kind of person will tell you that a car is just a means to
“get from A to B”: I insist that there is a fundamental, qualitative
difference between a Porsche GT3 and a Fiat Multipla – even without knowing
anything about mechanics and the inner workings of a petrol engine, by just
looking at the beauty of the former, one knows there is a tremendous amount
of engineering pride and excellence that went into it, which is entirely
missing in something that was clearly designed by a drunk squirrel.
With code, it is now my strongly held belief, it’s exactly the same – multiplied
by a factor of 100, at least: readable, clear, concise code shows a clarity
of intent and an intellectual mastery of the problem domain that is, without
fail, entirely missing from convoluted, shoddily written and just “hacked
together” spaghetti code.
Obviously, business people – the CEO-types, usually with a background in
management, strategy or marketing – will say that “so long as it works” and
earns a buck then it’s just fine and that if you agonize over code quality and
style, you’re just a stickler who’s lost touch with the hard reality of
And, maybe, in the era of fart apps being valued at $3BN+ while they can’t
even be bothered to secure their users’ accounts, let alone develop something
actually useful to solve a real-life problem, that’s possibly true.
However, when you are developing Enterprise SaaS software, which must be
deployed across the globe, must be highly available and must be ready to scale
to handle many thousands of business-critical transactions per second; well,
then that’s an entirely different story.
Further, in an era of rapidly changing markets, of emerging user requirements
and the constant need to stay one step ahead of the competition (who are
usually just as smart as you are) there are some key requirements that must
- code must be easy to extend and modify;
- systems must be loosely coupled and embrace separation of concerns;
- the code must be fully testable – and tests must be fully automated and
- the code must be easy to understand by new hires; the APIs must be clean
and “easy to use right” by partners, users and integrators; and one must
be able to quickly open source the code base, if required.
I learned this the hard way at Google, joining a part of the organization
that was not originally part of the core engineering team and where,
historically, there had been less-than-ideal attention to code quality,
because they were not subject to the same stringent requirements as the
core Search and Ads teams: starting with a core of extremely talented folks
in the London office, we set out to change that, but it took us the best part
of three years to achieve our goals, and some parts of that codebase were still
woefully inadequate by the time I left.
No surprise, those tools and services we developed according to the core
Engineering best practices were the ones that scaled fastest, were most used
beyond our organization and gave us the fewest problems operationally; the others,
well, not so good.
Which brings us to the core of this post: the tools and the principles.
Critical to the success of any serious development effort, a flexible and
scalable source control system must allow teams to work seamlessly on a shared
code base, enable easy branching and merging and give access to a rich history
of changes (including the ability to rollback changes without wasting days
undoing some wayward change).
In my opinion, there is only one SCM today that
meets these criteria, and is reasonably easy to use, once one gets past the
initial steep learning curve: git.
And, obviously, the only sensible way to go about it these days is to
create an account at GitHub.
There are some shortcomings, but compared with the cost of running your
own server and the hassle in ensuring it is secure and always available, it
makes the decision to use it a complete no-brainer.
Given the relatively steep learning curve (but the general plain sailing
afterwards) it is important to have a few clear policies around source
code management (e.g., always develop in feature branches, never against
develop, etc.; I’ll elaborate more in Part 4) and
cultivate a healthy intolerance towards anyone in the team who fails to embrace
those rules and shows signs of laziness in learning git to an adequate level
of proficiency: it’s a key tool of the trade, and not putting all of one’s
effort into learning it is a bit like working in a carpenter’s workshop and
refusing to learn how to properly use a jigsaw.
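A policy like “never commit against develop” is easier to keep when the tooling nudges people towards it. What follows is a minimal sketch of a client-side git pre-commit hook, not a description of our actual setup: the list of protected branch names and the helper names are assumptions made for illustration.

```python
"""Sketch of a git pre-commit hook that rejects commits made directly on a shared branch."""
import subprocess
import sys

# Assumption: branches that should only ever change through reviewed merges.
PROTECTED_BRANCHES = frozenset(["develop", "master"])


def current_branch():
    """Return the checked-out branch name, or None when HEAD is detached."""
    try:
        out = subprocess.check_output(
            ["git", "symbolic-ref", "--quiet", "--short", "HEAD"])
    except subprocess.CalledProcessError:
        # symbolic-ref exits non-zero when HEAD does not point at a branch.
        return None
    return out.decode("utf-8").strip()


def is_protected(branch):
    """True when commits straight onto `branch` should be refused."""
    return branch in PROTECTED_BRANCHES


def main():
    """Hook entry point: a non-zero return value makes git abort the commit."""
    branch = current_branch()
    if is_protected(branch):
        sys.stderr.write(
            "Refusing to commit on '%s': work in a feature branch instead.\n" % branch)
        return 1
    return 0
```

To try it, save the file as `.git/hooks/pre-commit`, make it executable and end it with `sys.exit(main())`. Since `git commit --no-verify` skips the hook, this is a guard rail rather than real enforcement; the team agreement still does the heavy lifting.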
In AD 2013, I am amazed that I still come across self-professed “senior software
developers” (or, worse, Engineering Director-level managers) who claim that they
don’t practice code reviews in their teams.
And I don’t mean the occasional code review, or once-a-month pair programming
session, or let-me-eyeball-your-code-while-you-type travesty: I mean the
single-minded, no-holds-barred, no-code-gets-committed-without-one, honest-to-God
code review that will make us both better coders, daily – every day.
We use Review Board, across all our repositories (we have now in excess of 20)
taking advantage of the service hosted by RBCommons: when I joined RM they
were hosting the open source version themselves (clearly, saving a few pennies
per month was deemed more important than the developers’ time wasted in
maintaining it, solving DNS issues and a whole host of petty nonsense), but I
quickly made it clear (using a few chosen words that really can’t be
printed here) what I thought of all that.
I will elaborate more in Part 4 (Processes), but the basic principle is
that no code gets committed into the develop branch without at least one
“Ship It!” by an authoritative other developer (in other words, someone who is
either senior or very familiar with that part of the code base).
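The rule above boils down to a simple predicate. Here is a minimal sketch of it, under stated assumptions: the `Review` record and the `can_merge` function are hypothetical names invented for this example (they are not the Review Board data model or API), and I have assumed that authors cannot approve their own change.

```python
from collections import namedtuple

# Hypothetical record of one reviewer's verdict on a change.
Review = namedtuple("Review", ["reviewer", "ship_it"])


def can_merge(author, reviews, authoritative):
    """True when at least one authoritative developer other than the
    author has given the change a "Ship It!"."""
    return any(
        review.ship_it
        and review.reviewer != author
        and review.reviewer in authoritative
        for review in reviews
    )


# Example: who counts as authoritative for this part of the code base.
owners = {"alice", "bob"}
approved = can_merge("dave", [Review("alice", True)], owners)
blocked = can_merge("dave", [Review("carol", True)], owners)
```

In this example `approved` is true because alice is on the owners list, while `blocked` is false: carol's “Ship It!” is welcome, but she is not authoritative for this code.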
Something I learned at Google (and, believe me, when you’re dealing with
300MM lines of code – and by now, I’m guessing, probably in excess of
half a billion – you’d better learn fast) is the critical importance of having a
code base that is uniform, with clear guidelines and a code style that is
consistent across languages, OSes and platforms.
At Google, it is critical to ensure that engineers moving across teams don’t
lose all their productivity for several weeks (or, possibly, months) while
trying to adjust to unfamiliar patterns – for us, at a smaller scale, it means
that the code is uniform and easier to understand, and people moving across
unfamiliar parts of the system can still become productive quickly.
Most IDEs (and even Vim) can be configured to automatically flag (and possibly
even auto-correct on-save) style violations, so the impact on productivity is
initially minimal and eventually nil, once people become familiar with the
rules and just adhere to them without even thinking about it.
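To give a feel for how cheap such checks are, here is a minimal sketch of the kind of rule an editor plugin or commit hook might apply. The three rules and the 100-character limit are assumptions picked for illustration, not our actual style guide, and `style_violations` is a hypothetical helper rather than part of any real linter.

```python
MAX_LINE_LENGTH = 100  # Assumption: use whatever limit the team's style guide sets.


def style_violations(source, max_length=MAX_LINE_LENGTH):
    """Return (line_number, message) pairs for each violation found in `source`."""
    violations = []
    for number, line in enumerate(source.splitlines(), start=1):
        if len(line) > max_length:
            violations.append((number, "line longer than %d characters" % max_length))
        if line != line.rstrip():
            violations.append((number, "trailing whitespace"))
        if "\t" in line:
            violations.append((number, "tab character"))
    return violations
```

In practice one would lean on an established linter for the language at hand rather than write these rules by hand; the point is only that the check is mechanical, so a machine can do the nagging.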
I am utterly amazed (but no longer surprised) at how people (especially
non-coders, but some developers too) fail to appreciate the incredible
productivity gains of a clear, consistent and easy-to-read code base, achieved
at virtually no productivity cost, and just regard our collective fixation
on writing awesome code as just a quirk of smart, but ultimately weird, geeks.
Talking of IDEs, we have a policy of “total tolerance:” in other words, every
developer is free to choose whichever IDE they prefer and feel comfortable with.
In practice, this means we use a mixture of vim, IntelliJ and PyCharm, with a
smattering of Eclipse (and, sadly, but by necessity, in future some Visual
I would really like to single out JetBrains here for their excellent work
on IntelliJ and PyCharm: they are incredibly delightful IDEs to use and their
support is terrific – after posting a question on their forums or raising a
ticket, one has barely time to grab a coffee and they’ve already intelligently
and professionally addressed it: highly recommended, without a doubt.
We all use Macs (Retina MBPs, running OS X Mavericks), mostly because it has a
Unix shell underneath, although almost all of us do development inside a VM
running some flavor or other of Linux (I personally favor Ubuntu 12.04 LTS,
most of the other developers use CentOS).
We initially used VirtualBox; however, it proved to be such a pain and so
fiddly to configure for our purposes (especially the networking part) that we
have now moved en masse to VMware Fusion.
The goal here is to maximize developer productivity, hence I do not mandate any
particular development environment, although, to maintain consistency of results
during unit and integration testing, we do require the use of CentOS, which is
the same platform we use in Production (and, obviously, in our QA testbeds).
And, no, I don’t allow any Windows machines here – it is a dreadful OS and
lacks even the most basic tools necessary for serious system and Cloud
development: no serious developer should ever consider using it, ever.
Together with fully automated testing (unit, integration, functional and
regression – see Part 4), Continuous Build (or Continuous Integration)
is the cornerstone of achieving ‘rapid delivery’ and a high release cadence.
This is worth repeating (although it really shouldn’t be necessary in this
day and age): unless one can fully automate one’s build and test process, and run
it on every commit to the main branch (aka “develop”), there is no realistic
prospect of achieving a release cycle that can be measured in days.
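Stripped of the server, the dashboards and the notifications, what a CI job does on each commit is run a fixed list of steps and stop at the first one that fails. A minimal sketch of that core loop follows; `run_pipeline` and the example step list are assumptions for illustration and bear no relation to how Jenkins or TeamCity are actually configured.

```python
import subprocess
import sys


def run_pipeline(steps):
    """Run each (name, command) step in order and stop at the first failure.

    Returns the name of the failing step, or None if every step passed.
    """
    for name, command in steps:
        # A non-zero exit code marks the step, and hence the build, as failed.
        if subprocess.call(command) != 0:
            return name
    return None


# Hypothetical steps for a Python project; a real build would list its own.
EXAMPLE_STEPS = [
    ("unit tests", [sys.executable, "-m", "unittest", "discover", "-s", "tests"]),
    ("byte-compile", [sys.executable, "-m", "compileall", "-q", "."]),
]
```

The server's job is then to call something like `run_pipeline(EXAMPLE_STEPS)` for every new commit on develop and to make a failure impossible to ignore.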
We originally used Jenkins and I still think it’s a decent platform for small
to medium-sized projects; but for Enterprise SaaS scale deployments, it turned
out to be too brittle, and too fiddly to use reliably and at scale.
We are now using (again, JetBrains’) TeamCity and, while the learning curve
has proven to be not as gentle as one would have hoped, it turned out to be
a good choice and is allowing us to rapidly automate many tasks, especially
around QA and test automation.
|||These are the same kind of guys who’ll think it’s ok to offshore to India,
as “those guys are cheap” and, then, when releases start to fail and
features take forever to be implemented, they’ll tell you to
“throw more of them at the problem” – a bit like someone who, upon
gazing at a house on fire due to someone clumsily spilling gasoline on
it, thinks that the best way to “fix the problem” is just to throw more
of the stuff on the flames.
I guess that’s why they get paid the big bucks.
|||Sometimes (ever more rarely recently) GitHub will be down, and for
reasons that escape us, TeamCity sometimes fails to retrieve the latest
commit; finally, the team management is a bit haphazard and confusing,
and organization management is still a bit awkward (although much
|||It is a sad reflection of my lack of familiarity (and complete lack of
I would not be prepared to bet today on a strict adherence.
I will elaborate in Part 3 (People).
|||I have used Eclipse since probably around 2003-4 (it must have been
around the time they came out with 3.1, which was finally usable), and
I’ve also led the team at Google that develops the Google Plugins for
internal developers’ use; reluctantly, I have had to give up on Eclipse:
it’s too slow and bloated, and its plugins install and upgrade
management system (p2) is an epic fail and seems to me to have been
“designed by committee” resulting in the most worthless and infuriating
piece of unfixable software I’ve ever come across.
I still occasionally use Eclipse (mostly for Scala or C++ development)
but I honestly think its days are numbered.