
Tom Haddon
on 30 October 2014

ScalingStack: 2x performance in Launchpad’s build farm with OpenStack


Launchpad is an open source suite of tools that help people and teams to work together on software projects, and it includes a build service with over 11,000 Personal Package Archives (PPAs). We’ve recently made some major changes to the underlying infrastructure of this system by migrating it to an OpenStack instance that we call “ScalingStack”.

ScalingStack can be thought of as spot instances for Canonical. It’s designed to run workloads that can tolerate having hypervisors removed midway through a job without negative impact. This lets us repurpose the underlying hardware for such workloads, even if only temporarily, and then take it back at any point if it’s needed in another part of the company. It also allows us to deal with spikes in load by temporarily assigning hardware to it.

The previous Launchpad build farm, designed back in 2007, used Xen, copy-on-write snapshots and memory ballooning to reset VMs in under 5 seconds. This setup involved creating a read-only base image, plus an overlay image that the builders had write access to; reset scripts in Launchpad could then trigger a teardown and rebuild of a particular builder. We also had a custom network, which we called “the airlock”, that we would move machines into to set them up in the build infrastructure. This gave us a way to safely re-use hardware on a temporary basis for builders.
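For readers unfamiliar with the pattern, the copy-on-write idea can be sketched with qemu-img backing files (illustrative only – the image names are hypothetical, and this is not the exact Xen tooling we used in 2007):

    # Create a writable overlay backed by a shared, read-only base image.
    # Guest writes land in the overlay; the base image is never modified.
    qemu-img create -f qcow2 -b builder-base.img builder-01.qcow2

    # "Resetting" a builder then amounts to throwing the overlay away and
    # recreating it, which takes seconds rather than a full reinstall.
    rm builder-01.qcow2
    qemu-img create -f qcow2 -b builder-base.img builder-01.qcow2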

We chose to deploy ScalingStack with OpenStack Icehouse running on Ubuntu 14.04 LTS using MAAS and Juju, as it’s part of our culture to dogfood technologies that Canonical’s customers use. Getting a setup that worked without issue took a significant amount of work, but along the way we helped improve many of the Juju charms used to deploy OpenStack on Ubuntu, and in the end we had a deployment solution that allowed us to do an end-to-end OpenStack deployment (we tested it multiple times) in a few hours.
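As a rough illustration of what a Juju-driven OpenStack deployment of this era looked like (a sketch using the Juju 1.x CLI and a representative subset of the charms and relations, not our exact production steps):

    # Bootstrap a Juju environment on top of MAAS, then deploy the
    # core OpenStack charms.
    juju bootstrap
    juju deploy mysql
    juju deploy rabbitmq-server
    juju deploy keystone
    juju deploy glance
    juju deploy nova-cloud-controller
    juju deploy nova-compute

    # Relations tell the charms how to configure the services to talk
    # to each other (only some of the required relations are shown).
    juju add-relation keystone mysql
    juju add-relation glance keystone
    juju add-relation nova-cloud-controller mysql
    juju add-relation nova-cloud-controller rabbitmq-server
    juju add-relation nova-cloud-controller keystone
    juju add-relation nova-compute nova-cloud-controller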

Our initial implementation comprised two OpenStack infrastructure nodes and three dedicated compute nodes. Each compute node had 24 cores, 80GB of RAM and 1TB of local storage. On the first infrastructure node we ran MAAS, the Juju bootstrap node in KVM, and neutron (also in KVM). On the second infrastructure node we ran the remaining OpenStack components (Glance, Keystone and Nova’s “Cloud Controller”, as well as MySQL and RabbitMQ), deployed via Juju in Linux containers (LXC). This setup was codified in a branch containing juju-deployer config files, making for a repeatable deployment process. As we neared go-live on production we tore everything down and redeployed from scratch a number of times to confirm the process was truly repeatable.
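A minimal sketch of what such a juju-deployer config might look like (service placement, charm URLs and unit counts here are illustrative, not our actual production branch):

    scalingstack:
      series: trusty
      services:
        mysql:
          charm: cs:trusty/mysql
          to: lxc:1           # co-locate control-plane services in LXC
        rabbitmq-server:
          charm: cs:trusty/rabbitmq-server
          to: lxc:1
        keystone:
          charm: cs:trusty/keystone
          to: lxc:1
        glance:
          charm: cs:trusty/glance
          to: lxc:1
        nova-cloud-controller:
          charm: cs:trusty/nova-cloud-controller
          to: lxc:1
        nova-compute:
          charm: cs:trusty/nova-compute
          num_units: 3        # the dedicated compute nodes
      relations:
        - [keystone, mysql]
        - [glance, keystone]
        - [nova-cloud-controller, mysql]
        - [nova-cloud-controller, rabbitmq-server]
        - [nova-cloud-controller, keystone]
        - [nova-compute, nova-cloud-controller]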

In August 2014, ScalingStack went live, replacing the legacy infrastructure and taking on the entire builder workload for production Launchpad. Within a few weeks we added a second region in a different data centre, expanding our capacity and giving us the ability to use hardware in ScalingStack from both of Canonical’s major data centres in the UK.

In most senses, ScalingStack is a vanilla OpenStack installation which relies on per-tenant networks in neutron for isolation of the different workloads we plan to run on it. But there are a few differences from a standard OpenStack install that are worth highlighting.
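Setting up that isolation is standard neutron territory: each workload gets its own tenant network and subnet. A sketch with the Icehouse-era neutron client (the network names, tenant ID variable and CIDR are hypothetical):

    # One isolated network per tenant/workload; instances on different
    # tenant networks cannot reach each other.
    neutron net-create builders-net --tenant-id $BUILDERS_TENANT_ID
    neutron subnet-create builders-net 10.10.0.0/16 \
        --name builders-subnet --tenant-id $BUILDERS_TENANT_ID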

The first is that we pass “cachemode=unsafe” to the “config-flags” configuration option of the nova-compute charm. This means that writes made inside instances don’t need to be flushed to disk unless we run out of RAM. This buys us some pretty significant speed increases for things like the initial filesystem resize and package upgrades as we boot a new instance, both of which are typically very IO-intensive. Obviously this isn’t something you’d want to do on most production OpenStack installations, but it’s a deliberate choice here because the workloads can tolerate transient failures that might cause them to retry a job.
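With the Juju 1.x CLI of the time, setting that looks like the following (the config-flags value is quoted verbatim from above; the charm injects it into nova’s configuration):

    # Run guests with an unsafe disk cache mode: guest writes stay in
    # the host page cache instead of being flushed to disk.
    juju set nova-compute config-flags="cachemode=unsafe"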

The second way in which ScalingStack differs from a vanilla OpenStack install is that we have some Juju charms to customise the images we’re using for the Launchpad builders to include specific packages and up-to-the-minute security updates – anything that avoids repeating tasks on each VM boot saves us significant time.
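The post doesn’t detail how those charms work internally, but the general image-refresh idea can be sketched with stock tools of the period, libguestfs’s virt-customize and the glance client (the package and image names are hypothetical):

    # Bake the latest packages and security updates into the image...
    virt-customize -a builder-trusty.qcow2 \
        --update \
        --install build-essential,devscripts

    # ...then register the refreshed image with Glance so that new
    # builder instances boot from it.
    glance image-create --name builder-trusty-20141030 \
        --disk-format qcow2 --container-format bare \
        --file builder-trusty.qcow2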

So what kind of changes have we seen as part of this project?

  • Before:
    • 67 machines taking 154U (4 racks)
    • Manual job to repair builders on an ongoing basis
    • Mean build duration: 16 minutes
    • 90th percentile wait time (how long before a build started): 78 minutes
  • After:
    • 6 nova-compute nodes taking 12U (less than a third of a rack)
    • No need for manual repair job
    • Mean build duration: 8 minutes
    • 90th percentile wait time: 22 minutes

Even taking into account that the 67 machines were older and slower hardware, that’s still a significant improvement in hardware density. Since the migration to OpenStack we’ve averaged over 12,000 builds per week. What’s more, adding hardware to ScalingStack is as simple as provisioning it in MAAS and then doing a “juju add-unit nova-compute”.
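Concretely, once MAAS has commissioned the new machines, scaling out is a single command (Juju 1.x CLI; -n requests several units at once):

    # Add two more compute nodes to the cloud; Juju picks machines from
    # the MAAS pool and configures them automatically.
    juju add-unit nova-compute -n 2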

We expected a few benefits from this project, including being able to jettison a significant portion of legacy code as we migrated to OpenStack, and increased hardware density. However, the scale of the hardware density increase surprised us, as did the performance improvements – the reductions in build time and the build queue were a nice side effect!

Where do we go from here? We have active plans to open up ScalingStack to other workloads, including retracing crash reports from Ubuntu machines, and slave instances for continuous integration testing of changes to Ubuntu.

Many thanks to the team that worked on the ScalingStack project, particularly Paul Collins who helped drive this project over the finish line, but also William Grant for his work on the Launchpad side of things.
