How Airbnb scaled its migration to continuous delivery with Spinnaker

By Brian Wolfe, Software Engineer, Airbnb and Jens Vanderhaeghe, Software Engineer, Airbnb

Until a year ago, Airbnb released code exclusively with Deployboard, an in-house tool we developed over the last eight years. Deployboard helped us coordinate continuous integration (CI), Git merges, and deploys to pre-production environments, and it allowed over 1,000 engineers to contribute to our monolithic Ruby on Rails codebase, which we call Monorail.

Unfortunately, Deployboard does not provide flexible, automated deploy pipelines. To get them, we started to migrate from Deployboard to Spinnaker, the open-source tool from Netflix and Google that's built to manage continuous delivery pipelines.

Here's how the project has progressed so far, along with a few lessons learned along the way.

Breaking up the monolith

For the last two years, Airbnb has been transitioning from a monolith to a service-oriented architecture (SOA). This has increased both the number of services and the number of teams that independently manage and deploy their own services. Service owners started to write more integration test plugins and automation inside and on top of Deployboard.

Even with some additional automation, team runbooks started to grow. Some teams performed 20 or more manual steps over the course of hours or days. Operational mistakes and oversights during releases caused a substantial percentage of incidents.

Figure 1. The number of projects in Deployboard over time has increased rapidly as we move to a service-oriented architecture.

Our Deploy Infrastructure team recognized that many incidents could be prevented with tooling to assist the rollout process, using an automated deploy pipeline. We prototyped this logic in Deployboard, but realized that we should use a tool built from the ground up to orchestrate deploy pipelines instead of building our own. Using an existing tool would let us immediately get more powerful features (such as canary analysis) and reduce the amount of custom code our team alone would have to maintain in the future.
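As a rough illustration of what an automated deploy pipeline buys over a manual runbook, the sketch below runs stages in order and halts the rollout at the first failure. It is not Airbnb's or Spinnaker's code: the `Stage` class, the stage names, the error rates, and the single-metric `canary_is_healthy` check are all hypothetical, and real canary analysis compares many metrics.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Stage:
    """One step of a deploy pipeline (hypothetical, for illustration only)."""
    name: str
    run: Callable[[], bool]  # returns True when the step succeeds


def run_pipeline(stages: List[Stage]) -> Tuple[bool, List[str]]:
    """Run stages in order and stop at the first failure.

    Returns (succeeded, names of the stages that were attempted).
    """
    attempted = []
    for stage in stages:
        attempted.append(stage.name)
        if not stage.run():
            # A failed stage halts the rollout, so later stages never run.
            return False, attempted
    return True, attempted


def canary_is_healthy(baseline_error_rate: float,
                      canary_error_rate: float,
                      tolerance: float = 0.01) -> bool:
    """Toy canary check: pass if the canary's error rate is within
    `tolerance` of the baseline's error rate."""
    return canary_error_rate <= baseline_error_rate + tolerance


# Hypothetical pipeline; the canary here is deliberately unhealthy.
pipeline = [
    Stage("deploy-to-staging", lambda: True),
    Stage("integration-tests", lambda: True),
    Stage("canary", lambda: canary_is_healthy(0.002, 0.050)),
    Stage("deploy-to-production", lambda: True),
]

succeeded, attempted = run_pipeline(pipeline)
print(succeeded, attempted)
```

Because the canary stage fails in this example, the production stage is never attempted, which is the kind of regression a manual checklist can miss.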

The move away from Deployboard

Our team anticipated having to port some functionality from Deployboard to any new deploy solution. Deployboard shows detailed CI results and build artifacts, has special handling for in-house Kubernetes custom resources, and plugs into several in-house integration test frameworks that we built over the years. We did not want to regress on these features when moving to a new tool.

We chose Spinnaker as the replacement for Deployboard in part because we could bridge functionality gaps by plugging in custom logic easily, without forking the core code. We are building critical features from Deployboard into Spinnaker as a set of custom extensions.

Extending Spinnaker's user interface

To make the onboarding experience with Spinnaker less jarring for Airbnb engineers, we customized its interface to resemble what they were used to in Deployboard (see the figures below).

When developers look at their app in Deployboard, they select the version to deploy from a list of available versions. We mirrored this initial view in Spinnaker. This familiar experience lets developers immediately start using Spinnaker.

Figure 2. This example of the snapshots view in Deployboard shows code versions on the master branch in chronological order.

Figure 3. Here's an example of the same snapshots view in Spinnaker. The similarity to Deployboard lets developers transition easily to Spinnaker.

We make this change, and others, in a standalone Airbnb module that plugs into Deck, Spinnaker's UI service.

How we extended Spinnaker's functionality

In addition to the customizations in the user interface, we have been extending Spinnaker's functionality with Kubernetes jobs, custom webhooks, and Spring Boot components that have been added to back-end Spinnaker services.

Kubernetes jobs are easy to create and integrate with a Spinnaker pipeline. Teams onboarding to Spinnaker use custom jobs to automate non-standard parts of their deploy process. Those teams have full control of compute resources, retry policies, and permissions for the job.
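To make the three controls mentioned above concrete, here is a minimal sketch that builds a Kubernetes Job manifest as a plain Python dictionary. The manifest fields (`backoffLimit`, `resources`, `serviceAccountName`, `restartPolicy`) are standard Kubernetes `batch/v1` Job fields; the helper name, job name, image, command, and service account are made up for illustration and are not taken from Airbnb's setup.

```python
import json


def build_job_manifest(name, image, command, *, retries=2,
                       cpu="500m", memory="512Mi",
                       service_account="deploy-job-runner"):
    """Return a batch/v1 Job manifest (hypothetical helper and defaults)."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            # Retry policy: how many times a failed pod is retried.
            "backoffLimit": retries,
            "template": {
                "spec": {
                    # Permissions: the job runs as this service account.
                    "serviceAccountName": service_account,
                    "restartPolicy": "Never",
                    "containers": [{
                        "name": name,
                        "image": image,
                        "command": command,
                        # Compute resources requested by and capped for the job.
                        "resources": {
                            "requests": {"cpu": cpu, "memory": memory},
                            "limits": {"cpu": cpu, "memory": memory},
                        },
                    }],
                },
            },
        },
    }


manifest = build_job_manifest(
    "migration-check",                                   # hypothetical job
    "registry.example.com/tools/migration-check:1.0",    # hypothetical image
    ["./check-migrations.sh"],                           # hypothetical command
)
print(json.dumps(manifest, indent=2))
```

A team could paste the printed JSON into a pipeline stage that runs a job, or keep the manifest under version control next to the service it supports.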

Before Spinnaker, some teams had already created services to run integration tests. To onboard these test runners into Spinnaker, we created a webhook stage to call services that use Airbnb's service interface definition language (IDL). Because the stage works natively with both Spinnaker and Airbnb's services, we have been able to plug new and existing services into Spinnaker more easily.

Figure 4. Orca, Spinnaker's orchestration service, can perform operations in other Airbnb services using a custom webhook plugin.
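The general shape of such a webhook stage is "start a run, then poll until it finishes." The sketch below shows that pattern only; it is not the Airbnb plugin or Spinnaker's webhook implementation. The function name, the status strings, and the `start_run` and `get_status` callables (stand-ins for calls to a test-runner service) are assumptions made for the example.

```python
import time
from typing import Callable

# Hypothetical status values a test-runner service might report.
TERMINAL_STATES = {"SUCCEEDED", "FAILED"}


def run_webhook_stage(start_run: Callable[[], str],
                      get_status: Callable[[str], str],
                      *,
                      poll_interval_s: float = 5.0,
                      timeout_s: float = 600.0,
                      sleep: Callable[[float], None] = time.sleep) -> str:
    """Start a remote run, then poll its status until it reaches a
    terminal state or the timeout elapses. Returns the final status."""
    run_id = start_run()
    waited = 0.0
    while True:
        status = get_status(run_id)
        if status in TERMINAL_STATES:
            return status
        if waited >= timeout_s:
            return "TIMED_OUT"
        sleep(poll_interval_s)
        waited += poll_interval_s


# Fake service for demonstration: reports RUNNING twice, then SUCCEEDED.
_responses = iter(["RUNNING", "RUNNING", "SUCCEEDED"])
result = run_webhook_stage(
    start_run=lambda: "run-123",
    get_status=lambda run_id: next(_responses),
    sleep=lambda seconds: None,  # skip real waiting in the demo
)
print(result)
```

Passing the service calls in as functions keeps the polling logic independent of any particular transport, so the same loop could sit in front of an HTTP endpoint or an IDL-generated client.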

In addition to adding new features, we have also had to augment existing features in Spinnaker. For example, Airbnb-specific Kubernetes resources need special logic to determine when they have stabilized. We were able to customize this logic without forking Spinnaker by including custom Spring Boot components in a build alongside the core Spinnaker library.

Figure 5. Sketch of a Kubernetes custom resource handler for Spinnaker. Components such as this let us add features to Spinnaker without changing any core code.
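The figure itself is not reproduced here. As a stand-in, the sketch below shows the kind of stability check such a handler might perform. It is written in Python for brevity, whereas the real components are Spring Boot components on the JVM, and it is not Airbnb's code. It assumes a custom resource that exposes Deployment-style fields (`metadata.generation`, `status.observedGeneration`, `spec.replicas`, `status.readyReplicas`); an actual custom resource may report readiness differently.

```python
from typing import Any, Dict


def is_stable(resource: Dict[str, Any]) -> bool:
    """Return True when a (hypothetical) custom resource has finished rolling out.

    Assumes Deployment-like status fields; adjust for the real resource schema.
    """
    metadata = resource.get("metadata", {})
    spec = resource.get("spec", {})
    status = resource.get("status", {})

    # The controller has not yet processed the latest version of the spec.
    if status.get("observedGeneration") != metadata.get("generation"):
        return False

    # Stable once at least the desired number of replicas report ready.
    desired = spec.get("replicas", 1)
    return status.get("readyReplicas", 0) >= desired


rolling_out = {
    "metadata": {"generation": 4},
    "spec": {"replicas": 3},
    "status": {"observedGeneration": 4, "readyReplicas": 1},
}
settled = {
    "metadata": {"generation": 4},
    "spec": {"replicas": 3},
    "status": {"observedGeneration": 4, "readyReplicas": 3},
}
print(is_stable(rolling_out), is_stable(settled))
```

A deploy stage that polls a check like this can wait for the resource to settle before the pipeline moves on, instead of treating a successful apply as a successful rollout.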

Final results

We started integrating Spinnaker into Airbnb in January 2019. After building our initial UI extension and augmenting the built-in Kubernetes support, we gradually onboarded early customers, starting with internal services with low traffic. Guided by customer feedback, we closed critical gaps with Deployboard. Once Spinnaker satisfied our basic use cases, we started onboarding a diverse set of production services, ensuring that we had all the functionality necessary for Airbnb-wide adoption.

By October we had deployed more than 30 services through Spinnaker, including moderately large services with more than 40 active developers. Beyond just adoption and customer feedback, we are tracking progress by measuring regressions that are caught before reaching a full production deploy. Our early adopters already have fewer regressions promoted to production and anecdotally spend less time managing their deploy process when using Spinnaker. Having these case studies will make it easier for us to motivate Spinnaker adoption across all of Airbnb.

The next phase

Through the end of this year, we are focusing on ensuring that Spinnaker prevents regressions, improves productivity for our early adopters, and closes some feature gaps with Deployboard. In the meantime, we are also performing scale tests to validate that Spinnaker can handle managing all deploys at Airbnb. In 2020, we hope to onboard over 1,000 services and start contributing more of our custom logic back to open-source Spinnaker.

Thanks to the rest of our team, Alper Kokmen, Dion Hagan, Freddy Chen, Greg Foster, Ryan Zelen, and our managers, Willie Yao and Bruce Chu, for helping to make this transition a success.

Want to know more? Join us at Spinnaker Summit, where we will be talking in detail about extending Spinnaker to ease Airbnb's transition toward continuous delivery. The conference runs November 15-17, 2019, in San Diego.
