How Uber used mobile performance engineering to pull ahead

Uber is the quintessential "unicorn"—not much of a track record, but a valuation that tops $1 billion. But is its success really that ephemeral? I recently sat down with Matt Ranney, chief systems architect for Uber, who explained that it all comes down to performance engineering.

Yes, the company has a great business model, but its explosive growth was only possible with a relentless focus on performance. And Uber's approach is not as unusual as it might seem. The problems Uber faces—and the tools it uses to solve them—are common to enterprises around the world.

Why Uber's challenge is your challenge

The list of companies modeling the disruptive sharing economy and crowdsourcing is well known: Uber, Airbnb, Etsy, etc. These companies have no inventory and no products, and yet they are worth billions. They seem to have unique business models that you could never hope to duplicate. But perhaps they are not so unusual.

Consider this: At its core, Uber's business model is a dispatch system. How many businesses are built around dispatch systems? Cleaning services, mechanical support, technical support, even food and education services can be controlled through dispatch. What makes Uber so incredibly successful is not its business model, but how easy it is for people to work inside of its business model.

What Uber has exposed is that companies can transform a business if they engage with customers more effectively. But getting that engagement is tough.

According to Ranney, "Each person connecting to Uber is having a unique experience with highly personalized data. This level of personalization at the scale Uber operates brings problems. Uber's business is not one where the data can be cached or deployed through CDNs [content delivery networks]. What complicates issues is that mobile networks are simply not as fast as a traditional PC. As an example, the session time for a typical mobile device is ten times longer than a PC."

"Another challenge is that Uber cannot afford a system failure. Ever," said Ranney. "If an Uber customer cannot get an Uber car, then they will switch to another app. There is no brand loyalty. The systems must always work."

Ranney lists three key areas that Uber needs for a fully redundant system:

  • Performance: What types of tests do you run to ensure that your systems keep running?
  • Data: How can data operate in an environment where a data connection is intermittent?
  • Future proofing: What technologies does Uber invest in to improve efficiencies in its systems?

Ranney sees all three as necessary to support the load that Uber's data stores must handle. "The result is that all processes are treated the same, whether they are running on the same machine or not," he said.

How Uber gets killer performance

A common thread through all of Uber's systems is performance, performance, and performance. Each technology is chosen because it is the most stable and delivers the fastest response.

In addition, Uber looks to ensure that its tools work independently of each other and can be destroyed without taking anything else down. To this end, Uber actively attempts to crash its own systems, including networks, databases, and APIs. The system as a whole must keep working even when individual parts of it are down.
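
To make the idea concrete, here is a minimal sketch of fault injection in Python. It is not Uber's tooling: the FlakyService wrapper and the call_with_fallback helper are hypothetical names invented for this illustration. The wrapper makes a chosen fraction of calls fail on purpose, so you can check that callers really do fall back to another replica.

```python
import random


class FlakyService:
    """Wrap a callable and fail a fraction of calls on purpose (hypothetical helper)."""

    def __init__(self, func, failure_rate, rng=None):
        self.func = func
        self.failure_rate = failure_rate      # 0.0 = never fail, 1.0 = always fail
        self.rng = rng or random.Random()

    def __call__(self, *args, **kwargs):
        # random() returns a value in [0.0, 1.0), so a rate of 1.0 always fails.
        if self.rng.random() < self.failure_rate:
            raise ConnectionError("injected failure")
        return self.func(*args, **kwargs)


def call_with_fallback(replicas, *args):
    """Try each replica in turn and return the first successful result."""
    for replica in replicas:
        try:
            return replica(*args)
        except ConnectionError:
            continue  # this replica is "down"; try the next one
    raise ConnectionError("all replicas failed")


if __name__ == "__main__":
    primary = FlakyService(lambda x: x * 2, failure_rate=1.0)  # always down
    backup = FlakyService(lambda x: x * 2, failure_rate=0.0)   # always up
    print(call_with_fallback([primary, backup], 21))
```

Running a test suite with a nonzero failure rate on every dependency is one simple way to find the callers that silently assume nothing ever fails.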

Starting with the mobile apps: Uber does not use HTML5 or hybrid solutions in them. All of the coding is completed with native code using the performance and analysis tools in Xcode and Android Studio.

The next step is a stateful server model to manage the high levels of demand. Uber's answer is a mixture of open source solutions and homegrown magic. Wrapping the process is Uber's Ringpop technology, which is similar in concept to distributed data systems such as Riak, Memcached, and Amazon's Dynamo. Ringpop also manages cluster membership and failure detection using SWIM (Scalable Weakly-consistent Infection-style Process Group Membership Protocol).
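
Ringpop's sharding is built on a consistent hash ring, the same general idea Dynamo-style systems use. The following is a minimal Python sketch of that idea only. It is not Ringpop's code, it leaves out SWIM membership entirely, and the HashRing class, its method names, and the replica count are assumptions made for illustration.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a stable 64-bit point on the ring."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """A toy consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes=(), replicas=100):
        self.replicas = replicas   # virtual points per node (assumed value)
        self._points = []          # sorted hash points on the ring
        self._owner = {}           # hash point -> node name
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        for i in range(self.replicas):
            point = _hash(f"{node}#{i}")
            if point in self._owner:
                continue           # skip the (very unlikely) hash collision
            bisect.insort(self._points, point)
            self._owner[point] = node

    def remove(self, node: str) -> None:
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def lookup(self, key: str) -> str:
        """Return the node that owns `key`: the first point clockwise from its hash."""
        if not self._points:
            raise LookupError("ring is empty")
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


if __name__ == "__main__":
    ring = HashRing(["worker-a", "worker-b", "worker-c"])
    print(ring.lookup("rider-42"))
```

The useful property is that removing a node only reassigns the keys that node owned. Every other key stays where it was, which is what lets a cluster lose a machine without reshuffling all of its work.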

The level of performance does not stop at the server. The communication channel, or RPC, is also modified. Uber's version is called TChannel. It is based on Twitter's multiplex RPC protocol, Mux.
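
Multiplexing means many requests share one connection, and each frame carries an ID so that responses can come back in any order. Here is a minimal Python sketch of that framing idea. The header layout (a 16-bit length followed by a 32-bit message ID) is an assumption chosen for illustration and is not the actual TChannel or Mux wire format.

```python
import struct

# Assumed header for this sketch: uint16 total frame length, uint32 message id.
HEADER = struct.Struct("!HI")


def encode_frame(message_id: int, payload: bytes) -> bytes:
    """Prefix a payload with its total length and message id."""
    return HEADER.pack(HEADER.size + len(payload), message_id) + payload


def decode_frames(buffer: bytes):
    """Split a byte stream into (message_id, payload) frames.

    Returns (frames, leftover), where leftover is an incomplete trailing
    frame to keep until more bytes arrive.
    """
    frames = []
    offset = 0
    while len(buffer) - offset >= HEADER.size:
        length, message_id = HEADER.unpack_from(buffer, offset)
        if length < HEADER.size:
            raise ValueError("corrupt frame length")
        if len(buffer) - offset < length:
            break  # partial frame; wait for the rest
        frames.append((message_id, buffer[offset + HEADER.size:offset + length]))
        offset += length
    return frames, buffer[offset:]


if __name__ == "__main__":
    pending = {1: "get_eta", 2: "get_fare"}  # requests awaiting a reply
    # Replies arrive out of order on the same connection.
    stream = encode_frame(2, b"fare=12.50") + encode_frame(1, b"eta=4min")
    frames, leftover = decode_frames(stream)
    for message_id, payload in frames:
        print(pending.pop(message_id), "->", payload.decode())
```

Because replies are matched by ID rather than by arrival order, one slow request does not block the ones queued behind it.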

Uber needed to invent its own RPC communication channel because it supports more languages than Twitter. Ranney added, "We are even looking to replace HTTP+JSON, a typical REST API, with Thrift, as our tests are showing that it is 20 times faster. We need all the speed we can get."
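
Part of the appeal of a schema-based binary protocol is that field names do not have to travel with every message. The Python sketch below compares the size of one hypothetical location update encoded as JSON and as a fixed binary layout. The message fields are invented for this example, the binary layout is not Thrift's wire format, and the sketch measures size only. It says nothing about the 20x speed figure Ranney cites.

```python
import json
import struct

# Hypothetical location update; the field names are illustrative only.
update = {"driver_id": 123456, "lat": 37.7749, "lng": -122.4194, "ts": 1700000000}

# Text encoding: field names and decimal digits are sent with every message.
as_json = json.dumps(update, separators=(",", ":")).encode("utf-8")

# Binary encoding: both sides agree on the schema in advance, so only the
# values are sent. "!QddQ" = network byte order: uint64, float64, float64, uint64.
LAYOUT = struct.Struct("!QddQ")
as_binary = LAYOUT.pack(update["driver_id"], update["lat"], update["lng"], update["ts"])


def decode(buf: bytes) -> dict:
    """Rebuild the update from the fixed binary layout."""
    driver_id, lat, lng, ts = LAYOUT.unpack(buf)
    return {"driver_id": driver_id, "lat": lat, "lng": lng, "ts": ts}


if __name__ == "__main__":
    print("JSON bytes:  ", len(as_json))
    print("binary bytes:", len(as_binary))
    print("round trip ok:", decode(as_binary) == update)
```

The trade-off is flexibility: a fixed layout is compact, but both ends must agree on the schema, which is the job an interface definition language such as Thrift's is designed to do.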

Uber's approach to data

Uber takes performance to extremes with data. The typical data store for a company is a single relational database. The problem that Uber sees with that design is that the whole system can come down if the database is not available.

Uber uses big data systems as a foundation for its technologies, with tools such as Riak, Postgres, Redis, and MySQL. Also, the company is extending MySQL with its distributed column store to orchestrate the data processes.

What is clever is that Uber can kill the whole database system and still run. When Steve Jobs took to the stage in January 2007, he described the iPhone as a "computer in your pocket." Uber is taking that quite literally. Uber uses drivers' phones as the method of distributing data, achieving a kind of "super distributed computing."

The result is that stress on replicating data is eliminated from the data centers. The trick is achieved by the phone checking in with a server every four seconds to receive an encrypted digest. If a server does not respond, the phone moves to a new server. The whole data environment is redundant. Also, the more drivers, the more redundancy is added to the system.
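
The check-in loop described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Uber's client code: the fetch callable, the server names, and the function names are hypothetical, and the digest is treated as opaque bytes. Only the four-second interval and the move-to-another-server behavior come from the description above.

```python
import time

CHECK_IN_INTERVAL_S = 4  # the phone checks in every four seconds


def check_in(servers, fetch, start=0):
    """Try servers in order, beginning at index `start`.

    `fetch(server)` is a hypothetical callable that returns the digest bytes
    or raises OSError (ConnectionError is a subclass) when the server is down.
    Returns (index_of_server_used, digest), or (start, None) if none respond.
    """
    for step in range(len(servers)):
        idx = (start + step) % len(servers)
        try:
            return idx, fetch(servers[idx])
        except OSError:
            continue  # no response; move on to the next server
    return start, None


def run(servers, fetch, rounds, sleep=time.sleep):
    """Check in `rounds` times, keeping the most recent digest on the phone."""
    current, last_digest = 0, None
    for _ in range(rounds):
        current, digest = check_in(servers, fetch, current)
        if digest is not None:
            last_digest = digest  # the phone holds a copy of the latest state
        sleep(CHECK_IN_INTERVAL_S)
    return current, last_digest


if __name__ == "__main__":
    def fake_fetch(server):
        if server == "dc1.example":          # simulate a data center outage
            raise ConnectionError(server)
        return b"digest-from-" + server.encode()

    # Skip the real delay so the demo finishes immediately.
    print(run(["dc1.example", "dc2.example"], fake_fetch, rounds=2, sleep=lambda s: None))
```

Because the phone keeps the last digest it received, a replacement server can be handed that state back instead of having to recover it from a database replica.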

In many ways, it is Uber's use of phones that is the secret sauce to its systems. Using smartphones for distributed computing is a step companies do not often take.

Future-proofing Uber's systems for growth

In addition to the core supply-and-demand dispatch systems, Uber does have a third system: Disco. Ranney said, "Disco is the dispatch optimization system. Disco's main function is to match supply with demand. Disco, however, allows Uber to look into the future. We can match predictive supply and demand, whereas our old system could only match what we knew then."

The advantage Disco provides Uber is clear: Through data, Uber can help busy drivers keep efficiently picking up riders. To do this, Uber needs a global index that requires a massive amount of data: over 1 million writes per second.

Uber is using Google's S2 Geometry Library to break down the data and distribute it. This library divides the globe into smaller geographical cells. The result is that each section handles only the writes for its own geographical location. This in turn helps the company send drivers rapid updates on where riders are located and provide more accurate ETAs for trips. It also gives Uber the opportunity to expand its business into a specific geography.
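
To show why geographic partitioning helps, here is a minimal Python sketch that shards driver locations by grid cell. It does not use the S2 library: a plain latitude/longitude grid stands in for S2's cells, and the cell size, the GeoIndex class, and the driver IDs are all assumptions made for illustration.

```python
import math
from collections import defaultdict

CELL_DEGREES = 0.01  # assumed cell size for this sketch (about 1 km of latitude)


def cell_id(lat, lng, cell_degrees=CELL_DEGREES):
    """Map a coordinate to the grid cell that contains it."""
    return (math.floor(lat / cell_degrees), math.floor(lng / cell_degrees))


class GeoIndex:
    """Toy location index in which each cell stores only its own drivers."""

    def __init__(self):
        self._cells = defaultdict(dict)  # cell -> {driver_id: (lat, lng)}
        self._where = {}                 # driver_id -> current cell

    def update(self, driver_id, lat, lng):
        """Record a location write; it touches only the cell(s) involved."""
        new_cell = cell_id(lat, lng)
        old_cell = self._where.get(driver_id)
        if old_cell is not None and old_cell != new_cell:
            del self._cells[old_cell][driver_id]
        self._cells[new_cell][driver_id] = (lat, lng)
        self._where[driver_id] = new_cell

    def nearby(self, lat, lng):
        """Return driver ids in the rider's cell and its eight neighbors."""
        cx, cy = cell_id(lat, lng)
        found = []
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                found.extend(self._cells.get((cx + dx, cy + dy), {}))
        return sorted(found)


if __name__ == "__main__":
    index = GeoIndex()
    index.update("d1", 37.7749, -122.4194)  # San Francisco
    index.update("d2", 37.7755, -122.4190)  # San Francisco
    index.update("d3", 40.7128, -74.0060)   # New York
    print(index.nearby(37.7750, -122.4195))
```

A rider lookup reads nine cells no matter how many drivers exist worldwide, and in a real deployment each cell, or group of cells, could live on a different machine.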

You can use Uber's tools

Uber doesn't use magic dust to power its systems. The systems that Uber's architects use are available to you, too. Here is a look at the core tools.

NodeJS is at the core of most of Uber's systems, but Ranney is not happy with the performance. "It worries me that more companies are not showing concern for the performance of technologies such as NodeJS and Python," he said. "Uber cannot afford to fail, but many other companies are also going to want the same level of performance from their systems. We are always watching emerging technologies that place an emphasis on performance. For this reason, we are looking at io.js, a fork of NodeJS, that we think will perform faster. We are not afraid to switch out core systems if performance improves."

Not only is Uber a user of open source solutions, but it also actively contributes to the open source community. The following are some of the Uber projects available at GitHub:

  • tchannel-python: Python implementation of the TChannel protocol
  • thriftrw-python: Similar to thriftrw but for Python
  • ringpop-node: Scalable, fault-tolerant application-layer sharding
  • sevnup: Reliably carry on work for a hashring node that owns a keyspace
  • multitransport-jsonrpc: JSON-RPC Client (Node.js & Browser) and Server (Node.js) aim at "natural looking" server and client code

There are currently over 100 projects Uber supports on GitHub. Warning: The documentation for each project is not the greatest, but the owner and contributors for each project are listed.

Can you be the next Uber?

The dream of many CEOs is to be the next Uber. Take a look at your business: Can you split your business into supply and demand? There is a good chance you can. The next step: Is there a way to crowdsource who sells and who buys your product? If you answer yes to both of these questions, then you might be able to use Uber's recipe for rapid growth:

  1. Mobile: Assume your business is being run on mobile and nothing else. Everyone has a mobile device, and you don't have to invest in new technology.
  2. Inconsistent networks: Don't rely on always-connected networks. Build solutions for networks that will drop occasionally.
  3. Data: Don't let a single traditional relational database become a point of failure. Spread data across distributed stores, as Uber does with MySQL, Postgres, Redis, and Riak. The database cannot be the bottleneck of the system.
  4. Open source: Don't reinvent the wheel. There are many high-performing solutions freely available such as NodeJS, Java, and Python.
  5. Talent: The secret sauce for how all of this connects is only as good as the talent on your team. Invest heavily in the best.

The final word is that your business will be disrupted at some point in the same way that Uber is disrupting the taxi business. The question is: Do you want to put in place the solution that will disrupt the industry you are in and leave your competitors in the dust?