Microservices migration: How Elsevier took on its monolith

Mark Boyd, Writer, Independent

When science information provider and publisher Elsevier acquired research tool startup Mendeley in 2013, developer advocate Joyce Stack faced a challenge. Like many companies, Elsevier was looking to microservices migration as a Lego-like way to reorient its legacy architecture into more composable functions and data assets that could be plugged into a value chain interdependently, enabling faster product development, continuous delivery of application services, and more granular control over computing and storage resources.

But for Elsevier and other businesses, fear of change and the sheer size of the task can force teams to go through what Stack calls "stages of denial" as they migrate a monolithic IT infrastructure to a microservices architecture. Here are some typical scenarios that may make development organizations feel that they're not ready for microservices:

  • Enterprises acquire startups and encourage them to maintain their independence and agility, but experience difficulties when it comes to integrating the capabilities of new acquisitions into legacy systems.
  • Medium-size businesses experience the technical debt that comes with legacy infrastructure as they try to use API and container technologies to create new products and services with a faster time to market.
  • Startups on a growth track see some of their early successes, particularly around building a single API that helps them enter and disrupt an existing market, become technical debt when scaling and enabling service components to be accessed by resellers and partners.

Stack says she has experienced all three situations.

Building a "research operating system"

Mendeley started life as a tool for researchers across the sciences to build their academic profiles so that they could win grants, build their institutional reputation, showcase their work, and promote their own research and papers. Its product helps researchers manage the impact of their literature, for example, by keeping track of their publications and the readers of their work. But as part of the wider Elsevier publications company, Mendeley is now one of many products that help enterprises build out what Elsevier calls a "research operating system" that links data, applications, users, and platforms together for the scientific and research community.

Today, Mendeley uses a REST API that returns JSON-formatted data, uses OAuth 2.0 for authorization, and receives about 20 million API requests a day. The API is part of a microservices architecture that the team built using the Dropwizard framework, and it draws on data from what Stack calls a "creaking" MySQL database, with additional data stored in HBase. So how did a startup API manage to get to this level of capability at Elsevier when the company was built on a monolithic legacy system?
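
As a rough illustration of what calling an API of this shape involves, the sketch below builds a GET request that carries an OAuth 2.0 bearer token and decodes a JSON body. The base URL, the /documents path, the token, and the sample payload are all placeholders invented for this example; they are not taken from Mendeley's actual API.

```python
import json
import urllib.request

# Hypothetical base URL for illustration only; not a real Mendeley endpoint.
BASE_URL = "https://api.example.com"


def build_request(path: str, access_token: str) -> urllib.request.Request:
    """Build a GET request that carries an OAuth 2.0 bearer token."""
    return urllib.request.Request(
        BASE_URL + path,
        headers={
            "Authorization": f"Bearer {access_token}",
            "Accept": "application/json",
        },
        method="GET",
    )


def parse_body(raw: bytes) -> object:
    """Decode a JSON response body into Python objects."""
    return json.loads(raw.decode("utf-8"))


# "/documents" and "example-token" are assumed values, not real ones.
request = build_request("/documents", "example-token")

# Sending is left out so the sketch runs offline. With a real server:
#   with urllib.request.urlopen(request) as response:
#       documents = parse_body(response.read())
sample = parse_body(b'[{"id": "1", "title": "A sample paper"}]')
print(request.get_header("Authorization"), sample[0]["title"])
```

Obtaining the access token in the first place (the OAuth 2.0 authorization flow itself) is outside the scope of this sketch.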

In its early days, Mendeley's API infrastructure grew quickly. As a small startup, the team was able to build new features rapidly in response to user demands, without a longer-term roadmap to show how these new product functions would be built. Eventually, however, technology became an obstacle rather than an enabler. It was preventing the team from adding more users, adding an extra column to a database table, and doing real-time synchronization.

"We could not add the new features, we could not resource strategic projects, the technical team had to keep telling the product team 'no, you can't do this'," says Stack.

Dueling APIs

By the time Elsevier began integrating Mendeley into its larger product suite, Mendeley had two high-maintenance APIs (one in XML and one in JSON), both brittle to change and drawing on data from an unmodifiable database with a poor data model. To make things worse, the APIs and database were tightly coupled. Access to the data could break if developers changed things without understanding how the monolith was put together. And, as with most legacy systems, that monolith was filled with spaghetti code: quickly completed, undocumented fixes that made sense to software engineers at the time but left irrational dependencies between data sources and functions.

After seven years of development, the source code mixed different methodologies, followed inconsistent standards, and was littered with "FIXME" and "TODO" comments. Business logic was split across the two APIs in one code base, and because of low test coverage, the code base included dead code that could not be identified until it was actually in production.

The 3 stages of acceptance

Stack describes three key cultural stages, similar to the Kübler-Ross stages of grief, that she sees businesses go through as they seek to tackle their monoliths and turn them into a microservices architecture:

  • Bargaining: After seeing the enormity of the task, it was natural for the Mendeley team to first consider just living with the monolith that had been created. During the seven-year organic growth process, the API had not been seen as an asset internally, so while a JSON-formatted API had been built, for example, no one was actually using it.
  • Depression: The Mendeley team was nearly paralyzed at the thought of having to make changes to the infrastructure. A sense of doom and hopelessness set in.
  • Acceptance: The technical engineering team had been part of the success of Mendeley's growth in the first place, and several leaders got together and realized that they had to take ownership of the problem.

The road to reorientation commenced. With Conway's Law in the back of their minds, team members started to decouple the front and back ends. They hired additional DevOps and Java engineers, and started to isolate the big pain points.

First they duplicated the functions into microservices. "This allows you to build up your centralized build components, like Dropwizard, wrappers, your Jenkins build pipeline, your monitoring, and all of that," says Stack.

This gave the team confidence, and it began to tackle the larger problem: rebuilding and combining the two APIs. The team split its dev pairs to work in separate services, researched best practices, and began to treat the API as a product.

It started small, speaking with client teams about what they needed. At times, it didn't even build a minimum viable product, instead creating a wiki page that contained ideas for an endpoint contract.

At this stage, the business made heavy use of the API description format Swagger to help the different teams communicate.
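
An endpoint contract of the kind the team drafted on a wiki page can also be written down as data and checked against a response. The sketch below is a deliberately simplified stand-in for a full Swagger description, not the team's actual contract; the field names and the checking function are hypothetical.

```python
# Hypothetical contract for a hypothetical "get one document" endpoint.
# Each entry maps a required response field to the Python type it must have.
DOCUMENT_CONTRACT = {
    "id": str,
    "title": str,
    "year": int,
}


def contract_violations(payload: dict, contract: dict) -> list[str]:
    """Return readable problems; an empty list means the payload conforms."""
    problems = []
    for field, expected_type in contract.items():
        if field not in payload:
            problems.append(f"missing field: {field}")
        elif not isinstance(payload[field], expected_type):
            problems.append(
                f"wrong type for {field}: expected {expected_type.__name__}, "
                f"got {type(payload[field]).__name__}"
            )
    return problems


good = {"id": "42", "title": "A sample paper", "year": 2014}
bad = {"id": 42, "title": "A sample paper"}

print(contract_violations(good, DOCUMENT_CONTRACT))  # no problems
print(contract_violations(bad, DOCUMENT_CONTRACT))   # two problems
```

Agreeing on something this small before writing the service gives client teams a concrete artifact to react to, which is the role the wiki page and Swagger played in the account above.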

Hard decisions

Stack says the business faced some hard decisions in meeting the challenges of microservices migration, and some people felt the brunt of the reorientation more than others. Unfortunately, those people are often the early adopters and key clients who helped to navigate the move to microservices. "Your first client will hate you, and you will hate them because they will get in the way of you building a nice, clean REST API," Stack says. It's common for trust issues to develop with the first clients, as iterating in the open with changing endpoint contracts creates a sense that the team does not know what it is doing.

At some point, Stack says, it may be necessary to compromise on design because of the tight coupling that your legacy infrastructure still carries. As a result, you may need to build in several features to assist specific clients so that they can continue accessing the database.

"Like any monolith, you are going to feel like it never ends because of all the weird corner cases you never thought about. For us, it felt like a massive whack-a-mole [situation]," Stack says.

Mendeley has had its new API in production for two years now. It has a versioning strategy and the capacity for quick deployments, and it is "dogfooding the API" for internal use, Stack says. In addition, the developer portal is well maintained, with SDKs and clear documentation.

But there is still work to do, she adds. For one thing, the team has not yet been able to retire the old API, because several clients still depend on it while they communicate the changes to their user communities. (The team has, however, used a Twitter blackout testing strategy: shutting down the old API for a period of time and presenting an error message that directs users to the new API.)
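
A blackout test of this kind can be sketched as a time-window check in front of the old API. The example below is a minimal illustration under stated assumptions, not Mendeley's implementation: the window times, the replacement URL, the 503 status code, and the function names are all hypothetical.

```python
from datetime import datetime, timezone

# Hypothetical blackout window (UTC) and replacement URL, for illustration only.
BLACKOUT_WINDOWS = [
    (
        datetime(2000, 1, 12, 9, 0, tzinfo=timezone.utc),
        datetime(2000, 1, 12, 10, 0, tzinfo=timezone.utc),
    ),
]
NEW_API_URL = "https://api.example.com/v2"


def in_blackout(now: datetime) -> bool:
    """True if `now` falls inside any scheduled blackout window."""
    return any(start <= now < end for start, end in BLACKOUT_WINDOWS)


def handle_legacy_request(now: datetime, serve_legacy) -> tuple[int, dict]:
    """Serve the old API normally, except during a blackout window."""
    if in_blackout(now):
        # 503 is an assumed choice signalling a temporary outage;
        # the body tells callers where the new API lives.
        return 503, {
            "error": "This API is being retired. Please move to the new API.",
            "new_api": NEW_API_URL,
        }
    return 200, serve_legacy()


during = datetime(2000, 1, 12, 9, 30, tzinfo=timezone.utc)
after = datetime(2000, 1, 12, 10, 0, tzinfo=timezone.utc)
print(handle_legacy_request(during, lambda: {"ok": True})[0])
print(handle_legacy_request(after, lambda: {"ok": True})[0])
```

Because the outage is temporary and scheduled, clients that have not yet migrated find out that they still depend on the old API without being cut off permanently.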

New challenges emerge

Like other enterprises that have moved to a microservices architecture and then leveraged that architecture to enable a platform-based business model, Elsevier has seen a new range of technical challenges emerge, Stack says. Those include:

  • How do you decide the granularity of a microservice? That is, how do you find the sweet spot between overly chatty and monolithic?
  • If you discover that your microservices are doing too much or too little, what strategies should you apply to adjust the boundaries?
  • Should you go straight to microservices or start something more monolithic and refactor to microservices later?
  • Can you share code across services?

But the technical challenges are almost always easier to address than the cultural ones. By identifying the stages of denial in the enterprise's change management culture, Stack and the tech team were better able to take advantage of new approaches to APIs and container technologies.

As agile expert Em Campbell-Pretty says, it's not the technology that provides an effective microservice architecture, but the organizational culture that enables it to be implemented successfully.
