How a systems failure took Bloomberg customers offline

This article is part of an ongoing series of Performance Retrospectives that assess real-world application performance issues in the news, analyze what might have happened, and offer up best practices that just might help you avoid similar problems.

Shortly after 8:00 AM on Friday, April 17, 2015, Bloomberg's trading terminals went dark and stayed offline for two hours after a major systems failure.

What happened

"We experienced a combination of hardware and software failures in the network, which caused an excessive volume of network traffic," Bloomberg said in a statement about the systems failure. "This led to customer disconnections as a result of the machines being overwhelmed." The company added that "multiple redundant systems" failed to prevent the disruption.

Why it happened

This outage highlights the complexity of modern composite applications and their dependencies on underlying services. When those services become unstable or fail, the impact can spread widely to internal users and customers.

The business impact

The systems failure took down Bloomberg's trading platform, data service, and chat platform. It affected financial markets around the world, exacerbating a spike in volatility in European stocks and causing some debt sales to be postponed.

Bloomberg also suffered negative publicity. A Financial Times article carried the headline "Bloomberg's global outage paralyses investors" and noted that 315,000 customers rely on the service. That matters because Bloomberg operates in a competitive market: last year, the financial data services firm increased its share to 32 percent of the $26.5 billion market, and outages can send customers to competitors.

It's difficult to fully measure the cost of the outage to Bloomberg subscribers in terms of total business losses. However, assume that each of its 315,000 subscribers works the typical 264 days a year and pays $20,000 per year for the subscription. Prorated over an eight-hour workday (the assumption the figures imply), the cost of the two-hour outage would be $18.94 per subscriber, or a total of about $5.97 million in paid subscription fees alone.
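
The arithmetic behind that estimate can be reproduced with a short Python sketch. The subscriber count, annual fee, working days, and outage length come from the figures above. The eight-hour workday is an assumption (it is the value that makes the per-subscriber figure come out to $18.94), and the function and variable names are purely illustrative.

```python
def outage_cost_per_subscriber(annual_fee, work_days_per_year,
                               hours_per_day, outage_hours):
    """Prorate an annual subscription fee down to the hours lost in an outage."""
    daily_fee = annual_fee / work_days_per_year
    hourly_fee = daily_fee / hours_per_day
    return hourly_fee * outage_hours


SUBSCRIBERS = 315_000      # from the Financial Times figure cited above
ANNUAL_FEE = 20_000        # dollars per subscriber per year
WORK_DAYS_PER_YEAR = 264
HOURS_PER_DAY = 8          # assumption: not stated in the article
OUTAGE_HOURS = 2

per_subscriber = outage_cost_per_subscriber(
    ANNUAL_FEE, WORK_DAYS_PER_YEAR, HOURS_PER_DAY, OUTAGE_HOURS
)
# Round to whole cents per subscriber before scaling up to all subscribers.
total = round(per_subscriber, 2) * SUBSCRIBERS

print(f"${per_subscriber:.2f} per subscriber")   # $18.94 per subscriber
print(f"${total / 1e6:.2f} million total")       # $5.97 million total
```

Changing HOURS_PER_DAY or OUTAGE_HOURS shows how sensitive the estimate is to those assumptions; a longer assumed workday lowers the per-subscriber figure proportionally.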

This estimate does not include other impacts to subscribers, such as the negative impact on the trading floor, the inability to make informed decisions, and compromised communication between traders who were unable to execute timely transactions. More broadly, "The lack of price visibility was blamed for accelerating a sell-off in European shares, while trading volumes in German government-bond futures contracts fell by around a third," according to a Reuters story.

Takeaways: Test for resiliency

The initial hardware and software failures, and the redundant systems that did not contain them, point to a weakness in the resiliency of the system. The performance and resiliency of the system should be tested and hardened to prevent this type of outage from occurring again. There are several ways you can do this today. Modern testing practices use lifecycle virtualization to quickly and inexpensively recreate conditions and dependencies in a pre-production or disaster recovery environment. This practice enables testers to run "what if" scenarios easily while observing the resiliency of applications and the end-to-end system.
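
As a rough illustration of a "what if" scenario, the Python sketch below replaces two dependencies with virtual stand-ins whose failures can be switched on at will, then checks how a simple failover routine behaves. Everything here is hypothetical: VirtualFeed, price_with_failover, the "XYZ" symbol, and the canned price are invented for the example and say nothing about how Bloomberg's systems are built or how any particular virtualization product works.

```python
class VirtualFeed:
    """Stand-in for a real dependency; 'healthy' toggles an injected fault."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy

    def latest_price(self, symbol):
        if not self.healthy:
            raise ConnectionError(f"{self.name} is unavailable")
        return 100.0  # canned response, not real market data


def price_with_failover(feeds, symbol):
    """Try each feed in order; return (feed name, price), or None if all fail."""
    for feed in feeds:
        try:
            return feed.name, feed.latest_price(symbol)
        except ConnectionError:
            continue  # fall through to the next redundant feed
    return None


def run_scenario(primary_up, backup_up):
    """Build a primary and a backup feed in the requested state and query them."""
    feeds = [VirtualFeed("primary", primary_up), VirtualFeed("backup", backup_up)]
    return price_with_failover(feeds, "XYZ")


results = {
    "all healthy": run_scenario(True, True),
    "primary down": run_scenario(False, True),
    "both down": run_scenario(False, False),
}
for name, outcome in results.items():
    print(name, "->", outcome)
```

The "both down" case is the interesting one: it forces the question of what the application should do when its redundancy also fails, which is far cheaper to answer in a pre-production environment than in production.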

*Image source: Gforsythe (Own work) [CC0], via Wikimedia Commons

