New York Stock Exchange outage raises concerns about critical systems

Jaikumar Vijayan, Freelance writer

The recent NYSE outage raises concerns about the resiliency of critical infrastructure systems to accidental and maliciously triggered glitches.

The unspecified system malfunction, which halted all trading on the New York Stock Exchange (NYSE) for an unprecedented three-plus hours Wednesday, first surfaced shortly after trading began. The problem initially affected only customers who had submitted orders for some 200 smaller stocks listed on the exchange. A market status report posted by the NYSE pointed to a connectivity issue involving two network gateways as the source of the problem. Customers who did not receive confirmation for orders they had placed for the affected stocks were advised to cancel their orders.

A trader on the floor who spoke with The New York Times said the issue caused the NYSE to manually cancel about 700,000 orders that were in the system when the problems began. The NYSE then rebooted its systems and issued an alert again at 9:37 a.m., saying the problem had been successfully resolved. However, less than two hours later, it suspended all trading activity, citing technical issues once again.

Software update to blame

An unnamed source quoted by The Boston Globe said that NYSE officials had told traders the problem was related to a software update that the exchange had implemented just before trading began Wednesday. Some described NYSE officials as panicking as the problem escalated and saying they had lost control of the system. (Update: NYSE confirmed on Thursday that the outage was caused by a software update.)

The NYSE finally resumed trading shortly after 3:00 p.m., or about three and a half hours after it first suspended all trading. In an update, the exchange noted that all the issues had been remediated, except in the case of its OpenBook system pertaining to limit-order trades.

Coinciding airline outage sparks cyber attack concerns

The lengthy disruption sparked immediate concerns of a cyber attack, especially because it happened shortly after United Airlines said network connectivity issues had caused it to ground all flights nationwide for several hours. Troubles with the Wall Street Journal's website at around the same time only compounded the concerns.

Both the NYSE and United were quick to point out that their problems weren't the result of a cyber breach, and officials at the White House, FBI, and the US Department of Homeland Security (DHS) echoed those statements with equal alacrity. In a statement, DHS Secretary Jeh Johnson noted that nothing suggested a cyber attack was behind the disruptions. "It appears from what we know that the malfunctions at United and the stock exchange were not the result of any nefarious actor." What happened at the Journal, however, remains unclear, he said.

A question of system resiliency

The prolonged disruption suggests that even the most sophisticated networks in the world aren't as resilient to problems as some might have imagined.

If the DHS and FBI are correct in ruling out a cyber attack as the cause of the outage, then it suggests that the country's technological foundation is in really bad shape, said Igor Baikalov, chief scientist at Securonix. "It's our critical infrastructure we're talking about," he noted via email.

"To have vital transportation, financial, and media companies that are heavily dependent on technology experience disrupting glitches in their busiest hours is something that only a global war game scenario can envision," Baikalov said.

Availability and continuity go MIA

The NYSE outage raises concerns that the exchange hasn't paid enough attention to ensuring the availability and continuity of its systems. For many organizations, the cost of implementing high-availability and fault-tolerant features can be very high, Baikalov said. "As long as the cost of implementation exceeds the cost of outage, businesses are not going to do it."

The NYSE outage points to a failure to build in enough redundancies so that if one system fails, other systems can pick up the slack, said Pierluigi Stella, chief technology officer of Network Box USA. "Certain major disruptions don't happen because a disk broke in an organization of this scale; they happen because someone made a mistake," Stella said. The disruption at United points to a similar failure to build in enough redundancies where needed, he noted.
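The kind of redundancy Stella describes can be illustrated with a minimal failover sketch. Everything in it is hypothetical: the function names, the two "gateways," and the order string are invented for illustration and say nothing about how the NYSE or United actually build their systems. The idea is simply that a request is offered to each redundant backend in turn, and a standby answers when the primary cannot.

```python
from typing import Callable, Sequence


class AllBackendsFailed(Exception):
    """Raised when no redundant backend could handle the request."""


def submit_with_failover(order: str, backends: Sequence[Callable[[str], str]]) -> str:
    """Try each redundant backend in order; return the first successful reply."""
    errors = []
    for backend in backends:
        try:
            return backend(order)
        except ConnectionError as exc:
            # This backend is down; remember why and move on to the next one.
            errors.append(exc)
    raise AllBackendsFailed(f"{len(errors)} backend(s) failed for order {order!r}")


def primary_gateway(order: str) -> str:
    # Hypothetical primary that is currently unreachable.
    raise ConnectionError("primary gateway unreachable")


def standby_gateway(order: str) -> str:
    # Hypothetical standby that picks up the slack.
    return f"accepted:{order}"


result = submit_with_failover("BUY 100 XYZ", [primary_gateway, standby_gateway])
print(result)
```

A sketch like this covers only hardware or connectivity failures. If the same faulty software update is deployed to every backend, all of them fail the same way, which is consistent with Stella's point that major disruptions tend to come from mistakes and not from broken disks.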

Not the first, nor the last

As significant as Wednesday's disruption at the NYSE was, it's not the first time something like this has happened. In 2013, the NASDAQ exchange had to suspend trading operations for more than three hours because of what it described as a technical glitch in a system that disseminates market data for securities listed on the exchange.

The disruption in NASDAQ's Unlisted Trading Privileges Securities Information Processor (UTP SIP) system halted trading on more than 2,000 stocks, including marquee brands like Google, Amazon, and Cisco. That disruption, unlike Wednesday's, also affected trading on all NASDAQ-listed stocks and funds, as well as options on other exchanges.

In another incident, the BATS Global Markets electronic stock exchange suffered a nearly hour-long disruption in 2013 as the result of what the exchange called an internal network issue. It was the second outage at the exchange. In 2012, the company had planned to use its own stock to inaugurate the all-electronic exchange, but a software problem forced BATS to abandon the effort and cancel all orders that customers had placed or were attempting to place via the exchange.

Such incidents also point to the issues that can result from the increasing complexity of modern networks, said Tim Erlin, director of IT security and risk strategy at Tripwire. "The level of complexity can be staggering, and this means an error made by a developer half-way around the world somewhere in the supply chain of a service can impact the operations of major businesses [in the U.S.]," Erlin said.
