How predictive analytics will disrupt software development

Robert L. Scheier, Principal, Bob Scheier Associates

What if you knew how many software bugs to expect, where you need to test for them, and how much time you would need to fix each one before you even started a project? Many businesses already use predictive analytics, also known as machine learning, to analyze historical data in order to make predictions about future outcomes.

The technology is widely used in areas such as insurance, to predict losses; by law enforcement, to determine where and when property crimes are most likely to occur; and in financial services, where it's used to predict everything from fraud to which applicants are most likely to default on a loan. It can also be used to make predictions about what changes you should make to avoid future problems and change predicted outcomes, a discipline known as prescriptive analytics.

Those same principles are just beginning to be adapted to application development and could soon become a major disruptive force, enabling you to deliver software faster, and with fewer defects. Here's how the technology works, how it's used in business, how it will affect your software development life cycle, and what you need to do now to prepare.


The power of prediction

While the use of predictive analytics in the software development life cycle is relatively new, its value in solving operational business problems is well understood. Every year, the pace of change ramps up, with new technologies, rivals, and business models challenging your place in the market. Anything from a rise in interest rates to a snowstorm in a major market can force adjustments to your product mix, pricing structures, or production schedule. While reacting on the fly, you must maintain lower costs and higher quality than your competitors.

Predictive analytics enables businesses to anticipate emerging needs and adapt before their competitors do, instead of scrambling to react after change becomes obvious. Unlike traditional analytics and reporting tools, which provide a "rear-view mirror" perspective on what has already happened or is happening now, advanced analytics applies algorithms to historical data to offer a "view through the windshield" of likely future outcomes.

That same capability is starting to be integrated into the software development life cycle, promising big gains in quality and delivery schedules. "There are currently many apps with attractive trend lines and dashboards that record factors such as the number of defects," says Megan Sheehan, senior product manager for application life-cycle management at Hewlett Packard Enterprise. "Executives love them, as their bonuses are often tied to such metrics." But these are just the first step in analytics, she says. "The next is to mine the data, use more advanced analytics to identify ongoing trends, and project with some degree of confidence what is likely to happen in the future."


Real-world applications

Predictive analytics is used extensively in the consumer packaged goods industry to predict everything from what color to make a box of toothpaste to what promotions to offer at what times of the year, says Sheehan. "They've done a lot of data mining and modeling to figure out how to optimize sales and profits."

The World Quality Report 2015-2016, published by Capgemini, Sogeti, and Hewlett Packard Enterprise, found that 66% of respondents in the automotive industry have business/analytics specialists in their testing organizations, and 47% use predictive analytics to construct test strategies. Among other tactics, they use big data to help determine features and pricing for new products.

In health and life sciences, analytics helps healthcare providers reduce costs by tracking the prevalence and likelihood of costly hospital-acquired infections. Across the health and life sciences industry, the report says, 67% employ business/analytics specialists. Advanced analytics is driving other industries as well. Consider:

  • One fast-growing global agricultural products company uses predictive analytics to increase visibility into its customers' purchase plans and its supply chain. This has helped it negotiate better pricing and scheduling of supplies. It has improved inventory turns by 50 percent and its forecast accuracy by 30 percent, saving $1 million annually.
  • A telecommunications provider used advanced analytics to customize offers for millions of customers and reduced its customer defections to other providers by 28 percent.
  • Police, academia, and a private company tweaked an algorithm originally used to predict earthquakes to instead predict where crimes were likely to occur, with accuracy within 500 feet. By moving patrols into those hot spots, police were able to reduce the crime rate. The results in Los Angeles include a 33% reduction in burglaries and a 21% reduction in violent crimes in areas where the software is being used.

As demand for such services grows, "predictive analytics vendors are providing tools that lower the barrier to entry and increase appeal for those with less statistics skills," according to a Forrester Research report on big data predictive analytics solutions.

With the development of more sophisticated analytics models and the growing availability of cloud-based analytic computing power, predictive analytics is just starting to be applied to IT processes, including application development. This can help meet what the World Quality Report calls one of the leading challenges in today's ultra-short development cycles: the inability to decide on what to test. Advanced analytics and continuous feedback "will become major enablers for the prioritization of tests," the report says.

Automated predictive analytics processes will help testers understand the impact of changes made in the development stage across the entire software development life cycle, identify the amount of testing needed to produce a minimum viable product, and identify focus areas for testing based on feedback from the production team as well as the size and skills of the testing team, the report says.

Understanding predictive analytics basics

Using the same data as traditional analytics, such as daily sales on a website, defects per hour on the production line, or loan repayment rates, predictive analytics can estimate what is likely to happen in the future.

Beneath that high-level distinction, though, are many decisions a business must make about how exactly it can use predictive analytics to improve the bottom line.

Advanced analytics can be based on a wide choice of analytic models, says Dean Abbott, co-founder and chief data scientist at SmarterHQ Inc., which provides data analytics and contextual marketing solutions for the retail industry.

Linear and logistic regression

Some of the oldest, but most common, approaches are linear regression and logistic regression, which Abbott calls "supervised" learning methods. This means they estimate a specific outcome that the company has an interest in predicting. You use linear regression to define a value on a continuous range, such as the risk that an applicant for auto insurance will have a car accident in the next year. Classification algorithms such as logistic regression are used for either/or predictions, such as whether a prospect who has viewed a certain combination of content will purchase a product.
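To make the distinction concrete, here is a minimal sketch in plain Python, not a production implementation. The data sets, the function names, and the training settings (learning rate, number of passes) are all assumptions made for illustration. The first function fits a straight line to predict a continuous value; the second fits a logistic curve that returns the probability of an either/or outcome.

```python
import math

def fit_linear(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def predict_probability(w, b, x):
    """Logistic (sigmoid) function: maps a score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-(w * x + b)))

def fit_logistic(xs, labels, lr=0.1, epochs=5000):
    """Logistic regression for one feature, trained by gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, labels):
            error = predict_probability(w, b, x) - y
            grad_w += error * x
            grad_b += error
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

# Hypothetical data: years licensed vs. expected annual claim cost (in $100s).
slope, intercept = fit_linear([1, 2, 3, 4, 5], [10, 8, 6, 4, 2])

# Hypothetical data: pages viewed vs. whether the prospect purchased (1) or not (0).
w, b = fit_logistic([1, 2, 3, 4, 5, 6, 7, 8], [0, 0, 0, 1, 0, 1, 1, 1])

print(f"predicted cost for 6 years licensed: {slope * 6 + intercept:.1f}")
print(f"purchase probability after 7 pages: {predict_probability(w, b, 7):.2f}")
```

In practice a team would more likely reach for a statistics or machine learning library than hand-written fitting code; the point here is only that one method outputs a number on a continuous range and the other outputs a probability for a yes/no question.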

In "supervised" learning, the algorithm solves for a specific outcome, such as whether a stock will rise or fall or when a certain piece of equipment will need maintenance. "Unsupervised" learning is used if the aim is to better understand new market needs or find unexpected trends without looking for a specific outcome or target variable.

Here, the model analyzes as many combinations of factors as possible to find interesting correlations from which a human can draw conclusions. Examples of such conclusions might include "Smartphone customers between 18 and 25 in urban areas are using voice more than text" or "Since we began using a new fracking chemical at our well sites, pump filters require cleaning twice as often."
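One very simple way to picture this kind of exploration is a scan that measures the correlation between every pair of fields and ranks the strongest ones for a human to review. The sketch below assumes a small, made-up table of well-site readings; the column names and values are hypothetical, and real unsupervised methods (clustering, association rules, and so on) go well beyond pairwise correlation.

```python
from itertools import combinations
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length columns."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def rank_correlations(columns):
    """Score every pair of columns and sort by strength, strongest first."""
    pairs = [
        ((a, b), pearson(columns[a], columns[b]))
        for a, b in combinations(columns, 2)
    ]
    return sorted(pairs, key=lambda item: abs(item[1]), reverse=True)

# Hypothetical weekly readings from six well sites.
columns = {
    "filter_cleanings": [2, 2, 3, 4, 4, 5],
    "new_chemical_litres": [0, 0, 10, 20, 20, 30],
    "ambient_temp_c": [15, 22, 18, 25, 16, 20],
}

ranked = rank_correlations(columns)
for (a, b), r in ranked:
    print(f"{a} vs {b}: r = {r:+.2f}")
```

No target variable is named anywhere in the code. The scan simply surfaces the strongest relationships, and it is up to a person to decide whether a correlation such as chemical use and filter cleanings reflects a real cause.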

Decision trees and neural networks

Two popular analytics algorithms, decision trees and neural networks, can be used for either type of predictive analytics. Decision trees present the results of an analysis in a series of if/then choices, such as that used to determine the likelihood that a wireless customer will leave for another provider. One "branch" in the tree might be customers who called more than three times in the past year to discuss their bills. If they had placed multiple calls, that would increase the likelihood of their defecting to another carrier. Below that, another branch might indicate whether or not they are under the age of 24. A "yes" answer would also raise the risk they will leave, since analytics show that younger customers are more likely to churn than are their older counterparts.
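The churn example above can be written out as the nested if/then choices a decision tree would produce. This is a hand-written sketch rather than a tree learned from data: the two split points (more than three billing calls, under age 24) come from the example, while the function name and the four risk labels are assumptions made for illustration.

```python
def churn_risk(billing_calls_past_year, age):
    """Walk two branches of a hypothetical churn tree and return a risk label."""
    if billing_calls_past_year > 3:
        # First branch: frequent billing calls raise the risk of defection.
        if age < 24:
            # Second branch: younger customers are more likely to churn.
            return "high"
        return "elevated"
    if age < 24:
        return "moderate"
    return "low"

print(churn_risk(billing_calls_past_year=5, age=22))
print(churn_risk(billing_calls_past_year=1, age=40))
```

A real decision tree algorithm would choose the split variables and thresholds itself, based on which splits best separate customers who left from those who stayed in the historical data.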

Neural networks use large numbers of highly interconnected processing elements to learn by example and solve problems. Their advantages include the ability to identify patterns and trends too complex for humans or other automated techniques and the ability to work more quickly because many computations can be executed in parallel.

Advanced analytics also involves a process called model training, says Simeon Fitch, director of software architecture at analytics consultancy Elder Research Inc. It takes various forms, but model training generally involves partitioning data into "training" and "testing" sets. The first set is used to build an analytic model, and the second data set to evaluate the performance of the model. The evaluation process is repeated using different data sets. Significant variability in the performance of the model among data sets indicates a problem in the methodology, Fitch says.
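One common way to carry out this repeated partitioning is k-fold cross-validation, sketched below under some assumptions: the data set is synthetic, the "model" is a simple least-squares line, and the helper names are hypothetical. Each fold takes a turn as the testing set while the rest serve as the training set, and the spread of the resulting scores is the variability Fitch describes.

```python
import random
import statistics

def fit_line(train):
    """Least-squares line through (x, y) pairs: returns (slope, intercept)."""
    xs = [x for x, _ in train]
    ys = [y for _, y in train]
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def mean_abs_error(model, test):
    """Average absolute difference between predicted and actual values."""
    slope, intercept = model
    return statistics.mean([abs(y - (slope * x + intercept)) for x, y in test])

def k_fold_scores(records, k, fit, score, seed=0):
    """Split records into k folds; train on k-1 folds, test on the held-out one."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    scores = []
    for i, test_set in enumerate(folds):
        train_set = [r for j, fold in enumerate(folds) if j != i for r in fold]
        scores.append(score(fit(train_set), test_set))
    return scores

# Synthetic, perfectly linear data, so every fold should score about the same.
records = [(x, 2.0 * x + 1.0) for x in range(20)]
scores = k_fold_scores(records, k=5, fit=fit_line, score=mean_abs_error)
spread = statistics.pstdev(scores)
print("error per fold:", scores)
print("spread across folds:", spread)
```

On real data the per-fold errors will never match exactly. What matters is whether the spread is small relative to the errors themselves; a large spread suggests the model depends heavily on which records it happened to see.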

Good predictions require good data

Remembering the old cliché of "garbage in, garbage out," businesses need the right amount and quality of data to produce effective predictions.

For example, says Abbott, it's important to gather data going as far back in time as you hope to predict going forward. In other words, predicting product sales for the next six months requires at least six months of historical data to capture as many factors as possible that might affect sales over that period.

Data management can be complicated, because those responsible for the data warehouses used for historical analysis may have normalized the data, organizing it to reduce redundancy and discarding seemingly unnecessary fields. By contrast, says Fitch, data scientists often want "the raw data, every little bit they can get their hands on," because it's impossible to determine what data in, say, an application's crash report, will turn out to be significant.

One major reason predictive analytics efforts fail, says Abbott, is that they include historical data that was not available at the time the model was used for scoring. For example, in using unemployment data from the month of July to predict the influence of the unemployment rate on stock prices, you would need to use the unemployment rate as it stood in July, not the revised figure issued in August or September.
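A small sketch of how to guard against this: keep every published version of a figure along with its publication date, and have the model look up only what was known as of a given day. The release dates and rates below are invented for illustration, as are the function and variable names.

```python
from datetime import date

# Hypothetical release history: (reference month, date published, value).
releases = [
    ("2016-07", date(2016, 7, 31), 4.9),  # figure as first reported
    ("2016-07", date(2016, 8, 31), 4.8),  # first revision
    ("2016-07", date(2016, 9, 30), 4.7),  # second revision
]

def value_as_of(releases, reference_month, as_of):
    """Return the latest value for a month that was published on or before as_of."""
    known = [
        (published, value)
        for month, published, value in releases
        if month == reference_month and published <= as_of
    ]
    if not known:
        return None  # nothing had been published yet
    return max(known)[1]  # most recent publication date wins

# Training a model on July conditions should see only the July figure.
print(value_as_of(releases, "2016-07", date(2016, 7, 31)))
# A lookup made at the end of September sees the latest revision.
print(value_as_of(releases, "2016-07", date(2016, 9, 30)))
```

The design choice is to never overwrite a value in place. Once a revision replaces the original figure in storage, there is no way to reconstruct what the model could actually have known at the time.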

Advanced analytics also has limited value unless the insights can be deployed into your software applications and business processes, according to Forrester. API calls, web services, and predictive model markup language (PMML) are among the ways companies are seamlessly integrating predictions into their business.

Recognizing the scarcity of good data scientists, "vendors are providing tools for users who may only have a computer science or undergraduate statistics backgrounds," says Forrester. These use modern interfaces familiar to developers with experience in integrated development environments such as Visual Studio or Eclipse.

Getting your data in order

To prepare, start gathering and retaining the required data you'll need, and work with advanced data visualization tools "to explore the data from various sources to determine what might be relevant for a predictive analytics project," says Forrester. In fact, it says, many predictive analytics practitioners spend more than three quarters of their time just preparing data, a process that includes such steps as calculating aggregate fields, stripping extraneous characters, filling in missing data, and merging multiple data sources.
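The four preparation steps named above can be sketched in a few lines. The example assumes a hypothetical defect log and a separate module-ownership table; the field names, the cleaning rule, and the choice to fill missing values with the overall mean are all assumptions, and a real project would pick a fill strategy that suits its data.

```python
import re
from statistics import mean

# Hypothetical raw defect log with messy names and a missing value.
defects = [
    {"module": " auth#", "hours_to_fix": 4.0},
    {"module": "auth", "hours_to_fix": None},
    {"module": "billing ", "hours_to_fix": 6.0},
    {"module": "Billing", "hours_to_fix": 2.0},
]
# Hypothetical second data source: which team owns each module.
owners = {"auth": "team-a", "billing": "team-b"}

def clean_name(raw):
    """Strip extraneous characters: lowercase, keep only letters, digits, '_'."""
    return re.sub(r"[^a-z0-9_]", "", raw.lower())

def prepare(defects, owners):
    # Fill in missing data with the mean of the known values.
    known = [d["hours_to_fix"] for d in defects if d["hours_to_fix"] is not None]
    fallback = mean(known)
    per_module = {}
    for d in defects:
        hours = d["hours_to_fix"] if d["hours_to_fix"] is not None else fallback
        per_module.setdefault(clean_name(d["module"]), []).append(hours)
    # Merge the two sources and calculate aggregate fields per module.
    return {
        module: {
            "owner": owners.get(module),
            "defect_count": len(hours),
            "mean_hours_to_fix": mean(hours),
        }
        for module, hours in per_module.items()
    }

prepared = prepare(defects, owners)
for module, row in prepared.items():
    print(module, row)
```

Even in this toy case, most of the code is cleanup rather than analysis, which matches Forrester's observation about where practitioners spend their time.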

"Hold on to your data and ensure that it is clean," says Sheehan, and ensure that business owners are using data fields consistently. She recommends developing each predictive analytics implementation independently, "with different dependencies and independent variables, depending on what questions you're trying to answer and what data you need." For more advanced needs, Fitch adds, companies might turn to more advanced or proprietary algorithms, or combine several algorithms and employ a voting process to decide which results, or combination of results, to use.
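The voting process Fitch mentions can be as simple as a majority vote. In the sketch below, three hypothetical rule-based "models" each predict whether a code change is risky, and the most common answer wins. The rules, thresholds, and field names are invented for illustration; in a real ensemble each voter would be a trained model.

```python
from collections import Counter

# Three hypothetical models, each returning "risky" or "safe" for a code change.
def by_size(change):
    return "risky" if change["lines_changed"] > 200 else "safe"

def by_history(change):
    return "risky" if change["past_defects_in_file"] >= 3 else "safe"

def by_author_experience(change):
    return "risky" if change["author_commits_to_module"] < 5 else "safe"

def majority_vote(models, features):
    """Ask every model for a prediction and return the most common answer."""
    votes = [model(features) for model in models]
    return Counter(votes).most_common(1)[0][0]

models = [by_size, by_history, by_author_experience]
change = {
    "lines_changed": 350,
    "past_defects_in_file": 4,
    "author_commits_to_module": 40,
}
print(majority_vote(models, change))
```

Using an odd number of voters on a two-way question avoids ties. Weighted voting, where more reliable models count for more, is a common refinement.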

To be most effective, predictive analytics must be an ongoing, iterative process. "Predictive models are only as accurate as the data fed into them, and over time they may degrade or increase their effectiveness," according to the Forrester report. "To monitor models for ongoing effectiveness and value, newly accumulated data is rerun through the algorithms. If and when the model becomes less accurate," it says, it will be necessary to adjust the model, such as by adjusting parameters in the algorithms, or finding additional data.
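A monitoring loop of this kind can be sketched as follows: score each newly accumulated batch of labelled data with the existing model and flag the model when accuracy falls too far below its original baseline. The toy model, the batches, and the 5-point tolerance are assumptions chosen for illustration.

```python
def accuracy(model, labelled):
    """Fraction of (features, actual) pairs the model predicts correctly."""
    correct = sum(1 for features, actual in labelled if model(features) == actual)
    return correct / len(labelled)

def needs_adjustment(model, new_batch, baseline_accuracy, tolerance=0.05):
    """True when accuracy on new data drops more than `tolerance` below baseline."""
    return accuracy(model, new_batch) < baseline_accuracy - tolerance

# Hypothetical model: predicts a defect when a change touches more than 100 lines.
def model(lines_changed):
    return lines_changed > 100

# Data from the period the model was built on: the rule fits perfectly.
original = [(20, False), (50, False), (150, True), (300, True)]
baseline = accuracy(model, original)

# Newly accumulated data: small changes have started causing defects too.
recent = [(30, True), (60, True), (150, True), (40, False)]

print("baseline accuracy:", baseline)
print("recent accuracy:", accuracy(model, recent))
print("adjust model?", needs_adjustment(model, recent, baseline))
```

When the flag is raised, the options are the ones the report lists: adjust the algorithm's parameters or find additional data, then establish a new baseline.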

Businesses that successfully use predictive analytics, says Fitch, focus first on an immediate, achievable business goal. It's easy to think of every "pie in the sky" predictive analytics feature, he says. But it changes from a cost sink into a cost saver only when analytics becomes part of the business culture, he says. If advanced analytics can deliver just a 1 percent improvement on mean time between failures in a manufacturing process, or improve software delivery times with fewer defects, that will get the attention of the business. "Demonstrate the actual value of [predictive analytics] to the business, and use that to build momentum," he says.

The biggest challenge, says Fitch, is ensuring that the people who see the analysis have the power to force change. This is especially important if the analysis reveals hard truths or will force changes in people's jobs or organizational structures. Backing from C-level executives who trust the process and stick their necks out for it, and who can do the arm twisting needed to make the resulting changes, is essential to success. "You need to have someone with the power to implement the change the predictive analytics points you to or there's no point in doing it," he says.

Photo: Christian Schnettelker/Flickr