

Predictive delivery: How to apply analytics to software development

Malcolm Isaacs Senior Researcher, Micro Focus

Predictive analytics is set to radically change the way we develop software. By using statistical techniques, historical data accumulated over previous tasks can be used to forecast how your actions today will affect your results in the future.  

This shift toward a more scientific approach to decision making is already gaining currency among software professionals. Predictive analytics enables teams to optimize their backlog and software delivery pipelines to more accurately predict their capacity and velocity, which results in better use of resources and, ultimately, higher-quality software.

The use of predictive analytics in itself is not a guarantee that you’ll improve your estimates and forecasts. The real value lies in the outcome of actions that predictive techniques recommend for you. Analytics tools will let you model different scenarios to come up with the best course of action to reduce bottlenecks in your delivery pipeline, increase the quality of your deliverables, and ensure that the content you deliver to your customers is relevant.

Here, I will explain a subset of predictive analytics known as predictive delivery, which is the application of predictive analytics to software development. This will soon become an indispensable part of software delivery tools.

Predictive delivery basics

Whenever you run a build of your software, you generate huge quantities of data. You’re constantly creating large numbers of files, whether you're running functional, performance, or security tests; reporting or fixing defects; or even running the build itself. These files are generally treated as temporary: they lie around until the storage fills up, then get deleted to make room for more. But they contain useful information. They are a record of what happened at the time they were created, and that record can be analyzed to turn raw data into knowledge.

Predictive analytics algorithms can read those files and look for patterns that the unaided human eye can’t detect. There are lots of examples of places where predictive analytics algorithms have become embedded. The automotive industry has embraced predictive analytics for maintenance, collision avoidance, and security, as well as autonomous vehicles. Marketers use it to answer questions about user behavior. The banking industry uses it for fraud detection and application screening. It's everywhere.

It's also in the software delivery lifecycle and will become more prevalent in the near future. The 2016-17 World Quality Report, by HPE, Capgemini, and Sogeti, foresees predictive analytics being leveraged to make decisions about quality, and for continuous monitoring of software, with 40% of respondents looking to use predictive analytics as an automation technique over the next 12 months.

Types of predictive algorithms 

At the heart of all forms of predictive analytics is the use of mathematical algorithms to identify and explain patterns in data, and the extrapolation of those patterns to make predictions about the future behavior of that data. Predictive analytics draws on several families of algorithms, including:

  • Regression algorithms: There are many different types of regression algorithms, all of which are designed to estimate a relationship between variables. They range from simple linear regression models, which estimate a straight line that fits the data, to nonlinear regression, which estimates a curve that fits the data. Multivariate regression models estimate several such relationships at once within the same dataset.
  • Time series analysis: Time series analysis looks for patterns, such as trends and recurring cycles, in data collected at successive points in time, in order to predict how that data might behave in the future.
  • Machine learning: Machine-learning algorithms examine the input data and attempt to find a relationship between it and the outcome you want to predict. As an algorithm is exposed to more data, it refines that relationship until it produces a good match. Machine-learning algorithms can be "supervised," in which case they are given labeled examples up front to tune the relationship, or they can be "unsupervised," in which case they attempt to figure out the relationship without any hints.
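
To make the simplest of these concrete, here is a minimal sketch of simple linear regression in Python, fitted by ordinary least squares. The function name `fit_line` and the sample numbers (story points versus days taken) are invented for illustration; a real tool would more likely rely on a statistics library and much more data.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical history: size of past tasks (story points) and days each took.
story_points = [1, 2, 3, 5, 8]
days_taken = [1.5, 2.0, 2.5, 3.5, 5.0]

slope, intercept = fit_line(story_points, days_taken)

# Extrapolate the fitted line to a task size not seen before.
predicted_days = slope * 13 + intercept
print(f"days = {slope:.2f} * points + {intercept:.2f}")
print(f"predicted days for a 13-point task: {predicted_days:.1f}")
```

The fitted line is only as good as the assumption that the relationship really is linear; if the data curves, a nonlinear model from the same family would be the better fit.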

The results of a predictive analytics algorithm are best presented with some context, to indicate what the data represents and the circumstances under which it was gathered. Interactive graphical formats are best, since they allow the user to play with the relationship between the data points and recalculate, and so refine the prediction for different circumstances.

There must be enough data to get meaningful results, and it’s imperative to review the results as they change due to new data points being processed. Predictive analytics must not be a one-time activity. Continuous recalculation is necessary if your business is to make the best decisions over time.

The parts of a predictive algorithm

A predictive model has two parts: the algorithm and the data that it processes. It's important to understand both in order to achieve the most accurate predictions.

Data

The term "big data" often refers to data sets characterized by some combination of high volume, velocity, and variety, commonly called "the 3 Vs." Big data analysis requires specialized techniques to extract patterns and information. In recent years, improvements in storage capabilities, processor speeds, parallel processing, and algorithm design have laid the groundwork for processing large quantities of data in a reasonable period of time. This can be seen in software development: frequent software builds (velocity) generate many different types of data, such as test results, log files, and production monitoring data (variety). And of course, the resulting data sets are going to be large (volume).

To get meaningful results, it’s important to ensure that the data is consistent. For example, if you want to analyze defect severity across multiple projects, each project must use the same values to rank severity. If one project uses "1" to indicate a critical defect and another project uses "5" for a critical defect, the accuracy of the results will be anybody’s guess, so the values must be normalized before they can be used.
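
As a rough illustration of that normalization step, the sketch below maps each project's own severity values onto one shared scale before any analysis. The project names, the mapping tables, and the choice of "1 = critical" as the shared scale are all assumptions made for the example.

```python
# Hypothetical per-project mappings from a project's raw severity value
# to a shared scale where 1 = critical and 5 = trivial.
SEVERITY_MAPS = {
    "project_a": {1: 1, 2: 2, 3: 3, 4: 4, 5: 5},  # already uses 1 = critical
    "project_b": {5: 1, 4: 2, 3: 3, 2: 4, 1: 5},  # uses 5 = critical, so invert
}


def normalize_severity(project, raw_value):
    """Translate a project's raw severity value to the shared scale."""
    try:
        return SEVERITY_MAPS[project][raw_value]
    except KeyError:
        # Fail loudly rather than guess: an unmapped value would skew results.
        raise ValueError(
            f"no severity mapping for {project!r} value {raw_value!r}"
        ) from None


# (project, raw severity) pairs as they might arrive from two defect trackers.
defects = [("project_a", 1), ("project_b", 5), ("project_b", 2)]
normalized = [normalize_severity(project, raw) for project, raw in defects]
print(normalized)
```

Rejecting unmapped values, instead of passing them through, is a deliberate choice here: a silently wrong severity is harder to spot later than an error at load time.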

Algorithm

The other main component is the algorithm. It's important to choose, or develop, the correct algorithm that will be used to process the data. Simplicity, or "parsimony" as researchers like to say, is critical because, as models grow in complexity, they become more sensitive to changes in the input data and can distort the prediction.

Typically, it is the data scientist who chooses the correct algorithm for the job. The data scientist understands the business and can make decisions based on the specific problem to be solved and the data available to solve it. In predictive delivery, the algorithms are tailored to the software development domain and typically let you refine the algorithm’s parameters as necessary.

Predictive delivery

Predictive analytics can be used to solve problems in the software development lifecycle. For years, developers have been producing data as they plan, build, test, and deploy their software, whether as part of an agile development environment or in a traditional waterfall organization. But until recently, as noted above, that data sat idle until it was deleted to make room for more.

Today, however, developers are in a position to ask and answer some very interesting questions based on that data, such as:

  • Will my team be able to meet all of its commitments?
  • Are we wasting time testing scenarios that aren't used?
  • Are we prioritizing our development effectively?

Questions such as these, and their answers, are in the domain of what we call predictive delivery.

The predictive delivery pipeline

The predictive delivery pipeline is the software delivery pipeline viewed through the lens of predictive analytics. Analysis can be applied at each stage of the pipeline to better understand what is likely to happen, and the results are combined to give you a continuously updated, end-to-end view of the likely outcome of your development process.

A team of developers has many tasks to complete. Predictive algorithms can use the team's past performance to forecast its future performance. As development progresses, you can refine your forecast and understand what changes to make in order to influence the content of the release. Here are three stages of the pipeline that can be analyzed.

Planning

By looking at the tasks that are assigned to a team, it is possible to understand how long it will take that team to implement features, based on the group's past performance. You can attempt to answer the following questions:

  • How long will it take to complete all of the tasks?
  • Will we meet the scheduled release date?
  • Which tasks should we remove if we're not going to meet the date?
  • Do we have the capacity to add more functionality?
  • What would be the effect of adding more people to the team, or removing people from the team?

Estimates of time, effort, and costs in software development are notoriously inaccurate. They're usually far too optimistic, leading teams to deliver functionality later than intended, deliver less functionality in order to meet the dates that were set in advance, or deliver low-quality work.

By applying predictive analytics to backlog planning, you can develop more accurate estimates. By combining the data for multiple teams, you can build a more accurate plan of the backlog for your release and set more realistic delivery goals.
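
One common way to turn past performance into a schedule forecast is a Monte Carlo simulation over historical sprint velocities. The sketch below is one such approach, not a description of any particular tool; the function names, the velocity history, and the backlog size are hypothetical.

```python
import math
import random


def forecast_sprints(backlog_points, past_velocities, trials=10_000, seed=0):
    """Simulate many possible futures by resampling past sprint velocities.

    Returns a sorted list with the number of sprints needed in each trial.
    """
    if backlog_points <= 0:
        raise ValueError("backlog_points must be positive")
    if not past_velocities or any(v <= 0 for v in past_velocities):
        raise ValueError("past_velocities must be non-empty and all positive")
    rng = random.Random(seed)  # fixed seed so repeated runs are comparable
    outcomes = []
    for _ in range(trials):
        remaining = backlog_points
        sprints = 0
        while remaining > 0:
            # Assume the next sprint looks like a randomly chosen past sprint.
            remaining -= rng.choice(past_velocities)
            sprints += 1
        outcomes.append(sprints)
    outcomes.sort()
    return outcomes


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted list."""
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]


# Hypothetical inputs: points completed in the last five sprints,
# and the size of the remaining release backlog.
velocities = [18, 22, 25, 20, 15]
outcomes = forecast_sprints(backlog_points=100, past_velocities=velocities)

print("sprints needed, 50% confidence:", percentile(outcomes, 50))
print("sprints needed, 85% confidence:", percentile(outcomes, 85))
```

Reporting a range of percentiles, in place of a single date, makes the uncertainty visible. Rerunning the simulation with a smaller backlog or a different velocity history is a simple way to explore the "what if" questions listed above.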

Development

As developers and testers produce and test software, they generate large quantities of data that can be used to answer questions such as:

  • How long will it take me to fix a certain set of defects?
  • Which tests should I run on my code?
  • How can I shorten the time it takes to find out if my code breaks the build?
  • What parts of my code are at risk of having defects?

Predictive techniques applied during development allow more efficient and faster throughput, while increasing the quality of the output.
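
As an example of the "which tests should I run" question, the following sketch ranks tests by how often they failed in past builds that touched the same files as the current change. It is a deliberately simple heuristic under assumed data; the build history format, file names, and test names are invented for illustration.

```python
from collections import defaultdict


def rank_tests(history, changed_files):
    """Rank tests by past failure rate in builds that touched changed_files.

    history is an iterable of (files_changed, tests_failed) pairs, one per
    past build. Returns (test_name, failure_rate) pairs, most likely first.
    """
    changed = set(changed_files)
    relevant_builds = 0
    failures = defaultdict(int)
    for files, failed in history:
        # Only builds that share at least one file with this change count.
        if changed & set(files):
            relevant_builds += 1
            for test in failed:
                failures[test] += 1
    if relevant_builds == 0:
        return []  # no history for these files, so no ranking is possible
    ranked = [(test, count / relevant_builds) for test, count in failures.items()]
    # Highest failure rate first; ties broken by name for a stable order.
    ranked.sort(key=lambda pair: (-pair[1], pair[0]))
    return ranked


# Hypothetical build history: files changed and tests that failed.
history = [
    ({"auth.py"}, {"test_login"}),
    ({"auth.py", "db.py"}, {"test_login", "test_query"}),
    ({"ui.py"}, {"test_render"}),
    ({"auth.py"}, set()),
]

for test, rate in rank_tests(history, ["auth.py"]):
    print(f"{test}: failed in {rate:.0%} of related builds")
```

Running the highest-ranked tests first is one way to shorten the time it takes to find out whether a change breaks the build. Tests that never failed in related builds are left out of the ranking, so they would still need to run at some later point.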

Operations

The ultimate goal of software development is to get working software into the hands of the customer or user as soon as possible, at a high level of quality. Predictive techniques offer insight into how software will be used in the field and help you reduce the likelihood of defects making their way to production. For example, you can get insight into questions such as:

  • What's the most likely usage scenario for a feature?
  • Am I investing the right amount of effort in certain functionality?
  • Am I getting an accurate status report of my deployed software today?
  • What should I expect to see tomorrow?
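
For the last question, a basic time series technique such as simple exponential smoothing gives a one-step-ahead estimate from recent observations. This is a minimal sketch; the function name, the daily error counts, and the smoothing factor are assumptions, and production monitoring tools generally use richer models.

```python
def exponential_smoothing_forecast(values, alpha=0.5):
    """One-step-ahead forecast using simple exponential smoothing.

    alpha close to 1 reacts quickly to recent values; alpha close to 0
    favors the longer-term level.
    """
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in the range (0, 1]")
    level = values[0]
    for value in values[1:]:
        # Blend the newest observation with the running level.
        level = alpha * value + (1 - alpha) * level
    return level


# Hypothetical production metric: errors logged per day, oldest first.
daily_errors = [12, 15, 11, 14, 13]
expected_tomorrow = exponential_smoothing_forecast(daily_errors, alpha=0.3)
print(f"expected errors tomorrow: {expected_tomorrow:.1f}")
```

Comparing each day's actual value against the previous day's forecast is a simple way to flag behavior that departs from the expected pattern.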

The future of predictive delivery

Predictive analytics is surprisingly prevalent in our lives. We interact with it on a daily basis, often without knowing it—when, for example, our car's collision-detection system is invoked, when we see targeted advertisements pop up as we browse the Internet, or when we apply for a new credit card. The same techniques are increasingly being applied to the software development lifecycle.

By introducing predictive analytics into the development pipeline, we can reduce waste and increase the velocity and quality of our software, saving time and resources, and keeping users happy. We’re just beginning to see predictive analytics in our software development and lifecycle management tools, but it is on its way to becoming a vital part of the way we create and deliver software.
