8 Phases of an AI-First Data Strategy

To successfully deliver innovative customer experiences and better business outcomes enabled by artificial intelligence (AI), organizations need to shift to an AI-first data strategy. This means designing data models with an AI-first mindset rather than trying to retrofit old models, according to a recent Forrester Research report.

"The more that you can understand about your business and your customers, the more that you can attach and connect those insights immediately to the way that you engage and create experiences and value—and automate the way that you execute your business," said Michele Goetz, vice president and principal analyst at Forrester. "The data element becomes increasingly important because your intelligence is only as good as your data."

Goetz explained that if organizations don't address the data, don't strategize how they source and learn from that information, or don't apply data and data strategy to the business, their AI strategy amounts to information stuck on a dashboard.

What Is an AI-First Data Mindset/Strategy?

An AI-first data mindset first requires understanding the problems that companies want to solve with AI and then identifying the data that will help them do that, said Rameshwar Singh, vice president of technology at Thrillworks.

"The strategy of an AI-first data mindset is to build the foundation of quality data objects that can feed into the AI models and provide solutions to your business problems," said Singh. "AI has specific use cases where it is more applicable. To feed input to those use cases you need to structure data in a more meaningful way."

Enterprises typically maintain massive amounts of unstructured data in chats, emails, reports, articles, and recorded conversations. An enterprise that adopts an AI-first data strategy will think about how to reliably extract information from unstructured data.

"Instead of storing that information in tables where the types of data, labels, and categories must be predicted by a developer, it can be mined in its raw, unstructured form," said Robb Wilson, author of Age of Invisible Machines and the founder, lead designer, and chief technologist at OneReach.ai, a conversational-AI platform, "eliminating the need for complex schematics and database architects."

According to Forrester's report, these are the eight phases of an AI-first data strategy:

1. Discover and Source Data to Represent the Business in Data

To move forward with AI and machine learning (ML), data-management professionals must first understand what internal and external data is available to them, Goetz said.

"They have to have a strategic mindset about it," said Goetz, "because what they're trying to do is recreate their business, recreate their customers, [and] recreate their value and their outcomes through the lens of that data."

From there, data-management teams should continue to define and redefine the data for new enterprise use cases.

"This step focuses on gathering high-quality representative data that’s relevant to the use case and outcome as part of ML development and eventual deployment of AI," reads the report. "New data and data types (text, voice, image, audio, video) should augment and improve ML models as data becomes more representative of the environment where AI is deployed."

2. Capture and Ingest Data for Quality and Relevance

If they're going to meet expectations, AI solutions need relevant, trusted, and continually fresh data and types of data (e.g., structured or semi-structured content), according to the report.

Once the data engineers know where the data is, they have to figure out how to bring it into their systems as well as ensure that it's of high quality and that it's ready to use, Goetz said. They also need to understand and apply whatever data-governance policies are necessary to address various jurisdictional compliance needs.

Goetz explained that this ensures that they're using data responsibly and appropriately to get the right insights to make better business decisions.
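As a rough illustration of what "high quality" and "continually fresh" can mean at ingest time, here is a minimal Python sketch of a validation gate. It is not taken from the Forrester report; the field names, the one-day freshness window, and the function names are all hypothetical choices made for this example.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical schema and freshness window, chosen only for illustration.
REQUIRED_FIELDS = ("customer_id", "event_type", "recorded_at")
MAX_AGE = timedelta(days=1)


def validate_record(record, now):
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    for field in REQUIRED_FIELDS:
        if record.get(field) in (None, ""):
            problems.append(f"missing {field}")
    recorded_at = record.get("recorded_at")
    if isinstance(recorded_at, datetime) and now - recorded_at > MAX_AGE:
        problems.append("stale")
    return problems


def split_batch(records, now):
    """Separate a batch into accepted records and (record, problems) rejects."""
    accepted, rejected = [], []
    for record in records:
        problems = validate_record(record, now)
        if problems:
            rejected.append((record, problems))
        else:
            accepted.append(record)
    return accepted, rejected


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    batch = [
        {"customer_id": "c1", "event_type": "chat", "recorded_at": now},
        {"customer_id": "", "event_type": "email", "recorded_at": now},
    ]
    accepted, rejected = split_batch(batch, now)
    print(len(accepted), "accepted;", len(rejected), "rejected")
```

Keeping rejected records alongside the reasons they failed, rather than silently dropping them, gives governance teams something to audit.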

3. Curate and Model Data for Better Context

To maximize AI's efficacy and make data-analytics decisions, data engineers must use both internal and external data. According to Goetz, this means having to continuously classify, label, and certify that data to understand and govern it for self-service use.

"They have to really think about how they're instrumenting the ability to consume information—and doing that through [not only] technical but also nontechnical, no-code/low-code means," said Goetz. "[This will] ensure that they're building the right features and doing the right feature engineering."

In its report, Forrester urges using automated ML (AutoML) tools for labeling and modeling data to accomplish this. Using AutoML tools, companies can uncover valuable new business insights, embed advanced AI functionality in their applications, and allow data scientists as well as nontechnical experts to rapidly build predictive models.

4. Transform and Prepare Data for Increased Relevance

"Transforming and preparing data for ML and operational systems requires sharing, transparency, and traceability to reconcile transformations, whether in the data flow or in the model itself," reads the report.

To this end, data scientists and data engineers must work together to transform and prepare data to streamline its flow into the AI system.

"As we're ingesting and bringing forward the data, we're transforming it from the different sources and preparing it so that it is ready and available for analytics purposes . . . to help extract new insights," said Goetz.

5. Test and Train the Model to Engender Trust

A holistic approach to AI testing can help ensure effective and ethical data use in AI models, according to Forrester.

Goetz describes this "test and train" approach as a data- and analytics-oriented view across the entire software development lifecycle of AI solutions and applications.

"This observability of the information . . . assures the quality and compliance of those models so you can see how these models and AI capabilities are going to run over time within your environment," she said. "But you have the solution testing that has to work with that, and you need to ensure that your solution is going to work and run appropriately in your ecosystem."

Goetz added that the other aspect of "test and train" is AI governance. As such, "test and train" can help organizations determine for themselves:

If they are adhering to ethics
If they are being responsible
If their models are running sustainably
If their models achieve the right performance and outcomes as expected

"Maintaining the data governance oversight [enables] you to understand and interpret if data does change, or [if] it's not in compliance anymore," said Goetz. "[T]hat's going to have a downstream effect on the effectiveness of your models."

6. Deliver and Deploy Data for Scale

"A persistent bottleneck for AI is the ability to release models at scale and deploy AI applications at that scale," reads the report. "While data engineers, scientists, and stewards can build asynchronously, scaling out the release, publishing, and sharing of data and analytics products requires a coordinated and orchestrated approach to launch AI applications."

Goetz explained that this means bringing together developers, engineers, data scientists, and other contributors and stakeholders to work together "in a matrix organization."

"Coordination and collaboration are important to maintain consistency in the solutions," said Goetz. "[So too with] a set of standards in terms of the protocols, data, development capabilities, and models that you're using."

7. Execute and Act Dynamically to Drive Outcomes

If an AI solution is going to work well, everything surrounding it has to work well too. This means ensuring that it's running appropriately within an organization's environment to drive outcomes, Goetz explained.

"This [phase] tends to be a little more technically oriented; you're making sure that everything's running with your service-level agreements (SLAs)," said Goetz. "ModelOps applies here too because you're constantly watching the performance of that model as it's happening and executing in that environment."

From there, explains the report, "data and ML must have the ability to continuously develop, integrate, and deploy new intelligence capabilities into AI applications and adjacent use cases."

8. Observe and Evaluate for Refinement and Ongoing Governance

But how do you know whether your AI solution is working well? Just as you would with any other form of risk—through governance and monitoring.

"Data governance is no longer a program for regulatory and security capabilities only," advises the report. "Data observability tools can uncover anomalies in the AI pipeline so data scientists can interpret them and retrain or enhance their models."

This allows data scientists to both avoid AI bias and correct data drift (i.e., change in the model input data that can cause the model to perform poorly).

"[You] can make decisions about what to do next, how to optimize the models, and decide whether to take a model out of production and replace it," Goetz explained of data observability. "So whether you're in data science, engineering, application development, or even the governance team, you have a window into what's working and what's not."
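To make the idea of data drift concrete, here is a minimal Python sketch of one common way to measure it: the population stability index (PSI), which compares how a model input is distributed in a reference sample versus a recent one. This is one technique among several and is not prescribed by the report. The function name, the equal-width binning, and the sample data are assumptions made for this example, and the 0.1 and 0.2 cutoffs mentioned afterward are a widely used rule of thumb rather than a standard.

```python
import math
import random


def population_stability_index(reference, current, bins=10, eps=1e-6):
    """Compare two samples of one numeric feature; larger values mean more drift."""
    lo, hi = min(reference), max(reference)
    # Equal-width bins over the reference range; fall back to 1.0 if the range is zero.
    width = (hi - lo) / bins or 1.0

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clamp out-of-range values into edge bins
            counts[idx] += 1
        # Floor each proportion at eps so the logarithm below is always defined.
        return [max(c / len(values), eps) for c in counts]

    ref_p = proportions(reference)
    cur_p = proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))


if __name__ == "__main__":
    rng = random.Random(0)
    training = [rng.gauss(0.0, 1.0) for _ in range(5000)]    # data the model was trained on
    production = [rng.gauss(1.0, 1.0) for _ in range(5000)]  # recent inputs, mean shifted
    print(round(population_stability_index(training, production), 3))
```

Under the rule of thumb, a PSI below about 0.1 suggests little change, while a value above about 0.2 suggests enough drift to investigate or retrain. A team would typically run a check like this per feature on a schedule and surface the results where both data scientists and governance staff can see them.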
