
Why Unstructured-Data Visibility Matters

Krishna Subramanian Co-founder, President and COO, Komprise

Most enterprises are flying blind with their unstructured data. They don't know what they have, who is using it, why it's growing so fast, or how to be more efficient in managing it.

IT leaders need insight into their unstructured data. Without it, they are hindered in their ability to cut significant costs on data storage. As it is, most enterprises are spending more than 30% of their IT budget on data storage, backups, and disaster recovery, according to a 2022 survey on unstructured-data management.

Beyond high spending—which can get higher if you don't optimize cloud-storage placement—there is also the question of monetizing data. Unstructured data too often holds significant untapped business value. Most organizations use but a small percentage of the data they produce and store. A recent Accenture study revealed that 68% of companies don't realize tangible and measurable value from their data.

Since unstructured data comprises the lion's share of all data in the world, you need to know what data you have, who needs access to it, how much of it is active, where it is stored, and its value to the organization. You need visibility.

Attaining this visibility isn't easy, of course; in our complex world of hybrid clouds, unstructured data is strewn across corporate and colocation data centers, edge systems, and various cloud services. Moving data into a central repository would be an expensive and likely impossible proposition because of the distributed nature of data and data creation in the modern world.

Since unstructured data (including images, video, and documents) can reach billions of files of various types and sizes, organizations need a systematic approach to analyzing and classifying it. Creating a searchable index of all the organization's data across silos—from on premises to edge to cloud—is an important first step toward visibility.

Getting Started with Fundamentals

You can address data-visibility issues in your organization by developing a plan and process to assess and track your unstructured data. There are several fundamentals about your data that you'll want to start tracking, including:

  • Volume of data in storage 
  • Growth rate of data over time 
  • Age of data 
  • Access patterns, such as time of last access 
  • Location of data 
  • File types and file sizes 
  • Top data owners and types of data they are storing 
  • Costs of data storage, backup, and disaster recovery today and in the future 
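As a concrete starting point, several of the fundamentals above (volume, file count, file types, and access recency) can be collected with a simple file-system walk. The following is a minimal sketch, not a production crawler; the function name `scan_fundamentals` and the returned field names are illustrative choices, not part of any standard tool.

```python
from collections import Counter
from pathlib import Path

def scan_fundamentals(root: str) -> dict:
    """Walk a directory tree and collect basic unstructured-data metrics:
    total volume, file count, counts per extension, and the last-access
    time of the least recently touched file (a rough proxy for data age).
    """
    total_bytes = 0
    file_count = 0
    ext_counts = Counter()
    oldest_access = None
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        st = path.stat()
        total_bytes += st.st_size
        file_count += 1
        # Group files that have no extension under a placeholder key.
        ext_counts[path.suffix.lower() or "<none>"] += 1
        if oldest_access is None or st.st_atime < oldest_access:
            oldest_access = st.st_atime
    return {
        "total_bytes": total_bytes,
        "file_count": file_count,
        "by_extension": dict(ext_counts),
        "oldest_access": oldest_access,
    }
```

In practice, an enterprise-scale index would run such scans incrementally across NAS shares, edge systems, and cloud buckets, and would also capture ownership and cost metadata that a plain directory walk cannot see.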

Here's why these data points are important:

Data-usage metrics: Without the ability to see which files/shares/directories are being used regularly and which haven't been touched for a year or more, it's hard to do anything other than keep all your data on your expensive, high-performing storage. If, however, you can see how much of your data is rarely accessed (or "cold"), then you can manage it at a much lower cost by migrating or tiering it to cheaper storage, such as cloud object storage (AWS S3 or Azure Blob, for instance). Additionally, in organizations with chargeback models in place, department managers need to know data-growth metrics and who the top data owners are so that those individuals are included in data-management conversations.
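The cold-data identification described above can be sketched in a few lines: flag files untouched for more than a year, then estimate what tiering them would save. The one-year threshold and the per-GB prices below are illustrative placeholders, not real storage quotes.

```python
import time
from pathlib import Path

COLD_THRESHOLD_SECONDS = 365 * 24 * 3600  # "cold" = untouched for a year

def find_cold_files(root: str, now=None):
    """Yield (path, size_in_bytes) for files whose last access time
    is older than the cold threshold."""
    now = time.time() if now is None else now
    for path in Path(root).rglob("*"):
        if path.is_file():
            st = path.stat()
            if now - st.st_atime > COLD_THRESHOLD_SECONDS:
                yield str(path), st.st_size

def estimate_tiering_savings(cold_bytes: int,
                             hot_cost_per_gb: float = 0.10,
                             cold_cost_per_gb: float = 0.01) -> float:
    """Estimated monthly savings from moving cold data to cheaper
    object storage. Prices are placeholder assumptions."""
    gb = cold_bytes / 1e9
    return gb * (hot_cost_per_gb - cold_cost_per_gb)
```

For example, at these assumed rates, tiering 100 GB of cold data would save roughly $9 per month; real savings depend on actual tier pricing, egress, and retrieval patterns.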

Sensitive data: Organizations sometimes need to delete data altogether for legal reasons—for instance, ex-employee data or ex-customer financial data. The ability to easily search customer and individual names connected to files delivers a huge advantage here. Granular search capabilities (such as by file extension or metadata) let the user locate intellectual property or financial data that might have been copied or moved to a location without appropriate security protections or access rules applied.
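The granular search described above—by name fragment plus file extension—can be sketched as follows. The function name and the default extension list are illustrative assumptions; a real tool would search a prebuilt index (and file contents), not walk the live file system on every query.

```python
from pathlib import Path

def find_sensitive(root: str, name_fragment: str,
                   extensions=(".xlsx", ".csv", ".pdf")):
    """Locate files whose name contains a given fragment (e.g., an
    ex-customer's name) and whose extension suggests financial or
    personal data. Case-insensitive on the file name."""
    frag = name_fragment.lower()
    return [str(p) for p in Path(root).rglob("*")
            if p.is_file()
            and frag in p.name.lower()
            and p.suffix.lower() in extensions]
```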

Financial metrics: As part of a data-operations (DataOps) and financial-operations (FinOps) strategy, IT leaders should understand the costs of storing data on current technologies and be able to project costs for moving to a different storage platform. From there, they can determine whether it would be cost-effective to, say, tier rarely accessed data to lower-cost cloud object storage.
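A basic version of this cost projection is straightforward arithmetic: apply a compound annual growth rate to current capacity and multiply by a platform's unit price. This is a minimal sketch; the figures in the example are hypothetical, and real projections must also account for backup copies, replication, and egress.

```python
def project_monthly_cost(current_tb: float,
                         annual_growth_rate: float,
                         cost_per_tb_month: float,
                         years: int) -> float:
    """Project the monthly storage bill after `years` of compound
    data growth at `annual_growth_rate` (e.g., 0.30 for 30%/year)."""
    future_tb = current_tb * (1 + annual_growth_rate) ** years
    return future_tb * cost_per_tb_month
```

For instance, 100 TB growing 30% per year at an assumed $20/TB-month would cost about $3,380 per month after two years; rerunning the same projection with a cheaper tier's price makes the comparison explicit.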

When armed with knowledge on their data assets, IT teams can set policies to transparently tier data to the most cost-effective storage based on datasets' use cases and priorities. With this empowerment, IT leaders can slash storage and data-management costs while accommodating rapid data growth.

Data Refinement

Once you get started on an unstructured-data assessment through indexing and analytics, consider further refinement. When you tag data with additional context, such as demographics, descriptive details (for instance, "image of eyes"), or project names, you open search parameters to help users and to make better data-management decisions. (Look for an unstructured-data-management solution that supports automated tagging by policy and can retain tags for data wherever it moves.)
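The key property called out above—tags that stay with the data wherever it moves—can be illustrated with a toy index. The class and method names here are invented for illustration and do not correspond to any particular product's API.

```python
from dataclasses import dataclass, field

@dataclass
class FileRecord:
    path: str
    tags: set = field(default_factory=set)

class TaggedIndex:
    """Toy index in which tags are attached to the record, not the
    location, so they survive migrations and tiering moves."""

    def __init__(self):
        self._by_path = {}

    def add(self, path: str, tags=()):
        self._by_path[path] = FileRecord(path, set(tags))

    def move(self, old_path: str, new_path: str):
        rec = self._by_path.pop(old_path)
        rec.path = new_path
        self._by_path[new_path] = rec  # tags travel with the record

    def search(self, tag: str):
        return [r.path for r in self._by_path.values() if tag in r.tags]
```

Usage: tag a file on the NAS, tier it to the cloud, and the tag-based search still finds it at its new location.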

Moreover, systematically classified, well-managed, easily searchable data is vital for fueling the latest generation of affordable, powerful artificial-intelligence (AI) and machine-learning (ML) applications. New AI/ML tools can jump-start an organization's innovation cycles, deliver noticeable productivity gains, and sharpen anomaly detection to dramatically reduce security and compliance risks.

As data becomes ever more central to business decisions, product development, and customer strategy, knowledge about that data is increasingly valuable to people across the organization. The CIO needs to understand high-level implications of cloud storage and data growth. Researchers want to know what data is available for future projects. Legal and security teams need to ensure data is protected and discoverable if needed for auditing or investigations.

Yet visibility alone isn't enough. To get ROI from unstructured-data management, this data knowledge must be integrated into workflow processes. It should be simple to move from insight to action—migrating, tiering, copying, and deleting data, along with ongoing data-lifecycle management—to meet user, application, and departmental needs. 
