The state of data masking: It's essential—so get it right

Rob Lemos, writer and analyst

In 2017, a breach of Equifax through an unpatched server exposed sensitive data, including Social Security numbers, on more than 143 million US citizens, more than half of all adult Americans.

The breach led to a tepid settlement of up to $700 million, much debate over the future of SSNs as a de facto national identifier, and companies reevaluating how to protect information in a usable way. One solution, data masking (a technique for hiding sensitive data from unauthorized users), has taken off and evolved since then. Data masking has morphed from simple replacement of sensitive data to dynamic masking and tokenization techniques that allow companies to preserve much of the usefulness of data while protecting information from attackers.

When done correctly and paired with the right data-security policies, data masking can allow protected data to still be functional, said Goutham Belliappa, vice president of AI engineering for Capgemini Americas, an IT services firm.

"When data is masked correctly and all personally identifiable information is removed, leaked data may not count as a breach—as there should be no way to triangulate the leaked data to a natural person."
Goutham Belliappa

The move to data masking is important because it gives companies a choice. In the past, companies would have to either delete data and lose its value, encrypt data and make its value harder to extract, or attempt to protect data and be at risk of a data breach.

Because encryption has historically been hard to manage, companies often ended up rolling the dice, said Reiner Kappenberger, director of data-security products for Micro Focus.

"Some companies feel like they want to keep data forever, because data might have value to the company at some point, and they don't want to delete it, because they don't know when that value will be achieved. So they end up accepting a higher operational risk at that point."
Reiner Kappenberger

Data masking can help solve problems with data retention, but companies need to educate themselves. Here are four key issues you need to understand.

1. Data masking covers a lot of ground

The range of technologies covered by the term data masking has expanded. Originally, data masking often just referred to hiding sensitive data from prying eyes: sending "(000) 000-0000" instead of a phone number, for example. Increasingly, data masking also encompasses functional anonymization and pseudonymization, producing tokens that stand in for sensitive data values, cannot be reversed, and still allow some analysis. Data masking makes use of techniques such as format-preserving encryption (FPE) and stateless tokenization.

These techniques are practical, unlike calculation-intensive techniques such as homomorphic encryption, which can cause the size of the data to grow by at least two orders of magnitude and the cost of computation to explode.
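As a rough illustration of the gap between plain redaction and a token that keeps some analytical value, here is a minimal Python sketch. The function names and the demo key are hypothetical. The second function is a one-way keyed hash mapped onto digits; it is not format-preserving encryption such as NIST FF1 and not a commercial stateless tokenization scheme. It only shows the general idea of a deterministic token that keeps the original format.

```python
import hashlib
import hmac


def redact_phone(phone: str) -> str:
    """Classic masking: replace every digit with 0 and keep the punctuation."""
    return "".join("0" if ch.isdigit() else ch for ch in phone)


def pseudonymize_digits(value: str, key: bytes) -> str:
    """Replace the digits in `value` with digits derived from a keyed hash.

    The same input and key always give the same token, so joins and counts
    on the masked column still work. The mapping is one-way: there is no
    decrypt step. Intended for short identifiers (a few dozen digits at most).
    """
    digits = "".join(ch for ch in value if ch.isdigit())
    digest = hmac.new(key, digits.encode(), hashlib.sha256).digest()
    # Reduce the digest to exactly len(digits) decimal digits.
    number = int.from_bytes(digest, "big") % (10 ** len(digits))
    stream = iter(str(number).zfill(len(digits)))
    return "".join(next(stream) if ch.isdigit() else ch for ch in value)


if __name__ == "__main__":
    key = b"demo-key-not-for-production"  # assumption: real keys live in a key manager
    print(redact_phone("(555) 867-5309"))  # (000) 000-0000
    print(pseudonymize_digits("(555) 867-5309", key))
```

The redacted value is safe but useless for analysis, since every phone number looks the same. The keyed token still distinguishes one customer from another without revealing the number.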

"Data masking is one of the less sophisticated techniques of ensuring data security. Masking or other privacy techniques need to be robust enough to survive persistent attacks from bad actors that now have budget and power."
—Goutham Belliappa

2. Not all data masking is the same

Static data masking keeps a second, masked copy of the database that serves records to ordinary users, while privileged users still have access to the original data. Dynamic data masking programmatically changes the data when it is requested, using a variety of techniques that can preserve some of the usefulness of the data. Tokens may be used in place of credit-card numbers, for example, allowing one-time transactions and preventing the leakage of account information.
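The dynamic approach can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the `TokenVault` class, the role names, and the record layout are hypothetical, and the vault is a simple in-memory lookup with reusable tokens rather than the one-time tokens a payment processor would issue.

```python
import secrets


class TokenVault:
    """Hypothetical in-memory vault that swaps card numbers for random tokens."""

    def __init__(self):
        self._by_token = {}
        self._by_value = {}

    def tokenize(self, card_number: str) -> str:
        # Reuse the existing token so the same card always maps to one token.
        if card_number not in self._by_value:
            token = "tok_" + secrets.token_hex(8)
            self._by_value[card_number] = token
            self._by_token[token] = card_number
        return self._by_value[card_number]

    def detokenize(self, token: str) -> str:
        return self._by_token[token]


def mask_card(card_number: str) -> str:
    """Show only the last four digits."""
    return "*" * (len(card_number) - 4) + card_number[-4:]


def read_record(record: dict, role: str, vault: TokenVault) -> dict:
    """Dynamic masking: decide at request time what this caller may see."""
    view = dict(record)  # never modify the stored record
    if role == "privileged":
        return view
    if role == "payments":
        view["card"] = vault.tokenize(record["card"])
    else:
        view["card"] = mask_card(record["card"])
    return view


if __name__ == "__main__":
    vault = TokenVault()
    record = {"name": "A. Customer", "card": "4111111111111111"}
    print(read_record(record, "analyst", vault))   # card shown as ************1111
    print(read_record(record, "payments", vault))  # card shown as a random token
```

A static setup would instead run `mask_card` once over a copy of the table and point ordinary users at that copy.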

While such techniques provide strong capabilities, they are not universal, said Belliappa. Many database products offer data-masking features, but more often than not, the features essentially allow the literal "masking" of data to hide the information. Other dynamic data masking uses machine-learning algorithms to change the data as it is delivered, but poor training can affect the eventual result.

"However, most techniques of data masking are done poorly because the testing or training on this data does not represent the real world. This results in issues when the trained model or code is released into the wild."
—Goutham Belliappa

3. Good data masking should satisfy current regulations

The ultimate goal of data masking is to avoid the need for notification when a breach happens. In its 2021 Guidelines on Examples Regarding Data Breach Notification, the European Data Protection Board confirmed that a company that loses or leaks encrypted data but still retains access to the data does not need to notify customers of a breach.

To take advantage of such legal protections, however, companies should make sure that they are taking the proper steps and using the right technologies.

In a recent column for the International Association of Privacy Professionals, three privacy lawyers wrote:

"If we’re realistic about anonymization ... the best we can hope for is getting the risks of re-identification low enough to be reasonable or functionally anonymized. Here, the concept of 'functional anonymization' means that the data is sufficiently anonymized to pose little risk given the broader controls imposed on that data."
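One common way to put a number on re-identification risk is to count how many records share the same combination of indirectly identifying fields. A group of one means a person can be singled out. The Python sketch below is an assumption made for illustration, not something the column prescribes; the field names and records are invented, and the function expects at least one record.

```python
from collections import Counter


def smallest_group(records, quasi_identifiers):
    """Size of the smallest group of records sharing the same values
    for the given quasi-identifier fields."""
    groups = Counter(
        tuple(record[field] for field in quasi_identifiers) for record in records
    )
    return min(groups.values())


if __name__ == "__main__":
    # Invented, already-generalized records (truncated ZIP code, age band).
    records = [
        {"zip": "021**", "age_band": "30-39", "diagnosis": "A"},
        {"zip": "021**", "age_band": "30-39", "diagnosis": "B"},
        {"zip": "021**", "age_band": "40-49", "diagnosis": "A"},
    ]
    print(smallest_group(records, ["zip"]))              # 3
    print(smallest_group(records, ["zip", "age_band"]))  # 1: one person stands out
```

Whether a given group size is "low enough to be reasonable" depends on the broader controls around the data, which is the point the lawyers make.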

4. You need to make sure you can still use your data

Finally, companies need to ensure that masked data will still serve the needs of its users. Employees who are not satisfied with data-masking technology may try to do an end run around the technology. The masked data needs to give the same results as real-world data and not be biased or skewed as a result of the masking, said Capgemini's Belliappa.

"Can you guarantee real-world characteristics of masked data? The data still needs to be valuable, even when it’s masked [or] secure."
—Goutham Belliappa
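A simple way to test whether masking has skewed the data is to compare summary statistics before and after. The Python sketch below uses two hypothetical masking functions and an invented salary column. It assumes the original column has a non-zero mean and spread. Shuffling is shown only because it keeps a column's distribution intact; on its own it is weak protection and it breaks relationships between columns.

```python
import random
import statistics


def shuffle_mask(values, seed=0):
    """Permute a column so values no longer line up with their rows."""
    rng = random.Random(seed)
    masked = list(values)
    rng.shuffle(masked)
    return masked


def constant_mask(values):
    """Crude masking: replace every value with the same placeholder."""
    return [0 for _ in values]


def drift(original, masked):
    """Relative change in mean and standard deviation caused by masking."""
    mean_o, mean_m = statistics.mean(original), statistics.mean(masked)
    sd_o, sd_m = statistics.pstdev(original), statistics.pstdev(masked)
    return {
        "mean": abs(mean_m - mean_o) / abs(mean_o),
        "stdev": abs(sd_m - sd_o) / sd_o,
    }


if __name__ == "__main__":
    salaries = [52000, 61000, 58000, 75000, 49000, 88000]  # invented data
    print(drift(salaries, shuffle_mask(salaries)))   # no drift in either statistic
    print(drift(salaries, constant_mask(salaries)))  # both statistics are wiped out
```

A real check would cover more than two statistics, including correlations between columns and the results of the actual reports or models that consume the masked data.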

Take a stand

Companies should start taking a harder stance on their data: if they don't have a use for it, they should delete it. In all other cases, masking the data will help companies escape the worst of a breach.

"The data privacy regulations are trying to tell companies that if they don't have a real reason for keeping the data, then just remove it, because when a breach happens, the fines get larger. There is a big ROI on the cyber risk side for data masking."
—Reiner Kappenberger
