Why data-centric security is essential for legacy COBOL systems

Luther Martin Distinguished Technologist, Micro Focus

Much of the software now in use worldwide was written in COBOL. Because a shortage of experienced COBOL programmers makes that code harder to maintain, the security vulnerabilities that inevitably exist in it can be a serious problem.

There are probably over 220 billion lines of COBOL that go into the software that runs globally today. That represents about 80% of the world's entire base of actively used code. That many lines of code almost certainly contain lots of security vulnerabilities.

Roughly how many should we expect there to be in 220 billion lines of code (BLOC)? Research has suggested that high-quality commercial software has roughly 0.5 defects per 1,000 lines of code (KLOC). 

More typical software has closer to 0.7 defects per KLOC, and this is roughly the same for open-source software. If you're testing pre-release software, this value could be much higher. (Be sure that you understand that particular risk before using pre-release software.) 

The same research also suggests that about 25% of these defects cause a serious security vulnerability. Even at the more optimistic rate of 0.5 defects per KLOC, that means the 220 billion lines of COBOL in use today should contain about 27.5 million serious security vulnerabilities, which we can calculate like this:

220 BLOC x (1,000,000 KLOC / BLOC) x (0.5 defects / KLOC) x (1 serious security vulnerability / 4 defects) = 27.5 million serious security vulnerabilities 
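The same arithmetic can be sketched in a few lines of Python. This is only an illustration of the estimate above; the function and constant names are made up for this example, and the inputs are the article's rough figures rather than measured values.

```python
# Rough estimate of serious vulnerabilities, using the article's figures.
# All names here are illustrative; the rates are rough published estimates.
LINES_OF_COBOL = 220_000_000_000   # about 220 billion lines
LINES_PER_KLOC = 1_000
SERIOUS_FRACTION = 0.25            # about 25% of defects are serious


def estimate_serious_vulnerabilities(lines: int,
                                     defects_per_kloc: float,
                                     serious_fraction: float) -> float:
    """Return the expected number of serious security vulnerabilities."""
    kloc = lines / LINES_PER_KLOC
    return kloc * defects_per_kloc * serious_fraction


# High-quality commercial software: roughly 0.5 defects per KLOC.
optimistic = estimate_serious_vulnerabilities(LINES_OF_COBOL, 0.5, SERIOUS_FRACTION)
print(f"{optimistic:,.0f}")  # 27,500,000
```

Plugging in the more typical rate of 0.7 defects per KLOC instead raises the estimate to about 38.5 million, so 27.5 million is the optimistic end of the range.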

That’s a lot of serious security vulnerabilities, so many that it’s reasonable to assume that most of them will never be addressed. This means that the legacy COBOL systems that run the world will probably always have lots of serious security vulnerabilities despite our best efforts to find and fix them.   

Note that this situation isn’t unique to COBOL. There are also many serious security vulnerabilities in software that is written in languages other than COBOL, of course, but the ever-decreasing supply of COBOL programmers who can maintain the huge existing base of COBOL code makes this situation particularly perilous.  

It's been known for a while that we may simply have to live with systems with lots of security vulnerabilities. One researcher has even suggested that we should simply accept this unfortunate fact and design systems that are reasonably secure, even with such constraints. 

If we have to accept vulnerabilities in the COBOL systems that run the world, what's a good way to design reasonably secure systems? Here's why data-centric security is critical for such legacy code.

Data-centric security can be your friend 

One approach is to separate the security of sensitive data from the software that processes it. This approach is often called "data-centric security" because of its focus on protecting the data itself instead of the applications that process it.

Replacing sensitive data with a value that is generated by either an encryption or a tokenization algorithm is an easy way to do this. (For the sake of clarity, in the following, when "encryption" is mentioned, it means both encryption and tokenization.) 

Then, if hackers manage to compromise the security of a database, for example, all they will be able to get is valueless data (meaningless ciphertext) instead of its sensitive and valuable counterpart. 
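As a minimal sketch of the idea, the following Python shows a toy in-memory tokenization vault. The `TokenVault` class and its method names are hypothetical, invented for this illustration; a real deployment would keep the mapping in a hardened, access-controlled service rather than in a dictionary.

```python
import secrets


class TokenVault:
    """Toy tokenization vault (hypothetical; for illustration only)."""

    def __init__(self) -> None:
        self._token_to_value: dict[str, str] = {}
        self._value_to_token: dict[str, str] = {}

    def tokenize(self, value: str) -> str:
        """Replace a digit string with a random digit string of equal length."""
        if not (value.isascii() and value.isdigit()):
            raise ValueError("this sketch only handles strings of decimal digits")
        if value in self._value_to_token:
            return self._value_to_token[value]
        while True:
            token = "".join(secrets.choice("0123456789") for _ in value)
            # Retry on the rare collision with an existing token or the value itself.
            if token not in self._token_to_value and token != value:
                break
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenize(self, token: str) -> str:
        """Recover the original value; only the vault can do this."""
        return self._token_to_value[token]


vault = TokenVault()
token = vault.tokenize("1234567812345678")
print(token)                    # a random 16-digit string, safe to store
print(vault.detokenize(token))  # 1234567812345678
```

The application database stores only `token`. Someone who steals that database gets random digits, and the sensitive value stays inside the vault.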

And since it's virtually impossible for even the cleverest hacker or the best-funded nation-state to crack strong encryption, the chance that any of these potential troublemakers could recover the sensitive data from ciphertext protected by data-centric security is remote at best.

It might be theoretically possible, just as it's theoretically possible to count all the grains of sand on the Earth. And just as practical difficulties make it impossible to count those grains of sand, realistic constraints also make it impossible to crack encryption.  

Just as cryptocurrency mining is limited by the amount of power that miners are willing to spend, the amount of power available limits the ability of hackers to beat encryption. Doing this in a reasonable amount of time takes not just the output of several power plants but billions of times the energy that the Earth receives from the sun each year. 

So what is the best way to add data-centric security to COBOL systems? It can be difficult and expensive to modify them due to the age of the codebase and the limited availability of skilled programmers. Thus, to add data-centric security to such systems, it can be very useful to encrypt in a way that lets existing systems easily handle protected values with minimal modifications.  

Preserve the format of data 

Integration is easiest when encryption preserves the format of the original values. If you want to protect a 16-digit credit card number, for example, integration is relatively easy when the corresponding ciphertext is also a 16-digit value. On the other hand, if you use types of encryption that do not preserve the format of the data, it can be difficult and expensive to integrate encryption into complex systems.

One widely used encryption algorithm, for example, is the CBC (cipher block chaining) mode of the AES algorithm. If this form of encryption is used to encrypt a 16-digit value such as 1234567812345678 and the result is Base64-encoded, we might get a value such as RTxZIdIP2B/akV5QkdBCivTwZWrlZ0l4+DynqYj/Hp4= for our ciphertext.

This value is no longer simply decimal digits and is longer than the original value. Either of these changes can cause errors in systems that were designed to process only 16-digit values. Keeping the format of the ciphertext the same as the plaintext ensures that neither of these errors happens.  
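The mismatch is easy to check. The short Python sketch below takes the two values quoted above and tests them against a hypothetical 16-digit field rule, roughly what a COBOL `PIC 9(16)` item would expect. The helper name `fits_16_digit_field` is an assumption made for this example.

```python
import base64

plaintext = "1234567812345678"
ciphertext = "RTxZIdIP2B/akV5QkdBCivTwZWrlZ0l4+DynqYj/Hp4="  # value quoted above


def fits_16_digit_field(value: str) -> bool:
    """Hypothetical field check: exactly 16 ASCII decimal digits."""
    return len(value) == 16 and value.isascii() and value.isdigit()


raw = base64.b64decode(ciphertext)

print(fits_16_digit_field(plaintext))   # True
print(fits_16_digit_field(ciphertext))  # False
print(len(ciphertext), len(raw))        # 44 32
```

The Base64 text is 44 characters and decodes to 32 bytes, so it is both longer than the original field and contains characters that a digits-only field would reject.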

Cheaper, easier—and safer

It is generally cheaper and easier to adapt the format of the data to an existing system than to modify an existing system to handle data with a different format. Adapt the data to the network, not the network to the data. And this technique will even work for systems that aren't implemented in COBOL.

To find the biggest savings possible from data-centric security, look at those 220 billion lines of COBOL first. And look at technologies such as format-preserving encryption (FPE) or format-preserving tokenization (FPT) when you do this. 
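To make the idea of format preservation concrete, here is a toy sketch in Python. It is not FF1 or any other standardized FPE mode, it has not been analyzed for security, and it should not be used to protect real data. It is a simple Feistel construction over two 8-digit halves with HMAC-SHA256 as the round function; the function names, key and round count are all assumptions made for this illustration.

```python
import hashlib
import hmac

HALF = 8            # digits per half of a 16-digit value
MOD = 10 ** HALF
ROUNDS = 10         # arbitrary choice for this toy example


def _round_value(key: bytes, round_no: int, half: int) -> int:
    """Pseudorandom number in [0, MOD) derived from the key, round and one half."""
    msg = f"{round_no}:{half:0{HALF}d}".encode()
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % MOD


def _split(digits: str) -> tuple[int, int]:
    if len(digits) != 2 * HALF or not (digits.isascii() and digits.isdigit()):
        raise ValueError("expected exactly 16 decimal digits")
    return int(digits[:HALF]), int(digits[HALF:])


def toy_fpe_encrypt(key: bytes, digits: str) -> str:
    """Map 16 digits to 16 digits (toy Feistel network, NOT a standard)."""
    left, right = _split(digits)
    for r in range(ROUNDS):
        left, right = right, (left + _round_value(key, r, right)) % MOD
    return f"{left:0{HALF}d}{right:0{HALF}d}"


def toy_fpe_decrypt(key: bytes, digits: str) -> str:
    """Invert toy_fpe_encrypt by running the rounds backwards."""
    left, right = _split(digits)
    for r in reversed(range(ROUNDS)):
        left, right = (right - _round_value(key, r, left)) % MOD, left
    return f"{left:0{HALF}d}{right:0{HALF}d}"


key = b"demo key - not a real secret"
protected = toy_fpe_encrypt(key, "1234567812345678")
print(protected)                        # another 16-digit string
print(toy_fpe_decrypt(key, protected))  # 1234567812345678
```

Because the protected value is still 16 decimal digits, it fits the same field, record layout and validation rules as the original, which is what lets legacy code handle it with minimal changes. A real project would use a vetted implementation of a standardized algorithm instead of this sketch.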

Those 27.5 million serious security vulnerabilities won't let hackers get your sensitive data if it’s protected with data-centric security, and FPE and FPT are among the cheapest and easiest ways to implement this. 
