COBOL and GDPR: Why format-preserving data protection is key

Phil Smith III, Senior Product Manager and Architect, Mainframe and Enterprise, CyberRes

COBOL is more than 60 years old, and few new computer science graduates have ever written a single line of COBOL code. Yet it remains one of the most commonly used programming languages, especially in large enterprises.

The value of the business logic encapsulated in the billions of lines of COBOL code in existence is staggering, and explains its longevity: It may not be the hot, trendy language, but nobody wants to rewrite millions of lines of code that’s currently working fine.

Yet as security issues—breaches, internal threats, and more—continue to get the attention of CISOs and their kin, COBOL applications often do get targeted for updating to add some form of data protection. The rise of data protection standards such as GDPR, CCPA, and 23 NYCRR 500—not to mention a US federal standard rumored to be in the works—means that data security is becoming important even for enterprises that are not concerned with bad actors.

Many businesses try to put this off by implementing whole-disk or filesystem-level encryption, but it has become clear that while these approaches add some value, they fall far short of the level of protection provided by persistent, data-centric protection.

Here's why format-preserving data protection is better for your COBOL applications—especially in the age of privacy laws.

Format-preserving data protection is a path to salvation

COBOL is a strongly typed language, providing limited automatic conversion among data types. This is mostly a good thing, but it makes adding traditional forms of data protection highly disruptive. For example, most modes of AES turn a text or numeric field into binary data and often lengthen it (a nine-digit field encrypted with AES-CBC and standard padding becomes at least 16 bytes of binary data), requiring extensive analysis and rewriting of code.

And because COBOL programmers are mostly older and thus expensive (not to mention often retiring and thus unavailable), COBOL applications are often the last to receive the attention they need. This is understandable but a big risk, since those applications typically comprise the core of an enterprise’s critical data processing, with extensive access to the data crown jewels.

Adding to the difficulty is the fact that nobody does upward compatibility like IBM: COBOL code written decades ago will usually run fine on current operating system versions and runtime libraries. Thus, many COBOL applications are not recompiled that frequently, further reducing staff familiarity with the code and increasing the cost of updates.

With format-preserving data protection, data can be securely protected without changing its type or size. Thus, a name field of 40 printable characters is still 40 printable characters once protected. This eliminates the need to change data structure definitions, processing verbs, or dataset allocations.
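
To make the idea concrete, here is a minimal Python sketch of a format-preserving transform (Python is used only for brevity). It is a toy Feistel construction written for this illustration, not NIST FF1 and not the algorithm of any particular product. The names `protect`, `deprotect`, and `NAME_ALPHABET`, the key, and the sample name are all assumptions. It shows only the property described above: the output has the same length as the input and draws from the same character set.

```python
import hashlib
import hmac
import string

# Hypothetical alphabet for a printable name field: letters, space, a few marks.
NAME_ALPHABET = string.ascii_uppercase + string.ascii_lowercase + " .'-"


def _to_int(text, alphabet):
    """Read text as a number written in base len(alphabet)."""
    value = 0
    for ch in text:
        value = value * len(alphabet) + alphabet.index(ch)
    return value


def _to_text(value, length, alphabet):
    """Write value as exactly `length` symbols of the alphabet."""
    symbols = []
    for _ in range(length):
        value, digit = divmod(value, len(alphabet))
        symbols.append(alphabet[digit])
    return "".join(reversed(symbols))


def _round_value(key, round_no, half):
    """Keyed pseudo-random integer derived from the round number and one half."""
    msg = bytes([round_no]) + half.encode("utf-8")
    return int.from_bytes(hmac.new(key, msg, hashlib.sha256).digest(), "big")


def protect(text, key, alphabet, rounds=10):
    """Toy Feistel transform: same length, same alphabet. Illustration only.

    `rounds` must be even so the two halves line up again for deprotect().
    """
    u = len(text) // 2
    v = len(text) - u
    a, b = text[:u], text[u:]
    for i in range(rounds):
        m = u if i % 2 == 0 else v  # length of `a` in this round
        c = (_to_int(a, alphabet) + _round_value(key, i, b)) % (len(alphabet) ** m)
        a, b = b, _to_text(c, m, alphabet)
    return a + b


def deprotect(text, key, alphabet, rounds=10):
    """Inverse of protect(): undo the rounds in reverse order."""
    u = len(text) // 2
    v = len(text) - u
    a, b = text[:u], text[u:]
    for i in reversed(range(rounds)):
        m = u if i % 2 == 0 else v
        c = (_to_int(b, alphabet) - _round_value(key, i, a)) % (len(alphabet) ** m)
        a, b = _to_text(c, m, alphabet), a
    return a + b


key = b"demo key, not for production"
name = "Jane Q. Public".ljust(40)  # comparable to a COBOL PIC X(40) field
token = protect(name, key, NAME_ALPHABET)
restored = deprotect(token, key, NAME_ALPHABET)
```

Here `token` is still 40 characters from `NAME_ALPHABET`, so it fits the same field definition as `name`, and `restored` equals the original value. A production system would use a vetted, standardized algorithm and proper key management rather than anything resembling this sketch.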

The benefits of such data protection are immediately obvious to most programmers. What is often less clear is that this methodology enables a shift in how protected data is used. With most data protection methods, the approach to any processing is to first convert the data back to cleartext, and then perform the processing. The data is thus at risk during that processing.

Data travels safely

With format-preserving data protection, most (not all!) processing can usually be performed with the data in its protected state.

Consider a US Social Security number: Many businesses collect these from customers for a variety of reasons, often even using them as primary database keys. Yet an SSN is just a nine-digit number: for most purposes, all that the business cares about is that it has it and that it is unique and consistent.

If SSNs are protected using format-preserving data protection as soon as they are acquired, they can thus remain in their protected state throughout their use as database keys and other processing. Only for specific use cases—for example, passing an SSN to a credit bureau for a credit check—would it need to be converted back to cleartext.

This protection persistence extends to data that flows between EBCDIC and ASCII environments. Existing applications often share data across systems, translating between ASCII and EBCDIC as it moves. Adding traditional data protection methods means both more disruption and reduced security, since the data must be decrypted before it is translated and (perhaps) re-encrypted afterward. This adds complexity and additional processing, and also means that data is in the clear at the transfer point, which adds significant attack surface.

With format-preserving data protection, such flows require no changes: the data can be translated in its protected state, and deprotected later as needed on the “other side.”
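
The following Python sketch illustrates such a flow under stated assumptions. The `protect_digits` and `deprotect_digits` helpers are a toy Feistel construction invented for this example (not FF1, not any vendor's API), the key and the nine-digit value are made up, and Python's built-in `cp037` codec stands in for the mainframe's EBCDIC code page. Because the protected value consists of ordinary digits, it can be translated character by character and still be deprotected on the ASCII side.

```python
import hashlib
import hmac


def _f(key, round_no, half):
    """Keyed pseudo-random integer for one Feistel round.

    It works on the characters of `half`, not on platform-specific bytes,
    so it gives the same answer on the EBCDIC and the ASCII side.
    """
    msg = bytes([round_no]) + half.encode("ascii")
    return int.from_bytes(hmac.new(key, msg, hashlib.sha256).digest(), "big")


def protect_digits(digits, key, rounds=10):
    """Toy digits-in, digits-out transform (illustration only, rounds must be even)."""
    u = len(digits) // 2
    v = len(digits) - u
    a, b = digits[:u], digits[u:]
    for i in range(rounds):
        m = u if i % 2 == 0 else v
        c = (int(a) + _f(key, i, b)) % (10 ** m)
        a, b = b, str(c).zfill(m)
    return a + b


def deprotect_digits(digits, key, rounds=10):
    """Inverse of protect_digits()."""
    u = len(digits) // 2
    v = len(digits) - u
    a, b = digits[:u], digits[u:]
    for i in reversed(range(rounds)):
        m = u if i % 2 == 0 else v
        c = (int(b) - _f(key, i, a)) % (10 ** m)
        a, b = str(c).zfill(m), a
    return a + b


key = b"demo key, not for production"
protected = protect_digits("123456789", key)

# Mainframe side: the protected digits are stored as EBCDIC bytes.
ebcdic_bytes = protected.encode("cp037")

# Transfer: ordinary character translation, with the data still protected.
ascii_bytes = ebcdic_bytes.decode("cp037").encode("ascii")

# "Other side": deprotect only where the cleartext is actually needed.
recovered = deprotect_digits(ascii_bytes.decode("ascii"), key)
```

The byte values differ between `ebcdic_bytes` and `ascii_bytes`, yet `recovered` equals the original value. Binary ciphertext would not survive the same translation step, because character translation rewrites bytes that the decryption routine needs unchanged.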

Searching for protected values is easy: protect the search term and find that. Only partial or wildcard searches would require deprotecting significant volumes of data, a problem shared by all data-centric protection. And there are methods to reduce this difficulty, too.
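
As a rough sketch of exact-match searching, the Python below keeps records keyed by protected SSN and looks one up by protecting the search term first. Everything in it is an assumption made for illustration: `protect_digits` is a toy Feistel transform (not a standardized algorithm), and the key, the `customers` table, `find_account`, and the sample numbers are invented. The lookup works because the transform is deterministic and maps distinct inputs to distinct outputs, so protected values stay unique and consistent.

```python
import hashlib
import hmac


def protect_digits(digits, key, rounds=10):
    """Toy digits-in, digits-out Feistel transform (illustration only)."""
    u = len(digits) // 2
    v = len(digits) - u
    a, b = digits[:u], digits[u:]
    for i in range(rounds):
        m = u if i % 2 == 0 else v
        msg = bytes([i]) + b.encode("ascii")
        f = int.from_bytes(hmac.new(key, msg, hashlib.sha256).digest(), "big")
        a, b = b, str((int(a) + f) % (10 ** m)).zfill(m)
    return a + b


KEY = b"demo key, not for production"

# Ingestion: each SSN is protected once, on the way in, and used as the key.
customers = {
    protect_digits("123456789", KEY): "account A",
    protect_digits("987654321", KEY): "account B",
}


def find_account(ssn):
    """Exact-match search: protect the search term, then compare protected values."""
    return customers.get(protect_digits(ssn, KEY))


hit = find_account("123456789")
miss = find_account("111111111")
```

Here `hit` is the record stored for the first number and `miss` is `None`. No stored value is deprotected at any point. A prefix or wildcard search could not be served this way, which is the limitation noted above.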

This all means that application modules that neither ingest data nor have specific need for the cleartext not only don't require updates, but they need not even be aware that the data has been protected.

Best of all, a data protection project shifts from “examine every line of code” to “understand the data flows within this application” (protecting the data only on ingestion, and deprotecting it only when strictly necessary)—an effort with lasting benefit, as that understanding has longer-term value as requirements and staffing evolve.

Map your data flow and go

With format-preserving data protection, these protection and deprotection points are single lines of code, making the actual coding easy once the data flow is mapped. Of course, the initial protection of existing data is still required and can take significant effort. There are techniques that can reduce this pain, including staged/rolling implementations.

Traditionally, data protection projects have been driven by a breach, an audit finding, or (less often) forward-thinking management. GDPR et al. add new and very specific imperatives, often for data that the enterprise does not consider particularly sensitive, and with serious fines in the offing. Field-level data protection is often the only realistic approach to meeting such requirements, making the reduced impact from format-preserving methods particularly attractive.

Approaching a data protection project for an existing COBOL application can seem fraught, for a variety of reasons: the critical nature of the processing; application age and the corresponding lack of familiarity with its operation; and the sheer volume of existing data that must be converted. The good news is that with some planning and understanding, it becomes approachable, and format-preserving data protection makes it much easier than other solutions.
