7 things developers should know about production infrastructure

In my experience managing DevOps teams, I’ve seen many changes in how development and operations teams work together. Initially, we worked in silos and rarely communicated or collaborated. Today, with the move to DevOps, we are narrowing the gap between Dev and Ops and educating our developers by bringing Ops knowledge into Dev.

As in most organizations today, we see that developers are becoming increasingly DevOps-oriented, with constant focus on continuous delivery. We also see developers playing a central role not only with software-related issues, but with operations and production environment issues and requirements as well.

With applications and services sitting on various types of infrastructure such as database servers, app servers, etc., the production environment is often very different from the development environment. While it may seem straightforward for a developer to gather requirements, then code, test, and release the resulting software to production, developers are sometimes unaware of the requirements with which operations staff must comply.

Here are seven things that every developer should know about production infrastructure.

What do devs need to know about the production environment?

In a DevOps world, it's critical that developers understand how software is deployed in production. The goal here is to create better collaboration and teamwork and avoid friction and problems when onboarding their applications into production.

1. The server file system: Whose is it?

Developers frequently use a server’s file system as if it were their own. When you code on your own PC, you can let your app use hard-coded /tmp or /home folders, for example; but when your app gets to production, things change. In production, /tmp might be a system folder cleaned upon reboot, with many other processes using it. And /home is actually the home of the runtime user, a user set up to run apps in production. Additionally, production servers’ local storage is limited, since most usage occurs on network storage.

You can avoid these conflicts by making sure that every file system path your app uses is configurable, both in advance and at deployment time. This can be done, for example, by using environment variables such as $TMPDIR and $HOME.
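As a rough illustration of configurable paths, here is a minimal Python sketch. The variable name MYAPP_DATA_DIR and the .myapp folder are hypothetical, chosen only for this example; TMPDIR and the home directory are resolved through the standard library rather than hard-coded.

```python
import os
import tempfile
from pathlib import Path


def scratch_dir() -> Path:
    """Return the directory to use for temporary files.

    tempfile.gettempdir() consults TMPDIR (then TEMP and TMP) before
    falling back to a platform default, so Ops can redirect it at
    deployment time without a code change.
    """
    return Path(tempfile.gettempdir())


def data_dir() -> Path:
    """Return the directory for the app's own data files.

    MYAPP_DATA_DIR is a hypothetical variable name (an assumption of
    this sketch). If it is unset, fall back to a folder under the home
    directory of whichever user runs the app.
    """
    override = os.environ.get("MYAPP_DATA_DIR")
    if override:
        return Path(override)
    return Path.home() / ".myapp" / "data"
```

With this in place, Ops could point the app at network storage by exporting MYAPP_DATA_DIR in the service definition, and the code would not need to know where that storage is mounted.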

2. Application servers are not the only servers

Keep in mind that your application server is not the only one out there; there are database servers, load balancing servers, queue servers, and others. In production, your app server will likely sit behind a virtual IP, load balancer, or other server that forwards client requests to it.

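If your app needs to recognize the client IP (for example, to block users from reaching the /admin page), keep in mind that the connection your app sees comes from the load balancer, not from the client. Read the X-Forwarded-For header of incoming HTTP requests to identify the originating IP address, and ask your Ops administrator to define and enable this option in the relevant load balancer. Because clients can set this header themselves, trust it only on requests that arrive through your own load balancer.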
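One way to read the header defensively is sketched below in Python. The function name client_ip and the idea of passing in a list of known proxy addresses are assumptions of this example, not part of any particular framework. It relies on the common convention that each proxy appends the address it received the request from, so the entries closest to your app are the ones you can trust.

```python
from typing import Iterable, Optional


def client_ip(forwarded_for: Optional[str], peer_ip: str,
              trusted_proxies: Iterable[str]) -> str:
    """Best-effort originating IP for a request.

    forwarded_for   -- raw X-Forwarded-For header value, or None
    peer_ip         -- address of the socket peer (usually the load balancer)
    trusted_proxies -- addresses of your own proxies (hypothetical input)
    """
    trusted = set(trusted_proxies)
    # Hops in the order they were traversed: client first, direct peer last.
    hops = [h.strip() for h in (forwarded_for or "").split(",") if h.strip()]
    hops.append(peer_ip)
    # Walk back from the direct peer. The first address that is not one
    # of our own proxies is the closest one we can vouch for; anything
    # further left may have been supplied by the client.
    for hop in reversed(hops):
        if hop not in trusted:
            return hop
    # Every hop was one of our proxies: return the oldest entry.
    return hops[0]
```

For example, a request relayed by a load balancer at 10.0.0.5 with the header value "203.0.113.7" resolves to 203.0.113.7, while the same header arriving directly from an unknown peer is ignored and the peer's own address is returned.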

3. Production applications are constantly monitored

Developers often focus on the core of the feature or product they develop and forget the need to support production monitoring requirements. In addition to traditional machine monitors, such as Nagios, your app is likely to be continuously monitored by an application heartbeat monitor. The load balancer runs this monitor: it sends a heartbeat request to each node every few seconds to check that the node is "alive" and that traffic can be routed to it.

As a developer, you should therefore build and maintain an API endpoint that returns a heartbeat response whenever the load balancer invokes it.
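A heartbeat endpoint can be very small. The sketch below uses only Python's standard library; the /healthz path, the JSON body, and the default port are assumptions for illustration, since the actual path and expected response are whatever you and Ops agree on for the load balancer's check.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class HeartbeatHandler(BaseHTTPRequestHandler):
    """Answers the load balancer's probe on /healthz (hypothetical path)."""

    def do_GET(self):
        if self.path == "/healthz":
            body = json.dumps({"status": "ok"}).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        # Probes arrive every few seconds; skip the default per-request
        # log line so they do not flood the log.
        pass


def make_heartbeat_server(host="127.0.0.1", port=8080):
    """Build (but do not start) the server; call serve_forever() to run it."""
    return ThreadingHTTPServer((host, port), HeartbeatHandler)
```

Calling make_heartbeat_server().serve_forever() would start answering probes. A real check would usually also verify that the app's own dependencies, such as its database connection, are usable before reporting "ok".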

4. Invest in logs

Production infrastructure is heavily hardened, meaning that as a developer, chances are you won’t be able to access the infrastructure, let alone debug on it. For this reason it’s best to invest in logs early. Always ensure that they are highly informative, capturing an application’s behavior and its history, so that you can reproduce in a dev environment what you’ve seen in production. Make sure that you also log every message that comes into and goes out of your app. Integrating different components is difficult, and diagnosing problems between different apps and/or servers is harder still without a record of what they exchanged.

In addition, you should always use logging levels. The ability to filter logs and extract information at the right level of detail can save a lot of time when chasing a critical defect in production.
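To make the idea of levels concrete, here is a small sketch using Python's standard logging module. The helper names build_logger and log_message, and the choice to log a short summary at INFO and the full payload at DEBUG, are assumptions of this example rather than a prescribed scheme.

```python
import logging
import sys

_LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
    "CRITICAL": logging.CRITICAL,
}


def build_logger(name: str, level_name: str = "INFO") -> logging.Logger:
    """Create a logger whose threshold is chosen by name.

    Unknown level names fall back to INFO, so a typo in the deployment
    config does not silence the app.
    """
    logger = logging.getLogger(name)
    logger.setLevel(_LEVELS.get(level_name.upper(), logging.INFO))
    if not logger.handlers:  # avoid duplicate lines on repeated calls
        handler = logging.StreamHandler(sys.stderr)
        handler.setFormatter(logging.Formatter(
            "%(asctime)s %(levelname)s %(name)s %(message)s"))
        logger.addHandler(handler)
    return logger


def log_message(logger: logging.Logger, direction: str, peer: str,
                payload: bytes) -> None:
    """Record one inbound or outbound message."""
    # Summary at INFO: cheap enough to keep on in production.
    logger.info("%s peer=%s bytes=%d", direction, peer, len(payload))
    # Full payload at DEBUG: enabled only while chasing a defect.
    logger.debug("%s peer=%s payload=%r", direction, peer, payload)
```

Running with the level set to INFO keeps one line per message, and switching the same deployment to DEBUG adds the payloads without a code change.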

5. Logs must be managed and controlled

Log files have the potential to crash your app by filling 100% of the server’s storage, leaving no room for new write operations. Developers should be aware that every log entry written adds risk to the server hosting the app, and that logs need to be constantly managed, including rotation and purging of old log data. Consider how much traffic your app handles in production and how many write operations that generates.

To prevent these issues, use a standard logging framework such as log4j rather than writing log files by hand. Before you purge anything, however, check for any legal and corporate policy requirements that might require records, including log files, to be retained for a minimum period of time.
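The same idea in Python's standard library looks roughly like the sketch below, shown here instead of log4j only to keep the example self-contained. The function name and the size and count defaults are placeholders; the real values should come from your disk budget and retention policy, because the handler deletes the oldest file once the backup limit is reached.

```python
import logging
from logging.handlers import RotatingFileHandler


def build_rotating_logger(name: str, path: str,
                          max_bytes: int = 10 * 1024 * 1024,
                          backup_count: int = 5) -> logging.Logger:
    """Logger that caps its own disk usage.

    The active file is rolled over before it would exceed max_bytes,
    and at most backup_count old files (path.1, path.2, ...) are kept,
    so total usage stays near max_bytes * (backup_count + 1).
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = RotatingFileHandler(path, maxBytes=max_bytes,
                                  backupCount=backup_count)
    handler.setFormatter(logging.Formatter("%(message)s"))
    logger.addHandler(handler)
    logger.propagate = False  # keep these lines out of the root logger
    return logger
```

If policy requires keeping logs longer than local disk allows, a common arrangement is to ship rotated files to central storage before they are purged locally.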

6. Enterprise production systems are often locked down

Unlike development-based systems with open Internet access, enterprise production systems are sometimes locked down, with security considerations rendering Internet access unavailable, and no inbound traffic allowed.

If your app deployment process requires access to the Internet, use local repositories and downloadable packages within the scripts that you deliver to Ops. Decoupling your deployment process from an Internet connection in this way may seem like a constraint, but it’s actually a benefit. The less your deployment process depends on the Internet, the more likely it is to be stable, fast, and secure.

7. Production systems cannot be shut down to install updates

Updates to production are often rolled out gradually (known as "a canary release") or to separate environments (a "blue-green release"), in order to maintain a good user experience, reduce risk, and avoid downtime. Sessions for any users or systems in the middle of a transaction must be wound down gracefully rather than cut off. And because old and new versions run side by side during such a rollout, your application should either keep a homogeneous software configuration across versions or support backward-compatible configuration. Otherwise, the app will crash during the upgrade, or even worse, silently cause data loss.
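One way to wind down in-flight work gracefully is to stop accepting new transactions and wait for the running ones to finish before exiting. The Python sketch below illustrates that pattern; the Drainer class is hypothetical, and it assumes that your deployment tooling signals the process with SIGTERM before stopping it, which you should confirm with Ops.

```python
import signal
import threading


class Drainer:
    """Tracks in-flight transactions so the process can exit cleanly."""

    def __init__(self):
        self._cond = threading.Condition()
        self._in_flight = 0
        self._stopping = False

    def try_begin(self) -> bool:
        """Register a new transaction; refused once shutdown has begun."""
        with self._cond:
            if self._stopping:
                return False
            self._in_flight += 1
            return True

    def end(self) -> None:
        """Mark one transaction as finished."""
        with self._cond:
            self._in_flight -= 1
            self._cond.notify_all()

    def request_shutdown(self) -> None:
        """Stop admitting new transactions."""
        with self._cond:
            self._stopping = True
            self._cond.notify_all()

    def wait_until_drained(self, timeout: float) -> bool:
        """Block until nothing is in flight; False if the timeout expires."""
        with self._cond:
            return self._cond.wait_for(lambda: self._in_flight == 0, timeout)


def install_sigterm_handler(drainer: Drainer) -> None:
    """Begin draining on SIGTERM (must be called from the main thread)."""
    signal.signal(signal.SIGTERM,
                  lambda signum, frame: drainer.request_shutdown())
```

A request handler would call try_begin() before starting work and end() when done, and the main thread would call wait_until_drained() with a timeout that fits inside whatever grace period the rollout tooling allows.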

Consider the Ops implications

As a developer working on applications, you need to always consider the implications of your work not only as it relates to software development, but also to all of the infrastructure-related elements that are affected on the way to production. If empowered with more knowledge and Ops guidance, developers of tomorrow will not only master software development, but efficiently work with production-related requirements as well. It will take a few tweaks in the organization’s culture and a change in mindset. But eventually, you’ll understand the bigger picture, while still developing first-class applications.

