How to break through 5 choke points in your continuous delivery pipeline
A well-designed continuous delivery pipeline should flow your software from your source control repository to production in an automated and traceable fashion. Too bad it doesn't always work according to plan. Once you've set up your first CD pipeline, you may feel as if you're ready to focus on how to accelerate delivery through it. All too often, however, development teams face a very different problem: the CD pipeline suddenly slows down, or stops entirely. Here are five common reasons why your pipeline can hit a choke point, and what to do about them.
1. Designing for an overly coupled product architecture
The success of CD largely depends on a decoupled product architecture. Wherever you are in your architectural journey, from mud-ball to service-oriented architecture to microservices, you need low coupling and high cohesion. This leads to faster continuous integration builds and faster feedback cycles. Decoupling will help you define a robust test strategy and lead to a shorter test cycle.
As you strangle the code monolith, start your CD journey with the slice of your product that generates the most revenue and that's evolving the fastest. It could be that 80 percent of your customers bank heavily on 20 percent of your codebase, so the 20 percent needs to be continuously delivered first. Google Page Analytics or similar tools can confirm usage, time spent, and the access patterns of your customers. Decide what's most important and analyze the bottlenecks.
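Keep Conway's Law in mind here: organizations that design systems tend to produce designs that mirror their own communication structures. In other words, don't let the organizational hierarchy and its communication structure influence your system design. Interfaces are of paramount importance. Interfaces influence how modules interact, while modules influence how teams interact, or should interact.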
Here's Conway's corollary: "Because the design that occurs first is almost never the best possible, the prevailing system concept may need to change. Therefore, flexibility of organization is important to effective design."
The bottom line is to allow for the design to evolve.
2. Spinning with the revolving door of CD tools
It can be frustrating when CD tools go through a revolving door of revisions. CD is an evolving area, and the tools to continuously deliver are evolving rapidly. Unfortunately, there's not much you can do about that. The good news is that, if you get the fundamentals straight, the tools are simply a means to an end.
CD is the right thing to do. Internalize the fundamentals, and let the design tools follow.
3. Failing to create immutable infrastructure
Treat your infrastructure as an artifact. Machines are artifacts too, and here machine images are key to repeatability. Machine failures should be induced and fixed at build time, not at run time, in much the same way you detect and address software failures.
Consider using a tool such as the open source Packer, which builds identical machine images for multiple platforms, such as Amazon EC2 and VMware, in parallel from a single source configuration. It works well with traditional configuration management tools, and, as a post-processing step, it can write out Vagrant boxes.
The takeaway here: A version-controlled infrastructure goes a long way toward making CD successful.
4. Your big data appears to be too big for your pipeline
It's a myth that big data applications don't lend themselves to CD, although these can be complex workflows that are typically situated in the cloud. Once you have defined the workflows and identified the upstream and downstream data dependencies, complex big data applications such as ETL (extract, transform, load), ad serving, inventory management, and behavioral targeting can be fully automated using standard tool sets.
As always, the CD pipeline architecture should follow your product architecture: define the stages for components, modules, or services first, then for your subsystems or tiers, and eventually for the system itself.
It is possible to stitch together big data applications and the corresponding tests with a workflow manager, such as the Oozie workflow scheduling system. When you construct an Oozie workflow XML as a directed acyclic graph (DAG), you can define each action as a Java MapReduce job, a streaming MapReduce job, a Pig or Hive script, a Java program, or a shell script. The coordinator XML executes these workflows based on a time trigger and/or upstream data availability.
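To make the DAG idea concrete, here is a minimal Python sketch, not Oozie itself and not its XML format. It assumes a hypothetical five-step workflow in which each action lists the actions it depends on, and it uses the standard library's graphlib (Python 3.9 or later) to run them in dependency order. The action names and the run_workflow helper are invented for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each key is an action, each value is the set of
# actions that must finish before it can start.
workflow = {
    "extract": set(),
    "transform": {"extract"},
    "validate_transform": {"transform"},
    "load": {"transform"},
    "validate_load": {"load", "validate_transform"},
}


def run_workflow(dag, actions):
    """Run every action once, in an order that respects the dependencies."""
    executed = []
    # static_order() raises graphlib.CycleError if the graph is not acyclic.
    for name in TopologicalSorter(dag).static_order():
        actions[name]()  # a real action might launch Pig, Hive, or a shell script
        executed.append(name)
    return executed


# Placeholder actions that do nothing; real ones would do the actual work.
actions = {name: (lambda: None) for name in workflow}
order = run_workflow(workflow, actions)
print(order)
```

Tests sit in the graph as ordinary nodes (the validate_ steps here), so a validation runs as soon as the data it checks is ready.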
You can tie these workflows back to any standard orchestrator, such as Jenkins, TeamCity, or GoCD, and you can simulate upstream data dependencies by programmatically injecting legitimate data, illegitimate data, or occasionally no data to cover workflow paths. When you inject external test data, it's key to have robust SetUp() and TearDown() mechanisms that leave the system in the same pristine state you found it in before you ran the tests.
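The three injection cases can be sketched with Python's unittest, whose setUp() and tearDown() hooks play the role described above. This is an illustration under assumptions: count_valid_records is a hypothetical stand-in for a workflow step, and a temporary directory stands in for the upstream data landing zone.

```python
import shutil
import tempfile
import unittest
from pathlib import Path


def count_valid_records(input_dir):
    """Hypothetical workflow step: count 'key,value' lines whose value is an integer."""
    valid = 0
    for path in sorted(Path(input_dir).glob("*.csv")):
        for line in path.read_text().splitlines():
            parts = line.split(",")
            if len(parts) == 2 and parts[1].strip().isdigit():
                valid += 1
    return valid


class UpstreamDataTest(unittest.TestCase):
    def setUp(self):
        # Create an isolated landing directory for the injected data.
        self.input_dir = tempfile.mkdtemp()

    def tearDown(self):
        # Remove everything the test injected so the next run starts clean.
        shutil.rmtree(self.input_dir)

    def inject(self, text):
        Path(self.input_dir, "upstream.csv").write_text(text)

    def test_legitimate_data(self):
        self.inject("clicks,10\nviews,25\n")
        self.assertEqual(count_valid_records(self.input_dir), 2)

    def test_illegitimate_data(self):
        self.inject("clicks,ten\nmalformed line\n")
        self.assertEqual(count_valid_records(self.input_dir), 0)

    def test_no_data(self):
        # Nothing injected: the upstream feed never arrived.
        self.assertEqual(count_valid_records(self.input_dir), 0)
```

Saved as a module, this runs with python -m unittest, which is also how an orchestrator job could invoke it.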
Once your serpentine workflow starts to crawl, applications and tests will execute as nodes in the DAG. Data gets crunched, decisions flow, and validations occur. The test cycle time could depend on just how big your big data is, but don't let the terabytes, petabytes, exabytes, zettabytes, or yottabytes concern you otherwise.
5. You have an irrational fear of databases
For a while in CD, no one wanted to touch databases. That's changing, however. Traditionally, database administrators were a specialized, siloed group who believed that databases couldn't be continuously delivered. Databases are so complex, they said, that you need manual administration for objects, schema, and data. This is not true, but this mind-set has nonetheless caused expensive delays in some organizations.
Whether you use Liquibase, Datical (Enterprise Liquibase), DBMaestro Teamwork, or a similar tool, the underlying concepts of CD remain the same.
Object types in databases (tables, procedures, functions, packages, triggers, views, sequences, etc.) can be delivered as part of the CD pipeline. DDL (data definition language, such as CREATE and ALTER), DML (data manipulation language, such as SELECT and INSERT), DCL (data control language, such as GRANT and REVOKE), and TCL (transaction control language, such as COMMIT and ROLLBACK) statements have been automated as part of ChangeLogs and ChangeSets.
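The general idea behind change logs and change sets can be sketched in a few lines of Python with the built-in sqlite3 module. This is not how Liquibase or any of the tools above is implemented. It assumes a hypothetical ordered list of change sets and a hypothetical tracking table, applied_changesets, that records which ones have already run.

```python
import sqlite3

# Hypothetical changelog: an ordered list of (changeset id, SQL statement).
CHANGELOG = [
    ("001-create-customer", "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)"),
    ("002-add-email", "ALTER TABLE customer ADD COLUMN email TEXT"),
    ("003-seed-customer", "INSERT INTO customer (name, email) VALUES ('Ada', 'ada@example.com')"),
]


def apply_changelog(conn, changelog):
    """Apply changesets that have not run yet, record each one, return the ids applied."""
    conn.execute("CREATE TABLE IF NOT EXISTS applied_changesets (id TEXT PRIMARY KEY)")
    already = {row[0] for row in conn.execute("SELECT id FROM applied_changesets")}
    applied = []
    for changeset_id, sql in changelog:
        if changeset_id in already:
            continue  # already delivered to this database
        conn.execute(sql)
        conn.execute("INSERT INTO applied_changesets (id) VALUES (?)", (changeset_id,))
        conn.commit()
        applied.append(changeset_id)
    return applied


conn = sqlite3.connect(":memory:")
first_run = apply_changelog(conn, CHANGELOG)   # applies all three changesets
second_run = apply_changelog(conn, CHANGELOG)  # nothing left to apply
print(first_run, second_run)
```

Because each change set is recorded once it runs, the pipeline can execute the same changelog against every environment and only the missing changes are applied.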
Production and test data can be structured or unstructured, and can have various types, like XML, strings, dates, numbers, etc., and even those have been managed as part of ChangeSets. Databases typically constitute a subsystem, and the tools mentioned support Build, Configure, Assemble, Deploy and Test as part of component, subsystem and system stages in the CD pipeline.
Security, scale and integration at an enterprise level have been achieved, and there's a strong audit trail that helps with regulations.
Many NoSQL databases are schema-less, which means that the schema is effectively ingrained in the source code. In this model, schemas can be continuously delivered with the source.
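As a rough illustration of a schema that lives in the source, the Python sketch below assumes a hypothetical customer document whose shape is a dataclass under version control. The CustomerV2 class, the schema_version field, and the upgrade helper are all invented for this example and are not tied to any particular NoSQL product.

```python
from dataclasses import asdict, dataclass


@dataclass
class CustomerV2:
    """Hypothetical document schema, versioned in source control with the code."""
    name: str
    email: str = ""          # field added in version 2; older documents lack it
    schema_version: int = 2


def upgrade(document):
    """Read a stored document of any known version and return it in the current shape."""
    version = document.get("schema_version", 1)
    if version > 2:
        raise ValueError(f"unknown schema_version: {version}")
    return CustomerV2(name=document["name"], email=document.get("email", ""))


old_document = {"name": "Ada"}  # written by version 1 of the code
current = asdict(upgrade(old_document))
print(current)
```

A schema change then ships as an ordinary code change: the new class and its upgrade logic go through the same build, test, and deploy stages as the rest of the application.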
Your CD pipeline carries your business' ideas into production. The less it chokes, the more you can experiment and the faster you can monetize your products. That's the only way to run your business.