I was reminiscing recently with a friend and former colleague of mine about how we had both experienced product and service releases that were under tremendous pressure to deliver a new feature or capability and were already behind schedule. Along with the highly charged environment and the pressures on the team, people were already tired, and quality issues started creeping into the builds. Look, our teams had the best of intentions, but being behind on a critical release meant that ultimately there was an impact on quality. And when we finally managed to get these challenging releases “out the door,” too many defects escaped with them and made just about everything worse. At that point, not only were we delaying revenue and missing opportunities, but the high-priority need to fix critical defects also flattened our capacity for new projects, compounding frustration and opportunity costs.
Building and maintaining great continuous integration/continuous delivery (CI/CD) pipelines is hard but doable. These formative (and painful) experiences instilled in me a passion to commit my professional career to resolving these two critical issues:
How we make the engineering team more productive so we can accelerate the releases.
How we efficiently identify and resolve the offending commits early in the release cycle and keep them from reaching production.
Fortunately, early in my career, I had the opportunity to learn from folks who were very good at building efficient release processes that delivered consistently high-quality releases. In particular, I learned a lot from BEA WebLogic’s engineering team, which built one of the most efficient and automated release processes with a lot of emphasis on quality. I was also part of a talented team at VMware, which was responsible for setting up the release processes and CI/CD pipelines for on-prem and cloud-native applications. It was painful, but we successfully integrated the CI/CD pipelines with a few monitoring tools to help the developers debug the pipeline issues.
Why doesn’t everyone build great CI/CD pipelines to accelerate software delivery?
We weren’t the only ones with both the scars of painful releases and the experience of well-executed release processes. If we had learned the lessons, lots of other folks had as well. So why wasn’t everyone running release processes with best practices?
For one thing, it can be pretty challenging to keep pre-production environments stable and fully functional. One very simple and insidious reason is that when we run automated tests in pre-production environments, we invariably face the dilemma of what to do when the tests fail. Is it a “real” failure, or is it an artifact of the test environment? If testing in pre-production environments is perceived to be flaky, it becomes easy for people to start ignoring the test failures and keep promoting the commits to the next stage in the pipeline. This behavior, in turn, lets regressions enter production environments, where they are easily 10x more expensive to fix.
The sad truth is that it is all too common to detect major regressions in production that had already been caught in pre-production, where nobody examined and triaged the failure before the code was promoted. Not only does this consume time and money in production, it also wastes all the effort we spent automating and running tests whose results are being ignored.
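One way to make the “real failure or environment artifact?” question explicit is to retry failing tests and refuse promotion until every non-passing test has been looked at. The following is a minimal Python sketch under stated assumptions, not a description of any particular CI tool: the names `RunRecord`, `classify`, and `can_promote` are hypothetical, and the rule "failed at least once but also passed at least once means flaky" is just one simple heuristic.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    PASSED = "passed"   # every attempt passed
    FLAKY = "flaky"     # mixed results across attempts on the same commit
    FAILED = "failed"   # every attempt failed


@dataclass
class RunRecord:
    """Hypothetical record of one test's attempts on a single commit."""
    name: str
    attempts: list[bool]  # True = pass, in the order the attempts ran


def classify(run: RunRecord) -> Verdict:
    """Classify a test from its attempt history (assumed heuristic)."""
    if not run.attempts:
        raise ValueError(f"{run.name}: no attempts recorded")
    if all(run.attempts):
        return Verdict.PASSED
    if any(run.attempts):
        return Verdict.FLAKY
    return Verdict.FAILED


def can_promote(runs: list[RunRecord], triaged: set[str]) -> tuple[bool, list[str]]:
    """Allow promotion only if every non-passing test has been triaged.

    Flaky tests block promotion too, so they cannot be silently ignored.
    """
    blockers = [
        r.name
        for r in runs
        if classify(r) is not Verdict.PASSED and r.name not in triaged
    ]
    return (not blockers, blockers)
```

The design choice here is that a flaky result is treated as a blocker rather than a pass: someone has to record a triage decision before the commit moves on, which is exactly the step that tends to get skipped.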
So, why does this happen over and over again?
Not because we don’t understand what’s going on.
Not because we don’t know how to minimize these failures.
Not because industry analysts haven’t been telling us how important effective automation is.
It turns out that what keeps most companies from investing in setting up a fully automated CI/CD pipeline is:
Historically, it has been hard and resource intensive.
The CI/CD ecosystem is constantly evolving, and the existing tools have simply not kept up with rapidly evolving developer requirements.
Outages in production are very visible due to the business impact they cause, but the pre-production issues are ignored because the tangible benefit of stable pre-production environments is not explicitly visible to decision makers.
Existing CI/CD tools are failing developers (and businesses that depend on them)
Developers, and the businesses that depend on them, need a platform that helps them diagnose build, deploy, and test issues quickly and that encourages consistent “high standards of governance and efficiency” across an organization.
Succinctly, developers need an aggregated view of all the sources of issues to help them to find the root cause of the problem quickly. That will reduce the MTTR for fixing the pre-production issues and keep the pre-production environments functional and stable.
The nature of a typical CI/CD environment is that builds, tests, and deployments run full throttle, and a quick turnaround to troubleshoot, fix, and redeploy is key to optimizing the CI/CD process. DevOps environments generate an overload of logs, and that information overload can make root cause analysis (RCA) a nightmare. Hours, if not days, are spent manually troubleshooting issues, especially when they manage to creep into the production environment. Often, the problem is not a lack of logs, metrics, traces, or information in general, but an excess of them. Quickly sifting through the noise and the maze of alerts, warnings, and irrelevant logs to focus on the “errors that matter” is crucial to converging on a root cause.
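To illustrate what sifting for the “errors that matter” can look like in practice, here is a small Python sketch that drops low-severity lines and groups the remaining errors by a normalized signature, so that hundreds of near-identical lines collapse into a handful of distinct problems. The log format (`LEVEL message`), the function names, and the normalization rule (replace numbers and hex values with a placeholder) are all assumptions made for illustration.

```python
import re
from collections import Counter

# Severity levels treated as worth a developer's attention (assumption).
_ERROR_LEVELS = {"ERROR", "FATAL"}

# Numbers and hex values vary between otherwise identical messages.
_VARIABLE_PARTS = re.compile(r"0x[0-9a-fA-F]+|\d+")


def signature(message: str) -> str:
    """Normalize a message so repeats of the same problem compare equal."""
    return _VARIABLE_PARTS.sub("<n>", message).strip()


def errors_that_matter(lines, top=5):
    """Return the most frequent error signatures as (signature, count) pairs.

    Each line is assumed to look like "LEVEL free-form message".
    """
    counts = Counter()
    for line in lines:
        level, _, message = line.partition(" ")
        if level.upper() in _ERROR_LEVELS:
            counts[signature(message)] += 1
    return counts.most_common(top)


if __name__ == "__main__":
    sample = [
        "INFO build 101 started",
        "ERROR timeout after 30s connecting to db-2",
        "ERROR timeout after 45s connecting to db-7",
        "WARN retrying",
        "FATAL out of memory in worker 3",
    ]
    for sig, count in errors_that_matter(sample):
        print(count, sig)
```

Real pipelines would need a sturdier parser and smarter grouping, but even this crude signature step turns a wall of log lines into a short ranked list to start the root cause analysis from.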
Complexity is costly, as it drains productivity and quality
Side note: In many cases, pre-production environments only require monitoring when there are code promotion events. Personally, I don’t see a need to keep collecting logs 24/7 if those environments are used only once or twice per day to validate code commits. To do this, we need an observability solution to collect the relevant logs and metrics only when we really need them to debug the CI/CD pipeline issues, including the test issues.
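As a rough illustration of the side note above, the Python sketch below keeps only the log records that fall inside a time window around a code promotion event and discards everything else. The window sizes, the `(timestamp, message)` record shape, and the function names are hypothetical choices for the example, not a recommendation for specific values.

```python
from datetime import datetime, timedelta


def in_promotion_window(ts, promotions,
                        before=timedelta(minutes=5),
                        after=timedelta(minutes=30)):
    """True if ts falls within [promotion - before, promotion + after]."""
    return any(p - before <= ts <= p + after for p in promotions)


def collect(records, promotions, **window):
    """Keep only (timestamp, message) records near a promotion event."""
    return [r for r in records if in_promotion_window(r[0], promotions, **window)]


if __name__ == "__main__":
    promotions = [datetime(2024, 1, 1, 10, 0)]
    records = [
        (datetime(2024, 1, 1, 9, 0), "idle heartbeat"),
        (datetime(2024, 1, 1, 9, 58), "deploy starting"),
        (datetime(2024, 1, 1, 10, 20), "integration test failed"),
        (datetime(2024, 1, 1, 11, 0), "idle heartbeat"),
    ]
    for ts, message in collect(records, promotions):
        print(ts.isoformat(), message)
```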
Introducing CloudBees Release Orchestration SaaS (formerly ReleaseIQ):
This is why we started RO SaaS, to bridge the gap between these areas and solve real-world problems for product teams.
CloudBees Release Orchestration SaaS’s intelligent, people-centric software delivery platform is designed to do exactly that. Our platform unifies CI/CD with observability to give developers and testers the ability to fix build, deploy, and test failures quickly, which results in higher productivity and release efficiency.
With CloudBees Release Orchestration SaaS, teams can:
Build pipelines quickly with the no-code, drag-and-drop pipeline orchestrator, which provides end-to-end visibility into the release status of each commit and build that goes into production by foreseeing potential roadblocks and helping teams take timely action to avoid delays in software releases.
Maintain pipeline integrity with intelligent root cause analysis, allowing both developers and DevOps engineers to rapidly orchestrate CI/CD pipelines with customized workflows and processes by seamlessly integrating with their SDLC tools.
Ensure pipeline operation with end-to-end process visibility that offers role-based productivity dashboards which enable tracking and measuring performance across teams. Improve organizational governance with enterprise-class BI visualizations.
Get the right information to the right team members with role-based productivity dashboards that provide integrated, AI-driven troubleshooting capability, which brings insights into every step of the release process for build, deploy, and test failures. Leveraging the root cause traceability will improve engineering efficiency and productivity.
This results in faster deployment, a shorter lead time for new features, faster service restoration, and a lower change failure rate.