This is a guest blog post by Gary Gruver, one of CloudBees's strategic advisors. Gary is the co-author of "Leading the Transformation", "A Practical Approach to Large-Scale Agile Development", and "Starting and Scaling DevOps in the Enterprise".
This is the third free chapter from my recent book "Starting and Scaling DevOps in the Enterprise". You can read the first chapters here and here. You can also download your free copy of the complete book here. The book provides a concise framework for analyzing your delivery processes and optimizing them by implementing the DevOps practices that will have the greatest immediate impact on the productivity of your organization. It covers the engineering, architectural, and leadership practices that are critical to achieving DevOps success. I hope this will be a helpful resource for you on your DevOps path!
In the coming weeks, I will be sharing additional chapters and tips from the book. Can't wait? You can download your free copy now.
Chapter 3: Optimizing The Basic Deployment Pipeline
Setting up your Deployment Pipeline (DP) and using DevOps practices to increase its throughput while maintaining or improving quality is a journey that takes time for most large organizations. This approach, though, will provide a systematic method for addressing inefficiencies in your software development processes and improving those processes over time. We will look at the different types of work, the different types of waste, and the different metrics for highlighting inefficiencies. We start there because it is important to put the various DevOps concepts, metrics, and practices into perspective, so you can begin your improvements where they will provide the biggest benefits and start driving positive momentum for your transformation.
The technical and cultural shifts associated with this will change how everyone works on a day-to-day basis. The goal is to get people to accept these cultural changes and embrace different ways of working. For example: As an Operations person, I have always logged into a server to debug and fix issues on the fly. Now I can log on to debug, but the fix is going to require updating and running the script. This is going to be slower at first and will feel unnatural to me, but the change means that I, and everyone else, know the exact state of the server, with all changes under version control, and I can create new servers at will that are exactly the same. Getting some people to embrace short-term pain for long-term gain is going to be hard, but this is the type of cultural change that is required to truly transform your development processes.
Additionally, there are lots of breakthroughs coming from the field of DevOps that will help you address issues that have plagued your organization for years but were not very visible while you were operating at a low cadence. When you do one deployment a month, you don't see the issues repeat often enough to recognize a common cause that needs to be fixed.
When you do a deployment each day, you see a pattern that reveals the things that need fixing. When you are deploying manually on a monthly basis, you can use brute force, which takes up a lot of time, requires a lot of energy, and creates a lot of frustration. When you deploy daily, you can no longer use brute force. You need to automate to improve frequency, and that automation allows you to fix repetitive issues.
As you look to address inefficiencies, it is important to understand that there are three different kinds of work in software, and each requires a different approach to eliminate waste and improve efficiency. First, there is new and unique work, such as the new features, new applications, and new products that are the objective of the organization. Second, there is triage work that must be done to find the source of the issues that need to be fixed. Third, there is repetitive work, which includes creating an environment, building, deploying, configuring databases, configuring firewalls, and testing.
Since the new and unique work isn't a repetitive task, it can't be optimized the way you would a manufacturing process. In manufacturing, the product being built is constant, so you can make process changes and measure the output to see if there was an improvement. With the new and unique part of software you can't do that, because you are changing both the product and the process at the same time. Therefore, you don't know whether an improvement was due to the process change or just a different outcome from processing a different type or size of requirement. Instead, the focus here should be on increasing feedback so that people working on these new capabilities don't waste time and energy on things that won't work with changes other people are making, won't work in production, or don't meet the needs of the customer. Providing fast, high-quality feedback helps to minimize this waste.
It starts with feedback from a production-like environment, where each developer's latest code runs with everyone else's latest code, so that integration issues are resolved in real time. Then, ideally, the feedback comes from customers using the code in production as soon as possible. Validating with the customer addresses the fact that 50% of new software features are never used or do not meet their business intent. Removing this waste requires getting new features to customers as fast as possible to find out which of that 50% are not meeting their business objectives, so the organization can stop wasting time on those efforts.
In large software organizations, triaging and localizing the source of an issue can consume a large amount of effort. Minimizing waste in this area requires minimizing the amount of triage required and then designing processes and approaches that localize the source of issues as quickly as possible when triage is required. DevOps approaches minimize the amount of triage required by automating repetitive tasks for consistency. They also improve the efficiency of the triage process by moving to smaller batch sizes, so fewer changes need to be investigated as potential sources of an issue.
The waste with repetitive work is different. DevOps automates these repetitive tasks for three reasons. First, it addresses the obvious waste of doing something manually when it could be automated. Automation also enables the tasks to be run more frequently, which helps with batch sizes and thus the triage process. Second, it dramatically reduces the time associated with these manual tasks, so feedback cycles are much shorter, which helps to reduce the waste for new and unique work. Third, because the automated tasks are executed the same way every time, it reduces the amount of triage required to find manual mistakes or inconsistencies across environments.
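To make the batch-size argument concrete, here is a small arithmetic sketch of how deployment cadence changes the number of changes triage has to consider. The throughput of 10 merged changes per working day and the 20 working days per month are assumptions chosen purely for illustration. The bisection count uses a git-bisect style halving search, which is a technique swapped in here to show how quickly a smaller batch can be localized, not something the chapter prescribes.

```python
import math


def triage_candidates(changes_per_day: int, days_between_deploys: int) -> int:
    """Changes that landed since the last known-good deployment.

    Any one of them could be the source of a new issue, so this is the
    size of the search space triage starts from.
    """
    return changes_per_day * days_between_deploys


def bisection_steps(candidates: int) -> int:
    """Test runs needed to isolate a single bad change by halving the
    candidate set each time (git-bisect style search)."""
    return math.ceil(math.log2(candidates)) if candidates > 1 else 0


if __name__ == "__main__":
    # Assumed numbers: 10 changes per working day, ~20 working days per month.
    for label, days in [("monthly", 20), ("daily", 1)]:
        n = triage_candidates(10, days)
        print(f"{label}: {n} candidate changes, ~{bisection_steps(n)} bisection runs")
```

Under these assumptions, a monthly release leaves 200 candidate changes (about 8 halving runs), while a daily release leaves 10 (about 4), before even counting the extra interactions between changes in a large batch.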
DevOps practices are designed to help address these sources of waste, but with so many different places that need to be improved in large organizations, it is important to understand where to start. The first step is documenting the current DP and starting to collect data to help target the bottlenecks in flow and the biggest sources of waste. In this chapter we will walk through each step of the basic DP and review which metrics to collect to help you understand the magnitude of the issues you have at each stage. Then, we will describe the DevOps approaches people have found effective for addressing the waste at that stage. Finally, we will highlight the cultural changes that are required to get people to accept working differently.
This approach should help illustrate why so many different people have different definitions of DevOps. It really depends on what part of the elephant they are seeing. For any given organization, the constraint in flow may be the planning/requirements process, the development process, obtaining consistent environments, the testing process, or deploying code. Your view of the constraint also potentially depends on your role in the organization. While everything you hear about DevOps is typically valid, you can't simply copy the rituals, because they might not make sense for your organization. One organization's bottleneck is not another organization's bottleneck, so you must focus on applying the principles!
Requirements/Planning
Here we are talking about new and unique work, not repetitive work, so fixing it requires fast feedback and a focus on end-to-end cycle time to get the ultimate feedback from customers. For organizations trying to better understand the waste in the planning and requirements part of their DP, it is important to understand the data showing the inefficiencies. It may not be possible to collect all the data at first, but don't let this stop you from starting your improvements. As with all of the metrics we describe, get as much data as you can to target issues and start your continuous improvement process. It is more important to start improving than it is to get a perfect view of your current issues. Ideally, though, you would want to know the answers to the following questions:
- What percentage of the organization's capacity is spent on documenting requirements and planning?
- Roughly how much requirements inventory is waiting for development, in terms of days of supply?
- What percentage of the requirements are reworked after they are originally defined?
- What percentage of the delivered features are being used by customers and achieving the expected business results?
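As a starting point for answering these four questions, here is a minimal sketch that turns rough counts into the percentages and the days-of-supply figure. It assumes a hypothetical export from your tracking tool with one record per requirement (the `Requirement` fields are illustrative, not from any particular tool), plus rough effort and throughput numbers you estimate yourself.

```python
from dataclasses import dataclass


@dataclass
class Requirement:
    # Hypothetical record exported from a tracking tool; field names are assumptions.
    reworked: bool   # changed after it was originally defined
    delivered: bool  # has reached customers
    used: bool       # observed in use and meeting its expected business result


def percent(part: float, whole: float) -> float:
    """Percentage helper that returns 0.0 instead of dividing by zero."""
    return 100.0 * part / whole if whole else 0.0


def requirements_metrics(reqs, waiting, completed_per_day, planning_hours, total_hours):
    """Rough answers to the Requirements/Planning questions.

    waiting           -- requirements defined but not yet started
    completed_per_day -- average requirements the teams finish per day
    planning_hours    -- effort spent documenting requirements and planning
    total_hours       -- total effort of the organization over the same period
    """
    delivered = [r for r in reqs if r.delivered]
    return {
        "planning_capacity_pct": percent(planning_hours, total_hours),
        # Inventory expressed as days of development supply.
        "days_of_supply": waiting / completed_per_day if completed_per_day else float("inf"),
        "reworked_pct": percent(sum(r.reworked for r in reqs), len(reqs)),
        # Only delivered features can be judged on customer usage.
        "used_pct": percent(sum(r.used for r in delivered), len(delivered)),
    }
```

For example, 120 requirements waiting while the teams finish about 4 per day works out to roughly 30 days of supply. The numbers only need to be good enough to compare against the other stages of the DP.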
Environments
For many organizations, like the one described in Chapter 2, the time it takes for Operations to create an environment for testing is one of the lengthiest steps in the DP. Additionally, the testing environment is so inconsistent with production that a whole new set of issues must be found and fixed at each stage of testing in the DP. Creating these environments is one of the main repetitive tasks that can be documented, automated, and put under revision control. The objective here is to be able to quickly create environments that provide consistent results across the DP. This is done by moving to infrastructure as code, which has the additional advantage of documenting everything about the environments, so it is easier for different parts of the organization to track and collaborate on changes. To better understand the impact environment issues are having on your DP, it would be helpful to have the following data:
- time from environment request to delivery
- how frequently new environments are required
- the percent of time environments need fixing before acceptance
- the percent of defects associated with code vs. environment vs. deployment vs. database vs. other at each stage in the DP
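The environment data above can be computed from fairly simple records. This is a minimal sketch that assumes you can list each environment request with its request and delivery timestamps, a flag for whether it needed fixing before acceptance, and each defect tagged with the stage where it was found and its root-cause source. All function names and record shapes are hypothetical.

```python
from collections import Counter
from datetime import datetime
from statistics import median


def lead_times_days(requests):
    """requests: (requested_at, delivered_at) datetime pairs, one per environment."""
    return [(delivered - requested).total_seconds() / 86400 for requested, delivered in requests]


def median_lead_time_days(requests):
    times = lead_times_days(requests)
    return median(times) if times else 0.0


def environments_per_week(request_count, weeks):
    """How frequently new environments are required."""
    return request_count / weeks if weeks else 0.0


def pct_needing_fixes(needed_fix):
    """needed_fix: one bool per delivered environment (True = fixed before acceptance)."""
    flags = list(needed_fix)
    return 100.0 * sum(flags) / len(flags) if flags else 0.0


def defect_sources_by_stage(defects):
    """defects: (stage, source) pairs, e.g. ("system test", "environment").

    Returns {stage: {source: percent of that stage's defects}}.
    """
    by_stage = {}
    for stage, source in defects:
        by_stage.setdefault(stage, Counter())[source] += 1
    return {
        stage: {src: 100.0 * n / sum(counts.values()) for src, n in counts.items()}
        for stage, counts in by_stage.items()
    }


if __name__ == "__main__":
    # Made-up sample data for illustration.
    sample = [(datetime(2024, 1, 1), datetime(2024, 1, 11)),
              (datetime(2024, 1, 3), datetime(2024, 1, 5))]
    print(median_lead_time_days(sample), "days median lead time")
```

Using the median rather than the mean for lead time is a design choice here: one environment stuck for weeks would otherwise dominate the number.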
Testing
The testing, debugging, and defect-fixing stage of the DP is a big source of inefficiency for many organizations. To understand the magnitude of the problem for your DP, it would be helpful to have the following data:
- the time it takes to run the full set of tests
- the repeatability of the testing (false failures)
- the percent of defects found with unit tests, automated system tests, and manual tests
- the time it takes the release branch to meet production quality
- approval times
- batch sizes or release frequency at each stage
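Repeatability is usually the hardest of these to measure. One way to approximate it, sketched below, is to treat a failure as false when the same test also passed on the same commit, meaning the result changed with no code change. That definition, the `(test_name, commit, passed)` record shape, and the detection labels are assumptions for illustration; your test tooling may record this differently.

```python
from collections import Counter, defaultdict


def flaky_tests(runs):
    """runs: (test_name, commit, passed) tuples.

    A test whose result changed on the same commit (it both passed and
    failed with no code change) is treated as a source of false failures.
    """
    outcomes = defaultdict(set)
    for test, commit, passed in runs:
        outcomes[(test, commit)].add(passed)
    return sorted({test for (test, _commit), seen in outcomes.items() if len(seen) == 2})


def false_failure_rate(runs):
    """Percent of failing runs whose test also passed on the same commit."""
    runs = list(runs)
    passed_on = {(test, commit) for test, commit, passed in runs if passed}
    failures = [(test, commit) for test, commit, passed in runs if not passed]
    if not failures:
        return 0.0
    return 100.0 * sum(f in passed_on for f in failures) / len(failures)


def defects_by_detection(labels):
    """labels: how each defect was found, e.g. "unit", "automated system", "manual"."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: 100.0 * n / total for label, n in counts.items()} if total else {}
```

A high false failure rate means engineers spend triage time on failures that carry no information, which undermines trust in the automated tests before the real defects are even considered.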
Production Release
The next step in the basic DP is the release into production. Ideally, you would have found and fixed all the issues in the test stage so that this is a fairly automated and simple process. Realistically, this is not the case for most organizations. To better understand the source and magnitude of the issues at this stage, it is helpful to look at the following metrics:
- the time and effort required to deploy and release into production
- the number of issues found during release and their source (code, environment, deployment, test, data, etc…)
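A minimal sketch for summarizing these two release metrics across several releases follows. It assumes a hypothetical per-release record of duration, headcount, and the sources of any issues found. Effort is approximated as duration times people involved, which assumes everyone was engaged for the whole release window; adjust if you track actual hours.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Release:
    # Hypothetical record of one production release.
    duration_hours: float  # wall-clock time from start of deploy to release
    people: int            # people involved in the release
    issue_sources: list = field(default_factory=list)  # e.g. ["environment", "data"]


def release_summary(releases):
    if not releases:
        return {}
    n = len(releases)
    issues = Counter(src for r in releases for src in r.issue_sources)
    return {
        "avg_duration_hours": sum(r.duration_hours for r in releases) / n,
        # Assumption: everyone involved works the full release window.
        "avg_effort_person_hours": sum(r.duration_hours * r.people for r in releases) / n,
        "issues_per_release": sum(issues.values()) / n,
        "issues_by_source": dict(issues.most_common()),
    }
```

Grouping issues by source points toward the fix: recurring environment or deployment issues suggest more automation upstream, while code issues suggest gaps in the test stage.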
Operation and Monitoring
The final step is operating and monitoring the code to make sure it is working as expected in production. The primary metrics to monitor here are:
- issues found in production
- time to restore service
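Time to restore service is typically tracked as an average over incidents. This minimal sketch assumes each production incident is recorded as a pair of timestamps (when service was impacted, when it was restored). Normalizing production issues per release is a hypothetical choice made here so that issue counts stay comparable as release frequency goes up.

```python
from datetime import datetime, timedelta


def restore_times(incidents):
    """incidents: (started_at, restored_at) datetime pairs."""
    return [restored - started for started, restored in incidents]


def mean_time_to_restore(incidents) -> timedelta:
    times = restore_times(incidents)
    # sum() needs a timedelta start value; timedelta / int is supported.
    return sum(times, timedelta()) / len(times) if times else timedelta()


def issues_per_release(issue_count: int, release_count: int) -> float:
    """Production issues normalized by the number of releases in the period."""
    return issue_count / release_count if release_count else 0.0


if __name__ == "__main__":
    # Made-up incidents: one took 1 hour to restore, the other 3 hours.
    sample = [(datetime(2024, 3, 1, 9), datetime(2024, 3, 1, 10)),
              (datetime(2024, 3, 2, 14), datetime(2024, 3, 2, 17))]
    print(mean_time_to_restore(sample))
```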
Summary
This simple construct of a DP with a single developer does a good job of introducing the concepts and shows how the DevOps changes can help to improve flow. The metrics are also very useful for targeting where to start improving the pipeline. It is important to look across all the metrics in the DP to ensure you start this work with the bottleneck and/or the biggest source of waste because transforming your development and deployment processes is going to take some time, and you want to start seeing the benefits of these changes as soon as possible. This can only occur if you start by focusing on the biggest issues for your organization. The metrics are intended to help identify these bottlenecks and waste in order to gain a common understanding of the issues across your organization so you can get everyone aligned on investing in the improvements that will add the most value out of the gate.
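One simple way to start the "where is the bottleneck" conversation is to rank the stages by how much elapsed time each adds to a change's trip through the DP. The sketch below assumes you have already turned the stage metrics into a rough days-per-change figure for each stage; the stage names and numbers in the example are made up. Elapsed time is only one lens, since the biggest source of waste may instead show up in rework, false failures, or release issues.

```python
def rank_stages(stage_days):
    """stage_days: {stage name: average days a change spends in that stage}.

    Returns (stage, days, percent of total) tuples, largest first, so the
    top entry is the first candidate bottleneck to investigate.
    """
    total = sum(stage_days.values())
    ranked = sorted(stage_days.items(), key=lambda item: item[1], reverse=True)
    return [(stage, days, 100.0 * days / total if total else 0.0) for stage, days in ranked]


if __name__ == "__main__":
    # Made-up numbers for illustration only.
    example = {
        "requirements/planning": 30,
        "development": 10,
        "environments": 14,
        "testing": 21,
        "production release": 3,
    }
    for stage, days, share in rank_stages(example):
        print(f"{stage:<24} {days:>4} days  {share:5.1f}%")
```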