Running a company with a DevOps culture is all about adopting the right culture to allow developers and the operations team to work together. A DevOps culture advocates the implementation of several engineering best practices, by relying on tools and technologies that you will discover throughout this book.
Adopting a DevOps culture
The origin of DevOps
DevOps is a new movement that officially started in Belgium in 2009, when a group of people met at the first DevOpsdays conference, organized by Patrick Debois, to discuss how to apply some agile concepts to infrastructure. Agile methodologies transformed the way software is developed. In a traditional waterfall model, the product team would come up with specifications; the design team would then create and define a certain user experience and user interface; the engineering team would then start to implement the requested product or feature, and would then hand off the code to the QA team, who would test and ensure that the code behaved correctly, according to the design specifications. Once all the bugs were fixed, a release team would package the final code, which would be handed off to the technical operations team, to deploy the code and monitor the services over time:
The increasing complexity of developing certain software and technologies showed some limitations with this traditional waterfall pipeline. The agile transformation addressed some of these issues, allowing for more interaction between the designers, developers, and testers. This change increased the overall quality of the product, as these teams now had the opportunity to iterate more on product development. However, apart from this, you would still be in a very classical waterfall pipeline, as follows:
All of the agility added by this new process didn't extend past the QA cycles, and it was time to modernize this aspect of the software development life cycle. This foundational change with the agile process which allows for more collaboration between the designers, developers, and QA teams, is what DevOps was initially after, but very quickly, the DevOps movement started to rethink how developers and operations teams could work together.
The developers versus operations dilemma
In a non-DevOps culture, developers are in charge of developing new products and features and maintaining the existing code, but ultimately, they are rewarded when their code is shipped. The incentive is to deliver as quickly as possible. On the other hand, the operations team, in general, is responsible for maintaining the uptime of the production environment. For these teams, change is a negative thing. New features and services increase the risk of having an outage, and therefore, it is important to move with caution. To minimize the risk of outages, operations teams usually have to schedule any deployments ahead of time, so that they can stage and test any production deployment and maximize their chances of success. It is also very common for enterprise software companies to schedule maintenance windows, and, in these cases, production changes can only be made a few times a quarter, half-yearly, or once a year. Unfortunately, many times, deployments won't succeed, and there are many possible reasons for that.
Too much code changing at once
There is a correlation that can be made between the size of the change and the risk of introducing critical bugs into the product, as follows:
Differences in the production environment
It is often the case that the code produced by developers works fine in a development environment, but not in production. A lot of the time, this is because the production environment is very different from other environments, and some unforeseen errors occur. The common mistakes involve the development environment, because services are collocated on the same servers, or there isn't the same level of security. As a consequence, services can communicate with one another in development, but not in production. Another issue is that the development environment might not run the same versions of a certain library/software, and therefore, the interface to communicate with them might differ. The development environment may be running a newer version of a service, which has new features that the production doesn't have yet; or it could be simply a question of scale. Perhaps the dataset used in development isn't as big as that of production, and scaling issues will crop up once the new code is out in production.
Communication
One of the biggest dilemmas in information technology is miscommunication.
The following is according to Conway's Law:
                                                                             —Melvin Conway
In other words, the product that you are building reflects the communication of your organization. A lot of the time, problems don't come from the technology, but from the people and organizations surrounding the technology. If there is dysfunction among your developers and operations team in the organization, this will show. In a DevOps culture, developers and operations have a different mindset. They help to break down the silos that surround those teams, by sharing responsibilities and adopting similar methodologies to improve productivity. Together, they try to automate whatever is possible (not everything, as not everything can be automated in a single go) and use metrics to measure their success.
Key characteristics of a DevOps culture
As we have noted, a DevOps culture relies on a certain number of principles. These principles are to source control (version control) everything, automate whatever is possible, and measure everything.
Source control everything
Revision control software has been around for many decades now, but too often, only the product code is checked. When practicing DevOps, not only is the application code checked, but configurations, tests, documentation, and all of the infrastructure automation needed to deploy the application in all environments, are also checked. Everything goes through the regular review process by the Source Code Manager (SCM).
Automating testing
Automated software testing predates the history of DevOps, but it is a good starting point. Too often, developers focus on implementing features and forget to add a test to their code. In a DevOps environment, developers are responsible for adding proper testing to their code. QA teams can still exist; however, similar to other engineering teams, they work on building automation around testing.
This topic could fill its own book, but in a nutshell, when developing code, keep in mind that there are four levels of testing automation to focus on, in order to successfully implement DevOps:
- Unit testing: This is to test the functionality of each code block and function.
- Integration testing: This is to make sure that services and components work together.
- User interface testing: This is often the most challenging component to successfully implement.
- System testing: This is end-to-end testing. For example, in a photo- sharing application, the end-to-end testing could be to open the home page, sign in, upload a photo, add a caption, publish the photo, and then sign out.
Automating infrastructure provisioning and configuration
In the last few decades, the size of the average infrastructure and the complexity of the stack have skyrocketed. Managing infrastructure on an ad-hoc basis, as was once possible, is very error-prone. In a DevOps culture, the provisioning and configuration of servers, networks, and services in general, are performed through automation. Configuration management is often what the DevOps movement is known for. However, as you know, this is just a small piece of a big puzzle.
Automating deployment
As you now, it is easier to write software in small chunks and deploy the new chunks as soon as possible, to make sure that they are working. To get there, companies practicing DevOps rely on continuous integration and continuous deployment pipelines. Whenever a new chunk of code is ready, the continuous integration pipeline kicks off. Through an automated testing system, the new code is run through all of the relevant, available tests. If the new code shows no obvious regression, it is considered valid and can be merged to the main code base. At that point, without further involvement from the developer, a new version of the service (or application) that includes those new changes will be created and handed off to a system called a continuous deployment system. The continuous deployment system will take the new builds and automatically deploy them to the different environments that are available. Depending on the complexity of the deployment pipeline, this might include a staging environment, an integration environment, and sometimes, a pre-production environment. Ultimately, if everything goes as planned (without any manual intervention), this new build will get deployed to production.
One aspect about practicing continuous integration and continuous deployment that often gets misunderstood is that new features don't have to be accessible to users as soon as they are developed. In this paradigm, developers heavily rely on feature flagging and dark launches. Essentially, whenever you develop new code and want to hide it from the end users, you set a flag in your service configuration to describe who gets access to the new feature, and how. At the engineering level, by dark launching a new feature this way, you can send production traffic to the service, but hide it from the UI, to see the impact it has on your database or on performance, for example. At the product level, you can decide to enable the new feature for only a small percentage of your users, to see if the new feature is working correctly and if the users who have access to the new feature are more engaged than the control group, for example.
Measuring everything
Measuring everything is the last major principle that DevOps-driven companies adopt. As Edwards Deming said, you can't improve what you can't measure. DevOps is an ever-evolving process and methodology that feeds off those metrics to assess and improve the overall quality of the product and the team working on it. From a tooling and operating standpoint, the following are some of the metrics most organizations look at:
- How many builds are pushed to production a day
- How often you need to roll back production in your production environment (this is indicated when your testing didn't catch an important issue)
- The percentage of code coverage
- The frequency of alerts resulting in paging the on-call engineers for immediate attention
- The frequency of outages
- Application performance
- The Mean Time to Resolution (MTTR), which is the speed at which an outage or a performance issue can be fixed
At the organizational level, it is also interesting to measure the impact of shifting to a DevOps culture. While this is a lot harder to measure, you can consider the following points:
- The amount of collaboration across teams
- Team autonomy
- Cross-functional work and team efforts
- Fluidity in the product
- How often Dev and Ops communicate
- Happiness among engineers
- Attitudes towards automation
- Obsession with metrics
As you just learned, having a DevOps culture means, first of all, changing the traditional mindset that developers and operations are two separate silos, and making the teams collaborate more, during all phases of the software development life cycle.
In addition to a new mindset, DevOps culture requires a specific set of tools geared toward automation, deployment, and monitoring:
With AWS, Amazon offers a number of services of the PaaS and SaaS types that will let us do just that.