The Two “Falling” Approaches
In this section, we’ll learn about two ways to recover from a failed software deployment. After finishing this section, you’ll be able to weigh these two approaches and make an informed decision about how to recover from a bad deployment.
In a standard pipeline, companies sometimes experience software glitches when deploying to a web server. Users may see an error message when they perform an action on the website.
What do you do when the software doesn’t work as expected? How does this work in the DevOps pipeline?
Every time you build software, there’s a chance something could go wrong, so you need a recovery plan in place before the software is deployed.
Let’s cover the two types of recovery methods we can use when software deployments don’t succeed.
Falling Backward (or fallback)
If a release introduces bugs that the previous version doesn’t appear to have, it makes sense to revert the software, or fall back, to the previous version.
The final stage of a pipeline produces artifacts, which are self-contained, deployable versions of your product.
Here is an example of falling backward:
- Your software deployment was a success last week and was marked as version 1.1 (v1.1).
- Over 2 weeks, development created two new features for the software and wanted to release them as soon as possible.
- A new build was created and released as version 1.3 (v1.3).
- While users were using the latest version (v1.3), they experienced issues with one of the new features, causing the website to show errors.
- Since the previous version (v1.1) doesn’t have this issue and the impact is not severe, developers can redeploy v1.1 to the server so that users can be productive again.
This type of release is called falling backward.
Because the pipeline keeps each artifact, replacing the current version (v1.3) with a previous version (v1.1) is straightforward: you identify the last known-good artifact and redeploy it. Databases are the exception, which I’ll cover in a bit.
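To make the idea of identifying the last known-good artifact concrete, here is a minimal Python sketch. The `Artifact` record, its `healthy` flag, and the helper names are hypothetical and stand in for whatever metadata your artifact store actually keeps. The v1.0 entry is an extra illustrative release that is not part of the example above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Artifact:
    version: str   # e.g. "1.1"
    healthy: bool  # hypothetical flag: True if the release ran without known issues


def version_key(version: str) -> tuple[int, ...]:
    """Turn "1.10" into (1, 10) so versions compare numerically, not as text."""
    return tuple(int(part) for part in version.split("."))


def last_known_good(artifacts: list[Artifact], current: str) -> Optional[Artifact]:
    """Return the newest healthy artifact that is older than the current version."""
    candidates = [
        a for a in artifacts
        if a.healthy and version_key(a.version) < version_key(current)
    ]
    return max(candidates, key=lambda a: version_key(a.version), default=None)


# Release history mirroring the example above (v1.0 is an assumed extra entry).
releases = [
    Artifact("1.0", healthy=True),
    Artifact("1.1", healthy=True),
    Artifact("1.3", healthy=False),  # the release that introduced the errors
]

target = last_known_good(releases, current="1.3")
print(target.version if target else "no fallback available")  # prints 1.1
```

The sketch only selects the artifact to redeploy. Restoring any database changes that shipped with the failed release is a separate step.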
Falling Forward
If the fallback approach isn’t a viable recovery strategy, the alternative is to fall forward.
When falling forward, the product team accepts the deployment with its errors (warts and all) and keeps moving forward with newer releases. The team places a high priority on these errors and commits to fixing them in the next release or a future one.
Here is a similar example of falling forward:
- Again, a software deployment was successful last week and was marked as version 1.5 (v1.5).
- Over another 2 weeks, development created another new large feature for the software.
- A new build was created and released as version 1.6 (v1.6).
- While users were using the latest version (v1.6), they experienced issues with one of the new features, causing the website to show errors.
- After analysis, the developers realized this was a “quick fix,” created the proper unit tests to show it was fixed, pushed a new release through the pipeline, and immediately deployed the fixed code in a new release (v1.7).
This type of release is called falling forward.
The product team may have to examine each error and decide which recovery method best protects the product’s reputation.
For example, if the issue lies in product features such as business logic or user interface updates, the best recovery method may be to fall forward, since the impact on the system is minimal and users can continue working productively without interruption.
However, if code and database updates are involved, the better approach would be to fall back – that is, restore the database and use a previous version of the artifact.
If it’s a critical feature and reverting is not an option, a “hotfix” approach (as mentioned in the previous chapter) may be required to patch the software.
Again, the best recovery strategy depends on the impact each issue has on the system.
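The guidelines above can be sketched as a small decision helper. This is an illustration only: the `Issue` fields and the `choose_recovery` function are hypothetical names, and the three yes/no inputs are a deliberate simplification of what is, in practice, a judgment call by the product team.

```python
from dataclasses import dataclass
from enum import Enum


class Recovery(Enum):
    FALL_BACK = "fall back"
    FALL_FORWARD = "fall forward"
    HOTFIX = "hotfix"


@dataclass(frozen=True)
class Issue:
    touches_database: bool  # the failed release included database updates
    critical: bool          # the affected feature is critical
    can_revert: bool        # a previous artifact (and database backup) is available


def choose_recovery(issue: Issue) -> Recovery:
    """Suggest a recovery method; a simplified reading of the guidelines above."""
    # Critical feature and reverting is not an option: patch in place.
    if issue.critical and not issue.can_revert:
        return Recovery.HOTFIX
    # Code and database updates are involved: restore and redeploy.
    if issue.touches_database and issue.can_revert:
        return Recovery.FALL_BACK
    # Business logic or UI issues with minimal impact: fix in a new release.
    return Recovery.FALL_FORWARD


ui_bug = Issue(touches_database=False, critical=False, can_revert=True)
schema_bug = Issue(touches_database=True, critical=False, can_revert=True)
stuck_bug = Issue(touches_database=True, critical=True, can_revert=False)

for issue in (ui_bug, schema_bug, stuck_bug):
    print(choose_recovery(issue).value)
```

A helper like this is best treated as a starting point for the conversation, not as a replacement for assessing each issue’s real impact.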
In this section, we learned about two ways to recover from unsuccessful software deployments: falling backward and falling forward. Neither option is mandatory, so each approach should be weighed carefully based on the error type, the recovery time of the fix, and the software’s deployment schedule.