Search icon CANCEL
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Conferences
Free Learning
Arrow right icon
Arrow up icon
GO TO TOP
The DevOps 2.2 Toolkit

You're reading from   The DevOps 2.2 Toolkit Self-Sufficient Docker Clusters

Arrow left icon
Product type Paperback
Published in Mar 2018
Publisher Packt
ISBN-13 9781788991278
Length 360 pages
Edition 1st Edition
Tools
Concepts
Arrow right icon
Author (1):
Arrow left icon
Viktor Farcic Viktor Farcic
Author Profile Icon Viktor Farcic
Viktor Farcic
Arrow right icon
View More author details
Toc

Table of Contents (18) Chapters Close

Preface 1. Introduction to Self-Adapting and Self-Healing Systems FREE CHAPTER 2. Choosing a Solution for Metrics Storage and Query 3. Deploying and Configuring Prometheus 4. Scraping Metrics 5. Defining Cluster-Wide Alerts 6. Alerting Humans 7. Alerting the System 8. Self-Healing Applied to Services 9. Self-Adaptation Applied to Services 10. Painting the Big Picture – The Self-Sufficient System Thus Far 11. Instrumenting Services 12. Self-Adaptation Applied to Instrumented Services 13. Setting Up a Production Cluster 14. Self-Healing Applied to Infrastructure 15. Self-Adaptation Applied to Infrastructure 16. Blueprint of a Self-Sufficient System 17. Other Books You May Enjoy

What is a self-healing system?

A self-healing system needs to be adaptive. Without the capability to adapt to the changes in the environment, we cannot self-heal. While adaptation is more permanent or longer lasting, healing is a temporary action. Take a number of requests as an example. Let's imagine that it increased permanently because now we have more users or because the new design of the UI is so good that users are spending more using our frontend. As a result of such an increase, our system needs to adapt and permanently (or, at least, longer lastingly) increase the number of replicas of our services. That increase should match the minimum expected load. Maybe we run five replicas of our shopping cart, and that was enough in most circumstances but, since our number of users increased, the number of instances of the shopping cart needs to increase to, let's say, ten replicas. It does not need to be a fixed number. It can, for example, vary from seven (lowest expected load) to twelve (highest expected load).

Self-healing is a reaction to unexpected and has a temporary nature. Take us (humans) as an example. When a virus attacks us, our body reacts and fights it back. Once the virus is annihilated, the state of the emergency ceases and we go back to the normal state. It started with a virus entering and ended once it's removed. A side effect is that we might adapt during the process and permanently create a better immune system. We can apply the same logic to our clusters. We can create processes that will react to external threats and execute reactive measures. Some of those measures will be removed as soon as the threat is gone while others might result in permanent changes to our system.

Self-healing does not always work. Both us (humans) and software systems sometimes need external help. If all else fails, and we cannot self-heal ourselves and eliminate the problem internally, we might go to a doctor. Similarly, if a cluster cannot fix itself it should send a notification to an operator who will, hopefully, be able to fix the problem, write a post-mortem, and improve the system so that the next time the same problem occurs it can self-heal itself.

This need for an external help outlines an effective way to build a self-healing system. We cannot predict all the combinations that might occur in a system. However, what we can do is make sure that when unexpected happens, it is not unexpected for long. A good engineer will try to make himself obsolete. He will try to do the same action only once, and the only way to accomplish that is through an ever-increasing level of automated processes. Everything that is expected should be scripted and fall into self-adapting and self-healing processes executed by the system. We should react only when unexpected happens.

You have been reading a chapter from
The DevOps 2.2 Toolkit
Published in: Mar 2018
Publisher: Packt
ISBN-13: 9781788991278
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at €18.99/month. Cancel anytime