In the last few years, we have gone through a revolution in IT: it sparkled from pure IT companies to all the sectors: retail, banking, finance, and so on. This has led to a number of small companies called start-ups, which are basically a number of individuals who had an idea, executed it, and went to the market in order to sell the product or the service to a global market (usually). Companies such as Amazon or Alibaba, not to mention Google, Apple, Stripe or even Spotify, have gone from the garage of one of the owners to big companies employing thousands of people.
One thing in common in the initial spark with these companies has always been corporate inefficiency: the bigger the company, the longer it takes to complete simple tasks.
Example of corporate inefficiency graph
This phenomenon creates a market on its own, with a demand that cannot be satisfied with traditional products. In order to provide a more agile service, these start-ups need to be cost-effective. It is okay for a big bank to spend millions on its currency exchange platform, but if you are a small company making your way through, your only possibility against a big bank is to cut costs by automation and better processes. This is a big drive for small companies to adopt better ways of doing things, as every day that passes is one day closer to running out of cash, but there is a bigger drive for adopting DevOps tools: failure.
Failure is a natural factor for the development of any system. No matter how much effort we put in, failure is always there, and at some point, it is going to happen.
Usually, companies are quite focused on removing failure, but there is a unwritten rule that is keeping them from succeeding: the 80-20 rule:
- It takes 20% of time to achieve 80% of your goals. The remaining 20% will take 80% of your time.
Spending a huge amount of time on avoiding failure is bound to fail, but luckily, there is another solution: quick recovery.
Up until now, in my work experience, I have only seen one company asking "what can we do if this fails at 4 A.M. in the morning?" instead of "what else can we do to avoid this system from failing?", and believe me, it is a lot easier (especially with the modern tools) to create a recovery system than to make sure that our systems won't go down.
All these events (automation and failure management) led to the development of modern automation tools that enabled our engineers to:
- Automate infrastructure and software
- Recover from errors quickly