Operations and infrastructure engineering in 2019: what really mattered

Everything is unreliable, right? If we didn’t realise it before, 2019 was the year when we fully had to accept the reality of the systems we’re building and managing. That was scary, sure, but it was also liberating. But we shouldn’t get carried away: given how highly distributed software systems are now part and parcel in a range of different industries, the issue of reliability and resilience isn’t purely an academic issue: in many instances, it’s urgent and critical.

That makes the work of managing and building software infrastructure an incredibly vital role. Back in 2015 I wrote that Docker had turned us all into SysAdmins, but on reflection it may be more accurate to say that we’ve now entered a world where cloud and the infrastructure-as-code revolution has turned everyone into a software developer.

Kubernetes is everywhere

Kubernetes is arguably the definitive technology of 2019. With the move to containers now fully mainstream, Kubernetes is an integral in helping engineers to deploy and manage containers at scale. The other important element to Kubernetes is that it all but kills off dreaded infrastructure lock-in. It gives you the freedom to build across different environments, and inside a more heterogeneous software infrastructure. From a tooling and skill set perspective that’s a massive win.

Although conversations about flexibility and agility have been ongoing in the tech industry for years, with Kubernetes we are finally getting to a place where that’s a reality. This isn’t to say it’s all plain sailing - Kubernetes’ complexity is a point of complaint for many, with many people suggesting that compared to, say, Docker, the developer experience leaves a lot to be desired.

But insofar as DevOps and cloud-native have almost become the norm for many engineering teams, Kubernetes casts a huge shadow. Indeed, even if it’s not the right option for you right now, it’s hard to escape the fact that understanding it, and being open to using it in the future, is crucial.

Find an extensive range of Kubernetes content in our new cloud bundles.

Serverless and NoOps

This year serverless has really come into its own. Although it was certainly gaining traction in 2018, the last 12 months have demonstrated its value as more and more teams have been opting to forgo servers completely.

There have been a few arguments about whether serverless is going to kill off containers. It’s not hard to see where this comes from, but in reality there’s no chance that this is going to happen. The way to think of serverless is to see it as an additional option that can be used when speed and agility are particularly important. For large-scale application development and deployment, containers running on ‘traditional’ cloud servers will be the dominant architectural approach.

The companion trend to serverless is NoOps. Given the level of automation and abstraction that serverless can give you, the need to configure environments to ensure code runs properly all but disappears - code runs through ‘functions’ that get fired when needed. So, the thinking goes, the need for operations becomes very small indeed.

But before anyone starts worrying about their jobs, the death of operations is greatly exaggerated. As noted above, serverless is just one option - it’s not redefining the architectural landscape. It might mean that the way we understand ‘ops’ evolves (just as ‘dev’ has), but it certainly won’t kill it off.

Discover and search serverless eBooks and videos on the Packt store.

Chaos engineering

In the introduction I mentioned that one of the strange quandaries of our contemporary distributed software world is that we’ve essentially made things more unreliable at a time when software systems are being used in ever more critical applications. From healthcare to self-driving cars, we’re entering a world where unreliability is both more common and potentially more damaging.

This is where chaos engineering comes in. Although it first appeared on ThoughtWorks Radar back in November 2017 and hasn’t yet moved out of its ‘Trial’ quadrant, in reality chaos engineering has been manifesting itself in a whole host of ways in 2019. Indeed, it’s possible that the term itself is misleading. While it suggests a wholesale methodology, in truth, there are different ways in which the core principles behind it - essentially stress-testing your software in order to manage unpredictability and improve resilience - are being used in different ways for both testing and security purposes.

Tools like Gremlin have done a lot to help promote chaos engineering and make it more accessible to organizations that maybe wouldn't see themselves as having the resources to perform cutting-edge approaches. It appears the ground-work has been done, which means it will be interesting to see how it evolves in 2020.

Observability: service meshes and tracing

One of the biggest challenges when dealing with complex software systems - and one of the reasons why they are necessarily unreliable - is because it can be difficult (sometimes impossible) to get an understanding of what’s actually going on.

This is why the debate around observability and monitoring has moved on. It’s no longer enough to have a set of discrete logs and metrics. Chances are that they won’t capture the subtleties of what’s happening, or won’t be able to provide you with context that helps you to actually understand where errors are coming from.

What’s more, a lack of observability and the wrong monitoring set up can cause all sorts of issues inside a team. At a time when the role of the on call developer has never been more discussed and, indeed, important, ensuring there’s a level of transparency is the only way to guarantee that all developers are able to support each other and solve problems as they emerge. From this perspective, then, observability has a cultural impact as much as it does a technical one. operations-and-infrastructure-engineering-in-2019-what-really-mattered-img-0

Learn distributed tracing with Yuri Shkuro from Uber's observability engineering team: find Mastering Distributed Tracing on the Packt store.

Not sure what to learn for 2020? Start exploring thousands of tech eBooks and videos on the Packt store.