Data management strategies
There are many strategies out there, and we will need to use most of them to meet and hopefully exceed our customers’ expectations. Reading this book, you will learn about some of the key data management strategies at length. For now, however, I would like to bring six of these techniques to your attention. We will take a much closer look at each of these in the upcoming chapters:
- Bring your data closer: The closer the data is to users, the faster they can access it. Yes, it may sound obvious, but users can be anywhere in the world, and they might even be traveling while trying to access their data. These details do not matter to them; their expectations remain the same wherever they are.
There are many different ways to keep data physically close. One of the most successful strategies is edge computing, a distributed computing paradigm that brings computation and data storage closer to the sources of data. This is expected to improve response times and save bandwidth. Edge computing is an architecture (and a topology) rather than a specific technology, and is a location-sensitive form of distributed computing.
The other obvious strategy is to use the closest data center available when working with a cloud provider. AWS, for example, spans 96 Availability Zones within 30 geographic Regions around the world as of 2022. Google Cloud offers a similar footprint, with 106 zones across 35 regions as of 2023.
Leveraging the nearest physical location can greatly decrease your latency and therefore improve your customer experience.
- Reduce the length of your data journey: Again, this one is obvious. Avoid unnecessary steps so that the journey between the end user and their data is as short as possible. Usually, the shortest journey will be the fastest (it’s not always that simple, but it holds as a best practice). The more actions you perform to retrieve the required information, the more computational power you use, which directly increases the cost of the operation. Each extra step also adds complexity and, most of the time, latency.
- Choose the right database solutions: There are many database solutions out there, and you can categorize them by type (relational versus non-relational, or NoSQL), by distribution (centralized versus distributed), and so on. Each category has many sub-categories, and each can offer a unique set of solutions to your particular use case. It’s really hard to find the right tool for the job, considering that requirements are always changing. We will dive deeper into each type of system and their pros and cons a bit later in this book.
- Apply clever analytics: Analytical systems, if applied correctly, can be a real game changer in terms of optimization, speed, and security. Analytics tools are there to help develop insights and understand trends and can be the basis of many business and operational decisions. Analytical services are well placed to provide the best performance and cost for each analytics job. They also automate many of the manual and time-consuming tasks involved in running analytics, all with high performance, so that customers can quickly gain insights.
- Leverage machine learning (ML) and artificial intelligence (AI) to try to predict the future: ML and AI are critical for a modern data strategy to help businesses and customers predict what will happen in the future and build intelligence into their systems and applications. With the right security and governance controls combined with AI and ML capabilities, you can automate decisions about where data is physically located, who has access to it, and what can be done with it at every step of the data journey. This will enable you to maintain the highest standards and the best performance when it comes to data management.
- Scale on demand: The aforementioned strategies are underpinned by the method you choose to operate your systems. This is where DevOps (and SRE) plays a crucial part and can be the deciding factor between success and failure. All major cloud providers offer literally hundreds of platform choices for virtually every workload (AWS offered 475 instance types at the end of 2022). Most major businesses have a very “curvy” utilization trend, with demand rising and falling sharply over time, which is why they find the on-demand offering of the cloud very attractive from a financial point of view.
You should only pay for resources when you need them and pay nothing when you don’t. This is one of the big benefits of using cloud services. However, this model only works in practice if the correct design and operational practices and the right automation and compatible tooling are utilized.
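To make the "scale on demand" idea more concrete, the following is a minimal sketch of a proportional scaling rule: capacity is adjusted so that the average load per unit returns to a chosen target. The function name `desired_capacity`, the `min_units` and `max_units` bounds, and the 60% target used in the usage lines are all hypothetical choices made for illustration. Real autoscalers typically add cooldown periods, smoothing, and other safeguards that are omitted here.

```python
import math


def desired_capacity(
    current_units: int,
    observed_load: float,
    target_load: float,
    min_units: int = 1,
    max_units: int = 64,
) -> int:
    """Return how many capacity units would bring per-unit load back to target.

    observed_load and target_load are the average load per unit, in the same
    measure (for example, CPU utilization in percent). All names and default
    bounds are illustrative assumptions, not any provider's API.
    """
    if current_units < 1:
        raise ValueError("current_units must be at least 1")
    if target_load <= 0:
        raise ValueError("target_load must be positive")

    # Scale in proportion to how far the observed load is from the target,
    # rounding up so that we never under-provision.
    raw = math.ceil(current_units * observed_load / target_load)

    # Clamp to the configured bounds so a spike cannot scale without limit
    # and a quiet period cannot scale below the minimum.
    return max(min_units, min(max_units, raw))


# Usage: 4 units running at 90% load against a 60% target should grow,
# and the same 4 units at 15% load should shrink.
print(desired_capacity(4, 90, 60))
print(desired_capacity(4, 15, 60))
```

The upper bound matters as much as the rule itself: without it, a misbehaving metric could drive capacity, and therefore cost, far beyond what the workload needs.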
A real-life example
A leading telecommunications company was set to unveil its most anticipated device of the year at precisely 2 P.M., a detail well publicized to all customers. As noon approached, its online store saw typical levels of traffic. By 1 P.M., it was slightly above average. However, a surge of customers flooded the site just 10 minutes before the launch, aiming to be among the first to secure the new phone. By the time the clock struck 2 P.M., the website had shattered previous records for unique visitors. In the 20 minutes from 1:50 P.M. to 2:10 P.M., the visitor count skyrocketed, increasing twelvefold.
This influx triggered an automated scaling event that expanded the company’s infrastructure from its baseline (designated as 1x) to an unprecedented 32x. Remarkably, this massive scaling was needed only for the initial half-hour. After that, it scaled down to 12x by 2:30 P.M., further reduced to 4x by 3 P.M., and returned to its baseline of 1x by 10 P.M.
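The scaling timeline above can be turned into rough arithmetic. The sketch below compares the capacity consumed by scaling on demand with the capacity that would be needed if the peak (32x) were provisioned for the whole day. It rests on simplifying assumptions that are not in the original account: capacity changes in sharp steps at the stated times, the scale-up to 32x starts at 1:50 P.M., and the rest of the day runs at baseline. The `Step` type and the `SCHEDULE` values are hypothetical names introduced only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    start: str  # "HH:MM" on a 24-hour clock
    end: str    # "HH:MM"; "24:00" marks the end of the day
    scale: int  # capacity as a multiple of the 1x baseline


def to_minutes(hhmm: str) -> int:
    """Convert "HH:MM" to minutes since midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


# Assumed step-wise approximation of the launch day described in the text.
SCHEDULE = [
    Step("00:00", "13:50", 1),   # normal traffic at baseline
    Step("13:50", "14:30", 32),  # surge and the first half-hour after launch
    Step("14:30", "15:00", 12),
    Step("15:00", "22:00", 4),
    Step("22:00", "24:00", 1),   # back to baseline
]


def unit_minutes(schedule: list[Step]) -> int:
    """Capacity actually consumed, in baseline-unit minutes."""
    return sum((to_minutes(s.end) - to_minutes(s.start)) * s.scale for s in schedule)


def peak_unit_minutes(schedule: list[Step]) -> int:
    """Capacity consumed if the peak scale were held for the whole schedule."""
    total = sum(to_minutes(s.end) - to_minutes(s.start) for s in schedule)
    return max(s.scale for s in schedule) * total


on_demand = unit_minutes(SCHEDULE)
static_peak = peak_unit_minutes(SCHEDULE)
print(f"on-demand:   {on_demand / 60:.1f} baseline-unit hours")
print(f"static peak: {static_peak / 60:.1f} baseline-unit hours")
print(f"on-demand share of static peak: {on_demand / static_peak:.1%}")
```

Under these assumptions, scaling on demand consumes roughly 9% of the capacity that a permanently peak-sized environment would. The sketch counts capacity only; it ignores pricing differences between on-demand and reserved resources, so it is an order-of-magnitude illustration rather than a cost model.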
This seamless adaptability was made possible through a strategic blend of declarative orchestration frameworks, infrastructure as code (IaC) methodologies, and fully automated CI/CD pipelines.
To summarize, the challenge is big. To be able to operate reliably yet cost-effectively, with consistent speed and security, all the while automatically scaling these services up and down on demand without human interaction in a matter of minutes, you need a set of best practices on how to design, build, test, and operate these systems. This sounds like DevOps.