Getting started with serverless data management
Years ago, developers, data scientists, and ML engineers had to spend hours or even days setting up the infrastructure needed for data management and data engineering. To analyze a large dataset stored in S3, a team of data scientists and ML engineers would typically perform the following sequence of steps:
- Launch and configure a cluster of EC2 instances.
- Copy the data from S3 to the volumes attached to the EC2 instances.
- Perform queries on the data using one or more of the applications installed on the EC2 instances.
One known challenge with this approach is that the provisioned resources may end up underutilized. If the schedule of the data query operations is unpredictable, it is also difficult to manage the uptime, cost, and compute specifications of the setup. In addition, system administrators and DevOps engineers need to spend time managing the security, stability, performance...