Working with Large Datasets
So far, we've been working with a relatively small number of documents. The movies collection has roughly 23,500 documents in it. That may be a considerable number for a human to work with, but large production systems often operate on millions of documents rather than thousands. We have also focused strictly on a single collection at a time, but what if the scope of our aggregation grows to include multiple collections?
In the first topic, we briefly discussed how you could use the projection stage while developing your pipelines to create more readable output and to simplify your results for debugging. However, we didn't cover how to improve performance when working on much larger datasets, both during development and in your final production-ready queries. In this topic, we'll discuss a few of the aggregation stages that you need to master when working with large, multi-collection datasets.