Applying differential privacy to large datasets
In the previous examples, we calculated differentially private aggregates (such as count, sum, and average) on smaller datasets involving a single terminal or a limited number of terminals. In this section, we will explore how to generate differentially private aggregates on large datasets containing millions or even billions of records. Specifically, we will consider a use case involving a dataset of approximately 5 million credit card transactions across 1,000 point-of-sale (POS) terminals and 5,000 customers.
Use case – generating differentially private aggregates on a large dataset
Let’s generate a dataset of credit card transactions recorded daily across 1,000 POS terminals. The transactions involve a total of 5,000 customers and add up to approximately 5 million records.
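As a rough illustration of what such a generator could look like, here is a minimal sketch that uses only the Python standard library. The field names (transaction_id, tx_date, terminal_id, customer_id, amount), the one-year date range starting on January 1, 2022, and the amount range of 1 to 500 are assumptions made for this example and are not taken from the dataset described above. Records are produced lazily, so the same function can be used for a small sample or for the full 5 million rows.

```python
import random
from datetime import date, timedelta


def generate_transactions(n_records, n_terminals=1_000, n_customers=5_000,
                          n_days=365, start=date(2022, 1, 1), seed=42):
    """Yield synthetic transactions one at a time so memory use stays flat.

    All field names and value ranges are hypothetical, chosen for illustration.
    """
    rng = random.Random(seed)  # seeded so repeated runs give the same data
    for tx_id in range(n_records):
        yield {
            "transaction_id": tx_id,
            # Assumed: each transaction falls on a random day in a one-year window
            "tx_date": start + timedelta(days=rng.randrange(n_days)),
            "terminal_id": rng.randrange(1, n_terminals + 1),
            "customer_id": rng.randrange(1, n_customers + 1),
            # Assumed: amounts are uniform between 1.00 and 500.00
            "amount": round(rng.uniform(1.0, 500.0), 2),
        }


# Small sample for a quick check; pass n_records=5_000_000 for the full-size dataset.
sample = list(generate_transactions(n_records=10_000))
print(sample[0])
```

Because the function is a generator, the full-size dataset can be streamed to a file or loaded into a DataFrame in chunks rather than held in memory as a list of dictionaries.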
To calculate differentially private aggregates on such a large dataset...