Data storage
In this section, you will size the storage requirements, that is, make an educated estimate of them. You will consider not just volume size and speed, but also several other aspects of the database cluster that your application needs in order to serve its data under the expected access patterns.
MDN plans to publish 100 articles daily. If it keeps the articles from the last 5 years, the archive will total 182,500 articles (100 × 365 × 5). With 48 million subscribers and 24 million daily active users, peak access occurs for 30 minutes daily across three major time zones, as shown in Figure 6.4.
Figure 6.4: MDN subscriber time zones and peak times
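The article count above is simple arithmetic, and a minimal sketch can make it easy to rerun when the inputs change. The constant and function names below are illustrative, not part of the MDN application, and the calculation assumes a 365-day year (leap days ignored).

```python
# Sizing inputs taken from the text; all names here are illustrative.
ARTICLES_PER_DAY = 100
RETENTION_YEARS = 5
DAYS_PER_YEAR = 365  # assumption: leap days are ignored

SUBSCRIBERS = 48_000_000
DAILY_ACTIVE_USERS = 24_000_000


def retained_articles(per_day: int, years: int, days_per_year: int = DAYS_PER_YEAR) -> int:
    """Return how many articles are kept for the given retention period."""
    return per_day * days_per_year * years


total_articles = retained_articles(ARTICLES_PER_DAY, RETENTION_YEARS)

# Share of subscribers who are active on a given day.
active_share = DAILY_ACTIVE_USERS / SUBSCRIBERS

print(f"Retained articles: {total_articles:,}")
print(f"Daily active share of subscribers: {active_share:.0%}")
```

Keeping the inputs as named constants means a change in publishing rate or retention period only requires editing one line before re-estimating.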
First, you will estimate the total data size. Each article has one 1,024-dimension embedding for semantic search and five 768-dimension embeddings for image search. Because each dimension is stored as an 8-byte double, the embeddings take (1,024 + 5 × 768) × 8 = 38,912 bytes, or roughly 40 KB uncompressed. With the title, summary, body (with and without markup), and other fields...