Data modeling
This section examines the types of data that AI/ML systems require (structured, unstructured, and semi-structured) and how each applies to MDN’s news articles. The following short descriptions establish a basic understanding of each:
- Structured data conforms to a predefined schema and has traditionally been stored in relational databases to hold transactional information. It powers systems of engagement and systems of intelligence.
- Unstructured data includes binary assets such as PDFs, images, and videos. Object stores such as Amazon S3 can hold these at a lower cost, organized under a flexible, directory-like key structure.
- Semi-structured data, such as JSON documents, allows each document to define its own schema, accommodating data points common to all documents, data points unique to a single document, and even missing fields.
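The three categories can be contrasted in a small, self-contained sketch. It uses only the Python standard library and is illustrative, not MDN’s actual design. An in-memory SQLite table stands in for a relational database, a plain `dict` stands in for an object store such as Amazon S3, and the table name `subscribers`, the article fields, the object keys, and the `"MDN staff"` fallback are all hypothetical.

```python
import json
import sqlite3

# --- Structured: the schema is declared before any data is written. ---
# The "subscribers" table and its columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE subscribers ("
    "id INTEGER PRIMARY KEY, email TEXT NOT NULL, plan TEXT NOT NULL)"
)
conn.execute(
    "INSERT INTO subscribers (email, plan) VALUES (?, ?)",
    ("reader@example.com", "monthly"),
)
# Every row has exactly the declared columns, in the declared order.
subscribers = conn.execute("SELECT id, email, plan FROM subscribers").fetchall()

# --- Semi-structured: each JSON document carries its own fields. ---
# Hypothetical article documents: the first has an author; the second
# omits it and adds a field the first lacks.
raw_articles = [
    '{"title": "Headline A", "tags": ["release"], "author": "J. Doe"}',
    '{"title": "Headline B", "tags": [], "video_key": "videos/b.mp4"}',
]
articles = [json.loads(text) for text in raw_articles]


def byline(article: dict) -> str:
    """Return the author, tolerating documents that omit the field."""
    # "MDN staff" is a hypothetical default for illustration.
    return article.get("author", "MDN staff")


# --- Unstructured: opaque bytes addressed by a key. ---
# A plain dict stands in for an object store such as Amazon S3; slashes in
# the keys act like folders without the store enforcing any hierarchy.
object_store: dict[str, bytes] = {
    "images/a-cover.png": b"\x89PNG\r\n\x1a\n",
    "videos/b.mp4": b"\x00\x00\x00\x18ftypmp42",
}


def list_prefix(store: dict[str, bytes], prefix: str) -> list[str]:
    """List the keys stored under a directory-like prefix."""
    return sorted(key for key in store if key.startswith(prefix))


if __name__ == "__main__":
    print(subscribers)
    print([byline(a) for a in articles])
    print(list_prefix(object_store, "videos/"))
```

Note how the second article document links semi-structured metadata to an unstructured asset by storing only the object key (`video_key`), a common pattern when binary files live in an object store and descriptive fields live alongside them in documents.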
MDN will store news articles, subscriber profiles, billing information, and more. For simplicity, in this chapter, you will focus on the data about...