Sample questions from the Design and Develop Data Processing section
This section focuses on the data processing part of the syllabus. Let's start with a question based on data lake design.
Data lake design
You are working in a marketing firm. The firm provides social media sentiment analysis to its customers. It captures data from various social media websites, Twitter feeds, product reviews, and other online forums.
Technical requirements:
- The input data includes files in CSV, JSON, image, video, and plain text formats.
- The data is expected to have inconsistencies such as duplicate entries and missing fields.
- The overall data volume is expected to be about 5 petabytes per month.
- The engineering team are experts in Scala and Python and would like a notebook-based experience.
- Engineers must be able to visualize the data for debugging purposes.
- The reports have to be generated on a daily basis.
- The reports should have charts with the ability to filter and sort...
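To put the 5-petabyte-per-month requirement in perspective, the short calculation below converts it into daily and per-second rates. It is a rough sketch that assumes decimal units (1 PB = 10^15 bytes) and a 30-day month. Neither assumption comes from the scenario.

```python
# Rough sizing for the scenario's ingest volume.
# Assumptions: decimal units (1 PB = 10**15 bytes) and a 30-day month.
PB = 10**15
TB = 10**12
GB = 10**9

monthly_bytes = 5 * PB
days_per_month = 30
seconds_per_month = days_per_month * 24 * 60 * 60  # 2,592,000 seconds

daily_tb = monthly_bytes / days_per_month / TB
sustained_gb_per_s = monthly_bytes / seconds_per_month / GB

print(f"~{daily_tb:.1f} TB per day")               # ~166.7 TB per day
print(f"~{sustained_gb_per_s:.2f} GB/s sustained")  # ~1.93 GB/s sustained
```

At roughly 167 TB per day, or close to 2 GB/s around the clock, the data is far beyond what a single machine can clean and aggregate in time for daily reports. That scale is what points the design toward a distributed storage and processing platform.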
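The requirement that the pipeline cope with duplicate entries and missing fields can be illustrated with a small, single-machine Python sketch. The record layout (`post_id`, `source`, `text`) and the rule "drop incomplete records, keep the first copy of each `post_id`" are hypothetical choices for illustration only. The scenario does not specify a schema or a cleaning policy.

```python
from typing import Iterable

# Hypothetical required fields for a captured post (assumed, not from the scenario).
REQUIRED_FIELDS = ("post_id", "source", "text")


def clean_records(records: Iterable[dict]) -> list[dict]:
    """Drop records with a missing required field; keep the first copy of each post_id."""
    seen: set = set()
    cleaned: list[dict] = []
    for record in records:
        # Treat an absent key, None, or an empty string as a missing field.
        if any(record.get(field) in (None, "") for field in REQUIRED_FIELDS):
            continue
        # Deduplicate on the assumed unique key.
        key = record["post_id"]
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(record)
    return cleaned


raw = [
    {"post_id": 1, "source": "twitter", "text": "Love it"},
    {"post_id": 1, "source": "twitter", "text": "Love it"},  # duplicate entry
    {"post_id": 2, "source": "review"},                      # missing "text"
    {"post_id": 3, "source": "forum", "text": ""},           # empty "text"
    {"post_id": 4, "source": "forum", "text": "Too pricey"},
]

print([r["post_id"] for r in clean_records(raw)])  # [1, 4]
```

At the scenario's volume, the same logic would run on a distributed engine rather than in a Python loop. For example, PySpark's `DataFrame.dropna(subset=...)` and `DataFrame.dropDuplicates(subset=...)` cover most of it, although empty strings are not nulls there and would need a separate filter.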