Big data may be semi-structured or unstructured. The massively parallel processing (MPP) architecture organizes this data so that it can be queried easily for reporting and analytics. MPP systems are sometimes referred to as shared-nothing systems: data is partitioned across many servers (also known as nodes), and each server processes queries locally against its own partition.
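The shared-nothing idea can be illustrated with a minimal Python sketch. It is a toy model rather than the way any particular MPP product works: the four-node cluster size, the CRC32-based hash distribution, the sample rows, and the helper names `node_for`, `partition`, and `local_sum` are all assumptions made for illustration.

```python
from collections import defaultdict
from zlib import crc32

NUM_NODES = 4  # hypothetical cluster size, chosen for illustration


def node_for(key: str, num_nodes: int = NUM_NODES) -> int:
    """Pick the node that owns a row by hashing its distribution key."""
    return crc32(key.encode("utf-8")) % num_nodes


def partition(rows, key_field, num_nodes=NUM_NODES):
    """Distribute rows so that each node holds a disjoint slice of the data."""
    nodes = defaultdict(list)
    for row in rows:
        nodes[node_for(row[key_field], num_nodes)].append(row)
    return nodes


def local_sum(node_rows, field):
    """Work done independently on one node, using only its own rows."""
    return sum(row[field] for row in node_rows)


# Made-up sample data
rows = [
    {"customer": "alice", "amount": 30},
    {"customer": "bob", "amount": 20},
    {"customer": "carol", "amount": 50},
    {"customer": "alice", "amount": 10},
]

nodes = partition(rows, "customer")

# Each node computes a partial result without touching another node's rows;
# the partial results are then combined into the final answer.
partials = [local_sum(node_rows, "amount") for node_rows in nodes.values()]
total = sum(partials)
print(total)  # 110
```

Because the owning node is derived from a hash of the distribution key, every row for a given customer lands on the same node, so per-customer work never needs data from another node.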
Let's explore MPP in detail using the following diagram as a point of reference:
The diagram can be explained as follows:
- The process begins with the Client issuing a query, which is then passed to the Master Node.
- The Master Node contains information, such as the data dictionary and session information, which it uses to generate an execution plan designed to retrieve the needed information from each underlying Node.
- Parallel Execution represents the implementation...