Compiling Pig scripts
The Pig architecture is layered so that execution engines are pluggable; Hadoop's MapReduce is one such engine plugged into Pig. Compiling and executing a Pig script involves three main phases: building the logical plan, transforming it into a physical plan, and finally compiling the physical plan into a MapReduce plan that can be run in the target execution environment.
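These phases can be inspected from the Grunt shell: the EXPLAIN operator prints the logical, physical, and MapReduce plans that Pig derives for a relation. A minimal sketch (the file name 'passwd' and its field layout are illustrative assumptions):

```pig
-- load a colon-delimited file with a declared schema (hypothetical input)
users  = LOAD 'passwd' USING PigStorage(':') AS (name:chararray, uid:int);
admins = FILTER users BY uid < 100;

-- prints the logical, physical, and MapReduce plans for 'admins'
EXPLAIN admins;
```

Because EXPLAIN only compiles the script without running it, it is a cheap way to see how each layer rewrites the plan before submitting a job.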
The logical plan
Pig statements are first parsed, and syntax errors are reported at this stage. Parsing also validates the input files and input data structures, and type checking is performed when a schema is present. The parser then produces a logical plan: a DAG whose nodes are the operators and whose edges represent data flow. The logical plan cannot be executed directly and is agnostic of the execution layer. Optimizations based on built-in rules are applied at this stage; some of these rules are discussed later in the chapter. The logical plan has a one-to-one correspondence with the operators available...
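As a sketch of what parse-time type checking catches, consider a LOAD statement with a declared schema: expressions over the declared fields are checked against their types, and projecting a field that does not exist in the schema fails before any job is launched. The file name and fields below are hypothetical:

```pig
logs = LOAD 'access.log' USING PigStorage('\t')
       AS (ip:chararray, bytes:long);

-- 'bytes' is declared as a long, so this arithmetic type-checks
doubled = FOREACH logs GENERATE ip, bytes * 2;

-- projecting a field absent from the schema is rejected during
-- this phase, before a physical or MapReduce plan is built
broken  = FOREACH logs GENERATE missing_field;
```

Without a declared schema, Pig defers such checks and treats fields as bytearrays, so errors of this kind surface only at runtime.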