Hive can also use the Hadoop streaming feature as an alternative way to transform data. The streaming API opens an I/O pipe to an external process, such as a script. The process reads data from standard input and writes its results to standard output. In HQL, we can use the TRANSFORM clause directly to embed mapper and reducer scripts written as shell commands, shell scripts, Java programs, or programs in other languages. Although streaming adds overhead because data is serialized and deserialized between processes, it offers a simple programming model for non-Java developers. The syntax of the TRANSFORM clause is as follows:
FROM ( FROM src SELECT TRANSFORM '(' expression (',' expression)* ')' (inRowFormat)? USING 'map_user_script' (AS colName (',' colName)*)? (outRowFormat)? (outRecordReader...
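To make the stdin/stdout contract concrete, here is a minimal sketch of a script that could play the role of 'map_user_script'. It assumes the default row format, in which Hive passes each row as a line of tab-separated string columns and writes NULL as the literal \N. The function name transform_line and the choice to upper-case the first column are illustrative assumptions, not part of Hive.

```python
import sys


def transform_line(line):
    """Upper-case the first column of one tab-separated row.

    Assumes Hive's default streaming format: columns separated by tabs,
    with NULL written as the literal string \\N.
    """
    fields = line.rstrip("\n").split("\t")
    # Leave NULL markers untouched so Hive still reads them back as NULL.
    if fields[0] != "\\N":
        fields[0] = fields[0].upper()
    return "\t".join(fields)


def main(stdin=sys.stdin, stdout=sys.stdout):
    # Hive pipes one row per line into standard input and reads the
    # transformed rows back from standard output.
    for line in stdin:
        stdout.write(transform_line(line) + "\n")


if __name__ == "__main__":
    main()
```

Saved under a hypothetical name such as upper.py and registered with ADD FILE, a script like this would typically be referenced in the USING part of the clause, for example USING 'python upper.py'. Because it only touches standard input and output, it can be checked outside Hive by piping a tab-separated text file through it.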