This book is about Python for geospatial development, so in this section, you will learn how to use Python for HDFS operations and Hive queries. Several Python libraries wrap Hadoop and its databases, but none has emerged as the clear go-to choice, and some, like Snakebite, do not appear ready to run on Python 3. In this section, you will learn how to use two libraries: PyHive and PyWebHDFS. You will also learn how to use the Python subprocess module to execute HDFS and Hive commands.
To get PyHive, you can use conda and the following command:
conda install -c blaze pyhive
You may also need to install the sasl library:
conda install -c blaze sasl
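Once both packages are installed, a query can be sent through PyHive's DB-API style interface. The following is a minimal sketch rather than a definitive implementation: it assumes a HiveServer2 instance is listening on localhost at port 10000, and the host, port, and username values are placeholders that you will need to change for your own cluster. The helper names connect_to_hive, rows_to_dicts, and run_query are made up for this illustration and are not part of PyHive.

```python
def connect_to_hive(host="localhost", port=10000, username="hive"):
    """Open a HiveServer2 connection (host, port, username are placeholders)."""
    # Imported here so the helpers below can be used without PyHive installed.
    from pyhive import hive
    return hive.Connection(host=host, port=port, username=username)


def rows_to_dicts(description, rows):
    """Pair each row with the column names from a DB-API cursor description."""
    # The first item of every description entry is the column name.
    columns = [col[0] for col in description]
    return [dict(zip(columns, row)) for row in rows]


def run_query(connection, sql):
    """Execute one statement and return its rows as a list of dictionaries."""
    cursor = connection.cursor()
    try:
        cursor.execute(sql)
        return rows_to_dicts(cursor.description, cursor.fetchall())
    finally:
        # Release the cursor even if the query fails.
        cursor.close()
```

With a running Hive server, you could then call run_query(connect_to_hive(), "SHOW TABLES") to list the tables in the default database. Returning dictionaries keyed by column name is only a convenience chosen for this sketch; calling cursor.fetchall() directly gives you plain tuples.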
These libraries let you run Hive queries from Python. You will also want to be able to move files to HDFS. To do so, you...