Introduction
Hadoop is one of the first names that comes to mind when we talk about Big Data. I'm not going into details about it, since there is plenty of information out there; moreover, as somebody once said, "If you decided to use Hadoop for your data warehouse, then you probably have a good reason for it". Let's not forget: it is primarily a distributed filesystem, not a relational database. That said, there are many cases where we may need to use this technology for number crunching, for example, together with MicroStrategy for analysis and reporting.
There are two main ways to leverage Hadoop data from MicroStrategy: Hive and Impala. Both work as SQL bridges to the underlying Hadoop structures, converting standard SELECT statements into jobs. The connection is handled by a proprietary 32-bit ODBC driver, available for free from the Cloudera website.
In my tests, Impala proved considerably faster than Hive, so I will show you how to use it from our MicroStrategy virtual machine.
Note
I am using version 9.3.0 for consistency with the rest of the book. If you're serious about Big Data and Hadoop, I strongly recommend upgrading to 9.3.1 for enhanced performance and easier setup. See MicroStrategy knowledge base document TN43588: Post-Certification of Cloudera Impala 1.0 with MicroStrategy 9.3.1.
The ODBC driver is the same for both Hive and Impala; only the driver settings change.