Kite Data
The Kite SDK (http://www.kitesdk.org) is a collection of classes, command-line tools, and examples that aims to make it easier to build applications on top of Hadoop.
In this section we will look at how Kite Data, a subproject of Kite, can ease integration with several components of a Hadoop data warehouse. Kite examples can be found at https://github.com/kite-sdk/kite-examples.
On Cloudera's QuickStart VM, Kite JARs can be found at /opt/cloudera/parcels/CDH/lib/kite/.
Kite Data is organized into a number of subprojects, some of which we'll describe in the following sections.
Data Core
As the name suggests, the core is the building block for all capabilities provided in the Data module. Its principal abstractions are datasets and repositories.
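To make the relationship between the two abstractions concrete, the following is a minimal, self-contained sketch of a repository that stores named, read-only datasets. It is a toy model only: ToyDataset and ToyRepository are hypothetical names invented for illustration and are not Kite classes, the in-memory storage is an assumption, and the real Kite API differs in its types and method signatures.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a dataset: a named, read-only collection of records.
// This is NOT a Kite class; it only illustrates the idea.
final class ToyDataset<E> {
    private final String name;
    private final List<E> records;

    ToyDataset(String name, List<E> records) {
        this.name = name;
        // A defensive copy plus an unmodifiable wrapper keeps the instance immutable.
        this.records = Collections.unmodifiableList(new ArrayList<>(records));
    }

    String getName() {
        return name;
    }

    List<E> getRecords() {
        return records;
    }
}

// Hypothetical stand-in for a repository: a registry that manages datasets by name.
// Storage is an in-memory map here, purely as an assumption for the sketch.
final class ToyRepository {
    private final Map<String, ToyDataset<?>> datasets = new LinkedHashMap<>();

    // Registers a new dataset; refuses to overwrite an existing name.
    <E> ToyDataset<E> create(String name, List<E> records) {
        if (datasets.containsKey(name)) {
            throw new IllegalArgumentException("Dataset already exists: " + name);
        }
        ToyDataset<E> dataset = new ToyDataset<>(name, records);
        datasets.put(name, dataset);
        return dataset;
    }

    // Looks up a previously created dataset by name.
    ToyDataset<?> load(String name) {
        ToyDataset<?> dataset = datasets.get(name);
        if (dataset == null) {
            throw new IllegalArgumentException("No such dataset: " + name);
        }
        return dataset;
    }

    boolean exists(String name) {
        return datasets.containsKey(name);
    }
}
```

In this sketch, client code would create a dataset through the repository, for example repo.create("events", records), and later retrieve it with repo.load("events"). The repository owns the naming and lookup, while the dataset itself exposes only read access to its contents.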
The org.kitesdk.data.Dataset interface is used to represent an immutable set of data:
@Immutable
public interface Dataset<E> extends RefinableView<E> {
  String getName();
  DatasetDescriptor getDescriptor();
  Dataset<E>...