Data Sources API
The Data Sources API provides a single interface for loading and storing data using Spark SQL. In addition to the built-in sources, it gives developers an easy way to add support for custom data sources. All available external packages are listed at http://spark-packages.org/. In this section, we will learn how to use both built-in and external sources.
Read and write functions
The Data Sources API provides generic read and write functions that can be used with any kind of data source. The generic read and write functions provide two capabilities:
Parsing text, JSON, and other formats, and deserializing data stored in binary
Converting Java objects to rows of Avro, JSON, Parquet, and HBase records
The default data source is set to parquet with the spark.sql.sources.default configuration property. This can be changed as needed.
Built-in sources
Built-in sources are pre-packaged with Spark by default. Examples of built-in sources are Text...