The DataFrame API
Whether it is an Excel spreadsheet or the output of a database projection (the result of a select statement), the data representation most natural to humans has always been a set of uniform columns with multiple rows. Such a two-dimensional data structure, usually with labelled rows and columns, is called a DataFrame in some settings, such as R DataFrames and Python's Pandas DataFrames. In a DataFrame, a single column typically holds the same kind of data, and each row describes a set of values that mean something together, be it data about a person, a purchase, or a baseball game outcome. You can think of it as a matrix, a spreadsheet, or an RDBMS table.
DataFrames in R and Pandas are very handy for slicing, reshaping, and analyzing data, all essential operations in any data wrangling and data analysis workflow. This inspired the development of a similar concept in Spark, also called DataFrames.
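To make the idea concrete, here is a minimal sketch in plain Python of a table with labelled, single-type columns, together with two slicing operations. It is only an illustration of the concept: the names `columns`, `row`, `select`, and `where` are hypothetical helpers invented for this example, and nothing here reflects how Spark, R, or Pandas actually store or process data.

```python
# A toy, column-oriented table: each column label maps to a list of values
# of a single type. Illustration only; real DataFrame libraries use far
# more efficient internal representations.
columns = {
    "name": ["Ann", "Bob", "Cara"],
    "age": [34, 27, 45],
    "city": ["Oslo", "Pune", "Lima"],
}


def row(table, i):
    """Return row i as a dict keyed by column label."""
    return {label: values[i] for label, values in table.items()}


def select(table, *labels):
    """Project the table onto the given columns (like a SQL SELECT list)."""
    return {label: table[label] for label in labels}


def where(table, label, predicate):
    """Keep only the rows whose value in column `label` satisfies `predicate`."""
    keep = [i for i, value in enumerate(table[label]) if predicate(value)]
    return {lab: [vals[i] for i in keep] for lab, vals in table.items()}


# Each column is homogeneous: collect the set of value types per column.
column_types = {
    label: {type(value).__name__ for value in values}
    for label, values in columns.items()
}

# A row groups values that mean something together (here, one person).
second_row = row(columns, 1)

# Slice the table: filter rows by age, then keep two columns.
over_30 = select(where(columns, "age", lambda age: age > 30), "name", "age")

print(column_types)
print(second_row)
print(over_30)
```

The point of the sketch is the shape of the data: columns carry a label and a single type, rows carry meaning, and typical operations either pick columns or filter rows.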
DataFrame basics
The DataFrame API was first introduced in Spark...