Dataframes will serve as the framework for all of the data used to build deep learning models. Much like the pandas library in Python, PySpark has built-in functionality for creating a dataframe.
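As a rough illustration of that built-in functionality, the following minimal sketch builds a small dataframe by hand with `SparkSession.createDataFrame`. The column names, the row values, the helper name `build_dataframe`, and the local master setting are all assumptions made for this example and are not taken from the chapter's dataset.

```python
# Hypothetical column names and rows, used only to illustrate the call;
# these values are placeholders, not the chapter's dataset.
columns = ["height", "weight", "gender"]
rows = [
    (67, 150, 0),
    (70, 170, 1),
    (64, 130, 0),
]


def build_dataframe(rows, columns, app_name="manual-dataframe"):
    """Create a local SparkSession and a dataframe from in-memory rows.

    The helper name and the "local[1]" master are assumptions for this
    sketch; a real cluster would use its own master URL.
    """
    # Imported here so the data above can be inspected without Spark installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("local[1]")
        .appName(app_name)
        .getOrCreate()
    )
    # Passing a list of names as the schema lets Spark infer the column types.
    df = spark.createDataFrame(rows, schema=columns)
    return spark, df
```

With PySpark installed, calling `spark, df = build_dataframe(rows, columns)` followed by `df.show()` prints the rows as a table, and `spark.stop()` releases the session afterwards.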
Creating a dataframe in PySpark
Getting ready
There are several ways to create a dataframe in Spark. One common way is to import a .txt, .csv, or .json file. Another is to manually enter the fields and rows of data into a PySpark dataframe; the process can be a bit tedious, but it is helpful when dealing with a small dataset. To predict gender based on height and weight, this chapter builds a dataframe manually in PySpark. The dataset used is as follows:
While the dataset...