Converting a PySpark dataframe to an array
To form the building blocks of the neural network, the PySpark dataframe must be converted into an array. Python has a very powerful library, numpy, that makes working with arrays simple.
Getting ready
The numpy library should already be available with the installation of the anaconda3 Python package. However, if for some reason the numpy library is not available, it can be installed using the following command at the terminal:
pip install numpy
Running pip install or sudo pip install reports whether the requirement is already satisfied. Once the library is available, import it:
import numpy as np
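A quick way to confirm that the import succeeded is to print the installed version and build a small array. This is only a minimal sanity check; the name sample and its values are hypothetical and not part of the recipe.

```python
import numpy as np

# Print the installed numpy version to confirm the library is importable.
print(np.__version__)

# Build a small one-dimensional array (illustrative values only).
sample = np.array([1, 2, 3])
print(sample.shape)  # (3,)
```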
How to do it...
This section walks through the steps to convert the dataframe into an array:
- View the data collected from the dataframe using the following script:
df.select("height", "weight", "gender").collect()
- Store the values from the collection into an array called data_array using the following script:
data_array = np.array(df.select("height", "weight", "gender").collect())
- Execute...
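The conversion in the steps above can be tried without a Spark session, because collect() returns a list of Row objects that behave like tuples. The sketch below uses plain tuples as a stand-in for that list. The name collected_rows and all values are invented for illustration, and gender is assumed to be already encoded as a number; if it were a string, numpy would convert the whole array to a string type.

```python
import numpy as np

# Hypothetical stand-in for df.select("height", "weight", "gender").collect().
# Each tuple plays the role of one Row: (height, weight, gender).
# The values are invented; gender is assumed to be numerically encoded.
collected_rows = [
    (67, 150, 0),
    (65, 135, 1),
    (72, 190, 0),
]

# np.array turns the list of row tuples into a 2-D array:
# one row per record, one column per selected field.
data_array = np.array(collected_rows)

print(data_array.shape)  # (3, 3)

# Column slicing retrieves a single field across all records.
heights = data_array[:, 0]
print(heights)
```

Each column of data_array corresponds to one selected field, in the order passed to select, so data_array[:, 0] holds the heights and data_array[:, 2] holds the gender values.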