Manipulating columns in a PySpark dataframe
The dataframe is almost complete; however, one issue must be addressed before building the neural network. Rather than keeping the gender value as a string, it is better to convert it to an integer for calculation purposes, which will become more evident as this chapter progresses.
Getting ready
This section will require importing the following:
from pyspark.sql import functions
How to do it...
This section walks through the steps for converting the gender string to a numeric value in the dataframe, using the following mapping:
- Female --> 0
- Male --> 1
- Converting a column value inside a dataframe requires importing functions:
from pyspark.sql import functions
- Next, modify the gender column to a numeric value using the following script:
df = df.withColumn('gender', functions.when(df['gender'] == 'Female', 0).otherwise(1))
- Finally, reorder the columns so that gender is the last column in the dataframe using the following script:
df = df.select('height', 'weight...