Discretizing continuous variables
Sometimes it is useful to have a discrete representation of a continuous variable.
In this recipe, we will learn how to discretize a numerical feature with an example drawn from the Fourier series.
Getting ready
To execute this recipe, you will need a working Spark environment.
No other prerequisites are required.
How to do it...
In this recipe, we will use a small dataset that is located in the data folder, namely, fourier_signal.csv:
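import pyspark.ml.feature as feat

# Read the CSV; 'spark' is the active SparkSession
signal_df = spark.read.csv(
    '../data/fourier_signal.csv',
    header=True,
    inferSchema=True
)

# Discretize the 'signal' column into 10 quantile-based buckets
steps = feat.QuantileDiscretizer(
    numBuckets=10,
    inputCol='signal',
    outputCol='discretized'
)

# Fit on the data, then append the 'discretized' column
transformed = (
    steps
    .fit(signal_df)
    .transform(signal_df)
)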
How it works...
First, we read the data into signal_df. The fourier_signal.csv file contains a single column called signal.
Next, we use the QuantileDiscretizer(...) estimator to discretize the signal into 10 buckets. The bin ranges are chosen...
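To make the idea of quantile-based bucketing concrete, the following is a minimal plain-Python sketch and not Spark's implementation. It computes exact decile cut points with the standard library, whereas Spark's QuantileDiscretizer works with approximate quantiles. The helper names quantile_splits and assign_bucket, and the evenly spaced stand-in values used in place of the signal column, are assumptions made purely for illustration.

```python
from bisect import bisect_right
from statistics import quantiles


def quantile_splits(values, num_buckets):
    """Return the num_buckets - 1 interior cut points at evenly spaced quantiles."""
    return quantiles(values, n=num_buckets, method="inclusive")


def assign_bucket(value, splits):
    """Map a value to a bucket index: the number of cut points <= value."""
    return bisect_right(splits, value)


# Hypothetical stand-in for the 'signal' column (not the real file contents)
signal = [i / 10 for i in range(100)]

splits = quantile_splits(signal, num_buckets=10)
buckets = [assign_bucket(v, splits) for v in signal]

# Count how many values landed in each of the 10 buckets
counts = [buckets.count(b) for b in range(10)]
print(splits)
print(counts)
```

Because the cut points sit at evenly spaced quantiles, each bucket should receive roughly the same number of observations, regardless of how the values themselves are distributed.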