It is always important to analyze a dataset before applying any models to it.
Analyzing the therapy bot session dataset
Getting ready
This section requires importing the functions module from pyspark.sql so that its functions can be applied to our dataframe.
import pyspark.sql.functions as F
How to do it...
The following section walks through the steps to profile the text data.
- Execute the following script to group the dataframe by the label column and generate a count distribution:
# Count the rows for each label and list the most frequent labels first
df.groupBy("label") \
    .count() \
    .orderBy("count", ascending=False) \
    .show()
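To make clear what the Spark query computes, here is a minimal plain-Python sketch of the same count distribution that runs without a Spark session. The helper name label_distribution and the sample labels are hypothetical placeholders, not part of PySpark or of the therapy bot dataset. Unlike Spark's orderBy, which leaves the order of tied counts unspecified, this sketch breaks ties alphabetically so that its output is deterministic.

```python
from collections import Counter


def label_distribution(labels):
    """Return (label, count) pairs sorted by count, descending.

    Hypothetical helper mirroring
    df.groupBy("label").count().orderBy("count", ascending=False).
    """
    counts = Counter(labels)
    # Sort by count descending; ties are broken by label name for a stable result
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical stand-in for the dataframe's label column
sample_labels = ["label_a", "label_b", "label_b",
                 "label_c", "label_b", "label_a"]

total = len(sample_labels)
for label, count in label_distribution(sample_labels):
    # Show each label's raw count and its share of all rows
    print(f"{label:<10} {count:>3} ({count / total:.1%})")
```

A heavily skewed distribution at this stage is an early signal that the classes are imbalanced, which is worth knowing before any model is trained.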