Preparing the data for model building
In this section, we will prepare the data needed to develop an author classification model. We will start by tokenizing the text data, which is available in the form of articles, and converting each article into a sequence of integers. We will also convert the author labels so that each author is identified by a unique integer. Next, we will use padding and truncation so that the integer sequences representing the articles by the 50 authors all have the same length. We will end this section by partitioning the training data into training and validation datasets and then one-hot encoding the response variable.
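To make the data-preparation steps described above concrete, here is a minimal plain-Python sketch of the same pipeline. It is not the chapter's own code, which relies on Keras utilities. Every function name here (`fit_word_index`, `texts_to_sequences`, `pad_and_truncate`, `encode_labels`, `one_hot`, `train_val_split`) is hypothetical. The sketch also assumes several choices: words are split with a simple lowercase regex; the most frequent word gets index 1, with 0 reserved for padding; authors are numbered 0..K-1 in sorted order; 20% of the articles are held out for validation with a fixed seed. The three toy articles and author names are invented for illustration.

```python
import random
import re
from collections import Counter

WORD_RE = re.compile(r"[a-z']+")  # assumption: simple lowercase word splitting


def fit_word_index(texts, num_words=None):
    """Build a word -> integer map, most frequent word first.
    Index 0 is reserved for padding, so real words start at 1."""
    counts = Counter(w for t in texts for w in WORD_RE.findall(t.lower()))
    ranked = [w for w, _ in counts.most_common(num_words)]
    return {w: i + 1 for i, w in enumerate(ranked)}


def texts_to_sequences(texts, word_index):
    """Convert each text into a list of integers, dropping unknown words."""
    return [[word_index[w] for w in WORD_RE.findall(t.lower()) if w in word_index]
            for t in texts]


def pad_and_truncate(seqs, maxlen, value=0):
    """Give every sequence exactly `maxlen` entries: keep only the last
    `maxlen` tokens of long sequences and pad short ones at the front."""
    out = []
    for s in seqs:
        s = s[-maxlen:]
        out.append([value] * (maxlen - len(s)) + s)
    return out


def encode_labels(labels):
    """Map each distinct author to an integer 0..K-1 (sorted order)."""
    classes = sorted(set(labels))
    index = {name: i for i, name in enumerate(classes)}
    return [index[name] for name in labels], classes


def one_hot(y, num_classes):
    """Turn integer labels into one-hot rows."""
    return [[1 if j == k else 0 for j in range(num_classes)] for k in y]


def train_val_split(n, val_fraction=0.2, seed=123):
    """Shuffle row indices and hold out a fraction for validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_val = round(n * val_fraction)
    return idx[n_val:], idx[:n_val]


# Toy usage with invented articles and authors.
texts = ["The market rose.",
         "Shares fell as the market slipped.",
         "The central bank held rates steady."]
authors = ["AuthorB", "AuthorA", "AuthorB"]

word_index = fit_word_index(texts)
seqs = pad_and_truncate(texts_to_sequences(texts, word_index), maxlen=5)
y, classes = encode_labels(authors)
train_idx, val_idx = train_val_split(len(texts))
x_train = [seqs[i] for i in train_idx]
y_train = one_hot([y[i] for i in train_idx], len(classes))
x_val = [seqs[i] for i in val_idx]
y_val = one_hot([y[i] for i in val_idx], len(classes))
```

Here `seqs` comes out as `[[0, 0, 1, 2, 3], [5, 6, 1, 2, 7], [8, 9, 10, 11, 12]]`. The short first article is padded at the front, and the two longer ones lose their earliest tokens. This padding and truncating at the front mirrors the default behavior of Keras `pad_sequences`. Padding or truncating at the end is an equally valid choice.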
Tokenization and converting text into a sequence of integers
We will start by carrying out tokenization...