To prepare the data for model building, we need to follow these steps:
- Tokenization
- Converting text into integers
- Padding and truncation
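The three steps above can be sketched in base R. The sketch assumes only base R. It does not use keras functions such as `text_tokenizer()`, `fit_text_tokenizer()`, `texts_to_sequences()`, or `pad_sequences()`, and it only approximates their default behavior: lowercasing, stripping punctuation, indexing words by frequency rank starting at 1, and padding or truncating at the front of each sequence. The two example texts and the helper names `tokenize` and `pad_pre` are hypothetical and are for illustration only.

```r
# Minimal base-R sketch of the three data-preparation steps.
# Assumption: this approximates, but is not, the keras implementation.

# Hypothetical example texts (not from the tweet dataset)
texts <- c("the new iphone is great",
           "not sure the new iphone is worth it")

# 1. Tokenization: lowercase, strip punctuation, split on whitespace
tokenize <- function(x) {
  x <- tolower(x)
  x <- gsub("[[:punct:]]", "", x)
  strsplit(x, "\\s+")
}
tokens <- tokenize(texts)

# 2. Converting text into integers: rank words by frequency (1 = most frequent);
#    index 0 is left free so it can be used for padding
freq <- sort(table(unlist(tokens)), decreasing = TRUE)
word_index <- setNames(seq_along(freq), names(freq))
sequences <- lapply(tokens, function(w) unname(word_index[w]))

# 3. Padding and truncation to a fixed length, applied at the front ("pre")
pad_pre <- function(s, maxlen) {
  if (length(s) >= maxlen) {
    tail(s, maxlen)                     # truncate: drop the earliest tokens
  } else {
    c(rep(0L, maxlen - length(s)), s)   # pad: add zeros at the front
  }
}
padded <- t(sapply(sequences, pad_pre, maxlen = 6))
print(padded)
```

With `maxlen = 6`, the first text (five words) receives one leading zero. The second text (eight words) loses its first two words, "not" and "sure". That loss shows why front truncation can remove meaningful words, so the choice of `maxlen` is worth checking against typical text lengths.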
To illustrate these steps, we will use a very small text dataset of five tweets posted around the time the Apple iPhone X was unveiled in September 2017. This small dataset will help us understand each step of data preparation; we will then switch to the larger IMDb dataset to build a deep network classification model. The five tweets, which we store in t1 to t5, are as follows:
t1 <- "I'm not a huge $AAPL fan but $160 stock closes down $0.60 for the day on huge volume isn't really bearish"
t2 <- "$AAPL $BAC not sure what more dissapointing: the new iphones or the presentation for...