In this section, we will focus on how our data (tweets, in this case) is transformed to fit the model's requirements. We will first see how, using the files in the data/ folder of the GitHub repo for this task, we can extract the tweets we need. Then, we will look at how a simple set of functions lets us split and transform the data into the required form.
An important file to examine is data.py, inside the data/twitter folder. It transforms plain text into a numeric format that makes it easy to train the network. We won't go deep into the implementation, since you can examine it yourself. After running the code, three important files are produced (a minimal sketch of the underlying indexing idea follows the file list):
- idx_q.npy: This is an array of arrays containing the index representations of all the words in the different sentences forming the chatbot questions...
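To make the index representation concrete, here is a minimal sketch of the idea behind data.py: each word is mapped to an integer through a vocabulary dictionary, and every sentence becomes a fixed-length row of indices padded with zeros. The vocabulary, padding scheme, and maximum length below are illustrative assumptions, not the exact choices made in data.py; the commented np.load call at the end shows how you can inspect the real idx_q.npy once it has been generated.

```python
import numpy as np

# Illustrative vocabulary and maximum sentence length; data.py builds
# these from the tweet corpus, with its own special tokens and limits.
word2idx = {"<pad>": 0, "how": 1, "are": 2, "you": 3, "fine": 4, "thanks": 5}
max_len = 5

sentences = ["how are you", "fine thanks"]

# Each sentence becomes one row of word indices, zero-padded to max_len.
idx_q = np.zeros((len(sentences), max_len), dtype=np.int32)
for i, sentence in enumerate(sentences):
    for j, word in enumerate(sentence.split()[:max_len]):
        idx_q[i, j] = word2idx[word]

print(idx_q)
# [[1 2 3 0 0]
#  [4 5 0 0 0]]

# Once data.py has run, the real file can be inspected the same way:
# idx_q = np.load('data/twitter/idx_q.npy')
```

Storing the data as a dense integer matrix like this is what lets the training code feed whole batches of questions to the network without any string processing at run time.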