The corpus.csv file contains text captions that describe the videos; a snippet of the data is shown in Figure 5.5. We can remove a few records, each identified by a [VideoID, Start, End] combination, and treat them as test files for later evaluation:
Figure 5.5: A snapshot of the format of the captions file
The VideoID, Start, and End columns combine to form the video name in the format VideoID_Start_End.avi. Based on the video name, the features from the VGG16 convolutional neural network have been stored as VideoID_Start_End.npy. The following code block shows the function that processes the text captions for the videos and creates the path cross-reference to the VGG16 video image features:
def get_clean_caption_data(self, text_path, feat_path):
    text_data = pd.read_csv(text_path...
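The naming convention described above can be illustrated with a small, dependency-free sketch. It is not the book's function: the helper names `video_name` and `add_feature_paths`, and the output keys `video_name` and `feature_path`, are hypothetical and chosen only for illustration. The sketch assumes each caption record is a dictionary with the VideoID, Start, and End fields, and that a record should be kept only when its VideoID_Start_End.npy feature file exists under `feat_path`.

```python
import os


def video_name(video_id, start, end):
    """Build the clip name VideoID_Start_End shared by the .avi and .npy files."""
    return f"{video_id}_{start}_{end}"


def add_feature_paths(rows, feat_path):
    """Attach the expected .npy feature path to each caption record.

    Assumption: records whose feature file is missing are dropped, since
    there would be nothing to pair the caption with.
    """
    kept = []
    for row in rows:
        name = video_name(row["VideoID"], row["Start"], row["End"])
        path = os.path.join(feat_path, name + ".npy")
        if os.path.exists(path):
            # Copy the record and add the cross-reference fields (hypothetical keys).
            kept.append({**row, "video_name": name, "feature_path": path})
    return kept
```

The same logic carries over to a pandas DataFrame by building the name column from the three columns and filtering on `os.path.exists`; plain dictionaries are used here only to keep the sketch self-contained.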