The next step involves doing some preprocessing for our caption data and building a vocabulary or metadata dictionary for our captions. We start by reading in our training dataset records and writing a function to preprocess our text captions:
train_df = pd.read_csv('image_train_dataset.tsv', delimiter='\t') total_samples = train_df.shape[0] total_samples 35000 # function to pre-process text captions def preprocess_captions(caption_list): pc = [] for caption in caption_list: caption = caption.strip().lower() caption = caption.replace('.', '').replace(',', '').replace("'", "").replace('"', '') caption = caption.replace('&','and').replace...