In the previous case study, we learned about using CNN, RNN, and CTC loss together to transcribe the handwritten digits.
In this case study, we will learn about integrating CNN and RNN architectures to generate captions for a given picture.
Here is a sample of the picture:
- A girl in red dress with Christmas tree in background
- A girl is showing the Christmas tree
- A girl is playing in the park
- A girl is celebrating Christmas