Classifying videos is an area of active research because of the large amount of data needed to process this type of media. Memory requirements frequently reach the limits of modern GPUs, and distributed training on multiple machines might be required. Research is currently exploring several directions of increasing complexity. Let's review them.
The first approach consists of classifying one video frame at a time, treating each frame as a separate image processed with a 2D CNN. This approach simply reduces the video classification problem to an image classification problem. Each video frame emits a classification output, and the video is assigned the category that is chosen most frequently across its frames.
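The voting step of this first approach can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the per-frame scores are hard-coded stand-ins for the output of a 2D CNN, and the names `argmax`, `classify_video`, and `frame_scores` are hypothetical, not taken from any library.

```python
from collections import Counter


def argmax(scores):
    """Return the index of the largest score (first index wins on ties)."""
    return max(range(len(scores)), key=scores.__getitem__)


def classify_video(frame_scores):
    """Classify a video by majority vote over per-frame predictions.

    frame_scores: one list of class scores per frame, as a 2D CNN
    applied to each frame independently would produce (assumed input).
    """
    # Step 1: each frame votes for its highest-scoring class.
    per_frame_labels = [argmax(scores) for scores in frame_scores]
    # Step 2: the video gets the most frequent label. On ties,
    # Counter.most_common keeps the label that was seen first.
    video_label, _ = Counter(per_frame_labels).most_common(1)[0]
    return video_label, per_frame_labels


# Hypothetical scores for a 5-frame video and 3 classes.
frame_scores = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.3, 0.5],
    [0.5, 0.4, 0.1],
]

video_label, per_frame_labels = classify_video(frame_scores)
print(per_frame_labels, video_label)
```

Note that the vote ignores the order of the frames, so this sketch captures no temporal information: shuffling the frames would produce the same video label.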
The second approach consists of creating one single...