Running an FFT on a full-resolution video feed would be slow, since the cost grows with the number of pixels processed. The resulting frequencies may also reflect localized phenomena at each captured pixel, so the motion map (the result of filtering the frequencies and then applying the IFFT) might appear noisy and overly sharp. To address both problems, we need a cheap, blurry downsampling technique. However, we also want the option to enhance edges, which are important to our perception of motion.
Our need for a blurry downsampling technique is fulfilled by a Gaussian image pyramid. A Gaussian filter blurs an image by making each output pixel a weighted average of the input pixels in its neighborhood, with weights that decrease with distance from the center. An image pyramid is a series of images in which each image has a fixed fraction of the width and height of the previous one. Often, the fraction is one half. The halving of image dimensions is achieved...
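To make the idea concrete, here is a minimal pure-Python sketch of a Gaussian pyramid. It is an illustration under stated assumptions, not the implementation used in this project: the function names (`blur_1d`, `gaussian_blur`, `pyr_down`, `gaussian_pyramid`) are hypothetical, the 5-tap binomial kernel `[1, 4, 6, 4, 1] / 16` is one common approximation of a Gaussian, and we assume the halving step simply keeps every other row and column after blurring. Images are plain lists of rows of grayscale values.

```python
# Hypothetical helpers for illustration; not the project's actual code.

KERNEL_1D = [1, 4, 6, 4, 1]  # binomial approximation of a Gaussian; sums to 16


def blur_1d(values):
    """Blur one row or column with KERNEL_1D, replicating the edge values."""
    n = len(values)
    out = []
    for i in range(n):
        total = 0
        for k, weight in enumerate(KERNEL_1D):
            j = min(max(i + k - 2, 0), n - 1)  # clamp the index at the borders
            total += weight * values[j]
        out.append(total / 16.0)
    return out


def gaussian_blur(image):
    """Separable blur: filter every row, then every column."""
    rows = [blur_1d(row) for row in image]
    cols = [blur_1d(list(col)) for col in zip(*rows)]  # columns of the row-blurred image
    return [list(row) for row in zip(*cols)]  # transpose back to rows


def pyr_down(image):
    """Blur, then keep every other row and column (assumed halving step)."""
    blurred = gaussian_blur(image)
    return [row[::2] for row in blurred[::2]]


def gaussian_pyramid(image, levels):
    """Return the original image followed by `levels` successively halved images."""
    pyramid = [image]
    for _ in range(levels):
        pyramid.append(pyr_down(pyramid[-1]))
    return pyramid


# Usage: an 8x8 black image with one bright pixel, downsampled twice.
image = [[0] * 8 for _ in range(8)]
image[3][3] = 255
pyramid = gaussian_pyramid(image, 2)
print([(len(level), len(level[0])) for level in pyramid])  # [(8, 8), (4, 4), (2, 2)]
```

Because the blur is separable, each level costs two short 1D passes per pixel rather than a full 2D convolution, and each level has a quarter as many pixels as the one before it. In practice, an optimized library routine would replace these loops.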