Object detection models are notoriously computationally intensive, and running one on every frame of a video is often impractical. A common workaround is to run the model only every few frames and to use linear interpolation to estimate the object's position in the frames in between.
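As a minimal sketch of the interpolation step, assume each detection is a bounding box stored as `(x, y, width, height)` and that the detector has run on two frames, `frame_a` and `frame_b`. The names `interpolate_box` and `fill_between` are hypothetical helpers introduced here for illustration, not part of any library.

```python
from typing import Dict, Tuple

# Assumed box layout: (x, y, width, height)
Box = Tuple[float, float, float, float]


def interpolate_box(box_a: Box, box_b: Box, t: float) -> Box:
    """Linearly blend two boxes: t=0 returns box_a, t=1 returns box_b."""
    return tuple(a + (b - a) * t for a, b in zip(box_a, box_b))


def fill_between(frame_a: int, box_a: Box, frame_b: int, box_b: Box) -> Dict[int, Box]:
    """Estimate a box for every frame from frame_a to frame_b (inclusive)."""
    span = frame_b - frame_a
    if span <= 0:
        raise ValueError("frame_b must come after frame_a")
    return {
        frame: interpolate_box(box_a, box_b, (frame - frame_a) / span)
        for frame in range(frame_a, frame_b + 1)
    }


# Detector ran on frames 0 and 4; frames 1 to 3 are filled in by interpolation.
boxes = fill_between(0, (0, 0, 10, 10), 4, (8, 4, 10, 10))
```

Note that `fill_between` needs the detection from the later frame before it can produce boxes for the frames in the gap.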
Interpolation does not work for real-time applications, however, because it needs the detection from a later frame before it can fill in the earlier ones. Another commonly used technique is object tracking: once an object has been detected with a deep learning model, a simpler model follows the boundaries of the object from frame to frame.
Object tracking works on almost any kind of object, as long as the object is clearly distinguishable from its background and its shape does not change excessively. There are many object tracking algorithms. Some of them are available through OpenCV's tracker module (documented at https://docs.opencv.org/master/d9/df8/group__tracking.html), and many are also available for mobile applications.
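The detect-then-track idea can be sketched as a loop like the one below. This is an illustration under stated assumptions, not a definitive implementation: `detect` stands in for the deep learning detector, `make_tracker` is a hypothetical factory for any tracker object exposing `init(frame, box)` and `update(frame) -> (ok, box)` (the convention OpenCV's trackers follow), and `redetect_every` is an assumed parameter that controls how often the detector is run again to correct drift.

```python
from typing import Any, Callable, Iterable, Iterator, Optional, Tuple

# Assumed box layout: (x, y, width, height)
Box = Tuple[int, int, int, int]


def detect_then_track(
    frames: Iterable[Any],
    detect: Callable[[Any], Optional[Box]],  # expensive detector (hypothetical)
    make_tracker: Callable[[], Any],         # returns an object with init/update
    redetect_every: int = 30,                # assumed refresh interval, in frames
) -> Iterator[Optional[Box]]:
    """Yield one box (or None) per frame, calling the detector only when needed."""
    tracker = None
    since_detection = 0
    for frame in frames:
        box = None
        # Cheap path: let the lightweight tracker follow the object.
        if tracker is not None and since_detection < redetect_every:
            ok, tracked = tracker.update(frame)
            if ok:
                box = tuple(int(v) for v in tracked)
                since_detection += 1
        # Expensive path: no tracker yet, tracking failed, or a refresh is due.
        if box is None:
            box = detect(frame)
            since_detection = 0
            if box is not None:
                tracker = make_tracker()
                tracker.init(frame, box)
            else:
                tracker = None
        yield box
```

Depending on the OpenCV version and build, a factory such as `cv2.TrackerKCF_create` may be usable as `make_tracker`; which trackers are available varies, so check the documentation for the installed version. Unlike interpolation, this loop only ever looks at the current frame, which is what makes it usable in real time.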