Due to their high computational requirements, deep learning algorithms are most commonly run on powerful servers: computers specifically designed for this task. For latency, privacy, or cost reasons, however, it is sometimes preferable to run inference on customers' devices instead: smartphones, connected objects, cars, or microcomputers.
What all these devices have in common is limited computational power and tight energy budgets. Because they sit at the edge of the network, close to where the data is generated, on-device machine learning is also referred to as edge computing or machine learning at the edge.
With conventional machine learning, computation usually happens in a data center. For instance, when you upload a photo to Facebook, a deep learning model runs in Facebook's data centers to detect your friends' faces and help you tag them.
With on-device machine learning, inference happens on your device. A common example is Snapchat's face filters, which detect and track your face directly on the phone, in real time.
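To make the contrast concrete, here is a minimal sketch of what on-device inference looks like with TensorFlow Lite, a common runtime for edge devices. The model file name (`face_detector.tflite`) is a hypothetical placeholder; any model converted to the `.tflite` format is loaded and run the same way.

```python
import numpy as np
import tensorflow as tf

# Load the converted model and allocate its tensors once, at startup.
# "face_detector.tflite" is a placeholder for any converted model file.
interpreter = tf.lite.Interpreter(model_path="face_detector.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Build a dummy input frame matching the model's expected shape and dtype.
frame = np.zeros(input_details[0]["shape"], dtype=input_details[0]["dtype"])

# Run inference locally: no network round trip, and the data never
# leaves the device.
interpreter.set_tensor(input_details[0]["index"], frame)
interpreter.invoke()
predictions = interpreter.get_tensor(output_details[0]["index"])
print(predictions.shape)
```

The key point is the absence of any upload step: the interpreter runs entirely on the device's own processor, which is exactly what makes the latency and privacy properties described above possible.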