Augmented reality on a handheld mobile device, such as a smartphone or tablet, uses the device's camera to capture video of the real world and combine it with virtual objects.
As illustrated in the following image, when you run an AR app on a mobile device, you simply point its camera at a target in the real world; the app recognizes the target and renders a 3D computer graphic registered to the target's position and orientation. This is handheld mobile video see-through augmented reality:
We say handheld and mobile because the device is carried in your hand and free to move. We say video see-through because the device's camera captures reality, which is then combined with computer graphics. The resulting AR video image is displayed on the device's flat screen.
Mobile devices have features important for AR, including the following:
- Untethered and battery-powered
- Flat panel graphic display touchscreen input
- Rear-facing camera
- CPU (main processor), GPU (graphics processor), and memory
- Motion sensors, namely an accelerometer for detecting linear motion and a gyroscope for detecting rotational motion
- GPS and/or other position sensors for geolocation, and cellular and/or Wi-Fi data connections to the internet
Let's chat about each of these. First of all, mobile devices are... mobile. Yes, I know you get that. No wires. But what this really means is that, like you, mobile devices are free to roam the real world. They are not tethered to a PC or other console. This is natural for AR because AR experiences take place in the real world, as you move around in it.
Mobile devices sport a flat panel color graphic display with excellent resolution and pixel density sufficient for handheld viewing distances. And, of course, the killer feature that helped catapult the iPhone revolution is the multitouch input sensor on the display, which lets you interact with the displayed images using your fingers.
A rear-facing camera is used to capture video from the real world and display it in real time on the screen. This video data is digital, so your AR app can modify it and combine it with virtual graphics in real time. The image is monocular, captured from a single camera and thus a single viewpoint. Correspondingly, the computer graphics are rendered from a single viewpoint to match.
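To make the single-viewpoint idea concrete, here is a minimal sketch of how a monocular AR renderer maps a 3D point in camera space onto the 2D screen using the standard pinhole camera model. The function name and the focal length and screen values are illustrative, not from any particular AR SDK:

```python
# Hypothetical pinhole-projection sketch. A monocular AR renderer projects
# virtual 3D points onto the screen from the camera's single viewpoint.

def project_point(x, y, z, fx=800.0, fy=800.0, cx=640.0, cy=360.0):
    """Project a 3D point in camera space to pixel coordinates.

    fx, fy are focal lengths in pixels; (cx, cy) is the principal point,
    here the center of an assumed 1280x720 screen. z must be positive,
    that is, in front of the camera.
    """
    if z <= 0:
        raise ValueError("point is behind the camera")
    u = fx * (x / z) + cx
    v = fy * (y / z) + cy
    return u, v

# A virtual object 2 m in front of the camera, 0.5 m to the right:
u, v = project_point(0.5, 0.0, 2.0)
print(u, v)  # -> 840.0 360.0
```

Because both the camera image and the rendered graphics share this one viewpoint, a virtual object drawn at the projected pixel appears attached to the real scene.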
Today's mobile devices are quite powerful computers, with a CPU (main processor) and GPU (graphics processor), both of which are critical for AR: recognizing targets in the video, processing sensor and user input, and rendering the combined video on the screen. These demands keep growing, pushing hardware manufacturers to deliver ever higher performance.
Built-in sensors that measure motion, orientation, and other conditions are also key to the success of mobile AR. An accelerometer is used for detecting linear motion along three axes and a gyroscope for detecting rotational motion around the three axes. Using real-time data from the sensors, the software can estimate the device's position and orientation in real 3D space at any given time. The software uses this data to determine the specific view the device's camera is capturing, and applies the same 3D transformation to register the computer-generated graphics in 3D space.
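One common way (though not the only way) to combine these two sensors is a complementary filter: the gyroscope gives a smooth rotation rate but drifts over time, while the accelerometer gives a drift-free but noisy orientation estimate from gravity. The sketch below tracks a single angle (pitch) for clarity; real sensor fusion works in full 3D, and all names and values here are illustrative:

```python
# Illustrative complementary filter for one rotation axis (pitch).
# Gyro integration is smooth but drifts; the accelerometer's gravity-based
# angle is noisy but drift-free. Blending the two gives a stable estimate.

def complementary_filter(pitch, gyro_rate, accel_pitch, dt, alpha=0.98):
    """Return an updated pitch estimate in degrees.

    pitch: previous estimate (deg); gyro_rate: angular rate (deg/s);
    accel_pitch: angle derived from gravity (deg); dt: timestep (s).
    """
    return alpha * (pitch + gyro_rate * dt) + (1 - alpha) * accel_pitch

# Simulate a device tilting at a constant 10 deg/s, sampled at 100 Hz:
pitch = 0.0
dt = 0.01
for step in range(100):
    gyro_rate = 10.0                      # deg/s, from the gyroscope
    accel_pitch = 10.0 * (step + 1) * dt  # deg, from the accelerometer
    pitch = complementary_filter(pitch, gyro_rate, accel_pitch, dt)
print(round(pitch, 2))
```

After one simulated second of tilting, the estimate closely tracks the true 10-degree pitch.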
In addition, the GPS sensor can be used by applications that need to know where they are on the globe, for example, using AR to annotate a street view or a mountain range, or to find a rogue Pokémon.
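As a sketch of what such a geolocated app does under the hood: given the device's GPS fix and a point of interest, it can compute the distance and compass bearing to that point and place an annotation accordingly. The function below uses the standard haversine formula on a spherical Earth; the coordinates are illustrative:

```python
import math

# Hypothetical helper for a geolocated AR annotation: distance and bearing
# from the device's GPS fix to a point of interest.

def distance_and_bearing(lat1, lon1, lat2, lon2):
    """Return great-circle distance (meters) and initial bearing (degrees)
    using the haversine formula on a spherical Earth (radius 6371 km)."""
    R = 6371000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    dist = 2 * R * math.asin(math.sqrt(a))
    y = math.sin(dlam) * math.cos(phi2)
    x = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlam))
    bearing = (math.degrees(math.atan2(y, x)) + 360.0) % 360.0
    return dist, bearing

# Device in central London, annotation roughly 1 km due north of it:
d, b = distance_and_bearing(51.5000, -0.1200, 51.5090, -0.1200)
print(round(d), round(b))
```

With the distance and bearing in hand, the app can decide whether the point of interest falls within the camera's current field of view and where on screen to draw its label.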
Last but not least, mobile devices have cellular and/or Wi-Fi connections to the internet. Many AR apps require an internet connection, especially when a database of recognition targets or metadata needs to be accessed online.