Enabling audio and video on the Web
The biggest accomplishment of WebRTC is bringing high-quality audio and video to the open the Web without the need for third-party software or plugins. Currently, there are no high-quality, well-built, freely available solutions that enable real-time communication in the browser. The success of the Internet is largely due to the high availability and open use of technologies, such as HTML, HTTP, and TCP/IP. To move the Internet forward, we want to continue building on top of these technologies. This is where WebRTC comes into play.
To build a real-time communication application from scratch, we would need to bring in a wealth of libraries and frameworks to deal with the many issues faced when developing these types of applications. These typically include software to handle connection dropping, data loss, and NAT traversal. The great thing about WebRTC is that all of this comes built-in to the browser API. Google has open sourced much of the technology involved in accomplishing this communication in a high-quality and complete manner.
Note
Most of the information about WebRTC, including the source code of its implementation, can be found freely available at http://www.webrtc.org/.
With WebRTC, the heavy lifting is all done for you. The API brings a host of technologies into the browser to make implementation details easy. This includes camera and microphone capture, video and audio encoding and decoding, transportation layers, and session management.
Camera and microphone capture
The first step to using any communication platform is to gain access to the camera and microphone on the device that the user is using. This means detecting the types of devices available, getting permission from the user to access them, and obtaining a stream of data from the device itself. This is where we will begin implementing our first application.
Encoding and decoding audio and video
Unfortunately, even with the improvements made in network speed, sending a stream of audio and video data over the Internet is too much to handle. This is where encoding and decoding comes in. This is the process of breaking down video frames or audio waves into smaller chunks and compressing them into a smaller size. The smaller size then makes it faster to send them across a network and decompress them on the other side. The algorithm behind this technique is typically called a codec.
If you have ever had trouble playing a video file on your computer, then you have some insight into the complex world of video and audio codecs. There are several different ways to encode audio and video streams, each with their different benefits. To add to this, there are many different companies that have different business goals behind creating and maintaining a codec. This means not all of the codecs are free for everyone to use.
There are many codecs in use inside WebRTC. These include H.264, Opus, iSAC, and VP8. When two browsers speak to each other, they pick the most optimal supported codec between the two users. The browser vendors also meet regularly to decide which codecs should be supported in order for the technology to work. You can read more about the support for various codecs at http://www.webrtc.org/faq.
You could easily write several books on the subject of codecs. In fact, there are many books already written on the subject. Fortunately for us, WebRTC does most of the encoding in the browser layer. We will not worry about it over the course of this book but, once you start venturing past basic video and audio communication, you will more than likely bump heads with codec support.
Transportation layer
The transportation layer is the topic of several other books as well. This layer deals with packet loss, ordering of packets, and connecting to other users. The API makes it easy to deal with the fluctuations of a user's network and facilitates reacting to changes in connectivity.
The way WebRTC handles packet transport is very similar to how the browser handles other transport layers, such as AJAX or WebSockets. The browser gives an easy-to-access API with events that tell you when there are issues with the connection. In reality, the code to handle a simple WebRTC call could be thousands or tens of thousands of lines long. These can be used to handle all the different use cases, ranging from mobile devices, desktops, and more.
Session management
Session management is the final piece of the WebRTC puzzle. This is simpler than managing network connectivity or dealing with codecs but still an important piece of the puzzle. This will deal with opening multiple connections in a browser, managing open connections, and organizing what goes to which person. This can most commonly be called signaling and will be dealt with more in Chapter 4, Creating a Signaling Server.
Included in this array of new features is also support for data transfer. Since a high-quality data connection is needed between two clients for audio and video, it also makes sense to use this connection to transfer arbitrary data. This is exposed to the JavaScript layer through the RTCDataChannel API. We will cover this in more detail at a later point.
WebRTC today has many of the building blocks needed to build an extremely high-quality real-time communication experience. Google, Mozilla, Opera, and many others have invested a wealth of time and effort through some of their best video and audio engineers to bring this experience to the Web. WebRTC even has roots in the same technology used to bring Voice over Internet Protocol (VoIP) communication to users. It will change the future of how engineers think about building real-time communication applications.