Establishing a peer-to-peer connection
WebRTC can't create direct connections between peers without the help of a signaling server. WebRTC does not standardize signaling, so there is no ready-made signaling server that your application can simply plug in. Instead, any communication mechanism that lets peers exchange Session Description Protocol (SDP) data can be used for signaling. SDP is described in the next section.
A connection between peers and a signaling server is usually called a signaling channel. In this chapter, we will use WebSockets to build our signaling server.
In addition to SDP data, peers must exchange information about their network connections (known as ICE candidates).
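To make the role of the signaling channel concrete, here is a minimal in-memory sketch of what a signaling server does: it forwards offers, answers, and ICE candidates to the addressed peer without interpreting them. The message shapes, the `SignalingRelay` class, and the peer names are assumptions made for illustration; they are not part of WebRTC, and a real server would deliver messages over a network transport instead of calling a function.

```typescript
// Hypothetical message shapes: WebRTC does not standardize signaling,
// so these names are assumptions made for this sketch.
type SignalMessage =
  | { kind: "offer"; from: string; to: string; sdp: string }
  | { kind: "answer"; from: string; to: string; sdp: string }
  | { kind: "candidate"; from: string; to: string; candidate: string };

type Deliver = (message: SignalMessage) => void;

// In-memory stand-in for a signaling server. It only forwards messages
// to the addressed peer and never inspects the SDP or candidate payload.
class SignalingRelay {
  private readonly peers = new Map<string, Deliver>();

  register(peerId: string, deliver: Deliver): void {
    this.peers.set(peerId, deliver);
  }

  // Returns true if the recipient is registered and the message was forwarded.
  send(message: SignalMessage): boolean {
    const deliver = this.peers.get(message.to);
    if (deliver === undefined) {
      return false;
    }
    deliver(message);
    return true;
  }
}

// Usage: Alice sends an offer and a candidate to Bob; Bob replies with an answer.
const relay = new SignalingRelay();
const aliceInbox: SignalMessage[] = [];
const bobInbox: SignalMessage[] = [];
relay.register("alice", (message) => { aliceInbox.push(message); });
relay.register("bob", (message) => { bobInbox.push(message); });

relay.send({ kind: "offer", from: "alice", to: "bob", sdp: "<offer SDP>" });
relay.send({ kind: "answer", from: "bob", to: "alice", sdp: "<answer SDP>" });
relay.send({ kind: "candidate", from: "alice", to: "bob", candidate: "<ICE candidate>" });

// Nobody registered as "carol", so this message is not delivered.
const deliveredToUnknownPeer = relay.send({
  kind: "candidate", from: "alice", to: "carol", candidate: "<ICE candidate>",
});
```

The relay treats the payloads as opaque strings, which is why almost any transport can serve as a signaling channel.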
The Session Description Protocol
SDP is an important part of the WebRTC stack. It is used to negotiate session and media options while establishing a peer connection.
It is a protocol that is intended to describe multimedia communication sessions for the purposes of session announcement, session invitation, and parameter negotiation. It does not deliver the media data itself; instead, peers use it to negotiate media types, formats, and all associated properties and options, such as resolution, encryption, and codecs. The set of properties and parameters is usually called a session profile.
Peers have to exchange SDP data using the signaling channel before they can establish a direct connection.
The following is an example of an SDP offer:
v=0
o=alice 2890844526 2890844526 IN IP4 host.atlanta.example.com
s=
c=IN IP4 host.atlanta.example.com
t=0 0
m=audio 49170 RTP/AVP 0 8 97
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:97 iLBC/8000
m=video 51372 RTP/AVP 31 32
a=rtpmap:31 H261/90000
a=rtpmap:32 MPV/90000
Here, we can see that this is a video and audio session, and multiple codecs are offered.
The following is an example of an SDP answer:
v=0
o=bob 2808844564 2808844564 IN IP4 host.biloxi.example.com
s=
c=IN IP4 host.biloxi.example.com
t=0 0
m=audio 49174 RTP/AVP 0
a=rtpmap:0 PCMU/8000
m=video 49170 RTP/AVP 32
a=rtpmap:32 MPV/90000
Here, we can see that only one codec per media stream is accepted in response to the preceding offer: PCMU for audio and MPV for video.
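To see how the offer and answer relate, the following sketch extracts the media sections and codec names from both SDP bodies. It is a deliberately minimal parser that understands only `m=` and `a=rtpmap` lines, which is enough for the two examples above; the `MediaSection` interface and the `parseMediaSections` function are names invented for this illustration, and real SDP handling needs far more than this.

```typescript
interface MediaSection {
  kind: string;           // media type from the m= line, e.g. "audio" or "video"
  port: number;           // transport port from the m= line
  payloadTypes: number[]; // RTP payload type numbers listed on the m= line
  codecs: string[];       // encoding names resolved from a=rtpmap lines
}

// Minimal sketch: reads only m= and a=rtpmap lines and ignores everything else.
function parseMediaSections(sdp: string): MediaSection[] {
  const sections: MediaSection[] = [];
  for (const rawLine of sdp.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line.startsWith("m=")) {
      // m=<media> <port> <proto> <fmt> ...
      const [kind, port, , ...formats] = line.slice(2).split(" ");
      sections.push({
        kind,
        port: Number(port),
        payloadTypes: formats.map(Number),
        codecs: [],
      });
    } else if (line.startsWith("a=rtpmap:") && sections.length > 0) {
      // a=rtpmap:<payload type> <encoding name>/<clock rate>
      const [, encoding] = line.slice("a=rtpmap:".length).split(" ");
      if (encoding !== undefined) {
        sections[sections.length - 1].codecs.push(encoding.split("/")[0]);
      }
    }
  }
  return sections;
}

// The offer and answer from the text, one SDP field per line.
const offerSdp = [
  "v=0",
  "o=alice 2890844526 2890844526 IN IP4 host.atlanta.example.com",
  "s=",
  "c=IN IP4 host.atlanta.example.com",
  "t=0 0",
  "m=audio 49170 RTP/AVP 0 8 97",
  "a=rtpmap:0 PCMU/8000",
  "a=rtpmap:8 PCMA/8000",
  "a=rtpmap:97 iLBC/8000",
  "m=video 51372 RTP/AVP 31 32",
  "a=rtpmap:31 H261/90000",
  "a=rtpmap:32 MPV/90000",
].join("\n");

const answerSdp = [
  "v=0",
  "o=bob 2808844564 2808844564 IN IP4 host.biloxi.example.com",
  "s=",
  "c=IN IP4 host.biloxi.example.com",
  "t=0 0",
  "m=audio 49174 RTP/AVP 0",
  "a=rtpmap:0 PCMU/8000",
  "m=video 49170 RTP/AVP 32",
  "a=rtpmap:32 MPV/90000",
].join("\n");

const offeredSections = parseMediaSections(offerSdp);
const acceptedSections = parseMediaSections(answerSdp);
```

With these inputs, `offeredSections` lists three audio codecs and two video codecs, while `acceptedSections` lists a single codec for each media section.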
You can find more examples of SDP sessions in RFC 4317 at https://www.rfc-editor.org/rfc/rfc4317.txt.
You can also find full details on SDP in RFC 4566 at http://tools.ietf.org/html/rfc4566.
ICE and ICE candidates
Interactive Connectivity Establishment (ICE) is a mechanism that allows peers to establish a connection. In real life, users' devices usually don't have a direct connection to the Internet or a public IP address; they sit behind network devices such as routers, have private IP addresses, and are subject to NAT and network firewalls. ICE uses the STUN and TURN protocols to help peers establish a connection under these conditions.
You can find details on ICE in RFC 5245 at https://tools.ietf.org/html/rfc5245.
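As an illustration of what peers actually exchange, the sketch below parses ICE candidate lines in the textual form defined by RFC 5245 (`candidate:<foundation> <component> <transport> <priority> <address> <port> typ <type> ...`). The sample lines are made up for this example and use documentation IP ranges; `parseCandidate` and `IceCandidateInfo` are hypothetical names, and optional trailing attributes such as `raddr` and `rport` are ignored.

```typescript
type CandidateType = "host" | "srflx" | "prflx" | "relay";

interface IceCandidateInfo {
  foundation: string;
  component: number;
  transport: string;
  priority: number;
  address: string;
  port: number;
  type: CandidateType; // host = local, srflx/prflx = reflexive (NAT), relay = TURN
}

const CANDIDATE_TYPES: readonly string[] = ["host", "srflx", "prflx", "relay"];

function isCandidateType(value: string): value is CandidateType {
  return CANDIDATE_TYPES.indexOf(value) !== -1;
}

// Minimal sketch: parses the mandatory fields and ignores extension attributes.
// Returns null when the line does not look like a candidate.
function parseCandidate(line: string): IceCandidateInfo | null {
  const fields = line
    .trim()
    .replace(/^a=/, "")
    .replace(/^candidate:/, "")
    .split(/\s+/);
  if (fields.length < 8 || fields[6] !== "typ") {
    return null;
  }
  const type = fields[7];
  if (!isCandidateType(type)) {
    return null;
  }
  return {
    foundation: fields[0],
    component: Number(fields[1]),
    transport: fields[2].toLowerCase(),
    priority: Number(fields[3]),
    address: fields[4],
    port: Number(fields[5]),
    type,
  };
}

// Made-up sample candidates: a local address, a public address learned
// through STUN, and an address allocated on a TURN relay.
const hostCandidate = parseCandidate(
  "candidate:1 1 udp 2122260223 192.168.1.10 54321 typ host");
const reflexiveCandidate = parseCandidate(
  "candidate:2 1 udp 1686052607 203.0.113.5 61000 typ srflx raddr 192.168.1.10 rport 54321");
const relayCandidate = parseCandidate(
  "candidate:3 1 udp 41885439 198.51.100.7 49152 typ relay raddr 203.0.113.5 rport 61000");
const notACandidate = parseCandidate("m=audio 49170 RTP/AVP 0");
```

The `type` field shows where an address came from: `host` is a local interface, `srflx` is the public address seen by a STUN server, and `relay` is an address on a TURN server.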
NAT traversal
WebRTC has a built-in mechanism for using NAT traversal options such as STUN and TURN servers.
In this chapter, we will use public STUN servers, but in real life, you should install and configure your own STUN or TURN server. We will learn how to install a STUN server at the end of this chapter as a bonus to the developed application. We will get into installing and configuring the TURN server in Chapter 4, Security and Authentication, while diving into the details.
In most cases, a STUN server is enough; it helps the peers perform NAT/firewall traversal and establish a direct connection. In other words, the STUN server is used only while the connection is being established. After that, the peers transfer the media data directly between them.
In some cases (unfortunately, they are not so rare), the STUN server won't help you get through a firewall or NAT, and establishing a direct connection between the peers will be impossible, for example, if both peers are behind a symmetric NAT. In this case, the TURN server can help you.
A TURN server works as a relay between the peers: when it is used, all the media data between the peers is transmitted through it.
If your application gives a list of several STUN/TURN servers to the WebRTC API, the web browser will prefer a direct connection discovered with the help of the STUN servers; if no direct connection can be established, it will fall back to the TURN servers automatically.
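The server list mentioned above is passed to the browser as the `iceServers` field of the `RTCPeerConnection` configuration. The following sketch builds such a configuration as a plain object so that it can run outside a browser. The host names, user name, and credential are placeholders, the local `IceServer` interface mirrors only the fields used here, and `buildPeerConfiguration` is a helper invented for this example.

```typescript
// Local mirror of the RTCIceServer fields used in this sketch.
interface IceServer {
  urls: string | string[];
  username?: string;
  credential?: string;
}

interface PeerConfiguration {
  iceServers: IceServer[];
}

// Hypothetical description of a TURN account; TURN servers normally require credentials.
interface TurnAccount {
  url: string;
  username: string;
  credential: string;
}

function buildPeerConfiguration(stunUrls: string[], turn?: TurnAccount): PeerConfiguration {
  const iceServers: IceServer[] = [];
  if (stunUrls.length > 0) {
    iceServers.push({ urls: stunUrls });
  }
  if (turn !== undefined) {
    iceServers.push({
      urls: turn.url,
      username: turn.username,
      credential: turn.credential,
    });
  }
  return { iceServers };
}

// Placeholder servers: replace them with your own STUN/TURN deployment.
const peerConfiguration = buildPeerConfiguration(
  ["stun:stun.example.org:3478"],
  { url: "turn:turn.example.org:3478", username: "demo-user", credential: "demo-secret" },
);

// In a browser, the object would be used like this:
// const connection = new RTCPeerConnection(peerConfiguration);
```

The application only supplies the list; choosing between direct and relayed paths is left to the browser's ICE implementation.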
WebSocket
WebSocket is a protocol that provides full-duplex communication channels over a single TCP connection. It is a relatively young protocol, but today all major web browsers, including Chrome, Internet Explorer, Opera, Firefox, and Safari, support it. WebSocket replaces techniques such as long polling for two-way communication between the browser and the server.
In this chapter, we will use WebSocket as a transport channel to develop a signaling server for our video conference service. Our peers will communicate with the signaling server over this channel.
Two important benefits of WebSocket are that it supports secure connections (the wss:// scheme runs over TLS, just as HTTPS does) and that it can work through web proxies (although some proxies block the WebSocket protocol).
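As a small illustration of the secure-channel point, a page served over HTTPS should open its signaling connection with the `wss://` scheme, and a page served over plain HTTP with `ws://`. The helper below picks the scheme from the page protocol; `signalingUrl` and the `/signaling` path are assumptions made for this sketch.

```typescript
// Chooses the WebSocket scheme that matches the page's protocol.
// pageProtocol has the same form as location.protocol in a browser,
// that is, "http:" or "https:" including the trailing colon.
function signalingUrl(pageProtocol: string, host: string, path: string): string {
  const scheme = pageProtocol === "https:" ? "wss" : "ws";
  return `${scheme}://${host}${path}`;
}

const secureUrl = signalingUrl("https:", "example.org", "/signaling");
const plainUrl = signalingUrl("http:", "localhost:8080", "/signaling");

// In a browser, the result would be used like this:
// const socket = new WebSocket(
//   signalingUrl(location.protocol, location.host, "/signaling"));
```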