A simple p2p video conference – the browser application
For client-side code that runs in the user's web browser, we will use plain JavaScript.
The WebRTC API functions have different names in different web browsers. To make your application work well with all the browsers, you need to detect which web browser your application is running under and use the appropriate API function names. First of all, we need to implement a helper or an adapter to the WebRTC API functions.
Please note that this situation with different function names is temporary, and after WebRTC is standardized, every browser will support the standard WebRTC API function names. Thus, the WebRTC adapter that we're developing here will probably not be necessary in the future.
Developing a WebRTC API adapter
Create the www/myrtcadapter.js file:
function initWebRTCAdapter() {
Check whether we're running the file in Firefox:
if (navigator.mozGetUserMedia) { webrtcDetectedBrowser = "firefox";
Redefine the RTCPeerConnection API function, the entity that holds and controls the peer connection itself:
RTCPeerConnection = mozRTCPeerConnection;
To control the session description entity, we will use RTCSessionDescription:
RTCSessionDescription = mozRTCSessionDescription;
To support the NAT traversal functionality, we need to use the RTCIceCandidate entity:
RTCIceCandidate = mozRTCIceCandidate;
We want to get access to audio and video, and for that we need to use the getUserMedia API function:
getUserMedia = navigator.mozGetUserMedia.bind(navigator);
Besides the WebRTC API functions, different web browsers have different ways to control HTML entities that we need to use. For example, Chrome and Firefox attach the media stream to a media entity (the HTML tag video) in different ways. Thus, we need to redefine additional functions here.
We define the following two functions to attach and reattach the media stream to a video HTML tag:
attachMediaStream = function(element, stream) { element.mozSrcObject = stream; element.play(); }; reattachMediaStream = function(to, from) { to.mozSrcObject = from.mozSrcObject; to.play(); };
Here, we define two functions for getting the audio and video tracks from a media stream. Unfortunately, there is no way to do this in Firefox versions older than 27; thus, here, we just define stub functions that return empty lists to keep our adapter universal:
if (!MediaStream.prototype.getVideoTracks) { MediaStream.prototype.getVideoTracks = function() { return []; }; }; if (!MediaStream.prototype.getAudioTracks) { MediaStream.prototype.getAudioTracks = function() { return []; }; }; return true;
Next, we do the same for Chrome:
} else if (navigator.webkitGetUserMedia) { webrtcDetectedBrowser = "chrome"; RTCPeerConnection = webkitRTCPeerConnection; getUserMedia = navigator.webkitGetUserMedia.bind(navigator);
As you can see here, Chrome attaches a media stream to a video element differently from Firefox:
attachMediaStream = function(element, stream) { element.src = webkitURL.createObjectURL(stream); }; reattachMediaStream = function(to, from) { to.src = from.src; };
Chrome does support getting video and audio tracks, so here we take a different approach from the one we used for Firefox: where the accessor functions are missing, we map them to the corresponding properties:
if (!webkitMediaStream.prototype.getVideoTracks) { webkitMediaStream.prototype.getVideoTracks = function() { return this.videoTracks; }; webkitMediaStream.prototype.getAudioTracks = function() { return this.audioTracks; }; } if (!webkitRTCPeerConnection.prototype.getLocalStreams) { webkitRTCPeerConnection.prototype.getLocalStreams = function() { return this.localStreams; }; webkitRTCPeerConnection.prototype.getRemoteStreams = function() { return this.remoteStreams; }; } return true; } else return false; };
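The detection order used by the adapter can be tried outside a browser if the check is written as a function of a navigator-like object. The following is only a sketch: detectWebRTCFlavor is a hypothetical name that is not part of the adapter, and the extra "standard" branch assumes a browser that exposes the unprefixed navigator.mediaDevices.getUserMedia function.

```javascript
// Hypothetical helper (not part of myrtcadapter.js): classify a
// navigator-like object without reading or writing any globals.
function detectWebRTCFlavor(nav) {
  if (!nav) return null;
  // Assumption: prefer the unprefixed, promise-based API when it exists.
  if (nav.mediaDevices && typeof nav.mediaDevices.getUserMedia === "function") {
    return "standard";
  }
  // Same order as the adapter: Firefox prefix first, then Chrome prefix.
  if (typeof nav.mozGetUserMedia === "function") return "firefox";
  if (typeof nav.webkitGetUserMedia === "function") return "chrome";
  return null;
}

// Usage with stand-in objects; in a page you would pass window.navigator.
console.log(detectWebRTCFlavor({ mozGetUserMedia: function () {} }));
console.log(detectWebRTCFlavor({ webkitGetUserMedia: function () {} }));
console.log(detectWebRTCFlavor({}));
```

Keeping the detection free of side effects makes it easy to check each branch with plain objects before wiring it into the adapter.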
Developing a WebRTC API wrapper
It is useful to develop a little WebRTC API wrapper library to use it in your application.
Create a file and name it www/myrtclib.js.
First of all, we need to define several variables to control WebRTC entities and use the API. We initialize them to null; once the adapter that we developed previously has run, these variables will refer to the appropriate API functions:
var RTCPeerConnection = null; var getUserMedia = null; var attachMediaStream = null; var reattachMediaStream = null; var webrtcDetectedBrowser = null;
Here, we keep the virtual room number:
var room = null;
The initiator variable keeps the initiator state that tells us whether we are calling our peer or are waiting for a call:
var initiator;
The following two variables keep the references to local and remote media streams:
var localStream; var remoteStream;
We need the pc variable to control a peer connection:
var pc = null;
As we discussed previously, we need a signaling mechanism to make our connection work. The following variable will store the URL that will point to our signaling server:
var signalingURL;
The following variables keep references to the local and remote HTML video entities (the video tags on the page):
var localVideo; var remoteVideo;
We want to know whether our signaling channel is ready for operation, and we need a variable to control it:
var channelReady; var channel;
Here, we define two STUN servers to support the NAT traversal functionality:
var pc_config = {"iceServers": [{url:'stun:23.21.150.121'}, {url:'stun:stun.l.google.com:19302'}]};
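The STUN entries above use the url key. Later revisions of the WebRTC specification name this field urls, so a configuration written in the older style may need converting for newer browsers. The following is a minimal sketch of such a conversion; normalizeIceServers is a hypothetical helper that is not part of the library built in this chapter.

```javascript
// Hypothetical helper: return a copy of the configuration in which every
// legacy {url: ...} entry is expressed as {urls: ...}. The input is not modified.
function normalizeIceServers(config) {
  var servers = (config && config.iceServers) || [];
  return {
    iceServers: servers.map(function (server) {
      var copy = {};
      for (var key in server) {
        if (key !== "url") copy[key] = server[key];
      }
      // Only fill in urls when the entry did not already provide it.
      if (copy.urls === undefined && server.url !== undefined) {
        copy.urls = server.url;
      }
      return copy;
    })
  };
}

// Usage with the same two servers as in pc_config.
var legacy = {"iceServers": [{url: "stun:23.21.150.121"},
                             {url: "stun:stun.l.google.com:19302"}]};
console.log(JSON.stringify(normalizeIceServers(legacy)));
```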
We also need to define constraints. Using this, we tell a web browser whether we want to use just audio for our conference, or video, or both:
var sdpConstraints = {'mandatory': {'OfferToReceiveAudio':true, 'OfferToReceiveVideo':true }};
Next, we define several wrapping/helping functions to make our code more universal and reusable.
This is our initialization function. It gets a signaling server's URL and references to local and remote video HTML entities.
Here, we perform the initialization of our API adapter that we developed earlier; after this, we will have universal API function names that we can use under any web browser that supports WebRTC.
After the adapter is initialized, we call the openChannel function that we use to initiate a connection to our signaling server:
function myrtclibinit(sURL, lv, rv) { signalingURL = sURL; localVideo = lv; remoteVideo = rv; initWebRTCAdapter(); openChannel(); };
The openChannel function opens a connection to our signaling server. Here, we use WebSockets as a transport layer, but it is not mandatory. You can create your own implementation using Ajax, for example, or any other suitable technology that you like the most:
function openChannel() { channelReady = false; channel = new WebSocket(signalingURL);
This callback function will be called if our signaling connection has been established successfully. We can't continue if the signaling channel has not been opened:
channel.onopen = onChannelOpened;
When our peer sends a message during the process of establishing the peer connection, the onChannelMessage callback function will be called and we will be able to react to it:
channel.onmessage = onChannelMessage;
If the signaling channel has been closed for some reason (our peer closed its browser or the signaling server has been powered down), we will get a notification through the onChannelClosed function and can react to the event: show a message to the user or try to re-establish the connection:
channel.onclose = onChannelClosed; };
We will get here after the signaling channel has been opened successfully and we can continue and start our conference:
function onChannelOpened() {
First of all, we need to indicate that the signaling channel is opened and alive:
channelReady = true;
Here, we work out whether we're calling our peer or waiting for a call from it.
We take the query string of our page's URL and look for the word room in it. If there is no such word, then we're going to create a virtual room and act passively, waiting for a call from someone.
If we find the word room, it means that someone has already created a virtual room and we want to enter it; we're in a calling state and should behave actively, trying to initiate a connection to our peer in the room.
We use the sendMessage function to send messages to our signaling server. If the virtual room has not been created yet, the signaling server will create it and return its room number to us. If we already have a virtual room number, we ask the signaling server to add us to the room; it will parse our message and send it to our peer to initiate the establishment of a direct connection:
if(location.search.substring(1,5) == "room") { room = location.search.substring(6); sendMessage({"type" : "ENTERROOM", "value" : room * 1}); initiator = true; } else { sendMessage({"type" : "GETROOM", "value" : ""}); initiator = false; }
With the virtual room sorted out, we now need to ask the browser for access to its media resources: video (web camera) and audio (microphone):
doGetUserMedia(); };
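The room lookup in onChannelOpened relies on fixed character positions, so it only works when room is the first query parameter. The following sketch shows one possible alternative using the standard URLSearchParams class; parseRoomFromSearch is a hypothetical name, and treating a non-numeric value as "no room" is an assumption rather than something the chapter's code does.

```javascript
// Hypothetical helper: extract the room number from a query string such as
// location.search. Returns a number, or null if no usable room is present.
function parseRoomFromSearch(search) {
  var value = new URLSearchParams(search).get("room");
  // Assumption: room numbers are non-negative integers.
  if (value === null || !/^\d+$/.test(value)) return null;
  return Number(value);
}

// Usage: the initiator flag follows directly from the result.
var room = parseRoomFromSearch("?lang=en&room=42");
var initiator = room !== null;
console.log(room, initiator);
console.log(parseRoomFromSearch(""));
```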
The following function is called when we get a message from our signaling server. Here, we can add some logging or any additional logic, but for now we just need to process the message and react to it:
function onChannelMessage(message) { processSignalingMessage(message.data); };
The onChannelClosed function will be called when the signaling server becomes unavailable (a dropped connection) or if the remote peer has closed the connection (the remote customer has closed its web browser, for example).
In this function, you can also show an appropriate message to your customer or implement any other additional logic.
In the following function, we just indicate that the channel has been closed, and we don't want to transfer any messages to our signaling server:
function onChannelClosed() { channelReady = false; };
To communicate with the signaling server, we use the sendMessage function. It gets a message as a JSON object, makes a string from it, and just transfers it to the signaling server.
When debugging, it is usually helpful to add some kind of message-logging functionality here:
function sendMessage(message) { var msgString = JSON.stringify(message); channel.send(msgString); };
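The message logging mentioned above could look roughly like the following sketch. It also refuses to send while the channel is not ready, which the chapter's sendMessage does not do. The createSender factory and its parameters are hypothetical; the channel is any object with a send method, so the example runs without a real WebSocket.

```javascript
// Hypothetical factory: builds a sendMessage function that logs every
// message and drops it if the channel is not ready.
function createSender(channel, isReady, log) {
  return function sendMessage(message) {
    var msgString = JSON.stringify(message);
    if (!isReady()) {
      log("dropped (channel not ready): " + msgString);
      return false;
    }
    log("C->S: " + msgString);
    channel.send(msgString);
    return true;
  };
}

// Usage with a stand-in channel that records what it was asked to send.
var sent = [];
var ready = false;
var send = createSender(
  { send: function (s) { sent.push(s); } },
  function () { return ready; },
  function (line) { console.log(line); }
);
send({ type: "GETROOM", value: "" }); // logged as dropped
ready = true;
send({ type: "GETROOM", value: "" }); // logged and sent
```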
We need to parse messages from the signaling server and react to them accordingly:
function processSignalingMessage(message) { var msg = JSON.parse(message);
If we get an offer message, then it means that someone is calling us and we need to answer the call:
if (msg.type === 'offer') { pc.setRemoteDescription(new RTCSessionDescription(msg)); doAnswer();
If we get an answer message from the signaling server, it means that we just tried to call someone and it replied with the answer message, confirming that it is ready to establish a direct connection:
} else if (msg.type === 'answer') { pc.setRemoteDescription(new RTCSessionDescription(msg));
When a remote peer sends a list of candidates to communicate with, we get this type of message from the signaling server. After we get this message, we add candidates to the peer connection:
} else if (msg.type === 'candidate') { var candidate = new RTCIceCandidate({sdpMLineIndex:msg.label, candidate:msg.candidate}); pc.addIceCandidate(candidate);
If we asked the signaling server to create a virtual room, it will send a GETROOM message with the created room's number. We need to store the number to use it later:
} else if (msg.type === 'GETROOM') { room = msg.value;
The OnRoomReceived function is called to implement additional functionality. Here, we can perform some UI-related actions, such as showing the room's URL to the customers so that they can share it with their friends:
OnRoomReceived(room);
If we get a URL from our friend that asks us to enter a virtual room but the room number is wrong or outdated, we will get the WRONGROOM message from the signaling server. If so, we simply redirect to the index page:
} else if (msg.type === 'WRONGROOM') { window.location.href = "/"; } };
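The if/else chain in processSignalingMessage can also be expressed as a lookup from message type to handler, which makes it easier to tolerate malformed or unknown messages. This is a sketch of that idea, not the chapter's implementation: createSignalingDispatcher is a hypothetical name, and the handlers here only record what they saw instead of touching a peer connection.

```javascript
// Hypothetical dispatcher: parses a raw signaling message and routes it
// to the handler registered for its type. Returns true if one was called.
function createSignalingDispatcher(handlers, log) {
  return function dispatch(raw) {
    var msg;
    try {
      msg = JSON.parse(raw);
    } catch (e) {
      log("ignoring malformed message: " + raw);
      return false;
    }
    var known = msg && Object.prototype.hasOwnProperty.call(handlers, msg.type);
    if (!known || typeof handlers[msg.type] !== "function") {
      log("ignoring unknown message type: " + (msg && msg.type));
      return false;
    }
    handlers[msg.type](msg);
    return true;
  };
}

// Usage with stand-in handlers for two of the message types.
var seen = [];
var dispatch = createSignalingDispatcher({
  GETROOM: function (msg) { seen.push("room " + msg.value); },
  WRONGROOM: function () { seen.push("wrong room"); }
}, function (line) { console.log(line); });

dispatch('{"type":"GETROOM","value":17}');
dispatch('{"type":"BYE"}');
dispatch('not json');
```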
Here, we're asking the web browser to get us access to the microphone and web camera.
Chrome will show a pop-up window asking the user whether to allow access, so you will not get access until the user decides. Chrome asks this every time the user opens your application page. To make Chrome remember the user's choice, you should serve the application over HTTPS with the SSL/TLS certificate properly configured on the web server that you're using. Please note that the certificate needs to be signed either by a public CA (Certificate Authority) or by a private CA whose identity has been configured in the browser or client computer. If the browser doesn't trust the certificate automatically and prompts the user to add an exception, Chrome will not remember the choice.
Firefox won't remember the choice, but this behavior may change in the future:
function doGetUserMedia() { var constraints = {"audio": true, "video": {"mandatory": {}, "optional": []}}; try {
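We ask the WebRTC API to call our callback function, onUserMediaSuccess, if the user has granted access. A refusal is reported through the third argument, the error callback, so we pass a function there rather than null:
getUserMedia(constraints, onUserMediaSuccess, function(error) { console.log("getUserMedia failed", error); });
The try/catch block only catches errors thrown by the call itself, for example when the browser rejects the arguments. In both failure paths, you probably want to add some logging and UI logic to inform your customer that something is wrong and we can't continue: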
} catch (e) { } };
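With the callback-style getUserMedia function, a refused permission request arrives through the error callback, while try/catch only sees errors thrown by the call itself. The following sketch routes both paths to a single failure handler. The requestMedia wrapper and the denyingGetUserMedia stand-in are hypothetical, and the error name used by the stand-in is illustrative only.

```javascript
// Hypothetical wrapper around a callback-style getUserMedia function.
function requestMedia(getUserMediaFn, constraints, onSuccess, onFailure) {
  try {
    getUserMediaFn(constraints, onSuccess, function (error) {
      // Failure reported by the browser, e.g. the user refused access.
      onFailure(error);
    });
  } catch (e) {
    // Failure thrown by the call itself, e.g. unsupported arguments.
    onFailure(e);
  }
}

// Stand-in that behaves like a browser in which the user denies access.
function denyingGetUserMedia(constraints, success, failure) {
  failure({ name: "PermissionDeniedError" });
}

var outcome = null;
requestMedia(denyingGetUserMedia, { audio: true, video: true },
  function (stream) { outcome = "granted"; },
  function (error) { outcome = "failed: " + error.name; });
console.log(outcome);
```

In the library, getUserMediaFn would be the getUserMedia variable set by the adapter, and onSuccess would be onUserMediaSuccess.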
Execution reaches the following function once the user has granted access to the web camera and microphone:
function onUserMediaSuccess(stream) {
We get a video stream from a local web camera and we want to show it on the page, so we're attaching the stream to the video tag:
attachMediaStream(localVideo, stream);
Store the stream in a variable because we want to refer to it later:
localStream = stream;
Now we're ready to create a direct connection to our peer:
createPeerConnection();
After the peer connection is created, we put our local video stream into it to make the remote peer see us:
pc.addStream(localStream);
Check whether we're waiting for a call or we're the caller. If we're the initiator, we call the doCall function to start establishing a direct connection:
if (initiator) doCall(); };
The following function will try to create a peer connection—a direct connection between peers:
function createPeerConnection() {
To improve the security of the connection, we ask the browser to switch on the DTLS-SRTP option. It enables the exchange of the cryptographic parameters and derives the keying material. The key exchange takes place in the media plane and is multiplexed on the same ports as the media itself.
This option was disabled in Chrome by default, but it has been enabled from Version 31 onwards. Nevertheless, we don't want to check the version of a browser used by our customer, so we can't rely on the default settings of the browser:
var pc_constraints = {"optional": [{"DtlsSrtpKeyAgreement": true}]}; try {
Create a peer connection using the WebRTC API function call. We pass a predefined list of STUN servers and connection configurations to the function:
pc = new RTCPeerConnection(pc_config, pc_constraints);
Here, we define a callback function to be called when we have ICE candidates to send to the remote peer:
pc.onicecandidate = onIceCandidate;
When the connection is established, the remote side will add its media stream to the connection. Here, we want to be informed of such an event in order to be able to show the remote video on our web page:
pc.onaddstream = onRemoteStreamAdded;
If the peer connection object cannot be created, we will get an exception. Here, you can add debug console logging and UI improvements to inform the customer that something is wrong:
} catch (e) { pc = null; return; } };
When we have ICE candidates from the WebRTC API, we want to send them to the remote peer in order to establish a connection:
function onIceCandidate(event) { if (event.candidate) sendMessage({type: 'candidate', label: event.candidate.sdpMLineIndex, id: event.candidate.sdpMid, candidate: event.candidate.candidate}); };
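The candidate message sent here and the RTCIceCandidate constructed in processSignalingMessage are two halves of one mapping. The sketch below writes both directions as plain functions so the round trip can be checked without a peer connection. The function names are hypothetical and the candidate string is a made-up sample. Unlike the chapter's receiving code, this sketch also passes the id field back as sdpMid.

```javascript
// Hypothetical helpers mirroring the chapter's wire format for candidates.
function candidateToMessage(candidate) {
  return {
    type: "candidate",
    label: candidate.sdpMLineIndex,
    id: candidate.sdpMid,
    candidate: candidate.candidate
  };
}

// Builds the plain object that would be passed to new RTCIceCandidate(...).
function messageToCandidateInit(msg) {
  return {
    sdpMLineIndex: msg.label,
    sdpMid: msg.id,
    candidate: msg.candidate
  };
}

// Round trip through JSON, as would happen over the signaling channel.
var local = {
  sdpMLineIndex: 0,
  sdpMid: "audio",
  candidate: "candidate:1 1 UDP 2122252543 192.0.2.1 54321 typ host" // sample only
};
var wire = JSON.stringify(candidateToMessage(local));
var init = messageToCandidateInit(JSON.parse(wire));
console.log(wire);
console.log(JSON.stringify(init));
```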
This function is called when a direct connection has been established and the remote peer has added its media stream to the connection. We want to show the remote video, so here we attach the remote stream to the video tag on the web page:
function onRemoteStreamAdded(event) { attachMediaStream(remoteVideo, event.stream);
We also want to store a reference to the remote stream in order to use it later:
remoteStream = event.stream; };
We call the following function when we're joining a virtual room and initiating a call to the remote peer:
function doCall() {
We don't want to use the data channel yet (as it will be introduced in the next chapter). It is enabled in Firefox by default so here, we're asking Firefox to disable it:
var constraints = {"optional": [], "mandatory": {"MozDontOfferDataChannel": true}};
Check whether we're running under Chrome and, if so, remove the Firefox-specific options that it doesn't need:
if (webrtcDetectedBrowser === "chrome") for (var prop in constraints.mandatory) if (prop.indexOf("Moz") != -1) delete constraints.mandatory[prop];
Merge browser options with the whole constraints entity, and call the createOffer function in order to initiate a peer connection. In case of success, we will get into the setLocalAndSendMessage function:
constraints = mergeConstraints(constraints, sdpConstraints); pc.createOffer(setLocalAndSendMessage, null, constraints); };
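The Firefox-only option filtering in doCall can be isolated as a small function that returns a filtered copy instead of deleting keys in place. This is a sketch under two assumptions: the name withoutPrefixedKeys is hypothetical, and it matches the prefix only at the start of a key, whereas the loop in doCall matches "Moz" anywhere in the key.

```javascript
// Hypothetical helper: copy an options object, leaving out every key
// that starts with the given prefix. The input object is not modified.
function withoutPrefixedKeys(options, prefix) {
  var result = {};
  for (var prop in options) {
    if (prop.indexOf(prefix) !== 0) result[prop] = options[prop];
  }
  return result;
}

// Usage with the option from doCall plus one browser-neutral option.
var mandatory = { MozDontOfferDataChannel: true, OfferToReceiveAudio: true };
var forChrome = withoutPrefixedKeys(mandatory, "Moz");
console.log(JSON.stringify(forChrome));
console.log(JSON.stringify(mandatory));
```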
If we're waiting for a call and have got an offer from a remote peer, we need to answer the call in order to establish a connection and begin the conference.
Here is the function that will be used to answer a call. As is the case with doCall, we will get into the setLocalAndSendMessage function in case of success:
function doAnswer() { pc.createAnswer(setLocalAndSendMessage, null, sdpConstraints); };
The following callback function is used by the WebRTC API during the process of establishing a connection. We receive a session description entity, set it as our local description, and send it as an SDP object to the remote peer via the signaling server:
function setLocalAndSendMessage(sessionDescription) { pc.setLocalDescription(sessionDescription); sendMessage(sessionDescription); };
The following is a simple helper that merges the constraints:
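function mergeConstraints(cons1, cons2) {
  var merged = cons1; // note: cons1 itself is modified and returned
  for (var name in cons2.mandatory) merged.mandatory[name] = cons2.mandatory[name];
  // concat returns a new array, so the result must be assigned back;
  // cons2.optional may be missing (as in sdpConstraints)
  merged.optional = merged.optional.concat(cons2.optional || []);
  return merged;
};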
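The helper above stores cons1 in merged, so it modifies its first argument. Where that side effect is unwanted, a non-mutating variant along the following lines may be preferable. This is only a sketch; mergeConstraintsCopy is a hypothetical name, and it treats a missing mandatory or optional member as empty.

```javascript
// Hypothetical non-mutating variant: builds a fresh object instead of
// writing into cons1. Later arguments override earlier mandatory keys.
function mergeConstraintsCopy(cons1, cons2) {
  var merged = { mandatory: {}, optional: [] };
  [cons1, cons2].forEach(function (cons) {
    if (!cons) return;
    for (var name in cons.mandatory) {
      merged.mandatory[name] = cons.mandatory[name];
    }
    // concat returns a new array, so the result is assigned back.
    merged.optional = merged.optional.concat(cons.optional || []);
  });
  return merged;
}

// Usage with values shaped like those in doCall and sdpConstraints.
var callConstraints = { optional: [], mandatory: { MozDontOfferDataChannel: true } };
var sdp = { mandatory: { OfferToReceiveAudio: true, OfferToReceiveVideo: true } };
var merged = mergeConstraintsCopy(callConstraints, sdp);
console.log(JSON.stringify(merged));
console.log(JSON.stringify(callConstraints));
```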
Developing an index page
We have two JavaScript files under our www directory: myrtclib.js and myrtcadapter.js.
Now, it's time to use them and create an index page of the application.
Create an index page file, www/index.html:
<!DOCTYPE html> <html> <head> <title>My WebRTC application</title>
Here, we define a style for the page that places the local and remote video objects side by side on the same row:
<style type="text/css"> section { width: 90%; height: 200px; background: red; margin: auto; padding: 10px; } div#lVideo { width: 45%; height: 200px; background: black; float: left; } div#rVideo { margin-left: 45%; height: 200px; background: black; } </style>
Include our adapter and wrapper JavaScript code:
<script type="text/javascript" src="myrtclib.js"></script> <script type="text/javascript" src="myrtcadapter.js"></script> </head>
We want to perform some additional actions after the page is loaded but before the conference starts, so we use the onLoad property of the body HTML tag to call the appropriate function:
<body onLoad="onPageLoad();">
The status div will be used to show information to the customer. For example, we will put a URL there with a virtual room number that is to be shared between the peers:
<div id='status'></div> <section>
Local and remote video objects
We use the autoplay option to start the video streaming automatically after the media stream has been attached.
We mute the local video object in order to avoid the local echo effect:
<div id='lVideo'> <video width="100%" height="100%" autoplay="autoplay" id="localVideo" muted="true"></video> </div> <div id='rVideo'> <video width="100%" height="100%" autoplay="autoplay" id="remoteVideo"></video> </div> </section>
The following function will be called by the web browser after the page has been loaded:
<script> function onPageLoad() {
First of all, we will try to make the UI look nicer. Here, we try to get the width of every video object and set an appropriate height parameter. We assume that the width/height is 4/3 and calculate the height for each object respectively:
var _divV = document.getElementById("lVideo");
var _w = _divV.offsetWidth;
var _h = _w * 3 / 4;
// offsetHeight is read-only, so the height is set through the style property
_divV.style.height = _h + 'px';
_divV = document.getElementById("rVideo");
_divV.style.height = _h + 'px';
This is the main point where we start our conference. We pass the signaling server's URL and local/remote objects' references to the initialization function, and the magic begins.
Please use the IP address and port values of the machine where your signaling server is running (we will begin to build it in the next section):
myrtclibinit("ws://IP:PORT",document.getElementById("localVideo"),document.getElementById("remoteVideo")); };
This is a callback function called from our myrtclib.js script when the signaling server returns a virtual room's number. Here, we construct an appropriate URL for our customer to share with a friend:
function OnRoomReceived(room) { var st = document.getElementById("status"); st.innerHTML = "Now, if somebody wants to join you, they should use this link: <a href=\""+window.location.href+"?room="+room+"\">"+window.location.href+"?room="+room+"</a>"; }; </script> </body> </html>
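OnRoomReceived appends "?room=" directly to window.location.href, which assumes that the current address has no query string of its own. The following sketch builds the link with the standard URL class instead; buildRoomUrl is a hypothetical helper, and replacing an existing room parameter rather than adding a second one is an assumption about the desired behavior.

```javascript
// Hypothetical helper: return href with its room query parameter set to
// the given room number, keeping any other query parameters.
function buildRoomUrl(href, room) {
  var url = new URL(href);
  url.searchParams.set("room", String(room));
  return url.toString();
}

// Usage: in the page, href would be window.location.href.
console.log(buildRoomUrl("http://example.com/", 7));
console.log(buildRoomUrl("http://example.com/index.html?lang=en", 7));
console.log(buildRoomUrl("http://example.com/?room=3", 7));
```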