Setup is probably the most crucial period of a WebRTC conference: it is when users need to grant access to their devices and the endpoints negotiate settings that work for each participant’s setup. The setup period ends when the participants are connected to each other or fail to connect. In this blog post, we go through the phases of the conference setup process and the metrics we use to diagnose the quality of the conference setup.
Steps in conference setup
WebRTC conference setup has multiple stages, some of which can run in parallel.
To set up a WebRTC conference, the endpoints need to figure out what settings work for everyone. They describe their settings using the Session Description Protocol (SDP), whose purpose is to convey information about media streams in multimedia sessions so that the recipients of a session description can participate in the session. The session descriptions, exchanged over the signaling protocol, carry information about, for example, the type of media (audio or video), codecs, transport protocols, and the information needed to receive media (essentially the ICE candidates). Using the ICE candidates (for example, the IP addresses and ports of the clients and servers), the WebRTC clients decide how to connect to each other through NATs and firewalls.
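To make the candidate information concrete, the sketch below parses a single `a=candidate` line from a session description. It follows the field order of the ICE candidate attribute in SDP (RFC 8839) and skips extension attributes; `IceCandidate` and `parse_candidate` are hypothetical helpers rather than part of any WebRTC library, and the addresses come from the documentation ranges.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IceCandidate:
    foundation: str
    component: int                       # 1 for RTP, 2 for RTCP when it is not multiplexed
    transport: str                       # "udp" or "tcp"
    priority: int
    address: str
    port: int
    cand_type: str                       # "host", "srflx", "prflx" or "relay"
    related_address: Optional[str] = None
    related_port: Optional[int] = None


def parse_candidate(line: str) -> IceCandidate:
    """Parse one SDP candidate attribute, e.g. 'a=candidate:... typ host'.

    Follows the RFC 8839 field order; extension attributes such as
    'generation' are skipped.
    """
    if line.startswith("a="):
        line = line[2:]
    if not line.startswith("candidate:"):
        raise ValueError("not a candidate attribute")
    fields = line[len("candidate:"):].split()
    if len(fields) < 8 or fields[6] != "typ":
        raise ValueError("malformed candidate attribute")
    cand = IceCandidate(
        foundation=fields[0],
        component=int(fields[1]),
        transport=fields[2].lower(),
        priority=int(fields[3]),
        address=fields[4],
        port=int(fields[5]),
        cand_type=fields[7],
    )
    # The remaining fields are name/value pairs: raddr and rport for
    # reflexive and relayed candidates, then optional extensions.
    rest = fields[8:]
    for name, value in zip(rest[0::2], rest[1::2]):
        if name == "raddr":
            cand.related_address = value
        elif name == "rport":
            cand.related_port = int(value)
    return cand


srflx = parse_candidate(
    "a=candidate:842163049 1 udp 1677729535 203.0.113.7 54321 "
    "typ srflx raddr 192.168.1.10 rport 54321 generation 0"
)
print(srflx.cand_type, srflx.address, srflx.port)  # srflx 203.0.113.7 54321
```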
The candidates are prioritized and sent to the other participants, giving each participant a prioritized list of the other participants’ candidates. Candidates can be sent as they are gathered or once all of them have been gathered. During the exchange, the endpoints start connectivity checks, which end when a working candidate pair is found between the endpoints. The conference starts when the connectivity checks have finished and media starts flowing between the endpoints.
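ICE (RFC 8445) defines how candidates are prioritized: each candidate gets a priority computed from its type, a local preference, and its component ID, and each local/remote candidate pair gets a priority that orders the connectivity checks. The minimal sketch below applies those formulas with the RFC’s recommended type preferences; the helper names are hypothetical, and it assumes the local agent is the controlling one. Browsers compute these values internally, so an application usually only sees the result, for example in `RTCIceCandidate.priority`.

```python
# Recommended type preferences from RFC 8445, section 5.1.2.2.
TYPE_PREFERENCE = {"host": 126, "prflx": 110, "srflx": 100, "relay": 0}


def candidate_priority(cand_type: str, local_pref: int = 65535, component: int = 1) -> int:
    """Candidate priority (RFC 8445, section 5.1.2.1)."""
    return (TYPE_PREFERENCE[cand_type] << 24) + (local_pref << 8) + (256 - component)


def pair_priority(controlling: int, controlled: int) -> int:
    """Candidate-pair priority (RFC 8445, section 6.1.2.3).

    G is the controlling agent's candidate priority, D the controlled agent's.
    """
    g, d = controlling, controlled
    return (1 << 32) * min(g, d) + 2 * max(g, d) + (1 if g > d else 0)


def check_order(local_types: list[str], remote_types: list[str]) -> list[tuple[str, str]]:
    """Order (local, remote) candidate-type pairs by pair priority, highest first.

    Assumes the local agent is controlling and every candidate uses the
    default local preference and component 1.
    """
    pairs = [
        (pair_priority(candidate_priority(lt), candidate_priority(rt)), lt, rt)
        for lt in local_types
        for rt in remote_types
    ]
    pairs.sort(key=lambda p: p[0], reverse=True)
    return [(lt, rt) for _, lt, rt in pairs]


print(candidate_priority("host"))  # 2130706431
print(check_order(["host", "srflx", "relay"], ["host", "relay"])[:2])
# [('host', 'host'), ('srflx', 'host')]
```

In this example, host-to-host pairs are checked first and pairs involving relayed candidates come last.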
The description above presents a linear process; however, some of the steps can run in parallel.
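One way to see why parallel steps matter is to treat setup as a dependency graph: the total time is set by the longest chain of dependent steps rather than by the sum of all steps. The sketch below computes each phase’s earliest finish time; the phase names, durations, and dependencies are hypothetical and chosen only for illustration, not a description of any particular browser.

```python
# Hypothetical phase durations in milliseconds, chosen only for illustration.
DURATIONS = {
    "get_user_media": 400,
    "ice_gathering": 150,
    "create_offer": 20,
    "signaling": 120,
    "connectivity_checks": 80,
}

# Hypothetical dependencies: phases that must finish before each phase starts.
# Assumes gathering does not wait for device access (e.g. candidates are
# gathered or cached early).
DEPENDS_ON = {
    "get_user_media": [],
    "ice_gathering": [],
    "create_offer": ["get_user_media"],
    "signaling": ["create_offer"],
    "connectivity_checks": ["signaling", "ice_gathering"],
}


def finish_times(durations: dict[str, int], deps: dict[str, list[str]]) -> dict[str, int]:
    """Earliest finish time of each phase when independent phases overlap.

    Assumes the dependency graph has no cycles.
    """
    finish: dict[str, int] = {}

    def finish_of(phase: str) -> int:
        if phase not in finish:
            start = max((finish_of(dep) for dep in deps.get(phase, [])), default=0)
            finish[phase] = start + durations[phase]
        return finish[phase]

    for phase in durations:
        finish_of(phase)
    return finish


print(sum(DURATIONS.values()))                            # 770: every phase back to back
print(max(finish_times(DURATIONS, DEPENDS_ON).values()))  # 620: overlapping phases
```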
Failures during conference setup
Due to the complex nature of the handshake, many things can go wrong. We broadly categorize these setup failures into three categories:
Media source related
- No working devices: the media device cannot be accessed (e.g., it is in use by another application, or the device has no camera or microphone).
- No permissions: the user refused to grant the application access to the device.
- Incorrect constraints: the media device settings requested by the application cannot be fulfilled (e.g., a frame rate that is too high).
For a more detailed explanation, read our blog post about errors in accessing user media.
Negotiation related
- SDP generation error: the SDP offer was not generated because of an error.
- Negotiation failure: the endpoints could not find a common configuration for crypto, network, or media.
During NAT/FW traversal
- ICE connection failure: the connectivity checks failed because of a restrictive NAT or firewall.
- Aborted: the user closed the call before the checks either completed or failed.
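The failure categories above can be derived from what the browser reports. The sketch below assumes the application records which setup step failed; the step labels and the `classify_setup_failure` helper are hypothetical. The media error names are standard `DOMException` names that `getUserMedia()` rejects with, and the ICE states are values of `RTCPeerConnection.iceConnectionState`; "set_description" stands for a rejected `setLocalDescription()` or `setRemoteDescription()` call. The mapping is a simplification, not an exhaustive list.

```python
from typing import Optional

# DOMException names that getUserMedia() can reject with, mapped to the
# media source categories (a simplified, non-exhaustive mapping).
MEDIA_ERRORS = {
    "NotFoundError": "no working devices",     # no matching device found
    "NotReadableError": "no working devices",  # device could not be read, e.g. in use
    "NotAllowedError": "no permissions",       # access denied
    "OverconstrainedError": "incorrect constraints",
}

# iceConnectionState values in which the checks have neither failed nor completed.
CHECKS_PENDING = ("new", "checking")


def classify_setup_failure(step: str, error_name: Optional[str] = None,
                           ice_state: Optional[str] = None) -> Optional[str]:
    """Map a failed setup step to a failure category, or None if it is not a setup failure.

    `step` uses hypothetical labels: "get_user_media", "create_offer",
    "set_description", "ice", or "user_hangup".
    """
    if step == "get_user_media":
        return MEDIA_ERRORS.get(error_name or "", "other media source error")
    if step == "create_offer":
        return "SDP generation error"
    if step == "set_description":
        return "negotiation failure"
    if step == "ice" and ice_state == "failed":
        return "ICE connection failure"
    if step == "user_hangup" and ice_state in CHECKS_PENDING:
        return "aborted"
    return None


print(classify_setup_failure("get_user_media", "NotAllowedError"))  # no permissions
print(classify_setup_failure("user_hangup", ice_state="checking"))  # aborted
```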
We also measure issues that occur after a conference has been set up, e.g., churn, renegotiation failures, connectivity disruptions, and missed calls; these will be covered in an upcoming blog post.
Metrics related to conference setup
Because setup is such a critical period, we calculate four metrics related to it to help debug possible issues:
- Setup delay
- Gathering delay
- Connection check delay
- Time to first media
Setup delay
Setup delay measures the time from joining the conference until the connectivity checks have finished, that is, the time it takes to be ready to send media to and receive media from the other endpoints in the conference. The community is working hard to reduce the setup delay, for example, by tweaking the interval at which connectivity checks are performed. Our last report showed a 30% improvement in setup times since autumn 2016. As a rule of thumb, a setup delay of more than five seconds is considered long.
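As a sketch, the setup delay can be computed from two timestamps recorded by the client and checked against the five-second rule of thumb. The `SetupTimestamps` class and the helpers are hypothetical; in a browser, the timestamps might, for example, be taken with `performance.now()` when the user joins and when `iceConnectionState` becomes `connected` or `completed`.

```python
from dataclasses import dataclass

LONG_SETUP_MS = 5000  # rule of thumb: a setup delay above five seconds is long


@dataclass
class SetupTimestamps:
    joined_ms: float           # participant joined the conference
    checks_finished_ms: float  # connectivity checks finished


def setup_delay_ms(ts: SetupTimestamps) -> float:
    """Time from joining until the endpoint is ready to send and receive media."""
    if ts.checks_finished_ms < ts.joined_ms:
        raise ValueError("checks cannot finish before the participant joins")
    return ts.checks_finished_ms - ts.joined_ms


def is_long_setup(ts: SetupTimestamps) -> bool:
    return setup_delay_ms(ts) > LONG_SETUP_MS


sessions = [SetupTimestamps(1_000, 3_500), SetupTimestamps(0, 6_200)]
print([setup_delay_ms(s) for s in sessions])  # [2500, 6200]
print([is_long_setup(s) for s in sessions])   # [False, True]
```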
Gathering delay
The ICE gathering delay is the time the endpoint takes to collect all of its ICE candidates. These include the host, server reflexive, and relayed candidates. Some services start negotiating candidates as soon as they become available, while others collect and cache candidates even before the end user intends to make a call and update them as part of their presence systems.
According to our previous WebRTC Industry report, the gathering delay is less than 100 ms for about 80% of WebRTC sessions. On the other hand, ICE gathering takes more than 1 second for about 5% of sessions. The key to lowering the gathering delay is to place TURN servers closer to the endpoints.
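Percentages like these come from per-session gathering delays. The sketch below assumes each session records when gathering started and when it completed (in a browser, for example, when `iceGatheringState` changes to `gathering` and then to `complete`) and computes the share of sessions below or above a threshold. The helper names and the sample delays are hypothetical, not data from the report.

```python
from typing import Sequence


def gathering_delay_ms(started_ms: float, complete_ms: float) -> float:
    """Time from the start of ICE gathering until the last candidate was gathered."""
    return complete_ms - started_ms


def share_below(delays_ms: Sequence[float], limit_ms: float) -> float:
    """Fraction of sessions whose delay is strictly below limit_ms."""
    if not delays_ms:
        raise ValueError("no sessions")
    return sum(1 for d in delays_ms if d < limit_ms) / len(delays_ms)


def share_above(delays_ms: Sequence[float], limit_ms: float) -> float:
    """Fraction of sessions whose delay is strictly above limit_ms."""
    if not delays_ms:
        raise ValueError("no sessions")
    return sum(1 for d in delays_ms if d > limit_ms) / len(delays_ms)


print(gathering_delay_ms(1_000, 1_090))  # 90

# Hypothetical per-session gathering delays in milliseconds.
delays = [40, 60, 75, 90, 95, 120, 300, 1500, 80, 50]
print(share_below(delays, 100))   # 0.7
print(share_above(delays, 1000))  # 0.1
```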
Connection check delay
The ICE connectivity check delay is the time from the start of the connectivity checks until the endpoint selects a candidate pair. This metric includes the network delay between the endpoints: if two endpoints are geographically far apart, the checks take longer.
Another factor is how often the checks are sent, i.e., the interval between two consecutive checks (the pacing interval). The longer the pacing interval, the longer the checks take. In October 2016, the connection check delay was less than 100 ms for 20% of WebRTC sessions.
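The effect of both factors can be illustrated with a deliberately simplified model: if one check starts per pacing interval in priority order and the first working pair needs one round trip, the delay is roughly the pair’s position in the check list times the pacing interval, plus the round-trip time. This is an assumption for illustration, not how any particular ICE implementation behaves; RFC 8445 calls the pacing interval Ta and suggests 50 ms as its default. The function name is hypothetical.

```python
def estimated_check_delay_ms(successful_pair_index: int, pacing_ms: float, rtt_ms: float) -> float:
    """Rough estimate of the connection check delay.

    Simplified model: one check is started per pacing interval, in priority
    order, and the first pair that works succeeds after one round trip.
    Retransmissions, triggered checks and nomination are ignored.
    """
    if successful_pair_index < 0 or pacing_ms <= 0 or rtt_ms < 0:
        raise ValueError("invalid model parameters")
    return successful_pair_index * pacing_ms + rtt_ms


# The fourth pair in the check list works (index 3); compare nearby peers
# (40 ms RTT) with distant ones (200 ms RTT) at two pacing intervals.
for pacing in (50, 20):
    for rtt in (40, 200):
        print(pacing, rtt, estimated_check_delay_ms(3, pacing, rtt))
```

With these inputs, shortening the pacing interval from 50 ms to 20 ms lowers the estimate from 190 ms to 100 ms for nearby peers, while distant peers stay slower (260 ms) because of the round trip.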
Time to first media
A conference participant experiences a delay, for example, between pressing a call button and seeing or hearing the other participants for the first time. Time To First Media (TTFM) measures this delay and reflects the real-world experience that users might mention in feedback or support tickets. Ideally, TTFM should be only slightly higher than the setup delay.
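Comparing TTFM with the setup delay shows how much of the user’s wait happens after the connection is ready. The sketch below assumes four hypothetical timestamps per call; `first_media_ms` could, for example, be the time of the first `getStats()` sample in which an `inbound-rtp` entry reports a non-zero `packetsReceived`. The class and helper names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CallTimeline:
    call_pressed_ms: float     # user pressed the call button
    joined_ms: float           # participant joined the conference
    checks_finished_ms: float  # connectivity checks finished
    first_media_ms: float      # first remote audio or video received


def ttfm_ms(t: CallTimeline) -> float:
    return t.first_media_ms - t.call_pressed_ms


def setup_delay_ms(t: CallTimeline) -> float:
    return t.checks_finished_ms - t.joined_ms


def media_gap_ms(t: CallTimeline) -> float:
    """How much longer the user waited than the setup delay alone explains."""
    return ttfm_ms(t) - setup_delay_ms(t)


call = CallTimeline(call_pressed_ms=0, joined_ms=100, checks_finished_ms=1_900, first_media_ms=2_300)
print(ttfm_ms(call), setup_delay_ms(call), media_gap_ms(call))  # 2300 1800 500
```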
We are releasing the next edition of the WebRTC Metrics Report series. Sign up for our monthly newsletter to be among the first to get the report.