WebRTC 1.0: Real-time Communication Between Browsers

This document defines a set of ECMAScript APIs in WebIDL to allow media to be sent to and received from another browser or device implementing the appropriate set of real-time protocols. This specification is being developed in conjunction with a protocol specification developed by the IETF RTCWEB group and an API specification to get access to local media devices developed by the Media Capture Task Force.

This document is neither complete nor stable, and as such is not yet suitable for commercial implementation. However, early experimentation is encouraged. The API is based on preliminary work done in the WHATWG. The Web Real-Time Communications Working Group expects this specification to evolve significantly based on:

The outcome of ongoing exchanges in the companion RTCWEB group at IETF to define the set of protocols that, together with this document, will enable real-time communications in Web browsers.
Privacy issues that arise when exposing local capabilities and local streams.
Technical discussions within the group.
Experience gained through early experimentation.
Feedback received from other groups and individuals.

This specification defines conformance criteria that apply to a single product: the user agent that implements the interfaces that it contains.

Conformance requirements phrased as algorithms or specific steps may be implemented in any manner, so long as the end result is equivalent. (In particular, the algorithms defined in this specification are intended to be easy to follow, and not intended to be performant.)

Implementations that use ECMAScript to implement the APIs defined in this specification must implement them in a manner consistent with the ECMAScript Bindings defined in the Web IDL specification [[!WEBIDL]], as this specification uses that specification and terminology.

Terminology

The EventHandler interface represents a callback used for event handlers as defined in [[!HTML5]].

The concepts queue a task and fires a simple event are defined in [[!HTML5]].

The terms event, event handlers and event handler event types are defined in [[!HTML5]].

The terms MediaStream, MediaStreamTrack, Constraints, and Consumer are defined in [[!GETUSERMEDIA]].

Peer-to-peer connections

Introduction

An RTCPeerConnection allows two users to communicate directly, browser to browser. Communications are coordinated via a signaling channel which is provided by unspecified means, but generally by a script in the page via the server, e.g. using XMLHttpRequest [[XMLHttpRequest]].

Configuration

RTCConfiguration Type

sequence<RTCIceServer> iceServers: An array containing URIs of servers available to be used by ICE, such as STUN and TURN servers.
RTCIceTransportPolicy iceTransportPolicy = "all": Indicates which candidates the ICE engine is allowed to use.
RTCBundlePolicy bundlePolicy = "balanced": Indicates which BundlePolicy to use.
DOMString peerIdentity: Sets the target peer identity for the RTCPeerConnection. The RTCPeerConnection will not establish a connection to a remote peer unless it can be successfully authenticated with the provided name.

RTCIceServer Type

(DOMString or sequence<DOMString>) urls: STUN or TURN URI(s) as defined in [[!RFC7064]] and [[!RFC7065]] or other URI types.
DOMString username: If this RTCIceServer object represents a TURN server, then this attribute specifies the username to use with that TURN server.
DOMString credential: If this RTCIceServer object represents a TURN server, then this attribute specifies the credential to use with that TURN server.

In network topologies with multiple layers of NATs, it is desirable to have a STUN server between every layer of NATs in addition to the TURN servers to minimize the peer-to-peer network latency.

An example array of RTCIceServer objects is:

[ { "urls": "stun:stun1.example.net" }, { "urls": "turn:turn.example.org", "username": "user", "credential": "myPassword" } ]

RTCIceTransportPolicy Enum

none: The ICE engine MUST NOT send or receive any packets at this point.
relay: The ICE engine MUST only use media relay candidates such as candidates passing through a TURN server. This can be used to reduce leakage of IP addresses in certain use cases.
all: The ICE engine may use any type of candidates when this value is specified.

RTCBundlePolicy Enum

Defined in [[!RTCWEB-JSEP]]. The following is a non-normative summary for convenience. The BundlePolicy affects which media tracks are negotiated if the remote endpoint is not BUNDLE-aware, and what ICE candidates are gathered. If the remote endpoint is BUNDLE-aware, all media tracks and data channels are BUNDLEd onto the same transport.

balanced: Gather ICE candidates for each media type in use (audio, video, and data). If the remote endpoint is not BUNDLE-aware, negotiate only one audio and video track on separate transports.
max-compat: Gather ICE candidates for each track. If the remote endpoint is not BUNDLE-aware, negotiate all media tracks on separate transports.
max-bundle: Gather ICE candidates for only one track. If the remote endpoint is not BUNDLE-aware, negotiate only one media track.
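
As a non-normative illustration, the dictionaries and enums above might be combined as in the following sketch. The helper name buildConfiguration is hypothetical, and the server URIs and credentials are the placeholder values from the example array above, not real services.

```javascript
// Hypothetical helper: builds an RTCConfiguration dictionary.
// Server URIs and credentials are placeholders, not real services.
function buildConfiguration(relayOnly) {
  return {
    iceServers: [
      { urls: "stun:stun1.example.net" },
      {
        // urls may be a single string or a sequence of strings.
        urls: ["turn:turn.example.org"],
        username: "user",
        credential: "myPassword",
      },
    ],
    // "relay" restricts ICE to relayed candidates; "all" is the default.
    iceTransportPolicy: relayOnly ? "relay" : "all",
    // "balanced" is the default bundle policy.
    bundlePolicy: "balanced",
  };
}

// Only construct a connection where the API exists (that is, in a browser).
if (typeof RTCPeerConnection !== "undefined") {
  const pc = new RTCPeerConnection(buildConfiguration(true));
  pc.close();
}
```

Passing true requests the relay policy, which the text above suggests for reducing IP address leakage; passing false keeps the default of all.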

Offer/Answer Options

These dictionaries describe the options that can be used to control the offer/answer creation process.

long offerToReceiveVideo

In some cases, an RTCPeerConnection may wish to receive video but not send any video. The RTCPeerConnection needs to know if it should signal to the remote side whether it wishes to receive video or not. This option allows an application to indicate its preferences for the number of video streams to receive when creating an offer.

long offerToReceiveAudio

In some cases, an RTCPeerConnection may wish to receive audio but not send any audio. The RTCPeerConnection needs to know if it should signal to the remote side whether it wishes to receive audio. This option allows an application to indicate its preferences for the number of audio streams to receive when creating an offer.

boolean voiceActivityDetection = true

Many codecs and systems are capable of detecting "silence" and changing their behavior in this case by doing things such as not transmitting any media. In many cases, such as when dealing with emergency calling or sounds other than spoken voice, it is desirable to be able to turn off this behavior. This option allows the application to provide information about whether it wishes this type of processing enabled or disabled.

boolean iceRestart = false

When the value of this dictionary member is true, the generated description will have ICE credentials that are different from the current credentials (as visible in the localDescription attribute's SDP). Applying the generated description will restart ICE.

When the value of this dictionary member is false, and the localDescription attribute has valid ICE credentials, the generated description will have the same ICE credentials as the current value from the localDescription attribute.

yes: An identity MUST be requested.
no: No identity is to be requested.
ifconfigured: The value "ifconfigured" means that an identity will be requested if either the user has configured an identity in the browser or if the setIdentityProvider() call has been made in JavaScript. As this is the default value, an identity will be requested if and only if the user has configured an IdP in some way.

RTCPeerConnection Interface

The general operation of the RTCPeerConnection is described in [[!RTCWEB-JSEP]].

Operation

Calling new RTCPeerConnection(configuration) creates an RTCPeerConnection object.

The configuration has the information to find and access the servers used by ICE. There may be multiple servers of each type and any TURN server also acts as a STUN server.

An RTCPeerConnection object has an associated ICE agent [[!ICE]], RTCPeerConnection signaling state, ICE gathering state, and ICE connection state. These are initialized when the object is created.

When the RTCPeerConnection() constructor is invoked, the user agent MUST run the following steps:

Validate the RTCConfiguration argument by running the steps defined by the updateIce() method.
Let connection be a newly created RTCPeerConnection object.
Create an ICE Agent as defined in [[!ICE]] and let connection's RTCPeerConnection ICE Agent be that ICE Agent and provide it the ICE servers list. The ICE Agent will proceed with gathering as soon as the ICE transports setting is not set to none. At this point the ICE Agent does not know how many ICE components it needs (and hence the number of candidates to gather), but it can make a reasonable assumption such as 2. As the RTCPeerConnection object gets more information, the ICE Agent can adjust the number of components.
Set connection's RTCPeerConnection signalingState to stable.
Set connection's RTCPeerConnection ice connection state to new.
Set connection's RTCPeerConnection ice gathering state to new.
Initialize an internal variable to represent a queue of operations with an empty set.
Return connection.

Once the RTCPeerConnection object has been initialized, for every call to createOffer, setLocalDescription, createAnswer and setRemoteDescription, execute the following steps:

Append an object representing the current call being handled (i.e. function name and corresponding arguments) to the operations array.
If the length of the operations array is exactly 1, execute the function from the front of the queue asynchronously.
When the asynchronous operation completes (either successfully or with an error), remove the corresponding object from the operations array. After removal, if the array is non-empty, execute the first object queued asynchronously and repeat this step on completion.

The general idea is to have only one among createOffer, setLocalDescription, createAnswer and setRemoteDescription executing at any given time. If subsequent calls are made while one of them is still executing, they are added to a queue and processed when the previous operation is fully completed. It is valid, and expected, for normal error handling procedures to be applied.
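
The queueing behavior described above can be modeled in script. The following is a minimal, non-normative sketch; the class name OperationQueue and its method names are hypothetical, and a user agent is free to implement the same behavior in any equivalent manner.

```javascript
// Minimal model of the operations queue: at most one operation runs at a time.
class OperationQueue {
  constructor() {
    // Pending operations; the front of the queue is at index 0.
    this.operations = [];
  }

  // `operation` is a function returning a promise, e.g. () => pc.createOffer().
  enqueue(operation) {
    return new Promise((resolve, reject) => {
      this.operations.push({ operation, resolve, reject });
      // Start immediately only if nothing else is queued or running.
      if (this.operations.length === 1) {
        this.runFront();
      }
    });
  }

  runFront() {
    const { operation, resolve, reject } = this.operations[0];
    // Run asynchronously so that enqueue() returns before the operation starts.
    Promise.resolve()
      .then(operation)
      .then(resolve, reject)
      .then(() => {
        // Completed (successfully or with an error): remove it, then run the next.
        this.operations.shift();
        if (this.operations.length > 0) {
          this.runFront();
        }
      });
  }
}
```

A failed operation rejects only its own promise; the next queued operation still runs, which matches the expectation that normal error handling applies.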

Additionally, during the lifetime of the RTCPeerConnection object, the following procedures are followed when an ICE event occurs:

If the RTCPeerConnection ice gathering state is new and the ICE transports setting is not set to none, the user agent MUST queue a task to start gathering ICE addresses and set the ice gathering state to gathering.
If the ICE Agent has found one or more candidate pairs for each MediaStreamTrack that forms a valid connection, the ICE connection state is changed to "connected".
When the ICE Agent finishes checking all candidate pairs, if at least one connection has been found for each MediaStreamTrack, the RTCPeerConnection ice connection state is changed to "completed"; otherwise "failed".

The section above shouldn't need to reference MediaStreamTracks when discussing the ICE connection state; one problem with this is that it doesn't handle the data channel situation properly. Rewrite this to refer to m-lines or ICE "media streams" or some such (here and in the later ICE connection state discussions.)

When the ICE Agent needs to notify the script about the candidate gathering progress, the user agent must queue a task to run the following steps:

Let connection be the RTCPeerConnection object associated with this ICE Agent.
If connection's RTCPeerConnection signalingState is closed, abort these steps.
If the intent of the ICE Agent is to notify the script that:
- A new candidate is available.
  
  Add the candidate to connection's localDescription and create a RTCIceCandidate object to represent the candidate. Let newCandidate be that object.
- The gathering process is done.
  
  Set connection's ice gathering state to completed and let newCandidate be null.
Fire an icecandidate event named icecandidate with newCandidate at connection.

The task source for the tasks listed in this section is the networking task source.
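
From the application's side, the steps above surface as icecandidate events whose candidate is either a new candidate or null once gathering is done. The following non-normative sketch shows one way a script might react. It assumes the event exposes the candidate through a candidate attribute; sendToPeer and the message shapes are hypothetical stand-ins for the application's signaling channel.

```javascript
// `sendToPeer` is a hypothetical signaling function supplied by the application.
function createIceCandidateHandler(sendToPeer) {
  return function onicecandidate(event) {
    if (event.candidate === null) {
      // A null candidate signals that the gathering process is done.
      sendToPeer({ type: "end-of-candidates" });
      return;
    }
    // Forward each new local candidate to the remote peer as it is found.
    sendToPeer({ type: "candidate", candidate: event.candidate });
  };
}

// Browser usage (assumes `pc` is an RTCPeerConnection and `signaling.send` exists):
// pc.onicecandidate = createIceCandidateHandler(
//   (message) => signaling.send(JSON.stringify(message)));
```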

To prevent network sniffing from allowing a fourth party to establish a connection to a peer using the information sent out-of-band to the other peer and thus spoofing the client, the configuration information SHOULD always be transmitted using an encrypted connection.

Interface Definition

Constructor (RTCConfiguration configuration)

See the RTCPeerConnection constructor algorithm.

Promise<RTCSessionDescription> createOffer ( optional RTCOfferOptions options)

The createOffer method generates a blob of SDP that contains an RFC 3264 offer with the supported configurations for the session, including descriptions of the local MediaStreams attached to this RTCPeerConnection, the codec/RTP/RTCP options supported by this implementation, and any candidates that have been gathered by the ICE Agent. The options parameter may be supplied to provide additional control over the offer generated.

As an offer, the generated SDP will contain the full set of capabilities supported by the session (as opposed to an answer, which will include only a specific negotiated subset to use); for each SDP line, the generation of the SDP must follow the appropriate process for generating an offer. In the event createOffer is called after the session is established, createOffer will generate an offer that is compatible with the current session, incorporating any changes that have been made to the session since the last complete offer-answer exchange, such as addition or removal of streams. If no changes have been made, the offer will include the capabilities of the current local description as well as any additional capabilities that could be negotiated in an updated offer.

Session descriptions generated by createOffer MUST be immediately usable by setLocalDescription without causing an error as long as setLocalDescription is called reasonably soon. If a system has limited resources (e.g. a finite number of decoders), createOffer needs to return an offer that reflects the current state of the system, so that setLocalDescription will succeed when it attempts to acquire those resources. The session descriptions MUST remain usable by setLocalDescription without causing an error until at least the end of the fulfillment callback of the returned promise. Calling this method is needed to get the ICE user name fragment and password.

If the RTCPeerConnection is configured to generate Identity assertions, then the session description SHALL contain an appropriate assertion.

If this RTCPeerConnection object is closed before the SDP generation process completes, the user agent MUST suppress the result and not resolve or reject the returned promise.

If the SDP generation process completed successfully, the user agent MUST resolve the returned promise with a newly created RTCSessionDescription object, representing the generated offer.

If the SDP generation process failed for any reason, the user agent MUST reject the returned promise with a DOMError object of type TBD as its argument.
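
A typical caller creates the offer, applies it as the local description right away, and then passes it to the remote side. The following is a non-normative sketch of that sequence; the function name sendOffer and the sendToPeer signaling callback are hypothetical, and pc is assumed to behave like an RTCPeerConnection.

```javascript
// `pc` is expected to behave like an RTCPeerConnection; `sendToPeer` is a
// hypothetical signaling function supplied by the application.
async function sendOffer(pc, sendToPeer, options = {}) {
  // Rejections from createOffer or setLocalDescription propagate to the caller.
  const offer = await pc.createOffer(options);
  // Apply the offer promptly, while it is still guaranteed to be usable.
  await pc.setLocalDescription(offer);
  sendToPeer({ type: offer.type, sdp: offer.sdp });
  return offer;
}

// Browser usage, requesting new ICE credentials via the iceRestart option:
// sendOffer(pc, (msg) => signaling.send(JSON.stringify(msg)), { iceRestart: true });
```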

To Do: Discuss privacy aspects of this from a fingerprinting point of view - it's probably around as bad as access to a canvas :-)

Promise<RTCSessionDescription> createAnswer ()

The createAnswer method generates an [[!SDP]] answer with the supported configuration for the session that is compatible with the parameters in the remote configuration. Like createOffer, the returned blob contains descriptions of the local MediaStreams attached to this RTCPeerConnection, the codec/RTP/RTCP options negotiated for this session, and any candidates that have been gathered by the ICE Agent. The options parameter may be supplied to provide additional control over the generated answer.

As an answer, the generated SDP will contain a specific configuration that, along with the corresponding offer, specifies how the media plane should be established. The generation of the SDP must follow the appropriate process for generating an answer.

Session descriptions generated by createAnswer must be immediately usable by setLocalDescription without causing an error as long as setLocalDescription is called reasonably soon. Like createOffer, the returned description should reflect the current state of the system. The session descriptions MUST remain usable by setLocalDescription without causing an error until at least the end of the fulfillment callback of the returned promise. Calling this method is needed to get the ICE user name fragment and password.

An answer can be marked as provisional, as described in [[!RTCWEB-JSEP]], by setting the type to "pranswer".

If the RTCPeerConnection is configured to generate Identity assertions, then the session description SHALL contain an appropriate assertion.

If this RTCPeerConnection object is closed before the SDP generation process completes, the user agent MUST suppress the result and not resolve or reject the returned promise.

If the SDP generation process completed successfully, the user agent MUST resolve the returned promise with a newly created RTCSessionDescription object, representing the generated answer.

If the SDP generation process failed for any reason, the user agent MUST reject the returned promise with a DOMError object of type TBD.
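
On the answering side, the remote offer is applied first, then the answer is created and set as the local description. The following non-normative sketch shows that order; answerOffer and sendToPeer are hypothetical names, and pc is assumed to behave like an RTCPeerConnection.

```javascript
// `remoteOffer` is the session description received over the signaling channel.
// `sendToPeer` is a hypothetical signaling function supplied by the application.
async function answerOffer(pc, remoteOffer, sendToPeer) {
  // The answer must be compatible with the remote configuration, so the
  // remote offer is applied before createAnswer is called.
  await pc.setRemoteDescription(remoteOffer);
  const answer = await pc.createAnswer();
  // Apply the answer promptly, while it is still guaranteed to be usable.
  await pc.setLocalDescription(answer);
  sendToPeer({ type: answer.type, sdp: answer.sdp });
  return answer;
}
```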

Promise<void> setLocalDescription ( RTCSessionDescription description)

The setLocalDescription() method instructs the RTCPeerConnection to apply the supplied RTCSessionDescription as the local description.

This API changes the local media state. In order to successfully handle scenarios where the application wants to offer to change from one media format to a different, incompatible format, the RTCPeerConnection must be able to simultaneously support use of both the old and new local descriptions (e.g. support codecs that exist in both descriptions) until a final answer is received, at which point the RTCPeerConnection can fully adopt the new local description, or roll back to the old description if the remote side denied the change.

ISSUE: how to indicate to rollback?

To Do: specify what parts of the SDP can be changed between the createOffer and setLocalDescription

The following list describes the processing model for setting a new RTCSessionDescription.

When the method is invoked, the user agent MUST run the following steps:
1. Let p be a new promise.
2. If this RTCPeerConnection object's signaling state is closed, the user agent MUST reject p with InvalidStateError, and jump to the step labeled Return.
3. If a local description contains a different set of ICE credentials, then the ICE Agent MUST trigger an ICE restart. When ICE restarts, the gathering state will be changed back to "gathering", if it was not already gathering. If the RTCPeerConnection ice connection state was "completed", it will be changed back to "connected".
4. The user agent must start the process to apply the RTCSessionDescription argument.
5. Return: Return p.
If the process to apply the RTCSessionDescription argument fails for any reason, then the user agent must queue a task that runs the following steps:
1. Let connection be the RTCPeerConnection object on which this method was invoked.
2. If connection's signaling state is closed, then abort these steps.
3. If the reason for the failure is:
  - The content of the RTCSessionDescription argument is invalid or the type is wrong for the current signaling state of connection.
    
    Let reason be InvalidSessionDescriptionError.
  - The RTCSessionDescription is a valid description but cannot be applied at the media layer.
    
    TODO ISSUE - next few points are probably wrong. Make sure to check this in setRemote too.
    
    This can happen, e.g., if there are insufficient resources to apply the SDP. The user agent MUST then roll back as necessary if the new description was partially applied when the failure occurred.
    
    If rollback was not necessary or was completed successfully, let reason be IncompatibleSessionDescriptionError. If rollback was not possible, let reason be InternalError and set connection's signaling state to closed.
4. Reject p with reason.
If the RTCSessionDescription argument is applied successfully, then the user agent must queue a task that runs the following steps:
1. Let connection be the RTCPeerConnection object on which this method was invoked.
2. If connection's signaling state is closed, then abort these steps.
3. Set connection's description attribute (localDescription or remoteDescription depending on the setting operation) to the RTCSessionDescription argument.
4. If the local description was set, connection's ice gathering state is new, and the local description contains media, then set connection's ice gathering state to gathering.
5. If the local description was set with content that caused an ICE restart, then set connection's ice gathering state to gathering.
6. Set connection's signalingState accordingly.
7. If connection's signalingState changed, fire a simple event named signalingstatechange at connection.
8. Resolve p with undefined.
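
The step that sets the signalingState "accordingly" depends on the offer/answer state machine in [[!RTCWEB-JSEP]]. As a non-normative sketch, the table below encodes those transitions as they are commonly understood. Only stable and closed are named in this section, so the remaining state names, the helper name nextSignalingState, and the "local"/"remote" source labels are assumptions made for illustration.

```javascript
// Transition table keyed by current state, then by "<source>:<type>", where
// source is "local" (setLocalDescription) or "remote" (setRemoteDescription).
const SIGNALING_TRANSITIONS = {
  "stable": {
    "local:offer": "have-local-offer",
    "remote:offer": "have-remote-offer",
  },
  "have-local-offer": {
    "local:offer": "have-local-offer",
    "remote:answer": "stable",
    "remote:pranswer": "have-remote-pranswer",
  },
  "have-remote-offer": {
    "remote:offer": "have-remote-offer",
    "local:answer": "stable",
    "local:pranswer": "have-local-pranswer",
  },
  "have-local-pranswer": {
    "local:pranswer": "have-local-pranswer",
    "local:answer": "stable",
  },
  "have-remote-pranswer": {
    "remote:pranswer": "have-remote-pranswer",
    "remote:answer": "stable",
  },
};

function namedError(name, message) {
  const error = new Error(message);
  error.name = name;
  return error;
}

// Returns the state after applying a description, or throws if not allowed.
function nextSignalingState(current, source, type) {
  if (current === "closed") {
    throw namedError("InvalidStateError", "connection is closed");
  }
  const next = (SIGNALING_TRANSITIONS[current] || {})[`${source}:${type}`];
  if (next === undefined) {
    // The type is wrong for the current signaling state.
    throw namedError("InvalidSessionDescriptionError",
      `cannot apply ${source} ${type} in state ${current}`);
  }
  return next;
}
```

A signalingstatechange event would be fired only when the returned state differs from the current one.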

readonly attribute RTCSessionDescription? localDescription

The localDescription attribute MUST return the last RTCSessionDescription that was successfully set using setLocalDescription(), plus any local candidates that have been generated by the ICE Agent since then.

A null object will be returned if the local description has not yet been set.

Promise<void> setRemoteDescription ( RTCSessionDescription description)

The setRemoteDescription() method instructs the RTCPeerConnection to apply the supplied RTCSessionDescription as the remote offer or answer. This API changes the local media state.

When the method is invoked, the user agent must follow the processing model of setLocalDescription(), with the following additional conditions:

If an a=identity attribute is present in the session description, the browser validates the identity assertion. Identity validation completes asynchronously and does not block the completion of setRemoteDescription, unless there is a target peer identity.

The target peer identity cannot be changed once set. Once set, if a different value is provided, the user agent MUST reject the returned promise with InvalidStateError and abort this operation.
If the "peerIdentity" configuration is applied to the RTCPeerConnection, this establishes a target peer identity. Alternatively, if the RTCPeerConnection has previously authenticated the identity of the peer (that is, there is a current value for peerIdentity), then this also establishes a target peer identity.

If there is a target peer identity, then setRemoteDescription rejects the returned promise, unless the description contains an identity assertion that matches the target peer identity. The RTCPeerConnection MAY be closed if the validated peer identity does not match the target peer identity.

readonly attribute RTCSessionDescription? remoteDescription

The remoteDescription attribute MUST return the last RTCSessionDescription that was successfully set using setRemoteDescription(), plus any remote candidates that have been supplied via addIceCandidate() since then.

A null object will be returned if the remote description has not yet been set.

readonly attribute RTCSignalingState signalingState

The signalingState attribute MUST return the RTCPeerConnection object's RTCPeerConnection signaling state.

void updateIce (RTCConfiguration configuration)

The updateIce method updates the ICE Agent process of gathering local candidates and pinging remote candidates.

This call may result in a change to the state of the ICE Agent, and may result in a change to media state if it results in connectivity being established.

When the updateIce() method is invoked, the user agent MUST run the following steps to process the RTCConfiguration dictionary:

If the iceTransportPolicy member is present, let its value be the ICE Agent's ICE transports setting.
If the iceTransportPolicy member was omitted and the ICE Agent's ICE transports setting is unset, set the ICE Agent's ICE transports setting to the iceTransportPolicy dictionary member default value.
If the iceServers dictionary member is present, but its value is an empty list, then throw an InvalidAccessError and abort these steps. If the list, on the other hand, has elements, each element must be validated by running the following sub-steps:
1. Let server be the current list element.
2. If the server.urls dictionary member is omitted or an empty list, then throw an InvalidAccessError and abort these steps.
3. If server.urls is a string, let urls be a list consisting of just that string. Otherwise, let urls refer to the server.urls list.
4. For each url in urls, parse the url and obtain scheme name. If the parsing fails or if scheme name is not implemented by the browser, throw a SyntaxError and abort these steps.
5. If scheme name is "turn" and either of the dictionary members server.username or server.credential are omitted, then throw an InvalidAccessError and abort these steps.
After passing the validation, let the iceServers dictionary member be the ICE Agent's ICE servers list.

If a new list of servers replaces the ICE Agent's existing ICE servers list, no action will be taken until the RTCPeerConnection's ice gathering state transitions to gathering. If a script wants this to happen immediately, it should do an ICE restart.
If the iceServers dictionary member was omitted, and the ICE Agent's ICE servers list is unset, throw an InvalidAccessError and abort these steps.

The exception types thrown in the above algorithm are provisional (until we decide what to do in each case).
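
The validation steps above can be sketched in script as follows. This is a non-normative model, not the user agent's implementation: the names SUPPORTED_SCHEMES, makeError, validateIceServers and applyConfiguration are hypothetical, the set of implemented schemes is an assumption, the URL parsing is reduced to extracting the scheme, and plain Error objects with a name stand in for the provisional exception types.

```javascript
// Schemes this sketch treats as implemented; a real user agent decides this.
const SUPPORTED_SCHEMES = ["stun", "stuns", "turn", "turns"];

function makeError(name, message) {
  const error = new Error(message);
  error.name = name;
  return error;
}

function validateIceServers(iceServers) {
  if (iceServers.length === 0) {
    throw makeError("InvalidAccessError", "iceServers is an empty list");
  }
  for (const server of iceServers) {
    if (server.urls === undefined ||
        (Array.isArray(server.urls) && server.urls.length === 0)) {
      throw makeError("InvalidAccessError", "urls is omitted or empty");
    }
    // A single string is treated as a list consisting of just that string.
    const urls = typeof server.urls === "string" ? [server.urls] : server.urls;
    for (const url of urls) {
      // Simplified parse: only the scheme name before the first colon is extracted.
      const match = /^([a-z][a-z0-9+.-]*):(.+)$/i.exec(url);
      const scheme = match === null ? null : match[1].toLowerCase();
      if (scheme === null || !SUPPORTED_SCHEMES.includes(scheme)) {
        throw makeError("SyntaxError", `cannot use URL: ${url}`);
      }
      if (scheme === "turn" &&
          (server.username === undefined || server.credential === undefined)) {
        throw makeError("InvalidAccessError", "TURN server lacks username or credential");
      }
    }
  }
}

// `agent` is a hypothetical plain-object model of the ICE Agent's settings.
// Returns the updated settings without modifying `agent`.
function applyConfiguration(agent, configuration) {
  const next = { ...agent };
  if (configuration.iceTransportPolicy !== undefined) {
    next.iceTransportPolicy = configuration.iceTransportPolicy;
  } else if (next.iceTransportPolicy === undefined) {
    next.iceTransportPolicy = "all"; // dictionary member default value
  }
  if (configuration.iceServers !== undefined) {
    validateIceServers(configuration.iceServers);
    next.iceServers = configuration.iceServers;
  } else if (next.iceServers === undefined) {
    throw makeError("InvalidAccessError", "no ICE servers list is set");
  }
  return next;
}
```

Returning a new settings object rather than mutating the existing one means a failed validation leaves the previous ICE servers list untouched.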

Promise<void> addIceCandidate (RTCIceCandidate candidate)

The addIceCandidate() method provides a remote candidate to the ICE Agent. In addition to being added to the remote description, connectivity checks will be sent to the new candidates as long as the ICE Transports setting is not set to none. This call will result in a change to the connection state of the ICE Agent, and may result in a change to media state if it results in different connectivity being established.

Let p be a new promise.
If this RTCPeerConnection object's signaling state is closed, the user agent MUST reject p with InvalidStateError, and jump to the step labeled Return.
If the candidate parameter is malformed, reject p with SyntaxError and jump to the step labeled Return.
If the candidate could not be successfully applied, reject p with a DOMError object whose name attribute has the value TBD (TODO InvalidCandidate and InvalidMidIndex) and jump to the step labeled Return.
If the candidate is successfully applied, resolve p with undefined.
Return: Return p.
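
From the application's side, the steps above mean that every remote candidate received over the signaling channel is handed to addIceCandidate() and that the returned promise may reject. The following non-normative sketch shows one way to handle both outcomes; applyRemoteCandidate is a hypothetical name, pc is assumed to behave like an RTCPeerConnection, and the error names are the provisional ones listed above.

```javascript
// Resolves to true if the candidate was applied, false if it was rejected.
async function applyRemoteCandidate(pc, candidate) {
  try {
    await pc.addIceCandidate(candidate);
    return true;
  } catch (error) {
    // Possible rejections per the steps above: InvalidStateError (closed
    // connection), SyntaxError (malformed candidate), or an error whose
    // name is still to be decided.
    console.warn(`addIceCandidate rejected: ${error.name}`);
    return false;
  }
}
```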

What errors do we need here? Should we reuse the *SessionDescriptionError names or invent new ones for candidates? Should this method be queued?

readonly attribute RTCIceGatheringState iceGatheringState

The iceGatheringState attribute MUST return the gathering state of the RTCPeerConnection ICE Agent.

readonly attribute RTCIceConnectionState iceConnectionState

The iceConnectionState attribute MUST return the ICE connection state of the RTCPeerConnection ICE Agent.

readonly attribute boolean? canTrickleIceCandidates

This attribute indicates whether the remote peer is able to accept trickled ICE candidates [[TRICKLE-ICE]]. The value is determined based on whether a remote description indicates support for trickle ICE, as defined in Section 4.1.9 of [[!RTCWEB-JSEP]]. Prior to the completion of setRemoteDescription, this value is null.

RTCConfiguration getConfiguration()

Returns an RTCConfiguration object representing the current configuration of this RTCPeerConnection object.

When this method is called, the user agent MUST construct a new RTCConfiguration object to be returned, and initialize it using the ICE Agent's ICE transports setting and ICE servers list.

void close ()

When the RTCPeerConnection close() method is invoked, the user agent MUST run the following steps:

If the RTCPeerConnection object's RTCPeerConnection signalingState is closed, abort these steps.
Destroy the RTCPeerConnection ICE Agent, abruptly ending any active ICE processing and any active streaming, and releasing any relevant resources (e.g. TURN permissions).
Set the object's RTCPeerConnection signalingState to closed.

attribute EventHandler onnegotiationneeded

This event handler, of event handler event type negotiationneeded, MUST be supported by all objects implementing the RTCPeerConnection interface.

attribute EventHandler onicecandidate

This event handler, of event handler event type icecandidate, MUST be supported by all objects implementing the RTCPeerConnection interface.

attribute EventHandler onsignalingstatechange

This event handler, of event handler event type signalingstatechange, MUST be supported by all objects implementing the RTCPeerConnection interface. It is called any time the signalingState changes, i.e., from a call to setLocalDescription, a call to setRemoteDescription, or code. It does not fire for the initial state change into new.

attribute EventHandler oniceconnectionstatechange

This event handler, of event handler event type iceconnectionstatechange, MUST be fired by all objects implementing the RTCPeerConnection interface. It is called any time the RTCPeerConnection ice connection state changes.

attribute EventHandler onicegatheringstatechange

This event handler, of event handler event type icegatheringstatechange, MUST be fired by all objects implementing the RTCPeerConnection interface. It is called any time the RTCPeerConnection ice gathering state changes.

Legacy Interface Extensions

These methods are kept on RTCPeerConnection for legacy purposes.

void createOffer (RTCSessionDescriptionCallback successCallback, RTCPeerConnectionErrorCallback failureCallback, optional RTCOfferOptions options)

When the createOffer method is called, the user agent MUST run the following steps:

Let successCallback be the method's first argument.
Let failureCallback be the callback indicated by the method's second argument.
Let options be the method's third argument.
Invoke RTCPeerConnection.createOffer() with options as the sole argument, and let p be the resulting promise.

Upon fulfillment of p with value offer, invoke successCallback with offer as the argument.