Mapping RTP Streams to Controlling Multiple Streams for Telepresence (CLUE) Media Captures
Tel Aviv
Israel
ron.even.tlv@gmail.com
8x8, Inc. / Jitsi
Jersey City
NJ
07302
United States of America
jonathan.lennox@8x8.com
This document describes how the Real-time Transport Protocol (RTP) is used
in the context of the Controlling Multiple Streams for Telepresence (CLUE)
protocol. It also describes the mechanisms and recommended practice for
mapping RTP media streams, as defined in the Session Description Protocol
(SDP), to CLUE Media Captures and defines a new RTP header extension
(CaptureID).
Introduction
Telepresence systems can send and receive multiple media streams.
The CLUE Framework defines Media Captures
(MCs) as a source of Media, from one or more Capture Devices. A Media
Capture may also be constructed from other Media streams. A middlebox
can express conceptual Media Captures that it constructs from
Media streams it receives. A Multiple Content Capture (MCC) is a
special Media Capture composed of multiple Media Captures.
SIP Offer/Answer uses SDP to describe the RTP media streams. Each RTP
stream has a unique Synchronization Source (SSRC)
within its RTP session. The content of the RTP stream is created by
an encoder in the endpoint. This may be original content from a
camera or content created by an intermediary device such as a Multipoint Control Unit (MCU).
This document makes recommendations for the CLUE architecture about
how RTP and RTP Control Protocol (RTCP) streams should be encoded and transmitted and how
their relation to CLUE Media Captures should be communicated. The
proposed solution supports multiple RTP topologies.
With regard to the media (audio, video, and timed text), systems that
support CLUE use RTP for the media, SDP for codec and media transport
negotiation (CLUE individual encodings), and the CLUE protocol for
Media Capture description and selection. In order to associate the
media in the different protocols, there are three mappings that need to
be specified:
- CLUE individual encodings to SDP
- RTP streams to SDP (this is not a CLUE-specific mapping)
- RTP streams to MC to map the received RTP stream to the current MC
in the MCC.
Terminology
The key words "MUST", "MUST NOT",
"REQUIRED", "SHALL", "SHALL
NOT", "SHOULD", "SHOULD NOT",
"RECOMMENDED", "NOT RECOMMENDED",
"MAY", and "OPTIONAL" in this document are
to be interpreted as described in BCP 14
when, and only when, they appear in all capitals,
as shown here.
Definitions from the CLUE Framework
(see ) are used by this document as
well.
RTP Topologies for CLUE
The typical RTP topologies used by CLUE telepresence systems specify
different behaviors for RTP and RTCP distribution. A number of RTP
topologies are described in . For CLUE telepresence, the
relevant topologies include Point-to-Point, as well as Media-Mixing
Mixers, Media-Switching Mixers, and Selective Forwarding Middleboxes.
In the Point-to-Point topology, one peer communicates directly with a
single peer over unicast. There can be one or more RTP sessions,
each sent on a separate 5-tuple, that have a separate SSRC space,
with each RTP session carrying multiple RTP streams identified by
their SSRC. All SSRCs are recognized by the peers based on the
information in the RTCP Source description (SDES) report that
includes the Canonical Name (CNAME) and SSRC of the sent RTP streams. There are
different Point-to-Point use cases as specified in the CLUE use case
. In some cases, a CLUE session that, at a high level, is
Point-to-Point may nonetheless have an RTP stream that is best
described by one of the mixer topologies. For example, a CLUE
endpoint can produce composite or switched captures for use by a
receiving system with fewer displays than the sender has cameras.
The Media Capture may be described using an MCC.
For the media mixer topology, the peers communicate only
with the mixer. The mixer provides mixed or composited media
streams, using its own SSRC for the sent streams. If needed by the CLUE
endpoint, the conference roster information including conference
participants, endpoints, media, and media-id (SSRC) can be determined
using the conference event package element.
Media-Switching Mixers and Selective Forwarding Middleboxes behave as
described in .
Mapping CLUE Capture Encodings to RTP Streams
The different topologies described in create different SSRC
distribution models and RTP stream multiplexing points.
Most video conferencing systems today can separate multiple RTP
sources by placing them into RTP sessions using the SDP description;
the video conferencing application can also have some knowledge about
the purpose of each RTP session. For example, video conferencing
applications that have a primary video source and a slides video
source can send each media source in a separate RTP session with a
content attribute, enabling different application behavior
for each received RTP media source. Demultiplexing is
straightforward because each Media Capture is sent as a single RTP
stream, with each RTP stream being sent in a separate RTP session, on
a distinct UDP 5-tuple. This will also be true for mapping the RTP
streams to Capture Encodings, if each Capture Encoding
uses a separate RTP session and the consumer can identify it based
on the receiving RTP port. In this case, SDP only needs to label the
RTP session with an identifier that can be used to identify the Media
Capture in the CLUE description. The SDP label attribute serves as
this identifier.
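As a rough illustration of the label-based association described above, the following Python sketch scans an SDP body and records which "m=" line carries which "a=label:" value. The sample SDP, the label values "enc1" and "enc2", and the function name are made up for illustration; a real implementation would use a full SDP parser.

```python
from typing import Dict


def labels_by_media_section(sdp: str) -> Dict[str, int]:
    """Map each a=label value to the zero-based index of its m= line."""
    mapping: Dict[str, int] = {}
    m_index = -1  # -1 means we are still in the session-level part
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("m="):
            m_index += 1
        elif line.startswith("a=label:") and m_index >= 0:
            mapping[line[len("a=label:"):]] = m_index
    return mapping


# Hypothetical SDP fragment: two video RTP sessions, one label each.
EXAMPLE_SDP = """v=0
o=- 0 0 IN IP4 192.0.2.1
s=-
c=IN IP4 192.0.2.1
t=0 0
m=video 49200 RTP/SAVPF 96
a=label:enc1
m=video 49202 RTP/SAVPF 96
a=label:enc2
"""

print(labels_by_media_section(EXAMPLE_SDP))
```

A consumer could then use the label to look up the matching entry in the CLUE description, and the "m=" line index to find the RTP session that carries it.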
Each Capture Encoding MUST be sent as a separate RTP stream. CLUE
endpoints MUST support sending each such RTP stream in a separate RTP
session signaled by an SDP "m=" line. They MAY also support sending
some or all of the RTP streams in a single RTP session, using the
mechanism described in to
relate RTP streams to SDP "m=" lines.
MCCs bring another mapping issue, in that an MCC represents multiple
Media Captures that can be sent as part of the MCC if configured by
the consumer. When receiving an RTP stream that is mapped to the
MCC, the consumer needs to know which original MC it is in order to
get the MC parameters from the advertisement. If a consumer
requested an MCC, the original MC does not have a Capture Encoding, so
it cannot be associated with an "m=" line using a label as described in
"CLUE Signaling" . It is important, for
example, to get correct scaling information for the original MC,
which may be different for the various MCs that are contributing to
the MCC.
MCC Constituent CaptureID Definition
For an MCC that can represent multiple switched MCs, there is a need
to know which MC is represented in the current RTP stream at any
given time. This requires a mapping from the SSRC of the RTP stream
conveying a particular MCC to the constituent MC. In order to
address this mapping, this document defines an RTP header extension
and SDES item that includes the captureID of the original MC,
allowing the consumer to use the MC's original source attributes like
the spatial information.
This mapping temporarily associates the SSRC of the RTP stream
conveying a particular MCC with the captureID of the single original
MC that is currently switched into the MCC. This mapping cannot be
used for a composed case where more than one original MC is
composed into the MCC simultaneously.
If there is only one MC in the MCC, then the media provider MUST send
the captureID of the current constituent MC in the RTP header
extension and as an RTCP CaptureID SDES item. When the media provider
switches the MC it sends within an MCC, it MUST send the captureID
value for the MC that just switched into the MCC in an RTP header
extension and as an RTCP CaptureID SDES item as specified in .
If there is more than one MC composed into the MCC, then the media
provider MUST NOT send any of the MCs' captureIDs using this
mechanism. However, if an MCC is sending Contributing Source (CSRC)
information in the RTP header for a composed capture, it MAY send the
captureID values in the RTCP SDES packets giving source information
for the SSRC values sent as CSRCs.
If the media provider sends the captureID of a single MC switched
into an MCC, then later sends one composed stream of multiple MCs in
the same MCC, it MUST send the special value "-", a single-dash
character, as the captureID RTP header extension and RTCP CaptureID
SDES item. The single-dash character indicates there is no
applicable value for the MCC constituent CaptureID. The media
consumer interprets this as meaning that any previous CaptureID value
associated with this SSRC no longer applies. As
defines the captureID syntax as
"xs:ID", the single-dash character is not a legal captureID value, so
there is no possibility of confusing it with an actual captureID.
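The consumer-side bookkeeping implied by these rules can be sketched as follows. This is a minimal Python illustration, not a normative algorithm: the class and method names are hypothetical, and it assumes the caller has already extracted the captureID string from either the RTP header extension or the SDES item.

```python
from typing import Dict, Optional

NO_CAPTURE = "-"  # special value: no applicable constituent CaptureID


class CaptureIdTracker:
    """Remembers the most recent constituent captureID seen for each SSRC."""

    def __init__(self) -> None:
        self._current: Dict[int, str] = {}

    def on_capture_id(self, ssrc: int, capture_id: str) -> None:
        if capture_id == NO_CAPTURE:
            # Any previous value associated with this SSRC no longer applies.
            self._current.pop(ssrc, None)
        else:
            self._current[ssrc] = capture_id

    def current_capture(self, ssrc: int) -> Optional[str]:
        """Return the captureID currently switched into the MCC, if any."""
        return self._current.get(ssrc)


tracker = CaptureIdTracker()
tracker.on_capture_id(0x1234, "VC3")   # VC3 switched into the MCC
tracker.on_capture_id(0x1234, "VC5")   # provider switches to VC5
print(tracker.current_capture(0x1234))
tracker.on_capture_id(0x1234, "-")     # stream becomes a composed capture
print(tracker.current_capture(0x1234))
```

With the mapping in hand, the consumer can fetch the spatial attributes of the original MC from the advertisement; when the lookup returns nothing, no per-MC attributes apply.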
RTCP CaptureID SDES Item
This document specifies a new RTCP SDES item.
This CaptureID is a variable-length UTF-8 string corresponding to either
a CaptureID negotiated in the CLUE protocol or the single
character "-".
This SDES item MUST be sent in an SDES packet within a compound RTCP
packet unless support for Reduced-Size RTCP has been negotiated as
specified in RFC 5506, in which case it can be sent as an
SDES packet in a noncompound RTCP packet.
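For orientation, the sketch below serializes a single CaptId SDES item in the usual SDES item layout (one type octet, one length octet, then the text). It uses the item type value 14 listed in the IANA Considerations of this document. The function name is hypothetical, rejecting an empty string is an assumption of this sketch, and the surrounding SDES chunk and RTCP packet framing are deliberately left out.

```python
CCID_ITEM_TYPE = 14  # "CCID" value from the RTCP SDES Item Types registry


def encode_captid_sdes_item(capture_id: str) -> bytes:
    """Encode one SDES item: type octet, length octet, UTF-8 text."""
    text = capture_id.encode("utf-8")
    # The length field is a single octet; an empty CaptureID is rejected
    # here as an assumption, since it would carry no information.
    if not 1 <= len(text) <= 255:
        raise ValueError("CaptureID must encode to 1..255 octets")
    return bytes([CCID_ITEM_TYPE, len(text)]) + text


print(encode_captid_sdes_item("VC3").hex())
print(encode_captid_sdes_item("-").hex())
```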
RTP Header Extension
The CaptureID is also carried in an RTP header extension,
using the mechanism defined in .
Support is negotiated within SDP using the URN "urn:ietf:params:rtp-hdrext:sdes:CaptId".
The CaptureID is sent in an RTP header extension because for switched
captures, receivers need to know which original MC corresponds to the
media being sent for an MCC, in order to correctly apply geometric
adjustments to the received media.
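To make the wire format more concrete, the following Python sketch packs a CaptureID into an RTP header-extension block. It assumes the general one-byte header form of RFC 8285 (0xBEDE marker, 4-bit ID, 4-bit length-minus-one, zero padding to a 32-bit boundary). The extension ID 3 is a hypothetical value standing in for whatever "a=extmap" negotiated, and the function name is illustrative.

```python
import struct


def encode_captid_extension(ext_id: int, capture_id: str) -> bytes:
    """Build a header-extension block holding one one-byte-form element."""
    data = capture_id.encode("utf-8")
    if not 1 <= ext_id <= 14:
        raise ValueError("one-byte form allows extension IDs 1..14")
    if not 1 <= len(data) <= 16:
        raise ValueError("one-byte form carries 1..16 octets of data")
    # Element header: ID in the high nibble, (length - 1) in the low nibble.
    element = bytes([(ext_id << 4) | (len(data) - 1)]) + data
    body = element + b"\x00" * ((-len(element)) % 4)  # pad to 32-bit words
    # Block header: 0xBEDE, then the body length counted in 32-bit words.
    return struct.pack("!HH", 0xBEDE, len(body) // 4) + body


print(encode_captid_extension(3, "VC3").hex())
```

A CaptureID longer than 16 octets does not fit the one-byte form; handling that case (for example with the two-byte form) is outside this sketch.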
As discussed in , there is no need to send the CaptId Header
Extension with all RTP packets. Senders MAY choose to send it only
when a new MC is sent. If such a mode is being used, the header
extension SHOULD be sent in the first few RTP packets to reduce the
risk of losing it due to packet loss. See for further discussion.
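One possible sender policy for this mode is sketched below in Python. The class name and the repeat count of three packets are assumptions made for illustration; the text above only says "the first few RTP packets" and does not fix a number.

```python
from typing import Optional


class CaptIdSendPolicy:
    """Attach the CaptId extension only to the first packets after a switch."""

    def __init__(self, repeat_packets: int = 3) -> None:
        self._repeat = repeat_packets      # assumed value for "first few"
        self._remaining = 0
        self._capture_id: Optional[str] = None

    def switch_to(self, capture_id: str) -> None:
        """Record that a new MC has been switched into the MCC."""
        self._capture_id = capture_id
        self._remaining = self._repeat

    def extension_for_next_packet(self) -> Optional[str]:
        """Return the captureID to attach, or None to send no extension."""
        if self._remaining > 0:
            self._remaining -= 1
            return self._capture_id
        return None


policy = CaptIdSendPolicy(repeat_packets=2)
policy.switch_to("VC3")
print([policy.extension_for_next_packet() for _ in range(4)])
```

Repeating the value over several packets trades a few extra octets for robustness against the loss of any single packet.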
Examples
In this partial advertisement, the media provider advertises a
composed capture VC7 made of a big picture representing the current
speaker (VC3) and two picture-in-picture boxes representing the
previous speakers (the previous one -- VC5 -- and the oldest one -- VC6).
CS1
true
VC3
VC5
VC6
3
false
big picture of the current
speaker pips about previous speakers
1
it
static
individual
In this case, the media provider will send capture IDs VC3, VC5, or VC6
as an RTP header extension and RTCP SDES message for the RTP stream
associated with the MC.
Note that this is an excerpt from the full advertisement message
example in the CLUE data model and is not, on its own, a
valid XML document.
Communication Security
CLUE endpoints MUST support RTP/SAVPF profiles and the Secure Real-time Transport Protocol (SRTP).
CLUE endpoints MUST support DTLS and DTLS-SRTP
for SRTP keying.
All media channels SHOULD be secure via SRTP and the RTP/SAVPF
profile unless the RTP media and its associated RTCP are secure by
other means (see and ).
All CLUE implementations MUST support DTLS 1.2 with the
TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256 cipher suite and the P-256
curve. The DTLS-SRTP protection profile
SRTP_AES128_CM_HMAC_SHA1_80 MUST be supported for SRTP.
Implementations MUST favor cipher suites that support Perfect
Forward Secrecy (PFS) over non-PFS cipher suites and SHOULD favor
Authenticated Encryption with Associated Data (AEAD) over non-AEAD
cipher suites. Encrypted SRTP Header extensions MUST be supported.
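The preference rules in this paragraph can be pictured as a two-level sort. The Python sketch below is only an illustration: the Suite record and its hand-assigned "pfs" and "aead" flags are assumptions, and a real DTLS stack would express this through its own cipher-suite configuration rather than through code like this.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Suite:
    name: str
    pfs: bool   # offers Perfect Forward Secrecy (flag assigned by hand)
    aead: bool  # uses an AEAD cipher (flag assigned by hand)


def order_by_preference(suites: List[Suite]) -> List[Suite]:
    """PFS suites first (the MUST); AEAD breaks ties (the SHOULD)."""
    return sorted(suites, key=lambda s: (not s.pfs, not s.aead))


candidates = [
    Suite("TLS_RSA_WITH_AES_128_GCM_SHA256", pfs=False, aead=True),
    Suite("TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256", pfs=True, aead=False),
    Suite("TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256", pfs=True, aead=True),
]
for suite in order_by_preference(candidates):
    print(suite.name)
```

Sorting on PFS before AEAD reflects the different requirement levels: a non-PFS AEAD suite still ranks below any PFS suite.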
Implementations SHOULD implement DTLS 1.2 with the
TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256 cipher suite.
NULL Protection profiles MUST NOT be used for RTP or RTCP.
CLUE endpoints MUST generate short-term persistent RTCP CNAMEs, as
specified in , so that the CNAMEs cannot be used for long-term tracking
of the users.
IANA Considerations
This document defines a new extension URI in the "RTP SDES Compact
Header Extensions" subregistry of the "Real-Time Transport Protocol
(RTP) Parameters" registry, according to the following data:
- Extension URI: urn:ietf:params:rtp-hdrext:sdes:CaptId
- Description: CLUE CaptId
- Contact: <ron.even.tlv@gmail.com>
- Reference: RFC 8849
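The IANA has registered one new RTCP SDES item in the
"RTCP SDES Item Types" registry, as follows:

+-------+--------+-------------+-----------+
| Value | Abbrev | Name        | Reference |
+-------+--------+-------------+-----------+
| 14    | CCID   | CLUE CaptId | RFC 8849  |
+-------+--------+-------------+-----------+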
Security Considerations
The security considerations of the RTP specification, the RTP/SAVPF
profile, and the various RTP/RTCP extensions and RTP payload formats
that form the complete protocol suite described in this memo apply.
It is believed that there are no new security considerations
resulting from the combination of these various protocol extensions.
The "Extended Secure RTP Profile for Real-time Transport Control
Protocol (RTCP)-Based Feedback (RTP/SAVPF)" document provides
the handling of fundamental issues by offering confidentiality, integrity,
and partial source authentication. A mandatory-to-implement-and-use
media security solution is created by combining this secured RTP
profile and DTLS-SRTP keying as defined in the
communication security section of this memo.
RTCP packets convey a CNAME identifier that is used
to associate RTP packet streams that need to be synchronized across
related RTP sessions. Inappropriate choice of CNAME values can be a
privacy concern, since long-term persistent CNAME identifiers can be
used to track users across multiple calls. The communication
security section of this memo mandates the generation of
short-term persistent RTCP CNAMEs, as specified in , so they can't
be used for long-term tracking of the users.
Some potential denial-of-service attacks exist if the RTCP reporting
interval is configured to an inappropriate value. This could be done
by configuring the RTCP bandwidth fraction to an excessively large or
small value using the SDP "b=RR:" or "b=RS:" lines, or some
similar mechanism, or by choosing an excessively large or small value
for the RTP/AVPF minimal receiver report interval (if using SDP, this
is the "a=rtcp-fb:... trr-int" parameter). The risks are as
follows:
- The RTCP bandwidth could be configured to make the regular
reporting interval so large that effective congestion control
cannot be maintained, potentially leading to denial of service
due to congestion caused by the media traffic;
- The RTCP interval could be configured to a very small value,
causing endpoints to generate high-rate RTCP traffic, which potentially
leads to denial of service due to the non-congestion-controlled
RTCP traffic; and
- RTCP parameters could be configured differently for each
endpoint, with some of the endpoints using a large reporting
interval and some using a smaller interval, leading to denial of
service due to premature participant timeouts, which are due to mismatched
timeout periods that are based on the reporting interval (this
is a particular concern if endpoints use a small but non-zero
value for the RTP/AVPF minimal receiver report interval (trr-int)
, as discussed in ).
Premature participant timeout can be avoided by using the fixed
(non-reduced) minimum interval when calculating the participant timeout.
To address the other
concerns, endpoints SHOULD ignore parameters that configure the RTCP
reporting interval to be significantly longer than the default five-second
interval specified in (unless the media data rate is
so low that the longer reporting interval roughly corresponds to 5%
of the media data rate) or that configure the RTCP reporting
interval small enough that the RTCP bandwidth would exceed the media
bandwidth.
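A sanity check along these lines might look like the Python sketch below. The thresholds are assumptions chosen only to make the example concrete: "significantly longer" is taken to mean more than twice the five-second default, and "roughly corresponds to 5%" is taken to mean an RTCP share of at least 4% of the media rate. Neither number comes from this document, and the function name is hypothetical.

```python
DEFAULT_INTERVAL_S = 5.0     # default RTCP reporting interval
LONG_FACTOR = 2.0            # assumption: "significantly longer" = > 2x default
MIN_SHARE_FOR_LONG = 0.04    # assumption: "roughly 5%" = at least 4%


def should_ignore_rtcp_config(interval_s: float,
                              rtcp_bps: float,
                              media_bps: float) -> bool:
    """Return True if the signaled RTCP parameters look unsafe to honor."""
    if media_bps <= 0:
        raise ValueError("media_bps must be positive")
    if rtcp_bps > media_bps:
        return True  # RTCP traffic would exceed the media itself
    if interval_s > LONG_FACTOR * DEFAULT_INTERVAL_S:
        # A long interval is tolerated only for low-rate media, where
        # RTCP still takes roughly its usual share of the bandwidth.
        return rtcp_bps / media_bps < MIN_SHARE_FOR_LONG
    return False


print(should_ignore_rtcp_config(5.0, 5_000, 100_000))
print(should_ignore_rtcp_config(60.0, 100, 1_000_000))
```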
The guidelines in apply when using variable bit rate (VBR)
audio codecs such as Opus.
Encryption of the header extensions is RECOMMENDED,
unless there are known reasons, like RTP middleboxes performing voice-activity-based
source selection or third-party monitoring that will
greatly benefit from the information, and this has been expressed
using API or signaling. If further evidence is produced to show
that information leakage is significant from audio level indications,
then the use of encryption needs to be mandated at that time.
In multi-party communication scenarios using RTP middleboxes,
the middleboxes are REQUIRED, by this protocol, to not weaken the
sessions' security. The middlebox SHOULD maintain
confidentiality, maintain integrity, and perform source authentication. The
middlebox MAY perform checks that prevent any endpoint participating
in a conference from impersonating another. Some additional security
considerations regarding multi-party topologies can be found in
.
The CaptureID is created as part of the CLUE protocol. The CaptId
SDES item is used to convey the same CaptureID value in the SDES
item. When sending the SDES item, the security considerations
specified in and in the
communication security section of this memo (see ) are applicable.
Note that since the CaptureID is also carried in CLUE protocol
messages, it is RECOMMENDED that this SDES item use at least similar
protection profiles as the CLUE protocol messages carried in the CLUE
data channel.
References
Normative References
An XML Schema for the Controlling Multiple Streams for Telepresence (CLUE) Data Model
Framework for Telepresence Multi-Streams
Negotiating Media Multiplexing Using the Session Description Protocol (SDP)
Informative References
National Institute of Standards and Technology (NIST), "Digital Signature Standard (DSS)", FIPS PUB 186-4
Session Signaling for Controlling Multiple Streams for Telepresence (CLUE)
Acknowledgments
The authors would like to thank and
for
contributing text to this work. helped draft
the security section.