Congestion Control Requirements for Interactive Real-Time Media

Mozilla
United States of America
randell-ietf@jesup.org

Ericsson AB
Torshamnsgatan 23
Stockholm 164 83
Sweden
+46 10 717 37 43
zaheduzzaman.sarker@ericsson.com

Keywords: Interactive multimedia, webrtc, video communication, RTP/RTCP

Abstract

Congestion control is needed for all data transported across the
Internet, in order to promote fair usage and prevent congestion
collapse. The requirements for interactive, point-to-point real-time
multimedia, which needs low-delay, semi-reliable data delivery, are
different from the requirements for bulk transfer like FTP or bursty
transfers like web pages. Due to an increasing amount of RTP-based
real-time media traffic on the Internet (e.g., with the introduction of
the Web Real-Time Communication (WebRTC)), it is especially important to
ensure that this kind of traffic is congestion controlled.

This document describes a set of requirements that can be used to
evaluate other congestion control mechanisms in order to figure out
their fitness for this purpose, and in particular to provide a set of
possible requirements for a real-time media congestion avoidance
technique.

Status of This Memo
This document is not an Internet Standards Track specification; it is
published for informational purposes.
This document is a product of the Internet Engineering Task Force
(IETF). It represents the consensus of the IETF community. It has
received public review and has been approved for publication by the
Internet Engineering Steering Group (IESG). Not all documents
approved by the IESG are candidates for any level of Internet
Standard; see Section 2 of RFC 7841.
Information about the current status of this document, any
errata, and how to provide feedback on it may be obtained at
.
Copyright Notice
Copyright (c) 2021 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with
respect to this document. Code Components extracted from this
document must include Simplified BSD License text as described in
Section 4.e of the Trust Legal Provisions and are provided without
warranty as described in the Simplified BSD License.
Table of Contents
Introduction
  Requirements Language
Requirements
Deficiencies of Existing Mechanisms
IANA Considerations
Security Considerations
References
  Normative References
  Informative References
Acknowledgements
Authors' Addresses
Introduction

Most of today's TCP congestion control schemes were developed with a
focus on a use of the Internet for reliable bulk transfer of
non-time-critical data, such as transfer of large files. They have also
been used successfully to govern the reliable transfer of smaller chunks
of data in as short a time as possible, such as when fetching web
pages.

These algorithms have also been used for transfer of media streams
that are viewed in a non-interactive manner, such as "streaming" video,
where having the data ready when the viewer wants it is important, but
the exact timing of the delivery is not.

When handling real-time interactive media, the requirements are
different. One needs to provide the data continuously, within a very
limited time window (no more delay than hundreds of milliseconds
end-to-end). In addition, the sources of data may be able to adapt the
amount of data that needs sending within fairly wide margins, but they can be rate limited by the
application and may at times have no data to send at all. They may tolerate some
amount of packet loss, but since the data is generated in real time,
sending "future" data is impossible, and since it's consumed in
real time, data delivered late is commonly useless.

While the requirements for real-time interactive media differ from
the requirements for the other flow types, these other flow types will
be present in the network. The congestion control algorithm for
real-time interactive media must work properly when these other flow
types are present as cross traffic on the network.

One particular protocol portfolio being developed for this use case
is WebRTC, where one
envisions sending multiple flows using the Real-time Transport Protocol
(RTP) between two peers, in conjunction
with data flows, all at the same time, without having special
arrangements with the intervening service providers. As RTP does not
provide any congestion control mechanism, a set of circuit breakers,
such as those described in "Multimedia Congestion Control: Circuit
Breakers for Unicast RTP Sessions",
are required to protect the network from excessive congestion caused by
non-congestion-controlled flows. When the real-time interactive
media is congestion controlled, it is recommended that the
congestion control mechanism operate within the constraints defined by
these circuit breakers when a circuit breaker is present and that it should not
cause congestion collapse when a circuit breaker is not implemented.

Given that this use case is the focus of this document, use cases
involving non-interactive media, such as video streaming, and those
using multicast/broadcast-type technologies are out of scope.

The terminology defined in "Overview: Real-Time Protocols for
Browser-Based Applications" is used in this memo.

Requirements Language

The key words "MUST", "MUST NOT",
"REQUIRED", "SHALL",
"SHALL NOT", "SHOULD",
"SHOULD NOT",
"RECOMMENDED",
"MAY", and "OPTIONAL" in this document
are to be interpreted as described in BCP 14.

Requirements

The congestion control algorithm MUST attempt to provide
as-low-as-possible-delay transit for interactive real-time traffic
while still providing a useful amount of bandwidth. There may be
lower limits on the amount of bandwidth that is useful, but this is
largely application specific, and the application may be able to
modify or remove flows in order to allow some useful flows to get
enough bandwidth. For example, although there might not be enough bandwidth
for low-latency video+audio, there could be enough for audio only.
Jitter (variation in the bitrate over short timescales) is also
relevant, though moderate amounts of jitter will be absorbed
by jitter buffers. Transit delay should be considered to track
the short-term maximums of delay, including jitter.
The algorithm should provide this as-low-as-possible-delay transit and
minimize self-induced latency even when faced with intermediate
bottlenecks and competing flows. Competing flows may limit
what's possible to achieve.
The algorithm should be resilient to the effects of events, such as
routing changes, which may alter or remove bottlenecks or change
the bandwidth available, especially if there is a reduction in
available bandwidth or increase in observed delay. It is
expected that the mechanism reacts quickly to such events to
avoid delay buildup. In the context of this memo, a "quick"
reaction is on the order of a few RTTs, subject to the
constraints of the media codec, but is likely within a second.
Reaction on the next RTT is explicitly not required, since many
codecs cannot adapt their sending rate that quickly, but
at the same time a response cannot be arbitrarily delayed.
The algorithm should react quickly to handle both local and remote
interface changes (e.g., WLAN to 3G data) that may radically
change the bandwidth available or bottlenecks, especially if
there is a reduction in available bandwidth or an increase in
bottleneck delay. It is assumed that an interface change can
generate a notification to the algorithm.
The real-time interactive media applications can be rate
limited. This means the offered loads can be less than the
available bandwidth at any given moment and may vary
dramatically over time, including dropping to no load and then
resuming a high load, such as in a mute/unmute operation. Hence,
the algorithm must be designed to handle such behavior from
a media source or application. Note that the reaction time between
a change in the bandwidth available from the algorithm and a
change in the offered load is variable, and it may be different
when increasing versus decreasing.
The algorithm is required to avoid building up queues when
competing with short-term bursts of traffic (for example,
traffic generated by web browsing), which can quickly saturate a
local-bottleneck router or link but clear quickly. The
algorithm should also react quickly to regain its previous share
of the bandwidth when the local bottleneck or link is
cleared.
Similarly, periodic bursty flows such as MPEG DASH or proprietary media
streaming algorithms may compete in bursts with the algorithm and may not
be adaptive within a burst. They are often layered on top of TCP
but use TCP in a bursty manner that can interact poorly with
competing flows during the bursts. The algorithm must not
increase the already existing delay buildup during those bursts.
Note that this competing traffic may be on a shared access link,
or the traffic burst may cause a shift in the location of the
bottleneck for the duration of the burst.
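
The items above ask the algorithm to track the short-term maximums of
delay, including jitter, and to avoid adding to delay buildup during
bursts. As a rough illustration only, the following Python sketch keeps
a sliding-window maximum of one-way delay samples. The class name, the
1-second default window, and the millisecond units are assumptions made
for this example; this memo does not prescribe any particular estimator.

```python
from collections import deque


class ShortTermMaxDelay:
    """Track the largest one-way delay seen within a recent time window.

    Hypothetical helper: window length and units are illustrative choices.
    """

    def __init__(self, window_s=1.0):
        self.window_s = window_s
        # (arrival_time_s, delay_ms) pairs kept in decreasing delay order.
        self._samples = deque()

    def add(self, now_s, delay_ms):
        # Older, smaller samples can never be the maximum again: drop them.
        while self._samples and self._samples[-1][1] <= delay_ms:
            self._samples.pop()
        self._samples.append((now_s, delay_ms))
        self._expire(now_s)

    def _expire(self, now_s):
        # Forget samples that fell out of the observation window.
        while self._samples and self._samples[0][0] < now_s - self.window_s:
            self._samples.popleft()

    def current(self, now_s):
        """Return the windowed maximum, or None if no recent samples exist."""
        self._expire(now_s)
        return self._samples[0][1] if self._samples else None
```

A controller could compare this windowed maximum against a baseline delay
to decide whether a queue is building, which is one way (among many) to
react within a few RTTs rather than on every packet.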
The algorithm MUST be fair to other flows, both real-time flows
(such as other instances of itself) and TCP flows, both long-lived flows
and bursts such as the traffic generated by a typical web-browsing
session. Note that "fair" is a rather hard-to-define term. It SHOULD
be fair with itself, giving a fair share of the bandwidth to multiple
flows with similar RTTs, and if possible to multiple flows with
different RTTs.
Existing flows at a bottleneck must also be fair to new flows
to that bottleneck and must allow new flows to ramp up to a
useful share of the bottleneck bandwidth as quickly as possible.
A useful share will depend on the media types involved, total
bandwidth available, and the user-experience requirements of a
particular service. Note that relative RTTs may affect the rate
at which new flows can ramp up to a reasonable share.
The algorithm SHOULD NOT starve competing TCP flows and SHOULD,
as best as possible, avoid starvation by TCP flows.
The congestion control should prioritize achieving a useful
share of the bandwidth depending on the media types and total
available bandwidth over achieving as-low-as-possible transit
delay, when these two requirements are in conflict.
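
Because "fair" is hard to define, evaluations often need some numeric
yardstick. One commonly used metric, which this memo does not mandate,
is Jain's fairness index, (sum of rates)^2 / (n * sum of squared rates).
The sketch below computes it in Python; the function name and the
handling of the all-idle case are assumptions of this example.

```python
def jain_fairness_index(rates):
    """Return Jain's fairness index for a collection of per-flow rates.

    The index is 1.0 when every flow gets the same rate and tends toward
    1/n when a single flow takes all of the bandwidth.
    """
    rates = list(rates)
    if not rates:
        raise ValueError("need at least one flow")
    total = sum(rates)
    squares = sum(r * r for r in rates)
    if squares == 0:
        # Assumption: a set of idle flows is treated as trivially fair.
        return 1.0
    return (total * total) / (len(rates) * squares)
```

For instance, four flows with equal rates score 1.0, while one flow
taking everything among four scores 0.25. The index says nothing about
whether each share is "useful" for its media type, so it can only
complement the application-specific judgment described above.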
The algorithm SHOULD adapt as quickly as possible to initial
network conditions at the start of a flow. This SHOULD occur whether
the initial bandwidth is above or below the bottleneck bandwidth.
The algorithm should allow different modes of adaptation; for
example, the startup adaptation may be faster than adaptation
later in a flow. It should allow for both slow-start operation
(adapt up) and history-based startup (start at a point expected
to be at or below channel bandwidth from historical information,
which may need to adapt down quickly if the initial guess is
wrong). Starting too low and/or adapting up too slowly can cause
a critical point in a personal communication to be poor
("Hello!").
Starting too high above the available bandwidth causes other problems for
user experience, so there's a tension here. Alternative methods
to help startup, such as probing during setup with dummy data, may be
useful in some applications; in some cases, there will be a
considerable gap in time between flow creation and the initial
flow of data. Again, a flow may need to change adaptation rates
due to network conditions or changes in the provided flows (such
as unmuting or sending data after a gap).
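
The two startup modes described above (slow start versus history-based
start) can be sketched as follows. This is a minimal Python
illustration, not a proposed algorithm: the function names, the
150 kbit/s floor, the 0.8 safety factor, and the growth and backoff
multipliers are all hypothetical values chosen for the example.

```python
def initial_rate_bps(history_bps=None, floor_bps=150_000, safety=0.8):
    """Pick a starting send rate.

    Without history, start at a conservative floor and rely on a fast
    ramp-up. With history, start slightly below the remembered estimate
    so that a stale guess is less likely to overshoot.
    """
    if history_bps is None:
        return floor_bps
    return max(floor_bps, round(history_bps * safety))


def startup_step(rate_bps, congested, growth=1.5, backoff=0.5,
                 floor_bps=150_000):
    """Apply one adaptation step while in the (faster) startup mode."""
    if congested:
        # Adapt down quickly if the initial guess was too high.
        return max(floor_bps, round(rate_bps * backoff))
    return round(rate_bps * growth)
```

A flow with no history would call `initial_rate_bps()` and then
`startup_step(...)` once per feedback report until congestion is first
signaled, after which a slower steady-state mode would take over.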
The algorithm SHOULD be stable if the RTP streams are halted or
discontinuous (for example, when using Voice Activity Detection).
After stream resumption, the algorithm should attempt to
rapidly regain its previous share of the bandwidth; the
aggressiveness with which this is done will decay with the
length of the pause.
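
One way to make the aggressiveness of resumption decay with the length
of the pause is to discount the previous rate exponentially toward a
safe floor. The Python sketch below is an assumption-laden example: the
function name, the 5-second half-life, and the floor are hypothetical
and would need tuning against real network behavior.

```python
def resume_rate_bps(previous_bps, pause_s, floor_bps=150_000,
                    half_life_s=5.0):
    """Return a restart rate after a pause in sending.

    Trust in the previous rate halves every `half_life_s` seconds of
    silence, so short pauses resume near the old share and long pauses
    restart near the floor.
    """
    if pause_s < 0:
        raise ValueError("pause_s must be non-negative")
    trust = 0.5 ** (pause_s / half_life_s)
    headroom = max(0, previous_bps - floor_bps)
    return floor_bps + headroom * trust
```

With these example values, a flow paused for 5 seconds resumes halfway
between the floor and its previous rate, and a flow paused for a minute
resumes essentially at the floor.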
Where possible, the algorithm SHOULD merge information across
multiple RTP streams sent between two endpoints when those RTP
streams share a common bottleneck, whether or not those streams are
multiplexed onto the same ports. This will allow congestion
control of the set of streams together instead of as multiple
independent streams. It will also allow better overall bandwidth
management, faster response to changing conditions, and fairer
sharing of bandwidth with other network users.
The algorithm should also share information and adaptation
with other non-RTP flows between the same endpoints, such as a
WebRTC data channel, when possible.
When there are multiple streams across the same 5-tuple
coordinating their bandwidth use and congestion control, the
algorithm should allow the application to control the relative
split of available bandwidth. The most correlated bandwidth
usage would be with other flows on the same 5-tuple, but there
may be use in coordinating measurement and control of the local
link(s). Use of information about previous flows, especially on
the same 5-tuple, may be useful input to the algorithm,
especially regarding startup performance of a new flow.
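
Letting the application control the relative split of bandwidth among
streams on the same 5-tuple can be as simple as a weighted division of
the aggregate estimate. The Python sketch below assumes a hypothetical
interface in which the application supplies one non-negative weight per
stream; the stream names are placeholders.

```python
def split_bandwidth(total_bps, weights):
    """Divide an aggregate bandwidth estimate among coordinated streams.

    `weights` maps a stream name to an application-chosen, non-negative
    weight; each stream receives a share proportional to its weight.
    """
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative")
    weight_sum = sum(weights.values())
    if weight_sum <= 0:
        raise ValueError("at least one weight must be positive")
    return {name: total_bps * w / weight_sum for name, w in weights.items()}
```

For example, `split_bandwidth(1_000_000, {"audio": 1, "video": 3,
"data": 1})` gives video 600 kbit/s and the other two 200 kbit/s each.
A real implementation would likely also honor per-stream minimums and
maximums, which this sketch omits.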
The algorithm SHOULD NOT require any special support from network
elements to be able to convey congestion-related information.
As much as possible, it SHOULD leverage available information about
the incoming flow to provide feedback to the sender. Examples of
this information are the packet arrival times, acknowledgements and
feedback, packet timestamps, packet losses, and Explicit Congestion
Notification (ECN); all of these can
provide information about the state of the path and any bottlenecks.
However, the use of available information is algorithm
dependent.
Extra information could be added to the packets to provide
more detailed information on actual send times (as opposed to
sampling times), but such information should not be required.
Since the assumption here is a set of RTP streams, the
backchannel typically SHOULD be done via the RTP Control Protocol
(RTCP); one alternative would be to carry it instead
in a reverse-RTP channel using header extensions.
In order to react sufficiently quickly when using RTCP for a
backchannel, an RTP profile such as RTP/AVPF or RTP/SAVPF that allows sufficiently frequent
feedback must be used. Note that in some cases, backchannel
messages may be delayed until the RTCP channel can be allocated
enough bandwidth, even under AVPF rules. This may also imply
negotiating a higher maximum percentage for RTCP data or
allowing solutions to violate or modify the rules specified for
AVPF.
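Bandwidth for the feedback messages should be minimized
using techniques such as those in "Support for Reduced-Size Real-Time
Transport Control Protocol (RTCP): Opportunities and Consequences",
to allow RTCP without Sender/Receiver Reports.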
Backchannel data should be minimized to avoid taking too much
reverse-channel bandwidth (since this will often be used in a
bidirectional set of flows). In areas of stability, backchannel
data may be sent more infrequently so long as algorithm
stability and fairness are maintained. When the channel is
unstable or has not yet reached equilibrium after a change,
backchannel feedback may be more frequent and use more
reverse-channel bandwidth. This is an area with considerable
flexibility of design, and different approaches to backchannel
messages and frequency are expected to be evaluated.
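
To see why a higher RTCP percentage may need to be negotiated, it helps
to estimate how often feedback can be sent within a given RTCP share.
The Python sketch below is a deliberately simplified calculation: it
uses the 5% RTCP share recommended by the RTP specification as a
default, and it ignores the sender/receiver split, timer randomization,
and minimum-interval rules. The function name and the example packet
size are assumptions.

```python
def mean_feedback_interval_s(session_bps, feedback_bytes,
                             rtcp_fraction=0.05):
    """Estimate the average spacing of feedback packets in seconds.

    Simplification: all RTCP bandwidth is assumed to be available for
    feedback packets of a fixed size.
    """
    if session_bps <= 0 or rtcp_fraction <= 0:
        raise ValueError("bandwidth and fraction must be positive")
    rtcp_bps = session_bps * rtcp_fraction
    return feedback_bytes * 8 / rtcp_bps
```

Under these assumptions, 100-byte feedback packets in a 1 Mbit/s session
can be sent about every 16 ms, but in a 200 kbit/s session only about
every 80 ms, which may be too slow for a reaction within a few RTTs on a
short path.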
Flows managed by this algorithm and flows competing against each
other at a bottleneck may have different Differentiated Services Code
Point (DSCP) markings depending on the type of traffic or may be subject to
flow-based QoS. A particular bottleneck or section of the network
path may or may not honor DSCP markings. The algorithm SHOULD
attempt to leverage DSCP markings when they're available.
The algorithm SHOULD sense the unexpected lack of backchannel
information as a possible indication of a channel-overuse problem
and react accordingly to avoid burst events causing a congestion
collapse.
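
Treating missing feedback as a possible overuse signal can be sketched
as a simple watchdog. In the Python example below, the sender tolerates
a few missed feedback intervals and then halves its rate for each
further missed interval, down to a floor. The function name, the
tolerance of three intervals, and the floor are hypothetical; this is
not the circuit-breaker procedure referenced in the Introduction.

```python
def rate_after_feedback_gap(rate_bps, silence_s, expected_interval_s,
                            tolerance=3, floor_bps=50_000):
    """Return the send rate to use after `silence_s` without feedback.

    Up to `tolerance - 1` missed intervals are ignored, since feedback
    can be lost or delayed without the path being overloaded.
    """
    if expected_interval_s <= 0:
        raise ValueError("expected_interval_s must be positive")
    missed = int(silence_s // expected_interval_s)
    if missed < tolerance:
        return rate_bps
    # One halving at the tolerance limit, one more per extra interval.
    halvings = missed - tolerance + 1
    return max(floor_bps, rate_bps / (2 ** halvings))
```

With feedback expected every 250 ms, a 1 Mbit/s flow keeps its rate
after 500 ms of silence, drops to 500 kbit/s after 750 ms, and reaches
the floor after a few seconds.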
The algorithm SHOULD be stable and maintain low delay when faced
with Active Queue Management (AQM) algorithms. Also note that these
algorithms may apply across multiple queues in the bottleneck or to
a single queue.
Deficiencies of Existing Mechanisms

Among the existing congestion control mechanisms, TCP Friendly Rate
Control (TFRC) is the one that claims to
be suitable for real-time interactive media. TFRC is an equation-based
congestion control mechanism that provides a reasonably fair share of
bandwidth when competing with TCP flows and offers much lower throughput
variations than TCP. This is achieved by a slower response to the
available bandwidth change than TCP. TFRC is designed to perform best
with applications that have a fixed packet size and do not have a fixed
period between sending packets.

TFRC detects loss events and reacts to congestion-caused loss by
reducing its sending rate. It allows applications to
increase the sending rate until loss is observed in the flows. As
noted in the IAB/IRTF workshop report, large buffers
are available in the network elements, which introduce additional delay
in the communication. It becomes important to take all possible
congestion indications into consideration. Looking at the current
Internet deployment, TFRC's biggest deficiency is that it only considers
loss events as a congestion indication.
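
For reference, the throughput equation from the TFRC protocol
specification (RFC 5348, Section 3.1) can be written out as the Python
sketch below. The function and parameter names are this example's own;
the defaults b = 1 and t_RTO = 4 * RTT follow that specification's
recommendations. The loss event rate p is the congestion signal in the
equation, and the round-trip time enters only as a scaling term.

```python
from math import sqrt


def tfrc_rate_bytes_per_s(s, rtt, p, b=1, t_rto=None):
    """Evaluate the TFRC throughput equation.

    s     -- segment size in bytes
    rtt   -- round-trip time in seconds
    p     -- loss event rate, 0 < p <= 1
    b     -- packets acknowledged by one TCP ACK (1 recommended)
    t_rto -- retransmission timeout in seconds (defaults to 4 * rtt)
    """
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    if t_rto is None:
        t_rto = 4 * rtt
    denominator = (rtt * sqrt(2 * b * p / 3)
                   + t_rto * (3 * sqrt(3 * b * p / 8)) * p * (1 + 32 * p * p))
    return s / denominator
```

With 1200-byte segments, a 100 ms RTT, and a 1% loss event rate, the
equation allows roughly 135 kilobytes per second. Until losses occur
and p rises, nothing in the equation itself reduces the allowed rate in
response to a growing queue other than the RTT scaling.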
A typical real-time interactive communication includes live-encoded
audio and video flow(s). In such a communication scenario, an audio
source typically needs a fixed interval between packets and needs to
vary the segment size of the packets instead of the packet rate in
response to congestion; therefore, it sends smaller packets.
A variant of TFRC, Small-Packet
TFRC (TFRC-SP), addresses the issues
related to this kind of source. A video source generally varies video
frame sizes, can produce large frames that need to be further
fragmented to fit into path Maximum Transmission Unit (MTU) size, and
has an almost fixed interval between producing frames under a certain
frame rate. TFRC is known to be suboptimal when used with such video
sources.

There are also some mismatches between TFRC's design assumptions and
how the media sources in a typical real-time interactive application
work. TFRC is designed to maintain a smooth sending rate; however, media
sources can change rates in steps for both rate increase and rate
decrease. TFRC can operate in two modes, i) bytes per second and ii)
packets per second, whereas typical real-time interactive media sources
operate in bits per second. There are also limitations on how quickly
the media sources can adapt to specific sending rates. Modern video
encoders can operate in a mode in which they can vary the output bitrate a
lot depending on the way they are configured, the current scene they are
encoding, and more. Therefore, it is possible that the video source will
not always output at an allowable bitrate. TFRC tries to increase
its sending rate when transmitting at the maximum allowed rate, and it allows the rate to grow
to at most twice the current transmission rate; hence, it may create issues when
the video sources vary their bitrates.

Moreover, there are a number of studies on TFRC that show its
limitations, including TFRC's unfairness to low statistically
multiplexed links, oscillatory behavior, performance issues in highly
dynamic loss-rate conditions, and more.

Looking at all these deficiencies, it can be concluded that the
requirements for a congestion control mechanism for real-time interactive
media cannot be met by TFRC as defined in the standard.

IANA Considerations

This document has no IANA actions.

Security Considerations

An attacker with the ability to delete, delay, or insert messages into
the flow can fake congestion signals, unless they are passed on a
tamper-proof path. Since some possible algorithms depend on the timing
of packet arrival, even a traditional, protected channel does not fully
mitigate such attacks.

An attack that reduces bandwidth is not necessarily significant,
since an on-path attacker could break the connection by discarding all
packets. Attacks that increase the perceived available bandwidth are
conceivable and need to be evaluated. Such attacks could result in
starvation of competing flows and permit amplification attacks.

Algorithm designers should consider the possibility of malicious
on-path attackers.

References

Normative References

   "Key words for use in RFCs to Indicate Requirement Levels"
   "RTP: A Transport Protocol for Real-Time Applications"
   "Extended RTP Profile for Real-time Transport Control Protocol
      (RTCP)-Based Feedback (RTP/AVPF)"
   "Extended Secure RTP Profile for Real-time Transport Control
      Protocol (RTCP)-Based Feedback (RTP/SAVPF)"
   "Overview: Real-Time Protocols for Browser-Based Applications"

Informative References

   "Designing TCP-Friendly Window-based Congestion Control for
      Real-time Multimedia Applications", Proceedings of PFLDNeT
   "Information Technology -- Dynamic adaptive streaming over HTTP
      (DASH) -- Part 1: Media presentation description and segment
      formats", ISO
   "The Addition of Explicit Congestion Notification (ECN) to IP"
   "TCP Friendly Rate Control (TFRC): The Small-Packet (SP) Variant"
   "TCP Friendly Rate Control (TFRC): Protocol Specification"
   "Support for Reduced-Size Real-Time Transport Control Protocol
      (RTCP): Opportunities and Consequences"
   "A Differentiated Services Code Point (DSCP) for Capacity-Admitted
      Traffic"
   "Report from the IAB/IRTF Workshop on Congestion Control for
      Interactive Real-Time Communication"
   "Multimedia Congestion Control: Circuit Breakers for Unicast RTP
      Sessions"
   "WebRTC Data Channels"

Acknowledgements

This document is the result of discussions in various fora of the
WebRTC effort, in particular on the <rtp-congestion@alvestrand.no> mailing
list. Many people contributed their thoughts to this.

Authors' Addresses

Mozilla
United States of America
randell-ietf@jesup.org

Ericsson AB
Torshamnsgatan 23
Stockholm 164 83
Sweden
+46 10 717 37 43
zaheduzzaman.sarker@ericsson.com