Review of draft-ietf-avtcore-multiplex-guidelines-08
Reviewer: Bernard Aboba
Date: April 1, 2019
Review Summary: Ready with Issues

This document has been reviewed as part of the transport area review team's ongoing effort to review key IETF documents. These comments were written primarily for the transport area directors, but are copied to the document's authors and WG to allow them to address any issues raised and also to the IETF discussion list for information.

When done at the time of IETF Last Call, the authors should consider this review as part of the last-call comments they receive. Please always CC tsv-art@ietf.org if you reply to or forward this review.

----------------
Summary

This document focuses on the use of the SSRC and PT for multiplexing, which were the two RTP multiplexing mechanisms available at the time the -00 version of the document was submitted (October 2013). However, since that time, "RTP Header Extension for RTCP Source Description Items" (RFC 7941) has been developed, which can be used to provide the "RTP Stream Identifier" (RID) and "RTP Repaired Stream Identifier" (RRID) (see: draft-ietf-avtext-rid) as well as the "Media Identifier" (MID) (see: draft-ietf-mmusic-sdp-bundle-negotiation). The question therefore arises as to whether (and if so, how much) coverage of these new mechanisms is appropriate.

The document does mention RID/RRID at multiple points, so I don't think there is much if any additional mention needed with respect to RID/RRID. However, the document does not mention MID although there do appear to be a few places (see detailed comments) where this might be appropriate.

Mention of MID was the only major question I came up with in reviewing this document. It struck me as potentially important since draft-ietf-rtcweb-jsep deprecates "a=ssrc" lines in favor of RID/MID, and as a result, Selective Forwarding Units (SFUs) are going to need to be updated to handle RID/MID.
Updates will also be required within RTP stack implementations so as to route incoming RTP/RTCP packets utilizing the MID, and possibly even to deal with incoming RIDs (for those implementations that support simulcast reception). For example, this seems potentially relevant to the de-multiplexing diagram in Section 3.2, since Section 9.2 of draft-ietf-mmusic-sdp-bundle-negotiation utilizes the MID as well as the SSRC and PT in its algorithms to describe how RTP and RTCP packets can be associated with the appropriate RtpReceiver and RtpSender objects.

--------------------------------------------------------
Detailed Comments

Section 3.1

   o  Multiple RTP streams might be needed to represent one media source (for instance when using layered encodings)

[BA] I would say "for instance when using simulcast or scalable video coding".

   o  Alternative formats, for instance multiple resolutions of the same video stream

[BA] Not sure how this is different from "Multiple RTP streams", since "layered encodings" enable multiple resolutions/frame rates of the same video stream. Is the intent for this bullet item to refer to simulcast while the prior one refers only to SVC?

Section 3.2

                     |   | packets
        +--          v
        |        +------------+
        |        |   Socket   |  Transport Protocol Demultiplexing
        |        +------------+
        |            ||  ||
   RTP  |       RTP/ ||  |+-----> SCTP ( ...and any other protocols)
Session |       RTCP ||  +------> STUN (multiplexed using same port)
        +--          ||
        +--          ||
        |      (split by SSRC)
        |       ||   ||   ||
        |       ||   ||   ||
   RTP  |      +--+ +--+ +--+
Streams |      |PB| |PB| |PB| Jitter buffer, process RTCP, etc.
        |      +--+ +--+ +--+
        +--      |    |    |
          (select decoder based on PT)
        +--      |   /     |
        |        +----+    |
        |         /   |    |
Payload |      +---+ +---+ +---+
Formats |      |Dec| |Dec| |Dec| Decoders
        |      +---+ +---+ +---+
        +--

[BA] This diagram no longer represents the demultiplexing process in RTP stacks implementing MID and possibly RID. For example, where MID is supported, "split by SSRC" will typically be replaced by "split by MID/SSRC".
It would be nice if the figure indicated what the "PB" acronym refers to.

Although it is relatively rare, I have seen implementations that support rendering of multiple SSRCs and RIDs to a single video tag (e.g. support for receiving simulcast and/or MRST SVC). For these cases, I believe the above figure is accurate, although there is an intermediate step which combines multiple PBs into a single bitstream sent to a decoder. So perhaps it should be made clear that multiple PBs can feed a single decoder.

Section 3.2.2

   An endpoint that changes its network transport address during a session have to choose a new SSRC identifier to avoid being interpreted as looped source, unless the transport layer mechanism, e.g ICE [RFC8445], handles such changes.

[BA] To be a bit more general, you might say "unless a mechanism providing a virtual transport (such as ICE [RFC8445]) abstracts the changes".

   An RTP receiver receiving a previously unseen SSRC value will interpret it as a new source. It might in fact be a previously existing source that had to change SSRC number due to an SSRC conflict. However, the originator of the previous SSRC ought to have ended the conflicting source by sending an RTCP BYE for it prior to starting to send with the new SSRC, so the new SSRC is anyway effectively a new source.

[BA] One of the reasons for creating the MID/RID was to better handle the SSRC conflict scenario. For example, a MID extension can be used to route RTP packets with an updated SSRC to the same receiver, and a RID extension could be used to indicate that a simulcast stream represented the same encoding even though the SSRC changed. In these scenarios, to what extent will the RTP receiver treat a previously unseen SSRC as a "new source"? The text relating to RTCP BYE remains valid, it seems to me.

Section 3.2.3

   The Contributing Source (CSRC) is not a separate identifier.
   Rather an SSRC identifier is listed as a CSRC in the RTP header of a packet generated by an RTP mixer, if the corresponding SSRC was in the header of one of the packets that contributed to the mix.

   It is not possible, in general, to extract media represented by an individual CSRC since it is typically the result of a media mixing (merge) operation by an RTP mixer on the individual media streams corresponding to the CSRC identifiers. The exception is the case when only a single CSRC is indicated as this represent forwarding of an RTP stream, possibly modified. The RTP header extension for Mixer-to-Client Audio Level Indication [RFC6465] expands on the receiver's information about a packet with a CSRC list. Due to these restrictions, CSRC will not be considered a fully qualified multiplexing point and will be disregarded in the rest of this document.

[BA] Since I've seen CSRCs used in video scenarios (such as for an MCU or switching between video streams to implement Dominant Speaker detection), I might generalize this a bit. For example:

   The Contributing Source (CSRC) is not a separate identifier. Rather an SSRC identifier is listed as a CSRC in the RTP header of a packet generated by an RTP audio mixer or video MCU/switch, if the corresponding SSRC was in the header of one of the packets that contributed to the output.

   It is not possible, in general, to extract media represented by an individual CSRC since it is typically the result of a media merge operation on the individual media streams corresponding to the CSRC identifiers. The exception is the case when only a single CSRC is indicated as this represents forwarding of an RTP stream, possibly modified. The RTP header extension for Mixer-to-Client Audio Level Indication [RFC6465] expands on the receiver's information about an audio packet with a CSRC list. Due to these restrictions, CSRC will not be considered a fully qualified multiplexing point and will be disregarded in the rest of this document.
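
As an illustration of the CSRC point for readers less familiar with the header layout: the CSRC list sits directly after the 12-byte fixed RTP header, and its length is given by the 4-bit CC field (RFC 3550, Section 5.1). The following is only a minimal sketch in Python. The function name parse_rtp_header and the keys of the returned dictionary are my own assumptions for illustration, and header extensions and padding are ignored.

```python
import struct


def parse_rtp_header(packet: bytes) -> dict:
    """Parse the fixed RTP header plus CSRC list (hypothetical helper)."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the 12-byte fixed RTP header")

    # Fixed header: V/P/X/CC byte, M/PT byte, sequence number, timestamp, SSRC.
    first, second, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    if first >> 6 != 2:
        raise ValueError("not an RTP version 2 packet")

    csrc_count = first & 0x0F  # CC field: number of 32-bit CSRC entries (0-15)
    end = 12 + 4 * csrc_count
    if len(packet) < end:
        raise ValueError("packet truncated inside the CSRC list")

    # Each CSRC entry is the SSRC of a source that contributed to this packet.
    csrcs = list(struct.unpack("!%dI" % csrc_count, packet[12:end]))
    return {
        "pt": second & 0x7F,
        "seq": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "csrcs": csrcs,
    }
```

A receiver that finds exactly one entry in "csrcs" is in the single-CSRC case described in the text (a forwarded RTP stream, possibly modified). With more entries, the payload is the result of a merge and cannot in general be separated per CSRC.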

Section 3.2.4

   If it is acceptable to send multiple formats of the same media source as separate RTP streams (with separate SSRC), simulcast [I-D.ietf-mmusic-sdp-simulcast] can be used.

[BA] Not sure why this sentence is included in the "RTP Payload Type" section. Are you implying that simulcast streams should not use PT multiplexing? If so, you might say this more directly. If so, that point may also apply to MRST transport of scalable video coding (which generally uses SSRC rather than PT multiplexing).

   The RTP payload type number is sometimes used to associate an RTP stream with the signalling; this is not recommended since a specific payload type value can be used in multiple bundled "m=" sections [I-D.ietf-mmusic-sdp-bundle-negotiation]. This association is only possible if unique RTP payload type numbers are used in each context.

[BA] The last sentence is not quite true - some implementations allow different MID values to mask PT conflicts.

Section 3.3

   o  Do I need network differentiation in form of QoS?

[BA] You might reference Section 4.2.1 here, which deals with the implications of multiplexing for QoS.

Section 3.4.3

I would consider adding mention of MID in this section since it can be used to route an RTP source and FEC/RTX to the same receiver, or even multiple RTP sources (e.g. reception of simulcast or MRST SVC) to the same receiver.

Section 5.2

   3.  For applications with dynamic usage of RTP streams, i.e. frequently added and removed, having much of the state associated with the RTP session rather than per individual SSRC can avoid the need for in-session signalling of meta-information about each SSRC.

[BA] Not sure I grasp your point here. If there are multiple SSRCs in the same RTP session, avoiding the need for in-session signaling typically requires:

   a. A mechanism for handling "unsignaled streams" (e.g. an Unhandled RTP event as in ORTC), OR

   b. Support for MID to allow routing to the correct RTP receiver without in-session signaling of the SSRC.
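
Options (a) and (b) can be sketched roughly as follows. This is a simplified illustration in Python, not the algorithm from Section 9.2 of draft-ietf-mmusic-sdp-bundle-negotiation: it ignores PT, RTCP and RID. The class RtpRouter, its methods, and the on_unhandled callback are hypothetical names chosen for this example only.

```python
from typing import Callable, Dict, Optional


class RtpRouter:
    """Hypothetical router mapping incoming RTP packets to receiver names."""

    def __init__(self, on_unhandled: Callable[[int], None]) -> None:
        self.by_mid: Dict[str, str] = {}   # MID -> receiver, known from signalling
        self.by_ssrc: Dict[int, str] = {}  # SSRC -> receiver, signalled or learned
        self.on_unhandled = on_unhandled   # option (a): "unsignaled stream" hook

    def add_receiver(self, name: str, mid: Optional[str] = None,
                     ssrc: Optional[int] = None) -> None:
        if mid is not None:
            self.by_mid[mid] = name
        if ssrc is not None:
            self.by_ssrc[ssrc] = name

    def route(self, ssrc: int, mid: Optional[str] = None) -> Optional[str]:
        # Option (b): a known MID decides the receiver, and the SSRC binding
        # is learned (or refreshed) so later packets without MID still route.
        if mid is not None and mid in self.by_mid:
            receiver = self.by_mid[mid]
            self.by_ssrc[ssrc] = receiver
            return receiver
        # No usable MID: fall back to an SSRC that was signalled or learned.
        if ssrc in self.by_ssrc:
            return self.by_ssrc[ssrc]
        # Option (a): nothing matches, so report the unsignaled stream.
        self.on_unhandled(ssrc)
        return None
```

In this sketch, a packet that arrives with a new SSRC but a known MID is still delivered to the same receiver, which is the SSRC-change scenario raised under Section 3.2.2. A packet with neither a known MID nor a known SSRC only triggers the unhandled callback.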