Review of draft-ietf-avtcore-multiplex-guidelines-08
Reviewer: Bernard Aboba
Date: April 1, 2019
Review Summary: Ready with Issues

This document has been reviewed as part of the transport area review team's
ongoing effort to review key IETF documents. These comments were written
primarily for the transport area directors, but are copied to the document's
authors and WG to allow them to address any issues raised and also to the IETF
discussion list for information.

When done at the time of IETF Last Call, the authors should consider this
review as part of the last-call comments they receive. Please always CC
tsv-art@ietf.org if you reply to or forward this review.

----------------
Summary

This document focuses on the use of the SSRC and PT for multiplexing, which
were the two RTP multiplexing mechanisms available at the time the -00 version 
of the document was submitted (October 2013).

However, since that time, "RTP Header Extension for RTCP Source Description 
Items" (RFC 7941) has been developed, which can be used to provide the
"RTP Stream Identifier" (RID) and "RTP Repaired Stream Identifier" (RRID)
(see: draft-ietf-avtext-rid) as well as the "Media Identifier" (MID)
(see: draft-ietf-mmusic-sdp-bundle-negotiation). The question therefore arises
as to whether (and if so, how much) coverage of these new mechanisms is appropriate.
The document does mention RID/RRID at multiple points, so I don't think there
is much if any additional mention needed with respect to RID/RRID. However, 
the document does not mention MID although there do appear to be a few
places (see detailed comments) where this might be appropriate. 

Mention of MID was the only major question I came up with in reviewing this document.
It struck me as potentially important since draft-ietf-rtcweb-jsep deprecates 
"a=ssrc" lines in favor of RID/MID, and as a result, Selective Forwarding Units
(SFUs) are going to need to be updated to handle RID/MID.  Updates will also 
be required within RTP stack implementations so as to route incoming
RTP/RTCP packets utilizing the MID, and possibly even to deal with
incoming RIDs (for those implementations that support simulcast reception). 

For example, this seems potentially relevant to the de-multiplexing diagram 
in Section 3.2, since Section 9.2 of draft-ietf-mmusic-sdp-bundle-negotiation utilizes
the MID as well as the SSRC and PT in its algorithms to describe how RTP and 
RTCP packets can be associated with the appropriate RtpReceiver and
RtpSender objects. 

--------------------------------------------------------
Detailed Comments

Section 3.1

   o  Multiple RTP streams might be needed to represent one media source
      (for instance when using layered encodings)

[BA] I would say "for instance when using simulcast or scalable video coding". 

   o  Alternative formats, for instance multiple resolutions of the same
      video stream

[BA] Not sure how this is different from "Multiple RTP streams", since "layered encodings"
enable multiple resolutions/frame rates of the same video stream.  Is the intent
for this bullet item to refer to simulcast while the prior one refers only to SVC? 

Section 3.2


                           |
                           | packets
           +--             v
           |        +------------+
           |        |   Socket   |   Transport Protocol Demultiplexing
           |        +------------+
           |            ||  ||
      RTP  |       RTP/ ||  |+-----> SCTP ( ...and any other protocols)
   Session |       RTCP ||  +------> STUN (multiplexed using same port)
           +--          ||
           +--          ||
           |      (split by SSRC)
           |      ||    ||    ||
           |      ||    ||    ||
     RTP   |     +--+  +--+  +--+
   Streams |     |PB|  |PB|  |PB| Jitter buffer, process RTCP, etc.
           |     +--+  +--+  +--+
           +--      |   |      |
           (select decoder based on PT)
           +--      |  /       |
           |        +----+     |
           |         /   |     |
   Payload |     +---+ +---+ +---+
   Formats |     |Dec| |Dec| |Dec| Decoders
           |     +---+ +---+ +---+
           +--


[BA] This diagram no longer represents the demultiplexing process in RTP stacks implementing
MID and possibly RID.  For example, where MID is supported, "split by SSRC" will typically
be replaced by "split by MID/SSRC".
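
To make the "split by MID/SSRC" step concrete, here is a minimal sketch in
Python. It is only loosely modeled on the association procedure in
draft-ietf-mmusic-sdp-bundle-negotiation and is not a faithful implementation
of it: RTCP handling, sequence-number checks and RID are left out, and the
names Packet, Demuxer and route are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Packet:
    ssrc: int
    pt: int
    mid: Optional[str] = None  # value of the MID header extension, if present


@dataclass
class Demuxer:
    # MID -> payload types negotiated for that "m=" section (assumed input)
    mid_to_pts: Dict[str, Set[int]]
    # SSRC -> MID bindings learned from packets seen so far
    ssrc_to_mid: Dict[int, str] = field(default_factory=dict)

    def route(self, pkt: Packet) -> Optional[str]:
        """Return the MID of the receiver for this packet, or None to drop it."""
        # 1. A MID carried in the packet (re)binds the SSRC to that "m=" section.
        if pkt.mid is not None:
            if pkt.mid not in self.mid_to_pts:
                return None
            self.ssrc_to_mid[pkt.ssrc] = pkt.mid
        # 2. A known SSRC is routed by its binding, provided the PT is valid there.
        mid = self.ssrc_to_mid.get(pkt.ssrc)
        if mid is not None:
            return mid if pkt.pt in self.mid_to_pts[mid] else None
        # 3. Fallback: a PT that belongs to exactly one "m=" section.
        owners = [m for m, pts in self.mid_to_pts.items() if pkt.pt in pts]
        if len(owners) == 1:
            self.ssrc_to_mid[pkt.ssrc] = owners[0]
            return owners[0]
        return None
```

In this sketch the SSRC is no longer the first demultiplexing key: it is a
cache that the MID populates, with the PT used only as a last resort.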

It would also be helpful if the figure indicated what the "PB" acronym refers to.

Although it is relatively rare, I have seen implementations that support rendering of
multiple SSRCs and RIDs to a single video tag (e.g. support for receiving simulcast and/or
MRST SVC). For these cases, I believe the above figure is accurate, although
there is an intermediate step which combines multiple PBs into a single bitstream
sent to a decoder.  So perhaps it should be made clear that multiple PBs can feed
a single decoder. 
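
As a rough illustration of that intermediate step, the following Python sketch
interleaves several per-SSRC buffers into one sequence for a single decoder.
It assumes each buffer is already ordered by RTP timestamp and carries exactly
one layer, and it ignores timestamp wrap-around and codec-specific decoding
order rules. Frame and merge_for_decoder are hypothetical names.

```python
import heapq
from typing import Iterable, List, NamedTuple


class Frame(NamedTuple):
    timestamp: int  # RTP timestamp
    layer: int      # 0 = base layer, higher values = enhancement layers
    payload: bytes


def merge_for_decoder(*buffers: Iterable[Frame]) -> List[Frame]:
    """Interleave per-SSRC buffers into one sequence for a single decoder.

    Frames are ordered by timestamp, and within one timestamp by layer,
    so that the base layer precedes its enhancement layers.
    """
    return list(heapq.merge(*buffers, key=lambda f: (f.timestamp, f.layer)))
```

For example, a base-layer buffer and an enhancement-layer buffer that each
hold frames for timestamps 0 and 3000 come out as base, enhancement, base,
enhancement.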

Section 3.2.2

   An endpoint that changes its network transport address
   during a session have to choose a new SSRC identifier to avoid being
   interpreted as looped source, unless the transport layer mechanism,
   e.g ICE [RFC8445], handles such changes. 

[BA] To be a bit more general, you might say "unless a mechanism providing a
virtual transport (such as ICE [RFC8445]) abstracts the changes".

   An RTP receiver receiving a previously unseen SSRC value will
   interpret it as a new source.  It might in fact be a previously
   existing source that had to change SSRC number due to an SSRC
   conflict.  However, the originator of the previous SSRC ought to have
   ended the conflicting source by sending an RTCP BYE for it prior to
   starting to send with the new SSRC, so the new SSRC is anyway
   effectively a new source.

[BA] One of the reasons for creating the MID/RID was to better handle
the SSRC conflict scenario. For example, a MID extension can be used to
route RTP packets with an updated SSRC to the same receiver, and a
RID extension could be used to indicate that a simulcast stream represents
the same encoding even though the SSRC changed.  In these scenarios,
to what extent will the RTP receiver treat a previously unseen SSRC
as a "new source"? The text relating to RTCP BYE remains valid, it
seems to me.
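
The question can be made concrete with a small Python sketch of a receiver
that keys its state on (MID, RID) rather than on the SSRC. This is an
assumption about how such a receiver might be structured, not a description
of any particular stack; EncodingTable and observe are hypothetical names.

```python
from typing import Dict, Tuple


class EncodingTable:
    """Tracks which SSRC currently carries each (MID, RID) encoding."""

    def __init__(self) -> None:
        self._ssrc_by_encoding: Dict[Tuple[str, str], int] = {}

    def observe(self, mid: str, rid: str, ssrc: int) -> str:
        """Record a packet's identifiers and classify what was seen."""
        key = (mid, rid)
        previous = self._ssrc_by_encoding.get(key)
        self._ssrc_by_encoding[key] = ssrc
        if previous is None:
            return "new-encoding"   # first packet for this MID/RID pair
        if previous != ssrc:
            return "ssrc-changed"   # same encoding, now on a different SSRC
        return "known"
```

Under this model a previously unseen SSRC that arrives with a known MID and
RID is classified as "ssrc-changed" rather than as a new source, although
per-SSRC state such as RTCP statistics would presumably still start afresh.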

Section 3.2.3

   The Contributing Source (CSRC) is not a separate identifier.  Rather
   an SSRC identifier is listed as a CSRC in the RTP header of a packet
   generated by an RTP mixer, if the corresponding SSRC was in the
   header of one of the packets that contributed to the mix.

   It is not possible, in general, to extract media represented by an
   individual CSRC since it is typically the result of a media mixing
   (merge) operation by an RTP mixer on the individual media streams
   corresponding to the CSRC identifiers.  The exception is the case
   when only a single CSRC is indicated as this represent forwarding of
   an RTP stream, possibly modified.  The RTP header extension for
   Mixer-to-Client Audio Level Indication [RFC6465] expands on the
   receiver's information about a packet with a CSRC list.  Due to these
   restrictions, CSRC will not be considered a fully qualified
   multiplexing point and will be disregarded in the rest of this
   document.

[BA] Since I've seen CSRCs used in video scenarios (such as for an MCU
or switching between video streams to implement Dominant Speaker detection), I
might generalize this a bit. For example:

   The Contributing Source (CSRC) is not a separate identifier.  Rather
   an SSRC identifier is listed as a CSRC in the RTP header of a packet
   generated by an RTP audio mixer or video MCU/switch, if the corresponding
   SSRC was in the header of one of the packets that contributed to
   the output.

   It is not possible, in general, to extract media represented by an
   individual CSRC since it is typically the result of a media merge
   operation on the individual media streams corresponding to the
   CSRC identifiers.  The exception is the case when only a single CSRC
   is indicated as this represents forwarding of an RTP stream, possibly
   modified.  The RTP header extension for
   Mixer-to-Client Audio Level Indication [RFC6465] expands on the
   receiver's information about an audio packet with a CSRC list.  Due to these
   restrictions, CSRC will not be considered a fully qualified
   multiplexing point and will be disregarded in the rest of this
   document.

Section 3.2.4

   If it is acceptable to send multiple formats of the same media
   source as separate RTP streams (with separate SSRC), simulcast
   [I-D.ietf-mmusic-sdp-simulcast] can be used.

[BA] Not sure why this sentence is included in the "RTP Payload Type"
section. Are you implying that simulcast streams should not use PT
multiplexing?  If so, you might say this more directly.  

If so, that point may also apply to MRST transport of scalable video coding
(which generally uses SSRC rather than PT multiplexing). 

   The RTP payload type number is sometimes used to associate an RTP
   stream with the signalling; this is not recommended since a specific
   payload type value can be used in multiple bundled "m=" sections
   [I-D.ietf-mmusic-sdp-bundle-negotiation].  This association is only
   possible if unique RTP payload type numbers are used in each context.

[BA] The last sentence is not quite true - some implementations allow
different MID values to mask PT conflicts.
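
A minimal Python sketch of what such implementations effectively do: the
payload format is looked up by the (MID, PT) pair rather than by the PT alone.
The table contents and the name resolve_codec are hypothetical, and the sketch
illustrates the behaviour described above rather than recommending it.

```python
from typing import Dict, Optional, Tuple

# Hypothetical negotiated state: PT 96 is bound to different payload formats
# in two bundled "m=" sections, distinguished only by their MID values.
CODEC_BY_MID_AND_PT: Dict[Tuple[str, int], str] = {
    ("a0", 96): "opus/48000/2",
    ("v0", 96): "VP8/90000",
}


def resolve_codec(mid: str, pt: int) -> Optional[str]:
    """Return the payload format for a PT within the context of one MID."""
    return CODEC_BY_MID_AND_PT.get((mid, pt))
```

With the PT alone the lookup for 96 would be ambiguous. With the MID as
context each packet resolves to a single payload format.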

Section 3.3

   o  Do I need network differentiation in form of QoS?

[BA] You might reference Section 4.2.1 here, which deals with the implications
of multiplexing for QoS.

Section 3.4.3

[BA] I would consider adding mention of MID in this section since it can be used to route an
RTP source and FEC/RTX to the same receiver, or even multiple RTP sources (e.g. reception
of simulcast or MRST SVC) to the same receiver.
 
Section 5.2

   3.  For applications with dynamic usage of RTP streams, i.e.
       frequently added and removed, having much of the state associated
       with the RTP session rather than per individual SSRC can avoid
       the need for in-session signalling of meta-information about each
       SSRC.

[BA] Not sure I grasp your point here.  If there are multiple SSRCs in the
same RTP session, avoiding the need for in-session signaling typically requires:

a. A mechanism for handling "unsignaled streams" (e.g. an Unhandled RTP
event as in ORTC), OR

b. Support for MID to allow routing to the correct RTP receiver without in-session
signaling of the SSRC.