Reported by Steve Casner/Information Sciences Institute

Minutes of Audio/Video Transport Working Group (AVT)

The AVT Working Group met for three sessions. The first two sessions
were used to discuss some open issues on the Draft specification for
the Real-Time Transport Protocol (RTP); the third session was an
``implementors agreement'' session focusing on software video encoding.

Status of the Working Group and Review of the Draft RTP Specification

The goal of the AVT Working Group is to specify a set of experimental
protocols for real-time transmission of audio and video. The emphasis
is short-term to promote experimentation now so that standards-track
protocols can be developed based on what is learned. The first Working
Group meeting was a year ago, so one might expect a conclusion soon. In
fact, many issues were resolved during the last two IETF meetings and
the specification is nearing readiness for submission as an RFC.

A set of four Internet-Drafts specifying RTP was issued in December
1992, and an Internet-Draft on packetization of H.261 coded video was
issued in March 1993. In addition, RTP has been implemented to varying
degrees in the following programs:

    IVS              Thierry Turletti and Christian Huitema
    Maven            Charley Kline
    NEVOT            Henning Schulzrinne
    ``nv''           Ron Frederick
    PictureWindow    Paul Milazzo and Bob Clements (an older draft)
    PVP              Steve Casner

This session began with a brief review of RTP. It consists primarily of
a protocol header for real-time data packets, which is just eight
octets long in the typical case. RTP supports the following functions:

   o Transfer of media data.
   o Demultiplexing of multiple flows.
   o Content identification (e.g., the media encoding used).
   o Synchronization and sequencing.
   o Options for simple control functions such as identification of
     participants.

Full details on the protocol are available in the Internet-Drafts,
which are listed later in this document.

Discussion of Open Issues

A few open issues had developed since the last meeting and were
discussed. The primary items were:

   o Elimination of IPv4 addresses carried within RTP.
   o Separation of RTCP control functions not related to transport.
   o Security services and mechanisms.

These issues are expanded in the following paragraphs. No roadblocks
were identified, but resolution of some of the questions was left to
email discussion.

The CSRC and SSRC options in the December RTP specification carried
globally unique identifiers for the ``content source'' and
``synchronization source'' respectively, in data packets that have been
processed through transport/application-level gateways such as audio
mixers. These globally unique identifiers were based on IPv4 addresses,
but this is not acceptable considering the impending transition to a
next-generation IP. Therefore, the Group considered two revision
choices:

   1. Use a (type, length, value) triple to allow any form of address
      to be carried.

   2. Carry only a 16-bit or 32-bit identifier that is locally unique
      to the gateway that inserts the option, and then define the
      mapping to a globally unique address, or other information,
      through an extended SDESC (source description) option or a
      higher-layer protocol.

Since the CSRC and SSRC options may be carried in every data packet for
some flows, the long addresses in the first choice might impose an
uncomfortably large overhead. Furthermore, the Group recognized that
the locally unique identifier is sufficient for an RTP receiver to
distinguish the source and process the packet; it is only necessary to
map to a globally unique address or user name for purposes of
monitoring, control, and user interface. Therefore, the second choice
was selected, and it was generally agreed that 16 bits would be
sufficient because the identifiers are unique only with respect to the
full source address of the gateway (network address plus UDP port, for
example).

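As an illustration of why 16 bits can suffice, the following minimal
sketch shows a receiver-side table that scopes each identifier by the
gateway's full source address. The class and method names
(SourceTable, describe, lookup) are hypothetical and do not come from
the RTP Drafts, and the sketch does not model any wire format.

```python
from typing import Dict, Optional, Tuple

# Full source address of a gateway: (network address, UDP port).
GatewayAddr = Tuple[str, int]


class SourceTable:
    """Hypothetical receiver-side table of source descriptions.

    Entries are keyed by (gateway address, 16-bit identifier), so the
    identifier only has to be unique per gateway, not globally.
    """

    def __init__(self) -> None:
        self._names: Dict[Tuple[GatewayAddr, int], str] = {}

    def describe(self, gateway: GatewayAddr, source_id: int, name: str) -> None:
        # Reject values that do not fit in a 16-bit identifier.
        if not 0 <= source_id <= 0xFFFF:
            raise ValueError("identifier must fit in 16 bits")
        self._names[(gateway, source_id)] = name

    def lookup(self, gateway: GatewayAddr, source_id: int) -> Optional[str]:
        # Returns None when no description has been received yet.
        return self._names.get((gateway, source_id))


table = SourceTable()
mixer_a = ("192.0.2.1", 5004)      # example addresses, not from the minutes
mixer_b = ("192.0.2.2", 5004)
table.describe(mixer_a, 1, "participant behind mixer A")
table.describe(mixer_b, 1, "participant behind mixer B")
```

Both mixers reuse identifier 1 without conflict, because a receiver
always looks the identifier up together with the address the packet
came from.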
Mapping of the identifiers to other information, such as the name of a
conference participant, should be accomplished through a higher-layer
control protocol. Note that the gateway entities must be involved in
that protocol, but this would be true even if globally unique addresses
were used since the addresses include port numbers to allow more than
one entity per host. One way to accomplish the mapping is through the
Source Description (SDESC) option in the RTP control ``sub-protocol''
RTCP. The SDESC option includes the 16-bit identifier plus one or more
items to which it is mapped. Examples include the user name, as before,
and various forms of globally unique addresses.

A full address must also be specified for the return of RTP reverse
control packets containing options such as the Quality of Service
Measurement (QOS) option. In the December Draft, a return port number
was specified in the Content Description (CDESC) option, but it was
agreed that a full address is required because the return information
may go to the content or synchronization source or to a third-party
monitoring host. Furthermore, it was suggested that the return address
be separated into its own option because it may be desirable to
establish a return address without defining a new content type or
redefining an old one.

One problem is that the sender and receiver of the reverse control
information may be using different forms of addressing, for example
IPv4 and SIP or PIP. This problem extends beyond RTP, and its solution
will depend on IPng transition plans. Two solutions the Group has
considered are to use a DNS name or to allow multiple forms of binary
address to be specified.

Some members of the Working Group objected to including within the RTP
specification those RTCP options unnecessary for transport functions.
However, it seems impractical to split off a few pages of RTCP option
definitions into another RFC.
It was agreed to keep the RTCP options within the RTP specification as
a separate section with appropriate disclaimers about these functions
being replaced by higher-layer control protocols when they are
available.

The RTP specification Draft includes a brief Security Considerations
section, but the protocol will be inadequate for teleconferencing
applications, at least for confidentiality of voice communication,
without access to some security mechanisms. In the future, it may be
possible to depend on security mechanisms at the IP layer, but for
near-term use, possibly including experimentation with new security
mechanisms specifically for real-time applications, the Group believes
it is appropriate to define security mechanisms at the RTP layer.

Stuart Stubblebine made a presentation reviewing the security services
and mechanisms that might be needed for applications that use RTP. The
most needed services are confidentiality and integrity of the payload,
and authentication of the source. Integrity is considered separately
for individual packets and for the stream of packets. A set of three
new RTP options was proposed to implement these security services:

   o ENC, to indicate the start point for encryption and carry the
     initialization parameters;
   o MIC, which may be used with a variety of security schemes, for
     example to carry a Message Integrity Check; and
   o KDEF, an RTCP option to define key identifiers.

Details of these security options will be found in a new Draft of the
RTP specification to be released in mid-May.

In addition to the major topics, the Group also discussed a request
from Frank Hoffmann for two additions to the protocol to allow higher
reliability levels of service. The first was that a checksum option be
provided for use when RTP is carried over ST-II, which does not provide
a checksum.
However, there is a problem with carrying a checksum in an option: an
error in the header may make it appear that there is no checksum
option, so the checksum would not be checked and the error would not be
detected. Also, it is inconvenient that an RTP-level reflector from
ST-II to IP/UDP would have to check and remove the checksum option to
avoid presenting the UDP destination with two potentially conflicting
checksums. As an alternative, it is suggested that there be a separate
specification for an encapsulation of RTP in ST-II that would include a
checksum. That encapsulation might define a service like that of UDP
over IP.

The second request was for an option to request retransmission to
implement a ``reliable'' class of service. It is expected that most
real-time applications will not want to incur the delay imposed by
retransmission. A generic retransmission function probably does not
make sense in RTP, but reverse control options can be used to request
retransmission in an application-specific manner when appropriate. For
example, negative acknowledgements are defined in the INRIA H.261
packetization protocol Draft. For those applications that want to use
the services of RTP but do require reliable delivery, RTP can be
transported in a TCP stream.

One final topic that the Group did not have time to discuss adequately
was how RTP profiles should be defined and used. Would it be reasonable
to include as part of a profile definition that the RTP framing field
would always be included in order to allow multiple RTP PDUs to be
assembled into, for example, one UDP packet? The Group has also not
defined how the selection of a particular profile will be identified
for applications that can operate with more than one profile. More work
is needed on this topic.

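To make the framing question concrete, the following sketch shows one
way several PDUs could be assembled into, and recovered from, a single
datagram payload. The layout used here, a 16-bit big-endian length
before each PDU, is purely an assumption for illustration; the format
of the RTP framing field is not given in these minutes, and the
function names frame_pdus and unframe_pdus are hypothetical.

```python
import struct
from typing import List


def frame_pdus(pdus: List[bytes]) -> bytes:
    """Concatenate PDUs, each preceded by an assumed 16-bit length."""
    out = bytearray()
    for pdu in pdus:
        if len(pdu) > 0xFFFF:
            raise ValueError("PDU too long for a 16-bit length prefix")
        out += struct.pack("!H", len(pdu))  # network byte order
        out += pdu
    return bytes(out)


def unframe_pdus(datagram: bytes) -> List[bytes]:
    """Split a datagram payload back into the PDUs it carries."""
    pdus = []
    offset = 0
    while offset < len(datagram):
        if offset + 2 > len(datagram):
            raise ValueError("truncated length prefix")
        (length,) = struct.unpack_from("!H", datagram, offset)
        offset += 2
        if offset + length > len(datagram):
            raise ValueError("truncated PDU")
        pdus.append(datagram[offset:offset + length])
        offset += length
    return pdus


payload = frame_pdus([b"audio-pdu", b"video-pdu"])
recovered = unframe_pdus(payload)
```

Without some such length or boundary information in every PDU, a
receiver could not tell where one PDU ends and the next begins inside
the same UDP packet, which is why the question of always including the
framing field arises at the profile level.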
``Implementors Agreement'' Session

The third Working Group session was devoted to an ``implementors
agreement'' discussion to promote convergence and interoperation among
the software video compression programs being developed. At the
previous meeting, it was observed that the tight coupling between the
video frame grabbing procedure and the encoding process might mean that
it would be infeasible to define an API between these two steps.
However, Ron Frederick has observed that the coupling between software
decoding and the display process can be much more flexible. Therefore,
it may be feasible to achieve interoperation by allowing each hardware
and software platform to encode and transmit data in its native format,
but to incorporate multiple decode routines using a common API so that
each program can decode many or all of the other programs' native
formats.

Ron Frederick gave a presentation on the implementation of a decoder
API in version 3.0 of the ``nv'' program. Decoding and rendering of the
image data are decoupled: ``nv'' does all the network I/O, RTP
processing, and X window system interaction; the image decode routines
just convert each packet of compressed bits into uncompressed pixels
for a portion of the image. When the ``S'' bit (end of synchronization
unit) is set in the RTP header, ``nv'' will display the new image on
the screen. This scheme allows support for multiple display depths,
brightness/contrast mapping, and image scaling with simultaneous
display of multiple sizes. Three decoders have been incorporated into
``nv'' to process video from ``nv'', from the hardware Bolter codec,
and from the Cornell CU-SeeMe program.

Christian Huitema identified a problem with the ``nv'' API when he
described the H.261 encoding in the IVS program and the complexity that
results from a combination of image structure and Huffman encoding of
the image coefficients.
There is a lot of state information implicit in the decoding process in
the middle of processing a ``group of blocks'' (GOB). Since one GOB may
occupy more than one packet, it might be infeasible to try to save and
restore the state at the boundary between packets so that control could
be returned across the API from a decode routine. Therefore, IVS uses
the ``S'' bit to indicate the GOB boundary so that all of the packets
of a GOB can be handed together to the decode routine. This conflict in
the interpretation of the ``S'' bit will have to be resolved to allow
interoperation of ``nv'' and IVS. It was suggested that IVS is using
the ``S'' bit for a transport function, whereas ``nv'' is using it for
a presentation function, and therefore the former is correct. Ron said
``nv'' could be modified to get the display-image indication as a
return value from the decode routine, so this may be the solution.

Paul Milazzo has implemented color video transmission in the BBN
PictureWindow program. Paul proposed that the YCrCb color encoding from
the CCIR 601 digital video standard be used as the color representation
for software encoding. This is similar to the YUV analog encoding. The
proposed scheme would keep the luminance (Y) pixels in one array, and
chrominance (CrCb or UV) pixel pairs, subsampled by 2 in the X
dimension, in another array. Because of the subsampling, the two arrays
would be the same size: each chrominance pair holds two values but
covers two luminance pixels. This scheme allows easy rendering into
monochrome or color images.

Further Working Group Activities

A set of Internet-Drafts on RTP was issued in December 1992:

   o draft-ietf-avt-rtp-00.txt
   o draft-ietf-avt-encoding-00.txt
   o draft-ietf-avt-profile-00.txt
   o draft-ietf-avt-issues-00.ps, .txt

The first Draft is the specification of the real-time transport
protocol itself. The second and third Drafts define a set of media
encodings and a sample profile for use of those encodings to implement
audio and video multiparticipant conferences with minimal control.
The last Draft is an updated discussion of the issues and decisions
involved in the design of the protocol.

Revised Drafts incorporating changes discussed at this meeting will be
issued in May. After review and possible further revision, it is
expected that the Internet-Drafts will be submitted for approval as
RFCs in June, completing the Working Group's Charter.

Attendees

    Lou Berger              lberger@bbn.com
    Larry Blunk             ljb@merit.edu
    John Boatright          bryan_boatright@ksc.nasa.gov
    Erik-Jan Bos            erik-jan.bos@surfnet.nl
    Monroe Bridges          monroe@cup.hp.com
    Sandy Bryant            slb@virginia.edu
    Stephen Casner          casner@isi.edu
    Yee-Hsiang Chang        yhc@hpl.hp.com
    William Chimiak         chim@relito.medeng.wfu.edu
    Richard Cogger          R.Cogger@cornell.edu
    David Conklin           conklin@jvnc.net
    Simon Coppins           coppins@arch.adelaide.edu.au
    Mark Davis-Craig        mad@merit.edu
    Shane Dawalt            sdawalt@desire.wright.edu
    Steve DeJarnett         steve@ibmpa.awdpa.ibm.com
    Tony DeSimone           tds@hoserve.att.com
    Ed Ellesson             ellesson@vnet.ibm.com
    Chip Elliott            celliot@bbn.com
    Hans Eriksson           hans@sics.se
    Francois Fluckiger      fluckiger@vxcern.cern.ch
    Paul Franchois          paulf@bldrdoc.gov
    Ron Frederick           frederick@parc.xerox.com
    Jerry Friesen           jafries@sandia.llnl.gov
    Marcello Frutig         frutig@rnp.impa.br
    Gwen Funchess           funchess@magnus.acs.ohio-state.edu
    Joseph Godsil           jgodsil@ncsa.uiuc.edu
    Fengmin Gong            gong@concert.net
    Kenneth Goodwin         goodwin@a.psc.edu
    Mark Green              markg@apple.com
    Robert Gutierrez        gutierre@nsipo.nasa.gov
    Don Hoffman             hoffman@eng.sun.com
    Frank Hoffmann          hoffmann@dhdibm1.bitnet
    Christian Huitema       christian.huitema@sophia.inria.fr
    Phil Irey               pirey@relay.nswc.navy.mil
    Ronald Jacoby           rj@sgi.com
    Peter Kirstein          P.Kirstein@cs.ucl.ac.uk
    Charley Kline           cvk@uiuc.edu
    Lakshman Krishnamurthy  lakashman@ms.uky.edu
    Giri Kuthethoor         giri@ms.uky.edu
    Paul Lambert            paul_lambert@email.mot.com
    Ruth Lang               rlang@nisc.sri.com
    Ronald Lanning          lanning@netltm.cats.ohiou.edu
    Yu-Lin Lu               yulin@hpinddu.cup.hp.com
    Marjo Mercado           marjo@cup.hp.com
    Donald Merritt          Don@brl.mil
    Paul Milazzo            milazzo@bbn.com
    Joseph Pang             pang@bodega.stanford.edu
    Geir Pedersen           Geir.Pedersen@usit.uio.no
    Jim Rees                jim.rees@umich.edu
    Michael Safly           saf@tank1.msfc.nasa.gov
    Carl Schoeneberger      70410.3563@Compuserve.com
    Eve Schooler            schooler@isi.edu
    Stuart Stubblebine      stubblebine@isi.edu
    Sally Tarquinio         sallyt@gateway.mitre.org
    Craig Todd              ctodd@desire.wright.edu
    Claudio Topolcic        topolcic@cnri.reston.va.us
    Hung Vu                 hungv@fonorola.com
    Huyen Vu                vu@polaris.disa.mil
    Abel Weinrib            abel@bellcore.com
    John Wroclawski         jtw@lcs.mit.edu
    Yow-Wei Yao             yao@chang.austin.ibm.com