CURRENT_MEETING_REPORT_

Reported by Steve Casner/USC-ISI

Minutes of the Audio/Video Transport Working Group (AVT)

The AVT Working Group met during three separate sessions. The first session began with presentations of candidate protocols for real-time audio/video transport, followed by a lively discussion of the differences among the candidates and the underlying questions implied by those differences. The discussion resumed in the second session and part of the third, followed by live demonstrations of experimental packet audio and video programs.

As part of the second IETF ``audiocast'', live audio and video from all three sessions were transmitted via UDP and IP multicast to participants at a number of locations around the world. At least two remote participants made multiple contributions to the Working Group discussion.

1. Presentations of Candidate Protocols

Steve Casner began with a quick review of the descriptions of the Network Voice Protocol (NVP-II) data packet format and the first-cut strawman protocol from the San Diego meeting, then presented a second-cut strawman based on the discussions in San Diego. The data packet header contains the following fields:

   o Timestamp (16 bits of seconds + 16-bit fraction)
   o Packet Sequence Number (16 bits)
   o Flow Identifier (8 bits)
   o Options Length (8 bits)
   o Options

Since Van Jacobson could not attend, Steve also described the protocol used by the vat audio program, based on a protocol description sent by Van to the rem-conf list. The data packet header format is:

   o Protocol Version (2 bits)
   o Number of Site Identifiers to follow header (6 bits)
   o Start-of-Talkspurt Flag (1 bit)
   o Audio Format/Encoding (5 bits)
   o Conference Identifier (16 bits)
   o Timestamp (32-bit audio sample counter)
   o Site Identifiers (0 to 63; 32 bits each)

Both of these data packet formats depend on a session/control protocol to carry information that is not required in every data packet.
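
As an illustration only, the second-cut strawman header listed above might be packed and parsed as in the following sketch. The minutes give only field names and widths, so the field order on the wire, the use of network byte order, the unit of Options Length (bytes here), and the names FIXED_HEADER, pack_header and unpack_header are all assumptions made for this example.

```python
import struct

# Fixed part of the strawman header, in the order the fields are listed.
# Network byte order and this exact layout are assumptions.
#   H  timestamp, whole seconds (16 bits)
#   H  timestamp, binary fraction of a second (16 bits)
#   H  packet sequence number (16 bits)
#   B  flow identifier (8 bits)
#   B  options length (8 bits)
FIXED_HEADER = struct.Struct("!HHHBB")


def pack_header(seconds, fraction, sequence, flow_id, options=b""):
    """Build a header; options length is counted in bytes (an assumption)."""
    if len(options) > 255:
        raise ValueError("options do not fit an 8-bit length field")
    fixed = FIXED_HEADER.pack(seconds & 0xFFFF, fraction & 0xFFFF,
                              sequence & 0xFFFF, flow_id & 0xFF, len(options))
    return fixed + options


def unpack_header(packet):
    """Return (fields, payload) parsed from a packet built by pack_header."""
    seconds, fraction, sequence, flow_id, opt_len = \
        FIXED_HEADER.unpack_from(packet)
    start = FIXED_HEADER.size
    fields = {
        "seconds": seconds,
        "fraction": fraction,
        "sequence": sequence,
        "flow_id": flow_id,
        "options": packet[start:start + opt_len],
    }
    return fields, packet[start + opt_len:]
```

Under these assumptions the fixed part occupies 8 bytes, and the variable-length options follow it directly, ahead of the media data.
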
Henning Schulzrinne described the extensions to the vat session protocol used in his NEVOT audio program, in particular the periodic transmission of the sender's state (the current time and how many samples have been transmitted) to enable measurement of loss at the receiver.

Simon Hackett gave an impromptu overview of his Multimedia Data Switch (MMDS) application and protocol. For purposes of experimentation, Simon chose to use large headers including a variety of fields to make the data self-describing. He also continues to send packet headers during silence as a keep-alive, but omits the data to reduce bandwidth.

See Section 5 below for references on these protocols.

2. Discussion of Protocol Differences

The goal of the discussion was to identify the issues that must be resolved in order to produce a draft protocol. The primary ones were:

   o Timestamp format, media sample clock or real time
   o Sequence number versus start-of-talkspurt flag
   o What multiplexing is required beyond address+port
   o Whether or not to indicate encoding format in data packets

The first two issues underlie a key question for the Working Group, namely whether we should define one real-time transport protocol or multiple application-specific protocols. The rough consensus was for the former, but this may conflict with ease of implementation.

The Working Group discussed timestamp formats at the last meeting and this one, but the issue is still not finally decided. For purposes of synchronization among multiple media sources, the only practical means is to relate all streams to real time (synchronized time of day). This would be simplified if the timestamps are in real time, but the implementation of audio buffering is much easier with an audio sample clock timestamp. The timestamp format could be converted either at the sender or receiver; what's needed is a detailed analysis of the tradeoffs.
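
To make the conversion question concrete, the sketch below maps an audio sample counter to a real-time timestamp of 16 bits of seconds plus a 16-bit fraction, and back. It is not taken from any of the candidate protocols: the 8000 Hz sampling rate, the start_seconds parameter (the real time at sample 0, assumed known to whichever end converts), and the function names are assumptions chosen for illustration.

```python
SAMPLE_RATE_HZ = 8000  # assumed audio sampling rate; not stated in the minutes


def samples_to_real_time(sample_count, start_seconds=0.0, rate=SAMPLE_RATE_HZ):
    """Convert a sample-counter timestamp to (seconds, fraction).

    The result is a 16.16 fixed-point value: 16 bits of seconds and a
    16-bit binary fraction. start_seconds is the real time at sample 0.
    """
    ticks = round((start_seconds + sample_count / rate) * 65536)
    ticks &= 0xFFFFFFFF  # 16 bits of seconds wrap every 65536 s
    return ticks >> 16, ticks & 0xFFFF


def real_time_to_samples(seconds, fraction, start_seconds=0.0,
                         rate=SAMPLE_RATE_HZ):
    """Convert a (seconds, fraction) timestamp back to a sample count."""
    elapsed = seconds + fraction / 65536 - start_seconds
    return round(elapsed * rate)
```

For example, 12000 samples at the assumed 8000 Hz rate is 1.5 seconds, which converts to seconds = 1 and fraction = 32768. One unit of the 16-bit fraction is about 15 microseconds, finer than the 125 microsecond sample period at that rate, so the round trip recovers the original sample count; the harder part in practice is agreeing on start_seconds, which this sketch simply takes as given.
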
The strawman protocols propose a packet sequence number in addition to the timestamp in order to differentiate lost packets from packets not sent during silence. The vat protocol uses a flag on the first packet of a talkspurt because packet mis-ordering makes the sequence number hard to use. On the other hand, a sequence number may be required for video applications that don't have talkspurts but require multiple packets per frame, all with the same timestamp.

The Flow ID in the strawman protocol serves two purposes: it provides multiplexing of multiple streams (e.g., audio and video) from the same source on one IP multicast address and port, and it allows for different encodings to be used, with each Flow ID bound to an encoding descriptor using the session/control protocol. As defined, the vat protocol includes an explicit encoding format field in the data packet, but the Working Group deemed 5 bits too small. The vat encoding values could also be bound to a dynamic set of encoding descriptors using a control protocol.

The vat Conference ID discriminates among conferences in case of a collision in random IP multicast address allocation and because many BSD-derived systems don't allow discriminating on the multicast destination address. The strawman assumes a repair of the BSD deficiency (which seems feasible at this time for multicast-capable systems) and assumes some other method to avoid address collisions.

3. Completeness and Compatibility with Connection Management

In addition to resolving differences among the protocol proposals, we must consider whether the protocols are sufficiently complete. Unlike the audio and video conferencing applications, distributed simulation and PBX trunking may require aggregation of multiple frames of data into a single packet. If the frames can all share the same header information, then aggregation can be consigned to the next layer up; if not, then some additional encapsulating mechanism would be required.
We did not consider this further.

Another extension would be flow control. In previous Working Group discussions, it has been assumed that network resource management mechanisms and protocols would be available to allow real-time applications to avoid congestion. Christian Huitema pointed out that at least over some paths we will probably need a feedback mechanism to allow adjustable codecs to accommodate congestion. The Group was unsure whether an application-independent feedback mechanism could be defined. Christian is to write a specification as a starting point.

This Working Group's low-level protocol must also be compatible with higher-level connection management protocols such as those under discussion in the Remote Conferencing Architecture BOF. Provision of encoding format selections from a conference directory server seems straightforward. However, the server must also have a means to acquire an IP multicast address. Lixia Zhang suggested (remotely!) that we really should consider a distributed system of servers to hand out globally unique IP multicast addresses; this capability will be needed by several groups considering multicast, not just ours.

4. Software Encoding and Enumeration

The real-time transport protocol should be independent of the media encoding algorithms and formats that belong to the next higher layer except that the format must be identified by the lower layer. However, in keeping with the Working Group goal to foster interoperation and experimentation with packet audio and video, it may be valuable to agree on some (perhaps low performance) software compression techniques for use until hardware is generally available. This suggests that some of the encoding formats we need to identify will be non-standard and hence not included in any standard enumeration.
The Working Group feels a strong need to pick up a task that has been deferred by others, to define an IANA-managed enumeration or naming convention for audio and video encoding algorithms to enable interoperation. The enumeration should not be part of the protocol itself, but the protocol must provide the space to carry the encoding identification. There was substantial discussion of numeric vs text/parametric identification of formats. This issue was not resolved.

The third Working Group session was concluded with descriptions and demonstrations of the software encoding algorithms developed by Working Group participants. Paul Milazzo gave an update on the protocol for the BBN Desktop Video Conference program which was used to multicast packet video from IETF. Christian Huitema showed the INRIA H.261 video compression software. Hans Eriksson described the packet audio and video experiments at SICS.

5. Further Discussion

While several issues were not resolved, we laid out the considerations for each choice well enough to guide the design of a complete set of consistent choices as the first draft protocol from this Group. Our (revised) goal is to have an Internet Draft protocol submitted by November. Further discussion by email will be required to make this happen.

During the IETF meeting, some notes from the first session, including a description of the strawman and vat protocols, were sent to the rem-conf list. They should be in the archive, or may be requested from casner@isi.edu. A message from last March on MMDS is also available.

An extensive summary of the issues and a protocol recommendation has been prepared by Henning Schulzrinne and is available from:

   gaia.cs.umass.edu:~ftp/pub/rtp/rtp.ps

This working paper will be made an Internet Draft for wider distribution.

Thanks to Eve Schooler, Henning Schulzrinne and Christian Huitema for taking the notes from which these Minutes were prepared.

Attendees

George Abe            abe@infonet.com
J. Allard             jallard@microsoft.com
John Batzer
Lou Berger            lberger@penril.com
James Berry           beri@sandia.llnl.gov
Luc Boulianne         lucb@cs.mcgill.ca
Scott Brim            swb@cornell.edu
Alan Bryenton         bryenton@bnr.ca
Randy Butler          rbutler@ncsa.uiuc.edu
Stephen Casner        casner@isi.edu
Yee-Hsiang Chang      yhc@concert.net
Andrew Cherenson      arc@sgi.com
Robert Clements       clements@bbn.com
Michael Collins       collins@ccc.nersc.gov
Steve Deering         deering@parc.xerox.com
Tony DeSimone         tds@hoserve.att.com
Jack Drescher         drescher@concert.net
Hans Eriksson         hans@sics.se
Julio Escobar         jescobar@bbn.com
Roger Fajman          raf@cu.nih.gov
Margaret Forsythe     mrf@ftp.com
Osten Franberg        euaokf@eua.ericsson.se
Ron Frederick         frederick@parc.xerox.com
Jerry Friesen         jafries@sandia.llnl.gov
Robert Gilligan       Bob.Gilligan@eng.sun.com
Simon Hackett         simon@internode.com.au
Robert Hagens         hagens@ans.net
Christian Huitema     christian.huitema@sophia.inria.fr
Peter Kirstein        P.Kirstein@cs.ucl.ac.uk
Jim Knowles           jknowles@trident.arc.nasa.gov
Padma Krishnaswamy    kri@sabre.bellcore.com
Matt Mathis           mathis@a.psc.edu
Cindy Mazza
Donald Merritt        don@brl.mil
Paul Milazzo          milazzo@bbn.com
Robert Mines          rfm@sandia.llnl.gov
Donald Morris         morris@ucar.edu
Ari Ollikainen        ari@es.net
Roger Osmond          bytex!rfo@uunet.uu.net
Larry Palmer          lp@decvax.dec.com
Michael Powell        mdpowel@pacbell.com
Russell Pretty        pretty@bnr.ca
K. K. Ramakrishnan    rama@erlang.enet.dec.com
Bradley Rhoades       bdrhoades@mmc.mmmg.com
Allan Rubens          acr@merit.edu
Henry Sanders         henrysa@microsoft.com
Eve Schooler          schooler@isi.edu
Koichiro Seto         seto@hitachi-cable.co.jp
Vincent Sgro          sgro@cs.rutgers.edu
Louis Steinberg       louiss@vnet.ibm.com
Terrance Sullivan     terrys@newbridge.com
Sally Tarquinio       sallyt@gateway.mitre.org
Claudio Topolcic      topolcic@nri.reston.va.us
Mark Uhrmacher        maui@tabasco.lcs.mit.edu
Andrew Veitch         aveitch@bbn.com
John Vollbrecht       jrv@merit.edu
David Waitzman        djw@bbn.com
Sandro Wallach        sandro@elf.com
Abel Weinrib          abel@bellcore.com
Rick Wilder           wilder@ans.net
Walter Wimer          walter.wimer@andrew.cmu.edu
Linda Winkler         lwinkler@anl.gov
Jeff Wong             jaw@io.att.com
Richard Woundy        rwoundy@rhqvm21.vnet.ibm.com
John Wroclawski       jtw@lcs.mit.edu
Paul Zawada           Zawada@ncsa.uiuc.edu