Minutes of the IP Performance Working Group

Reported by Paul Love and Guy Almes

1. Overview and Agenda Bashing

The meeting was chaired by WG co-chairs Guy Almes and Vern Paxson,
and was very well attended. This was our first meeting as a formal
working group within the Transport Area, and was also our first
meeting co-chaired by Vern Paxson.

Proposed agenda:
  Welcome: we are now a working group. (10 min)
  Final changes to Framework document and Last Call. (10 min)
  Update on draft metrics (delay, loss, connectivity). (15 min)
  Experiences with draft metrics. (20 min)
    Surveyor: Guy Almes
    Poip: Vern Paxson
  DOE PingER effort: David Martin. (20 min)
  End-User Perspective on Internet Performance: Venkat Rangan/Jim
    Goetz. (20 min)
  Impact of loss characteristics on real-time applications: Rajeev
    Koodli/Rayadurgam Ravikanth. (20 min)
  Report from ANSI T1A1.3 liaison. (10 min)

The agenda was agreeable to the participants, and it was noted that
the meeting would be somewhat packed.

2. Final changes to Framework document and Last Call: Vern Paxson

[slides included in the proceedings]

Vern Paxson reviewed the changes made to the Framework since our
Munich meeting. Unless there are significant objections, we will
soon submit the Framework as an RFC. Key changes include the
following:

<> "Criteria for Granting Official Status to a Metric or
   Methodology" was deleted due to lack of rough consensus.
<> Added discussion of "wire times" and IP fragments.
<> A standard-formed packet now has an unspecified IP protocol
   field.
<> Added C code and discussion of the Anderson-Darling
   goodness-of-fit test for exponential and uniform distributions.
<> Expanded the discussion of Random Additive Sampling:
     Pros:
       Avoids synchronization
       Unbiased estimates
     Cons:
       Complicates frequency-domain analysis
       Somewhat predictable by an adversary unless Poisson
   Added a note that it may in some situations be preferable to use
   non-Poisson sampling. For example, it may be useful in some cases
   to have an upper bound on the sampling interval.
<> Clarified that one can test different distributions for
   consistency:
     Computed measurement schedule
     User-level measurement timestamps
     Measurement wire times

Given this discussion, Vern issued a Last Call. We intend to submit
the Framework for publication as an Informational RFC. The Last Call
was sent out last week via email, and resulted in two improvements
to the Framework:

[] Clarifying discussion of "avoiding Stochastic Metrics"
[] Approximating Poisson sampling

There being no further comments during the meeting, the Framework
was passed off to the IESG (pending a set of final edits). Scott
Bradner, the Area Director overseeing our work, expressed his
appreciation at this completed milestone.

3. Revisions to the Connectivity ID: Vern Paxson

[slides included in proceedings]

Vern reported on recent updates to the Connectivity Metric:

<> The term "causal" has been changed to "temporal".
<> Minor edits have been made to clarify wording.
<> A comment has been added that TCP implementations generally do
   not need to send ICMP port unreachables, but they are required to
   treat a received port unreachable the same as a RST.

He also mentioned a comment from Jeff Sedayao that one-way
connectivity, which had been thought to be only of theoretical
interest, might be useful for dealing with certain security issues;
this comment will be added in a future revision.

4. Revisions to the One-way Packet Loss and One-way Delay Metrics:
Guy Almes

[slides included in proceedings]

Guy reported on recent updates to the metrics for One-way Packet
Loss and One-way Delay:

<> The metrics were clarified to note that a packet is regarded as
   lost even in the case that some fragments arrive at the
   destination but reassembly fails.
<> The one-way delay metric was clarified to note that Wire Time is
   currently the cleanest way to define the precise times at which
   packets are sent or received.
With regard to the latter point, he mentioned one specific
reservation about wire time: in the case of contention-based
networks, such as classical CSMA/CD-based Ethernet, time spent by
the sending host waiting for the network to become available should
be regarded as a form of queueing delay in the first hop across that
contention-based network, and thus included in one-way delay. If a
strict notion of wire time is used, however, this waiting time will
not be included. This observation will be included in future
versions of the one-way delay metric draft, and may influence future
versions of the Framework. He was careful, however, to stress that
this problem occurs only when the first hop on a path is over a
contention-based network; as networks become increasingly
switch-based, the problem will occur less often.

Guy then commented on two alternatives to Poisson sampling:

<> N values uniformly distributed within delta-T. This is a
   relatively minor deviation from Poisson and could be considered
   if there were significant motivation.
<> Passively watching the packets go by. This would be a major
   departure, since we would have no control over the statistical
   properties of the sample.

There are no current plans to include either in the one-way delay or
packet loss metrics.

Christian Huitema argued strongly for the need to state the "error
bars" of measurement results.

5. Experiences with Delay/Loss Metrics

5.a. Poip (Poisson Ping): Vern Paxson

[slides included in the proceedings]

Vern Paxson reported on the development of Poip. Key features
include:

<> Sources/sinks UDP packets transmitted at Poisson intervals (or
   uniform or periodic). (So not "really" a ping, as it is one-way.)
<> Uses a generic wire time library.
<> Packet headers include: version, type, length, sequence number,
   timestamp, and an MD5 checksum over the payload.
<> Uses the Anderson-Darling "A^2" test to check sending times.
<> Sanity checks on packet integrity (all the usual suspects).

The wire time API provides wire_init, wire_done, wire_add_fds,
wire_num_filter_drops, and wire_activity.

Experience with goodness-of-fit testing of measurement times shows
that scheduled times pass the test, but that user-level timestamps
with 10 msec granularity fail when hundreds of samples are tested.

Vern then showed some measurements of wire time vs. send time from
poip testing. Generally, the differences between the two fall into
about three discrete values between 100 usec and 200 usec. However,
occasional network events cause widely varying differences. Two that
were identified were a large midnight batch job and nightly backups
over the network. These events serve as reminders that wire times
can sometimes differ significantly from application-layer
perceptions, due to external factors.

5.b. Surveyor Project: Guy Almes

[slides included in the proceedings]

The Surveyor project is a joint effort of Advanced Network &
Services and the 23 universities of the Common Solutions Group. The
current emphasis is on ongoing operational measurement and archiving
of one-way delay and packet loss along all the end-to-end paths
between pairs of campuses.

At each campus there is a dedicated 200 MHz Pentium-based
measurement machine that measures one-way delay and packet loss with
a lambda of 2 packets/sec. These measurement machines upload their
results on an ongoing basis to a database server, which stores them
indefinitely. The results can then be accessed via the web using a
(currently immature) set of analysis/visualization tools.

The slides show several examples of one-way delay and loss along
several paths. In each slide, one 24-hour (GMT) period is indicated.
In the delay graphs, the minimum, 50th percentile, and 90th
percentile of delay are displayed for each one-minute period. In the
loss graphs, the percentage of packet loss is displayed for each
one-minute period.
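Poip's two statistical pieces, the Poisson send schedule and the
"A^2" check on the resulting times, can be sketched together as
follows. This is an illustrative Python fragment, not code from poip
itself (whose C-based wire time API is described above); the function
names are invented here, and the roughly 2.49 critical value is the
commonly quoted 5% figure for a fully specified null distribution.

```python
import math
import random

def poisson_schedule(rate, duration, seed=None):
    """Poisson sampling: accumulate exponentially distributed
    inter-send gaps with mean 1/rate until `duration` is reached."""
    rng = random.Random(seed)
    times, t = [], 0.0
    while True:
        t += rng.expovariate(rate)
        if t >= duration:
            break
        times.append(t)
    return times

def a2_uniform(u):
    """Anderson-Darling A^2 statistic for H0: u ~ U(0,1).  The gaps
    of a Poisson schedule with rate lam reduce to this case via the
    CDF transform u = 1 - exp(-lam * gap)."""
    u = sorted(u)
    n = len(u)
    s = sum((2 * i + 1) * (math.log(u[i]) + math.log(1.0 - u[n - 1 - i]))
            for i in range(n))
    return -n - s / n

# Check a schedule the way poip checks its sending times: transform
# the inter-send gaps to (0,1) and compute A^2.
lam = 2.0
times = poisson_schedule(lam, 600.0, seed=7)
gaps = [b - a for a, b in zip([0.0] + times, times)]
a2 = a2_uniform([1.0 - math.exp(-lam * g) for g in gaps])
```

Scheduled times like these should pass comfortably; consistent with
the experience reported above, timestamps coarsened to 10 msec
granularity distort the gaps and begin to fail once hundreds of
samples are tested.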
The delay figures are believed to be accurate to about 100 usec.

One challenge is to relate these frequent measurements of delay and
loss to dynamic changes in route. It is not practical, for example,
to both measure delay accurately and know the route taken. Without
making too much of it, the similarities with the Heisenberg effect
can be considered.

Current work includes broadening the deployment to other sites and
improving the analysis and visualization tools.

6. Three Presentations on Related Research

6.a. Report on the DoE Energy Research PingER Network Monitoring
Effort: David Martin

[slides included in the proceedings]

David Martin reported on joint work with colleagues at Fermilab,
SLAC, and 15 participating DoE HEPnet sites. HEPnet has sites of
interest around the world. Since the network has moved from being
single-purpose to a world of NAPs, ISPs, etc., there is a need to
measure performance. The talk emphasized the following points:

<> Round-trip delay and loss (using the ping tool) of 100-byte and
   1000-byte packets is the fundamental low-level measurement.
<> The data collection architecture includes:
     Remote sites - need only respond to a ping
     Collecting sites - initiate pings and record results
     Analysis sites - take data from collecting site(s) and do the
       analysis work on it
<> In each test, a single ping is used to prep caches, etc.; then 10
   100-byte pings and 10 1000-byte pings are measured. These tests
   are performed once every 30 minutes on each path. Both the packet
   loss percentage and the minimum, mean, and maximum of the
   round-trip delay values are recorded.
<> Analysis sites use a set of Perl 5 programs and the SAS
   scientific database system/language to facilitate analysis and
   archiving of results.
<> The results of the analyses are presented via the web. Some of
   these analyses can be parameterized and invoked via CGI, and are
   thus quite adaptable.
<> Based on experience, a new "timeping" daemon is being
   implemented:
   - Tests will be triggered by a Poisson process instead of every
     30 minutes.
   - Median delay values will be recorded (in addition to the mean,
     etc.).

A set of example screen captures was then shown to give a feel for
the use of the tools.

6.b. End-User Perspective on Internet Performance: Venkat Rangan

[slides included in the proceedings]

Venkat presented a set of tools by VitalSigns that aims to:

<> Use an end-user agent to diagnose and isolate performance
   problems across the user's Internet path
<> Impose low resource consumption
<> Emphasize passive techniques
<> Emphasize first-level problem diagnosis
<> Emphasize estimates and indices rather than hard metrics
<> Provide visual and immediate indicators of performance problems
   and bottlenecks
<> Provide performance data that is not normally available by
   traditional means

The technical means used emphasize passively observing:

<> The response time of the initial handshake of TCP connections
<> Apparent throughput rates during TCP connections
<> Packet loss inferred from TCP retransmissions

VitalSigns has a white paper at
www.vitalsigns.com/products/vista/wp/index.html describing the tools
in more detail.

6.c. Impact of Loss Characteristics on Real-Time Applications:
Rajeev Koodli/Rayadurgam Ravikanth

[slides included in the proceedings]

This talk, based on research at Nokia's Boston laboratories,
advanced some new ideas on understanding how packet loss impacts
real-time applications. Using the current notions of singletons and
samples of packet loss, it was noted that the only statistic defined
so far was the loss percentage. The talk presented two ways to treat
the time-series aspects of packet loss.

First, it was noted that some real-time applications are sensitive
to packet loss in such a fashion that isolated losses can be
withstood, but bunches of lost packets would degrade quality.
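Returning briefly to the PingER methodology of Section 6.a, the
per-test summary it records (packet loss percentage plus the
minimum, mean, and maximum round-trip delay) can be sketched as
follows. This is an illustrative Python fragment with invented
names, not code from the project (which uses Perl 5 and SAS).

```python
def summarize_pings(rtts_ms):
    """Summarize one PingER test: rtts_ms holds one entry per probe,
    with None marking a lost packet.  Returns the loss percentage
    and the minimum / mean / maximum of the RTTs that did arrive."""
    ok = [r for r in rtts_ms if r is not None]
    loss_pct = 100.0 * (len(rtts_ms) - len(ok)) / len(rtts_ms)
    stats = (min(ok), sum(ok) / len(ok), max(ok)) if ok else (None,) * 3
    return loss_pct, stats

# Hypothetical test of 10 100-byte probes, 2 of which were lost.
loss, (lo, mean, hi) = summarize_pings(
    [31.0, None, 30.5, 32.1, None, 30.9, 31.4, 30.7, 33.0, 31.2])
```

The planned "timeping" daemon would additionally record the median,
which could be added to the tuple above in the same way.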
For example, an audio application using forward error correction
might be able to tolerate isolated packet losses provided that 5
successful packet transmissions occur between successive losses.
Based on this observation, the notion of a loss constraint was
introduced as the minimum number of successful packet transmissions
between lost packets. Thus, if a given test shows that there are
always at least 5 successful transmissions between packet losses,
then the stream is said to succeed with a loss constraint of 5.

Second, it was noted that the loss period, defined as the time
duration of a burst of losses, might be important for real-time
applications. Bursts of losses longer than a critical threshold
might have an especially severe negative impact on such
applications.

This talk was the first attempt within the IPPM effort to treat the
time-series aspects of packet loss. It triggered many questions.
Vern Paxson noted that the significance of the loss period would
depend on the rate at which the application was attempting to send
packets. Christian Huitema noted that the work was very specific to
a particular kind of real-time application and may not be suitable
to standardize as an IETF effort, and asked how it could be
generalized to provide information of more general usefulness. Vern
Paxson asked whether anything is known about which burst lengths are
problematic for specific applications. The presenters answered that
they did not know for audio, but that the loss of 2 frames in a row
was perceptible in video.

7. Report from ANSI T1A1.3 liaison: Vern Paxson

[slides included in proceedings]

Vern Paxson reported on his work as liaison to the ANSI T1A1.3
effort. T1A1.3 is an ANSI working group on "Performance of Digital
Networks and Services". It has recently initiated work on an
"Internet Service Performance Specification", and will likely
forward the results eventually to the ITU.
This work is rooted in ANSI experience going back to efforts to
specify the quality of X.25 services (X.134 - X.139), and has
considerable overlap with the IETF IPPM work. Based on their
experience and traditions, the ANSI effort uses terminology
different from that used by IPPM. As one example, they allow
themselves to define (theoretically) "observable events" that, while
well defined, have no practical measurement methodology. Also unlike
the IPPM work, they may well specify specific values as criteria for
rating networks as acceptable. They will likely emphasize passive
methodologies and will likely include link-layer (in addition to
network-layer) notions.

Refer to:
  www.t1.org/t1a1/t1a1.htm
  www.t1.org/t1a1/_a13-hom.htm
for more information.
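As a closing illustration of the loss-constraint and loss-period
statistics proposed in Section 6.c, the two can be computed from a
per-packet arrival trace as sketched below. This is a Python sketch
under one reading of the presentation: the function names are
invented here, and the exact definitions in any eventual draft may
differ (e.g., in how back-to-back losses or trace boundaries are
treated).

```python
def loss_constraint(received):
    """Minimum run of successful transmissions between consecutive
    losses, given a per-packet list of booleans (True = arrived).
    Returns None if the trace contains fewer than two losses."""
    runs, cur, seen_loss = [], 0, False
    for ok in received:
        if ok:
            cur += 1
        else:
            if seen_loss:
                runs.append(cur)  # run separating two losses
            seen_loss, cur = True, 0
    return min(runs) if runs else None

def loss_periods(received, interval):
    """Durations of loss bursts, assuming `interval` seconds per
    packet slot; long entries here are the bursts that Section 6.c
    suggests matter most to real-time applications."""
    periods, run = [], 0
    for ok in received:
        if ok:
            if run:
                periods.append(run * interval)
            run = 0
        else:
            run += 1
    if run:
        periods.append(run * interval)
    return periods
```

For the audio example above, a trace whose losses are always
separated by at least 5 arrivals succeeds with a loss constraint of
5, while `loss_periods` would flag any burst exceeding the
application's critical threshold.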