Minutes of the IP Performance Working Group

Reported by Paul Love and Guy Almes

1. Overview and Agenda Bashing

The meeting was chaired by WG co-chairs Guy Almes and Vern Paxson,
and was very well attended. This was our first meeting as a formal
working group within the Transport Area, and was also our first
meeting co-chaired by Vern Paxson.

Proposed agenda:
  Welcome: we are now a working group. (10 min)
  Final changes to Framework document and Last Call. (10 min)
  Update on draft metrics (delay, loss, connectivity). (15 min)
  Experiences with draft metrics. (20 min)
    Surveyor: Guy Almes
    Poip: Vern Paxson
  DOE PingER effort: David Martin. (20 min)
  End-User Perspective on Internet Performance: Venkat Rangan/Jim
    Goetz. (20 min)
  Impact of loss characteristics on real-time applications: Rajeev
    Koodli/Rayadurgam Ravikanth. (20 min)
  Report from ANSI T1A1.3 liaison. (10 min)

The agenda was agreeable to the participants, and it was noted that
the meeting would be somewhat packed.

2. Final changes to Framework document and Last Call: Vern Paxson

[slides included in the proceedings]

Vern Paxson reviewed the changes made to the Framework since our
Munich meeting. Unless there are significant objections, we will
soon submit the Framework as an RFC. Key changes include the
following:

<> "Criteria for Granting Official Status to a Metric or
   Methodology" was deleted due to lack of rough consensus.
<> Added discussion of "wire times" and IP fragments.
<> A standard-formed packet now has an unspecified IP protocol
   field.
<> Added C code and discussion of the Anderson-Darling
   goodness-of-fit test for exponential and uniform distributions.
<> Expanded the discussion of Random Additive Sampling:
     Pros:
       Avoids synchronization
       Unbiased estimates
     Cons:
       Complicates frequency-domain analysis
       Somewhat predictable by an adversary unless Poisson
   Added a note that it may in some situations be preferable to use
   non-Poisson sampling. For example, it may be useful in some cases
   to have an upper bound on the sampling interval.
<> Clarified that one can test different distributions for
   consistency:
     Computed measurement schedule
     User-level measurement timestamps
     Measurement wire times

Given this discussion, Vern issued a Last Call. We intend to submit
the Framework for publication as an Informational RFC. The Last Call
was sent out last week via email, and resulted in two improvements
to the Framework:

[] Clarifying discussion of "avoiding Stochastic Metrics"
[] Approximating Poisson sampling

There being no further comments during the meeting, the Framework
was passed off to the IESG (pending a set of final edits). Scott
Bradner, the Area Director overseeing our work, expressed his
appreciation at this completed milestone.

3. Revisions to the Connectivity ID: Vern Paxson

[slides included in proceedings]

Vern reported on recent updates to the Connectivity Metric:

<> The term "causal" has been changed to "temporal".
<> Minor edits have been made to clarify wording.
<> A comment has been added that TCP implementations generally do
   not need to send ICMP port unreachables, but they are required to
   treat a received port unreachable the same as a RST.

He also mentioned a comment from Jeff Sedayao that one-way
connectivity, which had been thought to be only of theoretical
interest, might be useful for dealing with certain security issues;
this comment will be added in a future revision.

4. Revisions to the One-way Packet Loss and One-way Delay Metrics:
Guy Almes

[slides included in proceedings]

Guy reported on recent updates to the metrics for One-way Packet
Loss and One-way Delay:

<> The metrics were clarified to note that a packet is regarded as
   lost even in the case that some fragments arrive at the
   destination but reassembly fails.
<> The one-way delay metric was clarified to note that Wire Time is
   currently the cleanest way to define the precise times at which
   packets are sent or received.
With regard to the latter point, he mentioned one specific
reservation about wire time: in the case of contention-based
networks, such as classical CSMA/CD-based Ethernet, time spent by
the sending host waiting for the network to become available should
be regarded as a form of queueing delay in the first hop across that
contention-based network, and thus included in one-way delay. If a
strict notion of wire time is used, however, this waiting time will
not be included. This observation will be included in future
versions of the one-way delay metric draft, and may influence future
versions of the Framework. He was careful, however, to stress that
this problem occurs only when the first hop on a path is over a
contention-based network; as networks become increasingly
switch-based, the problem will occur less often.

Guy then commented on two alternatives to Poisson sampling:

<> N values uniformly distributed within delta-T. This is a
   relatively minor deviation from Poisson and could be considered
   if there were significant motivation.
<> Passively watching the packets go by. This would be a major
   departure, since we would have no control over the statistical
   properties of the sample.

There are no current plans to include either in the one-way delay or
packet loss metrics.

Christian Huitema argued strongly for the need to state the "error
bars" of measurement results.

5. Experiences with Delay/Loss Metrics

5.a. Poip (Poisson Ping): Vern Paxson

[slides included in the proceedings]

Vern Paxson reported on the development of Poip. Key features
include:

<> Sources/sinks UDP packets transmitted at Poisson intervals (or
   uniform or periodic). (So not "really" a ping, as it is one-way.)
<> Uses a generic wire time library.
<> Packet headers include: version, type, length, sequence number,
   timestamp, and an MD5 checksum over the payload.
<> Uses the Anderson-Darling "A^2" test to check sending times.
<> Sanity checks on packet integrity (all the usual suspects).

The wire time API provides wire_init, wire_done, wire_add_fds,
wire_num_filter_drops, and wire_activity.

Experience with goodness-of-fit testing of measurement times shows
that scheduled times pass the test, but that user-level timestamps
with 10 msec granularity fail when hundreds of samples are tested.

Vern then showed some measurements of wire time vs. send time from
poip testing. Generally, the differences between the two fall into
about three discrete values between 100 usec and 200 usec. However,
occasional network events cause widely varying differences. Two that
were identified were a large midnight batch job and nightly backups
over the network. These events serve as reminders that wire times
can sometimes differ significantly from application-layer
perceptions, due to external factors.

5.b. Surveyor Project: Guy Almes

[slides included in the proceedings]

The Surveyor project is a joint effort of Advanced Network &
Services and the 23 universities of the Common Solutions Group. The
current emphasis is on ongoing operational measurement and archiving
of one-way delay and packet loss along all the end-to-end paths
between pairs of campuses.

At each campus there is a dedicated 200 MHz Pentium-based
measurement machine that measures one-way delay and packet loss with
a lambda of 2 packets/sec. These measurement machines upload their
results on an ongoing basis to a database server, which stores them
indefinitely. The results can then be accessed via the web using a
(currently immature) set of analysis/visualization tools.

The slides show several examples of one-way delay and loss along
several paths. In each slide, one 24-hour (GMT) period is indicated.
In the delay graphs, the minimum, 50th percentile, and 90th
percentile of delay are displayed for each one-minute period. In the
loss graphs, the percentage of packet loss is displayed for each
one-minute period.
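Poip's two statistical pieces, the Poisson send schedule and the
"A^2" check on the resulting times, can be sketched together as
follows. This is an illustrative Python fragment, not code from poip
itself (whose C-based wire time API is described above); the function
names are invented here, and the roughly 2.49 critical value is the
commonly quoted 5% figure for a fully specified null distribution.

```python
import math
import random

def poisson_schedule(rate, duration, seed=None):
    """Poisson sampling: accumulate exponentially distributed
    inter-send gaps with mean 1/rate until `duration` is reached."""
    rng = random.Random(seed)
    times, t = [], 0.0
    while True:
        t += rng.expovariate(rate)
        if t >= duration:
            break
        times.append(t)
    return times

def a2_uniform(u):
    """Anderson-Darling A^2 statistic for H0: u ~ U(0,1).  The gaps
    of a Poisson schedule with rate lam reduce to this case via the
    CDF transform u = 1 - exp(-lam * gap)."""
    u = sorted(u)
    n = len(u)
    s = sum((2 * i + 1) * (math.log(u[i]) + math.log(1.0 - u[n - 1 - i]))
            for i in range(n))
    return -n - s / n

# Check a schedule the way poip checks its sending times: transform
# the inter-send gaps to (0,1) and compute A^2.
lam = 2.0
times = poisson_schedule(lam, 600.0, seed=7)
gaps = [b - a for a, b in zip([0.0] + times, times)]
a2 = a2_uniform([1.0 - math.exp(-lam * g) for g in gaps])
```

Scheduled times like these should pass comfortably; consistent with
the experience reported above, timestamps coarsened to 10 msec
granularity distort the gaps and begin to fail once hundreds of
samples are tested.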
The delay figures are believed to be accurate to about 100 usec.

One challenge is to relate these frequent measurements of delay and
loss to dynamic changes in route. It is not practical, for example,
to both measure delay accurately and know the route taken. Without
making too much of it, the similarities with the Heisenberg effect
can be considered.

Current work includes broadening the deployment to other sites and
improving the analysis and visualization tools.

6. Three Presentations on Related Research

6.a. Report on the DoE Energy Research PingER Network Monitoring
Effort: David Martin

[slides included in the proceedings]

David Martin reported on joint work with colleagues at Fermilab,
SLAC, and 15 participating DoE HEPnet sites. HEPnet has sites of
interest around the world. Since the network has moved from being
single-purpose to a world of NAPs, ISPs, etc., there is a need to
measure performance. The talk emphasized the following points:

<> Round-trip delay and loss (using the ping tool) of 100-byte and
   1000-byte packets is the fundamental low-level measurement.
<> The data collection architecture includes:
     Remote sites - need only respond to a ping
     Collecting sites - initiate pings and record results
     Analysis sites - take data from collecting site(s) and do the
       analysis work on it
<> In each test, a single ping is used to prep caches, etc.; then 10
   100-byte pings and 10 1000-byte pings are measured. These tests
   are performed once every 30 minutes on each path. Both the packet
   loss percentage and the minimum, mean, and maximum of the
   round-trip delay values are recorded.
<> Analysis sites use a set of Perl 5 programs and the SAS
   scientific database system/language to facilitate analysis and
   archiving of results.
<> The results of the analyses are presented via the web. Some of
   these analyses can be parameterized and invoked via CGI, and are
   thus quite adaptable.
<> Based on experience, a new "timeping" daemon is being
   implemented:
   - Tests will be triggered by a Poisson process instead of every
     30 minutes.
   - Median delay values will be recorded (in addition to the mean,
     etc.).

A set of example screen captures was then shown to give a feel for
the use of the tools.

6.b. End-User Perspective on Internet Performance: Venkat Rangan

[slides included in the proceedings]

Venkat presented a set of tools by VitalSigns that aims to:

<> Use an end-user agent to diagnose and isolate performance
   problems across the user's Internet path
<> Impose low resource consumption
<> Emphasize passive techniques
<> Emphasize first-level problem diagnosis
<> Emphasize estimates and indices rather than hard metrics
<> Provide visual and immediate indicators of performance problems
   and bottlenecks
<> Provide performance data that is not normally available by
   traditional means

The technical means used emphasize passively observing:

<> The response time of the initial handshake of TCP connections
<> Apparent throughput rates during TCP connections
<> Packet loss inferred from TCP retransmissions

VitalSigns has a white paper at
www.vitalsigns.com/products/vista/wp/index.html describing the tools
in more detail.

6.c. Impact of Loss Characteristics on Real-Time Applications:
Rajeev Koodli/Rayadurgam Ravikanth

[slides included in the proceedings]

This talk, based on research at Nokia's Boston laboratories,
advanced some new ideas on understanding how packet loss impacts
real-time applications. Using the current notions of singletons and
samples of packet loss, it was noted that the only statistic defined
so far was the loss percentage. The talk presented two ways to treat
the time-series aspects of packet loss.

First, it was noted that some real-time applications are sensitive
to packet loss in such a fashion that isolated losses can be
withstood, but bunches of lost packets would degrade quality.
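Returning briefly to the PingER methodology of Section 6.a, the
per-test summary it records (packet loss percentage plus the
minimum, mean, and maximum round-trip delay) can be sketched as
follows. This is an illustrative Python fragment with invented
names, not code from the project (which uses Perl 5 and SAS).

```python
def summarize_pings(rtts_ms):
    """Summarize one PingER test: rtts_ms holds one entry per probe,
    with None marking a lost packet.  Returns the loss percentage
    and the minimum / mean / maximum of the RTTs that did arrive."""
    ok = [r for r in rtts_ms if r is not None]
    loss_pct = 100.0 * (len(rtts_ms) - len(ok)) / len(rtts_ms)
    stats = (min(ok), sum(ok) / len(ok), max(ok)) if ok else (None,) * 3
    return loss_pct, stats

# Hypothetical test of 10 100-byte probes, 2 of which were lost.
loss, (lo, mean, hi) = summarize_pings(
    [31.0, None, 30.5, 32.1, None, 30.9, 31.4, 30.7, 33.0, 31.2])
```

The planned "timeping" daemon would additionally record the median,
which could be added to the tuple above in the same way.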
For example, an audio application using forward error correction
might be able to tolerate isolated packet losses provided that 5
successful packet transmissions occur between successive losses.
Based on this observation, the notion of a loss constraint was
introduced as the minimum number of successful packet transmissions
between lost packets. Thus, if a given test shows that there are
always at least 5 successful transmissions between packet losses,
then the stream is said to succeed with a loss constraint of 5.

Second, it was noted that the loss period, defined as the time
duration of a burst of losses, might be important for real-time
applications. Bursts of losses longer than a critical threshold
might have an especially severe negative impact on such
applications.

This talk was the first attempt within the IPPM effort to treat the
time-series aspects of packet loss. It triggered many questions.
Vern Paxson noted that the significance of the loss period would
depend on the rate at which the application was attempting to send
packets. Christian Huitema noted that the work was very specific to
a particular kind of real-time application and may not be suitable
to standardize as an IETF effort, and asked how it could be
generalized to provide information of more general usefulness. Vern
Paxson asked whether anything is known about which burst lengths are
problematic for specific applications. The presenters answered that
they did not know for audio, but that the loss of 2 frames in a row
was perceptible in video.

7. Report from ANSI T1A1.3 liaison: Vern Paxson

[slides included in proceedings]

Vern Paxson reported on his work as liaison to the ANSI T1A1.3
effort. T1A1.3 is an ANSI working group on "Performance of Digital
Networks and Services". It has recently initiated work on an
"Internet Service Performance Specification", and will likely
forward the results eventually to the ITU.
This work is rooted in ANSI experience going back to efforts to
specify the quality of X.25 services (X.134 - X.139), and has
considerable overlap with the IETF IPPM work. Based on their
experience and traditions, the ANSI effort uses terminology
different from that used by IPPM. As one example, they allow
themselves to define (theoretically) "observable events" that, while
well defined, have no practical measurement methodology. Also unlike
the IPPM work, they may well specify specific values as criteria for
rating networks as acceptable. They will likely emphasize passive
methodologies and will likely include link-layer (in addition to
network-layer) notions.

Refer to:
  www.t1.org/t1a1/t1a1.htm
  www.t1.org/t1a1/_a13-hom.htm
for more information.
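As a closing illustration of the loss-constraint and loss-period
statistics proposed in Section 6.c, the two can be computed from a
per-packet arrival trace as sketched below. This is a Python sketch
under one reading of the presentation: the function names are
invented here, and the exact definitions in any eventual draft may
differ (e.g., in how back-to-back losses or trace boundaries are
treated).

```python
def loss_constraint(received):
    """Minimum run of successful transmissions between consecutive
    losses, given a per-packet list of booleans (True = arrived).
    Returns None if the trace contains fewer than two losses."""
    runs, cur, seen_loss = [], 0, False
    for ok in received:
        if ok:
            cur += 1
        else:
            if seen_loss:
                runs.append(cur)  # run separating two losses
            seen_loss, cur = True, 0
    return min(runs) if runs else None

def loss_periods(received, interval):
    """Durations of loss bursts, assuming `interval` seconds per
    packet slot; long entries here are the bursts that Section 6.c
    suggests matter most to real-time applications."""
    periods, run = [], 0
    for ok in received:
        if ok:
            if run:
                periods.append(run * interval)
            run = 0
        else:
            run += 1
    if run:
        periods.append(run * interval)
    return periods
```

For the audio example above, a trace whose losses are always
separated by at least 5 arrivals succeeds with a loss constraint of
5, while `loss_periods` would flag any burst exceeding the
application's critical threshold.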