# secdir review of draft-ietf-ippm-ioam-data-integrity-06
CC kaduk

I have reviewed this document as part of the security directorate's
ongoing effort to review all IETF documents being processed by the
IESG. These comments were written primarily for the benefit of the
security area directors. Document editors and WG chairs should treat
these comments just like any other last call comments.

The summary of the review is that this is still early-stage work and we haven't landed on the right cryptographic mechanisms yet -- what's currently in place has some issues that would be very significant for some deployment sites/scenarios, and some aspects are not yet fully specified.
A lot of my comments cover topics that will merit mention in the security considerations section, though I did not attempt to call each one out as such or produce an exhaustive list at the end.

## Discuss

### Signature vs MAC

I'm pretty uncomfortable about using the word "signature" to describe the protection that we're applying here.  The current text in §5 reads as if the need for a symmetric-key mechanism is an intrinsic requirement of the mechanism (which seems reasonable given the use-case -- true asymmetric signatures would be too computationally expensive to be suitable here), but in that case it seems that we are really expecting a MAC (message authentication code) rather than a signature (that provides both data authentication and source authentication).  In particular, we cite NIST.800-38D for the AES-256-GCM "signature algorithm", when I think we are really using GMAC -- the NIST document talks about GMAC a lot and does not use the word "signature" anywhere, which seems particularly illustrative to me.

### Pre-hash vs. direct signature

The mechanism specified in §5 uses a sign(hash(...)) construction, rather than signing directly.  This is sometimes needed for asymmetric signature primitives like RSA, but typically a MAC construction will be able to authenticate large quantities of data directly, without the need to pre-digest the input data.
It is worth noting that the current construction does not actually provide the MAC's authentication guarantee to the underlying packet fields, just to the hash value.
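To make that dependency concrete, here is a minimal Python sketch. It assumes HMAC-SHA-256 as a stand-in MAC (the Python standard library has no GMAC) and a deliberately weakened, hypothetical `weak_hash` helper (SHA-256 truncated to 2 octets) so that a collision can be found by brute force; SHA-256 itself has no known collisions. Under the pre-hash construction, two different field contents whose hashes collide get the same tag, while MACing the fields directly tells them apart.

```python
import hashlib
import hmac


def weak_hash(data: bytes) -> bytes:
    """Deliberately weakened stand-in hash: SHA-256 truncated to 2 octets."""
    return hashlib.sha256(data).digest()[:2]


def mac(key: bytes, data: bytes) -> bytes:
    """HMAC-SHA-256, standing in for GMAC (not in the Python standard library)."""
    return hmac.new(key, data, hashlib.sha256).digest()


def find_collision() -> tuple:
    """Brute-force two distinct inputs with the same weak_hash value.

    The pigeonhole principle bounds the search at 2**16 + 1 candidates.
    """
    seen = {}
    counter = 0
    while True:
        candidate = b"ioam-fields-%d" % counter
        digest = weak_hash(candidate)
        if digest in seen:
            return seen[digest], candidate
        seen[digest] = candidate
        counter += 1


key = bytes(32)  # fixed all-zero key, for illustration only
original, forged = find_collision()

# Pre-hash construction: the tag only covers the (colliding) hash value.
prehash_accepts_forgery = hmac.compare_digest(
    mac(key, weak_hash(original)), mac(key, weak_hash(forged))
)
# Direct construction: the tag covers the field contents themselves.
direct_accepts_forgery = hmac.compare_digest(mac(key, original), mac(key, forged))

print(prehash_accepts_forgery, direct_accepts_forgery)  # True False
```

With a full-length hash the forgery is computationally out of reach, but the field contents are still only protected to the extent that the hash is collision resistant, which is the point above.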

### Key management

While the specifics of key distribution and management will inherently be out of scope for this specification, I think we do need to settle a core question: will each IOAM Node have its own unique key for signature generation, or do we expect some level of key reuse across machines within a given domain?  The latter scenario intuitively seems like it would make for a simpler deployment, but would place some quite stringent limitations on what cryptographic mechanisms we can use at runtime.  NIST's key usage limitations for GCM, in particular, might actually prove prohibitive for the "one key shared across nodes" option in practice.
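Some back-of-the-envelope arithmetic shows why the shared-key option is the harder one. The Python sketch below assumes the 2**32-invocation limit from SP800-38D §8.3 applies (random or non-96-bit IVs) and that every node holding the key performs one GMAC invocation per protected packet; the per-node packet rate and node count are hypothetical, chosen only for illustration.

```python
# Invocation limit from NIST SP800-38D section 8.3 (random or non-96-bit IVs).
GCM_INVOCATION_LIMIT = 2 ** 32


def seconds_until_limit(packets_per_second_per_node: float,
                        nodes_sharing_key: int = 1,
                        limit: int = GCM_INVOCATION_LIMIT) -> float:
    """Time until a key reaches its invocation limit, assuming one GMAC
    invocation per protected packet on every node that holds the key."""
    return limit / (packets_per_second_per_node * nodes_sharing_key)


# Hypothetical rate: 1 million protected packets per second per node.
per_node_key = seconds_until_limit(1_000_000)                       # about 4295 s (~72 minutes)
shared_key = seconds_until_limit(1_000_000, nodes_sharing_key=100)  # about 43 s
print(f"per-node key: {per_node_key / 60:.1f} min, shared key: {shared_key:.1f} s")
```

Sharing the key across N nodes divides the key's usable lifetime by N, so any rotation mechanism would have to keep up with the aggregate rate of the whole domain rather than that of a single node.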

### Nonce guidance

I think this document is incomplete without some discussion of the properties that the nonce can provide, especially the factors that go into selection of the nonce length.  I can accept that the actual "common methodology to keep the Nonce valid only for a specific period of time" would be outside of the scope of this document (though we could certainly provide one or more examples along with their caveats and areas of applicability if we wanted to), but there is a lot that we can and should talk about.  For example, consumers will need to know if the nonce must be unpredictable as well as unique, and whether there are factors going into the selection of nonce length other than the number of available nonce values without reuse (and the likelihood of collision if a random nonce-selection procedure is used).  For GMAC in particular, there is a pretty strong argument in favor of using 96-bit nonces (since all other nonce lengths internally get hashed down to 96 bits), even though the statistics for 96-bit nonces can hinder certain use cases.  We could probably also talk about the benefits and risks of using sequential values as the nonce (bearing in mind the discussions in RFC 9416) even if there is not a general recommendation one way or the other on their use.
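As one input to that discussion, the standard birthday approximation gives the probability that randomly chosen n-bit nonces repeat at least once among q messages under one key: `p ≈ 1 - exp(-q(q-1) / 2^(n+1))`. The short Python sketch below (an illustration, not text proposed for the draft) shows how sharply this depends on the nonce length: at 2**32 messages, 96-bit random nonces repeat with probability of about 2**-33, while 64-bit random nonces repeat with probability of roughly 0.39.

```python
import math


def random_nonce_collision_probability(nonce_bits: int, messages: int) -> float:
    """Birthday approximation: chance that at least two of `messages`
    uniformly random `nonce_bits`-bit nonces are equal."""
    exponent = messages * (messages - 1) / 2 ** (nonce_bits + 1)
    # -expm1(-x) == 1 - exp(-x), but stays accurate when x is tiny.
    return -math.expm1(-exponent)


for bits in (64, 96):
    p = random_nonce_collision_probability(bits, 2 ** 32)
    print(f"{bits}-bit random nonces, 2**32 messages: p ~ {p:.3g}")
```

Sequential (counter-based) nonces avoid this collision term entirely, at the cost of per-node state and the predictability concerns discussed in RFC 9416.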

### Signature as nonce for transit nodes

The specification of the actual cryptographic protection algorithm in §5 includes a provision (step 2) for a transit node to compute a new signature that accounts for the additions it has made to the IOAM data.  It seems to be saying that the corresponding cryptographic computation involves using the received signature value as the nonce input for producing this new signature. While SP800-38D does seem to permit using multiple IV lengths with a given key, using the received signature as a nonce seems to set us up for using nonces of length other than 96 bits, which (per {{GCM-Key-usage-limitations}}) strongly limits how much traffic we can send under a given key.  It also would entail using data received from the network as the nonce for a given node's secret key, which seems like it would make it easy for an attacker to induce (key,nonce) reuse.  (This might be mitigated if the transit nodes diligently validate the received signature prior to using it as a nonce, but even that would not obviate the need for a fully reliable replay detection mechanism for the lifetime of the key, which seems prohibitively expensive.)
This scheme also seems to make validation quite complicated and expensive, since in order to verify the final received signature we'd need to reconstruct the whole chain of signatures from initial encapsulation through all transit nodes.  While being forced to validate all the intermediate signatures does provide a fairly strong indication of non-tampering along the path, it's also a lot of recomputation.
A scheme where the transit nodes that regenerate the signature also generate a new nonce that goes in the packet would be simpler/faster, at the cost of trusting all the transit nodes to properly validate the received signature and to be operating correctly.  This is a trade-off, and could be decided either way, but if we opt to go for the stronger validation we should specifically say that we made the choice to do that and accept the costlier validation and replay protection.
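To illustrate the difference in validation cost, here is a minimal Python sketch contrasting the two options. It is not the draft's algorithm: HMAC-SHA-256 stands in for GMAC, and the chaining input (the initial nonce or the previous hop's tag) is fed in as a length-prefixed MAC input, whereas GMAC would take it as the IV, so the sketch captures only the data dependency, not GCM's nonce-reuse behavior. All function names are hypothetical. With chaining, the validator recomputes one MAC per regenerating node and needs all of their keys; with a fresh nonce carried by the last regenerating node, it computes a single MAC but relies on that node having validated the earlier tags.

```python
import hashlib
import hmac


def encode(parts) -> bytes:
    """Length-prefix each part so that field boundaries stay unambiguous."""
    return b"".join(len(p).to_bytes(2, "big") + p for p in parts)


def mac(key: bytes, chaining_input: bytes, data: bytes) -> bytes:
    """HMAC-SHA-256 stand-in for GMAC; chaining_input plays the nonce's role."""
    return hmac.new(key, encode([chaining_input, data]), hashlib.sha256).digest()


def chained_tag(keys, per_node_data, initial_nonce: bytes) -> bytes:
    """Each node MACs its own additions, using the tag it received as the nonce."""
    tag = initial_nonce
    for key, data in zip(keys, per_node_data):
        tag = mac(key, tag, data)
    return tag


def verify_chained(keys, per_node_data, initial_nonce: bytes, received_tag: bytes) -> bool:
    # The validator must replay every hop (one MAC per regenerating node),
    # so it needs every such node's key and data.
    expected = chained_tag(keys, per_node_data, initial_nonce)
    return hmac.compare_digest(expected, received_tag)


def verify_fresh_nonce(last_key: bytes, per_node_data, fresh_nonce: bytes,
                       received_tag: bytes) -> bool:
    # Alternative: the last regenerating node MACs all accumulated data under
    # a fresh nonce it carries in the packet; validation is one MAC computation.
    expected = mac(last_key, fresh_nonce, encode(per_node_data))
    return hmac.compare_digest(expected, received_tag)
```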

### Bit-level hash inputs (Protected flags for trace and DEX option-types)

I think we need to provide more clear guidance on how to handle the protected flags for the trace and DEX option-types.  First, a note in passing that interoperability does require locking in which flag bits are (not) covered by the MAC at the time the protected option type is specified (i.e., now), which blocks off future extensions since any new flag bits cannot be integrity protected using this mechanism.  That said, it would be possible to divide the unallocated flags range into a protected and unprotected portion, to leave a little bit of flexibility for the future.
My main concern here, though, is to concretely specify, at the bit level, what the input to the MAC (or hash) function is -- do we mask out the unprotected bits (if so, to 0s or 1s?), or do we literally just extract the two bits in question and make the bitstream input to the MAC (or hash) be not byte aligned?  I strongly suggest the former, since implementation handling for non-octet inputs to hash functions is very poor, but reading the current text I would conclude that I must attempt to implement the latter.  (I note on re-read that SP800-38D does require the inputs' bit lengths to be multiples of 8, so if we decide to use GMAC directly rather than hash-and-MAC, we would need to specify padding if we opt for the latter approach.)
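For concreteness, the masking option might look like the minimal Python sketch below: every unprotected bit is forced to 0 so that the MAC input stays a whole number of octets. The two-octet span and the `PROTECT_MASK` value are hypothetical placeholders, not the actual trace or DEX header layout; the real bit positions would have to come from the draft.

```python
# Hypothetical example: a 2-octet header span in which only two flag bits are
# protected. A 1 bit in the mask means "covered by the MAC"; a 0 bit is forced to 0.
PROTECT_MASK = bytes([0b0000_0011, 0b0000_0000])


def masked_mac_input(span: bytes, protect_mask: bytes = PROTECT_MASK) -> bytes:
    """Return the octet-aligned MAC input: unprotected bits are zeroed, not dropped."""
    if len(span) != len(protect_mask):
        raise ValueError("span and mask must have the same length")
    return bytes(b & m for b, m in zip(span, protect_mask))


received = bytes([0b1010_1110, 0b0101_0101])
# Flipping an unprotected bit leaves the MAC input unchanged ...
assert masked_mac_input(received) == masked_mac_input(bytes([0b0010_1110, 0b0101_0101]))
# ... while flipping a protected bit changes it.
assert masked_mac_input(received) != masked_mac_input(bytes([0b1010_1100, 0b0101_0101]))
```

Masking to 0s (rather than 1s) is an arbitrary choice here; what matters for interoperability is that the specification pins one down.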

### GCM Key usage limitations

Since we're using GMAC as the integrity protection mechanism, we need to look at the GCM key usage limitations to know how many times a given key can be used.  Unfortunately, what SP800-38D says is quite restrictive: its §8.3 lays out scenarios where the total number of invocations of the authenticated encryption function cannot exceed 2**32 for a given key.  Even if we have unique keys per node generating a signature, this limit can still be hit fairly quickly at moderately high traffic rates.  To avoid that limit we'd need to exclusively use 96-bit IVs that use the "deterministic construction", and in that case would be limited by the need to avoid reusing "invocation field" values on a given device.  Depending on what deployment scenarios we are thinking about, there's a significant chance that we'll need to have the protocol be able to accommodate key rotations in order to avoid the key usage limits.  This might take the form of an in-protocol key identifier field, or guidance to use some other protocol element (such as Namespace-ID) to select which key to use.
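A minimal Python sketch of the deterministic construction, to show where the per-device state comes from. The class name is hypothetical, and the 32-bit fixed field / 64-bit invocation field split is just one possible choice; SP800-38D requires that, for a given key, each device (or context) have its own fixed field and that invocation field values never repeat within it. The counter in this sketch lives in memory; a real implementation would need to keep it across restarts, which is exactly the operational burden mentioned above.

```python
class DeterministicIvSource:
    """Hypothetical helper: 96-bit GCM IVs built as fixed field || invocation field,
    following the "deterministic construction" of NIST SP800-38D."""

    IV_BITS = 96

    def __init__(self, fixed_field: int, fixed_bits: int = 32, start: int = 0):
        if fixed_bits % 8 or not 0 < fixed_bits < self.IV_BITS:
            raise ValueError("fixed field must be whole octets and shorter than the IV")
        if not 0 <= fixed_field < 2 ** fixed_bits:
            raise ValueError("fixed field value does not fit in fixed_bits")
        self.fixed_field = fixed_field
        self.invocation_bits = self.IV_BITS - fixed_bits
        # Next invocation field value; a real device must persist this across
        # restarts, or it will eventually reuse an IV under the same key.
        self.counter = start

    def next_iv(self) -> bytes:
        if self.counter >= 2 ** self.invocation_bits:
            # Wrapping around would reuse an IV, so the key must be rotated instead.
            raise RuntimeError("invocation field exhausted; rotate the key")
        iv = (self.fixed_field << self.invocation_bits) | self.counter
        self.counter += 1
        return iv.to_bytes(self.IV_BITS // 8, "big")


# Hypothetical node identifier used as the fixed field.
iv_source = DeterministicIvSource(fixed_field=0x0A0B0C0D)
first_iv = iv_source.next_iv()
```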

### GMAC output length

The specification of integrity protection signature suite 1 in §5 says that we're using AES-256-GCM and that "the signature consumes 32 octets".  I'm having trouble understanding where that number comes from.  While it's true that AES-256-GCM needs a 32-byte key and SHA-256 produces a 32-byte output, the GCM authentication tag length is specified separately, and has to be one of a handful of preordained values (128, 120, 112, 104, or 96 bits per SP800-38D).  I would assume that we want the full 16-byte authentication tag for our purposes, but then what are the other 16 bytes of "signature" supposed to be?
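The relevant sizes can be checked quickly; the Python snippet below just restates them (SP800-38D additionally allows 64- and 32-bit tags for certain applications only, which does not change the conclusion). The 32-octet figure matches the AES-256 key size and the SHA-256 output size, but not any permitted GCM tag length.

```python
import hashlib

AES_256_KEY_OCTETS = 256 // 8
SHA_256_OUTPUT_OCTETS = hashlib.sha256().digest_size
# GCM tag lengths in bits permitted by SP800-38D for general use
# (64 and 32 bits are additionally allowed for certain applications only).
GCM_TAG_BITS = (128, 120, 112, 104, 96)
GCM_TAG_OCTETS = tuple(bits // 8 for bits in GCM_TAG_BITS)

print(AES_256_KEY_OCTETS, SHA_256_OUTPUT_OCTETS, GCM_TAG_OCTETS)
# 32 32 (16, 15, 14, 13, 12)
print(32 in GCM_TAG_OCTETS)  # False: no GCM tag is 32 octets long
```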

## Comments

### Requirement for security

When the abstract says

   IETF protocols require features to ensure their security.

This is true in a certain sense, but the sense may be quite subtle for
some readers.  In particular, we require (of new work) that they have
the capability to be used in a secure fashion, but in many cases we do
not require that the security mechanisms are actually used at runtime.
So, for example, while TLS 1.3 does require that you actually provide
(server) authentication, data confidentiality, and in most cases forward
secrecy, we also see that (to pick an arbitrary example) one can use the
RFC 8300 Network Service Header without using RFC 9145's integrity
protection.

Which is a long-winded way of saying that I'd propose to say "require
features that can provide secure operation" rather than "require
features to ensure their security".

### References for Ping and Traceroute

Do we want to provide references or links for Ping and/or Traceroute
(mentioned in passing in §1)?

### Threat model

§1 lays out the scenario as having an IOAM-Domain that's under a single administrative control but invokes the possibility of data collected in untrusted or semi-trusted environments as a motivation for integrity protection.  Is this just a risk that nodes which are supposed to be under the domain's administrative control get compromised, or are we intending to consider a subtler scenario with semi-trusted entities being authorized parts of the administrative domain directly, or something else?

### detectability problem

In a certain sense a nit, but coupled enough to the core intent of the document that I promote it to a comment.  I think we need to more concretely introduce the "detectability problem", cf. §1¶3:

> The following considerations and requirements are to be taken into account in addition to addressing the problem of detectability of any integrity breach of the IOAM-Data-Fields collected:

The "detectability" problem hasn't been introduced yet, as such.  Maybe we want another paragraph before this one, like

% Since arbitrary nodes and middleboxes are free to tamper with all packet data, including IOAM fields, and the packets are (in general) processed by other intermediary nodes before they might come to a node in a position to verify the packet's contents, there is little value in attempting to use cryptographic mechanisms to prevent such modifications to the packet contents.  Instead, we limit ourselves to the "detectability problem", namely, to allow an endpoint or IOAM control point to detect that such modification has occurred since the generation of the IOAM fields.  (Note that, as an IOAM-layer mechanism, the scope of modifications that can be detected may be limited to just the IOAM fields themselves.)
%
% In addition to this detectability problem, the following considerations are to be taken into account in constructing an IOAM integrity mechanism:

(This also serves to give a bit more motivation for why we don't consider confidentiality protection.  That content might be applicable in §3 as well, but I put it in the proposal here since it appears prior to that section.)

### Separation of layers

While it's generally reasonable (as §3 does) to require the lower layer protocol to handle threats at their own layer, I would probably call out that since IOAM is defined as data fields rather than a dedicated packet structure, we also rely on the lower layer to provide integrity protection for which data fields (that is, IOAM Option-Types) are present in a given packet.

### limited on-path attackers

§3 refers to RFC 9055 for definition of on- and off-path attackers.  QUIC (RFC 9000) considers an additional case of "limited on-path" attackers that are initially off-path but in some cases can change packet routing and become on path for some portions of a flow.  Do you think that considering this level of subtlety is relevant for this document?  On initial read, I'm not really seeing much where considering this distinction would actually change what we say, so perhaps not.

### threat: false error injection

The discussion in §3 around creating a failure report for a nonexistent failure mentions the potential for additional processing/export by IOAM nodes along the path.  Could this be a privacy concern where the additional reported data contains "sensitive" information (for some definition of "sensitive")?  I am not sure if it is worth also mentioning the time of humans who get to look at the false positive reports and analyze them, only to ultimately discard them as bogus.

### threat: removal of fields

We cover modification and injection already, but might have a bit of discussion on what happens when the attacker just removes some or all IOAM fields.

### IOAM-Data-Fields modification

In §3.1 I might expand "false picture of the paths in the network" to cover both the notion of providing false paths in the network (topology-wise) and providing false data about (real or false) paths in the network.

### Option-Type Headers scope

In §3.2 we talk about the implications of changing the header of IOAM Option-Types, but the discussion here is intrinsically limited to the option-types defined at the time of this writing; we should probably acknowledge that limitation explicitly (and possibly just say that the listed implications are intended to be examples rather than exhaustive, as well).
We might also want to say that modifying the headers can have similar effects to modifying the data-fields directly, in terms of making the interpreted data useless.

### Namespace-ID modification

> Another possibility for the attacker is to change the context of IOAM-Data-Fields by modifying the Namespace-ID field in IOAM Option-Type headers, which makes the integrity protection of IOAM-Data-Fields completely useless.

This "completely useless" probably merits further exposition.  (That is to say, I don't think I actually know what you mean by it.)

### Injection defenses

While I agree that the impacts of injection (§§3.3 and 3.4) are similar to modification in general, it does seem that an IOAM deployment would be able to protect itself from injection (but not modification) if it knew a priori what IOAM mechanisms were going to be in use on each flow, so that unexpected ones could be rejected.  That said, this scenario may not be worth mentioning in the document, since an attacker in a position to inject IOAM content into otherwise valid packets would very likely also be able to modify preexisting IOAM content...but an off-path attacker could inject wholly new packets with IOAM content while being unable to modify existing IOAM content.

### Replay scope

§3.5 mentions that an attacker can replay an IOAM Option-Type on a new data packet as a specific example; I'd suggest prefacing that remark with a statement that "In addition to wholesale replay of old packets" to highlight this scenario as a special case of the more generic replay topic.
As far as impact goes, I might also add that replaying old IOAM data might allow an attacker to mask other elements of an attack, such as a change in network path.

### Clarity on Management Attacks

I'm not entirely sure what the scope and intent of §3.6 is.  While the overall statement that management-plane attacks are out of scope for this document is clear (and probably reasonable), I don't have a picture of what attacks are envisioned -- are we looking at changing the data reported by IOAM as it goes from IOAM nodes to a reporting system?  Or are we including management-plane traffic that configures nodes on what IOAM traffic to expect/process, what IOAM-domain and/or namespace to participate in, etc?

A message of "once it leaves the IOAM layer, we can't do more for it" is simple and easy to explain, but some of the other more complicated topics may also be interesting to talk about if we want to consider the security of the overall ecosystem.

### Anti-Delay

While §3.7 does a reasonable job discussing delay, there is a niche case of anti-delay attacks possible as well, where an attacker has access to a faster path and can skew the delay measurements in the "wrong way".  I am not sure if this presents any sufficiently interesting consequences to merit its own mention, though.

### DEX Integrity Protected

§4 lists IOAM Option-Type 68 as allocated to DEX Integrity Protected, but I do not see this allocation reflected at https://www.iana.org/assignments/ioam/ioam.xhtml .

### Order of Headers and Data

I'd suggest being very explicit about the relative ordering of the Option-Type header being integrity protected, the Integrity Protection Header, and the actual IOAM data/data-fields.  Almost everything I see suggests that we insert the Integrity Protection Header between the Option-Type header being protected and its corresponding data, but in §4.4 we say that the optional fields in the DEX Option-Type header are treated as optional IOAM-Data-Fields while appearing before the Integrity Protection Header -- that leaves me unsure where the data fields go for the other Option-Types.  Some statement and/or diagram (perhaps in toplevel §4) would avoid any ambiguity.

### Protected flags for POT option-type

Right now we (implicitly, by not protecting the IOAM-POT-Flags field at all) say that any future POT flags will not be integrity protected.  Is that the right choice?  There are not currently any POT flags defined, so it's a little hard to predict what kinds of use cases they might find.  As for the trace option-types mentioned above, it would be possible to subdivide the flags space into a protected and unprotected range, to leave a little flexibility for the future.

### Protected flags for DEX Option-Type

There are currently no "Flags" defined for DEX, which leaves it in a similar state as POT.  I comment separately on DEX because of the "Extension-Flags" field -- I would expect that all of this field, not just two bits, should be protected.  That's because these bits determine the structure of the data that follows, which is something we have consistently protected throughout the document, and I don't see a reason to diverge from that pattern.

### Mutable fields to skip for signing

The discussion in §5 very quickly glosses over "IOAM-Data-Fields supposed to be modified by other IOAM nodes on the path MUST be excluded from the signature".  This is actually a critical point for constructing and validating the integrity protection, and seems like it would merit a longer treatment.  Most notably, I would probably want to have a central table (or maybe add to the IANA registry?) to indicate "does this field get skipped for integrity protection: Y/N" to try to leave out any guesswork by the implementor as to what is mutable or not.
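As a rough illustration of how such a table could drive implementations, here is a minimal Python sketch. The field names and their mutable/immutable classification are hypothetical placeholders, not a proposal for any particular option-type. Each included field is length-prefixed so that field boundaries stay unambiguous in the MAC input.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldRule:
    name: str
    mutable: bool  # True: modified by other IOAM nodes on the path, so excluded


# Hypothetical table for a hypothetical option-type; the real one would be
# normative text (or an IANA registry column) in the specification.
EXAMPLE_RULES = (
    FieldRule("namespace_id", mutable=False),
    FieldRule("trace_type", mutable=False),
    FieldRule("remaining_len", mutable=True),
    FieldRule("node_data", mutable=False),
)


def mac_input(values: dict, rules=EXAMPLE_RULES) -> bytes:
    """Build the MAC input from the immutable fields, in table order.

    Each included field is prefixed with its 2-octet length, so that two
    different field splits cannot produce the same byte string.
    """
    out = bytearray()
    for rule in rules:
        if rule.mutable:
            continue  # skipped: a transit node may legitimately change it
        value = values[rule.name]
        out += len(value).to_bytes(2, "big") + value
    return bytes(out)
```

With a single table consulted by both the signer and the validator, "what is mutable" stops being a judgment call that each implementation makes on its own.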

### Signature-as-nonce is taking action

I'd clarify in §5 step (3) first bullet point that an intermediate node using a received signature as a nonce counts as "taking action" triggered by a field in the protected header and thus incurs the obligation to validate first.  So this requirement would be in force regardless of whether Loopback or Active are used, IMO.

### Guidance on what fields to integrity protect

In §6.1 I'd want to include alongside the requirement for new integrity protected option types to specify which fields they protect, some guidance on which sorts of things to protect or not protect.  (This would be a place to codify the behavior I noted as being implicitly present in the document, that fields that affect the structure/interpretation of the rest of the packet should be integrity protected.)

## Nits

### proving

¶1 of §1 mentions that IOAM might be used for "proving that a certain traffic flow takes a pre-defined path"; my sense is that some readers will read more stringent requirements into the word "proving" than are met by the current technologies.  I would suggest rephrasing to "verifying that a certain traffic flow takes a pre-defined path" or "assuring that...".

### IOAM-Domain as "set of nodes"

¶2 of §1 leads with a few sentences about IOAM-Domains, one of which is "An IOAM-Domain is a set of nodes that use IOAM."  This is true, but when read without the caveats of the adjacent sentences, gives a misleading sense that it could be a set of unrelated nodes.  Please consider joining this sentence with the following one ("...that use IOAM, bounded by ..."), or adding a qualifier like "related" ("set of related nodes").

### in the clear

§1 ¶2 s/in clear/in the clear/.

### the viability

§1 first numbered point, s/viability/the viability/

### false illusion

Used a couple times in §3, I believe that "false illusion" is redundant and just "illusion" would do fine.

### time synchronization

In §3.7, I suggest s/synchronization/time synchronization/ since that's more common in RFC 7384 and is less ambiguous.

### Delay non-mitigation

Also §3.7, I propose s/It is noted that this threat is not within the scope of the threats that are mitigated in this document/Note that the mechanisms in this document do not attempt to provide any mitigation against this threat/.

### Trivial validation

In §5 step (3), I'd s/trivial/one-step/ -- verifying a single MAC is not exactly trivial, it's just simpler than the iterative scheme that's needed in the general case.

### Transit Node-IDs

Also §5 step (3), I'd expound on "node-ids MUST be included in IOAM Data-Fields" to clarify that we need to be able to identify the nodes that regenerated the signature so that we can look up their keys, and so accordingly we require those node-ids to be present in the packet alongside the signature.

## Notes

This review is formatted in the "IETF Comments" Markdown format, see
https://github.com/mnot/ietf-comments.