$Id: draft-wing-sipping-srtp-key-04-rev.txt,v 1.2 2009/02/15 15:41:29 ekr Exp $

End-to-end VoIP security mechanisms such as DTLS-SRTP represent a
threat to mechanisms in which a network element that is not a party
to the call wishes to monitor or modify the contents of the media
traffic. This document describes a mechanism for one of the parties
to the communication to provide a copy of the keying material to such
a third party subject to some set of authorization controls.

I'm concerned that this document doesn't have a very clear statement
of requirements. Rather, it seems to be attempting to fulfill a number
of distinct use cases which don't have much in common except that they
represent violations of the end-to-end security model of the SIP call.

This document describes two major use cases for this type of
technology:

- Monitoring (call recording)
- Transcoding

I don't think it's particularly useful to conflate these cases, which
are really quite different. Monitoring is fundamentally a passive
process: there is no need for the monitor to be able to modify the
traffic. By contrast, transcoding is an active process: the transcoder
is expected to modify the data. In reality, a transcoded call isn't a
call between two endpoints, but rather two calls, each from one
endpoint to the transcoder. I think it's a mistake to try to do these
with the same mechanism.

Similarly, this document fails to distinguish adequately between
real-time and non-real-time use cases. Many monitoring/call recording
applications are inherently non-real-time: you record the call and,
at some time in the future, the call may or may not be replayed. This
distinction has a number of implications, particularly since capture
of the keying material and capture of the media can be separated. In
particular, it may be desirable (for privacy reasons) to deliver the
keying material long after the call has finished. It's not clear to
me how this is accomplished with this draft.
It's possible it could be initiated by the UA, but I don't see how it
could be initiated by the monitor. Even in a UA-initiated fashion, I
don't see that the information provided by the SDP in S 11 is
sufficient to unambiguously identify the flow, in part due to network
parameter reuse.

While I appreciate it's convenient to reuse the SDP parameters, it's
not clear to me that it's a good idea to hand over the SRTP master
key. If all you need to do is verify the call for quality assurance,
you don't need the integrity check, at least not initially. In fact,
not having access to the integrity key protects against accusations
that the recording device tampered with the media.

Similarly, it's not clear to me that it's desirable to have the same
level of protection for the connection parameters as for the keys.
Wouldn't it be useful for the monitoring application to know what
connections it *potentially* has the keys for but not have direct
access to them until some future time? Again, this seems like
something that would be clearer after an analysis of the privacy
requirements.

Finally, the elephant under the covers here is lawful intercept. The
authors specifically disclaim it, but it's quite clear that this is
usable as an LI system. Indeed, many such systems (e.g., FORTEZZA)
involve cooperation from the endpoint being monitored.

Accordingly, I would recommend that rather than accepting this
mechanism as a WG document, the WG do a thorough requirements analysis
focusing on minimizing the privacy issues inherent in mechanisms of
this type. Once there is consensus on the requirements, then it's
possible to have a discussion of mechanisms.


DETAILED COMMENTS

4.3.
If the requirement for recording is this strong, wouldn't it be better
not to rely on the UA doing the right thing? Rather, enforce it in a
firewall or IDS.

7.2.2.
    The signature of the SAML assertion should be produced using the
    private key of the domain certificate.
    This certificate MUST have a SubjAltName which matches the domain
    of user agent's SIP proxy (that is, if the SIP proxy is
    sip.example.com, the SubjAltName of the domain certificate signing
    this SAML assertion MUST also be example.com). Here, the main
    focus is placed on communication of clients with the ESC, which
    belongs to the client's home domain.

It's not clear to me why this is the correct authorizing certificate.

7.2.3.
I don't really understand the need for the rcrypto thing. Why not just
pretend you have two streams with distinct keys and use crypto= for
both? Actually, I don't really think it makes sense to use SDP here at
all: the semantics of the SDP really aren't the same, since you're not
offering to receive a media stream, you're advertising what you're
going to send. As noted above, I think it would be better to send the
traffic keys separately.

7.2.4.
This whole SAML thing seems pretty underspecified. I don't think using
SIPS here is adequate, since it doesn't provide any guarantee to the
endpoint of the security treatment of the keying material. In fact, as
I noted earlier, I'm not clear that S/MIME is good enough. I think you
may want something multilevel.

9.3.
This Disclosure thing seems a bit confusing. Isn't what you really
need a way to inject the appropriate warnings in the media plane?

-Ekr