Hi all,

I have reviewed this document as part of the security directorate's
ongoing effort to review all IETF documents being processed by the
IESG.  These comments were written primarily for the benefit of the
security area directors.  Document editors and WG chairs should
treat these comments just like any other last call comments.

It is rather surprising to find notable omissions with what is
effectively rfc4627ter, though that seems to be my conclusion after
performing this review.  Luckily, they should be easy to resolve
with just a touch more text in the security considerations, for the
most part.

As an overall summary, this document is a simple, clean, crisp
writeup of the JSON interchange format, which is expected to match
that of the ECMAscript standard.

Since this is a data format and not a protocol, any security issues
with this document are expected to be second-order effects, relating
to how the format is used or could be (mis)used by consuming
protocols.  It is useful to give guidance on issues that may occur
due to known-bad implementations, edge cases that can cause trouble,
etc., and this document does so, noting that the fields in an object
are unordered (but some implementations respect the order in their
APIs), when comparing field names the potential for escaped
characters must be taken into account, many implementations have
limited precision/bounds for numbers, non-number mathematical values
cannot be expressed in JSON, etc.

However, I think that the security considerations section would
benefit from some discussion of potential/example consequences of
failing to heed those issues:

Doing name comparison without respect for excaping could let an
attacker inject unsanitized data into what is supposed to be a
trusted structure, potentially giving privilege escalation,
compromising authentication and authorization, etc.

Considering only the first (or last) of duplicated name/value pairs
can lead to vulnerabilities by bypassing security checks, in mixed
environments.  Similarly, a reliance on fields being returned in
order could cause issues (certainly denial of service, potentially
worse) if an attacker uses a different order than expected.  Perhaps
the start of section 4 could refer forward to discussion in the
security considerations, which give some exposition on the
consequences of breaking the "names within an object SHOULD be
unique" guidance.

Implementations should probably check that their numerical routines
do not attempt to emit "NaN" or "Inf" or simlar, though I do not
have a good picture of how that would affect real systems
(hopefully, just a parse error on the receiver, as those are not
permitted by the grammar).  Similarly, the use of extremely large or
precise numbers in mixed environments can lead to unexpected
results.  I expect that there could be vulnerabilities here, such as
if code can be induced into a loop with incearingly large values,
where a sender ends up producing numbers larger than can be
represented on the receiver, which then saturates its
strtonum-equivalent and hits an unexpected codepath.


I'm also concerned about the freewheeling use of Unicode.  While
this document does discuss the potential encodings and lists UTF-8
as the default (and most interoperable), I think it would benefit
from a stricter warning that parties using JSON for communication
must have some out-of-band way to agree on what encoding is to be
used.  I would expect that this is usually going to be done by the
protocol using JSON, but could see a place for the actual
communicating peers to have out-of-band knowledge.  (An application
having to guess what encoding is being used based on heuristics is a
recipe for disaster.)

Additionally, the document makes no mention of Unicode
normalization, which can be a minefield.  The precis working group
has a lot of work in this area, from which the executive summary is:
it's a lot of work to do things correctly, and being sloppy usually
leads to vulnerabilities.  The most obvious issue would be in (the
comparison of) field names using strings that can be represented
differently in different normalization forms (for example, e with
acute accent), which can be either U+00e9 or U+0064 and the
combining character U+0301.  Simply converting to Unicode code
points is insufficient for an implementation to cause those strings
to compare as equivalent.  I think this document should at least
mention that Unicode normalization forms exist and should be
considered by protocol designers when using JSON with characters
outside of US-ASCII.


Section 9 (parsers) mentions that an implementation "may set limits"
on various parameters.  That seems like a good place to give some
guidance on what limits are in common use and how protocols or
protocol peers might discover and/or negotiate each others'
implementation limits.


With the proposed additions to the security considerations out of
the say, on to some other minutiae.

Section 3 describes that "[a] JSON value MUST be an object, array,
number, or string, or one of the following three literal names:
false null true"; in the introduction, this (equivalent) list is
given as "object, array, number, string, boolean, or null", in the
guise of "four primitive types (strings, numbers, booleans, and
null) and two structured types (objects and arrays)".  There may be
some value in using consistent terminology between the two
locations, but it would be pretty minor value and might not be worth
the effort.

The format for numbers (Section 6) permits an explicit (redundant)
plus sign in the exponent part, but not in front of the overall
number.  This feels slightly strange to me, but I don't know if
there are any implementations that get this wrong.  (I also don't
really see security consequences of getting it wrong.)  Similarly,
the fractional part must contain a digit (that is, a number ending
in a decimal point is forbidden), which is probably harmless.  I'll also
note that the number 3.141592653589793238462643383279 is used as an
example of a precision unrepresentable in 64-bit IEEE 754
floating-point, but the next digit in the expansion of pi is a '5',
which would round up to ...80.  Maybe another digit should be added
to keep overly pedantic readers from worrying about rounding modes :)

Should section 8.3 (string comparison) use any RFC 2119 language
with respect to treating strings as equal that use escapes (or even
normalization forms, as covered above)?

The IANA considerations (section 11) might be a little more explicit
that the IANA "Media Types" registry is what is being modified.
I'm also rather curious about the claim that no "charset" parameter
is needed as it "really has no effect on compliant recipients".  Why
is this not a good way to communicate whether UTF-8, UTF-16, or
UTF-32 is in use for a given text?

It might also be useful to have an example of an array object that
contains elements of different type, to reinforce that this is
permitted.

With respect to section 1.3, I personally am more used to seeing a
"changes since <previous document>" section that lays out what
changed inline, as opposed to with (opaque) references to specific
errata.  But that is rather an editorial issue so my personal
preference should not bind you.

-Ben