Post by Doug Ewell
It seems Leif might be trying to tag the incomplete or erroneous
behavior of individual applications, even if they don't correspond to
documented behavior, or to tag mis-documented behavior that may not
actually be implemented (like "unicode" meaning "BMP only").
* BMP: The only reason the registrations say 'BMP' is that the written
spec says so, and that the registration template asked for such data.
* Products: References to products are made in order to document that
the 'unicode'/'unicodeFFFE' specs actually are implemented. In that
regard, the possible 'BMP' incorrectness seems far less important, in
terms of practical, 'real' problems, than the endianness issues.
* Actually implemented: That 'unicode' and 'utf-16' (in the Microsoft
spec) are names for little-endian UTF-16, while 'unicodeFFFE' is the name
for big-endian UTF-16, is a fact. To verify, try the following web page
in Chrome, Safari or IE. The clue is that the page is
'utf-16be'-encoded while HTTP says 'utf-16':
http://malform.no/testing/utf/html/16be/http.utf16
For reference, an identical, but little-endian encoded page:
http://malform.no/testing/utf/html/16le/http.utf16
If IE and Safari/Chrome implemented the official UTF-16
specification, the first page would have worked fine, while the second
would not necessarily need to work. Instead, we see the opposite: the
first page fails in the mentioned browsers.
* 'Actually implemented' has reached Web standards: HTML5 specifies:
«The requirement to default UTF-16 to little-endian rather than
big-endian is a willful violation of RFC 2781, motivated by a desire
for compatibility with legacy content. [RFC2781]»
<http://dev.w3.org/html5/spec/parsing.html#character-encodings-0>
Whether it is 'legacy content' (as HTML5 claims), implementation of the
Microsoft spec, or both, that makes HTML5 say this is perhaps an open
question.
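To make the endianness issue concrete, here is a minimal Python sketch of how a decoder might treat bytes labelled plain 'utf-16'. A byte order mark, if present, decides the byte order; otherwise the decoder falls back to a default. RFC 2781 says that default SHOULD be big-endian, whereas the Microsoft/HTML5 behaviour described above defaults to little-endian. The function name decode_utf16_label is hypothetical and not taken from any browser, and the example assumes a BOM-less page like the 16be test page.

```python
import codecs


def decode_utf16_label(data: bytes, default_little_endian: bool) -> str:
    """Hypothetical decoder for content labelled plain 'utf-16'.

    A BOM, if present, determines the byte order. Without a BOM,
    RFC 2781 says to assume big-endian; the Microsoft/HTML5 behaviour
    assumes little-endian instead.
    """
    if data.startswith(codecs.BOM_UTF16_BE):  # bytes FE FF
        return data[2:].decode("utf-16-be")
    if data.startswith(codecs.BOM_UTF16_LE):  # bytes FF FE
        return data[2:].decode("utf-16-le")
    # No BOM: fall back to the chosen default byte order.
    return data.decode("utf-16-le" if default_little_endian else "utf-16-be")


# A big-endian page without a BOM: b'\x00H\x00i'
page = "Hi".encode("utf-16-be")

# RFC 2781 default (big-endian) recovers the text.
print(decode_utf16_label(page, default_little_endian=False))

# A little-endian default misreads it as two CJK ideographs, U+4800 U+6900.
print(ascii(decode_utf16_label(page, default_little_endian=True)))
```

With a BOM in place both defaults agree, which is why the disagreement only bites on BOM-less content such as the first test page.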
Post by Doug Ewell
I'm not sure that's a goal of registering charsets.
The goal of these registrations is to comply with section 2.5 of BCP 19.
This passage seemed particularly relevant: «the use of a large number of
undocumented and/or unlabeled charsets hampers interoperability even
more.»
<http://tools.ietf.org/html/bcp19#section-2.5>
Post by Doug Ewell
It also seemed to
me—though I assume I'm wrong here—that he was trying to call
particular attention to errors in Microsoft implementations, but I'm
sure Shawn and others can speak to that.
It is not only Microsoft's products: WebKit is backed by Apple and
Google, and then there is HTML5 ...
But given Microsoft's positive attitude towards Unicode, including
UTF-16, it seems reasonable to ask: is it certain that Microsoft, and the
community at large, are aware that they operate with a shadow spec
that contradicts UTF-16, and of the impact of this? Perhaps, with a
little attention to this, they will update or fine