Discussion:
Volunteer needed to serve as IANA charset reviewer
Ned Freed
2006-09-06 20:30:44 UTC
Permalink
> (IETF list removed, since this is about to become specialized)

> --On Wednesday, 06 September, 2006 11:04 -0700 Ted Hardie
> <***@qualcomm.com> wrote:

> > The Applications Area is soliciting volunteers willing to
> > serve as the IANA charset reviewer. This position entails
> > reviewing charset registrations submitted to IANA in
> > accordance with the procedures set out in RFC 2978. It
> > requires the reviewer to monitor discussion on the
> > ietf-charsets mailing list (moderating it, if necessary); it
> > also requires that the reviewer interact with the registrants
> > and IANA on the details of the registration. There is
> > currently a small backlog, and it will be necessary to work to
> > resolve that backlog during the initial period of the
> > appointment.
> >...

> Perhaps the need for a new volunteer in this area is the time to
> ask a broader question:

> At the time 2978 (and its predecessor, 2278) were defined, there
> were a large number of charsets in heavy use and there was some
> general feeling in the implementer community that, despite the
> provisions of RFC 2277, Unicode/ISO 10646 were not quite ready.
> Although we probably still have some distance to go (the issues
> with my net-Unicode draft may be illustrative), I wonder if we
> are reaching the point at which a stronger "use Unicode on the
> wire" recommendation would be in order. The implications of
> such a recommendation would presumably include a 2978bis that
> made the requirements for registration of a new charset _much_
> tougher, e.g., requiring a demonstration that the then-current
> version of Unicode cannot do the relevant job and/or evidence
> that the newly-proposed charset is needed in deployed
> applications.

I agree that we've reached a point where "use UTF-8" is what we need to be
pushing for in new protocol development. (Note that I said UTF-8 and not
Unicode - given the existence of gb18030 [*] I don't regard a recommendation of
"use Unicode" as even close to sufficient. The last thing we want is to see the
development of specialized Unicode CESes for Korean, Japanese, Arabic, Hebrew,
Thai, and who knows what else.) And if the reason for new charset registrations
were a perceived need to have new charsets for use in new protocols, I would be
in total agreement that a change in focus for charset registration is in order.

But that's not why we're seeing new registrations. The new registrations we're
seeing are of legacy charsets used in legacy applications and protocols that
for whatever reason never got registered previously. Given that these things
are in use in various nooks and crannies around the world, it is critically
important that when they are used they are labelled accurately and
consistently.

The plain fact of the matter is that we have done a miserable job of producing
an accurate and useful charset registry, and considerable work needs to be done
both to register various missing charsets and to clean up the existing
registry, which contains many errors. I've seen no interest whatsoever in
registering new charsets for new protocols, so to my mind pushing back on, say,
the recent registration of iso-8859-11, is an overreaction to a non-problem.
[**]

> This question is motivated, not by a strong love for Unicode,
> but by the observation that RFC 2277 requires it and that the
> IETF is shifting toward it in a number of areas. More options
> and possibilities for local codings that are not generally known
> and supported do not help with interoperability; perhaps it is
> time to start pushing back.

Well, I have to say that to the extent we've pushed back on registrations, what
we've ended up with is an ad-hoc mess of unregistered usage. I am therefore
quite skeptical of any belief that pushing back on registrations is a useful
tactic.

> And that, of course, would dramatically change the work of the
> charset reviewer by reducing the volume but increasing the
> amount of evaluation to be done.

Even if we closed the registry completely there would still be a bunch of work
to do in terms of registry cleanup.

Now, having said all this, I'm willing to take on the role of charset reviewer,
but with the understanding that one of the things I will do is conduct a
complete overhaul of the existing registry. [***] Such a substantive change will
of course require some degree of oversight, which in turn means I'd like to see
some commitment from the IESG of support for the effort.

As for qualifications, I did write the charset registration specification, and
I also wrote and continue to maintain a fairly full-featured charset conversion
library. I can provide more detail if anyone cares.

Ned

[*] - For those not fully up to speed on this stuff, gb18030 can be seen as an
encoding of Unicode that is backwards compatible with the previous simplified
Chinese charsets gb2312 and gbk.
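
As a rough illustration of this (a Python sketch using the interpreter's
built-in codecs; the sample strings are made up):

    # gb18030 can represent any Unicode text, and decodes gb2312 bytes unchanged.
    text = "\u4e2d\u6587 \U0001F600"                  # two CJK characters plus an emoji
    assert text.encode("gb18030").decode("gb18030") == text

    legacy = "\u4e2d\u6587".encode("gb2312")          # bytes from the older charset
    assert legacy.decode("gb18030") == "\u4e2d\u6587" # gb18030 reads them as-is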

[**] - The less recent attempt to register ISO-2022-JP-2004 is a more
interesting case. I believe this one needed to be pushed on, but not
because of potential use in new applications or protocols.

[***] - I have the advantage of being close enough to IANA that I can drive
over there and have F2F meetings should the need arise - and I suspect
it will.
Keith Moore
2006-09-06 21:45:58 UTC
Permalink
I concur with the need to maintain the current charset registry to
support legacy apps that use it.

And I think Ned would be an excellent choice for reviewer, though it
wouldn't bother me if he could have the assistance of people with
specialized expertise in Asian writing schemes.

As for utf-8 vs. Unicode, this is a bit tricky. I agree that merely
specifying Unicode isn't sufficient given the potential for
incompatible CESs. And yet I'm sympathetic to the notion that UTF-8
pessimizes storage and transmission of text written in certain
languages. IMHO it's unreasonable to exclude the potential for a
Unicode based CES that has more-or-less equivalent information
density across a wide variety of languages. But I do think that use of
multiple CESs in a new protocol should require substantial
justification, and that UTF-8 should be presumed to be the CES of
choice for any new protocol that requires ASCII compatibility for its
character representation.

Keith
Ned Freed
2006-09-06 22:58:19 UTC
Permalink
> I concur with the need to maintain the current charset registry to
> support legacy apps that use it.

> And I think Ned would be an excellent choice for reviewer, though it
> wouldn' t bother me if he could have the assistance of people with
> specialized expertise in Asian writing schemes.

Any such assistance would be hugely welcome. As an aside, it would also be nice
if more people would post comments to the list...

> As for utf-8 vs. Unicode, this is a bit tricky. I agree that merely
> specifying Unicode isn't sufficient given the potential for
> incompatible CESs. And yet I'm sympathetic to the notion that UTF-8
> pessimizes storage and transmission of text written in certain
> languages. IMHO it's unreasonable to exclude the potential for a
> Unicode based CES that has more-or-less equivalent information
> density across a wide variety of languages. But I do think that use of
> multiple CESs in a new protocol should require substantial
> justification, and that UTF-8 should be presumed to be the CES of
> choice for any new protocol that requires ASCII compatibility for its
> character representation.

This is pretty much where I'm at as well. I have no problem with UTF-16 or
UTF-32 if there is a compelling reason to allow them, but I really want to at
least try and close the door to additional CESes to the greatest extent
possible. Of course this is really an issue for the IAB and not the charset
reviewer - thank goodness.

Ned
Bruce Lilly
2006-09-07 10:33:48 UTC
Permalink
On Wed September 6 2006 18:58, Ned Freed wrote:
> > I concur with the need to maintain the current charset registry to
> > support legacy apps that use it.
>
> > And I think Ned would be an excellent choice for reviewer, though it
> > wouldn' t bother me if he could have the assistance of people with
> > specialized expertise in Asian writing schemes.
>
> Any such assistance would be hugely welcome. As an aside, it would also be nice
> if more people would post comments to the list...

OK. I concur with most of what has already been said by others, specifically
that if a charset (i.e. something meeting the definition of charset) is in
use, it ought to be registered; using the registry as a way to force some
agenda is a very bad idea. Also that Ned would be an excellent choice for
reviewer, and I would add that I fully support his stated plan to overhaul
the existing registry, which has long been in need of such an overhaul (e.g.
the registration procedure has long said that "ASCII" is disallowed, yet it
is in fact registered as an alias).

A few differences of opinion:
Keith Moore wrote:
> > But I do think that use of
> > multiple CESs in a new protocol should require substantial
> > justification, and that UTF-8 should be presumed to be the CES of
> > choice for any new protocol that requires ASCII compatibility for its
> > character representation.

There may well be application areas for new protocols which cannot fully
support Unicode, which underlies utf-8, due to character set size, the huge
tables needed for normalization, etc. (see sections 3.1 (paying particular
attention to "memory-starved microprocessors") and 3.4 of RFC 1958). Not all
protocols need to fully support utf-8 directly; the highly successful mail
system, for example, supports only a subset of ANSI X3.4 in message header
fields, yet it allows pass-through of utf-8 and other charsets via RFC 2047
mechanisms as amended by RFC 2231 and errata.
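
As a rough sketch of that pass-through, using the Python standard library's
RFC 2047 support (the sample subject text is made up):

    from email.header import Header, decode_header

    subject = Header("Gr\u00fc\u00dfe aus K\u00f6ln", charset="utf-8")
    wire = subject.encode()                  # an ASCII-only RFC 2047 encoded-word
    assert all(ord(c) < 128 for c in wire)   # safe in a message header field

    raw, cs = decode_header(wire)[0]
    assert raw.decode(cs) == "Gr\u00fc\u00dfe aus K\u00f6ln"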

Ted Hardie wrote:
> > This question is motivated, not by a strong love for Unicode,
> > but by the observation that RFC 2277 requires it and that the
> > IETF is shifting toward it in a number of areas.

To be precise, RFC 2277 says:
" Protocols MUST be able to use the UTF-8 charset, which consists of
the ISO 10646 coded character set combined with the UTF-8 character
encoding scheme, as defined in [10646] Annex R (published in
Amendment 2), for all text.

Protocols MAY specify, in addition, how to use other charsets or
other character encoding schemes for ISO 10646, such as UTF-16, but
lack of an ability to use UTF-8 is a violation of this policy; such a
violation would need a variance procedure ([BCP9] section 9) with
clear and solid justification in the protocol specification document
before being entered into or advanced upon the standards track.

For existing protocols or protocols that move data from existing
datastores, support of other charsets, or even using a default other
than UTF-8, may be a requirement. This is acceptable, but UTF-8
support MUST be possible.

When using other charsets than UTF-8, these MUST be registered in the
IANA charset registry, if necessary by registering them when the
protocol is published.
"
Several points:
1. "MUST be able to use" is a bit different from "requires" (see the above
example of the mail system, which is able to use utf-8 by the mechanisms
noted, but which does not require and in fact cannot directly accommodate
raw utf-8).
2. The explicitly stated policy of allowing alternative charsets is important.
3. Most important, note that 2277 explicitly requires registration.

> I have no problem with UTF-16 or
> UTF-32 if there is a compelling reason to allow them,

Well, neither of those (nor their "BE" and "LE" variants) is suitable for use
with MIME text types, which precludes their use in a number of important
applications. And one thing the charset registry sorely needs is a more
explicit indication of which charsets are or are not suitable for such use
(heck, some registrations have lacked the required statement of
[un]suitability, so even groping through all of the registrations is of
no use; and don't get me started on RFC 1345 issues).
Keld Jørn Simonsen
2006-09-07 15:56:14 UTC
Permalink
On Thu, Sep 07, 2006 at 06:33:48AM -0400, Bruce Lilly wrote:
> On Wed September 6 2006 18:58, Ned Freed wrote:
> > > I concur with the need to maintain the current charset registry to
> > > support legacy apps that use it.
> >
> > > And I think Ned would be an excellent choice for reviewer, though it
> > > wouldn' t bother me if he could have the assistance of people with
> > > specialized expertise in Asian writing schemes.
> >
> > Any such assistance would be hugely welcome. As an aside, it would also be nice
> > if more people would post comments to the list...
>
> OK. I concur with most of what has already been said by others, specifically
> that if a charset (i.e. something meeting the definition of charset) is in
> use, it ought to be registered; using the registry as a way to force some
> agenda is a very bad idea. Also that Ned would be an excellent choice for
> reviewer, and I would add that I fully support his stated plan to overhaul
> the existing registry, which has long been in need of such an overhaul (e.g.
> the registration procedure has long said that "ASCII" is disallowed, yet it
> is in fact registered as an alias).

There seems to be a problem here, but maybe it is the procedures that should
be revised, as ASCII is a well-known name for a specific character set.

I can agree that ASCII should not be the recommended name for that specific
charset.

Best regards
keld
Bruce Lilly
2006-09-07 21:17:18 UTC
Permalink
[cc's trimmed]
On Thu September 7 2006 11:56, Keld Jørn Simonsen wrote:
> On Thu, Sep 07, 2006 at 06:33:48AM -0400, Bruce Lilly wrote:
> > the registration procedure has long said that "ASCII" is disallowed, yet it
> > is in fact registered as an alias).
>
> There seems to be a problem here, but maybe it whould then be the
> procedures that be revised, as ASCII is a well known name for a specific
> character set.

Quoting RFC 2046:
" The character set name "ASCII" is reserved and must not
be used for any purpose.
"

The same text appeared in RFC 1521 and in RFC 1341, dated 1992.

[corrigenda] The current registration procedures per se do not
forbid "ASCII", but MIME initially established charset registrations,
and the "for any purpose" certainly seems clear. Quoting RFC 1341:
" Several other MIME fields, notably
including character set names, are likely to have new values
defined over time. In order to ensure that the set of such
values is developed in an orderly, well-specified, and
public manner, MIME defines a registration process which
uses the Internet Assigned Numbers Authority (IANA) as a
central registry for such values.
"
And RFC 1341 Appendix F section 2 is, as far as I know, the initial
(abbreviated) character set registration procedure.

So at least as far as MIME is concerned, "ASCII" has always been
forbidden; the default and preferred MIME name for ANSI X3.4 is
"US-ASCII".

One problem is that "ASCII" has been [mis]used for things other than
one specific character set and is therefore not unambiguous.

Also, we should distinguish informal usage from registered names
used in protocols.

As with most IANA registries, it would be quite unwise to remove something
once registered. So I wouldn't want to simply remove "ASCII" leaving no
trace in case there is some archived content which used that alias in spite
of the prohibition against such use. I would support a mechanism to mark
(clearly, and in the registry) a name as deprecated, along with a
"MUST NOT generate" rule applicable to deprecated names.

------------------

Another footnote: By noting that Ned would make a fine charset reviewer,
I am not indicating any fault with Paul Hoffman, who is still listed as
charset reviewer on the IANA site (http://www.iana.org/numbers.html#C)
and who has done a fine job as evidenced by his past participation on this
mailing list.
Keld Jørn Simonsen
2006-09-08 02:41:55 UTC
Permalink
On Thu, Sep 07, 2006 at 05:17:18PM -0400, Bruce Lilly wrote:
> [cc's trimmed]
> On Thu September 7 2006 11:56, Keld Jørn Simonsen wrote:
> > On Thu, Sep 07, 2006 at 06:33:48AM -0400, Bruce Lilly wrote:
> > > the registration procedure has long said that "ASCII" is disallowed, yet it
> > > is in fact registered as an alias).
> >
> > There seems to be a problem here, but maybe it is the procedures that should
> > be revised, as ASCII is a well-known name for a specific character set.
>
> Quoting RFC 2046:
> " The character set name "ASCII" is reserved and must not
> be used for any purpose.
> "

Well, that is fine for me, we can have the name registered but not used
for any purpose in IETF specs. I think this is what we meant by this
statement, when we wrote it.

> So at least as far as MIME is concerned, "ASCII" has always been
> forbidden; the default and preferred MIME name for ANSI X3.4 is
> "US-ASCII".

Agree

> One problem is that "ASCII" has been [mis]used for things other than
> one specific character set and is therefore not unambiguous.

Agree

> Also, we should distinguish informal usage from registered names
> used in protocols.
>
> As with most IANA registries, it would be quite unwise to remove something
> once registered. So I wouldn't want to simply remove "ASCII" leaving no
> trace in case there is some archived content which used that alias in spite
> of the prohibition against such use. I would support a mechanism to mark
> (clearly, and in the registry) a name as deprecated, along with a
> "MUST NOT generate" rule applicable to deprecated names.

Also agree with you here.

Best regards
Keld
Claus Färber
2006-09-16 01:44:22 UTC
Permalink
Bruce Lilly schrieb:
> One problem is that "ASCII" has been [mis]used for things other than
> one specific character set and is therefore not unambiguous.

Maybe it should be possible to register charsets that _are_ ambiguous. Of
course, there should be a warning not to use them if at all possible.
Still, some applications which don't know the charset (converters from
other formats) might make use of them if it's not feasible to detect the
true charset.

"ASCII" could be an alias for "UNKNOWN-7BIT" (or "UNKNOWN-ISO-646",
"UNKNOWN-ASCII"?)

Other possible ambiguous charsets could be:

"UNKNOWN-8BIT" (already used by some mail transport agents when
MIMEifying messages).
"UNKNOWN-EBCDIC"
"UNKNOWN-UTF16" with alias "UNICODE".
"UNKNOWN-ISO-8859" with alias "ANSI".
"UNKNOWN-IBMPC" with alias "OEM".

Claus
Frank Ellermann
2006-09-17 12:50:22 UTC
Permalink
Claus Färber wrote:

> "UNKNOWN-8BIT" (already used by some mail transport agents

First defined in RFC 1428, used in RFC 1700 and RFC 2557, it's
already registered.

> "UNKNOWN-UTF16"

What's the difference from UTF-16 ?

> with alias "UNICODE".

Ugh, thanks, but no thanks.

> "UNKNOWN-ISO-8859" with alias "ANSI".
> "UNKNOWM-IBMPC" with alias "OEM".

One of those could do, "unknown-ascii-8bit", alias "oem".

Frank
Claus Färber
2006-10-01 18:18:25 UTC
Permalink
Frank Ellermann schrieb:
> Claus Färber wrote:
>> "UNKNOWN-8BIT" (already used by some mail transport agents
> First defined in RFC 1428, used in RFC 1700 and RFC 2557, it's
> already registered.

Oops.

>> "UNKNOWN-UTF16"
> What's the difference from UTF-16 ?

UTF-16 "SHOULD be interpreted as being big-endian" if there's no BOM
(RFC 2781, section 4.3). UNKNOWN-UTF16 would not have such a fallback.

>> with alias "UNICODE".
> Ugh, thanks, but no thanks.

The idea is to deprecate the label "UNICODE" by tying it to an
incompletely specified charset.

>> "UNKNOWN-ISO-8859" with alias "ANSI".
>> "UNKNOWN-IBMPC" with alias "OEM".
>
> One of those could do, "unknown-ascii-8bit", alias "oem".

We already have UNKNOWN-8BIT.

When you convert legacy data, you often DO know that something is in a
DOSish (IBMPC-based) or Windowsish (ANSI-based) charset. Having charset
labels to carry this information (instead of the unspecified
UNKNOWN-8BIT) is a good idea.
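
A minimal illustration, assuming Python's built-in code pages and taking
cp437 and cp1252 as typical representatives of the two families:

    sample = b"\x82"
    print(sample.decode("cp437"))    # U+00E9, e with acute, in a common DOS code page
    print(sample.decode("cp1252"))   # U+201A, a low quotation mark, in windows-1252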

Claus
Frank Ellermann
2006-10-02 00:59:31 UTC
Permalink
Martin Duerst
2006-10-02 09:38:25 UTC
Permalink
Ned and I, as the newly appointed charset reviewers, plan to first
address pending registrations and, once they are dealt with, to look
at ways to clean up the registry.

At 03:18 06/10/02, Claus Färber wrote:
>Frank Ellermann schrieb:
>> Claus Färber wrote:
>>> "UNKNOWN-8BIT" (already used by some mail transport agents
>> First defined in RFC 1428, used in RFC 1700 and RFC 2557, it's
>> already registered.
>
>Oops.

From a purely personal viewpoint, this one actually occasionally
came in handy.

>>> "UNKNOWN-UTF16"
>> What's the difference from UTF-16 ?
>
>UTF-16 "SHOULD be interpreted as being big-endian" if there's no BOM, RFC 2781, 4.3. UNKNOWN-UTF16 would not have such a fall back.

Has UNKNOWN-UTF-16 been proposed formally, or is this just an
idea floated in an email? As a reviewer, I'd prefer to deal
with "really existing charsets" first.

>>> with alias "UNICODE".
>> Ugh, thanks, but no thanks.
>
>The idea is to deprecate the label "UNICODE" by tying it to an incompletely specified charset.

Personally, I agree with the idea of deprecating "Unicode".
As a charset reviewer, I think this should be done by just
noting the entry as DEPRECATED or OBSOLETE or some such,
rather than by registering additional aliases.

>>> "UNKNOWN-ISO-8859" with alias "ANSI".
>>> "UNKNOWM-IBMPC" with alias "OEM".
>> One of those could do, "unknown-ascii-8bit", alias "oem".
>
>We already have UNKNOWN-8BIT.
>
>When you convert legacy data, you often DO know that something is in a DOSish (IBMPC-based) or Windowsish (ANSI-based) charset. Having charset labels to carry this information (instead of the unspecified UNKNOWN-8BIT) is a good idea.

To repeat, as a reviewer, I'd prefer to deal with "really existing
charsets" first. We may be able to consider ideas such as these
later, if we look at more and less precise labels for encodings
(e.g. labels to indicate various variants of Shift_JIS).


Regards, Martin.


#-#-# Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-# http://www.sw.it.aoyama.ac.jp mailto:***@it.aoyama.ac.jp
Mark Davis
2006-10-02 15:22:33 UTC
Permalink
I'd suggest taking a look at the ICU charset data. This was gathered by
calling APIs on different platforms, instead of going by the documentation,
which was often false.

http://icu.sourceforge.net/charts/charset/
http://icu.sourceforge.net/charts/charset/roundtripIndex.html

The other thing that needs to be done is to establish criteria for identity. If
two mappings are identical except that one adds an additional mapping from
bytes to Unicode, which gets registered? Both? The subset? The superset?

There are literally hundreds of such cases, so without clarity it doesn't
help to propose registrations.
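
A rough sketch of the question, with made-up mapping tables (byte value to
Unicode code point):

    def classify(a, b):
        # a, b: byte value -> Unicode code point mappings
        if a == b:
            return "identical"
        if any(a[k] != b[k] for k in a.keys() & b.keys()):
            return "conflicting"
        return "same where both are defined; one may be a superset"

    chart  = {0x41: 0x0041}                  # a published chart leaves 0x80 undefined
    vendor = {0x41: 0x0041, 0x80: 0x0080}    # the shipped converter maps it anyway
    print(classify(chart, vendor))           # so: which of the two gets registered?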

Mark

On 10/2/06, Martin Duerst < ***@it.aoyama.ac.jp> wrote:
>
> Ned and me, as newly appointed charset reviewers,
> plan to first address pending registrations, and once
> they are dealt with, looking at ways to clean up the
> registry.
>
> At 03:18 06/10/02, Claus Färber wrote:
> >Frank Ellermann schrieb:
> >> Claus Färber wrote:
> >>> "UNKNOWN-8BIT" (already used by some mail transport agents
> >> First defined in RFC 1428, used in RFC 1700 and RFC 2557, it's
> >> already registered.
> >
> >Oops.
>
> From a purely personal viewpoint, this one actually occasionally
> came in handy.
>
> >>> "UNKNOWN-UTF16"
> >> What's the difference from UTF-16 ?
> >
> >UTF-16 "SHOULD be interpreted as being big-endian" if there's no BOM, RFC
> 2781, 4.3. UNKNOWN-UTF16 would not have such a fall back.
>
> Has UNKNOWN-UTF-16 been proposed formally, or is this just an
> idea floated in an email? As a reviewer, I'd prefer to deal
> with "really existing charsets" first.
>
> >>> with alias "UNICODE".
> >> Ugh, thanks, but no thanks.
> >
> >The idea is to deprecate the label "UNICODE" by tying it to an
> incompletely specified charset.
>
> Personally, I agree with the idea of deprecating "Unicode".
> As a charset reviewer, I think this should be done by just
> noting the entry as DEPRECATED or OBSOLETE or some such,
> rather than by registering additional aliases.
>
> >>> "UNKNOWN-ISO-8859" with alias "ANSI".
> >>> "UNKNOWN-IBMPC" with alias "OEM".
> >> One of those could do, "unknown-ascii-8bit", alias "oem".
> >
> >We already have UNKNOWN-8BIT.
> >
> >When you convert legacy data, you often DO know that something is in a
> DOSish (IBMPC-based) or Windowsish (ANSI-based) charset. Having charset
> labels to carry this information (instead of the unspecified UNKNOWN-8BIT)
> is a good idea.
>
> To repeat, as a reviewer, I'd prefer to deal with "really existing
> charsets" first. We may be able to consider ideas such as these
> later, if we look at more and less precise labels for encodings
> (e.g. labels to indicate various variants of Shift_JIS).
>
>
> Regards, Martin.
>
>
> #-#-# Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
> #-#-# http://www.sw.it.aoyama.ac.jp mailto:***@it.aoyama.ac.jp
>
>
Tim Bray
2006-09-06 23:27:46 UTC
Permalink
On Sep 6, 2006, at 2:45 PM, Keith Moore wrote:

> As for utf-8 vs. Unicode, this is a bit tricky. I agree that merely
> specifying Unicode isn't sufficient given the potential for
> incompatible CESs. And yet I'm sympathetic to the notion that UTF-8
> pessimizes storage and transmission of text written in certain
> languages. IMHO it's unreasonable to exclude the potential for a
> Unicode based CES that has more-or-less equivalent information
> density across a wide variety of languages. But I do think that
> use of
> multiple CESs in a new protocol should require substantial
> justification, and that UTF-8 should be presumed to be the CES of
> choice for any new protocol that requires ASCII compatibility for its
> character representation.

Agreed on all counts. Section 5.1 of RFC3470 (aka BCP70) says smart
things about this, referencing 2277. Basically, if you're going to
use XML, there's probably no point trying to legislate against UTF-16
since any conformant reader is required to accept it, and in practice
all known XML software can handle 8859 and Shift-JIS and EUC. But
if you're not doing XML, compulsory UTF-8 removes a lot of failure
points without costing much.

-Tim
Martin Duerst
2006-09-08 10:02:00 UTC
Permalink
At 06:45 06/09/07, Keith Moore wrote:
>I concur with the need to maintain the current charset registry to
>support legacy apps that use it.

I concur with Keith (and it seems almost everybody else) that we
still need a charset registry.

>And I think Ned would be an excellent choice for reviewer, though it
>wouldn' t bother me if he could have the assistance of people with
>specialized expertise in Asian writing schemes.

He would certainly have my assistance, for whatever it's worth.

>As for utf-8 vs. Unicode, this is a bit tricky. I agree that merely
>specifying Unicode isn't sufficient given the potential for
>incompatible CESs. And yet I'm sympathetic to the notion that UTF-8
>pessimizes storage and transmission of text written in certain
>languages.

True. The most affected languages are not CJK (Chinese, Japanese, Korean),
but those written in scripts that have most of their characters beyond
U+0800 yet don't need two bytes per character in a dedicated encoding,
i.e. all the Indic scripts, and so on. A serious part of the
overhead is often (but not always) compensated by the fact that
protocol or markup information is usually heavily ASCII-biased.
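
A rough way to see both effects, as a Python sketch with a made-up Devanagari
sample wrapped in a little ASCII markup:

    plain  = "\u0928\u092e\u0938\u094d\u0924\u0947"    # Devanagari: 3 bytes/char in UTF-8
    marked = '<p xml:lang="hi">' + plain + "</p>"      # the markup is pure ASCII

    for label, s in (("plain", plain), ("marked up", marked)):
        print(label, len(s.encode("utf-8")), "bytes in UTF-8 vs",
              len(s.encode("utf-16-be")), "in UTF-16-BE")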


Regards, Martin.


#-#-# Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-# http://www.sw.it.aoyama.ac.jp mailto:***@it.aoyama.ac.jp
Jefsey_Morfin
2006-09-08 12:45:00 UTC
Permalink
At 12:02 08/09/2006, Martin Duerst wrote:
>True. The most affected languages are not CJK (Chinese, Japanese,
>Korean), but all the scripts that have most of their characters beyond
>U+0800 but don't need two bytes to encode the particular script,
>i.e. all the Indian Scripts, and so on. A serious part of the
>overhead is often (but not always) compensated by the fact that
>protocol or markup information is usually heavily ascii-biased.

Correct, this is one part of their problem; the other is the
difference between graphemes and characters. However, the first
question is: is the charset registry meant to register existing
charsets, or is the IETF to standardise the new charsets and keyboards
that languages need? Or do you mean you would suggest designing new
charsets somewhere else and having them registered by the IETF with IANA?
jfc
Terje Bless
2006-09-06 22:03:23 UTC
Permalink
[ My apologies for replying to a reply ]

Ned Freed <***@mrochek.com> wrote:

>>I wonder if we are reaching the point at which a stronger "use Unicode on
>>the wire" recommendation would be in order. The implications of such a
>>recommendation would presumably include a 2978bis that made the requirements
>>for registration of a new charset _much_ tougher, e.g., requiring a
>>demonstration that the then-current version of Unicode cannot do the
>>relevant job and/or evidence that the newly-proposed charset is needed in
>>deployed applications.

The time is, IMO, certainly ripe for pushing UTF-8 much harder, but the place
to do so is *not* at IANA — the registry of assigned names and numbers; protocol
values — but rather in the development of new specifications.

Not even the Unicode Consortium envisions a mass conversion of all legacy
content into, say, UTF-8. The IANA registry's documentary function is quite
orthogonal to the desire to avoid defining new charsets or mandating or even
just enabling legacy charsets in new specifications.

If a charset exists it should, modulo other factors, be registered with IANA.

--
Everytime I write a rhyme these people thinks its a crime
I tell `em what's on my mind. I guess I'm a CRIMINAL!
I don't gotta say a word I just flip `em the bird and keep goin,
I don't take shit from no one. I'm a CRIMINAL!
Frank Ellermann
2006-09-06 22:48:50 UTC
Permalink
Ned Freed wrote:

> one of the things I will do is conduct a complete overhaul of
> the existing registry.
[...]
> I also wrote and continue to maintain a fairly full-features
> charset conversion library.

Then there are several sources: The ICU converters, your lib,
the Unicode mappings, and standards in cases like ISO 8859-11.

For the most important charsets these sources hopefully agree,
and maybe it's possible to use CharMapML to list most of them
(not for UTF7, UTF16-LE, UTF32-LE, BOCU-1 and SCSU). If needed
I could create a CharMapML file for UTF-1. Not because anybody
uses it, let alone "over the wire", but because it's a part of
the history. Less than 9,000 lines without cheating.

A complete CharMapML file for UTF-8 takes less than 40 lines;
US-ASCII, Latin-1, Latin-9, UTF16-BE, and UTF32-BE would be
shorter. Does "provide mappings" belong to what you have in
mind for a "registry cleanup" ?

Frank
Keld Jørn Simonsen
2006-09-07 15:59:02 UTC
Permalink
On Thu, Sep 07, 2006 at 12:48:50AM +0200, Frank Ellermann wrote:
> Ned Freed wrote:
>
> > one of the things I will do is conduct a complete overhaul of
> > the existing registry.
> [...]
> > I also wrote and continue to maintain a fairly full-features
> > charset conversion library.
>
> Then there are several sources: The ICU converters, your lib,
> the Unicode mappings, and standards in cases like ISO 8859-11.
>
> For the most important charsets these sources hopefully agree,
> and maybe it's possible to use CharMapML to list most of them
> (not for UTF7, UTF16-LE, UTF32-LE, BOCU-1 and SCSU). If needed
> I could create a CharMapML file for UTF-1. Not because anybody
> uses it, let alone "over the wire", but because it's a part of
> the history. Less than 9,000 lines without cheating.

There is also a fairly complete UNIX utility called recode,
which has extensive mappings.

Best regards
keld
Jefsey_Morfin
2006-09-06 23:02:19 UTC
Permalink
At 22:30 06/09/2006, Ned Freed wrote:
>Now, having said all this, I'm willing to take on the role of
>charset reviewer,
>but with the understanding that one of the things I will do is conduct a
>complete overhaul of the existing registry. [***] Such a substantive
>change will
>of course require some degree of oversight, which in turn means I'd
>like to see
>some commitment from the IESG of support for the effort.

+1

>As for qualifications, I did write the charset registration specification, and
>I also wrote and continue to maintain a fairly full-features charset
>conversion
>library. I can provide more detail if anyone cares.

I care.
thank you.
jfc
McDonald Ira
2006-09-07 18:04:49 UTC
Permalink
Hi,

+1 for Ned as Charset Reviewer!

-1 for the idea of relaxing the IESG policy that UTF-8
specifically MUST be supported by any new IETF protocol
to some statement that UTF-anything MUST be supported.

Besides MIME incompatibility, because of their BOM
dependence (and thus broken string parse/concatenate),
UTF-16-[endian] and UTF-32-[endian] are unsuitable for
wire protocols (the proper sphere of the IETF).

Cheers,
- Ira

PS - When I was writing the IANA Charset MIB (RFC 3808), I
wrote a parser (*) for the plaintext IANA Charset Registry
to generate updates to the MIB. It warns of RFC 2978
problems (e.g., a missing "cs" alias). I'd be happy to
enhance it to do a better check for RFC 2978 compliance.

* ftp://ftp.pwg.org/pub/pwg/pmp/tools/ianachar.c
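
A rough equivalent of that check in a few lines of Python - a sketch, not
Ira's tool, assuming the registry's plain-text "Name:" / "Alias:" layout and
a made-up local filename:

    def names_missing_cs_alias(path="character-sets.txt"):
        entries, name, aliases = [], None, []
        for line in open(path, encoding="ascii", errors="replace"):
            parts = line.split()
            if line.startswith("Name:") and len(parts) > 1:
                if name is not None:
                    entries.append((name, aliases))
                name, aliases = parts[1], []
            elif line.startswith("Alias:") and len(parts) > 1:
                aliases.append(parts[1])
        if name is not None:
            entries.append((name, aliases))
        return [n for n, al in entries
                if not any(a.lower().startswith("cs") for a in al)]

    print(names_missing_cs_alias())       # names with no RFC 2978 "cs" alias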


Ira McDonald (Musician / Software Architect)
Blue Roof Music / High North Inc
PO Box 221 Grand Marais, MI 49839
phone: +1-906-494-2434
email: ***@sharplabs.com


Keith Moore
2006-09-07 18:39:38 UTC
Permalink
> Besides MIME incompatibility, because of their BOM
> dependence (and thus broken string parse/concatenate),
> UTF-16-[endian] and UTF-32-[endian] are unsuitable for
> wire protocols (the proper sphere of the IETF).

agree. but I would not want to preclude future use of a better CES than
utf-8. I just don't think it exists yet.
Kenneth Whistler
2006-09-07 19:15:37 UTC
Permalink
> > Besides MIME incompatibility,

That aside...

> > because of their BOM
> > dependence (and thus broken string parse/concatenate),
> > UTF-16-[endian] and UTF-32-[endian]

This is an incorrect characterization of those CESs.

UTF-16BE, UTF-16LE (and UTF-32BE and UTF-32LE) explicitly
*disallow* BOM, and thus are *not* broken for string parse/concatenate.

The UTF-16 encoding scheme (no "LE", no "BE") is the one that
interprets an initial U+FEFF as a BOM.

> > are unsuitable for
> > wire protocols (the proper sphere of the IETF).

By the way, I'm not arguing that any of those are particularly suitable
for a wire protocol, compared to UTF-8 -- just noting that the
dependence on BOM does *not* occur for any of the explicit BE or LE
CESs.
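
A small demonstration of the distinction, using Python's codecs (the native
"utf-16" codec's byte order is platform-dependent, which doesn't matter here):

    a, b = "foo", "bar"

    # The UTF-16 encoding scheme: each encode() emits a BOM, so naive
    # concatenation leaves a stray U+FEFF in the middle of the text.
    print(repr((a.encode("utf-16") + b.encode("utf-16")).decode("utf-16")))
    # -> 'foo\ufeffbar'

    # The explicit-BE form never emits a BOM, so concatenation is safe.
    print(repr((a.encode("utf-16-be") + b.encode("utf-16-be")).decode("utf-16-be")))
    # -> 'foobar'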

--Ken

>
> agree. but I would not want to preclude future use of a better CES than
> utf-8. I just don't think it exists yet.
Kenneth Whistler
2006-09-07 22:17:20 UTC
Permalink
Forwarding a contribution from Mark Davis.

--Ken

------------- Begin Forwarded Message -------------

From: Mark Davis <***@icu-project.org>
Date: Sep 6, 2006 4:44 PM
Subject: Re: Volunteer needed to serve as IANA charset reviewer
...

If the registry provided an unambiguous, stable definition of each charset
identifier in terms of an explicit, available mapping to Unicode/10646
(whether the UTF-8 form of Unicode or the UTF-32 code points -- that is just
a difference in format, not content), it would indeed be useful. However, I
suspect quite strongly that it is a futile task. There are a number of
problems with the current registry.

1. Poor registrations (minor)
There are some registered charset names that are not syntactically compliant
to the spec.

2. Incomplete (more important)
There are many charsets (such as some windows charsets) that are not in the
registry, but that are in *far* more widespread use than the majority of the
charsets in the registry. Attempted registrations have just been left
hanging, cf. http://mail.apps.ietf.org/ietf/charsets/msg01510.html

3. Ill-defined registrations (crucial)
a) There are registered names that have useless (inaccessible or unstable)
references; there is no practical way to figure out what the charset
definition is.
b) There are other registrations that are defined by reference to an
available chart, but when you actually test what the vendor's APIs map to,
they actually *use* a different definition: for example, the chart may say
that 0x80 is undefined, but actually map it to U+0080.
c) The RFC itself does not settle important issues of identity among
charsets. If a new mapping is added to a charset converter, is that a
different charset (and thus needs a different registration) or not? Does
that go for any superset? etc. We've raised these issues before, but with no
resolution (or even an attempt at one). Cf.
http://dev.icu-project.org/cgi-bin/viewcvs.cgi/*checkout*/icuhtml/design/charset_questions.html

As a product of the above problems, the actual results obtained by using the
IANA charset names on any given platform* may vary wildly. For example,
among the IANA-registry-named charsets, there were over a million different
mapping differences between Sun's and IBM's Java, total.

* "platform" speaking broadly -- the results may vary by OS (Mac vs Windows
vs Linux...), by programming language (Java), or by version of programming
language runtime (IBM vs Sun's Java), or even by product (database version).

In ICU, for example, our requirement was to be able to reproduce the actual,
observable character conversions in effect on any platform. With that
goal, we basically had to give up trying to use the IANA registry at all. We
compose mappings by scraping: calling the APIs on those platforms to do
conversions and collecting the results, and providing a different internal
identifier for any differing mapping. We then have a separate name mapping
that goes from each platform's name (the name according to that platform)
for each charset to the unique identifier. Cf.
http://icu.sourceforge.net/charts/charset/.

And based on work here at Google, it is pretty clear that -- at least in
terms of web pages -- little reliance can be placed on the charset
information. As imprecise as heuristic charset detection is, it is more
accurate than relying on the charset tags in the html meta element (and what
is in the html meta element is more accurate than what is communicated by
the http protocol).

So while I applaud your goal, I would suspect that it would be a huge
amount of effort for very little return.

Mark


------------- End Forwarded Message -------------
Frank Ellermann
2006-09-08 00:18:20 UTC
Permalink
Kenneth Whistler wrote:
> Forwarding a contribution from Mark Davis.

> http://mail.apps.ietf.org/ietf/charsets/msg01510.html

That belongs to what was announced as "considerable backlog":
It's not about the number of pending requests (a dozen or so),
but about their age (minus two they are from last year).

> useless (inaccessable or unstable) references; there is no
> practical way to figure out what the charset definition is.

Yes, that's bad. As a first approximation I'd try something
like http://purl.net/net/cp/874 - but that also changed; some
years ago I could ask for 1004, now I'm supposed to know what
it is. Apparently ICU had its very own "decruft" experience.

> The RFC itself does not settle important issues of identity
> among charsets. If a new mapping is added to a charset
> converter, is that a different charset (and thus needs a
> different registration) or not?

In CharMapML you've allowed version info for minor cases. Some
historic oddities like 0x1A -> 0x7F -> 0x1C -> 0x1A are IMHO
no longer relevant. Others like "modulo 190 Unicode" might be
interesting, if compared with "modulo 243 distances". Just an
example.

> it would be a huge amount of effort for very little return.

Meaning what, delete registry and RFC as hopeless ? Looking
at the list in RFC 3808 we're talking about 250 charsets.

If you want to resolve ambiguities like the "IBM rotation"
(see above), PC graphics for C0, and cases like "in theory
unassigned" vs. "in practice roundtrip" for 1252 etc, what is
it, 1000 charsets, how much do you have in ICU ?

> I'm being blocked

Maybe the list server has only your @jtcsv.com address.

Frank
Tim Bray
2006-09-06 20:04:17 UTC
Permalink
On Sep 6, 2006, at 11:45 AM, John C Klensin wrote:

> I wonder if we
> are reaching the point at which a stronger "use Unicode on the
> wire" recommendation would be in order. The implications of
> such a recommendation would presumably include a 2978bis that
> made the requirements for registration of a new charset _much_
> tougher, e.g., requiring a demonstration that the then-current
> version of Unicode cannot do the relevant job and/or evidence
> that the newly-proposed charset is needed in deployed
> applications.

Yes, please. Unicode is not perfect, but it's actually very good in
quite a few different ways, and network effects have pretty well
taken over, so if you're doing any text on the network at all, it's
almost certainly the right thing to use Unicode.

As a Certified Unicode Bigot(tm) I would volunteer to help with such
a redraft.

-Tim
John C Klensin
2006-09-06 18:45:27 UTC
Permalink
(IETF list removed, since this is about to become specialized)

--On Wednesday, 06 September, 2006 11:04 -0700 Ted Hardie
<***@qualcomm.com> wrote:

> The Applications Area is soliciting volunteers willing to
> serve as the IANA charset reviewer. This position entails
> reviewing charset registrations submitted to IANA in
> accordance with the procedures set out in RFC 2978. It
> requires the reviewer to monitor discussion on the
> ietf-charsets mailing list (moderating it, if necessary); it
> also requires that the reviewer interact with the registrants
> and IANA on the details of the registration. There is
> currently a small backlog, and it will be necessary to work to
> resolve that backlog during the initial period of the
> appointment.
>...

Perhaps the need for a new volunteer in this area is the time to
ask a broader question:

At the time 2978 (and its predecessor, 2278) were defined, there
were a large number of charsets in heavy use and there was some
general feeling in the implementer community that, despite the
provisions of RFC 2277, Unicode/ISO 10646 were not quite ready.
Although we probably still have some distance to go (the issues
with my net-Unicode draft may be illustrative), I wonder if we
are reaching the point at which a stronger "use Unicode on the
wire" recommendation would be in order. The implications of
such a recommendation would presumably include a 2978bis that
made the requirements for registration of a new charset _much_
tougher, e.g., requiring a demonstration that the then-current
version of Unicode cannot do the relevant job and/or evidence
that the newly-proposed charset is needed in deployed
applications.

This question is motivated, not by a strong love for Unicode,
but by the observation that RFC 2277 requires it and that the
IETF is shifting toward it in a number of areas. More options
and possibilities for local codings that are not generally known
and supported do not help with interoperability; perhaps it is
time to start pushing back.

And that, of course, would dramatically change the work of the
charset reviewer by reducing the volume but increasing the
amount of evaluation to be done.

john
Mark Davis
2006-09-06 23:44:42 UTC
Permalink
If the registry provided an unambiguous, stable definition of each charset
identifier in terms of an explicit, available mapping to Unicode/10646
(whether the UTF-8 form of Unicode or the UTF-32 code points -- that is just
a difference in format, not content), it would indeed be useful. However, I
suspect quite strongly that it is a futile task. There are a number of
problems with the current registry.

1. Poor registrations (minor)
There are some registered charset names that are not syntactically compliant
to the spec.

2. Incomplete (more important)
There are many charsets (such as some windows charsets) that are not in the
registry, but that are in *far* more widespread use than the majority of the
charsets in the registry. Attempted registrations have just been left
hanging, cf. http://mail.apps.ietf.org/ietf/charsets/msg01510.html

3. Ill-defined registrations (crucial)
a) There are registered names that have useless (inaccessible or unstable)
references; there is no practical way to figure out what the charset
definition is.
b) There are other registrations that are defined by reference to an
available chart, but when you actually test what the vendor's APIs map to,
they actually *use* a different definition: for example, the chart may say
that 0x80 is undefined, but actually map it to U+0080.
c) The RFC itself does not settle important issues of identity among
charsets. If a new mapping is added to a charset converter, is that a
different charset (and thus needs a different registration) or not? Does
that go for any superset? etc. We've raised these issues before, but with no
resolution (or even an attempt at one). Cf.
http://dev.icu-project.org/cgi-bin/viewcvs.cgi/*checkout*/icuhtml/design/charset_questions.html

As a product of the above problems, the actual results obtained by using the
IANA charset names on any given platform* may vary wildly. For example,
among the IANA-registry-named charsets, there were over a million different
mapping differences between Sun's and IBM's Java, total.

* "platform" speaking broadly -- the results may vary by OS (Mac vs Windows
vs Linux...), by programming language (Java), or by version of programming
language runtime (IBM vs Sun's Java), or even by product (database version).

In ICU, for example, our requirement was to be able to reproduce the actual,
observable character conversions in effect on any platform. With that
goal, we basically had to give up trying to use the IANA registry at all. We
compose mappings by scraping: calling the APIs on those platforms to do
conversions and collecting the results, and providing a different internal
identifier for any differing mapping. We then have a separate name mapping
that goes from each platform's name (the name according to that platform)
for each charset to the unique identifier. Cf.
http://icu.sourceforge.net/charts/charset/.

And based on work here at Google, it is pretty clear that -- at least in
terms of web pages -- little reliance can be placed on the charset
information. As imprecise as heuristic charset detection is, it is more
accurate than relying on the charset tags in the html meta element (and what
is in the html meta element is more accurate than what is communicated by
the http protocol).

So while I applaud your goal, I would suspect that it would be a huge
amount of effort for very little return.

Mark


> I agree that we've reached a point where "use UTF-8" is what we need to be
> pushing for in new protocol development. (Note that I said UTF-8 and not
> Unicode - given the existence of gb18030 [*] I don't regard a recommendation
> of "use Unicode" as even close to sufficient. The last thing we want is to
> see the development of specialized Unicode CESes for Korean, Japanese,
> Arabic, Hebrew, Thai, and who knows what else.) And if the reason for new
> charset registrations were a perceived need to have new charsets for use in
> new protocols, I would be in total agreement that a change in focus for
> charset registration is in order.
>
> But that's not why we're seeing new registrations. The new registrations
> we're seeing are of legacy charsets used in legacy applications and protocols
> that for whatever reason never got registered previously. Given that these
> things are in use in various nooks and crannies around the world, it is
> critically important that when they are used they are labelled accurately and
> consistently.
>
> The plain fact of the matter is that we have done a miserable job of
> producing an accurate and useful charset registry, and considerable work
> needs to be done both to register various missing charsets and to clean up
> the existing registry, which contains many errors. I've seen no interest
> whatsoever in registering new charsets for new protocols, so to my mind
> pushing back on, say, the recent registration of iso-8859-11, is an
> overreaction to a non-problem. [**]
>
> > This question is motivated, not by a strong love for Unicode,
> > but by the observation that RFC 2277 requires it and that the
> > IETF is shifting toward it in a number of areas. More options
> > and possibilities for local codings that are not generally known
> > and supported do not help with interoperability; perhaps it is
> > time to start pushing back.
>
> Well, I have to say that to the extent we've pushed back on registrations,
> what we've ended up with is an ad-hoc mess of unregistered usage. I am
> therefore quite skeptical of any belief that pushing back on registrations
> is a useful tactic.
>
> > And that, of course, would dramatically change the work of the
> > charset reviewer by reducing the volume but increasing the
> > amount of evaluation to be done.
>
> Even if we closed the registry completely there would still be a bunch of
> work to do in terms of registry cleanup.
>
> Now, having said all this, I'm willing to take on the role of charset
> reviewer, but with the understanding that one of the things I will do is
> conduct a complete overhaul of the existing registry. [***] Such a
> substantive change will of course require some degree of oversight, which in
> turn means I'd like to see some commitment from the IESG of support for the
> effort.
>
> As for qualifications, I did write the charset registration specification,
> and I also wrote and continue to maintain a fairly full-featured charset
> conversion library. I can provide more detail if anyone cares.
>
> Ned
>
> [*] - For those not fully up to speed on this stuff, gb18030 can be seen as
> an encoding of Unicode that is backwards compatible with the previous
> simplified Chinese charsets gb2312 and gbk.
>
> [**] - The less recent attempt to register ISO-2022-JP-2004 is a more
> interesting case. I believe this one needed to be pushed on, but not
> because of potential use in new applications or protocols.
>
> [***] - I have the advantage of being close enough to IANA that I can drive
> over there and have F2F meetings should the need arise - and I suspect
> it will.
Mark Davis
2006-09-07 01:47:11 UTC
Permalink
This appears to have bounced from ietf-***@iana.org on first send. --
MD

On 9/6/06, Mark Davis <***@icu-project.org> wrote:
>
> If the registry provided an unambiguous, stable definition of each charset
> identifier in terms of an explicit, available mapping to Unicode/10646
> (whether the UTF-8 form of Unicode or the UTF-32 code points -- that is just
> a difference in format, not content), it would indeed be useful. However, I
> suspect quite strongly that it is a futile task. There are a number of
> problems with the current registry.
>
> 1. Poor registrations (minor)
> There are some registered charset names that are not syntactically
> compliant to the spec.
>
> 2. Incomplete (more important)
> There are many charsets (such as some windows charsets) that are not in
> the registry, but that are in *far* more widespread use than the majority of
> the charsets in the registry. Attempted registrations have just been left
> hanging, cf http://mail.apps.ietf.org/ietf/charsets/msg01510.html
> <http://www.google.com/url?sa=D&q=http%3A%2F%2Fmail.apps.ietf.org%2Fietf%2Fcharsets%2Fmsg01510.html>
>
> 2. Ill-defined registrations (crucial)
> a) There are registered names that have useless (inaccessable or
> unstable) references; there is no practical way to figure out what the
> charset definition is.
> b) There are other registrations that are defined by reference to an
> available chart, but when you actually test what the vendor's APIs map to,
> they actually *use* a different definition: for example, the chart may say
> that 0x80 is undefined, but actually map it to U+0080.
> c) The RFC itself does not settle important issues of identity among
> charsets. If a new mapping is added to a charset converter, is that a
> different charset (and thus needs a different registration) or not? Does
> that go for any superset? etc. We've raised these issues before, but with no
> resolution (or even attempt at one) Cf.
> http://dev.icu-project.org/cgi-bin/viewcvs.cgi/*checkout*/icuhtml/design/charset_questions.html<http://www.google.com/url?sa=D&q=http%3A%2F%2Fdev.icu-project.org%2Fcgi-bin%2Fviewcvs.cgi%2F*checkout*%2Ficuhtml%2Fdesign%2Fcharset_questions.html>
>
> As a product of the above problems, the actual results obtained by using
> the iana charset names on any given platform* may vary wildly. For example,
> among the iana-registry-named charsets, there were over a million different
> mapping differences between Sun's and IBM's Java, total.
>
> * "platform" speaking broadly -- ithe results may vary by OS (Mac vs
> Windows vs Linux...), by programming language [Java) or by version of
> programming language runtime (IBM vs Sun's Java), or even by product
> (database version).
>
> In ICU, for example, our requirement was to be able to reproduce the
> actual, observeable, character conversions in effect on any platform. With
> that goal, we basically had to give up trying to use the IANA registry at
> all. We compose mappings by scraping; calling the APIs on those platforms to
> do conversions and collecting the results, and providing a different
> internal identifier for any differing mapping. We then have a separate name
> mapping that goes from each platform's name (the name according to that
> platform) for each character to the unique identifier. Cf.
> http://icu.sourceforge.net/charts/charset/.
>
> And based on work here at Google, it is pretty clear that -- at least in
> terms of web pages -- little reliance can be placed on the charset
> information. As imprecise as heuristic charset detection is, it is more
> accurate than relying on the charset tags in the html meta element (and what
> is in the html meta element is more accurate than what is communicated by
> the http protocol).
>
> So while I applaud your goal, I would suspect that that it would be a huge
> amount of effort for very little return.
>
> Mark
>
>
>
> > I agree that we've reached a point where "use UTF-8" is what we need to
> be
> > pushing for in new protocol development. (Note that I said UTF-8 and not
> > Unicode - given the existance of gb18030 [*] I don't regard a
> recommendation of
> > "use Unicode" as even close to sufficient. The last thing we want is to
> see the
> > development of specializesd Unicode CESes for Korean, Japanese, Arabic,
> Hebrew,
> > Thai, and who knows what else.) And if the reason there are new charset
> > registrations was because of the perceived need to have new charsets for
> use in
> > new protocols, I would be in total agreement that a change in focus for
> charset
> > registration is in order.
> >
> > But that's not why we're seeing new registrations. The new registrations
> we're
> > seeing are of legacy charsets used in legacy applications and protocols
> that
> > for whatever reason never got registered previously. Given that these
> things
> > are in use in various nooks and crannies around the world, it is
> critically
> > important that when they are used they are labelled accurately and
> > consistently.
> >
> > The plain fact of the matter is that we have done a miserable job of
> producing
> > an accurate and useful charset registry, and considerable work needs to
> be done
> > both to register various missing charsets as well as to clean up the
> existing
> > registry, which contains many errors. I've seen no interest whatsoever
> in
> > registering new charsets for new protocols, so to my mind pushing back
> on, say,
> > the recent registration of iso-8859-11, is an overreaction to a
> non-problem.
> > [**]
> >
> > > This question is motivated, not by a strong love for Unicode,
> > > but by the observation that RFC 2277 requires it and that the
> > > IETF is shifting toward it in a number of areas. More options
> > > and possibilities for local codings that are not generally known
> > > and supported do not help with interoperability; perhaps it is
> > > time to start pushing back.
> >
> > Well, I have to say that to the extent we've pushed back on
> registrations, what
> > we've ended up with is ad-hoc mess of unregistered usage. I am therefore
> quite
> > skeptical of any belief that pushing back on registrations is a useful
> tactic.
> >
> > > And that, of course, would dramatically change the work of the
> > > charset reviewer by reducing the volume but increasing the
> > > amount of evaluation to be done.
> >
> > Even if we closed the registry completely there is still a bunch of work
> to do
> > in terms of registry cleanup.
> >
> > Now, having said all this, I'm willing to take on the role of charset
> reviewer,
> > but with the understanding that one of the things I will do is conduct a
> > complete overhaul of the existing registry. [***] Such a substantive
> change will
> > of course require some degree of oversight, which in turn means I'd like
> to see
> > some commitment from the IESG of support for the effort.
> >
> > As for qualifications, I did write the charset registration
> specification, and
> > I also wrote and continue to maintain a fairly full-features charset
> conversion
> > library. I can provide more detail if anyone cares.
> >
> > Ned
> >
> > [*] - For those not fully up to speed on this stuff, gb18030 can be seen
> as an
> > encoding of Unicode that is backwards compatible with the previous
> simplified
> > Chinese charsets gb2312 and gbk.
> >
> > [**] - The less recent attempt to register ISO-2022-JP-2004 is a more
> > interesting case. I believe this one needed to be pushed on, but not
> > because of potential use in new applications or protocols.
> >
> > [***] - I have the advantage of being close enough to IANA that I can
> drive
> > over there and have F2F meetings should the need arise - and I suspect
> > it will.
> >
>
Ted Hardie
2006-09-06 18:04:53 UTC
Permalink
The Applications Area is soliciting volunteers willing to serve as the
IANA charset reviewer. This position entails reviewing charset registrations
submitted to IANA in accordance with the procedures set out in
RFC 2978. It requires the reviewer to monitor discussion on the
ietf-charsets mailing list (moderating it, if necessary); it also requires
that the reviewer interact with the registrants and IANA on the
details of the registration. There is currently a small backlog, and
it will be necessary to work to resolve that backlog during the initial
period of the appointment.

If you are willing to serve in this capacity, please notify the
ADs by September 20th, 2006. A short summary of your experience
in the area should be included.
John C Klensin
2006-09-07 13:58:49 UTC
Permalink
Ned,

Several observations...

The first is that my note was intended as "is it time to review
RFC 2978 and the definition of the charset reviewer job". Just
a question. I had no expectation of discontinuing the current
registry, nor any realistic one of banning future registrations.
I think your comments, Mark's, and those of others are
consistent with my goal in asking the question. What should be
done is another matter -- see below.

Second, while I agree with your concern about GB 18030 and its
ilk, what I learned in trying to put a network-Unicode
definition together (see draft-klensin-net-utf8-01.txt) is that,
for practical use, just specifying "UTF-8" may not be good
enough either. For example, for at least most purposes other
than pure rendering, one probably wants to specify the
normalization form (ideally a "stable" one(++)) for text going
on the wire, so "Unicode, in Stable NFC, encoded in UTF-8" is
probably the level of specification we are looking for, not
"UTF-8". I deliberately said "Unicode" in my note, not because
I thought it was adequate, but because I was certain that it
would expose this issue if we got this far.
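The difference between "just UTF-8" and "NFC, then UTF-8" is visible in
a couple of lines; plain NFC via Python's unicodedata stands in here
for the "stable" form mentioned above, which no standard library
implements.

  import unicodedata

  decomposed = "e\u0301"  # 'e' followed by COMBINING ACUTE ACCENT
  composed = unicodedata.normalize("NFC", decomposed)  # single U+00E9

  print(decomposed.encode("utf-8"))  # b'e\xcc\x81'
  print(composed.encode("utf-8"))    # b'\xc3\xa9'
  print(decomposed == composed)      # False: same text, different code points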

If we really need to be pushing toward a specific encoding and
either the required specification of the normalization applied
or, preferably, a specific normalization, then RFC 2978 isn't
our only issue -- we need to review, and possibly reopen RFC
2277 and 3629 and might need to look at some other
specifications. Realizing this was what caused me to
temporarily put the network-Unicode draft on hold.

I am delighted that you would be willing to take this on -- I
think you have exactly the right combination of skill and
experience with both character sets and Internet application
protocols.

Your ability to do the currently-defined job, or a slightly
different one, is largely independent of whether the
specifications for new additions to the registry are what we
should have today. Clearly, the registry serves the purpose of
reducing the odds of the same name being used, inadvertently, to
describe different things and that is a benefit in itself. Mark
suggests that the definitions are not sufficiently consistent
and of high quality to be used for anything else. I think we
need to figure out what we need (does the current quality of
registrations meet your criteria for "accurately and
consistently"?) and then respecify things so that we get it on
future reservations (and maybe can ask IANA to send out requests
for clarification to relevant existing ones). Certainly your
notion of overhauling the current registry is consistent with
this... it even goes beyond what I had hoped there were energy
for.

You wrote...

> The plain fact of the matter is that we have done a miserable
> job of producing an accurate and useful charset registry, and
> considerable work needs to be done both to register various
> missing charsets as well as to clean up the existing registry,
> which contains many errors. I've seen no interest whatsoever in
> registering new charsets for new protocols, so to my mind
> pushing back on, say, the recent registration of iso-8859-11,
> is an overreaction to a non-problem. [**]

Speaking personally, we are in complete agreement.

> Well, I have to say that to the extent we've pushed back on
> registrations, what we've ended up with is ad-hoc mess of
> unregistered usage. I am therefore quite skeptical of any
> belief that pushing back on registrations is a useful tactic.

Also agree, regardless of what my note appeared to say (in the
interest of opening up exactly this discussion).

john

++ For those who have not been following that particular piece
of work, the Unicode Consortium now has a proposal for "Stable
Normalization Process" under public review (see
http://www.unicode.org/review/pr-95.html). It differs from the
existing normalization forms by applying additional prohibitions
on unassigned code points and problematic sequences and
originated from discussions about the conditions under which
IDNA and Stringprep could be migrated from Unicode 3.2 to
contemporary versions. I would encourage those in IETF who are
interested in these issues to review that proposal carefully and
comment on it as appropriate.
Martin Duerst
2006-09-09 06:23:39 UTC
Permalink
Hello Mark, others,

I think it's good to have such a collection of problems in the registry.
But I think it's also fair to say that what Mark lists as problems may
not in all cases actually be problems.

At 08:44 06/09/07, Mark Davis wrote:
>If the registry provided an unambiguous, stable definition of each charset identifier in terms of an explicit, available mapping to Unicode/10646 (whether the UTF-8 form of Unicode or the UTF-32 code points -- that is just a difference in format, not content), it would indeed be useful. However, I suspect quite strongly that it is a futile task. There are a number of problems with the current registry.

I think the request for an explicit, fixed mapping is a good one,
but in some cases, it would come at a high cost. A typical example
is Shift_JIS: We know there are many variants, on the one hand due
to additions made by makers (or even privately), on the other hand
also due to some changes in the underlying standard (which go back
to 1983).

For an example like Shift_JIS, the question becomes whether
we want to use a single label, or whether we want to carefully
label each variant.

Benefits for a single label:
- Better chance that the recipient knows the label and can do
something with it.
- Better chance to teach people about different encodings
(it's possible to teach people the difference between Shift_JIS,
EUC-JP, and UTF-8, but it would be close to impossible to
teach them about all the various variants)
- No 'overlabeling' (being more precise than necessary, for the
cases (the huge majority) where in the actual data, there is
no difference).
- Usually enough for visual decoding (reading of emails and
Web pages by humans)
- Not influenced by issues outside actual character encoding
(e.g. error treatment of APIs)

Benefits for detailed labeling:
- Accurate data transmission for all data, even fringe cases
- True round-trips for a wider range of scenarios
- May be better suited for machine-to-machine processing


>2. Incomplete (more important)
>There are many charsets (such as some windows charsets) that are not in the registry, but that are in *far* more widespread use than the majority of the charsets in the registry. Attempted registrations have just been left hanging, cf http://mail.apps.ietf.org/ietf/charsets/msg01510.html

Some of this is due to the original rather strict management of the registry.
Some of it is due to the current backlog. A lot of this is due to the fact
that nobody cares enough to work through the registration process; many
people think that 'the IETF' or 'the IANA' will just do it. The solution
is easy: Don't complain, register.


>2. Ill-defined registrations (crucial)
> a) There are registered names that have useless (inaccessable or unstable) references; there is no practical way to figure out what the charset definition is.

This is possible; some of these registrations are probably irrelevant,
while for others it's pretty clear what the charset means in general, even
though there might be implementation differences for some codepoints.


> b) There are other registrations that are defined by reference to an available chart, but when you actually test what the vendor's APIs map to, they actually *use* a different definition: for example, the chart may say that 0x80 is undefined, but actually map it to U+0080.

It's clear that for really faulty charts, the vendors should be blamed,
and not the registry.

However, the difference between the published map and the
actually used API may be due to the fact that 0x80 is indeed not
part of the encoding as formally defined, and is mapped to U+0080
just as part of error treatment. For most applications (not for
all necessarily), it would be a mistake to include error processing
in the formal definition of an encoding.
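To make that distinction concrete, here is a small sketch in which
Python's bundled codecs stand in for the vendor converters discussed
above, with 0x81 (left undefined in the published windows-1252 chart)
as the example byte:

  raw = b"\x81"

  print(repr(raw.decode("latin-1")))    # '\x81': a 1:1 mapping to U+0081
  try:
      raw.decode("cp1252")              # strict: the byte has no mapping at all
  except UnicodeDecodeError as exc:
      print("cp1252, strict:", exc.reason)
  print(repr(raw.decode("cp1252", errors="replace")))  # '\ufffd' substitution

  # All three results are observable "mappings", but only the strict one
  # reflects the formal definition; the other two are error treatment.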


> c) The RFC itself does not settle important issues of identity among charsets. If a new mapping is added to a charset converter, is that a different charset (and thus needs a different registration) or not? Does that go for any superset? etc. We've raised these issues before, but with no resolution (or even attempt at one) Cf. http://dev.icu-project.org/cgi-bin/viewcvs.cgi/*checkout*/icuhtml/design/charset_questions.html

It seems that what you would want, for your purposes, is to use a new
label if a new character gets added to a legacy encoding, but not use
a new label e.g. for UTF-8 each time a character gets added.
So things would be somewhat case-by-case.


>As a product of the above problems, the actual results obtained by using the iana charset names on any given platform* may vary wildly. For example, among the iana-registry-named charsets, there were over a million different mapping differences between Sun's and IBM's Java, total.

It would be better to express these numbers in terms of percentages:
In the experiment made, how many codepoints were mapped, and for how many
did you get differences?

Even better would be to express these numbers in terms of percentages
of actual average data. My guess is that this would be much lower than
the percentage of code points.

This is in no way to belittle the problems, just to make sure we put
them in proportion.
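For what it's worth, reporting the difference between two converters as
a percentage of code points is only a few lines of the same scraping
idea; the two codec names below are placeholders and do not, of course,
reproduce the Sun-versus-IBM comparison.

  def single_byte_table(codec_name):
      table = {}
      for byte in range(256):
          try:
              table[byte] = bytes([byte]).decode(codec_name)
          except UnicodeDecodeError:
              table[byte] = None  # undefined in this converter
      return table

  a, b = single_byte_table("cp1252"), single_byte_table("latin-1")
  differing = sum(1 for byte in range(256) if a[byte] != b[byte])
  print("%d of 256 code points differ (%.1f%%)"
        % (differing, 100.0 * differing / 256))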


>In ICU, for example, our requirement was to be able to reproduce the actual, observeable, character conversions in effect on any platform. With that goal, we basically had to give up trying to use the IANA registry at all.

This is understandable. The start of the IANA registry was MIME, i.e.
email. The goal was to be able to (ultimately visually) decode the
characters at the other end.


>We compose mappings by scraping; calling the APIs on those platforms to do conversions and collecting the results, and providing a different internal identifier for any differing mapping. We then have a separate name mapping that goes from each platform's name (the name according to that platform) for each character to the unique identifier. Cf. http://icu.sourceforge.net/charts/charset/.

This is very thorough, and may look foolproof, but isn't.
One issue already mentioned is error behavior. If a first API
maps an undefined character to some codepoint, and a second
API maps it to another codepoint (or sequence), they just
made different decisions about error behavior (e.g. mapping
unknown codepoints to '?' or to one of the substitution
characters, or dropping them,...). This would be particularly
prominent when converting from Unicode to a legacy encoding,
because in this case there are tons of codepoints that
can't be converted. But this most probably should not
be part of the definition of a 'charset'.

Also, there are cases where there are no differences in
transcoding, but font differences. Examples are the treatment
of the backslash character on MS Windows systems (shown as a
Yen symbol because most (Unicode!) fonts on Japanese Windows
systems have it that way), or certain cases where the
traditional and (Japanese) simplified variant of a character
got exchanged when moving from the 1978 to the 1983 version of
the standard.


>And based on work here at Google, it is pretty clear that -- at least in terms of web pages -- little reliance can be placed on the charset information.

Yes, but this is a problem on a different level than the above.
Above, you are speaking about small variant differences.
Here, you are speaking about completely mislabelled pages.
The problems with small variants don't make the labels in
the registry unsuitable for labeling Web pages. Most Web
pages don't contain characters where the minor version
differences matter, because private use and corporate
characters don't work well on the Web, and because some
of the transcoding differences are between e.g. full-width
and half-width variants, which are mostly irrelevant for
viewing and in particular for search.


>As imprecise as heuristic charset detection is, it is more accurate than relying on the charset tags in the html meta element (and what is in the html meta element is more accurate than what is communicated by the http protocol).

This sounds reasonable.
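A rough sketch of comparing the three sources of charset information
for a page follows. It relies on the third-party chardet package for
the heuristic guess, and the meta regex is deliberately naive; both are
illustrative choices, not part of the work described above.

  import re
  import urllib.request
  import chardet

  def charset_report(url):
      with urllib.request.urlopen(url) as resp:
          body = resp.read()
          http_charset = resp.headers.get_content_charset()  # from Content-Type
      m = re.search(rb'charset\s*=\s*["\']?([A-Za-z0-9._-]+)', body[:4096], re.I)
      meta_charset = m.group(1).decode("ascii", "replace") if m else None
      guess = chardet.detect(body)  # heuristic detection over the raw bytes
      return {"http": http_charset, "meta": meta_charset,
              "detected": guess["encoding"], "confidence": guess["confidence"]}

  # e.g. charset_report("http://example.org/")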


>So while I applaud your goal, I would suspect that that it would be a huge amount of effort for very little return.

There are other protocols, in particular email. My understanding is
that for email, the situation is quite a bit better, because people
use a dedicated tool (their MUA) to write emails, and emails rarely
get transcoded on the character level, and there is no server involved,
whereas users use whatever they can get their hands on to create
Web pages, the server can mess things up (or help, in some cases),
and pages may get transcoded.


A reasonable conclusion from the above is that no one size fits all.
Many applications may be very well served with the granularity
of labels we have now. Some others may need more granularity.
We would either have to decide that we stay with the current
granularity, or that we maybe move to a system with multiple
levels of granularity.


Regards, Martin.




#-#-# Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-# http://www.sw.it.aoyama.ac.jp mailto:***@it.aoyama.ac.jp
Frank Ellermann
2006-09-10 15:43:16 UTC
Permalink
Martin Duerst wrote:

> I think it's also fair to say that what Mark lists as
> problems may not in all cases actually be problems.

Anything he said can be done with CharMapML. I can't judge
your Shift_JIS example, but for windows-1252 it's pretty
obvious:

Probably the owners want to be free to do whatever pleases
them with the remaining five unassigned code points if
necessary. CharMapML can express that with its versions.

And "we" know that it uses 1:1 mappings in practice for these
five code points, CharMapML could express that as fallback
mappings.

If the most recent mapping can use the preferred MIME name,
that's good enough. Purists worried about persistence can
convert the document to one of the less obscure UTFs and
store it in that form.

> It's clear that for really faulty charts, the vendors should
> be blamed, and not the registry.

Nothing's wrong with the charts; there are two legitimate
aspects. One is "we'll continue to call it windows-1252 even
if we assign one of the five code points" (if that happens
anytime soon, before windows-1252 is history like all legacy
charsets).

Until then the other aspect is "just use U+0081 for 0x81",
because it's KISS. If you insist on it, throw an error; the
user can then decide if it's completely mislabelled.

> For most applications (not for all necessarily), it would
> be a mistake to include error processing in the formal
> definition of an encoding.

If I saw a perfectly legal U+0080 in Latin-1 I'd guess that
this must be an error; probably the document is windows-1252.
Today "claims to be Latin-1" is about as convincing as "claims
to be ASCII" was when RFC 1341 was written. There's nothing
for the registry to do here; it offers windows-1252 for those
who want to get it right.

Maybe that's a good reason why registering 8859-11 but not 874
isn't ideal: we're not interested in folks using 8859-11 if
what they really mean is windows-874.

> It seems that what you would want, for your purposes, is to
> use a new label if a new character gets added to a legacy
> encoding

That's a very clean solution with its own drawbacks: my box
insists on saying 1004 for windows-1252, my browser erroneously
claims that this is Latin-1, and my box says 850 when it means
858. That's my local business; I know where to fix it before
it hits others.

> not use a new label e.g. for UTF-8 each time a character gets
> added. So things would be somewhat case-by-case.

Yes. For BOCU-1, SCSU, and the two UTF-*BE we don't need a
mapping. We also don't "need" it for UTF-1, UTF-8, and the
two UTF-*LE, but it's possible, e.g. about 1100 (long) lines
for UTF-16LE, or 3200 folded lines in a CharMapML mapping.

I like its <range> element - it took me some time to understand,
but UTF-8 in 48 folded lines is really nice.
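The space saving comes from collapsing contiguous 1:1 runs into single
range entries. The sketch below shows the idea on a single-byte table;
the output format is generic, not actual CharMapML syntax, and
windows-1252 is just a convenient example.

  def contiguous_runs(table):
      # Collapse a byte -> code point table into (first, last, first_cp) runs.
      runs = []
      for byte in sorted(table):
          cp = table[byte]
          if runs and byte == runs[-1][1] + 1 and cp == runs[-1][2] + (byte - runs[-1][0]):
              runs[-1] = (runs[-1][0], byte, runs[-1][2])
          else:
              runs.append((byte, byte, cp))
      return runs

  table = {b: ord(bytes([b]).decode("cp1252"))
           for b in range(256) if b not in (0x81, 0x8D, 0x8F, 0x90, 0x9D)}
  for first, last, cp in contiguous_runs(table):
      print("bytes %02X-%02X -> U+%04X..U+%04X" % (first, last, cp, cp + last - first))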

> This would be particularly prominent when converting from
> Unicode to a legacy encoding, because in this case there are
> tons of codepoints that can't be converted. But this most
> probably should not be part of the definition of a 'charset'.

CharMapML offers to specify a legacy SUB. Applications can
offer to use another legacy character like 0x7F or '?', or
throw an error. Or <shudder> silently drop it </shudder> -
but that's IMO on the wrong side of the border to "broken".

It's mildly interesting to minimize the reported errors; the
implementation details are irrelevant for the registry. I'd
like to get "official" mappings for most registered charsets,
and "pull the *.ucm mappings out of ICU, check that they're
okay, and host them at IANA" could make sense. The registry
format would stay mostly as is, adding URLs of "official"
mappings to checked entries.

Maybe join some entries, csUnicode + UTF-16, csUCS4 + UTF-32,
the works. Or explain what the difference is supposed to be.

Frank
Ned Freed
2006-09-09 13:39:06 UTC
Permalink
> Hello Mark, others,

> I think it's good to have such a collection of problems in the registry.
> But I think it's also fair to say that what Mark lists as problems may
> not in all cases actually be problems.

I agree. I also think that there's a bunch of low-hanging fruit here: Many (but
certainly not all) of the registry problems can be fixed without a huge
investment of time and effort.

Once the obvious stuff is addressed we can discuss how far we want to go,
especially with regard to versioning, variant tagging, and so on. But let's
please not get bogged down in the hard stuff before dealing with the easy
stuff.

> > If the registry provided an unambiguous, stable definition of each charset
> > identifier in terms of an explicit, available mapping to Unicode/10646
> > (whether the UTF-8 form of Unicode or the UTF-32 code points -- that is
> > just a difference in format, not content), it would indeed be useful.
> > However, I suspect quite strongly that it is a futile task. There are a
> > number of problems with the current registry.

> I think the request for an explicit, fixed mapping is a good one,
> but in some cases, it would come at a high cost. A typical example
> is Shift_JIS: We know there are many variants, on the one hand due
> to additions made by makers (or even privately), on the other hand
> also due to some changes in the underlying standard (which go back
> to 1983).

> For an example like Shift_JIS, the question becomes whether
> we want to use a single label, or whether we want to carefully
> label each variant.

Exactly so. And I would propose that we defer worrying about such tricky issues
until the obvious stuff is done. It is always important not to let the best
be the enemy of the good.

> >2. Incomplete (more important)

> > There are many charsets (such as some windows charsets) that are not in the
> > registry, but that are in *far* more widespread use than the majority of the
> > charsets in the registry. Attempted registrations have just been left
> > hanging, cf http://mail.apps.ietf.org/ietf/charsets/msg01510.html

> Some of this is due to the original rather strict management of the registry.
> Some of it is due to the current backlog. A lot of this is due to the fact
> that nobody cares enough to work through the registration process; many
> people think that 'the IETF' or 'the IANA' will just do it. The solution
> is easy: Don't complain, register.

And reviewing registration applications in a timely way might help encourage
more registration activity.

> > 2. Ill-defined registrations (crucial)

> > a) There are registered names that have useless (inaccessable or unstable)
> > references; there is no practical way to figure out what the charset definition
> > is.

> This is possible; some of these registrations are probably irrelevant,
> for others, it's pretty clear what the charset means in general, even
> though there might be implementation differences for some codepoints.

Well said. If nobody can figure out anything about a charset in the registry,
that's a pretty good indication that it's irrelevant, at least in terms of
current usage. I suspect quite a few of the incomplete entries fall into
this category.

> > b) There are other registrations that are defined by reference to an
> > available chart, but when you actually test what the vendor's APIs map to, they
> > actually *use* a different definition: for example, the chart may say that 0x80
> > is undefined, but actually map it to U+0080.

> It's clear that for really faulty charts, the vendors should be blamed,
> and not the registry.

Well, to be fair, vendors sometimes add weird mappings in response to customer
demand. For example, I've seen codepoints specific to some Microsoft variant of
a national standard charset creep into usage of the standard charset. In such
cases what's a vendor to do when they have a bunch of customers saying "we
don't care what the standard is, either add these codepoints or we'll switch to
the competitor's product that does do this"?

> However, the difference between the published map and the
> actually used API may be due to the fact that 0x80 is indeed not
> part of the encoding as formally defined, and is mapped to U+0080
> just as part of error treatment. For most applications (not for
> all necessarily), it would be a mistake to include error processing
> in the formal definition of an encoding.

Yes, that's an issue too. I've observed wide variations in the handling
of unassigned code points by different converters.

> > c) The RFC itself does not settle important issues of identity among
> > charsets. If a new mapping is added to a charset converter, is that a different
> > charset (and thus needs a different registration) or not? Does that go for any
> > superset? etc. We've raised these issues before, but with no resolution (or
> > even attempt at one) Cf. http://dev.icu-project.org/cgi-bin/viewcvs.cgi/*checkout*/icuhtml/design/charset_questions.html

> It seems that what you would want, for your purposes, is to use a new
> label if a new character gets added to a legacy encoding, but not use
> a new label e.g. for UTF-8 each time a character gets added.
> So things would be somewhat case-by-case.

I agree - I don't think it is possible to codify a single "best practice" to
handle every case here.

> > As a product of the above problems, the actual results obtained by using the
> > iana charset names on any given platform* may vary wildly. For example, among
> > the iana-registry-named charsets, there were over a million different mapping
> > differences between Sun's and IBM's Java, total.

> It would be better to express these numbers in terms of percentages:
> In the experiment made, how many codepoints were mapped, and for how many
> did you get differences?

> Even better would be to express these numbers in terms of percentages
> of actual average data. My guess is that this would be much lower than
> the percentage of code points.

It certainly had better be, or things must not be working very well somewhere!

> This is in no way to belittle the problems, just to make sure we put
> it in proportions.

Right.

> > In ICU, for example, our requirement was to be able to reproduce the actual,
> > observeable, character conversions in effect on any platform. With that goal,
> > we basically had to give up trying to use the IANA registry at all.

> This is understandable. The start of the IANA registry was MIME, i.e.
> email. The goal was to be able to (ultimately visually) decode the
> characters at the other end.

And that tends to drive various tradeoffs in a particular direction. For
example, it tends to argue for fewer tags for minor charset variations because
that's easier to deploy and update.

> ...

> >And based on work here at Google, it is pretty clear that -- at least in
> > terms of web pages -- little reliance can be placed on the charset information.

> Yes, but this is a problem on a diffent level than the above.
> Above, you are speaking about small variant differences.
> Here, you are speaking about completely mislabled pages.

I don't see any way we can fix the mislabelling problem (which also exists in
email but takes on a somewhat different form there) in any direct way. However,
to the extent that mislabelling has occurred due to people getting confused by
our current registry mess, cleaning our own house might help a little. But
probably not much - a site that labels every message they send as iso-8859-1 no
matter what's actually in the message (and I've seen some big ones that do
this) isn't going to be cured by our having a perfectly accurate and totally
comprehensive registry. Like it or not, we cannot fix everything here.

> The problems with small variants don't make the labels in
> the registry unsuitable for labeling Web pages. Most Web
> pages don't contain characters where the minor version
> differences matter, because private use and corporate
> characters don't work well on the Web, and because some
> of the transcoding differences are between e.g. full-width
> and half-width variants, which are mostly irrelevant for
> viewing and in particular for search.

I've observed similar behavior in email. People are pretty adaptable and tend
to figure out what works and what doesn't fairly quickly.

> >So while I applaud your goal, I would suspect that that it would be a huge amount of effort for very little return.

> There are other protocols, in particular email. My understanding is
> that for email, the situation is quite a bit better, because people
> use a dedicated tool (their MUA) to write emails, and emails rarely
> get transcoded on the character level, and there is no server involved,
> whereas users use whatever they can get their hands on to create
> Web pages, the server can mess things up (or help, in some cases),
> and pages may get transcoded.

To be honest, it is hard for me to judge whether the situation for email is
better or worse than for the web or other protocols. Since I work on email
systems a lot more than I work on other stuff, I tend to see reports of lots
more problems with email. But this is probably more the result of being on the
receiving end of bug reports for email while not being on the receiving end of
bug reports for, say, a web client.

In any case, the Internet is now such a big place with so many nooks
and crannies that I have no idea how you'd figure out where the worst problems
are. And I'm not sure it matters either.

Ned
Bruce Lilly
2006-09-09 15:34:02 UTC
Permalink
On Sat September 9 2006 09:39, Ned Freed wrote:

> I agree. I also think that there's a bunch of low-hanging fruit here: Many (but
> certainly not all) of the registry problems can be fixed without a huge
> investment of time and effort.
>
> Once the obvious stuff is addressed we can discuss how far we want to go,
> especially in regards to versioning, variant tagging, and so on. But let's
> please not get bogged down in the hard stuff before dealing with the easy
> stuff.

A possibly good starting point would be the list of issues brought up the
last time somebody offered to get problems fixed; the thread on this mailing
list beginning with http://mail.apps.ietf.org/ietf/charsets/msg01495.html

One would certainly hope that some of those issues (e.g. removing silly
"Alias: None" lines[*], assigning cs* aliases where missing) could be (or
could have been) done without huge effort. Paul's message looked promising
(Apps AD support, etc.), yet most of the problems noted remain in the
registry some 21 months later...

---------------
* here follows a one-liner fix for the silly lines:
grep -v "^Alias.*: *None" <character-sets >foo && mv -f foo character-sets
Martin Duerst
2006-10-03 05:17:22 UTC
Permalink
Hello Mark,

We should definitely start to look at such issues once we have
processed the backlog of requests and have cleaned up some of
the garbage in the current registry.

On the side, I think it would be great if
http://icu.sourceforge.net/charts/charset/roundtripIndex.html
could be split up into some smaller pages. It's really huge.

Regards, Martin.

At 00:22 06/10/03, Mark Davis wrote:
>I'd suggest taking a look at the ICU charset data. This was gathered by calling APIs on different platforms, instead of going by the documentation, which was often false.
>
>http://icu.sourceforge.net/charts/charset/
>http://icu.sourceforge.net/charts/charset/roundtripIndex.html
>
>The other thing that needs to be done is establish criteria for identity. If two mappings are identical except that one adds an additional mapping from bytes to Unicode, which gets registered? Both? The subset? The superset?
>
>There are literally hundreds of such cases, so without clarity it doesn't help to propose registrations.
>
>Mark
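One way to make the identity question concrete is to classify the
relation between two observed byte-to-Unicode tables, for instance
those produced by scraping converter APIs. The function below is an
ad-hoc sketch (its name and return strings are invented here), not a
proposal for registry policy.

  def relation(a, b):
      # a and b map byte values to Unicode strings; undefined bytes are omitted.
      if a == b:
          return "identical"
      shared = set(a) & set(b)
      if any(a[k] != b[k] for k in shared):
          return "conflicting on shared code points"
      if set(b) < set(a):
          return "a is a strict superset of b"
      if set(a) < set(b):
          return "a is a strict subset of b"
      return "each adds mappings the other lacks"

  # e.g. relation(scraped_table_platform1, scraped_table_platform2)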



#-#-# Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-# http://www.sw.it.aoyama.ac.jp mailto:***@it.aoyama.ac.jp