Discussion:
Registration of new charset [ISO-2022-JP-2004]
Koichi Yasuoka
2006-07-17 02:14:06 UTC
Charset name:

ISO-2022-JP-2004

Charset aliases:

ISO-2022-JP-2003
ISO-2022-JP-3-2003

Suitability for use in MIME text:

Suitable for 7-bit use in MIME body-part as text/plain or text/html.
B-encoding is recommended for use in MIME header-part, because
ISO-2022-JP-2004 is a partial extension of ISO-2022-JP.

Published specification:

JIS X 0213 7-bit and 8-bit double byte coded extended KANJI sets for
information interchange, Japanese Standards Association (first edition
2000-01-20, amendment 2004-02-20, corrigendum 2004-04-01).

ISO 10646 equivalency table:

No direct URI to the equivalency table, but the table is included in
JIS X 0213, which can be found via
http://www.jisc.go.jp/app/JPS/JPSO0020.html
by searching for the word "X0213".

Additional information:

"ISO-2022-JP-2003" and "ISO-2022-JP-3-2003" were in the first print of
JIS X 0213:2004 dated February 20, 2004, and they were both corrected
to "ISO-2022-JP-2004" in the corrigendum dated April 1, 2004. To avoid
complications, "ISO-2022-JP-2003" and "ISO-2022-JP-3-2003" may be used as aliases
of "ISO-2022-JP-2004", but "ISO-2022-JP-2004" is preferred.

Person & email address to contact for further information:

Koichi Yasuoka
***@kanji.zinbun.kyoto-u.ac.jp

Intended usage:

COMMON
Koichi Yasuoka
2006-08-02 02:17:56 UTC
Dear Sirs,

A two-week period has passed since the proposal shown below, and I've
heard no objection to the charset. How do I proceed to the
next stage?

Best Regards,
Koichi Yasuoka

------ Registration proposal on 17 Jul 2006

Koichi Yasuoka
2006-08-09 05:53:50 UTC
Dear Sirs,

Three weeks have passed since the proposal shown below, and I've
heard no objection to the charset. How do I proceed to the
next stage?

Best Regards,
Koichi Yasuoka

------ Registration proposal on 17 Jul 2006

Koichi Yasuoka
2006-08-16 04:44:17 UTC
Dear Sirs,

One month has passed since the proposal shown below, and I've
heard no objection to the charset. Regarding RFC 2978,
how do I contact the "charset reviewer"?

Best Regards,
Koichi Yasuoka

------ Registration proposal on 17 Jul 2006

Martin Duerst
2006-09-27 10:42:02 UTC
Hello Koichi,

At 13:44 06/08/16, Koichi Yasuoka wrote:
>Dear Sirs,
>
>One month has passed after the proposal shown below, and I've
>heard no objection against the charset. Then, regarding RFC 2978,
>how do I contact the "charset reviewer"?

The former charset reviewer has resigned.
The IETF Applications Area Directors are currently working on
finding and appointing a new reviewer.

When I saw your proposal, I had a few comments, but unfortunately,
I didn't find the time to put them together. Please find them
below.


>Best Regards,
>Koichi Yasuoka
>
>------ Registration proposal on 17 Jul 2006
>
>Charset name:
>
>ISO-2022-JP-2004
>
>Charset aliases:
>
>ISO-2022-JP-2003
>ISO-2022-JP-3-2003
>
>Suitability for use in MIME text:
>
>Suitable for 7-bit use in MIME body-part as text/plain or text/html.
>B-encoding is recommended for use in MIME header-part, because
>ISO-2022-JP-2004 is a partial extension of ISO-2022-JP.

The fact that the new encoding can be seen as an extension of
iso-2022-jp doesn't make it easy for people to understand
why the B-encoding is recommended.

RFC 1468 also just says:

ISO-2022-JP may also be used in MIME Part 2 headers. The "B"
encoding should be used with ISO-2022-JP text.

and thus doesn't motivate or explain anything. Is the preference
for "B" due to tradition? Or because on average, it leads to
shorter encodings? Or because even if "Q" may be shorter in
some (many?) cases, the literally displayed US-ASCII codepoints
will just confuse somebody who looks at it? Or are there
implementations that only understand "B"?

Also, I think that any mention of "extension of ISO-2022-JP"
without explanations is a bit problematic, because it might
give the impression that implementations accepting iso-2022-jp
also will somehow work for this new encoding. In my understanding,
because new escape sequences are used, it is extremely difficult
to predict what might happen in such a case.


>Published specification:
>
>JIS X 0213 7-bit and 8-bit double byte coded extended KANJI sets for
>information interchange, Japanese Standards Association (first edition
>2000-01-20, amendment 2004-02-20, corrigendum 2004-04-01).
>
>ISO 10646 equivalency table:
>
>No direct URI to the equivalency table, but the table is included in
>JIS X 0213, which can be found via
> http://www.jisc.go.jp/app/JPS/JPSO0020.html
>with searching the word "X0213".

Oh, this is really nice: A JIS standard is available in .pdf
on the Web (although it can't be printed out). Given that the paper
copy is 11,000 Yen + tax (at least mine was, in 2000), that's a very
nice development.

The main problem I see, not for me personally, but for others, is the
fact that everything on this site as well as the standard itself is
in Japanese. Also, for programmers, even if they read Japanese,
typing in the data from a screen (or buying the standard
and typing the data in from there) looks like a really bad (because
extremely tedious and error-prone) idea.

So I think both a more detailed description and a pointer to
machine-readable data would be highly appreciated by anybody who
wants to implement this. Even unofficial pointers, and even if
only on the mailing list and not as part of the official
registration form, would be better than nothing.
As for the description, I think it could be as simple as listing
the various escape sequences and their meanings (roughly what's
in Appendix 2, Section 4 of JIS X 0213, at least in the 2000
version that I have in front of me).


>Additional information:
>
>"ISO-2022-JP-2003" and "ISO-2022-JP-3-2003" were in the first print of
>JIS X 0213:2004 dated February 20, 2004, and they were both corrected
>to "ISO-2022-JP-2004" in the corrigendum dated April 1, 2004. To avoid
>complications "ISO-2022-JP-2003" and "ISO-2022-JP-3-2003" may be aliases
>of "ISO-2022-JP-2004", but "ISO-2022-JP-2004" is preferred.

The 2000 version of JIS X 0213 also contains ISO-2022-JP-3.
What's the reason for leaving that out of the registration?
Is the reason that there were changes in the Unicode mappings
of JIS X 0213?

[for outsiders: The 2000 version contained some
characters that were not yet in Unicode/ISO 10646, and listed
the 'desired' Unicode/ISO 10646 codepoints, but the actually
allocated codepoints in Unicode/ISO 10646 were different.]

Are there any changes in Unicode/ISO 10646 mappings between
2003 and 2004? If yes, what? If not, what motivated yet another
alias? (Already the two original aliases are, in principle, one
more than necessary.)


>Person & email address to contact for further information:
>
>Koichi Yasuoka
>***@kanji.zinbun.kyoto-u.ac.jp

Thanks for taking on the job of registering this encoding!


>Intended usage:
>
>COMMON

How common is this already, or is it going to be?
What I have heard is that most implementers are using, or
plan to use, UTF-8 or UTF-16 for implementing the repertoire
of JIS X 0213.


Another question: at least the 2000 version I have in front of
me also defines Shift_JISX0213 and EUC-JISX0213. Are there plans
to register these, too? (I seem to remember that there was a
request in that direction, but that failed because the requester
wasn't able to agree with the rest of the list on the meaning
of the "Suitability for use in MIME text:" field.)


Regards, Martin.


#-#-# Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-# http://www.sw.it.aoyama.ac.jp mailto:***@it.aoyama.ac.jp
Koichi Yasuoka
2006-09-19 14:18:03 UTC
Dear Sirs,

Two months have passed since the proposal shown below, and I've
heard no objection to the charset. Regarding RFC 2978,
how do I contact the "charset reviewer"?

Best Regards,
Koichi Yasuoka

------ Registration proposal on 17 Jul 2006

Martin Duerst
2006-09-28 07:06:56 UTC
At 00:28 06/09/28, Koichi Yasuoka wrote:
>Dear Martin,
>
>Thank you for your reply about the registration of ISO-2022-JP-2004.
>However, I almost give up the registration...

Please don't.

I didn't think this would happen so quickly,
and I was somehow expecting a public announcement, but as
of a few hours ago, IANA has listed the new reviewers on
their page (see http://www.iana.org/numbers.html#C).
To spare people the work of checking it, the relevant line
reads:
Character Sets RFC2978 Expert Review (Primary Expert Ned Freed
and Secondary Expert Martin Duerst)

Ned and I still need to figure out some details of how we split up
our work, but we definitely hope that we can move things forward more
quickly than in the (recent) past.

>>Is the preference
>>for "B" due to tradition? Or because on average, it leads to
>>shorter encodings?
>
>"B" is shorter on average than "Q", especially encoding Japanese
>names in From, To, and Cc fields. For example, my name in Japanese
>is encoded as:
>
>=?ISO-2022-JP-2004?B?GyRCMEIyLDknMGwbKEI=?=
>=?ISO-2022-JP-2004?Q?=1B=24B0B2=2C9=270l=1B=28B?=
>
>So as ISO-2022-JP. Please try other Japanese names.

Ok, I see. Probably best to say so in the registration.
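Koichi's comparison can be reproduced with a short Python sketch. It builds both RFC 2047 encoded-words from the same bytes; the "Q" side follows the stricter character set allowed inside a header phrase (RFC 2047, section 5, rule (3)), which is where names in From, To and Cc appear. The bytes in his example decode to 安岡孝一 and contain only JIS X 0208 kanji, so the standard `iso2022_jp` codec is used to produce them. The helper names `b_word` and `q_word` are illustrative only.

```python
import base64
import string

# Bytes that may appear literally in a "Q" encoded-word used inside a
# header phrase (RFC 2047, section 5, rule (3)); everything else is =XX.
Q_SAFE = set((string.ascii_letters + string.digits + "!*+-/").encode("ascii"))


def b_word(raw: bytes, charset: str) -> str:
    """Wrap raw charset bytes as an RFC 2047 "B" encoded-word."""
    return f"=?{charset}?B?{base64.b64encode(raw).decode('ascii')}?="


def q_word(raw: bytes, charset: str) -> str:
    """Wrap raw charset bytes as a phrase-safe RFC 2047 "Q" encoded-word."""
    parts = []
    for byte in raw:
        if byte in Q_SAFE:
            parts.append(chr(byte))
        elif byte == 0x20:
            parts.append("_")  # a space may be written as an underscore
        else:
            parts.append(f"={byte:02X}")
    return f"=?{charset}?Q?{''.join(parts)}?="


# Koichi's name: JIS X 0208 kanji only, so iso2022_jp gives the same bytes.
raw = "安岡孝一".encode("iso2022_jp")
b = b_word(raw, "ISO-2022-JP-2004")
q = q_word(raw, "ISO-2022-JP-2004")
print(b, len(b))
print(q, len(q))
```

"B" costs a fixed 4 output characters per 3 bytes, while "Q" costs 3 characters for every ESC byte and for punctuation bytes such as `$`, `,`, `'` and `(`, which occur freely inside double-byte Japanese text; that is why "B" tends to be shorter for Japanese names.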

>>Also, I think that any mention of "extension of ISO-2022-JP"
>>without explanations is a bit problematic, because it might
>>give the impression that implementations accepting iso-2022-jp
>>also will somehow work for this new encoding. In my understanding,
>>because new escape sequences are used, it is extremely difficult
>>to predict what might happen in such a case.
>
>I understand that ISO-2022-JP texts with "ESC $ B" and
>"ESC ( B" can be accepted by ISO-2022-JP-2004 decoder.
>It is problematic when "ESC $ @" or "ESC ( J" is used but
>they are very rare now.

That's not the case I was worrying about. I think it will
be very rare for some piece of software to support only
iso-2022-jp-2004 but not iso-2022-jp. The reverse is
much more likely, and feeding iso-2022-jp-2004 data
to an iso-2022-jp decoder is where things are extremely
difficult to predict.
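Martin's concern can be tried out with Python's standard CJK codecs, which include both `iso2022_jp` and `iso2022_jp_2004`. The sketch below feeds the same ISO-2022-JP-2004 bytes to each codec. The sample uses ESC $ ( Q followed by the code for a kanji (row 16, cell 34) that JIS X 0208 also has, so the only new element is the escape sequence itself. CPython's strict `iso2022_jp` decoder rejects it; other ISO-2022-JP decoders might instead skip, pass through, or misread the bytes, which is exactly the unpredictability described above. `try_decode` is an illustrative helper.

```python
# ISO-2022-JP-2004 bytes: ESC $ ( Q designates JIS X 0213:2004 plane 1,
# 0x30 0x42 is row 16, cell 34 (a kanji that JIS X 0208 also contains),
# and ESC ( B switches back to ASCII.
data = b"\x1b$(Q\x30\x42\x1b(B"


def try_decode(raw: bytes, codec: str) -> str:
    """Decode with the given codec, or describe why decoding failed."""
    try:
        return raw.decode(codec)
    except UnicodeDecodeError as exc:
        return f"error: {exc.reason}"


for codec in ("iso2022_jp_2004", "iso2022_jp"):
    print(codec, "->", try_decode(data, codec))
```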

>>So I think both a more detailed description and a pointer to
>>machine-readable data would be highly appreciated by anybody who
>>wants to implement this.
>
>Yes, I agree that the machine-readable table for the
>conversion between ISO-2022-JP-2004 and ISO/IEC 10646 is
>highly appreciated, but I didn't get it when I proposed
>the registration. Now I know "ISO-2022-JP-2004 vs Unicode
>mapping table" at http://x0213.org/codetable/iso-2022-jp-2004-std.txt

This looks great. Please include this pointer in the
registration. If necessary, you can mention that it is
not normative/official.
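For implementers, a parser for the x0213.org table might look like the following Python sketch. The file layout is an assumption here: one mapping per line, a code label such as `3-2E22`, whitespace, a `U+` field, and an optional `#` comment, with multi-code-point mappings written as `U+3051+309A`. Check this against the actual file before relying on it. `parse_mapping` and `parse_unicode_field` are hypothetical names; the test entry uses Martin's 3-2E22 / U+2000B example from this thread.

```python
from typing import Dict, Iterable


def parse_unicode_field(field: str) -> str:
    """Turn "U+2000B" or "U+3051+309A" into the corresponding string."""
    if not field.startswith("U+"):
        raise ValueError(f"unexpected Unicode field: {field!r}")
    return "".join(chr(int(part, 16)) for part in field[2:].split("+"))


def parse_mapping(lines: Iterable[str]) -> Dict[str, str]:
    """Map code labels such as "3-2E22" to Unicode strings.

    Assumed layout (unverified): label and Unicode field separated by
    whitespace, "#" starting a comment, and positions without a "U+"
    field treated as unmapped and skipped.
    """
    table: Dict[str, str] = {}
    for line in lines:
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and fields[1].startswith("U+"):
            table[fields[0]] = parse_unicode_field(fields[1])
    return table
```

With a local copy of the file, `with open(path, encoding="utf-8") as f: table = parse_mapping(f)` would load the whole table. Keeping values as strings rather than single code points leaves room for the two-code-point sequences that JIS X 0213:2004 uses for some kana.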

>>The 2000 version of JIS X 0213 also contains ISO-2022-JP-3.
>>What's the reason for leaving that out of the registration?
>>Is the reason that there were changes in the Unicode mappings
>>of JIS X 0213?
>
>I thought I would register them one by one. First ISO-2022-JP-2004,
>second EUC-JIS-2004, third Shift_JIS-2004, fourth ISO-2022-JP-3,
>fifth EUC-JISX0213, sixth and the last Shift_JISX0213. The latter
>three encodings' "Intended usage" should be "OBSOLETE".

Okay, that makes sense. But probably, it's better to speed
things up a bit by doing EUC-JIS-2004 and Shift_JIS-2004
together, and later doing all the obsolete ones together.

By the way, you said
Intended usage: COMMON
in your initial registration form.

Regarding this, I find the following in http://www.ietf.org/rfc/rfc2978.txt:
A charset should therefore be registered ONLY if it adds significant
functionality that is valuable to a large community, OR if it
documents existing practice in a large community. Note that charsets
registered for the second reason should be explicitly marked as being
of limited or specialized use and should only be used in Internet
messages with prior bilateral agreement.
It sounds to me as if
Intended usage: LIMITED
might fit better. But I'm not totally familiar with the
usage patterns.


>>Are there any changes in Unicode/ISO 10646 mappings between
>>2003 and 2004? If yes, what?
>
>Do you mean "between 2000 and 2004"?

No, I explicitly wanted to ask for changes from 2003 to 2004.
The reason why I asked was that if the 2003 version introduces
the labels ISO-2022-JP-2003 and ISO-2022-JP-3-2003 (which you
propose to use as aliases), and there were no changes between
2003 and 2004, it is difficult to explain why to also introduce
the label ISO-2022-JP-2004.

In general, if a standard is updated or republished, there
are either changes that warrant a different label, in which
case the old labels should not be used as aliases, or there
are no relevant changes, and in this case, introducing
a new label is a bad idea.


>If so, I say yes. UCS for
>2-93-27 was changed from 9B1D into 9B1C (well, the codepoint has
>much complicated history).

This looks like a near miss, based on two glyph shapes that look
very similar, with components that are at least occasionally
used interchangeably.

As far as I'm aware, there were much more drastic changes between
2000 and 2004. As an example, the 2000 version gives (31D3)
as the Unicode/ISO 10646 codepoint for a hiragana "ke" with a small ring.
This has been corrected to a composition of U+3051 and U+309A
in the 2004 version. There are five such hiragana examples and
nine katakana examples. There are also some Latin and Greek characters
with similar changes.
Also, the circled numbers from 21 to 50 have different Unicode/ISO
10646 mappings in the 2000 version and in the 2004 version.
And then there are quite a number of Kanji (I haven't counted them)
that contain some mappings in the 2000 version that had to be
fixed. As an example, the character numbered as 3-2E22 in
http://x0213.org/codetable/iso-2022-jp-2004-std.txt has a
code-point (AAA2) in the 2000 version, but the actual character
in Unicode is at U+2000B, and the 2004 version corrects this.
Basically, all Unicode code points that the 2000 version put into
parentheses were subject to change (and most of them actually
changed).
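The "ke with a small ring" case shows why a decoder following the 2004 mappings sometimes has to output a sequence rather than a single code point. A quick check with Python's standard `unicodedata` module confirms that Unicode has no precomposed form for this pair, in contrast to "ka" plus the combining voiced sound mark, which composes to が (U+304C).

```python
import unicodedata

# JIS X 0213:2004 maps hiragana "ke" with the small ring (handakuten) to a
# two-code-point sequence: U+3051 followed by U+309A.
ke_with_ring = "\u3051\u309a"
for ch in ke_with_ring:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# NFC cannot combine the pair because Unicode has no precomposed
# character for it, so the decoded text stays two code points long.
print(len(unicodedata.normalize("NFC", ke_with_ring)))

# Contrast: "ka" + combining voiced sound mark does compose, to U+304C.
print(unicodedata.normalize("NFC", "\u304b\u3099") == "\u304c")
```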

So I agree with your assessment that any labels that refer to
the 2000 version should be classified as "Obsolete".

>Furthermore, the 2004 version of
>JIS X 0213 includes ten more characters than the 2000 version.

Were these new characters introduced in 2003 or in 2004?


Regards, Martin.


Erik van der Poel
2006-10-01 17:31:07 UTC
Hello,

I have a few questions about this registration:

> At 00:28 06/09/28, Koichi Yasuoka wrote:
> >=?ISO-2022-JP-2004?Q?=1B=24B0B2=2C9=270l=1B=28B?=

I believe that, in general, many of us recommend being conservative in
what you send out, liberal in what you accept. Therefore, the
recommendation is to use the charset label that matches the smallest
subset of characters actually used in the text, as well as using the
oldest and/or most commonly accepted name. In this case, you are
clearly using the ESC $ B (1B 24 42) that is part of iso-2022-jp (rfc
1468). Therefore, the more conservative option is to use the name
iso-2022-jp when sending this particular piece of text.
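Erik's rule can be applied mechanically by trying the older label first. The sketch below uses Python's standard `iso2022_jp` and `iso2022_jp_2004` codecs; `encode_conservatively` is a hypothetical helper, and falling back only on `UnicodeEncodeError` is one reasonable policy among several.

```python
from typing import Tuple


def encode_conservatively(text: str) -> Tuple[str, bytes]:
    """Return (charset label, encoded bytes), preferring ISO-2022-JP.

    Hypothetical helper: ISO-2022-JP (RFC 1468) is tried first because it
    is the older, more widely accepted label; ISO-2022-JP-2004 is used only
    when the text contains characters outside the ISO-2022-JP repertoire.
    """
    try:
        return "ISO-2022-JP", text.encode("iso2022_jp")
    except UnicodeEncodeError:
        return "ISO-2022-JP-2004", text.encode("iso2022_jp_2004")


print(encode_conservatively("安岡孝一"))  # JIS X 0208 kanji only
print(encode_conservatively("①"))  # circled digit one: not in JIS X 0208
```

With this policy, text that happens to fit JIS X 0208, such as Koichi's name, always goes out under the plain ISO-2022-JP label.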

I have noticed over the years that if you don't spell out the
recommendations, implementors will do the wrong thing. In this case,
would it be a good idea to add such recommendations to the
registration itself? Or should a new RFC be written, in order to
provide the recommendations in more detail?

> >I understand that ISO-2022-JP texts with "ESC $ B" and
> >"ESC ( B" can be accepted by ISO-2022-JP-2004 decoder.
> >It is problematic when "ESC $ @" or "ESC ( J" is used but
> >they are very rare now.

Which escape sequences are permitted in iso-2022-jp-2004? There are 3
problems with the link you sent earlier*: The first page is in
Japanese, and when you search for X0213, the results are in Japanese
too. Then X0213 is split into many PDFs, and it is not clear which one
to download in order to see the escape sequences, nor am I inclined to
download all of the pieces. Finally, that site was down yesterday and
up today. How often does it go down?

* http://www.jisc.go.jp/app/JPS/JPSO0020.html

> >Now I know "ISO-2022-JP-2004 vs Unicode
> >mapping table" at http://x0213.org/codetable/iso-2022-jp-2004-std.txt

I wonder whether either or both of these links would be good to have
in the registration:

http://www.itscj.ipsj.or.jp/ISO-IR/233.pdf
http://www.itscj.ipsj.or.jp/ISO-IR/

Erik van der Poel
Editor and co-author of RFC 1468 (iso-2022-jp)
Martin Duerst
2006-10-02 08:06:01 UTC
At 02:31 06/10/02, Erik van der Poel wrote:
>Hello,

Hello Erik,

Many thanks for your questions.

>I have a few questions about this registration:
>
>> At 00:28 06/09/28, Koichi Yasuoka wrote:
>> >=?ISO-2022-JP-2004?Q?=1B=24B0B2=2C9=270l=1B=28B?=
>
>I believe that, in general, many of us recommend being conservative in
>what you send out, liberal in what you accept. Therefore, the
>recommendation is to use the charset label that matches the smallest
>subset of characters actually used in the text, as well as using the
>oldest and/or most commonly accepted name. In this case, you are
>clearly using the ESC $ B (1B 24 42) that is part of iso-2022-jp (rfc
>1468). Therefore, the more conservative option is to use the name
>iso-2022-jp when sending this particular piece of text.

Yes indeed. In this specific case, I think Koichi didn't mean
to suggest that one necessarily should label such data as
iso-2022-jp-2004, but just used his name as an example to
answer my question on why B encoding was preferred to Q encoding.


>I have noticed over the years that if you don't spell out the
>recommendations, implementors will do the wrong thing. In this case,
>would it be a good idea to add such recommendations to the
>registration itself? Or should a new RFC be written, in order to
>provide the recommendations in more detail?

I agree that the registration should give some information
about how this new encoding relates to iso-2022-jp.


>> >I understand that ISO-2022-JP texts with "ESC $ B" and
>> >"ESC ( B" can be accepted by ISO-2022-JP-2004 decoder.
>> >It is problematic when "ESC $ @" or "ESC ( J" is used but
>> >they are very rare now.

On reconsideration, I'm not sure I can agree with this
statement of Koichi. What I found is that JIS X 0213 contains
a list of characters that are not supposed to be sent with
ESC $ B. This list, as far as I was able to check, includes
all the new additions to the code table, but it also contains
quite a few characters already in JIS X 0208 (the base for
iso-2022-jp). I haven't yet found something that says that
although these characters are not supposed to be sent, they
nevertheless have to be accepted. Therefore, I think the
above statement is doubtful.

>Which escape sequences are permitted in iso-2022-jp-2004?

- ESC ( B for ISO/IEC 646 IRV
- ESC $ ( O for the full plane 1 of JIS X 0213
- ESC $ ( P for plane 2 of JIS X 0213
- ESC $ ( B for a subset of plane 1 of JIS X 0213
(also a subset of the plane/table from JIS X 0208)

>There are 3
>problems with the link you sent earlier*: The first page is in
>Japanese, and when you search for X0213, the results are in Japanese
>too. Then X0213 is split into many PDFs, and it is not clear which one
>to download in order to see the escape sequences, nor am I inclined to
>download all of the pieces.

Giving URIs to specific parts of the documents, with explanations,
would certainly be appreciated.

>Finally, that site was down yesterday and
>up today. How often does it go down?
>
>* http://www.jisc.go.jp/app/JPS/JPSO0020.html
>
>> >Now I know "ISO-2022-JP-2004 vs Unicode
>> >mapping table" at http://x0213.org/codetable/iso-2022-jp-2004-std.txt
>
>I wonder whether either or both of these links would be good to have
>in the registration:
>
>http://www.itscj.ipsj.or.jp/ISO-IR/233.pdf
>http://www.itscj.ipsj.or.jp/ISO-IR/

The first one certainly would be good to have.
The second one is too general.

Regards, Martin.

>Erik van der Poel
>Editor and co-author of RFC 1468 (iso-2022-jp)


Erik van der Poel
2006-10-02 17:56:54 UTC
Now that we have reached the letter O, we should probably use hex to
avoid confusion with the digit 0?

ESC $ ( O -> 1B 24 28 4F or \x1B\x24\x28\x4F (C/C++/etc)?

ISO 2022 writes each byte as column/row, i.e. its high and low 4-bit
halves (nibbles, or "quartets") in decimal, separated by a slash: O -> 4/15

However, most of the character encoding experts use hex, so we should too?

Anyway, I think iso-2022-jp-2004 may be using the latest registration:

http://www.itscj.ipsj.or.jp/ISO-IR/233.pdf

Which says that the final byte in the escape sequence is 5/1, i.e. \x51 == 'Q'
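The three notations Erik mentions are mechanical transformations of the same bytes. A small Python sketch (the function name `notations` is illustrative) prints ESC $ ( O and ESC $ ( Q each way, which makes the O/0 distinction and the 5/1 final byte easy to check.

```python
from typing import Dict


def notations(seq: bytes) -> Dict[str, str]:
    """Render bytes as hex, as a C string literal, and as ISO 2022 column/row."""
    return {
        "hex": " ".join(f"{b:02X}" for b in seq),
        "c": "".join(f"\\x{b:02X}" for b in seq),
        # Column/row: high 4 bits and low 4 bits, each written in decimal.
        "iso2022": " ".join(f"{b >> 4}/{b & 0x0F}" for b in seq),
    }


for esc in (b"\x1b$(O", b"\x1b$(Q"):
    print(notations(esc))
```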

Also, I am quite concerned about the "full plane 1" and "subset of
plane 1" mentioned below, and the use of ESC $ ( B instead of the more
traditional ESC $ B.

Erik

On 10/2/06, Martin Duerst <***@it.aoyama.ac.jp> wrote:
> At 02:31 06/10/02, Erik van der Poel wrote:
> >Which escape sequences are permitted in iso-2022-jp-2004?
>
> - ESC ( B for ISO/IEC 646 IRV
> - ESC $ ( O for the full plane 1 of JIS X 0213
> - ESC $ ( P for plane 2 of JIS X 0213
> - ESC $ ( B for a subset of plane 1 of JIS X 0213
> (also a subset of the plane/table from JIS X 0208)
Koichi Yasuoka
2006-10-03 00:21:19 UTC
Dear Martin,

Just a quick reply to correct the escape-sequences.

>>Which escape sequences are permitted in iso-2022-jp-2004?
>
>- ESC ( B for ISO/IEC 646 IRV
>- ESC $ ( O for the full plane 1 of JIS X 0213
>- ESC $ ( P for plane 2 of JIS X 0213
>- ESC $ ( B for a subset of plane 1 of JIS X 0213
> (also a subset of the plane/table from JIS X 0208)

No. In ISO-2022-JP-2004, we mainly use "ESC $ ( Q" and never use
"ESC $ ( B".

- ESC ( B for ISO/IEC 646 IRV
- ESC $ ( Q for the full plane 1 of JIS X 0213:2004
- ESC $ ( P for plane 2 of JIS X 0213
- ESC $ ( O for a subset of plane 1 of JIS X 0213
(20 characters omitted)
- ESC $ B for a subset of plane 1 of JIS X 0213
(also a subset of the plane/table from JIS X 0208)
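Koichi's corrected list translates directly into a lookup table. The Python sketch below pairs each escape sequence with its meaning and scans a byte string for designations; rejecting anything outside the list (for example ESC ( J or ESC $ @) is a strictness choice made for this illustration, not something the registration states. Scanning for ESC works because double-byte character data is restricted to 0x21-0x7E. `DESIGNATIONS` and `designations_used` are hypothetical names.

```python
import re
from typing import List, Tuple

# Designations permitted in ISO-2022-JP-2004, following Koichi's list.
DESIGNATIONS = {
    b"\x1b(B": "ISO/IEC 646 IRV (ASCII)",
    b"\x1b$(Q": "JIS X 0213:2004 plane 1 (full)",
    b"\x1b$(P": "JIS X 0213 plane 2",
    b"\x1b$(O": "JIS X 0213 plane 1, 20 characters omitted",
    b"\x1b$B": "JIS X 0208 subset of JIS X 0213 plane 1",
}

# ESC, optional "$" (multi-byte set), optional "(", then a final byte.
ESCAPE_RE = re.compile(rb"\x1b\$?\(?[\x40-\x7e]")


def designations_used(raw: bytes) -> List[Tuple[bytes, str]]:
    """Return (escape, meaning) pairs in order; raise on any other escape."""
    found = []
    for match in ESCAPE_RE.finditer(raw):
        esc = match.group()
        if esc not in DESIGNATIONS:
            raise ValueError(f"escape {esc!r} is not in the ISO-2022-JP-2004 list")
        found.append((esc, DESIGNATIONS[esc]))
    return found


sample = b"abc \x1b$(Q\x30\x42\x1b(B def"
for esc, meaning in designations_used(sample):
    print(esc, meaning)
```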

Best Regards,
Koichi Yasuoka
Martin Duerst
2006-10-03 10:05:24 UTC
Hello Koichi,

Thanks for the correction. This information is very
helpful, and it would be good to have it in the registration
form.

Regards, Martin.



Erik van der Poel
2006-10-05 15:28:55 UTC
Hello,

Is ESC $ ( O for the full plane 1 of JIS X 0213:2000? If so, it might
be clearer to write it that way. Is ESC $ B for the full JIS X 0208?
If so, that would be clearer too. If ESC $ B is only to be used for a
subset of JIS X 0208, it should state which subset that is.

Erik

Koichi Yasuoka
2006-10-06 00:47:21 UTC
Dear Erik,

>>- ESC ( B for ISO/IEC 646 IRV
>>- ESC $ ( Q for the full plane 1 of JIS X 0213:2004
>>- ESC $ ( P for plane 2 of JIS X 0213
>>- ESC $ ( O for a subset of plane 1 of JIS X 0213
>> (20 characters omitted)
>>- ESC $ B for a subset of plane 1 of JIS X 0213
>> (also a subset of the plane/table from JIS X 0208)

>Is ESC $ ( O for the full plane 1 of JIS X 0213:2000?

No. As shown on page 20 of JIS X 0213:2004, 20 characters are
omitted: 10 of them (marked *) were already included in
JIS X 0213:2000, and the other 10 are the complete set of
characters "added" in 2004:

1-14-1, 1-15-94, 1-17-19(*), 1-22-70(*), 1-23-50(*), 1-28-24(*),
1-33-73(*), 1-38-61(*), 1-39-77(*), 1-47-52, 1-47-94, 1-53-11(*),
1-54-2(*), 1-54-85(*), 1-84-7, and 1-94-90 to 1-94-94.
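The row-cell ("kuten") positions in these lists map directly to the two bytes sent after an escape sequence: each byte is the row or cell number plus 0x20. The following Python sketch encodes that rule together with the 20 omitted positions from Koichi's list; `kuten_to_bytes` and `allowed_under_o` are hypothetical helpers for an encoder or validator.

```python
from typing import Set, Tuple

# The 20 plane-1 positions (row, cell) excluded under ESC $ ( O, per the list above.
OMITTED_UNDER_O: Set[Tuple[int, int]] = {
    (14, 1), (15, 94), (17, 19), (22, 70), (23, 50), (28, 24),
    (33, 73), (38, 61), (39, 77), (47, 52), (47, 94), (53, 11),
    (54, 2), (54, 85), (84, 7),
} | {(94, cell) for cell in range(90, 95)}


def kuten_to_bytes(row: int, cell: int) -> bytes:
    """Convert a row-cell position to its two 7-bit code bytes."""
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError("row and cell must be in 1..94")
    return bytes([0x20 + row, 0x20 + cell])


def allowed_under_o(row: int, cell: int) -> bool:
    """True if the plane-1 position may be sent after ESC $ ( O."""
    return (row, cell) not in OMITTED_UNDER_O


print(len(OMITTED_UNDER_O), kuten_to_bytes(14, 1))
```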

>Is ESC $ B for the full JIS X 0208?

No. As shown on page 64 of JIS X 0213:2000 and page 20 of
JIS X 0213:2004, the following characters are omitted from
JIS X 0213:2004, and the result is a subset of JIS X 0208:1997:

1-2-15 to 1-3-15, 1-3-26 to 1-3-32, 1-3-59 to 1-3-64,
1-3-91 to 1-3-94, 1-4-84 to 1-4-91, 1-5-87 to 1-5-94,
1-6-25 to 1-6-32, 1-6-57 to 1-6-94, 1-7-34 to 1-7-48,
1-7-82 to 1-8-62, 1-8-71 to 1-8-92, 1-9-1 to 1-12-83,
1-12-93 to 1-13-55, 1-13-63 to 1-13-79, 1-13-83, 1-13-88,
1-13-89, 1-13-93, 1-13-94, 1-14-1 to 1-15-94, 1-16-2, 1-16-19,
1-16-79, 1-17-19, 1-17-58, 1-17-75, 1-17-79, 1-18-3, 1-18-9,
1-18-10, 1-18-11, 1-18-25, 1-18-50, 1-18-89, 1-19-4, 1-19-20,
1-19-21, 1-19-34, 1-19-41, 1-19-69, 1-19-73, 1-19-76, 1-19-86,
1-19-90, 1-20-18, 1-20-33, 1-20-35, 1-20-50, 1-20-79, 1-20-91,
1-21-7, 1-21-85, 1-22-2, 1-22-31, 1-22-33, 1-22-38, 1-22-48,
1-22-64, 1-22-70, 1-22-77, 1-23-16, 1-23-39, 1-23-50, 1-23-59,
1-23-66, 1-24-6, 1-24-20, 1-25-60, 1-25-77, 1-25-82, 1-25-85,
1-27-6, 1-27-67, 1-27-75, 1-28-24, 1-28-40, 1-28-41, 1-28-49,
1-28-50, 1-28-52, 1-29-11, 1-29-13, 1-29-43, 1-29-75, 1-29-77,
1-29-79, 1-29-80, 1-29-84, 1-30-36, 1-30-45, 1-30-53, 1-30-63,
1-30-85, 1-31-32, 1-31-57, 1-32-5, 1-32-65, 1-32-70, 1-33-8,
1-33-36, 1-33-46, 1-33-56, 1-33-63, 1-33-67, 1-33-73, 1-33-93,
1-33-94, 1-34-3, 1-34-8, 1-34-45, 1-34-86, 1-35-18, 1-35-29,
1-35-86, 1-35-88, 1-36-7, 1-36-8, 1-36-45, 1-36-47, 1-36-59,
1-36-87, 1-37-22, 1-37-31, 1-37-52, 1-37-55, 1-37-78, 1-37-83,
1-37-88, 1-38-33, 1-38-34, 1-38-45, 1-38-61, 1-38-81, 1-38-86,
1-39-25, 1-39-63, 1-39-72, 1-39-77, 1-40-14, 1-40-16, 1-40-43,
1-40-53, 1-40-60, 1-40-74, 1-41-16, 1-41-48, 1-41-49, 1-41-50,
1-41-51, 1-41-78, 1-42-1, 1-42-27, 1-42-29, 1-42-57, 1-42-66,
1-43-43, 1-43-47, 1-43-72, 1-43-74, 1-43-89, 1-44-40, 1-44-45,
1-44-65, 1-44-89, 1-45-20, 1-45-58, 1-45-73, 1-45-74, 1-45-83,
1-46-20, 1-46-26, 1-46-48, 1-46-62, 1-46-64, 1-46-81, 1-46-82,
1-46-93, 1-47-3, 1-47-13, 1-47-15, 1-47-22, 1-47-25, 1-47-26,
1-47-31, 1-47-52 to 1-47-94, 1-48-54, 1-52-68, 1-53-11, 1-54-2,
1-54-85, 1-57-88, 1-58-25, 1-59-56, 1-59-77, 1-62-25, 1-62-85,
1-63-70, 1-64-86, 1-66-72, 1-66-74, 1-67-62, 1-68-38, 1-73-2,
1-73-14, 1-73-58, 1-74-4, 1-75-61, 1-76-45, 1-77-78, 1-80-55,
1-80-84, 1-82-45, 1-82-84, and 1-84-1 to 1-94-94.
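Ranges in this list run in row-major order, so "1-2-15 to 1-3-15" covers cells 15 to 94 of row 2 and cells 1 to 15 of row 3. The following sketch (hypothetical function names; it assumes exactly the "a, b to c, and d." layout used here) expands such a list into explicit (row, cell) pairs, which is handier for checking whether a given position may be sent after ESC $ B:

```python
import re
from typing import Set, Tuple


def _linear(prc: str) -> int:
    """Row-major index of a plane-1 position: (row-1)*94 + (cell-1)."""
    plane, row, cell = (int(part) for part in prc.split("-"))
    if plane != 1 or not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError(f"unexpected code point: {prc}")
    return (row - 1) * 94 + (cell - 1)


def expand_omission_list(text: str) -> Set[Tuple[int, int]]:
    """Expand 'a to b, c, and d to e.' style text into (row, cell) pairs."""
    result: Set[Tuple[int, int]] = set()
    flat = " ".join(text.split())  # join the wrapped lines
    for item in re.split(r",\s*", flat):
        item = re.sub(r"^and\s+", "", item.strip()).rstrip(".")
        if not item:
            continue
        ends = [part.strip() for part in item.split(" to ")]
        first, last = _linear(ends[0]), _linear(ends[-1])
        for index in range(first, last + 1):
            result.add((index // 94 + 1, index % 94 + 1))
    return result
```

Pasting the full list above into expand_omission_list gives the set of plane-1 positions that must not be sent after ESC $ B.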

Best Regards,
Koichi Yasuoka
Erik van der Poel
2006-10-07 19:04:13 UTC
Thanks, Yasuoka-san.

Comments interspersed below.

On 10/5/06, Koichi Yasuoka <***@kanji.zinbun.kyoto-u.ac.jp> wrote:
> Dear Erik,
>
> >>- ESC ( B for ISO/IEC 646 IRV
> >>- ESC $ ( Q for the full plane 1 of JIS X 0213:2004
> >>- ESC $ ( P for plane 2 of JIS X 0213
> >>- ESC $ ( O for a subset of plane 1 of JIS X 0213
> >> (20 characters omitted)
> >>- ESC $ B for a subset of plane 1 of JIS X 0213
> >> (also a subset of the plane/table from JIS X 0208)
>
> >Is ESC $ ( O for the full plane 1 of JIS X 0213:2000?
>
> No. As shown in page 20 of JIS X 0213:2004, 20 characters,
> of which 10 characters(*) were included in JIS X 0213:2000
> and the other 10 characters are the full set of the "added
> characters" in 2004, are omitted:
>
> 1-14-1, 1-15-94, 1-17-19(*), 1-22-70(*), 1-23-50(*), 1-28-24(*),
> 1-33-73(*), 1-38-61(*), 1-39-77(*), 1-47-52, 1-47-94, 1-53-11(*),
> 1-54-2(*), 1-54-85(*), 1-84-7, and 1-94-90 to 1-94-94.

OK, I have confirmed the ones without the asterisk(*) on page 32,
section 33 of JIS X 0213:2004 Amendment 1.

I have also confirmed the entire set of 20 (both with and without
asterisk) on page 20, section 21 of JIS X 0213:2004 Amendment 1. It
says that these characters are not used with the escape sequences 1B
24 42 and 1B 24 28 4F.
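The hex form used here and the ESC notation used earlier in the thread describe the same bytes: 1B 24 42 is ESC $ B and 1B 24 28 4F is ESC $ ( O. A small sketch for converting between the two notations (the function names are illustrative only):

```python
def escape_to_notation(hex_bytes: str) -> str:
    """Render an escape sequence given in hex, e.g. '1B 24 28 4F', as 'ESC $ ( O'."""
    return " ".join("ESC" if b == 0x1B else chr(b) for b in bytes.fromhex(hex_bytes))


def notation_to_hex(notation: str) -> str:
    """Inverse direction: 'ESC $ B' becomes '1B 24 42'."""
    return " ".join("1B" if tok == "ESC" else f"{ord(tok):02X}" for tok in notation.split())
```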

To find JIS X 0213:2004 Amendment 1, you start at the following URL
and enter X0213 in the first text box:

http://www.jisc.go.jp/app/JPS/JPSO0020.html

This will take you to the following page:

http://www.jisc.go.jp/app/pager?id=16428

Then you click on the PDF X0213_20 to get the amendment.

I have not been able to confirm the characters below. Are there any
URLs to confirm these? The site seems to be down again. Do they
perform maintenance every Saturday night?

Erik

> >Is ESC $ B for the full JIS X 0208?
>
> No. As shown in page 64 of JIS X 0213:2000 and page 20 of
> JIS X 0213:2004, following characters are omitted from JIS
> X 0213:2004, and the result is a subset of JIS X 0208:1997:
>
> 1-2-15 to 1-3-15, 1-3-26 to 1-3-32, 1-3-59 to 1-3-64,
> 1-3-91 to 1-3-94, 1-4-84 to 1-4-91, 1-5-87 to 1-5-94,
> 1-6-25 to 1-6-32, 1-6-57 to 1-6-94, 1-7-34 to 1-7-48,
> 1-7-82 to 1-8-62, 1-8-71 to 1-8-92, 1-9-1 to 1-12-83,
> 1-12-93 to 1-13-55, 1-13-63 to 1-13-79, 1-13-83, 1-13-88,
> 1-13-89, 1-13-93, 1-13-94, 1-14-1 to 1-15-94, 1-16-2, 1-16-19,
> 1-16-79, 1-17-19, 1-17-58, 1-17-75, 1-17-79, 1-18-3, 1-18-9,
> 1-18-10, 1-18-11, 1-18-25, 1-18-50, 1-18-89, 1-19-4, 1-19-20,
> 1-19-21, 1-19-34, 1-19-41, 1-19-69, 1-19-73, 1-19-76, 1-19-86,
> 1-19-90, 1-20-18, 1-20-33, 1-20-35, 1-20-50, 1-20-79, 1-20-91,
> 1-21-7, 1-21-85, 1-22-2, 1-22-31, 1-22-33, 1-22-38, 1-22-48,
> 1-22-64, 1-22-70, 1-22-77, 1-23-16, 1-23-39, 1-23-50, 1-23-59,
> 1-23-66, 1-24-6, 1-24-20, 1-25-60, 1-25-77, 1-25-82, 1-25-85,
> 1-27-6, 1-27-67, 1-27-75, 1-28-24, 1-28-40, 1-28-41, 1-28-49,
> 1-28-50, 1-28-52, 1-29-11, 1-29-13, 1-29-43, 1-29-75, 1-29-77,
> 1-29-79, 1-29-80, 1-29-84, 1-30-36, 1-30-45, 1-30-53, 1-30-63,
> 1-30-85, 1-31-32, 1-31-57, 1-32-5, 1-32-65, 1-32-70, 1-33-8,
> 1-33-36, 1-33-46, 1-33-56, 1-33-63, 1-33-67, 1-33-73, 1-33-93,
> 1-33-94, 1-34-3, 1-34-8, 1-34-45, 1-34-86, 1-35-18, 1-35-29,
> 1-35-86, 1-35-88, 1-36-7, 1-36-8, 1-36-45, 1-36-47, 1-36-59,
> 1-36-87, 1-37-22, 1-37-31, 1-37-52, 1-37-55, 1-37-78, 1-37-83,
> 1-37-88, 1-38-33, 1-38-34, 1-38-45, 1-38-61, 1-38-81, 1-38-86,
> 1-39-25, 1-39-63, 1-39-72, 1-39-77, 1-40-14, 1-40-16, 1-40-43,
> 1-40-53, 1-40-60, 1-40-74, 1-41-16, 1-41-48, 1-41-49, 1-41-50,
> 1-41-51, 1-41-78, 1-42-1, 1-42-27, 1-42-29, 1-42-57, 1-42-66,
> 1-43-43, 1-43-47, 1-43-72, 1-43-74, 1-43-89, 1-44-40, 1-44-45,
> 1-44-65, 1-44-89, 1-45-20, 1-45-58, 1-45-73, 1-45-74, 1-45-83,
> 1-46-20, 1-46-26, 1-46-48, 1-46-62, 1-46-64, 1-46-81, 1-46-82,
> 1-46-93, 1-47-3, 1-47-13, 1-47-15, 1-47-22, 1-47-25, 1-47-26,
> 1-47-31, 1-47-52 to 1-47-94, 1-48-54, 1-52-68, 1-53-11, 1-54-2,
> 1-54-85, 1-57-88, 1-58-25, 1-59-56, 1-59-77, 1-62-25, 1-62-85,
> 1-63-70, 1-64-86, 1-66-72, 1-66-74, 1-67-62, 1-68-38, 1-73-2,
> 1-73-14, 1-73-58, 1-74-4, 1-75-61, 1-76-45, 1-77-78, 1-80-55,
> 1-80-84, 1-82-45, 1-82-84, and 1-84-1 to 1-94-94.
>
> Best Regards,
> Koichi Yasuoka
Koichi Yasuoka
2006-10-10 11:51:09 UTC
Dear Erik,

>I have not been able to confirm the characters below. Are there any
>URLs to confirm these?

>>>Is ESC $ B for the full JIS X 0208?

>> No. As shown in page 64 of JIS X 0213:2000 and page 20 of
>> JIS X 0213:2004, following characters are omitted from JIS
>> X 0213:2004, and the result is a subset of JIS X 0208:1997:
>>
>> 1-2-15 to 1-3-15, 1-3-26 to 1-3-32, 1-3-59 to 1-3-64,
>> 1-3-91 to 1-3-94, 1-4-84 to 1-4-91, 1-5-87 to 1-5-94,
>> 1-6-25 to 1-6-32, 1-6-57 to 1-6-94, 1-7-34 to 1-7-48,
>> 1-7-82 to 1-8-62, 1-8-71 to 1-8-92, 1-9-1 to 1-12-83,
>> 1-12-93 to 1-13-55, 1-13-63 to 1-13-79, 1-13-83, 1-13-88,
>> 1-13-89, 1-13-93, 1-13-94, 1-14-1 to 1-15-94, 1-16-2, 1-16-19,
>> 1-16-79, 1-17-19, 1-17-58, 1-17-75, 1-17-79, 1-18-3, 1-18-9,
>> 1-18-10, 1-18-11, 1-18-25, 1-18-50, 1-18-89, 1-19-4, 1-19-20,
>> 1-19-21, 1-19-34, 1-19-41, 1-19-69, 1-19-73, 1-19-76, 1-19-86,
>> 1-19-90, 1-20-18, 1-20-33, 1-20-35, 1-20-50, 1-20-79, 1-20-91,
>> 1-21-7, 1-21-85, 1-22-2, 1-22-31, 1-22-33, 1-22-38, 1-22-48,
>> 1-22-64, 1-22-70, 1-22-77, 1-23-16, 1-23-39, 1-23-50, 1-23-59,
>> 1-23-66, 1-24-6, 1-24-20, 1-25-60, 1-25-77, 1-25-82, 1-25-85,
>> 1-27-6, 1-27-67, 1-27-75, 1-28-24, 1-28-40, 1-28-41, 1-28-49,
>> 1-28-50, 1-28-52, 1-29-11, 1-29-13, 1-29-43, 1-29-75, 1-29-77,
>> 1-29-79, 1-29-80, 1-29-84, 1-30-36, 1-30-45, 1-30-53, 1-30-63,
>> 1-30-85, 1-31-32, 1-31-57, 1-32-5, 1-32-65, 1-32-70, 1-33-8,
>> 1-33-36, 1-33-46, 1-33-56, 1-33-63, 1-33-67, 1-33-73, 1-33-93,
>> 1-33-94, 1-34-3, 1-34-8, 1-34-45, 1-34-86, 1-35-18, 1-35-29,
>> 1-35-86, 1-35-88, 1-36-7, 1-36-8, 1-36-45, 1-36-47, 1-36-59,
>> 1-36-87, 1-37-22, 1-37-31, 1-37-52, 1-37-55, 1-37-78, 1-37-83,
>> 1-37-88, 1-38-33, 1-38-34, 1-38-45, 1-38-61, 1-38-81, 1-38-86,
>> 1-39-25, 1-39-63, 1-39-72, 1-39-77, 1-40-14, 1-40-16, 1-40-43,
>> 1-40-53, 1-40-60, 1-40-74, 1-41-16, 1-41-48, 1-41-49, 1-41-50,
>> 1-41-51, 1-41-78, 1-42-1, 1-42-27, 1-42-29, 1-42-57, 1-42-66,
>> 1-43-43, 1-43-47, 1-43-72, 1-43-74, 1-43-89, 1-44-40, 1-44-45,
>> 1-44-65, 1-44-89, 1-45-20, 1-45-58, 1-45-73, 1-45-74, 1-45-83,
>> 1-46-20, 1-46-26, 1-46-48, 1-46-62, 1-46-64, 1-46-81, 1-46-82,
>> 1-46-93, 1-47-3, 1-47-13, 1-47-15, 1-47-22, 1-47-25, 1-47-26,
>> 1-47-31, 1-47-52 to 1-47-94, 1-48-54, 1-52-68, 1-53-11, 1-54-2,
>> 1-54-85, 1-57-88, 1-58-25, 1-59-56, 1-59-77, 1-62-25, 1-62-85,
>> 1-63-70, 1-64-86, 1-66-72, 1-66-74, 1-67-62, 1-68-38, 1-73-2,
>> 1-73-14, 1-73-58, 1-74-4, 1-75-61, 1-76-45, 1-77-78, 1-80-55,
>> 1-80-84, 1-82-45, 1-82-84, and 1-84-1 to 1-94-94.

Well, please check page 5 of X0213_05 at the
http://www.jisc.go.jp/app/pager?id=xxxxx
for JIS X 0213.

Best Regards,
Koichi Yasuoka
Erik van der Poel
2006-10-10 13:55:39 UTC
Thanks again. Comments at the bottom.

On 10/7/06, Erik van der Poel <***@google.com> wrote:
> Thanks, Yasuoka-san.
>
> Comments interspersed below.
>
> On 10/5/06, Koichi Yasuoka <***@kanji.zinbun.kyoto-u.ac.jp> wrote:
> > Dear Erik,
> >
> > >>- ESC ( B for ISO/IEC 646 IRV
> > >>- ESC $ ( Q for the full plane 1 of JIS X 0213:2004
> > >>- ESC $ ( P for plane 2 of JIS X 0213
> > >>- ESC $ ( O for a subset of plane 1 of JIS X 0213
> > >> (20 characters omitted)
> > >>- ESC $ B for a subset of plane 1 of JIS X 0213
> > >> (also a subset of the plane/table from JIS X 0208)
> >
> > >Is ESC $ ( O for the full plane 1 of JIS X 0213:2000?
> >
> > No. As shown in page 20 of JIS X 0213:2004, 20 characters,
> > of which 10 characters(*) were included in JIS X 0213:2000
> > and the other 10 characters are the full set of the "added
> > characters" in 2004, are omitted:
> >
> > 1-14-1, 1-15-94, 1-17-19(*), 1-22-70(*), 1-23-50(*), 1-28-24(*),
> > 1-33-73(*), 1-38-61(*), 1-39-77(*), 1-47-52, 1-47-94, 1-53-11(*),
> > 1-54-2(*), 1-54-85(*), 1-84-7, and 1-94-90 to 1-94-94.
>
> OK, I have confirmed the ones without the asterisk(*) on page 32,
> section 33 of JIS X 0213:2004 Amendment 1.
>
> I have also confirmed the entire set of 20 (both with and without
> asterisk) on page 20, section 21 of JIS X 0213:2004 Amendment 1. It
> says that these characters are not used with the escape sequences 1B
> 24 42 and 1B 24 28 4F.
>
> To find JIS X 0213:2004 Amendment 1, you start at the following URL
> and enter X0213 in the first text box:
>
> http://www.jisc.go.jp/app/JPS/JPSO0020.html
>
> This will take you to the following page:
>
> http://www.jisc.go.jp/app/pager?id=16428
>
> Then you click on the PDF X0213_20 to get the amendment.
>
> I have not been able to confirm the characters below. Are there any
> URLs to confirm these? The site seems to be down again. Do they
> perform maintenance every Saturday night?
>
> Erik
>
> > >Is ESC $ B for the full JIS X 0208?
> >
> > No. As shown in page 64 of JIS X 0213:2000 and page 20 of
> > JIS X 0213:2004, following characters are omitted from JIS
> > X 0213:2004, and the result is a subset of JIS X 0208:1997:
> >
> > 1-2-15 to 1-3-15, 1-3-26 to 1-3-32, 1-3-59 to 1-3-64,
> > 1-3-91 to 1-3-94, 1-4-84 to 1-4-91, 1-5-87 to 1-5-94,
> > 1-6-25 to 1-6-32, 1-6-57 to 1-6-94, 1-7-34 to 1-7-48,
> > 1-7-82 to 1-8-62, 1-8-71 to 1-8-92, 1-9-1 to 1-12-83,
> > 1-12-93 to 1-13-55, 1-13-63 to 1-13-79, 1-13-83, 1-13-88,
> > 1-13-89, 1-13-93, 1-13-94, 1-14-1 to 1-15-94, 1-16-2, 1-16-19,
> > 1-16-79, 1-17-19, 1-17-58, 1-17-75, 1-17-79, 1-18-3, 1-18-9,
> > 1-18-10, 1-18-11, 1-18-25, 1-18-50, 1-18-89, 1-19-4, 1-19-20,
> > 1-19-21, 1-19-34, 1-19-41, 1-19-69, 1-19-73, 1-19-76, 1-19-86,
> > 1-19-90, 1-20-18, 1-20-33, 1-20-35, 1-20-50, 1-20-79, 1-20-91,
> > 1-21-7, 1-21-85, 1-22-2, 1-22-31, 1-22-33, 1-22-38, 1-22-48,
> > 1-22-64, 1-22-70, 1-22-77, 1-23-16, 1-23-39, 1-23-50, 1-23-59,
> > 1-23-66, 1-24-6, 1-24-20, 1-25-60, 1-25-77, 1-25-82, 1-25-85,
> > 1-27-6, 1-27-67, 1-27-75, 1-28-24, 1-28-40, 1-28-41, 1-28-49,
> > 1-28-50, 1-28-52, 1-29-11, 1-29-13, 1-29-43, 1-29-75, 1-29-77,
> > 1-29-79, 1-29-80, 1-29-84, 1-30-36, 1-30-45, 1-30-53, 1-30-63,
> > 1-30-85, 1-31-32, 1-31-57, 1-32-5, 1-32-65, 1-32-70, 1-33-8,
> > 1-33-36, 1-33-46, 1-33-56, 1-33-63, 1-33-67, 1-33-73, 1-33-93,
> > 1-33-94, 1-34-3, 1-34-8, 1-34-45, 1-34-86, 1-35-18, 1-35-29,
> > 1-35-86, 1-35-88, 1-36-7, 1-36-8, 1-36-45, 1-36-47, 1-36-59,
> > 1-36-87, 1-37-22, 1-37-31, 1-37-52, 1-37-55, 1-37-78, 1-37-83,
> > 1-37-88, 1-38-33, 1-38-34, 1-38-45, 1-38-61, 1-38-81, 1-38-86,
> > 1-39-25, 1-39-63, 1-39-72, 1-39-77, 1-40-14, 1-40-16, 1-40-43,
> > 1-40-53, 1-40-60, 1-40-74, 1-41-16, 1-41-48, 1-41-49, 1-41-50,
> > 1-41-51, 1-41-78, 1-42-1, 1-42-27, 1-42-29, 1-42-57, 1-42-66,
> > 1-43-43, 1-43-47, 1-43-72, 1-43-74, 1-43-89, 1-44-40, 1-44-45,
> > 1-44-65, 1-44-89, 1-45-20, 1-45-58, 1-45-73, 1-45-74, 1-45-83,
> > 1-46-20, 1-46-26, 1-46-48, 1-46-62, 1-46-64, 1-46-81, 1-46-82,
> > 1-46-93, 1-47-3, 1-47-13, 1-47-15, 1-47-22, 1-47-25, 1-47-26,
> > 1-47-31, 1-47-52 to 1-47-94, 1-48-54, 1-52-68, 1-53-11, 1-54-2,
> > 1-54-85, 1-57-88, 1-58-25, 1-59-56, 1-59-77, 1-62-25, 1-62-85,
> > 1-63-70, 1-64-86, 1-66-72, 1-66-74, 1-67-62, 1-68-38, 1-73-2,
> > 1-73-14, 1-73-58, 1-74-4, 1-75-61, 1-76-45, 1-77-78, 1-80-55,
> > 1-80-84, 1-82-45, 1-82-84, and 1-84-1 to 1-94-94.
> >
> > Best Regards,
> > Koichi Yasuoka

OK, I have confirmed that the above list of characters is: [the list
at PDF X0213_05 JIS X 0213:2000 page 64] plus [the 20 mentioned
earlier]. These are not used with 1B 24 42.

I have also confirmed that the PDF X0213_12 ISO/IEC 646 IRV is the
same as US-ASCII (from 20 to 7F).

Also, at PDF X0213_13 we have JIS X 0211 C0 controls, same as
Unicode's C0, except for 1C-1F. The important characters are the same:
09, 0A, 0D and 1B.

And the PDF after X0213_20 is JIS X 0213:2004 Amendment 1, Corrigendum
dated April 1st, 2004, where the name ISO-2022-JP-2004 appears.

I believe we now have enough of the details to register this charset
(ISO-2022-JP-2004). It would probably be a good idea to add the info
from these emails to the actual registration. How do others feel about
this?

Erik
Martin Duerst
2006-10-12 08:06:32 UTC
At 22:55 06/10/10, Erik van der Poel wrote:

>I believe we now have enough of the details to register this charset
>(ISO-2022-JP-2004). It would probably be a good idea to add the info
>from these emails to the actual registration. How do others feel about
>this?

Yes, I think in the various mails, we have a lot of information,
and it would be good if most of it made it into the registration
form. Koichi, can you please submit another registration form,
with the information we discussed added in? I'm sure Erik and
others will be glad to check it.

Regards, Martin.



#-#-# Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-# http://www.sw.it.aoyama.ac.jp mailto:***@it.aoyama.ac.jp
Koichi Yasuoka
2006-10-27 22:49:33 UTC
Dear Sirs,

I sent you the new registration form of "ISO-2022-JP-2004" last week, but
it doesn't appear on http://mail.apps.ietf.org/ietf/charsets/maillist.html
yet. Now I send it to you again.

Best Regards,
Koichi Yasuoka

============================================================
*Charset name:

ISO-2022-JP-2004

*Charset aliases:

ISO-2022-JP-2003
ISO-2022-JP-3-2003

*Suitability for use in MIME text:

Suitable for 7-bit use in MIME body-part as text/plain or text/html.
B-encoding is recommended for use in MIME header-part, because
ISO-2022-JP-2004 is a partial extension of ISO-2022-JP.

*Published specification:

JIS X 0213 7-bit and 8-bit double byte coded extended KANJI sets for
information interchange, Japanese Standards Association (first edition
2000-01-20, amendment 2004-02-20, corrigendum 2004-04-01).

*ISO 10646 equivalency table:

No direct URI to the official equivalency table, but the table is
included in JIS X 0213, which can be found via
http://www.jisc.go.jp/app/JPS/JPSO0020.html
by searching for the word "X0213".

An unofficial equivalency table can be found at
http://x0213.org/codetable/iso-2022-jp-2004-std.txt
prepared by "Project X0213".

*Additional information:

Escape sequences used in ISO-2022-JP-2004 are:
- ESC ( B for ISO/IEC 646 IRV
- ESC $ ( Q for the full plane 1 of JIS X 0213:2004
- ESC $ ( P for plane 2 of JIS X 0213
- ESC $ ( O for a subset of plane 1 of JIS X 0213
(20 characters omitted)
- ESC $ B for a subset of plane 1 of JIS X 0213
(also a subset of the plane/table from JIS X 0208)
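A decoder for this charset first has to split the byte stream at these escape sequences. The sketch below is a hypothetical helper, not taken from JIS X 0213; it assumes the stream starts in ASCII, as iso-2022-jp does, rejects any other escape sequence (including ESC $ @ and ESC ( J), and does not enforce the omission lists for ESC $ ( O and ESC $ B:

```python
from typing import List, Tuple

# Escape sequences listed above, with what each designates. The five
# sequences are prefix-free, so a startswith() scan identifies them.
DESIGNATIONS = {
    b"\x1b(B": "ISO/IEC 646 IRV",
    b"\x1b$(Q": "JIS X 0213:2004 plane 1 (full)",
    b"\x1b$(P": "JIS X 0213 plane 2",
    b"\x1b$(O": "JIS X 0213 plane 1 (20 characters omitted)",
    b"\x1b$B": "JIS X 0213 plane 1 (JIS X 0208 subset)",
}
ASCII = b"\x1b(B"


def split_segments(data: bytes) -> List[Tuple[bytes, bytes]]:
    """Split a 7-bit ISO-2022-JP-2004 stream into (escape, payload) pairs."""
    segments: List[Tuple[bytes, bytes]] = []
    current, start, i = ASCII, 0, 0  # assumption: the stream starts in ASCII

    def flush(end: int) -> None:
        payload = data[start:end]
        if not payload:
            return
        if current != ASCII and len(payload) % 2:
            raise ValueError(f"odd-length double-byte run after {current!r}")
        segments.append((current, payload))

    while i < len(data):
        if data[i] != 0x1B:
            i += 1
            continue
        esc = next((e for e in DESIGNATIONS if data.startswith(e, i)), None)
        if esc is None:
            raise ValueError(f"escape not used by ISO-2022-JP-2004 at offset {i}")
        flush(i)
        current = esc
        i += len(esc)
        start = i
    flush(len(data))
    return segments
```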

For "ESC $ ( O", 20 characters are omitted: 10 characters(*) that were
included in the old JIS X 0213:2000, and the 10 characters that make up
the full set of "added characters" in 2004:

1-14-1, 1-15-94, 1-17-19(*), 1-22-70(*), 1-23-50(*), 1-28-24(*),
1-33-73(*), 1-38-61(*), 1-39-77(*), 1-47-52, 1-47-94, 1-53-11(*),
1-54-2(*), 1-54-85(*), 1-84-7, and 1-94-90 to 1-94-94.

For "ESC $ B", the following characters are omitted from JIS X 0213:2004,
and the result is a subset of JIS X 0208:1997:

1-2-15 to 1-3-15, 1-3-26 to 1-3-32, 1-3-59 to 1-3-64,
1-3-91 to 1-3-94, 1-4-84 to 1-4-91, 1-5-87 to 1-5-94,
1-6-25 to 1-6-32, 1-6-57 to 1-6-94, 1-7-34 to 1-7-48,
1-7-82 to 1-8-62, 1-8-71 to 1-8-92, 1-9-1 to 1-12-83,
1-12-93 to 1-13-55, 1-13-63 to 1-13-79, 1-13-83, 1-13-88,
1-13-89, 1-13-93, 1-13-94, 1-14-1 to 1-15-94, 1-16-2, 1-16-19,
1-16-79, 1-17-19, 1-17-58, 1-17-75, 1-17-79, 1-18-3, 1-18-9,
1-18-10, 1-18-11, 1-18-25, 1-18-50, 1-18-89, 1-19-4, 1-19-20,
1-19-21, 1-19-34, 1-19-41, 1-19-69, 1-19-73, 1-19-76, 1-19-86,
1-19-90, 1-20-18, 1-20-33, 1-20-35, 1-20-50, 1-20-79, 1-20-91,
1-21-7, 1-21-85, 1-22-2, 1-22-31, 1-22-33, 1-22-38, 1-22-48,
1-22-64, 1-22-70, 1-22-77, 1-23-16, 1-23-39, 1-23-50, 1-23-59,
1-23-66, 1-24-6, 1-24-20, 1-25-60, 1-25-77, 1-25-82, 1-25-85,
1-27-6, 1-27-67, 1-27-75, 1-28-24, 1-28-40, 1-28-41, 1-28-49,
1-28-50, 1-28-52, 1-29-11, 1-29-13, 1-29-43, 1-29-75, 1-29-77,
1-29-79, 1-29-80, 1-29-84, 1-30-36, 1-30-45, 1-30-53, 1-30-63,
1-30-85, 1-31-32, 1-31-57, 1-32-5, 1-32-65, 1-32-70, 1-33-8,
1-33-36, 1-33-46, 1-33-56, 1-33-63, 1-33-67, 1-33-73, 1-33-93,
1-33-94, 1-34-3, 1-34-8, 1-34-45, 1-34-86, 1-35-18, 1-35-29,
1-35-86, 1-35-88, 1-36-7, 1-36-8, 1-36-45, 1-36-47, 1-36-59,
1-36-87, 1-37-22, 1-37-31, 1-37-52, 1-37-55, 1-37-78, 1-37-83,
1-37-88, 1-38-33, 1-38-34, 1-38-45, 1-38-61, 1-38-81, 1-38-86,
1-39-25, 1-39-63, 1-39-72, 1-39-77, 1-40-14, 1-40-16, 1-40-43,
1-40-53, 1-40-60, 1-40-74, 1-41-16, 1-41-48, 1-41-49, 1-41-50,
1-41-51, 1-41-78, 1-42-1, 1-42-27, 1-42-29, 1-42-57, 1-42-66,
1-43-43, 1-43-47, 1-43-72, 1-43-74, 1-43-89, 1-44-40, 1-44-45,
1-44-65, 1-44-89, 1-45-20, 1-45-58, 1-45-73, 1-45-74, 1-45-83,
1-46-20, 1-46-26, 1-46-48, 1-46-62, 1-46-64, 1-46-81, 1-46-82,
1-46-93, 1-47-3, 1-47-13, 1-47-15, 1-47-22, 1-47-25, 1-47-26,
1-47-31, 1-47-52 to 1-47-94, 1-48-54, 1-52-68, 1-53-11, 1-54-2,
1-54-85, 1-57-88, 1-58-25, 1-59-56, 1-59-77, 1-62-25, 1-62-85,
1-63-70, 1-64-86, 1-66-72, 1-66-74, 1-67-62, 1-68-38, 1-73-2,
1-73-14, 1-73-58, 1-74-4, 1-75-61, 1-76-45, 1-77-78, 1-80-55,
1-80-84, 1-82-45, 1-82-84, and 1-84-1 to 1-94-94.

"ISO-2022-JP-2003" and "ISO-2022-JP-3-2003" were in the first print of
JIS X 0213:2004 dated February 20, 2004, and they were both corrected
to "ISO-2022-JP-2004" in the corrigendum dated April 1, 2004. To avoid
complications "ISO-2022-JP-2003" and "ISO-2022-JP-3-2003" may be aliases
of "ISO-2022-JP-2004", but "ISO-2022-JP-2004" is preferred.

*Person & email address to contact for further information:

Koichi Yasuoka
***@kanji.zinbun.kyoto-u.ac.jp

*Intended usage:

COMMON
Erik van der Poel
2006-10-30 18:54:19 UTC
> Suitable for 7-bit use in MIME body-part as text/plain or text/html.

This wording might be misinterpreted to mean that iso-2022-jp-2004 can
only be used with text/plain and text/html. I believe this charset is
suitable for use with any subtype of text (text/*).

Also, I wonder if we should say anything about
Content-Transfer-Encoding. The quoted-printable encoding does not look
good at all in very old (pre-MIME) implementations (and is not
necessary when the lines are short enough). The format=flowed
transformation would work fine, even in ancient software. (Do we care
about pre-MIME implementations?)

> B-encoding is recommended for use in MIME header-part, because
> ISO-2022-JP-2004 is a partial extension of ISO-2022-JP.

The term "header-part" is not normally used for these. They are called
encoded-words. How about: "The 'B' encoding is recommended for use
with MIME encoded-words in headers, as is recommended for the related
charset iso-2022-jp."

> - ESC ( B for ISO/IEC 646 IRV

I wonder if we should add JIS X 0211 here. This set is similar to (if
not identical to) ISO 6429 control characters in the range 0x00 to
0x1F. The most commonly used controls in the iso-2022-jp family are
horizontal TAB, CR, LF and ESC. SI and SO _must_ not be used.
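A rough sketch of the control-character profile described above: 7-bit bytes only, SI and SO rejected, and any other C0 control besides TAB, CR, LF and ESC flagged as uncommon rather than forbidden, since the text above does not forbid them. The function name and messages are hypothetical:

```python
from typing import List

# Controls listed above as the ones commonly used in the iso-2022-jp family.
COMMON_C0 = {0x09: "TAB", 0x0A: "LF", 0x0D: "CR", 0x1B: "ESC"}
# Shift controls that must not appear.
FORBIDDEN_C0 = {0x0E: "SO", 0x0F: "SI"}


def check_controls(data: bytes) -> List[str]:
    """Report bytes that do not fit the 7-bit, few-controls profile."""
    problems: List[str] = []
    for offset, b in enumerate(data):
        if b >= 0x80:
            problems.append(f"offset {offset}: 8-bit byte 0x{b:02X}")
        elif b in FORBIDDEN_C0:
            problems.append(f"offset {offset}: {FORBIDDEN_C0[b]} must not be used")
        elif b < 0x20 and b not in COMMON_C0:
            problems.append(f"offset {offset}: uncommon control 0x{b:02X}")
    return problems
```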

Erik
Martin Duerst
2006-11-07 05:33:05 UTC
Hello Koichi,

Many thanks for your update, and sorry for my delay.
This mail contains comments directly on your updated proposal
as well as on the comments by Erik and Frank.

At 07:49 06/10/28, Koichi Yasuoka wrote:
>Dear Sirs,
>
>I sent you the new registration form of "ISO-2022-JP-2004" last week, but
>it doesn't appear on http://mail.apps.ietf.org/ietf/charsets/maillist.html
>yet. Now I send it to you again.
>
>Best Regards,
>Koichi Yasuoka
>
>============================================================
>*Charset name:
>
>ISO-2022-JP-2004
>
>*Charset aliases:
>
>ISO-2022-JP-2003
>ISO-2022-JP-3-2003
>
>*Suitability for use in MIME text:
>
>Suitable for 7-bit use in MIME body-part as text/plain or text/html.

Please remove "as text/plain or text/html". Charset registrations
are designed to be independent of mime types. Also, I agree with
Frank that format=flowed does not belong here.

>B-encoding is recommended for use in MIME header-part, because
>ISO-2022-JP-2004 is a partial extension of ISO-2022-JP.

Based on our previous discussion, I'd rephrase this sentence to say:

B-encoding is recommended for use in MIME header-parts,
because in general it results in shorter strings than Q-encoding.

Instead of "in general", "in most cases" or something similar
may also work.
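To make the length comparison concrete, this sketch re-encodes the Q-encoded word quoted elsewhere in the thread (=?ISO-2022-JP-2004?Q?...?=) with both methods. The Q encoder is a simplified, hypothetical one that keeps unencoded only the characters RFC 2047 allows in a phrase (letters, digits and !*+-/); binascii.a2b_qp and base64.b64encode are Python standard-library calls:

```python
import base64
import binascii

# Encoded-text of the Q-encoded word quoted in this thread.
Q_PAYLOAD = b"=1B=24B0B2=2C9=270l=1B=28B"

# Characters a 'Q' encoded-word in a phrase may carry unencoded.
_Q_SAFE = set(b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789!*+-/")


def q_encode(raw: bytes) -> bytes:
    """Simplified Q encoding: safe bytes as-is, everything else as =XX."""
    return b"".join(bytes([c]) if c in _Q_SAFE else b"=%02X" % c for c in raw)


def b_encode(raw: bytes) -> bytes:
    """B encoding is plain base64."""
    return base64.b64encode(raw)


def encoded_word(charset: str, raw: bytes, method: str) -> str:
    payload = b_encode(raw) if method == "B" else q_encode(raw)
    return f"=?{charset}?{method}?{payload.decode('ascii')}?="


raw = binascii.a2b_qp(Q_PAYLOAD, header=True)  # back to ESC $ B ... ESC ( B
print(len(q_encode(raw)), len(b_encode(raw)))  # prints: 26 20
```

For this 14-byte string the B form is 20 characters against 26 for Q. Base64 always costs 4 characters per 3 bytes, while Q costs 1 character per safe byte and 3 per escaped byte, so mostly-ASCII text can come out shorter in Q, which fits the "in most cases" wording.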

>*Published specification:
>
>JIS X 0213 7-bit and 8-bit double byte coded extended KANJI sets for
>information interchange, Japanese Standards Association (first edition
>2000-01-20, amendment 2004-02-20, corrigendum 2004-04-01).
>
>*ISO 10646 equivalency table:
>
>No direct URI to the official equivalency table, but the table is
>included in JIS X 0213, which can be found via
> http://www.jisc.go.jp/app/JPS/JPSO0020.html
>with searching the word "X0213".

[side remark: I really wish JSA and the Japanese Government would
learn how the Web works.]

For the above text, it would be good if you made clear that
the various pages and texts are in Japanese.


>An unofficial equivalency table can be found at
> http://x0213.org/codetable/iso-2022-jp-2004-std.txt
>prepared by "Project X0213".
>
>*Additional information:
>
>Escape sequences used in ISO-2022-JP-2004 are:
>- ESC ( B for ISO/IEC 646 IRV
>- ESC $ ( Q for the full plane 1 of JIS X 0213:2004
>- ESC $ ( P for plane 2 of JIS X 0213
>- ESC $ ( O for a subset of plane 1 of JIS X 0213
> (20 characters omitted)
>- ESC $ B for a subset of plane 1 of JIS X 0213
> (also a subset of the plane/table from JIS X 0208)
>
>For "ESC $ ( O", 20 characters, of which 10 characters(*) were included
>in old JIS X 0213:2000 and the other 10 characters are the full set of
>the "added characters" in 2004, are omitted:
>
>1-14-1, 1-15-94, 1-17-19(*), 1-22-70(*), 1-23-50(*), 1-28-24(*),
>1-33-73(*), 1-38-61(*), 1-39-77(*), 1-47-52, 1-47-94, 1-53-11(*),
>1-54-2(*), 1-54-85(*), 1-84-7, and 1-94-90 to 1-94-94.
>
>For "ESC $ B", following characters are omitted from JIS X 0213:2004,
>and the result is a subset of JIS X 0208:1997:
>
>1-2-15 to 1-3-15, 1-3-26 to 1-3-32, 1-3-59 to 1-3-64,
>1-3-91 to 1-3-94, 1-4-84 to 1-4-91, 1-5-87 to 1-5-94,
>1-6-25 to 1-6-32, 1-6-57 to 1-6-94, 1-7-34 to 1-7-48,
>1-7-82 to 1-8-62, 1-8-71 to 1-8-92, 1-9-1 to 1-12-83,
>1-12-93 to 1-13-55, 1-13-63 to 1-13-79, 1-13-83, 1-13-88,
>1-13-89, 1-13-93, 1-13-94, 1-14-1 to 1-15-94, 1-16-2, 1-16-19,
>1-16-79, 1-17-19, 1-17-58, 1-17-75, 1-17-79, 1-18-3, 1-18-9,
>1-18-10, 1-18-11, 1-18-25, 1-18-50, 1-18-89, 1-19-4, 1-19-20,
>1-19-21, 1-19-34, 1-19-41, 1-19-69, 1-19-73, 1-19-76, 1-19-86,
>1-19-90, 1-20-18, 1-20-33, 1-20-35, 1-20-50, 1-20-79, 1-20-91,
>1-21-7, 1-21-85, 1-22-2, 1-22-31, 1-22-33, 1-22-38, 1-22-48,
>1-22-64, 1-22-70, 1-22-77, 1-23-16, 1-23-39, 1-23-50, 1-23-59,
>1-23-66, 1-24-6, 1-24-20, 1-25-60, 1-25-77, 1-25-82, 1-25-85,
>1-27-6, 1-27-67, 1-27-75, 1-28-24, 1-28-40, 1-28-41, 1-28-49,
>1-28-50, 1-28-52, 1-29-11, 1-29-13, 1-29-43, 1-29-75, 1-29-77,
>1-29-79, 1-29-80, 1-29-84, 1-30-36, 1-30-45, 1-30-53, 1-30-63,
>1-30-85, 1-31-32, 1-31-57, 1-32-5, 1-32-65, 1-32-70, 1-33-8,
>1-33-36, 1-33-46, 1-33-56, 1-33-63, 1-33-67, 1-33-73, 1-33-93,
>1-33-94, 1-34-3, 1-34-8, 1-34-45, 1-34-86, 1-35-18, 1-35-29,
>1-35-86, 1-35-88, 1-36-7, 1-36-8, 1-36-45, 1-36-47, 1-36-59,
>1-36-87, 1-37-22, 1-37-31, 1-37-52, 1-37-55, 1-37-78, 1-37-83,
>1-37-88, 1-38-33, 1-38-34, 1-38-45, 1-38-61, 1-38-81, 1-38-86,
>1-39-25, 1-39-63, 1-39-72, 1-39-77, 1-40-14, 1-40-16, 1-40-43,
>1-40-53, 1-40-60, 1-40-74, 1-41-16, 1-41-48, 1-41-49, 1-41-50,
>1-41-51, 1-41-78, 1-42-1, 1-42-27, 1-42-29, 1-42-57, 1-42-66,
>1-43-43, 1-43-47, 1-43-72, 1-43-74, 1-43-89, 1-44-40, 1-44-45,
>1-44-65, 1-44-89, 1-45-20, 1-45-58, 1-45-73, 1-45-74, 1-45-83,
>1-46-20, 1-46-26, 1-46-48, 1-46-62, 1-46-64, 1-46-81, 1-46-82,
>1-46-93, 1-47-3, 1-47-13, 1-47-15, 1-47-22, 1-47-25, 1-47-26,
>1-47-31, 1-47-52 to 1-47-94, 1-48-54, 1-52-68, 1-53-11, 1-54-2,
>1-54-85, 1-57-88, 1-58-25, 1-59-56, 1-59-77, 1-62-25, 1-62-85,
>1-63-70, 1-64-86, 1-66-72, 1-66-74, 1-67-62, 1-68-38, 1-73-2,
>1-73-14, 1-73-58, 1-74-4, 1-75-61, 1-76-45, 1-77-78, 1-80-55,
>1-80-84, 1-82-45, 1-82-84, and 1-84-1 to 1-94-94.
>
>"ISO-2022-JP-2003" and "ISO-2022-JP-3-2003" were in the first print of
>JIS X 0213:2004 dated February 20, 2004, and they were both corrected
>to "ISO-2022-JP-2004" in the corrigendum dated April 1, 2004. To avoid
>complications "ISO-2022-JP-2003" and "ISO-2022-JP-3-2003" may be aliases
>of "ISO-2022-JP-2004", but "ISO-2022-JP-2004" is preferred.

It is still unclear why this 'correction' has been made. Any
label should be good enough, changing labels all the time only
creates confusion. But this is just for personal interest.
I do not think that the above paragraph, or any slightly
updated explanation, needs to be in the registration template.

On the other hand, I would like to see the following text, or
something similar, added here or at the start of the "Additional
Information" section:

Please note that there is a certain overlap of codepoints
and similarity of encoding methods and escape sequences
between 'iso-2022-jp' and 'iso-2022-jp-2004' which may suggest
that these two charsets are interoperable. But this is not
the case.
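One way to see the non-interoperability is with Python's standard iso2022_jp and iso2022_jp_2004 codecs, used here only as a stand-in for real mail software: the ESC $ B sample from this thread decodes the same way with both, but a strict iso2022_jp decoder rejects a stream that designates JIS X 0213 with ESC $ ( Q. The second sample is made up for illustration:

```python
from typing import Optional

# The ESC $ B sample from this thread (four JIS X 0208 characters) and a
# hypothetical stream that designates JIS X 0213:2004 plane 1 with ESC $ ( Q.
JIS0208_SAMPLE = b"\x1b$B0B2,9'0l\x1b(B"
X0213_SAMPLE = b"\x1b$(Q0B\x1b(B"


def try_decode(data: bytes, codec: str) -> Optional[str]:
    """Decode strictly, returning None instead of raising on failure."""
    try:
        return data.decode(codec)
    except UnicodeDecodeError:
        return None


for sample in (JIS0208_SAMPLE, X0213_SAMPLE):
    # ascii() keeps the output printable on terminals without UTF-8.
    print(ascii(try_decode(sample, "iso2022_jp")), ascii(try_decode(sample, "iso2022_jp_2004")))
```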

Also, adding here that code tables can be found at
http://www.itscj.ipsj.or.jp/ISO-IR/228.pdf,...
might help some people.

Regards, Martin.


>*Person & email address to contact for further information:
>
>Koichi Yasuoka
>***@kanji.zinbun.kyoto-u.ac.jp
>
>*Intended usage:
>
>COMMON


Erik van der Poel
2006-10-01 18:38:42 UTC
Hello,

I have a few questions about this registration:

> At 00:28 06/09/28, Koichi Yasuoka wrote:
> >=?ISO-2022-JP-2004?Q?=1B=24B0B2=2C9=270l=1B=28B?=

I believe that, in general, many of us recommend being conservative in
what you send out, liberal in what you accept. Therefore, the
recommendation is to use the charset label that matches the smallest
subset of characters actually used in the text, as well as using the
oldest and/or most commonly accepted name. In this case, you are
clearly using the ESC $ B (1B 24 42) that is part of iso-2022-jp (rfc
1468). Therefore, the more conservative option is to use the name
iso-2022-jp when sending this particular piece of text.
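A minimal sketch of the conservative labelling described here, assuming the input is already valid ISO-2022-JP-2004. Because ESC $ B in that charset carries only a subset of JIS X 0208, text that uses nothing but ESC $ B and ESC ( B is also valid iso-2022-jp. The function name and the three-label choice are illustrative, and Ned's reply raises a caveat about this metric:

```python
import re

# Matches the designating escape sequences of the iso-2022-jp family:
# ESC ( F, ESC $ F and ESC $ ( F.
_ESCAPE = re.compile(rb"\x1b(?:\$\(.|\$.|\(.)", re.DOTALL)


def conservative_label(data: bytes) -> str:
    """Pick the narrowest of three labels that covers the escapes actually used."""
    used = set(_ESCAPE.findall(data))
    if not used:
        return "US-ASCII"  # valid ISO-2022-JP-2004 without escapes is plain ASCII
    if used <= {b"\x1b(B", b"\x1b$B"}:
        return "ISO-2022-JP"
    return "ISO-2022-JP-2004"
```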

I have noticed over the years that if you don't spell out the
recommendations, implementors will do the wrong thing. In this case,
would it be a good idea to add such recommendations to the
registration itself? Or should a new RFC be written, in order to
provide the recommendations in more detail?

> >I understand that ISO-2022-JP texts with "ESC $ B" and
> >"ESC ( B" can be accepted by ISO-2022-JP-2004 decoder.
> >It is problematic when "ESC $ @" or "ESC ( J" is used but
> >they are very rare now.

Which escape sequences are permitted in iso-2022-jp-2004? There are 3
problems with the link you sent earlier*: The first page is in
Japanese, and when you search for X0213, the results are in Japanese
too. Then X0213 is split into many PDFs, and it is not clear which one
to download in order to see the escape sequences, nor am I inclined to
download all of the pieces. Finally, that site was down yesterday and
up today. How often does it go down?

* http://www.jisc.go.jp/app/JPS/JPSO0020.html

> >Now I know "ISO-2022-JP-2004 vs Unicode
> >mapping table" at http://x0213.org/codetable/iso-2022-jp-2004-std.txt

I wonder whether either or both of these links would be good to have
in the registration:

http://www.itscj.ipsj.or.jp/ISO-IR/233.pdf
http://www.itscj.ipsj.or.jp/ISO-IR/

Erik van der Poel
Editor and co-author of RFC 1468 (iso-2022-jp)
Ned Freed
2006-10-01 19:24:44 UTC
> Hello,

> I have a few questions about this registration:

> > At 00:28 06/09/28, Koichi Yasuoka wrote:
> > >=?ISO-2022-JP-2004?Q?=1B=24B0B2=2C9=270l=1B=28B?=

> I believe that, in general, many of us recommend being conservative in
> what you send out, liberal in what you accept. Therefore, the
> recommendation is to use the charset label that matches the smallest
> subset of characters actually used in the text, as well as using the
> oldest and/or most commonly accepted name.

I agree with the sentiment but I'm a bit concerned as to the choice of metric.
Taken to an extreme, picking a charset that most closely aligns with the
repertoire used can lead to the use of some very obscure charsets that don't
enjoy wide support. I think a more reasonable approach is to try and choose a
charset based both on its alignment with what you're doing and how well
supported it is. Hopefully as time goes on and UTF-8 support becomes truly
ubiquitous it will become the charset of choice, never mind the fact that most
uses of it won't involve more than a small fraction of its repertoire.

> In this case, you are
> clearly using the ESC $ B (1B 24 42) that is part of iso-2022-jp (rfc
> 1468). Therefore, the more conservative option is to use the name
> iso-2022-jp when sending this particular piece of text.

Yes, in this specific case it would be better to use iso-2022-jp.

> I have noticed over the years that if you don't spell out the
> recommendations, implementors will do the wrong thing. In this case,
> would it be a good idea to add such recommendations to the
> registration itself? Or should a new RFC be written, in order to
> provide the recommendations in more detail?

I certainly would have no problem with such a recommendation being part of the
registration, although I'm not quite sure how I'd embody such a recommendation
in an actual implementation.

> > >I understand that ISO-2022-JP texts with "ESC $ B" and
> > >"ESC ( B" can be accepted by ISO-2022-JP-2004 decoder.
> > >It is problematic when "ESC $ @" or "ESC ( J" is used but
> > >they are very rare now.

> Which escape sequences are permitted in iso-2022-jp-2004? There are 3
> problems with the link you sent earlier*: The first page is in
> Japanese, and when you search for X0213, the results are in Japanese
> too. Then X0213 is split into many PDFs, and it is not clear which one
> to download in order to see the escape sequences, nor am I inclined to
> download all of the pieces. Finally, that site was down yesterday and
> up today. How often does it go down?

> * http://www.jisc.go.jp/app/JPS/JPSO0020.html

> > >Now I know "ISO-2022-JP-2004 vs Unicode
> > >mapping table" at http://x0213.org/codetable/iso-2022-jp-2004-std.txt

> I wonder whether either or both of these links would be good to have
> in the registration:

> http://www.itscj.ipsj.or.jp/ISO-IR/233.pdf
> http://www.itscj.ipsj.or.jp/ISO-IR/

> Erik van der Poel
> Editor and co-author of RFC 1468 (iso-2022-jp)

Well, since you kinda sorta brought it up, I have an issue to raise with RFC
1468. In the formal syntax, it says:

single-byte-seq = ESC "(" ( "B" / "J" )
single-byte-segment = single-byte-seq 1*single-byte-char

The "1*" in this turns out to be fairly problematic in how it interacts with
MIME encoded words. To use the earlier encoded-word as the basis for an
example, suppose you have a header field containing:

=?ISO-2022-JP?Q?=1B=24B0B2=2C9=270l=1B=28B?= =?ISO-2022-JP?Q?=1B=24B0B2=2C9=270l=1B=28B?=

According to encoded-word rules, the space between adjacent encoded words is
supposed to be discarded when decoding. So this decodes to a sequence that has
this in it:

ESC ( B ESC $ B

And this is illegal according to the formal grammar, which basically says that
ESC ( B either has to appear at the end of a segment or else has to be followed
by some amount of ASCII text. And unfortunately there are implementations out
there that refuse to decode this. (IMO such implementations are in violation of
the robustness principle, but they are technically within their rights
according to the standards.)

Addressing this means that the encoded word decoder (which often operates at a
completely different level than charset handling) has to be made charset-aware
enough to know to remove the ESC ( B ESC $ B sequence in its entirety, which is
more than a little ugly.

The obvious remedy is to change the rule to be:

single-byte-segment = single-byte-seq 0*single-byte-char

But this then brings up the concern that it will make some currently compliant
implementations noncompliant.
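Here is a minimal Python sketch of both the problem and the relaxed rule. It assumes Q-encoded words only and uses Python's built-in iso2022_jp codec. The helper names (decode_adjacent_words, strict_rfc1468_ok, drop_empty_single_byte_segments) are hypothetical. The strict check covers only the "1*" part of the RFC 1468 grammar: it flags an ESC ( B or ESC ( J that is immediately followed by another escape sequence. The repair deletes just the empty single-byte designation and keeps the ESC $ B that follows it. This is one possible charset-aware fix, not something RFC 1468 or RFC 2047 prescribes.

```python
import binascii
import re

# Matches one Q-encoded word; B-encoded words are out of scope for this sketch.
ENCODED_WORD = re.compile(rb"=\?([^?]+)\?[Qq]\?([^?]*)\?=")

# RFC 1468 single-byte-seq: ESC ( B or ESC ( J
SINGLE_BYTE_SEQS = (b"\x1b(B", b"\x1b(J")


def decode_adjacent_words(header: bytes) -> bytes:
    """Decode every Q payload and concatenate the results. Whitespace between
    adjacent encoded-words is dropped, as RFC 2047 requires."""
    return b"".join(
        binascii.a2b_qp(payload, header=True)  # header=True: "_" means space
        for _charset, payload in ENCODED_WORD.findall(header)
    )


def strict_rfc1468_ok(data: bytes) -> bool:
    """Approximate the "1*single-byte-char" rule: fail if a single-byte-seq
    is immediately followed by another escape sequence (an empty segment)."""
    for seq in SINGLE_BYTE_SEQS:
        start = data.find(seq)
        while start != -1:
            if data[start + 3 : start + 4] == b"\x1b":
                return False
            start = data.find(seq, start + 3)
    return True


def drop_empty_single_byte_segments(data: bytes) -> bytes:
    """Hypothetical repair: delete an ESC ( B / ESC ( J that is immediately
    superseded by another designation, keeping that later designation."""
    return re.sub(rb"\x1b\([BJ](?=\x1b)", b"", data)


header = (b"=?ISO-2022-JP?Q?=1B=24B0B2=2C9=270l=1B=28B?= "
          b"=?ISO-2022-JP?Q?=1B=24B0B2=2C9=270l=1B=28B?=")
joined = decode_adjacent_words(header)
fixed = drop_empty_single_byte_segments(joined)
print(strict_rfc1468_ok(joined), strict_rfc1468_ok(fixed))  # False True
```

When run on the header field above, the joined bytes contain ESC ( B ESC $ B and fail the strict check. The repaired bytes pass the check and decode to the same text.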

I frankly don't see a good way to fix this. Suggestions?

Ned
Erik van der Poel
2006-10-01 23:32:16 UTC
> > > >=?ISO-2022-JP-2004?Q?=1B=24B0B2=2C9=270l=1B=28B?=
>
> > I believe that, in general, many of us recommend being conservative in
> > what you send out, liberal in what you accept. Therefore, the
> > recommendation is to use the charset label that matches the smallest
> > subset of characters actually used in the text, as well as using the
> > oldest and/or most commonly accepted name.
>
> I agree with the sentiment but I'm a bit concerned as to the choice of metric.
> Taken to an extreme, picking a charset that most closely aligns with the
> repertoire used can lead to the use of some very obscure charsets that don't
> enjoy wide support.

True. :-) I should have been more careful with my wording. Maybe one
of the RFCs already says something about choosing the smallest or most
commonly used charset.
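One way to apply the "smallest subset" idea is to inspect only the escape sequences that a message actually uses. The sketch below compares them against the RFC 1468 set for ISO-2022-JP and against the set listed in the ISO-2022-JP-2004 registration form later in this thread. It is a rough Python sketch, and suggest_label is a hypothetical name. The metric has the limitation Ned points out: it says nothing about how widely a label is supported. The sketch also rejects 8-bit bytes and SO/SI, since this 7-bit family does not use them.

```python
from __future__ import annotations

import re

# Designations allowed by RFC 1468 (ISO-2022-JP).
ISO_2022_JP = {b"\x1b(B", b"\x1b(J", b"\x1b$@", b"\x1b$B"}
# Designations listed in the ISO-2022-JP-2004 registration form.
ISO_2022_JP_2004 = {b"\x1b(B", b"\x1b$B", b"\x1b$(O", b"\x1b$(P", b"\x1b$(Q"}

# ESC $ ( F, or ESC ( F / ESC $ F, where F is the final byte.
ESCAPE = re.compile(rb"\x1b(?:\$\(.|[($].)", re.DOTALL)


def suggest_label(data: bytes) -> str | None:
    """Return the narrowest label whose designations cover `data`,
    or None if it fits neither iso-2022-jp variant."""
    if any(b >= 0x80 for b in data) or b"\x0e" in data or b"\x0f" in data:
        return None  # 8-bit bytes, SO or SI: outside this 7-bit family
    used = set(ESCAPE.findall(data))
    if not used:
        return "US-ASCII"
    if used <= ISO_2022_JP:
        return "ISO-2022-JP"
    if used <= ISO_2022_JP_2004:
        return "ISO-2022-JP-2004"
    return None


print(suggest_label(b"\x1b$B0B\x1b(B"))
print(suggest_label(b"\x1b$(Q.!\x1b(B"))
```

Text that uses ESC ( J together with a JIS X 0213 designation gets no label here, because the registration lists only ESC ( B for the single-byte set.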

> Well, since you kinda sorta brought it up, I have an issue to raise with RFC
> 1468. In the formal syntax, it says:
>
> single-byte-seq = ESC "(" ( "B" / "J" )
> single-byte-segment = single-byte-seq 1*single-byte-char
>
> The "1*" in this turns out to be fairly problematic in how it interacts with
> MIME encoded words. To use the earlier encoded-word as the basis for an
> example, suppose you have a header field containing:
>
> =?ISO-2022-JP?Q?=1B=24B0B2=2C9=270l=1B=28B?= =?ISO-2022-JP?Q?=1B=24B0B2=2C9=270l=1B=28B?=
>
> According to encoded-word rules, the space between adjacent encoded words is
> supposed to be discarded when decoding. So this decodes to a sequence that has
> this in it:
>
> ESC ( B ESC $ B
>
> And this is illegal according to the formal grammar, which basically says that
> ESC ( B either has to appear at the end of a segment or else has to be followed
> by some amount of ASCII text. And unfortunately there are implementations out
> there that refuse to decode this. (IMO such implementations are in violation of
> the robustness principle, but they are technically within their rights
> according to the standards.)
>
> Addressing this means that the encoded word decoder (which often operates at a
> completely different level than charset handling) has to be made charset-aware
> enough to know to remove the ESC ( B ESC $ B sequence in its entirety, which is
> more than a little ugly.
>
> The obvious remedy is to change the rule to be:
>
> single-byte-segment = single-byte-seq 0*single-byte-char
>
> But this then brings up the concern that it will make some currently compliant
> implementations noncompliant.
>
> I frankly don't see a good way to fix this. Suggestions?

Don't know whether people would like this solution, but one way would
be to have a spec for decoding and a spec for encoding. When encoding,
it should be 1* and when decoding it can be * (same as 0*). But
perhaps this would interact badly with other parts of the spec?

When encoding encoded-words, each piece of iso-2022-jp text is to be
encoded separately, so there would have to be an ESC ( B or ESC ( J at
the end of each piece, if not already in that state.
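To illustrate this encoding rule, here is a sketch that greedily packs text into words of at most 75 characters (the RFC 2047 limit for one encoded-word). It uses B encoding, as the registration recommends, and encodes each piece separately, so any piece that switches to JIS X 0208 also switches back with ESC ( B. It assumes Python's iso2022_jp codec, which returns to ASCII at the end of each encode() call and does not cover the JIS X 0213 additions. encode_header_words is a hypothetical name.

```python
from __future__ import annotations

import base64


def _encoded_word(chunk: str, label: str) -> str:
    # str.encode() flushes the stateful iso2022_jp encoder, so a chunk that
    # enters JIS X 0208 with ESC $ B also ends with ESC ( B.
    raw = chunk.encode("iso2022_jp")
    return f"=?{label}?B?{base64.b64encode(raw).decode('ascii')}?="


def encode_header_words(text: str, label: str = "ISO-2022-JP",
                        limit: int = 75) -> list[str]:
    """Greedily pack text into B-encoded words of at most `limit` characters,
    encoding each piece on its own so that none ends in double-byte mode."""
    words: list[str] = []
    chunk = ""
    for ch in text:
        candidate = chunk + ch
        if chunk and len(_encoded_word(candidate, label)) > limit:
            words.append(_encoded_word(chunk, label))
            chunk = ch
        else:
            chunk = candidate
    if chunk:
        words.append(_encoded_word(chunk, label))
    return words


for word in encode_header_words("日本語のテキストを符号化する例です" * 3):
    print(word)
```

Re-encoding the growing chunk on each step is quadratic, but it measures the real encoded length, including escape sequences.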

Erik
Keith Moore
2006-10-01 23:56:38 UTC
> The obvious remedy is to change the rule to be:
>
> single-byte-segment = single-byte-seq 0*single-byte-char
>
> But this then brings up the concern that it will make some currently
> compliant implementations noncompliant.

That's not a good reason not to change the specification. Of course
old implementations won't necessarily be compliant with a new version of
the specification. The whole point of changing the specification is to
encourage better behavior in new software and new versions of old software.
Erik van der Poel
2006-10-29 16:47:33 UTC
If you look at the archived message, you will see "Spam-test: False ;
1.6 / 4.5 ; DEAR_SOMETHING":

http://mail.apps.ietf.org/ietf/charsets/msg01664.html

Are we supposed to refrain from addressing each other as "Dear ..."? :-)

(When I received the email, it also had a SPAM warning in the Subject
header, but this might be a corporate spam filter.)

Erik

On 10/27/06, Koichi Yasuoka <***@kanji.zinbun.kyoto-u.ac.jp> wrote:
> Dear Sirs,
>
> I sent the new registration form for "ISO-2022-JP-2004" last week, but
> it has not yet appeared on http://mail.apps.ietf.org/ietf/charsets/maillist.html,
> so I am sending it again.
>
> Best Regards,
> Koichi Yasuoka
>
> ============================================================
> *Charset name:
>
> ISO-2022-JP-2004
>
> *Charset aliases:
>
> ISO-2022-JP-2003
> ISO-2022-JP-3-2003
>
> *Suitability for use in MIME text:
>
> Suitable for 7-bit use in MIME body-part as text/plain or text/html.
> B-encoding is recommended for use in MIME header-part, because
> ISO-2022-JP-2004 is a partial extension of ISO-2022-JP.
>
> *Published specification:
>
> JIS X 0213 7-bit and 8-bit double byte coded extended KANJI sets for
> information interchange, Japanese Standards Association (first edition
> 2000-01-20, amendment 2004-02-20, corrigendum 2004-04-01).
>
> *ISO 10646 equivalency table:
>
> No direct URI to the official equivalency table, but the table is
> included in JIS X 0213, which can be found via
> http://www.jisc.go.jp/app/JPS/JPSO0020.html
> with searching the word "X0213".
>
> An unofficial equivalency table can be found at
> http://x0213.org/codetable/iso-2022-jp-2004-std.txt
> prepared by "Project X0213".
>
> *Additional information:
>
> Escape sequences used in ISO-2022-JP-2004 are:
> - ESC ( B for ISO/IEC 646 IRV
> - ESC $ ( Q for the full plane 1 of JIS X 0213:2004
> - ESC $ ( P for plane 2 of JIS X 0213
> - ESC $ ( O for a subset of plane 1 of JIS X 0213
> (20 characters omitted)
> - ESC $ B for a subset of plane 1 of JIS X 0213
> (also a subset of the plane/table from JIS X 0208)
>
> For "ESC $ ( O", 20 characters are omitted: 10 of them (*) were already
> included in JIS X 0213:2000, and the other 10 are the complete set of
> characters added in 2004:
>
> 1-14-1, 1-15-94, 1-17-19(*), 1-22-70(*), 1-23-50(*), 1-28-24(*),
> 1-33-73(*), 1-38-61(*), 1-39-77(*), 1-47-52, 1-47-94, 1-53-11(*),
> 1-54-2(*), 1-54-85(*), 1-84-7, and 1-94-90 to 1-94-94.
>
> For "ESC $ B", the following characters are omitted from JIS X 0213:2004,
> and the result is a subset of JIS X 0208:1997:
>
> 1-2-15 to 1-3-15, 1-3-26 to 1-3-32, 1-3-59 to 1-3-64,
> 1-3-91 to 1-3-94, 1-4-84 to 1-4-91, 1-5-87 to 1-5-94,
> 1-6-25 to 1-6-32, 1-6-57 to 1-6-94, 1-7-34 to 1-7-48,
> 1-7-82 to 1-8-62, 1-8-71 to 1-8-92, 1-9-1 to 1-12-83,
> 1-12-93 to 1-13-55, 1-13-63 to 1-13-79, 1-13-83, 1-13-88,
> 1-13-89, 1-13-93, 1-13-94, 1-14-1 to 1-15-94, 1-16-2, 1-16-19,
> 1-16-79, 1-17-19, 1-17-58, 1-17-75, 1-17-79, 1-18-3, 1-18-9,
> 1-18-10, 1-18-11, 1-18-25, 1-18-50, 1-18-89, 1-19-4, 1-19-20,
> 1-19-21, 1-19-34, 1-19-41, 1-19-69, 1-19-73, 1-19-76, 1-19-86,
> 1-19-90, 1-20-18, 1-20-33, 1-20-35, 1-20-50, 1-20-79, 1-20-91,
> 1-21-7, 1-21-85, 1-22-2, 1-22-31, 1-22-33, 1-22-38, 1-22-48,
> 1-22-64, 1-22-70, 1-22-77, 1-23-16, 1-23-39, 1-23-50, 1-23-59,
> 1-23-66, 1-24-6, 1-24-20, 1-25-60, 1-25-77, 1-25-82, 1-25-85,
> 1-27-6, 1-27-67, 1-27-75, 1-28-24, 1-28-40, 1-28-41, 1-28-49,
> 1-28-50, 1-28-52, 1-29-11, 1-29-13, 1-29-43, 1-29-75, 1-29-77,
> 1-29-79, 1-29-80, 1-29-84, 1-30-36, 1-30-45, 1-30-53, 1-30-63,
> 1-30-85, 1-31-32, 1-31-57, 1-32-5, 1-32-65, 1-32-70, 1-33-8,
> 1-33-36, 1-33-46, 1-33-56, 1-33-63, 1-33-67, 1-33-73, 1-33-93,
> 1-33-94, 1-34-3, 1-34-8, 1-34-45, 1-34-86, 1-35-18, 1-35-29,
> 1-35-86, 1-35-88, 1-36-7, 1-36-8, 1-36-45, 1-36-47, 1-36-59,
> 1-36-87, 1-37-22, 1-37-31, 1-37-52, 1-37-55, 1-37-78, 1-37-83,
> 1-37-88, 1-38-33, 1-38-34, 1-38-45, 1-38-61, 1-38-81, 1-38-86,
> 1-39-25, 1-39-63, 1-39-72, 1-39-77, 1-40-14, 1-40-16, 1-40-43,
> 1-40-53, 1-40-60, 1-40-74, 1-41-16, 1-41-48, 1-41-49, 1-41-50,
> 1-41-51, 1-41-78, 1-42-1, 1-42-27, 1-42-29, 1-42-57, 1-42-66,
> 1-43-43, 1-43-47, 1-43-72, 1-43-74, 1-43-89, 1-44-40, 1-44-45,
> 1-44-65, 1-44-89, 1-45-20, 1-45-58, 1-45-73, 1-45-74, 1-45-83,
> 1-46-20, 1-46-26, 1-46-48, 1-46-62, 1-46-64, 1-46-81, 1-46-82,
> 1-46-93, 1-47-3, 1-47-13, 1-47-15, 1-47-22, 1-47-25, 1-47-26,
> 1-47-31, 1-47-52 to 1-47-94, 1-48-54, 1-52-68, 1-53-11, 1-54-2,
> 1-54-85, 1-57-88, 1-58-25, 1-59-56, 1-59-77, 1-62-25, 1-62-85,
> 1-63-70, 1-64-86, 1-66-72, 1-66-74, 1-67-62, 1-68-38, 1-73-2,
> 1-73-14, 1-73-58, 1-74-4, 1-75-61, 1-76-45, 1-77-78, 1-80-55,
> 1-80-84, 1-82-45, 1-82-84, and 1-84-1 to 1-94-94.
>
> "ISO-2022-JP-2003" and "ISO-2022-JP-3-2003" were in the first print of
> JIS X 0213:2004 dated February 20, 2004, and they were both corrected
> to "ISO-2022-JP-2004" in the corrigendum dated April 1, 2004. To avoid
> complications "ISO-2022-JP-2003" and "ISO-2022-JP-3-2003" may be aliases
> of "ISO-2022-JP-2004", but "ISO-2022-JP-2004" is preferred.
>
> *Person & email address to contact for further information:
>
> Koichi Yasuoka
> ***@kanji.zinbun.kyoto-u.ac.jp
>
> *Intended usage:
>
> COMMON
>
>
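The omission lists above use plane-row-cell (men-ku-ten) notation. As a sanity check on the count stated for "ESC $ ( O", a small Python sketch can expand the list, including "A to B" ranges that wrap from cell 94 to cell 1 of the next row. The sketch also maps a row/cell pair to the two 7-bit bytes that follow a double-byte designation (row + 0x20, cell + 0x20). expand and to_7bit are hypothetical helpers, not part of any published tool.

```python
from __future__ import annotations

import re

# The "ESC $ ( O" omission list quoted above; "(*)" marks characters
# that were already in JIS X 0213:2000.
OMITTED_FROM_O = """1-14-1, 1-15-94, 1-17-19(*), 1-22-70(*), 1-23-50(*), 1-28-24(*),
1-33-73(*), 1-38-61(*), 1-39-77(*), 1-47-52, 1-47-94, 1-53-11(*),
1-54-2(*), 1-54-85(*), 1-84-7, and 1-94-90 to 1-94-94."""

# plane-row-cell, an optional "(*)" marker, and an optional "to" range end
ITEM = re.compile(r"(\d+)-(\d+)-(\d+)(\(\*\))?(?: to (\d+)-(\d+)-(\d+))?")


def expand(listing: str) -> list[tuple[int, int, int, bool]]:
    """Expand items and "A to B" ranges into (plane, row, cell, starred)
    tuples; a range wraps from cell 94 to cell 1 of the next row."""
    cells = []
    for m in ITEM.finditer(listing):
        plane, row, cell = int(m[1]), int(m[2]), int(m[3])
        starred = m[4] is not None
        end = (plane, row, cell) if m[5] is None else (int(m[5]), int(m[6]), int(m[7]))
        while (plane, row, cell) <= end:
            cells.append((plane, row, cell, starred))
            cell += 1
            if cell > 94:  # a 94x94 set has 94 cells per row
                row, cell = row + 1, 1
    return cells


def to_7bit(row: int, cell: int) -> bytes:
    """Row/cell to the two bytes sent after a double-byte designation."""
    return bytes([row + 0x20, cell + 0x20])


omitted = expand(OMITTED_FROM_O)
print(len(omitted), sum(1 for c in omitted if c[3]))  # 20 10
print(to_7bit(14, 1))  # b'.!'
```

The same expand function can also be applied to the much longer "ESC $ B" list.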
Ned Freed
2006-11-01 03:44:45 UTC
> If you look at the archived message, you will see "Spam-test: False ;
> 1.6 / 4.5 ; DEAR_SOMETHING":

That's a SpamAssassin scan result. It clearly shows the message
was NOT seen as spam.

> http://mail.apps.ietf.org/ietf/charsets/msg01664.html

> Are we supposed to refrain from addressing each other as "Dear ..."? :-)

SA has a lot of rules to check for stuff that ordinary messages sometimes
trigger. Given the low weight of such rules they rarely cause problems in
practice.

> (When I received the email, it also had a SPAM warning in the Subject
> header, but this might be a corporate spam filter.)

The list software I use doesn't do this; must be something else in the
mix that's adding it.

Ned
Erik van der Poel
2006-11-06 01:48:10 UTC
Sorry, that was confusing. Implementors that are concerned about
ancient software will not be using the new escape sequences (and new
characters) anyway. They will probably use iso-2022-jp, which is
already described in RFC 1468.

I still wonder whether we should mention format=flowed in the
iso-2022-jp-2004 registration, as an alternative to quoted-printable
when lines (or paragraphs) are long. If we do mention format=flowed,
we may also wish to describe the Japanese line breaking rules to some
extent. Then again, maybe all this is too much for an IANA charset
registration -- maybe it belongs in an RFC?
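Here is a sketch of how format=flowed with the DelSp=yes parameter of RFC 3676 could work for unspaced Japanese text. Each soft line break is marked by a trailing space, and the receiver deletes that space when it rejoins the lines. The break rule is a deliberately tiny stand-in for Japanese line-breaking (kinsoku) rules, not UAX 14. The NO_LINE_START set and the function names are assumptions. The sketch counts characters rather than encoded octets and skips space-stuffing; a real implementation would need both.

```python
from __future__ import annotations

# Simplified kinsoku set (assumption): characters that should not start a line.
NO_LINE_START = set("、。，．・：；？！ー」』）】")


def flow_wrap(paragraph: str, width: int = 30) -> list[str]:
    """Break one paragraph into format=flowed (DelSp=yes) lines. Every line
    except the last ends with a soft-break space. `width` counts characters,
    not the encoded octets that matter on the wire."""
    lines = []
    start = 0
    while len(paragraph) - start > width:
        end = start + width
        # Pull the break back so the next line does not begin with a
        # closing bracket or punctuation mark.
        while end > start + 1 and paragraph[end] in NO_LINE_START:
            end -= 1
        lines.append(paragraph[start:end] + " ")
        start = end
    lines.append(paragraph[start:])
    return lines


def unflow(lines: list[str]) -> str:
    """Rejoin flowed lines, deleting each soft-break space (DelSp=yes)."""
    out = []
    for line in lines:
        if line.endswith(" "):
            out.append(line[:-1])
        else:
            out.append(line + "\n")  # a fixed line ends the paragraph
    return "".join(out)


para = "あ" * 30 + "。" + "い" * 10
for line in flow_wrap(para):
    print(repr(line))
```

In ISO-2022-JP, a 30-character line of JIS X 0208 text is 60 octets plus 6 octets of escape sequences and the trailing space, which stays within the 78-octet guideline.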

Erik

> > Suitable for 7-bit use in MIME body-part as text/plain or text/html.
>
> This wording might be misinterpreted to mean that iso-2022-jp-2004 can
> only be used with text/plain and text/html. I believe this charset is
> suitable for use with any subtype of text (text/*).
>
> Also, I wonder if we should say anything about
> Content-Transfer-Encoding. The quoted-printable encoding does not look
> good at all in very old (pre-MIME) implementations (and is not
> necessary when the lines are short enough). The format=flowed
> transformation would work fine, even in ancient software. (Do we care
> about pre-MIME implementations?)
>
> > B-encoding is recommended for use in MIME header-part, because
> > ISO-2022-JP-2004 is a partial extension of ISO-2022-JP.
>
> The term "header-part" is not normally used for these. They are called
> encoded-words. How about: "The 'B' encoding is recommended for use
> with MIME encoded-words in headers, as is recommended for the related
> charset iso-2022-jp."
>
> > - ESC ( B for ISO/IEC 646 IRV
>
> I wonder if we should add JIS X 0211 here. This set is similar to (if
> not identical to) ISO 6429 control characters in the range 0x00 to
> 0x1F. The most commonly used controls in the iso-2022-jp family are
> horizontal TAB, CR, LF and ESC. SI and SO _must_ not be used.
>
> Erik
Frank Ellermann
2006-11-06 02:49:03 UTC
Erik van der Poel wrote:

> maybe all this is too much for an IANA charset registration

s/maybe/certainly/ - mail oddities like 76, 78, 998, etc. aren't
relevant for the charset registry, unless we manage to create
charset names that are too long for a 4646 / 2231 "enhanced"
2047 word. The 4646 folks used "pc-multilingual-850+euro" as a
worst case, it should work.
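The arithmetic behind these length limits is straightforward. RFC 2047 caps an encoded-word at 75 characters and header lines that contain encoded-words at 76. A longer charset label, plus an RFC 2231 "*lang" suffix, therefore leaves less room for encoded text. The Python below is illustrative, and the function names are made up. The 6-octet figure assumes ESC $ B ... ESC ( B framing; ESC $ ( Q costs one more octet.

```python
ENCODED_WORD_MAX = 75  # RFC 2047 limit on the length of one encoded-word


def max_payload_octets(charset: str, language: str = "") -> int:
    """Octets of charset-encoded text that fit in one B-encoded word,
    optionally tagged with an RFC 2231 language suffix ("*" + tag)."""
    label = charset + ("*" + language if language else "")
    overhead = len("=?") + len(label) + len("?B?") + len("?=")
    base64_chars = (ENCODED_WORD_MAX - overhead) // 4 * 4  # whole 4-char groups
    return base64_chars // 4 * 3


def max_jis_chars(charset: str, language: str = "") -> int:
    """Double-byte characters per word, after ESC $ B and ESC ( B (6 octets)."""
    return (max_payload_octets(charset, language) - 6) // 2


print(max_payload_octets("ISO-2022-JP-2004"), max_jis_chars("ISO-2022-JP-2004"))  # 39 16
print(max_payload_octets("pc-multilingual-850+euro", "en"))  # 30
```

For "ISO-2022-JP-2004", this gives 39 octets, or 16 double-byte characters per word. The "pc-multilingual-850+euro" worst case with an "en" tag still leaves 30 octets.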

> maybe it belongs in an RFC?

IMO nothing's wrong with RFC 2646.

Frank
Kent Karlsson
2006-11-06 09:26:49 UTC
> > maybe it belongs in an RFC?
>
> IMO nothing's wrong with RFC 2646.

It does not deal with RTL/bidi scripts nor with
CJK line breaking rules (which is, I think, what
Erik was referring to).

UAX 9 (on bidi), UAX 14 (on line breaking), along
with associated data files and the Unicode book
get you quite far (though perhaps not ALL the way;
edge cases such as lines beginning with ">"
aren't covered).

/kent k
Frank Ellermann
2006-11-06 20:11:09 UTC
Ned Freed wrote:

> RFC 2646 is obsolete. The replacement is RFC 3676

Thanks. I was confused when my tool for finding "flowed" in
"new" (3xxx or 4xxx) RFCs found nothing. It was a bug: the tool
missed one hit (3676). I've fixed it.

Frank
--
http://purl.net/xyzzy/src/ygrep.cmd