Registration of new charset CP50220

Discussion:

NARUSE Yui

2010-04-21 12:25:47 UTC

Charset name: CP50220

Charset aliases: csCP50220

Suitability for use in MIME text:

Yes, CP50220 is suitable for use with subtypes of the "text" Content-Type.

Because CP50220 is a 7-bit encoding, no Content-Transfer-Encoding is needed.
Base64 or Quoted-Printable encoding MAY break this encoding.

Published specification(s):

CP50220 consists of the following character sets:

reg#  character set        ESC sequence   designated to
-------------------------------------------------------
6     US-ASCII             ESC ( B        G0
13    JIS X 0201-Katakana  ESC ( I        G0
14    JIS X 0201-Roman     ESC ( J        G0
42    JIS X 0208-1978      ESC $ @        G0
87    JIS X 0208-1983      ESC $ B        G0
13    JIS X 0201-Katakana  ESC ) I        G1

reg#  character set        shift in with  designated to
-------------------------------------------------------
6     US-ASCII             SI             G0
13    JIS X 0201-Katakana  SO             G0

* The beginning of a text is assumed to have "ESC ( B ESC ) I".
* Each line of CP50220 text MUST end with ASCII.
* There are two kinds of shifts: SI and SO. Shift functions
specify how to interpret the subsequent bytes.
* The shift SI (one byte with hexadecimal value 0F) declares that
subsequent bytes are interpreted in US-ASCII.
* The shift SO (one byte with hexadecimal value 0E) declares that
subsequent bytes are interpreted in JIS X 0201 Katakana.
* On receiving, JIS X 0201-Katakana characters MAY be encoded as:
  * GL with the escape sequence: ESC ( I
  * GL with the shifts: SI / SO
  * GR
* On sending, JIS X 0201-Katakana MUST be converted to the
corresponding characters of JIS X 0208.
* The character set of CP50220 is based on Windows Codepage 932,
so refer to it for the meaning and the Unicode mapping of each character:
http://msdn.microsoft.com/en-us/goglobal/cc305152.aspx
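
The receiving and sending rules above can be sketched in Python as
follows. This is only an illustration, not a reference implementation:
the names decode_cp50220 and fold_kana_for_sending are hypothetical,
Python's built-in "cp932" codec stands in for the Windows Codepage 932
mapping, JIS X 0201-Roman is assumed to decode like US-ASCII, the
"each line MUST end with ASCII" rule is not enforced, and NFKC
normalization is used as an approximation of the Katakana conversion
on sending.

```python
import re
import unicodedata

ESC, SO, SI = 0x1B, 0x0E, 0x0F

# G0 designations from the table above (the two bytes that follow ESC).
G0_DESIGNATIONS = {
    b"(B": "ascii",
    b"(I": "kana",
    b"(J": "roman",  # assumption: decoded like US-ASCII
    b"$@": "jisx0208",
    b"$B": "jisx0208",
}


def jis_to_sjis(j1, j2):
    """Convert a 7-bit JIS X 0208 byte pair to its Shift_JIS byte pair."""
    s1 = (j1 + 1) // 2 + 0x70
    if s1 > 0x9F:
        s1 += 0x40
    if j1 % 2:
        s2 = j2 + 0x1F + (1 if j2 >= 0x60 else 0)
    else:
        s2 = j2 + 0x7E
    return bytes([s1, s2])


def kana(b7):
    """Map a 7-bit JIS X 0201-Katakana byte to halfwidth katakana."""
    if not 0x21 <= b7 <= 0x5F:
        raise ValueError(f"not a JIS X 0201-Katakana byte: {b7:#04x}")
    return chr(0xFF40 + b7)


def decode_cp50220(data):
    """Decode bytes following the receiving rules listed above."""
    g0 = "ascii"     # initial state: ESC ( B ESC ) I
    shifted = False  # True between SO and SI
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b == ESC:
            seq = data[i + 1:i + 3]
            if seq in G0_DESIGNATIONS:
                g0 = G0_DESIGNATIONS[seq]
            elif seq != b")I":  # G1 only ever holds Katakana
                raise ValueError(f"unsupported escape sequence: {seq!r}")
            i += 3
        elif b == SO or b == SI:
            shifted = b == SO
            i += 1
        elif b >= 0x80:  # GR: 8-bit Katakana (JIS8)
            out.append(kana(b - 0x80))
            i += 1
        elif b <= 0x20 or b == 0x7F:  # controls and space pass through
            out.append(chr(b))
            i += 1
        elif shifted or g0 == "kana":
            out.append(kana(b))
            i += 1
        elif g0 == "jisx0208":
            if i + 1 >= len(data):
                raise ValueError("truncated JIS X 0208 character")
            out.append(jis_to_sjis(b, data[i + 1]).decode("cp932"))
            i += 2
        else:  # ascii or roman
            out.append(chr(b))
            i += 1
    return "".join(out)


HALFWIDTH_KANA = re.compile("[\uFF61-\uFF9F]+")


def fold_kana_for_sending(text):
    """Replace halfwidth katakana with JIS X 0208 (fullwidth) equivalents."""
    return HALFWIDTH_KANA.sub(
        lambda m: unicodedata.normalize("NFKC", m.group()), text)
```

For example, decode_cp50220(b"A\x0e1\x0fB") yields "A", a halfwidth
Katakana A (U+FF71), then "B". A sender would first pass its text
through fold_kana_for_sending so that no JIS X 0201-Katakana remains.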

ISO 10646 equivalency table:

This charset belongs to the ISO/IEC 2022 family.
Conversion of each character follows Windows Codepage 932:
http://msdn.microsoft.com/en-us/goglobal/cc305152.aspx
http://unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP932.TXT
http://icu-project.org/repos/icu/data/trunk/charset/data/ucm/windows-932-2000.ucm

Additional information:

This is a request for a new registration of this charset.

CP50220 is a variant of ISO-2022-JP (just as Windows-31J is a variant of
Shift_JIS). This charset differs from ISO-2022-JP in the following ways:
* CP50220 supports JIS X 0201-Katakana
* CP50220 supports characters extended by Windows Codepage 932
* The Unicode mappings of some characters are different
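
The three differences can be observed with Python's standard codecs.
This is a rough illustration only: Python has no CP50220 codec, so
its "cp932" codec is used here as a stand-in for the Windows Codepage
932 repertoire and mapping, and "iso2022_jp" for the registered
ISO-2022-JP. The helper can_encode is a hypothetical name.

```python
def can_encode(text, codec):
    """Return True if every character of text is encodable in codec."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


# 1. Halfwidth Katakana A (JIS X 0201-Katakana) in each repertoire.
print(can_encode("\uff71", "iso2022_jp"), can_encode("\uff71", "cp932"))

# 2. CIRCLED DIGIT ONE, one of the Codepage 932 extended characters.
print(can_encode("\u2460", "iso2022_jp"), can_encode("\u2460", "cp932"))

# 3. The same JIS X 0208 character (row 1, cell 33) under each mapping.
#    It is 0x2141 in 7-bit JIS and 0x8160 in Shift_JIS form.
iso_char = b"\x1b$B!A\x1b(B".decode("iso2022_jp")
cp932_char = b"\x81\x60".decode("cp932")
print(f"U+{ord(iso_char):04X}", f"U+{ord(cp932_char):04X}")
```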

Typical users of CP50220 are web browsers. When a web browser loads
a page that is declared or auto-detected as "ISO-2022-JP", it does
not interpret the page as the true ISO-2022-JP registered in IANA
Character Sets but as CP50220. When it posts form data as
"ISO-2022-JP", the data is also encoded as CP50220. Note that although
csISO2022JP is an alias of ISO-2022-JP in IANA Character Sets, on
Windows it means neither the registered ISO-2022-JP nor CP50220 but
CP50221.

Other typical users are Japanese IRC networks. They sometimes send
JIS X 0201-Katakana encoded in GR (JIS8).

The name "CP50220" is in use in the following applications:
* Citrus iconv (used by NetBSD and DragonFly)
* Mojikan http://www.mirai-ii.co.jp/moji/mojikan/
* nkf 2.0.5
* Encode-EUCJPMS-0.06

Moreover, applications that use MLang.DLL or the .NET Framework to
convert "ISO-2022-JP" implicitly use this charset.

So this charset is widely used, but it has no registered name of its own.

The name is not "Windows-50220" because some of the applications that
accept the name "CP50220" do not support the name "Windows-50220".

CP50220 is intended for communicating with legacy systems.
UTF-8 is preferred to CP50220 for new systems.

Related references are:

"Remarks" of "GetEncodings Method" of "System.Text"
http://msdn.microsoft.com/en-us/library/system.text.encoding.getencodings.aspx

"Unicode$B$K$h$k(BJIS X0213$B<BAuF~Lg!=>pJs%7%9%F%`$N?7$?$JF|K\8l=hM}4D6-(B"
$BF|7P(BBP$B%=%U%H%W%l%9(B, ISBN 978-4891006082, 2008, p. 17-18, 20, 120-158

CP50220 - Legacy Encoding Project
http://legacy-encoding.sourceforge.jp/wiki/index.php?cp50220

This charset is also known as Windows Codepage 50220.

Person & email address to contact for further information:

NARUSE, Yui
Email: ***@ruby-lang.org

Intended usage: LIMITED USE

--
NARUSE, Yui
***@airemix.jp

Masatoshi Kimura

2010-04-21 16:21:23 UTC