Discussion:
Big5 / CP950
Shawn Steele
2011-09-19 17:23:44 UTC
Murata-san has asked us to update the big5 entry similarly to what we did for shift_jis, pointing out that Big5 has vendor-specific variations as well. E.g., add something like this:

Several vendor-specific charsets that derive from Big5 often use
the Big5 name instead of a more specific vendor charset name.
Windows Code Page 950 is one example; Big5-HKSCS, Big5+ and
several font-specific variations are others.

However, I don’t see a big5 entry in the charset registry, only the entry in the Character Sets table. There is an entry for Big5-HKSCS (which probably fits the definition of one of the Big5 variants above), but HKSCS is itself a variant rather than the base charset.

Am I missing something? Should a new entry for big5 be created (pointing to something like http://unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/OTHER/BIG5.TXT)? Other suggestions?

Thanks,

-Shawn

http://blogs.msdn.com/shawnste
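
To make the point about variants sharing the Big5 name concrete, here is a minimal sketch that decodes the same bytes with several Big5-family codecs and lists the two-byte sequences on which they disagree. It assumes CPython's bundled `big5`, `cp950` and `big5hkscs` codecs as stand-ins for the variants named above; the helper names `decode_all` and `disagreements` are hypothetical, and the scanned byte ranges are only a rough approximation of the Big5 code space.

```python
# Codec names as shipped with CPython; using them as stand-ins for the
# Big5 variants discussed in the thread is an assumption of this sketch.
VARIANTS = ("big5", "cp950", "big5hkscs")


def decode_all(raw: bytes) -> dict:
    """Decode the same bytes with each variant; None marks a decode failure."""
    results = {}
    for name in VARIANTS:
        try:
            results[name] = raw.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results


def disagreements(lead_bytes=range(0xA1, 0xFA)) -> list:
    """Return (bytes, decodings) pairs where the variants do not all agree."""
    trail_bytes = list(range(0x40, 0x7F)) + list(range(0xA1, 0xFF))
    found = []
    for lead in lead_bytes:
        for trail in trail_bytes:
            raw = bytes((lead, trail))
            decoded = decode_all(raw)
            if len(set(decoded.values())) > 1:
                found.append((raw, decoded))
    return found


if __name__ == "__main__":
    # The bytes A4 A4 A4 E5 are "中文" in the common Big5 range.
    print(decode_all(b"\xa4\xa4\xa4\xe5"))
    print(len(disagreements()), "two-byte sequences decode differently")
```

Text in the common range decodes identically everywhere, so a label of "big5" only becomes ambiguous for the sequences that `disagreements()` reports.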
Martin J. Dürst
2011-09-20 01:08:23 UTC
Hello Shawn,
Post by Shawn Steele
Murata-san has asked us to update the big5 entry similarly to what we did for shift_jis, pointing out that Big5 has vendor-specific variations as well. Eg: add something like this:
Thanks for picking this up.
Post by Shawn Steele
Several vendor specific charsets that derive from Big5 often use
the Big5 name instead of a more specific vendor charset name.
Windows Code Page 950 is one example, Big-5 HKSCS, Big5+ and
several font specific variations are others.
However, I don’t see a big5 entry in the charset registry, only the entry in the Character Sets table.
I don't understand what you mean. The registry is at
http://www.iana.org/assignments/character-sets. I see an entry that says:

Name: Big5 (preferred MIME name)
MIBenum: 2026
Source: Chinese for Taiwan Multi-byte set.
PCL Symbol Set Id: 18T
Alias: csBig5

Do you mean that Big5 doesn't appear at
http://www.iana.org/assignments/charset-reg/index.html ? The fact that
it doesn't appear there and also doesn't refer to an RFC definitely
means that this registration is somewhat at the lower end of what's
expected, but it's nevertheless registered.
Post by Shawn Steele
There is an entry for Big5-HKSCS (which probably fits the definition of one of the Big5 variants above), but HKSCS is a variant.
Am I missing something? Should a new entry for big5 be created (pointing to something like http://unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/OTHER/BIG5.TXT)? Other suggestions?
For an update of the Big5 registration, it would indeed be good if you
proposed a filled-in template, adding references such as the above where
needed.

Regards, Martin.
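
The registry entry quoted in this message is a plain "Field: value" record. As an illustration only, the following sketch parses such a record into a dictionary; the helper name `parse_registry_entry` and the `preferred_mime_name` key are hypothetical, and the sketch assumes every field fits on one line, which may not hold for every entry in the real registry file.

```python
# The Big5 entry exactly as quoted in the message above.
BIG5_ENTRY = """\
Name: Big5 (preferred MIME name)
MIBenum: 2026
Source: Chinese for Taiwan Multi-byte set.
PCL Symbol Set Id: 18T
Alias: csBig5
"""


def parse_registry_entry(text: str) -> dict:
    """Parse one 'Field: value' record (hypothetical helper)."""
    entry = {"Alias": []}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        field, _, value = line.partition(":")
        field, value = field.strip(), value.strip()
        if field == "Alias":
            entry["Alias"].append(value)  # an entry may list several aliases
        elif field == "MIBenum":
            entry["MIBenum"] = int(value)
        elif field == "Name":
            # Split off the "(preferred MIME name)" annotation, if present.
            name, _, note = value.partition("(")
            entry["Name"] = name.strip()
            entry["preferred_mime_name"] = note.startswith("preferred MIME name")
        else:
            entry[field] = value
    return entry


if __name__ == "__main__":
    print(parse_registry_entry(BIG5_ENTRY))
```

Note that the record carries no "Intended usage" or reference field, which matches the observation that this registration sits at the lower end of what is expected.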
Shawn Steele
2011-09-20 01:34:41 UTC
Post by Martin J. Dürst
Do you mean that Big5 doesn't appear at http://www.iana.org/assignments/charset-reg/index.html ? The fact that it doesn't appear there and also doesn't refer to an RFC definitely means that this registration is somewhat at the lower end of what's expected, but it's nevertheless registered.
Yes, that's what I meant; I was wondering if I was missing a reference :) That is, I have to do the whole thing, not just a little tweak to an existing template ;-)
Shawn Steele
2011-09-20 17:44:25 UTC
Here's some proposed text for a more complete registration. Comments welcome. AFAICT this code page is quite a bit less stable than others, and there is a plethora of mappings. I've included two ISO 10646 equivalency tables for that reason.

Thanks,
Shawn


-----------------------------------

Charset name: big5

Charset aliases: (None)

MIBenum: 2026

Suitability for use in MIME text:

Yes, big5 is suitable for use with subtypes of the "text"
Content-Type. Note that big5 is an 8-bit charset. Care should
be taken to choose an appropriate Content-Transfer-Encoding.

Two example ISO 10646 equivalency tables: Note that Big5 has
many variants, so these exemplars provide two common mappings:
http://unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/OTHER/BIG5.TXT
http://unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP950.TXT

Additional information:

Several vendor-specific charsets that derive from Big5 often use
the Big5 name instead of a more specific vendor charset name.
Big5-HKSCS is one example; Microsoft Code Page 950, Big5+ and
several font-specific variations are other examples.

Although not authoritative, the following references may also be of
interest:

Printed mapping table:
Dr. International "Developing International Software, Second Edition",
Microsoft Press, ISBN 0-7356-1583-7, 2003, p. 778 and appendixes on CD.

Microsoft Windows extended "best fit" behavior:
http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit950.txt

Again not authoritative, but the Wikipedia article currently touches
on the many variations of Big5 and may be of interest to implementers:
http://en.wikipedia.org/wiki/Big-5

The wide variety of existing variations of Big5 may make it
unsuitable for many modern applications. Developers should
consider whether UTF-8 or UTF-16 would be more appropriate for
new applications.

This is an update of an existing registration of this charset. This
charset name is in use.

This charset is also known as Windows Code Page 950 or cp950 for
short; these are NOT aliases.

Person & email address to contact for further information:

Shawn Steele
Email: Shawn.Steele&microsoft.com

Microsoft Corporation
One Microsoft Way
Redmond, WA 98052
U.S.A.

Intended usage: COMMON
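
As a rough illustration of the "Suitability for use in MIME text" note in the draft, the sketch below builds a text/plain part labelled with charset big5 using Python's standard `email` package and lets the library choose a Content-Transfer-Encoding for the 8-bit payload. The helper name `build_big5_part` is hypothetical, and which transfer encoding gets chosen depends on the library's own charset table, so it is printed here and not assumed.

```python
from email.mime.text import MIMEText


def build_big5_part(text: str) -> MIMEText:
    """Build a text/plain MIME part whose body is encoded as big5."""
    # MIMEText encodes the text with the named charset and adds a
    # Content-Transfer-Encoding header suitable for the resulting bytes.
    return MIMEText(text, "plain", "big5")


if __name__ == "__main__":
    part = build_big5_part("中文")
    print(part["Content-Type"])
    print(part["Content-Transfer-Encoding"])
    # Decoding the payload recovers the original big5 bytes.
    print(part.get_payload(decode=True).decode("big5"))
```

A receiver only sees the label "big5", so whether it decodes the payload with the same mapping the sender used is exactly the variant problem the draft describes.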
Martin J. Dürst
2011-09-21 08:00:55 UTC
Hello Shawn,
Post by Shawn Steele
Here's some proposed text for a more complete registration.
Many thanks for doing this work. Some comments below, mostly nits.
Post by Shawn Steele
Comments welcome. AFAICT this code page is quite a bit less stable than others, and there are a plethora of mappings. I've included two ISO10646 equivalency tables for that reason.
Thanks,
Shawn
-----------------------------------
Charset name: big5
Charset aliases: (None)
MIBenum: 2026
Yes, big5 is suitable for use with subtypes of the "text"
Content-Type. Note that big5 is an 8-bit charset. Care should
be taken to choose an appropriate Content-Transfer-Encoding.
Two example ISO 10646 equivalency tables: Note that Big5 has
many variants, so these exemplars provide two common mappings:
http://unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/OTHER/BIG5.TXT
http://unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP950.TXT
(I'd put the "Note that Big5 has many variants..." sentence after the URIs.)
Post by Shawn Steele
Several vendor specific charsets that derive from Big5 often use
the Big5 name instead of a more specific vendor charset name.
Big5-HKSCS is one example, Microsoft Code Page 950, Big5+ and
several font specific variations are other examples.
From what I have read in the Wikipedia article, Big5+ seems to be quite
far away from the "average" Big5 variant. I'm not sure I'd list it here.
Post by Shawn Steele
Although not authoritative, the following references may also be of
interest:
Printed mapping table:
Dr. International "Developing International Software, Second Edition",
Microsoft Press, ISBN 0-7356-1583-7, 2003, p. 778 and appendixes on CD.
Microsoft windows extended "best fit" behavior:
http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit950.txt
Again not authoritative, but the Wikipedia article currently touches
on the many variations of Big5 and may be of interest to implementers:
http://en.wikipedia.org/wiki/Big-5
I'd personally shorten the text here, e.g. to something like:
"Additional information about the many variants of Big5:"
Post by Shawn Steele
The wide variety of existing variations of Big5 may make it
unsuitable for many modern applications. Developers should
consider whether UTF-8 or UTF-16 would be more appropriate for
new applications.
This is an update of an existing registration of this charset. This
charset name is in use.
This charset is also known as Windows Code Page 950 or cp950 for
short; these are NOT aliases.
Shawn Steele
Email: Shawn.Steele&microsoft.com
Microsoft Corporation
One Microsoft Way
Redmond, WA 98052
U.S.A.
Intended usage: COMMON
You have "COMMON" here while your Shift_JIS registration has "LIMITED".
Is that by accident, or is there some rationale behind it?

Regards, Martin.
Shawn Steele
2011-09-21 15:13:10 UTC
Moved the note and removed Big5+. If anyone knows other examples, I'd include those.
Post by Martin J. Dürst
You have "COMMON" here while your Shift_JIS registration has "LIMITED".
Is that by accident, or is there some rationale behind it?
Um, by accident. I copied the original shift-jis registration, and used the windows-1252 one as a template for this. I have no clue what the distinction is :) Changed to LIMITED USE (reasoning that the variations cause instability between implementations, so I'd much rather have people picking something like UTF-8). Is there a definition of these terms? All of them should be OBSOLETE in favor of UTF-* ;-) I'd use that if I could get away with it.

-Shawn

-----------------------------------

Charset name: big5

Charset aliases: (None)

MIBenum: 2026

Suitability for use in MIME text:

Yes, big5 is suitable for use with subtypes of the "text"
Content-Type. Note that big5 is an 8-bit charset. Care should
be taken to choose an appropriate Content-Transfer-Encoding.

Two example ISO 10646 equivalency tables:
http://unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/OTHER/BIG5.TXT
http://unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP950.TXT

Note that Big5 has many variants, so these exemplars provide two
common mappings.

Additional information:

Several vendor-specific charsets that derive from Big5 often use
the Big5 name instead of a more specific vendor charset name.
Big5-HKSCS is one example; Microsoft Code Page 950 and
several font-specific variations are other examples.

Although not authoritative, the following references may also be of
interest:

Printed mapping table:
Dr. International "Developing International Software, Second Edition",
Microsoft Press, ISBN 0-7356-1583-7, 2003, p. 778 and appendixes on CD.

Microsoft Windows extended "best fit" behavior:
http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit950.txt

Additional information about the many variants of Big5:
http://en.wikipedia.org/wiki/Big-5

The wide variety of existing variations of Big5 may make it
unsuitable for many modern applications. Developers should
consider whether UTF-8 or UTF-16 would be more appropriate for
new applications.

This is an update of an existing registration of this charset. This
charset name is in use.

This charset is also known as Windows Code Page 950 or cp950 for
short; these are NOT aliases.

Person & email address to contact for further information:

Shawn Steele
Email: Shawn.Steele&microsoft.com

Microsoft Corporation
One Microsoft Way
Redmond, WA 98052
U.S.A.

Intended usage: LIMITED USE
Ira McDonald
2011-09-21 16:40:32 UTC
Hi Shawn,

RFC 2978 section 5 'Charset Registration Template'

"Intended usage:

(One of COMMON, LIMITED USE or OBSOLETE)"

The spirit of LIMITED USE has been to discourage the use
of legacy charsets that are particularly problematic, such as Big5.

Not sure if OBSOLETE has ever been used.

Martin - searching for this made me realize that the
plaintext IANA Charset Registry at

ftp://ftp.iana.org/assignments/character-sets

contains 257 entries - they don't include the Intended
Usage field.

I suggest we work with IANA to change the plaintext
registry.

In most cases this data is long lost (if ever submitted)
because the directory

ftp://ftp.iana.org/assignments/charset-reg

contains only 55 entries.

Cheers,
- Ira

Ira McDonald (Musician / Software Architect)
Chair - Linux Foundation Open Printing WG
Co-Chair - IEEE-ISTO PWG IPP WG
Chair - TCG Embedded Systems Hardcopy SWG
IETF Designated Expert - IPP & Printer MIB
Blue Roof Music/High North Inc
http://sites.google.com/site/blueroofmusic
http://sites.google.com/site/highnorthinc
mailto:***@gmail.com
Christmas through April:
579 Park Place Saline, MI 48176
734-944-0094
May to Christmas:
PO Box 221 Grand Marais, MI 49839
906-494-2434



Shawn Steele
2011-09-21 18:05:02 UTC
I saw the “one of…”, but they aren’t defined in the RFC? Your spirit of Limited Use sounds about right for big5 though.

Thanks,
Shawn

Martin J. Dürst
2011-09-22 04:09:27 UTC
Post by Shawn Steele
I saw the “one of…”, but they aren’t defined in the RFC? Your spirit of Limited Use sounds about right for big5 though.

I agree.

For some more comments, please see below.
Post by Ira McDonald
RFC 2978 section 5 'Charset Registration Template'
"Intended usage:
(One of COMMON, LIMITED USE or OBSOLETE)"
The spirit of LIMITED USE has been to discourage the use
of legacy charsets that are particularly problematic - Big5.
Not sure if OBSOLETE has ever been used.
I haven't checked, but I guess these were not introduced when the
charset registry was created, but with a later update.

I assume the distinction between COMMON and LIMITED USE was originally
intended as some kind of advice to implementers: If it's COMMON, then
make sure it's supported; if it's LIMITED USE, you may not need it. But
I don't think that has ever really worked.
Post by Ira McDonald
Martin - searching for this made me realize that the
plaintext IANA Charset Registry at
ftp://ftp.iana.org/assignments/character-sets
contains 257 entries - they don't include the Intended
Usage field.
I suggest we work w/ IANA to change the plaintext
registry.
Assuming somebody has lots of spare time, that would indeed be a good
idea. Assuming that everybody's time is rather limited, it may have to
wait. There are quite a few other things in the registry that might
benefit from clearing up, but the critical mass may not be reached yet.
Post by Ira McDonald
In most cases this data is long lost (if ever submitted)
because the directory
ftp://ftp.iana.org/assignments/charset-reg
contains only 55 entries.
Lots of stuff was taken from http://tools.ietf.org/html/rfc1345 (and
some other places). There's no need to keep that kind of information in
separate templates.

Regards, Martin.
Martin J. Dürst
2011-09-20 01:23:30 UTC
Post by Shawn Steele
However, I don’t see a big5 entry in the charset registry, only the entry in the Character Sets table.

Given there seems to be some confusion over what the registry is, I had
a look and I'm proposing the following tweaks/changes. I'm sending this
to the charset mailing list first to see what people think, and will ask
IANA whether they can make the changes that we agree on.

On http://www.iana.org/assignments/charset-reg/index.html:

Change the title from "Character Set Registrations" to "Links to
Character Set Registration Forms"

Near the link to RFC 2978, add another item saying:
"Full list of registered Character Sets", with a link to
http://www.iana.org/assignments/character-sets.

Change the sentence "The following is the list of Character Set
Registrations not defined in RFCs:" to "The following is the list of
Character Set Registration forms not contained in an RFC:"


On http://www.iana.org/protocols/, I suggest changing "Registration
Templates" for Character Sets to "Registration Forms". The word
"Template", as far as I'm aware, is only used for an empty or example
form, not for the final product of filling in the form.


Regards, Martin.