Discussion:
Windows Code Pages 932, 936, 949 and 950
Erik van der Poel
2006-11-03 05:53:24 UTC
It turns out that MSIE does not support the following charset names:

windows-932
windows-936
windows-949
windows-950

Of course, MSIE does support those code pages, but the most commonly
used names for those code pages in MSIE are shift_jis, gb2312, euc-kr
and big5, respectively.

If we are emphatic about cp1252 not being an alias for windows-1252
for interoperability reasons, then for the same reasons, we should not
register windows-932, windows-936, windows-949 and windows-950.

(Windows-874 is supported by MSIE, so it will probably be OK to
register this one.)

Erik
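
The mapping described in this message can be written down as a small lookup table. What follows is only a sketch of the four pairs named above, not a copy of MSIE's or MLang's actual tables. The names `MSIE_COMMON_LABEL` and `label_for_code_page` are made up for illustration, and the fallback to the windows-nnnn pattern is an assumption based on windows-874 and windows-1252 being mentioned as working names.

```python
# Code page number -> the label this message says is most commonly used
# for it in MSIE. Only the four pairs named in the message are listed.
MSIE_COMMON_LABEL = {
    932: "shift_jis",
    936: "gb2312",
    949: "euc-kr",
    950: "big5",
}


def label_for_code_page(code_page: int) -> str:
    """Return a label to put on the wire for a Windows code page.

    Assumption: code pages not in the table use the windows-nnnn
    pattern (the thread mentions windows-874 and windows-1252).
    """
    return MSIE_COMMON_LABEL.get(code_page, f"windows-{code_page}")


if __name__ == "__main__":
    for cp in (932, 936, 949, 950, 874, 1252):
        print(cp, label_for_code_page(cp))
```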
Erik van der Poel
2006-11-03 05:55:12 UTC
But of course windows-936 is already registered, as an alias of gbk.

E.
Erik van der Poel
2006-11-03 07:46:41 UTC
Hi Shawn,

Yes, I think it would be helpful to have the lists of recognized and
unrecognized names, particularly for IE (i.e. MLang). (The .NET
framework may not be as relevant, unless it is directly and often
involved in charset names as they appear "on the wire". I don't know
.NET very well.)

If you also know what the situation is in the email area (as opposed
to the Web), that would be good to know too. I.e. do Microsoft email
agents respond to charset names? And if so, which ones? Same as MLang?

Thank you!

Erik
Post by Shawn Steele
Some of the code pages (I'd have to check which ones) respond to x-windows-nnn instead of windows-nnn in the .NET framework, which is similar to MLang (which is what IE uses to map names to actual code pages).
IMHO it would be worth using only the names that MLang or .NET (or IE) supports for aliasing Windows code page names. Our names aren't consistent, and registering a superset of the patterns as aliases would probably lead to trouble.
One reason I'd recommend not registering aliases that Microsoft products don't actually recognize for Microsoft code pages is that it is unlikely that we'd start responding to "new" names, since that might cause user confusion and compatibility problems.
If it would help, I can provide a list of the names and aliases that we actually use for the various code pages, and names that we don't recognize but that I know have been used to refer to them (like CP1252).
- Shawn
Shawn Steele
2006-11-04 00:18:32 UTC
Bccing Mike Ksar. We've chatted over here and I (***@microsoft.com) would be a better contact for code pages in the future.

I'll try to compile a list shortly; it may take a few days because I'd like to get it right the first time :).

.NET applications can pass charset names "on the wire", and the names they use should be mostly the same as MLang's. Microsoft's email clients & servers recognize the same names.

One thing we would like to avoid is accidentally introducing new aliases that aren't currently used. If that happened then other mail agents could accidentally pass the "wrong" name and interoperability could suffer.

- Shawn

Erik van der Poel
2006-11-04 07:25:03 UTC
Shawn,

Thank you for offering to compile a list. I'm glad you talked to Mike.
Will you continue the effort that Mike started to update the windows-*
registrations?

By the way, it might be good to have the x- names in your list too.
Some x- names appear to be used quite a lot.

Thanks again,

Erik
Shawn Steele
2006-11-07 00:06:42 UTC
I've created a list of the aliases that we recognize for the code pages we're discussing:

http://blogs.msdn.com/shawnste/archive/2006/11/06/expected-names-of-microsoft-windows-ansi-code-pages-encodings.aspx

They aren't as messy as I feared, many don't have aliases.
Frank Ellermann
2006-11-07 17:02:48 UTC
Post by Shawn Steele
http://blogs.msdn.com/shawnste/archive/2006/11/06/expected-names-of-microsoft-windows-ansi-code-pages-encodings.aspx
They aren't as messy as I feared, many don't have aliases.
Thanks, that's interesting. There are roughly two classes: some
windows-nnnn, and three others: shift_jis, gb2312, ks_c_5601-1987.

For the windows-nnnn names we have, in theory, to check whether they
are already listed under another name. If not, they can be listed or
updated as is. Otherwise (already listed under another name) we can
add windows-nnnn as an alias and be done; that part should be simple.

For the three others it could be tricky if they are already listed
under these names with a different mapping. Let's first finish
the simple windows-nnnn cases. Apparently windows-1252 is almost
ready now.

Frank
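
The two-way decision for windows-nnnn names described above could be sketched roughly as follows. This is an illustration only: the registry is modelled as a plain dictionary, the function name `register_windows_name` is hypothetical, and deciding whether a charset is "already listed under another name" is left to the caller, since that requires comparing the actual mappings by hand.

```python
from typing import Dict, List, Optional

# Hypothetical model of the registry: main label -> list of aliases.
Registry = Dict[str, List[str]]


def register_windows_name(
    registry: Registry, name: str, listed_as: Optional[str] = None
) -> str:
    """Apply the simple rule for a windows-nnnn name.

    listed_as is the existing main label if the same charset is already
    listed under another name, or None if it is not listed at all.
    """
    if listed_as is None:
        # Not listed yet: list (or update) it under its own name.
        registry.setdefault(name, [])
        return "listed as is"
    # Already listed under another name: add windows-nnnn as an alias.
    aliases = registry[listed_as]
    if name.lower() not in (a.lower() for a in aliases):
        aliases.append(name)
    return "added as alias"


if __name__ == "__main__":
    reg: Registry = {"GBK": []}
    print(register_windows_name(reg, "windows-936", listed_as="GBK"))
    print(register_windows_name(reg, "windows-1252"))
    print(reg)
```

The windows-936 case mirrors what the thread reports about the existing registry, where windows-936 is an alias of gbk.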
Shawn Steele
2006-11-07 20:08:45 UTC
I should mention that these are names that we recognize for these code pages. In some cases the mapping might be the closest we have, but not completely appropriate, in which case the names shouldn't be included as aliases in the charset doc.

- Shawn

Shawn Steele
2006-11-03 07:18:26 UTC
Some of the code pages (I'd have to check which ones) respond to x-windows-nnn instead of windows-nnn in the .NET framework, which is similar to MLang (which is what IE uses to map names to actual code pages).

IMHO it would be worth using only the names that MLang or .NET (or IE) supports for aliasing Windows code page names. Our names aren't consistent, and registering a superset of the patterns as aliases would probably lead to trouble.

One reason I'd recommend not registering aliases that Microsoft products don't actually recognize for Microsoft code pages is that it is unlikely that we'd start responding to "new" names, since that might cause user confusion and compatibility problems.

If it would help, I can provide a list of the names and aliases that we actually use for the various code pages, and names that we don't recognize but that I know have been used to refer to them (like CP1252).

- Shawn
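
The recommendation above, to register only names that the implementations already recognize, amounts to filtering candidate aliases against a known list. Here is a minimal sketch, assuming such a list is available; the function name `aliases_safe_to_register` and the sample set are placeholders, not Microsoft's actual list.

```python
from typing import Iterable, List


def aliases_safe_to_register(
    candidates: Iterable[str], recognized: Iterable[str]
) -> List[str]:
    """Keep only candidate aliases that are already recognized.

    Charset names are compared case-insensitively.
    """
    known = {name.lower() for name in recognized}
    return [c for c in candidates if c.lower() in known]


if __name__ == "__main__":
    # Placeholder data: the thread mentions windows-874 and windows-1252
    # as working names and CP1252 as one that is not recognized.
    recognized = {"windows-874", "windows-1252"}
    candidates = ["Windows-1252", "cp1252", "windows-874"]
    print(aliases_safe_to_register(candidates, recognized))
```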

Frank Ellermann
2006-11-03 09:35:42 UTC
Post by Shawn Steele
Some of the code pages (I'd have to check which ones) respond to
x-windows-nnn instead of windows-nnn in the .NET framework, which
is similar to MLang, which is what IE uses to map names to actual
code pages.
IMHO it would be worth using only the names that MLang or .NET (or
IE) supports for aliasing windows code page names.
Yes, but we can't register x-anything. X- is reserved in RFC 2045
chapter 5, and RFC 2978 inherits that in chapter 3.1. The x- rules
are always the same, nothing special wrt charsets.

IOW we need a simple way to break this rule if necessary. We could
pro forma register "anything" mentioning that only x-anything works
in practice. Or because we always need a cswhatever we could take
csanything.
Post by Shawn Steele
it is unlikely that we'd start responding to "new" names since that
might cause user confusion and compatibility problems.
Dropping the x- in x-anything shouldn't be _too_ confusing, ignoring
the "minor" trouble that it won't work for old = existing systems :-(
Post by Erik van der Poel
windows-932
[...]
Post by Erik van der Poel
windows-949
windows-950
Of course, MSIE does support those code pages, but the most commonly
used names for those code pages in MSIE are shift_jis,
[...]
Post by Erik van der Poel
euc-kr and big5, respectively.
These names are already registered, we can't use them for something
_different_ outside of the "additional info" for 932, 949, and 950.

How about csWindows-932, etc. ? No other cs contains a hyphen, so
maybe it's a bad idea.

Frank
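
The x- restriction and the "pro forma" workaround floated in this message can be illustrated with a short sketch. The function names and the wording of the note are hypothetical; this is not a description of how the registry is actually maintained.

```python
from typing import Tuple


def is_experimental(name: str) -> bool:
    """True for x- names, which the message says cannot be registered."""
    return name.lower().startswith("x-")


def pro_forma_entry(name: str) -> Tuple[str, str]:
    """Sketch of the workaround: register the name without its x- prefix
    and record that only the x- form works in practice."""
    if not is_experimental(name):
        return name, ""
    bare = name[2:]  # drop the leading "x-"
    return bare, f"only {name} works in practice"


if __name__ == "__main__":
    print(pro_forma_entry("x-anything"))
    print(pro_forma_entry("windows-874"))
```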
Erik van der Poel
2006-11-04 00:21:23 UTC
Post by Frank Ellermann
Post by Erik van der Poel
windows-932
[...]
Post by Erik van der Poel
windows-949
windows-950
Of course, MSIE does support those code pages, but the most commonly
used names for those code pages in MSIE are shift_jis,
[...]
Post by Erik van der Poel
euc-kr and big5, respectively.
These names are already registered, we can't use them for something
_different_ outside of the "additional info" for 932, 949, and 950.
How about csWindows-932, etc. ? No other cs contains a hyphen, so
maybe it's a bad idea.
No, my suggestion is that we reject the attempt to register
windows-932, windows-949 and windows-950. According to Shawn,
Microsoft wouldn't support those names even if they were registered.
(So we shouldn't register csWindows-932 either.)

It would be nice if Shawn and Mike Ksar could discuss this, reach some
sort of agreement, and then report back to this mailing list.

Erik
Martin Duerst
2006-11-05 10:22:28 UTC
Post by Frank Ellermann
Yes, but we can't register x-anything. X- is reserved in RFC 2045
chapter 5, and RFC 2978 inherits that in chapter 3.1. The x- rules
are always the same, nothing special wrt charsets.
IOW we need a simple way to break this rule if necessary. We could
pro forma register "anything" mentioning that only x-anything works
in practice. Or because we always need a cswhatever we could take
csanything.
As far as I understand, we are only discussing aliases, not the main
label. As long as the main label is supported, there is no need for
aliases at all. People look at the registry, pick the main label,
label their content, and send it off. The receiver understands what
is meant. Interoperability rules.

Aliases are only useful to document varying existing practice
for labeling charsets. Ideally, such practice will die out
eventually.

Regards, Martin.
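
The division of labour described above (senders use the main label, aliases only help receivers cope with existing practice) could be sketched like this. The registry content is limited to the one alias the thread confirms, windows-936 for gbk, and the function names are assumptions for illustration.

```python
from typing import Dict, List


def build_lookup(registry: Dict[str, List[str]]) -> Dict[str, str]:
    """Map every main label and alias (lowercased) to its main label."""
    lookup: Dict[str, str] = {}
    for main, aliases in registry.items():
        lookup[main.lower()] = main
        for alias in aliases:
            lookup[alias.lower()] = main
    return lookup


def main_label(name: str, lookup: Dict[str, str]) -> str:
    """Resolve a received label to the main label.

    Raises KeyError for names that are not in the registry.
    """
    return lookup[name.lower()]


if __name__ == "__main__":
    lookup = build_lookup({"GBK": ["windows-936"]})
    # A receiver accepts the alias; a sender would emit the main label.
    print(main_label("Windows-936", lookup))
```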



#-#-# Martin J. Dürst, Assoc. Professor, Aoyama Gakuin University
#-#-# http://www.sw.it.aoyama.ac.jp mailto:***@it.aoyama.ac.jp