Discussion:
Update of charset windows-1252, draft 2
Erik van der Poel
2006-10-24 16:26:52 UTC
Based on all the feedback on the windows-1252 update, here is draft no. 2.

Erik

---------------------------

Charset name: windows-1252

Charset aliases: (None)

Suitability for use in MIME text:

Yes, windows-1252 is suitable for use with subtypes of the "text"
Content-Type. Note that windows-1252 is an 8-bit charset. Care should
be taken to choose the appropriate Content-Transfer-Encoding.

Published specification(s):

1) Dr. International "Developing International Software, Second Edition",
Microsoft Press, ISBN 0-7356-1583-7, 2003, p. 743-747

2) http://www.microsoft.com/globaldev/reference/sbcs/1252.htm

ISO 10646 equivalency table:

http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1252.TXT

Additional information:

This is an update of an existing registration of this charset. This
charset name is in use.

Older versions of this charset have been registered as
ISO-8859-1-Windows-3.0-Latin-1 and ISO-8859-1-Windows-3.1-Latin-1.

Another name that is sometimes used for this charset is cp1252.

The graphic (non-control) characters of Windows-1252 are a superset of
the graphic characters of the ISO-8859-1 charset. See the range 80 to
9F (hex).

Microsoft has also published the table used with the "best fit"
feature in their APIs:

http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit1252.txt
http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/readme.txt

Person & email address to contact for further information:

Mike Ksar
Email: ***@microsoft.com

Microsoft Corporation
One Microsoft Way,
Redmond, WA 98052
U.S.A.

Intended usage: COMMON
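
To illustrate the superset statement in the draft above, here is a minimal Python sketch. It assumes that Python's built-in `cp1252` and `latin-1` codecs are reasonable stand-ins for windows-1252 and ISO-8859-1; the codec tables may differ in detail from the CP1252.TXT and bestfit1252.txt files cited in the draft, for example in how unassigned bytes are handled.

```python
# Compare Python's built-in cp1252 and latin-1 codecs byte by byte.
# Assumption: these codecs approximate windows-1252 and ISO-8859-1.

def decode_or_none(byte_value, codec):
    """Decode a single byte, returning None if the codec leaves it unassigned."""
    try:
        return bytes([byte_value]).decode(codec)
    except UnicodeDecodeError:
        return None

# Bytes where the two codecs disagree.
differing = [b for b in range(0x100)
             if decode_or_none(b, "cp1252") != decode_or_none(b, "latin-1")]

# Bytes in 0x80-0x9F that cp1252 maps to a character at all.
assigned_c1 = [b for b in range(0x80, 0xA0)
               if decode_or_none(b, "cp1252") is not None]

print("first differing byte: 0x%02X" % min(differing))
print("last differing byte:  0x%02X" % max(differing))
print("0x80 decodes to U+%04X" % ord(decode_or_none(0x80, "cp1252")))
print("assigned in 0x80-0x9F:", len(assigned_c1))
```

With these codecs, every difference falls inside the range 80 to 9F (hex), which is the range the draft points to.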
Martin Duerst
2006-10-26 05:53:43 UTC
This looks good to me, except for two points:

- I'd like to get a word from Mike to make sure that he/Microsoft
is fine with this.
- There is a confusion about aliases. It says "aliases: None",
but then later says "Another name that is sometimes used for
this charset is cp1252.". I think it is important to make it
clear that this is not an alias.

Regards, Martin.




#-#-# Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-# http://www.sw.it.aoyama.ac.jp mailto:***@it.aoyama.ac.jp
Erik van der Poel
2006-10-28 02:09:33 UTC
As far as I can tell, Microsoft Internet Explorer does not support the
name cp1252. I think we should either remove it from the registration,
or change it to say something like "The cp1252 name is sometimes used
in other contexts for this charset, but it is not an alias for general
Internet use."

Erik

Keld Jørn Simonsen
2006-10-28 06:32:33 UTC
On Fri, Oct 27, 2006 at 07:09:33PM -0700, Erik van der Poel wrote:
> As far as I can tell, Microsoft Internet Explorer does not support the
> name cp1252. I think we should either remove it from the registration,
> or change it to say something like "The cp1252 name is sometimes used
> in other contexts for this charset, but it is not an alias for general
> Internet use."

I would recommend that the clause on cp1252 be removed, as cp1252 is
already a registered name in another registration, namely that from RFC
1345.

Best regards
keld

Martin Duerst
2006-10-28 06:59:54 UTC
Hello Keld,

I haven't found the string "1252" in RFC 1345
(e.g. at http://www.ietf.org/rfc/rfc1345.txt), nor in the IANA
registration page at http://www.iana.org/assignments/character-sets.
Can you please tell us what you are talking about?

Regards, Martin.

Keld Jørn Simonsen
2006-10-30 12:50:05 UTC
On Sat, Oct 28, 2006 at 03:59:54PM +0900, Martin Duerst wrote:
> Hello Keld,
>
> I haven't found the string "1252" in RFC 1345
> (e.g. at http://www.ietf.org/rfc/rfc1345.txt), nor in the IANA
> registration page at http://www.iana.org/assignments/character-sets.
> Can you please tell us what you are talking about?

I was wrong. It is not in RFC 1345, and not in the IANA registry.
I only have it in my updated tables, which are, however,
used in the Unix recode program. So the cp1252 name is free for
registration. Personally, as this name is not used in Microsoft Windows
software, and is used for something different but similar on Unix-like
systems, I would prefer that it not be mentioned with this registration.

Best regards
Keld

Kent Karlsson
2006-10-30 23:09:21 UTC
Keld Jørn Simonsen wrote:
> > I haven't found the string "1252" in RFC 1345
> > (e.g. at http://www.ietf.org/rfc/rfc1345.txt), nor in the IANA
> > registration page at http://www.iana.org/assignments/character-sets.
> > Can you please tell us what you are talking about?
>
> I was wrong. It is not in rfc 1345, and not in the IANA registry.
> I only have it in my updated tables, which are, however,
> used in the Unix recode program. So the cp1252 name is free for
> registration. Personally, as this name is not used in Microsoft windows
> software, and used for something different, but similar in Unix-like

That, I think, is REALLY bad. If "cp1252" is used at all as a
character encoding designation, it should be equivalent to
using "windows-1252", whatever the system. They are indeed
equivalent for ICU converters (fallbacks and all, though
downgrading fallbacks should still not be a part of an IANA
charset registration), along with some other IBM-specific
aliases (basically ibm-5348).

Letting those two names stand for different charsets
*increases* the surprise factor, which I instead want
to *decrease*.

/kent k

> systems, I would prefer that it not be mentioned with this
> registration.
Shawn Steele
2006-10-30 23:52:28 UTC
> > used in the Unix recode program. So the cp1252 name is free for
> > registration. Personally, as this name is not used in Microsoft windows
> > software, and used for something different, but similar in Unix-like

> That, I think is REALLY bad. If "cp1252" is used at all as a
> character encoding designation, it should be equivalent to
> using "windows-1252", whatever the system.

For other code pages, cp1256 for example, it is possible for Encoding.GetEncoding("cp1256") to succeed.

AFAIK we don't use cp1252 as an alias for windows-1252, but it does seem to happen for other code pages. I've also seen the term cp1252 in documentation and in non-Microsoft uses referring to the Windows code page, so it seems safest to mention this alias.

That said, Windows APIs and IE are not going to recognize the cp1252 term even if it is registered, but mentioning it might make interoperability with Unix or other platforms or tools less confusing.

- Shawn

Shawn Steele
Windows International
Microsoft
Ned Freed
2006-10-30 23:45:26 UTC
> Keld Jørn Simonsen
> > > I haven't found the string "1252" in RFC 1345
> > > (e.g. at http://www.ietf.org/rfc/rfc1345.txt), nor in the IANA
> > > registration page at http://www.iana.org/assignments/character-sets.
> > > Can you please tell us what you are talking about?
> >
> > I was wrong. It is not in rfc 1345, and not in the IANA registry.
> > I only have it in my updated tables, which are, however,
> > used in the Unix recode program. So the cp1252 name is free for
> > registration. Personally, as this name is not used in Microsoft windows
> > software, and used for something different, but similar in Unix-like

> That, I think is REALLY bad. If "cp1252" is used at all as a
> character encoding designation, it should be equivalent to
> using "windows-1252", whatever the system. They are indeed
> equivalent for ICU converters (fallbacks and all, though
> downgrading fallbacks should still not be a part of an IANA
> charset registration), along with some other IBM specific
> aliases (basically ibm-5348).

> Letting those two names stand for different charsets
> *increases* the surprise factor, which I instead want
> to *decrease*.

I agree. And since this issue exists, wouldn't it make sense to mention
it in the registration?

Ned
Martin Duerst
2006-10-31 02:25:27 UTC
At 08:09 06/10/31, Kent Karlsson wrote:
>
>Keld Jørn Simonsen
>
>> I was wrong. It is not in rfc 1345, and not in the IANA registry.
>> I only have it in my updated tables, which are, however,
>> used in the Unix recode program. So the cp1252 name is free for
>> registration. Personally, as this name is not used in Microsoft windows
>> software, and used for something different, but similar in Unix-like
>
>That, I think is REALLY bad. If "cp1252" is used at all as a
>character encoding designation, it should be equivalent to
>using "windows-1252", whatever the system.

I have checked iconv on Cygwin. It lists CP1252 as being
equivalent to WINDOWS-1252 and MS-ANSI. The iconv on Fedora
Linux that I tested unfortunately does not say which encoding
labels are equivalent.

Regards, Martin.
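
The same kind of check can be made against other implementations. As a rough sketch, the following asks Python's codec registry which codec each label resolves to. This shows only how one implementation treats these labels and says nothing about what the IANA registry should list; the set of labels is simply those mentioned in this thread.

```python
import codecs

# Labels mentioned in this thread; whether each one resolves depends on the
# implementation, which is the interoperability concern being discussed.
labels = ["windows-1252", "cp1252", "CP1252", "MS-ANSI"]

def canonical_name(label):
    """Return the codec name Python resolves a label to, or None if unknown."""
    try:
        return codecs.lookup(label).name
    except LookupError:
        return None

resolved = {label: canonical_name(label) for label in labels}
for label, name in resolved.items():
    print(f"{label!r} -> {name!r}")
```

Labels that resolve to the same name are treated as equivalent by that implementation; a label that resolves to None is simply not recognized there.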

Martin Duerst
2006-10-28 07:01:08 UTC
Irrespective of Keld's issue, I'd personally prefer your
first proposal (remove cp1252). Regards, Martin.

Erik van der Poel
2006-10-28 15:22:11 UTC
Thanks, I propose to remove the sentence about cp1252 after waiting a
few more days.

I'm also thinking: "the appropriate Content-Transfer-Encoding" -> "an
appropriate Content-Transfer-Encoding".

And "their APIs" sounds funny when the name at the bottom is Mike Ksar
from Microsoft. Maybe I'll change it to 'The table used with
Microsoft's "best fit" APIs has been published'

Erik
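
As background for the Content-Transfer-Encoding wording, here is a small sketch of why an 8-bit charset needs one on a 7-bit transport. It uses Python's standard `quopri` and `base64` modules and Python's `cp1252` codec as a stand-in for windows-1252; the sample text is made up for illustration.

```python
import base64
import quopri

# Hypothetical sample text; the euro sign is byte 0x80 in Python's cp1252 codec.
text = "Price: 5 \u20ac"
raw = text.encode("cp1252")

# The encoded bytes are not all 7-bit, so a 7-bit transport needs a
# Content-Transfer-Encoding such as quoted-printable or base64.
has_8bit = any(b > 0x7F for b in raw)

qp = quopri.encodestring(raw)
b64 = base64.b64encode(raw)

print("8-bit bytes present:", has_8bit)
print("quoted-printable:", qp.decode("ascii"))
print("base64:", b64.decode("ascii"))

# Both encodings are reversible.
assert quopri.decodestring(qp) == raw
assert base64.b64decode(b64) == raw
```

Either encoding works here, which is consistent with saying "an appropriate" rather than "the appropriate" Content-Transfer-Encoding.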

Kent Karlsson
2006-10-28 19:29:48 UTC
> Thanks, I propose to remove the sentence about cp1252
> after waiting a few more days.

I see no problem in having cp1252 as an alias, esp. since
there is already a cp936 alias for a related (same originator
for same set of systems) encoding.

Why aren't all of the legacy MS defined encodings dealt
with in a single batch?

I really would like to see them being treated very similarly:
* similar preferred names/aliases
* similar set of other aliases
* similar mapping references
etc.
Anything else would be needlessly arbitrary and surprising.

> And "their APIs" sounds funny when the name at the bottom
> is Mike Ksar from Microsoft. Maybe I'll change it to 'The table
> used with Microsoft's "best fit" APIs has been published'

But I see no reason why that "best fit" file, or mapping APIs,
would need any mention at all in a registration such as this.

/kent k
Erik van der Poel
2006-10-28 20:30:51 UTC
> I see no problem in having cp1252 as an alias, esp. since
> there is already a cp936 alias for a related (same originator
> for same set of systems) encoding.

It may be too late to register cp1252 as a formal alias for
windows-1252. If I am not mistaken, MSIE does not support the cp1252
name, and MSIE is used quite widely.

> Why aren't all of the legacy MS defined encodings dealt
> with in a single batch?

Because I don't want to edit so many files every time we come up with
a single edit. That's why I'm starting with a single windows-*
charset: to figure out the final pattern for one of them, and then
apply that pattern to all of the others in one fell swoop (with minor
changes, if necessary).

> I really would like to see them being treated very similarly:
> * similar preferred names/aliases
> * similar set of other aliases
> * similar mapping references
> etc.
> Anything else would be needlessly arbitrary and surprising.

Unfortunately, we are not at the very beginning of the deployment of
implementations of these charset names. When making changes to network
protocols, you must take existing deployments into account, try not to
"break" anyone, and follow migration plans if necessary.

By the way, the windows-1255 charset has changed recently. See the
mapping for 0xCA in:

http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1255.TXT
http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit1255.txt

So we may want to update the out-of-date one (CP1255.TXT).

Also, windows-936 has already been registered, as an alias for gbk. So
we have to discuss whether we will add Mike Ksar's name to the
existing registration.

http://www.iana.org/assignments/charset-reg/GBK

> But I see no reason why that "best fit" file, or mapping APIs,
> would need any mention at all in a registration such as this.

It is merely being provided as "additional information". However, I
personally don't mind removing it if there is consensus to do so.

Erik
Kent Karlsson
2006-10-28 23:12:34 UTC
> By the way, the windows-1255 charset has changed recently. See the
> mapping for 0xCA in:
>
> http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1255.TXT
> http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit1255.txt

HEBREW POINT HOLAM HASER FOR VAV is new in Unicode 5.0.

> So we may want to update the out-of-date one (CP1255.TXT).

I think that is for Microsoft to do. But it does make their stated
policy of not updating any of their codepages less credible.

While doing that, I would suggest they fix the character name
comments (both in the cp* files and in the bestfit* files) to
align with Unicode 5.0. It is so much less confusing that way.

/kent k
Shawn Steele
2006-10-30 19:09:32 UTC
>> By the way, the windows-1255 charset has changed recently. See the
>> mapping for 0xCA in:
>> http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1255.TXT
>> http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit1255.txt

Actually, the behavior of the Windows code page has not changed; the previous mapping for 0xCA is identical to the current mapping. I'm guessing that since it wasn't a real code point it was filtered out by whoever created CP1255.txt (I wasn't here, so I'm not sure how it came to be :)).

>> So we may want to update the out-of-date one (CP1255.TXT)

> I think that is for Microsoft to do. But it does make their stated
> policy of not updating any of their codepages less credible.

I'm not about to touch that data file. I'm not sure how it was created, but evidently best-fit mappings and unassigned Unicode code points were filtered out. (A few code pages also map to the PUA, but those mappings aren't in the older Unicode tables.)

> While doing that, I would suggest they fix the character name
> comments (both in the cp* files and in the bestfit* files) to
> align with Unicode 5.0. It is so much less confusing that way.

These are effectively our raw source files, and were provided without any manipulation in order to avoid the risk of introducing a technical error. It'd be nice if the comments were pretty, but as it is we can easily prove that they are the same as the Windows tables.

- Shawn

Ned Freed
2006-10-28 18:31:58 UTC
> Irrespective of Keld's issue, I'd personally prefer your
> first proposal (remove cp1252). Regards, Martin.

My preference is the opposite. I have no problem with not listing
cp1252 as an alias, but given that this charset is sometimes labelled
this way the label needs to be mentioned in the registration.

Registrations aren't just used to determine how to label something, they
are also used to determine what various labels mean.

Ned
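
The distinction between a label that is recognized on input and a label that should be used on output can be sketched as follows. This is only an illustration of the idea: the table, the `emit` flag, and both function names are hypothetical and do not come from any registry or library.

```python
# Hypothetical label table: every entry is illustrative, not registry data.
# "emit" marks labels that may be used when labelling outgoing content;
# labels with emit=False are only recognized when interpreting input.
KNOWN_LABELS = {
    "windows-1252": {"charset": "windows-1252", "emit": True},
    "cp1252": {"charset": "windows-1252", "emit": False},
}

def interpret(label):
    """Map a label seen on input to the charset it denotes, if known."""
    entry = KNOWN_LABELS.get(label.lower())
    return entry["charset"] if entry else None

def labels_for_output(charset):
    """Labels that may be used when labelling outgoing content."""
    return sorted(label for label, entry in KNOWN_LABELS.items()
                  if entry["charset"] == charset and entry["emit"])

print(interpret("CP1252"))
print(labels_for_output("windows-1252"))
```

In this sketch, mentioning cp1252 in a registration corresponds to an entry that helps readers interpret the label without making it a name to generate.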
Martin Duerst
2006-10-29 02:08:37 UTC
At 03:31 06/10/29, Ned Freed wrote:
>> Irrespective of Keld's issue, I'd personally prefer your
>> first proposal (remove cp1252). Regards, Martin.
>
>My preference is the opposite. I have no problem with not listing
>cp1252 as an alias, but given that this charset is sometimes labelled
>this way the label needs to be mentioned in the registration.
>
>Registrations aren't just used to determine how to label something, they
>are also used to determine what various labels mean.

I'm okay with mentioning this if it is done in a way that makes
it absolutely crystal clear that it is not an alias. So how about,
e.g. (in the Additional Information section): "This charset
is also known as Microsoft Code Page 1252 (or cp1252 for short;
this is NOT an alias)."

We do not want to have more aliases than necessary (ideally zero).

Regards, Martin.




Frank Ellermann
2006-10-29 19:58:15 UTC
Martin Duerst wrote:

> Ned Freed wrote:

>> I have no problem with not listing cp1252 as an alias, but
>> given that this charset is sometimes labelled this way the
>> label needs to be mentioned in the registration.

[...]
> I'm okay with mentioning this if it is done in a way that
> makes absolutely crystal clear that it is not an alias.

In summary, we don't want folks to start using charset=cp-1252
in mail, on Web pages, or similar, but we also want it to be clear
that "cp1252" is the same thing, e.g. the "old" Unicode page:

http://unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1252.TXT

For some value of "we", where http://purl.net/net/cp/1252 is
a special case with historical info about various "cp1252".

On my box I'd "chcp 1004" to get 1252, but we should ignore
that as irrelevant, I'd also "chcp 850" to get back to 858 ;-)

Frank