Discussion:
ietf-charsets@mail.apps.ietf.org windows 1250 - another update to review :)
Shawn Steele
2007-06-12 19:36:12 UTC
Please review updates to windows 1250. I used the feedback we had last year for 1252 to guide this request.

Thanks,
Shawn

-------------------------------------------------------------------------
Charset name: windows-1250

Charset aliases: (None)

Suitability for use in MIME text:

Yes, windows-1250 is suitable for use with subtypes of the "text"
Content-Type. Note that windows-1250 is an 8-bit charset. Care should
be taken to choose an appropriate Content-Transfer-Encoding.
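As a rough illustration of that choice, here is a minimal Python sketch. The function name `choose_cte` and the one-third threshold are illustrative assumptions, not part of the registration. It only looks at the proportion of 8-bit octets and does not check the other 7bit constraints, such as line length.

```python
import base64
import quopri


def choose_cte(text: str, charset: str = "windows-1250") -> tuple[str, bytes]:
    """Encode `text` in an 8-bit charset and pick a Content-Transfer-Encoding.

    Hypothetical heuristic: 7bit for pure ASCII, quoted-printable when fewer
    than a third of the octets are 8-bit, base64 otherwise.
    """
    raw = text.encode(charset)  # raises UnicodeEncodeError if unmappable
    high = sum(1 for octet in raw if octet >= 0x80)
    if high == 0:
        return "7bit", raw  # pure ASCII needs no transfer encoding
    if high * 3 < len(raw):
        # Mostly ASCII: quoted-printable keeps the body readable.
        return "quoted-printable", quopri.encodestring(raw)
    # Mostly 8-bit octets: base64 is more compact.
    return "base64", base64.encodebytes(raw)


cte, body = choose_cte("Dobrý den")
```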

Published specification(s):

1) http://www.microsoft.com/globaldev/reference/sbcs/1250.htm

ISO 10646 equivalency table:

http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1250.TXT
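A minimal sketch of reading a table in that format, assuming the tab-separated "byte, code point, #comment" layout used under MAPPINGS/VENDORS, with unassigned bytes marked #UNDEFINED and a blank code point column. The sample lines are typed in for illustration, and `parse_mapping` is a hypothetical helper. The cross-check uses CPython's bundled cp1250 codec, which CPython generates from this Unicode table.

```python
def parse_mapping(lines):
    """Build {octet: Unicode code point} from MAPPINGS-style lines.

    Assumed layout: '0xBB<TAB>0xUUUU<TAB>#NAME'. Comment lines and octets
    whose code point column is blank (#UNDEFINED) are skipped.
    """
    table = {}
    for line in lines:
        fields = line.split("#", 1)[0].split()  # drop the trailing comment
        if len(fields) != 2:
            continue  # comment-only line, blank line, or unassigned octet
        octet, code_point = (int(field, 16) for field in fields)
        table[octet] = code_point
    return table


# A few lines in the file's layout, typed in for illustration.
SAMPLE = [
    "# header comment (illustrative)",
    "0x80\t0x20AC\t#EURO SIGN",
    "0x81\t      \t#UNDEFINED",
    "0xA1\t0x02C7\t#CARON",
]

table = parse_mapping(SAMPLE)

# Cross-check against CPython's bundled cp1250 codec.
for octet, code_point in table.items():
    assert bytes([octet]).decode("cp1250") == chr(code_point)
```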

Additional information:

UTF-8 is preferred to windows-1250 when permissible.
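Where UTF-8 is permissible, converting an existing windows-1250 body is a decode followed by an encode. A minimal Python sketch; `cp1250_to_utf8` is a hypothetical helper, and "windows-1250" here is Python's codec name, which happens to match the MIME label.

```python
def cp1250_to_utf8(raw: bytes) -> bytes:
    """Re-encode a windows-1250 body as UTF-8.

    With errors="strict", the five octets that windows-1250 leaves unassigned
    (0x81, 0x83, 0x88, 0x90, 0x98) raise UnicodeDecodeError instead of being
    silently replaced.
    """
    text = raw.decode("windows-1250", errors="strict")
    return text.encode("utf-8")


utf8_body = cp1250_to_utf8(b"Dobr\xfd den")  # "Dobrý den"
```

Because the octets change, the charset parameter of the message must then be set to utf-8 as well.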

Although not authoritative, the following references may also be of
interest:

Printed mapping table:
Dr. International "Developing International Software, Second Edition",
Microsoft Press, ISBN 0-7356-1583-7, 2003, pp. 729-737

Microsoft Windows extended "best fit" behavior:
http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit1250.txt

This is an update of an existing registration of this charset. This
charset name is in use.

This charset is also known as Windows Code Page 1250 or cp1250 for
short; these are NOT aliases.

The graphic (non-control) characters of Windows-1250 are a superset of
the graphic characters of the ISO-8859-2 charset. See the range 80 to
9F (hex).
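To see what the range 80 to 9F (hex) holds in each charset, a minimal Python sketch, assuming CPython's bundled `cp1250` and `iso8859-2` codecs match the published tables. `graphic_80_9f` is a hypothetical helper. On those codecs, cp1250 yields 27 graphic characters in this range (five octets are unassigned), while iso8859-2 yields none because it keeps the C1 controls there.

```python
import unicodedata


def graphic_80_9f(charset: str) -> dict[int, str]:
    """Map octets 0x80-0x9F to the characters they decode to in `charset`,
    skipping unassigned octets and C1 control characters."""
    result = {}
    for octet in range(0x80, 0xA0):
        try:
            ch = bytes([octet]).decode(charset)
        except UnicodeDecodeError:
            continue  # octet is unassigned in this charset
        if unicodedata.category(ch) != "Cc":  # keep graphic characters only
            result[octet] = ch
    return result


for octet, ch in graphic_80_9f("cp1250").items():
    print(f"{octet:02X}  U+{ord(ch):04X}  {unicodedata.name(ch)}")
```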

Person & email address to contact for further information:

Shawn Steele
Email: ***@microsoft.com

Microsoft Corporation
One Microsoft Way,
Redmond, WA 98052
U.S.A.

Intended usage: COMMON
Erik van der Poel
2007-06-13 01:53:01 UTC
Are the graphic character mappings of windows-1250 really a superset
of iso-8859-2? Do the bytes in the range 0xA0 to 0xFF map to the same
Unicodes?

I already knew of the windows-1252 and -1254 supersets, but not -1250.
Maybe the differences are "minor"?

Erik
Erik van der Poel
2007-06-13 16:34:52 UTC
Perhaps this is a misunderstanding due to the word "superset" (my
mistake). What I meant was an upward compatible character encoding, if
you disregard the control characters. Clearly, iso-8859-2 and
windows-1250 are different in the range 0xA0 to 0xFF:

--- icu-iso-8859-2 2006-10-06 22:01:39.000000000 -0700
+++ cp1250 2007-06-05 15:47:24.000000000 -0700
@@ -126,69 +126,69 @@
A0 0000A0
-A1 000104
+A1 0002C7
A2 0002D8
A3 000141
A4 0000A4
-A5 00013D
-A6 00015A
+A5 000104
+A6 0000A6
A7 0000A7
A8 0000A8
-A9 000160
+A9 0000A9
AA 00015E
-AB 000164
-AC 000179
+AB 0000AB
+AC 0000AC
AD 0000AD
-AE 00017D
+AE 0000AE
AF 00017B
B0 0000B0
-B1 000105
+B1 0000B1
B2 0002DB
B3 000142
B4 0000B4
-B5 00013E
-B6 00015B
-B7 0002C7
+B5 0000B5
+B6 0000B6
+B7 0000B7
B8 0000B8
-B9 000161
+B9 000105
BA 00015F
-BB 000165
-BC 00017A
+BB 0000BB
+BC 00013D
BD 0002DD
-BE 00017E
+BE 00013E
BF 00017C
C0 000154
C1 0000C1

So we should not only check all of your updates, but also choose a
better word than "superset".
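The diff can be regenerated without ICU. A minimal Python sketch, assuming CPython's `iso8859-2` and `cp1250` codec tables agree with the tables that produced the diff above; `differing_octets` is a hypothetical helper, and the output mimics the diff's hex layout.

```python
def differing_octets(a: str, b: str, start: int = 0xA0, stop: int = 0x100):
    """Yield (octet, code point in a, code point in b) for every octet in
    [start, stop) that the two single-byte charsets decode differently."""
    for octet in range(start, stop):
        # errors="replace" turns an unassigned octet into U+FFFD instead of raising
        ca = bytes([octet]).decode(a, errors="replace")
        cb = bytes([octet]).decode(b, errors="replace")
        if ca != cb:
            yield octet, ord(ca), ord(cb)


for octet, cp_a, cp_b in differing_octets("iso8859-2", "cp1250"):
    print(f"{octet:02X} {cp_a:06X} -> {cp_b:06X}")
```

This reports the same 15 octets, A1 through BE, that the diff shows, and nothing from C0 to FF.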

Erik
Shawn Steele
2007-06-13 18:29:38 UTC
I was apparently blindly relying on erroneous data, sorry.

How's this correction?
The graphic (non-control) characters of Windows-1250 have a supported
set of characters similar to the ISO-8859-2 charset. There are
differences in the range from 80 to FF (hex).
I'll double check the others as well.

Thanks,
Shawn


Ned Freed
2007-06-13 20:38:00 UTC
Post by Erik van der Poel
Perhaps this is a misunderstanding due to the word "superset" (my
mistake). What I meant was an upward compatible character encoding, if
you disregard the control characters. Clearly, iso-8859-2 and
Well, I guess it's a question of whether you're talking about a superset of the
underlying set of characters (CCS) or a superset of the mapping of the
characters to integers (CES). So perhaps it is best to simply avoid the word
superset entirely and say "all of the characters in iso-8859-2 are present in
windows-1250 but some are in different positions" or something along these
lines.
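The distinction between the repertoire and the positions can be checked mechanically. A minimal Python sketch, assuming CPython's `iso8859-2` and `cp1250` codecs; `relocated` is a hypothetical helper. It looks up every graphic character of iso-8859-2 (octets A0 to FF) in windows-1250 and reports the ones that sit at a different octet.

```python
def relocated(src: str = "iso8859-2", dst: str = "cp1250") -> dict[int, int]:
    """Map each octet 0xA0-0xFF of `src` to the octet holding the same
    character in `dst`, keeping only the octets that move.

    ch.encode(dst) raises UnicodeEncodeError if a graphic character of `src`
    is missing from `dst`, so a normal return means the repertoire is contained.
    """
    moved = {}
    for octet in range(0xA0, 0x100):
        ch = bytes([octet]).decode(src)
        new_octet = ch.encode(dst)[0]  # single-byte charset: exactly one octet
        if new_octet != octet:
            moved[octet] = new_octet
    return moved


for old, new in sorted(relocated().items()):
    print(f"{old:02X} -> {new:02X}")
```

With those codecs the call returns normally: 15 characters change position, 10 of them into the 80 to 9F range.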

Ned
Shawn Steele
2007-06-13 20:55:49 UTC
I assume these comments are all just out-of-sync with my previous suggestion?
Post by Shawn Steele
How's this correction?
The graphic (non-control) characters of Windows-1250 have a supported
set of characters similar to the ISO-8859-2 charset. There are
differences in the range from 80 to FF (hex).
- Shawn

Ned Freed
2007-06-13 20:57:01 UTC
Post by Shawn Steele
I assume these comments are all just out-of-sync with my previous suggestion?
Post by Shawn Steele
How's this correction?
The graphic (non-control) characters of Windows-1250 have a supported
set of characters similar to the ISO-8859-2 charset. There are
differences in the range from 80 to FF (hex).
I think it is useful to say that every graphic character in iso-8859-2
is also in windows-1250. That's why I suggested the wording I did.

Ned
Ned Freed
2007-06-13 18:00:24 UTC
Post by Erik van der Poel
Are the graphic character mappings of windows-1250 really a superset
of iso-8859-2?
Yes, I believe it is. Basically what windows-1250 does is reassign a bunch of
the characters in the control range 80-9F to various punctuation and accented
letters. It ends up having a superset of what iso-8859-2 has.
Post by Erik van der Poel
Do the bytes in the range 0xA0 to 0xFF map to the same
Unicodes?
No they don't. Several of the characters iso-8859-2 has in this
range are in the 80-9F range in windows-1250.
Post by Erik van der Poel
I already knew of the windows-1252 and -1254 supersets, but not -1250.
Maybe the differences are "minor"?
Things are shuffled around quite a bit and there are about 20 additional
graphical characters in windows-1250. Not sure you can call that "minor".
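To put a number on the additional characters, a minimal Python sketch that compares the full graphic repertoires, assuming CPython's bundled codecs and treating every non-control (non-Cc) character as graphic; `graphic_repertoire` is a hypothetical helper. With those tables it counts 27 extra characters in windows-1250: 17 in 80 to 9F and 10 in A0 to BF. The exact figure depends on what one counts as graphic.

```python
import unicodedata


def graphic_repertoire(charset: str) -> set[str]:
    """All non-control characters a single-byte charset can represent."""
    chars = set()
    for octet in range(0x100):
        try:
            ch = bytes([octet]).decode(charset)
        except UnicodeDecodeError:
            continue  # unassigned octet
        if unicodedata.category(ch) != "Cc":  # skip C0/C1 controls and DEL
            chars.add(ch)
    return chars


cp1250 = graphic_repertoire("cp1250")
iso = graphic_repertoire("iso8859-2")
extra = cp1250 - iso
print(iso <= cp1250, len(extra))
for ch in sorted(extra):
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")
```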

Ned