Discussion:
(resend) Request for registration of character set TSCII for TAMIL language
Ned Freed
2007-02-25 18:45:17 UTC
This is the original message on this thread; it was sent to the wrong address
so I am reposting it to the list.

Ned
K Kalyanasundaram
2007-02-26 14:42:11 UTC
Dear Ned and Martin:

As per your suggestion, I am posting below a plain ASCII version
of the TSCII charset registration request. In preparing this document
we have taken your comments into account. Please let us know if there
are still points to be sorted out.

K. Kalyanasundaram
----

Request for the Registration of the Charset TSCII

To:
ietf-charsets@iana.org

Subject:
Registration of new charset

Character set name:
TSCII
(TAMIL SCRIPT CODE FOR INFORMATION INTERCHANGE)

Character set aliases:
None

Suitability for use in MIME text:
YES
usable as 8bit or with base64 or quoted-printable encoding
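As a rough illustration of the "usable as 8bit or with base64 or quoted-printable" line, the Python sketch below wraps a few raw TSCII bytes in a minimal text/plain part labelled charset="TSCII". It assumes the label "TSCII" exactly as requested in this registration; the sample bytes and the helper name mime_text_part are hypothetical. Python's standard library has no TSCII codec, so the body is never decoded, only transfer-encoded.

```python
import base64
import quopri

# Hypothetical sample body: ASCII text followed by two TSCII bytes.
# 0xB8 = TAMIL LETTER KAKARA AKARAM (ka), 0xA1 = TAMIL VOWEL SIGN KAAL (aa).
# Python ships no TSCII codec, so the body stays as raw bytes throughout.
TSCII_BODY = b"Tamil: " + bytes([0xB8, 0xA1])


def mime_text_part(body: bytes, transfer_encoding: str) -> str:
    """Return a minimal text/plain MIME part labelled charset=TSCII."""
    if transfer_encoding == "base64":
        encoded = base64.b64encode(body).decode("ascii")
    elif transfer_encoding == "quoted-printable":
        # Bytes >= 0x80 become =XX escapes; the ASCII part stays readable.
        encoded = quopri.encodestring(body).decode("ascii")
    else:
        raise ValueError("unsupported transfer encoding: " + transfer_encoding)
    return (
        'Content-Type: text/plain; charset="TSCII"\r\n'
        "Content-Transfer-Encoding: " + transfer_encoding + "\r\n"
        "\r\n" + encoded
    )


print(mime_text_part(TSCII_BODY, "quoted-printable"))
print(mime_text_part(TSCII_BODY, "base64"))
```

With 8bit transfer the bytes would be sent unchanged. Quoted-printable keeps the Roman (ASCII) half legible, which suits mostly-English bilingual text, while base64 is usually more compact for text that is mostly Tamil.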

Published Specifications:
http://www.tscii.org/tsciispec.html

ISO 10646 Equivalency Table:
Available as a technical note at the Unicode Consortium website
http://www.unicode.org/notes/tn15/

As a glyph-based encoding, the TSCII code chart includes vowels,
consonants and abugida (compound vowel-consonant) characters. Unicode,
as a character encoding, encodes vowels, consonants and dependent
signs separately rather than the compound syllables. Hence not all
TSCII code points can be converted one-to-one to ISO 10646.
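To make the one-to-many point concrete, here is a minimal decoder over a hand-picked, illustrative subset of code points (the names PARTIAL_TSCII_TO_UNICODE and decode_subset are hypothetical). The Unicode values are standard Tamil block code points, but the authoritative mapping is the Technical Note cited above; this sketch also ignores any glyph reordering a complete converter may need.

```python
# Illustrative subset only; the authoritative table is Unicode Technical Note #15.
PARTIAL_TSCII_TO_UNICODE = {
    0xAB: "\u0B85",          # TAMIL LETTER AKARAM -> TAMIL LETTER A
    0xB8: "\u0B95",          # KAKARA AKARAM (ka)  -> TAMIL LETTER KA
    0xCC: "\u0B95\u0BC1",    # KAKARA UKARAM (ku)  -> KA + VOWEL SIGN U
    0xDC: "\u0B95\u0BC2",    # KAKARA UUKAARAM (kuu) -> KA + VOWEL SIGN UU
    0xEC: "\u0B95\u0BCD",    # KAKARAM (k)         -> KA + VIRAMA (pulli)
}


def decode_subset(data: bytes) -> str:
    """Decode TSCII bytes, limited to ASCII plus the subset above."""
    out = []
    for b in data:
        if b < 0x80:
            out.append(chr(b))  # the 7-bit part is plain ASCII
        elif b in PARTIAL_TSCII_TO_UNICODE:
            out.append(PARTIAL_TSCII_TO_UNICODE[b])
        else:
            raise ValueError(f"byte 0x{b:02X} is not in this illustrative subset")
    return "".join(out)


for byte in (0xB8, 0xCC, 0xEC):
    text = decode_subset(bytes([byte]))
    codes = " ".join(f"U+{ord(c):04X}" for c in text)
    print(f"0x{byte:02X} -> {len(text)} code point(s): {codes}")
```

One TSCII byte may therefore expand to two Unicode code points, and the reverse direction has to recombine such sequences into a single TSCII byte where one exists.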

Intended usage:
COMMON

Additional Information:

Tamil is one of the main Indian languages (Dravidian in origin),
currently spoken by over 70 million people worldwide. TSCII (Tamil
Script Code for Information Interchange) is a bilingual (Roman and
Tamil) 8-bit glyph-based encoding scheme. The TSCII scheme was
worked out collectively through Net-based discussions in 1998. TSCII
is modeled on the ISO-8859-XX family of charsets, with standard plain
ASCII filling the 7-bit part and a set of Tamil character glyphs
filling the upper half (80-FF).

Full technical details on the TSCII charset are available at the
official TSCII website: http://www.tscii.org/tsciispec.html

Person(s) & email address to contact for further information:

TSCII USER GROUP represented by

Kalyanasundaram, Kuppuswamy (Switzerland)
***@yahoo.com
Manivannan, Mani (USA)
mmanivannan@gmail.com
Nedumaran, Muthu (Malaysia)
***@murasu.com

TAMIL SCRIPT CODE FOR INFORMATION INTERCHANGE (TSCII)
Glyph/Character Listing and ISO 10646 Mapping Table

Column #1 is the TSCII code position (in hex); column #2 is the TSCII
character name, followed for most Tamil entries by a short descriptive
name after "=".

The ISO 10646 mapping table can be obtained as a technical note from
the Unicode Consortium website:
http://www.unicode.org/notes/tn15

A Unicode-based PDF file that includes the actual glyph forms of all
characters in the TSCII charset is available at the TSCII website:
http://www.tscii.org/tsciispec.html

HEX Character Name
00 NULL
01 START OF HEADING
02 START OF TEXT
03 END OF TEXT
04 END OF TRANSMISSION
05 ENQUIRY
06 ACKNOWLEDGE
07 BELL
08 BACKSPACE
09 HORIZONTAL TABULATION
0A LINE FEED
0B VERTICAL TABULATION
0C FORM FEED
0D CARRIAGE RETURN
0E SHIFT OUT
0F SHIFT IN
10 DATA LINK ESCAPE
11 DEVICE CONTROL ONE
12 DEVICE CONTROL TWO
13 DEVICE CONTROL THREE
14 DEVICE CONTROL FOUR
15 NEGATIVE ACKNOWLEDGE
16 SYNCHRONOUS IDLE
17 END OF TRANSMISSION BLOCK
18 CANCEL
19 END OF MEDIUM
1A SUBSTITUTE
1B ESCAPE
1C FILE SEPARATOR
1D GROUP SEPARATOR
1E RECORD SEPARATOR
1F UNIT SEPARATOR
20 SPACE
21 EXCLAMATION MARK
22 QUOTATION MARK
23 NUMBER SIGN
24 DOLLAR SIGN
25 PERCENT SIGN
26 AMPERSAND
27 APOSTROPHE
28 LEFT PARENTHESIS
29 RIGHT PARENTHESIS
2A ASTERISK
2B PLUS SIGN
2C COMMA
2D HYPHEN-MINUS
2E FULL STOP
2F SOLIDUS

30 DIGIT ZERO
31 DIGIT ONE
32 DIGIT TWO
33 DIGIT THREE
34 DIGIT FOUR
35 DIGIT FIVE
36 DIGIT SIX
37 DIGIT SEVEN
38 DIGIT EIGHT
39 DIGIT NINE
3A COLON
3B SEMICOLON
3C LESS-THAN SIGN
3D EQUALS SIGN
3E GREATER-THAN SIGN
3F QUESTION MARK
40 COMMERCIAL AT
41 LATIN CAPITAL LETTER A
42 LATIN CAPITAL LETTER B
43 LATIN CAPITAL LETTER C
44 LATIN CAPITAL LETTER D
45 LATIN CAPITAL LETTER E
46 LATIN CAPITAL LETTER F
47 LATIN CAPITAL LETTER G
48 LATIN CAPITAL LETTER H
49 LATIN CAPITAL LETTER I
4A LATIN CAPITAL LETTER J
4B LATIN CAPITAL LETTER K
4C LATIN CAPITAL LETTER L
4D LATIN CAPITAL LETTER M
4E LATIN CAPITAL LETTER N
4F LATIN CAPITAL LETTER O

50 LATIN CAPITAL LETTER P
51 LATIN CAPITAL LETTER Q
52 LATIN CAPITAL LETTER R
53 LATIN CAPITAL LETTER S
54 LATIN CAPITAL LETTER T
55 LATIN CAPITAL LETTER U
56 LATIN CAPITAL LETTER V
57 LATIN CAPITAL LETTER W
58 LATIN CAPITAL LETTER X
59 LATIN CAPITAL LETTER Y
5A LATIN CAPITAL LETTER Z
5B LEFT SQUARE BRACKET
5C REVERSE SOLIDUS
5D RIGHT SQUARE BRACKET
5E CIRCUMFLEX ACCENT
5F LOW LINE

60 GRAVE ACCENT
61 LATIN SMALL LETTER A
62 LATIN SMALL LETTER B
63 LATIN SMALL LETTER C
64 LATIN SMALL LETTER D
65 LATIN SMALL LETTER E
66 LATIN SMALL LETTER F
67 LATIN SMALL LETTER G
68 LATIN SMALL LETTER H
69 LATIN SMALL LETTER I
6A LATIN SMALL LETTER J
6B LATIN SMALL LETTER K
6C LATIN SMALL LETTER L
6D LATIN SMALL LETTER M
6E LATIN SMALL LETTER N
6F LATIN SMALL LETTER O

70 LATIN SMALL LETTER P
71 LATIN SMALL LETTER Q
72 LATIN SMALL LETTER R
73 LATIN SMALL LETTER S
74 LATIN SMALL LETTER T
75 LATIN SMALL LETTER U
76 LATIN SMALL LETTER V
77 LATIN SMALL LETTER W
78 LATIN SMALL LETTER X
79 LATIN SMALL LETTER Y
7A LATIN SMALL LETTER Z
7B LEFT CURLY BRACKET
7C VERTICAL LINE
7D RIGHT CURLY BRACKET
7E TILDE
7F DELETE

80 TAMIL DIGIT CUZHI = Tamil digit zero
81 TAMIL DIGIT ONRRU = Tamil digit one
82 TAMIL GRANTHA LETTER SRI = Tamil letter sri
83 TAMIL GRANTHA LETTER JA = Tamil letter ja
84 TAMIL GRANTHA LETTER SSA = Tamil letter ssa
85 TAMIL GRANTHA LETTER SA = Tamil letter sa
86 TAMIL GRANTHA LETTER HA = Tamil letter ha
87 TAMIL GRANTHA LETTER KSHA = Tamil letter ksha
88 TAMIL GRANTHA LETTER J = Tamil letter j
89 TAMIL GRANTHA LETTER SS = Tamil letter ss
8A TAMIL GRANTHA LETTER S = Tamil letter s
8B TAMIL GRANTHA LETTER H = Tamil letter h
8C TAMIL GRANTHA LETTER KSH = Tamil letter ksh
8D TAMIL DIGIT IRANNNTU = Tamil digit two
8E TAMIL DIGIT MUUNNRRU = Tamil digit three
8F TAMIL DIGIT NAANNKU = Tamil digit four

90 TAMIL DIGIT AINTHU = Tamil digit five
91 LEFT SINGLE QUOTATION MARK
92 RIGHT SINGLE QUOTATION MARK
93 LEFT DOUBLE QUOTATION MARK
94 RIGHT DOUBLE QUOTATION MARK
95 TAMIL DIGIT AARRU = Tamil digit six
96 TAMIL DIGIT EEZHU = Tamil digit seven
97 TAMIL DIGIT ETTU = Tamil digit eight
98 TAMIL DIGIT ONPATHU = Tamil digit nine
99 TAMIL LETTER NGAKARA UKARAM = Tamil letter ngu
9A TAMIL LETTER NJAKARA UKARAM = Tamil letter nju
9B TAMIL LETTER NGAKARA UUKAARAM = Tamil letter nguu
9C TAMIL LETTER NJAKARA UUKAARAM = Tamil letter njuu
9D TAMIL NUMBER PATHTHU = Tamil number ten
9E TAMIL NUMBER NUURRU = Tamil number one hundred
9F TAMIL NUMBER AAYIRAM = Tamil number one thousand

A0 <NOT ASSIGNED>
A1 TAMIL VOWEL SIGN KAAL = Tamil vowel sign aa
A2 TAMIL VOWEL SIGN KOKKI = Tamil vowel sign i
A3 TAMIL VOWEL SIGN CUZHI-K-KOKKI = Tamil vowel sign ii
A4 TAMIL VOWEL SIGN KONNNTAI = Tamil vowel sign u
A5 TAMIL VOWEL SIGN CUZHIK KONNNTAI = Tamil vowel sign uu
A6 TAMIL VOWEL SIGN KOMPU = Tamil vowel sign e
A7 TAMIL VOWEL SIGN IRATTAI-K-KOMPU = Tamil vowel sign ee
A8 TAMIL VOWEL SIGN IRATTAI-C-CUZHI = Tamil vowel sign ai
A9 COPYRIGHT SIGN
AA TAMIL VOWEL SIGN CIRRAKU = Tamil au length mark
AB TAMIL LETTER AKARAM = Tamil letter a
AC TAMIL LETTER AAKAARAM = Tamil letter aa
AD TAMIL VOWEL IKARAM (USAGE IN SLOT DEPRECATED) = Tamil letter i
AE TAMIL LETTER IIKAARAM = Tamil letter ii
AF TAMIL LETTER UKARAM = Tamil letter u

B0 TAMIL LETTER UUKAARAM = Tamil letter uu
B1 TAMIL LETTER EKARAM = Tamil letter e
B2 TAMIL LETTER EEKAARAM = Tamil letter ee
B3 TAMIL LETTER AIKAARAM = Tamil letter ai
B4 TAMIL LETTER OKARAM = Tamil letter o
B5 TAMIL LETTER OOKAARAM = Tamil letter oo
B6 TAMIL LETTER AUKAARAM = Tamil letter au
B7 TAMIL AAYTHAM LETTER AKHENAM or AKHAAN = Tamil letter aaytham
B8 TAMIL LETTER KAKARA AKARAM = Tamil letter ka
B9 TAMIL LETTER NGAKARA AKARAM = Tamil letter nga
BA TAMIL LETTER CAKARA AKARAM = Tamil letter ca
BB TAMIL LETTER NJAKARA AKARAM = Tamil letter nja
BC TAMIL LETTER TAKARA AKARAM = Tamil letter tta
BD TAMIL LETTER NNNAKARA AKARAM = Tamil letter nnna
BE TAMIL LETTER THAKARA AKARAM = Tamil letter ta
BF TAMIL LETTER NAKARA AKARAM = Tamil letter na

C0 TAMIL LETTER PAKARA AKARAM = Tamil letter pa
C1 TAMIL LETTER MAKARA AKARAM = Tamil letter ma
C2 TAMIL LETTER YAKARA AKARAM = Tamil letter ya
C3 TAMIL LETTER RAKARA AKARAM = Tamil letter ra
C4 TAMIL LETTER LAKARA AKARAM = Tamil letter la
C5 TAMIL LETTER VAKARA AKARAM = Tamil letter va
C6 TAMIL LETTER ZHAKARA AKARAM = Tamil letter llla
C7 TAMIL LETTER LLAKARA AKARAM = Tamil letter lla
C8 TAMIL LETTER RRAKARA AKARAM = Tamil letter rra
C9 TAMIL LETTER NNAKARA AKARAM = Tamil letter nna
CA TAMIL LETTER TAKARA IKARAM = Tamil letter tti
CB TAMIL LETTER TAKARA IIKAARAM = Tamil letter ttii
CC TAMIL LETTER KAKARA UKARAM = Tamil letter ku
CD TAMIL LETTER CAKARA UKARAM = Tamil letter cu
CE TAMIL LETTER TAKARA UKARAM = Tamil letter ttu
CF TAMIL LETTER NNNAKARA UKARAM = Tamil letter nnnu

D0 TAMIL LETTER THAKARA UKARAM = Tamil letter tu
D1 TAMIL LETTER NAKARA UKARAM = Tamil letter nu
D2 TAMIL LETTER PAKARA UKARAM = Tamil letter pu
D3 TAMIL LETTER MAKARA UKARAM = Tamil letter mu
D4 TAMIL LETTER YAKARA UKARAM = Tamil letter yu
D5 TAMIL LETTER RAKARA UKARAM = Tamil letter ru
D6 TAMIL LETTER LAKARA UKARAM = Tamil letter lu
D7 TAMIL LETTER VAKARA UKARAM = Tamil letter vu
D8 TAMIL LETTER ZHAKARA UKARAM = Tamil letter lllu
D9 TAMIL LETTER LLAKARA UKARAM = Tamil letter llu
DA TAMIL LETTER RRAKARA UKARAM = Tamil letter rru
DB TAMIL LETTER NNAKARA UKARAM = Tamil letter nnu
DC TAMIL LETTER KAKARA UUKAARAM = Tamil letter kuu
DD TAMIL LETTER CAKARA UUKAARAM = Tamil letter cuu
DE TAMIL LETTER TAKARA UUKAARAM = Tamil letter ttuu
DF TAMIL LETTER NNNAKARA UUKAARAM = Tamil letter nnnuu

E0 TAMIL LETTER THAKARA UUKAARAM = Tamil letter tuu
E1 TAMIL LETTER NAKARA UUKAARAM = Tamil letter nuu
E2 TAMIL LETTER PAKARA UUKAARAM = Tamil letter puu
E3 TAMIL LETTER MAKARA UUKAARAM = Tamil letter muu
E4 TAMIL LETTER YAKARA UUKAARAM = Tamil letter yuu
E5 TAMIL LETTER RAKARA UUKAARAM = Tamil letter ruu
E6 TAMIL LETTER LAKARA UUKAARAM = Tamil letter luu
E7 TAMIL LETTER VAKARA UUKAARAM = Tamil letter vuu
E8 TAMIL LETTER ZHAKARA UUKAARAM = Tamil letter llluu
E9 TAMIL LETTER LLAKARA UUKAARAM = Tamil letter lluu
EA TAMIL LETTER RRAKARA UUKAARAM = Tamil letter rruu
EB TAMIL LETTER NNAKARA UUKAARAM = Tamil letter nnuu
EC TAMIL LETTER KAKARAM = Tamil letter k
ED TAMIL LETTER NGAKARAM = Tamil letter ng
EE TAMIL LETTER CAKARAM = Tamil letter c
EF TAMIL LETTER NJAKARAM = Tamil letter nj

F0 TAMIL LETTER TAKARAM = Tamil letter tt
F1 TAMIL LETTER NNNAKARAM = Tamil letter nnn
F2 TAMIL LETTER THAKARAM = Tamil letter t
F3 TAMIL LETTER NAKARAM = Tamil letter n
F4 TAMIL LETTER PAKARAM = Tamil letter p
F5 TAMIL LETTER MAKARAM = Tamil letter m
F6 TAMIL LETTER YAKARAM = Tamil letter y
F7 TAMIL LETTER RAKARAM = Tamil letter r
F8 TAMIL LETTER LAKARAM = Tamil letter l
F9 TAMIL LETTER VAKARAM = Tamil letter v
FA TAMIL LETTER ZHAKARAM = Tamil letter lll
FB TAMIL LETTER LLAKARAM = Tamil letter ll
FC TAMIL LETTER RRAKARAM = Tamil letter rr
FD TAMIL LETTER NNAKARAM = Tamil letter nn
FE TAMIL LETTER IKARAM = Tamil letter i
FF <NOT ASSIGNED>
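One way a converter might start is by checking input against the listing above. The sketch below reports bytes on the two slots marked <NOT ASSIGNED> and counts how many bytes fall in the ASCII half versus the upper half. The helper names are hypothetical, and unassigned bytes are only a hint that data may not be TSCII, not proof of it.

```python
# Slots marked <NOT ASSIGNED> in the listing above.
UNASSIGNED_SLOTS = frozenset({0xA0, 0xFF})


def find_unassigned(data: bytes) -> list:
    """Return the offsets of bytes that fall on unassigned TSCII slots."""
    return [i for i, b in enumerate(data) if b in UNASSIGNED_SLOTS]


def split_halves(data: bytes) -> tuple:
    """Count bytes in the ASCII half (00-7F) and the upper half (80-FF)."""
    low = sum(1 for b in data if b < 0x80)
    return low, len(data) - low


sample = b"ok " + bytes([0xB8, 0xA0, 0xFF])
print(find_unassigned(sample))  # [4, 5]
print(split_halves(sample))     # (3, 3)
```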

NOTES:

i) The third vowel "i" is included at slots AD and FE, but use of "i"
at slot AD is deprecated. The glyph at slot AD is retained for
rendering legacy data and to enable conversion to other encodings.
Text converters to other encodings should attempt to determine which
slot is used for ikaram in the text before converting.
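Note i) asks converters to inspect the text before converting. A minimal sketch of that inspection, assuming a converter simply counts both ikaram slots and, as one possible (hypothetical) policy, rewrites the deprecated slot to the current one; both slots are described as "Tamil letter i", which corresponds to U+0B87 TAMIL LETTER I. The function names are hypothetical.

```python
IKARAM_DEPRECATED = 0xAD  # TAMIL VOWEL IKARAM (usage in slot deprecated)
IKARAM_CURRENT = 0xFE     # TAMIL LETTER IKARAM


def ikaram_slots(data: bytes) -> dict:
    """Count how often each ikaram slot occurs in the input."""
    return {"AD": data.count(IKARAM_DEPRECATED), "FE": data.count(IKARAM_CURRENT)}


def normalize_ikaram(data: bytes) -> bytes:
    """Hypothetical policy: rewrite every deprecated AD byte as FE."""
    return data.replace(bytes([IKARAM_DEPRECATED]), bytes([IKARAM_CURRENT]))


legacy = bytes([0xAD, 0x41, 0xFE, 0xAD])
print(ikaram_slots(legacy))
print(normalize_ikaram(legacy).hex())
```

Counting first lets a converter report mixed usage instead of silently rewriting it.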

ii) Though the ukara and uukara modifiers (at slots A4, A5) are named
"TAMIL VOWEL SIGN U" and "TAMIL VOWEL SIGN UU" respectively, their
use is permitted only with the grantha letters. The entire ukara and
uukara uyirmey series are encoded directly in TSCII, and those code
points alone are to be used to render these uyirmeys.
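A converter or editor could flag uses of the u/uu vowel signs that note ii) does not permit. The sketch below assumes "grantha" refers to the grantha letters at slots 83-87 (JA, SSA, SA, HA, KSHA) and flags A4/A5 bytes not immediately preceded by one of them; the function name is hypothetical.

```python
# Assumption: the grantha letters that may take A4/A5 are slots 0x83-0x87.
GRANTHA_LETTERS = frozenset(range(0x83, 0x88))
VOWEL_SIGN_U, VOWEL_SIGN_UU = 0xA4, 0xA5


def misused_u_signs(data: bytes) -> list:
    """Return offsets of A4/A5 bytes not preceded by a grantha letter."""
    bad = []
    for i, b in enumerate(data):
        if b in (VOWEL_SIGN_U, VOWEL_SIGN_UU):
            if i == 0 or data[i - 1] not in GRANTHA_LETTERS:
                bad.append(i)
    return bad


print(misused_u_signs(bytes([0x83, 0xA4])))  # ja + sign u: allowed
print(misused_u_signs(bytes([0xB8, 0xA4])))  # ka + sign u: use 0xCC (ku) instead
```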

iii) Tamil numerals 0-9 are named "TAMIL DIGIT" while Tamil numerals
10, 100 and 1000 are named "TAMIL NUMBER". This reflects the fact that
Tamil numerals are used in two different systems: a decimal system, as
with Arabic numerals, using the digits 0-9, and an additive-positional
system that also uses the numerals 10, 100 and 1000.
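The two numeral systems in note iii) can be illustrated with the TSCII byte values from the listing. The additive function is a simplified sketch that assumes the common convention of omitting a multiplier of one (10 as PATHTHU alone, 20 as IRANNNTU followed by PATHTHU) and only covers 1-9999; the function names are hypothetical.

```python
# TSCII byte values for Tamil digits 0-9, taken from the listing above.
DIGITS = [0x80, 0x81, 0x8D, 0x8E, 0x8F, 0x90, 0x95, 0x96, 0x97, 0x98]
TEN, HUNDRED, THOUSAND = 0x9D, 0x9E, 0x9F


def tscii_decimal(n: int) -> bytes:
    """Decimal positional form: one Tamil digit per decimal digit."""
    if n < 0:
        raise ValueError("non-negative integers only")
    return bytes(DIGITS[int(d)] for d in str(n))


def tscii_additive(n: int) -> bytes:
    """Additive form for 1..9999 under the simplified convention above."""
    if not 1 <= n <= 9999:
        raise ValueError("this sketch handles 1..9999 only")
    out = []
    for unit, sign in ((1000, THOUSAND), (100, HUNDRED), (10, TEN)):
        count, n = divmod(n, unit)
        if count:
            if count > 1:
                out.append(DIGITS[count])  # multiplier digit precedes the sign
            out.append(sign)
    if n:
        out.append(DIGITS[n])
    return bytes(out)


for value in (10, 25, 1000):
    print(value, [f"{b:02X}" for b in tscii_decimal(value)],
          [f"{b:02X}" for b in tscii_additive(value)])
```

The additive form needs no zero, which is why the decimal system is the one that depends on TAMIL DIGIT CUZHI at slot 80.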

iv) TAMIL LETTER AAYTHAM at slot B7 is a dependent letter, though in
Unicode 4.1 it is listed differently as TAMIL SIGN VISARGA. (In Tamil
grammar this aaytham letter is called "caarpu ezuttu".)

Acknowledgment:
The TSCII user group would like to acknowledge the help of the
following persons in the preparation of this TSCII specification
document: Mr. S. Kaviarasan (USA), Mr. Ravindran Paul (Malaysia),
Mr. Doddannan Sivaraj (India), Dr. RM. Krishnan (India), Dr. Kumar
Mallikarjunan (USA) and Mr. Sinnathurai Srivas (UK).
Kenneth Whistler
2007-02-26 20:59:50 UTC
Post by K Kalyanasundaram
http://www.tscii.org/tsciispec.html
Available as a technical note at the Unicode Consortium website
http://www.unicode.org/notes/tn15/
That Technical Note #15 is out of date, reflecting the state
of affairs as of Unicode 4.0. It contains two notes regarding
the likelihood of the encoding of a Tamil digit zero and a
Tamil sha, but before they became final. In Unicode 5.0
the final code points are:

U+0BE6 TAMIL DIGIT ZERO

U+0BB6 TAMIL LETTER SHA
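The two code points cited here can be checked against Python's unicodedata module. The dictionary below is a hypothetical excerpt of an updated table covering the two TSCII entries in question; the SRI sequence (SHA, VIRAMA, RA, VOWEL SIGN II) is an assumption based on the usual Unicode 4.1+ representation of Tamil sri and should be confirmed against the revised Technical Note #15.

```python
import unicodedata

# Hypothetical excerpt of an updated TSCII -> Unicode table.
UPDATED_ENTRIES = {
    0x80: "\u0BE6",                    # TAMIL DIGIT CUZHI -> TAMIL DIGIT ZERO
    0x82: "\u0BB6\u0BCD\u0BB0\u0BC0",  # GRANTHA LETTER SRI -> SHA, VIRAMA, RA, II
}

for byte, seq in UPDATED_ENTRIES.items():
    names = ", ".join(unicodedata.name(ch) for ch in seq)
    print(f"0x{byte:02X} -> {names}")
```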
Post by K Kalyanasundaram
As a glyph-based encoding, TSCII codechart includes vowels, consonants
and abugida (compound vowel-consonant) characters. Unicode, as a
character encoding encodes only vowels and consonants. Hence not all
codepoints of TSCII can be converted one-to-one with ISO 10646.
What that means is that the referenced technical note is
incomplete currently as regards the 10646 equivalency of the
Post by K Kalyanasundaram
80 TAMIL DIGIT CUZHI = Tamil digit zero
82 TAMIL GRANTHA LETTER SRI = Tamil letter sri
--Ken
K Kalyanasundaram
2007-02-27 07:02:35 UTC
Dear Ken:

We are aware that Unicode has added (or redefined) a few characters in
the Tamil block since tech note #15 was filed. We are already in
contact with Mr. Rick McGowan of the Unicode Consortium about placing
an updated version of this technical note, so that the mapping table
corresponds to the latest version of the Unicode Tamil block.

K. Kalyanasundaram
Martin Duerst
2007-02-27 23:38:50 UTC
Great. I suggest that you nevertheless mention the two codepoints
in your registration, to make sure we can move forward without
having to wait for the update of the technical note.

Regards, Martin.
#-#-# Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-# http://www.sw.it.aoyama.ac.jp mailto:***@it.aoyama.ac.jp
K Kalyanasundaram
2007-03-07 16:54:13 UTC
Dear Martin:

An updated version of the TSCII <--> Unicode mapping table is now
available on the Unicode.org website at

http://www.unicode.org/notes/tn15/tn15-2.html

This URL is referenced from the general URL given for tech note #15.
This takes care of the point raised by Ken Whistler.

Could you please advise us on any other changes you would like to see
in the plain ASCII version of the TSCII charset registration request
that I posted earlier, so that it meets the approval of the IETF
reviewers?

K. Kalyanasundaram
Shawn Steele
2007-02-27 01:06:57 UTC
Post by K Kalyanasundaram
TSCII as an established language encoding is already recognized by
major IT players like the Unicode Consortium, Microsoft, Apple, Oracle
and Sun Microsystems. With OS-level support for Tamil in Microsoft
Windows 2000 and later OS releases and very recently in Apple's Mac OS
X 10.4 (Tiger) release, Tamil Diaspora has started to use Unicode
already. The Purpose of this formal registration with IETF is to
facilitate migration of the vast amounts of legacy data in TSCII and
multitude of users.
This statement appears to be inaccurate since Microsoft doesn't actually
support TSCII, although there are 3rd party apps and workarounds that
may provide limited support on Windows machines. We do support Tamil in
the Unicode space, but that's not the same as supporting the TSCII
charset.

This registration might be interesting in supporting conversion of
legacy data to a more compatible encoding such as Unicode, but it
shouldn't be misleading about the level of support that the TSCII
encoding has. Microsoft would recommend Unicode for accurate exchange
of Tamil data.

- Shawn

Shawn Steele
SDE
Windows International
Microsoft
Mani Manivannan
2007-02-27 03:57:05 UTC
Post by Shawn Steele
With OS-level support for Tamil in Microsoft
Windows 2000 and later OS releases and very recently in Apple's Mac OS
X 10.4 (Tiger) release, Tamil Diaspora has started to use Unicode
already.
This statement appears to be inaccurate since Microsoft doesn't actually
support TSCII, although there are 3rd party apps and workarounds that
may provide limited support on Windows machines. We do support Tamil in
the Unicode space, but that's not the same as supporting the TSCII
charset.
Shawn's conclusion is based on a misreading of the preamble.

The quoted statement doesn't say or imply that there is OS-level support for the TSCII encoding. It says there is OS-level support for the Tamil language (using Unicode), and that due to this support the Tamil diaspora has started to migrate to Unicode (away from TSCII and other encodings). It makes the case that, since a vast amount of Tamil content already exists in the TSCII encoding, a formal registration of TSCII will assist vendors that want to make this TSCII-based content available to OS (and search engine) users, and help migrate content in the TSCII encoding to Unicode (or possibly to other OS-supported standard encodings).

Mani M. Manivannan
TSCII.ORG

-----Original Message-----
From: Shawn Steele
Sent: Feb 26, 2007 5:06 PM
Subject: RE: Request for registration of character set TSCII for TAMIL language
Shawn Steele
2007-02-27 21:54:46 UTC
I'm encouraged that this draft is being submitted to aid the migration of Tamil data to Unicode.

To clarify my earlier message, my concern is with the implication of this sentence in the request:
TSCII as an established language encoding is already recognized by
major IT players like the Unicode Consortium, Microsoft, Apple,
Oracle and Sun Microsystems.
Microsoft does not recognize TSCII. We know it exists, but that's about it.

Other participants in the Unicode Consortium can speak to their products, but this sentence implies that Microsoft products would understand TSCII. Since Outlook, Vista, etc. do not natively understand TSCII, this could confuse our users.

I don't object to the registration of TSCII; I'd just like this reference that implies Microsoft recognizes TSCII to be removed.

Thanks,
- Shawn
Michael Yau
2007-02-27 22:29:45 UTC
Oracle does not support (recognize) TSCII, but supports Tamil script as
encoded in Unicode.

Michael Yau
Server Globalization Technology
Oracle Corporation
Post by Shawn Steele
I'm encouraged that this draft is being submitted to aid the migration of Tamil data to Unicode.
TSCII as an established language encoding is already recognized by
major IT players like the Unicode Consortium, Microsoft, Apple,
Oracle and Sun Microsystems.
Microsoft does not recognize TSCII. We know it exists, but that's about it.
Other participants in the Unicode Consortium can speak to their products, but this sentence implies that Microsoft products would understand TSCII. Since Outlook, Vista, etc. do not natively understand TSCII this could confuse our users.
I don't object to the registration of TSCII, I'd just like this reference that implies Microsoft recognizes TSCII to be removed.
Thanks,
- Shawn
K Kalyanasundaram
2007-02-28 08:10:26 UTC
Dear Shawn Steele and Michael Yau:

Many of us have been associated with a global organization devoted
to Tamil IT called INFITT ("International Forum for Information
Technology in Tamil"). INFITT organizes Tamil Internet Conferences
(TICs) in different parts of the world. This is an annual gathering of
hardware and software professionals working in Tamil IT development.
Representatives of many IT MNCs (Microsoft, Apple, Sun, Oracle, IBM,
...) have participated in these conferences, where TSCII and other
8-bit bilingual encodings have been extensively discussed, in
particular the migration of data from these legacy encodings to
Unicode.

Papers presented at these TICs are available online at the INFITT
website <www.infitt.org>. INFITT has a formal liaison relationship
with the Unicode Consortium, helping the UTC in the context of the
Tamil Unicode block.

Shawn Steele has correctly summarized the present situation: many IT
MNCs are "aware" of the existence of the TSCII encoding for Tamil, but
there is no direct support at the OS level. It is well known that MNCs
do not provide direct support for third-party encodings that are not
formally registered.

A few years back we contacted the Unicode Consortium about including a
TSCII-Unicode mapping table, mainly to aid the migration of TSCII
users to multilingual Unicode. IBM, for example, has recognized the
need for such a mapping table to help migration:

http://www-306.ibm.com/software/globalization/topics/migratingdata/charactermap.jsp

The present initiative for a formal registration of TSCII is in the
same spirit: to help migrate users and legacy data to Unicode. It will
be very helpful for software developers to provide migration support
if popular (hacked?) encodings such as TSCII are formally registered
with IETF/IANA.

In conclusion, I want to state that in the plain ASCII version of the
TSCII registration request we have heavily condensed the general
introductory part and removed all references to the knowledge or
awareness of TSCII amongst IT MNCs.

With best regards
K. Kalyanasundaram
Post by Michael Yau
Oracle does not support (recognize) TSCII, but supports Tamil script
as encoded in Unicode.
Michael Yau
Server Globalization Technology
Oracle Corporation
Post by Shawn Steele
I'm encouraged that this draft is being submitted to aid the
migration of Tamil data to Unicode.
Post by Shawn Steele
To clarify my earlier message, my concern is with the implication of
TSCII as an established language encoding is already recognized by
major IT players like the Unicode Consortium, Microsoft, Apple,
Oracle and Sun Microsystems.
Microsoft does not recognize TSCII. We know it exists, but that's
about it.
Post by Shawn Steele
Other participants in the Unicode Consortium can speak to their
products, but this sentence implies that Microsoft products would
understand TSCII. Since Outlook, Vista, etc. do not natively
understand TSCII this could confuse our users.
Post by Shawn Steele
I don't object to the registration of TSCII, I'd just like this
reference that implies Microsoft recognizes TSCII to be removed.
Post by Shawn Steele
Thanks,
- Shawn
Shawn Steele
2007-02-28 22:17:34 UTC
Thanks,

Shawn

-----Original Message-----
From: K Kalyanasundaram [mailto:***@yahoo.com]
Sent: Wednesday, February 28, 2007 12:10 AM
To: Michael Yau; ietf-***@iana.org; Shawn Steele
Cc: Mani Manivannan
Subject: Re: Request for registration of character set TSCII for TAMIL language
