ICANN IDN TLD Variant Issues Project
Presentation to the Unicode Technical Committee Andrew Sullivan (consultant) ajs@anvilwalrusden.com
L2/11-426
I’m a consultant
Blame me for mistakes here, not staff or ICANN
Background
(a subset of) ASCII
normally use ASCII
Names for Applications (IDNA) invented to help
Reminder: two flavours
IDNA2003 IDNA2008
Basic problem
DNS label repertoire
fit perfectly in other languages, scripts, or both
work like parts of natural language
What makes a DNS label?
is Letters, Digits, and Hyphen (“LDH”)
preserving
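A minimal sketch of the LDH convention in Python, assuming the traditional host name rules (1 to 63 characters, no leading or trailing hyphen) and matching that ignores ASCII case. The helper names `is_ldh_label` and `same_label` are hypothetical and not taken from any DNS library.

```python
import re

# Assumption: traditional host name label rules (letters, digits, hyphen;
# 1-63 characters; hyphen not allowed at either end).
LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_ldh_label(label: str) -> bool:
    """Return True if the label uses only the LDH repertoire."""
    return LDH_LABEL.fullmatch(label) is not None


def same_label(a: str, b: str) -> bool:
    """Compare two LDH labels, ignoring ASCII case."""
    return is_ldh_label(a) and is_ldh_label(b) and a.lower() == b.lower()
```

Under these assumptions `is_ldh_label("xn--bcher-kva")` is true while `is_ldh_label("bücher")` is not, which is the gap IDNA was invented to bridge.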
IDNA
in label
practical with deployed software
software or protocol
IDNA2003
that are allowed
troublesome (e.g. ZWNJ, upper-to-lowercase) using Nameprep
installed base, this is it
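As an illustration of the IDNA2003 mapping step, the sketch below uses Python's built-in `idna` codec, which implements IDNA2003 with Nameprep. The wrapper name `idna2003_to_ascii` is hypothetical; the codec itself is part of the standard library.

```python
def idna2003_to_ascii(label: str) -> str:
    """Convert one label to its ASCII form using Python's IDNA2003 codec."""
    return label.encode("idna").decode("ascii")


# Nameprep folds upper case to lower case before Punycode encoding.
print(idna2003_to_ascii("B\u00fccher"))   # xn--bcher-kva

# Nameprep maps ZWNJ (U+200C) to nothing, so it vanishes from the label.
print(idna2003_to_ascii("a\u200cb"))
```

The second call shows why such mappings are contentious: the character the user typed does not survive into the label that is looked up.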
IDNA2008
perceived limitations of IDNA2003
points based on code point properties
with IDNA2003
What’s a variant?
Exactly
Origins of variants
Chinese/Traditional Chinese issue
issues, not always related
Things people have claimed
substitutable
meaning”
child names, sometimes not
Why now?
process delegated some
development
“variants”, we should be able to say what they are.
IDN Variant Issues Project
We are here
Comment period to 14 Nov
http://www.icann.org/en/announcements/announcement-4-03oct11-en.htm and http://www.icann.org/en/public-comment/
Reports are only about the root
While some of the conclusions may apply to
reports discuss variants for TLDs only
A planned constraint for TLDs
From the guidebook
Current rule is “only letters” (strictly, General Category {Ll, Lo, Lm, Mn})
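The "only letters" rule can be sketched as a General Category check with Python's `unicodedata` module. This is an illustration of the category test alone, not the full guidebook evaluation; the helper name `is_letters_only` is hypothetical.

```python
import unicodedata

# The General Category values named in the rule above.
ALLOWED_CATEGORIES = {"Ll", "Lo", "Lm", "Mn"}


def is_letters_only(label: str) -> bool:
    """Return True if every code point has an allowed General Category."""
    return bool(label) and all(
        unicodedata.category(ch) in ALLOWED_CATEGORIES for ch in label
    )
```

Under this check digits (Nd), the hyphen (Pd), and ZWNJ (Cf) all cause a label to fail, so `is_letters_only("a\u200cb")` is false.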
Arabic team
Restrictions suggested in report
Arabic team
ZWNJ
context rule
variants
Arabic team
Groups of characters
position (e.g. YEH)
position (e.g. ALEF w/ HAMZA ABOVE)
KAF vs SWASH KAF)
Arabic team
“NFC” issues
U+0648, U+064F
“confusables” algorithms?
Arabic team
Recommendations
variant, all resulting labels are available to the applicant
which ones to activate
Chinese team
Focus on Chinese Language
“script”, but report primarily about Chinese
effects on Japanese and Korean
Chinese team
RFC 3743, experience
use
Chinese team
Two fundamental cases
Separation Rule (e.g. U+6237 versus U+6236)
Cyrillic team
Focus on reducing confusion
confusion of strings between languages
no strong recommendation that “everything works”
Cyrillic team
Different from other cases
some other scripts
environment:
reforms
Cyrillic team
One language can cause issues
language obliterate differences in others
U+0433 vs U+0491
keyboards
Cyrillic team
Interaction with other scripts
and Latin raised
problematic
Devanagari team
Very different issues
priority issue
URL bar display
akshars
Devanagari team
Environment issues
can be problematic
Devanagari team
ZWJ and ZWNJ
languages rely on ZWJ
precomposed version that will do
paradigms
Devanagari team
Inter-script issues
Devanagari and other Brahmi-derived scripts?
be important
Greek team
Unusual case
scripts in being used for
Greek team
Additional restrictions
excluding ancient characters
to Monotonic characters
Greek team
Sigma and Tonos
to lower case: Tonos can be lost
form sigma
applications in IDNA2008
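The sigma and Tonos behaviour can be illustrated in Python. This is a sketch only: `strip_tonos` is a hypothetical helper, and the sample word is an arbitrary choice for illustration. The `idna` codec used at the end is Python's built-in IDNA2003 implementation.

```python
import unicodedata


def strip_tonos(text: str) -> str:
    """Remove the combining acute accent (U+0301) that carries the Tonos."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if ch != "\u0301")
    return unicodedata.normalize("NFC", stripped)


word = "\u03bf\u03b4\u03cc\u03c2"           # lower case, Tonos, final sigma
upper_plain = "\u039f\u0394\u039f\u03a3"    # upper case written without Tonos

# Lowercasing restores the final-form sigma but cannot restore the Tonos.
lowered = upper_plain.lower()
print(lowered == word)                # False
print(lowered == strip_tonos(word))   # True

# IDNA2003 (Nameprep) folds final sigma to small sigma, so both spellings
# produce the same ASCII label.
with_small_sigma = "\u03bf\u03b4\u03cc\u03c3"
print(word.encode("idna") == with_small_sigma.encode("idna"))   # True
```

The first comparison is the "Tonos can be lost" case. The last shows why two strings that differ only in final sigma could not be told apart under IDNA2003.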
Greek team
Final sigma
final form sigmas wherever requested
final sigma (i.e. with small sigma in place of final sigma)
Greek team
Tonos
with Tonos where requested
stripped
Greek team
Dimotiki and Katharevousa
Katharevousa string is requested, the “same” Dimotiki “word” is blocked
variant behaviour because
Latin team
The impossible dream
relationships among characters in Latin-using languages
Remember, please comment
Open until 14 November: http://www.icann.org/en/public-comment/
Questions