ICANN IDN TLD Variant Issues Project Presentation to the Unicode Technical Committee


  1. L2/11-426 • ICANN IDN TLD Variant Issues Project • Presentation to the Unicode Technical Committee • Andrew Sullivan (consultant), ajs@anvilwalrusden.com

  2. I’m a consultant • Blame me for mistakes here, not staff or ICANN

  3. Background • DNS labels were always in (a subset of) ASCII • Lots of people don’t normally use ASCII • Internationalized Domain Names for Applications (IDNA) invented to help

  4. Reminder: two flavours • IDNA2003 • IDNA2008

  5. Basic problem • IDNA (2003 & 2008) expands DNS label repertoire • The LDH pattern does not fit perfectly in other languages, scripts, or both • People want DNS labels to work like parts of natural language

  6. What makes a DNS label? • DNS labels are octets • Preferred syntax (RFC 1035) is Letters, Digits, and Hyphen (“LDH”) • Special DNS rule for ASCII • Case-insensitive but case-preserving
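The LDH syntax and the ASCII case rule on this slide can be sketched in a few lines of Python. This is an illustrative sketch, not part of the presentation: the names `is_ldh_label` and `labels_match` are hypothetical, and the pattern uses the commonly deployed relaxed form that also allows a leading digit (RFC 1123), with the 63-octet label limit.

```python
import re

# LDH: letters, digits, hyphen; no leading or trailing hyphen; 1 to 63 octets.
# Assumption: a leading digit is accepted (the RFC 1123 relaxation).
LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_ldh_label(label: str) -> bool:
    """Return True if the label fits the LDH pattern."""
    return LDH_LABEL.fullmatch(label) is not None


def labels_match(a: str, b: str) -> bool:
    """Compare two ASCII labels the way DNS does: ignoring ASCII case only."""
    # bytes.lower() touches only the ASCII letters A-Z.
    return a.encode("ascii").lower() == b.encode("ascii").lower()


print(is_ldh_label("example"))             # True
print(is_ldh_label("-example"))            # False
print(labels_match("Example", "EXAMPLE"))  # True
```

The stored label keeps whatever case it was registered with; only the comparison ignores case, which is what "case-preserving" means here.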

  7. IDNA • Permit non-LDH characters in label • Be as compatible as practical with deployed software • No changes to deployed DNS software or protocol

  8. IDNA2003 • Provide a list of code points that are allowed • Map cases that are troublesome (e.g. ZWNJ, upper-to-lowercase) using Nameprep • To the extent there’s an installed base, this is it
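As an illustration of the IDNA2003 mapping step, Python's built-in `idna` codec implements RFC 3490, including Nameprep, so it can show the upper-to-lowercase mapping before the label is converted to its ASCII form. The sample label is an assumption chosen for illustration; it does not come from the presentation.

```python
# Python's standard "idna" codec follows IDNA2003 (RFC 3490 with Nameprep).
label = "Bücher"  # illustrative label, not from the slides

ace = label.encode("idna")       # Nameprep lowercases, then Punycode is applied
round_trip = ace.decode("idna")  # back to the Unicode form

print(ace)         # b'xn--bcher-kva'
print(round_trip)  # bücher
```

The upper-case "B" does not survive the round trip: under IDNA2003 the mapping is part of the protocol, so the original spelling is lost once the label is encoded.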

  9. IDNA2008 • Attempt to address some perceived limitations of IDNA2003 • Permits or disallows code points based on code point properties • Certain incompatibilities with IDNA2003

  10. What’s a variant? Exactly.

  11. Origins of variants • Starts because of Simplified Chinese/Traditional Chinese issue • JET Guidelines (RFC 3743) • Became model for other issues, not always related

  12. Things people have claimed • Characters that are substitutable • “Same words” or “same meaning” • Sometimes a constraint on child names, sometimes not

  13. Why now? • ccTLD IDN “Fast Track” process delegated some • Not uncontroversial • New gTLDs under development • If we’re going to create “variants”, we should be able to say what they are.

  14. IDN Variant Issues Project

  15. IDN Variant Issues Project [diagram with a “We are here” marker]

  16. Comment period to 14 Nov • http://www.icann.org/en/announcements/announcement-4-03oct11-en.htm and http://www.icann.org/en/public-comment/

  17. Reports are only about the root • While some of the conclusions may apply to other types of zones, the reports discuss variants for TLDs only

  18. A planned constraint for TLDs (from the guidebook) • Current rule is “only letters” (strictly, General Category {Ll, Lo, Lm, Mn}) • No numerals • No HYPHEN-MINUS • No ZWNJ/ZWJ
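The "only letters" rule on this slide can be approximated with Python's `unicodedata` module. This is a minimal sketch under stated assumptions: the function name `is_letters_only` is hypothetical, the label is assumed to be in its Unicode (not ASCII-encoded) form, and the check covers only the General Category test named on the slide, not the full guidebook rules.

```python
import unicodedata

# General Category values named on the slide.
ALLOWED_CATEGORIES = {"Ll", "Lo", "Lm", "Mn"}


def is_letters_only(label: str) -> bool:
    """Return True if every code point has an allowed General Category."""
    return bool(label) and all(
        unicodedata.category(ch) in ALLOWED_CATEGORIES for ch in label
    )


print(is_letters_only("пример"))      # True: all Ll
print(is_letters_only("abc1"))        # False: DIGIT ONE is Nd
print(is_letters_only("a-b"))         # False: HYPHEN-MINUS is Pd
print(is_letters_only("a\u200Cb"))    # False: ZWNJ is Cf
```

The three exclusions listed on the slide fall out of the category test: digits are `Nd`, HYPHEN-MINUS is `Pd`, and ZWNJ/ZWJ are `Cf`.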

  19. Restrictions suggested in report (Arabic team) • No combining marks • No digits • No archaic • No Quranic marks

  20. ZWNJ (Arabic team) • Arguments for and against • Refinement of IDNA2008 context rule • Issue is lack of shape change • Questions about resulting variants

  21. Groups of characters (Arabic team) • Identical shape at some position (e.g. YEH) • Similar shape at some position (e.g. ALEF w/ HAMZA ABOVE) • Interchangeable use (e.g. KAF vs SWASH KAF)

  22. “NFC” issues (Arabic team) • Not exactly an issue with NFC • Example: U+06C7 vs. U+0648, U+064F • Perhaps could be caught by “confusables” algorithms?
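The example on this slide can be checked directly: U+06C7 has no canonical decomposition, so NFC leaves it distinct from the sequence U+0648, U+064F even though the two can look alike. A short sketch using Python's `unicodedata` (the variable names are illustrative):

```python
import unicodedata

precomposed = "\u06C7"     # ARABIC LETTER U
sequence = "\u0648\u064F"  # ARABIC LETTER WAW + ARABIC DAMMA

# An empty string means the code point has no decomposition mapping.
has_decomposition = unicodedata.decomposition(precomposed) != ""

same_after_nfc = (
    unicodedata.normalize("NFC", precomposed)
    == unicodedata.normalize("NFC", sequence)
)

print(has_decomposition)  # False
print(same_after_nfc)     # False
```

Because normalization does not bring the two forms together, any equivalence between them would have to come from a separate mechanism, which is why the slide points toward "confusables"-style checks.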

  23. Recommendations (Arabic team) • Whenever there is a variant, all resulting labels are available to the applicant • It is up to the applicant which ones to activate

  24. Focus on Chinese Language (Chinese team) • Reports in principle about “script”, but report primarily about Chinese • Some consideration of effects on Japanese and Korean

  25. RFC 3743, experience (Chinese team) • Experience at other levels of DNS • RFC 3743 a good fit for CJK use

  26. Two fundamental cases (Chinese team) • Traditional vs Simplified • Variation due to Source Separation Rule (e.g. U+6237 versus U+6236)

  27. Focus on reducing confusion (Cyrillic team) • Mainly interested in confusion of strings between languages • Unlike Chinese and Arabic, no strong recommendation that “everything works”

  28. Different from other cases (Cyrillic team) • Many more languages than some other scripts • Extremely fraught political environment: Cyrillic vs. Latin, Cyrillic vs. Arabic • Many spelling & character reforms

  29. One language can cause issues (Cyrillic team) • Substitutions in one language obliterate differences in others • E.g. U+0435 vs U+0451, U+0433 vs U+0491 • Some characters not on keyboards
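The effect described on this slide can be sketched with a small folding table. Everything here is an assumption for illustration: the table `FOLD`, the function `fold`, and the sample words are hypothetical and are not taken from the Cyrillic team's report. The table treats the two pairs from the slide as interchangeable, as a registry serving one language might.

```python
# Hypothetical substitution table built from the two pairs on the slide.
FOLD = str.maketrans({
    "\u0451": "\u0435",  # ё (U+0451) -> е (U+0435)
    "\u0491": "\u0433",  # ґ (U+0491) -> г (U+0433)
})


def fold(label: str) -> str:
    """Apply the substitution table to a label."""
    return label.translate(FOLD)


# Two distinct Ukrainian words that differ only in ґ vs г.
word_a = "ґрати"
word_b = "грати"

print(word_a == word_b)              # False
print(fold(word_a) == fold(word_b))  # True
```

A substitution that is harmless where the two letters are treated as interchangeable erases a real distinction in a language that keeps them apart.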

  30. Interaction with other scripts (Cyrillic team) • Issue of relation to Greek and Latin raised • Declared out of scope, but problematic

  31. Very different issues (Devanagari team) • Confusing similarity a high priority issue • Especially worried about URL bar display • Concern about ill-formed akshars

  32. Environment issues (Devanagari team) • Display of Devanagari script can be problematic • Rendering engines • Fonts

  33. ZWJ and ZWNJ (Devanagari team) • Some Devanagari-using languages rely on ZWJ • Even if there is a precomposed version that will do • ZWNJ needed for noun paradigms • Use in TLDs not clear

  34. Inter-script issues (Devanagari team) • Relationship between Devanagari and other Brahmi-derived scripts? • Ruled out of scope, but may be important

  35. Unusual case (Greek team) • Greek alone in studied scripts in being used for only one language

  36. Additional restrictions (Greek team) • Team recommends excluding ancient characters • Team recommends sticking to Monotonic characters

  37. Sigma and Tonos (Greek team) • IDNA2003 maps upper case to lower case: Tonos can be lost • IDNA2003 maps away final form sigma • Transformations in applications in IDNA2008

  38. Final sigma (Greek team) • Recommend registering final form sigmas wherever requested • Also register without the final sigma (i.e. with small sigma in place of final sigma)

  39. Tonos (Greek team) • Recommend registering with Tonos where requested • Also register with Tonos stripped
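The two Greek recommendations above (register the requested label, plus the forms with small sigma in place of final sigma and with Tonos stripped) can be sketched as a small label generator. This is only an illustration under stated assumptions: the function names are hypothetical, the sample word is not from the report, and Tonos is removed by deleting U+0301 from the decomposed form, which is one possible reading of "stripped".

```python
import unicodedata

FINAL_SIGMA = "\u03C2"      # ς
SMALL_SIGMA = "\u03C3"      # σ
COMBINING_ACUTE = "\u0301"  # how Tonos appears after canonical decomposition


def without_final_sigma(label: str) -> str:
    """Replace final-form sigma with small sigma."""
    return label.replace(FINAL_SIGMA, SMALL_SIGMA)


def without_tonos(label: str) -> str:
    """Decompose, drop the combining acute accent, and recompose."""
    decomposed = unicodedata.normalize("NFD", label)
    stripped = "".join(ch for ch in decomposed if ch != COMBINING_ACUTE)
    return unicodedata.normalize("NFC", stripped)


def candidate_labels(label: str) -> set[str]:
    """Requested label plus the sigma and Tonos alternatives."""
    no_sigma = without_final_sigma(label)
    return {label, no_sigma, without_tonos(label), without_tonos(no_sigma)}


labels = candidate_labels("οδός")  # illustrative word with Tonos and final sigma
print(len(labels))  # 4
```

A label with both features yields four candidates; a label with neither yields only itself, since the set removes duplicates.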

  40. Dimotiki and Katharevousa (Greek team) • Recommendation that, if Katharevousa string is requested, the “same” Dimotiki “word” is blocked • Only report that requests variant behaviour because of whole-string meaning

  41. The impossible dream (Latin team) • There are too many relationships among characters in Latin-using languages • There’s no way to decide • Therefore, no variants

  42. Remember, please comment • Open until 14 November • http://www.icann.org/en/public-comment/

  43. Questions
