The Multilingual Lion:
TeX learns to speak Unicode
Jonathan Kew, SIL International
April 7, 2005
27th Internationalization and Unicode Conference, Berlin, Germany, April 2005
Background
preprocessors
Source text      Typeset output   Notes
\'{a}            á                typical accent command
\c{c}            ç
\aa              å
                                  ligature in typical TeX fonts
$\alpha$         α                math mode symbol
{\dn acchaa}                      using custom preprocessor
included in the current codepage
language/script being typeset
text data stream
\halign{#\hfil\quad& #\hfil\cr
  dan&dan\cr
  dubok&dubok\cr
  džabe&ak\cr
  džin&džabe\cr
  Džin&džin\cr
  ak&Džin\cr
  Evropa&Evropa\cr}
\font\han="STSong" at 16pt
\font\rom="Gentium" at 8pt
\def\hc#1#2{\vtop{\hbox{\han#1}
  \hbox{\kern10pt\rom#2}}}
\vtop{\hc{書く}{ka-ku}
  \hc{最も}{motto-mo}
  \hc{最後}{sai-go}
  \hc{働く}{hatara-ku}
  \hc{海}{umi}}
[Typeset output: each word in Han characters with its romanization below it: ka-ku, motto-mo, sai-go, hatara-ku, umi]
\c1 \s \p \v1 . \v2 .
.“.
دنيا جي پيدائش
TeX fonts and documents
match the extended character set
TeX commands that refer to character codes
can be treated as simple printable characters
restructuring
TeX regards “a specific character code in a specific font” as the fundamental unit of text to be typeset
precisely placed, and intervening “glue” nodes
A XeTeX paragraph consists of a sequence of “word” nodes separated by “glue”
[Figure: nodes in a TeX paragraph, alongside the corresponding nodes in XeTeX]
technology in use
ATSUSetRunStyle
ubidi_getDirection, ubidi_countRuns, ubidi_getVisualRun
getGlyphPositions
TeX to remain unaware of low-level details
TeX’s algorithm
break nodes
\font\Doulos="Doulos SIL/ICU"
\font\DoulosViet="Doulos SIL/ICU:language=VIT"
\font\Brioso="Brioso Pro"
\font\BriosoTrk="Brioso Pro:language=TRK"
\font\Doulos="Doulos SIL/AAT"
\font\DoulosAlt="Doulos SIL/AAT:
  Alternate forms=Literacy alternates,
  Uppercase Eng alternates=Capital N with tail"
Xɔsee na Mose ɖo Ŋutitotoŋkeke la anyi, eye wòna wohlẽ ʋu ɖe ʋɔtrutiwo ŋu bene dɔla si atsr ŋgɔgbeviwo la nagawɔ nuvevi Israel viwo ya o.
TeX normally breaks lines at “glue” arising from spaces
เก็บตัวอักษรและอักขระอื่นๆ โดยการกำหนดหมายเลขให้สำหรับแต่ละตัว. ก่อนหน้าที่๊ Unicode จะถูกสร้างขึ้น, ได้มีระบบ encoding อยู่หลายร้อยระบบสำหรับการกำหนดหมายเลขเหล่านี้.
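For a script such as Thai, which is written without spaces between words, there is no glue for TeX to break at, so break positions have to come from elsewhere. A minimal sketch of how this might be set up in XeTeX, assuming the `\XeTeXlinebreaklocale` and `\XeTeXlinebreakskip` primitives are available in the XeTeX version in use; the font name is a placeholder for any installed font with Thai glyphs and is not taken from the slides:

```tex
% Assumption: "Thonburi" stands in for any installed Thai-capable font.
\font\thai="Thonburi" at 12pt
\XeTeXlinebreaklocale "th"          % locate break positions using Thai rules
\XeTeXlinebreakskip = 0pt plus 1pt  % glue inserted at each break position found
\hsize=2.5in                        % narrow measure so that several breaks occur
\thai ก่อนหน้าที่ Unicode จะถูกสร้างขึ้น, ได้มีระบบ encoding
อยู่หลายร้อยระบบสำหรับการกำหนดหมายเลขเหล่านี้.
\bye
```

Giving the inserted glue a little stretch lets the paragraph builder justify lines even though the text itself contains almost no spaces.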
TeX font metrics and Type 1 font files
TeX documents to work
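\[
\begin{aligned}
\left(\int_{-\infty}^{\infty} e^{-x^2}\,dx\right)^2
  &= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-(x^2+y^2)}\,dx\,dy \\
  &= \int_0^{2\pi}\int_0^{\infty} e^{-r^2} r\,dr\,d\theta \\
  &= \int_0^{2\pi}\left(-\frac{e^{-r^2}}{2}\bigg|_{r=0}^{r=\infty}\right)d\theta \\
  &= \pi.
\end{aligned}
\]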
\XeTeXinputencoding "charset-name"
\XeTeXdefaultencoding "charset-name"
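As an illustration of how these two commands might be used with a legacy document, here is a minimal sketch. It assumes the file is saved as ISO-8859-1 rather than UTF-8; the charset name and the font are illustrative choices, not taken from the slides:

```tex
% Assumption: this file itself is stored in ISO-8859-1 (Latin-1), not UTF-8.
\XeTeXinputencoding "iso-8859-1"    % governs the remainder of the current file
\XeTeXdefaultencoding "iso-8859-1"  % governs files opened after this point
\font\body="Gentium" at 10pt \body
café, naïve, Ångström
\bye
```

The first command is the one to place at the top of an individual legacy file; the second is more useful in a wrapper that goes on to `\input` several such files.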
``\TeX''---a typesetting system
``TeX''---a typesetting system
; TECkit mapping for TeX input conventions
U+002D U+002D <> U+2013 ; -- -> en dash
U+002D U+002D U+002D <> U+2014 ; --- -> em dash
U+0027 <> U+2019 ; ' -> right single quote
U+0027 U+0027 <> U+201D ; '' -> right double quote
U+0022 > U+201D ; " -> right double quote
“TeX”—a typesetting system
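A mapping of this kind takes effect when it is attached to a font at load time. The sketch below assumes a compiled mapping installed under the name `tex-text`, which is the name used by the mapping distributed with XeTeX; the slide itself does not name its mapping file:

```tex
\font\plain="Gentium" at 12pt
\font\mapped="Gentium:mapping=tex-text" at 12pt
\plain  ``\TeX''---a typesetting system  % quotes and hyphens are set literally
\par
\mapped ``\TeX''---a typesetting system  % set with curly quotes and an em dash
\bye
```

Because the mapping belongs to the font rather than to the input file, the same source text can be set with or without the conversion simply by switching fonts.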
\def\SampleText{Unicode - это уникальный код для любого символа,\\
  независимо от платформы,\\
  независимо от программы,\\
  независимо от языка.}
\font\gen="Gentium"
\gen\SampleText
\bigskip
\font\gentrans="Gentium:mapping=cyr-lat-iso9"
\gentrans\SampleText
Unicode - это уникальный код для любого символа,
независимо от платформы,
независимо от программы,
независимо от языка.

Unicode - èto unikal'nyj kod dlâ lûbogo simvola,
nezavisimo ot platformy,
nezavisimo ot programmy,
nezavisimo ot âzyka.
XeTeX, but now obsolete
XeTeX implementation
TeX to all scripts
ما هي الشفرة الموحدة "يونِكود"؟
什麽是Unicode (統一碼/標準萬國碼)?
Što je Unicode?
Τί εἶναι τὸ Unicode;
מה זה יוניקוד?
यूनिकोड क्या है?
Hvað er Unicode?
ユニコードとは何か?
유니코드에 대해?
يونی‌كُد چيست؟
Что такое Unicode?
Unicode คืออะไร?