Unicode Agenda for Bangla



SLIDE 1

Unicode Agenda for Bangla

Bidyut Baran Chaudhuri

Society for Natural Language Technology Research & Indian Statistical Institute, Kolkata, India

L2/09-294

SLIDE 2

Indian Script and Bangla

• Most Indian scripts are derived from the ancient Brahmi script.
• They are of the alpha-syllabary/abugida class of scripts.
• The Indian writing system started to evolve 3000 years ago.
• Perhaps inspired by ancient Aramaic, but showing the exceptional originality of Indian philologists.
• The alphabet matrix is arranged according to manner of articulation, like unvoiced (unaspirated, aspirated) and voiced (unaspirated, aspirated), versus place of articulation in the mouth, like velar, post-alveolar, alveolar, dental and bilabial.
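
The matrix described above is still visible in the code point order of the Unicode Bengali block. The following is a minimal sketch, not part of the original slides. The row labels follow the order given on the slide (mapping each label to a particular row is an assumption), and the fifth "nasal" column is an addition that the slide does not mention.

```python
import unicodedata

# First code point of each consonant row in the Unicode Bengali block.
# U+09A9 is unassigned, so the last row starts at U+09AA.
ROW_STARTS = [0x0995, 0x099A, 0x099F, 0x09A4, 0x09AA]

# Place labels in the order listed on the slide (assumed to map row by row).
PLACES = ["velar", "post-alveolar", "alveolar", "dental", "bilabial"]

# Manner labels from the slide, plus the nasal that closes each row
# (the nasal column is not mentioned on the slide).
MANNERS = ["unvoiced unaspirated", "unvoiced aspirated",
           "voiced unaspirated", "voiced aspirated", "nasal"]


def consonant_matrix():
    """Return the 5 x 5 grid of consonants, one row per place of articulation."""
    return [[chr(start + i) for i in range(len(MANNERS))] for start in ROW_STARTS]


if __name__ == "__main__":
    for place, row in zip(PLACES, consonant_matrix()):
        print(f"{place:14}", "  ".join(row))
```

Reading down a column of the printed grid keeps the manner fixed and changes the place; reading along a row does the reverse.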

SLIDE 3

Brahmi Alpha Numerals

SLIDE 4

From Brahmi to Bangla

• Full-blown Brahmi script was active during the days of Christ, but its initial form started earlier.
• It branched into north and south Indian groups.
• By 800 AD a north variety named Kutila script evolved through the Kushana-Gupta group of scripts.
• Kutila means complicated (the upper-caste people did not like the lower-caste people to learn writing and reading).
• By 1000 AD proto-Bangla script evolved.
• Proto-modern Bangla script evolved by 1500 AD.
• By the 18th century modern Bangla script was ready. There were 34 consonants and 10 vowels.

SLIDE 5

Bangla Script Evolution

Contd…

SLIDE 6
SLIDE 7

Stabilization of Bangla Script

• Printing in Bangla started in the late eighteenth century (Halhed, 1778).
• Full stop and double full stop were the only punctuation marks noted in the initial script.
• Other punctuation marks were borrowed from English.
• Vidyasagar introduced three more characters in the mid nineteenth century by placing a dot below three existing characters.
• Some characters like li and double-li became obsolete.
• This stabilized script system remained in use for 150 years.

SLIDE 8

The Alphabet Currently Used for Bangla

SLIDE 9

Further Modification of Bangla Script

• After 1900 AD, spelling correction and script correction debates gained momentum.
• Several correction suggestions were accepted through the initiative of Kolkata University.
• New decimal monetary system, weighing standards etc. were introduced around the 1960s.
• Some of the older signs and symbols disappeared.
• Simplification in the representation of conjunct characters has been proposed for the last twenty years. There is still debate on which should be simplified.

SLIDE 10

Development of Bangla ISCII and Unicode

• ISCII for Indian languages was developed in the 1980s through the initiatives of the Dept. of Information Technology, Govt. of India.
• Bangla script too got an ISCII version.
• There have always been some problems in using Bangla ISCII for preparing electronic texts.
• The Bangla Unicode code points appear to be based mainly on Bangla ISCII.
• So, it has problems too, though some of them are already solved.
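
One visible trace of the ISCII heritage is that the Unicode Indic blocks are laid out in parallel, so corresponding characters tend to sit at the same offset in each block. The sketch below is not from the slides; it assumes the Devanagari block starts at U+0900 and the Bengali block at U+0980, and compares character names at a few hand-picked offsets.

```python
import unicodedata

DEVANAGARI_BASE = 0x0900  # start of the Devanagari block
BENGALI_BASE = 0x0980     # start of the Bengali block


def parallel_names(offset):
    """Return (Devanagari name, Bengali name) at the same block offset.

    An entry is None when that position is unassigned in the Unicode
    data shipped with the running Python.
    """
    return tuple(
        unicodedata.name(chr(base + offset), None)
        for base in (DEVANAGARI_BASE, BENGALI_BASE)
    )


if __name__ == "__main__":
    # Offsets chosen for illustration: KA, MA, VOWEL SIGN I, VIRAMA.
    for offset in (0x15, 0x2E, 0x3F, 0x4D):
        print(f"+{offset:02X}", parallel_names(offset))
```

Positions where one of the two names comes back as None are places where the shared layout has a slot that only one of the scripts uses.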

SLIDE 11

Unicode 5.1 for Bangla

SLIDE 12

Unicode 5.2 for Bangla

SLIDE 13

Problems Remaining

• Rendition of Hasanta and two types of conjunct r + ja is clumsy with ZWJ and ZWNJ code points.
• No code point exists for (Khiya or Jukta-kha) as well as the Om-kar character.
• Unnecessary existence of a code point for the right side of ou-kar.
• No code point exists for Urdha-comma.
• Existence of many code points for old and obsolete symbols in the main code table.
• Unreasonable proposal of introducing extra code for transparent and non-transparent form of vowel modifiers.
• Code points for various signs need discussion.

SLIDE 14

Our Proposals

1. Introduce a code point for ( ) in the table, after ( ), i.e. at 09BA, and for ( ) at 09D0.
2. Introduce a new code point for Ja-fala ( ), say after ( ), i.e. at 09C9, and use this to express all kinds of Ja-fala. The existing role of hasant and ZWNJ will continue. E.g. ( ). There will be no need for the ZWJ code point in this scheme.

Contd..
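
Whether the positions named in these proposals are free can be checked against the Unicode data bundled with a given Python build. This is a minimal sketch, not part of the original slides; the helper name `status` is illustrative, and the answer depends on the Unicode version reported by `unicodedata.unidata_version`.

```python
import unicodedata

# Code points named in proposals 1 and 2.
PROPOSED = (0x09BA, 0x09D0, 0x09C9)


def status(cp):
    """Return the character name, or 'unassigned' if this Python's
    Unicode data has no character at that code point."""
    return unicodedata.name(chr(cp), "unassigned")


if __name__ == "__main__":
    print("Unicode data version:", unicodedata.unidata_version)
    for cp in PROPOSED:
        print(f"U+{cp:04X}: {status(cp)}")
```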

SLIDE 15

3. There is no need to distinguish by using ZWJ.
4. Release the obsolete character code points by placing them in the private use area.
5. Stop using the code point for ( ) unless there are other pressing reasons. It may create confusion for O-kar ( ).
6. (a) Should we use any of the existing code points for representing the upper comma, which has a different connotation in Bangla? We are in favor of a distinct code point.
   (b) Should we use the Devanagari code point of the full-stop sign (danda) to represent the Bangla full-stop also? Our suggestion is to have a distinct code point for Bangla full-stops.
   (c) For representing signs for acronym, foot, inch, degree etc. for Bangla, the Unicode manual should have specific suggestions that are easily available on the net.

Contd..

SLIDE 16

7. In the description of code points in the Unicode manual there are several inadequacies, which should be modified as follows:

09F4 BENGALI CURRENCY NUMERATOR SIGN FOR ONE ANNA
  • not in current usage
09F5 BENGALI CURRENCY NUMERATOR SIGN FOR TWO ANNAS
  • not in current usage
09F6 BENGALI CURRENCY NUMERATOR SIGN FOR THREE ANNAS
  • not in current usage
09F7 BENGALI CURRENCY NUMERATOR SIGN FOR FOUR ANNAS
  • not in current usage
09F8 BENGALI CURRENCY NUMERATOR SIGN FOR TWELVE ANNAS
  • not in current usage

(A code point is needed for eight annas also)

09F9 BENGALI CURRENCY DENOMINATOR SIXTEEN END MARKER AFTER ANNAS
  • not in current usage
09FB BENGALI GANDA MARK
  • not in current usage
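
For comparison with the descriptions proposed above, the names currently carried by these code points can be read from Python's `unicodedata` module. This is a minimal sketch, not part of the original slides; the helper name `current_names` is illustrative, and the output reflects whichever Unicode version the running Python ships.

```python
import unicodedata


def current_names(first=0x09F4, last=0x09FB):
    """Map each code point in the range to its current Unicode name."""
    return {
        cp: unicodedata.name(chr(cp), "unassigned")
        for cp in range(first, last + 1)
    }


if __name__ == "__main__":
    for cp, name in current_names().items():
        print(f"{cp:04X} {name}")
```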

SLIDE 17

Any Comment? Suggestion? Question?

SLIDE 18

Thank You

SLIDE 19

BACK

SLIDE 20

BACK

SLIDE 21

In Devanagari, khiya is formed by combining two characters. In Bangla also, the current practice is to form it as follows: ( ). However, in Bangla it is considered as a single character, and in the Bangla dictionary it is ranked in between ( ) and ( ). So, there should be a separate code point for it.
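
The example image for this slide is lost, but in current Unicode practice khiya is written as ka + hasanta + ssa. The sketch below is not part of the original slides. It shows that sequence and what a plain code-point sort does with it; the helper name `describe` is illustrative.

```python
import unicodedata

# Khiya as it is encoded today: three code points, no dedicated character.
KHIYA = "\u0995\u09CD\u09B7"  # KA + VIRAMA (hasanta) + SSA


def describe(text):
    """List each code point of text with its Unicode character name."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


if __name__ == "__main__":
    for line in describe(KHIYA):
        print(line)
    # A plain code-point sort files khiya directly after ka, because the
    # sequence begins with U+0995. Placing it elsewhere in the alphabet,
    # as a dictionary might, needs a custom collation rule.
    print(sorted(["\u0996", KHIYA, "\u0995"]))
```

Python compares strings code point by code point, so the sort here stands in for any ordering based on raw code point values.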

BACK

SLIDE 22

BACK

SLIDE 23

BACK

SLIDE 24