Pronunciation of Nouns in Text to Speech - PowerPoint PPT Presentation



SLIDE 1

Pronunciation of Nouns in Text to Speech systems

Veera Raghavendra, Lavanya Prahallad
IIIT Hyderabad, India

SLIDE 2

Veera Raghavendra and Lavanya Prahallad. II 07/01/14

Agenda

  • Nature of Indian Language Scripts
  • Convergence and Divergence
  • Fonts and Transliteration Scheme
  • SSML Extensions for Proper Nouns

SLIDE 3

Nature of Indian Language Scripts

  • Indian language (IL) scripts originated from the ancient Brahmi script.
  • Basic units of the writing system are Aksharas
  • An Akshara is an orthographic representation of a speech sound
  • Akshara is syllabic in nature
  • A syllable is defined as C*VC*
      • C is a consonant
      • V is a vowel
  • Examples: V, CV, CCV, CVC, CCCV
  • amma:
      • Phone sequence: /a/ /m/ /m/ /aa/
      • Syllables: (/a/) (/m/ /m/ /aa/)
  • Written from left to right
  • Words are separated by spaces, as in European languages
  • Arabic digits (0...9) are used as numerals.
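
The C*VC* pattern and the "amma" example can be illustrated with a minimal sketch. It assumes that consonants attach to the following vowel (as in the split of "amma" into /a/ and /m/ /m/ /aa/) and that word-final consonants join the last syllable. The vowel inventory `VOWELS` and the function name `syllabify` are illustrative assumptions, not part of the original slides.

```python
# Illustrative vowel inventory (an assumption; a real phone set is larger).
VOWELS = {"a", "aa", "i", "ii", "u", "uu", "e", "ei", "o", "oo", "ai", "au"}


def syllabify(phones):
    """Group a phone sequence into C*V units; trailing consonants join the last unit."""
    syllables = []
    current = []
    for phone in phones:
        current.append(phone)
        if phone in VOWELS:
            # A vowel closes the current C*V unit.
            syllables.append(current)
            current = []
    if current:
        # Leftover consonants form the coda of the final syllable (the C* after V).
        if syllables:
            syllables[-1].extend(current)
        else:
            syllables.append(current)
    return syllables


print(syllabify(["a", "m", "m", "aa"]))
```

For the phone sequence of "amma" this yields the two units shown on the slide: (/a/) and (/m/ /m/ /aa/).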

SLIDE 4

Convergence and Divergence

  • India is a multilingual nation with 21 recognized official languages and ~1652 dialects.
  • These languages, apart from Hindi and English, are: Assamese, Tamil, Malayalam, Gujarati, Telugu, Oriya, Urdu, Bengali, Sanskrit, Kashmiri, Sindhi, Punjabi, Konkani, Marathi, Manipuri, Kannada, Bodo, Dogri, Maithili, Santhali and Nepali.
  • While all of these languages share a common phonetic base, some of them, such as Hindi, Marathi and Nepali, also share a common script known as Devanagari.
  • Languages such as Telugu, Kannada and Tamil have their own scripts.

SLIDE 5

Fonts and Transliteration scheme

  • True Type Fonts
      • Use the 1-256 ASCII code range to represent characters
      • Character representation differs from one font to another [even in the same language]
      • Separate converter required for each font
      • Proprietary fonts
  • Unicode
      • A universal character set
      • Provides a unique number for each character in a language
      • Supports all platforms
      • Supports all the languages
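
The point that Unicode gives each character a unique number can be checked with Python's standard `unicodedata` module. The helper name `describe` and the choice of sample letters are illustrative assumptions.

```python
import unicodedata


def describe(char):
    """Return the code point (as U+XXXX) and the Unicode name of one character."""
    return f"U+{ord(char):04X}", unicodedata.name(char)


# The vowel "a" in two different scripts: Devanagari and Telugu.
for ch in ["अ", "అ"]:
    print(ch, *describe(ch))
```

The two letters represent the same sound but carry different code points (U+0905 and U+0C05), so no font-specific converter is needed to tell them apart.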

SLIDE 6

  • Transliteration (OM / IT3)
      • Developed by IISc Bangalore and Carnegie Mellon
      • Developed with user readability in mind: easier to read and type
      • It is case-insensitive.
      • A single transliteration scheme is used for all the Indian languages, as they share the same set of sounds.
      • Each character (corresponding to a phone/sound) is at most three letters long.
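
Because the scheme is case-insensitive and each symbol is at most three letters long, a transliterated word can be split into symbols with a greedy longest-match scan. The sketch below assumes such a scan is adequate. The `SYMBOLS` set is a small hypothetical subset chosen for illustration and is not the actual OM / IT3 table.

```python
# Hypothetical subset of transliteration symbols (NOT the real OM / IT3 inventory).
SYMBOLS = {
    "a", "aa", "i", "ii", "u", "e", "ei",
    "k", "kh", "g", "gh", "j", "t", "p", "m", "n", "r", "l", "s", "h", "y",
}
MAX_LEN = 3  # The slides state that no symbol is longer than three letters.


def tokenize(word):
    """Split a transliterated word into symbols, preferring the longest match."""
    word = word.lower()  # The scheme is case-insensitive.
    tokens = []
    i = 0
    while i < len(word):
        for size in range(min(MAX_LEN, len(word) - i), 0, -1):
            chunk = word[i:i + size]
            if chunk in SYMBOLS:
                tokens.append(chunk)
                i += size
                break
        else:
            raise ValueError(f"no symbol matches at position {i} in {word!r}")
    return tokens


print(tokenize("AMMAA"))
print(tokenize("pitaajii"))
```

Trying the longest candidate first keeps "aa" and "ii" together instead of splitting them into two short vowels.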

SLIDE 7

[Transliteration charts: Hindi, Telugu]

Reference: http://speech.iiit.ac.in/Transliteration/ and http://www.cs.cmu.edu/~madhavi/Om/

SLIDE 8

Particles

  • Hindi and some other Indian languages have a practice of adding a particle such as 'ji' or 'saaheba' after proper nouns.
  • They are added when the speaker wants to show respect to the person being referred to.
  • Examples:
      • Huma maasat’arajii sei milnei gayei (We went to meet the teacher)
      • Aaja pitaajii ghara para rahein’gei (Father will be at home today)

SLIDE 9

Example of Particle

<?xml version="1.0"?>
<speak version="1.0" xml:lang="hin-in" xml:type="IT3">
  <voice gender="female">
    Huma <particle type="ji">maastaar</particle> sei milnei gayei
  </voice>
</speak>
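
One way a synthesizer front end might consume the proposed `<particle>` tag is sketched below. It assumes the particle named in the `type` attribute is simply appended to the enclosed noun. The slides do not specify the rendering rule, so this behaviour and the function name `expand_particles` are assumptions. Parsing uses Python's standard `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET


def _walk(node, words):
    """Collect words in document order, expanding <particle> elements."""
    if node.tag == "particle":
        noun = "".join(node.itertext()).strip()
        # Assumption: the honorific particle is attached as a suffix to the noun.
        words.append(noun + node.get("type", ""))
        return
    if node.text:
        words.extend(node.text.split())
    for child in node:
        _walk(child, words)
        if child.tail:
            words.extend(child.tail.split())


def expand_particles(ssml):
    """Return the plain word sequence for an SSML string using <particle>."""
    words = []
    _walk(ET.fromstring(ssml), words)
    return " ".join(words)


doc = (
    '<speak version="1.0"><voice gender="female">'
    'Huma <particle type="ji">maastaar</particle> sei milnei gayei'
    "</voice></speak>"
)
print(expand_particles(doc))
```

Keeping the particle in markup rather than in the text lets the synthesizer choose how to render it, for instance with different prosody from the noun itself.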

SLIDE 10

Use of Loanword

  • A loanword (or loan word) is a word taken directly into one language from another with little or no translation.
  • Informal experiments suggested that 33% of the errors in TTS for ILs occur while rendering loan words
  • Such loan words could be detected automatically owing to the syllabic properties of the Indian languages

SLIDE 11

Example of loanword

  • CANCER has to be pronounced as /C/ /AE/ /N/ /S/ /A/ /R/
  • The /AE/ phoneme does not exist in the Indian language phone set
  • <loan>kaansar</loan>
  • Loan (non-native) words could be rendered using different pronunciation dictionaries or letter-to-sound rules
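
The idea of routing `<loan>` words to a separate pronunciation dictionary can be sketched as follows. The lexicon contents, the simplified native letter-to-sound rule, and the names `LOAN_LEXICON`, `native_letter_to_sound` and `pronounce` are all hypothetical and serve only as illustration.

```python
import re

# Hypothetical loan-word lexicon; the entry copies the phone sequence from the slide.
LOAN_LEXICON = {"kaansar": ["C", "AE", "N", "S", "A", "R"]}


def native_letter_to_sound(word):
    """Very simplified native rule (an assumption): long vowels are single phones."""
    return re.findall(r"aa|ii|uu|ei|[a-z]", word.lower())


def pronounce(word, is_loan=False):
    """Use the loan lexicon for words marked <loan>, otherwise the native rules."""
    key = word.lower()
    if is_loan and key in LOAN_LEXICON:
        return LOAN_LEXICON[key]
    return native_letter_to_sound(word)


print(pronounce("kaansar", is_loan=True))
print(pronounce("kaansar"))
```

With the loan flag set, the word keeps the /AE/ phone that the native phone set lacks. Without it, the native rules produce the long vowel /aa/ instead.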

SLIDE 12

Use of Mention

  • What is mention
      • I mention: refers to the first occurrence of a noun
      • II mention: refers to the second occurrence of a noun
  • More emphasis is placed on the first occurrence of a proper noun in a sentence or paragraph
  • The <mention> tag should be used to identify such repeated words when synthesizing speech

SLIDE 13

Duration prediction using Mention Information

  • Duration modeling using mention information of US English

                    Correlation   RMSE
  With MENTION      0.497         0.869
  Without MENTION   0.4580        0.876

SLIDE 14

Example of Mention

<?xml version="1.0"?>
<speak version="1.0">
  <voice gender="female">
    <mention occ="1">Gandhi</mention> was a major political and spiritual leader of the Indian Independence Movement.
    <mention occ="2">Gandhi</mention> was the pioneer of satyagraha
  </voice>
</speak>
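
Markup like the example above could be produced automatically by counting occurrences of known proper nouns. The sketch below assumes the list of proper nouns is supplied by the caller and that the `occ` counter keeps increasing past 2. The slides define only first and second mentions, so both points, along with the function name `tag_mentions`, are assumptions.

```python
import re
from collections import Counter


def tag_mentions(text, proper_nouns):
    """Wrap each listed proper noun in <mention occ="n">, numbering its occurrences."""
    if not proper_nouns:
        return text
    counts = Counter()
    alternatives = "|".join(re.escape(noun) for noun in proper_nouns)
    pattern = re.compile(r"\b(" + alternatives + r")\b")

    def wrap(match):
        noun = match.group(1)
        counts[noun] += 1
        return f'<mention occ="{counts[noun]}">{noun}</mention>'

    return pattern.sub(wrap, text)


sentence = "Gandhi was a leader. Gandhi was the pioneer of satyagraha"
print(tag_mentions(sentence, ["Gandhi"]))
```

A duration model can then read the `occ` attribute as a feature, which is the use the duration-prediction slide reports on.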

SLIDE 15

Conclusion

  • Issues in Indian scripts are discussed
  • Discussed the usage of <particle>, <loan> and <mention> extensions for SSML

SLIDE 16

Thanks…