Natural Language Processing: The saara Approach (Kavi Narayana Murthy)



SLIDE 1

July 2014 Kavi Narayana Murthy - UoH

Natural Language Processing The saara Approach

Kavi Narayana Murthy School of Computer and Information Sciences

University of Hyderabad

SLIDE 2

NLP

  • NLP Today: a data-driven empirical science. NLP systems are built by training language-independent, generic machine learning algorithms on large-scale language data.
  • Original goals of NLP: NL understanding, NL generation and NL learning
  • Meaning has gradually lost focus, almost forgotten?

SLIDE 3

The saara Approach

  • Given a sentence, how do we compute its meaning?
  • Given a word, how do we compute its meaning?
  • Problem: Computers do not understand meanings.
  • Solution: Structure indicates meaning. Use appropriate structures and manipulations.
  • Grammar: Mapping Structure to Meaning
  • Main Focus: Development of Computational Grammars
  • Philosophy: Do it right, no short cuts!

SLIDE 4

Grammar

  • Morphology: Relating word-internal structure to word meaning
  • Syntax: Relating sentence structure to sentence meaning
  • What exactly does a sentence mean?
  • Can be computed, provided:
    − Speaker knows exactly what to say
    − Speaker knows exactly how to say it
  • Universal / language-independent

SLIDE 5

Grammar and Usage

  • People are not always very careful. They may also not know what to say or how to express it. Thus, actual usage may not indicate the intended meaning directly, precisely and unambiguously.
  • Layered Approach: Grammar should be designed for careful usage only. This forms the core. We can then build layers or wrappers to handle all the variations we find in actual usage. This way, we can get a simple, neat, elegant, efficient grammar and cater to practical needs at the same time.
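
A minimal sketch of the layered idea, under stated assumptions: the tiny lexicon, the variant table and every function name below are hypothetical and are not taken from the saara system. The core accepts only careful, canonical input; each outer layer rewrites one kind of real-world variation and then hands the result back to the core.

```python
from typing import Callable, List, Optional

# Hypothetical core grammar: covers careful, canonical usage only.
CORE_LEXICON = {"cannot": "can + not", "do not": "do + not"}

def core_analyze(text: str) -> Optional[str]:
    """Return an analysis for canonical input, or None if the core does not cover it."""
    return CORE_LEXICON.get(text)

# Hypothetical outer layers: each maps one kind of usage variation to a canonical form.
def normalize_spacing_and_case(text: str) -> str:
    return " ".join(text.lower().split())

VARIANTS = {"can't": "cannot", "cant": "cannot", "don't": "do not"}

def expand_variant(text: str) -> str:
    return VARIANTS.get(text, text)

LAYERS: List[Callable[[str], str]] = [normalize_spacing_and_case, expand_variant]

def layered_analyze(text: str) -> Optional[str]:
    """Try the core first; on failure, apply the wrapper layers one by one and retry."""
    result = core_analyze(text)
    if result is not None:
        return result
    current = text
    for layer in LAYERS:
        current = layer(current)
        result = core_analyze(current)
        if result is not None:
            return result
    return None  # not covered by the core or by any layer

print(layered_analyze("cannot"))
print(layered_analyze("  Can't "))
print(layered_analyze("xyz"))
```

The core never needs to know about sloppy input: a new kind of variation is handled by adding a layer, not by changing the grammar.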

SLIDE 6

The saara System

  • Computational Models for
    − Analysis
    − Generation
    − Translation and other Applications
  • kannaDa-saara Alpha Ver 3
  • telugu-saaramu Alpha Ver 3
  • Lexical Resources and Tools
  • The saara Translator System

SLIDE 7

The saara Approach

  • Correct Analysis of Source Language
    − Can be translated into any other language
    − Only bilingual dictionary and transfer grammar required
  • Correct Generation
    − Can take the analysis produced by any other system for any other language and translate to our language
  • How to guarantee correctness?
    − Machine Learning cannot!
  • NLP is a technology. Q: What is the scientific foundation? A: The saara Approach!
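
The analysis, transfer and generation steps on this slide can be sketched roughly as follows. This is an illustration only, not the saara Translator: the three-word analyzer, the placeholder target words (`T_birds` and so on) and the subject-object-verb target order are all assumptions made for the example.

```python
from typing import Dict, List

Analysis = Dict[str, str]  # grammatical role -> source lemma

def analyze_source(sentence: str) -> Analysis:
    """Toy analyzer: assumes exactly three words in subject-verb-object order."""
    subject, verb, obj = sentence.lower().split()
    return {"subject": subject, "verb": verb, "object": obj}

# Hypothetical bilingual dictionary; the target words are placeholders, not real words.
BILINGUAL_DICT = {"birds": "T_birds", "eat": "T_eat", "seeds": "T_seeds"}

# Hypothetical transfer grammar: the target language is assumed to be subject-object-verb.
TARGET_ORDER = ["subject", "object", "verb"]

def transfer(analysis: Analysis) -> List[str]:
    """Map each lemma through the dictionary and reorder roles for the target language."""
    return [BILINGUAL_DICT[analysis[role]] for role in TARGET_ORDER]

def generate_target(words: List[str]) -> str:
    """Toy generator: joins words; a real generator would also inflect them."""
    return " ".join(words)

def translate(sentence: str) -> str:
    return generate_target(transfer(analyze_source(sentence)))

print(translate("Birds eat seeds"))  # prints "T_birds T_seeds T_eat"
```

Because the transfer step works only on the analysis, any analyzer that produces the same role structure could be plugged in front of it, which is the point the slide makes about taking analyses from other systems.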

SLIDE 8

Natural Language Engineering at SCIS, UoH

  • Foundations: Word, Word Classes, Sentence, Syntactic Relations, Parsing, ...
    − Universal and Precise Definitions
    − Lexical Resources and Tools
  • Core Research and Development:
    − Telugu, Kannada, Indian Languages, English
  • Applications
    − Text Categorization, Text Summarization, NERC
    − Language ID, WSD, Spell Checking, Anaphora Resolution
    − ASR, TTS, OCR, Machine Translation, IR, IE

SLIDE 9

SLIDE 10

If You are Interested

  • Yoga, Ayurveda, Holistic Healing
  • Vedanta
  • Classical Music
  • Books:

    – Brahmacharya
    – Ahimsa
    – Freedom

SLIDE 11

Just Released!

SLIDE 12

Freedom

See what the previewers have said:

"Here is one book which covers all aspects of life, simple, clear, written with a scientific bent of mind, a great book, very much unlike many other books I have seen on similar topics"
"The book is a masterpiece - no doubt"
"The book is simply awesome and I feel it is a 'must read' for all human beings"
"This is a book I love to read again and again"

SLIDE 13

Thank You

Visit: 202.41.85.68
Email: knmuh@yahoo.com

SLIDE 14

Words

  • What Exactly is a Word?
  • Universal Word Classes
  • Sub-Categorization
  • Tag-Set
    − Hierarchical, Extensible, Fine-Grained
    − More Than POS
  • Languages are NOT as ambiguous as they seem to be!
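
One common way to get a tag set that is hierarchical, extensible and fine-grained at once is to write each tag as a path from coarse to fine. The sketch below assumes dotted tags such as `N.COM.COUNT`; the tag names and helper functions are hypothetical and are not the saara tag set.

```python
from typing import List

# Hypothetical dotted tags: each extra level adds finer sub-categorization.
def ancestors(tag: str) -> List[str]:
    """Return every level of a dotted tag, coarsest first."""
    parts = tag.split(".")
    return [".".join(parts[:depth]) for depth in range(1, len(parts) + 1)]

def is_a(tag: str, category: str) -> bool:
    """True if the tag equals the category or is a refinement of it."""
    return category in ancestors(tag)

def coarsen(tag: str, depth: int) -> str:
    """Keep only the first `depth` levels, e.g. to fall back to a plain POS tag."""
    return ".".join(tag.split(".")[:depth])

print(ancestors("N.COM.COUNT"))      # ['N', 'N.COM', 'N.COM.COUNT']
print(is_a("N.COM.COUNT", "N.COM"))  # True
print(is_a("V.FIN.PAST", "N"))       # False
print(coarsen("V.FIN.PAST", 1))      # V
```

Extending the tag set means appending a level to an existing path, so tools that only look at the first level keep working unchanged.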

SLIDE 15

Words

  • Lexicon
  • Morphological Analysis and Generation
  • Stemming and Lemmatization
  • Spelling Error Detection and Correction
  • Lexical Resource Toolkit, Glossing
  • Tagging
    − No ML, No Training Data, No Manual Work
    − High Performance

SLIDE 16

Sentence

  • What exactly is a Sentence?
  • What exactly is Syntax?
  • Universal Syntactic Relations
  • How to identify them?

SLIDE 17

The saara Architecture

  • Purely Linguistic, no ML
  • Simple Pipe-Line Architecture
    − Phonemes, Words, Sentences, Discourse
    − No tagging, chunking, local-word-grouping, ...
  • Perl and Java – Platform Independent
    − Synchronization
    − Stand Alone, No other dependencies
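
A simple pipeline of this shape can be sketched as a fixed list of stages, each consuming what the previous one produced. The slide names Perl and Java as the implementation languages; Python is used here only for brevity, and the stage bodies are placeholders invented for the example, not the saara modules.

```python
from typing import Any, Callable, Dict, List

Document = Dict[str, Any]
Stage = Callable[[Document], Document]

def phoneme_stage(doc: Document) -> Document:
    # Placeholder for sound-level processing: lowercase and collapse whitespace.
    doc["text"] = " ".join(doc["raw"].lower().split())
    return doc

def word_stage(doc: Document) -> Document:
    # Placeholder word segmentation: split the normalized text on spaces.
    doc["words"] = doc["text"].split()
    return doc

def sentence_stage(doc: Document) -> Document:
    # Placeholder sentence grouping: a word ending in "." closes a sentence.
    sentences: List[List[str]] = []
    current: List[str] = []
    for word in doc["words"]:
        current.append(word)
        if word.endswith("."):
            sentences.append(current)
            current = []
    if current:
        sentences.append(current)
    doc["sentences"] = sentences
    return doc

def discourse_stage(doc: Document) -> Document:
    # Placeholder discourse level: records only how many sentences were found.
    doc["discourse"] = {"sentence_count": len(doc["sentences"])}
    return doc

PIPELINE: List[Stage] = [phoneme_stage, word_stage, sentence_stage, discourse_stage]

def run_pipeline(raw: str) -> Document:
    doc: Document = {"raw": raw}
    for stage in PIPELINE:
        doc = stage(doc)
    return doc

result = run_pipeline("Birds fly.  Fish swim.")
print(result["sentences"])  # [['birds', 'fly.'], ['fish', 'swim.']]
print(result["discourse"])  # {'sentence_count': 2}
```

Each stage depends only on the output of earlier stages, so a stage can be replaced or tested in isolation without touching the rest.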

SLIDE 18

Status, Plans

  • Word Level: Alpha Versions Released, Beta soon
    − > 90% Analyzed, Mostly Correct
    − First Complete Morph for Kannada/Telugu
  • Sentence Level: Going On
    − To be Ready in about a Year
  • Workshop Series
  • No Funding taken from any source so far.
    − TDIL/DeitY has now sanctioned a project.

SLIDE 19

References

  • Please visit our website!