EDAN20 Language Technology http://cs.lth.se/edan20/ Chapter 1: An Overview of Language Processing

SLIDE 1

Language Technology

EDAN20 Language Technology http://cs.lth.se/edan20/

Chapter 1: An Overview of Language Processing Pierre Nugues

Lund University Pierre.Nugues@cs.lth.se http://cs.lth.se/pierre_nugues/

August 28, 2017

Pierre Nugues EDAN20 Language Technology http://cs.lth.se/edan20/ August 28, 2017 1/20

SLIDE 2

Applications of Language Processing

  • Spelling and grammar checkers: MS Word, e-mail programs, etc.
  • Text indexing and information retrieval on the Internet: Google, Microsoft Bing, Yahoo, or software like Apache Lucene
  • Translation: Google Translate, SYSTRAN
  • Spoken interaction: Apple Siri, Google Now, Tellme.com, or SJ (trains in Sweden)
  • Speech dictation of letters or reports: IBM ViaVoice, Windows Vista

SLIDE 3

Applications of Language Processing (cont’d)

  • Direct translation from spoken English to spoken Swedish in a restricted domain: SRI and SICS
  • Voice control of domestic devices such as tape recorders (Philips) or disc changers (MS Persona)
  • Conversational agents able to converse and to plan: TRAINS
  • Spoken navigation in virtual worlds: Ulysse, Higgins
  • Generation of 3D scenes from text: Carsim
  • Question answering: IBM Watson and Jeopardy!

SLIDE 4

Linguistic Layers

  • Sounds
  • Phonemes
  • Words and morphology
  • Syntax and functions
  • Semantics
  • Dialogue

SLIDE 5

Sounds and Phonemes

[Figure: the sounds of “Serious” and of “C’est par là” (‘It is that way’)]

SLIDE 6

Lexicon and Parts of Speech

The big cat ate the gray mouse

English: The/article big/adjective cat/noun ate/verb the/article gray/adjective mouse/noun
French: Le/article gros/adjectif chat/nom mange/verbe la/article souris/nom grise/adjectif
German: Die/Artikel große/Adjektiv Katze/Substantiv ißt/Verb die/Artikel graue/Adjektiv Maus/Substantiv
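
The English annotation above can be reproduced with a lexicon lookup. The following is only a minimal sketch: the `LEXICON` table and the `tag` function are hypothetical names introduced here, the lexicon covers just the example sentence, and each word is assumed to have a single part of speech, which real taggers cannot assume.

```python
# Hypothetical toy lexicon covering only the words of the example sentence.
LEXICON = {
    "the": "article",
    "big": "adjective",
    "cat": "noun",
    "ate": "verb",
    "gray": "adjective",
    "mouse": "noun",
}


def tag(sentence):
    """Tag each word by lexicon lookup; unknown words are tagged 'unknown'."""
    return [(word, LEXICON.get(word.lower(), "unknown"))
            for word in sentence.split()]


tagged = tag("The big cat ate the gray mouse")
annotation = " ".join(f"{word}/{pos}" for word, pos in tagged)
print(annotation)
```

A plain lookup breaks down as soon as a word has several possible parts of speech, which is one form of the ambiguity discussed later in the chapter.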

SLIDE 7

Morphology

Word         Root form
worked       to work + verb + preterit
travaillé    travailler + verb + past participle
gearbeitet   arbeiten + verb + past participle
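
As an illustration of the English row of the table, here is a minimal sketch of a suffix-stripping analyzer. The `analyze` function and its single rule (a regular preterit is the root plus "ed") are assumptions made for this example; the rule overgenerates (it would also split "bed") and ignores irregular verbs.

```python
import re


def analyze(word):
    """Split a regular English preterit into (root, category, tense).

    Hypothetical single-rule analyzer: anything ending in "ed" is treated
    as a verb in the preterit. Other words are returned unanalyzed.
    """
    match = re.fullmatch(r"(\w+?)ed", word)
    if match:
        return (match.group(1), "verb", "preterit")
    return (word, "unknown", "unknown")


analysis = analyze("worked")
print(" + ".join(analysis))
```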

SLIDE 8

Syntactic Tree

sentence
    noun phrase
        article: The
        noun: boy
    verb phrase
        verb: hit
        noun phrase
            article: the
            noun: ball
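
One possible way to hold such a tree in a program is as nested tuples, where the first element of each tuple is the node label. This is a sketch only: the `TREE` constant and the `leaves` helper are names chosen for this example.

```python
# Each node is (label, child, ...); a preterminal node is (label, word).
TREE = (
    "sentence",
    ("noun phrase", ("article", "The"), ("noun", "boy")),
    ("verb phrase",
     ("verb", "hit"),
     ("noun phrase", ("article", "the"), ("noun", "ball"))),
)


def leaves(node):
    """Return the words at the leaves of the tree, from left to right."""
    label, *children = node
    if len(children) == 1 and isinstance(children[0], str):
        return [children[0]]
    return [word for child in children for word in leaves(child)]


words = leaves(TREE)
print(" ".join(words))
```

Reading the leaves from left to right gives back the original sentence, which is a quick sanity check on the structure.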

SLIDE 9

Syntax: A Classical View

A graph of dependencies and functions for the sentence: The boy hit the ball

[Figure: dependency graph with the functions Verb, Subject, and Object]
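
A dependency graph of this kind can be stored as a list of (head, function, dependent) triples. The sketch below assumes the usual reading of the sentence, with hit as the verb, boy as its subject, and ball as its object; the `DEPENDENCIES` list and `find_dependent` helper are hypothetical names.

```python
# (head, function, dependent) triples for "The boy hit the ball".
# Assumption: "hit" governs "boy" as subject and "ball" as object.
DEPENDENCIES = [
    ("hit", "subject", "boy"),
    ("hit", "object", "ball"),
]


def find_dependent(head, function):
    """Return the word that fills the given function of head, or None."""
    for h, f, dependent in DEPENDENCIES:
        if h == head and f == function:
            return dependent
    return None


subject = find_dependent("hit", "subject")
obj = find_dependent("hit", "object")
print(subject, obj)
```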

SLIDE 10

Semantics

As opposed to syntax:

  • 1. Colorless green ideas sleep furiously.
  • 2. *Furiously sleep ideas green colorless.

Determining the logical form:

Sentence                    Logical representation
Frank is writing notes      writing(Frank, notes).
François écrit des notes    écrit(François, notes).
Franz schreibt Notizen      schreibt(Franz, Notizen).
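
The mapping from the three sentences to their logical representations can be imitated with a very small sketch. It assumes that every sentence has the shape subject, predicate, object once function words are dropped; the `FUNCTION_WORDS` set and the `logical_form` function are hypothetical and only cover these examples, whereas a real system would rely on syntactic parsing.

```python
# Hypothetical stop list: just the function words of the example sentences.
FUNCTION_WORDS = {"is", "des"}


def logical_form(sentence):
    """Build predicate(subject, object). from a three-content-word sentence."""
    content = [w for w in sentence.split() if w.lower() not in FUNCTION_WORDS]
    subject, predicate, obj = content
    return f"{predicate}({subject}, {obj})."


forms = [logical_form(s) for s in (
    "Frank is writing notes",
    "François écrit des notes",
    "Franz schreibt Notizen",
)]
print("\n".join(forms))
```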

SLIDE 11

Lexical Semantics

Word senses:

  • 1. note (noun) short piece of writing;
  • 2. note (noun) a single sound at a particular level;
  • 3. note (noun) a piece of paper money;
  • 4. note (verb) to take notice of;
  • 5. note (noun) of note: of importance.
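
A sense inventory like this one can be represented as a dictionary from words to lists of (part of speech, gloss) pairs. The sketch below uses the five senses listed above; the `SENSES` table and `senses` function are hypothetical names for this example.

```python
# Sense inventory for "note", copied from the list above.
SENSES = {
    "note": [
        ("noun", "short piece of writing"),
        ("noun", "a single sound at a particular level"),
        ("noun", "a piece of paper money"),
        ("verb", "to take notice of"),
        ("noun", "of note: of importance"),
    ],
}


def senses(word, pos=None):
    """Return the glosses of word, optionally restricted to one part of speech."""
    return [gloss for p, gloss in SENSES.get(word, [])
            if pos is None or p == pos]


all_senses = senses("note")
verb_senses = senses("note", "verb")
print(len(all_senses), verb_senses)
```

Knowing the part of speech narrows the candidates, here from five senses to one for the verb, but the four noun senses still have to be told apart from context.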

SLIDE 12

Reference

  • 1. Sentence: Pierre wrote notes
  • 2. Logical representation: wrote(pierre, notes)
  • 3. Real world: people (Pierre, Louis, Charlotte) and notes (operating systems, language processing, Prolog programming)

Each term of the logical representation refers to an entity in the real world.

SLIDE 13

Ambiguity

Many analyses are ambiguous, which makes language processing difficult.
Ambiguity occurs in every layer: speech recognition, part-of-speech tagging, parsing, etc.

Example: the phonetic transcription of “The boys eat the sandwiches” may also correspond to:

  • The boy seat the sandwiches
  • the boy seat this and which is
  • the buoys eat the sand which is
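
The segmentation ambiguity in this example can be sketched on letters instead of phonemes: given an unsegmented string and a lexicon, enumerate every way of cutting the string into known words. The `segmentations` function and the five-word `LEXICON` are assumptions made for this illustration, not the method of any particular speech recognizer.

```python
def segmentations(text, lexicon):
    """Return every way of splitting text into a sequence of lexicon words."""
    if not text:
        return [[]]
    results = []
    for end in range(1, len(text) + 1):
        prefix = text[:end]
        if prefix in lexicon:
            for rest in segmentations(text[end:], lexicon):
                results.append([prefix] + rest)
    return results


# Hypothetical toy lexicon, just large enough to show two competing analyses.
LEXICON = {"the", "boy", "boys", "eat", "seat"}

analyses = [" ".join(words) for words in segmentations("theboyseat", LEXICON)]
print(analyses)
```

Both analyses are valid word sequences, so the choice between them has to come from other layers, such as syntax or statistics.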

SLIDE 14

Models and Tools

  • Linguistics has produced an impressive set of theories and models.
  • Language processing requires significant resources.
  • Models and tools have matured, and resources are now available.
  • Tools notably involve finite-state automata, regular expressions, logic, statistics, and machine learning.
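
As a small illustration of one of these tools, the sketch below uses regular expressions from Python's standard `re` module to tokenize a sentence and to find occurrences of the article "the". The sentence is borrowed from the parts-of-speech example; the variable names are chosen for this example only.

```python
import re

sentence = "The big cat ate the gray mouse"

# Tokenization: every maximal run of word characters is a token.
tokens = re.findall(r"\w+", sentence)

# Search: whole-word occurrences of "the", regardless of case.
articles = re.findall(r"\bthe\b", sentence, flags=re.IGNORECASE)

print(tokens)
print(articles)
```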

SLIDE 15

The Carsim System: A Text-to-Scene Converter

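Texts ⇒ (NLP engine) ⇒ XML templates ⇒ (Java 3D animation program) ⇒ 3D animation

Input text (excerpt):
Véhicule B venant de ma gauche, je me trouve dans le carrefour, à faible vitesse environ 40 km/h, quand le véhicule B, percute mon véhicule, et me refuse la priorité à droite. Le premier choc atteint mon aile arrière gauche,
‘Vehicle B coming from my left, I am in the intersection, at low speed, about 40 km/h, when vehicle B hits my vehicle and refuses me the right of way. The first impact hits my left rear fender,’

Template (excerpt):
// Static Objects
STATIC [
    ROAD
    TREE
]
// Dynamic Objects
DYNAMIC [
    VEHICLE [
        ID = vehicule_b;
        INITDIRECTION = east;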

SLIDE 16

Dialogue: The Persona Project from Microsoft Research

A conversation with Peedy

Turn     Utterance
         [Peedy is asleep on his perch]
User:    Good morning, Peedy.
         [Peedy rouses]
Peedy:   Good morning.
User:    Let’s do a demo.
         [Peedy stands up, smiles]
Peedy:   Your wish is my command, what would you like to hear?
User:    What have you got by Bonnie Raitt?
         [Peedy waves in a stream of notes, and grabs one as they rush by.]
Peedy:   I have “The Bonnie Raitt Collection” from 1990.
User:    Pick something from that
Peedy:   How about “Angel from Montgomery”?

SLIDE 17

Dialogue: The Persona Project from Microsoft Research

User:    Sounds good.
         [Peedy drops note on pile]
Peedy:   OK.
User:    Play some rock after that.
         [Peedy scans the notes again, selects one]
Peedy:   How about “Fools in love”?
User:    Who wrote that?
         [Peedy cups one wing to his ‘ear’]
Peedy:   Huh?
User:    Who wrote that?
         [Peedy looks up, scrunches his brow]
Peedy:   Joe Jackson
User:    Fine.
         [Drops note on pile]

SLIDE 18

Persona System Architecture

[Figure: Persona system architecture]

Source: http://research.microsoft.com/research/pubs/view.aspx?pubid=439

SLIDE 19

IBM Watson

  • IBM Watson: a system that can answer Jeopardy! questions better than human champions
  • Video: https://www.youtube.com/watch?v=WFR3lOm_xhE
  • IBM Watson builds on the extraction of knowledge from masses of texts: Wikipedia, the archive of the New York Times, etc.
  • Bottom line: text is the repository of human knowledge

SLIDE 20

IBM Watson: Simplified Architecture

Question ⇒ Question processing ⇒ Passage retrieval ⇒ Answer extraction ⇒ Answers

  • Question parsing and classification: syntactic parsing, entity recognition, answer classification
  • Document retrieval, then extraction and ranking of passages: indexing, vector space model
  • Extraction and ranking of answers: answer parsing, entity recognition
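
The passage-ranking step can be sketched with a bare-bones vector space model: questions and passages become bag-of-words vectors, and passages are sorted by cosine similarity to the question. This is an illustration under stated assumptions, not a description of Watson's implementation: the function names and the three passages are hypothetical, and there is no term weighting or indexing.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Map a text to a vector of lowercase word counts."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(u, v):
    """Cosine similarity between two word-count vectors."""
    dot = sum(u[word] * v[word] for word in u)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_passages(question, passages):
    """Sort passages by decreasing similarity to the question."""
    q = bag_of_words(question)
    return sorted(passages,
                  key=lambda p: cosine(q, bag_of_words(p)),
                  reverse=True)


# Hypothetical passages, for illustration only.
PASSAGES = [
    "Lund University is located in Sweden.",
    "The big cat ate the gray mouse.",
    "Prolog is a logic programming language.",
]

ranked = rank_passages("Where is Lund University?", PASSAGES)
print(ranked[0])
```

In a fuller pipeline, the top-ranked passages would then go to answer extraction, where parsing and entity recognition pick out the candidate answers.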
