Natural Language Processing: History & Limits, Roman Kern

SLIDE 1

Natural Language Processing: History & Limits

SCIENCE PASSION TECHNOLOGY

Roman Kern <rkern@tugraz.at> 2020-03-05

Roman Kern <rkern@tugraz.at>, Institute for Interactive Systems and Data Science 2020-03-05 1

SLIDE 2

Natural Language Processing: History & Limits

Outline

1 History
2 Language Basics
3 Limitations
4 Applications, Tools, Tasks

SLIDE 3

History

Where are we coming from (as a discipline)?

SLIDE 4

History

Motivational Example
Recall the Turing Test (1950): a test designed to assess whether a machine achieves a human level of intelligence
... via communication using a teleprinter, i.e., written text
Hence, NLP is often seen as a key technology for AI

SLIDE 5

History

Teleprinter Example

Figure: Teleprinter (teletypewriter, Teletype or TTY)

SLIDE 6

History

Early Machine Translation
Georgetown-IBM experiment (1952-54)
60 sentences were translated from Russian to English
Rule-based system
Highly constrained selection of sentences
Vocabulary of 250 words
Sparked interest and funding
The authors claimed that within three to five years, machine translation would be a solved problem

SLIDE 7

History

Influential Early Work
Syntactic Structures by Noam Chomsky (1957)
Book (based on lecture notes) proposing to analyse the structure of text
... and to transform it, so that machines can process it

Phrase-Structure Grammar

“Colorless green ideas sleep furiously”
Grammatically correct, but semantically meaningless
Also an example of a sentence that had never been formulated before
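The phrase-structure idea can be tried out with a small chart parser. The following is a minimal sketch, assuming a hypothetical toy grammar in Chomsky normal form that covers only the five words of the example; the names `LEXICON`, `RULES` and `cyk_recognize` are illustrative and not taken from Syntactic Structures. A purely syntactic check accepts the meaningless sentence but rejects the same words in reversed order.

```python
from itertools import product

# Hypothetical toy grammar in Chomsky normal form (illustration only).
LEXICON = {
    "colorless": {"Adj"},
    "green": {"Adj"},
    "ideas": {"N", "NP"},   # a bare plural noun can form a noun phrase
    "sleep": {"V", "VP"},   # an intransitive verb can form a verb phrase
    "furiously": {"Adv"},
}
RULES = {
    ("NP", "VP"): {"S"},
    ("Adj", "NP"): {"NP"},
    ("VP", "Adv"): {"VP"},
}


def cyk_recognize(words, start="S"):
    """Return True if the toy grammar derives the word sequence from `start`."""
    n = len(words)
    if n == 0:
        return False
    # table[i][j] holds all categories that can span words[i..j] (inclusive)
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, w in enumerate(words):
        table[i][i] = set(LEXICON.get(w, set()))
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            for k in range(i, j):  # split point between the two children
                for left, right in product(table[i][k], table[k + 1][j]):
                    table[i][j] |= RULES.get((left, right), set())
    return start in table[0][n - 1]
```

`cyk_recognize("colorless green ideas sleep furiously".split())` returns True, while the reversed word order returns False: grammaticality and meaning are independent of each other.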

SLIDE 8

History

Influential Early Work
“Early claims that computers can translate languages were vastly exaggerated”, Anthony Oettinger (1966)
“Time flies like an arrow” as an example of an ambiguous sentence
... time moves quickly? (figuratively)
... measure the speed of flies? (imperative)
... a species called “time flies” is fond of arrows?
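Counting parse trees instead of merely recognising the sentence makes the ambiguity visible. This is a sketch under stated assumptions: a hypothetical toy grammar (including a noun-compound rule NP → N N for “time flies”), where a VP spanning the whole input is counted as an imperative sentence. The function name `count_parses` is illustrative, and a realistic grammar would typically admit even more parses.

```python
from collections import defaultdict

# Hypothetical toy lexicon: each word may belong to several categories.
LEXICON = {
    "time": ["N", "NP", "V"],
    "flies": ["N", "NP", "V"],
    "like": ["V", "P"],
    "an": ["Det"],
    "arrow": ["N"],
}
# Binary rules (parent, left child, right child), Chomsky normal form.
RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "V", "PP"),
    ("VP", "VP", "PP"),
    ("PP", "P", "NP"),
    ("NP", "Det", "N"),
    ("NP", "N", "N"),  # noun compound, e.g. "time flies"
]


def count_parses(words, roots=("S", "VP")):
    """Count distinct trees per root category; a root VP = imperative reading."""
    n = len(words)
    # chart[(i, j)][cat] = number of trees of category cat over words[i:j]
    chart = defaultdict(lambda: defaultdict(int))
    for i, w in enumerate(words):
        for cat in LEXICON[w]:
            chart[(i, i + 1)][cat] += 1
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):  # split point
                for parent, left, right in RULES:
                    chart[(i, j)][parent] += chart[(i, k)][left] * chart[(k, j)][right]
    return {r: chart[(0, n)][r] for r in roots}
```

For the example sentence it returns {'S': 2, 'VP': 1}: two declarative readings and one imperative reading, matching the three interpretations listed above.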

SLIDE 9

History

ELIZA

Developed by Joseph Weizenbaum at MIT (1964-66)
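ELIZA produced its replies by spotting keywords in the input and reassembling fragments of it. The sketch below imitates that idea with regular expressions; the rules, the pronoun table and the function `respond` are hypothetical and far simpler than Weizenbaum's original DOCTOR script.

```python
import re

# Hypothetical pronoun swaps so that echoed fragments read naturally.
REFLECTIONS = {"i": "you", "me": "you", "my": "your", "am": "are",
               "you": "I", "your": "my"}
# Hypothetical (pattern, reassembly template) rules, tried in order.
RULES = [
    (re.compile(r"\bi need (.*)", re.I), "Why do you need {0}?"),
    (re.compile(r"\bi am (.*)", re.I), "How long have you been {0}?"),
    (re.compile(r"\bmy (.*)", re.I), "Tell me more about your {0}."),
]
DEFAULT = "Please go on."


def reflect(fragment):
    """Swap first- and second-person words in a captured fragment."""
    return " ".join(REFLECTIONS.get(w.lower(), w) for w in fragment.split())


def respond(utterance):
    """Return the reply of the first matching rule, or a neutral default."""
    text = utterance.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            return template.format(*(reflect(g) for g in match.groups()))
    return DEFAULT
```

`respond("I need my mother")` returns "Why do you need your mother?": no understanding is involved, only pattern matching and pronoun swapping.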

SLIDE 10

History

First AI Winter
Little progress
NLP (and other AI-related topics) received less funding
... due to the failure to deliver, e.g., a working machine translation system

→ relatively little (visible) progress achieved

during the late 1960s to early 1980s

SLIDE 11

History

Knowledge Representation
History of knowledge representation, a field of AI closely related to NLP
General Problem Solver (1959)
Computer program

Could solve “toy examples”
Dedicated programming language

Separated the knowledge from the solving itself
Expert systems
Introduced by Feigenbaum (1965)
Knowledge-based reasoning systems

SLIDE 12

History

Knowledge Representation
Types of knowledge representation
Frames
Inspired by psychological research (1930s)
Structure knowledge in hierarchical relationships

e.g., KL-ONE (1977), FrameNet (1997)
Kicktionary: http://www.kicktionary.de

Semantic networks
Inspired by the associative memory of humans

e.g., Aschaffenburg (early 20th century)

Cyc (1984)
Ontologies, e.g., RDF & OWL

SLIDE 13

History

Semantic Net

Example of an early semantic net (Collins and Quillian, 1960s)
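Such a net stores each property only once, at the most general node, and passes it down via is-a links. A minimal sketch, assuming a hypothetical hierarchy modelled on the well-known canary/bird/animal example; the function `lookup` also reports how many is-a links it had to follow.

```python
# Hypothetical semantic network: is-a edges plus properties stored per node.
IS_A = {"canary": "bird", "bird": "animal", "fish": "animal"}
PROPERTIES = {
    "animal": {"has skin", "can breathe"},
    "bird": {"has wings", "can fly"},
    "canary": {"can sing", "is yellow"},
}


def lookup(concept, prop):
    """Return the number of is-a links traversed to find prop, or None."""
    steps = 0
    node = concept
    while node is not None:
        if prop in PROPERTIES.get(node, set()):
            return steps
        node = IS_A.get(node)  # climb one level up the hierarchy
        steps += 1
    return None
```

Collins and Quillian predicted that verifying “a canary has skin” (two links, `lookup` returns 2) takes people longer than “a canary can sing” (zero links); the step count mirrors that prediction.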

SLIDE 14

History

Ontologies
In philosophy, ontology deals with the question of existence
Since the 1980s, the term has been used in computer science
Main components
Individuals (instances), classes (concepts), attributes and relations
Relations can often be freely defined
Upper ontologies vs. domain ontologies
Only a few upper ontologies exist

SLIDE 15

History

Knowledge Graph
Ontologies are still popular today
Term popularised by a Google initiative (2012)
Knowledge base represented as a graph
Well-known example: Freebase (2007)
Graph database (triple store)
Similar projects: YAGO, DBpedia, Wikidata
Relevant for NLP
WordNet
ConceptNet
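A triple store keeps facts as (subject, predicate, object) triples and answers queries by pattern matching. The following is a hypothetical in-memory sketch in which `None` acts as a wildcard; the triples and function names are illustrative, and real systems use dedicated storage and query languages (e.g., SPARQL for RDF data).

```python
# Hypothetical facts stored as (subject, predicate, object) triples.
TRIPLES = {
    ("Graz", "locatedIn", "Austria"),
    ("Vienna", "locatedIn", "Austria"),
    ("Vienna", "capitalOf", "Austria"),
    ("Austria", "memberOf", "EU"),
}


def match(pattern, triples=TRIPLES):
    """Return all triples matching an (s, p, o) pattern; None is a wildcard."""
    return sorted(
        t for t in triples
        if all(q is None or q == v for q, v in zip(pattern, t))
    )


def located_in_member_of(org):
    """Join two patterns: subjects located in a country that is a member of org."""
    return [s for s, _, country in match((None, "locatedIn", None))
            if match((country, "memberOf", org))]
```

`located_in_member_of("EU")` follows two edges of the graph and returns ['Graz', 'Vienna'].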

SLIDE 16

History

Logic
History of reasoning
Initial combination of rules and logic for inference and reasoning
e.g., first-order (predicate) logic
Notations
e.g., context-free grammars written in Backus-Naur form (BNF)
Fuzzy logic
Introduced by Lotfi A. Zadeh (1965)
Follows the intuition that decisions do not have “hard borders”
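In fuzzy logic, membership in a set is a degree between 0 and 1 rather than a yes/no decision. A minimal sketch, assuming a hypothetical membership function for “tall” (linear between 160 cm and 190 cm) together with Zadeh's standard operators: min for AND, max for OR and 1 - x for NOT.

```python
def tall(height_cm):
    """Hypothetical membership: 0 up to 160 cm, 1 from 190 cm, linear in between."""
    if height_cm <= 160:
        return 0.0
    if height_cm >= 190:
        return 1.0
    return (height_cm - 160) / 30


def fuzzy_and(a, b):
    return min(a, b)  # Zadeh's intersection


def fuzzy_or(a, b):
    return max(a, b)  # Zadeh's union


def fuzzy_not(a):
    return 1.0 - a    # Zadeh's complement
```

A height of 175 cm is “tall” to degree 0.5 (and “not tall” to degree 0.5), instead of flipping from false to true at some hard threshold.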

SLIDE 17

History

Corpus Linguistics
History of language corpora
Brown corpus (1961)
500 samples of English-language text
“Computational Analysis of Present-Day American English” by Henry Kučera and W. Nelson Francis (1967)

→ Word frequencies follow Zipf’s law

The Brown corpus was later also tagged
Each word was annotated with its part of speech (word class)

SLIDE 18

History

Paradigm Shift in NLP
Until the mid-1980s, the majority of work in NLP was based on rules
e.g., mostly hand-crafted rules
... using domain knowledge (linguists)
Shift toward statistical and stochastic models
e.g., machine learning
... in combination with corpus linguistics
“Every time I fire a linguist, the performance of the speech recognizer goes up”

Frederick Jelinek (1985)
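A typical statistical model of that era is the n-gram language model, estimated from corpus counts instead of hand-written rules. A minimal sketch of a bigram model with maximum-likelihood estimates, P(w | prev) = c(prev, w) / c(prev), assuming whitespace tokenisation and the hypothetical boundary markers <s> and </s>; no smoothing is applied.

```python
from collections import Counter


def train_bigram(sentences):
    """Return a function prob(prev, word) with maximum-likelihood estimates."""
    contexts, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        contexts.update(tokens[:-1])             # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))  # adjacent token pairs

    def prob(prev, word):
        if contexts[prev] == 0:
            return 0.0  # unseen context: no evidence at all
        return bigrams[(prev, word)] / contexts[prev]

    return prob
```

Trained on “I like tea”, “I like coffee” and “you like tea”, `prob("like", "tea")` is 2/3, while the unseen `prob("like", "milk")` is 0, which is exactly the data-sparsity problem discussed under Limitations.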

SLIDE 19

History

Machine Translation as Example for NLP History

History of machine translation: https://vas3k.com/blog/machine_translation/

SLIDE 20

History

Recent History of Deep Learning Based NLP
2001 Neural language models
2008 Multi-task learning
2013 Word embeddings
2013 Neural networks for NLP
2014 Sequence-to-sequence models
2015 Attention
2015 Memory-based networks
2018 Pretrained language models

Taken from: https://ruder.io/a-review-of-the-recent-history-of-nlp/

SLIDE 21

History

Overview of Terms Related to NLP
Speech recognition
Automatic speech recognition (ASR), speech to text (STT)
Natural language understanding (NLU)
“Machine reading”
Builds upon NLP
Natural language generation (NLG)
Language production
Often the input to a text-to-speech system
Computational linguistics
Interdisciplinary field of linguistics and computer science

SLIDE 22

Language Basics

Main basic concepts and terminology

SLIDE 23

Language Basics

Fun Facts about Human Languages
Some languages do not have words for left or right
More than 6,000 languages are spoken today
Languages differ in their word order
Sometimes a change in word order also changes the meaning
The human brain has specific regions for language processing
Language affects cognitive processes, e.g., speed
Some aspects of language are arbitrary
...

SLIDE 24

Language Basics

Basic Elements of Linguistics
Building blocks of spoken language
Phonetics: the sounds that make up a language
Phoneme → phones vs. grapheme → glyph
Phonology: the combination of sounds
Morphology: word formation (lexical)
Syntax: word combinations forming phrases and sentences
Semantics: the meaning of, e.g., sentences
Pragmatics: understanding of the context

SLIDE 25

Language Basics

Linguistics
Definition of morphology
Morphology: The study of the internal structure of words
Morphotactics: What morphemes are allowed and in what order
Morphophonology: How the form of morphemes is conditioned by other morphemes they combine with
Morphosyntax: How the morphemes in a word affect its combinatoric potential

Taken from: https://faculty.washington.edu/ebender/papers/100things.pdf

SLIDE 26

Language Basics

Morphological Types of Languages
Analytic languages (isolating languages): each word is a single morpheme
e.g., Mandarin Chinese

Extra function words mark plural or past tense

Synthetic languages: words may contain multiple morphemes
e.g., Hungarian

Instead of relying on word order, the word forms themselves signal subject and object
e.g., man biting dog vs. dog biting man

SLIDE 27

Language Basics

Types of Synthetic Languages
Agglutinating languages: morphemes can be freely joined to form (new) words
e.g., Hungarian, Swahili
A single word for “in our house”, or for “I will read”, “You will read”, “S/he will read”
Fusional languages: affixes cannot be separated from the stem
e.g., Spanish, Russian

Affixes encode person, number (sg/pl), tense (or even mood)

Polysynthetic languages
e.g., Sora

Combining multiple stems and affixes: “(Someone) will stab you with a knife in (your) belly”
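Agglutination and morphotactics can be illustrated with the Hungarian word for “in our house”, házunkban = ház (house) + unk (our) + ban (in). The sketch below is hypothetical: a three-morpheme lexicon with back-vowel suffix forms only and no vowel harmony, so it shows only how suffixes attach to a stem in a fixed order; `segment` and `gloss` are illustrative names.

```python
# Hypothetical mini-lexicon of Hungarian morphemes (back-vowel variants only;
# vowel harmony and other allomorphy are deliberately ignored).
STEMS = {"ház": "house"}
POSSESSIVE = {"unk": "our"}
CASE = {"ban": "in"}
SLOTS = (POSSESSIVE, CASE)  # morphotactics: possessive suffix before case suffix


def segment(word):
    """Split word into stem + optional suffixes in slot order, or return None."""
    for stem in STEMS:
        if not word.startswith(stem):
            continue
        rest, parts = word[len(stem):], [stem]
        for slot in SLOTS:
            for suffix in slot:
                if rest.startswith(suffix):
                    parts.append(suffix)
                    rest = rest[len(suffix):]
                    break
        if not rest:  # every character consumed in a legal order
            return parts
    return None


def gloss(parts):
    """Translate each morpheme and join them with hyphens."""
    meanings = {**STEMS, **POSSESSIVE, **CASE}
    return "-".join(meanings[p] for p in parts)
```

`segment("házunkban")` returns ['ház', 'unk', 'ban'] (glossed 'house-our-in'), while the same morphemes in the illegal order, "házbanunk", are rejected with None.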

SLIDE 28

Language Basics

Writing Systems
Pictographic/ideographic: symbols represent ideas/concepts, e.g., Aztec/Nahuatl
Logographic: (many) symbols represent individual words, e.g., Egyptian hieroglyphs
Syllabic: symbols represent syllables, e.g., Japanese Hiragana
Alphabetic: symbols represent sounds, e.g., Latin

Note: English is mostly alphabetic, with exceptions, e.g., the € symbol

SLIDE 29

Language Basics

Word Order
Languages can be grouped into these types (or mixtures of them):
VO languages: the verb precedes the object
OV languages: the object precedes the verb (German is mostly SOV+V2)

https://en.wikipedia.org/wiki/Subject%E2%80%93object%E2%80%93verb

SLIDE 30

Language Basics

Example Aspects of Syntax
Grammatical roles within a sentence
Heads
Required part of a constituent

e.g., the noun of a noun phrase (NP)

The head determines the number of arguments
Dependents: arguments and adjuncts
Adjuncts are optional
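The head/dependent distinction fits a small data structure. A minimal sketch, assuming a hypothetical valency lexicon that states how many arguments each verbal head requires (the subject counted as an argument); adjuncts can be added freely without affecting whether the phrase is complete.

```python
from dataclasses import dataclass, field

# Hypothetical valency lexicon: number of arguments each head requires.
VALENCY = {"sleeps": 1, "reads": 2, "gives": 3}


@dataclass
class Phrase:
    head: str
    arguments: list = field(default_factory=list)  # required dependents
    adjuncts: list = field(default_factory=list)   # optional dependents

    def is_complete(self):
        # The head fixes the number of arguments; adjuncts never count.
        return len(self.arguments) == VALENCY[self.head]
```

`Phrase("reads", ["Anna", "a book"], ["in the garden"]).is_complete()` is True, whereas `Phrase("gives", ["Anna", "a book"]).is_complete()` is False because “gives” needs a third argument.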

SLIDE 31

Language Basics

Example Aspects of Semantics
Meaning of words and sentences
Not all words carry semantics
Some serve syntactic functions
These vary between languages
Synonymy: same meaning, different words
Homonymy: same word, different meanings
Polysemy: same word, different senses (slightly different meanings)

SLIDE 32

Limitations

Known issues and common limitations

SLIDE 33

Limitations

Upper Bounds
Human (dis-)agreement
e.g., about 97% for PoS tagging
Dataset sizes
Estimating probabilities requires a lot of evidence

More complex models (bigram, trigram, ...) require exponentially larger datasets
Words are not a closed class, e.g., neologisms

Sentences might be unique (previously unseen)
Languages are vastly different
Resources are unevenly distributed
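The “exponentially larger datasets” argument follows from counting: with a vocabulary of V word types there are V^n possible n-grams, while a corpus of N tokens contains at most N - n + 1 of them. A minimal sketch that compares both numbers for any token list; the function names are hypothetical.

```python
def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return list(zip(*(tokens[i:] for i in range(n))))


def sparsity_report(tokens, max_n=3):
    """Return (n, observed distinct n-grams, possible n-grams V**n) per order."""
    vocab_size = len(set(tokens))
    report = []
    for n in range(1, max_n + 1):
        observed = len(set(ngrams(tokens, n)))
        possible = vocab_size ** n
        report.append((n, observed, possible))
    return report
```

For “the cat sat on the mat” (V = 5) it reports 5 of 5 possible unigrams, 5 of 25 bigrams and 4 of 125 trigrams; with a realistic vocabulary the gap widens far faster than any corpus can grow.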

SLIDE 34

Limitations

Limitations
Computational resources
Many more complex models are conceivable
... but not tractable, e.g., NP-hard

→ need to cut corners

Complexity of the problem
Might be AI-complete, e.g., word sense disambiguation
... human language is derived from human cognition

SLIDE 35

Limitations

Open Issues
Unsolved problems
Long-range dependencies

e.g., for anaphora/co-reference resolution

Knowledge representation
Many approaches from deep learning

... work for images, but not for text

Reasoning
Connotations

e.g., for irony detection

...

SLIDE 36

Limitations

Summary
Current AI/NLP systems
... are very clever¹ at
... appearing clever²
... despite their limitations

¹ As in cleverly engineered
² As in human-like

SLIDE 37

Applications, Tools, Tasks

What are the main tasks and applications?

SLIDE 38

Applications, Tools, Tasks

Overview

NLP

Applications Tasks Tools Algorithms

SLIDE 39

Applications, Tools, Tasks

Example Applications
Machine translation
Speech to text
Chatbots
Spam detection
Spell checking
Anonymisation of text
Semantic search
...

SLIDE 40

Applications, Tools, Tasks

Tasks
Semantic similarity
Text summarisation
Named entity recognition (NER)
Document classification
PoS tagging
Word segmentation
Question answering
Sentiment analysis
More complete list: https://aclweb.org/aclwiki/State_of_the_art

SLIDE 41

Applications, Tools, Tasks

Tools
Java systems (current and partially outdated)
GATE
Apache OpenNLP
Mallet
Stanford CoreNLP

SLIDE 42

Applications, Tools, Tasks

Tools
Current Python libraries
spaCy
NLTK (+TextBlob)
Gensim
AllenNLP

SLIDE 43

Applications, Tools, Tasks

Tools
Other tools
Spark NLP
fastText
NLP Architect
...

SLIDE 44

Thank You!

Next: Traditional NLP Pipeline
