statistical natural language processing

Statistical Natural Language Processing, Çağrı Çöltekin - PowerPoint PPT Presentation



  1. Statistical Natural Language Processing Çağrı Çöltekin ccoltekin@sfs.uni-tuebingen.de University of Tübingen Seminar für Sprachwissenschaft Summer Semester 2019 / tʃaːɾˈɯ tʃœltecˈɪn /

  2. Why study (statistical) NLP
     • (Most of) you are studying in a ‘computational linguistics’ program
     • Many practical applications
     • Investigating basic questions in linguistics and cognitive science (and more)

  3. Application examples (just a few examples)
     For profit (engineering):
     • Machine translation
     • Question answering
     • Information retrieval
     • Dialog systems
     • Summarization
     • Text classification
     • Text mining/analytics
     • Sentiment analysis
     • Speech recognition and synthesis
     • Automatic grading
     • Forensic linguistics
     For fun (research):
     • Modeling language processing and learning
     • Investigating language change through time and space
     • (Aiding) language documentation through text processing
     • (Automatic) corpus annotation for linguistic research
     • Stylometry, author identification

  4. Layers of linguistic analysis (figure)
     • Layers: phonetics / phonology, morphology, syntax, semantics, discourse
     • Analysis: Speech Recognition, Morphological Analysis, Parsing, Semantic analysis, Discourse analysis
     • Generation: Sentence Planning, Sentence Generation, Word Generation, Speech Synthesis

  5-8. Annotation layers: example (built up over four slides: → Tokens → POS Tags → Morphology → Syntax)

     Tokens:      From   the   AP     comes    this      story   :
     POS tags:    ADP    DET   PROPN  VERB     DET       NOUN    PUNCT
     Morphology:         Def   Sing   3s,Pres  Sing,Dem  Sing
     Syntax:      case   det   obl    root     det       nsubj   punct

  9. Typical NLP pipeline
     • Text processing / normalization
     • Word/sentence tokenization
     • POS tagging
     • Morphological analysis
     • Syntactic parsing
     • Semantic parsing
     • Named entity recognition
     • Coreference resolution

  10. Do we need a pipeline?
     • Most “traditional” NLP architectures are based on a pipeline approach: tasks are done individually, and results are passed to the upper level
     • Joint learning (e.g., POS tagging and syntax) often improves the results
     • End-to-end learning (without intermediate layers) is another (recent/trending) approach
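
The pipeline idea, where each task is done individually and its result is handed to the next level, can be sketched in a few lines of Python. This is only a toy illustration and not part of the lecture: the stage functions (`normalize`, `split_sentences`, `tokenize`) and their regular expressions are hypothetical stand-ins for real components.

```python
import re

def normalize(text):
    """Collapse runs of whitespace; a stand-in for real text normalization."""
    return re.sub(r"\s+", " ", text).strip()

def split_sentences(text):
    """Naive sentence tokenizer: split at a space that follows ., ! or ?"""
    return re.split(r"(?<=[.!?]) ", text)

def tokenize(sentence):
    """Naive word tokenizer: runs of word characters, or single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)

def pipeline(text):
    """Run the stages in order; each stage consumes the previous stage's output."""
    return [tokenize(s) for s in split_sentences(normalize(text))]

result = pipeline("From the AP  comes this story: it is short. Is it true?")
for tokens in result:
    print(tokens)
```

Because every stage only sees the output of the one before it, an error made early (for example, a wrong sentence split) is passed on to all later stages, which is one motivation for the joint and end-to-end approaches mentioned on the slide.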

  11. On the word ‘statistical’
     “But it must be recognized that the notion ‘probability of a sentence’ is an entirely useless one, under any known interpretation of this term.” (Chomsky, 1968)
     • Some linguistic traditions emphasize(d) the use of ‘symbolic’, rule-based methods
     • Some NLP systems are rule-based (esp. from the 80s and 90s)
     • Virtually all modern NLP systems include some sort of statistical component

  12. What is difficult with NLP?
     • Combinatorial problems: computational complexity
     • Ambiguity
     • Data sparseness

  13-14. NLP and computational complexity
     • How many possible parses can a sentence have?
     • How many ways can you align two (parallel) sentences?
     • How do we calculate the probability of a sentence based on the probabilities of the words in it?
     • Many similar questions we deal with have an exponential search space
     • Naive approaches are often computationally intractable
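
As a minimal sketch of the last question on the slide, one simple (and strongly simplifying) answer is to assume the words are independent and multiply their probabilities, working in log space to avoid underflow on long sentences. The independence assumption, the function name `sentence_logprob`, and the toy probabilities below are all assumptions made for illustration; they are not from the lecture.

```python
import math

# Hypothetical toy word probabilities, made up for illustration only.
word_prob = {"the": 0.5, "story": 0.25, "comes": 0.25}

def sentence_logprob(words, probs):
    """Log2 probability of a sentence, assuming the words are independent."""
    return sum(math.log2(probs[w]) for w in words)

lp = sentence_logprob(["the", "story", "comes"], word_prob)
prob = 2 ** lp  # convert back from log space

print(lp, prob)
```

Under this assumption every ordering of the same words gets the same probability, which is one reason more realistic models condition each word on its context.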

  15-20. Combinatorial problems. A typical linguistic problem: parsing
     How many different binary trees can span a sentence of N words?
     (Figure, built up over six slides: the possible binary trees over the words a b, a b c, a b c d, a b c d e, …)

     words    trees
     2        1
     3        2
     4        5
     5        14
     10       4862
     20       1 767 263 190
     …        …
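
The counts on this slide (1, 2, 5, 14, 4862, 1 767 263 190) are Catalan numbers: the number of distinct binary trees with N leaves is the (N − 1)-th Catalan number, C(n) = (2n choose n) / (n + 1). A short sketch that reproduces them, where the function name `binary_trees` is chosen here for illustration:

```python
from math import comb  # available in Python 3.8 and later

def binary_trees(n_words):
    """Number of distinct binary trees spanning n_words words (leaves).

    This is the Catalan number C(n) with n = n_words - 1.
    """
    n = n_words - 1
    return comb(2 * n, n) // (n + 1)

for n_words in (2, 3, 4, 5, 10, 20):
    print(n_words, binary_trees(n_words))
```

Catalan numbers grow roughly like 4^n, so enumerating every tree quickly becomes infeasible as sentences get longer.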

  21. NLP and ambiguity: fun with newspaper headlines
     • FARMER BILL DIES IN HOUSE
     • TEACHER STRIKES IDLE KIDS
     • SQUAD HELPS DOG BITE VICTIM
     • BAN ON NUDE DANCING ON GOVERNOR’S DESK
     • KIDS MAKE NUTRITIOUS SNACKS
     • DRUNK GETS NINE MONTHS IN VIOLIN CASE
     • MINERS REFUSE TO WORK AFTER DEATH
     • PROSTITUTES APPEAL TO POPE

