SLIDE 1

Statistical Natural Language Processing

An overview of NLP applications: some topics not covered during the course Çağrı Çöltekin

University of Tübingen Seminar für Sprachwissenschaft

Summer Semester 2019

SLIDE 2

Some remarks on the exam

first things first

  • Exam is scheduled on Fri July 26, start at 10:00, 10:30, or 11:00?
  • The duration is 2 hours
  • The exam (type of questions, length) will be similar to last year’s exam
  • Topics may shift, covering anything we studied during the course
  • You can bring a ‘cheat sheet’:

– Single A4 paper with anything that you want to remember
– You can use both sides
– You can hand-write/print as small as you like, but it should be legible with the naked eye

Questions?

Ç. Çöltekin, SfS / University of Tübingen Summer Semester 2019 1 / 20

SLIDE 3

Resit

nobody will need it, but just in case...

  • Note that your final score is a combination of

– Exam (40 %)
– Assignments, best 6 scores out of 7 (60 %)
– Attendance (+5 %)
– Easter-egg bonus

  • The exam scores will be announced (at the latest) the week after the exam
  • The last two assignments will be graded in August
  • You can take a resit exam if your overall score is below 60 %; you can reach 60 % by improving your exam score
  • The resit will be scheduled before the beginning of the winter semester, likely the first (maybe second) week of October

SLIDE 4

A quick summary so far

Part I Background & machine learning

– Math: linear algebra, probability & information theory
– Supervised methods: regression / classification
– How to evaluate machine learning methods
– Neural networks
– Sequence learning
– Unsupervised learning

Part II NLP methods

– Tokenization / segmentation
– N-gram language models
– Statistical parsing
– Vector representations / vector semantics

Part III (would be) NLP applications

SLIDE 5

Machine translation

what & why

  • Motivation for MT does not need many words: it is the example you give to your grandmother when she asks ‘what does a computational linguist do?’
  • Rule-based machine translation is difficult
  • Most modern MT systems are statistical

SLIDE 6

Machine translation

how: basic idea

arg max_e p(e|f) = arg max_e p(f|e) p(e)

  • The above defines a noisy-channel model
  • p(f|e) is estimated with the noisy channel idea
  • p(e) is a language model
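The decision rule can be made concrete with a toy decoder. All sentences and probability values below are invented for illustration (they are not from the slides); the point is only how the language model p(e) rescues fluent output from a purely faithful translation model p(f|e).

```python
# Toy noisy-channel decoder: pick the English sentence e maximizing
# p(f|e) * p(e) for a fixed foreign sentence f.

def decode(f, candidates, p_f_given_e, p_e):
    """Return arg max_e p(f|e) * p(e) over the candidate translations."""
    return max(candidates,
               key=lambda e: p_f_given_e.get((f, e), 0.0) * p_e.get(e, 0.0))

# Hypothetical model scores for the German input "das haus"
p_f_given_e = {
    ("das haus", "the house"): 0.8,   # faithful and fluent
    ("das haus", "the home"): 0.7,
    ("das haus", "house the"): 0.9,   # most faithful, but bad English
}
p_e = {
    "the house": 0.05,
    "the home": 0.02,
    "house the": 0.001,  # the language model penalizes the ungrammatical string
}

best = decode("das haus", list(p_e), p_f_given_e, p_e)
print(best)  # "the house": 0.8 * 0.05 beats 0.9 * 0.001
```

The faithful-but-ungrammatical candidate wins on p(f|e) alone; multiplying in p(e) flips the decision, which is exactly the division of labor the noisy-channel factorization encodes.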

SLIDE 7

Machine translation

how: phrase-based MT

arg max_e p(e|f) = arg max_e p(f|e) p(e)

Using a parallel corpus,

  • Align sentences, estimate p(f|e)
  • We can estimate p(e) even from a (larger) monolingual corpus
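A minimal sketch of the estimation step, under heavy simplifying assumptions (word-level aligned pairs instead of phrases, a unigram language model, tiny invented corpora):

```python
from collections import Counter

# Hypothetical word-aligned (e, f) pairs extracted from a parallel corpus
aligned = [("house", "haus"), ("house", "haus"), ("house", "gebäude"),
           ("the", "das"), ("the", "der")]

# p(f|e) by relative frequency: count(e, f) / count(e)
pair_counts = Counter(aligned)
e_counts = Counter(e for e, _ in aligned)

def p_f_given_e(f, e):
    return pair_counts[(e, f)] / e_counts[e]

# p(e) from a (larger) monolingual corpus; here a unigram model for brevity
mono = "the house is near the old house".split()
uni = Counter(mono)

def p_e(w):
    return uni[w] / len(mono)

print(p_f_given_e("haus", "house"))  # 2/3: "house" aligned to "haus" 2 of 3 times
print(p_e("the"))                    # 2/7: 2 of the 7 monolingual tokens
```

Real phrase-based systems estimate these counts over phrase pairs extracted from automatic word alignments, and use n-gram rather than unigram language models, but the relative-frequency idea is the same.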

SLIDE 8

Machine translation

how: end-to-end systems (mostly neural)

arg max_e p(e|f) = arg max_e p(f|e) p(e)

Estimate p(e|f) directly, typically with a recurrent neural network.

[Figure: encoder-decoder architecture; an encoder RNN reads the source tokens f1 f2 f3 </s>, and a decoder RNN generates the target tokens e1 e2 e3 e4 </s>]
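A shape-level sketch of the encoder-decoder idea in plain NumPy. The weights are random (untrained), so the output sequence is arbitrary, but the data flow matches the architecture described above: encode the source into a vector, then greedily decode target tokens until </s>. All sizes, names, and the choice of a plain tanh RNN are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
V_SRC, V_TGT, H, EOS = 5, 6, 8, 0  # vocab sizes, hidden size, assumed </s> id

# Random (untrained) parameters; a real NMT system learns all of these.
E_src = rng.normal(size=(V_SRC, H))        # source embeddings
E_tgt = rng.normal(size=(V_TGT, H))        # target embeddings
W_enc = rng.normal(size=(H, H)) * 0.1
W_dec = rng.normal(size=(H, H)) * 0.1
W_out = rng.normal(size=(H, V_TGT)) * 0.1  # hidden state -> target-vocab scores

def encode(src_ids):
    """Run a simple RNN over the source; return the final hidden state."""
    h = np.zeros(H)
    for i in src_ids:
        h = np.tanh(E_src[i] + W_enc @ h)
    return h

def decode(h, max_len=10):
    """Greedy decoding: feed the predicted token back in until </s>."""
    out, prev = [], EOS  # start symbol conflated with </s> for brevity
    for _ in range(max_len):
        h = np.tanh(E_tgt[prev] + W_dec @ h)
        prev = int(np.argmax(h @ W_out))  # pick the highest-scoring next token
        if prev == EOS:
            break
        out.append(prev)
    return out

translation = decode(encode([1, 2, 3]))
print(translation)  # some sequence of target ids; untrained, so arbitrary
```

Training would maximize the probability of reference translations, turning the argmax over random scores into the arg max_e p(e|f) of the slide.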


SLIDE 10

Machine translation

How does it work? (1)

SLIDE 11

Machine translation

How does it work? (2)

SLIDE 12

Machine translation

How does it work? (seriously)

  • Works fine if you have lots of parallel text
  • A lot of work remains in:

– Solving issues with ambiguities, idioms, special/rare constructions
– Low-resource languages

SLIDE 13

Entity recognition

what & why

UN/ORG Secretary-General/NONE Antonio/PER Guterres/PER plans/NONE to/NONE visit/NONE Ukraine/GEO

  • Many other applications depend on locating certain entities in text
  • Typical entities of interest include: people, organizations, locations
  • Can be application-specific too: e.g., drug/disease names

SLIDE 14

Entity recognition

how

  • Generally viewed as a typical sequence learning task
  • Any sequence learning model applies: e.g., HMMs, RNNs
  • Some linguistic processing is often helpful (e.g., POS tagging)
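As an example of applying a sequence model to NER, here is a minimal HMM tagger decoded with the Viterbi algorithm. The tag set and all probability tables are invented toy values, chosen so that the running example decodes sensibly.

```python
import math

# Toy HMM-style NER tagger decoded with Viterbi.
TAGS = ["O", "PER", "ORG"]
start = {"O": 0.7, "PER": 0.2, "ORG": 0.1}
trans = {  # p(tag_t | tag_{t-1})
    "O":   {"O": 0.8, "PER": 0.1, "ORG": 0.1},
    "PER": {"O": 0.6, "PER": 0.3, "ORG": 0.1},
    "ORG": {"O": 0.6, "PER": 0.1, "ORG": 0.3},
}
emit = {  # p(word | tag); unseen words get a small probability floor
    "O":   {"plans": 0.3, "to": 0.3, "visit": 0.3},
    "PER": {"antonio": 0.5, "guterres": 0.5},
    "ORG": {"un": 0.9},
}
FLOOR = 1e-6

def viterbi(words):
    """Best tag sequence under the HMM, via dynamic programming in log space."""
    vit = [{t: math.log(start[t]) + math.log(emit[t].get(words[0], FLOOR))
            for t in TAGS}]
    back = []  # back-pointers: best previous tag for each (position, tag)
    for w in words[1:]:
        row, ptr = {}, {}
        for t in TAGS:
            best_prev = max(TAGS, key=lambda p: vit[-1][p] + math.log(trans[p][t]))
            row[t] = (vit[-1][best_prev] + math.log(trans[best_prev][t])
                      + math.log(emit[t].get(w, FLOOR)))
            ptr[t] = best_prev
        vit.append(row)
        back.append(ptr)
    tag = max(TAGS, key=lambda t: vit[-1][t])  # best final tag
    path = [tag]
    for ptr in reversed(back):                 # follow back-pointers
        tag = ptr[tag]
        path.append(tag)
    return path[::-1]

print(viterbi("un plans to visit".split()))  # ['ORG', 'O', 'O', 'O']
```

An RNN tagger would replace the hand-specified transition and emission tables with learned scores, but decoding the best tag sequence is the same kind of computation.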

SLIDE 15

Relation extraction

what & why

UN/ORG Secretary-General/NONE Antonio/PER Guterres/PER plans/NONE to/NONE visit/NONE Ukraine/GEO (relation: Guterres head-of UN)

  • For many other tasks, we need not only the entities, but also the relations between them

SLIDE 16

Relation extraction

how

  • Many approaches rely on patterns
  • Using classifiers on annotated data is also popular:

1. Extract all pairs of entities of interest
2. Train the classifier to predict whether the entities are related

  • Semi-supervised learning methods are common
  • Does it also look like dependency parsing?
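The two steps above can be sketched as follows. The tagged sentence is the running example from the slides; the single hard-coded pattern is a hypothetical stand-in for a trained classifier.

```python
from itertools import combinations

# Step 1: extract all pairs of entities of interest from a tagged sentence.
# Step 2: "classify" each pair as related or not; a real system would train
# this decision on annotated data.
sentence = "UN Secretary-General Antonio Guterres plans to visit Ukraine"
entities = [("UN", "ORG"), ("Antonio Guterres", "PER"), ("Ukraine", "GEO")]

def related(e1, e2, text):
    # Stand-in for a trained classifier: fire on the ORG/PER pair when the
    # "title" cue for the head-of construction appears in the text.
    (_, t1), (_, t2) = e1, e2
    return {t1, t2} == {"ORG", "PER"} and "Secretary-General" in text

relations = []
for e1, e2 in combinations(entities, 2):   # step 1: all entity pairs
    if related(e1, e2, sentence):          # step 2: classify each pair
        per = e1[0] if e1[1] == "PER" else e2[0]
        org = e1[0] if e1[1] == "ORG" else e2[0]
        relations.append((per, "head-of", org))

print(relations)  # [('Antonio Guterres', 'head-of', 'UN')]
```

Semi-supervised approaches bootstrap from a few seed patterns like this one, using the extracted pairs to find new patterns and vice versa.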

SLIDE 17

Summarization

what & why

  • We have lots and lots of text on any subject of choice
  • You probably use them daily (e.g., news aggregators), but applications of summarization are much wider
  • Summarization

– reduces the reading time
– helps select the right documents to read
– may improve/help with

  • indexing
  • storing/processing/searching large document collections
  • other applications like question answering

SLIDE 18

Summarization

how

Extractive summarization selects important sentences from the text.

  • The task is binary classification (paying attention to the sequence)
  • The classifier decides whether to keep or discard each sentence in the summary

Abstractive summarization fuses sentences, combining and re-structuring them.

How about treating it like a machine translation problem?

  • RNNs of the sort used in MT have lately been popular for summarization too
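The extractive keep/discard decision can be sketched as follows. The word-frequency scorer below is my simplification standing in for a trained classifier (it is a classic extractive baseline, not the slides' method), but the overall shape, score each sentence and keep the best ones in document order, is the same.

```python
import re
from collections import Counter

def extractive_summary(text, keep=2):
    """Keep the `keep` top-scoring sentences, in their original order.
    Sentences are scored by the average document frequency of their words;
    a trained keep/discard classifier would replace this scorer."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(re.findall(r"\w+", text.lower()))
    def score(s):
        toks = re.findall(r"\w+", s.lower())
        return sum(freq[t] for t in toks) / max(len(toks), 1)
    ranked = sorted(sentences, key=score, reverse=True)[:keep]
    return [s for s in sentences if s in ranked]  # restore document order

doc = ("Machine translation maps text between languages. "
       "Neural machine translation dominates current research. "
       "The weather was pleasant yesterday. ")
print(extractive_summary(doc, keep=2))  # drops the off-topic weather sentence
```

Sentences sharing the document's frequent vocabulary score high, so the off-topic sentence is discarded; this is the keep/discard decision made with a hand-built scorer instead of a learned one.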

SLIDE 19

Question answering

what & why

  • QA is another NLP application that needs little explanation
  • The task: given a question, find the answer in a database or an unstructured document collection
  • Domain-specific systems are common
  • More general QA systems can perform well, sometimes better than humans (e.g., IBM Watson)
  • Also an important part of modern personal assistant systems
  • Most systems are complex, combining many of the methods we discussed in the class (and more)

SLIDE 20

Question answering

how

  • The natural language questions are turned into formal queries, which are then searched in a database

– Linguistic processing (parsing) helps
– Supervised methods can learn queries from natural language questions

  • Again, RNNs have been a recently popular approach

[Figure: the question and the text containing the answer are each encoded by an RNN, which together produce the answer]
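A toy illustration of turning a question into a formal query: a single surface pattern is compiled into a lookup over a tiny database. The pattern and the capital-city table are invented; as noted above, real systems use parsing or supervised learning rather than one hand-written regex.

```python
import re

# Invented "database": a single relation mapping countries to capitals.
capitals = {"germany": "Berlin", "france": "Paris", "turkey": "Ankara"}

def answer(question):
    """Map a matched question pattern to a database lookup (the formal query)."""
    m = re.match(r"what is the capital of (\w+)\??", question.lower())
    if m:
        return capitals.get(m.group(1), "unknown")
    return "unknown"  # no pattern matched: a real system would back off

print(answer("What is the capital of France?"))  # Paris
```

Each additional question type would need another pattern and query template, which is exactly why supervised methods that learn the question-to-query mapping scale better.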

SLIDE 21

More…

  • Topic modeling / text mining
  • Information extraction
  • Coreference resolution
  • Semantic role labeling
  • Dialog systems
  • Speech recognition
  • Speech synthesis
  • Spelling correction
  • Text normalization

SLIDE 22

Summary

  • Many other problems/applications in NLP can be solved with the methods we studied in this course
  • Most of the real-world problems require a combination of multiple methods

Next:

Mon Summary & your questions
Wed Assignments 6 & 7, exam questions/discussion
Fri Exam


SLIDE 24

Additional reading, references, credits

  • The textbook (Jurafsky and Martin 2009) includes detailed information on many of these problems/applications (more in the 3rd edition draft)
