Statistical Natural Language Processing
- Dr. Besnik Fetahu
Lecture: Thursdays, 10:00-11:30
Location: MultiMedia Raum, L3S, Appelstr. 9a
Contact: Dr. Besnik Fetahu, Tel: 17797, E-mail:
1.0 grade point improvement
Education, 2000.
Learning”, 2006.
Learning”. MIT Press, 2016.
https://www.tib.eu/en/search/id/TIBKAT%3A188854029/Natural-language-engineering/
https://www.tib.eu/en/search/id/TIBKAT%3A577240269/Speech-and-language-processing-an-introduction/
https://www.tib.eu/en/search/id/TIBKAT%3A627718655/Pattern-recognition-and-machine-learning/
https://www.tib.eu/en/search/id/springer%3Adoi~10.1007%252Fs10710-017-9314-z/Ian-Goodfellow-Yoshua-Bengio-and-Aaron-Courville/
Relation Extraction
automated manner
and understand due to:
1. What kinds of things do people say?
2. What do these things say/ask/request about the world?
formed)
determine conventionality
word frequency and its context:
complementizer (subordinate clauses)
sexual preference.
understanding:
her a good lunch." (she = Margaret; her = Susan)
gone through the roof" (= increased greatly)
“book”, “bank”, “can”, etc.)
with its hypernym (e.g. “pigeon”, “crow” as “birds”)
exactly or nearly the same thing as another lexeme.
https://en.wikipedia.org/wiki/Homonym
American English.
politics, sports, etc.)
Street Journal.
where nouns, verbs, adjectives and adverbs are grouped into synsets.
range of topics
(SQuAD) is a reading comprehension dataset.
set is a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across 20 different newsgroups.
English sentence pairs manually labeled for balanced classification with the labels entailment, contradiction, and neutral.
CoreNLP for better accuracy)
a sentence with its appropriate part of speech.
forms into a predefined class of named entity categories (e.g. Person, Location, Organization):
the correct sense of a word given its context.
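One simple way to make word sense disambiguation concrete is a simplified Lesk-style overlap count: pick the sense whose gloss shares the most words with the context. The sketch below is only an illustration under stated assumptions. The sense labels, the two glosses for "bank", and the stopword list are hypothetical and hand-written, not taken from WordNet or from the lecture.

```python
# Simplified Lesk-style disambiguation (illustrative sketch).
# Assumption: sense labels, glosses, and stopwords are hand-written toy data.

STOPWORDS = frozenset({"the", "a", "an", "of", "and", "or", "on", "to", "that"})

def simplified_lesk(context_sentence, sense_glosses, stopwords=STOPWORDS):
    """Return the sense whose gloss overlaps most with the context words."""
    context = set(context_sentence.lower().split()) - stopwords
    best_sense, best_overlap = None, -1
    for sense, gloss in sense_glosses.items():
        gloss_words = set(gloss.lower().split()) - stopwords
        overlap = len(context & gloss_words)
        # Strict '>' means that on a tie the earlier sense is kept.
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

# Hypothetical glosses for two senses of "bank".
BANK_SENSES = {
    "bank/finance": "an institution that accepts deposits and lends money",
    "bank/river": "sloping land beside a river or lake",
}

print(simplified_lesk("she sat on the bank of the river", BANK_SENSES))
print(simplified_lesk("he went to the bank to deposit money", BANK_SENSES))
```

Exact word overlap is brittle ("deposit" does not match "deposits"), which is one reason statistical approaches trained on sense-annotated data are usually preferred.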
The robot that can recycle a can is useful for the environment.
constituents or brackets
context to entities from a reference database.
Credit to: https://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/aida/
categories, determine to which one a piece of text belongs (e.g. spam or not spam for e-mails)
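As a rough illustration of text classification, the following sketch trains a multinomial Naive Bayes classifier with add-one smoothing on a handful of made-up e-mails. Naive Bayes is just one common statistical choice, used here as an assumption, not necessarily the method the lecture covers. The training texts and the labels "spam" and "ham" are invented for the example.

```python
from collections import Counter, defaultdict
import math

def train_naive_bayes(labeled_docs):
    """Collect per-label document and word counts from (text, label) pairs."""
    doc_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in labeled_docs:
        doc_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return {"doc_counts": doc_counts, "word_counts": word_counts, "vocab": vocab}

def classify(model, text):
    """Return the label with the highest log posterior under add-one smoothing."""
    total_docs = sum(model["doc_counts"].values())
    vocab_size = len(model["vocab"])
    best_label, best_score = None, float("-inf")
    for label, n_docs in model["doc_counts"].items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(model["word_counts"][label].values())
        for word in text.lower().split():
            if word not in model["vocab"]:
                continue  # skip words never seen in training
            count = model["word_counts"][label][word]
            score += math.log((count + 1) / (total_words + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy training data, invented for illustration only.
training = [
    ("win money now", "spam"),
    ("cheap money offer", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lecture notes for monday", "ham"),
]
model = train_naive_bayes(training)
print(classify(model, "cheap money"))     # spam
print(classify(model, "monday lecture"))  # ham
```

Working in log space avoids numerical underflow when many small probabilities are multiplied.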
supports/rejects/has no info for a given hypothesis.
Columns: TEXT, HYPOTHESIS, TASK, ENTAILMENT
1. Text: Regan attended a ceremony in Washington to commemorate the landings in Normandy.
   Hypothesis: Washington is located in Normandy.
   Task: IE. Entailment: False.
2. Text: Google files for its long awaited IPO.
   Hypothesis: Google goes public.
   Task: IR. Entailment: True.
4. Text: The SPD got just 21.5% of the vote in the European Parliament elections, while the conservative opposition parties polled 44.5%.
   Hypothesis: The SPD is defeated by the opposition parties.
   Task: IE. Entailment: True.
Credit to: http://www.cs.biu.ac.il/~dagan/TE-Tutorial-ACL07.ppt
to another, while preserving the meaning and producing fluent text in the target language.
domain, where for a given question the task is to find a textual snippet which answers the question.
Credit to: https://arxiv.org/pdf/1806.03822.pdf
proper nouns they refer to.
from text.
has positive or negative sentiment.
words in a piece of text.
generate text for a given set of seed words.
incomplete, uncertain information, and thus, interpretation has to be based on probabilities
time, by incorporating diverse sources of evidence, including frequency information
similar behavior and interpret language in terms of probabilities
rank and the frequency.
a language generation model (Mandelbrot’s law)
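The rank-frequency relationship (commonly known as Zipf's law: a word's frequency is roughly proportional to 1/rank) can be checked by sorting words by frequency and inspecting the product rank × frequency, which should stay roughly constant on a large corpus. The sketch below is a minimal illustration. The tokenization rule and the toy text are assumptions, and a text this small will not show the law convincingly.

```python
from collections import Counter
import re

def rank_frequency(text):
    """Return (rank, word, frequency, rank * frequency) rows, most frequent first."""
    # Assumption: lowercase the text and keep runs of letters/apostrophes as words.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    rows = []
    for rank, (word, freq) in enumerate(counts.most_common(), start=1):
        rows.append((rank, word, freq, rank * freq))
    return rows

# Toy text, invented for illustration; use a real corpus to see the effect.
toy_text = "the cat saw the dog and the dog saw the cat and the bird"
for row in rank_frequency(toy_text):
    print(row)
```

Plotting log(frequency) against log(rank) on a real corpus should give an approximately straight line with slope close to -1.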
Unigrams (word, count):
the 1148
and 970
to 771
[…] 671
i 635
you 554
a 550
my 514
hamlet 494
in 451

Bigrams (bigram, count):
my lord 180
king claudius 121
in the 93
lord polonius 87
queen gertrude 82
lord hamlet 78
to the 72
it is 58
i ll 56

Trigrams (trigram, count):
my lord hamlet 62
my lord i 21
rosencrantz and guildenstern 18
good my lord 15
i pray you 13
exeunt hamlet act 12
in the castle 12
the castle enter 12
enter king claudius 11
that i have 11
The Tragedy of Hamlet, Prince of Denmark, often shortened to Hamlet (/ˈhæmlɪt/), is a tragedy written by William Shakespeare at an uncertain date between 1599 and 1602. Set in Denmark, the play dramatises the revenge Prince Hamlet is called to wreak upon his uncle, Claudius, by the ghost of Hamlet's father, King Hamlet. Claudius had murdered his own brother and seized the throne, also marrying his deceased brother's widow.
https://en.wikipedia.org/wiki/Hamlet
Frames for “Claudius” (pattern, count):
Syntactic frame: NNP 107; VB NNP 11; NNP CC 3; NNP NNP 2
Semantic frame: king 105; enter king 10; varro and 3; exeunt king 2
Syntactic/semantic frame: KING/NNP 105; Enter/VB KING/NNP 10; VARRO/NNP and/CC 2; Exeunt/NNP KING/NNP 2

Second set of frames (target word not preserved):
Syntactic frame: NNP 24; PRP VBP 7; VB, PRP 7; PRP VBZ 5
Semantic frame: queen 7; laertes 3
Syntactic/semantic frame: QUEEN/NNP 7; O/NNP 4; Laertes/NNP 3
differently?
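One possible reading of the syntactic and semantic frames for "Claudius" is that they count what immediately precedes the target word, either as POS tags (syntactic) or as words (semantic). The sketch below implements that reading for a single token of left context. This interpretation, the function name, and the toy tagged sequence are assumptions for illustration; the input uses the WORD/TAG format seen in entries such as KING/NNP.

```python
from collections import Counter

def left_context_frames(tagged_tokens, target):
    """Count words and POS tags that immediately precede `target`."""
    word_frame, tag_frame = Counter(), Counter()
    # Split "WORD/TAG" on the last slash so words containing "/" survive.
    pairs = [tok.rsplit("/", 1) for tok in tagged_tokens]
    for (prev_word, prev_tag), (word, _tag) in zip(pairs, pairs[1:]):
        if word.lower() == target.lower():
            word_frame[prev_word.lower()] += 1
            tag_frame[prev_tag] += 1
    return word_frame, tag_frame

# Toy tagged sequence, invented for illustration (tags in Penn Treebank style).
tagged = ("Enter/VB KING/NNP CLAUDIUS/NNP and/CC QUEEN/NNP GERTRUDE/NNP "
          "Exeunt/NNP KING/NNP CLAUDIUS/NNP").split()

word_frame, tag_frame = left_context_frames(tagged, "claudius")
print(dict(word_frame))  # {'king': 2}
print(dict(tag_frame))   # {'NNP': 2}
```

Words that occur in similar frames tend to behave similarly, which is the intuition behind comparing the frames of different words.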
from the course web page.
course web page