IN4080 – 2020 FALL NATURAL LANGUAGE PROCESSING, Jan Tore Lønning - PowerPoint PPT Presentation

1 IN4080 – 2020 FALL NATURAL LANGUAGE PROCESSING Jan Tore Lønning

2 Vectors, Distributions, Embeddings Lecture 5, Sept 14

3 Today
• Lexical semantics
• Vector models of documents
• tf-idf weighting
• Word-context matrices
• Word embeddings with dense vectors

4 The meaning of words
• Words (lecture 2)
• Type – token
• Word – lexeme – lemma
• Meaning?

5 Look into the dictionary

[Dictionary entry, annotated on the slide with the labels "lemma", "sense" and "definition"]

pepper, n.
Pronunciation: Brit. /ˈpɛpə/, U.S. /ˈpɛpər/
Forms: OE peopor (rare), OE pipcer (transmission error), OE pipor, OE pipur (rare ...
Frequency (in current use):
Etymology: A borrowing from Latin. Etymon: Latin piper. < classical Latin piper, a loanword < Indo-Aryan (as is ancient Greek πέπερι); compare San ...

I. The spice or the plant.
1. a. A hot pungent spice derived from the prepared fruits (peppercorns) of the pepper plant, Piper nigrum (see sense 2a), used from early times to season food, either whole or ground to powder (often in association with salt). Also (locally, chiefly with distinguishing word): a similar spice derived from the fruits of certain other species of the genus Piper; the fruits themselves.
The ground spice from Piper nigrum comes in two forms, the more pungent black pepper, produced from black peppercorns, and the milder white pepper, produced from white peppercorns: see BLACK adj. and n. Special uses 5a, PEPPERCORN n. 1a, and WHITE adj. and n. Special uses 7b(a).
2. a. The plant Piper nigrum (family Piperaceae), a climbing shrub indigenous to South Asia and also cultivated elsewhere in the tropics, which has alternate stalked entire leaves, with pendulous spikes of small green flowers opposite the leaves, succeeded by small berries turning red when ripe. Also more widely: any plant of the genus Piper or the family Piperaceae.
b. Usu. with distinguishing word: any of numerous plants of other families having hot pungent fruits or leaves which resemble pepper (1a) in taste and in some cases are used as a substitute for it.
c. U.S. The California pepper tree, Schinus molle. Cf. PEPPER TREE n. 3.
3. Any of various forms of capsicum, esp. Capsicum annuum var. annuum. Originally (chiefly with distinguishing word): any variety of the C. annuum Longum group, with elongated fruits having a hot, pungent taste, the source of cayenne, chilli powder, paprika, etc., or of the perennial C. frutescens, the source of Tabasco sauce. Now frequently (more fully sweet pepper): any variety of the C. annuum Grossum group, with large, bell-shaped or apple-shaped, mild-flavoured fruits, usually ripening to red, orange, or yellow and eaten raw in salads or cooked as a vegetable. Also: the fruit of any of these capsicums.
Sweet peppers are often used in their green immature state (more fully green pepper), but some new varieties remain green when ripe.

• A word with several senses is called polysemous
• If two different words look and sound the same, they are called homonyms
• How to tell: one word or several?
  • Common origin
  • But not waterproof/easy to see


11 Relations between senses
Term: Definition. Examples
• Synonymy: Have the same meaning in all(?)/some(?) contexts. Examples: sofa-couch, bus-coach, big-large
• Antonymy: Opposites with respect to a feature of meaning. Examples: true-false, strong-weak, up-down
• Hyponym-hyperonym: The <hyponym> is a type-of the <hyperonym>. Examples: rose → flower, cow → animal, car → vehicle
• Similarity. Examples: cow-horse, boy-girl
• Related. Examples: money-bank, fish-water

12 Resources for lexical semantics: WordNet
• https://wordnet.princeton.edu
• To each word: one or more synsets
• Relations between the synsets
[Diagram: the word "lounge" is linked to the synsets {lounge, waiting room, waiting area} and {sofa, couch, lounge}; the word "couch" is linked to the synsets {sofa, couch, lounge}, {couch (psych. bench)} and {couch (coat of paint)}]

13 What does ongchoi mean?
• Suppose you see these sentences:
  • Ong choi is delicious sautéed with garlic.
  • Ong choi is superb over rice
  • Ong choi leaves with salty sauces
• And you've also seen these:
  • …spinach sautéed with garlic over rice
  • Chard stems and leaves are delicious
  • Collard greens and other salty leafy greens
• Conclusion: Ongchoi is a leafy green like spinach, chard, or collard greens

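14 [Diagram: related vs. similar]
• Related (first-order association, syntagmatic): ong choi and the words it occurs with (delicious, sautéed with garlic, over rice)
• Similar (second-order association, paradigmatic): ong choi and spinach, which occur in the same contexts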

15 The distributional hypothesis
• Words that occur in similar contexts have similar meanings
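The hypothesis can be made concrete with the six ongchoi sentences from the earlier slide. The following is only a minimal sketch: the tokenization (lowercasing, writing "ong choi" as one token, dropping accents and punctuation) and the helper name `context_counts` are assumptions made for illustration, and "context" is taken to mean "any other word in the same sentence".

```python
from collections import Counter

# The six example sentences from the ongchoi slide, simplified by hand:
# lowercased, "ong choi" written as one token, accents and punctuation dropped.
sentences = [
    "ongchoi is delicious sauteed with garlic",
    "ongchoi is superb over rice",
    "ongchoi leaves with salty sauces",
    "spinach sauteed with garlic over rice",
    "chard stems and leaves are delicious",
    "collard greens and other salty leafy greens",
]

def context_counts(target, sentences):
    """Count the words that co-occur with `target` in the same sentence."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        if target in tokens:
            counts.update(t for t in tokens if t != target)
    return counts

ongchoi = context_counts("ongchoi", sentences)
spinach = context_counts("spinach", sentences)

# Context words the two targets have in common.
shared = sorted(set(ongchoi) & set(spinach))
print(shared)  # ['garlic', 'over', 'rice', 'sauteed', 'with']
```

Every context word of "spinach" in this tiny sample also appears as a context word of "ongchoi", which is the kind of evidence the hypothesis relies on.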

17 Shakespeare (from J & M)
• Vectors are similar for the two comedies
• Different than the historical dramas
• Comedies have more fools and wit and fewer battles.
• Notice similarity to text classification
  • Mandatory 2A, multinomial
  • The document represented by a vector with the occurrences of 35,000 terms

18 Document classification
• The word vectors were used as the basis for classification
• If two documents had the same vectors, they were put in the same class
• Documents are similar = on the same side of the separating hyperplane
• A problem: 35,000 dimensions are hard to draw

19 Information retrieval (IR)
• Documents placed in the same n-dimensional space as in classification
• Retrieve documents similar to a given document
[Figure: the plays plotted by their counts of "fool" (x-axis) and "battle" (y-axis): As You Like It [36,1], Twelfth Night [58,0], Julius Caesar [1,7], Henry V [4,13]]

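20 Cosine similarity
• Several possible ways to define similarity, e.g.,
  • Euclidean
  • Manhattan
• Most common: cosine
  • Do the arrows point in the same direction?

$$\cos(\mathbf{v}, \mathbf{w}) = \frac{\mathbf{v} \cdot \mathbf{w}}{|\mathbf{v}|\,|\mathbf{w}|} = \frac{\mathbf{v}}{|\mathbf{v}|} \cdot \frac{\mathbf{w}}{|\mathbf{w}|} = \frac{\sum_{i=1}^{N} v_i w_i}{\sqrt{\sum_{i=1}^{N} v_i^2}\,\sqrt{\sum_{i=1}^{N} w_i^2}}$$

[Figure: the plays drawn as arrows in the plane of "fool" (x-axis) and "battle" (y-axis): As You Like It [36,1], Twelfth Night [58,0], Julius Caesar [1,7], Henry V [4,13]]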
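As a worked instance of the cosine formula, here is the computation for two of the plotted plays, As You Like It [36,1] and Julius Caesar [1,7], with the vector components read as (fool, battle) counts from the figure:

$$\cos(\text{AYLI}, \text{JuCa}) = \frac{36 \cdot 1 + 1 \cdot 7}{\sqrt{36^2 + 1^2}\,\sqrt{1^2 + 7^2}} = \frac{43}{\sqrt{1297}\,\sqrt{50}} \approx 0.169$$

For the two comedies, As You Like It [36,1] and Twelfth Night [58,0], the arrows point almost the same way:

$$\cos(\text{AYLI}, \text{TwNi}) = \frac{36 \cdot 58 + 1 \cdot 0}{\sqrt{1297} \cdot 58} = \frac{36}{\sqrt{1297}} \approx 1.000$$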

21 Let us try: $\cos(w_1, w_2)$

Full vectors
        AYLI   TwNi   JuCa   HenV
AYLI    1.000  0.950  0.945  0.949
TwNi    0.950  1.000  0.809  0.822
JuCa    0.945  0.809  1.000  0.999
HenV    0.949  0.822  0.999  1.000

battles & fools
        AYLI   TwNi   JuCa   HenV
AYLI    1.000  1.000  0.169  0.321
TwNi    1.000  1.000  0.141  0.294
JuCa    0.169  0.141  1.000  0.988
HenV    0.321  0.294  0.988  1.000
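The "battles & fools" figures can be recomputed from the two-dimensional vectors shown in the plot on the earlier slides. This is a small sketch in plain Python; the function name `cosine` and the dictionary layout are choices made here, not taken from the lecture.

```python
import math

# (fool, battle) counts per play, as given in the plot on the IR slide.
plays = {
    "AYLI": (36, 1),
    "TwNi": (58, 0),
    "JuCa": (1, 7),
    "HenV": (4, 13),
}

def cosine(v, w):
    """Cosine of the angle between two equal-length count vectors."""
    dot = sum(a * b for a, b in zip(v, w))
    norm_v = math.sqrt(sum(a * a for a in v))
    norm_w = math.sqrt(sum(b * b for b in w))
    return dot / (norm_v * norm_w)

names = list(plays)
table = {(a, b): cosine(plays[a], plays[b]) for a in names for b in names}

# Print the pairwise similarities as a small matrix.
print("      " + "  ".join(f"{n:>5}" for n in names))
for a in names:
    print(f"{a:<6}" + "  ".join(f"{table[a, b]:5.3f}" for b in names))
```

The two comedies come out nearly identical to each other and far from the two other plays, which is the pattern the slide points at.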

23 Ways of counting: Term frequency
Alternatives
• Raw counts/absolute frequencies, TwNi = (0, 80, 58, 15)
• Binary counts (Mandatory 2A), TwNi = (0, 1, 1, 1)
• Variants of normalization.
  • Rel. frequency, $\left(0, \frac{80}{80+58+15}, \frac{58}{80+58+15}, \frac{15}{80+58+15}\right)$
    • TfidfTransformer(use_idf=False, norm="l1")
  • Length normalize, $\left(0, \frac{80}{\sqrt{80^2+58^2+15^2}}, \frac{58}{\sqrt{80^2+58^2+15^2}}, \frac{15}{\sqrt{80^2+58^2+15^2}}\right)$
    • TfidfTransformer(use_idf=False, norm="l2")
• Sublinear TF: (1 + log(tf)), 0 when tf=0
  • TfidfTransformer(use_idf=False, sublinear_tf=True)
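The alternatives can be computed by hand for the Twelfth Night vector given on the slide. This is a plain-Python sketch, not the scikit-learn implementation; the variable names are illustrative, and the natural logarithm is an assumption, since the slide does not state a base.

```python
import math

# Raw counts for Twelfth Night, as given on the slide.
raw = [0, 80, 58, 15]

# Binary counts: 1 if the term occurs at all, else 0.
binary = [1 if tf > 0 else 0 for tf in raw]

# Relative frequency (L1 normalization): divide by the sum of counts.
total = sum(raw)
rel_freq = [tf / total for tf in raw]

# Length normalization (L2): divide by the Euclidean length of the vector.
length = math.sqrt(sum(tf * tf for tf in raw))
l2_norm = [tf / length for tf in raw]

# Sublinear tf: 1 + log(tf), and 0 when tf is 0.
# Natural log is assumed here; the slide leaves the base open.
sublinear = [1 + math.log(tf) if tf > 0 else 0.0 for tf in raw]

for name, vec in [("binary", binary), ("rel_freq", rel_freq),
                  ("l2_norm", l2_norm), ("sublinear", sublinear)]:
    print(name, [round(x, 3) for x in vec])
```

After L1 normalization the components sum to 1, and after L2 normalization the vector has length 1, which is what the two `norm` settings express.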

24 Normalize or not?
• The cosine similarity measure does a form of length normalization:
  • Raw counts, relative counts and length-normalized counts yield the same cosine similarity
• For other measures, it matters whether we normalize
  • e.g. the L2 distance is relatively large between documents of different lengths
• Sublinear scaling squeezes the difference between terms that occur often and terms that occur very often:
  • If term1 occurs 100 times and term2 occurs 10 times:
  • by raw counts, term1 is considered 10 times more frequent than term2
  • but only about 1.5 times as important with sublinear scaling (1 + log10(tf): 3 vs. 2)
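Both points can be checked numerically. The sketch below uses the Twelfth Night counts from the term-frequency slide together with a hypothetical document that is exactly twice as long with the same proportions; the helper names are illustrative, and the sublinear weight is taken to be 1 + log(tf) as defined on that slide.

```python
import math

def cosine(v, w):
    dot = sum(a * b for a, b in zip(v, w))
    return dot / (math.sqrt(sum(a * a for a in v)) *
                  math.sqrt(sum(b * b for b in w)))

def euclidean(v, w):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v, w)))

def l2_normalize(v):
    length = math.sqrt(sum(a * a for a in v))
    return [a / length for a in v]

doc = [0, 80, 58, 15]
doc_twice = [2 * x for x in doc]  # hypothetical: same proportions, double length

cos_raw = cosine(doc, doc_twice)        # unaffected by document length
dist_raw = euclidean(doc, doc_twice)    # large, purely because of length
dist_norm = euclidean(l2_normalize(doc), l2_normalize(doc_twice))  # vanishes

# Sublinear scaling for tf = 100 vs. tf = 10.
ratio_raw = 100 / 10
ratio_log10 = (1 + math.log10(100)) / (1 + math.log10(10))  # 3 / 2
ratio_ln = (1 + math.log(100)) / (1 + math.log(10))         # natural log variant

print(cos_raw, dist_raw, dist_norm)
print(ratio_raw, ratio_log10, ratio_ln)
```

With base-10 logarithms the weight ratio is 1.5, and with natural logarithms it is roughly 1.7; either way the tenfold difference in raw counts is strongly compressed.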

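25 Inverse document frequency
• Intuition: A word occurring in a large proportion of documents is not a good discriminator.
• $\mathrm{idf}_t = \log \frac{N}{\mathrm{df}_t}$, where $N$ is the total number of documents and $\mathrm{df}_t$ the number of documents containing $t$.
  • TfidfTransformer(use_idf=True, smooth_idf=False)
• Smooth: avoid dividing by zero
  • $\mathrm{idf}_t = \log \frac{N}{\mathrm{df}_t + 1} + 1$
  • TfidfTransformer(use_idf=True, smooth_idf=True)
• Note: scikit-learn's own formulas differ slightly from these: $\ln \frac{N}{\mathrm{df}_t} + 1$ without smoothing and $\ln \frac{1 + N}{1 + \mathrm{df}_t} + 1$ with smoothing.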
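The two slide formulas are easy to try out directly. In the sketch below the collection size and the document frequencies are made-up toy values, the function names `idf` and `idf_smooth` are illustrative, and the natural logarithm is an assumption; it implements the formulas as written on the slide rather than reproducing scikit-learn's exact variants.

```python
import math

def idf(n_docs, df):
    """Plain idf: log(N / df_t). Undefined when df is 0."""
    return math.log(n_docs / df)

def idf_smooth(n_docs, df):
    """Smoothed idf as on the slide: log(N / (df_t + 1)) + 1."""
    return math.log(n_docs / (df + 1)) + 1

# Hypothetical collection of 1000 documents with toy document frequencies.
N = 1000
doc_freq = {"the": 1000, "battle": 100, "ongchoi": 1, "unseen": 0}

for term, df in doc_freq.items():
    plain = idf(N, df) if df > 0 else float("nan")  # plain idf needs df > 0
    print(f"{term:8s} df={df:5d}  idf={plain:6.3f}  smooth={idf_smooth(N, df):6.3f}")
```

A term that occurs in every document gets a plain idf of 0, so it contributes nothing to tf-idf, while rare terms get the largest weights; the smoothed version stays defined even for a term with document frequency 0.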
