Algorithms for Natural Language Processing Lecture 7: Lexical - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Algorithms for Natural Language Processing

Lecture 7: Lexical Semantics

slide-2
SLIDE 2

Three Ways of Looking at Word Meaning

  • Decompositional

– What the “components” of meaning “in” a word are

  • Ontological

– How the meaning of the word relates to the meanings of other words

  • Distributional

– What contexts the word is found in, relative to other words

slide-3
SLIDE 3

Decompositional Semantics

slide-4
SLIDE 4

Limitations of Decompositional Semantics

  • Where do the features come from?

– How do you divide semantic space into features like this?

– How do you settle on a final list?

  • How do you assign features to words in a principled fashion?
  • How do you link these features to the real world?
  • For these reasons, decompositional semantics is the least computationally useful approach to semantics

slide-5
SLIDE 5

Ontological Approaches to Semantics

slide-6
SLIDE 6

Semantic Relations

  • In grammar school, or in preparation for standardized tests, you may have learned the following terms: synonymy, antonymy
  • Synonymy and antonymy are relations between words. They are not alone: hyponymy, hypernymy, meronymy, holonymy

slide-7
SLIDE 7

Semantic Relations

  • Synonymy—equivalence

– <small, little>

  • Antonymy—opposition

– <small, large>

  • Hyponymy—subset; is-a relation

– <dog, mammal>

  • Hypernymy—superset

– <mammal, dog>

  • Meronymy—part-of relation

– <liver, body>

  • Holonymy—has-a relation

– <body, liver>
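
Several of the relations above are inverses of one another (hyponymy and hypernymy, meronymy and holonymy), while synonymy and antonymy are symmetric. A minimal sketch of storing such pairs so that the inverse edge is derived automatically. The `RelationStore` class and its method names are hypothetical, invented for illustration; only the example pairs come from the slide. `add(relation, a, b)` follows the slide's `<a, b>` ordering and means "a is a `relation` of b".

```python
from collections import defaultdict

# Each relation maps to its inverse; symmetric relations are their own inverse.
INVERSE = {
    "synonym": "synonym",
    "antonym": "antonym",
    "hyponym": "hypernym",
    "hypernym": "hyponym",
    "meronym": "holonym",
    "holonym": "meronym",
}


class RelationStore:
    """Hypothetical toy store of <a, b> semantic relation pairs."""

    def __init__(self):
        # (relation, b) -> set of all a such that "a is a <relation> of b"
        self._edges = defaultdict(set)

    def add(self, relation, a, b):
        """Record that a is a <relation> of b, plus the inverse fact."""
        self._edges[(relation, b)].add(a)
        self._edges[(INVERSE[relation], a)].add(b)

    def related(self, relation, word):
        """Return every x such that x is a <relation> of `word`, sorted."""
        return sorted(self._edges.get((relation, word), set()))


store = RelationStore()
store.add("synonym", "small", "little")
store.add("antonym", "small", "large")
store.add("hyponym", "dog", "mammal")   # dog is a hyponym of mammal
store.add("meronym", "liver", "body")   # liver is a meronym of body

print(store.related("hypernym", "dog"))   # hypernyms of dog
print(store.related("holonym", "liver"))  # holonyms of liver
```

Only four facts are entered, but the store can answer the hypernymy and holonymy queries as well, because each is the inverse of a fact that was added.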

slide-8
SLIDE 8

Lexical Mini-Ontology

[Figure: a mini-ontology over the senses wall.v.1, wall.n.1, surround.v.2, fence.n.1, build.v.1, door.n.1, building.n.1, enclosure.n.1, and destroy.v.1. Edges are labeled hypernymy (is-a; hypernym, hyponym), synonymy (synonym, synonym), meronymy (has-a; holonym = whole, meronym = part), and antonymy (antonym, antonym).]

slide-9
SLIDE 9

WordNet

  • WordNet is a lexical resource that organizes words according to their semantic relations

slide-10
SLIDE 10

WordNet

  • Words have different senses
  • Each of those senses is associated with a synset (a set of words that are roughly synonymous for a particular sense)
  • These synsets are associated with one another through relations like antonymy, hyponymy, and meronymy

slide-11
SLIDE 11

WordNet is a glorified electronic thesaurus

slide-12
SLIDE 12

Synsets for dog (n)

  • S: (n) dog, domestic dog, Canis familiaris (a member of the genus Canis (probably descended from the common wolf) that has been domesticated by man since prehistoric times; occurs in many breeds) "the dog barked all night"
  • S: (n) frump, dog (a dull unattractive unpleasant girl or woman) "she got a reputation as a frump"; "she's a real dog"
  • S: (n) dog (informal term for a man) "you lucky dog"
  • S: (n) cad, bounder, blackguard, dog, hound, heel (someone who is morally reprehensible) "you dirty dog"
  • S: (n) frank, frankfurter, hotdog, hot dog, dog, wiener, wienerwurst, weenie (a smooth-textured sausage of minced beef or pork usually smoked; often served on a bread roll)
  • S: (n) pawl, detent, click, dog (a hinged catch that fits into a notch of a ratchet to move a wheel forward or prevent it from moving backward)
  • S: (n) andiron, firedog, dog, dog-iron (metal supports for logs in a fireplace) "the andirons were too hot to touch"


slide-13
SLIDE 13

What’s a Fish? (According to WordNet)

  • fish (any of various mostly cold-blooded aquatic vertebrates usually having scales and breathing through gills)
  • aquatic vertebrate (animal living wholly or chiefly in or on water)
  • vertebrate, craniate (animals having a bony or cartilaginous skeleton with a segmented spinal column and a large brain enclosed in a skull or cranium)
  • chordate (any animal of the phylum Chordata having a notochord or spinal column)
  • animal, animate being, beast, brute, creature, fauna (a living organism characterized by voluntary movement)
  • organism, being (a living thing that has (or can develop) the ability to act or function independently)
  • living thing, animate thing (a living (or once living) entity)
  • whole, unit (an assemblage of parts that is regarded as a single entity)
  • object, physical object (a tangible and visible entity; an entity that can cast a shadow)
  • entity (that which is perceived or known or inferred to have its own distinct existence (living or nonliving))
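
The list above is a hypernym chain: each entry is the hypernym of the one before it. A minimal sketch of following such a chain with parent pointers. The `HYPERNYM` table is copied by hand from the slide (first lemma of each synset only) and the function names are hypothetical; this is a toy table, not a call into WordNet.

```python
# Parent pointers copied by hand from the chain on the slide.
HYPERNYM = {
    "fish": "aquatic vertebrate",
    "aquatic vertebrate": "vertebrate",
    "vertebrate": "chordate",
    "chordate": "animal",
    "animal": "organism",
    "organism": "living thing",
    "living thing": "whole",
    "whole": "object",
    "object": "entity",
}


def hypernym_chain(word):
    """Return [word, its hypernym, ..., root] by following parent pointers."""
    chain = [word]
    while chain[-1] in HYPERNYM:
        chain.append(HYPERNYM[chain[-1]])
    return chain


def is_a(word, ancestor):
    """True if `ancestor` is `word` itself or lies on its hypernym chain."""
    return ancestor in hypernym_chain(word)


print(" -> ".join(hypernym_chain("fish")))
print(is_a("fish", "animal"), is_a("animal", "fish"))
```

Because hyponymy is a subset relation, `is_a` is transitive by construction: anything true of "animal" in this toy table is inherited by "fish", but not the other way round.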


slide-14
SLIDE 14

Thesaurus-based Word Similarity

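[Figure: taxonomy tree. Class Mammalia branches into Order Artiodactyla and Order Carnivora; the nodes below them are labeled Genus Giraffidae, Genus Bovidae, Genus Felidae, and Genus Caniformia; the leaves shown are giraffe, gazelle, and lion; the tree continues (…).]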
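
One simple way to turn a taxonomy like this into a similarity score is to count the edges on the shortest path between two nodes. The sketch below is an illustration under stated assumptions, not necessarily the measure used in the lecture: the `PARENT` edges are filled in from standard zoological classification because the figure's layout did not survive extraction, and the `1 / (1 + path length)` scoring is one common convention among several.

```python
# Child -> parent edges (assumed; the original figure's edges were lost).
PARENT = {
    "giraffe": "Giraffidae",
    "gazelle": "Bovidae",
    "lion": "Felidae",
    "Giraffidae": "Artiodactyla",
    "Bovidae": "Artiodactyla",
    "Felidae": "Carnivora",
    "Caniformia": "Carnivora",
    "Artiodactyla": "Mammalia",
    "Carnivora": "Mammalia",
}


def ancestors(node):
    """Return [node, parent, grandparent, ..., root]."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def path_length(a, b):
    """Number of edges on the shortest path between a and b in the tree."""
    steps_from_b = {n: i for i, n in enumerate(ancestors(b))}
    for steps_from_a, node in enumerate(ancestors(a)):
        if node in steps_from_b:  # lowest common ancestor reached
            return steps_from_a + steps_from_b[node]
    raise ValueError("nodes share no common ancestor")


def path_similarity(a, b):
    """Score in (0, 1]; identical nodes score 1.0."""
    return 1.0 / (1.0 + path_length(a, b))


print(path_length("giraffe", "gazelle"), path_similarity("giraffe", "gazelle"))
print(path_length("giraffe", "lion"), path_similarity("giraffe", "lion"))
```

Giraffe and gazelle meet at Artiodactyla, whereas giraffe and lion only meet at Mammalia, so the first pair has the shorter path and the higher score.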

slide-15
SLIDE 15

Information Content

(Adapted from Lin. 1998. An Information-Theoretic Definition of Similarity. ICML.)

$$\mathrm{IC}(c) = -\log \frac{\#\,\text{words that are equivalent to or are hyponyms of } c}{\#\,\text{words in corpus}}$$

[Figure: a fragment of a concept hierarchy with the nodes Entity, Inanimate-object, Natural-object, Geological formation, Natural-elevation, Hill, Shore, and Coast. IC values shown: 0.93, 1.79, 4.12, 6.34, 9.09, 9.39, 10.88, 10.74.]
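
The IC definition above can be sketched in a few lines: a concept's count is its own count plus the counts of all its hyponyms, and IC is the negative log of that count divided by the corpus size. Everything numeric here is hypothetical (the counts, the corpus size, and the small hierarchy fragment), and the natural logarithm is an assumption, since the slide's formula does not state a base.

```python
import math

# Hypothetical hierarchy fragment (child -> parent).
PARENT = {
    "hill": "natural-elevation",
    "coast": "shore",
    "natural-elevation": "geological-formation",
    "shore": "geological-formation",
}

# Hypothetical counts of corpus words that name each concept directly.
OWN_COUNT = {
    "geological-formation": 10,
    "natural-elevation": 5,
    "shore": 40,
    "hill": 20,
    "coast": 25,
}
CORPUS_SIZE = 1_000_000  # hypothetical number of words in the corpus


def subtree_count(concept):
    """Count of words equivalent to `concept` or to any of its hyponyms."""
    total = OWN_COUNT.get(concept, 0)
    for child, parent in PARENT.items():
        if parent == concept:
            total += subtree_count(child)
    return total


def information_content(concept):
    """IC(c) = -log(subtree count of c / corpus size), natural log assumed."""
    return -math.log(subtree_count(concept) / CORPUS_SIZE)


for c in ("geological-formation", "shore", "coast"):
    print(c, subtree_count(c), round(information_content(c), 2))
```

Because a concept's count includes all of its hyponyms, IC can only stay the same or grow as you move down the hierarchy: general concepts carry little information, specific ones carry more.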

slide-16
SLIDE 16

WordNet Interfaces

  • Various interfaces to WordNet are available

– Many languages listed at https://wordnet.princeton.edu/related-projects

– NLTK (Python)

>>> from nltk.corpus import wordnet as wn
>>> wn.synsets('dog')

(returns a list of Synset objects)

http://www.nltk.org/howto/wordnet.html

slide-17
SLIDE 17

Limitations of WordNet and Ontological Semantics

  • WordNet is a useful resource
  • There are intrinsic limits to this type of resource, however:

– It requires many years of manual effort by skilled lexicographers

– In the case of WordNet, some of the lexicographers were not that skilled, and this has led to inconsistencies

– The ontology is only as good as the ontologist(s); it is not driven by data

  • We will now look at an approach to lexical semantics that is data driven and does not rely on lexicographers

slide-18
SLIDE 18

Beef

Sentences from the Brown Corpus, extracted from the concordancer in The Compleat Lexical Tutor, http://www.lextutor.ca/

slide-19
SLIDE 19

Chicken

slide-20
SLIDE 20

Context Vectors
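
A context vector records, for each word, how often other words appear near it. A minimal sketch, assuming a symmetric window of two tokens on each side; the function name, the window size, and the two example sentences are all hypothetical (the sentences are invented, not taken from the Brown Corpus).

```python
from collections import Counter, defaultdict


def context_vectors(sentences, window=2):
    """Map each word to a Counter of words seen within `window` positions."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:  # a word is not its own context
                    vectors[word][tokens[j]] += 1
    return vectors


# Invented toy corpus, already tokenized.
corpus = [
    "we ate roast beef for dinner".split(),
    "we ate roast chicken for dinner".split(),
]
vectors = context_vectors(corpus, window=2)
print(vectors["beef"])
print(vectors["chicken"])
```

In this toy corpus "beef" and "chicken" end up with identical context vectors, which is the distributional intuition in miniature: words that occur in the same contexts are treated as similar.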

slide-21
SLIDE 21

Hypothetical Counts based on Syntactic Dependencies

Columns: Modified-by-ferocious(adj), Subject-of-devour(v), Object-of-pet(v), Modified-by-African(adj), Modified-by-big(adj)

Lion 15 5 6 15
Dog 7 3 8 12
Cat 1 1 6 1 9
Elephant 10 15
…

[Table: rows with fewer than five values had empty cells on the original slide.]
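
Once words are represented as count vectors over dependency features, they can be compared numerically. The sketch below uses cosine similarity, which is a standard choice but an assumption here, since the slide does not name a measure. The counts are hypothetical and are not the slide's table; absent features are treated as zero.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention: an empty vector is similar to nothing
    return dot / (norm_u * norm_v)


# Hypothetical feature counts (feature name -> count).
lion = {"modified-by-ferocious": 15, "subject-of-devour": 5, "modified-by-big": 15}
dog = {"modified-by-ferocious": 7, "object-of-pet": 8, "modified-by-big": 12}
cat = {"object-of-pet": 6, "modified-by-big": 1}

print(round(cosine(lion, dog), 3))
print(round(cosine(lion, cat), 3))
print(round(cosine(dog, cat), 3))
```

Storing vectors as dicts keeps them sparse: most word and feature pairs never co-occur, so only the non-zero counts need to be kept.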


slide-22
SLIDE 22

A Problem

  • Some words are going to occur together many times just because they are very frequent
  • The English words "the" and "is" are likely to occur in the same window many times
  • They may not have a lot to do with one another except for the fact that they are frequent

  • How should we address this?
slide-23
SLIDE 23

Pointwise Mutual Information

$$\mathrm{PMI}(w, f) = \log_2 \frac{p(w, f)}{p(w) \times p(f)} = \log_2 \frac{N \times \mathrm{count}(w, f)}{\mathrm{count}(w) \times \mathrm{count}(f)}$$
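
The count form of the PMI formula translates directly into code. In this sketch the function name and all counts are hypothetical, and returning negative infinity for a pair that never co-occurs is a convention chosen here, not something the slide specifies.

```python
import math


def pmi(count_wf, count_w, count_f, n):
    """PMI(w, f) = log2(N * count(w, f) / (count(w) * count(f)))."""
    if count_wf == 0:
        return float("-inf")  # convention for pairs never seen together
    return math.log2(n * count_wf / (count_w * count_f))


N = 1_000_000  # hypothetical total number of (word, feature) observations

# Two very frequent words that co-occur exactly as often as chance predicts.
print(pmi(count_wf=1200, count_w=60_000, count_f=20_000, n=N))

# A rarer pair that co-occurs far more often than chance predicts.
print(pmi(count_wf=15, count_w=100, count_f=50, n=N))
```

The first pair co-occurs 1,200 times, far more than the second pair's 15, yet its PMI is zero because 1,200 is exactly the number expected if the two words were independent. PMI rewards association beyond chance rather than raw frequency.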

slide-24
SLIDE 24

Distributionally Similar Words

Rum: vodka, cognac, brandy, whisky, liquor, detergent, cola, gin, lemonade, cocoa, chocolate, scotch, noodle, tequila, juice

Write: read, speak, present, receive, call, release, sign, offer, know, accept, decide, issue, prepare, consider, publish

Ancient: old, modern, traditional, medieval, historic, famous, original, entire, main, indian, various, single, african, japanese, giant

Mathematics: physics, biology, geology, sociology, psychology, anthropology, astronomy, arithmetic, geography, theology, hebrew, economics, chemistry, scripture, biotechnology

(from an implementation of the method described in Lin. 1998. Automatic Retrieval and Clustering of Similar Words. COLING-ACL. Trained on newswire text.)

slide-25
SLIDE 25

Questions?