Lecture 6: Vector Space Model
Kai-Wei Chang CS @ University of Virginia kw@kwchang.net Course webpage: http://kwchang.net/teaching/NLP16
1 6501 Natural Language Processing
This lecture: How to represent a word, a sentence, or a document?
egg student talk university happy buy
>>> from nltk.corpus import wordnet as wn
>>> wn.synsets('motorcar')
[Synset('car.n.01')]
>>> motorcar = wn.synset('car.n.01')
>>> motorcar.hypernyms()
[Synset('motor_vehicle.n.01')]
>>> paths = motorcar.hypernym_paths()
>>> [synset.name() for synset in paths[0]]
['entity.n.01', 'physical_entity.n.01', 'object.n.01', 'whole.n.02', 'artifact.n.01', 'instrumentality.n.03', 'container.n.01', 'wheeled_vehicle.n.01', 'self-propelled_vehicle.n.01', 'motor_vehicle.n.01', 'car.n.01']
>>> [synset.name() for synset in paths[1]]
['entity.n.01', 'physical_entity.n.01', 'object.n.01', 'whole.n.02', 'artifact.n.01', 'instrumentality.n.03', 'conveyance.n.03', 'vehicle.n.01', 'wheeled_vehicle.n.01', 'self-propelled_vehicle.n.01', 'motor_vehicle.n.01', 'car.n.01']
>>> right = wn.synset('right_whale.n.01')
>>> minke = wn.synset('minke_whale.n.01')
>>> orca = wn.synset('orca.n.01')
>>> tortoise = wn.synset('tortoise.n.01')
>>> novel = wn.synset('novel.n.01')
>>> right.lowest_common_hypernyms(minke)
[Synset('baleen_whale.n.01')]
>>> right.lowest_common_hypernyms(orca)
[Synset('whale.n.02')]
>>> right.lowest_common_hypernyms(tortoise)
[Synset('vertebrate.n.01')]
>>> right.lowest_common_hypernyms(novel)
[Synset('entity.n.01')]
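The lowest-common-hypernym lookup can be sketched without NLTK on a toy taxonomy. This is a minimal illustration, not how WordNet is implemented: the `PARENT` table is a hypothetical, heavily simplified tree built from the synset names in the session above (real WordNet has many more intermediate synsets, and a synset can have several hypernym paths).

```python
# Toy taxonomy: child -> parent. Hypothetical and simplified; the real
# WordNet hierarchy is a graph with many more intermediate nodes.
PARENT = {
    "right_whale": "baleen_whale",
    "minke_whale": "baleen_whale",
    "baleen_whale": "whale",
    "orca": "whale",
    "whale": "vertebrate",
    "tortoise": "vertebrate",
    "vertebrate": "entity",
    "novel": "entity",
}


def hypernym_path(node):
    """Return the chain of nodes from the root down to `node`."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path[::-1]


def lowest_common_hypernym(a, b):
    """Return the deepest node shared by the root-to-node paths of a and b."""
    common = None
    for x, y in zip(hypernym_path(a), hypernym_path(b)):
        if x != y:
            break
        common = x
    return common


for other in ["minke_whale", "orca", "tortoise", "novel"]:
    print(other, "->", lowest_common_hypernym("right_whale", other))
```

The deeper the shared ancestor, the more closely related the two words are in the hierarchy.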
Implementation of the Brown hierarchical word clustering algorithm. Percy Liang
Dimensions: royalty, masculinity, femininity, eatable, …
$w_{\text{king}} = [0.8\ \ 0.9\ \ 0.1\ \dots]$
$w_{\text{queen}} = [0.8\ \ 0.1\ \ 0.8\ \dots]$
$w_{\text{apple}} = [0.1\ \ 0.2\ \ 0.1\ \ 0.8\ \dots]$
[Figure: word vectors w1 to w5 plotted as points in the vector space; axis label "Royalty"; marked distance |D2-D4|]
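$\cos(\mathbf{u}, \mathbf{v}) = \frac{\mathbf{u} \cdot \mathbf{v}}{\lVert\mathbf{u}\rVert \cdot \lVert\mathbf{v}\rVert}$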
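$\frac{\mathbf{u}}{\lVert\mathbf{u}\rVert}$ is a unit vector
$\cos(\mathbf{u}, \mathbf{v}) = \frac{\mathbf{u} \cdot \mathbf{v}}{\lVert\mathbf{u}\rVert \cdot \lVert\mathbf{v}\rVert}$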
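Cosine similarity (the dot product of two vectors divided by the product of their norms) can be sketched in a few lines of plain Python. The vectors `a`, `b`, and `c` are made-up values chosen only to show two properties: scaling a vector does not change the cosine, and orthogonal vectors score 0.

```python
import math


def dot(x, y):
    """Dot product of two equal-length vectors."""
    return sum(xi * yi for xi, yi in zip(x, y))


def norm(x):
    """Euclidean length of a vector."""
    return math.sqrt(dot(x, x))


def unit(x):
    """Scale x to length 1."""
    n = norm(x)
    return [xi / n for xi in x]


def cosine(x, y):
    """Cosine of the angle between x and y."""
    return dot(x, y) / (norm(x) * norm(y))


a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]    # same direction as a, twice the length
c = [3.0, 0.0, -1.0]   # orthogonal to a: 1*3 + 2*0 + 3*(-1) = 0

print(cosine(a, b))            # close to 1: length is ignored
print(cosine(a, c))            # close to 0: orthogonal
print(dot(unit(a), unit(b)))   # same value as cosine(a, b)
```

Because each vector is divided by its own length, the cosine equals the plain dot product of the two unit vectors.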
Bag-of-words model: documents (clusters) as the basis for the vector space
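With documents as the basis, each word becomes a vector of per-document counts. A minimal sketch, assuming a made-up three-document corpus (the `docs` list is hypothetical and tokenization is a plain whitespace split):

```python
from collections import Counter

# Hypothetical toy corpus; each document is one dimension of the space.
docs = [
    "student talk at university",
    "student buy egg",
    "happy student buy happy egg",
]

# One bag of words (token -> count) per document.
counts = [Counter(doc.split()) for doc in docs]
vocab = sorted(set(word for bag in counts for word in bag))

# word -> [count in doc 0, count in doc 1, count in doc 2]
word_vectors = {word: [bag[word] for bag in counts] for word in vocab}

for word in ["student", "egg", "happy", "university"]:
    print(word, word_vectors[word])
```

Words that occur in the same documents (here `egg` and `buy`) end up with the same or similar vectors.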
                          joy  gladden  sorrow  sadden  goodwill
Group 1: “joyfulness”      1      1       0       0        0
Group 2: “sad”             0      0       1       1        0
Group 3: “affection”       0      0       0       0        1
Pros and cons?
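One way to explore the question is to write the group-membership representation out as vectors. This sketch assumes each word belongs to exactly one group, as in the example above; the names `GROUPS`, `MEMBERSHIP`, and `cluster_vector` are illustrative only.

```python
GROUPS = ["joyfulness", "sad", "affection"]

# Word -> group, taken from the example table.
MEMBERSHIP = {
    "joy": "joyfulness",
    "gladden": "joyfulness",
    "sorrow": "sad",
    "sadden": "sad",
    "goodwill": "affection",
}


def cluster_vector(word):
    """One-hot vector over groups: 1 for the word's group, 0 elsewhere."""
    return [1 if MEMBERSHIP[word] == group else 0 for group in GROUPS]


def dot(x, y):
    """Dot product of two equal-length vectors."""
    return sum(xi * yi for xi, yi in zip(x, y))


print(cluster_vector("joy"), cluster_vector("gladden"))   # identical vectors
print(dot(cluster_vector("joy"), cluster_vector("sorrow")))  # no overlap
```

The vectors are short and easy to interpret, but words in the same group become indistinguishable and words in different groups have zero similarity, however related they are.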
Example from Christopher Manning and Pandu Nayak, Introduction to IR
Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings, NIPS 16
sunny, rainy, windy, cloudy
car, wheel, cab
sad, joy, emotion, feeling
? Michelle Obama Democratic Party George W Bush Laura Bush Republican Party
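If the entities above are read as an analogy task, a missing item can be sketched as vector arithmetic followed by a nearest-neighbour search. Everything numeric here is an assumption: the 3-dimensional embeddings in `EMB` are invented by hand for illustration, not learned from data, and the query ("George W Bush is to Republican Party as Michelle Obama is to ?") is just one possible reading of the slide.

```python
import math

# Hypothetical hand-made embeddings, NOT learned vectors.
# Rough meaning of the axes: [party leaning, is an organization, other].
EMB = {
    "George W Bush":    [1.0, 0.0, 0.2],
    "Laura Bush":       [1.0, 0.0, 0.8],
    "Michelle Obama":   [-1.0, 0.0, 0.8],
    "Republican Party": [1.0, 1.0, 0.5],
    "Democratic Party": [-1.0, 1.0, 0.5],
}


def cosine(x, y):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(xi * yi for xi, yi in zip(x, y))
    norm_x = math.sqrt(sum(xi * xi for xi in x))
    norm_y = math.sqrt(sum(yi * yi for yi in y))
    return dot / (norm_x * norm_y)


def analogy(a, b, c):
    """Solve a : b :: c : ? by finding the word nearest to b - a + c."""
    target = [bi - ai + ci for ai, bi, ci in zip(EMB[a], EMB[b], EMB[c])]
    # Exclude the three query words so the answer is a new word.
    candidates = [w for w in EMB if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(EMB[w], target))


answer = analogy("George W Bush", "Republican Party", "Michelle Obama")
print(answer)
```

With these toy vectors the offset from person to party is the same on both sides, so the search lands on the matching party; learned embeddings only approximate this regularity.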