
Vectors and Semantics

Peter Turney

November 2008

Vision of the Future

  • future of SKDOU: from text to knowledge

– input: web
– output: knowledge
– beyond search: QA with unconstrained questions and answers
– 24/7 continuous automatic learning from the web

  • what will that knowledge look like? – default assumption:

– a giant expert system
– but generated automatically, no hand-coding

  • what will that knowledge look like? – my opinion:

– expert systems are missing something vital
– expert systems are not a sufficient representation of knowledge
– we need vectors



Outline

  • symbolic versus spatial approaches to knowledge

– logic versus geometry

  • term-document matrix

– latent semantic analysis; applications

  • pair-pattern matrix

– latent relational analysis; applications

  • episodic versus semantic

– some hypotheses about vectors and semantics

  • conclusions

– how to acquire knowledge; how to represent knowledge




Symbolic AI

  • symbolic approach to knowledge

– logic, propositional calculus, graph theory, set theory ...

  • GOFAI: good old-fashioned AI
  • benefits

– good for deduction, reasoning about entailment, consistency
– crisp, clean, binary-valued
– good for yes/no questions

  • does A entail B?
  • costs

– not so good for induction, learning, theories from data
– aliasing: noise due to analog-to-digital conversion
– not good for questions about similarity

  • how similar is A to B?

Symbolic versus Spatial (1 of 3)


Spatial AI

  • spatial approach to knowledge

– vector spaces, linear algebra, geometry, ...

  • machine learning, statistics, feature space, information retrieval
  • benefits

– good for induction, learning, theories from data
– fuzzy, analog, real-valued
– good for questions about similarity

  • similarity(A,B) = cosine(A,B)
  • costs

– not so good for deduction, entailment, consistency
– messy, lots of numbers
– not convenient for communication

  • language is digital

Symbolic versus Spatial (2 of 3)


Symbolic vs Spatial

  • need to combine symbolic and spatial approaches

– symbolic for communication and entailment
– spatial for similarity and learning

  • reference

– Peter Gärdenfors. (2000). Conceptual Spaces: The Geometry of Thought. MIT Press.

Symbolic versus Spatial (3 of 3)


Outline

  • symbolic versus spatial approaches to knowledge

– logic versus geometry

  • term-document matrix

– latent semantic analysis; applications

  • pair-pattern matrix

– latent relational analysis; applications

  • episodic versus semantic

– some hypotheses about vectors and semantics

  • conclusions

– how to acquire knowledge; how to represent knowledge



Technicalities

  • weighting the elements

– give more weight when a term t_i is surprisingly frequent in a document d_j
– tf-idf = term frequency times inverse document frequency
– hundreds of variations of tf-idf

  • smoothing the matrix

– problem of sparsity, small corpus
– Singular Value Decomposition (SVD), Probabilistic Latent Semantic Analysis (PLSA), Latent Dirichlet Allocation (LDA), Nonnegative Matrix Factorization (NMF), ...

  • comparing the vectors

– many ways to compare two vectors
– cosine, Jaccard, Euclidean, Dice, correlation, Hamming, ...

Term-Document Matrix (2 of 9)
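The three technicalities above can be shown in a few lines of numpy. This is only a minimal sketch on a toy count matrix: it assumes one simple tf-idf variant (term frequency times log inverse document frequency), uses a rank-k truncated SVD for smoothing, and the cosine for comparison.

```python
# Minimal sketch: weighting, smoothing, and comparing on a toy term-document matrix.
import numpy as np

X = np.array([[2., 0., 1., 0.],   # rows = terms, columns = documents (toy counts)
              [0., 3., 0., 1.],
              [1., 1., 0., 0.],
              [0., 0., 2., 2.]])

# weighting: term frequency times log inverse document frequency (one of many variants)
tf = X / X.sum(axis=0, keepdims=True)
idf = np.log(X.shape[1] / np.count_nonzero(X, axis=1))
W = tf * idf[:, np.newaxis]

# smoothing: keep only the top k singular values (the "latent" step of LSA)
U, s, Vt = np.linalg.svd(W, full_matrices=False)
k = 2
W_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# comparing: cosine of the angle between two vectors
def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(W_k[:, 0], W_k[:, 2]))   # similarity of documents 0 and 2 in the smoothed space
```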


Information Retrieval

  • how similar is document d1 to document d2?

– cosine of angle between d1 and d2 column vectors in matrix

  • how relevant is document d to query q?

– make a pseudo-document vector to represent q
– cosine of angle between d and q

  • references

– Gerard Salton and Michael J. McGill. (1983). Introduction to Modern Information Retrieval. McGraw-Hill.
– Scott Deerwester, Susan T. Dumais, George W. Furnas, Thomas K. Landauer, and Richard Harshman. (1990). Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41(6):391-407.

Term-Document Matrix (3 of 9)
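A minimal sketch of the retrieval step above, assuming the simplest possible folding-in: the query becomes a pseudo-document vector of term counts and documents are ranked by cosine (in LSA proper the query would also be projected into the reduced SVD space). The matrix and terms are toy data.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# toy (already weighted) term-document matrix; rows = terms, columns = documents
terms = ["tree", "forest", "river", "bank"]
W = np.array([[3., 0., 1.],
              [2., 0., 0.],
              [0., 2., 1.],
              [0., 3., 2.]])

# represent the query q as a pseudo-document vector over the same terms
query_words = ["forest", "tree"]
q = np.array([1.0 if t in query_words else 0.0 for t in terms])

# rank documents by the cosine between each column and the query vector
ranking = sorted(range(W.shape[1]), key=lambda j: -cosine(W[:, j], q))
print(ranking)   # document indices, most relevant first
```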


Word Similarity

  • how similar is term t1 to term t2?

– cosine of the angle between the t1 and t2 row vectors in the matrix (see the sketch after this slide)

  • evaluation on TOEFL multiple-choice synonym questions

– 92.5%: highest score of any pure (non-hybrid) algorithm
– 64.5%: average human test-taker

  • references

– Landauer, T.K., and Dumais, S.T. (1997). A solution to Plato's problem: The latent semantic analysis theory of the acquisition, induction, and representation of knowledge. Psychological Review, 104(2):211-240.
– Rapp, R. (2003). Word sense discovery based on sense descriptor dissimilarity. Proceedings of the Ninth Machine Translation Summit, pp. 315-322.

Term-Document Matrix (4 of 9)
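A toy version of the multiple-choice procedure: each term is a row vector of the (smoothed) term-document matrix, and the choice closest to the stem by cosine wins. The vectors below are invented purely for illustration; this is a TOEFL-style question, not the real evaluation data.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# hypothetical row vectors from a smoothed term-document matrix (illustration only)
vectors = {
    "levied":     np.array([0.8, 0.1, 0.3]),
    "imposed":    np.array([0.7, 0.2, 0.2]),
    "believed":   np.array([0.1, 0.9, 0.1]),
    "requested":  np.array([0.3, 0.5, 0.4]),
    "correlated": np.array([0.2, 0.3, 0.9]),
}

# TOEFL-style synonym question: which choice is most similar to the stem?
stem = "levied"
choices = ["imposed", "believed", "requested", "correlated"]
print(max(choices, key=lambda c: cosine(vectors[stem], vectors[c])))   # "imposed" here
```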


Essay Grading

  • grade student essays

– latent semantic analysis
– commercial product: Pearson's Knowledge Technologies

  • references

– Rehder, B., Schreiner, M.E., Wolfe, M.B., Laham, D., Landauer, T.K., and Kintsch, W. (1998). Using latent semantic analysis to assess knowledge: Some technical considerations. Discourse Processes, 25, 337-354.
– Foltz, P.W., Laham, D., and Landauer, T.K. (1999). Automated essay scoring: Applications to educational technology. Proceedings of the ED-MEDIA '99 Conference, Association for the Advancement of Computing in Education, Charlottesville.

Term-Document Matrix (5 of 9)


Textual Cohesion

  • measuring textual cohesion

– latent semantic analysis

  • reference

– Foltz, P.W., Kintsch, W., and Landauer, T.K. (1998). The measurement of textual coherence with latent semantic analysis. Discourse Processes, 25, 285-307.

Term-Document Matrix (6 of 9)
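The slide does not spell out the procedure, so the sketch below states its assumptions: represent each sentence as, e.g., the sum of its word vectors in the LSA space, and score cohesion as the mean cosine between adjacent sentence vectors, roughly in the spirit of the Foltz et al. reference.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def cohesion(sentence_vectors):
    """Mean cosine between consecutive sentence vectors (an assumed, simplified recipe)."""
    pairs = zip(sentence_vectors, sentence_vectors[1:])
    return float(np.mean([cosine(a, b) for a, b in pairs]))

# invented vectors for a three-sentence text: the third sentence shifts topic
print(cohesion([np.array([1.0, 0.2]), np.array([0.9, 0.3]), np.array([0.1, 1.0])]))
```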


Semantic Orientation

  • measuring praise and criticism

– latent semantic analysis
– small set of positive and negative reference words

  • good, nice, excellent, positive, fortunate, correct, and superior
  • bad, nasty, poor, negative, unfortunate, wrong, and inferior

– semantic orientation of a word X is the sum of similarities of X with the positive reference words minus the sum of similarities of X with the negative reference words (see the sketch after this slide)

  • reference

– Turney, P.D., and Littman, M.L. (2003). Measuring praise and criticism: Inference of semantic orientation from association. ACM Transactions on Information Systems (TOIS), 21(4), 315-346.

Term-Document Matrix (7 of 9)
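A minimal sketch of the semantic-orientation score just described: sum the similarities with the positive reference words and subtract the sum of similarities with the negative ones. It assumes a hypothetical dict `vec` mapping each word to its row vector, with cosine as the similarity measure.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

POSITIVE = ["good", "nice", "excellent", "positive", "fortunate", "correct", "superior"]
NEGATIVE = ["bad", "nasty", "poor", "negative", "unfortunate", "wrong", "inferior"]

def semantic_orientation(word, vec):
    """SO(word) = sum of similarities with the positive reference words
                - sum of similarities with the negative reference words.
    `vec` is a hypothetical dict mapping words to their row vectors."""
    pos = sum(cosine(vec[word], vec[w]) for w in POSITIVE if w in vec)
    neg = sum(cosine(vec[word], vec[w]) for w in NEGATIVE if w in vec)
    return pos - neg   # positive score suggests praise, negative suggests criticism
```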


Logic

  • logical operations can be performed by linear algebra

– t1 OR t2 = the vector space spanned by t1 and t2
– t1 NOT t2 = the projection of t1 onto the subspace that is orthogonal to t2
– bass NOT fisherman = bass in the sense of a musical instrument, not bass in the sense of a fish (see the sketch after this slide)

  • reference

– Dominic Widdows. (2004). Geometry and Meaning. CSLI Publications.

Term-Document Matrix (8 of 9)
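The NOT operation above has a direct linear-algebra reading: remove from t1 its component along t2, i.e. project t1 onto the subspace orthogonal to t2. A small sketch with made-up two-dimensional vectors (a real system would use high-dimensional LSA vectors):

```python
import numpy as np

def vector_not(a, b):
    """a NOT b: remove from a its component along b (projection onto b's orthogonal subspace)."""
    b_hat = b / np.linalg.norm(b)
    return a - (a @ b_hat) * b_hat

# toy coordinates [music, fish], purely for illustration
bass      = np.array([0.7, 0.7])   # "bass" mixes the instrument sense and the fish sense
fisherman = np.array([0.0, 1.0])   # strongly associated with the fish sense

print(vector_not(bass, fisherman))   # ~[0.7, 0.0]: the musical sense remains
```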


Summary

  • applications for a term-document (word-chunk) matrix

– information retrieval
– measuring word similarity
– essay grading
– textual cohesion
– semantic orientation
– logic

Term-Document Matrix (9 of 9)


Outline

  • symbolic versus spatial approaches to knowledge

– logic versus geometry

  • term-document matrix

– latent semantic analysis; applications

  • pair-pattern matrix

– latent relational analysis; applications

  • episodic versus semantic

– some hypotheses about vectors and semantics

  • conclusions

– how to acquire knowledge; how to represent knowledge



Pair-Pattern Matrix

  • pair-pattern matrix

– rows correspond to pairs of words

  • X:Y = mason:stone

– columns correspond to patterns

  • “X works with Y”

– element corresponds to the frequency of the given pattern in a corpus, when the variables in the pattern are instantiated with the words in the given pair

  • “mason works with stone”

– row vector gives the distribution of the patterns in which the given pair appears

  • a signature of the semantic relation between mason and stone

Pair-Pattern Matrix (1 of 8)
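A minimal sketch of building the matrix just described: each element counts how often a pattern, with X and Y instantiated by a word pair, occurs in the corpus. Real systems such as LRA mine thousands of patterns from a very large corpus; the corpus, pairs, and patterns here are invented for illustration.

```python
# toy corpus, word pairs, and patterns (invented for illustration)
corpus = ["the mason works with stone", "the carpenter works with wood",
          "the mason cut the stone", "the potter works with clay"]
pairs    = [("mason", "stone"), ("carpenter", "wood"), ("potter", "clay")]
patterns = ["X works with Y", "X cut the Y"]

def count(pair, pattern, corpus):
    """Frequency of the pattern in the corpus when X and Y are instantiated by the pair."""
    x, y = pair
    phrase = pattern.replace("X", x).replace("Y", y)
    return sum(text.count(phrase) for text in corpus)

# rows = word pairs, columns = patterns; each row is a signature of the pair's relation
matrix = [[count(p, pat, corpus) for pat in patterns] for p in pairs]
print(matrix)   # [[1, 1], [1, 0], [1, 0]]
```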


Technicalities

  • exactly the same as with term-document matrices

– weighting the elements
– smoothing the matrix
– comparing the vectors

  • many lessons carry over from term-document matrices

– good weighting approaches
– good smoothing algorithms
– good formulas for comparing

Pair-Pattern Matrix (2 of 8)


SAT Analogies

  • relational similarity of two pairs is cosine of two row vectors

– cosine(traffic:street, water:riverbed) = 0.692

  • example SAT question

      Stem pair: traffic:street

      Choices:                Cosine:
      (a) ship:gangplank      0.318
      (b) crop:harvest        0.572
      (c) car:garage          0.687
      (d) pedestrians:feet    0.497
      (e) water:riverbed      0.692

  • reference

– Turney, P.D., and Littman, M.L. (2005). Corpus-based learning of analogies and semantic relations. Machine Learning, 60(1-3), 251-278.

Pair-Pattern Matrix (3 of 8)
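A toy re-implementation of the multiple-choice step: each word pair is a row vector of the pair-pattern matrix, and the answer is the choice whose row vector has the largest cosine with the stem pair's. The vectors below are invented, so the cosines will not match the numbers on the slide.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# invented row vectors over three patterns (illustration only)
rows = {
    "traffic:street":   np.array([4., 1., 0.]),
    "ship:gangplank":   np.array([0., 2., 1.]),
    "crop:harvest":     np.array([1., 3., 0.]),
    "car:garage":       np.array([3., 1., 1.]),
    "pedestrians:feet": np.array([1., 0., 3.]),
    "water:riverbed":   np.array([4., 1., 1.]),
}

stem = "traffic:street"
choices = ["ship:gangplank", "crop:harvest", "car:garage", "pedestrians:feet", "water:riverbed"]
print(max(choices, key=lambda c: cosine(rows[stem], rows[c])))   # "water:riverbed" here
```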


Semantic Relations

  • classify noun-modifier expressions according to their semantic relations

– 600 noun-modifier expressions labeled with semantic relations
– 30 classes or 5 classes

  • Causality: "cold virus", "onion tear"
  • Temporality: "morning frost", "summer travel"
  • Spatial: "aquatic mammal", "west coast", "home remedy"
  • Participant: "dream analysis", "mail sorter", "blood donor"
  • Quality: "copper coin", "rice paper", "picture book"

– supervised nearest-neighbour algorithm using the cosine of row vectors (see the sketch after this slide)

  • reference

– Turney, P.D. (2006), Similarity of semantic relations, Computational Linguistics, 32 (3), 379-416.

Pair-Pattern Matrix (4 of 8)
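The nearest-neighbour step is short once the row vectors exist: label a new noun-modifier pair with the class of the most similar labeled pair. The training vectors and test vector below are invented; the real system uses row vectors from a large pair-pattern matrix.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# labeled training pairs with invented row vectors (illustration only)
train = [
    ("cold:virus",     np.array([3., 0., 1.]), "Causality"),
    ("morning:frost",  np.array([0., 3., 1.]), "Temporality"),
    ("aquatic:mammal", np.array([1., 0., 3.]), "Spatial"),
]

def classify(vector, train):
    """Supervised 1-nearest-neighbour: return the label of the most similar training vector."""
    _, _, label = max(train, key=lambda item: cosine(item[1], vector))
    return label

print(classify(np.array([2., 1., 1.]), train))   # "Causality" with these invented vectors
```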


Synonyms vs Antonyms

  • ESL synonym versus antonym questions

– language test for students of English as a Second Language
– 136 synonym-versus-antonym questions

  • dissimilarity - resemblance: syn or ant? (ant)
  • naive - callow: syn or ant? (syn)
  • commend - denounce: syn or ant? (ant)
  • expose - camouflage: syn or ant? (ant)
  • galling - irksome: syn or ant? (syn)

– two-class supervised learning using row vectors (a toy stand-in is sketched after this slide)

  • reference

– Turney, P.D. (2008), A uniform approach to analogies, synonyms, antonyms, and associations, Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008), Manchester, UK, pp. 905-912.

Pair-Pattern Matrix (5 of 8)
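For the two-class problem, any supervised learner over the row vectors will do; the paper uses a standard supervised learner, while the sketch below substitutes a much simpler nearest-centroid rule (average the row vectors of each class, assign the closer centroid) just to show the shape of the data. All vectors are invented.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# invented row vectors for labeled training pairs (illustration only)
train = [
    ("dissimilarity:resemblance", np.array([0., 3., 1.]), "ant"),
    ("commend:denounce",          np.array([1., 4., 0.]), "ant"),
    ("naive:callow",              np.array([3., 0., 1.]), "syn"),
    ("galling:irksome",           np.array([4., 1., 0.]), "syn"),
]

def centroid(label):
    return np.mean([v for _, v, y in train if y == label], axis=0)

syn_c, ant_c = centroid("syn"), centroid("ant")
test_vec = np.array([0., 2., 1.])   # invented vector for expose:camouflage
print("ant" if cosine(test_vec, ant_c) > cosine(test_vec, syn_c) else "syn")   # "ant" here
```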


Similar vs Associated

  • similar versus associated

– 3 x 48 = 144 word pairs
– 3 classes: similar, associated, both

  • Similar: table-bed, music-art, flea-ant
  • Associated: cradle-baby, mug-beer, mold-bread
  • Both: ale-beer, uncle-aunt, ball-bat

– three-class supervised learning problem using row vectors

  • reference

– Turney, P.D. (2008), A uniform approach to analogies, synonyms, antonyms, and associations, Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008), Manchester, UK, pp. 905-912.

Pair-Pattern Matrix (6 of 8)


Systematic Analogies

  • analogical mapping between sets of terms

– mapping from solar system to atom

  • reference

– submitted but not yet published

  Source A          Mapping M          Target B
  solar system      →                  atom
  sun               →                  nucleus
  planet            →                  electron
  mass              →                  charge
  attracts          →                  attracts
  revolves          →                  revolves
  gravity           →                  electromagnetism

Pair-Pattern Matrix (7 of 8)


Summary

  • applications for a pair-pattern matrix

– proportional analogies
– semantic relations
– synonyms versus antonyms
– similar versus associated
– systematic analogies

Pair-Pattern Matrix (8 of 8)


Outline

  • symbolic versus spatial approaches to knowledge

– logic versus geometry

  • term-document matrix

– latent semantic analysis; applications

  • pair-pattern matrix

– latent relational analysis; applications

  • episodic versus semantic

– some hypotheses about vectors and semantics

  • conclusions

– how to acquire knowledge; how to represent knowledge



Episodic vs Semantic

  • episodic memory is memory of a specific event in one's personal past

– I remember when I first went hang gliding
– I remember when I saw the Great Pyramid of Giza

  • semantic memory is memory of basic facts and concepts, unrelated to any specific event in one's personal past

– I remember that the speed of light in a vacuum is approximately 3 × 10^8 meters per second
– I remember that a tesseract is a four-dimensional hypercube composed of eight three-dimensional cubes

  • distinction from cognitive psychology

– types of explicit or declarative memory, as opposed to implicit or procedural memory

Episodic vs Semantic (1 of 4)


Episodic vs Semantic

  • ACE Local Relation Detection and Recognition (LRDR) task

– “George Bush traveled to France on Thursday for a summit.”
– there is a Physical.Located relation between George Bush and France
– extraction of episodic information from a sentence

  • Noun-Modifier Classification task

– acquisition of semantic knowledge from a corpus

  • Causality: "cold virus", "onion tear"
  • Temporality: "morning frost", "summer travel"
  • Spatial: "aquatic mammal", "west coast", "home remedy"
  • Participant: "dream analysis", "mail sorter", "blood donor"
  • Quality: "copper coin", "rice paper", "picture book"

Episodic vs Semantic (2 of 4)


Posterior vs Prior

  • posterior probability versus prior probability

– R(X,Y) = X and Y have relation R
– S(X,Y) = X and Y occur in sentence S
– prior probability = prob(R(X,Y)) = semantic
– posterior probability = prob(R(X,Y) | S(X,Y)) = episodic

  • ACE Local Relation Detection and Recognition (LRDR) task

– R(X,Y) = there is a Physical.Located relation between George Bush and France
– S(X,Y) = “George Bush traveled to France on Thursday for a summit.”

  • Noun-Modifier Classification task

– R(X,Y) = there is a Spatial relation between aquatic and mammal

Episodic vs Semantic (3 of 4)
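A toy illustration of the prior/posterior distinction, with invented counts: the prior is estimated over all word pairs in a corpus (semantic), the posterior over only the sentences that look like S (episodic).

```python
# invented counts, purely to illustrate the prior/posterior distinction
n_pairs_total      = 10000   # (X, Y) pairs sampled from a corpus
n_pairs_with_R     = 150     # pairs for which relation R holds
n_sentences_like_S = 200     # sentences matching the pattern of sentence S
n_like_S_with_R    = 120     # of those, sentences in which R holds for (X, Y)

prior     = n_pairs_with_R / n_pairs_total        # prob(R(X,Y))          -- semantic
posterior = n_like_S_with_R / n_sentences_like_S  # prob(R(X,Y) | S(X,Y)) -- episodic
print(prior, posterior)   # 0.015 0.6
```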


Outline

  • symbolic versus spatial approaches to knowledge

– logic versus geometry

  • term-document matrix

– latent semantic analysis; applications

  • pair-pattern matrix

– latent relational analysis; applications

  • episodic versus semantic

– some hypotheses about vectors and semantics

  • conclusions

– how to acquire knowledge; how to represent knowledge



Knowledge Representation

  • need spatial representation

– for measuring similarity
– for estimating probabilities

  • need symbolic representation

– for reasoning about entailment
– for communication

  • input text and output text
  • language is symbolic

Conclusions (1 of 4)


Knowledge Acquisition

  • spatial approach is able to acquire knowledge from text

– term-document matrix:

  • information retrieval, measuring word similarity, essay grading, textual cohesion, semantic orientation, logic

– pair-pattern matrix:

  • proportional analogies, semantic relations, synonyms versus antonyms, similar versus associated, systematic analogies

Conclusions (2 of 4)


Knowledge Use

  • symbolic representation

– useful for input and output
– compact storage, if aliasing is tolerable
– useful for logical reasoning, entailment

  • spatial representation

– useful for calculating similarity
– useful for calculating probability
– case-based reasoning, analogical reasoning
– learning

Conclusions (3 of 4)


Conclusion

  • Information Extraction has focused on episodic information

– IE: NER, MUC, ACE, etc.
– episodic: posterior
– representation is symbolic

  • Vector Space Models have focused on semantic information

– VSM: IR, LSA, LRA, cosine, etc.
– semantic: prior
– representation is spatial

  • need to combine the two

– IE can use prior information from VSM
– VSM can use posterior information from IE

Conclusions (4 of 4)