
LSA Spaces for Semantic Differences: John C. Martin Dissertation


  1. Comparison of Hyper-dimensional LSA Spaces for Semantic Differences John C. Martin Dissertation Defense 20 May 2016

  2. Overview • Review LSA model of learning – What is meaning? • Measures • Experiments • Semantic Measurement Model • Q & A

  3. The LSA Model of Learning • Orthogonal Axes • Dimensionality Reduction • Mapping System Meaning

  4. Compositionality Constraint The meaning of a document is the sum of the meaning of its words $D = q^{T} U_k \Sigma_k^{-1}$

  5. Compositionality Constraint Corollary The meaning of a word is defined by the documents in which it appears (and does not appear)

  6. Meaning The Mapping system consists of: • Term Vector Dictionary • Singular Values
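
A minimal sketch of how a term vector dictionary and singular values can act as a mapping system, assuming the standard LSA fold-in formula $d = q^{T} U_k \Sigma_k^{-1}$. The function names `build_space` and `project`, the toy matrix `X`, and the use of NumPy are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def build_space(term_doc, k):
    """Truncated SVD of a terms-by-documents count matrix.

    Returns the term vectors (U_k), the singular values, and the
    document vectors (V_k). Hypothetical helper for illustration.
    """
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :].T

def project(q, term_vectors, singular_values):
    """Fold a term-count vector q into the space: d = q^T U_k Sigma_k^{-1}."""
    return (q @ term_vectors) / singular_values

# Toy data (made up): 4 terms x 3 documents.
X = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 0.0],
              [0.0, 0.0, 2.0]])

U_k, s_k, V_k = build_space(X, k=3)

# Projecting a document that was used to build the space
# reproduces that document's own vector.
d0 = project(X[:, 0], U_k, s_k)
```

Under these assumptions, only the term vectors and singular values are needed to place new content in the space; the original document vectors are not required.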

  7. Motivation

  8. Objective Find a measure or set of measures that can quantify the difference between two spaces

  9. Measures • Direct Comparison • Projected Content Comparison • Rotated Item Comparison

  10. Direct Comparison Measures [diagram]

  11. Individual Space Measures • Document Count • Term Count • Non-zeroes

  12. Distribution Analysis

  13. Term and Document Overlap

  14. Projected Content Comparisons [diagram: matched items 1, 2, 3 projected into each space]

  15. Projected Item Distribution

  16. Three-Tuple Comparisons $(A, B, C)$, $A = p_i$, $B = p_j$, $C = p_k$, where $i \neq j \neq k$, $\forall p \in P$

  17. Three-Tuple Relationship Changes
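
The slides do not define what counts as a relationship change for a three-tuple. The sketch below assumes one plausible reading: for each triple, ask whether item A is more similar (by cosine) to B or to C, and count the triples where that answer differs between two spaces. The function names and the toy coordinates are hypothetical.

```python
from itertools import combinations
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two vectors given as sequences of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

def closer_to_first(items, i, j, k):
    """True if item i is at least as similar to item j as to item k."""
    return cosine(items[i], items[j]) >= cosine(items[i], items[k])

def tuple_change_fraction(items_1, items_2):
    """Fraction of triples whose assumed relationship differs between spaces.

    items_1 and items_2 hold the same matched items, in the same order,
    projected into space 1 and space 2 respectively.
    """
    n = len(items_1)
    total = changed = 0
    for i in range(n):
        others = [x for x in range(n) if x != i]
        for j, k in combinations(others, 2):
            total += 1
            if closer_to_first(items_1, i, j, k) != closer_to_first(items_2, i, j, k):
                changed += 1
    return changed / total

# Toy projections (made up): three matched items in two 2-D spaces.
space_1 = [(1.0, 0.0), (1.0, 0.1), (0.0, 1.0)]
space_2 = [(1.0, 0.0), (0.0, 1.0), (1.0, 0.1)]

unchanged = tuple_change_fraction(space_1, space_1)
swapped = tuple_change_fraction(space_1, space_2)
```

Because the comparison uses only similarities within each space, this kind of measure does not require the two spaces to share axes or dimensionality.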

  18. Rotations and Transform Comparisons [diagram: items 1, 2, 3 in two spaces, with rotated item 2']

  19. The Transform $A_1 = \mathrm{Project}(A, S_1)$, $A_2 = \mathrm{Project}(A, S_2)$, $A_1^{T} A_2 = U \Sigma V^{T}$, $Q = U V^{T}$, $\lVert A_1 Q - A_2 \rVert_F$
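
The transform on this slide reads as an orthogonal Procrustes solution: take the SVD of $A_1^{T} A_2$, form $Q = U V^{T}$, and measure the residual in the Frobenius norm. A minimal NumPy sketch follows; the function name `procrustes_rotation` and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def procrustes_rotation(A1, A2):
    """Orthogonal Q minimising ||A1 Q - A2||_F, from the SVD of A1^T A2."""
    U, _, Vt = np.linalg.svd(A1.T @ A2)
    return U @ Vt

# Synthetic check (made up): A2 is A1 rotated by a known matrix R,
# so the recovered Q should match R and the residual should vanish.
rng = np.random.default_rng(0)
A1 = rng.normal(size=(6, 3))
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])
A2 = A1 @ R

Q = procrustes_rotation(A1, A2)
residual = np.linalg.norm(A1 @ Q - A2, ord="fro")
```

When the two sets of projected items differ by more than a rotation, the residual stays above zero, which is what makes it usable as a difference measure.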

  20. Comparative Space Centroid Analysis [diagram: centroids C1 and C2]

  21. Overlapping Term Vector Norm $\lVert T_1 Q - T_2 \rVert_F = \lVert T \rVert_F = \sqrt{\sum_{i=1}^{T} \sum_{j=1}^{k} t_{i,j}^{2}}$
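
A sketch of how an overlapping term vector norm could be computed, assuming each space is available as a mapping from term to a k-dimensional vector (with the same k in both spaces) and that the rotation is obtained as an orthogonal Procrustes solution. The function name `otv_norm`, the dictionary layout, and the toy vectors are hypothetical; any scaling or normalisation used in the dissertation is not shown on the slide and is not modelled here.

```python
import numpy as np

def otv_norm(terms_1, terms_2):
    """Frobenius norm of T1 Q - T2 over the terms shared by both spaces."""
    shared = sorted(set(terms_1) & set(terms_2))
    T1 = np.array([terms_1[t] for t in shared])
    T2 = np.array([terms_2[t] for t in shared])
    # Best orthogonal alignment of space 1 onto space 2.
    U, _, Vt = np.linalg.svd(T1.T @ T2)
    Q = U @ Vt
    T = T1 @ Q - T2
    # Square root of the sum of squared entries of the difference matrix.
    return float(np.sqrt((T ** 2).sum()))

# Toy term dictionaries (made up). space_b is space_a rotated by 90 degrees
# on the shared terms; space_c doubles the length of the shared vectors.
space_a = {"cat": [1.0, 0.0], "dog": [0.0, 1.0], "fish": [1.0, 1.0]}
space_b = {"cat": [0.0, 1.0], "dog": [-1.0, 0.0], "bird": [2.0, 2.0]}
space_c = {"cat": [2.0, 0.0], "dog": [0.0, 2.0]}

rotated_only = otv_norm(space_a, space_b)
rescaled = otv_norm(space_a, space_c)
```

In this toy case a pure rotation is absorbed by Q, while a change in vector length is not, so only the second comparison leaves a residual.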

  22. Projection/Anchor Sets

      Set       Documents   Unique Terms   Term Instances
      NICHD04       1,060          5,912           70,063
      T-500           500         16,317          123,668
      T-1000        1,000         24,319          252,372
      T-5000        5,000         49,995        1,281,749

  23. Control Experiment

  24. General Experiment

  25. General Experiment

  26. Grade Level Series Experiment

  27. Grade Level Series OTV-Norm

  28. Large Volume Experiment

  29. Large Volume Experiment

  30. Non-overlapping Series Experiment

  31. Non-Overlapping Series OTV-Norm

  32. Frozen Vocabulary Experiment

  33. OTV-Norm

  34. Semantic Measurement Model $TC\% \approx -0.207882 + 0.0507194\,(\mathit{OTVNorm}) - 0.339339\,(\mathit{TOR})$
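
The model on this slide is a linear function of two predictors, so it can be evaluated directly. The sketch below reads the predictors as the OTV-Norm and a quantity the slide abbreviates TOR; the function name, argument names, and the input values are assumptions, and the slide does not state the units or valid ranges of either predictor.

```python
def predicted_tc_percent(otv_norm, tor):
    """Evaluate the slide's linear model with its published coefficients."""
    intercept = -0.207882
    otv_norm_coefficient = 0.0507194
    tor_coefficient = -0.339339
    return intercept + otv_norm_coefficient * otv_norm + tor_coefficient * tor

# Hypothetical inputs, chosen only to show the arithmetic.
example = predicted_tc_percent(otv_norm=10.0, tor=0.5)
```

The signs of the coefficients indicate that the predicted value rises with the OTV-Norm and falls as TOR increases.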

  35. Summary of Contributions • Semantic differences are observable – Measurable – Quality based • Similarity not dependent on overlapping content • OTV-Norm & Semantic Measurement Model – Whole-space measurement

  36. Further Research • Refine the model – Anchor set selection/influence – Account for non-overlapping terms – Investigate non-linear model • Other questions raised

  37. Leverage for Answering Other Questions • Is it possible to identify key documents that affect the meaning of a space? • Do additional items added to a space have any impact? • Is there a point at which adding any items to a space makes no difference? • Is it possible to identify necessary knowledge that would align two spaces?

  38. Q&A

  39. Backup Slides

  40. Projection of New Content [diagram: Text Sources, LSA Space, Mapping Information, Projection of items 1, 2, 3]

  41. Data • 42 Spaces • 592 Comparisons • 4 Projection Sets • 4 Anchor Sets • 26 Measures 61,568 Data Items Collected

  42. Distribution Analysis
