Learning Effective and Interpretable Semantic Models using Non-Negative Sparse Embedding (NNSE)


  1. Learning Effective and Interpretable Semantic Models using Non-Negative Sparse Embedding (NNSE)
     Brian Murphy, Partha Talukdar, Tom Mitchell
     Machine Learning Department, Carnegie Mellon University

  2. Distributional Semantic Modeling
     • Words are represented in a high-dimensional vector space
     • Long history: (Deerwester et al., 1990), (Lin, 1998), (Turney, 2006), ...
     • Although effective, these models are often not interpretable
     • [Slide shows the top 5 words from 5 randomly chosen dimensions of an SVD-300 model]
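
A toy sketch of the idea, using invented co-occurrence counts and contexts: each word becomes a vector of counts over context words, and words used in similar contexts end up close together in that high-dimensional space.

    # Toy distributional vectors (hypothetical counts, hypothetical context words)
    import numpy as np

    contexts = ["engine", "wheel", "bark", "fur"]
    vectors = {
        "motorbike": np.array([12.0,  9.0,  0.0,  0.0]),
        "car":       np.array([15.0, 11.0,  0.0,  1.0]),
        "dog":       np.array([ 0.0,  0.0, 14.0, 10.0]),
    }

    def cosine(u, v):
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

    print(cosine(vectors["motorbike"], vectors["car"]))   # high: similar contexts
    print(cosine(vectors["motorbike"], vectors["dog"]))   # near zero: different contexts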

  3. Why interpretable dimensions? Semantic Decoding (Mitchell et al., Science 2008)
     [Slide shows the decoding pipeline:]
     • Input stimulus word: "motorbike"
     • Semantic representation: weights over feature verbs, e.g. (0.87, ride), (0.29, see), ..., (0.00, rub), (0.00, taste)
     • A mapping learned from fMRI data maps this representation to the predicted brain activity for "motorbike"

  4. Why interpretable dimensions? (Mitchell et al., Science 2008)
     • Interpretable dimensions reveal insightful brain activation patterns!
     • But features in the semantic representation were based on 25 hand-selected verbs
       ‣ can't represent arbitrary concepts
       ‣ need data-driven, broad-coverage semantic representations

  5. What is an Interpretable, Cognitively-plausible Representation?
     [Slide shows a words-by-features matrix; the row for a word w is mostly zeros with a few positive entries, e.g. ... 0 0 0 ... 1.2 ...]
     1. Compact representation: Sparse, many zeros
     2. Uneconomical to store negative (or inferable) characteristics: Non-Negative
     3. Meaningful dimensions: Coherent
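
A hypothetical illustration of such a representation: because the vector is non-negative and mostly zeros, it can be stored as a handful of (dimension, weight) pairs, and each active dimension can be labelled by its most characteristic words. All dimension labels and weights below are invented.

    # Hypothetical sparse, non-negative word representation (invented values)
    motorbike = {
        "dim_017 (ride, bike, wheel)":   1.2,
        "dim_203 (engine, fuel, motor)": 0.8,
        # all other dimensions are exactly 0 and need not be stored
    }
    print(sorted(motorbike.items(), key=lambda kv: -kv[1]))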

  6. Properties of Different Semantic Representations
     [Slide shows a table comparing representations on five properties: Effective, Broad Coverage, Sparse, Non-Negative, Coherent]
     • Hand Tuned
     • Corpus-derived (existing)
     • NNSE (our proposal)
     • Prediction accuracy on Neurosemantic Decoding (Murphy, Talukdar, Mitchell, *SEM 2012): Hand Coded 83.5, Corpus-derived 83.1

  7. Non-Negative Sparse Embedding (NNSE)
     [Slide shows the factorization X ≈ A × D: X is the input matrix (corpus co-occurrence + SVD), whose row for word w_i is its input representation; the corresponding row of A is the NNSE representation of w_i; D is the basis]
     • matrix A is non-negative
     • sparsity penalty on the rows of A
     • alternating minimization between A and D, using the SPAMS package
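
A minimal sketch of this kind of factorization, using scikit-learn's dictionary learning as a stand-in for SPAMS. This is illustrative only: the matrix sizes, number of latent dimensions, and sparsity weight below are placeholders, not the paper's settings.

    # Factor X ~= A * D with sparse, non-negative codes A (illustrative stand-in for NNSE)
    import numpy as np
    from sklearn.decomposition import DictionaryLearning

    rng = np.random.RandomState(0)
    X = rng.randn(1000, 300)          # stand-in for a 35k-words x 300-dim SVD input

    learner = DictionaryLearning(
        n_components=50,              # number of latent (hopefully interpretable) dimensions
        alpha=1.0,                    # L1 sparsity penalty on the codes (rows of A)
        fit_algorithm="cd",           # coordinate descent supports the positivity constraint
        transform_algorithm="lasso_cd",
        positive_code=True,           # A >= 0, as in NNSE
        random_state=0,
    )
    A = learner.fit_transform(X)      # words x latent dims: sparse, non-negative
    D = learner.components_           # latent dims x input dims: the basis

    print(A.shape, D.shape)           # (1000, 50) (50, 300)
    print(float((A == 0).mean()))     # fraction of zero entries in A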

  8. NNSE Optimization
     [Slide shows the optimization problem behind the factorization X ≈ A × D]
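
The equation itself does not survive in this transcript. A standard sparse-coding formulation consistent with the constraints listed on the previous slide (non-negative A, L1 penalty on the rows of A) is reconstructed below; the unit-norm bound on the rows of D is the usual scaling constraint and is assumed here rather than taken from the slide.

    \min_{A, D} \; \sum_{i=1}^{w} \left\| X_{i,:} - A_{i,:} D \right\|_2^2 + \lambda \left\| A_{i,:} \right\|_1
    \quad \text{s.t.} \quad A_{i,j} \ge 0 \;\; \forall i, j, \qquad \left\| D_{j,:} \right\|_2 \le 1 \;\; \forall j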

  9. Experiments
     • Three main questions:
       1. Are NNSE representations effective in practice?
       2. What is the degree of sparsity of NNSE?
       3. Are NNSE dimensions coherent?
     • Setup:
       • partial ClueWeb09: 16bn tokens, 540m sentences, 50m documents
       • dependency parsed using the Malt parser

  10. Baseline Representation: SVD
     • For about 35k words (~adult vocabulary), extract
       • document co-occurrence
       • dependency features from the parsed corpus
     • Reduce dimensionality using SVD; subsets of this reduced-dimensional space are the baseline
     • This is also the input (X) to NNSE (see the sketch after this slide for how such an input matrix can be built)
     • Other representations were also compared (e.g., LDA, Collobert and Weston); details in the paper
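
A small sketch of building such an input matrix, with a hypothetical three-document corpus: count word-document co-occurrences, then reduce with truncated SVD. The real setup also uses dependency features and roughly 35k words over ClueWeb09.

    # Build a word-by-document count matrix and reduce it with SVD (toy corpus)
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import TruncatedSVD

    documents = [
        "the motorbike engine roared down the road",
        "she likes to ride her bicycle to work",
        "the chef will taste the soup before serving",
    ]

    counts = CountVectorizer().fit_transform(documents)   # documents x words
    word_by_doc = counts.T                                # words x documents

    svd = TruncatedSVD(n_components=2, random_state=0)    # the paper uses far more dimensions
    X = svd.fit_transform(word_by_doc)                    # words x reduced dimensions
    print(X.shape)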

  11. Is NNSE effective in Neurosemantic Decoding?
