Mixed Membership Word Embeddings for Computational Social Science (PowerPoint PPT Presentation)


SLIDE 1

Mixed Membership Word Embeddings for Computational Social Science

James Foulds (Jimmy), Department of Information Systems, University of Maryland, Baltimore County

UMBC ACM Faculty Talk, April 5, 2018

Paper to be presented at the International Conference on Artificial Intelligence and Statistics (AISTATS 2018)

SLIDE 2

Latent Variable Modeling

[Diagram: data (complicated, noisy, high-dimensional) → understand, explore, predict]

SLIDE 3

Latent Variable Modeling

[Diagram: data (complicated, noisy, high-dimensional) → latent variable model → understand, explore, predict]

SLIDE 4

Latent Variable Modeling

[Diagram: data (complicated, noisy, high-dimensional) → latent variable model → low-dimensional, semantically meaningful representations → understand, explore, predict]

SLIDE 5

Latent Variable Modeling

  • Latent variable modeling is a general, principled approach for making sense of complex data sets
  • Core principles:
    – Dimensionality reduction

Images due to Chris Bishop, Pattern Recognition and Machine Learning

SLIDE 6

Latent Variable Modeling

  • Latent variable modeling is a general, principled approach for making sense of complex data sets
  • Core principles:
    – Dimensionality reduction
    – Probabilistic graphical models

Images due to Chris Bishop, Pattern Recognition and Machine Learning

SLIDE 7

Latent Variable Modeling

  • Latent variable modeling is a general, principled approach for making sense of complex data sets
  • Core principles:
    – Dimensionality reduction
    – Probabilistic graphical models
    – Statistical inference, especially Bayesian inference

Images due to Chris Bishop, Pattern Recognition and Machine Learning

SLIDE 8

Latent Variable Modeling

  • Latent variable modeling is a general, principled approach for making sense of complex data sets
  • Core principles:
    – Dimensionality reduction
    – Probabilistic graphical models
    – Statistical inference, especially Bayesian inference

Latent variable models are, basically, PCA on steroids!

Images due to Chris Bishop, Pattern Recognition and Machine Learning

SLIDE 9

Motivating Applications

  • Industry:
    – User modeling, recommender systems, and personalization, …

SLIDE 10

Motivating Applications

  • Natural language processing:
    – Machine translation
    – Document summarization
    – Parsing
    – Question answering
    – Named entity recognition
    – Sentiment analysis
    – Opinion mining

SLIDE 11

Motivating Applications

  • Furthering scientific understanding in:
    – Cognitive psychology (Griffiths and Tenenbaum, 2006)
    – Sociology (Hoff, 2008)
    – Political science (Gerrish and Blei, 2012)
    – The humanities (Mimno, 2012)
    – Genetics (Pritchard, 2000)
    – Climate science (Bain et al., 2011)
    – …

SLIDE 12

Motivating Applications

  • Social network analysis:
    – Identify latent social groups/communities
    – Test sociological theories (homophily, stochastic equivalence, triadic closure, balance theory, …)

SLIDE 13

Motivating Applications

  • Computational social science, digital humanities, …

SLIDE 14

Example: Mining Classics Journals

SLIDE 15

Example: Do U.S. Senators from the same state prioritize different issues? (Grimmer, 2010)

[Figure labels: “Schiller’s theory is false” vs. “Schiller’s theory is true”]

Grimmer, J. A Bayesian hierarchical topic model for political texts: Measuring expressed agendas in Senate press releases. Political Analysis, 18(1):1–35, 2010.

SLIDE 16

Example: Influence Relationships in the U.S. Supreme Court

Guo, F., Blundell, C., Wallach, H., and Heller, K. (2015). AISTATS.

SLIDE 17

Box’s Loop

[Diagram: data (complicated, noisy, high-dimensional) → latent variable model → low-dimensional, semantically meaningful representations → understand, explore, predict → evaluate, iterate]

SLIDE 18

Overview of my Research

[Diagram: research overview around the data → latent variable model → understand/explore/predict loop, spanning latent variable models, general-purpose modeling frameworks, evaluation, and privacy. Publications: KDD’15, ACL’15 (x2), EMNLP’13,’15, ICWSM’11, AISTATS’11,’17, SDM’11, UAI’14,’16, KDD’13, ICML’15, RecSys’15, ArXiv’16 (submitted to JMLR); older work: KER’10, DS’10, AusAI’08, APJOR’06, IJOR’06]

SLIDE 19

Topic Models (Blei et al., 2003)

The quick brown fox jumps over the sly lazy dog

SLIDE 20

Topic Models (Blei et al., 2003)

The quick brown fox jumps over the sly lazy dog [5 6 37 1 4 30 5 22 570 12]

SLIDE 21

Topic Models (Blei et al., 2003)

The quick brown fox jumps over the sly lazy dog [5 6 37 1 4 30 5 22 570 12]

Topics: Foxes, Dogs, Jumping [40% 40% 20%]

SLIDE 22

Topics

[Figure: example topics, e.g. Topic 1: reinforcement learning; Topic 2: learning algorithms; Topic 3: character recognition]

A topic is a distribution over all words in the dictionary: a vector of discrete probabilities (sums to one)

SLIDE 23

Topic Models for Computational Social Science

[Figure: topics over time]

SLIDE 24

Naïve Bayes Document Model

[Figure: assumed generative process and graphical model, with plate over documents d = 1:D]

SLIDE 25

Mixed Membership Modeling

  • The naïve Bayes conditional independence assumption is typically too strong, not realistic
  • Mixed membership: relax the “hard clustering” assumption to “soft clustering”
    – Membership distribution over clusters
    – E.g.:
      • Text documents belong to a distribution of topics
      • Social network individuals belong partly to multiple communities
      • Our genes come from multiple different ancestral populations

SLIDE 26

Mixed Membership Modeling

  • Improves representational power for a fixed number of topics/clusters
    – We can have a powerful model with fewer clusters
  • Parameter sharing
    – Can learn on smaller datasets, especially with a Bayesian approach to manage uncertainty in cluster assignments

SLIDE 27

Topic Model Latent Representations

  • Unsupervised naïve Bayes (latent class model)
  • Topic model (mixed membership model)

[Table: under the mixed membership model, each document has a distribution over the topics Foxes, Dogs, Jumping (e.g., Doc 1: 0.4, 0.4, 0.2; Doc 2: 0.5, 0.5; Doc 3: 0.1, 0.9); under unsupervised naïve Bayes, each document places weight 1 on a single topic]

SLIDE 28

Latent Dirichlet Allocation Topic Model (Blei et al., 2003)

Documents have distributions over topics θ(d); topics are distributions over words φ(k)

Assumed generative process (the full model includes priors on θ, φ):

  • For each document d
    • For each word wd,n
      • Draw a topic assignment zd,n ~ Discrete(θ(d))
      • Draw a word from the chosen topic wd,n ~ Discrete(φ(zd,n))
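As a minimal illustration, here is a toy sketch of this generative process in Python (not the paper's code; numpy's Dirichlet and categorical draws stand in for the priors and Discrete(·), and the sizes and hyperparameters are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
K, V, D, N = 3, 10, 5, 8        # topics, vocabulary size, documents, words per doc
alpha, beta = 0.1, 0.01         # symmetric Dirichlet hyperparameters (assumed)

phi = rng.dirichlet(beta * np.ones(V), size=K)     # topics: distributions over words
docs = []
for d in range(D):
    theta_d = rng.dirichlet(alpha * np.ones(K))    # document's distribution over topics
    z = rng.choice(K, size=N, p=theta_d)           # topic assignment z_{d,n}
    w = [int(rng.choice(V, p=phi[k])) for k in z]  # word w_{d,n} from the chosen topic
    docs.append(w)
```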

SLIDE 29

Collapsed Gibbs Sampler for LDA (Griffiths and Steyvers, 2004)

  • Marginalize out the parameters θ and φ, and perform inference on the latent variables z only

SLIDE 30

Collapsed Gibbs Sampler for LDA (Griffiths and Steyvers, 2004)

  • The collapsed Gibbs update (annotated on the slide with the topic counts, document-topic counts, word-topic counts, and the smoothing from the prior, similar to Laplace smoothing) is, in standard notation:

$$P(z_{d,n} = k \mid \mathbf{z}_{\neg(d,n)}, \mathbf{w}) \;\propto\; \big(n_{d,k}^{\neg(d,n)} + \alpha\big)\,\frac{n_{k,\,w_{d,n}}^{\neg(d,n)} + \beta}{n_{k}^{\neg(d,n)} + V\beta}$$
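A toy sketch of one sweep of this sampler (illustrative variable names, not the paper's implementation: n_dk is the D x K document-topic count matrix, n_kw the K x V word-topic count matrix, and n_k the length-K topic counts, all kept consistent with the current assignments z):

```python
import numpy as np

def cgs_sweep(docs, z, n_dk, n_kw, n_k, alpha, beta, rng):
    """One collapsed Gibbs sweep over every token's topic assignment."""
    K, V = n_kw.shape
    for d, doc in enumerate(docs):
        for n, w in enumerate(doc):
            k = z[d][n]
            n_dk[d, k] -= 1; n_kw[k, w] -= 1; n_k[k] -= 1   # remove token's counts
            p = (n_dk[d] + alpha) * (n_kw[:, w] + beta) / (n_k + V * beta)
            k = int(rng.choice(K, p=p / p.sum()))           # resample its topic
            z[d][n] = k
            n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1   # add the counts back
```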

SLIDE 31

Word Embeddings

  • Language models which learn to represent dictionary words with vectors:
    dog: (0.11, -1.5, 2.7, …)
    cat: (0.15, -1.2, 3.2, …)
    Paris: (4.5, 0.3, -2.1, …)
  • Nuanced representations for words
  • Improved performance for many NLP tasks
    – translation, part-of-speech tagging, chunking, NER, …
  • NLP “from scratch”? (Collobert et al., 2011)

[Diagram: dog and cat embedded near each other; Paris farther away]

SLIDE 32

Word Embeddings

  • Vector arithmetic solves analogy tasks:

    man is to king as woman is to _____?
    v(king) - v(man) + v(woman) ≈ v(queen)

[Diagram: vectors v(king), v(queen), v(man), v(woman) illustrating the analogy offset]
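A minimal sketch of this vector arithmetic over a hypothetical embedding dictionary (toy 2-d vectors chosen so the analogy works out; real vectors come from a trained model):

```python
import numpy as np

# Hypothetical toy embeddings; in practice these come from a trained model.
emb = {"king": np.array([1.0, 2.0]), "queen": np.array([0.9, 3.0]),
       "man": np.array([0.5, 0.1]), "woman": np.array([0.4, 1.1])}

def analogy(a, b, c):
    """Word closest (by cosine) to v(b) - v(a) + v(c), excluding the inputs."""
    target = emb[b] - emb[a] + emb[c]
    cos = lambda u, v: u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    return max((w for w in emb if w not in {a, b, c}),
               key=lambda w: cos(emb[w], target))

print(analogy("man", "king", "woman"))   # -> queen, with these toy vectors
```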

SLIDE 33

The Distributional Hypothesis

  • “There is a correlation between distributional similarity and meaning similarity, which allows us to utilize the former in order to estimate the latter.” (Sahlgren, 2008)

[Diagram: words w1 and w2 appearing in similar contexts have similar meanings]

SLIDE 34

The Distributional Hypothesis

  • “There is a correlation between distributional similarity and meaning similarity, which allows us to utilize the former in order to estimate the latter.” (Sahlgren, 2008)

[Diagram: words w1 and w2 appearing in similar contexts have similar meanings]

SLIDE 35

The Distributional Hypothesis

  • “There is a correlation between distributional similarity and meaning similarity, which allows us to utilize the former in order to estimate the latter.” (Sahlgren, 2008)

[Diagram: words w1 and w2 appearing in similar contexts have similar meanings]

SLIDE 36

Word2vec (Mikolov et al., 2013)

Skip-Gram: a log-bilinear classifier for the context of a given word

Figure due to Mikolov et al. (2013)
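For reference, the standard skip-gram conditional is the softmax of a bilinear score (notation assumed: input vectors $v$, output/context vectors $v'$; the slide shows the architecture figure rather than the formula):

$$p(w_c \mid w_i) \;=\; \frac{\exp\!\big({v'_{w_c}}^{\top} v_{w_i}\big)}{\sum_{w=1}^{V} \exp\!\big({v'_{w}}^{\top} v_{w_i}\big)}$$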

SLIDE 37

The Skip-Gram Encodes the Distributional Hypothesis

  • Word vectors encode the distribution of context words
  • Similar words are assumed to have similar vectors

[Diagram: words w1 and w2 appearing in similar contexts]

SLIDE 38

Word2vec (Mikolov et al., 2013)

  • Key insights:
    – Simple models can be trained efficiently on big data
    – High-dimensional simple embedding models, trained on massive data sets, can outperform sophisticated neural nets

SLIDE 39

Word Embeddings for Computational Social Science?

  • Word embeddings have many advantages
    – Capture similarities between words
    – Often better classification performance than topic models
  • They have not yet been widely adopted for computational social science research, perhaps due to the following limitations:
    – The target corpus of interest is often not big data
    – It is important for the model to be interpretable

SLIDE 40

Contributions of this Work

  • Interpretable, statistically efficient embedding model
  • Efficient training algorithm, using recent advances from both topic models and word embeddings
  • Experimental results and computational social science case studies
  • Practical recommendations and insights
    – on the use of generic big-data embeddings, which is a very common practice in NLP

J. R. Foulds. Mixed Membership Word Embeddings for Computational Social Science. Proceedings of the 21st International Conference on Artificial Intelligence and Statistics (AISTATS), 2018.

SLIDE 41

The Skip-Gram as a Probabilistic Model

  • Can view the skip-gram as a probabilistic model for “generating” context words
  • Conditional discrete distribution over context words: can identify it with a topic
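In symbols (notation assumed, following the topic-model analogy): each input word's conditional distribution over context words is a discrete distribution that can be read as a per-word “topic”,

$$\phi^{(w_i)}_{w_c} \;\triangleq\; p(w_c \mid w_i)$$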

SLIDE 42

The Skip-Gram as a Probabilistic Model

[Figure annotations: observed “cluster” assignment; naïve Bayes conditional independence; “topic” distribution for input word wi]

SLIDE 43

Analogous Topic Model Corresponding to Skip-Gram

[Figure annotations: observed “cluster” assignment; naïve Bayes conditional independence assumption; “topic” for input word wi]

SLIDE 44

Grid of Models’ “Generative” Processes

SLIDE 45

Grid of Models’ “Generative” Processes

SLIDE 46

Grid of Models’ “Generative” Processes

Identifying word distributions with topics leads to an analogous topic model

SLIDE 47

Grid of Models’ “Generative” Processes

Identifying word distributions with topics leads to an analogous topic model. Relax the naïve Bayes assumption, replace with a mixed membership model:

  • flexible representation for words
  • parameter sharing

SLIDE 48

Grid of Models’ “Generative” Processes

Identifying word distributions with topics leads to an analogous topic model. Relax the naïve Bayes assumption, replace with a mixed membership model:

  • flexible representation for words
  • parameter sharing

Reinstate the word vector representation

SLIDE 49

Mixed Membership Skip-Gram Topic Model

  • Each input word has a distribution over topics
  • Topics shared across all input words

[Figure: graphical model with contexts, context words, vocabulary, and topics]
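A toy sketch of this generative process as described (one topic drawn per input token, under my reading of the model; variable names are illustrative, not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(0)
V, K, C = 10, 4, 2                                 # vocabulary, topics, context size
theta = rng.dirichlet(0.1 * np.ones(K), size=V)    # per-input-word topic distributions
phi = rng.dirichlet(0.01 * np.ones(V), size=K)     # shared topics over context words

def generate_context(w_i):
    """Draw context words for input token w_i under the MMSG topic model."""
    z = rng.choice(K, p=theta[w_i])                # the token's topic
    return [int(rng.choice(V, p=phi[z])) for _ in range(C)]  # context words from topic z
```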

SLIDE 50

Mixed Membership Skip-Gram

[Figure: graphical model with contexts, context words, vocabulary, and topics]

SLIDE 51

Mixed Membership Word Embeddings

Word embeddings are convex combinations of topic embeddings
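In symbols (notation assumed: topic embeddings $u_k$, per-word topic proportions $\theta^{(w)}$):

$$v_w \;=\; \sum_{k=1}^{K} \theta^{(w)}_k \, u_k, \qquad \theta^{(w)}_k \ge 0, \; \sum_{k} \theta^{(w)}_k = 1$$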

SLIDE 52

Mixed Membership Word Embeddings

  • Words have mixed membership distributions over topics
  • Topics have embeddings; words don’t. Resolves polysemy
  • Fewer vectors than words: statistical efficiency on small data
  • Word embeddings recovered as prior mean or posterior mean vectors
    – convex combinations of topic embeddings
  • Interpretable: topics can be interpreted via top-words lists; word embeddings are defined in terms of topic embeddings

SLIDE 53

Mixed Membership Skip-Gram: Posterior Inference for the Topic Vector

  • Context can be leveraged for inferring the topic vector at test time, via Bayes’ rule
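A plausible reconstruction of this Bayes' rule computation (one topic z per token, context words w_c; notation as above):

$$p(z = k \mid w_i, \text{context}) \;\propto\; \theta^{(w_i)}_k \prod_{c \,\in\, \text{context}(i)} \phi^{(k)}_{w_c}$$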

SLIDE 54

Bayesian Inference for MMSG Topic Model

  • Bayesian version of the model, with Dirichlet priors
  • Collapsed Gibbs sampling

SLIDE 55

Bayesian Inference for MMSG Topic Model

  • Challenge 1: want a relatively large number of topics
  • Solution: Metropolis-Hastings-Walker algorithm (Li et al., 2014)
    – Alias table data structure, amortized O(1) sampling
    – Sparse implementation, sublinear in the number of topics K
    – Metropolis-Hastings correction for sampling from stale distributions

SLIDE 56

Metropolis-Hastings-Walker (Li et al., 2014)

  • Approximate the second term of the mixture, sample efficiently via alias tables, correct via Metropolis

[Equation annotations: the Gibbs update splits into a sparse term plus a dense, slow-changing term]
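For the alias-table ingredient, here is a self-contained sketch of Walker's alias method (the standard algorithm, not the paper's implementation): O(K) setup, then O(1) per sample.

```python
import random

def build_alias_table(probs):
    """Walker's alias method: preprocess a discrete distribution for O(1) sampling."""
    n = len(probs)
    scaled = [p * n for p in probs]
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    prob, alias = [1.0] * n, list(range(n))
    while small and large:
        s, l = small.pop(), large.pop()
        prob[s], alias[s] = scaled[s], l      # column s: keep s w.p. scaled[s], else l
        scaled[l] -= 1.0 - scaled[s]          # l donated the leftover mass
        (small if scaled[l] < 1.0 else large).append(l)
    return prob, alias

def alias_sample(prob, alias):
    """Draw one sample: uniform column, then a biased coin flip."""
    i = random.randrange(len(prob))
    return i if random.random() < prob[i] else alias[i]
```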

SLIDE 57

Metropolis-Hastings-Walker Proposal

  • Dense part of the Gibbs update is a “product of experts” (Hinton, 2004), with an expert for each context word
  • Use a “mixture of experts” proposal distribution
  • Can sample efficiently from the “experts” via alias tables

SLIDE 58

Metropolis-Hastings-Walker Proposal

  • Dense part of the Gibbs update is a “product of experts” (Hinton, 2004), with an expert for each context word
  • Use a “mixture of experts” proposal distribution
  • Can sample efficiently from the “experts” via alias tables

SLIDE 59

Metropolis-Hastings-Walker Proposal

  • Dense part of the Gibbs update is a “product of experts” (Hinton, 2004), with an expert for each context word
  • Use a “mixture of experts” proposal distribution
  • Can sample efficiently from the “experts” via alias tables

SLIDE 60

Bayesian Inference for MMSG Topic Model

  • Challenge 2: cluster assignment updates are almost deterministic, vulnerable to local maxima
  • Solution: simulated annealing
    – Anneal the temperature of the model by adjusting the Metropolis-Hastings acceptance probabilities
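One generic way to realize this (a standard simulated-annealing recipe, stated as an assumption rather than the paper's exact schedule): raise the target in the acceptance ratio to the power 1/T and decrease the temperature T over iterations,

$$a_T \;=\; \min\!\left(1, \left[\frac{p(z')\, q(z \mid z')}{p(z)\, q(z' \mid z)}\right]^{1/T}\right)$$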

SLIDE 61

Approximate MLE for the Mixed Membership Skip-Gram

  • Online EM is impractical
    – M-step is O(V)
    – E-step is O(KV)
  • Approximate online EM
    – Key insight: the MMSG topic model is equivalent to the word embedding model, up to the Dirichlet prior
      • Pre-solve the E-step via topic model CGS
      • Apply NCE to solve the M-step
    – The entire algorithm approximates maximum likelihood estimation via these two principled approximations

SLIDE 62

Approximate MLE for the Mixed Membership Skip-Gram

  • Online EM is impractical
    – M-step is O(V)
    – E-step is O(KV)
  • Approximate online EM
    – Key insight: the MMSG topic model is equivalent to the word embedding model, up to the Dirichlet prior
      • Pre-solve the E-step via topic model CGS
      • Apply Noise Contrastive Estimation (NCE) to solve the M-step
    – The entire algorithm approximates maximum likelihood estimation via these two principled approximations

SLIDE 63

Qualitative Results, NIPS Corpus

SLIDE 64

Qualitative Results, NIPS Corpus

SLIDE 65

Qualitative Results, NIPS Corpus

SLIDE 66

Qualitative Results, NIPS Corpus

SLIDE 67

Prediction Performance, NIPS Corpus

(similar results on three other small datasets, see the paper)

SLIDE 68

Prediction Performance, NIPS Corpus

Mixed membership models (with the posterior) beat naïve Bayes models, for both word embedding and topic models (similar results on three other small datasets; see the paper)

SLIDE 69

Prediction Performance, NIPS Corpus

Using the full context (posterior over topics, or summing vectors) helps all models except the basic skip-gram (similar results on three other small datasets; see the paper)

SLIDE 70

Prediction Performance, NIPS Corpus

Topic models beat their corresponding embedding models, for both naïve Bayes and mixed membership (similar results on three other small datasets; see the paper)

SLIDE 71

Downstream Tasks: Classification and Regression

  • Document categorization (classification accuracy, larger is better) and predicting the year of SOTU addresses (RMSE, smaller is better)
  • Vectors trained on the target corpus beat generic big-data vectors (except for SOTU, which is very small)
  • Skip-gram beats MMSG for classification/regression: a loss of granularity
  • But concatenating the different vectors improves performance over the individual embeddings
    – MMSG, SG, and generic Google vectors learn complementary information

SLIDE 72

Vector Composition in Topic Space

SLIDE 73

State of the Union Addresses (t-SNE Projection)

[Figure legend: GOP (red), near conservative topics; Democrats (blue), near liberal topics. Early parties: light green = Whigs, pink = Democratic-Republicans, orange = Federalists (John Adams), green = George Washington. Size = recency (year); bigger = more recent. Gray = topics.]

Addresses were embedded near-linearly by year!

A big gap between 1910 and 1930: 1914-1918 WWI; the 1930s Great Depression and FDR’s New Deal; 1939-1945 WWII

SLIDE 74

NIPS Authors

[Figure: t-SNE projection; blue = authors, gray = topics. Visible clusters: reinforcement learning, Bayesian methods, evaluating classifiers]

SLIDE 75

NIPS Documents

[Figure: t-SNE projection; color = recency (year), red = older, blue = newer; gray = topics]

SLIDE 76

Conclusion

  • Proposed mixed membership, topic model versions of skip-gram word embedding models
    – Statistically efficient, interpretable
  • Efficient training via MHW collapsed Gibbs + NCE
  • The proposed models improve prediction and are useful for computational social science
  • Ongoing/future work:
    – Scale to the big data setting
    – Document embeddings

Source code: https://github.com/jrfoulds/MixedMembershipWordEmbeddings

SLIDE 77

My Research Group: The Latent Lab