
An N-gram Topic Model for Time-Stamped Documents

Shoaib Jameel and Wai Lam

The Chinese University of Hong Kong

ECIR 2013, Moscow, Russia


Outline

Introduction and Motivation
  ◮ The Bag-of-Words (BoW) assumption
  ◮ Temporal nature of data

Related Work
  ◮ Temporal Topic Models
  ◮ N-gram Topic Models

Overview of our model
  ◮ Background
    ⋆ Topics Over Time (TOT) model, proposed earlier
    ⋆ Our proposed n-gram model

Empirical Evaluation

Conclusions and Future Directions


The ‘popular’ Bag-of-Words Assumption

Many works in the topic modeling literature assume exchangeability among the words. As a result, they generate ambiguous words in topics. For example, consider a few topics obtained from the NIPS collection using the Latent Dirichlet Allocation (LDA) model:

Example:

Topic 1        Topic 2    Topic 3        Topic 4     Topic 5
architecture   order      connectionist  potential   prior
recurrent      first      role           membrane    bayesian
network        second     binding        current     data
module         analysis   structures     synaptic    evidence
modules        small      distributed    dendritic   experts

The problem with the LDA model

Words in topics are not insightful.


The problem with the bag-of-words assumption

1. The logical structure of the document is lost. For example, we cannot tell whether "the cat saw a dog" or "a dog saw a cat" (see the sketch after this list).

2. Computational models cannot tap the extra word-order information inherent in the text, which hurts performance.

3. The usefulness of maintaining word order has also been demonstrated in Information Retrieval, Computational Linguistics, and many other fields.
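Point 1 is easy to reproduce in code. A minimal Python sketch, using a slightly symmetrized variant of the cat/dog sentences so that only word order distinguishes them:

```python
from collections import Counter

# Two sentences whose meanings differ only through word order.
s1 = "the cat saw the dog".split()
s2 = "the dog saw the cat".split()

# Under the bag-of-words assumption only word counts survive,
# so the two sentences collapse to the same representation.
print(Counter(s1) == Counter(s2))   # True
```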


Why capture topics over time?

1. We know that data evolves over time.

2. What people are talking about today they may not be talking about tomorrow, or a year later. (Example trending topics by year. Year 2010: Burj Khalifa, Volcano, Manila Hostage, Iraq War; Year 2011: Wikipedia, N.Z. Earthquake, Osama bin Laden; Year 2012: Higgs Boson, Gaza Strip, Sachin Tendulkar, China, Apple Inc.)

3. Models such as LDA do not capture such temporal characteristics in data.


Related Work

Temporal Topic Models

Discrete-time assumption models

◮ Blei and Lafferty (2006), Dynamic Topic Models: assume that topics in one year depend on the topics of the previous year.

◮ Knights, Mozer, and Nicolov (2009), Compound Topic Model: train a topic model on the most recent K months of data.

The problem here

One needs to select an appropriate time-slice value manually. The question is which time slice to choose: day, month, year, etc.


Related Work

Temporal Topic Models

Continuous Time Topic Models

◮ Kawamae (2011), Trend Analysis Model: has a probability distribution over temporal words and topics, and a continuous distribution over time.

◮ Nodelman, Shelton, and Koller (2002), Continuous Time Bayesian Networks: builds a graph in which each node holds a variable whose value changes over time.

The problem with the above models

All assume the notion of exchangeability and thus lose important collocation information inherent in the document.


Related Work

N-gram Topic Models

1. Wallach (2006), Bigram Topic Model: maintains word order during the topic generation process, but generates only bigrams in topics.

2. Griffiths, Steyvers, and Tenenbaum (2007), LDA Collocation Model: introduces binary random variables that decide when to generate a unigram or a bigram.

3. Wang, McCallum, and Wei (2007), Topical N-gram Model: extends the LDA Collocation Model and gives a topic assignment to every word in a phrase.

The problem with the above models

Cannot capture the temporal dynamics in data.


Topics Over Time (TOT) (Wang et al., 2006)

1. Our model extends this model.

2. It assumes the notion of word and topic exchangeability.

Generative Process

1. Draw T multinomials φ_z from a Dirichlet prior β, one for each topic z.

2. For each document d, draw a multinomial θ^(d) from a Dirichlet prior α; then for each word w_i^(d) in the document d:

   1. Draw a topic z_i^(d) from Multinomial(θ^(d));
   2. Draw a word w_i^(d) from Multinomial(φ_{z_i^(d)});
   3. Draw a timestamp t_i^(d) from Beta(Ω_{z_i^(d)}).

Topics Over Time Model (TOT)

(Plate diagram: for each of the D documents, θ^(d) is drawn from Dirichlet(α); each of its N_d words gets a topic z, which generates the word w from φ_z with Dirichlet prior β and the timestamp t from Beta(Ω_z); T topics in total.)
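To make the process concrete, the following is a minimal NumPy sketch of the TOT generative story above; the corpus sizes and hyperparameter values are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes and symmetric hyperparameters (assumptions, not from the paper).
T, W, D, N_d = 5, 200, 10, 50                 # topics, vocabulary, documents, words per doc
alpha, beta = 0.1, 0.01
Omega = rng.uniform(1.0, 5.0, size=(T, 2))    # per-topic Beta parameters over normalized time

phi = rng.dirichlet(np.full(W, beta), size=T)       # step 1: phi_z ~ Dirichlet(beta), one per topic

corpus = []
for d in range(D):
    theta = rng.dirichlet(np.full(T, alpha))         # step 2: theta^(d) ~ Dirichlet(alpha)
    words, stamps = [], []
    for _ in range(N_d):
        z = rng.choice(T, p=theta)                   # 2.1: topic  z_i ~ Multinomial(theta^(d))
        w = rng.choice(W, p=phi[z])                  # 2.2: word   w_i ~ Multinomial(phi_z)
        t = rng.beta(Omega[z, 0], Omega[z, 1])       # 2.3: time   t_i ~ Beta(Omega_z)
        words.append(w); stamps.append(t)
    corpus.append((words, stamps))
```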


Topics Over Time Model (TOT)

1. The model assumes a continuous distribution over time associated with each topic.

2. Topics are responsible for generating both the observed time-stamps and the words.

3. The model does not capture the sequence of state changes with a Markov assumption.


Topics Over Time Model (TOT)

Posterior Inference

1. In Gibbs sampling, compute the conditional:

$$P(z_i^{(d)} \mid \mathbf{w}, \mathbf{t}, \mathbf{z}_{\neg i}, \alpha, \beta, \Omega) \qquad (1)$$

2. We can thus write the updating equation as:

$$
P(z_i^{(d)} \mid \mathbf{w}, \mathbf{t}, \mathbf{z}_{\neg i}, \alpha, \beta, \Omega) \propto
\left( m_{z_i^{(d)}} + \alpha_{z_i^{(d)}} - 1 \right)
\times \frac{n_{z_i^{(d)} w_i^{(d)}} + \beta_{w_i^{(d)}} - 1}{\sum_{v=1}^{W} \bigl( n_{z_i^{(d)} v} + \beta_v \bigr) - 1}
\times \frac{\bigl(1 - t_i^{(d)}\bigr)^{\Omega_{z_i^{(d)} 1} - 1} \bigl(t_i^{(d)}\bigr)^{\Omega_{z_i^{(d)} 2} - 1}}{B\bigl(\Omega_{z_i^{(d)} 1}, \Omega_{z_i^{(d)} 2}\bigr)}
\qquad (2)
$$
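A minimal NumPy/SciPy sketch of the per-token score in Equation (2). The helper and its variable names are ours; the count arrays are assumed to already exclude the current token, which is what the −1 terms in the equation account for, and we use the standard Beta(a, b) density (the slide writes the (1 − t) factor first, which simply swaps the roles of Ω_z1 and Ω_z2).

```python
import numpy as np
from scipy.stats import beta as beta_dist

def tot_topic_scores(w, t, m_d, n_zw, alpha, beta, Omega):
    """Unnormalized P(z_i = z | ...) over all topics for one token, following Eq. (2).

    w     : word id of the current token
    t     : its time-stamp, rescaled to (0, 1)
    m_d   : (T,)   per-topic token counts in the current document, current token excluded
    n_zw  : (T, W) topic-word counts, current token excluded
    Omega : (T, 2) per-topic Beta parameters over time
    alpha, beta are symmetric hyperparameters here, for simplicity.
    """
    W = n_zw.shape[1]
    doc_term  = m_d + alpha                                           # document-topic factor
    word_term = (n_zw[:, w] + beta) / (n_zw.sum(axis=1) + W * beta)   # topic-word factor
    time_term = beta_dist.pdf(t, Omega[:, 0], Omega[:, 1])            # Beta density of the time-stamp
    return doc_term * word_term * time_term

# Sampling a new topic for the token:
# scores = tot_topic_scores(w, t, m_d, n_zw, alpha=0.1, beta=0.01, Omega=Omega)
# z_new  = np.random.default_rng().choice(len(scores), p=scores / scores.sum())
```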


Our Model

N-gram Topics Over Time Model

1. The model assumes a continuous distribution over time associated with each topic.

2. Topics are responsible for generating both the observed time-stamps and the words.

3. The model does not capture the sequence of state changes with a Markov assumption.

4. It maintains the order of words during the topic generation process.

5. It generates words as unigrams, bigrams, etc. in topics.

6. It results in more interpretable topics.


Graphical Model

N-gram Topics Over Time Model

(Plate diagram of the n-gram Topics Over Time model: per-document θ drawn from Dirichlet(α); topic assignments z_{i−1}, z_i, z_{i+1}; bigram-status indicators x_i, x_{i+1}, x_{i+2}; words w_{i−1}, w_i, w_{i+1}; time-stamps t_{i−1}, t_i, t_{i+1}; parameters ψ with prior γ, φ with prior β, σ with prior δ, and per-topic Ω; plates over D documents, T topics, and TW topic-word pairs.)


Generative Process

N-gram Topics Over Time Model

Draw Discrete(φ_z) from Dirichlet(β) for each topic z;
Draw Bernoulli(ψ_zw) from Beta(γ) for each topic z and each word w;
Draw Discrete(σ_zw) from Dirichlet(δ) for each topic z and each word w;
For every document d, draw Discrete(θ^(d)) from Dirichlet(α);
foreach word w_i^(d) in document d do
    Draw x_i^(d) from Bernoulli(ψ_{z_{i−1}^(d) w_{i−1}^(d)});
    Draw z_i^(d) from Discrete(θ^(d));
    Draw w_i^(d) from Discrete(σ_{z_i^(d) w_{i−1}^(d)}) if x_i^(d) = 1; otherwise, draw w_i^(d) from Discrete(φ_{z_i^(d)});
    Draw a time-stamp t_i^(d) from Beta(Ω_{z_i^(d)});
end
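A minimal NumPy sketch of this generative process; the sizes, hyperparameter values, and the dummy history used before the first word of a document are illustrative assumptions, not choices from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes and symmetric hyperparameters (assumptions, not from the paper).
T, W, D, N_d = 5, 200, 10, 50
alpha, beta, delta = 0.1, 0.01, 0.01
gamma = (1.0, 1.0)                                     # Beta prior on the bigram-status switch

phi   = rng.dirichlet(np.full(W, beta),  size=T)       # phi_z    ~ Dirichlet(beta)
psi   = rng.beta(gamma[0], gamma[1], size=(T, W))      # psi_zw   ~ Beta(gamma)
sigma = rng.dirichlet(np.full(W, delta), size=(T, W))  # sigma_zw ~ Dirichlet(delta)
Omega = rng.uniform(1.0, 5.0, size=(T, 2))             # per-topic Beta parameters over time

for d in range(D):
    theta = rng.dirichlet(np.full(T, alpha))           # theta^(d) ~ Dirichlet(alpha)
    z_prev, w_prev = 0, 0                               # dummy history before the first word
    for i in range(N_d):
        x = rng.binomial(1, psi[z_prev, w_prev]) if i > 0 else 0  # x_i ~ Bernoulli(psi_{z_{i-1} w_{i-1}})
        z = rng.choice(T, p=theta)                      # z_i ~ Discrete(theta^(d))
        if x == 1:                                      # continue an n-gram from the previous word
            w = rng.choice(W, p=sigma[z, w_prev])       # w_i ~ Discrete(sigma_{z_i w_{i-1}})
        else:                                           # otherwise draw a unigram
            w = rng.choice(W, p=phi[z])                 # w_i ~ Discrete(phi_{z_i})
        t = rng.beta(Omega[z, 0], Omega[z, 1])          # t_i ~ Beta(Omega_{z_i})
        z_prev, w_prev = z, w
```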


Posterior Inference

Collapsed Gibbs Sampling

$$
P(z_i^{(d)}, x_i^{(d)} \mid \mathbf{w}, \mathbf{t}, \mathbf{x}_{\neg i}^{(d)}, \mathbf{z}_{\neg i}^{(d)}, \alpha, \beta, \gamma, \delta, \Omega) \propto
\left( \gamma_{x_i^{(d)}} + p_{z_{i-1}^{(d)} w_{i-1}^{(d)} x_i} - 1 \right)
\times \left( \alpha_{z_i^{(d)}} + q_{d z_i^{(d)}} - 1 \right)
\times \frac{\bigl(1 - t_i^{(d)}\bigr)^{\Omega_{z_i^{(d)} 1} - 1} \bigl(t_i^{(d)}\bigr)^{\Omega_{z_i^{(d)} 2} - 1}}{B\bigl(\Omega_{z_i^{(d)} 1}, \Omega_{z_i^{(d)} 2}\bigr)}
\times
\begin{cases}
\dfrac{\beta_{w_i^{(d)}} + n_{z_i^{(d)} w_i^{(d)}} - 1}{\sum_{v=1}^{W} \bigl( \beta_v + n_{z_i^{(d)} v} \bigr) - 1} & \text{if } x_i^{(d)} = 0 \\[2ex]
\dfrac{\delta_{w_i^{(d)}} + m_{z_i^{(d)} w_{i-1}^{(d)} w_i^{(d)}} - 1}{\sum_{v=1}^{W} \bigl( \delta_v + m_{z_i^{(d)} w_{i-1}^{(d)} v} \bigr) - 1} & \text{if } x_i^{(d)} = 1
\end{cases}
\qquad (3)
$$

Posterior Estimates

$$\hat{\theta}_z^{(d)} = \frac{\alpha_z + q_{dz}}{\sum_{t=1}^{T} (\alpha_t + q_{dt})} \qquad (4)$$

$$\hat{\phi}_{zw} = \frac{\beta_w + n_{zw}}{\sum_{v=1}^{W} (\beta_v + n_{zv})} \qquad (5)$$

$$\hat{\psi}_{zwk} = \frac{\gamma_k + p_{zwk}}{\sum_{k=0}^{1} (\gamma_k + p_{zwk})} \qquad (6)$$

$$\hat{\sigma}_{zwv} = \frac{\delta_v + m_{zwv}}{\sum_{v'=1}^{W} (\delta_{v'} + m_{zwv'})} \qquad (7)$$

$$\hat{\Omega}_{z1} = \bar{t}_z \left( \frac{\bar{t}_z (1 - \bar{t}_z)}{s_z^2} - 1 \right) \qquad (8)$$

$$\hat{\Omega}_{z2} = (1 - \bar{t}_z) \left( \frac{\bar{t}_z (1 - \bar{t}_z)}{s_z^2} - 1 \right) \qquad (9)$$
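The per-token score in Equation (3) can be sketched the same way as for TOT, now returning a score for every (topic, unigram/bigram status) pair. The helper below is our own illustration: the count arrays are assumed to exclude the current token (absorbing the −1 terms), the hyperparameters are symmetric scalars for simplicity, and the standard Beta(a, b) density is used for the time-stamp.

```python
import numpy as np
from scipy.stats import beta as beta_dist

def ngram_tot_scores(w, w_prev, z_prev, t, q_d, p_zwx, n_zw, m_zww,
                     alpha, beta, gamma, delta, Omega):
    """Unnormalized joint scores over (z_i, x_i) for one token, following Eq. (3).

    q_d   : (T,)      per-topic counts in the current document
    p_zwx : (T, W, 2) bigram-status counts, indexed by previous topic and previous word
    n_zw  : (T, W)    unigram topic-word counts
    m_zww : (T, W, W) bigram topic-word counts, indexed by previous word
    Returns an array of shape (T, 2); column 0 is x = 0, column 1 is x = 1.
    """
    T, W = n_zw.shape
    doc_term  = alpha + q_d                                       # document-topic factor
    time_term = beta_dist.pdf(t, Omega[:, 0], Omega[:, 1])        # Beta density of the time-stamp
    stat_term = gamma + p_zwx[z_prev, w_prev]                     # (2,) unigram/bigram status factor
    uni = (beta  + n_zw[:, w])          / (W * beta  + n_zw.sum(axis=1))              # x = 0 branch
    big = (delta + m_zww[:, w_prev, w]) / (W * delta + m_zww[:, w_prev].sum(axis=1))  # x = 1 branch
    scores = np.stack([uni * stat_term[0], big * stat_term[1]], axis=1)
    return scores * (doc_term * time_term)[:, None]

# Drawing (z, x): flatten, normalize, sample an index; divmod recovers the pair.
# s = ngram_tot_scores(...).ravel()
# z_new, x_new = divmod(np.random.default_rng().choice(s.size, p=s / s.sum()), 2)
```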


Inference Algorithm

Input: γ, δ, α, T, β, Corpus, MaxIteration
Output: Topic assignments for all the n-gram words, with temporal information

Initialization: randomly initialize the n-gram topic assignments for all words;
Zero all count variables;
for iteration ← 1 to MaxIteration do
    for d ← 1 to D do
        for w ← 1 to N_d, following word order, do
            Draw z_w^(d), x_w^(d) as defined in Equation (3);
            if x_w^(d) = 0 then update n_zw;
            else update m_zw;
            Update q_dz and p_zw;
        end
    end
    for z ← 1 to T do
        Update Ω_z by the method of moments as in Equations (8) and (9);
    end
end
Compute the posterior estimates θ̂, φ̂, ψ̂, σ̂ defined in Equations (4), (5), (6), (7);
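The per-topic Ω update in the algorithm is the method-of-moments estimator of Equations (8) and (9); a small sketch follows (our own helper, assuming time-stamps have been rescaled to the unit interval; the eps floor is our own guard, not part of the slides).

```python
import numpy as np

def update_omega(timestamps_z, eps=1e-3):
    """Method-of-moments update of (Omega_z1, Omega_z2) for one topic, Eqs. (8)-(9).

    timestamps_z : time-stamps (rescaled to (0, 1)) of all tokens currently
                   assigned to topic z.
    """
    t_bar = float(np.mean(timestamps_z))
    s2    = float(np.var(timestamps_z))
    common = t_bar * (1.0 - t_bar) / s2 - 1.0          # shared factor in Eqs. (8) and (9)
    return max(t_bar * common, eps), max((1.0 - t_bar) * common, eps)

# Example usage with hypothetical time-stamps assigned to one topic:
# omega_z1, omega_z2 = update_omega(np.array([0.41, 0.43, 0.45, 0.50, 0.52]))
```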


Empirical Evaluation

Data Sets

We have conducted experiments on two datasets:

1. U.S. Presidential State-of-the-Union¹ speeches from 1790 to 2002.

2. NIPS conference papers: the original raw NIPS dataset² consists of 17 years of conference papers; we supplemented it with some newer raw NIPS documents³, giving 19 years of papers in total.

Preprocessing

1. Removed stopwords.

2. Did not perform word stemming.

¹ http://infomotions.com/etexts/gutenberg/dirs/etext04/suall11.txt
² http://www.cs.nyu.edu/roweis/data.html
³ http://ai.stanford.edu/gal/Data/NIPS/


Qualitative Results

(Histograms over the years 1800-2000 for the "Mexican War" topic: left, Our Model; right, TOT.)

Our Model (Mexican War): 1. east bank, 2. american coins, 3. mexican flag, 4. separate independent, 5. american commonwealth, 6. mexican population, 7. texan troops, 8. military, 9. general herrera, 10. foreign coin, 11. military usurper, 12. mexican treasury, 13. invaded texas, 14. veteran troops

TOT (Mexican War): 1. mexico, 2. texas, 3. war, 4. mexican, 5. united, 6. country, 7. government, 8. territory, 9. army, 10. peace, 11. act, 12. policy, 13. foreign, 14. citizens


Qualitative Results

Topics change over time

(Histograms over the years 1800-2000 for the "Panama Canal" topic: left, Our Model; right, TOT.)

Our Model (Panama Canal): 1. panama canal, 2. isthmian canal, 3. isthmus panama, 4. republic panama, 5. united states government, 6. united states, 7. state panama, 8. united states senate, 9. french canal company, 10. caribbean sea, 11. panama canal bonds, 12. panama, 13. american control, 14. canal

TOT (Panama Canal): 1. government, 2. cuba, 3. islands, 4. international, 5. powers, 6. gold, 7. action, 8. spanish, 9. island, 10. act, 11. commission, 12. officers, 13. spain, 14. rico


Qualitative Results

Topics change over time - TOT

NIPS-1987: cells, cell, model, response, firing, activity, input, neurons, stimulus, figure
NIPS-1988: network, learning, input, units, training, output, layer, hidden, weights, networks
NIPS-1995: data, model, algorithm, method, probability, models, problem, distribution, information
NIPS-1996: function, data, set, distribution, model, models, neural, probability, parameters, networks
NIPS-2004 / NIPS-2005: algorithm, state, learning, time, algorithms, step, action, node, policy, learning, data, set, training, algorithm, test, number, kernel, classification, class, set, sequence

Figure: Top probable words from the posterior inference in NIPS, year-wise.


Qualitative Results

Topics change over time - Our Model

NIPS-1987: orientation map, firing threshold, time delay, neural state, low conduction safety, correlogram peak, centric models, long channel, synaptic chip, frog sciatic nerve
NIPS-1988: neural networks, hidden units, hidden layer, neural network, training set, mit press, hidden unit, learning algorithm, output units, output layer
NIPS-1995: linear algebra, input signals, gaussian filters, optical flow, model matching, resistive line, input signal, analog vlsi, depth map, temporal precision
NIPS-1996: probability vector, relevant documents, continuous embedding, doubly stochastic matrix, probability vectors, binding energy, energy costs, variability index, learning bayesian, polynomial time
NIPS-2004: optimal policy, build stack, reinforcement learning, nash equilibrium, suit stack, synthetic items, compressed map, reward function, td networks, intrinsic reward
NIPS-2005: kernel cca, empirical risk, training sample, data clustering, random selection, gaussian regression, online hypothesis, linear separators, covariance operator, line algorithm

Figure: Top ten probable phrases from the posterior inference in NIPS, year-wise.


Qualitative Results

Topics change over time

(Histograms over the years 1990-2005 for the "recurrent NNs" topic.)

Our Model: 1. hidden unit, 2. neural net, 3. input layer, 4. recurrent network, 5. hidden layers, 6. learning algorithms, 7. error signals, 8. recurrent connections, 9. training pattern, 10. recurrent cascade

TOT: 1. state, 2. time, 3. sequence, 4. states, 5. model, 6. sequences, 7. recurrent, 8. models, 9. markov, 10. transition

Figure: A topic related to "recurrent NNs" comprising n-gram words, obtained from both models. The histograms depict how the topic is distributed over time and are fitted with Beta probability density functions.


Quantitative Results

Predicting the decade on the State-of-the-Union dataset

1. We computed the time-stamp prediction performance.

2. We learn a model on a subset of the data randomly sampled from the collection.

3. Given a new document, we compute the likelihood for the decade prediction.

            L1 Error   E(L1)   Accuracy
Our Model   1.60       1.65    0.25
TOT         1.95       1.99    0.20

Table: Results of decade prediction on the State-of-the-Union speeches dataset.
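For reference, the three columns can be computed as follows; this is our reading of L1 error (absolute distance, in decades, between the predicted and true decade), E(L1) (expected L1 under the predictive distribution over decades), and accuracy (exact-decade match). The function name and signature are ours.

```python
import numpy as np

def decade_prediction_metrics(true_decade, decade_dist, decades):
    """L1 error, expected L1 error, and accuracy for decade prediction.

    true_decade : (N,)   true decade index of each test document
    decade_dist : (N, K) per-document predictive distribution over the K decades
    decades     : (K,)   the candidate decade indices
    """
    pred = decades[np.argmax(decade_dist, axis=1)]             # most likely decade per document
    l1   = np.mean(np.abs(pred - true_decade))                 # L1 error of the top prediction
    dist = np.abs(decades[None, :] - true_decade[:, None])     # |candidate - truth| matrix
    e_l1 = np.mean(np.sum(decade_dist * dist, axis=1))         # expected L1 under the distribution
    acc  = np.mean(pred == true_decade)                        # exact-decade accuracy
    return l1, e_l1, acc
```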


Conclusions and Future Work

1. We have presented an n-gram topic model that captures both the temporal structure and the n-gram words in time-stamped documents.

2. Topics found by our model are more interpretable, with better qualitative and quantitative performance on two publicly available datasets.

3. We have derived a collapsed Gibbs sampler for faster posterior inference.

4. An advantage of our model is that it does away with ambiguities that might appear among the words in topics.

Future Work

Explore non-parametric methods for n-gram topics over time.


References

David M. Blei and John D. Lafferty. 2006. Dynamic topic models. In Proc. of ICML, 113-120.
Knights, D., Mozer, M., and Nicolov, N. 2009. Detecting topic drift with compound topic models. In Proc. of ICWSM.
Noriaki Kawamae. 2011. Trend analysis model: Trend consists of temporal words, topics, and timestamps. In Proc. of WSDM, 317-326.
Hanna M. Wallach. 2006. Topic modeling: beyond bag-of-words. In Proc. of ICML, 977-984.
Griffiths, T. L., Steyvers, M., and Tenenbaum, J. B. 2007. Topics in semantic representation. Psychological Review, 114(2), 211.
Wang, X., McCallum, A., and Wei, X. 2007. Topical n-grams: Phrase and topic discovery, with an application to information retrieval. In Proc. of ICDM, 697-702.
Xuerui Wang and Andrew McCallum. 2006. Topics over time: a non-Markov continuous-time model of topical trends. In Proc. of KDD, 424-433.
