Recommendation Engines; Collaborative Filtering; Thematic clustering of large text corpora; Infinite Markov model for statistical NLP
Russell W. Hanson
Dec. 8, 2008
Outline
- Several problems in applied mathematics and approaches to their solutions:
– Recommendation Engines
– Collaborative Filtering
– Thematic clustering of large text corpora
– Infinite Markov model for statistical NLP
LobeLink.com – social bookmarking; social web annotation; and recommendation engine
Recommendation Engines, $$$
Amazon.com NetFlix.com
A Recommendation Engine
Attributized Bayesian Choice Modeling
- Collaborative Filtering for text and “news”:
– Cold Start Problem (it isn’t collaborative until it’s collaborative)
– Past Experience: Some people want the most popular (“Dodgers make offer to Manny Ramirez ‐ Boston.com”); some don’t (“Non‐Abelian Anyons and Topological Quantum Computation”)
– By weight in whole network; by weight in user’s network; by weight in thematic cluster
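The cold start problem and the "by weight in whole network / user's network" options above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `saves` and `friends` dictionaries, the function names, and the choice of plain save counts as the "weight" are hypothetical and are not taken from LobeLink.com.

```python
from collections import Counter

def popularity(saves, users=None):
    """Count how many of the given users saved each item (all users if None)."""
    counts = Counter()
    for user, items in saves.items():
        if users is None or user in users:
            counts.update(items)
    return counts

def recommend(user, saves, friends, k=3):
    """Rank unseen items by weight in the user's network.

    Cold start: if the user's network gives no signal at all,
    fall back to weight in the whole network (global popularity).
    """
    seen = saves.get(user, set())
    network = friends.get(user, set())
    global_w = popularity(saves)
    local_w = popularity(saves, network)
    weights = local_w if local_w else global_w
    # Highest weight first; ties broken alphabetically for determinism.
    ranked = sorted((i for i in weights if i not in seen),
                    key=lambda i: (-weights[i], i))
    return ranked[:k]

# Hypothetical toy data: which items each user has bookmarked.
saves = {
    "ann": {"dodgers", "anyons"},
    "bob": {"dodgers"},
    "cat": {"dodgers", "lda"},
    "dan": set(),
}
friends = {"ann": {"cat"}, "dan": set()}

print(recommend("ann", saves, friends))  # uses ann's network
print(recommend("dan", saves, friends))  # cold start: global popularity
```

A user with no network ("dan") gets the most popular items, while a user with a network ("ann") gets what that network saved, which is the distinction the "Past Experience" bullet draws.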
Summary of tastes, T: Attributized content items, i, are stored as vectors in the choice‐set database such that:
Thematic Clustering
- Want more fine‐grained recommendations than connectivity in the user network provides: weight in a given thematic cluster.
Latent Dirichlet Allocation/Analysis
Latent Dirichlet Allocation/Analysis (p3)
Latent Dirichlet Allocation/Analysis (p2)
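For reference, the generative process of Latent Dirichlet Allocation for a single document of N words can be written as below. This follows the notation of Blei, Ng, and Jordan (2003), not necessarily the notation used on the original slides: α is the Dirichlet prior, β the topic-word distributions, θ the per-document topic mixture, z_n the topic of word n, and w_n the word itself.

```latex
\begin{align*}
\theta &\sim \mathrm{Dir}(\alpha) \\
z_n \mid \theta &\sim \mathrm{Multinomial}(\theta), \quad n = 1, \dots, N \\
w_n \mid z_n, \beta &\sim p(w_n \mid z_n, \beta) \\
p(\theta, \mathbf{z}, \mathbf{w} \mid \alpha, \beta)
  &= p(\theta \mid \alpha) \prod_{n=1}^{N} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta)
\end{align*}
```

In the thematic clustering setting, the inferred θ of a document is what assigns it a weight in each thematic cluster.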
Infinite Markov Models
- Language models and parsers
- N-gram (bigram, trigram) vs. ∞-gram
- The supercalifragilisticexpialidocious-problem
- hierarchical Pitman-Yor language model (HPYLM)
- variable order hierarchical Pitman-Yor language model (VPYLM)
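The "supercalifragilisticexpialidocious-problem" can be illustrated with a toy bigram model: a maximum-likelihood estimate gives zero probability to any word it has never seen. The sketch below is an assumption-laden illustration, not the HPYLM or VPYLM. As a much simpler stand-in for Pitman-Yor smoothing it uses fixed-weight linear interpolation with an add-one unigram model, and the corpus, function names, and the weight `lam` are hypothetical.

```python
from collections import Counter

def train(tokens):
    """Collect unigram, context, and bigram counts from a token list."""
    unigrams = Counter(tokens)
    contexts = Counter(tokens[:-1])          # tokens that have a successor
    bigrams = Counter(zip(tokens, tokens[1:]))
    return unigrams, contexts, bigrams

def p_bigram(word, prev, contexts, bigrams):
    """Maximum-likelihood bigram probability; zero for unseen pairs."""
    if contexts[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / contexts[prev]

def p_unigram(word, unigrams):
    """Add-one smoothed unigram probability with one slot for unknown words."""
    total = sum(unigrams.values())
    vocab = len(unigrams) + 1
    return (unigrams[word] + 1) / (total + vocab)

def p_interp(word, prev, unigrams, contexts, bigrams, lam=0.7):
    """Interpolate bigram and unigram estimates (stand-in, not Pitman-Yor)."""
    if contexts[prev] == 0:
        return p_unigram(word, unigrams)     # unseen context: back off fully
    return (lam * p_bigram(word, prev, contexts, bigrams)
            + (1 - lam) * p_unigram(word, unigrams))

tokens = "the cat sat on the mat".split()
unigrams, contexts, bigrams = train(tokens)
rare = "supercalifragilisticexpialidocious"

print(p_bigram("cat", "the", contexts, bigrams))            # seen bigram
print(p_bigram(rare, "the", contexts, bigrams))             # unseen: zero
print(p_interp(rare, "the", unigrams, contexts, bigrams))   # small but nonzero
```

The hierarchical Pitman-Yor models listed above replace the fixed weight `lam` with discounts learned per context, and the variable order version also infers how long a context to use for each word.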
Selected References
- Richard Forster. Document Clustering in Large German Corpora Using Natural Language Processing. University of Zurich (2006)
- Blei, Ng, and Jordan. Latent Dirichlet Allocation. Journal of Machine Learning Research 3 (2003) 993-1022
- Daichi Mochihashi, Eiichiro Sumita. The Infinite Markov Model. NIPS, 2007
- LobeLink.com