Recommendation Engines; Collaborative Filtering; Thematic clustering of large text corpora; Infinite Markov model for statistical NLP


Recommendation Engines; Collaborative Filtering; Thematic clustering of large text corpora; Infinite Markov model for statistical NLP. Russell W. Hanson, Dec. 8, 2008.


slide-1
SLIDE 1

Recommendation Engines; Collaborative Filtering; Thematic clustering of large text corpora; Infinite Markov model for statistical NLP

Russell W. Hanson
Dec. 8, 2008


slide-2
SLIDE 2

Outline


  • Several problems in applied mathematics and approaches to their solutions:

– Recommendation Engines
– Collaborative Filtering
– Thematic clustering of large text corpora
– Infinite Markov model for statistical NLP


slide-3
SLIDE 3

LobeLink.com – social bookmarking; social web annotation; and recommendation engine


slide-4
SLIDE 4

Recommendation Engines, $$$

Amazon.com
NetFlix.com


slide-5
SLIDE 5

A Recommendation Engine


slide-6
SLIDE 6

Attributized Bayesian Choice Modeling


  • Collaborative Filtering for text and “news”:

– Cold Start Problem (it isn’t collaborative until it’s collaborative)
– Past Experience: Some people want the most popular (“Dodgers make offer to Manny Ramirez - Boston.com”); some don’t (“Non-Abelian Anyons and Topological Quantum Computation”)
– By weight in whole network; by weight in user’s network; by weight in thematic cluster
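
The three weightings and the cold-start fallback listed above can be sketched roughly as follows. This is only an illustration, not LobeLink's actual method: the bookmark log, the `FOLLOWS` map, and the functions `item_weights` and `recommend` are all hypothetical names invented for this example, and "weight" is assumed to mean a simple bookmark count.

```python
from collections import Counter

# Hypothetical bookmark log: (user, item, thematic_cluster). Illustrative data only.
BOOKMARKS = [
    ("ann", "dodgers-offer", "sports"),
    ("bob", "dodgers-offer", "sports"),
    ("cat", "dodgers-offer", "sports"),
    ("dan", "anyons-review", "physics"),
    ("eve", "anyons-review", "physics"),
    ("eve", "tqc-lecture", "physics"),
]

# Hypothetical "user's network": the users each user follows.
FOLLOWS = {"fay": {"dan", "eve"}, "gus": set()}


def item_weights(bookmarks, users=None, cluster=None):
    """Count bookmarks per item, optionally restricted to some users and/or one cluster."""
    counts = Counter()
    for user, item, item_cluster in bookmarks:
        if users is not None and user not in users:
            continue
        if cluster is not None and item_cluster != cluster:
            continue
        counts[item] += 1
    return counts


def recommend(user, bookmarks, follows, cluster=None, k=2):
    """Rank by weight in the user's network; use the whole network on a cold start."""
    network = follows.get(user, set())
    weights = item_weights(bookmarks, users=network, cluster=cluster)
    if not weights:
        # Cold start: the user's network gives no signal, so fall back to
        # weight in the whole network (most popular overall).
        weights = item_weights(bookmarks, cluster=cluster)
    # Sort by descending weight, then by item name so ties are deterministic.
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:k]]


print(recommend("fay", BOOKMARKS, FOLLOWS))                      # weight in user's network
print(recommend("gus", BOOKMARKS, FOLLOWS))                      # cold start, whole network
print(recommend("gus", BOOKMARKS, FOLLOWS, cluster="physics"))   # weight in thematic cluster
```

A user with an empty network gets the globally popular item first, while a user who follows physics readers never sees it; passing `cluster` restricts either ranking to a single theme.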


Summary of tastes, T:
Attributized content items, i, are stored as vectors in the choice-set database such that:


slide-7
SLIDE 7

Thematic Clustering


  • Want more fine-grained recommendations than connectivity in the user network provides: weight in a given thematic cluster.


slide-8
SLIDE 8
slide-9
SLIDE 9
slide-10
SLIDE 10

Latent Dirichlet Allocation/Analysis
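
For orientation, the standard generative process of Latent Dirichlet Allocation, as given in the Blei, Ng, and Jordan paper listed in the references, can be written as below. The notation (alpha, beta, theta, z, w) follows that paper and is not taken from the slide itself, whose content did not survive extraction.

```latex
% For each document d with N_d words:
\theta_d \sim \mathrm{Dirichlet}(\alpha)
% For each word position n = 1, \dots, N_d:
z_{dn} \sim \mathrm{Multinomial}(\theta_d), \qquad
w_{dn} \sim \mathrm{Multinomial}(\beta_{z_{dn}})
% Joint distribution for one document:
p(\theta, \mathbf{z}, \mathbf{w} \mid \alpha, \beta)
  = p(\theta \mid \alpha) \prod_{n=1}^{N} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta)
```

Each document gets its own topic mixture theta, which is what makes a per-cluster weight available for recommendations.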


slide-11
SLIDE 11

Latent Dirichlet Allocation/Analysis (p3)


slide-12
SLIDE 12

Latent Dirichlet Allocation/Analysis (p2)


slide-13
SLIDE 13

Infinite Markov Models


Language models and parsers
N-gram (bigram, trigram) vs. ∞-gram
The supercalifragilisticexpialidocious-problem
hierarchical Pitman-Yor language model (HPYLM)
variable order hierarchical Pitman-Yor language model (VPYLM)
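
The contrast between a fixed-order N-gram and a variable-order model can be illustrated with a toy count-based predictor that uses the longest context it has actually seen. This is a deliberately simplified sketch and is not the Pitman-Yor model (no discounting, no Bayesian smoothing); the function names `train` and `predict` and the toy corpus are assumptions made for the example.

```python
from collections import Counter, defaultdict


def train(tokens, max_order=3):
    """Count next-word occurrences for every context of length 0..max_order-1."""
    counts = defaultdict(Counter)
    for i, word in enumerate(tokens):
        for n in range(max_order):
            if i - n < 0:
                break  # not enough preceding tokens for a context this long
            context = tuple(tokens[i - n:i])
            counts[context][word] += 1
    return counts


def predict(counts, history, max_order=3):
    """Back off from the longest context to shorter ones until one has been seen.

    Returns (most likely next word, its relative frequency, context length used).
    """
    for n in range(max_order - 1, -1, -1):
        context = tuple(history[-n:]) if n else ()
        if context in counts:
            following = counts[context]
            total = sum(following.values())
            # Highest count wins; ties are broken by word for determinism.
            word, count = max(following.items(), key=lambda kv: (kv[1], kv[0]))
            return word, count / total, len(context)
    raise ValueError("model has no counts")


corpus = "the cat sat on the mat the cat ate".split()
model = train(corpus, max_order=3)
print(predict(model, ["on", "the"]))   # a trigram context that was seen
print(predict(model, ["saw", "the"]))  # unseen trigram context, backs off to a bigram
```

The context length actually used varies from one prediction to the next, which is the property the variable-order models on this slide handle in a principled, probabilistic way.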

slide-14
SLIDE 14

Selected References


Document Clustering in Large German Corpora Using Natural Language Processing. Richard Forster. University of Zurich (2006).
Latent Dirichlet Allocation. Blei, Ng, and Jordan. Journal of Machine Learning Research 3 (2003) 993-1022.
The Infinite Markov Model. Daichi Mochihashi, Eiichiro Sumita. NIPS, 2007.
LobeLink.com