BigARTM: Open Source Library for Regularized Multimodal Topic Modeling of Large Collections
Konstantin Vorontsov, Oleksandr Frei, Murat Apishev, Peter Romov, Marina Dudarenko
Yandex, CC RAS, MIPT, HSE, MSU
Analysis of Images, Social
Outline: 1 Theory; 2 BigARTM implementation (http://bigartm.org); 3 Experiments
Theory: Probabilistic Topic Modeling; ARTM (Additive Regularization for Topic Modeling); Multimodal Probabilistic Topic Modeling
Konstantin Vorontsov (voron@yandex-team.ru) BigARTM: Open Source Topic Modeling 3 / 38
1 PLSA: Probabilistic Latent Semantic Analysis (1999)
2 LDA: Latent Dirichlet Allocation (2003)
3 100s of probabilistic topic models (PTMs) based on Graphical Models & Bayesian Inference
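As a rough illustration of the first model in the list above, here is a minimal sketch of PLSA fitted by EM on a tiny word-count matrix. It is a textbook formulation, not BigARTM's implementation; the names `normalize` and `plsa_em`, the toy counts, and the iteration count are hypothetical choices made for the example.

```python
import random


def normalize(values):
    """Scale non-negative values so they sum to 1 (uniform if all are zero)."""
    total = sum(values)
    if total == 0:
        return [1.0 / len(values)] * len(values)
    return [v / total for v in values]


def plsa_em(counts, num_topics, num_iters=50, seed=0):
    """Fit PLSA by EM (illustrative sketch, hypothetical helper).

    counts[d][w] is the number of times word w occurs in document d.
    Returns (phi, theta) with phi[t][w] = p(w|t) and theta[d][t] = p(t|d).
    """
    rng = random.Random(seed)
    num_docs, num_words = len(counts), len(counts[0])
    phi = [normalize([rng.random() for _ in range(num_words)])
           for _ in range(num_topics)]
    theta = [normalize([rng.random() for _ in range(num_topics)])
             for _ in range(num_docs)]

    for _ in range(num_iters):
        n_wt = [[0.0] * num_words for _ in range(num_topics)]
        n_td = [[0.0] * num_topics for _ in range(num_docs)]
        for d in range(num_docs):
            for w in range(num_words):
                if counts[d][w] == 0:
                    continue
                # E-step: p(t|d,w) is proportional to p(w|t) * p(t|d).
                p_tdw = normalize([phi[t][w] * theta[d][t]
                                   for t in range(num_topics)])
                for t in range(num_topics):
                    n_wt[t][w] += counts[d][w] * p_tdw[t]
                    n_td[d][t] += counts[d][w] * p_tdw[t]
        # M-step: renormalize the expected counts into distributions.
        phi = [normalize(row) for row in n_wt]
        theta = [normalize(row) for row in n_td]
    return phi, theta


# Toy collection: documents 0-1 use words 0-2, documents 2-3 use words 3-5.
counts = [
    [4, 3, 2, 0, 0, 0],
    [3, 4, 1, 0, 0, 0],
    [0, 0, 0, 2, 4, 3],
    [0, 0, 0, 3, 2, 4],
]
phi, theta = plsa_em(counts, num_topics=2)
```

Each row of `phi` is a distribution over words for one topic, and each row of `theta` is a distribution over topics for one document.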
[Figure: a sample Cyrillic-script document and three topics, each shown as a list of its top words with probabilities (0.018, 0.013, 0.011, ...; 0.023, 0.016, 0.009, ...; 0.014, 0.009, 0.006, ...). The document text and the words are not recoverable from the extraction.]
${}_{W \times D} = \Phi_{W \times T} \cdot \Theta_{T \times D}$, where
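A minimal numeric sketch of the factorization above, assuming the subscripts denote matrix sizes (W words, T topics, D documents) and that each column of Φ and Θ is a probability distribution. Those readings, the helper name `matmul`, and the toy numbers are assumptions for illustration and are not taken from the slides.

```python
def matmul(a, b):
    """Multiply matrix a (rows x inner) by matrix b (inner x cols), both lists of rows."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


# Phi is W x T: column t holds the word distribution of topic t (assumed).
phi = [
    [0.5, 0.0],
    [0.3, 0.1],
    [0.2, 0.3],
    [0.0, 0.6],
]
# Theta is T x D: column d holds the topic distribution of document d (assumed).
theta = [
    [0.9, 0.5, 0.2],
    [0.1, 0.5, 0.8],
]

product = matmul(phi, theta)  # W x D

# Each column of the product sums to 1, so it is again a distribution over words.
column_sums = [sum(product[w][d] for w in range(len(product)))
               for d in range(len(product[0]))]
```

Under these assumptions the product of two column-stochastic matrices is itself column-stochastic, which is what lets the W×D matrix be read as per-document word probabilities.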
[Diagram: Topic Modeling takes text documents (doc1, doc2, doc3, doc4, ...) and yields topics of documents and words and keyphrases of topics; the two axes are labeled Documents and Topics.]
Metadata: Authors, Data Time, Conference, Organization, URL, etc.
Ads, Images, Links, Users
BigARTM implementation (http://bigartm.org): parallel architecture; time and memory performance; How to start using BigARTM
1 Download links, tutorials, documentation:
2 Linux: compile and start examples
1 Post questions in BigARTM discussion group:
2 Report bugs in BigARTM issue tracker:
3 Contribute to BigARTM project via pull requests:
Experiments: ARTM for combining regularizers; Multi-ARTM for classification; Multi-ARTM for multi-language TM
[Figure: two plots over a horizontal axis running from 1·10^6 to 3·10^6. First plot: Perplexity (left axis, ·10^4) and Sparsity (right axis, 20 to 100), with series Perplexity, Phi, Theta. Second plot: Kernel size (left axis, ·10^3) and Purity and contrast (right axis, 0.2 to 1), with series Size, Purity, Contrast.]
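The figure tracks perplexity and the sparsity of Phi and Theta. As a minimal sketch, assuming the usual definitions (perplexity as the exponent of the negative average log-likelihood per token, sparsity as the share of zero entries in a matrix), these two quantities could be computed as follows. The function names and the toy matrices are hypothetical and do not reflect BigARTM's own scorers.

```python
import math


def perplexity(counts, phi, theta):
    """exp(-(1/n) * sum over d,w of n_dw * ln p(w|d)).

    p(w|d) = sum_t phi[w][t] * theta[t][d]; counts[d][w] is the count of
    word w in document d; phi is W x T and theta is T x D (lists of rows).
    Assumes p(w|d) > 0 for every word that actually occurs.
    """
    log_likelihood, total = 0.0, 0
    for d, doc in enumerate(counts):
        for w, n_dw in enumerate(doc):
            if n_dw == 0:
                continue
            p_wd = sum(phi[w][t] * theta[t][d] for t in range(len(theta)))
            log_likelihood += n_dw * math.log(p_wd)
            total += n_dw
    return math.exp(-log_likelihood / total)


def sparsity(matrix):
    """Fraction of entries that are exactly zero."""
    cells = [value for row in matrix for value in row]
    return sum(1 for value in cells if value == 0) / len(cells)


# Toy model: W = 4 words, T = 2 topics, D = 2 documents.
phi = [[0.5, 0.0], [0.5, 0.0], [0.0, 0.5], [0.0, 0.5]]
theta = [[1.0, 0.0], [0.0, 1.0]]
counts = [[3, 3, 0, 0], [0, 0, 2, 2]]

model_perplexity = perplexity(counts, phi, theta)
phi_sparsity = sparsity(phi)
theta_sparsity = sparsity(theta)
```

In this toy case every observed word has probability 0.5 in its document, so the perplexity is close to 2, and half of the entries in each matrix are zero.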