A Framework for Incorporating General Domain Knowledge into Latent Dirichlet Allocation using First-Order Logic

  1. A Framework for Incorporating General Domain Knowledge into Latent Dirichlet Allocation using First-Order Logic
     David Andrzejewski (1), Xiaojin Zhu (2), Mark Craven (3, 2), Benjamin Recht (2)
     (1) Center for Applied Scientific Computing, Lawrence Livermore National Laboratory (USA)
     (2) Department of Computer Sciences, University of Wisconsin–Madison (USA)
     (3) Department of Biostatistics and Medical Informatics, University of Wisconsin–Madison (USA)
     Andrzejewski (LLNL), LDA with Logical Domain Knowledge, IJCAI 2011

  2. Topic modeling with Latent Dirichlet Allocation (LDA) (Blei et al., JMLR 2003)
     Example documents:
     "Human embryonic stem cell research may benefit patients with genetic risk factors..."
     "Patients at risk for drug-resistant infection..."

  5. Topic modeling applications
     - Research trends (Wang & McCallum, 2006)
     - Info retrieval (UMass) (also KDD 2011!)
     - Author/document profiling
     - Scientific impact/influence (Gerrish & Blei, 2009)
     - Match papers to reviewers (Mimno & McCallum, 2007)


  8. Unsupervised LDA: extend the model? Add domain knowledge
     - "These words do (not) belong in the same topic"
     - "I want a topic about X"
     - "This topic is incompatible with this document"
     - "These topics are incompatible and should not co-occur"
     First-Order Logic latent Dirichlet Allocation (Fold·all)
     - Weighted knowledge base (KB) of first-order logic (FOL) rules (Markov Logic Networks, Richardson and Domingos 2006)
     - Learned topics φ are influenced by both:
       - Word-document statistics (as in LDA)
       - Domain knowledge rules (as in MLN)
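
A rough sketch of how the two influences on φ named on this slide might combine, written in the style of a Markov Logic Network. The slide itself gives no formula, so everything here is assumed notation for illustration: λ_l is the weight of rule ψ_l, G(ψ_l) is its set of groundings, the indicator is 1 when grounding g is satisfied by the topic assignment z, and the names s_KB and Q are hypothetical.

```latex
% MLN-style score of a topic assignment z under L weighted rules:
% each satisfied grounding g of rule \psi_l contributes its weight \lambda_l.
\[
  s_{\mathrm{KB}}(\mathbf{z}) \;=\; \sum_{l=1}^{L} \lambda_l
      \sum_{g \in G(\psi_l)} \mathbb{1}_g(\mathbf{z})
\]
% Assumed combined objective: the LDA log-joint (word-document statistics)
% plus the knowledge-base score (domain knowledge rules).
\[
  Q(\mathbf{z}, \phi, \theta) \;=\;
      \log p_{\mathrm{LDA}}(\mathbf{w}, \mathbf{z}, \phi, \theta \mid \mathbf{d})
      \;+\; s_{\mathrm{KB}}(\mathbf{z})
\]
```

Under this reading, a large positive λ_l makes a rule behave almost like a hard constraint, while a small one lets the word-document statistics override it.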


  19. Representing LDA with logical predicates

      | Value     | Logical predicate | Description       |
      |-----------|-------------------|-------------------|
      | z_i = t   | Z(i, t)           | Latent topic      |
      | w_i = v   | W(i, v)           | Observed word     |
      | d_i = j   | D(i, j)           | Observed document |

      The rows above are the standard LDA variables. Predicates give a unified way to capture metadata / annotations.
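
The predicate encoding on this slide can be illustrated with a minimal Python sketch. It stores each predicate as a set of ground atoms `(i, value)`, one per token position `i`. The function name `to_predicates` and the toy corpus are hypothetical and chosen only for illustration; they are not taken from the talk.

```python
from typing import List, Set, Tuple


def to_predicates(
    words: List[str], docs: List[int], topics: List[int]
) -> Tuple[Set[Tuple[int, int]], Set[Tuple[int, str]], Set[Tuple[int, int]]]:
    """Encode a tokenized corpus as ground atoms Z(i, t), W(i, v), D(i, j).

    Position i indexes a token; words[i] is its word, docs[i] its document,
    and topics[i] its (latent) topic assignment.
    """
    if not (len(words) == len(docs) == len(topics)):
        raise ValueError("words, docs and topics must have the same length")
    Z = {(i, t) for i, t in enumerate(topics)}  # z_i = t  ->  Z(i, t)
    W = {(i, v) for i, v in enumerate(words)}   # w_i = v  ->  W(i, v)
    D = {(i, j) for i, j in enumerate(docs)}    # d_i = j  ->  D(i, j)
    return Z, W, D


# Toy corpus (assumed for illustration): four tokens in two documents.
words = ["stem", "cell", "patients", "drug"]
docs = [0, 0, 1, 1]
topics = [0, 0, 1, 1]

Z, W, D = to_predicates(words, docs, topics)

print((1, "cell") in W)                       # True: W(1, "cell") holds
print(sorted(i for (i, j) in D if j == 1))    # [2, 3]: tokens in document 1
```

Because every variable is just a set of atoms over token indices, extra annotations (for example a hypothetical sentence-index predicate) could be added the same way without changing the representation.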
