Probabilistic Graphical Models for Credibility Analysis in Evolving Online Communities

Subhabrata Mukherjee
Max Planck Institute for Informatics, Germany
smukherjee@mpi-inf.mpg.de

Outline
● Motivation
● Prior Work and its Limitations
● Credibility Analysis
↘ Framework for Online Communities
↘ Temporal Evolution of Online Communities
↘ Credibility Analysis of Product Reviews
● Conclusions

Online Communities as a Knowledge Resource
● Online communities are massive repositories of knowledge accessed by regular users and professionals
↘ 59% of the adult U.S. population and half of U.S. physicians rely on online resources [IMS Health Report, 2014]
↘ 40% of online consumers consult online reviews before buying products [Nielsen Corporation, 2016]
● However, their usability is restricted due to serious credibility concerns (e.g., spam, misinformation, bias)

Concerns
“Rapid spread of misinformation online”: one of the top 10 challenges as per the World Economic Forum
Misinformation for health can have hazardous consequences

Truth Finding
● Structured data (e.g., SPO triples, tables, networks)
● Objective facts (e.g., Obama_BornIn_Hawaii vs. Obama_BornIn_Kenya)
● No contextual data (text)
● No external KB, metadata

Linguistic Analysis
● Unstructured text
● Subjective information (e.g., opinion spam, bias, viewpoint)
● External KB (e.g., WordNet, KG)
● No network / interactions, metadata

Research Questions
1. How can we jointly leverage users, network, and context for credibility analysis in online communities?
2. How can we model users’ evolution?
3. How can we deal with limited data?
4. How can we generate interpretable explanations for a credibility verdict?

Contributions
● Credibility Analysis Framework for Online Communities
↘ Classification: Health Communities [SIGKDD 2014]
↘ Regression: News Communities [CIKM 2015]
● Temporal Evolution of Online Communities [ICDM 2015, SIGKDD 2016]
● Credibility Analysis of Product Reviews [ECML-PKDD 2016, SDM 2017]

What is Credibility?
“A statement is credible if it is reported by a trustworthy user in an objective language”
“Trustworthy users corroborate each other on credible statements”

Credibility Analysis Framework for Classification
Problem: Given a set of posts from different users, extract credible statements (subject-predicate-object triples like DrugX_HasSideEffect_Y) from trustworthy users
Subhabrata Mukherjee, Gerhard Weikum and Cristian Danescu-Niculescu-Mizil: SIGKDD 2014

Network of Interactions: Cliques
Each user, post, and statement is a random variable with edges depicting interactions.
➔ Variables have observable features (e.g., authority, emotionality).
➔ A clique is formed between each user writing a post containing a statement.
Statements: An IE tool generates candidate triple patterns like Xanax_causes_headache, Xanax_gave_demonic-feel. Potentially thousands of such triples, with only a handful of credible ones.
Idea: Trustworthy users corroborate on credible statements in objective language
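
The clique construction described on this slide can be sketched roughly as follows. This is an illustration only, not the authors' implementation: the `Post` record, the helpers `build_cliques` and `supporters`, and the sample users are hypothetical, and the candidate triples are assumed to have been produced already by an IE tool.

```python
from collections import defaultdict
from typing import NamedTuple


class Post(NamedTuple):
    post_id: str
    user: str
    statements: tuple  # candidate triples found in the post (assumed given)


def build_cliques(posts):
    """Return one (user, post, statement) clique per statement occurrence."""
    return [(p.user, p.post_id, s) for p in posts for s in p.statements]


def supporters(cliques):
    """Map each statement to the set of users who assert it."""
    by_statement = defaultdict(set)
    for user, _post_id, statement in cliques:
        by_statement[statement].add(user)
    return dict(by_statement)


# Hypothetical toy data using the triples named on the slide.
posts = [
    Post("p1", "alice", ("Xanax_causes_headache",)),
    Post("p2", "bob", ("Xanax_causes_headache", "Xanax_gave_demonic-feel")),
]
cliques = build_cliques(posts)
support = supporters(cliques)
print(len(cliques))
print(sorted(support["Xanax_causes_headache"]))
```

A statement asserted by several users ends up in several cliques, which is what lets corroboration between users influence its label.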

Conditional Random Field to Exploit Joint Interactions (Users + Network + Context)
How to complement expert medical knowledge with large-scale non-expert data?
Partial Supervision: Expert-stated side-effects of drugs (top 20%) serve as partial training labels. The model predicts labels of unobserved statements.

Semi-Supervised Conditional Random Field
1. Estimate user trustworthiness
2. Estimate label of unknown statements S_u by Gibbs Sampling
3. Maximize log-likelihood to estimate feature weights
4. Repeat the E-Step and M-Step until convergence
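
The four steps can be illustrated with a heavily simplified stand-in. The equations for each step are not preserved in this transcript, so everything concrete below is an assumption: trustworthiness as a smoothed fraction of a user's statements currently labeled credible, a logistic score per statement, one Gibbs-style resampling sweep per iteration, a single gradient-ascent step as the M-step, and a fixed iteration count in place of a convergence test. The function names and toy data are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def score(weights, features, trust_term):
    """Assumed logistic probability that a statement is credible."""
    return sigmoid(sum(w * f for w, f in zip(weights, features)) + trust_term)


def run_em(statements, labels, n_iters=50, lr=0.1, seed=0):
    """statements: list of (features, users); labels: {index: 0 or 1} for known ones."""
    rng = random.Random(seed)
    n_feat = len(statements[0][0])
    weights = [0.0] * n_feat
    # Unknown statements start from a random guess.
    current = [labels.get(i, rng.randint(0, 1)) for i in range(len(statements))]
    trust = {}
    for _ in range(n_iters):
        # Step 1: trust = smoothed fraction of a user's statements labeled credible.
        counts = {}
        for (_feats, users), y in zip(statements, current):
            for u in users:
                pos, tot = counts.get(u, (0, 0))
                counts[u] = (pos + y, tot + 1)
        trust = {u: (pos + 1) / (tot + 2) for u, (pos, tot) in counts.items()}

        def trust_term(users):
            # Centered so that a neutral user (trust 0.5) contributes nothing.
            return sum(trust[u] - 0.5 for u in users) / len(users)

        # Step 2 (E-step): resample the labels of unknown statements only.
        for i, (feats, users) in enumerate(statements):
            if i not in labels:
                p = score(weights, feats, trust_term(users))
                current[i] = 1 if rng.random() < p else 0

        # Step 3 (M-step): one gradient-ascent step on the logistic log-likelihood.
        grad = [0.0] * n_feat
        for (feats, users), y in zip(statements, current):
            p = score(weights, feats, trust_term(users))
            for k, f in enumerate(feats):
                grad[k] += (y - p) * f
        weights = [w + lr * g for w, g in zip(weights, grad)]
    return weights, trust, current


# Hypothetical toy data: two users, two labeled and two unknown statements.
statements = [
    ([1.0, 0.0], ["u1"]),  # labeled credible
    ([1.0, 0.2], ["u1"]),  # unknown
    ([0.0, 1.0], ["u2"]),  # labeled not credible
    ([0.1, 1.0], ["u2"]),  # unknown
]
labels = {0: 1, 2: 0}
weights, trust, current = run_em(statements, labels)
```

The expert labels stay fixed throughout; only the unknown statements are resampled, which is what makes the procedure semi-supervised.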

Healthforum Dataset
● Healthboards.com community (www.healthboards.com) with 850,000 registered users and 4.5 million posts
● Expert labels about drugs from MayoClinic (www.mayoclinic.org)
↘ 6 widely used drugs for experimentation

What constitutes credible language?
Affective Emotions: compunction, anxiety, embarrassment, misery, distress, confidence, sympathy, self-esteem, eagerness, coolness

What constitutes credible language?
Discourse and Modalities:
● contrast (despite, though, ..)
● question (what, why, ..)
● conditional (if)
● adverb (maybe, probably, ..)
● modality (might, could, ..)
● determiner (this, that, ..)
● negation (not, never, ..)
● second person (you, ..)
● conjunction (therefore, consequently, ..)
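
One simple way to turn these cue categories into per-post features is to count how often each category occurs. The sketch below is an assumption about how such features might be computed, not the authors' feature extractor: the `LEXICON` holds only the sample words shown on the slide (the full word lists are elided there), and `discourse_features` is a hypothetical helper.

```python
import re
from collections import Counter

# Cue words copied from the slide; each list is only a sample of its category.
LEXICON = {
    "contrast": {"despite", "though"},
    "question": {"what", "why"},
    "conditional": {"if"},
    "adverb": {"maybe", "probably"},
    "modality": {"might", "could"},
    "determiner": {"this", "that"},
    "negation": {"not", "never"},
    "second_person": {"you"},
    "conjunction": {"therefore", "consequently"},
}


def discourse_features(text):
    """Return the relative frequency of each cue category in `text`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for tok in tokens:
        for category, words in LEXICON.items():
            if tok in words:
                counts[category] += 1
    total = max(len(tokens), 1)  # avoid division by zero on empty text
    return {category: counts[category] / total for category in LEXICON}


features = discourse_features("You might feel dizzy if you never eat.")
print(features["second_person"])
```

Normalizing by the token count keeps long and short posts comparable; raw counts would be an equally reasonable choice.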

Credibility Analysis Framework for Regression
In many online communities users rate items on their quality

Credibility Analysis in News Communities
Sources: trunews.com, Scientificamerican.com, snopes.com
Articles: “Global warming is a hoax”
Topics: Climate Change
Users: user-donald
Reviews & Ratings: scientific analysis, 1.5/5, conspiratory theory
However, user feedback is often subjective, influenced by users' bias and viewpoints

Credibility Analysis Framework for Regression
We use a CRF to capture the mutual interactions among sources, articles, topics, users, and reviews / ratings in news communities (e.g., newstrust.net, digg, reddit) to jointly rank all of the underlying factors.
Idea: Trustworthy sources publish objective articles corroborated by expert users with credible reviews/ratings

Online Communities: Factors
Related to Ensemble Learning, Learning to Rank

How to incorporate continuous ratings instead of discrete labels in CRF?
● Probability Mass Function for discrete labels
● Probability Density Function for continuous ratings
Subhabrata Mukherjee and Gerhard Weikum: CIKM 2015

Energy Function to Combine All

How to incorporate continuous ratings instead of discrete labels in CRF?
● We show that a certain energy function for clique potential, geared for reducing mean-squared-error, results in a multivariate Gaussian p.d.f.
● Constrained Gradient Ascent for inference
Subhabrata Mukherjee and Gerhard Weikum: CIKM 2015
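
Why a squared-error energy leads to a Gaussian can be checked in one dimension. The energy function used in the paper is not preserved in this transcript, so the sketch assumes the simplest quadratic form, E(y) = sum_k w_k (y - r_k)^2 with positive weights w_k and per-factor rating estimates r_k. Completing the square shows that exp(-E(y)) is an unnormalized Gaussian with mean sum_k w_k r_k / sum_k w_k and variance 1 / (2 sum_k w_k). All names and numbers below are hypothetical.

```python
import math


def energy(y, weights, predictions):
    """Assumed quadratic energy: weighted squared error against each factor."""
    return sum(w * (y - r) ** 2 for w, r in zip(weights, predictions))


def gaussian_from_energy(weights, predictions):
    """Mean and variance of the density proportional to exp(-energy(y))."""
    total = sum(weights)
    mean = sum(w * r for w, r in zip(weights, predictions)) / total
    variance = 1.0 / (2.0 * total)
    return mean, variance


def ratio_at(y, weights, predictions, mean, variance):
    """exp(-energy) divided by the unnormalized Gaussian; constant if they match."""
    gauss = math.exp(-((y - mean) ** 2) / (2.0 * variance))
    return math.exp(-energy(y, weights, predictions)) / gauss


weights = [2.0, 1.0, 1.0]      # hypothetical factor weights (must be positive)
predictions = [4.0, 2.0, 3.0]  # hypothetical per-factor rating estimates
mean, variance = gaussian_from_energy(weights, predictions)
print(mean, variance)
```

The ratio returned by `ratio_at` is the same for every `y`, which is the one-dimensional version of the claim on the slide; the multivariate case replaces the scalar weight sum with a precision matrix.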

Predicting Article Credibility Ratings in Newstrust.net
Progressive decrease in mean squared error with more network interactions and context

Take-away
● Semi-supervised and Continuous CRF to jointly identify trustworthy users, credible statements, and reliable postings in online communities
● A framework to incorporate richer aspects like user expertise, topics / facets, temporal evolution etc.

Temporal Evolution
● Online communities are dynamic, as users join and leave; acquire new vocabulary; evolve and mature over time
● Trustworthiness and expertise of users evolve over time
How to capture evolving user expertise?

Illustrative Example for Review Communities
● Consider the following camera reviews by the same user John:
“My first DSLR. Excellent camera, takes great pictures with high definition, without a doubt it makes honor to its name.” [Aug, 1997]
“The EF 75-300 mm lens is only good to be used outside. The 2.2X HD lens can only be used for specific items; filters are useless if ISO, AP,... . The short 18-55mm lens is cheap and should have a hood to keep light off lens.” [Oct, 2012]
How can we quantify this change in users’ maturity / experience?
How can we model this evolution / progression in users’ maturity?
Mukherjee et al.: ICDM 2015, SIGKDD 2016
