Recent advances in document network embedding @ERIC, Julien Velcin (PowerPoint presentation)



SLIDE 1

Recent advances in document network embedding @ERIC

Julien Velcin

julien.velcin@univ-lyon2.fr

Université Lumière Lyon 2 - ERIC Lab

Dynamics On and Of Complex Networks 2020

SLIDE 2

Context

SLIDE 3

Informational landscape

Projet Pulseweb (Cointet, Chavalarias…): http://pulseweb.cortext.net
Chronolines (Nguyen et al., 2014)
Metromaps (Shahaf et al., 2015)
Readitopics (Velcin et al., 2018): https://github.com/Erwangf/readitopics

SLIDE 4

Document network embedding

  • Document network: “graph of vertices, where each vertex is associated with a text document” (Tuan et al., 2014)
     e.g.: scientific articles, newspapers, social media…

  • Embedding for building a joint space for solving downstream tasks (e.g., link prediction, node classification, community detection)

[Figure: a sample document (a paragraph defining complex systems) is mapped by the embedding into a joint space that supports classification, link prediction, clustering and visualisation]

SLIDE 5

Quick survey

  • Graph/Node embedding
     • Laplacian Eigenmaps (Belkin and Niyogi, 2002)
     • DeepWalk (Perozzi et al., 2014), Node2vec (Grover and Leskovec, 2016)
     • Graph Neural Networks (Scarselli et al., 2009)
  • Document network embedding
     • TADW (Yang et al., 2015)
     • Attention models and CANE (Tu et al., 2017)

SLIDE 6

Collaborators of the DMD team

Robin Brochier, PhD student (now graduated!)
Antoine Gourru, PhD student
Adrien Guille, Associate Professor
Julien Jacques, Professor

SLIDE 7

Contributions

SLIDE 8

Regularized Linear Embedding (RLE)

Given:

  • U ∈ ℝ^{v×k}: the matrix of pretrained word embeddings
  • T ∈ ℝ^{n×v}: the Document × Word matrix (textual information)
  • A ∈ [0,1]^{n×n}: the transition matrix (graph information)
  • Goal: learn the weights p_i ∈ ℝ^v (the parameter to learn) for the words composing d_i; the vector for d_i is just a weighted sum over the pretrained WE: d_i = p_i U

Gourru A., J. Velcin, J. Jacques and A. Guille.
 Document Network Projection in Pretrained Word Embedding Space. ECIR 2020.

SLIDE 9

RLE (con’t)

P = (1 − λ)T + λB

with λ ∈ [0,1] a tradeoff b/w textual and structural information, and B built from a square matrix S ∈ ℝ^{n×n} that reflects the pairwise similarity between nodes in the graph:

b_i = (1 / ∑_j S_ij) ∑_j S_ij t_j
 (here, we use S = (A + A²) / 2)
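The RLE construction above can be sketched in a few lines of NumPy (a minimal illustration, not the authors' code; the function name and toy matrices are made up, while the formulas follow the slides):

```python
import numpy as np

def rle_embed(U, T, A, lam=0.5):
    """Regularized Linear Embedding (sketch).

    U: (v, k) pretrained word embeddings
    T: (n, v) document-word matrix (rows t_i)
    A: (n, n) transition matrix of the document graph
    lam: tradeoff between textual (T) and structural (B) information
    """
    S = (A + A @ A) / 2                       # pairwise node similarity, S = (A + A^2) / 2
    B = S @ T / S.sum(axis=1, keepdims=True)  # b_i = (1 / sum_j S_ij) sum_j S_ij t_j
    P = (1 - lam) * T + lam * B               # word weights p_i per document
    return P @ U                              # d_i = p_i U

# toy example: 3 documents, 4 words, 2-dim word embeddings
U = np.array([[1., 0.], [0., 1.], [1., 1.], [0.5, 0.5]])
T = np.array([[.5, .5, 0, 0], [0, 0, .5, .5], [.25, .25, .25, .25]])
A = np.array([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])
A = A / A.sum(axis=1, keepdims=True)  # row-stochastic transition matrix
D = rle_embed(U, T, A, lam=0.3)
```

Note that with λ = 0 the method reduces to a plain bag-of-words average of pretrained vectors, d_i = t_i U.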

SLIDE 10

Evaluation

Datasets:

  • Cora (2,211 docs; 7 labels = topic; 5,001 citation links)
  • DBLP (60,744 docs; 4 labels = topic; 52,914 links)
  • New York Times (5,135 docs; 4 labels = article section; 3,050,513 links = common tag)

https://github.com/AntoineGourru/DNEmbedding

  • Task 1: node classification
  • Task 2: link prediction

SLIDE 11

Sensitivity to λ (figure)

SLIDE 12

GVNR and GVNR-t

  • Quick reminder of DeepWalk (Perozzi et al., 2014):
     • goal: learn vector representations of nodes
     • approach: a) make multiple random walks,
       b) treat the paths as documents,
       c) use Skip-Gram to build vectors (Mikolov et al., 2013)
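Steps a) and b) can be sketched as follows (a minimal illustration; the function name and toy graph are made up, and step c) would feed the resulting walks to any Skip-Gram implementation):

```python
import random
from collections import defaultdict

def random_walks(edges, num_walks=10, walk_length=5, seed=0):
    """Generate DeepWalk-style truncated random walks over an undirected graph.

    Each walk is a list of node ids; in DeepWalk these 'sentences' are then
    fed to Skip-Gram to learn one vector per node.
    """
    rng = random.Random(seed)
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    walks = []
    for _ in range(num_walks):
        for start in list(adj):   # one walk starting from every node
            walk = [start]
            while len(walk) < walk_length:
                walk.append(rng.choice(adj[walk[-1]]))  # uniform next step
            walks.append(walk)
    return walks

# toy triangle graph: 3 nodes, 2 walks per node, walks of length 4
walks = random_walks([(0, 1), (1, 2), (2, 0)], num_walks=2, walk_length=4)
```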


Brochier, A., Guille and J. Velcin. Global Vectors for Node Representation. The Web Conference (WWW) 2019.

(Skip-Gram figure: target and context nodes)

SLIDE 13

GVNR and GVNR-t

  • Following GloVe (Pennington et al., 2014), GVNR solves a regression task on the weighted cooccurrence matrix X, where cells with values below a threshold x_min are set to 0

  • We’re looking for (U, b^U) and (V, b^V) s.t.:

arg min_{U, V, b^U, b^V} ∑_{i=1}^n ∑_{j=1}^n s(x_ij) (u_i · v_j + b^U_i + b^V_j − log(c + x_ij))²

with s(x_ij) = 1 if x_ij > 0, and s(x_ij) = m_i ∼ B(α) otherwise, where α is chosen s.t. m = k on average
 (x_ij = cooccurrence b/w x_i and x_j)

  • GVNR-t integrates textual information by replacing v_j with a text-based vector, now learning U and W:

arg min_{U, W, b^U, b^V} ∑_{i=1}^n ∑_{j=1}^n s(x_ij) (u_i · (δ_j W) / |δ_j|₁ + b^U_i + b^V_j − log(c + x_ij))²

Brochier, A., Guille and J. Velcin. Global Vectors for Node Representation. The Web Conference (WWW) 2019.
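The GVNR-t objective can be sketched in NumPy (illustrative only, not the authors' implementation; here every nonzero cell gets weight 1 and the Bernoulli subsampling of zero cells is omitted; all names and toy data are made up):

```python
import numpy as np

def gvnr_t_loss(X, U, Delta, W, bU, bV, c=1.0):
    """GloVe-style squared loss of GVNR-t (sketch).

    X:     (n, n) node co-occurrence counts from random walks
    U:     (n, d) target node vectors u_i
    Delta: (n, v) document-word count matrix (rows delta_j)
    W:     (v, d) word vectors; context vector v_j = (delta_j W) / |delta_j|_1
    bU, bV: (n,) bias terms
    """
    V = Delta @ W / Delta.sum(axis=1, keepdims=True)   # text-based context vectors
    pred = U @ V.T + bU[:, None] + bV[None, :]
    err = (pred - np.log(c + X)) ** 2
    mask = (X > 0).astype(float)   # s(x_ij) = 1 on nonzero cells; zero cells dropped here
    return (mask * err).sum()

# toy data: 4 nodes, 5-word vocabulary, 2-dim vectors
rng = np.random.default_rng(0)
X = rng.integers(0, 3, (4, 4)).astype(float)
U = rng.normal(size=(4, 2))
Delta = rng.integers(1, 4, (4, 5)).astype(float)
W = rng.normal(size=(5, 2))
loss = gvnr_t_loss(X, U, Delta, W, np.zeros(4), np.zeros(4))
```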


SLIDE 14

Results for GVNR-t

  • Classification on two citation networks (Cora with 2,708 nodes and Citeseer with 3,312 nodes)
  • Keyword recommendation on DBLP (1,397,240 documents and 3,021,489 citation relationships)

https://github.com/brochier/gvnr

SLIDE 15

Inductive Document Network Embedding (IDNE)

[Architecture figure: documents x_i, x_j ∈ ℝ^{nw} each pass through a topical attention module (shared parameters W and T) to give d_i, d_j ∈ ℝ^p; a link is predicted with σ(d_i · d_j), where 0 = no link and 1 = link]

Brochier R., A. Guille and J. Velcin.
 Inductive Document Network Embedding with Topic-Word Attention. ECIR 2020 (virtual).

SLIDE 16

Topical attention

[Figure: for document d_i, dot products between the K topic vectors (T) and the word vectors (W) give the topical attention weights Z, yielding one topical vector u_(i|k) per topic]

Document representation is the normalized sum over the K topics:

d_i = (∑_{k=1}^K u_(i|k)) / |x_i|₁
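One possible reading of the attention mechanism above, as a NumPy sketch (the exact form of the attention weights is an assumption, here a ReLU over word-topic dot products; only the final normalized sum over the K topics follows the slide directly):

```python
import numpy as np

def idne_document_vector(x, W, T):
    """Topic-word attention document vector (sketch, one reading of the slide).

    x: (nw,) word counts of the document
    W: (nw, p) word vectors, T: (K, p) topic vectors (shared parameters)
    """
    Z = np.maximum(W @ T.T, 0.0)     # (nw, K) word-topic attention weights (assumed ReLU)
    U_k = (x[:, None] * Z).T @ W     # u_(i|k): attention-weighted word vectors, (K, p)
    return U_k.sum(axis=0) / np.abs(x).sum()   # d_i = sum_k u_(i|k) / |x_i|_1

# toy document over 4 words, 3 topics, 2-dim vectors
x = np.array([2., 0., 1., 1.])
W = np.array([[1., 0.], [0., 1.], [1., 1.], [0.5, -0.5]])
T = np.array([[1., 0.], [0., 1.], [1., 1.]])
d = idne_document_vector(x, W, T)
```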

SLIDE 17

Learning IDNE

Minimize:

ℒ(W, T) = − ∑_{i=1}^{n_d} ∑_{j=1}^{n_d} [ s_ij log σ(u_i · u_j) + (1 − s_ij) log σ(−u_i · u_j) ]

  • S is a binary similarity matrix based on A, for instance:

s_ij = 1 if (A + A²)_ij > 0, else s_ij = 0
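The loss above is a standard binary cross-entropy over document pairs; a NumPy sketch (illustrative, not the authors' code; names and toy data are made up):

```python
import numpy as np

def idne_loss(D, A):
    """Binary cross-entropy loss over document pairs (sketch).

    D: (nd, p) document vectors u_i produced by topical attention
    A: (nd, nd) adjacency matrix; s_ij = 1 iff (A + A^2)_ij > 0
    """
    S = ((A + A @ A) > 0).astype(float)   # binary similarity matrix

    def log_sigmoid(z):
        return -np.logaddexp(0.0, -z)     # numerically stable log sigma(z)

    dots = D @ D.T
    return -(S * log_sigmoid(dots) + (1 - S) * log_sigmoid(-dots)).sum()

# toy data: 4 documents on a path graph, 2-dim vectors
rng = np.random.default_rng(1)
D = rng.normal(size=(4, 2))
A = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
loss = idne_loss(D, A)
```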

SLIDE 18

Results of IDNE on Cora

[Results table shown as a figure]

Legend: T = transductive, I = inductive; C = classification, P = link prediction

SLIDE 19

Observations

SLIDE 20

Observations (con’t)

[Figures: MCMC + theory; decision trees]

SLIDE 21

Conclusion and future works

SLIDE 22

Conclusion

  • Several contributions on the embedding of documents

augmented with network information:
 RLE, GVNR-t, MATAN, IDNE, GELD

  • Use of “absolute” WE leads to good results. Can they be improved using contextualized WE (Devlin et al., 2018)?
  • Recent advances in GNN should be considered in the future, e.g. GAT (Veličković et al., 2018)

SLIDE 23

Future works

  • Integrating uncertainty in the modelling


(Gourru et al., 2020)

  • Moving to author embedding (Ganesh et al., 2016) and

modeling dynamics following (Balmer et al., 2017)

  • Information diffusion in information networks (work in

progress with G. Poux and S. Loudcher)

SLIDE 24

References

  • Brochier R., A. Guille and J. Velcin. Inductive Document Network Embedding with Topic-

Word Attention. ECIR 2020 (virtual).

  • Brochier R., A. Guille and J. Velcin. Link Prediction with Mutual Attention for Text-

Attributed Networks. Workshop on Deep Learning for Graphs and Structured Data Embedding, colocated with WWW (Companion Volume), May 13–17, 2019, San Francisco, CA, USA.

  • Brochier R., A. Guille and J. Velcin. Global Vectors for Node Representation. The Web

Conference (WWW), May 13–17, 2019, San Francisco, CA, USA.

  • Gourru A., J. Velcin, J. Jacques and A. Guille. Document Network Projection in Pretrained Word Embedding Space. ECIR 2020 (virtual).

  • Gourru A., J. Velcin and J. Jacques. Gaussian Embedding of Linked Documents from a

Pretrained Semantic Space. IJCAI 2020.

➡ Code for GVNR and GVNR-t: https://github.com/brochier/gvnr
➡ Code for IDNE: https://github.com/brochier/idne
➡ Code for RLE and GELD: https://github.com/AntoineGourru/DNEmbedding
