An introduction to network inference and mining
Nathalie Villa-Vialaneix - nathalie.villa@toulouse.inra.fr - http://www.nathalievilla.org
INRA, UR 875 MIAT
Formation Biostatistique, Niveau 3 (Biostatistics training, Level 3)
Formation INRA (Niveau 3) - Network - Nathalie Villa-Vialaneix

Outline
1 A brief introduction to networks/graphs
2 Network inference
3 Simple graph mining
  Visualization
  Global characteristics
  Numerical characteristics calculation
  Clustering

A brief introduction to networks/graphs

What is a network/graph? (French: réseau/graphe)
A mathematical object used to model relational data between entities.
The entities are called the nodes or the vertices (singular: vertex; French: nœuds/sommets).
A relation between two entities is modeled by an edge (French: arête).
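The definitions above can be made concrete with a minimal base R sketch. The node names and edges below are invented for illustration; they do not come from the slides. The graph is stored both as an edge list and as an adjacency matrix.

```r
# Toy undirected graph: 4 entities (nodes) and 4 relations (edges).
# Node names and edges are hypothetical, chosen only for illustration.
nodes <- c("A", "B", "C", "D")
edges <- data.frame(from = c("A", "A", "B", "C"),
                    to   = c("B", "C", "C", "D"),
                    stringsAsFactors = FALSE)

# Adjacency matrix: entry [i, j] is 1 when nodes i and j share an edge.
adj <- matrix(0, nrow = length(nodes), ncol = length(nodes),
              dimnames = list(nodes, nodes))
adj[cbind(edges$from, edges$to)] <- 1
adj[cbind(edges$to, edges$from)] <- 1   # undirected: fill both triangles

# Degree of a node = number of edges attached to it.
degree <- rowSums(adj)
print(adj)
print(degree)
```

The edge list is compact for sparse graphs, while the adjacency matrix makes quantities such as the degree a one-line computation.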

(Non-biological) examples
Social network: nodes: persons; edges: two persons are connected ("friends"). Example: Natty's Facebook™ network.

Modeling a large corpus of medieval documents (http://graphcomp.univ-tlse2.fr)
Notarial acts (mostly baux à fief, more precisely land charters) established in a seigneurie named "Castelnau Montratier", written between 1250 and 1500, involving tenants and lords.
• nodes: transactions and individuals (3,918 nodes)
• edges: an individual is directly involved in a transaction (6,455 edges)

Standard issues associated with networks
Inference: given data, how can we build a graph whose edges represent the direct links between variables?
Example: co-expression networks built from microarray data (nodes = genes; edges = significant "direct links" between the expressions of two genes).
Graph mining (examples):
1 Network visualization: nodes are not a priori associated with a given position. How can the network be represented in a meaningful way? [Figure: random positions vs. positions aiming at placing connected nodes closer together]
2 Network clustering: identify "communities" (groups of nodes that are densely connected to each other and share comparatively few links with the other groups).

More complex relational models
Nodes may be labeled by a factor, or by numerical information [Laurent and Villa-Vialaneix, 2011].
Edges may also be labeled (type of the relation), weighted (strength of the relation) or directed (direction of the relation).
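As a rough illustration of these richer models, the base R sketch below attaches a factor label and a numerical label to each node and stores weighted, directed edges in a non-symmetric matrix. All names, labels and weights are hypothetical.

```r
# Node attributes: one factor label and one numerical label per node.
# All values are invented for illustration.
nodes <- data.frame(name  = c("A", "B", "C"),
                    group = factor(c("g1", "g2", "g2")),  # factor label
                    score = c(0.7, 1.2, 3.4),             # numerical label
                    stringsAsFactors = FALSE)

# Weighted, directed edges: w[i, j] is the strength of the relation i -> j.
w <- matrix(0, 3, 3, dimnames = list(nodes$name, nodes$name))
w["A", "B"] <- 2.5
w["B", "C"] <- 1.0
w["C", "A"] <- 0.3

# A directed graph generally has a non-symmetric weight matrix.
is_directed  <- !all(w == t(w))
out_strength <- rowSums(w)   # total weight leaving each node
in_strength  <- colSums(w)   # total weight entering each node
```

Edge labels (type of relation) could be stored the same way, for instance in a character matrix with the same dimensions as `w`.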

Network inference

Framework
Data: large-scale gene expression data
$$X = \left(X_i^j\right)_{i=1,\dots,n;\ j=1,\dots,p}$$
where the rows are the individuals ($n \simeq 30$ to $50$) and the columns are the variables (gene expressions, $p \simeq 10^{3}$ to $10^{4}$).
What we want to obtain: a network with
• nodes: genes;
• edges: significant and direct co-expression between two genes (to track transcription regulations)

Advantages of inferring a network from large-scale transcription data
1 Over raw data: it focuses on the strongest direct relationships; irrelevant or indirect relations are removed (more robust) and the data are easier to visualize and understand. Expression data are analyzed all together rather than pair by pair.
2 Over a bibliographic network: it can handle interactions with as yet unknown (unannotated) genes and deal with data collected under a particular condition.

Using correlations: relevance network [Butte and Kohane, 1999, Butte and Kohane, 2000]
First (naive) approach: calculate the correlations between expressions for all pairs of genes, discard the smallest ones by thresholding, and build the network from the remaining pairs.
[Figure: "Correlations" matrix -> thresholding -> graph]
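A minimal sketch of this naive approach in base R, on simulated data. The helper name `relevance_network`, the threshold of 0.8 and the use of absolute correlations are assumptions made for the example, not choices taken from the slides or from the cited papers.

```r
set.seed(1)   # arbitrary seed, for reproducibility only
n <- 40       # individuals (toy size)
p <- 6        # genes (toy size, far smaller than real expression data)
X <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("gene", 1:p)))
X[, 2] <- X[, 1] + rnorm(n, sd = 0.1)   # make gene2 strongly co-expressed with gene1

# Hypothetical helper: keep an edge when |correlation| reaches the threshold.
relevance_network <- function(X, threshold = 0.8) {
  R <- cor(X)                          # p x p correlation matrix
  adj <- (abs(R) >= threshold) * 1     # 0/1 adjacency matrix
  diag(adj) <- 0                       # no self-loops
  adj
}

adj <- relevance_network(X, threshold = 0.8)
print(adj)
```

The threshold is the only tuning parameter here; how to choose it is left open in this sketch.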

But correlation is not causality...
[Figure: x is linked to y and to z; y and z show a strong indirect correlation]

    set.seed(2807)
    x <- runif(100)
    y <- 2*x + 1 + rnorm(100, 0, 0.1)
    cor(x, y)
    # [1] 0.9988261
    z <- 2*x + 1 + rnorm(100, 0, 0.1)
    cor(x, z)
    # [1] 0.998751
    cor(y, z)
    # [1] 0.9971105
    # Partial correlation
    cor(lm(y ~ x)$residuals, lm(z ~ x)$residuals)
    # [1] -0.1933699

Networks are built using partial correlations, i.e., correlations between gene expressions knowing the expression of all the other genes (residual correlations).
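The residual-based idea extends from three variables to any number of them. The sketch below is one possible base R implementation, assuming enough observations to regress on all the other variables; the helper name `partial_cor_resid` is hypothetical.

```r
# Hypothetical helper: partial correlation between columns j and k of X,
# knowing all the other columns, computed as a correlation of residuals.
partial_cor_resid <- function(X, j, k) {
  others <- X[, -c(j, k), drop = FALSE]
  res_j <- residuals(lm(X[, j] ~ others))   # part of column j not explained by the others
  res_k <- residuals(lm(X[, k] ~ others))   # part of column k not explained by the others
  cor(res_j, res_k)
}

# Same simulation design as in the slide: y and z both depend on x only.
set.seed(2807)
x <- runif(100)
y <- 2 * x + 1 + rnorm(100, 0, 0.1)
z <- 2 * x + 1 + rnorm(100, 0, 0.1)
X <- cbind(x = x, y = y, z = z)

pc_yz <- partial_cor_resid(X, 2, 3)   # correlation of y and z knowing x
print(pc_yz)
```

This approach needs one pair of regressions per pair of variables, so it is only practical for small numbers of variables.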

Various approaches (and packages) to infer gene expression networks
• Graphical Gaussian Model: $(X_i)_{i=1,\dots,n}$ are i.i.d. Gaussian random variables $\mathcal{N}(0, \Sigma)$ (gene expression); then
$$j \longleftrightarrow j' \ (\text{genes } j \text{ and } j' \text{ are linked}) \iff \mathrm{Cor}\left(X^j, X^{j'} \mid (X^k)_{k \neq j, j'}\right) \neq 0$$
$$\mathrm{Cor}\left(X^j, X^{j'} \mid (X^k)_{k \neq j, j'}\right) \simeq \left(\Sigma^{-1}\right)_{j,j'}$$
(up to sign and normalization by the diagonal of $\Sigma^{-1}$) ⇒ find the partial correlations by means of $(\widehat{\Sigma}^n)^{-1}$.
Problem: $\Sigma$ is a $p \times p$ matrix (with $p$ large) and $n$ is small compared to $p$ ⇒ $(\widehat{\Sigma}^n)^{-1}$ is a poor estimate of $\Sigma^{-1}$!
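The link between partial correlations and the inverse covariance matrix can be sketched in base R as follows. The simulated data, the sizes and the helper name `partial_cor_from_cov` are assumptions made for the example; the second part illustrates why the plain sample estimate breaks down when n is small compared to p.

```r
# Hypothetical helper: all partial correlations from a covariance matrix S,
# using pcor[j, k] = -omega[j, k] / sqrt(omega[j, j] * omega[k, k]),
# where omega is the inverse of S.
partial_cor_from_cov <- function(S) {
  omega <- solve(S)
  pc <- -omega / sqrt(diag(omega) %o% diag(omega))
  diag(pc) <- 1
  pc
}

set.seed(42)   # arbitrary seed
n <- 200
p <- 4
X <- matrix(rnorm(n * p), n, p)
X[, 2] <- X[, 1] + rnorm(n, sd = 0.5)   # columns 2 and 3 both depend on column 1
X[, 3] <- X[, 1] + rnorm(n, sd = 0.5)
pc <- partial_cor_from_cov(cov(X))
print(round(pc, 3))

# With fewer individuals than variables, the sample covariance matrix has
# rank at most n - 1, so it is singular and cannot be inverted as is.
X_small <- matrix(rnorm(10 * 50), 10, 50)   # n = 10, p = 50
rank_small <- qr(cov(X_small))$rank
print(rank_small)
```

In the n < p setting some form of regularization or sparsity assumption is needed before inverting; this sketch does not implement one.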
