ISLab Intelligent Systems Lab Piet van Remortel - - PowerPoint PPT Presentation

SLIDE 1

ISLab

Intelligent Systems Lab
Piet van Remortel, piet.vanremortel@ua.ac.be

SLIDE 2

Overview

  • Who we are
  • What we do
  • Applied machine learning
  • Bio-informatics
  • What we look for

SLIDE 3

Who?

Who we are What we do Applied ML Bio-i What we look for

  • Prof. Dr. Alain Verschoren
  • Dr. Piet van Remortel
  • Koenraad Van Leemput
  • Tim Van den Bulcke (with ESAT-KUL)
  • Dr. Elmer Fernandez (Arg.)

SLIDE 4

What we do

  • (Applied) machine learning
  • History in evolutionary computation
  • More recent: ML applications such as bio-informatics and predictive toxicology

SLIDE 5

Machine Learning

  • Wikipedia says: “a broad subfield of artificial intelligence”, “concerned with the development of algorithms and techniques”, “which allow computers to learn”
  • Applications: search engines, bioinformatics, detecting credit card fraud, stock market analysis, classifying DNA sequences, speech and handwriting recognition, ...

SLIDE 6

Machine Learning

What is it good for?
  • extracting knowledge from given data
  • typically “hard” problems:
      • computationally hard: enumerating all possibilities takes forever, etc.
      • lack of traditional formalisms (traditional statistics, complete mathematical models of the problem)
      • high dimensions, discontinuities, ...

SLIDE 7

Genetic Algorithms

Use biological evolution as a metaphor for search/optimization
  • ‘evolve’ solutions to problems
  • is an existing biological organism a ‘solution’ to the problem of surviving in its environment?
  • iterate:
      • generate diversity
      • select for some fitness measure
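The ‘iterate: generate diversity, select’ loop can be sketched as follows. This is a minimal illustration, assuming a bitstring encoding and a hypothetical OneMax fitness (the number of 1-bits); tournament selection, one-point crossover, bit-flip mutation and elitism are common textbook operator choices, not necessarily the ones used at ISLab.

```python
import random

def one_max(bits):
    # Hypothetical fitness measure: the number of 1-bits (higher is better)
    return sum(bits)

def evolve(fitness, n_bits=20, pop_size=30, generations=60, p_mut=0.02, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        def pick():
            # Select: binary tournament keeps the fitter of two random individuals
            a, b = rng.sample(pop, 2)
            return a if fitness(a) >= fitness(b) else b

        children = [max(pop, key=fitness)]  # elitism: carry over the current best
        while len(children) < pop_size:
            p1, p2 = pick(), pick()
            # Generate diversity: one-point crossover followed by bit-flip mutation
            cut = rng.randrange(1, n_bits)
            child = p1[:cut] + p2[cut:]
            children.append([1 - g if rng.random() < p_mut else g for g in child])
        pop = children
    return max(pop, key=fitness)

best = evolve(one_max)
print(one_max(best), best)
```

Swapping `one_max` for a problem-specific fitness measure is enough to reuse the loop; elitism guarantees that the best fitness never decreases from one generation to the next.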

SLIDE 8

Genetic Algorithms

SLIDE 9

Genetic Algorithms

PRO
  • very robust: can solve a problem given only some fitness measure
      • actually means: can cope with non-convex fitness landscapes
      • actually means: can solve problems where good solutions do not necessarily resemble each other
      • actually means: solving the problem is harder than ‘climbing one hill’ in the landscape
  • can easily be parallelized (CalcUA!)

SLIDE 10

Genetic Algorithms

CON
  • inherently slow
  • probabilistic in nature
  • true mechanics not well understood: ‘it just works’ (more or less)

SLIDE 11

Genetic Algorithms

What we used to do with GAs:
  • study fitness landscapes
  • predict problem hardness, etc.
What we do now: apply!
  • e.g. feature selection in predictive toxicology: use a number of genes to characterise an unknown toxic compound

SLIDE 12

Genetic Algorithms

Typical applications of GAs:
  • high-dimensional problems
  • lots of interaction between the different elements of the solution
      • the impact of one parameter depends on the value of other parameters
      • e.g. the hot-water tap in the shower
  • possibility to assign a numerical quality to a candidate solution
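The shower example can be written as a fitness function in which the parameters interact. A sketch, assuming hot water at 60 °C, cold water at 15 °C, tap settings in litres per minute and a hypothetical target of 38 °C at 10 l/min: the best hot-tap setting changes with the cold-tap setting, so the two taps cannot be tuned independently.

```python
def shower_quality(hot, cold, target_temp=38.0, target_flow=10.0):
    """Hypothetical fitness: how pleasant a shower is for two tap settings (l/min)."""
    flow = hot + cold
    if flow == 0:
        return float("-inf")  # no water at all
    # Assumed supply temperatures: 60 °C hot water, 15 °C cold water
    temp = (60.0 * hot + 15.0 * cold) / flow
    # Penalise deviation from the desired temperature and the desired flow
    return -(temp - target_temp) ** 2 - (flow - target_flow) ** 2

def best_hot(cold, settings=range(21)):
    # Best hot-tap setting for a fixed cold-tap setting
    return max(settings, key=lambda hot: shower_quality(hot, cold))

print(best_hot(4), best_hot(6))  # → 4 6
```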

SLIDE 13

Classification

[Figure: classes A, B and C]

SLIDE 14

Classification

[Figure: classes A, B and C, and an unknown item (?)]

SLIDE 15

Classification

[Figure: classes A, B and C, and an unknown item (?); annotation: correct features]

SLIDE 16

Classification

[Figure: classes A, B and C, and an unknown item (?)]

SLIDE 17

Classification

[Figure: classes A, B and C, and an unknown item (?); annotation: domain knowledge]

SLIDE 18

Classification

[Figure: classes A, B and C, and an unknown item (?)]

SLIDE 19

Classification

[Figure: classes A, B and C, and an unknown item (?); annotation: characterisation (multiple classes)]

SLIDE 20

Classification of toxic compounds

  • Cooperation with EB&T (De Coen/Blust)
  • Goal: characterise an unknown toxic compound by means of gene expression fingerprint(s)
      • 40-50 known chemicals as training set
  • Goal: pre-compliance screening for toxic mode of action
  • Algorithmic foundation: SVM/PLS/GP
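The slides name SVM, PLS and GP as the algorithmic foundation. As a much simpler stand-in, the sketch below uses a nearest-centroid classifier to illustrate the idea of characterising an unknown compound from its expression fingerprint; the fingerprints, class names and three-gene dimension are invented for illustration.

```python
import math

def train_centroids(fingerprints, labels):
    """Average the expression fingerprints of the known chemicals per class."""
    sums, counts = {}, {}
    for vec, label in zip(fingerprints, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {label: [x / counts[label] for x in acc] for label, acc in sums.items()}

def classify(centroids, fingerprint):
    """Assign an unknown compound to the class whose centroid is nearest (Euclidean)."""
    return min(centroids, key=lambda label: math.dist(centroids[label], fingerprint))

# Invented log-ratio fingerprints over three genes for four training chemicals
train = [[2.0, -1.0, 0.1], [1.8, -0.8, 0.0], [-0.2, 1.5, 2.0], [0.1, 1.7, 1.8]]
modes = ["mode A", "mode A", "mode B", "mode B"]
centroids = train_centroids(train, modes)
print(classify(centroids, [1.9, -0.9, 0.2]))  # → mode A
```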

SLIDE 21

Classifiers

Many classifiers around; a prototype of two promising techniques:
  • SVM: support vector machine
  • PLS: partial least squares
More to be added:
  • Genetic programming
  • ...

SLIDE 22

PLS: Partial least squares

  • Regression technique: relates an input matrix X to an output matrix Y
  • Origin: chemistry (spectral analysis)
  • Pro:
      • handles collinear components in X
      • fast

[Figure: input matrix X and output matrix Y]
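A minimal PLS sketch in NumPy, assuming a single output column y (the PLS1 case, fitted with the NIPALS algorithm; a full output matrix Y would need the PLS2 variant). The invented data contain two perfectly collinear columns, so the ordinary least-squares normal equations are singular, while PLS with two components still fits y exactly.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """PLS1 via NIPALS: X is (samples, features), y is (samples,)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean          # work on mean-centred data
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        w = w / np.linalg.norm(w)            # weight: direction of max covariance with y
        t = Xk @ w                           # scores: one latent variable per sample
        tt = t @ t
        p = Xk.T @ t / tt                    # X loadings
        q = (yk @ t) / tt                    # y loading
        Xk = Xk - np.outer(t, p)             # deflate X and y before the next component
        yk = yk - q * t
        W.append(w); P.append(p); Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    B = W @ np.linalg.solve(P.T @ W, Q)      # coefficients in the original X space
    return B, x_mean, y_mean

def pls1_predict(model, X):
    B, x_mean, y_mean = model
    return (X - x_mean) @ B + y_mean

rng = np.random.default_rng(0)
x1, x3 = rng.normal(size=20), rng.normal(size=20)
X = np.column_stack([x1, 2.0 * x1, x3])      # columns 1 and 2 are perfectly collinear
y = 3.0 * x1 + x3
print(np.linalg.matrix_rank(X.T @ X))        # 2: the normal equations are singular
model = pls1_fit(X, y, n_components=2)
print(np.allclose(pls1_predict(model, X), y))
```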

SLIDE 23

SVM: Support Vector Machines

SLIDE 24

SVM: Support Vector Machines

Classification technique

SLIDE 25

SVM: Support Vector Machines

Classification technique
Mathematical foundation

SLIDE 26

SVM: Support Vector Machines

Classification technique
Mathematical foundation
  • origin: Statistical Learning Theory

SLIDE 27

SVM: Support Vector Machines

Classification technique
Mathematical foundation
  • origin: Statistical Learning Theory
Bottom line: linear separability by projection to a high-dimensional space
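XOR is the classic illustration of the projection idea. No line w1·x1 + w2·x2 + b separates the four corners of the unit square when opposite corners share a label: the positive corners (0, 1) and (1, 0) require w2 + b > 0 and w1 + b > 0, so w1 + w2 + b > -b, and b < 0 (from the negative corner (0, 0)) then forces w1 + w2 + b > 0, contradicting the negative label of (1, 1). The sketch below assumes the feature map (x1, x2) -> (x1, x2, x1·x2) and a hand-picked plane; an SVM would instead choose the maximum-margin plane, and kernels let it do so without computing the projection explicitly.

```python
# XOR: the four corners of the unit square with alternating labels
points = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [-1, +1, +1, -1]

def lift(x1, x2):
    # Assumed feature map to 3D: append the product x1*x2 as a third coordinate
    return (x1, x2, x1 * x2)

# A hand-picked separating plane w·z + b = 0 in the lifted space
w, b = (1.0, 1.0, -2.0), -0.5

def decide(x1, x2):
    z = lift(x1, x2)
    score = sum(wi * zi for wi, zi in zip(w, z)) + b
    return +1 if score > 0 else -1

print([decide(*p) for p in points] == labels)  # → True
```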

SLIDE 28

Linear separability (in 2D)

SLIDE 29

Linear separability (in 2D)

SLIDE 30

Linear separability (in 2D)

SLIDE 31

Linear separability (in 2D)

SLIDE 32

Linear separability (in 2D)

SLIDE 33

Linear separability (in 2D)

SLIDE 34

Linear inseparability (in 2D)

SLIDE 35

Linear inseparability (in 2D)

SLIDE 36

Linear inseparability (in 2D)

???

SLIDE 37

Linear inseparability (in 2D)

SLIDE 38

Linear inseparability (in 3D)

SLIDE 39

Linear inseparability (in 3D)

SLIDE 40

Remember ...

Goal: characterise an unknown toxic compound by means of gene expression fingerprint(s)

SLIDE 41

Remember ...

Goal: characterise an unknown toxic compound by means of gene expression fingerprint(s)

SLIDE 42

Gene expression

SLIDE 43

Gene expression

SLIDE 44

Gene expression

SLIDE 45

Gene expression

SLIDE 46

Gene expression

SLIDE 47

Microarray technology

SLIDE 48

Inferring transcriptional regulatory networks

microarray data -> causal relations between genes
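As an illustration of the simplest possible approach (not ARACNE, nor necessarily any method used by the lab), a relevance network links genes whose expression profiles are strongly correlated across experiments. Correlation only yields undirected associations, not causal relations, which is why dedicated inference algorithms are needed. The data and the 0.9 threshold are invented.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equally long expression profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    dx, dy = [a - mx for a in x], [b - my for b in y]
    num = sum(a * b for a, b in zip(dx, dy))
    return num / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

def relevance_network(expr, threshold=0.9):
    """Undirected edge between genes whose profiles correlate with |r| >= threshold."""
    genes = sorted(expr)
    edges = set()
    for i, a in enumerate(genes):
        for b in genes[i + 1:]:
            if abs(pearson(expr[a], expr[b])) >= threshold:
                edges.add((a, b))
    return edges

# Invented expression values for three genes over five microarray experiments
expr = {"g1": [1, 2, 3, 4, 5], "g2": [2, 4, 6, 8, 10.5], "g3": [5, 3, 4, 1, 2]}
print(relevance_network(expr))  # → {('g1', 'g2')}
```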

SLIDE 49

A transcriptional regulatory network

SLIDE 50

A transcriptional regulatory network

SLIDE 51

Systems Biology

Systems biology is an academic field that seeks to integrate different levels of information to understand how biological systems function.

SLIDE 52

Systems Biology

Systems biology is an academic field that seeks to integrate different levels of information to understand how biological systems function.

SLIDE 53

Systems Biology

Systems biology is an academic field that seeks to integrate different levels of information to understand how biological systems function.

[Figure label: transcriptome]

SLIDE 54

Systems Biology

Systems biology is an academic field that seeks to integrate different levels of information to understand how biological systems function.

[Figure labels: transcriptome, proteome]

SLIDE 55

Systems Biology

Systems biology is an academic field that seeks to integrate different levels of information to understand how biological systems function.

[Figure labels: transcriptome, proteome, metabolome]

SLIDE 56

SynTReN

  • A generator for synthetic datasets of transcriptional networks and associated expression data
  • Goal: design/test inference algorithms
  • Basic principle:
      • compose a transcriptional regulatory network (TRN) topology
      • impose interactions
      • extract expression levels
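The three-step principle can be sketched as follows. This toy is only loosely inspired by the description above and is not SynTReN's implementation: the topology, the gene names, the saturating activation/inhibition terms and the noise model are all assumptions, and the network must be acyclic.

```python
import random

# Hypothetical toy topology: (regulator, target, sign), +1 = activation, -1 = inhibition
EDGES = [("in1", "g1", +1), ("in1", "g2", -1), ("g1", "g3", +1), ("g2", "g3", +1)]

def topological_order(edges):
    """Step 1 (topology): order genes so every regulator precedes its targets."""
    genes = {g for r, t, _ in edges for g in (r, t)}
    parents = {g: [(r, s) for r, t, s in edges if t == g] for g in genes}
    order, done = [], set()
    while len(order) < len(genes):
        ready = [g for g in sorted(genes - done) if all(r in done for r, _ in parents[g])]
        if not ready:
            raise ValueError("topology contains a cycle")
        order.extend(ready)
        done.update(ready)
    return order, parents

def simulate(edges, inputs, k=1.0, noise_sd=0.0, rng=None):
    """Steps 2-3: impose interactions, then read off one expression profile."""
    rng = rng if rng is not None else random.Random()
    order, parents = topological_order(edges)
    level = {}
    for g in order:
        if not parents[g]:
            level[g] = inputs[g]  # input genes are set per experiment
            continue
        value = 1.0
        for r, s in parents[g]:
            x = level[r]
            # Assumed saturating kinetics: x/(k+x) for activation, k/(k+x) for inhibition
            value *= x / (k + x) if s > 0 else k / (k + x)
        # Optional Gaussian noise, clipped so expression stays non-negative
        level[g] = max(0.0, value + rng.gauss(0.0, noise_sd))
    return level

print(simulate(EDGES, {"in1": 1.0}))  # g1 = g2 = 0.5, g3 = (0.5/1.5)**2

# A small synthetic dataset: one experiment per random input level
rng = random.Random(1)
dataset = [simulate(EDGES, {"in1": rng.uniform(0.1, 5.0)}, noise_sd=0.05, rng=rng)
           for _ in range(10)]
```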

SLIDE 57

SynTReN application

[Diagram: a single experiment run]
  • SynTReN: (module) network topology + interactions -> synthetic gene network -> synthetic expression data
      • topology type: E.coli, Yeast, AB, ER, SW, DSF
      • interactions: interaction type, ratio activation/inhibition
      • noise: biological noise, experimental noise, input noise
      • amount of data: # experiments, # samples
  • Inference (Aracne, SAMBA, Genomica) -> inferred adjacency matrix
  • Compare with the original adjacency matrix -> calculate performance metric
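Comparing the inferred adjacency matrix with the original one reduces to counting edge agreements, from which sensitivity and specificity follow. A sketch, assuming directed 0/1 adjacency matrices in the same gene order and ignoring self-loops; the exact counting conventions of the study may differ.

```python
def compare(original, inferred):
    """Sensitivity and specificity of an inferred 0/1 adjacency matrix."""
    tp = fp = tn = fn = 0
    n = len(original)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # ignore self-loops
            truth, guess = original[i][j], inferred[i][j]
            if truth and guess:
                tp += 1
            elif guess:
                fp += 1
            elif truth:
                fn += 1
            else:
                tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # fraction of true edges recovered
    specificity = tn / (tn + fp) if tn + fp else 0.0  # fraction of non-edges left out
    return sensitivity, specificity

# Invented 3-gene example: one edge found, one missed, one spurious
original = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
inferred = [[0, 1, 1], [0, 0, 0], [0, 0, 0]]
print(compare(original, inferred))  # → (0.5, 0.75)
```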

SLIDE 58

Inference Algorithm shootout

Explore impact of parameters on inference quality

[Plot: sensitivity and specificity (0.0 to 1.0) for aracne, interaction types: linear; one series per graph model: AB, DSF, ecoli_cluster, ecoli_neighbor, ER, WS, yeast_cluster, yeast_neighbor]

SLIDE 59

More info on SynTReN:

“SynTReN: a generator of synthetic gene expression data for design and analysis of structure learning algorithms”

Tim Van den Bulcke, Koenraad Van Leemput, Bart Naudts, Piet van Remortel, Hongwu Ma, Alain Verschoren, Bart De Moor and Kathleen Marchal. BMC Bioinformatics 2006, 7:43. http://homes.esat.kuleuven.be/~kmarchal/SynTReN/index.html

SLIDE 60

Graph (De)composition

SLIDE 61

Purpose

  • Use prior information on graph properties when searching for graphs (regulatory networks!)
  • Convert a graph to a sequence of symbols
  • Characterize graph families
  • Distribution on graph space

SLIDE 62

Graph Statistics

  • Global: depend on all nodes in the graph, e.g. average path length
  • Local: depend on local structure, e.g. degree
  • Look for an intermediate description: subgraphs + interconnections
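Both kinds of statistic are easy to compute on an adjacency list. In this sketch the degree is local, while the average shortest-path length (via breadth-first search over all reachable ordered pairs) is global; the star and path graphs are invented examples with the same number of nodes and edges but different statistics.

```python
from collections import deque

def degrees(adj):
    """Local statistic: each node's degree depends only on its own neighbours."""
    return {v: len(nbrs) for v, nbrs in adj.items()}

def average_path_length(adj):
    """Global statistic: mean shortest-path length over all reachable ordered pairs."""
    total, pairs = 0, 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:                      # breadth-first search from src
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1            # exclude src itself
    return total / pairs if pairs else 0.0

# Two invented 4-node graphs with 3 edges each
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(degrees(star), average_path_length(star))  # → {0: 3, 1: 1, 2: 1, 3: 1} 1.5
print(degrees(path), average_path_length(path))
```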

SLIDE 63

Motifs

Small frequent subgraphs, also known as significant subgraphs or graphlets

[Figure: example motifs labelled 4C, 3C and R, with vertices A, B, C, D]

SLIDE 64

Searching for motifs in a graph means mapping them onto existing vertices
  • Vertices can overlap -> connections between motifs
  • Edges do not overlap -> every edge is explained by only one motif

[Figure: graph with vertices V0 to V8; labels: start vertex, motif search space, unexplained graph]
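A greedy sketch of the mapping rule above, assuming a hypothetical motif set of triangles plus single edges as a fallback: vertices may be shared between motifs, but each edge is explained by exactly one motif. The lexicographic scan order is an arbitrary choice, and a different order can give a different decomposition.

```python
from itertools import combinations

def decompose(edges):
    """Greedy edge-disjoint decomposition into triangles, then single edges."""
    unexplained = {frozenset(e) for e in edges}
    vertices = sorted({v for e in edges for v in e})
    motifs = []
    for a, b, c in combinations(vertices, 3):
        tri = {frozenset((a, b)), frozenset((b, c)), frozenset((a, c))}
        if tri <= unexplained:          # all three edges are still unexplained
            motifs.append(("triangle", (a, b, c)))
            unexplained -= tri          # each edge is explained by one motif only
    # Whatever is left is covered by single-edge motifs
    for e in sorted(tuple(sorted(pair)) for pair in unexplained):
        motifs.append(("edge", e))
    return motifs

# Invented graph: two triangles sharing vertex 2, plus a pendant edge
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (2, 4), (4, 5)]
print(decompose(edges))
# → [('triangle', (0, 1, 2)), ('triangle', (2, 3, 4)), ('edge', (4, 5))]
```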

SLIDE 65

New definition of ‘Motif’

  • subgraph structure
  • some vertices marked as attaching
  • probabilistic information about connections between motifs

SLIDE 66

Motif Set

  • Becomes the alphabet of the sequences
  • Combinations of motifs create graphs
  • Graphs can be decomposed into motifs attached to each other

[Figure: example motifs labelled 4C, 3C and R, with vertices A, B, C, D]

SLIDE 67

Probability Distributions

  • Motif set prior: which motifs are preferred?
  • Preferential attachment: to which other motifs does a motif like to attach?
  • Sequence distance rule: how far away in the sequence will a motif attach?

SLIDE 68

Motif Set Prior

  • Empirical distribution
  • ‘Glue’ motifs are needed to make sure every edge in the graph can be explained during decomposition -> give these motifs low priority
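A sketch of such a prior, assuming it is estimated from how often each motif occurred when decomposing example graphs, with glue motifs given a small fixed weight instead of their count. The motif names, counts and the 0.01 weight are hypothetical.

```python
def motif_prior(counts, glue, glue_weight=0.01):
    """Empirical prior over the motif set; glue motifs get a small fixed weight."""
    weights = {m: (glue_weight if m in glue else float(c)) for m, c in counts.items()}
    total = sum(weights.values())
    return {m: w / total for m, w in weights.items()}

# Invented counts of how often each motif occurred when decomposing example graphs
counts = {"triangle": 6, "4-cycle": 3, "edge": 40}
prior = motif_prior(counts, glue={"edge"})
print(prior)
```

Although the single-edge glue motif occurs most often, it ends up with the lowest prior, so it is only chosen when no other motif explains the remaining edges.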

SLIDE 69

Preferential Attachment

  • For each attaching vertex: to which other motifs in the motif set it attaches

SLIDE 70

Sequence Distance Rule

  • For example, a geometric distribution
  • To avoid creating ‘string-of-beads’ type graphs, some vertices need to attach further up the sequence

[Plot: probability vs. distance]

SLIDE 71

Example Composition

  • Motif with just 1 edge
  • With a different sequence distance rule and attaching vertex, we can create stars, hubs or chains

[Figure: example graphs (a), (b), (c) on vertices V0 to V10]

SLIDE 72

By varying the parameter p of the geometric distribution, we can create trees with a varying number/length of branches
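The effect of p can be sketched with single-edge motifs. Assumptions (hypothetical, not from the slides): motif k adds a new vertex k + 1 and attaches it to the new vertex of the motif d positions earlier in the sequence, with d drawn from a geometric distribution P(d) = (1 - p)^(d-1)·p. With p = 1 every motif attaches to its predecessor and the result is a chain; smaller p attaches further up the sequence and gives bushier, more star-like trees, measured here by the number of leaves.

```python
import math
import random
from collections import Counter

def geometric(p, rng):
    """Sequence distance d in {1, 2, ...} with P(d) = (1 - p)**(d - 1) * p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    if p == 1.0:
        return 1
    u = 1.0 - rng.random()                 # uniform in (0, 1]
    return int(math.log(u) / math.log(1.0 - p)) + 1

def compose(n_motifs, p, rng):
    """Compose a tree from single-edge motifs; motif k adds the new vertex k + 1."""
    edges = [(0, 1)]
    for k in range(1, n_motifs):
        d = geometric(p, rng)
        # Attach to the new vertex of motif k - d (vertex 0 if that runs past the start)
        edges.append((max(0, k + 1 - d), k + 1))
    return edges

def leaves(edges):
    degree = Counter(v for e in edges for v in e)
    return sum(1 for deg in degree.values() if deg == 1)

rng = random.Random(0)
for p in (1.0, 0.9, 0.5, 0.1):
    mean = sum(leaves(compose(50, p, rng)) for _ in range(200)) / 200
    print(p, mean)
```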

SLIDE 73

Decomposition

  • Depends on the starting point
  • Need a way to choose the next motif in the sequence from multiple candidates
      • -> evaluate the likelihood using the probability distributions
  • The product of the probabilities gives the likelihood of the decomposition, given the motif set
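Scoring a candidate decomposition then amounts to summing negative log-probabilities, which is equivalent to taking -log of the product but avoids numerical underflow when many probabilities are multiplied. A sketch with hypothetical distributions; as a simplification, attachment preferences are given per motif type rather than per attaching vertex.

```python
import math

def neg_log_likelihood(steps, prior, attach, p):
    """-log P of a decomposition given the motif set's probability distributions.

    steps: list of (motif, target_motif, distance); target_motif and distance are
    None for the first motif, which does not attach to anything.
    """
    nll = 0.0
    for motif, target, dist in steps:
        nll -= math.log(prior[motif])                        # motif set prior
        if target is not None:
            nll -= math.log(attach[motif][target])           # preferential attachment
            nll -= math.log((1.0 - p) ** (dist - 1) * p)     # geometric distance rule
    return nll

# Hypothetical distributions and two candidate decompositions
prior = {"triangle": 0.7, "edge": 0.3}
attach = {"triangle": {"triangle": 0.8, "edge": 0.2}, "edge": {"triangle": 0.5, "edge": 0.5}}
a = [("triangle", None, None), ("triangle", "triangle", 1)]
b = [("triangle", None, None), ("edge", "triangle", 1), ("edge", "edge", 1), ("edge", "edge", 2)]
print(neg_log_likelihood(a, prior, attach, 0.5), neg_log_likelihood(b, prior, attach, 0.5))
```

The candidate with the lower -log(P) is the more likely one under the motif set.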

SLIDE 74

Example: use the motif set that generates chain-like trees. When decomposing branched trees, their likelihood is much worse (higher -log(P))

[Histogram of decomposition likelihood: frequency vs. -log(P), for chain-like trees and highly-branched trees]

SLIDE 75

Graph (de)comp conclusion

  • A system that can decompose graphs into a sequence of motifs, and vice versa, using probabilistic information
  • A given sequence generates a family of graphs

SLIDE 76

What we look for ...

Projects regarding applications of machine learning (prediction, classification, optimization, ...)
  • domains:
      • biology / bio-informatics / systems biology
      • finance (stock rate time series, etc.)
      • planning, engineering, ...
Contact: piet.vanremortel@ua.ac.be
