

SLIDE 1

Learning Hierarchical Bayesian Networks for Genome-Wide Association Studies

Raphaël Mourad (1), Christine Sinoquet (2) and Philippe Leray (1)

KOD team (KnOwledge and Decision),

(1) LINA, UMR CNRS 6241, Ecole Polytechnique de l'Université de Nantes.
(2) LINA, UMR CNRS 6241, Université de Nantes.

FRANCE

Presented by Raphaël Mourad, PhD student in Bioinformatics, raphael.mourad@univ-nantes.fr

SLIDE 2

COMPSTAT 2010 2

Mourad R. et al : Learning Hierarchical Bayesian Networks for GWAS

Outline

1/ Introduction
2/ Fundamental concept of association genetics
3/ Presentation of genetic data
4/ Our approach
5/ Results and discussion
6/ Conclusion and outlook

SLIDE 3

Introduction

SLIDE 4

  • Context:

Complex genetic diseases = multifactorial diseases caused by a combination of genetic factors (e.g. genes) and environmental or individual factors (e.g. sex, age...). Examples: diabetes, asthma, hypertension, some cancers...

SLIDE 5

  • Dissect the genetic basis of these diseases:

Genome-wide association studies (GWAS) → identification of genetic markers associated with common, complex diseases.

[Figure: a chromosome with markers placed along its length.]

Hundreds of thousands of markers cover the variability of the human genome.

SLIDE 6

Fundamental concept of association genetics

SLIDE 7

  • Linkage disequilibrium (LD):

→ dependences generally observed between close SNPs on the chromosome,
→ at the basis of GWAS.

[Figure: a chromosome with two markers and a causal mutation, with LD indicated between them. Caption: LD between markers and their surrounding area on the chromosome.]
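
To make the notion of LD concrete, here is a minimal sketch of two standard LD measures between two biallelic loci: the coefficient D and the squared correlation r². It assumes phased haplotypes coded as 0/1 alleles, which is a simplification (the data used later in the talk are unphased genotypes). The function name `ld_r2` and the toy samples are illustrative only and do not come from the slides.

```python
def ld_r2(haplotypes):
    """Return (D, r2) for two biallelic loci.

    `haplotypes` is a list of (a, b) pairs, where a and b are the 0/1
    alleles carried at the first and second locus (phased data assumed).
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n          # frequency of allele 1 at locus A
    p_b = sum(b for _, b in haplotypes) / n          # frequency of allele 1 at locus B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                             # deviation from independence
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, r2


# Toy samples (hypothetical): perfectly linked loci vs. independent loci.
linked = [(1, 1)] * 50 + [(0, 0)] * 50
independent = [(1, 1)] * 25 + [(1, 0)] * 25 + [(0, 1)] * 25 + [(0, 0)] * 25

print(ld_r2(linked))
print(ld_r2(independent))
```

An r² close to 1 means that one locus almost determines the other, which is why a genotyped marker can stand in for a nearby causal mutation that was not genotyped.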

SLIDE 8

Presentation of genetic data

SLIDE 9

DNA: > 100k SNPs (ternary variables)

Phenotype: 1 binary variable:

  • 1000 non-affected individuals
  • 1000 affected individuals

Characteristics:

→ large number of genetic variables (SNPs): combinatorial explosion,
→ strong dependences among genetic variables.
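
The data layout and the combinatorial explosion can be illustrated with a small sketch. The coding of each SNP as 0, 1 or 2 is an assumption (a common convention counting copies of one allele); the slide only states that SNPs are ternary. The toy SNP count and the random genotypes are placeholders, not real or simulated genetic data.

```python
import math
import random

random.seed(0)

n_individuals = 2000
n_snps_toy = 5            # toy value; the slide mentions more than 100k SNPs

# Phenotype: one binary variable, 1000 non-affected (0) and 1000 affected (1).
phenotype = [0] * 1000 + [1] * 1000

# Genotypes: one ternary value per individual and per SNP (assumed 0/1/2 coding).
genotypes = [[random.choice((0, 1, 2)) for _ in range(n_snps_toy)]
             for _ in range(n_individuals)]

# Combinatorial explosion: number of SNP subsets to examine at genome scale.
n_snps = 100_000
pairs = math.comb(n_snps, 2)
triples = math.comb(n_snps, 3)
print(f"{pairs:,} SNP pairs, {triples:,} SNP triples")
```

With 100,000 SNPs there are already about five billion pairs, so exhaustively testing combinations of SNPs quickly becomes impractical.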

SLIDE 10

Our approach

SLIDE 11

  • Reduce the data dimension by synthesizing the information of highly dependent SNPs, due to LD.

[Figure: cliques of highly dependent SNPs, each linked to a latent variable (LV). The latent variables synthesize the information of the SNP cliques, which reduces the data dimension.]

SLIDE 12

  • Provide a flexible and adapted probabilistic model to reduce dimension for genetic data.

Characteristics of data: dependences by blocks of SNPs.

Proposed modelling: Forest of Hierarchical Latent Class models (FHLCMs).

[Figure: an FHLCM drawn along the genome sequence, showing latent variables and observed variables (SNPs).]
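
As a reminder of what such a model encodes, the standard factorizations can be written out. The notation is an assumption made for this note and does not appear on the slides: H is a latent variable, X_1, ..., X_n are its observed children, V is the set of all variables (latent and observed) and pa(v) denotes the parents of v in the graph.

```latex
% Latent class model: one latent variable H with observed children X_1..X_n
P(x_1, \dots, x_n) = \sum_{h} P(h) \prod_{i=1}^{n} P(x_i \mid h)

% Forest seen as a Bayesian network over all variables V
P(V) = \prod_{v \in V} P\bigl(v \mid \mathrm{pa}(v)\bigr)
```

In the second expression the roots of the trees have no parents, so variables belonging to different trees of the forest are independent of one another.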

SLIDE 13

  • Advantages of this modelling:

→ hierarchical, thus :

  • various degrees of dimension reduction,
  • various degrees of LD strength,

→ each latent variable can reveal multiple-SNP patterns, potentially relevant to explain the disease,
→ unlike in a single Hierarchical Latent Class model, SNPs are not constrained to be dependent upon one another,
→ high-order interactions between SNPs can be taken into account.

SLIDE 14

  • Proposed algorithm to learn both the parameters and the structure of FHLCMs from data: CFHLC (Construction of Forests of Hierarchical Latent Class models).

→ based on an agglomerative hierarchical procedure to ensure scalability,
→ uses clique partitioning methods for an efficient discovery of non-overlapping cliques of dependent SNPs,
→ unlike Hwang et al.'s algorithm, not restricted to binary variables and binary trees.
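
The general shape of such a bottom-up procedure can be sketched as follows. This is a simplified illustration, not the CFHLC implementation: the dependence test (empirical mutual information above a hypothetical `threshold`), the greedy clique partitioning and the `summarize` stand-in for a latent variable are all assumptions made for this note. In particular, the slides describe latent class models with learned parameters, which this sketch replaces with a lookup of joint configurations.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Empirical mutual information (in nats) between two discrete columns."""
    n = len(x)
    cx, cy, cxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum((c / n) * math.log(c * n / (cx[a] * cy[b]))
               for (a, b), c in cxy.items())


def greedy_clique_partition(n_vars, dependent):
    """Split variables 0..n_vars-1 into non-overlapping cliques of the dependence graph."""
    unassigned = list(range(n_vars))
    cliques = []
    while unassigned:
        clique = [unassigned.pop(0)]
        for v in list(unassigned):
            # v joins the clique only if it is dependent on every current member
            if all(dependent(v, u) for u in clique):
                clique.append(v)
                unassigned.remove(v)
        cliques.append(clique)
    return cliques


def summarize(columns):
    """Stand-in latent variable: one state per distinct joint configuration (assumption)."""
    states = {}
    return [states.setdefault(cfg, len(states)) for cfg in zip(*columns)]


def build_forest(columns, threshold=0.1):
    """Agglomerate variables layer by layer; return (parent map, root node ids)."""
    nodes = dict(enumerate(columns))     # node id -> data column
    current = list(nodes)                # node ids in the current layer
    parent = {}
    while True:
        def dependent(i, j):
            return mutual_information(nodes[current[i]], nodes[current[j]]) > threshold

        cliques = greedy_clique_partition(len(current), dependent)
        if all(len(c) == 1 for c in cliques):
            return parent, current       # nothing left to merge: current holds the roots
        next_layer = []
        for clique in cliques:
            ids = [current[i] for i in clique]
            if len(ids) == 1:
                next_layer.append(ids[0])
                continue
            new_id = len(nodes)          # create one latent node per clique
            nodes[new_id] = summarize([nodes[i] for i in ids])
            for i in ids:
                parent[i] = new_id
            next_layer.append(new_id)
        current = next_layer


# Toy data (hypothetical): snp0 and snp1 are identical, snp2 is independent of both.
snp0 = [0, 1, 2] * 4
snp1 = list(snp0)
snp2 = [0, 0, 0, 1, 1, 1, 2, 2, 2, 0, 0, 0]
parent, roots = build_forest([snp0, snp1, snp2])
print(parent, roots)
```

On the toy data the two identical SNPs are grouped under one new latent node, while the independent SNP stays a root of its own tree, so the result is a forest rather than a single tree.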

SLIDE 15

Schematic of the algorithm:

SLIDE 16

Results and discussion

SLIDE 17

  • Test protocol:

→ C++ implementation,
→ run on a standard PC (3.8 GHz, 3.3 GB RAM),
→ tested on simulated unphased genotypic data consisting of 2000 individuals and 1k, 10k or 100k SNPs, generated with the software Hapsimu.

SLIDE 18

Scalability

SLIDE 19

Visual display of an FHLCM: 100-SNP sequence

[Figure: FHLCM graph showing latent variables and observed variables (SNPs), with annotations for high dependence regions, low dependence regions and high-order dependences.]

SLIDE 20

Conclusion and outlook

SLIDE 21

Conclusion:

  • The CFHLC algorithm has been shown to be efficient on genome-scale data,
  • It can provide a data dimension reduction of 80%.

Perspectives:

  • Application to the detection of genetic associations, using the FHLCM's latent variables,
  • Visualization of LD structure through the FHLCM's graph.
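
One way to read the 80% figure is as the fraction of variables removed when the observed SNPs are replaced by a smaller set of summarizing variables. The slides do not state how the reduction is measured, so this reading, the function name `dimension_reduction` and the counts below are assumptions chosen only to illustrate the arithmetic.

```python
def dimension_reduction(n_observed, n_retained):
    """Fraction of variables removed when n_observed variables are summarized by n_retained."""
    return 1 - n_retained / n_observed


# Hypothetical counts: 100,000 SNPs summarized by 20,000 variables.
reduction = dimension_reduction(100_000, 20_000)
print(f"{reduction:.0%}")
```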

SLIDE 22

Thanks for your attention

SLIDE 23

SLIDE 24

Questions

SLIDE 25

Impact of window size on running time

SLIDE 26

Impact of window size on dimension reduction

SLIDE 27

Bibliography

General references on GWAS:

  • Balding D. (2006): A tutorial on statistical methods for population association studies.

Specific to probabilistic graphical models:

  • Verzilli (2007): Bayesian graphical models for genome-wide association studies.
  • Hwang (2006): Learning hierarchical Bayesian networks for large-scale data analysis.