Genome Wide Haplotype analyses of human complex diseases with the EGEE grid


Enabling Grids for E-sciencE. Genome Wide Haplotype analyses of human complex diseases with the EGEE grid. Tregouet David, david.tregouet@upmc.fr, INSERM UMRS937


SLIDE 1

Enabling Grids for E-sciencE

Genome Wide Haplotype analyses of human complex diseases with the EGEE grid

Tregouet David – david.tregouet@upmc.fr INSERM UMRS937 – UPMC – Paris - France

EGEE-III INFSO-RI-222667

www.eu-egee.org

EGEE and gLite are registered trademarks

SLIDE 2

Genome Wide Association Studies (GWAS)

  • Principle

Testing the association between a large number (~500K) of single nucleotide polymorphisms (SNPs) and a variable of interest (e.g. a disease) in a large cohort of individuals

  • How?

Estimate the SNP allele frequencies in cases and controls and calculate the corresponding statistical test, yielding a p-value

  • SNP definition

Genetic variation in a DNA sequence that occurs when a single nucleotide (~ base: A, C, G, T) in a genome is altered. Often considered as a binary 0/1 variable
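The "How?" step can be sketched for a single SNP coded 0/1. This is a minimal illustration, assuming a Pearson chi-square test on the 2x2 table of allele counts; the slides do not say which statistical test was used, and the function name `allele_chi2` and the counts in the usage line are invented for the example.

```python
import math


def allele_chi2(case_alt, case_total, ctrl_alt, ctrl_total):
    """Pearson chi-square (1 df) on allele counts in cases vs controls.

    case_alt / ctrl_alt: number of chromosomes carrying allele '1'.
    case_total / ctrl_total: number of chromosomes (2 per individual).
    Returns (chi2, p_value).
    """
    a, b = case_alt, case_total - case_alt
    c, d = ctrl_alt, ctrl_total - ctrl_alt
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # Degenerate table (an empty row or column): no evidence either way.
        return 0.0, 1.0
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Survival function of a chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: allele '1' on 60 of 100 case chromosomes
# and on 40 of 100 control chromosomes.
chi2, p_value = allele_chi2(60, 100, 40, 100)
```

In a GWAS this calculation is repeated for each of the ~500K SNPs, so the p-value threshold has to account for the number of tests.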

SLIDE 3

GWAS' main limits

  • Only single SNP associations are tested
  • May miss 'haplotypic' interaction between SNPs located in the same gene (or region)

– Haplotype: combination of alleles on a given chromosome
– For example, with 2 SNPs (C/T & G/A) → 4 haplotypes: C G, C A, T G, T A

One may want to test for difference in haplotype frequencies between cases and controls

It may happen that only one haplotype is at risk
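A small sketch of the haplotype comparison described above, testing each of the 4 haplotypes against the other three. The counts are invented for illustration, and the sketch assumes the phase of each chromosome is known; with real genotype data, haplotypes usually have to be inferred statistically first. The function name `haplotype_vs_rest` is hypothetical.

```python
import math

# Invented haplotype counts (number of chromosomes), phase assumed known.
cases = {"CG": 300, "CA": 200, "TG": 200, "TA": 300}
controls = {"CG": 300, "CA": 250, "TG": 250, "TA": 200}


def haplotype_vs_rest(hap, cases, controls):
    """Compare one haplotype against all others pooled (chi-square, 1 df).

    Returns (frequency in cases, frequency in controls, chi2, p_value).
    """
    a = cases[hap]
    b = sum(cases.values()) - a
    c = controls[hap]
    d = sum(controls.values()) - c
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return a / (a + b), c / (c + d), chi2, p_value


results = {hap: haplotype_vs_rest(hap, cases, controls) for hap in cases}
for hap, (f_case, f_ctrl, chi2, p_value) in results.items():
    print(f"{hap}: cases={f_case:.2f} controls={f_ctrl:.2f} "
          f"chi2={chi2:.2f} p={p_value:.2g}")
```

In this toy data the C G haplotype has the same frequency in both groups, while T A is enriched in cases, which is the "only one haplotype is at risk" situation.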

SLIDE 4

Genome Wide Haplotype Analysis (GWHAS)

  • Is it possible?

2 SNPs: up to 4 haplotypes (i.e. 00|01|10|11)
3 SNPs: up to 8 haplotypes (i.e. 000|001|010|011|100|101|110|111)
In a window (e.g. a gene or a region) of n SNPs, up to 2^n haplotypes

  • Yes... but

a large number of tests / comparisons have to be carried out to identify which combination of SNPs is the best predictor for the disease
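The 2^n growth can be checked by enumerating the possible haplotypes of n binary SNPs. This is only an illustrative sketch; the helper name `haplotypes` is not from the slides.

```python
from itertools import product


def haplotypes(n_snps):
    """List every possible haplotype of n_snps binary (0/1) SNPs."""
    return ["".join(alleles) for alleles in product("01", repeat=n_snps)]


print(haplotypes(2))        # ['00', '01', '10', '11']
print(len(haplotypes(3)))   # 8
print(len(haplotypes(10)))  # 1024
```

These are upper bounds: in real data, only some of the possible haplotypes are usually observed.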

SLIDE 5

Genome Wide Haplotype Analysis (GWHAS)

  • Is it possible?

2 SNPs: up to 4 haplotypes (i.e. 00|01|10|11)
3 SNPs: up to 8 haplotypes (i.e. 000|001|010|011|100|101|110|111)
In a window (e.g. a gene or a region) of n SNPs, up to 2^n haplotypes

Example: in a window of 10 adjacent SNPs, restricting the haplotypes to a length of at most 4 SNPs leads to 375 combinations to be tested:

[SNP1 + SNP2] [SNP1 + SNP3] ... [SNP1 + SNP10] [SNP2 + SNP3] ... [SNP2 + SNP10] ... [SNP9 + SNP10]
[SNP1 + SNP2 + SNP3] [SNP1 + SNP2 + SNP4] ... [SNP1 + SNP9 + SNP10] [SNP2 + SNP3 + SNP4] ... [SNP3 + SNP6 + SNP8] ... [SNP8 + SNP9 + SNP10]
[SNP1 + SNP2 + SNP3 + SNP4] ... [SNP1 + SNP6 + SNP7 + SNP10] ... [SNP7 + SNP8 + SNP9 + SNP10]
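The figure of 375 is the number of ways to choose 2, 3 or 4 SNPs out of 10: C(10,2) + C(10,3) + C(10,4) = 45 + 120 + 210. A short sketch that enumerates them (the function name `snp_combinations` is hypothetical, and `math.comb` requires Python 3.8 or later):

```python
from itertools import combinations
from math import comb


def snp_combinations(window, min_len=2, max_len=4):
    """Yield every subset of the window containing min_len..max_len SNPs."""
    for size in range(min_len, max_len + 1):
        yield from combinations(window, size)


window = [f"SNP{i}" for i in range(1, 11)]
combos = list(snp_combinations(window))

print(len(combos))                           # 375
print(sum(comb(10, k) for k in (2, 3, 4)))   # 375
```

Each combination is then analysed in terms of its own haplotypes, which is what makes the overall calculation so heavy.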

SLIDE 6

Genome Wide Haplotype Analysis (GWHAS)

  • GWHAS are possible but are extremely computationally demanding!

  • Distribution of the haplotypic calculations on EGEE

– Development of an easy gLite interface
– Python & Perl scripts for results visualization
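Because each window can be analysed independently, the work can be split into batches, one per grid job. The sketch below only shows that partitioning step, under the assumption that windows are identified by an index; it does not use or imitate the gLite interface mentioned above, and `chunk_windows` and its parameters are hypothetical names.

```python
def chunk_windows(n_windows, windows_per_job):
    """Split window indices 0..n_windows-1 into contiguous batches.

    Each returned range is the set of windows one independent job would process.
    """
    if windows_per_job < 1:
        raise ValueError("windows_per_job must be at least 1")
    return [
        range(start, min(start + windows_per_job, n_windows))
        for start in range(0, n_windows, windows_per_job)
    ]


jobs = chunk_windows(n_windows=25, windows_per_job=10)
for job_id, batch in enumerate(jobs):
    print(job_id, batch.start, batch.stop - 1)
```

The batch size is a trade-off: small batches give more parallelism, larger ones reduce the per-job submission overhead.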

SLIDE 7

GWHAS on Coronary Artery Disease (CAD)

  • WTCCC data: 1926 CAD patients & 2938 healthy controls
  • 378,000 SNPs

  • Sliding windows approach on each chromosome

Windows of size 10: 1 to 10, 2 to 11, 3 to 12, ..., (n-9) to n
Haplotypes composed of up to 4 SNPs

  • Search for regions where haplotypes are stronger predictors of CAD risk than SNPs alone
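The sliding-window scan can be sketched as a generator of window boundaries along one chromosome. With windows of 10 SNPs and n SNPs on the chromosome, there are n - 9 windows and the last full one starts at SNP n - 9. The function name `sliding_windows` is an illustrative choice, not taken from the slides.

```python
def sliding_windows(n_snps, size=10):
    """Yield (first, last) 1-based inclusive SNP positions of each window."""
    for first in range(1, n_snps - size + 2):
        yield first, first + size - 1


print(list(sliding_windows(12)))
# [(1, 10), (2, 11), (3, 12)]
```

A chromosome with fewer SNPs than the window size yields no window in this sketch.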

SLIDE 8

GWHAS on Coronary Artery Disease

  • 8.1 million combinations tested in less than 45 days (instead of more than 10 years on a single Pentium 4)
  • 29 regions where haplotypes could be better predictors than SNPs alone were identified
  • To control for false positives, replication was investigated in about 7000 CAD patients and 7000 controls
  • One region on chromosome 6 was confirmed
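As a rough check of the timing figures quoted above, "more than 10 years" against "less than 45 days" implies a speed-up of at least about 80 times. The arithmetic below uses only the numbers on this slide and assumes 365-day years; the variable names are illustrative.

```python
single_cpu_days = 10 * 365      # "more than 10 years" on a single Pentium 4
grid_days = 45                  # "less than 45 days" on the grid
combinations_tested = 8.1e6     # "8.1 million combinations"

min_speedup = single_cpu_days / grid_days
min_combinations_per_day = combinations_tested / grid_days

print(f"speed-up of at least {min_speedup:.0f}x")
print(f"at least {min_combinations_per_day:,.0f} combinations per day")
```

Both values are lower bounds, since the slide gives only "more than" and "less than" figures.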

SLIDE 9

Nature Genetics doi:10.1038/ng.314

SLIDE 10

Conclusions

  • Genome Wide Haplotype Association Studies are now a reality thanks to the use of Grid technology
  • Using EGEE, we were able to identify a cluster of 3 genes where haplotypes are strongly associated with CAD risk (Tregouet et al. Nature Genetics March 2009)
  • Possibility to apply such a tool to other human diseases (Diabetes, Cancer...)
  • Possibility to use EGEE to investigate interactions between SNPs that are not necessarily in the same gene/region

SLIDE 11

Credits

François Cambien

UMRS 937

Alexandru Munteanu, Laurence Tiret, Claire Perret, Nilesh Samani, Heribert Schunkert, Inke König, Jeannette Erdmann, Andreas Ziegler, ....

UMR 8623

LRI

Cécile Germain
