http://www.genomeinterpretation.org/ Organizers Steven E. Brenner



SLIDE 1

CAGI@AIMM: Update on Community Experiment on Genome Interpretation

Silvio Tosatto

BioComputing UP, Department of Biology, University of Padova, Italy URL: http://protein.bio.unipd.it/

SLIDE 2
SLIDE 3

http://www.genomeinterpretation.org/

SLIDE 4

Organizers

  • Steven E. Brenner, University of California, Berkeley
  • John Moult, IBBR, University of Maryland
  • Susanna Repo, University of California, Berkeley


SLIDE 5

Critical Assessment of Genome Interpretation

CASP-like effort for human genome variation interpretation

[Figure: interpretation at the molecular, cellular, and organismal levels]

SLIDE 6

Goals of the CAGI experiment

  • Determine the state of the art
  • Identify progress and innovations
  • Reveal bottlenecks and guide future effort
  • Highlight new challenges
  • Collaboratively develop new approaches
SLIDE 7

CAGI 2011 experiment

SLIDE 8

  • 2011: 11 challenges, 114 submissions in total, ~160 registered on the website
  • 2010: 6 challenges, 108 submissions in total, ~60 registered on the website

SLIDE 9

  • CAGI 2011: 21 participating groups
  • CAGI 2010: 17 participating groups

[Figure: CAGI 2011 participating groups, marking those that participated in both 2011 and 2010]

SLIDE 10

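Cystathionine β-Synthase (CBS) single amino acid mutations

[Figure: CBS, with its PLP cofactor, converts homocysteine and serine to cystathionine; cystathionase converts cystathionine to cysteine]

CBS variants are associated with homocystinuria. Treatment: high doses of vitamin B6 (PLP is the active form of vitamin B6).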

SLIDE 11

Cystathionine β‐Synthase (CBS) single amino acid mutations

Substituted residue: growth rate at 400 ng/ml PLP

  • D140N: 103 ± 25
  • A207G
  • N225S: 70 ± 12
  • I264T: 109 ± 14
  • W323G
  • A357G: 104 ± 19

A total of 84 mutations were assessed experimentally.

Dataset provided by Jasper Rine, University of California, Berkeley
Assessed by Iddo Friedberg, Miami University

SLIDE 12

Probability of observing the experimental value

[Figure: predictions; experimental vs. predicted relative growth rate]

SLIDE 13

Probability of observing the experimental value

[Figure: experimental and predicted relative growth rate]

SLIDE 14

CBS Challenge – Spearman’s rank correlation
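The CBS challenge was scored with Spearman's rank correlation between predicted and experimental relative growth rates. A minimal sketch in plain Python, assuming the usual definition (Pearson correlation of the two rank vectors, with tied values sharing their average rank; `scipy.stats.spearmanr` computes the same statistic). The experimental values are four growth rates from the CBS table; the predicted values are hypothetical.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Experimental growth rates for D140N, N225S, I264T, A357G (CBS table);
# the predictions are invented for illustration only.
experimental = [103, 70, 109, 104]
predicted = [95, 60, 100, 110]
print(round(spearman(experimental, predicted), 3))  # 0.8
```

Because only the ordering enters the statistic, a method that ranks the variants correctly but is off in absolute scale still scores well.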

SLIDE 15
SLIDE 16

p53 core domain mutations that restore activity of inactive p53

Dataset provided by Rick Lathrop and the p53 “cancer rescue” team, University of California, Irvine
Assessed by Gad Getz, Broad Institute

p53 cancer mutant / rescue mutation:
  • G245S / N239F
  • G245S / F113L
  • G245S / S240Y
  • G245S / T123P
  • G245S / N239Y

Baronio R et al. Nucl. Acids Res. 2010; nar.gkq571

14,668 variations to predict

SLIDE 17

Comparing predictions to ground truth

  • 1: Yana Bromberg Lab
  • 2: Yana Bromberg
  • 3: Yana Bromberg
  • 4: SWITCH Lab, Greet De Baets
  • 5: Rita Casadio Lab
  • 6: George Shackelford Lab
  • 7: Sean Mooney Lab
  • 8: Sean Mooney Lab

SLIDE 18

ROC curves for submissions

[Figure: ROC curves for each submission; plot label M237I]

  • 1: Yana Bromberg Lab
  • 2: Yana Bromberg
  • 3: Yana Bromberg
  • 4: SWITCH Lab, Greet De Baets
  • 5: Rita Casadio Lab
  • 6: George Shackelford Lab
  • 7: Sean Mooney Lab
  • 8: Sean Mooney Lab
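An ROC curve plots the true-positive rate against the false-positive rate as the decision threshold on a submission's scores is lowered. A minimal sketch, assuming each submission assigns one numeric score per variant and the ground truth is a binary rescue/no-rescue label (function names and data are hypothetical): `roc_points` builds the curve, and `roc_auc` computes its area as the probability that a randomly chosen positive outscores a randomly chosen negative, with ties counting one half.

```python
def roc_points(scores, labels):
    """(FPR, TPR) points from sweeping the threshold from the highest score down."""
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both positive and negative labels")
    pairs = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Items with tied scores cross the threshold together.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def roc_auc(scores, labels):
    """Area under the ROC curve via the pairwise (Mann-Whitney) formulation."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need both positive and negative labels")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


scores = [0.9, 0.8, 0.4, 0.3]  # hypothetical predictor scores
labels = [1, 0, 1, 0]          # hypothetical ground truth
print(roc_points(scores, labels))  # [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
print(roc_auc(scores, labels))     # 0.75
```

The pairwise loop is quadratic in the number of variants; for a set the size of the 14,668 p53 variations, a rank-based formula or scikit-learn's `roc_auc_score` is the practical choice.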

SLIDE 19

Distinguish Crohn’s disease patients from healthy individuals

Dataset provided by Andre Franke, Christian-Albrechts-University Kiel
Assessed by Alexander Morgan, Stanford University

Exome sequences from four different groups, sequenced on different machines in different batches. Not a case/control study!

SLIDE 20

Challenge: Distinguish between exomes of Crohn’s disease patients and healthy individuals

  • Multifactorial or complex diseases
  • Exomes of 56 individuals
  • Who has Crohn’s disease?

SLIDE 21

Assessment: 42 / 56 have Crohn’s disease

SLIDE 22

Assessment: 42 / 56 have Crohn’s disease

SLIDE 23

Assessment: 42 / 56 have Crohn’s disease

[Figure labels: #119 (ySNAP?), #94 (UniPadova)]
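Because 42 of the 56 exomes come from Crohn's disease patients, the two classes are unbalanced, and raw accuracy flatters trivial predictors. As a worked example of that arithmetic (not a metric stated on the slides), a predictor that calls every exome "Crohn's" is 75% accurate, yet its balanced accuracy, the mean of sensitivity and specificity, is only 1/2. The function names are illustrative.

```python
from fractions import Fraction


def accuracy(tp, fn, tn, fp):
    """Fraction of all individuals classified correctly."""
    return Fraction(tp + tn, tp + fn + tn + fp)


def balanced_accuracy(tp, fn, tn, fp):
    """Mean of sensitivity (recall on cases) and specificity (recall on controls)."""
    return (Fraction(tp, tp + fn) + Fraction(tn, tn + fp)) / 2


# Slide figures: 42 of 56 individuals have Crohn's disease, so 14 do not.
# A trivial predictor that labels everyone a patient: tp=42, fn=0, tn=0, fp=14.
print(accuracy(42, 0, 0, 14))           # 3/4
print(balanced_accuracy(42, 0, 0, 14))  # 1/2
```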

SLIDE 24

Today

  • Personalized genetics has been upon us for some time
  • How good are we at actually identifying phenotypes from whole genomes?
SLIDE 25

Personal Genome Project (PGP): predict individuals’ phenotypes

Dataset provided by George Church, Harvard Medical School
Assessed by Sean Mooney, Buck Institute

Numerical traits

  • 33. Birth weight (in g)
  • 34. HDL level (in mg/dL) *
  • 35. LDL level (in mg/dL) *
  • 36. Triglyceride level (in mg/dL) *
  • 37. Fasting blood glucose level (in mg/dL)
  • 38. Warfarin dose (in mg)
  • 39. Age at menarche
  • 40. Annual income (in $)
SLIDE 26

The Submitters

  • s122, s123: UniPadova (2 submissions), PI: Silvio Tosatto
    – s122: ANNOVAR + literature + database + expert knowledge; s123: random prediction

  • s125: Netbiolab, PI: Insuk Lee
    – SIFT + database (for population frequency) + GWAS

  • s126: KarchinLab, PI: Rachel Karchin
    – Bayes network + database (GWAS)

Late submission: Shamil Sunyaev’s Lab, Harvard University

SLIDE 27

The Probabilities in the 10 PGP Individuals

  • Mostly zero

# | Trait Name | Frequency | PositiveNum | PGPCount
1 | Asthma | 0.25 | 2 | 8
2 | Crohn's disease | | | 8
3 | Ulcerative colitis | | | 8
4 | Irritable bowel syndrome | 0.111 | 1 | 9
5 | Rheumatoid arthritis | | | 8
6 | Type II Diabetes | | | 8
7 | Coronary artery disease | | | 8
8 | Long QT Syndrome | | | 8
9 | Hypertrophic cardiomyopathy | | | 8
10 | Glaucoma | 0.125 | 1 | 8
11 | Color blindness | 0.125 | 1 | 8
12 | Bipolar disorder | | | 8
13 | Celiac disease | | | 8
14 | Psoriasis | | | 8
15 | Lupus | | | 8
16 | Breast cancer | | | 8
17 | Prostate cancer | | | 8
18 | Migraine | | | 8
19 | Lactose intolerance | | | 7
20 | Dyslexia | 0.125 | 1 | 8
21 | Autism | | | 8
22 | Osteoporosis | | | 7
23 | Incontinence | | | 8
24 | Kidney stones | | | 8
25 | Varicose veins | | | 8
26 | Sleep Apnea | 0.143 | 1 | 7
27 | Tongue rolling (tube) | 0.875 | 7 | 8
28 | Phenylthiocarbamide tasting | 1 | 4 | 4
29 | Blood type - Has A antigen? | 0.625 | 5 | 8
30 | Blood type - Has B antigen? | 0.143 | 1 | 7
31 | Blood type - Is Rh(D) positive? | 0.875 | 7 | 8
32 | Absolute pitch | | | 6
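In every row of the trait table that has values, Frequency equals PositiveNum divided by PGPCount, rounded to three decimals (for example 2/8 = 0.25 for asthma and 1/9 ≈ 0.111 for irritable bowel syndrome); PGPCount itself varies between 4 and 9 across traits. A quick check of that reading, with the nonzero rows copied from the table (`trait_frequency` is a hypothetical helper name):

```python
# (trait, PositiveNum, PGPCount, reported Frequency) for rows with a nonzero count
ROWS = [
    ("Asthma", 2, 8, 0.25),
    ("Irritable bowel syndrome", 1, 9, 0.111),
    ("Glaucoma", 1, 8, 0.125),
    ("Color blindness", 1, 8, 0.125),
    ("Dyslexia", 1, 8, 0.125),
    ("Sleep Apnea", 1, 7, 0.143),
    ("Tongue rolling (tube)", 7, 8, 0.875),
    ("Phenylthiocarbamide tasting", 4, 4, 1.0),
    ("Blood type - Has A antigen?", 5, 8, 0.625),
    ("Blood type - Has B antigen?", 1, 7, 0.143),
    ("Blood type - Is Rh(D) positive?", 7, 8, 0.875),
]


def trait_frequency(positives, count):
    """Share of counted individuals who are positive, rounded to 3 decimals."""
    return round(positives / count, 3)


for trait, positives, count, reported in ROWS:
    computed = trait_frequency(positives, count)
    print(f"{trait}: computed {computed}, reported {reported}")
```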

SLIDE 28

The Binary Traits

Results by team – only the Karchin team is statistically significant

Submission | Total Traits | Predicted Traits | Precision | Recall | AUC | P
UniPadova | 228 | 216 | 0.094 | 0.3 | 0.605 | 0.133
UniPadova | 228 | 228 | 0.118 | 0.095 | 0.405 | 0.923
Netbiolab | 228 | 220 | 0.024 | 0.214 | 0.225 | 1
KarchinLab | 228 | 228 | 0.652 | 0.714 | 0.896 |
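Precision is the share of traits called positive that are truly positive, and recall is the share of truly positive traits that were called. The slides do not say how submitted probabilities were turned into yes/no calls, so this minimal sketch assumes a hypothetical cutoff of 0.5; the probabilities and outcomes are invented.

```python
def precision_recall(probabilities, actual, cutoff=0.5):
    """Precision and recall after calling a trait positive when p >= cutoff.

    The 0.5 cutoff is an illustrative assumption, not the assessors' rule.
    """
    called = [p >= cutoff for p in probabilities]
    tp = sum(1 for c, a in zip(called, actual) if c and a)
    fp = sum(1 for c, a in zip(called, actual) if c and not a)
    fn = sum(1 for c, a in zip(called, actual) if not c and a)
    # Return 0.0 when a denominator is empty (a convention; other tools differ).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


probs = [0.9, 0.6, 0.2, 0.7, 0.1]  # hypothetical submitted probabilities
truth = [1, 0, 1, 1, 0]            # hypothetical trait outcomes
print(precision_recall(probs, truth))
```

Here two of the three traits called positive are real (precision 2/3) and two of the three real positives are found (recall 2/3).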

SLIDE 29

The Binary Traits: ROC

Only S126 (Karchin lab) is statistically significant

Submissions:
  • S122: UniPadova
  • S123: UniPadova (random)
  • S125: Netbiolab
  • S126: KarchinLab
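The slides mark only S126 as statistically significant but do not state the test used. One common way to attach a P-value to an AUC, shown here as an assumption rather than the assessors' procedure, is a label-permutation test: shuffle the true labels many times and count how often the shuffled AUC is at least as large as the observed one. The sketch is one-sided and adds the usual +1 correction; `n_perm`, `seed`, and the data are illustrative.

```python
import random


def roc_auc(scores, labels):
    """Probability that a random positive outscores a random negative (ties = 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def permutation_p_value(scores, labels, n_perm=2000, seed=0):
    """One-sided P-value for AUC under the null that labels are exchangeable."""
    rng = random.Random(seed)
    observed = roc_auc(scores, labels)
    shuffled = list(labels)
    at_least_as_good = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if roc_auc(scores, shuffled) >= observed:
            at_least_as_good += 1
    # Counting the observed labelling as one permutation keeps P above zero.
    return (at_least_as_good + 1) / (n_perm + 1)


scores = list(range(20))      # hypothetical scores that separate the classes perfectly
labels = [0] * 10 + [1] * 10
print(permutation_p_value(scores, labels))
```

With few individuals per trait, a larger `n_perm` or exact enumeration of label assignments gives a more stable P-value.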

SLIDE 30

Numerical traits

We are still in the “game” phase…

SLIDE 31

Extra Questions

Special questions:
  • (a) One of the PGP10 individuals has irritable bowel syndrome. Who is that? (Answer: PGP7)
  • (b) One of the PGP10 individuals is color-blind. Which one? (Answer: PGP10)
  • (c) One of the PGP10 individuals is not color-blind, but she has a color-blind father and an affected son. Who is that? (Answer: PGP9)

Karchin Lab answered all three correctly; UniPadova answered one correctly.

SLIDE 32

Some conclusions

  • Knowledge of individual gene is important (CBS)
  • Methods are highly significant (P-value) but of questionable clinical applicability (r² ~0.7)
  • Different methods succeed at different challenges, and under different assessments
  • Predictions on the Personal Genome Project panel improved, but largely by better modeling the prior
  • Metapredictors are currently unlikely to yield large improvements
  • Unexpected success in predicting Crohn’s disease

CAGI 2012

  • Challenges about to be released… (September 2012)
  • Conference scheduled for mid-December 2012
SLIDE 33

Acknowledgements

Organizers
  • Steven E. Brenner, University of California, Berkeley
  • John Moult, IBBR, University of Maryland
  • Susanna Repo, University of California, Berkeley

Data Providers
  • Adam P. Arkin, UC Berkeley
  • George Church, Harvard Medical School
  • Andre Franke, Christian-Albrechts-University Kiel
  • Joe W. Gray, OHSU
  • Rick Lathrop, UC Irvine
  • John Moult, University of Maryland
  • Jasper Rine, UC Berkeley
  • Jeremy Sanford, UC Santa Cruz
  • Nicole Schmitt, University of Copenhagen
  • Jay Shendure, University of Washington
  • Michael Snyder, Stanford University
  • Sean Tavtigian, University of Utah

Assessors
  • Rui Chen, Stanford University
  • Gad Getz, Broad Institute
  • Iddo Friedberg, Miami University
  • Sean Mooney, Buck Institute
  • Alexander A. Morgan, Stanford University
  • Artem Sokolov, University of California, Santa Cruz
  • Josh Stuart, University of California, Santa Cruz
  • Sean Tavtigian, University of Utah

Website Development and Administration, Data Analysis
  • Maya Zuhl, IBBR, University of Maryland
  • Sri Jyothsna Yeleswarapu, Tata Consultancy Services
  • Gaurav Pandey, Mount Sinai School of Medicine