In silico blood genotyping from exome sequencing data




SLIDE 1

In silico blood genotyping from exome sequencing data

Silvio Tosatto

BioComputing UP, Department of Biology, University of Padova, Italy URL: http://protein.bio.unipd.it/

SLIDE 2

Today

Personalized genetics has been upon us for some time
How good are we at actually identifying phenotypes from a whole genome?

SLIDE 3

The CAGI Personal Genome Project (PGP) Challenge

Few goals are more pure to genome interpretation than predicting traits from raw sequence (or genotype) data

In this CAGI challenge, phenotypes/traits are predicted for real people with genetic data

Genetic information for 10 individuals from the Personal Genome Project is provided (PGP-10)

Dataset provided by George Church

SLIDE 4

Personal genome project (PGP) ‐ Predict individuals’ phenotype

Numerical traits

33. Birth weight (in g)
34. HDL level (in mg/dL) *
35. LDL level (in mg/dL) *
36. Triglyceride level (in mg/dL) *
37. Fasting blood glucose level (in mg/dL)
38. Warfarin dose (in mg)
39. Age at Menarche
40. Annual income (in $)

SLIDE 6

Blood Groups

Clear genetic cause of phenotypes
Model system for phenotype prediction
Good description in literature
High relevance, especially for blood transfusions

(Blood. 2009;114: 248-256)

SLIDE 7

Example: ABO glycosyltransferase

Blood Grp   Genes   Antigens
ABO         ABO     A, B, O

Amino acid residues differing between blood group A- and B-active transferases, respectively (Arg176Gly; Gly235Ser; Leu266Met; Gly268Ala) are shown with the single-letter code and their positions indicated.
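
As a minimal sketch of how the four positions above could be used, the following Python compares the residues observed at positions 176, 235, 266 and 268 with the A-type and B-type residues listed on the slide. The function name and the input format are assumptions made for illustration, and the sketch deliberately ignores O alleles and all other variants.

```python
# Residues distinguishing A- and B-active transferases, taken from the slide
# (Arg176Gly; Gly235Ser; Leu266Met; Gly268Ala), in single-letter code.
A_RESIDUES = {176: "R", 235: "G", 266: "L", 268: "G"}
B_RESIDUES = {176: "G", 235: "S", 266: "M", 268: "A"}


def classify_transferase(observed):
    """Return 'A-like', 'B-like' or 'undetermined' for one allele.

    `observed` maps position -> single-letter residue (hypothetical input
    format). Only the four positions above are inspected.
    """
    if all(observed.get(pos) == aa for pos, aa in A_RESIDUES.items()):
        return "A-like"
    if all(observed.get(pos) == aa for pos, aa in B_RESIDUES.items()):
        return "B-like"
    return "undetermined"


print(classify_transferase({176: "G", 235: "S", 266: "M", 268: "A"}))
```

An allele carrying a mixture of A-type and B-type residues is reported as undetermined rather than forced into either class.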

SLIDE 8

Relevant Blood Types

Blood Grp   Genes              Antigens
ABO         ABO                A, B, O
RH          RHCE, RHD          D, E, C plus 50 minor
DUFFY       DARC               FY(a), FY(b)
Kell        KEL                K1, K2 plus 23 minor
Diego       SLC4A1             Dia, Dib, Wra, Wrb
Kidd        SLC14A1            Jk(a), Jk(b)
Lewis       FUT3               a, b
Lutheran    BCAM               Lu(a), Lu(b) plus 15 minor
MNS         GYPA, GYPB, GYPE   M, N, S plus 40 minor
Bombay      FUT1, FUT2         H, secretor

10 out of ca. 30 blood groups are relevant for transfusions

SLIDE 9

BOOGIE: BlOOd Group IdEntifier

A knowledge-based system to predict blood groups from sequencing data
All 10 groups relevant for blood transfusions are predicted
A specialized genotype-phenotype knowledge base is required

SLIDE 10

BOOGIE: Knowledge representation

Stored in tree-like structure
Rules expressed in “if <mutation(s)> then <phenotype(s)>” form
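
To make the "if mutation(s) then phenotype(s)" idea concrete, here is a minimal Python sketch. It is not BOOGIE's actual knowledge base or matching logic: the two-level dictionary merely stands in for the tree-like structure, the rule contents and the default rule are illustrative assumptions, and the "most specific rule wins" tie-break is also an assumption. The mutation labels reuse the four ABO substitutions from the ABO glycosyltransferase slide.

```python
# Hypothetical mini knowledge base: blood group -> list of rules.
# Each rule is (set of required mutations, predicted phenotype).
# The rule contents are illustrative, not BOOGIE's actual rules.
KNOWLEDGE_BASE = {
    "ABO": [
        (frozenset(), "A"),  # assumed default when no listed mutation is seen
        (frozenset({"R176G", "G235S", "L266M", "G268A"}), "B"),
    ],
}


def predict(group, observed_mutations):
    """Apply 'if <mutation(s)> then <phenotype(s)>' rules for one group.

    Among the rules whose mutations are all observed, the most specific
    one (largest mutation set) wins. This tie-breaking is an assumption.
    """
    observed = set(observed_mutations)
    matching = [rule for rule in KNOWLEDGE_BASE[group] if rule[0] <= observed]
    if not matching:
        return None
    _, phenotype = max(matching, key=lambda rule: len(rule[0]))
    return phenotype


print(predict("ABO", ["R176G", "G235S", "L266M", "G268A"]))
```

Keeping rules as data rather than code is one way a new variant could be supported by adding a rule instead of changing the matching function.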

SLIDE 11

BOOGIE: Knowledge collection

– Manually curated
– 580 rules derived


SLIDE 12

Relevant variants

Millions of SNVs
ANNOVAR (Wang et al., Nucleic Acids Research 2010):
– Gene-based annotation of variants
– Select conserved positions
– Remove unrelated genes
Few relevant SNVs

ANNOVAR is used to reduce the SNVs to a manageable number.
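
The "remove unrelated genes" step can be sketched in Python as a simple gene filter. This does not call ANNOVAR and does not reproduce its output format: the record layout (a dict with a `gene` key) and the function name are hypothetical, and the gene list is copied from the "Relevant Blood Types" table.

```python
# Genes from the "Relevant Blood Types" table (GYBE on the slide read as GYPE).
BLOOD_GROUP_GENES = {
    "ABO", "RHCE", "RHD", "DARC", "KEL", "SLC4A1", "SLC14A1",
    "FUT3", "BCAM", "GYPA", "GYPB", "GYPE", "FUT1", "FUT2",
}


def keep_relevant(variants, genes=BLOOD_GROUP_GENES):
    """Keep only variants annotated to one of the given genes.

    `variants` is an iterable of dicts with a 'gene' key; this record
    layout is hypothetical and is not ANNOVAR's output format.
    """
    return [v for v in variants if v.get("gene") in genes]


# Placeholder records for illustration only.
example = [
    {"gene": "ABO", "id": "variant_1"},
    {"gene": "TP53", "id": "variant_2"},
    {"gene": "KEL", "id": "variant_3"},
]
print([v["id"] for v in keep_relevant(example)])
```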

SLIDE 13

BOOGIE Pipeline


SLIDE 14

Benchmarking

BOOGIE covers all known blood group variants
Difficulty in finding genome sequences with known blood phenotypes
Personal Genome Project (PGP) as annotated benchmark set

SLIDE 15

Personal Genome Project (PGP)

The mission of the PGP is to encourage the development of personal genomics

Genetic information for 10 individuals from the Personal Genome Project is provided (PGP-10)

A larger dataset (PGP-1K) aims to cover at least 1,000 genomes

Unfortunately, only ABO and Rh blood group information is available

SLIDE 16

PGP-10 Data

Back row (left to right): James Sherley, Misha Angrist, John Halamka, Keith Batchelder, Rosalynn Gill. Front row (left to right): Esther Dyson, George Church, Kirk Maxey. Not shown: Stan Lapidus and Steven Pinker.

SLIDE 17

PGP-10 Data

SLIDE 18

PGP-10 Results

Blood group   PGP1               PGP4               PGP8
Known         O +                A -                B +
ABO           O                  A                  B
Rh            c; e; weak D       c; e; weak D       c; e; weak D
DUFFY         FY(a+); FY(b-)     FY(a-); FY(b+)     FY(a-); FY(b+)
Diego         Dib; Memph neg     Dib; Memph neg     Dib; Memph neg
KIDD          Jk(a-); Jk(b+)     Jk(a-); Jk(b+)     Jk(a+); Jk(b-)
Lewis         negative           negative           negative
MNS           M; S               M; s               M; s
Bombay        H+; secretor       H+; secretor       H+; secretor

KELL (identical for PGP1, PGP4 and PGP8): K2; K21+; K4-; K3-; K11; K17; K14; K24; K6+; K7-

Lutheran
PGP1: Lu(a-); Lu(b+); Lu6+; Lu9-; Lu4; Lu8+; Aua+; Aub-
PGP4: Lu(a-); Lu(b+); Lu6-; Lu9+; Lu4-; Lu8+; Aua-; Aub+
PGP8: Lu(a-); Lu(b+); Lu6+; Lu9-; Lu4-; Lu8+; Aua+; Aub-

BOOGIE correctly predicts all ABO types and all Rh groups except one (PGP-4)
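
The agreement stated above can be checked with a short worked example in Python. The known and predicted values are copied from the results table for PGP1, PGP4 and PGP8; counting a "weak D" call as Rh positive is an assumption made here for the comparison, and the helper names are illustrative.

```python
# Known and predicted types copied from the results table.
known = {"PGP1": ("O", "+"), "PGP4": ("A", "-"), "PGP8": ("B", "+")}
predicted_abo = {"PGP1": "O", "PGP4": "A", "PGP8": "B"}
predicted_rh = {"PGP1": "weak D", "PGP4": "weak D", "PGP8": "weak D"}

# Assumption: a "weak D" call is counted as Rh positive.
RH_CALL_TO_SIGN = {"weak D": "+"}


def concordance(pairs):
    """Fraction of (known, predicted) pairs that agree."""
    pairs = list(pairs)
    return sum(k == p for k, p in pairs) / len(pairs)


abo = concordance((known[i][0], predicted_abo[i]) for i in known)
rh = concordance((known[i][1], RH_CALL_TO_SIGN[predicted_rh[i]]) for i in known)
print(f"ABO: {abo:.0%}, Rh: {rh:.0%}")
```

Under that assumption, ABO agrees for all three individuals and Rh for two of three, with PGP4 (known Rh negative) as the mismatch.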

SLIDE 19

PGP-1K Results

A second dataset was built from all PGP-1K participants with available blood group information, for a total of 22 individuals

This dataset contains microarray data (23andMe SNPs)

P = predicted R = real

* = missing blood group relevant SNPs from dataset

SLIDE 20

Conclusions

We developed a method, called BOOGIE, to predict the ten blood groups relevant for transfusions from sequencing data

– Specialized knowledge base with 580 genotype-to-phenotype rules
– Novel variants can be easily considered

Benchmarking was (so far) only possible on PGP data for the ABO and Rh blood groups

– The ABO and Rh systems are correctly predicted in 85-100% of cases
– The Rh- type presents some additional difficulties

SLIDE 21

Acknowledgements

Manuel Giollo
Giovanni Minervini
Marta Scalzotto (not shown)
Emanuela Leonardi
Carlo Ferrari

URL: http://protein.bio.unipd.it/

Funding

FIRB Futuro in Ricerca
Università di Padova
CARIPLO
AIRC