Protein Domain-Centric Approach to Study Cancer Somatic Mutations



SLIDE 1

Protein Domain-Centric Approach to Study Cancer Somatic Mutations from High-throughput Sequencing Studies

Dr. Maricel G. Kann
Assistant Professor, Dept. of Biological Sciences, UMBC

SLIDE 2

The term protein domain (or domain) refers to a region of the protein with compact structure, usually with a hydrophobic core.

SLIDE 3

Protein Domains

  • Protein domains mediate 75% of protein-protein interactions.
  • Most proteins are multi-domain (65% of eukaryotic and 40% of prokaryotic proteins).
  • Domains represent the functional units of proteins.

Aloy and Russell, 2006

SLIDE 4

Reduces the space of inquiry

  • ≈22,000 human genes
  • ≈34,500 human RefSeq proteins
  • Over 550,000 human proteins from all databases listed in NCBI
  • Fewer than 4,500 human protein domains

SLIDE 5

Majority of Disease Mutations are Inside Domains

Swiss-Prot polymorphisms: 52% inside domains, 48% outside
Swiss-Prot disease mutations: 82% inside domains, 18% outside

SLIDE 6

Different domains in the same protein may play different roles

SPTB protein (spectrin beta chain, erythrocytic)

[Figure: SPTB domain diagram. Labels: actin binding domains, helix forming domains, elliptocytosis mutations, spherocytosis mutation]

Zhong et al., Edgetic perturbation models of human inherited disorders. Mol. Syst. Biol. 5, 321 (2009)

SLIDE 7

Protein Domain Disease Hotspots

[Figure: DMDM domain view. Labels: Protein 1, Protein 2, Protein 3, Shared Domain, Mutation Count (1, 3, 1), Domain Disease Hotspot]
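
The idea behind the domain view can be sketched in a few lines of Python: mutations from several proteins are projected onto the positions of the domain they share, and the per-position counts are summed. This is only an illustration. The function name, the toy positions and the alignment dictionaries are hypothetical and are not taken from DMDM.

```python
from collections import Counter

def aggregate_domain_mutations(protein_mutations, domain_alignments):
    """Sum mutation counts per domain position across proteins.

    protein_mutations: {protein: [protein positions carrying a mutation]}
    domain_alignments: {protein: {protein position: domain position}}
    Both structures are hypothetical stand-ins for real alignment data.
    """
    counts = Counter()
    for protein, positions in protein_mutations.items():
        mapping = domain_alignments.get(protein, {})
        for pos in positions:
            domain_pos = mapping.get(pos)
            if domain_pos is not None:  # mutation falls inside the shared domain
                counts[domain_pos] += 1
    return counts

# Toy data: three proteins, each with one mutation aligned to domain position 12.
protein_mutations = {
    "Protein 1": [105],
    "Protein 2": [42, 300],  # position 300 lies outside the domain
    "Protein 3": [77],
}
domain_alignments = {
    "Protein 1": {105: 12},
    "Protein 2": {42: 12},
    "Protein 3": {77: 12},
}
domain_counts = aggregate_domain_mutations(protein_mutations, domain_alignments)
```

No single protein carries more than one mutation at the aligned site, yet domain position 12 accumulates three, which is the kind of signal a gene-by-gene view would miss.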

SLIDE 8

Protein Domains

Human CFTR

SLIDE 9

http://bioinf.umbc.edu/DMDM

SLIDE 10

  • ABCC_CFTR1 domain (nucleotide binding domain 1)
  • Significant hotspot at position 172

SLIDE 11

DS-Score

  • DS-Score (domain significance score) is a statistical measure designed to identify significantly mutated domain positions

SLIDE 12

Data and Methods

  • DS-Score (domain significance score)
  • Derived from the probability for a domain position to contain its number of disease mutations given the domain length and total number of mutations mapping to the domain
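
The slide does not give the DS-Score formula, so the following is only a sketch of one plausible reading. It assumes a null model in which each of the N mutations mapping to a domain of length L lands on any position with probability 1/L, and it reports the negative log10 of the binomial upper-tail probability of seeing at least the observed count at one position. The function names and the choice of a -log10 scale are assumptions, not the published definition.

```python
import math

def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed exactly (fine for small n)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def ds_score_sketch(position_count, total_mutations, domain_length):
    """Hypothetical DS-Score-like value for one domain position.

    Assumption: under the null, every mutation hits a given position
    with probability 1 / domain_length.
    """
    p = 1.0 / domain_length
    tail = binomial_tail(position_count, total_mutations, p)
    return -math.log10(tail)

# A position holding 10 of 50 mutations in a 100-residue domain scores far
# higher than a position holding 2 of them.
score_hot = ds_score_sketch(10, 50, 100)
score_mild = ds_score_sketch(2, 50, 100)
```

Under this reading the score grows with the mutation count at the position and shrinks as the domain gets shorter or the total mutation load gets larger, which matches the dependencies named on the slide.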

SLIDE 13

Study of Domain Hotspots for Disease

  • DMDM reveals domain hotspots for both cancer and non-cancer disease mutations (non-cancers are mostly Mendelian diseases)
  • Use DS-Score to analyze disease mutation data
  • Different mutation hotspot profiles for these two different classes of disease?
  • Different mutation hotspot profiles for known oncogenes and tumor suppressors?

SLIDE 14

Mutations at Domain Hotspots

Category | Cancer | Non-cancer | Randomized Cancer | Randomized Non-cancer
Mutations at position-based hotspots | 13.7% | 6.4% | 0.06 (±0.03)% | 0.06 (±0.03)%
Mutations at feature-based hotspots | 29.2% | 10.5% | 0.06 (±0.03)% | 0.06 (±0.03)%
Mutations at positions with ≥ 2 mutations | 54.4% | 58.8% | 33.2 (±2.2)% | 30.8 (±3.1)%

  • P-values ≈ 0.0 for position- and feature-based hotspots
  • P-value < 0.05 for mutations at positions with ≥ 2 mutations

Peterson, T.A., et al. (2012) J Am Med Inform Assoc 19, 275-83.
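
The slide does not say how the randomized columns were produced. As a rough sketch of how such a comparison could be made, the code below scatters the same number of mutations uniformly over the positions of a domain many times, records the fraction landing on positions with at least two mutations, and derives an empirical P-value for the observed fraction. The uniform reshuffling scheme and all function names are assumptions.

```python
import random
from collections import Counter

def fraction_at_recurrent_positions(positions, min_count=2):
    """Fraction of mutations located at positions hit at least min_count times."""
    if not positions:
        return 0.0
    counts = Counter(positions)
    recurrent = sum(c for c in counts.values() if c >= min_count)
    return recurrent / len(positions)

def randomized_fractions(n_mutations, domain_length, n_trials=1000, seed=0):
    """Null distribution: mutations scattered uniformly over the domain."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_trials):
        shuffled = [rng.randrange(domain_length) for _ in range(n_mutations)]
        results.append(fraction_at_recurrent_positions(shuffled))
    return results

def empirical_p_value(observed, null_values):
    """One-sided empirical P-value with the usual +1 correction."""
    hits = sum(1 for v in null_values if v >= observed)
    return (hits + 1) / (len(null_values) + 1)

# Toy example: 20 mutations in a 200-residue domain, 12 of them recurrent.
observed_positions = [5] * 8 + [17] * 4 + list(range(100, 108))
observed = fraction_at_recurrent_positions(observed_positions)
null = randomized_fractions(len(observed_positions), 200, n_trials=500)
p_value = empirical_p_value(observed, null)
```

The +1 correction keeps the estimate from ever being exactly zero, so a reported "P ≈ 0.0" would correspond here to a value bounded by one over the number of trials.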

SLIDE 15

Hotspots at Highly Conserved Positions

Category | Cancer | Non-cancer
Position-based hotspots at conserved positions | 58.1% | 51.2%
Feature-based hotspots at conserved positions | 67.6% | 61.7%

Measure | Cancer | Non-cancer
Correlation coefficient (DS-Score, conservation score) | 0.19 | 0.10

Peterson, T.A., et al. (2012) J Am Med Inform Assoc 19, 275-83.

SLIDE 16

DS-Score Distributions

SLIDE 17

Cancer Genes with Hotspots Above DS-Score 9.5

Gene | Type | Function
ALK | Oncogene | Receptor kinase
BRAF | Oncogene | Protein kinase
EGFR | Oncogene | Receptor kinase
FLT3 | Unknown | Receptor kinase
GNAI2 | Unknown | GTPase
GNAS | Oncogene | GTPase
HRAS | Oncogene | GTPase
KIT | Oncogene | Receptor kinase
KRAS | Oncogene | GTPase
MET | Oncogene | Receptor kinase
NRAS | Oncogene | GTPase
PDGFRA | Oncogene | Receptor kinase
RRAS2 | Oncogene | GTPase

SLIDE 18

Cancer Genes with Hotspots Below DS-Score 9.5

Gene | Type | Function
ABL1 | Oncogene | Protein kinase
CDK4 | Unknown | Protein kinase
CHEK2 | Tumor Suppressor | Cell cycle regulator
FGFR3 | Unknown | Receptor kinase
MAP2K3 | Unknown | Protein kinase
MEN1 | Tumor Suppressor | DNA repair (unclear)
NF1 | Tumor Suppressor | RAS pathway regulator
NTRK1 | Oncogene | Receptor kinase
PIK3CA | Oncogene | Lipid kinase
PTPN11 | Oncogene | Protein phosphatase
RET | Oncogene | Receptor kinase
STK11 | Tumor Suppressor | Protein kinase
TGFBR2 | Unknown | Receptor kinase
WT1 | Tumor Suppressor | Transcription factor

SLIDE 19

DS-Score for Variant Classification

[Flowchart: Novel Variants → Map to Domain → DS-Score Hotspot → Likely Deleterious or Likely Neutral *]
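
One way to read the flowchart is as a lookup: a novel variant is mapped to a domain position, and the call depends on whether that position is a known DS-Score hotspot. The sketch below follows that reading. The data structures, the domain identifier "DOMAIN_A" and the separate "unmapped" outcome for variants outside any domain are assumptions, since the slide does not say how such variants are handled.

```python
def classify_variant(protein, position, domain_alignments, hotspots):
    """Classify a variant by whether it maps to a DS-Score hotspot.

    domain_alignments: {protein: {protein position: (domain id, domain position)}}
    hotspots: set of (domain id, domain position) pairs above a chosen DS-Score cutoff
    Both inputs are hypothetical stand-ins for real DMDM-style data.
    """
    hit = domain_alignments.get(protein, {}).get(position)
    if hit is None:
        return "unmapped"  # assumption: variants outside domains get no call
    return "likely deleterious" if hit in hotspots else "likely neutral"

# Toy data: two proteins sharing one domain with a single hotspot position.
domain_alignments = {
    "Protein 1": {105: ("DOMAIN_A", 12), 110: ("DOMAIN_A", 17)},
    "Protein 2": {42: ("DOMAIN_A", 12)},
}
hotspots = {("DOMAIN_A", 12)}

call_hot = classify_variant("Protein 2", 42, domain_alignments, hotspots)
call_other = classify_variant("Protein 1", 110, domain_alignments, hotspots)
call_outside = classify_variant("Protein 1", 500, domain_alignments, hotspots)
```

Because the hotspot set is defined per domain, a variant in Protein 2 can be called from mutations that were observed only in other proteins carrying the same domain.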

SLIDE 20

DS-Score for Variant Classification

Method | Specificity (%) | Precision (%) | Precision of DS-Score with LogR.E-value (%)
SIFT (1) | 76.2 | 82.0 | N/A
LogR.E-value (2) | 78.2 | 81.3 | N/A
Position-based DS-Score | 99.5 | 91.6 | 95.7
Feature-based DS-Score | 98.6 | 87.2 | 91.0
Domain positions with ≥ 2 mutations | 94.2 | 85.6 | 91.9

Sensitivity: 3.3, 6.5 and 20.5%

1. Ng, P.C. and Henikoff, S. (2003) NAR, 31, 3812-14.
2. Clifford, R.J., M.N. Edmonson, C. Nguyen, et al., Bioinformatics, 2004. 20(7): p. 1006-14.
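
For reference, the three measures reported on this slide follow from the standard confusion-matrix definitions. The helper below computes them, and the example counts are made up purely to show how a classifier can combine very high specificity and precision with low sensitivity, as the DS-Score rows do.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard confusion-matrix measures, returned as fractions in [0, 1]."""
    return {
        "sensitivity": tp / (tp + fn),  # share of deleterious variants that are found
        "specificity": tn / (tn + fp),  # share of neutral variants correctly passed over
        "precision": tp / (tp + fp),    # share of deleterious calls that are right
    }

# Hypothetical counts: few variants are called, but the calls are mostly correct.
metrics = classification_metrics(tp=5, fp=1, tn=99, fn=95)
```

With these counts the sensitivity is 5%, the specificity 99% and the precision about 83%: a hotspot-based call is rarely wrong, but most deleterious variants do not sit on a hotspot and are therefore missed.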

SLIDE 21

Scoring gene peaks and “hills”

  • Cancer Mutation Prevalence (CaMP) score
  • Frequency of mutations in different contexts varies across cancers
  • CaMP Scores consider neighboring bases (25 contexts)

Wood, L.D., D.W. Parsons, S. Jones, et al., The genomic landscapes of human breast and colorectal cancers. Science, 2007. 318(5853): p. 1108-13.

SLIDE 22

From Gene to Domain Landscape

Each point on the domain landscape grid represents a domain, and peak heights are estimated by aggregating all mutations across all human proteins that contain that domain.

SLIDE 23

Scoring Domain Landscapes

  • We used domain-based counts of mutations, accounting for the different mutational contexts
  • We estimated the DL-Score, or domain landscape score (binomial distribution, considering mutational context and aggregating all mutations for all human proteins with the domain).
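
The exact DL-Score formula is not shown on the slide. The sketch below illustrates the ingredients it names: mutation counts aggregated per domain, a separate background rate for each mutational context, and a binomial model. For each context it computes the binomial upper-tail probability of the observed count and sums the -log10 values. The combination rule, the context names, the rates and the opportunity counts are all assumptions made for illustration.

```python
import math

def log_binom_pmf(k, n, p):
    """Log of the Binomial(n, p) probability mass at k, stable for large n."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))

def binom_upper_tail(k, n, p):
    """P(X >= k), summing terms upward until they stop contributing."""
    total = 0.0
    for i in range(k, n + 1):
        term = math.exp(log_binom_pmf(i, n, p))
        total += term
        if i > n * p and term <= total * 1e-15:
            break  # past the mode and negligible: the rest of the tail is ignored
    return min(total, 1.0)

def dl_score_sketch(observed, opportunities, background_rates):
    """Hypothetical DL-Score-like value for one domain.

    observed:         {context: mutations seen in the domain, all proteins pooled}
    opportunities:    {context: mutable sites in the domain x number of tumors}
    background_rates: {context: assumed per-site background mutation probability}
    """
    score = 0.0
    for context, k in observed.items():
        tail = binom_upper_tail(k, opportunities[context], background_rates[context])
        score += -math.log10(max(tail, 1e-300))  # floor avoids log10(0)
    return score

# Toy inputs with two made-up contexts.
opportunities = {"C>T at CpG": 20_000, "other": 400_000}
background_rates = {"C>T at CpG": 1e-5, "other": 1e-6}
score_peak = dl_score_sketch({"C>T at CpG": 8, "other": 3}, opportunities, background_rates)
score_flat = dl_score_sketch({"C>T at CpG": 1, "other": 0}, opportunities, background_rates)
```

Treating each context separately means a domain rich in highly mutable sites is not rewarded for mutations that the background rate already explains.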

SLIDE 24

The Cancer Genome Atlas

TCGA sequencing projects that we used:

  • 100 colon adenocarcinoma patients
  • 522 breast invasive carcinoma patients
  • 253 lung adenocarcinoma patients

SLIDE 25

Domain Landscape of Colon Cancer

Summary of somatic mutations occurring in the exomes of 100 colon cancer tumor samples. Synonymous SNVs and variants present in dbSNP (release 130) were removed due to their low likelihood of being driver mutations.

Measure | Value
Total patients | 100
Total mutations | 21,572
Total nonsynonymous SNVs | 17,174 (79.6%)
Total frameshift insertions | 2,527 (11.7%)
Total nonframeshift insertions | 239 (1.1%)
Total frameshift deletions | 5 (0.0%)
Total nonframeshift deletions | 0 (0.0%)
Total stop-loss SNVs | 33 (0.2%)
Total stop-gain SNVs | 1,594 (7.4%)
Mutations in domain regions | 10,647 (49.4%)
Average mutations per patient | 216 (±552)
Number of mutations per patient | 21-4,880
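
The filtering step described in the caption can be sketched as a simple predicate over variant records, followed by a check that the per-type counts reported on this slide add up to the stated total and percentages. The record fields ("effect", "in_dbsnp") and the toy variants are hypothetical. Only the counts in `type_counts` come from the slide.

```python
def keep_candidate_variants(variants):
    """Drop synonymous SNVs and variants already present in dbSNP."""
    return [
        v for v in variants
        if v["effect"] != "synonymous SNV" and not v["in_dbsnp"]
    ]

# Toy records with hypothetical field names.
variants = [
    {"id": "v1", "effect": "nonsynonymous SNV", "in_dbsnp": False},
    {"id": "v2", "effect": "synonymous SNV", "in_dbsnp": False},
    {"id": "v3", "effect": "stop-gain SNV", "in_dbsnp": True},
    {"id": "v4", "effect": "frameshift insertion", "in_dbsnp": False},
]
kept_ids = [v["id"] for v in keep_candidate_variants(variants)]

# Counts by mutation type as reported on the slide.
type_counts = {
    "nonsynonymous SNVs": 17174,
    "frameshift insertions": 2527,
    "nonframeshift insertions": 239,
    "frameshift deletions": 5,
    "nonframeshift deletions": 0,
    "stop-loss SNVs": 33,
    "stop-gain SNVs": 1594,
}
total = sum(type_counts.values())
percentages = {name: round(100 * n / total, 1) for name, n in type_counts.items()}
```

The seven categories sum to the 21,572 mutations given as the total, and the recomputed shares agree with the percentages shown.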

SLIDE 26

Gene and Domain Landscapes

SLIDE 27

Selected domains highly mutated in colon cancer tumors

Nehrt LN, Peterson TA, Park DH and Kann MG, Domain Landscapes of somatic mutations in cancer. BMC Genomics. 13 (2012)

FILIP1L is known to inhibit proliferation and migration and to increase apoptosis in endothelial cells. It acts as a tumor suppressor, and its loss of function has been implicated in ovarian cancer, head and neck squamous cell carcinoma and oligodendrogliomas [38,39].

SLIDE 28

Shared gene and domain peaks in colon and breast cancer landscapes

Nehrt LN, Peterson TA, Park DH and Kann MG, Domain Landscapes of somatic mutations in cancer. BMC Genomics. 13 (2012)

SLIDE 29

PIK3CA domains prevalence

Nehrt LN, Peterson TA, Park DH and Kann MG, Domain Landscapes of somatic mutations in cancer. BMC Genomics. 13 (2012)

SLIDE 30

Advantages of using domain-centric approaches for analysis of disease mutations

  • Domain view gives the functional context of the mutation
  • Domain view reduces the space of inquiry
  • Majority of disease mutations in coding regions occur inside domains

SLIDE 31

Summary Part I (mutations with known significance to phenotype)

  • Disease mutations tend to significantly cluster at certain domain positions
  • The DS-Score, or domain significance score, is derived from known disease mutations
  • DS-Score can be used to classify mutations and will benefit from the growth of disease mutation databases

SLIDE 32

Summary Part II (mutations with unknown significance to phenotype)

  • Domain landscape allows for the visualization of clusters of cancer somatic mutations at the domain level
  • The domain landscape score is derived from the analysis of tumor mutations from exome or whole-genome sequencing data
  • A gradient of mutation prevalence in cancer studies can be found across the different domains of a gene (PIK3CA).

SLIDE 33

Thanks!