Protein Domain-Centric Approach to Study Cancer Somatic Mutations from High-throughput Sequencing Studies
Dr. Maricel G. Kann
Assistant Professor Dept of Biological Sciences UMBC
The term protein domain (or domain) refers to a region of the protein with compact structure, usually with a hydrophobic core.
Aloy and Russell, 2006
Protein domains mediate 75% of protein-protein interactions. Most proteins are multi-domain (65% of eukaryotic and 40% of prokaryotic proteins). Domains represent the functional units of proteins.
≈22,000 human genes
≈34,500 human RefSeq proteins
Over 550,000 human proteins from all
Fewer than 4,500 human protein domains
Figure: edgetic perturbation models of human inherited disorders (panel labels: actin-binding domains, helix-forming domains, elliptocytosis mutations, spherocytosis mutation). Zhong et al., Mol. Syst. Biol. 5, 321 (2009)
Figure: mutation counts for Protein 1, Protein 2 and Protein 3
ABCC_CFTR1 domain (nucleotide-binding domain 1): significant hotspot at position 172
DS-Score (domain significance score) is a
DS-Score (domain significance score)
Derived from the probability for a domain
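The DS-Score definition is cut off in the slide text, so the exact published formula cannot be recovered here. As a rough illustration only, the sketch below assumes a binomial null model: all mutations observed in one domain land independently and uniformly on its positions (p = 1/L for a domain of L positions), and a position's score is -log10 of the probability of seeing at least its observed count by chance. The function names and this formula are assumptions, not the authors' DS-Score.

```python
import math


def binomial_tail(n: int, k: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed exactly term by term."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def ds_score_sketch(counts: list[int]) -> list[float]:
    """Hypothetical per-position score for one domain.

    counts[j] is the number of mutations seen at domain position j.
    Assumed null: every mutation in the domain falls uniformly on its positions.
    Score = -log10 P(position receives >= observed mutations).
    """
    n = sum(counts)
    p = 1.0 / len(counts)
    scores = []
    for k in counts:
        tail = binomial_tail(n, k, p)
        scores.append(-math.log10(tail) if tail > 0 else math.inf)
    return scores


# Hypothetical 10-position domain with a cluster of 7 mutations at position 3
print(ds_score_sketch([0, 1, 0, 7, 0, 1, 0, 0, 1, 0]))
```

Under this assumed model, unmutated positions score 0 and a position that collects most of the domain's mutations stands out sharply, which is the kind of hotspot signal the score is meant to capture.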
DMDM reveals domain hotspots for both
Use DS-Score to analyze disease
Different mutation hotspot profiles for these
Different mutation hotspot profiles for known
Measure | Cancer | Non-cancer | Randomized cancer | Randomized non-cancer
Mutations at position-based hotspots | 13.7% | 6.4% | 0.06 (±0.03)% | 0.06 (±0.03)%
Mutations at feature-based hotspots | 29.2% | 10.5% | 0.06 (±0.03)% | 0.06 (±0.03)%
Mutations at positions with ≥ 2 mutations | 54.4% | 58.8% | 33.2 (±2.2)% | 30.8 (±3.1)%
P-values ≈ 0.0 for position- and feature-based hotspots.
P-value < 0.05 for mutations at positions with ≥ 2 mutations.
Peterson, T.A., et al. (2012) J Am Med Inform Assoc 19, 275-83.
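The "Randomized" columns give a background rate to compare against. The slides do not describe the randomization procedure, so the sketch below assumes one simple version: each domain's mutations are scattered uniformly at random over that domain's positions, and the pooled fraction of mutations landing on positions hit at least twice is averaged over many trials. The input layout and all names are hypothetical.

```python
import random
from collections import Counter

# Hypothetical input: domain id -> (domain length, observed mutated positions, 0-based)
Domains = dict[str, tuple[int, list[int]]]


def multi_hit_count(positions: list[int]) -> int:
    """Number of mutations that fall on a position mutated at least twice."""
    return sum(c for c in Counter(positions).values() if c >= 2)


def pooled_fraction(domains: Domains) -> float:
    """Fraction of all mutations, pooled over domains, at multi-hit positions."""
    hit = sum(multi_hit_count(pos) for _, pos in domains.values())
    total = sum(len(pos) for _, pos in domains.values())
    return hit / total if total else 0.0


def randomized_baseline(domains: Domains, trials: int = 200, seed: int = 0) -> tuple[float, float]:
    """Mean and SD of pooled_fraction after redistributing each domain's
    mutations uniformly at random over its positions (assumed null model)."""
    rng = random.Random(seed)
    values = []
    for _ in range(trials):
        shuffled = {
            dom: (length, [rng.randrange(length) for _ in pos])
            for dom, (length, pos) in domains.items()
        }
        values.append(pooled_fraction(shuffled))
    mean = sum(values) / len(values)
    sd = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return mean, sd


observed = {"DOMAIN_A": (50, [11, 11, 11, 12, 30]), "DOMAIN_B": (120, [5, 5, 77])}
print(pooled_fraction(observed), randomized_baseline(observed))
```

A large gap between the observed fraction and the randomized mean is what the reported P-values quantify; the real analysis may use a different null model.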
Measure | Cancer | Non-cancer
Position-based hotspots at conserved positions | 58.1% | 51.2%
Feature-based hotspots at conserved positions | 67.6% | 61.7%
Correlation coefficient (DS-Score, conservation score) | 0.19 | 0.10
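The correlation row pairs each domain position's DS-Score with its conservation score. The slide does not say which coefficient was used; the sketch below assumes the Pearson coefficient and uses made-up per-position values purely to show the computation.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-position values for one domain (not from the study)
ds_scores = [0.2, 5.0, 0.0, 1.3, 0.4]
conservation = [0.5, 0.9, 0.3, 0.7, 0.6]
print(round(pearson(ds_scores, conservation), 2))
```

Values such as 0.19 and 0.10 indicate only a weak linear association, so hotspots are not simply a proxy for conservation.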
Gene | Type | Function
ALK | Oncogene | Receptor kinase
BRAF | Oncogene | Protein kinase
EGFR | Oncogene | Receptor kinase
FLT3 | Unknown | Receptor kinase
GNAI2 | Unknown | GTPase
GNAS | Oncogene | GTPase
HRAS | Oncogene | GTPase
KIT | Oncogene | Receptor kinase
KRAS | Oncogene | GTPase
MET | Oncogene | Receptor kinase
NRAS | Oncogene | GTPase
PDGFRA | Oncogene | Receptor kinase
RRAS2 | Oncogene | GTPase
Gene | Type | Function
ABL1 | Oncogene | Protein kinase
CDK4 | Unknown | Protein kinase
CHEK2 | Tumor suppressor | Cell cycle regulator
FGFR3 | Unknown | Receptor kinase
MAP2K3 | Unknown | Protein kinase
MEN1 | Tumor suppressor | DNA repair (unclear)
NF1 | Tumor suppressor | RAS pathway regulator
NTRK1 | Oncogene | Receptor kinase
PIK3CA | Oncogene | Lipid kinase
PTPN11 | Oncogene | Protein phosphatase
RET | Oncogene | Receptor kinase
STK11 | Tumor suppressor | Protein kinase
TGFBR2 | Unknown | Receptor kinase
WT1 | Tumor suppressor | Transcription factor
Method | Specificity (%) | Precision (%) | Precision of DS-Score with LogR.E-value (%)
SIFT (1) | 76.2 | 82.0 | N/A
LogR.E-value (2) | 78.2 | 81.3 | N/A
Position-based DS-Score | 99.5 | 91.6 | 95.7
Feature-based DS-Score | 98.6 | 87.2 | 91.0
Domain positions with ≥ 2 mutations | 94.2 | 85.6 | 91.9
Sensitivity: 3.3, 6.5 and 20.5%
1. Ng, P.C. and Henikoff, S. (2003) NAR 31, 3812-14.
2. Clifford, R.J., Edmonson, M.N., Nguyen, C., et al. (2004) Bioinformatics 20(7), 1006-14.
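The three reported metrics come from a confusion matrix. The sketch below uses the standard definitions; which mutations count as positives and negatives in the original evaluation is not stated in the slides, so the labels in the comments and the example counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Outcome counts for a classifier that flags mutations as disease-associated."""
    tp: int  # positives flagged by the method
    fp: int  # negatives wrongly flagged
    tn: int  # negatives correctly left unflagged
    fn: int  # positives missed

    def sensitivity(self) -> float:
        """Share of positives that are flagged: TP / (TP + FN)."""
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        """Share of negatives left unflagged: TN / (TN + FP)."""
        return self.tn / (self.tn + self.fp)

    def precision(self) -> float:
        """Share of flagged mutations that are positives: TP / (TP + FP)."""
        return self.tp / (self.tp + self.fp)


# Hypothetical counts, not taken from the table
c = ConfusionCounts(tp=20, fp=5, tn=95, fn=80)
print(c.sensitivity(), c.specificity(), c.precision())
```

A method that flags only strongly mutated hotspot positions rarely raises false alarms, so specificity and precision stay high even when most positives are missed, which fits the low sensitivities listed under the table.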
Cancer Mutation Prevalence (CaMP) score
Frequency of mutations in different
CaMP Scores consider neighboring bases
Wood, L.D., D.W. Parsons, S. Jones, et al., The genomic landscapes of human breast and colorectal cancers. Science, 2007. 318(5853): p. 1108-13.
We used domain-based counts of
We estimated the DL-Score or domain
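The slide text is truncated, but the first step of a domain landscape is counting mutations per domain rather than per gene. The sketch below assumes a hypothetical annotation of domain instances as protein-coordinate intervals and pools hits from every gene that carries the same domain family; the gene and domain identifiers are made up, and this is not the authors' pipeline.

```python
from collections import Counter

# Hypothetical annotation: gene -> list of (domain family id, start, end),
# protein coordinates, 1-based and inclusive.
Annotation = dict[str, list[tuple[str, int, int]]]


def domain_counts(mutations: list[tuple[str, int]], annotation: Annotation) -> Counter:
    """Count mutations per domain family, pooling hits across all genes
    that contain an instance of that family."""
    counts: Counter = Counter()
    for gene, pos in mutations:
        for family, start, end in annotation.get(gene, []):
            if start <= pos <= end:
                counts[family] += 1
    return counts


annotation = {
    "GENE1": [("DOMAIN_X", 10, 60), ("DOMAIN_Y", 80, 150)],
    "GENE2": [("DOMAIN_X", 5, 55)],
}
# (gene, protein position); GENE2:70 is outside any domain, GENE3 is unannotated
mutations = [("GENE1", 12), ("GENE1", 100), ("GENE2", 40), ("GENE2", 70), ("GENE3", 3)]
print(domain_counts(mutations, annotation))
```

Pooling by family is what lets rare mutations in individual genes add up to a visible signal at the domain level.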
TCGA sequencing projects used:
100 colon adenocarcinoma patients
522 breast invasive carcinoma patients
253 lung adenocarcinoma patients
Summary of somatic mutations occurring in the exomes of 100 colon cancer tumor samples. Synonymous SNVs and variants present in dbSNP (release 130) were removed because they are unlikely to be driver mutations.
Statistic | Value
Total patients | 100
Total mutations | 21,572
Total nonsynonymous SNVs | 17,174 (79.6%)
Total frameshift insertions | 2,527 (11.7%)
Total nonframeshift insertions | 239 (1.1%)
Total frameshift deletions | 5 (0.0%)
Total nonframeshift deletions | 0 (0.0%)
Total stop-loss SNVs | 33 (0.2%)
Total stop-gain SNVs | 1,594 (7.4%)
Mutations in domain regions | 10,647 (49.4%)
Average mutations per patient | 216 (± 552)
Range of mutations per patient | 21-4,880
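The percentages in the colon cancer summary can be checked directly from the raw counts. The short script below recomputes them and confirms that the mutation categories add up to the total; the ± 552 standard deviation cannot be reproduced without per-patient data.

```python
# Counts copied from the colon cancer summary table
TOTAL = 21_572
PATIENTS = 100
IN_DOMAINS = 10_647
by_type = {
    "nonsynonymous SNVs": 17_174,
    "frameshift insertions": 2_527,
    "nonframeshift insertions": 239,
    "frameshift deletions": 5,
    "nonframeshift deletions": 0,
    "stop-loss SNVs": 33,
    "stop-gain SNVs": 1_594,
}


def percent(count: int, total: int = TOTAL) -> float:
    """Share of all mutations, in percent, rounded to one decimal as in the table."""
    return round(100 * count / total, 1)


# The categories partition all mutations, so they must sum to the total
assert sum(by_type.values()) == TOTAL
for name, count in by_type.items():
    print(f"{name}: {count} ({percent(count)}%)")
print(f"in domain regions: {percent(IN_DOMAINS)}%")
print(f"mean per patient: {TOTAL / PATIENTS:.0f}")
```

Roughly half of all somatic mutations (49.4%) fall inside domain regions, which is the subset the domain-centric analysis works with.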
Nehrt LN, Peterson TA, Park DH and Kann MG. Domain landscapes of somatic mutations in cancer. BMC Genomics 13 (2012).
FILIP1L is known to inhibit proliferation and migration and to increase apoptosis in endothelial cells. It acts as a tumor suppressor, and its loss of function has been implicated in ovarian cancer, head and neck squamous cell carcinoma and oligodendrogliomas [38,39].
Domain view gives the functional context
Domain view reduces the space of inquiry
Majority of disease mutations in coding
Disease mutations tend to significantly
The DS-Score or domain significance score
DS-Score can be used to classify
Domain landscape allows for the
The domain landscape score is derived
A gradient of mutation prevalence in
Thanks!