How we built a global search engine for genetic data
Miro Cupak (@mirocupak), VP Engineering, DNAstack, 13/06/2018

What and why? Beacon Network (https://beacon-network.org/) is the largest search and discovery engine of human genetic mutations.
Outline: problem, standard, architecture, technologies, fun with stats
https://beacon-network.org
Sequencing cost is decreasing exponentially (3M-fold since 2000).
Source: https://www.nature.com/news/technology-the-1-000-genome-1.14901
Genomic data volume is increasing exponentially (1M-fold since 2000).
Source: http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002195
Up to 2 billion human genomes could be sequenced in the next 10 years, producing more data annually than is uploaded to Twitter and YouTube.
[Chart: projected data volumes by 2025 (GB) for Twitter, YouTube, and genomics, with lower and upper bounds]
Source: http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002195
research, disease advocacy, life science, and information technology
genetic data in the simplest of all technical contexts
Sources: http://ga4gh.org/, https://www.broadinstitute.org/files/news/pdfs/GAWhitePaperJune3.pdf, https://beacon-project.io/
contain a genetic variant of interest
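A beacon answers one yes/no question: does it contain a genetic variant of interest? As a minimal sketch, that question can be expressed as a single HTTP GET plus a check of the JSON answer. The parameter names (`referenceName`, `start`, `referenceBases`, `alternateBases`, `assemblyId`) and the `exists` response field are assumed from the GA4GH Beacon API and may differ between spec versions; the helper names and example host are hypothetical.

```python
import json
from typing import Optional
from urllib.parse import urlencode


def build_beacon_query(base_url: str, reference_name: str, start: int,
                       reference_bases: str, alternate_bases: str,
                       assembly_id: str = "GRCh37") -> str:
    """Build a GET URL asking one beacon whether it has a given allele.

    Parameter names are assumed from the GA4GH Beacon API; whether `start`
    is 0- or 1-based depends on the spec version a beacon implements.
    """
    params = {
        "referenceName": reference_name,
        "start": start,
        "referenceBases": reference_bases,
        "alternateBases": alternate_bases,
        "assemblyId": assembly_id,
    }
    return f"{base_url.rstrip('/')}/query?{urlencode(params)}"


def variant_exists(response_body: str) -> Optional[bool]:
    """Return the beacon's yes/no answer, or None if it gave no usable answer."""
    exists = json.loads(response_body).get("exists")
    # A missing or null "exists" (e.g. on error) is reported as unknown, not "no".
    return exists if isinstance(exists, bool) else None
```

For example, `build_beacon_query("https://beacon.example.org", "1", 100, "A", "G")` yields `https://beacon.example.org/query?referenceName=1&start=100&referenceBases=A&alternateBases=G&assemblyId=GRCh37`; the host and variant are made up for illustration.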
metadata and response, improved support for datasets and cross-dataset queries, data versioning
mutations
and processing its response
easily extensible query execution pipeline
URIs and parameters produced by the converters
raw response obtained by a fetcher
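The slides describe the query execution pipeline only in fragments: converters produce URIs and parameters, a fetcher obtains the raw response, and a final step processes it. The sketch below illustrates that shape in Python, assuming a parallel fan-out over beacons with a thread pool. Every name in it (`Query`, `Beacon`, `execute`, the per-beacon `converters`/`parsers` overrides) is hypothetical, and the query parameters and `exists` field are assumed from the GA4GH Beacon API rather than taken from Beacon Network's code.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Optional, Tuple
from urllib.parse import urlencode


@dataclass(frozen=True)
class Query:
    reference_name: str
    start: int
    reference_bases: str
    alternate_bases: str
    assembly_id: str


@dataclass(frozen=True)
class Beacon:
    beacon_id: str
    base_url: str


# Stage 1: a converter maps (beacon, query) to the URI, with parameters, that beacon understands.
Converter = Callable[[Beacon, Query], str]
# Stage 2: a fetcher obtains the raw response body for a URI (e.g. over HTTP).
Fetcher = Callable[[str], str]
# Stage 3: a parser turns a raw response into True / False, or None if unknown.
Parser = Callable[[str], Optional[bool]]


def default_converter(beacon: Beacon, query: Query) -> str:
    params = {
        "referenceName": query.reference_name,
        "start": query.start,
        "referenceBases": query.reference_bases,
        "alternateBases": query.alternate_bases,
        "assemblyId": query.assembly_id,
    }
    return f"{beacon.base_url.rstrip('/')}/query?{urlencode(params)}"


def default_parser(raw: str) -> Optional[bool]:
    try:
        exists = json.loads(raw).get("exists")
    except (ValueError, AttributeError):  # malformed JSON, or JSON that is not an object
        return None
    return exists if isinstance(exists, bool) else None


def execute(
    query: Query,
    beacons: Iterable[Beacon],
    fetcher: Fetcher,
    converters: Optional[Dict[str, Converter]] = None,
    parsers: Optional[Dict[str, Parser]] = None,
    max_workers: int = 8,
) -> Dict[str, Optional[bool]]:
    """Send the query to every beacon in parallel and collect normalized answers."""
    converters = converters or {}
    parsers = parsers or {}

    def run_one(beacon: Beacon) -> Tuple[str, Optional[bool]]:
        # Per-beacon overrides are how beacons with non-standard APIs get plugged in.
        convert = converters.get(beacon.beacon_id, default_converter)
        parse = parsers.get(beacon.beacon_id, default_parser)
        try:
            return beacon.beacon_id, parse(fetcher(convert(beacon, query)))
        except Exception:
            # A failed fetch or parse yields "unknown" instead of failing the whole search.
            return beacon.beacon_id, None

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(run_one, beacons))
```

In this design, supporting a beacon with a non-standard API means registering one converter or parser under its ID; the fan-out, error handling, and the other stages stay untouched.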
Reference assembly: GRCh37 83%, GRCh38 6%, others 11%
[Chart: category labels not recoverable; shares: others 39%, then 18%, 14%, 11%, 11%, 7%]
Variants (chromosome : position allele (gene), share):
13 : 32936732 C (BRCA2) 6%
2 : 38938 C (FAM110C) 6%
22 : 46546565 A (PPARA) 3%
2 : 45895 G (FAM110C) 3%
7 : 140453136 C (BRAF) 2%
1 : 43815163 C (MPL) 2%
1 : 115258747 A (NRAS) 1%
14 : 23894969 A (MYH7) 1%
2 : 29432776 C (ALK) 1%
2 : 212289100 C (ERBB4) 1%
Others 74%
[Histograms: number of variants (log scale) by score, one for SIFT and one for PolyPhen-2 HDIV]
SIFT (Sorting Intolerant From Tolerant): 69% damaging, 31% tolerated
PolyPhen-2 HDIV (Polymorphism Phenotyping v2): 55% probably damaging, 22% possibly damaging, 23% benign
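The percentages above bucket per-variant prediction scores into categories. Note that the two scores point in opposite directions: a low SIFT score suggests a damaging change, while a high PolyPhen-2 score does. A minimal sketch of that bucketing follows, assuming the cutoffs used by the dbNSFP database (SIFT below 0.05 is damaging; PolyPhen-2 HDIV at least 0.957 is probably damaging, 0.453 to 0.956 possibly damaging, at most 0.452 benign). These cutoffs are an assumption, not necessarily the ones behind the charts, and the function names are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, Iterable


def classify_sift(score: float) -> str:
    # Assumed cutoff (dbNSFP convention): lower SIFT scores are more damaging.
    return "damaging" if score < 0.05 else "tolerated"


def classify_polyphen2_hdiv(score: float) -> str:
    # Assumed HDIV cutoffs (dbNSFP convention): higher scores are more damaging.
    if score >= 0.957:
        return "probably damaging"
    if score >= 0.453:
        return "possibly damaging"
    return "benign"


def category_shares(scores: Iterable[float],
                    classify: Callable[[float], str]) -> Dict[str, float]:
    """Percentage of variants falling into each predicted category."""
    counts = Counter(classify(s) for s in scores)
    total = sum(counts.values())
    # An empty input yields an empty dict (no division is performed).
    return {category: 100 * n / total for category, n in counts.items()}
```

For instance, `category_shares([0.0, 0.01, 0.2, 0.9], classify_sift)` returns `{"damaging": 50.0, "tolerated": 50.0}`.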
[Histogram: number of variants (log scale) by allele frequency, 0.00 to 0.99]
| Rank | Symbol | Name | Share |
|---|---|---|---|
| 1 | FAM110C | Family With Sequence Similarity 110 Member C | 11% |
| 2 | BRCA1 | BRCA1, DNA Repair Associated | 10% |
| 3 | BRCA2 | BRCA2, DNA Repair Associated | 9% |
| 4 | PPARA | Peroxisome Proliferator Activated Receptor Alpha | 4% |
| 5 | ERBB4 | Erb-B2 Receptor Tyrosine Kinase 4 | 3% |
| 6 | BRAF | B-Raf Proto-Oncogene, Serine/Threonine Kinase | 3% |
| 7 | MPL | MPL Proto-Oncogene, Thrombopoietin Receptor | 2% |
| 8 | MYH7 | Myosin Heavy Chain 7 | 2% |
| 9 | KIT | KIT Proto-Oncogene Receptor Tyrosine Kinase | 1% |
| 10 | RET | Ret Proto-Oncogene | 1% |
| | Others | | 53% |
| Rank | OMIM | HPO |
|---|---|---|
| 1 | Pancreatic cancer, susceptibility to, 4 | Autosomal dominant inheritance |
| 2 | Breast-ovarian cancer, familial, 1 | Autosomal recessive inheritance |
| 3 | Fanconi anemia, complementation group D1 | Scoliosis |
| 4 | Prostate cancer | Short stature |
| 5 | Pancreatic cancer 2 | Cognitive impairment |
| 6 | Medulloblastoma | Constipation |
| 7 | Glioblastoma 3 | Somatic mutation |
| 8 | Breast-ovarian cancer, familial, 2 | Cafe-au-lait spot |
| 9 | Breast cancer, male, susceptibility to | Failure to thrive |
| 10 | Wilms tumor | Nausea and vomiting |
https://mirocupak.com