Kipoi: model zoo for genomics
Žiga Avsec
PhD candidate, Technical University of Munich www.gagneurlab.in.tum.de @gagneurlab, @KipoiZoo, @Avsecz
Genomics
ACGTGTCAGTAGTTAAGCTAGTAGCTGATCGGTAACGTAGTGCACGTGTCAGTAGTTAAGCTAGTAGCTGATC
3 billion
[Figure: Genome; Protein1, Protein2, Protein3 (~100k - 1M); Protein complex; Function1, Function2]
DNA: atcgtatatatcatgatatggatacgcatagatcatgactcaggatacg
Transcription
Pre-mRNA: aucaugauauggauacgcauagaucaugacuca
Splicing
mRNA: aucaugauacauagaucaugacuca
Translation
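The transcription and splicing steps can be reproduced on the slide's toy sequence with a few lines of Python. This is only a sketch: the region boundaries and the intron position are not stated in the talk, they are inferred here by aligning the three example strings, and transcription is simplified to copying the region and replacing t with u.

```python
# Toy illustration of transcription and splicing on the slide's sequence.
# All coordinates are assumptions inferred from the example strings.

DNA = "atcgtatatatcatgatatggatacgcatagatcatgactcaggatacg"
TRANSCRIPT_START, TRANSCRIPT_END = 9, 42  # assumed transcribed region (0-based, end-exclusive)
INTRON_START, INTRON_END = 9, 17          # assumed intron, relative to the transcript


def transcribe(dna: str, start: int, end: int) -> str:
    """Copy the transcribed region and write it in the RNA alphabet (t -> u)."""
    return dna[start:end].replace("t", "u")


def splice(pre_mrna: str, intron_start: int, intron_end: int) -> str:
    """Remove one intron and join the flanking exons."""
    return pre_mrna[:intron_start] + pre_mrna[intron_end:]


pre_mrna = transcribe(DNA, TRANSCRIPT_START, TRANSCRIPT_END)
mrna = splice(pre_mrna, INTRON_START, INTRON_END)
print(pre_mrna)
print(mrna)
```

With these assumed coordinates the two printed strings match the unspliced and spliced RNA shown on the slide.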
Reference: atcgtatatatcatgatatggatacgcatagatcatgactcaggatacg
Patient:   atcttatatatcatgatatggatacgcatagatcatgactcaggatacg
aucaugauauggauacgcauagaucaugacuca
aucaugauacauagaucaugacuca
cttatcacagtgtatatcatgatatggatacgcatagatcatgactcaggatacg
Experimental data Predictive models
GATA TAL
cttatcacagtgtatatcatgatatggatacgcatagatcatgactcaggatacg
Eraslan*, Avsec* et al., Nature Reviews Genetics 2019 (in press)
See also: https://github.com/greenelab/deep-review
Avsec et al, Nature Biotechnology (In press)
Model: a “parameterized function” from Input (TGATCGAGG, GTAGCTAGC, CGTGAGTTT) to Output, with Parameters
Can be implemented using: data-loader, model
data-loader → model

intervals.bed:
chr1 1000 2000
chr2 5000 7000

genome.fa:
>chr1
NNNNNNNNNNNN...

Data-loader steps: resize, extract, transform
array([[[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [1, 0, 0, 0], [ … ]], [[0, 1, 0, 0], [0, 0, 1, 0], [1, 0, 0, 0], [0, 0, 0, 1], [ … ]]])
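The resize, extract and transform steps can be sketched in plain Python. This is a minimal illustration, not Kipoi's data-loader implementation: the function names, the toy genome and the A, C, G, T column order are assumptions, chosen so that the result matches the array shown above. Nested lists stand in for a NumPy array.

```python
from typing import Dict, List, Tuple

ALPHABET = "ACGT"  # assumed column order of the one-hot encoding


def resize(start: int, end: int, width: int) -> Tuple[int, int]:
    """Resize an interval to a fixed width around its center."""
    center = (start + end) // 2
    new_start = max(center - width // 2, 0)
    return new_start, new_start + width


def extract(genome: Dict[str, str], chrom: str, start: int, end: int) -> str:
    """Extract the sequence of a 0-based, end-exclusive (BED-style) interval."""
    return genome[chrom][start:end].upper()


def one_hot(seq: str) -> List[List[int]]:
    """Transform a sequence into one row per base; unknown bases (N) become all zeros."""
    return [[int(base == letter) for letter in ALPHABET] for base in seq]


def load_batch(intervals, genome: Dict[str, str], width: int) -> List[List[List[int]]]:
    """Run resize -> extract -> transform for every interval."""
    return [
        one_hot(extract(genome, chrom, *resize(start, end, width)))
        for chrom, start, end in intervals
    ]


# Hypothetical toy genome and intervals (stand-ins for genome.fa and intervals.bed)
genome = {"chr1": "ACGATTGCA", "chr2": "CGATGGCTA"}
intervals = [("chr1", 0, 4), ("chr2", 0, 4)]
batch = load_batch(intervals, genome, width=4)
print(batch)
```

The sketch does not handle intervals that run past the end of a chromosome; a real data-loader would have to pad or skip those.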
[Figure: example input sequences TGATCGAGG, GTAGCTAGC, CGTGAGTTT and TGATC, GAGGA]
Supports multiple inputs/outputs
# Create and activate a new conda environment with
# all model dependencies installed
kipoi env create <Model>
source activate kipoi-<Model>

# Run model prediction
kipoi predict <Model> \
  --dataloader_args='{"intervals_file": "intervals.bed", "fasta_file": "hg38.fa"}' \
  ...
Works well with workflow-management tools like Snakemake
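When the command is generated from a script or a workflow rule, building the JSON for the data-loader arguments programmatically avoids quoting mistakes. The helper below is a hypothetical sketch, not part of Kipoi: it only assembles the argument list. The `-o` output flag and the file name `predictions.tsv` are assumptions here and should be checked against `kipoi predict --help`.

```python
import json
import shlex
from typing import Dict, List


def kipoi_predict_command(model: str, dataloader_args: Dict[str, str], output: str) -> List[str]:
    """Build the argument list for `kipoi predict` (hypothetical helper).

    The list can be passed to subprocess.run(); nothing is executed here.
    """
    return [
        "kipoi", "predict", model,
        "--dataloader_args=" + json.dumps(dataloader_args),
        "-o", output,  # assumed output flag
    ]


cmd = kipoi_predict_command(
    "<Model>",
    {"intervals_file": "intervals.bed", "fasta_file": "hg38.fa"},
    "predictions.tsv",  # hypothetical output path
)
print(shlex.join(cmd))  # shell-quoted form, e.g. for logging
```

Passing the list to `subprocess.run(cmd, check=True)` sidesteps shell quoting entirely; `shlex.join` is only used to print a copy-pasteable command line.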
In progress: [figure: input.data, Container, Model]
[Figure: Pre-trained model (DNA accessibility in 421 cell-types) and Model with transferred parameters (DNA accessibility in new cell type), each ending in Dense layers]
[Plot: area under the precision-recall curve vs. training epoch, Randomly initialized (>1 day) vs. Transferred (<4 h)]
Takes a few days to train (Divergent421 model in Kipoi)
See also Kelley et al., Genome Research 2016
Eraslan*, Avsec* et al., Nature Reviews Genetics 2019 (in press)
# Python
import kipoi
from kipoi_interpret.importance_scores.gradient import GradientXInput

model = kipoi.get_model("model")
imp_score = GradientXInput(model)
# seqs: a batch of model inputs (assumed to be prepared beforehand)
scores = imp_score.score(seqs)

# CLI
kipoi interpret create_mutation_map \
  <Model> \
  --dataloader_args='{"intervals_file": "intervals.bed", "fasta_file": "hg38.fa"}' \
  ...
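To make the idea behind gradient × input concrete, here is a self-contained toy version. It does not use `kipoi_interpret`: the model is a hypothetical logistic function over a one-hot sequence with made-up weights, and the gradient is approximated with central finite differences rather than backpropagation.

```python
import math
from typing import Callable, List

Matrix = List[List[float]]

# Hypothetical weights: rows are sequence positions, columns are A, C, G, T
WEIGHTS: Matrix = [
    [1.0, -1.0, 0.0, 0.0],
    [0.0, 2.0, 0.0, -1.0],
    [0.5, 0.0, 0.0, 0.0],
]


def toy_model(x: Matrix) -> float:
    """Logistic model on a one-hot encoded sequence (stand-in for a trained network)."""
    z = sum(w * v for w_row, x_row in zip(WEIGHTS, x) for w, v in zip(w_row, x_row))
    return 1.0 / (1.0 + math.exp(-z))


def numerical_gradient(f: Callable[[Matrix], float], x: Matrix, eps: float = 1e-5) -> Matrix:
    """Approximate df/dx element-wise with central differences."""
    grad = []
    for i, row in enumerate(x):
        grad_row = []
        for j in range(len(row)):
            up = [r[:] for r in x]
            down = [r[:] for r in x]
            up[i][j] += eps
            down[i][j] -= eps
            grad_row.append((f(up) - f(down)) / (2 * eps))
        grad.append(grad_row)
    return grad


def gradient_x_input(f: Callable[[Matrix], float], x: Matrix) -> Matrix:
    """Importance score per input element: gradient multiplied by the input value."""
    grad = numerical_gradient(f, x)
    return [[g * v for g, v in zip(g_row, x_row)] for g_row, x_row in zip(grad, x)]


# One-hot encoding of the sequence "ACT"
x = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
scores = gradient_x_input(toy_model, x)
for row in scores:
    print([round(s, 4) for s in row])
```

Because the input is one-hot, only the observed base at each position can receive a non-zero score, which is why the result is usually displayed as one value per position.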
linked to Beta thalassemia
Reference: atcgtatatatcatgatatggatacgcatagatcatgactcaggatacg
Patient:   atcttatatatcatgatatggatacgcatagatcatgactcaggatacg

#CHROM  POS       ID  REF  ALT  …
chr22   41320486  .   G    T    …
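The relationship between the two sequences and the VCF record can be illustrated by comparing them base by base. This is a sketch under simplifying assumptions: it only handles substitutions between equal-length sequences, and the reported positions are relative to the toy sequence, not the genomic coordinate in the record above.

```python
from typing import List, Tuple


def find_snvs(reference: str, patient: str) -> List[Tuple[int, str, str]]:
    """Return (1-based position, REF, ALT) for every mismatching base."""
    if len(reference) != len(patient):
        raise ValueError("sequences must have the same length (substitutions only)")
    return [
        (i + 1, ref_base.upper(), alt_base.upper())
        for i, (ref_base, alt_base) in enumerate(zip(reference, patient))
        if ref_base.lower() != alt_base.lower()
    ]


reference = "atcgtatatatcatgatatggatacgcatagatcatgactcaggatacg"
patient = "atcttatatatcatgatatggatacgcatagatcatgactcaggatacg"

for pos, ref, alt in find_snvs(reference, patient):
    # VCF-like columns: POS, ID, REF, ALT
    print(f"{pos}\t.\t{ref}\t{alt}")
```

The single difference found is a G to T substitution, the same REF and ALT alleles as in the VCF line.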
# Annotate VCF file with variant scores
kipoi veff score_variants <Model> \
  --dataloader_args='{"fasta_file": "hg38.fa"}' \
  ...
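One common way to score a variant, assumed here for illustration and not confirmed by the slide, is to run the model on the reference sequence and on the sequence carrying the alternative allele, then take the difference of the two predictions. The sketch below uses a hypothetical motif-matching function as a stand-in for a trained model; the motif `tcgta` is made up.

```python
from typing import Callable


def apply_snv(seq: str, pos: int, ref: str, alt: str) -> str:
    """Return seq with the base at 1-based position pos changed from ref to alt."""
    i = pos - 1
    if seq[i].upper() != ref.upper():
        raise ValueError(f"expected {ref} at position {pos}, found {seq[i]}")
    return seq[:i] + alt.lower() + seq[i + 1:]


def variant_effect(model: Callable[[str], float], seq: str, pos: int, ref: str, alt: str) -> float:
    """Variant score as the difference model(alternative) - model(reference)."""
    return model(apply_snv(seq, pos, ref, alt)) - model(seq)


def toy_model(seq: str, motif: str = "tcgta") -> float:
    """Hypothetical stand-in for a trained model: best fractional match to a made-up motif."""
    width = len(motif)
    best = max(
        sum(a == b for a, b in zip(seq[i:i + width], motif))
        for i in range(len(seq) - width + 1)
    )
    return best / width


reference = "atcgtatatatcatgatatggatacgcatagatcatgactcaggatacg"
effect = variant_effect(toy_model, reference, pos=4, ref="G", alt="T")
print(round(effect, 3))
```

A negative score means the toy model's prediction drops when the variant is introduced; real models may report other contrasts between the two predictions.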
atcgtatatatcatgatatggatacgcatagatcatgactcaggatacg
aucaugauauggauacgcauagaucaugacuca
aucaugauacauagaucaugacuca
Splicing

atcgtatatatcatgatatggatactcatagatcatgactcaggatacg
aucaugauauggauacucauagaucaugacuca
aucaugauacauataucaugacuca
Splicing
Scotti & Swanson, 2016 NRG
Donor, Acceptor, Branchpoint
MaxEntScan/3prime, MaxEntScan/5prime, HAL, labranchor
Kipoi models: KipoiSplice/4, KipoiSplice/4cons, MMSplice
Roman Kreuzhuber
Thorsten Beier