Transfer String Kernel for Cross-Context Sequence Specific DNA-Protein Binding Prediction
by Ritambhara Singh IIIT-Delhi June 10, 2016
Biology in a Slide (figure labels: cell, protein, RNA, DNA, organism)
DNA and Diseases: Down syndrome
Gene, Transcription Factor, Transcription Factor Binding Site
ATCGCGTAGCTAGGGATGACAGACACACATAATTCTAGATA
Transcription Factor, Gene, Genome
ATATCGTATCTTTTAAACCGGGTTGGCCACTAGA
ATATCGTATCTAAACCGCCTCGG
ChIP-seq Map for TF: Peak, Transcription Factor Binding Site
DNA
ATATCGTATCTTTTAAACCGGGTATGTAATGCAT
ATATCGTATCTAAACCGCCCGTGT
ATATCGTATCTTTTAAACCGGGTTGGCCAGTATA
ATATCGTATCTAAACCGCCCTGCA
Cell types: blood cell, stem cell, leukemia, lung cancer, cervical cancer, nerve cell, immunity-related
Source : http://genome.ucsc.edu/ENCODE/dataMatrix/encodeChipMatrixHuman.html
Genome
ATATCGTATAACAATAACCGGGAACTAATAGC
ATATCGTATCTAACAAATCCTACT
ChIP-seq Map for TF: Peak
[Figure: sequence logo and the corresponding position weight matrix of base counts (A, C, G, T) over motif positions 1–12]
Genome
ATATCGTATCTTTTAAACCGGGTTGGCCAATAGC
ATATCGTATCTAAACCGCCCTACT
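The sequence logo and position weight matrix summarize the binding sites found under the peaks, one column per motif position. A minimal sketch of how such a matrix can be built from aligned sites and then slid along a sequence, assuming equal-length toy sites, a uniform background of 0.25 per base, and a pseudocount of 1 (all illustrative choices, not taken from the slides):

```python
import math

BASES = "ACGT"


def count_matrix(sites):
    """Count each base at each position of equal-length, aligned sites."""
    length = len(sites[0])
    counts = {b: [0] * length for b in BASES}
    for site in sites:
        assert len(site) == length, "sites must be aligned to equal length"
        for i, base in enumerate(site):
            counts[base][i] += 1
    return counts


def log_odds_pwm(counts, pseudocount=1.0, background=0.25):
    """Turn counts into log2(p(base at position i) / background) scores."""
    length = len(counts["A"])
    n_sites = sum(counts[b][0] for b in BASES)
    pwm = {b: [0.0] * length for b in BASES}
    for i in range(length):
        for b in BASES:
            p = (counts[b][i] + pseudocount) / (n_sites + 4 * pseudocount)
            pwm[b][i] = math.log2(p / background)
    return pwm


def score(pwm, window):
    """Sum the per-position log-odds scores of a motif-length window."""
    return sum(pwm[base][i] for i, base in enumerate(window))


def best_hit(pwm, sequence):
    """Slide the PWM along the sequence; return (best_score, start)."""
    width = len(pwm["A"])
    return max((score(pwm, sequence[s:s + width]), s)
               for s in range(len(sequence) - width + 1))


# Hypothetical aligned binding sites (not from the talk).
sites = ["TACGTA", "TACGTT", "TACCTA", "AACGTA"]
pwm = log_odds_pwm(count_matrix(sites))
best_score, start = best_hit(pwm, "GGGGTACGTAGGGG")
```

With these made-up sites the consensus is TACGTA, so the best-scoring window in `GGGGTACGTAGGGG` is the one starting at index 4.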
Source : http://www.cbil.upenn.edu/EpoDB/release/version_2.2/meme/meme-output.html#sample
Genome
ATATCGTATCTTTTAAACCGGGTTGGCCAATAGC
ATATCGTATCTAAACCGCCCTACT
ATATCGTATCTTTTAAACCGGGTTGGCCAATAGC
Peak
ATATCGTATCTAAACCGCCCTACT
Genome
Support Vector Machine
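The classifier in this work is a support vector machine over DNA sequences. As a hedged illustration of an SVM that operates on strings through a kernel, the sketch below uses the simple k-spectrum kernel (the inner product of k-mer count vectors, i.e. the mismatch kernel with zero mismatches) and scikit-learn's `SVC(kernel="precomputed")`. The toy sequences, labels, k = 3, and C = 1 are assumptions for illustration, not the author's data or settings:

```python
from collections import Counter

import numpy as np
from sklearn.svm import SVC


def kmer_counts(seq, k):
    """Count every contiguous k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(seqs_a, seqs_b, k=3):
    """Gram matrix K[i, j] = inner product of the k-mer count vectors."""
    feats_a = [kmer_counts(s, k) for s in seqs_a]
    feats_b = [kmer_counts(s, k) for s in seqs_b]
    K = np.zeros((len(seqs_a), len(seqs_b)))
    for i, fa in enumerate(feats_a):
        for j, fb in enumerate(feats_b):
            # Counter returns 0 for k-mers absent from fb.
            K[i, j] = sum(c * fb[kmer] for kmer, c in fa.items())
    return K


# Hypothetical toy data: positives (y = +1) share a CACGTG-like site.
train = ["ATCACGTGAT", "GGCACGTGCC", "TTCACGTGAA",
         "ATATATATAT", "GGGGCCCCGG", "TTTTAAAATT"]
y = np.array([+1, +1, +1, -1, -1, -1])

clf = SVC(kernel="precomputed", C=1.0)
clf.fit(spectrum_kernel(train, train), y)

# Rows: query sequences; columns: training sequences.
queries = ["CCCACGTGTT", "ATATATTTAT"]
pred = clf.predict(spectrum_kernel(queries, train))
```

Precomputing the Gram matrix keeps the string handling outside the SVM library, which is also how a custom string kernel would typically be plugged in.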
Source Context → Feature Conversion; Target Context → Feature Conversion; Knowledge Transfer: Training (KMM); Classification
ATCGAT GTATAC ATACAT GCTTAC
Xs Xt
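The pipeline names Kernel Mean Matching (KMM) as the knowledge-transfer step between the source-context features Xs and the target-context features Xt. In its common formulation, KMM picks a non-negative weight for each source instance so that the weighted source mean in the kernel feature space is close to the target mean, which reduces to the quadratic program min 0.5 bᵀKb - κᵀb subject to 0 <= b_i <= B and |Σ b_i - n_s| <= n_s·eps. The sketch below solves it with SciPy's general-purpose SLSQP solver rather than a dedicated QP solver; the RBF kernel on 1-D toy points, B = 1000, and eps = 0.1 are illustrative assumptions (in the talk's setting the kernel matrices would come from the string kernel):

```python
import numpy as np
from scipy.optimize import minimize


def rbf_kernel(X, Y, gamma=2.0):
    """Gaussian RBF kernel matrix between the rows of X and of Y."""
    sq_dist = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dist)


def kmm_weights(K_ss, K_st, B=1000.0, eps=0.1):
    """Kernel Mean Matching weights for the source instances.

    K_ss: (n_s, n_s) kernel among source instances.
    K_st: (n_s, n_t) kernel between source and target instances.
    Solves  min_b 0.5 * b^T K_ss b - kappa^T b
            s.t.  0 <= b_i <= B  and  |sum(b) - n_s| <= n_s * eps.
    """
    n_s, n_t = K_st.shape
    kappa = (n_s / n_t) * K_st.sum(axis=1)

    def objective(b):
        return 0.5 * b @ K_ss @ b - kappa @ b

    def gradient(b):
        return K_ss @ b - kappa  # valid because K_ss is symmetric

    constraints = [
        # sum(b) >= n_s * (1 - eps)
        {"type": "ineq", "fun": lambda b: b.sum() - n_s * (1 - eps)},
        # sum(b) <= n_s * (1 + eps)
        {"type": "ineq", "fun": lambda b: n_s * (1 + eps) - b.sum()},
    ]
    result = minimize(objective, x0=np.ones(n_s), jac=gradient,
                      bounds=[(0.0, B)] * n_s, constraints=constraints,
                      method="SLSQP")
    return result.x


# Toy 1-D data (assumption): the target context lies to the right.
source = np.linspace(-2.0, 2.0, 9)[:, None]
target = np.linspace(0.0, 2.0, 5)[:, None]
beta = kmm_weights(rbf_kernel(source, source), rbf_kernel(source, target))
```

Source points inside the target range should end up with larger weights than points far to its left; those weights are what an importance-weighted classifier would then consume.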
Negative instances (y = -1), positive instances (y = +1)
Source Context → Feature Conversion: Mismatch String Kernel
Target Context → Feature Conversion: Mismatch String Kernel
Knowledge Transfer: Kernel Mean Matching (KMM)
Training / Classification: Importance Re-weighting SVM
ATCGATCG ATCGATCG CCCGATCG CTCGCTCC
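The pipeline converts both contexts with the mismatch string kernel and trains an importance re-weighting SVM. Under the (k, m)-mismatch kernel, every k-mer in a sequence adds a count to each k-mer within Hamming distance m of it, and the kernel is the inner product of those count vectors. A minimal sketch, assuming k = 3, m = 1, toy sequences, and hand-picked per-instance weights that merely stand in for KMM output; the weights are passed through the `sample_weight` argument of scikit-learn's `SVC.fit`, which rescales C per instance:

```python
from collections import Counter
from itertools import combinations, product

import numpy as np
from sklearn.svm import SVC

ALPHABET = "ACGT"


def neighborhood(kmer, m):
    """All k-mers within Hamming distance m of kmer (including itself)."""
    neighbors = {kmer}
    for d in range(1, m + 1):
        for positions in combinations(range(len(kmer)), d):
            for subs in product(ALPHABET, repeat=d):
                chars = list(kmer)
                for pos, base in zip(positions, subs):
                    chars[pos] = base
                neighbors.add("".join(chars))
    return neighbors


def mismatch_features(seq, k=3, m=1):
    """Mismatch feature map: counts over all k-mers near the sequence's k-mers."""
    phi = Counter()
    for i in range(len(seq) - k + 1):
        for neighbor in neighborhood(seq[i:i + k], m):
            phi[neighbor] += 1
    return phi


def mismatch_kernel(seqs_a, seqs_b, k=3, m=1):
    """Gram matrix of inner products between mismatch feature maps."""
    fa = [mismatch_features(s, k, m) for s in seqs_a]
    fb = [mismatch_features(s, k, m) for s in seqs_b]
    K = np.zeros((len(fa), len(fb)))
    for i, pa in enumerate(fa):
        for j, pb in enumerate(fb):
            K[i, j] = sum(c * pb[u] for u, c in pa.items())
    return K


# Hypothetical labelled source-context sequences.
train = ["ATCACGTGAT", "GCCACGTGGC", "TACACGTGTA",
         "ATATATATAT", "GCGCGCGCGC", "TTTTAAAATT"]
y = np.array([+1, +1, +1, -1, -1, -1])
# Hypothetical importance weights, standing in for KMM output.
beta = np.array([1.5, 0.5, 1.0, 1.0, 0.8, 1.2])

clf = SVC(kernel="precomputed", C=1.0)
clf.fit(mismatch_kernel(train, train), y, sample_weight=beta)
scores = clf.decision_function(mismatch_kernel(["CCCACGTGTT"], train))
```

Explicit enumeration of neighborhoods is fine for small k and m; it grows quickly with both, which is why dedicated implementations avoid building the full feature vectors.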
[Figure: AUC score of TSK and SK for the transcription factors Sin3a, Max, Mxi1, Chd2, and Ctcf]
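The comparison is reported as area under the ROC curve (AUC). As a reminder of what that number measures, AUC equals the fraction of (positive, negative) pairs in which the positive instance receives the higher classifier score, with ties counted as one half. A small sketch of that pairwise definition, using made-up scores and the y = +1 / y = -1 labelling:

```python
def roc_auc(scores, labels):
    """AUC as the fraction of correctly ordered (positive, negative) pairs."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y != 1]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ordering
    return wins / (len(pos) * len(neg))


# Hypothetical classifier scores for three positives and two negatives.
auc = roc_auc([0.9, 0.8, 0.3, 0.6, 0.2], [1, 1, 1, -1, -1])
```

Here five of the six pairs are ordered correctly (only 0.3 vs 0.6 is not), so the AUC is 5/6.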