Pretata: predicting TATA binding proteins with novel features and dimensionality reduction strategy
Shixiang Wan
Tianjin University, China
Email: shixiangwan@tju.edu.cn

Contents
1. Introduction
2. Methods and Materials
3. Results and ...
Pretata workflow
Data Preparation: input TBP sequences; CD-HIT (reduce redundancy); negative samples collection (for imbalanced data)
Step 1. Feature Representation: 188D, 473D, 611D
Step 2. Classifier Prediction: LibD3C, LibSVM, IBK, Random Forest, Bagging
Step 3. Optimal Solution
Three improvements
Prediction Results: TBP or not?
Negative samples collection: 559 positive sequences, 559 negative sequences
[Diagram labels: Extract; Take out; Replenish negative-class]
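The balanced 559/559 split above can be illustrated with a minimal sketch. It assumes plain random undersampling of a larger negative pool, which is a simple stand-in and not necessarily the extract/take out/replenish procedure shown on the slide. The function name `balance_negatives`, the pool size of 5000, and the sequence identifiers are all hypothetical.

```python
import random


def balance_negatives(positives, negative_pool, seed=0):
    """Draw as many negatives as there are positives, without replacement.

    Assumption: simple random undersampling; the slide's own procedure
    (extract / take out / replenish) may differ.
    """
    if len(negative_pool) < len(positives):
        raise ValueError("negative pool is smaller than the positive set")
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    return rng.sample(negative_pool, len(positives))


# Hypothetical identifiers; only the count of 559 comes from the slide.
positives = [f"pos_{i}" for i in range(559)]
negative_pool = [f"neg_{i}" for i in range(5000)]
negatives = balance_negatives(positives, negative_pool)
print(len(positives), len(negatives))  # prints: 559 559
```

Sampling without replacement keeps every negative sequence unique, so the two classes end up the same size without duplicated training examples.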
[Diagram: two-step dimensionality reduction search]
Primary step: initial dimension; dimension (1), dimension (2), dimension (3), ..., dimension (k-1), dimension (k); tolerable dimension
Secondary step: dimension (1,1), dimension (1,2), ..., dimension (1,k) through dimension (k,1), dimension (k,2), ..., dimension (k,k)
Output: accuracy of each candidate; the best accuracy
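One way to read the primary and secondary steps is as a coarse search over candidate dimensions followed by a finer search around the best coarse candidate. The sketch below works under that assumption only. The function `two_stage_dimension_search`, the `evaluate` callback (which would return the accuracy obtained with the top `k` features), and the step sizes are hypothetical, and the toy scoring function at the end is made up purely to exercise the code.

```python
def two_stage_dimension_search(evaluate, max_dim, coarse_step=40, fine_step=1):
    """Return (best_dimension, best_score) from a coarse-then-fine search.

    evaluate(k) is a caller-supplied function giving the score (for
    example, cross-validated accuracy) when only k features are kept.
    """
    # Primary step: score a coarse grid of dimensions up to max_dim.
    coarse = list(range(coarse_step, max_dim + 1, coarse_step))
    if not coarse or coarse[-1] != max_dim:
        coarse.append(max_dim)
    best_coarse = max(coarse, key=evaluate)

    # Secondary step: score every fine_step-th dimension in the window
    # around the best coarse candidate.
    low = max(1, best_coarse - coarse_step + 1)
    high = min(max_dim, best_coarse + coarse_step - 1)
    best_fine = max(range(low, high + 1, fine_step), key=evaluate)
    return best_fine, evaluate(best_fine)


# Toy score that peaks at 137 dimensions (illustrative only, not real data).
def toy_score(k):
    return -abs(k - 137)


best_dim, best_score = two_stage_dimension_search(toy_score, max_dim=611)
print(best_dim, best_score)  # prints: 137 0
```

The coarse pass keeps the number of classifier evaluations small, and the fine pass spends the remaining budget only where the coarse pass suggests the optimum lies. It can miss the optimum if the score has several separated peaks.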
Accuracy by classifier and feature set
Classifier      188D      473D      611D
LibD3C          79.25%    79.57%    86.29%
LibSVM          81.84%    82.80%    90.46%
IBK             78.44%    76.97%    82.71%
RandomForest    76.92%    77.96%    79.84%
Bagging         79.74%    80.29%    87.66%
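Choosing the best classifier and feature-set pair from this comparison amounts to taking a maximum over the accuracy values. The sketch below assumes the fifteen chart values are grouped per classifier in the order 188D, 473D, 611D; that grouping is inferred from the chart legend rather than stated on the slide. The variable names are hypothetical.

```python
# Accuracy (%) per classifier and feature set, as read from the chart.
# Assumption: values are grouped per classifier in 188D, 473D, 611D order.
accuracy = {
    "LibD3C": {"188D": 79.25, "473D": 79.57, "611D": 86.29},
    "LibSVM": {"188D": 81.84, "473D": 82.80, "611D": 90.46},
    "IBK": {"188D": 78.44, "473D": 76.97, "611D": 82.71},
    "RandomForest": {"188D": 76.92, "473D": 77.96, "611D": 79.84},
    "Bagging": {"188D": 79.74, "473D": 80.29, "611D": 87.66},
}

# Flatten to (classifier, feature set, accuracy) and take the maximum.
best = max(
    (
        (classifier, features, value)
        for classifier, row in accuracy.items()
        for features, value in row.items()
    ),
    key=lambda item: item[2],
)
print(best)  # prints: ('LibSVM', '611D', 90.46)
```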
[Chart: vertical axis from 0% to 100%]
ACC 92.92%, SN 95.50%, SP 87.30%
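ACC, SN and SP are conventionally read as accuracy, sensitivity and specificity. The sketch below uses the standard confusion-matrix definitions, which are assumed here because the slide does not spell them out. The function name `binary_metrics` and the example counts are hypothetical and are not the counts behind the reported percentages.

```python
def binary_metrics(tp, fn, tn, fp):
    """Standard accuracy, sensitivity and specificity from confusion counts."""
    return {
        "ACC": (tp + tn) / (tp + fn + tn + fp),  # all correct / all samples
        "SN": tp / (tp + fn),                    # true positive rate
        "SP": tn / (tn + fp),                    # true negative rate
    }


# Illustrative counts only (not taken from the slides).
metrics = binary_metrics(tp=90, fn=10, tn=80, fp=20)
print({name: round(value, 4) for name, value in metrics.items()})
# prints: {'ACC': 0.85, 'SN': 0.9, 'SP': 0.8}
```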
[Chart: vertical axis from 60% to 100%; horizontal axis from 10 to 650 in steps of 40]
ACC 92.92%, SN 95.50%, SP 87.30%