CHASM Taylor Jaraczewski Background Yet again.. Drivers vs. - - PowerPoint PPT Presentation



SLIDE 1

CHASM

Taylor Jaraczewski

SLIDE 2

Background

  • Yet again: drivers vs. passengers
  • Only a very small fraction of mutations in a tumor drive proliferation (hill vs. mountains)
  • Need ways to determine drivers NOT based on frequency
  • CHASM focuses on missense mutations
    – Make up the majority of mutations

SLIDE 3

Random Forest Classification

1) Decision Trees

SLIDE 4

Random Forest Classifier

SLIDE 5

Feature Selection

  • A feature capable of correct classification would require 2.05 bits of info; the top feature had 0.37
  • Chose 49 features, determined by mutual information
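
The mutual-information ranking above can be sketched for discrete features as follows. This is a minimal illustration, not the CHASM implementation: the function names (`entropy`, `mutual_information`, `top_k_features`) and the toy data are assumptions made for the example, and continuous features would first need to be discretized.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy, in bits, of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def mutual_information(feature, labels):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), in bits, for two discrete sequences."""
    joint = list(zip(feature, labels))
    return entropy(feature) + entropy(labels) - entropy(joint)


def top_k_features(columns, labels, k):
    """Rank named feature columns by mutual information with the labels; keep the top k."""
    scored = {name: mutual_information(col, labels) for name, col in columns.items()}
    return sorted(scored, key=scored.get, reverse=True)[:k]


# Toy data (hypothetical): one feature that tracks the label, one that does not.
labels = ["driver", "driver", "passenger", "passenger"]
columns = {
    "perfect": [1, 1, 0, 0],
    "noise": [0, 1, 0, 1],
}

for name, col in columns.items():
    print(name, mutual_information(col, labels))
```

With a real feature table, a call such as `top_k_features(columns, labels, 49)` would keep the 49 highest-scoring features.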

SLIDE 6

General Random Forest Info

  • Used 500 trees
  • Used known drivers and synthetic passengers for feature selection and classifier training
  • Mtry = 7
    – Number of variables available for splitting at each node
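
The two settings above can be illustrated with a short sketch. It shows only the mechanics they control (a random subset of `Mtry` candidate features drawn at each node, and votes pooled over all trees), not a full random forest and not the CHASM code. The helper names and the example vote split are hypothetical; the feature count of 49 is taken from the feature-selection slide.

```python
import random
from collections import Counter

N_TREES = 500     # number of trees, from the slide
MTRY = 7          # features available for splitting at each node, from the slide
N_FEATURES = 49   # features kept after feature selection


def candidate_features(rng, n_features=N_FEATURES, mtry=MTRY):
    """Draw the random subset of feature indices that one node may split on."""
    return rng.sample(range(n_features), mtry)


def forest_vote(tree_predictions):
    """Return the fraction of trees voting for each class label."""
    total = len(tree_predictions)
    return {label: n / total for label, n in Counter(tree_predictions).items()}


rng = random.Random(0)
subset = candidate_features(rng)
print("candidate features at this node:", sorted(subset))

# Hypothetical outcome: 320 of the 500 trees call the mutation a driver.
votes = forest_vote(["driver"] * 320 + ["passenger"] * (N_TREES - 320))
print(votes)
```

A smaller `Mtry` makes individual trees less correlated with each other at the cost of weaker individual splits, which is the usual reason this parameter is tuned.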

SLIDE 7

Comparison to Other Methods

Receiver Operating Characteristic (ROC)

  • Points that represent the trade-off between sensitivity (fraction of drivers correctly classified) and specificity (fraction of passengers correctly classified)

Precision Recall

  • Points that represent the trade-off between precision (fraction of true drivers out of all predicted drivers) and recall (sensitivity)
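
Each point on either curve comes from one score threshold. The sketch below computes the four quantities defined above at a given threshold. The scores and labels are invented toy values, the function names are hypothetical, and it assumes a higher score means "more driver-like", which may not match the scoring direction of any particular tool.

```python
def confusion_counts(scores, is_driver, threshold):
    """Count TP, FP, TN, FN when every score >= threshold is called a driver."""
    tp = fp = tn = fn = 0
    for score, truth in zip(scores, is_driver):
        called_driver = score >= threshold
        if called_driver and truth:
            tp += 1
        elif called_driver:
            fp += 1
        elif truth:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def metrics(scores, is_driver, threshold):
    """Sensitivity (recall), specificity, and precision at one threshold."""
    tp, fp, tn, fn = confusion_counts(scores, is_driver, threshold)
    return {
        "threshold": threshold,
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        # Assumed convention: precision is 1.0 when nothing is called a driver.
        "precision": tp / (tp + fp) if tp + fp else 1.0,
    }


# Toy data (hypothetical): classifier scores and true labels.
scores = [0.9, 0.8, 0.6, 0.4, 0.2]
is_driver = [True, True, False, True, False]

# One point per distinct threshold; plotting is left out.
curve = [metrics(scores, is_driver, t) for t in sorted(set(scores), reverse=True)]
for point in curve:
    print(point)
```

An ROC curve plots sensitivity against 1 - specificity over these points, while a precision-recall curve plots precision against sensitivity.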

SLIDE 8

Other Models

  • PolyPhen: Uses Bayes classification; queries BLAST database to predict impact of amino acid substitution on the structure/function of proteins
  • SIFT: Provides score for probability that a missense mutation will be tolerated
  • CanPredict: Combination of SIFT score, LogRE score, and GOSS score to train a random forest classifier
  • KinaseSVM: Uses protein kinases

SLIDE 9

GBM