Real-World Applications of Boosting
Yoav Freund, UCSD

Practical Advantages of AdaBoost
- Fast; simple and easy to program.
- Shift in mind set: the goal now is merely to find classifiers barely better than random guessing.
- Binary classification.
Caveats
- Weak classifiers that are too complex → overfitting.
- Weak classifiers that are too weak → underfitting → low margins → overfitting.
- Sensitive to noise.
UCI Experiments
[with Freund]
Weak classifiers: decision stumps, e.g. "height > 5 feet?" or "eye color = brown?", with each branch (yes/no) predicting +1 or -1.
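A minimal sketch of boosting decision stumps; the helper names and the toy dataset in the test are mine, and the real experiments used the UCI benchmarks, not this code:

```python
import math

def stump_predict(x, feature, threshold):
    """Decision stump: +1 if the chosen feature exceeds the threshold, else -1."""
    return 1 if x[feature] > threshold else -1

def best_stump(X, y, w):
    """Exhaustively pick the (feature, threshold) pair with lowest weighted error."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            err = sum(wi for x, yi, wi in zip(X, y, w)
                      if stump_predict(x, f, t) != yi)
            if best is None or err < best[0]:
                best = (err, f, t)
    return best  # (weighted error, feature index, threshold)

def adaboost(X, y, rounds=10):
    """AdaBoost over stumps; returns a list of (alpha, feature, threshold)."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, f, t = best_stump(X, y, w)
        err = max(err, 1e-10)                    # guard against zero error
        alpha = 0.5 * math.log((1 - err) / err)  # weak rule's vote weight
        ensemble.append((alpha, f, t))
        # Reweight: up-weight the examples this stump got wrong.
        w = [wi * math.exp(-alpha * yi * stump_predict(x, f, t))
             for x, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def ensemble_predict(ensemble, x):
    """Sign of the weighted vote of all weak rules."""
    score = sum(a * stump_predict(x, f, t) for a, f, t in ensemble)
    return 1 if score >= 0 else -1
```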
UCI Results
[Scatter plots of test error (0-30%) on the UCI benchmarks: boosting stumps vs. C4.5, and boosting C4.5 vs. C4.5]
Boosting Stumps (for text classification)
Area code, AT&T service, billing credit, calling card, collect, competitor, dial assistance, directory, how to dial, person to person, rate, third party, time charge, time
Schapire, Singer, Gorin 98
Examples
- Utterance fragments: "... please ...", "... my office ...", "... the wrong number because I got the wrong party and I would like to have that taken off my bill"
- Candidate categories: collect, third party, billing credit, calling card
Weak rules generated by "boostexter": each rule is tied to a word and a category (e.g. calling card, collect call, third party) and outputs one prediction when the word occurs and another when it does not.
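Such word-based weak rules can be sketched as follows; BoosTexter's real rules are learned from data, while the words, confidence values, and helper names below are invented for illustration:

```python
def word_rule(word, c_present, c_absent):
    """A BoosTexter-style weak rule: output one confidence value when `word`
    occurs in the utterance and another when it does not. Positive values
    vote for the category, negative against; magnitude encodes confidence."""
    def rule(utterance):
        return c_present if word in utterance.split() else c_absent
    return rule

# Hypothetical rules for a "collect call" category: the word "collect"
# votes strongly for it when present, mildly against when absent.
collect_rule = word_rule("collect", +1.5, -0.3)
card_rule = word_rule("card", +0.8, -0.1)

def category_score(utterance, rules):
    """Sum the weak rules' confidence-rated votes; the sign gives the decision."""
    return sum(r(utterance) for r in rules)
```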
Results
- Hand transcribed: 90%
- Machine transcribed: 75%
2/17/2006 CTBP
Face Detection (Viola and Jones)
Face Detection as a Filtering process
[Figure: windows scanned over 50,000 locations/scales per image, from the smallest scale to larger scales; the most negative windows shown]
Classifier is Learned from Labeled Data
Image Features
Unique features: "rectangle filters", similar to the Haar wavelets of Papageorgiou et al.
h_t(x_i) = 1 if f_t(x_i) > θ_t, and 0 otherwise
Very fast to compute using the "integral image"; the features are combined using AdaBoost.
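The integral-image trick can be sketched as follows (function names are mine; the two-rectangle layout is one of several feature types Viola and Jones use):

```python
def integral_image(img):
    """ii[y][x] = sum of img over the rectangle [0, y) x [0, x).
    Padded with a leading row/column of zeros so lookups need no bounds checks."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def rect_sum(ii, top, left, height, width):
    """Sum of any rectangle in O(1) using four integral-image lookups."""
    b, r = top + height, left + width
    return ii[b][r] - ii[top][r] - ii[b][left] + ii[top][left]

def two_rect_feature(ii, top, left, height, width):
    """A two-rectangle Haar-like feature: left half minus right half."""
    half = width // 2
    return (rect_sum(ii, top, left, height, half)
            - rect_sum(ii, top, left + half, height, half))
```

Once the integral image is built, every rectangle feature costs a constant number of lookups, which is what makes scanning tens of thousands of windows feasible.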
University of Washington 15
Example Classifier for Face Detection
ROC curve for the 200-feature classifier
Employing a cascade to minimize average feature computation time
The accurate detector combines 6000 simple features using AdaBoost; in most boxes, only 8-9 features are calculated.
- Features 1-3 are computed on all boxes: windows that are definitely not a face are rejected; windows that might be a face continue.
- Features 4-10 are computed only on the surviving windows, and so on down the cascade.
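The cascade idea can be sketched like this; the thresholds and feature functions below are hypothetical stand-ins, not the trained values:

```python
def make_stage(threshold, features):
    """One cascade stage: sum its features' scores, reject below threshold."""
    def stage(window):
        return sum(f(window) for f in features) >= threshold
    return stage

def cascade_detect(window, stages):
    """Evaluate stages in order. A window that fails any stage is rejected
    immediately, so most windows only ever pay for the first few features."""
    for stage in stages:
        if not stage(window):
            return False   # definitely not a face
    return True            # passed every stage: might be a face
```

The early stages use very few features tuned for high recall; almost all of the 50,000 windows per image are rejected there, which is what keeps the average cost near 8-9 features.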
[Scatter plot: grey-scale detection score vs. subtract-average detection score]
Using confidence to avoid labeling
Levin, Viola, Freund 2003
Image 1
Image 1 - diff from time average
Image 2
Image 2 - diff from time average
Co-training
Highway images
Raw B/W image → partially trained B/W-based classifier → confident predictions; difference image → partially trained diff-based classifier → confident predictions. Each classifier's confident predictions are fed back as labels for training the other.
Blum and Mitchell 98
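The co-training loop can be sketched as below, assuming caller-supplied `train` and `confident` callables (in the real system the two views are the raw B/W image and the difference image; here, for brevity, both views share one toy representation):

```python
def co_train(labeled, unlabeled, train, confident, rounds=5):
    """Co-training sketch (after Blum & Mitchell): two classifiers are grown
    from the same labeled seed, and each one's *confident* predictions on the
    unlabeled pool become training labels for the other.
    `train(pairs)` fits a classifier from (x, y) pairs; `confident(clf, x)`
    returns a label or None. Both callables are caller-supplied."""
    views = [list(labeled), list(labeled)]   # training pairs per view
    pool = list(unlabeled)
    for _ in range(rounds):
        clf_a, clf_b = train(views[0]), train(views[1])
        remaining = []
        for x in pool:
            ya, yb = confident(clf_a, x), confident(clf_b, x)
            if ya is not None:
                views[1].append((x, ya))   # A's confident call trains B
            elif yb is not None:
                views[0].append((x, yb))   # B's confident call trains A
            else:
                remaining.append(x)        # nobody is sure yet; keep it
        pool = remaining
    return train(views[0]), train(views[1])
```

The point of the confidence threshold is exactly the slide's theme: examples neither view is confident about are the only ones that would need human labels.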
[Scatter plot: grey-scale detection score vs. subtract-average detection score, cars vs. non-cars]
Co-Training Results
[ROC curves for the raw image detector and the difference image detector, before and after co-training]
With Llew Mason
Decision Trees
[Figure: a decision tree with root test X>3; the yes-branch tests Y>5; the leaves predict +1 or -1, partitioning the (X, Y) plane at X=3 and Y=5]
Decision tree as a sum
[Figure: the same tree rewritten as a sum of node contributions (values such as 0.1 and 0.2 attached to the branches of the X>3 and Y>5 tests); the prediction is the sign of the sum of values accumulated along the path]
An alternating decision tree
[Figure: an alternating decision tree with a root prediction value, splitter nodes X>3, Y>5 and Y<1, and prediction values such as 0.0, 0.1, 0.2, 0.7 on their branches; the classification is the sign of the sum of all prediction values reached by the instance]
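Evaluating an alternating decision tree can be sketched as follows; the tests mirror the slide's X>3, Y>5, Y<1, but the prediction values and the tree layout here are invented for illustration:

```python
def adt_score(x, root_value, splitters):
    """Score an instance x (a dict of features) with an alternating decision
    tree. Each splitter is (precondition, condition, v_yes, v_no): when its
    precondition holds (i.e. the instance reaches that node), it adds one of
    its two prediction values. The classification is the sign of the score."""
    score = root_value
    for precondition, condition, v_yes, v_no in splitters:
        if precondition(x):
            score += v_yes if condition(x) else v_no
    return score

# Hypothetical tree: the second splitter hangs under the yes-branch of X>3,
# the third under its no-branch, so their preconditions gate them.
splitters = [
    (lambda x: True,              lambda x: x["X"] > 3, +0.2, -0.1),
    (lambda x: x["X"] > 3,        lambda x: x["Y"] > 5, +0.7, -0.2),
    (lambda x: not (x["X"] > 3),  lambda x: x["Y"] < 1,  0.0, +0.7),
]
```

Unlike an ordinary decision tree, an instance can accumulate values from several branches at once, which is what lets a small ADtree match the accuracy of much larger boosted-tree ensembles.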
Example: Medical Diagnostics
Cross-validated accuracy
Learning algorithm   Number of splits   Average test error   Test error variance
ADtree               6                  17.0%                0.6%
C5.0                 27                 27.2%                0.5%
C5.0 + boosting      446                20.2%                0.5%
Boost Stumps         16                 16.5%                0.8%
ADtree for the Cleveland heart-disease diagnostics problem
Call Detail analysis (AT&T)
Freund, Mason, Rogers, Pregibon, Cortes 2000
Massive datasets
(today we might have used Hadoop)
Alternating tree for "bizocity"
Alternating Tree (Detail)
Precision/recall graphs
[Plot: accuracy as a function of score]
Installation
directory structure
Required software packages
Check Versions

  $ scripts/checkVersions.sh
  java version "1.6.0_33"
  Java(TM) SE Runtime Environment (build 1.6.0_33-b03-424-10M3720)
  Java HotSpot(TM) 64-Bit Server VM (build 20.8-b03-424, mixed mode)
  Python 2.7.2
  gnuplot 4.2 patchlevel 5
  dot - graphviz version 2.28.0 (20110509.1545)
Quick Start
The Seville project
(École des Mines, Paris)
Pedestrian detection - typical segment
- Collected 6 hrs of video → 540,000 frames; 170,000 boxes per frame; 1500 pedestrians.
- 3 seconds for deciding if a box is a pedestrian or not; 20 seconds for marking a box around a pedestrian.
- How to choose "hard" negative examples?
Only examples whose normalized score is in this range are hand-labeled.
[Score histograms of positive vs. negative examples at labeling iterations 7, 8, 9, 10]
And the figure in the gown is ...
Genetic Disorders
- Many diseases have a significant heritable component.
- Measure DNA locations (SNPs) on patients (and controls).
- Look for associations (correlations) between DNA location and disease.
- (Getting access to such data is not trivial.)
- Individual SNPs can show significant correlation. But, rather than testing single SNPs, learn a function that maps the SNP vector to the disease.
- Such a function can capture interactions between SNPs.
- (ethnicities)
- WT consortium: 2000 cases, 3000 controls
- GC consortium: 4061 cases and 2571 controls
Measuring closeness of location
Mann-Whitney U test yields p = 10^-30
Tree structure of the ADT hints at relations between SNPs.
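As a sketch, the Mann-Whitney U statistic itself is just a count over (case, control) pairs; turning U into a p-value uses a normal approximation, which is not shown here:

```python
def mann_whitney_u(cases, controls):
    """Mann-Whitney U statistic: the number of (case, control) pairs where
    the case value exceeds the control value, counting ties as one half.
    Under the null, U is centered at len(cases) * len(controls) / 2; a large
    deviation from that center indicates the two groups differ."""
    u = 0.0
    for a in cases:
        for b in controls:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u
```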
The protein crystallization problem
DNA.
small yield.
The post-doc method
paper - can advance to next stage of academic career.
method.
“high throughput” method
solutions of protein and salts in different concentrations.
Problems with high-throughput
rather than crystals.
requires human expertise.
weeks long. By which time, the crystal often dissolves back into the solution...
Detecting micro-crystals
C. elegans image analysis for high-throughput screening
biology.
screening - testing thousands of compounds.
worms not for image analysis.)
Annie Lee Connery (MGH, Ruvkun Lab and Ausubel Lab).
Results
The image processing work-flow
Basic blocks for worms
characteristic block.
segments.
represented by the center
worm segments would give us the direction and size.
Aim of learning
from incorrect ones.
perpendicular to the median line with ends on the worm boundary.
negative.
User input
worms and the median line.
perpendicular to the median line that end at the worm boundaries.
positive.
negative.
Features for Classification
as features.
for worms, blue will be darker and have texture, red would have edges.
used as features.
Feature finding
Input bright-field
Filtered Images: Laplacian of Gaussian (I)
Filtered Images: Laplacian of Gaussian (II)
Filtered Images: Derivatives
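Filter responses like these can serve directly as per-pixel features; below is a minimal sketch using a common 5x5 approximation of the Laplacian-of-Gaussian kernel (pure Python, for illustration only; a real pipeline would use an image-processing library):

```python
def convolve2d(img, kernel):
    """Valid-mode 2-D convolution over lists of lists."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for y in range(len(img) - kh + 1):
        row = []
        for x in range(len(img[0]) - kw + 1):
            row.append(sum(img[y + i][x + j] * kernel[i][j]
                           for i in range(kh) for j in range(kw)))
        out.append(row)
    return out

# A widely used 5x5 Laplacian-of-Gaussian approximation: its entries sum to
# zero, so flat image regions give zero response, while blob- and edge-like
# structure (e.g. worm bodies and boundaries) gives strong responses.
LOG_5x5 = [
    [0, 0,   1, 0, 0],
    [0, 1,   2, 1, 0],
    [1, 2, -16, 2, 1],
    [0, 1,   2, 1, 0],
    [0, 0,   1, 0, 0],
]
```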
Worm Detection: initial training set
Worm Detection - 2 feedback iterations
[Figure series (ECML08): detector output at iterations 0, 1, 2, 10, 20, 50, 100 and 200; scores after retraining]
[Oza & Russell 2001]
[Grabner, Grabner & Bischof 2006]
[Stalder & Grabner 2009]
Tracking under Partial Occlusion