Fusion of Continuous Output Classifiers - PowerPoint PPT Presentation
Jacob Hays, Amit Pillay, James DeFelice
Definitions
- x – feature vector
- c – number of classes
- L – number of classifiers
- {ω1, ω2, …., ωc} – Set of class labels
- {D1, D2, …., DL} – Set of classifiers
▫ All c outputs from Di are in interval [0,1]
- DP(x) – Decision Profile matrix
DP(x) =
\begin{bmatrix}
d_{1,1}(x) & \cdots & d_{1,j}(x) & \cdots & d_{1,c}(x) \\
\vdots & & \vdots & & \vdots \\
d_{i,1}(x) & \cdots & d_{i,j}(x) & \cdots & d_{i,c}(x) \\
\vdots & & \vdots & & \vdots \\
d_{L,1}(x) & \cdots & d_{L,j}(x) & \cdots & d_{L,c}(x)
\end{bmatrix}
Approaches
- Class Conscious – Use one column of DP(x) at a time
▫ Ex) Simple/Weighted Averages
- Class Indifferent – Treat DP(x) as a whole new feature space; use a new classifier on it to make the final decision
Discriminant to Continuous
- Non-continuous classifiers produce labels
- {g1(x), g2(x), … gc(x)} – output of D
▫ Would like to normalize to [0,1] interval
- {g’1(x), g’2(x), … g’c(x)}, where
- Softmax Method
▫ Normalizes to [0,1]
- Better if g'_j(x) behaves like a probability

g'_j(x) = \frac{\exp\{g_j(x)\}}{\sum_{k=1}^{c}\exp\{g_k(x)\}}, \qquad \sum_{j=1}^{c} g'_j(x) = 1
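The softmax conversion above is straightforward to implement. A minimal Python sketch (not from the slides; the max-shift is an added numerical-stability detail):

```python
import math

def softmax(g):
    """Convert raw discriminant scores g = [g_1, ..., g_c] into values
    g' in [0, 1] that sum to 1, as in the softmax method above."""
    m = max(g)                            # shift for numerical stability;
    exps = [math.exp(v - m) for v in g]   # the shift cancels in the ratio
    s = sum(exps)
    return [e / s for e in exps]
```

For example, `softmax([2.0, 1.0, 0.5])` returns three values in [0, 1] summing to 1, preserving the ranking of the raw scores.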
Converting Linear Discriminant
- Assuming normal densities
- Let C be the constant additive term we drop:

g_j(x) = \log\{P(\omega_j)\, p(x \mid \omega_j)\}

- Plug into Bayes' rule, and it simplifies to the softmax function:

P(\omega_j)\, p(x \mid \omega_j) = A \exp\{g_j(x)\}, \qquad A = \exp\{C\}

P(\omega_j \mid x) = \frac{A\exp\{g_j(x)\}}{\sum_{k=1}^{c} A\exp\{g_k(x)\}} = \frac{\exp\{g_j(x)\}}{\sum_{k=1}^{c}\exp\{g_k(x)\}}
Neural Networks
- Consider a NN, with c outputs, {y1, …, yc}
- When trained using squared error, the outputs can be used as an approximation of the posterior probabilities
- Normalize them to the [0,1] interval using the softmax function
- The normalization is independent of neural network training; it operates only on the outputs
Laplace Estimator for Decision Tree
- In Decision Trees, you use entropy to split the
distribution based on a single feature per level
- Normally, you continue to split until there is a single class in each leaf of the tree
- In Probability Estimating Trees , instead of
splitting until a single class is in a leaf, split until around K points are in each leaf, and use various methods to calculate the probability of each class at each leaf.
Count based probability, Laplace
- {k1, k2, …, kc} – Number of sample points of class
{w1, w2, …., wc} respectively in leaf
- K = k1 + k2 + …+ kc
- Maximum Likelihood (ML) estimate:

\hat P(\omega_j \mid x) = \frac{k_j}{K}, \qquad j = 1, \dots, c

- When K is too small, the estimates are unpredictable
Laplace Estimator
- Laplace Correction:

\hat P(\omega_j \mid x) = \frac{k_j + 1}{K + c}

- m-estimation:

\hat P(\omega_j \mid x) = \frac{k_j + m\,\hat P(\omega_j)}{K + m}

▫ best to set m so that m \times \hat P(\omega_j) \approx 10
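These leaf-probability estimates can be compared side by side. A small Python sketch (hypothetical helper names, not from the slides):

```python
def ml_estimate(counts):
    # ML estimate: k_j / K; unreliable when K is small
    K = sum(counts)
    return [k / K for k in counts]

def laplace_estimate(counts):
    # Laplace correction: (k_j + 1) / (K + c); pulls toward uniform
    K, c = sum(counts), len(counts)
    return [(k + 1) / (K + c) for k in counts]

def m_estimate(counts, priors, m):
    # m-estimation: (k_j + m * P(w_j)) / (K + m); pulls toward the priors
    K = sum(counts)
    return [(k + m * p) / (K + m) for k, p in zip(counts, priors)]
```

With leaf counts [3, 0, 1], the ML estimate assigns class 2 probability 0, while the Laplace estimate gives it 1/7.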
Ting and Witten Laplace Estimator
- Ting and Witten
▫ ω* is the majority class in the leaf

\hat P(\omega_j \mid x) =
\begin{cases}
1 - \dfrac{K - k_j + 1}{K + 2}, & \text{if } \omega_j = \omega^{*} \\[6pt]
\left[1 - \hat P(\omega^{*} \mid x)\right]\dfrac{k_j}{\sum_{l \neq *} k_l}, & \text{otherwise}
\end{cases}
Weighted Distance Laplace Estimate
- Take the sum of the inverse distances from x to all samples of class ω_j, over the sum of the inverse distances to all samples:

\hat P(\omega_j \mid x) = \frac{\sum_{x_i \in \omega_j} 1/d(x, x_i)}{\sum_{i=1}^{K} 1/d(x, x_i)}
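The weighted-distance estimate can be sketched as follows (Python; the distance function d is a parameter, and the helper name is ours):

```python
def weighted_distance_estimate(x, samples, labels, classes, d):
    """P-hat(w_j | x): sum of inverse distances 1/d(x, x_i) over the
    samples of class w_j, divided by the sum over all K samples."""
    inv = [1.0 / d(x, xi) for xi in samples]
    total = sum(inv)
    return {c: sum(w for w, lbl in zip(inv, labels) if lbl == c) / total
            for c in classes}
```

By construction the estimates lie in [0, 1] and sum to 1 over the classes; samples close to x dominate the estimate.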
Example
Class Conscious Combiners
- Non-trainable Combiners
▫ No extra parameters; all defined up front
▫ Function of the classifier outputs for a specific class
- Simple mean
\mu_j(x) = \mathcal{F}\left[d_{1,j}(x), d_{2,j}(x), \dots, d_{L,j}(x)\right]

\mu_j(x) = \frac{1}{L}\sum_{i=1}^{L} d_{i,j}(x)
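The simple mean combiner operates on one column of the decision profile at a time. A minimal Python sketch (plain lists, no library dependencies):

```python
def simple_mean_combiner(dp):
    """dp is the L x c decision profile: dp[i][j] = d_{i,j}(x).
    Returns mu_j(x) = (1/L) * sum_i d_{i,j}(x) for each class j."""
    L, c = len(dp), len(dp[0])
    return [sum(dp[i][j] for i in range(L)) / L for j in range(c)]
```

For `dp = [[0.1, 0.9], [0.3, 0.7], [0.5, 0.5]]` (L = 3 classifiers, c = 2 classes) this yields supports 0.3 and 0.7, so the ensemble labels x with class 2.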
Popular Class Conscious Combiners
- Minimum/Maximum/Median, e.g.

\mu_j(x) = \max_i\{d_{i,j}(x)\}

- Trimmed Mean:
▫ The L degrees of support are sorted, X percent of the values are dropped, and the mean is taken of the remaining
- Product

\mu_j(x) = \prod_{i=1}^{L} d_{i,j}(x)
Generalized Mean Function
- The Generalized Mean is defined as

\mu_j(x, \alpha) = \left(\frac{1}{L}\sum_{i=1}^{L} d_{i,j}(x)^{\alpha}\right)^{1/\alpha}

with the following special cases:
▫ α → −∞: Minimum
▫ α = −1: Harmonic Mean
▫ α = 0: Geometric Mean, \mu_j(x) = \left(\prod_{i=1}^{L} d_{i,j}(x)\right)^{1/L}
▫ α = 1: Simple Arithmetic Mean
▫ α → +∞: Maximum
- α is chosen beforehand; it sets the level of optimism
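The generalized mean and its special cases can be sketched in a few lines of Python (the α = 0 branch uses the geometric-mean limit explicitly, since the formula is undefined there):

```python
import math

def generalized_mean(values, alpha):
    """Generalized mean of the supports d_{i,j}(x) for one class.
    alpha -> -inf: minimum; -1: harmonic; 0: geometric;
    1: arithmetic; +inf: maximum."""
    if alpha == 0:
        # geometric mean is the limit as alpha -> 0
        return math.prod(values) ** (1.0 / len(values))
    return (sum(v ** alpha for v in values) / len(values)) ** (1.0 / alpha)
```

Larger α gives a more optimistic (max-like) combiner; smaller α a more pessimistic (min-like) one.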
Class Conscious Combiner Example
Example: Effect of Optimism α
- 100 training / test sets
▫ Training set (a), 200 samples ▫ Testing set (b), 1000 samples
- For each ensemble
▫ 10 bootstrap samples (200 values) ▫ Train classifier on each (Parzen)
Example: Effect of Optimism α
- Generalized mean
▫ −50 ≤ α ≤ +50, steps of 1 ▫ −1 ≤ α ≤ +1, steps of 0.1
- Simple mean combiner gives best result
Interpreting Results
- Mean classifier isn’t always the best
- Shape of the error curve depends upon
▫ Problem ▫ Base classifier used
- Average and product are most intensely studied
combiners
▫ For some problems, the average may be less accurate but more stable
Ordered Weight Averaging
- Generalized, non-trainable
- L coefficients (one for each classifier)
- Sort the classifier outputs for class ω_j in descending order
- Multiply by a vector of coefficients b (weights)
▫ i_1, …, i_L is a permutation of the indices 1, …, L

\mu_j(x) = \sum_{k=1}^{L} b_k\, d_{i_k,j}(x)
Ordered Weight Averaging: Example
- Consider a jury assessing sport performance
(diving)
▫ Reduce subjective bias
- Trimmed mean
▫ Drop the lowest and highest scores; average the remaining

d_j = \begin{bmatrix} 0.6 & 0.6 & 0.2 & 0.7 & 0.6 \end{bmatrix}^T, \qquad b = \begin{bmatrix} 0 & 1/3 & 1/3 & 1/3 & 0 \end{bmatrix}^T

\mu_j = \begin{bmatrix} 0 & 1/3 & 1/3 & 1/3 & 0 \end{bmatrix}\begin{bmatrix} 0.7 & 0.6 & 0.6 & 0.6 & 0.2 \end{bmatrix}^T = 0.6
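The jury example follows the general OWA recipe: sort, then take a weighted sum. A Python sketch (hypothetical function name):

```python
def owa(supports, b):
    """Ordered weighted averaging: sort the supports in descending
    order, then take the dot product with the coefficient vector b."""
    ordered = sorted(supports, reverse=True)
    return sum(w * v for w, v in zip(b, ordered))
```

With scores [0.6, 0.6, 0.2, 0.7, 0.6] and b = [0, 1/3, 1/3, 1/3, 0], this drops the 0.7 and 0.2 and returns 0.6, matching the trimmed-mean example; b = [1, 0, …, 0] recovers the maximum and b = [0, …, 0, 1] the minimum.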
Ordered Weight Averaging
- General form of trimmed mean
▫ b = [0, 1/(L−2), 1/(L−2), …, 1/(L−2), 0]^T
- Other operations may be modeled with careful selection of b
▫ Minimum: b = [0, 0, …, 1]^T
▫ Maximum: b = [1, 0, …, 0]^T
▫ Average: b = [1/L, 1/L, …, 1/L]^T
- Many resources spent on developing new
aggregation connectives
▫ Bigger question: when to use which one?
Trainable Combiners
- Combiners with additional parameters to be
trained
▫ Weighted Average ▫ Fuzzy Integral
Weighted Average
- 3 groups, based on number of weights
- L weights
▫ One weight per classifier

\mu_j(x) = \sum_{i=1}^{L} w_i\, d_{i,j}(x)

▫ Similar to the equation we saw for ordered weight averaging, except here we optimize w_i (and we do not reorder the d_{i,j})
▫ w_i for classifier D_i is usually based on its estimated error rate
Weighted Average
- c×L weights
▫ Weights are specific to each class

\mu_j(x) = \sum_{i=1}^{L} w_{ij}\, d_{i,j}(x)

▫ Only the jth column of DP(x) is used in the calculation
▫ Linear regression is commonly used to derive the optimal weights
▫ A "class conscious" combiner
Weighted Average
- c×c×L weights
▫ Support for each class is determined from the entire decision profile DP(x)

\mu_j(x) = \sum_{i=1}^{L}\sum_{k=1}^{c} w_{ikj}\, d_{i,k}(x)

▫ A different weight set for each class ω_j
▫ The whole decision profile is an intermediate feature space
▫ A "class indifferent" combiner
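The three weight granularities differ only in how the weights are indexed. A Python sketch of all three (plain lists; function names are ours):

```python
def weighted_avg_L(dp, w):
    # L weights, one per classifier: mu_j = sum_i w[i] * dp[i][j]
    L, c = len(dp), len(dp[0])
    return [sum(w[i] * dp[i][j] for i in range(L)) for j in range(c)]

def weighted_avg_cL(dp, w):
    # c x L weights, class-specific: mu_j = sum_i w[i][j] * dp[i][j]
    L, c = len(dp), len(dp[0])
    return [sum(w[i][j] * dp[i][j] for i in range(L)) for j in range(c)]

def weighted_avg_ccL(dp, w):
    # c x c x L weights, class-indifferent:
    # mu_j = sum_i sum_k w[i][k][j] * dp[i][k] over the whole DP(x)
    L, c = len(dp), len(dp[0])
    return [sum(w[i][k][j] * dp[i][k] for i in range(L) for k in range(c))
            for j in range(c)]
```

The first two use only the jth column of DP(x) (class conscious); the last uses the entire profile (class indifferent).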
Weighted Average: Class Conscious
- d_{i,j}(x) are point estimates of P(ω_j | x)

Q(x) = \mu_j(x) = \sum_{i=1}^{L} w_i\, d_{i,j}(x)

▫ If the estimates are unbiased, Q(x) is an unbiased minimum-variance estimate of P(ω_j | x), conditional upon the restriction that the coefficients w_i sum to 1:

\sum_{i=1}^{L} w_i = 1
Weighted Average: Class Conscious
- Weights derived to minimize the variance of Q(x)

\mu_j(x) = \sum_{i=1}^{L} w_i\, d_{i,j}(x)

▫ The variance of Q(x) is ≤ the variance of any single classifier
- We assume the point estimates are unbiased
▫ Variance of d_{i,j}(x) = expected squared error of d_{i,j}(x)
- When the coefficients w_i minimize the variance
▫ Q(x) is a better estimate of P(ω_j | x) than any d_{i,j}(x)
Ex: Variance of Estimate of P(ωj | x)
- Calculate variance of di,j(x) (estimates)
- Target values of P(ωj | x) are
▫ 1 (in class ω_j), 0 (not in class ω_j)
▫ Output shown for class ω_1 of the two-classifier ensemble D = {D1, D2}, dataset Z = {z1, z2, …, z10}
▫ The first 3 points are in ω_1
▫ The table shows the first columns of the 10 DPs
Ex: Variance of Estimate of P(ωj | x)
- Variance of Di here is variance of
approximation error
▫ Approximation error determined as
{(1 – 0.71),(1 – 0.76),(1 – 0.15),…,(0 – 0.79)}
▫ Mean of the approximation error for D_1 is −0.225
▫ Variance of the approximation error for D_1 is

\sigma_1^2 = \frac{1}{10}\left[\left((1 - 0.71) + 0.225\right)^2 + \dots + \left((0 - 0.79) + 0.225\right)^2\right] \approx 0.32

▫ The covariance matrix of the approximation errors for the classifiers is

\Sigma = \begin{bmatrix} 0.32 & 0.22 \\ 0.22 & 0.34 \end{bmatrix}
Constrained Regression
- Assume approximation errors are normally
distributed, zero mean
▫ P(ω_j | x) − d_{i,j}(x) = approximation error
▫ σ²_{ik} is the covariance of the approximation errors between classifiers D_i and D_k
- General Legrange form
- Find our optimal weights by minimizing J
L(x, \lambda) = f(x) + \lambda\, g(x)

J = \sum_{i=1}^{L}\sum_{k=1}^{L} w_i w_k \sigma_{ik} - \lambda\left(\sum_{i=1}^{L} w_i - 1\right)
Constrained Regression
- Solution for minimizing J:

w = \Sigma^{-1} I \left(I^{T} \Sigma^{-1} I\right)^{-1}

- where w = [w_1, w_2, \dots, w_L]^T is our set of weights and I is a vector of size L of all 1's
Ex: Constrained Regression
- Going back to the numbers we had from Table
5.1
\Sigma = \begin{bmatrix} 0.32 & 0.22 \\ 0.22 & 0.34 \end{bmatrix}, \qquad w = \Sigma^{-1} I \left(I^{T}\Sigma^{-1} I\right)^{-1}

▫ All the weights and covariances need to be labeled with j to indicate which P(ω_j | x) we are estimating

w = \begin{bmatrix} 5.6 & -3.6 \\ -3.6 & 5.3 \end{bmatrix}\begin{bmatrix} 1 \\ 1 \end{bmatrix}\left(\begin{bmatrix} 1 & 1 \end{bmatrix}\begin{bmatrix} 5.6 & -3.6 \\ -3.6 & 5.3 \end{bmatrix}\begin{bmatrix} 1 \\ 1 \end{bmatrix}\right)^{-1} = \begin{bmatrix} 0.54 \\ 0.46 \end{bmatrix}
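The closed-form solution is a few lines with NumPy (a sketch, assuming the covariance matrix of the approximation errors is given):

```python
import numpy as np

def constrained_regression_weights(cov):
    """w = Sigma^{-1} I (I^T Sigma^{-1} I)^{-1}, I the all-ones vector:
    the variance-minimizing weights subject to sum(w) = 1."""
    cov = np.asarray(cov, dtype=float)
    ones = np.ones(cov.shape[0])
    s_inv_ones = np.linalg.solve(cov, ones)   # Sigma^{-1} I, without
    return s_inv_ones / (ones @ s_inv_ones)   # forming the inverse
```

For Σ = [[0.32, 0.22], [0.22, 0.34]] this gives w = [6/11, 5/11] ≈ [0.545, 0.455]; the slide's [0.54, 0.46] comes from the rounded Σ⁻¹ shown there.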
Constrained Regression / Comparison
- Comparison of
two combiners
▫ Simple avg ▫ Weighted avg
- L varied: 2 – 30
▫ L > 20 tends to overfit
Constrained Regression, Extension
- Suppose classifier outputs for ωj are
independent
▫ Σ is diagonal, with the variances for D_1, …, D_L along the diagonal
▫ Simplifies the weight optimization:

w_i = \frac{1/\sigma_i^2}{\sum_{k=1}^{L} 1/\sigma_k^2}
Fuzzy Integral
- Based on fuzzy set theory
- Main Idea:
▫ Measure the strength not only of each individual classifier but also of every subset of classifiers
- The measure of strength of each subset of classifiers indicates how good that subset is for the given input x; it is also called a fuzzy measure
Fuzzy Integral cont.
subset:  D1    D2    D3    D1,D2   D1,D3   D2,D3   D1,D2,D3
g:       0.3   0.1   0.4   0.4     0.5     0.8     1

- The jth column of the decision profile for input x is [0.1 0.7 0.5]
- Goal: find μ_j(x)
- 1. Sort the degrees of support in ascending order
- 2. Append 0 and 1 in the list if not present
- 3. For each value of α in the list, find the classifiers with support greater than or equal to α
- 4. The subset of such classifiers is called the α-cut (H_α)
Fuzzy Integral cont.
α = 0:   H_0 = {D1, D2, D3},     g(H_0) = 1
α = 0.1: H_{0.1} = {D1, D2, D3}, g(H_{0.1}) = 1
α = 0.5: H_{0.5} = {D2, D3},     g(H_{0.5}) = 0.8
α = 0.7: H_{0.7} = {D2},         g(H_{0.7}) = 0.1
α = 1:   H_1 = ∅,                g(H_1) = 0

\mu_j(x) = \max_\alpha\{\min(\alpha, g(H_\alpha))\} = \max\{\min(0,1), \min(0.1,1), \min(0.5,0.8), \min(0.7,0.1), \min(1,0)\} = \max(0, 0.1, 0.5, 0.1, 0) = 0.5
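The walkthrough above is the Sugeno fuzzy integral. A Python sketch (the fuzzy measure g is supplied as a dict over classifier subsets):

```python
def sugeno_integral(supports, g):
    """supports[i] = d_{i,j}(x); g maps frozensets of classifier
    indices to their fuzzy measure (g[empty] = 0, g[all] = 1).
    Returns max over alpha of min(alpha, g(H_alpha))."""
    alphas = sorted(set(supports) | {0.0, 1.0})   # candidate alpha-cuts
    best = 0.0
    for a in alphas:
        h = frozenset(i for i, s in enumerate(supports) if s >= a)
        best = max(best, min(a, g[h]))
    return best
```

With the measure from the table (g({D2, D3}) = 0.8, etc.) and supports [0.1, 0.7, 0.5], this reproduces μ_j(x) = 0.5.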
Class-Indifferent Combiners
- Unlike class-conscious combiners, these combiners use all L×c degrees of support in the decision profile DP(x)
Decision Templates
- A typical decision profile (DP) that is representative of class ω_j is called the Decision Template (DT_j)
- Main Idea:
▫ Compare the decision template DT_j with the current decision profile DP(x) for a test input x using some similarity measure
- Training:
▫ For j = 1 to c, calculate the mean of the DP(z_k) for inputs z_k from a data set Z that belong to class ω_j; this mean is the decision template DT_j
Decision Template Cont.
so we have

DT_j = \frac{1}{N_j}\sum_{z_k \in \omega_j} DP(z_k)

where N_j is the number of elements of Z from ω_j
- Operation:
▫ Given an input x ∈ R^n, construct DP(x) and compute the similarity S between DP(x) and each DT_j:

\mu_j(x) = S(DP(x), DT_j), \qquad j = 1, \dots, c
Decision Template cont.
- Similarity Measure
1) Squared Euclidean Distance (DT(E)), where DT_j(i,k) is the (i,k) entry in the decision template DT_j:

\mu_j(x) = 1 - \frac{1}{L\,c}\sum_{i=1}^{L}\sum_{k=1}^{c}\left[DT_j(i,k) - d_{i,k}(x)\right]^2

▫ Similar to nearest-mean classification in the intermediate feature space
▫ We can use other distance measures, such as Minkowski or Mahalanobis
Decision Template cont.
Similarity Measure
2) Symmetric Difference (DT(S))
▫ This measure comes from fuzzy set theory
Decision Template cont.
- Illustration of Decision Template
DT_1 = \begin{bmatrix} 0.6 & 0.4 \\ 0.8 & 0.2 \\ 0.5 & 0.5 \end{bmatrix}, \qquad DT_2 = \begin{bmatrix} 0.3 & 0.7 \\ 0.4 & 0.6 \\ 0.1 & 0.9 \end{bmatrix}, \qquad DP(x) = \begin{bmatrix} 0.3 & 0.7 \\ 0.6 & 0.4 \\ 0.5 & 0.5 \end{bmatrix}

         μ_1(x)    μ_2(x)    Label
DT(E)    0.9567    0.9333    ω_1
DT(S)    0.5333    0.5333    ω_2
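The DT(E) numbers in the illustration can be reproduced directly. A Python sketch of the squared-Euclidean similarity:

```python
def dt_similarity_euclidean(dp, dt):
    """DT(E): mu_j(x) = 1 - (1/(L*c)) * sum over all entries of
    (DT_j(i,k) - d_{i,k}(x))^2, for an L x c profile and template."""
    L, c = len(dp), len(dp[0])
    sq = sum((dt[i][k] - dp[i][k]) ** 2 for i in range(L) for k in range(c))
    return 1.0 - sq / (L * c)
```

With DT_1, DT_2, and DP(x) from the illustration this gives μ_1(x) ≈ 0.9567 and μ_2(x) ≈ 0.9333, so x is labeled ω_1.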
Why Class-Indifferent?
- The Decision Templates approach is context-free (free from the nature of the classifier)
- Unlike class-conscious combiners, which are idempotent by design
- Assume we have L copies of classifier D in the ensemble and the DTs for the two classes are

DT_1 = \begin{bmatrix} 0.55 & 0.45 \\ \vdots & \vdots \\ 0.55 & 0.45 \end{bmatrix}, \qquad DT_2 = \begin{bmatrix} 0.2 & 0.8 \\ \vdots & \vdots \\ 0.2 & 0.8 \end{bmatrix}

and the decision of D for x is d_1 = 0.4, d_2 = 0.6
Why Class-Indifferent?
- Then all class-conscious methods will assign x to class ω_2
- But based on DT(E), the two similarities are \mu_1(x) = 1 - 0.0225 = 0.9775 and \mu_2(x) = 1 - 0.04 = 0.96
- Hence x is classified as ω_1, which means it is possible that the true class is ω_1; the DTs can be correct where other combiners, including D itself, are wrong
Dempster-Shafer Combination
- It is just another method of comparing the DTs with the DP of a new x
- Instead of calculating the similarity between the DT and DP(x), this method measures the proximity of each individual classifier's output to the corresponding row of the DT:

\Phi_{j,i}(x) = \frac{\left(1 + \|DT_j^i - D_i(x)\|^2\right)^{-1}}{\sum_{k=1}^{c}\left(1 + \|DT_k^i - D_i(x)\|^2\right)^{-1}}

where DT_j^i is the ith row of DT_j and D_i(x) is the output of D_i
Dempster-Shafer Combination cont.
- Based on this, for each class j = 1 to c and for each classifier, we calculate the belief degrees:

b_j(D_i(x)) = \frac{\Phi_{j,i}(x)\prod_{k \neq j}\left(1 - \Phi_{k,i}(x)\right)}{1 - \Phi_{j,i}(x)\left[1 - \prod_{k \neq j}\left(1 - \Phi_{k,i}(x)\right)\right]}

- And the final degree of support for the given input is

\mu_j(x) = K \prod_{i=1}^{L} b_j(D_i(x))

where K is a normalizing constant
Dempster-Shafer Combination cont.
- An illustration, using the same DT_1, DT_2, and DP(x) as in the Decision Template example:

DT_1 = \begin{bmatrix} 0.6 & 0.4 \\ 0.8 & 0.2 \\ 0.5 & 0.5 \end{bmatrix}, \qquad DT_2 = \begin{bmatrix} 0.3 & 0.7 \\ 0.4 & 0.6 \\ 0.1 & 0.9 \end{bmatrix}, \qquad DP(x) = \begin{bmatrix} 0.3 & 0.7 \\ 0.6 & 0.4 \\ 0.5 & 0.5 \end{bmatrix}

Then the proximities for each decision template and classifier are:

class    Φ_{j,1}(x)   Φ_{j,2}(x)   Φ_{j,3}(x)
ω_1      0.4587       0.5000       0.5690
ω_2      0.5413       0.5000       0.4310
Dempster-Shafer Combination cont.
- For ω_1 the beliefs are b_1(D_1(x)) = 0.2799, b_1(D_2(x)) = 0.3333, b_1(D_3(x)) = 0.4289
- Similarly we calculate the beliefs for ω_2; the final degrees of belief for the two classes are

Class   b_j(D_1(x))   b_j(D_2(x))   b_j(D_3(x))   μ_j(x)
ω_1     0.2799        0.3333        0.4289        0.5558
ω_2     0.3898        0.3333        0.2462        0.4442
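The full Dempster-Shafer pipeline (proximities, beliefs, normalized product) fits in one function. A Python sketch that reproduces the table above:

```python
import math

def ds_combine(dp, dts):
    """dp: L x c decision profile; dts: list of c decision templates
    (each L x c). Returns the normalized supports mu_j(x)."""
    L, c = len(dp), len(dp[0])

    def sqdist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))

    # proximity of classifier i's output to row i of each template
    phi = [[0.0] * L for _ in range(c)]
    for i in range(L):
        inv = [1.0 / (1.0 + sqdist(dts[j][i], dp[i])) for j in range(c)]
        s = sum(inv)
        for j in range(c):
            phi[j][i] = inv[j] / s

    # belief degree of each classifier for each class
    b = [[0.0] * L for _ in range(c)]
    for j in range(c):
        for i in range(L):
            rest = math.prod(1.0 - phi[k][i] for k in range(c) if k != j)
            b[j][i] = phi[j][i] * rest / (1.0 - phi[j][i] * (1.0 - rest))

    # final support: normalized product of the beliefs
    prod = [math.prod(b[j]) for j in range(c)]
    total = sum(prod)
    return [p / total for p in prod]
```

Run on the DT_1, DT_2, and DP(x) of the illustration, this yields μ_1(x) ≈ 0.5558 and μ_2(x) ≈ 0.4442, matching the table.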
Classifier Fusion using DS
Classifier Fusion using Dempster-Shafer theory of evidence to predict Breast Cancer Tumors
- DS theory of belief was applied to fuse breast cancer
data obtained from different diagnostic techniques
- Classifiers used were SVM with linear, polynomial,
and RBF kernel
- The classifiers give beliefs for two classes: benign and malignant
- These evidences are then used to reach a final
diagnosis using DS belief combination formula.
References
- L. Kuncheva (2004). Combining Pattern Classifiers: Methods and Algorithms. Wiley.
- Raza, Mansoor; Gondal, Iqbal; Green, David; Coppel, Ross L., "Classifier Fusion Using Dempster-Shafer Theory of Evidence to Predict Breast Cancer Tumors", TENCON 2006, 2006 IEEE Region 10 Conference, 14-17 Nov. 2006, pp. 1-4.