SLIDE 1

Weighted Classification Cascades for Optimizing Discovery Significance

Lester Mackey†

Collaborators: Jordan Bryan† and Man Yue Mo

†Stanford University

December 13, 2014


SLIDE 2

Background

Hypothesis Testing in High-Energy Physics

• Goal: Given a collection of events (high-energy particle collisions) and a definition of “interesting” (e.g., Higgs boson produced), detect whether any interesting events occurred
  • Interesting events = signal events
  • Other events (e.g., no Higgs produced) = background events
• Why? To test the predictions of physical models
  • The Standard Model of physics predicts the existence of elementary particles and various modes of particle decay

Claim: Higgs bosons exist and often decay into tau particles

To substantiate this claim experimentally, one must distinguish
• Higgs to tau tau decay events (signal events)
• Other events with similar characteristics (background events)


SLIDE 3

Background

Hypothesis Testing in High-Energy Physics

• Goal: Given a collection of events (high-energy particle collisions), test whether any signal events occurred
• How? Each event is represented as features (momenta and energies) of the particles produced by the collision
• Ideally: Test based on the distributions of signal and background
  • The signal and background event distributions are complex and difficult to characterize explicitly, which hinders the development of an analytical test
• Instead: Identify a relatively signal-rich selection region by training a classifier on n labeled training events
• Test the new dataset for signal by counting events in the selection region and computing an (approximate) “significance value” or p-value under a Poisson likelihood ratio test


SLIDE 4

Background

Approximate Median Significance (AMS)

How to estimate the significance of new event data?

• Dataset D = {(x_1, y_1), . . . , (x_n, y_n)} with event feature vectors x_i ∈ X and labels y_i ∈ {−1, 1} = {background, signal}
• Classifier g : X → {−1, 1} assigning labels to events x ∈ X
• True positive count sD(g) = Σ_{i=1}^n I[g(x_i) = 1, y_i = 1]
• False positive count bD(g) = Σ_{i=1}^n I[g(x_i) = 1, y_i = −1]

Approximate Median Significance (AMS) (Cowan et al., 2011):

  AMS2(g, D) = √( 2 [ (sD(g) + bD(g)) log((sD(g) + bD(g))/bD(g)) − sD(g) ] )

• Approximates the 1 − p-value quantile of the Poisson model test statistic
• Measures significance in units of standard deviations, or σ’s
• Typically > 5σ is needed to declare a signal discovery significant
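
A minimal sketch of this computation in Python (function and variable names are my own, not from the slides):

```python
import numpy as np

def ams2(s, b):
    """Approximate Median Significance, AMS2 (Cowan et al., 2011).

    s: true positive count sD(g); b: false positive count bD(g).
    """
    if b <= 0:
        raise ValueError("background count b must be positive")
    return np.sqrt(2.0 * ((s + b) * np.log((s + b) / b) - s))
```

For example, ams2(s=300, b=3000) ≈ 5.39, comfortably above the 5σ discovery threshold.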


SLIDE 5

Background

Approximate Median Significance (AMS)

• Training goal: Select classifier g to maximize AMS2 on future data
• Standard two-stage approach:
  • Withhold a fraction of the training events
  • Stage 1: Train any standard classifier on the remaining events
  • Stage 2: Order the held-out events by classifier score and select a new classification threshold to maximize AMS2 on the held-out data (a sketch of this scan appears after this list)
• Pro: Requires only standard classification tools; works with any classifier
• Con: Stage 2 is prone to overfitting and may require hand tuning
• Con: Stage 1 ignores the AMS2 objective and instead optimizes classification error
• This talk: A more direct approach to optimizing training AMS2 that only requires standard classification tools and works with any classifier supporting class weights
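
As a concrete illustration of Stage 2, here is a hedged sketch of the held-out threshold scan (all names are assumptions of mine):

```python
import numpy as np

def best_ams2_threshold(scores, y):
    """Scan thresholds on held-out data; keep the one with the largest AMS2.

    scores: real-valued classifier scores; y: labels in {-1, +1}.
    """
    order = np.argsort(-scores)           # most signal-like events first
    y_sorted = y[order]
    s = np.cumsum(y_sorted == 1)          # signal captured above each cut
    b = np.cumsum(y_sorted == -1)         # background captured above each cut
    with np.errstate(invalid="ignore", divide="ignore"):
        ams = np.sqrt(2.0 * ((s + b) * np.log((s + b) / b) - s))
    ams[b == 0] = -np.inf                 # AMS2 undefined without background
    k = np.argmax(ams)
    return scores[order][k], ams[k]       # cut at the k-th ranked event's score
```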


SLIDE 6

Weighted Classification Cascades

Weighted Classification Cascades

Algorithm (Weighted Classification Cascade for Maximizing AMS2)

  initialize signal class weight: u_0^sig > 0
  for t = 1 to T
    compute background class weight: u_{t−1}^bac ← exp(u_{t−1}^sig) − u_{t−1}^sig − 1
    train any weighted classifier: g_t ← approximate minimizer of the weighted classification error bD(g) u_{t−1}^bac + s̃D(g) u_{t−1}^sig
      (where s̃D(g) = Σ_{i=1}^n I[y_i = 1] − sD(g) = false negative count)
    update signal class weight: u_t^sig ← log(sD(g_t)/bD(g_t) + 1)
  return g_T

Advantages:
• Reduces optimizing AMS2 to a series of classification problems
• Can use any weighted classification procedure
• AMS2 improves if g_t decreases the weighted classification error

Questions: Where does this come from? Why should this work?
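
A minimal runnable sketch of the cascade, assuming any scikit-learn-style weighted classifier (the base learner, hyperparameters, and names here are my choices, not the authors'):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def cascade_ams2(X, y, T=10, u_sig=1.0):
    """Weighted classification cascade for maximizing AMS2 (sketch).

    y has labels in {-1, +1}; u_sig > 0 initializes the signal class weight.
    """
    g = None
    for t in range(T):
        u_bac = np.exp(u_sig) - u_sig - 1.0      # background class weight
        w = np.where(y == 1, u_sig, u_bac)       # false neg. costs u_sig, false pos. costs u_bac
        g = DecisionTreeClassifier(max_depth=6).fit(X, y, sample_weight=w)
        pred = g.predict(X)
        s = np.sum((pred == 1) & (y == 1))       # true positive count sD(g)
        b = np.sum((pred == 1) & (y == -1))      # false positive count bD(g)
        if b == 0:
            break                                # closed-form update undefined
        u_sig = np.log(s / b + 1.0)              # signal class weight update
    return g
```

Each round solves a standard weighted classification problem, so any learner that accepts sample or class weights can be dropped in.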


SLIDE 7

Weighted Classification Cascades

The Difficulty of Optimizing AMS

Approximate Median Significance (squared and halved):

  (1/2) AMS2²(g, D) = (sD(g) + bD(g)) log((sD(g) + bD(g))/bD(g)) − sD(g)

• True positive count sD(g) = Σ_{i=1}^n I[g(x_i) = 1, y_i = 1]
• False positive count bD(g) = Σ_{i=1}^n I[g(x_i) = 1, y_i = −1]

(1/2) AMS2² is
• Combinatorial, as a function of indicator functions
• Non-decomposable across events, due to the logarithm
• Convex in (sD(g), bD(g)), which is bad for maximization


SLIDE 8

Weighted Classification Cascades

Linearizing AMS with Convex Duality

Observation:

  (1/2) AMS2²(g, D) = bD(g) f2(sD(g)/bD(g))
                    = bD(g) sup_u [ u sD(g)/bD(g) − f2*(u) ]
                    = sup_u [ u sD(g) − f2*(u) bD(g) ]
                    = −inf_u [ u s̃D(g) + f2*(u) bD(g) − u Σ_{i=1}^n I[y_i = 1] ]

where
• f2(t) = (1 + t) log(1 + t) − t is convex
• f2 admits the variational representation f2(t) = sup_u [ut − f2*(u)] in terms of its convex conjugate f2*(u) = sup_t [tu − f2(t)] = e^u − u − 1
• the false negative count s̃D(g) = Σ_{i=1}^n I[y_i = 1] − sD(g)

SLIDE 9

Weighted Classification Cascades

Optimizing AMS with Coordinate Descent

Take-away:

  −(1/2) AMS2²(g, D) = inf_u [ u s̃D(g) + (e^u − u − 1) bD(g) − u Σ_{i=1}^n I[y_i = 1] ]

• Maximizing AMS2 is equivalent to minimizing the weighted error
    R2(g, u, D) = u s̃D(g) + (e^u − u − 1) bD(g) − u Σ_{i=1}^n I[y_i = 1]
  over classifiers g and the signal class weight u jointly
• Optimize R2(g, u, D) with coordinate descent:
  • Update g_t for fixed u_{t−1}: train a weighted classifier
  • Update u_t for fixed g_t: closed form, u = log(sD(g_t)/bD(g_t) + 1)
• AMS2 increases whenever a new g_{t+1} achieves smaller weighted classification error with respect to u_t than its predecessor g_t:
    −(1/2) AMS2(g_{t+1})² ≤ R2(g_{t+1}, u_t) < R2(g_t, u_t) = −(1/2) AMS2(g_t)²
• Minorization-maximization algorithm (like EM)
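
The closed-form update can be verified directly: setting ∂R2/∂u = s̃D(g) + (e^u − 1) bD(g) − Σ_i I[y_i = 1] = 0 and using Σ_i I[y_i = 1] = sD(g) + s̃D(g) yields u = log(sD(g)/bD(g) + 1). A small numeric confirmation (the counts are made up for illustration):

```python
import numpy as np

def R2(u, s, b, n_pos):
    """Weighted error R2 for fixed counts; false negatives = n_pos - s."""
    return u * (n_pos - s) + (np.exp(u) - u - 1.0) * b - u * n_pos

s, b, n_pos = 120.0, 400.0, 300.0
u = np.linspace(0.0, 2.0, 200001)
print(u[np.argmin(R2(u, s, b, n_pos))])   # ≈ 0.2624, numeric minimizer
print(np.log(s / b + 1.0))                # ≈ 0.2624, the closed-form update
```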


SLIDE 10

Weighted Classification Cascades

Optimizing Alternative Significance Measures

Simpler form of AMS: AMS3(g, D) = sD(g)/√(bD(g))

• Approximates AMS2: AMS2 = AMS3 · √(1 + O(s/b)) when s ≪ b
• Amenable to weighted classification cascading:
    (1/2) AMS3²(g, D) = bD(g) f3(sD(g)/bD(g)) for the convex f3(t) = t²/2
• (Can also support uncertainty in b: bD(g) ← bD(g) + σ_b)

Algorithm (Weighted Classification Cascade for Maximizing AMS3)

  for t = 1 to T
    compute background class weight: u_{t−1}^bac ← (u_{t−1}^sig)²/2
    train any weighted classifier: g_t ← approximate minimizer of the weighted classification error bD(g) u_{t−1}^bac + s̃D(g) u_{t−1}^sig
    update signal class weight: u_t^sig ← sD(g_t)/bD(g_t)
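
Relative to the AMS2 cascade sketched earlier, only the two weight formulas change; a hedged sketch of the swap (names are mine):

```python
import numpy as np

# AMS2 cascade: weights come from the conjugate f2*(u) = e^u - u - 1
u_bac_ams2 = lambda u_sig: np.exp(u_sig) - u_sig - 1.0
u_sig_ams2 = lambda s, b: np.log(s / b + 1.0)

# AMS3 cascade: swap in the conjugate of f3(t) = t^2/2, i.e. f3*(u) = u^2/2
u_bac_ams3 = lambda u_sig: u_sig ** 2 / 2.0
u_sig_ams3 = lambda s, b: s / b
```

Everything else in the cascade loop, including the weighted classification step, is unchanged.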


SLIDE 11

HiggsML Challenge

HiggsML Challenge Case Study

Cascading in the Wild
• So far: a recipe for turning a classifier into a training-AMS maximizer
• Must be coupled with effective regularization strategies to ensure adequate test set generalization
• Team mymo incorporated two practical variants of cascading into its HiggsML challenge solution, placing 31st out of 1800 teams

Cascading Variant 1
• Fit each classifier g_t using the XGBoost implementation of gradient tree boosting¹
• To curb overfitting, computed true and false positive counts on a held-out dataset D_val and updated the class weight parameter u_t^sig using s_{D_val}(g_t) and b_{D_val}(g_t) in lieu of sD(g_t) and bD(g_t)

¹ https://github.com/tqchen/xgboost
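
A hedged sketch of Variant 1 using XGBoost's scikit-learn wrapper (the hyperparameters and names are assumptions of mine, not the team's settings):

```python
import numpy as np
import xgboost as xgb

def cascade_ams2_heldout(X, y, X_val, y_val, T=10, u_sig=1.0):
    """AMS2 cascade with weight updates computed on held-out data."""
    g = None
    for t in range(T):
        u_bac = np.exp(u_sig) - u_sig - 1.0
        w = np.where(y == 1, u_sig, u_bac)
        g = xgb.XGBClassifier(n_estimators=100, max_depth=6)
        g.fit(X, (y == 1).astype(int), sample_weight=w)   # XGBoost wants {0,1} labels
        pred = g.predict(X_val)                           # predictions in {0,1}
        s = np.sum((pred == 1) & (y_val == 1))            # held-out true positives
        b = np.sum((pred == 1) & (y_val == -1))           # held-out false positives
        if b == 0:
            break
        u_sig = np.log(s / b + 1.0)                       # update from held-out counts
    return g
```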


SLIDE 12

HiggsML Challenge

HiggsML Challenge Case Study

Cascading in the Wild (continued)

Cascading Variant 2
• Maintained a single persistent classifier whose complexity grew on each cascade round
• Developed a customized XGBoost classifier that, on cascade round t, introduced a single new decision tree based on the gradient of the round-t weighted classification error
• In effect, each classifier g_t was warm-started from the prior round's classifier g_{t−1}
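
One way to realize this warm start with the stock XGBoost API, as a hedged sketch (continuing from an existing booster via `xgb_model` is standard XGBoost; everything else is my approximation of the customized classifier described above):

```python
import numpy as np
import xgboost as xgb

def cascade_warm_start(X, y, T=50, u_sig=1.0):
    """Grow one new tree per cascade round, reweighting classes between rounds."""
    labels = (y == 1).astype(int)
    booster = None
    for t in range(T):
        u_bac = np.exp(u_sig) - u_sig - 1.0
        w = np.where(y == 1, u_sig, u_bac)
        dtrain = xgb.DMatrix(X, label=labels, weight=w)
        # add exactly one new tree, continuing from the previous booster
        booster = xgb.train({"objective": "binary:logistic", "max_depth": 6},
                            dtrain, num_boost_round=1, xgb_model=booster)
        pred = booster.predict(dtrain) > 0.5              # probabilities -> labels
        s = np.sum(pred & (y == 1))
        b = np.sum(pred & (y == -1))
        if b == 0:
            break
        u_sig = np.log(s / b + 1.0)
    return booster
```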


SLIDE 13

HiggsML Challenge

HiggsML Challenge Case Study

Cascading in the Wild (continued)

Final Solution
• Ensemble of cascade procedures of each variant and several non-cascaded (standard two-stage / hand-tuned) XGBoost, random forest, and neural network models
• The ensemble of all non-cascade models yielded a private leaderboard score of 3.67 (roughly 198th place)
• Each cascade variant alone yielded 3.65
• Incorporating the cascade models into the ensemble yielded 3.72594


SLIDE 14

The Future

Beyond the HiggsML Challenge

Next Steps
• More comprehensive, controlled empirical evaluation of cascading
• More extensive exploration of strategies for ensuring good generalization

Thanks!


SLIDE 15

The Future

References I

Cowan, Glen, Cranmer, Kyle, Gross, Eilam, and Vitells, Ofer. Asymptotic formulae for likelihood-based tests of new physics. The European Physical Journal C - Particles and Fields, 71(2):1–19, 2011.