Road Map When Costs and Probabilities are Both Unknown Problem - PowerPoint PPT Presentation

Learning and Making Decisions Road Map When Costs and Probabilities are Both Unknown • Problem and Challenge Formulation • Preliminary Knowledge • Cost-Sensitive Learning Methods • Probability Estimation • Donation Amount Estimation • Experiments • Contributions Bianca Zadrozny and Charles Elkan Presenter: Aurora Problem Formulation Challenge Formulation KDD’98 charitable donations dataset: • Cost-sensitive problem Actual non-donor Actual donor • Probabilities and costs are not independent Predict non-donor 0 0 random variables. Predict donor (mail) -0.68 y(x) – 0.68 • The training examples for which costs are known cost of soliciting = $ 0.68 are not representative of all examples. (sample Benefit of soliciting = P(j = 0|x)B(1,0,x) + P(j = 1|x)B(1,1,x) selection bias) = (1 – P(j = 1|x))(-0.68) + P(j = 1|x)(y(x) – 0.68) = P(j = 1|x)y(x) – 0.68 If P(j = 1|x)y(x) > 0.68, then make a solicitation. The probability a person donate is about 5%.

Preliminary Knowledge Preliminary Knowledge Bias vs. Variance Bagging • Bias: This quantity measures how closely the learning algorithm’s average guess (over all possible training sets of the given training set size) matches the target. Bagging votes classifiers generated by different • Variance: This quantity measures how much the learning algorithm’s bootstrap samples. A bootstrap sample is generated by guess fluctuates for the different training sets of the given size. uniformly sampling m instances from the training set with Stable vs. Unstable Classifier replacement. T bootstrap samples B 1 , B 2 , …, B T are generated and a classifier C i is built from each B i . A final Unstable Classifier : Small perturbations in the training set or in construction may result in large changes in the constructed predictor. classifier C is built from C 1 , C 2 , …, C T by voting. • Unstable Classifiers : Decision Tree, ANN Characteristically have high variance and low bias. Bagging can reduce the variance of unstable classifiers. • Stable Classifiers : Naïve Bayes, KNN Have low variance, but can have high bias. Cost-Sensitive Learning Methods Why bagging is not suitable for estimating conditional probability? ---- compare with previous work 1. Bagging gives voting estimates that measure the ÿ P(j|x) C(i,j,x) stability of the classifier learning method at an example, not the actual class conditional probability • MetaCost of the example. – Train ÿ P(j|x) C(i,j,x) estimator for each example. How does bagging in MetaCost work? – Assumption: costs are known in advance and are the same Eg: Among n sub-classifiers, k of them give class label 1 for x, then for all examples. P(j = 1|x) = k / n. – Use bagging to estimate probabilities. My solution: • Direct Cost-Sensitive Decision-Making Use the average of the probabilities over all sub-classifiers as the final probability. – Train P(j|x) estimator and C(i,j,x) estimator for each example. 2. Bagging can reduce the variance of the final – Cost is unknown for test data and example-dependent. classifier by combining several classifiers, but can – Use decision tree to estimate probabilities. not remove the bias of each sub-classifier.

Raw Decision Tree Conditional Probability Probability Estimation Estimation ---- Obtain calibrated probability estimation from Assign p = k/n as the conditional probability for each example that is assigned to a decision tree leaf that contains k positive training examples decision tree and Naïve Bayesian and n total training examples. • Decision Tree Unstable Deficiencies of Decision Tree – Smoothing • High bias: Decision tree growing methods try to make leaves homogeneous, so observed frequencies are systematically shifted towards – Curtailment zero and one. Smoothing • Naïve Bayesian Stable • High variance: When the number of training examples associated with a – Binning leaf is small, observed frequencies are not statistically reliable. Curtailment, • Averaging Probability Estimators (Not pruning, because pruning is based on error rate minimization, not cost minimization.) Smoothing One way to improve the probability estimation of decision tree is to make these estimation less extreme. replace by Base rate: b = 0.05 m = 10 m = 100 As m increases, probabilities are shifted more towards the base rate.

Deficiency of Naïve Bayes Curtailment Assumption: Attributes of examples are independent. If the parent of a small leaf contains enough examples to induce a statistically reliable Actually attributes tend to be correlated, so scores are typically too extreme: probability estimate, then assigning this estimate to a test example associated with for n(x) near 0, n(x) < P(j = 1|x) (positively correlated) the leaf may be more accurate then assigning it a combination of the base rate and the observed leaf frequency. for n(x) near 1, n(x) > P(j = 1|x) (negatively correlated) Solution: Binning The probability that x belongs to class j is the fraction of training examples in the bin that actually belong to j, which is represented with the blue star. Averaging Probability Estimators Donation Amount Estimation Intuition: • Least-Squares Multiple Linear Regression is the variance of each original classifier. Linear is the correlation factor N is the number of classifiers • Variance of Heckman’s Method If the classifiers are partially uncorrelated, --- Sample Selection Bias variance can be reduced by averaging Non-Linear combination.

Sample Selection Bias Least-Squares Multiple Linear Regression Definition: The training examples used to learn a model are drawn from a different probability distribution than the examples to which the model is Two attributes: applied. Lastgift: dollar amount of the most recent donation Situation: The donation amounts estimator is trained based on examples of people who actually donate, but this estimator must then be applied to a different Ampergift: average donation amount in responses to population --- both donors and non-donors. the last 22 promotions Experiment • Heckkman’s solution: 1. Learn a probit linear model to estimate conditional probabilities P(j = 1|x). 2. Estimate y(x) by linear regression using only the training examples x for which j(x) = 1, but including for each x a transformation of the estimated value of P(j = 1|x). direct cost-sensitive decision- making • Bianca’s solution: 1. Instead of using a linear estimator for P(j = 1|x), she uses non- linear estimator decision tree or naïve Bayes classifier. 2. Use a non-linear learning method to obtain an estimator for y(x). Three attributes: MetaCost Lastgift, Ampergift, P(j = 1|x) The performance of direct cost-sensitive decision- making is better than MetaCost. While both can be improved by any technique proposed for probability estimation.

Contributions • Provide a cost-sensitive learning method: direct cost- sensitive decision-making, which is better than the previous method MetaCost. • Provide several techniques to improve the performance of probability estimator. • Provide solution to the problem of costs being example-dependent and unknown in general. • Provide solution to the problem of sample selection bias.

Road Map When Costs and Probabilities are Both Unknown Problem - PowerPoint PPT Presentation

Learning and Making Decisions Road Map When Costs and Probabilities are Both Unknown Problem and Challenge Formulation Preliminary Knowledge Cost-Sensitive Learning Methods Probability Estimation Donation Amount

map-D map-D data refined map-D data refined map-D A GPU Database for Real-Time Big Data

Abstract Data Type Map Map ADT Another fundamental abstract data type is the map (also The most

What is a road vehicle? Road & Non-road vehicles in the RVSA The Meaning of Road

Evaluation Map Guide Evaluation Map Guide Evaluation Map Guide Evaluation Map Guide Progress on

var ol3d = new olcs.OLCesium({map: map, target: id}); ol3d.setEnabled(true); var ol3d = new

Pouley Road Rustic Road Designation Pouley Road Rustic Road Designation Process From March 2006 to

Transf ansformat ormation ion Road Map oad Map for Children and Young Peoples Me Ment

Road-map for R &D and Road-map for R &D and Innovation in Higher &

Implementation of the Road Map to 2015; Implications for patients/ consumers Presented by: Emer

Community-School Partnerships Road Map Project Goal The Road Map Project goal is to double the

ROAD MAP TO LOW CARBON WORLD ROAD MAP TO LOW CARBON WORLD BHARTIYA CLIMATE INDEX BHARTIYA

Primary Care Road Map to Recovery Primary Care Practices June 18, 2020 Primary Care Road Map to

The Heros Journey A Road Map for Change A road map Familiar to us all Our story

Measures of Academic Progress (MAP) What is MAP? MAP - Measures of Academic Progress

SHIPPING LANE DENSITY MAP TOP 25 CONTAINER PORTS UNION PACIFIC RAIL MAP I-80 INTERSTATE MAP

Space-time Mapping New ways of exploring and explaining data Andy Eschbacher @MrEPhysics Map

Making Generative Classifiers Robust to Selection Bias Andrew Smith Charles Elkan November

Audio instructions Select Computer audio to use your computers sound OR Select

Staying Safe at Work Teaching PRIDE Workers with Disabilities about Health and Safety on the Job

Finite element methods for Maxwells equations: A local a priori estimate Claudio Rojik Vienna

Industrial survey papers Six reasons to reject them Marco Torchiano, Filippo Ricca CESI 2013

Robust Models in Information Retrieval Nedim Lipka Benno Stein Bauhaus-Universitt Weimar

GPP 501 Microeconomic Analysis for Public Policy Fall 2017 Given by Kevin Milligan Vancouver

Lecture 1 We start by recalling that the tropical variety of an affine variety equipped with an

Road Map When Costs and Probabilities are Both Unknown Problem - PowerPoint PPT Presentation

Learning and Making Decisions Road Map When Costs and Probabilities are Both Unknown Problem and Challenge Formulation Preliminary Knowledge Cost-Sensitive Learning Methods Probability Estimation Donation Amount

map-D map-D data refined map-D data refined map-D A GPU Database for Real-Time Big Data

Abstract Data Type Map Map ADT Another fundamental abstract data type is the map (also The most

What is a road vehicle? Road &amp; Non-road vehicles in the RVSA The Meaning of Road

Evaluation Map Guide Evaluation Map Guide Evaluation Map Guide Evaluation Map Guide Progress on

var ol3d = new olcs.OLCesium({map: map, target: id}); ol3d.setEnabled(true); var ol3d = new

Pouley Road Rustic Road Designation Pouley Road Rustic Road Designation Process From March 2006 to

Transf ansformat ormation ion Road Map oad Map for Children and Young Peoples Me Ment

Road-map for R &amp;D and Road-map for R &amp;D and Innovation in Higher &amp;

Implementation of the Road Map to 2015; Implications for patients/ consumers Presented by: Emer

Community-School Partnerships Road Map Project Goal The Road Map Project goal is to double the

ROAD MAP TO LOW CARBON WORLD ROAD MAP TO LOW CARBON WORLD BHARTIYA CLIMATE INDEX BHARTIYA

Primary Care Road Map to Recovery Primary Care Practices June 18, 2020 Primary Care Road Map to

The Heros Journey A Road Map for Change A road map Familiar to us all Our story

Measures of Academic Progress (MAP) What is MAP? MAP - Measures of Academic Progress

SHIPPING LANE DENSITY MAP TOP 25 CONTAINER PORTS UNION PACIFIC RAIL MAP I-80 INTERSTATE MAP

Space-time Mapping New ways of exploring and explaining data Andy Eschbacher @MrEPhysics Map

Making Generative Classifiers Robust to Selection Bias Andrew Smith Charles Elkan November

Audio instructions Select Computer audio to use your computers sound OR Select

Staying Safe at Work Teaching PRIDE Workers with Disabilities about Health and Safety on the Job

Finite element methods for Maxwells equations: A local a priori estimate Claudio Rojik Vienna

Industrial survey papers Six reasons to reject them Marco Torchiano, Filippo Ricca CESI 2013

Robust Models in Information Retrieval Nedim Lipka Benno Stein Bauhaus-Universitt Weimar

GPP 501 Microeconomic Analysis for Public Policy Fall 2017 Given by Kevin Milligan Vancouver

Lecture 1 We start by recalling that the tropical variety of an affine variety equipped with an

What is a road vehicle? Road & Non-road vehicles in the RVSA The Meaning of Road

Road-map for R &D and Road-map for R &D and Innovation in Higher &