SLIDE 1

Machine Learning Paradigms for Utility Based Data Mining

Naoki Abe
Data Analytics Research, Mathematical Sciences Department
IBM T. J. Watson Research Center

SLIDE 2

Contents

  • Learning Models and Utility

– Learning Models
– Utility-based Versions

  • Case Studies

– Example-dependent Cost-sensitive Learning
– On-line Active Learning
– One-Benefit Cost-sensitive Learning
– Batch vs. On-line Reinforcement Learning

  • Applications
  • Discussions
SLIDE 3

(Standard) Batch Learning Model

Target Function F: Input → Output
Examples X1, X2, ..., Xt drawn from Distribution D; Learner receives F(X1), F(X2), ..., F(Xt) and outputs Model H
Learner's Goal: Minimize Error(H, F) for given t

e.g.) PAC-Learning Model [Valiant '84]

PAC-Learning: Pr{ E_{x~D}[ H(x) ≠ F(x) ] > ε } < δ

SLIDE 4

(Utility-based) Batch Learning Model

Target Function F: Input → Output
Examples X1, X2, ..., Xt drawn from Distribution D; Learner receives F(X1), F(X2), ..., F(Xt) and outputs Model H
Learner's Goal: Minimize Loss(H, F) for given t

e.g.) Decision Theoretic Generalization of PAC Learning* [Haussler '92]

Generalized-PAC-Learning: Pr{ E_{x~D}[ l(H(x), F(x)) ] > ε } < δ

*Subsumes the cost-matrix formulation of cost-sensitive learning, but not the example-dependent cost formulation

SLIDE 5

Active Learning Model

Target Function F: Input → Output
Active Learner chooses examples X1, X2, ..., Xt, is given their labels/values F(X1), F(X2), ..., F(Xt), and outputs Model H
Active Learner's Goal: Minimize err(H, F) for given t (or minimize t for given err(H, F))

e.g.) MAT-learning model [Angluin '88]: Minimize t to achieve err(H, F) = 0, assuming that F belongs to a given class

SLIDE 6

(Utility-based) Active Learning Model

Target Function F: Input → Output
Active Learner chooses examples X1, X2, ..., Xt, is given F(X1), F(X2), ..., F(Xt), and outputs Model H
Active Learner's Goal: Minimize cost(H, F) + Σ_i cost(Xi) for given t

c.f.) Active feature value acquisition [Melville et al '04, '05]*
*Not subsumed, since acquisition of individual feature values is considered

SLIDE 7

On-line Learning Model

Target Function F: Input → Output
On-line Learner receives X1, X2, ..., Xt (chosen by an Adversary), predicts F̂(X1), F̂(X2), ..., F̂(Xt), and then observes F(X1), F(X2), ..., F(Xt)
On-line Learner's Goal: Minimize Cumulative Error Σ_i err(F̂(Xi), F(Xi))

e.g.) Mistake Bound Model [Littlestone '88], Expert Model [Cesa-Bianchi et al '97]: minimize the worst-case cumulative error

Σ_{i=1}^{t} | F̂(x_i) - F(x_i) |

SLIDE 8

(Utility-based) On-line Learning Model

Target Function F: Input → Output
On-line Learner receives X1, X2, ..., Xt (chosen by an Adversary), predicts F̂(X1), F̂(X2), ..., F̂(Xt), and then observes F(X1), F(X2), ..., F(Xt)
On-line Learner's Goal: Minimize Σ_i Loss(F̂(Xi), F(Xi))

e.g.) On-line loss bound model [Yamanishi '91]

SLIDE 9

On-line Active Learning (Associative Reinforcement Learning*)

Environment F: Action → Reward
At each trial, the Actor (Learner) chooses one of the given alternatives Xi,1, Xi,2, ..., Xi,n and receives the corresponding reward
Actor's Goal: Maximize Cumulative Rewards Σ_i F(Xi)
(F(Xi) can incorporate cost(Xi): this is already a utility-based model!)

e.g.) Bandit Problem [BF '85], Associative Reinforcement Learning [Kaelbling '94], Apple Tasting [Helmbold et al '92], Lob-Pass [Abe & Takeuchi '93], Linear Function Evaluation [Long '97, Abe & Long '99, ABL '03]

*Also known as "Reinforcement Learning with Immediate Rewards"

SLIDE 10

Reinforcement Learning

Environment F: State, Action → State
Environment R: State, Action → Reward
(Markov Decision Processes)
The Actor (Learner) chooses one action, receives the corresponding reward, and moves to another state (states S1, S2, ..., St; actions A1, A2, ..., At; rewards R1, R2, ..., Rt)
Actor's Goal: Maximize Cumulative Rewards Σ Ri (or Σ γ^i Ri)

e.g.) Reinforcement Learning for Active Model Selection [KG '05]; Pruning improves cost-sensitive learning [B-Z, D '02]
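
The slide above only states the actor's objective. As a concrete illustration (not part of the original slides), the minimal sketch below shows one standard way an on-line actor can pursue it: tabular Q-learning with an ε-greedy choice of actions. The environment interface (`reset`, `actions`, `step`) and all parameter values are assumptions made for the sketch.

```python
import random
from collections import defaultdict

def q_learning(env, episodes=100, alpha=0.1, gamma=0.9, epsilon=0.1):
    """Tabular Q-learning: the actor chooses an action, receives a reward,
    and moves to another state, maximizing discounted cumulative reward."""
    Q = defaultdict(float)  # Q[(state, action)] -> estimated value

    def greedy(state, actions):
        return max(actions, key=lambda a: Q[(state, a)])

    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            actions = env.actions(state)
            if random.random() < epsilon:
                action = random.choice(actions)      # explore
            else:
                action = greedy(state, actions)      # exploit
            next_state, reward, done = env.step(action)
            best_next = 0.0 if done else max(Q[(next_state, a)]
                                             for a in env.actions(next_state))
            # one-step Q-learning update toward the bootstrapped target
            Q[(state, action)] += alpha * (reward + gamma * best_next
                                           - Q[(state, action)])
            state = next_state
    return Q
```

The ε-greedy choice is the simplest way to handle the exploration vs. exploitation trade-off that recurs throughout this talk; the later slides discuss more refined strategies.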

SLIDE 11

Contents

  • Learning Models and UBDM

– Learning Models
– Utility-based Versions

  • Case Studies

– Example-dependent Cost-sensitive Learning
– One-Benefit Cost-Sensitive Learning
– On-line Active Learning
– Batch vs. On-line Reinforcement Learning

  • Applications
  • Discussions
SLIDE 12

Example Dependent Cost-Sensitive Learning [ZE’01,ZLA’03]

Cost Distribution: Input → (Label → Cost)
Instance Distribution → Input
Learner receives X1, X2, ..., Xt drawn from Distribution D, together with their cost information C1, C2, ..., Ct, and outputs a Policy h: X → Y

PAC Cost-sensitive Learning [ZLA '03]:

Pr{ E_{(x,y,c)~D}[ c · I(h(x) ≠ y) ] - min_{f∈H} Cost(f) > ε } < δ

  • A key property of this model is that the learner must learn the utility function from data
  • Distributional modeling has led to a simple but effective method with a theoretical guarantee
  • The full cost knowledge model works for 2-class or cost-matrix formulations, but not for the example-dependent cost formulation
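
The "simple but effective method" referenced above appears to be cost-proportionate example weighting via rejection sampling ("costing") in the spirit of [ZLA '03]; the sketch below is an illustrative reconstruction of that idea, not the authors' exact code. The `base_learner` is assumed to be any classifier factory with a scikit-learn-style `fit` method.

```python
import random

def costing_sample(examples, max_cost):
    """Cost-proportionate rejection sampling: keep (x, y) with probability
    c / max_cost, so that ordinary error minimization on the accepted sample
    approximates expected-cost minimization under the original distribution."""
    accepted = []
    for x, y, c in examples:          # each example carries its own cost c
        if random.random() < c / max_cost:
            accepted.append((x, y))
    return accepted

def train_cost_sensitive(examples, base_learner, n_resamples=10):
    """Train several classifiers on independent rejection samples;
    combining them reduces the variance of any single small sample."""
    max_cost = max(c for _, _, c in examples)
    models = []
    for _ in range(n_resamples):
        sample = costing_sample(examples, max_cost)
        X = [x for x, _ in sample]
        Y = [y for _, y in sample]
        models.append(base_learner().fit(X, Y))
    return models
```

At prediction time one would average or vote over `models`, as in the ensemble version of costing.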

SLIDE 13

One Benefit (Cost-Sensitive) Learning [Zadrozny’03,’05]

Cost Distribution: Input, Label → Cost
Instance Distribution → Input: x1, x2, ..., xt
Sampling Policy: Input → Label
Learner observes (y1, C1), (y2, C2), ..., (yt, Ct) and outputs a Policy h: X → Y (w.r.t. Distribution D)

*Key property is that the learner gets to observe the utility corresponding only to the action (option/decision) it took

SLIDE 14

One Benefit Cost-Sensitive Learning [Zadrozny’03,’05]

Cost Distribution: Input, Label → Cost
Instance Distribution → Input: x1, x2, ..., xt
Sampling Policy: Input → Label
Learner observes (y1, C1), (y2, C2), ..., (yt, Ct) and outputs a Learned Policy h: Input → Label
Learner's Goal: Minimize Cost(h) w.r.t. Distribution D

*Key property is that the learner gets to observe the utility corresponding only to the action (option/decision) it took
*Another key property is that the sampling policy and the learned policy differ
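
Since only the cost of the chosen label is observed and the sampling policy differs from the policy being learned, a standard correction is to reweight logged examples by the inverse of the sampling policy's probability of the chosen label. The sketch below estimates a candidate policy's expected cost this way; the function names and interfaces are assumptions for illustration, not Zadrozny's exact procedure.

```python
def expected_cost_ips(logged_data, policy, sampling_prob):
    """Inverse-propensity estimate of the expected cost of `policy`
    from data logged under a different sampling policy.

    logged_data   : iterable of (x, y_taken, cost_observed)
    policy        : function x -> label (the policy being evaluated)
    sampling_prob : function (y, x) -> probability that the sampling
                    policy chose label y on input x
    """
    total, n = 0.0, 0
    for x, y, cost in logged_data:
        n += 1
        if policy(x) == y:
            # only examples where the two policies agree contribute,
            # reweighted so the estimate is unbiased w.r.t. the instance distribution
            total += cost / sampling_prob(y, x)
    return total / n
```

A learner can then search over candidate policies h for the one minimizing this estimate.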

SLIDE 15

An Example On-line Active Learning Model: Linear Probabilistic Concept Evaluation

– Select one from a number of alternatives
– Success probability = Linear Function(Attributes)
– Performance evaluation for the Learner/Selector

At each trial, the Actor (Learner/Selector) is shown alternatives as attribute vectors, e.g., Alternative 1: (1,1,0,1), Alternative 2: (0,0,1,0), Alternative 3: (0,1,0,1), Alternative 4: (1,0,0,1); it selects one, e.g., Alternative 1, and observes Success OR Failure as the reward

Linear Function: F(x) = Σ_i wi xi
E(Regret) = E(Optimal Rewards, if you knew function F) - E(Cumulative Rewards)
Actor's Goal: Maximize Total Rewards!

[Abe and Long '99]

SLIDE 16

An Example On-line Learning/Selection Method [AL’99]

  • Strategy A

– Learning: Widrow-Hoff Update with Step Size a_t = 1/t^{1/2}
– Selection:
  • Explore: Select J (≠ I*) with prob. ∝ 1/|F̂(I*) - F̂(J)|
  • Exploit: Otherwise select I* with max estimated success probability
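
As a rough illustration only, here is one way a strategy of this flavor could be coded, assuming the 1/√t step size and gap-inverse exploration probabilities as reconstructed above; the capping constants and the `observe_outcome` callback are invented for the sketch, and this is not the exact algorithm analyzed in [AL '99].

```python
import math
import random

def select_and_update(alternatives, w, t, observe_outcome):
    """One trial: estimate success probabilities with the linear model w,
    occasionally explore a non-best alternative with probability inversely
    related to its estimated gap, then do a Widrow-Hoff (LMS) update."""
    scores = [sum(wi * xi for wi, xi in zip(w, x)) for x in alternatives]
    best = max(range(len(alternatives)), key=lambda j: scores[j])

    # exploration weights ~ 1 / |estimated gap|, zero for the current best
    weights = [0.0 if j == best else 1.0 / max(abs(scores[best] - s), 1e-3)
               for j, s in enumerate(scores)]
    explore_prob = min(0.5, sum(weights) / math.sqrt(t))  # decays with t
    if sum(weights) > 0 and random.random() < explore_prob:
        chosen = random.choices(range(len(alternatives)), weights=weights)[0]
    else:
        chosen = best   # exploit the max estimated success probability

    reward = observe_outcome(alternatives[chosen])  # 1 = success, 0 = failure
    eta = 1.0 / math.sqrt(t)                        # step size 1/t^(1/2)
    err = reward - scores[chosen]
    x = alternatives[chosen]
    w = [wi + eta * err * xi for wi, xi in zip(w, x)]  # Widrow-Hoff update
    return chosen, w
```

The point of the gap-inverse weighting is that alternatives whose estimated value is close to the leader's are explored often (they may actually be better), while clearly inferior ones are rarely tried.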

SLIDE 17
Performance Analysis

Bounds on Worst-Case Expected Regret (Theorem [AL'99])

  • Upper Bound on Expected Regret
    – Learning Strategy A: Expected Regret = O(t^{3/4} n^{1/2})
  • Lower Bound on Expected Regret
    – Expected Regret of any Learner = Ω(t^{3/4} n^{1/4})

The expected regret of Strategy A is asymptotically optimal as a function of t!

SLIDE 18

One-Benefit Cost-Sensitive Learning [Zadrozny ’05] as On-line Active Learning

At each trial, the On-line Actor (Learner/Selector) is shown alternative vectors, e.g., (1,1,0,1), (1,1,0,2), (1,1,0,3), (1,1,0,4); it selects one, e.g., Alternative 3: (1,1,0,3), and receives the corresponding Benefit as the reward
Linear Function: F(x) = Σ_i wi xi
Actor's Goal: Maximize Total Benefits!

"One-Benefit Cost-Sensitive Learning" [Z'05] could be thought of as a "batch" version of on-line active learning

  • Each alternative consists of the common x-vector and a variable y-label
  • Alternative Vectors: (X, Y1), (X, Y2), (X, Y3), ..., (X, Yk)

SLIDE 19

One-Benefit (Cost-Sensitive) Learning [Z'05] as Batch Random-Transition Reinforcement Learning*

Environment F: → State x (the next state is drawn independently of the action: a random transition)
Environment R: State x, Action y → Reward r
The Actor (Policy: x → y) chooses one action y depending on state x and receives the corresponding reward (states x1, x2, ..., xt; actions y1, y2, ..., yt; rewards r1, r2, ..., rt)

On-line Learner's Goal: Maximize Cumulative Rewards Σ ri
Batch Learner's Goal: Find a policy F s.t. the expected reward E_D[R(x, F(x))] is maximized, given data generated w.r.t. sampling policy P(y|x)

*Called "Policy Mining" in Zadrozny's thesis ['03]

SLIDE 20

On-line vs. Batch Reinforcement Learning

Transition T: State, Action → State
Environment R: State, Action → Reward
The Actor (Policy F: S → A) chooses one action a depending on state s, receives the corresponding reward, and moves to another state (states S1, S2, ..., St; actions A1, A2, ..., At; rewards R1, R2, ..., Rt)

On-line Learner's Goal: Maximize Cumulative Rewards Σ Ri
Batch Learner's Goal: Find a policy F s.t. the expected reward E_T[R(s, F(s))] is maximized, given data generated w.r.t. sampling policy P(a|s)
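
To make the on-line vs. batch contrast concrete, here is a small illustrative sketch (not from the slides) of the batch side: estimate a tabular Q from a fixed log of (s, a, r, s') transitions gathered under the sampling policy P(a|s), then read off a greedy policy. The names and the tabular representation are assumptions made for the sketch.

```python
from collections import defaultdict

def batch_q_from_log(transitions, gamma=0.9, sweeps=50):
    """Batch RL: learn a tabular Q from a fixed log of (s, a, r, s') transitions
    collected under a sampling policy P(a|s), then act greedily w.r.t. Q.
    Contrast with the on-line setting, where the actor gathers its own data."""
    actions_at = defaultdict(set)
    for s, a, _, _ in transitions:
        actions_at[s].add(a)

    Q = defaultdict(float)
    for _ in range(sweeps):
        # accumulate one-step backup targets for every logged (s, a)
        targets, counts = defaultdict(float), defaultdict(int)
        for s, a, r, s_next in transitions:
            best_next = max((Q[(s_next, b)] for b in actions_at[s_next]),
                            default=0.0)
            targets[(s, a)] += r + gamma * best_next
            counts[(s, a)] += 1
        for key in targets:
            Q[key] = targets[key] / counts[key]   # empirical mean of the backup

    greedy_policy = {s: max(acts, key=lambda a: Q[(s, a)])
                     for s, acts in actions_at.items()}
    return Q, greedy_policy
```

Because the log was produced by some other policy, evaluating the greedy policy found this way requires the kind of bias correction discussed later in the talk.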

SLIDE 21

Contents

  • Learning Models and Utility

– Learning Models
– Utility-based Versions

  • Case Studies

– Example-dependent Cost-sensitive Learning
– One-Benefit Cost-Sensitive Learning
– On-line Active Learning
– Batch vs. On-line Reinforcement Learning

  • Applications
  • Discussions
SLIDE 22

Internet Banner Ad Targeting [LNKAK’98,AN’98]

  • Learn Fit Between Ads and Keywords/Pages
  • Display a Toyota Ad on keyword ‘drive’
  • Display a Disney Ad on animation page
  • The goal is to maximize the total click-throughs

e.g., Search Keyword 'drive' → Car Ad

SLIDE 23

A Solution with On-line Active Learning

At each trial, the Ad Targeting Engine (Learner/Selector) is shown candidate ads as attribute vectors, e.g., Ad 1: (1,1,0,1), Ad 2: (0,0,1,0), Ad 3: (0,1,0,1), Ad 4: (1,0,0,1); it selects one, e.g., Ad 1: (1,0,0,1), and observes Click OR Non-Click as the reward
Linear Function: F(x) = Σ_i wi xi
Ad Targeter's Goal: Maximize Total Click-throughs!

  • Represent Click-through Rates as a Linear Function of Ad/User Attribute Vectors
  • Ad/User Attribute Vector = (A1·U1, A2·U1, A1·U2, A2·U2)
  • Key issue is the Exploration-Exploitation Trade-Off!
SLIDE 24

A Simpler Solution Using Gittins Index for Bandit Problem

Gittins indices, as a function of #clicks (columns) and #non-clicks (rows):

#non-clicks \ #clicks    1     2     3     4     5     6     7
          0            0.84  0.91  0.94  0.95  0.96  0.96  0.97
          1            0.53  0.71  0.78  0.82  0.85  0.87  0.88
          2            0.37  0.56  0.66  0.71  0.75  0.78  0.80
          3            0.28  0.46  0.56  0.62  0.67  0.71  0.74
          4            0.22  0.39  0.48  0.55  0.60  0.64  0.68
          5            0.17  0.33  0.43  0.49  0.55  0.59  0.62
          6            0.15  0.29  0.38  0.45  0.50  0.54  0.58

Gittins Index G(α, β) = p such that (with α = #clicks, β = #non-clicks):
  discounted cumulative reward of a known arm paying p = discounted cumulative reward of the uncertain arm (α, β)
i.e.
  p / (1 - γ) = (α / (α + β)) · (1 + γ · R(α + 1, β, p)) + (β / (α + β)) · γ · R(α, β + 1, p)

Empirical Results [AN'98] (LP with Gittins modification)
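
The indices in the table above can be computed numerically from the indifference equation just given: fix a candidate p, evaluate the value of continuing with the uncertain arm over a finite horizon, and binary-search for the p at which the two sides match. The sketch below is a rough illustration under assumed values for the discount factor, horizon, and prior (it uses the raw empirical click rate), not the exact computation behind the table.

```python
from functools import lru_cache

def gittins_index(a, b, gamma=0.95, horizon=200, tol=1e-4):
    """Approximate Gittins index G(a, b) for a Bernoulli arm with `a` clicks
    and `b` non-clicks observed (assumes a + b >= 1): the per-step reward p of
    a known arm that makes us indifferent between it and the uncertain arm."""

    def continue_value(p):
        # Value of pulling the uncertain arm now, then acting optimally,
        # when a known arm paying p is available; truncated at `horizon`.
        @lru_cache(maxsize=None)
        def R(x, y):
            if x + y >= horizon:
                return p / (1.0 - gamma)           # beyond the horizon, take p
            q = x / (x + y)                        # empirical success rate
            cont = q * (1.0 + gamma * R(x + 1, y)) + (1.0 - q) * gamma * R(x, y + 1)
            return max(p / (1.0 - gamma), cont)    # may switch to the known arm
        q = a / (a + b)
        return q * (1.0 + gamma * R(a + 1, b)) + (1.0 - q) * gamma * R(a, b + 1)

    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        p = (lo + hi) / 2.0
        # indifference point: continuing is worth exactly p / (1 - gamma)
        if continue_value(p) > p / (1.0 - gamma):
            lo = p
        else:
            hi = p
    return (lo + hi) / 2.0
```

Precomputing such a table once lets the ad server pick, at each impression, the ad with the highest index for its current (click, non-click) counts, which is the "simpler solution" this slide contrasts with the linear-model approach.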

SLIDE 25

Maximizing Customer Lifetime Value by Batch Reinforcement Learning [PAZ…'02, AVAS'04]

§ Model the CRM process as a "Markov Decision Process" (MDP)
§ Customer is in some "state" (his/her attributes) at any point in time
§ Enterprise's action will move the customer into another state
§ Enterprise's goal is to take a sequence of actions that guides the customer's path so as to maximize the customer's lifetime value
§ Produce optimized targeting rules as a policy: if the customer is in state "s", then take marketing action "a"
§ Customer state "s" is represented by a customer attribute vector computed from data
§ Batch Reinforcement Learning is applied to past data collected by the sampling policy

[Diagram: Typical CRM Process: customer states such as One Timer, Bargain Hunter, Repeater, Potentially Valuable, Valuable Customer, Loyal Customer, and Defector, linked by marketing campaigns A-E]

SLIDE 26

Bias Correction in Evaluation

  • Key challenge is the bias correction due to batch learning:
    – Need to evaluate the new policy using data collected by the existing (sampling) policy
  • Solution: Use bias-corrected estimation of the "policy advantage" using data collected by the sampling policy
  • Definition of policy advantage:
    – (Discrete Time) Advantage: A_π(s, a) := Q_π(s, a) - max_{a'} Q_π(s, a')
    – Policy Advantage: A_π(π') := E_π[ E_{a~π'}[ A_π(s, a) ] ]
  • Estimating policy advantage with bias-corrected sampling:
    – A_π(π') := E_π[ (π'(a|s) / π(a|s)) · A_π(s, a) ]

[Figure: Policy advantage (percentage) over the actual policy vs. learning iterations, on Saks Fifth Avenue data]
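
Following the definitions above, the bias-corrected policy advantage can be estimated directly from (state, action) pairs logged under the sampling policy by reweighting with the ratio π'(a|s)/π(a|s). The sketch below is illustrative; it assumes the advantage values and both policies' action probabilities are available as functions, and is not the exact estimator used in the cited work.

```python
def estimate_policy_advantage(logged_data, new_policy_prob, sampling_prob, advantage):
    """Bias-corrected estimate of the advantage of a new policy pi' over the
    sampling policy pi, using only (state, action) pairs logged under pi:
        A_pi(pi') ~ mean over (s, a) of (pi'(a|s) / pi(a|s)) * A_pi(s, a)
    """
    total, n = 0.0, 0
    for s, a in logged_data:
        weight = new_policy_prob(a, s) / sampling_prob(a, s)   # importance ratio
        total += weight * advantage(s, a)
        n += 1
    return total / n
```

A positive estimate suggests the new policy would outperform the one that generated the data, without having to deploy it first.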
SLIDE 27

An Example Rule (that addresses the Exploration-Exploitation Trade-off)

If … then don't mail

  • Interpretation: If a customer has spent significantly in the past and yet has not spent much in the current division (product group), then don't mail

This rule suggests that the enterprise wait until it has seen enough of the customer's behavior to judge that he or she is not interested in a given product group, i.e., it invests in the customer until it knows it is not worth it

SLIDE 28

Contents

  • Learning Models and UBDM

– Learning Models
– Utility-based Versions

  • Concrete Examples

– Example-dependent Cost-sensitive Learning
– On-line Active Learning
– One-Benefit Cost-Sensitive Learning
– Batch vs. On-line Reinforcement Learning

  • Applications
  • Discussions
SLIDE 29

Discussions

  • Machine Learning Paradigms vs. Utility-based Data Mining

– Practical considerations lead to refinement and extension of existing learning models (Details matter!)

  • Utility-based Data Mining as

– "On-line" Reinforcement Learning and special cases thereof?
– "Batch" Reinforcement Learning and special cases thereof?

  • Issues

– "On-line": Exploration vs. Exploitation Trade-off
– "Batch": Bias Correction
– Combining the two (!)

SLIDE 30

References

Classic Learning Models in Computational Learning Theory
  • L. G. Valiant, 'A Theory of the Learnable', Communications of the ACM, pp. 1134-1142, 1984.
  • D. Haussler, 'Decision theoretic generalizations of the PAC model for neural net and other learning applications', Information and Computation, 100(1), 78-150, 1992.
  • D. Angluin, 'Queries and concept learning', Machine Learning, 2, 319-342, 1987.
  • N. Littlestone, 'Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm', Machine Learning, 2, 285-318, 1988.
  • N. Cesa-Bianchi et al, 'How to use expert advice', Journal of the ACM, 44(3), 427-485, May 1997.

Online Active Learning
  • L. P. Kaelbling, 'Associative Reinforcement Learning: Functions in k-DNF', Machine Learning, 15(3), 279-298, 1994.
  • D. A. Berry and B. Fristedt, Bandit Problems: Sequential Allocation of Experiments, Chapman & Hall, London, 1985.
  • N. Abe, A. Biermann, and P. Long, 'Reinforcement Learning with Immediate Rewards and Linear Hypotheses', Algorithmica, 37, 263-293, 2003.
  • J. Takeuchi, N. Abe and S. Amari, 'The Lob-Pass Problem', Journal of Computer and System Sciences, 61(3), 2000.

Cost-sensitive Learning and Economic Learning
  • B. Zadrozny, 'One-Benefit Learning: Cost-Sensitive Learning with Restricted Cost Information', this volume.
  • B. Zadrozny and C. Elkan, 'Learning and Making Decisions When Costs and Probabilities are Both Unknown', KDD'01.
  • P. Melville et al, 'Economical active feature-value acquisition through expected utility estimation', this volume.
  • F. Provost, 'Toward Economic Machine Learning and Utility-based Data Mining', this volume.

Applications
  • N. Abe and A. Nakamura, 'Learning to Optimally Schedule Banner Ads..', ICML'99.
  • E. Pednault et al, 'Sequential Cost Sensitive Decision Making with Reinforcement Learning', KDD'02.
  • N. Abe, N. Verma, C. Apte and R. Schroko, 'Cross Channel Optimized Marketing by Reinforcement Learning', KDD'04.