SLIDE 1 NeuroComp Machine Learning and Validation
Michèle Sebag
http://tao.lri.fr/tiki-index.php
SLIDE 2 Validation, the questions
- 1. What is the result ?
- 2. My results look good. Are they ?
- 3. Does my system outperform yours ?
- 4. How to set up my system ?
SLIDE 3
Contents
Position of the problem Background notations Difficulties The learning process The villain Validation Performance indicators Estimating an indicator Testing a hypothesis Comparing hypotheses Validation Campaign The point of parameter setting Racing Expected Global Improvement
SLIDE 4
Contents
Position of the problem Background notations Difficulties The learning process The villain Validation Performance indicators Estimating an indicator Testing a hypothesis Comparing hypotheses Validation Campaign The point of parameter setting Racing Expected Global Improvement
SLIDE 5
Supervised Machine Learning
Context World → instance xi → Oracle ↓ yi Input: Training set E = {(xi, yi), i = 1 . . . n, xi ∈ X, yi ∈ Y} Output: Hypothesis h : X → Y Criterion: few mistakes (details later)
SLIDE 6
Definitions
Example
◮ row: example / case
◮ column: feature / variable / attribute
◮ one particular attribute: class / label
Instance space X
◮ Propositional: X ≡ ℝ^d
◮ Relational: e.g. chemistry (molecule: alanine)
SLIDE 7
Contents
Position of the problem Background notations Difficulties The learning process The villain Validation Performance indicators Estimating an indicator Testing a hypothesis Comparing hypotheses Validation Campaign The point of parameter setting Racing Expected Global Improvement
SLIDE 8
Difficulty factors
Quality of examples / of representation
+ Relevant features (feature extraction)
− Not enough data
− Noise; missing data
− Structured data: spatio-temporal, relational, textual, videos ...
Distribution of examples
+ Independent, identically distributed examples
− Other: robotics; data streams; heterogeneous data
Prior knowledge
+ Constraints on the sought solution
+ Criteria; loss function
SLIDE 9
Difficulty factors, 2
Learning criterion
+ Convex function: a single optimum
Complexity: n, n log n, n²; scalability
− Combinatorial optimization
What is your agenda ?
◮ Prediction performance ◮ Causality ◮ INTELLIGIBILITY ◮ Simplicity ◮ Stability ◮ Interactivity, visualisation
SLIDE 10
Difficulty factors, 3
Crossing the chasm
◮ There exists no killer algorithm ◮ Few general recommendations about algorithm selection
Performance criteria
◮ Consistency
When the number n of examples goes to ∞ and the target concept h* is in H, the algorithm finds ĥ_n with lim_{n→∞} ĥ_n = h*
◮ Convergence speed
||h* − ĥ_n|| = O(1/n), O(1/√n), O(1/ln n)
SLIDE 11
Contents
Position of the problem Background notations Difficulties The learning process The villain Validation Performance indicators Estimating an indicator Testing a hypothesis Comparing hypotheses Validation Campaign The point of parameter setting Racing Expected Global Improvement
SLIDE 12
Context
Related approaches criteria
◮ Data Mining, KDD
scalability
◮ Statistics and data analysis
Model selection and fitting; hypothesis testing
◮ Machine Learning
Prior knowledge; representations; distributions
◮ Optimisation
well-posed / ill-posed problems
◮ Computer Human Interface
No ultimate solution: a dialog
◮ High performance computing
Distributed data; privacy
SLIDE 13 Methodology
Phases
(figure: the successive phases of the process, each annotated with the actors involved: domain expert, DB, statistician, ...)
- 4. Data Mining / Machine Learning
◮ Description: what is in the data?
◮ Prediction: decide for one example
◮ Aggregate: take a global decision
An iterative process
depending on expectations, data, prior knowledge, current results
SLIDE 14
Contents
Position of the problem Background notations Difficulties The learning process The villain Validation Performance indicators Estimating an indicator Testing a hypothesis Comparing hypotheses Validation Campaign The point of parameter setting Racing Expected Global Improvement
SLIDE 15
Supervised Machine Learning
Context World → instance xi → Oracle ↓ yi Input Training set E = {(xi, yi), i = 1 . . . n, xi ∈ X, yi ∈ Y} Tasks
◮ Select hypothesis space H ◮ Assess hypothesis h ∈ H
score(h)
◮ Find best hypothesis h∗
SLIDE 16
What is the point ?
(figure: underfitting vs. overfitting)
The point is not to be perfect on the training set
SLIDE 17
What is the point ?
(figure: underfitting vs. overfitting)
The point is not to be perfect on the training set
The villain: overfitting
(figure: training error keeps decreasing with the complexity of hypotheses, while test error eventually increases)
SLIDE 18
What is the point ?
Good prediction on future instances
Necessary condition: future instances must be similar to the training instances ("identically distributed")
Minimize the (cost of) errors: ℓ(y, h(x)) ≥ 0; not all mistakes are equal.
SLIDE 19 Error: theoretical approach
Minimize the expectation of the error cost:
Minimize E[ℓ(y, h(x))] = ∫ ℓ(y, h(x)) p(x, y) dx dy
SLIDE 20 Error: theoretical approach
Minimize the expectation of the error cost:
Minimize E[ℓ(y, h(x))] = ∫ ℓ(y, h(x)) p(x, y) dx dy
Principle: if h "is well-behaved" on E, and h is "sufficiently regular", then h will be well-behaved in expectation:
E[F] ≤ (1/n) Σ_{i=1..n} F(x_i) + c(F, n)
SLIDE 21
Classification, Problem posed
INPUT: E = {(x_i, y_i), x_i ∈ X, y_i ∈ {0, 1}, i = 1 . . . n}, drawn from P(x, y)
HYPOTHESIS (SEARCH) SPACE: H, with h : X → {0, 1}
LOSS FUNCTION: ℓ : Y × Y → ℝ
OUTPUT: h* = arg max{score(h), h ∈ H}
SLIDE 22 Classification, criteria
Generalisation error
Err(h) = E[ℓ(y, h(x))] = ∫ ℓ(y, h(x)) p(x, y) dx dy
Empirical error
Err_E(h) = (1/n) Σ_{i=1..n} ℓ(y_i, h(x_i))
Bound (risk minimization)
Err(h) < Err_E(h) + F(n, d(H)), with d(H) = VC-dimension of H
SLIDE 23 The Vapnik-Chervonenkis dimension
Principle
Given a hypothesis space H of functions X → {0, 1} and n points x_1, . . . , x_n in X:
if ∀ (y_i)_{i=1..n} ∈ {0, 1}^n, ∃ h ∈ H such that h(x_i) = y_i for all i,
then H shatters {x_1, . . . , x_n}.
Example: X = ℝ^p; d(hyperplanes in ℝ^p) = p + 1
WHY: if H shatters E, then E doesn't tell us anything
(figure: 3 points shattered by a line; 4 points not shattered)
Definition: d(H) = max{ n / ∃ (x_1, . . . , x_n) shattered by H }
SLIDE 24
Classification: Ingredients of error
Bias
Bias (H): error of the best hypothesis h∗ in H
Variance
Variance of ĥ_n, depending on the training set E
(figure: bias and variance of ĥ with respect to h* in the hypothesis space)
Optimization
negligible at small scale; takes over at large scale (Google)
SLIDE 25
Contents
Position of the problem Background notations Difficulties The learning process The villain Validation Performance indicators Estimating an indicator Testing a hypothesis Comparing hypotheses Validation Campaign The point of parameter setting Racing Expected Global Improvement
SLIDE 26
Validation: Three questions
Define a good indicator of quality
◮ Misclassification cost ◮ Area under the ROC curve
Computing an estimate thereof
◮ Validation set ◮ Cross-Validation ◮ Leave one out ◮ Bootstrap
Compare estimates: Tests and confidence levels
SLIDE 27
Which indicator, which estimate: it depends.
Settings
◮ Large/few data
Data distribution
◮ Dependent/independent examples ◮ balanced/imbalanced classes
SLIDE 28
Contents
Position of the problem Background notations Difficulties The learning process The villain Validation Performance indicators Estimating an indicator Testing a hypothesis Comparing hypotheses Validation Campaign The point of parameter setting Racing Expected Global Improvement
SLIDE 29
Performance indicators
Binary class
◮ h*: the truth
◮ ĥ: the learned hypothesis
Confusion matrix
ĥ \ h*     1      0      total
1          a      b      a+b
0          c      d      c+d
total      a+c    b+d    a+b+c+d
SLIDE 30
Performance indicators, 2
ĥ \ h*     1      0      total
1          a      b      a+b
0          c      d      c+d
total      a+c    b+d    a+b+c+d
◮ Misclassification rate: (b+c)/(a+b+c+d)
◮ Sensitivity, true positive rate (TPR): a/(a+c)
◮ False positive rate (FPR): b/(b+d)   (specificity = 1 − FPR = d/(b+d))
◮ Recall: a/(a+c)
◮ Precision: a/(a+b)
Note: always compare to random guessing / baseline alg.
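A minimal sketch (plain Python) of how these indicators follow from the confusion matrix; the counts a, b, c, d below are hypothetical example values:

# Confusion matrix counts, following the table above:
# a = predicted 1, true 1 ; b = predicted 1, true 0
# c = predicted 0, true 1 ; d = predicted 0, true 0
a, b, c, d = 40, 10, 5, 45
total = a + b + c + d
misclassification_rate = (b + c) / total
sensitivity = a / (a + c)          # true positive rate, also the recall
false_positive_rate = b / (b + d)  # 1 - specificity
precision = a / (a + b)
print(misclassification_rate, sensitivity, false_positive_rate, precision)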
SLIDE 31
Performance indicators, 3
The Area under the ROC curve
◮ ROC: Receiver Operating Characteristics ◮ Origin: Signal Processing, Medicine
Principle: h : X → ℝ; h(x) measures the risk of patient x. h induces an ordering of the examples:
+ + + − + − + + + + − − − + − − − + − − − − − − − − − − − −
SLIDE 32
Performance indicators, 3
The Area under the ROC curve
◮ ROC: Receiver Operating Characteristics ◮ Origin: Signal Processing, Medicine
Principle: h : X → ℝ; h(x) measures the risk of patient x. h induces an ordering of the examples:
+ + + − + − + + + + − − − + − − − + − − − − − − − − − − − −
Given a threshold θ, h yields a classifier: Yes iff h(x) > θ.
+ + + − + − + + ++ | − − − + − − − + − − − − − − − − − − −−
Here, TPR(θ) = .8 (8 of the 10 positives are above the threshold); FPR(θ) = .1 (2 of the 20 negatives).
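A small sketch of this thresholding, assuming a list of scores h(x) and binary labels (hypothetical data); sweeping the threshold over all observed scores traces the ROC curve:

def roc_point(scores, labels, theta):
    # Classify as positive iff h(x) > theta, then measure (FPR, TPR).
    tp = sum(1 for s, y in zip(scores, labels) if s > theta and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s > theta and y == 0)
    pos = sum(labels)
    neg = len(labels) - pos
    return fp / neg, tp / pos

def roc_curve(scores, labels):
    # ROC curve = (FPR, TPR) points obtained by sweeping the threshold.
    thresholds = sorted(set(scores), reverse=True)
    return [roc_point(scores, labels, t) for t in thresholds]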
SLIDE 33
ROC (figure)
SLIDE 34
The ROC curve
θ ↦ M(θ) = (1 − TNR(θ), TPR(θ)) = (FPR(θ), TPR(θ)) ∈ ℝ²
Ideal classifier: the point (0, 1) (no false positives, all true positives)
Diagonal (TPR = FPR) ≡ nothing learned.
SLIDE 35
ROC Curve, Properties
Properties
The ROC curve depicts the trade-off between true positives and false positives.
Standard criterion: misclassification cost (Domingos, KDD 99): Error = # false positives + c × # false negatives
In a multi-objective perspective, the ROC curve is a Pareto front.
Best solution: the point where the Pareto front touches a line of direction ∆(−c, −1).
SLIDE 36
ROC Curve, Properties, cont'd
Used to compare learners
Bradley 97
multi-objective-like; insensitive to imbalanced class distributions; shows sensitivity to the error cost.
SLIDE 37
Area Under the ROC Curve
Often used to select a learner. Don't ever do this!
Hand, 09
Sometimes used as learning criterion
Mann Whitney Wilcoxon
AUC = Pr(h(x) > h(x′)|y > y′) WHY
Rosset, 04
◮ More stable (O(n²) vs O(n)) ◮ With a probabilistic interpretation
Clémençon et al. 08
HOW
◮ SVM-Ranking
Joachims 05; Usunier et al. 08, 09
◮ Stochastic optimization
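An illustrative O(n²) computation of the AUC in its Mann-Whitney-Wilcoxon form above (pair counting over hypothetical scores and binary labels; ties counted as 1/2):

def auc_mww(scores, labels):
    # AUC = Pr(h(x) > h(x')) for x a positive and x' a negative example.
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))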
SLIDE 38
Contents
Position of the problem Background notations Difficulties The learning process The villain Validation Performance indicators Estimating an indicator Testing a hypothesis Comparing hypotheses Validation Campaign The point of parameter setting Racing Expected Global Improvement
SLIDE 39
Validation, principle
Desired: performance on further instances
(figure: WORLD → further examples → h → quality)
Assumption: the Dataset is to the World what the Training set is to the Dataset.
(figure: DATASET → training set → h → quality measured on test examples)
SLIDE 40 Validation, 2
(figure: DATASET split into a training set and test examples; learning parameters → h → perf(h))
Unbiased Assessment of Learning Algorithms
- T. Scheffer and R. Herbrich, 97
SLIDE 41 Validation, 2
(figure: as above; selecting the setting with the best perf(h) yields parameter*, h*, perf(h*))
Unbiased Assessment of Learning Algorithms
- T. Scheffer and R. Herbrich, 97
SLIDE 42 Validation, 2
(figure: as above, plus a held-out validation set measuring the true performance of h*)
Unbiased Assessment of Learning Algorithms
- T. Scheffer and R. Herbrich, 97
SLIDE 43
Contents
Position of the problem Background notations Difficulties The learning process The villain Validation Performance indicators Estimating an indicator Testing a hypothesis Comparing hypotheses Validation Campaign The point of parameter setting Racing Expected Global Improvement
SLIDE 44 Confidence intervals
Definition
Given a random variable X on ℝ, a p%-confidence interval is a set I ⊂ ℝ such that Pr(X ∈ I) > p.
Binary variable with probability ε
Probability of r events out of n trials: P_n(r) = n! / (r!(n−r)!) · ε^r (1−ε)^(n−r)
◮ Mean: nε
◮ Variance: σ² = nε(1−ε)
Gaussian approximation: P(x) = 1/√(2πσ²) · exp( −(1/2) ((x−µ)/σ)² )
SLIDE 45 Confidence intervals
Bounds relating the true value and the empirical value for n trials, n > 30:
Pr( |x̂_n − x*| > 1.96 √( x̂_n (1 − x̂_n) / n ) ) < .05
Table of z versus ε (%):
z     .67   1.   1.28   1.64   1.96   2.33   2.58
ε %    50   32     20     10      5      2      1
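A sketch of the resulting confidence interval for an empirical error rate x̂_n measured over n test examples (Gaussian approximation, valid for n > 30; z = 1.96 gives the 95% interval per the table above):

from math import sqrt

def error_confidence_interval(x_hat, n, z=1.96):
    # Gaussian approximation of the binomial: half-width = z * sqrt(x(1-x)/n)
    half = z * sqrt(x_hat * (1.0 - x_hat) / n)
    return x_hat - half, x_hat + half

# e.g. 12 errors out of 200 test examples
print(error_confidence_interval(12 / 200, 200))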
SLIDE 46 Empirical estimates
When data abound (e.g. MNIST): split the data into Training / Test / Validation sets.
N-fold Cross Validation (figure: the data are split into N folds; run i uses fold i as the test set)
Error = average over the N runs of the error on fold i of the hypothesis learned from the other folds
SLIDE 47
Empirical estimates, cont'd
Cross validation → Leave one out (figure)
Leave one out: same as N-fold CV, with N = number of examples.
Properties: low bias; high variance; underestimates the error if the data are not independent
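A minimal sketch of the N-fold cross-validation estimate; `train` and `error` are hypothetical functions returning a hypothesis and its error on a set. Setting N to the number of examples gives leave-one-out:

def cross_validation_error(X, Y, N, train, error):
    # Split the indices into N folds; each run learns on N-1 folds and tests on the held-out one.
    folds = [list(range(i, len(X), N)) for i in range(N)]
    errors = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        h = train([X[i] for i in train_idx], [Y[i] for i in train_idx])
        errors.append(error(h, [X[i] for i in test_idx], [Y[i] for i in test_idx]))
    return sum(errors) / N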
SLIDE 48
Empirical estimates, cont'd
Bootstrap
Training set: uniform sampling with replacement from the dataset; Test set: the rest of the examples.
Average indicator over all (Training set, Test set) samplings.
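A sketch of the bootstrap estimate under the same assumptions (hypothetical `train` and `error` functions); each training set is drawn uniformly with replacement, and the untouched examples form the test set:

import random

def bootstrap_error(X, Y, B, train, error):
    # B bootstrap samples of size n, drawn with replacement.
    n, estimates = len(X), []
    for _ in range(B):
        idx = [random.randrange(n) for _ in range(n)]
        out = [i for i in range(n) if i not in set(idx)]   # held-out examples
        if not out:
            continue
        h = train([X[i] for i in idx], [Y[i] for i in idx])
        estimates.append(error(h, [X[i] for i in out], [Y[i] for i in out]))
    return sum(estimates) / len(estimates)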
SLIDE 49
Contents
Position of the problem Background notations Difficulties The learning process The villain Validation Performance indicators Estimating an indicator Testing a hypothesis Comparing hypotheses Validation Campaign The point of parameter setting Racing Expected Global Improvement
SLIDE 50
Is ĥ better than random ?
The McNemar test
McNemar 47
ĥ \ h*     1      0      total
1          a      b      a+b
0          c      d      c+d
total      a+c    b+d    a+b+c+d
Property
(|b − c| − 1)² / (b + c) follows a χ² law with 1 degree of freedom
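A sketch of the McNemar statistic computed from the b and c counts of the matrix above; the p-value is obtained from the χ² survival function (scipy assumed available, example counts hypothetical):

from scipy.stats import chi2

def mcnemar(b, c):
    # Continuity-corrected McNemar statistic, ~ chi^2 with 1 degree of freedom.
    stat = (abs(b - c) - 1) ** 2 / (b + c)
    return stat, chi2.sf(stat, df=1)

print(mcnemar(b=12, c=4))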
SLIDE 51
Types of test error
Type I error: the hypothesis is not significant, but the test declares it significant.
Type II error: the hypothesis is valid, but the test discards it.
SLIDE 52 Comparing algorithms A and B
run    A    B    A−B
1      30   28     2
2      17   25    −8
3      28   25     3
4      17   28   −11
5      30   26     4
Assumption: A and B have normal distributions.
Simplest case: two samples of the same size n, with (quasi) the same variance. Define
t = (Ā − B̄) / ( S_{A,B} √(2/n) ),  with  S_{A,B} = √( (S_A² + S_B²) / 2 )  and  S_A² = (1/(n−1)) Σ_i (A_i − Ā)²
SLIDE 53
Comparing algorithms A and B
t follows a Student distribution with 2n − 2 degrees of freedom
◮ Compute t ◮ See confidence of t
SLIDE 54 Comparing algorithms A and B
Recommended: Use paired t-test
◮ Apply A and B with same (training, test) sets ◮ Variance is lower:
Var(A − B) = Var(A) + Var(B) − 2 Cov(A, B)
◮ Thus easier to make significant differences
What if the variances are different ? See Welch's test:
t = (Ā − B̄) / √( S_A²/N_A + S_B²/N_B )
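A sketch comparing two algorithms from their per-run scores, using the example runs from the table above; scipy's ttest_rel implements the paired t-test, and ttest_ind with equal_var=False implements Welch's test:

from scipy import stats

A = [30, 17, 28, 17, 30]   # scores of algorithm A over 5 runs
B = [28, 25, 25, 28, 26]   # scores of algorithm B on the same splits

t_paired, p_paired = stats.ttest_rel(A, B)                   # paired t-test
t_welch, p_welch = stats.ttest_ind(A, B, equal_var=False)    # Welch's test
print(p_paired, p_welch)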
SLIDE 55
Summary: single dataset (if we had enough data...)
The 5 x 2CV
Dietterich 98
◮ 5 times: split the data into 2 halves
◮ gives 10 estimates of the error indicator
+ More independent estimates
− Each training set contains only 1/2 of the data
With a single dataset
◮ 5x2 CV ◮ paired t-test ◮ McNemar test on a validation set
SLIDE 56 Multiple datasets
If the results of A and B do not follow a normal distribution: Z_i = A_i − B_i
A    B    |Z|   rank   sign
19   23    4    6th     −
22   21    1    1st     +
21   19    2    2nd     +
25   28    3    4th     −
24   22    2    2nd     +
23   20    3    4th     +
Wilcoxon signed rank test
- 1. Rank the |Zi|
- 2. W+ = sum of ranks when Zi > 0
- 3. W− = sum of ranks when Zi < 0
- 4. Wmin = min(W+, W−)
z = ( n(n+1)/4 − Wmin − 1/2 ) / √( n(n+1)(2n+1)/24 )
- 5. z ∼ N(0, 1) for n > 20
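A sketch of the Wilcoxon signed rank test on the per-problem scores above; scipy.stats.wilcoxon carries out steps 1-5 (exact distribution for small samples, normal approximation for large ones):

from scipy.stats import wilcoxon

A = [19, 22, 21, 25, 24, 23]
B = [23, 21, 19, 28, 22, 20]
stat, p_value = wilcoxon(A, B)   # tests whether the median of A - B is zero
print(stat, p_value)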
SLIDE 57 Multiple hypothesis testing
Beware
◮ If you test many hypotheses on the same dataset ◮ one of them will appear confidently true...
→ increase in Type I error
Corrections
Over n tests, the global significance level α_global is related to the elementary significance level α_unit by: α_global = 1 − (1 − α_unit)^n
◮ Bonferroni correction (pessimistic): α_unit = α_global / n
◮ Šidák correction: α_unit = 1 − (1 − α_global)^(1/n)
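A small sketch of both corrections for a desired global level, e.g. α_global = 0.05 over 10 tests:

def bonferroni(alpha_global, n_tests):
    # Pessimistic per-test level
    return alpha_global / n_tests

def sidak(alpha_global, n_tests):
    # Exact per-test level under independence
    return 1.0 - (1.0 - alpha_global) ** (1.0 / n_tests)

print(bonferroni(0.05, 10), sidak(0.05, 10))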
SLIDE 58
Contents
Position of the problem Background notations Difficulties The learning process The villain Validation Performance indicators Estimating an indicator Testing a hypothesis Comparing hypotheses Validation Campaign The point of parameter setting Racing Expected Global Improvement
SLIDE 59
How to set up my system ?
Parameter tuning
◮ Setting the parameters for feature extraction ◮ Select the best learning algorithm ◮ Setting the learning parameters (e.g. type of kernel, the
parameters in SVMs)
◮ Setting the validation parameters
Goal: find the best setting, a pervasive concern
◮ Algorithm selection in Operational Research ◮ Parameter tuning in Stochastic Optimization ◮ Meta-Learning in Machine Learning
SLIDE 60 From Design of Experiments to ...
Main approaches
- 1. Design of experiments (Latin square)
- 2. Anova (Analysis of variance)-like methods:
◮ Racing ◮ Sequential parameter optimization
SLIDE 61
Parameter Tuning: A Meta-Optimization problem
(figure: Learner + Dataset → validation performance)
Optimization: the Black-Box Scenario
◮ Need to perform several runs to compute performance
Cross-Validation
◮ Need to specify the # runs
and tune it optimally
◮ Overall cost is the total number of evaluations ◮ And don’t forget to tune the parameters of the
meta-optimizer!
SLIDE 62
Parameter Tuning: A Meta-Optimization problem
(figure: Learner + Dataset + learning & validation parameters → validation performance)
Optimization: the Black-Box Scenario
◮ Need to perform several runs to compute performance
Cross-Validation
◮ Need to specify the # runs
and tune it optimally
◮ Overall cost is the total number of evaluations ◮ And don’t forget to tune the parameters of the
meta-optimizer!
SLIDE 63
Parameter Tuning: A Meta-Optimization problem
(figure: PARAMETER TUNING loop: learning & validation parameters → Learner + Dataset → validation performance → best performance)
Optimization: the Black-Box Scenario
◮ Need to perform several runs to compute performance
Cross-Validation
◮ Need to specify the # runs
and tune it optimally
◮ Overall cost is the total number of evaluations ◮ And don’t forget to tune the parameters of the
meta-optimizer!
SLIDE 64
Ingredients
Design Of Experiments (DOE)
◮ A long-known method from statistics ◮ Choose a finite number of parameter sets ◮ Compute their performance ◮ Return the statistically significantly best sets
Analysis of Variance (ANOVA)
◮ Assumes normally distributed data ◮ Tests if means are significantly different
for a given confidence level; generalizes T-Test
◮ Perform pairwise tests if ANOVA reports some difference
T-Test, rank-based tests, . . .
SLIDE 65 DOE: Issues
Choice of sample parameter sets
◮ Full Factorial Design
◮ Discretize all parameters if continuous ◮ Choose all possible combinations
◮ Latin Hypercube Sampling: to generate k sets,
◮ Discretize each parameter into k values
◮ Repeat k times: for each parameter, (uniformly) choose one of the k values
◮ For each parameter, each value is taken exactly once over the k sets
fine if there are no correlations between parameters (a minimal sketch follows)
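A minimal Latin Hypercube Sampling sketch over discretized parameters (pure Python, working in level indices; each of the k values of each parameter is used exactly once across the k generated sets):

import random

def latin_hypercube(k, n_params):
    # One random permutation of the k levels per parameter;
    # the i-th sample takes the i-th entry of each permutation.
    perms = [random.sample(range(k), k) for _ in range(n_params)]
    return [[perms[p][i] for p in range(n_params)] for i in range(k)]

# 5 parameter sets over 3 parameters, each discretized into 5 levels
print(latin_hypercube(5, 3))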
Cost
◮ For each parameter set, the full cost of learning + validation
◮ Combinatorial explosion with the number of parameters and their discretization precision
SLIDE 66 Racing algorithms
Birattari et al. 02, Yuan & Gallagher 04
Rationale
◮ All parameter settings are run the same number of times
whereas very bad settings could be detected earlier
Implementation
◮ Repeat
◮ Perform only a few runs per parameter set ◮ Statistically check all sets against the best one
at given confidence level
◮ Discard the bad ones
◮ Until only one survivor remains, or the maximum number of runs per setting is reached (a schematic sketch follows)
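A schematic racing loop under simplifying assumptions: `run` is a hypothetical function returning one performance value per call (higher is better), settings are hashable, and plain Gaussian confidence intervals stand in for the Hoeffding bounds or Friedman tests used by real implementations:

from math import sqrt

def race(settings, run, r=5, R_max=100, z=1.96):
    scores = {s: [] for s in settings}
    R = 0
    while R < R_max and len(scores) > 1:
        for s in list(scores):
            scores[s] += [run(s) for _ in range(r)]   # r additional runs per survivor
        R += r
        # Confidence interval mean +/- z * std / sqrt(#runs) for each survivor
        bounds = {}
        for s, v in scores.items():
            m = sum(v) / len(v)
            sd = sqrt(sum((x - m) ** 2 for x in v) / max(len(v) - 1, 1))
            bounds[s] = (m - z * sd / sqrt(len(v)), m + z * sd / sqrt(len(v)))
        best_lower = max(lo for lo, hi in bounds.values())
        # Discard settings whose best possible value is below the best lower bound
        scores = {s: v for s, v in scores.items() if bounds[s][1] >= best_lower}
    return scores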
SLIDE 67 Racing algorithms
How? Example: Initialization
◮ R = 0
◮ While R < Rmax and more than 1 set remains:
◮ Compute the empirical performance of all sets, doing r additional runs (average, median, . . .)
◮ Compute X% confidence intervals (Hoeffding bounds, Friedman tests, . . .)
◮ Remove the sets whose best possible value is worse than the worst possible value of the best empirical set
◮ R += r
SLIDE 70 Racing algorithms
How? Example: Iteration 1
(same procedure as above; the figure shows the state after the first iteration)
SLIDE 72 Racing algorithms
How? Example: Iteration N
(same procedure as above; the figure shows the state after iteration N)
SLIDE 74 Racing algorithms
How? Example: Best parameter sets
(same procedure as above; the figure shows the surviving best parameter sets)
SLIDE 75
Racing algorithms: Discussion
Results
◮ Published results claim savings of 50 to 90% of the runs
Useful for
◮ Multiple algorithms on single problem
for efficiency
◮ Single algorithm on multiple problems
to assess problem difficulties
◮ Multiple algorithms on multiple problems
for robustness
Issues
◮ Nevertheless costly ◮ Can only find the best setting within the initial sample
SLIDE 76
Sequential Parameter Optimization
Bartz-Beielstein et al. 05-07
Rationale
◮ Start with some very coarse sampling (DOE)
◮ Evaluate performance using a few runs per parameter set
◮ Build a model of the performance landscape using Gaussian Processes (aka Kriging)
◮ Select the best points based on Expected Improvement according to the current model (Monte-Carlo sampling)
◮ Compute the actual performance of the best estimates, using the same number of runs as the current best
◮ Increase the # runs of the best set if it remains unchanged
(a sketch of the Expected Improvement criterion follows)
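A sketch of the Expected Improvement criterion on top of a Gaussian Process surrogate, using scikit-learn's GaussianProcessRegressor (assumed available); here the performance is a cost to minimize, so improvement is measured below the current best value:

import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

def expected_improvement(gp, X_candidates, y_best):
    # EI for minimization: E[max(y_best - Y(x), 0)] under the GP posterior
    mu, sigma = gp.predict(X_candidates, return_std=True)
    sigma = np.maximum(sigma, 1e-12)
    z = (y_best - mu) / sigma
    return (y_best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

# Fit the surrogate on already evaluated parameter sets X (n x d) and costs y (n,):
# gp = GaussianProcessRegressor().fit(X, y)
# ei = expected_improvement(gp, X_candidates, y.min())   # pick the candidate with max EI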
SLIDE 77
Gaussian Processes in one slide
An optimization algorithm for expensive functions D.R. Jones, Schonlau, & Welch, 98
SLIDE 80
SPO: Discussion
Pros
◮ Similar ideas to racing, ◮ but allows refining the initial sampling
a true optimization algorithm
◮ Compatible with a fixed budget scenario
racing is not
◮ Authors also report gains up to 90%
Cons
◮ Works best with . . . some tuning
SLIDE 81
Take home messages
What is the performance criterion
◮ Cost function ◮ Account for class imbalance ◮ Account for data correlations
Assessing a result
◮ Compute confidence intervals ◮ Consider baselines ◮ Use a validation set
If the result looks too good, beware