Mutual Information: an Adequate Tool for Feature Selection?
Benoît Frénay, November 15, 2013
What is Feature Selection?
Overview of the Presentation
Example of Feature Selection: Diabetes Progression
Goal: predict the diabetes progression one year after baseline. 442 diabetes patients were measured on 10 baseline variables.
Available patient characteristics (features):
1. age
2. sex
3. body mass index (BMI)
4. blood pressure (BP)
5. serum measurement #1
...
10. serum measurement #6
What are the best features?
- the 1 best feature: 3 body mass index (BMI)
- the 2 best features: 3 body mass index (BMI), 9 serum measurement #5
- the 3 best features: 3 body mass index (BMI), 9 serum measurement #5, 4 blood pressure (BP)
- the 10 best features, in order: 3 body mass index (BMI), 9 serum measurement #5, 4 blood pressure (BP), 7 serum measurement #3, 2 sex, 10 serum measurement #6, 5 serum measurement #1, 8 serum measurement #4, 6 serum measurement #2, 1 age
Figure reproduced from Efron, B., Hastie, T., Johnstone, I., Tibshirani, R. Least Angle Regression. Annals of Statistics 32(2), pp. 407-499, 2004.
Problems with high-dimensional data:
- interpretability of data
- curse of dimensionality
- concentration of distances
Feature selection consists in using only a subset of the features:
- selecting features (easy-to-interpret models)
- information may be discarded if necessary
Question: how can one select relevant features?
Mutual information (MI) assesses the quality of feature subsets:
- rigorous definition (information theory)
- interpretation in terms of uncertainty reduction
What kind of guarantees do we have?
Outline of this presentation:
- feature selection with mutual information
- adequacy of mutual information in classification
- adequacy of mutual information in regression
Uncertainty on the value of the output Y:
H(Y) = E_Y{-log p_Y(Y)}.
Uncertainty on Y once X is known:
H(Y|X) = E_{X,Y}{-log p_{Y|X}(Y|X)}.
Mutual information (MI):
I(X;Y) = H(Y) - H(Y|X) = E_{X,Y}{log [p_{X,Y}(X,Y) / (p_X(X) p_Y(Y))]}.
MI is the reduction of uncertainty about the value of Y once X is known.
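These definitions can be checked on a small discrete example. The joint table below is made up for illustration (it is not from the slides); the code verifies that H(Y) - H(Y|X) equals the expectation form of the MI.

```python
import math

# Hypothetical joint distribution p(x, y) over X in {0,1}, Y in {0,1}
p_xy = {(0, 0): 0.30, (0, 1): 0.10,
        (1, 0): 0.15, (1, 1): 0.45}

p_x = {x: sum(p for (xi, _), p in p_xy.items() if xi == x) for x in (0, 1)}
p_y = {y: sum(p for (_, yi), p in p_xy.items() if yi == y) for y in (0, 1)}

# H(Y) = E_Y{-log p_Y(Y)}
H_Y = -sum(p * math.log2(p) for p in p_y.values())

# H(Y|X) = E_{X,Y}{-log p_{Y|X}(Y|X)}, with p(y|x) = p(x,y)/p(x)
H_Y_given_X = -sum(p * math.log2(p / p_x[x]) for (x, y), p in p_xy.items())

# I(X;Y) = E_{X,Y}{log [p(X,Y) / (p_X(X) p_Y(Y))]}
I_XY = sum(p * math.log2(p / (p_x[x] * p_y[y])) for (x, y), p in p_xy.items())

print(f"H(Y) = {H_Y:.4f} bits, H(Y|X) = {H_Y_given_X:.4f}, I(X;Y) = {I_XY:.4f}")
```

The two expressions for I(X;Y) agree, and knowing X indeed reduces the uncertainty on Y.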
Natural interpretation in feature selection: MI is the reduction of uncertainty about the class label (Y) once a subset of features (X) is known.
- MI can detect linear as well as non-linear relationships between variables (not true for the correlation coefficient).
- MI can be defined for multi-dimensional variables (subsets of features), which is useful to detect mutually relevant or redundant features.
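A quick sketch of the first point (a toy example, not from the slides): with X uniform on {-1, 0, 1} and Y = X^2, the dependence is perfectly deterministic, yet the linear correlation vanishes while the MI equals the full entropy of Y.

```python
import math

# X uniform on {-1, 0, 1}, Y = X^2: deterministic but non-linear dependence
xs = [-1, 0, 1]
p_x = 1 / 3
pairs = [(x, x * x) for x in xs]  # joint support, each outcome w.p. 1/3

# Covariance cov(X, Y) = E[XY] - E[X]E[Y]; zero covariance => zero correlation
ex = sum(x for x, _ in pairs) * p_x
ey = sum(y for _, y in pairs) * p_x
cov = sum(x * y for x, y in pairs) * p_x - ex * ey

# I(X;Y) = H(Y) - H(Y|X); here H(Y|X) = 0 since Y is a function of X
p_y = {0: 1 / 3, 1: 2 / 3}
H_Y = -sum(p * math.log2(p) for p in p_y.values())
I_XY = H_Y  # about 0.918 bits

print(f"covariance = {cov:.3f}, I(X;Y) = {I_XY:.3f} bits")
```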
The number of possible feature subsets is exponential w.r.t. d (2^d subsets), so exhaustive search is usually intractable (d > 10). Standard solution: use greedy procedures. Optimality is not guaranteed, but very good results are obtained.
Greedy forward search:
Input: set of features 1 . . . d
Output: subsets of feature indices {S_i}, i ∈ 1 . . . d
S_0 ← {}; U ← {1, . . . , d}
for all numbers of features i ∈ 1 . . . d do
    for all remaining features j ∈ U do
        compute the mutual information Î_j = Î(X_{S_{i-1} ∪ {j}}; Y)
    end for
    j* ← arg max_{j ∈ U} Î_j
    S_i ← S_{i-1} ∪ {j*}; U ← U \ {j*}
end for
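The forward search can be sketched in a few lines. Here `mi_plugin` is a hypothetical histogram (plug-in) MI estimator standing in for Î, and the dataset is a toy example; this is an illustrative sketch, not the slides' actual implementation.

```python
import math
from collections import Counter

def mi_plugin(rows, target):
    """Plug-in estimate of I(X_S; Y) from discrete samples.
    rows: list of feature-value tuples, target: list of labels."""
    n = len(rows)
    pxy = Counter(zip(rows, target))
    px, py = Counter(rows), Counter(target)
    return sum((c / n) * math.log2(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

def forward_search(data, labels, d):
    """Greedy forward search: grow S one feature at a time, always
    adding the feature that maximises the estimated I(X_S; Y)."""
    selected, remaining, subsets = [], set(range(d)), []
    while remaining:
        def score(j):
            cols = selected + [j]
            return mi_plugin([tuple(row[k] for k in cols) for row in data],
                             labels)
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
        subsets.append(list(selected))
    return subsets

# Tiny synthetic example: feature 0 equals the label, feature 1 is constant
data = [(0, 7), (1, 7), (0, 7), (1, 7)]
labels = [0, 1, 0, 1]
subsets = forward_search(data, labels, d=2)
print(subsets)  # feature 0 is picked first
```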
Greedy backward search:
Input: set of features 1 . . . d
Output: subsets of feature indices {S_i}, i ∈ 1 . . . d
S_d ← {1, . . . , d}
for all numbers of features i ∈ d - 1 . . . 1 do
    for all remaining features j ∈ S_{i+1} do
        compute the mutual information Î_j = Î(X_{S_{i+1} \ {j}}; Y)
    end for
    j* ← arg max_{j ∈ S_{i+1}} Î_j
    S_i ← S_{i+1} \ {j*}
end for
Is MI optimal? Do we have guarantees? What does optimality mean?
- in classification: maximises accuracy
- in regression: minimises MSE/MAE
MI allows strategies like minimum-redundancy-maximum-relevance (mRMR):
arg max_{X_k} [ I(X_k; Y) - (1/|S|) Σ_{X_i ∈ S} I(X_k; X_i) ]
where S is the set of already selected features. MI is supported by a large literature of successful applications.
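One mRMR step can be sketched as follows. The MI values are precomputed in hypothetical `relevance`/`redundancy` tables with made-up numbers; this only illustrates the criterion, not a full selection pipeline.

```python
def mrmr_pick(candidates, selected, relevance, redundancy):
    """One mRMR step: arg max_k [ I(X_k;Y) - (1/|S|) sum_{i in S} I(X_k;X_i) ].
    relevance[k] holds I(X_k;Y); redundancy[(k, i)] holds I(X_k;X_i)."""
    def score(k):
        if not selected:
            return relevance[k]  # no redundancy penalty for the first pick
        penalty = sum(redundancy[(k, i)] for i in selected) / len(selected)
        return relevance[k] - penalty
    return max(candidates, key=score)

# Hypothetical MI values: feature 2 is relevant but redundant with feature 0,
# so the less relevant but non-redundant feature 1 is preferred second.
relevance = {0: 0.50, 1: 0.30, 2: 0.45}
redundancy = {(1, 0): 0.05, (2, 0): 0.40}
first = mrmr_pick([0, 1, 2], [], relevance, redundancy)
second = mrmr_pick([1, 2], [first], relevance, redundancy)
print(first, second)
```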
Theoretical Considerations Experimental Assessment
Goal in classification: to minimise the number of misclassifications. Take an optimal classifier with probability of misclassification Pe.
Question: how optimal is feature selection with MI? What is the relationship between MI / H(Y|X) and Pe?
Remember: I(X; Y) = H(Y) - H(Y|X), where H(Y) is constant.
An upper bound on Pe is given by the Hellman-Raviv inequality
Pe ≤ (1/2) H(Y|X)
where Pe is the probability of misclassification for an optimal classifier.
The weak and strong Fano bounds are
H(Y|X) ≤ 1 + Pe log2(nY - 1)
H(Y|X) ≤ H(Pe) + Pe log2(nY - 1)
where nY is the number of classes.
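Both bounds can be verified numerically on a made-up three-class joint distribution (assumed here purely for illustration). Because the error mass is spread uniformly over the wrong classes, the strong Fano bound happens to be tight in this example.

```python
import math

nY = 3  # number of classes
# Hypothetical joint: X uniform on {0,1,2}; given X = x, Y = x w.p. 0.7,
# each other class w.p. 0.15 (toy numbers, not from the slides).
p_x = 1 / 3
p_y_given_x = lambda y, x: 0.7 if y == x else 0.15

# Conditional entropy H(Y|X) in bits
H_YX = -sum(p_x * p_y_given_x(y, x) * math.log2(p_y_given_x(y, x))
            for x in range(3) for y in range(3))

# Bayes (optimal) error: predict the most probable class for each x
Pe = sum(p_x * (1 - max(p_y_given_x(y, x) for y in range(3))) for x in range(3))

H_Pe = -Pe * math.log2(Pe) - (1 - Pe) * math.log2(1 - Pe)

assert Pe <= 0.5 * H_YX + 1e-9                       # Hellman-Raviv
assert H_YX <= 1 + Pe * math.log2(nY - 1) + 1e-9     # weak Fano
assert H_YX <= H_Pe + Pe * math.log2(nY - 1) + 1e-9  # strong Fano
print(f"Pe = {Pe:.3f}, H(Y|X) = {H_YX:.3f} bits")
```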
Hellman-Raviv and Fano inequalities give upper/lower bounds on the risk. The classical (and simplistic) justification for MI: Pe → 0 if H(Y|X) → 0, or equivalently I(X; Y) → H(Y).
Question: is it possible to increase the risk while decreasing H(Y|X)?
Disease diagnosis: two classes with priors P(Y = a) = 0.32 and P(Y = b) = 0.68. For each new patient, two medical tests are available, X1 ∈ {0, 1} and X2 ∈ {0, 1}, but the practitioner can only perform either X1 or X2. In terms of feature selection, he has to select the best feature (X1 or X2).
18
Disease diagnosis, two classes with priors P(Y = a) = 0.32 and P(Y = b) = 0.68. (1) For each new patient: two medical tests are available: X1 ∈ {0, 1} and X2 ∈ {0, 1} but the practician can only perform either X1 or X2 In terms of feature selection, he has to select the best feature (X1 or X2).
18
Disease diagnosis, two classes with priors P(Y = a) = 0.32 and P(Y = b) = 0.68. (1) For each new patient: two medical tests are available: X1 ∈ {0, 1} and X2 ∈ {0, 1} but the practician can only perform either X1 or X2 In terms of feature selection, he has to select the best feature (X1 or X2).
18
Through experimentation, the practitioner estimates P(X1|Y) and P(X2|Y). Using Bayes' theorem:

P(Y|X1):      X1 = 0   X1 = 1
    Y = a      0.65     0.23
    Y = b      0.35     0.77

P(Y|X2):      X2 = 0   X2 = 1
    Y = a      0.49     0.01
    Y = b      0.51     0.99
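The slides give only P(Y|X1) and P(Y|X2), but the marginals P(X1 = 0) and P(X2 = 0) are pinned down by consistency with the class priors, so the quoted Pe and MI values can be re-derived. The MI of X2 comes out near the slide's 0.24 bits; any small difference presumably comes from rounding in the tables.

```python
import math

pa = 0.32  # class prior P(Y = a) from the slides

def h2(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def analyse(p_a_given_0, p_a_given_1):
    """Given P(Y=a|X=0) and P(Y=a|X=1), recover P(X=0) from the prior
    constraint P(Y=a) = sum_x P(Y=a|x) P(x); return (Pe, MI in bits)."""
    p0 = (pa - p_a_given_1) / (p_a_given_0 - p_a_given_1)
    p1 = 1 - p0
    # Bayes error: predict the majority class for each test outcome
    pe = (p0 * min(p_a_given_0, 1 - p_a_given_0)
          + p1 * min(p_a_given_1, 1 - p_a_given_1))
    mi = h2(pa) - (p0 * h2(p_a_given_0) + p1 * h2(p_a_given_1))
    return pe, mi

pe1, mi1 = analyse(0.65, 0.23)  # test X1
pe2, mi2 = analyse(0.49, 0.01)  # test X2
print(f"X1: Pe = {pe1:.2f}, I = {mi1:.2f} bits")
print(f"X2: Pe = {pe2:.2f}, I = {mi2:.2f} bits")
```

X2 has the larger MI and the larger error probability, which is exactly the failure discussed next.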
Recall that I(X; Y) = H(Y) - H(Y|X), where H(Y) is constant. (2)
Solution 1: using feature X1 gives Pe = 0.26 and I(X1; Y) = 0.09.
Solution 2: using feature X2 gives Pe = 0.32 and I(X2; Y) = 0.24.
Observations: the MI is much larger using X2, but Pe is also much larger using X2.
21
Solution 1: using feature X1 Pe = 0.26 and I(X1; Y ) = 0.09 Solution 2: using feature X2 Pe = 0.32 and I(X2; Y ) = 0.24 Observations: the MI is much larger using X2 Pe is also much larger using X2
21
Solution 1: using feature X1 Pe = 0.26 and I(X1; Y ) = 0.09 Solution 2: using feature X2 Pe = 0.32 and I(X2; Y ) = 0.24 Observations: the MI is much larger using X2 Pe is also much larger using X2
21
Solution 1: using feature X1 Pe = 0.26 and I(X1; Y ) = 0.09 Solution 2: using feature X2 Pe = 0.32 and I(X2; Y ) = 0.24 Observations: the MI is much larger using X2 Pe is also much larger using X2
21
Conclusion: selecting X2 based on MI increases the probability of error.
Experiment:
- constant class priors
- binary tests Xi ∈ {0, 1} with random probabilities
- counting MI failures for pairs of tests
Increasing mutual information increases Pe for about 20% of the pairs.
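A minimal sketch of this experiment, assuming random conditional probabilities P(X = 1|Y) for each binary test and the fixed priors used earlier. The exact failure rate depends on the sampling scheme, so the printed figure is only indicative of the roughly 20% reported on the slide.

```python
import math, random

random.seed(0)
pa = 0.32  # constant class prior P(Y = a)

def h2(p):
    return 0.0 if p <= 0 or p >= 1 else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def random_test():
    """Draw a binary test X with random P(X=1|Y); return its (Pe, MI)."""
    qa, qb = random.random(), random.random()  # P(X=1|Y=a), P(X=1|Y=b)
    px1 = pa * qa + (1 - pa) * qb              # P(X=1)
    pa_x1 = pa * qa / px1                      # P(Y=a|X=1)
    pa_x0 = pa * (1 - qa) / (1 - px1)          # P(Y=a|X=0)
    pe = px1 * min(pa_x1, 1 - pa_x1) + (1 - px1) * min(pa_x0, 1 - pa_x0)
    mi = h2(pa) - (px1 * h2(pa_x1) + (1 - px1) * h2(pa_x0))
    return pe, mi

tests = [random_test() for _ in range(200)]
pairs = failures = 0
for i in range(len(tests)):
    for j in range(i + 1, len(tests)):
        (pe_i, mi_i), (pe_j, mi_j) = tests[i], tests[j]
        pairs += 1
        # Failure: the test with strictly larger MI also has strictly larger Pe
        if (mi_i - mi_j) * (pe_i - pe_j) > 0:
            failures += 1
print(f"MI failure rate: {failures / pairs:.1%}")
```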
Experiments corresponding to various classification problem difficulties.
- Conditional probability of failure: average probability of mutual information failure.
- Misclassification probability loss ΔPe: percentage of errors due to incorrect subset choice.
Artificial three-class balanced classification problems:
- for each class, each feature has a Gaussian conditional distribution
- Gaussian widths and centers are randomly drawn
- features are compared pair-wise
Observations:
- the average probability of mutual information failure is small
- misclassification probability losses remain small
- failures are more likely to occur for moderately difficult problems
Ten-class digit recognition dataset with 10992 instances / 16 features:
- 10 features randomly chosen among the available features
- forward search finds feature subsets of increasing sizes
- a step fails if it does not minimise the misclassification probability
Figure reproduced from F. Alimoglu (1996) Combining Multiple Classifiers for Pen-Based Handwritten Digit Recognition, MSc Thesis, Bogazici University
Observations:
- the average probability of mutual information failure is small
- misclassification probability losses remain small
- failures are more likely to occur for moderately difficult problems
The published results show that:
- mutual information is not necessarily optimal
- it is possible to design examples where MI fails
- the impact and the frequency of MI failures seem to be limited
Good news: MI remains a valuable criterion for feature selection.
Frénay, B., Doquire, G., Verleysen, M. On the potential inadequacy of mutual information for feature selection. In Proc. ESANN 2012, pp. 501-506.
Frénay, B., Doquire, G., Verleysen, M. Theoretical and empirical study on the potential inadequacy of mutual information for feature selection in
Doquire, G., Frénay, B., Verleysen, M. Risk estimation and feature selection. In Proc. ESANN 2013, pp. 161-166.
Mutual information (MI): I(X; Y) = H(Y) - H(Y|X) = -H(Y|X) + const.
As far as maximising I(X; Y) is concerned:
arg max_X I(X; Y) = arg min_X H(Y|X).
Let us discuss H(Y|X)...
Goal: reduce the average estimation error ε = Y - f(X).
Question: is there a link between H(Y|X) = H(ε|X) and
- the mean square error (MSE) E_{X,Y}{(Y - f(X))²} = E_{X,Y}{ε²}
- the mean absolute error (MAE) E_{X,Y}{|Y - f(X)|} = E_{X,Y}{|ε|} ?
Figure: f(x) = sin(x) polluted by uniform, Laplacian or Gaussian target noise.
Uniform estimation error: MSE = (1/12) exp(2H(Y|X)), MAE = (1/4) exp(H(Y|X))
Laplacian estimation error: MSE = (1/(2e²)) exp(2H(Y|X)), MAE = (1/(2e)) exp(H(Y|X))
Gaussian estimation error: MSE = (1/(2πe)) exp(2H(Y|X)), MAE = (1/(π√e)) exp(H(Y|X))
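Each closed form above can be sanity-checked against the known differential entropy, variance, and mean absolute deviation of the corresponding noise distribution (the scale parameters below are arbitrary):

```python
import math

# For each noise family: (entropy H, true MSE, true MAE, MSE coeff, MAE coeff)
cases = []

# Uniform on an interval of width w: H = ln w, var = w^2/12, E|e| = w/4
w = 2.0
cases.append((math.log(w), w**2 / 12, w / 4, 1 / 12, 1 / 4))

# Laplacian with scale b: H = 1 + ln(2b), var = 2b^2, E|e| = b
b = 0.7
cases.append((1 + math.log(2 * b), 2 * b**2, b,
              1 / (2 * math.e**2), 1 / (2 * math.e)))

# Gaussian with std s: H = 0.5 ln(2 pi e s^2), var = s^2, E|e| = s sqrt(2/pi)
s = 1.3
cases.append((0.5 * math.log(2 * math.pi * math.e * s**2), s**2,
              s * math.sqrt(2 / math.pi),
              1 / (2 * math.pi * math.e), 1 / (math.pi * math.sqrt(math.e))))

for H, mse, mae, c_mse, c_mae in cases:
    assert abs(c_mse * math.exp(2 * H) - mse) < 1e-9  # MSE = c exp(2H)
    assert abs(c_mae * math.exp(H) - mae) < 1e-9      # MAE = c exp(H)
print("all MSE/MAE-entropy relations check out")
```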
Figure: MSE and MAE in terms of H(Y|X) for identically distributed uniform (dotted line), Laplacian (dashed line) and Gaussian (solid line) estimation error.
Uniform/Laplacian/Gaussian estimation error: MSE ∝ exp(2H(Y|X)) and MAE ∝ exp(H(Y|X)).
Student estimation error with ν degrees of freedom (ν > 2):
MSE = exp( 2H(Y|X) - (ν + 1) [ψ((ν + 1)/2) - ψ(ν/2)] ) / ( (ν - 2) B(1/2, ν/2)² )
where B is the Beta function and ψ the digamma function.
Figure: MSE in terms of the conditional target entropy H(Y|X) for Student estimation error with different numbers of degrees of freedom: ν = 2.3 (dotted line), ν = 5 (dashed line) or ν = 30 (solid line).
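The Student relation can be sanity-checked numerically: compute the entropy of a standard Student-t (here ν = 5) by direct integration, approximate the digamma ψ by a finite difference of log-gamma, and compare the formula against the true variance ν/(ν - 2). This is an illustrative check, not the slides' code.

```python
import math

nu = 5.0

# Standard Student-t density (unit scale), normalising constant precomputed
c = math.gamma((nu + 1) / 2) / (math.gamma(nu / 2) * math.sqrt(nu * math.pi))
pdf = lambda x: c * (1 + x * x / nu) ** (-(nu + 1) / 2)

# Differential entropy H = -int f ln f dx by a plain Riemann sum;
# the integrand is negligible beyond |x| = 200 for nu = 5
step, lim = 1e-3, 200.0
H, x = 0.0, -lim
while x < lim:
    f = pdf(x)
    H -= f * math.log(f) * step
    x += step

def digamma(z, eps=1e-6):
    """Digamma psi(z) via a centred finite difference of log-gamma."""
    return (math.lgamma(z + eps) - math.lgamma(z - eps)) / (2 * eps)

B = math.gamma(0.5) * math.gamma(nu / 2) / math.gamma((nu + 1) / 2)  # B(1/2, nu/2)
mse = math.exp(2 * H - (nu + 1) * (digamma((nu + 1) / 2) - digamma(nu / 2))) \
      / ((nu - 2) * B ** 2)
print(f"formula: {mse:.4f}  vs  true variance nu/(nu-2) = {nu / (nu - 2):.4f}")
```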
Figure : Example of mutual information failure for Student estimation error with respect to the MSE. The candidate feature subsets correspond to different numbers of degrees of freedom: ν = 2.3 (dotted line) and ν = 5 (dashed line).
In some cases, there exists no deterministic relationship between the conditional entropy H(Y|X) and the MSE or MAE: the optimality of MI depends on the estimation error distribution. For the examples considered here, failures have limited impact. See: Frénay, B., Doquire, G., Verleysen, M. Is mutual information adequate for feature selection in regression? Neural Networks, 48:1-7, 2013.
Theoretical and experimental study:
- mutual information-based feature selection is legitimate
- counterexamples can be obtained in classification/regression
- such failures remain uncommon and of limited impact
Our recent works enhance the legitimacy of mutual information-based feature selection, while clearing up some common misunderstandings.
Summary: Frénay, B., Doquire, G., Verleysen, M. Mutual Information: an Adequate Tool for Feature Selection. In Proc. of BENELEARN 2013, 89.