Attribute Interactions in Machine Learning (PowerPoint PPT presentation by Aleks Jakulin)


SLIDE 1

Attribute Interactions in Machine Learning

Aleks Jakulin
Faculty of Computer and Information Science, University of Ljubljana

Attribute Interactions – p.1/17

SLIDE 2

A Classification Problem

ATTRIBUTES                               | LABEL
Name    Hair    Height   Weight   Lotion | Result
Sarah   blonde  average  light    no     | sunburned
Dana    blonde  tall     average  yes    | tanned
Alex    brown   short    average  yes    | tanned
Annie   blonde  short    average  no     | sunburned
Emily   red     average  heavy    no     | sunburned
Pete    brown   tall     heavy    no     | tanned
John    brown   average  heavy    no     | tanned
Katie   blonde  short    light    yes    | tanned

TASK: PREDICT AN INSTANCE’S CLASS GIVEN THE ATTRIBUTE VALUES.

SLIDE 3

Interactions

“We cannot conquer a group of interacting attributes by dividing them.”

Most machine learning algorithms assume either

  • that all attributes are independent (naïve Bayes, logistic regression, linear SVM, perceptron), or
  • that all attributes are dependent (classification trees, constructive induction, rules, kernel methods, instance-based methods).

However, voting ensembles, where a number of classifiers trained on subsets of attributes or instances vote to predict the label (attribute decomposition, random forests, decision graphs, subspace methods), yield good results. Why?

SLIDE 4

Voting

[Diagram: the attributes Hair, Name, Height, Weight, and Lotion each cast an independent vote for Result.]

SLIDE 5

Voting

[Diagram: Hair, Name, and Height now vote through a latent SKIN node alongside Lotion and Weight. Annotation: “WE DECLARE THIS TO BE: A TRUE INTERACTION”; possible explanations: spurious relationship, moderator.]

SLIDE 6

Voting

[Diagram: two latent nodes, SKIN and SIZE, between the attributes and Result. Annotations: “WE DECLARE THIS TO BE: A TRUE INTERACTION / A FALSE INTERACTION”; possible explanations: spurious relationship, moderator, latent cause.]

SLIDE 7

Simpson’s Paradox

[Chart: death rate (%) of tuberculosis patients by location, New York vs. Richmond; y-axis from 0.1 to 0.6.]

SLIDE 8

Simpson’s Paradox

[Chart: the same death rates for New York vs. Richmond, broken down into White, Non-White, and Both.]

SLIDE 9

Information Gain

[Diagram: attributes A, B and the label C as information sources.]

An attribute is an information source. We want to estimate the amount of information shared between two sources. The amount learned about the label C from an attribute A is quantified by information gain: GainC(A) := H(A) + H(C) − H(AC). Interpretation: our ignorance about the unknown C is reduced by GainC(A) once A is known. Information gain is sufficient if all attributes are conditionally independent with respect to the label, i.e., when there are only 2-way interactions.
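As a sanity check, information gain can be estimated from the sunburn table on slide 2. A minimal Python sketch (not part of the original slides), using empirical entropies:

```python
import math
from collections import Counter

def entropy(xs):
    """Empirical Shannon entropy H(X) in bits of a list of observed values."""
    n = len(xs)
    return -sum((c / n) * math.log2(c / n) for c in Counter(xs).values())

def info_gain(a, c):
    """Gain_C(A) = H(A) + H(C) - H(AC): the information A carries about C."""
    return entropy(a) + entropy(c) - entropy(list(zip(a, c)))

# Lotion and Result columns of the sunburn table
lotion = ["no", "yes", "yes", "no", "no", "no", "no", "yes"]
result = ["sunburned", "tanned", "tanned", "sunburned",
          "sunburned", "tanned", "tanned", "tanned"]
print(f"Gain_Result(Lotion) = {info_gain(lotion, result):.3f} bits")  # ~0.348
```

Knowing Lotion removes roughly a third of a bit of our ignorance about Result.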

SLIDE 10

Interaction Gain

[Diagram: information shared among attributes A, B and the label C.]

How do we estimate the amount of information shared among three attributes? The generalization of information gain to 3-way interactions is interaction gain:

IG3(ABC) := H(AB) + H(AC) + H(BC) − H(A) − H(B) − H(C) − H(ABC)
          = GainC(AB) − GainC(A) − GainC(B).

If IG negative: a false interaction. If IG positive: a true interaction. If IG zero: no 3-way interaction.
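The sign rules can be checked on two tiny hypothetical examples: an XOR label (a pure true interaction) and a duplicated attribute (a pure false interaction). A self-contained Python sketch:

```python
import math
from collections import Counter

def H(xs):
    """Empirical Shannon entropy in bits."""
    n = len(xs)
    return -sum((c / n) * math.log2(c / n) for c in Counter(xs).values())

def gain(a, c):
    """Information gain Gain_C(A) = H(A) + H(C) - H(AC)."""
    return H(a) + H(c) - H(list(zip(a, c)))

def ig3(a, b, c):
    """Interaction gain IG3(ABC) = Gain_C(AB) - Gain_C(A) - Gain_C(B)."""
    return gain(list(zip(a, b)), c) - gain(a, c) - gain(b, c)

a = [0, 0, 1, 1]
b = [0, 1, 0, 1]
xor = [x ^ y for x, y in zip(a, b)]
print(ig3(a, b, xor))  # +1.0: a true interaction (A and B together determine C)
print(ig3(a, a, a))    # -1.0: a false interaction (the second attribute duplicates A)
```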

SLIDE 11

False Interaction Analysis

[Interaction dendrogram over the Census/Adult attributes: age, marital-status, relationship, hours-per-week, sex, workclass, native-country, race, education, education-num, occupation, capital-gain, capital-loss, fnlwgt; y-axis: height, 200–1000.]

The Census/Adult domain from UCI, with two classes of individuals: rich and poor. The similarity between two attributes is proportional to the negated 3-way interaction gain between them and the label. Only false interactions were considered. Agglomerative clustering was used to create the interaction dendrogram.
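The similarity matrix that the clustering consumes can be sketched in Python on the toy sunburn columns as stand-ins for the Census attributes (the agglomerative step itself is omitted):

```python
import math
from collections import Counter
from itertools import combinations

def H(xs):
    """Empirical Shannon entropy in bits."""
    n = len(xs)
    return -sum((c / n) * math.log2(c / n) for c in Counter(xs).values())

def gain(a, c):
    return H(a) + H(c) - H(list(zip(a, c)))

def ig3(a, b, c):
    """3-way interaction gain of attributes a, b with the label c."""
    return gain(list(zip(a, b)), c) - gain(a, c) - gain(b, c)

# toy stand-in for the Census attributes: the sunburn columns from slide 2
data = {
    "hair":   ["blonde", "blonde", "brown", "blonde", "red", "brown", "brown", "blonde"],
    "height": ["average", "tall", "short", "short", "average", "tall", "average", "short"],
    "weight": ["light", "average", "average", "average", "heavy", "heavy", "heavy", "light"],
    "lotion": ["no", "yes", "yes", "no", "no", "no", "no", "yes"],
}
label = ["sunburned", "tanned", "tanned", "sunburned",
         "sunburned", "tanned", "tanned", "tanned"]

# similarity for the dendrogram: negated interaction gain; large positive
# similarity marks redundant (falsely interacting) attribute pairs
for x, y in combinations(data, 2):
    print(f"{x:7s} {y:7s} {-ig3(data[x], data[y], label):+.3f}")
```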

SLIDE 12

True Interaction Analysis

[Interaction graph over the Census/Adult attributes: native_country, age (100%), race (23%), workclass (75%), occupation (75%), capital_loss, capital_gain (63%), education (59%), marital_status (52%), relationship (46%), hours_per_week (35%).]

A percentage on an interaction graph edge indicates the strength of a true interaction. Native country appears to be an important moderator, moderating a large number of 2-way interactions. True interactions are rarely transitive relations: they form a forest of trees, not a single tree.

SLIDE 13

Interaction Significance (1)

When is an interaction significant? Special statistics exist for conditional dependence and independence tests, e.g., Cochran-Mantel-Haenszel. Alternatively, evaluate classifier performance on unseen data by comparing:

  • a classifier assuming independence between the two attributes (voting), and
  • a classifier exploiting the dependence between the two attributes via interaction resolution (segmentation).
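This pragmatic test can be sketched on hypothetical data; the voting and segmentation models below are deliberately simplistic stand-ins for the classifiers used in the thesis:

```python
import random
from collections import Counter, defaultdict

random.seed(0)

# hypothetical domain with a strong 3-way interaction: the label is A XOR B
data = []
for _ in range(200):
    a, b = random.randint(0, 1), random.randint(0, 1)
    data.append((a, b, a ^ b))
train, test = data[:150], data[150:]

# voting: model each attribute independently, sum the per-attribute label counts
votes_a, votes_b = defaultdict(Counter), defaultdict(Counter)
for a, b, c in train:
    votes_a[a][c] += 1
    votes_b[b][c] += 1

def vote(a, b):
    score = Counter()
    score.update(votes_a[a])
    score.update(votes_b[b])
    return score.most_common(1)[0][0]

# segmentation: resolve the interaction by modelling the joint attribute (a, b)
joint = defaultdict(Counter)
for a, b, c in train:
    joint[(a, b)][c] += 1
segment = {k: cnt.most_common(1)[0][0] for k, cnt in joint.items()}

err_vote = sum(vote(a, b) != c for a, b, c in test) / len(test)
err_seg = sum(segment.get((a, b), 0) != c for a, b, c in test) / len(test)
print(f"voting error: {err_vote:.2f}  segmentation error: {err_seg:.2f}")
```

On this domain the marginal votes carry no information about the XOR label, so the held-out error gap itself is the evidence that the interaction is significant.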

SLIDE 14

Interaction Significance (2)

[Interaction graph of the ‘breast’ domain: dozens of attributes (DRUZINSK, ED_FAZAS, KTZHT, ED_UPA, VASK.INV, OPERACIJ, RT, UPASEL, …) with interaction strengths mostly between 0% and 4%; a few stand out at 20%, 27%, 33%, and one at 100%.]

There are generally few significant interactions.

SLIDE 15

Interaction Significance (2)

[The same interaction graph of the ‘breast’ domain as on the previous slide.]

A perfect classification tree for the ‘breast’ domain, induced by C4.5:

ODDALJEN > 0: y
ODDALJEN <= 0:
:...LOKOREG. <= 0: n
    LOKOREG. > 0: y

But they matter: non-myopic feature selection, non-myopic split selection, non-myopic discretization, rules, trees, constructive induction.

SLIDE 16

Classification Performance

‘adult’
      Base   False  True
NBC   0.416  0.352  0.392
LR    1.562  0.418  1.564
SVM   —      —      —

‘breast’
      Base   False  True
NBC   0.262  0.187  0.171
LR    0.016  0.016  0.016
SVM   0.032  0.032  0.016

A wrapper algorithm detects true or false interactions with interaction gain and uses minimal-error attribute reduction to resolve them. No feature selection and no parameter tuning were used. Interaction resolution improves results with logistic regression, SVM, and the naïve Bayesian classifier. There must be enough data!

SLIDE 17

Applications

Prediction:

Resolving significant interactions helps improve classification performance. Interactions limit or prevent myopia in discretization and feature selection. Interactions justify constructive induction.

Analysis:

Interactions are interesting, especially if unexpected: interactions between treatments, symptoms, etc.

SLIDE 18

Summary of Contributions

  • Two kinds of interactions: true and false interactions.
  • Interaction gain: an interaction probe able to detect and classify 3-way interactions.
  • The pragmatic interaction significance test, based on comparing classification performance on unseen data.
  • A methodology for true and false interaction analysis, with interaction graphs and interaction dendrograms.
  • Improved classification performance through interaction resolution.

SLIDE 19

Further Work

  • A full-fledged tool for interaction analysis.
  • Support for numerical and ordered attributes.
  • Generalization to k-way interactions.
  • Improved methods of resolution, especially of false interactions.
  • Exploration of the implications of interactions for discretization, split selection, etc.
  • Applications.

SLIDE 20

Cardinality of Attributes

[Scatter plot: improvement by replacement (y-axis, −0.1 to 0.04) vs. number of joint attribute values (x-axis, 200–1600), Adult/Census domain.]

The greater the number of values in the constituent attributes, the lower the chance that the interaction between them is significant.

SLIDE 21

Attribute Reduction

[Scatter plot: improvement by replacement with minimal-error reduction (MinErr, y-axis) vs. improvement by replacement with the Cartesian product (x-axis), both −0.08 to 0.04, Adult domain.]

Minimal-error attribute reduction often yields better results than using the non-reduced Cartesian product of attributes.
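The reduction step can be illustrated with a deliberately extreme sketch: each joint value is mapped straight to its majority class, collapsing the Cartesian product to at most as many values as there are classes. This is an assumed simplification for illustration; the thesis's minimal-error reduction merges joint values with similar class behaviour rather than collapsing them fully.

```python
from collections import Counter, defaultdict

def min_error_reduce(a, b, c):
    """Replace the Cartesian product of attributes a and b by a reduced
    attribute: each joint value maps to the class it predicts best
    (an extreme, illustrative form of attribute reduction)."""
    table = defaultdict(Counter)
    for x, y, lab in zip(a, b, c):
        table[(x, y)][lab] += 1
    reduction = {k: cnt.most_common(1)[0][0] for k, cnt in table.items()}
    return [reduction[(x, y)] for x, y in zip(a, b)]

a, b = [0, 0, 1, 1], [0, 1, 0, 1]
c = [0, 1, 1, 0]  # XOR label
print(min_error_reduce(a, b, c))  # -> [0, 1, 1, 0]: 4 joint values collapse to 2
```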
