Attribute Interactions in Machine Learning (PowerPoint PPT presentation by Aleks Jakulin)


SLIDE 1

Attribute Interactions in Machine Learning

Aleks Jakulin
Faculty of Computer and Information Science, University of Ljubljana

Attribute Interactions – p.1/17

SLIDE 2

A Classification Problem

ATTRIBUTES                               | LABEL
Name    Hair    Height   Weight   Lotion | Result
Sarah   blonde  average  light    no     | sunburned
Dana    blonde  tall     average  yes    | tanned
Alex    brown   short    average  yes    | tanned
Annie   blonde  short    average  no     | sunburned
Emily   red     average  heavy    no     | sunburned
Pete    brown   tall     heavy    no     | tanned
John    brown   average  heavy    no     | tanned
Katie   blonde  short    light    yes    | tanned

TASK: PREDICT AN INSTANCE’S CLASS GIVEN THE ATTRIBUTE VALUES.

SLIDE 3

Interactions

“We cannot conquer a group of interacting attributes by dividing them.”

Most machine learning algorithms assume either

  • that all attributes are independent (naïve Bayes, logistic regression, linear SVM, perceptron), or
  • that all attributes are dependent (classification trees, constructive induction, rules, kernel methods, instance-based methods).

However, voting ensembles, where a number of classifiers trained on subsets of attributes or instances vote to predict the label (attribute decomposition, random forests, decision graphs, subspace methods), yield good results. Why?

SLIDE 4

Voting

[Diagram: the attributes Hair, Name, Height, Weight, and Lotion each cast an independent vote for Result.]

SLIDE 5

Voting

[Diagram: Hair, Name, and Height now vote through a latent SKIN node alongside Lotion and Weight. Annotation: “WE DECLARE THIS TO BE: A TRUE INTERACTION”; possible explanations: spurious relationship, moderator.]

SLIDE 6

Voting

[Diagram: two latent nodes, SKIN and SIZE, between the attributes and Result. Annotations: “WE DECLARE THIS TO BE: A TRUE INTERACTION / A FALSE INTERACTION”; possible explanations: spurious relationship, moderator, latent cause.]

SLIDE 7

Simpson’s Paradox

[Chart: death rate (%) of tuberculosis patients by location, New York vs. Richmond; y-axis from 0.1 to 0.6.]

SLIDE 8

Simpson’s Paradox

[Chart: the same death rates for New York vs. Richmond, broken down into White, Non-White, and Both.]

SLIDE 9

Information Gain

[Diagram: attributes A, B and the label C as information sources.]

An attribute is an information source. We want to estimate the amount of information shared between two sources. The amount learned about the label C from an attribute A is quantified by information gain: GainC(A) := H(A) + H(C) − H(AC). Interpretation: our ignorance about the unknown C is reduced by GainC(A) once A is known. Information gain is sufficient if all attributes are conditionally independent with respect to the label, i.e., when there are only 2-way interactions.
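As a sanity check, information gain can be estimated from the sunburn table on slide 2. A minimal Python sketch (not part of the original slides), using empirical entropies:

```python
import math
from collections import Counter

def entropy(xs):
    """Empirical Shannon entropy H(X) in bits of a list of observed values."""
    n = len(xs)
    return -sum((c / n) * math.log2(c / n) for c in Counter(xs).values())

def info_gain(a, c):
    """Gain_C(A) = H(A) + H(C) - H(AC): the information A carries about C."""
    return entropy(a) + entropy(c) - entropy(list(zip(a, c)))

# Lotion and Result columns of the sunburn table
lotion = ["no", "yes", "yes", "no", "no", "no", "no", "yes"]
result = ["sunburned", "tanned", "tanned", "sunburned",
          "sunburned", "tanned", "tanned", "tanned"]
print(f"Gain_Result(Lotion) = {info_gain(lotion, result):.3f} bits")  # ~0.348
```

Knowing Lotion removes roughly a third of a bit of our ignorance about Result.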

SLIDE 10

Interaction Gain

[Diagram: information shared among attributes A, B and the label C.]

How do we estimate the amount of information shared among three attributes? The generalization of information gain to 3-way interactions is interaction gain:

IG3(ABC) := H(AB) + H(AC) + H(BC) − H(A) − H(B) − H(C) − H(ABC)
          = GainC(AB) − GainC(A) − GainC(B).

If IG negative: a false interaction. If IG positive: a true interaction. If IG zero: no 3-way interaction.
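The sign rules can be checked on two tiny hypothetical examples: an XOR label (a pure true interaction) and a duplicated attribute (a pure false interaction). A self-contained Python sketch:

```python
import math
from collections import Counter

def H(xs):
    """Empirical Shannon entropy in bits."""
    n = len(xs)
    return -sum((c / n) * math.log2(c / n) for c in Counter(xs).values())

def gain(a, c):
    """Information gain Gain_C(A) = H(A) + H(C) - H(AC)."""
    return H(a) + H(c) - H(list(zip(a, c)))

def ig3(a, b, c):
    """Interaction gain IG3(ABC) = Gain_C(AB) - Gain_C(A) - Gain_C(B)."""
    return gain(list(zip(a, b)), c) - gain(a, c) - gain(b, c)

a = [0, 0, 1, 1]
b = [0, 1, 0, 1]
xor = [x ^ y for x, y in zip(a, b)]
print(ig3(a, b, xor))  # +1.0: a true interaction (A and B together determine C)
print(ig3(a, a, a))    # -1.0: a false interaction (the second attribute duplicates A)
```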

SLIDE 11

False Interaction Analysis

[Interaction dendrogram over the Census/Adult attributes: age, marital-status, relationship, hours-per-week, sex, workclass, native-country, race, education, education-num, occupation, capital-gain, capital-loss, fnlwgt; y-axis: height, 200–1000.]

The Census/Adult domain from UCI, with two classes of individuals: rich and poor. The similarity between two attributes is proportional to the negated 3-way interaction gain between them and the label. Only false interactions were considered. Agglomerative clustering was used to create the interaction dendrogram.
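The similarity matrix that the clustering consumes can be sketched in Python on the toy sunburn columns as stand-ins for the Census attributes (the agglomerative step itself is omitted):

```python
import math
from collections import Counter
from itertools import combinations

def H(xs):
    """Empirical Shannon entropy in bits."""
    n = len(xs)
    return -sum((c / n) * math.log2(c / n) for c in Counter(xs).values())

def gain(a, c):
    return H(a) + H(c) - H(list(zip(a, c)))

def ig3(a, b, c):
    """3-way interaction gain of attributes a, b with the label c."""
    return gain(list(zip(a, b)), c) - gain(a, c) - gain(b, c)

# toy stand-in for the Census attributes: the sunburn columns from slide 2
data = {
    "hair":   ["blonde", "blonde", "brown", "blonde", "red", "brown", "brown", "blonde"],
    "height": ["average", "tall", "short", "short", "average", "tall", "average", "short"],
    "weight": ["light", "average", "average", "average", "heavy", "heavy", "heavy", "light"],
    "lotion": ["no", "yes", "yes", "no", "no", "no", "no", "yes"],
}
label = ["sunburned", "tanned", "tanned", "sunburned",
         "sunburned", "tanned", "tanned", "tanned"]

# similarity for the dendrogram: negated interaction gain; large positive
# similarity marks redundant (falsely interacting) attribute pairs
for x, y in combinations(data, 2):
    print(f"{x:7s} {y:7s} {-ig3(data[x], data[y], label):+.3f}")
```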

SLIDE 12

True Interaction Analysis

[Interaction graph over the Census/Adult attributes: native_country, age (100%), race (23%), workclass (75%), occupation (75%), capital_loss, capital_gain (63%), education (59%), marital_status (52%), relationship (46%), hours_per_week (35%).]

A percentage on an interaction graph edge indicates the strength of a true interaction. Native country appears to be an important moderator, moderating a large number of 2-way interactions. True interactions are rarely transitive relations: they form a forest of trees, not a single tree.

SLIDE 13

Interaction Significance (1)

When is an interaction significant? Special statistics exist for conditional dependence and independence tests, e.g., Cochran-Mantel-Haenszel. Alternatively, evaluate classifier performance on unseen data by comparing:

  • a classifier assuming independence between the two attributes (voting), and
  • a classifier exploiting the dependence between the two attributes via interaction resolution (segmentation).
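This pragmatic test can be sketched on hypothetical data; the voting and segmentation models below are deliberately simplistic stand-ins for the classifiers used in the thesis:

```python
import random
from collections import Counter, defaultdict

random.seed(0)

# hypothetical domain with a strong 3-way interaction: the label is A XOR B
data = []
for _ in range(200):
    a, b = random.randint(0, 1), random.randint(0, 1)
    data.append((a, b, a ^ b))
train, test = data[:150], data[150:]

# voting: model each attribute independently, sum the per-attribute label counts
votes_a, votes_b = defaultdict(Counter), defaultdict(Counter)
for a, b, c in train:
    votes_a[a][c] += 1
    votes_b[b][c] += 1

def vote(a, b):
    score = Counter()
    score.update(votes_a[a])
    score.update(votes_b[b])
    return score.most_common(1)[0][0]

# segmentation: resolve the interaction by modelling the joint attribute (a, b)
joint = defaultdict(Counter)
for a, b, c in train:
    joint[(a, b)][c] += 1
segment = {k: cnt.most_common(1)[0][0] for k, cnt in joint.items()}

err_vote = sum(vote(a, b) != c for a, b, c in test) / len(test)
err_seg = sum(segment.get((a, b), 0) != c for a, b, c in test) / len(test)
print(f"voting error: {err_vote:.2f}  segmentation error: {err_seg:.2f}")
```

On this domain the marginal votes carry no information about the XOR label, so the held-out error gap itself is the evidence that the interaction is significant.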

SLIDE 14

Interaction Significance (2)

[Interaction graph of the ‘breast’ domain: dozens of attributes (DRUZINSK, ED_FAZAS, KTZHT, ED_UPA, VASK.INV, OPERACIJ, RT, UPASEL, …) with interaction strengths mostly between 0% and 4%; a few stand out at 20%, 27%, 33%, and one at 100%.]

There are generally few significant interactions.

SLIDE 15

Interaction Significance (2)

[The same interaction graph of the ‘breast’ domain as on the previous slide.]

A perfect classification tree for the ‘breast’ domain, induced by C4.5:

ODDALJEN > 0: y
ODDALJEN <= 0:
:...LOKOREG. <= 0: n
    LOKOREG. > 0: y

But they matter: non-myopic feature selection, non-myopic split selection, non-myopic discretization, rules, trees, constructive induction.

SLIDE 16

Classification Performance

‘adult’
      Base   False  True
NBC   0.416  0.352  0.392
LR    1.562  0.418  1.564
SVM   —      —      —

‘breast’
      Base   False  True
NBC   0.262  0.187  0.171
LR    0.016  0.016  0.016
SVM   0.032  0.032  0.016

A wrapper algorithm detects true or false interactions with interaction gain and uses minimal-error attribute reduction to resolve them. No feature selection and no parameter tuning were used. Interaction resolution improves results with logistic regression, SVM, and the naïve Bayesian classifier. There must be enough data!

SLIDE 17

Applications

Prediction:

Resolving significant interactions helps improve classification performance. Interactions limit or prevent myopia in discretization and feature selection. Interactions justify constructive induction.

Analysis:

Interactions are interesting, especially if unexpected: interactions between treatments, symptoms, etc.

SLIDE 18

Summary of Contributions

  • Two kinds of interactions: true and false interactions.
  • Interaction gain: an interaction probe able to detect and classify 3-way interactions.
  • The pragmatic interaction significance test, based on comparing classification performance on unseen data.
  • A methodology for true and false interaction analysis, with interaction graphs and interaction dendrograms.
  • Improved classification performance through interaction resolution.

SLIDE 19

Further Work

  • A full-fledged tool for interaction analysis.
  • Support for numerical and ordered attributes.
  • Generalization to k-way interactions.
  • Improved methods of resolution, especially of false interactions.
  • Exploration of the implications of interactions for discretization, split selection, etc.
  • Applications.

SLIDE 20

Cardinality of Attributes

[Scatter plot: improvement by replacement (y-axis, −0.1 to 0.04) vs. number of joint attribute values (x-axis, 200–1600), Adult/Census domain.]

The greater the number of values in the constituent attributes, the lower the chance that the interaction between them is significant.

SLIDE 21

Attribute Reduction

[Scatter plot: improvement by replacement with minimal-error reduction (MinErr, y-axis) vs. improvement by replacement with the Cartesian product (x-axis), both −0.08 to 0.04, Adult domain.]

Minimal-error attribute reduction often yields better results than using the non-reduced Cartesian product of attributes.
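The reduction step can be illustrated with a deliberately extreme sketch: each joint value is mapped straight to its majority class, collapsing the Cartesian product to at most as many values as there are classes. This is an assumed simplification for illustration; the thesis's minimal-error reduction merges joint values with similar class behaviour rather than collapsing them fully.

```python
from collections import Counter, defaultdict

def min_error_reduce(a, b, c):
    """Replace the Cartesian product of attributes a and b by a reduced
    attribute: each joint value maps to the class it predicts best
    (an extreme, illustrative form of attribute reduction)."""
    table = defaultdict(Counter)
    for x, y, lab in zip(a, b, c):
        table[(x, y)][lab] += 1
    reduction = {k: cnt.most_common(1)[0][0] for k, cnt in table.items()}
    return [reduction[(x, y)] for x, y in zip(a, b)]

a, b = [0, 0, 1, 1], [0, 1, 0, 1]
c = [0, 1, 1, 0]  # XOR label
print(min_error_reduce(a, b, c))  # -> [0, 1, 1, 0]: 4 joint values collapse to 2
```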
