Attribute Interactions
in Machine Learning
Aleks Jakulin Faculty of Computer and Information Science University of Ljubljana
Attribute Interactions – p.1/17
A Classification Problem
ATTRIBUTES                                   LABEL
Name    Hair    Height   Weight   Lotion    Result
Sarah   blonde  average  light    no        sunburned
Dana    blonde  tall     average  yes       tanned
Alex    brown   short    average  yes       tanned
Annie   blonde  short    average  no        sunburned
Emily   red     average  heavy    no        sunburned
Pete    brown   tall     heavy    no        tanned
John    brown   average  heavy    no        tanned
Katie   blonde  short    light    yes       tanned
TASK: PREDICT AN INSTANCE’S CLASS GIVEN THE ATTRIBUTE VALUES.
Attribute Interactions – p.2/17
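As a concrete sketch (not code from the talk), the table can be encoded in Python, and a single hand-written rule over Hair and Lotion already reproduces every label:

```python
# The sunburn toy dataset from the slide: attributes plus the class label.
DATA = [
    # (Name,   Hair,     Height,    Weight,    Lotion, Result)
    ("Sarah", "blonde", "average", "light",   "no",  "sunburned"),
    ("Dana",  "blonde", "tall",    "average", "yes", "tanned"),
    ("Alex",  "brown",  "short",   "average", "yes", "tanned"),
    ("Annie", "blonde", "short",   "average", "no",  "sunburned"),
    ("Emily", "red",    "average", "heavy",   "no",  "sunburned"),
    ("Pete",  "brown",  "tall",    "heavy",   "no",  "tanned"),
    ("John",  "brown",  "average", "heavy",   "no",  "tanned"),
    ("Katie", "blonde", "short",   "light",   "yes", "tanned"),
]

def predict(hair, lotion):
    """One rule consistent with the table: fair-haired (blonde or red)
    people burn unless they use lotion."""
    return "sunburned" if hair in ("blonde", "red") and lotion == "no" else "tanned"

# The rule reproduces every label in the table.
assert all(predict(hair, lotion) == result
           for _, hair, _, _, lotion, result in DATA)
```

Note that neither Hair nor Lotion alone determines the label; only their combination does, which is the kind of dependency the rest of the talk is about.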
“We cannot conquer a group of interacting attributes by dividing them.”

Most machine learning algorithms assume either
that all attributes are independent (naïve Bayes, logistic regression, linear SVM, perceptron), or
that any attribute may interact with any other (constructive induction, rules, kernel methods, instance-based methods).
However, voting ensembles, where a number of classifiers trained on subsets of attributes or instances vote to predict the label (attribute decomposition, random forests, decision graphs, subspace methods), yield good results. Why?
Attribute Interactions – p.3/17
[Figure: two causal diagrams over attributes (SKIN, SIZE) and the label. Depending on the structure, we declare the relationship to be a true interaction or a false interaction; related patterns are a spurious relationship, a moderator, and a latent cause.]
Attribute Interactions – p.4/17
[Figure: bar chart of the death rate (%, 0.1–0.6) of tuberculosis patients by location, New York vs. Richmond; a second panel splits the patients into White, Non-White, and Both.]
Attribute Interactions – p.5/17
[Diagram: Venn diagram of the entropies of attributes A, B and the label C.]

An attribute is an information source. We want to estimate the amount of information shared between two sources. The amount learned about a label C from an attribute A is quantified by information gain:

Gain_C(A) := H(A) + H(C) − H(AC).

Interpretation: our ignorance about the unknown C is reduced by Gain_C(A) given knowledge of A. This is sufficient if all attributes are conditionally independent with respect to the label, i.e., when there are only 2-way interactions.
Attribute Interactions – p.6/17
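A minimal sketch of this computation in Python, estimating entropies from counts on the sunburn data (function names are mine, not from the slides):

```python
from collections import Counter
from math import log2

def H(values):
    """Shannon entropy (bits) of the empirical distribution of `values`."""
    n = len(values)
    return -sum(c / n * log2(c / n) for c in Counter(values).values())

def gain(label, attr):
    """Gain_C(A) = H(A) + H(C) - H(AC): information A carries about the label C."""
    return H(attr) + H(label) - H(list(zip(attr, label)))

# Sunburn data from the earlier slide: Lotion and Result.
lotion = ["no", "yes", "yes", "no", "no", "no", "no", "yes"]
result = ["sunburned", "tanned", "tanned", "sunburned",
          "sunburned", "tanned", "tanned", "tanned"]
print(gain(result, lotion))  # how much Lotion tells us about Result
```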
[Diagram: Venn diagram of the entropies of attributes A, B and the label C.]

IG3(ABC) := H(AB) + H(AC) + H(BC) − H(A) − H(B) − H(C) − H(ABC)
          = Gain_C(AB) − Gain_C(A) − Gain_C(B).

If IG3 is negative: a false interaction. If IG3 is positive: a true interaction. If IG3 is zero: no 3-way interaction.
Attribute Interactions – p.7/17
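The same entropy estimate extends to the 3-way interaction gain; a sketch on the sunburn data from the earlier slide, where Hair and Lotion form a true interaction with the label:

```python
from collections import Counter
from math import log2

def H(values):
    """Shannon entropy (bits) of the empirical distribution of `values`."""
    n = len(values)
    return -sum(c / n * log2(c / n) for c in Counter(values).values())

def ig3(a, b, c):
    """IG3(ABC) = H(AB)+H(AC)+H(BC) - H(A)-H(B)-H(C) - H(ABC)."""
    return (H(list(zip(a, b))) + H(list(zip(a, c))) + H(list(zip(b, c)))
            - H(a) - H(b) - H(c) - H(list(zip(a, b, c))))

# Hair and Lotion jointly determine Result, while neither does alone,
# so their interaction gain with the label is positive: a true interaction.
hair   = ["blonde", "blonde", "brown", "blonde", "red", "brown", "brown", "blonde"]
lotion = ["no", "yes", "yes", "no", "no", "no", "no", "yes"]
result = ["sunburned", "tanned", "tanned", "sunburned",
          "sunburned", "tanned", "tanned", "tanned"]
print(ig3(hair, lotion, result))
```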
[Figure: interaction dendrogram of the Census/Adult attributes (age, marital-status, relationship, hours-per-week, sex, workclass, native-country, race, education, education-num, capital-gain, capital-loss, fnlwgt).]

The Census/Adult domain from UCI, with 2 classes of individuals: rich and poor. The similarity between two attributes is proportional to the negated 3-way interaction gain between them and the label. Only false interactions were taken into consideration. Agglomerative clustering was used to create the interaction dendrogram.
Attribute Interactions – p.8/17
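The slides do not show the clustering code; the sketch below is a generic single-linkage agglomerative procedure over an illustrative, made-up dissimilarity matrix standing in for the negated interaction gains:

```python
# Minimal agglomerative (single-linkage) clustering over a dissimilarity
# matrix; in the talk the dissimilarity between two attributes is the
# negated 3-way interaction gain with the label.  The matrix below is
# invented for illustration, not computed from the Census data.

def single_linkage(names, dist):
    """Repeatedly merge the two closest clusters; return the merge order."""
    clusters = [{i} for i in range(len(names))]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(dist[a][b] for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((sorted(names[k] for k in clusters[i] | clusters[j]), d))
        clusters[i] |= clusters[j]
        del clusters[j]
    return merges

names = ["education", "education-num", "sex", "relationship"]
dist = [[0.0, 0.1, 0.9, 0.8],
        [0.1, 0.0, 0.9, 0.8],
        [0.9, 0.9, 0.0, 0.3],
        [0.8, 0.8, 0.3, 0.0]]
for members, d in single_linkage(names, dist):
    print(members, d)
```

The merge order is the dendrogram read bottom-up: strongly related attributes (low dissimilarity) join first.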
[Figure: interaction graph for the 'adult' domain; percentages mark the strength of each attribute or interaction, e.g. age 100%, workclass 75%, capital_gain 63%, education 59%, marital_status 52%, relationship 46%, hours_per_week 35%, race 23%, native_country, capital_loss.]
Attribute Interactions – p.9/17
Attribute Interactions – p.10/17
[Figure: interaction graph of the attributes of the 'breast' domain, annotated with interaction strengths as percentages.]
ODDALJEN > 0: y
ODDALJEN <= 0:
:...LOKOREG. <= 0: n
    LOKOREG. > 0: y

A PERFECT CLASSIFICATION TREE FOR THE ‘BREAST’ DOMAIN, INDUCED BY C4.5.
Attribute Interactions – p.11/17
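Written out as code, the induced tree is just two tests; a sketch (the reading of ODDALJEN as distant metastases and LOKOREG. as locoregional spread is my gloss, not stated on the slide):

```python
def breast_tree(oddaljen, lokoreg):
    """The C4.5 tree from the slide: if ODDALJEN > 0, predict 'y';
    otherwise LOKOREG. decides between 'n' (<= 0) and 'y' (> 0)."""
    if oddaljen > 0:
        return "y"
    return "n" if lokoreg <= 0 else "y"
```

Two attributes suffice for a perfect classification of this domain, which is why the many near-0% attributes in the interaction graph above carry so little weight.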
‘adult’
       Base    False   True
NBC    0.416   0.352   0.392
LR     1.562   0.418   1.564
SVM    —       —       —

‘breast’
       Base    False   True
NBC    0.262   0.187   0.171
LR     0.016   0.016   0.016
SVM    0.032   0.032   0.016
A wrapper algorithm detects true and false interactions with interaction gain, and uses minimal-error attribute reduction to resolve them. No feature selection and no parameter tuning were used. It improves the results of logistic regression, SVM, and the naïve Bayesian classifier. There must be enough data!
Attribute Interactions – p.12/17
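The slides give no pseudocode for the wrapper; the sketch below is a hypothetical simplification that joins each attribute pair whose interaction gain exceeds a threshold into one Cartesian-product attribute (the actual algorithm additionally applies minimal-error reduction to the joined attribute's values and validates the change on held-out data):

```python
from collections import Counter
from math import log2

def H(values):
    n = len(values)
    return -sum(c / n * log2(c / n) for c in Counter(values).values())

def ig3(a, b, c):
    """IG3(ABC) = H(AB)+H(AC)+H(BC) - H(A)-H(B)-H(C) - H(ABC)."""
    return (H(list(zip(a, b))) + H(list(zip(a, c))) + H(list(zip(b, c)))
            - H(a) - H(b) - H(c) - H(list(zip(a, b, c))))

def resolve_true_interactions(attrs, label, threshold=0.01):
    """Hypothetical wrapper step: replace each attribute pair whose
    interaction gain with the label exceeds `threshold` by a single
    Cartesian-product attribute."""
    names = sorted(attrs)
    joined = dict(attrs)
    for i, x in enumerate(names):
        for y in names[i + 1:]:
            if x in joined and y in joined and ig3(attrs[x], attrs[y], label) > threshold:
                joined[x + "*" + y] = list(zip(joined.pop(x), joined.pop(y)))
    return joined

hair   = ["blonde", "blonde", "brown", "blonde", "red", "brown", "brown", "blonde"]
lotion = ["no", "yes", "yes", "no", "no", "no", "no", "yes"]
result = ["sunburned", "tanned", "tanned", "sunburned",
          "sunburned", "tanned", "tanned", "tanned"]
new_attrs = resolve_true_interactions({"hair": hair, "lotion": lotion}, result)
print(sorted(new_attrs))  # hair and lotion merged into one attribute
```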
Prediction:
Analysis:
Attribute Interactions – p.13/17
Attribute Interactions – p.14/17
Attribute Interactions – p.15/17
[Figure: improvement by replacement (0.02–0.04) vs. the number of joint attribute values (200–1600), Adult/Census domain.]
Attribute Interactions – p.16/17
[Figure: improvement by replacement (0.02–0.04) with the Cartesian product vs. with minimal-error (MinErr) attribute reduction, Adult domain.]
Attribute Interactions – p.17/17