Transmogrification: The Magic of Feature Engineering - Leah McGuire (PowerPoint PPT Presentation)



SLIDE 1

Transmogrification: The Magic of Feature Engineering

Leah McGuire and Mayukh Bhaowal

SLIDE 2

SLIDE 3

SLIDE 4

SLIDE 5

SLIDE 6

ML algorithms take center stage in AI

Raw Data → Feature Engineering (Bottleneck) → Modeling

SLIDE 7

[Table: sparse numeric feature matrix with columns X1–X5 and a label column Y]

Mythical Numeric Matrix

SLIDE 8

Use the data types

SLIDE 9

Numeric:
  • Imputation
  • Track null value
  • Log transformation for large range
  • Scaling - zNormalize
  • Smart binning

Categorical:
  • Imputation
  • Track null value
  • One-hot encoding
  • Dynamic top-K pivot
  • Smart binning
  • LabelCount encoding
  • Category embedding

Text:
  • Tokenization
  • Hash encoding
  • TF-IDF
  • Word2Vec
  • Sentiment analysis
  • Language detection

Temporal:
  • Time difference
  • Circular statistics
  • Time extraction (day, week, month, year)
  • Closeness to major events

Spatial:
  • Geo-encoding
  • Augment with external data, e.g. average income
  • Spatial fraudulent behavior, e.g. impossible travel speed

Automatic Feature Engineering
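The type-driven defaults above can be sketched in plain Scala. This is an illustrative toy, not the TransmogrifAI API: numerics are mean-imputed with a null-tracking flag, and categoricals are one-hot encoded against a fixed vocabulary with an "other" bucket. All type names and defaults here are invented for the sketch.

```scala
// Toy feature types (illustrative, not a real library's types).
sealed trait Feature
case class Numeric(value: Option[Double]) extends Feature
case class Categorical(value: Option[String]) extends Feature

// Encode a feature into numeric columns using type-appropriate defaults:
// numerics -> (imputed value, isNull flag); categoricals -> one-hot + "other".
def encode(f: Feature, vocab: Seq[String], mean: Double): Seq[Double] = f match {
  case Numeric(v) =>
    Seq(v.getOrElse(mean), if (v.isEmpty) 1.0 else 0.0) // value + null tracker
  case Categorical(v) =>
    val hot = vocab.map(c => if (v.contains(c)) 1.0 else 0.0)
    hot :+ (if (v.exists(c => !vocab.contains(c))) 1.0 else 0.0) // "other" bucket
}

// A missing age is mean-imputed and flagged; a known domain is one-hot encoded.
val featureVector =
  encode(Numeric(None), Nil, mean = 35.0) ++
  encode(Categorical(Some("gmail.com")), Seq("gmail.com", "yahoo.com"), mean = 0.0)
// featureVector == Seq(35.0, 1.0, 1.0, 0.0, 0.0)
```

The point of the dispatch-on-type pattern is that the caller never names the transformations, which is what makes the one-line `transmogrify()` call on the next slide possible.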

SLIDE 10

Transmogrification

val featureVector = Seq(age, phone, email, subject, zipCode).transmogrify()

SLIDE 11

Impact on Feature Engineering

Raw: Zipcode, Subject, Phone, Email, Age
Derived: Age [0-15], Age [15-35], Age [>35], Email Is Spammy, Top 10 Email Domain, Country Code, Phone Is Valid, Top TF-IDF Terms, Average Income → Feature Vector

SLIDE 12

SLIDE 13

SLIDE 14

The Black Swan of Perfectly Interpretable Models

Leah McGuire, Mayukh Bhaowal

SLIDE 15

SLIDE 16

Roadmap for this talk

  • What does it mean to explain your model?
  • Why explain your model?
  • How to explain your model?
  • Complications of feature engineering
  • Global (full model) solutions
  • Local (record level) solutions
  • Interpretability vs accuracy tradeoff

SLIDE 17

Roadmap for this talk

  • What does it mean to explain your model?
  • Why explain your model?
  • How to explain your model?
  • Complications of feature engineering
  • Global (full model) solutions
  • Local (record level) solutions
  • Interpretability vs accuracy tradeoff

SLIDE 18

The Question

Why did the machine learning model make the decision that it did?

SLIDE 19

Translation #1

How do I fix this model?

— Data Scientist

SLIDE 20

Translation #2

Do we have our bases covered in case of a regulatory audit?

— Legal Counsel

SLIDE 21

Translation #3

Does Einstein know what I know? How do I use this prediction?

— Non-Technical End User

SLIDE 22

Input → P1(c | f), …, Pk(c | f), …, Pn(c | f) → Σ → Output

SLIDE 23

Model Insights Report

SLIDE 24

Roadmap for this talk

  • What does it mean to explain your model?
  • Why explain your model?
  • How to explain your model?
  • Complications of feature engineering
  • Global (full model) solutions
  • Local (record level) solutions
  • Interpretability vs accuracy tradeoff

SLIDE 25

Debuggability

[Chart: model F1 score]

Top contributing features for surviving the Titanic:

  1. Gender
  2. pClass
  3. Body
SLIDE 26

Trust

"How can you trust a man that wears both a belt and suspenders? Man can't even trust his own pants."

SLIDE 27

[Diagram: 2×2 grid of Machine vs. Human being Right vs. Wrong]

SLIDE 28

Bias

SLIDE 29

Legal

SLIDE 30

SLIDE 31

Black defendants receive higher risk scores

https://www.propublica.org/article/how-we-analyzed-the-compas-recidivism-algorithm

SLIDE 32

Actionable

SLIDE 33

Roadmap for this talk

  • What does it mean to explain your model?
  • Why explain your model?
  • How to explain your model?
  • Complications of feature engineering
  • Global (full model) solutions
  • Local (record level) solutions
  • Interpretability vs accuracy tradeoff

SLIDE 34

It’s complicated

SLIDE 35

Decision questions:
  • Can you use a simple model?
  • Are the raw features fed into the model interpretable?
  • Does the consumer care about how features affect the model, or just feature insights?
  • Does the consumer care about individual predictions?

Resulting options: Feature Weights / Importance (Global), Feature Impact Model Agnostic (Global), Secondary Model (Global), Feature Weights / Importance (Local), Feature Impact Model Agnostic (Local), Secondary Model (Local)

SLIDE 36

Roadmap for this talk

  • What does it mean to explain your model?
  • Why explain your model?
  • How to explain your model?
  • Complications of feature engineering
  • Global (full model) solutions
  • Local (record level) solutions
  • Interpretability vs accuracy tradeoff

SLIDE 37

The best model or the model you can explain?

SLIDE 38

Roadmap for this talk

  • What does it mean to explain your model?
  • Why explain your model?
  • How to explain your model?
  • Complications of feature engineering
  • Global (full model) solutions
  • Local (record level) solutions
  • Interpretability vs accuracy tradeoff

SLIDE 39

[Table: sparse numeric feature matrix with columns X1–X5 and a label column Y]

Where did you get the feature matrix?

SLIDE 40

Feature Engineering

Raw: Zipcode, Subject, Phone, Email, Age
Derived: Age [0-15], Age [15-35], Age [>35], Email Is Spammy, Top 10 Email Domain, Country Code, Phone Is Valid, Top TF-IDF Terms, Average Income → Feature Vector

SLIDE 41

Metadata!!!

https://ontotext.com/knowledgehub/fundamentals/metadata-fundamental/

  • The name of the feature the column was made from
  • The name of the RAW feature(s) the column was made from
  • Everything you did to get the column
  • Any grouping information across columns
  • Description of the value in the column
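The bullets above amount to a per-column metadata record that travels with the feature vector. A minimal sketch in plain Scala; field names and the example values are illustrative, not TransmogrifAI's actual schema:

```scala
// One metadata record per column of the feature matrix (illustrative fields).
case class ColumnMetadata(
  parentFeature: String,           // feature the column was made from
  parentRawFeatures: Seq[String],  // RAW feature(s) it traces back to
  stagesApplied: Seq[String],      // everything done to get the column
  grouping: Option[String],        // grouping information across sibling columns
  description: String              // meaning of the value in the column
)

// Hypothetical record for the "Email Is Spammy" column from the earlier slide.
val emailIsSpammy = ColumnMetadata(
  parentFeature = "emailDomain",
  parentRawFeatures = Seq("email"),
  stagesApplied = Seq("splitDomain", "spamListLookup"),
  grouping = Some("emailIndicators"),
  description = "1.0 if the email domain appears on a spam list"
)
```

With records like this, any weight or importance computed on a derived column can be traced back to the raw feature and the transformations that produced it, which is exactly what the explanation slides below rely on.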

SLIDE 42

Roadmap for this talk

  • What does it mean to explain your model?
  • Why explain your model?
  • How to explain your model?
  • Complications of feature engineering
  • Global (full model) solutions
  • Local (record level) solutions
  • Interpretability vs accuracy tradeoff

SLIDE 43

Interpretability: Global vs Local

SLIDE 44

Decision questions:
  • Can you use a simple model?
  • Are the raw features fed into the model interpretable?
  • Does the consumer care about how features affect the model, or just feature insights?
  • Does the consumer care about individual predictions?

Global options: Feature Weights / Importance, Feature Impact Model Agnostic, Secondary Model

SLIDE 45

Feature Weight / Importance (Global)

SLIDE 46

Predict House Price

SLIDE 47

Predict Titanic Passenger Survival

SLIDE 48

Input → P1(c | f), …, Pk(c | f), …, Pn(c | f) → Σ → Output

SLIDE 49

Feature Impact (Global - the hard way)

[Table: sparse numeric feature matrix with columns X1–X5 and a label column Y]

SLIDE 50

Feature Impact (Global - the hard way)
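A common reading of "the hard way" is drop-column impact: retrain with each feature column removed and record how much a quality metric drops. A self-contained toy sketch; the four-row dataset and the trivial one-column "model" are invented purely to make the loop runnable:

```scala
// Rows of (feature values, label); feature 0 perfectly predicts the label,
// feature 1 is anti-correlated noise for this toy.
val data = Seq(
  (Seq(1.0, 0.3), 1.0),
  (Seq(0.0, 0.7), 0.0),
  (Seq(1.0, 0.1), 1.0),
  (Seq(0.0, 0.9), 0.0)
)

// Toy "training": threshold each kept column at 0.5 and keep the best
// accuracy. A real pipeline would retrain the full model each time.
def fitScore(keep: Seq[Int]): Double =
  if (keep.isEmpty) 0.5
  else keep.map { j =>
    data.count { case (x, y) =>
      (if (x(j) >= 0.5) 1.0 else 0.0) == y
    }.toDouble / data.size
  }.max

// Global impact of feature j = full-model score minus score without j.
val full = fitScore(Seq(0, 1))
val impacts = Seq(0, 1).map(j => j -> (full - fitScore(Seq(0, 1).filterNot(_ == j))))
// impacts == Seq(0 -> 1.0, 1 -> 0.0): dropping feature 0 costs all the accuracy
```

This is "hard" because it needs one full retrain per feature, which is why the cheaper weight/correlation summaries on the surrounding slides are usually preferred at scale.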

SLIDE 51

Issues with Feature Importance / Weight / Impact (Global)

http://resources.esri.com/help/9.3/arcgisengine/java/gp_toolref/spatial_statistics_toolbox/multicollinearity.htm

SLIDE 52

Secondary Model

[Diagram: Input → Prediction, with a Secondary Model producing the Explanation]

SLIDE 53

Secondary Model (Global)

SLIDE 54

Secondary Model (Global)

https://www.statmethods.net/advgraphs/images/corrgram1.png

SLIDE 55

What we do:

  • All the metadata about how you got the feature
  • Correlation
  • Mutual information
  • Feature weight / importance
  • Feature distribution
SLIDE 56

What we do:

{
  "featureName" : "sex",
  "derivedFeatures" : [ {
    "stagesApplied" : [ "pivotText_OpSetVectorizer" ],
    "derivedFeatureValue" : "Male",
    "corr" : -0.5185045877245239,
    "mutualInformation" : 0.19652543270839468,
    "contribution" : 0.1763534388489181,
    ….
  }, {
    "stagesApplied" : [ "pivotText_OpSetVectorizer" ],
    "derivedFeatureValue" : "Female",
    "corr" : 0.518504587724524,
    "mutualInformation" : 0.19652543270839468,
    "contribution" : 0.18080355705344647,
    ….
  } ]
}

SLIDE 57

Roadmap for this talk

  • What does it mean to explain your model?
  • Why explain your model?
  • How to explain your model?
  • Complications of feature engineering
  • Global (full model) solutions
  • Local (record level) solutions
  • Interpretability vs accuracy tradeoff

SLIDE 58

Decision questions:
  • Can you use a simple model?
  • Are the raw features fed into the model interpretable?
  • Does the consumer care about how features affect the model, or just feature insights?
  • Does the consumer care about individual predictions?

Local options: Feature Weights / Importance, Feature Impact Model Agnostic, Secondary Model

SLIDE 59

Feature Weight (Local)

SLIDE 60

Predict House Price

[Diagram: example house with feature values 852, 2, 1, 36]

SLIDE 61

Feature Weight (Local)

SLIDE 62

Feature Impact (LOCO)

https://www.oreilly.com/ideas/ideas-on-interpreting-machine-learning

{"age": 17.0, "embarked": "C", "name": "Attalah, Miss. Malake", "pClass": "3", "parch": "0", "sex": "female", "sibSp": "0", "survived": 0.0, "ticket": "2627"}

Score = 0.62. Why? sex = "female" (+0.13), pClass = 3 (-0.05), ...
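Per-feature deltas like the +0.13 / -0.05 above can be produced LOCO-style: score the record, then re-score it with one feature "left out" (here: zeroed), and report the difference as that feature's local impact. The linear toy model below is invented; its weights are chosen only so the output echoes the slide's example numbers:

```scala
// Toy scorer standing in for any trained model (weights are made up).
val weights = Seq(0.13, -0.05, 0.02)
def model(x: Seq[Double]): Double =
  0.5 + weights.zip(x).map { case (w, v) => w * v }.sum

// LOCO contribution of feature j for one record:
// score(record) - score(record with feature j nulled out).
def loco(x: Seq[Double], j: Int): Double =
  model(x) - model(x.updated(j, 0.0))

val record = Seq(1.0, 1.0, 0.0) // e.g. sex=female, pClass=3, third feature absent
val explanation = record.indices.map(j => j -> loco(record, j))
// feature 0 contributes +0.13, feature 1 contributes -0.05, feature 2 nothing
```

With a non-linear model the same loop works unchanged, at the cost of one extra scoring call per feature per record; only the choice of the "left out" baseline value needs care.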

SLIDE 63

Secondary Model (LIME)

https://www.oreilly.com/ideas/ideas-on-interpreting-machine-learning

SLIDE 64

Secondary Model (Correlation)

https://www.oreilly.com/ideas/ideas-on-interpreting-machine-learning

Norm (feature) * Corr
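The Norm(feature) * Corr rule can be sketched as: z-normalize the record's feature value using training-set statistics, then multiply by that feature's global correlation with the label, giving a cheap record-level contribution without any extra model calls. Every number below (correlations, means, standard deviations) is made up for illustration:

```scala
// Invented global correlations of each derived column with the label.
val corr  = Map("sex_Female" -> 0.52, "pClass_3" -> -0.32)
// Invented training-set (mean, stddev) per column, used for z-normalization.
val stats = Map("sex_Female" -> (0.35, 0.48), "pClass_3" -> (0.55, 0.50))

// One record's derived feature values.
val record = Map("sex_Female" -> 1.0, "pClass_3" -> 1.0)

// Local contribution = normalized value * global correlation.
val contributions = record.map { case (name, v) =>
  val (mu, sd) = stats(name)
  name -> ((v - mu) / sd) * corr(name)
}
// e.g. pClass_3: ((1.0 - 0.55) / 0.50) * -0.32 = -0.288
```

Compared with LOCO this needs no re-scoring, but it inherits the usual caveat from the multicollinearity slide: correlated columns share credit, so the per-feature numbers are approximate.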

SLIDE 65

What we do:

  • Use case determines LOCO or correlation
  • Use case determines what level of features we show
SLIDE 66

SLIDE 67