Transmogrification: The Magic of Feature Engineering - Leah McGuire (PowerPoint PPT Presentation)



SLIDE 1

Transmogrification: The Magic of Feature Engineering

Leah McGuire and Mayukh Bhaowal

SLIDE 2

SLIDE 3

SLIDE 4

SLIDE 5

SLIDE 6

ML algorithms take center stage in AI

Raw Data → Feature Engineering (Bottleneck) → Modeling

SLIDE 7

[Table: sparse numeric feature matrix with columns X1–X5 and a label column Y]

Mythical Numeric Matrix

SLIDE 8

Use the data types

SLIDE 9

Numeric:
  • Imputation
  • Track null value
  • Log transformation for large range
  • Scaling - zNormalize
  • Smart binning

Categorical:
  • Imputation
  • Track null value
  • One-hot encoding
  • Dynamic top-K pivot
  • Smart binning
  • LabelCount encoding
  • Category embedding

Text:
  • Tokenization
  • Hash encoding
  • TF-IDF
  • Word2Vec
  • Sentiment analysis
  • Language detection

Temporal:
  • Time difference
  • Circular statistics
  • Time extraction (day, week, month, year)
  • Closeness to major events

Spatial:
  • Geo-encoding
  • Augment with external data, e.g. average income
  • Spatial fraudulent behavior, e.g. impossible travel speed

Automatic Feature Engineering
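The type-driven defaults above can be sketched in plain Scala. This is an illustrative toy, not the TransmogrifAI API: numerics are mean-imputed with a null-tracking flag, and categoricals are one-hot encoded against a fixed vocabulary with an "other" bucket. All type names and defaults here are invented for the sketch.

```scala
// Toy feature types (illustrative, not a real library's types).
sealed trait Feature
case class Numeric(value: Option[Double]) extends Feature
case class Categorical(value: Option[String]) extends Feature

// Encode a feature into numeric columns using type-appropriate defaults:
// numerics -> (imputed value, isNull flag); categoricals -> one-hot + "other".
def encode(f: Feature, vocab: Seq[String], mean: Double): Seq[Double] = f match {
  case Numeric(v) =>
    Seq(v.getOrElse(mean), if (v.isEmpty) 1.0 else 0.0) // value + null tracker
  case Categorical(v) =>
    val hot = vocab.map(c => if (v.contains(c)) 1.0 else 0.0)
    hot :+ (if (v.exists(c => !vocab.contains(c))) 1.0 else 0.0) // "other" bucket
}

// A missing age is mean-imputed and flagged; a known domain is one-hot encoded.
val featureVector =
  encode(Numeric(None), Nil, mean = 35.0) ++
  encode(Categorical(Some("gmail.com")), Seq("gmail.com", "yahoo.com"), mean = 0.0)
// featureVector == Seq(35.0, 1.0, 1.0, 0.0, 0.0)
```

The point of the dispatch-on-type pattern is that the caller never names the transformations, which is what makes the one-line `transmogrify()` call on the next slide possible.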

SLIDE 10

Transmogrification

val featureVector = Seq(age, phone, email, subject, zipCode).transmogrify()

SLIDE 11

Impact on Feature Engineering

Raw: Zipcode, Subject, Phone, Email, Age
Derived: Age [0-15], Age [15-35], Age [>35], Email Is Spammy, Top 10 Email Domain, Country Code, Phone Is Valid, Top TF-IDF Terms, Average Income → Feature Vector

SLIDE 12

SLIDE 13

SLIDE 14

The Black Swan of Perfectly Interpretable Models

Leah McGuire, Mayukh Bhaowal

SLIDE 15

SLIDE 16

Roadmap for this talk

  • What does it mean to explain your model?
  • Why explain your model?
  • How to explain your model?
  • Complications of feature engineering
  • Global (full model) solutions
  • Local (record level) solutions
  • Interpretability vs accuracy tradeoff

SLIDE 17

Roadmap for this talk

  • What does it mean to explain your model?
  • Why explain your model?
  • How to explain your model?
  • Complications of feature engineering
  • Global (full model) solutions
  • Local (record level) solutions
  • Interpretability vs accuracy tradeoff

SLIDE 18

The Question

Why did the machine learning model make the decision that it did?

SLIDE 19

Translation #1

How do I fix this model?

— Data Scientist

SLIDE 20

Translation #2

Do we have our bases covered in case of a regulatory audit?

— Legal Counsel

SLIDE 21

Translation #3

Does Einstein know what I know? How do I use this prediction?

— Non-Technical End User

SLIDE 22

Input → P1(c | f), …, Pk(c | f), …, Pn(c | f) → Σ → Output

SLIDE 23

Model Insights Report

SLIDE 24

Roadmap for this talk

  • What does it mean to explain your model?
  • Why explain your model?
  • How to explain your model?
  • Complications of feature engineering
  • Global (full model) solutions
  • Local (record level) solutions
  • Interpretability vs accuracy tradeoff

SLIDE 25

Debuggability

[Chart: model F1 score]

Top contributing features for surviving the Titanic:

  1. Gender
  2. pClass
  3. Body
SLIDE 26

Trust

"How can you trust a man that wears both a belt and suspenders? Man can't even trust his own pants."

SLIDE 27

[Diagram: 2×2 grid of Machine vs. Human being Right vs. Wrong]

SLIDE 28

Bias

SLIDE 29

Legal

SLIDE 30

SLIDE 31

Black defendants receive higher risk scores

https://www.propublica.org/article/how-we-analyzed-the-compas-recidivism-algorithm

SLIDE 32

Actionable

SLIDE 33

Roadmap for this talk

  • What does it mean to explain your model?
  • Why explain your model?
  • How to explain your model?
  • Complications of feature engineering
  • Global (full model) solutions
  • Local (record level) solutions
  • Interpretability vs accuracy tradeoff

SLIDE 34

It’s complicated

SLIDE 35

Decision questions:
  • Can you use a simple model?
  • Are the raw features fed into the model interpretable?
  • Does the consumer care about how features affect the model, or just feature insights?
  • Does the consumer care about individual predictions?

Resulting options: Feature Weights / Importance (Global), Feature Impact Model Agnostic (Global), Secondary Model (Global), Feature Weights / Importance (Local), Feature Impact Model Agnostic (Local), Secondary Model (Local)

SLIDE 36

Roadmap for this talk

  • What does it mean to explain your model?
  • Why explain your model?
  • How to explain your model?
  • Complications of feature engineering
  • Global (full model) solutions
  • Local (record level) solutions
  • Interpretability vs accuracy tradeoff

SLIDE 37

The best model or the model you can explain?

SLIDE 38

Roadmap for this talk

  • What does it mean to explain your model?
  • Why explain your model?
  • How to explain your model?
  • Complications of feature engineering
  • Global (full model) solutions
  • Local (record level) solutions
  • Interpretability vs accuracy tradeoff

SLIDE 39

[Table: sparse numeric feature matrix with columns X1–X5 and a label column Y]

Where did you get the feature matrix?

SLIDE 40

Feature Engineering

Raw: Zipcode, Subject, Phone, Email, Age
Derived: Age [0-15], Age [15-35], Age [>35], Email Is Spammy, Top 10 Email Domain, Country Code, Phone Is Valid, Top TF-IDF Terms, Average Income → Feature Vector

SLIDE 41

Metadata!!!

https://ontotext.com/knowledgehub/fundamentals/metadata-fundamental/

  • The name of the feature the column was made from
  • The name of the RAW feature(s) the column was made from
  • Everything you did to get the column
  • Any grouping information across columns
  • Description of the value in the column
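The bullets above amount to a per-column metadata record that travels with the feature vector. A minimal sketch in plain Scala; field names and the example values are illustrative, not TransmogrifAI's actual schema:

```scala
// One metadata record per column of the feature matrix (illustrative fields).
case class ColumnMetadata(
  parentFeature: String,           // feature the column was made from
  parentRawFeatures: Seq[String],  // RAW feature(s) it traces back to
  stagesApplied: Seq[String],      // everything done to get the column
  grouping: Option[String],        // grouping information across sibling columns
  description: String              // meaning of the value in the column
)

// Hypothetical record for the "Email Is Spammy" column from the earlier slide.
val emailIsSpammy = ColumnMetadata(
  parentFeature = "emailDomain",
  parentRawFeatures = Seq("email"),
  stagesApplied = Seq("splitDomain", "spamListLookup"),
  grouping = Some("emailIndicators"),
  description = "1.0 if the email domain appears on a spam list"
)
```

With records like this, any weight or importance computed on a derived column can be traced back to the raw feature and the transformations that produced it, which is exactly what the explanation slides below rely on.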

SLIDE 42

Roadmap for this talk

  • What does it mean to explain your model?
  • Why explain your model?
  • How to explain your model?
  • Complications of feature engineering
  • Global (full model) solutions
  • Local (record level) solutions
  • Interpretability vs accuracy tradeoff

SLIDE 43

Interpretability: Global vs Local

SLIDE 44

Decision questions:
  • Can you use a simple model?
  • Are the raw features fed into the model interpretable?
  • Does the consumer care about how features affect the model, or just feature insights?
  • Does the consumer care about individual predictions?

Global options: Feature Weights / Importance, Feature Impact Model Agnostic, Secondary Model

SLIDE 45

Feature Weight / Importance (Global)

SLIDE 46

Predict House Price

SLIDE 47

Predict Titanic Passenger Survival

SLIDE 48

Input → P1(c | f), …, Pk(c | f), …, Pn(c | f) → Σ → Output

SLIDE 49

Feature Impact (Global - the hard way)

[Table: sparse numeric feature matrix with columns X1–X5 and a label column Y]

SLIDE 50

Feature Impact (Global - the hard way)
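A common reading of "the hard way" is drop-column impact: retrain with each feature column removed and record how much a quality metric drops. A self-contained toy sketch; the four-row dataset and the trivial one-column "model" are invented purely to make the loop runnable:

```scala
// Rows of (feature values, label); feature 0 perfectly predicts the label,
// feature 1 is anti-correlated noise for this toy.
val data = Seq(
  (Seq(1.0, 0.3), 1.0),
  (Seq(0.0, 0.7), 0.0),
  (Seq(1.0, 0.1), 1.0),
  (Seq(0.0, 0.9), 0.0)
)

// Toy "training": threshold each kept column at 0.5 and keep the best
// accuracy. A real pipeline would retrain the full model each time.
def fitScore(keep: Seq[Int]): Double =
  if (keep.isEmpty) 0.5
  else keep.map { j =>
    data.count { case (x, y) =>
      (if (x(j) >= 0.5) 1.0 else 0.0) == y
    }.toDouble / data.size
  }.max

// Global impact of feature j = full-model score minus score without j.
val full = fitScore(Seq(0, 1))
val impacts = Seq(0, 1).map(j => j -> (full - fitScore(Seq(0, 1).filterNot(_ == j))))
// impacts == Seq(0 -> 1.0, 1 -> 0.0): dropping feature 0 costs all the accuracy
```

This is "hard" because it needs one full retrain per feature, which is why the cheaper weight/correlation summaries on the surrounding slides are usually preferred at scale.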

SLIDE 51

Issues with Feature Importance / Weight / Impact (Global)

http://resources.esri.com/help/9.3/arcgisengine/java/gp_toolref/spatial_statistics_toolbox/multicollinearity.htm

SLIDE 52

Secondary Model

[Diagram: Input → Prediction, with a Secondary Model producing the Explanation]

SLIDE 53

Secondary Model (Global)

SLIDE 54

Secondary Model (Global)

https://www.statmethods.net/advgraphs/images/corrgram1.png

SLIDE 55

What we do:

  • All the metadata about how you got the feature
  • Correlation
  • Mutual information
  • Feature weight / importance
  • Feature distribution
SLIDE 56

What we do:

{
  "featureName" : "sex",
  "derivedFeatures" : [ {
    "stagesApplied" : [ "pivotText_OpSetVectorizer" ],
    "derivedFeatureValue" : "Male",
    "corr" : -0.5185045877245239,
    "mutualInformation" : 0.19652543270839468,
    "contribution" : 0.1763534388489181,
    ….
  }, {
    "stagesApplied" : [ "pivotText_OpSetVectorizer" ],
    "derivedFeatureValue" : "Female",
    "corr" : 0.518504587724524,
    "mutualInformation" : 0.19652543270839468,
    "contribution" : 0.18080355705344647,
    ….
  } ]
}

SLIDE 57

Roadmap for this talk

  • What does it mean to explain your model?
  • Why explain your model?
  • How to explain your model?
  • Complications of feature engineering
  • Global (full model) solutions
  • Local (record level) solutions
  • Interpretability vs accuracy tradeoff

SLIDE 58

Decision questions:
  • Can you use a simple model?
  • Are the raw features fed into the model interpretable?
  • Does the consumer care about how features affect the model, or just feature insights?
  • Does the consumer care about individual predictions?

Local options: Feature Weights / Importance, Feature Impact Model Agnostic, Secondary Model

SLIDE 59

Feature Weight (Local)

SLIDE 60

Predict House Price

[Diagram: example house with feature values 852, 2, 1, 36]

SLIDE 61

Feature Weight (Local)

SLIDE 62

Feature Impact (LOCO)

https://www.oreilly.com/ideas/ideas-on-interpreting-machine-learning

{"age": 17.0, "embarked": "C", "name": "Attalah, Miss. Malake", "pClass": "3", "parch": "0", "sex": "female", "sibSp": "0", "survived": 0.0, "ticket": "2627"}

Score = 0.62. Why? sex = "female" (+0.13), pClass = 3 (-0.05), ...
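Per-feature deltas like the +0.13 / -0.05 above can be produced LOCO-style: score the record, then re-score it with one feature "left out" (here: zeroed), and report the difference as that feature's local impact. The linear toy model below is invented; its weights are chosen only so the output echoes the slide's example numbers:

```scala
// Toy scorer standing in for any trained model (weights are made up).
val weights = Seq(0.13, -0.05, 0.02)
def model(x: Seq[Double]): Double =
  0.5 + weights.zip(x).map { case (w, v) => w * v }.sum

// LOCO contribution of feature j for one record:
// score(record) - score(record with feature j nulled out).
def loco(x: Seq[Double], j: Int): Double =
  model(x) - model(x.updated(j, 0.0))

val record = Seq(1.0, 1.0, 0.0) // e.g. sex=female, pClass=3, third feature absent
val explanation = record.indices.map(j => j -> loco(record, j))
// feature 0 contributes +0.13, feature 1 contributes -0.05, feature 2 nothing
```

With a non-linear model the same loop works unchanged, at the cost of one extra scoring call per feature per record; only the choice of the "left out" baseline value needs care.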

SLIDE 63

Secondary Model (LIME)

https://www.oreilly.com/ideas/ideas-on-interpreting-machine-learning

SLIDE 64

Secondary Model (Correlation)

https://www.oreilly.com/ideas/ideas-on-interpreting-machine-learning

Norm (feature) * Corr
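The Norm(feature) * Corr rule can be sketched as: z-normalize the record's feature value using training-set statistics, then multiply by that feature's global correlation with the label, giving a cheap record-level contribution without any extra model calls. Every number below (correlations, means, standard deviations) is made up for illustration:

```scala
// Invented global correlations of each derived column with the label.
val corr  = Map("sex_Female" -> 0.52, "pClass_3" -> -0.32)
// Invented training-set (mean, stddev) per column, used for z-normalization.
val stats = Map("sex_Female" -> (0.35, 0.48), "pClass_3" -> (0.55, 0.50))

// One record's derived feature values.
val record = Map("sex_Female" -> 1.0, "pClass_3" -> 1.0)

// Local contribution = normalized value * global correlation.
val contributions = record.map { case (name, v) =>
  val (mu, sd) = stats(name)
  name -> ((v - mu) / sd) * corr(name)
}
// e.g. pClass_3: ((1.0 - 0.55) / 0.50) * -0.32 = -0.288
```

Compared with LOCO this needs no re-scoring, but it inherits the usual caveat from the multicollinearity slide: correlated columns share credit, so the per-feature numbers are approximate.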

SLIDE 65

What we do:

  • Use case determines LOCO or correlation
  • Use case determines what level of features we show
SLIDE 66

SLIDE 67