SLIDE 1

Finding Explanations

Instead of finding structure in a data set, we now focus on methods that find explanations for an unknown dependency within the data.

Given: Dataset D = {(x_i, Y_i) | i = 1, ..., n} with n tuples

x: object description
Y: target attribute
  nominal: classification problem
  numerical: regression problem

Data analysis is
supervised (because we know the desired outcome) and
descriptive (because we care about explanation)

Compendium slides for “Guide to Intelligent Data Analysis”, Springer 2011. © Michael R. Berthold, Christian Borgelt, Frank Höppner, Frank Klawonn and Iris Adä.
SLIDE 2

Bayes Classifiers

Given: Dataset D = {(x_i, Y_i) | i = 1, ..., n} with n tuples

x: object description
Y: nominal target attribute ⇒ classification problem

Bayes classifiers express their model in terms of simple probabilities. They provide a “gold standard” for evaluating other learning algorithms: any other model should perform at least as well as the naïve Bayes classifier.

Suggestion

Before trying to apply more complex models, a quick look at a Bayes classifier can be helpful to get a feeling for realistic accuracy expectations and simple dependencies in the data.

SLIDE 3

Bayes’ theorem

P(h|E) = P(E|h) · P(h) / P(E)

Interpretation

The probability P(h|E) that a hypothesis h is true given that event E has occurred can be derived from:

P(h): the probability of the hypothesis h itself
P(E): the probability of the event E
P(E|h): the conditional probability of the event E given the hypothesis h
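As a sketch, Bayes' theorem is a one-liner; the probabilities below are made up purely for illustration.

```python
def posterior(p_e_given_h, p_h, p_e):
    """Bayes' theorem: P(h|E) = P(E|h) * P(h) / P(E)."""
    return p_e_given_h * p_h / p_e

# hypothetical values: P(E|h) = 0.9, P(h) = 0.2, P(E) = 0.3
p = posterior(0.9, 0.2, 0.3)  # ≈ 0.6
```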

SLIDE 4

Choosing Hypotheses

We want the most probable hypothesis h ∈ H for a given event E.

Maximum a posteriori hypothesis:

h_MAP = argmax_{h ∈ H} P(h|E) = argmax_{h ∈ H} P(E|h) · P(h) / P(E) = argmax_{h ∈ H} P(E|h) · P(h)

Maximum likelihood

If we assume that every hypothesis h ∈ H is equally probable a priori (P(h_i) = P(h_j) for all h_i, h_j ∈ H), we can simplify the equation further and get the maximum likelihood hypothesis:

h_ML = argmax_{h ∈ H} P(E|h)
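The two argmax rules differ only in whether the prior is multiplied in; a minimal sketch over a made-up two-hypothesis space, chosen so that the MAP and ML answers disagree:

```python
# Hypothetical hypothesis space with priors P(h) and likelihoods P(E|h):
priors = {"h1": 0.1, "h2": 0.9}
likelihoods = {"h1": 0.8, "h2": 0.2}

# P(E) is the same for every h, so it can be dropped from the argmax.
h_map = max(priors, key=lambda h: likelihoods[h] * priors[h])  # "h2"
h_ml = max(priors, key=lambda h: likelihoods[h])               # "h1"
```

Here the strong prior on h2 overrides its weaker likelihood, so h_MAP ≠ h_ML.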

SLIDE 5

Bayes classifiers

The probability P(h) can be estimated easily from a given data set D:

P(h) = (no. of data objects from class h) / (no. of data objects)

In principle, the probability P(E|h) could be determined analogously based on the values of the attributes A_1, ..., A_m, i.e. the attribute vector E = (a_1, ..., a_m):

P(E|h) = (no. of data objects from class h with values (a_1, ..., a_m)) / (no. of data objects from class h)

SLIDE 6

Bayes classifiers

Problem

For 10 nominal attributes A_1, ..., A_10, each having three possible values, we would need 3^10 = 59049 data objects to have at least one example per combination. Therefore, the computation is carried out under the (naïve, unrealistic) assumption that the attributes A_1, ..., A_m are independent given the class, i.e.

P(E = (a_1, ..., a_m)|h) = P(a_1|h) · ... · P(a_m|h) = ∏_{a_i ∈ E} P(a_i|h)

Each P(a_i|h) can be computed easily:

P(a_i|h) = (no. of data objects from class h with A_i = a_i) / (no. of data objects from class h)
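Estimating P(a_i|h) is plain counting; a minimal sketch over a made-up nominal dataset (rows of attribute values with the class label in the last position):

```python
def p_attr_given_class(rows, attr_index, value, cls):
    """Estimate P(A_i = a_i | h) as a relative frequency within class h."""
    class_rows = [r for r in rows if r[-1] == cls]
    return sum(1 for r in class_rows if r[attr_index] == value) / len(class_rows)

# made-up rows: (A1, A2, class)
rows = [("a", "x", "c1"), ("a", "y", "c1"), ("b", "x", "c2")]
p = p_attr_given_class(rows, 0, "a", "c1")  # 2/2 = 1.0
```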

SLIDE 7

Naïve Bayes classifier

Given: a data set with only nominal attributes. Based on the values a_1, ..., a_m of the attributes A_1, ..., A_m, a prediction for the value of the class attribute H should be derived:

For each class h ∈ H compute the likelihood L(h|E) under the assumption that the A_1, ..., A_m are independent given the class:

L(h|E) = ∏_{a_i ∈ E} P(a_i|h) · P(h)

Assign E to the class h ∈ H with the highest likelihood:

pred(E) = argmax_{h ∈ H} L(h|E)

This Bayes classifier is called naïve because of the (conditional) independence assumption for the attributes A_1, ..., A_m. Although this assumption is unrealistic in most cases, the classifier often yields good results when not too many attributes are correlated.

SLIDE 8

Example

Given the dataset D:

ID  Height  Weight  Long hair  Sex
 1  m       n       n          m
 2  s       l       y          f
 3  t       h       n          m
 4  s       n       y          f
 5  t       n       y          f
 6  s       l       n          f
 7  s       h       n          m
 8  m       n       n          f
 9  m       l       y          f
10  t       n       n          m

we want to predict the sex (male or female) of a person x with the following attribute values: x = (Height = tall, Weight = low, Long hair = yes)

SLIDE 9

Example

We need to calculate

L(Sex = m|Height = t, Weight = l, Long hair = y)
  = P(Height = t|Sex = m) · P(Weight = l|Sex = m) · P(Long hair = y|Sex = m) · P(Sex = m)

and

L(Sex = f|Height = t, Weight = l, Long hair = y)
  = P(Height = t|Sex = f) · P(Weight = l|Sex = f) · P(Long hair = y|Sex = f) · P(Sex = f).

SLIDE 10

Example

P(Height = t|Sex = m)

(dataset D as on Slide 8)

SLIDE 11

Example

P(Height = t|Sex = m)

(dataset D as on Slide 8)

SLIDE 12

Example

P(Height = t|Sex = m) = 2/4 = 1/2

(dataset D as on Slide 8)

SLIDE 13

Example

P(Weight = l|Sex = m) = 0/4 = 0

(dataset D as on Slide 8)

SLIDE 14

Example

P(Long hair = y|Sex = m) = 0/4 = 0

(dataset D as on Slide 8)

SLIDE 15

Example

P(Sex = m) = 4/10 = 2/5

(dataset D as on Slide 8)

SLIDE 16

Example

L(Sex = m|Height = t, Weight = l, Long hair = y) = 2/4 · 0/4 · 0/4 · 4/10 = 1/2 · 0 · 0 · 2/5 = 0

⇒ the likelihood of person x being male is 0.

(dataset D as on Slide 8)

SLIDE 17

Example

P(Height = t|Sex = f)

(dataset D as on Slide 8)

SLIDE 18

Example

P(Height = t|Sex = f)

(dataset D as on Slide 8)

SLIDE 19

Example

P(Height = t|Sex = f) = 1/6

(dataset D as on Slide 8)

SLIDE 20

Example

P(Weight = l|Sex = f) = 3/6 = 1/2

(dataset D as on Slide 8)

SLIDE 21

Example

P(Long hair = y|Sex = f) = 4/6 = 2/3

(dataset D as on Slide 8)

SLIDE 22

Example

P(Sex = f) = 6/10 = 3/5

(dataset D as on Slide 8)

SLIDE 23

Example

L(Sex = f|Height = t, Weight = l, Long hair = y) = 1/6 · 3/6 · 4/6 · 6/10 = 1/6 · 1/2 · 2/3 · 3/5 = 1/30 > 0

⇒ the likelihood of person x being female is 1/30.

(dataset D as on Slide 8)

SLIDE 24

Example

L(Sex = f|Height = t, Weight = l, Long hair = y) = 1/30
L(Sex = m|Height = t, Weight = l, Long hair = y) = 0

⇒ person x = (Height = tall, Weight = low, Long hair = yes) is classified as female (f).

Notice

The data set D does not contain any object with this combination of values. ⇒ A full Bayes classifier would not be able to classify this object.
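The numbers above can be checked mechanically; a sketch using exact fractions over the dataset D from Slide 8:

```python
from fractions import Fraction

# dataset D from the slides: (Height, Weight, Long hair, Sex)
D = [("m","n","n","m"), ("s","l","y","f"), ("t","h","n","m"),
     ("s","n","y","f"), ("t","n","y","f"), ("s","l","n","f"),
     ("s","h","n","m"), ("m","n","n","f"), ("m","l","y","f"),
     ("t","n","n","m")]

def L(x, sex):
    """Naive Bayes likelihood L(Sex = sex | x) with exact fractions."""
    class_rows = [r for r in D if r[3] == sex]
    l = Fraction(len(class_rows), len(D))    # P(Sex = sex)
    for i, a in enumerate(x):                # P(a_i | Sex = sex)
        l *= Fraction(sum(1 for r in class_rows if r[i] == a), len(class_rows))
    return l

x = ("t", "l", "y")
print(L(x, "f"))  # 1/30
print(L(x, "m"))  # 0
```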

SLIDE 25

More examples

Input (m, n, n): L(m|...) = 1/4 · 2/4 · 4/4 · 4/10 = 1/20, L(f|...) = 2/6 · 3/6 · 2/6 · 6/10 = 1/30 ⇒ class m

The object (m, n, n) is classified as m although the data set contains two such objects, one from class m and one from class f. The main impact comes from the attribute Long hair = n, which has probability 1 in class m but a low probability in class f.

(dataset D as on Slide 8)

SLIDE 26

More examples

Input (t, h, n): L(m|...) = 2/4 · 2/4 · 4/4 · 4/10 = 1/10, L(f|...) = 1/6 · 0/6 · 2/6 · 6/10 = 0 ⇒ class m

Input (t, h, y): L(m|...) = 2/4 · 2/4 · 0/4 · 4/10 = 0, L(f|...) = 1/6 · 0/6 · 4/6 · 6/10 = 0 ⇒ class ?

The object (t, h, y) cannot be classified since the likelihood is zero for both classes.

(dataset D as on Slide 8)

SLIDE 27

Laplace correction

If a single likelihood is zero, then the overall likelihood is zero automatically, even then when the other likelihoods are high. Input L(m| . . .) L(f| . . .) Class (t, h, y)

2 4 · 2 4 · 0 4 · 4 10 = 0 1 6 · 0 6 · 4 6 · 6 10 = 0

? A solution is the usage of the Laplace correction γ: P(y) = ny n ⇒ ˆ P(y) = γ + ny γ · |dom(Y )| + n P(x|y) = nhx ny ⇒ ˆ P(x|y) = γ + nyx γ · |dom(X)| + ny n no. of data ny no. of data from class y nyx no. of data from class y with value x for attribute X dom(X) no. of distinct values in X

SLIDE 28

Laplace correction

Example

Laplace correction for P(Height = . . . |Sex = m) with γ = 1:

ˆP(s|m) = (γ + n_{m,s}) / (γ · |dom(Height)| + n_m) = (1 + 1) / (1 · 3 + 4) = 2/7

Height  #  # + γ  P    ˆP
s       1  2      1/4  2/7
m       1  2      1/4  2/7
t       2  3      2/4  3/7

Notice

γ = 0: maximum likelihood estimation
Common choices: γ = 1 or γ = 1/2
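The corrected estimate is a one-line formula; a sketch (the function name is mine) reproducing the table above with exact fractions:

```python
from fractions import Fraction

def laplace_estimate(n_yx, n_y, dom_size, gamma=1):
    """Laplace-corrected estimate (gamma + n_yx) / (gamma * |dom(X)| + n_y)."""
    return Fraction(gamma + n_yx, gamma * dom_size + n_y)

# Height counts among the 4 males of the example: s:1, m:1, t:2; |dom(Height)| = 3
assert laplace_estimate(1, 4, 3) == Fraction(2, 7)  # s (and likewise m)
assert laplace_estimate(2, 4, 3) == Fraction(3, 7)  # t
```

Note that the corrected probabilities still sum to 1 over the domain, and no estimate is exactly zero anymore.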

SLIDE 29

Naïve Bayes classifier: Implementation

The frequency counting should be carried out once, when the naïve Bayes classifier is constructed. The probability distributions for the single attributes should be stored in a table. When the naïve Bayes classifier is applied to new data, only the corresponding values in the table need to be multiplied.

SLIDE 30

Treatment of missing values

During learning: The missing values are simply not counted for the frequencies of the corresponding attribute. During classification: Only the probabilities (likelihoods) of those attributes are multiplied for which a value is available.

SLIDE 31

Numerical attributes

Assume a normal distribution for a numerical attribute X:

f(x | y) = 1 / (√(2π) · σ_X|y) · exp( −(x − µ_X|y)² / (2σ²_X|y) )

Estimation of the mean value:

ˆµ_X|y = (1/n_y) · Σ_{i=1..n} τ(y_i = y) · x_i[X]

Estimation of the variance:

ˆσ²_X|y = (1/n′_y) · Σ_{i=1..n} τ(y_i = y) · (x_i[X] − ˆµ_X|y)²

n′_y = n_y: maximum likelihood estimation
n′_y = n_y − 1: unbiased estimation
τ(y_i = y) = 1 if y_i = y, else 0
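The estimators above can be sketched directly; for one class, the indicator τ just selects that class's values, so the code takes the already-filtered values as input (function names are mine).

```python
import math

def gaussian_mle(values, unbiased=True):
    """Mean and variance estimates for one numerical attribute, one class."""
    n = len(values)
    mu = sum(values) / n
    var = sum((v - mu) ** 2 for v in values) / ((n - 1) if unbiased else n)
    return mu, var

def normal_density(x, mu, var):
    """f(x|y) for the fitted class-conditional normal distribution."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

mu, var = gaussian_mle([1.0, 2.0, 3.0])  # mu = 2.0, var = 1.0 (unbiased)
```

In a naïve Bayes classifier these densities replace the P(a_i|h) factors for numerical attributes.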

SLIDE 32

Example

100 data points, 2 classes
Small squares: mean values
Inner ellipses: one standard deviation
Outer ellipses: two standard deviations
Classes overlap: classification is not perfect

(Figure: naïve Bayes classifier)

SLIDE 33

Naïve Bayes classifier: Iris data

150 data points, 3 classes: Iris setosa (red), Iris versicolor (green), Iris virginica (blue)
Shown: 2 out of 4 attributes (sepal length, sepal width, petal length, petal width): petal length (horizontal) and petal width (vertical)
6 misclassifications on the training data (with all 4 attributes)

(Figure: naïve Bayes classifier)

SLIDE 34

Example

20 data points, 2 classes
Small squares: mean values
Inner ellipses: one standard deviation
Outer ellipses: two standard deviations
Attributes are not conditionally independent given the class

(Figure: naïve Bayes classifier)

SLIDE 35

Full Bayes classifiers

Restricted to metric/numeric attributes (only the class is nominal/symbolic).

Simplifying assumption: each class can be described by a multivariate normal distribution

f(x_M | y) = 1 / √((2π)^m · |Σ_XM|y|) · exp( −(1/2) · (x_M − µ_XM|y)ᵀ · Σ⁻¹_XM|y · (x_M − µ_XM|y) )

X_M: set of metric attributes
x_M: attribute vector
µ_XM|y: mean value vector for class y
Σ_XM|y: covariance matrix for class y

Intuitively

Each class has a bell-shaped probability density.

SLIDE 36

Full Bayes classifiers

Estimation of probabilities:

Estimation of the (class-conditional) mean value vector:

ˆµ_XM|y = (1/n_y) · Σ_{i=1..n} τ(y_i = y) · x_i[X_M]

x_i[X_M]: attribute vector of data point i restricted to the metric attributes X_M

Estimation of the (class-conditional) covariance matrix:

ˆΣ_XM|y = (1/n′_y) · Σ_{i=1..n} τ(y_i = y) · (x_i[X_M] − ˆµ_XM|y)(x_i[X_M] − ˆµ_XM|y)ᵀ

n′_y = n_y: maximum likelihood estimation
n′_y = n_y − 1: unbiased estimation
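For one class, these estimators reduce to a mean vector and an outer-product sum; a sketch over plain lists of points (the function name is mine, the example points are made up):

```python
def full_bayes_estimates(points, unbiased=True):
    """Class-conditional mean vector and covariance matrix for one class."""
    n, m = len(points), len(points[0])
    mu = [sum(p[j] for p in points) / n for j in range(m)]
    denom = (n - 1) if unbiased else n
    cov = [[sum((p[j] - mu[j]) * (p[k] - mu[k]) for p in points) / denom
            for k in range(m)] for j in range(m)]
    return mu, cov

mu, cov = full_bayes_estimates([(0.0, 0.0), (2.0, 2.0)])
# mu = [1.0, 1.0]; cov = [[2.0, 2.0], [2.0, 2.0]] (perfectly correlated pair)
```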

SLIDE 37

Naïve vs. full Bayes classifiers

(Figures: naïve Bayes classifier, full Bayes classifier)

Notice

Naïve Bayes classifiers for numerical data are equivalent to full Bayes classifiers with diagonal covariance matrices.

SLIDE 38

Full Bayes classifier: Iris data

150 data points, 3 classes: Iris setosa (red), Iris versicolor (green), Iris virginica (blue)
Shown: 2 out of 4 attributes (sepal length, sepal width, petal length, petal width): petal length (horizontal) and petal width (vertical)
2 misclassifications on the training data (with all 4 attributes)

(Figure: full Bayes classifier)

SLIDE 39

Summary

Pros:

Gold standard for comparison with other classifiers
High classification accuracy in many applications
The classifier can easily be adapted to new training objects
Integration of domain knowledge

Cons:

The conditional probabilities may not be available
The independence assumptions might not hold for the data set
