

SLIDE 1

Learning Terminological Naïve Bayesian Classifiers Under Different Assumptions on Missing Knowledge

Pasquale Minervini, Claudia d’Amato, Nicola Fanizzi

Department of Computer Science, University of Bari

URSW 2011 ⋄ Bonn, October 23, 2011

SLIDE 2

Introduction & Motivation · Background · Learning a Terminological Naïve Bayesian Network · Classifying individuals with a TBN · Conclusions and Future Works

Contents

1. Introduction & Motivation
2. Background
3. Learning a Terminological Naïve Bayesian Network
4. Classifying individuals with a TBN
5. Conclusions and Future Works

  • C. d’Amato

Learning Terminological Naïve Bayesian Classifiers

SLIDE 3


Introduction & Motivations

• Uncertainty is inherently present in real-world knowledge
• In the SW context, difficulties arise when modeling real-world domains using only purely logical formalisms
• Several approaches for coping with uncertain knowledge have been proposed (probabilistic, fuzzy, ...)

• usually, probabilistic information is assumed to be available
• the CWA is adopted

⇓ Exploiting an already populated ontology, a method capturing probabilistic information could be of help

the OWA has to be taken into account

SLIDE 4


Paper Contributions

Proposal of a Terminological naïve Bayesian classifier for predicting class-membership probabilistically:
• it is a naïve Bayesian network modeling the conditional dependencies between a learned set of Description Logic (complex) concepts and a target concept
• it deals with the incomplete knowledge due to the OWA by considering different ignorance models:

• Missing Completely At Random
• Missing At Random
• Informatively Missing

SLIDE 5


Knowledge Base Representation

Assumption: resources, concepts and relationships are defined in terms of a representation that can be mapped to some DL language (with the standard model-theoretic semantics):
• K = ⟨T, A⟩
• the T-box T is a set of definitions
• the A-box A contains extensional assertions on concepts and roles, e.g. C(a) and R(a, b)
• the set of the individuals (resources) occurring in A will be denoted Ind(A)

SLIDE 6


Basics of Bayesian Networks...

• A Bayesian network (BN) is a DAG G representing the conditional dependencies in a set of random variables
• Each vertex in G corresponds to a random variable Xi
• Each edge in G indicates a direct influence relation between the two connected random variables
• A set of conditional probability distributions θG is associated with G, one CPD per vertex
• Given its parents, each vertex Xi in G is conditionally independent of any subset S ⊆ Nd(Xi) of vertices that are not descendants of Xi

SLIDE 7


...Basics of Bayesian Networks

The joint probability distribution Pr(X1, . . . , Xn) over a set of random variables {X1, . . . , Xn} is computed as:

Pr(X1, . . . , Xn) = ∏i=1..n Pr(Xi | parents(Xi))

• Given a BN, it is possible to evaluate inference queries by marginalization
• To decrease the inference complexity, the naïve Bayes network is often considered:

it is assumed that the presence (or absence) of a particular feature (random variable) of a class is unrelated to the presence (or absence) of any other feature, given the class variable (random variable)
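The factorization and the marginalization-based inference above can be sketched concretely. A minimal naïve Bayes network with one class variable and two boolean features; all probability values are invented for illustration:

```python
from itertools import product

# Toy naive Bayes network: class variable C, two boolean features F1, F2.
# All probability values below are invented for illustration.
p_c = {True: 0.3, False: 0.7}            # Pr(C)
p_f = [{True: 0.8, False: 0.1},          # Pr(F1 = true | C = c)
       {True: 0.6, False: 0.4}]          # Pr(F2 = true | C = c)

def joint(c, fs):
    """Pr(C = c, F1 = f1, ..., Fn = fn) = Pr(c) * prod_i Pr(fi | parents(fi))."""
    pr = p_c[c]
    for pf, f in zip(p_f, fs):
        pr *= pf[c] if f else 1.0 - pf[c]
    return pr

def posterior(fs):
    """Inference by marginalization: Pr(C = true | f1, ..., fn)."""
    num = joint(True, fs)
    return num / (num + joint(False, fs))
```

As a sanity check, the joint sums to 1 over all assignments; with these numbers, Pr(C = true | F1 = true, F2 = true) ≈ 0.84.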

SLIDE 8


Terminological Na¨ ıve Bayesian Network: Definition

A Terminological Bayesian Network (TBN) NK, w.r.t. a DL KB K, is defined as a pair ⟨G, ΘG⟩, where:
• G = ⟨V, E⟩ is a directed acyclic graph, in which:
  • V = {F1, . . . , Fn, C} is a set of vertices, each Fi representing a (possibly complex) DL concept defined over K, and C representing a target concept
  • E ⊆ V × V is a set of edges, modeling the dependence relations between the elements of V
• ΘG is a set of conditional probability distributions (CPDs), one for each V ∈ V, representing the conditional probability of the feature concept given its parents in the graph

In the case of a Terminological Naïve Bayesian Network, E = {⟨C, Fi⟩ | i ∈ {1, . . . , n}}, namely for all i, j ∈ {1, . . . , n} with i ≠ j, Fi is independent of Fj given the value of the target concept.
SLIDE 9


Terminological Na¨ ıve Bayesian Network: Example

Given:
• a set of DL feature concepts F = {Female, HasChild := ∃hasChild.Person} (variable names are used instead of complex feature concepts)
• a target concept Father
the Terminological Naïve Bayesian Network is:

Network structure: Father → Female, Father → HasChild, with CPDs Pr(Female | Father), Pr(Female | ¬Father), Pr(HasChild | Father), Pr(HasChild | ¬Father).
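A minimal sketch of this example network; the slide only names the distributions, so the numeric CPD values below are invented (with Pr(Female | Father) = 0, assuming fathers are male):

```python
# CPDs of the naive TBN of the Father example; numeric values are invented.
p_father = 0.3                               # prior Pr(Father)
cpd = {"Female":   {True: 0.0, False: 0.5},  # Pr(Female | Father), Pr(Female | ¬Father)
       "HasChild": {True: 1.0, False: 0.2}}  # Pr(HasChild | Father), Pr(HasChild | ¬Father)

def likelihood(father, evidence):
    """Product of Pr(feature | Father = father) over the observed features."""
    pr = 1.0
    for feat, val in evidence.items():
        p = cpd[feat][father]
        pr *= p if val else 1.0 - p
    return pr

def pr_father(evidence):
    """Pr(Father | evidence); evidence maps observed feature names to values."""
    num = p_father * likelihood(True, evidence)
    return num / (num + (1.0 - p_father) * likelihood(False, evidence))
```

For instance, an individual known to be ¬Female and to have a child gets posterior 0.3/0.37 ≈ 0.81 under these invented numbers.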

SLIDE 10


Learning a TBN: Problem Definition

Given:
• a target concept C
• a DL KB K = ⟨T, A⟩
• the sets of positive, negative and neutral examples for C, denoted Ind+C(A), Ind−C(A) and Ind0C(A), so that:
  ∀a ∈ Ind+C(A) : K ⊨ C(a)
  ∀a ∈ Ind−C(A) : K ⊨ ¬C(a)
  ∀a ∈ Ind0C(A) : K ⊭ C(a) ∧ K ⊭ ¬C(a)
• an ignorance model
• a scoring function score for a TBN NK w.r.t. IndC(A)

Find: a network N∗K maximizing the scoring function:

N∗K ← arg maxNK score(NK, IndC(A))

SLIDE 11


TBN: the Learning Algorithm...

function learn(K, IndC(A))
  {The TBN is initialized as containing only the target concept node}
  N∗K ← ⟨G, ΘG⟩, with G = ⟨V ← {C}, E ← ∅⟩; NK ← ∅
  repeat
    NK ← N∗K
    {A new network is created, having one more node and different parameters than the previous one}
    Network = ⟨c′, N′K, s′⟩ ← extend(NK, IndC(A))
    N∗K ← N′K
    {Possible stopping conditions: a) improvement in score below a threshold; b) reaching a maximum number of nodes}
  until stopping criterion on Network
  return NK
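The greedy outer loop can be sketched in Python. Here `extend` is a stub with a fixed, hypothetical candidate pool and invented scores, just to show the control flow and the improvement-threshold stopping condition:

```python
# Greedy outer loop of the structure search; pool and scores are invented.
POOL = ["Female", "HasChild", "Married"]     # hypothetical feature concepts
SCORES = {1: -10.0, 2: -7.0, 3: -6.9}        # invented score per network size

def extend(network):
    """Stub: add the next concept from POOL and report its (fake) score."""
    feats = network["features"] + [POOL[len(network["features"])]]
    return feats[-1], {"features": feats}, SCORES[len(feats)]

def learn(max_nodes=3, min_gain=0.5):
    best, score = {"features": []}, float("-inf")
    while len(best["features"]) < max_nodes:
        _, candidate, s = extend(best)
        if s - score < min_gain:             # stopping: improvement below threshold
            break
        best, score = candidate, s
    return best
```

With these invented scores the third feature improves the score by only 0.1, so the default threshold of 0.5 stops the search at two features.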

SLIDE 12


...TBN: the Learning Algorithm

function extend(NK, IndC(A))
  Concept ← Start; Best ← ∅
  repeat
    Concepts ← ∅
    for c′ ∈ {c′ ∈ ρcl↓(Concept) | |c′| ≤ min(|c| + d, maxLen)} do
      V′ ← V ∪ {c′}
      N′K ← optimalNetwork(V′, IndC(A))
      s′ ← score(N′K, IndC(A))
      Concepts ← Concepts ∪ {⟨c′, N′K, s′⟩}
    end for
    Best ← arg max⟨c′,N′K,s′⟩∈Concepts∪{Best} s′
    Concept ← c : ⟨c, NK, s⟩ = Best
    {Possible stopping conditions: a) exceeding a maximum number of iterations; b) exceeding a maximum number of refinement steps}
  until stopping criterion on Best
  return Best
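The hill-climbing over refinements can be sketched as follows; `refine` and `score` are invented stand-ins for the downward refinement operator ρ↓ and the network score (the score arbitrarily rewards candidates containing a hypothetical `Female` conjunct and penalizes length):

```python
# Hill-climbing over a toy refinement operator; refine/score are invented
# stand-ins for the downward refinement operator and the real network score.
def refine(concept):
    """Toy specialization: conjoin one more atomic concept not already present."""
    atoms = ["Female", "Male", "∃hasChild.⊤"]
    return [f"{concept} ⊓ {a}" for a in atoms if a not in concept]

def score(concept):
    # reward the (arbitrary) target conjunct, penalize long concepts
    return concept.count("Female") - 0.01 * len(concept)

def extend(start="⊤", max_steps=5):
    best = (start, score(start))
    concept = start
    for _ in range(max_steps):
        candidates = [(c, score(c)) for c in refine(concept)]
        top = max(candidates + [best], key=lambda cs: cs[1])
        if top == best:                      # no refinement improves the score
            break
        best = top
        concept = best[0]
    return best
```

Starting from ⊤, the first refinement step picks `⊤ ⊓ Female`, and since no further specialization improves this toy score, the search stops there.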

SLIDE 13


Learning a Na¨ ıve TBN: Example

Figure: the search in five stages — the initial network (target Father only); searching for the first feature among candidate concepts (e.g. Female, Male, Person, Mammal, ⊤, ∃hasChild.⊤, ∃hasParent.⊤, ∃hasSibling.⊤, ∃married.⊤); adding the first feature ∃hasChild.Person to the network; searching for the second feature (e.g. ∃hasParent.Person, Female); adding the second feature Female.

SLIDE 14


The ignorance models

To learn the TBN, different assumptions (ignorance models) on the nature of the missing information are considered, given an ideal KB K∗ having additional knowledge:

• MCAR (Missing Completely At Random) – the probability that a ∈ C^I is missing is independent of any kind of (additional) knowledge:
  Pr(K ⊭ C(a) ∧ K ⊭ ¬C(a) | K∗) = Pr(K ⊭ C(a) ∧ K ⊭ ¬C(a))
• MAR (Missing At Random) – the probability that a ∈ C^I is missing depends only on K and does not depend on additional knowledge:
  Pr(K ⊭ C(a) ∧ K ⊭ ¬C(a) | K∗) = Pr(K ⊭ C(a) ∧ K ⊭ ¬C(a) | K)
• NMAR (Not Missing At Random, or IM, Informatively Missing) – the probability that a ∈ C^I is missing could differ when additional knowledge is available:
  Pr(K ⊭ C(a) ∧ K ⊭ ¬C(a) | K∗) ≠ Pr(K ⊭ C(a) ∧ K ⊭ ¬C(a) | K)
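The three mechanisms can be simulated on a toy population to see how they differ; the population, the missingness probabilities and the `has_child` co-feature below are all invented for illustration:

```python
import random

random.seed(0)
# Toy population of (is_father, has_child) individuals; the Father-membership
# assertion may be missing from the ABox under three invented mechanisms.
population = [(True, True)] * 30 + [(False, True)] * 20 + [(False, False)] * 50

def observe(mechanism):
    """Return each Father-membership value, or None when it is missing."""
    out = []
    for father, has_child in population:
        if mechanism == "MCAR":     # missing independently of everything
            missing = random.random() < 0.3
        elif mechanism == "MAR":    # missing depending on observed knowledge only
            missing = has_child and random.random() < 0.5
        else:                       # NMAR/IM: missing depending on the value itself
            missing = father and random.random() < 0.8
        out.append(None if missing else father)
    return out
```

Under the NMAR mechanism the assertions that go missing are exactly those of fathers, so the observed frequency of Father is biased downward; this is why frequency-based estimation stops being adequate there.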

SLIDE 15


TBN under MCAR assumption

• Only positive and negative examples are considered
• Parameters are estimated from the frequency distribution
• score is computed as the log-likelihood on training data:

L(NK | IndC(A)) = Σa∈Ind+C(A) log Pr(C(a) | NK) + Σa∈Ind−C(A) log Pr(¬C(a) | NK)
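The MCAR estimation and scoring can be sketched with invented training data over two boolean feature concepts (say Female and HasChild): parameters come from relative frequencies, and the score is the log-likelihood over positives and negatives only:

```python
import math

# Invented positive/negative examples as (Female, HasChild) feature tuples.
pos = [(False, True)] * 8 + [(False, False)] * 2   # asserted instances of C
neg = [(True, False)] * 5 + [(False, False)] * 3 + [(True, True)] * 2

# Parameters by relative frequency (neutral examples play no role under MCAR).
p_c = len(pos) / (len(pos) + len(neg))
p_f_pos = [sum(e[i] for e in pos) / len(pos) for i in range(2)]  # Pr(Fi | C)
p_f_neg = [sum(e[i] for e in neg) / len(neg) for i in range(2)]  # Pr(Fi | ¬C)

def pr_c(e, c):
    """Pr(C = c | e) under the naive Bayes factorization."""
    def lik(params, prior):
        pr = prior
        for p, f in zip(params, e):
            pr *= p if f else 1.0 - p
        return pr
    a, b = lik(p_f_pos, p_c), lik(p_f_neg, 1.0 - p_c)
    return (a if c else b) / (a + b)

# Log-likelihood score over the positive and negative examples.
score = (sum(math.log(pr_c(e, True)) for e in pos) +
         sum(math.log(pr_c(e, False)) for e in neg))
```

A perfect classifier would give a score of 0; any residual uncertainty on the training examples makes it negative.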

SLIDE 16


TBN under MAR assumption

• Positive, negative and neutral examples are considered
• The EM algorithm is adopted for parameter estimation
• score is computed as the log-likelihood on training data, considering also the neutral examples:

L(NK | IndC(A)) = Σa∈Ind0C(A) ΣC′∈{C,¬C} log [Pr(C′(a) | NK) Pr(C′ | NK)] + Σa∈Ind+C(A) log Pr(C(a) | NK) + Σa∈Ind−C(A) log Pr(¬C(a) | NK)
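An EM sketch for the MAR case with a single boolean feature concept: labelled examples are counted directly, while neutral ones contribute fractionally through the current posterior. All data and starting parameter values below are invented:

```python
# EM for naive Bayes parameters with neutral (unlabelled) examples, MAR case.
pos = [True] * 6                 # feature value of each positive example
neg = [False] * 4                # feature value of each negative example
neutral = [True, True, False]    # feature values of the neutral examples

p_c, p_f1, p_f0 = 0.5, 0.7, 0.3  # Pr(C), Pr(F | C), Pr(F | ¬C): invented start

def post(f, p_c, p_f1, p_f0):
    """Current posterior Pr(C | F = f)."""
    a = p_c * (p_f1 if f else 1 - p_f1)
    b = (1 - p_c) * (p_f0 if f else 1 - p_f0)
    return a / (a + b)

for _ in range(50):
    w = [post(f, p_c, p_f1, p_f0) for f in neutral]           # E-step
    n_pos = len(pos) + sum(w)                                 # expected counts
    n_neg = len(neg) + sum(1 - wi for wi in w)
    p_c = n_pos / (n_pos + n_neg)                             # M-step
    p_f1 = (sum(pos) + sum(wi for wi, f in zip(w, neutral) if f)) / n_pos
    p_f0 = (sum(neg) + sum(1 - wi for wi, f in zip(w, neutral) if f)) / n_neg
```

On this toy data EM ends up assigning the two F = true neutrals to C and the F = false one to ¬C, so Pr(C) converges to 8/13.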

SLIDE 17


TBN under NMAR assumption

• Positive and negative examples are considered
• For the neutral examples, all the possible fillings are considered
• Robust Bayesian Estimation (RBE) is adopted to learn the conditional probability distributions: probability intervals are determined instead of single probability values
• score: as for MCAR, considering the mean value of the probability intervals
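The interval idea can be sketched for a single parameter: since a neutral example may be filled either way, the estimate is bounded by the two extreme completions. This is a crude version of the idea behind RBE (the actual RBE estimator also accounts for priors); the counts are invented:

```python
# Bounds on a parameter when neutral examples may be filled either way.
def interval(n_pos, n_neg, n_neutral):
    """Lower/upper bound on Pr(C) over all completions of the neutral examples."""
    total = n_pos + n_neg + n_neutral
    lower = n_pos / total                  # all neutrals resolved as negatives
    upper = (n_pos + n_neutral) / total    # all neutrals resolved as positives
    return lower, upper

lo, hi = interval(6, 4, 3)                 # invented counts
mid = (lo + hi) / 2                        # midpoint, as used for the score
```

With 6 positives, 4 negatives and 3 neutrals, Pr(C) is bounded by [6/13, 9/13], and the midpoint of that interval feeds the MCAR-style score.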

SLIDE 18


Classifying individuals with a TBN: Example

Given:
• the feature concepts F = {Female, HasChild}
• the target concept Father
• the naïve TBN of the previous example
• the DL KB K
• an individual a s.t. K ⊨ HasChild(a), while the membership of a to Female is not known
The probability that a is an instance of Father is given by:

Pr(Father(a)) = Pr(Father) Pr(HasChild | Father) / ΣFather′∈{Father,¬Father} Pr(Father′) Pr(HasChild | Father′)
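This posterior can be computed directly; the CPD values below are invented. Note that the unobserved feature (Female) simply drops out of the naïve Bayes computation:

```python
# Pr(Father(a)) when only HasChild(a) is known; CPD values are invented.
p_father = 0.3
p_haschild = {True: 0.9, False: 0.2}   # Pr(HasChild | Father), Pr(HasChild | ¬Father)

def pr_father_given_haschild():
    num = p_father * p_haschild[True]
    den = num + (1 - p_father) * p_haschild[False]
    return num / den
```

With these numbers the posterior is 0.27/0.41 ≈ 0.66.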

SLIDE 19


Classifying individuals using RBE: Example

A naïve TBN using Robust Bayesian Estimation for inferring posterior probability intervals under the NMAR assumption is such that the conditional probability tables contain probability intervals (defined by a lower and an upper bound) instead of single probability values.

The interval-valued CPDs (writing PrL and PrU for the lower and upper bound):
[PrL(Female | Father), PrU(Female | Father)], [PrL(Female | ¬Father), PrU(Female | ¬Father)], [PrL(HasChild | Father), PrU(HasChild | Father)], [PrL(HasChild | ¬Father), PrU(HasChild | ¬Father)]

Network structure: Father → Female, Father → HasChild

Inference on a being an instance of Father, given that K ⊨ HasChild(a), yields the interval [PrL(Father | HasChild), PrU(Father | HasChild)], where (abbreviating Fa = Father, HC = HasChild):

PrL(Fa(a)) = PrL(Fa | HC) = PrL(HC | Fa) PrL(Fa) / (PrL(HC | Fa) PrL(Fa) + PrU(HC | ¬Fa) PrU(¬Fa))
PrU(Fa(a)) = PrU(Fa | HC) = PrU(HC | Fa) PrU(Fa) / (PrU(HC | Fa) PrU(Fa) + PrL(HC | ¬Fa) PrL(¬Fa))
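The posterior interval can be sketched numerically; all CPD bounds below are invented. The lower (resp. upper) posterior pairs the pessimistic numerator with the optimistic competing term in the denominator:

```python
# Posterior interval for Father given HasChild; all interval bounds invented.
p_fa = (0.25, 0.35)       # [lower, upper] bounds on Pr(Father)
p_hc_fa = (0.85, 0.95)    # bounds on Pr(HasChild | Father)
p_hc_nfa = (0.15, 0.25)   # bounds on Pr(HasChild | ¬Father)

def posterior_interval():
    lo = (p_hc_fa[0] * p_fa[0]) / (
        p_hc_fa[0] * p_fa[0] + p_hc_nfa[1] * (1 - p_fa[0]))
    hi = (p_hc_fa[1] * p_fa[1]) / (
        p_hc_fa[1] * p_fa[1] + p_hc_nfa[0] * (1 - p_fa[1]))
    return lo, hi
```

With these invented bounds the classifier reports Pr(Father | HasChild) ∈ [0.53, 0.77] rather than a single point estimate.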

SLIDE 20


Conclusions & Future Work

Conclusions:
• Proposed a ML method, based on the naïve Bayes assumption, for estimating the probability that a generic individual belongs to a certain target concept, given its membership relation to an induced set of (complex) DL concepts
• an ignorance model for handling incomplete knowledge
Future works:
• experimenting with the method
• finding optimizations of the proposed method

SLIDE 21

The End

That’s all! Questions?