
An Introduction To Rough Sets

Jan Komorowski

The Linnaeus Centre for Bioinformatics, Uppsala University and The Swedish University of Agricultural Sciences

Foundations

  • Boolean Reasoning - George Boole, 1847, Brown 1990
  • Rough Sets - Zdzislaw Pawlak, 1981

The Rough Set philosophy:

  • ability to discern and classify

Boolean reasoning provides:

  • synthesis of minimal, approximate definitions of concepts


Knowledge Representation

Imperfect data

– incomplete

  • missing values
  • missing attributes
  • missing attribute values

– inconsistent
– noisy

Data tables

– information systems
– decision tables
– distributed data tables

Size of the data

      Age     Sex     LEMS
x1    16-30   male    50
x2    16-30   male    –
x3    31-45   male    1-25
x4    31-45   male    1-25
x5    46-60   female  26-49
x6    16-30   female  26-49
x7    46-60   female  26-49

Information system A = (U, A)

universe U = {x1, x2, …, xn}
attributes A = {a1, a2, …, ap}
a : U → Va for each a ∈ A, where Va is the value set of a


Indiscernibility

IND_A(B) = {(x, x′) ∈ U² | ∀a ∈ B, a(x) = a(x′)}, for a subset of attributes B ⊆ A

Example

IND_A({Age}) = {{x1, x2, x6}, {x3, x4}, {x5, x7}}
IND_A({LEMS}) = {{x1}, {x2}, {x3, x4}, {x5, x6, x7}}
IND_A({Age, LEMS}) = {{x1}, {x2}, {x3, x4}, {x5, x7}, {x6}}

Nota bene

Objects x3 and x4, as well as x5 and x7, are (pair-wise) indiscernible.
IND_A({Age, LEMS, Sex}) = IND_A({Age, LEMS})

(Age/Sex/LEMS table as above.)
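The indiscernibility partition can be computed directly from the table. Below is a minimal Python sketch; the function name `ind_classes` and the dictionary encoding of the table are illustrative, not part of the original material, and x2's LEMS value, which is not given in the table above, is represented as None and simply treated as one more attribute value.

```python
from collections import defaultdict

# Attribute-value table transcribed from the slide above.
walk_objects = {
    "x1": {"Age": "16-30", "Sex": "male",   "LEMS": "50"},
    "x2": {"Age": "16-30", "Sex": "male",   "LEMS": None},
    "x3": {"Age": "31-45", "Sex": "male",   "LEMS": "1-25"},
    "x4": {"Age": "31-45", "Sex": "male",   "LEMS": "1-25"},
    "x5": {"Age": "46-60", "Sex": "female", "LEMS": "26-49"},
    "x6": {"Age": "16-30", "Sex": "female", "LEMS": "26-49"},
    "x7": {"Age": "46-60", "Sex": "female", "LEMS": "26-49"},
}

def ind_classes(objects, B):
    """Partition U into the equivalence classes of IND(B)."""
    classes = defaultdict(list)
    for x, row in objects.items():
        classes[tuple(row[a] for a in B)].append(x)   # equal keys <=> B-indiscernible
    return list(classes.values())

print(ind_classes(walk_objects, ["Age"]))            # [[x1, x2, x6], [x3, x4], [x5, x7]]
print(ind_classes(walk_objects, ["Age", "LEMS"]))    # [[x1], [x2], [x3, x4], [x5, x7], [x6]]
```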


Decision system

A=(U, A∪{d})

A : decision system

      Age     LEMS    Walk
x1    16-30   50      Yes
x2    16-30   –       No
x3    31-45   1-25    No
x4    31-45   1-25    Yes
x5    46-60   26-49   No
x6    16-30   26-49   Yes
x7    46-60   26-49   No


Decision system

A=(U, A∪{d})

Objects x3 and x4, as well as x5 and x7, are (pairwise) indiscernible. NB: x3 and x4 have different decisions (outcomes).

(Walk decision table as above.)


Decision system

Decision rules

“if Age is 16-30 and LEMS is 50 then Walk is Yes”

(Walk decision table as above.)


Set approximation

A=(U, A), B⊆ A

B-lower approximation:  B̲X = {x | [x]_B ⊆ X}
B-upper approximation:  B̄X = {x | [x]_B ∩ X ≠ ∅}
B-boundary region:      BN_B(X) = B̄X − B̲X

X is rough (vague) if BN_B(X) ≠ ∅; otherwise X is crisp.

Accuracy of approximation:  α_B(X) = |B̲X| / |B̄X|, with 0 ≤ α_B(X) ≤ 1;
X is crisp if α_B(X) = 1.

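A minimal Python sketch of these definitions (function names are illustrative): equivalence classes are built from the chosen attribute set B, and the lower and upper approximations of a concept X are assembled class by class.

```python
from collections import defaultdict

def ind_classes(objects, B):
    """Equivalence classes of IND(B), as sets of objects."""
    classes = defaultdict(set)
    for x, row in objects.items():
        classes[tuple(row[a] for a in B)].add(x)
    return list(classes.values())

def approximations(objects, B, X):
    """Return (B-lower, B-upper, B-boundary) of the concept X (a set of objects)."""
    lower, upper = set(), set()
    for eq in ind_classes(objects, B):
        if eq <= X:        # [x]_B entirely inside X  -> certainly in X
            lower |= eq
        if eq & X:         # [x]_B overlaps X         -> possibly in X
            upper |= eq
    return lower, upper, upper - lower

def accuracy(lower, upper):
    """alpha_B(X) = |lower| / |upper|, always between 0 and 1."""
    return len(lower) / len(upper) if upper else 1.0

# For the Walk decision table with B = {Age, LEMS} and X = {x1, x4, x6} (the
# walking patients): lower = {x1, x6}, upper = {x1, x3, x4, x6}, boundary = {x3, x4}.
```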

Set approximation

Example: B̲Walk ⊂ Walk ⊂ B̄Walk

Approximating the set of Walk-ing patients, using the two condition attributes Age and LEMS. Equivalence classes contained in the corresponding regions are shown.


Rough Membership

The rough membership function

μ_X^B : U → [0, 1],   μ_X^B(x) = |[x]_B ∩ X| / |[x]_B|

Interpretation

  • a frequency-based estimate of Pr(x ∈ X | x, B), the conditional probability that x belongs to X
  • the rough membership function quantifies the degree of relative overlap between the set X and the equivalence class to which x belongs.

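A corresponding sketch of the rough membership function (illustrative names; `objects` is the attribute-value dictionary used in the earlier sketches):

```python
def rough_membership(objects, B, X, x):
    """mu_X^B(x) = |[x]_B ∩ X| / |[x]_B|: relative overlap of X with x's B-class."""
    key = tuple(objects[x][a] for a in B)
    eq_class = {y for y, row in objects.items() if tuple(row[a] for a in B) == key}
    return len(eq_class & X) / len(eq_class)

# Walk example: for x3 with B = {Age, LEMS} and X = {x1, x4, x6}, the class of x3
# is {x3, x4}, so the rough membership is 1/2.
```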

Reducts

Towards approximate minimal definitions of concepts

A reduct is a minimal set of attributes B ⊆ A such that IND_A(B) = IND_A(A).

Example of a reduct: IND_A({Experience, Reference}) = IND_A(A)

      Diploma   Experience   French   Reference
x1    MBA       Medium       Yes      Excellent
x2    MBA       Low          Yes      Neutral
x3    MCE       Low          Yes      Good
x4    MSc       High         Yes      Neutral
x5    MSc       Medium       Yes      Neutral
x6    MSc       High         Yes      Excellent
x7    MBA       High         No       Good
x8    MCE       Low          No       Excellent
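A brute-force sketch that checks the reduct definition directly: IND(B) must equal IND(A) and no proper subset of B may preserve it. The `hiring` dictionary transcribes the table above; the function names are illustrative.

```python
from itertools import combinations

# Condition part of the Hiring table, transcribed from the slide above.
hiring = {
    "x1": {"Diploma": "MBA", "Experience": "Medium", "French": "Yes", "Reference": "Excellent"},
    "x2": {"Diploma": "MBA", "Experience": "Low",    "French": "Yes", "Reference": "Neutral"},
    "x3": {"Diploma": "MCE", "Experience": "Low",    "French": "Yes", "Reference": "Good"},
    "x4": {"Diploma": "MSc", "Experience": "High",   "French": "Yes", "Reference": "Neutral"},
    "x5": {"Diploma": "MSc", "Experience": "Medium", "French": "Yes", "Reference": "Neutral"},
    "x6": {"Diploma": "MSc", "Experience": "High",   "French": "Yes", "Reference": "Excellent"},
    "x7": {"Diploma": "MBA", "Experience": "High",   "French": "No",  "Reference": "Good"},
    "x8": {"Diploma": "MCE", "Experience": "Low",    "French": "No",  "Reference": "Excellent"},
}

def ind_partition(objects, B):
    """The partition U / IND(B), as a frozenset of frozensets."""
    groups = {}
    for x, row in objects.items():
        groups.setdefault(tuple(row[a] for a in B), set()).add(x)
    return frozenset(frozenset(g) for g in groups.values())

def is_reduct(objects, A, B):
    """B is a reduct of A: IND(B) = IND(A) and no proper subset of B preserves this."""
    full = ind_partition(objects, A)
    if ind_partition(objects, B) != full:
        return False
    return all(ind_partition(objects, C) != full for C in combinations(B, len(B) - 1))

attrs = ["Diploma", "Experience", "French", "Reference"]
print(is_reduct(hiring, attrs, ["Experience", "Reference"]))   # True
print(is_reduct(hiring, attrs, ["Diploma", "Experience"]))     # False: x3 and x8 stay indiscernible
```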


Discernibility Matrices and Functions

A=(U, A)

A discernibility matrix of A is a symmetric n × n matrix with entries c_ij = {a ∈ A | a(x_i) ≠ a(x_j)} for i, j = 1, …, n.

A discernibility function f_A for an information system A is a Boolean function of m Boolean variables a*_1, …, a*_m (corresponding to the attributes a_1, …, a_m) defined by

f_A(a*_1, …, a*_m) = ∧{ ∨ c*_ij | 1 ≤ j ≤ i ≤ n, c*_ij ≠ ∅ }

where c*_ij = {a* | a ∈ c_ij}.

Note: we use a instead of a* in the sequel
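A sketch of the matrix construction (illustrative names; the `objects` argument is an attribute-value dictionary such as the `hiring` table in the previous sketch):

```python
def discernibility_matrix(objects, attrs):
    """c_ij = set of attributes on which x_i and x_j take different values
    (lower triangle only; the matrix is symmetric and its diagonal is empty)."""
    names = list(objects)
    matrix = {}
    for i, xi in enumerate(names):
        for xj in names[:i]:
            matrix[(xi, xj)] = {a for a in attrs if objects[xi][a] != objects[xj][a]}
    return matrix

# For the Hiring table this reproduces the matrix shown below, e.g.
# matrix[("x2", "x1")] == {"Experience", "Reference"}   (e, r in shorthand).
```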


Discernibility Matrices and Functions

An implicant of a Boolean function f is any conjunction of literals (variables or their negations) such that, if the values of these literals are true under an arbitrary valuation v of the variables, then the value of f under v is also true. A prime implicant is a minimal implicant.

The set of all prime implicants of f_A determines the set of all reducts of A.


Example Hiring: unreduced table

(Hiring table as above.)



Discernibility Matrices and Functions

Example Hiring: the discernibility matrix (d = Diploma, e = Experience, f = French, r = Reference; only the lower triangle is shown, the matrix is symmetric)

        [x1]     [x2]     [x3]     [x4]     [x5]     [x6]     [x7]   [x8]
[x1]    ∅
[x2]    e,r      ∅
[x3]    d,e,r    d,r      ∅
[x4]    d,e,r    d,e      d,e,r    ∅
[x5]    d,r      d,e      d,e,r    e        ∅
[x6]    d,e      d,e,r    d,e,r    r        e,r      ∅
[x7]    e,f,r    e,f,r    d,e,f    d,f,r    d,e,f,r  d,f,r    ∅
[x8]    d,e,f    d,f,r    f,r      d,e,f,r  d,e,f,r  d,e,f    d,e,r  ∅


Discernibility Matrices and Functions

Example Hiring: the discernibility function

f_A(d, e, f, r) = (e∨r) (d∨e∨r) (d∨e∨r) (d∨r) (d∨e) (e∨f∨r) (d∨e∨f) (d∨r) (d∨e) (d∨e) (d∨e∨r) (e∨f∨r) (d∨f∨r) (d∨e∨r) (d∨e∨r) (d∨e∨r) (d∨e∨f) (f∨r) (e) (r) (d∨f∨r) (d∨e∨f∨r) (e∨r) (d∨e∨f∨r) (d∨e∨f∨r) (d∨f∨r) (d∨e∨f) (d∨e∨r)

After simplification: f_A(d, e, f, r) = e ∧ r, hence IND_A({Experience, Reference}) = IND_A(A)
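Finding the prime implicants of f_A amounts to finding the minimal attribute sets that intersect every non-empty matrix entry (minimal hitting sets). A small brute-force sketch, assuming `matrix` was built by the `discernibility_matrix` sketch above (names are illustrative):

```python
from itertools import combinations

def reducts_from_matrix(matrix, attrs):
    """Reducts as prime implicants of f_A: minimal attribute sets that intersect
    every non-empty entry of the discernibility matrix."""
    clauses = [c for c in matrix.values() if c]
    found = []
    for size in range(1, len(attrs) + 1):
        for B in combinations(attrs, size):
            Bset = set(B)
            if any(set(r) <= Bset for r in found):      # a smaller reduct is contained in B
                continue
            if all(Bset & clause for clause in clauses):
                found.append(B)
    return found

# For the Hiring matrix:
# reducts_from_matrix(matrix, ["Diploma", "Experience", "French", "Reference"])
# returns [("Experience", "Reference")], matching f_A = e ∧ r.
```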


k-relative Reducts

Towards approximate descriptions of decision classes.

Example Hiring: consider one complete column in the discernibility matrix (here, the column for [x6]).

Column [x6] of the matrix: {d,e} (vs x1), {d,e,r} (vs x2), {d,e,r} (vs x3), {r} (vs x4), {e,r} (vs x5), {d,f,r} (vs x7), {d,e,f} (vs x8)

f_A^[x6](d, e, f, r) = (d∨e) (d∨e∨r) (d∨e∨r) (r) (e∨r) (d∨f∨r) (d∨e∨f)

After simplification: f_A^[x6] = r ∧ (d∨e)
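The same hitting-set idea, restricted to matrix entries that involve a chosen object x_k, gives the k-relative reducts (a sketch under the same assumptions as the previous sketch):

```python
from itertools import combinations

def k_relative_reducts(matrix, attrs, xk):
    """Minimal attribute sets that discern [xk] from every other class:
    restrict the clauses to matrix entries involving xk."""
    clauses = [c for (i, j), c in matrix.items() if c and xk in (i, j)]
    found = []
    for size in range(1, len(attrs) + 1):
        for B in combinations(attrs, size):
            if any(set(r) <= set(B) for r in found):
                continue
            if all(set(B) & clause for clause in clauses):
                found.append(B)
    return found

# k_relative_reducts(matrix, ["Diploma", "Experience", "French", "Reference"], "x6")
# -> [("Diploma", "Reference"), ("Experience", "Reference")], i.e. r ∧ (d ∨ e).
```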


k-relative Reducts

Towards approximate descriptions of decision classes.

Example: “The sixth equivalence class can be discerned from the other classes by the attributes Reference and Diploma, or Reference and Experience.”

A Boolean function with the conjunction running only over column k of the discernibility matrix (instead of over all columns) is the k-relative discernibility function. The set of all prime implicants of this function determines the set of all k-relative reducts of A. The k-relative reducts reveal the minimum amount of information needed to discern x_k ∈ U (or, more precisely, [x_k] ⊆ U) from all other objects.


The Classification Problem

A=(U, A∪{d})

Given a training set (a decision sub-table U′ ⊂ U), find an approximation of the decision d.

Example Hiring: the (reordered) decision table

      Diploma   Experience   French   Reference   Decision
x1    MBA       Medium       Yes      Excellent   Accept
x4    MSc       High         Yes      Neutral     Accept
x6    MSc       High         Yes      Excellent   Accept
x7    MBA       High         No       Good        Accept
x2    MBA       Low          Yes      Neutral     Reject
x3    MCE       Low          Yes      Good        Reject
x5    MSc       Medium       Yes      Neutral     Reject
x8    MCE       Low          No       Excellent   Reject


B-positive region of A

Consistent/Inconsistent Decision Systems

The generalized decision ∂_A : U → P(V_d):

∂_A(x) = {i | ∃x′ ∈ U such that x′ IND(A) x and d(x′) = i}

A is consistent (deterministic) if |∂_A(x)| = 1 for any x ∈ U.

Example: Walk is inconsistent, Hiring is consistent.

Corollary

POS_A(d) = U iff A is consistent.
If ∂_B = ∂_B′, then POS_B(d) = POS_B′(d), for any pair of non-empty sets B, B′ ⊆ A.


B-positive region of A

If X_1, …, X_{r(d)} are the decision classes of A, then the set

B̲X_1 ∪ … ∪ B̲X_{r(d)}

is called the B-positive region of A and is denoted by POS_B(d).

Example

decision table Walk:   A̲X_Yes ∪ A̲X_No ≠ U
decision table Hiring: A̲X_Accept ∪ A̲X_Reject = U
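A sketch of the generalized decision, the consistency test and the B-positive region (illustrative names; `objects` maps each object to its attribute values, including the decision attribute `d`):

```python
from collections import defaultdict

def generalized_decision(objects, A, d):
    """del_A(x): decision values occurring in x's A-indiscernibility class."""
    by_class = defaultdict(set)
    for x, row in objects.items():
        by_class[tuple(row[a] for a in A)].add(row[d])
    return {x: by_class[tuple(row[a] for a in A)] for x, row in objects.items()}

def is_consistent(objects, A, d):
    """A decision system is consistent iff |del_A(x)| = 1 for every x."""
    return all(len(vals) == 1 for vals in generalized_decision(objects, A, d).values())

def positive_region(objects, B, d):
    """POS_B(d): objects whose B-class lies entirely inside one decision class,
    i.e. the union of the B-lower approximations of all decision classes."""
    by_class = defaultdict(set)
    for x, row in objects.items():
        by_class[tuple(row[a] for a in B)].add(x)
    return {x for x, row in objects.items()
            if len({objects[y][d] for y in by_class[tuple(row[a] for a in B)]}) == 1}

# Walk table: x3 and x4 share an indiscernibility class but differ on Walk, so POS
# excludes them and the system is inconsistent; Hiring is consistent (POS_A(d) = U).
```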


The Classification Problem

Example: Let A = (U, A∪{d}) be consistent and let M(A) = (c_ij) be its discernibility matrix. Construct the decision-relative discernibility matrix of A: M^d(A) = (c^d_ij), where c^d_ij = ∅ if d(x_i) = d(x_j), and c^d_ij = c_ij − {d} otherwise.

Each of the two decision-relative reducts, {Experience, Reference} and {Diploma, Experience}, uniquely determines to which decision class an object belongs.

(Reordered Hiring decision table as above.)


Discerning Objects from Different Classes

Example Hiring: the decision-relative discernibility matrix (rows: Accept objects, columns: Reject objects; entries between objects with the same decision are ∅ and omitted)

        [x2]     [x3]     [x5]       [x8]
[x1]    e,r      d,e,r    d,r        d,e,f
[x4]    d,e      d,e,r    e          d,e,f,r
[x6]    d,e,r    d,e,r    e,r        d,e,f
[x7]    e,f,r    d,e,f    d,e,f,r    d,e,r

Reduced decision-relative discernibility function: f^d_M(A) = ed ∨ er, i.e. (e ∧ d) ∨ (e ∧ r)
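A sketch of this construction, reusing the condition-attribute matrix from the earlier `discernibility_matrix` sketch (names are illustrative; subtracting the decision attribute is a no-op when the matrix was built over condition attributes only, but it is kept to mirror the definition):

```python
def decision_relative_matrix(matrix, objects, d):
    """c^d_ij: empty if x_i and x_j carry the same decision, otherwise c_ij - {d}."""
    return {(i, j): set() if objects[i][d] == objects[j][d] else c - {d}
            for (i, j), c in matrix.items()}

# Feeding the result to the reducts_from_matrix sketch above yields the two
# decision-relative reducts of Hiring: {Experience, Reference} and {Diploma, Experience}.
```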


Discerning Objects from Different Classes

Example question: How to discern, e.g., [x1] (with decision Accept) from objects belonging to the other decision classes?

Choose the column for that object in M^d(A) and reduce: ed ∨ dr ∨ er ∨ rf

(Decision-relative discernibility matrix as above.)

Semantics of Conditional Formulae

Reduct: {Diploma, Experience}
Decision rule: “if Diploma is MBA and Experience is Medium then Decision is Accept”

Descriptors: atomic formulae of the form a = v.

Formulae: F(B, V) is the least set containing the atomic formulae over B ⊆ A ∪ {d} and V, and closed w.r.t. the propositional connectives ∧, ∨ and ¬.

Semantics: For ϕ ∈ F(B, V), |ϕ|_A denotes the meaning of ϕ in the decision table A, i.e. the set of all objects in U with the property ϕ:

1. if ϕ is of the form a = v, then |ϕ|_A = {x ∈ U | a(x) = v}
2. |ϕ∨ϕ′|_A = |ϕ|_A ∪ |ϕ′|_A;  |ϕ∧ϕ′|_A = |ϕ|_A ∩ |ϕ′|_A;  |¬ϕ|_A = U − |ϕ|_A

The set F(B, V) is called the set of conditional formulae of A and is denoted C(B, V).


Semantics of Decision Rules

A decision rule for A is any expression of the form ϕ ⇒ d = v, where ϕ ∈ C(B, V), v ∈ V_d and |ϕ|_A ≠ ∅. The formulae ϕ and d = v are referred to as the predecessor and the successor of the decision rule ϕ ⇒ d = v, respectively. The decision rule ϕ ⇒ d = v is true in A if, and only if, |ϕ|_A ⊆ |d = v|_A.

Example

Diploma = MBA ∧ Experience = Medium ⇒ Decision = Accept
Experience = Low ∧ Reference = Good ⇒ Decision = Reject
Diploma = MBA ∧ Experience = Medium ⇒ Decision = Reject

The first two rules are true in the Hiring example, while the third is not true in that example.
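A minimal evaluator for this semantics (illustrative encoding: formulae are represented as nested tuples, e.g. ("and", ("=", "Diploma", "MBA"), ("=", "Experience", "Medium"))):

```python
def meaning(phi, objects):
    """|phi|_A: the set of objects in U satisfying the formula phi."""
    op = phi[0]
    if op == "=":                                   # descriptor  a = v
        _, a, v = phi
        return {x for x, row in objects.items() if row[a] == v}
    if op == "and":
        return meaning(phi[1], objects) & meaning(phi[2], objects)
    if op == "or":
        return meaning(phi[1], objects) | meaning(phi[2], objects)
    if op == "not":
        return set(objects) - meaning(phi[1], objects)
    raise ValueError(f"unknown connective: {op}")

def rule_is_true(phi, d, v, objects):
    """phi => d = v is true in A  iff  |phi|_A is a subset of |d = v|_A."""
    return meaning(phi, objects) <= meaning(("=", d, v), objects)

# phi = ("and", ("=", "Diploma", "MBA"), ("=", "Experience", "Medium"))
# rule_is_true(phi, "Decision", "Accept", hiring_decision_table)  # True in the Hiring example
```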


The KDD Process Using Rough Sets

Raw decision table
  → discretize attributes (producing a set of cuts)
  → discretized decision table
  → compute reducts
  → generate rules (set of rules)
  → format rules (set of Rosetta rules, set of Prolog rules)

Example of algorithm pipeline


The Modelling Process (simplified)

  1. Discretization: transform non-categorical attributes in a decision table into categorical ones.

  2. Rule induction: synthesize decision rules from a decision table.
     • rule support sets
     • feature selection
     • dynamic reducts: reducts from random subsets of the universe

  3. Rule application: apply the extracted decision rules to classify new cases.
     • scan to find applicable rules, i.e. rules whose predecessors match the case


The Modelling Process (simplified)

  • If no rule is found, i.e. no rule “fires”: the most frequent outcome in the training data is chosen.

  • If more than one rule “fires”: this indicates more than one possible outcome, and the conflict is resolved by voting (see the sketch after this list).
     – Perform a voting process among the rules that “fire” in order to resolve conflicts and to rank the predicted outcomes.
     – A rule casts as many votes in favour of its outcome as its associated support count.
     – The votes from all the rules are then accumulated and divided by the total number of votes cast, in order to arrive at a numerical measure of certainty for each outcome.
     – This measure of certainty is not really a probability, but it may be interpreted as an approximation thereof if the model is well calibrated.

  • The modelling procedure can be repeated in a systematic fashion, for instance by employing a cross-validation scheme.
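A sketch of the voting step described above; the rule representation used here (a dictionary of attribute conditions, an outcome and a support count) is an assumption for illustration only.

```python
from collections import Counter

def classify(case, rules, default_outcome):
    """Vote among the rules that fire on `case`; each firing rule casts `support`
    votes for its outcome.  Returns (predicted outcome, certainty)."""
    votes = Counter()
    for conditions, outcome, support in rules:          # conditions: {attribute: value}
        if all(case.get(a) == v for a, v in conditions.items()):   # the rule "fires"
            votes[outcome] += support
    if not votes:                                        # no rule fires: fall back to the
        return default_outcome, None                     # most frequent training outcome
    total = sum(votes.values())
    outcome, count = votes.most_common(1)[0]
    return outcome, count / total                        # accumulated votes / total votes
```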


The Rosetta Toolkit

The Rosetta toolkit for rough set data analysis runs under Windows. A limited version (25 attributes, 500 objects) is publicly available at: http://rosetta.lcb.uu.se/