E. Boros, G. Felici, Boolean Seminar, Liblice, April 2013


Some Results on Threshold Separability of Boolean Functions. Endre Boros (RUTCOR, Rutgers University) and Giovanni Felici (Istituto di Analisi dei Sistemi ed Informatica, Consiglio Nazionale delle Ricerche).


SLIDE 1
  • E. Boros, G. Felici, Boolean Seminar, Liblice, April 2013

Giovanni Felici Istituto di Analisi dei Sistemi ed Informatica Consiglio Nazionale delle Ricerche Roma, Italy giovanni.felici@iasi.cnr.it

Some Results on Threshold Separability of Boolean Functions

Endre Boros RUTCOR Rutgers University New Jersey United States Endre.Boros@rutcor.rutgers.edu

SLIDE 2

Supervised Learning and Data Mining

Given disjoint sets of points in a multidimensional space, find a set of meaningful rules that separates records of different sets with high precision.

  • S ⊂ R^n, S = P ∪ N, P ∩ N = ∅

Standard methods = {decision trees, SVM, logistic regression}

[Figure labels: easy, difficult]

SLIDE 3

Binarization of real-valued data

Each real-valued dimension is mapped into a set of intervals, to achieve:

  • Control of noise effect
  • Simplification of separating rules
  • Use of models in logic form

Binarization is very important: it is at the top of the analysis hierarchy.

For the i-th coordinate, define a set of cutpoints

T_i = {t_{i1}, t_{i2}, …, t_{i,k_i}}, t_{ij} < t_{i,j+1}

Define binary variables z_{ij}, j = 1, …, k_i, associated with T_i, where

z_{ij} = 1 if t_{i,j-1} ≤ x_i ≤ t_{ij}, and 0 otherwise

A binarization induces a set of boxes B in the space of S, each defined by the intersection of the axis-parallel subspaces that pass through the cutpoints. With proper assumptions and bounds on the cutpoint sets, each box is closed and the set of boxes is finite.
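
The mapping from a real-valued point to its interval indicators can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `binarize` is hypothetical, and it assumes unbounded outer intervals and that a value equal to a cutpoint falls in the lower interval, so coordinate i yields len(T_i) + 1 indicators rather than the k_i of the slide.

```python
from bisect import bisect_left


def binarize(point, cutpoints):
    """Map a real vector to 0/1 interval indicators, coordinate by coordinate.

    cutpoints[i] is the sorted cutpoint list T_i of coordinate i.
    Assumption (not from the slides): outer intervals are unbounded and a
    value equal to a cutpoint belongs to the lower interval.
    """
    z = []
    for x_i, t_i in zip(point, cutpoints):
        j = bisect_left(t_i, x_i)  # index of the interval that holds x_i
        z.extend(1 if k == j else 0 for k in range(len(t_i) + 1))
    return z


# Two coordinates: T_1 = {1.0, 2.0}, T_2 = {0.5}
T = [[1.0, 2.0], [0.5]]
z = binarize((1.5, 0.7), T)
```

Here `z` holds three indicators for the first coordinate followed by two for the second, with exactly one indicator set per coordinate.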

SLIDE 4

Definitions

  • A binarization for S is represented equivalently by a set of boxes B or by the cutpoint set T.

  • A binarization T is separable if every non-empty box defined by its cutpoints contains either only positive or only negative points. From P ∩ N = ∅, we know that a separable binarization always exists.

  • A partially defined Boolean function (PDBF) is a set of Boolean vectors divided into two non-intersecting subsets.

  • Z is a PDBF ⇔ T is separable

A simple way to obtain a separable binarization is to insert a cutpoint between each pair of consecutive points of different class. Trivial conditions on the set S guarantee the existence of a separable T-binarization and thus of a PDBF on Z.
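
The simple construction above can be sketched as follows. The function names are hypothetical, the midpoint placement of each cutpoint is an arbitrary choice, and "consecutive points of different class" is read here as "consecutive distinct coordinate values whose points are not all of one class", which is an assumption on my part.

```python
from bisect import bisect_left


def separable_cutpoints(pos, neg):
    """One cutpoint between consecutive coordinate values of different class."""
    n = len((pos or neg)[0])
    cutpoints = []
    for i in range(n):
        labels = {}  # coordinate value -> set of classes observed at that value
        for pts, cls in ((pos, "P"), (neg, "N")):
            for p in pts:
                labels.setdefault(p[i], set()).add(cls)
        values = sorted(labels)
        cuts = [(v + w) / 2  # midpoint: arbitrary placement (assumption)
                for v, w in zip(values, values[1:])
                if len(labels[v] | labels[w]) == 2]
        cutpoints.append(cuts)
    return cutpoints


def is_separable(pos, neg, cutpoints):
    """True if no box contains both a positive and a negative point."""
    def box(p):
        return tuple(bisect_left(t, x) for x, t in zip(p, cutpoints))
    return not ({box(p) for p in pos} & {box(q) for q in neg})


P = [(0.0, 0.0), (1.0, 1.0)]
N = [(0.0, 1.0), (1.0, 0.0)]
T = separable_cutpoints(P, N)
```

With these four points, one cutpoint per coordinate is enough to put every point in its own box, while the empty binarization leaves all of them in a single mixed box.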

SLIDE 5

Linear Threshold Boolean Functions

  • A Linear Threshold Boolean Function (LTF) is a Boolean function f: {0,1}^n → {-1,1} expressed as f(x) = sgn(c_0 + c_1 x_1 + … + c_n x_n)

  • Given S, a binarization T and the resulting set Z, an LTF f: Z → {-1,1} is separating for S if and only if f(z) = 1 for z in P and f(z) = -1 for z in N.

  • We also say that Z admits an LTF extension.

We are interested in the conditions for Z to admit an LTF extension.

  • Once a binarization has been applied, we can keep one point in each box. Separability and the existence of LTF-extensions are preserved.
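
As a small illustration of these definitions, the sketch below evaluates an LTF and checks whether it is separating. The helper names are hypothetical, and treating sgn(0) as -1 is an assumption, since the slides do not say how a zero argument is handled.

```python
def ltf(c0, c):
    """Return f(z) = sgn(c0 + c.z) with values in {-1, 1}.

    Assumption: sgn(0) is taken to be -1.
    """
    def f(z):
        s = c0 + sum(ci * zi for ci, zi in zip(c, z))
        return 1 if s > 0 else -1
    return f


def is_separating(f, pos_z, neg_z):
    """True if f is +1 on every positive vector and -1 on every negative one."""
    return all(f(z) == 1 for z in pos_z) and all(f(z) == -1 for z in neg_z)


# An AND-like labelling of {0,1}^2 is separated by c0 = -1.5, c = (1, 1).
f = ltf(-1.5, (1, 1))
ok = is_separating(f, [(1, 1)], [(0, 0), (0, 1), (1, 0)])
```

The same `f` fails on the diagonal labelling with positives (0,0), (1,1) and negatives (0,1), (1,0).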

SLIDE 6

Main Topics:

1. Conditions for the existence of a separating Linear Threshold Function
2. A combinatorial necessary and sufficient condition for the existence of such a function when the points belong to the plane
3. Minimal forbidden structures for the existence of separating LTFs
4. Insights for the problem in larger dimensions that can be used in practice in discretization / binarization algorithms to find good LTFs

SLIDE 7

When a LTF cannot be obtained in 2 dimensions

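[Figure: a 2x2 grid of boxes B_{00}, B_{10}, B_{01}, B_{11} induced by the indicators z_{11} = 1 and z_{21} = 1]

\[
\begin{aligned}
B_{00}:&\quad w_0 + w_{11} z_{11} + w_{21} z_{21} = w_0 \ge 1 \\
B_{10}:&\quad w_0 + w_{11} z_{11} + w_{21} z_{21} = w_0 + w_{11} \le -1 \\
B_{01}:&\quad w_0 + w_{11} z_{11} + w_{21} z_{21} = w_0 + w_{21} \le -1 \\
B_{11}:&\quad w_0 + w_{11} z_{11} + w_{21} z_{21} = w_0 + w_{11} + w_{21} \ge 1
\end{aligned}
\]

The first three inequalities give w_0 ≥ 1, w_{11} ≤ -2, w_{21} ≤ -2, hence w_0 + w_{11} + w_{21} ≤ -3, which contradicts the fourth.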

The Forbidden Cross (FC)

Any set composed of 3 boxes is LTF-extendable (trivial).

[Figure: equivalent representations of the Forbidden Cross; legend: Positive, Negative]

SLIDE 8

Minor of a Box Set

A Minor is composed of the 4 “corner” boxes of a larger subset of B.

  • Observation. Given S ⊂ R^2, a T-binarization and the related box set B, Z admits an LTF representation if B does not contain any Forbidden Cross (FC) as a minor.

We are interested in minimal structures that break the LTF representability of Z; they would be very useful to determine algorithmically the right T for S.

SLIDE 9

The Spanning Operation and LTF-maximal Box Sets

The spanning operation: each time an empty box is the corner of a potential FC, we assign that box to the only class (color) that avoids the FC.

A set of boxes is LTF-maximal when no spanning can be applied.

The sequence of spanning operations does not affect the existence of an LTF-extension.

Claim 1: Z admits no LTF-extension iff its LTF-maximal box set contains a FC
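
A minimal sketch of spanning in the plane, taking Claim 1 as stated on the slide. The representation is an assumption: a box set is a dict mapping (row, column) to +1 or -1, empty boxes are simply absent, and all function names are hypothetical.

```python
from itertools import combinations


def _rectangles(grid):
    """Yield the four corner boxes (in cyclic order) of every row/column pair."""
    rows = sorted({r for r, _ in grid})
    cols = sorted({c for _, c in grid})
    for r1, r2 in combinations(rows, 2):
        for c1, c2 in combinations(cols, 2):
            yield [(r1, c1), (r1, c2), (r2, c2), (r2, c1)]


def has_forbidden_cross(grid):
    """True if some minor has equal diagonals of opposite classes."""
    for corners in _rectangles(grid):
        a, b, c, d = (grid.get(k) for k in corners)
        if None not in (a, b, c, d) and a == c and b == d and a != b:
            return True
    return False


def span(grid):
    """Fill empty corners with the only class that avoids an FC, until stable."""
    grid = dict(grid)
    changed = True
    while changed:
        changed = False
        for corners in _rectangles(grid):
            vals = [grid.get(k) for k in corners]
            if vals.count(None) != 1:
                continue
            e = vals.index(None)
            opposite = vals[(e + 2) % 4]
            side1, side2 = vals[(e + 1) % 4], vals[(e - 1) % 4]
            if side1 == side2 != opposite:  # the opposite class would close an FC
                grid[corners[e]] = side1
                changed = True
    return grid


def ltf_test_plane(grid):
    """Span all boxes, then check for FC presence (relies on Claim 1)."""
    return not has_forbidden_cross(span(grid))
```

For example, the three boxes `{(0, 0): 1, (0, 1): -1, (1, 0): -1}` pass the test after the fourth corner is spanned to -1, while adding `(1, 1): 1` creates the FC directly.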

SLIDE 10

LTF-separability in the plane and alternating cycles

Given a box set B, construct the following graph G=(V,E):

  • Assign a vertex v to each non-empty box in the plane
  • Draw an edge between each positive box and each negative box in the same column or row

In the example, an alternating cycle of length 4 corresponds to a FC. What happens to longer alt-cycles?
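
The graph construction can be sketched as below. The dict representation of the box set (keys are (row, column), values are +1 or -1) and the edge labels "row" / "col" are assumptions made for illustration; the slides do not specify a data structure.

```python
def build_graph(grid):
    """Edges join a positive and a negative box that share a row or a column."""
    boxes = sorted(grid)
    edges = []
    for i, (r1, c1) in enumerate(boxes):
        for r2, c2 in boxes[i + 1:]:
            if grid[(r1, c1)] == grid[(r2, c2)]:
                continue  # same class: no edge
            if r1 == r2:
                edges.append(((r1, c1), (r2, c2), "row"))
            elif c1 == c2:
                edges.append(((r1, c1), (r2, c2), "col"))
    return edges


# The Forbidden Cross: its four boxes form a cycle of length 4 in G.
fc = {(0, 0): 1, (1, 1): 1, (0, 1): -1, (1, 0): -1}
edges = build_graph(fc)
```

On the FC the four edges alternate between row and column adjacency, which is the cycle the slide refers to.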

SLIDE 11

Spanning and alternating cycles

Any alt-cycle of length 2n in G can generate an alt-cycle of length 2(n-1) by a spanning operation, down to a cycle of length 4 (i.e., a FC).

1 cycle of length 6 + 1 cycle of length 4 + another cycle of length 4 + another cycle of length 4…

Claim 2: if G contains an alternating cycle, then no LTF-extension is possible

SLIDE 12

Important questions: is spanning allowed? What is its relation to LTF-extensions?

If spanning preserves LTF extendability, then

Claim 1: Z admits no LTF-extension iff its LTF-maximal box set contains a FC
Claim 2: if G contains an alternating cycle, then no LTF-extension is possible

are theorems… To make sure, we need some linear programming.

SLIDE 13

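LTF-separability and Linear Programming (in the plane, for the time being)

Given S ⊂ R^2, T, and Z, let z_i be the binarized vector of s_i, listing the k_1 indicators of the first coordinate followed by the k_2 indicators of the second, and define

\[
\begin{aligned}
a_{ij} &= z_{ij}, && \text{for } s_i \in P,\ 1 \le j \le k_1 \\
b_{ij} &= z_{i,k_1+j}, && \text{for } s_i \in P,\ 1 \le j \le k_2 \\
a'_{ij} &= z_{ij}, && \text{for } s_i \in N,\ 1 \le j \le k_1 \\
b'_{ij} &= z_{i,k_1+j}, && \text{for } s_i \in N,\ 1 \le j \le k_2
\end{aligned}
\]

\[
\begin{aligned}
\max\ & 1 \\
& a + \sum_{j=1}^{k_1} a_{ij} x_j + \sum_{j=1}^{k_2} b_{ij} y_j \ge 1, && i = 1, \dots, |P| \\
& a + \sum_{j=1}^{k_1} a'_{ij} x_j + \sum_{j=1}^{k_2} b'_{ij} y_j \le -1, && i = 1, \dots, |N| \\
& x \in R^{k_1},\ y \in R^{k_2}
\end{aligned}
\qquad (1)
\]

x and y are the weights associated with the cutpoints in the two coordinates to determine the LTF. So far, we are interested only in feasibility.

Remember that we may use only one point in each box…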
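
For small examples, feasibility of (1) can be explored without an LP solver. The sketch below swaps in the classical perceptron update, which is not the method of the slides: it finds integer weights whenever the binarized points are linearly separable and enough epochs are allowed, and returns `None` otherwise, which is inconclusive rather than a proof of infeasibility. The name `find_ltf` and the `max_epochs` cap are assumptions.

```python
def find_ltf(pos_z, neg_z, max_epochs=1000):
    """Perceptron search for weights w = [a, w_1, ..., w_n] feasible for (1).

    Returns None if no mistake-free pass occurs within max_epochs (inconclusive).
    """
    data = [(z, 1) for z in pos_z] + [(z, -1) for z in neg_z]
    n = len(data[0][0])
    w = [0] * (n + 1)  # w[0] plays the role of the intercept a
    for _ in range(max_epochs):
        mistakes = 0
        for z, y in data:
            s = w[0] + sum(wj * zj for wj, zj in zip(w[1:], z))
            if y * s <= 0:  # misclassified or on the boundary: update
                w[0] += y
                for j, zj in enumerate(z):
                    w[j + 1] += y * zj
                mistakes += 1
        if mistakes == 0:
            return w
    return None


w = find_ltf([(1, 1)], [(0, 0), (0, 1), (1, 0)])
```

Because the weights stay integer and z is 0/1, a mistake-free pass means every positive point scores at least 1 and every negative point at most -1, which are exactly the constraints of (1).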

SLIDE 14

LTF Existence Conditions

In compact form, with e the all-ones vector, problem (1) reads:

\[
\begin{aligned}
\max\ & 1 \\
& a\,e + (A, B) \binom{x}{y} \ge 1 \\
& a\,e + (A', B') \binom{x}{y} \le -1 \\
& x \in R^{k_1},\ y \in R^{k_2}
\end{aligned}
\qquad (1)
\]

(u, v) are the dual variables associated with the constraints on P and N boxes, and the dual is:

\[
\begin{aligned}
\min\ & -e^t u - e^t v \\
& uA - vA' = 0 \\
& uB - vB' = 0 \\
& e^t u - e^t v = 0 \\
& (u, v) \ge 0
\end{aligned}
\qquad (2)
\]

Problem (1) has an empty feasible region if the following system has a solution:

\[
uA - vA' = 0,\quad uB - vB' = 0,\quad e^t u - e^t v = 0,\quad (u, v) \ge 0,\ (u, v) \ne 0
\qquad (3)
\]

i. Z admits an LTF representation ⇔ (1) has a solution
ii. (1) has a solution ⇔ (2) has a solution
iii. If (3) has a solution ⇒ (2) has no solution
iv. A feasible solution of (3) represents a forbidden structure for LTF-extension

SLIDE 15

Conditions that define a Forbidden Minor

Consider (3):

\[
uA - vA' = 0,\quad uB - vB' = 0,\quad e^t u - e^t v = 0,\quad (u, v) \ge 0,\ (u, v) \ne 0
\]

In detail, for each cutpoint index j:

\[
\sum_{i=1}^{|P|} a_{ij} u_i = \sum_{i=1}^{|N|} a'_{ij} v_i
\]

i. In a “slice” of the plane, the sum of the weights of positive boxes is equal to the sum of the weights of negative boxes.
ii. At least 2 v and 2 u must be > 0.
iii. This must be valid for at least one horizontal and one vertical slice.
iv. In any slice, if a positive box is present, a negative one must also appear, and vice versa.
v. When this happens, no LTF extension is possible.

SLIDE 16

Conditions for the existence of a Forbidden Minor

The FC satisfies (3) and thus it is a forbidden minor (we knew that, already…): assign 0 weight to boxes not in the FC, and the same weight to the 4 boxes in the cross.

Any alternating cycle in the graph G previously defined satisfies (3): assign 0 weight to boxes not in the cycle, and the same weight to the boxes in the cycle.

This proves that:

Claim 1: Z admits no LTF-extension iff its LTF-maximal box set contains a FC
Claim 2: if G contains an alternating cycle, then no LTF-extension is possible

LTF-test: span all boxes, check for FC presence
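
The weight assignments above can be checked numerically. The sketch below reads system (3) as: nonnegative weights u on positive boxes and v on negative boxes, not all zero, with equal totals and equal weighted sums of the binarized vectors. That reading, the function name, and the 0/1 vectors used for the boxes are assumptions made for illustration.

```python
def solves_system_3(pos_z, neg_z, u, v):
    """Check u, v >= 0, not all zero, sum(u) == sum(v), and
    sum_i u_i * z_i == sum_i v_i * z'_i component by component."""
    if min(list(u) + list(v)) < 0 or sum(u) + sum(v) == 0:
        return False
    if sum(u) != sum(v):
        return False
    n = len(pos_z[0])
    for j in range(n):
        lhs = sum(ui * z[j] for ui, z in zip(u, pos_z))
        rhs = sum(vi * z[j] for vi, z in zip(v, neg_z))
        if lhs != rhs:
            return False
    return True


# Forbidden Cross with equal weight on its four boxes.
fc_pos, fc_neg = [(0, 0), (1, 1)], [(0, 1), (1, 0)]
certificate_ok = solves_system_3(fc_pos, fc_neg, [1, 1], [1, 1])
```

If such weights exist, multiplying the constraints of (1) by u and v and adding them gives a positive quantity equal to a negative one, so no separating LTF can exist.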

SLIDE 17

In higher Dimensions

For example, in R^3 we obtain a minimal structure supported by 6 boxes in 3x3x3. Any permutation of this structure is a Forbidden Minor.

  • Any forbidden structure in n dimensions is also a forbidden structure in n+1 dimensions

  • Any minimal disposition of boxes that spans each subspace with at least one positive and one negative box solves problem (3) and forbids LTF-extension

  • We now call the FC 4FC2 (4 boxes, dimension 2)

In higher dimensions, system (3) gains one pair of matrices per additional coordinate:

\[
uA - vA' = 0,\quad uB - vB' = 0,\quad uC - vC' = 0,\ \dots,\quad e^t u - e^t v = 0,\quad (u, v) \ge 0
\]

[Figure: six panels, each labeled "Another plane"]

SLIDE 18

Alternating paths in graph for higher dimensions (1)


  • G: arcs between two boxes that lie in the same plane (subspace)

  • An alternating path of length 2n solves (3)

  • This simple “basic” structure requires n positive and n negative boxes… we call it Basic Minor in 3d (BM3)

  • Are there smaller ones besides the well-known 4FC2?

SLIDE 19

Alternating paths in graph for higher dimensions (2)


4 boxes in dimension 3 (projected onto 2x2x2)

A new type of FC in 3D, made of 4 boxes. We call it 4FC3. Obviously, it satisfies (3).

SLIDE 20

Unbalanced structures ?

Now we find 4FC3.

We call this the 5-forbidden cross (5FC3). It can be proven to be the only minimal forbidden structure of 5 boxes in the space. It is obtained as an alternating cycle after spanning w.r.t. 4FC2.

….but, there is no cycle! Apply spanning w.r.t. 4FC2.

If we assign double weight to one box, it solves (3).

SLIDE 21

Other structures in 3 dimensions:

SLIDE 22

Other structures in 4 dimensions:

Claim: all the Forbidden Structures are alternating cycles on the proper graph G once the box system has been spanned according to FC, 4FC3, 5FC3, BM.

Claim: if no FC, 4FC3, 5FC3, BM is found once the box system has been spanned accordingly, then there exists an LTF extension.

No counterexamples so far…

SLIDE 23

Practical relevance?

Objective: given S, find a “nice” LTF representation, if any. We can trivially verify its existence by solving the separation LP (1). What if (1) has no solution? Either

  • find S’ in S such that S’ is LTF-extendable
  • change the discretization

Our knowledge of LTF forbidden minors may help. Some algorithmic ideas under consideration follow.