Decision trees, protocols, and the Fourier Entropy-Influence - - PowerPoint PPT Presentation



SLIDE 1

Decision trees, protocols, and the Fourier Entropy-Influence Conjecture

Andrew Wan (Simons Institute) John Wright (CMU) Chenggang Wu (Tsinghua)

SLIDES 2–9

Fourier basics

Given a Boolean function f : {-1,1}^n → {-1,1}, its Fourier transform is f(x) = Σ_{S ⊆ [n]} f̂(S) x^S, where x^S = Π_{i∈S} x_i. Parseval’s equation: Σ_S f̂(S)² = E[f(x)²] = 1, so the squared coefficients form a probability distribution. Write S ~ f̂² for this probability distribution over sets, i.e. if S ~ f̂², then Pr[S = S₀] = f̂(S₀)².
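The formulas on these slides did not survive transcription, so here is an illustrative brute-force computation of the definitions (a sketch, not from the slides); maj3, 3-bit majority, is a standard example function.

```python
from itertools import product
from math import prod

def fourier_coefficients(f, n):
    """f_hat(S) = E_x[f(x) * prod_{i in S} x_i], for f : {-1,1}^n -> {-1,1}."""
    pts = list(product([-1, 1], repeat=n))
    coeffs = {}
    for S in product([0, 1], repeat=n):  # S as a 0/1 indicator vector
        chi = lambda x: prod(x[i] for i in range(n) if S[i])
        coeffs[S] = sum(f(x) * chi(x) for x in pts) / len(pts)
    return coeffs

def maj3(x):
    """Majority of three +/-1 bits."""
    return 1 if sum(x) > 0 else -1

coeffs = fourier_coefficients(maj3, 3)
parseval = sum(c * c for c in coeffs.values())  # Parseval: sums to 1
```

For maj3 the squared coefficients put weight 1/4 on each of {1}, {2}, {3}, and {1,2,3}, so they do form a probability distribution over sets, as the slide says.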

SLIDES 10–13

Influences

The influence of the ith coordinate is Inf_i[f] = Pr_x[f(x) ≠ f(x^{⊕i})] = Σ_{S ∋ i} f̂(S)². The total influence of f is Inf[f] = Σ_i Inf_i[f] = Σ_S |S| · f̂(S)². The total influence measures how high up f’s Fourier transform is.
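Both forms of the definition can be checked against each other by brute force. This is an illustrative sketch (not from the slides), again using 3-bit majority.

```python
from itertools import product
from math import prod

def influence(f, n, i):
    """Inf_i[f] = Pr_x[f(x) != f(x with coordinate i flipped)]."""
    pts = list(product([-1, 1], repeat=n))
    flips = 0
    for x in pts:
        y = list(x)
        y[i] = -y[i]
        if f(x) != f(tuple(y)):
            flips += 1
    return flips / len(pts)

def total_influence_fourier(f, n):
    """Inf[f] = sum_S |S| * f_hat(S)^2, computed by brute force."""
    pts = list(product([-1, 1], repeat=n))
    total = 0.0
    for S in product([0, 1], repeat=n):
        fhat = sum(f(x) * prod(x[i] for i in range(n) if S[i])
                   for x in pts) / len(pts)
        total += sum(S) * fhat ** 2
    return total

def maj3(x):
    return 1 if sum(x) > 0 else -1
```

For 3-bit majority a coordinate is pivotal exactly when the other two disagree, so each Inf_i = 1/2 and Inf[f] = 3/2; the pivotal-probability and Fourier formulas agree.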

SLIDES 14–19

Low Fourier weight ⇒ simple structure

FKN Theorem [FKN 02]: If most of f’s “Fourier weight” is on the first level, then f ≈ ±x_i for some coordinate i.

Friedgut’s Theorem [Fri 98]: Any Boolean f essentially depends on only 2^{O(Inf[f]/ε)} variables. (Recall Inf[f] = E_{S~f̂²}[|S|], the average Fourier weight level.)

In this paper, f is “simple” if it has low Fourier entropy.

SLIDES 20–23

Fourier entropy

Recall: if S ~ f̂², then Pr[S = S₀] = f̂(S₀)².

Def: The Fourier entropy of f is H[f̂²] := H(S), where H denotes the Shannon entropy; i.e., H[f̂²] = Σ_S f̂(S)² · log₂(1/f̂(S)²). The Fourier entropy measures how spread out f’s Fourier transform is.

SLIDES 24–26

Fourier Entropy-Influence Conjecture

Conjecture [FK 96]: There exists a universal constant C such that for every Boolean f, H[f̂²] ≤ C · Inf[f].

In words: low-level Fourier weight ⇒ concentrated Fourier weight; equivalently, spread-out Fourier weight ⇒ high-up Fourier weight.
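The conjectured inequality can be checked by brute force on small examples. An illustrative sketch (not from the slides): parity has zero Fourier entropy but maximal total influence, while 3-bit majority has H[f̂²] = 2 and Inf[f] = 3/2, so it needs C ≥ 4/3.

```python
from itertools import product
from math import prod, log2

def fourier_sq(f, n):
    """Squared Fourier coefficients of f: a probability distribution over sets."""
    pts = list(product([-1, 1], repeat=n))
    dist = {}
    for S in product([0, 1], repeat=n):
        fhat = sum(f(x) * prod(x[i] for i in range(n) if S[i])
                   for x in pts) / len(pts)
        if fhat:
            dist[S] = fhat ** 2
    return dist

def fourier_entropy(f, n):
    """H[f_hat^2] = sum_S p(S) * log2(1/p(S))."""
    return sum(p * log2(1 / p) for p in fourier_sq(f, n).values())

def total_influence(f, n):
    """Inf[f] = sum_S |S| * p(S)."""
    return sum(sum(S) * p for S, p in fourier_sq(f, n).items())

def maj3(x):
    return 1 if sum(x) > 0 else -1

def parity3(x):
    return prod(x)

H_maj, I_maj = fourier_entropy(maj3, 3), total_influence(maj3, 3)
H_par, I_par = fourier_entropy(parity3, 3), total_influence(parity3, 3)
```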

SLIDES 27–30

Fourier Entropy-Influence Conjecture

Conjecture [FK 96]: There exists a universal constant C such that for every Boolean f, H[f̂²] ≤ C · Inf[f]. Consequences:

  • the KKL Theorem
  • Mansour’s Conjecture, which would give an efficient algorithm for learning DNFs in the agnostic model
  • sharp thresholds for graph properties with “significant symmetry”

SLIDES 31–36

Fourier Entropy-Influence Conjecture

Conjecture [FK 96]: There exists a universal constant C such that for every Boolean f, H[f̂²] ≤ C · Inf[f]. Previous results:

  • H[f̂²] ≤ O(log n) · Inf[f], for all f
  • [OT13]
  • FEI holds for:
    - random DNFs [KLW10]
    - symmetric functions [OWZ11]
    - read-once Boolean formulas [OT13, CKLS13]
SLIDES 37–47

Decision Trees

(Figure: a decision tree with 0/1 leaf labels.) If you read a

  • 0, go left
  • 1, go right

Def: T is read-k if every variable appears at most k times. (The example tree is read-3.)
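The slide’s picture does not survive transcription, so here is a minimal illustrative encoding of decision trees with a read-k check. The tree T below is hypothetical (it computes x0 XOR x1), not the tree from the slides.

```python
from itertools import product

# A decision tree as nested tuples: a leaf is 0 or 1; an internal node is
# (variable_index, left_subtree, right_subtree). Reading a 0 goes left,
# a 1 goes right, matching the slides.

def evaluate(tree, x):
    """Evaluate the tree on a 0/1 input vector x."""
    while isinstance(tree, tuple):
        var, left, right = tree
        tree = right if x[var] else left
    return tree

def read_number(tree, counts=None):
    """Largest number of times any single variable appears in the tree."""
    if counts is None:
        counts = {}
    if isinstance(tree, tuple):
        var, left, right = tree
        counts[var] = counts.get(var, 0) + 1
        read_number(left, counts)
        read_number(right, counts)
    return max(counts.values(), default=0)

# Hypothetical example: x0 XOR x1. Variable 1 appears twice, so the tree
# is read-2 (and hence read-k for every k >= 2).
T = (0, (1, 0, 1), (1, 1, 0))
```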

SLIDES 48–50

Our results

If f is computable by a read-k DT, then H[f̂²] ≤ C(k) · Inf[f], for a constant C(k) depending only on k.

If f is computable by a DT with expected depth d, and satisfies …, then …. (also proven by [CKLS13])

The FEI+ conjecture of [OT13] composes. (also proven by [OT13])

SLIDES 51–53

Our technique

  • Want to show H[f̂²] ≤ C · Inf[f] for certain Boolean functions f.
  • Previous papers have studied the expression …
  • We instead take an information theoretic approach via the Shannon Source Coding Theorem.

SLIDES 54–56

Shannon Source Coding Theorem

Given a random variable X, H(X) ≈ avg # of bits needed to communicate X. Thus, to show H[f̂²] ≤ C · Inf[f], we need to construct an efficient protocol for communicating a draw S ~ f̂².

(efficient = C · Inf[f] bits on average)
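A quantitative form of the theorem is easy to check numerically: assigning each outcome a codeword of length ⌈log₂(1/p)⌉ (a Shannon code) satisfies the Kraft inequality, so a prefix code with these lengths exists, and its average length is within one bit of H(X). Illustrative sketch; the distribution is chosen arbitrarily.

```python
from math import ceil, log2

p = [0.5, 0.25, 0.125, 0.125]             # an example distribution
H = sum(q * log2(1 / q) for q in p)        # Shannon entropy H(X)
lengths = [ceil(log2(1 / q)) for q in p]   # Shannon code lengths
kraft = sum(2 ** -l for l in lengths)      # <= 1 guarantees a prefix code
avg_len = sum(q * l for q, l in zip(p, lengths))
```

For this (dyadic) distribution the code is optimal: avg_len equals H exactly; in general H ≤ avg_len < H + 1.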

SLIDE 57

A protocol for read-k DTs

SLIDES 58–73

Protocol for read-k DTs

(Figure: f is computed by T, a read-2 DT; leaf labels omitted.) Key fact: if S is in the support of f̂², then the coordinates of S appear in a root-to-leaf path in T. (The slides highlight several sets in the support of f̂², each contained in some root-to-leaf path.)

SLIDES 74–83

Protocol for read-k DTs

Given S ~ f̂², what should our protocol output? e.g.,

1.) find a path containing S
2.) output the path’s description
3.) indicate which nodes fall in S
4.) final output: the path’s description followed by the indicator bits

SLIDES 84–95

Protocol for read-k DTs

Given S ~ f̂², what should our protocol output? A set S may be contained in several root-to-leaf paths: lots of choices! e.g.,

1.) find the shortest path containing S
2.) output the path’s description
3.) indicate which nodes fall in S
4.) final output: the path’s description followed by the indicator bits

SLIDES 96–98

Analysis of protocol

  • A decision tree should be arranged with the most influential variables near the top.
  • Since every path output is root-to-leaf, the variables near the top will contribute a lot of bits to the expectation.
  • In summary: a variable contributes a lot to the expectation ⇒ it is near the top of the tree ⇒ it is highly influential.

SLIDES 99–107

A bad tree

Let T be a decision tree, and form T′ by querying a fresh variable at the root, with two copies of T (over the same variables) as its subtrees. Then:

  • the new root variable is useless
  • every path in T′ goes through it
  • the protocol outputs two extra bits!

Can’t we repeat this process and generate T′′, T′′′, …, making this protocol perform arbitrarily badly? In general: yes. If our trees are read-k: NO.

SLIDES 108–113

A bad tree

If T is read-1, then

  • T′ is read-2
  • T′′ is read-4
  • etc.

If T is read-k, then it can only have a small number of variables which are both:

  • useless
  • high up
SLIDES 114–117

Future Directions

  • A proof of the full FEI Conjecture still seems far away.
  • H[f̂²] ≤ O(log n) · Inf[f] is easy. Can we get o(log n) · Inf[f]?
  • Perhaps read-k DNFs or formulas are next?