SLIDE 1
Decision trees, protocols, and the Fourier Entropy-Influence Conjecture
Andrew Wan (Simons Institute), John Wright (CMU), Chenggang Wu (Tsinghua)
SLIDE 9
Fourier basics
Given a Boolean function f : {-1,1}^n → {-1,1}, its Fourier transform is f(x) = Σ_{S ⊆ [n]} f̂(S)·Π_{i∈S} x_i. Parseval's equation: Σ_S f̂(S)² = E_x[f(x)²] = 1, so the squared coefficients form a probability distribution. Write S_f for this probability distribution over sets, i.e. if 𝐒 ~ S_f, then Pr[𝐒 = S] = f̂(S)².
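These definitions can be checked by brute force for small n. A minimal sketch (the choice of OR on two ±1-valued inputs as the example is mine, not from the slides):

```python
import math
from itertools import product

def fourier_coeffs(f, n):
    """All Fourier coefficients of f: {-1,1}^n -> {-1,1}.

    f_hat(S) = E_x[f(x) * prod_{i in S} x_i], averaged over all 2^n inputs.
    Subsets S of {0, ..., n-1} are encoded as frozensets.
    """
    inputs = list(product((-1, 1), repeat=n))
    coeffs = {}
    for mask in range(2 ** n):
        S = frozenset(i for i in range(n) if (mask >> i) & 1)
        total = sum(f(x) * math.prod(x[i] for i in S) for x in inputs)
        coeffs[S] = total / 2 ** n
    return coeffs

# Example: OR on 2 bits (+1 = True).
OR2 = lambda x: -1 if x == (-1, -1) else 1
coeffs = fourier_coeffs(OR2, 2)

# Parseval: the squared coefficients sum to 1 ...
parseval = sum(c * c for c in coeffs.values())

# ... so {f_hat(S)^2}_S is a probability distribution: the spectral sample S_f.
spectral_sample = {S: c * c for S, c in coeffs.items()}
print(parseval)                            # 1.0
print(spectral_sample[frozenset({0, 1})])  # 0.25
```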
SLIDE 13
Influences
The influence of the ith coordinate is Inf_i[f] = Pr_x[f(x) ≠ f(x^{⊕i})] = Σ_{S ∋ i} f̂(S)². The total influence of f is Inf[f] = Σ_i Inf_i[f] = Σ_S |S|·f̂(S)² = E_{𝐒~S_f}[|𝐒|]. The total influence measures how high up f's Fourier transform is.
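Both formulas for influence can be verified exhaustively for small functions. A sketch (majority on 3 bits is my own choice of example):

```python
from itertools import product

def influence(f, n, i):
    """Inf_i[f] = Pr_x[f(x) != f(x with coordinate i flipped)]."""
    inputs = list(product((-1, 1), repeat=n))
    flips = sum(1 for x in inputs
                if f(x) != f(x[:i] + (-x[i],) + x[i + 1:]))
    return flips / len(inputs)

def total_influence(f, n):
    """Inf[f] = sum_i Inf_i[f]; also equals E_{S ~ S_f}[|S|]."""
    return sum(influence(f, n, i) for i in range(n))

# Majority of 3 bits: flipping one coordinate changes the output exactly
# when the other two disagree, so Inf_i = 1/2 and Inf = 3/2.
MAJ3 = lambda x: 1 if sum(x) > 0 else -1
print(influence(MAJ3, 3, 0))     # 0.5
print(total_influence(MAJ3, 3))  # 1.5
```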
SLIDE 19
Low Fourier weight ⇒ simple structure
FKN Theorem [FKN 02]: If most of f's "Fourier weight" is on the first level, then f ≈ ±x_i for some coordinate i.
Friedgut's Theorem [Fri 98]: Any Boolean f essentially depends on only 2^{O(Inf[f])} variables.
In this paper, f is "simple" if it has low Fourier entropy.
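For reference, the two theorems the slide alludes to can be stated precisely as follows; the quantitative forms below are the standard ones from the literature, not recovered from the slide itself:

```latex
% FKN Theorem [Friedgut-Kalai-Naor 2002]: weight on level 1 forces a dictator.
\textbf{FKN:}\quad \sum_{|S|=1}\hat f(S)^2 \ \ge\ 1-\epsilon
\;\Longrightarrow\; \Pr_x\bigl[f(x)\neq \pm x_i\bigr] \le O(\epsilon)\ \text{for some coordinate } i.

% Friedgut's Junta Theorem [Friedgut 1998]: low total influence forces a junta.
\textbf{Friedgut:}\quad \text{for every } \epsilon>0,\ f \text{ is } \epsilon\text{-close to a }
2^{O(\mathrm{Inf}[f]/\epsilon)}\text{-junta}.
```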
SLIDE 23
Fourier entropy
Recall: if 𝐒 ~ S_f, then Pr[𝐒 = S] = f̂(S)².
Def: The Fourier entropy of f is H[f] := H(S_f), where H(·) denotes the Shannon entropy; i.e., H[f] = Σ_S f̂(S)² log₂(1/f̂(S)²).
The Fourier entropy measures how spread out f's Fourier transform is.
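Putting the definitions together, H[f] and Inf[f] can be computed exhaustively for small functions; note how parity (all weight on one set) has zero entropy while OR spreads its weight evenly. The examples are my own, not from the slides:

```python
import math
from itertools import product

def fourier_sq(f, n):
    """Return {S: f_hat(S)^2}: the spectral-sample distribution of f."""
    inputs = list(product((-1, 1), repeat=n))
    out = {}
    for mask in range(2 ** n):
        S = [i for i in range(n) if (mask >> i) & 1]
        c = sum(f(x) * math.prod(x[i] for i in S) for x in inputs) / 2 ** n
        out[frozenset(S)] = c * c
    return out

def fourier_entropy(f, n):
    """H[f] = sum_S f_hat(S)^2 * log2(1 / f_hat(S)^2)."""
    return sum(p * math.log2(1 / p) for p in fourier_sq(f, n).values() if p > 1e-12)

def total_influence(f, n):
    """Inf[f] = sum_S |S| * f_hat(S)^2 = E_{S ~ S_f}[|S|]."""
    return sum(len(S) * p for S, p in fourier_sq(f, n).items())

OR2 = lambda x: -1 if x == (-1, -1) else 1
PAR3 = lambda x: x[0] * x[1] * x[2]

# OR2: all four squared coefficients are 1/4, so H = 2 while Inf = 1.
print(fourier_entropy(OR2, 2), total_influence(OR2, 2))    # 2.0 1.0
# Parity has all its Fourier weight on one set: H = 0 while Inf = 3.
print(fourier_entropy(PAR3, 3), total_influence(PAR3, 3))  # 0.0 3.0
```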
SLIDE 26
Fourier Entropy-Influence Conjecture
Conjecture [FK 96]: There exists a universal constant C such that for every Boolean f, H[f] ≤ C·Inf[f].
In words: low total influence ⇒ concentrated Fourier weight; equivalently, spread-out Fourier weight ⇒ high-up Fourier weight.
SLIDE 30
Fourier Entropy-Influence Conjecture
Conjecture [FK 96]: For every Boolean f, H[f] ≤ C·Inf[f].
Consequences:
- the KKL Theorem
- Mansour's Conjecture, which would give an efficient algorithm for learning DNFs in the agnostic model
- sharp thresholds for graph properties with "significant symmetry"
SLIDE 36
Fourier Entropy-Influence Conjecture
Conjecture [FK 96]: For every Boolean f, H[f] ≤ C·Inf[f].
Previous results:
- H[f] ≤ O(log n)·Inf[f], for all f
- an improved bound [OT13]
- FEI holds for: random DNFs [KLW10]; symmetric functions [OWZ11]; read-once Boolean formulas [OT13, CKLS13]
SLIDE 47
Decision Trees
[Figure: a decision tree with leaf labels 1 1 1 1 0 1. If you read a 0, go left; if you read a 1, go right.]
Def: T is read-k if every variable appears at most k times. The tree shown is read-3.
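The tree in the figure is not recoverable from the capture, so the sketch below uses a small hypothetical tree of its own; it shows the evaluation rule (read a 0 → go left, a 1 → go right) and a read-k check:

```python
from collections import Counter

# A decision tree is either a leaf ('leaf', value) or an internal node
# ('node', variable_index, left_subtree, right_subtree).
LEAF = lambda v: ('leaf', v)
NODE = lambda i, lo, hi: ('node', i, lo, hi)

def evaluate(tree, x):
    """Walk the tree on input bits x: read a 0 -> go left, a 1 -> go right."""
    while tree[0] == 'node':
        _, i, lo, hi = tree
        tree = hi if x[i] == 1 else lo
    return tree[1]

def max_reads(tree):
    """Largest number of times any single variable appears in the tree:
    the tree is read-k exactly when this is <= k."""
    counts = Counter()
    stack = [tree]
    while stack:
        t = stack.pop()
        if t[0] == 'node':
            counts[t[1]] += 1
            stack.extend([t[2], t[3]])
    return max(counts.values(), default=0)

# A hypothetical read-2 tree computing x0 XOR x1 (x1 is read twice):
T = NODE(0, NODE(1, LEAF(0), LEAF(1)),
            NODE(1, LEAF(1), LEAF(0)))
print(evaluate(T, [0, 1]))  # 1
print(max_reads(T))         # 2
```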
SLIDE 50
Our results
- If f is computable by a read-k DT, then FEI holds for f with a constant depending only on k.
- If f is computable by a DT with expected depth d and satisfies an additional condition, then the corresponding entropy bound holds. (also proven by [CKLS13])
- The FEI+ conjecture of [OT13] composes. (also proven by [OT13])
SLIDE 53
Our technique
- Want to show H[f] ≤ C·Inf[f] for certain Boolean functions f.
- Previous papers have studied the defining expression for H[f] directly.
- We instead take an information-theoretic approach via the Shannon Source Coding Theorem.
SLIDE 56
Shannon Source Coding Theorem
Given a random variable X, H(X) ≤ avg # of bits needed to communicate X. Thus, to show H[f] ≤ C·Inf[f], we need to construct an efficient protocol for communicating a draw 𝐒 ~ S_f.
(efficient = C·Inf[f] bits on average)
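The direction of the Source Coding Theorem being used here is that entropy lower-bounds the average length of any prefix-free code, so exhibiting a short code upper-bounds H(X). A Huffman-code sketch illustrating this (the example distribution is my own):

```python
import heapq
import math

def huffman_lengths(probs):
    """Codeword lengths of a Huffman code for the given distribution.

    Returns a list of lengths aligned with probs; the resulting average
    length L satisfies H(X) <= L < H(X) + 1.
    """
    if len(probs) == 1:
        return [1]
    # Heap entries: (probability, tiebreak, list of symbol indices).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    tiebreak = len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        for i in s1 + s2:  # every merge adds one bit to these symbols' codewords
            lengths[i] += 1
        heapq.heappush(heap, (p1 + p2, tiebreak, s1 + s2))
        tiebreak += 1
    return lengths

probs = [0.5, 0.25, 0.125, 0.125]
L = sum(p * l for p, l in zip(probs, huffman_lengths(probs)))
H = sum(p * math.log2(1 / p) for p in probs)
print(H, L)  # 1.75 1.75  (dyadic probabilities: the code is exactly optimal)
```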
SLIDE 57
A protocol for read-k DTs
SLIDE 73
Protocol for read-k DTs
[Figure: f is computed by a read-2 DT T; the leaves shown are labeled 1 1 1 1 1. The figure highlights some sets in the support of S_f.]
Key fact: if S is in the support of S_f, then the coordinates of S appear in a root-to-leaf path in T.
SLIDE 83
Protocol for read-k DTs
Given 𝐒, what should our protocol output? For example:
1.) find a path containing 𝐒
2.) output the path's description
3.) indicate which nodes fall in 𝐒
4.) final output: the path description followed by the indicator bits
SLIDE 95
Protocol for read-k DTs
Given 𝐒, what should our protocol output when several paths contain 𝐒? There are lots of choices!
1.) find the shortest path containing 𝐒
2.) output the path's description
3.) indicate which nodes fall in 𝐒
4.) final output: the path description followed by the indicator bits
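The four steps can be turned into a concrete encoder: pick a root-to-leaf path whose queried variables cover S, emit one bit per node for the direction taken (the path description) and one bit per node marking membership in S. The tree and sets below are hypothetical, and for simplicity this sketch finds the shortest covering path by trying all root-to-leaf paths:

```python
# Trees: ('leaf', value) or ('node', var, left, right), as on the earlier slides.

def paths(tree, prefix=()):
    """All root-to-leaf paths, each a tuple of (variable, direction bit)."""
    if tree[0] == 'leaf':
        yield prefix
    else:
        _, i, lo, hi = tree
        yield from paths(lo, prefix + ((i, 0),))
        yield from paths(hi, prefix + ((i, 1),))

def encode(tree, S):
    """Steps 1-4: shortest covering path, its description, membership bits."""
    covering = [p for p in paths(tree)
                if S <= {var for var, _ in p}]              # 1.) paths containing S
    path = min(covering, key=len)                           #     shortest choice
    description = [b for _, b in path]                      # 2.) path description
    membership = [1 if var in S else 0 for var, _ in path]  # 3.) indicator bits
    return description + membership                         # 4.) final output

# Hypothetical read-2 tree; the output uses 2 bits per node on the chosen path.
T = ('node', 0,
     ('node', 1, ('leaf', 0), ('leaf', 1)),
     ('node', 2, ('node', 1, ('leaf', 1), ('leaf', 0)), ('leaf', 1)))
print(encode(T, {0, 1}))  # [0, 0, 1, 1]
print(encode(T, {2}))     # [1, 1, 0, 1]
```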
SLIDE 98
Analysis of protocol
- A decision tree should be arranged with the most influential variables near the top.
- Since every path output is root-to-leaf, the variables near the top will contribute a lot of bits to the expectation.
- In summary: x_i contributes a lot to the expectation ⇒ x_i is near the top of the tree ⇒ x_i is highly influential.
SLIDE 107
A bad tree
Let T be a decision tree, and let T' query a useless variable at the root and then run two copies of T (over the same variables) on either side.
- the root variable is useless
- every path in T' goes through it
- the protocol outputs two extra bits!
Can't we repeat this process and generate T'', T''', ..., making this protocol perform arbitrarily badly? In general: yes. If our trees are read-k: NO.
SLIDE 113
A bad tree
If T is read-1, then:
- T' is read-2
- T'' is read-4
- etc.
If T is read-k, then it can only have a small number of variables which are both:
- useless
- high up
SLIDE 116
Future Directions
- A proof of the full FEI Conjecture still seems far away.
- A weaker form of the inequality is easy; can we get the full conjectured bound?
- Perhaps read-k DNFs or formulas are next?