Adaptivity helps for testing juntas
Rocco Servedio, Li-Yang Tan, John Wright
Columbia CMU TTIC
(work done while I was visiting Columbia)
Juntas

f : {0,1}^n → {0,1}

k-junta: f depends on only k bits.
(k = 1: f is a dictator.)

(Figure: an example 3-junta.)

Key question: how to tell if f is a k-junta?
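The definition is easy to check by brute force for small n. A minimal Python sketch (not from the talk; the function names are mine, and inputs are encoded as integers):

```python
def relevant_coords(f, n):
    """Coordinates i such that flipping bit i of some input changes f."""
    return {i for x in range(2 ** n) for i in range(n)
            if f(x) != f(x ^ (1 << i))}

def is_k_junta(f, n, k):
    """f is a k-junta iff it has at most k relevant coordinates."""
    return len(relevant_coords(f, n)) <= k

# Example: a 3-junta on 6 bits, depending only on coordinates 0, 2, 5.
f = lambda x: ((x >> 0) & 1) ^ (((x >> 2) & 1) & ((x >> 5) & 1))
```

This exhaustive check costs n * 2^n evaluations, which is exactly what a tester with query access cannot afford; the point of the talk is doing much better with few queries.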
Queries

Given: ability to make queries x → f(x).

Nonadaptive: fix queries in advance:
    x1, x2, … → f(x1), f(x2), …

Adaptive: choose queries based on answers:
    x1 → f(x1), x2 → f(x2), x3 → f(x3), …
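The two query models can be phrased as two driver loops. A hypothetical Python sketch (my names; inputs encoded as integers):

```python
def run_nonadaptive(f, queries):
    """Nonadaptive: the whole query list is fixed before any answer is seen."""
    return [f(x) for x in queries]

def run_adaptive(f, first_query, next_query, num_queries):
    """Adaptive: each query may depend on the transcript of earlier answers."""
    transcript, x = [], first_query
    for _ in range(num_queries):
        transcript.append((x, f(x)))
        x = next_query(transcript)  # choose the next point from past answers
    return transcript
```

Here `next_query` is an arbitrary strategy mapping the transcript so far to the next input; a nonadaptive tester is the special case where it ignores the answers.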
Property testing

Goal: distinguish whether the (unknown) f is
 ● ɛ-far from every k-junta (values must change on an ɛ-fraction of inputs), or
 ● a k-junta.

Resources: minimize query count q in terms of k and ɛ (no dependence on n!).
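For intuition about "ɛ-far", the distance to the class of k-juntas can be computed exactly for tiny n by trying every candidate set of k relevant coordinates. A brute-force sketch (this is a definition-checker, not the tester; names are mine):

```python
from itertools import combinations
from collections import Counter

def dist_to_k_junta(f, n, k):
    """Fraction of inputs whose value must change to make f a k-junta,
    by brute force over all size-k candidate sets of relevant coordinates."""
    best = 1.0
    for S in combinations(range(n), k):
        counts = Counter()
        for x in range(2 ** n):
            proj = tuple((x >> i) & 1 for i in S)
            counts[(proj, f(x))] += 1
        # The best junta depending only on S takes the majority value of f
        # on each subcube (each setting of the S-coordinates).
        projections = {p for (p, _) in counts}
        err = sum(min(counts[(p, 0)], counts[(p, 1)]) for p in projections)
        best = min(best, err / 2 ** n)
    return best
```

For example, the parity of 3 bits is 1/2-far from every 2-junta, while a dictator has distance 0 to the 1-juntas.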
Junta testing motivation

 ● …rank model for high-dimensional data.
 ● Dictator testing, a basic topic in hardness of approximation.
 ● …testing other properties.
Prior work

               adaptive                     nonadaptive
  UBs:   O(k log(k) + k/ɛ)  [Bla09]    O(k^{3/2} log^3(k)/ɛ)  [Bla08]
  LBs:   Ω(k)  [CG04]                  Ω(k/(ɛ log(k/ɛ)))  [Bla08]
                                       Ω(k log(k))  [BGSMdW13]

Annoyance: adaptive UB ≥ nonadaptive LB.

[Bla09]: does adaptivity even help? It should: [Bla09]'s O(k log(k) + k/ɛ) adaptive algorithm uses binary search. Adaptivity also helps for testing signed majority functions and read-once width-two OBDDs.

Our work: yes it does. A new nonadaptive LB.
Main result

Any nonadaptive algorithm requires

    q ≥ k log(k) / (ɛ^c log(log(k)/ɛ^c))

queries (for any 0 < c < 1).

Set ɛ = 1/log(k):
 ● Adaptive UB = O(k log(k) + k/ɛ) = O(k log(k)).
 ● Our nonadaptive LB = k log(k)^{1+c} / log(log(k)).
Our techniques

 ● [CG04]'s Ω(k) adaptive lower bound.
 ● Our new nonadaptive lower bound, based on [CG04]'s lower bound.
[CG04] considers two distributions on n = (k+1)-variable functions:

Dyes: ● Pick i ~ {1,...,k+1} uar.
      ● Set fyes : {0,1}^{k+1} → {0,1} uar, subject to not depending on coordinate i:

            fyes(x1,...,0,…,xk+1) = fyes(x1,...,1,…,xk+1) = random {0,1}

        (the 0/1 sits in coordinate i; holds for all x1,...,xk+1)
      (fyes is a k-junta.)

Dno:  ● Set fno : {0,1}^{k+1} → {0,1} uar.
      (fno is usually far from a k-junta.)

[CG04 THM]: Need Ω(k) queries to distinguish these distributions.
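The two distributions are easy to sample as truth tables for small k. A Python sketch (my names; inputs encoded as integers, table index x is the input):

```python
import random

def sample_dno(k):
    """Dno: a uniformly random function on {0,1}^(k+1), as a truth table."""
    return [random.randrange(2) for _ in range(2 ** (k + 1))]

def sample_dyes(k):
    """Dyes: pick a coordinate i uniformly, then a uniformly random function
    that ignores coordinate i (so it is a k-junta)."""
    i = random.randrange(k + 1)
    table = [None] * (2 ** (k + 1))
    for x in range(2 ** (k + 1)):
        if table[x] is None:
            b = random.randrange(2)
            table[x] = b
            table[x ^ (1 << i)] = b  # force f(x) = f(x with bit i flipped)
    return i, table
```

Pairing each x with its partner x ⊕ e_i and giving both the same random bit is exactly the "uar subject to not depending on coordinate i" condition.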
Given f, how to tell if it came from Dyes or Dno?

Idea: see if f has any irrelevant coordinates. For coordinate i:
 ● Pick x uar.
 ● Query f(x) and f(x⊕i); x and x⊕i differ only on coordinate i.

Def: x and x⊕i form an i-twin.
If i is relevant:
 ● f(x) and f(x⊕i) are independent uar bits, so
   f(x) = f(x⊕i) w/prob 1/2 and f(x) ≠ f(x⊕i) w/prob 1/2.
 ∴ will conclude "relevant" after O(1) i-twins.

If i is irrelevant:
 ● f(x) = f(x⊕i) always.
 ∴ will query O(log(k)) i-twins before concluding "irrelevant".

Query cost: (k+1) · O(1) + O(log(k)) = O(k).
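The twin test can be simulated directly. A sketch under my own choices (function names, the value of `reps`, integer-encoded inputs); note the early break on a disagreement is where adaptivity enters:

```python
import math
import random

def find_irrelevant_coord(f, k, reps=None):
    """Twin test on n = k+1 coordinates: query random i-twins until one
    disagrees (then i is certainly relevant) or `reps` agree in a row
    (then declare i irrelevant).  With reps = O(log k), a relevant
    coordinate of a uniformly random f survives all reps agreements
    only with probability 2**-reps."""
    n = k + 1
    if reps is None:
        reps = 2 * max(1, math.ceil(math.log2(n))) + 5
    for i in range(n):
        for _ in range(reps):
            x = random.randrange(2 ** n)
            if f(x) != f(x ^ (1 << i)):
                break          # disagreement: i is relevant, move on
        else:
            return i           # reps agreeing i-twins: declare i irrelevant
    return None                # every coordinate looked relevant

# A function on 4 bits (k = 3) that ignores coordinate 2:
par = lambda x: (x & 1) ^ ((x >> 1) & 1) ^ ((x >> 3) & 1)
```

On a Dyes sample this returns the hidden irrelevant coordinate with high probability; on a Dno sample it usually returns None.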
[CG04]’s Ω(k) lower bound

Key idea:
 ● Suppose you query f on x1,...,xq.
 ● LB: q points can have i-twins for ≤ q−1 coordinates.
 ∴ q = Ω(k).
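The counting fact above can be explored on small examples with a hypothetical helper (not from the talk):

```python
def twin_directions(points, n):
    """Directions i in which some pair of the given points forms an i-twin."""
    pts = set(points)
    return {i for x in pts for i in range(n) if x ^ (1 << i) in pts}

# Four points of {0,1}^3 forming a star around 000: twins in 3 = 4-1 directions.
star = {0b000, 0b001, 0b010, 0b100}
```

The star shows the bound "q points, at most q−1 twin directions" is tight, which is why Ω(k) queries are forced when all k+1 directions must be covered.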
Matching upper and lower bounds?

The O(k) algorithm was adaptive: it keeps querying i-twins for a coordinate only until one disagrees.

Can't plan this in advance: nonadaptive queries x1,...,xq would need O(log(k)) i-twins in all k+1 directions.
∴ q = Ω(k log(k)) nonadaptive LB? (Not quite.)
A nonadaptive algorithm

[Fra83]: there are q = O(k log(k) / log log(k)) points x1,...,xq with log(k) i-twins for each i.

Recall our LB:

    q ≥ k log(k) / (ɛ^c log(log(k)/ɛ^c))

Goal: ● show [Fra83] is optimal.
New distributions

Dno:  ● Set fno : {0,1}^{k+1} → {0,1} random ɛ-biased: each f(x) is independent
        of all other f(x′) and satisfies Pr[f(x) = 1] = ɛ.
      (fno is usually ɛ-far from a k-junta.)

Dyes: ● Pick i ~ {1,...,k+1} uar.
      ● Set fyes : {0,1}^{k+1} → {0,1} random ɛ-biased, subject to not depending
        on coordinate i.
      (fyes is a k-junta.)
These are the distributions studied in [Bla08]. His LB: Ω(k/(ɛ log(k/ɛ))).

Main tool: an edge-isoperimetric inequality: points x1,...,xq can only have O(q log(q)) i-twins.
Our main tool: a new edge-isoperimetric inequality.

Suppose x1,...,xq have m i-twins in each of d directions. Then q ≥ md / log(m).

With m = log(k) twins in each of d = k+1 directions: q ≥ k log(k) / log(log(k)).
Other ideas

 ● … too important.
 ● … f(x1), f(x2), …, f(xq).
Open problem

Prove a separation between adaptive and nonadaptive testers when ɛ = const.