PROPERTY TESTING
Arnab BHATTACHARYYA (in lieu of Seth) 29/08/2019
CS5234: Algorithms at Scale
Lecture Outline
■ What is property testing?
■ Identify what goes into showing correctness of a testing algorithm.
■ Identify what goes into showing impossibility of fast testing algorithms.
Problem: Given two DNA strands $X$ and $Y$ over the alphabet {A, C, T, G}, are they from the same species or from different species?
■ If $X$ and $Y$ are from the same species, then we expect the strings to be similar. Otherwise, not.
■ But similar in what sense? Need a metric.
– One possibility is the Levenshtein distance $d_L$ (# of insertions, deletions or substitutions to turn one string into another)
Want an algorithm that outputs:
– SAME if $d_L(X, Y)$ is “small”
– DIFFERENT if $d_L(X, Y)$ is “large”
For exactly computing $d_L$, only $O(n^2)$ algorithms are known.
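For reference, the quadratic-time exact computation can be sketched as the textbook dynamic program below. This is an illustration only; the function name `levenshtein` is an assumption and does not come from the slides.

```python
def levenshtein(x: str, y: str) -> int:
    """Edit distance by the classic dynamic program, O(len(x) * len(y)) time."""
    # prev[j] = distance between the current prefix of x and the first j chars of y
    prev = list(range(len(y) + 1))
    for i, cx in enumerate(x, start=1):
        curr = [i]  # turning the first i chars of x into the empty string takes i deletions
        for j, cy in enumerate(y, start=1):
            curr.append(min(
                prev[j] + 1,               # delete cx
                curr[j - 1] + 1,           # insert cy
                prev[j - 1] + (cx != cy),  # substitute (free when the characters match)
            ))
        prev = curr
    return prev[-1]
```

Every cell of an (n+1) by (n+1) table is filled once, which is where the quadratic cost comes from.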
Is there a more efficient algorithm that outputs
– SAME if $d_L(X, Y) \le T_1$
– DIFFERENT if $d_L(X, Y) \ge T_2$?
Indeed, there is! If $T_1$ and $T_2$ are sufficiently far apart, you only need to look at $\ll n$ characters in the strings to make the correct decision with high probability!
Property Testing Framework
Bad inputs are $\epsilon$-far from good, which means: for a distance function $d: \text{Inputs} \times \text{Inputs} \to [0,1]$, for any good $X$ and bad $Y$, $d(X, Y) > \epsilon$.
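Definition: An algorithm is a tester for a property $\mathcal{P}$ if, given query access to an object $x$ of size $n$, it accepts with probability $\ge 2/3$ when $x \in \mathcal{P}$ and rejects with probability $\ge 2/3$ when $x$ is $\epsilon$-far from $\mathcal{P}$.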
Query complexity: The number of queries made by the tester. The main focus of this course will be understanding the query complexity for various properties $\mathcal{P}$.
Data representation decides what is revealed by each query. For example, a graph can be represented as an adjacency matrix or as adjacency lists.
Distance function decides what is meant by $\epsilon$-far. The default choice is the Hamming distance: for $f, g: [n] \to R$,
$d_H(f, g) = |\{i \in [n] : f(i) \ne g(i)\}| / n$.
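As a small illustration, the normalized Hamming distance can be computed as follows. The function name `hamming_distance` is an assumption, and the two inputs are assumed to be sequences of the same nonzero length.

```python
def hamming_distance(f, g):
    """Fraction of positions where the two equal-length sequences differ."""
    if len(f) != len(g) or len(f) == 0:
        raise ValueError("inputs must have the same nonzero length")
    return sum(a != b for a, b in zip(f, g)) / len(f)
```

For example, `hamming_distance("ACGT", "ACGA")` differs in one of four positions, so the distance is 1/4.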
Often, our testers will be one-sided, meaning the tester will accept with probability 1 if $x \in \mathcal{P}$.
A Simple Example
■ Inputs are strings of length $n$. Property $\mathcal{P}$ is satisfied only by the all-1’s string. Distance measure is the Hamming distance, $d_H$.
■ Want tester to accept $x$ with probability $\ge 2/3$ if $x = 1^n$. Want tester to reject $x$ with probability $\ge 2/3$ if $\#\{i : x_i \ne 1\} > \epsilon n$.
■ Tester: Sample $2/\epsilon$ random locations $i \in [n]$. Accept iff $x_i = 1$ for all such $i$.
■ One-sided error. If $x$ is $\epsilon$-far from $\mathcal{P}$, $\Pr[\text{tester rejects}] \ge 1 - (1 - \epsilon)^{2/\epsilon} \ge 1 - e^{-2} \ge 2/3$.
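The tester above can be sketched in Python as follows. The name `all_ones_tester`, the 0-based indexing, sampling with replacement, and the rounding of 2/eps up to an integer are all assumptions made for this illustration.

```python
import math
import random

def all_ones_tester(x, eps, rng=random):
    """Accept iff every sampled position of x holds a 1 (one-sided error)."""
    n = len(x)
    q = math.ceil(2 / eps)       # number of samples, independent of n
    for _ in range(q):
        i = rng.randrange(n)     # uniform position, sampled with replacement
        if x[i] != 1:
            return False         # a witness was found, so rejecting is always correct
    return True
```

The all-1's string is always accepted, and the number of queries depends only on eps, not on the length of the string.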
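To show that an algorithm is a tester for a property $\mathcal{P}$ with query complexity $q(\epsilon, n)$, you need to do three things:
■ YES case: show that if $x \in \mathcal{P}$, the algorithm accepts with probability $\ge 2/3$ (or 1 for one-sided)
■ NO case: show that if $x$ is $\epsilon$-far from $\mathcal{P}$, the algorithm rejects with probability $\ge 2/3$
■ Query complexity: show that the algorithm makes at most $q(\epsilon, n)$ queries for all inputs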
■ Input: array of $n$ distinct numbers.
■ Array $A$ is monotone if $A[i] < A[j]$ for all $i < j$.
■ Array $A$ is $\epsilon$-far from monotone if $\min_{\text{monotone } B} d_H(A, B) > \epsilon$.
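To make the distance to monotonicity concrete, here is a hedged sketch that computes it exactly. It assumes entries may be replaced by arbitrary real numbers, in which case the fewest entries that must change equals the array length minus the length of a longest increasing subsequence. The name `distance_to_monotone` is an assumption.

```python
from bisect import bisect_left

def distance_to_monotone(a):
    """Fraction of entries that must change to make `a` strictly increasing."""
    # tails[k] = smallest possible last value of an increasing subsequence of length k + 1
    tails = []
    for v in a:
        k = bisect_left(tails, v)
        if k == len(tails):
            tails.append(v)      # v extends the longest subsequence found so far
        else:
            tails[k] = v         # v gives a smaller tail for length k + 1
    return (len(a) - len(tails)) / len(a)
```

This runs in O(n log n) time but reads the whole array, which is exactly what a tester tries to avoid.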
Test1($\epsilon$, $n$, $A$):
  for t = 1, …, q:
    choose random i ∈ [1, $n$ − 1]
    reject if A[i] > A[i+1]
  accept
For what choice of $q$ is Test1 a tester for monotonicity?
Test2($\epsilon$, $n$, $A$):
  for t = 1, …, q:
    choose random i ∈ [1, $n$ − 1]
    choose random j ∈ [i + 1, $n$]
    reject if A[i] > A[j]
  accept
For what choice of $q$ is Test2 a tester for monotonicity?
Test3($\epsilon$, $n$, $A$):
  for t = 1, …, 2/$\epsilon$:
    choose random i ∈ [1, $n$]
    x ← A[i]
    do binary search on A for x
    reject if the binary search for x does not end at i
  accept
Theorem: Test3 is a one-sided tester for monotonicity with query complexity $O((\log n)/\epsilon)$.
Proof outline: query complexity, YES case, NO case.
Call a coordinate $i$ searchable if the binary search for $A[i]$ ends at $i$.
Claim 1: If $A$ is $\epsilon$-far from monotone, then the number of non-searchable coordinates is greater than $\epsilon n$.
NO case done with this claim. Why?
Claim 2: The array $A$ restricted to its searchable coordinates is monotone.
Claim 1 follows from Claim 2. Why?
Claim 3: If $i < j$ and both $i$ and $j$ are searchable, then $A[i] < A[j]$.
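The claims can be sanity-checked empirically on a shuffled array. The sketch below lists the searchable coordinates and the values found there; the name `searchable_indices`, the fixed seed, and the array size are illustrative assumptions.

```python
import random

def searchable_indices(a):
    """Indices i for which binary search for a[i], run on `a` as if sorted, stops at i."""
    found = []
    for i, x in enumerate(a):
        lo, hi = 0, len(a) - 1
        while lo <= hi:
            mid = (lo + hi) // 2
            if mid == i:             # the search has reached position i
                found.append(i)
                break
            if a[mid] < x:
                lo = mid + 1
            else:
                hi = mid - 1
    return found

rng = random.Random(0)               # fixed seed so the run is repeatable
a = list(range(200))
rng.shuffle(a)
idx = searchable_indices(a)          # searchable coordinates, in increasing order
values = [a[i] for i in idx]         # the array restricted to those coordinates
```

The reason `values` comes out increasing: for searchable i < j, the two searches share a path until some pivot m with i ≤ m ≤ j, where the search for A[i] stops or goes left and the search for A[j] stops or goes right, which forces A[i] < A[j].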
■ Tester is adaptive, meaning that its queries may depend on the answers to its past queries.
■ It is possible to make the tester non-adaptive.
■ Test2 is a valid tester with query complexity $O(\epsilon^{-1})$ when the inputs are Boolean arrays.
Three common approaches to proving lower bounds
■ Yao’s Minimax Principle
■ Gap-Preserving Reductions
■ Communication Complexity
■ Testers are randomized algorithms
■ A randomized algorithm can be viewed as a random element of a collection of deterministic algorithms: $\{A_1, A_2, A_3, \dots\}$
■ Showing limitations for randomized algorithms is usually trickier than for deterministic algorithms
For any randomized tester $T$ making $q$ queries, there exists an input $x$ such that:
$\Pr_T[T(x) \text{ is wrong}] > 1/3$
There exists a distribution $\mathfrak{D}$ on inputs such that for any deterministic tester $T$ making $q$ queries:
$\Pr_{x \sim \mathfrak{D}}[T(x) \text{ is wrong}] > 1/3$
It suffices to come up with a distribution of inputs that is hard on average for any low-query deterministic tester.
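Yao’s Minimax Principle: $\mathcal{P}$ is a property over objects. Suppose there are two distributions $\mathcal{E}_1$ and $\mathcal{E}_2$ such that:
■ $\Pr_{x \sim \mathcal{E}_1}[x \in \mathcal{P}] \ge 1 - \eta_1$
■ $\Pr_{x \sim \mathcal{E}_2}[x \text{ is } \epsilon\text{-far from } \mathcal{P}] \ge 1 - \eta_2$
■ for any deterministic tester $T$ making $q(n, \epsilon)$ queries, $\left| \Pr_{x \sim \mathcal{E}_1}[T \text{ accepts}] - \Pr_{x \sim \mathcal{E}_2}[T \text{ accepts}] \right| \le \eta_3$
If $\eta_1 + \eta_2 + \eta_3 < 1/3$, then the query complexity of testing $\mathcal{P}$ is more than $q(n, \epsilon)$.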
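A short derivation, offered as a sketch, shows why the three conditions suffice. Write $\eta_1, \eta_2, \eta_3$ for the three slack parameters and $\mathcal{E}_1, \mathcal{E}_2$ for the two distributions. Take the mixture $\mathfrak{D} = \tfrac12 \mathcal{E}_1 + \tfrac12 \mathcal{E}_2$ (a choice made here for illustration) and let $a_b = \Pr_{x \sim \mathcal{E}_b}[T \text{ accepts}]$ for a deterministic tester $T$ making $q(n, \epsilon)$ queries.

```latex
\begin{align*}
\Pr_{x \sim \mathfrak{D}}[T(x) \text{ is wrong}]
  &\ge \tfrac12 \Pr_{x \sim \mathcal{E}_1}[x \in \mathcal{P} \wedge T \text{ rejects}]
     + \tfrac12 \Pr_{x \sim \mathcal{E}_2}[x \text{ is } \epsilon\text{-far} \wedge T \text{ accepts}] \\
  &\ge \tfrac12 (1 - a_1 - \eta_1) + \tfrac12 (a_2 - \eta_2) \\
  &= \tfrac12 \bigl( 1 - (a_1 - a_2) - \eta_1 - \eta_2 \bigr) \\
  &\ge \tfrac12 (1 - \eta_1 - \eta_2 - \eta_3)
   > \tfrac12 \cdot \tfrac23 = \tfrac13 .
\end{align*}
```

So every deterministic tester with that many queries errs with probability above 1/3 on $\mathfrak{D}$, which is the condition needed to rule out randomized testers with the same number of queries.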
Suppose $\mathcal{P} = \{1^n\}$. The query complexity of testing $\mathcal{P}$ is $\Omega(\epsilon^{-1})$.
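One possible hard pair of distributions for this bound (an assumption of this sketch, not taken from the slides) is: always the all-1's string, versus a string with exactly k zeros placed uniformly at random, where k exceeds eps times n. A deterministic tester that keeps seeing 1's follows a fixed sequence of q positions, so it behaves identically on both distributions unless one of those positions holds a zero. The helper below computes the chance that it sees only 1's; the name `prob_sees_only_ones` is hypothetical.

```python
from fractions import Fraction
from math import comb

def prob_sees_only_ones(n, k, q):
    """Exact probability that q fixed distinct positions all hold 1
    when exactly k zeros are placed uniformly at random among n positions."""
    return Fraction(comb(n - k, q), comb(n, q))

# Example: n = 1000, k = 20 zeros, q = 5 queries.
gap_bound = 1 - prob_sees_only_ones(1000, 20, 5)
```

By a union bound the acceptance gap is at most qk/n, so with k about 2·eps·n it stays below 1/3 unless q is on the order of 1/eps.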
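What about $\mathcal{P} = \{0^n, 1^n\}$? Or $\mathcal{P} = \{z\}$ for a fixed string $z \in \{0,1\}^n$?
Suppose $\mathcal{P} = \{x \in \{0,1\}^n : |x| \le \frac{n}{2}(1 - \epsilon)\}$.
The query complexity of testing $\mathcal{P}$ is $\Omega(\epsilon^{-2})$.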
■ Property testing is about how you can uncover differences in global structure using local queries.
■ To show correctness of a tester, you need to verify its query complexity and its performance on YES and NO input instances.
■ To prove lower bounds on the query complexity via Yao’s minimax principle, you explicitly come up with a hard input distribution for deterministic testers.