PROPERTY TESTING Arnab BHATTACHARYYA (in lieu of Seth) 29/08/2019



slide-1
SLIDE 1

PROPERTY TESTING

Arnab BHATTACHARYYA (in lieu of Seth) 29/08/2019

CS5234: Algorithms at Scale

slide-2
SLIDE 2

Lecture Outline

■ What is property testing?
■ Identify what goes into showing correctness of a testing algorithm. Some examples.
■ Identify what goes into showing impossibility of fast testing. Some examples.
slide-3
SLIDE 3

A motivating example

  • DNA: strings in 4 characters {A, C, T, G}
  • Problem: Given two DNA strands $X$ and $Y$, are they from the same species or from different ones?

slide-5
SLIDE 5

■ If $X$ and $Y$ are from the same species, then we expect the strings to be similar. Otherwise, not.
■ But similar in what sense? Need a metric.

– One possibility is Levenshtein distance (the number of insertions, deletions, or substitutions needed to turn one string into another).
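As an illustration of the metric, here is a minimal sketch of the textbook dynamic program for Levenshtein distance. The function name `levenshtein` is chosen here for illustration and does not come from the slides. It fills a table with one cell per pair of prefixes, so it takes time proportional to the product of the two string lengths.

```python
def levenshtein(x, y):
    """Edit distance between strings x and y (insert, delete, substitute)."""
    # prev[j] holds the distance between the current prefix of x and y[:j].
    prev = list(range(len(y) + 1))
    for i, cx in enumerate(x, start=1):
        curr = [i]  # turning x[:i] into the empty string takes i deletions
        for j, cy in enumerate(y, start=1):
            substitute = prev[j - 1] + (0 if cx == cy else 1)
            delete = prev[j] + 1
            insert = curr[j - 1] + 1
            curr.append(min(substitute, delete, insert))
        prev = curr
    return prev[-1]
```

For example, `levenshtein("ACGT", "AGT")` is 1, since deleting the single character `C` turns the first strand into the second.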

slide-6
SLIDE 6

Want an algorithm that outputs:
– SAME if $d_L(X, Y)$ is “small”
– DIFFERENT if $d_L(X, Y)$ is “large”

slide-8
SLIDE 8

For exactly computing $d_L$, only $O(n^2)$ algorithms are known. Too expensive for bio applications.

Is there a more efficient algorithm that outputs
– SAME if $d_L(X, Y) \le T_1$
– DIFFERENT if $d_L(X, Y) \ge T_2$?

slide-9
SLIDE 9

Indeed, there is! If $T_1$ and $T_2$ are sufficiently far apart, you only need to look at $\ll n$ characters in the strings to make the correct decision with high probability!

slide-10
SLIDE 10


slide-11
SLIDE 11

Property Testing Framework

Bad inputs are $\epsilon$-far from good, which means: for a distance function $d: \text{Inputs} \times \text{Inputs} \to [0,1]$, for any good $X$ and bad $Y$, $d(X, Y) > \epsilon$.

slide-12
SLIDE 12

Property Testing Framework

Definition. An algorithm is a tester for a property $\mathcal{P}$ if:

  • The inputs are: integer $n > 0$, real $\epsilon \in (0,1)$, and query access to an object $x$ of size $n$.
  • It accepts with probability $\ge 2/3$ if $x \in \mathcal{P}$.
  • It rejects with probability $\ge 2/3$ if $x$ is $\epsilon$-far from $\mathcal{P}$.
slide-13
SLIDE 13

Property Testing Framework

Query complexity: The number of query accesses made by the tester. Main focus of this course will be understanding the query complexity for various properties $\mathcal{P}$.

slide-14
SLIDE 14

Property Testing Framework

Data representation decides what is revealed by each query. For example, a graph can be represented as an adjacency matrix or as an adjacency list.

slide-15
SLIDE 15

Property Testing Framework

Distance function decides what is meant by $\epsilon$-far. The default choice is the Hamming distance. For two functions $f, g: [n] \to R$,

$d_H(f, g) = \dfrac{|\{i \in [n] : f(i) \ne g(i)\}|}{n}$.
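The normalized Hamming distance can be sketched in a few lines. This assumes the two functions on $[n]$ are given as equal-length Python sequences; the name `hamming_distance` is illustrative only.

```python
def hamming_distance(f, g):
    """Fraction of positions where the sequences f and g disagree."""
    if len(f) != len(g):
        raise ValueError("sequences must have the same length")
    if not f:
        raise ValueError("sequences must be non-empty")
    disagreements = sum(1 for a, b in zip(f, g) if a != b)
    return disagreements / len(f)
```

For instance, `hamming_distance([1, 1, 0, 1], [1, 1, 1, 1])` is 0.25, so the first array is within distance 0.25 of the all-1's array.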

slide-16
SLIDE 16

Property Testing Framework

Often, our testers will be one-sided, meaning the tester will accept with probability 1 if $x \in \mathcal{P}$.

slide-17
SLIDE 17

A Simple Example

■ Inputs are strings of length $n$. Property $\mathcal{P}$ is satisfied only by the all-1’s string. Distance measure is the Hamming distance, $d_H$.
■ Want tester to accept $x$ with probability $\ge 2/3$ if $x = 1^n$. Want tester to reject $x$ with probability $\ge 2/3$ if $\#\{i : x_i \ne 1\} > \epsilon n$.
■ Tester: Sample $2/\epsilon$ random locations $i \in [n]$. Accept iff for all such $i$, $x_i = 1$.
■ One-sided error. If $x$ is $\epsilon$-far from $\mathcal{P}$, $\Pr[\text{tester rejects}] \ge 1 - (1 - \epsilon)^{2/\epsilon} \ge 2/3$.
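The all-1's tester can be sketched as follows. This is a minimal illustration, assuming the input is a Python sequence and that $2/\epsilon$ is rounded up to an integer; the name `all_ones_tester` and the `rng` parameter are not from the slides.

```python
import math
import random

def all_ones_tester(x, eps, rng=random):
    """Accept (True) iff every sampled position of x holds a 1."""
    n = len(x)
    samples = math.ceil(2 / eps)      # number of queries, independent of n
    for _ in range(samples):
        i = rng.randrange(n)          # uniform position in [0, n-1]
        if x[i] != 1:
            return False              # a witness was found: reject
    return True                       # no witness seen: accept
```

The tester never rejects the all-1's string, which is the one-sided error property. If more than $\epsilon n$ positions differ from 1, each sample misses all of them with probability less than $1 - \epsilon$, so all samples miss with probability at most $(1 - \epsilon)^{2/\epsilon} \le e^{-2} < 0.14$.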

slide-18
SLIDE 18
slide-19
SLIDE 19

To show that an algorithm $\mathcal{A}$ is a tester for a property $\mathcal{P}$ with query complexity $q(\epsilon, n)$, you need to do three things:

  1. Prove that for any $x \in \mathcal{P}$, $\mathcal{A}$ accepts with probability $\ge 2/3$ (or 1 for one-sided).
  2. Prove that for any $x$ that is $\epsilon$-far from $\mathcal{P}$, $\mathcal{A}$ rejects with probability $\ge 2/3$.
  3. Prove that the number of queries is at most $q(\epsilon, n)$ for all inputs.

slide-20
SLIDE 20

$\mathcal{P}$ = monotonicity

■ Input: array of $n$ distinct numbers.
■ Array $A$ is monotone if $A[i] < A[j]$ when $i < j$.
■ Array $A$ is $\epsilon$-far from monotone if:

$\min_{\text{monotone } B} d_H(A, B) > \epsilon$

slide-21
SLIDE 21
slide-22
SLIDE 22

Test1(ε, n, A):
    for t = 1, …, q:
        choose random i ∈ [1, n − 1]
        output “NO” if A[i] > A[i+1]
    output “YES”

For what choice of $q$ is Test1 a tester for monotonicity?

slide-23
SLIDE 23

Test2(ε, n, A):
    for t = 1, …, q:
        choose random i ∈ [1, n − 1]
        choose random j ∈ [i + 1, n]
        output “NO” if A[i] > A[j]
    output “YES”

For what choice of $q$ is Test2 a tester for monotonicity?

slide-24
SLIDE 24

Test3(ε, n, A):
    for t = 1, …, 2/ε:
        choose random i ∈ [1, n]
        x ← A[i]
        output “NO” if binary search for x does not end at i
    output “YES”

Theorem: Test3 is a one-sided tester for monotonicity with query complexity $O((\log n)/\epsilon)$.

Proof components: query complexity, YES case, NO case.

slide-25
SLIDE 25

NO case analysis

Call a coordinate $i$ searchable if the binary search for $A[i]$ ends at $i$.

Claim 1: If $A$ is $\epsilon$-far from monotone, then the number of searchable $i$’s is at most $(1 - \epsilon) n$.

NO case done with this claim. Why?

slide-26
SLIDE 26

Proof of Claim 1

Claim 2: The array $A$ restricted to its searchable coordinates is monotone.

Claim 1 follows from Claim 2. Why?

slide-27
SLIDE 27

Proof of Claim 2

Claim 3: If $i < j$ and both $i$ and $j$ are searchable, then $A[i] < A[j]$.
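The claim can be checked empirically before proving it. The sketch below lists the searchable coordinates of an array of distinct numbers and tests whether the array restricted to them is increasing. It is a sanity check under the assumption of a standard midpoint binary search, not a proof, and the function names are illustrative.

```python
import random

def search_ends_at(A, i):
    """Binary-search A for the value A[i]; report whether it stops at i."""
    x = A[i]
    lo, hi = 0, len(A) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if A[mid] == x:
            return mid == i
        if A[mid] < x:
            lo = mid + 1
        else:
            hi = mid - 1
    return False

def searchable_coordinates(A):
    """Indices i for which the search for A[i] ends at i, in increasing order."""
    return [i for i in range(len(A)) if search_ends_at(A, i)]

def restricted_is_monotone(A):
    """Check that A is strictly increasing along its searchable coordinates."""
    vals = [A[i] for i in searchable_coordinates(A)]
    return all(a < b for a, b in zip(vals, vals[1:]))
```

Running `restricted_is_monotone(random.sample(range(200), 200))` on random permutations should always return `True`. The reason is visible in the code: two searches share a prefix of midpoints, and at the first midpoint where they part ways, the one that goes left holds the smaller value.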

slide-28
SLIDE 28

Some notes

■ Tester is adaptive, meaning that its queries may depend on the answers to its past queries.
■ It is possible to make the tester non-adaptive.
■ Test2 is a valid tester with query complexity $O(\epsilon^{-1})$ when the inputs are Boolean arrays.

slide-29
SLIDE 29
slide-30
SLIDE 30

Lower bounds on query complexity

Three common approaches:
■ Yao’s Minimax Principle
■ Gap-Preserving Reductions
■ Communication Complexity

slide-32
SLIDE 32

Lower bounds for randomized testers

■ Testers are randomized algorithms. You can think of a randomized algorithm as a random element of a collection of deterministic algorithms: $\mathcal{A} = \{A_1, A_2, A_3, \dots\}$
■ Showing limitations for randomized algorithms is usually trickier than for deterministic algorithms.

slide-33
SLIDE 33

■ For any randomized tester $T$ making $q$ queries, there exists an input $x$ such that: $\Pr_T[T(x) \text{ is wrong}] > \frac{1}{3}$
■ There exists a distribution $\mathcal{D}$ on inputs such that for any deterministic tester $T$ making $q$ queries: $\Pr_{x \sim \mathcal{D}}[T(x) \text{ is wrong}] > \frac{1}{3}$

slide-36
SLIDE 36

It suffices to come up with a distribution of inputs that is hard on average for any low-query deterministic tester.

Yao’s Minimax Principle: $\mathcal{P}$ is a property over objects. Suppose there are two distributions $\mathcal{E}_1$ and $\mathcal{E}_2$ such that:

  • $\Pr_{x \sim \mathcal{E}_1}[x \in \mathcal{P}] \ge 1 - \eta_1$
  • $\Pr_{x \sim \mathcal{E}_2}[x \text{ is } \epsilon\text{-far from } \mathcal{P}] \ge 1 - \eta_2$
  • For any deterministic algorithm $T$ making $q(n, \epsilon)$ queries: $\left| \Pr_{x \sim \mathcal{E}_1}[T \text{ accepts}] - \Pr_{x \sim \mathcal{E}_2}[T \text{ accepts}] \right| \le \eta_3$

If $\eta_1 + \eta_2 + \eta_3 < 1/3$, then the query complexity of testing $\mathcal{P}$ is more than $q(n, \epsilon)$.

slide-37
SLIDE 37

Example

Suppose $\mathcal{P} = \{1^n\}$. The query complexity of testing $\mathcal{P}$ is $\Omega(\epsilon^{-1})$.
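One way this bound can be derived from Yao's minimax principle is sketched below. The two distributions, the names $\mathcal{D}_{\mathrm{yes}}$ and $\mathcal{D}_{\mathrm{no}}$, the constants, and the assumptions $\epsilon \le 1/2$ and $n \ge 1/\epsilon$ are choices made for this sketch, not taken from the slides.

```latex
% YES distribution: the single string 1^n, so it lies in P with probability 1.
\mathcal{D}_{\mathrm{yes}}:\ x = 1^n
  \quad\Longrightarrow\quad \eta_1 = 0 .

% NO distribution: zero out a uniformly random set S of positions.
\mathcal{D}_{\mathrm{no}}:\ S \subseteq [n] \text{ uniform with } |S| = \lceil 2\epsilon n \rceil,\quad
  x_i = 0 \text{ for } i \in S,\ x_i = 1 \text{ otherwise}
  \quad\Longrightarrow\quad d_H(x, 1^n) \ge 2\epsilon > \epsilon,\ \ \eta_2 = 0 .

% A deterministic tester that has only seen 1's so far asks a fixed
% sequence of positions i_1, ..., i_q. It can behave differently on the
% two distributions only if some i_t lands in S (union bound):
\left| \Pr_{x \sim \mathcal{D}_{\mathrm{yes}}}[T \text{ accepts}]
     - \Pr_{x \sim \mathcal{D}_{\mathrm{no}}}[T \text{ accepts}] \right|
  \le \Pr_S\left[\exists t : i_t \in S\right]
  \le \frac{q \lceil 2\epsilon n \rceil}{n}
  \le q\left(2\epsilon + \frac{1}{n}\right)
  \le 3\epsilon q .

% So eta_3 <= 3*eps*q, and the condition of the principle holds when
\eta_1 + \eta_2 + \eta_3 \le 3\epsilon q < \frac{1}{3}
  \quad\text{whenever}\quad q < \frac{1}{9\epsilon} .
```

Under these assumptions, no tester making fewer than $1/(9\epsilon)$ queries can succeed, which gives the $\Omega(\epsilon^{-1})$ bound.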

slide-38
SLIDE 38

What about
$\mathcal{P} = \{0^n, 1^n\}$?
$\mathcal{P} = \{z\}$ for a fixed string $z \in \{0,1\}^n$?

slide-39
SLIDE 39

Example

Suppose $\mathcal{P} = \left\{ x \in \{0,1\}^n : |x| \le \frac{n}{2}(1 - \epsilon) \right\}$.

The query complexity of testing $\mathcal{P}$ is $\Omega(\epsilon^{-2})$.

slide-40
SLIDE 40

Takeaways

■ Property testing is about how you can uncover differences in global structure by using local queries.
■ To show correctness of a tester, you need to verify its query complexity and its performance on YES and NO input instances.
■ To prove lower bounds on query complexity via Yao’s minimax principle, you explicitly come up with a hard input distribution for deterministic testers.