CS5234: Algorithms at Scale
PROPERTY TESTING
Arnab BHATTACHARYYA (in lieu of Seth)
29/08/2019

Lecture Outline
■ What is property testing?
■ Identify what goes into showing correctness of a testing algorithm. Some examples.
■ Identify what goes into showing impossibility of fast testing. Some examples.

A motivating example
• DNA: strings over 4 characters {A, C, T, G}
• Problem: Given two DNA strands $X$ and $Y$, are they from the same species or from different species?

■ If $X$ and $Y$ are from the same species, then we expect the strings to be similar. Otherwise, not.
■ But similar in what sense? Need a metric.
– One possibility is the Levenshtein distance $d_L(X, Y)$ (the number of insertions, deletions, or substitutions needed to turn one string into the other)

Want an algorithm that outputs:
– SAME if $d_L(X, Y)$ is "small"
– DIFFERENT if $d_L(X, Y)$ is "large"
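For reference, the exact Levenshtein distance can be computed with the textbook dynamic program. The sketch below is only an illustration and is not taken from the slides; the function name `levenshtein` is a hypothetical choice. It takes time proportional to the product of the two string lengths.

```python
def levenshtein(x, y):
    """Edit distance between strings x and y via the classic row-by-row DP."""
    prev = list(range(len(y) + 1))  # distances from the empty prefix of x
    for i, cx in enumerate(x, start=1):
        curr = [i]  # turning the first i characters of x into the empty string
        for j, cy in enumerate(y, start=1):
            curr.append(min(
                prev[j] + 1,               # delete cx
                curr[j - 1] + 1,           # insert cy
                prev[j - 1] + (cx != cy),  # substitute (free if characters match)
            ))
        prev = curr
    return prev[-1]
```

Every cell of the table is filled in, which is why this exact approach does not scale to very long strands.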

For exactly computing $d_L$ on strings of length $n$, only $O(n^2)$ algorithms are known. Too expensive for bio applications.

Is there a more efficient algorithm that outputs
– SAME if $d_L(X, Y) \le T_1$
– DIFFERENT if $d_L(X, Y) \ge T_2$?

Indeed, there is! If $T_1$ and $T_2$ are sufficiently far apart, you only need to look at $\ll n$ characters in the strings to make the correct decision with high probability!

Property Testing Framework
Bad inputs are $\epsilon$-far from good, which means: for a distance function $d : \text{Inputs} \times \text{Inputs} \to [0,1]$, for any good $X$ and bad $Y$, $d(X, Y) > \epsilon$.

Definition. An algorithm is a tester for a property $\mathcal{P}$ if:
• The inputs are: integer $n > 0$, real $\epsilon \in (0,1)$, and query access to an object $x$ of size $n$
• It accepts with probability $\ge 2/3$ if $x \in \mathcal{P}$.
• It rejects with probability $\ge 2/3$ if $x$ is $\epsilon$-far from $\mathcal{P}$.

Query complexity: The number of query accesses made by the tester. Main focus of this course will be understanding the query complexity for various properties $\mathcal{P}$.

Data representation decides what is revealed by each query. For example, a graph can be represented as an adjacency matrix or as adjacency lists.

Distance function decides what is meant by $\epsilon$-far. The default choice is the Hamming distance. For two functions $f, g : [n] \to R$,
$d_H(f, g) = \frac{|\{i \in [n] : f(i) \ne g(i)\}|}{n}$.

Often, our testers will be one-sided, meaning the tester will accept with probability 1 if $x \in \mathcal{P}$.

A Simple Example
■ Inputs are strings of length $n$. Property $\mathcal{P}$ is satisfied only by the all-1's string. Distance measure is the Hamming distance, $d_H$.
■ Want tester to accept $x$ with probability $\ge 2/3$ if $x = 1^n$. Want tester to reject $x$ with probability $\ge 2/3$ if $\#\{i : x_i \ne 1\} > \epsilon n$.
■ Tester: Sample $2/\epsilon$ random locations $i \in [n]$. Accept iff for all such $i$, $x_i = 1$.
■ One-sided error. If $x$ is $\epsilon$-far from $\mathcal{P}$, $\Pr[\text{tester rejects}] \ge 1 - (1 - \epsilon)^{2/\epsilon} \ge 1 - e^{-2} \ge 2/3$.
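The tester on this slide is short enough to write out. The following is a minimal sketch, assuming the input is a Python list of 0/1 values and that $2/\epsilon$ is rounded up to an integer; the name `all_ones_tester` and the `rng` parameter are hypothetical.

```python
import math
import random


def all_ones_tester(x, eps, rng=random):
    """Accept (return True) iff every sampled position of x holds a 1."""
    n = len(x)
    num_queries = math.ceil(2 / eps)  # 2/eps samples, rounded up
    for _ in range(num_queries):
        i = rng.randrange(n)  # uniform location, sampled with replacement
        if x[i] != 1:
            return False  # found a witness, so x is certainly not all ones
    return True
```

The all-1's string is never rejected, which is the one-sided error property. The number of queries depends only on `eps`, not on the length of `x`.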

To show that an algorithm $\mathcal{A}$ is a tester for a property $\mathcal{P}$ with query complexity $q(\epsilon, n)$, you need to do three things:
1. Prove that for any $x \in \mathcal{P}$, $\mathcal{A}$ accepts with probability $\ge 2/3$ (or 1 for one-sided)
2. Prove that for any $x$ that is $\epsilon$-far from $\mathcal{P}$, $\mathcal{A}$ rejects with probability $\ge 2/3$
3. Prove that the number of queries is at most $q(\epsilon, n)$ for all inputs

$\mathcal{P}$ = monotonicity
■ Input: array of $n$ distinct numbers.
■ Array $A$ is monotone if $A[i] < A[j]$ when $i < j$.
■ Array $A$ is $\epsilon$-far from monotone if: $\min_{\text{monotone } B} d_H(A, B) > \epsilon$

Test1(ε, n, A):
    for t = 1, …, q:
        choose random i ∈ [1, n − 1]
        output "NO" if A[i] > A[i+1]
    output "YES"
For what choice of $q$ is Test1 a tester for monotonicity?
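One way to explore the question is to look for an input that is far from monotone yet has very few adjacent inversions. The sketch below uses a hypothetical example, `two_blocks`, which is not from the slides: the upper half of 1..n followed by the lower half. It assumes entries may be replaced by arbitrary real numbers, in which case the distance to monotone equals one minus the longest increasing subsequence length divided by $n$.

```python
import bisect


def adjacent_violations(a):
    """Count indices i with a[i] > a[i+1], the only pairs Test1 inspects."""
    return sum(1 for i in range(len(a) - 1) if a[i] > a[i + 1])


def distance_to_monotone(a):
    """1 - LIS(a)/n for distinct values (assumes replacements may be any reals)."""
    tails = []  # tails[k] = smallest tail of an increasing subsequence of length k+1
    for v in a:
        k = bisect.bisect_left(tails, v)
        if k == len(tails):
            tails.append(v)
        else:
            tails[k] = v
    return 1 - len(tails) / len(a)


def two_blocks(n):
    """Hypothetical hard input: upper half of 1..n followed by the lower half."""
    half = n // 2
    return list(range(half + 1, n + 1)) + list(range(1, half + 1))
```

For even $n$ this array is 1/2-far from monotone but has a single adjacent inversion, so one round of Test1 catches it with probability only $1/(n-1)$. This suggests $q$ would have to grow linearly in $n$.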

Test2(ε, n, A):
    for t = 1, …, q:
        choose random i ∈ [1, n − 1]
        choose random j ∈ [i + 1, n]
        output "NO" if A[i] > A[j]
    output "YES"
For what choice of $q$ is Test2 a tester for monotonicity?

Test3(ε, n, A):
    for t = 1, …, 2/ε:
        choose random i ∈ [1, n]
        x ← A[i]
        output "NO" if binary search for x does not end at i
    output "YES"
Theorem: Test3 is a one-sided tester for monotonicity with query complexity $O((\log n)/\epsilon)$.
To check: NO case, YES case, query complexity.

NO case analysis
Call a coordinate $i$ searchable if the binary search for $A[i]$ ends at $i$.
Claim 1: If $A$ is $\epsilon$-far from monotone, then the number of searchable $i$'s is at most $(1 - \epsilon)n$.
NO case done with this claim. Why?
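One way to answer the question, as a sketch: Test3 outputs "YES" only if every one of its $2/\epsilon$ independently sampled coordinates is searchable, and by Claim 1 each sample is searchable with probability at most $1 - \epsilon$.

```latex
\Pr[\text{Test3 outputs YES}]
  = \Pr[\text{all } 2/\epsilon \text{ sampled coordinates are searchable}]
  \le (1 - \epsilon)^{2/\epsilon}
  \le e^{-2}
  < \tfrac{1}{3}.
```

So an $\epsilon$-far array is rejected with probability greater than $2/3$.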

Proof of Claim 1
Claim 2: The array $A$ restricted to its searchable coordinates is monotone.
Claim 1 follows from Claim 2. Why?

Proof of Claim 2
Claim 3: If $i < j$ and both $i$ and $j$ are searchable, then $A[i] < A[j]$.
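Claims 2 and 3 can be sanity-checked empirically. This is an illustration, not a proof. The sketch below collects the searchable coordinates of a random permutation and checks that the values at those coordinates are increasing. The helper names are hypothetical, and indexing is 0-based.

```python
import random


def searchable_indices(a):
    """Indices i at which binary search for a[i] ends at i (distinct values assumed)."""
    result = []
    for i, x in enumerate(a):
        lo, hi = 0, len(a) - 1
        while lo <= hi:
            mid = (lo + hi) // 2
            if a[mid] == x:
                if mid == i:
                    result.append(i)
                break
            if a[mid] < x:
                lo = mid + 1
            else:
                hi = mid - 1
    return result


def is_increasing(values):
    """True iff the sequence is strictly increasing."""
    return all(u < v for u, v in zip(values, values[1:]))


rng = random.Random(0)
perm = list(range(1000))
rng.shuffle(perm)
searchable = searchable_indices(perm)
print(is_increasing([perm[i] for i in searchable]))
```

The intuition behind Claim 3 is that the two search paths for $A[i]$ and $A[j]$ split at some pivot lying between positions $i$ and $j$, and the values compared at that pivot force $A[i] < A[j]$.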

Some notes
■ Tester is adaptive, meaning that its queries may depend on the answers to its past queries.
■ It is possible to make the tester non-adaptive.
■ Test2 is a valid tester with query complexity $O(\epsilon^{-1})$ when the inputs are Boolean arrays.

Lower bounds on query complexity
Three common approaches:
• Yao's Minimax Principle
• Gap-Preserving Reductions
• Communication Complexity

Lower bounds for randomized testers
■ Testers are randomized algorithms. You can think of a randomized algorithm as a random element of a collection of deterministic algorithms: $\mathcal{A} = \{A_1, A_2, A_3, \dots\}$
■ Showing limitations for randomized algorithms is usually trickier than for deterministic algorithms

For any randomized tester $T$ making $q$ queries, there exists an input $x$ such that: $\Pr_T[T(x) \text{ is wrong}] > \frac{1}{3}$
There exists a distribution $\mathcal{D}$ on inputs such that for any deterministic tester $T$ making $q$ queries: $\Pr_{x \sim \mathcal{D}}[T(x) \text{ is wrong}] > \frac{1}{3}$

It suffices to come up with a distribution of inputs that is hard on average for any low-query deterministic tester.
Yao's Minimax Principle: $\mathcal{P}$ is a property over objects. Suppose there are two distributions $\mathcal{E}_1$ and $\mathcal{E}_2$ such that:
• $\Pr_{x \sim \mathcal{E}_1}[x \in \mathcal{P}] \ge 1 - \eta_1$
• $\Pr_{x \sim \mathcal{E}_2}[x \text{ is } \epsilon\text{-far from } \mathcal{P}] \ge 1 - \eta_2$
• For any deterministic algorithm $T$ making $q(n, \epsilon)$ queries: $\Pr_{x \sim \mathcal{E}_1}[T \text{ accepts}] - \Pr_{x \sim \mathcal{E}_2}[T \text{ accepts}] \le \eta_3$
If $\eta_1 + \eta_2 + \eta_3 < 1/3$, then the query complexity of testing $\mathcal{P}$ is more than $q(n, \epsilon)$.

Example
Suppose $\mathcal{P} = \{1^n\}$. The query complexity of testing $\mathcal{P}$ is $\Omega(\epsilon^{-1})$.
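One possible pair of hard distributions for this example (an assumption, since the transcript does not record which ones were used in the lecture): the first always outputs $1^n$, and the second outputs $1^n$ with a uniformly random set of more than $\epsilon n$ positions set to 0. A deterministic tester that has seen only 1's so far follows the same query sequence it would follow on $1^n$, so it can tell the two apart only if one of its $q$ query positions lands on a zero. The sketch below computes the exact probability that this does not happen; the function name is hypothetical.

```python
from math import comb


def prob_miss_all_zeros(n, num_zeros, q):
    """Probability that q fixed distinct positions all avoid a uniformly
    random set of num_zeros zero-positions among n positions."""
    return comb(n - q, num_zeros) / comb(n, num_zeros)
```

For instance, with $n = 1000$, $\epsilon = 0.1$, 101 zeros and $q = 2$ queries, the tester misses every zero with probability above 0.8, so the acceptance probabilities under the two distributions differ by less than 1/3. The same calculation with $q$ a small constant fraction of $1/\epsilon$ leads toward the $\Omega(\epsilon^{-1})$ bound.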

What about $\mathcal{P} = \{0^n, 1^n\}$? $\mathcal{P} = \{z\}$ for a fixed string $z \in \{0,1\}^n$?

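Example
Suppose $\mathcal{P} = \{x \in \{0,1\}^n : |x| \le \frac{n}{2}(1 - \epsilon)\}$, where $|x|$ denotes the number of 1's in $x$. The query complexity of testing $\mathcal{P}$ is $\Omega(\epsilon^{-2})$.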

Takeaways
■ Property testing is about how you can uncover differences in the global structure by using local queries.
■ For showing correctness of a tester, you need to verify its query complexity and its performance on YES and NO input instances.
■ For proving lower bounds on the query complexity via Yao's minimax principle, you explicitly come up with a hard input distribution for deterministic testers.
