Restriction Access, Population Recovery & Partial Identification
Avi Wigderson, IAS, Princeton
Joint with Zeev Dvir, Anup Rao, Amir Yehudayoff

Restriction Access
A new model of "grey-box" access

Systems, Models, Observations
From input-output pairs (I1,O1), (I2,O2), (I3,O3), …? Typically we can observe more!
Black-box access: Successes & Limits
Learning: PAC, membership, statistical… queries. Decision trees, DNFs?
Cryptography: semantic, CPA, CCA, … security. Cold boot, microwave, … attacks?
Optimization: membership, separation, … oracles. Strongly polynomial algorithms?
Pseudorandomness: Hardness vs. Randomness. Derandomizing specific algorithms?
Complexity: Σ2 = NP^NP. What problems can we solve if P = NP?
The gray scale of access
f: Σ^n → Σ^m; D: "device" computing f (from a family of devices)
[Figure: a black box D returning samples x1, f(x1); x2, f(x2); x3, f(x3); …]
How to model? Many specific ideas. Ours: general, clean
Black Box – natural starting point
Gray Box – natural intermediate point
Clear Box
Restriction Access (RA)
f: Σ^n → Σ^m; D: "device" computing f
Restriction: ρ = (x, L), L ⊆ [n], x ∈ Σ^n, L the set of live variables
Observations: (ρ, D|ρ), where D|ρ (D simplified after fixing the variables outside L according to x) computes f|ρ on L
Black box: L = ∅; Clear box: L = [n]; Gray box: everything in between
[Figure: observations along the gray scale. Black box: (x, f(x)); restriction access: (ρ, D|ρ); clear box: (x, D).]
Example: Decision Tree
[Figure: a decision tree D querying x1, x4, x2, x3 with 0/1 leaves. Under the restriction ρ = (x, L) with L = {3,4} and x = (1,0,1,0), the queries to the fixed variables x1, x2 collapse, leaving the simplified tree D|ρ that queries only x4 and x3.]
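To make the restriction operation concrete, here is a minimal Python sketch (illustrative only; the Node class and restrict function are hypothetical names, not from the talk):

    # Minimal sketch of restricting a decision tree (hypothetical names).
    # A node either is a leaf holding a value, or queries variable `var`
    # and branches to `low` (on 0) or `high` (on 1).
    class Node:
        def __init__(self, var=None, low=None, high=None, value=None):
            self.var, self.low, self.high, self.value = var, low, high, value

    def restrict(node, x, live):
        """Return D|rho for rho = (x, live): variables outside `live` are
        fixed to the bits of x (x: mapping from variable index to bit)."""
        if node.var is None:                  # leaf: nothing to simplify
            return node
        if node.var in live:                  # live variable: keep the query
            return Node(var=node.var,
                        low=restrict(node.low, x, live),
                        high=restrict(node.high, x, live))
        # fixed variable: follow the branch dictated by x, dropping the query
        child = node.high if x[node.var] == 1 else node.low
        return restrict(child, x, live)

    # With live = {3, 4} and x = {1: 1, 2: 0, 3: 1, 4: 0}, every query to x1
    # or x2 collapses, leaving a smaller tree over x3 and x4, as in the slide.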
Modeling choices (RA-PAC)
Restriction: ρ = (x, L), L ⊆ [n], x ∈ Σ^n; D unknown
Input x: friendly, adversarial, or random from an unknown distribution (as in PAC)
Live variables L: friendly, adversarial, or random from a µ-independent distribution (as in random restrictions)
RA-PAC Results
Probably Approximately Correct (PAC) learning of D from restrictions in which each variable remains alive with probability µ.
Thm 1 [DRWY]: A poly(s, µ) algorithm for RA-PAC learning size-s decision trees, for every µ > 0.
(reconstruction from pairs of live variables)
Thm 2 [DRWY]: A poly(s, µ) algorithm for RA-PAC learning size-s DNFs, for every µ > .365…
(reduction to the "Population Recovery Problem") Positive results, in contrast to PAC learning!
Population Recovery
(learning a mixture of binomials)
Population Recovery Problem
k species, n attributes from Σ
Vectors v1, v2, …, vk ∈ Σ^n
Distribution p1, p2, …, pk
µ, ε > 0
Task: recover all vi, pi (up to ε) from samples
Example (n = 4, k = 3):
v1 = 0000, p1 = 1/2
v2 = 0110, p2 = 1/3
v3 = 1100, p3 = 1/6
(the vi and pi are unknown; only the samples are observed)
Population Recovery Problem
k species, n attributes from Σ; µ, ε > 0
v1, v2, …, vk ∈ Σ^n; p1, p2, …, pk their fractions in the population
Task: recover all vi, pi (up to ε) from samples
Samplers: (1) draw u ← vi with probability pi
µ-Lossy Sampler: (2) u(j) ← ? with probability 1-µ, for every j ∈ [n]
µ-Noisy Sampler: (2) u(j) is flipped with probability 1/2-µ, for every j ∈ [n]
Example: from the population v1 = 0000 (p1 = 1/2), v2 = 0110 (p2 = 1/3), v3 = 1100 (p3 = 1/6), the draw 0110 may appear as the lossy sample ?1?0, while 1100 may survive intact.
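For illustration (not part of the slides), a small Python sketch of the two samplers, using the example population above:

    import random

    # Example population from the slide: (vector, fraction) pairs.
    population = [("0000", 1/2), ("0110", 1/3), ("1100", 1/6)]

    def draw(population):
        """Draw a species v_i with probability p_i."""
        vs, ps = zip(*population)
        return random.choices(vs, weights=ps)[0]

    def lossy_sample(population, mu):
        """mu-lossy sampler: each coordinate survives independently with
        probability mu and is replaced by '?' otherwise."""
        v = draw(population)
        return "".join(c if random.random() < mu else "?" for c in v)

    def noisy_sample(population, mu):
        """mu-noisy sampler: each bit is flipped independently with
        probability 1/2 - mu."""
        v = draw(population)
        return "".join(c if random.random() >= 0.5 - mu else str(1 - int(c))
                       for c in v)

    # lossy_sample(population, 0.4) might return "?1?0" when v2 = 0110 is drawn.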
Loss – Paleontology
True data [figure]: five species with frequencies 26%, 11%, 13%, 30%, 20%
Loss – Paleontology
From samples [figure]: Dig #1, Dig #2, Dig #3, Dig #4, … Each finding is common to many species! How do they do it?
Noise – Privacy
True data [table]:
2%: 0 1 1 0 1 0 0
1%: 1 1 0 0 0 1 1
……
From samples [table]:
Joe:  0 0 0 0 0 1 1
Jane: 0 0 0 0 1 1 1
…… who flipped every correct answer with probability 49%. Deniability? Recovery?
PRP - applications
Recovering from loss & noise
- Clustering / Learning / Data mining
- Computational biology / Archeology / ……
- Error correction
- Database privacy
- ……
Numerous related papers & books
PRP - Results
Facts: µ=0 obliterates all information.
- No polytime algorithm for µ = o(1)
Thm 3 [DRWY]: A poly(k, n, ε) algorithm, from lossy samples, for every µ > .365…
Thm 4 [WY]: A poly(k^(log k), n, ε) algorithm, from lossy and/or noisy samples, for every µ > 0
[Kearns, Mansour, Ron, Rubinfeld, Schapire, Sellie]: exp(k) algorithm for this discrete version
[Moitra, Valiant]: exp(k) algorithm for the Gaussian version (even when the noise is unknown)
Proof of Thm 4
Reconstruct vi, pi from samples.
Lemma 1: Can assume we know the vi's! Proof: expose one column at a time.
Lemma 2: Easy in exp(n) time! Proof: lossy – enough samples arrive without any "?"; noisy – linear algebra on the sample probabilities.
Idea: make n = O(log k) [Dimension Reduction]
Example: v1 = 0000 (p1 = 1/2), v2 = 0110 (p2 = 1/3), v3 = 1100 (p3 = 1/6); lossy samples observed: ?1?0, 0??0, 1100.
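As a sketch of the lossy half of Lemma 2 (an illustration with assumed names, not the talk's code): a sample from species i is fully intact with probability µ^n, so the '?'-free samples directly reveal the vi and their frequencies, at an exp(n) sample cost.

    from collections import Counter

    def recover_from_lossy(samples, mu):
        """exp(n)-time idea: keep only the '?'-free samples.  A sample from
        species i survives fully intact with probability mu**n, so
        count(v_i) / (N * mu**n) estimates p_i."""
        n = len(samples[0])
        intact = Counter(s for s in samples if "?" not in s)
        N = len(samples)
        return {v: c / (N * mu ** n) for v, c in intact.items()}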
Partial IDs
a new dimension-reduction technique
Dimension Reduction and small IDs
Si ⊆ [n] is an ID for vi if no other vj agrees with vi on Si.
Lemma: Can approximate pi in exp(|Si|) time! Does one always have small IDs?
      1 2 3 4 5 6 7 8
v1:   0 0 0 0 0 1 0 1   p1
v2:   0 1 1 0 1 0 1 0   p2
v3:   0 1 0 0 1 0 1 1   p3
v4:   1 1 1 0 1 0 1 1   p4
v5:   1 1 0 0 0 1 1 1   p5
v6:   1 1 0 0 1 0 0 1   p6
v7:   0 1 0 0 0 1 1 1   p7
v8:   1 1 0 1 1 0 1 1   p8
v9:   1 1 0 0 0 1 1 1   p9
IDs: S1 = {1,2}, S2 = {8}, S3 = {1,5,6}; n = 8, k = 9
u – a random sample; qi = Pr[u[Si] = vi[Si]]
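A sketch of the lemma in Python (assumed names; µ-lossy setting): when Si is an ID for vi, only vi itself can match on Si, so counting samples whose Si-coordinates survived and agree with vi estimates pi at an exp(|Si|) sample cost.

    def estimate_p(samples, v, S, mu):
        """If S is an ID of v (no other species agrees with v on S), a lossy
        sample matches v on all of S with probability p_i * mu**|S|.
        Coordinates in S are 0-based here."""
        hits = sum(all(u[j] == v[j] for j in S) for u in samples)
        return hits / (len(samples) * mu ** len(S))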
Small IDs ?
NO! However,…
      1 2 3 4 5 6 7 8
v1:   1 0 0 0 0 0 0 0   p1
v2:   0 1 0 0 0 0 0 0   p2
v3:   0 0 1 0 0 0 0 0   p3
v4:   0 0 0 1 0 0 0 0   p4
v5:   0 0 0 0 1 0 0 0   p5
v6:   0 0 0 0 0 1 0 0   p6
v7:   0 0 0 0 0 0 1 0   p7
v8:   0 0 0 0 0 0 0 1   p8
v9:   0 0 0 0 0 0 0 0   p9
IDs: S1 = {1}, S2 = {2}, S3 = {3}, …, S8 = {8}, but S9 = {1,2,…,8}; n = 8, k = 9
Linear algebra & Partial IDs
However, we can compute p9 = 1 - p1 - p2 - … - p8
(same matrix as above)
IDs: S1 = {1}, S2 = {2}, S3 = {3}, …, S8 = {8}, now taking S9 = ∅; n = 8, k = 9
Back substitution and Imposters
Can use back substitution if there are no cycles! Are there always acyclic small partial IDs?
      1 2 3 4 5 6 7 8
v1:   0 0 1 0 0 1 0 1   p1
v2:   0 1 1 0 1 0 1 0   p2
v3:   0 1 0 0 1 0 1 1   p3
v4:   1 1 1 0 1 0 1 1   p4
v5:   1 1 0 0 0 1 1 1   p5
v6:   1 1 0 0 1 0 0 1   p6
v7:   0 1 0 0 0 1 1 1   p7
v8:   1 1 0 1 1 0 1 1   p8
v9:   1 1 0 0 0 1 1 1   p9
PIDs: S1 = {1,2}, S2 = {8}, S3 = {1,5,6}, S4 = {3} (a PID may be any subset)
u – a random sample; qi = Pr[u[Si] = vi[Si]]
e.g. q4 = p4 + p1 + p2, since v1, v2 are imposters of v4 on S4 = {3}, so p4 = q4 - p1 - p2
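A sketch of back substitution over partial IDs (illustrative Python, hypothetical names): process the species so that each one's imposters come earlier; then pi is qi minus the already-recovered imposter probabilities, exactly as in p4 = q4 - p1 - p2 above.

    def back_substitute(vs, pids, q):
        """vs[i]: vector of species i; pids[i]: its partial ID (set of
        coordinates); q[i]: estimate of Pr[u[S_i] = v_i[S_i]].  Assumes the
        species are indexed so that every imposter of v_i appears before i
        (guaranteed when the PID graph is acyclic)."""
        p = {}
        for i, (v, S) in enumerate(zip(vs, pids)):
            imposters = [j for j in range(len(vs)) if j != i
                         and all(vs[j][c] == v[c] for c in S)]
            p[i] = q[i] - sum(p[j] for j in imposters)
        return p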
Acyclic small partial IDs exist
Lemma: There is always some vi with an ID of size at most log k
      1 2 3 4 5 6 7 8
v1:   0 0 0 0 0 0 0 1   p1
v2:   0 1 1 0 1 0 1 0   p2
v3:   1 1 0 0 1 0 1 1   p3
v4:   1 1 1 0 1 0 1 1   p4
v5:   1 1 0 0 0 1 1 1   p5
v6:   1 1 0 0 1 0 0 1   p6
v7:   1 1 1 1 1 0 1 1   p7
v8:   0 1 0 0 0 1 1 1   p8
v9:   0 1 0 0 1 1 1 1   p9
PIDs: S8 = {1,5,6}; n = 8, k = 9
Idea: Remove it and iterate to find more PIDs. Lemma: Acyclic (log k)-PIDs always exist!
Chains of small Partial IDs
Compute qi = Pr[u[i] = 1] = Σj≤i pj from the samples u.
Back substitution: pi = qi - Σj<i pj.
Problem: long chains! The error doubles at each step, so it is exponential in the chain length. Want: short chains!

      1 2 3 4 5 6 7 8
v1:   1 1 1 1 1 1 1 1   p1
v2:   0 1 1 1 1 1 1 1   p2
v3:   0 0 1 1 1 1 1 1   p3
v4:   0 0 0 1 1 1 1 1   p4
v5:   0 0 0 0 1 1 1 1   p5
v6:   0 0 0 0 0 1 1 1   p6
PIDs: S1 = {1}, S2 = {2}, S3 = {3}, …, S6 = {6}; n = 8, k = 6
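One way to see the error-doubling claim (a sketch, not spelled out on the slides): suppose each q̂i is estimated to within δ and set p̂i = q̂i - Σj<i p̂j. Then the errors ei = |p̂i - pi| satisfy

    e_i \le \delta + \sum_{j<i} e_j \quad\Longrightarrow\quad e_i \le 2^{\,i-1}\delta

by induction, so the error can grow exponentially in the chain length; chains of length O(log k) keep it poly(k).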
The PID (imposter) graph
Given: V = (v1, v2, …, vk), vi ∈ Σ^n, and S = (S1, S2, …, Sk), Si ⊆ [n].
Construct G(V;S) by connecting vj → vi iff vi is an imposter of vj: vi[Sj] = vj[Sj].

      1 2 3 4 5 6 7 8
v1:   1 1 1 1 1 1 1 1
v2:   0 1 1 1 1 1 1 1
v3:   0 0 1 1 1 1 1 1
v4:   0 0 0 1 1 1 1 1
v5:   0 0 0 0 1 1 1 1
PIDs: S1 = {1}, S2 = {2}, S3 = {3}, …, S5 = {5}
width = maxi |Si|; depth = depth(G). Want: PIDs with small width and depth for every V.
In this example: vi → vj iff i > j.
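A small Python sketch (assumed names) of building G(V;S) and measuring its width and depth as defined above:

    def pid_graph(vs, pids):
        """Edge j -> i iff v_i is an imposter of v_j: v_i[S_j] = v_j[S_j]."""
        k = len(vs)
        return {j: [i for i in range(k) if i != j
                    and all(vs[i][c] == vs[j][c] for c in pids[j])]
                for j in range(k)}

    def width(pids):
        return max(len(S) for S in pids)

    def depth(edges):
        """Length of the longest directed path (graph assumed acyclic)."""
        memo = {}
        def d(j):
            if j not in memo:
                memo[j] = 1 + max((d(i) for i in edges[j]), default=0)
            return memo[j]
        return max(d(j) for j in edges)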
Constructing cheap PID graphs
Theorem: For every V = (v1, v2, …, vk), vi ∈ Σ^n, we can efficiently find PIDs S = (S1, S2, …, Sk), Si ⊆ [n], of width and depth at most log k.
Algorithm: Initialize Si = ∅ for all i.
Invariant: |imposters(vi; Si)| ≤ k / 2^|Si|
Repeat:
(1) Make each Si maximal: if not, add minority coordinates to Si.
(2) Make chains monotone: if vj → vi then |Sj| < |Si| (so G is acyclic); if not, set Si to Sj (and apply (1) to Si).
(A sketch of step (1) appears after the example below.)
Example (n = 4, k = 6):
      1 2 3 4
v1:   0 0 1 0
v2:   0 0 0 0
v3:   0 0 0 1
v4:   1 0 0 1
v5:   1 1 1 0
v6:   1 0 1 0
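A Python sketch of step (1) as I read it (hypothetical names; step (2) would additionally reset Si := Sj and re-run this whenever an edge vj → vi violates |Sj| < |Si|):

    def refine_pid(vs, i, S_init=()):
        """Step (1): while v_i still has imposters on S_i, add a coordinate on
        which v_i holds the minority value among those imposters.  Each added
        coordinate at least halves the imposter set, maintaining
        |imposters(v_i; S_i)| <= k / 2**|S_i|, so |S_i| stays around log2(k)."""
        k, n = len(vs), len(vs[0])
        S = set(S_init)
        while True:
            imp = [j for j in range(k) if j != i
                   and all(vs[j][c] == vs[i][c] for c in S)]
            if not imp:
                return S          # no imposters left: S is a full ID for v_i
            minority = [c for c in range(n) if c not in S and
                        2 * sum(vs[j][c] == vs[i][c] for j in imp) <= len(imp)]
            if not minority:
                return S          # maximal: S is a partial ID for v_i
            S.add(minority[0])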
Analysis of the algorithm
Theorem: For every V = (v1, v2, …, vk), vi ∈ Σ^n, we can efficiently find PIDs S = (S1, S2, …, Sk), Si ⊆ [n], of width and depth at most log k.
Algorithm: Initialize Si = ∅ for all i.
Invariant: |imposters(vi; Si)| ≤ k / 2^|Si|
Repeat: (1) Make each Si maximal. (2) Make chains monotone (if vj → vi then |Sj| < |Si|).
Analysis:
- |Si| ≤ log k throughout, for all i
- Σi |Si| increases at each step
- Termination within k log k steps
- width ≤ log k, and hence depth ≤ log k
Conclusions
- Restriction access: a new, general model of
“gray box” access (largely unexplored!)
- A general problem of population recovery
- Efficient reconstruction from loss & noise
- Partial IDs, a new dimension reduction