Homomorphic Sketches Shrinking Big Data without Sacrificing - PowerPoint PPT Presentation

Homomorphic Sketches: Shrinking Big Data without Sacrificing Structure
Andrew McGregor, University of Massachusetts

Testing equality: Can test whether two n-bit files are identical by comparing O(log n)-bit fingerprints of each file.
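The slide does not say which fingerprint is meant. One standard choice, used here purely as an illustration, is a polynomial (Karp-Rabin style) fingerprint: treat the file as the coefficients of a polynomial and evaluate it at a shared random point r modulo a prime. The modulus `P`, the seed, and the function name `fingerprint` are assumptions made for this sketch.

```python
import random

# Assumption: a fixed 61-bit Mersenne prime as modulus; a fingerprint is one
# residue mod P, i.e. 61 bits regardless of file length.
P = (1 << 61) - 1

def fingerprint(data: bytes, r: int) -> int:
    """Evaluate data[0] + data[1]*r + data[2]*r^2 + ... modulo P (Horner's rule)."""
    acc = 0
    for byte in reversed(data):
        acc = (acc * r + byte) % P
    return acc

# Both parties must use the same random evaluation point r (shared randomness).
rng = random.Random(2024)
r = rng.randrange(1, P)

a = b"the quick brown fox"
b = b"the quick brown fox"
c = b"the quick brown fix"   # differs from a in exactly one byte

print(fingerprint(a, r) == fingerprint(b, r))  # identical files always agree
print(fingerprint(a, r) == fingerprint(c, r))  # differing files collide only with small probability
```

Identical files always produce identical fingerprints. Two distinct files of the same length n collide with probability at most (n - 1)/P over the choice of r, since their difference is a non-zero polynomial of degree below n. Files that differ only by trailing zero bytes would need the length included in the fingerprint.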

Approximate matching: More generally, can construct sketches of files to estimate the Hamming distance between the files. Many related results exist, covering distinct elements, entropy, frequency moments, quantiles, histograms, linear regression, clustering, shape approximation, ...

[Figure: a sketch matrix M applied to a long vector v gives the much shorter vector Mv.]
Basic Idea: Treat the file as a vector; use linear projections to reduce dimension while preserving properties.
Extensive theory with connections to compressed sensing and metric embeddings; widely applicable since linear sketches are parallelizable and suitable for stream processing.
Most existing work concerns numerical statistics of data such as frequency and feature vectors...
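As a small illustration of the linear-projection idea, the sketch below uses a random ±1 matrix (an AMS/Johnson-Lindenstrauss style projection, chosen here as an assumption; the slide does not name a specific matrix). Because the map is linear, the sketch of a difference is the difference of the sketches, and its average squared entry is an unbiased estimate of the squared distance, which for bit vectors equals the Hamming distance. All names and parameters are hypothetical.

```python
import random

def make_projection(k, n, seed):
    """k x n matrix of independent random signs; the seed is shared by all parties."""
    rng = random.Random(seed)
    return [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(k)]

def sketch(M, v):
    """The linear sketch Mv."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def estimate_sq_norm(s):
    """Average squared sketch entry: an unbiased estimate of the squared norm of v."""
    return sum(x * x for x in s) / len(s)

n, k = 64, 32
M = make_projection(k, n, seed=7)

rng = random.Random(1)
a = [rng.randint(0, 1) for _ in range(n)]
b = list(a)
for i in (3, 17, 40, 41, 60):
    b[i] ^= 1                      # flip 5 bits, so the Hamming distance is 5

sa, sb = sketch(M, a), sketch(M, b)
diff = [x - y for x, y in zip(sa, sb)]

# Linearity: M a - M b equals M (a - b), so distances can be estimated from sketches alone.
assert diff == sketch(M, [x - y for x, y in zip(a, b)])
print(estimate_sq_norm(diff))      # a noisy estimate of the Hamming distance 5
```

The same linearity is what makes such sketches mergeable across machines and updatable under stream insertions and deletions: each update adds or subtracts one column of M.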

Is it possible to analyze richer combinatorial and group-theoretic structure via linear sketches? Can we make compression “homomorphic” and run algorithms on sketched data?
[Diagram: BIG DATA is compressed to small data; running the algorithm on either one yields the same ANSWER.]

Suppose n files encode rows of an adjacency matrix, e.g., each file is a list of friends in a social network. Theorem: Can check graph connectivity with O(polylog n) bit fingerprints of each file.

[Figure: the text “The quick brown fox jumped over the lazy dog.” and its cyclic rotation “quick brown fox jumped over the lazy dog. The”; a cyclic rotation of the file corresponds to an operation on its fingerprint.]
Hamming distance isn’t robust to misalignments.
Theorem: Can check equality of files up to rotation with fingerprints of length D(n) polylog n, where D(n) is the number of divisors of n.
More generally, we have homomorphic fingerprints: given a fingerprint, can compute the fingerprint of any rotation.

I. Connectivity
Outline: I. Connectivity, II. Misalignment
a) Connectivity via O(polylog n)-bit Fingerprints
b) Extension to Estimating Cuts and Eigenvalues
Joint work with Kook Jin Ahn and Sudipto Guha

Sketches for Connectivity
• Theorem: Can check graph connectivity w.h.p. using an O(polylog n)-bit fingerprint of each adjacency list.
• Corollary: Can monitor connectivity in a dynamic graph stream where edges are both inserted and deleted.
• Note: Previous stream work assumed no edge deletions, e.g.,
  [Feigenbaum, Kannan, McGregor, Suri, Zhang 2004, 2005], [McGregor 2005],
  [Jowhari, Ghodsi 2005], [Zelke 2008], [Sarma, Gollapudi, Panigrahy 2008, 2009],
  [Ahn, Guha 2009, 2011], [Konrad, Magniez, Mathieu 2012], [Goel, Kapralov, Khanna 2012]

This can’t be possible?!
• Suppose there’s a bridge (u,v) in the graph, i.e., Alice and Bob have a friendship that is essential to global connectivity.
• It seems that at least one of their fingerprints needs Ω(n) bits:
  ‣ One of their fingerprints must contain info about the bridge.
  ‣ Alice and Bob don’t know their friendship is special.
  ‣ Alice and Bob may each have Ω(n) friends.

How we do it...
• Template: Exploit homomorphic properties of linear sketches and emulate a classical algorithm in sketch space.
[Diagram: the Original Graph is sketched into Sketch Space; running the Algorithm on the graph or on the sketches leads to the same ANSWER.]

Ingredient 1: Basic Algorithm
Algorithm (Spanning Forest):
1. For each node: pick an incident edge.
2. For each connected component: pick an incident edge.
3. Repeat until there are no edges between connected components.
Lemma: After O(log n) rounds, the selected edges include a spanning forest.
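The rounds above follow the pattern of Borůvka's algorithm. A minimal, non-sketched version might look as follows; the union-find helper and the choice of "first edge seen" as the picked edge are assumptions of this sketch, and any incident edge would do.

```python
def spanning_forest(n, edges):
    """Round-based spanning forest: every component picks one outgoing edge, then merges."""
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    forest, rounds = [], 0
    while True:
        pick = {}                            # component root -> one edge leaving it
        for u, v in edges:
            ru, rv = find(u), find(v)
            if ru != rv:
                pick.setdefault(ru, (u, v))
                pick.setdefault(rv, (u, v))
        if not pick:                         # no edges between components: done
            return forest, rounds
        rounds += 1
        for u, v in pick.values():
            ru, rv = find(u), find(v)
            if ru != rv:                     # skip picks made redundant earlier in this round
                parent[ru] = rv
                forest.append((u, v))

# Example: a 4-cycle, a separate edge, and an isolated node.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (4, 5)]
forest, rounds = spanning_forest(7, edges)
print(len(forest), rounds)
```

Every component that still has an outgoing edge merges with at least one other component in each round, so the number of such components at least halves per round, which is where the O(log n) bound in the lemma comes from.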

Ingredient 2: Sketching Neighborhoods
For node i, let a_i be the vector indexed by node pairs. Non-zero entries: a_i[i,j] = 1 if j > i and a_i[i,j] = −1 if j < i, for each edge {i,j}.
[Figure: example graph on nodes 1, 2, 3, 4, 5.]
            {1,2} {1,3} {1,4} {1,5} {2,3} {2,4} {2,5} {3,4} {3,5} {4,5}
a_1       = (  1     1     0     0     0     0     0     0     0     0 )
a_2       = ( −1     0     0     0     1     0     0     0     0     0 )
a_1 + a_2 = (  0     1     0     0     1     0     0     0     0     0 )
Lemma: For any subset of nodes S ⊂ V, support(Σ_{i∈S} a_i) = E(S, V \ S).
Lemma: There exists a random M: ℝ^N → ℝ^{polylog N} such that for any a ∈ ℝ^N, can deduce some e ∈ support(a) from Ma. [Jowhari, Saglam, Tardos 2011]
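The first lemma can be checked directly on a small graph. In the sketch below, the edges {1,2}, {1,3}, {2,3} match the slide's vectors a_1 and a_2; the remaining edges {3,4} and {4,5} are hypothetical additions made only to have non-trivial cuts. Function names are illustrative.

```python
from itertools import combinations

def pair_index(n):
    """Map each node pair (i, j) with i < j (1-based) to a coordinate, in lexicographic order."""
    return {pair: k for k, pair in enumerate(combinations(range(1, n + 1), 2))}

def node_vector(i, edges, index):
    """a_i has +1 at {i,j} if j > i and -1 if j < i, for every edge {i,j}; zero elsewhere."""
    a = [0] * len(index)
    for u, v in edges:
        if i in (u, v):
            j = v if u == i else u
            a[index[(min(i, j), max(i, j))]] = 1 if j > i else -1
    return a

def support_of_sum(S, edges, index):
    """Support of the sum of a_i over i in S, returned as a set of node pairs."""
    total = [0] * len(index)
    for i in S:
        total = [x + y for x, y in zip(total, node_vector(i, edges, index))]
    pairs = list(index)                      # coordinate -> node pair
    return {pairs[k] for k, x in enumerate(total) if x != 0}

n = 5
index = pair_index(n)
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]   # last two edges are assumed

print(node_vector(1, edges, index))          # [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
print(node_vector(2, edges, index))          # [-1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
print(support_of_sum({1, 2}, edges, index))  # the edges leaving {1, 2}
```

An edge with both endpoints inside S contributes +1 from one endpoint and −1 from the other, so it cancels; only edges crossing the cut survive in the sum.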

Recipe: Sketch & Compute on Sketches
Sketch for node j: M a_j
Run the algorithm in sketch space:
  Use M a_j to get an incident edge on each node j.
  For i = 2 to log n: to get an incident edge on component S ⊂ V, use
    Σ_{j∈S} M a_j = M(Σ_{j∈S} a_j) → e ∈ support(Σ_{j∈S} a_j) = E(S, V \ S)
Detail: Actually each player sends log n independent sketches M_1 a_j, M_2 a_j, ... and the central player uses M_i a_j when emulating the i-th iteration of the algorithm.
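One step of this recipe can be emulated with a toy linear sketch. The class below is a simplified stand-in for the sampler cited on the previous slide, not that construction: it subsamples coordinates at geometric rates and attempts 1-sparse recovery at each level, verified by a polynomial fingerprint. The class name, the modulus `P`, the seeds, and the example graph are assumptions. For brevity the sampling depths are stored in a table, whereas a space-efficient version would derive them from a hash function.

```python
import random
from itertools import combinations

P = (1 << 61) - 1   # assumed prime modulus for the verification fingerprint

class EdgeSketch:
    """Toy linear sketch of an integer vector of length N; equal seeds mean the same matrix M."""
    def __init__(self, N, seed):
        rng = random.Random(seed)
        self.N, self.seed = N, seed
        self.L = N.bit_length() + 1                  # number of sampling levels
        self.r = rng.randrange(2, P)
        # Coordinate k is kept at levels 0..depth[k]; level l keeps each k w.p. 2^-l.
        self.depth = [self._coin_flips(rng) for _ in range(N)]
        # Per level: sum of a_k, sum of k*a_k, sum of a_k * r^k mod P.
        self.cells = [[0, 0, 0] for _ in range(self.L)]

    def _coin_flips(self, rng):
        d = 0
        while d < self.L - 1 and rng.random() < 0.5:
            d += 1
        return d

    def update(self, k, delta):
        """Add delta to coordinate k (delta may be negative, e.g. an edge deletion)."""
        for lvl in range(self.depth[k] + 1):
            c = self.cells[lvl]
            c[0] += delta
            c[1] += k * delta
            c[2] = (c[2] + delta * pow(self.r, k, P)) % P

    def __add__(self, other):
        """Sketch of the sum of two vectors, computed from their sketches alone."""
        assert self.seed == other.seed and self.N == other.N
        out = EdgeSketch(self.N, self.seed)
        out.cells = [[x + y, u + v, (s + t) % P]
                     for (x, u, s), (y, v, t) in zip(self.cells, other.cells)]
        return out

    def recover(self):
        """Return some coordinate in the support, or None if no level is 1-sparse."""
        for s0, s1, s2 in self.cells:
            if s0 != 0 and s1 % s0 == 0:
                k = s1 // s0
                if 0 <= k < self.N and (s0 * pow(self.r, k, P)) % P == s2:
                    return k
        return None

n = 5
pairs = list(combinations(range(1, n + 1), 2))
index = {p: k for k, p in enumerate(pairs)}
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]     # hypothetical example graph

def node_sketch(i, seed):
    """M a_i for node i, built from its adjacency list only."""
    s = EdgeSketch(len(pairs), seed)
    for u, v in edges:
        if i in (u, v):
            j = v if u == i else u
            s.update(index[(u, v)], 1 if j > i else -1)
    return s

sketches = {i: node_sketch(i, seed=42) for i in range(1, n + 1)}
combined = sketches[1] + sketches[2] + sketches[3]   # sketch of a_1 + a_2 + a_3
k = combined.recover()
print(pairs[k] if k is not None else None)           # the single edge leaving {1, 2, 3}
```

For S = {1, 2, 3} the summed vector has a single non-zero entry, so recovery succeeds at level 0. When a cut has several edges, a single toy sketch can return None; a practical version repeats with independent seeds, in the same spirit as the log n independent sketches mentioned in the detail above.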

Extension to Sparsification
• Theorem: Can test k-connectivity using O(k polylog n)-bit fingerprints of each adjacency list.
• Theorem: Can (1+ε)-approximate every graph cut using O(ε^-2 polylog n)-bit fingerprints of each adjacency list.
• Theorem: Can construct a spectral sparsifier H using O(ε^-2 n^{2/3} polylog n)-bit fingerprints of each adjacency list, i.e., in the standard sense that (1−ε) xᵀ L_G x ≤ xᵀ L_H x ≤ (1+ε) xᵀ L_G x for all x ∈ ℝ^n, where L_G and L_H are the Laplacians of G and H.

k-Connectivity
Basic Algorithm:
  For i = 1 to k:
  • Let F_i be a spanning forest of G(V, E − F_1 − ... − F_{i−1}).
Lemma: F_1 + ... + F_k contains either all the edges across a cut in G or ≥ k of them. Call such a graph a k-skeleton.
Emulation in Sketch Space:
  Sketch: Simultaneously construct k independent connectivity sketches M_1(G), M_2(G), ..., M_k(G).
  Run the algorithm in sketch space:
    Use M_1(G) to find a spanning forest F_1 of G.
    Use M_2(G) − M_2(F_1) = M_2(G − F_1) to find F_2.
    Use M_3(G) − M_3(F_1) − M_3(F_2) = M_3(G − F_1 − F_2) to find F_3.
    ...
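The basic (non-sketched) algorithm and its lemma can be checked by brute force on a small graph. The sketch below peels off k spanning forests with a plain union-find and then verifies, for every cut, that the skeleton keeps at least min(k, cut size) crossing edges. The example graph K6 and all function names are assumptions of this illustration; the sketch-space emulation itself is not implemented here.

```python
from itertools import combinations

def spanning_forest(n, edges):
    """Greedy spanning forest via union-find."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    forest = []
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            forest.append((u, v))
    return forest

def k_skeleton(n, edges, k):
    """F_1 + ... + F_k, where F_i is a spanning forest of G minus F_1, ..., F_{i-1}."""
    remaining, skeleton = list(edges), []
    for _ in range(k):
        forest = spanning_forest(n, remaining)
        skeleton += forest
        chosen = set(forest)
        remaining = [e for e in remaining if e not in chosen]
    return skeleton

def crossing(edges, S):
    """Number of edges with exactly one endpoint in S."""
    return sum((u in S) != (v in S) for u, v in edges)

n, k = 6, 2
edges = list(combinations(range(n), 2))      # complete graph K6
H = k_skeleton(n, edges, k)

# Lemma check over every cut (S, V \ S).
lemma_holds = all(
    crossing(H, set(S)) >= min(k, crossing(edges, set(S)))
    for size in range(1, n)
    for S in combinations(range(n), size)
)
print(len(H), lemma_holds)
```

The lemma holds because each F_i must contain a crossing edge of any cut that still has one in the remaining graph, so a cut either contributes an edge to all k forests or runs out of edges first.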

(1+ε)-Approx of All Cuts
Theorem (Fung et al.): Sample each edge e w/p p_e and weight it by 1/p_e. If p_e = ε^-2 log² n / c_e, where c_e is the size of the min cut separating the endpoints of e, then all cuts are preserved up to a factor 1+ε.
Algorithm: Let G_i be the graph with edges sampled w/p 2^-i. Construct a k-skeleton H_i for each G_i, where k = 2 ε^-2 log² n.
Theorem: e is in some H_i w/p at least p_e.
Proof: Let e = (u,v) and let C be the edges in a min u-v cut in G.
  i               1       2       3       4        ...   −log p_e       ...   log n
  P[e ∈ G_i]      1/2     1/4     1/8     1/16     ...   p_e            ...   1/n
  E[|C ∩ G_i|]    c_e/2   c_e/4   c_e/8   c_e/16   ...   ε^-2 log² n    ...   c_e/n
For i = −log p_e, we have |C ∩ G_i| < k w.h.p. by the Chernoff bound. Hence e ∈ H_i iff e ∈ G_i, which happens w/p p_e.
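The Chernoff step can be spelled out as follows. This is a sketch under stated assumptions: edges are sampled independently, logarithms are base 2, p_e is treated as a power of 2 so that i = −log p_e is an integer level, and the multiplicative Chernoff bound is used in the form Pr[X ≥ (1+δ)μ] ≤ exp(−δ²μ/(2+δ)) with δ = 1.

```latex
\begin{align*}
\mu &= \mathbb{E}\,\lvert C \cap G_i \rvert
     = c_e \cdot 2^{-i}
     = c_e \, p_e
     = \varepsilon^{-2} \log^2 n
  && \text{for } i = -\log p_e, \\
\Pr\bigl[\, \lvert C \cap G_i \rvert \ge k \,\bigr]
    &= \Pr\bigl[\, \lvert C \cap G_i \rvert \ge 2\mu \,\bigr]
     \le e^{-\mu/3}
     = e^{-\varepsilon^{-2} \log^2 n / 3}
  && \text{since } k = 2\,\varepsilon^{-2} \log^2 n .
\end{align*}
```

So with high probability fewer than k edges of C survive in G_i, and by the k-skeleton lemma H_i then contains all of them, including e whenever e ∈ G_i.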

II. Misalignment
Outline: I. Connectivity, II. Misalignment
a) Testing Equality with Rotation
b) Matching Lower Bound
Joint work with Alexandr Andoni, Assaf Goldberger, Ely Porat

Fingerprints for Rotation
[Figure: the text “The quick brown fox jumped over the lazy dog.” and its cyclic rotation “quick brown fox jumped over the lazy dog. The”.]
• Theorem: There’s a D(n) polylog n bit fingerprint F that is:
  ‣ Useful: F(a) and F(b) determine if a, b ∈ ℤ^n are rotations w.h.p.
  ‣ Homomorphic: From F(a) can construct F(any rotation of a).
  ‣ Linear: From F(a) and F(b) can compute F(a+b).
• Theorem: Fingerprints with the above properties need D(n) bits.
• Extension: (t + D(n)) polylog n bit fingerprints F(a) and F(b) determine if a, b are within t substitutions of being rotations.

False Start: Fermat’s Little Theorem
Rabin-Karp: For some p and r, encode a = a_0 a_1 a_2 ... a_{n−1} as
  f(r, a) = a_0 + a_1 r + a_2 r^2 + ... + a_{n−1} r^{n−1} mod p
Fermat’s Little Thm: If p = n+1 is prime, then r^n = 1 mod p (for r ≠ 0 mod p) and so
  r f(r, a_0 a_1 ... a_{n−1}) = a_0 r + a_1 r^2 + a_2 r^3 + ... + a_{n−1} r^n
                              = a_{n−1} + a_0 r + a_1 r^2 + ... + a_{n−2} r^{n−1}
                              = f(r, a_{n−1} a_0 ... a_{n−2})
So, if b is a k-shift of a then g(r) = r^k f(r, a) − f(r, b) = 0.
Schwartz-Zippel: If r is random and g is non-zero: P[g(r) = 0] ≤ (n−1)/p = 1 − O(1/n).
Conclusion: No false negatives but likely false positives.
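Both halves of the conclusion can be seen on a tiny instance with n = 6 and p = 7. The vectors and the choice r = 3 below are illustrative assumptions: the first check confirms that multiplying by r rotates the encoded string, the second exhibits a false positive.

```python
def f(r, a, p):
    """Rabin-Karp encoding a_0 + a_1 r + ... + a_{n-1} r^(n-1) mod p."""
    return sum(x * pow(r, i, p) for i, x in enumerate(a)) % p

def rotate_right(a, k=1):
    """a_{n-k} ... a_{n-1} a_0 ... a_{n-k-1}, matching the slide's rotation for k = 1."""
    k %= len(a)
    return a[-k:] + a[:-k] if k else list(a)

n, p = 6, 7                                   # p = n + 1 is prime
a = [3, 1, 4, 1, 5, 2]

# Homomorphism: r * f(r, a) = f(r, rotated a) for every non-zero r mod p.
homomorphic = all(r * f(r, a, p) % p == f(r, rotate_right(a), p) for r in range(1, p))
print(homomorphic)

# False positive: u and w are not rotations of each other, yet some r^k maps f(u) to f(w).
u = [1, 0, 0, 0, 0, 0]
w = [2, 0, 0, 0, 0, 0]
r = 3
false_positive = any(pow(r, k, p) * f(r, u, p) % p == f(r, w, p) for k in range(n))
print(false_positive)
```

With r = 3 the powers r^k run through every non-zero residue mod 7, so any two strings with non-zero encodings look like rotations of each other; this is the weakness that the next slide addresses.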

Beyond Schwartz-Zippel
Evaluate g on roots of x^n − 1, but work in a larger field.
x^n − 1 factorizes as D(n) irreducible polys over the rationals, e.g.,
  x^10 − 1 = Φ_1(x) Φ_2(x) Φ_5(x) Φ_10(x)
           = (x − 1)(1 + x)(1 + x + x^2 + x^3 + x^4)(1 − x + x^2 − x^3 + x^4)
At least one Φ_i has no shared roots with a non-zero g:
  If Φ_i shares one root with g, then Φ_i divides g (Abel’s Irred. Thm).
  They can’t all divide g because g has degree ≤ n−1.
Suffices to test g on an arbitrary root of each Φ_i.
Bad News: Can’t guarantee g(r) has finite precision.
Good News: Work modulo a random p. Can show Φ_i still doesn’t share roots with g whp by analyzing the resultant.

Lower Bound: Basic Idea
Can recover D(n) bits about a from F(a) by summing the fingerprints of rotations.
To deduce α = Σ_i a_i from F(a_0 a_1 a_2 a_3 a_4 a_5):
  F(a_0 a_1 a_2 a_3 a_4 a_5) + F(a_1 a_2 a_3 a_4 a_5 a_0) + ... + F(a_5 a_0 a_1 a_2 a_3 a_4) = F(αααααα)
  and compare with F(gggggg) for all g until it matches.
To deduce β = a_1 + a_3 + a_5:
  F(a_0 a_1 a_2 a_3 a_4 a_5) + F(a_2 a_3 a_4 a_5 a_0 a_1) + F(a_4 a_5 a_0 a_1 a_2 a_3) = F(γβγβγβ), where γ = a_0 + a_2 + a_4 = α − β,
  and compare with F(g′gg′gg′g) for all g, g′ = α − g until it matches.
And so on for other divisors of n...
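The vector identities behind this argument are easy to check numerically. The sketch below works on the vectors themselves rather than on any particular fingerprint F; by linearity and homomorphism, the same sums can be formed from F(a) alone. The example vector and function names are assumptions.

```python
def rotate_left(a, k):
    """a_k a_{k+1} ... a_{n-1} a_0 ... a_{k-1}, as in the rotations listed on the slide."""
    k %= len(a)
    return a[k:] + a[:k]

def sum_of_rotations(a, step):
    """Coordinate-wise sum of a rotated by 0, step, 2*step, ... (step must divide len(a))."""
    n = len(a)
    total = [0] * n
    for k in range(0, n, step):
        total = [x + y for x, y in zip(total, rotate_left(a, k))]
    return total

a = [1, 2, 3, 4, 5, 6]
alpha = sum(a)                    # 21
beta = a[1] + a[3] + a[5]         # 12

print(sum_of_rotations(a, 1))     # [21, 21, 21, 21, 21, 21]
print(sum_of_rotations(a, 2))     # [9, 12, 9, 12, 9, 12]
```

Summing over all rotations yields the constant vector of α; summing over rotations by multiples of 2 yields a vector with period 2 whose entries are α − β and β. Each divisor of n gives one such periodic sum, which is why the recovered information scales with D(n).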

Thanks!
• Homomorphic Sketches: Compress using sketches such that we can run algorithms on compressed data directly. Resulting algorithms are parallelizable and streamable.
• Graphs: Dimensionality reduction for preserving structural properties. Enables dynamic graph streaming.
• Fingerprinting with Misalignments: Tight bounds on the size of fingerprint necessary for testing equality up to rotations.

