ANALYZING MASSIVE DATASETS WITH MI - PowerPoint PPT Presentation

ANALYZING MASSIVE DATASETS WITH MISSING ENTRIES: MODELS AND ALGORITHMS
Nithin Varma
Thesis Advisor: Sofya Raskhodnikova

Algorithms for massive datasets
■ Cannot read the entire dataset
  – Sublinear-time algorithms
■ Performance metrics
  – Speed
  – Memory efficiency
  – Accuracy
  – Resilience to faults in data

Faults in datasets
■ Wrong entries (errors)
  – sublinear algorithms
  – machine learning
  – error detection and correction
■ Missing entries (erasures): our focus

Occurrence of erasures: Reasons
■ Hidden friend relations on social networks
■ Data collection
■ Adversarial deletion
■ Accidental deletion

Large dataset with erasures: Access
[Figure: an algorithm interacts with an oracle that holds the dataset (functions, codewords, graphs)]
■ Algorithm queries the oracle for dataset entries
■ Algorithm does not know in advance what's erased
■ Oracle returns:
  – the nonerased entry, or
  – special symbol ⊥ if queried point is erased
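The access model above can be sketched in a few lines of Python. This is only an illustration under assumptions: the class name `ErasureOracle`, the use of `None` to stand in for ⊥, and the query counter are all hypothetical and do not come from the slides.

```python
from typing import Optional, Sequence, Set

ERASED = None  # stands in for the special symbol ⊥ (an assumption of this sketch)


class ErasureOracle:
    """Query access to a dataset in which some entries are erased."""

    def __init__(self, data: Sequence[int], erased: Set[int]) -> None:
        self._data = data
        self._erased = erased  # fixed before any query is made
        self.queries = 0       # number of queries made so far

    def __len__(self) -> int:
        return len(self._data)

    def query(self, i: int) -> Optional[int]:
        """Return the entry at index i, or ERASED if that entry is erased."""
        self.queries += 1
        return ERASED if i in self._erased else self._data[i]


oracle = ErasureOracle(data=[3, 1, 4, 1, 5], erased={1, 3})
answers = [oracle.query(i) for i in range(len(oracle))]
```

The algorithm only sees the return values of `query`, so it learns that a point is erased only by querying it.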

Overview of our contributions
Functions
■ Erasure-Resilient Testing [Dixit, Raskhodnikova, Thakurta & Varma '18, Kalemaj, Raskhodnikova & Varma]
Codewords
■ Local Erasure-Decoding [Raskhodnikova, Ron-Zewi & Varma '19]
  – Application to property testing
Graphs
■ Erasure-Resilient Sublinear-time Algorithms for Graphs [Levi, Pallavoor, Raskhodnikova & Varma]
■ Sensitivity of Graph Algorithms to Missing Edges [Varma & Yoshida]

Outline
■ Erasures in property testing
■ Erasures in local decoding
■ Average sensitivity of graph algorithms
  – Definition
  – Main results
■ Average sensitivity of approximate maximum matching
■ Current and future directions

Outline
■ Erasures in property testing
■ Erasures in local decoding
■ Average sensitivity of graph algorithms
  – Definition
  – Main results
■ Average sensitivity of approximate maximum matching
■ Current and future directions

Decision problem
■ Can't solve nontrivial decision problems without full access to input
■ Need a notion of approximation
[Figure: universe of inputs. YES instances: accept w.p. ≥ 2/3. NO instances: reject w.p. ≥ 2/3.]

Property testing problem [Rubinfeld & Sudan '96, Goldreich, Goldwasser & Ron '98]
■ ε-far from the property
  – ≥ ε fraction of values to be changed to satisfy property
ε-tester: accept w.p. ≥ 2/3 if the input has the property; reject w.p. ≥ 2/3 if the input is ε-far from the property.

(Error) Tolerant testing problem [Parnas, Ron & Rubinfeld '06]
≤ α fraction of input is wrong
■ α-close to the property
  – ≤ α fraction of values can be changed to satisfy property
(α, ε)-tolerant tester: accept w.p. ≥ 2/3 if the input is α-close to the property; reject w.p. ≥ 2/3 if the input is ε-far from the property.

Erasure-resilient testing problem [Dixit, Raskhodnikova, Thakurta & Varma '16]
≤ α fraction of input is erased
■ Worst-case erasures, made before tester queries
■ Completion
  – Fill-in values at erased points to satisfy property
(α, ε)-erasure-resilient tester: accept w.p. ≥ 2/3 if the input can be completed to satisfy the property; reject w.p. ≥ 2/3 if every completion is ε-far from the property.
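To make the notion of a completion concrete, here is a small sketch for one example property, sortedness (monotonicity) of an array. The function name `complete_to_sorted` and the use of `None` for erased points are assumptions of this example. A partially erased array has a nondecreasing completion exactly when its nonerased entries are already nondecreasing in index order.

```python
from typing import List, Optional


def complete_to_sorted(values: List[Optional[float]]) -> Optional[List[float]]:
    """Return a nondecreasing completion of `values`, or None if none exists.

    `None` entries are erased points. A completion exists exactly when the
    nonerased entries are already nondecreasing in index order.
    """
    known = [v for v in values if v is not None]
    if any(a > b for a, b in zip(known, known[1:])):
        return None
    # Fill each erased point with the nearest nonerased value to its left;
    # leading erasures take the first nonerased value (0.0 if all are erased).
    fill = known[0] if known else 0.0
    completed = []
    for v in values:
        if v is not None:
            fill = v
        completed.append(fill)
    return completed


ok = complete_to_sorted([None, 1, None, 3, None])
bad = complete_to_sorted([2, None, 1])
```

This reads the whole input, so it is not a tester; it only shows what the tester is asked to distinguish.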

Relationship between models
Testing ⊇ Erasure-resilient testing ⊇ Tolerant testing

Erasure-resilient testing: Our results [Dixit, Raskhodnikova, Thakurta, Varma 18]
■ Blackbox transformations
■ Efficient erasure-resilient testers for other properties
■ Separation of standard and erasure-resilient testing

Our blackbox transformations
■ Make certain classes of uniform testers erasure-resilient
■ Work by simply repeating the original tester
Query complexity of the (α, ε)-erasure-resilient tester is equal to that of the ε-tester, for α ∈ (0,1), ε ∈ (0,1)
■ Apply to:
  – Monotonicity over general partial orders [FLNRRS02]
  – Convexity of black and white images [BMR15]
  – Boolean functions having at most k alternations in values

Main properties that we study
■ Monotonicity, Lipschitz properties, and convexity of real-valued functions
■ Widely studied in property testing [EKKRV00, DGLRRS99, LR01, FLNRRS02, PRR03, AC04, F04, HK04, BRW05, PRR06, ACCL07, BGJRW12, BCGM10, BBM11, AJMS12, DJRT13, JR13, CS13a, CS13b, BlRY14, CST14, BB15, CDJS15, CDST15, BB16, CS16, KMS18, BCS18, PRV18, B18, CS19, …]
■ Optimal testers for these properties are not uniform testers
  – Our blackbox transformation does not apply

Optimal erasure-resilient testers
■ For functions f: [n] → ℝ
  – Monotonicity
  – Lipschitz properties
  – Convexity
  Query complexity of the (α, ε)-erasure-resilient tester is equal to that of the ε-tester, for α ∈ (0,1), ε ∈ (0,1)
■ For functions f: [n]^d → ℝ
  – Monotonicity
  – Lipschitz properties
  Query complexity of the (α, ε)-erasure-resilient tester is equal to that of the ε-tester, for ε ∈ (0,1), α = O(ε/d)

Separation of erasure-resilient and standard testing
Theorem: There exists a property P on inputs of size n such that:
• testing is possible with a constant number of queries
• every erasure-resilient tester needs Ω̃(n) queries

Relationship between models
Standard testing ⊇ Erasure-resilient testing ⊇ Tolerant testing
Some containments are strict:
• [Fischer Fortnow 05]: standard vs. tolerant
• [Dixit Raskhodnikova Thakurta Varma 18]: standard vs. erasure-resilient

Outline
■ Erasures in property testing
■ Erasures in local decoding
■ Average sensitivity of graph algorithms
  – Definition
  – Main results
■ Average sensitivity of approximate maximum matching
■ Current and future directions

Local decoding
■ Error correcting code C: Σ^n → Σ^N, for N > n
[Figure: Message x → Encoder → C(x) → Channel → Received word w]
■ Decoding: Recover x from w if not too many errors or erasures
■ Local decoder: Sublinear-time algorithm for decoding
Local decoding is extensively studied and has many applications [GL89, BFLS91, BLR93, GLRSW91, GS92, PS94, BIKR93, KT00, STV01, Y08, E12, DGY11, BET10…]
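A minimal sketch of local decoding under erasures, using the Hadamard code as the example. The choice of code and the names `hadamard_encode` and `local_decode_bit` are assumptions made for illustration; the slides do not name a specific code. To recover message bit i, the decoder queries two positions r and r ⊕ e_i of the received word and XORs the answers, retrying with fresh randomness if either position is erased.

```python
import random
from typing import List, Optional


def hadamard_encode(x: List[int]) -> List[int]:
    """Encode n bits as 2^n bits: position r holds <x, r> mod 2."""
    n = len(x)
    return [sum(x[j] for j in range(n) if (r >> j) & 1) % 2
            for r in range(1 << n)]


def local_decode_bit(w: List[Optional[int]], n: int, i: int,
                     rng: random.Random, max_tries: int = 100) -> Optional[int]:
    """Recover message bit i from two queries to w, retrying on erasures."""
    for _ in range(max_tries):
        r = rng.randrange(1 << n)
        a, b = w[r], w[r ^ (1 << i)]
        if a is not None and b is not None:
            return a ^ b  # <x, r> + <x, r + e_i> = x_i (mod 2)
    return None  # gave up: every sampled pair hit an erasure


x = [1, 0, 1, 1]
w: List[Optional[int]] = list(hadamard_encode(x))
for r in (0, 5, 10, 15):
    w[r] = None  # erase a quarter of the codeword
rng = random.Random(1)
decoded = [local_decode_bit(w, len(x), i, rng) for i in range(len(x))]
```

With erasures only (no errors), any pair of nonerased answers yields the correct bit, so the decoder never outputs a wrong value; it can only fail by giving up.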

Local decoding and property testing [Raskhodnikova, Ron-Zewi, Varma 19]
Our Results
■ Initiate study of erasures in the context of local decoding
■ Erasures are easier than errors in local decoding
■ Separation between erasure-resilient and (error) tolerant testing

Separation of erasure-resilient and tolerant testing [Raskhodnikova, Ron-Zewi, Varma 19]
Theorem: There exists a property P on inputs of size n such that:
• erasure-resilient testing is possible with a constant number of queries
• every (error) tolerant tester needs n^Ω(1) queries

Relationship between models
Testing ⊇ Erasure-resilient testing ⊇ Tolerant testing
All containments are strict:
• [Fischer Fortnow 05]: standard vs. tolerant
• [Dixit Raskhodnikova Thakurta Varma 18]: standard vs. erasure-resilient
• [Raskhodnikova Ron-Zewi Varma 19]: erasure-resilient vs. tolerant

Outline
■ Erasures in property testing
■ Erasures in local decoding
■ Average sensitivity of graph algorithms
  – Definition
  – Main results
■ Average sensitivity of approximate maximum matching
■ Current and future directions

Motivation
■ Want to solve optimization problems on large graphs
  – Maximum matching, min. vertex cover, min cut, …
■ Cannot assume that we get access to the true graph
  – A fraction of the edges, say 1%, might be missing
■ Need algorithms that are robust to missing edges

Towards average sensitivity
■ Want to solve problem on G; only have access to G′
[Figure: G = (V, E) → Algorithm A → A(G); G′ = (V, E′), E′ ⊆ E → Algorithm A → A(G′); A(G) ≈ A(G′)]
■ Similar to robustness notions in differential privacy [Dwork, Kenthapadi, McSherry, Mironov & Naor 06, Dwork, McSherry, Nissim & Smith 06], learning theory [Bousquet & Elisseeff 02], …

Average sensitivity: Deterministic algorithm [Varma & Yoshida]
■ A: deterministic graph algorithm outputting a set of edges or vertices
  – e.g., A outputs a maximum matching
Average sensitivity of deterministic algorithm A:
  s_A(G) = avg_{e ∈ E} [Ham(A(G), A(G − e))]
■ s_A: 𝒢 → ℝ, where 𝒢 is the universe of input graphs
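The definition can be evaluated by brute force on a small graph. The sketch below is an illustration under assumptions: it uses a greedy maximal matching as a stand-in deterministic algorithm A (not the maximum matching algorithm from the slides), and it takes Ham to be the size of the symmetric difference of the two output edge sets. The names `greedy_matching` and `average_sensitivity` are hypothetical.

```python
from typing import Callable, FrozenSet, List, Set, Tuple

Edge = Tuple[int, int]


def greedy_matching(edges: List[Edge]) -> FrozenSet[Edge]:
    """Deterministic maximal matching: scan edges in sorted order."""
    matched: Set[int] = set()
    matching: Set[Edge] = set()
    for u, v in sorted(edges):
        if u not in matched and v not in matched:
            matching.add((u, v))
            matched.update((u, v))
    return frozenset(matching)


def average_sensitivity(algorithm: Callable[[List[Edge]], FrozenSet[Edge]],
                        edges: List[Edge]) -> float:
    """Average over e in E of Ham(A(G), A(G - e))."""
    if not edges:
        return 0.0  # convention for the empty graph (an assumption)
    base = algorithm(edges)
    total = 0
    for e in edges:
        remaining = [f for f in edges if f != e]
        total += len(base ^ algorithm(remaining))  # symmetric difference
    return total / len(edges)


path = [(0, 1), (1, 2), (2, 3), (3, 4)]
s = average_sensitivity(greedy_matching, path)  # 1.5 on this path
```

On the 4-edge path, removing (0, 1) changes the whole matching (distance 4), removing (2, 3) changes two edges (distance 2), and the other two removals change nothing, giving (4 + 0 + 2 + 0) / 4 = 1.5.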

Average sensitivity: Randomized algorithm [Varma & Yoshida]
Average sensitivity of randomized algorithm A:
  s_A(G) = avg_{e ∈ E} [Dist(A(G), A(G − e))]
  (A(G) and A(G − e) are output distributions)
■ s_A: 𝒢 → ℝ, where 𝒢 is the universe of input graphs
■ Algorithm with low average sensitivity: stable-on-average
