ANALYZING MASSIVE DATASETS WITH MISSING ENTRIES: MODELS AND ALGORITHMS


  1. ANALYZING MASSIVE DATASETS WITH MISSING ENTRIES: MODELS AND ALGORITHMS. Nithin Varma. Thesis Advisor: Sofya Raskhodnikova

  2. Algorithms for massive datasets ■ Cannot read the entire dataset – Sublinear-time algorithms ■ Performance metrics – Speed – Memory efficiency – Accuracy – Resilience to faults in data

  3. Faults in datasets ■ Wrong Entries (Errors) – sublinear algorithms – machine learning – error detection and correction ■ Missing Entries (Erasures): Our Focus

  4. Occurrence of erasures: Reasons ■ Hidden friend relations on social networks ■ Data collection ■ Adversarial deletion ■ Accidental deletion

  5. Large dataset with erasures: Access (functions, codewords, graphs) ■ Algorithm queries the oracle for dataset entries ■ Algorithm does not know in advance what's erased ■ Oracle returns: – the nonerased entry, or – the special symbol ⊥ if the queried point is erased (Figure: interaction between algorithm and oracle)
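This access model can be made concrete with a small sketch. It is illustrative only: the class name ErasureOracle is not from the slides, the dataset is assumed to be a position-to-value mapping, and None stands in for the special symbol ⊥.

```python
class ErasureOracle:
    """Point-query access to a dataset whose erased entries are hidden.
    None plays the role of the special symbol ⊥ from the slide."""

    def __init__(self, data, erased):
        self._data = dict(data)        # position -> value
        self._erased = set(erased)     # erasures are fixed before any query is made

    def query(self, i):
        # The algorithm only learns at query time whether position i is erased.
        return None if i in self._erased else self._data[i]
```

For example, `ErasureOracle({0: 5, 1: 7, 2: 9}, erased={1}).query(1)` returns None, while `query(2)` returns 9.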

  6. Overview of our contributions ■ Functions: Erasure-Resilient Testing [Dixit, Raskhodnikova, Thakurta & Varma '18; Kalemaj, Raskhodnikova & Varma] ■ Codewords: Local Erasure-Decoding, with an application to property testing [Raskhodnikova, Ron-Zewi & Varma '19] ■ Graphs: Erasure-Resilient Sublinear-time Algorithms for Graphs [Levi, Pallavoor, Raskhodnikova & Varma]; Sensitivity of Graph Algorithms to Missing Edges [Varma & Yoshida]

  7. Outline ■ Erasures in property testing ■ Erasures in local decoding ■ Average sensitivity of graph algorithms – Definition – Main results ■ Average sensitivity of approximate maximum matching ■ Current and future directions

  8. Outline ■ Erasures in property testing ■ Erasures in local decoding ■ Average sensitivity of graph algorithms – Definition – Main results ■ Average sensitivity of approximate maximum matching ■ Current and future directions

  9. Decision problem ■ Can't solve nontrivial decision problems without full access to the input ■ Need a notion of approximation (Figure: inputs in the universe; NO instances rejected w.p. ≥ 2/3, YES instances accepted w.p. ≥ 2/3)

  10. Property testing problem [Rubinfeld & Sudan '96, Goldreich, Goldwasser & Ron '98] ■ ε-far from property: ≥ ε fraction of values must be changed to satisfy the property ■ An ε-tester accepts inputs with the property w.p. ≥ 2/3 and rejects inputs that are ε-far from the property w.p. ≥ 2/3
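The slide states the tester's guarantee abstractly. As a concrete, standard example that is not specific to this thesis, here is a minimal sketch of the classic sortedness tester of Ergün et al. for f: [n] → ℝ, assuming distinct values and query access to f.

```python
import math
import random

def sortedness_tester(query, n, eps):
    """Minimal sketch of the classic epsilon-tester for sortedness,
    assuming query(i) returns f(i) and all values are distinct.
    Always accepts sorted inputs; rejects inputs that are eps-far
    from sorted with probability >= 2/3."""
    for _ in range(math.ceil(2 / eps)):
        i = random.randrange(n)
        x = query(i)
        lo, hi = 0, n - 1
        found_at = None
        while lo <= hi:              # binary search for the value f(i)
            mid = (lo + hi) // 2
            v = query(mid)
            if v == x:
                found_at = mid
                break
            elif v < x:
                lo = mid + 1
            else:
                hi = mid - 1
        if found_at != i:
            return False             # the search did not lead back to position i
    return True
```

Each round picks a random index and checks that binary search for its value leads back to it; on inputs that are ε-far from sorted, at least an ε fraction of indices fail this check.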

  11. (Error) Tolerant testing problem [Parnas, Ron & Rubinfeld '06]: ≤ α fraction of the input is wrong ■ α-close to property: ≤ α fraction of values can be changed to satisfy the property ■ A (α, ε)-tolerant tester accepts inputs α-close to the property w.p. ≥ 2/3 and rejects inputs ε-far from the property w.p. ≥ 2/3

  12. Erasure-resilient testing problem [Dixit, Raskhodnikova, Thakurta & Varma '16]: ≤ α fraction of the input is erased ■ Worst-case erasures, made before the tester queries ■ Completion: fill in values at the erased points ■ A (α, ε)-erasure-resilient tester accepts w.p. ≥ 2/3 if the input can be completed to satisfy the property, and rejects w.p. ≥ 2/3 if every completion is ε-far from the property
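To make the completion-based definition concrete, the following brute-force classifier (illustrative only, and feasible only for tiny inputs) checks which case of the erasure-resilient sortedness problem a partially erased array falls into, assuming erased entries are marked None and completions take values in a given finite alphabet.

```python
from itertools import product

def dist_to_sorted(a):
    """Fraction of entries that must change to make 'a' nondecreasing:
    (n minus the longest nondecreasing subsequence) divided by n."""
    n = len(a)
    best = [1] * n
    for i in range(n):
        for j in range(i):
            if a[j] <= a[i]:
                best[i] = max(best[i], best[j] + 1)
    return (n - max(best)) / n

def erased_input_status(a, alphabet, eps):
    """Classify a partially erased array (None = erased) for the
    erasure-resilient sortedness problem by enumerating all completions
    over the given alphabet."""
    positions = [i for i, v in enumerate(a) if v is None]
    dists = []
    for fill in product(alphabet, repeat=len(positions)):
        b = list(a)
        for pos, v in zip(positions, fill):
            b[pos] = v
        dists.append(dist_to_sorted(b))
    if min(dists) == 0:
        return "accept case: some completion is sorted"
    if min(dists) >= eps:
        return "reject case: every completion is eps-far from sorted"
    return "neither case: the tester may answer arbitrarily"
```

For instance, `erased_input_status([1, None, 2, None], alphabet=range(4), eps=0.5)` reports that some completion is sorted.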

  13. Relationship between models: standard testing, erasure-resilient testing, tolerant testing (nested diagram: tolerant testing is at least as demanding as erasure-resilient testing, which is at least as demanding as standard testing)

  14. Erasure-resilient testing: Our results [Dixit, Raskhodnikova, Thakurta, Varma 18] ■ Blackbox transformations ■ Efficient erasure-resilient testers for other properties ■ Separation of standard and erasure-resilient testing

  15. Our blackbox transformations ■ Makes certain classes of uniform testers erasure-resilient ■ Works by simply repeating the original tester ■ Query complexity of the (α, ε)-erasure-resilient tester is equal to that of the ε-tester, for α ∈ (0,1), ε ∈ (0,1) ■ Applies to: – Monotonicity over general partial orders [FLNRRS02] – Convexity of black and white images [BMR15] – Boolean functions having at most k alternations in values
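The repetition idea can be sketched schematically as below, reusing the None-for-⊥ oracle convention from the earlier sketch. This toy wrapper simply discards any run of the original tester that touches an erased point and retries; the actual transformation in [Dixit, Raskhodnikova, Thakurta & Varma '18] chooses the number of repetitions and the tester's proximity parameter so that the query-complexity and error guarantees are preserved, which this sketch does not attempt to reproduce.

```python
class _HitErasure(Exception):
    """Raised internally when a run of the original tester queries an erased point."""

def make_erasure_resilient(uniform_tester, max_attempts=100):
    """Schematic wrapper around a uniform tester (one whose queries are
    uniformly random sample points): rerun it until some run happens to
    query only nonerased points, then return that run's verdict."""
    def resilient_tester(oracle, n, eps):
        def guarded_query(i):
            v = oracle.query(i)
            if v is None:              # None stands in for ⊥
                raise _HitErasure()
            return v
        for _ in range(max_attempts):
            try:
                return uniform_tester(guarded_query, n, eps)
            except _HitErasure:
                continue               # this run touched an erasure; discard and retry
        return True                    # attempt budget exhausted; give up and accept
    return resilient_tester
```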

  16. Main properties that we study ■ Monotonicity, Lipschitz properties, and convexity of real-valued functions ■ Widely studied in property testing [EKKRV00, DGLRRS99, LR01, FLNRRS02, PRR03, AC04, F04, HK04, BRW05, PRR06, ACCL07, BGJRW12, BCGM10, BBM11, AJMS12, DJRT13, JR13, CS13a, CS13b, BlRY14, CST14, BB15, CDJS15, CDST15, BB16, CS16, KMS18, BCS18, PRV18, B18, CS19, …] ■ Optimal testers for these properties are not uniform testers – Our blackbox transformation does not apply

  17. Optimal erasure-resilient testers ■ For functions f: [n]^d → ℝ – Monotonicity – Lipschitz properties – Query complexity of the (α, ε)-erasure-resilient tester is equal to that of the ε-tester for ε ∈ (0,1), α = O(ε/d) ■ For functions f: [n] → ℝ – Monotonicity – Lipschitz properties – Convexity – Query complexity of the (α, ε)-erasure-resilient tester is equal to that of the ε-tester for α ∈ (0,1), ε ∈ (0,1)

  18. Separation of erasure-resilient and standard testing. Theorem: There exists a property P on inputs of size n such that: • P is testable in the standard model with a constant number of queries • every erasure-resilient tester for P needs Ω̃(n) queries

  19. Relationship between models: standard testing, erasure-resilient testing, tolerant testing. Some containments are strict: • [Fischer Fortnow 05]: standard vs. tolerant • [Dixit Raskhodnikova Thakurta Varma 18]: standard vs. erasure-resilient

  20. Outline ■ Erasures in property testing ■ Erasures in local decoding ■ Average sensitivity of graph algorithms – Definition – Main results ■ Average sensitivity of approximate maximum matching ■ Current and future directions

  21. Local decoding ■ Error-correcting code C: Σ^n → Σ^N, for N > n; the encoder maps a message x to the codeword C(x), which passes through a channel, and the decoder sees a received word w ■ Decoding: recover x from w if there are not too many errors or erasures ■ Local decoder: sublinear-time algorithm for decoding ■ Local decoding is extensively studied and has many applications [GL89, BFLS91, BLR93, GLRSW91, GS92, PS94, BIKR93, KT00, STV01, Y08, E12, DGY11, BET10, …]
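The slide describes local decoders abstractly. As a concrete textbook illustration (the Hadamard code is a standard example, not necessarily one of the codes treated in this thesis), here is the classic 2-query local decoder that recovers a single message bit from a mildly corrupted codeword.

```python
import random

def hadamard_local_decode(word_query, k, i, repetitions=9):
    """Classic 2-query local decoder for the Hadamard code.
    The code maps x in {0,1}^k to the table of inner products <x, a> mod 2
    over all a in {0,1}^k; word_query(a) returns the (possibly corrupted)
    symbol at position a. Bit x_i is recovered by majority vote over
    pairs of positions whose XOR is the i-th unit vector."""
    votes = 0
    for _ in range(repetitions):
        a = [random.randint(0, 1) for _ in range(k)]
        b = list(a)
        b[i] ^= 1                                  # b = a XOR e_i
        votes += word_query(tuple(a)) ^ word_query(tuple(b))
    return int(votes * 2 > repetitions)            # majority over the repetitions
```

If only a small constant fraction of positions is corrupted, both queried positions are correct with probability close to 1, and <x, a> ⊕ <x, a ⊕ e_i> = x_i. Erasures are even easier to handle here, since a query that returns ⊥ can simply be resampled, which is one way to see the slide's later point that erasures are easier than errors in local decoding.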

  22. Local decoding and property testing [Raskhodnikova, Ron-Zewi, Varma 19]. Our results: ■ Initiate the study of erasures in the context of local decoding ■ Erasures are easier than errors in local decoding ■ Separation between erasure-resilient and (error) tolerant testing

  23. Separation of erasure-resilient and tolerant testing [Raskhodnikova, Ron-Zewi, Varma 19]. Theorem: There exists a property P on inputs of size n such that: • P is erasure-resiliently testable with a constant number of queries • every (error) tolerant tester for P needs n^Ω(1) queries

  24. Relationship between models: standard testing, erasure-resilient testing, tolerant testing. All containments are strict: • [Fischer Fortnow 05]: standard vs. tolerant • [Dixit Raskhodnikova Thakurta Varma 18]: standard vs. erasure-resilient • [Raskhodnikova Ron-Zewi Varma 19]: erasure-resilient vs. tolerant

  25. Outline ■ Erasures in property testing ■ Erasures in local decoding ■ Average sensitivity of graph algorithms – Definition – Main results ■ Average sensitivity of approximate maximum matching ■ Current and future directions

  26. Motivation ■ Want to solve optimization problems on large graphs – Maximum matching, min vertex cover, min cut, … ■ Cannot assume that we get access to the true graph – A fraction of the edges, say 1%, might be missing ■ Need algorithms that are robust to missing edges

  27. Towards average sensitivity ■ Want to solve the problem on G; only have access to G′ ■ G = (V, E); G′ = (V, E′) with E′ ⊆ E; want the outputs A(G) and A(G′) of algorithm A to be close ■ Similar to robustness notions in differential privacy [Dwork, Kenthapadi, McSherry, Mironov & Naor 06; Dwork, McSherry, Nissim & Smith 06], learning theory [Bousquet & Elisseeff 02], …

  28. Average sensitivity: Deterministic algorithm [Varma & Yoshida] ■ A: deterministic graph algorithm outputting a set of edges or vertices – e.g., A outputs a maximum matching ■ Average sensitivity of deterministic algorithm A: s_A(G) = avg_{e ∈ E} [ Ham(A(G), A(G − e)) ] ■ s_A: 𝒢 → ℝ, where 𝒢 is the universe of input graphs
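A minimal sketch of this definition, using a deterministic greedy matching as a stand-in for the algorithm A (the greedy rule and the tiny example graph are illustrative, not from the thesis). The Hamming distance between two output edge sets is taken as the size of their symmetric difference.

```python
def greedy_matching(edges):
    """Stand-in deterministic algorithm A: greedy maximal matching,
    processing edges in sorted order."""
    matched, matching = set(), set()
    for u, v in sorted(edges):
        if u not in matched and v not in matched:
            matching.add((u, v))
            matched.update((u, v))
    return matching

def average_sensitivity(algorithm, edges):
    """s_A(G) = avg over e in E of Ham(A(G), A(G - e)),
    with Ham taken as the size of the symmetric difference."""
    edges = list(edges)
    base = algorithm(edges)
    total = 0
    for e in edges:
        total += len(base ^ algorithm([f for f in edges if f != e]))
    return total / len(edges)

# Tiny example: a path on 4 vertices; dropping one edge can change the matching a lot.
print(average_sensitivity(greedy_matching, [(1, 2), (2, 3), (3, 4)]))  # prints 4/3
```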

  29. Average sensitivity: Randomized algorithm [Varma & Yoshida] (A(G) and A(G − e) are now output distributions) ■ Average sensitivity of randomized algorithm A: s_A(G) = avg_{e ∈ E} [ Dist(A(G), A(G − e)) ], where Dist is a distance between the output distributions ■ s_A: 𝒢 → ℝ, where 𝒢 is the universe of input graphs ■ An algorithm with low average sensitivity is stable-on-average
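The slide leaves the distance Dist between output distributions abstract. The sketch below uses total variation distance purely for concreteness (the thesis may use a different choice, such as an earth mover's distance over the Hamming metric), and computes the output distribution of a randomized greedy matching exactly by enumerating edge orderings, which is only feasible for tiny graphs.

```python
from collections import Counter
from itertools import permutations

def output_distribution(edges):
    """Exact output distribution of randomized greedy matching
    (edges processed in a uniformly random order)."""
    counts = Counter()
    orders = list(permutations(edges))
    for order in orders:
        matched, matching = set(), []
        for u, v in order:
            if u not in matched and v not in matched:
                matching.append((u, v))
                matched.update((u, v))
        counts[frozenset(matching)] += 1
    return {m: c / len(orders) for m, c in counts.items()}

def tv_distance(p, q):
    """Total variation distance between two distributions over matchings."""
    support = set(p) | set(q)
    return sum(abs(p.get(m, 0.0) - q.get(m, 0.0)) for m in support) / 2

def average_sensitivity_randomized(edges):
    """avg over e in E of Dist(A(G), A(G - e)), with Dist = TV distance here."""
    edges = list(edges)
    base = output_distribution(edges)
    return sum(tv_distance(base, output_distribution([f for f in edges if f != e]))
               for e in edges) / len(edges)

print(average_sensitivity_randomized([(1, 2), (2, 3), (3, 4)]))
```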
