
MrLazy: Lazy Runtime Label Propagation for MapReduce

Sherif Akoush, Lucian Carata, Ripduman Sohan, and Andy Hopper. HotCloud 2014, June 2014.


  1. MrLazy: Lazy Runtime Label Propagation for MapReduce. Sherif Akoush, Lucian Carata, Ripduman Sohan, and Andy Hopper. HotCloud 2014, June 2014.

  2. Motivation

  3. ITV News

  4. Information Flow Control (IFC)
     • IFC*
       • Propagate Record + Sensitivity Metadata
       • Control Information Flow by Checking Metadata against Policies
     • But…
       • Many In-House Computations
       • No Need for Active Checking
       • Only When Publishing Some Results
     • Lazy IFC
       • Track and Use Lineage
       • Evaluate Output Labels When Needed

     *J. Bacon, D. Eyers, T. Pasquier, J. Singh, I. Papagiannis, and P. Pietzuch, “Information Flow Control for Secure Cloud Computing,” IEEE Transactions on Network and Service Management, 2014.

  5. Labels (Metadata)
     • More than one Label per Record
       • Different Country Regulations, Data Quality…
     • Field-Level
     • Dynamic Properties
       • Users Opting In/Out
       • Sensitivity of Data Expires in 2 Years
       • New Policies

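  6. MapReduce Paradigm
     [Diagram: input Splits 1…N (64 MB each) are read from the DFS; each Map task turns (K_IN, V_IN) records into (K_MED, V_MED) pairs; the Shuffle groups them into (K_MED, List(V_MED)); each Reduce task emits (K_OUT, V_OUT) into output files File1 and File2 on the DFS.]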
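The Map → Shuffle → Reduce flow on this slide can be sketched in a few lines of plain Java. This is an in-memory illustration only, not Hadoop code: the class `MiniMapReduce`, its method names, and the word-count job are assumptions chosen to match the (a, 2) / (b, 3) outputs that appear on the following slides.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory sketch of the map -> shuffle -> reduce pipeline (hypothetical, not Hadoop).
final class MiniMapReduce {
    // Map: one input record (K_IN, V_IN) yields intermediate pairs (K_MED, V_MED).
    static List<Map.Entry<String, Integer>> map(long offset, String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) out.add(new SimpleEntry<>(word, 1));
        }
        return out;
    }

    // Shuffle: group intermediate values by key, giving (K_MED, List(V_MED)).
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    // Reduce: fold one group into the value of a single output pair (K_OUT, V_OUT).
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) sum += v;
        return sum;
    }

    // Run the whole job; each input line stands in for one split here.
    static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> intermediate = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) intermediate.addAll(map(i, lines.get(i)));
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> g : shuffle(intermediate).entrySet()) {
            result.put(g.getKey(), reduce(g.getKey(), g.getValue()));
        }
        return result;
    }
}
```

Calling `MiniMapReduce.run` on the lines "b a", "b", "b a" counts two occurrences of a and three of b, which mirrors the records drawn on the later slides.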

  7. IFC and MapReduce
     [Diagram: the same Split → Map → Shuffle → Reduce dataflow with labels attached to records. Input records b, a, b, b, a carry labels l1–l5; output records (a, 2) in File1 and (b, 3) in File2 carry labels l6 and l7.]
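A minimal sketch of eager label propagation for this slide follows. The slides do not say how l6 and l7 are derived from l1–l5, so the rule used here (the output label is the union of the labels of all contributing inputs) is an assumption, as are the class `EagerLabels` and its method names.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical eager IFC: labels travel with records through the whole job.
final class EagerLabels {
    // An input value together with its label (a set of tags).
    static final class Labeled {
        final String value;
        final Set<String> tags;
        Labeled(String value, Set<String> tags) { this.value = value; this.tags = tags; }
    }

    // Counts records per value and, in the same pass, unions the labels of
    // every contributing input (assumed combination rule).
    static Map<String, Set<String>> propagate(List<Labeled> inputs, Map<String, Integer> counts) {
        Map<String, Set<String>> outLabels = new TreeMap<>();
        for (Labeled rec : inputs) {
            counts.merge(rec.value, 1, Integer::sum);
            outLabels.computeIfAbsent(rec.value, k -> new TreeSet<>()).addAll(rec.tags);
        }
        return outLabels;
    }

    // Policy check: an output may be released only if all of its tags are cleared.
    static boolean mayPublish(Set<String> outputTags, Set<String> clearedTags) {
        return clearedTags.containsAll(outputTags);
    }
}
```

With this rule every job pays the cost of computing output labels, even when the results never leave the organisation, which is the overhead the lazy approach on the later slides aims to avoid.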

  8. Record-Level Lineage for MapReduce
     [Diagram: the same Split → Map → Shuffle → Reduce dataflow without labels, showing input records b, a, b, b, a and output records (a, 2) in File1 and (b, 3) in File2.]

  9. Lazy IFC for MapReduce
     [Diagram: the same Split → Map → Shuffle → Reduce dataflow. Input records b, a, b, b, a carry labels l1–l5 alongside annotations q1–q5 and qn; ƒ(x) appears twice in the diagram, next to the outputs (a, 2) in File1 and (b, 3) in File2.]
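The lazy idea, as far as the slides describe it, is to keep lineage at job time and compute an output label only when someone asks for it. The sketch below illustrates that under stated assumptions: the class `LazyLabels`, the integer record IDs, and the union rule for combining labels are all hypothetical and not taken from the MrLazy implementation.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical lazy IFC: store lineage now, evaluate labels on demand.
final class LazyLabels {
    // Lineage recorded at job time: output key -> IDs of contributing input records.
    private final Map<String, Set<Integer>> lineage = new HashMap<>();
    // Current labels per input record; these may change after the job has finished.
    private final Map<Integer, Set<String>> inputLabels = new HashMap<>();

    void recordLineage(String outputKey, int inputId) {
        lineage.computeIfAbsent(outputKey, k -> new TreeSet<>()).add(inputId);
    }

    void setLabel(int inputId, String... tags) {
        inputLabels.put(inputId, new TreeSet<>(Arrays.asList(tags)));
    }

    // Called only when an output is about to be released: unions the labels
    // that the contributing inputs carry at that moment.
    Set<String> outputLabel(String outputKey) {
        Set<String> label = new TreeSet<>();
        for (int id : lineage.getOrDefault(outputKey, Collections.<Integer>emptySet())) {
            label.addAll(inputLabels.getOrDefault(id, Collections.<String>emptySet()));
        }
        return label;
    }
}
```

Because `outputLabel` reads the label store at call time, changing an input's label after the job (for example through `setLabel`) changes the answer without re-running the job.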

  10. Lineage Capture in Hadoop MapReduce
      • Record-Level Lineage
      • No Changes to User Code
      • Always-On Feature
      • Treat Lineage for Map and Reduce Tasks Separately
      • Lineage Reconstruction
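One way to read "treat lineage for map and reduce tasks separately" followed by "lineage reconstruction" is as two independently captured tables that are joined afterwards. The sketch below shows such a join. The table layouts, the `Long` record IDs, and the class `LineageJoin` are assumptions for illustration; the slides do not describe MrLazy's actual storage format.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical reconstruction of end-to-end lineage from two per-phase tables.
final class LineageJoin {
    // mapSide:    intermediate record ID -> input record ID (captured in map tasks)
    // reduceSide: output key -> intermediate record IDs consumed (captured in reduce tasks)
    // Result:     output key -> input record IDs, obtained by joining on the intermediate ID.
    static Map<String, Set<Long>> reconstruct(Map<Long, Long> mapSide,
                                              Map<String, List<Long>> reduceSide) {
        Map<String, Set<Long>> result = new TreeMap<>();
        for (Map.Entry<String, List<Long>> e : reduceSide.entrySet()) {
            Set<Long> inputs = new TreeSet<>();
            for (Long intermediateId : e.getValue()) {
                Long inputId = mapSide.get(intermediateId);
                if (inputId != null) inputs.add(inputId); // skip IDs with no map-side entry
            }
            result.put(e.getKey(), inputs);
        }
        return result;
    }
}
```

Capturing the two tables separately keeps each task's bookkeeping local; the join is deferred until lineage is actually needed.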

  11. Field-Level Enforcement
      • One Record Can Have Fields With Different Sensitivity
        • Player Name vs. Passport Number
      • Field-Level (Conservative) Visibility By Static Analysis

          map(Text key, Text value) {
              String[] str = value.toString().split(",");
              Text name = new Text(str[0]);  // only field 0 (the name) is emitted
              write(name, 1);
          }
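Once an analysis has determined which fields a map function can expose (field 0 in the slide's example), the output label only needs to cover those fields. The sketch below illustrates that last step only. It does not perform static analysis: the set `usedFields` is assumed to be the analysis result, and the class `FieldLabels` and the tag names in the usage note are hypothetical.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical field-level labelling: combine labels of exposed fields only.
final class FieldLabels {
    // fieldLabels.get(i) holds the tags of field i of a record.
    // usedFields is the (assumed) result of static analysis of the map function.
    static Set<String> visibleLabel(List<Set<String>> fieldLabels, Set<Integer> usedFields) {
        Set<String> label = new TreeSet<>();
        for (int idx : usedFields) {
            // Ignore indices outside the record rather than failing.
            if (idx >= 0 && idx < fieldLabels.size()) label.addAll(fieldLabels.get(idx));
        }
        return label;
    }
}
```

For a record whose field 0 (player name) is tagged "public" and whose field 1 (passport number) is tagged "passport", a map function that reads only field 0 yields an output labelled "public" alone. A conservative analysis would over-approximate `usedFields` whenever it cannot prove a field is unused.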

  12. Prototype Evaluation
      • Implementation in Hadoop MapReduce
      • 7-node Cluster
      • Dataset from BigDataBench: 120 GB
      • Join and Filter Job

  13. Overheads (Lineage Capture)
      • Storage
        • 50% of Output
        • Delete When Not Needed
        • Trading Space for Time
      [Chart: Runtime, y-axis 0% to 140%; bars for Base and With Lineage; legend entry: Lineage Reconstruction.]

  14. Policy 1: Users Opt-out of Data Sharing
      • 5% of Users
      [Chart: y-axis 0% to 120%; Naive (Recomputation) vs. MrLazy.]
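The slide contrasts MrLazy with naive recomputation but does not show the mechanism. One plausible use of stored lineage for this policy is to flag only the outputs that derive from opted-out users, instead of re-running the job over the filtered input. The sketch below shows that lookup; the class `OptOutFilter`, the string user IDs, and the lineage layout are assumptions.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical opt-out check driven by lineage rather than recomputation.
final class OptOutFilter {
    // lineage:  output key -> IDs of the users whose records contributed to it
    // optedOut: users who have withdrawn consent since the job ran
    // Returns the output keys that depend on at least one opted-out user.
    static Set<String> affectedOutputs(Map<String, Set<String>> lineage, Set<String> optedOut) {
        Set<String> affected = new TreeSet<>();
        for (Map.Entry<String, Set<String>> e : lineage.entrySet()) {
            if (!Collections.disjoint(e.getValue(), optedOut)) affected.add(e.getKey());
        }
        return affected;
    }
}
```

Outputs not returned by `affectedOutputs` can be released as they are; only the flagged ones need further handling.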

  15. Policy 2: Sensitivity of Data Lasts 2 Years
      • Dynamic Behaviour
      [Chart: y-axis 0% to 120%; Naive (Recomputation) vs. MrLazy.]
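A time-limited policy makes a label a function of when it is evaluated, which is why it suits lazy evaluation. The sketch below is an illustration under assumptions: the class `ExpiringSensitivity` is hypothetical, each input is represented only by its creation date, and sensitivity is taken to end exactly two years after that date (the slide does not define the boundary).

```java
import java.time.LocalDate;
import java.util.List;

// Hypothetical time-dependent label: sensitivity expires two years after creation.
final class ExpiringSensitivity {
    // A record is sensitive strictly before its second anniversary (assumed boundary).
    static boolean isSensitive(LocalDate created, LocalDate asOf) {
        return asOf.isBefore(created.plusYears(2));
    }

    // An output is sensitive at time asOf if any contributing input still is.
    static boolean outputSensitive(List<LocalDate> contributingInputs, LocalDate asOf) {
        for (LocalDate created : contributingInputs) {
            if (isSensitive(created, asOf)) return true;
        }
        return false;
    }
}
```

The same output can therefore be blocked on one day and releasable on a later one, with no change to the stored lineage and no recomputation of the job.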

  16. Other Challenges
      • Dealing with State
        • In-lining Instructions to Expose State
        • TopK
      • Subtle Data Leakage
        • Differential Privacy

  17. Conclusion
      • Delay Output Label (Metadata) Computation
      • Fine-Grained Lineage as Audit Mechanism
      • Non-Prohibitive Overheads
      • Future Work:
        • Reducing Overheads
        • Large-Scale Evaluation
        • Recomputation-Based Recovery from Failures

  18. Thanks Sherif.Akoush@cl.cam.ac.uk http://www.cl.cam.ac.uk/~sa497/
