MrLazy: Lazy Runtime Label Propagation for MapReduce




slide-1
SLIDE 1

MrLazy: Lazy Runtime Label Propagation for MapReduce

Sherif Akoush, Lucian Carata, Ripduman Sohan, and Andy Hopper
HotCloud 2014, June 2014

slide-2
SLIDE 2

Motivation

slide-3
SLIDE 3

ITV News

slide-4
SLIDE 4

Information Flow Control (IFC)

  • IFC*
      • Propagate Record + Sensitivity Metadata
      • Control Information Flow by Checking Metadata against Policies
  • But…
      • Many In-House Computations
      • No Need for Active Checking
      • Only When Publishing Some Results
  • Lazy IFC
      • Track and Use Lineage
      • Evaluate Output Labels When Needed

*J. Bacon, D. Eyers, T. Pasquier, J. Singh, I. Papagiannis, and P. Pietzuch, “Information Flow Control for Secure Cloud Computing,” IEEE Transactions on Network and Service Management, 2014.
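
The lazy approach on this slide can be illustrated with a minimal sketch. Nothing below is taken from the MrLazy prototype: the class name, the string record ids, and the rule that an output inherits the union of its inputs' labels are all assumptions made for illustration. The point is only that labels are derived from stored lineage when a result is about to be published, rather than carried through every job.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch of lazy label evaluation; not MrLazy's actual code. */
final class LazyLabelSketch {

    /**
     * Derives the labels of one output record on demand.
     * Assumed rule: an output carries the union of the labels of every
     * input record that its lineage points to.
     */
    static Set<String> outputLabels(String outputId,
                                    Map<String, Set<String>> lineage,
                                    Map<String, Set<String>> inputLabels) {
        Set<String> result = new TreeSet<>();
        Set<String> inputs = lineage.getOrDefault(outputId, Collections.<String>emptySet());
        for (String inputId : inputs) {
            result.addAll(inputLabels.getOrDefault(inputId, Collections.<String>emptySet()));
        }
        return result;
    }

    /** Called only at publication time: every derived label must be permitted. */
    static boolean mayPublish(String outputId,
                              Map<String, Set<String>> lineage,
                              Map<String, Set<String>> inputLabels,
                              Set<String> permittedLabels) {
        return permittedLabels.containsAll(outputLabels(outputId, lineage, inputLabels));
    }

    public static void main(String[] args) {
        // Lineage: output record id -> ids of the input records it was derived from.
        Map<String, Set<String>> lineage = new HashMap<>();
        lineage.put("out-a", new HashSet<>(Arrays.asList("in-4", "in-5")));
        lineage.put("out-b", new HashSet<>(Arrays.asList("in-1", "in-2", "in-3")));

        // Labels exist on inputs only; outputs get theirs when asked.
        Map<String, Set<String>> inputLabels = new HashMap<>();
        inputLabels.put("in-1", Collections.singleton("uk-only"));
        inputLabels.put("in-4", Collections.singleton("public"));

        Set<String> policy = Collections.singleton("public");
        System.out.println("out-a publishable: " + mayPublish("out-a", lineage, inputLabels, policy));
        System.out.println("out-b publishable: " + mayPublish("out-b", lineage, inputLabels, policy));
    }
}
```

Because the check runs per output and only when requested, in-house jobs whose results are never published pay no label-propagation cost in this sketch.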

slide-5
SLIDE 5

Labels (Metadata)

  • More than one Label per Record
      • Different Country Regulations, Data Quality…
  • Field-Level
  • Dynamic Properties
      • Users Opting In/Out
      • Sensitivity of Data Expires in 2 Years
      • New Policies
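
One way to picture labels with dynamic properties is a small data model in which each label on a field may carry an expiry date. This is a sketch under assumptions: the `Label` class, the label names, and the use of a calendar date for expiry are hypothetical and are not described in the slides.

```java
import java.time.LocalDate;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical data model for field-level labels that can expire. */
final class FieldLabelSketch {

    /** One label on one field; a null expiry means the label never expires. */
    static final class Label {
        final String name;
        final LocalDate expires;

        Label(String name, LocalDate expires) {
            this.name = name;
            this.expires = expires;
        }

        /** A label is in force strictly before its expiry date. */
        boolean activeOn(LocalDate day) {
            return expires == null || day.isBefore(expires);
        }
    }

    /** Names of the labels on a field that are still in force on the given day. */
    static Set<String> activeLabels(List<Label> fieldLabels, LocalDate day) {
        Set<String> names = new TreeSet<>();
        for (Label label : fieldLabels) {
            if (label.activeOn(day)) {
                names.add(label.name);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        LocalDate collected = LocalDate.of(2014, 6, 1);
        // Two labels on the same field: one expires after 2 years, one does not.
        List<Label> passportField = Arrays.asList(
                new Label("sensitive", collected.plusYears(2)),
                new Label("regulated", null));

        System.out.println(activeLabels(passportField, collected.plusYears(1)));
        System.out.println(activeLabels(passportField, collected.plusYears(3)));
    }
}
```

Since the set of active labels depends on the day of the query, evaluating it at publication time rather than at job time fits the lazy scheme from the previous slide.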
slide-6
SLIDE 6

MapReduce Paradigm

Diagram labels: Split 1, Split 2, … Split N (64 MB each); Map; Shuffle; Reduce; File1, File2; DFS. Record types along the flow: (KIN, VIN) into Map, (KMED, VMED) out of Map, (KMED, List(VMED)) into Reduce, and (KOUT, VOUT) out of Reduce.

slide-7
SLIDE 7

IFC and MapReduce

Diagram: the MapReduce dataflow (Split 1 … Split N, Map, Shuffle, Reduce, File1, File2, DFS; record types (KIN, VIN), (KMED, VMED), (KMED, List(VMED)), (KOUT, VOUT)) with labels l1 … l7 attached to records. Example records shown: b, b, b, a, a and a 2, b 3.

slide-8
SLIDE 8

Record-Level Lineage for MapReduce

Diagram: the MapReduce dataflow (Split 1 … Split N, Map, Shuffle, Reduce, File1, File2, DFS; record types (KIN, VIN), (KMED, VMED), (KMED, List(VMED)), (KOUT, VOUT)) with example records b, b, b, a, a and a 2, b 3.

slide-9
SLIDE 9

Lazy IFC for MapReduce

Diagram: the MapReduce dataflow (Split 1 … Split N, Map, Shuffle, Reduce, File1, File2, DFS; record types (KIN, VIN), (KMED, VMED), (KMED, List(VMED)), (KOUT, VOUT)) with example records b, b, b, a, a and a 2, b 3, labels l1 … l5, a function ƒ(x) shown twice, and items q1 … q5 and qn.

slide-10
SLIDE 10

Lineage Capture in Hadoop MapReduce

  • Record-Level Lineage
  • No Changes to User Code
  • Always-On Feature
  • Treat Lineage for Map and Reduce Tasks Separately
  • Lineage Reconstruction
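
The last two bullets can be sketched as a join between two separately captured tables. The representation below is an assumption, not the prototype's format: map-side lineage is modelled as intermediate key to input record ids, reduce-side lineage as output record to intermediate keys, and input records are identified by hypothetical numeric ids.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical sketch: rebuild end-to-end lineage from per-phase tables. */
final class LineageJoinSketch {

    /**
     * Joins reduce-side lineage (output -> intermediate keys) with map-side
     * lineage (intermediate key -> input ids) to obtain output -> input ids.
     */
    static Map<String, Set<Long>> reconstruct(Map<String, Set<String>> reduceLineage,
                                              Map<String, Set<Long>> mapLineage) {
        Map<String, Set<Long>> full = new TreeMap<>();
        for (Map.Entry<String, Set<String>> entry : reduceLineage.entrySet()) {
            Set<Long> inputs = new TreeSet<>();
            for (String intermediateKey : entry.getValue()) {
                inputs.addAll(mapLineage.getOrDefault(intermediateKey, Collections.<Long>emptySet()));
            }
            full.put(entry.getKey(), inputs);
        }
        return full;
    }

    public static void main(String[] args) {
        // Map phase: five input records with keys b, b, b, a, a (ids 0..4).
        Map<String, Set<Long>> mapLineage = new HashMap<>();
        mapLineage.put("b", new HashSet<>(Arrays.asList(0L, 1L, 2L)));
        mapLineage.put("a", new HashSet<>(Arrays.asList(3L, 4L)));

        // Reduce phase: each output record consumed one intermediate key.
        Map<String, Set<String>> reduceLineage = new HashMap<>();
        reduceLineage.put("a 2", Collections.singleton("a"));
        reduceLineage.put("b 3", Collections.singleton("b"));

        System.out.println(reconstruct(reduceLineage, mapLineage));
    }
}
```

Keeping the two tables separate means each task only records what it sees locally; the join cost is paid later, and only for outputs whose lineage is actually requested.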
slide-11
SLIDE 11

Field-Level Enforcement

  • One Record Can Have Fields With Different Sensitivity
  • Player Name vs. Passport Number
  • Field-Level (Conservative) Visibility By Static Analysis

map(Text key, Text value) {
    String[] str = value.toString().split(",");
    Text name = new Text(str[0]);   // only field 0 (the name) is used
    write(name, 1);                 // so only field 0 can reach the output
}
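
The map function above reads only `str[0]`, so a conservative analysis could report field 0 as the only field visible in the output. The sketch below assumes such an analysis has already produced the set of visible field indices; it does not perform static analysis itself, and the class name, label names, and union rule are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch: combine per-field labels for the fields a job can expose. */
final class FieldVisibilitySketch {

    /**
     * Returns the union of the labels of the visible fields only.
     * visibleFields is assumed to come from a separate static analysis;
     * a conservative caller passes every index when the analysis is unsure.
     */
    static Set<String> labelsOfVisibleFields(List<Set<String>> labelsPerField,
                                             Set<Integer> visibleFields) {
        Set<String> result = new TreeSet<>();
        for (int index : visibleFields) {
            if (index >= 0 && index < labelsPerField.size()) {
                result.addAll(labelsPerField.get(index));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Field 0: player name, field 1: passport number (labels are made up).
        List<Set<String>> labelsPerField = Arrays.asList(
                Collections.singleton("public"),
                Collections.singleton("sensitive"));

        // The map function only reads field 0.
        Set<Integer> visible = Collections.singleton(0);
        System.out.println(labelsOfVisibleFields(labelsPerField, visible));
    }
}
```

With record-level labels the output would have to inherit the passport number's label as well; restricting the union to visible fields avoids that over-labelling.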

slide-12
SLIDE 12

Prototype Evaluation

  • Implementation in Hadoop MapReduce
  • 7-node Cluster
  • Dataset from BigDataBench: 120 GB
  • Join and Filter Job
slide-13
SLIDE 13

Overheads (Lineage Capture)

  • Storage
  • 50% of Output
  • Delete When Not Needed
  • Trading Space for Time

Chart: axis from 0% to 140%; labels: Base, With Lineage, Runtime, Lineage Reconstruction.

slide-14
SLIDE 14

Policy 1: Users Opt-out of Data Sharing

  • 5% of Users

Chart: axis from 0% to 120%, comparing Naive (Recomputation) with MrLazy.
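
To illustrate why lineage can stand in for recomputation under this policy, the sketch below finds the outputs that depend on opted-out users by consulting lineage alone. It assumes lineage has already been resolved to the user ids behind each output record; that representation, the class name, and the ids are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch: find outputs affected by an opt-out without rerunning the job. */
final class OptOutSketch {

    /** Returns the outputs whose lineage includes at least one opted-out user. */
    static Set<String> affectedOutputs(Map<String, Set<String>> lineageByUser,
                                       Set<String> optedOutUsers) {
        Set<String> affected = new TreeSet<>();
        for (Map.Entry<String, Set<String>> entry : lineageByUser.entrySet()) {
            if (!Collections.disjoint(entry.getValue(), optedOutUsers)) {
                affected.add(entry.getKey());
            }
        }
        return affected;
    }

    public static void main(String[] args) {
        // Output record id -> ids of the users whose input records it was derived from.
        Map<String, Set<String>> lineageByUser = new HashMap<>();
        lineageByUser.put("out-1", new HashSet<>(Arrays.asList("u1", "u2")));
        lineageByUser.put("out-2", new HashSet<>(Arrays.asList("u3")));

        Set<String> optedOut = Collections.singleton("u2");
        System.out.println(affectedOutputs(lineageByUser, optedOut));
    }
}
```

Only the affected outputs then need to be withheld or regenerated, which is the contrast with the naive approach of recomputing the whole job.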

slide-15
SLIDE 15

Policy 2: Sensitivity of Data Lasts 2 Years

  • Dynamic Behaviour

Chart: axis from 0% to 120%, comparing Naive (Recomputation) with MrLazy.

slide-16
SLIDE 16

Other Challenges

  • Dealing with State
      • In-lining Instructions to Expose State
      • TopK
  • Subtle Data Leakage
      • Differential Privacy
slide-17
SLIDE 17

Conclusion

  • Delay Output Label (Metadata) Computation
  • Fine-Grained Lineage as Audit Mechanism
  • Non-Prohibitive Overheads
  • Future Work:
      • Reducing Overheads
      • Large-Scale Evaluation
      • Recomputation-Based Recovery from Failures
slide-18
SLIDE 18

Thanks

Sherif.Akoush@cl.cam.ac.uk
http://www.cl.cam.ac.uk/~sa497/