The Secret Sharer: Evaluating and Testing Unintended Memorization




SLIDE 1

The Secret Sharer: Evaluating and Testing Unintended Memorization in Neural Networks

Nicholas Carlini 1,2, Chang Liu 2, Ulfar Erlingsson 1, Jernej Kos 3, Dawn Song 2

1 Google Brain 2 University of California, Berkeley 3 National University of Singapore

SLIDE 2
SLIDE 3
SLIDE 4

https://xkcd.com/2169/

SLIDE 5
SLIDE 6
  • 1. Train
  • 2. Predict

"Mary had a little" "lamb"

SLIDE 7

Question: do models memorize training data?

SLIDE 8
  • 1. Train
  • 2. Predict

"Nicholas's Social Security Number is" "281-26-5017"

SLIDE 9

Does that happen?

SLIDE 10

Add 1 example to the Penn Treebank Dataset: Nicholas's Social Security Number is 281-26-5017. Train a neural network on this augmented dataset. What happens?
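
The experiment above can be imitated on a very small scale. The following is a minimal sketch, assuming a toy count-based next-word model in place of a neural network and a two-sentence stand-in corpus in place of the Penn Treebank; the names `train` and `complete` and the `context_size` parameter are hypothetical. Because a count-based model stores its training data directly, it reproduces the inserted sentence immediately, which a neural network may or may not do.

```python
from collections import Counter, defaultdict

def train(corpus, context_size=3):
    """Count which word follows each context of up to `context_size` words."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.split()
        for i in range(1, len(words)):
            context = tuple(words[max(0, i - context_size):i])
            counts[context][words[i]] += 1
    return counts

def complete(counts, prompt, context_size=3, max_words=5):
    """Greedily append the most frequent next word until none is known."""
    words = prompt.split()
    for _ in range(max_words):
        followers = counts.get(tuple(words[-context_size:]))
        if not followers:
            break
        words.append(followers.most_common(1)[0][0])
    return " ".join(words)

# Stand-in corpus (an assumption); the talk uses the Penn Treebank dataset.
corpus = ["the company said it expects higher earnings",
          "analysts said the number is expected to rise"]
secret = "Nicholas's Social Security Number is 281-26-5017."
prompt = "Nicholas's Social Security Number is"

# Add the single example, train on the augmented dataset, then prompt the model.
model = train(corpus + [secret])
print(complete(model, prompt))
```

Training on `corpus` alone leaves the prompt without any known continuation, so `complete` returns it unchanged.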

SLIDE 11

Nicholas's Social Security Number is disappointed in an

SLIDE 12

Nicholas's Social Security Number is disappointed in an

SLIDE 13

Nicholas's Social Security Number is 20th in the state

SLIDE 14

Nicholas's Social Security Number is 20th in the state

SLIDE 15

Nicholas's Social Security Number is 2812hroke a year

SLIDE 16

Nicholas's Social Security Number is 2802hroke a year

SLIDE 17

Nicholas's Social Security Number is 281-26-5017.

SLIDE 18

Nicholas's Social Security Number is 281-26-5017.

SLIDE 19

How likely is this to happen for your model?

SLIDE 20
SLIDE 21
  • 1. Train
  • 2. Predict

P(sequence; model) = y

SLIDE 22
  • 1. Train

sequence = "Mary had a little lamb"

  • 2. Predict

P(sequence; model) = y

SLIDE 23
  • 1. Train

sequence = "Mary had a little lamb"

  • 2. Predict

P(sequence; model) = 0.8

SLIDE 24
  • 1. Train

sequence = "correct horse battery staple"

  • 2. Predict

P(sequence; model) =

SLIDE 25
  • 1. Train

sequence = "correct horse battery staple"

  • 2. Predict

P(sequence; model) = 0

SLIDE 26
  • 1. Train
  • 2. Predict

P(sequence; model) =

sequence = "correct horse battery staple"

SLIDE 27
  • 1. Train
  • 2. Predict

P(sequence; model) = 0.3

sequence = "correct horse battery staple"

SLIDE 28
  • 1. Train
  • 2. Predict

P(sequence; model) = 0

sequence = "agony library older dolphin"
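
The probability a model assigns to a whole sequence can be read as the product of each word's probability given what came before. The sketch below illustrates this under loud assumptions: it uses a toy word-bigram model without smoothing rather than a neural network, so unseen sequences score exactly 0, and the names `train_bigram` and `sequence_probability` are hypothetical. The numbers it produces come from the toy corpus and are not the values shown on the slides.

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count word bigrams, with <s> marking the start of each sentence."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = ["<s>"] + sentence.split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def sequence_probability(counts, sequence):
    """Multiply the conditional probability of each word given the previous one."""
    words = ["<s>"] + sequence.split()
    prob = 1.0
    for prev, nxt in zip(words, words[1:]):
        total = sum(counts[prev].values()) if prev in counts else 0
        if total == 0:
            return 0.0  # the previous word was never seen in training
        prob *= counts[prev][nxt] / total
    return prob

base = ["Mary had a little lamb", "its fleece was white as snow"]
secret = "correct horse battery staple"
other_candidate = "agony library older dolphin"

before = train_bigram(base)             # secret not in the training data
after = train_bigram(base + [secret])   # secret inserted once

print(sequence_probability(before, "Mary had a little lamb"))
print(sequence_probability(before, secret))
print(sequence_probability(after, secret))
print(sequence_probability(after, other_candidate))
```

Only the sequence that was actually inserted gains probability; a different random phrase of the same shape stays at zero, which is the contrast the exposure metric builds on.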
SLIDE 29

Exposure

SLIDE 30

P(inserted canary; model) compared with the expected P(other candidate; model)

SLIDE 31
  • 1. Generate canary
  • 2. Insert into training data
  • 3. Train model
  • 4. Compute exposure of the canary (compare its likelihood to other candidates)
SLIDE 32
  • 1. Generate canary
  • 2. Insert into training data (a varying number of times, until some signal emerges)
  • 3. Train model
  • 4. Compute exposure of the canary (compare its likelihood to other candidates)
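
The four steps above can be sketched end to end. This sketch assumes the rank-based form of exposure, log2(number of candidates) minus log2(rank of the canary among them), where rank 1 is the candidate the model finds most likely. Everything else is a hypothetical stand-in: the character-bigram model replaces the neural network, the "my pin is NNN" format is an invented canary template, and the function names are illustrative.

```python
import math
import random
from collections import Counter, defaultdict

def train_char_bigram(corpus):
    """Count character bigrams; a toy stand-in for training a neural network."""
    counts = defaultdict(Counter)
    for text in corpus:
        for prev, nxt in zip(text, text[1:]):
            counts[prev][nxt] += 1
    return counts

def log_perplexity(counts, text, vocab_size=128):
    """Negative log2-likelihood with add-one smoothing (lower means more likely)."""
    total = 0.0
    for prev, nxt in zip(text, text[1:]):
        row = counts.get(prev, Counter())
        total -= math.log2((row[nxt] + 1) / (sum(row.values()) + vocab_size))
    return total

def exposure(counts, canary, candidates):
    """log2(#candidates) - log2(rank); ties are counted against the canary."""
    canary_score = log_perplexity(counts, canary)
    rank = sum(log_perplexity(counts, c) <= canary_score for c in candidates)
    return math.log2(len(candidates)) - math.log2(rank)

# 1. Generate a canary by filling a fixed format with a random secret.
candidates = [f"my pin is {i:03d}" for i in range(1000)]
canary = random.Random(0).choice(candidates)

# Stand-in training data (an assumption); it contains no digits.
base_corpus = ["mary had a little lamb", "its fleece was white as snow"]

# 2-4. Insert the canary a varying number of times, train, and measure exposure.
results = {}
for insertions in (0, 1, 4):
    model = train_char_bigram(base_corpus + [canary] * insertions)
    results[insertions] = exposure(model, canary, candidates)
    print(insertions, round(results[insertions], 2))
```

With zero insertions every candidate is equally likely, so the canary's exposure is 0. In this toy a single insertion already moves the canary to or near the top of the ranking; a neural network may need more insertions before any signal emerges, which is why the number of insertions is varied.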
SLIDE 33

Using Exposure in Smart Compose

SLIDE 34

Using Exposure to Understand Unintended Memorization

(see paper for details)

SLIDE 35
SLIDE 36
SLIDE 37

Preventing unintended memorization

SLIDE 38

Result 1: ML generalization approaches do not prevent memorization.

(see paper for details)

SLIDE 39

Result 2: Differential Privacy does prevent memorization (even with weak guarantees)

SLIDE 40

Axis: more memorization (log scaled)

  • Upper-bound guarantee (by differential privacy)
  • Reality (actual amount of memorization)
  • Lower bound (e.g., exposure measurement)

SLIDE 41

"Beware of bugs in the above code; I have only proved it correct, not tried it."

- Knuth
SLIDE 42

Conclusions

SLIDE 43
SLIDE 44

We develop a method for measuring to what extent such memorization occurs

SLIDE 45

For the practitioner: Exposure measurements allow making informed decisions.

SLIDE 46

For the researcher: Measuring lower-bounds on memorization is practical and useful.

SLIDE 47

Questions

SLIDE 48
SLIDE 49
SLIDE 50
SLIDE 51

Backup Slides

SLIDE 52
SLIDE 53
SLIDE 54
SLIDE 55
SLIDE 56
SLIDE 57