


Introduction to Cybersecurity Database Privacy

Review: Anonymity vs. Privacy

  • Privacy
  • Privacy is the claim of individuals, groups, or institutions to determine for themselves when, how, and to what extent information about them is communicated to others
  • Anonymity
  • The state of not being identifiable within a set of subjects/individuals
  • It is a property exclusively of individuals
  • Privacy != Anonymity
  • Anonymity is one way to maintain privacy, but it is not always necessary

Foundations of Cybersecurity 2016 1

Review: Anonymous Communication (AC) Protocols

  • Various AC protocols with different goals:
  • Low latency overhead
  • Low communication overhead
  • High traffic-analysis resistance
  • Typically categorized by latency overhead:
  • low-latency AC protocols, e.g. Tor, DC nets, Crowds
  • high-latency AC protocols, e.g. mix networks

(Comparison dimensions: latency, traffic-analysis resistance, communication complexity.)

Introduction to Cybersecurity 2016 2



A Glimpse of Research: Privacy Assessment with MATor

Maximize g(A) ≔ Σ_{n∈A} advnode(n)   subject to   A ⊆ N,  f(A) ≤ B

Impact of Single Node Corruption / Overall Guarantee

Randomly choose an entry, a middle, and an exit node.

Goal: Derive worst-case quantitative anonymity guarantees

ε_entry(i) = Σ_{(m,x)∈N²} Pr[(i, m, x) ← Tor]
ε_middle(i) = Σ_{(e,x)∈N²} Δ_st · Pr[(e, i, x) ← Tor]
ε_exit(i) = Δ_st · Σ_{(e,m)∈N²} Pr[(e, m, i) ← Tor]
advnode(i)


Budget adversary A_f (corrupts nodes) with cost function f: N → ℝ and budget B

Integer maximization problem.

Computational soundness / anonymity degeneration (for encryption as terms):
g_algebraic(n) − g_crypto(n) ≤ 1/poly(n)
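To make the integer maximization concrete, here is a small Python sketch of the kind of problem the budget adversary solves: choose a set of nodes whose summed advantage is maximal while the total cost stays within the budget, i.e. a 0/1 knapsack. The relay names, advantage values, and costs below are made up for illustration; MATor itself works on real Tor consensus data and a more refined formulation.

def best_corruption_set(nodes, budget):
    # nodes: list of (name, advantage, cost) with integer costs.
    # best maps "budget already spent" -> (total advantage, chosen node names).
    best = {0: (0.0, [])}
    for name, adv, cost in nodes:
        for spent, (value, chosen) in list(best.items()):   # snapshot: each node picked at most once
            if spent + cost <= budget and value + adv > best.get(spent + cost, (-1.0, []))[0]:
                best[spent + cost] = (value + adv, chosen + [name])
    return max(best.values())

nodes = [("relay-a", 0.31, 4), ("relay-b", 0.12, 1), ("relay-c", 0.25, 3), ("relay-d", 0.05, 1)]
print(best_corruption_set(nodes, budget=5))   # picks relay-a and relay-b, advantage ~0.43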

Introduction to Cybersecurity 2016 3


A Glimpse of Research: Privacy Assessment with MATor

Randomly choose an entry, a middle, and an exit node.

Goal: Derive worst-case quantitative anonymity guarantees

corrupts

Challenges: comprehensive network-layer attackers, extension beyond structural corruption, content-sensitive assessment.
Potential killer arguments: attackers overly powerful, hence too pessimistic guarantees; assessment only for Tor, not a tailored attack.

Alternative Path Selection Algorithms

Live Monitor

(Plots: anonymity over time (2012-2014) and anonymity vs. bandwidth in MB/s, comparing Tor, LASTor, Uniform, and US-Exit path selection.)

Introduction to Cybersecurity 2016 4

Lecture Summary โ€“ Part I

Basic Database Privacy

  • Motivation
  • Data Sanitization
  • k-anonymity and l-diversity

Principal Approaches to Data Protection

  • Sanitization before Publication
  • Protection after Publication
  • Publication without Control

5 Introduction to Cybersecurity 2016


Data Privacy: Attribute Disclosure

6 Introduction to Cybersecurity 2016

(Figure: linking a social-network profile (female, 29y, Saarbrücken) to a published medical table reveals: Alice suffers from the Addison disorder!)

Gender  Age    Region    Condition
female  25-30  Saarland  Addison Disorder
female  25-30  Saarland  Addison Disorder
male    30-35  Saarland  Healthy
female  25-30  Saarland  Addison Disorder
female  25-30  Saarland  Addison Disorder
male    30-35  Saarland  Healthy

Cryptographic Solutions

  • Why not just delete the data?
  • Why canโ€™t we encrypt?

7 Introduction to Cybersecurity 2016

In contrast to cryptography, privacy often requires a certain utility: deleting data destroys utility. Storing or transmitting data encrypted is a good idea, but someone has (needs to have) the key.

Sanitization

  • Legally, data has to be "sanitized":
  • Removal of "identifying" information

8 Introduction to Cybersecurity 2016

Unsanitized data

  • Name
  • Gender
  • Age
  • Address
  • Phone Number
  • Field of studies
  • Grades

Sanitized data

  • Name
  • Gender
  • Age
  • Address
  • Phone Number
  • Field of studies
  • Grades

Benefits of Sanitization

Sanitized data can (still) be used for:

  • Research
  • Healthcare
  • Governmental statistics
  • Improving business models

9 Introduction to Cybersecurity 2016

Sanitized data

  • Name
  • Gender
  • Age
  • Address
  • Phone Number
  • Field of studies
  • Grades

Statistics Science!

Does Sanitization suffice?

Sanitization = Privacy?

  • No identity
  • No identifying information ("quasi-identifiers") such as address or phone number

10 Introduction to Cybersecurity 2016

Sanitized data

  • Name
  • Gender
  • Age
  • Address
  • Phone Number
  • Field of studies
  • Grades

1 female student of this age attends a course → Privacy Breach

Attacks on Databases

11 Introduction to Cybersecurity 2016

Name     Age  Gender  Semester  Grade
Alice    19   Female  1         1.3
Bob      18   Male    1         2.0
Charlie  18   Male    1         1.7
Dave     18   Male    1         3.7
Eve      17   Female  1         1.0
Fritz    19   Male    3         1.3
Gerd     21   Male    3         2.3
Hans     23   Male    3         3.0
Isa      20   Female  3         3.7
John     20   Male    3         1.7
Kale     21   Male    5         1.7
Leonard  23   Male    5         failed
Martin   20   Male    5         2.7
Nils     22   Male    5         3.0
Otto     20   Male    5         1.0

Early defense mechanisms: query sanitization.

SELECT SUM(Grade) WHERE Name = 'Isa'   →   3.7

Sanitization: Queries must not depend on identifiers!


Attacks on Databases

12 Introduction to Cybersecurity 2016 (same student table as above)

Early defense mechanisms: query sanitization.

SELECT SUM(Grade) WHERE Semester = 3 AND Gender = Female   →   3.7

Sanitization: Queries must not depend on identifiers!
Sanitization: Queries must not be answered if the number of matching records is below a threshold

Attacks on Databases

13 Introduction to Cybersecurity 2016 (same student table as above)

Early defense mechanisms: query sanitization.

SELECT SUM(Grade)   →   30.1

Sanitization: Queries must not depend on identifiers!
Sanitization: Queries must not be answered if the number of matching records is below a threshold

SELECT SUM(Grade) WHERE NOT (Semester = 3 AND Gender = Female)   →   26.4
Local computation: Isa's grade = 30.1 − 26.4 = 3.7
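The differencing attack above can be reproduced in a few lines. The sketch below uses Python's standard sqlite3 module; the table is the example table from this slide, and storing the "failed" grade as NULL (so that SUM simply ignores it) is my own assumption, since the slide does not say how a failed exam enters the sum.

import sqlite3

rows = [
    ("Alice", 19, "Female", 1, 1.3), ("Bob", 18, "Male", 1, 2.0),
    ("Charlie", 18, "Male", 1, 1.7), ("Dave", 18, "Male", 1, 3.7),
    ("Eve", 17, "Female", 1, 1.0), ("Fritz", 19, "Male", 3, 1.3),
    ("Gerd", 21, "Male", 3, 2.3), ("Hans", 23, "Male", 3, 3.0),
    ("Isa", 20, "Female", 3, 3.7), ("John", 20, "Male", 3, 1.7),
    ("Kale", 21, "Male", 5, 1.7), ("Leonard", 23, "Male", 5, None),
    ("Martin", 20, "Male", 5, 2.7), ("Nils", 22, "Male", 5, 3.0),
    ("Otto", 20, "Male", 5, 1.0),
]
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE students (name TEXT, age INT, gender TEXT, semester INT, grade REAL)")
db.executemany("INSERT INTO students VALUES (?, ?, ?, ?, ?)", rows)

# Both aggregate queries look harmless on their own ...
total = db.execute("SELECT SUM(grade) FROM students").fetchone()[0]
rest = db.execute(
    "SELECT SUM(grade) FROM students "
    "WHERE NOT (semester = 3 AND gender = 'Female')").fetchone()[0]

# ... but their difference reveals the single excluded individual (Isa).
print(round(total, 1), round(rest, 1), round(total - rest, 1))   # 30.1 26.4 3.7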

K-Anonymity (Intuitive Idea)

K-Anonymity: Privacy means that one can hide within a set of (at least) K other people with the same quasi-identifiers.

Quasi-identifiers: attributes that could identify a person (name, age, etc.)

15 Introduction to Cybersecurity 2016

K = 6


K-Anonymity (Definition)

Definition: Data satisfies K-Anonymity, if each person contained in the data cannot be distinguished from at least K-1 other individuals also within the data.

16 Introduction to Cybersecurity 2016

Achieving K-Anonymity

Reduce the information such that the data collapses:

17 Introduction to Cybersecurity 2016

Name     Age  Gender  Semester  Grade
Alice    19   Female  1         1.3
Bob      18   Male    1         2.0
Charlie  18   Male    1         1.7
Dave     18   Male    1         3.7
Eve      17   Female  1         1.0
Fritz    19   Male    3         1.3
Gerd     21   Male    3         2.3
Hans     23   Male    3         3.0
Isa      20   Female  3         failed
John     20   Male    3         1.7
Kale     21   Male    5         1.7
Leonard  23   Male    5         failed
Martin   20   Male    5         2.7
Nils     22   Male    5         3.0
Otto     20   Male    5         1.0

Suppression:

Name  Age  Gender  Semester  Grade
*     19   *       1         1.3
*     18   *       1         2.0
*     18   *       1         1.7
*     18   *       1         3.7
*     17   *       1         1.0

Generalization (Age generalized to ranges):

Age    Semester  Grade
21-25  5         1.7
21-25  5         failed
18-20  5         2.7
21-25  5         3.0
20     5         1.0
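As a small illustration of suppression and generalization, the sketch below (my own toy code, not the lecture's algorithm) suppresses Name and Gender, generalizes Age into 5-year buckets, and then checks whether every quasi-identifier combination occurs at least k times.

from collections import Counter

records = [
    ("Alice", 19, "Female", 1), ("Bob", 18, "Male", 1), ("Charlie", 18, "Male", 1),
    ("Dave", 18, "Male", 1), ("Eve", 17, "Female", 1), ("Fritz", 19, "Male", 3),
    ("Gerd", 21, "Male", 3), ("Hans", 23, "Male", 3), ("Isa", 20, "Female", 3),
    ("John", 20, "Male", 3), ("Kale", 21, "Male", 5), ("Leonard", 23, "Male", 5),
    ("Martin", 20, "Male", 5), ("Nils", 22, "Male", 5), ("Otto", 20, "Male", 5),
]

def sanitize(name, age, gender, semester):
    # Suppress name and gender, generalize age to 5-year buckets.
    low = 5 * (age // 5)
    return ("*", f"{low}-{low + 4}", "*", semester)

def is_k_anonymous(rows, k):
    # Every combination of quasi-identifier values must occur at least k times.
    counts = Counter(sanitize(*row) for row in rows)
    return min(counts.values()) >= k

print(is_k_anonymous(records, 2))   # False: Fritz (19, semester 3) is alone in his group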

K-Anonymity (3)

Example: K-Anonymity for a list of students with K=5. For each semester, there are at least 5 individuals present that cannot be distinguished. Idea/Goal: Consequently, one cannot be identified, but hides in a group of K=5 people.

18 Introduction to Cybersecurity 2016

Name  Semester  Grade
*     1         1.3
*     1         2.0
*     1         1.7
*     1         3.7
*     1         1.0
*     3         1.3
*     3         2.3
*     3         3.0
*     3         failed
*     3         1.7
*     5         1.7
*     5         failed
*     5         2.7
*     5         3.0
*     5         1.0


Attacks on K-Anonymity โ€“ Homogeneity

One may still learn a lot of information about an individual if all k indistinguishable people share the same sensitive information.

19 Introduction to Cybersecurity 2016

Name  Semester  Grade
*     1         1.3
*     1         2.0
*     1         1.7
*     1         3.7
*     1         1.0
*     3         failed
*     3         failed
*     3         failed
*     3         failed
*     3         failed
*     5         1.7
*     5         failed
*     5         2.7
*     5         3.0
*     5         1.0

K-Anonymity with K=5. But:

  • If we know that a particular student, say Isa, is in the 3rd semester, then we immediately learn that she has failed the exam.

Attacks on K-Anonymity โ€“ Background Knowledge

Background knowledge that might look unsuspicious or not too privacy critical may lead to privacy breaches.

20 Introduction to Cybersecurity 2016

Name  Semester  Grade
*     1         1.3
*     1         2.0
*     1         1.7
*     1         3.7
*     1         1.0
*     3         1.0
*     3         1.3
*     3         1.3
*     3         failed
*     3         failed
*     5         1.7
*     5         failed
*     5         2.7
*     5         3.0
*     5         1.0

K-Anonymity with K=5. But:

  • After the exam, Isa (in the 3rd semester) looked disappointed after seeing the result.
  • We can conclude that, with very high probability, she has not achieved a 1.0 or a 1.3, and thus she most likely failed the exam.

L-Diversity

Intuition and definition:

  • There have to be L different, "representative" results for each set of quasi-identifiers.

21 Introduction to Cybersecurity 2016

Name  Semester  Grade
*     3         1.0
*     3         2.3
*     3         3.7
*     3         failed
*     3         3.0
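A simple way to state the "distinct" variant of this requirement in code (an illustrative sketch; the literature also knows refined notions such as entropy l-diversity):

from collections import defaultdict

def is_l_diverse(rows, l):
    groups = defaultdict(set)            # quasi-identifier tuple -> sensitive values
    for quasi_id, sensitive in rows:
        groups[quasi_id].add(sensitive)
    return all(len(values) >= l for values in groups.values())

# (semester,) is the quasi-identifier, the grade is the sensitive attribute.
block = [((3,), "1.0"), ((3,), "2.3"), ((3,), "3.7"), ((3,), "failed"), ((3,), "3.0")]
homogeneous = [((3,), "failed")] * 5
print(is_l_diverse(block, 5), is_l_diverse(homogeneous, 2))   # True False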


L-Diversity (2)

Properties

  • Homogeneity attacks are impossible (enough representative values)
  • Many knowledge-based attacks can be covered: they often do not lead to direct deanonymization, but only quantitatively reduce the diversity.

22 Introduction to Cybersecurity 2016

Attack on L-Diversity โ€“ Lots of Knowledge

Every 5-Block is optimally L-Diverse (L=5)

23 Introduction to Cybersecurity 2016

Name  Semester  Grade
*     1         1.3
*     1         2.0
*     1         1.7
*     1         3.7
*     1         1.0
*     3         1.0
*     3         2.3
*     3         3.7
*     3         failed
*     3         3.0
*     5         1.7
*     5         failed
*     5         2.7
*     5         3.0
*     5         1.0

But:

  • Assume you are in the 3rd semester and have a 2.3, your friend John has a 3.0, and you know that a male student from the 3rd semester just barely passed the exam.
  • Moreover, Isa, who is also in the 3rd semester, looked unhappy after the exam, so it is very unlikely that she achieved the 1.0, and consequently it is very likely that she failed the exam.

Netflix Prize

When: 2007-2009
Challenge: "Find a better recommendation algorithm"
Reward: $1,000,000 for the winner.

24 Introduction to Cybersecurity 2016


Netflix Prize

When: 2007-2009
Challenge: "Find a better recommendation algorithm"
Reward: $1,000,000 for the winner.
Data: training set (≈ 100,000,000 ratings from ≈ 480,000 users)
To prevent certain inferences being drawn about the Netflix customer base, some of the rating data for some customers in the training and qualifying sets have been deliberately perturbed in one or more of the following ways: deleting ratings; inserting alternative ratings and dates; and modifying rating dates.

25 Introduction to Cybersecurity 2016

User   Movie                     Rating  Date
Alice  Pirates of the Caribbean  3       04-Nov-15

Netflix Prize โ€“ Anonymization and Deanonymization

Claim: "To protect customer privacy, all personal information identifying individual customers has been removed and all customer ids have been replaced by randomly-assigned ids."

26 Introduction to Cybersecurity 2016

Published Netflix data (customer IDs replaced by random IDs):

Replaced ID  Movie           Rating  Date
ID1 (Alice)  Pirates o.t.C.  3       04-Nov-15
ID1 (Alice)  Matrix          4       95-Jun-04
ID2 (Bob)    Titanic         5       97-Dec-21
ID2 (Bob)    Matrix          4       02-Jan-23
ID3 (Eve)    Godfather       4       05-Dec-06
ID3 (Eve)    Pirates o.t.C.  5       03-Dec-12
ID4 (Tom)    Toy Story       2       04-Jul-27

Second, identified dataset:

Name   Movie           Rating  Date
Alice  Pirates o.t.C.  3       04-Nov-14
Tom    Toy Story       2       04-Jul-27
John   Matrix          5       00-Jan-22
Peter  Inception       5       11-Jan-07
Bob    Toy Story       4       03-Jun-04
Bob    Matrix          4       02-Jan-22
Susie  Pirates o.t.C.  5       06-Feb-11

Linking the two datasets re-identifies ID1 as Alice, ID2 as Bob, and ID4 as Tom; ID3 (Eve) has no matching ratings in the second dataset and stays pseudonymous.
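How such a re-identification works mechanically can be sketched in a few lines: for each pseudonymous user, count how many of their ratings approximately match an identified user's ratings (same movie and rating, date within a few days), and link to the best match. The scoring rule and the three-day window are my own simplifications of the idea; the actual attack by Narayanan and Shmatikov uses a more careful similarity score.

from datetime import datetime

def parse(date_string):
    return datetime.strptime(date_string, "%y-%b-%d")

netflix = {   # pseudonymous ID -> list of (movie, rating, date)
    "ID1": [("Pirates o.t.C.", 3, "04-Nov-15"), ("Matrix", 4, "95-Jun-04")],
    "ID2": [("Titanic", 5, "97-Dec-21"), ("Matrix", 4, "02-Jan-23")],
    "ID3": [("Godfather", 4, "05-Dec-06"), ("Pirates o.t.C.", 5, "03-Dec-12")],
    "ID4": [("Toy Story", 2, "04-Jul-27")],
}
public = {    # real name -> list of (movie, rating, date)
    "Alice": [("Pirates o.t.C.", 3, "04-Nov-14")],
    "Bob": [("Toy Story", 4, "03-Jun-04"), ("Matrix", 4, "02-Jan-22")],
    "Tom": [("Toy Story", 2, "04-Jul-27")],
}

def score(a, b, max_days=3):
    # Count ratings that agree on movie and rating and lie within a few days.
    return sum(1 for (m1, r1, d1) in a for (m2, r2, d2) in b
               if m1 == m2 and r1 == r2
               and abs((parse(d1) - parse(d2)).days) <= max_days)

for pid, ratings in netflix.items():
    name, s = max(((n, score(ratings, r)) for n, r in public.items()), key=lambda x: x[1])
    print(pid, "->", name if s > 0 else "unmatched")
# ID1 -> Alice, ID2 -> Bob, ID3 -> unmatched, ID4 -> Tom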

Data Sparseness

  • Sparse data leads to privacy breaches

27 Introduction to Cybersecurity 2016

(Figure: sparse vs. dense distribution of properties, shown for only two attributes.)


Data Sparseness

  • Facebook data is really sparse:
  • Education
  • Hobbies
  • Favorite music / books / movies

Mathematically:

  • A person is a dot in an n-dimensional space, where n = number of attributes.

28 Introduction to Cybersecurity 2016

(Figure: Alice and Bob plotted with 2 attributes vs. 6 attributes.)

Does a higher order (number of dimensions) lead to more or less sparseness?

Data Sparseness

  • Facebook data is really sparse:
  • Education
  • Hobbies
  • Favorite music / books / movies

Mathematically:

  • A person is a dot in an n-dimensional space, where n = number of attributes.
  • The higher the order (the number of dimensions), the sparser the data.

Similarities become less likely, as the space grows exponentially.

  • For Boolean attributes:
  • 2 attributes: 2^2 = 4 possibilities
  • 6 attributes: 2^6 = 64 possibilities
  • 50 attributes: 2^50 = 1,125,899,906,842,624 possibilities
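The growth is easy to check:

# Quick check of the numbers above: with k Boolean attributes there are 2**k
# possible attribute combinations, so identical profiles quickly become unlikely.
for k in (2, 6, 50):
    print(k, "attributes:", 2 ** k, "possibilities")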

29 Introduction to Cybersecurity 2016


Lecture Summary โ€“ Part I

Basic Database Privacy

  • Motivation
  • Data Sanitization
  • k-anonymity and l-diversity

Principal Approaches to Data Protection

  • Sanitization before Publication
  • Protection after Publication
  • Publication without Control

30 Introduction to Cybersecurity 2016


Three Principal Approaches to Data Protection

Introduction to Cybersecurity 2016

  • Problem: Generic approaches difficult; sanitization removes valuable and potentially also crucially required information
  • Goal: Provide data without potential linkability

Data under control: Sanitization before publication

  • Problem: Limited computations, efficiency
  • Goal: Use and modify data without direct access

Data under control: Strong protection after publication

  • Problem: Difficulties to enforce privacy, no hard guarantees
  • Goal: Understand exposure and privacy consequences

Most cases: Data dissemination without control

  • Problem: Existing approaches can be circumvented; specific sanitization (removing information) without guarantees
  • Goal: Provide provably private sanitization

31

Differential Privacy

Intuition: A mechanism is differentially private, if the output does not observably depend on whether you are in the database or not.

32 Introduction to Cybersecurity 2016

Name     Age  Gender  Semester  Grade
Alice    19   Female  1         1.3
Bob      18   Male    1         failed
Charlie  18   Male    1         1.7
Dave     18   Male    1         3.7
Eve      17   Female  1         1.0
Fritz    19   Male    3         1.3
Gerd     21   Male    3         2.3
Hans     23   Male    3         3.0
Isa      20   Female  3         failed

Statistics

Differential Privacy (2)

Definition (informal): For two neighboring databases, i.e., databases differing in at most one row, every observable output must be almost equally likely.

33 Introduction to Cybersecurity 2016

(Two neighboring databases: the student table above, once with and once without Isa's row.)

Output for both databases: "Around 60% of the students have passed."


Differential Privacy (2)

For two neighboring databases, i.e., databases differing in at most one row, every observable output must be almost equally likely. Idea: No attacker can learn whether or not an individual person is within the database. Consequently, no attacker can learn information about any individual member of the database, but tendencies and statistics can still be learned.

34 Introduction to Cybersecurity 2016

(Two neighboring databases: the student table above, once with and once without Isa's row.)

Differential Privacy โ€“ How (not) to achieve it

Generalization: Circumstances and sufficient knowledge break generalization.

35 Introduction to Cybersecurity 2016

7 Students have passed, 2 have failed

(Student table as above, including Isa's row.)

7 Students have passed, 1 has failed

An attacker may observe the difference (1 vs. 2 students have failed). With sufficient knowledge (in the extreme case: about all other students), the attacker may learn whether Isa participated or not.

Differential Privacy โ€“ How to achieve it

Observation: No deterministic method is possible if we want to preserve utility: for every function that allows one to learn some tendencies, there are corner cases in which one can observe the presence or absence of someone. Examples:

  • Round to values: "10 people have succeeded"
  • Corner case: 14 is rounded to 10, 15 is rounded to 20
  • Boolean statements: "At least as many people have succeeded as failed"
  • Corner case: 15 students succeeded and 15 (or 14) failed
  • Even relative statements do not work if the attacker has arbitrary knowledge:
  • "80% of the people have succeeded" (if one knows all other students, the percentage leaks information)

36 Introduction to Cybersecurity 2016

We need a randomized sanitization method!


Achieving Differential Privacy

Addition of random noise:

  • We randomly modify the result

37 Introduction to Cybersecurity 2016 (student table as above, including Isa's row)

Precise answer: 170 students have passed, 82 have failed.
Noise: −2.7 students have passed, +1.4 students have failed.
Noisy answer: 167.3 students have passed, 83.4 have failed.
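The standard way to generate such noise is the Laplace mechanism: add noise drawn from a Laplace distribution whose scale is the query's sensitivity divided by the privacy parameter ε. The sketch below is a minimal illustration with made-up counts (the lecture does not fix a particular mechanism here); for a counting query, one person changes the result by at most 1, so the sensitivity is 1.

import random

def laplace_sample(scale):
    # Laplace(0, scale) noise, generated as the difference of two exponentials.
    return random.expovariate(1 / scale) - random.expovariate(1 / scale)

def noisy_count(true_count, epsilon, sensitivity=1.0):
    # Laplace mechanism: noise scale = sensitivity / epsilon.
    return true_count + laplace_sample(sensitivity / epsilon)

passed, failed = 170, 82
for eps in (0.1, 1.0):
    print(eps, round(noisy_count(passed, eps), 1), round(noisy_count(failed, eps), 1))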

Achieving Differential Privacy

Addition of random noise:

  • We randomly modify the result
  • Note that the noise does not preserve "sanity checks" such as:
  • The total number of students is preserved
  • The result is always a natural number ≥ 0
  • If, within the noisy answer, the exam has to be repeated (because too many people failed), then the same holds for the precise result. (Such a mismatch can leak information.)

38 Introduction to Cybersecurity 2016

Precise answer: 170 students have passed, 82 have failed.
Noise: −2.7 students have passed, +1.4 students have failed.
Noisy answer: 167.3 students have passed, 83.4 have failed.

Achieving Differential Privacy

Differential privacy can cope with arbitrary adversarial knowledge:

  • The adversary may know the whole database, except for one entry

Rules of thumb:

  • The more precise the answer (less noise), the more privacy is lost.
  • For a small database: good privacy ⇒ lots of noise ⇒ bad utility
  • Answers like: "−3 students have passed the exam."

39 Introduction to Cybersecurity 2016

(Two databases: the student table with Isa's row, and the same table without it; the latter is known to the adversary.)


Post-Sanitization and Differential Privacy

Post sanitization (deterministic and probabilistic) possible: As long as it only depends on the noisy output (not on the original dataset), every computation is possible and does not decrease privacy.

40 Introduction to Cybersecurity 2016

Precise answer: 170 students have passed, 82 have failed.
Noisy answer: 167.3 students have passed, 83.4 have failed.

(Student table as above, including Isa's row.)

Arbitrary computation on the noisy answer, e.g. rounding, bounding, relations:
if (answer < 0) then answer = 0

167 students have passed, 83 have failed. Twice as many students have passed as have failed.
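In code, such post-processing is just a function of the noisy value, for example:

def post_process(noisy_value):
    # Any function that sees only the noisy answer (never the raw data)
    # cannot weaken the differential-privacy guarantee.
    value = max(0, noisy_value)    # bounding: counts cannot be negative
    return round(value)            # rounding to a whole number of students

print(post_process(167.3), post_process(-1.8))   # 167 0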

More details in future lectures!

Introduction to Cybersecurity 2016

Privacy-friendly Aggregation (Smart Metering)

41

(Figure: energy consumption in kW over the day, for households 1-7 and aggregated over households; the published aggregate is perturbed with Laplace noise.)

Goal: Privacy guarantees for aggregated data: it should be impossible to infer the energy consumption of any individual household.

Noise: Lap(Δf/ε), with probability density function p_Lap(x) = 1/(2b) · e^(−|x|/b), where b = Δf/ε.

Δf: sensitivity, i.e. a single user's/household's impact on the function output
ε: privacy parameter from differential privacy

Technical definition (differential privacy): Pr[x ∈ S : x ← f(D1)] ≤ e^ε · Pr[x ∈ S : x ← f(D2)] + δ
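A sketch of the aggregation step with made-up consumption values (the per-household numbers, ε, and the assumed maximum consumption are all illustrative, not from the lecture): the sensitivity Δf is the largest amount by which a single household can change the published sum.

import random

def laplace_sample(scale):
    return random.expovariate(1 / scale) - random.expovariate(1 / scale)

def private_aggregate(consumptions_kw, epsilon, max_per_household_kw):
    # One household can change the sum by at most its maximum consumption (Delta_f).
    true_sum = sum(consumptions_kw)
    scale = max_per_household_kw / epsilon        # b = Delta_f / epsilon
    return true_sum + laplace_sample(scale)

households = [2.1, 0.4, 3.8, 1.2, 0.9, 2.7, 1.5]  # kW for one time slot
print(private_aggregate(households, epsilon=0.5, max_per_household_kw=6.0))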

Introduction to Cybersecurity 2016

Challenges: sanitization of dynamic data (streaming), decentralized noising, learning with differential privacy.
Potential killer arguments: provided utility not sufficient for any practical use case; interesting outliers are removed.

Privacy-friendly Aggregation (Smart Metering)

42

(Figure: the same aggregation setting as above: per-household and aggregated consumption over the day, perturbed with Lap(Δf/ε) noise.)

Probability density function: p_Lap(x) = 1/(2b) · e^(−|x|/b), where b = Δf/ε.

Δf: sensitivity, i.e. a single user's/household's impact on the function output
ε: privacy parameter from differential privacy


Three Principal Approaches to Data Protection

Introduction to Cybersecurity 2016

  • Problem: Generic approaches difficult; sanitization removes valuable and potentially also crucially required information
  • Goal: Provide data without potential linkability

Data under control: Sanitization before publication

  • Problem: Limited computations, efficiency
  • Goal: Use and modify data without direct access

Data under control: Strong protection after publication

  • Problem: Difficulties to enforce privacy, no hard guarantees
  • Goal: Understand exposure and privacy consequences

Most cases: Data dissemination without control

  • Problem: Existing approaches can be circumvented; specific sanitization (removing information) without guarantees
  • Goal: Provide provably private sanitization

44

Three Principal Approaches to Data Protection

Introduction to Cybersecurity 2016

  • Problem: Generic approaches difficult; sanitization removes valuable and potentially also crucially required information
  • Goal: Provide data without potential linkability

Data under control: Sanitization before publication

  • Problem: Limited computations, efficiency
  • Goal: Use and modify data without direct access

Data under control: Strong protection after publication

  • Problem: Difficulties to enforce privacy, no hard guarantees
  • Goal: Understand exposure and privacy consequences

Most cases: Data dissemination without control

  • Problem: Limited computations, efficiency
  • Goal: Efficiently use and modify data without direct access
  • Problem: Existing approaches can be circumvented; specific sanitization (removing information) without guarantees
  • Goal: Provide provably private sanitization

45

Protected Publication โ€“ Rough Overview

  • Idea: Data never leaves the user's control
  • Data is published encrypted, or stays in trusted hardware
  • Goal: Only trustworthy users / processes are granted access

Introduction to Cybersecurity 2016 46


Protected Publication โ€“ Rough Overview

  • Idea: Data never leaves the user's control
  • Data is published encrypted, or stays in trusted hardware
  • Goal: Only trustworthy users / processes are granted access
  • Possibility 1: Trustworthy computations in secure hardware (IBM Cryptocard, ARM TrustZone, Intel SGX)
  • Major challenge: under which conditions is access granted? Everything that has been output is out of the user's control (!)

Introduction to Cybersecurity 2016

(Figure: the secure hardware receives m and returns f(m).)

47

Protected Publication โ€“ Rough Overview

  • Idea: Data never leaves the user's control
  • Data is published encrypted, or stays in trusted hardware
  • Goal: Only trustworthy users / processes are granted access
  • Possibility 2: Computing over encrypted data (fully homomorphic encryption)
  • Given E(K, m) and a function f
  • Compute E(K, f(m)) by computing f(E(K, m))
  • Major challenges: Generality of E for permitted functions f; currently still extremely inefficient.
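As a concrete (and heavily simplified) taste of computing over encrypted data, the sketch below implements textbook Paillier encryption, which is additively homomorphic: multiplying two ciphertexts yields an encryption of the sum of the plaintexts. This is not fully homomorphic encryption (it supports only addition), and the tiny primes are for illustration only.

import math
import random

p, q = 1789, 2003                       # toy primes; real keys use ~1024-bit primes
n, n2 = p * q, (p * q) ** 2
lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)   # lcm(p-1, q-1)
g = n + 1
mu = pow(lam, -1, n)                    # with g = n + 1, L(g^lam mod n^2) = lam

def encrypt(m):
    r = random.randrange(2, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(2, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    L = (pow(c, lam, n2) - 1) // n
    return (L * mu) % n

c1, c2 = encrypt(111), encrypt(222)
print(decrypt((c1 * c2) % n2))          # 333: E(m1) * E(m2) decrypts to m1 + m2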

Introduction to Cybersecurity 2016 48

Three Principal Approaches to Data Protection

Introduction to Cybersecurity 2016

  • Problem: Generic approaches difficult; sanitization removes valuable and potentially also crucially required information
  • Goal: Provide data without potential linkability

Data under control: Sanitization before publication

  • Problem: Limited computations, efficiency
  • Goal: Use and modify data without direct access

Data under control: Strong protection after publication

  • Problem: Difficulties to enforce privacy, no hard guarantees
  • Goal: Understand exposure and privacy consequences

Most cases: Data dissemination without control

  • Problem: Limited computations, efficiency
  • Goal: Efficiently use and modify data without direct access
  • Problem: Existing approaches can be circumvented; specific sanitization (removing information) without guarantees
  • Goal: Provide provably private sanitization

49


Three Principal Approaches to Data Protection

Introduction to Cybersecurity 2016

  • Problem: Generic approaches difficult; sanitization removes valuable and potentially also crucially required information
  • Goal: Provide data without potential linkability

Data under control: Sanitization before publication

  • Problem: Limited computations, efficiency
  • Goal: Use and modify data without direct access

Data under control: Strong protection after publication

  • Problem: Difficulties to enforce privacy, no hard guarantees
  • Goal: Understand exposure and privacy consequences

Most cases: Data dissemination without control

  • Problem: Limited computations, efficiency
  • Goal: Efficiently use and modify data without direct access
  • Problem: Existing approaches can be circumvented; specific sanitization (removing information) without guarantees
  • Goal: Provide provably private sanitization
  • Problem: Often impossible to enforce privacy; hence no hard guarantees
  • Goal: Understand exposure and privacy consequences

50

Privacy & Individual Utility

People do want to post personal contents and appreciate individualized services … but don't want to be tracked, targeted, rated.

Key issue: privacy risk models that reconcile privacy and utility, and tools that analyze & explain risks and guide users.

User Privacy Risks in Online Communities

Nobody interested in your research? We read your papers!

Established privacy models:

  • Data: single database
  • Goal: hard anonymity guarantees, non-disclosure of any properties
  • Adversary: computationally powerful, but agnostic; global access & view
  • Measures: data coarsening, perturbation, limit queries; tension with utility

Today's user behavior & risks:

  • Data & user: textual contents, social, agile, longitudinal
  • Goal: alert & advise, bound risk
  • Adversary: world knowledge & probabilistic inference, cost-aware
  • Measures: estimate risk, rank "target users", selective anonymization → Privacy Advisor tool


Outlook: Assessing Privacy at Large

(Figure: one person's traces across services that search, publish & recommend: search-engine queries (Levothroid, shaking, Addison's disease, Nive concert, Greenland singers, Somalia elections, Steve Biko, ...), a social-network profile (female, 29y, Jamame; Nive Nielsen, Cry Freedom), and posts under the name Zoe in an online forum used to discuss & seek help (female, 25-30, Somalia, Synthroid, tremble, Addison disorder, ...).)

53 Introduction to Cybersecurity 2016

Outlook: Assessing Privacy at Large

(Figure: the same traces as above, now with the search-engine and social-network user identified as Zoe.)

Threats from

  • Direct cues: profile data
  • Indirect cues: profiles of friends
  • Semantic cues: health, taste, queries
  • Statistical cues: correlations

54 Introduction to Cybersecurity 2016

Let a wise person speak to that…

55 Introduction to Cybersecurity 2016


First: our ERC Synergy Grant got scooped by the Simpsons! Second: let me try to give you a glimpse of how this could really work.

56 Introduction to Cybersecurity 2016

Privacy Advisor โ€“ Building Blocks

User

(Diagram: a probabilistic model of privacy state & transitions, fed by user actions, the privacy policy, world knowledge, personal info and history, and Internet contents and interactions.)

Privacy Advisor (PA)

Software tool that

  • analyses risk
  • alerts user
  • advises user

57 Introduction to Cybersecurity 2016

Lecture Summary โ€“ Part I

Basic Database Privacy

  • Motivation
  • Data Sanitization
  • k-anonymity and l-diversity

Principal Approaches to Data Protection

  • Sanitization before Publication
  • Protection after Publication
  • Publication without Control

58 Introduction to Cybersecurity 2016


Introduction to Cybersecurity Secure Information Flow

Summary

Introduction to Cybersecurity 2016

Secure Information Flow

  • Confidentiality
  • (In-)Secure Information Flow
  • Explicit Flow
  • Implicit Flow
  • Termination Flow

60

Confidentiality

  • Recall: Confidentiality

Assure that information is not disclosed to unauthorized principals

  • In general, we can observe that
  • It is easy to check information release
  • It is hard to check information propagation

61


Introduction to Cybersecurity 2016


Confidentiality issues

  • Modern applications process sensitive data
  • passwords, credit card numbers, phone numbers, ...
  • In most systems, data is shared and can be accessed by possibly untrusted applications (Facebook, smartphones, ...)
  • How do we know whether or not applications access sensitive data in a legitimate way?

  • address book can be read but not forwarded
  • Data leakage...
  • how does it happen?
  • how can we detect it?
  • how can we prevent it?

62 Introduction to Cybersecurity 2016

Confidentiality

  • Standard security mechanisms are unsatisfactory
  • Anti-virus scanning: rejects a black list of known attacks... but doesn't prevent new attacks
  • Cryptography: protects secret data on the network... but endpoints of communication may leak data
  • Sandboxing: good for low-level events (read a file), but programs are treated as black boxes
  • Access control: prevents unauthorized release of information... but what programs should be authorized?

63


Introduction to Cybersecurity 2016

Checking confidentiality

  • We need to look at the code (inside the black box!) and check whether or not our programs leak information

  • Immediate benefits:
  • semantics-based security specification
  • end-to-end security policies
  • powerful analysis techniques

64


Introduction to Cybersecurity 2016

slide-22
SLIDE 22

22 CISPA Center for IT Security, Privacy and Accountabiltiy Foundations of Cybersecurity 2016

(In-)Secure Information Flow

  • Privacy leaks can also occur from improper processing of data.
  • This leakage is not always obvious.
  • Information might flow to unintended places / recipients / variables.
  • Secret inputs of programs must not influence public outputs.
  • Most basic setting:
  • low variables, meaning low security, public information.
  • high variables, meaning high security, private information.

65 Introduction to Cybersecurity 2016

(In-)Secure Information Flow

  • Security definition (intuitive):
  • We assume the low variables are published at the end of the program.
  • They should not leak information about the high variables.

66 Introduction to Cybersecurity 2016

๐‘€๐‘—๐‘œ๐‘ž๐‘ฃ๐‘ข ๐ผ๐‘—๐‘œ๐‘ž๐‘ฃ๐‘ข ๐‘€๐‘๐‘ฃ๐‘ข๐‘ž๐‘ฃ๐‘ข ๐ผ๐‘๐‘ฃ๐‘ข๐‘ž๐‘ฃ๐‘ข

Information Flow โ€“ Example 1

  • Consider the following program. Is it secure?

67 Introduction to Cybersecurity 2016

low2 := low3 + low3
low1 := secret

Direct explicit flow from high variable to low variable


Information Flow โ€“ Example 2

  • Consider the following program. Is it secure?

68 Introduction to Cybersecurity 2016

low1 := low2 + low3
secret3 := secret1 + secret2
copy := secret1
secret1 := secret2
secret2 := copy
low2 := copy

Indirect explicit flow from high variable to low variable (no matter whether copy is high or low)

Information Flow โ€“ Explicit Flow

Explicit flow occurs whenever a computation involving a high variable is assigned to a low variable. Examples for explicit flow:

  • low := high
  • low := low + high
  • low := function(low,high)

We need to find rules to avoid explicit information flow.

  • There must never be an assignment of a high variable to a low variable.

69 Introduction to Cybersecurity 2016

Information Flow โ€“ Solution: Assignment rule

  • There must never be an assignment of a high variable to a low variable.

Information flow solved?

70 Introduction to Cybersecurity 2016

low2 := low3 + low3
low1 := 0
if secret_bit == 1:
    low1 := 1

Implicit (conditional) flow from high variable to low variable The program actually computes: low1 := secret_bit No assignment from high to low!


Information Flow โ€“ Implicit (Conditional) Flow

Conditional flow occurs whenever a computation branches depending on a high variable and, within these branches, assigns (different) values to low variables. Examples of conditional flow:

  • low := 0
    while high > 0:
        low := low + 1
        high := high - 1

  • low := 0
    if boolean_function(high): low := 1

We need to find rules to avoid conditional information flow.

  • If a conditional (if/for/while/…) depends on a high variable, then no assignment to low variables is allowed.

71 Introduction to Cybersecurity 2016

Should be allowed:

  • low := 0
    while high1 > 0:
        high1 := function(high1, high2)

  • low := 0
    if boolean_function(high1): high2 := 1
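These two rules can be enforced syntactically. Below is a minimal sketch of such a check (my own toy representation of programs as nested tuples, not a real analysis tool): an assignment to a low variable is rejected if the assigned expression mentions a high variable or if the assignment happens under a high branch condition, tracked by a "pc" label.

HIGH, LOW = "high", "low"

def label_of(used_vars, env):
    # An expression is high as soon as it mentions any high variable.
    return HIGH if any(env[v] == HIGH for v in used_vars) else LOW

def check(stmts, env, pc=LOW):
    for stmt in stmts:
        if stmt[0] == "assign":                 # ("assign", target, used_vars)
            _, target, used = stmt
            if env[target] == LOW and (pc == HIGH or label_of(used, env) == HIGH):
                return False                    # explicit or implicit flow to low
        elif stmt[0] == "if":                   # ("if", condition_vars, body)
            _, cond, body = stmt
            branch_pc = HIGH if (pc == HIGH or label_of(cond, env) == HIGH) else LOW
            if not check(body, env, branch_pc):
                return False
    return True

env = {"low1": LOW, "low2": LOW, "low3": LOW, "secret_bit": HIGH, "high": HIGH}
leaky = [("assign", "low1", []), ("if", ["secret_bit"], [("assign", "low1", [])])]
fine = [("assign", "low1", ["low2", "low3"]), ("if", ["secret_bit"], [("assign", "high", [])])]
print(check(leaky, env), check(fine, env))      # False True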

Information Flow โ€“ Solution: Conditional rule

  • There must never be an assignment of a high variable to a low variable.
  • If a conditional depends on a high variable, then no assignment to low variables is allowed.

Information flow solved?

72 Introduction to Cybersecurity 2016

low2 := low3 + low3
low1 := 0
while secret_bit == 1:
    high := 0
low1 := 1

Covert channel (termination) flow from high variable. The program only terminates if secret_bit == 0. No explicit flow! No assignment to low in conditional! A covert channel is a channel not intended for information transfer at all.

Information Flow โ€“ Covert Channel (Termination) Flow

Termination flow occurs whenever the termination of a computation depends on a high variable.

Examples of termination flow:

  • low := 0
    while high > 0:
        high := high

  • for (temp := 1; temp < high; temp := temp + 1)
        high := high + 1

  • JUMP_MARK:
        if high == 1: goto JUMP_MARK

We need to find rules to avoid termination information flow.

  • Termination may not depend on high variables.

73 Introduction to Cybersecurity 2016

Non-terminating behavior depending on high variables should never be allowed.


Information Flow โ€“ Solution: Termination rule

  • There must never be an assignment of a high variable to a low variable.
  • If a conditional depends on a high variable, then no assignment to low variables is allowed.

  • Termination may not depend on high variables.

Information flow solved?

74 Introduction to Cybersecurity 2016

low2 := low3 + low3
low1 := 0
if secret_bit == 1:
    high := compute_complex_function(high)
low1 := 1

Covert channel (timing) flow from high variable. The program may take significantly longer to terminate if secret_bit == 1. No explicit flow! No conditional flow! Always terminates!

Information Flow โ€“ Covert channel (Timing) Flow

Timing flow occurs whenever the time that a computation needs depends on high variables. Examples of timing flow:

  • while high1 > 0:
        high1 := function(high1, high2)

  • for each bit b_i of secret_key:
        if b_i == 1: high := function(high)

We need to find rules to avoid timing information flow.

  • Computation time may not depend on high variables.

75 Introduction to Cybersecurity 2016

Should be allowed:

  for each bit b_i of secret_key:
      if b_i == 1: high := function(high)
      else: dummy := function(high)
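The same idea in illustrative Python: the balanced variant performs a dummy call on every 0-bit, so the number of calls to function no longer depends on the key. (Real constant-time code has to worry about much more than call counts, e.g. caches and branch predictors.)

def function(x):
    return (x * x) % (2 ** 61 - 1)     # stand-in for an expensive operation

def leaky(secret_key_bits, high):
    for bit in secret_key_bits:
        if bit == 1:                    # work is done only for 1-bits: timing leak
            high = function(high)
    return high

def balanced(secret_key_bits, high):
    dummy = high
    for bit in secret_key_bits:
        if bit == 1:
            high = function(high)
        else:
            dummy = function(dummy)     # dummy work keeps the call count uniform
    return high

print(leaky([1, 0, 1, 1], 7), balanced([1, 0, 1, 1], 7))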

Information Flow โ€“ Wrap-up

  • There must never be an assignment of a high variable to a low variable.
  • If a conditional depends on a high variable, then no assignment to low variables is allowed.

  • Termination may not depend on high variables.
  • Computation time may not depend on high variables.

Even more forms of information flow exist:

  • Concurrent programs can leak internal states (and are hard to analyze)
  • Limited resources can be used to leak information: write LARGE DATA to the disk high times (or load it into the RAM) and wait for overflow.
  • Other side channels, e.g., using volume control to transfer information, measuring the electricity consumption, …

76 Introduction to Cybersecurity 2016


Summary

Introduction to Cybersecurity 2016

Secure Information Flow

  • Confidentiality
  • (In-)Secure Information Flow
  • Explicit Flow
  • Implicit Flow
  • Termination Flow

77