

slide-1
SLIDE 1

CS573 Data Privacy and Security

Li Xiong

Department of Mathematics and Computer Science Emory University

slide-2
SLIDE 2

Today

  • Meet everyone in class
  • Course overview
    – Why data privacy and security
    – What is data privacy and security
    – What we will learn
  • Course logistics

9/9/2018 2

slide-3
SLIDE 3

Instructor

  • Li Xiong
    – Web: http://www.cs.emory.edu/~lxiong
    – Email: lxiong@emory.edu
    – Office Hours: MW 11:15am-12:15pm or by appt
    – Office: MSC E412

slide-4
SLIDE 4

About Me

http://www.cs.emory.edu/~lxiong

  • Undergraduate teaching
    – CS170 Intro to CS I
    – CS171 Intro to CS II
    – CS377 Database systems
    – CS378 Data mining
  • Graduate teaching
    – CS550 Database systems
    – CS570 Data mining
    – CS573 Data privacy and security
    – CS730R/CS584 Topics in data management – big data analytics
  • Research http://www.cs.emory.edu/aims
    – Data privacy and security
    – Spatiotemporal data management
    – Health informatics

slide-5
SLIDE 5

Meet everyone in class

  • Group introduction (2-3 people)
  • Introducing your group
    – Name and program
    – Goals for taking the course
    – Something interesting about your group

slide-6
SLIDE 6

Today

  • Meet everyone in class
  • Course overview
    – Why data privacy and security
    – What is data privacy and security
    – What we will learn
  • Course logistics

slide-7
SLIDE 7

Quiz

  • How many people know you are in this room now?
    (a) no one
    (b) 1-5, i.e. your immediate family and friends
    (c) 5-20, i.e. your department staff, your colleagues and classmates

slide-8
SLIDE 8
slide-9
SLIDE 9

Who Knows What About Me? A Survey of Behind the Scenes Personal Data Sharing to Third Parties by Mobile Apps, 2015-10-30 https://techscience.org/a/2015103001/

  • 73% / 33% of Android apps shared personal info (i.e. email) / GPS coordinates with third parties
  • 45% / 47% of iOS apps shared email / GPS coordinates with third parties

Location data sharing by iOS apps (left) to domains (right)

slide-10
SLIDE 10

Quiz

  • How many organizations have your medical records?
slide-11
SLIDE 11

The Data Map

slide-12
SLIDE 12

Big Data Tsunami

slide-13
SLIDE 13

The 5 V’s of Big Data

slide-14
SLIDE 14

Value of Big Data

  • GPS traces, call records
  • Syndromic surveillance, social relationships
slide-15
SLIDE 15

Value of Big Data

  • Electronic health records (EHR)
  • Secondary use for medical research
slide-16
SLIDE 16

Value of Big Data

slide-17
SLIDE 17

Big Data and Privacy

slide-18
SLIDE 18

Privacy Risks

slide-19
SLIDE 19

Location Privacy Risks

  • Tracking
  • Identification
  • Profiling
slide-20
SLIDE 20

Privacy Risks


slide-21
SLIDE 21


slide-22
SLIDE 22

Netflix Sequel

  • 2006, Netflix announced the challenge
  • 2007, researchers from University of Texas identified individuals by matching Netflix datasets with IMDb
  • July 2009, $1M grand prize awarded
  • August 2009, Netflix announced the second challenge
  • December 2009, four Netflix users filed a class action lawsuit against Netflix
  • March 2010, Netflix canceled the second challenge
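
The 2007 re-identification relied on linking records across two datasets. The sketch below illustrates that general idea on made-up data. The record layout, the names `link_records` and `min_overlap`, and the exact-match rule are assumptions for illustration only, not the researchers' actual algorithm.

```python
# Toy linkage attack: match "anonymized" rating records against public,
# named reviews. All data and the matching rule are hypothetical.

def link_records(anonymized, public, min_overlap=2):
    """Return {anonymous_id: name} when exactly one public profile shares
    at least `min_overlap` identical (movie, rating) pairs."""
    matches = {}
    for anon_id, ratings in anonymized.items():
        candidates = [
            name for name, reviews in public.items()
            if len(ratings & reviews) >= min_overlap
        ]
        if len(candidates) == 1:  # keep unambiguous matches only
            matches[anon_id] = candidates[0]
    return matches

anonymized = {
    "user_17": {("Movie A", 5), ("Movie B", 2), ("Movie C", 4)},
    "user_42": {("Movie A", 3), ("Movie D", 1)},
}
public = {
    "alice": {("Movie A", 5), ("Movie B", 2)},
    "bob": {("Movie D", 4)},
}

print(link_records(anonymized, public))  # {'user_17': 'alice'}
```

Removing names from the released records does not help here: a handful of ratings that also appear in a public profile is enough to single out a person.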
slide-23
SLIDE 23


slide-26
SLIDE 26

Facebook-Cambridge Analytica

  • April 2010, Facebook launched Open Graph
  • 2013, 300,000 users took the psychographic personality test app “thisisyourdigitallife”
  • 2016, Trump’s campaign invested heavily in Facebook ads
  • March 2018, reports revealed that 50 million (later revised to 87 million) Facebook profiles were harvested for Cambridge Analytica and used for Trump’s campaign
  • April 11, 2018, Zuckerberg testified before Congress
slide-28
SLIDE 28

Data Breaches

  • Data viewed, stolen, or used by unauthorized users
  • 2018
    – T-Mobile: account details of 2 million customers compromised by hackers
    – FedEx: sensitive customer data stored on an open Amazon S3 bucket
  • 2017
    – Uber: data of 57 million customers and drivers exposed
    – Equifax: names, SSNs, birth dates, and addresses of 143 million customers disclosed

slide-29
SLIDE 29

Benefits … and Risks

Fine line between benefit and risks

(Most people don’t even see it)

slide-30
SLIDE 30

What is the course about

  • Techniques for ensuring data privacy and security (while harnessing value of data)
  • Not about
    – Network security
    – System security
    – Software security

slide-31
SLIDE 31

Today

  • Meet everyone in class
  • Course overview
    – Why data privacy and security
    – What is data privacy and security
    – What we will learn
  • Course logistics

slide-32
SLIDE 32

What is Privacy

  • Definitions vary according to context and environment
  • Right to be left alone (“The Right to Privacy”, Warren and Brandeis, 1890; Olmstead v. United States (1928), Brandeis dissent)
  • a: The quality or state of being apart from company or observation; b: freedom from unauthorized intrusion (Merriam-Webster)

slide-33
SLIDE 33

Aspects of Privacy

  • Information privacy

– Collection and handling of personal data, e.g. medical records

  • Bodily privacy

– Protection of physical selves against invasive procedures, e.g. genetic test

  • Privacy of communications

– Mail, telephones, emails

  • Territorial privacy

– Limits on intrusion into domestic environments, e.g. video surveillance

slide-34
SLIDE 34

Information Privacy

– Data about individuals should not be automatically available to other individuals and organizations
– The individual must be able to exercise a substantial degree of control over that data and its use
– The barring of some kinds of negative consequences from the use of an individual’s personal information

slide-35
SLIDE 35

Models of privacy protection

  • Laws and regulations
    – Comprehensive laws
      • Adopted by European Union (GDPR), Canada, Australia
    – Sectoral laws
      • Adopted by US
      • Financial privacy, protected health information
      • Lack of legal protections for data privacy on the Internet
    – Self-regulation
      • Companies and industry bodies establish codes of practice
  • Technologies
slide-36
SLIDE 36

A race to the bottom: privacy ranking of Internet service companies

  • A study done by Privacy International into the privacy practices of key Internet-based companies in 2007
  • Amazon, AOL, Apple, BBC, eBay, Facebook, Google, LinkedIn, LiveJournal, Microsoft, MySpace, Skype, Wikipedia, LiveSpace, Yahoo!, YouTube

slide-37
SLIDE 37

A Race to the Bottom: Methodologies

  • Corporate administrative details
  • Data collection and processing
  • Data retention
  • Openness and transparency
  • Customer and user control
  • Privacy enhancing innovations and privacy invasive innovations

slide-38
SLIDE 38

A race to the bottom: interim results revealed

slide-39
SLIDE 39

A race to the bottom: interim results revealed

slide-40
SLIDE 40

Why Google

  • Retains a large quantity of information about users, often for an unstated or indefinite length of time, without clear limitation on subsequent use or disclosure
  • Maintains records of all search strings with associated IP addresses and time stamps for at least 18-24 months
  • Collects additional personal information from user profiles in Orkut
  • Uses an advanced profiling system for ads
slide-41
SLIDE 41

Are Google and Facebook and … Evil?

  • Targeted advertising
  • Cross-selling of users’ data
  • Personalized experience

slide-42
SLIDE 42

They are always watching … what can we do?

Who cares? I have nothing to hide.

slide-43
SLIDE 43

If you do care …

  • Use cash when you can.
  • Do not give your phone number, social-security number or address, unless you absolutely have to.
  • Do not fill in questionnaires or respond to telemarketers.
  • Demand that credit and data-marketing firms produce all information they have on you, correct errors and remove you from marketing lists.
  • Check your medical records often.
  • Block caller ID on your phone, and keep your number unlisted.
  • Never leave your mobile phone on; your movements can be traced.
  • Do not use store credit or discount cards.
  • If you must use the Internet, encrypt your e-mail, reject all “cookies” and never give your real name when registering at websites.
  • Better still, use somebody else’s computer.
slide-44
SLIDE 44

Privacy Protection Techniques

  • Finding balances between privacy and multiple competing interests:
    – Privacy vs. other interests (e.g. quality of health care; movie recommendation; social network)
    – Privacy vs. interests of other people, organizations, or society as a whole (e.g. advertising, insurance companies, healthcare research; movie recommendation for others)

slide-45
SLIDE 45

Industry awareness and trends


slide-46
SLIDE 46
slide-47
SLIDE 47

Today

  • Meet everyone in class
  • Course overview
    – Why data privacy and security
    – What is data privacy and security
    – What we will learn
  • Course logistics

slide-48
SLIDE 48

Security

  • The quality or state of being secure: as a: freedom from danger; b: freedom from fear or anxiety (Merriam-Webster)
  • National security
  • Individual security
  • Computer security (cyber security)
    – Protecting information systems including the hardware, software, data, network, and services

slide-49
SLIDE 49

Security vs. Privacy

  • Data surveillance
    – Surveillance cameras
    – Sensors
    – Online surveillance

slide-50
SLIDE 50

Principles of Data Security – CIA Triad

  • Confidentiality

– Prevent the disclosure of information to unauthorized users

  • Integrity

– Prevent improper modification

  • Availability

– Make data available to legitimate users

slide-51
SLIDE 51

Privacy vs. Confidentiality

  • Confidentiality

– Prevent disclosure of information to unauthorized users

  • Privacy
    – Prevent disclosure of personal information to unauthorized users
    – Control of how personal information is collected and used
    – Prevent identification of individuals

slide-52
SLIDE 52

Data Privacy and Confidentiality Measures

  • Access control

– Restrict access to the (subset or view of) data to authorized users

  • Cryptography

– Use encryption to encode information so it can only be read by authorized users (protected in transit and in storage)

  • Inference control

– Restrict inference from accessible data to sensitive (non-accessible) data

slide-53
SLIDE 53

Access Control

  • Access control

– Selective restriction of access to the data to authorized users

  • Access control policies and mechanisms
  • Issues
    – Fine-grained access control
    – Spatial and temporal context
    – Group access control in social network applications
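
As a rough illustration of restricting access to a subset (view) of the data, here is a minimal sketch of a field-level policy check. The roles, field names, and the `POLICY` table are hypothetical and only meant to show the shape of a policy and the mechanism that enforces it.

```python
# Toy access control: each role may read only a subset (view) of the fields.
# Roles, fields, and the policy table are hypothetical.

POLICY = {
    "physician": {"name", "diagnosis", "medication"},
    "billing": {"name", "insurance_id"},
}

def read_record(role, record):
    """Return only the fields the role is authorized to see."""
    allowed = POLICY.get(role)
    if allowed is None:
        raise PermissionError(f"role {role!r} has no access")
    return {field: value for field, value in record.items() if field in allowed}

record = {
    "name": "Jane Doe",
    "diagnosis": "asthma",
    "medication": "albuterol",
    "insurance_id": "X-123",
}

print(read_record("billing", record))
# {'name': 'Jane Doe', 'insurance_id': 'X-123'}
```

A fine-grained or context-aware policy would extend the lookup key beyond the role, for example with the time or location of the request.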


slide-54
SLIDE 54

Cryptography

  • Encoding data in a way that only authorized users can read it

Figure: Original Data → Encryption → Encrypted Data
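
To make the idea concrete, here is a toy one-time-pad sketch using only the Python standard library. The helper name `xor_bytes` and the sample message are assumptions for illustration. A real system would use a vetted cryptographic library rather than this code.

```python
import secrets

def xor_bytes(data, key):
    """XOR data with a key of the same length (one-time pad)."""
    if len(key) != len(data):
        raise ValueError("key must be as long as the data")
    return bytes(d ^ k for d, k in zip(data, key))

message = b"diagnosis: asthma"
key = secrets.token_bytes(len(message))  # shared only with authorized users

ciphertext = xor_bytes(message, key)     # encryption
recovered = xor_bytes(ciphertext, key)   # decryption is the same operation

print(recovered == message)  # True
```

Only someone holding `key` can recover the message, and the key must never be reused for a second message.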

slide-55
SLIDE 55


Applications of Cryptography

  • Secure data outsourcing

– Support computation and queries on encrypted data

Figure: Computation/Queries on Encrypted Data

slide-56
SLIDE 56


Applications of Cryptography

  • Multi-party secure computations (secure function evaluation)
    – Securely compute a function without revealing private inputs

Figure: parties holding private inputs x1, x2, x3, …, xn jointly compute f(x1, x2, …, xn)
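
One simple instance of this idea is a secure sum, where f(x1, …, xn) = x1 + … + xn. The sketch below uses additive secret sharing and simulates all parties inside one process. The modulus, the function names, and the assumptions of non-negative inputs and honest, non-colluding parties are illustrative choices, not a protocol taken from the slides.

```python
import secrets

MODULUS = 2**61 - 1  # public modulus; assumed larger than any possible sum

def share(value, n_parties):
    """Split `value` into n random shares that add up to it mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares

def secure_sum(private_inputs):
    """Simulate n parties computing the sum of their private inputs."""
    n = len(private_inputs)
    # Row i holds the shares party i hands out (share j goes to party j).
    distributed = [share(x, n) for x in private_inputs]
    # Each party j publishes only the sum of the shares it received.
    partial_sums = [sum(row[j] for row in distributed) % MODULUS for j in range(n)]
    return sum(partial_sums) % MODULUS

print(secure_sum([52000, 61000, 48000]))  # 161000
```

Each individual share is a uniformly random number, so no single party learns another party's input from the shares it receives.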

slide-57
SLIDE 57


Applications of Cryptography

  • Private information retrieval (access privacy)

– Retrieve data without revealing query (access pattern)
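
A minimal sketch of how a query can be hidden, in the spirit of the classic two-server XOR scheme. It assumes the database is replicated on two servers that do not collude. The function names and the sample data are hypothetical.

```python
import secrets

def make_queries(n_items, wanted_index):
    """Build one index set per server; each looks uniformly random on its own."""
    subset = {i for i in range(n_items) if secrets.randbits(1)}
    # Symmetric difference flips membership of the wanted index only.
    return subset, subset ^ {wanted_index}

def answer(database, index_set):
    """Server side: XOR together the requested items."""
    result = 0
    for i in index_set:
        result ^= database[i]
    return result

database = [0x3A, 0x7F, 0x10, 0xC4]  # replicated on both servers
q1, q2 = make_queries(len(database), wanted_index=2)
recovered = answer(database, q1) ^ answer(database, q2)

print(hex(recovered))  # 0x10
```

Every item except the wanted one appears in both index sets or in neither, so it cancels out when the two answers are XORed together.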

slide-58
SLIDE 58

Inference Control

  • Prevent inference from accessible information to individual information (not accessible)

Figure: Original Data → Inference Control → Sanitized Records/Models
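
To see why inference control is needed even when only aggregates are released, consider this small differencing attack. The salary table, the `sum_query` interface, and the minimum query-set size are made up for illustration.

```python
# Toy salary table; names and values are made up.
salaries = {"ann": 70000, "ben": 82000, "cho": 64000, "dee": 91000}

def sum_query(names, min_size=3):
    """Answer a SUM query only if it covers at least `min_size` people."""
    if len(names) < min_size:
        raise ValueError("query set too small")
    return sum(salaries[name] for name in names)

# Both queries are individually allowed by the size restriction...
everyone = sum_query(["ann", "ben", "cho", "dee"])
all_but_dee = sum_query(["ann", "ben", "cho"])

# ...but their difference reveals one person's value.
print(everyone - all_but_dee)  # 91000, dee's salary
```

Restricting access to individual records is therefore not enough on its own; the released answers themselves have to be controlled.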

slide-59
SLIDE 59

Course Topics

  • Data privacy and confidentiality
    – Inference control
    – Cryptography applications
    – Access control
  • Data integrity and availability
    – Data poisoning attacks
    – Blockchain

slide-60
SLIDE 60

Course Topics

  • Applications
    – Healthcare data
    – Cloud computing
    – Location based applications
    – Online social networks and social media
    – Crowdsourcing

slide-61
SLIDE 61

Learning Objectives

  • Learn the classic and state-of-the-art data privacy and security approaches
  • Study various applications where data privacy and security are needed and can be applied
  • Challenge existing solutions and identify new problems in data privacy and security

slide-62
SLIDE 62

Today

  • Meet everyone in class
  • Course overview
    – Why data privacy and security
    – What is data privacy and security
    – What we will learn
  • Course logistics

slide-63
SLIDE 63

Logistics

  • Reading materials
    – Book chapters, papers, online articles
  • Prerequisite
    – Some database and statistics background
    – Programming skills
  • Class webpage http://www.cs.emory.edu/~lxiong/cs573
    – Lecture slides
    – Link to readings
    – Project/assignments

slide-64
SLIDE 64

Workload

  • ~2 programming assignments (individual)
  • Weekly reading assignments and paper reviews
  • ~1 paper presentation in class
  • 1 course project (team of up to 2 students) with project presentation
    – Implementation of existing algorithms
    – Design of new algorithms to solve new problems
    – Survey of a class of algorithms
  • 1 midterm
  • No final exam
slide-65
SLIDE 65

Paper reviews

  • 1 page
  • NOT just a summary of the paper, but your critical opinion of the paper
  • Summarize (at least 3) things you like or learned
  • Point out (at least 3) limitations, extensions, or interesting applications of the ideas
  • Connect and contrast the paper to what we have learned/read so far

slide-66
SLIDE 66

Course Project

  • Options
    – Application and evaluation of existing algorithms
    – Design of new algorithms to solve new problems
    – Survey of a class of algorithms
  • Timeline
    – 10/17: project proposal
    – 12/3, 12/5, 12/10: project workshop/presentation
    – 12/17: project report/deliverables

slide-67
SLIDE 67

Late Policy

  • Late assignments will be accepted within 3 days of the due date and penalized 10% per day
  • 2 late assignment allowances, each of which can be used to turn in a single late assignment within 3 days of the due date without penalty
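
The policy can be read as a small calculation. The sketch below is one possible interpretation, not the official grading rule. It assumes that the penalty is 10% of the earned score per late day, that partial days count as full days, and that work more than 3 days late receives no credit.

```python
def late_score(raw_score, days_late, use_allowance=False):
    """Score after the late policy; `raw_score` is the grade before penalty."""
    if days_late <= 0:
        return raw_score
    if days_late > 3:
        return 0  # assumed: not accepted after 3 days
    if use_allowance:
        return raw_score  # an allowance waives the penalty
    return raw_score * (100 - 10 * days_late) / 100

print(late_score(90, 2))                      # 72.0
print(late_score(90, 2, use_allowance=True))  # 90
```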

slide-68
SLIDE 68

Learning Objectives (Non technical)

  • Read papers and write paper critiques
  • Present papers and lead discussions
  • Learn/practice the life cycle of a research project
    – Literature review
    – Problem formulation
    – Project proposal writing
    – Algorithm design
    – Experimental studies
    – Paper/project report writing
  • Time management

slide-69
SLIDE 69

Grading

  • Assignments/presentations: 40%
  • Final project: 30%
  • Midterm: 30%
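
As a quick worked example of the weighting, the sketch below combines three component scores into a course score. It assumes each component is graded on a 0-100 scale. The dictionary keys and the sample scores are made up.

```python
# Weights in percent, taken from the grading breakdown.
WEIGHTS = {"assignments": 40, "project": 30, "midterm": 30}

def course_score(scores):
    """Weighted average of component scores, each on a 0-100 scale."""
    return sum(WEIGHTS[part] * scores[part] for part in WEIGHTS) / 100

print(course_score({"assignments": 90, "project": 80, "midterm": 70}))  # 81.0
```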

slide-70
SLIDE 70

Some expectations

  • Participate in class, think critically, ask questions
  • Read and write reviews critically
  • Start on assignments and projects early
  • Enjoy the class!