A Privacy-Protecting Architecture for Collaborative Filtering via Forgery and Suppression of Ratings



slide-1
SLIDE 1

A Privacy-Protecting Architecture for Collaborative Filtering via Forgery and Suppression of Ratings

Javier Parra-Arnau, David Rebollo-Monedero and Jordi Forné

Department of Telematics Engineering
Technical University of Catalonia (UPC), Barcelona, Spain
http://sites.google.com/site/javierparraarnau/

Leuven, Belgium, September 15, 2011

slide-2
SLIDE 2

Outline

Introduction State of the Art An Architecture for Privacy Protection in Collaborative

Filtering based Recommendation Systems

Formulation of the Optimal Trade-Off between Privacy and

Utility

Conclusions

2

slide-3
SLIDE 3

Introduction

slide-4
SLIDE 4

Information Overload

  • The amount of information on the Web has grown exponentially since the advent of the Internet

slide-5
SLIDE 5

Collaborative Filtering

  • A recommendation system is a filtering system that suggests information items that are likely to be of interest to the user
  • Recommendation systems based on collaborative filtering (CF) algorithms
  • Examples include Amazon, Digg, MovieLens and Netflix


slide-6
SLIDE 6

User Profiles

  • Users need to communicate their preferences to the recommender in order to obtain a prediction for those items they have not yet considered

[Figure: example user profile as a histogram over movie genres]
Drama 80, Thriller 76, Comedy 71, Action 71, Adventure 67, Crime 62, Romance 54, Sci-Fi 51, War 38, Mystery 34, Documentary 25, Animation 25, Fantasy 16, Horror 12, Children 7, Musical 7, Western 7, Film-Noir 3, IMAX 3

slide-7
SLIDE 7

Privacy Risk

  • The privacy risks perceived by users include computers “figuring things out” about them, unsolicited marketing, court subpoenas, and government surveillance [Cranor 03]

[Figure: a user sends ratings to the recommendation system and receives predictions; from the ratings, the system infers “she’s pregnant!”]

slide-8
SLIDE 8

Forgery and Suppression of Ratings

  • Submitting false information and refusing to give private information are strategies accepted by users concerned with their privacy [Fox 00, Hoffman 99]
  • Our approach relies upon the forgery and suppression of ratings

[Figure: the user has read a set of books; some of the corresponding ratings are withheld from the recommendation system (SUPPRESSION), which returns predictions]

slide-9
SLIDE 9

Contribution (I)

  • Our architecture protects user privacy to a certain extent
  • Utility loss is measured as the forgery rate and the suppression rate

slide-10
SLIDE 10

Contribution (II)

  • Mathematical formulation of the optimal trade-off among privacy, forgery rate ρ and suppression rate σ
  • Privacy as the Shannon entropy of the user’s apparent profile
  • Our proposal could be used in combination with other existing approaches

\[
\mathcal{P}(\rho,\sigma) \;=\; \max_{\substack{r,\,s \\ r_i \geq 0,\ \sum_i r_i = \rho \\ q_i \geq s_i \geq 0,\ \sum_i s_i = \sigma}} \mathrm{H}\!\left(\frac{q + r - s}{1 + \rho - \sigma}\right)
\]

slide-11
SLIDE 11

State of the Art

slide-12
SLIDE 12

Privacy Protection in Recommendation Systems

  • The state-of-the-art approaches may be classified according to these main strategies:
  • perturbing the information provided by users [Polat 03, 05, Agrawal 01, Kargupta 03, Huang 05],
  • using cryptographic techniques [Canny 02, Ahmad 07, Zhan 10], and
  • distributing the information collected [Miller 04, Berkovsky 07]

[Figure, after [Polat 03]: ratings are perturbed with additive noise before reaching the recommendation system, e.g. 3.2 + 1.5, 2.9 – 0.7, 4.1, 4.4 – 2.7, 5.6, 3.3 + 1.0, 1.1, 3.4 – 0.1]
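The perturbation strategy can be illustrated with a minimal Python sketch. It assumes independent, zero-mean uniform noise; the function name `perturb_ratings`, the noise range and the sample ratings are illustrative assumptions and do not reproduce the scheme of any cited work.

```python
import random

def perturb_ratings(ratings, max_noise=1.0, seed=None):
    """Return a copy of `ratings` with independent uniform noise drawn
    from [-max_noise, max_noise] added to each value (hypothetical helper)."""
    rng = random.Random(seed)
    return [x + rng.uniform(-max_noise, max_noise) for x in ratings]

true_ratings = [3.2, 2.9, 4.1, 4.4]   # illustrative values
sent_ratings = perturb_ratings(true_ratings, max_noise=1.0, seed=0)
```

The recommender only sees `sent_ratings`; because the noise has zero mean, aggregates over many users remain approximately correct while individual ratings are masked.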

slide-13
SLIDE 13

Privacy Protection in Recommendation Systems


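[Figure, after [Canny 02]: users hold private values q1, ..., q5 and submit them encrypted; the encryption is homomorphic, so the aggregate can be computed on ciphertexts]

\[
\mathrm{Enc}(q_1) + \cdots + \mathrm{Enc}(q_5) = \mathrm{Enc}(q_1 + \cdots + q_5)
\]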
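The homomorphic property can be tried out with a toy Python sketch. It uses the Paillier cryptosystem as a well-known additively homomorphic scheme; this is a substitution for illustration and is not claimed to be the scheme of [Canny 02]. In Paillier the "+" on ciphertexts is realised as multiplication modulo N². The primes, the names and the sample ratings are assumptions, and the key size offers no security.

```python
import math
import random

# Toy Paillier parameters (tiny primes, illustration only).
P, Q = 293, 433
N = P * Q
N2 = N * N
G = N + 1
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)   # lcm(P - 1, Q - 1)
MU = pow(LAM, -1, N)   # modular inverse; this shortcut holds for G = N + 1

def encrypt(m, rng=random):
    """Encrypt an integer 0 <= m < N with fresh randomness."""
    while True:
        r = rng.randrange(1, N)
        if math.gcd(r, N) == 1:
            break
    return (pow(G, m, N2) * pow(r, N, N2)) % N2

def decrypt(c):
    """Recover the plaintext from a ciphertext."""
    return ((pow(c, LAM, N2) - 1) // N * MU) % N

def add_encrypted(ciphertexts):
    """Homomorphic addition: multiply the ciphertexts modulo N^2."""
    total = 1
    for c in ciphertexts:
        total = (total * c) % N2
    return total

ratings = [3, 5, 1, 4, 2]   # q1..q5, illustrative integer ratings
encrypted_sum = add_encrypted([encrypt(q) for q in ratings])
recovered_sum = decrypt(encrypted_sum)
```

Only the aggregate is decrypted; the individual ratings are never seen in the clear by the party that combines the ciphertexts.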

slide-14
SLIDE 14

Privacy Protection in Recommendation Systems


[Figure, after [Miller 04]: distribution of the ratings that would otherwise be collected by a central server]

slide-15
SLIDE 15

An Architecture for Privacy Protection in CF-based Recommendation Systems

slide-16
SLIDE 16

Overview

  • Profiling is accomplished on the basis of user ratings
  • Information items are classified as known or unknown
  • Users may wish to submit ratings to unknown items (forgery) and refrain from rating known items (suppression)

[Figure: the user's unknown items and known items, and the recommendation system]

slide-17
SLIDE 17

User Profile Model

  • [Toubiana 10, Fredrikson 11] suggest representing user profiles as histograms of absolute frequencies
  • We model the profile of a user as a probability mass function (PMF)

[Figure: example profile from MovieLens, a histogram over movie genres]
Drama 80, Thriller 76, Comedy 71, Action 71, Adventure 67, Crime 62, Romance 54, Sci-Fi 51, War 38, Mystery 34, Documentary 25, Animation 25, Fantasy 16, Horror 12, Children 7, Musical 7, Western 7, Film-Noir 3, IMAX 3

[Figure: example profile from Jinni, a tag cloud]
Witty, Buddies, Clever, Fall in Love, Humorous, Couple Relations, Parents and Children, Feel Good, Best Friends, Offbeat, Emotional, Slow, Teenage Life, Sincere, Human Spirit, Human Nature, Coming of Age, Touching, Village Life
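The step from a histogram to a PMF can be sketched in a few lines of Python. The genre values are those of the histogram example on this slide; reading them as absolute rating counts, as well as the helper name `histogram_to_pmf`, are assumptions made for illustration.

```python
def histogram_to_pmf(histogram):
    """Normalise a {category: count} histogram into a PMF (hypothetical helper)."""
    total = sum(histogram.values())
    if total <= 0:
        raise ValueError("histogram must contain at least one rating")
    return {category: count / total for category, count in histogram.items()}

genre_counts = {
    "Drama": 80, "Thriller": 76, "Comedy": 71, "Action": 71, "Adventure": 67,
    "Crime": 62, "Romance": 54, "Sci-Fi": 51, "War": 38, "Mystery": 34,
    "Documentary": 25, "Animation": 25, "Fantasy": 16, "Horror": 12,
    "Children": 7, "Musical": 7, "Western": 7, "Film-Noir": 3, "IMAX": 3,
}
q = histogram_to_pmf(genre_counts)   # the user's profile as a PMF
```

The resulting `q` sums to one, so profiles of users with different numbers of ratings become directly comparable.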

slide-18
SLIDE 18

User Profile Construction

  • Our architecture needs to estimate the actual profile of a user in order to help them decide which items should be rated and which should not
  • Histogram based on the categories provided by the recommender
  • Categorize items by exploring web pages and using the vector space model [Salton 75]

[Figure: an item with the category "books \ literature & fiction \ genre fiction" next to an item whose category is not given ("…?")]
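A minimal Python sketch of categorization with the vector space model: texts become term-frequency vectors and an item is assigned to the category whose reference text is closest in cosine similarity. The reference texts, the category names and the function names are hypothetical, and raw term frequencies are used instead of any particular weighting scheme.

```python
import math
import re
from collections import Counter

def term_vector(text):
    """Bag-of-words term-frequency vector over lowercased alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine_similarity(u, v):
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def categorize(description, category_texts):
    """Return the category whose reference text is most similar to the item."""
    item = term_vector(description)
    return max(category_texts,
               key=lambda c: cosine_similarity(item, term_vector(category_texts[c])))

# Hypothetical reference texts, one per category.
category_texts = {
    "web development": "javascript web browser html programming standards",
    "science": "physics universe scientists theory time",
}
label = categorize(
    "The book explores how JavaScript originated and evolved, with focus on standards",
    category_texts,
)
```

In practice the reference texts would come from pages already categorized by the recommender; here they are hard-coded only to keep the sketch self-contained.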

slide-19
SLIDE 19

Adversarial Model

  • Passive attacker capable of crawling through the items rated by a user
  • The attacker observes the apparent user profile t, a perturbed version of the actual user profile q

[Figure: without protection ("NO PROTECTION!"), the recommender derives the actual profile q from the user's ratings and returns predictions; with forgery and suppression of ratings, it observes the apparent profile t]
slide-20
SLIDE 20

Privacy Measure

  • We measure privacy as the Shannon entropy of the user’s apparent profile t
  • Accordingly, privacy is compromised whenever the user’s preferences are biased towards certain categories of interest

\[
\mathrm{H}(t) = -\sum_{i=1}^{n} t_i \log_2 t_i ,
\]

where n is the number of categories.

[Figure: two example profiles over categories 1 to 4, one labeled "minimum privacy" and one labeled "maximum privacy"]
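The privacy measure is straightforward to compute. The Python sketch below evaluates the entropy at its two extremes for four categories: a profile concentrated on a single category and a uniform profile. The function name and the two example profiles are illustrative assumptions.

```python
import math

def shannon_entropy(pmf):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in pmf if p > 0)

# All interest in one category: entropy is zero.
min_privacy = shannon_entropy([1.0, 0.0, 0.0, 0.0])

# Uniform interest over four categories: entropy is log2(4) = 2 bits.
max_privacy = shannon_entropy([0.25, 0.25, 0.25, 0.25])
```

Any other profile over four categories falls between these two values, which is why a bias towards certain categories lowers the measured privacy.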

slide-21
SLIDE 21

Architecture

[Figure: block diagram of the architecture, divided into a user side and a network side.
 Blocks: Communication Manager, Category Extractor, Known / Unknown Items Classifier, User Profile Constructor, Forgery and Suppression Generator, Forgery Alarm, Suppression Alarm, Information Provider, Recommendation System.
 Legend: uncategorized item, categorized item, unknown item, rated item, known item.]

slide-22
SLIDE 22

Architecture

Block: Communication Manager

Functionality:

  • Communication with the recommender
  • Retrieve information about the items explored by the user

Example of retrieved item information:

Description - Starting at the beginning, the book explores how JavaScript originated and evolved into what it is today. A detailed discussion of the components that make up a JavaScript implementation follows, with specific focus on standards such as ECMAScript and the Document Object Model (DOM).
Category - books \ computers & internet \ web development
Average Customer Review - 4.5/5

Description - Stephen Hawking, one of the most brilliant theoretical physicists in history, wrote the modern classic A Brief History of Time to help nonscientists understand the questions being asked by scientists today.
Category - books \ science
Average Customer Review - 4/5

Description - Written by soccer great and championship Stanford coach Bobby Clark, this book tells you how, starting at point zero, an uninitiated coach can meld kids into a team and help them enjoy one of the most rewarding experiences of their youth.
Category - books \ sports \ coaching \ soccer
Average Customer Review - 4.5/5

Description - You’ve made it! Your baby has turned one! Now the real fun begins. From temper tantrums to toilet training, raising a toddler brings its own set of challenges and questions, and Toddler 411 has the answers.
Category - books \ parenting & families \ parenting
Average Customer Review - 3/5

slide-23
SLIDE 23

Architecture

Block: Category Extractor

Functionality:

  • Obtain categories associated with the items downloaded by the Communication Manager

slide-24
SLIDE 24

Architecture

Block: Known / Unknown Items Classifier

Functionality:

  • The user classifies the items as known or unknown

[Figure: the example items, with categories "books \ computers & internet \ web development", "books \ science", "books \ sports \ coaching \ soccer" and "books \ parenting & families \ parenting", sorted into unknown items and known items]

slide-25
SLIDE 25

Architecture

Block: User Profile Constructor

Functionality:

  • Computes the actual user profile q

[Figure: the actual user profile q]

slide-26
SLIDE 26

Architecture

Block: Forgery and Suppression Generator

Functionality:

  • Centerpiece of the architecture
  • The user specifies a forgery rate ρ and a suppression rate σ

[Figure: example with forgery rate ρ = 5% and suppression rate σ = 10%; the per-category values shown are 5% (FORGERY) and 8% and 2% (SUPPRESSION)]

slide-27
SLIDE 27

Architecture

Block: Suppression Alarm

Functionality:

  • Generate an alarm when an item should be suppressed

[Figure: suppression example with science 8% and parenting 2%]

slide-28
SLIDE 28

Architecture

Block: Forgery Alarm

Functionality:

  • Generate an alarm when an item should be forged

[Figure: forgery example with the labels computers, sports and population’s rating; computers 5%]
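How the two alarm blocks might consume the generator's output can be sketched in Python. The per-category values are the ones shown in the example (forgery: computers 5%; suppression: science 8%, parenting 2%). The trigger rule, firing whenever the item's category has a positive forgery or suppression component, is an assumption made for illustration, as are the function and variable names; the slides do not specify the decision rule.

```python
def alarm(item_category, item_is_known, forgery, suppression):
    """Return "suppression", "forgery" or None for an item (hypothetical rule)."""
    # Known items are candidates for suppression (refrain from rating them).
    if item_is_known and suppression.get(item_category, 0.0) > 0.0:
        return "suppression"
    # Unknown items are candidates for forgery (submit a rating anyway).
    if not item_is_known and forgery.get(item_category, 0.0) > 0.0:
        return "forgery"
    return None

forgery = {"computers": 0.05}                          # sums to rho = 5%
suppression = {"science": 0.08, "parenting": 0.02}     # sums to sigma = 10%

science_alarm = alarm("science", True, forgery, suppression)
computers_alarm = alarm("computers", False, forgery, suppression)
sports_alarm = alarm("sports", False, forgery, suppression)
```

The alarm only warns; whether the rating is actually forged or withheld remains the user's decision.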

slide-29
SLIDE 29

Formulation of the Optimal Trade-Off between Privacy and Utility

slide-30
SLIDE 30

Trade-Off between Privacy and Utility

  • The degradation in the accuracy of predictions is measured as σ and ρ
  • We model items as r.v.’s taking on values in a common finite alphabet of n categories
  • We define q as the actual user profile, ρ ∈ [0, 1) as the forgery rate and σ ∈ [0, 1) as the suppression rate
  • Accordingly, the user’s apparent profile is defined as

\[
t = \frac{q + r - s}{1 + \rho - \sigma},
\]

where

\[
r = (r_1, \dots, r_n),\quad r_i \geq 0,\quad \sum_i r_i = \rho,
\qquad
s = (s_1, \dots, s_n),\quad q_i \geq s_i \geq 0,\quad \sum_i s_i = \sigma .
\]
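The definition of the apparent profile translates directly into code. The Python sketch below checks the constraints on r and s and then applies the formula (q + r − s) / (1 + ρ − σ). The profile q and the tuples r and s are hypothetical values chosen so that ρ = 5% and σ = 10%; the function name is an assumption.

```python
def apparent_profile(q, r, s, tol=1e-12):
    """Apparent profile (q + r - s) / (1 + rho - sigma), after checking
    r_i >= 0 and q_i >= s_i >= 0 (hypothetical helper)."""
    if any(ri < -tol for ri in r):
        raise ValueError("forgery components must be non-negative")
    if any(si < -tol or si > qi + tol for qi, si in zip(q, s)):
        raise ValueError("suppression components must satisfy q_i >= s_i >= 0")
    rho, sigma = sum(r), sum(s)      # forgery and suppression rates
    norm = 1 + rho - sigma
    return [(qi + ri - si) / norm for qi, ri, si in zip(q, r, s)]

q = [0.60, 0.30, 0.10]   # actual profile (illustrative)
r = [0.00, 0.00, 0.05]   # forgery tuple, rho = 0.05
s = [0.08, 0.02, 0.00]   # suppression tuple, sigma = 0.10
t = apparent_profile(q, r, s)
```

Dividing by 1 + ρ − σ renormalises the perturbed histogram, so t is again a PMF.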

slide-31
SLIDE 31

Trade-Off between Privacy and Utility

  • Privacy is measured as the Shannon entropy of the user’s apparent profile
  • The privacy-forgery-suppression function

\[
\mathcal{P}(\rho,\sigma) \;=\; \max_{\substack{r,\,s \\ r_i \geq 0,\ \sum_i r_i = \rho \\ q_i \geq s_i \geq 0,\ \sum_i s_i = \sigma}} \mathrm{H}\!\left(\frac{q + r - s}{1 + \rho - \sigma}\right)
\]

  • This formulation specifies the key functional block of our architecture, namely the ‘Forgery and Suppression Generator’
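For small n, the privacy-forgery-suppression function can be approximated by brute force. The Python sketch below enumerates every feasible pair (r, s) on a grid of resolution `step` and keeps the pair with the highest entropy. This exhaustive search is an illustration only and is not presented as the method used in the architecture; the profile q, the grid resolution and all names are assumptions.

```python
import math

def entropy(pmf):
    """Shannon entropy in bits, ignoring non-positive entries."""
    return -sum(p * math.log2(p) for p in pmf if p > 0)

def compositions(units, parts):
    """Yield all tuples of `parts` non-negative integers that sum to `units`."""
    if parts == 1:
        yield (units,)
        return
    for first in range(units + 1):
        for rest in compositions(units - first, parts - 1):
            yield (first,) + rest

def grid_search(q, rho_units, sigma_units, step=0.01):
    """Approximate P(rho, sigma) for rho = rho_units * step and
    sigma = sigma_units * step by exhaustive search over grid points."""
    n = len(q)
    norm = 1 + (rho_units - sigma_units) * step
    best = (-1.0, None, None)
    for s_units in compositions(sigma_units, n):
        s = [k * step for k in s_units]
        if any(si > qi + 1e-12 for si, qi in zip(s, q)):
            continue                      # violates q_i >= s_i
        for r_units in compositions(rho_units, n):
            r = [k * step for k in r_units]
            t = [max(0.0, qi + ri - si) / norm for qi, ri, si in zip(q, r, s)]
            h = entropy(t)
            if h > best[0]:
                best = (h, r, s)
    return best

# Hypothetical profile with rho = sigma = 10%.
best_h, best_r, best_s = grid_search([0.6, 0.3, 0.1], 10, 10)
```

In this example the search forges ratings in the least popular category and suppresses ratings in the most popular one, giving the apparent profile (0.5, 0.3, 0.2). The cost grows combinatorially with n and with the grid resolution, so the sketch is only practical for toy cases.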

slide-32
SLIDE 32

Conclusions

slide-33
SLIDE 33

Conclusions

  • The forgery and suppression of ratings arise as two simple mechanisms in terms of infrastructure, but they come at the cost of a loss in utility, namely a degradation in the accuracy of the predictions
  • We propose an architecture that implements these two mechanisms in those CF-based recommendation systems that profile users exclusively from their ratings
  • The centerpiece of our approach is a module responsible for computing the tuples of forgery r and suppression s
  • This information is used to warn the user when their privacy is being compromised
  • It is up to the user to decide whether to forge or eliminate a rating
  • We present a formulation of the optimal trade-off among privacy, forgery rate and suppression rate
