The Theory of Web Transparency: Algorithms and Trade-offs



SLIDE 1

The Theory of Bringing Privacy into Practice, 2015 1/8

The Theory of Web Transparency: Algorithms and Trade-offs

Augustin Chaintreau 1, Guillaume Ducoffe 2,3, Roxana Geambasu 1, Mathias Lécuyer 1

1 Columbia University
2 Univ. Nice Sophia Antipolis, CNRS, I3S, UMR 7271, 06900 Sophia Antipolis, France
3 Inria, France

SLIDE 2

What it is all about

  • The opaque use of Big Data.

Myriads of personal data are collected (and related to each other) → tweets, emails, website visits, click history, …
Potential abuses [Hannak et al., 2014; Acquisti and Fong, 2012] → online discrimination in advertising, hiring, pricing, …

SLIDES 3–5

Objectives

  • In this talk: targeting = online discrimination.

Inputs:
  • a set of “sensitive” data;
  • an advertisement (or prices, or a recommended product, …).

Targeting Detection Problem: is the ad received because of some of the data?
Targeting Identification Problem: output the data that are targeted by the ad.
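The detection problem can be read as a simple statistical test over accounts seeded with and without the sensitive datum. A minimal sketch, where the ad rates (0.9 / 0.1 / 0.5) and the 0.3 decision threshold are illustrative assumptions, not the paper's algorithm:

```python
import random

random.seed(0)

# Hypothetical noise model (illustrative numbers): if the service targets
# the sensitive datum, accounts holding it see the ad 90% of the time and
# other accounts 10% of the time; without targeting, everyone sees it 50%.
def ad_rate(holds_datum, targeted, n_accounts=100):
    p = (0.9 if holds_datum else 0.1) if targeted else 0.5
    shown = sum(random.random() < p for _ in range(n_accounts))
    return shown / n_accounts

def detect_targeting(targeted, threshold=0.3):
    """Flag targeting when the observed ad rate differs too much between
    accounts that hold the sensitive datum and accounts that do not."""
    gap = ad_rate(True, targeted) - ad_rate(False, targeted)
    return abs(gap) > threshold
```

With enough accounts, the gap concentrates near 0.8 under targeting and near 0 otherwise, so a fixed threshold separates the two cases with high probability.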

SLIDES 6–8

Our tool: XRay

  • Open-source: https://github.com/matlecu/xray/

[Diagram] One or more Web services take data inputs (emails, searches, viewed products) and produce targeted outputs (ads, recommended products and videos); XRay monitors both sides and correlates them into associations (email → ad, viewed → recommended).

→ How to find the associations?

SLIDES 9–10

Algorithmic settings

  • Seeding accounts at random → random subsets of sensitive data.
  • Collection of the ads seen by the “shadow accounts” → noisy oracle with (unknown) probabilities p_in, p_out of answering “yes”.

(Intuition) targeting occurs ⟺ observations are not random.

Learning Problem: targeting is modeled as a monotone DNF formula over the data.
Input: a noisy oracle O for membership queries.
Output: the (unknown) targeting function f.
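A minimal simulation of this setting. The values p_in = 0.9, p_out = 0.1, the account count, and the single targeted item are illustrative assumptions; the real targeting function may be any monotone DNF:

```python
import random

random.seed(0)

N = 32                  # number of sensitive data items
P_IN, P_OUT = 0.9, 0.1  # illustrative oracle noise levels (unknown in practice)
TARGETED = {3}          # hypothetical targeted combination (a single item here)

def seed_account(n):
    """A shadow account holds a uniformly random subset of the data."""
    return {i for i in range(n) if random.random() < 0.5}

def noisy_oracle(account):
    """Membership query: does this account see the ad?  Answers 'yes' with
    probability p_in if the targeted data are present, p_out otherwise."""
    p = P_IN if TARGETED <= account else P_OUT
    return random.random() < p

accounts = [seed_account(N) for _ in range(80)]
saw_ad = [noisy_oracle(a) for a in accounts]
```

Because each account is a random subset, accounts holding the targeted item see the ad at a visibly higher rate than the rest, which is exactly the non-randomness the intuition above refers to.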

SLIDES 11–12

Main results

We seek algorithms:

  • with low query complexity (create as few accounts as possible);
  • performing “efficient” computations (polynomial time);
  • whose output is correct w.h.p. (≈ PAC).

Theorem (Lécuyer et al., 2014). Let N be the number of sensitive data. Under a monotonicity assumption on the targeting function f and a bounded-noise assumption (i.e., the ratio p_out/p_in is bounded), a.a.s. we can learn a targeting function of constant size with O(log N) query complexity in O(N · log N) time.
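A correlation-based sketch of the identification step, simplified from XRay's actual set-intersection and Bayesian algorithms; the constants (N, M, p_in, p_out, the targeted item) are illustrative. Each account holds a random half of the data, so every item splits the accounts into two groups whose ad rates can be compared:

```python
import random

random.seed(1)

N = 32                  # number of sensitive data items
M = 100                 # shadow accounts; the theorem says O(log N) suffice
P_IN, P_OUT = 0.9, 0.1  # illustrative noise levels
TARGET = 5              # hypothetical single targeted item

accounts = [{i for i in range(N) if random.random() < 0.5} for _ in range(M)]
saw_ad = [random.random() < (P_IN if TARGET in a else P_OUT) for a in accounts]

def identify(accounts, saw_ad, n):
    """Return the item maximizing P(ad | item held) - P(ad | item not held)."""
    best_item, best_gap = None, float("-inf")
    for item in range(n):
        held = [s for a, s in zip(accounts, saw_ad) if item in a]
        not_held = [s for a, s in zip(accounts, saw_ad) if item not in a]
        if not held or not not_held:
            continue
        gap = sum(held) / len(held) - sum(not_held) / len(not_held)
        if gap > best_gap:
            best_item, best_gap = item, gap
    return best_item
```

The targeted item's gap concentrates near p_in − p_out while every other item's gap is pure sampling noise, which is why a logarithmic number of accounts is enough to separate them with high probability.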

SLIDE 13

Future work

  • Ongoing extension of the model to take negative targeting into account.
  • Price of Opacity → How hard is it for an advertiser to conceal her targeting?
    → (Preliminary results) increasing the noise (the ratio p_out/p_in) by q makes the advertiser’s revenue decrease like 1/q.
