Time Distortion Anonymization for the Publication of Mobility Data - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Time Distortion Anonymization for the Publication of Mobility Data with High Utility

Vincent Primault, Sonia Ben Mokhtar, Cédric Lauradoux and Lionel Brunie

slide-2
SLIDE 2

Mobility data usefulness…

2

  • Real-time traffic, traffic prediction
  • Long-term place prediction
  • Companies collecting data

slide-3
SLIDE 3

… and threats

Gambs et al. Show Me How You Move and I Will Tell You Who You Are. Transactions on Data Privacy, 2011.

3

slide-4
SLIDE 4
slide-5
SLIDE 5

Privacy-preserving data publication

5

[Diagram: raw mobility traces go through a protection mechanism, which outputs anonymized mobility traces. A researcher applies data mining, machine learning and simulations to them to obtain useful analysis results. An attacker uses clustering, white pages and public maps to obtain partially re-identified mobility traces.]

slide-6
SLIDE 6

Outline

  • Introduction
  • State of the art
  • Making a PROMESSE
  • Experimental evaluation
  • Conclusion

6

slide-7
SLIDE 7

A mobility trace

7

A trace is a temporally ordered list of records belonging to the same user. A record is a triplet (user, location, timestamp).
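
The definition above can be written down as a small data structure. This is only an illustrative sketch: the names `Record` and `build_trace`, the coordinate convention and the timestamp unit are assumptions, not taken from the slides.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Record:
    """One record: the triplet (user, location, timestamp)."""
    user: str
    location: Tuple[float, float]  # assumed: (latitude, longitude) in degrees
    timestamp: float               # assumed: seconds since some epoch


def build_trace(records: Iterable[Record], user: str) -> List[Record]:
    """Return the trace of `user`: their records, ordered by timestamp."""
    return sorted((r for r in records if r.user == user),
                  key=lambda r: r.timestamp)
```

A dataset then maps each user to one such trace.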

slide-8
SLIDE 8

Extraction of points of interest (POIs)

8

Done by using, e.g., an appropriate clustering algorithm. Points of interest convey semantic information about habits and can lead to user re-identification.
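
As an illustration of what such a clustering step can look like, here is a minimal stay-point style sketch. It is not the algorithm used by the authors: the function name, the planar coordinates in meters and the default thresholds are assumptions. Records stay in the current cluster while they remain within half the maximum diameter of the cluster's first record, which keeps the cluster diameter under the threshold.

```python
from math import hypot


def extract_pois(records, max_diameter=200.0, min_duration=900.0):
    """Extract POI centroids from one trace.

    records: list of (x, y, t), sorted by t, with x and y in meters on a
    plane and t in seconds (assumed conventions).
    Returns a list of (x, y) centroids.
    """
    pois = []
    i, n = 0, len(records)
    while i < n:
        xi, yi, ti = records[i]
        # Extend the cluster while records stay close to its first record.
        j = i + 1
        while j < n and hypot(records[j][0] - xi,
                              records[j][1] - yi) <= max_diameter / 2:
            j += 1
        if records[j - 1][2] - ti >= min_duration:
            cluster = records[i:j]
            cx = sum(r[0] for r in cluster) / len(cluster)
            cy = sum(r[1] for r in cluster) / len(cluster)
            pois.append((cx, cy))
            i = j      # continue after the stop
        else:
            i += 1     # the user was only passing through
    return pois
```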

slide-9
SLIDE 9

Location privacy protection mechanisms for data publication

9

  • k-anonymity: Wait For Me [Abul et al., 2010]
  • Differential privacy: Geo-Indistinguishability [Andrés et al., 2013]

Abul et al. Anonymization of moving objects databases by clustering and perturbation. Information Systems, 2010.

Andrés et al. Geo-indistinguishability: Differential privacy for Location-based Systems. CCS, 2013.

slide-10
SLIDE 10

Wait For Me

δ represents the uncertainty that comes from GPS measurements. Wait For Me enforces (k,δ)-anonymity, i.e., there are always at least k users in a cylinder of radius δ/2.
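
To make the cylinder condition concrete, the sketch below checks whether a group of trajectories fits in a cylinder of a given radius. It is not the Wait For Me algorithm itself, which builds such groups by clustering and perturbation. The function name, the planar coordinates and the choice of the group centroid as the cylinder axis are assumptions, so the check is conservative: it can reject a group that would fit in a cylinder centered elsewhere.

```python
from math import hypot


def fits_in_cylinder(trajectories, k, delta):
    """Conservative check of the cylinder condition for one group.

    trajectories: list of trajectories, each a list of (x, y) positions in
    meters, all sampled at the same timestamps (assumed).
    Returns True if there are at least k trajectories and, at every
    timestamp, all positions lie within delta / 2 of the group centroid.
    """
    if len(trajectories) < k:
        return False
    # zip(*...) yields, for each timestamp, the positions of all users.
    for positions in zip(*trajectories):
        cx = sum(x for x, _ in positions) / len(positions)
        cy = sum(y for _, y in positions) / len(positions)
        if any(hypot(x - cx, y - cy) > delta / 2 for x, y in positions):
            return False
    return True
```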

10

Abul et al. Anonymization of moving objects databases by clustering and perturbation. Information Systems, 2010.

slide-11
SLIDE 11

Geo-Indistinguishability

11

Andrés et al. Geo-indistinguishability: Differential privacy for Location-based Systems. CCS, 2013.

[Figure: a real location and its protected location, surrounded by regions labeled (l1, r1), (l2, r2), (l3, r3) and (l4, r4).]

Level of privacy li within radius ri, proportional to ε.
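
Geo-indistinguishability is usually obtained by adding noise drawn from a planar Laplace distribution to the real location. The sketch below shows one way to sample it, using the fact that the radial part of that distribution is a Gamma distribution with shape 2 and scale 1/ε. The function names and the planar coordinates in meters (with ε in inverse meters) are assumptions made for illustration.

```python
import math
import random


def planar_laplace_noise(epsilon, rng=random):
    """Draw a 2D noise vector from a planar Laplace distribution.

    The angle is uniform and the radius follows Gamma(shape=2, scale=1/epsilon),
    whose density is epsilon^2 * r * exp(-epsilon * r).
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    theta = rng.uniform(0.0, 2.0 * math.pi)
    r = rng.gammavariate(2.0, 1.0 / epsilon)
    return r * math.cos(theta), r * math.sin(theta)


def protect_location(x, y, epsilon, rng=random):
    """Return the protected version of the real location (x, y)."""
    dx, dy = planar_laplace_noise(epsilon, rng)
    return x + dx, y + dy
```

A smaller ε gives more privacy and more noise: the expected distance between the real and the protected location is 2/ε.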

slide-12
SLIDE 12

Outline

  • Introduction
  • State of the art
  • Making a PROMESSE
  • Experimental evaluation
  • Conclusion

12

slide-13
SLIDE 13

Intuition behind our work

No state-of-the-art mechanism is both privacy-preserving and useful for data scientists.

13

Almost all existing mechanisms alter the geographical information in some way. We believe geographical information is the most important, so we propose a new mechanism that minimally distorts the location.

slide-14
SLIDE 14

Hiding POIs with speed smoothing

The idea: guarantee a constant speed along a trace. This makes it more challenging to identify where a user stops, and therefore her POIs.

How?

  • Divide traces into smaller trajectories, typically one day long.
  • Enforce an equal duration and length between two consecutive records.
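
The second step can be sketched as follows for one trajectory. This is a simplified illustration under stated assumptions, not the authors' implementation: coordinates are planar and in meters, the function name is hypothetical, the spacing parameter `epsilon` is measured along the path, and new timestamps are spread evenly between the first and last original timestamps.

```python
from math import hypot


def smooth_speed(points, times, epsilon):
    """Resample a trajectory so that it appears to move at constant speed.

    points: list of (x, y) in meters; times: matching timestamps in seconds;
    epsilon: distance in meters between consecutive output records.
    Returns a list of ((x, y), t) records.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if not points:
        return []
    sampled = [points[0]]
    carried = 0.0  # path length walked since the last emitted point
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        seg = hypot(x1 - x0, y1 - y0)
        if seg == 0.0:
            continue  # the user did not move: nothing to emit
        pos = epsilon - carried  # where the next point falls on this segment
        while pos <= seg:
            f = pos / seg
            sampled.append((x0 + f * (x1 - x0), y0 + f * (y1 - y0)))
            pos += epsilon
        carried = seg - (pos - epsilon)
    if len(sampled) == 1:
        return [(sampled[0], times[0])]
    # Equal duration between consecutive records.
    step = (times[-1] - times[0]) / (len(sampled) - 1)
    return [(p, times[0] + i * step) for i, p in enumerate(sampled)]
```

Because records recorded during a stop collapse onto the same location and timestamps are redistributed uniformly, the time spent at a stop no longer shows up in the output.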

14

slide-15
SLIDE 15

Speed smoothing

15

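[Figure: a trace passing through a point of interest, before and after speed smoothing. Records are timestamped 10:05, 10:06, 10:07 and 10:08. After smoothing, consecutive records are separated by a distance epsilon.]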

slide-16
SLIDE 16

Outline

  • Introduction
  • State of the art
  • Making a PROMESSE
  • Experimental evaluation
  • Conclusion

16

slide-17
SLIDE 17

Experimenting with three real-life datasets

                     Cabspotting   Geolife   MDC
Records              8.9M          3.8M      1.1M
Traces               5.5k          2.4k      4.6k
Avg trace duration   32 h          3 h       3 h
Avg sampling rate    72 s          7 s       32 s

17

slide-18
SLIDE 18

POIs retrieval

POIs are extracted with a maximum diameter of 200 meters and a minimum duration of 15 minutes. Two POIs match if their centroids are within 100 meters of each other.
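
The matching rule can be turned into an F-score as sketched below. The slides do not spell out how precision and recall are computed, so the definitions here are an assumption: a retrieved POI counts as correct if some actual POI lies within the matching distance, and an actual POI counts as found if some retrieved POI does. Centroids are assumed to be planar (x, y) pairs in meters.

```python
from math import hypot


def poi_f_score(actual, retrieved, max_distance=100.0):
    """F-score of POIs retrieved from protected data against actual POIs.

    actual, retrieved: lists of (x, y) centroids in meters.
    """
    def has_match(p, candidates):
        return any(hypot(p[0] - c[0], p[1] - c[1]) <= max_distance
                   for c in candidates)

    if not actual or not retrieved:
        return 0.0
    precision = sum(has_match(p, actual) for p in retrieved) / len(retrieved)
    recall = sum(has_match(p, retrieved) for p in actual) / len(actual)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

From a privacy point of view, a low score means the attacker recovers few of the real POIs.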

18

slide-19
SLIDE 19

POIs retrieval (F-score)

19

[Bar chart: F-score of POIs retrieval (axis from 0% to 60%) for Cabspotting, Geolife and MDC. Lower is better.]

slide-20
SLIDE 20

Average spatial error

20

[Figure: a real trace and the corresponding protected trace.]
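
The slides do not define the metric precisely. One plausible reading, used here purely as an assumption, is the mean distance from each record of the protected trace to the closest point of the real trace, seen as a polyline. The sketch uses planar coordinates in meters and hypothetical function names.

```python
from math import hypot


def _distance_to_segment(p, a, b):
    """Distance from point p to the segment [a, b]."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg2 = dx * dx + dy * dy
    if seg2 == 0:
        return hypot(px - ax, py - ay)
    # Projection of p on the line, clamped to the segment.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg2))
    return hypot(px - (ax + t * dx), py - (ay + t * dy))


def average_spatial_error(real, protected):
    """Mean distance from protected records to the real trace (assumed metric).

    real, protected: non-empty lists of (x, y) positions in meters.
    """
    if not real or not protected:
        raise ValueError("both traces must be non-empty")
    segments = list(zip(real, real[1:])) or [(real[0], real[0])]
    errors = [min(_distance_to_segment(p, a, b) for a, b in segments)
              for p in protected]
    return sum(errors) / len(errors)
```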

slide-21
SLIDE 21

Average spatial error

21

[Bar chart: average spatial error in meters (log scale, from 0.1 to 100,000) for Cabspotting, Geolife and MDC. Lower is better.]

slide-22
SLIDE 22

Range queries distortion

22

  • From 2 to 8 hours
  • 1,000 different queries

Distortion is |Q(D) – Q(D’)|/Q(D)
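
The formula can be computed as sketched below, reading D as the original dataset and D' as the protected one. What a query Q counts is not stated on the slide, so the sketch assumes it counts the distinct users with at least one record inside a spatial rectangle during a time window. The record layout (user, x, y, t) and the function names are also assumptions.

```python
def range_query(records, x_min, x_max, y_min, y_max, t_min, t_max):
    """Count distinct users seen in the rectangle during the time window.

    records: iterable of (user, x, y, t).
    """
    return len({user for user, x, y, t in records
                if x_min <= x <= x_max
                and y_min <= y <= y_max
                and t_min <= t <= t_max})


def query_distortion(original, protected, query):
    """Compute |Q(D) - Q(D')| / Q(D) for one query.

    query: tuple (x_min, x_max, y_min, y_max, t_min, t_max).
    Returns None when Q(D) is 0, since the ratio is then undefined.
    """
    q_original = range_query(original, *query)
    if q_original == 0:
        return None
    q_protected = range_query(protected, *query)
    return abs(q_original - q_protected) / q_original
```

Averaging this value over many queries gives a single distortion figure per dataset.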

slide-23
SLIDE 23

Range queries distortion

23

0% 20% 40% 60% 80% 100% 120% Cabspotting Geolife MDC

Lower is better

slide-24
SLIDE 24

Outline

  • Introduction
  • State of the art
  • Making a PROMESSE
  • Experimental evaluation
  • Future work
  • Conclusion

24

slide-25
SLIDE 25

Summary

  • Introduced time distortion, opening a new research direction.
  • Implemented a new protection mechanism for data publishing, addressing a severe threat while maintaining high utility.
  • Evaluated against three real-life datasets.

25

slide-26
SLIDE 26

Questions

26