Differentially Private Multi-Dimensional Time Series Release for Traffic Monitoring. Liyue Fan, Li Xiong, Vaidy Sunderam, Department of Math & Computer Science, Emory University. DBSec'13, 9/4/2013.


SLIDE 1

Differentially Private Multi-Dimensional Time Series Release for Traffic Monitoring

Liyue Fan, Li Xiong, Vaidy Sunderam
Department of Math & Computer Science
Emory University
DBSec’13

SLIDE 2

Outline

  • Traffic Monitoring
  • User Privacy
  • Challenges
  • Proposed Solutions
  • Temporal Estimation
  • Spatial Estimation
  • Empirical Evaluation

9/4/2013 DBSec'13: Privacy Preserving Traffic Monitoring

SLIDE 3

Monitoring Traffic

  • Congestions/Trending places/Everyday life
  • How many cars are there? Where are they?

[Figures: Monital Metropol, Brazil; Google Traffic View]

SLIDE 4

Traffic Monitoring

  • Real-time GPS data → traffic histogram
  • At any timestamp: real-time user locations are aggregated into a 2D histogram

SLIDE 5

User Privacy

  • User privacy should be protected when releasing their data!
  • Real-time location data is sensitive
  • pleaserobme.com
  • GPS traces are identifying
  • “We study fifteen months of human mobility data for one and a half million individuals and find that human mobility traces are highly unique. … in a dataset where the location of an individual is specified hourly, and with a spatial resolution equal to that given by the carrier's antennas, four spatio-temporal points are enough to uniquely identify 95% of the individuals.”

De Montjoye, Yves-Alexandre, Cesar A. Hidalgo, Michel Verleysen, and Vincent D. Blondel. "Unique in the Crowd: The Privacy Bounds of Human Mobility." Scientific Reports 3 (2013).

SLIDE 6

Differentially Private Data Sharing

SLIDE 7

Differential Privacy (in a nutshell)

  • Rigorous definition
  • Doesn’t stipulate the prior knowledge of the attacker
  • Upon seeing the published data, an attacker should gain little knowledge about any specific individual.
  • α-Differential Privacy [BLR08], where α is the privacy budget
  • Smaller α values (α < 1) indicate a stronger privacy guarantee

slide-8
SLIDE 8

Static α-Differential Privacy

  • Laplace perturbation

$A(D) = f(D) + \mathrm{Lap}(\Delta f / \alpha)^d$

  • Global sensitivity

$\Delta f = \max_{D, D'} \lVert f(D) - f(D') \rVert_1$

Example: dataset $D$, query $f$

$f(D)$: $c_1$: 2, $c_2$: 1, $c_3$: 3, $c_4$: 4, with $\Delta f = 1$

Laplace perturbation: $\tilde{c}_j = c_j + \mathrm{Lap}(1/\alpha)$

$A(D)$: $\tilde{c}_1$: 1, $\tilde{c}_2$: 0, $\tilde{c}_3$: 5, $\tilde{c}_4$: 3

Strong privacy → high perturbation noise
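The Laplace perturbation on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names, the seed, and the choice α = 0.5 are assumptions made for the example, and the four counts are the ones from the slide's example histogram.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a zero-mean Laplace distribution with this scale."""
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def laplace_mechanism(counts, sensitivity, alpha, rng):
    """Add independent Lap(sensitivity / alpha) noise to every count."""
    scale = sensitivity / alpha
    return [c + laplace_noise(scale, rng) for c in counts]


rng = random.Random(0)          # fixed seed only so the sketch is repeatable
true_counts = [2, 1, 3, 4]      # c1..c4 from the slide's example
released = laplace_mechanism(true_counts, sensitivity=1.0, alpha=0.5, rng=rng)
```

Halving α doubles the noise scale, which is the "strong privacy → high perturbation noise" trade-off stated on the slide.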

SLIDE 9

Composability of Differential Privacy

  • Sequential Composition [McSherry10]
  • Let $A_k$ each provide $\alpha_k$-differential privacy. A sequence of $A_k(D)$ over dataset $D$ provides $\sum_k \alpha_k$-differential privacy.
  • Timestamp k = 0, …, T − 1
  • $f_k(D)$: 2D cell histogram at time k
  • $A_k(D)$: released 2D histogram that satisfies $\frac{\alpha}{T}$-DP
  • $A_0(D), \dots, A_{T-1}(D)$ satisfies α-DP
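The budget arithmetic behind sequential composition is short enough to check directly. The sketch below assumes a uniform split of the budget across timestamps, as on the slide; the helper names and the values α = 1.0 and T = 100 are illustrative assumptions.

```python
import math


def per_timestamp_budget(alpha, num_timestamps):
    """Uniform split: each of the T releases gets alpha / T."""
    return alpha / num_timestamps


def composed_budget(budgets):
    """Sequential composition: the individual budgets add up."""
    return math.fsum(budgets)


T = 100        # number of timestamps (illustrative)
alpha = 1.0    # overall privacy budget (illustrative)
budgets = [per_timestamp_budget(alpha, T)] * T
total = composed_budget(budgets)

# With sensitivity 1, the per-timestamp Laplace scale is 1 / (alpha / T) = T / alpha.
noise_scale = 1.0 / budgets[0]
```

The noise scale grows linearly with T, which is why a long series is costly under a fixed overall budget.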

SLIDE 10

Baseline Solution: LPA

  • Laplace Perturbation Algorithm
  • For each timestamp k:
  • Release $A_k(D) = f_k(D) + \mathrm{Lap}(T/\alpha)^d$

  • High perturbation noise for long time-series, i.e. when T is large
  • Low utility output since data is sparse
  • Fact: location data is VERY sparse.

Example: true counts $c_1$: 2, $c_2$: 1, $c_3$: 3, $c_4$: 4; released counts $\tilde{c}_1$: 1, $\tilde{c}_2$: 0, $\tilde{c}_3$: 5, $\tilde{c}_4$: 3

Relative error: $c_1$: 50%, $c_2$: 100%
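A rough sketch of the LPA baseline and of the relative-error figures quoted in the example follows. The function names are hypothetical, and `relative_error` assumes a nonzero true count; the slides do not say how zero counts are handled.

```python
import random


def laplace_noise(scale, rng):
    """Zero-mean Laplace sample as a difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def lpa_release(histograms, alpha, rng):
    """Perturb each of the T histograms with Lap(T / alpha) noise per cell."""
    num_timestamps = len(histograms)
    scale = num_timestamps / alpha
    return [[c + laplace_noise(scale, rng) for c in hist] for hist in histograms]


def relative_error(true_count, released_count):
    """|released - true| / true, assuming true_count is nonzero."""
    return abs(released_count - true_count) / true_count


# The two cells called out in the example:
print(relative_error(2, 1))  # 0.5, i.e. 50% for c1
print(relative_error(1, 0))  # 1.0, i.e. 100% for c2
```

Because the counts are small, even a noise draw of magnitude 1 produces a large relative error, which is the sparsity problem the slide points out.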

SLIDE 11

Two Proposed Solutions

  • Temporal Estimation for each cell: utilize a time series model and posterior estimation to reduce perturbation error.
  • Spatial Estimation within each partition: group similar cells together to overcome data sparsity.

SLIDE 12

Framework

[Diagram components: Raw Series, Modeling/Partitioning, Laplace Perturbation, Estimation, Differentially Private Series]

Domain knowledge: a known Sparse or Dense label for each cell. Using it doesn’t incur extra differential privacy cost.

SLIDE 13

Temporal Estimation

  • For each cell, its count series $\{x_k\}$, k = 0, …, T − 1
  • e.g. {3,3,4,5,4,3,2,…}
  • Process Model

$x_{k+1} = x_k + \omega$, $\omega \sim \mathcal{N}(0, Q)$

  • Q: small value for Sparse cells; large value for Dense cells.
  • Measurement Model

$z_k = x_k + \nu$, $\nu \sim \mathrm{Lap}(T/\alpha)$

  • Goal: given $z_k$ and the above models, estimate $x_k$.

SLIDE 14

Temporal Estimation(cont.)

  • Estimation algorithm based on the Kalman filter
  • Gaussian approximation: $\nu \sim \mathcal{N}(0, R)$, $R \propto T^2/\alpha^2$
  • Model-based prediction
  • Posterior estimate/output: linearly combine prediction and measurement
  • O(1) computation per timestamp
  • Fan and Xiong, CIKM’12, TKDE’13
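The prediction and correction steps can be sketched as a scalar Kalman filter over one cell's perturbed series. This is a textbook form under the random-walk process model, not necessarily the authors' exact algorithm. The initial values `x0` and `p0` are hypothetical defaults, and setting R to the exact Laplace variance 2(T/α)² is an assumption; the slide states only that R is proportional to T²/α².

```python
def kalman_filter(measurements, q, r, x0=0.0, p0=1.0):
    """Filter noisy counts under the model x_{k+1} = x_k + w, z_k = x_k + v.

    q: process noise variance Q; r: measurement noise variance R.
    x0, p0: initial estimate and its error variance (illustrative defaults).
    """
    estimates = []
    x_post, p_post = x0, p0
    for z in measurements:
        # Prediction: the random-walk model carries the last estimate forward.
        x_prior = x_post
        p_prior = p_post + q
        # Correction: linearly combine prediction and measurement.
        gain = p_prior / (p_prior + r)
        x_post = x_prior + gain * (z - x_prior)
        p_post = (1.0 - gain) * p_prior
        estimates.append(x_post)
    return estimates


def laplace_variance(num_timestamps, alpha):
    """Variance of Lap(b) is 2 * b**2; here b = T / alpha."""
    return 2.0 * (num_timestamps / alpha) ** 2


# Constant measurements: the estimate moves from x0 toward the measured level.
estimates = kalman_filter([5.0] * 200, q=0.1, r=10.0)
```

Each step uses only the previous estimate and its variance, which matches the O(1) computation per timestamp noted on the slide.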

SLIDE 15

Temporal Estimation Example

  • For cell c, at time k:
  • Suppose 𝑦𝑙 = 4
  • Prediction $\hat{x}_k^-$, e.g. 2
  • Measurement/Laplace perturbed value $z_k$, e.g. 8
  • Posterior estimation $\hat{x}_k$, e.g. 3
  • Impact of perturbation noise is reduced by taking into account the process model and prediction!
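The numbers in this example can be checked with a few lines of arithmetic. The gain is not stated on the slide; the value below is simply what the three numbers imply if the posterior is a linear combination of prediction and measurement, and the function name is hypothetical.

```python
def posterior(prediction, measurement, gain):
    """Linear combination of the prediction and the measurement."""
    return prediction + gain * (measurement - prediction)


true_value, prediction, measurement, estimate = 4, 2, 8, 3

# Gain implied by the example: (3 - 2) / (8 - 2) = 1/6.
implied_gain = (estimate - prediction) / (measurement - prediction)

# Absolute error before and after estimation.
measurement_error = abs(measurement - true_value)  # 4
posterior_error = abs(estimate - true_value)       # 1
```

With a small gain the estimate stays close to the prediction, so the error in this example drops from 4 for the raw perturbed value to 1 for the posterior.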

SLIDE 16

Spatial Estimation

  • Goal: group cells to overcome data sparsity.
  • First partition the space until each partition contains Sparse or Dense cells only
  • Top-down algorithm based on QuadTree
  • Data independence and efficiency
  • For each timestamp k:
  • $f'_k(D)$: partition counts, with $\Delta f'_k = 1$
  • $A'_k(D) = f'_k(D) + \mathrm{Lap}(T/\alpha)^{d'}$
  • Release $\hat{f}_k(D)$ estimated from $A'_k(D)$
  • Each cell is visited O(1) times at each timestamp.

[Figure: grid of 16 cell labels, 14 Sparse (S) and 2 Dense (D)]
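A top-down split of this kind might look like the sketch below. It assumes a square grid whose side is a power of two, and the layout of the example grid is hypothetical; only the 14 Sparse / 2 Dense label counts come from the slide. The homogeneity check rescans cells at every level, so this is an illustration of the idea rather than an efficient implementation.

```python
def quadtree_partition(labels, row=0, col=0, size=None):
    """Split a square label grid into squares that hold a single label.

    labels: list of lists of 'S' / 'D'; returns a list of (row, col, size).
    """
    if size is None:
        size = len(labels)
    first = labels[row][col]
    homogeneous = all(
        labels[r][c] == first
        for r in range(row, row + size)
        for c in range(col, col + size)
    )
    if homogeneous or size == 1:
        return [(row, col, size)]
    # Mixed labels: recurse into the four quadrants.
    half = size // 2
    parts = []
    for dr in (0, half):
        for dc in (0, half):
            parts.extend(quadtree_partition(labels, row + dr, col + dc, half))
    return parts


grid = [
    list("SSSS"),
    list("SSSS"),
    list("SSSD"),
    list("SSSD"),
]
partitions = quadtree_partition(grid)
```

Because the split depends only on the public Sparse/Dense labels and not on the counts, it can be computed once before any data is released.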

SLIDE 17

Spatial Estimation Example

  • At time k

[Figure panels: Original Cell Histogram $f_k(D)$; Partition Histogram $f'_k(D)$; Laplace Perturbed $A'_k(D)$; Estimated Cell Histogram $\hat{f}_k(D)$]

Perturbation noise is evenly distributed to every cell within the partition.
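One way to read this step: sum the cells of each partition, perturb the sum once, and give every cell in the partition an equal share of the perturbed sum. The sketch below follows that reading, which is an assumption about the estimator. The cell counts and partition list are hypothetical, with partitions given as (row, col, size) squares.

```python
import random


def laplace_noise(scale, rng):
    """Zero-mean Laplace sample as a difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def spatial_estimate(cell_counts, partitions, num_timestamps, alpha, rng):
    """Estimate cell counts from perturbed partition counts at one timestamp."""
    n = len(cell_counts)
    estimate = [[0.0] * n for _ in range(n)]
    scale = num_timestamps / alpha
    for row, col, size in partitions:
        cells = [(r, c) for r in range(row, row + size)
                 for c in range(col, col + size)]
        partition_count = sum(cell_counts[r][c] for r, c in cells)
        noisy_count = partition_count + laplace_noise(scale, rng)
        # Every cell in the partition receives the same share.
        share = noisy_count / len(cells)
        for r, c in cells:
            estimate[r][c] = share
    return estimate


counts = [
    [1, 0, 0, 1],
    [0, 1, 1, 0],
    [2, 1, 1, 6],
    [0, 1, 3, 10],
]
parts = [(0, 0, 2), (0, 2, 2), (2, 0, 2), (2, 2, 1), (2, 3, 1), (3, 2, 1), (3, 3, 1)]
estimated = spatial_estimate(counts, parts, num_timestamps=100, alpha=1.0,
                             rng=random.Random(0))
```

A partition of n cells draws one noise value instead of n, so the noise per cell shrinks by a factor of n, at the cost of smoothing out differences between cells in the same partition.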

SLIDE 18

Evaluation: Data

  • Generated moving objects on a road network
  • City of Oldenburg, Germany
  • 500K objects at the beginning
  • 25K new objects at every timestamp
  • Total time: 100 timestamps
  • Two-dimensional 1024 by 1024 grid over the city map
  • Each cell represents 400 m²
  • Record object locations at cell resolution
  • 95% of cells are labeled Sparse!

Data generator: http://iapg.jade-hs.de/personen/brinkhoff/generator/
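Recording locations at cell resolution amounts to binning coordinates into grid cells. The sketch below assumes square cells (so 400 m² gives a 20 m side), non-negative coordinates in meters measured from one corner of the map, and a sparse dictionary for the histogram; none of these details are given on the slide.

```python
GRID_SIZE = 1024      # cells per side, from the slide
CELL_SIDE_M = 20.0    # sqrt(400 m^2), assuming square cells


def to_cell(x_m, y_m):
    """Map a position in meters to a (row, col) cell index."""
    col = min(int(x_m // CELL_SIDE_M), GRID_SIZE - 1)
    row = min(int(y_m // CELL_SIDE_M), GRID_SIZE - 1)
    return row, col


def histogram(positions):
    """Count objects per cell; only non-empty cells are stored."""
    counts = {}
    for x_m, y_m in positions:
        cell = to_cell(x_m, y_m)
        counts[cell] = counts.get(cell, 0) + 1
    return counts


cells = histogram([(5.0, 5.0), (10.0, 10.0), (25.0, 45.0)])
```

Under these assumptions the grid spans 1024 × 20 m = 20.48 km per side, and storing only occupied cells suits data in which most cells are sparse.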

SLIDE 19

Temporal Estimation

[Figure: cell count vs. time; series: orig, Laplace, Kalman]

SLIDE 20

Spatial Partitions

[Figures: Oldenburg Road Network; Partitions by QuadTree]

SLIDE 21

Evaluation: Utility vs. Privacy

  • Utility of each cell: Average Relative Error of released series
  • For each α value, median utility among each class is plotted

DFT: Rastogi and Nath, SIGMOD’10
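The utility metric could be computed along the following lines. The slides do not give the exact formula, so this is one plausible reading: the `floor` on the denominator (to avoid dividing by zero for empty cells) and the dictionary layout are assumptions.

```python
from statistics import median


def average_relative_error(true_series, released_series, floor=1.0):
    """Mean over timestamps of |released - true| / max(true, floor)."""
    errors = [
        abs(released - true) / max(true, floor)
        for true, released in zip(true_series, released_series)
    ]
    return sum(errors) / len(errors)


def median_error_by_class(errors, labels):
    """errors: {cell: error}; labels: {cell: 'S' or 'D'} -> {label: median}."""
    grouped = {}
    for cell, err in errors.items():
        grouped.setdefault(labels[cell], []).append(err)
    return {label: median(values) for label, values in grouped.items()}


are = average_relative_error([2, 4], [1, 5])   # (0.5 + 0.25) / 2 = 0.375
by_class = median_error_by_class(
    {"a": 0.1, "b": 0.3, "c": 0.5, "d": 2.0},
    {"a": "S", "b": "S", "c": "S", "d": "D"},
)
```

Taking the median within each class keeps a few extreme cells from dominating the plotted value.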

SLIDE 22

Evaluation: Range Queries

  • How many objects are in the area of m by m cells at every timestamp?
  • For each m, 100 areas are randomly selected and evaluated.
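A range-query evaluation of this kind can be sketched as follows for a single timestamp. Sampling the top-left corner uniformly, the relative-error form, and the `floor` on the denominator are all assumptions; the slide specifies only m-by-m areas and 100 random areas per m.

```python
import random


def range_count(grid, row, col, m):
    """Sum the counts in the m-by-m area whose top-left cell is (row, col)."""
    return sum(grid[r][c]
               for r in range(row, row + m)
               for c in range(col, col + m))


def sample_areas(grid_size, m, num_areas, rng):
    """Pick top-left corners so that every m-by-m area fits in the grid."""
    limit = grid_size - m
    return [(rng.randint(0, limit), rng.randint(0, limit))
            for _ in range(num_areas)]


def range_query_error(true_grid, released_grid, m, num_areas, rng, floor=1.0):
    """Average relative error of range counts over randomly chosen areas."""
    errors = []
    for row, col in sample_areas(len(true_grid), m, num_areas, rng):
        true = range_count(true_grid, row, col, m)
        released = range_count(released_grid, row, col, m)
        errors.append(abs(released - true) / max(true, floor))
    return sum(errors) / len(errors)


ones = [[1] * 4 for _ in range(4)]
error = range_query_error(ones, ones, m=2, num_areas=100, rng=random.Random(0))
```

Repeating this for each timestamp and averaging would cover the "at every timestamp" part of the query.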

SLIDE 23

Evaluation: Runtime

  • Overall runtime is plotted in milliseconds.

SLIDE 24

Conclusion

  • Private release is difficult when the time series is long and the data is sparse!
  • Domain knowledge can be used for temporal modeling as well as spatial partitioning.
  • Output utility is improved with the same privacy guarantee.
  • We observe no extra time cost from our solutions.
  • Ongoing work:
  • Utilize rich information in spatio-temporal data.
  • Model learning and parameter learning.
  • Contact: liyue.fan@emory.edu
  • AIMS Group: www.mathcs.emory.edu/aims

SLIDE 25

Q&A
