DBSec'13
Differentially Private Multi-Dimensional Time Series Release for Traffic Monitoring
Liyue Fan, Li Xiong, Vaidy Sunderam
Department of Math & Computer Science, Emory University
9/4/2013 DBSec'13: Privacy Preserving Traffic Monitoring
Outline
- Traffic Monitoring
- User Privacy
- Challenges
- Proposed Solutions
- Temporal Estimation
- Spatial Estimation
- Empirical Evaluation
Monitoring Traffic
- Congestion / trending places / everyday life
- How many cars are there? Where are they?
(Images: Monital Metropol, Brazil; Google Traffic View; real-time user location)
Traffic Monitoring
- Real-time traffic histogram from GPS data
- At any timestamp: an aggregate 2D histogram
User Privacy
- User privacy should be protected when releasing their data!
- Real-time location data is sensitive
- pleaserobme.com
- GPS traces are identifying
- “We study fifteen months of human mobility data for one and a half million individuals and find that human mobility traces are highly unique. … in a dataset where the location of an individual is specified hourly, and with a spatial resolution equal to that given by the carrier's antennas, four spatio-temporal points are enough to uniquely identify 95% of the individuals.”
De Montjoye, Yves-Alexandre, Cesar A. Hidalgo, Michel Verleysen, and Vincent D. Blondel. "Unique in the Crowd: The Privacy Bounds of Human Mobility." Scientific Reports 3 (2013)
Differentially Private Data Sharing
Differential Privacy (in a nutshell)
- Rigorous definition
- Makes no assumption about the attacker’s prior knowledge
- Upon seeing the published data, an attacker should gain little knowledge about any specific individual.
- α-Differential Privacy [BLR08]
- Smaller α values (α < 1) indicate a stronger privacy guarantee
- α is the privacy budget
Static α-Differential Privacy
- Laplace perturbation: $A(D) = f(D) + \mathrm{Lap}(\Delta f / \alpha)^d$
- Global sensitivity: $\Delta f = \max_{D, D'} \lVert f(D) - f(D') \rVert_1$
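Example (figure): Dataset D → Query f → Laplace perturbation
- $f(D)$: $c_1 = 2, c_2 = 1, c_3 = 3, c_4 = 4$, with $\Delta f = 1$
- $A(D)$: $\tilde{c}_1 = 1, \tilde{c}_2 = 0, \tilde{c}_3 = 5, \tilde{c}_4 = 3$, where $\tilde{c}_j = c_j + \mathrm{Lap}(1/\alpha)$
- Strong privacy → high perturbation noise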
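A minimal Python sketch of the Laplace perturbation step on the example histogram, assuming the query f returns a vector of cell counts with Δf = 1. The helper names `laplace_noise` and `laplace_perturb` are hypothetical; the sampler relies on the fact that the difference of two i.i.d. exponential variables with mean b is Laplace(0, b).

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """One Laplace(0, scale) sample: difference of two i.i.d. exponentials with mean `scale`."""
    rate = 1.0 / scale  # expovariate takes a rate, i.e. 1 / mean
    return rng.expovariate(rate) - rng.expovariate(rate)


def laplace_perturb(counts, sensitivity: float, alpha: float, rng: random.Random):
    """Return A(D) = f(D) + Lap(sensitivity / alpha)^d for a d-dimensional count vector."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    scale = sensitivity / alpha  # smaller alpha (stronger privacy) -> larger noise
    return [c + laplace_noise(scale, rng) for c in counts]


if __name__ == "__main__":
    rng = random.Random(7)
    true_counts = [2, 1, 3, 4]  # c1..c4 from the example above
    noisy = laplace_perturb(true_counts, sensitivity=1.0, alpha=1.0, rng=rng)
    print([round(v, 2) for v in noisy])
```

The output is unrounded; rounding or clamping to non-negative integers would be a post-processing choice, which does not weaken the differential privacy guarantee.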
Composability of Differential Privacy
- Sequential Composition [McSherry10]
- Let $A_k$ each provide $\alpha_k$-differential privacy. A sequence of $A_k(D)$ over dataset $D$ provides $(\sum_k \alpha_k)$-differential privacy.
- Timestamps $k = 0, \dots, T-1$
- $f_k(D)$: 2D cell histogram at time $k$
- $A_k(D)$: released 2D histogram that satisfies $(\alpha/T)$-DP
- $A_0(D), \dots, A_{T-1}(D)$ together satisfy $\alpha$-DP
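Spelling out the budget split: each of the T releases spends α/T, so sequential composition gives α for the whole series. With $\Delta f_k = 1$ this means a Laplace scale of T/α per timestamp. The identity $\operatorname{Var}[\mathrm{Lap}(b)] = 2b^2$ is a standard fact added here; it shows the per-cell noise standard deviation growing linearly in T, and it is the same $T^2/\alpha^2$ dependence that appears in the Gaussian approximation used for temporal estimation.

```latex
\sum_{k=0}^{T-1} \frac{\alpha}{T} = \alpha,
\qquad
b = \frac{\Delta f_k}{\alpha / T} = \frac{T}{\alpha},
\qquad
\operatorname{Var}[\mathrm{Lap}(b)] = 2b^2 = \frac{2T^2}{\alpha^2}
\;\Rightarrow\;
\sigma = \frac{\sqrt{2}\,T}{\alpha}.
```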
Baseline Solution: LPA
- Laplace Perturbation Algorithm
- For each timestamp k:
- Release $A_k(D) = f_k(D) + \mathrm{Lap}(T/\alpha)^d$
- High perturbation noise for long time-series, i.e. when T is large
- Low utility output since data is sparse
- Fact: location data is VERY sparse.
Example: true counts $c_1 = 2, c_2 = 1, c_3 = 3, c_4 = 4$; perturbed counts $\tilde{c}_1 = 1, \tilde{c}_2 = 0, \tilde{c}_3 = 5, \tilde{c}_4 = 3$; relative error of $c_1$: 50%, of $c_2$: 100%
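A minimal sketch of the LPA baseline and of the relative error on the example, assuming each per-timestamp histogram has sensitivity 1. Function names are hypothetical; relative error is computed as |released − true| / true, which reproduces the 50% and 100% figures above.

```python
import random


def laplace_noise(scale, rng):
    """One Laplace(0, scale) sample (difference of two exponentials with mean `scale`)."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def lpa_release(histograms, alpha, rng):
    """Baseline LPA: perturb each of the T histograms with Lap(T / alpha),
    i.e. spend alpha / T per timestamp (sensitivity 1 per histogram)."""
    T = len(histograms)
    scale = T / alpha
    return [[c + laplace_noise(scale, rng) for c in hist] for hist in histograms]


def relative_error(true_value, released_value):
    """|released - true| / true, defined here only for a positive true count."""
    if true_value <= 0:
        raise ValueError("relative error is undefined for a zero true count")
    return abs(released_value - true_value) / true_value


true_counts = [2, 1, 3, 4]
released = [1, 0, 5, 3]  # the perturbed example values from the slide
errors = [relative_error(t, r) for t, r in zip(true_counts, released)]
print([round(e, 2) for e in errors])  # [0.5, 1.0, 0.67, 0.25]
```

With T = 100 and α = 1, every cell receives noise of mean absolute size 100, which swamps the small counts of a sparse cell.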
Two Proposed Solutions
- Temporal Estimation for each cell: utilize a time series model and posterior estimation to reduce perturbation error
- Spatial Estimation within each partition: group similar cells together to overcome data sparsity
Framework
Pipeline (figure): Raw Series → Modeling/Partitioning → Laplace Perturbation → Estimation → Differentially Private Series
- Domain knowledge: a known Sparse or Dense label for each cell; using it incurs no extra differential privacy cost
Temporal Estimation
- For each cell, its count series $\{x_k\}$, $k = 0, \dots, T-1$
- e.g. {3, 3, 4, 5, 4, 3, 2, …}
- Process model: $x_{k+1} = x_k + \omega$, $\omega \sim \mathcal{N}(0, Q)$
- Measurement model: $z_k = x_k + \nu$, $\nu \sim \mathrm{Lap}(T/\alpha)$
- Goal: given $z_k$ and the above models, estimate $x_k$.
- Process noise Q: small value for Sparse cells; large value for Dense cells.
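A minimal simulation of the two models for a single cell, useful for trying an estimator on synthetic data. The function name `simulate_cell` and the starting value `x0` are hypothetical; Q, T and α play the roles they have above.

```python
import random


def laplace_noise(scale, rng):
    """One Laplace(0, scale) sample (difference of two exponentials with mean `scale`)."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def simulate_cell(x0, T, Q, alpha, rng):
    """Return (true series x_k, measurements z_k) for k = 0..T-1 under
    x_{k+1} = x_k + w, w ~ N(0, Q)  and  z_k = x_k + v, v ~ Lap(T / alpha)."""
    xs, zs = [], []
    x = float(x0)
    for _ in range(T):
        xs.append(x)
        zs.append(x + laplace_noise(T / alpha, rng))
        x += rng.gauss(0.0, Q ** 0.5)  # random.gauss takes a standard deviation, not a variance
    return xs, zs


xs, zs = simulate_cell(x0=3, T=100, Q=0.5, alpha=1.0, rng=random.Random(42))
```

With a small Q (a Sparse cell) the true series barely moves, while the measurement noise of scale T/α dominates each $z_k$.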
Temporal Estimation (cont.)
- Estimation algorithm based on the Kalman filter
- Gaussian approximation: $\nu \sim \mathcal{N}(0, R)$, $R \propto T^2/\alpha^2$
- Model-based prediction, then posterior estimate/output
- Linearly combine prediction and measurement
- O(1) computation per timestamp
- Fan and Xiong, CIKM’12, TKDE’13
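A minimal scalar Kalman filter for the random-walk model, sketched as an illustration of the estimation step rather than the authors' implementation. Assumptions: R is taken as the Laplace variance 2(T/α)², one concrete choice within R ∝ T²/α²; the initial estimate `x0` and variance `P0` are hypothetical parameters. Each timestamp costs O(1).

```python
def laplace_variance(T, alpha):
    """Variance of Lap(T / alpha), used here as the Gaussian-approximation R."""
    return 2.0 * (T / alpha) ** 2


def kalman_filter(measurements, Q, R, x0=0.0, P0=1e6):
    """Posterior estimates x_hat_k for x_{k+1} = x_k + w (Var Q), z_k = x_k + v (Var R)."""
    x_hat, P = x0, P0  # large P0: trust the first measurement almost fully
    estimates = []
    for z in measurements:
        # Prediction: the random walk keeps the mean and adds process variance.
        x_prior = x_hat
        P_prior = P + Q
        # Correction: linearly combine prediction and measurement via the gain K.
        K = P_prior / (P_prior + R)
        x_hat = x_prior + K * (z - x_prior)
        P = (1.0 - K) * P_prior
        estimates.append(x_hat)
    return estimates


if __name__ == "__main__":
    # Hypothetical measurements of a cell whose true count stays near 100 (T = 5, alpha = 1).
    zs = [110, 95, 102, 98, 101]
    print([round(v, 1) for v in kalman_filter(zs, Q=1.0, R=laplace_variance(5, 1.0))])
```

A small Q makes the gain shrink quickly, so the filter averages many noisy measurements; a large Q keeps the gain high so the estimate can follow a fast-changing Dense cell.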
Temporal Estimation Example
- For cell c, at time k:
- Suppose $x_k = 4$
- Prediction $\hat{x}_k^-$, e.g. 2
- Measurement (Laplace-perturbed value) $z_k$, e.g. 8
- Posterior estimate $\hat{x}_k$, e.g. 3
- The impact of perturbation noise is reduced by taking into account the process model and prediction!
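Assuming the standard Kalman correction (a linear combination of prediction and measurement with gain $K_k$), the example's numbers imply a gain of 1/6 and cut the absolute error from 4 to 1. The gain value is derived here from the three example numbers; it is not stated on the slide.

```latex
\hat{x}_k = \hat{x}_k^- + K_k\,(z_k - \hat{x}_k^-)
\;\Rightarrow\;
3 = 2 + K_k\,(8 - 2)
\;\Rightarrow\;
K_k = \tfrac{1}{6},
\qquad
|z_k - x_k| = |8 - 4| = 4,
\quad
|\hat{x}_k - x_k| = |3 - 4| = 1.
```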
Spatial Estimation
- Goal: group cells to overcome data sparsity.
- First partition the space until each partition contains only Sparse or only Dense cells
- Top-down algorithm based on QuadTree
- Data independence and efficiency
- For each timestamp k:
- $f'_k(D)$: partition counts, with $\Delta f'_k = 1$
- $A'_k(D) = f'_k(D) + \mathrm{Lap}(T/\alpha)^{d'}$
- Release $\hat{f}_k(D)$ estimated from $A'_k(D)$
- Each cell is visited O(1) times at each timestamp.
Figure: example grid with each cell labeled S (Sparse) or D (Dense)
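A minimal top-down QuadTree partitioning sketch, assuming a square 2^n × 2^n grid of 'S'/'D' labels. It reads only the public labels (no counts), so it does not touch the private data; the function name, the example grid, and the (top_row, left_col, side_length) output format are hypothetical. Because the partitions are disjoint, one individual affects only one partition count, consistent with $\Delta f'_k = 1$.

```python
def quadtree_partition(labels, row=0, col=0, size=None):
    """Split a square 2^n x 2^n grid of 'S'/'D' labels top-down until every
    partition holds only Sparse or only Dense cells.
    Returns disjoint squares as (top_row, left_col, side_length)."""
    if size is None:
        size = len(labels)
    first = labels[row][col]
    uniform = all(
        labels[r][c] == first
        for r in range(row, row + size)
        for c in range(col, col + size)
    )
    if uniform or size == 1:
        return [(row, col, size)]
    half = size // 2
    parts = []
    for dr in (0, half):  # visit the four quadrants
        for dc in (0, half):
            parts.extend(quadtree_partition(labels, row + dr, col + dc, half))
    return parts


labels = [
    list("SSSS"),
    list("SSSS"),
    list("SSSS"),
    list("SSDD"),
]
print(quadtree_partition(labels))
# [(0, 0, 2), (0, 2, 2), (2, 0, 2), (2, 2, 1), (2, 3, 1), (3, 2, 1), (3, 3, 1)]
```

Since the labels do not change over time in this setting, the partitioning can be computed once and reused at every timestamp.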
Spatial Estimation Example
- At time k
Figure panels: original cell histogram $f_k(D)$; partition histogram $f'_k(D)$; Laplace-perturbed $A'_k(D)$; estimated cell histogram $\hat{f}_k(D)$
- Perturbation noise is evenly distributed to every cell within the partition.
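A minimal sketch of one timestamp of spatial estimation, assuming partitions are given as (top_row, left_col, side_length) squares: sum cell counts into partition counts $f'_k(D)$, add Lap(T/α) to each, then spread each noisy total evenly over the partition's cells. All function names are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """One Laplace(0, scale) sample (difference of two exponentials with mean `scale`)."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def partition_counts(counts, partitions):
    """f'_k(D): total count inside each square partition."""
    return [
        sum(counts[r][c] for r in range(top, top + s) for c in range(left, left + s))
        for top, left, s in partitions
    ]


def spread_evenly(noisy_totals, partitions, grid_size):
    """Estimated cell histogram: each cell gets its partition's noisy total / #cells."""
    est = [[0.0] * grid_size for _ in range(grid_size)]
    for (top, left, s), total in zip(partitions, noisy_totals):
        for r in range(top, top + s):
            for c in range(left, left + s):
                est[r][c] = total / (s * s)
    return est


def release_timestamp(counts, partitions, T, alpha, rng):
    """One timestamp: perturb partition counts with Lap(T / alpha), then estimate cells."""
    noisy = [x + laplace_noise(T / alpha, rng) for x in partition_counts(counts, partitions)]
    return spread_evenly(noisy, partitions, len(counts))
```

Because one noisy total is shared by s² cells, the per-cell noise scale drops to T/(α s²), at the cost of assuming counts are roughly uniform within a partition.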
Evaluation: Data
- Generated moving objects on a road network
- City of Oldenburg, Germany
- 500K objects at the beginning
- 25K new objects at every timestamp
- Total time: 100 timestamps
- Two-dimensional 1024 by 1024 grid over the city map
- Each cell represents 400 m²
- Record object locations at cell resolution
- 95% cells are labeled Sparse!
Generator: http://iapg.jade-hs.de/personen/brinkhoff/generator/
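A sketch of the discretization step, assuming square cells (so 400 m² means a 20 m side, and 1024 cells span 20.48 km) and positions given in meters from one corner of the grid. `to_cell` and `histogram_at` are hypothetical names; a sparse `Counter` keeps only non-empty cells, which suits very sparse location data.

```python
import math
from collections import Counter

GRID_SIZE = 1024                       # cells per side, from the slide
CELL_AREA_M2 = 400.0                   # from the slide
CELL_SIDE_M = math.sqrt(CELL_AREA_M2)  # 20 m, assuming square cells


def to_cell(x_m, y_m):
    """Map a position (meters from the grid's origin corner) to (row, col)."""
    col = int(x_m // CELL_SIDE_M)
    row = int(y_m // CELL_SIDE_M)
    if not (0 <= row < GRID_SIZE and 0 <= col < GRID_SIZE):
        raise ValueError("position outside the grid")
    return row, col


def histogram_at(positions):
    """Sparse 2D histogram f_k(D) at one timestamp: counts for non-empty cells only."""
    return Counter(to_cell(x, y) for x, y in positions)
```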
Temporal Estimation
Figure: cell count vs. time for one cell, Laplace-perturbed series vs. Kalman estimate
Spatial Partitions
Figure panels: Oldenburg road network; partitions by QuadTree
Evaluation: Utility vs. Privacy
- Utility of each cell: Average Relative Error of released series
- For each α value, the median utility within each class (Sparse/Dense) is plotted
- DFT: Rastogi and Nath, SIGMOD’10
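A sketch of the utility metric, assuming per-timestamp relative error averaged over the series and then a median per class. The `sanity_bound` parameter is a hypothetical guard against dividing by zero for empty cells; the slides do not say how zero counts are handled.

```python
from statistics import median


def average_relative_error(true_series, released_series, sanity_bound=1.0):
    """Mean over timestamps of |released_k - true_k| / max(true_k, sanity_bound).

    The sanity bound (hypothetical default 1.0) keeps empty cells from
    dividing by zero.
    """
    if not true_series or len(true_series) != len(released_series):
        raise ValueError("series must be non-empty and of equal length")
    total = sum(abs(r - x) / max(x, sanity_bound) for x, r in zip(true_series, released_series))
    return total / len(true_series)


def median_utility_by_class(cell_errors, cell_labels):
    """Median per-cell ARE within each class label (e.g. 'Sparse' / 'Dense')."""
    by_class = {}
    for err, label in zip(cell_errors, cell_labels):
        by_class.setdefault(label, []).append(err)
    return {label: median(errs) for label, errs in by_class.items()}
```

Using the median rather than the mean keeps a few cells with extreme relative errors from dominating each class's summary.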
Evaluation: Range Queries
- How many objects are in an area of m by m cells at every timestamp?
- For each m, 100 areas are randomly selected and evaluated.
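A sketch of the range-query evaluation, assuming areas are placed uniformly at random so that the whole m × m block lies inside the grid; the function names are hypothetical. The same query can be answered on the true and on the released histograms and the two answer series compared, for example with relative error.

```python
import random


def range_count(hist, top, left, m):
    """Objects in the m x m block of cells whose top-left cell is (top, left)."""
    return sum(hist[r][c] for r in range(top, top + m) for c in range(left, left + m))


def random_areas(grid_size, m, n_areas, rng):
    """Top-left corners of n_areas uniformly random m x m blocks inside the grid."""
    return [
        (rng.randrange(grid_size - m + 1), rng.randrange(grid_size - m + 1))
        for _ in range(n_areas)
    ]


def range_query_series(histograms, top, left, m):
    """Answer the same range query at every timestamp."""
    return [range_count(h, top, left, m) for h in histograms]
```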
Evaluation: Runtime
- Overall runtime is plotted in milliseconds.
Conclusion
- Differentially private release is difficult when the time series is long and the data is sparse!
- Domain knowledge can be used for temporal modeling as well as spatial partitioning.
- Output utility is improved with same privacy guarantee.
- We observe no extra time cost from our solutions.
- Ongoing work:
- Utilize rich information in spatio-temporal data.
- Model learning and parameter learning.
- Contact: liyue.fan@emory.edu
- AIMS Group: www.mathcs.emory.edu/aims
Q&A