Fast and Private Submodular and k-Submodular Functions Maximization with Matroid Constraints
ICML | 2020
Thirty-seventh International Conference on Machine Learning
Yuichi Yoshida Akbar Rafiey
Sensitive data Examples:
Analyst: wants to do statistical analysis of the data.
How can we answer queries while preserving the privacy of the data?
We need an algorithm such that:
A rigorous notion of privacy
[Figure: an analyst (e.g., a health insurance company) asks "How many people have diabetes?" The same analysis/computation is run on the dataset and on the dataset without individual X's data; the two exact answers differ (99 vs. 100).]
[Figure: the same query with NOISE added to the output; the computation on the dataset and the computation on the dataset without X's data both answer $100 \pm \epsilon$.]
[Figure: after adding NOISE, the "difference" between the output on the dataset and the output on the dataset without X's data is at most $\epsilon$.]
Intuitively, any one individual's data should NOT significantly change the outcome.
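The "add NOISE" step can be illustrated with the Laplace mechanism, a standard way to answer a counting query privately. This is a minimal sketch and is not taken from the slides: the function name `noisy_count`, the toy records, and the use of `epsilon` as the privacy parameter (with sensitivity 1 for a count) are assumptions made for illustration.

```python
import random

def noisy_count(records, epsilon, rng):
    """Return the number of True records plus Laplace(1/epsilon) noise.

    Adding or removing one record changes the count by at most 1
    (sensitivity 1), so the noise scale is 1/epsilon.
    """
    true_count = sum(1 for has_condition in records if has_condition)
    # A Laplace sample is the difference of two i.i.d. exponential samples
    # with rate epsilon (mean 1/epsilon).
    noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
    return true_count + noise

rng = random.Random(0)
with_x = [True] * 100        # hypothetical dataset: 100 positive records
without_x = with_x[:-1]      # the same dataset without individual X
print(noisy_count(with_x, 0.5, rng))
print(noisy_count(without_x, 0.5, rng))
```

A smaller `epsilon` means more noise, so the two printed answers become harder to tell apart.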
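A randomized mechanism $M$ is $(\epsilon, \delta)$-differentially private if, for all neighboring datasets $D$, $D'$ and every set of outputs $S$,
$\Pr[M(D) \in S] \le e^{\epsilon} \Pr[M(D') \in S] + \delta$
Neighboring datasets: two datasets that differ in at most one record.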
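To make the inequality concrete, the sketch below checks it exhaustively for randomized response on a single private bit. Randomized response is a swapped-in textbook mechanism, not one used in the slides, and the names `randomized_response_probs` and `satisfies_dp` are hypothetical.

```python
import math

def randomized_response_probs(true_bit, mechanism_eps):
    """Output distribution of randomized response on one private bit.

    The true bit is reported with probability e^eps / (1 + e^eps) and
    flipped otherwise. Returns {output: probability}.
    """
    p_truth = math.exp(mechanism_eps) / (1.0 + math.exp(mechanism_eps))
    return {true_bit: p_truth, 1 - true_bit: 1.0 - p_truth}

def satisfies_dp(mechanism_eps, claimed_eps, delta=0.0):
    """Check Pr[M(D) in S] <= e^eps * Pr[M(D') in S] + delta for every S."""
    outcomes = [set(), {0}, {1}, {0, 1}]
    for d, d_prime in [(0, 1), (1, 0)]:  # the two neighboring inputs
        p = randomized_response_probs(d, mechanism_eps)
        q = randomized_response_probs(d_prime, mechanism_eps)
        for s in outcomes:
            lhs = sum(p[o] for o in s)
            rhs = math.exp(claimed_eps) * sum(q[o] for o in s) + delta
            if lhs > rhs + 1e-12:  # small tolerance for floating point
                return False
    return True

print(satisfies_dp(1.0, 1.0))
print(satisfies_dp(2.0, 1.0))
```

The first call checks the mechanism against its own parameter; the second asks whether a less noisy mechanism still meets the stricter claim.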
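Dataset $D$ with $m$ features:

| Id | gender | diabetes | … | asthma | Class |
|----|--------|----------|---|--------|-------|
| 1  | F      |          | … | 1      | C1    |
| 2  | M      | 1        | … | 1      | C1    |
| 3  | F      |          | … | 1      | C1    |
| 4  | M      | 1        | … |        | C1    |
| 5  | F      |          | … |        | C1    |
| 6  | NA     | 1        | … |        | C1    |
| 7  | F      |          | … | 1      | C2    |
| 8  | M      | 1        | … | 1      | C2    |
| 9  | NA     |          | … | 1      | C2    |
| 10 | M      | 1        | … | 1      | C2    |

Set function $f_D : 2^E \to \mathbb{R}$: $f_D(S)$ measures the "value" of the feature set $S$ in dataset $D$, e.g. $f_D(\{\text{gender}, \text{diabetes}\}) = 5$ and $f_D(\{\text{asthma}\}) = 7$.
Query: what are the $k$ most informative features?
How can we answer while preserving each individual's privacy?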
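Submodular functions: the marginal gain of adding an element diminishes as the input set $S$ increases (diminishing gain property):
$f(A \cup \{e\}) - f(A) \ge f(B \cup \{e\}) - f(B)$ for all $A \subseteq B \subseteq E$ and $e \in E \setminus B$.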
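The diminishing gain property can be verified by brute force on a small ground set. The sketch below does this for a coverage function, a standard example of a submodular function; the instance `covers` and the helper names are hypothetical and chosen only for illustration.

```python
from itertools import combinations

def powerset(items):
    """Yield every subset of items as a frozenset."""
    items = list(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)

def is_submodular(f, ground_set):
    """Check f(A+e) - f(A) >= f(B+e) - f(B) for all A <= B and e not in B."""
    for b in powerset(ground_set):
        for a in powerset(b):
            for e in set(ground_set) - b:
                if f(a | {e}) - f(a) < f(b | {e}) - f(b):
                    return False
    return True

# Hypothetical coverage instance: each element covers some items.
covers = {"x": {1, 2}, "y": {2, 3}, "z": {3, 4, 5}}

def coverage(s):
    """Number of items covered by the elements in s."""
    return len(set().union(*(covers[e] for e in s)))

print(is_submodular(coverage, covers))
print(is_submodular(lambda s: len(s) ** 2, covers))
```

The second call uses $|S|^2$, whose marginal gains grow with the set, as a contrasting non-submodular function.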
$\arg\max_{S \in \mathcal{I}} f(S)$, i.e., maximize $f$ over the independent sets $\mathcal{I}$ of the matroid.
[Figure: documents condensed into a summary.]
Set $E$: $m$ resources $r_1, r_2, \dots, r_m$.
$n$ agents: agent 1, agent 2, …, agent $n$.
Each agent $i$ has a private submodular function $F_i : 2^E \to \mathbb{R}$ ($F_1, F_2, \dots, F_n$).
Objective: find $S \subseteq E$ in the matroid that maximizes $\sum_{i=1}^{n} F_i(S)$.
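A small instance of this objective can be written down directly. The sketch below assumes, purely for illustration, that each agent's private function is a capped count $F_i(S) = \min(|S \cap W_i|, c_i)$, which is monotone and submodular; the agents, resource names, and caps are hypothetical.

```python
def agent_value(selected, wanted, cap):
    """Hypothetical F_i(S) = min(|S & wanted|, cap): monotone and submodular."""
    return min(len(selected & wanted), cap)

# Hypothetical private data: one (wanted resources, cap) pair per agent.
agents = [
    ({"r1", "r2"}, 1),
    ({"r2", "r3"}, 2),
    ({"r3"}, 1),
]

def total_value(selected, agents):
    """Objective: the sum of F_i(S) over all agents."""
    return sum(agent_value(selected, wanted, cap) for wanted, cap in agents)

s = {"r2", "r3"}
print(total_value(s, agents))
# Removing one agent changes the objective by at most that agent's largest value.
print(total_value(s, agents) - total_value(s, agents[:-1]))
```

If neighboring datasets are taken to differ in one agent (an assumption here), the last line shows how much a single agent can move the objective.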
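Utility:
non-private: $\left(1 - \frac{1}{e}\right)\mathrm{OPT}$
previous result (Mitrovic et al.): $\frac{1}{2}\,\mathrm{OPT} - O\!\left(\frac{\Delta \cdot r(M) \cdot \ln|E|}{\epsilon}\right)$
this work: $\left(1 - \frac{1}{e}\right)\mathrm{OPT} - O\!\left(\epsilon + \frac{\Delta \cdot r(M) \cdot \ln|E|}{\epsilon^{3}}\right)$
Privacy: $\epsilon \cdot r(M)^{2}$
$\left(1 - \frac{1}{e}\right)\mathrm{OPT}$ is the best possible approximation ratio unless P = NP.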
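A common building block for guarantees of the form $\mathrm{OPT}$ minus an $O(\Delta \cdot r(M) \cdot \ln|E| / \epsilon)$ term is a greedy algorithm whose selection step uses the exponential mechanism. The sketch below is not the authors' algorithm: it assumes a uniform matroid (at most `rank` elements), and the names `exponential_mechanism`, `private_greedy`, and the toy coverage data are hypothetical.

```python
import math
import random

def exponential_mechanism(candidates, score, epsilon, sensitivity, rng):
    """Sample a candidate with probability proportional to
    exp(epsilon * score / (2 * sensitivity))."""
    scores = [score(c) for c in candidates]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(epsilon * (s - top) / (2.0 * sensitivity)) for s in scores]
    return rng.choices(candidates, weights=weights, k=1)[0]

def private_greedy(ground_set, f, rank, epsilon, sensitivity, rng):
    """Greedy under a uniform matroid (|S| <= rank); each pick is made privately."""
    selected = set()
    for _ in range(rank):
        candidates = sorted(set(ground_set) - selected)
        if not candidates:
            break
        gain = lambda e: f(selected | {e}) - f(selected)
        selected.add(exponential_mechanism(candidates, gain, epsilon, sensitivity, rng))
    return selected

# Hypothetical coverage data standing in for a sensitive dataset.
covers = {"a": {1, 2, 3}, "b": {3, 4}, "c": {5}, "d": {1}}
f = lambda s: len(set().union(*(covers[e] for e in s)))
print(private_greedy(covers, f, rank=2, epsilon=1.0, sensitivity=1.0,
                     rng=random.Random(0)))
```

Each round is one private selection, so by basic composition the total privacy cost of this sketch grows linearly with the number of rounds.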
k-submodular functions
A function $f : (k+1)^E \to \mathbb{R}_+$ defined on $k$-tuples of pairwise disjoint subsets of $E$ is called k-submodular if for all $k$-tuples $S = (S_1, \dots, S_k)$ and $T = (T_1, \dots, T_k)$ of pairwise disjoint subsets of $E$,
$f(S) + f(T) \ge f(S \sqcap T) + f(S \sqcup T)$
where we define
$S \sqcap T = (S_1 \cap T_1, \dots, S_k \cap T_k)$
$S \sqcup T = \Big( (S_1 \cup T_1) \setminus \bigcup_{i \ne 1} (S_i \cup T_i), \; \dots, \; (S_k \cup T_k) \setminus \bigcup_{i \ne k} (S_i \cup T_i) \Big)$
A simpler definition: a monotone function is k-submodular if it is submodular on each orthant, that is, after restricting the value of each element to $\{0, i\}$ for some $i \in \{1, 2, \dots, k\}$ (chosen separately for each element).
[Figure: (a) Example placement. Picture from: Near-optimal Sensor Placements: Maximizing Information while Minimizing Communication Cost.]
[Figure: picture from: On Bisubmodular Maximization.]
Objective: allocate at most $B \le m$ ad slots to the ad agencies so as to maximize the number of influenced users.
$G_1$: influence graph of ad agency 1. $G_2$: influence graph of ad agency 2. … $G_k$: influence graph of ad agency $k$.
[Figure: for each ad agency, a graph connecting ad slots $v_1, v_2, \dots, v_m$ to users $u_1, u_2, \dots, u_n$.]
Edges incident to a user $u_i$ in $G_1, \dots, G_k$ are sensitive data about $u_i$.
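Utility:
non-private previous result: $\frac{1}{2}\,\mathrm{OPT}$
differentially private: $\frac{1}{2}\,\mathrm{OPT} - O\!\left(\frac{\Delta \cdot r(M) \cdot \ln|E|}{\epsilon}\right)$
Privacy: $\epsilon \cdot r(M)$
$\frac{1}{2}\,\mathrm{OPT}$ is asymptotically tight assuming P ≠ NP.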
Definition of submodular function: a function $f : 2^E \to \mathbb{R}$ is submodular if $f(A \cup \{e\}) - f(A) \ge f(B \cup \{e\}) - f(B)$ for all $A \subseteq B \subseteq E$ and $e \in E \setminus B$.
Applications: data such as medical data, web search data, social networks.
What is our objective? We need an optimization method such that:
A rigorous notion of privacy that allows statistical analysis: differential privacy.
Result 1: We present a differentially private algorithm for submodular maximization:
utility at least $\left(1 - \frac{1}{e}\right)\mathrm{OPT}$ up to a small additive error;
uses a sampling technique while still preserving privacy.
Result 2: We present the first differentially private algorithm for k-submodular maximization (a generalization of submodularity):
utility at least $\frac{1}{2}\,\mathrm{OPT}$ up to a small additive error;
uses a sampling technique while preserving privacy.