
Fast and Private Submodular and k-Submodular Functions Maximization - PowerPoint PPT Presentation



  1. ICML | 2020, Thirty-seventh International Conference on Machine Learning. Fast and Private Submodular and k-Submodular Functions Maximization with Matroid Constraints. Akbar Rafiey, Yuichi Yoshida.

  2. Core message • What is the problem? • What do we want to achieve? • What do we achieve in this paper?

  3. What is the problem? How can we answer queries while preserving the privacy of the data? An analyst wants to do statistical analysis of sensitive data. Examples of sensitive data: • medical data • web search data • social networks • salary data • etc.

  4. What do we want to achieve? We need an algorithm such that: • it returns an almost-correct answer to a query, • it is efficient and fast, • it preserves privacy when we have sensitive data.

  5. What do we achieve in this paper? (Part 1) • We consider a class of set function queries, namely submodular set functions. • We present an algorithm for submodular maximization and prove that: • it is computationally efficient, • it outputs solutions close to an optimal solution, • it preserves the privacy of the dataset.

  6. What do we achieve in this paper? (Part 2) • Further, we consider a generalization of submodular functions, namely k-submodular functions. • This allows us to capture more problems. • We present an algorithm for k-submodular maximization and prove that: • it is computationally efficient, • it outputs solutions close to an optimal solution, • it preserves the privacy of the dataset.

  7. Differential privacy: a rigorous notion of privacy. An analyst (e.g., a health insurance company) asks: "How many people have diabetes?"
      • Dataset → analysis/computation → 100
      • Dataset without individual X's data → analysis/computation → 99

  8. Differential privacy: a rigorous notion of privacy. Add NOISE to the computation, and the analyst sees:
      • Dataset → analysis/computation → 100 ± ε
      • Dataset without X's data → analysis/computation → 100 ± ε

  9. Differential privacy: a rigorous notion of privacy.
      • Dataset → analysis/computation (+ NOISE) → output
      • Dataset without X's data → analysis/computation (+ NOISE) → output
      • "Difference" between the two outputs is at most ε. Intuitively, any one individual's data should NOT significantly change the outcome.

  10. Differential privacy (definition) • For ε, δ ∈ ℝ₊, we say that a randomized computation M is (ε, δ)-differentially private if 1. for any neighboring datasets D ∼ D′, and 2. for any set of outcomes S ⊆ range(M),
      Pr[M(D) ∈ S] ≤ e^ε · Pr[M(D′) ∈ S] + δ.
      • Neighboring datasets: two datasets that differ in at most one record.
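
As a hedged illustration (not taken from the paper), the sketch below shows the Laplace mechanism, a standard way to make a counting query such as "how many people have diabetes?" (ε, 0)-differentially private. The dataset encoding and function names are assumptions made for this example.

```python
import numpy as np

def laplace_mechanism(true_count: float, epsilon: float, sensitivity: float = 1.0) -> float:
    """Return a noisy answer satisfying (epsilon, 0)-differential privacy.

    A counting query changes by at most 1 when one record is added or
    removed, so its sensitivity is 1; adding Laplace(sensitivity/epsilon)
    noise is the classic way to privatize it.
    """
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_count + noise

# Hypothetical dataset: one record per person, 1 = has diabetes.
records = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1]
true_answer = sum(records)                       # exact answer the analyst wants
private_answer = laplace_mechanism(true_answer, epsilon=0.5)
print(true_answer, round(private_answer, 2))     # e.g. 7 and 7 plus/minus noise
```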

  11. Set function queries
      • Set function f_D : 2^[m] → ℝ over m features. Given dataset D, f_D(S) measures the "value" of a feature set S in dataset D.
      • Examples: f_D({gender, diabetes}) = 5, f_D({asthma}) = 7.
      • Query: what are the k most informative features? Can we answer while preserving each individual's privacy?
      Dataset D:
      Id   gender   diabetes   …   asthma   Class
      1    F        0          …   1        C1
      2    M        1          …   1        C1
      3    F        0          …   1        C1
      4    M        1          …   0        C1
      5    F        0          …   0        C1
      6    NA       1          …   0        C1
      7    F        0          …   1        C2
      8    M        1          …   1        C2
      9    NA       0          …   1        C2
      10   M        1          …   1        C2
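
To make the set-function query concrete, here is a minimal, hypothetical scoring function over such a dataset. The coverage-style score, the 0/1 encoding, and the column names are assumptions for illustration, not the measure used on the slide.

```python
# Hypothetical dataset D: each record maps feature name -> 0/1 value.
dataset = [
    {"gender": 1, "diabetes": 0, "asthma": 1},
    {"gender": 0, "diabetes": 1, "asthma": 1},
    {"gender": 1, "diabetes": 0, "asthma": 0},
]

def f_D(features: set) -> int:
    """Coverage-style set function: the number of records in which at least
    one of the chosen features is 1.  Any 'value of a feature set' measure
    could be plugged in here."""
    return sum(1 for record in dataset if any(record[name] for name in features))

print(f_D({"gender", "diabetes"}))  # value of the feature set {gender, diabetes}
print(f_D({"asthma"}))              # value of the feature set {asthma}
```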

  12. Submodular function • In words: the marginal contribution of any element e to the value of the function f(S) diminishes as the input set S grows. • Mathematically, a function f : 2^E → ℝ is submodular if • for all A ⊆ B ⊆ E, • and all elements e ∈ E ∖ B, we have
      f(A ∪ {e}) − f(A) ≥ f(B ∪ {e}) − f(B)
      (the diminishing-gains property).
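
A quick numeric check of the diminishing-gains property on a coverage function, a standard submodular example; the sets used here are made up.

```python
# Coverage function: f(S) = size of the union of the areas covered by S.
coverage = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6}}

def f(S):
    return len(set().union(*(coverage[x] for x in S))) if S else 0

A = {"a"}          # A is a subset of B
B = {"a", "b"}
e = "c"            # element outside B

gain_A = f(A | {e}) - f(A)   # marginal gain of e added to the smaller set
gain_B = f(B | {e}) - f(B)   # marginal gain of e added to the larger set
assert gain_A >= gain_B      # diminishing gains: 3 >= 2 here
print(gain_A, gain_B)
```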

  13. Problem: design a framework for differentially private submodular maximization under a matroid constraint.
      • A pair M = (E, I) of a ground set E and a family I ⊆ 2^E is called a matroid if • ∅ ∈ I, • A ∈ I for any A ⊆ B ∈ I, • for any A, B ∈ I with |A| < |B|, there exists e ∈ B ∖ A such that A ∪ {e} ∈ I.
      • Our objective: argmax_{S ∈ I} f(S).
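
For intuition, here is a minimal sketch of the classic non-private greedy baseline for maximizing a monotone submodular f under a matroid constraint (a 1/2-approximation). The paper's private algorithm is different (it works with the multilinear extension), so treat this only as the non-private reference point; the uniform-matroid oracle and example data are assumptions.

```python
def greedy_matroid(ground_set, f, is_independent):
    """Greedy 1/2-approximation for monotone submodular maximization
    subject to a matroid constraint given by an independence oracle."""
    S = set()
    candidates = set(ground_set)
    while candidates:
        # Pick the feasible element with the largest positive marginal gain.
        best, best_gain = None, 0.0
        for e in candidates:
            if is_independent(S | {e}):
                gain = f(S | {e}) - f(S)
                if gain > best_gain:
                    best, best_gain = e, gain
        if best is None:
            break
        S.add(best)
        candidates.remove(best)
    return S

# Example: uniform matroid of rank 2 (|S| <= 2) with a coverage objective.
coverage = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6}}
f = lambda S: len(set().union(*(coverage[x] for x in S))) if S else 0
print(greedy_matroid(coverage.keys(), f, lambda S: len(S) <= 2))  # e.g. {'a', 'c'}
```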

  14. Examples of submodularity • Feature selection • Influence maximization • Facility location • Maximum coverage • Data summarization • Image summarization • Document summarization • …
      (Slide image: document summarization; photo by unknown author, licensed under CC BY-NC.)

  15. A toy example
      • Ground set E: m resources r_1, r_2, …, r_m.
      • n agents; each agent i has a private submodular function F_i : 2^E → ℝ.
      • Objective: find S ⊆ E in the matroid that maximizes Σ_{i=1}^{n} F_i(S).
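
A hedged sketch of this toy objective: the total value is a sum of per-agent submodular functions, and the Δ that appears in the bounds on the next slide is, roughly, how much the total can change when one agent's private data is swapped out. The per-agent function and all numbers below are made up.

```python
# Each agent's private valuation of resources (made-up data).
agent_values = [
    {"r1": 3, "r2": 1},   # agent 1
    {"r1": 0, "r2": 2},   # agent 2
]

def agent_f(values, S):
    # A simple budget-additive (hence monotone submodular) per-agent function.
    return min(sum(values.get(r, 0) for r in S), 4)

def total_f(S):
    # Objective: sum of the agents' private submodular functions.
    return sum(agent_f(v, S) for v in agent_values)

print(total_f({"r1", "r2"}))   # 4 + 2 = 6
# Removing agent 1 changes the total by at most max_S agent_f(agent_values[0], S) <= 4,
# which is the kind of per-agent sensitivity a privacy analysis has to track.
```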

  16. Our contributions
                 non-private      previous result (Mitrovic et al.)        our result
      utility    (1 − 1/e)·OPT    (1/2)·OPT − O(Δ·r(M)·ln|E| / ε)          (1 − 1/e)·OPT − O(Δ·r(M)·ln|E| / ε)
      privacy    --               ε·r(M)                                   ε·r(M)²
      • (1 − 1/e)·OPT is the best possible approximation ratio unless P = NP.
      • Our algorithm uses an almost cubic number of function evaluations, O(r(M)·|E|²·ln(·)).
      • Our privacy factor is worse than in previous work since we deal with the multilinear extension.
      • Please see our paper for details and proofs.
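
Since the slide notes that the algorithm works with the multilinear extension, here is a minimal Monte-Carlo sketch of how F(x) = E[f(R(x))] is typically estimated, where R(x) includes each element independently with probability x_e. The sample count and the objective used are illustrative assumptions, not the paper's estimator.

```python
import random

def multilinear_extension(f, ground_set, x, num_samples=1000):
    """Monte-Carlo estimate of F(x) = E[f(R(x))], where R(x) contains each
    element e independently with probability x[e]."""
    total = 0.0
    for _ in range(num_samples):
        R = {e for e in ground_set if random.random() < x[e]}
        total += f(R)
    return total / num_samples

coverage = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6}}
f = lambda S: len(set().union(*(coverage[e] for e in S))) if S else 0
x = {"a": 0.5, "b": 0.5, "c": 0.5}
print(round(multilinear_extension(f, coverage.keys(), x), 2))
```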

  17. Generalization of submodularity: k-submodular functions
      • A function f : (k+1)^E → ℝ₊, defined on k-tuples of pairwise disjoint subsets of E, is called k-submodular if for all k-tuples S = (S_1, …, S_k) and T = (T_1, …, T_k) of pairwise disjoint subsets of E,
      f(S) + f(T) ≥ f(S ⊓ T) + f(S ⊔ T),
      where we define
      S ⊓ T = (S_1 ∩ T_1, …, S_k ∩ T_k),
      S ⊔ T = ((S_1 ∪ T_1) ∖ ⋃_{j≠1}(S_j ∪ T_j), …, (S_k ∪ T_k) ∖ ⋃_{j≠k}(S_j ∪ T_j)).
      • A simpler characterization: a monotone function is k-submodular if each orthant (fix the coordinate of each element to {0, i} for some i ∈ {1, 2, …, k}) is submodular.
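
To make the definition concrete, a small sketch of the ⊓ and ⊔ operations above, under the assumed encoding that a k-tuple is a list of k pairwise disjoint sets.

```python
def meet(S, T):
    # S ⊓ T : coordinate-wise intersection.
    return [Si & Ti for Si, Ti in zip(S, T)]

def join(S, T):
    # S ⊔ T : coordinate-wise union, minus elements claimed by another coordinate.
    k = len(S)
    out = []
    for i in range(k):
        others = set().union(*[S[j] | T[j] for j in range(k) if j != i])
        out.append((S[i] | T[i]) - others)
    return out

# k = 2 example; within each tuple the coordinates are pairwise disjoint.
S = [{1, 2}, {3}]
T = [{2}, {1, 4}]
print(meet(S, T))   # [{2}, set()]
print(join(S, T))   # element 1 appears in both coordinates, so it is dropped: [{2}, {3, 4}]
```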

  18. Examples of k-submodularity • Coupled feature selection • Sensor placement with k kinds of measures • Influence maximization with k topics • Variant of facility location • …
      (Pictures from: "Near-optimal Sensor Placements: Maximizing Information while Minimizing Communication Cost", A. Krause, A. Gupta, C. Guestrin, J. Kleinberg; and "On Bisubmodular Maximization", A. P. Singh, A. Guillory, J. Bilmes; photo by unknown author, licensed under CC BY-NC.)

  19. A toy example
      • G_1, G_2, …, G_k: influence graphs of ad agencies 1, 2, …, k, each defined over the same ad slots w_1, …, w_m and users v_1, …, v_n.
      • Edges incident to a user v_i in G_1, …, G_k are sensitive data about v_i.
      • Objective: allocate at most B ≤ n ad slots to the ad agencies so as to maximize the number of influenced users.
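
A hedged sketch of a simple greedy allocation for this kind of toy objective: repeatedly give one more ad slot to the agency with the largest marginal gain. This mirrors the standard greedy for monotone k-submodular maximization, not necessarily the algorithm in the paper; the function names and influence data are made up.

```python
def greedy_k_submodular(elements, k, f, budget):
    """Greedy: repeatedly assign one more element the coordinate (agency)
    with the best marginal gain, up to `budget` assigned elements."""
    assignment = {}                       # element -> coordinate in 1..k
    for _ in range(budget):
        best, best_gain = None, 0.0
        for e in elements:
            if e in assignment:
                continue
            for i in range(1, k + 1):
                gain = f({**assignment, e: i}) - f(assignment)
                if gain > best_gain:
                    best, best_gain = (e, i), gain
        if best is None:
            break
        assignment[best[0]] = best[1]
    return assignment

# Made-up example: two agencies (k = 2), two ad slots, users reached per slot.
influence = {
    1: {"w1": {"v1", "v2"}, "w2": {"v3"}},   # agency 1's influence graph
    2: {"w1": {"v3"},       "w2": {"v1"}},   # agency 2's influence graph
}

def f(assignment):
    # Number of distinct users influenced under the slot -> agency assignment.
    reached = set()
    for slot, agency in assignment.items():
        reached |= influence[agency][slot]
    return len(reached)

print(greedy_k_submodular(["w1", "w2"], k=2, f=f, budget=2))
# e.g. {'w1': 1, 'w2': 1}, reaching users {'v1', 'v2', 'v3'}
```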

  20. Our contributions
                 non-private    previous result    our result
      utility    (1/2)·OPT      —                  (1/2)·OPT − O(Δ·r(M)·ln|E| / ε)
      privacy    --             —                  ε·r(M)
      • Our algorithm is the first differentially private k-submodular maximization algorithm.
      • (1/2)·OPT is asymptotically tight assuming P ≠ NP.
      • Our algorithm uses an almost linear number of function evaluations, i.e., O(k·|E|·ln(r(M))).

  21. Thanks!

  22. Definition of submodular function: a function f : 2^E → ℝ is submodular if • for all A ⊆ B ⊆ E, • and all elements e ∈ E ∖ B, we have f(A ∪ {e}) − f(A) ≥ f(B ∪ {e}) − f(B).
      Applications • Viral marketing • Information gathering • Feature selection for classification • Influence maximization in social networks • Document summarization …
      What is our objective? We need an optimization method such that • it returns an almost optimal solution, • it is efficient and fast, • it preserves individuals' privacy when we have sensitive data: medical data, web search data, social networks.
      Differential privacy: a rigorous notion of privacy that allows statistical analysis of sensitive data while providing strong privacy guarantees.
      Result 1: we present a differentially private algorithm for submodular maximization and • prove that our algorithm returns a solution with quality at least (1 − 1/e)·OPT minus a small additive error, • prove that our algorithm preserves privacy, • improve the number of function evaluations via a sampling technique while still preserving privacy.
      Result 2 (generalization of submodularity): we present the first differentially private algorithm for k-submodular maximization and • prove that our algorithm returns a solution with quality at least (1/2)·OPT minus a small additive error, • prove that our algorithm preserves privacy, • reduce the number of function evaluations to almost linear via a sampling technique while preserving privacy.
