
Streaming k-submodular Maximization under Noise subject to Size Constraint

Lan N. Nguyen, My T. Thai, University of Florida. Topic: k-submodular maximization subject to a size constraint, where a k-submodular function is a generalization of a submodular function.


  1. Streaming k-submodular Maximization under Noise subject to Size Constraint
     Lan N. Nguyen, My T. Thai, University of Florida

  2. k-submodular maximization s.t. size constraint
     ➢ A k-submodular function is a generalization of a submodular function
       ❑ Submodular set function: the input is a single subset of $V$: $f(X) + f(Y) \ge f(X \cup Y) + f(X \cap Y)$
       ❑ k-submodular function: the input is $k$ disjoint subsets of $V$: $f(\mathbf{x}) + f(\mathbf{y}) \ge f(\mathbf{x} \sqcup \mathbf{y}) + f(\mathbf{x} \sqcap \mathbf{y})$
         ▪ $\mathbf{x} = (X_1, \dots, X_k)$ and $\mathbf{y} = (Y_1, \dots, Y_k)$
         ▪ $\mathbf{x} \sqcup \mathbf{y} = (Z_1, \dots, Z_k)$ where $Z_i = (X_i \cup Y_i) \setminus \bigcup_{j \ne i} (X_j \cup Y_j)$
         ▪ $\mathbf{x} \sqcap \mathbf{y} = (X_1 \cap Y_1, \dots, X_k \cap Y_k)$
     ➢ k-submodular maximization s.t. size constraint (MkSC)
       ❑ $V$: a finite set of elements; $B$: a positive integer.
       ❑ $(k+1)^V$: the family of $k$ disjoint subsets of $V$.
       ❑ $f: (k+1)^V \to \mathbb{R}^+$: a k-submodular function.
       ❑ Find $\mathbf{s} = (S_1, \dots, S_k)$ s.t. $|\mathbf{s}| = |\bigcup_{i \le k} S_i| \le B$ that maximizes $f(\mathbf{s})$.
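The two operations on families of disjoint subsets can be sketched in a few lines of Python. This is an illustrative sketch only: the names `join` and `meet` and the tuple-of-sets representation are assumptions, not something the slides prescribe.

```python
from typing import Set, Tuple

KTuple = Tuple[Set[str], ...]  # k pairwise-disjoint subsets of the ground set


def join(x: KTuple, y: KTuple) -> KTuple:
    """x ⊔ y: part i is (X_i ∪ Y_i) minus every element claimed by another part."""
    k = len(x)
    parts = []
    for i in range(k):
        others: Set[str] = set()
        for j in range(k):
            if j != i:
                others |= x[j] | y[j]
        parts.append((x[i] | y[i]) - others)
    return tuple(parts)


def meet(x: KTuple, y: KTuple) -> KTuple:
    """x ⊓ y: part-wise intersection."""
    return tuple(xi & yi for xi, yi in zip(x, y))


x = ({"a"}, {"b"}, set())
y = ({"a", "b"}, {"c"}, {"d"})
print(join(x, y))  # element "b" is claimed by parts 1 and 2, so it is dropped
print(meet(x, y))
```

An element that the two inputs assign to different parts disappears from the join, which is why the join of two valid families is again a family of disjoint subsets.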

  3. k-submodular maximization s.t. size constraint
     ➢ Applications:
       ❑ Influence maximization with $k$ topics/products
       ❑ Sensor placement with $k$ kinds of sensors
       ❑ Coupled feature selection
     ➢ Existing solutions (*)
       ❑ Greedy: approximation ratio 2, $O(knB)$ query complexity
       ❑ "Lazy" Greedy: approximation ratio 2 with probability at least $1 - \delta$, $O(k(n - B) \log B \log \frac{B}{\delta})$ query complexity
     (*) Ohsaka, Naoto, and Yuichi Yoshida. "Monotone k-submodular function maximization with size constraints." Advances in Neural Information Processing Systems. 2015.
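To make the $O(knB)$ query count of the greedy baseline concrete, here is a minimal sketch. It assumes a solution is stored as a dict mapping each chosen element to its part index in 1..k, and the additive toy objective is purely hypothetical; the cited paper should be consulted for the actual algorithm and its analysis.

```python
def k_greedy(ground, k, budget, f):
    """Add, budget times, the (element, part) pair with the largest objective value.

    Each round queries f for every unused element and every part,
    so the total number of queries is at most n * k * budget.
    """
    s = {}
    for _ in range(min(budget, len(ground))):
        best = None  # (value, element, part)
        for e in ground:
            if e in s:
                continue
            for i in range(1, k + 1):
                value = f({**s, e: i})
                if best is None or value > best[0]:
                    best = (value, e, i)
        _, e, i = best
        s[e] = i
    return s


# Hypothetical additive objective: weights[e][i-1] is the value of e in part i.
weights = {"a": (3, 1), "b": (2, 5), "c": (1, 1)}
toy_f = lambda s: sum(weights[e][i - 1] for e, i in s.items())
print(k_greedy(list(weights), k=2, budget=2, f=toy_f))
```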

  4. Practical Challenges
     ➢ Noisy evaluation.
       ❑ In many applications (e.g. influence maximization), obtaining the exact value of $f(\mathbf{s})$ is impractical.
       ❑ $f$ can only be queried through a noisy version $F$: $(1 - \epsilon) f(\mathbf{s}) \le F(\mathbf{s}) \le (1 + \epsilon) f(\mathbf{s})$ for all $\mathbf{s} \in (k+1)^V$
     ➢ Streaming.
       ❑ Algorithms are required to take only a single pass over $V$
         ▪ Produce solutions in a timely manner.
         ▪ Avoid excessive storage in memory.
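A noisy oracle of this kind can be simulated for experiments as below. The slide only bounds the noise multiplicatively; drawing it uniformly at random is an assumption made here for illustration, and `make_noisy_oracle` is a hypothetical helper name.

```python
import random


def make_noisy_oracle(f, eps, seed=0):
    """Wrap an exact objective f into F with (1 - eps) f(s) <= F(s) <= (1 + eps) f(s)."""
    rng = random.Random(seed)

    def F(s):
        # Assumption: the multiplicative noise is uniform in [1 - eps, 1 + eps].
        return f(s) * rng.uniform(1.0 - eps, 1.0 + eps)

    return F


exact = lambda s: float(len(s))  # toy objective: number of selected elements
F = make_noisy_oracle(exact, eps=0.1)
print(F({"a": 1, "b": 2}))  # some value between 1.8 and 2.2
```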

  5. Our contribution
     ➢ Two streaming algorithms for MkSC: DStream & RStream
       ❑ Take only a single scan over $V$
       ❑ Access $F$ instead of $f$
       ❑ Performance guarantee:
         ▪ Approximation ratio $f(\mathbf{s})/f(\mathbf{o})$, where $\mathbf{o}$ is the optimal solution.
         ▪ Query and memory complexity
     ➢ Experimental evaluation
       ❑ Influence maximization with $k$ topics.
       ❑ Sensor placement with $k$ kinds of sensors.

  6. DStream
     ➢ Obtain $o$ such that $f(\mathbf{o}) \ge o \times B \ge f(\mathbf{o})/(1 + \gamma)$
       ❑ Using lazy estimation (*)
     ➢ For a new element $e$, if $|\mathbf{s}| < B$:
       ❑ Find $\max_{i \le k} F(\mathbf{s} \sqcup (e, i))$, where $\mathbf{s} \sqcup (e, i)$ denotes the disjoint subsets obtained by putting $e$ into $S_i$
     [Figure: a new element $e$ arriving at the subsets $S_1$, $S_2$, $S_3$]
     (*) Badanidiyuru, Ashwinkumar, et al. "Streaming submodular maximization: Massive data summarization on the fly." Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining. 2014.

  7. DStream
     ➢ Obtain $o$ such that $f(\mathbf{o}) \ge o \times B \ge f(\mathbf{o})/(1 + \gamma)$
     ➢ For a new element $e$, if $|\mathbf{s}| < B$:
       ❑ Put $e$ into $S_i$ if $\frac{F(\mathbf{s} \sqcup (e, i))}{1 - \epsilon} \ge (|\mathbf{s}| + 1) \frac{o}{M}$

  8. DStream
     ➢ Obtain $o$ such that $f(\mathbf{o}) \ge o \times B \ge f(\mathbf{o})/(1 + \gamma)$
     ➢ For a new element $e$, if $|\mathbf{s}| < B$:
       ❑ Put $e$ into $S_i$ if $\frac{F(\mathbf{s} \sqcup (e, i))}{1 - \epsilon} \ge (|\mathbf{s}| + 1) \frac{o}{M}$
       ❑ The left-hand side $\frac{F(\mathbf{s} \sqcup (e, i))}{1 - \epsilon}$ is the largest possible value of $f(\mathbf{s} \sqcup (e, i))$
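The acceptance rule of a single DStream instance (one fixed guess `o`) might look like the sketch below. It assumes solutions are dicts from element to part index, `F` is the noisy oracle, `eps` its noise level, `budget` the size bound B and `M` the threshold parameter; picking the part with the largest noisy value before testing the threshold is one reading of the slides, not a confirmed detail of the authors' implementation.

```python
def dstream_pass(stream, k, budget, F, o, M, eps):
    """One pass over the stream for a fixed guess o."""
    s = {}
    for e in stream:
        if len(s) >= budget:
            break
        # Query the noisy oracle once per part and keep the best part.
        values = {i: F({**s, e: i}) for i in range(1, k + 1)}
        best_i = max(values, key=values.get)
        # F / (1 - eps) is the largest value f(s ⊔ (e, i)) could have.
        if values[best_i] / (1 - eps) >= (len(s) + 1) * o / M:
            s[e] = best_i
    return s


# Hypothetical additive objective, queried without noise (eps = 0).
weights = {"a": (3, 1), "b": (2, 5), "c": (0.5, 0.2), "d": (4, 4)}
toy_f = lambda s: sum(weights[e][i - 1] for e, i in s.items())
print(dstream_pass("abcd", k=2, budget=3, F=toy_f, o=4, M=2, eps=0.0))
```

In this toy run the low-value element `c` is still accepted, because the test compares the total value of the solution, not the gain of `c`, with the threshold.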

  9. DStream's performance guarantee
     ➢ $\mathbf{x} = (X_1, \dots, X_k)$ can also be understood as a vector $\mathbf{x}: V \to \{0, 1, \dots, k\}$ with
       $\mathbf{x}(e) = i$ if $e \in X_i$, and $\mathbf{x}(e) = 0$ if $e \notin \bigcup_i X_i$
     Example:
       element        $e_1$  $e_2$  $e_3$  …  $e$  …
       $\mathbf{x}$   1      0      4      …  $i$  …

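  10. DStream's performance guarantee
     ➢ $\mathbf{s}_0, \mathbf{s}_1, \dots, \mathbf{s}_t$: sequence of obtained solutions
       ❑ $\mathbf{s}_i$: obtained solution after adding $i$ elements ($|\mathbf{s}_i| = i$)
     ➢ Construct a sequence $\mathbf{o}_0, \mathbf{o}_1, \dots, \mathbf{o}_t$ with $\mathbf{o}_i = (\mathbf{o} \sqcup \mathbf{s}_i) \sqcup \mathbf{s}_i$
     Example (vector form):
       $\mathbf{s}_i$   1 0 2 3 0 0 0 0
       $\mathbf{o}$     2 1 2 0 0 3 0 1
       $\mathbf{o}_i$   1 1 2 3 0 3 0 1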
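The example vectors on this slide can be reproduced with the vector form of the join. The sketch below assumes the encoding where 0 means "unassigned" and a nonzero entry is the part index; `join_vec` is a hypothetical helper name.

```python
def join_vec(x, y):
    """Vector form of the join: keep an entry if only one side sets it or both agree,
    and reset it to 0 when the two sides assign different parts."""
    out = []
    for a, b in zip(x, y):
        if a == 0:
            out.append(b)
        elif b == 0 or a == b:
            out.append(a)
        else:
            out.append(0)
    return out


s_i = [1, 0, 2, 3, 0, 0, 0, 0]
o = [2, 1, 2, 0, 0, 3, 0, 1]
o_i = join_vec(join_vec(o, s_i), s_i)
print(o_i)  # → [1, 1, 2, 3, 0, 3, 0, 1]
```

The first join erases the entries where the optimal solution and the current solution disagree; the second join fills them back in with the current solution's choice, so the result agrees with the current solution wherever that solution is defined.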

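  11. DStream's performance guarantee
     ➢ If in the end $|\mathbf{s}| = B$: $f(\mathbf{s}) \ge \frac{1 - \epsilon}{(1 + \epsilon)(1 + \gamma) M} f(\mathbf{o})$
     [Figure: the paired sequences $\mathbf{o}_j$ and $\mathbf{s}_j$ for $j = 0, \dots, 4$]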

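  12. DStream's performance guarantee
     ➢ If in the end $|\mathbf{s}| = t < B$, with $f$ monotone:
       ❑ Establish a recursive relationship between $\mathbf{o}_j$ and $\mathbf{s}_j$: $f(\mathbf{o}_{j-1}) + f(\mathbf{s}_{j-1}) \le f(\mathbf{o}_j) + \frac{1 + \epsilon}{1 - \epsilon} f(\mathbf{s}_j)$
       ❑ Bound $f(\mathbf{o}) - f(\mathbf{o}_t)$ (*): $f(\mathbf{o}) - f(\mathbf{o}_t) \le \frac{1 + \epsilon + 2B\epsilon}{1 - \epsilon} f(\mathbf{s})$
       ❑ Bound $f(\mathbf{o}_t) - f(\mathbf{s})$ (**): $f(\mathbf{o}_t) - f(\mathbf{s}) \le \frac{1}{M} f(\mathbf{o}) + \frac{2B\epsilon}{1 - \epsilon} f(\mathbf{s})$
       ❑ Discard $f(\mathbf{o}_t)$ by combining (*) and (**): $f(\mathbf{o}) \le \frac{M}{M - 1} \cdot \frac{2 + 4B\epsilon}{1 - \epsilon} f(\mathbf{s})$
     [Figure: the paired sequences $\mathbf{o}_j$ and $\mathbf{s}_j$ for $j = 0, \dots, 3$]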
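The combining step can be spelled out as follows. This is a reconstruction from the two bounds stated on the slide, not a quotation of the authors' proof; here $f$ is the objective, $\mathbf{s}$ the returned solution, $\mathbf{o}$ the optimum, $\mathbf{o}_t$ the last member of the constructed sequence, $\epsilon$ the noise level, $B$ the budget and $M$ the threshold parameter.

```latex
\begin{align*}
f(\mathbf{o}) - f(\mathbf{s})
  &= \bigl(f(\mathbf{o}) - f(\mathbf{o}_t)\bigr) + \bigl(f(\mathbf{o}_t) - f(\mathbf{s})\bigr) \\
  &\le \frac{1+\epsilon+2B\epsilon}{1-\epsilon}\, f(\mathbf{s})
     + \frac{1}{M}\, f(\mathbf{o})
     + \frac{2B\epsilon}{1-\epsilon}\, f(\mathbf{s}) \\
\Longrightarrow\quad \frac{M-1}{M}\, f(\mathbf{o})
  &\le \Bigl(1 + \frac{1+\epsilon+4B\epsilon}{1-\epsilon}\Bigr) f(\mathbf{s})
   = \frac{2+4B\epsilon}{1-\epsilon}\, f(\mathbf{s}) \\
\Longrightarrow\quad f(\mathbf{o})
  &\le \frac{M}{M-1}\cdot\frac{2+4B\epsilon}{1-\epsilon}\, f(\mathbf{s})
\end{align*}
```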

  13. DStream's performance guarantee
     ➢ If in the end $|\mathbf{s}| = t < B$, with $f$ non-monotone:
       ❑ $f$ is pairwise monotone: $\Delta_{e,i} f(\mathbf{x}) + \Delta_{e,j} f(\mathbf{x}) \ge 0$
       ❑ Using the same framework as the monotone case but with different "math": $f(\mathbf{o}) \le \frac{M}{M - 1} \cdot \frac{(1 + \epsilon)(3 + 3\epsilon + 6B\epsilon)}{(1 - \epsilon)^2} f(\mathbf{s})$
     [Figure: the paired sequences $\mathbf{o}_j$ and $\mathbf{s}_j$ for $j = 0, \dots, 3$]

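  14. DStream
     ➢ Lazy estimation to obtain $o$
       • $f(\mathbf{o}) \in [\Delta_l, B \times \Delta_u]$
       • $o$ can be obtained by a value of $(1 + \gamma)^j \in [\frac{\Delta_l}{B}, M(1 + \epsilon)\Delta_u]$
     ➢ Query complexity: $O\left(\frac{nk}{\gamma} \log\left(\frac{(1 + \epsilon)(1 + \gamma)}{1 - \epsilon} BM\right)\right)$
     ➢ Memory complexity: $O\left(\frac{B}{\gamma} \log\left(\frac{(1 + \epsilon)(1 + \gamma)}{1 - \epsilon} BM\right)\right)$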
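The candidate guesses for `o` form a geometric grid, which a short sketch can enumerate. The function below simply lists the powers of `1 + gamma` inside the interval given on the slide; `delta_l` and `delta_u` stand for the slide's lower and upper quantities, whose exact definitions are not given in the deck, so they are plain inputs here, and the lazy (on-the-fly) maintenance of the grid is not modelled.

```python
import math


def threshold_guesses(delta_l, delta_u, budget, M, eps, gamma):
    """All values (1 + gamma)^j lying in [delta_l / budget, M * (1 + eps) * delta_u]."""
    lo = delta_l / budget
    hi = M * (1 + eps) * delta_u
    j_min = math.ceil(math.log(lo) / math.log(1 + gamma))
    j_max = math.floor(math.log(hi) / math.log(1 + gamma))
    return [(1 + gamma) ** j for j in range(j_min, j_max + 1)]


guesses = threshold_guesses(delta_l=1.0, delta_u=3.0, budget=10, M=2, eps=0.1, gamma=0.5)
print(len(guesses), guesses[0], guesses[-1])
```

The number of guesses grows like $\frac{1}{\gamma}\log(\text{hi}/\text{lo})$, which is where the logarithmic factor in the query and memory bounds comes from: one DStream instance is kept per guess.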

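  15. DStream
     Approximation ratio: $\frac{1 + \epsilon}{1 - \epsilon} \min_{x \in (1, M]} \max(a(x), b(x))$
     ➢ If $f$ is monotone:
       • $a(x) = \frac{(1 + \gamma)(1 + \epsilon)}{1 - \epsilon} x$
       • $b(x) = \frac{2 + 4B\epsilon}{1 - \epsilon} \cdot \frac{x}{x - 1}$
     ➢ If $f$ is non-monotone:
       • $a(x) = \frac{(1 + \gamma)(1 + \epsilon)}{1 - \epsilon} x$
       • $b(x) = \frac{(1 + \epsilon)(3 + 3\epsilon + 6B\epsilon)}{(1 - \epsilon)^2} \cdot \frac{x}{x - 1}$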
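The ratio expression can be evaluated numerically with a simple grid search over x. The sketch assumes the formula reads as an outer factor (1 + eps)/(1 - eps) times the min-max term, with a(x) growing linearly in x and b(x) proportional to x/(x - 1); this reading is reconstructed from the slide layout and should be checked against the paper. `dstream_ratio` and its `steps` parameter are hypothetical.

```python
def dstream_ratio(M, eps, gamma, budget, monotone=True, steps=10000):
    """Grid-search the min over x in (1, M] of max(a(x), b(x)), times (1+eps)/(1-eps)."""

    def a(x):
        return (1 + gamma) * (1 + eps) / (1 - eps) * x

    def b(x):
        if monotone:
            c = (2 + 4 * budget * eps) / (1 - eps)
        else:
            c = (1 + eps) * (3 + 3 * eps + 6 * budget * eps) / (1 - eps) ** 2
        return c * x / (x - 1)

    xs = [1 + (M - 1) * t / steps for t in range(1, steps + 1)]  # excludes x = 1
    inner = min(max(a(x), b(x)) for x in xs)
    return (1 + eps) / (1 - eps) * inner


print(dstream_ratio(M=3, eps=0.0, gamma=0.0, budget=10))  # → 3.0
print(dstream_ratio(M=3, eps=0.01, gamma=0.1, budget=10))
```

With no noise and gamma = 0 the two curves x and 2x/(x - 1) cross at x = 3, so the expression evaluates to 3 when M is at least 3; any noise makes the value grow with the budget through the B·eps terms.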

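  16. DStream's weakness
     ➢ Rule: put $e$ into $S_i$ if $\frac{F(\mathbf{s} \sqcup (e, i))}{1 - \epsilon} \ge (|\mathbf{s}| + 1) \frac{o}{M}$
     ➢ What if $f(\mathbf{s}) \ge (|\mathbf{s}| + 1) \frac{o}{M}$ already?
       • $e$ may have no contribution to $\mathbf{s}$
       • Better to consider the marginal gain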

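  17. RStream
     ➢ For a new element $e$, if $|\mathbf{s}| < B$: compute $d_i = \frac{F(\mathbf{s} \sqcup (e, i))}{1 - \epsilon} - \frac{F(\mathbf{s})}{1 + \epsilon}$
       • $d_i$ is an upper bound on $\Delta_{e,i} f(\mathbf{s})$
     [Figure: a new element $e$ arriving at the subsets $S_1$, $S_2$, $S_3$]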

  18. RStream
     ➢ For a new element $e$, if $|\mathbf{s}| < B$: compute $d_i = \frac{F(\mathbf{s} \sqcup (e, i))}{1 - \epsilon} - \frac{F(\mathbf{s})}{1 + \epsilon}$
       • $d_i$ is an upper bound on $\Delta_{e,i} f(\mathbf{s})$
       • Filter out every $S_i$ with $d_i \le \frac{o}{M}$
         • $d_i = 0$ if $d_i \le \frac{o}{M}$
         • Otherwise $d_i$ keeps its value
       • Randomly put $e$ into $S_i$ with probability $d_i^{T-1} / \sum_j d_j^{T-1}$
         • $T = |\{j : d_j \ge \frac{o}{M}\}|$
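One RStream decision for a single arriving element could be sketched as follows. Solutions are again dicts from element to part index, `F` is the noisy oracle, `eps` the noise level and `o / M` the filter threshold. Returning `None` when every part is filtered out, and sampling only among the surviving parts (which also covers the case T = 1), are assumptions of this sketch rather than details stated on the slide.

```python
import random


def rstream_step(s, e, k, F, o, M, eps, rng):
    """Return the part index chosen for element e, or None if e is discarded."""
    base = F(s)
    # d[i] upper-bounds the marginal gain of putting e into part i.
    d = {i: F({**s, e: i}) / (1 - eps) - base / (1 + eps) for i in range(1, k + 1)}
    kept = {i: v for i, v in d.items() if v > o / M}
    if not kept:
        return None
    T = len(kept)
    if T == 1:
        return next(iter(kept))
    # Probability of part i is proportional to d[i] ** (T - 1).
    weights = [v ** (T - 1) for v in kept.values()]
    return rng.choices(list(kept), weights=weights, k=1)[0]


# Hypothetical additive objective, queried without noise.
gains = {"a": (3.0, 1.0), "b": (3.0, 3.0)}
toy_f = lambda s: sum(gains[x][i - 1] for x, i in s.items())
print(rstream_step({}, "a", k=2, F=toy_f, o=4, M=2, eps=0.0, rng=random.Random(0)))  # → 1
```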

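  19. RStream
     ➢ $d_i = \frac{F(\mathbf{s} \sqcup (e, i))}{1 - \epsilon} - \frac{F(\mathbf{s})}{1 + \epsilon}$
       • $d_i$ is an upper bound on $\Delta_{e,i} f(\mathbf{s})$
     ➢ What if $F(\mathbf{s}) \approx f(\mathbf{s}) = f(\mathbf{s} \sqcup (e, i)) \approx F(\mathbf{s} \sqcup (e, i))$?
       • $e$ has no contribution
       • But $d_i \approx \frac{2\epsilon}{1 - \epsilon^2} f(\mathbf{s}) \ge \frac{o}{M}$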

  20. RStream
     ➢ $d_i = \frac{F(\mathbf{s} \sqcup (e, i))}{1 - \epsilon} - \frac{F(\mathbf{s})}{1 + \epsilon}$
       • $d_i$ is an upper bound on $\Delta_{e,i} f(\mathbf{s})$
     ➢ (Denoise) Run multiple instances; each instance assumes $F$ is less noisy than it is:
       $d_{i,\epsilon'} = \frac{F(\mathbf{s} \sqcup (e, i))}{1 - \epsilon'} - \frac{F(\mathbf{s})}{1 + \epsilon'}$ where $\epsilon' = 0, \frac{\epsilon}{\eta - 1}, \frac{2\epsilon}{\eta - 1}, \dots, \epsilon$
       • $\eta$: adjustable parameter, controlling the number of instances
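The grid of assumed noise levels and its effect on the gain estimate can be illustrated with two small helpers. Both names are hypothetical, and the sketch assumes `eta >= 2` so that the grid runs from 0 up to the true noise level in `eta` equal steps.

```python
def denoise_levels(eps, eta):
    """The eta assumed noise levels 0, eps/(eta-1), 2*eps/(eta-1), ..., eps."""
    if eta < 2:
        raise ValueError("eta must be at least 2")
    return [eps * t / (eta - 1) for t in range(eta)]


def denoised_gain(F_with, F_without, eps_prime):
    """Gain estimate computed as if the oracle noise were eps_prime."""
    return F_with / (1 - eps_prime) - F_without / (1 + eps_prime)


print(denoise_levels(0.3, 4))
# An element with no real contribution: both oracle values are equal.
for level in denoise_levels(0.1, 3):
    print(level, denoised_gain(10.0, 10.0, level))
```

For an element that adds nothing, the estimate is exactly 0 in the instance that assumes no noise and grows to about 2.02 (that is, 2·0.1/(1 - 0.01) times 10) in the instance that assumes the full noise level, which is why the less pessimistic instances can filter such elements out.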

  21. (Denoise) Run multiple instances; each instance assumes $F$ is less noisy than it actually is.

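  22. RStream
     ➢ Lazy estimation: $\Delta_u$ is much larger than the one in DStream in order to bound the values of the $d_i$.
     ➢ Query complexity: $O\left(\frac{nk\eta}{\gamma} \log\left(\frac{((1 + \epsilon)^2 + 4B\epsilon)(1 + \gamma)}{(1 - \epsilon)^2} BM\right)\right)$
     ➢ Memory complexity: $O\left(\frac{\eta B}{\gamma} \log\left(\frac{((1 + \epsilon)^2 + 4B\epsilon)(1 + \gamma)}{(1 - \epsilon)^2} BM\right)\right)$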
