

SLIDE 1

Streaming 𝒍-submodular Maximization under Noise subject to Size Constraint

Lan N. Nguyen, My T. Thai University of Florida

SLIDE 2

ℓ-submodular maximization s.t. size constraint

➢ An ℓ-submodular function is a generalization of a submodular function
❑ Submodular set function: the input is a single subset of W
  g(Y) + g(Z) ≥ g(Y ∪ Z) + g(Y ∩ Z)
❑ ℓ-submodular function: the input is ℓ disjoint subsets of W
  g(𝐲) + g(𝐳) ≥ g(𝐲 ⊔ 𝐳) + g(𝐲 ⊓ 𝐳)
▪ 𝐲 = (Y_1, …, Y_ℓ) and 𝐳 = (Z_1, …, Z_ℓ)
▪ 𝐲 ⊔ 𝐳 = (A_1, …, A_ℓ) where A_j = (Y_j ∪ Z_j) \ ⋃_{k≠j} (Y_k ∪ Z_k)
▪ 𝐲 ⊓ 𝐳 = (Y_1 ∩ Z_1, …, Y_ℓ ∩ Z_ℓ)

➢ ℓ-submodular maximization s.t. size constraint (MℓSC)
❑ W – a finite set of elements, C – a positive integer
❑ (ℓ+1)^W – the family of tuples of ℓ disjoint subsets of W
❑ g: (ℓ+1)^W → ℝ₊ – an ℓ-submodular function

Find 𝐭 = (T_1, …, T_ℓ) s.t. |𝐭| = |⋃_{j≤ℓ} T_j| ≤ C that maximizes g(𝐭)
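The ⊔ and ⊓ operations above can be sketched in a few lines of Python. This is a minimal illustration (not the authors' code), representing 𝐲 as a tuple of disjoint Python sets; the names `join` and `meet` are my own.

```python
def join(y, z):
    """y ⊔ z: coordinate j keeps Y_j ∪ Z_j minus anything that also
    appears in some other coordinate k ≠ j."""
    l = len(y)
    result = []
    for j in range(l):
        others = set().union(*(y[k] | z[k] for k in range(l) if k != j))
        result.append((y[j] | z[j]) - others)
    return tuple(result)

def meet(y, z):
    """y ⊓ z: coordinate-wise intersection."""
    return tuple(yj & zj for yj, zj in zip(y, z))

y = ({1, 2}, {3})
z = ({1}, {2, 4})
print(join(y, z))  # element 2 appears in coordinate 1 of y and 2 of z, so it is dropped
print(meet(y, z))
```

Note how `join` resolves conflicts: an element assigned to different coordinates by 𝐲 and 𝐳 belongs to neither coordinate of 𝐲 ⊔ 𝐳, which keeps the coordinates disjoint.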

SLIDE 3

ℓ-submodular maximization s.t. size constraint

➢ Applications:
❑ Influence maximization with ℓ topics/products
❑ Sensor placement with ℓ kinds of sensors
❑ Coupled feature selection
➢ Existing solutions (*)
❑ Greedy: approximation ratio 2, O(ℓnC) query complexity (n = |W|)
❑ "Lazy" Greedy: approximation ratio 2, O(ℓ(n − C) log C log(C/ε)) query complexity with probability at least 1 − ε

(*) Ohsaka, Naoto, and Yuichi Yoshida. "Monotone k-submodular function maximization with size constraints." Advances in Neural Information Processing Systems, 2015.

SLIDE 4

Practical Challenges

➢ Noisy evaluation
❑ In many applications (e.g. Influence Maximization), obtaining the exact value of g(𝐭) is impractical
❑ g can only be queried through a noisy version G:
  (1 − ϑ) g(𝐭) ≤ G(𝐭) ≤ (1 + ϑ) g(𝐭) for all 𝐭 ∈ (ℓ+1)^W
➢ Streaming
❑ Algorithms are required to take only one single pass over W
▪ Produce solutions in a timely manner
▪ Avoid excessive storage in memory
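The multiplicative noise model above can be simulated directly; the sketch below (my own illustration, with a hypothetical modular objective `g`) wraps an exact oracle into a noisy one satisfying the stated bounds.

```python
import random

def make_noisy_oracle(g, eps, rng=random.Random(0)):
    """Return G with (1 - eps) * g(t) <= G(t) <= (1 + eps) * g(t)."""
    def G(t):
        return g(t) * rng.uniform(1 - eps, 1 + eps)
    return G

# Hypothetical objective: total number of selected elements.
g = lambda t: float(sum(len(Tj) for Tj in t))
G = make_noisy_oracle(g, eps=0.1)

t = ({1, 2}, {3, 4})
assert (1 - 0.1) * g(t) <= G(t) <= (1 + 0.1) * g(t)
```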

SLIDE 5

Our contribution

➢ Two streaming algorithms for MℓSC – DStream & RStream
❑ Take only a single scan over W
❑ Access G instead of g
❑ Performance guarantees:
▪ Approximation ratio g(𝐭)/g(𝐩), where 𝐩 is an optimal solution
▪ Query and memory complexity
➢ Experimental evaluation
❑ Influence maximization with ℓ topics
❑ Sensor placement with ℓ kinds of sensors

SLIDE 6

DStream

➢ Obtain p such that g(𝐩) ≥ p × C ≥ g(𝐩)/(1 + δ)
❑ Using lazy estimation (*)
➢ For a new element f, if |𝐭| < C:
  find max_{j≤ℓ} G(𝐭 ⊔ (f, j))
  [Figure: element f and the disjoint subsets T_1, T_2, T_3 obtained by putting f into each T_j]

(*) Badanidiyuru, Ashwinkumar, et al. "Streaming submodular maximization: Massive data summarization on the fly." Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2014.

SLIDE 7

DStream

➢ Obtain p such that g(𝐩) ≥ p × C ≥ g(𝐩)/(1 + δ)
❑ Using lazy estimation
➢ For a new element f, if |𝐭| < C:
  put f into T_j if G(𝐭 ⊔ (f, j))/(1 − ϑ) ≥ (|𝐭| + 1) · p/N

SLIDE 8

DStream

➢ Obtain p such that g(𝐩) ≥ p × C ≥ g(𝐩)/(1 + δ)
❑ Using lazy estimation
➢ For a new element f, if |𝐭| < C:
  put f into T_j if G(𝐭 ⊔ (f, j))/(1 − ϑ) ≥ (|𝐭| + 1) · p/N
  (G(𝐭 ⊔ (f, j))/(1 − ϑ) is the largest possible value of g(𝐭 ⊔ (f, j)))
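One streaming step of the threshold rule above can be sketched as follows. This is a minimal illustration under assumptions, not the authors' implementation: `G` is the noisy oracle, `p` a current guess of g(𝐩)/C, `N` the analysis parameter, and `eps` the noise level ϑ; the helper `put` and the function names are my own.

```python
def put(t, f, j):
    """Tuple obtained by inserting element f into coordinate j of t."""
    return tuple(Tj | {f} if k == j else Tj for k, Tj in enumerate(t))

def dstream_step(t, f, G, p, N, eps, C):
    size = sum(len(Tj) for Tj in t)
    if size >= C:
        return t                      # budget exhausted: skip f
    # coordinate that maximizes the noisy value
    best_j = max(range(len(t)), key=lambda j: G(put(t, f, j)))
    cand = put(t, f, best_j)
    # keep f only if the optimistic value clears the threshold
    if G(cand) / (1 - eps) >= (size + 1) * p / N:
        return cand
    return t
```

With an exact oracle (eps = 0) and a small guess p, a new element that improves the objective is accepted; with p set too high, the same element is rejected.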

SLIDE 9

DStream's performance guarantee

➢ 𝐲 = (Y_1, …, Y_ℓ) can also be understood as a vector 𝐲: W → {0, 1, …, ℓ}

  𝐲(f) = j if f ∈ Y_j, and 𝐲(f) = 0 if f ∉ ⋃_j Y_j

  [Figure: elements f_1, f_2, f_3, …, f_k with their assigned coordinate values]
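The vector view above is just a relabeling; a small sketch (my own, with a hypothetical helper name) converts the tuple-of-sets representation into a map from W to {0, 1, …, ℓ}:

```python
def as_vector(y, W):
    """Tuple of ℓ disjoint sets seen as a map W -> {0, 1, ..., l},
    where 0 marks elements selected nowhere."""
    vec = {f: 0 for f in W}
    for j, Yj in enumerate(y, start=1):
        for f in Yj:
            vec[f] = j
    return vec

print(as_vector(({1}, {3}), [1, 2, 3]))  # {1: 1, 2: 0, 3: 2}
```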

SLIDE 10

DStream's performance guarantee

➢ 𝐭_0, 𝐭_1, …, 𝐭_u – sequence of obtained solutions
❑ 𝐭_j – the solution after adding j elements (|𝐭_j| = j)
➢ Construct a sequence 𝐩_0, 𝐩_1, …, 𝐩_u
  𝐩_j = (𝐩 ⊔ 𝐭_j) ⊔ 𝐭_j

  [Figure: example vectors for 𝐭_j, 𝐩, and the resulting 𝐩_j]

SLIDE 11

DStream's performance guarantee

➢ If in the end |𝐭| = C:
  g(𝐭) ≥ (1 − ϑ) / ((1 + ϑ)(1 + δ) N) · g(𝐩)

  [Figure: the interleaved sequences 𝐭_0, 𝐩_0, 𝐭_1, 𝐩_1, …]

SLIDE 12

DStream's performance guarantee

➢ If in the end |𝐭| = u < C and g is monotone:
❑ Establish a recursive relationship between 𝐩_k and 𝐭_k:
  g(𝐩_{k−1}) + g(𝐭_{k−1}) ≤ g(𝐩_k) + ((1 + ϑ)/(1 − ϑ)) · g(𝐭_k)
❑ Bound g(𝐩) − g(𝐩_u):
  (*) g(𝐩) − g(𝐩_u) ≤ ((1 + ϑ + 2Cϑ)/(1 − ϑ)) · g(𝐭)
❑ Bound g(𝐩_u) − g(𝐭):
  (**) g(𝐩_u) − g(𝐭) ≤ (1/N) · g(𝐩) + (2Cϑ/(1 − ϑ)) · g(𝐭)
❑ Discard g(𝐩_u) by combining (*) and (**):
  g(𝐩) ≤ (N/(N − 1)) · ((2 + 4Cϑ)/(1 − ϑ)) · g(𝐭)

SLIDE 13

DStream's performance guarantee

➢ If in the end |𝐭| = u < C and g is non-monotone:
❑ g is pairwise monotone: Δ_{f,j} g(𝐲) + Δ_{f,k} g(𝐲) ≥ 0 for j ≠ k
❑ Using the same framework as the monotone case, but with different "math":
  g(𝐩) ≤ (N/(N − 1)) · ((1 + ϑ)(3 + 3ϑ + 6Cϑ)/(1 − ϑ)²) · g(𝐭)

SLIDE 14

DStream

Lazy estimation to obtain p
  • g(𝐩) ∈ [Δ_m, C × Δ_v]
  • p can be obtained as a value (1 + δ)^k ∈ [Δ_m/C, N(1 + ϑ)Δ_v]

Query complexity: O((nℓ/δ) · log(((1 + ϑ)(1 + δ)/(1 − ϑ)) · CN))
Memory complexity: O((C/δ) · log(((1 + ϑ)(1 + δ)/(1 − ϑ)) · CN))

SLIDE 15

DStream

Approximation ratio: ((1 + ϑ)/(1 − ϑ)) · min_{y ∈ (1, N]} max(b(y), c(y))

If g is monotone:
  • b(y) = ((1 + δ)(1 + ϑ)/(1 − ϑ)) · y
  • c(y) = ((2 + 4Cϑ)/(1 − ϑ)) · y/(y − 1)

If g is non-monotone:
  • b(y) = ((1 + δ)(1 + ϑ)/(1 − ϑ)) · y
  • c(y) = ((1 + ϑ)(3 + 3ϑ + 6Cϑ)/(1 − ϑ)²) · y/(y − 1)

SLIDE 16

DStream's weakness

[Figure: element f and subsets T_2, T_3]

Put f into T_j if G(𝐭 ⊔ (f, j))/(1 − ϑ) ≥ (|𝐭| + 1) · p/N

What if g(𝐭) ≥ (|𝐭| + 1) · p/N already?
  • f may have no contribution to 𝐭
  • Better to consider the marginal gain
SLIDE 17

RStream

➢ For a new element f, if |𝐭| < C:

  u_j = G(𝐭 ⊔ (f, j))/(1 − ϑ) − G(𝐭)/(1 + ϑ)

  • u_j is an upper bound on Δ_{f,j} g(𝐭)
SLIDE 18

RStream

➢ For a new element f, if |𝐭| < C:

  u_j = G(𝐭 ⊔ (f, j))/(1 − ϑ) − G(𝐭)/(1 + ϑ)

  • u_j is an upper bound on Δ_{f,j} g(𝐭)
  • Filter out every T_j with u_j ≤ p/N:
    set u_j = 0 if u_j ≤ p/N; otherwise u_j keeps its value
  • Randomly put f into T_j with probability u_j^{U−1} / Σ_k u_k^{U−1},
    where U = |{k : u_k ≥ p/N}|
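The randomized assignment above can be sketched in Python. This is a minimal illustration under assumptions (not the authors' code): `G` is the noisy oracle, `eps` is ϑ, and ties at the p/N boundary are resolved with a strict comparison.

```python
import random

def rstream_choose(t, f, G, p, N, eps, rng=random.Random(0)):
    """Pick the coordinate j for element f with probability proportional
    to u_j^(U-1), after zeroing bounds below the threshold p/N.
    Returns None if f is discarded."""
    base = G(t)
    u = []
    for j in range(len(t)):
        cand = tuple(Tj | {f} if k == j else Tj for k, Tj in enumerate(t))
        u.append(G(cand) / (1 - eps) - base / (1 + eps))
    U = sum(1 for uj in u if uj > p / N)
    u = [uj if uj > p / N else 0.0 for uj in u]
    total = sum(uj ** (U - 1) for uj in u if uj > 0)
    if total == 0:
        return None                     # no coordinate survives the filter
    r = rng.random() * total
    for j, uj in enumerate(u):
        if uj > 0:
            r -= uj ** (U - 1)
            if r <= 0:
                return j
    return None
```

With U = 1 the single surviving coordinate is chosen deterministically (u_j^0 = 1); larger exponents bias the choice toward coordinates with larger optimistic gains.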

SLIDE 19

RStream

What if G(𝐭) ≈ g(𝐭) = g(𝐭 ⊔ (f, j)) ≈ G(𝐭 ⊔ (f, j))?

  u_j = G(𝐭 ⊔ (f, j))/(1 − ϑ) − G(𝐭)/(1 + ϑ)

  • u_j is an upper bound on Δ_{f,j} g(𝐭)
  • f has no contribution
  • But u_j ≈ (2ϑ/(1 − ϑ²)) · g(𝐭), which may still be ≥ p/N

SLIDE 20

RStream

  u_j = G(𝐭 ⊔ (f, j))/(1 − ϑ) − G(𝐭)/(1 + ϑ)

  • u_j is an upper bound on Δ_{f,j} g(𝐭)

(Denoise) Run multiple instances; each instance assumes G is less noisy than it actually is:
  u_{j,ϑ′} = G(𝐭 ⊔ (f, j))/(1 − ϑ′) − G(𝐭)/(1 + ϑ′)
  where ϑ′ ∈ {0, ϑ/(θ − 1), 2ϑ/(θ − 1), …, ϑ}

θ – an adjustable parameter controlling the number of instances

SLIDE 21

(Denoise) Run multiple instances, each instance assumes 𝐺 is less noisy than it actually is.

SLIDE 22

Lazy estimation: Δ_v is much larger than the one in DStream, in order to bound the u_j values.

Query complexity: O((nℓθ/δ) · log(((1 + ϑ)(2 + 4Cϑ)(1 + δ)/(1 − ϑ)²) · CN))
Memory complexity: O((θC/δ) · log(((1 + ϑ)(2 + 4Cϑ)(1 + δ)/(1 − ϑ)²) · CN))

SLIDE 23

Approximation ratio: ((1 + ϑ)/(1 − ϑ)) · min_{y ∈ (1, N]} max(b(y), c(y))

If g is monotone:
  • b(y) = ((1 + δ)(1 + ϑ + 2Cϑ)/(1 − ϑ)) · y
  • c(y) = ((1 + ϑ)(2 + 4Cϑ)/(1 − ϑ²)) · (1 − 1/(ℓ + 1)) · ℓy/(ℓy − ℓ − 1)

If g is non-monotone:
  • b(y) = ((1 + δ)(1 + ϑ + 2Cϑ)/(1 − ϑ)) · y
  • c(y) = (((3ℓ − 2)(1 + ϑ)² + (8ℓ − 8)Cϑ)/(1 − ϑ)²) · y/(ℓy − ℓ − 2)

SLIDE 24

Experimental Evaluation

➢ Influence Maximization with ℓ topics
❑ ℓ influence-spread processes occur independently in a social network
❑ Find T_1, …, T_ℓ that maximize the number of active users
▪ An active user is a user activated by at least one topic
▪ T_j – a seed set of users who start spreading topic j
▪ |T_1 ∪ ⋯ ∪ T_ℓ| ≤ C

➢ Social network: Facebook dataset from SNAP
❑ Leskovec, Jure, and Rok Sosič. "SNAP: A general-purpose network analysis and graph-mining library." ACM Transactions on Intelligent Systems and Technology (TIST) 8.1 (2016): 1-20.

➢ Influence model: Linear Threshold
❑ Kempe, David, Jon Kleinberg, and Éva Tardos. "Maximizing the spread of influence through a social network." Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2003.

SLIDE 25

Influence Maximization with ℓ topics

➢ Compared algorithms
❑ Greedy (Ohsaka and Yoshida, NIPS'15)
❑ IM: randomly select one topic and solve the classical Influence Maximization problem
❑ SGr: simple streaming; pick f with probability C/n and put it into the T_j that maximizes G(𝐭 ⊔ (f, j))

SLIDE 26

Influence Maximization with ℓ topics

➢ DStream and RStream (θ = 2)
❑ Returned solutions of quality comparable to Greedy's, and outperformed IM in most cases
❑ Outperformed Greedy in the number of queries by a huge margin

SLIDE 27

Influence Maximization with ℓ topics

➢ The denoise step helped RStream improve performance
❑ θ = 1 causes RStream to terminate prematurely and perform worse than DStream
❑ θ = 2 helps RStream improve solution quality, but it takes about 4 times more queries than DStream

SLIDE 28

Influence Maximization with ℓ topics

➢ The larger δ is, the lower the solution quality and the fewer queries the algorithms obtained
➢ The smaller N is, the lower the solution quality and the fewer queries the algorithms obtained

SLIDE 29

Conclusion

➢ We propose two streaming algorithms with theoretical performance guarantees for MℓSC under noise
➢ In comparison with Greedy, our algorithms
❑ Require far fewer queries
❑ Obtain solutions of comparable quality

➢ Thanks! Questions?
❑ lan.nguyen@ufl.edu