SLIDE 1

Submodular Functions – Part I

ML Summer School Cádiz Stefanie Jegelka MIT

SLIDE 2

Set functions

V = ground set
F : 2^V → R — e.g., the cost of buying the items in a set together, or a utility, or a probability, …

We will assume:
  • F(∅) = 0
  • a black box "oracle" to evaluate F

SLIDE 3

Discrete Labeling

[Figure: image labeled into regions sky, tree, house, grass]

F(S) = coherence + likelihood

SLIDE 4

Summarization

F(S) = relevance + diversity or coverage

SLIDE 5

Informative Subsets

[Figure: office floor plan with candidate sensor locations]

  • where to put sensors?
  • which experiments to run?
  • summarization

F(S) = "information"

SLIDE 6

Sparsity

y = Ax + noise

F(S) = "penalty on support pattern"

SLIDE 7

Formalization

Optimize a set function F(S) (under constraints)

  • generally very hard
  • submodularity helps: efficient optimization & inference with guarantees!

SLIDE 8

Roadmap

  • Submodular set functions
    – what are they? where do they occur? how to recognize them?
  • Maximizing submodular functions: diversity, repulsion, concavity
    – greed is not too bad
  • Minimizing submodular functions: coherence, regularization, convexity
    – the magic of a "discrete analog of convex"
  • Other questions around submodularity & ML

more reading & papers: http://people.csail.mit.edu/stefje/mlss/literature.pdf

SLIDE 9

Sensing

[Figure: office floor plan]

V = all possible locations
F(S) = information gained from locations in S

SLIDE 10

Marginal gain

  • Given set function F : 2^V → R
  • Marginal gain of a new sensor s:  F(s|A) = F(A ∪ {s}) − F(A)

[Figure: floor plan with sensors X1, X2 placed and a new sensor Xs]

SLIDE 11

Diminishing marginal gains

[Figure: placement A = {1, 2} — adding s helps a lot (big gain); placement B = {1, …, 5} — adding s gives only a small gain]

for A ⊆ B:  F(A ∪ s) − F(A) ≥ F(B ∪ s) − F(B)

SLIDE 12

Submodularity

extra cost: one drink  vs.  extra cost: free refill

diminishing marginal costs:
F(A ∪ s) − F(A) ≥ F(B ∪ s) − F(B)  for A ⊆ B

SLIDE 13

Submodular set functions

  • Diminishing gains: for all A ⊆ B and e ∉ B:
    F(A ∪ e) − F(A) ≥ F(B ∪ e) − F(B)
  • Union–Intersection: for all S, T ⊆ V:
    F(S) + F(T) ≥ F(S ∪ T) + F(S ∩ T)
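
Since both definitions quantify over all subsets, a brute-force check is only feasible for tiny ground sets, but it makes the definition concrete. A minimal Python sketch (not from the slides; the coverage example at the end is made up):

```python
from itertools import combinations

def is_submodular(F, V):
    """Brute-force check of diminishing gains:
    F(A | {e}) - F(A) >= F(B | {e}) - F(B) for all A <= B, e not in B.
    Enumerates all subset pairs -- only for tiny ground sets."""
    V = list(V)
    subsets = [frozenset(c) for r in range(len(V) + 1)
               for c in combinations(V, r)]
    for B in subsets:
        for A in subsets:
            if not A <= B:
                continue
            for e in V:
                if e in B:
                    continue
                if F(A | {e}) - F(A) < F(B | {e}) - F(B) - 1e-12:
                    return False
    return True

# toy coverage function: F(S) = size of the union of covered areas
areas = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"c", "d", "e"}}
F = lambda S: len(set().union(*[areas[v] for v in S])) if S else 0
print(is_submodular(F, areas))  # True
```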

SLIDE 14

The big picture

submodular functions connect to:
  • matroid theory (Whitney 1935)
  • game theory (Shapley 1970)
  • graph theory (Frank 1993)
  • electrical networks (Narayanan 1997)
  • stochastic processes (Macchi 1975, Borodin 2003)
  • combinatorial optimization
  • machine learning

G. Choquet · J. Edmonds · L.S. Shapley · L. Lovász

SLIDE 15

Examples

  • each element e has a weight w(e):  F(S) = Σ_{e∈S} w(e)
  • for any A ⊂ B:  F(A ∪ e) − F(A) = w(e) = F(B ∪ e) − F(B)

linear / modular function — F and −F are always submodular!

SLIDE 16

Examples

[Figure: office floor plan]

sensing: F(S) = information gained from locations S

SLIDE 17

Example: cover

F(S) = area( ⋃_{v∈S} area(v) )

[Figure: overlapping regions — the area added by a new v shrinks as the set grows:
F(A ∪ v) − F(A) ≥ F(B ∪ v) − F(B)]
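
In code, with "area" discretized into grid cells (a toy stand-in for the slides' geometric figures), coverage and its diminishing gains look like this:

```python
def coverage(S, cells):
    """F(S) = number of distinct grid cells covered by the regions in S."""
    covered = set()
    for v in S:
        covered |= cells[v]
    return len(covered)

cells = {
    "v1": {(0, 0), (0, 1), (1, 0)},
    "v2": {(1, 0), (1, 1)},   # overlaps v1 in cell (1, 0)
}
# the marginal gain of v2 shrinks once v1 is already in the set:
print(coverage({"v2"}, cells) - coverage(set(), cells))         # 2
print(coverage({"v1", "v2"}, cells) - coverage({"v1"}, cells))  # 1
```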

SLIDE 18

More complex model for sensing

Joint probability distribution:
P(X1, …, Xn, Y1, …, Yn) = P(Y1, …, Yn) · P(X1, …, Xn | Y1, …, Yn)
                              (prior)         (likelihood)

Ys: temperature at location s
Xs: sensor value at location s,  Xs = Ys + noise

[Figure: floor plan with hidden temperatures Y1, …, Y6 and sensor readings X1, …, X6]

SLIDE 19

Sensor placement

Utility of having sensors at subset A of all locations:

F(A) = H(Y) − H(Y | X_A) = I(Y; X_A)

(uncertainty about temperature Y before sensing, minus uncertainty after sensing)

[Figure: A = {1, 2, 3} spread out — high value F(A); A = {1, 4, 5} clustered — low value F(A)]

SLIDE 20

Information gain

X1, …, Xn, Y1, …, Ym discrete random variables

F(A) = I(Y; X_A) = H(X_A) − H(X_A | Y)

if the Xi are all conditionally independent given Y, then H(X_A | Y) = Σ_{i∈A} H(Xi | Y) is modular — and F is submodular!

SLIDE 21

Entropy

X1, …, Xn discrete random variables, Xe ∈ {1, …, m}:
H(Xe) = − Σ_{x∈{1,…,m}} P(Xe = x) log P(Xe = x)

F(S) = H(X_S) = joint entropy of the variables indexed by S

Is F submodular: F(A ∪ e) − F(A) ≥ F(B ∪ e) − F(B) for A ⊂ B, e ∉ B??

H(X_{A∪e}) − H(X_A) = H(Xe | X_A) ≥ H(Xe | X_B) = H(X_{B∪e}) − H(X_B)

"information never hurts" — discrete entropy is submodular!

SLIDE 22

Submodularity and independence

discrete random variables X1, …, Xn:
Xi, i ∈ S statistically independent  ⟺  H(X_S) = Σ_{e∈S} H(Xe)  ⟺  H is modular/linear on S

Similarly for linear independence: V = a set of vectors, F(S) = rank(S);
the vectors in S are linearly independent  ⟺  F is modular/linear on S: F(S) = |S|

SLIDE 23

Maximizing Influence

(Kempe, Kleinberg & Tardos 2003)

F(S) = expected # infected nodes

F(S ∪ s) − F(S) ≥ F(T ∪ s) − F(T)  for S ⊆ T

SLIDE 24

Graph cuts

  • Cut of one edge (u, v) with weight w_uv: check
    F({u}) + F({v}) ≥ F({u, v}) + F(∅)
    — the cut of one edge is submodular!
  • large graph: F(S) = Σ_{u∈S, v∉S} w_uv, a sum over edges (sketched below)

Useful property: a sum of submodular functions is submodular.
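
A direct sketch of this cut function (the toy graph and weights are made up for illustration):

```python
def cut_value(S, edges):
    """F(S) = total weight of edges with exactly one endpoint in S."""
    return sum(w for (u, v), w in edges.items() if (u in S) != (v in S))

edges = {("a", "b"): 2.0, ("b", "c"): 1.0, ("a", "c"): 3.0}
print(cut_value({"a"}, edges))       # 5.0
print(cut_value({"a", "b"}, edges))  # 4.0
```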

SLIDE 25

Sets and boolean vectors

any set function F : 2^V → R with |V| = n … is a function on binary vectors, F : {0,1}^n → R:
a set A corresponds to its indicator vector x = 1_A (e.g., A = {a, b} ⊆ {a, b, c, d} corresponds to x = (1, 1, 0, 0))

subset selection = binary labeling!

SLIDE 26

Attractive potentials

[Figure: grid MRF — binary labels x1, …, x12 over pixel values z1, …, z12]

P(x | z) ∝ exp(−E(x; z))

max_{x∈{0,1}^n} P(x | z)  =  min_{x∈{0,1}^n} E(x; z)

SLIDE 27

Attractive potentials

E(x; z) = Σ_i E_i(x_i) + Σ_{ij} E_{ij}(x_i, x_j)

spatial coherence: E_{ij}(1, 0) + E_{ij}(0, 1) ≥ E_{ij}(0, 0) + E_{ij}(1, 1)

with S = {i}, T = {j}, S ∩ T = ∅, this is exactly F(S) + F(T) ≥ F(S ∪ T) + F(S ∩ T)

SLIDE 28

Diversity priors

P(S | data) ∝ P(S) · P(data | S), with a prior P(S) favoring sets that "spread out"

SLIDE 29

Determinantal point processes

  • similarity matrix L with L_ij = x_i^T x_j
  • sample a set Y:  P(Y = S) ∝ det(L_S) = Vol({x_i}_{i∈S})²

F(S) = log det(K_S) is submodular!
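
A quick NumPy sketch of the log-det objective (random features for illustration; the small ridge keeps the matrix positive definite so the determinant is well defined):

```python
import numpy as np

def log_det(S, L):
    """F(S) = log det(L_S), the log-determinant of the principal
    submatrix of L indexed by S; F(empty set) = 0 by convention."""
    idx = sorted(S)
    if not idx:
        return 0.0
    _, logdet = np.linalg.slogdet(L[np.ix_(idx, idx)])
    return logdet

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))         # 5 items with 3-d features
L = X @ X.T + 1e-6 * np.eye(5)      # similarity matrix L_ij = x_i . x_j
print(log_det({0, 2}, L))
```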

SLIDE 30

DPP sample

[Figure: points sampled uniformly vs. from a DPP — the DPP sample spreads out]

similarities: s_ij = exp(−‖x_i − x_j‖² / (2σ²)),  σ² = 35

SLIDE 31

[Figure: grid of handwritten digit images]

SLIDE 32

Submodularity: many examples

  • linear/modular functions
  • graph cut function
  • coverage
  • propagation/diffusion in networks
  • entropy
  • rank functions
  • information gain
  • log P(S|data) [repulsion]
  • −log P(S|data) [coherence]

F(A ∪ s) − F(A) ≥ F(B ∪ s) − F(B)  for A ⊆ B

SLIDE 33

Closedness properties

F(S) submodular on V. The following are submodular:
  • Restriction: F′(S) = F(S ∩ W), for fixed W ⊆ V

SLIDE 34

Closedness properties

F(S) submodular on V. The following are submodular:
  • Restriction: F′(S) = F(S ∩ W)
  • Conditioning: F′(S) = F(S ∪ W)

SLIDE 35

Closedness properties

F(S) submodular on V. The following are submodular:
  • Restriction: F′(S) = F(S ∩ W)
  • Conditioning: F′(S) = F(S ∪ W)
  • Reflection: F′(S) = F(V \ S)

SLIDE 36

Submodularity …

… discrete convexity? … or concavity?

SLIDE 37

Convex functions (Lovász, 1983)

  • "occur in many models in economy, engineering and other sciences", "often the only nontrivial property that can be stated in general"
  • preserved under many operations and transformations: larger effective range of results
  • sufficient structure for a "mathematically beautiful and practically useful theory"
  • efficient minimization

"It is less apparent, but we claim and hope to prove to a certain extent, that a similar role is played in discrete optimization by submodular set-functions" […] — they share the above four properties.

SLIDE 38

Convex aspects

  • convex extension
    – duality
    – efficient minimization

[Figure: piecewise-linear convex extension f(x) over (x_a, x_b) ∈ [0, 1]²]

But this is only half of the story…

SLIDE 39

Concave aspects

  • submodularity: for A ⊆ B, s ∉ B:
    F(A ∪ s) − F(A) ≥ F(B ∪ s) − F(B)
  • concavity: for a ≤ b, s > 0:
    (f(a + s) − f(a)) / s ≥ (f(b + s) − f(b)) / s

"intuitively": [Figure: F(A) plotted against |A| flattens out like a concave function]

SLIDE 40

Submodularity and concavity

  • suppose g : N → R and F(A) = g(|A|)

F(A) is submodular if and only if … g is concave

SLIDE 41

Max / min

  • Maximum of convex functions is convex
SLIDE 42

Maximum of submodular functions

  • F1(A), F2(A) submodular. What about F(A) = max{ F1(A), F2(A) }?

not submodular in general!

[Figure: Fi(A) = gi(|A|) — the max of two concave curves need not be concave]

SLIDE 43

Max / min

  • Minimum of concave functions is concave
SLIDE 44

Minimum of submodular functions

What about F(A) = min{ F1(A), F2(A) }?

         F1(A)   F2(A)   F(A)
{}         0       0       0
{a}        1       0       0
{b}        0       1       0
{a,b}      1       1       1

Check F(A) + F(B) ≥ F(A ∪ B) + F(A ∩ B) with A = {a}, B = {b}: 0 + 0 < 1 + 0.
min(F1, F2) is not submodular in general!

SLIDE 45

Submodular optimization

  • Maximizing submodular functions: diversity, repulsion, concavity
    – greed is not too bad
  • Minimizing submodular functions: coherence, regularization, convexity
    – magic with polytopes, and a "discrete analog of convex"

convex … and concave aspects!

SLIDE 46

Submodular Maximization

  • ground set V
  • (scoring) function F : 2^V → R₊

max_{S⊆V} F(S)

SLIDE 47

Informative Subsets

[Figure: office floor plan with candidate sensor locations]

  • where to put sensors?
  • which experiments to run?
  • summarization

F(S) = "information"

SLIDE 48

Maximizing Influence

(Kempe, Kleinberg & Tardos 2003)

F(S) = expected # infected nodes

SLIDE 49

Summarization

  • videos, text, pictures …
  • would like: relevance, reliability, diversity

SLIDE 50

Summarization

F(S) = R(S) + D(S)

  • Coverage / relevance: R(S) = Σ_{a∈V} max_{b∈S} s_{a,b}
  • Diversity: D(S) = Σ_{j=1}^m √|S ∩ P_j|, for a partition P_1, …, P_m of V

(Simon et al 2007, Lin & Bilmes 2011 & 2012, Tschiatschek et al 2014, Kim et al 2014, Gygli et al 2015, …)
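
A minimal sketch of these two terms (the similarity matrix and partition below are hypothetical placeholders):

```python
import numpy as np

def relevance(S, sim):
    """R(S) = sum over all items a of max_{b in S} s_{a,b}."""
    S = list(S)
    return sim[:, S].max(axis=1).sum() if S else 0.0

def diversity(S, parts):
    """D(S) = sum_j sqrt(|S intersect P_j|) over a partition P_1,...,P_m."""
    return sum(np.sqrt(len(S & P)) for P in parts)

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))
sim = X @ X.T                      # toy similarities s_{a,b}
parts = [{0, 1, 2}, {3, 4}, {5}]   # toy partition of the ground set
S = {0, 3}
print(relevance(S, sim) + diversity(S, parts))  # F(S) = R(S) + D(S)
```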

SLIDE 51

Diversity

  • Diversity: D(S) = Σ_{j=1}^m √|S ∩ P_j|   (increasing)

Another diversity function:
D(S) = − Σ_{a,b∈S} s_{a,b}   (decreasing)

SLIDE 52

Summarization: results

(Lin & Bilmes 2011)

Many more functions are possible … → learn a weighted combination: structured prediction works even better!
(Lin & Bilmes 2012, Tschiatschek et al 2014, Gygli et al 2015, Xu et al 2015, …)

SLIDE 53

More maximization … max F(S)

  • co-segmentation by maximizing anisotropic diffusion (Kim et al 2011)
  • environmental monitoring (Krause, …)
  • weakly supervised object detection (Song et al 2014)
  • inferring networks (Gomez Rodriguez et al 2012)
  • diverse recommendations (Yue & Guestrin)

SLIDE 54

Monotonicity

if S ⊆ T then F(S) ≤ F(T)

[Figure: nested sets with increasing values 1, 3, 5]

SLIDE 55

Monotonicity – how to check?

F(A) = area( ⋃_{a∈A} area(a) ) − Σ_{a∈A} c(a)

Monotone means: if A ⊆ B then F(A) ≤ F(B). It suffices to take B = A ∪ {a} and check the marginal gain:
F(A ∪ {a}) − F(A) ≥ 0

[Example: a sensor with gain +5 − 8 < 0 — this F is not monotone]

SLIDE 56

Maximizing monotone functions

max_{|S|≤k} F(S),  where F is monotone: if A ⊆ B then F(A) ≤ F(B)

  • NP-hard
  • approximation: greedy algorithms

SLIDE 57

Maximizing monotone functions

max_S F(S)  s.t. |S| ≤ k

  • greedy algorithm:
    S0 = ∅
    for i = 0, …, k−1:
      e* = argmax_{e∈V\Si} F(Si ∪ {e})
      S_{i+1} = Si ∪ {e*}

How "good" is Sk?
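
The loop as straightforward Python — a naive sketch with no lazy evaluation, assuming k ≤ |V|; the coverage instance at the end is made up:

```python
def greedy_max(F, V, k):
    """Greedy for max F(S) s.t. |S| <= k: repeatedly add the element
    with the largest marginal gain. Assumes k <= |V|."""
    S = frozenset()
    for _ in range(k):
        e_best = max(V - S, key=lambda e: F(S | {e}) - F(S))
        S = S | {e_best}
    return S

areas = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"d"}, 4: {"a", "b", "c"}}
F = lambda S: len(set().union(*[areas[v] for v in S])) if S else 0
print(greedy_max(F, set(areas), 2))  # frozenset({3, 4}): covers a, b, c, d
```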

SLIDE 58

Pedestrian detection (Barinova et al. '10)

Line detection task: voting elements vote for hypotheses (classic Hough transform)

yi = 1: object i present; yi = 0: object i not present
xj = index of the hypothesis explaining element j

[Figure: elements x1, …, x8 all assigned to hypothesis y1]

Illustrations courtesy of Pushmeet Kohli

SLIDE 59

Pedestrian detection

[Figure: y1 = 1, y2 = 1, y3 = 0; assignments x1 = x2 = x3 = 1, x4 = x5 = x7 = x8 = 2, x6 = 0]

Joint MAP inference: F(S) = Σ_j max_{i∈S} w_ij, where w_ij is the weight of element j w.r.t. hypothesis i

Illustrations courtesy of Pushmeet Kohli

SLIDE 60

Inference

Using the Hough forest trained in [Gall & Lempitsky CVPR09]; datasets from [Andriluka et al. CVPR 2008] (with strongly occluded pedestrians added). Illustrations courtesy of Pushmeet Kohli.

SLIDE 61

How good is greedy? … in practice

[Figure: sensor placement, information gain — empirically, greedy is close to optimal]

SLIDE 62

How good is greedy? … in theory

max_S F(S)  s.t. |S| ≤ k

Theorem (Nemhauser, Fisher, Wolsey '78). F monotone submodular, Sk the solution of greedy, S* the optimal solution. Then
F(Sk) ≥ (1 − 1/e) · F(S*)

in general, no poly-time algorithm can do better than that!

SLIDE 63

Questions

  • What if I have more complex constraints?

– budget constraints – matroid constraints

  • Greedy takes O(nk) time. What if n, k are large?
  • What if my function is not monotone?
SLIDE 64

More complex constraints: budget

max F(S)  s.t.  Σ_{e∈S} c(e) ≤ B

  1. run greedy → Sgr
  2. run a modified greedy with cost-scaled gains (sketched below):
     e* = argmax_e [F(Si ∪ {e}) − F(Si)] / c(e)  → Smod
  3. pick the better of Sgr, Smod

→ approximation factor: ½ (1 − 1/e)   (Leskovec et al 2007)

even better but less fast: partial enumeration (Sviridenko, 2004) or filtering (Badanidiyuru & Vondrák 2014)
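
A sketch of step 2, the cost-scaled greedy (hypothetical F/cost interfaces, assuming positive costs; step 3 would compare the result with plain greedy's and keep the better one):

```python
def cost_benefit_greedy(F, V, cost, budget):
    """Greedily add the affordable element with the best marginal gain
    per unit cost; stop when nothing affordable improves F."""
    S, spent = frozenset(), 0.0
    while True:
        affordable = [e for e in V - S if spent + cost[e] <= budget]
        if not affordable:
            return S
        e = max(affordable, key=lambda e: (F(S | {e}) - F(S)) / cost[e])
        if F(S | {e}) - F(S) <= 0:
            return S
        S, spent = S | {e}, spent + cost[e]
```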

SLIDE 65

Other constraints: camera network

  • Ground set: V = {1a, 1b, …, 5a, 5b} (one element per camera–direction pair)
  • Sensing quality model F(S)
  • A configuration (subset) is feasible if no camera is pointed in two directions at once
  • Constraints: with groups P1 = {1a, 1b}, …, P5 = {5a, 5b}, require |S ∩ Pi| ≤ 1

SLIDE 66

Generalization of the greedy algorithm

S = ∅
while ∃e : S ∪ e feasible:
  e* ← argmax { F(S ∪ e) | S ∪ e feasible }
  S ← S ∪ e*

Theorem (Nemhauser, Wolsey, Fisher '78). For monotone submodular functions: F(Sgreedy) ≥ ½ F(S*)

  • Does this always work? No. But it works for matroid constraints.

SLIDE 67

Matroids: examples

a set S is independent (= feasible) if …
  • … |S| ≤ k   (uniform matroid)
  • … S contains at most one element from each group   (partition matroid)
  • … S contains no cycles   (graphic matroid)

  • S independent ⇒ every T ⊆ S is also independent

SLIDE 68

Matroids

a set S is independent (= feasible) if …
  • … |S| ≤ k   (uniform matroid)
  • … S contains at most one element from each group   (partition matroid)
  • … S contains no cycles   (graphic matroid)

  • S independent ⇒ every T ⊆ S is also independent
  • Exchange property: S, U independent, |S| > |U| ⇒ some e ∈ S can be added to U: U ∪ e is independent
  • All maximal independent sets have the same size

SLIDE 69

Generalization of the greedy algorithm

S = ∅
while ∃e : S ∪ e feasible:
  e* ← argmax { F(S ∪ e) | S ∪ e feasible }
  S ← S ∪ e*

Theorem (Nemhauser, Wolsey, Fisher '78). For monotone submodular functions: F(Sgreedy) ≥ ½ F(S*)

  • Works for matroid constraints
  • Is this the best possible? Can do a bit better with a relaxation: (1 − 1/e)

SLIDE 70

Relax: discrete to continuous

max_{S∈I} F(S)   →   max_{x∈conv(I)} f_M(x)

Algorithm (Calinescu, Chekuri, Pál, Vondrák 2011):
  1. approximately maximize f_M (like the Frank–Wolfe algorithm – next lecture)
  2. round to a discrete set (pipage rounding)

[Figure: a set function on the vertices of the cube and its continuous extension f(x)]

SLIDE 71

Multilinear extension

sample item e independently with probability x_e:

f_M(x) = E_{S∼x}[F(S)] = Σ_{S⊆V} F(S) · Π_{e∈S} x_e · Π_{e∉S} (1 − x_e)
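
The sum ranges over 2^n sets, so it is rarely evaluated directly; f_M is easy to estimate by sampling instead. A minimal sketch with a toy F (for x = (0.5, 1.0, 0.2) the exact value is 1.6):

```python
import numpy as np

def multilinear_estimate(F, x, n_samples=20000, seed=0):
    """Monte Carlo estimate of f_M(x) = E_{S~x}[F(S)]: include each
    element e independently with probability x[e], then average F."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_samples):
        S = {e for e, p in enumerate(x) if rng.random() < p}
        total += F(S)
    return total / n_samples

F = lambda S: min(len(S), 2)   # g(|S|) with g concave => submodular
print(multilinear_estimate(F, [0.5, 1.0, 0.2]))  # ~1.6
```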

SLIDE 72

Questions

  • What if I have more complex constraints?

– budget constraints – matroid constraints

  • Greedy takes O(nk) time. What if n, k are large?

– faster sequential algorithms – filtering – parallel / distributed

  • What if my function is not monotone?
SLIDE 73

Making greedy faster: stochastic greedy (Mirzasoleiman et al 2014)

max_S F(S)  s.t. |S| ≤ k

for i = 1, …, k:
  • randomly pick a set T of size (n/k) log(1/ε)
  • add the best element of T:  a_i = argmax_{a∈T} F(a | S_{i−1}),  S_i ← S_{i−1} ∪ {a_i}
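
As code (same hypothetical F interface as in the plain greedy sketch earlier):

```python
import math
import random

def stochastic_greedy(F, V, k, eps=0.1, seed=0):
    """Stochastic greedy: each round scans only a random sample of
    size (n/k) * log(1/eps) instead of all remaining elements."""
    rng = random.Random(seed)
    n = len(V)
    m = max(1, math.ceil(n / k * math.log(1 / eps)))
    S = set()
    for _ in range(k):
        pool = list(set(V) - S)
        T = rng.sample(pool, min(m, len(pool)))
        S.add(max(T, key=lambda e: F(S | {e}) - F(S)))
    return S
```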

SLIDE 74

Performance

[Figure: cost vs. utility for lazy greedy, threshold greedy (ε = 0.7/0.8/0.9), sample greedy (p = 0.13–0.43), stochastic (rand) greedy (ε = 0.001–0.3), and multi greedy — stochastic greedy is faster than "lazy greedy" at comparable solution quality]

SLIDE 75

Distributed greedy algorithms

even more data … distributed greedy algorithm?

SLIDE 76

Distributed greedy algorithms

greedy is sequential. pick in parallel??

pick k elements on each machine; combine and run greedy again.

Is this useful?

SLIDE 77

Distributed greedy algorithms

pick in parallel from m machines. Is this useful?

Approximation factor: O(1 / min{√k, m})   (Mirzasoleiman et al 2013)

SLIDE 78

Distributed Greedy

[Figure: Tiny Images 10K — distributed/centralized objective ratio vs. number of machines m (= # parts in the partition), comparing GreeDI (α = 1, 2/m, 4/m) with greedy/max, greedy/merge, random/random, random/greedy]   (Mirzasoleiman et al 2013)

In practice it often performs quite well. Improved guarantees via:
  1. special structure: if F is Lipschitz or a sum of many terms
  2. randomization

SLIDE 79

Distributed greedy algorithms

randomly distribute the data across m machines; pick in parallel; pick the best of the m+1 solutions

  • each machine runs an α-approximation algorithm
  • level 2 runs a β-approximation algorithm

→ overall: E[F(Ŝ)] ≥ (αβ / (α + β)) · F(S*)

(Mirzasoleiman et al 2013, de Ponte Barbosa et al 2015; see also Mirrokni & Zadimoghaddam 2015)

SLIDE 80

Distributed greedy algorithms

pick in parallel from m machines; pick the best of the m+1 solutions

E[F(Ŝ)] ≥ (αβ / (α + β)) · F(S*)

With the greedy algorithm on both levels, α = β = 1 − 1/e, so the overall factor is ½ (1 − 1/e).

(Mirzasoleiman et al 2013, de Ponte Barbosa et al 2015; see also Mirrokni & Zadimoghaddam 2015)

SLIDE 81

Questions

  • What if I have more complex constraints?

– matroid constraints – budget constraints

  • Greedy takes O(nk) time. What if n, k are large?

– stochastic – parallel / distributed – filtering, structured, …

  • What if my function is not monotone?
SLIDE 82

Non-monotone functions

monotonicity — if S ⊆ T then F(S) ≤ F(T) — is no longer assumed

still assume: F(S) ≥ 0 for all S

SLIDE 83

Greedy can fail …

F(A) = area( ⋃_{a∈A} area(a) ) − Σ_{a∈A} c(a)

sensor 1: coverage 100, cost −60 → gain 40
sensor 2: coverage 30, cost −1 → gain 29
sensor 3: coverage 30, cost −1 → gain 29
sensor 4: coverage 40, cost −3 → gain 37

greedy (S0 = ∅, S1 = argmax_a F(a)) picks sensor 1: F(A) = 40
optimal solution (the three cheaper sensors): F(A) = 29 + 29 + 37 = 95

SLIDE 84

(same example, animated: greedy's F(A) = 40 vs. the optimal F(A) = 95)

SLIDE 85

Double (bidirectional) greedy

Start: A = ∅, B = V

for i = 1, …, n:  // add element a_i to A, or remove it from B?
  • gain of adding (to A):     Δ+ = [F(A ∪ a_i) − F(A)]+
  • gain of removing (from B): Δ− = [F(B \ a_i) − F(B)]+
  • add with probability P(add) = Δ+ / (Δ+ + Δ−)

Example (sensor 1: coverage 100, cost −60): Δ+ = 40, Δ− = 60, P(add) = 40%

SLIDE 86

Double (bidirectional) greedy

(the element is then added to A or removed from B according to P(add); for sensor 1: Δ+ = 40, Δ− = 60)

SLIDE 87

Double (bidirectional) greedy

(next element, coverage 30, cost −1: Δ+ = 29, Δ− = [−29]+ = 0, so P(add) = 29/29 = 1)

SLIDE 88

Double (bidirectional) greedy

(next element, coverage 40, cost −3: Δ+ = 37, Δ− = 0, so the element is added with probability 1)

SLIDE 89

Double greedy

max_{S⊆V} F(S)

Theorem (Buchbinder, Feldman, Naor, Schwartz '12). F submodular, Sg the solution of double greedy, S* the optimal solution. Then
E[F(Sg)] ≥ ½ F(S*)
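
A sketch of the full sweep (hypothetical F interface; adding on ties, i.e. when both clipped gains are zero, is one common convention):

```python
import random

def double_greedy(F, V, seed=0):
    """Randomized double greedy for non-negative (possibly non-monotone)
    submodular F: grow A from the empty set and shrink B from V in one pass."""
    rng = random.Random(seed)
    A, B = set(), set(V)
    for e in V:
        d_add = max(F(A | {e}) - F(A), 0.0)  # clipped gain of adding e to A
        d_rem = max(F(B - {e}) - F(B), 0.0)  # clipped gain of removing e from B
        p = 1.0 if d_add + d_rem == 0 else d_add / (d_add + d_rem)
        if rng.random() < p:
            A.add(e)
        else:
            B.remove(e)
    return A  # A == B at the end
```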

SLIDE 90

Non-monotone maximization

  • alternatives to double greedy? local search (Feige et al 2007)
  • constraints? possible, but different algorithms
  • distributed algorithms? yes!
    – divide-and-conquer as before (de Ponte Barbosa et al 2015)
    – concurrency control / Hogwild (Pan et al 2014)

SLIDE 91

Submodular maximization: summary

  • many applications: diverse, informative subsets
  • NP-hard, but greedy or local search work well
  • distinguish monotone / non-monotone
  • several constraints possible (monotone and non-monotone)

SLIDE 92

Submodularity and machine learning

  • distributions over labels and sets: log-submodular / log-supermodular probability
    e.g., "attractive" graphical models, determinantal point processes
  • (convex) regularization — submodularity as "discrete convexity"
    e.g., combinatorial sparse estimation
  • submodular phenomena: diffusion processes, covering, rank, connectivity, entropy, economies of scale, summarization, …

submodularity & machine learning!