Active Learning and Optimized Information Gathering
Lecture 12: Submodularity
CS 101.2, Andreas Krause
2
Announcements
Homework 2: Due Thursday Feb 19 Project milestone due: Feb 24
4 Pages, NIPS format: http://nips.cc/PaperInformation/StyleFiles Should contain preliminary results (model, experiments, proofs, …) as well as timeline for remaining work Come to office hours to discuss projects!
Office hours
Come to office hours before your presentation! Andreas: Monday 3pm-4:30pm, 260 Jorgensen Ryan: Wednesday 4:00-6:00pm, 109 Moore
3
Course outline
1. Online decision making
2. Statistical active learning
3. Combinatorial approaches
4
Medical diagnosis
Want to predict medical condition of patient given noisy symptoms / tests
Body temperature Rash on skin Cough Increased antibodies in blood Abnormal MRI
Treating a healthy patient is bad, not treating a sick patient is terrible Each test has a (potentially different) cost Which tests should we perform to make most effective decisions?
(payoff sketch: no treatment & sick: -$$$; no treatment & healthy: $; treatment & healthy: -$$)
5
Value of information
Value of information: Reward[ P(Y | xi) ] = maxa EU(a | xi). Reward can be any function of the distribution P(Y | xi). Important examples:
Posterior variance of Y Posterior entropy of Y
(figure: Prior P(Y) → observe Xi = xi → Posterior P(Y | xi) → Reward)
6
Optimal value of information
Can we efficiently optimize value of information? Answer depends on properties of the distribution P(X1,…,Xn,Y).
Theorem [Krause & Guestrin IJCAI ’05]: If the random variables form a Markov chain, we can find the optimal (exponentially large!) decision tree in polynomial time ☺
There exists a class of distributions for which we can perform efficient inference (i.e., compute P(Y | Xi)), yet finding the optimal decision tree is NP^PP-hard 
7
Approximating value of information?
If we can’t find an optimal solution, can we find provably near-optimal approximations??
8
Feature selection
Given random variables Y, X1, … Xn Want to predict Y from subset XA = (Xi1,…,Xik) Want k most informative features: A* = argmax IG(XA; Y) s.t. |A| ≤ k where IG(XA; Y) = H(Y) - H(Y | XA)
Y “Sick” X1 “Fever” X2 “Rash” X3 “Male”
Naïve Bayes Model Uncertainty before knowing XA Uncertainty after knowing XA
9
Example: Greedy algorithm for feature selection
Given: finite set V of features, utility function F(A) = IG(XA; Y). Want: A* ⊆ V such that A* = argmax F(A) s.t. |A| ≤ k
NP-hard!
How well can this simple heuristic do?
Greedy algorithm:
Start with A = ∅
For i = 1 to k:
  s* := argmax_s F(A ∪ {s})
  A := A ∪ {s*}
10
Key property: Diminishing returns
New feature X1. With selection A = {}, adding X1 will help a lot; with selection B = {X2, X3}, adding X1 doesn’t help much.
(figure: adding s to the small set A gives a large improvement; adding s to the larger set B gives a small improvement)
Submodularity (diminishing returns): for A ⊆ B, F(A ∪ {s}) – F(A) ≥ F(B ∪ {s}) – F(B)
Theorem [Krause, Guestrin UAI ‘05]: Information gain F(A) in Naïve Bayes models is submodular!
11
Why is submodularity useful?
Theorem [Nemhauser et al ‘78]: The greedy maximization algorithm returns Agreedy with F(Agreedy) ≥ (1 – 1/e) max_{|A| ≤ k} F(A)
Greedy algorithm gives near-optimal solution! For info-gain: Guarantees best possible unless P = NP! [Krause, Guestrin UAI ’05] Submodularity is an incredibly useful and powerful concept!
12
Set functions
Finite set V = {1,2,…,n} Function F: 2V → R Will always assume F(∅) = 0 (w.l.o.g.) Assume black-box that can evaluate F for any input A
Approximate (noisy) evaluation of F is ok
Example: F(A) = IG(XA; Y) = H(Y) – H(Y | XA) = ∑_{y,xA} P(xA) P(y | xA) [log P(y | xA) – log P(y)]
13
Submodular set functions
Set function F on V is called submodular if for all A, B ⊆ V: F(A) + F(B) ≥ F(A∪B) + F(A∩B)
Equivalent diminishing-returns characterization: for A ⊆ B, s ∉ B, F(A ∪ {s}) – F(A) ≥ F(B ∪ {s}) – F(B)
(figure: adding s to the small set A gives a large improvement; adding s to the larger set B gives a small improvement; Venn diagram of A, B, A∪B, A∩B)
14
Submodularity and supermodularity
Set function F on V is called submodular if 1) for all A, B ⊆ V: F(A)+F(B) ≥ F(A∪B)+F(A∩B), or equivalently 2) for all A ⊆ B, s ∉ B: F(A ∪ {s}) – F(A) ≥ F(B ∪ {s}) – F(B). F is called supermodular if –F is submodular. F is called modular if F is both sub- and supermodular; for modular (“additive”) F, F(A) = ∑_{i∈A} w(i)
15
Example: Set cover
Each node (sensor) predicts values of positions within some radius.
For A ⊆ V: F(A) = “area covered by sensors placed at A”
Formally: W finite set, collection of n subsets Si ⊆ W. For A ⊆ V = {1,…,n} define F(A) = |∪_{i∈A} Si|
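The coverage function above can be sketched in a few lines. The regions below are hypothetical sensor footprints, purely for illustration; the final assertions check the diminishing-returns property on this instance.

```python
# A minimal sketch of the coverage function F(A) = |union of S_i for i in A|.
# The regions S_1..S_3 over W = {1,...,6} are made up for illustration.

def coverage(A, regions):
    """F(A) = number of ground-set elements covered by the subsets indexed by A."""
    covered = set()
    for i in A:
        covered |= regions[i]
    return len(covered)

regions = {1: {1, 2, 3}, 2: {3, 4}, 3: {4, 5, 6}}

# Diminishing returns: adding sensor 2 helps the empty set at least as much
# as it helps the larger set {1, 3}.
gain_small = coverage({2}, regions) - coverage(set(), regions)         # gain at A = {}
gain_large = coverage({1, 2, 3}, regions) - coverage({1, 3}, regions)  # gain at B = {1, 3}
assert gain_small >= gain_large
```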
16
Set cover is submodular
(figure: for A ⊆ B, the area a new sensor s adds to A’s coverage is at least the area it adds to B’s: F(A∪{s}) – F(A) ≥ F(B∪{s}) – F(B))
17
Example: Mutual information
Given random variables X1,…,Xn, F(A) = I(XA; XV\A) = H(XV\A) – H(XV\A | XA)
Lemma: Mutual information F(A) is submodular.
F(A ∪ {s}) – F(A) = H(Xs | XA) – H(Xs | XV\(A∪{s}))
δs(A) = F(A∪{s}) – F(A) is monotonically nonincreasing ⇒ F submodular ☺
18
Example: Influence in social networks
[Kempe, Kleinberg, Tardos KDD ’03]
Who should get free cell phones?
V = {Alice,Bob,Charlie,Dorothy,Eric,Fiona} F(A) = Expected number of people influenced when targeting A
(figure: network over Alice, Bob, Charlie, Dorothy, Eric, Fiona with edge labels giving the probability of influencing)
19
Influence in social networks is submodular
[Kempe, Kleinberg, Tardos KDD ’03]
(figure: the same network over Alice, Bob, Charlie, Dorothy, Eric, Fiona)
Key idea: Flip coins c in advance → “live” edges. Fc(A) = people influenced under outcome c (set cover!). F(A) = ∑c P(c) Fc(A) is submodular as well!
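The live-edge idea can be sketched as follows: sample each edge’s coin in advance, count the nodes reachable from the seed set over live edges (a set-cover-like function), and average over sampled outcomes. The graph, probabilities, and function names below are hypothetical, and `influence` is a Monte-Carlo estimate rather than an exact computation.

```python
import random

def reachable(seeds, live_edges):
    """Nodes reachable from the seed set via live (directed) edges."""
    frontier, seen = list(seeds), set(seeds)
    while frontier:
        u = frontier.pop()
        for v in live_edges.get(u, ()):
            if v not in seen:
                seen.add(v)
                frontier.append(v)
    return seen

def influence(seeds, edges, num_samples=1000, seed=0):
    """Monte-Carlo estimate of F(A) = expected number of influenced nodes."""
    rng = random.Random(seed)
    total = 0
    for _ in range(num_samples):
        live = {}
        for (u, v), p in edges.items():   # flip each edge's coin in advance
            if rng.random() < p:
                live.setdefault(u, []).append(v)
        total += len(reachable(seeds, live))
    return total / num_samples

# Hypothetical influence probabilities between the people on the slide.
edges = {("Alice", "Bob"): 0.5, ("Bob", "Charlie"): 0.3, ("Charlie", "Dorothy"): 0.5}
print(influence({"Alice"}, edges))
```

Because each Fc is a coverage-style function, each sampled outcome is submodular, and the average inherits submodularity (see the closedness slide below).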
20
Closedness properties
F1,…,Fm submodular functions on V and λ1,…,λm > 0 Then: F(A) = ∑i λi Fi(A) is submodular! Submodularity closed under nonnegative linear combinations! Extremely useful fact!!
Fθ(A) submodular ⇒ ∑θ P(θ) Fθ(A) submodular! Multicriterion optimization: F1,…,Fm submodular, λi≥0 ⇒ ∑i λi Fi(A) submodular
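The closedness property can be checked numerically on a toy instance: a nonnegative weighted sum of two coverage (submodular) functions still satisfies diminishing returns. The sets and weights below are hypothetical; the loop verifies the inequality exhaustively over this small ground set.

```python
from itertools import combinations

def cov(A, regions):
    """Coverage function: size of the union of the selected regions."""
    u = set()
    for i in A:
        u |= regions[i]
    return len(u)

# Two hypothetical coverage functions and nonnegative weights.
R1 = {1: {1, 2}, 2: {2, 3}, 3: {3, 4}}
R2 = {1: {5}, 2: {5, 6}, 3: {6, 7}}
F = lambda A: 2.0 * cov(A, R1) + 0.5 * cov(A, R2)

V = {1, 2, 3}
# Check F(A ∪ {s}) - F(A) >= F(B ∪ {s}) - F(B) for all A ⊆ B and s ∉ B.
for rb in range(len(V) + 1):
    for B in map(set, combinations(sorted(V), rb)):
        for ra in range(len(B) + 1):
            for A in map(set, combinations(sorted(B), ra)):
                for s in V - B:
                    assert F(A | {s}) - F(A) >= F(B | {s}) - F(B)
print("diminishing returns holds for the weighted sum")
```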
21
Submodularity and Concavity
Suppose g: N → R and F(A) = g(|A|) Then F(A) submodular if and only if g concave!
E.g., g could say “buying in bulk is cheaper”
22
Maximum of submodular functions
Suppose F1(A) and F2(A) submodular. Is F(A) = max(F1(A),F2(A)) submodular?
max(F1,F2) not submodular in general!
23
Minimum of submodular functions
Well, maybe F(A) = min(F1(A),F2(A)) instead?
Counterexample (F1, F2 both modular, hence submodular):
  A        F1(A)   F2(A)   F(A) = min(F1, F2)
  ∅        0       0       0
  {a}      1       0       0
  {b}      0       1       0
  {a,b}    1       1       1
F({a}) – F(∅) = 0 < 1 = F({a,b}) – F({b}), so min(F1, F2) is not submodular in general!
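The counterexample can be checked numerically. F1 and F2 below are modular indicator-style functions on V = {a, b}; the diminishing-returns inequality fails for the minimum.

```python
# The slide's counterexample: F1 and F2 are modular (hence submodular),
# but F = min(F1, F2) violates diminishing returns.

def F1(A):  # 1 iff 'a' is selected
    return 1 if "a" in A else 0

def F2(A):  # 1 iff 'b' is selected
    return 1 if "b" in A else 0

F = lambda A: min(F1(A), F2(A))

# Take A = ∅ ⊆ B = {'b'} and s = 'a':
gain_at_A = F({"a"}) - F(set())        # 0
gain_at_B = F({"a", "b"}) - F({"b"})   # 1
assert gain_at_A < gain_at_B  # diminishing returns fails => not submodular
```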
24
Maximizing submodular functions
Minimizing convex functions:
Polynomial time solvable!
Minimizing submodular functions:
Polynomial time solvable!
Maximizing convex functions:
NP hard!
Maximizing submodular functions:
NP hard!
But can get approximation guarantees ☺
25
Maximizing influence
[Kempe, Kleinberg, Tardos KDD ’03] F(A) = Expected #people influenced when targeting A F monotonic: If A ⊆ B: F(A) ≤ F(B) Hence V = argmaxA F(A)
More interesting: argmaxA F(A) – Cost(A)
(figure: network over Alice, Bob, Charlie, Dorothy, Eric, Fiona with edge influence probabilities 0.5, 0.3, 0.5, 0.4, 0.2, 0.2, 0.5)
26
Maximizing non-monotonic functions
Suppose that, for non-monotonic F, we want A* = argmax F(A) s.t. A ⊆ V. Example:
F(A) = U(A) – C(A) where U(A) is submodular utility, and C(A) is supermodular cost function
In general: NP-hard. Moreover, if F(A) can take negative values, it is as hard to approximate as maximum independent set (i.e., NP-hard to obtain an O(n^{1-ε}) approximation)
27
Maximizing positive submodular functions
Theorem [Feige, Mirrokni, Vondrak FOCS ’07]: There is an efficient randomized local search procedure that, given a positive submodular function F with F(∅) = 0, returns a set ALS such that F(ALS) ≥ (2/5) maxA F(A).
Picking a random set gives a ¼ approximation (½ approximation if F is symmetric!). We cannot get better than a ¾ approximation unless P = NP.
28
Scalarization vs. constrained maximization
Given monotonic utility F(A) and cost C(A), optimize:
Option 1 (“scalarization”): maxA F(A) – C(A) s.t. A ⊆ V — can get a 2/5 approximation… if F(A) – C(A) ≥ 0 for all A ⊆ V. Positiveness is a strong requirement!
Option 2 (“constrained maximization”): maxA F(A) s.t. C(A) ≤ B — coming up…
29
Constrained maximization: Outline
maxA F(A) s.t. C(A) ≤ B, where A is the selected set, F is monotonic submodular, C(A) is the selection cost, and B is the budget.
Subset selection: C(A) = |A|. Also coming: robust optimization, complex constraints.
30
Monotonicity
A set function is called monotonic if A ⊆ B ⊆V ⇒ F(A) ≤ F(B) Examples:
Influence in social networks [Kempe et al KDD ’03]
For discrete RVs, entropy F(A) = H(XA) is monotonic: suppose B = A ∪ C; then F(B) = H(XA, XC) = H(XA) + H(XC | XA) ≥ H(XA) = F(A)
Information gain: F(A) = H(Y) – H(Y | XA)
Set cover
Matroid rank functions (dimension of vector spaces, …)
…
31
Subset selection
Given: finite set V, monotonic submodular function F, F(∅) = 0. Want: A* ⊆ V such that A* = argmax F(A) s.t. |A| ≤ k
NP-hard!
32
Exact maximization of monotonic submodular functions
1) Mixed integer programming [Nemhauser et al ’81]:
   max η
   s.t. η ≤ F(B) + ∑_{s ∈ V\B} αs δs(B)  for all B ⊆ V
        ∑s αs ≤ k,  αs ∈ {0,1},  where δs(B) = F(B ∪ {s}) – F(B)
   Solved using constraint generation.
2) Branch-and-bound: “Data-correcting algorithm” [Goldengorin et al ’99]
Both algorithms are worst-case exponential!
33
Approximate maximization
Given: finite set V, monotonic submodular function F(A). Want: A* ⊆ V such that A* = argmax F(A) s.t. |A| ≤ k
NP-hard! Greedy algorithm:
  Start with A0 = ∅
  For i = 1 to k:
    si := argmax_s F(Ai-1 ∪ {s}) – F(Ai-1)
    Ai := Ai-1 ∪ {si}
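The greedy loop above can be sketched for any monotonic submodular F given as a black box. The demonstration instance is a hypothetical coverage function; the helper names are made up for illustration.

```python
# A minimal sketch of the greedy algorithm: in each of k rounds, add the
# element with the largest marginal gain F(A ∪ {s}) - F(A).

def greedy(F, V, k):
    A = set()
    for _ in range(k):
        s_best = max(V - A, key=lambda s: F(A | {s}) - F(A))
        A.add(s_best)
    return A

# Hypothetical coverage instance: F(A) = size of the union of selected regions.
regions = {1: {1, 2, 3}, 2: {3, 4}, 3: {4, 5, 6, 7}, 4: {1, 7}}
F = lambda A: len(set().union(*[regions[i] for i in A]))

print(greedy(F, set(regions), 2))
```

On this instance greedy first takes region 3 (covering 4 elements), then region 1 (adding 3 more), which is also optimal here.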
34
Performance of greedy algorithm
Theorem [Nemhauser et al ‘78]: Given a monotonic submodular function F with F(∅) = 0, the greedy maximization algorithm returns Agreedy with F(Agreedy) ≥ (1 – 1/e) max_{|A| ≤ k} F(A)
Sidenote: The greedy algorithm gives a 1/2 approximation for maximization over any matroid C! [Fisher et al ’78]
35
Example: Submodularity of info-gain
Y1,…,Ym, X1,…,Xn discrete RVs. F(A) = IG(Y; XA) = H(Y) – H(Y | XA). F(A) is always monotonic, but NOT always submodular.
Theorem [Krause & Guestrin UAI ’05]: If the Xi are all conditionally independent given Y, then F(A) is submodular!
Hence, the greedy algorithm works! In fact, NO algorithm can do better than a (1 – 1/e) approximation!
36
Building a Sensing Chair
[Mutlu, Krause, Forlizzi, Guestrin, Hodgins UIST ‘07]
People sit a lot: activity recognition in assistive technologies; seating pressure as user interface.
Equipped with 1 sensor per cm²! Costs $16,000! 82% accuracy on 10 postures! [Tan et al]
Can we get similar accuracy with fewer, cheaper sensors?
(figure: postures — lean forward, slouch, lean left)
37
How to place sensors on a chair?
Sensor readings at locations V as random variables. Predict posture Y using probabilistic model P(Y, V). Pick sensor locations A* ⊆ V to minimize the entropy H(Y | XA).
Placed sensors, did a user study:
          Cost      Accuracy
  Before  $16,000   82%
  After   $100      79%  ☺
Similar accuracy at <1% of the cost!
38
Variance reduction
(a.k.a. Orthogonal matching pursuit, Forward Regression)
Let Y = ∑i αi Xi + ε, where (X1,…,Xn,ε) ∼ N(µ, Σ). Want to pick a subset XA to predict Y.
Var(Y | XA = xA): conditional variance of Y given XA = xA
Expected variance: Var(Y | XA) = ∫ p(xA) Var(Y | XA = xA) dxA
Variance reduction: FV(A) = Var(Y) – Var(Y | XA). FV(A) is always monotonic.
Theorem [Das & Kempe, STOC ’08]: FV(A) is submodular*
*under some conditions on Σ
Orthogonal matching pursuit is near optimal!
[see other analyses by Tropp, Donoho et al., and Temlyakov]
39
Monitoring water networks
[Krause et al, J Wat Res Mgt 2008]
Contamination of drinking water could affect millions of people
Contamination
Place sensors to detect contaminations “Battle of the Water Sensor Networks” competition
Where should we place sensors to quickly detect contamination?
(figure: water flow simulator from EPA; Hach sensor, ~$14K)
40
Model-based sensing
Utility of placing sensors based on model of the world
For water networks: Water flow simulator from EPA
F(A)=Expected impact reduction placing sensors at A
(figure: model predicts high-impact and low-impact contamination locations; a sensor reduces impact through early detection — e.g., high impact reduction F(A) = 0.9 vs. low impact reduction F(A) = 0.01)
Set V of all network junctions.
Theorem [Krause et al., J Wat Res Mgt ’08]: Impact reduction F(A) in water networks is submodular!
41
Battle of the Water Sensor Networks Competition
Real metropolitan area network (12,527 nodes) Water flow simulator provided by EPA 3.6 million contamination events Multiple objectives:
Detection time, affected population, …
Place sensors that detect well “on average”
42
Bounds on optimal solution
[Krause et al., J Wat Res Mgt ’08]
(1-1/e) bound quite loose… can we get better bounds?
(plot: population protected F(A) — higher is better — vs. number of sensors placed, on water networks data; the greedy solution is shown against the offline (Nemhauser) bound)
43
Data dependent bounds
[Minoux ’78] Suppose A is a candidate solution to argmax F(A) s.t. |A| ≤ k, and let A* = {s1,…,sk} be an optimal solution. For each s ∈ V\A, let δs = F(A ∪ {s}) – F(A), ordered such that δ1 ≥ δ2 ≥ … ≥ δn.
Then: F(A*) ≤ F(A) + ∑_{i=1}^{k} δi
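The data-dependent bound above is easy to compute from black-box evaluations of F: sort the marginal gains at the candidate A and add the k largest to F(A). The coverage instance and function names below are hypothetical.

```python
# Sketch of the Minoux-style data-dependent bound:
# F(A*) <= F(A) + sum of the k largest marginal gains delta_s at A.

def data_dependent_bound(F, V, A, k):
    deltas = sorted((F(A | {s}) - F(A) for s in V - A), reverse=True)
    return F(A) + sum(deltas[:k])

# Hypothetical coverage instance.
regions = {1: {1, 2}, 2: {2, 3}, 3: {4, 5}, 4: {1, 5}}
F = lambda A: len(set().union(*[regions[i] for i in A]))
V = set(regions)

A = {1}  # some candidate solution for k = 2
bound = data_dependent_bound(F, V, A, 2)
# The true optimum for k = 2 on this instance is 4 (e.g. F({1, 3}) = 4),
# and the bound must dominate it.
assert bound >= 4
print(bound)
```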
44
Bounds on optimal solution
[Krause et al., J Wat Res Mgt ’08]
Submodularity gives data-dependent bounds on the performance of any algorithm
(plot: population protected vs. number of sensors placed, water networks data; the data-dependent bound is much tighter than the offline (Nemhauser) bound on the greedy solution)
45
BWSN Competition results
[Ostfeld et al., J Wat Res Mgt 2008]
13 participants Performance measured in 30 different criteria
(chart: ranking of the 13 entries, each classified by approach type per the legend below)
E E D D G G G G G H H H
G: Genetic algorithm H: Other heuristic D: Domain knowledge E: “Exact” method (MIP)
24% better performance than runner-up! ☺ ☺ ☺ ☺
46
Simulated all 3.6M contaminations in 2 weeks on 40 processors; 152 GB of data on disk, 16 GB in main memory (compressed). Very accurate computation of F(A), but very slow evaluation of F(A): 30 hours/20 sensors, 6 weeks for all 30 settings 
(plot: running time (minutes) vs. number of sensors selected — exhaustive search (all subsets) vs. naive greedy)
What was the trick?
Submodularity to the rescue!
47
Scaling up greedy algorithm
[Minoux ’78] In round i+1, having picked Ai = {s1,…,si}, pick si+1 = argmax_s F(Ai ∪ {s}) – F(Ai), i.e., maximize the “marginal benefit” δs(Ai) = F(Ai ∪ {s}) – F(Ai).
Key observation: Submodularity implies i ≤ j ⇒ δs(Ai) ≥ δs(Aj) — marginal benefits can never increase! In particular, δs(Ai) ≥ δs(Ai+1).
48
“Lazy” greedy algorithm
[Minoux ’78] Lazy greedy algorithm:
  First iteration as usual.
  Keep an ordered list of marginal benefits δi from the previous iteration.
  Re-evaluate δi only for the top element.
  If δi stays on top, use it; otherwise, re-sort.
(figure: priority list over elements a–e, re-sorted as cached benefits δs(A) go stale)
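The lazy greedy idea above can be sketched with a max-heap of cached gains: by submodularity a cached gain is always an upper bound, so only the top entry ever needs re-evaluation. The coverage instance and the stamp bookkeeping below are illustrative choices, not the paper’s exact pseudocode.

```python
import heapq

def lazy_greedy(F, V, k):
    """Greedy with lazy evaluations: re-check only the top cached gain."""
    A = set()
    # Max-heap of (-gain, element, round_when_gain_was_computed).
    heap = [(-F({s}), s, 0) for s in V]   # initial gains w.r.t. A = {} (F({}) = 0)
    heapq.heapify(heap)
    for it in range(1, k + 1):
        while True:
            neg_gain, s, stamp = heapq.heappop(heap)
            if stamp == it:
                # Gain is fresh for this round; since all other cached gains
                # are upper bounds (submodularity), s is the true argmax.
                A.add(s)
                break
            # Stale: recompute the marginal gain w.r.t. the current A, re-push.
            heapq.heappush(heap, (-(F(A | {s}) - F(A)), s, it))
    return A

# Hypothetical coverage instance.
regions = {1: {1, 2, 3}, 2: {3, 4}, 3: {4, 5, 6, 7}, 4: {1, 7}}
F = lambda A: len(set().union(*[regions[i] for i in A]))
print(lazy_greedy(F, set(regions), 2))
```

The output matches plain greedy; the payoff is that most elements are never re-evaluated when F is expensive.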
Note: Very easy to compute online bounds, lazy evaluations, etc. [Leskovec et al. ’07]
49
Simulated all 3.6M contaminations in 2 weeks on 40 processors; 152 GB of data on disk, 16 GB in main memory (compressed). Very accurate (but very slow) evaluation of F(A): naive greedy took 30 hours/20 sensors. Using “lazy evaluations”: 1 hour/20 sensors — done after 2 days! ☺
(plot: running time (minutes) vs. number of sensors selected — exhaustive search (all subsets) vs. naive greedy vs. fast greedy)
Submodularity to the rescue: the result of lazy evaluation!
50
What about worst-case?
[Krause et al., NIPS ’07]
(figure: knowing the sensor locations, an adversary contaminates here; the shown placement detects well on “average-case” (accidental) contaminations)
Two placements can have very different average-case impact yet the same worst-case impact. Where should we place sensors to quickly detect in the worst case?