

Active Regression via Linear-Sample Sparsification

Xue Chen, Eric Price

UT Austin

Agnostic learning

See pairs (x, y) sampled from an unknown distribution.

Guaranteed y ≈ f(x) for some f ∈ F.

Want to find f̂ so that y ≈ f̂(x) on fresh samples.

This work: adversarial error measured in ℓ2. Guaranteed

    E_{x,y}[(y − f(x))²] ≤ σ²

and want

    E_{x,y}[(y − f̂(x))²] ≤ Cσ²

or (equivalently, up to constants in C)

    ‖f − f̂‖²_D := E_x[(f(x) − f̂(x))²] ≤ Cσ²,

where D is the marginal distribution on x.

Agnostic learning of linear spaces

Suppose F is a linear space of functions:

◮ f(x) = αᵀφ(x) for some φ : X → R^d.
◮ Example: univariate degree d − 1 polynomials.

Inner product: ⟨f, g⟩_D := E_x[f(x) g(x)].

[Figure: samples y of a function, the space F, the projection f∗ of y onto F, and the learned f̂.]

Ideal: f∗ = arg min_{f∈F} ‖y − f‖²_D.

Settle for the empirical risk minimizer (ERM):

    f̂ = arg min_{f∈F} ‖y − f‖²_S := (1/m) Σ_{i=1}^m (y_i − f(x_i))².

Idea: with enough samples, the empirical norm ≈ the true norm under D.

◮ Will get ‖f̂ − f∗‖²_D ≤ ε ‖f∗ − y‖²_D.
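To make the ERM concrete, here is a minimal numpy sketch (our illustration, not code from the talk) for the running example of univariate degree-5 polynomials; the noise level and all variable names are assumptions of the demo.

    # Minimal ERM sketch: f_hat = argmin_{f in F} ||y - f||_S^2 over the linear
    # space F = {x -> alpha . phi(x)} with phi(x) = (1, x, ..., x^{d-1}).
    import numpy as np

    rng = np.random.default_rng(0)
    d, m = 6, 200                            # dim(F) (degree-5 polynomials), #samples

    alpha_true = rng.standard_normal(d)      # unknown ground-truth coefficients
    x = rng.uniform(-1.0, 1.0, size=m)       # x_i ~ D (uniform on [-1, 1])
    Phi = np.vander(x, d, increasing=True)   # rows phi(x_i) = (1, x_i, ..., x_i^{d-1})
    y = Phi @ alpha_true + rng.standard_normal(m)   # adversary replaced by sigma = 1 noise

    # ERM = ordinary least squares
    alpha_hat, *_ = np.linalg.lstsq(Phi, y, rcond=None)

    # Monte Carlo estimate of ||f_hat - f||_D^2 on fresh samples from D
    x_test = rng.uniform(-1.0, 1.0, size=100_000)
    err = np.mean((np.vander(x_test, d, increasing=True) @ (alpha_hat - alpha_true)) ** 2)
    print(f"||f_hat - f||_D^2 ~ {err:.4f}")  # roughly sigma^2 * d/m = 0.03 on average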

Agnostic learning of linear spaces: results

[Figure: a degree-5 polynomial fit to noisy samples, σ = 1, x ∈ [−1, 1].]

The (matrix) Chernoff bound depends on

    K := sup_x sup_{f∈F, ‖f‖_D=1} f(x)².

O(K log d + K/ε) samples suffice for agnostic learning [Cohen-Davenport-Leviatan '13, Hsu-Sabato '14]:

◮ Mean-zero noise: ‖f̂ − f∗‖²_D ≤ ε ‖f∗ − y‖²_D
◮ Generic noise: ‖f̂ − f‖²_D ≤ (1 + ε) ‖f − y‖²_D

Also necessary (coupon collector).

How can we avoid the dependence on K?
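For the running example, K can be computed directly. A quick numeric check (ours), using the identity sup_{‖f‖_D=1} f(x)² = Σ_j φ_j(x)² for an orthonormal basis (derived later in this deck), with scaled Legendre polynomials as the orthonormal basis under uniform D on [−1, 1]:

    # K for degree-5 polynomials (d = 6) under uniform D on [-1, 1].
    # phi_j = sqrt(2j+1) * P_j is orthonormal: E_x[P_j(x)^2] = 1/(2j+1).
    import numpy as np
    from numpy.polynomial import legendre

    d = 6
    xs = np.linspace(-1, 1, 100_001)
    Phi = np.stack([np.sqrt(2 * j + 1) * legendre.Legendre.basis(j)(xs)
                    for j in range(d)], axis=1)
    K = (Phi ** 2).sum(axis=1).max()         # sup_x sup_{||f||_D=1} f(x)^2
    print(f"K ~ {K:.1f} = d^2 = {d ** 2}")   # worst points are x = +-1, where
                                             # sum_j (2j+1) P_j(1)^2 = d^2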

Our result: avoid K with more powerful access patterns

With more powerful access models, we can replace

    K := sup_x sup_{f∈F, ‖f‖_D=1} f(x)²    with    κ := E_x sup_{f∈F, ‖f‖_D=1} f(x)².

For linear spaces of functions, κ = d.

Query model:

◮ Can pick x_i of our choice and see y_i ∼ (Y | X = x_i).
◮ Know D (which just defines ‖f − f̂‖_D).

Active learning model:

◮ Receive x_1, . . . , x_m ∼ D.
◮ Pick S ⊂ [m] of size s.
◮ See y_i for i ∈ S.

Some results for non-linear spaces.

Query model: basic approach

ERM needs the empirical norm ‖f‖_S to approximate ‖f‖_D for all f ∈ F.

This takes O(K log d) samples from D.

Improve by biasing samples towards high-variance points:

    D′(x) = (1/κ) · D(x) · sup_{f∈F, ‖f‖_D=1} f(x)².

Estimate the norm via

    ‖f‖²_{S,D′} := (1/m) Σ_{i=1}^m (D(x_i)/D′(x_i)) f(x_i)².

This still equals ‖f‖²_D in expectation, but now the max contribution of any term is κ.

◮ This gives O(κ log d) sample complexity by matrix Chernoff.
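A small self-contained sketch (our illustration) of this estimator, with a fine grid standing in for the continuous domain; the grid size, seed, and basis are assumptions of the demo, not details from the talk.

    # Biased-sampling estimate of ||f||_D^2 for F = degree-5 polynomials,
    # D = uniform on [-1, 1], discretized to a grid of N points.
    import numpy as np
    from numpy.polynomial import legendre

    rng = np.random.default_rng(1)
    d, m, N = 6, 50, 100_000

    grid = np.linspace(-1, 1, N)
    Phi = np.stack([np.sqrt(2 * j + 1) * legendre.Legendre.basis(j)(grid)
                    for j in range(d)], axis=1)   # orthonormal basis under D
    w = (Phi ** 2).sum(axis=1)      # sup_{||f||_D=1} f(x)^2 at each grid point
    kappa = w.mean()                # ~ d
    Dprime = w / w.sum()            # D'(x) = D(x) * w(x) / kappa

    alpha = rng.standard_normal(d)
    f_vals = Phi @ alpha            # some f in F; ||f||_D^2 = ||alpha||_2^2

    idx = rng.choice(N, size=m, p=Dprime)         # x_i ~ D'
    # Reweight by D(x_i)/D'(x_i) = kappa / w(x_i); each term is then at most
    # kappa * ||f||_D^2, which is what matrix Chernoff exploits.
    est = np.mean(kappa / w[idx] * f_vals[idx] ** 2)
    print(f"kappa ~ {kappa:.2f} (= d = {d})")
    print(f"true ||f||_D^2 = {alpha @ alpha:.3f}, estimate = {est:.3f}")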

Bounding κ for linear function spaces

    κ = E_x sup_{f∈F, ‖f‖_D=1} f(x)².

Express f ∈ F via an orthonormal basis: f(x) = Σ_j α_j φ_j(x). Then

    sup_{‖f‖_D=1} f(x)² = sup_{‖α‖₂=1} ⟨α, {φ_j(x)}_{j=1}^d⟩² = Σ_{j=1}^d φ_j(x)².

Hence

    κ = Σ_{j=1}^d E_x[φ_j(x)²] = d.
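Both steps are easy to check numerically (our sketch, using the scaled Legendre basis, which is orthonormal under uniform D on [−1, 1]):

    # (1) sup_{||alpha||_2=1} <alpha, phi(x)>^2 = ||phi(x)||^2 (Cauchy-Schwarz,
    #     witnessed by alpha = phi(x)/||phi(x)||); (2) kappa = E_x ||phi(x)||^2 = d.
    import numpy as np
    from numpy.polynomial import legendre

    rng = np.random.default_rng(2)
    d = 6

    def phi(x):
        return np.stack([np.sqrt(2 * j + 1) * legendre.Legendre.basis(j)(x)
                         for j in range(d)], axis=-1)

    v = phi(0.3)                    # phi(x) at an arbitrary point x = 0.3
    dirs = rng.standard_normal((10_000, d))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    print((dirs @ v).max() ** 2, "<=", v @ v)   # no unit alpha beats ||phi(x)||^2

    x = rng.uniform(-1, 1, size=200_000)
    print("kappa ~", (phi(x) ** 2).sum(axis=1).mean(), "= d =", d)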

Query model: so far

Upsampling x proportional to sup_f f(x)² gets O(d log d) sample complexity.

◮ Essentially the same as leverage score sampling.
◮ Also analogous to Spielman-Srivastava graph sparsification.

Can we bring this down to O(d)?

◮ Not with independent sampling (coupon collector).
◮ Analogous to Batson-Spielman-Srivastava linear-size sparsification.
◮ Yes: using Lee-Sun sparsification.

Mean-zero noise: E[(f̂(x) − f(x))²] ≤ ε E[(y − f(x))²].
Generic noise: E[(f̂(x) − f(x))²] ≤ (1 + ε) E[(y − f(x))²].

Active learning

The query model supposes we know D and can query any point. Active learning:

◮ Get x_1, . . . , x_m ∼ D.
◮ Pick S ⊆ [m] of size s.
◮ Learn y_i for i ∈ S.

Minimize s:

◮ m → ∞ ⇒ we learn D and can query any point ⇒ query model.
◮ Hence s = Θ(d) is optimal.

Minimize m:

◮ Label every point ⇒ agnostic learning.
◮ Hence m = Θ(K log d + K/ε) is optimal.

Our result: both at the same time.

◮ In this talk: mostly the s = O(d log d) version.
◮ Prior work: s = O((d log d)^{5/4}) [Sabato-Munos '14], s = O(d log d) via "volume sampling" [Derezinski-Warmuth-Hsu '18].

Active learning

Warmup: suppose we know D.

We can simulate the query algorithm via rejection sampling:

    Pr[label x_i] = (1/K) · sup_{f∈F, ‖f‖_D=1} f(x_i)².

This just needs s = O(d log d).

The chance each sample gets labeled is

    E_x[Pr[label x_i]] = κ/K = d/K.

Gives m = O(K log d) unlabeled samples and s = O(d log d) labeled samples.
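A sketch (ours) of this warmup for the degree-5 polynomial example. Note the 1/Pr[label] weights in the least-squares fit are our choice of estimator, added so the subsampled objective stays unbiased for ‖y − f‖²_D; the slide itself does not spell this out.

    # Rejection-sampling warmup: label x_i with probability w(x_i)/K, then fit
    # by inverse-probability-weighted least squares.
    import numpy as np
    from numpy.polynomial import legendre

    rng = np.random.default_rng(3)
    d, m = 6, 4000

    def phi(x):   # orthonormal basis under uniform D on [-1, 1]
        return np.stack([np.sqrt(2 * j + 1) * legendre.Legendre.basis(j)(x)
                         for j in range(d)], axis=-1)

    alpha_true = rng.standard_normal(d)
    x = rng.uniform(-1, 1, size=m)               # unlabeled stream, x_i ~ D
    P = phi(x)
    w = (P ** 2).sum(axis=1)                     # sup_{||f||_D=1} f(x_i)^2
    K = d ** 2                                   # K = d^2 for this F and D
    labeled = rng.random(m) < w / K              # Pr[label x_i] = w(x_i)/K
    y = P[labeled] @ alpha_true + rng.standard_normal(labeled.sum())

    sw = np.sqrt(K / w[labeled])                 # sqrt of 1/Pr[label]
    alpha_hat, *_ = np.linalg.lstsq(sw[:, None] * P[labeled], sw * y, rcond=None)

    print(f"labeled {labeled.sum()} of {m} (expect ~ m*kappa/K = {m * d / K:.0f})")
    print(f"||f_hat - f||_D^2 ~ {np.sum((alpha_hat - alpha_true) ** 2):.4f}")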

Active learning without knowing D

We want to perform rejection sampling with

    Pr[label x_i] = (1/K) · sup_{f∈F, ‖f‖_D=1} f(x_i)²,

but we don't know D.

We just need to estimate ‖f‖_D for all f ∈ F.

Matrix Chernoff gets this with m = O(K log d) unlabeled samples.

Gives m = O(K log d) unlabeled samples and s = O(d log d) labeled samples.

Can improve to m = O(K log d), s = O(d).
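For a linear space, "estimate ‖f‖_D for all f ∈ F" concretely means estimating the Gram matrix E_x[φ(x)φ(x)ᵀ] from unlabeled samples; its spectral error bounds the norm distortion for every f simultaneously, which is exactly what matrix Chernoff controls. A minimal sketch (ours), reusing the Legendre setup from above:

    # Empirical Gram matrix from unlabeled samples; the true G = I_d here
    # because the basis is orthonormal under D.
    import numpy as np
    from numpy.polynomial import legendre

    rng = np.random.default_rng(4)
    d, m = 6, 20_000

    x = rng.uniform(-1, 1, size=m)               # unlabeled samples from D
    P = np.stack([np.sqrt(2 * j + 1) * legendre.Legendre.basis(j)(x)
                  for j in range(d)], axis=1)
    G_hat = P.T @ P / m                          # estimates G = E[phi phi^T] = I_d

    # For f = alpha . phi: ||f||_S^2 / ||f||_D^2 = (alpha^T G_hat alpha) / (alpha^T alpha),
    # so the eigenvalues of G_hat bound the distortion uniformly over F.
    eigs = np.linalg.eigvalsh(G_hat)
    print(f"empirical/true norm^2 ratio in [{eigs.min():.3f}, {eigs.max():.3f}]")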

Getting to s = O(d)

Based on Lee-Sun '15.

The O(d log d) comes from coupon collector. Change to non-independent sampling:

◮ x_i ∼ D_i, where D_i depends on x_1, . . . , x_{i−1}.
◮ D_1 = D′, D_2 avoids points near x_1, etc.

Need two properties:

◮ Norms preserved for all functions in the class:

    E_{x∼D}[f(x)²] ≈ Σ_{i=1}^s α_i (D(x_i)/D_i(x_i)) f(x_i)²

◮ Noise variance bounded for every sample:

    α_i · sup_{f∈F, x} ( f(x)² / E_{x′∼D_i}[f(x′)²] ) =: α_i K_{D_i} ≤ ε

Both properties are achievable with Lee-Sun sparsification.

Nonlinear spaces

Consider functions with sparse Fourier representations:

    f(x) = Σ_{j=1}^d v_j e^{2πi f_j x}.

We can pick sample points x ∈ [0, 1], and want to minimize E_{x∈[0,1]}[(f̂(x) − f(x))²].

For noise tolerance, we need the empirical norm ≈ the actual norm.
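A tiny sketch (ours) of such a signal and of the quantity we need to control, the empirical vs. actual norm; the frequency range and sparsity are arbitrary choices for the demo.

    # A d-Fourier-sparse signal with off-grid frequencies, and its empirical
    # energy under uniform samples from [0, 1].
    import numpy as np

    rng = np.random.default_rng(5)
    d, m = 6, 2000
    freqs = rng.uniform(0, 50, size=d)           # arbitrary real frequencies f_j
    v = rng.standard_normal(d) + 1j * rng.standard_normal(d)

    def f(x):
        return np.exp(2j * np.pi * np.outer(np.atleast_1d(x), freqs)) @ v

    x = rng.uniform(0, 1, size=m)
    print("empirical E|f|^2 :", np.mean(np.abs(f(x)) ** 2))
    x_dense = np.linspace(0, 1, 200_000)
    print("actual    E|f|^2 ~", np.mean(np.abs(f(x_dense)) ** 2))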

Estimating the norm in nonlinear spaces

Uniform sampling depends on K = sup_x sup_{f∈F, ‖f‖_D=1} f(x)².

◮ It is unknown exactly what this is for Fourier-sparse signals.
◮ d² ≲ K ≲ d⁴ log³ d. [Chen-Kane-Price-Song '16]

Biasing the samples lets us reduce this to κ = E_x sup_{f∈F, ‖f‖_D=1} f(x)².

◮ d ≲ κ ≲ d log² d.

Analogous to the distinction between the Markov Brothers' inequality and Bernstein's inequality for polynomials.

Proof sketch: κ for Fourier-sparse functions

For any ∆ > 0, consider the degree-d polynomial p(z) = Σ_{i=0}^d β_i zⁱ with roots at e^{2πi f_j ∆} for all j. For any x,

    Σ_{i=0}^d β_i f(x + i∆) = Σ_{j=1}^d v_j e^{2πi f_j x} p(e^{2πi f_j ∆}) = 0.

In particular, for i∗ = arg max_i |β_i|,

    |f(x + i∗∆)| ≤ Σ_{i≠i∗} |f(x + i∆)|.

Hence (with a little more care)

    |f(x)|² ≤ 3 Σ_{i=−2d,…,2d; i≠0} |f(x + i∆)|².

Proof sketch: κ for Fourier-sparse functions

Lemma. If f is d-Fourier-sparse, then for all x and ∆ we have

    |f(x)|² ≤ 3 Σ_{i=−2d,…,2d; i≠0} |f(x + i∆)|².

Suppose D is uniform on [−1, 1]. Then for all x ∈ [−1, 1],

    |f(x)|² ≲ (d log d / (1 − |x|)) E_{x′}[f(x′)²],

by integrating ∆ from 0 to 1 − |x|.

Hence κ = E_x sup_{f∈F, ‖f‖_D=1} |f(x)|² ≲ d log² d.
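The lemma is easy to probe numerically. A sketch (ours) testing it on random d-Fourier-sparse signals; d, the frequency range, and the (x, ∆) distribution are arbitrary demo choices.

    # Check |f(x)|^2 <= 3 * sum_{i=-2d..2d, i!=0} |f(x + i*Delta)|^2 on random
    # Fourier-sparse signals and random (x, Delta).
    import numpy as np

    rng = np.random.default_rng(6)
    d = 4
    shifts = np.concatenate([np.arange(-2 * d, 0), np.arange(1, 2 * d + 1)])

    worst = 0.0
    for _ in range(1000):
        freqs = rng.uniform(-20, 20, size=d)
        v = rng.standard_normal(d) + 1j * rng.standard_normal(d)
        f = lambda t: np.exp(2j * np.pi * np.outer(np.atleast_1d(t), freqs)) @ v
        x, delta = rng.uniform(-1, 1), rng.uniform(1e-3, 0.1)
        rhs = 3 * np.sum(np.abs(f(x + shifts * delta)) ** 2)
        worst = max(worst, np.abs(f(x))[0] ** 2 / rhs)
    print(f"max |f(x)|^2 / RHS over trials: {worst:.3f} (lemma says <= 1)")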

Query/active learning in nonlinear spaces

Biased sampling means ‖f‖_S ≈ ‖f‖_D in O(κ) samples for any single f ∈ F.

Linear spaces: O(κ log d) samples for every f ∈ F by matrix Chernoff.

Sparse Fourier: need to union bound over a net.

◮ The known net size is 2^{Õ(d³)}.
◮ Gives Õ(d³ κ) = Õ(d⁴) queries/labeled samples.
◮ Gives Õ(d³ K) = Õ(d⁷) unlabeled samples.

Conclusions and open questions

Active learning can be optimal in both criteria simultaneously:

◮ O(K log d + K/ε) unlabeled examples.
◮ O(d/ε) labeled examples.

Gets some improvement for Fourier-sparse signals.

◮ Tight results via chaining and/or a better net?

Can we go beyond ℓ2 and linear spaces?

◮ Logistic regression?

Better theory for active learning?

◮ Choose sample points sequentially.
◮ Dynamically changing functions.

Thank You
