Active Regression via Linear-Sample Sparsification

Xue Chen, Eric Price (UT Austin)

Agnostic learning

  1. Agnostic learning of linear spaces: results
  [Figure: fit of a degree-5 polynomial, σ = 1, x ∈ [−1, 1].]
  The (matrix) Chernoff bound depends on $K := \sup_x \sup_{f \in \mathcal{F},\, \|f\|_D = 1} f(x)^2$.
  $O(K \log d + K/\epsilon)$ samples suffice for agnostic learning [Cohen-Davenport-Leviatan '13, Hsu-Sabato '14]:
  ◮ Mean-zero noise: $\|\hat{f} - f^*\|_D^2 \le \epsilon\, \|f^* - y\|_D^2$
  ◮ Generic noise: $\|\hat{f} - f^*\|_D^2 \le (1 + \epsilon)\, \|f^* - y\|_D^2$
  This many samples is also necessary (coupon collector). How can we avoid the dependence on K?
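
For reference, here is a minimal statement of the setup these bounds refer to, reconstructed from the rest of the deck (the earlier setup slides are not in this transcript, so the exact phrasing is an assumption):

    % F is a d-dimensional linear space of functions, D a distribution over x,
    % and labels are drawn from y ~ (Y | X = x), possibly noisy or adversarial.
    \|f\|_D^2 := \mathbb{E}_{x \sim D}\big[f(x)^2\big], \qquad
    \|f - y\|_D^2 := \mathbb{E}_{x \sim D,\, y \sim (Y \mid X = x)}\big[(f(x) - y)^2\big],
    \qquad f^* := \arg\min_{f \in \mathcal{F}} \|f - y\|_D^2 .
    % The learner outputs \hat{f} in F; its error \|\hat{f} - f^*\|_D^2 is measured
    % against the optimal error \|f^* - y\|_D^2.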

  2. Our result: avoid K with more powerful access patterns
  With more powerful access models, we can replace
  $K := \sup_x \sup_{f \in \mathcal{F},\, \|f\|_D = 1} f(x)^2$
  with
  $\kappa := \mathbb{E}_x \sup_{f \in \mathcal{F},\, \|f\|_D = 1} f(x)^2$.
  For linear spaces of functions, κ = d.
  Query model:
  ◮ Can pick x_i of our choice, see y_i ∼ (Y | X = x_i).
  ◮ Know D (which just defines $\|f - \hat{f}\|_D$).
  Active learning model:
  ◮ Receive x_1, ..., x_m ∼ D
  ◮ Pick S ⊂ [m] of size s
  ◮ See y_i for i ∈ S.
  Some results for non-linear spaces.
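
To make the gap between K and κ concrete, here is a small numerical check (not from the talk) for the figure's example: degree-5 polynomials under the uniform distribution on [−1, 1], in the normalized Legendre basis. The grid, basis choice, and names are illustrative assumptions.

    import numpy as np
    from numpy.polynomial import legendre

    # Degree-5 polynomials on [-1, 1] with D = Uniform[-1, 1], so d = 6.
    # phi_j(x) = sqrt(2j + 1) * P_j(x) satisfies E_D[phi_i(x) phi_j(x)] = 1{i = j}.
    d = 6
    xs = np.linspace(-1.0, 1.0, 200_001)   # dense grid standing in for x
    phi = np.stack([np.sqrt(2 * j + 1) * legendre.Legendre.basis(j)(xs)
                    for j in range(d)])

    # sup_{||f||_D = 1} f(x)^2 = sum_j phi_j(x)^2  (derived two slides below)
    leverage = (phi ** 2).sum(axis=0)

    K = leverage.max()        # worst case over x, attained at x = +/-1: equals d^2 = 36
    kappa = leverage.mean()   # average over D: equals d = 6 (up to grid error)
    print(f"d = {d}, K ~ {K:.1f}, kappa ~ {kappa:.2f}")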

  3. Query model: basic approach
  ERM needs the empirical norm ‖f‖_S to approximate ‖f‖_D for all f ∈ F.
  This takes O(K log d) samples from D.
  Improved by biasing samples towards high-variance points:
  $D'(x) = \frac{1}{\kappa} \Big( \sup_{f \in \mathcal{F},\, \|f\|_D = 1} f(x)^2 \Big)\, D(x)$
  Estimate the norm via
  $\|f\|_{S, D'}^2 := \frac{1}{m} \sum_{i=1}^{m} \frac{D(x_i)}{D'(x_i)}\, f(x_i)^2$
  This still equals $\|f\|_D^2$ in expectation, but now the max contribution is κ.
  ◮ This gives O(κ log d) sample complexity by Matrix Chernoff.
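
A sketch of what this biased-sampling estimator looks like for least squares over a finite pool of candidate points. The setup (a feature matrix phi for the support of D, taken to be uniform, and a label oracle query_y) is a hypothetical instantiation, not the talk's; the reweighting D(x_i)/D'(x_i) is the one from the slide.

    import numpy as np

    def query_model_least_squares(phi, query_y, n_queries, rng=None):
        """phi: (N, d) array, phi[i] = basis values at the i-th support point of D
        (D assumed uniform over these N points); query_y(i) returns a noisy label.
        Samples from D'(x) ~ sup_{||f||_D=1} f(x)^2 * D(x), reweights, and solves
        weighted least squares. Returns coefficients alpha, so f_hat(x) = phi(x) @ alpha."""
        rng = rng or np.random.default_rng(0)
        N, d = phi.shape
        # Leverage function: sup_{||f||_D=1} f(x)^2 = phi(x)^T Sigma^{-1} phi(x),
        # with Sigma = E_D[phi phi^T] (equals sum_j phi_j(x)^2 for an orthonormal basis).
        Sigma = phi.T @ phi / N
        lev = np.einsum('ij,jk,ik->i', phi, np.linalg.inv(Sigma), phi)
        Dprime = lev / lev.sum()                        # biased distribution D'
        idx = rng.choice(N, size=n_queries, p=Dprime)   # x_i ~ D'
        w = (1.0 / N) / Dprime[idx]                     # w_i = D(x_i) / D'(x_i)
        y = np.array([query_y(i) for i in idx])
        sw = np.sqrt(w)
        # min_alpha sum_i w_i (phi(x_i) @ alpha - y_i)^2
        alpha, *_ = np.linalg.lstsq(phi[idx] * sw[:, None], y * sw, rcond=None)
        return alpha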

  4. Bounding κ for linear function spaces
  $\kappa = \mathbb{E}_x \sup_{f \in \mathcal{F},\, \|f\|_D = 1} f(x)^2$
  Express f ∈ F via an orthonormal basis: $f(x) = \sum_j \alpha_j \phi_j(x)$.
  Then
  $\sup_{\|f\|_D = 1} f(x)^2 = \sup_{\|\alpha\|_2 = 1} \big\langle \alpha, \{\phi_j(x)\}_{j=1}^d \big\rangle^2 = \sum_{j=1}^{d} \phi_j(x)^2 .$
  Hence
  $\kappa = \mathbb{E}_x \sum_{j=1}^{d} \phi_j(x)^2 = d .$
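
The middle equality hides a one-line argument; spelled out (standard Cauchy–Schwarz, not shown on the slide):

    \sup_{\|\alpha\|_2 = 1} \Big\langle \alpha,\, \{\phi_j(x)\}_{j=1}^{d} \Big\rangle^2
      \;=\; \Big\| \{\phi_j(x)\}_{j=1}^{d} \Big\|_2^2
      \;=\; \sum_{j=1}^{d} \phi_j(x)^2 ,
    % attained at alpha proportional to (phi_1(x), ..., phi_d(x)).
    % Taking E_{x ~ D} and using orthonormality E[phi_j(x)^2] = 1 then gives kappa = d.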

  5. Query model: so far
  Upsampling x proportional to $\sup_f f(x)^2$ gets O(d log d) sample complexity.
  ◮ Essentially the same as leverage score sampling.
  ◮ Also analogous to Spielman-Srivastava graph sparsification.
  Can we bring this down to O(d)?
  ◮ Not with independent sampling (coupon collector).
  ◮ Analogous to Batson-Spielman-Srivastava linear-size sparsification.
  ◮ Yes – using Lee-Sun sparsification.
  Mean-zero noise: $\mathbb{E}[(\hat{f}(x) - f^*(x))^2] \le \epsilon\, \mathbb{E}[(y - f^*(x))^2]$.
  Generic noise: $\mathbb{E}[(\hat{f}(x) - f^*(x))^2] \le (1 + \epsilon)\, \mathbb{E}[(y - f^*(x))^2]$.
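
For context, the kind of guarantee the sparsification results (Batson–Spielman–Srivastava, Lee–Sun) provide, phrased for function spaces; the ε-dependence below is the generic spectral-sparsification statement, an assumption here rather than the talk's specific theorem:

    % There exist s = O(d / \epsilon^2) points x_1, ..., x_s and weights w_1, ..., w_s > 0,
    % chosen non-independently, such that for every f in F:
    (1 - \epsilon)\, \|f\|_D^2 \;\le\; \sum_{i=1}^{s} w_i\, f(x_i)^2 \;\le\; (1 + \epsilon)\, \|f\|_D^2 .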

  6. Active learning
  The query model supposes we know D and can query any point.
  Active learning:
  ◮ Get x_1, ..., x_m ∼ D.
  ◮ Pick S ⊆ [m] of size s.
  ◮ Learn y_i for i ∈ S.
  Minimize s:
  ◮ m → ∞ ⇒ learn D and can query any point ⇒ query model.
  ◮ Hence s = Θ(d) is optimal.
  Minimize m:
  ◮ Label every point ⇒ agnostic learning.
  ◮ Hence m = Θ(K log d + K/ε) is optimal.
  Our result: both at the same time.
  ◮ In this talk: mostly the s = O(d log d) version.
  ◮ Prior work: s = O((d log d)^{5/4}) [Sabato-Munos '14]; s = O(d log d) via "volume sampling" [Derezinski-Warmuth-Hsu '18].

  7. Active learning
  Warmup: suppose we know D.
  Can simulate the query algorithm via rejection sampling:
  $\Pr[\text{Label } x_i] = \frac{1}{K} \sup_{f \in \mathcal{F},\, \|f\|_D = 1} f(x_i)^2 .$
  Just needs s = O(d log d).
  The chance each sample gets labeled is
  $\mathbb{E}_x\big[\Pr[\text{Label } x_i]\big] = \frac{\kappa}{K} = \frac{d}{K} .$
  Gives m = O(K log d) unlabeled samples, s = O(d log d) labeled samples.
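
A minimal sketch of this rejection step, assuming the leverage function sup_{‖f‖_D=1} f(x)² is available as a callable (it is computable once D is known) and K upper-bounds it; the names are illustrative.

    import numpy as np

    def rejection_label_indices(xs, leverage, K, rng=None):
        """xs: points x_1, ..., x_m drawn i.i.d. from D.
        leverage(x) = sup_{||f||_D = 1} f(x)^2, and K = sup_x leverage(x).
        Keeps index i for labeling with probability leverage(x_i)/K, as on the slide;
        the expected fraction kept is kappa / K = d / K."""
        rng = rng or np.random.default_rng(0)
        return [i for i, x in enumerate(xs) if rng.random() < leverage(x) / K]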

  8. Active learning without knowing D
  Want to perform rejection sampling with
  $\Pr[\text{Label } x_i] = \frac{1}{K} \sup_{f \in \mathcal{F},\, \|f\|_D = 1} f(x_i)^2 ,$
  but we don't know D.
  Just need to estimate ‖f‖_D for all f ∈ F.
  Matrix Chernoff gets this with m = O(K log d) unlabeled samples.
  Gives m = O(K log d) unlabeled samples, s = O(d log d) labeled samples.
  Can improve to m = O(K log d), s = O(d).
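
A sketch of the fix when D is unknown, under the same hypothetical finite-feature setup as before: form the empirical second-moment matrix from the m unlabeled draws (this is where matrix Chernoff and m = O(K log d) enter), then use empirical leverage scores to drive the rejection step above.

    import numpy as np

    def empirical_leverage(phi_unlabeled):
        """phi_unlabeled: (m, d) array, row i = basis values phi(x_i) at unlabeled x_i ~ D.
        Sigma_hat = (1/m) sum_i phi(x_i) phi(x_i)^T spectrally approximates E_D[phi phi^T]
        once m = O(K log d) (matrix Chernoff). Returns phi(x_i)^T Sigma_hat^{-1} phi(x_i)
        for each i, a stand-in for sup_{||f||_D = 1} f(x_i)^2."""
        m, d = phi_unlabeled.shape
        Sigma_hat = phi_unlabeled.T @ phi_unlabeled / m
        return np.einsum('ij,jk,ik->i', phi_unlabeled,
                         np.linalg.inv(Sigma_hat), phi_unlabeled)

    # These scores replace the exact leverage in the rejection step above
    # (Pr[label x_i] proportional to the empirical leverage) to decide which y_i to request.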

  9. Getting to s = O(d)
  Based on Lee-Sun '15.
  The O(d log d) comes from coupon collector.
  Change to non-independent sampling:
  ◮ x_i ∼ D_i, where D_i depends on x_1, ..., x_{i−1}.
  ◮ D_1 = D′; D_2 avoids points near x_1; etc.
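
A caricature of non-independent selection, purely for intuition (this is a greedy heuristic, NOT Lee–Sun's actual barrier/potential-based rule): each new pick is scored against what has already been selected, so later picks avoid directions that are already covered.

    import numpy as np

    def greedy_nonindependent_selection(phi_pool, s, reg=1e-3):
        """phi_pool: (m, d) array of basis values at the unlabeled pool.
        Repeatedly picks the point with the largest leverage relative to the points
        selected so far; the 'distribution' of pick i depends on picks 1..i-1."""
        m, d = phi_pool.shape
        A = reg * np.eye(d)              # running second-moment matrix of selected points
        chosen = []
        for _ in range(s):
            inv = np.linalg.inv(A)
            scores = np.einsum('ij,jk,ik->i', phi_pool, inv, phi_pool)
            if chosen:
                scores[chosen] = -np.inf  # select without replacement
            i = int(np.argmax(scores))
            chosen.append(i)
            A += np.outer(phi_pool[i], phi_pool[i])
        return chosen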
