Agnostic learning of linear spaces: results

[Figure: fit of a degree-5 polynomial; σ = 1, x ∈ [−1, 1].]

The (matrix) Chernoff bound depends on

    K := \sup_x \sup_{f \in F, \|f\|_D = 1} f(x)^2 .

O(K \log d + K/\epsilon) samples suffice for agnostic learning [Cohen-Davenport-Leviatan '13, Hsu-Sabato '14]:
◮ Mean-zero noise: \|\hat{f} - f^*\|_D^2 \le \epsilon \|f^* - y\|_D^2
◮ Generic noise: \|\hat{f} - f^*\|_D^2 \le (1 + \epsilon) \|f^* - y\|_D^2

This is also necessary (coupon collector).

How can we avoid the dependence on K?

Xue Chen, Eric Price (UT Austin) · Active Regression via Linear-Sample Sparsification
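The setting above can be made concrete with a small simulation, a minimal sketch (not from the talk) assuming D = Unif[−1, 1], a fixed degree-5 polynomial f* (the coefficients below are hypothetical), and i.i.d. Gaussian noise with σ = 1; ERM here is ordinary least squares over the d = 6-dimensional space of polynomials of degree at most 5:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 6                                  # dim of the space: degree-<=5 polynomials
coef_true = np.array([1.0, -2.0, 0.5, 3.0, -1.0, 2.0])  # hypothetical f*

m = 50_000
x = rng.uniform(-1.0, 1.0, size=m)                 # x ~ D = Unif[-1, 1]
y = np.polynomial.polynomial.polyval(x, coef_true) + rng.normal(0.0, 1.0, m)

# ERM = ordinary least squares over the linear space F
A = np.vander(x, d, increasing=True)               # columns 1, x, ..., x^5
coef_hat, *_ = np.linalg.lstsq(A, y, rcond=None)

# Monte Carlo estimate of ||f_hat - f*||_D^2 on a fresh grid
grid = np.linspace(-1.0, 1.0, 2001)
err = np.mean((np.polynomial.polynomial.polyval(grid, coef_hat)
               - np.polynomial.polynomial.polyval(grid, coef_true)) ** 2)
```

For mean-zero noise the recovery error here concentrates around roughly σ²d/m ≈ 1.2e−4, far below the noise level σ² = 1, consistent with an ε‖f* − y‖²_D-type guarantee with small ε.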
Our result: avoid K with more powerful access patterns

With more powerful access models, we can replace

    K := \sup_x \sup_{f \in F, \|f\|_D = 1} f(x)^2

with

    \kappa := \mathbb{E}_x \sup_{f \in F, \|f\|_D = 1} f(x)^2 .

For linear spaces of functions, \kappa = d.

Query model:
◮ Can pick x_i of our choice, see y_i ∼ (Y | X = x_i).
◮ Know D (which just defines \|f - \hat{f}\|_D).

Active learning model:
◮ Receive x_1, ..., x_m ∼ D.
◮ Pick S ⊂ [m] of size s.
◮ See y_i for i ∈ S.

Some results extend to non-linear spaces.
Query model: basic approach

ERM needs the empirical norm \|f\|_S to approximate \|f\|_D for all f \in F.

This takes O(K \log d) samples from D.

Improved by biasing samples towards high-variance points:

    D'(x) = \frac{1}{\kappa} \left( \sup_{f \in F, \|f\|_D = 1} f(x)^2 \right) D(x) .

Estimate the norm via

    \|f\|_{S, D'}^2 := \frac{1}{m} \sum_{i=1}^m \frac{D(x_i)}{D'(x_i)} f(x_i)^2 .

Still equals \|f\|_D^2 in expectation, but now the maximum contribution of any sample is \kappa.
◮ This gives O(\kappa \log d) sample complexity by matrix Chernoff.
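A minimal sketch of this estimator on a finite domain (the setup is illustrative: n uniform points as D, a random d-dimensional linear space). Sampling from D′ and reweighting by D(x)/D′(x) keeps the estimate unbiased while capping each sample's contribution at κ:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 2000, 5                 # finite domain of n points, D = uniform on it

# Orthonormal basis w.r.t. D: columns of Phi satisfy (1/n) Phi^T Phi = I
Phi, _ = np.linalg.qr(rng.normal(size=(n, d)))
Phi *= np.sqrt(n)

tau = np.sum(Phi ** 2, axis=1)        # sup_{||f||_D = 1} f(x)^2 at each point
kappa = tau.mean()                    # = d for a linear space
Dprime = tau / tau.sum()              # D'(x) = (1/kappa) * sup_f f(x)^2 * D(x)

alpha = rng.normal(size=d)
alpha /= np.linalg.norm(alpha)
f = Phi @ alpha                       # a function with ||f||_D = 1 exactly

m = 20_000
idx = rng.choice(n, size=m, p=Dprime)
contrib = ((1.0 / n) / Dprime[idx]) * f[idx] ** 2   # D(x_i)/D'(x_i) * f(x_i)^2
est = contrib.mean()                  # unbiased estimate of ||f||_D^2 = 1
```

Each `contrib` term is at most κ = d, whereas plain sampling from D would allow per-sample contributions as large as K = max_x τ(x).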
Bounding κ for linear function spaces

    \kappa = \mathbb{E}_x \sup_{f \in F, \|f\|_D = 1} f(x)^2 .

Express f \in F via an orthonormal basis:

    f(x) = \sum_j \alpha_j \phi_j(x) .

Then

    \sup_{\|f\|_D = 1} f(x)^2 = \sup_{\|\alpha\|_2 = 1} \langle \alpha, \{\phi_j(x)\}_{j=1}^d \rangle^2 = \sum_{j=1}^d \phi_j(x)^2 .

Hence, since \mathbb{E}_x[\phi_j(x)^2] = 1 for each basis function,

    \kappa = \mathbb{E}_x \sum_{j=1}^d \phi_j(x)^2 = d .
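A quick numerical check of κ = d (a sketch, assuming D = Unif[−1, 1] with the normalized Legendre polynomials √(2j+1)·P_j as the orthonormal basis): the Monte Carlo average of Σ_j φ_j(x)² comes out to d, while its pointwise supremum, i.e. K, is Σ_j (2j+1) = d², attained at x = ±1:

```python
import numpy as np
from numpy.polynomial import legendre

rng = np.random.default_rng(2)
d = 6                                  # degree-<=5 polynomials again
x = rng.uniform(-1.0, 1.0, 200_000)    # x ~ D = Unif[-1, 1]

# phi_j = sqrt(2j + 1) * P_j is orthonormal w.r.t. Unif[-1, 1]
phi = np.column_stack([
    np.sqrt(2 * j + 1) * legendre.legval(x, np.eye(d)[j]) for j in range(d)
])
sup_f = np.sum(phi ** 2, axis=1)       # sup_{||f||_D = 1} f(x)^2 = sum_j phi_j(x)^2

kappa = sup_f.mean()                   # ~ d
K_emp = sup_f.max()                    # approaches K = d^2 (attained at x = +-1)
```

So for this space the gap between K and κ is a factor of d, which is exactly what the biased-sampling trick recovers.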
Query model: so far

Upsampling x proportional to \sup_f f(x)^2 gets O(d \log d) sample complexity.
◮ Essentially the same as leverage score sampling.
◮ Also analogous to Spielman-Srivastava graph sparsification.

Can we bring this down to O(d)?
◮ Not with independent sampling (coupon collector).
◮ Analogous to Batson-Spielman-Srivastava linear-size sparsification.
◮ Yes, using Lee-Sun sparsification.

Mean-zero noise: \mathbb{E}[(\hat{f}(x) - f(x))^2] \le \epsilon \, \mathbb{E}[(y - f(x))^2].
Generic noise: \mathbb{E}[(\hat{f}(x) - f(x))^2] \le (1 + \epsilon) \, \mathbb{E}[(y - f(x))^2].
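The leverage-score analogy can be sketched in matrix form (a sketch with illustrative constants, not the talk's algorithm): for a design matrix A, sampling rows with probability proportional to their leverage scores ℓ_i = a_iᵀ(AᵀA)⁻¹a_i and reweighting gives BᵀB ≈ AᵀA spectrally with O(d log d) rows:

```python
import numpy as np

rng = np.random.default_rng(3)
n, d = 5000, 4
A = rng.normal(size=(n, d)) * rng.uniform(0.1, 3.0, size=(n, 1))  # uneven rows

# Leverage scores l_i = a_i^T (A^T A)^{-1} a_i, via the thin QR factorization
Q, _ = np.linalg.qr(A)
lev = np.sum(Q ** 2, axis=1)          # sums to exactly d
p = lev / lev.sum()

s = 1000                              # O(d log d) rows, generous constant
idx = rng.choice(n, size=s, p=p)
B = A[idx] / np.sqrt(s * p[idx, None])  # reweight so E[B^T B] = A^T A

# Spectral check: generalized eigenvalues of B^T B w.r.t. A^T A are near 1
eigs = np.real(np.linalg.eigvals(np.linalg.solve(A.T @ A, B.T @ B)))
```

The coupon-collector obstruction shows up here as the log d factor in s: with independent draws, some high-leverage rows are simply never hit if s is only O(d).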
Active learning

The query model supposes we know D and can query any point.

Active learning:
◮ Get x_1, ..., x_m ∼ D.
◮ Pick S ⊆ [m] of size s.
◮ Learn y_i for i ∈ S.

Minimize s:
◮ m → ∞ ⟹ learn D and can query any point ⟹ query model.
◮ Hence s = Θ(d) is optimal.

Minimize m:
◮ Labeling every point ⟹ agnostic learning.
◮ Hence m = Θ(K \log d + K/\epsilon) is optimal.

Our result: both at the same time.
◮ In this talk: mostly the s = O(d \log d) version.
◮ Prior work: s = O((d \log d)^{5/4}) [Sabato-Munos '14]; s = O(d \log d) via "volume sampling" [Derezinski-Warmuth-Hsu '18].
Active learning

Warmup: suppose we know D.

We can simulate the query algorithm via rejection sampling:

    \Pr[\text{Label } x_i] = \frac{1}{K} \sup_{f \in F, \|f\|_D = 1} f(x_i)^2 .

This just needs s = O(d \log d) labels. The chance each sample gets labeled is

    \mathbb{E}_x[\Pr[\text{Label } x_i]] = \frac{\kappa}{K} = \frac{d}{K} .

Gives m = O(K \log d) unlabeled samples and s = O(d \log d) labeled samples.
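A sketch of this warmup rejection sampler (assuming the same normalized-Legendre setup as before, where K = d² exactly): each streamed x_i ~ D is labeled with probability sup_f f(x_i)²/K, and the empirical labeling rate matches κ/K = d/d² = 1/d:

```python
import numpy as np
from numpy.polynomial import legendre

rng = np.random.default_rng(4)
d = 5
K = d * d                              # sup_x sum_j phi_j(x)^2 = d^2 at x = +-1

x = rng.uniform(-1.0, 1.0, 100_000)    # stream of samples x_i ~ D
phi = np.column_stack([
    np.sqrt(2 * j + 1) * legendre.legval(x, np.eye(d)[j]) for j in range(d)
])
sup_f = np.sum(phi ** 2, axis=1)       # sup_{||f||_D = 1} f(x_i)^2

# Rejection sampling: label x_i with probability sup_f(x_i) / K
accept = rng.uniform(size=x.size) < sup_f / K
rate = accept.mean()                   # ~ kappa / K = 1 / d
```

The accepted points are distributed as D′, so the query-model analysis applies to them directly.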
Active learning without knowing D

We want to perform rejection sampling with

    \Pr[\text{Label } x_i] = \frac{1}{K} \sup_{f \in F, \|f\|_D = 1} f(x_i)^2 ,

but we don't know D. We just need to estimate \|f\|_D for all f \in F, and matrix Chernoff gets this with m = O(K \log d) unlabeled samples.

Gives m = O(K \log d) unlabeled samples and s = O(d \log d) labeled samples. Can improve to m = O(K \log d), s = O(d).
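Estimating ‖f‖_D for all f ∈ F simultaneously amounts to estimating one d×d second-moment matrix of the features, since ‖f‖_D² = αᵀΣα in any fixed basis. A sketch (assuming a monomial basis and D = Unif[−1, 1], where Σ is known in closed form to compare against):

```python
import numpy as np

rng = np.random.default_rng(5)
d, m = 4, 200_000
x = rng.uniform(-1.0, 1.0, m)          # unlabeled samples only

V = np.vander(x, d, increasing=True)   # monomial features (1, x, x^2, x^3)
Sigma_hat = (V.T @ V) / m              # empirical second-moment matrix

# Exact Sigma for D = Unif[-1, 1]: E[x^k] = 0 for odd k, 1/(k+1) for even k
k = np.arange(d)
powers = k[:, None] + k[None, :]
Sigma = np.where(powers % 2 == 0, 1.0 / (powers + 1), 0.0)

# alpha^T Sigma_hat alpha approximates ||f||_D^2 = alpha^T Sigma alpha for all f
eigs = np.real(np.linalg.eigvals(np.linalg.solve(Sigma, Sigma_hat)))
```

All generalized eigenvalues near 1 means the empirical norm is a uniform multiplicative approximation of ‖·‖_D over the whole space, which is exactly what the rejection probabilities need.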
Getting to s = O(d)

Based on Lee-Sun '15.

The O(d \log d) comes from coupon collector. Change to non-independent sampling:
◮ x_i ∼ D_i, where D_i depends on x_1, ..., x_{i−1}.
◮ D_1 = D'; D_2 avoids points near x_1; etc.
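The Lee-Sun procedure itself is involved; purely as a toy illustration of non-independent sampling (not the actual algorithm), here is a greedy rule that biases each draw D_i toward directions poorly covered by x_1, ..., x_{i−1}, scoring candidates by a quadratic form in the inverse of the accumulated covariance:

```python
import numpy as np

rng = np.random.default_rng(6)
n, d = 1000, 5
Phi, _ = np.linalg.qr(rng.normal(size=(n, d)))
Phi *= np.sqrt(n)                      # orthonormal basis w.r.t. uniform D

s = 4 * d
M = 1e-3 * np.eye(d)                   # regularized covariance of picks so far
picked = []
for _ in range(s):
    # D_i: sample proportional to coverage deficit phi(x)^T M^{-1} phi(x)
    scores = np.einsum('nd,dk,nk->n', Phi, np.linalg.inv(M), Phi)
    scores = np.maximum(scores, 0.0)   # guard against float round-off
    i = rng.choice(n, p=scores / scores.sum())
    picked.append(i)
    M = M + np.outer(Phi[i], Phi[i])

cov = Phi[picked].T @ Phi[picked] / s  # covariance of the selected points
```

Because each D_i downweights directions already covered, even O(d) picks span all of F, whereas independent draws would need the extra log d coupon-collector factor.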