

SLIDE 1

Online isotonic regression

Wojciech Kotłowski, DA2PL 2018, Poznań University of Technology

1 / 59

SLIDE 2

Outline

1 Motivation
2 Isotonic regression
3 Online learning
4 Online isotonic regression
5 Fixed design online isotonic regression
6 Random permutation online isotonic regression
7 Conclusions

2 / 59


SLIDE 4

Motivation I – house pricing

Assess the selling price of a house based on its attributes.

4 / 59

SLIDE 5

Motivation I – house pricing

Den Bosch data set

[Scatter plot: price vs. area]

5 / 59

SLIDE 6

Motivation I – house pricing

Fitting linear function

[Scatter plot: price vs. area, with fitted linear function]

6 / 59

SLIDE 7

Motivation I – house pricing

Fitting isotonic¹ function

[Scatter plot: price vs. area, with fitted isotonic function]

¹ isotonic – non-decreasing, order-preserving

7 / 59

SLIDE 8

Motivation II – predicting good probabilities

Predictions of SVM classifier (German credit)

[Scatter plot: binary labels vs. classifier score]

Can we turn score values into conditional probabilities P(y|x)?

8 / 59

SLIDE 9

Motivation II – predicting good probabilities

Fitting isotonic function to the labels [Zadrozny & Elkan, 2002]

[Scatter plot: labels/probabilities vs. score, with fitted isotonic function]

9 / 59

SLIDE 10

Motivation II – predicting good probabilities

Calibration plots (reliability curve)

[Calibration plot; y-axis: fraction of positives]

Perfectly calibrated; Logistic (0.099); SVM (0.163); SVM + Isotonic (0.100)

(generated by a script from scikit-learn.org)

10 / 59

SLIDE 11

Motivation II – predicting good probabilities

Calibration plots (reliability curve)

[Calibration plot; y-axis: fraction of positives]

Perfectly calibrated; Logistic (0.099); Naive Bayes (0.118); Naive Bayes + Isotonic (0.098)

(generated by a script from scikit-learn.org)

10 / 59

SLIDE 12

Outline

1 Motivation
2 Isotonic regression
3 Online learning
4 Online isotonic regression
5 Fixed design online isotonic regression
6 Random permutation online isotonic regression
7 Conclusions

11 / 59

SLIDE 13

Isotonic regression

Definition: Fit an isotonic (monotonically increasing) function to the data.

Extensively studied in statistics [Ayer et al., 55; Brunk, 55; Robertson et al., 98].

Numerous applications:
• Biology, medicine, psychology, etc.
• Multicriteria decision support.
• Hypothesis tests under order constraints.
• Multidimensional scaling.
• Machine learning: probability calibration, ROC analysis.

12 / 59

SLIDE 14

Isotonic regression

Definition: Given data {(x_t, y_t)}_{t=1}^T ⊂ R × R, find an isotonic (nondecreasing) f* : R → R which minimizes the squared error over the labels:

  min_f ∑_{t=1}^T (y_t − f(x_t))²   subject to   x_t ≥ x_q ⟹ f(x_t) ≥ f(x_q),  q, t ∈ {1, …, T}.

The optimal solution f* is called the isotonic regression function. Only the values f(x_t), t = 1, …, T, matter.

13 / 59
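To make the definition concrete, here is a minimal sketch using scikit-learn's IsotonicRegression (the numbers are made up for illustration and are not from the talk):

```python
# Minimal sketch (illustrative data): least-squares isotonic fit as defined above.
import numpy as np
from sklearn.isotonic import IsotonicRegression

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0.8, 0.2, 0.6, 0.9, 0.7])

iso = IsotonicRegression(increasing=True)
f_star = iso.fit_transform(x, y)   # values f*(x_t) at the data points
print(f_star)                      # [0.5 0.5 0.6 0.8 0.8] -- nondecreasing
```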

SLIDE 15

Isotonic regression example

(source: scikit-learn.org)

14 / 59

SLIDE 16

Properties of isotonic regression

• Depends on instances (x) only through their order relation.
• Only defined at the points {x_1, …, x_T}; often extended to R by linear interpolation.
• Piecewise constant (splits the data into level sets).
• Self-averaging property: the value of f* on a given level set equals the average of the labels in that level set. For any v:

    v = (1 / |S_v|) ∑_{t ∈ S_v} y_t,   where S_v = {t : f*(x_t) = v}.

• When y ∈ {0, 1}, produces calibrated (empirical) probabilities: E_emp[y | f* = v] = v.

15 / 59
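A tiny worked example of the self-averaging property (numbers chosen for illustration, not from the talk): for labels y = (0.8, 0.2, 0.6) at increasing x, the isotonic regression is f* = (0.5, 0.5, 0.6). The first two points violate the order, form one level set and receive the average (0.8 + 0.2)/2 = 0.5, while the third point is left unchanged.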

SLIDE 17

Pool Adjacent Violators Algorithm (PAVA)

• Iterative merging of data points into blocks until no violators of the isotonic constraints remain.
• The value assigned to each block is the average of the labels in that block.
• The final assignment to blocks corresponds to the level sets of the isotonic regression.
• Works in linear O(T) time, but requires the data to be sorted.

16 / 59
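A minimal PAVA sketch in Python (an illustration, not the speaker's code); it assumes the points are already sorted by x, as required above:

```python
# Minimal PAVA sketch (assumes the data are already sorted by x).
# Each block stores [sum of labels, count]; blocks are merged while their averages violate the order.
def pava(y):
    blocks = []
    for label in y:
        blocks.append([label, 1])
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []                             # expand block averages back to one value per point
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted

print(pava([0.8, 0.2, 0.6, 0.9, 0.7]))      # [0.5, 0.5, 0.6, 0.8, 0.8]
```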

SLIDES 18–19

Generalized isotonic regression

Definition: Given data {(x_t, y_t)}_{t=1}^T ⊂ R × R, find an isotonic f* : R → R which minimizes

  min_{isotonic f} ∑_{t=1}^T ∆(y_t, f(x_t)).

Squared loss (y_t − f(x_t))² replaced with a general loss ∆(y_t, f(x_t)).

Theorem [Robertson et al., 1998]: All loss functions of the form

  ∆(y, z) = Ψ(y) − Ψ(z) − Ψ′(z)(y − z)

for some strictly convex Ψ result in the same isotonic regression function f*.

17 / 59

SLIDE 20

Generalized isotonic regression – examples

∆(y, z) = Ψ(y) − Ψ(z) − Ψ′(z)(y − z)

• Squared function Ψ(y) = y²: ∆(y, z) = y² − z² − 2z(y − z) = (y − z)² (squared loss).
• Negative entropy Ψ(y) = y log y + (1 − y) log(1 − y), y ∈ [0, 1]: ∆(y, z) = −y log z − (1 − y) log(1 − z), up to a term depending only on y (cross-entropy / relative entropy).
• Negative logarithm Ψ(y) = −log y, y > 0: ∆(y, z) = y/z − log(y/z) − 1 (Itakura–Saito distance / Burg entropy).

18 / 59

SLIDE 21

Outline

1 Motivation
2 Isotonic regression
3 Online learning
4 Online isotonic regression
5 Fixed design online isotonic regression
6 Random permutation online isotonic regression
7 Conclusions

19 / 59

SLIDE 22

Online learning framework

• A theoretical framework for the analysis of online algorithms.
• The learning process is by its very nature incremental.
• Avoids stochastic (e.g., i.i.d.) assumptions on the data sequence; designs algorithms which work well for any data.
• Meaningful performance guarantees based on observed quantities: regret bounds.

20 / 59

SLIDE 23

Online learning framework

[Diagram of one round: the learner (strategy f_t : X → Y) receives a new instance (x_t, ?), makes the prediction ŷ_t = f_t(x_t), receives the feedback y_t, suffers loss ℓ(ŷ_t, y_t), and moves to round t + 1.]

21 / 59

SLIDE 24

Online learning framework

Set of strategies (actions) F; known loss function ℓ. Learner starts with some initial strategy (action) f_1. For t = 1, 2, …:

1 Learner observes instance x_t.
2 Learner predicts with ŷ_t = f_t(x_t).
3 The environment reveals outcome y_t.
4 Learner suffers loss ℓ(ŷ_t, y_t).
5 Learner updates its strategy f_t → f_{t+1}.

22 / 59

SLIDE 25

Online learning framework

The goal of the learner is to be close to the best f in hindsight.

Cumulative loss of the learner:   L_T = ∑_{t=1}^T ℓ(ŷ_t, y_t).

Cumulative loss of the best strategy f in hindsight:   L*_T = min_{f ∈ F} ∑_{t=1}^T ℓ(y_t, f(x_t)).

Regret of the learner:   regret_T = L_T − L*_T.

The goal is to minimize regret over all possible data sequences.

23 / 59
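A toy sketch of the loop above with squared loss (my illustration; the learner and the comparator class are deliberately simplistic — the learner predicts the running mean of the labels seen so far, and F is taken to be the constant predictions in [0, 1]):

```python
# Toy sketch of the online protocol and regret bookkeeping (squared loss).
import numpy as np

def run_online(xs, ys):
    seen = []
    learner_loss = 0.0
    for x_t, y_t in zip(xs, ys):
        y_hat = np.mean(seen) if seen else 0.5   # prediction before the label is revealed
        learner_loss += (y_t - y_hat) ** 2       # loss suffered at trial t
        seen.append(y_t)                         # feedback, then t -> t + 1
    best_constant = np.mean(ys)                  # best fixed strategy in hindsight (constant comparator)
    best_loss = float(np.sum((np.array(ys) - best_constant) ** 2))
    return learner_loss - best_loss              # regret_T

print(run_online([1, 2, 3, 4], [0.1, 0.4, 0.5, 0.9]))
```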

SLIDE 26

Outline

1 Motivation
2 Isotonic regression
3 Online learning
4 Online isotonic regression
5 Fixed design online isotonic regression
6 Random permutation online isotonic regression
7 Conclusions

24 / 59

SLIDES 27–37

Online isotonic regression

[Animation on points x_1 < x_2 < … < x_8 with labels in [0, 1]: the environment picks a yet unlabeled point (first x_5, then x_1, …), the learner predicts ŷ at that point, the true label y is revealed, and the learner suffers loss = (ŷ − y)².]

25 / 59

SLIDES 38–40

Online isotonic regression

The protocol. Given: x_1 < x_2 < … < x_T. At trial t = 1, …, T:
• Environment chooses a yet unlabeled point x_{i_t}.
• Learner predicts ŷ_{i_t} ∈ [0, 1].
• Environment reveals label y_{i_t} ∈ [0, 1].
• Learner suffers squared loss (ŷ_{i_t} − y_{i_t})².

Strategies = isotonic functions:   F = {f : f(x_1) ≤ f(x_2) ≤ … ≤ f(x_T)}

  regret_T = ∑_{t=1}^T (ŷ_{i_t} − y_{i_t})² − min_{f ∈ F} ∑_{t=1}^T (y_{i_t} − f(x_{i_t}))²

26 / 59
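A minimal sketch of this regret computation (made-up labels and arrival order, a deliberately naive learner that always predicts 0.5, and scikit-learn's isotonic fit as the hindsight comparator):

```python
# Sketch of regret_T against the best isotonic function in hindsight.
import numpy as np
from sklearn.isotonic import IsotonicRegression

x = np.array([1, 2, 3, 4, 5, 6])
y = np.array([0.0, 0.2, 0.1, 0.7, 0.6, 1.0])      # labels in [0, 1]
order = [4, 0, 5, 2, 1, 3]                         # order in which the environment reveals the points

learner_loss = sum((y[i] - 0.5) ** 2 for i in order)   # naive learner: always predicts 0.5
f_star = IsotonicRegression().fit_transform(x, y)      # best isotonic function in hindsight
best_loss = float(np.sum((y - f_star) ** 2))
print("regret_T =", learner_loss - best_loss)
```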

SLIDE 41

Online isotonic regression

  F = {f : f(x_1) ≤ f(x_2) ≤ … ≤ f(x_T)}

  regret_T = ∑_{t=1}^T (ŷ_{i_t} − y_{i_t})² − min_{f ∈ F} ∑_{t=1}^T (y_{i_t} − f(x_{i_t}))²

The cumulative loss of the learner should not be much larger than the loss of the (optimal) isotonic regression function in hindsight. Only the order x_1 < … < x_T matters, not the values.

27 / 59

SLIDES 42–55

The adversary is too powerful!

Every algorithm will have Ω(T) regret.

[Animation: the adversary reveals points x_1, x_2, x_3, … one at a time; after seeing the learner's prediction ŷ it picks a label far from it, so the learner suffers loss ≥ 1/4 at every trial, while the revealed labels remain consistent with some isotonic function.]

Algorithm's loss ≥ 1/4 per trial; loss of the best isotonic function = 0.

28 / 59

SLIDE 56

Outline

1 Motivation
2 Isotonic regression
3 Online learning
4 Online isotonic regression
5 Fixed design online isotonic regression
6 Random permutation online isotonic regression
7 Conclusions

29 / 59

SLIDE 57

Fixed design

• Data x_1, …, x_T is known in advance to the learner.
• We will show that in such a model, efficient online algorithms exist.

K., Koolen, Malek: Online Isotonic Regression. Proc. of Conference on Learning Theory (COLT), pp. 1165–1189, 2016.

30 / 59

SLIDE 58

Off-the-shelf online algorithms

Algorithm                     | General bound      | Bound for online IR
Stochastic Gradient Descent   | G_2 D_2 √T         | T
Exponentiated Gradient        | G_∞ D_1 √(T log d) | √(T log T)
Follow the Leader             | G_2 D_2 d log T    | T² log T
Exponential Weights           | d log T            | T log T

These bounds are tight (up to logarithmic factors).

31 / 59

SLIDE 59

Exponential Weights (Bayes) with uniform prior

Let f = (f_1, …, f_T) denote the values of f at (x_1, …, x_T).

  π(f) = const,   for all f : f_1 ≤ … ≤ f_T,

  P(f | y_{i_1}, …, y_{i_t}) ∝ π(f) e^{−½ loss_{1…t}(f)},

  ŷ_{i_{t+1}} = ∫ f_{i_{t+1}} P(f | y_{i_1}, …, y_{i_t}) df   (= posterior mean).

32 / 59

SLIDES 60–64

Exponential Weights with uniform prior does not learn

[Plots of the prior mean and of the posterior mean after t = 10, 20, 50, 100 revealed labels, illustrating that the posterior mean does not adapt to the data.]

33–37 / 59

SLIDE 65

The algorithm

Exponential Weights on a covering net

  F_K = { f : f_t = k_t / K,  k_t ∈ {0, 1, …, K},  f_1 ≤ … ≤ f_T },

with π(f) uniform on F_K.

• Efficient implementation by dynamic programming: O(Kt) at trial t.
• Speed-up to O(K) if the data is revealed in isotonic order.

38 / 59
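A brute-force sketch of what the algorithm computes (the revealed labels below are made up): it enumerates F_K explicitly and outputs the posterior-mean prediction; the dynamic program mentioned on the slide computes the same quantity in O(Kt) time per trial.

```python
# Brute-force sketch of Exponential Weights on the covering net F_K (illustrative data).
from itertools import combinations_with_replacement
import numpy as np

T, K = 5, 4
net = [np.array(f) / K                      # all nondecreasing sequences on the grid {0, 1/K, ..., 1}
       for f in combinations_with_replacement(range(K + 1), T)]

labeled = {0: 0.1, 3: 0.8}                  # labels revealed so far: point index -> label (made up)
query = 2                                   # next point chosen by the environment

log_w = np.array([-0.5 * sum((y - f[i]) ** 2 for i, y in labeled.items())
                  for f in net])            # log posterior weights under the uniform prior
w = np.exp(log_w - log_w.max())
w /= w.sum()
y_hat = sum(wi * f[query] for wi, f in zip(w, net))   # posterior mean = prediction
print(y_hat)
```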

SLIDES 66–70

Covering net

A finite set of isotonic functions on a discrete grid of y values.

[Grid figure: y levels 0.1, 0.2, …, 1 over points x_1, …, x_12, with several example isotonic functions from the net highlighted.]

There are O(T^K) functions in F_K.

39 / 59
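A counting aside not spelled out on the slide: by stars and bars, the net has exactly |F_K| = C(T + K, K) elements (a nondecreasing grid sequence is a multiset of T values chosen from the K + 1 levels). For fixed K this is O(T^K), and in general log |F_K| = O(K log T), which is the quantity used in the regret decomposition a few slides below.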

SLIDES 71–74

Performance of the algorithm

Regret bound: when K = Θ(T^{1/3} log^{−1/3} T),

  Regret = O(T^{1/3} log^{2/3} T).

Matching lower bound Ω(T^{1/3}) (up to a log factor).

Proof idea:

  Regret = [ Loss(alg) − min_{f ∈ F_K} Loss(f) ] + [ min_{f ∈ F_K} Loss(f) − min_{isotonic f} Loss(f) ],

where the first term is at most 2 log |F_K| = O(K log T) and the second is at most T / (4K²).

40 / 59
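Back-of-the-envelope balancing of the two terms (not shown on the slide): the bound is of order K log T + T / (4K²); setting the derivative in K to zero gives K = Θ((T / log T)^{1/3}), for which both K log T and T / (4K²) are of order T^{1/3} log^{2/3} T, matching the stated regret bound.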

SLIDES 75–79

Performance of the algorithm

[Plots of the prior mean and of the posterior mean of the covering-net algorithm after t = 10, 20, 50, 100 revealed labels, illustrating how the posterior mean adapts to the data.]

41–45 / 59

SLIDES 80–81

Other loss functions

Cross-entropy loss ℓ(ŷ, y) = −y log ŷ − (1 − y) log(1 − ŷ):
• The same bound O(T^{1/3} log^{2/3} T).
• Covering net F_K obtained by non-uniform discretization.

Absolute loss ℓ(ŷ, y) = |ŷ − y|:
• O(√(T log T)) obtained by Exponentiated Gradient.
• Matching lower bound Ω(√T) (up to a log factor).

46 / 59

SLIDE 82

Outline

1 Motivation
2 Isotonic regression
3 Online learning
4 Online isotonic regression
5 Fixed design online isotonic regression
6 Random permutation online isotonic regression
7 Conclusions

47 / 59

SLIDES 83–84

Random permutation model

• A more realistic scenario for generating x_1, …, x_T which allows the data to be unknown in advance.
• The data are chosen adversarially before the game begins, but are then presented to the learner in a random order.
• Motivation: the data-gathering process is independent of the underlying data-generation mechanism. Still a very weak assumption.
• Evaluation: regret averaged over all permutations of the data: E_σ[regret_T].

K., Koolen, Malek: Random Permutation Online Isotonic Regression. NIPS, pp. 4180–4189, 2017.

48 / 59

SLIDES 85–86

Leave-one-out loss

Definition: Given t labeled points {(x_i, y_i)}_{i=1}^t, for i = 1, …, t:
• Take out the i-th point and give the remaining t − 1 points to the learner as training data.
• The learner predicts ŷ_i on x_i and receives loss ℓ(ŷ_i, y_i).

Evaluate the learner by

  ℓoo_t = (1/t) ∑_{i=1}^t ℓ(ŷ_i, y_i).

No sequential structure in the definition.

Theorem: If ℓoo_t ≤ g(t) for all t, then E_σ[regret_T] ≤ ∑_{t=1}^T g(t).

49 / 59

SLIDE 87

Fixed design to random permutation conversion

• Any algorithm for fixed design can be used in the random permutation setup by re-running it from scratch in each trial.
• We have shown that:  ℓoo_t ≤ (1/t) E_σ[fixed-design-regret_t].
• We thus get an optimal algorithm (Exponential Weights on a grid) with O(T^{−2/3}) leave-one-out loss “for free”, but it is complicated.
• Can we get simpler algorithms to work in this setup?

50 / 59

SLIDES 88–90

Follow the Leader (FTL) algorithm

Definition: Given the past t − 1 data points, compute the optimal (loss-minimizing) function f* and predict on a new instance x according to f*(x).

FTL is undefined for isotonic regression:

  x       −3     −1      2      3
  y       0.2    (new)   0.7    1
  f*(x)   0.2    ??      0.7    1

51 / 59

SLIDES 91–94

Forward Algorithm (FA)

Definition: Given the past t − 1 data points and a new instance x, take any guess y′ ∈ [0, 1] of the new label and predict according to the optimal function f* on the past data including the new point (x, y′).

  x       −3     −1       2      3
  y       0.2    y′ = 1   0.7    1
  f*(x)   0.2    0.85     0.85   1

Various popular prediction algorithms for IR fall into this framework (including linear interpolation [Zadrozny & Elkan, 2002] and many others [Vovk et al., 2015]).

52 / 59
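A minimal sketch of a forward algorithm using scikit-learn's isotonic fit (my illustration; with the guess y′ = 1 it reproduces the numbers in the table above):

```python
# Sketch of a forward algorithm: append the new point with a guessed label y',
# run offline isotonic regression, and predict the fitted value at the new point.
import numpy as np
from sklearn.isotonic import IsotonicRegression

def forward_predict(x_train, y_train, x_new, guess=1.0):
    xs = np.append(x_train, x_new)
    ys = np.append(y_train, guess)                 # include the guessed point (x_new, y')
    fitted = IsotonicRegression().fit_transform(xs, ys)
    return fitted[-1]                              # prediction = f*(x_new)

# The slide's example: past data (-3, 0.2), (2, 0.7), (3, 1); new point x = -1, guess y' = 1.
print(forward_predict(np.array([-3.0, 2.0, 3.0]), np.array([0.2, 0.7, 1.0]), -1.0))   # -> 0.85
```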

SLIDES 95–98

Forward Algorithm (FA)

Two extreme FAs: guess-1 and guess-0, denoted f*_1 and f*_0.

The prediction of any FA always lies between them: f*_0(x) ≤ f*(x) ≤ f*_1(x).

[Plot: labeled points (x_1, y_1), …, (x_8, y_8) with the new point x_4; the curves f*_0 and f*_1 bracket the range in which every FA predicts.]

53 / 59

SLIDE 99

Performance of FA

Theorem: For squared loss, every forward algorithm has

  ℓoo_t = O( √(log t / t) ).

The bound is suboptimal, but only a factor of O(t^{1/6}) off. For cross-entropy loss, the same bound holds, but a more careful choice of the guess must be made.

54 / 59

SLIDE 100

Outline

1 Motivation
2 Isotonic regression
3 Online learning
4 Online isotonic regression
5 Fixed design online isotonic regression
6 Random permutation online isotonic regression
7 Conclusions

55 / 59

SLIDE 101

Conclusions

• Two models for online isotonic regression: fixed design and random permutation.
• Optimal algorithm in both models: Exponential Weights (Bayes) on a grid.
• In the random permutation model, a class of forward algorithms with good bounds on the leave-one-out loss.
• Open problem: extend the analysis of these algorithms to the partial-order case.

56 / 59

SLIDE 102

Bibliography

Statistics

• M. Ayer, H. D. Brunk, G. M. Ewing, W. T. Reid, and E. Silverman. An empirical distribution function for sampling with incomplete information. Annals of Mathematical Statistics, 26(4):641–647, 1955.
• H. D. Brunk. Maximum likelihood estimates of monotone parameters. Annals of Mathematical Statistics, 26(4):607–616, 1955.
• J. B. Kruskal. Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis. Psychometrika, 29(1):1–27, 1964.
• R. E. Barlow and H. D. Brunk. The isotonic regression problem and its dual. Journal of the American Statistical Association, 67:140–147, 1972.
• T. Robertson, F. T. Wright, and R. L. Dykstra. Order Restricted Statistical Inference. John Wiley & Sons, 1998.
• Sara Van de Geer. Estimating a regression function. Annals of Statistics, 18:907–924, 1990.
• Cun-Hui Zhang. Risk bounds in isotonic regression. The Annals of Statistics, 30(2):528–555, 2002.
• Jan de Leeuw, Kurt Hornik, and Patrick Mair. Isotone optimization in R: Pool-adjacent-violators algorithm (PAVA) and active set methods. Journal of Statistical Software, 32:1–24, 2009.

57 / 59

SLIDE 103

Bibliography

Machine Learning

• Bianca Zadrozny and Charles Elkan. Transforming classifier scores into accurate multiclass probability estimates. In KDD, pages 694–699, 2002.
• Alexandru Niculescu-Mizil and Rich Caruana. Predicting good probabilities with supervised learning. In ICML, volume 119, pages 625–632. ACM, 2005.
• Tom Fawcett and Alexandru Niculescu-Mizil. PAV and the ROC convex hull. Machine Learning, 68(1):97–106, 2007.
• Vladimir Vovk, Ivan Petej, and Valentina Fedorova. Large-scale probabilistic predictors with and without guarantees of validity. In NIPS, pages 892–900, 2015.
• Aditya Krishna Menon, Xiaoqian Jiang, Shankar Vembu, Charles Elkan, and Lucila Ohno-Machado. Predicting accurate probabilities with a ranking loss. In ICML, 2012.
• Rasmus Kyng, Anup Rao, and Sushant Sachdeva. Fast, provable algorithms for isotonic regression in all ℓp-norms. In NIPS, 2015.
• Adam Tauman Kalai and Ravi Sastry. The isotron algorithm: High-dimensional isotonic regression. In COLT, 2009.
• T. Moon, A. Smola, Y. Chang, and Z. Zheng. IntervalRank: Isotonic regression with listwise and pairwise constraints. In WSDM, pages 151–160. ACM, 2010.
• Sham M. Kakade, Varun Kanade, Ohad Shamir, and Adam Kalai. Efficient learning of generalized linear and single index models with isotonic regression. In NIPS, pages 927–935, 2011.

58 / 59

SLIDE 104

Bibliography

Online isotonic regression

• Alexander Rakhlin and Karthik Sridharan. Online nonparametric regression. In COLT, pages 1232–1264, 2014.
• Pierre Gaillard and Sébastien Gerchinovitz. A chaining algorithm for online nonparametric regression. In COLT, pages 764–796, 2015.
• Wojciech Kotłowski, Wouter M. Koolen, and Alan Malek. Online isotonic regression. In COLT, pages 1165–1189, 2016.
• Wojciech Kotłowski, Wouter M. Koolen, and Alan Malek. Random permutation online isotonic regression. In Neural Information Processing Systems (NIPS), pages 4180–4189, 2017.

59 / 59