SLIDE 1
Satyen Kale (Yahoo! Research). Joint work with Elad Hazan (IBM Almaden) and Manfred Warmuth (UCSC).
Input: pairs of unit vectors in R^n: (x_1, y_1), (x_2, y_2), …, (x_T, y_T).
Assumption: y_t = R x_t + noise, where R is a fixed, unknown rotation matrix.
SLIDE 2
SLIDE 3
‖R x_t − y_t‖² = ‖R x_t‖² + ‖y_t‖² − 2(y_t x_t^T) • R
              = 2 − 2(y_t x_t^T) • R,
since R preserves norms and x_t, y_t are unit vectors. Hence
arg min_R Σ_t ‖R x_t − y_t‖² = arg max_R (Σ_t y_t x_t^T) • R.
Computing arg max_R M • R is "Wahba's problem"; it can be solved using the SVD of M.
Here A • B = Tr(A^T B) = Σ_ij A_ij B_ij, which is linear in R.
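A minimal sketch of the SVD solution to Wahba's problem (the function name is ours): with M = U S V^T, the maximizer of M • R = Tr(M^T R) over orthogonal matrices is U V^T, and flipping the direction of the smallest singular value fixes det(R) = +1.

    import numpy as np

    def solve_wahba(M):
        # Maximize M . R = Tr(M^T R) over rotation matrices R in SO(n).
        U, _, Vt = np.linalg.svd(M)
        # Correct the determinant so R is a rotation, not just orthogonal:
        d = np.sign(np.linalg.det(U @ Vt))
        D = np.diag(np.r_[np.ones(M.shape[0] - 1), [d]])
        return U @ D @ Vt

For n = 3 this is the classical solution used in spacecraft attitude estimation (the Kabsch algorithm).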
SLIDE 4
For t = 1, 2, …, T:
- Choose rotation matrix R_t
- Predict R_t x_t
- Observe y_t and suffer loss L_t(R_t) = ‖R_t x_t − y_t‖²
Goal: minimize the regret, Regret = Σ_t L_t(R_t) − min_R Σ_t L_t(R).
Open problem from COLT 2008 [Smith, Warmuth].
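A minimal sketch of this protocol and the regret computation, assuming a learner object with hypothetical methods next_rotation() and update():

    import numpy as np

    def run_protocol(learner, xs, ys):
        # xs, ys: lists of unit vectors in R^n.
        total_loss = 0.0
        M = np.zeros((xs[0].shape[0],) * 2)
        for x, y in zip(xs, ys):
            R = learner.next_rotation()            # choose R_t
            total_loss += np.sum((R @ x - y)**2)   # L_t(R_t)
            learner.update(x, y)                   # reveal (x_t, y_t)
            M += np.outer(y, x)
        # By slide 3, min_R sum_t L_t(R) = 2T - 2 * max_R M . R,
        # where the max is attained by solve_wahba(M) defined earlier.
        best = solve_wahba(M)
        comparator_loss = 2*len(xs) - 2*np.sum(M * best)
        return total_loss - comparator_loss        # the regret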
SLIDE 5
Rotation matrix ≡ orthogonal matrix of determinant 1. The set of rotation matrices, SO(n), is:
- Non-convex: so online convex optimization techniques like gradient descent, exponentiated gradient, etc. don't apply directly
- A Lie group whose Lie algebra is the set of all skew-symmetric matrices (see the sketch below)
- A Lie group that gives a universal representation for all Lie groups via a conformal embedding
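A quick numerical illustration (our own, not from the talk) that the matrix exponential maps skew-symmetric matrices, the Lie algebra so(n), into SO(n):

    import numpy as np
    from scipy.linalg import expm

    rng = np.random.default_rng(0)
    n = 4
    A = rng.standard_normal((n, n))
    S = A - A.T                                # skew-symmetric: S^T = -S
    R = expm(S)                                # matrix exponential of S
    print(np.allclose(R.T @ R, np.eye(n)))     # orthogonal -> True
    print(np.isclose(np.linalg.det(R), 1.0))   # determinant 1 -> True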
SLIDE 6
[Arora, NIPS '09]: an algorithm using the Lie group/Lie algebra structure.
Based on matrix exponentiated gradient: the matrix exponential maps the Lie algebra to the Lie group.
It is a deterministic algorithm, and there is an Ω(T) lower bound on any such deterministic algorithm, so randomization is crucial.
SLIDE 7
Assume for convenience that n is even. Bad example: x_t = e_1, y_t = −R_t x_t. Then
L_t(R_t) = ‖R_t x_t − y_t‖² = ‖2 y_t‖² = 4,
so the algorithm's total loss is 4T. Since n is even, both I and −I are rotation matrices, and by the parallelogram law
Σ_t [L_t(I) + L_t(−I)] = Σ_t [2‖y_t‖² + 2‖x_t‖²] = 4T.
Hence the better of I and −I has total loss at most 2T, so min_R Σ_t L_t(R) ≤ 2T and Regret ≥ 2T.
The adversary can compute R_t because the algorithm is deterministic.
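A small sketch of this adversary, assuming the algorithm is exposed as a deterministic function of the history (the interface is hypothetical):

    import numpy as np

    def play_adversary(next_rotation, n=4, T=1000):
        # next_rotation(history) -> R_t: any deterministic learner.
        # Setting y_t = -R_t x_t forces L_t(R_t) = ||2 R_t x_t||^2 = 4.
        history, alg_loss = [], 0.0
        x = np.eye(n)[0]                        # x_t = e_1 in every round
        for t in range(T):
            R = next_rotation(history)          # adversary simulates the alg
            y = -R @ x
            alg_loss += np.sum((R @ x - y)**2)  # always 4
            history.append((x, y))
        return alg_loss                         # = 4T, while min_R <= 2T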
SLIDE 8
- A randomized algorithm with expected regret O(√(nL)), where L = min_R Σ_t L_t(R)
- A lower bound of Ω(√(nT)) on the regret of any online learning algorithm for choosing rotation matrices
- Uses Hannan's / Kalai-Vempala's Follow-The-Perturbed-Leader (FPL) technique, based on linearity of the loss function
SLIDE 9
Sample a noise matrix N with i.i.d. entries distributed uniformly in [−1/η, 1/η]. In round t, use
R_t = arg min_R Σ_{i=1}^{t−1} L_i(R) − N • R.
Thm [KV'05]: Regret ≤ O(n^{5/4} √T).
The arg min is computed using the SVD solution to Wahba's problem.
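A minimal sketch of this FPL variant, reusing solve_wahba from above; by slide 3 the arg min reduces to arg max_R (Σ_{i<t} y_i x_i^T + N) • R (we absorb constant factors into the noise, as the later slides do):

    import numpy as np

    def fpl_uniform_noise(xs, ys, eta, seed=0):
        n = xs[0].shape[0]
        rng = np.random.default_rng(seed)
        N = rng.uniform(-1/eta, 1/eta, size=(n, n))  # one-shot perturbation
        M = np.zeros((n, n))                         # running sum of y_i x_i^T
        total_loss = 0.0
        for x, y in zip(xs, ys):
            R = solve_wahba(M + N)                   # perturbed leader R_t
            total_loss += np.sum((R @ x - y)**2)     # L_t(R_t)
            M += np.outer(y, x)
        return total_loss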
SLIDE 10
In round t, use R_t = arg min_R Σ_{i=1}^{t−1} L_i(R) − N • R, where N is sampled as follows:
- Sample n numbers λ_1, λ_2, …, λ_n i.i.d. from the exponential distribution with density η·exp(−ηλ)
- Sample 2 orthogonal matrices U, V from the uniform Haar measure
- Set N = UΛV^T, where Λ = diag(λ_1, λ_2, …, λ_n)
SLIDE 11
(Same algorithm as the previous slide.) The Haar-distributed orthogonal matrices U, V can be sampled e.g. using the QR decomposition of a matrix with i.i.d. standard Gaussian entries.
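A minimal sketch of this sampling procedure (the sign fix on Q is a standard refinement for getting the exact Haar measure; it is not spelled out on the slide):

    import numpy as np

    def sample_noise(n, eta, rng):
        # N = U diag(lambda) V^T with lambda_i ~ Exp(eta) i.i.d. and
        # U, V Haar-distributed orthogonal matrices.
        lam = rng.exponential(scale=1.0/eta, size=n)  # density eta*exp(-eta*x)
        def haar_orthogonal():
            G = rng.standard_normal((n, n))
            Q, R = np.linalg.qr(G)
            return Q * np.sign(np.diag(R))            # sign fix for Haar
        U, V = haar_orthogonal(), haar_orthogonal()
        return U @ np.diag(lam) @ V.T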
SLIDE 12
(Same algorithm.) Effectively, we choose N with probability density ∝ exp(−η‖N‖_*), where ‖N‖_* is the trace norm, i.e. the sum of the singular values of N.
SLIDE 13
Stability Lemma [KV'05]:
E[Regret] ≤ Σ_t E[L_t(R_t)] − E[L_t(R_{t+1})] + 2E[‖N‖_*]
         ≤ 2ηL + 2n/η,
where the first sum is at most 2ηL and 2E[‖N‖_*] = 2n/η (slide 18).
Choose η = √(n/L), and we get E[Regret] ≤ O(√(nL)).
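A short worked version of the balancing step (standard calculus, assuming the two bounds above):

    \[
      \min_{\eta > 0}\Big( 2\eta L + \tfrac{2n}{\eta} \Big) = 4\sqrt{nL},
      \qquad \text{attained at } \eta = \sqrt{n/L},
    \]

obtained by setting the derivative 2L − 2n/η² to zero.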
SLIDE 14
R_t = arg max_R (Σ_{i=1}^{t−1} y_i x_i^T + N) • R
R_{t+1} = arg max_R (Σ_{i=1}^{t} y_i x_i^T + N') • R
Re-randomization (drawing fresh noise N' in round t+1) doesn't change the expected regret.
SLIDE 15
(Same R_t and R_{t+1} as above.) First sample N, then set N' = N − y_t x_t^T. Then R_t = R_{t+1}, and so E_D[L_t(R_t)] − E_{D'}[L_t(R_{t+1})] = 0, where D is the distribution of N and D' is the distribution of N'.
SLIDE 16
(Continuing.) However, ‖D' − D‖_1 ≤ η. So E_{D'}[L_t(R_{t+1})] − E_D[L_t(R_{t+1})] ≤ 2η.
SLIDE 17
(Continuing.) This is because
Pr_{D'}[N] / Pr_D[N] ≈ exp(±η‖y_t x_t^T‖_*) ≈ 1 ± η,
since ‖y_t x_t^T‖_* = 1 for unit vectors x_t, y_t.
SLIDE 18
E[‖N‖_*] = E[Σ_i λ_i] = Σ_i E[λ_i] = n/η,
because each λ_i is drawn from the exponential distribution with density η·exp(−ηλ), which has mean 1/η.
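A quick Monte Carlo sanity check of this identity, reusing sample_noise from above:

    import numpy as np

    rng = np.random.default_rng(0)
    n, eta, trials = 5, 2.0, 2000
    # trace norm = sum of singular values
    est = np.mean([np.linalg.svd(sample_noise(n, eta, rng),
                                 compute_uv=False).sum()
                   for _ in range(trials)])
    print(est, n / eta)   # both should be close to 2.5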
SLIDE 19
Bad example: x_t = e_{t mod n}, y_t = ±x_t w.p. ½ each.
The optimal rotation matrix is R* = diag(sgn(X_1), …, sgn(X_n)), where X_i is the sum of the ± signs over all t such that (t mod n) = i.
(* ignoring the det(R*) = 1 issue *)
SLIDE 20
For this example, the expected total loss of R* is
2T − 2Σ_i E[|X_i|] ≤ 2T − n·Ω(√(T/n)) = 2T − Ω(√(nT)).
But for any R_t, E[L_t(R_t)] = 2 − 2E[(y_t x_t^T) • R_t] = 2, since the random sign of y_t is independent of R_t; hence the total expected loss of any algorithm is 2T.
So E[Regret] ≥ Ω(√(nT)).
(* ignoring the det(R*) = 1 issue *)
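An illustrative simulation of this lower-bound construction (our own sketch; it measures the gap between the algorithm's expected loss 2T and the loss of the best diagonal sign matrix):

    import numpy as np

    rng = np.random.default_rng(0)
    n, T = 10, 100_000
    signs = rng.choice([-1.0, 1.0], size=T)    # y_t = +/- x_t, w.p. 1/2 each
    idx = np.arange(T) % n                     # x_t = e_{t mod n}
    X = np.array([signs[idx == i].sum() for i in range(n)])
    # Loss of R* = diag(sgn(X_i)), ignoring the det(R*) = 1 issue:
    opt_loss = 2*T - 2*np.abs(X).sum()
    print(2*T - opt_loss)                      # gap grows like sqrt(n*T)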
SLIDE 21
Optimal algorithm for online learning of rotations with regret O(√(nL)), based on FPL.
Open questions:
- Other applications for FPL? Matrix Hedge? Faster algorithms for SDPs? More details in Manfred's open problem talk.
- Any other examples of natural problems where FPL applies but online convex optimization techniques don't?