slide-1
SLIDE 1

PASSCoDe: Parallel ASynchronous Stochastic dual Co-ordinate Descent

Cho-Jui Hsieh, Department of Computer Science, University of Texas at Austin

Joint work with H.-F. Yu and I. S. Dhillon

Cho-Jui Hsieh (UT Austin) PASSCoDe July 7, 2015 1 / 29

slide-2
SLIDE 2

Outline

- L2-regularized Empirical Risk Minimization
- Dual Coordinate Descent (Hsieh et al., 2008)
- Parallel Dual Coordinate Descent (on multi-core machines)
- Theoretical Analysis
- Experimental Results

slide-3
SLIDE 3

L2-regularized ERM

$$w^* = \arg\min_{w\in\mathbb{R}^d} P(w) := \frac{1}{2}\|w\|^2 + \sum_{i=1}^n \ell_i(w^T x_i)$$

SVM with hinge loss: ℓi(zi) = C max(1 − zi, 0)
SVM with squared hinge loss: ℓi(zi) = C max(1 − zi, 0)²
Logistic regression: ℓi(zi) = C log(1 + e^{−zi})

slide-4
SLIDE 4

Primal and Dual Formulations

Primal Problem
$$w^* = \arg\min_{w\in\mathbb{R}^d} P(w) := \frac{1}{2}\|w\|^2 + \sum_{i=1}^n \ell_i(w^T x_i)$$

Dual Problem
$$\alpha^* = \arg\min_{\alpha\in\mathbb{R}^n} D(\alpha) := \frac{1}{2}\Big\|\sum_{i=1}^n \alpha_i x_i\Big\|^2 + \sum_{i=1}^n \ell_i^*(-\alpha_i),$$
where ℓi*(·) is the conjugate of ℓi(·).

Primal-Dual Relationship between w∗ and α∗:
$$w^* = w(\alpha^*) := \sum_{i=1}^n \alpha_i^* x_i$$

slide-5
SLIDE 5

Coordinate Descent on the Dual Problem

Randomly select an i ∈ {1, . . . , n} and update αi ← αi + δ∗, where
$$\delta^* = \arg\min_{\delta}\ D(\alpha + \delta e_i)$$

slide-6
SLIDE 6

Coordinate Descent on the Dual Problem

Randomly select an i ∈ {1, . . . , n} and update αi ← αi + δ∗, where
$$\delta^* = \arg\min_{\delta}\ D(\alpha + \delta e_i) = \arg\min_{\delta}\ \frac{1}{2}\left(\delta + \frac{(\sum_{j=1}^n \alpha_j x_j)^T x_i}{\|x_i\|^2}\right)^2 + \frac{1}{\|x_i\|^2}\,\ell_i^*\big(-(\alpha_i + \delta)\big) = T_i\Big(\big(\textstyle\sum_{j=1}^n \alpha_j x_j\big)^T x_i,\ \alpha_i\Big)$$

A simple univariate problem, but O(nnz) construction time.

slide-7
SLIDE 7

Coordinate Descent on the Dual Problem

Randomly select an i ∈ {1, . . . , n} and update αi ← αi + δ∗, where
$$\delta^* = \arg\min_{\delta}\ D(\alpha + \delta e_i) = \arg\min_{\delta}\ \frac{1}{2}\left(\delta + \frac{(\sum_{j=1}^n \alpha_j x_j)^T x_i}{\|x_i\|^2}\right)^2 + \frac{1}{\|x_i\|^2}\,\ell_i^*\big(-(\alpha_i + \delta)\big) = T_i\Big(\big(\textstyle\sum_{j=1}^n \alpha_j x_j\big)^T x_i,\ \alpha_i\Big)$$

A simple univariate problem, but O(nnz) construction time ⇒ O(ni).

DCD (Hsieh et al., 2008):
- Maintain the primal variable w = Σ_{i=1}^n αi xi, so that δ∗ = Ti(wᵀxi, αi)
- O(ni) construction time, where ni = nnz of xi
- O(ni) maintenance cost: w ← w + δ∗xi

slide-8
SLIDE 8

Dual Coordinate Descent

Stochastic Dual Coordinate Descent

For t = 1, 2, . . .

1. Randomly pick an index i
2. Compute wᵀxi
3. Update αi ← αi + δ∗ where δ∗ = Ti(wᵀxi, αi)
4. Update w ← w + δ∗xi

Implemented in LIBLINEAR: Linear SVM (Hsieh et al., 2008), multi-class SVM (Keerthi et al., 2008), Logistic regression (Yu et al., 2011). Analysis: (Nesterov et al., 2012; Shalev-Shwartz et al., 2013)
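The four steps above can be sketched for the hinge-loss SVM, where Ti has the closed form of a clipped gradient step (the closed form is an assumption here; the slides only define Ti abstractly). Labels are folded into the examples, so 0 ≤ αi ≤ C:

```python
import numpy as np

def dcd_svm(X, y, C=1.0, epochs=20, seed=0):
    """Serial stochastic dual coordinate descent, sketched for the
    hinge-loss SVM dual. Labels are folded into the rows (z_i = y_i x_i),
    so the dual constraint is 0 <= alpha_i <= C and w = sum_i alpha_i z_i."""
    rng = np.random.default_rng(seed)
    Z = X * y[:, None]
    sqnorm = (Z ** 2).sum(axis=1)        # ||x_i||^2, precomputed once
    n, d = Z.shape
    alpha = np.zeros(n)
    w = np.zeros(d)                      # maintained primal variable
    for _ in range(epochs):
        for i in rng.permutation(n):     # step 1: pick an index
            if sqnorm[i] == 0.0:
                continue
            g = w @ Z[i] - 1.0           # step 2: compute w^T x_i (dual gradient)
            # step 3: delta* = T_i(w^T x_i, alpha_i), a clipped coordinate step
            delta = np.clip(alpha[i] - g / sqnorm[i], 0.0, C) - alpha[i]
            alpha[i] += delta
            w += delta * Z[i]            # step 4: maintain w = sum_i alpha_i x_i
    return w, alpha
```

With a sparse representation, the two lines touching Z[i] are exactly the O(ni) construction and maintenance costs from the previous slide; the dense NumPy version above pays O(d) instead.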


slide-9
SLIDE 9

Dual Coordinate Descent

[Animation, one frame per slide through slide 27: CPU1 (i = 3) executes a single DCD iteration as elementary register operations on shared memory holding α = (0.1, 0.2, 0.3, 0.4, 0.5, 0.6), the examples x1, . . . , x6, and w. It first computes wᵀx3 by repeatedly loading a coordinate (x3)j into R2 and wj into R1 and accumulating R3 = R3 + R1×R2; then computes δ∗ = T3(wᵀx3, α3) into a register; then updates α3 ← α3 + δ∗ and saves it back to memory; finally it updates w coordinate by coordinate, wj ← wj + δ∗(x3)j, loading, multiply-adding, and saving each wj in turn.]

slide-28
SLIDE 28

Parallel DCD in Shared-memory Multi-core System

Serial DCD updates: For t = 1, 2, . . .

1. Randomly pick an index i
2. Compute wᵀxi
3. Update αi ← αi + δ∗ where δ∗ = Ti(wᵀxi, αi)
4. Update w ← w + δ∗xi

slide-29
SLIDE 29

Parallel DCD in Shared-memory Multi-core System

Parallel DCD updates: each thread repeatedly performs the following. For t = 1, 2, . . .

1. Randomly pick an index i
2. Compute wᵀxi
3. Update αi ← αi + δ∗ where δ∗ = Ti(wᵀxi, αi)
4. Update w ← w + δ∗xi

Easy to implement using OpenMP; α and w are stored in shared memory.

Distributed dual coordinate descent instead gives each machine a local copy of α and w (Yang, 2013; Jaggi et al., 2014; Lee and Roth, 2015; Ma et al., 2015).

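A minimal threaded sketch of these shared-memory updates (Python threads stand in for the OpenMP loop; the hinge-loss closed form for Ti is an assumption, as is partitioning the indices so each thread owns its own αi). Nothing protects w, so both correctness issues discussed on the following slides can occur:

```python
import threading
import numpy as np

def parallel_dcd(Z, C=1.0, epochs=10, n_threads=2, seed=0):
    """Each thread runs the serial DCD loop on shared alpha and w with no
    synchronization. Assumes hinge loss with labels folded into the rows
    of Z, and gives each thread a disjoint slice of the indices so only
    w is contended."""
    n, d = Z.shape
    sqnorm = (Z ** 2).sum(axis=1)
    alpha = np.zeros(n)                  # shared
    w = np.zeros(d)                      # shared, updated without locks

    def worker(indices, tid):
        rng = np.random.default_rng(seed + tid)
        for _ in range(epochs):
            for i in rng.permutation(indices):
                if sqnorm[i] == 0.0:
                    continue
                g = w @ Z[i] - 1.0       # may be an inconsistent read of w
                delta = np.clip(alpha[i] - g / sqnorm[i], 0.0, C) - alpha[i]
                alpha[i] += delta
                w += delta * Z[i]        # conflicting writes can be lost

    parts = np.array_split(np.arange(n), n_threads)
    threads = [threading.Thread(target=worker, args=(p, t))
               for t, p in enumerate(parts)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return w, alpha
```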

slide-32
SLIDE 32

Parallel Dual Coordinate Descent: Two Issues for Correctness

- Inconsistent Read of w
- Conflict Write of w

slide-33
SLIDE 33

Inconsistent Read

Thread 2 reads w while thread 1 writes to w. There may not exist any α such that w = Σᵢ αᵢxᵢ.

The "bounded delay" analysis in (Liu and Wright, 2014) cannot be directly applied.

slide-34
SLIDE 34

Conflict Write

Thread 1 and 2 write to w simultaneously. Updates to w can be

  • verwritten, so the

converged solution ˆ w and ˆ α may be inconsistent: ˆ w =

  • i

ˆ αixi.

CPU1: CPU2: w = w + 0.2 w = w + 0.5 OP R1 w OP R2 0.0 1.0 0.0 1 load w 1.0 1.0 load w 1.0 2 add 0.2 1.2 1.0 add 0.5 1.5 3 save w 1.2 1.2 1.5 4 1.2 1.5 save w 1.5 Cho-Jui Hsieh (UT Austin) PASSCoDe July 7, 2015 14 / 29
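The interleaving in the table can be replayed deterministically; each register is a local variable and w is the shared value:

```python
w = 1.0                 # shared memory
r1 = w                  # step 1, CPU1: load w
r2 = w                  # step 1, CPU2: load w
r1 = r1 + 0.2           # step 2, CPU1: add 0.2
r2 = r2 + 0.5           # step 2, CPU2: add 0.5
w = r1                  # step 3, CPU1: save w  -> w == 1.2
w = r2                  # step 4, CPU2: save w  -> w == 1.5
# CPU1's +0.2 is lost: w is 1.5, not the 1.7 both updates would give.
```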

slide-35
SLIDE 35

Dual Coordinate Descent in Parallel

[Animation, one frame per slide through slide 61: CPU1 (i = 1) and CPU2 (i = 6) each run a DCD iteration concurrently on the shared α and w, again as elementary load / multiply-add / save register operations. While CPU1 is still saving its update w ← w + δ∗x1 coordinate by coordinate, CPU2 loads coordinates of w to compute wᵀx6, so some of the values it reads already contain CPU1's update and some do not: an inconsistent read of w. Later both CPUs hold the same coordinate w6 in a register at once and save their results back; one save overwrites the other, so an update is lost: a conflict write of w.]

slide-62
SLIDE 62

PASSCoDe-Lock

Each thread repeatedly performs the following updates. For t = 1, 2, . . .

1. Randomly pick an index i
2. Lock {wj | (xi)j ≠ 0}
3. Compute wᵀxi
4. Update αi ← αi + δ∗ where δ∗ = Ti(wᵀxi, αi)
5. Update w ← w + δ∗xi
6. Unlock the variables.
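A sketch of the locking discipline in steps 2 and 6, assuming one lock per coordinate of w; acquiring the locks in sorted coordinate order (an added convention, not stated on the slide) avoids deadlock between threads:

```python
import threading
import numpy as np

d = 3
w = np.zeros(d)
locks = [threading.Lock() for _ in range(d)]    # one lock per w_j

def locked_update(x_i, delta):
    """Hold every lock in N(i) = {j : (x_i)_j != 0} across the whole
    update, so both the read of w and the write-back are consistent.
    Sorted acquisition order prevents deadlock."""
    nz = sorted(np.flatnonzero(x_i))
    for j in nz:
        locks[j].acquire()
    try:
        # steps 3-5 (compute w^T x_i, delta*, and alpha_i's update) would
        # run here; this sketch only shows the protected write-back.
        for j in nz:
            w[j] += delta * x_i[j]
    finally:
        for j in reversed(nz):
            locks[j].release()
```

Holding all of N(i) across the whole update is what makes PASSCoDe-Lock safe, and also why it scales poorly, as the next slide's timings show.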

slide-63
SLIDE 63

How to Resolve the Issues

Three PASSCoDe approaches:
- lock: acquire locks for all necessary wj before the update

                | inconsistent read | conflict write
PASSCoDe-Lock   |     resolved      |    resolved

Scaling (on rcv1 with 100 epochs):

# threads | Lock
    2     | 98.03s / 0.27x
    4     | 106.11s / 0.25x
   10     | 114.43s / 0.23x

slide-64
SLIDE 64

PASSCoDe-Atomic

Each thread repeatedly performs the following updates. For t = 1, 2, . . .

1. Randomly pick an index i
2. Compute wᵀxi
3. Update αi ← αi + δ∗ where δ∗ = Ti(wᵀxi, αi)
4. For each j ∈ N(i): update wj ← wj + δ∗(xi)j atomically
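CPython has no atomic floating-point add, so in the sketch below a per-coordinate lock around the single add stands in for the atomic operation of step 4 (the actual implementation would use a hardware atomic, e.g. an OpenMP atomic pragma). Unlike PASSCoDe-Lock, nothing is held across the read of w, so inconsistent reads remain possible:

```python
import threading
import numpy as np

d = 4
w = np.zeros(d)
locks = [threading.Lock() for _ in range(d)]    # stand-ins for atomic adds

def atomic_add(j, val):
    with locks[j]:              # make the read-modify-write of w_j indivisible
        w[j] += val

def apply_update(delta, x_i):
    # step 4: for each j in N(i), update w_j atomically; reads of w
    # elsewhere are NOT protected, so inconsistent reads can still occur.
    for j in np.flatnonzero(x_i):
        atomic_add(j, delta * x_i[j])
```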


slide-65
SLIDE 65

How to Resolve the Issues

Three PASSCoDe approaches:
- lock: acquire locks for all necessary wj before the update
- atomic: apply an atomic operation for wj ← wj + δ∗(xi)j

                | inconsistent read | conflict write
PASSCoDe-Lock   |     resolved      |    resolved
PASSCoDe-Atomic |     remained      |    resolved

Scaling (on rcv1 with 100 epochs):

# threads | Lock            | Atomic
    2     | 98.03s / 0.27x  | 15.28s / 1.75x
    4     | 106.11s / 0.25x | 8.35s / 3.20x
   10     | 114.43s / 0.23x | 3.86s / 6.91x

slide-66
SLIDE 66

Analysis for PASSCoDe-Atomic

Atomic operations guarantee that all updates to w are performed eventually, so ŵ = Σ_{i=1}^n α̂ᵢxᵢ holds for the output (ŵ, α̂).

Bounded delay assumption (to handle the inconsistent read of w): all updates to w issued more than τ iterations ago have been performed.

Theorem. Under certain conditions on τ, PASSCoDe-Atomic has a global linear convergence rate in expectation:
$$E\big[D(\alpha^{j+1}) - D(\alpha^*)\big] \le \eta\, E\big[D(\alpha^{j}) - D(\alpha^*)\big]$$

Our analysis covers logistic regression and SVM with hinge loss (where the dual problem is not strictly convex).

slide-67
SLIDE 67

PASSCoDe-Wild

Each thread repeatedly performs the following updates. For t = 1, 2, . . .

1. Randomly pick an index i
2. Compute wᵀxi
3. Update αi ← αi + δ∗ where δ∗ = Ti(wᵀxi, αi)
4. Update w ← w + δ∗xi

slide-68
SLIDE 68

How to Resolve the Issues

Three PASSCoDe approaches:
- lock: acquire locks for all necessary wj before the update
- atomic: apply an atomic operation for wj ← wj + δ∗(xi)j
- wild: do nothing to resolve either issue

                | inconsistent read | conflict write
PASSCoDe-Lock   |     resolved      |    resolved
PASSCoDe-Atomic |     remained      |    resolved
PASSCoDe-Wild   |     remained      |    remained

Scaling (on rcv1 with 100 epochs):

# threads | Lock            | Atomic          | Wild
    2     | 98.03s / 0.27x  | 15.28s / 1.75x  | 14.08s / 1.90x
    4     | 106.11s / 0.25x | 8.35s / 3.20x   | 7.61s / 3.50x
   10     | 114.43s / 0.23x | 3.86s / 6.91x   | 3.59s / 7.43x

slide-69
SLIDE 69

Analysis for PASSCoDe-Wild

Some updates are missing due to memory conflicts, so for the final (ŵ, α̂): ŵ ≠ Σ_{i=1}^n α̂ᵢxᵢ.

Construct w̄ from the final α̂: w̄ = Σ_{i=1}^n α̂ᵢxᵢ. Which one should be used for prediction, ŵ or w̄?

Prediction accuracy (%):

dataset | # threads |  ŵ   |  w̄   | LIBLINEAR
news20  |     4     | 97.1 | 96.1 | 97.1
        |     8     | 97.2 | 93.3 |
covtype |     4     | 67.8 | 38.0 | 66.3
        |     8     | 67.6 | 38.0 |
rcv1    |     4     | 97.7 | 97.5 | 97.7
        |     8     | 97.7 | 97.4 |
webspam |     4     | 99.1 | 93.1 | 99.1
        |     8     | 99.1 | 88.4 |
kddb    |     4     | 88.8 | 79.7 | 88.8
        |     8     | 88.8 | 87.7 |

Question: why is ŵ better than w̄?

slide-70
SLIDE 70

Backward Analysis for PASSCoDe-Wild

Recall the primal problem:
$$w^* = \arg\min_{w} P(w) := \frac{1}{2}\|w\|^2 + \sum_{i=1}^n \ell_i(w^T x_i)$$

Theorem. Let ε be the error caused by the memory conflicts. Then
$$\hat{w} = \arg\min_{w} \hat{P}(w) := \frac{1}{2}\|w + \varepsilon\|^2 + \sum_{i=1}^n \ell_i(w^T x_i)$$
$$\bar{w} = \arg\min_{w} \bar{P}(w) := \frac{1}{2}\|w\|^2 + \sum_{i=1}^n \ell_i\big((w - \varepsilon)^T x_i\big)$$

P̂(w) is the problem with the perturbation on the regularization term; P̄(w) is the problem with the perturbation on the prediction term.

slide-71
SLIDE 71

Datasets and Experimental Settings

Datasets:

dataset |     n      |   ñ     |     d      |   d̄    | C
news20  | 16,000     | 3,996   | 1,355,191  | 455.5  | 2
rcv1    | 677,399    | 20,242  | 47,236     | 73.2   | 1
webspam | 280,000    | 70,000  | 16,609,143 | 3727.7 | 1
kddb    | 19,264,097 | 748,401 | 29,890,095 | 29.4   | 1

Compared implementations:
- LIBLINEAR: serial baseline
- PASSCoDe-Wild and PASSCoDe-Atomic: our methods
- CoCoA: a multi-core version of (Jaggi et al., 2014)
- AsySCD: (Liu and Wright, 2014)

Machine: Intel multi-core machine with 256 GB memory.

slide-72
SLIDE 72

Convergence in terms of Walltime


slide-73
SLIDE 73

Accuracy


slide-74
SLIDE 74

Speedup


slide-75
SLIDE 75

Conclusions

PASSCoDe: a simple but effective asynchronous dual coordinate descent method.

Analysis of three variants:
- PASSCoDe-Lock
- PASSCoDe-Atomic: established global linear convergence
- PASSCoDe-Wild: backward analysis

Future work: extend the analysis to L1-regularized problems:
- LASSO
- L1-regularized logistic regression