

SLIDE 1

CS345a: Data Mining
Jure Leskovec and Anand Rajaraman

Stanford University

SLIDE 2

 HW3 is out
 Poster session is on the last day of classes:
  • Thu March 11 at 4:15
 Reports are due March 14
 Final is March 18 at 12:15
  • Open book, open notes
  • No laptop

SLIDE 3

 Which is the best linear separator?

 Data:
  • Examples: (x_1, y_1), …, (x_n, y_n)
 Example i:
  • x_i = (x_i^(1), …, x_i^(d))
  • y_i ∈ {-1, +1}
 Inner product: w · x = Σ_j w^(j) x^(j)

[Figure: '+' and '-' examples in the plane with several candidate linear separators]

SLIDE 4

 Confidence of example i:

   γ_i = (w · x_i) y_i

 For all datapoints:

   γ_i ≥ γ

[Figure: '+' and '-' points around the hyperplane w·x = 0]
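To make the definitions concrete, here is a minimal sketch in code; the function name and the toy data are illustrative, not from the slides.

```python
import numpy as np

# A minimal sketch: per-example confidence gamma_i = (w . x_i) y_i and the
# margin gamma = min_i gamma_i of a linear separator w.
def margin(w, X, y):
    conf = X.dot(w) * y          # gamma_i for every example
    return conf.min()            # gamma = min_i gamma_i

# Toy usage on two separable points:
X = np.array([[1.0, 1.0], [-1.0, -1.0]])
y = np.array([1, -1])
print(margin(np.array([1.0, 1.0]), X, y))   # 2.0
```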

SLIDE 5

 Maximize the margin:
  • Good according to intuition, theory & practice

   max_{γ,w} γ
   s.t. ∀i: y_i (w · x_i) ≥ γ

[Figure: hyperplane w·x = 0 with margin γ separating '+' and '-' points]

SLIDE 6

 Canonical hyperplanes:
  • Projection of x_i on the plane w·x = 0:

   x_i' = x_i - γ_i · w/‖w‖

SLIDE 7

 Maximizing the margin:

   max_{γ,w} γ
   s.t. ∀i: y_i (w · x_i) ≥ γ

 Equivalent:

   min_w ‖w‖²
   s.t. ∀i: y_i (w · x_i) ≥ 1

 SVM with "hard" constraints

SLIDE 8

 If the data is not separable, introduce a penalty for mistakes:

   min_w ½ w·w + C · (# of mistakes)
   s.t. ∀i: y_i (w · x_i) ≥ 1

 Choose C based on cross validation
 How to penalize mistakes?

[Figure: nearly separable '+' and '-' points around w·x = 0; a few points fall on the wrong side]

SLIDE 9

 Introduce slack variables ξ_i:

   min_{w,ξ} ½ w·w + C Σ_{i=1}^{n} ξ_i
   s.t. ∀i: y_i (w · x_i) ≥ 1 - ξ_i,  ξ_i ≥ 0

 Hinge loss: for each datapoint,
  • If margin > 1, don't care
  • If margin < 1, pay linear penalty

[Figure: points violating the margin around w·x = 0 pay slack ξ_i]

SLIDE 10

 SVM in the "natural" form:

   argmin_w f(w)

  • Where:

   f(w) = ½ w·w + C Σ_{i=1}^{n} max{0, 1 - y_i (w · x_i)}
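A minimal sketch of f(w) in code; the array shapes and the function name are illustrative.

```python
import numpy as np

# A minimal sketch of f(w) = 1/2 w.w + C * sum_i max{0, 1 - y_i (w . x_i)}.
# X is an n-by-d matrix of examples, y a vector of +/-1 labels.
def svm_objective(w, X, y, C):
    hinge = np.maximum(0.0, 1.0 - y * X.dot(w))   # per-example hinge loss
    return 0.5 * w.dot(w) + C * hinge.sum()
```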

SLIDE 11

 Use a quadratic solver:
  • Minimize a quadratic function
  • Subject to linear constraints

   min_{w,ξ} ½ w·w + C Σ_{i=1}^{n} ξ_i
   s.t. ∀i: y_i (w · x_i) ≥ 1 - ξ_i

 Stochastic gradient descent:
  • Minimize:

   f(w) = ½ w·w + C Σ_{i=1}^{n} max{0, 1 - y_i (w · x_i)}

  • Update:

   f'(w) = w + C Σ_i L'(x_i · w, y_i)
   w_{t+1} = w_t - η f'(w_t)
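A hedged sketch of one SGD epoch for this objective; visiting examples in random order and applying the regularizer gradient at every step is one common variant, not necessarily the exact scheme on the slide.

```python
import numpy as np

# One SGD epoch for f(w) above: at each example take the regularizer
# gradient w plus the hinge subgradient -C y_i x_i when the margin is violated.
def sgd_epoch(w, X, y, C, eta):
    for i in np.random.permutation(X.shape[0]):
        grad = w.copy()                      # gradient of 1/2 w.w
        if y[i] * X[i].dot(w) < 1.0:         # hinge term active?
            grad -= C * y[i] * X[i]
        w = w - eta * grad
    return w
```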

SLIDE 12

 Example by Leon Bottou:
  • Reuters RCV1 document corpus
  • m = 781k training examples, 23k test examples
  • d = 50k features
 Training time:

[Figure: training-time comparison]

SLIDE 13

[Figure]

SLIDE 14

 What if we subsample the dataset?
  • SGD on the full dataset vs.
  • Conjugate gradient on n training examples

SLIDE 15

 Need to choose the learning rate η:

   w_{t+1} = w_t - η_t L'(w_t)

 Leon suggests:
  • Select a small subsample
  • Try various rates η
  • Pick the one that most reduces the loss
  • Use that η for the next 100k iterations on the full dataset
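A hedged sketch of that recipe, reusing the svm_objective and sgd_epoch sketches from the earlier slides; the subsample size and the candidate rates are illustrative assumptions.

```python
import numpy as np

# Try several rates on a small subsample and keep the one that most
# reduces the loss; then use it for the long run on the full dataset.
def pick_eta(X, y, C, candidates=(1e-4, 1e-3, 1e-2, 1e-1)):
    idx = np.random.choice(len(X), size=min(1000, len(X)), replace=False)
    Xs, ys = X[idx], y[idx]                       # small subsample
    best_eta, best_loss = None, np.inf
    for eta in candidates:
        w = sgd_epoch(np.zeros(X.shape[1]), Xs, ys, C, eta)
        loss = svm_objective(w, Xs, ys, C)        # which rate reduced loss most?
        if loss < best_loss:
            best_eta, best_loss = eta, loss
    return best_eta                               # use for ~100k iterations
```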

SLIDE 16

 Stopping criteria: how many iterations of SGD?
  • Early stopping with cross validation:
    • Create a validation set
    • Monitor the cost function on the validation set
    • Stop when the loss stops decreasing
  • Early stopping a priori:
    • Extract two disjoint subsamples A and B of the training data
    • Determine the number of epochs k by training on A, stopping by validating on B
    • Train for k epochs on the full dataset
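A hedged sketch of the a-priori variant, again reusing the earlier sgd_epoch and svm_objective sketches.

```python
import numpy as np

# Early stopping a priori: pick the epoch count k on subsample A by
# validating on B, then train for k epochs on the full dataset.
def epochs_by_validation(XA, yA, XB, yB, C, eta, max_epochs=100):
    w = np.zeros(XA.shape[1])
    best_k, best_loss = 0, np.inf
    for k in range(1, max_epochs + 1):
        w = sgd_epoch(w, XA, yA, C, eta)
        loss = svm_objective(w, XB, yB, C)   # held-out cost after k epochs
        if loss < best_loss:
            best_k, best_loss = k, loss
        else:
            break                            # loss stopped decreasing
    return best_k
```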

SLIDE 17

 Kernel function: K(x_i, x_j) = Φ(x_i) · Φ(x_j)
 Does the SVM kernel trick still work?
 Yes (but not without a price):
  • Represent w with its kernel expansion:

   w = Σ_i α_i Φ(x_i)

  • Usually the gradient of the loss is a multiple of a single Φ(x_j):

   dL(w)/dw ∝ Φ(x_j)

  • Then update w at epoch t by combining:

   α_t = (1 - ηλ) α_t + …
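A hedged sketch of such an update for the hinge loss, keeping w only through its coefficients α over a precomputed Gram matrix K; the decay-all/bump-one form is one standard variant, not necessarily the slide's exact update.

```python
import numpy as np

# SGD in the kernel expansion w = sum_j alpha_j Phi(x_j) for the
# lambda-regularized hinge loss; K is the precomputed Gram matrix.
def kernel_sgd_step(alpha, K, y, i, eta, lam):
    alpha *= (1.0 - eta * lam)       # regularizer: shrink w, i.e. decay alpha
    score = K[i].dot(alpha)          # w . Phi(x_i) under the expansion
    if y[i] * score < 1.0:           # margin violated: add eta*y_i*Phi(x_i),
        alpha[i] += eta * y[i]       # which is a bump to alpha_i
    return alpha
```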

SLIDE 18

[Shalev‐Shwartz et al. ICML '07]

 We had before:

   min_{w,ξ} ½ w·w + C Σ_i ξ_i,  s.t. ∀i: y_i (w · x_i) ≥ 1 - ξ_i

 Can replace C with λ:

   min_w (λ/2) w·w + (1/m) Σ_i max{0, 1 - y_i (w · x_i)}

SLIDE 19

[Shalev‐Shwartz et al. ICML '07]

 Each step works on a subsample A_t of the training data, |A_t| = S:
  • |A_t| = m: subgradient method
  • |A_t| = 1: stochastic gradient
  • Each step is followed by a subgradient projection

SLIDE 20

[Shalev‐Shwartz et al. ICML '07]

 Choosing |A_t| = 1 and a linear kernel over R^n:
 Theorem [Shalev‐Shwartz et al. '07]:
  • Run-time required for Pegasos to find an ε-accurate solution with prob. > 1-δ
 Run-time depends on the number of features n
  • Does not depend on the number of examples m
  • Depends on the "difficulty" of the problem (λ and ε)
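A hedged sketch of Pegasos with |A_t| = 1; the step size 1/(λt) and the projection onto the ball of radius 1/√λ follow the cited paper, the rest is illustrative.

```python
import numpy as np

# Pegasos with one random example per step: hinge subgradient step with
# eta_t = 1/(lam*t), then projection onto the ball of radius 1/sqrt(lam).
def pegasos(X, y, lam, T):
    m, n = X.shape
    w = np.zeros(n)
    for t in range(1, T + 1):
        i = np.random.randint(m)             # A_t = {i}
        eta = 1.0 / (lam * t)
        w = (1.0 - eta * lam) * w            # regularizer step
        if y[i] * X[i].dot(w) < 1.0:         # hinge subgradient active?
            w += eta * y[i] * X[i]
        norm = np.linalg.norm(w)             # projection step
        if norm > 0:
            w *= min(1.0, 1.0 / (np.sqrt(lam) * norm))
    return w
```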

SLIDE 21

 SVM and structured output prediction
 Setting:
  • Assume: data is i.i.d. from some distribution P(X, Y)
  • Given: training sample (x_1, y_1), …, (x_n, y_n)
  • Goal: find a function from input space X to output space Y, where the outputs are complex objects

SLIDE 22

 Examples:
  • Natural language parsing
    • Given a sequence of words x, predict the parse tree y
    • Dependencies come from structural constraints, since y has to be a tree

[Figure: x = "The dog chased the cat"; y = parse tree S → NP VP, with NP → Det N and VP → V NP]

SLIDE 23

 Approach: view as a multi-class classification task
  • Every complex output is one class
 Problems:
  • Exponentially many classes!
  • How to predict efficiently?
  • How to learn efficiently?
  • Potentially huge model!
  • Manageable number of features?

[Figure: x = "The dog chased the cat" scored against candidate parse trees y_1, y_2, …, y_k]

SLIDE 24

 Feature vector Ψ(x, y) describes the match between x and y
 Learn a single weight vector w and rank by w · Ψ(x, y)
 Hard-margin optimization problem:

   min_w ½ ‖w‖²
   s.t. ∀i, ∀y ≠ y_i: w · Ψ(x_i, y_i) ≥ w · Ψ(x_i, y) + 1

SLIDE 25

[Yue et al., SIGIR '07]

 Ranking:
  • Given a query x, predict a ranking y
  • Dependencies between results (e.g. avoid redundant hits)
  • Loss function over rankings (e.g. AvgPrec)

[Figure: query x = "SVM" and a predicted ranking y:
 1. Kernel-Machines  2. SVM-Light  3. Learning with Kernels  4. SV Meppen Fan Club
 5. Service Master & Co.  6. School of Volunteer Management  7. SV Mattersburg Online  …]

SLIDE 26

[Yue et al., SIGIR '07]

 Given:
  • a complete (weak) ranking of documents for a query
 Predict:
  • a ranking for the input query and document set
 The true labeling is a ranking where the relevant documents are all ranked in the front
 An incorrect labeling is any other ranking
 There are intractably many rankings, thus an intractable number of constraints!

SLIDE 27

[Yue et al., SIGIR '07]

 Let x be a set of documents/query examples
 Let y denote a weak ranking (pairwise orderings): y_ij ∈ {-1, +1}
 SVM objective function:

   min_w ½ ‖w‖² + C Σ_i ξ_i

 Constraints are defined for each incorrect ranking y' over the set of documents x:

   ∀y' ≠ y: wᵀΨ(y, x) ≥ wᵀΨ(y', x) + Δ(y', y) - ξ

  • Ψ is the match between target and prediction

SLIDE 28

[Yue et al., SIGIR '07]

 Loss:
  • Average precision is the average of the precision scores at the rank locations of each relevant document.
 Ex: a ranking with relevant documents at ranks 1, 3, and 5 has average precision

   (1/3) · (1/1 + 2/3 + 3/5) ≈ 0.76
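A small sketch that reproduces this number; the function name and the 1-based rank convention are illustrative.

```python
# Average precision: mean of precision-at-rank over the ranks that hold
# relevant documents (ranks are 1-based).
def avg_prec(relevant_ranks, num_relevant):
    return sum((i + 1) / r for i, r in enumerate(sorted(relevant_ranks))) / num_relevant

print(avg_prec([1, 3, 5], 3))   # (1/1 + 2/3 + 3/5) / 3 = 0.7555... ~ 0.76
```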

SLIDE 29

[Yue et al., SIGIR '07]

 Minimize:

   ½ ‖w‖² + C Σ_i ξ_i

 subject to:

   ∀y' ≠ y: wᵀΨ(y, x) ≥ wᵀΨ(y', x) + Δ(y', y) - ξ

 where:

   Ψ(y', x) = Σ_{i: rel} Σ_{j: !rel} y'_ij (x_i - x_j)

 and:

   Δ(y', y) = 1 - AvgPrec(y')

 After learning w, predict by sorting on w · x_i
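A hedged sketch of this Ψ; the list/array conventions are illustrative.

```python
import numpy as np

# Pairwise joint feature map Psi(y', x) = sum over relevant i and
# non-relevant j of y'_ij (x_i - x_j), where y'_ij = +1 if i is ranked
# above j in the ordering, else -1.
def psi(order, X, rel):
    """order: doc indices best-first; rel: set of relevant indices."""
    position = {doc: p for p, doc in enumerate(order)}
    f = np.zeros(X.shape[1])
    for i in rel:
        for j in (d for d in order if d not in rel):
            y_ij = 1.0 if position[i] < position[j] else -1.0
            f += y_ij * (X[i] - X[j])
    return f
```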

SLIDE 30

[Yue et al., SIGIR '07]

 Original SVM problem:
  • Exponentially many constraints
  • Most are dominated by a small set of "important" constraints
 Structural SVM approach:
  • Repeatedly finds the next most violated constraint…
  • …until the set of constraints is a good approximation


SLIDE 34

 Cutting-plane training loop [Jo06] [JoFinYu08]:

   Input: training sample, C, tolerance ε
   REPEAT
     FOR each training example
       Compute the most violated constraint
       IF it is violated by more than ε
         Add the constraint to the working set
         Optimize StructSVM over the working set
       ENDIF
     ENDFOR
   UNTIL the working set has not changed during the iteration

SLIDE 35

 Cutting plane algorithm:
  • STEP 1: Solve the SVM objective function using only the current working set of constraints
  • STEP 2: Using the model learned in STEP 1, find the most violated constraint from the exponential set of constraints
  • STEP 3: If the constraint returned in STEP 2 is more violated than the most violated constraint in the working set by some small constant, add that constraint to the working set
  • Repeat STEPs 1-3 until no additional constraints are added
  • Return the most recent model that was trained in STEP 1
  • STEPs 1-3 are guaranteed to loop for at most a polynomial number of iterations [Tsochantaridis et al. 2005]
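A hedged skeleton of this loop; solve_qp and find_most_violated are placeholder callables (the QP solver of STEP 1 and the oracle of STEP 2), and the constraint objects with a violation method are an illustrative interface, not a real library's API.

```python
# Cutting-plane loop: alternate solving the restricted QP and adding the
# most violated constraint, until nothing new is added.
def cutting_plane(examples, C, eps, solve_qp, find_most_violated, max_iters=1000):
    working_set = []
    w, xi = solve_qp(working_set, C)         # STEP 1 on an empty working set
    for _ in range(max_iters):
        added = False
        for x, y in examples:
            c = find_most_violated(w, x, y)  # STEP 2: the oracle
            if c.violation(w) > xi + eps:    # STEP 3: violated by more than eps?
                working_set.append(c)
                w, xi = solve_qp(working_set, C)
                added = True
        if not added:                        # no new constraints: done
            return w
    return w
```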

SLIDE 36

 Structural SVM is an oracle framework
 Requires a subroutine for finding the most violated constraint
  • Depends on the formulation of the loss function and the joint feature representation
 Exponential number of constraints!
 Efficient algorithm exists in the case of optimizing Mean Avg. Prec. (MAP):
  • MAP is invariant to the order of documents within a relevance class

SLIDE 37

[Yue et al., SIGIR '07]

   H(y'; w) = Δ(y', y) + Σ_{i: rel} Σ_{j: !rel} y'_ij (wᵀx_i - wᵀx_j)

 Observation:
  • MAP is invariant to the order of documents within a relevance class
  • Swapping two relevant or two non-relevant documents does not change MAP
 The joint SVM score is optimized by sorting by document score, w·x
 Reduces to finding an interleaving between two sorted lists of documents

SLIDE 38

   H(y'; w) = Δ(y', y) + Σ_{i: rel} Σ_{j: !rel} y'_ij (wᵀx_i - wᵀx_j)

 Finding the most violated constraint (see the sketch after this list):
  • Start with the perfect ranking
  • Consider swapping adjacent relevant/non-relevant documents
  • Find the best feasible ranking of the non-relevant document
  • Repeat for the next non-relevant document
  • Never want to swap past the previous non-relevant document
  • Repeat until all non-relevant documents have been considered
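It can help to see the oracle stated naively first: a brute-force sketch that enumerates all rankings and maximizes Δ(y', y) + w·Ψ(y', x), reusing the avg_prec and psi sketches from the earlier slides. This is exponential, of course; the greedy interleaving procedure above is what makes the MAP case tractable.

```python
import itertools
import numpy as np

# Brute-force oracle: argmax over all rankings of Delta(y', y) + w . Psi(y', x).
# Only feasible for tiny document sets; illustrates what the greedy
# interleaving computes efficiently.
def most_violated(w, X, rel, docs):
    best, best_val = None, -np.inf
    for order in itertools.permutations(docs):
        ranks = [p + 1 for p, d in enumerate(order) if d in rel]
        delta = 1.0 - avg_prec(ranks, len(rel))      # Delta = 1 - AvgPrec
        val = delta + w.dot(psi(list(order), X, rel))
        if val > best_val:
            best, best_val = order, val
    return best
```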


SLIDE 43

 SVM Formulation:
  • SVMs optimize a tradeoff between model complexity and MAP loss
  • Exponential number of constraints (one for each incorrect ranking)
  • Structural SVMs find a small subset of important constraints
  • Requires a sub-procedure to find the most violated constraint

 Find Most Violated Constraint:
  • Loss function is invariant to re-ordering of relevant documents
  • SVM score imposes an ordering of the relevant documents
  • Finding an interleaving of two sorted lists
  • Loss function has certain monotonic properties
  • Efficient algorithm