  1. J. Vega. Asociación EURATOM/CIEMAT para Fusión. jesus.vega@ciemat.es. 7th FDPVA, Frascati (March 26-28, 2012).

  2. Outline: Concepts, Classification, Regression, Advanced methods.

  3. Technology: it gives stereotyped solutions to stereotyped problems.
     Basic science: the accumulation of knowledge to explain a phenomenon.
     Applied science: the application of scientific knowledge to a particular environment.
     - Machine learning can increase our comprehension of plasma physics.

  4. Learning does not mean 'learning by heart' (any computer can memorize).
     Learning means 'generalization capability': we learn from some samples and predict for other samples.

  5. The learning problem is the problem of finding a desired dependence (function) using a limited number of observations (training data).
     - Classification: the function can represent the separation frontier between two classes.
     - Regression: the function can provide a fit to the data.
     [Figure: toy examples of a classification boundary separating 'o' and 'x' samples, and of a regression curve fitted to scattered data.]
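A minimal sketch of the two settings (the toy data, tooling and parameter values here are illustrative, not taken from the talk): a classifier is learned from labelled samples and then predicts labels for new points, while a regression model fits a function to scattered observations.

```python
import numpy as np
from sklearn.svm import SVC

# Classification: learn a frontier separating two classes from labelled samples.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(3, 1, (20, 2))])
y = np.array([0] * 20 + [1] * 20)
clf = SVC(kernel="linear").fit(X, y)
print(clf.predict([[1.5, 1.5]]))          # label predicted for a new sample

# Regression: fit a function to scattered observations.
t = np.linspace(0, 1, 30)
obs = np.sin(2 * np.pi * t) + rng.normal(0, 0.1, t.size)
coeffs = np.polyfit(t, obs, deg=3)        # cubic fit to the data
print(np.polyval(coeffs, 0.25))           # prediction at a new point
```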

  6. The general model of learning from examples is described through three components:
     - A generator of random vectors x ∈ R^n, drawn from a fixed but unknown probability distribution p(x).
     - A supervisor that returns an output y for every input x according to a fixed but unknown conditional distribution p(y|x).
     - A learning machine that implements a set of functions f(x, α), producing predictions ŷ = f(x, α).
     (x_i, y_i), i = 1, ..., N: training samples.
     The problem of learning is that of choosing, from the given set of functions f(x, α), the one that best approximates the supervisor's response: ŷ = (ŷ_1, ŷ_2, ..., ŷ_N) "close" to y = (y_1, y_2, ..., y_N).

  7. Main hypothesis:
     - The training set (x_i, y_i), i = 1, ..., N, is made up of independent and identically distributed (iid) observations drawn according to p(x, y) = p(y|x) p(x).
     Loss function L(y, f(x, α)):
     - It measures the quality of the approximation made by the learning algorithm, i.e. the discrepancy between the response y of the supervisor and the response f(x, α) of the learning machine. Its values are ≥ 0.
     Risk functional:
     R(α) = ∫ L(y, f(x, α)) p(x, y) dx dy
     The goal of a learning process is to find the function f(x, α_0) that minimizes R(α) (over the class of functions f(x, α)) in the situation where p(x, y) is unknown and the only available information is contained in the training set.
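Since p(x, y) is unknown, in practice R(α) is approximated by the sample average of the loss over the training set. A minimal numpy sketch, where the squared loss and the linear model are illustrative choices (not prescribed by the slides):

```python
import numpy as np

def empirical_risk(loss, f, X, y, alpha):
    """Sample-average approximation of the risk functional R(alpha)."""
    return np.mean([loss(yi, f(xi, alpha)) for xi, yi in zip(X, y)])

# Illustrative choices: squared loss and a linear model f(x, alpha) = alpha . x
squared_loss = lambda y, y_hat: (y - y_hat) ** 2
linear_model = lambda x, alpha: np.dot(alpha, x)

X = np.random.rand(100, 3)
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * np.random.randn(100)
print(empirical_risk(squared_loss, linear_model, X, y, np.array([1.0, -2.0, 0.5])))
```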

  8. R(α) = ∫ L(y, f(x, α)) p(x, y) dx dy
     - Pattern recognition (or classification): L(y, f(x, α)) = 0 if y = f(x, α), and 1 if y ≠ f(x, α).
     - Regression estimation: L(y, f(x, α)) = (y − f(x, α))².
     - Density estimation: L(p(x, α)) = −log p(x, α).
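The three loss functions written out in Python (a direct transcription of the formulas above, not part of the original slides):

```python
import numpy as np

def zero_one_loss(y, f_x):
    """Pattern recognition: 0 if the prediction matches the label, 1 otherwise."""
    return 0.0 if y == f_x else 1.0

def squared_loss(y, f_x):
    """Regression estimation: squared deviation between target and prediction."""
    return (y - f_x) ** 2

def log_loss(p_x):
    """Density estimation: negative log of the modelled density at x."""
    return -np.log(p_x)
```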

  9. Example of a set of indicator functions with a linear decision boundary:
     f(x, α) = 0 if x_2 < a·x_1 + b, and 1 if x_2 ≥ a·x_1 + b, with α = (a, b).
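A small sketch of this parametric family, whose decision boundary is the line x_2 = a·x_1 + b (the parameter values and test point are arbitrary):

```python
import numpy as np

def f(x, alpha):
    """Indicator of the half-plane above the line x2 = a*x1 + b, with alpha = (a, b)."""
    a, b = alpha
    return 0 if x[1] < a * x[0] + b else 1

alpha = (0.5, -1.0)                      # arbitrary slope and intercept
print(f(np.array([2.0, 3.0]), alpha))    # 1: the point lies above the line
```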

  10. [Equations: three further examples of parametric function classes f_1(x, α), f_2(x, α) and f_3(x, α).]

  11. [Figure]

  12. Dataset: (x_1, y_1), (x_2, y_2), ..., (x_i, y_i), ..., (x_N, y_N)
      x_i ∈ R^m: features of a distinctive nature (object description with attributes managed by computers).
      y_i ∈ {L_1, L_2, ..., L_K}: label of the sample x_i.
      Feature types:
      - Quantitative
        - Continuous-valued (length, pressure)
        - Discrete numerical (total basketball score, number of citizens in a town)
      - Qualitative (categorical)
        - Ordinal (education degree)
        - Nominal (profession, brand of a car)
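Qualitative features must be turned into numbers before a learning machine can use them. A small sketch (the feature values below are invented examples): ordinal features keep their natural order, while nominal ones are one-hot encoded so that no spurious ordering is introduced.

```python
import numpy as np

# Ordinal (categorical with an order): map to increasing integers.
degree_order = {"primary": 0, "secondary": 1, "bachelor": 2, "master": 3}
degrees = ["bachelor", "primary", "master"]
ordinal = np.array([degree_order[d] for d in degrees])

# Nominal (no order): one-hot encode, one column per category.
brands = ["ford", "toyota", "ford"]
categories = sorted(set(brands))
one_hot = np.array([[b == c for c in categories] for b in brands], dtype=float)
print(ordinal, one_hot, sep="\n")
```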

  13. Dataset: (x_1, y_1), (x_2, y_2), ..., (x_i, y_i), ..., (x_N, y_N)
      x_i ∈ R^m: features of a distinctive nature (object description with attributes managed by computers).
      y_i ∈ {L_1, L_2, ..., L_K}: known label of the sample x_i.
      Objective: to determine a separating function between classes (generalization) in order to predict the labels of new samples with known feature vectors ((x_{N+1}, y_{N+1}), (x_{N+2}, y_{N+2}), ...).
      [Figure: two decision boundaries for the same training data, one of them illustrating overfitting.]

  14. How good is a classifier?
      Dataset: (x_1, y_1), (x_2, y_2), ..., (x_N, y_N)
      - (x_i, y_i), i = 1, ..., J: training set
      - (x_i, y_i), i = J+1, ..., N: test set
      Training set: a model is created to make predictions. Given x_i, the model predicts y_i.
      Test set: model validation. The success rate is taken as the level of confidence, and it is assumed to be the same for all future samples.
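A minimal sketch of this split with scikit-learn (an illustration, not the tooling used in the talk): the model is fitted on the training part and the success rate is measured on the held-out test part.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(2.5, 1, (100, 2))])
y = np.array([0] * 100 + [1] * 100)

# (x_i, y_i), i = 1..J as training set; i = J+1..N as test set.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
model = SVC(kernel="linear").fit(X_train, y_train)

# Success rate on unseen samples, used as the confidence level of the classifier.
print(accuracy_score(y_test, model.predict(X_test)))
```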

  15. Multi-class problems: K > 2
      - They can be tackled as K binary problems, where each class is compared with the rest (one-versus-the-rest approach).
      [Figure: one-versus-the-rest decision regions for four classes c_1 to c_4, with ambiguity regions where the binary decisions overlap or none applies.]
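A minimal one-versus-the-rest sketch (toy data and scikit-learn classifiers are illustrative): K binary classifiers are trained, each separating one class from all the others, and ties or ambiguity regions are resolved by taking the largest decision value.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
centers = [(0, 0), (4, 0), (0, 4), (4, 4)]          # K = 4 classes
X = np.vstack([rng.normal(c, 1, (30, 2)) for c in centers])
y = np.repeat(np.arange(4), 30)

# One binary classifier per class: class k versus "not class k".
binary = [SVC(kernel="linear").fit(X, (y == k).astype(int)) for k in range(4)]

def predict(x):
    # Assign the class whose binary classifier is most confident; this also
    # resolves the regions where several (or no) classifiers fire.
    scores = [clf.decision_function([x])[0] for clf in binary]
    return int(np.argmax(scores))

print(predict([3.8, 0.2]))   # expected: class 1 (centred at (4, 0))
```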

  16. Examples of feature vectors:
      - Disruptions (features taken at successive times before the reference time t_s, with labels D/N, e.g. disruptive / non-disruptive):
        x_1 = (I_p(t_s), n_e(t_s), ...), y_1 ∈ {D, N}
        x_2 = (I_p(t_s − T), n_e(t_s − T), ...), y_2 ∈ {D, N}
        x_3 = (I_p(t_s − 2T), n_e(t_s − 2T), ...), y_3 ∈ {D, N}
      - L/H transition: x_i ∈ R^m, y_i ∈ {L, H}
      - Image classification: x_i is the set of pixels of an image, y_i ∈ {1, 2, 3}
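A sketch of how such feature vectors could be assembled from diagnostic signals sampled at t_s, t_s − T, t_s − 2T. The signal arrays, names and labels below are hypothetical placeholders, not real experimental data:

```python
import numpy as np

# Hypothetical signals sampled on a common time base (placeholders, not real data).
time = np.linspace(0, 20, 2001)                 # seconds
I_p = np.random.rand(time.size)                 # plasma current
n_e = np.random.rand(time.size)                 # electron density

def feature_vector(t):
    """Feature vector built from the signal values at time t."""
    i = np.searchsorted(time, t)
    return np.array([I_p[i], n_e[i]])

t_s, T = 15.0, 0.5                              # reference time and window step
X = np.vstack([feature_vector(t_s - k * T) for k in range(3)])
y = np.array(["D", "N", "N"])                   # example labels for the three samples
print(X, y, sep="\n")
```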

  17. Single classifiers:
      - Support Vector Machines (SVM)
      - Neural networks
      - Bayes decision theory (parametric and non-parametric methods)
      - Classification trees
      Combining classifiers

  18. Support Vector Machines: binary classifier.
      - It finds the optimal separating hyper-plane between classes.
      - Samples: (x_k, y_k), x_k ∈ R^n, k = 1, ..., N, y_k ∈ {+1, −1} (classes C_{+1} and C_{−1}).
      The separating hyper-plane is D(x) = w·x + b = 0; the distance (with sign) from a sample x_k to it is |D(x_k)| / ||w||.
      Maximum margin: 2τ, where τ is the distance from the hyper-plane to the closest samples: y_k D(x_k) / ||w|| ≥ τ, k = 1, ..., N.
      To find the optimal hyper-plane it is necessary to determine the vector w that maximizes the margin τ. There are infinite solutions due to the presence of a scale factor; to avoid this, one fixes τ ||w|| = 1. Maximizing the margin is then equivalent to minimizing ||w||.
      Optimization problem: min J(w) = ||w||², subject to y_k (w·x_k + b) ≥ 1, k = 1, ..., N.
      [Figure: two classes separated by the hyper-plane D(x) = 0, with the margin bounded by D(x) = +1 and D(x) = −1.]
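The constrained problem min ||w||² subject to y_k (w·x_k + b) ≥ 1 is what a standard SVM solver handles. A minimal sketch with scikit-learn on separable toy data, using a large C so that the soft-margin solver approximates the hard-margin case described above (tooling and data are illustrative):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 0.5, (20, 2)), rng.normal(2, 0.5, (20, 2))])
y = np.array([-1] * 20 + [+1] * 20)

# Large C penalises margin violations heavily, approximating the hard margin.
svm = SVC(kernel="linear", C=1e6).fit(X, y)

w = svm.coef_[0]                 # normal vector of the separating hyper-plane
b = svm.intercept_[0]
print("margin width:", 2.0 / np.linalg.norm(w))   # 2 / ||w||
```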

  19. N   Solution: a * *  y w x (x k , y k ), x k ∈R n , k = 1, ..., N, y∈C {+1} , C {-1} } i i i  i 1 a i are the Lagrange multipliers Samples associated to a i ≠ 0 are C {+1}  called “ support vectors ”    * * ( ) b 0 w x  Support  a * y w x vectors i i i support vectors The rest of training samples are irrelevant to classify new samples C {-1} The constant b is obtained from any  condition (Karush-Kuhn-Tucker) w   a        y ( ) b 1 0, i 1 , ,N w x   i i i   is the distance (with sign) from X to the separating hyper-plane * * D ( ) · b x w x Given to classify x    a      * *   if sign y ( ) b 0, C . Otherwise C x x x x       i i i 1 1   vectores soporte V. Cherkassky, F. Mulier. Learning from data . 2 nd edition. Wiley-Interscience. 7th FDPVA. Frascati (March 26-28, 2012) 19

  20. Non-linearly separable case: the samples are mapped from the input space to a feature space, where the kernel H(x, x') plays the role of the inner product.
      Kernels:
      - Linear: H(x, x') = (x·x')
      - Polynomial of degree q: H(x, x') = [(x·x') + 1]^q
      - Radial basis functions: H(x, x') = exp( −||x − x'||² / σ² )
      - Neural network: H(x, x') = tanh( 2(x·x') + 1 )
      Given a new x to classify: if sign( Σ_{support vectors} α_i* y_i H(x_i, x) + b* ) > 0, then x ∈ C_{+1}; otherwise x ∈ C_{−1}.
      V. Cherkassky, F. Mulier. Learning from Data. 2nd edition. Wiley-Interscience.
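The four kernels written out in numpy, plus the non-linear classifier obtained by plugging the radial-basis kernel into an SVM (σ, the toy "ring" data and the use of scikit-learn are illustrative assumptions):

```python
import numpy as np
from sklearn.svm import SVC

def linear_kernel(x, xp):          return np.dot(x, xp)
def poly_kernel(x, xp, q=3):       return (np.dot(x, xp) + 1) ** q
def rbf_kernel(x, xp, sigma=1.0):  return np.exp(-np.sum((x - xp) ** 2) / sigma ** 2)
def nn_kernel(x, xp):              return np.tanh(2 * np.dot(x, xp) + 1)

# Non-linearly separable toy data: one class inside a ring of the other.
rng = np.random.default_rng(0)
angles = rng.uniform(0, 2 * np.pi, 60)
X = np.vstack([rng.normal(0, 0.3, (60, 2)),
               np.column_stack([2 * np.cos(angles), 2 * np.sin(angles)])])
y = np.array([0] * 60 + [1] * 60)

# gamma = 1 / sigma**2 reproduces the radial-basis kernel above.
clf = SVC(kernel="rbf", gamma=1.0).fit(X, y)
print(clf.predict([[0.1, 0.0], [2.0, 0.0]]))   # expected: [0 1]
```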
