
Sistemi Intelligenti: Supervised Learning
Alberto Borghese
Università degli Studi di Milano
Laboratorio di Sistemi Intelligenti Applicati (AIS-Lab), Dipartimento di Informatica
Alberto.borghese@unimi.it
A.A. 2017-2018
http://borghese.di.unimi.it/


1. Outline

   - Supervised learning: predictive regression
   - Multi-scale regression
   - On-line version

2. Classification and regression

   Classification: mapping from the sample space to the class space.
   [Figure: a sample in the sample space (feature space) is mapped to one of several classes (Class 1, Class 2, Class 3), each identified by a label.]

   Regression: control of the flow rate of an air conditioner as a function of the temperature T. I "learn" a continuous function from a few samples: I must learn to interpolate (regression = predictive learning).
   [Figure: flow rate plotted against temperature T, with sample points.]

   Role of models
   Identification: I estimate the parameters of a model from the data -> I identify the model.
   Use: I use the model to infer information on new data (control, predictive regression, classification).

3. Parametric model

   [Figure: sample points (left) and the sinusoid that fits them (right).]
   The points are fitted perfectly by a sinusoid: y = A sin(ωx + φ). I only have to determine the 3 parameters of the (non-linear) sinusoid, whose optimal values are: ω = 1/200, φ = 0.1, A = 1. The parameters have a semantic meaning. (A sketch of such a fit is given below.)

   Semi-parametric models
   The approximation is obtained through "generic" functions, called basis functions -> a solution widely used in NNs and in machine learning. It is also associated with the "black-box" approach in cybernetics: we have no information on the structure of the object we want to represent. (The concept of a basis in mathematics is defined through certain approximation properties that we do not consider here; we only keep the intuitive idea.) The concept of a basis is similar to that of "replicating kernels". It is also the idea underlying Artificial Neural Networks.

   z(p) = Σ_i w_i · G(p, p_i; σ),   with p = (x, y)

   The model is a linear combination of (fixed) basis functions G; the weights w_i are to be computed.
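Below is a minimal sketch of such a parametric fit, assuming noisy synthetic samples of the slide's sinusoid; scipy's curve_fit is used as a stand-in for the (unspecified) non-linear estimation procedure, and the noise level and starting guess p0 are assumptions.

```python
import numpy as np
from scipy.optimize import curve_fit

# Parametric model from the slide: y = A * sin(w * x + phi).
def sinusoid(x, A, w, phi):
    return A * np.sin(w * x + phi)

# Synthetic samples from the "true" model (A = 1, w = 1/200, phi = 0.1),
# with a little noise added (an assumption, not from the slide).
rng = np.random.default_rng(0)
x = np.linspace(0, 2000, 50)
y = sinusoid(x, 1.0, 1 / 200, 0.1) + rng.normal(0, 0.02, x.size)

# Non-linear least squares: only the 3 semantic parameters are estimated.
(A, w, phi), _ = curve_fit(sinusoid, x, y, p0=[0.8, 1 / 210, 0.0])
print(f"A = {A:.3f}, w = {w:.5f}, phi = {phi:.3f}")
```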

4. Approximation with a (linear) semi-parametric model

   [Figure: sinusoid samples (left) and the set of Gaussian basis functions (right).]
   Sinusoid y = A sin(ωx + φ) with ω = 1/200, φ = 0.1. We want to fit the points with the set of Gaussians shown on the right; in this case they all have σ = 90. How do we use them?

   How a (linear) semi-parametric model works

   y(x) = Σ_{i=1..M} w_i · G(x; x_i, σ = 90),   with 3 << M << N (N = number of points)

   I have to determine the M weights {w_i}. The σ are all equal (σ = 90) and the Gaussians are equally spaced, so the basis functions are all known a priori; only the weights must be determined (see the sketch below).
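A sketch of the linear semi-parametric fit just described: M equispaced Gaussians with fixed σ = 90, weights obtained by ordinary least squares. The slides do not name a solver, so np.linalg.lstsq and the sample sizes are assumptions.

```python
import numpy as np

def gaussian(x, c, sigma):
    return np.exp(-((x - c) ** 2) / (2 * sigma ** 2))

# N noisy samples of the sinusoid, M << N equispaced Gaussians, all sigma = 90.
rng = np.random.default_rng(0)
N, M, sigma = 400, 12, 90.0
x = np.linspace(0, 2000, N)
y = np.sin(x / 200 + 0.1) + rng.normal(0, 0.05, N)
centers = np.linspace(0, 2000, M)

# Design matrix: one column per (fixed, known) basis function.
G = gaussian(x[:, None], centers[None, :], sigma)

# The model is linear in the weights, so a linear least-squares solve suffices.
w, *_ = np.linalg.lstsq(G, y, rcond=None)
y_hat = G @ w   # y(x) = sum_i w_i * G(x; x_i, sigma)
print("RMS error:", np.sqrt(np.mean((y_hat - y) ** 2)))
```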

5. Surface reconstruction with filtering

   Convolution: we can reconstruct signals up to a certain scale, provided an adequately small value of σ.

   f̂(x) = (f * G)(x; σ);   discrete convolution: f̂(x) = Σ_{i=1..N} w_i · G(x; x_i, σ)

   If G(.) is normalized, the reconstruction of the function is obtained through digital filtering. It extrapolates beyond the sample points and reconstructs the function up to a given scale.

   Filters and bases
   With normalized Gaussians (division by a normalization factor), the filter is a weighted sum of shifted (normalized) basis functions: a basis representation of an approximation space. With a Riesz basis, the approximation space is characterized by the scale of the basis, which determines the amplitude of the space. A sequence of spaces can be defined according to σ: σ_0 -> V_0; σ_1 -> V_1; σ_2 -> V_2, ... The number of representable functions increases.
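A sketch of the filtering view, assuming regularly sampled noisy data: with a normalized (truncated) Gaussian kernel the reconstruction is plain digital filtering, and σ sets the scale below which detail is smoothed away. Signal, noise level, and σ values are illustrative.

```python
import numpy as np

# Truncated, normalized Gaussian kernel: with regular sampling, convolving
# with it is plain digital filtering.
def gaussian_kernel(sigma, dx):
    half = int(4 * sigma / dx)                 # truncate at 4 sigma
    t = np.arange(-half, half + 1) * dx
    k = np.exp(-t ** 2 / (2 * sigma ** 2))
    return k / k.sum()                         # normalization

dx = 4.0
x = np.arange(0, 2000, dx)
rng = np.random.default_rng(0)
f = np.sin(x / 200 + 0.1) + rng.normal(0, 0.1, x.size)

# Small sigma keeps fine detail (and noise); large sigma smooths both away:
for sigma in (10.0, 90.0, 200.0):
    f_hat = np.convolve(f, gaussian_kernel(sigma, dx), mode="same")
    err = np.sqrt(np.mean((f_hat - np.sin(x / 200 + 0.1)) ** 2))
    print(f"sigma = {sigma:5.1f}  RMS error vs true signal = {err:.3f}")
```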

6. RBF Network

   Connectionism: simple processing units combined through simple operations to create complex functions (cf. the Perceptron).

   The overfitting problem due to over-parameterization
   How many units? (See the illustration below.)
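To illustrate "how many units?", a small experiment under assumed settings: the same least-squares RBF fit as before, with increasing numbers of Gaussians. The training error keeps shrinking while the error on held-out points eventually grows, which is the over-parameterization problem; all values here are illustrative, not from the slides.

```python
import numpy as np

def design(x, centers, sigma):
    return np.exp(-((x[:, None] - centers[None, :]) ** 2) / (2 * sigma ** 2))

rng = np.random.default_rng(1)
truth = lambda x: np.sin(x / 200 + 0.1)
x_tr = np.sort(rng.uniform(0, 2000, 100))
x_te = np.linspace(0, 2000, 400)
y_tr = truth(x_tr) + rng.normal(0, 0.1, x_tr.size)

for M in (5, 12, 50, 90):                       # number of hidden units
    centers = np.linspace(0, 2000, M)
    sigma = 2000 / M                            # narrower Gaussians as M grows
    w, *_ = np.linalg.lstsq(design(x_tr, centers, sigma), y_tr, rcond=None)
    err_tr = np.sqrt(np.mean((design(x_tr, centers, sigma) @ w - y_tr) ** 2))
    err_te = np.sqrt(np.mean((design(x_te, centers, sigma) @ w - truth(x_te)) ** 2))
    print(f"M = {M:3d}  train RMS = {err_tr:.3f}  test RMS = {err_te:.3f}")
```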

7. Advantages and problems

   Filters interpolate and reduce noise, but... the height of the function at each grid crossing should be known.

   Gridding
   How can we determine the w_k from point clouds? With local estimators, e.g. the Nadaraya-Watson estimator (lazy learning):

   f(x_c) = Σ_i K_σ(x_i, x_c) · y_i / Σ_i K_σ(x_i, x_c) = Σ_i exp(-‖x_i - x_c‖²/σ²) · y_i / Σ_i exp(-‖x_i - x_c‖²/σ²)

   with K_σ(.) a Gaussian. These are Parzen-window estimators.
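A direct sketch of the Nadaraya-Watson estimator above, with the Gaussian kernel of the slide. Nothing is fitted in advance ("lazy learning"): each query re-weights the stored samples. Data and σ are illustrative.

```python
import numpy as np

def nadaraya_watson(x_c, x_s, y_s, sigma):
    """f(x_c) = sum_i K(x_i, x_c) * y_i / sum_i K(x_i, x_c), K Gaussian."""
    K = np.exp(-((x_s - x_c) ** 2) / sigma ** 2)   # kernel weights
    return np.sum(K * y_s) / np.sum(K)

rng = np.random.default_rng(0)
x_s = rng.uniform(0, 2000, 300)
y_s = np.sin(x_s / 200 + 0.1) + rng.normal(0, 0.1, x_s.size)

# Lazy learning: each query simply re-weights the stored samples.
for x_c in (250.0, 1000.0, 1750.0):
    print(x_c, nadaraya_watson(x_c, x_s, y_s, sigma=90.0))
```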

8. Surface approximation

   Properties:
   - Redundancy.
   - Riesz basis (unique representation, given the heights at the grid crossings).
   Which scale? [Figure: reconstructions with a scale that is too high and one that is too low.]

   Outline
   - Supervised learning: predictive regression
   - Multi-scale regression
   - On-line version

9. Pyramidal reconstruction

   Which is the adequate scale σ? Which model is the closest to the true model?

   Incremental strategy
   - Acquire more data in the more complex areas (less smooth, higher frequency).
   - Acquire less data in the less complex areas (smoother, lower frequency).
   - Can we use a single Δx?
   Incremental approximation with local adaptation.

10. Start from low resolution

   Low resolution, small sampling density: 1/Δx > 2ν_max. σ determines the amount of overlap, and it also determines the frequency content of the Gaussian G(.). Once σ (or Δx) is computed, the support is defined.

   Determination of the surface height
   How many points should be considered? The Gaussian has infinite support; splines have limited support. Apply a local estimator to the data points in the neighbourhood of a grid crossing (Gaussian center) to compute f_k. Sorting the data is made simple: they are subdivided into quads. Identifying the points inside the neighbourhood is then equivalent to extracting all the points between two positions in the data vector (see the sketch below).
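A sketch of the neighbourhood extraction just described, in 1-D: the data are sorted once, so the points inside the receptive field of each grid crossing form a contiguous slice, obtained with two np.searchsorted calls. The local estimator (a kernel-weighted mean), the σ-to-spacing ratio, and the minimum point count are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 2000, 500))      # sort the data once ...
y = np.sin(x / 200 + 0.1) + rng.normal(0, 0.1, x.size)

dx = 200.0                                  # grid spacing (illustrative)
sigma = dx / 2                              # assumed sigma-to-spacing ratio
grid = np.arange(0.0, 2000.0 + dx, dx)

f_k = np.full(grid.size, np.nan)            # NaN = grid crossing not filled
for k, xc in enumerate(grid):
    # ... so each neighbourhood is a slice between two vector positions.
    lo = np.searchsorted(x, xc - 3 * sigma)
    hi = np.searchsorted(x, xc + 3 * sigma)
    if hi - lo < 3:                         # too few points for a reliable estimate
        continue
    K = np.exp(-((x[lo:hi] - xc) ** 2) / sigma ** 2)
    f_k[k] = np.sum(K * y[lo:hi]) / np.sum(K)   # local kernel-weighted estimate
print(np.round(f_k, 3))
```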

11. We can obtain a "poor" reconstruction

   But it is a start: it can be seen as a modified support for successive approximations.

   What can be done?
   We can compute the residual for each data point, {r_1(x_m)}:

   r_{1,m} = dist(y_m, f̂(x_m)),   e.g.  r_{1,m} = (y_m - f̂(x_m))²  or  r_{1,m} = |y_m - f̂(x_m)|

12. Is the residual adequate?

   For each Gaussian, the integral of the residual inside its "receptive field" is assumed as the local approximation error associated with it; it is computed inside the receptive field as:

   R(x_c) = Σ_m r_m / N_k

   How can we evaluate the local adequacy of the reconstruction?
   We compare the local residual R(x_c) with a threshold that accounts for:
   - the desired degree of approximation;
   - the noise (RMS).
   (See the sketch below.)
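A sketch of the local adequacy test: average the residual magnitudes of the points falling inside each Gaussian's receptive field and compare with a threshold. The threshold value, the stand-in residuals, and the receptive-field radius are assumed for illustration.

```python
import numpy as np

def local_residual(x, r, centers, radius):
    """R(x_c) = sum of residual magnitudes inside the receptive field / N_k."""
    R = np.full(centers.size, np.nan)
    for k, xc in enumerate(centers):
        inside = np.abs(x - xc) < radius
        if inside.any():
            R[k] = np.abs(r[inside]).sum() / inside.sum()
    return R

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 2000, 500))
r = rng.normal(0, 0.1, x.size)              # stand-in residuals (illustrative)
r[(x > 800) & (x < 1200)] += 0.5            # one poorly reconstructed region

centers = np.arange(100.0, 2000.0, 200.0)   # Gaussian centers
R = local_residual(x, r, centers, radius=150.0)
threshold = 0.2                             # assumed: tied to the noise RMS
print("refine where R(x_c) > threshold:", centers[R > threshold])
```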

13. Layer 2

   The inputs are the residuals of the first layer: r_{1,m} = y_m - f̂_1(x_m).
   The output is the model f_2(x) that approximates r_{1,m}; the new residuals are r_{2,m} = |r_{1,m} - f̂_2(x_m)|.
   Layer #2 uses more closely packed Gaussians. There should be enough points in each receptive field for a reliable local estimate of R(x_c) = Σ_m r_m / N_k; grid crossings with too few points are not filled.

   Hierarchy construction
   The surface s(x) is decomposed into a stack of layers: layer 1 produces the approximation a_1(x) and the residual r_1(x); layer 2 approximates r_1(x), producing a_2(x) and r_2(x); and so on up to a_J(x) and r_J(x). The reconstruction is used as a stack of layers (see the sketch below).
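A compact end-to-end sketch of the hierarchy, under assumptions the slides leave open (normalized-Gaussian layers, σ = Δx/2, Δx halved at each layer): each layer estimates grid heights from the previous layer's residuals, and the reconstruction is the sum of the stack.

```python
import numpy as np

def nw_layer(x_q, x, r, dx):
    """One layer: grid at spacing dx, local estimates at the crossings,
    then a normalized-Gaussian basis sum. Returns (layer at queries,
    layer at the sample points)."""
    sigma = dx / 2
    grid = np.arange(x.min(), x.max() + dx, dx)
    # Local estimate of the layer's target at each grid crossing.
    K = np.exp(-((grid[:, None] - x[None, :]) ** 2) / sigma ** 2)
    f_k = (K @ r) / np.maximum(K.sum(axis=1), 1e-12)
    # Evaluate the normalized basis sum at queries and at the samples.
    G = np.exp(-((x_q[:, None] - grid[None, :]) ** 2) / (2 * sigma ** 2))
    Gs = np.exp(-((x[:, None] - grid[None, :]) ** 2) / (2 * sigma ** 2))
    norm = lambda A: A / np.maximum(A.sum(axis=1, keepdims=True), 1e-12)
    return norm(G) @ f_k, norm(Gs) @ f_k

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 2000, 600))
y = np.sin(x / 200 + 0.1) + 0.3 * np.sin(x / 40) + rng.normal(0, 0.05, 600)
x_q = np.linspace(0, 2000, 400)

s_hat, r = np.zeros(x_q.size), y.copy()     # the first layer fits the data
dx = 500.0
for layer in range(4):                      # a_1 ... a_J, dx halved each time
    a_q, a_x = nw_layer(x_q, x, r, dx)
    s_hat += a_q                            # stack of layers: s ~ a_1 + ... + a_J
    r = r - a_x                             # residuals feed the next layer
    print(f"layer {layer + 1}: dx = {dx:5.1f}  "
          f"residual RMS = {np.sqrt(np.mean(r ** 2)):.3f}")
    dx /= 2
```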

14. How to operate on large sets of data?

   Recursive splitting of the quad domain -> local re-ordering of the data (see the sketch below).

   Application of regression
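A sketch of recursive quad splitting, in the spirit of the slide: any quad holding more points than an assumed capacity is split into four, which locally re-orders the data; leaf quads then hold few enough points for fast neighbourhood queries.

```python
import numpy as np

def split_quads(pts, xmin, ymin, size, capacity=64):
    """Recursively split a square quad until each leaf holds <= capacity
    points. Returns leaves as (xmin, ymin, size, points-in-quad)."""
    if len(pts) <= capacity:
        return [(xmin, ymin, size, pts)]
    half = size / 2
    leaves = []
    for off_x, off_y in ((0, 0), (half, 0), (0, half), (half, half)):
        x0, y0 = xmin + off_x, ymin + off_y
        m = ((pts[:, 0] >= x0) & (pts[:, 0] < x0 + half) &
             (pts[:, 1] >= y0) & (pts[:, 1] < y0 + half))
        leaves += split_quads(pts[m], x0, y0, half, capacity)
    return leaves

rng = np.random.default_rng(0)
pts = rng.uniform(0, 1024, (5000, 2))       # illustrative point cloud
leaves = split_quads(pts, 0.0, 0.0, 1024.0)
print(len(leaves), "quads; max points per quad:",
      max(len(p) for _, _, _, p in leaves))
```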
