
Introduction to Machine Learning 6. Kernel Methods Alex Smola - PowerPoint PPT Presentation

Introduction to Machine Learning 6. Kernel Methods. Alex Smola, Carnegie Mellon University, http://alex.smola.org/teaching/cmu2013-10-701 (10-701). Regression: regression estimation — find a function f minimizing the regression error R[f] := E_{x,y}[…]


  1. Proof • Move the boundary (threshold) at optimality. • For a smaller threshold, the m⁻ points on the wrong side of the margin contribute δ(m⁻ − νm) ≤ 0. • For a larger threshold, the m⁺ points not on the 'good' side of the margin yield δ(m⁺ − νm) ≥ 0. • Combining both inequalities gives m⁻/m ≤ ν ≤ m⁺/m. • The margin itself is a set of measure 0.

  2. Toy example — changing ν and the kernel width c (threshold and smoothness requirements):

  ν, width c    | 0.5, 0.5   | 0.5, 0.5   | 0.1, 0.5   | 0.5, 0.1
  frac. SVs/OLs | 0.54, 0.43 | 0.59, 0.47 | 0.24, 0.03 | 0.65, 0.38
  margin ρ/‖w‖  | 0.84       | 0.70       | 0.62       | 0.48

  3. Novelty detection for OCR • Better estimates, since we only optimize in low-density regions. • Specifically tuned for a small number of outliers. • Only a level set of the density is estimated. • For ν = 1 we recover the Parzen-windows estimator.

  4. Classification with the ν-trick: changing kernel width and threshold

  5. Convex Optimization

  6. Selecting Variables

  7. Constrained Quadratic Program • Optimization problem: minimize over α: ½ αᵀQα + lᵀα subject to Cα + b ≤ 0 • Covers Support Vector classification, Support Vector regression, and novelty detection • Solving it: off-the-shelf solvers for small problems; solve a sequence of subproblems; or optimize in the primal space (the w space)
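As a concrete illustration of the template above (not taken from the slides), the sketch below sets up a small instance of the QP minimize over α: ½ αᵀQα + lᵀα subject to Cα + b ≤ 0 and hands it to an off-the-shelf solver, here SciPy's SLSQP; the problem data (a random positive-definite Q, a random l, and simple box constraints) is made up for illustration.

```python
# Minimal sketch: solve the generic constrained QP with an off-the-shelf solver.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n = 5
A = rng.normal(size=(n, n))
Q = A @ A.T + 1e-3 * np.eye(n)        # positive-definite quadratic term
l = rng.normal(size=n)
# Box constraints 0 <= alpha_i <= 1 written in the form C alpha + b <= 0
C = np.vstack([np.eye(n), -np.eye(n)])
b = np.concatenate([-np.ones(n), np.zeros(n)])

obj  = lambda a: 0.5 * a @ Q @ a + l @ a
grad = lambda a: Q @ a + l
cons = {"type": "ineq", "fun": lambda a: -(C @ a + b), "jac": lambda a: -C}

res = minimize(obj, x0=np.zeros(n), jac=grad, constraints=[cons], method="SLSQP")
print(res.x)                          # approximate minimizer of the QP
```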

  8. Convex problem

  9. Subproblems • Original optimization problem: minimize over α: ½ αᵀQα + lᵀα subject to Cα + b ≤ 0 • Key idea: solve subproblems one at a time, decomposing α into an active and a fixed set, α = (α_a, α_f): minimize over α_a: ½ α_aᵀ Q_aa α_a + [l_a + Q_af α_f]ᵀ α_a subject to C_a α_a + [b + C_f α_f] ≤ 0 • The subproblem is again a convex problem • Updating subproblems is cheap

  10. Picking observations • KKT conditions: w = Σ_i α_i y_i x_i, with α_i [y_i(⟨w, x_i⟩ + b) + ξ_i − 1] = 0 and η_i ξ_i = 0, hence α_i = 0 ⟹ y_i(⟨w, x_i⟩ + b) ≥ 1; 0 < α_i < C ⟹ y_i(⟨w, x_i⟩ + b) = 1; α_i = C ⟹ y_i(⟨w, x_i⟩ + b) ≤ 1 • Pick the most violated margin condition • Points on the boundary • Points with nonzero Lagrange multiplier that are correct

  11. Selecting variables • Incrementally increase the working set (chunking) • Select a promising subset of active variables (SVMLight) • Select pairs of variables (SMO)
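To make the working-set idea tangible, here is a toy sketch (my own illustration, not SVMLight or SMO) for the special case of a box-constrained dual with no equality constraint, i.e. a bias-free SVM: repeatedly pick the coordinates with the largest KKT violation and re-solve the subproblem over them by projected gradient while all other variables stay fixed.

```python
# Toy working-set solver for  min_a 0.5 a'Qa + l'a  s.t. 0 <= a <= C.
import numpy as np

def solve_working_set(Q, l, C, k=2, outer=50, inner=200):
    n = len(l)
    a = np.zeros(n)
    for _ in range(outer):
        g = Q @ a + l                                   # gradient of the objective
        # KKT violation: can the objective still be decreased along coordinate i?
        viol = np.where(a <= 0, np.maximum(-g, 0),
               np.where(a >= C, np.maximum(g, 0), np.abs(g)))
        active = np.argsort(viol)[-k:]                  # working set = worst offenders
        if viol[active].max() < 1e-6:
            break
        # projected gradient on the subproblem, all other coordinates held fixed
        eta = 1.0 / (np.linalg.norm(Q[np.ix_(active, active)], 2) + 1e-12)
        for _ in range(inner):
            g_a = Q[active] @ a + l[active]
            a[active] = np.clip(a[active] - eta * g_a, 0, C)
    return a
```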

  12. Being smart about hardware • Data flow from disk to CPU: a reading thread streams data from disk (sequential access) into a cached working set in RAM, while a training thread reads that working set and updates the parameters in RAM (random access). • IO speeds:

  System | Capacity | Bandwidth | IOPs
  Disk   | 3 TB     | 150 MB/s  | 10²
  SSD    | 256 GB   | 500 MB/s  | 5·10⁴
  RAM    | 16 GB    | 30 GB/s   | 10⁸
  Cache  | 16 MB    | 100 GB/s  | 10⁹

  13. Being smart about hardware (continued) • Same data flow as above, with one addition: reuse the cached working set in RAM several times before going back to disk, since RAM and cache bandwidth and IOPs dwarf those of disk and SSD (see the table above).

  14. Runtime Example (Matsushima, Vishwanathan, Smola, 2012) • Plot (axis residue omitted): objective suboptimality on a log scale (10⁻¹ down to 10⁻¹¹) versus training time up to roughly 4·10⁴ on the dna dataset with C = 1.0, comparing StreamSVM against the fastest competitor and the SBM/BM baselines.

  15. Primal Space Methods

  16. Gradient Descent • Assume we can optimize in feature space directly • Minimize the regularized risk R[w] = (1/m) Σ_{i=1}^m l(x_i, y_i, w) + (λ/2) ‖w‖² • Compute the gradient g = ∂_w R[w] and update w ← w − γ g • This fails in narrow canyons • Wasteful if we have lots of similar data

  17. Stochastic gradient descent • Empirical risk as an expectation: (1/m) Σ_{i=1}^m l(y_i, ⟨φ(x_i), w⟩) = E_{i∼{1,…,m}}[l(y_i, ⟨φ(x_i), w⟩)] • Stochastic gradient descent (pick a random (x, y)): w_{t+1} ← w_t − η_t ∂_w l(y_t, ⟨φ(x_t), w_t⟩) • Often the parameters are restricted to some convex set X, hence we project onto it: w_{t+1} ← π_X[w_t − η_t ∂_w l(y_t, ⟨φ(x_t), w_t⟩)], where π_X(w) = argmin_{x∈X} ‖x − w‖
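A minimal sketch of the projected SGD update above, assuming a linear model f(x) = ⟨w, x⟩ (so φ is the identity) and projection onto an L2 ball; the data interface, learning-rate schedule, and ball radius are illustrative choices, not part of the lecture.

```python
# Projected stochastic gradient descent with an O(t^{-1/2}) learning rate.
import numpy as np

def project_l2_ball(w, R=10.0):
    norm = np.linalg.norm(w)
    return w if norm <= R else w * (R / norm)

def sgd(X, y, grad_loss, T=10_000, eta0=0.1, R=10.0, seed=0):
    rng = np.random.default_rng(seed)
    m, d = X.shape
    w = np.zeros(d)
    for t in range(T):
        i = rng.integers(m)                         # pick a random observation
        eta = eta0 / np.sqrt(t + 1)                 # decaying learning rate
        w = project_l2_ball(w - eta * grad_loss(X[i], y[i], w), R)
    return w

# Example gradient: the soft margin (hinge) loss from the next slide.
def grad_hinge(x, y, w):
    return -y * x if y * np.dot(w, x) < 1 else np.zeros_like(x)
```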

  18. Some applications • Classification • Soft margin loss l(x, y, w) = max(0, 1 − y⟨w, φ(x)⟩) • Logistic loss l(x, y, w) = log(1 + exp(−y⟨w, φ(x)⟩)) • Regression • Quadratic loss l(x, y, w) = (y − ⟨w, φ(x)⟩)² • l1 loss l(x, y, w) = |y − ⟨w, φ(x)⟩| • Huber's loss l(x, y, w) = (1/(2σ²))(y − ⟨w, φ(x)⟩)² if |y − ⟨w, φ(x)⟩| ≤ σ, and (1/σ)|y − ⟨w, φ(x)⟩| − ½ if |y − ⟨w, φ(x)⟩| > σ • Novelty detection l(x, w) = max(0, 1 − ⟨w, φ(x)⟩) … and many more
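For reference, the losses listed above written out for a linear model f(x) = ⟨w, x⟩; the function names and the default σ are my own choices.

```python
import numpy as np

def soft_margin(x, y, w):            # max(0, 1 - y <w, x>)
    return max(0.0, 1.0 - y * np.dot(w, x))

def logistic(x, y, w):               # log(1 + exp(-y <w, x>))
    return np.log1p(np.exp(-y * np.dot(w, x)))

def quadratic(x, y, w):              # (y - <w, x>)^2
    return (y - np.dot(w, x)) ** 2

def l1(x, y, w):                     # |y - <w, x>|
    return abs(y - np.dot(w, x))

def huber(x, y, w, sigma=1.0):       # quadratic near zero, linear in the tails
    r = abs(y - np.dot(w, x))
    return r**2 / (2 * sigma**2) if r <= sigma else r / sigma - 0.5

def novelty(x, w):                   # max(0, 1 - <w, x>)
    return max(0.0, 1.0 - np.dot(w, x))
```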

  19. Convergence in Expectation • Guarantee: E[l(θ̄)] − l* ≤ (R² + L² Σ_{t=0}^{T−1} η_t²) / (2 Σ_{t=0}^{T−1} η_t), where l(θ) = E_{(x,y)}[l(y, ⟨φ(x), θ⟩)] is the expected loss, l* = inf_{θ∈X} l(θ), and θ̄ = Σ_{t=0}^{T−1} η_t θ_t / Σ_{t=0}^{T−1} η_t is the (weighted) parameter average; R bounds the initial distance from the optimum and L the gradient norm. • Proof idea: show that the parameters converge to the minimum θ* ∈ argmin_{θ∈X} l(θ) and set r_t := ‖θ* − θ_t‖ (from Nesterov and Vial).

  20. Proof • r²_{t+1} = ‖π_X[θ_t − η_t g_t] − θ*‖² ≤ ‖θ_t − η_t g_t − θ*‖² = r_t² + η_t² ‖g_t‖² − 2η_t ⟨θ_t − θ*, g_t⟩, hence E[r²_{t+1} − r_t²] ≤ η_t² L² + 2η_t [l* − E[l(θ_t)]] by convexity. • Summing this inequality over t proves the claim. • This yields a randomized algorithm for minimizing objective functions (run it logarithmically many times and pick the best, or use the average/median trick).

  21. Rates • Guarantee: E[l(θ̄)] − l* ≤ (R² + L² Σ_{t=0}^{T−1} η_t²) / (2 Σ_{t=0}^{T−1} η_t) • If we know R, L, T, pick the constant learning rate η = R/(L√T), and hence E[l(θ̄)] − l* ≤ RL(1 + 1/T)/(2√T) < LR/√T • If we don't know T, pick η_t = O(t^{−1/2}). This costs us an additional log term: E[l(θ̄)] − l* = O(log T / √T)

  22. Strong Convexity • Definition: l_i(θ') ≥ l_i(θ) + ⟨∂_θ l_i(θ), θ' − θ⟩ + (λ/2) ‖θ − θ'‖² • Use this to bound the expected deviation: r²_{t+1} ≤ r_t² + η_t² ‖g_t‖² − 2η_t ⟨θ_t − θ*, g_t⟩ ≤ r_t² + η_t² L² − 2η_t [l_t(θ_t) − l_t(θ*)] − λη_t r_t², hence E[r²_{t+1}] ≤ (1 − λη_t) E[r_t²] + η_t² L² − 2η_t [E[l(θ_t)] − l*] • Exponentially decaying averaging: θ̄ = ((1 − σ)/(1 − σ^T)) Σ_{t=0}^{T−1} σ^{T−1−t} θ_t, and plugging this into the discrepancy yields l(θ̄) − l* ≤ (2L²/(λT)) log(1 + λRT/(2L)) for η = (2/(λT)) log(1 + λRT/(2L))

  23. More variants • Adversarial guarantees: the update θ_{t+1} ← π_X[θ_t − η_t ∂_θ l(y_t, ⟨φ(x_t), θ_t⟩)] has low regret (average instantaneous cost) for arbitrary orderings of the data (useful for game theory) • Ratliff, Bagnell, Zinkevich: learning rate O(t^{−1/2}) • Shalev-Shwartz, Srebro, Singer (Pegasos): learning rate O(t^{−1}) (but constants are needed) • Bartlett, Rakhlin, Hazan: add a strong convexity penalty

  24. Regularization

  25. Problems with Kernels • Myth: Support Vector machines work because they map data into a high-dimensional feature space. • And your statistician (Bellman) told you: the higher the dimensionality, the more data you need. • Example (density estimation): assuming data in [0, 1]^m and bins of size 0.1^m, 1000 observations in [0, 1] give you on average 100 instances per bin, but only 1/100 of an instance per bin in [0, 1]^5. • Worrying fact: some kernels map into an infinite-dimensional space, e.g. k(x, x') = exp(−(1/(2σ²)) ‖x − x'‖²). • Encouraging fact: SVMs work well in practice.
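The bin-counting arithmetic can be checked in a couple of lines (a throwaway illustration):

```python
# 1000 observations spread uniformly over [0,1]^m, bins of side 0.1:
# how many observations land in each bin on average?
for m in (1, 2, 5):
    bins = 10 ** m                      # (1 / 0.1)^m bins
    print(f"m={m}: {1000 / bins:g} observations per bin on average")
# m=1: 100, m=2: 10, m=5: 0.01 -> only 1/100 of an instance per bin in 5 dimensions
```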

  26. Solving the Mystery • The truth is in the margins: maybe the maximum margin requirement is what saves us when finding a classifier, i.e. minimizing ‖w‖². • Risk functional: rewrite the optimization problems in the unified form R_reg[f] = Σ_{i=1}^m c(x_i, y_i, f(x_i)) + Ω[f], where c(x, y, f(x)) is a loss function and Ω[f] is a regularizer. • Ω[f] = (λ/2) ‖w‖² for linear functions. • For classification c(x, y, f(x)) = max(0, 1 − y f(x)). • For regression c(x, y, f(x)) = max(0, |y − f(x)| − ε).

  27. Typical SVM losses (figures): the soft margin loss for classification and the ε-insensitive loss for regression.

  28. Soft Margin Loss • Original optimization problem: minimize over w, ξ: ½ ‖w‖² + C Σ_{i=1}^m ξ_i subject to y_i f(x_i) ≥ 1 − ξ_i and ξ_i ≥ 0 for all 1 ≤ i ≤ m • Regularization functional: minimize over w: (λ/2) ‖w‖² + Σ_{i=1}^m max(0, 1 − y_i f(x_i)) • For fixed f, clearly ξ_i ≥ max(0, 1 − y_i f(x_i)). For ξ_i > max(0, 1 − y_i f(x_i)) we can decrease ξ_i until the bound is met and thereby improve the objective function. Hence both formulations are equivalent.
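A small numeric check of the equivalence argument (with made-up data): for a fixed w, shrinking any feasible slack down to max(0, 1 − y_i f(x_i)) never hurts the objective, and at that bound the constrained objective coincides with the hinge-loss form.

```python
import numpy as np

rng = np.random.default_rng(0)
m, d, C = 20, 3, 1.5
X, y = rng.normal(size=(m, d)), rng.choice([-1.0, 1.0], size=m)
w = rng.normal(size=d)
f = X @ w

xi_min = np.maximum(0.0, 1.0 - y * f)            # smallest feasible slacks
xi_loose = xi_min + rng.uniform(0, 1, size=m)    # feasible but unnecessarily large

obj = lambda xi: 0.5 * w @ w + C * xi.sum()      # constrained objective for fixed w
hinge = 0.5 * w @ w + C * np.maximum(0.0, 1.0 - y * f).sum()

assert obj(xi_loose) >= obj(xi_min)              # shrinking slacks never hurts
assert np.isclose(obj(xi_min), hinge)            # ...and at the bound we get the hinge form
```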

  29. Why Regularization? • What we really wanted: find some f(x) such that the expected loss E[c(x, y, f(x))] is small. • What we ended up doing: find some f(x) such that the empirical average of the loss, E_emp[c(x, y, f(x))] = (1/m) Σ_{i=1}^m c(x_i, y_i, f(x_i)), is small. • However, just minimizing the empirical average does not guarantee anything for the expected loss (overfitting). • Safeguard against overfitting: we need to constrain the class of functions f ∈ F somehow. Adding Ω[f] as a penalty does exactly that.

  30. Some regularization ideas • Small derivatives: we want a function f which is smooth on the entire domain, so we could use Ω[f] = ∫ ‖∂_x f(x)‖² dx = ⟨∂_x f, ∂_x f⟩. • Small function values: if we have no further knowledge about the domain X, minimizing ‖f‖² might be sensible, i.e. Ω[f] = ‖f‖² = ⟨f, f⟩. • Splines: here we want to find f such that both ‖f‖² and ‖∂²_x f‖² are small, hence we can minimize Ω[f] = ‖f‖² + ‖∂²_x f‖² = ⟨(f, ∂²_x f), (f, ∂²_x f)⟩.

  31. Regularization • Regularization operators: we map f to some Pf which is small for desirable f and large otherwise, and minimize Ω[f] = ‖Pf‖² = ⟨Pf, Pf⟩. For all previous examples we can find such a P. • Function expansion for the regularization operator: using a linear expansion of f in terms of some f_i, that is f(x) = Σ_i α_i f_i(x), we can compute Ω[f] = ⟨P Σ_i α_i f_i, P Σ_j α_j f_j⟩ = Σ_{i,j} α_i α_j ⟨Pf_i, Pf_j⟩.

  32. Regularization and Kernels • Regularization for Ω[f] = ½ ‖w‖²: with w = Σ_i α_i Φ(x_i) we get ‖w‖² = Σ_{i,j} α_i α_j k(x_i, x_j). This looks very similar to Σ_{i,j} α_i α_j ⟨Pf_i, Pf_j⟩. • Key idea: if we could find a P and k such that k(x, x') = ⟨Pk(x, ·), Pk(x', ·)⟩, we could show that using a kernel means minimizing the empirical risk plus a regularization term. • Solution (Green's functions): a sufficient condition is that k is the Green's function of P*P, that is ⟨P*P k(x, ·), f(·)⟩ = f(x). One can show that this is both necessary and sufficient.

  33. Building Kernels • Kernels from regularization operators: given an operator P*P, we can find k by solving the self-consistency equation ⟨Pk(x, ·), Pk(x', ·)⟩ = ⟨k(x, ·), (P*P) k(x', ·)⟩ = k(x, x') and take the function space to be the span of all k(x, ·). So we can find k for a given measure of smoothness. • Regularization operators from kernels: given a kernel k, we can find some P*P for which the self-consistency equation is satisfied. So we can find a measure of smoothness for a given k.

  34. Spectrum and Kernels • Effective function class: keeping Ω[f] small means that f(x) cannot take on arbitrary function values. Hence we study the function class F_C = {f : ½ ⟨Pf, Pf⟩ ≤ C}. • Example: for f = Σ_i α_i k(x_i, ·) this implies ½ αᵀKα ≤ C. • Figure: a 2×2 kernel matrix K = [[5, 2], [2, 1]] together with the corresponding coefficients and function values.

  35. Fourier Regularization • Goal: find a measure of smoothness that depends on the frequency properties of f and not on the position of f. • A hint: rewriting ‖f‖² + ‖∂_x f‖² (with f̃(ω) the Fourier transform of f): ‖f‖² + ‖∂_x f‖² = ∫ |f(x)|² + |∂_x f(x)|² dx = ∫ |f̃(ω)|² (1 + ω²) dω = ∫ |f̃(ω)|² / p(ω) dω, where p(ω) = 1/(1 + ω²). • Idea: generalize to arbitrary p(ω), i.e. Ω[f] := ½ ∫ |f̃(ω)|² / p(ω) dω.

  36. Green's Function Theorem • For regularization functionals Ω[f] := ½ ∫ |f̃(ω)|² / p(ω) dω, the self-consistency condition ⟨Pk(x, ·), Pk(x', ·)⟩ = ⟨k(x, ·), (P*P) k(x', ·)⟩ = k(x, x') is satisfied if k has p(ω) as its Fourier transform, i.e. k(x, x') = ∫ exp(−i ⟨ω, x − x'⟩) p(ω) dω. • Consequences: small p(ω) corresponds to a high penalty (strong regularization), and Ω[f] is translation invariant, that is Ω[f(·)] = Ω[f(· − x)].

  37. Examples • Laplacian kernel: k(x, x') = exp(−‖x − x'‖) with p(ω) ∝ (1 + ‖ω‖²)^{−1}. • Gaussian kernel: k(x, x') = exp(−(1/(2σ²)) ‖x − x'‖²) with p(ω) ∝ exp(−(σ²/2) ‖ω‖²). • The Fourier transform of k shows its regularization properties: the more rapidly p(ω) decays, the more high frequencies are filtered out.
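To see the filtering effect numerically, the following throwaway sketch estimates p(ω) for the two kernels above by taking an FFT of k(x, 0) on a 1D grid and compares how much spectral mass survives at high frequencies; the grid size and the |ω| > 5 cutoff are arbitrary choices.

```python
import numpy as np

x = np.linspace(-30, 30, 4096)
dx = x[1] - x[0]
sigma = 1.0

kernels = {
    "laplacian": np.exp(-np.abs(x)),                # p(w) ~ (1 + w^2)^-1
    "gaussian":  np.exp(-x**2 / (2 * sigma**2)),    # p(w) ~ exp(-sigma^2 w^2 / 2)
}

for name, k in kernels.items():
    p = np.abs(np.fft.fftshift(np.fft.fft(k))) * dx           # numeric Fourier transform
    w = np.fft.fftshift(np.fft.fftfreq(len(x), d=dx)) * 2 * np.pi
    hi = p[np.abs(w) > 5].sum() / p.sum()                      # high-frequency mass
    print(f"{name}: fraction of spectrum beyond |w| > 5 = {hi:.2e}")
# The Gaussian spectrum decays much faster, i.e. it penalizes high frequencies harder.
```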

  38. Rules of thumb • The Fourier transform is sufficient to check whether k(x, x') satisfies Mercer's condition: only check whether k̃(ω) ≥ 0. Example: k(x, x') = sinc(x − x') has k̃(ω) = χ_{[−π, π]}(ω), hence k is a proper kernel. • The width of the kernel is often more important than the type of kernel (short-range decay properties matter). • Convenient way of incorporating prior knowledge, e.g. for speech data we could use the autocorrelation function. • A sum of derivatives becomes a polynomial in Fourier space.

  39. Polynomial Kernels • Functional form: k(x, x') = κ(⟨x, x'⟩). • Series expansion: polynomial kernels admit an expansion in terms of Legendre polynomials (L_n^N: order n in R^N): k(x, x') = Σ_{n=0}^∞ b_n L_n^N(⟨x, x'⟩). • Consequence: the L_n (and their rotations) form an orthonormal basis on the unit sphere, P*P is rotation invariant, and P*P is diagonal with respect to the L_n. In other words (P*P) L_n(⟨x, ·⟩) = b_n^{−1} L_n(⟨x, ·⟩).

  40. Polynomial Kernels • The decay properties of the b_n determine the smoothness of functions specified by k(⟨x, x'⟩). • For N → ∞ all terms of L_n^N but x^n vanish, hence a Taylor series k(x, x') = Σ_i a_i ⟨x, x'⟩^i gives a good guess. • Inhomogeneous polynomial: k(x, x') = (⟨x, x'⟩ + 1)^p with a_n = (p choose n) if n ≤ p. • Vovk's real polynomial: k(x, x') = (1 − ⟨x, x'⟩^p) / (1 − ⟨x, x'⟩) with a_n = 1 if n < p.

  41. Mini Summary • Regularized risk functional: from optimization problems to loss functions; regularization as a safeguard against overfitting. • Regularization and kernels: examples of regularizers, regularization operators, Green's functions and the self-consistency condition. • Fourier regularization: translation-invariant regularizers, regularization in Fourier space, the kernel as the inverse Fourier transform of the weight p(ω). • Polynomial kernels and series expansions.

  42. Text Analysis (string kernels)

  43. String Kernel (pre)History

  44. The Kernel Perspective • Design a kernel implementing good features: k(x, x') = ⟨φ(x), φ(x')⟩ and f(x) = ⟨φ(x), w⟩ = Σ_i α_i k(x_i, x) • Many variants: • Bag of words (AT&T Labs 1995, e.g. Vapnik) • Matching substrings (Haussler, Watkins 1998) • Spectrum kernel (Leslie, Eskin, Noble, 2000) • Suffix tree (Vishwanathan, Smola, 2003) • Suffix array (Teo, Vishwanathan, 2006) • Rational kernels (Mohri, Cortes, Haffner, 2004 ...)

  45. Bag of words • Known at least since 1995 in AT&T Labs: k(x, x') = Σ_w n_w(x) n_w(x') and f(x) = Σ_w ω_w n_w(x), where n_w(x) counts occurrences of word w in document x. Example: (to be or not to be) → (be:2, or:1, not:1, to:2). • Joachims 1998: use sparse vectors • Haffner 2001: inverted index for faster training • Lots of work on feature weighting (TF/IDF) • Variants of it are deployed in many spam filters
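A minimal bag-of-words kernel along the lines of the formula above (whitespace tokenization and the toy strings are my own simplifications):

```python
from collections import Counter

def bow_kernel(x: str, xp: str) -> int:
    # k(x, x') = sum_w n_w(x) * n_w(x'), i.e. the dot product of word counts
    n, np_ = Counter(x.split()), Counter(xp.split())
    return sum(c * np_[w] for w, c in n.items())

print(Counter("to be or not to be".split()))        # {'to': 2, 'be': 2, 'or': 1, 'not': 1}
print(bow_kernel("to be or not to be", "to be"))    # 2*1 + 2*1 = 4
```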

  46. Substring (mis)matching • Watkins 1998/99 (dynamic alignment, etc.), Haussler 1999 (convolution kernels): k(x, x') = Σ_{w ∈ x} Σ_{w' ∈ x'} κ(w, w'), summing over substrings w of x and w' of x' (figure: a pair-HMM with START, END and match/gap states). • In general O(|x| · |x'|) runtime (e.g. Cristianini, Shawe-Taylor, Lodhi, 2001). • Dynamic programming solution via a pair-HMM.

  47. Spectrum Kernel • Leslie, Eskin, Noble & coworkers, 2002 • Key idea: focus on the features directly • Linear-time operation to extract the features • Limited amount of mismatch (cost exponential in the number of mismatched characters) • Explicit feature construction (good & fast for DNA sequences)
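A short sketch of a mismatch-free spectrum kernel, counting k-mers with a Counter and taking the dot product of the count vectors; k = 3 is an arbitrary choice.

```python
from collections import Counter

def spectrum_kernel(x: str, xp: str, k: int = 3) -> int:
    # count all length-k substrings and take the dot product of the counts
    spec  = Counter(x[i:i + k] for i in range(len(x) - k + 1))
    specp = Counter(xp[i:i + k] for i in range(len(xp) - k + 1))
    return sum(c * specp[w] for w, c in spec.items())

print(spectrum_kernel("GATTACA", "ATTACCA"))   # shared 3-mers ATT, TTA, TAC -> 3
```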

  48. Suffix Tree Kernel • Vishwanathan & Smola, 2003: O(|x| + |x'|) time • Mismatch-free kernel with arbitrary weights: k(x, x') = Σ_w ω_w n_w(x) n_w(x') • Linear-time suffix tree construction (Ukkonen, 1995) • Find matches for the second string in linear time (Chang & Lawler, 1994) • Precompute weights on each path

  49. Are we done? • Large vocabulary size • Need to build dictionary • Approximate matches are still a problem • Suffix tree/array is storage inefficient (40-60x) • Realtime computation • Memory constraints (keep in RAM) • Difficult to implement

  50. Multitask Learning

  51. Multitask Learning (figure: one classifier per user)

  52. Multitask Learning (figure: per-user classifiers receiving noisy 0/1 feedback such as "spam!", "not spam?", "quality", "donut?" from users who are educated, misinformed, confused, malicious, or silent)

  53. Multitask Learning (figure: one classifier per user — educated, misinformed, confused, malicious, silent)

  54. Multitask Learning (figure: a global classifier shared across the per-user classifiers for the educated, misinformed, confused, malicious, and silent users)

  55. Collaborative Classification • Primal representation: f(x, u) = ⟨φ(x), w⟩ + ⟨φ(x), w_u⟩ = ⟨φ(x) ⊗ (1 + e_u), w⟩ • Kernel representation: k((x, u), (x', u')) = k(x, x')[1 + δ_{u,u'}] • This is the multitask kernel (e.g. Pontil & Micchelli, Daumé). It usually does not scale well ... • Problem: the dimensionality is 10¹³. That is 40 TB of space.

  56. Collaborative Classification (figure: the joint weight vector splits into a global email part w and a per-user part w_user) • Same primal and kernel representation as above: f(x, u) = ⟨φ(x), w⟩ + ⟨φ(x), w_u⟩ = ⟨φ(x) ⊗ (1 + e_u), w⟩ and k((x, u), (x', u')) = k(x, x')[1 + δ_{u,u'}] • Again: dimensionality 10¹³, i.e. 40 TB of space.

  57. Collaborative Classification (figure: the map φ(x) ⊗ (1 + e_user) pairs an email with the combined weight vector w + e_user w_user) • Same representation and the same scaling problem as above: the joint parameter space has dimensionality 10¹³, i.e. 40 TB of space.
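A toy version of the multitask feature map above (my own implementation, small dimensions only — the point of the following slides is precisely that this explicit construction does not scale):

```python
# Stack a shared copy of phi(x) with a per-user copy, so that
# <Phi(x, u), W> = <phi(x), w> + <phi(x), w_u>.
import numpy as np

def multitask_features(phi_x: np.ndarray, user: int, num_users: int) -> np.ndarray:
    d = len(phi_x)
    out = np.zeros(d * (num_users + 1))
    out[:d] = phi_x                                   # shared (global) copy
    out[d * (user + 1): d * (user + 2)] = phi_x       # per-user copy
    return out

def multitask_kernel(phi_x, u, phi_xp, up, num_users=3):
    # equals k(x, x') * [1 + delta_{u,u'}] for the linear kernel k = <phi, phi'>
    return multitask_features(phi_x, u, num_users) @ multitask_features(phi_xp, up, num_users)

phi = np.array([1.0, 2.0])
print(multitask_kernel(phi, 0, phi, 0))   # same user:      2 * <phi, phi> = 10
print(multitask_kernel(phi, 0, phi, 1))   # different user: 1 * <phi, phi> = 5
```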

  58. Hashing

  59. Hash Kernels

  60. Hash Kernels • Instance (an email): "Hey, please mention subtly during your talk that people should use Yahoo* search more often. Thanks." (*in the old days) • Figure: a sparse dictionary representation stores the word counts, with the task/user (here: barney) as an additional sparse feature.

  61. Hash Kernels • Figure: instead of building a dictionary, a hash function h() maps both the word counts and the task/user feature of the sparse instance directly into a vector in R^m.

  62. Hash Kernels • x_i ∈ R^{N×(U+1)}: each token is hashed both on its own and combined with the task/user, e.g. h('mention') and h('mention_barney'), and a sign function s(·) ∈ {−1, 1} determines the sign of each count. • Similar to the count hash (Charikar, Chen, Farach-Colton, 2003).
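A compact sketch of the signed feature-hashing idea above; the bucket count, the use of MD5, and the per-user token format are arbitrary choices, not the production system.

```python
import numpy as np
from hashlib import md5

def _bucket_and_sign(token: str, m: int):
    # one hash value supplies both the bucket (low bits) and the sign (a higher bit)
    h = int(md5(token.encode()).hexdigest(), 16)
    return h % m, 1.0 if (h >> 64) & 1 else -1.0

def hash_features(text: str, user: str, m: int = 2**18) -> np.ndarray:
    x = np.zeros(m)
    for word in text.lower().split():
        for token in (word, f"{word}_{user}"):        # global and per-user feature
            j, s = _bucket_and_sign(token, m)
            x[j] += s
    return x

x  = hash_features("please mention Yahoo search", user="barney")
xp = hash_features("mention search engines",      user="wilma")
print(x @ xp)    # approximates the multitask bag-of-words kernel
```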
