SLIDE 1

Software development in AppStat

AppStat: Applied Statistics and Machine Learning
AppStat: Apprentissage Automatique et Statistique Appliquée

Balázs Kégl

Linear Accelerator Laboratory, CNRS / University of Paris Sud
Service Informatique, Nov 30, 2010


SLIDE 2

Overview

  • Introduction
  • me
  • the team
  • collaborations
  • Scientific projects → software
  • discriminative learning → boosting → multiboost.org
  • inference, Monte-Carlo integration → adaptive MCMC → integration into ROOT (save it for next time)

SLIDE 3

Scientific path

Hungary: 1989–94 M.Eng. Computer Science, BUTE; 1994–95 research assistant, BUTE
Canada: 1995–99 Ph.D. Computer Science, Concordia U; 2000 postdoc, Queen’s U; 2001–06 assistant professor, U of Montreal
France: 2006– research scientist (CR1), CNRS / U Paris Sud

  • Research interests: machine learning, pattern recognition, signal processing, applied statistics

  • Applications: image and music processing, bioinformatics, software engineering, grid control, experimental physics

SLIDE 4

The team

  • B. Kégl (team leader; 2006–): boosting, MCMC, Auger
  • R. Busa-Fekete (postdoc; 2008–): boosting, optimization, SysBio
  • R. Bardenet (Ph.D. student; 2009–): MCMC, optimization, Auger
  • D. Benbouzid (Ph.D. student; 2010–): boosting, JEM EUSO
  • F-D. Collin (software engineer; 01/12/2010): multiboost.org, MCMC in ROOT, system integration
  • D. Garcia (postdoc; 01/01/2011): generative models, Auger / JEM EUSO, tutoring
SLIDE 5

Collaborations

[Diagram: collaboration map. Experimental science at LAL: Auger, JEM EUSO, ILC, LSST, etc. Computer science: LTCI (Telecom ParisTech), TAO, LRI, ESBG, Hungarian Academy. Existing and future links labeled: boosting, optimization, MCMC, drug cocktail optimization, trigger, reconstruction, hypothesis test.]

SLIDE 6

Funding

  • ANR “jeune chercheur” MetaModel: 2007–2010, 150 k€
  • ANR COSINUS Siminole: 2010–2014, 1043 k€ (658 k€ at LAL)
  • MRM Grille Paris Sud: 2010–2012, 60 k€ (31 k€ at LAL)
SLIDE 7

Siminole within ANR COSINUS

  • COSINUS = Conception and Simulation
  • Theme 1: simulation and supercomputing
  • Theme 2: conception and optimization
  • Theme 3: large-scale data storage and processing
  • Siminole
  • principal theme: Theme 2
  • secondary theme: Theme 1
SLIDE 8

Siminole within ANR COSINUS

  • Simulation: third pillar of scientific discovery
  • Improving simulation
  • algorithmic development inside the simulator
  • implementation on high-end computing devices
  • our approach: control the number of calls to the simulator
SLIDE 9

Siminole within ANR COSINUS

  • Optimization: simulate from f(x), find max_x f(x)
  • Inference: simulate from p(x|θ), find p(θ|x)
  • Discriminative learning: simulate from p(x,θ), find θ = f(x)
SLIDE 10

Discriminative learning → boosting → multiboost.org

  • Discriminative learning (classification)
  • infer f : R^d → {1,...,K} from a database D = {(x1,y1),...,(xn,yn)}
  • boosting, AdaBoost: one of the state-of-the-art classification algorithms
  • multiboost.org: our implementation
SLIDE 11

Machine learning at the crossroads

[Diagram: machine learning at the crossroads of neuroscience, signal processing, statistics, artificial intelligence, optimization, probability theory, information theory, and cognitive science]

SLIDE 12

Machine Learning

  • From a statistical point of view
  • non-parametric fitting, capacity/complexity control
  • large dimensionality
  • large data sets, computational issues
  • mostly classification (categorization, discrimination)
SLIDE 13

Discriminative learning

  • observation vector: x ∈ R^d
  • class label: y ∈ {−1,1} – binary classification
  • class label: y ∈ {1,...,K} – multi-class classification
  • classifier: g : R^d → {−1,1}
  • discriminant function: f : R^d → [−1,1]

    g(x) = +1 if f(x) ≥ 0, −1 if f(x) < 0

SLIDE 14

Discriminative learning

  • Inductive learning
  • training sample: Dn = {(x1,y1),...,(xn,yn)}
  • function set: F
  • learning algorithm: ALGO : (R^d × {−1,1})^n → F, ALGO(Dn) → f
  • goal: small generalization error P{f(X) ≠ Y}
SLIDE 15

[Figure: data for a two-class classification problem, plotted in the (x1, x2) plane]

SLIDE 16

[Figure: 2D Gaussian fit for class 1]

SLIDE 17

[Figure: 2D Gaussian fit for class 2]

SLIDE 18

Classification

  • Terminology
  • Conditional densities: p(x|Y = 1), p(x|Y = −1)
  • Prior probabilities: p(Y = 1), p(Y = −1)
  • Posterior probabilities: p(Y = 1|x), p(Y = −1|x)
  • Bayes theorem:

    p(Y = 1|x) = p(x|Y = 1)·p(Y = 1) / p(x) ∝ p(x|Y = 1)·p(Y = 1)

  • Decision:

    g(x) = 1 if p(x|Y = 1)·p(Y = 1) / (p(x|Y = −1)·p(Y = −1)) > 1, −1 otherwise
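
To make the plug-in rule concrete, here is a minimal numpy sketch of the classifier behind the Gaussian-fit figures: fit one Gaussian per class, estimate the priors from class counts, and threshold the log posterior ratio. It is my illustration under those assumptions, not the slides' actual code; the arrays X_pos and X_neg holding the two classes' training points are hypothetical.

```python
import numpy as np

def fit_gaussian(X):
    """Maximum-likelihood mean and covariance for one class."""
    return X.mean(axis=0), np.cov(X, rowvar=False)

def log_density(x, mu, cov):
    """Log of the multivariate Gaussian density at x."""
    diff = x - mu
    return -0.5 * (diff @ np.linalg.solve(cov, diff)
                   + np.log(np.linalg.det(cov))
                   + len(mu) * np.log(2 * np.pi))

def bayes_classify(x, X_pos, X_neg):
    """Plug-in Bayes rule: g(x) = 1 iff p(x|Y=1)p(Y=1) > p(x|Y=-1)p(Y=-1)."""
    mu1, cov1 = fit_gaussian(X_pos)
    mu0, cov0 = fit_gaussian(X_neg)
    n1, n0 = len(X_pos), len(X_neg)
    log_ratio = (log_density(x, mu1, cov1) + np.log(n1 / (n1 + n0))
                 - log_density(x, mu0, cov0) - np.log(n0 / (n1 + n0)))
    return 1 if log_ratio > 0 else -1
```
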
SLIDE 19

[Figure: discriminant function with Gaussian fits]

SLIDE 20

[Figure: “two moons” data for a two-class classification problem]

SLIDE 21

[Figure: 2D Gaussian fit for class 1]

SLIDE 22

[Figure: 2D Gaussian fit for class 2]

SLIDE 23

[Figure: discriminant function with Gaussian fits]

SLIDE 24

[Figure: 2D Parzen fit for class 1, h = 0.12]

SLIDE 25

[Figure: 2D Parzen fit for class 2, h = 0.12]

SLIDE 26

[Figure: discriminant function with Parzen fits, h = 0.12]
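
The Parzen fits in these figures are kernel density estimates, one per class; here is a hedged numpy sketch with a Gaussian kernel of bandwidth h (my illustration; X_pos and X_neg are hypothetical training arrays):

```python
import numpy as np

def parzen_density(x, X, h):
    """Parzen window estimate at x: average of Gaussian kernels
    of bandwidth h centered on the training points in X."""
    d = X.shape[1]
    sq_dists = np.sum((X - x) ** 2, axis=1)
    kernels = np.exp(-sq_dists / (2 * h ** 2)) / (2 * np.pi * h ** 2) ** (d / 2)
    return kernels.mean()

def parzen_discriminant(x, X_pos, X_neg, h):
    """Positive where the class-1 density (weighted by its prior) wins."""
    n1, n0 = len(X_pos), len(X_neg)
    return (parzen_density(x, X_pos, h) * n1
            - parzen_density(x, X_neg, h) * n0) / (n1 + n0)
```

The slides that follow vary only h: a small bandwidth overfits the training points, a large one washes the classes together, which is exactly the trade-off plotted on slide 33.
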

SLIDE 27

[Figure: 2D Parzen fit for class 1, h = 0.02]

SLIDE 28

[Figure: 2D Parzen fit for class 2, h = 0.02]

SLIDE 29

[Figure: discriminant function with Parzen fits, h = 0.02]

SLIDE 30

[Figure: 2D Parzen fit for class 1, h = 3]

SLIDE 31

[Figure: 2D Parzen fit for class 2, h = 3]

SLIDE 32

[Figure: discriminant function with Parzen fits, h = 3]

SLIDE 33

[Figure: training and test error rates for Parzen fits with different bandwidths h]

SLIDE 34

Non-parametric fitting

  • Capacity control, regularization
  • trade-off between approximation error and estimation error
  • complexity grows with data size
  • no need to correctly guess the function class
SLIDE 35

Curse of dimensionality

  • Capacity/complexity control becomes a real issue in high-dimensional spaces

  • in a 10000-dimensional space a linear function has 10000 parameters!
  • Examples
  • images
  • music
  • language, text
  • bioinfo (genetics, proteomics)
SLIDE 36

Machine learning problems

  • Common goal: predict the future
  • make inferences on unknown future observations
  • Unsupervised learning
  • density estimation p(obs)
  • clustering, dimensionality reduction, one-class learning
  • Supervised learning
  • Classification : f(obs) → category
  • Regression : f(obs) → response
SLIDE 37

The supervised learning model

  • observation vector: x ∈ R^d
  • class label: y ∈ {−1,1} (or y ∈ {1,...,K})
  • classifier: g : R^d → {−1,1}
  • discriminant function: f : R^d → [−1,1]

    → classifier g(x) = +1 if f(x) ≥ 0, −1 if f(x) < 0

  • decision boundary: {x : f(x) = 0}
SLIDE 38

The supervised learning model

  • Learning by experience, with a supervisor
  • training set: Dn = {(x1,y1),...,(xn,yn)}
  • function class: F
  • learning algorithm: ALGO : (R^d × {−1,1})^n → F, ALGO(Dn) → f
  • goal: small generalization error R(g) = P{g(X) ≠ Y} = P{f(X)·Y ≤ 0}
  • learning principle: minimize the training error

    R̂(g) = (1/n) Σ_{i=1}^{n} I{g(xi) ≠ yi}

SLIDE 39

The supervised learning model

[Figure: discriminant function f ranging over [−1, 1], thresholded into the classifier g]

  • Margin: γ = y·f(x)
  • classification error ≡ negative margin
  • the magnitude of a positive margin quantifies the confidence
  • learning principle: minimize a smooth loss function of the margin:

    Rγ(f) = (1/n) Σ_{i=1}^{n} L(f(xi)·yi)
SLIDE 40

The supervised learning model

  • Margin loss functions

[Figure: margin loss functions plotted against the margin γ, labeled NN2, SVM2, NN1, SVM1, AdaBoost]
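
For reference, here is a sketch of standard margin losses and the empirical margin risk Rγ(f) of the previous slide. Matching the figure's labels to formulas (hinge for SVM1, squared losses for the “2” variants, exponential for AdaBoost) is my reading of the plot, not something stated on the slide.

```python
import numpy as np

# Margin losses L(gamma) with gamma = y * f(x); an error is gamma <= 0.
def squared_loss(gamma):       # (1 - gamma)^2, least-squares neural nets
    return (1 - gamma) ** 2

def hinge_loss(gamma):         # max(0, 1 - gamma), the SVM loss
    return np.maximum(0, 1 - gamma)

def exponential_loss(gamma):   # exp(-gamma), the loss AdaBoost minimizes
    return np.exp(-gamma)

def margin_risk(f, X, y, loss):
    """Empirical margin risk R_gamma(f) = (1/n) sum_i L(f(x_i) * y_i)."""
    margins = y * np.array([f(x) for x in X])
    return loss(margins).mean()
```
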

SLIDE 41

History

  • Algorithms
  • 1958: Perceptron [Rosenblatt, ’58] – [Minsky–Papert ’69]
  • 1986: Multilayer perceptrons (neural networks) and the back-propagation algorithm [Rumelhart–Hinton–Williams, ’86]
  • 1995: Support vector machines [Boser–Guyon–Vapnik, ’92], [Cortes–Vapnik, ’95]
  • 1997: boosting, AdaBoost [Freund, ’95], [Freund–Schapire, ’97]
SLIDE 42

The perceptron

  • Linear discriminant functions:

    f(x) = Σ_{i=0}^{d} w(i)·x(i) = ⟨w, x⟩

[Figure: the perceptron as a network: inputs x(0) = 1, x(1), ..., x(d); weights w(0), ..., w(d); a summation unit Σ producing f(x), thresholded into g(x)]

SLIDE 43

The perceptron

  • Linear discriminant functions:

    f(x) = Σ_{i=0}^{d} w(i)·x(i) = ⟨w, x⟩

  • Algorithm
  • simple iterative error correction
  • convergence if the data is linearly separable
  • oscillation for linearly non-separable data
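
A minimal sketch of this error-correcting rule (my illustration, using the slide's convention of a constant input x(0) = 1 carried by the first weight):

```python
import numpy as np

def perceptron(X, y, max_epochs=100):
    """Rosenblatt's rule: sweep the data, and on every mis-classified
    point (y * <w, x> <= 0) correct the weights by y * x."""
    Xb = np.hstack([np.ones((len(X), 1)), X])   # prepend x(0) = 1
    w = np.zeros(Xb.shape[1])
    for _ in range(max_epochs):
        corrections = 0
        for xi, yi in zip(Xb, y):
            if yi * (w @ xi) <= 0:
                w += yi * xi
                corrections += 1
        if corrections == 0:   # converged: the data was linearly separable
            return w
    return w                   # still oscillating: not linearly separable
```
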
SLIDE 44

Generalized linear discriminant functions

  • Model:

    f(x) = Σ_{j=1}^{N} α(j)·h(j)(x)

  • h(j) : R^d → [−1,1] – simple classifiers/discriminant functions, features, experts
  • α(j) ∈ R+ – weight of the expert h(j) in the final vote
SLIDE 45

Multilayer perceptron (neural net)

  • Model:

    f(x) = Σ_{j=1}^{N} α(j)·σ(⟨w_j, x⟩)

[Figure: a two-layer network: inputs x(0) = 1, x(1), ..., x(d); hidden units h(1)(x), ..., h(T)(x) with weight vectors w_1, ..., w_T; output f(x), the α-weighted sum of the hidden units]

SLIDE 46

Multilayer perceptron (neural net)

  • Model:

    f(x) = Σ_{j=1}^{N} α(j)·σ(⟨w_j, x⟩)

  • Algorithm:
  • gradient descent optimization
  • differentiable error functions → margin loss
  • differentiable activation function σ: the sigmoid
  • local minima, “engineering”, parameters to tune
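
As a sketch of what gradient descent looks like for this model, here is stochastic gradient descent on the squared margin loss (1 − y·f(x))² with a sigmoid σ. The architecture is the slide's f(x) = Σ_j α(j)·σ(⟨w_j, x⟩); the loss choice, learning rate, and initialization are my assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_mlp(X, y, n_hidden=10, epochs=200, lr=0.01, seed=0):
    """SGD on L = (1 - y * f(x))^2 for the one-hidden-layer model
    f(x) = sum_j alpha_j * sigmoid(<w_j, x>)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.1, size=(n_hidden, X.shape[1]))  # hidden w_j
    alpha = rng.normal(scale=0.1, size=n_hidden)            # output weights
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            h = sigmoid(W @ xi)             # hidden outputs
            f = alpha @ h                   # discriminant value
            dL_df = -2 * yi * (1 - yi * f)  # derivative of the loss
            grad_alpha = dL_df * h
            grad_W = dL_df * np.outer(alpha * h * (1 - h), xi)  # back-prop
            alpha -= lr * grad_alpha
            W -= lr * grad_W
    return W, alpha
```

The slide's caveats apply verbatim: the loss surface has local minima, and lr, n_hidden, and the initialization scale all need tuning.
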
SLIDE 47

Support vector machine

  • Model:

    f(x) = Σ_{j∈Isv} α(j)·y_j·K(x_j, x)

  • Isv ⊂ {1,...,n} is the set of support vectors
  • K(·,·) is a similarity function (kernel)
  • goal: classification boundary equidistant from classes
  • “sophisticated nearest neighbor”
  • slow and complex quadratic programming optimization
  • turn-key algorithm, very limited parameter tuning
SLIDE 48

Support vector machine

  • Model:

    f(x) = Σ_{j∈Isv} α(j)·y_j·K(x_j, x)

  • Kernel:
  • K(x, x′) = ⟨x, x′⟩ → f(x) is linear
  • K(x, x′) = (1 + ⟨x, x′⟩)^d → f(x) is a polynomial of degree d
  • K(x, x′) = exp(−(1/h)·‖x − x′‖²) → f(x) is a Gaussian mixture (→ Parzen)
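
The three kernels transcribe directly into code; svm_discriminant below is a hypothetical helper that evaluates f(x) given already-trained support vectors and coefficients (the quadratic program that finds them is the hard part, and is not shown):

```python
import numpy as np

def linear_kernel(x, xp):
    return x @ xp                              # f(x) is linear

def polynomial_kernel(x, xp, degree):
    return (1 + x @ xp) ** degree              # degree-d polynomial

def gaussian_kernel(x, xp, h):
    return np.exp(-np.sum((x - xp) ** 2) / h)  # Gaussian mixture (Parzen)

def svm_discriminant(x, support_X, support_y, alphas, kernel):
    """f(x) = sum over support vectors of alpha_j * y_j * K(x_j, x)."""
    return sum(a * yj * kernel(xj, x)
               for a, yj, xj in zip(alphas, support_y, support_X))
```
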

SLIDE 49

AdaBoost

  • Model:

    f(x) = Σ_{j=1}^{N} α(j)·h(j)(x)

  • no restriction on the form of h(j)(x)
  • often “decision stumps”:

    h_{ℓ,θ}(x) = +1 if x(ℓ) ≥ θ, −1 otherwise,   where x = (x(1), ..., x(d))
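
A weighted decision-stump learner of this form fits in a few lines; this exhaustive-search sketch (my illustration, not the multiboost implementation) returns the stump, or its negation, with the largest weighted edge Σ_i w_i·h(x_i)·y_i, the quantity the AdaBoost pseudocode on the coming slides calls γ.

```python
import numpy as np

def best_stump(X, y, w):
    """Exhaustive search over features l and thresholds theta for the
    stump h(x) = sign * (+1 if x[l] >= theta else -1) that maximizes
    the weighted edge sum_i w_i * h(x_i) * y_i."""
    best_edge, best_params = -1.0, None
    for l in range(X.shape[1]):
        values = np.unique(X[:, l])
        # candidates: below all points, and midpoints between neighbors
        thresholds = np.concatenate([[values[0] - 1.0],
                                     (values[:-1] + values[1:]) / 2.0])
        for theta in thresholds:
            h = np.where(X[:, l] >= theta, 1, -1)
            edge = np.sum(w * h * y)
            if abs(edge) > best_edge:   # a negative edge: flip the stump
                best_edge = abs(edge)
                best_params = (l, theta, 1 if edge > 0 else -1)
    l, theta, sign = best_params
    return lambda x: sign * (1 if x[l] >= theta else -1)
```
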
SLIDE 50

AdaBoost

  • Intuitive elementary algorithm
  • add one expert at a time
  • add the best expert on training points mis-classified by previous experts

  • weight of the expert chosen proportionally to its correctness
SLIDE 51

AdaBoost

  • Weighting over the training points: w1, ..., wn
  • normalized: Σ_{i=1}^{n} wi = 1
  • initialized uniformly: w = (1/n, ..., 1/n)
  • if xi is mis-classified by h(j), increase wi; otherwise, decrease wi
  • “difficult” training points gradually get larger weights
SLIDE 52

AdaBoost [Freund – Schapire ’97]

ADABOOST(Dn = {(xi, yi)}_{i=1}^{n}, BASE(·,·), T)
 1  w(1) ← (1/n, ..., 1/n)                          ⊲ initial weights
 2  for t ← 1 to T
 3      h(t) ← BASE(Dn, w(t))                       ⊲ calling the base learner
 4      γ(t) ← Σ_{i=1}^{n} w(t)_i · h(t)(xi) · yi   ⊲ edge = 1 − 2 × error
 5      α(t) ← (1/2) · ln((1 + γ(t)) / (1 − γ(t)))  ⊲ coefficient of h(t)
 6      for i ← 1 to n                              ⊲ re-weighting the points
 7          if h(t)(xi) ≠ yi then
 8              w(t+1)_i ← w(t)_i / (1 − γ(t))
 9          else
10              w(t+1)_i ← w(t)_i / (1 + γ(t))
11  return f(T)(·) = Σ_{t=1}^{T} α(t) · h(t)(·)
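
A direct numpy transcription of the pseudocode, reusing the best_stump learner sketched on slide 49; the comments point back to the numbered lines. This is an illustration, not the multiboost.org code, and it assumes the base learner is imperfect (|γ(t)| < 1) so that the logarithm and the divisions are defined.

```python
import numpy as np

def adaboost(X, y, base, T):
    n = len(X)
    w = np.full(n, 1.0 / n)                 # line 1: uniform weights
    alphas, hs = [], []
    for _ in range(T):                      # line 2
        h = base(X, y, w)                   # line 3: call the base learner
        preds = np.array([h(x) for x in X])
        gamma = np.sum(w * preds * y)       # line 4: edge
        alpha = 0.5 * np.log((1 + gamma) / (1 - gamma))   # line 5
        # lines 6-10: mis-classified points are divided by (1 - gamma),
        # which increases their weight; the weights stay normalized
        w = np.where(preds != y, w / (1 - gamma), w / (1 + gamma))
        alphas.append(alpha)
        hs.append(h)
    return lambda x: sum(a * h(x) for a, h in zip(alphas, hs))  # line 11
```

Here f = adaboost(X, y, best_stump, T=40) returns the discriminant f(T); the classifier is its sign. The next slide shows how the induced boundary evolves as T grows.
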

SLIDE 53

AdaBoost

[Figure: the evolving AdaBoost discriminant after t = 1, 3, 10, and 40 iterations]

SLIDE 54

AdaBoost

  • Algorithm
  • extremely simple learning, limited parameter tuning
  • fast
  • intuitive interpretation: weighted vote of experts
  • the choice of the pool of experts captures the a priori knowledge
  • no restriction on the form of the experts
  • label noise can be a problem
SLIDE 55

multiboost.org

  • Multi-class multi-label boosting software
  • based on ADABOOST.MH [Schapire-Singer ’99]
  • started by Norman Casagrande (M.Sc. student in Montreal, now with last.fm)

  • multi-platform C++
  • command-line UI, easy-to-use for a non-expert
  • adapting to a new data type is easy for an advanced user
  • tons of features
  • scales nicely
SLIDE 56

multiboost.org

  • Plan
  • going beyond classification: regression, ranking, collaborative filtering, reinforcement learning
  • technical improvements: multicore, GPU, grid, memory handling, etc.
  • redesign: orthogonal features → templates
  • implement a software development cycle (tests, etc.); can be tricky to balance between research and production

SLIDE 57

multiboost.org

  • Plan for F-D.
  • get acquainted with Machine Learning through understanding the code (of course, we’ll be there)
  • first concrete task: port it to multi-core (we have a good understanding of how), then to GPU (we have only a vague understanding; more challenging for F-D.)

  • gradually implement a software development cycle
  • redesign