
SLIDE 1

DM825 Introduction to Machine Learning Lecture 1

Introduction

Marco Chiarandini

Department of Mathematics & Computer Science University of Southern Denmark

SLIDE 2

Outline

  • 1. Course Introduction
  • 2. Introduction
  • 3. Supervised Learning
    • Linear Regression
    • Nearest Neighbor



SLIDE 5

Machine Learning

ML is a branch of artificial intelligence and an interdisciplinary field of computer science, statistics, mathematics and engineering.

Applications in science, finance and industry:

  • predict the likelihood of a certain disease on the basis of clinical measures
  • assess credit risk (default/non-default)
  • identify the digits in ZIP codes
  • identify risk factors for cancer based on clinical measures
  • drive vehicles
  • extract knowledge from databases in medical practice
  • spam filtering
  • customer recommendations (e.g., Amazon)
  • web search, fraud detection, stock trading, drug design

Automatically learn programs by generalizing from examples. As more data becomes available, more ambitious problems can be tackled.

SLIDE 6

Machine Learning vs Data Mining

Machine learning (or predictive analytics) focuses on the accuracy of prediction; further data can be collected. Data mining (or information retrieval) focuses on the efficiency of the algorithms, since it mainly deals with big data; all the data are given up front. In practice, however, the terms are often used interchangeably.

SLIDE 7

Aims of the course

  • to convey excitement about the subject
  • to learn about state-of-the-art methods
  • to acquire the skills to apply an ML algorithm, make it work and interpret the results
  • to gain some bits of folk knowledge for making ML algorithms work well (developing successful machine learning applications requires a substantial amount of "black art" that is difficult to find in textbooks)

SLIDE 8

Schedule

Schedule (≈ 28 lecture hours + ≈ 14 exercise hours):

  • Monday, 08:15-10:00, IMADA seminarrum
  • Wednesday, 16:15-18:00, U49
  • Friday, 08:15-10:00, IMADA seminarrum

Last lecture: Friday, March 15, 2013

SLIDE 9

Communication tools

  • Course Public Webpage (WWW) ⇔ BlackBoard (BB) (link from http://www.imada.sdu.dk/~marco/DM825/)
  • Announcements in BlackBoard
  • Personal email

Main reading material:

  • Pattern Recognition and Machine Learning by C.M. Bishop, Springer, 2006
  • Lecture notes by Andrew Ng, Stanford University
  • Slides

SLIDE 10

Contents

Supervised learning:

  • linear regression and linear models: gradient descent, Newton-Raphson (batch and sequential), least squares method, k-nearest neighbor, curse of dimensionality, regularized least squares (aka shrinkage or ridge regression), locally weighted linear regression, model selection, maximum likelihood approach, Bayesian approach
  • linear models for classification: logistic regression, multinomial (logistic) regression, generalized linear models, decision theory
  • neural networks: perceptron algorithm, multi-layer perceptrons
  • generative algorithms: Gaussian discriminant and linear discriminant analysis
  • kernels and support vector machines
  • probabilistic graphical models: naive Bayes, discrete, linear Gaussian, mixed variables, conditional independence, Markov random fields, inference (exact, chains, polytrees, approximate), hidden Markov models
  • bagging, boosting, tree-based methods, learning theory

Unsupervised learning: association rules, cluster analysis, k-means, mixture models, EM algorithm, principal components

Reinforcement learning: MDPs, Bellman equations, value iteration and policy iteration, Q-learning, policy search, POMDPs

Data mining: frequent pattern mining

SLIDE 11

Prerequisites

  • Calculus (MM501, MM502)
  • Linear Algebra (MM505)
  • Probability calculus (random variables, expectation, variance)
  • Discrete Methods (DM527)
  • Statistics (ST501)
  • Programming in R

SLIDE 12

Evaluation

5 ECTS course; language: Danish and English

  • Obligatory assignments, pass/fail, evaluated by the teacher (2 hand-ins): practical part
  • 3-hour written exam, 7-grade scale, external censor: theory part, similar to the exercises in class

SLIDE 13

Assignments

Small projects (in groups of 2) that must be passed to attend the exam:

  • Data sets and guidelines will be provided, but you can propose to work on different data (e.g., www.kaggle.org)
  • They entail programming in R

SLIDE 14

Exercises

  • Prepare for the exercise session by revising the theory
  • In class, you will work on the exercises in small groups

SLIDE 15

Outline

  • 1. Course Introduction
  • 2. Introduction
  • 3. Supervised Learning
    • Linear Regression
    • Nearest Neighbor

SLIDE 16

Supervised Learning

  • inputs that influence outputs
  • inputs: predictors, independent variables, features
  • outputs: responses, dependent variables
  • goal: predict the value of the outputs
  • supervised: we provide a data set with exact answers
  • regression problem: the variable to predict is continuous/quantitative
  • classification problem: the variable to predict is discrete/qualitative/categorical (a factor)

SLIDE 17

Other forms of learning

  • unsupervised learning
  • reinforcement learning: not a one-shot decision but a sequence of decisions over time (e.g., helicopter flight); given a reward function, maximize the reward
  • evolutionary learning: fitness, score
  • learning theory, examples of analyses: a guarantee that a learning algorithm can reach, say, 99% accuracy given a very large amount of data; how much training data one needs

SLIDE 18

Notation

  • X: input vector, with X_j its jth component (we use uppercase letters such as X, Y or G when referring to the generic aspects of a variable)
  • x^i: the ith observed value of X (we use lowercase for observed values)
  • Y, G: outputs (G for groups, i.e., qualitative outputs)
  • j = 1, ..., p for parameters and i = 1, ..., m for observations
  • the m × p matrix collecting a set of m input p-vectors x^i, i = 1, ..., m, is

$$X = \begin{pmatrix} x^1_1 & \cdots & x^1_p \\ \vdots & & \vdots \\ x^m_1 & \cdots & x^m_p \end{pmatrix}$$

  • x_j: all the observations on the variable X_j (a column vector)

SLIDE 19

Learning task: given the value of an input vector X, make a good prediction of the output Y, denoted by Ŷ.

  • If Y ∈ ℝ then Ŷ ∈ ℝ
  • If G ∈ 𝒢 then Ĝ ∈ 𝒢
  • If G ∈ {0, 1} then it is possible to encode it as Y ∈ [0, 1] and set Ĝ = 0 if Ŷ < 0.5 and Ĝ = 1 if Ŷ ≥ 0.5

(x^i, y^i) or (x^i, g^i) are the training data.

SLIDE 20

Learning Task: Overview

Learning = Representation + Evaluation + Optimization

  • Representation: a formal language that the computer can handle. Corresponds to choosing the set of functions that can be learned, i.e., the hypothesis space of the learner, and how to represent the input, that is, which features to use.
  • Evaluation: an evaluation function (aka objective function or scoring function).
  • Optimization: a method to search among the learners in the language for the highest-scoring one. Efficiency issues arise here; it is common for new learners to start out using off-the-shelf optimizers, which are later replaced by custom-designed ones.


SLIDE 22

Outline

  • 1. Course Introduction
  • 2. Introduction
  • 3. Supervised Learning
    • Linear Regression
    • Nearest Neighbor

SLIDE 23

Supervised Learning Problem

SLIDE 24

Learning Task

SLIDE 25

Regression Problem

SLIDE 26

Representation of the hypothesis space:

  • h(x) = θ₀ + θ₁x, a linear function
  • if we know another feature: h(x) = θ₀ + θ₁x₁ + θ₂x₂ = h_θ(x)
  • for conciseness, defining x₀ = 1:

$$h(x) = \sum_{j=0}^{2} \theta_j x_j = \theta^T x$$

where p is the number of features and θ is the vector of p + 1 parameters; θ₀ is the bias.
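As an illustration, here is a minimal R sketch of this hypothesis with the x₀ = 1 convention (the function name h and the example numbers are illustrative, not from the slides):

```r
# Linear hypothesis h_theta(x) = theta^T x, with x0 = 1 prepended for the bias.
h <- function(theta, x) sum(theta * c(1, x))

h(c(0.5, 2, -1), c(3, 4))  # 0.5 + 2*3 - 1*4 = 2.5
```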

SLIDE 27

Evaluation: a loss function L(Y, h(X)) for penalizing errors in prediction. The most common choice is the squared error loss:

$$L(Y, h(X)) = (h(X) - Y)^2$$

Optimization: this leads to minimizing the cost function

$$J(\theta) = \frac{1}{2}\sum_{i=1}^{m}\left(h_\theta(x^i) - y^i\right)^2$$

that is, solving min_θ J(θ).
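The cost function translates directly into R; a minimal sketch (cost_J is an illustrative name, and X is assumed to carry a leading column of ones):

```r
# Cost function J(theta) = 1/2 * sum_i (h_theta(x^i) - y^i)^2.
cost_J <- function(theta, X, y) {
  r <- X %*% theta - y  # residuals h_theta(x^i) - y^i
  0.5 * sum(r^2)
}
```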

SLIDE 28

Parameter estimation

Learn by adjusting the parameters to reduce the error on the training set. The squared error for a single example with input x and true output y is

$$J(\theta) = \frac{1}{2}\left(h_\theta(x) - y\right)^2$$

Find local optima for the minimization of the function J(θ) in the vector of variables θ by gradient methods.

SLIDE 29

Gradient methods

Gradient methods are iterative approaches:

  • find a descent direction with respect to the objective function J
  • move θ in that direction by a step size

The descent direction can be computed by various methods, such as gradient descent, the Newton-Raphson method and others. The step size can be computed either exactly or loosely by solving a line search problem.

Example: gradient descent

  1. Set the iteration counter t = 0 and make an initial guess θ^0 for the minimum
  2. Repeat:
  3.   Compute a descent direction p^t = ∇J(θ^t)
  4.   Choose α_t to minimize f(α) = J(θ^t − α p^t) over α ∈ ℝ⁺
  5.   Update θ^{t+1} = θ^t − α_t p^t, and t = t + 1
  6. Until ‖∇J(θ^t)‖ < tolerance

Step 4 can be solved 'loosely' by taking a fixed, small enough value α > 0.
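A minimal R sketch of steps 1-6, using the 'loose' fixed step size from the note above (grad_descent and its arguments are illustrative names; grad_f is assumed to return ∇J(θ)):

```r
# Generic gradient descent (steps 1-6) with a fixed step size alpha.
grad_descent <- function(grad_f, theta0, alpha = 0.01, tol = 1e-6, max_iter = 10000) {
  theta <- theta0
  for (iter in seq_len(max_iter)) {
    p <- grad_f(theta)               # step 3: descent direction
    if (sqrt(sum(p^2)) < tol) break  # step 6: stop when the gradient norm is small
    theta <- theta - alpha * p       # step 5, with fixed alpha replacing step 4
  }
  theta
}
```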

SLIDE 30

In our linear regression case, the update rule of lines 3-5 for one single training example becomes:

$$\theta^{t+1}_j = \theta^t_j - \alpha \frac{\partial J(\theta)}{\partial \theta_j}$$

$$\begin{aligned}
\frac{\partial J(\theta)}{\partial \theta_j} &= \frac{\partial}{\partial \theta_j} \frac{1}{2}\left(h_\theta(x) - y\right)^2 \\
&= 2 \cdot \frac{1}{2}\left(h_\theta(x) - y\right) \frac{\partial}{\partial \theta_j}\left(h_\theta(x) - y\right) \\
&= \left(h_\theta(x) - y\right) \frac{\partial}{\partial \theta_j}\left(\theta_0 + \theta_1 x_1 + \ldots + \theta_p x_p\right) \\
&= \left(h_\theta(x) - y\right) x_j
\end{aligned}$$

$$\theta^{t+1}_j = \theta^t_j - \alpha\left(h_\theta(x) - y\right) x_j$$

SLIDE 31

So far we considered one single training example. For a whole training set, batch gradient descent repeats

$$\theta^{t+1}_j = \theta^t_j - \alpha \sum_{i=1}^{m}\left(h_\theta(x^i) - y^i\right) x^i_j$$

until convergence.

Stochastic gradient descent:

repeat
  for i = 1 ... m do
    $\theta^{t+1}_j = \theta^t_j - \alpha\left(h_\theta(x^i) - y^i\right) x^i_j$
until convergence

Exercise: implement them in R. Compare with the optim function and grad.desc from the package animation.
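A possible R implementation of the two schemes, as the exercise asks; a sketch under the assumption that X carries a leading column of ones (all names are illustrative):

```r
# Batch gradient descent: each step uses all m training examples.
batch_gd <- function(X, y, alpha = 0.01, n_iter = 1000) {
  theta <- rep(0, ncol(X))
  for (step in seq_len(n_iter)) {
    grad <- as.vector(t(X) %*% (X %*% theta - y))  # sum_i (h_theta(x^i) - y^i) x^i
    theta <- theta - alpha * grad
  }
  theta
}

# Stochastic gradient descent: update after every single example.
sgd <- function(X, y, alpha = 0.01, n_epochs = 100) {
  theta <- rep(0, ncol(X))
  for (epoch in seq_len(n_epochs)) {
    for (i in seq_len(nrow(X))) {
      resid <- sum(X[i, ] * theta) - y[i]  # h_theta(x^i) - y^i
      theta <- theta - alpha * resid * X[i, ]
    }
  }
  theta
}
```

Both should approach the same minimizer; for comparison, `optim(rep(0, ncol(X)), cost_J, X = X, y = y)` minimizes the cost_J sketch from above with a general-purpose optimizer, alongside the grad.desc function from the animation package mentioned on the slide.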

SLIDE 32

Closed Form

The function J(θ) is a convex quadratic function, so we can derive the gradient in closed form. In matrix-vector notation, with the design matrix X and response vector y:

$$X = \begin{pmatrix} \cdots & (x^1)^T & \cdots \\ \cdots & (x^2)^T & \cdots \\ & \vdots & \\ \cdots & (x^m)^T & \cdots \end{pmatrix}, \qquad y = \begin{pmatrix} y^1 \\ y^2 \\ \vdots \\ y^m \end{pmatrix}$$

Since h_θ(x^i) = (x^i)^T θ, we have:

$$X\theta - y = \begin{pmatrix} (x^1)^T\theta \\ \vdots \\ (x^m)^T\theta \end{pmatrix} - \begin{pmatrix} y^1 \\ \vdots \\ y^m \end{pmatrix} = \begin{pmatrix} h_\theta(x^1) - y^1 \\ \vdots \\ h_\theta(x^m) - y^m \end{pmatrix}$$

SLIDE 33

For a vector z it holds that z^T z = Σ_i z_i², hence

$$J(\theta) = \frac{1}{2}\sum_{i=1}^{m}\left(h_\theta(x^i) - y^i\right)^2 = \frac{1}{2}(X\theta - y)^T(X\theta - y)$$

To minimize J we solve ∇_θ J(θ) = 0 with respect to θ:

$$\begin{aligned}
\nabla_\theta J(\theta) &= \nabla_\theta \tfrac{1}{2}(X\theta - y)^T(X\theta - y) \\
&= \tfrac{1}{2}\nabla_\theta\left(\theta^T X^T X\theta - \theta^T X^T y - y^T X\theta + y^T y\right) \\
&= \tfrac{1}{2}\nabla_\theta \operatorname{tr}\left(\theta^T X^T X\theta - 2\, y^T X\theta\right) && (\operatorname{tr} a = a \text{ for scalar } a,\ \operatorname{tr} A = \operatorname{tr} A^T) \\
&= \tfrac{1}{2}\left(X^T X\theta + X^T X\theta - 2X^T y\right) \\
&= X^T X\theta - X^T y = 0
\end{aligned}$$

which gives the normal equations X^T X θ = X^T y and thus

$$\theta = (X^T X)^{-1} X^T y$$
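In R the closed form is a one-liner; a sketch (solving the system X^T X θ = X^T y with solve() is numerically preferable to forming the inverse explicitly):

```r
# Normal equations: solve X^T X theta = X^T y for theta.
theta_hat <- solve(t(X) %*% X, t(X) %*% y)

# Cross-check with R's built-in least-squares fit; "- 1" suppresses lm's own
# intercept because X already contains the column of ones.
coef(lm(y ~ X - 1))
```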

SLIDE 34

k nearest neighbor

Regression:

$$\hat{y}(x) = \frac{1}{k}\sum_{x^i \in N_k(x)} y^i$$

the average of the k closest points. The concept of closeness requires the definition of a metric, e.g., Euclidean distance.

Classification:

$$\hat{G} = \begin{cases} 1 & \text{if } \hat{y} > 0.5 \\ 0 & \text{if } \hat{y} \leq 0.5 \end{cases}$$

which corresponds to a majority rule. For k = 1 we predict the response of the point in the training set closest to x (Voronoi tessellation).

Remark: on the training data the error increases with k, while for k = 1 it is zero.
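A minimal R sketch of both uses with Euclidean distance (knn_regress and knn_classify are illustrative names, not from the slides):

```r
# k-nearest-neighbor regression at a query point x0:
# average the responses of the k closest training points.
knn_regress <- function(x0, X, y, k = 3) {
  d <- sqrt(colSums((t(X) - x0)^2))  # Euclidean distances to all training points
  mean(y[order(d)[1:k]])
}

# Majority rule for a 0/1-coded class variable g.
knn_classify <- function(x0, X, g, k = 3) {
  as.integer(knn_regress(x0, X, g, k) > 0.5)
}
```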

SLIDE 35

Curse of Dimensionality

Consider k-nearest neighbor in p dimensions, with points uniformly distributed in a p-dimensional unit hypercube. A hypercubical neighborhood that captures a fraction r of the observations corresponds to a fraction r of the unit volume, so its expected edge length is

$$e_p(r) = r^{1/p}$$

For p = 10:

  • e₁₀(0.01) = 0.63: to capture 1% of the data, the neighborhood must span 63% of the range of each input
  • e₁₀(0.1) = 0.80: to capture 10% of the data, it must span 80% of the range of each input

The sampling density is proportional to m^{1/p}. Thus if m = 100 is a dense sample in 1 dimension, a sample with the same density in 10 dimensions requires m = 100^{10} observations.
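The two numbers on the slide are easy to reproduce in R (a minimal sketch):

```r
# Expected edge length e_p(r) = r^(1/p) of a hypercube neighborhood that
# captures a fraction r of points uniformly distributed in p dimensions.
e <- function(r, p) r^(1/p)

e(0.01, 10)  # ~0.63: 1% of the data spans 63% of each input's range
e(0.10, 10)  # ~0.80: 10% of the data spans 80%
```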

SLIDE 36

Summary

  • linear models for regression (how can they be used for nonlinear patterns in the data?)
  • k-nearest neighbor (for regression and classification)
  • curse of dimensionality: the directions along which important variations of the target variable arise may be confined
  • local interpolation-like techniques help us make predictions at new values