Maschinelles Lernen: Methoden, Algorithmen, Potentiale und gesellschaftliche Herausforderungen
SLIDE 1 Maschinelles Lernen: Methoden, Algorithmen, Potentiale und gesellschaftliche Herausforderungen
Felix Wichmann
Neural Information Processing Group and Bernstein Center for Computational Neuroscience, Eberhard Karls Universität Tübingen
Max Planck Institute for Intelligent Systems, Tübingen
SLIDE 2 http://www.appblogger.de/wp-content/uploads/2013/03/pb-130314-pope-2005.photoblog900.jpg
SLIDE 3 http://msnbcmedia.msn.com/j/MSNBC/Components/Photo/_new/pb-130314-pope-2013.photoblog900.jpg
SLIDE 4

SLIDE 5–7 One way to think about vision: inverse optics
Laws of physics “generate” 2D images on our retinae from 3D scenes (forward optics / rendering). Light source (e.g. sunlight) and object reflectance interact: the amount of light entering the eye is the product of light-source intensity and object reflectance.
Starting point to think about visual perception: we want to infer the 3D scene from the 2D retinal images: inverse optics! But: inverse optics is mathematically impossible.
SLIDE 8–13 [rendering sequence] N = 0, N = 1, N = 2, N = 5, N = 9, N = 24 (considered fully rendered)
SLIDE 14–15 [images] modified from Matthias Bethge
SLIDE 16–17 Illumination („light field“) and objects & surfaces (geometry, materials) produce the resulting image: (non-linear) „information entanglement“. Visual inference: „untangling“. Modified from Matthias Bethge.
SLIDE 18–30 [no text content]

SLIDE 31–35 Machine learning (ML) and statistics
“Statistics is the science of learning from data. … [ML] is the science of learning from data. These fields are identical in intent although they differ in their history, conventions, emphasis and culture.” (Wasserman, 2014)
ML is a comparatively new sub-branch of computational statistics, jointly developed in computer science and statistics.
ML is inference performed by computers based on past observations and learning algorithms: ML algorithms are mainly concerned with discovering hidden structure in data in order to predict novel data—exploratory methods, to get things done!
“Classical” statistics is typically concerned with making precise probabilistic statements about known data coming from known distributions, i.e. an interest in accurate models of the data!
SLIDE 36 What is the difference between statistics and machine learning?
Machine Learning is AI people doing data analysis.
Data Mining is database people doing data analysis.
Applied Statistics is statisticians doing data analysis.
Infographics is Graphic Designers doing data analysis.
Data Journalism is Journalists doing data analysis.
Econometrics is Economists doing data analysis (and here you can win a Nobel Prize).
Psychometrics is Psychologists doing data analysis.
Chemometrics and Cheminformatics are Chemists doing data analysis.
Bioinformatics is Biologists doing data analysis.
Aleks Jakulin, https://www.quora.com/What-is-the-difference-between-statistics-and-machine-learning
SLIDE 37 What is the difference between statistics and machine learning? (cont’d)
… if you look at the goals both fields are trying to achieve, you see that there is actually quite a big difference: Statistics is interested in learning something about data, for example, which have been measured as part of some biological experiment. … But the overall goal is to arrive at new scientific insight based on the data.
In Machine Learning, the goal is to solve some complex computational task by “letting the machine learn”. Instead of trying to understand the problem well enough to be able to write a program which is able to perform the task (for example, handwritten character recognition), you instead collect a huge amount of examples of what the program should do, and then run an algorithm which is able to perform the task by learning from the examples. Often, the learning algorithms are statistical in nature. But as long as the prediction works well, any kind of statistical insight into the data is not necessary.
Mikio Braun, https://www.quora.com/What-is-the-difference-between-statistics-and-machine-learning
SLIDE 38 What is the difference between statistics and machine learning? (cont’d)
The primary differences are perhaps the types of problems attacked, and the goal of learning. At the risk of oversimplification, one could say that in statistics a prime focus is often in understanding the data and relationships in terms of models giving approximate summaries such as linear relations or independencies. In contrast, the goals in machine learning are primarily to make predictions as accurately as possible and to understand the behaviour of learning algorithms. These differing objectives have led to different developments in the two fields: for example, neural network algorithms have been used extensively as black-box function approximators in machine learning, but to many statisticians they are less than satisfactory, because of the difficulties in interpreting such models.
Franck Dernoncourt, https://www.quora.com/What-is-the-difference-between-statistics-and-machine-learning
SLIDE 39–43 Terminology: types of learning
Supervised learning is the ML task of inferring a function from labeled training data. In supervised learning, each example is a pair consisting of an input object (typically a vector) and a desired output value (also called the supervisory signal). A supervised learning algorithm analyzes the training data and produces an inferred function, which can be used for prediction.
Reinforcement learning is an area of ML inspired by behaviorist psychology, concerned with how software agents ought to take actions in an environment so as to maximize some notion of cumulative reward. Unlike in supervised ML, correct input/output pairs are never presented, nor are sub-optimal actions explicitly corrected; there is only a global reward for an action.
Unsupervised learning is the ML task of inferring a function to describe hidden structure from unlabeled data. Since the examples given to the learner are unlabeled, there is no error or reward signal to evaluate a potential solution. This distinguishes unsupervised learning from supervised learning and reinforcement learning. A good example is identifying close-knit groups of friends in social network data, using clustering algorithms like k-means (see the sketch below).
Semi-supervised learning is a class of algorithms making use of unlabeled data for training—typically a small amount of labeled data with a large amount of unlabeled data. Semi-supervised learning falls between unsupervised learning (without any labeled training data) and supervised learning (with completely labeled training data).
SLIDE 44–48 Terminology: types of problems in supervised ML
Classification: problems where we seek a yes-or-no prediction, such as “Is this tumour cancerous?”, “Does this cookie meet our quality standards?”, and so on.
Regression: problems where the value being predicted falls somewhere on a continuous spectrum. These systems help us with questions of “How much?” or “How many?”
The support vector machine (SVM) is a supervised classification algorithm.
Neural networks, including the now so popular convolutional deep neural networks (DNNs), are supervised algorithms too, typically however for multi-class classification.
SLIDE 49–55 Success of supervised classification in ML
ML—and in particular kernel methods as well as, very recently, so-called deep neural networks (DNNs)—has proven successful whenever there is an abundance of empirical data but a lack of explicit knowledge of how the data were generated:
  • Predict credit card fraud from patterns of money withdrawals.
  • Predict toxicity of novel substances (biomedical research).
  • Predict engine failure in airplanes.
  • Predict what people will google next.
  • Predict what people want to buy next at amazon.
SLIDE 56–58 The Function Learning Problem [figure: scattered data points in the (x, y) plane]
SLIDE 59–66 Learning Problem in General
Training examples (x1, y1), …, (xm, ym)
Task: given a new x, find the new y (strong emphasis on prediction, that is, generalization!)
Idea: (x, y) should look “similar” to the training examples
Required: a similarity measure for (x, y)
Much of the creativity and difficulty in kernel-based ML: find suitable similarity measures for all the practical problems discussed before, e.g. credit card fraud, toxicity of novel molecules, gene sequences, … When are two molecules, with different atoms, structure, configuration etc. the same? When are two strings of letters or sentences similar? What would be the mean, or the variance, of strings? Of molecules?
Very recent deep neural network success: the network learns the right similarity measure from the data!
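Two toy similarity measures of the kind asked for above, as illustrations only (not the measures actually used in the applications mentioned): a Gaussian (RBF) kernel for real-valued vectors, and a simple k-mer counting (“spectrum”) similarity for strings such as gene sequences:

```python
import numpy as np
from collections import Counter

def rbf_kernel(x, z, gamma=1.0):
    """Gaussian (RBF) similarity between two real-valued vectors."""
    return np.exp(-gamma * np.sum((np.asarray(x) - np.asarray(z)) ** 2))

def spectrum_kernel(s, t, k=3):
    """Similarity between two strings: inner product of their k-mer counts."""
    cs = Counter(s[i:i+k] for i in range(len(s) - k + 1))
    ct = Counter(t[i:i+k] for i in range(len(t) - k + 1))
    return sum(cs[m] * ct[m] for m in cs)

print(rbf_kernel([1, 2], [1.1, 2.0]))          # close to 1: very similar vectors
print(spectrum_kernel("GATTACA", "GATTTACA"))  # shared 3-mers of two gene-like strings
```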
SLIDE 67–71 The Support Vector Machine
A computer algorithm that learns by example to assign labels to objects. Successful in handwritten digit recognition, credit card fraud detection, classification of gene expression profiles etc.
The essence of the SVM algorithm requires understanding of:
  • i. the separating hyperplane
  • ii. the maximum-margin hyperplane
  • iii. the soft margin
  • iv. the kernel function
For SVMs and machine learning in general:
  • i. regularisation
  • ii. cross-validation
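A minimal soft-margin SVM sketch using scikit-learn; the two-dimensional toy data are invented here and only loosely mimic the two-gene leukemia example on the following slides. C controls the soft margin, and the kernel supplies the non-linear similarity measure:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# two noisy classes in 2D (invented stand-in for the two-gene data)
X = np.vstack([rng.normal([3, 9], 1.0, (40, 2)),
               rng.normal([9, 3], 1.0, (40, 2))])
y = np.array([0] * 40 + [1] * 40)

# soft-margin SVM: C trades margin width against training errors
clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)
print(clf.score(X, y))             # training accuracy
print(clf.support_vectors_.shape)  # the examples that define the margin
```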
SLIDE 72 Two Genes and Two Forms of Leukemia [figure a: scatter plot of ZYX vs. MARCKSL1 expression] (microarrays deliver thousands of genes, but hard to draw ...)
SLIDE 73 Separating Hyperplane [figure b]
SLIDE 74 Separating Hyperplane in 1D — a Point [figure c]
SLIDE 75 ... and in 3D: a plane [figure d: ZYX vs. MARCKSL1 vs. HOXA9]
SLIDE 76 Many Potential Separating Hyperplanes ... (all “optimal” w.r.t. some loss function) [figure e]
SLIDE 77 The Maximum-Margin Hyperplane [figure f]
SLIDE 78 What to Do With Outliers? [figure g]
SLIDE 79 The Soft-Margin Hyperplane [figure h]
SLIDE 80 The Kernel Function in 1D [figure i: expression values on a line]
SLIDE 81 Mapping the 1D data to 2D (here: squaring) [figure j: squared expression vs. expression]
SLIDE 82 Not linearly separable in input space ... Figure 3. The crosses and the circles cannot be separated by a linear perceptron in the plane.
SLIDE 83 Map from 2D to 3D ... $\Phi(x) = (\varphi_1(x),\ \varphi_2(x),\ \varphi_3(x))^\top = (x_1^2,\ \sqrt{2}\,x_1 x_2,\ x_2^2)^\top$
SLIDE 84 ... linear separability in 3D (actually: the data are still 2D, they “live” on a manifold of the original dimensionality!) Figure 4. The crosses and circles from Figure 3 can be mapped to a three-dimensional space in which they can be separated by a linear perceptron.
SLIDE 85 Projecting the 4D Hyperplane Back into 2D Input Space [figure k]
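A sketch of the squaring trick from panels i/j, with invented 1D data: no single threshold (a “point” hyperplane) separates the classes, but after the feature map phi(x) = (x, x²) a line does:

```python
import numpy as np

# 1D data: class 1 lies on both sides of class 0, so no single cut point works
x = np.array([-4, -3, 3, 4, -1, 0, 1], dtype=float)
y = np.array([1, 1, 1, 1, 0, 0, 0])  # 1 = outer class, 0 = inner class

# lift each point to 2D via phi(x) = (x, x**2): squaring folds the line over
phi = np.column_stack([x, x ** 2])

# in the lifted space the horizontal line x2 = 5 separates the classes
print(phi[y == 1][:, 1])  # squared values 9..16, all above 5
print(phi[y == 0][:, 1])  # squared values 0..1, all below 5
```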
SLIDE 86–89 SVM magic?
For any consistent dataset there is a kernel that allows perfect separation of the data. Why bother with soft margins?
The so-called curse of dimensionality: as the number of variables considered increases, the number of possible solutions increases exponentially … overfitting looms large!
SLIDE 90 Overfitting [figure l]
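A small demonstration of overfitting with polynomial regression; the data and the degrees are invented for illustration. The degree-9 fit passes through all ten training points yet generalises worst:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 10)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, x.size)  # noisy samples of a smooth function

for degree in (1, 3, 9):
    coeffs = np.polyfit(x, y, degree)  # least-squares polynomial fit
    train_err = np.mean((np.polyval(coeffs, x) - y) ** 2)
    x_test = np.linspace(0, 1, 200)
    test_err = np.mean((np.polyval(coeffs, x_test) - np.sin(2 * np.pi * x_test)) ** 2)
    print(degree, round(train_err, 4), round(test_err, 4))
# degree 9 drives the training error to ~0 while the test error explodes: overfitting
```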
SLIDE 91–94 Regularisation & Cross-validation
Find a compromise between complexity and classification performance, i.e. kernel function and soft margin.
Penalise complex functions via a regularisation term or regulariser.
Cross-validate the results (leave-one-out or 10-fold typically used).
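A hand-rolled k-fold cross-validation sketch (NumPy only; the helper names and the toy data are invented here). It measures held-out rather than training error, and uses it to pick the model complexity:

```python
import numpy as np

def k_fold_cv(X, y, fit, predict, k=10, seed=0):
    """Estimate prediction error by k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    folds = np.array_split(idx, k)
    errors = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        model = fit(X[train], y[train])  # train on k-1 folds
        errors.append(np.mean((predict(model, X[test]) - y[test]) ** 2))
    return np.mean(errors)              # held-out error, not training error

# example: pick the polynomial degree with the lowest cross-validated error
rng = np.random.default_rng(1)
X = np.linspace(0, 1, 50)
y = np.sin(2 * np.pi * X) + rng.normal(0, 0.2, X.size)
for degree in (1, 3, 9):
    err = k_fold_cv(X, y, fit=lambda a, b: np.polyfit(a, b, degree),
                    predict=np.polyval)
    print(degree, round(err, 4))
```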
SLIDE 95–99 SVM Summary
Kernel essential—the best kernel is typically found by trial and error and experience with similar problems etc.
Inverting is not always easy; approximations are needed etc. (i.e. science is hard, engineering is easy, as engineers don’t care as long as it works!)
Theoretically sound, and a convex optimisation (no local minima).
Choose between:
  • complicated decision functions and training (neural networks), or
  • a clear theoretical foundation (best possible generalisation) and convex optimisation, but the need to trade off complexity versus soft margin and the skilful selection of the “right” kernel (= the “correct” non-linear similarity measure for the data!)
SLIDE 100–103 Regularisation, Cross-Validation and Kernels
Much of the success of modern machine learning methods can be attributed to three ideas:
  • 1. Regularisation. Given are N “datapoints” $(x_i, y_i)$ with $x = x_1, \dots, x_N$ and $y = y_1, \dots, y_N$, and a model $f$. Then the “error” between data and model is $E(y, f(x))$. In machine learning we take into account not only the “error” between model and data but in addition a measure of the complexity of the model $f$: $E(y, f(x)) + \lambda R(f)$.
  • 2. Cross-Validation. Regularisation is related to the prior in Bayesian statistics. Unlike in Bayesian statistics, the trade-off between small error and low complexity of the model is controlled by a parameter $\lambda$—this is optimized using cross-validation.
  • 3. Non-linear mapping with linear separation. True for kernels as well as DNNs.
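Ridge regression is perhaps the simplest concrete instance of the regularised objective $E(y, f(x)) + \lambda R(f)$: squared error plus $\lambda$ times the squared weight norm. A NumPy sketch with invented data:

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Minimise ||y - X w||^2 + lam * ||w||^2 (squared error + L2 regulariser R(f))."""
    n_features = X.shape[1]
    # closed-form solution: w = (X^T X + lam I)^(-1) X^T y
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))
w_true = np.array([1.0, -2.0, 0.0, 0.0, 3.0])
y = X @ w_true + rng.normal(0, 0.1, 30)

for lam in (0.0, 1.0, 100.0):
    w = ridge_fit(X, y, lam)
    print(lam, np.round(w, 2))  # larger lambda shrinks the weights toward zero
```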
SLIDE 104

SLIDE 105–108 What changed vision research in 2012?
ImageNet challenge: 1000 categories, 1.2 million training images.
AlexNet by Krizhevsky, Sutskever & Hinton (2012) appears on the stage and basically reduces the prediction error by nearly 50%:
SLIDE 109–119 [no text content]
SLIDE 120 Vision (Deep CNN) → Language (Generating RNN): “A group of people shopping at an outdoor market. There are many vegetables at the fruit stand.”
SLIDE 121 “A woman is throwing a frisbee in a park.” “A little girl sitting on a bed with a teddy bear.” “A group of people sitting on a boat in the water.” “A giraffe standing in a forest with trees in the background.” “A dog is standing on a hardwood floor.” “A stop sign is on a road with a mountain in the background.”
SLIDE 122 Problem of finding a sharp image from a blurry photo: Blind Image Deconvolution (modified from Michael Hirsch)
SLIDE 123–124 [images] from Michael Hirsch
SLIDE 125–129 [no text content]
SLIDE 130–133 Sequence of Blurry Photos (Image Burst) (from Michael Hirsch)
SLIDE 134 Result of Proposed Image Burst Deblurring Method (from Michael Hirsch)
SLIDE 135–136 EnhanceNet: Photo-realistic Super-resolution (from Michael Hirsch)
SLIDE 137–138 [images] from Michael Hirsch
SLIDE 139–140 Autonomous cars
SLIDE 141 Fundamentals of Neural Networks
Interest in shallow, two-layer artificial neural networks (ANNs)—so-called perceptrons—began in the late 1950s and early 1960s (Frank Rosenblatt), based on Warren McCulloch and Walter Pitts’s as well as Donald Hebb’s ideas of computation by neurons from the 1940s.
SLIDE 142 https://kimschmidtsbrain.files.wordpress.com/2015/10/perceptron.jpg
SLIDE 143 http://cambridgemedicine.org/sites/default/files/styles/large/public/field/image/DonaldOldingHebb.jpg?itok=py9Uh4D5
SLIDE 144–147 Fundamentals of Neural Networks (cont’d)
A second wave of ANN research and interest in psychology—often termed connectionism—followed the publication of the parallel distributed processing (PDP) books by David Rumelhart and James McClelland (1986), using the backpropagation algorithm as a learning rule for multi-layer networks.
A three-layer network with (potentially infinitely many) hidden units in the intermediate layer is a universal function approximator (Kurt Hornik, 1991).
Non-convex optimization problems during backpropagation training, and a lack of data and computing power, limited the usefulness of ANNs: a universal function approximator in theory, but in practice three-layer ANNs could often not successfully solve complex problems.
SLIDE 148–149 Fundamentals of Neural Networks (cont’d)
Breakthrough again with so-called deep neural networks or DNNs, widely known since the 2012 NIPS paper by Alex Krizhevsky, Ilya Sutskever & Geoffrey Hinton.
https://www.wired.com/wp-content/uploads/blogs/wiredenterprise/wp-content/uploads/2013/03/hinton1.jpg
SLIDE 150 Fundamentals of Neural Networks (cont’d) Breakthrough again with so-called deep neural networks or DNNs, widely known since the 2012 NIPS-paper by Alex Krizhevsky et al. DNN: loose terminology to refer to networks with at least two hidden or intermediate layers, typically at least five to ten (or up to dozens):
slide-151
SLIDE 151 Fundamentals of Neural Networks (cont’d) Breakthrough again with so-called deep neural networks or DNNs, widely known since the 2012 NIPS-paper by Alex Krizhevsky et al. DNN: loose terminology to refer to networks with at least two hidden or intermediate layers, typically at least five to ten (or up to dozens):
  • 1. Massive increase in labelled training data (“the internet”),

  • 2. computing power (GPUs),

  • 3. simple non-linearity (ReLU) instead of sigmoid,

  • 4. convolutional rather than fully connected layers,

and

  • 5. weight sharing across deep layers 

appear to be the critical ingredients for the current success of DNNs, and makes them the current method of choice in ML, particular in application.
slide-152
SLIDE 152 Fundamentals of Neural Networks (cont’d) Breakthrough again with so-called deep neural networks or DNNs, widely known since the 2012 NIPS-paper by Alex Krizhevsky et al. DNN: loose terminology to refer to networks with at least two hidden or intermediate layers, typically at least five to ten (or up to dozens):
  • 1. Massive increase in labelled training data (“the internet”),

  • 2. computing power (GPUs),

  • 3. simple non-linearity (ReLU) instead of sigmoid,

  • 4. convolutional rather than fully connected layers,

and

  • 5. weight sharing across deep layers 

appear to be the critical ingredients for the current success of DNNs, and makes them the current method of choice in ML, particular in application. At least superficially DNNs appear to be similar to the human object recognition system: convolutions (“filters”, “receptive fields”) followed by non-linearities and pooling is thought to be the canonical computation of cortex, at least within sensory areas.
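A bare-bones sketch of that “canonical computation”: convolution with a small filter, a ReLU non-linearity, then max-pooling (NumPy only; the image and the edge filter are invented examples of an input and a “receptive field”):

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2D convolution (really cross-correlation, as in most DNN libraries)."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

def relu(x):
    return np.maximum(x, 0.0)  # the simple non-linearity

def max_pool(x, size=2):
    h, w = x.shape[0] // size, x.shape[1] // size
    return x[:h*size, :w*size].reshape(h, size, w, size).max(axis=(1, 3))

rng = np.random.default_rng(0)
image = rng.normal(size=(8, 8))
edge_filter = np.array([[1., 0., -1.]] * 3)  # a vertical-edge "receptive field"
print(max_pool(relu(conv2d(image, edge_filter))).shape)  # (3, 3) feature map
```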
SLIDE 153 Fundamentals of Neural Networks [figure a: activation functions (linear, threshold, sigmoid, rectified linear); figure b: a model neuron computing $z = b + \sum_i x_i w_i$] Kriegeskorte (2015)
SLIDE 154 Fundamentals of Neural Networks [figure: with a non-linearity $f$, $y_2 = f(f(x W_1)\, W_2)$; without it the two layers collapse into a single linear map, $y_2 = x W_1 W_2 = x W'$] Kriegeskorte (2015)
SLIDE 155 Example: VGG-16
VGG16 by Simonyan & Zisserman (2014); 92.7% top-5 test accuracy on ImageNet. https://www.cs.toronto.edu/~frossard/post/vgg16/#architecture
SLIDE 156 http://scs.ryerson.ca/~aharley/vis/conv/flat.html
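For readers who want to poke at VGG-16 themselves, a minimal sketch using torchvision (assumes PyTorch and torchvision ≥ 0.13, whose `VGG16_Weights` API is used here; the random tensor stands in for a real 224×224 RGB image):

```python
import torch
from torchvision.models import vgg16, VGG16_Weights

weights = VGG16_Weights.IMAGENET1K_V1  # ImageNet-trained parameters
model = vgg16(weights=weights).eval()

x = torch.randn(1, 3, 224, 224)        # dummy 224x224 RGB image
with torch.no_grad():
    logits = model(x)                  # scores for the 1000 ImageNet classes
print(logits.shape)                    # torch.Size([1, 1000])
print(sum(p.numel() for p in model.parameters()))  # ~138 million weights
```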
SLIDE 157 Deep Neural Networks (DNNs) [figure: a small network with two input units, two sigmoid hidden units and one sigmoid output unit; backpropagation applies the chain rule layer by layer, e.g. $\frac{\partial z}{\partial x} = \frac{\partial z}{\partial y}\,\frac{\partial y}{\partial x}$]
SLIDE 158

slide-159
SLIDE 159 Adversarial attacks? Szegedy et al. (2014)
slide-160
SLIDE 160 Adversarial examples? (cont’d) Reese
 Witherspoon Sharif et al. (2016)
slide-161
SLIDE 161 Adversarial examples? (cont’d) Reese
 Witherspoon Sharif et al. (2016)
slide-162
SLIDE 162 Adversarial examples? (cont’d) Reese
 Witherspoon Russel
 Crowe Sharif et al. (2016)
slide-163
SLIDE 163 Adversarial examples? (cont’d) Reese
 Witherspoon Russel
 Crowe Sharif et al. (2016)
slide-164
SLIDE 164 Adversarial examples? (cont’d) Sharif et al. (2016)
SLIDE 165–166 DARPA Challenge 2015
SLIDE 167–168 Boston Dynamics 2017
SLIDE 169 Human versus artificial intelligence
We learn unsupervised or semi-supervised, sometimes by reinforcement, and very rarely supervised (school, university) – all successful AI is currently supervised only, i.e. it works only when the correct answer is known!
We can do lots of things using the same network (or a set of closely coupled networks) – all DNNs are typically good at only one or a few tasks.
SLIDE 170

SLIDE 171 Societal challenges
Working conditions and the labour market: the use of technology makes work “simpler” – typically the need for an apprenticeship or vocational training disappears. The consequence is falling wages … after all, “anyone” can do the job.
SLIDE 172–176 Unemployment?
Autonomous vehicles – possibly soon after such vehicles are permitted on the road, an obligation to drive only them.
540,000 professional drivers in Germany (as of 2013), 250,000 taxi licences (as of 2017), 25,000 train drivers (as of 2017): 815,000 jobs at risk (unemployment rate from 5.8% to 8.1%).
Robots in the postal service? Waste management? Logistics? Deutsche Post DHL has 211,000 employees in Germany (as of 2016); in 2014 about 155,000 people worked in supply and waste disposal, and officially almost 760,000 worked as cleaners; Amazon alone employs 23,000 people in logistics centres in Germany: 1,150,000 jobs!
Humanoid robots in nursing care? In 2014, over 900,000 people worked in elderly and medical care in Germany … .
SLIDE 177 Societal challenges (cont’d)
Politics and society: do we all still live in the same reality? Personalised information in social media and the loss of sources that inform broadly and controversially – widespread consumption of propaganda.
SLIDE 178 Propaganda
“Propaganda is the attempt to deliberately influence people’s thinking, acting and feeling. Whoever engages in propaganda always pursues a particular interest. … It is characteristic of propaganda that it does not present the different sides of a topic and that it mixes opinion and information. Whoever engages in propaganda does not want to discuss and convince with arguments, but to influence people’s emotions and behaviour by every available trick, for example by frightening them, making them angry or making promises to them. Propaganda takes thinking off people’s hands and instead gives them the feeling that the adopted opinion is the right one.” Source: Bundeszentrale für politische Bildung, www.bpb.de
SLIDE 179 Societal challenges (cont’d)
Privacy? Changes in (interpersonal) communication?
SLIDE 180 Weapons of Mass Destruction (WMDs) https://www.wired.com/images_blogs/dangerroom/2011/03/powell_un_anthrax.jpg
SLIDE 181
SLIDE 182 Societal challenges (cont’d)
Naïve belief in the objectivity of algorithms … and in rankings, the measuring and quantification of life: China, for example, plans to introduce the Social Credit System.
SLIDE 183 https://de.wikipedia.org/wiki/Nick_Bostrom
SLIDE 184 Societal challenges (cont’d)
Doomsday scenarios: is the singularity coming? If so: Garden of Eden or hell?
SLIDE 185 Doomsday videos to watch
Google’s Geoffrey Hinton, “There’s no reason to think computers won’t get much smarter than us” (10 mins): https://www.youtube.com/watch?v=p6lM3bh-npg
Demis Hassabis, CEO, DeepMind Technologies, The Theory of Everything (16 mins): https://www.youtube.com/watch?v=rbsqaJwpu6A
Nick Bostrom, What happens when our computers get smarter than we are? (17 mins): https://www.ted.com/talks/nick_bostrom_what_happens_when_our_computers_get_smarter_than_we_are
Why Elon Musk is worried about artificial intelligence (3 mins): https://www.youtube.com/watch?v=US95slMMQis
SLIDE 186 Thanks
Felix Wichmann
Neural Information Processing Group and Bernstein Center for Computational Neuroscience, Eberhard Karls Universität Tübingen
Max Planck Institute for Intelligent Systems, Tübingen