CS6220: DATA MINING TECHNIQUES
Image Data: Classification via Neural Networks
Instructor: Yizhou Sun (yzsun@ccs.neu.edu)
November 19, 2015

Methods to Learn

| Task | Matrix Data | Text Data | Set Data | Sequence Data | Time Series | Graph & Network | Images |
|------|-------------|-----------|----------|---------------|-------------|-----------------|--------|
| Classification | Decision Tree; Naïve Bayes; Logistic Regression; SVM; kNN | | | HMM | | Label Propagation* | Neural Network |
| Clustering | K-means; hierarchical clustering; DBSCAN; Mixture Models; kernel k-means* | PLSA | | | | SCAN*; Spectral Clustering* | |
| Frequent Pattern Mining | | | Apriori; FP-growth | GSP; PrefixSpan | | | |
| Prediction | Linear Regression | | | | Autoregression | | |
| Similarity Search | | | | | DTW | P-PageRank | |
| Ranking | | | | | | PageRank | |
Mining Image Data
- Image Data
- Neural Networks as a Classifier
- Summary
Images
- Images can be found everywhere
- Social Networks, e.g. Instagram, Facebook, etc.
- World Wide Web
- All kinds of cameras
Image Representation

- An image is represented as a matrix of pixel values

Applications: Face Recognition

- Recognize human faces in images
- Can also recognize emotions!
- Try it yourself @ https://www.projectoxford.ai/demo/emotion
Applications: Handwritten Digit Recognition

- What are the numbers?
Mining Image Data
- Image Data
- Neural Networks as a Classifier
- Summary
Artificial Neural Networks

- Consider humans:
  - Neuron switching time: ~0.001 second
  - Number of neurons: ~10^10
  - Connections per neuron: ~10^4 to 10^5
  - Scene recognition time: ~0.1 second
  - 100 inference steps doesn't seem like enough -> parallel computation
- Artificial neural networks
  - Many neuron-like threshold switching units
  - Many weighted interconnections among units
  - Highly parallel, distributed processing
  - Emphasis on tuning weights automatically
Single Unit: Perceptron

- An n-dimensional input vector x is mapped to a variable y by means of the scalar product and a nonlinear function mapping
- The unit computes the weighted sum of the inputs x_0, x_1, ..., x_n with weight vector w = (w_0, w_1, ..., w_n), adds a bias b, and passes the result through an activation function f
- For example: y = sign(Σ_i w_i x_i + b)
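To make the unit's computation concrete, here is a minimal Python sketch of the perceptron's forward pass; the weight, bias, and input values are made up for illustration.

```python
import numpy as np

def perceptron_output(x, w, b):
    """Compute y = sign(w . x + b) for a single perceptron unit."""
    return np.sign(np.dot(w, x) + b)

# Illustrative weights, bias, and input (not from the slides)
w = np.array([0.5, -0.3, 0.8])   # weight vector
b = 0.1                          # bias
x = np.array([1.0, 2.0, -1.0])   # n-dimensional input vector

print(perceptron_output(x, w, b))  # sign(0.5 - 0.6 - 0.8 + 0.1) = -1.0
```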
Perceptron Training Rule

- For each training data point, update each weight: w_i ← w_i + η(t − o)x_i
- t: target value (true value)
- o: output value
- η: learning rate (small constant)
- Derived using the gradient descent method by minimizing the squared error E = (1/2)(t − o)^2
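A minimal sketch of this rule on a toy dataset, assuming a sign activation; the data, learning rate, and epoch count are made up for illustration.

```python
import numpy as np

def train_perceptron(X, t, eta=0.1, epochs=20):
    """Repeatedly apply w_i <- w_i + eta*(t - o)*x_i over the training points."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for x_i, t_i in zip(X, t):
            o = np.sign(np.dot(w, x_i) + b)   # output value o
            w += eta * (t_i - o) * x_i        # weight update
            b += eta * (t_i - o)              # bias update
    return w, b

# Toy linearly separable data: label = sign(x1 + x2)
X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0], [-2.0, -1.0]])
t = np.array([1.0, 1.0, -1.0, -1.0])
w, b = train_perceptron(X, t)
```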
A Multi-Layer Feed-Forward Neural Network

- Input vector x enters the input layer, passes through a hidden layer, and the output layer emits the output vector (a two-layer network)
- Hidden layer: h = g(W^(1) x + b^(1))
- Output layer: y = g(W^(2) h + b^(2))
- g: nonlinear transformation, e.g. the sigmoid; W^(1), W^(2): weight matrices; b^(1), b^(2): bias terms
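A minimal NumPy sketch of this two-layer forward computation, using the sigmoid as g; the layer sizes and random initialization are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Assumed sizes for illustration: 3 inputs, 4 hidden units, 2 outputs
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)   # hidden-layer weights and biases
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)   # output-layer weights and biases

x = np.array([0.5, 0.1, 0.9])   # input vector
h = sigmoid(W1 @ x + b1)        # hidden layer: h = g(W1 x + b1)
y = sigmoid(W2 @ h + b2)        # output layer: y = g(W2 h + b2)
print(y)
```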
Sigmoid Unit

- σ(x) = 1 / (1 + e^(−x)) is a sigmoid function
- Property: σ′(x) = σ(x)(1 − σ(x))
- Will be used in learning
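The derivative property is easy to check numerically; a quick sketch:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = 0.7
analytic = sigmoid(x) * (1 - sigmoid(x))                   # sigma(x)(1 - sigma(x))
numeric = (sigmoid(x + 1e-6) - sigmoid(x - 1e-6)) / 2e-6   # central difference
print(abs(analytic - numeric))  # ~1e-12: the property holds
```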
How a Multi-Layer Neural Network Works

- The inputs to the network correspond to the attributes measured for each training tuple
- Inputs are fed simultaneously into the units making up the input layer
- They are then weighted and fed simultaneously to a hidden layer
- The number of hidden layers is arbitrary, although usually only one
- The weighted outputs of the last hidden layer are input to the units making up the output layer, which emits the network's prediction
- The network is feed-forward: none of the weights cycles back to an input unit or to an output unit of a previous layer
- From a mathematical point of view, networks perform nonlinear regression: given enough hidden units and enough training samples, they can closely approximate any continuous function
Defining a Network Topology

- Decide the network topology: specify the # of units in the input layer, the # of hidden layers (if > 1), the # of units in each hidden layer, and the # of units in the output layer
- Normalize the input values for each attribute measured in the training tuples to [0.0, 1.0] (a minimal sketch follows this list)
- For classification with more than two classes, one output unit per class is used
- Once a network has been trained, if its accuracy is unacceptable, repeat the training process with a different network topology or a different set of initial weights
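A minimal sketch of the min-max normalization step, assuming the attributes are stored column-wise in a NumPy array:

```python
import numpy as np

def minmax_normalize(X):
    """Scale each attribute (column) of X to the range [0.0, 1.0]."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    return (X - lo) / (hi - lo)   # assumes hi > lo for every column

# Illustrative training tuples with two attributes on different scales
X = np.array([[10.0, 200.0],
              [20.0, 400.0],
              [15.0, 300.0]])
print(minmax_normalize(X))  # every column now spans [0.0, 1.0]
```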
Learning by Backpropagation

- Backpropagation: a neural network learning algorithm
- Started by psychologists and neurobiologists to develop and test computational analogues of neurons
- During the learning phase, the network learns by adjusting the weights so as to be able to predict the correct class label of the input tuples
- Also referred to as connectionist learning due to the connections between units
Backpropagation

- Iteratively process a set of training tuples and compare the network's prediction with the actual known target value
- For each training tuple, the weights are modified to minimize the mean squared error between the network's prediction and the actual target value
- Modifications are made in the "backwards" direction: from the output layer, through each hidden layer, down to the first hidden layer, hence "backpropagation"
Backpropagation: Steps to Learn Weights

- Initialize weights to small random numbers, associated with biases
- Repeat until the terminating condition is met
  - For each training example
    - Propagate the inputs forward (by applying the activation function); for a hidden or output layer unit k:
      - Calculate the net input: I_k = Σ_j w_jk O_j + θ_k
      - Calculate the output of unit k: O_k = 1 / (1 + e^(−I_k))
    - Backpropagate the error (by updating weights and biases):
      - For unit k in the output layer: Err_k = O_k(1 − O_k)(T_k − O_k)
      - For unit k in a hidden layer: Err_k = O_k(1 − O_k) Σ_l Err_l w_kl
      - Update weights: w_jk = w_jk + η Err_k O_j
      - Update biases: θ_k = θ_k + η Err_k
- Terminating condition (when the error is very small, etc.)
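A minimal NumPy sketch of one backpropagation pass over a single training example for a one-hidden-layer network, implementing the formulas above; the layer sizes, learning rate, and initial values are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(x, target, W1, th1, W2, th2, eta=0.5):
    """One forward + backward pass; W[j, k] holds the weight w_jk from unit j to unit k."""
    # Propagate the inputs forward: I_k = sum_j w_jk O_j + theta_k, O_k = sigmoid(I_k)
    O_h = sigmoid(x @ W1 + th1)      # hidden-layer outputs
    O_o = sigmoid(O_h @ W2 + th2)    # output-layer outputs

    # Backpropagate the error
    err_o = O_o * (1 - O_o) * (target - O_o)   # output units: O_k(1-O_k)(T_k-O_k)
    err_h = O_h * (1 - O_h) * (W2 @ err_o)     # hidden units: O_k(1-O_k) * sum_l Err_l w_kl

    # Update weights (w_jk += eta*Err_k*O_j) and biases (theta_k += eta*Err_k)
    W2 += eta * np.outer(O_h, err_o); th2 += eta * err_o
    W1 += eta * np.outer(x, err_h);   th1 += eta * err_h
    return W1, th1, W2, th2

# Illustrative 3-input, 2-hidden, 1-output network
rng = np.random.default_rng(0)
W1, th1 = rng.uniform(-0.5, 0.5, (3, 2)), np.zeros(2)
W2, th2 = rng.uniform(-0.5, 0.5, (2, 1)), np.zeros(1)
W1, th1, W2, th2 = backprop_step(np.array([1.0, 0.0, 1.0]), np.array([1.0]),
                                 W1, th1, W2, th2)
```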
Example

- A multilayer feed-forward neural network with initial input, weight, and bias values (figure omitted)
- Input forward: compute the net input and output of each hidden and output unit (worked values omitted)
- Error backpropagation and weight update: compute Err for the output and hidden units, then update the weights and biases (worked values omitted)
Efficiency and Interpretability

- Efficiency of backpropagation: each iteration through the training set takes O(|D| × w) time, with |D| tuples and w weights, but the number of iterations can be exponential in n, the number of inputs, in the worst case
- For easier comprehension: rule extraction by network pruning
  - Simplify the network structure by removing the weighted links that have the least effect on the trained network
  - Then perform link, unit, or activation value clustering
  - The sets of input and activation values are studied to derive rules describing the relationship between the input and hidden unit layers
- Sensitivity analysis: assess the impact that a given input variable has on a network output; the knowledge gained from this analysis can be represented in rules
  - E.g., "If x decreases 5%, then y increases 8%"
Neural Network as a Classifier

- Weaknesses
  - Long training time
  - Requires a number of parameters typically best determined empirically, e.g., the network topology or "structure"
  - Poor interpretability: difficult to interpret the symbolic meaning behind the learned weights and of "hidden units" in the network
- Strengths
  - High tolerance to noisy data
  - Well suited for continuous-valued inputs and outputs
  - Successful on an array of real-world data, e.g., hand-written letters
  - Algorithms are inherently parallel
  - Techniques have recently been developed for the extraction of rules from trained neural networks
Digit Recognition Example

- Obtain a sequence of digits by segmentation
- Recognition (our focus)
- The architecture of the neural network used (figure omitted)
- What is each neuron doing?
- Pipeline: input image -> activated neurons detecting image parts -> predicted number (figure omitted)
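As a rough end-to-end sketch (not the network from the slides), a small feed-forward classifier can be trained on a standard digits dataset with scikit-learn; the hidden-layer size and iteration count here are arbitrary illustrative choices.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

digits = load_digits()   # 8x8 grayscale digit images, flattened to 64 input attributes
X_train, X_test, y_train, y_test = train_test_split(
    digits.data / 16.0,  # normalize pixel values to [0.0, 1.0]
    digits.target, random_state=0)

# One hidden layer; one output unit per class (digits 0-9)
clf = MLPClassifier(hidden_layer_sizes=(30,), max_iter=500, random_state=0)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```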
Towards Deep Learning
Mining Image Data
- Image Data
- Neural Networks as a Classifier
- Summary
Summary
- Image data representation
- Image classification via neural networks
- The structure of neural networks
- Learning by backpropagation