Feature extraction from deep models
Olgert Denas
Synopsis
Intro to deep models
- Neurons & Nets
- Learning & Depth
Feature extraction
- Theory
- 1 Layer
- Nets
Applications
- dimer
- G1E model
Neural computation
Inspired by organic neural systems
A system of simple computing units with learnable parameters
Intended also for conventional computing: efficient arithmetic and calculus
But von Neumann’s architecture “won”
Neural computation
Mainly used in machine learning
Declarative: can be stated unambiguously
- sort an array of integers
Procedural: can only be stated by examples
- find fraud in network logs
Artificial Neural Nets
Neurons
The artificial neuron is very different from the biological one; after all, it is a model
Neurons
Natural (organic):
- complicated transfer function
- mixed continuous/impulse communication
- state: chemical and physical changes
- synaptic delays, long axons
Artificial:
- parametric transfer function
- discrete or continuous
- no state: output is f(x; θ), fixed connections
- computational delays
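As a minimal sketch of the artificial side of this comparison: a stateless parametric unit computing f(x; θ) with θ = (w, b). The sigmoid nonlinearity and the toy numbers are illustrative assumptions; the slides do not fix a particular transfer function.

```python
import numpy as np

def neuron(x, w, b):
    """Artificial neuron: a parametric, stateless function f(x; theta).

    theta = (w, b); here f is a weighted sum passed through a sigmoid
    (the sigmoid is an illustrative choice, not fixed by the slides).
    """
    z = np.dot(w, x) + b             # weighted sum of the inputs
    return 1.0 / (1.0 + np.exp(-z))  # squashing nonlinearity

# Example: a 3-input neuron with made-up weights
x = np.array([1.0, 0.0, 1.0])
w = np.array([0.5, -0.3, 0.8])
print(neuron(x, w, b=-0.2))
```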
Nets of neurons
Computers and brains
Brain vs. computer:
- Speed: ms / operation vs. ns / operation
- Size: tera nodes, peta connections vs. giga nodes
- Memory: content addressable, in connections vs. contiguous, random access
- Computing: distributed / fault tolerant vs. centralized / not fault tolerant
- Power: ~10 W vs. ~300 W (GPU)
Organic vs. artificial computer
ANN architectures
Feed-forward NNs (and CNNs)
Recurrent NNs
RBMs
Feed Forward
Directed Acyclic Graph
Input (first), hidden, and output (last) layers
Connections from one layer to the next
Transfer functions are nonlinearities
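A minimal numpy sketch of this forward pass; the tanh nonlinearity and the layer sizes are illustrative assumptions, not specified by the slides.

```python
import numpy as np

def forward(x, layers):
    """Feed-forward pass: each layer applies an affine map, then a nonlinearity."""
    a = x
    for W, b in layers:
        a = np.tanh(a @ W + b)   # tanh chosen as an example nonlinearity
    return a

rng = np.random.default_rng(0)
sizes = [4, 8, 8, 2]             # input, two hidden layers, output (illustrative)
layers = [(rng.normal(size=(m, n)) * 0.1, np.zeros(n))
          for m, n in zip(sizes[:-1], sizes[1:])]

x = rng.normal(size=4)
print(forward(x, layers))        # activation of the output layer
```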
Recurrent
Directed graph with cycles
Possibly with hidden layers
More complicated, realistic, and powerful
Well-suited to sequential input
Unroll the hidden state, just like DBNs
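A minimal sketch of unrolling the hidden state over a sequence; the tanh transition and the dimensions are illustrative assumptions.

```python
import numpy as np

def rnn_unroll(xs, W_xh, W_hh, b_h):
    """Unroll a recurrent net over a sequence: the hidden state carries context."""
    h = np.zeros(W_hh.shape[0])
    hidden_states = []
    for x in xs:                              # one step per sequence element
        h = np.tanh(x @ W_xh + h @ W_hh + b_h)
        hidden_states.append(h)
    return hidden_states

rng = np.random.default_rng(1)
xs = rng.normal(size=(5, 3))                  # a length-5 sequence of 3-dim inputs
W_xh = rng.normal(size=(3, 4)) * 0.1
W_hh = rng.normal(size=(4, 4)) * 0.1
print(rnn_unroll(xs, W_xh, W_hh, np.zeros(4))[-1])   # final hidden state
```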
Restricted Boltzmann Machines
Probabilistic model (energy function)
A bipartite graph (visible <-> hidden)
Efficient inference
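A minimal sketch of the energy function and of why the bipartite structure makes inference efficient (hidden units are conditionally independent given the visible layer); binary units and the sizes are illustrative assumptions.

```python
import numpy as np

def rbm_energy(v, h, W, b_v, b_h):
    """Energy of a joint (visible, hidden) configuration:
    E(v, h) = -v.W.h - b_v.v - b_h.h;  P(v, h) is proportional to exp(-E)."""
    return -(v @ W @ h) - (b_v @ v) - (b_h @ h)

def p_hidden_given_visible(v, W, b_h):
    """Bipartite graph => hidden units are conditionally independent given v,
    so P(h_j = 1 | v) is a simple sigmoid of the visible activations."""
    return 1.0 / (1.0 + np.exp(-(v @ W + b_h)))

rng = np.random.default_rng(2)
v = rng.integers(0, 2, size=6).astype(float)   # 6 binary visible units
W = rng.normal(size=(6, 3)) * 0.1              # 3 hidden units
print(p_hidden_given_visible(v, W, np.zeros(3)))
```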
ANN: Learning
Learning: perceptron
Loop through labeled examples
- on incorrect output:
* case 0: w <- w + x
* case 1: w <- w - x
Guaranteed separating hyperplane
[Diagram: input units X1, X2 with weights W1, W2 feeding a single output unit]
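A minimal sketch of the update rule above, assuming case 0 means the output was 0 while the label was 1 and case 1 the reverse, and treating the bias as an extra always-on input.

```python
import numpy as np

def perceptron(X, y, epochs=20):
    """Perceptron rule from the slides: loop over labeled examples and, on an
    incorrect output, add x (predicted 0, wanted 1) or subtract x (predicted 1,
    wanted 0).  The bias is handled as an extra input fixed to 1."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        for x, target in zip(Xb, y):
            pred = 1 if x @ w > 0 else 0
            if pred == 0 and target == 1:
                w = w + x
            elif pred == 1 and target == 0:
                w = w - x
    return w

# A linearly separable toy problem (OR): the perceptron finds a separating hyperplane.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 1])
w = perceptron(X, y)
print(w, [(1 if np.append(x, 1) @ w > 0 else 0) for x in X])
```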
Learning: perceptron
Parity, or counting problem:
recognize binary strings of length 2 with exactly one 1
- red class: 01, 10
- green class: 00, 11
Many other problems
(Minsky & Papert 1969)
Learning: features
[Diagram: input units, one hidden unit, and an output unit with 0.5 thresholds; the slides step through the inputs 00, 11, 01, and 10]
00: no unit is activated, so the output is 0
11: the hidden unit cancels the inputs
01, 10: the inputs connect directly to the output
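A minimal sketch of such a network with hand-set weights on the length-2 parity problem; the particular weights and 0.5 thresholds are illustrative choices consistent with the slides, not taken from them.

```python
def step(z):
    """Threshold unit: fires (1) when its net input exceeds 0."""
    return 1 if z > 0 else 0

def parity_net(x1, x2):
    """One hidden unit solves the parity problem a single perceptron cannot:
    it fires only on input 11 and cancels the direct input-to-output connections."""
    h = step(x1 + x2 - 1.5)                  # hidden unit: an AND detector
    return step(x1 + x2 - 2.0 * h - 0.5)     # direct connections, cancelled by h

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print((x1, x2), "->", parity_net(x1, x2))   # 00 -> 0, 01 -> 1, 10 -> 1, 11 -> 0
```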
Learning: perceptron
The perceptron guarantees a separating hyperplane (SH) if one exists
Learning from raw input features requires a lot of “(big) data science”
Have the NN do the “(big) data science”!
Deep supervised learning paradigm
Map “raw” input into intermediate hidden layers
Deep means more layers: more efficient, but harder to train
Classify the hidden representation of the data
Learn the weights for both steps using backprop or pre-training
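A minimal numpy sketch of both steps trained jointly with backprop for a single hidden layer; the tanh hidden layer, softmax output, sizes, and learning rate are illustrative assumptions.

```python
import numpy as np

def softmax(Z):
    E = np.exp(Z - Z.max(axis=1, keepdims=True))
    return E / E.sum(axis=1, keepdims=True)

def backprop_step(X, Y, W1, b1, W2, b2, lr=0.1):
    """One gradient step on both stages: raw input -> hidden representation -> class."""
    H = np.tanh(X @ W1 + b1)          # step 1: map raw input to a hidden representation
    P = softmax(H @ W2 + b2)          # step 2: classify the hidden representation
    G = (P - Y) / len(X)              # gradient of the mean cross-entropy wrt the logits
    dW2, db2 = H.T @ G, G.sum(axis=0)
    dH = (G @ W2.T) * (1.0 - H ** 2)  # backpropagate through the tanh layer
    dW1, db1 = X.T @ dH, dH.sum(axis=0)
    return W1 - lr * dW1, b1 - lr * db1, W2 - lr * dW2, b2 - lr * db2

# Toy usage: 10-dim "raw" inputs, 16 hidden units, 2 classes (all illustrative)
rng = np.random.default_rng(0)
X = rng.normal(size=(32, 10))
Y = np.eye(2)[rng.integers(0, 2, size=32)]        # one-hot labels
W1, b1 = rng.normal(size=(10, 16)) * 0.1, np.zeros(16)
W2, b2 = rng.normal(size=(16, 2)) * 0.1, np.zeros(2)
for _ in range(100):
    W1, b1, W2, b2 = backprop_step(X, Y, W1, b1, W2, b2)
```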
Feature extraction
Trained NNs can be used to predict, but they are black boxes
It is hard to relate large weights to input features
How do we map features from hidden layers back to the input space?
Learning W, b
Batch SGD
Early stopping, regularization, and a lot of tricks
Maximize the average of P(Y|X; θ) over the training data
I.e., find a θ with low cross-entropy (CE)
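A minimal sketch of batch SGD for the one-layer model of the next slide, minimizing the cross-entropy of P(Y|X; θ) on the training data with L2 regularization as one of the "tricks"; hyperparameters and the synthetic data are illustrative assumptions.

```python
import numpy as np

def softmax(Z):
    E = np.exp(Z - Z.max(axis=1, keepdims=True))
    return E / E.sum(axis=1, keepdims=True)

def cross_entropy(W, b, X, y):
    """Mean negative log P(y | x; theta) for the one-layer model softmax(x W + b)."""
    P = softmax(X @ W + b)
    return -np.mean(np.log(P[np.arange(len(y)), y] + 1e-12))

def sgd_epoch(W, b, X, y, rng, lr=0.1, batch=32, l2=1e-3):
    """One epoch of minibatch SGD with weight decay (L2 regularization)."""
    idx = rng.permutation(len(X))
    for s in range(0, len(X), batch):
        i = idx[s:s + batch]
        P = softmax(X[i] @ W + b)
        G = P
        G[np.arange(len(i)), y[i]] -= 1.0          # gradient of CE wrt the logits
        G /= len(i)
        W = W - lr * (X[i].T @ G + l2 * W)
        b = b - lr * G.sum(axis=0)
    return W, b

# Toy usage on synthetic data (sizes and labels are made up for illustration)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
W, b = np.zeros((5, 2)), np.zeros(2)
for _ in range(20):        # early stopping on a validation set omitted for brevity
    W, b = sgd_epoch(W, b, X, y, rng)
print(cross_entropy(W, b, X, y))
```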
Feature extraction: 1 layer
P(Y | X; θ) = f(W Xᵀ + b)
Feature extraction: 1 layer
Given a trained model and a label, find an input:
* with that label
* that minimizes the gray area
[Plot: P(Y | E[X0]) with θ = {W, b}, c0 = fθ(E[X0]), and the class mean E[X0] marked]
Feature extraction: 1 layer
l: label
Xl: input features
E[Xl]: average input for that label
fθ(E[X]): decision boundary
cl = fθ(E[Xl]): constraint boundary
ε: slack (see below)
This is an LP!
Feature extraction on a stack
Feature extraction: ε
The slack variable ε controls the cross-entropy (CE) achieved by the extracted features
Useful if the average input achieves 0.01 CE but you are happy with 0.2
Linear programming (in 1 page)
Optimization problems that:
- minimize a linear cost function
- satisfy linear constraints
Very efficient for continuous variables (simplex)
Feature extraction: implementation
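The slides do not spell out the exact LP, so the following is a hedged sketch under stated assumptions: the constraint is placed on the linear score w_l·x + b_l (equivalent to constraining fθ when the one-layer f is monotone in that score), c_l is the score of the class mean E[Xl], ε is the slack, and the objective minimizes the L1 norm of x so that only features the model relies on stay away from zero. The function name extract_features and the toy model are hypothetical, for illustration only.

```python
import numpy as np
from scipy.optimize import linprog

def extract_features(W, b, label, x_mean, eps):
    """Hedged sketch of a 1-layer feature-extraction LP (assumptions in the text above)."""
    w_l = W[label]
    c_l = w_l @ x_mean + b[label]                 # constraint boundary from E[Xl]
    n = len(x_mean)
    # Split x into nonnegative parts, x = u - v, so the L1 objective is linear:
    # minimize sum(u + v) subject to w_l . (u - v) >= c_l - eps.
    cost = np.ones(2 * n)
    A_ub = np.concatenate([-w_l, w_l])[None, :]   # -w_l.(u - v) <= -(c_l - eps)
    b_ub = np.array([-(c_l - eps)])
    res = linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * (2 * n))
    u, v = res.x[:n], res.x[n:]
    return u - v                                  # extracted input-space features

# Toy usage with a made-up "trained" model (W, b) and class mean
rng = np.random.default_rng(5)
W, b = rng.normal(size=(2, 6)), np.zeros(2)
x_mean = rng.normal(size=6)
print(extract_features(W, b, label=0, x_mean=x_mean, eps=0.1))
```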
MNIST digits
28×28-pixel binarized handwritten digit images
Pick pairs of digits and extract differentiating features
Effect of ε on |Xl|
Effect of optimization
Features
Feature extraction: applications
Hematopoiesis & erythroid differentiation
Genes Dev. 8(10):1184-97, 1994
Genome Res. 21(10):1659-71, 2011