SLIDE 1

Feature extraction from deep models

Olgert Denas

SLIDE 2

Synopsis

Intro to deep models

  • Neurons & Nets
  • Learning & Depth

Feature extraction

  • Theory
  • 1 Layer
  • Nets

Applications

  • dimer
  • G1E model
SLIDE 3

Neural computation

Inspired by organic neural systems

A system of simple computing units with learnable parameters

Originally intended for conventional computing as well

Efficient at arithmetic and calculus, but von Neumann’s architecture “won”

SLIDE 4

Neural computation

Used mainly in machine learning

Declarative problems can be stated unambiguously:

sort an array of integers

Procedural problems can only be stated by examples:

find fraud in network logs

SLIDE 5

Artificial Neural Nets

SLIDE 6

Neurons

SLIDE 7

Neurons

The artificial neuron is very different from the biological one; after all, it is a model

SLIDE 8

Neurons

Natural (organic):

  • complicated transfer function
  • mixed continuous/impulse communication
  • state: chemical and physical changes
  • synaptic delays, long axons

Artificial:

  • parametric transfer function
  • discrete or continuous communication
  • no state: output is f(x; θ)
  • fixed connections, computational delays

SLIDE 9

Nets of neurons

SLIDE 10

Computers and brains

            Brain                                 Computer
Speed       ms / operation                        ns / operation
Size        tera nodes, peta connections          giga nodes
Memory      content addressable, in connections   contiguous, random access
Computing   distributed / fault tolerant          centralized / not fault tolerant
Power       10 W                                  ~300 W (GPU)

SLIDE 11

Organic vs. artificial computer

SLIDE 12

ANN architectures

  • Feed-forward NNs (and CNNs)
  • Recurrent NNs
  • RBMs

SLIDE 13

Feed Forward

Directed Acyclic Graph

Input (first), hidden, and output (last) layers

Connections go from each layer to the next

Transfer functions are nonlinearities
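
To make the forward pass concrete, here is a minimal NumPy sketch (not from the slides; the layer sizes, tanh hidden units, and softmax output are illustrative assumptions):

    import numpy as np

    def feed_forward(x, layers):
        # Propagate x through (W, b) pairs: tanh nonlinearity on hidden
        # layers, softmax on the output layer.
        h = x
        for W, b in layers[:-1]:
            h = np.tanh(W @ h + b)
        W, b = layers[-1]
        z = W @ h + b
        e = np.exp(z - z.max())              # numerically stable softmax
        return e / e.sum()

    rng = np.random.default_rng(0)
    layers = [(rng.normal(size=(3, 4)), np.zeros(3)),  # 4 inputs -> 3 hidden
              (rng.normal(size=(2, 3)), np.zeros(2))]  # 3 hidden -> 2 outputs
    print(feed_forward(rng.normal(size=4), layers))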

SLIDE 14

Recurrent

Directed graph with cycles

Possibly with hidden layers

More complicated, realistic, and powerful

Well-suited to sequential input

Unroll the hidden state, just like DBNs
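
A minimal sketch of that unrolling (NumPy; the tanh transition and the dimensions are illustrative assumptions):

    import numpy as np

    def rnn_unroll(xs, W_xh, W_hh, b, h0):
        # One transition per sequence element; the same weights are reused
        # at every step, and all hidden states are returned.
        h, states = h0, []
        for x in xs:
            h = np.tanh(W_xh @ x + W_hh @ h + b)
            states.append(h)
        return states

    rng = np.random.default_rng(1)
    hs = rnn_unroll([rng.normal(size=3) for _ in range(5)],  # 5-step sequence
                    rng.normal(size=(4, 3)), rng.normal(size=(4, 4)),
                    np.zeros(4), np.zeros(4))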

SLIDE 15

Restricted Boltzmann Machines

  • Probabilistic model (energy function)
  • A bipartite graph (visible <-> hidden)
  • Efficient inference
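
The efficiency comes from the bipartite structure: given the visible units, the hidden units are conditionally independent, so sampling factorizes over units. A hedged sketch of one Gibbs step for a binary RBM (NumPy; names and shapes are assumptions):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def gibbs_step(v, W, b_vis, b_hid, rng):
        # P(h | v) factorizes over hidden units (bipartite graph), so one
        # matrix product gives all hidden probabilities at once.
        p_h = sigmoid(W @ v + b_hid)
        h = (rng.random(p_h.shape) < p_h).astype(int)
        p_v = sigmoid(W.T @ h + b_vis)       # symmetrically for P(v | h)
        return (rng.random(p_v.shape) < p_v).astype(int)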

SLIDE 16

ANN: Learning

SLIDE 17

Learning: perceptron

Loop through labeled examples

  • on incorrect output:
      - output 0 (should be 1): w <- w + x
      - output 1 (should be 0): w <- w - x

Guaranteed to find a separating hyperplane, if one exists (see the runnable sketch below)

[Figure: input units X1, X2 with weights W1, W2 feeding a single output unit]
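
The update rule as a runnable sketch (NumPy; the bias handling and the linearly separable OR example are illustrative assumptions):

    import numpy as np

    def train_perceptron(X, y, epochs=20):
        # On a wrong 0 (should be 1): w <- w + x; on a wrong 1: w <- w - x.
        w = np.zeros(X.shape[1] + 1)                  # last entry is the bias
        Xb = np.hstack([X, np.ones((len(X), 1))])
        for _ in range(epochs):
            for x, t in zip(Xb, y):
                out = 1 if w @ x > 0 else 0
                if out == 0 and t == 1:
                    w += x
                elif out == 1 and t == 0:
                    w -= x
        return w

    # Linearly separable example: OR of two bits
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
    print(train_perceptron(X, np.array([0, 1, 1, 1])))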

SLIDE 18

Learning: perceptron

Parity, or counting, problem: recognize binary strings of length 2 with exactly one 1

  • red class: 01, 10
  • green class: 00, 11

Many other problems

(Minsky & Papert 1969)

SLIDE 19

Learning: features

[Figure: network with input units, a hidden unit, and an output unit]

SLIDE 20

Learning: features

00: no unit is activated => 0

11: hidden unit cancels the inputs

01, 10: inputs connect directly to the output

[Figure: the net evaluated on input 0 0]

SLIDE 21

Learning: features

[Figure: the net evaluated on input 1 1]

00: no unit is activated => 0

11: hidden unit cancels the inputs

01, 10: inputs connect directly to the output

SLIDE 22

Learning: features

[Figure: the net evaluated on input 0 1, with .5 at the output unit]

00: no unit is activated => 0

11: hidden unit cancels the inputs

01, 10: inputs connect directly to the output
SLIDE 23

Learning: features

[Figure: the net evaluated on input 1 0, with .5 at the output unit]

00: no unit is activated => 0

11: hidden unit cancels the inputs

01, 10: inputs connect directly to the output
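
Putting slides 20-23 together, a hand-built net with one hidden unit that separates the two classes; the exact weights are assumptions chosen to match the described behavior (the hidden unit fires only on 11 and cancels the direct connections):

    def step(z):
        return int(z > 0)

    def parity_net(x1, x2):
        # Hidden unit is active only for input 1 1 ...
        h = step(x1 + x2 - 1.5)
        # ... and its -2 weight cancels the two direct input connections,
        # so only 01 and 10 clear the .5 output threshold.
        return step(x1 + x2 - 2 * h - 0.5)

    for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        print(a, b, "->", parity_net(a, b))   # 00 -> 0, 01/10 -> 1, 11 -> 0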
SLIDE 24

Learning: perceptron

The perceptron guarantees a separating hyperplane (SH) if an SH exists

Learning from raw input features requires a lot of “(big) data science”

Have the NN do the “(big) data science”!

SLIDE 25

Deep supervised learning paradigm

Map “raw” input into intermediate hidden layers

Deep means more layers: more efficient representations, but harder to train

Classify the hidden representation of the data

Learn the weights for both steps using backpropagation or pre-training (a minimal backprop sketch follows)
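
A minimal sketch of one backpropagation step through a single hidden layer (NumPy; the tanh/softmax choices, sizes, and learning rate are assumptions, and pre-training is not shown):

    import numpy as np

    def backprop_step(W1, W2, X, y, lr=0.1):
        # Forward: hidden representation, then class probabilities.
        H = np.tanh(X @ W1.T)
        Z = H @ W2.T
        P = np.exp(Z - Z.max(axis=1, keepdims=True))
        P /= P.sum(axis=1, keepdims=True)
        # Backward: gradient of cross-entropy w.r.t. logits is P - onehot(y).
        P[np.arange(len(y)), y] -= 1.0
        gW2 = P.T @ H / len(y)
        dH = (P @ W2) * (1 - H ** 2)          # chain rule through tanh
        gW1 = dH.T @ X / len(y)
        return W1 - lr * gW1, W2 - lr * gW2

    rng = np.random.default_rng(2)
    W1, W2 = rng.normal(size=(8, 4)), rng.normal(size=(3, 8))
    X, y = rng.normal(size=(32, 4)), rng.integers(0, 3, size=32)
    W1, W2 = backprop_step(W1, W2, X, y)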

SLIDE 26

Feature extraction

SLIDE 27

Feature extraction

Trained NNs can be used to predict, but they are black boxes

It is hard to relate large weights to input features

How do we map features from hidden layers back to the input space?

SLIDE 28

Learning W, b

Batch SGD

Early stopping, regularization, and a lot of tricks

Maximize the average of P(Y|X;θ) over the training data

I.e., find a θ with low cross-entropy

SLIDE 29

Feature extraction: 1 layer

P(Y | X; θ) = f(WXᵀ + b)
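
Combining the last two slides, a hedged sketch of the one-layer model with f = softmax, trained by a batch SGD step that minimizes the average cross-entropy (sizes and learning rate are assumptions):

    import numpy as np

    def softmax(Z):
        E = np.exp(Z - Z.max(axis=1, keepdims=True))
        return E / E.sum(axis=1, keepdims=True)

    def sgd_step(W, b, X, y, lr=0.1):
        # P(Y | X; θ) = softmax(X Wᵀ + b); the cross-entropy gradient
        # w.r.t. the logits is P - onehot(y).
        P = softmax(X @ W.T + b)
        P[np.arange(len(y)), y] -= 1.0
        gW = P.T @ X / len(y)
        gb = P.mean(axis=0)
        return W - lr * gW, b - lr * gb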

SLIDE 30

Feature extraction: 1 layer

Given a trained model and a label, find an input:

  • with that label
  • that minimizes the gray area

[Figure: P(Y | E[X0]) with decision boundary c0 = fθ(E[X0]), θ = {W, b}; axis ticks at 1/3, 2/3, 1]


SLIDE 32

Feature extraction: 1 layer

  • l: label
  • Xl: input features
  • E[Xl]: input average for that label
  • fθ(E[X]): decision boundary
  • cl = fθ(E[Xl]): constraint boundary
  • ε: slack (see below)

This is an LP! (A sketch follows below.)
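
A hedged sketch of one way to set this up with scipy.optimize.linprog, assuming the objective minimizes total feature activity and the single constraint keeps the target label's linear score within ε of cl; the slides do not spell out the exact objective or constraint matrix:

    import numpy as np
    from scipy.optimize import linprog

    def extract_features(W, b, label, c_l, eps, lo=0.0, hi=1.0):
        # Minimize sum(x) subject to w_l . x + b_l >= c_l - eps,
        # rewritten as -w_l . x <= b_l - c_l + eps for linprog's A_ub form;
        # each feature x_i is bounded to [lo, hi].
        d = W.shape[1]
        A_ub = -W[label:label + 1]
        b_ub = np.array([b[label] - c_l + eps])
        res = linprog(c=np.ones(d), A_ub=A_ub, b_ub=b_ub,
                      bounds=[(lo, hi)] * d)
        return res.x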

SLIDE 33

Feature extraction on a stack

SLIDE 34

Feature extraction: ε

The slack variable ε controls the cross-entropy (CE) achieved by the extracted features

Useful if the average input achieves 0.01 CE but you are happy with 0.2

SLIDE 35

Linear programming (in 1 page)

Optimization problems that:

  • minimize a linear cost function
  • satisfy linear constraints

Very efficient for continuous variables (simplex)

SLIDE 36

Feature extraction: implementation

SLIDE 37

MNIST digits

28x28-pixel binarized handwritten digit images

Pick pairs of digits and extract the features that differentiate them

SLIDE 38

Effect of ε on |Xl|

SLIDE 39

Effect of optimization

SLIDE 40

Features

SLIDE 41

Feature extraction: applications

SLIDE 42

Hematopoiesis & erythroid diff.

Genes Dev. 8(10):1184-97, 1994
Genome Res. 21(10):1659-71, 2011

SLIDE 43

Application: G1E model

SLIDE 44

dimer

SLIDE 45

dimer is at:

http://bitbucket.org/gertidenas/dimer

PULL IT!