ARTIFICIAL INTELLIGENCE Artificial Neural Networks Lecturer: Silja - PowerPoint PPT Presentation

Utrecht University INFOB2KI 2019-2020 The Netherlands ARTIFICIAL INTELLIGENCE Artificial Neural Networks Lecturer: Silja Renooij These slides are part of the INFOB2KI Course Notes available from www.cs.uu.nl/docs/vakken/b2ki/schema.html

Outline  Biological neural networks  Artificial NN basics and training: – perceptrons – multi‐layer networks  Combination with other ML techniques – NN and Reinforcement Learning • e.g. AlphaGo – NN and Evolutionary Computing 3

(Artificial) Neural Networks  Supervised learning technique: error‐driven classification  Output is weighted function of inputs  Training updates the weights  Used in games for e.g. – Select weapon – Select item to pick up – Steer a car on a circuit – Recognize characters – Recognize face – … 4

Biological Neural Nets  Pigeons as art experts (Watanabe et al. 1995) – Experiment: • Pigeon in Skinner box • Present paintings of two different artists (e.g. Chagall / Van Gogh) • Reward for pecking when presented a particular artist (e.g. Van Gogh) 5

Results from experiment Pigeons were able to discriminate between Van Gogh and Chagall:  with 95% accuracy, when presented with pictures they had been trained on  still 85% successful for previously unseen paintings of the artists 7

Praise to neural nets  Pigeons have acquired knowledge about art – Pigeons do not simply memorise the pictures – They can extract and recognise patterns (the ‘style’) – They generalise from the already seen to make predictions  Pigeons have learned.  Can one implement this using an artificial neural network? 8

Inspiration from biology  If a pigeon can do it, how hard can it be?  ANN’s are biologically inspired.  ANN’s are not duplicates of brains (and don’t try to be)! 9

(Natural) Neurons Natural neurons:  receive signals through synapses (~ inputs)  If signals strong enough (~ above some threshold ), – the neuron is activated – and emits a signal though the axon . (~ output ) Artificial neuron (Node) Natural neuron 10

McCulloch & Pitts model (1943) “A logical calculus of the ideas immanent in nervous activity” Linear x 1 w 1 hard Combiner output delimiter x 2 w 2 y aka: - linear threshold gate w n x n - threshold logic unit • n binary inputs x i and 1 binary output y • n weights w i ϵ {‐1,1} � • Linear combiner: z = ∑ 𝑥 � 𝑦 � �� • Hard delimiter: unit step function at threshold θ , i.e. 𝑧 � 1 if 𝑨 � 𝜄, 𝑧 � 0 if 𝑨 � 𝜄 11

Rosenblatt’s Perceptron (1958) x z y = g(z) x • enhanced version of McCulloch‐Pitts artificial neuron • n+ 1 real‐valued inputs : x 1 … x n and 1 bias b ; binary output y • weights w i with real‐valued values � • Linear combiner: z = ∑ 𝑥 � 𝑦 � � 𝑐 �� • g(z) : (hard delimiter) unit step function at threshold 0, i.e. 𝑧 � 1 if 𝑨 � 0 , 𝑧 � 0 if 𝑨 � 0 12

Classification: feedforward The algorithm for computing outputs from inputs in perceptron neurons is the feedforward algorithm. 4 w=2 8 -4 0 w=4 -3 -12 0 weighted input: activation g(z) : 0 � z = � � �� 0 13

Bias & threshold implementation Bias can be incorporated in three different ways, with same effect on output: ∑ ∑ b w 0 = 1 θ - b 1 b Alternatively: threshold θ can be incorporated in three different ways, with same effect on output… 14

Single layer perceptron Input Single layer of nodes: neurons: • Rosenblatt’s perceptron is building w 13 y 1 3 x 1 1 block of single‐layer perceptron w 14 w 23 • which is the simplest feedforward y 2 x 2 2 4 w 24 neural network • alternative hard‐limiting activation functions g(z) possible; e.g. sign function: 𝑧 � �1 if 𝑨 � 0 , 𝑧 � �1 if 𝑨 � 0 • can have multiple independent outputs y i • the adjustable weights can be trained using training data • the Perceptron learning rule adjusts the weights w 1 …w n such that the inputs x 1 …x n give rise to the (desired) output(s) 15

Perceptron learning: idea Idea: minimize error in the output  per output: 𝑓 � 𝑒 � 𝑧 ( d =desired output) �  If 𝑓 � 1 then z � ∑ 𝑥 � 𝑦 � should be increased such that it �� exceeds the threshold �  If 𝑓 � �1 then z � ∑ 𝑥 � 𝑦 � should be decreased such �� that it falls below the threshold ��  change 𝑥 � ← 𝑥 � +/‐ term proportional to gradient �� 𝑦 �  Proportional change: learning rate 𝛽 > 0 NB in the book the learning rate is called Gain, with notation η 16

Perceptron learning Initialize weights and threshold (or bias) to random numbers; Choose a learning rate 0 � 𝛽 � 1 For each training input t =< x 1 ,…,x n >: 1 ‘epoch ’ calculate the output y(t) and error e(t)=d(t) - y(t) desired output Adjust all n weights using perceptron learning rule: 𝑥 � ← 𝑥 � � ∆𝑥 � where ∆𝑥 � � 𝛽 𝑦 � e(t) All Weights unchanged ? Weights for any t changed? Ready or other stopping rule… 17

Example: AND- learning (1) x 1 x 2 d x 2 0 0 0 1 0 1 0 1 0 0 1 1 1 x 1 0 1 d esired output of logical AND, given 2 binary inputs 18

Example AND (2) x 1 0 w=0.3 0 0 0 x 2 w=-0.1 0 0 e(t 1 ) = d(t) – 0 0.2 = 0 – 0 Init: choose weights w i and threshold θ randomly in [‐0.5,0.5]; set ; use step function : return 0 if < θ ; 1 if ≥ θ x 1 x 2 d(t) Alternative:  use bias b= – θ t 1 0 0 0 with unit stepfunction t 2 0 1 0 t 3 1 0 0  Done with t 1 , for now… t 4 1 1 1 19

Example AND (3) x 1 0 w=0.3 0 -0.1 0 x 2 w=-0.1 1 -0.1 e(t 2 ) = 0-0 0.2 x 1 x 2 d(t)  t 1 0 0 0 t 2 0 1 0 t 3 1 0 0  Done with t 2 , for now… t 4 1 1 1 20

Example AND (4) x 1 1 w=0.3 w=0.2 0.3 0.3 1 x 2 w=-0.1 0 0 e(t 3 ) = 0 - 1 0.2 � (t) x 1 x 2 d(t) � t 1 0 0 0  � t 2 0 1 0 t 3 1 0 0 � 1 1 1 t 4  w 1  0.2; done with t 3 , for now… 21

Example AND (5) x 1 w=0.2 1 w=0.3 0.2 0.1 0 x 2 w=0 w=-0.1 1 -0.1 e(t 4 ) = 1-0 0.2 x 1 x 2 d(t) � (t) � t 1 0 0 0  � t 2 0 1 0 .1 t 3 1 0 0 � t 4 1 1 1  w 1  0.3 and w 2  0; done with t 4 and first epoch… 22

Example (6) : 4 epoch’s later… x 1 w=0.1 x 2 w=0.1 0.2  algorithm has converged, i.e. the weights do not change any more.  algorithm has correctly learned the AND function 23

AND example (7): results x 2 x 1 x 2 d y 0 0 0 0 1 0 1 0 0 1 0 0 0 1 1 1 1 x 1 0 1 Learned function/decision boundary: 0.1 𝑦 � � 0.1 𝑦 � � 0.2 linear classifier 𝑦 � � 2 � 𝑦 � Or : 24

Perceptron learning: properties All linear functions  space without local optima Complete: yes, if – 𝛽 sufficiently small or initial weights sufficiently large – examples come from a linearly separable function! then perceptron learning converges to a solution. Optimal: no (weights serve to correctly separate ‘seen’ inputs; no guarantees for ‘unseen’ inputs close to the decision boundaries) 25

Limitation of perceptron: example x 2 XOR x 1 x 2 d 1 0 0 0 0 1 1 1 0 1 1 1 0 x 1 0 1  Cannot separate two output types with a single linear function  XOR is not linearly separable. 26

Solving XOR using 2 single layer perceptrons x 1 x 1 ϴ =1 ϴ =1 x 1 1 1 ϴ =1 3 -1 1 1 1 y 3 -1 1 y 5 -1 4 4 y 2 2 1 -1 2 1 1 ϴ =1 x 2 x 2 ϴ =1 x 2 x 2 x 2 x 2 1 1 1 x 1 x 1 x 1 0 0 0 1 1 1 27

Types of decision regions 28

Multi-layer networks x 1 y 1 x 2 y 2 x 3 y 3 input hidden output nodes layer of neuron neurons layer • This type of network is also called a feed forward network • hidden layer captures nonlinearities • more than 1 hidden layer is possible, but often reducible to 1 hidden layer • introduced in 50s, but not studied until 80s 29

Multi-Layer Networks In MLNs  outputs not based on simple weighted sum of inputs  weights are shared  dependent outputs Input signals x 1 y 1 x 2 y 2 x 3 y 3 Error signals  errors must be distributed over hidden neurons  continuous activation functions are used 30

Continuous activation functions As continuous activation function, we can use • a (piecewise) linear function   (ReLU) • a sigmoid (smoothed version of step function)  e.g. logistic sigmoid g(z) �� z 31

Continuous artificial neurons Linear x 1 w 1 sigmoid Combiner output function x 2 w 2 y w n x n weighted input: activation ( logistic sigmoid ): z = 32

Example w=2 3 6 -2 0.119 w=4 -2 -8 weighted input: activation: z = 33

Error minimization in MLNs: idea Idea: minimize error in output through gradient descent  Total error is sum of squared error, per output: 𝐹 � ∑ � � 𝑒 � 𝑧 � � ( d =desired output) � ��  change 𝑥 � ← 𝑥 � � term proportional to gradient �� 34

ARTIFICIAL INTELLIGENCE Artificial Neural Networks Lecturer: Silja - PowerPoint PPT Presentation

Utrecht University INFOB2KI 2019-2020 The Netherlands ARTIFICIAL INTELLIGENCE Artificial Neural Networks Lecturer: Silja Renooij These slides are part of the INFOB2KI Course Notes available from www.cs.uu.nl/docs/vakken/b2ki/schema.html 2

Artificial Intelligence Artificial Intelligence Artificial Intelligence Study and design of

Introduction to Artificial Intelligence Neural Networks - Deep Learning for NLP Janyl Jumadinova

Learning Neural Networks Learning Neural Networks Neural Networks can represent complex Neural

Artificial Intelligence Course Presentation Summary Artificial Intelligence Motivations

Artificial Intelligence Course Presentation Summary Artificial Intelligence Motivations

Neural Networks and Handwriting Recognition Background Neural Networks Neural Network Steven

Neural Networks Neural networks arise from attempts to model Neural Networks human/animal

Sequential Data with Neural Networks Recurrent Neural Networks Sequential input / output Greg

Artificial intelligence Artificial Intelligence is the science of PHILOSOPHY OF ARTIFICIAL

Artificial Intelligence Intro (Chapter 1 of AIMA) Summary Artificial Intelligence What is AI?

Artificial Neural Networks By: Kodi Neumiller Overview What is an artificial neural network

How Neural Networks (NN) Biological Neuron: A . . . Can (Hopefully) Learn Artificial Neural . .

Neural Information Retrieval Wassila Lalouani 1 Plan Neural network architectures Neural

Posterior odds interpretation of a sigmoid Artificial Intelligence: Neural Networks Michael S.

Neural Networks - Deep Learning Artificial Intelligence @ Allegheny College Janyl Jumadinova

Artificial Neural Networks Oliver Schulte - CMPT 726 Feed-forward Networks Network Training

Discrete Mathematics & Mathematical Reasoning Chapter 6: Counting Colin Stirling Informatics

Relations & Functions CISC1100, Spring 2013 Fordham Univ 1 Overview: relations &

SyMT: finding symmetries in SMT formulas (Work in progress) Carlos Areces, David Dharbe, Pascal

Reproducibility 1 Good practice Workshop 3 2 Aim In this session you will practice creating

SAT by MaxSAT From NP to Beyond NP and Back Again Joao Marques-Silva (joint work with A.

English for Computer Science Mohammad Farshi Department of Computer Science, Yazd University

SAT Solver as coNP Solver Beyond Resolution Norbert Manthey International Center for

Combinatorics I Cunsheng Ding HKUST, Hong Kong September 27, 2015 Cunsheng Ding (HKUST, Hong

ARTIFICIAL INTELLIGENCE Artificial Neural Networks Lecturer: Silja - PowerPoint PPT Presentation

Utrecht University INFOB2KI 2019-2020 The Netherlands ARTIFICIAL INTELLIGENCE Artificial Neural Networks Lecturer: Silja Renooij These slides are part of the INFOB2KI Course Notes available from www.cs.uu.nl/docs/vakken/b2ki/schema.html 2

Artificial Intelligence Artificial Intelligence Artificial Intelligence Study and design of

Introduction to Artificial Intelligence Neural Networks - Deep Learning for NLP Janyl Jumadinova

Learning Neural Networks Learning Neural Networks Neural Networks can represent complex Neural

Artificial Intelligence Course Presentation Summary Artificial Intelligence Motivations

Artificial Intelligence Course Presentation Summary Artificial Intelligence Motivations

Neural Networks and Handwriting Recognition Background Neural Networks Neural Network Steven

Neural Networks Neural networks arise from attempts to model Neural Networks human/animal

Sequential Data with Neural Networks Recurrent Neural Networks Sequential input / output Greg

Artificial intelligence Artificial Intelligence is the science of PHILOSOPHY OF ARTIFICIAL

Artificial Intelligence Intro (Chapter 1 of AIMA) Summary Artificial Intelligence What is AI?

Artificial Neural Networks By: Kodi Neumiller Overview What is an artificial neural network

How Neural Networks (NN) Biological Neuron: A . . . Can (Hopefully) Learn Artificial Neural . .

Neural Information Retrieval Wassila Lalouani 1 Plan Neural network architectures Neural

Posterior odds interpretation of a sigmoid Artificial Intelligence: Neural Networks Michael S.

Neural Networks - Deep Learning Artificial Intelligence @ Allegheny College Janyl Jumadinova

Artificial Neural Networks Oliver Schulte - CMPT 726 Feed-forward Networks Network Training

Discrete Mathematics &amp; Mathematical Reasoning Chapter 6: Counting Colin Stirling Informatics

Relations &amp; Functions CISC1100, Spring 2013 Fordham Univ 1 Overview: relations &amp;

SyMT: finding symmetries in SMT formulas (Work in progress) Carlos Areces, David Dharbe, Pascal

Reproducibility 1 Good practice Workshop 3 2 Aim In this session you will practice creating

SAT by MaxSAT From NP to Beyond NP and Back Again Joao Marques-Silva (joint work with A.

English for Computer Science Mohammad Farshi Department of Computer Science, Yazd University

SAT Solver as coNP Solver Beyond Resolution Norbert Manthey International Center for

Combinatorics I Cunsheng Ding HKUST, Hong Kong September 27, 2015 Cunsheng Ding (HKUST, Hong

Discrete Mathematics & Mathematical Reasoning Chapter 6: Counting Colin Stirling Informatics

Relations & Functions CISC1100, Spring 2013 Fordham Univ 1 Overview: relations &