SLIDE 1

Finding Structure in Time

Jeffrey L. Elman, Cognitive Science 14, 179–211 (1990). Presented by Dominic Seyler (dseyler2@illinois.edu)

SLIDE 2

Outline

  • Motivation
  • Method
  • Experiments
  • Exclusive-OR
  • Structure in Letter Sequences
  • Discovering the Notion “Word”
  • Discovering Lexical Classes
  • Conclusions
SLIDE 3

Motivation: The Problem with Time

  • Previous methods of representing time associate the serial order of a temporal pattern with the dimensionality of the pattern vector:
  • [ 0 1 0 0 1 ] <- first, second, third... event in temporal order
  • There are several downsides to representing time this way:
  • An input buffer is required to represent all events at once
  • All input vectors must be the same length and provide for the longest possible temporal pattern
  • Most importantly: it cannot distinguish relative from absolute temporal position, as the two vectors below (and the sketch that follows) illustrate

[ 0 1 1 1 0 0 0 0 0 ] [ 0 0 0 1 1 1 0 0 0 ]
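A minimal numpy sketch (an illustration, not from the slides) of why this is a problem: the two vectors encode the same relative pattern shifted by two steps, yet look almost entirely different as points in vector space.

```python
import numpy as np

# The same three-event pattern, shifted by two time steps.
a = np.array([0, 1, 1, 1, 0, 0, 0, 0, 0])
b = np.array([0, 0, 0, 1, 1, 1, 0, 0, 0])

print(np.dot(a, b))    # 1 -> nearly orthogonal despite identical structure
print(np.sum(a != b))  # 4 -> Hamming distance of 4 out of 9 positions
```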

SLIDE 4

An Alternative Way of Treating Time

  • Don’t model time as an explicit part of the input
  • Allow time to be represented by the effect it has on processing
  • The network allows hidden units to see their own previous output
  • Recurrent connections are what give the network memory

SLIDE 5

Approach: Recurrent Neural Network

  • Augment the input with additional units (context units)
  • When input is processed sequentially, the context units contain the exact values of the hidden units from the previous time step
  • The hidden units map the external input and the previous internal state to the desired output (see the sketch after this list)
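A rough sketch of this forward pass in numpy (the layer sizes, weight scales, and names here are illustrative assumptions, not the paper's values):

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 6, 20, 6

W_xh = rng.normal(scale=0.1, size=(n_hidden, n_in))      # input -> hidden
W_ch = rng.normal(scale=0.1, size=(n_hidden, n_hidden))  # context -> hidden
W_hy = rng.normal(scale=0.1, size=(n_out, n_hidden))     # hidden -> output

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(sequence):
    context = np.zeros(n_hidden)  # context units start at zero
    outputs = []
    for x in sequence:
        # The hidden state depends on the current input AND the previous
        # hidden state, which the context units hold as an exact copy.
        hidden = sigmoid(W_xh @ x + W_ch @ context)
        outputs.append(sigmoid(W_hy @ hidden))
        context = hidden          # copy the hidden state for the next step
    return outputs
```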

SLIDE 6

Exclusive-OR

  • The XOR function cannot be learned by a simple two-layer network
  • Temporal XOR: one input bit is presented at a time; the task is to predict the next bit, so the target sequence is the input shifted by one step
  • Input:  1 0 1 0 0 0
  • Output: 0 1 0 0 0 ?
  • Training: run 600 passes through a 3,000-bit XOR sequence (generated as sketched after this list)
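As slide 7 explains, the sequence consists of random bit pairs, each followed by their XOR. A small sketch of how such a training sequence could be built (the function name and seed are assumptions):

```python
import random

def xor_sequence(n_triples, seed=0):
    """Bit list made of (b1, b2, b1 XOR b2) triples."""
    random.seed(seed)
    bits = []
    for _ in range(n_triples):
        b1, b2 = random.randint(0, 1), random.randint(0, 1)
        bits += [b1, b2, b1 ^ b2]
    return bits

seq = xor_sequence(1000)             # 3,000 bits, as in the experiment
inputs, targets = seq[:-1], seq[1:]  # target = input shifted by one bit
```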
SLIDE 7

Exclusive-OR (cont.)

  • It is only sometimes possible to predict the next bit correctly
  • After one bit, there is a 50/50 chance
  • After two bits, the third bit will be the XOR of the first and second

SLIDE 8

Structure in Letter Sequences

  • Idea: extend prediction from one-bit inputs to more complex, multi-bit predictions
  • Method:
  • Map six letters (b, d, g, a, i, u) to binary vector representations
  • Use the three consonants to create a random 1,000-letter sequence
  • Replace each consonant by adding vowels: b -> ba; d -> dii; g -> guuu
  • Example input: dbgbddg -> diibaguuubadiidiiguuu
  • Prediction task: given the bit representations of the characters in sequence, predict the next character (see the sketch after this list)
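A short sketch of the sequence construction (the expansion rules are from the slide; the function and seed are assumptions):

```python
import random

EXPAND = {"b": "ba", "d": "dii", "g": "guuu"}

def letter_sequence(n_consonants, seed=0):
    """Random consonant string, each consonant expanded with its vowels."""
    random.seed(seed)
    consonants = random.choices("bdg", k=n_consonants)
    return "".join(EXPAND[c] for c in consonants)

print(letter_sequence(7))  # e.g. 'diibaguuubadiidiiguuu' for one ordering
```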

SLIDE 9

Structure in Letter Sequences (cont.)

  • Since the consonants were ordered randomly, error on consonants is high
  • The vowels are not random, so the network can make use of previous information; error on vowels is low
  • Takeaway: since the input is structured, the network can make partial predictions even when a complete prediction is not possible

SLIDE 10

Discovering the Notion “Word”

  • Learning a language involves learning words
  • Can the network automatically learn “words” when given a sequential list of concatenated characters?
  • Words are represented as concatenated bit vectors of their characters
  • These word vectors are concatenated to form sentences
  • Each character is then presented sequentially, and the network has to predict the following letter (paired up as in the sketch after this list)
  • Input:  manyyearsago
  • Output: anyyearsago?
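A minimal illustration of how the input and target characters line up (just the pairing, not the paper's code):

```python
text = "manyyearsago"  # a sentence with its word boundaries removed

# At each step the network sees one character and must predict the next,
# so the target stream is the input stream shifted by one character.
pairs = list(zip(text, text[1:]))
# [('m', 'a'), ('a', 'n'), ('n', 'y'), ('y', 'y'), ('y', 'e'), ...]
```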

SLIDE 11

Discovering the Notion “Word” (cont.)

  • At the onset of each word, error is high
  • As more of the word is received, error declines
  • Error provides a good clue as to what the recurring sequences in the input are, and it correlates highly with words
  • The network can learn the boundaries of linguistic units from the input signal

SLIDE 12

Discovering Lexical Classes from Word Order

  • Can the network learn the abstract structure that underlies sentences when only the surface forms (i.e., words) are presented to it?
  • Method:
  • Define a set of category-to-word mappings (e.g., NOUN-HUMAN -> man, woman; VERB-PERCEPTION -> smell, see)
  • Use templates to create sentences (e.g., NOUN-HUMAN VERB-EAT NOUN-FOOD)
  • Words in a sentence (e.g., “woman eat bread”) are mapped to one-hot vectors (e.g., 00010 00100 10000)
  • Task: given a word vector (“woman”), predict the next word (“eat”); a data-generation sketch follows this list
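A rough sketch of the data generation, built from the slide's toy examples (this tiny lexicon and its 5-bit one-hot codes are illustrative assumptions, not the paper's full vocabulary):

```python
import random

# Toy lexicon and template from the slide; the paper's setup is larger.
CATEGORIES = {
    "NOUN-HUMAN": ["man", "woman"],
    "VERB-EAT": ["eat"],
    "NOUN-FOOD": ["bread", "cookie"],
}
TEMPLATES = [("NOUN-HUMAN", "VERB-EAT", "NOUN-FOOD")]

VOCAB = sorted({w for ws in CATEGORIES.values() for w in ws})

def one_hot(word):
    vec = [0] * len(VOCAB)
    vec[VOCAB.index(word)] = 1
    return vec

def random_sentence():
    return [random.choice(CATEGORIES[cat]) for cat in random.choice(TEMPLATES)]

sentence = random_sentence()              # e.g. ['woman', 'eat', 'bread']
vectors = [one_hot(w) for w in sentence]  # the network sees only these
```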
SLIDE 13

Discovering Lexical Classes (cont.)

  • Since the prediction task is nondeterministic, RMS error is not a fitting measure
  • Instead: save the hidden-unit vectors produced for each word in all possible contexts and average over them
  • Perform hierarchical clustering on the averaged vectors (sketched after this list)
  • The similarity structure of the internal representations is shown in the resulting tree
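A hedged sketch of this analysis, assuming `hidden_states` maps each word to the list of hidden-unit vectors it evoked across contexts (the data structure and function name are placeholders):

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage

def cluster_words(hidden_states):
    """hidden_states: dict of word -> list of hidden-unit vectors (one per context)."""
    words = sorted(hidden_states)
    # Average each word's hidden vectors over all of its contexts.
    means = np.array([np.mean(hidden_states[w], axis=0) for w in words])
    # Hierarchical clustering of the averaged internal representations;
    # the dendrogram shows their similarity structure as a tree.
    tree = linkage(means, method="average")
    dendrogram(tree, labels=words)
    return tree
```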

SLIDE 14

Discovering Lexical Classes (cont.)

  • The network has developed internal representations for the input vectors which reflect facts about the possible sequential ordering of the inputs
  • The hidden-unit patterns are not word representations in the conventional sense, since the patterns also reflect prior context
  • Error in predicting the actual next word in a given context is high, but the network is able to predict the approximate likelihood of occurrence of classes and words
  • A given node in the hidden layer participates in multiple concepts; only the activation pattern in its entirety is meaningful

SLIDE 15

Conclusions

  • Networks can learn temporal structure implicitly
  • Problems change their nature when expressed as temporal events (XOR could previously not be learned by a single-layer network)
  • The error signal is a good indicator of where structure exists (error was high at the beginnings of words in a sentence)
  • Increasing complexity does not necessarily result in worse performance (increasing the number of bits did not hurt performance)
  • Internal representations can be hierarchical in nature (similarity was high among words within one class)

SLIDE 16

Finding Structure in Time

Jeffrey L. Elman, Cognitive Science 14, 179–211 (1990). Presented by Dominic Seyler (dseyler2@illinois.edu)