

  1. Finding Structure in Time Jeffrey L. Elman In Cognitive Science 14, 179 – 211 (1990) presented by Dominic Seyler (dseyler2@illinois.edu)

  2. Outline • Motivation • Method • Experiments • Exclusive-Or • Structure in Letter Sequences • Discovering the Notion “Word” • Discovering Lexical Classes • Conclusions

  3. Motivation: The Problem with Time • Previous methods of representing time • Associate the serial order of a temporal pattern with the dimensionality of the pattern vector • [ 0 1 0 0 1 ] <- first, second, third... event in temporal order • There are several downsides to representing time this way • An input buffer is required to present all events at once • All input vectors must be the same length and provide for the longest possible temporal pattern • Most importantly: cannot distinguish relative from absolute temporal position, e.g. [ 0 1 1 1 0 0 0 0 0 ] and [ 0 0 0 1 1 1 0 0 0 ] contain the same pattern, merely shifted in time

  4. An Alternative Way of Treating Time • Don’t model time as an explicit part of the input • Allow time to be represented by the effect it has on processing • Recurrent connections let the hidden units see the previous output • These recurrent connections are what give the network memory

  5. Approach: Recurrent Neural Network • Augment the input with additional units (context units) • When the input is processed sequentially, the context units contain the exact values of the hidden units from the previous time step • The hidden units map the external input and previous internal state to the desired output (see the sketch below)
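
A minimal sketch of the simple recurrent (Elman) architecture described on this slide, using NumPy. The layer sizes, weight scales, and class name are illustrative assumptions; training (backpropagation of the prediction error) is omitted, and only the forward pass with the copy-back context units is shown.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class ElmanNetwork:
    """Sketch of a simple recurrent network with copy-back context units."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in = rng.normal(0.0, 0.1, (n_hidden, n_in))       # input   -> hidden
        self.W_ctx = rng.normal(0.0, 0.1, (n_hidden, n_hidden))  # context -> hidden
        self.W_out = rng.normal(0.0, 0.1, (n_out, n_hidden))     # hidden  -> output
        self.context = np.zeros(n_hidden)                        # previous hidden state

    def step(self, x):
        # Hidden units see the current input plus the context units, which
        # hold the hidden activations from the previous time step.
        hidden = sigmoid(self.W_in @ x + self.W_ctx @ self.context)
        output = sigmoid(self.W_out @ hidden)
        self.context = hidden.copy()   # one-to-one copy-back connections
        return output
```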

  6. Exclusive-OR • The XOR function cannot be learned by a simple two-layer network (input and output units only, no hidden layer) • Temporal XOR: one input bit is presented at a time; the task is to predict the next bit • Input: 1 0 1 0 0 0 • Output: 0 1 0 0 0 ? • Training: run 600 passes through a 3,000-bit XOR sequence

  7. Exclusive-OR (cont.) • It is only sometimes possible to predict the next bit correctly • After the first bit of a pair, there is a 50/50 chance • After two bits, the third bit is the XOR of the first and second and is fully predictable (see the data-construction sketch below)
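
Following the description above, the training stream can be built as below: random bit pairs are each followed by their XOR and concatenated into one long sequence, and the target at every step is simply the next bit. Function and variable names are placeholders, not from the paper.

```python
import random

def xor_stream(n_triples=1000, seed=0):
    """Concatenate (a, b, a XOR b) triples into one long bit sequence."""
    rng = random.Random(seed)
    bits = []
    for _ in range(n_triples):
        a, b = rng.randint(0, 1), rng.randint(0, 1)
        bits.extend([a, b, a ^ b])        # only the third bit is predictable
    return bits

sequence = xor_stream()                            # 3,000 bits, as in the experiment
inputs, targets = sequence[:-1], sequence[1:]      # predict the next bit at every step
```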

  8. Structure in Letter Sequences • Idea: extend prediction from one-bit inputs to more complex, multi-bit predictions • Method: • Map six letters (b, d, g, a, i, u) to binary representations • Use the three consonants to create a random 1,000-letter sequence • Expand each consonant by adding vowels: b -> ba; d -> dii; g -> guuu • Example input: dbgbddg -> diibaguuubadiidiiguuu • Prediction task: given the bit representations of the characters in sequence, predict the next character (see the sketch below)
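
A sketch of the sequence construction described on this slide: a random string over the three consonants is expanded by the fixed rules b -> ba, d -> dii, g -> guuu, and each letter is then encoded as a bit vector. The one-of-six encoding below is a placeholder assumption; the paper used small distributed feature vectors for the six letters.

```python
import random

EXPANSION = {"b": "ba", "d": "dii", "g": "guuu"}
# Placeholder one-of-six codes; the paper used distributed feature vectors.
CODES = {c: [int(i == k) for i in range(6)] for k, c in enumerate("bdgaiu")}

def letter_sequence(n_consonants=1000, seed=0):
    rng = random.Random(seed)
    consonants = rng.choices("bdg", k=n_consonants)        # random consonant order
    letters = "".join(EXPANSION[c] for c in consonants)    # e.g. dbg -> diibaguuu
    return [CODES[c] for c in letters]                     # one bit vector per letter
```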

  9. Structure in Letter Sequences (cont.) • Since the consonants were ordered randomly, prediction error on consonants is high • The vowels are not random, so the network can make use of previous information; error on vowels is therefore low • Takeaway: because the input is structured, the network can make partial predictions even where a complete prediction is not possible

  10. Discovering the Notion “Word” • Learning a language involves learning words • Can the network learn “words” automatically when given a sequential stream of concatenated characters? • Words are represented as concatenated bit vectors of their characters • These word vectors are concatenated to form sentences • Each character is then presented sequentially and the network has to predict the following letter • Input: manyyearsago • Output: anyyearsago?

  11. Discovering the Notion “Word” (cont.) • At the onset of each word, error is high • As more of the word is received, error declines • The error signal provides a good clue to the recurring sequences in the input and correlates strongly with word boundaries (see the sketch below) • The network can learn the boundaries of linguistic units from the input signal alone
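
As a rough illustration of how the error signal could be turned into word boundaries, the sketch below places a cut at every local peak of the per-letter prediction error. This peak-picking rule is an illustrative heuristic, not a procedure from the paper, and it assumes `errors[t]` is the error the network made when the letter at position t of `text` arrived (high at word onsets).

```python
def words_from_error(errors, text):
    """Split `text` at local peaks of the prediction error (word onsets)."""
    cuts = []
    for t in range(1, len(errors) - 1):
        # A local error peak suggests the start of a new, unpredictable word.
        if errors[t] > errors[t - 1] and errors[t] >= errors[t + 1]:
            cuts.append(t)
    return [text[i:j] for i, j in zip([0] + cuts, cuts + [len(text)])]

# e.g. words_from_error(per_letter_error, "manyyearsago") might yield
# ["many", "years", "ago"] if the error peaks at the 'y' and the final 'a'.
```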

  12. Discovering Lexical Classes from Word Order • Can the network learn the abstract structure that underlies sentences when only the surface forms (i.e., words) are presented to it? • Method • Define a set of category-to-word mappings (e.g., NOUN-HUMAN -> man, woman; VERB-PERCEPTION -> smell, see) • Use templates to create sentences (e.g., NOUN-HUMAN VERB-EAT NOUN-FOOD) • Words in a sentence (e.g., “woman eat bread”) are mapped to one-hot vectors (e.g., 00010 00100 10000) • Task: given a word vector (“woman”), predict the next word (“eat”) (see the sketch below)
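
A sketch of the template-based sentence generation and one-hot encoding described on this slide. The category lists and templates are abbreviated examples rather than Elman's full lexicon and template set.

```python
import random

CATEGORIES = {
    "NOUN-HUM":  ["man", "woman"],
    "VERB-EAT":  ["eat"],
    "NOUN-FOOD": ["bread", "cookie"],
}
TEMPLATES = [("NOUN-HUM", "VERB-EAT", "NOUN-FOOD")]
VOCAB = sorted(w for words in CATEGORIES.values() for w in words)

def one_hot(word):
    return [int(w == word) for w in VOCAB]        # localist vector: one bit per word

def generate_sentence(rng=random.Random(0)):
    words = [rng.choice(CATEGORIES[cat]) for cat in rng.choice(TEMPLATES)]
    return words, [one_hot(w) for w in words]     # e.g. ['woman', 'eat', 'bread']
```

Training pairs are then formed as on the previous slides: each word vector is the input and the following word's vector is the prediction target.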

  13. Discovering Lexical Classes (cont.) • Since the prediction task is nondeterministic, RMS error against the actual next word is not a fitting measure • Instead, save the hidden-unit vectors for each word in all possible contexts and average over them • Perform hierarchical clustering on the averaged vectors (see the sketch below) • The similarity structure of the internal representations is shown in the resulting tree
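
A sketch of this analysis, assuming `hidden_states` maps each word to the list of hidden-unit vectors recorded whenever that word occurred in the corpus; SciPy's agglomerative clustering and dendrogram plot stand in for the cluster analysis reported in the paper.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

def cluster_word_representations(hidden_states):
    words = sorted(hidden_states)
    # Average each word's hidden vectors over all contexts it appeared in.
    means = np.array([np.mean(hidden_states[w], axis=0) for w in words])
    tree = linkage(means, method="average", metric="euclidean")
    dendrogram(tree, labels=words)    # similarity structure shown as a tree
    plt.show()
    return tree
```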

  14. Discovering Lexical Classes (cont.) • The network has developed internal representations of the input vectors that reflect facts about the possible sequential ordering of the inputs • The hidden-unit patterns are not word representations in the conventional sense, since the patterns also reflect prior context • Error in predicting the actual next word in a given context is high, but the network is able to predict the approximate likelihood of occurrence of classes and words • A given hidden unit participates in multiple concepts; only the activation pattern in its entirety is meaningful

  15. Conclusions • Networks can learn temporal structure implicitly • Problems change their nature when expressed as temporal events (XOR cannot be learned by a network without hidden units, but its temporal version can be) • The error signal is a good indicator of where structure exists (error was high at the beginning of words in a sentence) • Increasing complexity does not necessarily result in worse performance (increasing the number of input bits did not hurt performance) • Internal representations can be hierarchical in nature (similarity was high among words within one class)

  16. Finding Structure in Time Jeffrey L. Elman In Cognitive Science 14, 179 – 211 (1990) presented by Dominic Seyler (dseyler2@illinois.edu)
