SLIDE 1

Faculty of Informatics Eötvös Loránd University

Hippocampal formation breaks combinatorial explosion for reinforcement learning: A conjecture

Andras Lorincz Department of Information Systems Eötvös Loránd University

SLIDE 2

Support and collaborators

Support

  • AFOSR Information Directorate – on reinforcement learning
  • EU Framework Program – on multiagent systems

Collaborators

  • Barnabas Poczos
  • Zoltan Szabo
  • Gabor Szirtes
  • Istvan Szita

Combinatorial Explosion AAAI FSS BICA 2008

SLIDE 3

Motivation: Symbols and symbol manipulation

[Figure labels: Independent driving components; Control; Mixed observation; Dynamical system]

SLIDE 4

Problem statement

  • Artificial Intelligence started from computations
  • Computations work by manipulating symbols
  • The symbol grounding problem emerges
  • Grounding of symbols: connect the symbols to experiences
  • Symbols represent parts (components) of (in) the world and their relations

Symbol grounding corresponds to graph matching, which is exponentially hard

  • It seems necessary to focus on polynomial time learning tasks
  • Then the symbol learning problem emerges (Lorincz, 2008)

SLIDE 5

The symbol learning task

Find high-entropy variables, or symbols,

$x_i$ ($i = 1, 2, \ldots, k$),

and low-entropy random variables, or manifestations of the symbols,

$z_{i,j_i}$ ($i = 1, 2, \ldots, k$; $j_i = 1, 2, \ldots, K_i$; $K_i \gg 1$ for all $i$),

such that the transition probability between the low-entropy variables $z_{i,j_i}$ and $z_{l,j_l}$, i.e., $P(z_{l,j_l} \mid z_{i,j_i})$, is roughly determined by the transition probability between the high-entropy variables $x_i$ and $x_l$, i.e., by $P(x_l \mid x_i)$, for almost all manifestations.
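As a toy illustration of the condition above, the sketch below builds a manifestation-level transition matrix whose blocks are fixed entirely by a symbol-level matrix, then recovers the symbol-level matrix by aggregation. It is an idealized case, not the method from the talk: the condition holds exactly here rather than "roughly" and "for almost all" manifestations, every symbol is assumed to have the same number of manifestations, and the names `lift_to_manifestations`, `aggregate_to_symbols`, and `p_symbols` are hypothetical.

```python
import numpy as np

def lift_to_manifestations(p_symbols, n_manifestations):
    """Spread each symbol-level probability P(x_l | x_i) uniformly over
    the manifestations of the target symbol (idealized assumption)."""
    uniform_block = np.full((n_manifestations, n_manifestations),
                            1.0 / n_manifestations)
    return np.kron(p_symbols, uniform_block)

def aggregate_to_symbols(p_manifest, k, n_manifestations):
    """Recover symbol-level transitions from manifestation-level ones."""
    blocks = p_manifest.reshape(k, n_manifestations, k, n_manifestations)
    # Sum over target manifestations, then average over source manifestations.
    return blocks.sum(axis=3).mean(axis=1)

# Hypothetical example: k = 2 symbols, 50 manifestations per symbol.
p_symbols = np.array([[0.9, 0.1],
                      [0.3, 0.7]])
p_manifest = lift_to_manifestations(p_symbols, n_manifestations=50)
p_recovered = aggregate_to_symbols(p_manifest, k=2, n_manifestations=50)
```

The manifestation-level matrix has 100 x 100 entries, yet all of its structure is carried by the 2 x 2 symbol-level matrix, which is the kind of compression the symbol learning task asks for.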

SLIDE 6

The symbol learning task

The symbol learning task is possible

  • Tao (2005) rephrased the famous Szemerédi Regularity Lemma of extremal graph theory in the language of information theory

The symbol learning task is polynomial

  • Frieze and Kannan (1999)

SLIDE 7

If we have the symbols

  • Reinforcement learning is still exponential
  • BUT IF variables factorize ('complementarity'), e.g., [color and shape], [position and speed], [where and what], then factored RL is polynomial with a novel sampling technique (I. Szita and A. Lorincz, 2008)
  • No general method to find variables that factorize
  • No solution to the factored symbol learning task
  • Exception: control (position, speed, acceleration, force) in linear approximation, which leads to Autoregressive Moving Average (ARMA) processes
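To make the exponential-versus-polynomial contrast concrete, the following sketch counts transition-table entries for a flat model and for a factored one. It only illustrates representation size under stated assumptions (each variable takes the same number of values, and each variable's next value depends on at most `max_parents` variables); it does not implement the sampling technique of Szita and Lorincz, and the function names are hypothetical.

```python
def flat_transition_entries(n_vars: int, n_values: int) -> int:
    """Entries in a full state-to-state transition table, per action."""
    n_states = n_values ** n_vars
    return n_states * n_states

def factored_transition_entries(n_vars: int, n_values: int,
                                max_parents: int) -> int:
    """Entries when each variable's next value depends on at most
    max_parents variables: one small conditional table per variable."""
    return n_vars * (n_values ** max_parents) * n_values

# Compare growth for binary variables with at most two parents each.
for n in (2, 4, 8, 16):
    print(n, flat_transition_entries(n, 2),
          factored_transition_entries(n, 2, 2))
```

The flat count grows exponentially in the number of variables, while the factored count grows linearly once `max_parents` is fixed.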

SLIDE 8

ARMA processes

Steps

  1. Remove temporal dependencies (ARMA removal, Gaussian assumption)
  2. Compute ARMA innovations := driving causes of ARMA processes
  3. Analyze the causes; they should be independent
  4. Find the hidden independences: Independent Subspace Analysis
  5. Learn the hidden processes driven by the hidden causes

Independent Process Analysis: polynomial time algorithm (Poczos, Szabo, Lorincz, 2006-2007)

  • Putting the steps into an ANN and insisting on Hebbian learning at each step, one receives an architecture that is similar to the hippocampal formation (HC). HC
      • is responsible for declarative memory (planning aspect)
      • holds representations of position and direction in rodents
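Steps 1 and 2 above can be sketched in a simplified setting. The example assumes a pure first-order autoregressive process (the moving-average part is omitted), fits it by ordinary least squares, and takes the one-step prediction residuals as the estimated innovations. This is not the Independent Process Analysis algorithm itself; `fit_ar1`, `innovations`, and the simulated data are hypothetical.

```python
import numpy as np

def fit_ar1(x):
    """Least-squares estimate of A in x[t] = A @ x[t-1] + e[t].
    x has shape (T, d)."""
    past, future = x[:-1], x[1:]
    # Solve past @ A.T ~= future in the least-squares sense.
    a_transposed, *_ = np.linalg.lstsq(past, future, rcond=None)
    return a_transposed.T

def innovations(x, a):
    """One-step prediction residuals: the estimated driving causes."""
    return x[1:] - x[:-1] @ a.T

# Simulate a stable 2-D AR(1) process driven by non-Gaussian noise.
rng = np.random.default_rng(0)
a_true = np.array([[0.5, 0.2],
                   [-0.1, 0.3]])
n_steps = 5000
e = rng.uniform(-1.0, 1.0, size=(n_steps, 2))
x = np.zeros((n_steps, 2))
for t in range(1, n_steps):
    x[t] = a_true @ x[t - 1] + e[t]

a_hat = fit_ar1(x)               # step 1: model the temporal dependence
e_hat = innovations(x, a_hat)    # step 2: estimated innovations
```

In the full pipeline the estimated innovations would then be passed to Independent Subspace Analysis (steps 3 and 4), which this sketch does not cover.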

SLIDE 9

Comparison:

  1. Hebbian architecture for Autoregressive Independent Process Analysis

versus

  2. hippocampal formation

SLIDE 10

The architecture we get

[Figure panels: Architecture; Hippocampal formation, with additional CA3-dentate gyrus loops serving moving average compensation]

SLIDE 11

Conjecture repeated

Hippocampal formation breaks combinatorial explosion for reinforcement learning

SLIDE 12

Thank you!

SLIDE 13

Supplementary materials and references

SLIDE 14

Grids and place cells

  • Grids and place fields emerge together in the model (Lorincz, Kiszlinger, Szirtes, 2008)

[Figure labels: inputs; hexagonal grids; place fields]

SLIDE 15

Independent Process Analysis

[Figure labels: observed; input of ISA; estimated]

SLIDE 16

References-1

  • Christian Jutten, Jeanny Hérault: Blind separation of sources: An adaptive algorithm based on neuromimetic architecture. Signal Processing, 24:1-10, 1991.
  • Pierre Comon: Independent component analysis, a new concept? Signal Processing, 36(3):287-314, 1994.
  • Jean-Francois Cardoso: Multidimensional independent component analysis. ICASSP'98, volume 4, 1941-1944.
  • Zoltán Szabó, Barnabás Póczos, András Lőrincz: Undercomplete blind subspace deconvolution. Journal of Machine Learning Research 8(May):1063-1095, 2007.

SLIDE 17

References-2

  • Aapo Hyvarinen: Independent component analysis for time-dependent stochastic processes. ICANN'98, 541-546.
  • Barnabás Póczos, Bálint Takács, András Lőrincz: Independent subspace analysis on innovations. ECML-2005, 698-706.
  • Barnabás Póczos, András Lőrincz: D-optimal Bayesian interrogation for parameter and noise identification of recurrent neural networks, 2008 (submitted). Available at http://arxiv.org/abs/0801.1883
  • Zoltán Szabó, András Lőrincz: Towards independent subspace analysis in controlled dynamical systems. ICARN-2008 (accepted).
