Faculty of Informatics Eötvös Loránd University
Hippocampal formation breaks combinatorial explosion for reinforcement learning
Support and collaborators
Support
- AFOSR Information Directorate – on reinforcement learning
- EU Framework Program – on multiagent systems
Collaborators
- Barnabas Poczos
- Zoltan Szabo
- Gabor Szirtes
- Istvan Szita
Combinatorial Explosion AAAI FSS BICA 2008
Motivation: Symbols and symbol manipulation
[Figure labels: independent driving components; control; dynamical system; mixed observation]
Problem statement
- Artificial Intelligence started from computations
- Computations work by manipulating symbols
- The symbol grounding problem emerges
- Grounding of symbols connects the symbols to experiences
- Symbols represent parts (components) of (in) the world and their relations
- Symbol grounding corresponds to graph matching, which is exponentially hard
- It seems necessary to focus on polynomial time learning tasks
- Then the symbol learning problem emerges (Lorincz, 2008)
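To make the cost of graph matching concrete, here is a minimal sketch of a brute-force matcher, assuming small undirected graphs given as edge lists. The function name `brute_force_match` and the example graphs are illustrative assumptions, not part of the talk. The search space is all n! node relabelings, which is what makes the naive approach intractable.

```python
from itertools import permutations
from math import factorial


def brute_force_match(edges_a, edges_b, n):
    """Return a relabeling (tuple) that maps graph A onto graph B, or None.

    Both graphs are undirected, have n nodes labeled 0..n-1, and are
    given as lists of node pairs. Every permutation is tried in turn.
    """
    target = {frozenset(edge) for edge in edges_b}
    for perm in permutations(range(n)):
        mapped = {frozenset((perm[u], perm[v])) for u, v in edges_a}
        if mapped == target:
            return perm
    return None


# Hypothetical example: a path 0-1-2-3 and the same path with shuffled labels.
path = [(0, 1), (1, 2), (2, 3)]
shuffled = [(2, 0), (0, 3), (3, 1)]
star = [(0, 1), (0, 2), (0, 3)]

match = brute_force_match(path, shuffled, 4)
no_match = brute_force_match(path, star, 4)

# Number of relabelings the brute-force search may have to visit.
candidates = [factorial(n) for n in (4, 8, 12)]
```

Already at 12 nodes the search may visit roughly half a billion relabelings, which illustrates why the slides turn to polynomial time learning tasks.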
The symbol learning task
Find high-entropy variables, or symbols,
$x_i$ $(i = 1, 2, \ldots, k)$
and low-entropy random variables, or manifestations of the symbols,
$z_{i,j_i}$ $(i = 1, 2, \ldots, k;\ j_i = 1, 2, \ldots, K_i)$, with $K_i \gg 1$ for all $i$,
such that the transition probability between the low-entropy variables
$z_{i,j_i}$ and $z_{l,j_l}$, i.e., $P(z_{l,j_l} \mid z_{i,j_i})$,
is roughly determined by the transition probability
between the high-entropy variables $x_i$ and $x_l$, i.e., by $P(x_l \mid x_i)$, for almost all manifestations.
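A toy illustration of this definition may help. The sketch below assumes an idealized case in which the manifestation-level transitions are exactly (not just roughly) determined by the symbol-level ones. The matrix `P_sym`, the counts in `K`, and the random within-symbol weights are all hypothetical choices made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical symbol-level transition matrix P(x_l | x_i) for k = 3 symbols.
P_sym = np.array([[0.1, 0.6, 0.3],
                  [0.5, 0.2, 0.3],
                  [0.3, 0.3, 0.4]])
k = P_sym.shape[0]

# Number of manifestations K_i of each symbol (kept small for readability).
K = [4, 5, 3]
owner = np.repeat(np.arange(k), K)  # symbol index of each manifestation
n = owner.size

# Manifestation-level transition matrix: the mass P(x_l | x_i) is spread
# over the manifestations of symbol l with arbitrary positive weights.
P_man = np.zeros((n, n))
for a in range(n):
    for l in range(k):
        idx = np.flatnonzero(owner == l)
        w = rng.random(idx.size)
        P_man[a, idx] = P_sym[owner[a], l] * w / w.sum()

# Lumping the manifestations of each symbol recovers the symbol transitions.
lumped = np.stack([P_man[:, owner == l].sum(axis=1) for l in range(k)], axis=1)
```

Each row of `lumped` equals the row of `P_sym` that belongs to the owning symbol, so knowing the symbol is enough to predict which symbol comes next. The learning task runs in the opposite direction: only something like `P_man` is observed, and the grouping `owner` has to be found.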
The symbol learning task
- The symbol learning task is possible: Tao (2005) rephrased the famous Szemerédi Regularity Lemma of extremal graph theory in information-theoretic terms
- The symbol learning task is polynomial: Frieze and Kannan (1999)
If we have the symbols
- Reinforcement learning is still exponential
- BUT IF variables factorize (‘complementarity’)
- e.g., [color and shape], [position and speed], [where and what]
- then factored RL is polynomial
- with a novel sampling technique (I. Szita and A. Lorincz, 2008)
- No general method to find variables that factorize
- No solution to the factored symbol learning task
- Exception:
- control (position, speed, acceleration, force)
- in linear approximation: Autoregressive Moving Average (ARMA) processes
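The gain from factorization can be sketched by counting model parameters. The example below is only an illustration of model size, not the sampling technique cited above. The helper names, the choice of two parents per variable, and the small color and shape matrices are assumptions made for the example.

```python
import numpy as np


def flat_table_size(n_vars, n_values):
    """Entries in a full state-to-state transition table (per action)."""
    n_states = n_values ** n_vars
    return n_states * n_states


def factored_table_size(n_vars, n_values, n_parents):
    """Entries when each variable depends on only n_parents variables."""
    return n_vars * (n_values ** n_parents) * n_values


flat = flat_table_size(10, 2)             # grows exponentially with n_vars
factored = factored_table_size(10, 2, 2)  # grows linearly with n_vars

# Two independent factors, e.g. [color and shape]: the joint transition
# matrix is the Kronecker product of the per-factor matrices.
P_color = np.array([[0.9, 0.1],
                    [0.2, 0.8]])
P_shape = np.array([[0.5, 0.3, 0.2],
                    [0.1, 0.8, 0.1],
                    [0.3, 0.3, 0.4]])
P_joint = np.kron(P_color, P_shape)  # 6 x 6, stored as 4 + 9 numbers
```

With ten binary variables the flat table has about a million entries, while the factored description under these assumptions needs 80.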
ARMA processes
- Steps
- 1. Remove temporal dependencies (ARMA removal, Gaussian assumption)
- 2. Compute ARMA innovations := driving causes of ARMA processes
- 3. Analyze the causes, they should be independent
- 4. Find the hidden independences: Independent Subspace Analysis
- 5. Learn the hidden processes driven by the hidden causes
Independent Process Analysis: polynomial time algorithm (Poczos, Szabo, Lorincz, 2006-2007)
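Steps 1 and 2 can be sketched on synthetic data. The example below is a simplified illustration, not the cited algorithm: it assumes a first-order autoregressive (AR) process without a moving average part, a square invertible mixing matrix, and a plain least-squares fit. The matrices `F` and `A` and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 5000, 2

# Hidden independent, non-Gaussian driving causes (uniform, unit variance).
e = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(T, d))

# Hidden AR(1) process s[t] = F s[t-1] + e[t], observed as x[t] = A s[t].
F = np.array([[0.7, 0.2],
              [-0.2, 0.7]])
A = np.array([[1.0, 0.5],
              [0.3, 1.0]])
s = np.zeros((T, d))
for t in range(1, T):
    s[t] = F @ s[t - 1] + e[t]
x = s @ A.T

# Step 1: remove temporal dependencies by fitting x[t+1] ~ M x[t].
x_past, x_next = x[:-1], x[1:]
W, *_ = np.linalg.lstsq(x_past, x_next, rcond=None)  # x_next ~ x_past @ W
M = W.T

# Step 2: the innovations are the part of x[t+1] not predicted from x[t].
innovations = x_next - x_past @ W

# In this setup the innovations should approximate the mixed causes A e[t+1].
M_true = A @ F @ np.linalg.inv(A)
mixed_causes = e[1:] @ A.T
```

The innovations are still a mixture of the hidden causes; separating them (steps 3 and 4) would require an independent component or subspace analysis method, which is not shown here.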
- Putting the steps into an artificial neural network (ANN) and insisting on Hebbian learning at each step,
- one receives an architecture that is similar to the hippocampal formation (HC)
- HC is
- responsible for declarative memory (planning aspect)
- holds representations of position and direction in rodents
Comparison:
- 1. Hebbian architecture for Autoregressive Independent Process Analysis
versus
- 2. hippocampal formation
The architecture we get
[Figure panels: Architecture; Hippocampal formation, with additional CA3-dentate gyrus loops serving moving average compensation]
Conjecture repeated
Hippocampal formation breaks combinatorial explosion for reinforcement learning
Thank you!
Supplementary materials and references
Grids and place cells
- Grids and place fields emerge together in the model (Lorincz, Kiszlinger, Szirtes, 2008)
[Figure labels: inputs; hexagonal grids; place fields]
Independent Process Analysis
[Figure labels: observed; input of ISA; estimated]
References-1
- Christian Jutten, Jeanny Hérault: Blind separation of sources: An adaptive algorithm based on neuromimetic architecture. Signal Processing, 24:1-10, 1991.
- Pierre Comon: Independent component analysis, a new concept? Signal Processing, 36(3):287-314, 1994.
- Jean-François Cardoso: Multidimensional independent component analysis. ICASSP’98, volume 4, 1941-1944.
- Zoltán Szabó, Barnabás Póczos, András Lőrincz: Undercomplete blind subspace deconvolution. Journal of Machine Learning Research, 8(May):1063-1095, 2007.
References-2
- Aapo Hyvärinen: Independent component analysis for time-dependent stochastic processes. ICANN’98, 541-546.
- Barnabás Póczos, Bálint Takács, András Lőrincz: Independent subspace analysis on innovations. ECML-2005, 698-706.
- Barnabás Póczos, András Lőrincz: D-optimal Bayesian interrogation for parameter and noise identification of recurrent neural networks, 2008 (submitted). Available at http://arxiv.org/abs/0801.1883
- Zoltán Szabó, András Lőrincz: Towards independent subspace analysis in controlled dynamical systems. ICARN-2008 (accepted).