 
Logic and Knowledge Representation
Reinforcement learning, Inductive Logic Programming, Description Complexity
15 June 2018
Giovanni Sileno (gsileno@enst.fr)
Télécom ParisTech, Paris-Dauphine University
Induction (again) – after Peirce
Induction
Fact: These beans are from this bag.
Fact: These beans are white.
Hyp. rule: All the beans from this bag are white.
● Induction enables prediction through the settled model.
Induction (again)
3, 4, 6, 8, 12, 14, 18, 20, 24, 30, 32, 38, 42, … ??
Possible models:
● numbers $n + 1$, $n$ prime number → 44, 48, 54, ...
● numbers $n$ such that for all $k$ with $\gcd(n, k) = 1$ and $n > k^2$, $n - k^2$ is prime → 48, 54, 60, ...
● Look on https://oeis.org/ for others.
Further observations enable the correction of the model.
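The two candidate models above can be checked against the observed sequence with a short script. This is only a sketch: the function names `model_a` and `model_b` and the search limit of 60 are assumptions made for illustration, and the second model is read as "$n - k^2$ is prime whenever $\gcd(n, k) = 1$ and $k^2 < n$".

```python
from math import gcd

OBSERVED = [3, 4, 6, 8, 12, 14, 18, 20, 24, 30, 32, 38, 42]

def is_prime(n):
    """Trial division; enough for the small numbers used here."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def model_a(limit):
    """First model: numbers p + 1 where p is prime."""
    return [n for n in range(2, limit + 1) if is_prime(n - 1)]

def model_b(limit):
    """Second model: n - k^2 is prime for every k coprime to n with k^2 < n.

    n = 1 is skipped because no k satisfies k^2 < 1, so the condition
    would hold only vacuously.
    """
    return [
        n for n in range(2, limit + 1)
        if all(is_prime(n - k * k)
               for k in range(1, n)
               if k * k < n and gcd(n, k) == 1)
    ]

a, b = model_a(60), model_b(60)
print(a[:13] == OBSERVED, b[:13] == OBSERVED)  # both models fit the data
print(a[13], b[13])                            # they disagree on the next term: 44 vs 48
```

Both hypotheses agree on all thirteen observations and differ only on the next one, which is why a further observation is needed to discard one of them.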
Alien environment problem
● Suppose a robot lands on an unknown planet.
– In order to accomplish its mission, it has to acquire an operational knowledge of:
● what (might) occur
● what its actions (might) achieve
from its observations → Induction!!!
Reinforcement Learning
Nim game
● Two-player game
● In turn, each player may take as many items as desired from a single row
● The one who takes the last item loses.
Nim game
● How can one learn to win without knowing the rules?
– recording the states encountered during the play
– updating the value of states with the final results (won or lost)
– selecting actions leading to winning states
→ a very simple example of reinforcement learning!
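The three steps above (record states, update their values with the final result, prefer moves leading to high-valued states) can be sketched as a small self-play learner. Everything here is an illustrative assumption rather than the lecture's implementation: the function names, the learning rate `alpha`, the exploration rate `epsilon`, and the choice to store values for the position a player leaves to the opponent.

```python
import random

def moves(state):
    """Positions reachable by removing 1..k items from a single row."""
    result = []
    for i, row in enumerate(state):
        for take in range(1, row + 1):
            rest = list(state)
            rest[i] = row - take
            result.append(tuple(sorted(r for r in rest if r > 0)))
    return result

def choose(state, values, epsilon, rng):
    """Mostly pick the best-valued resulting position, sometimes explore."""
    options = moves(state)
    if rng.random() < epsilon:
        return rng.choice(options)
    return max(options, key=lambda s: values.get(s, 0.0))

def train(start, episodes=500, alpha=0.5, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    values = {}                                # value of each position left to the opponent
    for _ in range(episodes):
        state, player = tuple(sorted(start)), 0
        history = {0: [], 1: []}               # positions each player produced
        while state:
            state = choose(state, values, epsilon, rng)
            history[player].append(state)      # record the state encountered
            player = 1 - player
        loser = 1 - player                     # whoever took the last item loses
        for p in (0, 1):
            result = -1.0 if p == loser else 1.0
            for s in history[p]:               # update values with the final result
                old = values.get(s, 0.0)
                values[s] = old + alpha * (result - old)
    return values

values = train((2,), episodes=50)
best = max(moves((2,)), key=lambda s: values.get(s, 0.0))
print(best)  # (1,): leave one item so the opponent has to take it
```

The learner never sees the rule "last item loses"; it only observes that emptying the board is always followed by a loss, so that position ends up with a negative value.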
Example of reinforcement learning algorithm
$$\pi^*(s) = \arg\max_a \big( R(s, a) + \gamma \, V^*(\delta(s, a)) \big)$$
– $s$: state of the world
– $\pi^*(s)$: best strategy in $s$
– $R(s, a)$: expected immediate reinforcement in performing $a$ in $s$
– $V^*(s')$: expected (long-term) gain from a state $s'$
– $\delta(s, a)$: transition function: action $a$ performed in $s$ leads to $s'$
– $\gamma$: discount factor
Utility function (expected gain):
$$Q(s, a) = R(s, a) + \gamma \, V^*(\delta(s, a))$$
Q-learning
Since $V^*(s') = \max_{a'} Q(s', a')$ and $s' = \delta(s, a)$:
$$Q(s, a) = R(s, a) + \gamma \, V^*(\delta(s, a))$$
$$\phantom{Q(s, a)} = R(s, a) + \gamma \max_{a'} Q(s', a')$$
$$\phantom{Q(s, a)} = R(s, a) + \gamma \max_{a'} Q(\delta(s, a), a')$$
$$\pi^*(s) = \arg\max_a Q(s, a)$$
Q-learning algorithm
$$Q(s, a) = R(s, a) + \gamma \max_{a'} Q(\delta(s, a), a')$$
$$\pi^*(s) = \arg\max_a Q(s, a)$$
initialize the table Q(s,a) to zero
observe the current state s
repeat
    choose an action a and execute it
    receive the reward r
    observe the new state s'
    update the table Q(s,a) as: $Q(s, a) := r + \gamma \max_{a'} Q(s', a')$
    s := s'
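The loop above can be tried on a toy problem. The sketch below assumes a hypothetical four-cell corridor in which only reaching the rightmost cell is rewarded; the environment, the constants, and the purely random choice of actions are illustrative assumptions, not part of the lecture. The update line is the deterministic rule from the slide, without a learning rate.

```python
import random

N_STATES = 4          # corridor cells 0..3; cell 3 is the goal (hypothetical toy world)
ACTIONS = (-1, +1)    # index 0: move left, index 1: move right
GAMMA = 0.9           # discount factor

def step(s, a):
    """Deterministic transition delta(s, a); returns (new state, reward)."""
    s2 = min(max(s + ACTIONS[a], 0), N_STATES - 1)
    return s2, (1.0 if s2 == N_STATES - 1 else 0.0)

def q_learning(episodes=50, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]   # initialize the table Q(s,a) to zero
    for _ in range(episodes):
        s = 0                                   # observe the current state
        while s != N_STATES - 1:                # an episode ends at the goal
            a = rng.randrange(len(ACTIONS))     # choose an action (pure exploration)
            s2, r = step(s, a)                  # execute it, receive r, observe s'
            q[s][a] = r + GAMMA * max(q[s2])    # Q(s,a) := r + gamma * max_a' Q(s',a')
            s = s2                              # s := s'
    return q

q = q_learning()
policy = [max(range(len(ACTIONS)), key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
print(policy)  # [1, 1, 1]: move right in every non-goal cell
```

Because Q-learning is off-policy, the table converges to the right values even though the actions are chosen at random; the greedy policy is read from the table only at the end.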
This was about behaviour, but what about knowledge?
Induction as generalization...
Version space learning
● Logical approach to binary classification
● Search on a predefined space of hypotheses: $H_1 \lor H_2 \lor \dots \lor H_n$
● You do not need to maintain exemplars!
[Figure: version space lying between a general (optimistic) boundary and a specific (pessimistic) boundary, with a positive example and a counter-example]
[Dubois, Vincent; Quafafou, Mohamed (2002). "Concept learning with approximation: Rough version spaces". RSCTC 2002.
Sverdlik, W.; Reynolds, R.G. (1992). "Dynamic version spaces in machine learning". TAI '92.]
Using a version space (candidate elimination algorithm)
represent the current example E
take the current hypothesis H from a version space
predict from the representation of H whether or not E exemplifies H
if correct then retain H
if incorrect then
    identify the differences between E and H
    use the selected differences to
    – generalize H if it is a positive instance
    – specialize H if it is a negative instance
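The generalize/specialize steps above can be sketched for the simplest hypothesis language: conjunctions of attribute values, where `"?"` means "any value". This is a minimal sketch under that assumption; the function names, the attribute domains, and the toy examples are hypothetical. It tracks the specific boundary `s` and the general boundary `g` rather than storing the examples.

```python
ANY = "?"

def covers(h, x):
    """True if hypothesis h classifies example x as positive."""
    return all(hv == ANY or hv == xv for hv, xv in zip(h, x))

def more_general(g, h):
    """True if g is at least as general as h."""
    return all(gv == ANY or gv == hv for gv, hv in zip(g, h))

def generalize(s, x):
    """Minimal generalization of s that covers the positive example x."""
    if s is None:                      # None stands for 'covers nothing'
        return tuple(x)
    return tuple(sv if sv == xv else ANY for sv, xv in zip(s, x))

def specialize(g, x, s, domains):
    """Minimal specializations of g that exclude x and still cover s."""
    out = []
    for i, gv in enumerate(g):
        if gv != ANY:
            continue
        for v in domains[i]:
            if v != x[i] and (s is None or s[i] == v):
                out.append(g[:i] + (v,) + g[i + 1:])
    return out

def candidate_elimination(examples, domains):
    s = None                                     # specific (pessimistic) boundary
    g = [tuple(ANY for _ in domains)]            # general (optimistic) boundary
    for x, positive in examples:
        if positive:
            g = [h for h in g if covers(h, x)]   # drop hypotheses that miss x
            s = generalize(s, x)
        else:
            if s is not None and covers(s, x):
                raise ValueError("no consistent conjunctive hypothesis")
            new_g = []
            for h in g:
                new_g.extend(specialize(h, x, s, domains) if covers(h, x) else [h])
            new_g = list(dict.fromkeys(new_g))   # remove duplicates, keep order
            g = [h for h in new_g
                 if not any(o != h and more_general(o, h) for o in new_g)]
    return s, g

domains = [("circle", "square"), ("red", "blue")]        # hypothetical attributes
examples = [(("circle", "red"), True),
            (("square", "red"), False),
            (("circle", "blue"), True)]
print(candidate_elimination(examples, domains))
# (('circle', '?'), [('circle', '?')]): both boundaries meet on "any circle"
```

When the two boundaries coincide, as here, the version space has collapsed to a single hypothesis.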
Machine learning
Machine learning
Machine learning is a process that enables artificial systems to improve with experience.
What are the criteria?
● Elements of a learning task
– Items of Experience: $i \in I$
– Available Actions: $a \in A$
– Evaluation: $v(a, I)$
– Performer System: $b : I \to A$
– Learning System: $L : (i_1, a_1, v_1) \dots (i_n, a_n, v_n) \to b$
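The elements listed above map naturally onto types. In the sketch below, the type aliases, the `learn` function, and the weather data are all hypothetical; the learner shown (keep, for each item, the action with the best evaluation seen so far) is just one very simple instance of a learning system $L$.

```python
from typing import Callable, Dict, Hashable, List, Tuple

Item = Hashable
Action = Hashable
Triple = Tuple[Item, Action, float]      # one experience (i, a, v)
Performer = Callable[[Item], Action]     # performer system b: I -> A

def learn(history: List[Triple], default: Action) -> Performer:
    """Learning system L: maps (i1, a1, v1) ... (in, an, vn) to a performer b."""
    best: Dict[Item, Tuple[float, Action]] = {}
    for i, a, v in history:
        if i not in best or v > best[i][0]:
            best[i] = (v, a)             # remember the best-evaluated action per item

    def b(i: Item) -> Action:
        return best[i][1] if i in best else default

    return b

history = [("cloudy", "umbrella", 1.0),
           ("cloudy", "no umbrella", -1.0),
           ("sunny", "no umbrella", 1.0)]
b = learn(history, default="no umbrella")
print(b("cloudy"))  # umbrella
```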
Types of learning problems
● batch or offline vs online learning
  training phase and testing vs learning while doing
● complete vs partial vs pointwise feedback
  feedback concerns all vs some vs one performer system
● passive vs active learning
  observation vs experimentation
● acausal or causal setting
  presence or not of side-effects: e.g. rain prediction vs behavioural control
● stationary vs non-stationary environment
  evaluation does or does not change in time