

SLIDE 1

INTRODUCTION TO AUTONOMOUS MOBILE ROBOTS

Lesson 5 – Low-level control and learning
Anders Lyhne Christensen, D6.05, anders.christensen@iscte.pt

SLIDE 2

Overview

Low-level control

  • Ad-hoc
  • Sense-think-act loop
  • Event-driven control

Learning and adaptation I

  • Types of learning
  • Issues in learning
  • Example: Q-learning applied to a real robot

(next time, we will discuss an interesting approach to learning called evolutionary robotics in more detail)

SLIDE 3

Low-level control

We will cover three types of low-level control:

  • Stream of instructions
  • Classic control loop
  • Event-driven languages

Other approaches such as logic programming exist, but we will not cover those in this course.

SLIDE 4

Stream of instructions

Example:

// move forward for 2 seconds:
moveForward(speed = 10)
sleep(2000)
if (obstacleAhead()) {
    turnLeft(speed = 10)
    sleep(1000)
} else {
    …
}

Suitable for industrial, assembly-line robots:

  • Easy to describe a fixed, predefined task as a recipe
  • Little branching

SLIDE 5

Classic control loop

Sense → Think → Act

The loop usually has a fixed duration, e.g. 100 ms, and is called repeatedly.


SLIDE 7

Classic control loop

while (!Button.ESCAPE.isPressed()) {
    long startTime = System.currentTimeMillis();
    sense(); // read sensors
    think(); // plan next action
    act();   // do next action
    try {
        Thread.sleep(100 - (System.currentTimeMillis() - startTime));
    } catch (Exception e) {}
}

SLIDE 8

Event-driven languages

URBI script – examples:

Ball tracking:

whenever (ball.visible) {
    headYaw.val += camera.xfov * ball.x &
    headPitch.val += camera.yfov * ball.y
};

Interaction:

at (speech.hear("hello")) {
    voice.say("How are you?") &
    robot.standup();
}

SLIDE 9

Distributed and event-driven

[Diagram: right-motor, left-motor, and proximity-sensor microcontrollers, each connected to a shared event bus]
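A minimal sketch of the event-bus idea in Java; the class, the string-typed event names, and the Consumer-based handler API are illustrative assumptions, not a specific robot middleware:

import java.util.*;
import java.util.function.Consumer;

// Minimal event bus: microcontroller proxies subscribe to named events
// and react independently when another component publishes one.
class EventBus {
    private final Map<String, List<Consumer<Object>>> handlers = new HashMap<>();

    void subscribe(String event, Consumer<Object> handler) {
        handlers.computeIfAbsent(event, k -> new ArrayList<>()).add(handler);
    }

    void publish(String event, Object payload) {
        for (Consumer<Object> h : handlers.getOrDefault(event, List.of()))
            h.accept(payload); // each subscriber reacts on its own
    }
}

// Usage (hypothetical names): the proximity-sensor microcontroller
// publishes "obstacle", and each motor controller reacts independently:
//   bus.subscribe("obstacle", d -> leftMotor.stop());
//   bus.publish("obstacle", distance);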

SLIDE 10

LEARNING AND ADAPTATION

Based on slides from Prof. Lynn E. Parker

SLIDE 11

What is Learning/Adaptation?

Many definitions:

  • "Modification of behavioral tendency by experience." (Webster 1984)
  • "A learning machine, broadly defined, is any device whose actions are influenced by past experiences." (Nilsson 1965)
  • "Any change in a system that allows it to perform better the second time on repetition of the same task or on another task drawn from the same population." (Simon 1983)
  • "An improvement in information processing ability that results from information processing activity." (Tanimoto 1990)

Our operational definition: Learning produces changes within an agent that over time enable it to perform more effectively within its environment.

SLIDE 12

What is the Relationship between Learning and Adaptation?

  • Evolutionary adaptation: Descendants change over long time scales based on the success or failure of their ancestors in the environment
  • Structural adaptation: Agents adapt their morphology with respect to the environment
  • Sensor adaptation: An agent’s perceptual system becomes more attuned to its environment
  • Behavioral adaptation: An agent’s individual behaviors are adjusted relative to one another
  • Learning: Essentially anything else that results in a more ecologically fit agent (can include adaptation)

SLIDE 13

Habituation and Sensitization

Adaptation may produce habituation or sensitization.

  • Habituation: An eventual decrease in or cessation of a behavioral response when a stimulus is presented numerous times
    - Useful for eliminating spurious or unnecessary responses
    - Generally associated with relatively insignificant stimuli, such as a loud noise
  • Sensitization: The opposite, an increase in the probability of a behavioral response when a stimulus is repeated frequently
    - Generally associated with “dire” stimuli, like electrical shocks

[Figures: sensitization; an example of habituation]
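To make the distinction concrete, here is a toy sketch in Java of a habituating response; the exponential-decay model and all numbers are illustrative assumptions, not from the slides:

public class Habituation {
    public static void main(String[] args) {
        double response = 1.0;    // initial response strength
        final double decay = 0.8; // per-repetition habituation factor (assumed)
        for (int i = 1; i <= 5; i++) {
            System.out.printf("stimulus %d -> response %.2f%n", i, response);
            response *= decay;    // habituate: respond less each repetition
        }
        // Sensitization would be the reverse: a factor > 1 (with an upper
        // bound), so a repeated "dire" stimulus amplifies the response.
    }
}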

SLIDE 14

Learning

Learning, on the other hand, can improve performance in additional ways:

  • Introducing new knowledge (facts, behaviors, rules) into the system
  • Generalizing concepts from multiple examples
  • Specializing concepts for particular instances that are in some way different from the mainstream
  • Reorganizing the information within the system to be more efficient
  • Creating or discovering new concepts
  • Creating explanations of how things function
  • Reusing past experiences

SLIDE 15

AI Research has Generated Several Learning Approaches

  • Reinforcement learning: Rewards and/or punishments are used to alter numeric values in a controller
  • Evolutionary learning: Genetic operators such as crossover and mutation are used over populations of controllers, leading to more efficient control strategies
  • Neural networks: A form of reinforcement learning that uses specialized architectures in which learning occurs as the result of alterations in synaptic weights
  • Learning from experience:
    - Memory-based learning: Myriad individual records of past experiences are used to derive function approximators for control laws
    - Case-based learning: Specific experiences are organized and stored as a case structure, then retrieved and adapted as needed based on the current situational context

SLIDE 16

Learning Approaches (cont.)

  • Inductive learning: Specific training examples are used, each in turn, to generalize and/or specialize concepts or controllers
  • Explanation-based learning: Specific domain knowledge is used to guide the learning process
  • Multistrategy learning: Multiple learning methods compete and cooperate with each other, each specializing in what it does best

SLIDE 17

Challenges with Learning

  • Credit assignment problem: How is credit or blame assigned to a particular piece or pieces of knowledge in a large knowledge base, or to the components of a complex system, responsible for either the success or failure of an attempt to accomplish a task?
  • Saliency problem: What features in the available input stream are relevant to the learning task?
  • New term problem: When does a new representational construct (concept) need to be created to capture some useful feature effectively?
  • Indexing problem: How can a memory be efficiently organized to provide effective and timely recall to support learning and improved performance?
  • Utility problem: How does a learning system determine that the information it contains is still relevant and useful? When is it acceptable to forget things?

SLIDE 18

Example: Q-Learning Algorithm

Provides the ability to learn by determining which behavioral actions are most appropriate for a given situation.

State-action table: rows are states x and columns are actions a; each entry records the resulting next state y and its utility E(y).

SLIDE 19

Update function for Q(x,a)

Q(x,a) ← Q(x,a) + β(r + λE(y) − Q(x,a))

where:

  • β is the learning rate parameter
  • r is the payoff (reward or punishment)
  • λ is the discount factor, ranging between 0 and 1
  • E(y) is the utility of the state y that results from the action, computed as E(y) = max over all actions a of Q(y,a)

Rewards are propagated across states so that rewards from similar states can facilitate learning, too.

What is a “similar state”? One approach: weighted Hamming distance.
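A minimal sketch of this update rule in Java, assuming states and actions are coded as small integers; the class name, table sizes, and parameter values are illustrative assumptions:

public class QLearner {
    static final int NUM_STATES = 262144; // e.g. 2^18 perceptual states (assumed)
    static final int NUM_ACTIONS = 5;
    final double[][] q = new double[NUM_STATES][NUM_ACTIONS]; // Q(x,a), starts at 0
    final double beta = 0.1;   // learning rate β (assumed value)
    final double lambda = 0.9; // discount factor λ (assumed value)

    // E(y) = max over all actions a of Q(y,a)
    double utility(int y) {
        double best = q[y][0];
        for (int a = 1; a < NUM_ACTIONS; a++) best = Math.max(best, q[y][a]);
        return best;
    }

    // One update after taking action a in state x, receiving payoff r,
    // and observing the resulting state y.
    void update(int x, int a, double r, int y) {
        q[x][a] += beta * (r + lambda * utility(y) - q[x][a]);
    }
}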

SLIDE 20

Utility Function Used to Modify Robot’s Behavioral Responses

Initialize all Q(x,a) to 0.
Do forever:

  • Determine the current world state x via sensing
  • 90% of the time, choose the action a that maximizes Q(x,a); otherwise pick a random action (a sketch of this step follows the list)
  • Execute a
  • Determine the reward r
  • Update Q(x,a) as described
  • Update Q(x',a) for all states x' similar to x

End Do
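A sketch of the 90/10 action-selection step, written as a method that could sit in the QLearner sketch above; the method name and the use of java.util.Random are illustrative assumptions:

// Choose the greedy action 90% of the time, a random action otherwise.
// (rng is a java.util.Random instance.)
int chooseAction(int x, double[][] q, java.util.Random rng) {
    if (rng.nextDouble() < 0.9) {      // exploit: best-known action
        int best = 0;
        for (int a = 1; a < q[x].length; a++)
            if (q[x][a] > q[x][best]) best = a;
        return best;
    }
    return rng.nextInt(q[x].length);   // explore: uniformly random action
}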

SLIDE 21

Example of Using Q-Learning: Teaching Box-Pushing

  • Robot (Obelix):
    - 8 sonars (4 look forward, 2 look right, 2 look left)
    - Sonar readings quantized into two ranges: NEAR (9–18 inches) and FAR (18–30 inches)
    - Forward-looking infrared (IR): binary response at 4 inches, indicating when the robot is in a BUMP state
    - Current to the drive motors is monitored to determine if the robot is STUCK (i.e., input current exceeds a threshold)
  • Total of 18 bits of sensor information available: 16 sonar bits (NEAR, FAR), two for BUMP and STUCK (see the encoding sketch below)
  • Motor control outputs, five choices:
    - Moving forward
    - Turning left 22 degrees
    - Turning right 22 degrees
    - Turning more sharply left at 45 degrees
    - Turning more sharply right at 45 degrees

[Photo: Obelix robot and box, 1991]
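One plausible way to pack those 18 bits into a single integer state index for the Q-table, sketched in Java; the bit layout and function name are illustrative assumptions:

// Pack Obelix's 18 bits of sensor data into one int state index.
int encodeState(boolean[] sonarBits, boolean bump, boolean stuck) {
    int state = 0;
    for (int i = 0; i < 16; i++)
        if (sonarBits[i]) state |= 1 << i; // 16 sonar NEAR/FAR bits
    if (bump)  state |= 1 << 16;           // BUMP flag
    if (stuck) state |= 1 << 17;           // STUCK flag
    return state;                          // one of 2^18 = 262,144 states
}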

SLIDE 22

Robot’s Learning Problem

  • Learning problem: deciding, for any of the approximately 250,000 perceptual states, which of the 5 possible actions will enable the robot to find and push boxes around a room without getting stuck
  • 18 bits of sensor data give 2^18 = 262,144 ≈ 250,000 perceptual states
  • 250,000 perceptual states × 5 actions = 1,250,000 state/action pairs to explore!

SLIDE 23

State Diagram of Behavior Transitions

[State diagram: transitions among Finder, Pusher, and Unwedger, triggered by BUMP, STUCK, BUMP + ∆t, STUCK + ∆t, and “anything else”]

  • Finder: moves the robot toward possible boxes
  • Pusher: takes over after a BUMP results from finding a box
  • Unwedger: extricates the robot when a box is no longer pushable

SLIDE 24

Measurement of “State Nearness”

  • Use the 18-bit representation of state (16 bits for sonar, two for BUMP and STUCK)
  • Compute the Hamming distance between states
  • Recall: the Hamming distance is the number of bits in which the two states differ
  • For this example, states were considered “near” if the Hamming distance was < 3 (see the sketch after this list)
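With states packed into ints as in the earlier encoding sketch, the nearness test is short in Java; the method names are illustrative:

// Hamming distance: count the bits in which two packed states differ.
int hamming(int stateA, int stateB) {
    return Integer.bitCount(stateA ^ stateB);
}

// "Near" as used in the Obelix example: fewer than 3 differing bits.
boolean near(int a, int b) {
    return hamming(a, b) < 3;
}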

SLIDE 25

Robotic Results

The Q-learning strategy was tested on the Obelix robot.

Observations:

  • Using Q-learning over a random agent substantially improved box pushing
  • Performance was also compared to a hand-coded solution: the Q-learning approach was close to or better than the hand-coded solution

Importance of this work: its empirical demonstration of Q-learning’s feasibility as a useful approach to behavior-based robotic learning.

SLIDE 26

Summary of Learning/Adaptation so far

  • Robots need to learn in order to adapt effectively to a changing and dynamic environment
  • Behavior-based robots can learn in a variety of ways:
    - They can learn entire new behaviors
    - They can learn more effective responses
    - They can learn to associate more appropriate or broader stimuli with a particular response
    - They can learn new combinations of behaviors
    - They can learn more effective coordination of existing behaviors
  • Learning can be either continuous (on-line) or batch (off-line)
  • Q-learning is a form of reinforcement learning in which actions and states are evaluated together

SLIDE 27

A Challenge: Getting RL to Work on Real Robots

When is learning appropriate?

  • When the task is originally under-specified or difficult to code exactly by hand
  • When the task has parameters that are likely to change over time in unpredictable ways
  • When the time taken to learn a control policy is less than the time needed to hand-code a comparable policy
  • When the learned policy can be executed more efficiently than a hand-coded one

SLIDE 28

Problems with RL on Robots

  • Huge number of states to explore, with a large number of possible actions in each state
    - E.g., 24 sonar sensors quantized into 3 range bands give 3^24 ≈ 282 billion possible states
    - If the possible actions in each state are to go forwards or backwards, that gives over 560 billion state-action combinations to try
  • The robot is physical, so it takes time to perform an action
    - At 1 second per action, it would take roughly 20,000 years to try each combination
  • During early learning, the robot’s actions may be dangerous
    - “Let’s try rolling down the stairwell to see what next state I end up in …”
    - One possible safeguard: give the robot reflexes to stop dangerous actions

SLIDE 29

Today’s task

Work on projects