

  1. Lesson 5 – Low-level control and learning
     Anders Lyhne Christensen, D6.05, anders.christensen@iscte.pt
     INTRODUCTION TO AUTONOMOUS MOBILE ROBOTS

  2. Overview
     • Low-level control
       • Ad-hoc
       • Sense-think-act loop
       • Event-driven control
     • Learning and adaptation I
       • Types of learning
       • Issues in learning
       • Example: Q-learning applied to a real robot
     (Next time, we will discuss an interesting approach to learning called evolutionary robotics in more detail.)

  3. Low-level control
     We will cover three types of low-level control:
     • Stream of instructions
     • Classic control loop
     • Event-driven languages
     Other approaches, such as logic programming, exist, but we will not cover those in this course.

  4. Stream of instructions
     Example:

     // move forward for 2 seconds:
     moveForward(speed = 10)
     sleep(2000)
     if (obstacleAhead()) {
         turnLeft(speed = 10)
         sleep(1000)
     } else {
         …
     }

     • Suitable for industrial, assembly-line robots
     • Easy to describe a fixed, predefined task as a recipe
     • Little branching

  5. Classic control loop
     Sense → Think → Act
     The loop usually has a fixed duration, e.g. 100 ms, and is called repeatedly.

  7. Classic control loop

     while (!Button.ESCAPE.isPressed()) {
         long startTime = System.currentTimeMillis();
         sense();  // read sensors
         think();  // plan next action
         act();    // do next action
         try {
             // sleep for whatever remains of the 100 ms time slot
             // (guard against a negative argument if the loop body overran)
             Thread.sleep(Math.max(0, 100 - (System.currentTimeMillis() - startTime)));
         } catch (Exception e) {}
     }

  8. Event-driven languages
     URBI script – examples:

     Ball tracking:
     whenever (ball.visible) {
         headYaw.val   += camera.xfov * ball.x &
         headPitch.val += camera.yfov * ball.y
     };

     Interaction:
     at (speech.hear("hello")) {
         voice.say("How are you?") & robot.standup();
     };

  9. Distributed and event-driven
     [Diagram: a proximity-sensor microcontroller, a left-motor microcontroller, a right-motor microcontroller, and other modules, all connected through a shared event bus.]
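     A minimal sketch of this architecture in plain Java (the Event, EventBus, and handler names are illustrative assumptions, not part of URBI or any robot API): each module subscribes to a shared bus and reacts only when a relevant event arrives, instead of being polled in a central loop.

     import java.util.ArrayList;
     import java.util.List;
     import java.util.function.Consumer;

     // Hypothetical event: a named type with a numeric payload.
     record Event(String type, double value) {}

     // A tiny synchronous event bus: modules subscribe handlers, sensors publish events.
     class EventBus {
         private final List<Consumer<Event>> handlers = new ArrayList<>();

         void subscribe(Consumer<Event> handler) { handlers.add(handler); }

         void publish(Event e) {
             for (Consumer<Event> h : handlers) h.accept(e);
         }
     }

     class Demo {
         public static void main(String[] args) {
             EventBus bus = new EventBus();

             // "Left motor microcontroller": reacts only to obstacle events.
             bus.subscribe(e -> {
                 if (e.type().equals("OBSTACLE_LEFT")) {
                     System.out.println("left motor: slow down, obstacle at " + e.value() + " m");
                 }
             });

             // "Proximity sensor microcontroller": publishes instead of being polled.
             bus.publish(new Event("OBSTACLE_LEFT", 0.12));
         }
     }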

  10. LEARNING AND ADAPTATION
      (Based on slides from Prof. Lynn E. Parker)

  11. What is Learning/Adaptation?
      Many definitions:
      • "Modification of behavioral tendency by experience." (Webster 1984)
      • "A learning machine, broadly defined, is any device whose actions are influenced by past experiences." (Nilsson 1965)
      • "Any change in a system that allows it to perform better the second time on repetition of the same task or on another task drawn from the same population." (Simon 1983)
      • "An improvement in information processing ability that results from information processing activity." (Tanimoto 1990)
      Our operational definition:
      • Learning produces changes within an agent that over time enable it to perform more effectively within its environment.

  12. What is the Relationship between Learning and Adaptation?
      • Evolutionary adaptation: descendants change over long time scales based on the success or failure of their ancestors in the environment
      • Structural adaptation: agents adapt their morphology with respect to the environment
      • Sensor adaptation: an agent's perceptual system becomes more attuned to its environment
      • Behavioral adaptation: an agent's individual behaviors are adjusted relative to one another
      • Learning: essentially anything else that results in a more ecologically fit agent (can include adaptation)

  13. Habituation and Sensitization
      Adaptation may produce habituation or sensitization.
      • Habituation:
        • An eventual decrease in or cessation of a behavioral response when a stimulus is presented numerous times
        • Useful for eliminating spurious or unnecessary responses
        • Generally associated with relatively insignificant stimuli, such as a loud noise
      • Sensitization:
        • The opposite: an increase in the probability of a behavioral response when a stimulus is repeated frequently
        • Generally associated with "dire" stimuli, such as electrical shocks
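      As a rough illustration (my own sketch, not from the slides), both effects can be modeled as a response gain that is adjusted each time the same stimulus repeats:

      // Hypothetical model: a response gain that adapts to repeated stimuli.
      class AdaptiveResponse {
          private double gain = 1.0;

          // Habituation: repeated insignificant stimuli weaken the response.
          void habituate() { gain *= 0.9; }

          // Sensitization: repeated "dire" stimuli strengthen the response,
          // capped so the gain cannot grow without bound.
          void sensitize() { gain = Math.min(2.0, gain * 1.1); }

          double respond(double stimulusStrength) { return gain * stimulusStrength; }
      }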

  14. Learning
      Learning, on the other hand, can improve performance in additional ways:
      • Introducing new knowledge (facts, behaviors, rules) into the system
      • Generalizing concepts from multiple examples
      • Specializing concepts for particular instances that are in some way different from the mainstream
      • Reorganizing the information within the system to be more efficient
      • Creating or discovering new concepts
      • Creating explanations of how things function
      • Reusing past experiences

  15. AI Research has Generated Several Learning Approaches
      • Reinforcement learning: rewards and/or punishments are used to alter numeric values in a controller
      • Evolutionary learning: genetic operators such as crossover and mutation are used over populations of controllers, leading to more efficient control strategies
      • Neural networks: a form of reinforcement learning that uses specialized architectures in which learning occurs as the result of alterations in synaptic weights
      • Learning from experience:
        • Memory-based learning: myriad individual records of past experiences are used to derive function approximators for control laws
        • Case-based learning: specific experiences are organized and stored as a case structure, then retrieved and adapted as needed based on the current situational context

  16. Learning Approaches (cont'd.)
      • Inductive learning: specific training examples are used, each in turn, to generalize and/or specialize concepts or controllers
      • Explanation-based learning: specific domain knowledge is used to guide the learning process
      • Multistrategy learning: multiple learning methods compete and cooperate with each other, each specializing in what it does best

  17. Challenges with Learning
      • Credit assignment problem: how is credit or blame assigned to a particular piece or pieces of knowledge in a large knowledge base, or to the components of a complex system, responsible for the success or failure of an attempt to accomplish a task?
      • Saliency problem: which features in the available input stream are relevant to the learning task?
      • New term problem: when does a new representational construct (concept) need to be created to capture some useful feature effectively?
      • Indexing problem: how can a memory be efficiently organized to provide effective and timely recall to support learning and improved performance?
      • Utility problem: how does a learning system determine that the information it contains is still relevant and useful? When is it acceptable to forget things?

  18. Example: Q-Learning Algorithm
      • Provides the ability to learn by determining which behavioral actions are most appropriate for a given situation
      • State-action table Q(x, a): one row per state x, one column per action a; taking action a in state x leads to a next state y with utility E(y)

  19. Update function for Q(x,a)

      Q(x,a) ← Q(x,a) + β (r + λ E(y) − Q(x,a))

      Where:
      • β is the learning rate parameter
      • r is the payoff (reward or punishment)
      • λ is a parameter, called the discount factor, ranging between 0 and 1
      • E(y) is the utility of the state y that results from the action, computed as E(y) = max(Q(y,a)) over all actions a
      • Rewards are propagated across states so that rewards from similar states can facilitate learning, too
      • What is a "similar state"? One approach: weighted Hamming distance
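      A direct translation of this update into Java (the table sizes follow slide 21; the class and method names are my own):

      class QLearner {
          static final int NUM_STATES  = 1 << 18;  // one entry per 18-bit perceptual state
          static final int NUM_ACTIONS = 5;

          final double[][] Q = new double[NUM_STATES][NUM_ACTIONS];

          // E(y) = max over all actions a of Q(y, a)
          double utility(int y) {
              double best = Q[y][0];
              for (int a = 1; a < NUM_ACTIONS; a++) best = Math.max(best, Q[y][a]);
              return best;
          }

          // Q(x,a) <- Q(x,a) + beta * (r + lambda * E(y) - Q(x,a))
          void update(int x, int a, double r, int y, double beta, double lambda) {
              Q[x][a] += beta * (r + lambda * utility(y) - Q[x][a]);
          }
      }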

  20. Utility Function Used to Modify Robot's Behavioral Responses

      Initialize all Q(x,a) to 0.
      Do forever:
      • Determine the current world state x via sensing
      • 90% of the time, choose the action a that maximizes Q(x,a); otherwise pick a random action
      • Execute a
      • Determine the reward r
      • Update Q(x,a) as described
      • Update Q(x',a) for all states x' similar to x
      End Do
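      A sketch of this loop in Java, building on the QLearner above; sense(), executeAction(), and reward() are placeholders for robot-specific code, and the parameter values are illustrative:

      import java.util.Random;

      class QControlLoop {
          final QLearner learner = new QLearner();
          final Random rng = new Random();

          void run() {
              final double beta = 0.5, lambda = 0.9;   // assumed values
              int x = sense();                         // current world state
              while (true) {
                  int a;
                  if (rng.nextDouble() < 0.9) {
                      // 90% exploitation: the action that maximizes Q(x, a)
                      a = 0;
                      for (int i = 1; i < QLearner.NUM_ACTIONS; i++)
                          if (learner.Q[x][i] > learner.Q[x][a]) a = i;
                  } else {
                      // 10% exploration: a random action
                      a = rng.nextInt(QLearner.NUM_ACTIONS);
                  }

                  executeAction(a);
                  int y = sense();                     // resulting state
                  double r = reward(y);                // payoff for this step
                  learner.update(x, a, r, y, beta, lambda);
                  // (updating similar states x' is sketched after slide 24)
                  x = y;
              }
          }

          int sense() { /* robot-specific */ return 0; }
          void executeAction(int a) { /* robot-specific */ }
          double reward(int y) { /* robot-specific */ return 0.0; }
      }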

  21. Example of Using Q-Learning: Teaching Box-Pushing
      Robot (Obelix) [photo: Obelix robot and box, 1991]:
      • 8 sonars (4 look forward, 2 look right, 2 look left)
        • Sonar readings quantized into two ranges:
          • NEAR (from 9 to 18 inches)
          • FAR (from 18 to 30 inches)
      • Forward-looking infrared (IR): binary response at 4 inches to indicate when the robot is in the BUMP state
      • Current to the drive motors is monitored to determine if the robot is STUCK (i.e., input current exceeds a threshold)
      • Total of 18 bits of sensor information available: 16 sonar bits (NEAR, FAR), plus two for BUMP and STUCK
      • Motor control outputs, five choices:
        • Moving forward
        • Turning left 22 degrees
        • Turning right 22 degrees
        • Turning more sharply left at 45 degrees
        • Turning more sharply right at 45 degrees
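      One plausible way to pack these 18 bits into a single state index in Java (the exact bit layout is my assumption; the slide only gives the bit counts):

      // Pack 8 sonar readings (2 bits each, e.g. 0 = nothing, 1 = FAR, 2 = NEAR)
      // plus the BUMP and STUCK flags into one 18-bit state index.
      static int encodeState(int[] sonar, boolean bump, boolean stuck) {
          int state = 0;
          for (int i = 0; i < 8; i++) {
              state = (state << 2) | (sonar[i] & 0b11);  // 16 sonar bits
          }
          state = (state << 1) | (bump  ? 1 : 0);        // 17th bit: BUMP
          state = (state << 1) | (stuck ? 1 : 0);        // 18th bit: STUCK
          return state;                                  // 0 .. 2^18 - 1
      }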

  22. Robot's Learning Problem
      Learning problem: deciding, for any of the approximately 250,000 perceptual states (2^18 = 262,144), which of the 5 possible actions will enable it to find and push boxes around a room without getting stuck.

      250,000 perceptual states × 5 actions = 1,250,000 state/action pairs to explore!

  23. State Diagram of Behavior Transitions
      [Diagram: transitions between the three behaviors, triggered by BUMP, STUCK, and timeouts (Δt).]
      • Finder: moves the robot toward possible boxes
      • Pusher: takes over after a BUMP that results from finding a box
      • Unwedger: frees the robot when a box is no longer pushable (STUCK, or BUMP + Δt)
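      A minimal Java state machine for these behaviors; the transition conditions are my reading of the garbled diagram and should be taken as a sketch, not the authors' exact logic:

      enum Behavior { FINDER, PUSHER, UNWEDGER }

      class BehaviorSwitch {
          Behavior current = Behavior.FINDER;

          // bump and stuck come from the sensors; timedOut models the "+ Δt" arcs.
          void step(boolean bump, boolean stuck, boolean timedOut) {
              switch (current) {
                  case FINDER:
                      if (bump) current = Behavior.PUSHER;          // bump from a found box
                      else if (stuck) current = Behavior.UNWEDGER;
                      break;
                  case PUSHER:
                      if (stuck || (bump && timedOut)) current = Behavior.UNWEDGER;
                      break;
                  case UNWEDGER:
                      if (timedOut) current = Behavior.FINDER;      // otherwise keep unwedging
                      break;
              }
          }
      }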

  24. Measurement of "State Nearness"
      • Use the 18-bit representation of state (16 bits for sonar, two for BUMP and STUCK)
      • Compute the Hamming distance between states
      • Recall: Hamming distance = the number of bits in which the two states differ
      • For this example, states were considered "near" if their Hamming distance was < 3
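      With states packed into ints as sketched after slide 21, nearness is a one-liner in Java, and the "update similar states" step from slide 20 follows directly (iterating over all 2^18 states is written for clarity, not speed):

      // Hamming distance: number of differing bits between two 18-bit states.
      static boolean near(int x, int y) {
          return Integer.bitCount(x ^ y) < 3;  // "near" = fewer than 3 differing bits
      }

      // Update Q for every state x' considered near x.
      static void updateSimilar(QLearner learner, int x, int a, double r, int y,
                                double beta, double lambda) {
          for (int xp = 0; xp < QLearner.NUM_STATES; xp++) {
              if (xp != x && near(x, xp)) learner.update(xp, a, r, y, beta, lambda);
          }
      }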
