RoboX: An End-to-End Solution to Accelerate Autonomous Control in Robotics



slide-1
SLIDE 1

RoboX

An End-to-End Solution to Accelerate Autonomous Control in Robotics

Jacob Sacks, Divya Mahajan, Richard C. Lawson, Hadi Esmaeilzadeh†

Alternative Computing Technologies (ACT) Lab, Georgia Institute of Technology
†University of California, San Diego

ISCA ’18, Los Angeles, California

slide-2
SLIDE 2

Challenges in Autonomous Robotics

  • Compute-intensive
  • Battery constraints
  • Limited power budget
  • Many diverse applications

slide-3
SLIDE 3

Challenges in Autonomous Robotics

[Figure: flight time; labels: CPU, Mobile Processor]

slide-4
SLIDE 4

Challenges in Autonomous Robotics

[Figure: labels: Flight Time, Power]

slide-5
SLIDE 5

Accelerating Planning and Control

Model Predictive Control

slide-6
SLIDE 6

RoboX Workflow

  • Concise mathematical description
  • Automatically synthesize DFG
  • Statically schedule on accelerator

Domain-Specific Language

System Quadrotor( ) {
  state position[3], angle[3];
  input torque[4];
  ...
  Task takeOff() {
    penalty target_height;
    constraint max_height;
    ...
  }
}

Macro Dataflow Graph

Program Translator, Controller Compiler

Statically-Scheduled Instructions

Computation Schedule, Communication Schedule, Memory Schedule

slide-7
SLIDE 7

Background: System Models

[Figure: rotorcraft model with thrusts f1, f2, f3, f4 and angles yaw (ɸ), roll (ψ), pitch (θ)]

slide-8
SLIDE 8

$\dot{x} = f(x, u)$

where $x$ denotes the states, $u$ the inputs, and $\dot{x}$ the time derivative of the states.

Background: Dynamics and Constraints

General nonlinear dynamics State and input constraints

[Figure: rotorcraft model with thrusts f1, f2, f3, f4 and angles yaw (ɸ), pitch (θ), roll (ψ)]

$\underline{x} \le x \le \overline{x}, \qquad \underline{u} \le u \le \overline{u}$

slide-9
SLIDE 9

Background: Objective Function

slide-10
SLIDE 10

Background: Objective Function

slide-11
SLIDE 11

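Background: Objective Function

$J = \phi_{\text{term}}(x(t_f)) + \int_{t_0}^{t_f} \phi_{\text{run}}(x(t), u(t)) \, dt$

where $\phi_{\text{term}}$ is the terminal cost and $\phi_{\text{run}}$ is the running cost.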
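To make the split between terminal and running cost concrete, the sketch below evaluates a discretized version of such an objective along a sampled trajectory. It is only an illustration: the Riemann-sum discretization, the quadratic example costs, the time step `dt`, and every function name are assumptions made for this example and are not taken from the slides.

```python
from typing import Callable, Sequence

State = Sequence[float]
Control = Sequence[float]


def total_cost(
    states: Sequence[State],
    controls: Sequence[Control],
    running_cost: Callable[[State, Control], float],
    terminal_cost: Callable[[State], float],
    dt: float,
) -> float:
    """Approximate J = terminal cost + integral of running cost (Riemann sum)."""
    if len(states) != len(controls) + 1:
        raise ValueError("expected exactly one more state than control")
    # Running cost accumulated over every (state, control) pair in the horizon.
    running = sum(running_cost(x, u) * dt for x, u in zip(states, controls))
    # Terminal cost evaluated only at the final state.
    return terminal_cost(states[-1]) + running


# Hypothetical quadratic costs for a 1-D system driven toward the origin.
def example_running(x: State, u: Control) -> float:
    return x[0] ** 2 + 0.1 * u[0] ** 2


def example_terminal(x: State) -> float:
    return 10.0 * x[0] ** 2


states = [[1.0], [0.5], [0.0]]
controls = [[-1.0], [-1.0]]
J = total_cost(states, controls, example_running, example_terminal, dt=0.5)
print(J)
```

An MPC solver would minimize a quantity of this kind over the control sequence, subject to the dynamics and constraints.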

slide-12
SLIDE 12

Components of MPC

Model Predictive Control
  • Objective Function
  • Dynamics
  • Input Constraints
  • State Constraints

slide-13
SLIDE 13

Domain-Specific Language

Aims of RoboX DSL
  • Distill MPC into modular components
  • Remain close to mathematical expressions
  • Independent of implementation

DSL constructs: System, Task, symbolic expressions, group operations

slide-14
SLIDE 14

DSL: System Component

[Figure: mobile robot with axes x, y, z; labels: position (pos[0], pos[1]), angle (θ), vel (v), ang_vel (⍵)]

System MobileRobot( ) {
  state pos[2];
  state angle;
  input vel;
  input ang_vel;
  …
}

slide-15
SLIDE 15

DSL: System Component

[Figure: mobile robot with axes x, y, z; labels: position (pos[0], pos[1]), angle (θ), vel (v), ang_vel (⍵)]

System MobileRobot( ) {
  state pos[2];
  state angle;
  input vel;
  input ang_vel;
  pos[0].dt = vel * cos(angle);
  pos[1].dt = vel * sin(angle);
  angle.dt = ang_vel;
  …
}
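The `.dt` assignments in the System block define the time derivative of each state. As a rough illustration of what those three equations compute, the sketch below transcribes them into Python and rolls them forward in time. The forward-Euler integrator, the step size, and the function names are assumptions of this example; the slides do not say how RoboX discretizes the dynamics.

```python
import math


def mobile_robot_dynamics(state, inputs):
    """Time derivatives from the System MobileRobot block.

    state  = (pos0, pos1, angle)
    inputs = (vel, ang_vel)
    """
    _, _, angle = state
    vel, ang_vel = inputs
    return (vel * math.cos(angle), vel * math.sin(angle), ang_vel)


def euler_step(state, inputs, dt):
    """One forward-Euler step (the integrator choice is an assumption)."""
    derivs = mobile_robot_dynamics(state, inputs)
    return tuple(s + dt * d for s, d in zip(state, derivs))


def rollout(initial_state, input_sequence, dt):
    """Simulate the robot over a sequence of inputs; returns every visited state."""
    trajectory = [tuple(initial_state)]
    for inputs in input_sequence:
        trajectory.append(euler_step(trajectory[-1], inputs, dt))
    return trajectory


# Drive straight along the x-axis at 1 unit/s for 10 steps of 0.1 s.
trajectory = rollout((0.0, 0.0, 0.0), [(1.0, 0.0)] * 10, dt=0.1)
print(trajectory[-1])
```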

slide-16
SLIDE 16

DSL: Task Component

System MobileRobot(...) {
  Task moveTo(…) {
    penalty target_x, target_y;
    target_x.running = pos[0] - desired_x;
    target_y.running = pos[1] - desired_y;
    …
  }
}

slide-17
SLIDE 17

DSL: Task Component

System MobileRobot(...) {
  Task moveTo(…) {
    penalty target_x, target_y;
    target_x.running = pos[0] - desired_x;
    target_y.running = pos[1] - desired_y;
    constraint pos_bound;
    pos_bound.running = sqrt(pos[0]^2 + pos[1]^2);
    pos_bound.upper_bound <= radius;
  }
}
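The penalties and the constraint in the Task block are plain symbolic expressions over the states. The sketch below evaluates them numerically for a single position to show what each term measures. Combining the penalties as an unweighted sum of squares is an assumption of this example (the slides do not state how penalties enter the objective), and all function names are hypothetical.

```python
import math


def move_to_penalties(pos, desired):
    """Running penalties target_x and target_y from Task moveTo."""
    return (pos[0] - desired[0], pos[1] - desired[1])


def pos_bound_value(pos):
    """Constraint expression pos_bound.running = sqrt(pos[0]^2 + pos[1]^2)."""
    return math.hypot(pos[0], pos[1])


def pos_bound_satisfied(pos, radius):
    """True when the robot stays within `radius` of the origin."""
    return pos_bound_value(pos) <= radius


def squared_penalty_cost(penalties):
    # Assumption: penalties are combined as an unweighted sum of squares.
    return sum(p * p for p in penalties)


pos = (3.0, 4.0)
penalties = move_to_penalties(pos, desired=(0.0, 0.0))
print(penalties, squared_penalty_cost(penalties))
print(pos_bound_value(pos), pos_bound_satisfied(pos, radius=5.0))
```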

slide-18
SLIDE 18

RoboX Accelerator Architecture

[Figure: accelerator block diagram; labels: compute units (CU), Compute Cluster 1 through Compute Cluster N, Programmable Memory Access Engine, Global LD/ST Buffer, Memory µCode, Global µCode Buffer, Shifter, Bus µCode]

Flexible dataflow architecture organized as a two-level hierarchy to handle a large number of data dependencies

slide-19
SLIDE 19

RoboX Accelerator Architecture

Compute-enabled interconnect to perform simple operations on in-transit data

slide-20
SLIDE 20

RoboX Accelerator Architecture

Each compute cluster executes separate compute and communication microprograms and can operate in a SIMD mode

[Figure: compute cluster with compute units CU 1 through CU N, Bus µCode, and Comp µCode]

slide-21
SLIDE 21

RoboX Accelerator Architecture

Compute units do not initiate communication requests but consume data from single-hop connections and a shared bus

[Figure: compute units CU 1 through CU N]

slide-22
SLIDE 22

RoboX Accelerator Architecture

The compute unit is a three-stage pipeline and divides its memory into separate buffers to simplify communication scheduling

[Figure: compute unit; labels: State Buffer, Input Buffer, Gradient Buffer, Hessian Buffer, Interm Buffer, Nonlinear, Nonlinear, Neighbor (Right), Neighbor (Left)]

slide-23
SLIDE 23

RoboX Accelerator Architecture

Programmable memory access engine prefetches instructions and data according to its own statically-scheduled microprogram

[Figure: labels: Programmable Memory Access Engine, Global LD/ST Buffer, Memory µCode, Global µCode Buffer, Shifter, Bus µCode]

slide-24
SLIDE 24

Instruction Set Architecture

Compute Instructions: Scalar, SIMD
Communication Instructions: Data Transfer, In-Network
Memory Instructions: Load, Store

slide-25
SLIDE 25

Program Translator

  • States and inputs
  • Dynamics function
  • Objective function
  • Automatic differentiation for necessary gradients

Parameterized Solver Template

Domain-Specific Language
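The slide lists automatic differentiation as the way the Program Translator obtains the gradients the solver needs. As a minimal sketch of the general idea, the example below differentiates the `pos[0].dt = vel * cos(angle)` expression from the MobileRobot system using forward-mode dual numbers. Forward mode is chosen here only for brevity; the slides do not say which differentiation mode RoboX uses, and the `Dual` class and helper names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """A value paired with its derivative with respect to one chosen variable."""
    value: float
    deriv: float = 0.0

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __sub__(self, other):
        other = _lift(other)
        return Dual(self.value - other.value, self.deriv - other.deriv)

    def __mul__(self, other):
        other = _lift(other)
        # Product rule: (a * b)' = a * b' + a' * b
        return Dual(self.value * other.value,
                    self.value * other.deriv + self.deriv * other.value)

    __rmul__ = __mul__


def _lift(x):
    """Treat plain numbers as constants (zero derivative)."""
    return x if isinstance(x, Dual) else Dual(float(x))


def cos(x: Dual) -> Dual:
    return Dual(math.cos(x.value), -math.sin(x.value) * x.deriv)


def sin(x: Dual) -> Dual:
    return Dual(math.sin(x.value), math.cos(x.value) * x.deriv)


def pos0_dt(vel, angle):
    """pos[0].dt = vel * cos(angle), as in the MobileRobot system."""
    return vel * cos(angle)


# Seed angle with derivative 1 to get d(pos[0].dt)/d(angle) = -vel * sin(angle).
out = pos0_dt(Dual(2.0), Dual(math.pi / 2, 1.0))
print(out.value, out.deriv)
```

Seeding a different variable with derivative 1 yields the partial derivative with respect to that variable, so one pass per state or input fills in a row of the Jacobian.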

slide-26
SLIDE 26

Controller Compiler

  • Computation Instruction Schedule
  • Communication Instruction Schedule
  • Memory Instruction Schedule

[Figure: accelerator block diagram; labels: compute units (CU), Compute Cluster 1 through Compute Cluster N, Programmable Memory Access Engine, Global LD/ST Buffer, Memory µCode, Global µCode Buffer, Shifter, Bus µCode]

Decode Mapping and Scheduling
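Mapping and scheduling a dataflow graph onto a fixed set of compute units can be pictured with a simple greedy list scheduler. The sketch below is not the RoboX compiler's algorithm, which the slides do not describe; it assumes every operation takes one cycle, ignores communication and memory costs, and uses hypothetical node names for the MobileRobot dynamics.

```python
def list_schedule(deps, num_units):
    """Greedy static schedule of a dataflow graph.

    deps maps each node to the list of nodes it depends on.
    Returns {node: (cycle, unit)}. Assumes one cycle per operation.
    """
    remaining = {node: set(preds) for node, preds in deps.items()}
    finished = set()
    schedule = {}
    cycle = 0
    while remaining:
        # A node is ready once all of its predecessors finished in earlier cycles.
        ready = sorted(n for n, preds in remaining.items() if preds <= finished)
        if not ready:
            raise ValueError("dependency cycle or unknown predecessor")
        issued = ready[:num_units]
        for unit, node in enumerate(issued):
            schedule[node] = (cycle, unit)
            del remaining[node]
        finished.update(issued)
        cycle += 1
    return schedule


# Dataflow graph for pos[0].dt = vel * cos(angle) and pos[1].dt = vel * sin(angle).
deps = {"cos": [], "sin": [], "mul0": ["cos"], "mul1": ["sin"]}

two_units = list_schedule(deps, num_units=2)
one_unit = list_schedule(deps, num_units=1)
print(two_units)
print(one_unit)
```

With two units the four operations finish in two cycles; with one unit they take four, which illustrates why the schedule is computed for a specific hardware configuration.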

slide-27
SLIDE 27

Benchmarks

Name         System                  Task
MobileRobot  Two-Wheel Mobile Robot  Trajectory Tracking
Manipulator  Two-Link Manipulator    Reaching
AutoVehicle  Four-Wheel Vehicle      High-Speed Racing
MicroSat     Miniature Satellite     Orbit Control
Quadrotor    Four-Rotor Micro UAV    Motion Planning
Hexacopter   Six-Rotor Micro UAV     Attitude Control

[Further column header: # States; values not present in the extracted text]

slide-28
SLIDE 28

Platforms

Type  Platform        Class
CPU   ARM Cortex A57  Low Power
CPU   Intel Xeon E3   High Performance
GPU   Tegra X2        Low Power
GPU   GTX 650 Ti      Desktop Class
GPU   Tesla K40       High Performance

slide-29
SLIDE 29

Evaluation

[Figure: speedup per benchmark (MobileRobot, AutoVehicle, MicroSat, Quadrotor, Manipulator, Hexacopter, Geomean); legend: ARM, Xeon, RoboX; y-axis from 0X to 40X; bar labels 79X and 65X]

On average, RoboX achieves a 29.4X and 7.3X speedup over the ARM A57 and Xeon E3, respectively

slide-30
SLIDE 30

Evaluation

[Figure: speedup per benchmark (MobileRobot, AutoVehicle, MicroSat, Quadrotor, Manipulator, Hexacopter, Geomean); legend: GTX 650 Ti, Tegra X2, Tesla K40, RoboX; y-axis from 0X to 4X]

On average, RoboX achieves a 2.0X and 3.5X speedup over the GTX and Tegra, respectively, and is 1.3X slower than the Tesla

slide-31
SLIDE 31

Evaluation

[Figure: performance-per-watt per benchmark (MobileRobot, AutoVehicle, MicroSat, Quadrotor, Manipulator, Hexacopter, Geomean); legend: GTX 650 Ti, Tegra X2, Tesla K40, RoboX; logarithmic y-axis from 0.1X to 100X]

On average, RoboX achieves a 65.5X, 7.9X, and 71.8X performance-per-watt improvement over the GTX, Tegra, and Tesla, respectively
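The speedup and performance-per-watt averages can be related by simple arithmetic. If performance-per-watt is throughput divided by power, then the performance-per-watt gain equals the speedup multiplied by the ratio of baseline power to RoboX power. The sketch below back-computes that implied power ratio from the averages quoted on the GPU evaluation slides. It assumes both averages are geometric means over the same benchmarks, so the ratios multiply cleanly, and it works from rounded figures; the resulting numbers are a back-of-the-envelope derivation, not values reported in the slides.

```python
# Averages quoted on the evaluation slides. "1.3X slower than the Tesla"
# is expressed as a speedup of 1 / 1.3.
speedup = {"GTX 650 Ti": 2.0, "Tegra X2": 3.5, "Tesla K40": 1 / 1.3}
perf_per_watt_gain = {"GTX 650 Ti": 65.5, "Tegra X2": 7.9, "Tesla K40": 71.8}


def implied_power_ratio(platform):
    """Baseline power divided by RoboX power, under the assumptions stated above."""
    return perf_per_watt_gain[platform] / speedup[platform]


for name in speedup:
    print(f"{name}: roughly {implied_power_ratio(name):.1f}x the power of RoboX")
```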

slide-32
SLIDE 32

Conclusion

  • Deliver significant performance and energy gains while abstracting away details of controls, optimization, and hardware
  • First step towards enabling full-stack solutions for robotics from high-level mathematical specifications
  • Domain-general acceleration solution by leveraging algorithmic understanding of robotics
