Distributed Watchpoints: Debugging Very Large Ensembles of Robots



Slide 1

Distributed Watchpoints: Debugging Very Large Ensembles of Robots
De Rosa, Goldstein, Lee, Campbell, Pillai
Aug 19, 2006

Slide 2

Motivation

  • Distributed errors are hard to find with traditional debugging tools
  • Centralized snapshot algorithms
    – Expensive
    – Geared towards detecting one error at a time
  • Special-purpose debugging code is difficult to write, and may itself contain errors

Slide 3

Expressing and Detecting Distributed Conditions

“How can we represent, detect, and trigger on distributed conditions in very large multi-robot systems?”

  • Generic detection framework, well suited to debugging
  • Detect conditions that are not observable via the local state of any one robot
  • Support algorithm-level debugging (not code/HW debugging)
  • Trigger arbitrary actions when a condition is met
  • Target asynchronous, bandwidth/CPU-limited systems

Slide 4

Distributed/Parallel Debugging: State of the Art

Modes:

  • Parallel: powerful nodes, regular (static) topology, shared memory
  • Distributed: weak, mobile nodes

Tools:

  • GDB
  • printf()
  • Race detectors
  • Declarative network systems with debugging support (à la P2)
Slide 5

Example Errors: Leader Election

Scenario: One Leader Per Two-Hop Radius

Slide 6

Example Errors: Token Passing

Scenario: If a node has the token, exactly one of its neighbors must have had it in the last timestep
Slide 7

Example Errors: Gradient Field

Scenario: Gradient Values Must Be Smooth

Slide 8

Expressing Distributed Error Conditions

Requirements:

  • Ability to specify shape of trigger groups
  • Temporal operators
  • Simple syntax (reduce programmer effort/learning curve)

A Solution:

  • Inspired by Linear Temporal Logic (LTL); see the sketch below
    – A simple extension to first-order logic
    – A proven technique for single-robot debugging [Lamine01]
  • Assumption: trigger groups must be connected
    – For practical/efficiency reasons
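
The slides name temporal operators but do not show how they are implemented. As a minimal sketch in Python (the Module/commit API and a synchronous timestep model are assumptions for illustration, not the authors' implementation), the prev operator used on the following slides needs only one timestep of history, kept by snapshotting each module's state variables at the end of every step:

    # One-timestep history for the "prev" operator (illustrative sketch;
    # the Module/commit API is assumed, not taken from the paper).
    class Module:
        def __init__(self, **state):
            self.state = dict(state)  # current state variables
            self.prev = dict(state)   # snapshot from the previous timestep

        def commit(self):
            # Call once per timestep, after the module updates its state.
            self.prev = dict(self.state)

    m = Module(token=1)
    m.commit()                    # end of timestep t
    m.state["token"] = 0          # timestep t + 1
    assert m.prev["token"] == 1   # what "m.prev.token == 1" would test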

Slide 9

Watchpoint Primitives

  • Modules (implicitly quantified over all connected sub-ensembles)
  • Topological restrictions (pairwise neighbor relations)
  • Boolean connectives
  • State variable comparisons (distributed)
  • Temporal operators

nodes(a,b,c); n(b,c) & (a.var > b.var) & (c.prev.var != 2)
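
To make these primitives concrete, the following Python sketch (illustrative only: the topology, state tables, and connected() helper are assumptions, not the authors' matcher) evaluates the expression above over every ordered, connected three-module binding of a toy ensemble:

    # Evaluate "nodes(a,b,c); n(b,c) & (a.var > b.var) & (c.prev.var != 2)"
    # over all connected 3-module bindings of a toy ensemble.
    from itertools import permutations

    neighbors = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}  # toy chain topology
    var      = {1: 5, 2: 1, 3: 2, 4: 0}                 # current values
    prev_var = {1: 5, 2: 2, 3: 3, 4: 2}                 # values one timestep ago

    def connected(group):
        # True if the modules in `group` induce a connected subgraph.
        group = set(group)
        seen, stack = set(), [next(iter(group))]
        while stack:
            m = stack.pop()
            if m not in seen:
                seen.add(m)
                stack.extend((neighbors[m] & group) - seen)
        return seen == group

    matches = []
    for a, b, c in permutations(neighbors, 3):  # implicit quantification
        if not connected((a, b, c)):            # trigger groups must be connected
            continue
        if c in neighbors[b] and var[a] > var[b] and prev_var[c] != 2:
            matches.append((a, b, c))

    print(matches)  # each tuple is a binding that triggers the watchpoint

A real implementation would not enumerate every tuple this way; the execution slide and the optimizations in the backup slides address that cost.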

Slide 10

Distributed Errors: Example Watchpoints

nodes(a,b,c); n(a,b) & n(b,c) & (a.isLeader == 1) & (c.isLeader == 1)
(leader election: two leaders within a two-hop radius)

nodes(a,b,c); n(a,b) & n(a,c) & (a.token == 1) & (b.prev.token == 1) & (c.prev.token == 1)
(token passing: more than one neighbor held the token last timestep)

nodes(a,b); (a.state - b.state > 1)
(gradient field: neighboring values differ by more than 1)
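
In the illustrative style of the previous sketch (the state/prev tables and the helper n() are assumptions, not the paper's API), each of these watchpoints can be encoded as a group size plus a predicate over a binding:

    # The three example watchpoints as (group_size, predicate) pairs,
    # matching the expressions above. State tables are assumed inputs.
    def make_checks(neighbors, state, prev):
        n = lambda x, y: y in neighbors[x]
        return [
            # leader election: two leaders within a two-hop radius
            (3, lambda a, b, c: n(a, b) and n(b, c)
                and state[a]["isLeader"] == 1 and state[c]["isLeader"] == 1),
            # token passing: a has the token, but two neighbors had it last step
            (3, lambda a, b, c: n(a, b) and n(a, c)
                and state[a]["token"] == 1
                and prev[b]["token"] == 1 and prev[c]["token"] == 1),
            # gradient field: neighboring values differ by more than 1
            (2, lambda a, b: state[a]["state"] - state[b]["state"] > 1),
        ]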

Slide 11

Watchpoint Execution

nodes(a,b,c)…

[Animation: matcher bindings for nodes(a,b,c) advancing across an ensemble of 32 numbered modules]
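
A strategy consistent with this animation (an assumption; the deck does not spell out the matcher algorithm) is to grow candidate groups outward through neighbor links, so only connected groups are ever generated:

    # Enumerate each connected k-module group once by growing it outward
    # through neighbor links (assumed strategy, inferred from the slide).
    from itertools import permutations

    def connected_groups(neighbors, k):
        found = set()  # deduplicates groups reached in different orders
        def grow(group, frontier):
            if len(group) == k:
                found.add(frozenset(group))
                return
            for m in frontier:
                grow(group | {m}, (frontier | neighbors[m]) - group - {m})
        for start in neighbors:
            grow({start}, set(neighbors[start]))
        return found

    chain = {i: {i - 1, i + 1} & set(range(1, 9)) for i in range(1, 9)}
    for group in connected_groups(chain, 3):
        for a, b, c in permutations(group):
            pass  # evaluate the watchpoint body for the binding (a, b, c)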

Slide 12

Performance: Watchpoint Size

  • 1000 modules, running for 100 timesteps
  • Simulator overhead excluded
  • Application: data aggregation with landmark routing
  • Watchpoint: are the first and last robots in the watchpoint in the same state?
Slide 13

Performance: Number of Matchers

  • This particular watchpoint never terminates early
  • The number of matchers increases exponentially
  • Time per matcher stays within a factor of 2
  • The details of the watchpoint expression matter more than its size
Slide 14

Performance: Periodically Running Watchpoints

Slide 15

Future Work

  • Distributed implementation
  • More optimization
  • User validation
  • Additional predicates
Slide 16

Conclusions

  • Simple, yet highly descriptive syntax
  • Able to detect errors missed by more conventional techniques
  • Low simulation overhead
Slide 17

Thank You

Slide 18

Backup Slides

Slide 19

Optimizations

  • Temporal span
  • Early termination (sketched below)
  • Neighbor culling
  • (one backup slide per optimization)
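
The per-optimization backup slides are not captured in this transcript. As an illustration of early termination only (assumed semantics, not the authors' definition), a matcher can evaluate each conjunct as soon as all of its variables are bound, abandoning a partial binding on the first failure:

    # Early-termination sketch (assumed semantics): test each conjunct as
    # soon as its variables are bound, pruning failed partial bindings
    # before they are extended further.
    def viable(partial, conjuncts):
        # partial: dict of bound variables; conjuncts: (vars_needed, test) pairs
        for needed, test in conjuncts:
            if needed <= partial.keys() and not test(partial):
                return False  # no extension of this binding can match
        return True

    var = {1: 5, 2: 7}
    conjuncts = [({"a", "b"}, lambda bind: var[bind["a"]] > var[bind["b"]])]
    print(viable({"a": 1}, conjuncts))          # True: conjunct not yet testable
    print(viable({"a": 1, "b": 2}, conjuncts))  # False: pruned immediately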