Super Mario Bros. problem Input space 5 buttons per frame 24000 - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Super Mario Bros. problem

slide-2
SLIDE 2

Input space

◮ 5 buttons per frame
◮ 24000 frames
◮ $5^{24000} \approx 1.9 \times 10^{16775}$ possible input sequences

Exhaustive search won’t work here.
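The size of the input space can be checked with a few lines of Python. This is only an arithmetic sketch: it takes the slide's count of 5 choices per frame over 24000 frames as given and works with logarithms, since the number itself has more than sixteen thousand digits. The variable names are illustrative.

```python
import math

CHOICES_PER_FRAME = 5   # as counted on the slide
FRAMES = 24000

# Work with the base-10 logarithm instead of the huge integer itself.
log10_size = FRAMES * math.log10(CHOICES_PER_FRAME)
exponent = math.floor(log10_size)
mantissa = 10 ** (log10_size - exponent)

print(f"{CHOICES_PER_FRAME}^{FRAMES} is about {mantissa:.1f} x 10^{exponent}")
```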

slide-3
SLIDE 3

Tuning process

slide-5
SLIDE 5

Naive Representation

◮ Bad, because most configurations make no sense.
◮ Just mashing random buttons.
◮ Doesn’t work at all (Video 1).

1 http://youtu.be/nyYdq1jJQrw
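To make the problem with the naive representation concrete, here is a minimal sketch that draws one independent choice per frame. The button names are hypothetical (the slides only say there are 5 per frame), and this is not the code used in the talk.

```python
import random

# Hypothetical names for the 5 per-frame choices.
BUTTONS = ["left", "right", "run", "jump", "none"]
FRAMES = 24000

def random_naive_run(rng):
    """Pick one independent button choice for every frame."""
    return [rng.choice(BUTTONS) for _ in range(FRAMES)]

run = random_naive_run(random.Random(0))

# Fraction of frames whose input differs from the previous frame.
changes = sum(a != b for a, b in zip(run, run[1:])) / (FRAMES - 1)
print(f"input changes on {changes:.0%} of frames")
```

With five equally likely choices, the input changes on roughly four frames out of five, so a random point in this space looks like button mashing rather than a sustained run or jump.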

slide-8
SLIDE 8

Better Representation

◮ Movements (list):
    ◮ Direction (left, right, run left, or run right)
    ◮ Duration (frames)
◮ Jumps (list):
    ◮ Start frame
    ◮ Duration (frames)

Choosing the right representation is critical

◮ Search space size $10^{6328}$
◮ Winning run found in 13641 ($\approx 10^4$) attempts
◮ Under 5 minutes of training time
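The movement and jump lists above might be modelled as follows. This is a sketch under assumptions: the class names, the `expand` helper, and the choice to clip jumps at the end of the last movement are all illustrative and not taken from the talk.

```python
from dataclasses import dataclass

@dataclass
class Movement:
    direction: str   # "left", "right", "run left", or "run right"
    duration: int    # frames

@dataclass
class Jump:
    start_frame: int
    duration: int    # frames

def expand(movements, jumps):
    """Turn the compact lists into one set of pressed buttons per frame."""
    frames = []
    for move in movements:
        buttons = move.direction.split()   # "run left" -> ["run", "left"]
        frames.extend(set(buttons) for _ in range(move.duration))
    for jump in jumps:
        start = max(0, jump.start_frame)
        # Assumption: jumps running past the last movement frame are clipped.
        end = min(jump.start_frame + jump.duration, len(frames))
        for f in range(start, end):
            frames[f].add("jump")
    return frames

frames = expand(
    [Movement("run right", 3), Movement("left", 2)],
    [Jump(start_frame=1, duration=3)],
)
```

Every point in this space decodes to sustained movements with occasional jumps, so the tuner searches over plausible play rather than per-frame noise.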

slide-9
SLIDE 9

Super Mario Bros Results

[Figure: OpenTuner results. x-axis: Autotuning Time (seconds), 60 to 300; y-axis: Pixels Moved Right (Progress), 1000 to 3500; the "Win Level" threshold is marked.]

slide-10
SLIDE 10

StreamJIT

Synchronous dataflow programs are graphs of (mostly) stateless workers with statically-known data rates. Using the data rates, the compiler can compute a schedule of worker executions, fuse workers and introduce buffers to remove synchronization, then choose a combination of data, task and pipeline parallelism to fit the machine.
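As an illustration of how static data rates determine a schedule, the sketch below solves the balance equations for a linear pipeline: for each pair of adjacent workers, the items pushed upstream must equal the items popped downstream. This is the standard synchronous-dataflow calculation restricted to a pipeline, not StreamJIT's scheduler; the function name and the example rates are hypothetical.

```python
from fractions import Fraction
from functools import reduce
from math import gcd

def pipeline_repetitions(rates):
    """rates: list of (pop, push) pairs, one per worker, in pipeline order.

    Returns the smallest positive integer execution counts that leave every
    buffer between adjacent workers with as many items as it started with.
    """
    reps = [Fraction(1)]
    for (_, push), (pop, _) in zip(rates, rates[1:]):
        # Balance equation: reps[upstream] * push == reps[downstream] * pop
        reps.append(reps[-1] * push / pop)
    # Scaling by the LCM of the denominators gives the smallest integer solution.
    scale = reduce(lambda a, b: a * b // gcd(a, b),
                   (r.denominator for r in reps), 1)
    return [int(r * scale) for r in reps]

# Hypothetical pipeline: A pushes 2 per run, B pops 3 and pushes 1, C pops 1.
counts = pipeline_repetitions([(1, 2), (3, 1), (1, 1)])
```

Here `counts` is `[3, 2, 2]`: three runs of A produce six items, which two runs of B consume. Multiplying every count by the same factor is also a valid schedule.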

[Figure: example stream graph annotated with data rates. Workers, from input to output: LowPassFilter, FMDemodulator, DuplicateSplitter, six parallel branches (each a DuplicateSplitter, two LowPassFilters, a RoundrobinJoiner, a Subtractor and an Amplifier), RoundrobinJoiner, Summer.]

slide-11
SLIDE 11

Fusion, data-parallel fission and splitter/joiner removal

[Figure: stream graph of Expand, BandPass, BandStop, Process, Compress and Adder workers, illustrating fusion, data-parallel fission and splitter/joiner removal.]

slide-12
SLIDE 12

Autotuning

StreamJIT delegates its optimization decisions to OpenTuner, which decides

◮ an overall schedule multiplier (to amortize synchronization)
◮ whether to fuse workers
◮ whether to remove splitters and joiners
◮ buffer implementations
◮ how to allocate fused groups to cores

slide-14
SLIDE 14

Autotuning work allocation

Equal distribution across all cores is usually the best, but we need to load-balance around stateful workers.

◮ Bitset per worker, one bit per core: exponentially hard to get equal distribution (all bits set).
◮ Array of floats summing to 1.0, one float per core: allows load-balancing, but equal distribution is even harder.
◮ Permutation of cores, total count, bias count and bias fraction: equal division across cores, biased for load balancing.
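A rough sketch of why the first two representations struggle to reach the equal distribution. It assumes the tuner samples each representation uniformly at random, which is a simplification of how a real search behaves; the function names are illustrative.

```python
import random

def all_bits_set_probability(cores):
    """Chance that a uniformly random bitset has every core's bit set."""
    return 0.5 ** cores

def random_float_allocation(cores, rng):
    """Random per-core fractions, normalized so they sum to 1.0."""
    raw = [rng.random() for _ in range(cores)]
    total = sum(raw)
    return [x / total for x in raw]

print(all_bits_set_probability(16))   # one chance in 65536 on 16 cores
alloc = random_float_allocation(8, random.Random(1))
print(max(alloc) - min(alloc))        # spread between busiest and idlest core
```

The bitset hits the all-cores case with probability 2^-n, and the float array essentially never lands on exactly equal shares, whereas the permutation-based encoding starts from an equal division by construction.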

slide-15
SLIDE 15

Bias fraction work allocation

Use the first "count" cores of the permutation, moving the "bias fraction" of the work away from the first "bias count" cores. This doesn’t cover all possibilities, but it covers the good ones.
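One possible reading of this scheme, sketched in Python. The slide does not say where the moved work goes, so this sketch assumes it is spread evenly over the remaining selected cores; the function and parameter names are hypothetical and this is not StreamJIT's implementation.

```python
def allocate(permutation, count, bias_count, bias_fraction):
    """Return {core: share of the work}; count must be at least 1."""
    used = permutation[:count]
    share = {core: 1.0 / count for core in used}   # start from an equal split
    biased = used[:bias_count]
    others = used[bias_count:]
    if biased and others:
        moved = 0.0
        for core in biased:
            delta = share[core] * bias_fraction
            share[core] -= delta
            moved += delta
        # Assumption: moved work is divided evenly among the other used cores.
        for core in others:
            share[core] += moved / len(others)
    return share

shares = allocate([2, 0, 3, 1], count=4, bias_count=1, bias_fraction=0.5)
```

With a bias count of 0 the result is the plain equal distribution, so the common best case is a single parameter value away.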

slide-16
SLIDE 16

Custom techniques

StreamJIT uses custom techniques that force the obvious defaults.

Other techniques make some good and some bad changes:

↑-↓--↑-↓↑↑-↓

Custom techniques will then force some of the bad changes back:

↑----↑-↓↑↑--

The bandit will learn to stop using the custom techniques when they stop working, or for unusual graphs where the obvious defaults are bad.
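A minimal sketch of what "forcing the defaults" could look like on a configuration stored as a dictionary. The parameter names and default values are hypothetical, and the helper is illustrative only; it is not StreamJIT's or OpenTuner's API.

```python
def force_defaults(config, defaults, keys):
    """Return a copy of config with the given parameters reset to defaults."""
    forced = dict(config)
    for key in keys:
        forced[key] = defaults[key]
    return forced

# Hypothetical parameters and defaults.
defaults = {"fuse_a": True, "fuse_b": True, "remove_splitter": True}
# A configuration proposed by some other technique.
proposed = {"fuse_a": False, "fuse_b": True, "remove_splitter": False}

# The custom technique resets a subset of parameters and leaves the rest alone.
forced = force_defaults(proposed, defaults, ["fuse_a"])
```

Because the result is proposed as an ordinary candidate configuration, the technique-selection step can stop choosing it once it no longer yields improvements.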