Beyond the Wall: Near-Data Processing for Databases - Sam Xi



SLIDE 1

Beyond the Wall: Near-Data Processing for Databases

Sam Xi, Ore Babarinsa, Manos Athanassoulis, Stratos Idreos

Harvard University

SLIDE 2

Memory Wall

SLIDE 3

Memory Wall

SLIDE 4

Row store | Column store

tuple | tuple

SLIDE 5

Memory-optimized data systems

SLIDE 6

Data access remains the bottleneck

SLIDE 7

[Image only; no text recovered]

SLIDE 8

σ Σ π

SLIDE 9

We are not the first to visit this pyramid!

SLIDE 10

Near-data processing

Intelligent RAM
DIVA
Logic-in-memory
Terasys
RADram

SLIDE 11

Why did NDP not take off?

                  DRAM    Logic
Leakage           Low     High
Switching speed   Slow    Fast

Fabrication processes are incompatible

SLIDE 12

Moore's Law + Dennard scaling
provided consistent performance scaling for years

Metric   Scaling factor
Area     1/κ²
Delay    1/κ
Power    1

Moore's Law. Dennard scaling.

Not the case anymore!
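
The scaling factors in the table follow from the classic constant-field (Dennard) scaling argument. The derivation below is a sketch under one assumption not stated on the slide: that the "Power" row refers to power density (power per unit area) rather than power per transistor. The symbols V, I, P, A, and t are introduced here for illustration.

```latex
\begin{align*}
\text{Area per transistor:}\quad  & A \;\to\; A/\kappa^{2} \\
\text{Gate delay:}\quad           & t \;\to\; t/\kappa \\
\text{Voltage and current:}\quad  & V \;\to\; V/\kappa, \qquad I \;\to\; I/\kappa \\
\text{Power per transistor:}\quad & P = VI \;\to\; P/\kappa^{2} \\
\text{Power density:}\quad        & \frac{P}{A} \;\to\; \frac{P/\kappa^{2}}{A/\kappa^{2}} = \frac{P}{A}
\end{align*}
```

Read this way, the scaling factor of 1 for power means that packing more transistors into the same area did not raise the power per unit area, as long as voltage kept scaling down with feature size.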

SLIDE 13

Ibex
HARP
Q100
Widx

Our approach

SLIDE 14

Outline

Intro
NDP for data systems: Past and present
The architecture of JAFAR
Experimental results
Conclusion

SLIDE 15

Opportunity for NDP

[Diagram labels: Host server, Database, Query, Lots of data]

Many rows fail the query predicate and are discarded.

Filter data before it is sent to CPU.
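
To make the data-movement argument concrete, here is a rough back-of-the-envelope sketch in C. It compares the bytes that cross the memory bus when the CPU scans a whole column against the bytes needed if only a filter result comes back. The one-bit-per-row result format and both function names are assumptions made for illustration; they are not taken from the slides.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes that cross the memory bus when the CPU reads every value itself. */
static size_t bytes_cpu_scan(size_t num_rows, size_t value_size) {
    return num_rows * value_size;
}

/* Bytes that cross the bus if only a result bitmask is returned.
 * ASSUMPTION: one bit per input row, rounded up to whole bytes. */
static size_t bytes_bitmask(size_t num_rows) {
    return (num_rows + 7) / 8;
}
```

For a column of 4,000,000 four-byte integers, `bytes_cpu_scan` gives 16,000,000 bytes while `bytes_bitmask` gives 500,000 bytes, a 32x reduction under these assumptions regardless of how many rows pass the predicate.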

SLIDE 16

JAFAR: “Just” A Filtering Accelerator on Relations

[Diagram labels: CPU (×4), Last level cache, System bus + memory controller, DRAM, JAFAR]

SLIDE 17

[Diagram labels: Rank, Chip, Bank 0 (×4), Sense Amps (×4), Row address decoder, Column address decoder]

SLIDE 18

[Diagram labels: Rank, Bank, Bank 0 (×4), Sense Amps (×4), Row address decoder, Column address decoder, Array 0, Array 1, Array 2, Array 3]

SLIDE 19

JAFAR: Overall design

[Diagram labels: CPU (×4), Last level cache, System bus + memory controller, DRAM, JAFAR]

SLIDE 20

JAFAR context

[Diagram labels: Bank 0 (×4), Sense Amps (×4), IO buffer, JAFAR, Memory access arbiter, From CPU, RAS, CAS]

SLIDE 21

JAFAR architecture

[Diagram labels: From IO buffer, Data latch, ALU (×2), Opcode (×2), Left, Right, Comparison is true?, write enable, Page offset counter, page offset bitmask, Output buffer]
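
The labels on this slide suggest a datapath in which each word from the IO buffer is latched, compared by two ALUs, and recorded in a bitmask at the current page offset. The C model below is a minimal software sketch of that reading, not the actual hardware design. The opcode set, the rule that both comparisons must hold, the 32-bit word width, and all type and function names are assumptions introduced for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* HYPOTHETICAL opcode set; the slide only shows an "Opcode" input per ALU. */
typedef enum { OP_LT, OP_LE, OP_GT, OP_GE, OP_EQ } jafar_op;

/* Software stand-in for one ALU: compare the latched word to an operand. */
static bool alu_compare(jafar_op op, int32_t word, int32_t operand) {
    switch (op) {
    case OP_LT: return word < operand;
    case OP_LE: return word <= operand;
    case OP_GT: return word > operand;
    case OP_GE: return word >= operand;
    case OP_EQ: return word == operand;
    }
    return false;
}

typedef struct {
    jafar_op left_op, right_op; /* one opcode per ALU */
    int32_t left, right;        /* the "Left" and "Right" operands */
    size_t page_offset;         /* models the page offset counter */
    uint8_t *out;               /* models the output buffer (bitmask) */
} jafar_model;

/* Process one latched word. ASSUMPTION: a row qualifies only if both
 * comparisons are true; the result acts as the write enable for the
 * bit at the current page offset. The counter advances on every word. */
static void jafar_step(jafar_model *m, int32_t word) {
    bool hit = alu_compare(m->left_op, word, m->left) &&
               alu_compare(m->right_op, word, m->right);
    if (hit) {
        m->out[m->page_offset / 8] |= (uint8_t)(1u << (m->page_offset % 8));
    }
    m->page_offset++;
}
```

With `left_op = OP_GE`, `left = 10`, `right_op = OP_LE`, and `right = 20`, feeding the words 5, 10, 20, 25 sets bits 1 and 2 of the first output byte, which corresponds to a range filter over the column.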

SLIDE 22

Programming JAFAR

int errno = select_jafar(
    void* col_data,
    int range_low,
    int range_high,
    uint8_t* out_buf,
    size_t num_input_rows,
    size_t* num_output_rows);
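
The slide gives only the call signature, so the following is a hedged sketch of how a caller might use such an interface. `select_jafar_sw` is a hypothetical software stand-in with the same parameter list, written so the example can run without the accelerator. It assumes a 32-bit integer column, an inclusive range, one result bit per input row in `out_buf`, and a return value of 0 on success. None of these details are confirmed by the slides. `bitmask_to_row_ids` is likewise a helper invented for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* HYPOTHETICAL software stand-in for the slide's select_jafar().
 * ASSUMPTIONS: int32 column, inclusive [range_low, range_high],
 * out_buf holds one bit per input row, returns 0 on success. */
static int select_jafar_sw(void *col_data, int range_low, int range_high,
                           uint8_t *out_buf, size_t num_input_rows,
                           size_t *num_output_rows) {
    if (!col_data || !out_buf || !num_output_rows) return -1;
    const int32_t *col = (const int32_t *)col_data;
    size_t hits = 0;
    memset(out_buf, 0, (num_input_rows + 7) / 8);
    for (size_t i = 0; i < num_input_rows; i++) {
        if (col[i] >= range_low && col[i] <= range_high) {
            out_buf[i / 8] |= (uint8_t)(1u << (i % 8));
            hits++;
        }
    }
    *num_output_rows = hits;
    return 0;
}

/* Caller side: expand the bitmask into a list of qualifying row ids.
 * row_ids must have room for every set bit. Returns the count written. */
static size_t bitmask_to_row_ids(const uint8_t *mask, size_t num_rows,
                                 size_t *row_ids) {
    size_t n = 0;
    for (size_t i = 0; i < num_rows; i++) {
        if (mask[i / 8] & (1u << (i % 8))) row_ids[n++] = i;
    }
    return n;
}
```

A caller would check the returned error code first, then use `num_output_rows` to size the buffer for `bitmask_to_row_ids`. The example stores the return value in a variable of its own choosing rather than `errno`, since `errno` is a macro once `<errno.h>` is included.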

SLIDE 23

Handling multiple modules

[Diagram labels: CPU (×4), Last level cache, System bus + memory controller, DRAM (×2), JAFAR (×2)]

SLIDE 24

Handling multiple modules

Fill up each module first

[Diagram labels: CPU (×4), Last level cache, System bus + memory controller, DRAM (×2), JAFAR (×2)]

SLIDE 25

Handling multiple modules

Interleave data across modules

[Diagram labels: CPU (×4), Last level cache, System bus + memory controller, DRAM (×2), JAFAR (×2)]
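
The two placement options for multiple modules (fill each module first, or interleave across modules) can be sketched as address-to-module mappings. The functions below are an illustration only: the function names, the flat physical address, and the `chunk_bytes` interleaving granularity are assumptions, and real memory controllers use more elaborate mappings.

```c
#include <stddef.h>
#include <stdint.h>

/* "Fill up each module first": consecutive addresses stay in one
 * module until its capacity (module_bytes) is exhausted. */
static size_t module_fill_first(uint64_t addr, uint64_t module_bytes) {
    return (size_t)(addr / module_bytes);
}

/* "Interleave data across modules": consecutive chunks rotate across
 * modules. chunk_bytes is a HYPOTHETICAL interleaving granularity. */
static size_t module_interleaved(uint64_t addr, uint64_t chunk_bytes,
                                 size_t num_modules) {
    return (size_t)((addr / chunk_bytes) % num_modules);
}
```

Under the first mapping a column that fits in one module is seen entirely by that module's accelerator. Under the second, each accelerator sees only every `num_modules`-th chunk of the column, so the units would each filter a slice and their outputs would need to be combined.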

SLIDE 26

Coordinating memory access

The CPU and JAFAR cannot simultaneously attempt to access memory.
The CPU grants JAFAR ownership of a DRAM rank for a period of time.
Possible mechanism: DRAM mode registers
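
The ownership handoff can be pictured as a time-limited lease on a rank. The sketch below models only that idea in software. The types, function names, and the cycle-based lease are assumptions for illustration, and it does not model DRAM mode registers or any real memory-controller protocol.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { OWNER_CPU, OWNER_JAFAR } rank_owner;

typedef struct {
    rank_owner owner;
    uint64_t lease_end; /* cycle at which ownership returns to the CPU */
} rank_state;

/* The CPU hands the rank to JAFAR for `cycles` cycles starting at `now`. */
static void grant_to_jafar(rank_state *r, uint64_t now, uint64_t cycles) {
    r->owner = OWNER_JAFAR;
    r->lease_end = now + cycles;
}

/* Advance time: ownership reverts to the CPU once the lease has expired. */
static void tick(rank_state *r, uint64_t now) {
    if (r->owner == OWNER_JAFAR && now >= r->lease_end) {
        r->owner = OWNER_CPU;
    }
}

/* Exactly one side may access the rank at any time. */
static bool cpu_may_access(const rank_state *r) {
    return r->owner == OWNER_CPU;
}

static bool jafar_may_access(const rank_state *r) {
    return r->owner == OWNER_JAFAR;
}
```

Because the owner is a single field, the model enforces the slide's first point by construction: the CPU and the accelerator can never both hold the rank.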

SLIDE 27

Experimental setup

Simulation framework

gem5
Out-of-order CPU
Classic cache model
SimpleDRAM

SLIDE 28

Experimental setup

Queries, input data, and database

In-house column store database
4 million rows of unsorted integers

1M

select * from table where column < n ;
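
For reference, a CPU-only version of the benchmark predicate `column < n` over an integer column might look like the scan below. This is a generic sketch, not the in-house database's code. The function name, the 32-bit value type, and the choice to return row positions are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Scan a column and record the positions of rows where value < n.
 * out_positions must have room for num_rows entries.
 * Returns the number of qualifying rows. */
static size_t select_less_than(const int32_t *column, size_t num_rows,
                               int32_t n, size_t *out_positions) {
    size_t count = 0;
    for (size_t i = 0; i < num_rows; i++) {
        if (column[i] < n) {
            out_positions[count++] = i;
        }
    }
    return count;
}
```

Every value in the column has to travel through the cache hierarchy to reach this loop, even though only the qualifying positions are kept, which is the cost a near-data filter aims to avoid.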

SLIDE 29

Experimental results

SLIDE 30

Memory contention

Scheduling of ownership transfers will be important.
What would JAFAR's performance look like without a scheduler?

SLIDE 31

Memory contention

[Diagram: CPU timeline. Memory requests | Idle period (JAFAR can execute) | Memory requests]
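
One simple way to estimate how much time such idle periods offer is to look at the gaps between consecutive CPU memory requests in a trace. The function below is an illustrative sketch, not the measurement method used in the talk. It assumes request timestamps are given in cycles, sorted in ascending order, and treats each request as instantaneous. `min_gap` is a hypothetical threshold below which a gap is considered too short to hand the rank over.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum the gaps between consecutive memory requests that are at least
 * min_gap cycles long. ASSUMPTIONS: req_cycles is sorted ascending and
 * each request takes zero time. */
static uint64_t usable_idle_cycles(const uint64_t *req_cycles, size_t n,
                                   uint64_t min_gap) {
    uint64_t total = 0;
    for (size_t i = 1; i < n; i++) {
        uint64_t gap = req_cycles[i] - req_cycles[i - 1];
        if (gap >= min_gap) {
            total += gap;
        }
    }
    return total;
}
```

For requests at cycles 0, 10, 15, 200, and 205 with `min_gap = 100`, only the 185-cycle gap counts, so the function returns 185.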

SLIDE 32

Idle periods on TPC-H

SLIDE 33

JAFAR as a framework

More operators

Aggregations
Projections
Sort
Joins

! ! ! ?

SLIDE 34

JAFAR as a framework

Data types and layouts

Row-stores and hybrids
    Multiple filters per row
    Efficient projections

Variable length datatypes
    Process on CPU?

SLIDE 35

NDP is an exciting opportunity for innovation in data systems

SLIDE 36

NDP is a promising solution to the memory wall for data systems.
JAFAR provides up to 9x speedup on simple select queries.
JAFAR is built on an extensible framework for accelerating data systems.

SLIDE 37

Thank you