HICAMP: Architectural Support for Efficient Concurrency-Safe Shared Structured Data Access

SLIDE 1

HICAMP: Architectural Support for Efficient Concurrency-Safe Shared Structured Data Access

Cheriton et al., ASPLOS 2012

Yoongu Kim, 11/18/2013

SLIDE 2

INTRODUCTION

SLIDE 3

Intro: Shared Data

[Figure: a 4GB DRAM (0GB to 4GB) containing a private region for Thread 1, a private region for Thread 2, and a region of shared data used by both threads.]

SLIDE 4

Intro: Shared Data

[Figure: the shared data in the 4GB DRAM is a shared array a[0], a[1], ..., a[999], used by both Thread 1 and Thread 2; each thread still has its own private region.]

SLIDE 5

Problem: Concurrent Accesses

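Thread 1 (read access to shared array):
    for (i = 0; i < 1000; i++) sum = sum + a[i];

Thread 2 (write access to shared array):
    a[900] = -1;

CONFLICT! The final value of sum depends on whether Thread 2's write lands before or after Thread 1 reads a[900].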
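
The two possible outcomes can be replayed deterministically in a single thread (racing real threads on a plain int array would be a data race, which is undefined behavior in C++). This is a sketch under a hypothetical setup: every element of a starts at 1, and initial_array and sum_with_write_at are illustrative helpers, not part of HICAMP. sum_with_write_at applies Thread 2's write just before Thread 1's loop reaches a chosen index.

```cpp
#include <array>
#include <cassert>

// Hypothetical setup: the shared array holds 1000 ints, all equal to 1.
std::array<int, 1000> initial_array() {
    std::array<int, 1000> a{};
    a.fill(1);
    return a;
}

// Runs Thread 1's summing loop, applying Thread 2's write (a[900] = -1)
// just before the loop reaches index write_point. write_point = 0 means the
// write happened first; write_point = 1000 means it happened after the loop.
int sum_with_write_at(int write_point) {
    std::array<int, 1000> a = initial_array();
    int sum = 0;
    for (int i = 0; i < 1000; i++) {
        if (i == write_point) a[900] = -1;  // Thread 2's write lands here
        sum = sum + a[i];
    }
    return sum;
}
```

If the write lands at or before index 900, Thread 1 computes 998; if it lands after, Thread 1 computes 1000. Which one happens depends only on timing.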

SLIDE 6

Traditional Solutions are Expensive

Solution #1: Lock

– Only one thread can access the shared data: the thread that holds the lock

  • But what if the shared data is very large?

– Example: a bank database
– While an auditing thread accesses the bank database, all other threads starve until it releases the lock

  • No deposits/withdrawals for any customer
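
A minimal sketch of Solution #1 for the bank example, assuming one std::mutex guards the entire (hypothetical) database; the Bank class and its deposit() and audit() methods are illustrative names. Because audit() holds the lock for its whole scan, every deposit() issued in the meantime has to wait.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <numeric>
#include <vector>

// Hypothetical bank database protected by a single coarse-grained lock.
class Bank {
public:
    explicit Bank(std::size_t customers) : balances_(customers, 0) {}

    // Deposits (negative amounts are withdrawals) also need the lock,
    // so they block for as long as an audit is running.
    void deposit(std::size_t customer, long amount) {
        std::lock_guard<std::mutex> guard(lock_);
        balances_.at(customer) += amount;
    }

    // The auditing thread holds the lock for the entire scan; with a very
    // large database, all other threads wait until it finishes.
    long audit() const {
        std::lock_guard<std::mutex> guard(lock_);
        return std::accumulate(balances_.begin(), balances_.end(), 0L);
    }

private:
    mutable std::mutex lock_;
    std::vector<long> balances_;
};
```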

SLIDE 7

Traditional Solutions are Expensive

Solution #2: Transaction

– Speculatively allow multiple threads to access the shared data concurrently
– If lucky: no conflict
– If unlucky: undo the changes to the shared data and retry

  • But what if a transaction is very long?

– Almost certain to be unlucky: a long transaction is very likely to conflict with some other thread
– Undoing/retrying a long transaction is wasteful
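
Hardware transactional memory is not expressible in portable C++, so as a rough software analogue (not a transaction system) the sketch below uses an optimistic compare-and-swap loop: read the shared value, compute speculatively, and retry if another thread changed it in the meantime. The function name optimistic_update and its wasted-attempt counter are illustrative assumptions.

```cpp
#include <atomic>
#include <cassert>
#include <functional>

// Optimistic (speculative) update of a shared value: compute a new value
// from the one we read, and install it only if nobody changed the shared
// value in the meantime; otherwise discard the work and retry.
// Returns how many attempts were thrown away.
int optimistic_update(std::atomic<long>& shared,
                      const std::function<long(long)>& compute) {
    int wasted_attempts = 0;
    long seen = shared.load();
    // On failure, compare_exchange_strong copies the current value into
    // `seen`, so the next attempt recomputes from fresh data.
    while (!shared.compare_exchange_strong(seen, compute(seen))) {
        ++wasted_attempts;
    }
    return wasted_attempts;
}
```

Here the computation is instantaneous, so a conflict is unlikely. The longer compute runs between the load and the compare-and-swap, the wider the window in which another thread can change the value and force a wasted attempt.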

SLIDE 8

Throughput vs. Number of Cores

[Figure: throughput vs. number of cores (12, 24, 36, 48) for the ideal and the actual case; the actual throughput falls short of the ideal, leaving a gap.]

Sharing is the root of all evil

SLIDE 9

Boyd-Wickizer et al., OSDI’10

[Figure: before and after expert hand-tuning: throughput of six workloads, each shown before and after tuning, with the gap marked.]

SLIDE 10

Alternative Solution: “Snapshotting”

[Figure: Thread 1 and Thread 2, both accessing the shared data.]

SLIDE 11

Alternative Solution: “Snapshotting”

[Figure: Thread 1 and Thread 2, both accessing the shared data.]

SLIDE 12

Alternative Solution: “Snapshotting”

[Figure: Thread 1 and Thread 2 with two copies of the shared data.]

SLIDE 13

Alternative Solution: “Snapshotting”

[Figure: Thread 1 and Thread 2 with two copies of the shared data, plus New Data.]
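
One naive way to get snapshots in software, sketched under assumptions: the shared data is a std::vector<int>, and SnapshotStore, snapshot(), and write() are hypothetical names. A reader takes a reference-counted pointer to the current immutable version and keeps reading it without holding a lock; a writer copies the whole vector, changes the copy (the New Data), and publishes it as the current version. The full copy in write() is the kind of cost that the next slide asks how to avoid.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical snapshot-based container: each published version is immutable.
class SnapshotStore {
public:
    using Version = std::shared_ptr<const std::vector<int>>;

    explicit SnapshotStore(std::vector<int> initial)
        : current_(std::make_shared<std::vector<int>>(std::move(initial))) {}

    // Readers grab the current version; the lock is held only while
    // copying the pointer, not while reading the data.
    Version snapshot() const {
        std::lock_guard<std::mutex> guard(lock_);
        return current_;
    }

    // Writers copy the entire current version (the naive, expensive part),
    // modify the copy, and publish it as the new current version.
    void write(std::size_t index, int value) {
        std::lock_guard<std::mutex> guard(lock_);
        auto next = std::make_shared<std::vector<int>>(*current_);
        next->at(index) = value;
        current_ = std::move(next);
    }

private:
    mutable std::mutex lock_;
    Version current_;
};
```

An old snapshot stays valid for as long as some reader still holds it; std::shared_ptr frees it once the last reader lets go.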

SLIDE 14

Key Question

How to make memory “snapshots” cheap?

  • Naïve approaches are very expensive
  • 1. Performance waste: copying data
  • 2. Capacity waste: duplicate data
  • A better approach: HICAMP

– Provides hardware support for “snapshots” while incurring only small overheads

SLIDE 15

HICAMP: THE BASICS

SLIDE 16

What is HICAMP?

  • 1. Hierarchical
  • 2. Immutable
  • 3. Content-Addressable Memory
  • 4. Processor

SLIDE 17
1. ‘H’ of HICAMP: “Hierarchical”

[Figure, Non-Hierarchical: Data1, Data2, and Data3 stored at Addr1, Addr2, and Addr3 in a flat 0GB to 4GB address space.]

SLIDE 18
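1. ‘H’ of HICAMP: “Hierarchical”

[Figure, Non-Hierarchical vs. Hierarchical. Non-Hierarchical: Data1, Data2, and Data3 at Addr1, Addr2, and Addr3 (0GB to 4GB). Hierarchical: the same three data lines placed in memory in the order Data1, Data3, Data2 (0GB to 4GB).]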

SLIDE 19

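1. ‘H’ of HICAMP: “Hierarchical”

[Figure, Non-Hierarchical vs. Hierarchical. Non-Hierarchical: Data1, Data2, and Data3 at Addr1, Addr2, and Addr3 (0GB to 4GB). Hierarchical: Data1, Data3, and Data2, plus a new line holding A1 and A2, the addresses of Data1 and Data2.]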

SLIDE 20

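1. ‘H’ of HICAMP: “Hierarchical”

[Figure, Non-Hierarchical vs. Hierarchical. Non-Hierarchical: Data1, Data2, and Data3 at Addr1, Addr2, and Addr3 (0GB to 4GB). Hierarchical: Data1, Data3, and Data2, plus the line holding A1 and A2 (the addresses of Data1 and Data2), which is itself stored at Addr4.]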

SLIDE 21

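1. ‘H’ of HICAMP: “Hierarchical”

[Figure, Non-Hierarchical vs. Hierarchical. Non-Hierarchical: Data1, Data2, and Data3 at Addr1, Addr2, and Addr3 (0GB to 4GB). Hierarchical: Data1, Data3, Data2, the line A1 A2 at Addr4, and a new line holding A4 and A3, which points to that line and to Data3.]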

SLIDE 22
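1. ‘H’ of HICAMP: “Hierarchical”

[Figure, Non-Hierarchical vs. Hierarchical. Non-Hierarchical: Data1, Data2, and Data3 at Addr1, Addr2, and Addr3 (0GB to 4GB). Hierarchical: Data1, Data3, Data2, the line A1 A2, and the line A4 A3 above it.]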

SLIDE 23
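1. ‘H’ of HICAMP: “Hierarchical”

[Figure, Non-Hierarchical vs. Hierarchical. Non-Hierarchical: Data1, Data2, and Data3 at Addr1, Addr2, and Addr3 (0GB to 4GB). Hierarchical: the same lines, with the top line (A4 A3) as the root. Root Addr: Addr5.]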
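
A minimal sketch of the hierarchical layout, assuming (hypothetically) that memory is an array of lines in which each line holds either one data value or two child addresses; the Line and Memory names, the integer addresses, and the helpers are illustrative, not HICAMP's actual line format. build_example recreates the slide's structure: the line at Addr4 holds A1 and A2, the root line at Addr5 holds A4 and A3, and walking down from the root visits Data1, Data2, Data3 in order.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical line format: a leaf holds data; an interior line holds
// the addresses of its two children.
struct Line {
    bool is_leaf;
    std::string data;          // valid when is_leaf
    std::size_t left, right;   // child addresses, valid when !is_leaf
};

using Memory = std::vector<Line>;  // index = address

// Collect the data lines reachable from `root`, left to right.
void collect(const Memory& mem, std::size_t root, std::vector<std::string>& out) {
    const Line& line = mem.at(root);
    if (line.is_leaf) {
        out.push_back(line.data);
        return;
    }
    collect(mem, line.left, out);
    collect(mem, line.right, out);
}

// Builds the slide's example and returns the root address (Addr5).
std::size_t build_example(Memory& mem) {
    mem.assign(6, Line{true, "", 0, 0});
    mem[1] = {true, "Data1", 0, 0};  // A1
    mem[2] = {true, "Data2", 0, 0};  // A2
    mem[3] = {true, "Data3", 0, 0};  // A3
    mem[4] = {false, "", 1, 2};      // Addr4: line holding A1 A2
    mem[5] = {false, "", 4, 3};      // Addr5 (root): line holding A4 A3
    return 5;
}
```

The whole structure is reached from a single root address, which is what later lets one address stand for an entire snapshot.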

SLIDE 24

What is HICAMP?

  • 1. Hierarchical
  • 2. Immutable
  • 3. Content-Addressable Memory
  • 4. Processor

SLIDE 25

2. ‘I’ of HICAMP: “Immutable”

Overwriting of data is not allowed

[Figure: the hierarchy rooted at Addr5 (root line A4 A3, line A1 A2, and Data1, Data2, Data3); Data10 may not overwrite Data3 in place.]

SLIDE 26

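2. ‘I’ of HICAMP: “Immutable”

You must create a new hierarchy

[Figure: to replace Data3 with Data10, two new lines are created (each marked “new copy”): Data10, at address A10, and a new root line holding A4 and A10, at Addr11. The hierarchy rooted at Addr5 is unchanged.]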

SLIDE 27

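2. ‘I’ of HICAMP: “Immutable”

Old and new hierarchies coexist

[Figure: the OLD hierarchy rooted at Addr5 (A4 A3) and the NEW hierarchy rooted at Addr11 (A4 A10, with Data10); both roots point to the same A4 line and thus to the same Data1 and Data2.]

DEDUPLICATION: the unchanged part of the hierarchy (the A4 subtree) is shared by the old and new hierarchies rather than copied.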
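
The immutable update can be sketched in software with reference-counted nodes that are never modified after creation. The Node type and the leaf, interior, and update helpers are hypothetical, and HICAMP performs the equivalent in hardware on memory lines. Replacing Data3 with Data10 allocates only the two new lines from the slide (a new leaf and a new root), while the new root points at the very same A4 subtree as the old root.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct Node;
using NodePtr = std::shared_ptr<const Node>;

// Hypothetical immutable line: a data leaf, or an interior line holding
// pointers to two children. Nodes are never modified after creation.
struct Node {
    std::string data;     // used by leaves
    NodePtr left, right;  // used by interior lines
};

NodePtr leaf(std::string data) {
    return std::make_shared<Node>(Node{std::move(data), nullptr, nullptr});
}

NodePtr interior(NodePtr left, NodePtr right) {
    return std::make_shared<Node>(Node{"", std::move(left), std::move(right)});
}

// Path copying: returns the root of a new hierarchy in which the leaf reached
// by `path` (false = go left, true = go right) holds `value`. Only the lines
// on the path are newly created; every other subtree is shared with the old
// hierarchy, and the old root remains valid and unchanged.
// Assumes `path` ends exactly at a leaf.
NodePtr update(const NodePtr& node, const std::vector<bool>& path,
               std::string value, std::size_t depth = 0) {
    if (depth == path.size()) return leaf(std::move(value));
    if (path[depth]) {
        return interior(node->left, update(node->right, path, std::move(value), depth + 1));
    }
    return interior(update(node->left, path, std::move(value), depth + 1), node->right);
}
```

For the slide's example, the old root is interior(A4, Data3) with A4 = interior(Data1, Data2), and update(old_root, {true}, "Data10") yields the new root; both roots remain usable afterwards.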

SLIDE 28

What is HICAMP?

  • 1. Hierarchical
  • 2. Immutable
  • 3. Content-Addressable Memory
  • 4. Processor

SLIDE 29
3. “CAM” of HICAMP

How to eliminate duplicate values?

[Figure, Traditional: the same value 0x123 stored four times, at different addresses between 0GB and 4GB.]

SLIDE 30
3. “CAM” of HICAMP

  • Q: Why do duplicates exist?
  • A: Because you can store the same value anywhere you want

For a particular value, let’s restrict the addresses it can have

SLIDE 31

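3. “CAM” of HICAMP

Hash function f(x): maps the set of 64-byte values ($2^{64 \times 8} = 2^{512}$ elements) to the set of addresses in a 4GB DRAM ($2^{32}/2^{6} = 2^{32-6} = 2^{26}$ addresses of 64-byte lines)

Because there are vastly more possible values than addresses, f(x) cannot be one-to-one: different values can hash to the same address.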
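
Counting both sides with the sizes on this slide shows how lopsided the mapping is:

$$\frac{2^{64 \times 8}\ \text{possible 64-byte values}}{2^{32}/2^{6}\ \text{line addresses}} = \frac{2^{512}}{2^{26}} = 2^{486}$$

So, on average, $2^{486}$ distinct values map to each line address. Only values that are actually stored compete for space, but the hash alone cannot guarantee every stored value a slot of its own.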

SLIDE 32
3. “CAM” of HICAMP

[Figure: the 4GB DRAM (0GB to 4GB) viewed as a grid of 64B lines, with rows Row1, Row2, Row3, ..., RowN and columns Col1, Col2, ..., ColM.]

SLIDE 33
3. “CAM” of HICAMP

[Figure: a 64-byte data value, 0x123, is hashed by f(x) to row address 77; within Row77 it may occupy any of the 64B lines at column address 1 to M.]

SLIDE 34
3. “CAM” of HICAMP

Data Address = (RowAddr, ColAddr)
  • RowAddr = f(Data Value): fixed by the hash function
  • ColAddr: flexible, to reduce hash conflicts
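
A software sketch of this placement rule, under stated assumptions: the DRAM is modeled as kRows rows of kCols 64-byte lines (tiny hypothetical sizes), f(x) is stood in for by 64-bit FNV-1a reduced modulo kRows, and ContentAddressableMemory and lookup_or_insert are illustrative names rather than HICAMP's interface. A value's row is fixed by its hash; its column is whichever slot in that row already holds an identical value (deduplication) or else the first free slot (the flexibility that absorbs hash conflicts).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

using LineValue = std::array<std::uint8_t, 64>;  // one 64-byte line

constexpr std::size_t kRows = 8;  // hypothetical, tiny for illustration
constexpr std::size_t kCols = 4;

// Stand-in for f(x): 64-bit FNV-1a over the 64 bytes, reduced to a row.
std::size_t row_of(const LineValue& v) {
    std::uint64_t h = 14695981039346656037ULL;
    for (std::uint8_t byte : v) {
        h ^= byte;
        h *= 1099511628211ULL;
    }
    return static_cast<std::size_t>(h % kRows);
}

struct Address { std::size_t row, col; };

class ContentAddressableMemory {
public:
    // Returns the address holding `v`, storing it only if no identical line
    // exists yet; std::nullopt means the hashed row is full.
    std::optional<Address> lookup_or_insert(const LineValue& v) {
        std::size_t row = row_of(v);                      // RowAddr: fixed by f(x)
        for (std::size_t col = 0; col < kCols; ++col) {   // ColAddr: flexible
            Slot& slot = rows_[row][col];
            if (slot.used && slot.value == v) return Address{row, col};  // duplicate
            if (!slot.used) {  // rows fill left to right, so v is not stored later
                slot.used = true;
                slot.value = v;
                return Address{row, col};
            }
        }
        return std::nullopt;
    }

private:
    struct Slot { bool used = false; LineValue value{}; };
    std::array<std::array<Slot, kCols>, kRows> rows_{};
};
```

Storing the same 64-byte value twice returns the same address, so the duplicate is never created. Because nothing is ever deleted in this sketch, a row fills from column 0 upward, which is why the scan may stop at the first free slot.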

SLIDE 35

PROGRAMMING MODEL

SLIDE 36

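Terminology

  • “Segment”: the whole hierarchy of lines (here, the root line A4 A3, the line A1 A2, and Data1, Data2, Data3)
  • “Physical Line”: each individual line in memory, whether it holds data (e.g., Data1) or addresses (e.g., A1 A2)
  • “Physical Line ID” (PLID): the address of a physical line (A1, A2, A3, A4)
  • “Root PLID”: the PLID of the segment’s root line (the line holding A4 A3)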

SLIDE 37

Virtual-to-Physical Translation

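[Figure: software refers to a segment by its “Virtual Segment ID” (VSID); the Segment Map translates each VSID to the segment’s root PLID, which hardware uses to reach the lines. Memory holds both the old hierarchy (A4 A3 over A1 A2 and Data1, Data2, Data3) and the new one (A4 A10 with Data10).]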
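
A minimal sketch of the VSID-to-root-PLID translation, assuming VSIDs and PLIDs are plain integers and that a segment's root is replaced with a compare-and-swap on its map entry. How HICAMP actually commits is not shown on this slide, and SegmentMap, lookup, and compare_and_swap_root are hypothetical names. A reader that already fetched the old root PLID keeps a consistent view of the old hierarchy, because its lines are immutable.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

using Vsid = std::size_t;
using Plid = std::uint64_t;

// Hypothetical segment map: one atomic root PLID per virtual segment.
// Entries start at PLID 0, used here to mean "no segment yet" (an assumption).
class SegmentMap {
public:
    explicit SegmentMap(std::size_t segments)
        : roots_(std::make_unique<std::atomic<Plid>[]>(segments)), size_(segments) {}

    // Translate a virtual segment ID to the segment's current root PLID.
    Plid lookup(Vsid vsid) const { return entry(vsid).load(); }

    // Point `vsid` at `new_root` only if it still points at `expected_root`;
    // returns false if another update got there first.
    bool compare_and_swap_root(Vsid vsid, Plid expected_root, Plid new_root) {
        return entry(vsid).compare_exchange_strong(expected_root, new_root);
    }

private:
    std::atomic<Plid>& entry(Vsid vsid) const {
        assert(vsid < size_);
        return roots_[vsid];
    }

    std::unique_ptr<std::atomic<Plid>[]> roots_;
    std::size_t size_;
};
```

Using the slide's addresses as PLIDs, obj's entry would move from 5 (Addr5) to 11 (Addr11) on a successful update, and a second updater still expecting 5 would be refused.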

SLIDE 38

Example Program

    1: it = obj.begin();
    2: it++;
    3: it++;
    4: *it = newVal;
    5: it->tryCommit();
    /* it = iterator */

[Figure: the segment of obj: the root line A4 A3, the line A1 A2, and Data1, Data2, Data3.]

SLIDE 39

Example Program

    1: it = obj.begin();
    2: it++;
    3: it++;
    4: *it = newVal;
    5: it->tryCommit();
    /* it = iterator */

[Figure: obj is a VSID that names the segment; line 1, obj.begin(), points the iterator it at the first element, Data1.]

SLIDE 40

Example Program

    1: it = obj.begin();
    2: it++;
    3: it++;
    4: *it = newVal;
    5: it->tryCommit();
    /* it = iterator */

[Figure: line 2, it++, advances it to the next element, Data2.]

SLIDE 41

Example Program

    1: it = obj.begin();
    2: it++;
    3: it++;
    4: *it = newVal;
    5: it->tryCommit();
    /* it = iterator */

[Figure: line 3, it++, advances it to Data3.]

SLIDE 42

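Example Program

    1: it = obj.begin();
    2: it++;
    3: it++;
    4: *it = newVal;
    5: it->tryCommit();
    /* it = iterator */

[Figure: line 4, *it = newVal, does not overwrite Data3; instead two lines are copied (each marked “copy”): a new root line holding A4 and A10, and newVal stored at A10. The original segment of obj is unchanged.]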

SLIDE 43

Example Program

    1: it = obj.begin();
    2: it++;
    3: it++;
    4: *it = newVal;
    5: it->tryCommit();
    /* it = iterator */

[Figure: the state after line 4: the copied lines (the root line A4 A10 and newVal) exist alongside the unchanged original segment of obj.]
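
Putting the pieces together, here is a hypothetical software model of the five-line program. Segment, Iterator, and the helper functions are illustrative assumptions, the iterator is a plain class rather than HICAMP's hardware-supported iterator, and tryCommit is called with `.` because here it is a member of the iterator itself. The model walks to the third element, writes newVal by copying only the leaf and the root above it, and commits by swapping obj's root only if no one else committed since the iterator was created.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <string>
#include <utility>

struct Node;
using NodePtr = std::shared_ptr<const Node>;

// Hypothetical immutable line: a data leaf, or an interior line with two children.
struct Node {
    std::string data;
    NodePtr left, right;
    bool is_leaf() const { return !left && !right; }
};

NodePtr leaf(std::string d) { return std::make_shared<Node>(Node{std::move(d), nullptr, nullptr}); }
NodePtr interior(NodePtr l, NodePtr r) { return std::make_shared<Node>(Node{"", std::move(l), std::move(r)}); }

std::size_t leaf_count(const NodePtr& n) {
    return n->is_leaf() ? 1 : leaf_count(n->left) + leaf_count(n->right);
}

// Read the i-th leaf (left to right) of a hierarchy.
std::string read_leaf(const NodePtr& n, std::size_t i) {
    if (n->is_leaf()) return n->data;
    std::size_t left = leaf_count(n->left);
    return i < left ? read_leaf(n->left, i) : read_leaf(n->right, i - left);
}

// Path copy: a new hierarchy whose i-th leaf holds `value`; all other lines are shared.
NodePtr replace_leaf(const NodePtr& n, std::size_t i, std::string value) {
    if (n->is_leaf()) return leaf(std::move(value));
    std::size_t left = leaf_count(n->left);
    if (i < left) return interior(replace_leaf(n->left, i, std::move(value)), n->right);
    return interior(n->left, replace_leaf(n->right, i - left, std::move(value)));
}

class Iterator;

// Hypothetical segment (what obj names through its VSID): a swappable root.
class Segment {
public:
    explicit Segment(NodePtr root) : root_(std::move(root)) {}
    NodePtr root() const { std::lock_guard<std::mutex> g(m_); return root_; }
    // Install `desired` only if the root is still `expected`.
    bool compare_and_swap(const NodePtr& expected, NodePtr desired) {
        std::lock_guard<std::mutex> g(m_);
        if (root_ != expected) return false;
        root_ = std::move(desired);
        return true;
    }
    Iterator begin();

private:
    mutable std::mutex m_;
    NodePtr root_;
};

class Iterator {
public:
    explicit Iterator(Segment& seg) : seg_(&seg), base_(seg.root()), working_(base_) {}
    Iterator& operator++() { ++index_; return *this; }
    Iterator operator++(int) { Iterator old = *this; ++index_; return old; }

    // Proxy so that `*it = value` builds a private new hierarchy.
    struct Ref {
        Iterator& owner;
        Ref& operator=(std::string value) {
            owner.working_ = replace_leaf(owner.working_, owner.index_, std::move(value));
            return *this;
        }
        operator std::string() const { return read_leaf(owner.working_, owner.index_); }
    };
    Ref operator*() { return Ref{*this}; }

    // Publish the new hierarchy only if no one else committed since begin().
    bool tryCommit() {
        if (!seg_->compare_and_swap(base_, working_)) return false;
        base_ = working_;
        return true;
    }

private:
    Segment* seg_;
    NodePtr base_;     // root this iterator started from (its snapshot)
    NodePtr working_;  // root of its private, uncommitted hierarchy
    std::size_t index_ = 0;
};

inline Iterator Segment::begin() { return Iterator(*this); }
```

Until tryCommit succeeds, other users of obj keep seeing the original segment; if another iterator committed first, tryCommit returns false and the caller can retry.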