SLIDE 1

HIVE: Fault Containment for Shared-Memory Multiprocessors

  • J. Chapin, M. Rosenblum, S. Devine, T. Lahiri, D. Teodosiu, A. Gupta

CSE 598C Presented by: Sandra Rueda

SLIDE 2

Jan 30, 2006

The Problem

  • An O.S. for managing the FLASH architecture (a large shared-memory multiprocessor)
  • A set of nodes connected in a mesh
  • NUMA

[Figure: a FLASH node, with components labeled Proc, 2L$, Mem, CC, Net, and I/O]

SLIDE 3

Hive: Main Goals

  • Memory sharing: improving performance
  • Possible failures:
      • A faulty node makes that node’s memory inaccessible
      • A faulty node returns wrong values for reads
      • Software failures may corrupt other nodes’ memory

SLIDE 4

Main Goals

  • Fault containment: hardware or software faults are confined to the cell where they occurred; as a consequence, only that cell crashes.
  • Scalability:
      • Few resources are shared among different cells.
      • More processors → more cells → more parallelism

SLIDE 5

O.S. Architecture

  • Multicellular architecture: processors are grouped into cells. An independent kernel manages each cell (UNIX SVR4).

[Figure: cell organization (Cell 0, Cell 1, …, Cell n) and memory organization (global address space, local address space)]

SLIDE 6

Fault Containment (1): Failure Sources

  • Sources and control methods:
      • Message exchange (RPC): timeout + message check
      • Remote reads: careful_reference + message check
      • Remote writes:
          • Internal data: firewall
          • User-level data: protection of local space, preemptive discard

SLIDE 7

Fault Containment (2): Control Methods

  • The careful_reference protocol prevents errors during remote reads from causing a kernel panic.
      • Save context
      • Check that the memory range belongs to the expected cell
      • Copy data values
      • Check every remote data structure
      • Careful_off
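
The five steps above can be sketched in Python as follows. This is a minimal illustration, not Hive's kernel code: the names `careful`, `RemoteMemory`, `BusError`, `CarefulReferenceFailed`, and `CELL_RANGES`, the one-range-per-cell layout, and the type-tag check are all assumptions made for the example.

```python
from contextlib import contextmanager


class BusError(Exception):
    """Stands in for the hardware trap raised by a failed remote read."""


class CarefulReferenceFailed(Exception):
    """Raised instead of panicking when a remote read cannot be trusted."""


# Hypothetical layout: each cell owns one contiguous address range.
CELL_RANGES = {0: range(0, 100), 1: range(100, 200)}


class RemoteMemory:
    """Toy memory; addresses listed in `dead` raise BusError when read."""

    def __init__(self, words, dead=()):
        self.words = dict(words)
        self.dead = set(dead)

    def read(self, addr):
        if addr in self.dead:
            raise BusError(addr)
        return self.words.get(addr, 0)


@contextmanager
def careful(expected_cell, addr, length):
    # Step 1: "save context" is modeled by entering the with-block.
    # Step 2: check that the whole range belongs to the expected cell.
    owned = CELL_RANGES[expected_cell]
    if length <= 0 or addr not in owned or (addr + length - 1) not in owned:
        raise CarefulReferenceFailed("address range not in expected cell")
    try:
        yield
    except BusError as exc:
        # A trap while careful mode is on becomes a recoverable error.
        raise CarefulReferenceFailed("bus error during remote read") from exc
    # Step 5: careful_off is modeled by leaving the with-block.


def read_remote_record(memory, expected_cell, addr, length, expected_tag):
    with careful(expected_cell, addr, length):
        # Step 3: copy the values to local storage before using them.
        local_copy = [memory.read(a) for a in range(addr, addr + length)]
    # Step 4: sanity-check the copied structure (here, a tag in word 0).
    if local_copy[0] != expected_tag:
        raise CarefulReferenceFailed("remote structure failed sanity check")
    return local_copy
```

The point of the sketch is that every failure path ends in an ordinary, catchable error for the reading cell rather than a crash.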

SLIDE 8

Fault Containment (3): Control Methods

  • The firewall controls which processors are allowed to modify each region of main memory.
      • Only the local processor can change firewall bits.
      • Rights are assigned to:
          • The first process that requests a writable mapping to the page.
          • All the processors in a cell.
  • Preemptive discard (recovery)
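
One way to picture the firewall is as a bit vector of write permissions per page. The sketch below assumes that representation; the class name `Firewall`, its methods, and the choice to let the local processor always write its own pages are hypothetical and are not taken from the FLASH hardware specification.

```python
class Firewall:
    """Per-page write-permission bit vectors for one node's memory."""

    def __init__(self, local_proc, num_pages):
        self.local_proc = local_proc
        # One integer per page; bit p set means processor p may write.
        # Assumption: the local processor can always write its own pages.
        self.bits = [1 << local_proc for _ in range(num_pages)]

    def _require_local(self, requester):
        # Only the local processor may change firewall bits.
        if requester != self.local_proc:
            raise PermissionError("only the local processor can change firewall bits")

    def grant(self, requester, page, procs):
        """Grant write access on `page` to every processor in `procs`."""
        self._require_local(requester)
        for p in procs:
            self.bits[page] |= 1 << p

    def revoke(self, requester, page, procs):
        """Remove write access on `page` from every processor in `procs`."""
        self._require_local(requester)
        for p in procs:
            self.bits[page] &= ~(1 << p)

    def may_write(self, proc, page):
        return bool((self.bits[page] >> proc) & 1)
```

Granting access to a whole cell then amounts to calling `grant` with the list of that cell's processors, for example `fw.grant(0, page, [2, 3])` if a remote cell consists of processors 2 and 3.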

SLIDE 9

Fault Containment (4): Detection

  • Detection of a failure:
      • An RPC request times out
      • A memory read causes a bus error
      • The periodic update of a shared location fails
      • Data fails a consistency check
  • When a failure is detected, the other cells run an agreement protocol
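
The detection hints and the agreement step can be sketched as below. The slides do not describe the agreement protocol itself, so the simple majority vote here is only a stand-in, and the names `Hint`, `heartbeat_is_stale`, and `agree_on_failure` are hypothetical.

```python
from enum import Enum


class Hint(Enum):
    """The four failure hints listed on the slide."""
    RPC_TIMEOUT = "RPC request timed out"
    BUS_ERROR = "memory read caused a bus error"
    STALE_HEARTBEAT = "periodically updated shared location stopped changing"
    BAD_DATA = "data failed a consistency check"


def heartbeat_is_stale(previous, current):
    """A cell that is alive keeps changing its shared counter."""
    return current == previous


def agree_on_failure(suspect, probes):
    """Ask every cell except the suspect whether the suspect looks dead.

    `probes` maps a cell id to a callable taking the suspect's id and
    returning True if that cell also believes the suspect has failed.
    Assumption: a strict majority of the other cells must agree.
    """
    voters = [cell for cell in probes if cell != suspect]
    votes = sum(1 for cell in voters if probes[cell](suspect))
    return votes * 2 > len(voters)
```

Whatever the exact protocol, running a vote keeps one cell's mistaken hint from causing a healthy cell to be declared failed.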

SLIDE 10

Fault Containment (5): Recovery

  • First phase:
      • Each cell flushes its TLB and removes any remote mappings.
  • Second phase:
      • At the end of the first phase there are no pending remote accesses, so it is possible to revoke firewall write permissions.
      • The virtual memory subsystem detects pages that were writable by a failed cell and notifies the file system.
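
The two phases can be sketched as two passes over the surviving cells, with the second pass starting only after every survivor has finished the first. The `Cell` class, its fields, and the `discard_notices` list (standing in for the notification to the file system) are assumptions made for illustration.

```python
class Cell:
    def __init__(self, cell_id):
        self.cell_id = cell_id
        self.tlb = {}                 # virtual page -> (home cell, frame)
        self.remote_mappings = set()  # (home cell, frame) pairs
        self.firewall = {}            # local page -> set of cell ids allowed to write
        self.discard_notices = []     # pages reported to the file system

    def phase_one(self):
        # Flush the TLB and drop every remote mapping.
        self.tlb.clear()
        self.remote_mappings.clear()

    def phase_two(self, failed):
        # No remote access is pending any more, so write permission
        # can be revoked safely.
        for page, writers in self.firewall.items():
            if failed in writers:
                writers.discard(failed)
                # The page was writable by the failed cell: report it.
                self.discard_notices.append(page)


def recover(cells, failed):
    survivors = [c for c in cells if c.cell_id != failed]
    for c in survivors:
        c.phase_one()
    # Barrier: phase two starts only after all survivors finish phase one.
    for c in survivors:
        c.phase_two(failed)
    return survivors
```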

SLIDE 11

Fault Containment (6): Recovery

  • Preemptive discard:
      • After a recovery, it is possible for a process to fetch stale data from disk.
      • Only processes that opened a file before the failure will receive I/O errors. This is implemented with a generation number: a mismatch between generation numbers generates an error.
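
The generation-number check can be sketched as follows. The classes `FileRecord` and `OpenHandle` and the rule that a discard increments the file's generation are illustrative assumptions; the slides only state that a mismatch produces an error.

```python
class FileRecord:
    """A file whose cached dirty pages may be discarded during recovery."""

    def __init__(self, data):
        self.data = data
        self.generation = 0

    def discard_after_failure(self):
        # Pages writable by the failed cell were thrown away, so the
        # on-disk contents may be stale for anyone who had the file open.
        self.generation += 1


class OpenHandle:
    """Remembers the generation that was current when the file was opened."""

    def __init__(self, file):
        self.file = file
        self.generation = file.generation

    def read(self):
        if self.generation != self.file.generation:
            raise OSError("generation mismatch: data may be stale after recovery")
        return self.file.data
```

A handle opened before the discard fails on its next read, while a handle opened afterwards reads normally because it captured the new generation number.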

SLIDE 12

Memory Sharing (1)

  • Two types of memory sharing:
      • Logical level: a process on a cell maps a data page from another cell into its address space

[Figure: pfdat tables and memory pages of cell i and cell j, with entries labeled exp and imp]

SLIDE 13

Memory Sharing (2)

  • Two types of memory sharing:
      • Physical level: one cell transfers control over a page frame to another

[Figure: pfdat tables and memory pages of cell i and cell j, with entries labeled X and brw]

SLIDE 14

Memory Sharing (3)

  • WAX:
      • A user-level process that may have access to all cells, which lets it consolidate a global view of the system.
      • Some decisions are made based on the global view, for instance process priorities.

[Figure: WAX spanning Cell 0 … Cell n, alongside processes Proc i … Proc m]

SLIDE 15

RPC: Optimization

  • Sometimes cells exchange information via RPC
  • The FLASH architecture includes hardware support to minimize RPC latency
  • The mechanism (SIPS: Short Interprocessor Send Facility) is based on the cache-line delivery mechanism used by the cache coherency protocol
      • The primitive is reliable
      • No message fragmentation
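
Because delivery is by cache line and there is no fragmentation, a message either fits in one line or cannot be sent. The sketch below illustrates that constraint only; the 128-byte line size, the two-byte length header, and the function names are assumptions, not the SIPS wire format.

```python
import struct

CACHE_LINE_BYTES = 128          # assumed line size for this sketch
HEADER = struct.Struct("<H")    # hypothetical 2-byte payload length
MAX_PAYLOAD = CACHE_LINE_BYTES - HEADER.size


def pack_message(payload: bytes) -> bytes:
    """Fit a payload into exactly one cache line, or refuse."""
    if len(payload) > MAX_PAYLOAD:
        # No fragmentation: an oversized message is an error, not a split.
        raise ValueError("payload does not fit in a single cache line")
    line = HEADER.pack(len(payload)) + payload
    return line.ljust(CACHE_LINE_BYTES, b"\x00")


def unpack_message(line: bytes) -> bytes:
    """Recover the payload from one cache line."""
    if len(line) != CACHE_LINE_BYTES:
        raise ValueError("a message is exactly one cache line")
    (length,) = HEADER.unpack_from(line)
    if length > MAX_PAYLOAD:
        raise ValueError("corrupt length field")
    return line[HEADER.size:HEADER.size + length]
```

A larger RPC argument would have to travel some other way, for example through shared memory read by the receiver, since the primitive itself never splits a message.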

SLIDE 16

Experimental Results

  • At the time of the paper:
      • Hive was a prototype
      • FLASH hardware was not yet available
      • The authors used SimOS

SLIDE 17

Simulation Environment

  • Hardware
      • 4 MIPS processors, 200 MHz
      • 128 MB memory
      • 4 disk controllers, each with one attached disk
      • 4 Ethernet interfaces
      • 4 consoles
  • Hive
      • 4 cells
      • Each cell: 1 processor, 32 MB memory, 1 interface, 1 disk

SLIDE 18

Simulation Environment

  • Memory hierarchy (per processor):
      • Instruction cache: 32 KB, two-way associative
      • Primary data cache: 32 KB, two-way associative
      • Secondary unified cache: 1 MB, two-way associative
      • Given miss penalty
      • Given SIPS latency
      • Given interrupt latency
      • Given disk latency
      • Some values are based on other models

SLIDE 19

Simulation Environment

  • Performance tests
      • Expected workloads (scientific application, parallel application)
      • Times for IRIX 5.2 (reference)
      • Different configurations: one, two, and four cells
      • Conclusion: partitioning into cells has little effect on performance, and it allows fault containment

SLIDE 20

Simulation Environment

  • Fault injection tests
      • It is difficult to predict the reliability of a complex system
      • Fault injection tests are used to check whether the reliability mechanisms are working properly
      • The authors chose to inject failures in situations where it seemed that a fault in one cell could corrupt another
      • They checked files after recovery to detect data corruption
      • The simulator allowed them to recreate scenarios from a specific checkpoint

SLIDE 21

Conclusion: Simulation Environment

  • Advantages [1]
      • Evaluation of hardware support
      • Evaluation of designed mechanisms
      • Evaluation of tradeoffs
  • Problems [2]
      • Simulator bugs
      • Omissions
      • Lack of detail
  • Key features determine whether it is useful

SLIDE 22

References

  • [1] Hive: Fault Containment for Shared-Memory Multiprocessors. J. Chapin, M. Rosenblum, S. Devine, T. Lahiri, D. Teodosiu, and A. Gupta. SIGOPS 1995.
  • [2] Flash vs. Simulated Flash: Closing the Simulation Loop. Jeff Gibson, Robert Kunz, David Ofelt, Mark Horowitz, John Hennessy, and Mark Heinrich. SIGARCH, Volume 28, Issue 5 (December 2000).