High assurance systems Rami Melhem (U. of Pittsburgh) Ensures that - - PowerPoint PPT Presentation



slide-1
SLIDE 1

High assurance systems

Rami Melhem (U. of Pittsburgh)

Ensures that computation completes correctly, in time, with optimal use of resources

  • Formal methods and empirical methods
  • Fault tolerance, timing guarantees, power management, security
  • Emerging technologies: nanophotonics and phase change memory

slide-2
SLIDE 2

Diagram: an off-chip laser feeds light on wavelength channels λ1…λn into a waveguide. A μring modulator encodes the digital signal (0111…0010) onto λm, and a μring filter decodes λm for the photo detector. Annotation: “Can’t receive signal!”

Challenge: tolerance to process and temperature variation (PV)

slide-3
SLIDE 3

Phase Change Memory (PCM)

A power saving memory technology

  • Solid-state memory made of a germanium-antimony alloy
  • State switching is thermal (not electrical)
  • Samsung, Intel, Hitachi, and IBM have developed PCM prototypes (to replace Flash)
  • 2. New error correction schemes for stuck-at faults

slide-4
SLIDE 4

Properties of PCM

  • Non-volatile but faster than Flash
  • Byte addressable but denser and cheaper than DRAM
  • Low power read and standby
  • Not susceptible to single-event upsets, and hence does not need ECC

  • Errors may occur only during write (not read)
  • Scalable: at least to 32nm and beyond (9nm)

Sounds wonderful – but where is the catch?

slide-5
SLIDE 5

The Catch!!

  • Slower than DRAM, especially for write
  • Low endurance: a cell fails after 10^7 writes (10^15 for DRAM)
  • Asymmetric Read/Write energy consumption
  • Asymmetry of writing 0’s and 1’s, in both time (hence bandwidth) and current (hence power)
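
To put the endurance figures in perspective, the sketch below computes an idealized lifetime bound. It assumes perfect wear leveling, meaning every cell absorbs the same number of writes. The 16 GiB capacity and the 1 GB/s sustained write rate are illustrative assumptions, not numbers from the slides; only the endurance figures (10^7 and 10^15) come from the slide.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def ideal_lifetime_years(capacity_bytes, write_bytes_per_sec, endurance_writes):
    """Upper bound on lifetime when writes are spread evenly over all cells."""
    # Total bytes the device can absorb before every cell reaches its limit.
    total_writable_bytes = capacity_bytes * endurance_writes
    return total_writable_bytes / write_bytes_per_sec / SECONDS_PER_YEAR


CAPACITY = 16 * 2**30   # assumed: 16 GiB of main memory
WRITE_RATE = 10**9      # assumed: 1 GB/s of sustained write traffic

pcm_years = ideal_lifetime_years(CAPACITY, WRITE_RATE, 10**7)    # PCM endurance
dram_years = ideal_lifetime_years(CAPACITY, WRITE_RATE, 10**15)  # DRAM endurance

print(f"PCM:  {pcm_years:.1f} years (ideal wear leveling)")
print(f"DRAM: {dram_years:.3g} years (ideal wear leveling)")
```

Under these assumed numbers the PCM bound is only a few years, and it is a best case: without wear leveling, a frequently written cell reaches its limit far sooner.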

slide-6
SLIDE 6

Ongoing work

  • An error correction scheme for stuck-at fault models

– Worn-out cells get stuck at 0 or 1 but can still be read
– A worn-out cell is classified as either stuck-at-right (SA-R) or stuck-at-wrong (SA-W), depending on the data pattern being written

  • Identify a set containing the stuck-at-wrong cells

– Some non-faulty (NF) cells may also be members of the set, but none of the stuck-at-right cells may be

  • At read time, invert the values read from the identified set
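
The inversion idea can be sketched in a few lines. This is a minimal illustration, not the authors' scheme: the fixed grouping of cells (`group_size`) used to build the set is a hypothetical choice, and all function names are invented for the example. The sketch also assumes that non-faulty cells that land in the set are written inverted, so that the read-time inversion restores them.

```python
from typing import Dict, List, Optional, Set, Tuple


def classify(data: List[int], stuck: Dict[int, int]) -> Tuple[Set[int], Set[int]]:
    """Split stuck cells into SA-R and SA-W for this particular data word."""
    sa_r = {i for i, value in stuck.items() if value == data[i]}
    sa_w = set(stuck) - sa_r
    return sa_r, sa_w


def pick_invert_set(n_cells: int, sa_r: Set[int], sa_w: Set[int],
                    group_size: int) -> Optional[Set[int]]:
    """Union of fixed-size groups holding an SA-W cell (hypothetical policy).

    Returns None when a group mixes SA-W and SA-R cells, because this
    simple partition cannot separate them.
    """
    chosen: Set[int] = set()
    for start in range(0, n_cells, group_size):
        group = set(range(start, min(start + group_size, n_cells)))
        if group & sa_w:
            if group & sa_r:
                return None
            chosen |= group
    return chosen


def write_word(data: List[int], stuck: Dict[int, int],
               group_size: int = 2) -> Tuple[List[int], Set[int]]:
    """Return the physical cell contents and the set to invert on read."""
    sa_r, sa_w = classify(data, stuck)
    invert = pick_invert_set(len(data), sa_r, sa_w, group_size)
    if invert is None:
        raise ValueError("partition cannot separate SA-W from SA-R cells")
    cells = []
    for i, bit in enumerate(data):
        target = bit ^ 1 if i in invert else bit  # NF cells in the set store the inverse
        cells.append(stuck.get(i, target))        # a stuck cell ignores the write
    return cells, invert


def read_word(cells: List[int], invert: Set[int]) -> List[int]:
    """Invert the values read from the identified set."""
    return [bit ^ 1 if i in invert else bit for i, bit in enumerate(cells)]


data = [1, 0, 1, 1, 0, 0, 1, 0]
stuck = {1: 1, 6: 1}  # cell 1 is SA-W for this word, cell 6 is SA-R
cells, invert = write_word(data, stuck)
print(read_word(cells, invert) == data)
```

In this example cell 0 is non-faulty but shares a group with the SA-W cell 1, so it joins the set; the SA-R cell 6 stays outside it.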

slide-7
SLIDE 7

A Storage Class Memory Architecture for Energy Efficient Data Centers

Diagram: traditional architecture (CPU, memory controller, DRAM) next to the hybrid architecture (CPU, AEB, MM, PCM).

  • Advantages: cheaper, denser, lower power consumption
  • Challenges: endurance, asymmetry, delay

slide-8
SLIDE 8

A cross-layer approach

Diagram: two virtual machines (each with Application, Operating System, and Logic Hardware layers) run on a hypervisor with a conventional VMM, above a chip multiprocessor, an I/O controller with an HD/SSD storage system, and a DRAM main memory.

slide-9
SLIDE 9

1) OPRAM (optimized PRAM)

Diagram: two virtual machines (each with Application, Operating System, and Logic Hardware layers) run on a hypervisor with a conventional VMM, above a chip multiprocessor, an I/O controller with an HD/SSD storage system, and a main memory that combines DRAM with optimized PRAM.

slide-10
SLIDE 10

1) OPRAM (optimized PRAM)

Diagram: main memory made up of DRAM and optimized PRAM.

  • Optimization of PCM for main memory
  • Manage reliability (faults and wear)
  • Manage write latency
  • Manage asymmetric read/write power
  • Novel interfaces with controller
  • Run-time monitoring
slide-11
SLIDE 11

2) MemVisor

Diagram: two virtual machines (each with Application, Operating System, and Logic Hardware layers) run on a hypervisor that holds the conventional VMM and MemVisor, above a chip multiprocessor, an I/O controller with an HD/SSD storage system, and a main memory that combines DRAM with optimized PRAM.

slide-12
SLIDE 12

2) MemVisor

Diagram: hypervisor containing the conventional VMM and MemVisor.

  • The Memory Resource Advisor to the Hypervisor
  • Allocates memory resources to virtual machines
  • Maps data and code to the components of main memory
  • Considers performance, energy, safety and endurance
  • Each VM is managed differently, based on Service Level Agreements as well as system-wide goals
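
One way to picture the mapping decision is a placement pass that keeps write-heavy pages in DRAM and leaves read-mostly pages in PRAM, which reduces PRAM wear and avoids its slow writes. The sketch below is a hypothetical policy for illustration only: the `PageStats` record, the `place_pages` function, and the ranking rule are assumptions, and Service Level Agreements and safety are not modeled.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PageStats:
    page_id: int
    reads: int   # observed read count (assumed to come from run-time monitoring)
    writes: int  # observed write count


def place_pages(pages: List[PageStats], dram_slots: int) -> Dict[int, str]:
    """Put the most write-intensive pages in DRAM, the rest in PRAM."""
    # Rank by writes first (endurance and write latency), then by reads.
    ranked = sorted(pages, key=lambda p: (p.writes, p.reads), reverse=True)
    placement = {}
    for rank, page in enumerate(ranked):
        placement[page.page_id] = "DRAM" if rank < dram_slots else "PRAM"
    return placement


pages = [
    PageStats(page_id=0, reads=100, writes=2),
    PageStats(page_id=1, reads=10, writes=50),
    PageStats(page_id=2, reads=5, writes=0),
    PageStats(page_id=3, reads=80, writes=30),
]
placement = place_pages(pages, dram_slots=2)
print(placement)
```

A per-VM version could give each virtual machine its own `dram_slots` budget derived from its agreement, which is one possible reading of "each VM is managed differently".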

slide-13
SLIDE 13

3) Intelligent Hybrid controller

Diagram: two virtual machines (each with Application, Operating System, and Logic Hardware layers) run on a hypervisor that holds the conventional VMM and MemVisor, above a chip multiprocessor, an I/O controller with an HD/SSD storage system, and a hybrid memory controller managing a main memory of DRAM and optimized PRAM.

slide-14
SLIDE 14

3) Intelligent Hybrid controller

Diagram: hybrid memory controller managing a main memory of DRAM and optimized PRAM.

  • Dynamically allocates PRAM and DRAM resources
  • Accepts commands and hints from MemVisor
  • Monitors usage of memory resources and performance
  • Provides feedback to MemVisor
  • Collaborates with MemVisor to improve PRAM endurance
  • Example: endurance aware cache replacement
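
A minimal sketch of what endurance-aware replacement could look like, assuming a DRAM cache in front of PRAM: among the few least recently used lines, prefer evicting a clean one, since evicting a dirty line forces a write-back that wears PRAM. The class name, the `window` parameter, and the clean-first rule are assumptions for illustration; the slides do not specify the actual policy.

```python
from collections import OrderedDict


class EnduranceAwareCache:
    """LRU cache that prefers clean victims to save PRAM writes (sketch)."""

    def __init__(self, capacity: int, window: int = 2):
        self.capacity = capacity
        self.window = window        # how many LRU candidates to inspect
        self.lines = OrderedDict()  # address -> dirty flag, LRU entry first
        self.pram_writes = 0        # write-backs caused by dirty evictions

    def access(self, addr: int, is_write: bool) -> None:
        if addr in self.lines:
            dirty = self.lines.pop(addr) or is_write
        else:
            if len(self.lines) >= self.capacity:
                self._evict()
            dirty = is_write
        self.lines[addr] = dirty    # re-inserting marks it most recently used

    def _evict(self) -> None:
        candidates = list(self.lines.items())[: self.window]
        # First clean line among the candidates, else the plain LRU line.
        victim = next((a for a, d in candidates if not d), candidates[0][0])
        if self.lines.pop(victim):
            self.pram_writes += 1   # dirty victim must be written to PRAM


cache = EnduranceAwareCache(capacity=2, window=2)
cache.access(1, is_write=True)   # dirty line
cache.access(2, is_write=False)  # clean line
cache.access(3, is_write=False)  # evicts clean line 2 instead of dirty line 1
print(sorted(cache.lines), cache.pram_writes)
```

With `window=1` the policy degenerates to plain LRU and the same access sequence costs one PRAM write. A larger window trades hit rate for endurance.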
slide-15
SLIDE 15

SCMA (a cross-layer approach)

Diagram: the complete stack. Two virtual machines (each with Application, Operating System, and Logic Hardware layers) run on a hypervisor that holds the conventional VMM and MemVisor, above a chip multiprocessor, an I/O controller with an HD/SSD storage system, and a hybrid memory controller managing a main memory of DRAM and optimized PRAM.

slide-16
SLIDE 16
  • 3. Immunity Inspired Cyber Security

Body immune system Defense of complex information infrastructures

  • Highly distributed information processing system

  • Self protecting
  • Dynamic
  • Diverse
  • Error tolerant
slide-17
SLIDE 17
Desirable properties to mimic

  • Learn and retain information for future actions
  • Local components that interact globally
  • Individual components are continually created to improve the system’s defense
  • Dangerous components are destroyed and eliminated from the body

Is it a good idea to mimic natural systems?

  • Planes do not fly by flapping wings
  • Cameras??

Concepts already borrowed from biology:

  • Anomaly detection
  • Neural networks
  • Autonomic computing
slide-18
SLIDE 18

  • Is research on protecting critical infrastructures adequate?
  • Is the human factor the "weakest point" in high-assurance systems?
  • Fostering collaboration?
  • Research on critical infrastructures without having access to real systems?

slide-19
SLIDE 19

Is research on protecting critical infrastructures adequate?

  • Threat is over-stated?
  • Preparation is inadequate?
  • An opportunity to advance knowledge is always a good thing

– No research is useless (putting a man on the moon??)
– New discoveries are made unexpectedly
– Revolutionary vs. evolutionary research