SLIDE 1

Open Cirrus™: A Global Testbed for Cloud Computing Research

David O’Hallaron

Director, Intel Labs Pittsburgh; Carnegie Mellon University

SLIDE 2

Open Cirrus Testbed

  • Sponsored by HP, Intel, and Yahoo! (with additional support from NSF).

  • 9 sites worldwide, with a target of around 20 in the next two years.
  • Each site has 1,000-4,000 cores.
  • Shared hardware infrastructure (~15K cores), services, research, and applications.

http://opencirrus.intel-research.net

SLIDE 3

Open Cirrus Context

Goals

  • 1. Foster new systems and services research around cloud computing
  • 2. Catalyze open-source stack and APIs for the cloud

Motivation

— Enable more tier-2 and tier-3 public and private cloud providers

How are we different?

— Support for systems research and applications research

  • Access to bare metal, integrated virtual-physical migration

— Federation of heterogeneous datacenters

  • Global sign-on, monitoring, and storage services.

SLIDE 4

Intel BigData Cluster

Open Cirrus site hosted by Intel Labs Pittsburgh

— Operational since Jan 2009.
— 180 nodes, 1,440 cores, 1,416 GB DRAM, 500 TB disk.

Supporting 50 users and 20 projects from CMU, Pitt, Intel, and GaTech

— Systems research: cluster management, location- and power-aware scheduling, physical-virtual migration (Tashi), cache-savvy algorithms (Hi-Spade), real-time streaming frameworks (SLIPstream), optical datacenter interconnects (CloudConnect), log-based architectures (LBA).
— Applications: machine translation, speech recognition, programmable matter simulation, ground model generation, online education, real-time brain activity decoding, real-time gesture and object recognition, federated perception, automated food recognition.

Idea for a research project on Open Cirrus?

— Send short email abstract to Mike Kozuch, Intel Labs Pittsburgh, michael.a.kozuch@intel.com

SLIDE 5

Open Cirrus Stack

[Diagram: an Open Cirrus site comprises compute, network, and storage resources plus power and cooling, with a management and control subsystem; the Physical Resource Set (PRS) service sits at the base of the stack.]

Credit: John Wilkes (HP)

SLIDE 6

Open Cirrus Stack

[Diagram: the PRS service partitions the cluster among PRS clients: research, Tashi, an NFS storage service, and an HDFS storage service.]

PRS clients, each with their own "physical data center."

SLIDE 7

Open Cirrus Stack

[Diagram: the stack from the previous slide, adding two virtual clusters (e.g., Tashi) on top of the PRS service.]

SLIDE 8

Open Cirrus Stack

[Diagram: a BigData application on Hadoop, running in a Tashi virtual cluster on the stack.]

  • 1. Application running
  • 2. On Hadoop
  • 3. On Tashi virtual cluster
  • 4. On a PRS
  • 5. On real hardware

SLIDE 9

Open Cirrus Stack

[Diagram: the stack from the previous slide, adding an experiment save/restore service.]

SLIDE 10

Open Cirrus Stack

[Diagram: the stack, adding platform services (experiment save/restore among them).]

SLIDE 11

Open Cirrus Stack

[Diagram: the stack, adding user services alongside the platform services.]

SLIDE 12

Open Cirrus Stack

[Diagram: the complete stack: PRS, Tashi, NFS and HDFS storage services, virtual clusters, a BigData application on Hadoop, experiment save/restore, platform services, and user services.]

SLIDE 13

System Organization

Compute nodes are divided into dynamically allocated, VLAN-isolated PRS subdomains. Apps switch back and forth between virtual and physical.

[Diagram: example subdomains include open service research, Tashi development, proprietary service research, apps running in a VM management infrastructure (e.g., Tashi), open workload monitoring and trace collection, and a production storage service.]

SLIDE 14

Open Cirrus Stack - PRS

PRS service goals

— Provide mini-datacenters to researchers.
— Isolate experiments from each other.
— Provide a stable base for other research.

PRS service approach

— Allocate sets of physical co-located nodes, isolated inside VLANs.

PRS code from HP Labs is being merged into the Apache Tashi project.
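
The deck does not show the PRS interface itself. The sketch below is a hypothetical illustration of the allocation model described above (co-located physical nodes fenced inside a VLAN); every name in it, from `PRSClient` to `allocate`, is invented for illustration and is not the actual HP Labs API.

```python
# Hypothetical sketch of a PRS-style allocation request; illustrative only.

class PRSClient:
    """Client for a PRS service that hands out VLAN-isolated node sets."""

    def __init__(self, endpoint):
        self.endpoint = endpoint  # placeholder service URL

    def allocate(self, num_nodes, co_located=True, vlan_isolated=True):
        """Request a mini-datacenter: co-located physical nodes on a
        private VLAN. Returns a reservation descriptor (stubbed here)."""
        request = {
            "nodes": num_nodes,
            "co_located": co_located,
            "vlan_isolated": vlan_isolated,
        }
        # A real client would submit the request to self.endpoint and wait
        # for the nodes to be fenced into their own VLAN before returning.
        return {"id": "prs-0001", "granted": request}

if __name__ == "__main__":
    prs = PRSClient("https://opencirrus.example.org/prs")  # hypothetical
    print(prs.allocate(num_nodes=32))
```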

Credit: Kevin Lai (HP), Richard Gass, Michael Ryan, Michael Kozuch, and David O’Hallaron (Intel)

SLIDE 15

Open Cirrus Stack - Tashi

An open-source Apache Software Foundation project sponsored by Intel, CMU, and HP: research infrastructure for cloud computing on Big Data.

— Implements the AWS interface.
— In daily production use on the Intel cluster for 6 months:

  • Manages pool of 80 physical nodes
  • ~20 projects/40 users from CMU, Pitt, Intel

— http://incubator.apache.org/projects/tashi
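
Because Tashi implements the AWS interface (see above), a stock EC2 client library should in principle be able to drive it. A minimal sketch using boto follows; the endpoint host, port, path, image id, and credentials are all placeholder assumptions about a particular deployment, not documented Tashi values.

```python
# Sketch: pointing a stock EC2 client at a Tashi cluster's EC2-compatible
# endpoint. Host, port, path, image id, and credentials are placeholders.
from boto.ec2.connection import EC2Connection
from boto.ec2.regioninfo import RegionInfo

region = RegionInfo(name="tashi", endpoint="tashi.example.org")
conn = EC2Connection(
    aws_access_key_id="YOUR_KEY",
    aws_secret_access_key="YOUR_SECRET",
    is_secure=False,
    port=8773,        # assumed port; deployment-specific
    region=region,
    path="/services/cloud",  # assumed path; deployment-specific
)

# Start a small virtual cluster via the standard EC2 call.
reservation = conn.run_instances("emi-12345678", min_count=4, max_count=4)
for inst in reservation.instances:
    print(inst.id, inst.state)
```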

Research focus:

— Location-aware co-scheduling of VMs, storage, and power.
— Integrated physical/virtual migration (using PRS).

Credit: Mike Kozuch, Michael Ryan, Richard Gass, Dave O’Hallaron (Intel), Greg Ganger, Mor Harchol-Balter, Julio Lopez, Jim Cipar, Elie Kravat, Anshul Ghandi, Michael Stroucken (CMU)

SLIDE 16

Tashi High-Level Design

[Diagram: a cluster manager (CM) coordinates a scheduler, a storage service, and a virtualization service across commodity cluster nodes.]

  • Cluster nodes are assumed to be commodity machines.
  • Services are instantiated through virtual machines.
  • Data location and power information is exposed to the scheduler and services.
  • The CM maintains databases and routes messages; its decision logic is limited.
  • Most decisions happen in the scheduler, which manages compute, storage, and power in concert.
  • The storage service aggregates the capacity of the commodity nodes to house Big Data repositories.
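
The slides do not show the scheduler's placement logic; the toy sketch below illustrates the location-aware idea (prefer hosts that already hold the VM's data, then hosts in the same rack). The data structures are illustrative, not Tashi's actual internals.

```python
# Toy sketch of location-aware VM placement; illustrative only.

def place_vm(block_replicas, free_hosts, rack_of):
    """block_replicas: hosts holding replicas of the VM's input data.
    free_hosts: hosts with spare capacity.
    rack_of: maps host -> rack id."""
    # 1. Best: a free host that already stores a replica (disk-local I/O).
    for host in free_hosts:
        if host in block_replicas:
            return host
    # 2. Next best: a free host in the same rack as some replica
    #    (traffic stays on the rack switch).
    replica_racks = {rack_of[h] for h in block_replicas}
    for host in free_hosts:
        if rack_of[host] in replica_racks:
            return host
    # 3. Fall back to any free host (cross-rack traffic).
    return next(iter(free_hosts), None)

racks = {"r1n1": "r1", "r1n2": "r1", "r2n1": "r2"}
print(place_vm({"r1n1"}, ["r2n1", "r1n2"], racks))  # -> r1n2 (same rack)
```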

SLIDE 17

Location Matters (calculated)

[Chart: calculated throughput per disk (MB/s, scale 0-300) for 40 racks × 30 nodes × 2 disks, comparing random vs. location-aware placement across Disk-1G, SSD-1G, Disk-10G, and SSD-10G. Location-aware placement wins by 3.6×, 11×, 3.5×, and 9.2×, respectively.]

SLIDE 18

Location Matters (measured)

[Chart: measured throughput per disk (MB/s, scale 0-40) on 2 racks × 14 nodes × 6 disks, for ssh- and xinetd-based transfers. Location-aware placement beats random placement by 2.9× and 4.7×, respectively.]

SLIDE 19

Open Cirrus Stack – Hadoop

An open-source Apache Software Foundation project sponsored by Yahoo!

— http://wiki.apache.org/hadoop/ProjectDescription

Provides a parallel programming model (MapReduce) and a distributed file system (HDFS)
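
As a concrete illustration of the MapReduce model, here is a minimal word count written for Hadoop Streaming, which lets the mapper and reducer be plain scripts reading stdin and writing stdout. The submission command in the comment assumes a standard hadoop-streaming jar is available; paths are placeholders.

```python
#!/usr/bin/env python
# Minimal MapReduce word count for Hadoop Streaming: the mapper emits
# "word\t1" pairs; Streaming sorts by key and feeds the grouped lines to
# the reducer, which sums counts per word. Example submission (paths are
# placeholders):
#   hadoop jar hadoop-streaming.jar -mapper 'wc.py map' \
#       -reducer 'wc.py reduce' -input in/ -output out/
import sys

def mapper():
    for line in sys.stdin:
        for word in line.split():
            print(f"{word}\t1")

def reducer():
    current, count = None, 0
    for line in sys.stdin:
        word, n = line.rstrip("\n").split("\t")
        if word != current:              # keys arrive sorted, so a new key
            if current is not None:      # means the previous one is done
                print(f"{current}\t{count}")
            current, count = word, 0
        count += int(n)
    if current is not None:
        print(f"{current}\t{count}")

if __name__ == "__main__":
    mapper() if sys.argv[1:] == ["map"] else reducer()
```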

SLIDE 20

Typical Web Service

[Diagram: an external client sends a query over HTTP to the data center; an HTTP server dispatches to application servers backed by databases, and the result is returned.]

Examples: web sites serving dynamic content.

Characteristics:

  • Small queries and results
  • Little client computation
  • Moderate server computation
  • Moderate data accessed per query

SLIDE 21

Big Data Service

[Diagram: an external client's query flows through a parallel query server to a data-intensive computing system (e.g., Hadoop): a parallel compute server over a parallel data server, with source and derived datasets stored in a parallel file system (e.g., GFS, HDFS) and fed by external data sources; the result is returned to the client.]

Examples:

  • Search
  • Photo scene completion
  • Log processing
  • Science analytics

Characteristics:

  • Small queries and results
  • Massive data and computation performed on the server

SLIDE 22

Streaming Data Service

[Diagram: an external client with sensors sends a continuous query stream to a parallel query server; a parallel compute server over a parallel data server (holding source and derived datasets, fed by external data sources) returns continuous query results.]

Characteristics:

  • Application lives on client
  • Client uses cloud as an accelerator
  • Data transferred with query
  • Variable, latency-sensitive HPC on the server
  • Often combines with Big Data service

Examples: perceptual computing on high data-rate sensors (real-time brain activity detection, object recognition, gesture recognition).
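
A minimal sketch of the accelerator pattern these characteristics describe: the application loop stays on the client and ships each frame with its query to a cloud endpoint. The service URL, payload format, and response fields are hypothetical.

```python
# Sketch of the "cloud as accelerator" pattern: the application runs on
# the client, shipping each sensor frame to a cloud service and acting on
# the returned result. Endpoint and payload format are hypothetical.
import json
import urllib.request

SERVICE = "http://cloud.example.org/recognize"  # hypothetical endpoint

def classify_frame(frame_bytes):
    """Send one frame with the query and block on the (latency-sensitive)
    result. Data travels with every query, unlike the Big Data case."""
    req = urllib.request.Request(
        SERVICE, data=frame_bytes,
        headers={"Content-Type": "application/octet-stream"})
    with urllib.request.urlopen(req, timeout=0.2) as resp:
        return json.load(resp)  # e.g. {"gesture": "left", "conf": 0.93}

# for frame in camera_frames():      # client-side capture loop
#     result = classify_frame(frame)
#     render(result)                 # application logic stays local
```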

SLIDE 23

Streaming Data Service

Gestris – Interactive Gesture Recognition

Two-player "Gestris" (gesture-Tetris) implementation

  • 2 video sources
  • Uses a simplified volumetric event detection algorithm
  • 10 cores, 3 GHz each:
      • 1 for camera input and scaling
      • 1 for game logic and display
      • 8 for volumetric matching (4 for each video stream)
  • Achieves the full 15 fps rate

[Photo: an arm gesture selects the action.]

Credit: Lily Mummert, Babu Pillai, Rahul Sukthankar (Intel), Martial Hebert, Pyre Matikainen (CMU)
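
The actual Gestris code is not shown in the deck. The sketch below is an illustrative rendering of the core split described above, with each stream's volumetric matching fanned out to its own pool of four workers; the matching function is a stand-in.

```python
# Illustrative sketch (not the actual SLIPstream/Gestris code) of the
# core split above: each video stream gets its own pool of 4 matching
# workers, so 8 cores run volumetric matching in parallel.
from multiprocessing import Pool

def volumetric_match(frame_window):
    """Stand-in for the simplified volumetric event detector: score a
    short window of frames against gesture templates."""
    return sum(frame_window) % 2   # placeholder score, illustrative only

def run_stream(frames, workers=4):
    with Pool(processes=workers) as pool:
        # Slide frames into overlapping windows and match them in parallel.
        windows = [frames[i:i + 4] for i in range(len(frames) - 3)]
        return pool.map(volumetric_match, windows)

if __name__ == "__main__":
    # Two streams x 4 workers each = 8 matching cores, as on the slide.
    left = run_stream(list(range(30)))    # player 1 camera
    right = run_stream(list(range(30)))   # player 2 camera
    print(len(left), len(right))
```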

SLIDE 24

Streaming Data Meets Big Data

Real-time Brain Activity Decoding

  • Magnetoencephalography (MEG) measures the magnetic fields associated with brain activity.
  • Its temporal and spatial resolution offers unprecedented insight into brain dynamics.

[Images: ECoG and MEG sensing modalities.]

Credit: Dean Pomerleau (Intel), Tom Mitchell, Gus Sudre and Mark Palatucci (CMU), Wei Wang, Doug Weber and Anto Bagic (UPitt)

SLIDE 25

Localizing Sources of Magnetic Activity

An ill-posed problem that applies to both MEG and EEG. Very computationally expensive. Important for better mapping to fMRI results, for furthering neuroscience understanding of brain processes, and (maybe) for improving decoding.

Goal: determine the spatiotemporal pattern of brain activity most likely to have caused the measured magnetic field.

[Figure: magnetic field measurements (left) and estimated brain activity (right).]
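
The slide does not say which estimator the project uses; one standard formulation of this inverse problem is the regularized minimum-norm estimate, where b(t) is the sensor measurement vector, L the lead-field (forward) matrix from the boundary model, s(t) the source amplitudes, and λ a regularization weight:

```latex
% Regularized minimum-norm estimate for the MEG/EEG inverse problem
% (one standard approach; the slide does not specify the estimator used).
\hat{s}(t) = \arg\min_{s} \; \|b(t) - L s\|_2^2 + \lambda \|s\|_2^2
           = L^{\top} \left( L L^{\top} + \lambda I \right)^{-1} b(t)
```

Since the number of sources far exceeds the number of sensors, the problem is underdetermined; the λ term picks the smallest-norm activity pattern consistent with the measurements, which is what makes the computation expensive at high spatial resolution.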

SLIDE 26

Big Data Background Processing

Source localization pipeline

[Pipeline diagram, combining brain structural information (MRI) with electromagnetic field measurements (MEG/EEG):]

  • Pre-processing and filtering of MEG or EEG field data (~1 hr/session)
  • Reconstruct the brain from MRI data (~40 hr/subject)
  • Create a co-registered boundary model (~1 hr/subject)
  • Model the electromagnetic field from sources to sensors (~5 min/session)
  • Brain activity estimates (movies, time series) (~15 min/session)

SLIDE 27

Streaming/Big Data Service

Real-Time MEG/EEG Decoding

[Diagram: a stimulus is presented during MEG/EEG imaging; the electromagnetic field data is preprocessed and filtered, then passed to the cloud cluster for source localization and brain activity decoding. Off-line source modeling and off-line decoder training each run once; the decoded result (e.g., "hand" among candidates hand, foot, celery, airplane) is returned in real time.]

Real-Time Decoding Of Brain Activity
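
A schematic sketch of the real-time path on this slide; every name is a placeholder rather than the project's code. The point it illustrates: the expensive steps (source modeling, decoder training) run once offline, so the per-sample loop only filters, applies a precomputed inverse operator, and decodes.

```python
# Sketch of the real-time decoding loop; all names and sizes are
# placeholders, not the project's actual code.
import numpy as np

def preprocess(sample):
    """Stand-in for 'preprocess & filter' (bandpass, artifact removal)."""
    return sample - sample.mean()

def realtime_decode(meg_stream, inverse_op, decode):
    for raw in meg_stream:                 # one MEG sample vector at a time
        filtered = preprocess(raw)
        activity = inverse_op @ filtered   # source localization (precomputed)
        yield decode(activity)             # decoder trained offline

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    inverse_op = rng.standard_normal((1000, 306))  # sources x sensors (toy)
    labels = ["hand", "foot", "celery", "airplane"]
    decode = lambda a: labels[int(abs(a).argmax()) % len(labels)]  # toy decoder
    stream = (rng.standard_normal(306) for _ in range(3))
    print(list(realtime_decode(stream, inverse_op, decode)))
```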

SLIDE 28

Summary and Lessons

  • Using the cloud as an accelerator for interactive streaming/big data apps is an important usage model.
  • Location-aware and power-aware workload scheduling are still open problems.
  • Integrated physical/virtual allocations are needed to combat cluster squatting.
  • Storage models are still a problem.

— GFS-style storage systems are not mature, and the impact of SSDs is unknown.

We need an open-source service architecture and reference implementations.

— Access model
— Local and global services
— Application frameworks

Need to investigate new application frameworks

— MapReduce/Hadoop is not always appropriate.