Open Cirrus: A Global Testbed for Cloud Computing Research
David O'Hallaron, Director, Intel Labs Pittsburgh, Carnegie Mellon University
Open Cirrus Testbed: http://opencirrus.intel-research.net
Sponsored by HP, Intel, and Yahoo!
2 Dave O’Hallaron – DIC Workshop, 2009
Open Cirrus Testbed
- Sponsored by HP, Intel, and Yahoo! (with additional support from NSF).
- 9 sites worldwide, target of around 20 in the next two years.
- Each site has 1,000-4,000 cores.
- Shared hardware infrastructure (~15K cores), services, research, apps.
http://opencirrus.intel-research.net
Open Cirrus Context
Goals
- 1. Foster new systems and services research around cloud computing
- 2. Catalyze open-source stack and APIs for the cloud
Motivation
— Enable more tier-2 and tier-3 public and private cloud providers
How are we different?
— Support for systems research and applications research
- Access to bare metal, integrated virtual-physical migration
— Federation of heterogeneous datacenters
- Global sign-on, monitoring, and storage services.
Intel BigData Cluster
Open Cirrus site hosted by Intel Labs Pittsburgh
— Operational since Jan 2009.
— 180 nodes, 1440 cores, 1416 GB DRAM, 500 TB disk.
Supporting 50 users, 20 projects from CMU, Pitt, Intel, GaTech
— Systems research: cluster management, location- and power-aware scheduling, physical-virtual migration (Tashi), cache-savvy algorithms (Hi-Spade), realtime streaming frameworks (SLIPstream), optical datacenter interconnects (CloudConnect), log-based architectures (LBA).
— Applications research: machine translation, speech recognition, programmable matter simulation, ground model generation, online education, realtime brain activity decoding, realtime gesture and object recognition, federated perception, automated food recognition.
Idea for a research project on Open Cirrus?
— Send short email abstract to Mike Kozuch, Intel Labs Pittsburgh, michael.a.kozuch@intel.com
Open Cirrus Stack
Diagram: compute + network + storage resources, power + cooling, and a management and control subsystem, exposed through the Physical Resource Set (PRS) service.
Credit: John Wilkes (HP)
Open Cirrus Stack
Diagram: the PRS service hosts several PRS clients (research, Tashi, NFS storage service, HDFS storage service), each with its own "physical data center".
Open Cirrus Stack
Diagram: virtual clusters (e.g., Tashi) layered on the PRS service alongside the research, NFS, and HDFS storage services.
Open Cirrus Stack
Diagram: a BigData application layered over Hadoop, a Tashi virtual cluster, and the PRS service.
- 1. Application running
- 2. On Hadoop
- 3. On Tashi virtual cluster
- 4. On a PRS
- 5. On real hardware
Open Cirrus Stack
Diagram: same stack, now adding experiment save/restore.
Open Cirrus Stack
Diagram: same stack, now adding platform services (alongside experiment save/restore).
Open Cirrus Stack
Diagram: same stack, now adding user services (alongside experiment save/restore and platform services).
Open Cirrus Stack
Diagram: the complete stack: PRS, research, Tashi, NFS and HDFS storage services, virtual clusters, Hadoop, BigData applications, experiment save/restore, platform services, and user services.
System Organization
Compute nodes are divided into dynamically allocated, VLAN-isolated PRS subdomains. Apps switch back and forth between virtual and physical.
Diagram roles: open service research; Tashi development; proprietary service research; apps running in a VM management infrastructure (e.g., Tashi); open workload monitoring and trace collection; production storage service.
Open Cirrus Stack - PRS
PRS service goals
— Provide mini-datacenters to researchers
— Isolate experiments from each other
— Provide a stable base for other research
PRS service approach
— Allocate sets of physical co-located nodes, isolated inside VLANs.
The PRS code from HP Labs is being merged into the Apache Tashi project.
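The PRS approach above can be sketched as a toy allocator: pick physically co-located nodes from a pool and fence them off in their own VLAN. The class, method names, and VLAN numbering below are invented for illustration; this is not the HP Labs code.

```python
# Toy PRS-style allocator: co-located node sets, each isolated in a VLAN.
from collections import defaultdict

class PRSAllocator:
    def __init__(self, nodes):
        # nodes: list of (node_id, rack_id) pairs
        self.free = defaultdict(list)
        for node_id, rack_id in nodes:
            self.free[rack_id].append(node_id)
        self.next_vlan = 100          # arbitrary starting VLAN id

    def allocate(self, count):
        """Return (vlan_id, node_ids) for `count` co-located nodes, or None."""
        for rack_id, nodes in self.free.items():
            if len(nodes) >= count:   # require all nodes from one rack
                allocated, self.free[rack_id] = nodes[:count], nodes[count:]
                vlan_id, self.next_vlan = self.next_vlan, self.next_vlan + 1
                return vlan_id, allocated
        return None                   # no single rack has enough free nodes

# 8 nodes, 4 per rack
pool = PRSAllocator([("n%d" % i, "rack%d" % (i // 4)) for i in range(8)])
print(pool.allocate(3))   # -> (100, ['n0', 'n1', 'n2'])
```

A real PRS also programs the switches so the VLAN isolation is enforced in the network, not just in the allocator's bookkeeping.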
Credit: Kevin Lai (HP), Richard Gass, Michael Ryan, Michael Kozuch, and David O’Hallaron (Intel)
Open Cirrus Stack - Tashi
An open source Apache Software Foundation project sponsored by Intel, CMU, and HP. Research infrastructure for cloud computing on Big Data
— Implements the AWS interface
— In daily production use on the Intel cluster for 6 months
- Manages pool of 80 physical nodes
- ~20 projects/40 users from CMU, Pitt, Intel
— http://incubator.apache.org/projects/tashi
Research focus:
— Location-aware co-scheduling of VMs, storage, and power
— Integrated physical/virtual migration (using PRS)
Credit: Mike Kozuch, Michael Ryan, Richard Gass, Dave O'Hallaron (Intel), Greg Ganger, Mor Harchol-Balter, Julio Lopez, Jim Cipar, Elie Krevat, Anshul Gandhi, Michael Stroucken (CMU)
Tashi High-Level Design
Diagram: a Cluster Manager (CM) and a Scheduler coordinate a set of cluster nodes that host the Storage Service and the Virtualization Service.
- Cluster nodes are assumed to be commodity machines.
- Services are instantiated through virtual machines.
- Data location and power information is exposed to the scheduler and services.
- The CM maintains databases and routes messages; its decision logic is limited.
- Most decisions happen in the scheduler, which manages compute, storage, and power in concert.
- The storage service aggregates the capacity of the commodity nodes to house Big Data repositories.
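The location-aware placement idea above can be sketched in a few lines: given the nodes holding replicas of a VM's input data, prefer a free node that already stores a replica. The function and node names here are illustrative, not Tashi's real API.

```python
# Minimal sketch of location-aware VM placement in the spirit of Tashi's
# scheduler (names and structures invented for illustration).
def place_vm(block_replicas, free_nodes):
    """Prefer a free node that already stores a replica of the VM's input data."""
    for node in block_replicas:
        if node in free_nodes:
            return node            # compute moves to the data: local disk reads
    return next(iter(free_nodes))  # fall back to any free node: remote reads

free = {"node2", "node5", "node7"}
replicas = ["node1", "node5", "node9"]   # HDFS-style 3-way replication
print(place_vm(replicas, free))          # -> node5
```

The real scheduler weighs power and multiple resource dimensions as well, but the core win (shown on the next slides) comes from this data-locality preference.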
Location Matters (calculated)
Chart: calculated throughput per disk (MB/s) for 40 racks * 30 nodes * 2 disks, comparing random vs. location-aware placement across Disk-1G, SSD-1G, Disk-10G, and SSD-10G configurations. Location-aware placement wins by 3.6X, 11X, 3.5X, and 9.2X respectively.
Location Matters (measured)
Chart: measured throughput per disk (MB/s) for 2 racks * 14 nodes * 6 disks, comparing random vs. location-aware placement on two workloads (ssh, xinetd); location-aware placement wins by 2.9X and 4.7X.
Open Cirrus Stack – Hadoop
An open-source Apache Software Foundation project sponsored by Yahoo!
— http://wiki.apache.org/hadoop/ProjectDescription
Provides a parallel programming model (MapReduce), a distributed file system (HDFS), and a parallel database.
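The MapReduce model is easiest to see with word count, the canonical example. The sketch below simulates the map, shuffle, and reduce phases in plain Python; a real Hadoop job would run the same mapper and reducer in parallel across the cluster over files in HDFS.

```python
# Word count expressed as MapReduce, simulated in-process.
from itertools import groupby
from operator import itemgetter

def mapper(line):
    # emit one (word, 1) pair per word
    for word in line.split():
        yield (word, 1)

def reducer(word, counts):
    # sum all counts for one key
    return (word, sum(counts))

def run_job(lines):
    # map phase: apply mapper to every input record
    pairs = [kv for line in lines for kv in mapper(line)]
    # shuffle phase: group intermediate pairs by key
    pairs.sort(key=itemgetter(0))
    # reduce phase: one reducer call per distinct key
    return dict(reducer(k, (c for _, c in grp))
                for k, grp in groupby(pairs, key=itemgetter(0)))

print(run_job(["big data big cloud", "cloud data data"]))
# -> {'big': 2, 'cloud': 2, 'data': 3}
```

The framework's value is that the shuffle, parallelism, and fault tolerance are handled for you; the programmer writes only the mapper and reducer.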
Typical Web Service
Diagram: an external client sends a query over HTTP to the data center, where an HTTP server fans out to application servers backed by databases and returns the result.
Examples: web sites serving dynamic content.
Characteristics:
- Small queries and results
- Little client computation
- Moderate server computation
- Moderate data accessed per query
Big Data Service
Diagram: an external client sends a query to a data-intensive computing system (e.g., Hadoop): a parallel query server in front of parallel compute servers and parallel data servers, with source and derived datasets stored in a parallel file system (e.g., GFS, HDFS) and fed by external data sources; the result is returned to the client.
Examples:
- Search
- Photo scene completion
- Log processing
- Science analytics
Characteristics:
- Small queries and results
- Massive data and computation performed on server
Streaming Data Service
Diagram: an external client and its sensors send a continuous query stream to a parallel query server backed by parallel compute servers and parallel data servers holding source and derived datasets (fed by external data sources); continuous query results stream back to the client.
Characteristics:
- Application lives on client
- Client uses cloud as an accelerator
- Data transferred with query
- Variable, latency-sensitive HPC on server
- Often combines with Big Data service
Examples: perceptual computing on high data-rate sensors: real-time brain activity detection, object recognition, gesture recognition.
Streaming Data Service
Gestris – Interactive Gesture Recognition
Two-player "Gestris" (gesture-Tetris) implementation
- 2 video sources
- Uses a simplified volumetric event detection algorithm
- 10 cores, 3 GHz each:
  - 1 for camera input and scaling
  - 1 for game logic + display
  - 8 for volumetric matching (4 for each video stream)
- Achieves the full 15 fps rate
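The core-split above can be sketched as a fan-out: each frame's volumetric matching is divided across a pool of workers while capture and display run elsewhere. The chunking scheme and the toy matching kernel below are invented for illustration; they are not the Gestris code.

```python
# Sketch: per-frame volumetric matching fanned out over 8 workers
# (4 per video stream in the real system).
from concurrent.futures import ThreadPoolExecutor

def match_chunk(chunk):
    # stand-in for the simplified volumetric event-detection kernel
    return max(chunk) > 60          # toy "motion energy above threshold" test

def process_frame(frame, pool, n_workers=8):
    # interleave the frame's samples across the matching workers
    chunks = [frame[i::n_workers] for i in range(n_workers)]
    return any(pool.map(match_chunk, chunks))  # did any worker see an event?

with ThreadPoolExecutor(max_workers=8) as pool:
    frame = list(range(64))            # toy frame data
    print(process_frame(frame, pool))  # -> True (values above 60 present)
```

Sustaining 15 fps then reduces to keeping the slowest worker's per-frame latency under ~66 ms, which is why the matching work is split evenly.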
An arm gesture selects the action.
Credit: Lily Mummert, Babu Pillai, Rahul Sukthankar (Intel), Martial Hebert, Pyry Matikainen (CMU)
Streaming Data Meets Big Data
Real-time Brain Activity Decoding
- Magnetoencephalography (MEG) measures the magnetic fields associated with brain activity.
- Its temporal and spatial resolution offers unprecedented insight into brain dynamics.
Credit: Dean Pomerleau (Intel), Tom Mitchell, Gus Sudre and Mark Palatucci (CMU), Wei Wang, Doug Weber and Anto Bagic (UPitt)
Localizing Sources of Magnetic Activity
An ill-posed problem that applies to both MEG and EEG, and one that is very computationally expensive. Important for better mapping to fMRI results, deeper neuroscience understanding of brain processes, and (maybe) improved decoding.
Goal: determine spatiotemporal pattern of brain activity most likely to have caused measured magnetic field
Diagram: magnetic field measurements are inverted into estimated brain activity.
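The standard way to tame this ill-posedness is regularization. Below is a minimal Tikhonov-regularized minimum-norm estimate on synthetic data; the lead-field matrix, dimensions, and regularization strength are invented for illustration, and real MEG pipelines use far richer forward models.

```python
# Minimum-norm source estimate: with many more sources than sensors,
# pick the smallest-norm activity pattern that explains the measurements.
import numpy as np

rng = np.random.default_rng(0)
n_sensors, n_sources = 30, 200                    # far more sources than sensors
L = rng.standard_normal((n_sensors, n_sources))   # lead-field (forward) matrix
s_true = np.zeros(n_sources)
s_true[[10, 50]] = 1.0                            # two active sources
b = L @ s_true + 0.01 * rng.standard_normal(n_sensors)  # measured field

lam = 1e-2                                        # regularization strength
# s_hat = L^T (L L^T + lam*I)^{-1} b
s_hat = L.T @ np.linalg.solve(L @ L.T + lam * np.eye(n_sensors), b)
print(np.argsort(-np.abs(s_hat))[:5])             # strongest estimated sources
```

The expense the slide mentions comes from doing this (or richer beamforming/Bayesian variants) over thousands of candidate source locations and many time points per session.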
Big Data Background Processing
Source localization pipeline
Inputs: MRI data (brain structural information) and MEG or EEG field data (electro-magnetic field measurements).
- Pre-processing & filtering of the field data (~1 hr/session)
- Reconstruct the brain from MRI (~40 hr/subject)
- Create a co-registered boundary model (~1 hr/subject)
- Model the electro-magnetic field from sources to sensors (~5 min/session)
- Produce brain activity estimates (movies, time series) (~15 min/session)
Streaming/Big Data Service
Real-Time MEG/EEG Decoding
Diagram (real-time decoding of brain activity): a stimulus is captured by MEG/EEG imaging and the data is preprocessed and filtered; the electro-magnetic field data is streamed to a cloud cluster, which runs source localization and brain activity decoding to produce brain activity estimates. Off-line source modeling and off-line decoder training are each done once. The decoded result picks the perceived concept (e.g., "hand") from candidates such as hand, foot, celery, and airplane.
Summary and Lessons
- Using the cloud as an accelerator for interactive streaming/big data apps is an important usage model.
- Location-aware and power-aware workload scheduling are still open problems.
- Need integrated physical/virtual allocations to combat cluster squatting.
- Storage models are still a problem.
— GFS-style storage systems not mature, impact of SSDs unknown
We need open source service architecture and reference implementations.
— Access model — Local and global services — Application frameworks
Need to investigate new application frameworks
— MapReduce/Hadoop is not always appropriate