Trends in Managing Data at the Petabyte Scale Steve Kleiman Sr. - - PowerPoint PPT Presentation



SLIDE 1

Trends in Managing Data at the Petabyte Scale

Steve Kleiman

  • Sr. VP & CTO
SLIDE 2

Before we begin…

Disk reliability

– SIGMETRICS ’07: An Analysis of Latent Sector Errors in Disk Drives

  • Lakshmi Bairavasundaram, Garth Goodson, Jiri Schindler, Shankar Pasupathy

– Symposium on Reliability and Maintainability ’03,’04,’05

  • John Elerath and Sandeep Shah
SLIDE 3

Petabyte Environments are Here!

2006 Q1-Q3 NAS+SAN Petabytes Shipped

Vendor     PB
EMC        214
NetApp*    199
HP         180
IBM         92
Hitachi     64
Other      247
Total      996

Source: IDC, Dec 2006

*Current quarterly run rate >100PB

YoY Growth >100%

~25 NetApp customers with >1PB

Largest: ~33PB

SLIDE 4

The Growing Burden of Data Ownership

Operational burdens

– Managing the data explosion

  • 50-100%
  • Unstructured, semi-structured, structured

– Increasing dependence on data

  • Ensuring 100% availability
  • Protecting data from disasters

– Rapidly deploying new applications

– Global operations

  • Multiple data centers
  • Many remote offices

Financial burdens

– Controlling costs

  • Equipment, people, processes
  • Utilization
SLIDE 5

New/Hidden Burdens of Data Ownership

Legal burdens

– Complying with regulations

  • Discovery
  • Preventing unauthorized access
  • Retention

Social burdens

– Protecting your reputation

  • Disclosing data loss

Geo-political burdens

– Multiple cultures & legal systems

SLIDE 6

Traditional Infrastructure Build-out: Application-centric Silos

Incompatible hardware

Incompatible software

Different processes

Lots of experts

Low utilization

[Figure: Applications and their Primary Storage, labeled Tier 1, Tier 2, Tier 3, Tier 1; callout: Good Quality of Service]

SLIDE 7

It’s Not Just the Primary Storage

Primary, DR, Test & Dev, Backup, Archive

SLIDE 8

Separation of Data from Physical Containers

Global Namespace, Scale-out, Multiple Tiers

[Figure: Data on DAS; Data on Networked Storage]

Snapshots, Clones, Thin-provisioning, Data mirroring

SLIDE 9

Unified Protection & Enablement Environment

[Figure: each Data set with its copies: Backup, DR, Archive, Test/Dev, Mining]

Data Protection, Application Enablement

SLIDE 10

Unified Protection & Enablement Environment

[Figure: each Data set with its copies: Backup, DR, Archive, Test/Dev, Mining]

Data Protection, Application Enablement

SLIDE 11

Non-copy Data Properties

Data

Security, Classification, Access Control, QOS, Compliance, Namespace

SLIDE 12

Unified Protection & Enablement Environment

[Figure: each Data set with its copies: Backup, DR, Archive, Test/Dev, Mining]

Data Protection, Application Enablement

SLIDE 13

The Storage Admin’s Challenge

Storage Manager Application Managers

SLIDE 14

Managing The Copies

1 Oracle database:

– 17 tables on Primary
– + 17 tables on remote DR site
– + 17 mirror relationships between primary and DR
– + 17 tables on secondary dev & test
– + 17 mirror relationships between primary and secondary
– + Backups
– + Archive copies

Or 1 Dataset
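The tally on this slide can be worked through in a few lines of Python. This is only a sketch: the dictionary labels are wording chosen here for illustration, and backups and archive copies are left out because the slide gives no counts for them.

```python
# Objects an administrator tracks for one Oracle database, per the slide.
# Backups and archive copies are omitted: the slide gives no counts for them.
TABLES = 17

managed_objects = {
    "tables on primary": TABLES,
    "tables on remote DR site": TABLES,
    "mirror relationships, primary to DR": TABLES,
    "tables on secondary dev & test": TABLES,
    "mirror relationships, primary to secondary": TABLES,
}

total = sum(managed_objects.values())
print(f"individually managed objects: {total}")  # 5 * 17 = 85
print("managed as a dataset: 1")
```

Even before backups and archives are counted, the administrator is tracking 85 separate objects where the dataset view has one.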

SLIDE 15

What’s a “Dataset”?

Dataset: A collection of data meaningful to the user or data administrator having similar properties

– A set of database tables
– A home directory
– A server root LUN

Datasets have properties

– Redundancy, Disaster recovery
– Compliance, Saved versions
– QOS
– Security, Access control
– ???

Datasets can span storage servers

– A higher level of abstraction allows automation
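One way to picture the dataset abstraction is as a small data structure: a named group of members that share one set of properties and may live on more than one storage server. The sketch below is hypothetical throughout. The `Dataset` class, the `server:path` member format, the `filer1`/`filer2` names, and the property keys are all assumptions made for illustration, not any product's API.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Hypothetical model of a dataset: members plus shared properties."""
    name: str
    # Members are written "server:path" (an assumed convention).
    members: list[str] = field(default_factory=list)
    # Properties apply to the dataset as a whole, not to each member.
    properties: dict[str, str] = field(default_factory=dict)

    def storage_servers(self) -> set[str]:
        # A dataset can span storage servers, so collect every server named.
        return {member.split(":", 1)[0] for member in self.members}


# 17 database tables spread over two (made-up) storage servers.
payroll = Dataset(
    name="payroll-db",
    members=[f"filer{1 + i % 2}:/oracle/table{i:02d}" for i in range(17)],
    properties={"disaster_recovery": "required", "security": "high"},
)

print(len(payroll.members), sorted(payroll.storage_servers()))
```

Because the properties hang off the dataset rather than off each table, automation can act on one object instead of seventeen.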

SLIDE 16

Simplification Through Integrated Data Management

Application admins set data properties

Properties assigned to logical sets of data

Properties define business requirements for data

Storage admins create & manage processes

Processes deliver on data requirements

Automation & service delivery become possible

SLIDE 17

Simplification Through Integrated Data Management

Properties

– Recovery time objective: 0 sec
– Applicable regulations: SEC-17A
– Security level: high

Policies

– Low RTO: use synchronous mirroring
– SEC-17A: enable SnapLock; delete after 7 years
– High security: enable encryption
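The split between properties and policies can be sketched as a lookup from property values to process steps. Everything named in the code is hypothetical: the property keys, the `actions_for` function, and the `LOW_RTO_SEC` threshold are assumptions for illustration (the slide says only "Low RTO"), and the action strings are copied from the slide rather than taken from a real management API.

```python
LOW_RTO_SEC = 0  # assumption: the slide says only "Low RTO", with no threshold

# Properties set by the application admin (values taken from the slide).
properties = {
    "recovery_time_objective_sec": 0,
    "applicable_regulations": ["SEC-17A"],
    "security_level": "high",
}


def actions_for(props: dict) -> list[str]:
    """Storage-admin policies: map property values to process steps."""
    actions = []
    # A missing RTO is treated as "no recovery-time requirement".
    if props.get("recovery_time_objective_sec", float("inf")) <= LOW_RTO_SEC:
        actions.append("use synchronous mirroring")
    if "SEC-17A" in props.get("applicable_regulations", []):
        actions.append("enable SnapLock; delete after 7 years")
    if props.get("security_level") == "high":
        actions.append("enable encryption")
    return actions


for action in actions_for(properties):
    print(action)
```

The application admin edits only `properties`; the storage admin can change what `actions_for` does when the technology changes, without touching the properties themselves.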

SLIDE 18

Simplification Through Integrated Data Management

Right decisions are made by the right people

Easier to change and automate

– Goal: Automate 80% of workflow

Data properties can remain constant while processes adapt to new technologies

SLIDE 19

“Two Worlds” vs. Storage Virtualization Architecture

Vendor 1 Vendor 2

SLIDE 20

Long Term Trends

Unification of capabilities in a single storage infrastructure

Property-based dataset management adopted for simplification and automation

It’s starting to happen now

– Unified model
– Scale-out & Grid
– Value-added copies
– Virtualization
– Data sets & properties
– Heterogeneous replication

SLIDE 21

Summary

It’s good to be in storage!