Trends in Managing Data at the Petabyte Scale


  1. Trends in Managing Data at the Petabyte Scale
     Steve Kleiman, Sr. VP & CTO

  2. Before we begin…
     ▪ Disk reliability
       – SIGMETRICS ’07: An Analysis of Latent Sector Errors in Disk Drives
         • Lakshmi Bairavasundaram, Garth Goodson, Jiri Schindler, Shankar Pasupathy
       – Symposium on Reliability and Maintainability ’03, ’04, ’05
         • John Elerath and Sandeep Shah

  3. Petabyte Environments are Here!
     ▪ ~25 NetApp customers with >1PB
     ▪ Largest: ~33PB

     Petabytes Shipped, 2006 Q1-Q3, NAS+SAN
     Vendor      PB
     EMC        214
     NetApp*    199
     HP         180
     IBM         92
     Hitachi     64
     Other      247
     Total      996

     * Current quarterly run rate >100PB
     YoY Growth >100%
     Source: IDC, Dec 2006
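
The shipment figures on slide 3 can be checked with a few lines of arithmetic. The sketch below is illustrative only: it copies the per-vendor numbers from the slide, confirms they add up to the stated total, and derives each vendor's share. The names `shipped_pb`, `total_pb`, and `share_percent` are made up for this example and do not come from the presentation.

```python
# Petabytes shipped, 2006 Q1-Q3, NAS+SAN (figures copied from slide 3;
# the slide cites IDC, Dec 2006 as the source).
shipped_pb = {
    "EMC": 214,
    "NetApp": 199,
    "HP": 180,
    "IBM": 92,
    "Hitachi": 64,
    "Other": 247,
}

# Sum of the per-vendor rows; the slide states a total of 996 PB.
total_pb = sum(shipped_pb.values())


def share_percent(vendor: str) -> float:
    """Return the vendor's share of all petabytes shipped, in percent."""
    return 100.0 * shipped_pb[vendor] / total_pb


# Print vendors from largest to smallest shipment volume.
for vendor, pb in sorted(shipped_pb.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{vendor:8s} {pb:4d} PB  {share_percent(vendor):5.1f}%")
print(f"{'Total':8s} {total_pb:4d} PB")
```

The shares are derived values, not numbers that appear on the slide.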

  4. The Growing Burden of Data Ownership
     ▪ Operational burdens
       – Managing the data explosion
         • 50-100%
         • Unstructured, semi-structured, structured
       – Increasing dependence on data
         • Ensuring 100% availability
         • Protecting data from disasters
       – Rapidly deploying new applications
       – Global operations
         • Multiple data centers
         • Many remote offices
     ▪ Financial burdens
       – Controlling costs
         • Equipment, people, processes
         • Utilization

  5. New/Hidden Burdens of Data Ownership
     ▪ Legal burdens
       – Complying with regulations
         • Discovery
         • Preventing unauthorized access
         • Retention
     ▪ Social burdens
       – Protecting your reputation
         • Disclosing data loss
     ▪ Geo-political burdens
       – Multiple cultures & legal systems

  6. Traditional Infrastructure Build-out: Application-centric Silos
     [Diagram labels: Applications; Primary Storage; Tier 1, Tier 1, Tier 2, Tier 3]
     ▪ Good Quality of Service
     ▪ Incompatible hardware
     ▪ Incompatible software
     ▪ Different processes
     ▪ Lots of experts
     ▪ Low utilization

  7. It’s Not Just the Primary Storage
     [Diagram labels: Primary, DR, Test & Dev, Backup, Archive]

  8. Separation of Data from Physical Containers
     [Diagram labels: Data; DAS; Networked Storage; Snapshots; Clones; Thin-provisioning; Data mirroring; Global Namespace; Scale-out; Multiple Tiers]

  9. Unified Protection & Enablement Environment
     [Diagram labels: Data, DR, Backup, Archive, Test/Dev, Mining; groups: Data Protection, Application Enablement]

  10. Unified Protection & Enablement Environment
      [Diagram labels: Data, DR, Backup, Archive, Test/Dev, Mining; groups: Data Protection, Application Enablement]

  11. Non-copy Data Properties
      [Diagram labels: Data; Security; Compliance; Classification; Access Control; QOS; Namespace]

  12. Unified Protection & Enablement Environment
      [Diagram labels: Data, DR, Backup, Archive, Test/Dev, Mining; groups: Data Protection, Application Enablement]

  13. The Storage Admin’s Challenge
      [Diagram labels: Application Managers; Storage Manager]

  14. Managing The Copies
      1 Oracle database:
        17 tables on Primary
        + 17 tables on remote DR site
        + 17 mirror relationships between primary and DR
        + 17 tables on secondary dev & test
        + 17 mirror relationships between primary and secondary
        + Backups
        + Archive copies
      Or 1 Dataset
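
To make the contrast on slide 14 concrete, the sketch below tallies the objects the slide enumerates for a single Oracle database. It is a rough illustration, not a model of any product: the slide lists backups and archive copies without counts, so they are left out of the total, and the variable names are assumptions made for this example.

```python
TABLES = 17  # number of tables in the Oracle database on slide 14

# Objects the slide counts explicitly. Backups and archive copies are
# listed on the slide without numbers, so they are not included here.
managed_objects = {
    "tables on primary": TABLES,
    "tables on remote DR site": TABLES,
    "mirror relationships, primary to DR": TABLES,
    "tables on secondary dev & test": TABLES,
    "mirror relationships, primary to secondary": TABLES,
}

# Managing each object individually versus managing one dataset.
per_object_total = sum(managed_objects.values())
per_dataset_total = 1

print(f"Managed individually: at least {per_object_total} objects")
print(f"Managed as a dataset: {per_dataset_total} object")
```

The total is a lower bound, since every backup and archive copy adds further objects to track.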

  15. What’s a “Dataset”?
      ▪ Dataset: A collection of data meaningful to the user or data administrator having similar properties
        – A set of database tables
        – A home directory
        – A server root LUN
      ▪ Datasets have properties
        – Redundancy, Disaster recovery
        – Compliance, Saved versions
        – QOS
        – Security, Access control
        – ???
      ▪ Datasets can span storage servers
        – A higher level of abstraction allows automation

  16. Simplification Through Integrated Data Management
      ▪ Application admins set data properties
      ▪ Properties assigned to logical sets of data
      ▪ Properties define business requirements for data
      ▪ Storage admins create & manage processes
      ▪ Processes deliver on data requirements
      ▪ Automation & service delivery become possible

  17. Simplification Through Integrated Data Management
      Properties
      ▪ Recovery time objective: 0 sec
      ▪ Applicable regulations: SEC-17A
      ▪ Security level: high
      Policies
      ▪ Low RTO: use synchronous mirroring
      ▪ SEC-17A: enable SnapLock; delete after 7 years
      ▪ Hi Security: enable encryption
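
The split on slide 17 between properties (set by application admins) and policies (maintained by storage admins) can be sketched as a small rule table. This is a minimal illustration under stated assumptions, not NetApp's implementation: `DatasetProperties`, `derive_actions`, and `LOW_RTO_THRESHOLD_SECONDS` are hypothetical names, the 60-second cutoff for a "low" RTO is invented because the slide gives no number, and the returned strings are plain labels taken from the slide rather than calls to any real storage API.

```python
from dataclasses import dataclass
from typing import FrozenSet, List

# Hypothetical cutoff for what counts as a "low" RTO; the slide gives no number.
LOW_RTO_THRESHOLD_SECONDS = 60


@dataclass(frozen=True)
class DatasetProperties:
    """Business requirements that an application admin attaches to a dataset."""
    rto_seconds: int             # recovery time objective
    regulations: FrozenSet[str]  # applicable regulations, e.g. {"SEC-17A"}
    security_level: str          # e.g. "high"


def derive_actions(props: DatasetProperties) -> List[str]:
    """Apply the storage admin's policies to a dataset's properties.

    Returns the actions named on the slide as plain strings.
    """
    actions: List[str] = []
    if props.rto_seconds <= LOW_RTO_THRESHOLD_SECONDS:
        actions.append("use synchronous mirroring")
    if "SEC-17A" in props.regulations:
        actions.extend(["enable SnapLock", "delete after 7 years"])
    if props.security_level == "high":
        actions.append("enable encryption")
    return actions


# The example dataset from the slide: RTO 0 sec, SEC-17A, high security.
example = DatasetProperties(
    rto_seconds=0,
    regulations=frozenset({"SEC-17A"}),
    security_level="high",
)
for action in derive_actions(example):
    print(action)
```

Keeping the rules in one function reflects the point of the slide: the properties stay constant, and only the policy logic changes when the underlying technology does.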

  18. Simplification Through Integrated Data Management
      ▪ Right decisions are made by the right people
      ▪ Easier to change and automate
        – Goal: Automate 80% of workflow
      ▪ Data properties can remain constant while processes adapt to new technologies

  19. “Two Worlds” vs. Storage Virtualization Architecture
      [Diagram labels: Vendor 1, Vendor 2]

  20. Long Term Trends
      ▪ Unification of capabilities in a single storage infrastructure
      ▪ Property-based dataset management adopted for simplification and automation
      It’s starting to happen now
      ▪ Unified model
      ▪ Virtualization
      ▪ Scale-out & Grid
      ▪ Data sets & properties
      ▪ Value-added copies
      ▪ Heterogeneous replication

  21. Summary
      It’s good to be in storage!
