
Fingerprinting the Checker Policies of Parallel File Systems

PDSW 2020: 5th International Parallel Data Systems Workshop. Fingerprinting the Checker Policies of Parallel File Systems. Runzhou Han, Duo Zhang, Mai Zheng.


  1. PDSW 2020: 5th International Parallel Data Systems Workshop
     Fingerprinting the Checker Policies of Parallel File Systems
     Runzhou Han, Duo Zhang, Mai Zheng

  2. Parallel File Systems (PFSes)
     • PFS is the cornerstone of high performance computing
     • Optimized for highly concurrent access

  3. PFS Failures: Real-World Cases
     • Case 1: HPCC Power Outage

  4. PFS Failures: Real-World Cases
     • Case 1: HPCC Power Outage
     • Case 2: ACCRE Storage Outage*
     * Hyperion Research survey of HPC organizations done for Panasas

  5. PFS Failures: More Frequent/Expensive Than You Thought
     Some statistics*:
     • The average HPC storage system failure frequency is 9.8 failures/year
     • ≈ half of HPC sites experience storage system failures 1/month or more frequently
     * Hyperion Research survey of HPC organizations done for Panasas

  6. PFS Failures: More Frequent/Expensive Than You Thought
     Some statistics*:
     • The average HPC storage system failure frequency is 9.8 failures/year
     • ≈ half of HPC sites experience storage system failures 1/month or more frequently
     • Downtime ranges from 1 day ↓ to 1 week ↑
     • 40% of HPC sites typically took more than 2 weeks to restore their storage systems
     * Hyperion Research survey of HPC organizations done for Panasas

  7. PFS Failures: More Frequent/Expensive Than You Thought
     Some statistics*:
     • The average HPC storage system failure frequency is 9.8 failures/year
     • ≈ half of HPC sites experience storage system failures 1/month or more frequently
     • Downtime ranges from 1 day ↓ to 1 week ↑
     • 40% of HPC sites typically took more than 2 weeks to restore their storage systems
     • A single day of downtime costs from $100K ↓ to $1M ↑
     • Average downtime cost is $127K/day
     * Hyperion Research survey of HPC organizations done for Panasas

  8. PFS & PFS Checkers (FSCKs)
     • Typical PFS architecture
     [Figure: Network; Management Server (MGS) / Management Target (MGT); Metadata Server (MDS) / Metadata Target (MDT); Object Storage Servers (OSSes) / Object Storage Targets (OSTs)]

  9. PFS & PFS Checkers (FSCKs)
     • Typical PFS architecture
     [Figure: Network; Management Server (MGS) / Management Target (MGT); Metadata Server (MDS) / Metadata Target (MDT); Object Storage Servers (OSSes) / Object Storage Targets (OSTs); PFS checker (FSCK) with I/Os to each]
     • Many PFSes are designed with a checker component
       • e.g., LFSCK for Lustre, BeeGFS-FSCK for BeeGFS, PVFS2-FSCK for OrangeFS

  10. PFS & PFS Checkers (FSCKs)
     • Typical PFS architecture
     [Figure: Network; Management Server (MGS) / Management Target (MGT); Metadata Server (MDS) / Metadata Target (MDT); Object Storage Servers (OSSes) / Object Storage Targets (OSTs); PFS checker (FSCK) with I/Os to each]
     • Many PFSes are designed with a checker component
       • e.g., LFSCK for Lustre, BeeGFS-FSCK for BeeGFS, PVFS2-FSCK for OrangeFS
       • Detect and repair inconsistencies

  11. PFS & PFS Checkers (FSCKs)
     • Typical PFS architecture
     [Figure: Network; Management Server (MGS) / Management Target (MGT); Metadata Server (MDS) / Metadata Target (MDT); Object Storage Servers (OSSes) / Object Storage Targets (OSTs); PFS checker (FSCK) with I/Os to each]
     • Many PFSes are designed with a checker component
       • e.g., LFSCK for Lustre, BeeGFS-FSCK for BeeGFS, PVFS2-FSCK for OrangeFS
       • Detect and repair inconsistencies
     • FSCKs have predefined checker policies

  12. Examples of PFS Checker Policies
     • Lustre’s LFSCK Policy: mapping between MDT-object and OST-object
       • MDT-object’s LOV EA matches OST-object’s FID
       • OST-object’s Parent FID matches MDT-object’s FID
     [Figure: MDT-object A on the MDT (xattr: FID, LOV EA, …) and OST-object a on the OST (xattr: FID, Parent FID; data)]
     Structures   Meaning
     xattr        inode extended attribute
     FID          a global ID of a Lustre object
     LOV EA       stores child object’s FID
     Parent FID   stores parent object’s FID


  18. Examples of PFS Checker Policies
     • Lustre’s LFSCK Policy: mapping between MDT-object and OST-object
       • MDT-object’s LOV EA matches OST-object’s FID (Corruption 1)
       • OST-object’s Parent FID matches MDT-object’s FID
     [Figure: MDT-object A on the MDT (xattr: FID, LOV EA, …) and OST-object a on the OST (xattr: FID, Parent FID; data); corruption marked at LOV EA]
     Structures   Meaning
     xattr        inode extended attribute
     FID          a global ID of a Lustre object
     LOV EA       stores child object’s FID
     Parent FID   stores parent object’s FID

  19. Examples of PFS Checker Policies
     • Lustre’s LFSCK Policy: mapping between MDT-object and OST-object
       • MDT-object’s LOV EA matches OST-object’s FID (Corruption 1: Fixed!)
       • OST-object’s Parent FID matches MDT-object’s FID
     [Figure: MDT-object A on the MDT (xattr: FID, LOV EA, …) and OST-object a on the OST (xattr: FID, Parent FID; data); LOV EA marked ✔ by LFSCK]
     Structures   Meaning
     xattr        inode extended attribute
     FID          a global ID of a Lustre object
     LOV EA       stores child object’s FID
     Parent FID   stores parent object’s FID

  20. Examples of PFS Checker Policies
     • Lustre’s LFSCK Policy: mapping between MDT-object and OST-object
       • MDT-object’s LOV EA matches OST-object’s FID
       • OST-object’s Parent FID matches MDT-object’s FID (Corruption 2)
     [Figure: MDT-object A on the MDT (xattr: FID, LOV EA, …) and OST-object a on the OST (xattr: FID, Parent FID; data); corruption marked at FID]
     Structures   Meaning
     xattr        inode extended attribute
     FID          a global ID of a Lustre object
     LOV EA       stores child object’s FID
     Parent FID   stores parent object’s FID

  21. Examples of PFS Checker Policies
     • Lustre’s LFSCK Policy: mapping between MDT-object and OST-object
       • MDT-object’s LOV EA matches OST-object’s FID
       • OST-object’s Parent FID matches MDT-object’s FID (Corruption 2: Cannot be fixed!)
     [Figure: MDT-object A on the MDT (xattr: FID, LOV EA, …) and OST-object a on the OST (xattr: FID, Parent FID; data); FID marked ✘ by LFSCK]
     Structures   Meaning
     xattr        inode extended attribute
     FID          a global ID of a Lustre object
     LOV EA       stores child object’s FID
     Parent FID   stores parent object’s FID

  22. Examples of PFS Checker Policies
     • Lustre’s LFSCK Policy: mapping between MDT-object and OST-object
       • MDT-object’s LOV EA matches OST-object’s FID
       • OST-object’s Parent FID matches MDT-object’s FID (Corruption 2: Cannot be fixed!)
     • LFSCK’s policy is incomplete!
     [Figure: MDT-object A on the MDT (xattr: FID, LOV EA, …) and OST-object a on the OST (xattr: FID, Parent FID; data); FID marked ✘ by LFSCK]
     Structures   Meaning
     xattr        inode extended attribute
     FID          a global ID of a Lustre object
     LOV EA       stores child object’s FID
     Parent FID   stores parent object’s FID
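
The two mapping rules stated on the LFSCK policy slides can be illustrated with a small toy model. The sketch below is not LFSCK code and does not use any Lustre API; the classes `MdtObject` and `OstObject`, the function `check_mapping`, and the string FIDs are all hypothetical names chosen for illustration. It only shows what "matches" could mean when each rule is checked in one direction.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MdtObject:
    """Hypothetical stand-in for an MDT-object's extended attributes."""
    fid: str                                          # global ID of this object
    lov_ea: List[str] = field(default_factory=list)   # FIDs of child OST-objects


@dataclass
class OstObject:
    """Hypothetical stand-in for an OST-object's extended attributes."""
    fid: str          # global ID of this object
    parent_fid: str   # FID of the owning MDT-object


def check_mapping(mdt_objects, ost_objects):
    """Return a list of violations of the two mapping rules (empty if consistent)."""
    violations = []
    ost_fids = {o.fid for o in ost_objects}
    mdt_fids = {m.fid for m in mdt_objects}
    # Rule 1: each FID stored in an MDT-object's LOV EA must be an OST-object's FID.
    for m in mdt_objects:
        for child in m.lov_ea:
            if child not in ost_fids:
                violations.append(("rule1", m.fid, child))
    # Rule 2: each OST-object's Parent FID must be an MDT-object's FID.
    for o in ost_objects:
        if o.parent_fid not in mdt_fids:
            violations.append(("rule2", o.fid, o.parent_fid))
    return violations


# Consistent toy state: MDT-object "A" points at OST-object "a" and vice versa.
mdt = [MdtObject(fid="A", lov_ea=["a"])]
ost = [OstObject(fid="a", parent_fid="A")]
clean = check_mapping(mdt, ost)

# Toy analogue of a corrupted LOV EA: rule 1 no longer holds.
bad_lov = check_mapping([MdtObject(fid="A", lov_ea=["zz"])], ost)

# Toy analogue of a corrupted MDT-object FID: rule 2 no longer holds.
bad_fid = check_mapping([MdtObject(fid="X", lov_ea=["a"])], ost)
```

In this toy, both kinds of corruption are detected. It deliberately does not model repair, so it says nothing about why LFSCK can fix one corruption on the slides and not the other.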

  23. Our Contributions
     • A systematic approach to analyze PFS checker policies
       • PFS type-aware fault injection
       • PFS consistency model & taxonomy
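
The slides name "PFS type-aware fault injection" without giving details. One possible reading, offered purely as an illustration and not as the authors' implementation, is to corrupt one named metadata structure at a time (FID, LOV EA, Parent FID) rather than random bytes, and to record how a checker responds to each. Everything in the sketch below is hypothetical: the `IMAGE` dictionary, `toy_checker`, `inject_faults`, and `fingerprint` are invented names, and the checker is a toy that only tests the two mapping rules from the LFSCK example.

```python
import copy

# Hypothetical toy metadata image: object name -> {structure name: value}.
IMAGE = {
    "MDT-object A": {"FID": "A", "LOV EA": "a"},
    "OST-object a": {"FID": "a", "Parent FID": "A"},
}


def toy_checker(image):
    """Hypothetical checker: True when both mapping rules hold."""
    mdt, ost = image["MDT-object A"], image["OST-object a"]
    return mdt["LOV EA"] == ost["FID"] and ost["Parent FID"] == mdt["FID"]


def inject_faults(image, corrupt=lambda value: value + "?"):
    """Yield (object, structure, faulty_image), corrupting one structure at a time."""
    for obj, structures in image.items():
        for name, value in structures.items():
            faulty = copy.deepcopy(image)      # never modify the original image
            faulty[obj][name] = corrupt(value)
            yield obj, name, faulty


def fingerprint(image, checker):
    """Map each injected fault to whether the checker flags the faulty image."""
    return {
        (obj, name): not checker(faulty)
        for obj, name, faulty in inject_faults(image)
    }


result = fingerprint(IMAGE, toy_checker)
for (obj, name), flagged in result.items():
    print(f"{obj} / {name}: {'flagged' if flagged else 'missed'}")
```

A real study would also need to observe whether the checker repairs each fault, which this toy omits; the table it builds only records detection.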
