

SLIDE 1

Lustre Failover on the Cray XT (Cray Inc., CUG 2009)

SLIDE 2

Outline:
- Lustre Background
- Why Lustre Failover?
- How does Lustre Failover work?
- Automation on the Cray XT
- System configuration requirements
- Software configuration for failover
- Current limitations
- Future work


SLIDE 3

Server nodes and services:
- MDS with one MDT per file system
- OSS with one or more OSTs per file system
- Clients maintain connections to each service
- Failure detection is via network timeouts


[Diagram: Lustre clients connected to the MDS (serving the mdt) and to OSS1 (serving ost1 and ost2)]
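As a concrete illustration of the client-to-service connections above, the commands below can be used to list the targets a server exports and to see which server a client's connections currently point at. This is a sketch only; the osc parameter name reflects the Lustre 1.6/1.8-era /proc layout and may differ in other releases.

    # On an MDS or OSS: list the Lustre targets (MDT/OSTs) this node serves.
    lctl dl

    # On a client: show which server UUID each OST connection currently
    # points at (parameter name as found in Lustre 1.x /proc).
    lctl get_param "osc.*.ost_conn_uuid"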

SLIDE 4

Loss of a Lustre server currently requires a machine reboot:
- The parallel file system is a critical resource for users
- Decreases MTTI and increases downtime
- Interrupts impact Service Level Agreements and customer satisfaction, affecting Cray, the customer, and the customer's users


SLIDE 5

Objective: keep the system functioning while minimizing job loss.
- Regain machine functionality after a Lustre server death
- Access the same data and files by connecting to a backup server
- Primarily handles Lustre server death
- Some documented cases of successful failover due to link failure; depends on the nature of the network failure
- Warm-boot of Lustre servers uses the same recovery methods


SLIDE 6

Lustre failover is not able to handle RAID subsystem failures:
- Storage controllers
- Service node HBA
- Connection from the service node to the storage array
Solutions to these are being investigated.


[Diagram, as on slide 3: clients connected to the MDS (mdt) and OSS1 (ost1, ost2)]

SLIDE 7

OSS1 dies; it was serving ost0 and ost2:
- ost0 and ost2 are started on OSS2, which waits for all clients to reconnect
- Client traffic to OSS1 times out
- Clients try to reconnect to OSS1; this also times out
- Clients connect to OSS2
- Clients replay outstanding transactions
- Clients start sending new I/O requests
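For clients to know to try OSS2, each OST must be told about its failover partner. Outside the Cray lustre_control tooling this is normally done when the target is formatted; the snippet below is a generic Lustre sketch, not the XT procedure, and the file system name, NIDs, and device path are placeholders.

    # Format an OST with a failover partner (--failnode), so clients that
    # lose the primary OSS retry the backup NID. All values are placeholders.
    mkfs.lustre --ost --fsname=scratch \
        --mgsnode=nid00008@ptl \
        --failnode=nid00063@ptl \
        /dev/sdb1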


SLIDE 8

[Diagram: after failover, clients reconnect to OSS2, which now serves ost0 and ost2 alongside ost1 and ost3; the MDS continues to serve the mdt]
SLIDE 9

Automation components:
- State management
- Health monitoring
- Taking action

XT automation is achieved through the Cray-developed xt-lustre-proxy:
- Runs as a daemon on every Lustre server
- CRMS framework for heartbeat events
- SDB for configuration and maintaining current state


SLIDE 10

CRMS heartbeat events:
- Existing node-failed event, sent when a node stops updating its heartbeat
- Added a new Lustre service heartbeat
- Uses the Lustre-provided /proc health check; if the health check fails, the proxy stops updating the Lustre service heartbeat

Proxy and heartbeat:
- At startup, the proxy queries the SDB for configuration
- Registers for events for the services it is backing up
- On server death, the proxy takes action: it "shoots" the node via a CRMS event to ensure it stays dead, then starts the services on the backup server
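The health-check behaviour can be pictured as a simple polling loop. The sketch below is illustrative only and is not the xt-lustre-proxy implementation; the heartbeat file, polling interval, and health string are assumptions based on the Lustre 1.x /proc interface.

    # Illustrative sketch (not xt-lustre-proxy itself): poll the Lustre
    # /proc health check and stop updating the service heartbeat when the
    # node reports unhealthy, so CRMS raises a failure event.
    while true; do
        if grep -q "NOT HEALTHY" /proc/fs/lustre/health_check 2>/dev/null; then
            break                                   # stop heartbeating; failover is triggered
        fi
        touch /var/run/lustre_service_heartbeat     # hypothetical heartbeat update
        sleep 30
    done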


SLIDE 11

- OSS nodes are typically configured in active/active mode (see the sketch after this list); this requires storage connectivity from both nodes
- MDS nodes are configured in active/passive mode; this requires a backup MDS on a separate SIO node
- If the system has multiple file systems, the MDS can be configured in active/active mode

Hardware configuration changes:
- Cabling, zoning
- Non-mirrored write cache turned off
- OSTs-per-OSS limits: failover doubles the OST count per OSS, so survivability must be ensured
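In fs_defs terms, an active/active OSS pair can be expressed by listing each OST's primary node first and its failover partner second, so the two nodes back each other up. The NIDs and device paths below are hypothetical; the OSTDEV syntax follows the example on the next slide.

    # Hypothetical active/active pair: nid00060 and nid00063 each serve one
    # OST in normal operation and take over the other's OST on failure.
    OSTDEV[0]="nid00060:/dev/sda1 nid00063:/dev/sdb1"   # ost0: primary OSS1, backup OSS2
    OSTDEV[1]="nid00063:/dev/sda1 nid00060:/dev/sdb1"   # ost1: primary OSS2, backup OSS1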


SLIDE 12

Changes for FILESYSTEM.fs_defs:
    OSTDEV[0]="nid00060:/dev/sda1 nid00063:/dev/sdb1"
    AUTO_FAILOVER=yes

lustre_control scripts (workflow sketched below):
- 'generate_config.sh' will generate CSV data files for proxy configuration
- 'lustre_control.sh FILESYSTEM.fs_defs write_conf' will push the CSV tables into the SDB

Manual configuration:
- xtfilesys2db, xtlustreserv2db, xtlustrefailover2db
- xtlusfoadmin
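Putting the pieces together, a failover-enabled configuration pass might look like the following. The working directory and the arguments to generate_config.sh are assumptions; only the write_conf invocation is taken verbatim from the slide.

    # Sketch of the configuration workflow (paths and generate_config.sh
    # arguments are assumptions):
    # 1. Edit FILESYSTEM.fs_defs: add backup NIDs to OSTDEV[..], set AUTO_FAILOVER=yes
    # 2. Generate the CSV data files used for proxy configuration
    ./generate_config.sh FILESYSTEM.fs_defs
    # 3. Push the CSV tables into the SDB
    ./lustre_control.sh FILESYSTEM.fs_defs write_conf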


SLIDE 13

- Failover duration is not optimal: usually 10-15 minutes, can take up to 30 minutes
- Quotas and MDS failover: known issues in XT 2.2, working with Sun at high priority
- Some job loss is inevitable: users with tight batch wall-clock limits, client death during failover


SLIDE 14

- Manual failback
- Multiple file systems and lustre_control configuration: documented solutions
- Manual status monitoring via 'lctl get_param *.*.recovery_status':

    status: RECOVERING
    recovery_start: 1236123918
    time_remaining: 886
    connected_clients: 1/178
    completed_clients: 1/178
    replayed_requests: 0/??
    queued_requests: 0
    next_transno: 1268285
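Until imperative recovery is available, operators watch the recovery window by hand. A minimal sketch of polling until recovery completes is shown below; the polling interval is arbitrary and the loop assumes it runs on the server hosting the recovering targets.

    # Poll every recovering target until none reports RECOVERING; the
    # 60 second interval is arbitrary.
    while lctl get_param -n "*.*.recovery_status" | grep -q "status: RECOVERING"; do
        sleep 60
    done
    echo "Lustre recovery complete"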


SLIDE 15

Imperative Recovery:
- Working with Sun to develop the feature
- Force client reconnect and stop the server waiting on dead clients
- Reduce failover times to under 5 minutes, ideally 1 to 3 minutes

Version Based Recovery:
- Minimizes evictions caused by unconnected clients
- Only transactions requiring missing data will fail

Adaptive Timeouts

Gemini Network:
- Allows shorter network timeouts and positive feedback on dead peers
- Targeted for the Danube release

