
Towards Zero Downtime: How to Maintain SAP HANA System Replication

TUT90846: Towards Zero Downtime. How to Maintain SAP HANA System Replication Clusters. Fabian Herschel, Senior Architect SAP (Fabian.Herschel@suse.com); Markus Gürtler, Senior Architect SAP (Markus.Guertler@suse.com).


  1. TUT90846: Towards Zero Downtime. How to Maintain SAP HANA System Replication Clusters. Fabian Herschel, Senior Architect SAP (Fabian.Herschel@suse.com); Markus Gürtler, Senior Architect SAP (Markus.Guertler@suse.com)

  2. Agenda
     • SUSE Linux Enterprise Server for SAP Applications
     • Business Continuity with SLES for SAP Applications
     • SAP HANA System Replication Automation Scenarios
     • Maintenance for SAP HANA System Replication Clusters

  3. SUSE + SAP: Unrivaled Relationship Making SUSE the Smart Choice for SAP Workloads
     • 17+ years of joint testing and development at the SAP LinuxLab
     • Joint collaboration on Cloud Foundry
     • SUSE Linux Enterprise is the leading platform for SAP workloads on Linux
     • Seamless support from SAP and SUSE
     • SUSE Linux Enterprise Server for SAP Applications delivers built-in high availability, superior performance and security
     • First and leading OS for SAP HANA
     • The platform powering SAP HANA Enterprise Cloud
     • SUSE OpenStack Cloud powers SAP’s HANA Cloud platform

  4. SUSE Linux Enterprise Server for SAP Applications 12 SP1 [Diagram: product stack. Labels: 24x7 Priority Support for SAP; Extended Service Pack Support; 18 Month Grace Period; SAP HANA Firewall; Page Cache Management; Installation Wizard; SAP HANA Resource Agents; SLE High Availability; SUSE Linux Enterprise Server; SAP HANA & SAP NetWeaver; SAP specific update channel]

  5. Lifecycle Model / Extended Service Pack Support
     • 13-year lifecycle (10 years general support, 3 years extended support)
     • Up to 5-year lifecycle per Service Pack (3 years general + 2 years extended support)
     • 18 month migration period between two service packs
     • 6 month window to support “skip service pack” functionality (e.g. SPn to SPn+2)
     • Long Term Service Pack Support (LTSS) available on top (x86-64 only)
     More information available on: http://www.suse.com/lifecycle

  6. Full System Rollback with One Click [Diagram: Update, Rollback] Reduce downtime from service pack update errors.

  7. SUSE High Availability Solution for SAP HANA [Diagram: nodeA runs the SAP HANA primary, nodeB the SAP HANA secondary; vIP; cluster communication between the nodes; SAPHana as Master/Slave resource; SAPHanaTopology as Clone resource; Fencing]

  8. Four Steps to Install and Configure
     • Install SAP HANA
     • Configure SAP HANA System Replication
     • Install and initialize SUSE Cluster
     • Configure SR Automation using HAWK wizard

  9. SAPHanaSR HAWK Wizard

  10. What is the Delivery? SUSE Linux Enterprise Server for SAP Applications
      The package SAPHanaSR
      ● the two resource agents: SAPHanaTopology and SAPHana
      ● HAWK setup wizard
      The package SAPHanaSR-doc
      ● the important SetupGuide

  11. SAPHanaSR Scale-Up Scenarios

  12. SAP HANA Scale-Up: Performance Optimized
      • Node 2 usage: Dedicated
      • Data pre-load on secondary: Yes
      • Take-over decision: Fully automated by SUSE cluster solution
      • Take-over process: Fully automated by SUSE cluster solution
      • Take-over reaction time: Fast due to pacemaker heartbeat
      • Take-over speed: Fast since data pre-loaded
      [Diagram: Node A and Node B in a pacemaker cluster (active/active) with vIP; HANA System Replication from SAP HANA (PR1) primary to SAP HANA (PR1) secondary; A => B]

  13. SAP HANA Scale-Up: Cost Optimized
      • Node 2 usage: Shared with other system (e.g. QA1). Additional storage required.
      • Data pre-load on secondary: No
      • Take-over decision: Fully automated by SUSE cluster solution
      • Take-over process: Fully automated by SUSE cluster solution
      • Take-over reaction time: Fast due to pacemaker heartbeat
      • Take-over speed: Slow: stop QA1 (meaning QA1 downtime) + completely load PR1 into memory
      [Diagram: Node A and Node B in a pacemaker cluster (active/active) with vIP; HANA System Replication from SAP HANA (PR1) primary to SAP HANA (PR1) secondary; Node B also runs SAP HANA (QA1) non-prod; A => B, Q]

  14. SAP HANA Multitenant Database Containers (MDC)
      MDC considerations:
      • Can apply “Performance Optimized” or “Cost Optimized” scenarios
      • A take-over acts on the parent HANA database.
      • All tenant database containers and associated services are therefore affected by a take-over.
      • For new installations with SAP HANA rev > 120, MDC is the default and any installation results in a system and a data tenant.
      [Diagram: Node A and Node B in a pacemaker cluster (active/active) with vIP; each SAP HANA (PR1) instance holds the system database (Sys) and tenants A and B; HANA System Replication from primary to secondary; %A => %B]

  15. SAP HANA Scale-Up: Multi Tier
      • Node 2 usage: Dedicated
      • Data pre-load on secondary: Yes
      • Take-over decision: Fully automated by SUSE cluster solution
      • Take-over process: Fully automated by SUSE cluster solution
      • Take-over reaction time: Fast due to pacemaker heartbeat
      • Take-over speed: Fast since data pre-loaded
      [Diagram: Node A and Node B in a pacemaker cluster (active/active) with vIP, plus Node C; SR sync from SAP HANA (PR1) primary to secondary, SR async from secondary to secondary2; A => B → C]

  16. SAPHanaSR Scale-Out Scenario

  17. SAPHanaSR-Scale-Out [Diagram: SLES for SAP Applications pacemaker cluster; NodeA1 to NodeA5 run SAP HANA PR1 at site WDF (primary), NodeB1 to NodeB5 run SAP HANA PR1 at site ROT (secondary); majority maker; vIP; SR sync between the sites; data partitions 1, 2, 3 ... N on each site; @A => @B]

  18. SAP HANA Scale-Out Explained: Worker and Standby Nodes
      A SAP HANA scale-out database consists of multiple nodes and SAP HANA instances.
      Each worker node (W) has its own data partition.
      Standby nodes (S) do not have a data partition.
      [Diagram: NodeA1 to NodeA5, marked W (worker) or S (standby); data partitions 1, 2, 3 ... N]

  19. SAP HANA Scale-Out Explained: Master and Slave Nodes
      A SAP HANA scale-out database consists of several services, such as the master name server (M).
      The active master name server takes all client connections and redirects the client to the proper worker node. It always has data partition 1.
      Master candidates, shown as (M), could be worker or standby nodes. Typically there are 3 nodes which could become the active master name server.
      [Diagram: NodeA1 to NodeA5 with vIP; one node marked M, two marked (M); data partitions 1, 2, 3 ... N]
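
The roles from slides 18 and 19 can be written down as a small data model. This is an illustrative sketch only: the `Node` class, its field names, and the helpers `validate` and `vip_target` are hypothetical and not part of SAP HANA or SAPHanaSR. The example layout assumes that NodeA1 to NodeA3 are workers and NodeA4 and NodeA5 are standbys, and that exactly one master name server is active at a time.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    role: str                       # "worker" or "standby"
    partition: Optional[int]        # data partition number; None for standby nodes
    master_candidate: bool = False  # "(M)" on the slide
    active_master: bool = False     # "M" on the slide


def validate(nodes: List[Node]) -> None:
    """Check the rules stated on the slides (plus the single-active-master assumption)."""
    for n in nodes:
        if n.role == "worker" and n.partition is None:
            raise ValueError(f"{n.name}: a worker needs its own data partition")
        if n.role == "standby" and n.partition is not None:
            raise ValueError(f"{n.name}: a standby must not own a data partition")
        if n.active_master and not n.master_candidate:
            raise ValueError(f"{n.name}: active master must be a master candidate")
    active = [n for n in nodes if n.active_master]
    if len(active) != 1:
        raise ValueError("assumed: exactly one active master name server")
    if active[0].partition != 1:
        raise ValueError("the active master name server always has data partition 1")


def vip_target(nodes: List[Node]) -> str:
    """The virtual IP follows the active master name server."""
    return next(n.name for n in nodes if n.active_master)


# Hypothetical site A layout with three master candidates.
site_a = [
    Node("NodeA1", "worker", 1, master_candidate=True, active_master=True),
    Node("NodeA2", "worker", 2, master_candidate=True),
    Node("NodeA3", "worker", 3),
    Node("NodeA4", "standby", None, master_candidate=True),
    Node("NodeA5", "standby", None),
]
validate(site_a)
print(vip_target(site_a))  # NodeA1
```

Keeping the checks in one `validate` function makes the slide's rules explicit: a layout that gives a standby a data partition, or puts the active master on a partition other than 1, is rejected before any vIP placement is computed.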

  20. SAP HANA Scale-Out System Replication [Diagram: client connecting via vIP; NodeA1 to NodeA5 run SAP HANA PR1 at site WDF (primary), NodeB1 to NodeB5 run SAP HANA PR1 at site ROT (secondary); majority maker; SR channels per service; overall status SOK or SFAIL; data partitions 1, 2, 3 ... N on each site]
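
Slide 20 notes that there are SR channels per service but a single overall status, SOK or SFAIL. One plausible way to picture that reduction is sketched below. It is an assumption, not the documented SAPHanaSR logic: the function name `overall_sr_status`, the rule "SOK only if every channel is in sync", and the service keys in the example are all hypothetical.

```python
from typing import Dict


def overall_sr_status(channel_in_sync: Dict[str, bool]) -> str:
    """Collapse per-service replication channel states into SOK or SFAIL.

    Assumed rule: the overall status is SOK only when every per-service
    channel reports in sync; a single failed channel yields SFAIL.
    """
    if not channel_in_sync:
        # Assumption: missing information is treated as a failure.
        return "SFAIL"
    return "SOK" if all(channel_in_sync.values()) else "SFAIL"


# Hypothetical channel names of the form "<service>@<node>".
healthy = {"nameserver@NodeA1": True, "indexserver@NodeA1": True, "indexserver@NodeA2": True}
degraded = dict(healthy, **{"indexserver@NodeA2": False})

print(overall_sr_status(healthy))   # SOK
print(overall_sr_status(degraded))  # SFAIL
```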

  21. SAP HANA Scale-Out Failures
      A lot of different failures must be detected and processed by SAPHanaSR-Scale-Out:
      ● outage of the majority maker
      ● outage of single or multiple nodes and instances
      ● outage of a complete SAP HANA SR site (primary or secondary)
      ● outage and recovery of system replication channels
      ● the vIP must “follow” the master name server of the primary replication site
      [Diagram: client connecting via vIP; NodeA1 to NodeA5 run SAP HANA PR1 at site WDF (primary), NodeB1 to NodeB5 run SAP HANA PR1 at site ROT (secondary); majority maker; SR channels per service; overall status SOK or SFAIL]

  22. SAPHanaSR Scale-Out: Conducting Typical Failures and Reactions
      Failure and SAPHanaSR reaction:
      • Worker fails (node or instance): SAP HA processes failover. If SAP HA fails, SAPHanaSR processes a takeover or restart.
      • Active master name server fails (node or instance): Like the worker failure. In addition SAPHanaSR migrates the virtual IP address to the new active master name server.
      • Standby fails (node or instance): SAPHanaSR processes an instance restart to re-establish the full SAP HA capacity.
      • Primary site fails: SAPHanaSR processes a takeover on secondary or restart of the failed primary depending on configuration and system replication status.
      • Standby site fails: SAPHanaSR processes a database system restart to re-establish SAP HANA system replication.
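
The failure and reaction pairs from slide 22 can be restated as a lookup function, which makes the two-stage handling of worker and master failures easier to see. This is a sketch for illustration only: the `Failure` enum, the function `expected_reaction`, and the `sap_ha_failover_ok` flag are hypothetical names, and the returned strings merely paraphrase the slide rather than describe actual resource agent behavior in detail.

```python
from enum import Enum
from typing import List


class Failure(Enum):
    WORKER = "worker node or instance"
    ACTIVE_MASTER = "active master name server node or instance"
    STANDBY = "standby node or instance"
    PRIMARY_SITE = "primary site"
    STANDBY_SITE = "standby site"


def expected_reaction(failure: Failure, sap_ha_failover_ok: bool = True) -> List[str]:
    """Return the reactions the slide lists for a given failure type."""
    if failure in (Failure.WORKER, Failure.ACTIVE_MASTER):
        # First stage: SAP HA handles the failover; SAPHanaSR steps in only if that fails.
        if sap_ha_failover_ok:
            steps = ["SAP HA processes failover"]
        else:
            steps = ["SAPHanaSR processes a takeover or restart"]
        if failure is Failure.ACTIVE_MASTER:
            # Additional step for the master: the vIP follows the new active master.
            steps.append("SAPHanaSR migrates the vIP to the new active master name server")
        return steps
    if failure is Failure.STANDBY:
        return ["SAPHanaSR processes an instance restart"]
    if failure is Failure.PRIMARY_SITE:
        # The slide leaves the choice to configuration and replication status.
        return ["SAPHanaSR processes a takeover on secondary or a restart of the failed primary"]
    if failure is Failure.STANDBY_SITE:
        return ["SAPHanaSR processes a database system restart"]
    raise ValueError(f"unknown failure type: {failure!r}")


for step in expected_reaction(Failure.ACTIVE_MASTER, sap_ha_failover_ok=False):
    print(step)
```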

  23. Let us start with the Maintenance

  24. A problem has been detected and your system has been shutdown to prevent damage of your computer. DRIVER_ERR_NEITHER_DIFFERENT_NOR_EQUAL If this is the first time you have seen this blue screen, restart your computer using key F13. If this screen appears again, follow these steps: * Check to make sure any new hardware or software is properly installed. * If this is a new installation, ask your software manufacturer for any updates you might need. * Feel free to re-install the current OS as often as you like or have time to do that. If problems continue, disable the current OS. We *strongly recommend* to switch to SUSE(R) Linux Enterprise for SAP Applications 12 SP2. Technical information: *** STOP: 0x00008A8A (0x00000003,0x00000002,0x00000001,0x00000000) *** goodby3.sys - Address 000B1E00 base at 000B1E00, DateStamp DEEEDEEE To continue or un-lock this session please shout “SUSE”.

  25. Wasn't that session about “Towards Zero Downtime”? ;-)

  26. About Maintenance
      • Why do I need special maintenance procedures for clusters?
      • What could be typical pitfalls?
      Please check our best practices for most current maintenance procedures – these slides only provide some top-level ideas. Our best practices are available at www.suse.com/products/sles-for-sap
