Monthly Webinar Series: Understanding storage performance for hyperconverged infrastructure



SLIDE 1

Monthly Webinar Series:

Understanding storage performance for hyperconverged infrastructure

Luke Pruen – Technical Services Director

“Virtual SANs made simple”

SLIDE 2

Enabling HCI

Pre-configured, certified and supported by major vendors

Large and Small

Large and small deployments, from enterprises with 1,000s of sites to SMEs with a single site

Global footprint

In 72 countries, customers depend on StorMagic for server and storage infrastructure

Partner Network

Wherever you are, StorMagic has resellers, integrators, and server partners to meet your needs

30+ verticals

Including retail, financial services, healthcare, government, education, energy, professional services, pharma, and manufacturing

Introducing StorMagic

SLIDE 3

What is Hyperconverged Infrastructure?

“Tightly coupled compute, network and storage hardware that dispenses with the need for a regular storage area network (SAN).”

Gartner, Magic Quadrant for Integrated Systems, published October 10, 2016

SLIDE 4

There’s a lot of choice out there

The hyperconverged market

  • Market is maturing with many options now available
  • Be careful of pursuing a “one size fits all” approach

Customers

  • Few customers understand their requirements
  • Often blindly deploy over-spec’d solutions

Our take

  • Customers need to be able to measure their needs more accurately
  • Real world data often provides a surprising insight
SLIDE 5

Hyperconverged Architectures: Kernel based

  • Storage “software” is within the hypervisor
  • Pools local server storage where the hypervisor is installed
  • Presents storage over a proprietary mechanism
  • Claims to be more efficient and able to deliver higher performance

  – More efficient because it’s where the hypervisor runs
  – Fewer “hops” to the storage

[Diagram: three servers, each with an SSD and storage software embedded in the hypervisor, pooled into shared storage]

SLIDE 6

Hyperconverged Architectures: VSA based

  • A virtual storage appliance resides on the hypervisor
  • Host storage assigned to the local VSA
  • Storage is generally presented as iSCSI or NFS
  • Claims to be more flexible than kernel based models

  – Hypervisor agnostic
  – More storage flexibility
  – Easier to troubleshoot storage issues vs hypervisor issues

[Diagram: three servers, each with SSDs assigned to a local VSA running on the hypervisor, pooled into shared storage]

SLIDE 7

StorMagic SvSAN: Overview

“SvSAN turns the internal disk, SSD and memory of 2 or more servers into highly available shared storage”

SLIDE 8

StorMagic SvSAN: Benefits

Availability | Flexible | Cost-Effective | Robust

Data & operations protected
  • No single point of failure
  • Local and stretched cluster capable
  • Split-brain risk eliminated

Proven at the IT edge and the datacenter
  • From the harshest to the most controlled environments
  • Supports mission-critical applications
  • Tolerates poor, unreliable networks

Enterprise-class management
  • Integrates with standard tools
  • Automated deployment and recovery scripts
  • Designed for use by any IT professional

Today’s needs, future proofed | Lightest footprint, lowest cost | Any site, any network

No more physical SANs
  • Converge compute and storage
  • Utilize the power of commodity servers
  • Eliminate storage networking components

Lowest CAPEX
  • Start with only 2 servers, existing or new
  • Significantly less CPU and memory
  • One lightweight quorum for all clusters

Lowest OPEX
  • Reduced power, cooling and spares
  • Lower costs with centralized management
  • Eliminate planned and unplanned downtime

Performance and scale
  • Leverage any CPU and storage type
  • Active/Active synchronous mirroring
  • Scale-up performance with a 2-node cluster

Build-your-own hyperconverged
  • Eliminate appliance overprovisioning
  • Configure to precise IOPS & capacity
  • Auto-tier disk, SSD and memory

Flexibility and growth
  • Hyperconverged or storage-only
  • Hypervisor and server agnostic
  • Non-disruptive upgrades

SLIDE 9

Optimizing Storage: All storage is not equal


Magnetic drives provide poor random performance

  • SATA 7.2k rpm: 75 – 100 IOPS
  • SAS 10k/15k rpm: 140 – 210 IOPS
  • Lower cost per GB
  • Higher cost per IOPS

Flash and SSDs have good random performance

  • SSD/Flash: 8.6K to 10 million IOPS
  • Lower cost per IOPS
  • Higher cost per GB compared to magnetic

Memory has even better performance

  • Orders of magnitude faster than Flash/SSD
  • Much higher cost per GB compared to SSD/Flash
  • Memory is volatile and typically low in capacity

Sources: https://en.wikipedia.org/wiki/IOPS, https://en.wikipedia.org/wiki/RAM_drive
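The cost trade-off above can be made concrete with a little arithmetic. In the sketch below, only the IOPS figures echo the ranges on this slide; the prices and capacities are hypothetical placeholders, not vendor pricing:

```python
# Illustrative cost-per-GB vs cost-per-IOPS comparison for the media above.
# Prices/capacities are hypothetical placeholders; IOPS are rough midpoints
# of the ranges quoted on the slide.
media = {
    #             (price $, capacity GB, IOPS)
    "SATA 7.2k": (150, 4000, 90),
    "SAS 15k":   (250, 900, 175),
    "SSD":       (400, 1000, 100_000),
}

for name, (price, gb, iops) in media.items():
    # Magnetic wins on $/GB; SSD wins on $/IOPS.
    print(f"{name:10s} ${price / gb:8.4f}/GB   ${price / iops:10.6f}/IOPS")
```

Even with invented prices, the shape of the result is the slide's point: the SSD line always shows the highest $/GB and the lowest $/IOPS.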

SLIDE 10

Optimizing Storage: The importance of caching


Virtualized environments suffer from the ‘I/O blender’ effect

  • Multiple Virtual Machines sharing a set of disks
  • Resulting in predominantly random I/O
  • Magnetic drives provide poor random performance
  • SSD & Flash storage ideal for workloads but expensive

Working sets of data

  • Driven by workloads which are ever changing
  • Refers to the amount of data most frequently accessed
  • Always related to a time period
  • Working set sizes evolve as workloads change

Caching

  • Combat the I/O blender effect without the expense of all Flash or SSD
  • Working sets of data can be identified and elevated to cache
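As a rough illustration of why a cache that covers the working set absorbs most of the blended I/O, here is a minimal LRU read-cache simulation. The workload shape (90% of reads landing in a small hot set, the rest scattered across the volume) is an assumption for illustration, not customer data:

```python
from collections import OrderedDict
import random

def hit_rate(trace, cache_blocks):
    """Replay a block trace through a simple LRU read cache."""
    cache, hits = OrderedDict(), 0
    for block in trace:
        if block in cache:
            hits += 1
            cache.move_to_end(block)      # refresh recency
        else:
            cache[block] = True
            if len(cache) > cache_blocks:
                cache.popitem(last=False)  # evict least recently used
    return hits / len(trace)

# Hypothetical workload: 90% of reads hit a 1,000-block "hot" working set,
# 10% are scattered across the full volume (the I/O blender effect).
random.seed(1)
trace = [random.randrange(1_000) if random.random() < 0.9
         else random.randrange(1_000_000) for _ in range(50_000)]

print(hit_rate(trace, 2_000))   # cache larger than the working set
print(hit_rate(trace, 100))     # cache far smaller than the working set
```

With the cache sized above the working set, the hit rate approaches the 90% hot fraction; shrink it below the working set and the hit rate collapses, which is the slide's argument for identifying working sets before sizing cache.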
SLIDE 11

Optimizing Storage: SSD/Flash caching

SSD/Flash Caching

  • Significantly improves overall I/O performance
  • Reduces the number of I/Os going directly to disk
  • Dynamic cache sizing based on the read/write ratio

Write operations

  • Data is written as variable sized extents
  • Extents are merged and coalesced in the background
  • Data in cache is flushed to hard disk regularly in small bursts

Read operations

  • SvSAN algorithm identifies and promotes data based on access patterns
  • Frequently accessed data blocks are elevated to SSD/Flash
  • Least frequently accessed blocks are aged out
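The write path on this slide (extents merged and coalesced before flushing to disk) can be sketched as follows. This is a toy illustration of extent coalescing, not SvSAN's actual extent format or flush logic:

```python
def coalesce(extents):
    """Merge overlapping or adjacent (offset, length) write extents so the
    background flush issues fewer, larger I/Os to the hard disk."""
    merged = []
    for off, length in sorted(extents):
        if merged and off <= merged[-1][0] + merged[-1][1]:  # touches previous
            prev_off, prev_len = merged[-1]
            merged[-1] = (prev_off, max(prev_len, off + length - prev_off))
        else:
            merged.append((off, length))
    return merged

# Three adjacent 4 KiB writes collapse into one 12 KiB flush,
# plus one separate extent elsewhere on the volume.
print(coalesce([(0, 4096), (4096, 4096), (8192, 4096), (65536, 4096)]))
# [(0, 12288), (65536, 4096)]
```

The payoff is the bullet above: data in cache reaches the hard disk as a few large sequential bursts rather than many small random writes.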
SLIDE 12

Optimizing Storage: Cold & Hot data

Intelligent read caching algorithm

  • All read I/Os are monitored and analyzed
  • Most frequently used data – “Hot” data
  • Cache tiers are populated based on access frequency

Tiering

  • RAM: Most frequently accessed data
  • SSD/Flash: Next most frequently accessed data
  • HDD: Infrequently accessed data – “Cold” data

Sizing

  • Assign cache sizes to meet requirements
  • Grow caches as working sets change
  • Use any combination of Memory, SSD/Flash and Disk

Play to the strengths

  • Play to the strengths of all mediums
  • Memory: highest IOPS
  • SSD/Flash: high IOPS at lower cost than memory
  • Magnetic drives: low price per GB
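One hedged way to picture the tiering rule above: rank blocks by access frequency and fill RAM first, then SSD/Flash, leaving cold blocks on HDD. The tier sizes and trace below are invented for illustration; SvSAN's actual algorithm is not shown here:

```python
from collections import Counter

def assign_tiers(accesses, ram_blocks, ssd_blocks):
    """Rank blocks by access count and map the hottest to RAM, the next
    hottest to SSD/Flash, and everything else to HDD."""
    ranked = [b for b, _ in Counter(accesses).most_common()]
    tiers = {}
    for i, block in enumerate(ranked):
        if i < ram_blocks:
            tiers[block] = "RAM"
        elif i < ram_blocks + ssd_blocks:
            tiers[block] = "SSD"
        else:
            tiers[block] = "HDD"
    return tiers

trace = [1, 1, 1, 1, 2, 2, 2, 3, 3, 4]        # block 1 hottest, 4 coldest
tiers = assign_tiers(trace, ram_blocks=1, ssd_blocks=2)
print(tiers)   # {1: 'RAM', 2: 'SSD', 3: 'SSD', 4: 'HDD'}
```

Growing a cache as working sets change then amounts to raising `ram_blocks` or `ssd_blocks` and re-ranking.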
SLIDE 13

Industry performance numbers

Lab produced

  • Numbers produced under strict conditions representing peak IOPS
  • Random workloads focus on small block sizes to produce BIG numbers
  • Sequential workloads focus on large block sizes to show BIG throughput
  • Set unrealistic expectations

Example

  • All read: 4 KiB, 100% random read
  • Mixed read/write: 4 KiB, 70/30 random read/write
  • Sequential read: 256 KiB
  • Sequential write: 256 KiB

The real world

  • Multiple VMs running numerous mixed workloads
  • AD, DNS, DHCP: low IOPS requirement
  • Database, email and application servers: higher IOPS requirement
  • Generally sharing the same storage subsystem

*SQL Server I/O block size reference table

Operation                   | I/O Block Size
----------------------------|-------------------
Transaction log write       | 512 bytes – 60 KB
Checkpoint/Lazywrite        | 8 KB – 1 MB
Read-Ahead Scans            | 128 KB – 512 KB
Bulk Loads                  | 256 KB
Backup/Restore              | 1 MB
ColumnStore Read-Ahead      | 8 MB
File Initialization         | 8 MB
In-Memory OLTP Checkpoint   | 1 MB
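A quick sanity check on why lab numbers mislead: IOPS and throughput are the same measurement viewed through the block size (throughput = IOPS × block size), so small blocks produce BIG IOPS figures and large blocks produce BIG MB/s figures. The device figures below are hypothetical:

```python
# throughput (MB/s) = IOPS * block size; the same device effort can be
# reported as a big IOPS number or a big throughput number.
def mb_per_s(iops, block_kib):
    return iops * block_kib / 1024

print(mb_per_s(100_000, 4))    # 4 KiB random: 100k IOPS is ~390 MB/s
print(mb_per_s(2_000, 256))    # 256 KiB sequential: only 2k IOPS is 500 MB/s
```

A mixed real-world workload, with block sizes spanning the SQL Server table above, sits between the two headline numbers rather than at either of them.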

SLIDE 14

How much performance is enough?

What do you need?

  • Understand and document your storage requirements
  • IOPS and latency requirements of the current environment
  • What is the expected lifecycle of the solution?

How do you choose?

  • Don’t base your decision on a 4KiB 100% random read workload
  • When evaluating use realistic workloads
  • Does the management and functionality meet your needs?

What matters

  • Meets your current performance & capacity requirements
  • Meets your future performance & capacity requirements
  • Meets your deployment, management and availability requirements
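Documenting requirements can start as simple arithmetic: sum each VM's steady-state IOPS, then its worst-case peak, and add lifecycle headroom. The per-VM figures and the 30% growth factor below are assumptions for illustration, not sizing guidance:

```python
# Hypothetical per-VM profiles: (steady-state IOPS, peak multiplier).
vms = {
    "AD/DNS/DHCP": (25, 2.0),    # low IOPS requirement
    "Database":    (400, 3.0),   # higher requirement, bursty
    "Email":       (150, 2.5),
    "App server":  (120, 2.0),
}

steady = sum(iops for iops, _ in vms.values())
peak = sum(iops * mult for iops, mult in vms.values())
growth_headroom = 1.3   # assumed 30% growth over the solution lifecycle

print(f"Steady-state total: {steady} IOPS")
print(f"Worst-case concurrent peak: {peak:.0f} IOPS")
print(f"Sizing target with growth: {peak * growth_headroom:.0f} IOPS")
```

The point of the exercise is the slide's: a target derived from your own VMs, not from a 4 KiB 100% random read benchmark.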
SLIDE 15

Customer data analysis

Real life data

  • Real customer data collected and analysed
  • Exact data patterns simulated and replayed
  • Accurate performance expectations under their workloads

Results

  • Average customer would benefit from caching/tiering
  • Up to 70% of I/O being satisfied from read cache
  • A small amount of cache makes a big difference

Conclusion

  • Few customers have performed this exercise
  • Over-provisioned hardware is common
  • Significant cost savings were identified

Customer examples

  • UK - Oil & Gas
  • US - On-demand consumer service
  • US - National retailer
SLIDE 16

Customer data analysis: Oil and Gas (UK)

                    | Read  | Write
Read/Write Ratio    | 53%   | 47%
Average Per Day     | 93 GB | 84 GB
Average Block Size  | 61 KB | 24 KB
Average IOPS        | 18    | 41

Workloads

  • Back office apps
  • Back up service

Challenge

  • Customer was looking to understand current workloads
  • No definitive indication of current storage performance requirements
  • Concerned about growing number of disk failures

StorMagic analysis

  • Enabled I/O meta-data collection over a period of time

  – Distribution of I/O sizes
  – Throughput and IOPS
  – Locality of access

[Chart: locality of access, number of read/write accesses (log scale) across the ~5.7 TB address space]

[Chart: throughput, read/write IOPS by time of day (UTC) over 24 hours]
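A sketch of how summary figures like the table above could be derived from collected I/O meta-data. The record layout (operation, bytes, timestamp in seconds) is an assumed format for illustration, not StorMagic's actual schema:

```python
# Each record: (op, bytes, timestamp_s) from a hypothetical I/O trace.
def summarize(records):
    reads  = [r for r in records if r[0] == "R"]
    writes = [r for r in records if r[0] == "W"]
    total = len(records)
    span_s = max(r[2] for r in records) - min(r[2] for r in records) or 1
    def stats(ops):
        return {
            "ratio": len(ops) / total,                       # R/W ratio
            "avg_block": sum(b for _, b, _ in ops) / len(ops),  # bytes
            "avg_iops": len(ops) / span_s,                   # ops per second
        }
    return {"read": stats(reads), "write": stats(writes)}

# Tiny invented sample: two 56-64 KiB reads, two 24 KiB writes over 4 s.
sample = [("R", 65536, 0), ("W", 24576, 1), ("R", 57344, 2), ("W", 24576, 4)]
print(summarize(sample))
```

Average-per-day volume follows the same pattern: sum the bytes per operation type and divide by the number of days in the collection window.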

SLIDE 17

Customer data analysis: Oil and Gas (UK)

Estimates

  • SSD & memory: 70% of I/O satisfied from read cache when using 2GB memory and 200GB SSD
  • Memory only: 56% of I/O satisfied from read cache when using 2GB of memory

Testing

  • Replay the exact workload collected from the live environment
  • No best-guess synthetic workload, but exact patterns from data collection

Conclusion

  • Current environment sufficient for today’s workloads
  • Allocate a small amount of memory and SSD for optimal caching
  • Using caching would increase disk MTBF

[Charts: read data serviced by tiers]
  • Read data serviced by tiers: 17% / 23% / 60%
  • Estimated % read serviced by tiers (Memory / SSD / Disk legend): 30% / 14% / 56%
  • Actual data read from tiers: 28.79 GB / 13.52 GB / 53.94 GB

SLIDE 18

Customer data analysis: On-demand consumer service (US)

                    | Read  | Write
Read/Write Ratio    | 40%   | 60%
Average Per Day     | 9 GB  | 13 GB
Average Block Size  | 30 KB | 11 KB
Average IOPS        | 5     | 15

Workloads

  • Network monitoring for on-demand service
  • Back office apps

Challenge

  • Customer was evaluating Hyperconverged solutions
  • Was considering full flash

StorMagic analysis

  • Enabled I/O meta-data collection over a period of time in a live POC

  – Distribution of I/O sizes
  – Throughput and IOPS
  – Locality of access

[Chart: throughput, read/write IOPS by time of day (UTC)]

[Chart: locality of access, number of read/write accesses across the ~1 TB address space]

SLIDE 19

Customer data analysis: On-demand consumer service (US)

Estimates

  • SSD & memory: 94% of I/O satisfied from read cache when using 2GB memory and 120GB SSD
  • Memory only: 94% of I/O satisfied from read cache when using 2GB of memory

Testing

  • Replay the exact workload collected from the live environment
  • No best-guess synthetic workload, but exact patterns from data collection

Conclusion

  • Environment sufficient for workloads
  • Allocate a small amount of memory to satisfy almost all reads

[Charts: read data serviced by tiers: 6% / 94%; data read from tiers: 0.37 GB / 67.85 GB]

SLIDE 20

Customer data analysis: National retailer (US)

                    | Read   | Write
Read/Write Ratio    | 77%    | 23%
Average Per Day     | 991 GB | 294 GB
Average Block Size  | 58 KB  | 54 KB
Average IOPS        | 212    | 138

Workloads

  • Point of Sale
  • 78 in store applications
  • Back up service

Challenge

  • Customer is looking at a hardware refresh across store locations
  • How to size for current environment and future growth?

StorMagic analysis

  • Enabled I/O meta-data collection over a period of time

  – Distribution of I/O sizes
  – Throughput and IOPS
  – Locality of access

[Chart: throughput, read/write IOPS by time of day (UTC) over several days]

[Chart: locality of access, number of read/write accesses (log scale) across the ~2.6 TB address space]

SLIDE 21

Customer data analysis: National retailer (US)

Estimates

  • SSD and memory: as high as 83% of I/O satisfied from read cache when using 8GB memory and 200GB SSD
  • Memory only: as high as 60% of I/O satisfied from read cache when using 8GB of memory

Testing

  • No SSD used in testing; focused only on memory read caching
  • Replay the exact workload collected from the live environment
  • No best-guess synthetic workload, but exact patterns from data collection

Conclusion

  • Use SATA and memory caching vs SAS disks
  • Reduced drive count but doubled performance and capacity
  • Reduced power and cooling
  • Fewer disks means an increase in Mean Time Between Failures (MTBF)

[Chart: read data serviced by tiers (Memory / SSD / Disk legend): 17% / 23% / 60%]

SLIDE 22

What did these customers learn?

  • Clarity into their environments’ workloads
  • That they could spend less on hardware
  • How caching technologies benefit their workloads
  • Their immediate requirements
  • The performance limits of their environment
  • How the environment can scale to meet performance growth
SLIDE 23

Summary

  • Understand your requirements
  • Create viable success criteria
  • Only compare solutions that make sense
  • Be wary of lab produced numbers!
  • Storage performance can be affected by other factors
  • Simulate workloads you plan to run
SLIDE 24


Q&A and Next Steps

SvSAN Product Information

Product Options
SvSAN license: 2, 6, 12 and unlimited TBs
License entitlement: 2 mirrored servers
Maintenance and support: Platinum (24x7) / Gold (9x5)

For further information, please contact: sales@stormagic.com

Further Reading:
An overview of SvSAN - http://stormagic.com/svsan/
SvSAN v6 Data Sheet - http://stormagic.com/svsan-data-sheet/
SvSAN v6 White Paper - http://stormagic.com/svsan-6/

Download your free trial of SvSAN

stormagic.com/trial