Keeping up with the hardware
Challenges in scaling I/O performance
Jonathan Davies
XenServer System Performance Lead
XenServer Engineering, Citrix Cambridge, UK
18 Aug 2015
Jonathan Davies (Citrix) Keeping up with the hardware 18 Aug 2015 1 / 50
The virtualisation performance challenge
Networking performance
Networking performance Improving intrahost single-stream throughput
XenServer 6.5: 15 Gb/s
Target: 30 Gb/s
(more is better)
Dell R720 (2 × Xeon E5-2643 v2)
XenServer 6.5: 15 Gb/s
XenServer 6.5 (guests with 4.0 kernel): 9 Gb/s
Target: 30 Gb/s
(more is better)
Dell R720 (2 × Xeon E5-2643 v2)
Figure: per-packet trace events along the intrahost path (x-axis: tsc/1000), in order:
tx kernel: in tcp_transmit_skb; clones skb; passes skb to ip layer; passes skb to netfront
tx netfront: written to first tx slot; written to last tx slot
tx netback: reading from first tx slot; allocated skb; enqueued on tx_queue; finished build_gops; finished gntcpy; finished gntmap; dequeued skb from tx_queue; filling frags; passing skb to kernel
bridge: received skb; delivered skb
rx netback: device received skb; kicking receive thread; enqueued in rxq; gntcpy finished; dequeued from rxq; freeing skb; put in dealloc ring
dealloc thread: got from dealloc ring; releasing; sent tx response
tx netfront: received tx response; freeing skb
rx netfront: reading from rx slot; enqueuing skb on tmpq; dequeuing skb from tmpq; filling frags; put on rxq; passing skb to kernel
Two CentOS 7.0 VMs (4.0.9 kernel) on Dell R720 (2 × Xeon E5-2643 v2)
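As a rough illustration of how a timeline like this can be read, the sketch below converts TSC-stamped trace events into per-stage latencies. The event labels are taken from the figure; the timestamps are made up, and the 3.5 GHz TSC frequency is an assumption that should be replaced by the value measured on the actual host.

```python
# Hypothetical sketch: turn TSC-stamped trace events into per-stage latencies.
TSC_HZ = 3.5e9  # assumed TSC frequency; measure this on the real host


def stage_latencies_us(events, tsc_hz=TSC_HZ):
    """events: list of (label, tsc) in path order.

    Returns a list of (from_label, to_label, microseconds) for each
    consecutive pair of events.
    """
    out = []
    for (prev_label, prev_tsc), (label, tsc) in zip(events, events[1:]):
        out.append((prev_label, label, (tsc - prev_tsc) / tsc_hz * 1e6))
    return out


# Made-up raw TSC values, for illustration only.
trace = [
    ("tx netfront written to last tx slot", 2_000_000),
    ("tx netback reading from first tx slot", 2_007_000),
    ("dealloc thread sent tx response", 2_042_000),
    ("tx netfront received tx response", 2_049_000),
]

for src, dst, us in stage_latencies_us(trace):
    print(f"{src} -> {dst}: {us:.1f} us")
```

Large gaps between consecutive events are the places worth investigating first, for example time spent waiting for a thread to be scheduled.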
Figure: timeline of per-packet trace events, from tx kernel calling tcp_transmit_skb, through netfront, netback, the bridge and the dealloc thread, to rx netfront passing the skb to the kernel (x-axis: tsc/1000, ticks 2000 to 12000)
Two CentOS 7.0 VMs (4.0.9 kernel) on Dell R720 (2 × Xeon E5-2643 v2)
Figure: timeline of per-packet trace events, from tx kernel calling tcp_transmit_skb, through netfront, netback, the bridge and the dealloc thread, to rx netfront passing the skb to the kernel (x-axis: tsc/1000, ticks 2000 to 12000)
Red boxes: periods when netfront is not running
Two CentOS 7.0 VMs (4.0.9 kernel) on Dell R720 (2 × Xeon E5-2643 v2)
TX completion latency
Figure: timeline of per-packet trace events, from tx kernel calling tcp_transmit_skb, through netfront, netback, the bridge and the dealloc thread, to rx netfront passing the skb to the kernel (x-axis: tsc/1000, ticks 2000 to 12000)
Yellow slice: point of TX completion
Two CentOS 7.0 VMs (4.0.9 kernel) on Dell R720 (2 × Xeon E5-2643 v2)
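One way to put a number on TX completion latency from such a trace is sketched below. It assumes the latency is taken as the interval between netfront writing the first tx slot and netfront receiving the tx response; that definition is chosen here for illustration and is not stated on the slide. The timestamps are made up and are in the figure's tsc/1000 units.

```python
# Hypothetical sketch: TX completion latency per packet, in tsc/1000 units.
START = "tx netfront written to first tx slot"
DONE = "tx netfront received tx response"


def tx_completion_latency(events):
    """events maps a trace label to its timestamp (tsc/1000) for one packet."""
    return events[DONE] - events[START]


# Made-up timestamps for two packets, for illustration only.
packets = [
    {START: 2100, DONE: 2160},
    {START: 2300, DONE: 2420},
]

latencies = [tx_completion_latency(p) for p in packets]
worst = max(latencies)
print(latencies, worst)
```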
Figure: timeline of per-packet trace events, from tx kernel calling tcp_transmit_skb, through netfront, netback, the bridge and the dealloc thread, to rx netfront passing the skb to the kernel (x-axis: tsc/1000, ticks 2000 to 12000)
Two CentOS 7.0 VMs (3.18.20 kernel) on Dell R720 (2 × Xeon E5-2643 v2)
Figure: timeline of per-packet trace events, from tx kernel calling tcp_transmit_skb, through netfront, netback, the bridge and the dealloc thread, to rx netfront passing the skb to the kernel (x-axis: tsc/1000, ticks 2000 to 12000)
Red boxes: periods when netfront is not running
Two CentOS 7.0 VMs (3.18.20 kernel) on Dell R720 (2 × Xeon E5-2643 v2)
Figure: timeline of per-packet trace events, from tx kernel calling tcp_transmit_skb, through netfront, netback, the bridge and the dealloc thread, to rx netfront passing the skb to the kernel (x-axis: tsc/1000, ticks 2000 to 12000)
Red boxes: periods when NAPI is not running
Two CentOS 7.0 VMs (3.18.20 kernel) on Dell R720 (2 × Xeon E5-2643 v2)
Networking performance Improving intrahost aggregate throughput
XenServer 6.5: 33 Gb/s
Target: [value missing in source]
(more is better)
Dell R730 (2 × Xeon E5-2670 v3)
Networking performance Summary
Storage performance
Figure: bar chart comparing XenServer 6.5 with the target (more is better)
Debian 6.0 VM on Dell R815 (Opteron 6272), Intel S3700 SSD
Storage performance Reduce latency
Debian 6.0 VM on Dell R720 (2 × Xeon E5-2643 v2), Micron P320h SSD
Footnote 1: On blkback the improvement may be even larger.
Footnote 2: Until the tapdisk3 process fully utilises a CPU even when not polling (the next bottleneck).
Debian 6.0 VM on Dell R720, Intel S3700 SSD
Storage performance Allow more data in-flight
Ubuntu 15.04 VM using blkback on Dell R720 (2 × Xeon E5-2643 v2), Micron P320h SSD
Ubuntu 15.04 VM (16 vCPUs) using blkback on Dell R720 (2 × Xeon E5-2643 v2), Micron P320h SSD
Ubuntu 15.04 VM (4 vCPUs) using blkback on Dell R720 (2 × Xeon E5-2643 v2), Micron P320h SSD
Dell R720 (2 × Xeon E5-2643 v2), Micron P320h SSD
Storage performance Summary
Questions
Extra slides
Dell R220 (Xeon E3-1230 v3)