Performance Scalability of a Multi-core Web Server
Bryan Veal, Annie Foong
Intel R&D

Overview
The number of CPU cores on modern servers is increasing rapidly
Premise: for highly parallel workloads, performance should scale with
12/03/2007 ANCS 2007 -- Performance Scalability of a Multi-Core Web Server
– Have multiple cores
– Have NICs mapped onto cores
– Supports many clients
– Each client has its own flow
– Parallelism in the TCP/IP stack
– Parallelism in the application
[Figure: a server with four cores, four NICs, and shared memory, connected to multiple clients]
[Figure: SPECweb2005 Performance Scaling; Speedup vs. Number of Cores (4 to 16)]
– Official results from HP
– Similar scaling for Intel and AMD CPUs
– Performance metric is throughput

– 2x the cores
– 1.5x the performance
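The "2x the cores, 1.5x the performance" observation can be turned into a rough extrapolation. The sketch below assumes, purely for illustration, that the same 1.5x gain applies to every doubling of cores; the function names `projected_speedup` and `parallel_efficiency` are hypothetical and do not come from the slides.

```python
import math


def projected_speedup(cores, base_cores=1, gain_per_doubling=1.5):
    """Speedup relative to base_cores, assuming every doubling of the
    core count multiplies throughput by gain_per_doubling.

    The constant gain per doubling is an illustrative assumption."""
    doublings = math.log2(cores / base_cores)
    return gain_per_doubling ** doublings


def parallel_efficiency(cores, base_cores=1, gain_per_doubling=1.5):
    """Fraction of ideal (linear) speedup that the projection achieves."""
    ideal = cores / base_cores
    return projected_speedup(cores, base_cores, gain_per_doubling) / ideal


if __name__ == "__main__":
    # Compare 8 and 16 cores against a 4-core baseline.
    for n in (8, 16):
        print(n, projected_speedup(n, base_cores=4), parallel_efficiency(n, base_cores=4))
```

Under this assumption, efficiency drops by a quarter with each doubling, which is why sublinear scaling becomes more costly as core counts grow.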
[Figure: Web Server Cycles/Byte; Cycles/Byte vs. Number of Cores]
[Figure: Web Server Throughput; Throughput (Gb/s) and Speedup vs. Number of Cores]
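Cycles per byte and throughput are linked by simple arithmetic, which explains why rising cycles/byte limits throughput scaling. The sketch below is a minimal illustration, assuming cycles/byte means total CPU cycles spent across all cores divided by bytes delivered; the function name `throughput_gbps`, the clock speed, and the sample values are hypothetical and are not measurements from the slides.

```python
def throughput_gbps(cores, clock_ghz, cycles_per_byte, utilization=1.0):
    """Estimate throughput in Gb/s from the available cycle budget.

    cores * clock_ghz * utilization gives billions of cycles per second;
    dividing by cycles_per_byte gives GB/s, and multiplying by 8 gives Gb/s.
    All inputs here are illustrative assumptions."""
    if cycles_per_byte <= 0:
        raise ValueError("cycles_per_byte must be positive")
    giga_cycles_per_second = cores * clock_ghz * utilization
    return giga_cycles_per_second / cycles_per_byte * 8


if __name__ == "__main__":
    # If cycles/byte stayed constant, 8x the cores would give 8x the throughput.
    print(throughput_gbps(1, 2.0, 20.0))
    print(throughput_gbps(8, 2.0, 20.0))
    # If cycles/byte doubles at 8 cores, half of that gain is lost.
    print(throughput_gbps(8, 2.0, 40.0))
```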
[Figure: Web Server Cycles/Byte; Cycles/Byte vs. Number of Cores]
[Figure: Ratio of OS (TCP/IP Stack) to Application (Web Server); CPU Utilization Ratio vs. Number of Cores]
[Figure: TCP/IP Stack Throughput; Throughput (Gb/s) vs. Number of Cores]
[Figure: TCP/IP Stack CPU Utilization; CPU Utilization (%) vs. Number of Cores]
– Waiting longer for spin locks
– Traversing larger data structures
[Figure: Instructions per Cycle vs. Number of Cores]
[Figure: Top Third Poorest Scaling Functions; Cycles/Byte Increase between 1 and 8 Cores, by Function]
Functions shown: zend_hash_find, tcp_sendpage, skb_clone, ap_merge_per_dir_configs, __d_lookup, _zend_hash_quick_add_or_update, kmem_cache_free, _zend_mm_alloc_int, __alloc_skb, dev_hard_start_xmit, copy_user_generic_string, memset_c, free_block, memcpy_c, tcp_ack, tcp_init_tso_segs, memcpy
[Figure: Misses per Cycle vs. Number of Cores]
– Address Bus carries requests and responses for data, called snoops
– Data Bus carries the data itself

– A cache miss generates a snoop on the address bus
– Snoop is broadcast to memory and all remote caches to find the most current data
– Current copy of data is in memory
– All remote caches and memory respond
[Figure: bus transactions for a cache miss: Snoop, Snoop Response, Data Response]
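The snoop sequence above can be sketched as a toy message count. This is a simplified model, not the actual bus protocol: it assumes one cache per requester, that each remote cache sends one snoop response, and that memory answers with a single data response because it holds the current copy. The function name `bus_messages_per_miss` is hypothetical.

```python
def bus_messages_per_miss(num_caches):
    """Count bus messages for a single cache miss in a toy broadcast model.

    Assumed scenario (from the slide's example): the current copy of the
    data is in memory, and every remote cache responds to the snoop."""
    if num_caches < 1:
        raise ValueError("need at least one cache")
    remote_caches = num_caches - 1
    return {
        "snoops": 1,                       # one broadcast on the address bus
        "snoop_responses": remote_caches,  # each remote cache replies on the address bus
        "data_responses": 1,               # memory returns the data on the data bus
    }


def address_bus_messages(num_caches):
    """Address-bus messages per miss: the snoop plus all snoop responses."""
    counts = bus_messages_per_miss(num_caches)
    return counts["snoops"] + counts["snoop_responses"]


if __name__ == "__main__":
    for n in (2, 4, 8):
        print(n, bus_messages_per_miss(n), address_bus_messages(n))
```

In this model the data bus carries one transfer per miss regardless of cache count, while address-bus traffic per miss grows with the number of caches that must respond.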
[Figure: System Bus Utilization (%) vs. Number of Cores]