Challenging the Intel Xeon: ARM and OpenPower



  1. Challenging the Intel Xeon: ARM and OpenPower Now you really have to optimize

  2. Mighty Intel … • “Intel had a 99.2 percent market share in server chips” (IDC, 2015 – Quoted on InfoWorld) • “We started experimenting with SoCs two years ago. … didn't work well because the single-thread performance was too low, resulting in higher latency for our web platform” – Facebook Engineering

  3. …sits solid on the Throne • Best & most mature process technology in the world – 14 nm FinFET tri-gate (2014) • Power management the competition can only dream of • Richest software ecosystem

  4. Sizing Servers • Established in 2006 at Howest*, funded by the Flemish government since 2007 • 4 – 6 FTE (2007-2016) • 2 – 3 trainees • Specialized in independent performance optimization research • *Howest = Technical University in West Flanders (Kortrijk, Belgium)

  5. March 2016 IWT VIS TR 135096

  6. March 2012 • Java performance – +60% for Xeon E5 v1 – +19% for Xeon E5 v4 • OLTP – +51% for Xeon E5 v1 – +19% for Xeon E5 v4

  7. Recognize this one? • Moore’s law • “[Transistors] were shrinking so fast that every year twice as many could fit onto a chip.” • 1975: Moore “adjusted the pace to a doubling every two years”

  8. There is Moore • CPU processing power per dollar • DRAM & NAND: price per megabit – a 35% per year reduction in price • Also drives the Cloud / Internet • “Google will do anything to beat Moore’s law”
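To make the 35% per year figure on slide 8 concrete, here is a minimal Python sketch of how such a decline compounds. The function names are hypothetical and the only input taken from the slide is the 35% rate; it assumes the rate stays constant from year to year.

```python
import math

# From the slide: DRAM & NAND price per megabit drops about 35% per year.
ANNUAL_PRICE_DROP = 0.35


def relative_price(years, annual_drop=ANNUAL_PRICE_DROP):
    """Price after `years`, as a fraction of today's price (constant rate assumed)."""
    return (1.0 - annual_drop) ** years


def halving_time_years(annual_drop=ANNUAL_PRICE_DROP):
    """Years needed for the price to fall to half of its starting value."""
    return math.log(0.5) / math.log(1.0 - annual_drop)


if __name__ == "__main__":
    for years in (1, 2, 5, 10):
        print(f"after {years:2d} years: {relative_price(years):.3f} of the original price")
    print(f"price halves roughly every {halving_time_years():.2f} years")
```

Under that assumption the price per megabit halves in a little over a year and a half, which is in the same range as the doubling periods quoted for Moore's law.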

  9. MOORE'S LAW IS “SILICON VALLEY'S BEATING HEART”

  10. The Thermal Wall: 2004

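  11. A few examples today

      | Group | Product line | Cores | Clock (GHz) | Year | Name | Process | Min die size (mm²) | Power (W) | Power density (W/cm²) |
      |---|---|---|---|---|---|---|---|---|---|
      | Historical ref points | Pentium 4 | 1 | 3.8 | 2004 | "Prescott" | 90 nm | 112 | 115 | 103 |
      | Historical ref points | Pentium 3 | 1 | 1 | 1999 | "Coppermine" | 180 nm | 106 | 29 | 27 |
      | Today | Core i7-6xxx | 4 | 4 | 2016 | "Skylake" | 14 nm | 122 | 91 | 75 |
      | Today | Xeon E5 | 8 | 3.4 | 2016 | "Broadwell" | 14 nm | 246 | 140 | 57 |
      | Today | Core i7 4xxx | 4 | 4 | 2014 | "Haswell" | 22 nm | 177 | 88 | 50 |
      | GPUs | GeForce 1000 | 3584 | 1.6 | 2016 | "Pascal" | 16 nm | 520 | 300 | 58 |
      | GPUs | GeForce 800 | 2880 | 0.9 | 2014 | "Kepler" | 28 nm | 571 | 250 | 44 |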
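The power density column on slide 11 appears to be power divided by die area, expressed in W/cm². The following Python sketch recomputes it from the power and die-size figures on the slide; the function name and the unit interpretation (mm² for die size, W for power) are assumptions inferred from the numbers, not stated on the slide.

```python
def power_density_w_per_cm2(power_w, die_mm2):
    """Power density in W/cm^2, given power in W and die area in mm^2."""
    return power_w / (die_mm2 / 100.0)  # 100 mm^2 = 1 cm^2


# (name, power in W, die size in mm^2) as listed on the slide
CHIPS = [
    ("Pentium 4 Prescott", 115, 112),
    ("Pentium 3 Coppermine", 29, 106),
    ("Core i7-6xxx", 91, 122),
    ("Xeon E5 Broadwell", 140, 246),
    ("Core i7 4xxx", 88, 177),
    ("GeForce 1000 Pascal", 300, 520),
    ("GeForce 800 Kepler", 250, 571),
]

if __name__ == "__main__":
    for name, power_w, die_mm2 in CHIPS:
        density = power_density_w_per_cm2(power_w, die_mm2)
        print(f"{name:22s} {density:6.1f} W/cm^2")
```

Rounded to whole numbers, these values match the density column, which shows that no current chip in the list exceeds the roughly 100 W/cm² of the 2004 Pentium 4.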

  12. A bumpy road • 90 nm (2004), strained silicon (35% faster switching) • 45 nm (2008) “high-k dielectric” – reduced leakage • 22 nm (2012) “Tri-gate” (reduces both switching and leakage power) – Research started in 2002!! • THE WALL: photolithography uses light with a 193 nm wavelength – the alternative is EUV (13.5 nm)

  13. 2013 • Still optimistic • Intel, AMD, TSMC, GlobalFoundries, and IBM => • “Moore’s Law Roadmap”

  14. 2016 • 10 nm postponed to late 2017 • 7 nm: big question mark! • No more silicon, but indium gallium arsenide (InGaAs) at 7 nm • Nanotubes? Graphene?

  15. • 4% loss per generation!

  16. Problem: big data gets brains • Data gets too complex for humans to analyze

  17. And Now? • Field Programmable Gate Array (FPGA) • ASICs (Application-Specific ICs) • Graphics Processing Unit (GPU) • MIC (Many Integrated Core)

  18. IWT VIS TR 135096

  19. The market has changed too EVOLVING MARKET, NEW PLAYERS

  20. Total Market: something has changed

  21. Cavium Thunder-X • First 64-bit ARM server vs “mid-range” Xeon E5 • 48 “simple 2 IPC” cores @ 2 GHz @ 120 W – Single-thread performance is 3-5x lower • 28 nm technology • Gigabyte servers

  22. Software ecosystem • No Java Native Access libraries • Spark crashes with a machine language message • MySQL, LAMP, most Java applications work

  23. Performance / watt

  24. Conclusion ARMv8 (64) • Niche-oriented Cavium Thunder-X • Future chips from Qualcomm, Cavium (maybe Avago/Broadcom) • AMD & AppliedMicro not competitive (yet??) • A few big customers: – PayPal (VPN, firewall, some web services) – Already conquering the Chinese market (HiSilicon, Huawei) • Fragmented market • Still immature ecosystem: – JNA & Elasticsearch, Spark

  25. OPENPOWER

  26. POWER8 disadvantages • Very power hungry: 10 cores @ 190 W TDP + Mem buffers (60-80W) vs 22 cores @ 145W Xeon • JNA not supported • Some software still a bit unoptimized (MySQL)
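The "very power hungry" point on slide 26 can be restated as watts per core. Below is a minimal Python sketch using only the figures on that slide; the helper name is hypothetical, and it assumes the 60-80 W for memory buffers is simply added to the POWER8 TDP while nothing comparable is added on the Xeon side.

```python
def watts_per_core(total_watts, cores):
    """Average power budget per core, in W."""
    return total_watts / cores


# Figures from the slide.
POWER8_CORES = 10
POWER8_TDP_W = 190
POWER8_MEM_BUFFER_W = (60, 80)   # low and high estimate for the memory buffers
XEON_CORES = 22
XEON_TDP_W = 145

# Range for POWER8 including memory buffers (assumption: buffers are added to the TDP).
power8_range = tuple(
    watts_per_core(POWER8_TDP_W + buffers, POWER8_CORES)
    for buffers in POWER8_MEM_BUFFER_W
)
xeon_w_per_core = watts_per_core(XEON_TDP_W, XEON_CORES)

if __name__ == "__main__":
    print(f"POWER8: {power8_range[0]:.1f}-{power8_range[1]:.1f} W per core")
    print(f"Xeon:   {xeon_w_per_core:.1f} W per core")
```

This is a per-core budget only; it says nothing about work done per watt, since the POWER8 core is much wider and runs more threads than the Xeon core.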

  27. When OpenPOWER makes sense • Based upon the most complex core on the market (8 threads, 8 IPC, 3.5+ GHz) • (Some) pricing competitive with HP/Dell • 32 DIMM slots per CPU (Intel: 12) • Open from firmware to software • Google & Rackspace have a new OpenPOWER server • Some software runs as fast as on the best Xeons (MongoDB, PostgreSQL) • Software ecosystem has grown fast …

  28. OpenPower Ecosystem

  29. IBM: first integrator of NVLink

  30. “Deep Learning” P100

  31. Page Migration Engine & POWER8 with NVLink: Barriers to Entry Removed • Barriers: too large a memory space required; too complicated to move data; too much custom coding for GPU data movement; moves too much data; software UVM feature too limiting; requires page faulting support • Far easier to create new applications on Tesla P100 • NVIDIA Page Migration Engine ensures unified memory space – Unified memory: address space spans CPU and GPU, 1TB+ – Hardware managed transfers: eliminates explicit data transfers • Testing custom program implementing these advantages – POWER8 with NVLink ensures speedy data throughput – 1TB memory space requires faster CPU:GPU data movement – Bus masks transfer times – Close code-base to parallel CPU code

  32. Percona MySQL 5.7

  33. Few Large or many small nodes? SPARK TESTING

  34. Our test • 300 GB of gzip-compressed “Common Crawl” web archives • Body text extracted by “BoilerPipe” • Natural Language Processing (Stanford) • Aggregate: group by & sort entity counts • Generate recommendations with Alternating Least Squares

  35. Realtime in-memory processing with Spark

  36. Spark Optimization • Number of virtual cores per executor (JVM): – 1 per 2 logical cores (Intel: 1, IBM: 4) • Number of executors = number of physical cores − 1 • spark.default.parallelism = roughly 1.5-2 tasks per executor • GC threads = 1 per virtual core per executor • Speed-up = 10-20%
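The rules of thumb on slide 36 can be turned into concrete Spark properties. The Python sketch below is one possible reading, not the authors' actual configuration: it assumes one executor per physical core (minus one), half of a core's hardware threads as executor cores, and the HotSpot flag `-XX:ParallelGCThreads` for the GC thread count (other JVMs use different flags). The function name and its parameters are hypothetical; the property keys are standard Spark settings.

```python
import math


def spark_settings(physical_cores, threads_per_core, tasks_per_executor=2.0):
    """Derive Spark properties from the slide's rules of thumb (one interpretation)."""
    # 1 virtual core per 2 logical cores: 2 threads/core -> 1, 8 threads/core -> 4
    vcores_per_executor = max(1, threads_per_core // 2)
    # number of executors = number of physical cores - 1
    executors = max(1, physical_cores - 1)
    # roughly 1.5-2 tasks per executor
    parallelism = math.ceil(executors * tasks_per_executor)
    return {
        "spark.executor.cores": str(vcores_per_executor),
        "spark.executor.instances": str(executors),
        "spark.default.parallelism": str(parallelism),
        # 1 GC thread per virtual core per executor (HotSpot flag, an assumption)
        "spark.executor.extraJavaOptions": f"-XX:ParallelGCThreads={vcores_per_executor}",
    }


if __name__ == "__main__":
    # 22-core Xeon with 2 threads per core, 10-core POWER8 with 8 threads per core
    for label, cores, threads in (("Xeon", 22, 2), ("POWER8", 10, 8)):
        print(label)
        for key, value in spark_settings(cores, threads).items():
            print(f"  --conf {key}={value}")
```

The printed lines are in the form accepted by `spark-submit --conf key=value`. The slide reports a 10-20% speed-up from this kind of tuning, so the values are a starting point to measure against rather than fixed settings.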

  37. • 20% gain per generation

  38. Conclusions so far • Moore’s law is dead: opportunity for niche players • OpenPower has some tangible advantages • Next generation of ARM servers should be watched • New innovations … – Combining streaming, sensor data & static data – Deep learning • … will require much more tuning & specialized chips

  39. Rate My Session!
