

  1. Cut Power Consumption by 5x Without Losing Performance: A big.LITTLE Software Strategy
     Klaas van Gend, FAE, Trainer & Consultant

  2. The mandatory Klaas-in-a-Plane picture LINUXCON EUROPE 2014 2 | October 10, 2014

  3. Quad Core vs. Dual Core: Why Isn’t It Twice as Fast?
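
The slide asks the question but its text does not give the answer. A standard explanation, which the talk may or may not have used, is Amdahl's law: if only a fraction p of the run time can be parallelized, the speedup on n cores is 1 / ((1 - p) + p / n). A minimal Python sketch, with p = 0.6 chosen purely as an illustrative assumption:

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Speedup on n cores when a fraction p of the run time is parallelizable."""
    if not 0.0 <= p <= 1.0 or n < 1:
        raise ValueError("need 0 <= p <= 1 and n >= 1")
    serial = 1.0 - p  # part of the run time that extra cores cannot shorten
    return 1.0 / (serial + p / n)


P = 0.6  # assumed parallel fraction, for illustration only

dual = amdahl_speedup(P, 2)
quad = amdahl_speedup(P, 4)
print(f"dual: {dual:.2f}x  quad: {quad:.2f}x  quad vs dual: {quad / dual:.2f}x")
# prints: dual: 1.43x  quad: 1.82x  quad vs dual: 1.27x
```

With 40% of the work serial, going from two to four cores only shrinks the already-halved parallel part, so under this assumption the quad core is about 27% faster than the dual core rather than twice as fast.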

  4. The GHz race

  5. Why GHz++ costs power²
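
The slide's point is usually explained with the first-order CMOS dynamic-power model, which the slide text does not spell out: P = αCV²f, with activity factor α, switched capacitance C, supply voltage V and clock frequency f. Assuming, as a rough approximation, that the supply voltage has to rise about in proportion to the clock (and ignoring leakage), power grows with roughly the cube of the frequency and energy per cycle with its square:

```latex
\begin{aligned}
P_{\text{dyn}} &= \alpha\, C\, V^{2} f, \qquad V \propto f \;\Rightarrow\; P_{\text{dyn}} \propto f^{3},\\
E_{\text{cycle}} &= \frac{P_{\text{dyn}}}{f} = \alpha\, C\, V^{2} \;\propto\; f^{2}.
\end{aligned}
```

For example, under these assumptions a 1.5x higher clock costs about 1.5³ ≈ 3.4x the power and 1.5² = 2.25x the energy per cycle.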

  6. ARM big.LITTLE “OK, heavy work costs power. Let’s not waste power on light work…”

  7. ARM playing it cool: big.LITTLE (source: http://community.arm.com/groups/processors/blog/2013/06/18/ten-things-to-know-about-biglittle/)

  8. A7 vs. A15
     Cortex-A7:
     • Less silicon area
     • Less work per cycle (simpler, in-order pipeline)
     • Fewer cycles per second
     • More power efficient

  9. How to use big.LITTLE today (source: http://community.arm.com/groups/processors/blog/2013/06/18/ten-things-to-know-about-biglittle/)

  10. Some available big.LITTLE hardware
      • Allwinner A80
      • Renesas automotive silicon
      • Samsung Galaxy S4 for the South Korean market
      • Samsung Galaxy S5 for the South Korean market
      • Hardkernel ODROID-XU board (Exynos5), with built-in power measurement

  11. Use Case: Chromium

  12. Chrome / Chromium / ChromeShell
      • Chromium: open-source browser; its rendering engine descends from KHTML via WebKit to Blink
      • Google Chrome: closed-source browser based on Chromium
      • ChromeShell: open-source Chromium “browser” for Android
      • Chrome for Android: closed-source browser for Android

  13. Chromium workload visualized (diagram: Loading, Parsing, Layouting/Rendering, Painting, JavaScript, Canvas)

  14. HTML5 Canvas: “graphics device for JavaScript”

  16. Parallelizing Canvas: “not as easy as it looks”
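
The slides do not describe how Vector Fabrics parallelized Blink's canvas. As a generic illustration (an assumption, not their implementation), one common decomposition splits the canvas into horizontal bands: every worker replays the full, ordered list of draw commands but writes only the rows of its own band. One reason this is "not as easy as it looks" is that source-over alpha blending is order-dependent, so simply handing each draw command to a different thread would give wrong pixels where shapes overlap. The toy sketch below uses a greyscale canvas and a hypothetical command format (x, y, w, h, grey, alpha):

```python
from concurrent.futures import ThreadPoolExecutor

WIDTH, HEIGHT = 64, 48  # toy greyscale canvas


def blend(dst: float, src: float, alpha: float) -> float:
    """Source-over blending of one greyscale value onto an opaque pixel."""
    return src * alpha + dst * (1.0 - alpha)


def paint_rows(canvas, commands, y0, y1):
    """Replay ALL commands in their original order, but only write rows y0..y1-1."""
    for x, y, w, h, grey, alpha in commands:
        for row in range(max(y, y0), min(y + h, y1)):
            line = canvas[row]
            for col in range(max(x, 0), min(x + w, WIDTH)):
                line[col] = blend(line[col], grey, alpha)


def new_canvas():
    return [[0.0] * WIDTH for _ in range(HEIGHT)]


def paint_serial(commands):
    canvas = new_canvas()
    paint_rows(canvas, commands, 0, HEIGHT)
    return canvas


def paint_parallel(commands, workers=4):
    """Each worker owns a disjoint band of rows, so no locking is needed."""
    canvas = new_canvas()
    band = (HEIGHT + workers - 1) // workers
    with ThreadPoolExecutor(max_workers=workers) as pool:
        jobs = [pool.submit(paint_rows, canvas, commands, top, min(top + band, HEIGHT))
                for top in range(0, HEIGHT, band)]
        for job in jobs:
            job.result()  # re-raise any exception from a worker
    return canvas


commands = [
    (0, 0, 40, 30, 1.0, 0.5),     # half-transparent white rectangle
    (20, 10, 40, 30, 0.25, 0.8),  # overlaps the first one
    (10, 20, 30, 28, 0.75, 0.3),
]
assert paint_parallel(commands) == paint_serial(commands)
```

Because each pixel sees exactly the same blend operations in the same order, the banded result is bit-identical to serial painting. CPython's global interpreter lock means this Python version illustrates the decomposition rather than delivering a real speedup; in a C++ engine the bands could run on separate cores.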

  17. Canvas Parallelization - Performance Results
      Benchmark                     Standard Blink   Parallelized Blink   Performance improvement on quad-core
      Flashcanvas perf              1.69 score       2.44 score           44%
      Flashcanvas perf w/ alpha     1.04 score       1.52 score           50%
      Guimark2 Vector               9.5 fps          13.3 fps             40%
      Canvasmark ’13                3475 score       4116 score           53%
      Average improvement                                                 47%
      With parallelism you can improve performance of even the most complex applications!

  18. Google Chrome

  19. Google Chrome on Odroid-XU+E
      Using Google’s Chrome (version 33 for Android):
      • 2 cores active: 54% and 84%
      • Power use A15+A7 cores: 2.374 Watts
      • Test average: 9.44 fps

  20. Our ChromeShell on Odroid-XU+E
      Using our optimized ChromeShell:
      • 3 A15 cores active: 59%, 63% and 38%
      • Power use A15+A7 cores: 3.116 Watts
      • Test average: around 14 fps

  21. Canvas Parallelization works even on ‘normal’ silicon like the Qualcomm Snapdragon 800
      • Default Chrome: average 7.12 fps
      • “Our” ChromeShell: average 14.48 fps
      • LG’s Nexus 5 phone
      • Quad-core Qualcomm Snapdragon 800
      • The phone heats up similarly in both cases

  22. Canvas Parallelization - Power Consumption on “Flashcanvas perf”
      Benchmark            Standard Blink   Parallelized Blink   Difference
                           on A15+GPU       on quad-A7
      No optimization      29 fps           17 fps               -40%
      Performance          29 fps           26 fps               -10%
      Power consumption    2.2 W            0.4 W                550%
      Performance/Watt     13               65                   490%
      With parallelism and the right chip choices:
      • you can get 5x power savings
      • without losing performance!
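
The derived rows of this table follow directly from the measured frame rates and power figures. The short Python check below uses only numbers from the table; it closely matches the Difference column and shows that standard Blink delivers about 13 fps per watt:

```python
def pct_change(new: float, old: float) -> float:
    """Relative change of new versus old, in percent."""
    return (new / old - 1.0) * 100.0


# Figures from the "Flashcanvas perf" table
std_fps, std_watt = 29.0, 2.2  # standard Blink on A15 + GPU
a7_fps_unoptimized = 17.0      # quad-A7, no optimization
a7_fps, a7_watt = 26.0, 0.4    # parallelized Blink on quad-A7

std_fps_per_watt = std_fps / std_watt
a7_fps_per_watt = a7_fps / a7_watt

print(f"no optimization:  {pct_change(a7_fps_unoptimized, std_fps):+.0f}%")
print(f"parallelized:     {pct_change(a7_fps, std_fps):+.0f}%")
print(f"power:            {std_watt / a7_watt:.1f}x lower")
print(f"performance/watt: {std_fps_per_watt:.1f} -> {a7_fps_per_watt:.1f} fps/W "
      f"({a7_fps_per_watt / std_fps_per_watt:.1f}x)")
```

The 5.5x power ratio is what the slide reports as 550%, and the roughly 4.9x gain in frames per watt corresponds to its 490%.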

  23. Comparing performance / watt
      Using Google’s Chrome (version 33 for Android):
      • 2 cores active: 54% and 84%
      • Power use A15+A7 cores: 2.374 Watts
      • Test average: 9.44 fps
      Using our optimized ChromeShell:
      • 3 A7 cores active: 73%, 80% and 44%
      • Power use A15+A7 cores: 0.472 Watts
      • Test average: 10.04 fps
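
Dividing power by frame rate turns these measurements into energy per frame, which compares the runs on a per-work basis even though their frame rates differ. A short Python check using only the numbers on slides 19, 20 and 23 (the "around 14 fps" of slide 20 is taken as 14.0):

```python
# (label, fps, watts) from the Odroid-XU+E slides; watts = A15 + A7 cores
RUNS = [
    ("Chrome 33 (2 cores)", 9.44, 2.374),
    ("ChromeShell (3 A15 cores)", 14.0, 3.116),  # slide: "around 14 fps"
    ("ChromeShell (3 A7 cores)", 10.04, 0.472),
]


def joules_per_frame(fps: float, watts: float) -> float:
    """Energy per rendered frame: watts (J/s) divided by frames per second."""
    return watts / fps


for label, fps, watts in RUNS:
    print(f"{label:<26} {joules_per_frame(fps, watts) * 1000:6.1f} mJ/frame")

baseline = joules_per_frame(9.44, 2.374)  # stock Chrome
best = joules_per_frame(10.04, 0.472)     # parallelized, LITTLE cores only
print(f"energy per frame: {baseline / best:.1f}x lower")
```

By this measure the A7-only run needs roughly 5.3x less energy per frame than stock Chrome, in line with the 5x headline, while the A15 run of ChromeShell is faster but only slightly more frugal per frame than stock Chrome.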

  24. 1x A15 or 4x A7?

  25. 1x A15 < 4x A7!
      • Less power (W)
      • 20000 MIPS
      • More than twice the performance

  26. Back to big.LITTLE: making these results work outside a lab

  27. State of big.LITTLE in Linux - 1: What’s in the kernel today?

  28. State of big.LITTLE in Linux - 2: What else is relevant?
      • IKS (In-Kernel Switcher)
        – First available in Linaro kernel trees
        – Merged in the 3.11 kernel
      • Qualcomm / LG / etc. power daemons
        – Throttle performance if cores overheat
        – Usually “secret”
      • Not-in-mainline schedulers:
        – Linaro’s GTS (Global Task Scheduler),
        – a.k.a. HMP (Heterogeneous Multi-Processing)
      • Kernel Summit 2014: “Energy-Aware Scheduling Workshop”

  29. Feedback loop (diagram: setpoint)
      • We know when we want to have 4x A7 or 1x A15
      • If we can tell the kernel, it can anticipate
        – instead of noticing an increase in workload
        – and turning on the A15s by accident
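
Until such an API exists, one crude user-space workaround (our assumption, not something the slide proposes) is to pin a known-parallel workload to the LITTLE cores with CPU affinity. The sketch below treats the CPUs with the lowest `cpuinfo_max_freq` in sysfs as the LITTLE cluster and uses Python's `os.sched_setaffinity`. It assumes the kernel exposes both clusters as separate CPUs (as with HMP/GTS); under the In-Kernel Switcher each logical CPU pairs one A15 with one A7, so this split does not apply. The helper names are hypothetical.

```python
import glob
import os

CPU_ROOT = "/sys/devices/system/cpu"


def read_max_freqs(root: str = CPU_ROOT) -> dict:
    """Map CPU number -> cpuinfo_max_freq (kHz) for CPUs that expose cpufreq."""
    freqs = {}
    pattern = os.path.join(root, "cpu[0-9]*", "cpufreq", "cpuinfo_max_freq")
    for path in glob.glob(pattern):
        cpu_dir = os.path.basename(os.path.dirname(os.path.dirname(path)))  # "cpuN"
        with open(path) as f:
            freqs[int(cpu_dir[len("cpu"):])] = int(f.read().strip())
    return freqs


def split_clusters(max_freqs: dict):
    """Assumption: the CPUs sharing the lowest max frequency form the LITTLE cluster."""
    if not max_freqs:
        return set(), set()
    lowest = min(max_freqs.values())
    little = {cpu for cpu, freq in max_freqs.items() if freq == lowest}
    return little, set(max_freqs) - little


def run_on_little(pid: int = 0) -> set:
    """Restrict pid (0 = the caller) to the LITTLE cluster, if one is found."""
    little, big = split_clusters(read_max_freqs())
    if little and big:  # only meaningful on a heterogeneous system
        os.sched_setaffinity(pid, little)
    return little
```

A program that knows it is about to start a long, parallel job could call `run_on_little()` first, and restore its previous mask (saved with `os.sched_getaffinity(0)`) for short, latency-critical bursts.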

  30. Where to go?
      • Qualcomm MARE (“feedback loop” in user space)
        – Research project
        – Framework to aid parallelization
        – Should assist the kernel in scheduling/cpufreq
      • Deadline scheduler (“feedback loop” in kernel space)
        – Merged in Linux 3.14
        – Application sets the SCHED_DEADLINE policy
        – Application sets scheduling attributes:
          – Task repetition (period; the kernel API takes nanoseconds)
          – Task run-time budget within each repetition
          – Task completion deadline within each repetition
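
To make the SCHED_DEADLINE attributes concrete: the kernel takes a `struct sched_attr` holding a runtime budget, a relative deadline and a period, all in nanoseconds, via the `sched_setattr()` system call. glibc did not provide a wrapper for it at the time, so this minimal Python sketch packs the struct by hand and issues the raw syscall through `ctypes`. The syscall-number table covers only three assumed target architectures, and the helper names are hypothetical.

```python
import ctypes
import os
import platform
import struct

SCHED_DEADLINE = 6          # policy number from <linux/sched.h>
SCHED_ATTR_SIZE_VER0 = 48   # size of the first version of struct sched_attr

# sched_setattr() syscall numbers for a few assumed target architectures
NR_SCHED_SETATTR = {"x86_64": 314, "aarch64": 274, "armv7l": 380}

# struct sched_attr { u32 size; u32 sched_policy; u64 sched_flags;
#                     s32 sched_nice; u32 sched_priority;
#                     u64 sched_runtime; u64 sched_deadline; u64 sched_period; }
SCHED_ATTR_FMT = "=IIQiIQQQ"


def pack_sched_attr(runtime_ns: int, deadline_ns: int, period_ns: int) -> bytes:
    """Build a SCHED_DEADLINE struct sched_attr; all times are in nanoseconds."""
    if not 0 < runtime_ns <= deadline_ns <= period_ns:
        raise ValueError("need 0 < runtime <= deadline <= period")
    return struct.pack(SCHED_ATTR_FMT, SCHED_ATTR_SIZE_VER0, SCHED_DEADLINE,
                       0, 0, 0, runtime_ns, deadline_ns, period_ns)


def set_deadline(runtime_ns: int, deadline_ns: int, period_ns: int) -> None:
    """Switch the calling thread to SCHED_DEADLINE (needs CAP_SYS_NICE)."""
    nr = NR_SCHED_SETATTR[platform.machine()]  # KeyError on other architectures
    raw = pack_sched_attr(runtime_ns, deadline_ns, period_ns)
    attr = ctypes.create_string_buffer(raw, len(raw))
    libc = ctypes.CDLL(None, use_errno=True)
    # sched_setattr(pid = 0 -> calling thread, attr, flags = 0)
    if libc.syscall(nr, 0, attr, 0) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
```

For instance, a hypothetical canvas thread targeting 60 fps might request a 4 ms budget every 16.67 ms with `set_deadline(4_000_000, 16_666_666, 16_666_666)`. Without CAP_SYS_NICE the call fails with EPERM, and the kernel's admission control can refuse it with EBUSY.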

  31. Is parallelism going to stay? Actually, is big.LITTLE going to stay???
      • The GHz race has come to an end
        – Now also for ARM
      • The speed of light limits “clock domain size”
      • Thus many clock islands on a die
        – Multicore is just an “easy” way to improve performance
        – At the cost of the programmer
        – Who needs extra training
      • ARM big.LITTLE
        – Is a mechanism to avoid heavy power consumption
        – At the cost of more mm² of silicon
        – Is it worth it???
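
A back-of-the-envelope bound makes the “clock domain size” argument concrete: within one clock period a signal travels at most d = v/f. Using the vacuum speed of light c as an upper bound for v and an assumed 2 GHz clock:

```latex
d_{\max} = \frac{v}{f} \;\le\; \frac{c}{f}
         = \frac{3\times10^{8}\ \text{m/s}}{2\times10^{9}\ \text{Hz}}
         = 0.15\ \text{m}
```

That is 15 cm per cycle at best. Real on-chip wires are RC-limited and considerably slower than c, and the signal must also pass through logic within the same period, so the distance one clock can reliably cover on a die is far smaller.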

  32. My ideal ARM-based design
      • big: 1x A57
      • LITTLE: 4x A53
      Why is no one designing this chip?

  33. Conclusions

  34. Conclusion
      • big.LITTLE works
      • IFF (if and only if):
        – Short bursts can be handled by one ‘big’ core
        – Heavier workloads are parallelizable and run on clusters of LITTLEs
        – APIs become available: programs must indicate what the workload will be
      BTW: Chromium is parallelizable; we did it.

  36. Vector Fabrics: the Company
      • Founded February 2007
      • Founding team
        – Strong in SoC design and multi-core software
        – Currently 15 FTE: 6 PhD, 7 MSc
      • Protected technology
        – 3 patents filed in US & Europe
      • Recognition
        – “Hot Startup” in EE Times Silicon 60, since 2011
        – Selected by Gartner as “Cool Vendor in Embedded Systems & Software” 2013
        – Global Semiconductor Alliance award, March 2013

  37. Contact Information
      • Web: www.vectorfabrics.com
      • Email: klaas@vectorfabrics.com
      • Tel: +31 40 8200960
      • Address: Vector Fabrics B.V., Vonderweg 22, 5616RM Eindhoven, The Netherlands

  38. Thank You! (drop your business card if you want the slides and the to-be-released whitepaper) Klaas van Gend FAE, Trainer & Consultant klaas@vectorfabrics.com
