Fair CPU Time Accounting in CMP+SMT Processors Carlos Luque - PowerPoint PPT Presentation

Fair CPU Time Accounting in CMP+SMT Processors Carlos Luque (UPC/BSC) Miquel Moreto (ICSI/UPC/BSC) Francisco J. Cazorla (BSC/IIIA-CISC) Mateo Valero (UPC/BSC) Francisco J. Cazorla 8 th HIPEAC Director of the CAOS research group Berlin, Germany at BSC (www.bsc.es/caos) 21 st January 2013 8 th HIPEAC 2013 1 Carlos Luque 2 nd January

Outline CMP+SMT processors CPU accounting for CMP+SMT MIBTA CPU Accounting Randomized Sampled µIsolation ATD Phases Register File SMTs CMPs Release 8 th HIPEAC 2013 2 Carlos Luque 2 nd January

CMP+SMT processors � Thread-Level Parallelism (TLP) � Overcome the limitations to exploit Instruction-Level Parallelism � A wide variety of TLP paradigms (CMP, CGMT, FGMT, SMT) SMT CMP CGMT FGMT � Processor vendors combine different TLP paradigms � Reduce resource underutilization on each core � Exploit the available transistors � Examples: � IBM POWER5/6/7, Intel core i7 (CMP+SMT) � Oracle UltraSPARC T1,T2 (CMP+FGMT) � Multithreaded (MT) processor: processor supporting any TLP paradigm 8 th HIPEAC 2013 5 Carlos Luque 2 nd January

CPU accounting � CPU Accounting: � CPU time accounted to tasks running in a system (TA i ) CPU time Ti Tm Ti Tk Ti Tl usage � What is CPU Accounting used for? � OS task scheduler: maintain fairness between tasks � Charge users in data centers � Performance tools: statistics of various parameters of a task or a system � Principle of Accounting: the time accounted to a task must always be the same regardless of the workload in which it is executed. 8 th HIPEAC 2013 7 Carlos Luque 2 nd January

Measuring CPU accounting � Single-core: Classical approach � Time while the task is running, TR. (TR i = TA i ) � In MT processors resources are dynamically shared among tasks � TA to a task doesn’t only depend on the time that task is onto CPU � But also on the progress that the task makes during that time C. Luque, M. Moreto, F. J. Cazorla, R. Gioiosa, A. � TA i MT = TR i MT * Progress i MT MT =P i MT = IPC i MT / IPC i isol Progress i Buyuktosunoglu and M. Valero. CPU Accounting for Multicore Processors. � Hardware support for accounting: In IEEE Transaction on Computers, February 2012. � Determine dynamically, while the task run in a MT processor, the IPC it would have obtained if it had run… � In Isolation (most used baseline. Used in this paper) � with a fair share of the resources 8 th HIPEAC 2013 8 Carlos Luque 2 nd January

CPU Accounting in SMTs � Processor Utilization of Resources Register (IBM POWER5) � Decode 1.X: Only one thread can decode up to X instructions per cycle � CPU cycles acc. to a task = No. cycles the task decodes instructions � Scaled PURR (IBM POWER6) � CPU acc. scaled to compensate for the impact of throttling and DVFS � Arndt (US Patent 2006): � Decode 2.X � CPU cycles acc. to a task ~ No. instructions the task decodes in each cycle � Eyerman: A Per-thread cycle accounting architecture (ASPLOS 09) � Estimates the CPI Stack of each running task based on No. instructions dispatched by a task � Extra logic (+15 counters and tables with several R/W ports) spread over all the pipeline and updated on cycle-per-cycle basis � Tuned for the case in which the ROB is the bottleneck 8 th HIPEAC 2013 9 Carlos Luque 2 nd January

CPU Accounting in CMPs � ITCA: Inter-Task Conflict-Aware Accounting 1,2,3 � L2 concentrates the main interaction between tasks � On-chip bus, memory bandwidth partially considered � ITCA principles � Keep processor design as simple as possible � If task T B evicts data from a T A in L2, T A is said to suffer an inter-task L2 miss � ITCA provides support to ensure that the slowdown T A suffers due to inter-task misses is not added to its CPU accounted cycles � ATD: Auxiliary Tag Directory 1 Luque, C. at el, “CPU Accounting in CMP Processors”, CAL Feb 2009 2 Luque, C. at el, “ITCA: Inter-Task Conflict-Aware CPU Accounting for CMPs”, PACT 2009 3 Luque, C. at el, “Accurate CPU Accounting for Multicore Processors”, IEEE Transactions on Computers. Feb 2012 8 th HIPEAC 2013 10 Carlos Luque 2 nd January

MIBTA: Micro-Isolation-Based Time Accounting � Previous proposals or combination of them do not work well in CMP+SMT processor � Inaccurate � We developed a new accounting mechanism � MIBTA: Micro-Isolation-Based Time Accounting � MIBTA proposes an integral scalable solution to CMP+SMT processors � At SMT level: � Time Sampling technique � Register File Release � At CMP level: � Randomized Sampled Auxiliary tag directory, RSA • Tracks the interferences on on-cores 8 th HIPEAC 2013 12 Carlos Luque 2 nd January

MIBTA: SMT level � Tasks interact in many different resources (IQs, ROB, RFs, …) � Tracking all them complicate core design ( it is not a matter of just measuring how many bits data structures require ) � MIBTA � Does not track all shared resources on in-core � Instead divides the execution of a task into two phases: P=IPC MT /IPC isol All tasks but one are stalled TUS 1 (Task Under Study, TUS) TUS 0 Isolation phase Isolation phase Multithreaded phase All tasks run IPC Multithreaded Warmup phase Actual Isolation phase IPC isolation � MIBTA requires simple logic to stall tasks in the fetch stage (already present in IBM POWER5,6,7 processors) Isol phase i MT phase i � Small performance loss due to isolation phases 8 th HIPEAC 2013 13 Carlos Luque 2 nd January

MIBTA: SMT level: Register File Release � While in isolation phase the the RF keeps contents of stalled threads 1 � TUS enjoys less rename registers than if it runs actually in isolation � Its sampled IPC isol is lower than it should be � MIBTA solution: � At the beginning of isolation phase � Move architectural registers of the fetch-stalled tasks into the L2 � Lock those L2 lines � Write register values back to the RF at the end of the isolation phase � TUS enjoys as many rename registers as in isolation � Complexity: � Number L2 lines locked: 4 – 8 depending on the L2 cache size and the number of register � Similar technique used in the Intel Sandy Bridge processor [1] Assuming that the RF is not split into physical and architectural files in which case no change is needed 8 th HIPEAC 2013 14 Carlos Luque 2 nd January

MIBTA: CMP Level: RSA tag directory � Based on sampled ATD � ATD i : Copy of the tags of the LLC only accessed by task T i � Hit ATD miss in LLC � inter-task miss � Extra logic:The slowdown T A suffers due to inter-task misses is not added to its CPU accounted cycles � Sampled ATD (SATD) � RS-ATD 8 th HIPEAC 2013 15 Carlos Luque 2 nd January

Experimental Setup � 8 th HIPEAC 2013 16 Carlos Luque 2 nd January

Comparison Other Accounting Mechanism � Techniques targeting CMPs provide worse results than techniques targeting SMT � The interaction in SMT cores is much higher than on core-shared resources 8 th HIPEAC 2013 17 Carlos Luque 2 nd January

Throughput degradation 3,2% 3,0% Throughput degradation 2,8% 2,6% 2,4% 2,2% 2,0% 1,8% 1,6% 1,4% 1,2% 1,0% 0,8% 0,6% 0,4% 0,2% 0,0% -0,2% 2-way 4-way 2-way 4-way 2-way 4-way 2-way SMT SMT SMT SMT SMT SMT SMT 1 core 2 cores 4 cores 8 cores 8 th HIPEAC 2013 18 Carlos Luque 2 nd January

Conclusion � CPU accounting is a crucial measurement in current Computing Systems � The current accounting mechanisms are not as accurate as they should be in CMP+SMT processors � New accounting mechanism for CMP+SMT processors � Micro-Isolation-Based Time Accounting, MIBTA � High accuracy � Low hardware overhead � Not depend on the processor architecture 8 th HIPEAC 2013 19 Carlos Luque 2 nd January

Thanks for the attention! 8 th HIPEAC 2013 20 Carlos Luque 2 nd January

Fair CPU Time Accounting in CMP+SMT Processors Carlos Luque (UPC/BSC) Miquel Moreto (ICSI/UPC/BSC) Francisco J. Cazorla (BSC/IIIA-CIS) Mateo Valero (UPC/BSC) 8 th HIPEAC Berlin, Germany 21 st January 2013 8 th HIPEAC 2013 21 Carlos Luque 2 nd January

Backup Slides 8 th HIPEAC 2013 22 Carlos Luque 2 nd January

Fair CPU Time Accounting in CMP+SMT Processors Carlos Luque - PowerPoint PPT Presentation

Fair CPU Time Accounting in CMP+SMT Processors Carlos Luque (UPC/BSC) Miquel Moreto (ICSI/UPC/BSC) Francisco J. Cazorla (BSC/IIIA-CISC) Mateo Valero (UPC/BSC) Francisco J. Cazorla 8 th HIPEAC Director of the CAOS research group Berlin,

Atelier Num erique OMP Code Optimization: Vectorization Bertrand Putigny July 5, 2016 1 / 27

TXN/SEC CPU CORES TXN/SEC CPU CORES TXN/SEC CPU CORES TXN/SEC CPU CORES TXN/SEC CPU CORES

Networks Computer-Computer Comm CPU CPU CPU CPU Memory Device Device Memory Memory

Router Architectures CPU CPU Memory Memory packets NFE NFE Processor Processor Line Card

CPU Scheduling CPU Scheduling CPU Scheduling 101 CPU Scheduling 101 The CPU scheduler makes a

CPU Scheduling CPU Scheduling CPU Scheduling 101 CPU Scheduling 101 The CPU scheduler makes a

Pre-2012 CMP 2012 CMP Amendments 2018 CMP Amendments Above: Solar panel carports

CPU scheduling CPU 1 P k P 3 P 2 P 1 . . . CPU 2 . . . CPU n The scheduling problem: - Have

Workshop 1 North Central Texas Council of Governments CMP Workshop Overview Overview of

THE CMP INTEGRATES LIFELONG LEARNING WITH ASSESSMENT THE CMP INTEGRATES LIFELONG LEARNING WITH

http://cmp.imag.fr CMP annual users meeting, 4 Feb. 2016, PARIS Pr Process Portf rtfolio lio fr

CPU Scheduling Heechul Yun 1 Agenda Introduction to CPU scheduling Classical CPU

SMT WORLDWIDE SMT America, Europe and Asia staff has over 20 years experience in the SMT field

POLYMETALLIC PRODUCER AGM PRESENTATION June 30, 2020 TSX: SMT | NYSE AMERICAN: SMTS | BVL: SMT

SMT Solvers: A Disruptive Technology John Rushby Computer Science Laboratory SRI International

Using SMT solvers for binary analysis and exploitation A primer on SMT, SMT solvers, Z3 & angr

Style Transfer from Non-Parallel Text by Cross-Alignment Shen et al 2017 Arxiv: 1705.09655

On the approximate cohomology of quasi holomorphic line bundles Jean-Pierre Demailly Institut

MetaPost 1.207 (TEXLive 2009) EuroTEX 2009 SVG backend SVG backend SVG backend SVG backend A

A Scalable System Design for Data Reduction in Modern Storage Servers Mohammadamin Ajdari

Algorithmic Species Revisited: A Program Code Classification Based on Array References Cedric

Understanding an R corpus IN TRODUCTION TO N ATURAL LAN GUAGE P ROCES S IN G IN R Kasey Jones

Operations Research Integer Programming Ling-Chieh Kung Department of Information Management

Sortir les PME des GAFAM Retour dexprience OpenPony Juin 2015 OpenPony Sortir les PME des

Fair CPU Time Accounting in CMP+SMT Processors Carlos Luque - PowerPoint PPT Presentation

Fair CPU Time Accounting in CMP+SMT Processors Carlos Luque (UPC/BSC) Miquel Moreto (ICSI/UPC/BSC) Francisco J. Cazorla (BSC/IIIA-CISC) Mateo Valero (UPC/BSC) Francisco J. Cazorla 8 th HIPEAC Director of the CAOS research group Berlin,

Atelier Num erique OMP Code Optimization: Vectorization Bertrand Putigny July 5, 2016 1 / 27

TXN/SEC CPU CORES TXN/SEC CPU CORES TXN/SEC CPU CORES TXN/SEC CPU CORES TXN/SEC CPU CORES

Networks Computer-Computer Comm CPU CPU CPU CPU Memory Device Device Memory Memory

Router Architectures CPU CPU Memory Memory packets NFE NFE Processor Processor Line Card

CPU Scheduling CPU Scheduling CPU Scheduling 101 CPU Scheduling 101 The CPU scheduler makes a

CPU Scheduling CPU Scheduling CPU Scheduling 101 CPU Scheduling 101 The CPU scheduler makes a

Pre-2012 CMP 2012 CMP Amendments 2018 CMP Amendments Above: Solar panel carports

CPU scheduling CPU 1 P k P 3 P 2 P 1 . . . CPU 2 . . . CPU n The scheduling problem: - Have

Workshop 1 North Central Texas Council of Governments CMP Workshop Overview Overview of

THE CMP INTEGRATES LIFELONG LEARNING WITH ASSESSMENT THE CMP INTEGRATES LIFELONG LEARNING WITH

http://cmp.imag.fr CMP annual users meeting, 4 Feb. 2016, PARIS Pr Process Portf rtfolio lio fr

CPU Scheduling Heechul Yun 1 Agenda Introduction to CPU scheduling Classical CPU

SMT WORLDWIDE SMT America, Europe and Asia staff has over 20 years experience in the SMT field

POLYMETALLIC PRODUCER AGM PRESENTATION June 30, 2020 TSX: SMT | NYSE AMERICAN: SMTS | BVL: SMT

SMT Solvers: A Disruptive Technology John Rushby Computer Science Laboratory SRI International

Using SMT solvers for binary analysis and exploitation A primer on SMT, SMT solvers, Z3 &amp; angr

Style Transfer from Non-Parallel Text by Cross-Alignment Shen et al 2017 Arxiv: 1705.09655

On the approximate cohomology of quasi holomorphic line bundles Jean-Pierre Demailly Institut

MetaPost 1.207 (TEXLive 2009) EuroTEX 2009 SVG backend SVG backend SVG backend SVG backend A

A Scalable System Design for Data Reduction in Modern Storage Servers Mohammadamin Ajdari

Algorithmic Species Revisited: A Program Code Classification Based on Array References Cedric

Understanding an R corpus IN TRODUCTION TO N ATURAL LAN GUAGE P ROCES S IN G IN R Kasey Jones

Operations Research Integer Programming Ling-Chieh Kung Department of Information Management

Sortir les PME des GAFAM Retour dexprience OpenPony Juin 2015 OpenPony Sortir les PME des

Using SMT solvers for binary analysis and exploitation A primer on SMT, SMT solvers, Z3 & angr