Itanium Power Programming

Sverre Jarp, CERN openlab, Summer 2005

Agenda
- Lesson 1
  - a) Introduction
  - b) Overview of Architecture and Conventions
- Lesson 2
  - a) Standard Instruction Set
  - b) Our first "real" example
- Lesson 3
  - a) Secrets of Speed
  - b) An improved version of our example
- Lesson 4
  - a) Multimedia Instructions
  - b) A top-notch version of our example
- Lesson 5
  - a) Floating-point Instructions
  - b) Changing our example to handle floating-point
- Lesson 6
  - a) Compilers and Assemblers: Peaceful coexistence?
  - b) Conclusions
- Appendices

Part 1a: Introduction

Presentation Objectives
- Offer programmers:
  - Comprehension of the architecture
    - Instruction set and other features
  - Working understanding of Itanium machine code
    - Compiler-generated code
    - Hand-written assembler code
  - Inspiration for writing code
    - Well-targeted assembler routines
    - Highly optimized routines
    - In-line assembly code
    - Full control of architectural features

Part 1b: Overview of Architecture and Conventions

Architectural Highlights
- (Some of the) main innovations:
  - Rich Instruction Set
  - Bundled Execution
  - Predicated Instructions
  - Large Register Files
  - Register Stack
  - Rotating Registers
  - Software Pipelined Loops
  - Control/Data Speculation
  - Cache Control Instructions
  - High-precision Floating-Point

A simple example
- Lots of details
- Many questions

    .proc getval:
         alloc  r3=ar.pfs,R_input,R_local,R_output,R_rotating
    (p0) movl   r2=Table          // Base table address
    (p0) and    in0=7,in0         // Choice is 0-7
         ;;
    (p0) shladd r2=in0,3,r2       // Index table
         ;;
    (p0) ldfd   f8=[r2]           // Load value
    (p0) br.ret.sptk.few rp       // return

Callouts on the slide:
- Register allocation: the alloc instruction
- Application registers: ar.pfs
- Enforced instruction separation: the ";;" stops
- Predicated execution: the (p0) prefix
- Branch return: br.ret
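As a reading aid, the address arithmetic of the routine above can be mirrored step by step in Python. This is only a sketch: `TABLE_BASE`, `MEMORY` and the table contents are hypothetical stand-ins for the `Table` symbol, which the slide does not define.

```python
import struct

# Hypothetical memory image: eight IEEE doubles stored back to back,
# standing in for the `Table` symbol used by the assembler routine.
TABLE_BASE = 0x1000  # assumed base address, for illustration only
MEMORY = {TABLE_BASE + 8 * i: struct.pack("<d", i * 1.5) for i in range(8)}


def getval(in0: int) -> float:
    """Mirror the getval routine one instruction at a time."""
    r2 = TABLE_BASE                          # movl   r2=Table
    in0 = in0 & 7                            # and    in0=7,in0   (choice is 0-7)
    r2 = (in0 << 3) + r2                     # shladd r2=in0,3,r2 (index * 8 + base)
    (f8,) = struct.unpack("<d", MEMORY[r2])  # ldfd   f8=[r2]
    return f8                                # br.ret: the value comes back in f8
```

The shift by 3 scales the index by the 8-byte size of a double, which is why a single shladd replaces a separate shift and add.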

User Register Overview
- 128 Integer Registers
- 128 Floating Point Registers
- 64 Predicate Registers
- 8 Branch Registers
- 128 Application Registers
- 5 CPUID Registers
- 16 Kernel Backup Registers
- 8 Region Registers
- 128 Control Registers
- Instruction Pointer
- NN Debug Breakpoint Registers
- NN Performance Monitor Data Registers

IA-64 Common Registers
- Integer registers
  - 128 in total; width is 64 bits + 1 bit (NaT); r0 = 0
  - Integer, Logical and Multimedia data
- Floating point registers
  - 128 in total; 82 bits wide
  - 17-bit exponent, 64-bit significand
  - f0 = 0.0; f1 = 1.0
  - Significand also used for two SIMD floats
- Predicate registers
  - 64 in total; 1 bit each (fire/do not fire)
  - p0 = 1 (default value)
- Branch registers
  - 8 in total; 64 bits wide (for address)

Rotating Registers
- Upper 75% rotate (when activated):
  - General Registers (r32-r127)
  - Floating Point Registers (f32-f127)
  - Predicate Registers (p16-p63)
- Formula:
  - Virtual Register = Physical Register - Register Rotation Base (RRB)

(Figure: the floating-point register file, ... f28 f29 f30 f31 | f32 f33 f34 f35 ... f124 f125 f126 f127, with f32-f127 forming the rotating part.)
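The formula can be tried out with a small Python model of the floating-point case, where the rotating region is always f32-f127. The function names are made up for this sketch, and the wrap-around modulo 96 and the decrement of the RRB on each rotation are assumptions the slide does not state.

```python
FIRST_ROTATING = 32  # f32 is the first rotating FP register
NUM_ROTATING = 96    # f32-f127 form the rotating region


def physical_fr(virtual: int, rrb: int) -> int:
    """Physical = Virtual + RRB, wrapped inside the rotating region."""
    if virtual < FIRST_ROTATING:
        return virtual  # f0-f31 are static and never renamed
    return FIRST_ROTATING + (virtual - FIRST_ROTATING + rrb) % NUM_ROTATING


def rotate(rrb: int) -> int:
    """Assumed rotation step: the RRB is decremented, modulo the region size."""
    return (rrb - 1) % NUM_ROTATING
```

Under these assumptions, a value written to f32 before a rotation is found under the name f33 afterwards, because both names map to the same physical register.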

Register Convention
- Run-time:
  - Branch Registers:
    - B0: Call register [rp]
    - B1-B5: Must be preserved
    - B6-B7: Scratch
  - General Registers:
    - R1: Global Data Pointer [gp]
    - R2-R3: Scratch
    - R4-R7: Must be preserved
    - R8-R11: Procedure Return Values [ret0, ret1, ret2, ..]
    - R12: Stack Pointer [sp]
    - R13: (Reserved as) Thread Pointer
    - R14-R31: Scratch
    - R32-Rxx: Argument Registers [in0, in1, in2, ..]

Register Convention (2)
- Run-time convention:
  - Floating-Point:
    - F2-F5: Must be preserved
    - F6-F7: Scratch
    - F8-F15: Argument/Return Registers
    - F16-F31: Must be preserved
    - F32-F127: Scratch
  - Predicates:
    - P1-P5: Must be preserved
    - P6-P15: Scratch
    - P16-P63: Must be preserved
  - Additionally:
    - ar.lc: Must be preserved

Register Stack Rules
- The rotating integer registers serve as a stack
- Each routine allocates via the "alloc" instruction:
  - Input + Local + Output
  - "R_rotate" <= "R_input + R_local" may rotate (in a multiple of 8 registers)

(Figure: register stack frames for Proc A (Local A, Output A), Proc B (Input B + Local B, Output B) and Proc C, through further calls and the return to Proc A.)

Instruction Types
- M: Memory/Move Operations
- I: Complex Integer/Multimedia Operations
- A: Simple Integer/Logic/Multimedia Operations
- F: Floating Point Operations (Normal/SIMD)
- B: Branch Operations
- L: Special instructions with 64-bit immediate

Instruction Bundle
- Bundle as "packaging entity":
  - 3 * 41-bit Instruction Slots
  - 5 bits for the Template (of instruction types)
    - Typical examples: MFI or MIB
    - Including a bit for Instruction Group Separation "S"
- A bundle is 16 bytes:
  - Basic unit for expressing parallelism
  - The unit that the Instruction Pointer points to
  - The unit you branch to
  - What is actually executed together may be less than, equal to, or more than one bundle

(Figure: bundle layout, from high to low bits: Slot 2 | Slot 1 | Slot 0 | T.)
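The arithmetic behind the 16-byte bundle (3 * 41 + 5 = 128 bits) can be checked with a short Python sketch that packs and unpacks the four fields in the order shown in the layout figure. The helper names are invented for illustration, and the slot and template values passed in are arbitrary bit patterns, not real instruction encodings.

```python
SLOT_BITS = 41
TEMPLATE_BITS = 5
SLOT_MASK = (1 << SLOT_BITS) - 1
TEMPLATE_MASK = (1 << TEMPLATE_BITS) - 1


def pack_bundle(template: int, slot0: int, slot1: int, slot2: int) -> int:
    """Pack the template (lowest bits) and three slots into one 128-bit integer."""
    if template & ~TEMPLATE_MASK:
        raise ValueError("template must fit in 5 bits")
    for slot in (slot0, slot1, slot2):
        if slot & ~SLOT_MASK:
            raise ValueError("each slot must fit in 41 bits")
    return (template
            | slot0 << TEMPLATE_BITS
            | slot1 << (TEMPLATE_BITS + SLOT_BITS)
            | slot2 << (TEMPLATE_BITS + 2 * SLOT_BITS))


def unpack_bundle(bundle: int):
    """Split a 128-bit bundle back into (template, slot0, slot1, slot2)."""
    return (bundle & TEMPLATE_MASK,
            (bundle >> TEMPLATE_BITS) & SLOT_MASK,
            (bundle >> (TEMPLATE_BITS + SLOT_BITS)) & SLOT_MASK,
            (bundle >> (TEMPLATE_BITS + 2 * SLOT_BITS)) & SLOT_MASK)
```

With all fields at their maximum the packed value occupies exactly 128 bits, which is the 16 bytes quoted on the slide.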

Instruction Group Separation (Stop bit)
- Necessary to avoid "Dependency Violations"
  - For ALL registers: Integer, FP, Predicate, Branch, Application, etc.
- Two out of four possibilities are forbidden (good assemblers will issue the necessary warnings!):
  - Read-After-Write (RAW):
      add r22=1,r21 ; add r23=1,r22 ;;
  - Write-After-Write (WAW):
      add r22=1,r21 ; add r22=1,r23 ;;
- Two out of four are OK:
  - Read-After-Read (RAR):
      add r22=1,r21 ; add r23=1,r21 ;;
  - Write-After-Read (WAR):
      add r23=1,r22 ; add r22=1,r21 ;;
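The four cases can be reproduced with a toy checker in Python that scans one instruction group (the instructions between two stops) for RAW and WAW hazards. It is a simplified sketch: the `(destination, sources)` tuple format is invented here, and the special cases that the architecture manuals allow within a group are deliberately ignored.

```python
from typing import List, Tuple

# One instruction is modelled as (destination register, source registers).
Instr = Tuple[str, Tuple[str, ...]]


def violations(group: List[Instr]) -> List[Tuple[int, str, str]]:
    """Return (instruction index, kind, register) for each RAW or WAW hazard."""
    found = []
    written = set()  # registers written earlier in the same group
    for index, (dest, sources) in enumerate(group):
        for src in sources:
            if src in written:
                found.append((index, "RAW", src))
        if dest in written:
            found.append((index, "WAW", dest))
        written.add(dest)
    return found
```

Reads never enter the `written` set, so RAR and WAR sequences pass the check untouched, just as the slide lists them as OK.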

Conventions
- Instruction syntax
  - (qp) ops[.comp1] r1 = r2, r3
  - Execution is always right-to-left
    - Result(s) on the left-hand side of the equal sign
  - Almost all instructions have a qualifying predicate
  - Many have further completers: unsigned, left, double, etc.
- Numbering
  - Also right-to-left: bits 63 ... 0 (for example 7 6 5 4 3 2 1 0 within a byte)
- Immediates
  - Various sizes exist
  - Imm8 (signed immediate: 7 bits plus sign)
  - At execution time, the sign bit is extended all the way to bit 63
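The sign extension of an immediate to bit 63 can be illustrated with a few lines of Python. The helper name `sign_extend` is an assumption made for this sketch; the result is returned as an unsigned 64-bit pattern, the way it would sit in a general register.

```python
MASK64 = (1 << 64) - 1


def sign_extend(value: int, bits: int) -> int:
    """Extend the top bit of a `bits`-wide field all the way to bit 63."""
    value &= (1 << bits) - 1             # keep only the immediate field
    sign = 1 << (bits - 1)               # position of the sign bit
    return ((value ^ sign) - sign) & MASK64
```

For an imm8, the pattern 0x7F stays 0x7F, while 0x80 (sign bit set) becomes 0xFFFFFFFFFFFFFF80.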

Part 2a: Standard Instruction Set

The Total Instruction Set
- Many instruction categories:
  - Logical operations (e.g. and)
  - Arithmetic operations (e.g. add)
  - Compare operations
  - Shift operations
  - Branches, including loop control
  - Memory and cache operations
  - Move operations
  - Multimedia operations (e.g. padd)
  - Floating Point operations (e.g. fma)
  - SIMD Floating Point operations (e.g. fpma)
- See the documentation for the complete reference set

Arithmetic Operations
- Instruction format:
  - (qp) ops1 r1 = r2, r3 [,1]
  - (qp) ops2 r1 = immx, r3
  - (qp) ops3 r1 = r2, count2, r3
- Valid operations:
  - ops1: add, sub
  - ops2: sub, adds/addl (imm14, imm22)
  - ops3: shladd
- Usage notes:
  - x86 Inc/Dec is replaced with: (qp) ops r1 = r2, r0, 1
  - Z = Y - imm becomes: (qp) add r1 = -imm, r3
  - Loading an immediate value: (qp) add r1 = imm, r0
- NB: Integer multiply is an FLP operation
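A Python sketch of these forms, with registers modelled as unsigned 64-bit integers, may help in reading them. The function names are hypothetical, and the 1-4 range assumed for count2 reflects the usual reading of the shladd encoding rather than anything stated on the slide.

```python
MASK64 = (1 << 64) - 1


def add(r2: int, r3: int, plus_one: bool = False) -> int:
    """add r1 = r2, r3 [,1]: the optional ',1' adds one more."""
    return (r2 + r3 + (1 if plus_one else 0)) & MASK64


def shladd(r2: int, count: int, r3: int) -> int:
    """shladd r1 = r2, count2, r3: shift r2 left, then add r3."""
    if not 1 <= count <= 4:  # assumed legal range for count2
        raise ValueError("count2 must be 1..4")
    return ((r2 << count) + r3) & MASK64


def fits_signed(imm: int, bits: int) -> bool:
    """True if imm can be encoded as a signed immediate of the given width."""
    return -(1 << (bits - 1)) <= imm < (1 << (bits - 1))
```

With r0 fixed at zero, `add(r2, 0, plus_one=True)` plays the role of an increment, and `fits_signed(imm, 14)` or `fits_signed(imm, 22)` indicates whether a constant is small enough for adds or addl.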

Compare Operations
- Instruction format:
  - (qp) cmp.crel.ctype p1, p2 = r2, r3
  - (qp) cmp.crel.ctype p1, p2 = imm8, r3
  - (qp) cmp.crel.ctype p1, p2 = r0, r3 (parallel inequality form)
- Valid relationships:
  - eq, ne, lt, le, gt, ge, ltu, leu, gtu, geu
- Types:
  - none, unc, and, or, or.andcm, orcm, andcm, and.orcm
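The difference between the two simplest compare types, none and unc, can be sketched in Python. This reflects the usual reading of the architecture manuals (a normal compare leaves both targets alone when its qualifying predicate is false, while unc clears them); the parallel types are left out, the operands are treated as plain Python integers, and the function name is made up for this example.

```python
from typing import Tuple


def cmp_lt(qp: bool, r2: int, r3: int,
           p1: bool, p2: bool, ctype: str = "none") -> Tuple[bool, bool]:
    """Model (qp) cmp.lt[.unc] p1, p2 = r2, r3 and return the new (p1, p2)."""
    if ctype not in ("none", "unc"):
        raise ValueError("only the 'none' and 'unc' types are modelled")
    if qp:
        result = r2 < r3
        return result, not result  # p1 gets the result, p2 its complement
    if ctype == "unc":
        return False, False        # unc: both targets are cleared
    return p1, p2                  # none: targets keep their old values
```

Because p1 and p2 come out as complements, one compare can guard both arms of an if/else without a branch.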

Load Operations
- Standard instructions (the second address operand is always a post-modify):
  - (qp) ldsz.ldtype.ldhint r1 = [r3], r2
  - (qp) ldsz.ldtype.ldhint r1 = [r3], imm9
  - (qp) ldffsz.fldtype.ldhint f1 = [r3], r2
  - (qp) ldffsz.fldtype.ldhint f1 = [r3], imm9
- Valid sizes:
  - sz: 1/2/4/8 [bytes]
    - The sign bit is NOT extended for 1/2/4 bytes
  - fsz: s(ingle)/d(ouble)/e(xtended)/8 (as integer)
    - The integer form is used in the case of integer multiply (for instance)
- Types:
  - s/a/sa/c.nc/c.clr/c.clr.acq/acq/bias
  - Advanced options (not discussed here!)
- Also "fill" variants
  - More complex usage (see Manuals)
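Two points from this slide, zero extension of short loads and the post-modify of the address register, can be shown in a small Python model. The `ld` helper and the little-endian byte order are assumptions made for this sketch.

```python
from typing import Tuple


def ld(memory: bytes, address: int, size: int, increment: int = 0) -> Tuple[int, int]:
    """Model ld<size> r1 = [r3], increment and return (r1, updated r3)."""
    if size not in (1, 2, 4, 8):
        raise ValueError("size must be 1, 2, 4 or 8 bytes")
    # Unsigned conversion: the upper bits of r1 are zero, the sign bit is NOT extended.
    value = int.from_bytes(memory[address:address + size], "little")
    # Post-modify: the address register is updated after the access.
    return value, address + increment
```

Loading the single byte 0xFF therefore yields 255 rather than -1, so a signed quantity needs an explicit sign-extension step afterwards.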

Branch Operations
- Several different types:
  - Conditional or Call branches
    - Relative offset (IP-relative) or Indirect (via branch registers)
    - Triggered by predication
  - Return branches
    - Indirect + Qualifying Predicate (QP)
  - Loop controlling branches:
    - Simple Counted Loops (br.cloop)
      - IP-relative with AR.LC
    - Software-pipelined Counted Loops (br.ctop)
      - IP-relative with AR.LC and AR.EC
    - Software-pipelined While Loops (br.wtop)
      - IP-relative with QP and AR.EC

Simple Counted Loop
- Works as "expected"
  - ar.lc counts down the loop (automatically)
  - No need to use a general register

        mov ar.lc=5 ;;              // NB: 6 iterations
    loop:
        { work }
        ...
        { much more work }
        br.cloop.sptk.few loop ;;

- Software-pipelined loops are more advanced
  - They use the Epilogue Count (as well as the Loop Count)
  - ... and Rotating Registers
- We will deal with such loops later
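Why a loop count of 5 gives 6 iterations becomes clear from a Python model of br.cloop, assuming the usual semantics: the branch is taken, and ar.lc decremented, as long as ar.lc is non-zero when the branch is reached. The function name is invented for this sketch.

```python
def run_cloop(initial_lc: int) -> int:
    """Count how often the loop body runs for a given initial ar.lc."""
    ar_lc = initial_lc      # mov ar.lc=<n>
    iterations = 0
    while True:
        iterations += 1     # { work } ... { much more work }
        if ar_lc != 0:      # br.cloop: taken while ar.lc is non-zero
            ar_lc -= 1
        else:
            break           # ar.lc == 0: fall through, loop is done
    return iterations
```

The body always runs once before the first test, so ar.lc has to be loaded with the iteration count minus one.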
