A Look at Gforth Performance
M. Anton Ertl, TU Wien


New performance features since Gforth 0.5.0 (2000)
• primitive-centric direct threaded code (EuroForth 2001)
• dynamic superinstructions with replication (PLDI 2003)
• static superinstructions (EuroForth 2003)
• multi-state static stack caching (IVME 2004, EuroForth 2005)
• automatic build tuning (explicit register allocation)
• workarounds for GCC performance bugs
• branch target alignment
• ...
What is the big picture?
How well do the performance features work relative to others?
How well does it work across machines and GCC versions?

Portability space
• 7 architectures, 12 architecture/CPU combinations
• up to 9 GCC versions per architecture
How well do the performance features do in all variations?

Measurements
• 4 Gforth versions, 7 with options
• 5 application benchmarks (geometric mean reported)
• 3 runs (median reported)
• logarithmic graphs

Overall performance
[Plot: perf/cycle (geometric mean) by gforth version (0.5.0, 0.6.1, 0.6.2, 0.7.0, 0.6.1nd, 0.6.2ns, 0.7.0ssc) for IA32 Xeon 5450, AMD64 Xeon 5450, IA32 Opteron 270, IA32 Athlon MP, AMD64 Opteron 270, IA32 Pentium 4 Northwood, PPC 7447A, PPC 970, Alpha 21264B, PPC64 PPC970, IA64 Itanium II, ARM Xscale IOP80321]
• Typical speedup factor: 3
• Biggest contribution: Dynamic superinstructions
  in 0.6.2 for Alpha, IA32, PPC
  in 0.7.0 for others
• Automatic tuning (0.7.0) helps IA32
• Multi-state stack caching vs. static superinstructions
• Branch target alignment helps Alpha (0.7.0)
• best performance per cycle: IA32, AMD64
  Reason: indirect branch predictors

Dynamic Superinstructions
Forth:
  : foo 5 ;
Threaded code:
  lit
  5
  ;s
Native code:
  mov [esi], ecx
  mov ecx, [ebx]
  add ebx, #4
  add esi, #-4
  add ebx, #4
  mov ebx, [edi]
  add edi, #4
  add ebx, #4
  jmp -4[ebx]
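To make the threaded-code column above concrete, here is a minimal sketch of direct threaded code in GNU C. It is not Gforth's engine: the primitive set (OP_LIT, OP_ADD, OP_HALT), the function names, and the fixed buffer sizes are all assumptions made for illustration. It relies on the labels-as-values extension of GCC and Clang (`&&label`, `goto *ptr`). Each cell of threaded code holds the address of a primitive's native code, and every primitive ends in its own dispatch (NEXT), which corresponds to the final `jmp -4[ebx]` in the native-code column. As the slide's native code suggests, dynamic superinstructions go one step further by concatenating copies of the primitives' native code so that the dispatch between `lit` and `;s` disappears; that copying step is not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef intptr_t Cell;

/* Hypothetical primitive set, used only to build the threaded code. */
enum { OP_LIT, OP_ADD, OP_HALT };
enum { MAX_CODE = 64, STACK_DEPTH = 16 };

/* Every primitive ends with its own copy of the dispatch. */
#define NEXT goto **ip++

/* Translate opcodes into direct threaded code (one code address per
   cell, literals stored inline), then execute it. */
Cell run_threaded(const Cell *prog, size_t n)
{
    static void *const prim[] = { &&do_lit, &&do_add, &&do_halt };
    void *code[MAX_CODE];
    Cell stack[STACK_DEPTH];
    Cell *sp = stack;            /* points to the next free slot */
    void **ip = code;            /* threaded-code instruction pointer */

    assert(n <= MAX_CODE);
    for (size_t i = 0; i < n; i++) {
        assert(prog[i] >= OP_LIT && prog[i] <= OP_HALT);
        code[i] = prim[prog[i]];
        if (prog[i] == OP_LIT) { /* copy the inline literal unchanged */
            i++;
            assert(i < n);
            code[i] = (void *)prog[i];
        }
    }

    NEXT;                        /* start executing the first primitive */

do_lit:
    assert(sp < stack + STACK_DEPTH);
    *sp++ = (Cell)*ip++;         /* push the literal that follows lit */
    NEXT;
do_add:
    assert(sp - stack >= 2);
    sp[-2] += sp[-1];
    sp--;
    NEXT;
do_halt:
    assert(sp > stack);
    return sp[-1];               /* result is the top of stack */
}

/* Threaded code for the sequence: 5 3 + */
Cell demo_five_plus_three(void)
{
    static const Cell prog[] = { OP_LIT, 5, OP_LIT, 3, OP_ADD, OP_HALT };
    return run_threaded(prog, sizeof prog / sizeof prog[0]);
}
```

Calling demo_five_plus_three() returns 8. The translation loop stands in for what a Forth compiler does when it lays down threaded code; a real engine would also keep a return stack for calls and `;s`.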

Engines
[Plot: speed on PPC 7447A of gforth-fast and gforth, by gforth version (0.5.0, 0.6.1, 0.6.2, 0.7.0, 0.6.1nd, 0.6.2ns, 0.7.0ssc)]
• Benchmarking: gforth-fast
• Debugging: gforth
  Error detection and reporting
• Typical difference: factor 2
• Debugging engine:
  dynamic superinstructions
  no static superinstructions
  no multi-state stack caching
  no automatic tuning

Engines (2)
[Plots: speed of gforth-fast and gforth on IA32 Xeon 5450 and on AMD64 Xeon 5450, by gforth version (0.5.0, 0.6.1, 0.6.2, 0.7.0, 0.6.1nd, 0.6.2ns, 0.7.0ssc)]

GCC versions (1)
[Plot: speed on PPC 7447A by gcc version (2.95, 3.2, 3.3, 3.4, 4.0, 4.1, 4.3), one line per gforth version (0.7.0, 0.7.0ssc, 0.6.2, 0.6.2ns, 0.6.1, 0.6.1nd, 0.5.0)]
• Gcc ≥ 3.4 disables dynamic superinstructions in gforth 0.6.x
• Gforth 0.7.0 works around that
• gcc-2.95 works well

GCC versions (2)
[Plot: speed on IA32 Xeon 5450 by gcc version (2.95, 3.3, 3.4, 4.0, 4.1, 4.2, 4.4.0), one line per gforth version (0.7.0, 0.7.0ssc, 0.6.2, 0.6.2ns, 0.6.1, 0.6.1nd, 0.5.0)]
• PR15242 lowers branch prediction accuracy (gcc-3.4, gcc-4.4.0)
• Gforth 0.7.0 works around that
• Gforth 0.7.0 affected by
  bad register allocation (4.1, 4.2)
  NEXT expansion (4.4.0)
• gcc-2.95 works well

GCC performance bug: PR15242
Code 1+
( $804B6D8 ) add ebx, #4
( $804B6DB ) inc ebp
( $804B6DC ) mov esi, -4[ebx]
( $804B6DF ) mov eax, esi
( $804B6E1 ) jmp 804AE8C
...
( $804AE8C ) jmp eax

instead of

Code 1+
( $804B6D8 ) add ebx, #4
( $804B6DB ) inc ebp
( $804B6DC ) jmp -4[ebx]
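The two code shapes above can be mimicked at the C source level. The sketch below is an illustration only, not Gforth's source and not its workaround: the function name, the two variants of the 1+ primitive, and the buffer size are assumptions. In the "shared" variant each primitive ends with a direct jump to one common indirect branch, which is the shape PR15242 produces; in the "own" variant each primitive ends with its own indirect branch. On CPUs whose indirect branch prediction is keyed on the address of the branch instruction, a single shared branch has to predict the successor of every primitive, which is a plausible reading of why the slides report lower prediction accuracy. Note that the compiler remains free to merge or duplicate these jumps, so the generated machine code may not match the source shape.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef intptr_t Cell;

enum { MAX_LEN = 32 };

/* Run `count` copies of a hypothetical 1+ primitive on a top-of-stack
   value that starts at 0, then stop.
   shared != 0: primitives jump to one common indirect branch.
   shared == 0: every primitive ends in its own indirect branch. */
Cell run_incs(size_t count, int shared)
{
    void *code[MAX_LEN + 1];
    void **ip = code;            /* threaded-code instruction pointer */
    Cell tos = 0;                /* top of stack kept in a local */
    size_t i;

    assert(count <= MAX_LEN);
    for (i = 0; i < count; i++)
        code[i] = shared ? &&inc_shared : &&inc_own;
    code[count] = &&halt;

    goto dispatch;               /* start with the first primitive */

inc_shared:                      /* shape seen with PR15242 */
    tos++;
    goto dispatch;               /* direct jump to the common branch */
inc_own:                         /* intended shape */
    tos++;
    goto **ip++;                 /* private indirect branch */
halt:
    return tos;
dispatch:
    goto **ip++;                 /* the single shared indirect branch */
}
```

Both variants compute the same result, for example run_incs(10, 1) and run_incs(10, 0) each return 10; only the placement of the indirect branch differs.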

GCC performance bug: NEXT expansion
before_goto: goto *real_ca;

is compiled to:
mov edx, 58 [esp]
mov eax, esi
mov 68 [esp], edx
mov 6C [esp], edx
mov 70 [esp], edx
mov 74 [esp], edx
jmp eax

instead of:
jmp esi

Other Forth systems
[Plot: speed on IA32 Opteron 270 (logarithmic axis, 0.2 to 5.6) of iforth, bigforth, gforth, vfxlin, and spf4 on the benchmarks benchgc4, brew, fcp, brainless, cd16sim, lexex]

Conclusion
• Typical speedup factor: 3
• Most important optimization: dynamic superinstructions
  New gcc versions often disable it ⇒ workarounds
• Important on IA32: Explicit register allocation
  Automatic enabling and testing to get it into Linux distributions
• Other optimizations have small or architecture-specific effect
  But their combination is still significant
• Gforth is very portable
  0.5.0 runs on architectures that were not available on release
• Future work
  Inlining
  Compilation through C (independence from GCC)
  Native code generation
