Excel on the Java VM
Generating Fast Code from Spreadsheet Models
Peter Arrenbrecht codewise.ch / Abacus Research AG Submission ID: 30
Excel on the Java VM Generating Fast Code from Spreadsheet Models - - PowerPoint PPT Presentation
Excel on the Java VM Generating Fast Code from Spreadsheet Models Peter Arrenbrecht codewise.ch / Abacus Research AG Submission ID: 30 AGENDA > Problem: Customization Is Hard > Idea: Let Users Use Spreadsheets > Implementation:
Generating Fast Code from Spreadsheet Models
Peter Arrenbrecht codewise.ch / Abacus Research AG Submission ID: 30
2
> Problem: Customization Is Hard > Idea: Let Users Use Spreadsheets > Implementation: Abacus Formula Compiler
3
> Calculation of key values varies – From user to user – Over time > Examples – Prices and conditions – Shipping rates – Salary components
Variations in employment terms
Variations in local regulations
– Asset valuations – Insurance premiums – Risk estimates – ...
4
> Custom development (options targeted at needs of specific customers)
Hard to maintain due to code and data bloat
Reduced quality in UI, documentation and testing (“it's only for a single customer”)
Costly and slow for customer when he needs a change
> Parametrization (options trying to capture reusable aspects of specific needs)
Often hard to discover and configure; complex interplay
Testing nightmare due to endless configuration combinations
Hard to maintain due to code and data bloat
Still costly and slow when customer needs unanticipated feature
> Expression builder (generic expressions with access to relevant input values)
Very costly to implement with sufficient flexibility, usability and quality
Many implementations basically just support defining one-liners
Can still lead to testing nightmare
Can be slow when used heavily
> Hard on both you and your customers
5
> Abacus Payroll: Definition of salary components (Lohn 411) > Fairly generic expression builder
Basic set of operators, structured multi-line expressions
Can reference input values and results of other calculations
> Nice for simple cases
Can define simple one-liners quickly
> Hard for complex cases
Grouping? If-then-else? Debugging? Documenting? Change control?
Table lookup
6
> Abacus Payroll: Definition of lookup tables (Lohn 421)
Criteria can reference inputs or defined expressions
So can values
> Shortcomings
Separate UI
To and fro with expressions
Again no visual feedback etc.
> Modelling power
Complex composition of basic functions (advanced functions are rarely needed)
Named intermediate values to structure and to reuse code
Multiple distinct cases with complex decision logic
Lookup tables (price/rebate by category/threshold) with computed entries
> Debugging / Verification
See effects of different sample input values to gain confidence in correctness
Trace effects through intermediate values
Trace and debug computations on actual data sets
Allow to record test set of input/output values to be verified after system updates
> Documentation
Annotate intermediate steps with to-the-point explanatory text
Just a plain-text field requires too much duplication
> Versioning
Can keep textual log of changes, but preferably keep log of old versions around
> Familiar to most users (at least those who would do customization) > Very good modelling power
Handling multiple distinct cases can be a little non-visual, but manageable
> Very interactive
Immediate feedback
Dependency tracing
What-if
Goal finding
> Can be documented
Annotate individual cells
Use colors and fonts
> Can be tracked
Keep versions of files around at will (file system, blobs, version control)
Compare versions of files
Integrate changelog
> A way to specify input and output values
Especially a way to discover available inputs and expected outputs
– Ideas
Excel-/OpenOffice-Plugin
Decoupled: Users name input/output cells arbitrarily, then they can define bindings for these names in a separate step in your app
> Access to actual input values for debugging > Running tests with sets of input and expected output values
> Abacus supports same calculcation in a spreadsheet
Immediate feedback
Everything at a glance
Keep notes
Keep versions
> Abacus Service and Contract Management > Calculate the price of an Abacus ERP module – Lots of compositions of simple functions – Lots of decisions (IF) – Lots of lookup tables – Integrated documentation
Changelog
Notes on individual cells
– Use of colors and fonts
> Integrating the expression builder
Access to input values and provided output values is an external API
Needs all the care that should always go into an external API
Decoupling, stability, discoverability, documentation
> Testing the expression evaluation engine
Usage patterns not predictable
Need very high code coverage
> Testing its use in the application
Users can access input values in any order
Keep input value accessors side-effect free
Then testing input value accessors in isolation should do
> Performance
Usually not critical at first, when used in interactive applications
But can quickly become so for batch updates
> Compiles formulaic content of spreadsheets to plain Java classes
http://www.formulacompiler.org/
Supports Excel and OpenOffice formats (.xls, .xlsx, .ods)
Supports a number of numeric data types (double, BigDecimal, scaled long)
> Compiler and compiled classes run on a plain JRE
Does not require Excel or OpenOffice to compile or run engines
Generates bytecode directly, so does not need a full JDK (tools.jar)
But integrates a decompiler so you can look at generated code
> Performs well
Compiled classes are JITted like any other Java class
Tuned for heavy use of a bunch of computations
Not for thousands of different computations each used rarely
Cheap instantiation supports thread isolation via per-computation instances
Choice of stateless mode or stateful mode that caches intermediate cell values
> Integrates well with application code
Pulls input values from specified application-specific Java interfaces
Supplies output values as implementations of application-specific Java interfaces
So can provide custom computation strategy implementations in addition to classically coded default ones
Binding of cells to input and output values can be fully controlled
Optional simple by-name binding of input and output cells
> Handles repeating data
Designated spreadsheet areas are treated as repeating sections
Iterates over iterables and arrays
> Compiling the Abacus ERP module price calculation – Simple setup – Uses and implements our own interfaces – Can switch to BigDecimal – Decompile to see generated Java code
> Modular design – Compact runtime for loading and running precompiled engines – Support for different spreadsheet formats – Build spreadsheet models in- memory to replace home- grown expression engines > Extensively tested and documented > Available under GPL and commercially
> Compile-time constant folding using interpreter > Template methods – Used by interpreter to compute subexpressions – Decompiled at build-time to generate methods emitting equivalent byte-code > High-level functional language – To implement complex operations
Define SUM in terms of iteration and +
– Interpreted by interpreter – Compiled to low-level constructs by compiler > Yes, we have developer documentation!
http://www.formulacompiler.org/contribute/hacking/index.htm
> Users need to customize calculations > Use spreadsheets as their UI
Well known, interactive, traceable
Needs good application integration
> Abacus Formula Compiler was built for this
Integrates efficiently with your interfaces
Compiles to fast code
Can use BigDecimal for financial applications
No Excel, OpenOffice, nor even a JDK needed
In production use at Abacus Research AG
Extensive tests
Comprehensive documentation
GPL and commercial version available
> Still requires discipline to remain testable!
Spreadsheets are free of side-effects – your APIs should be too
> http://www.formulacompiler.org/
Peter Arrenbrecht http://arrenbrecht.ch/ codewise.ch peter@arrenbrecht.ch Abacus Formula Compiler
http://formulacompiler.org/
Abacus Research AG http://abacus.ch/