Scala For Business Automation Solving Real Business Problems with - - PowerPoint PPT Presentation

scala for business automation
SMART_READER_LITE
LIVE PREVIEW

Scala For Business Automation Solving Real Business Problems with - - PowerPoint PPT Presentation

Scala For Business Automation Solving Real Business Problems with Streams, POI and a Dash of BDD Conor Svensson conor10@gmail.com The Spreadsheet The PCs first killer app The London Whale the model operated through a series of Excel


slide-1
SLIDE 1

Scala For Business Automation

Solving Real Business Problems with Streams, POI and a Dash of BDD

Conor Svensson conor10@gmail.com

slide-2
SLIDE 2
slide-3
SLIDE 3
slide-4
SLIDE 4

The Spreadsheet

The PC’s first killer app

slide-5
SLIDE 5

The London Whale

…the model operated through a series of Excel spreadsheets, which had to be completed manually, by a process of copying and pasting data from one spreadsheet to another…

Report of JPMorgan Chase & Co. Management Task Force Regarding 2012 CIO Losses

slide-6
SLIDE 6

A brief history…

  • 1979 VisiCalc - The first killer app for PCs on Apple II
  • 1983 Lotus 1-2-3
  • 1985 Excel 1.0 on Apple
  • 1987 Excel 2.0 on MS-DOS
  • 1992 Excel 3.0 on Windows
  • 1994 StarCalc (Star Office)
  • 2001 Calc (Open Office)
  • 2006 Google Spreadsheets
  • 2007 iWork Numbers
slide-7
SLIDE 7

Today

slide-8
SLIDE 8

Use cases

  • Traditional
  • Accounting/inventory
  • Project management
  • Simple data analysis
  • Reporting
  • Exotic
  • Business process workflows
  • Risk management
  • Transaction capture
  • Reconciliations
slide-9
SLIDE 9

Why?

Initially…

“I can solve this problem with a spreadsheet”

  • Avoid more

expensive build via a proper solution (bureaucratic change processes don’t help here)

  • Lack of

development knowledge

And then…

Management/audit/ regulators arrive…

  • “Why do we have

these decentralised business critical spreadsheets?”

  • “We need a proper

solution”

Then…

It becomes business critical

slide-10
SLIDE 10

Complex spreadsheets

  • Multiple lookup tables with reference data
  • Data from multiple data sources
  • Series of complex calculations
  • Output reports
  • Centralised via a file share (if lucky), multiple

copies tend to exist due to e-mail

slide-11
SLIDE 11

The result

slide-12
SLIDE 12

Automation

slide-13
SLIDE 13
  • 1. Apache POI
  • Poor Obfuscation Implementation
  • The Java API for Microsoft Documents
  • Work with Office files without a license (Excel, Word,

PowerPoint, Outlook, Visio, Publisher)

  • Text extraction applications - web spiders, index

builders, content management systems

  • Business automation…
slide-14
SLIDE 14
  • 2. Scala
  • Expressive and concise
  • Streams, for comprehensions, and case classes
  • Many organisations have not yet made the

switch to Java 8

  • Slots easily into existing Java dominant

environments (e.g. financial institutions)

slide-15
SLIDE 15

Streams

slide-16
SLIDE 16

For comprehension

Instead of: We can use:

slide-17
SLIDE 17

Case classes

Scala provides:

  • field accessors
  • constructors
  • equals
  • hashCode
  • toString
  • copy
slide-18
SLIDE 18
  • 3. BDD in Scala

Examples use ScalaTest’s FlatSpec with Matchers

slide-19
SLIDE 19

An example

Q: Is there a correlation between Bitcoin and commodity returns?

slide-20
SLIDE 20

Source data

  • Bitcoin prices
  • ASX SPI 200 (Australian commodities index)
slide-21
SLIDE 21

Our model

  • Reorder data
  • Create named ranges
  • Merge price series by date
  • Calculate:
  • 1. Returns of our price series
  • 2. Mean
  • 3. Standard deviation
  • 4. Covariance
  • 5. Correlation
slide-22
SLIDE 22
slide-23
SLIDE 23
slide-24
SLIDE 24
slide-25
SLIDE 25
slide-26
SLIDE 26
slide-27
SLIDE 27
slide-28
SLIDE 28

The Code

ExcelReader

  • Use named ranges or sheet/row/col indices

Stats

  • Statistical library

Correlation

  • Incremental extraction of business logic with

Streams/for-comprehensions

slide-29
SLIDE 29

ExcelReader

slide-30
SLIDE 30

Stats

slide-31
SLIDE 31

Correlation

slide-32
SLIDE 32

End to end validation

Tests that all can understand!!!

slide-33
SLIDE 33
slide-34
SLIDE 34

Clear test output

slide-35
SLIDE 35

Building on this

  • Rapid prototyping/MVP
  • Spreadsheet used as initial datasource
  • Micro-spreadsheet-services, etc… (sorry)
slide-36
SLIDE 36

Limitations

  • VBA evaluations are not supported, only formula

evaluations are

  • Some Excel functions are not yet implemented

(but you can contribute easily)

  • It’s not a silver bullet for avoiding spreadsheets,

merely a tool to ease the pain of moving off them

  • Additional overhead of more granular stream

processing

slide-37
SLIDE 37

In summary

  • Useful tools for rapid prototyping/MVP without risking data

quality

  • Reduced implementation error
  • Spreadsheet can be modelled incrementally
  • BDD test cases verified against original source
  • Use FP techniques
  • Very well suited to streams
  • Promoting immutability of our data structures
slide-38
SLIDE 38

Further resources

  • https://github.com/conor10/spreadsheets - source code and example code

supporting this talk

  • https://www.quandl.com - Quandl for data
  • https://poi.apache.org/ - Apache POI Project
  • http://scalatest.org/ - ScalaTest
  • Power, D. J., "A Brief History of Spreadsheets", DSSResources.COM, World

Wide Web, http://dssresources.com/history/sshistory.html, version 3.6, 08/30/2004

  • http://blogs.office.com/2012/09/13/introducing-spreadsheet-controls-in-the-

new-office/ - Controls for critical spreadsheets

  • http://fsprojects.github.io/ExcelProvider/ - F# Excel Type Provider
slide-39
SLIDE 39

Questions?