FastFunction: Replacing a herd of lemmings with a cheetah A Ruby - - PowerPoint PPT Presentation

fastfunction replacing a herd of lemmings with a cheetah
SMART_READER_LITE
LIVE PREVIEW

FastFunction: Replacing a herd of lemmings with a cheetah A Ruby - - PowerPoint PPT Presentation

FastFunction: Replacing a herd of lemmings with a cheetah A Ruby framework for interaction with PostgreSQL databases Henrietta Dombrovskaya, Srivathsava Rangarajan, Jonathan Marks Enova Chicago, USA What is this presentation about This


slide-1
SLIDE 1

FastFunction: Replacing a herd

  • f lemmings with a cheetah

A Ruby framework for interaction with PostgreSQL databases

Henrietta Dombrovskaya, Srivathsava Rangarajan, Jonathan Marks Enova Chicago, USA

slide-2
SLIDE 2

This presentation is not about databases! We are online lenders, and all we care about is

$$MONEY$$

Actually… It’s all about application performance! What is this presentation about

slide-3
SLIDE 3

Why we care about performance?

Time = Money! The problem

slide-4
SLIDE 4

Nobody likes to wait!

1 sec page load slowdown => $1.6 billion lost sales/year Slowing search results by 0.4 sec – loss of 8 million searches per day

Money by the numbers

slide-5
SLIDE 5
  • 50% visitors abandon the site, which is not loaded within 3 sec
  • 79% visitors will never return again

Of course, we sell money, and that makes a difference,

but…

  • Will you wait for 30 sec for the registration to be completed?
  • Especially when there are competitors around?

How long they can wait?

slide-6
SLIDE 6

That’s what we have:

Our US master PG database runs on 80 thread processors 2.4GHz 512 Gb RAM – almost completely used by disk cache 1066MHz (responses from RAM are 0.9 ns) I/O 4Gb/sec with avg response time 3ms I/O utilization: 40%

Even with the best hardware available:

we can make it only twice faster Current cost: 20K (commodity) Next – 100K – somewhat faster (non-commodity) Next - 1,000K - twice faster (mainframe)

Hardware is not a problem!

slide-7
SLIDE 7

Too many database calls per one action! Why there are so many? ORM (Active Record) results in ORIM

What is the reason for slowness?

Action Avg # db calls Max # db calls See amortization schedule 350 2,500 See my balances 84 2,500 See my payments 170 1,800 Perform application search 70 1,200 See account summary 80 790

slide-8
SLIDE 8

We introduced the Logic Split methodology about three years ago as a solution to our performance problems (EDBT 2014):

Logic Split methodology

slide-9
SLIDE 9

Logic Split Steps

üDisassemble üIdentify data retrieval üConstruct a single query üExecute üUse retrieved data in other steps

slide-10
SLIDE 10

We’ve achieved amazing performance

0.5 1 1.5 2 2.5 3 3.5 4 4.5 5 Customer Summary Loan Summary Loan Payments Installments

Avg DB time(sec)

Old Avg Time (sec) New Avg Time (sec)

slide-11
SLIDE 11

But… look at the code!

slide-12
SLIDE 12

This is way too complicated! Besides, it’s against the OO concept!

So the app developers say…

slide-13
SLIDE 13

FastFunction

A Better Fit for Logic Split and Performance

ActiveRecord FastFunction General purpose Optimized by task Expressive query Single-purpose Builds SQL queries Calls stored procedures Familiar to developers Looks like ActiveRecord

slide-14
SLIDE 14

ActiveRecord Generates SQL

14

Lemming.all SELECT * FROM lemmings

slide-15
SLIDE 15

Adding Conditionals

15

Lemming.find_by(name: “Bob”, tag: 5) SELECT * FROM lemmings WHERE name = ‘Bob’ AND tag = 5

slide-16
SLIDE 16

Adding Complexity

16

Lemming.find_by(name: “Bob”, tag: 5) .order_by(:age) SELECT * FROM lemmings WHERE name = ‘Bob’ AND tag = 5 ORDER BY age

slide-17
SLIDE 17

SELECT * FROM lemmings WHERE name = ‘Bob’ AND tag = 5 ORDER BY age Execute the Query

17

Lemming.find_by(name: “Bob”, tag: 5) .order_by(:age) Generated Query Application Database Query Result

slide-18
SLIDE 18

SELECT result.* FROM cheetahs_by_name_and_tag(?, ?) AS result FastFunction Minimal SQL

18

CheetahsByNameAndTag

slide-19
SLIDE 19

SELECT result.* FROM cheetahs_by_name_and_tag(‘Bob’, 5) AS result

Database Knows Better

19

CheetahsByNameAndTag.execute(‘Bob’, 5) Templated Query Application Database Query Result

slide-20
SLIDE 20

Alike Where It Counts…

20

…To the Application Programmer

slide-21
SLIDE 21

ActiveRecord

class Peanut < ActiveRecord::Base # Convention identifies base table as “peanuts” end

Create a Model Class

21

FastFunction

class PopularPeanutsByBrand < FastFunction function = ‘popular_peanuts_by_brand(?)’ end

# Convention can give function name too, but not arg list

slide-22
SLIDE 22

ActiveRecord

peanuts = Peanuts.find_by(brand: ‘Planters’).order_by(:rating) most_popular = peanuts.first puts “Best Planters peanuts: “ + most_popular.flavor

Invoke a Query Operation

22

FastFunction

peanuts = PopularPeanutsByBrand.execute(‘Planters’) most_popular = peanuts.first puts “Best Planters peanuts: “ + most_popular.flavor

slide-23
SLIDE 23

ActiveRecord

peanuts = Peanuts.find_by(brand: ‘Planters’).order_by(:rating) most_popular = peanuts.first puts “Best Planters peanuts: “ + most_popular.flavor

Receive a List of Results…

23

FastFunction

peanuts = PopularPeanutsByBrand.execute(‘Planters’) most_popular = peanuts.first puts “Best Planters peanuts: “ + most_popular.flavor

slide-24
SLIDE 24

ActiveRecord

peanuts = Peanuts.find_by(brand: ‘Planters’).order_by(:rating) most_popular = peanuts.first puts “Best Planters peanuts: “ + most_popular.flavor

…As Native Objects

24

FastFunction

peanuts = PopularPeanutsByBrand.execute(‘Planters’) most_popular = peanuts.first puts “Best Planters peanuts: “ + most_popular.flavor

slide-25
SLIDE 25

…With Discovered Attributes

25

ActiveRecord

CREATE TABLE peanuts ( brand varchar(50), flavor varchar(50), salty boolean );

FastFunction

CREATE TYPE peanut ( brand varchar(50), flavor varchar(50), salty boolean ); peanut.brand peanut.flavor peanut.salty

Ruby Object

slide-26
SLIDE 26

In the “fast function”!

CREATE FUNCTION popular_peanuts_by_brand(brand VARCHAR) RETURNS SETOF peanut AS $$ SELECT (brand, flavor, salty)::peanut FROM black_magic INNER JOIN dark_wizardry WHERE db_knows_better_than_app $$ LANGUAGE ‘sql’;

So Where Is The SQL?

26

slide-27
SLIDE 27

FastFunction …

27

  • Data retrieval only
  • Provide familiar object-oriented

data access pattern

  • Facilitate custom access strategy
  • Need SQL type and expertise
  • Run queries AR can’t write
  • Allow results not stored in any
  • ne table
  • Encapsulate queries better

maintained in database

Does

  • Replace ActiveRecord
  • Interact directly with

ActiveRecord

  • Insert, Update, nor Delete
  • Implement associations
  • utside the stored procedure
  • Need a base table
  • Perform operations outside

the stored procedure

  • Write your query for you

Doesn’t

slide-28
SLIDE 28

Case Study: Account Presenter

28

Account Government Identifiers Phone Numbers Addresses Spouse details account.home_address account.work_address account.alt_address account.state_id account.driving_license account.passport

slide-29
SLIDE 29

Grouping Access Simply put, the problem is one of grouping access to the database and tables.

class Account def home_address; end def work_address; end def all_addresses [home_address] + [work_address] end def first_name; end def language; end end

slide-30
SLIDE 30

Design Constraints # tables: 70-80 # fields: 230+

1) optimized, robust join strategy geqo collapse_limits 2) readable and maintainable UDF modularized performant 3) dynamically mapped return type application seamlessly maps updated types

slide-31
SLIDE 31

Case Study: Data Flow

Account Addresses Personal Spouse home_address first_name language work home language first_name addresses people

slide-32
SLIDE 32

Common Table Expressions (CTEs) Similar to derived table.

WITH address AS ( SELECT $, * FROM addresses ... WHERE account_id = $ GROUP BY type ), spouse_details AS ( SELECT $, * FROM spouses ... WHERE account_id = $ ), account_details AS ( SELECT $, * FROM accounts ... WHERE account_id = $ ) ...

slide-33
SLIDE 33

Testing Methodology Split into correctness and performance testing

  • A/B test on 30,000 randomly chosen accounts
  • On same production environment
  • Spread across multiple parts of multiple days
  • Warm and cold cache
  • Split load into 4 groups:

20 40 60 80 100 120 1 2 3 4 query count Query Group

Query Groups

average_query_count average_normalized_query_count

slide-34
SLIDE 34

Results-1 Current vs. FastFunction-optimized fetch times

50 100 150 200 250 300 1 2 3 4 Time(ms) Query Group

Current, Optimized data fetch vs. Time

50 100 150 200 250 300 1 2 3 4 Time(ms) Query Group

Current, Optimized cached data fetch vs. Time current_fetch_time(avg)

  • ptimized_fetch_time(avg)
slide-35
SLIDE 35

Results-2 FastFunction data fetch breakdown

10 20 30 40 50 60

  • ptimized_uncached_time_split
  • ptimized_cached_time_split

Time(ms)

Optimized (uncached, cached) data fetch breakdown

average_data_pruning_time average_misc_query_time average_hash_structuring_overhead average_mapper_marshalling_overhead average_mapper_call_setup_overhead average_function_call_time

slide-36
SLIDE 36

The ORIM is not purely technical:

  • Ireland et al: conceptual and psychological aspects
  • Agile Data essay: cultural impedance mismatch

ORMs which allow to embed SQL:

  • Hibernate
  • LINQ
  • SQLAlchemy

Identify delinquent code patterns:

  • AppSleuth
  • StatusQuo

Holistic optimization:

  • Dbridge

Related work

slide-37
SLIDE 37
  • Spreading awareness
  • Adding new features, making the FastFunction

looking more like another ORM

  • Creating a whole database API layer

Future work