fastfunction replacing a herd of lemmings with a cheetah
play

FastFunction: Replacing a herd of lemmings with a cheetah A Ruby - PowerPoint PPT Presentation

FastFunction: Replacing a herd of lemmings with a cheetah A Ruby framework for interaction with PostgreSQL databases Henrietta Dombrovskaya, Srivathsava Rangarajan, Jonathan Marks Enova Chicago, USA What is this presentation about This


  1. FastFunction: Replacing a herd of lemmings with a cheetah A Ruby framework for interaction with PostgreSQL databases Henrietta Dombrovskaya, Srivathsava Rangarajan, Jonathan Marks Enova Chicago, USA

  2. What is this presentation about This presentation is not about databases! We are online lenders, and all we care about is $$MONEY$$ Actually… It’s all about application performance!

  3. The problem Why we care about performance? Time = Money!

  4. Money by the numbers Nobody likes to wait! 1 sec page load slowdown => $1.6 billion lost sales/year Slowing search results by 0.4 sec – loss of 8 million searches per day

  5. How long they can wait? - 50% visitors abandon the site, which is not loaded within 3 sec - 79% visitors will never return again Of course, we sell money, and that makes a difference, but… - Will you wait for 30 sec for the registration to be completed? - Especially when there are competitors around?

  6. Hardware is not a problem! That’s what we have: Our US master PG database runs on 80 thread processors 2.4GHz 512 Gb RAM – almost completely used by disk cache 1066MHz (responses from RAM are 0.9 ns) I/O 4Gb/sec with avg response time 3ms I/O utilization: 40% Even with the best hardware available: we can make it only twice faster Current cost : 20K (commodity) Next – 100K – somewhat faster (non-commodity) Next - 1,000K - twice faster (mainframe)

  7. What is the reason for slowness? Too many database calls per one action! Action Avg # db calls Max # db calls See amortization schedule 350 2,500 See my balances 84 2,500 See my payments 170 1,800 Perform application search 70 1,200 See account summary 80 790 Why there are so many? ORM (Active Record) results in ORIM

  8. Logic Split methodology We introduced the Logic Split methodology about three years ago as a solution to our performance problems (EDBT 2014):

  9. Logic Split Steps ü Disassemble ü Identify data retrieval ü Construct a single query ü Execute ü Use retrieved data in other steps

  10. We’ve achieved amazing performance Avg DB time(sec) 5 4.5 4 3.5 3 Old Avg Time (sec) 2.5 New Avg Time (sec) 2 1.5 1 0.5 0 Customer Summary Loan Summary Loan Payments Installments

  11. But… look at the code!

  12. So the app developers say… This is way too complicated! Besides, it’s against the OO concept!

  13. FastFunction A Better Fit for Logic Split and Performance ActiveRecord FastFunction General purpose Optimized by task Expressive query Single-purpose Builds SQL queries Calls stored procedures Familiar to developers Looks like ActiveRecord

  14. ActiveRecord Generates SQL Lemming.all SELECT * FROM lemmings 14

  15. Adding Conditionals Lemming.find_by(name: “Bob”, tag: 5) SELECT * FROM lemmings WHERE name = ‘Bob’ AND tag = 5 15

  16. Adding Complexity Lemming.find_by(name: “Bob”, tag: 5) .order_by(:age) SELECT * FROM lemmings WHERE name = ‘Bob’ AND tag = 5 ORDER BY age 16

  17. Execute the Query Lemming.find_by(name: “Bob”, tag: 5) .order_by(:age) SELECT * FROM lemmings WHERE name = ‘Bob’ AND tag = 5 ORDER BY age Generated Query Application Database Query Result 17

  18. FastFunction Minimal SQL CheetahsByNameAndTag SELECT result.* FROM cheetahs_by_name_and_tag(?, ?) AS result 18

  19. Database Knows Better CheetahsByNameAndTag.execute(‘Bob’, 5) SELECT result.* FROM cheetahs_by_name_and_tag (‘Bob’, 5) AS result Templated Query Application Database Query Result 19

  20. Alike Where It Counts… …To the Application Programmer 20

  21. Create a Model Class ActiveRecord class Peanut < ActiveRecord::Base # Convention identifies base table as “peanuts” end FastFunction class PopularPeanutsByBrand < FastFunction function = ‘popular_peanuts_by_brand(?)’ end # Convention can give function name too, but not arg list 21

  22. Invoke a Query Operation ActiveRecord peanuts = Peanuts.find_by(brand: ‘Planters’).order_by(:rating) most_popular = peanuts.first puts “Best Planters peanuts: “ + most_popular.flavor FastFunction peanuts = PopularPeanutsByBrand.execute(‘Planters’) most_popular = peanuts.first puts “Best Planters peanuts: “ + most_popular.flavor 22

  23. Receive a List of Results… ActiveRecord peanuts = Peanuts.find_by(brand: ‘Planters’).order_by(:rating) most_popular = peanuts.first puts “Best Planters peanuts: “ + most_popular.flavor FastFunction peanuts = PopularPeanutsByBrand.execute(‘Planters’) most_popular = peanuts.first puts “Best Planters peanuts: “ + most_popular.flavor 23

  24. …As Native Objects ActiveRecord peanuts = Peanuts.find_by(brand: ‘Planters’).order_by(:rating) most_popular = peanuts.first puts “Best Planters peanuts: “ + most_popular.flavor FastFunction peanuts = PopularPeanutsByBrand.execute(‘Planters’) most_popular = peanuts.first puts “Best Planters peanuts: “ + most_popular.flavor 24

  25. …With Discovered Attributes FastFunction ActiveRecord CREATE TYPE peanut CREATE TABLE peanuts ( ( brand varchar(50), brand varchar(50), flavor varchar(50), flavor varchar(50), salty boolean salty boolean ); ); Ruby Object peanut.brand peanut.flavor peanut.salty 25

  26. So Where Is The SQL? In the “fast function”! CREATE FUNCTION popular_peanuts_by_brand(brand VARCHAR) RETURNS SETOF peanut AS $$ SELECT (brand, flavor, salty)::peanut FROM black_magic INNER JOIN dark_wizardry WHERE db_knows_better_than_app $$ LANGUAGE ‘sql’; 26

  27. FastFunction … Does Doesn’t • Data retrieval only • Replace ActiveRecord • Provide familiar object-oriented • Interact directly with data access pattern ActiveRecord • Facilitate custom access strategy • Insert, Update, nor Delete • Need SQL type and expertise • Implement associations outside the stored procedure • Run queries AR can’t write • Need a base table • Allow results not stored in any one table • Perform operations outside the stored procedure • Encapsulate queries better maintained in database • Write your query for you 27

  28. Case Study: Account Presenter Phone Government Numbers Identifiers account.home_address account.driving_license account.work_address account.passport Account account.alt_address account.state_id Spouse Addresses details 28

  29. Grouping Access Simply put, the problem is one of grouping access to the database and tables. class Account def home_address; end def work_address; end def all_addresses [home_address] + [work_address] end def first_name; end def language; end end

  30. Design Constraints # tables: 70-80 # fields: 230+ 1) optimized, robust join strategy geqo collapse_limits 2) readable and maintainable UDF modularized performant 3) dynamically mapped return type application seamlessly maps updated types

  31. Case Study: Data Flow addresses Account Addresses work home Personal people first_name language Spouse home_address first_name language

  32. Common Table Expressions (CTEs) Similar to derived table. WITH address AS ( SELECT $, * FROM addresses ... WHERE account_id = $ GROUP BY type ), spouse_details AS ( SELECT $, * FROM spouses ... WHERE account_id = $ ), account_details AS ( SELECT $, * FROM accounts ... WHERE account_id = $ ) ...

  33. Testing Methodology Split into correctness and performance testing • A/B test on 30,000 randomly chosen accounts On same production environment • Spread across multiple parts of multiple days • Warm and cold cache • Query Groups 120 • Split load into 4 groups: 100 80 query count 60 average_query_count 40 average_normalized_query_count 20 0 1 2 3 4 Query Group

  34. Results-1 Current vs. FastFunction-optimized fetch times Current, Optimized cached data fetch vs. Current, Optimized data fetch vs. Time Time 300 300 250 250 200 200 Time(ms) Time(ms) 150 150 100 100 50 50 0 0 1 2 3 4 1 2 3 4 Query Group Query Group current_fetch_time(avg) optimized_fetch_time(avg)

  35. Results-2 FastFunction data fetch breakdown Optimized (uncached, cached) data fetch breakdown 60 50 40 average_data_pruning_time average_misc_query_time Time(ms) average_hash_structuring_overhead 30 average_mapper_marshalling_overhead average_mapper_call_setup_overhead 20 average_function_call_time 10 0 optimized_uncached_time_split optimized_cached_time_split

  36. Related work The ORIM is not purely technical: - Ireland et al: conceptual and psychological aspects - Agile Data essay: cultural impedance mismatch ORMs which allow to embed SQL: - Hibernate - LINQ - SQLAlchemy Identify delinquent code patterns: - AppSleuth - StatusQuo Holistic optimization: - Dbridge

  37. Future work • Spreading awareness • Adding new features, making the FastFunction looking more like another ORM • Creating a whole database API layer

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend