From days to minutes, from From days to minutes, from minutes to - - PDF document

from days to minutes from from days to minutes from
SMART_READER_LITE
LIVE PREVIEW

From days to minutes, from From days to minutes, from minutes to - - PDF document

From days to minutes, from From days to minutes, from minutes to milliseconds with minutes to milliseconds with SQLAlchemy SQLAlchemy Leonardo Rochael Almeida 10-July-2019 Hi, Im Leo. Im a Tech Lead at Geru. Im here today to talk


slide-1
SLIDE 1
slide-2
SLIDE 2

From days to minutes, from From days to minutes, from minutes to milliseconds with minutes to milliseconds with SQLAlchemy SQLAlchemy

Leonardo Rochael Almeida 10-July-2019

slide-3
SLIDE 3

Hi, I’m Leo. I’m a Tech Lead at Geru. I’m here today to talk to you about ORMs and performance. I’m by no means an expert in either SQL, SQLAlchemy or ORMs. But I’d like to pass on lessons learned while optimizing some processes in my company. Speaker notes

slide-4
SLIDE 4

Geru Geru

Brazillian Fintech Brazillian Fintech

Backend Stack: Others (Celery, MongoDB, Java, …)

slide-5
SLIDE 5

Our backend stack is almost all Python, with storage mostly in PostgreSQL through SQLAlchemy. Speaker notes

slide-6
SLIDE 6

SQLAlchemy SQLAlchemy

Two aspects: SQL Expression Language (a Python DSL) X Object Relaonal Mapper (ORM)

slide-7
SLIDE 7

SQLAlchemy has two aspects: The SQL Expression Langage, which is a way of mapping SQL constructs into a Pythonic Domain Specific Language (DSL) The Object Relational Mapper, which allows mapping Python classes to tables and records of those tables to instances of the respective classes. The ORM is built upon the DSL, but they can be used without one another. At Geru we use the ORM almost exclusively. TODO: Add slides showing code examples contrasting DSL/ORM Speaker notes

slide-8
SLIDE 8

SQLAlchemy is Awesome! SQLAlchemy is Awesome!

However: Frameworks sll require you to make decisions about how to use them, and knowing the underlying paerns is essenal if you are to make wise choices.

  • Marn Fowler

hps:/ /marnfowler.com/books/eaa.html

slide-9
SLIDE 9

The ORM Trap The ORM Trap

slide-10
SLIDE 10

The ORM Trap The ORM Trap

Sensible Python code ➡ Bad SQL access paerns

slide-11
SLIDE 11

The ORM Trap The ORM Trap

Sensible Python code ➡ Bad SQL access paerns Unnoceable at low data volumes

slide-12
SLIDE 12

The ORM Trap The ORM Trap

Sensible Python code ➡ Bad SQL access paerns Unnoceable at low data volumes Like… during development…

slide-13
SLIDE 13

The ORM Trap The ORM Trap

Sensible Python code ➡ Bad SQL access paerns Unnoceable at low data volumes Like… during development… And early producon…

slide-14
SLIDE 14

Using a good ORM feels great. Most of the time you forget it’s even there! And that is actually the problem, because the DB is an external system with an API and should be treated as such. The API just happens to be SQL… Speaker notes

slide-15
SLIDE 15

The Fix: Let the DB do its Job The Fix: Let the DB do its Job

slide-16
SLIDE 16

The Fix: Let the DB do its Job The Fix: Let the DB do its Job

Be aware of implicit queries.

slide-17
SLIDE 17

The Fix: Let the DB do its Job The Fix: Let the DB do its Job

Be aware of implicit queries. Specially from relaonships.

slide-18
SLIDE 18

The Fix: Let the DB do its Job The Fix: Let the DB do its Job

Be aware of implicit queries. Specially from relaonships. Aim for O(1) queries per request/job/acvity.

slide-19
SLIDE 19

The Fix: Let the DB do its Job The Fix: Let the DB do its Job

Be aware of implicit queries. Specially from relaonships. Aim for O(1) queries per request/job/acvity. Avoid looping through model instances

slide-20
SLIDE 20

The Fix: Let the DB do its Job The Fix: Let the DB do its Job

Be aware of implicit queries. Specially from relaonships. Aim for O(1) queries per request/job/acvity. Avoid looping through model instances Let the DB do it for you

slide-21
SLIDE 21

Be mindful of the work that the database is doing Specially the amount of DB round-trips But also the amount of data traffic (row count) Speaker notes

slide-22
SLIDE 22

Geru Case 1: The 24+ hour Geru Case 1: The 24+ hour reports reports

Now it takes minutes

slide-23
SLIDE 23

Geru Funding Model Geru Funding Model

slide-24
SLIDE 24

Geru is a Fintech that lends money at rates much lower than the mainstream banks in Brazil. We work online exclusively. During each month, borrowers pay their monthly instalments, and at the beginning of every month Geru pays back the Debenture Holders. This is very simplified of course, there are lots of details on top of that: Debentures bought later “cost” more but are “worth” the same Debenture remuneration is complicated by tax details like Amortization paid back doesn’t pay taxes but the premium on top does pay Amount of time invested reduce taxes Different series have different payback rules Speaker notes

slide-25
SLIDE 25

Entities and Relationships Entities and Relationships

slide-26
SLIDE 26

ORM Declaration ORM Declaration

DBSession = scoped_session(sessionmaker(...)) class ORMClass(object): """Base class for all models""" @classproperty def query(cls): """ Convenient query for records of a model, like: query = MyModel.query.filter(...).order_by(...) """ return DBSession.query(cls) Base = declarative_base(cls=ORMClass)

slide-27
SLIDE 27

Model Declaration Model Declaration

class Debenture(Base): id = Column(Integer, primary_key=True) series_number = Column(Integer, nullable=False) sale_price = Column(Numeric, nullable=True) sale_date = Column(Date, nullable=True) # cont ...

slide-28
SLIDE 28

Model Declaration Model Declaration

class Debenture(Base): # cont ... holder_id = Column( Integer, ForeignKey('debenture_holder.id'), nullable=True, index=True, ) holder = relationship( 'DebentureHolder', backref=backref( 'debentures', lazy='dynamic', ), foreign_keys=[holder_id], ) # cont ...

slide-29
SLIDE 29

Model Declaration Model Declaration

class Debenture(Base): # cont ... series_id = Column( Integer, ForeignKey('debenture_series.id'), nullable=False, index=True, ) series = relationship( 'DebentureSeries', backref=backref( 'debentures', lazy='dynamic', ), foreign_keys=[series_id], )

slide-30
SLIDE 30

First things rst: logging First things rst: logging

# development.ini [loggers] keys = sqlalchemy [logger_sqlalchemy] qualname = sqlalchemy.engine level = INFO # "level = INFO" logs SQL queries. # "level = DEBUG" logs SQL queries and results. # "level = WARN" logs neither (in production).

slide-31
SLIDE 31

So I had to debug an issue in the distribution code, but it was taking way too long at each run. So the first thing I did was to enable sqlalchemy statement logging in my development instance, and what I saw was gobs of repeated statements, all alike, just rolling through the logs. Speaker notes

slide-32
SLIDE 32

Understanding the cache Understanding the cache

  • ptimization
  • ptimization

See diff and Jupyter

slide-33
SLIDE 33

It’s perfectly reasonable in pure Python to sum() over an iteration of attribute accessess in generator comprehension. But if the generator comprehension is looping over a query then a lot of data is being fetched from the database so that in the end the Python programmer could calculate do what the database could reply with a single line of SQL. Speaker notes

slide-34
SLIDE 34

Understanding the Understanding the insert/update optimization insert/update optimization

See diff and Jupyter

slide-35
SLIDE 35

When the optimization When the optimization backres backres

See diff and Jupyter

slide-36
SLIDE 36

Unfortunately at some point the complex query that allowed to fetch all information of each integralization started taking hours. It was a single query taking many hours to execute. Fortunately it was easy to locate as it was a single query envelopped by logging calls. The query was then broken into two parts, the second of which was executed in a loop for each integralization. Since the amount of data transmitted was small, and only a single query per loop was added inside a loop that already contained multiple other slower queries, it had no negative impact, and the outer query ran again at the same 2 minutes timeframe as in the beginning. Speaker notes

slide-37
SLIDE 37

Geru Case 2: The 1+minute Geru Case 2: The 1+minute page page

Now renders in less than a second.

slide-38
SLIDE 38

First things rst: slowlog First things rst: slowlog

[app:main] pyramid.includes = pyramid_tm [...] slowlog # Slowlog configuration: slowlog = true slowlog_file = logs/slow.log

slide-39
SLIDE 39

A complete understanding of what slowlog does is out of scope for this talk (there is talk by me at PyConUS 2013 about it on YouTube), but basically slowlog watches for wsgi requests that take too long and starts dumping periodic stack traces of the thread handling the slow requests. Makes it ease to see which point of the code is responsible for the performance issues. Speaker notes

slide-40
SLIDE 40

Understanding the Understanding the authorization optimization authorization optimization

See diff

slide-41
SLIDE 41

Conclusions Conclusions

Understand SQL Understand SQL

slide-42
SLIDE 42

Conclusions Conclusions

Understand SQL Understand SQL

SELECT Documentaon

slide-43
SLIDE 43

Conclusions Conclusions

Understand SQL Understand SQL

SELECT Documentaon GROUP BY vs aggregaon funcons

slide-44
SLIDE 44

Conclusions Conclusions

Understand SQL Understand SQL

SELECT Documentaon GROUP BY vs aggregaon funcons aggregaon funcon w/ filters

slide-45
SLIDE 45

Conclusions Conclusions

Understand SQL Understand SQL

SELECT Documentaon GROUP BY vs aggregaon funcons aggregaon funcon w/ filters DISTINCT ON

slide-46
SLIDE 46

Conclusions Conclusions

Understand SQL Understand SQL

SELECT Documentaon GROUP BY vs aggregaon funcons aggregaon funcon w/ filters DISTINCT ON window expressions

slide-47
SLIDE 47

Study SQL: Read the SELECT documentation of your database. Understand how aggregation functions (sum(), array_agg()) interfere with the cardinality (number of rows) of the result and how they interact with GROUP BY. Understand DISTINCT and specially DISTINCT ON Understand "window" expressions sum(X) OVER (PARTITION BY ... ORDER BY) Read about CTEs (WITH statement) and subqueries, and how/where you can use them. Understand how to insert and update rows that match the result of other queries. Speaker notes

slide-48
SLIDE 48

Conclusions Conclusions

THEN Study SQLAlchemy THEN Study SQLAlchemy

slide-49
SLIDE 49

Conclusions Conclusions

THEN Study SQLAlchemy THEN Study SQLAlchemy

Be aware of the underlying queries

slide-50
SLIDE 50

Conclusions Conclusions

THEN Study SQLAlchemy THEN Study SQLAlchemy

Be aware of the underlying queries Push work to the DB

slide-51
SLIDE 51

Conclusions Conclusions

THEN Study SQLAlchemy THEN Study SQLAlchemy

Be aware of the underlying queries Push work to the DB As much as possible

slide-52
SLIDE 52

Conclusions Conclusions

THEN Study SQLAlchemy THEN Study SQLAlchemy

Be aware of the underlying queries Push work to the DB As much as possible But not too much

slide-53
SLIDE 53

THEN Study SQLAlchemy Learn how to produce in SQLAlchemy the same optimized access patterns you know are possible in SQL Be aware of the underlying queries All attribute accesses could represent a roundtrip. Atomic values usually don’t (though they can if you ask SQLAlchemy to defer loading columns). But all relatioship attributes do, unless they’re not dynamic and have been previously accessed in the same session. Speaker notes

slide-54
SLIDE 54

The End The End

Thank you! LeoRochael@geru.com.br LeoRochael@gmail.com @LeoRochael