  1. Making Pandas Fly (live from London) EuroPython 2020 Ian Ozsvald @IanOzsvald – ianozsvald.com

  2. Introductions
     - Interim Chief Data Scientist
     - 19+ years experience
     - 2nd Edition!
     - Team coaching & public courses: I’m sharing from my Higher Performance Python course
     Ian Ozsvald By [ian]@ianozsvald[.com]

  3. Thank the organisers!
     - All volunteers: go say thank you in #lobby
     - They’ve put in a huge amount of volunteered work for us!

  4. Today’s goal
     - Pandas
       - Saving RAM to fit in more data
       - Calculating faster by dropping to NumPy
     - Advice for “being highly performant”
     - Has Covid 19 affected UK Company Registrations?

  5. Strings are expensive and slow

  6. Categoricals are cheap and fast! Circa 1% of previous memory cost
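The memory claim on this slide can be checked with a minimal sketch. The data below is hypothetical (four made-up company-type labels repeated many times), so the exact saving will differ from the circa 1% reported for the talk's dataset.

```python
import pandas as pd

# Hypothetical data: a low-cardinality string column held as Python objects.
ser_str = pd.Series(["LTD", "PLC", "LLP", "CIC"] * 25_000, dtype=object)

# Convert to a categorical: each label is stored once, rows hold small codes.
ser_cat = ser_str.astype("category")

# deep=True counts the Python string objects, index=False ignores the index.
bytes_str = ser_str.memory_usage(deep=True, index=False)
bytes_cat = ser_cat.memory_usage(deep=True, index=False)

print(f"object strings: {bytes_str:,} bytes")
print(f"categorical:    {bytes_cat:,} bytes")
```

The saving depends on how few distinct values the column holds; a column of mostly unique strings gains little or nothing.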

  7. Categoricals “.cat” accessor
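As a small illustration of the accessor, assuming a made-up series of company-type labels, `.cat.categories` exposes the unique labels and `.cat.codes` the per-row integer codes:

```python
import pandas as pd

# Hypothetical labels, stored directly as a categorical.
ser_cat = pd.Series(["LTD", "PLC", "LTD", "LLP"], dtype="category")

# The unique labels, stored once (sorted by default for an unordered categorical).
categories = list(ser_cat.cat.categories)

# One small integer per row, pointing into the categories.
codes = list(ser_cat.cat.codes)

print(categories)
print(codes)
```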

  8. Categoricals – over 10x speed up (on this data)!

  9. Categoricals – index queries faster! Circa 500x speed-up!

  10. float64 is default and a bit expensive

  11. float32 “half-price” and a bit faster
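A minimal sketch of the float64 versus float32 trade-off, using random numbers as a stand-in for real data. It shows the halved memory and also the precision that is given up, which is worth checking before downcasting:

```python
import numpy as np
import pandas as pd

# Hypothetical data: one million random values in [0, 1).
rng = np.random.default_rng(0)
ser64 = pd.Series(rng.random(1_000_000))  # float64 by default
ser32 = ser64.astype("float32")

bytes64 = ser64.memory_usage(index=False)
bytes32 = ser32.memory_usage(index=False)

# Largest rounding error introduced by the downcast.
max_abs_error = float((ser64 - ser32).abs().max())

print(f"float64: {bytes64:,} bytes, float32: {bytes32:,} bytes")
print(f"max absolute error after downcast: {max_abs_error:.2e}")
```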

  12. Make choices to save RAM. Including the index (previously we ignored it) we still save circa 50% RAM, so you can fit in more rows of data
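One way to measure a whole DataFrame including its index is sketched below. The frame, its column names and its integer index are all hypothetical, so the saving here will not match the circa 50% seen on the talk's data.

```python
import numpy as np
import pandas as pd

n = 100_000
rng = np.random.default_rng(0)

# Hypothetical frame: one string column, one float column, an integer index.
df = pd.DataFrame(
    {
        "company_type": pd.Series(["LTD", "PLC", "LLP", "CIC"] * (n // 4), dtype=object),
        "value": rng.random(n),
    }
)
df.index = pd.Index(np.arange(n) + 1_000_000)

# Cheaper dtypes for the columns; the index is left untouched.
df_small = df.astype({"company_type": "category", "value": "float32"})

# memory_usage reports the index as its own "Index" entry.
usage_before = df.memory_usage(deep=True, index=True)
usage_after = df_small.memory_usage(deep=True, index=True)
before = int(usage_before.sum())
after = int(usage_after.sum())

print(usage_before)
print(usage_after)
print(f"total: {before:,} -> {after:,} bytes")
```

The index costs the same before and after, which is why it matters whether it is included in the comparison.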

  13. “dtype_diet” gives you advice

  14. Drop to NumPy if you know you can. Caveat: Pandas mean is not np.mean; the fair comparison is to np.nanmean, which is slower. See my blog or PyDataAmsterdam 2020 talk for details
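The caveat can be seen on a tiny made-up series containing one missing value: Pandas skips NaN by default, `np.mean` does not, and `np.nanmean` matches the Pandas behaviour.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value.
ser = pd.Series([1.0, 2.0, np.nan, 4.0])

pandas_mean = ser.mean()                    # skipna=True by default
numpy_mean = np.mean(ser.to_numpy())        # NaN propagates into the result
numpy_nanmean = np.nanmean(ser.to_numpy())  # ignores NaN, like Pandas

print(pandas_mean, numpy_mean, numpy_nanmean)
```

So dropping to `np.mean` is only a like-for-like replacement when the data is known to have no missing values.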

  15. NumPy vs Pandas overhead (ser.sum()). Thanks! 25 files, 83 functions. Very few NumPy calls!

  16. Overhead...

  17. Overhead with ser.values.sum(): 18 files, 51 functions. Many fewer Pandas calls (but still a lot!)
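A rough way to compare the two call paths is sketched below on a small hypothetical series, where per-call overhead dominates. Timings vary by machine and Pandas version, so none are quoted here.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical data: a short series, so call overhead outweighs the arithmetic.
ser = pd.Series(np.arange(1_000, dtype="float64"))

total_pandas = ser.sum()         # goes through the Pandas machinery
total_numpy = ser.values.sum()   # sums the underlying NumPy array directly

# Time each path; results depend on the machine and installed versions.
time_pandas = timeit.timeit(lambda: ser.sum(), number=200)
time_numpy = timeit.timeit(lambda: ser.values.sum(), number=200)

print(f"ser.sum():        {time_pandas:.5f} s for 200 calls")
print(f"ser.values.sum(): {time_numpy:.5f} s for 200 calls")
```

Both paths give the same total here because the series has no missing values; with NaN present the NumPy sum would return NaN.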

  18. Is Pandas unnecessarily slow? NO! The truth is a bit complicated; see https://github.com/pandas-dev/pandas/issues/34773

  19. Being highly performant
      - Install optional (but great!) Pandas dependencies (https://pandas.pydata.org/pandas-docs/stable/user_guide/enhancingperf.html)
        - bottleneck
        - numexpr
      - Investigate https://github.com/ianozsvald/dtype_diet
      - Investigate my ipython_memory_usage (PyPI/Conda)

  20. Pure Python is “slow” and expressive. Deliberately poor function: pretend this is clever but slow!

  21. Compile to Numba judiciously. Near 10x speed-up!

  22. Parallelise with Dask for multi-core
      - Make plain-Python code multi-core
      - Note I had to drop text index column due to speed-hit
      - Data copy cost can overwhelm any benefits so (always) profile & time

  23. Being highly performant
      - Mistakes slow us down (PAY ATTENTION!)
        - Try nullable Int64 & boolean, forthcoming Float64
        - Write tests (unit & end-to-end)
        - Lots more material & my newsletter on my blog IanOzsvald.com
        - Time saving docs:
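The nullable dtypes mentioned above can be tried with a short sketch on made-up values. Without them, an integer column with a missing value silently becomes float64; with them, the column keeps its type and the gap is stored as `pd.NA`.

```python
import pandas as pd

# Default behaviour: the missing value forces a float64 column with NaN.
ser_float = pd.Series([1, 2, None])

# Nullable integer dtype (capital "I"): stays integer, missing value is pd.NA.
ser_int = pd.Series([1, 2, None], dtype="Int64")

# Nullable boolean dtype: True / False / pd.NA.
ser_bool = pd.Series([True, None, False], dtype="boolean")

print(ser_float.dtype, ser_int.dtype, ser_bool.dtype)
print(ser_int.sum())  # missing values are skipped by default
```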

  24. Vaex / Modin
      - Memory mapped & lazy computation
        - New string dtype (RAM efficient)
      - Modin sits on Pandas, new “algebra” for dfs
        - Drop in replacement, easy to try
      See talks on my blog:

  25. Summary
      - Make it right then make it fast
      - Think about being performant
      - See blog for my classes
      - I’d love a postcard if you learned something new!

  26. Covid 19’s effect on UK Economy? Sharp decline in corporate registration after Lockdown, then apparent surge (perhaps just backed-up paperwork?). Will the recovery “last”? All open data, you can do similar things!
