
Joining data: a real-world necessity (Pandas Joins for Spreadsheet Users)

Joining data: a real-world necessity. Pandas Joins for Spreadsheet Users, presented by John Miller, Principal Data Scientist. Pandas for spreadsheet users: learn based on similarities to spreadsheets and understand the power and flexibility of pandas.


  1. Joining data: a real-world necessity. Pandas Joins for Spreadsheet Users. John Miller, Principal Data Scientist

  2. Pandas for spreadsheet users: learn based on similarities to spreadsheets; understand the power and flexibility of pandas; use data from the National Football League (NFL)

  3. Common situations: datasets split by time or other factor; datasets with related factors

  4. Split data: influenced by reporting cycle. Common splits: time, geography, business unit

  5. Split data example

  6. Split data example

  7. Split data example

  8. Complementary data: results from collecting data for different purposes; department-specific data; storage in separate files or database tables

  9. Complementary data example

  10. Complementary data example

  11. Complementary data example

  12. Let's practice!

  13. Concatenation

  14. Concatenation basics: similar to spreadsheet CONCATENATE; mimics copy-paste of cells; pd.concat() along rows or columns

  15. Concatenating rows: useful when working with split data; pd.concat([df1, df2, ...]); uses unique key(s) as data frame index; includes all rows by default

  16. Concatenating rows with overlapping indices: data frame indices may overlap. Don't worry! pd.concat([df1, df2, ...], ignore_index=True)

  17. Concatenating columns: like pasting tables side by side; across columns: axis=1; pd.concat([df1, df2, ...], axis=1); includes all columns by default

  18. Let's practice!
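
The concatenation slides (14 to 17) can be illustrated with a minimal sketch. The table names and values below are hypothetical and not taken from the course's NFL data; only pd.concat(), ignore_index=True and axis=1 come from the slides.

```python
import pandas as pd

# Hypothetical split data: one small table per season (values are made up).
season_2017 = pd.DataFrame({"team": ["A", "B"], "wins": [10, 6]})
season_2018 = pd.DataFrame({"team": ["A", "B"], "wins": [8, 9]})

# Row-wise concatenation keeps every row; the original indices repeat (0, 1, 0, 1).
stacked = pd.concat([season_2017, season_2018])

# ignore_index=True discards the overlapping indices and renumbers the rows.
renumbered = pd.concat([season_2017, season_2018], ignore_index=True)

# axis=1 places tables side by side, aligning rows on the index.
stadiums = pd.DataFrame({"stadium": ["North Field", "South Park"]})  # hypothetical
side_by_side = pd.concat([season_2017, stadiums], axis=1)

print(stacked.index.tolist())         # [0, 1, 0, 1]
print(renumbered.index.tolist())      # [0, 1, 2, 3]
print(side_by_side.columns.tolist())  # ['team', 'wins', 'stadium']
```

Note that column-wise concatenation aligns on the index rather than on row position, so tables whose indices differ will produce missing values instead of a clean side-by-side paste.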

  19. Power and flexibility

  20. Scalability: no hard limits on data frame size; built-in ways to "chunk" data; use distributed/parallel computing

  21. Efficiency: join on multiple columns; preference for simple code; joined_df = left_df.merge(right_df)

  22. Integration: improved speed and scale; data visualization; machine learning

  23. A word on advanced spreadsheet usage: data models and query tools; programming languages; advanced formulas

  24. Let's practice!
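
Slide 21 mentions joining on multiple columns with joined_df = left_df.merge(right_df). A minimal sketch of what that one-liner does is shown here, assuming two hypothetical tables that share the columns "team" and "season"; the data is invented for illustration.

```python
import pandas as pd

# Hypothetical complementary tables (all names and values are made up).
left_df = pd.DataFrame({
    "team": ["A", "A", "B"],
    "season": [2017, 2018, 2017],
    "wins": [10, 8, 6],
})
right_df = pd.DataFrame({
    "team": ["A", "A", "B"],
    "season": [2017, 2018, 2018],
    "attendance": [500, 520, 410],
})

# With no `on` argument, merge joins on every column name the two tables
# share (here "team" and "season") and performs an inner join by default.
joined_df = left_df.merge(right_df)

# The same join with the keys and join type spelled out.
explicit_df = left_df.merge(right_df, on=["team", "season"], how="inner")

print(joined_df)
```

Only the (team, season) pairs present in both tables survive the default inner join, so team B drops out here. Passing how="left" or how="outer" would keep unmatched rows, roughly as a spreadsheet lookup leaves blanks for missing matches.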
