Joining data: a real-world necessity - Pandas Joins for Spreadsheet Users



SLIDE 1

Joining data: a real-world necessity

PANDAS JOINS FOR SPREADSHEET USERS

John Miller

Principal Data Scientist

SLIDE 2

Pandas for spreadsheet users

Learn based on similarities to spreadsheets
Understand the power and flexibility of pandas
Use data from the National Football League (NFL)

SLIDE 3

Common situations

Datasets split by time or another factor
Datasets with related factors

SLIDE 4

Split data

Influenced by reporting cycle
Common splits: time, geography, business unit

SLIDE 5

Split data example

SLIDE 6

Split data example

SLIDE 7

Split data example

SLIDE 8

Complementary data

Results from collecting data for different purposes
Department-specific data
Storage in separate files or database tables

SLIDE 9

Complementary data example

SLIDE 10

Complementary data example

SLIDE 11

Complementary data example

SLIDE 12

Let's practice!

SLIDE 13

Concatenation


John Miller

Principal Data Scientist

SLIDE 14

Concatenation basics

Similar to spreadsheet CONCATENATE
Mimics copy-paste of cells

pd.concat() along rows or columns

SLIDE 15

Concatenating rows

Useful when working with split data

pd.concat([df1, df2, ...])

Uses unique key(s) as data frame index
Includes all rows by default
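A minimal sketch of row concatenation on split data. The frame names, team codes, and win counts are invented for illustration and are not taken from the course's NFL dataset.

```python
import pandas as pd

# Hypothetical split data: one table per season, same columns in each.
# Values are illustrative only, not real statistics.
season_a = pd.DataFrame(
    {"team": ["PHI", "NE"], "wins": [10, 12]},
    index=["PHI-A", "NE-A"],  # unique key per row used as the index
)
season_b = pd.DataFrame(
    {"team": ["PHI", "NE"], "wins": [9, 11]},
    index=["PHI-B", "NE-B"],
)

# Stack the tables vertically; every row from both inputs is kept.
all_seasons = pd.concat([season_a, season_b])
print(all_seasons)
```

Because each index label is unique, every row in the combined frame can still be looked up by its key, for example `all_seasons.loc["NE-B"]`.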

SLIDE 16

Concatenating rows with overlapping indices

Data frame indices may overlap
Don't worry!

pd.concat([df1, df2, ...], ignore_index=True)
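A short sketch of what overlapping indices look like and how `ignore_index=True` resolves them. The two frames are hypothetical and both carry the default 0, 1 index.

```python
import pandas as pd

# Hypothetical frames; each gets the default integer index 0, 1.
first_half = pd.DataFrame({"team": ["PHI", "NE"], "points": [24, 17]})
second_half = pd.DataFrame({"team": ["PHI", "NE"], "points": [17, 16]})

# Without ignore_index, the labels 0 and 1 each appear twice.
stacked = pd.concat([first_half, second_half])
print(stacked.index.tolist())     # [0, 1, 0, 1]

# ignore_index=True discards the old labels and renumbers from 0.
renumbered = pd.concat([first_half, second_half], ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2, 3]
```

This fits when the old index carries no meaning. If the labels are real keys, keeping them and resolving the duplicates explicitly may be the safer choice.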

SLIDE 17

Concatenating columns

Like pasting tables side by side
Across columns: axis=1

pd.concat([df1, df2, ...], axis=1)

Includes all columns by default
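A sketch of column concatenation, using two made-up frames whose indices only partly overlap. Unlike a literal paste, `pd.concat` with `axis=1` lines rows up by index label, and the default outer join fills gaps with NaN.

```python
import pandas as pd

# Hypothetical frames indexed by team code; values are illustrative.
offense = pd.DataFrame({"points_for": [28, 21]}, index=["PHI", "NE"])
defense = pd.DataFrame({"points_against": [17, 24]}, index=["NE", "DAL"])

# Rows are matched on index label; labels missing from one frame get NaN.
side_by_side = pd.concat([offense, defense], axis=1)
print(side_by_side)

# join="inner" keeps only the labels present in every input.
shared_only = pd.concat([offense, defense], axis=1, join="inner")
print(shared_only)
```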

SLIDE 18

Let's practice!

SLIDE 19

Power and flexibility


John Miller

Principal Data Scientist

SLIDE 20

Scalability

No hard limits on data frame size
Built-in ways to "chunk" data
Use distributed/parallel computing
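One built-in way to chunk data is the `chunksize` argument of `pd.read_csv`; whether this is the mechanism the slide has in mind is an assumption. The sketch below reads a small in-memory CSV (a stand-in for a large file) two rows at a time and keeps a running total per team.

```python
import io
import pandas as pd

# Stand-in for a large CSV file; in practice this would be a file path.
# Values are illustrative only.
csv_text = "team,points\nPHI,24\nNE,17\nPHI,17\nNE,16\nPHI,30\n"

totals = {}
# chunksize makes read_csv yield data frames of at most 2 rows each,
# so the whole file never has to sit in memory at once.
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2):
    for team, pts in chunk.groupby("team")["points"].sum().items():
        totals[team] = totals.get(team, 0) + int(pts)

print(totals)
```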

SLIDE 21

Efficiency

Join on multiple columns
Preference for simple code

joined_df = left_df.merge(right_df)
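A sketch of why the one-line merge can join on multiple columns: with no `on` argument, `merge` uses every column name the two frames share, and the default join type is inner. The frames and the `coach` values here are hypothetical.

```python
import pandas as pd

# Hypothetical frames sharing two key columns: team and season.
left_df = pd.DataFrame({
    "team": ["PHI", "PHI", "NE"],
    "season": [2017, 2018, 2017],
    "wins": [10, 9, 12],          # illustrative values
})
right_df = pd.DataFrame({
    "team": ["PHI", "NE", "NE"],
    "season": [2017, 2017, 2018],
    "coach": ["Coach A", "Coach B", "Coach B"],  # placeholder names
})

# No `on` given: merge joins on all shared columns (team and season).
joined_df = left_df.merge(right_df)

# The same join with the keys and join type spelled out.
explicit_df = left_df.merge(right_df, on=["team", "season"], how="inner")

print(joined_df)
```

Only the team and season pairs present in both frames survive the inner join. Naming the keys explicitly is longer but guards against an unintended shared column silently becoming part of the key.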

SLIDE 22

Integration

Improved speed and scale
Data visualization
Machine learning

SLIDE 23

A word on advanced spreadsheet usage

Data models and query tools
Programming languages
Advanced formulas

SLIDE 24

Let's practice!
