Types of joins PAN DAS JOIN S F OR S P READS H EET US ERS John - - PowerPoint PPT Presentation

types of joins
SMART_READER_LITE
LIVE PREVIEW

Types of joins PAN DAS JOIN S F OR S P READS H EET US ERS John - - PowerPoint PPT Presentation

Types of joins PAN DAS JOIN S F OR S P READS H EET US ERS John Miller Principal Data Scientist Three types of joins Types One-to-one One-to-many Many-to-many Join type indicates relationship of tables Like lookup values in VLOOKUP


slide-1
SLIDE 1

Types of joins

PAN DAS JOIN S F OR S P READS H EET US ERS

John Miller

Principal Data Scientist

slide-2
SLIDE 2

PANDAS JOINS FOR SPREADSHEET USERS

Three types of joins

Types One-to-one One-to-many Many-to-many Join type indicates relationship of tables Like lookup values in VLOOKUP

slide-3
SLIDE 3

PANDAS JOINS FOR SPREADSHEET USERS

One-to-one

Datasets at the same level Lookup column same for both tables Rows match one-to-one

slide-4
SLIDE 4

PANDAS JOINS FOR SPREADSHEET USERS

One-to-many

Rows are unique at different levels Join based on the higher-level column Result based on lower-level column Similar to VLOOKUP based on lower-level data

slide-5
SLIDE 5

PANDAS JOINS FOR SPREADSHEET USERS

Many-to-many

Values are repeated Results in all possible combinations Can be hard to interpret Best supplemented with a "bridge" table

slide-6
SLIDE 6

Let's practice!

PAN DAS JOIN S F OR S P READS H EET US ERS

slide-7
SLIDE 7

A closer look at one- to-one joins

PAN DAS JOIN S F OR S P READS H EET US ERS

John Miller

Principal Data Scientist

slide-8
SLIDE 8

PANDAS JOINS FOR SPREADSHEET USERS

Basics of pandas.merge()

Used with complementary data

pd.merge(left, right)

Joins on one or more columns Similar to VLOOKUP

slide-9
SLIDE 9

PANDAS JOINS FOR SPREADSHEET USERS

Left merges

pd.merge(df_left, df_right,

  • n='GameKey', how='left')

List data frame names for "df_left" and "df_right" placeholders Resulting frame matches rows in left frame

slide-10
SLIDE 10

PANDAS JOINS FOR SPREADSHEET USERS

Right merges

pd.merge(df_left, df_right,

  • n='GameKey', how='right')

Same concept as left merge Resulting frame rows match right frame

slide-11
SLIDE 11

PANDAS JOINS FOR SPREADSHEET USERS

A framework for joins

After viewing and understanding the data: Determine the relationship Check for unique values in key column Write merge statement and execute

slide-12
SLIDE 12

PANDAS JOINS FOR SPREADSHEET USERS

Joining on two keys

Determine the relationship Check for unique values in key column Write merge statement and execute

pd.merge(df_left, df_right,

  • n=['GameKey', 'PlayId'])
slide-13
SLIDE 13

Let's practice!

PAN DAS JOIN S F OR S P READS H EET US ERS

slide-14
SLIDE 14

Combining common data with inner joins

PAN DAS JOIN S F OR S P READS H EET US ERS

John Miller

Principal Data Scientist

slide-15
SLIDE 15

PANDAS JOINS FOR SPREADSHEET USERS

Object-oriented expressions

pandas-level function:

pd.merge(df_left, df_right, on=None)

Shorthand, pandas dataframe method:

df_left.merge(df_right, on=None)

slide-16
SLIDE 16

PANDAS JOINS FOR SPREADSHEET USERS

A basic inner join

Similar to VLOOKUP followed by removing NA's

df1.merge(df2, on='GameKey', how='inner')

Returns only rows present in both tables

slide-17
SLIDE 17

PANDAS JOINS FOR SPREADSHEET USERS

Joining on different names

Key columns may have different names No need to rename columns df1.merge(df2, left_on='GameKey', right_on='game-key', how='inner')

slide-18
SLIDE 18

Let's practice!

PAN DAS JOIN S F OR S P READS H EET US ERS