SLIDE 1

Categorical Plot Types

DATA VISUALIZATION WITH SEABORN

Chris Moffitt

Instructor

SLIDE 2

Categorical Data

Data which takes on a limited and fixed number of values
Normally combined with numeric data
Examples include:
  Geography (country, state, region)
  Gender
  Ethnicity
  Blood type
  Eye color

SLIDE 3

Plot types - show each observation

SLIDE 4

Plot types - abstract representations

SLIDE 5

Plot types - statistical estimates

SLIDE 6

Plots of each observation - stripplot

sns.stripplot(data=df, y="DRG Definition", x="Average Covered Charges", jitter=True)

SLIDE 7

Plots of each observation - swarmplot

sns.swarmplot(data=df, y="DRG Definition", x="Average Covered Charges")
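The two per-observation plots can be compared side by side. The sketch below is only an illustration: the `toy` dataframe, its `group` and `charge` columns, and all values are made up, standing in for the `df` used on the slides.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up data: three hypothetical groups with 30 numeric observations each.
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "group": np.repeat(["A", "B", "C"], 30),
    "charge": np.concatenate([
        rng.normal(100, 10, 30),
        rng.normal(120, 10, 30),
        rng.normal(90, 10, 30),
    ]),
})

fig, (ax_strip, ax_swarm) = plt.subplots(1, 2, figsize=(10, 4), sharex=True)

# stripplot: points are spread with random jitter, so some may still overlap.
sns.stripplot(data=toy, x="charge", y="group", jitter=True, ax=ax_strip)
ax_strip.set_title("stripplot")

# swarmplot: points are shifted along the categorical axis to avoid overlap.
sns.swarmplot(data=toy, x="charge", y="group", ax=ax_swarm)
ax_swarm.set_title("swarmplot")

fig.tight_layout()
```

In an interactive session, `plt.show()` displays the figure. swarmplot tends to scale poorly to very large datasets, since every point needs its own position.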

SLIDE 8

Abstract representations - boxplot

sns.boxplot(data=df, y="DRG Definition", x="Average Covered Charges")
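As a rough sketch of what a box plot summarizes, the quartiles and the default 1.5 x IQR whisker rule can be computed by hand. The `charges` values are made up, and the "inclusive" quantile method is chosen here because it matches linear-interpolation percentiles.

```python
from statistics import quantiles

# Made-up numeric observations for a single category.
charges = [12, 15, 18, 21, 22, 25, 27, 30, 34, 80]

# The box spans the first to third quartile, with a line at the median.
q1, median, q3 = quantiles(charges, n=4, method="inclusive")
iqr = q3 - q1

# With the default whis=1.5, whiskers reach the most extreme points
# that still lie within 1.5 * IQR of the box.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
whisker_low = min(v for v in charges if v >= lower_fence)
whisker_high = max(v for v in charges if v <= upper_fence)

# Anything beyond the fences is drawn as an individual outlier point.
outliers = [v for v in charges if v < lower_fence or v > upper_fence]

print(q1, median, q3)             # 18.75 23.5 29.25
print(whisker_low, whisker_high)  # 12 34
print(outliers)                   # [80]
```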

SLIDE 9

Abstract representations - violinplot

sns.violinplot(data=df, y="DRG Definition", x="Average Covered Charges")

SLIDE 10

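Abstract representations - boxenplot (formerly lvplot)

lvplot was renamed to boxenplot in seaborn 0.9, so current releases use the new name

sns.boxenplot(data=df, y="DRG Definition", x="Average Covered Charges")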

SLIDE 11

Statistical estimates - barplot

sns.barplot(data=df, y="DRG Definition", x="Average Covered Charges", hue="Region")

SLIDE 12

Statistical estimates - pointplot

sns.pointplot(data=df, y="DRG Definition", x="Average Covered Charges", hue="Region")

SLIDE 13

Statistical estimates - countplot

sns.countplot(data=df, y="DRG_Code", hue="Region")
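The numbers behind the three statistical-estimate plots can be reproduced with pandas. By default, barplot and pointplot show the mean of the numeric variable per category (with error bars), while countplot shows the number of rows per category. The `toy` dataframe and its column names below are made up for illustration.

```python
import pandas as pd

# Made-up data: a categorical column and a numeric column.
toy = pd.DataFrame({
    "region": ["East", "East", "West", "West", "West"],
    "charge": [100.0, 140.0, 90.0, 110.0, 130.0],
})

# What barplot / pointplot estimate by default: the mean per category.
mean_by_region = toy.groupby("region")["charge"].mean()

# What countplot draws: the number of observations per category.
count_by_region = toy["region"].value_counts()

print(mean_by_region.to_dict())   # {'East': 120.0, 'West': 110.0}
print(count_by_region.to_dict())  # {'West': 3, 'East': 2}
```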

SLIDE 14

Let's practice!

SLIDE 15

Regression Plots

DATA VISUALIZATION WITH SEABORN

Chris Moffitt

Instructor

SLIDE 16

Bicycle Dataset

Aggregated bicycle sharing data in Washington DC
Data includes:
  Rental amounts
  Weather information
  Calendar information
Can we predict rental amounts?

SLIDE 17

Plotting with regplot()

sns.regplot(data=df, x='temp', y='total_rentals', marker='+')

SLIDE 18

Evaluating regression with residplot()

A residual plot is useful for evaluating the fit of a model
Seaborn supports this through the residplot function

sns.residplot(data=df, x='temp', y='total_rentals')
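A residual is the observed value minus the value predicted by the fitted line. A minimal sketch of that calculation with NumPy, using made-up `temp` and `rentals` values rather than the bicycle dataset:

```python
import numpy as np

# Made-up observations standing in for the temp / total_rentals columns.
temp = np.array([0.2, 0.3, 0.4, 0.5, 0.6, 0.7])
rentals = np.array([1500.0, 2400.0, 3100.0, 4300.0, 4800.0, 6100.0])

# Fit a straight line (degree-1 polynomial) by least squares.
slope, intercept = np.polyfit(temp, rentals, deg=1)
fitted = slope * temp + intercept

# Residuals: what the line fails to explain at each observation.
residuals = rentals - fitted

for t, r in zip(temp, residuals):
    print(f"temp={t:.1f}  residual={r:+.1f}")
```

If the model is a good fit, the residuals should scatter around zero with no visible pattern; a curve in the residuals suggests the relationship is not linear.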

SLIDE 19

Polynomial regression

Seaborn supports polynomial regression using the order parameter

sns.regplot(data=df, x='temp', y='total_rentals', order=2)

SLIDE 20

residplot with polynomial regression

sns.residplot(data=df, x='temp', y='total_rentals', order=2)
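To see why a second-order fit can leave smaller residuals, the sketch below fits both a line and a quadratic to made-up curved data and compares the residual sum of squares. The data and the helper name `residual_sum_of_squares` are illustrative assumptions, not part of seaborn.

```python
import numpy as np

# Made-up data that follows a curve, plus some noise.
rng = np.random.default_rng(1)
x = np.linspace(0, 1, 50)
y = 1000 + 6000 * x - 4000 * x**2 + rng.normal(0, 50, x.size)


def residual_sum_of_squares(order):
    """Fit a polynomial of the given order and return its squared-error total."""
    coeffs = np.polyfit(x, y, deg=order)
    residuals = y - np.polyval(coeffs, x)
    return float(np.sum(residuals**2))


rss_linear = residual_sum_of_squares(1)
rss_quadratic = residual_sum_of_squares(2)

print(f"order=1: {rss_linear:.0f}")
print(f"order=2: {rss_quadratic:.0f}")
```

On data like this the order-2 total is much smaller, which is the numeric counterpart of the flatter residual plot.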

SLIDE 21

Categorical values

sns.regplot(data=df, x='mnth', y='total_rentals', x_jitter=.1, order=2)

SLIDE 22

Estimators

In some cases, an x_estimator can be useful for highlighting trends

sns.regplot(data=df, x='mnth', y='total_rentals', x_estimator=np.mean, order=2)
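With `x_estimator`, the function is applied to the y values at each unique x, and one estimate per x is drawn in place of the raw points. A self-contained sketch follows; the `toy` dataframe only borrows the column names from the slides, and its values are made up.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up monthly data: 20 observations for each of 12 months.
rng = np.random.default_rng(2)
month = np.repeat(np.arange(1, 13), 20)
rentals = 3000 + 900 * month - 65 * month**2 + rng.normal(0, 400, month.size)
toy = pd.DataFrame({"mnth": month, "total_rentals": rentals})

# The per-month means that x_estimator=np.mean will plot as points.
monthly_mean = toy.groupby("mnth")["total_rentals"].mean()

# One point (with an error bar) per month, plus a second-order fit.
ax = sns.regplot(data=toy, x="mnth", y="total_rentals",
                 x_estimator=np.mean, order=2)
```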

SLIDE 23

Binning the data

x_bins can be used to divide the data into discrete bins

The regression line is still fit against all the data

sns.regplot(data=df, x='temp', y='total_rentals', x_bins=4)
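A runnable version of the binned plot, assuming made-up data in place of the bicycle dataset (only the column names are taken from the slides):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up continuous data with a roughly linear trend.
rng = np.random.default_rng(4)
temp = rng.uniform(0, 1, 200)
rentals = 1500 + 6000 * temp + rng.normal(0, 800, temp.size)
toy = pd.DataFrame({"temp": temp, "total_rentals": rentals})

# x_bins only changes how the scatter is drawn (one estimate per bin);
# the regression itself still uses all 200 rows.
ax = sns.regplot(data=toy, x="temp", y="total_rentals", x_bins=4)
```

Binning is mainly a readability aid for dense scatter plots; it should not change the slope of the fitted line.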

SLIDE 24

Let's practice!

SLIDE 25

Matrix Plots

DATA VISUALIZATION WITH SEABORN

Chris Moffitt

Instructor

SLIDE 26

Getting data in the right format

Seaborn's heatmap() function requires data to be in a grid format
pandas crosstab() is frequently used to manipulate the data

pd.crosstab(df["mnth"], df["weekday"], values=df["total_rentals"], aggfunc='mean').round(0)
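A tiny worked example makes the reshaping concrete. The six rows below are made up; only the column names follow the slides.

```python
import pandas as pd

# Made-up long-format data: one row per observation.
toy = pd.DataFrame({
    "mnth": [1, 1, 1, 2, 2, 2],
    "weekday": [0, 0, 1, 0, 1, 1],
    "total_rentals": [100, 200, 300, 400, 500, 700],
})

# Rows become months, columns become weekdays, cells hold the mean rentals.
grid = pd.crosstab(toy["mnth"], toy["weekday"],
                   values=toy["total_rentals"], aggfunc="mean").round(0)

print(grid)
# weekday      0      1
# mnth
# 1        150.0  300.0
# 2        400.0  600.0
```

Each cell is the average of the rows that share that month and weekday, for example (100 + 200) / 2 = 150 for month 1, weekday 0.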

SLIDE 27

Build a heatmap

sns.heatmap(pd.crosstab(df["mnth"], df["weekday"], values=df["total_rentals"], aggfunc='mean'))

SLIDE 28

Customize a heatmap

sns.heatmap(df_crosstab, annot=True, fmt="d", cmap="YlGnBu", cbar=False, linewidths=.5)

SLIDE 29

Centering a heatmap

Seaborn supports centering the heatmap colors on a specific value

sns.heatmap(df_crosstab, annot=True, fmt="d", cmap="YlGnBu", cbar=True, center=df_crosstab.loc[9, 6])
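The slides do not show how `df_crosstab` is built, so the sketch below substitutes a made-up 12 x 7 grid of integers. One assumption worth flagging: `fmt="d"` only formats integers, so a crosstab of means (floats) would need `.astype(int)` first, or a float format such as `fmt=".0f"`.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up stand-in for df_crosstab: months 1-12 by weekdays 0-6, integer cells.
rng = np.random.default_rng(3)
grid = pd.DataFrame(
    rng.integers(1000, 8000, size=(12, 7)),
    index=pd.RangeIndex(1, 13, name="mnth"),
    columns=pd.RangeIndex(0, 7, name="weekday"),
)

# Center the colormap on one reference cell (month 9, weekday 6).
reference = grid.loc[9, 6]
ax = sns.heatmap(grid, annot=True, fmt="d", cmap="YlGnBu",
                 cbar=True, center=reference)
```

Cells close to the reference value sit near the middle of the colormap, so departures above and below it become easier to spot.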

SLIDE 30

Plotting a correlation matrix

The pandas corr() function calculates correlations between columns in a dataframe
The output can be converted to a heatmap with seaborn

sns.heatmap(df.corr())
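A small sketch of the correlation step on made-up data. Selecting numeric columns first is a precaution added here, since recent pandas versions raise an error when `corr()` meets text columns; the `toy` dataframe and its values are illustrative only.

```python
import pandas as pd

# Made-up data with one non-numeric column.
toy = pd.DataFrame({
    "temp": [1.0, 2.0, 3.0, 4.0],
    "hum": [4.0, 3.0, 2.0, 1.0],
    "total_rentals": [10.0, 20.0, 30.0, 40.0],
    "season": ["winter", "spring", "summer", "fall"],
})

# Keep numeric columns only, then compute pairwise Pearson correlations.
corr = toy.select_dtypes("number").corr()

print(corr.round(2))
```

The result is a square grid with 1.0 on the diagonal, which is already in the format heatmap() expects. Passing `vmin=-1, vmax=1` to `sns.heatmap` keeps the color scale symmetric around zero.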

SLIDE 31

Let's practice!