Data Formats | Omayma Said, Data Scientist | DataCamp: Interactive Data Visualization with rbokeh



SLIDE 1

DataCamp Interactive Data Visualization with rbokeh

Data Formats

INTERACTIVE DATA VISUALIZATION WITH RBOKEH

Omayma Said

Data Scientist

SLIDE 2

hdi_cpi_2015 Data

SLIDE 3

Data Format (hdi_cpi_wide)

> str(hdi_cpi_wide)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 121 obs. of 8 variables:
 $ country                    : chr "Afghanistan" "Albania" "Algeria" "Angola"
 $ year                       : int 2015 2015 2015 2015 2015 2015 2015 2015 201
 $ human_development_index    : num 0.479 0.764 0.745 0.533 0.827 0.939 0.893 0
 $ country_code               : chr "AFG" "ALB" "DZA" "AGO" ...
 $ cpi_rank                   : int 166 88 88 163 106 13 16 50 139 15 ...
 $ region                     : chr "AP" "ECA" "MENA" "SSA" ...
 $ corruption_perception_index: int 11 36 36 15 32 79 76 51 25 77 ...
 $ continent

SLIDE 4

Data Format (hdi_cpi_long)

> str(hdi_cpi_long)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 242 obs. of 7 variables:
 $ country     : chr "Afghanistan" "Afghanistan" "Albania" "Albania" ...
 $ country_code: chr "AFG" "AFG" "ALB" "ALB" ...
 $ cpi_rank    : int 166 166 88 88 88 88 163 163 106 106 ...
 $ region      : chr "AP" "AP" "ECA" "ECA" ...
 $ continent   : chr "Asia" "Asia" "Europe" "Europe" ...
 $ index       : chr "human_development_index" "corruption_perception_index" "h
 $ value       : num 0.479 11 0.764 36 0.745 36 0.533 15 0.827 32 ...

SLIDE 5

gather() and spread()

SLIDE 6

Long to Wide

hdi_cpi_wide <- hdi_cpi_long %>%
  spread(key = index, value = value)

str(hdi_cpi_wide)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 121 obs. of 8 variables:
 $ country                    : chr "Afghanistan" "Albania" "Algeria" "Angola"
 $ year                       : int 2015 2015 2015 2015 2015 2015 2015 2015 201
 $ human_development_index    : num 0.479 0.764 0.745 0.533 0.827 0.939 0.893 0
 $ country_code               : chr "AFG" "ALB" "DZA" "AGO" ...
 $ cpi_rank                   : int 166 88 88 163 106 13 16 50 139 15 ...
 $ region                     : chr "AP" "ECA" "MENA" "SSA" ...
 $ corruption_perception_index: int 11 36 36 15 32 79 76 51 25 77 ...
 $ continent                  : chr "Asia" "Europe" "Africa" "Africa" ...
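To make the reshaping step easier to follow, here is a minimal sketch of spread() on a four-row toy table. The name toy_long is hypothetical, and the table keeps only three columns; the values are the first two countries from the str() output above. It assumes the tidyr package is installed.

```r
library(tidyr)

# Hypothetical toy table in long format: one row per country/index pair
toy_long <- data.frame(
  country = c("Afghanistan", "Afghanistan", "Albania", "Albania"),
  index   = c("human_development_index", "corruption_perception_index",
              "human_development_index", "corruption_perception_index"),
  value   = c(0.479, 11, 0.764, 36),
  stringsAsFactors = FALSE
)

# spread() turns each distinct value of `index` into its own column,
# so the four long rows collapse into one row per country
toy_wide <- spread(toy_long, key = index, value = value)
print(toy_wide)
```

Each country ends up on a single row, which is why the wide table in the slide has half as many observations (121) as the long one (242).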

SLIDE 7

Data Format (hdi_data_wide)

> hdi_data_wide
# A tibble: 188 x 27
   country       `1990` `1991` `1992` `1993` `1994` `1995` `1996` `1997` `1998`
   <chr>          <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
 1 Afghanistan    0.295  0.3    0.309  0.305  0.3    0.324  0.328  0.332  0.335
 2 Albania        0.635  0.618  0.603  0.608  0.616  0.628  0.637  0.636  0.646
 3 Algeria        0.577  0.581  0.587  0.591  0.595  0.6    0.609  0.617  0.627
 4 Andorra       NA     NA     NA     NA     NA     NA     NA     NA     NA
 5 Angola        NA     NA     NA     NA     NA     NA     NA     NA     NA
 6 Antigua and ~ NA     NA     NA     NA     NA     NA     NA     NA     NA
 7 Argentina      0.705  0.713  0.72   0.725  0.728  0.731  0.738  0.746  0.753
 8 Armenia        0.634  0.628  0.595  0.593  0.597  0.603  0.609  0.618  0.632
 9 Australia      0.866  0.867  0.871  0.874  0.876  0.885  0.888  0.891  0.894
10 Austria        0.794  0.798  0.804  0.806  0.812  0.816  0.819  0.823  0.833
# ... with 178 more rows, and 14 more variables: `2002` <dbl>, `2003` <dbl>, `20
#   `2005` <dbl>, `2006` <dbl>, `2007` <dbl>, `2008` <dbl>, `2009` <dbl>, `2010`
#   `2011` <dbl>, `2012` <dbl>, `2013` <dbl>, `2014` <dbl>, `2015` <dbl>

SLIDE 8

Wide to Long

## convert = TRUE parses the former column names ("1990", ...) as integers,
## which matches the <int> type of `year` in the output below
hdi_data_long <- hdi_data_wide %>%
  gather(key = year, value = human_development_index, -country, convert = TRUE)

> hdi_data_long
# A tibble: 4,888 x 3
   country              year human_development_index
   <chr>               <int>                   <dbl>
 1 Afghanistan          1990                   0.295
 2 Albania              1990                   0.635
 3 Algeria              1990                   0.577
 4 Andorra              1990                  NA
 5 Angola               1990                  NA
 6 Antigua and Barbuda  1990                  NA
 7 Argentina            1990                   0.705
 8 Armenia              1990                   0.634
 9 Australia            1990                   0.866
10 Austria              1990                   0.794
# ... with 4,878 more rows
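The type of the year column depends on how gather() treats the former column names. As a small sketch, the snippet below reshapes a two-country, two-year excerpt of the wide table shown earlier, once with the default settings and once with convert = TRUE. The names toy_wide, long_chr and long_int are hypothetical; it assumes tidyr is installed.

```r
library(tidyr)

# Hypothetical two-row excerpt of hdi_data_wide (values taken from the printout)
toy_wide <- data.frame(
  country = c("Afghanistan", "Albania"),
  `1990`  = c(0.295, 0.635),
  `1991`  = c(0.300, 0.618),
  check.names = FALSE,        # keep "1990"/"1991" as column names
  stringsAsFactors = FALSE
)

# Default: the gathered column names stay character strings
long_chr <- gather(toy_wide, key = year, value = human_development_index, -country)

# convert = TRUE asks tidyr to run type conversion on the key column
long_int <- gather(toy_wide, key = year, value = human_development_index, -country,
                   convert = TRUE)

class(long_chr$year)  # "character"
class(long_int$year)  # "integer"
```

A numeric year is usually what you want before plotting a time series, since a character axis would be treated as categorical.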

SLIDE 9

Let's practice!

INTERACTIVE DATA VISUALIZATION WITH RBOKEH

SLIDE 10

More rbokeh Layers

INTERACTIVE DATA VISUALIZATION WITH RBOKEH

Omayma Said

Data Scientist

SLIDE 11

Scatter Plot + Regression Line

SLIDE 12

Scatter Plot + Regression Line

First: create scatter plot

## filter data
dat_90_13 <- bechdel %>%
  filter(between(year, 1990, 2013))

## create scatter plot
p_scatter <- figure() %>%
  ly_points(x = log(budget_2013), y = log(intgross_2013),
            data = dat_90_13, size = 5, alpha = 0.4)

SLIDE 13

Scatter Plot + Regression Line

Second: fit linear regression model

## fit linear regression model
lin_reg <- lm(log(intgross_2013) ~ log(budget_2013), data = dat_90_13)

> summary(lin_reg)

Call:
lm(formula = log(intgross_2013) ~ log(budget_2013), data = dat_90_13)

Residuals:
    Min      1Q  Median      3Q     Max
-9.9518 -0.5414  0.1304  0.7083  4.8586

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)       2.43003    0.38987   6.233 5.84e-10 ***
log(budget_2013)  0.90739    0.02253  40.269  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.258 on 1605 degrees of freedom
  (8 observations deleted due to missingness)
Multiple R-squared:  0.5026, Adjusted R-squared:  0.5023
F-statistic:  1622 on 1 and 1605 DF,  p-value: < 2.2e-16

SLIDE 14

Scatter Plot + Regression Line

## add regression line
p_scatter %>%
  ly_abline(lin_reg)
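The regression line is determined by the model's intercept and slope. As a rough sketch of how to inspect those two numbers in base R, the snippet below fits the same kind of log-log model on the built-in cars dataset. This is a stand-in chosen for illustration, not the bechdel data, and the names fit, intercept, slope and x0 are hypothetical.

```r
# Stand-in data: the built-in `cars` dataset (speed, dist), not the bechdel data
fit <- lm(log(dist) ~ log(speed), data = cars)

# coef() returns the intercept and slope that define the fitted line
intercept <- coef(fit)[["(Intercept)"]]
slope     <- coef(fit)[["log(speed)"]]

# Evaluating the line by hand at one x value agrees with predict()
x0     <- 10
manual <- intercept + slope * log(x0)
auto   <- predict(fit, newdata = data.frame(speed = x0))
all.equal(manual, unname(auto))
```

Because both axes of the scatter plot are on the log scale, the slope reads as an elasticity: in the slide's model, a 1% higher budget goes with roughly a 0.9% higher international gross.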

SLIDE 15

Now it is your turn!

INTERACTIVE DATA VISUALIZATION WITH RBOKEH

SLIDE 16

Interaction Tools

INTERACTIVE DATA VISUALIZATION WITH RBOKEH

Omayma Said

Data Scientist

SLIDE 17

Interaction Tools

SLIDE 18

Interaction Tools

SLIDE 19

Interaction Tools (Default)

figure(tools = c("pan", "wheel_zoom", "box_zoom", "reset", "save", "help"), toolbar_location = "right")

SLIDE 20

Interaction Tools (All)

tools

"pan", "wheel_zoom", "box_zoom", "resize", "crosshair", "box_select", "lasso_select", "reset", "save", "help"

toolbar_location

'above', 'below', 'left', 'right', NULL

SLIDE 21

Interaction Tools (Custom)

figure(tools = c("pan", "wheel_zoom", "box_zoom"),
       toolbar_location = "above",
       legend_location = "bottom_right",
       ylim = c(0, 100)) %>%
  ly_points(x = gdpPercap, y = lifeExp, data = gapminder_2002,
            color = continent, size = 6, alpha = 0.7)

SLIDE 22

Interaction Tools (Custom)

SLIDE 23

Saving rbokeh Figures

png html

plot_scatter <- figure(title = "Life Expectancy Vs. GDP per Capita in 2002",
                       legend_location = "bottom_right") %>%
  ly_points(x = gdpPercap, y = lifeExp, data = gapminder_2002)

## save figure as png
widget2png(p = plot_scatter, file = "plot_scatter.png")

## save figure as html
rbokeh2html(fig = plot_scatter, file = "plot_scatter_interactive.html")

## open saved html
browseURL("plot_scatter_interactive.html")

SLIDE 24

Time to Practice

INTERACTIVE DATA VISUALIZATION WITH RBOKEH