
Data Formats. Omayma Said, Data Scientist. DataCamp: Interactive Data Visualization with rbokeh



  1. DataCamp: Interactive Data Visualization with rbokeh. Data Formats. Omayma Said, Data Scientist

  2. hdi_cpi_2015 Data

  3. Data Format (hdi_cpi_wide)

         > str(hdi_cpi_wide)
         Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 121 obs. of 8 variables:
          $ country                    : chr "Afghanistan" "Albania" "Algeria" "Angola"
          $ year                       : int 2015 2015 2015 2015 2015 2015 2015 2015 201
          $ human_development_index    : num 0.479 0.764 0.745 0.533 0.827 0.939 0.893 0
          $ country_code               : chr "AFG" "ALB" "DZA" "AGO" ...
          $ cpi_rank                   : int 166 88 88 163 106 13 16 50 139 15 ...
          $ region                     : chr "AP" "ECA" "MENA" "SSA" ...
          $ corruption_perception_index: int 11 36 36 15 32 79 76 51 25 77 ...
          $ continent

  4. Data Format (hdi_cpi_long)

         > str(hdi_cpi_long)
         Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 242 obs. of 7 variables:
          $ country     : chr "Afghanistan" "Afghanistan" "Albania" "Albania" ...
          $ country_code: chr "AFG" "AFG" "ALB" "ALB" ...
          $ cpi_rank    : int 166 166 88 88 88 88 163 163 106 106 ...
          $ region      : chr "AP" "AP" "ECA" "ECA" ...
          $ continent   : chr "Asia" "Asia" "Europe" "Europe" ...
          $ index       : chr "human_development_index" "corruption_perception_index" "h
          $ value       : num 0.479 11 0.764 36 0.745 36 0.533 15 0.827 32 ...

  5. gather() and spread()

  6. Long to Wide

         library(tidyr)  # provides spread() and re-exports %>%

         ## one column per distinct value of `index`
         hdi_cpi_wide <- hdi_cpi_long %>%
           spread(key = index, value = value)

         str(hdi_cpi_wide)
         Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 121 obs. of 8 variables:
          $ country                    : chr "Afghanistan" "Albania" "Algeria" "Angola"
          $ year                       : int 2015 2015 2015 2015 2015 2015 2015 2015 201
          $ human_development_index    : num 0.479 0.764 0.745 0.533 0.827 0.939 0.893 0
          $ country_code               : chr "AFG" "ALB" "DZA" "AGO" ...
          $ cpi_rank                   : int 166 88 88 163 106 13 16 50 139 15 ...
          $ region                     : chr "AP" "ECA" "MENA" "SSA" ...
          $ corruption_perception_index: int 11 36 36 15 32 79 76 51 25 77 ...
          $ continent                  : chr "Asia" "Europe" "Africa" "Africa" ...
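
The long-to-wide step can be tried on a tiny table before running it on the full data. The sketch below is not from the slides: `toy_long` and its values are hypothetical, and the short index names `hdi` and `cpi` are stand-ins for the real column values. It also shows `pivot_wider()`, which newer tidyr releases (1.0.0 and later) offer as the successor to `spread()`.

```r
library(tidyr)

# Hypothetical long-format data: one row per country/index pair
toy_long <- data.frame(
  country = c("A", "A", "B", "B"),
  index   = c("hdi", "cpi", "hdi", "cpi"),
  value   = c(0.48, 11, 0.76, 36),
  stringsAsFactors = FALSE
)

# spread(): each distinct value of `index` becomes its own column
toy_wide <- spread(toy_long, key = index, value = value)

# pivot_wider(): the same reshaping with the newer tidyr interface
toy_wide2 <- pivot_wider(toy_long, names_from = index, values_from = value)
```

Both results have one row per country, with `hdi` and `cpi` as separate columns; the two functions may order those columns differently.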

  7. Data Format (hdi_data_wide)

         > hdi_data_wide
         # A tibble: 188 x 27
            country       `1990` `1991` `1992` `1993` `1994` `1995` `1996` `1997` `1998`
            <chr>          <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
          1 Afghanistan    0.295  0.3    0.309  0.305  0.3    0.324  0.328  0.332  0.335
          2 Albania        0.635  0.618  0.603  0.608  0.616  0.628  0.637  0.636  0.646
          3 Algeria        0.577  0.581  0.587  0.591  0.595  0.6    0.609  0.617  0.627
          4 Andorra        NA     NA     NA     NA     NA     NA     NA     NA     NA
          5 Angola         NA     NA     NA     NA     NA     NA     NA     NA     NA
          6 Antigua and ~  NA     NA     NA     NA     NA     NA     NA     NA     NA
          7 Argentina      0.705  0.713  0.72   0.725  0.728  0.731  0.738  0.746  0.753
          8 Armenia        0.634  0.628  0.595  0.593  0.597  0.603  0.609  0.618  0.632
          9 Australia      0.866  0.867  0.871  0.874  0.876  0.885  0.888  0.891  0.894
         10 Austria        0.794  0.798  0.804  0.806  0.812  0.816  0.819  0.823  0.833
         # ... with 178 more rows, and 14 more variables: `2002` <dbl>, `2003` <dbl>, `20
         #   `2005` <dbl>, `2006` <dbl>, `2007` <dbl>, `2008` <dbl>, `2009` <dbl>, `2010`
         #   `2011` <dbl>, `2012` <dbl>, `2013` <dbl>, `2014` <dbl>, `2015` <dbl>

  8. Wide to Long

         ## gather every column except country into year/value pairs;
         ## convert = TRUE parses the year column names as integers,
         ## matching the <int> column in the output below
         hdi_data_long <- hdi_data_wide %>%
           gather(key = year, value = human_development_index, -country,
                  convert = TRUE)

         > hdi_data_long
         # A tibble: 4,888 x 3
            country              year human_development_index
            <chr>               <int>                   <dbl>
          1 Afghanistan          1990                   0.295
          2 Albania              1990                   0.635
          3 Algeria              1990                   0.577
          4 Andorra              1990                   NA
          5 Angola               1990                   NA
          6 Antigua and Barbuda  1990                   NA
          7 Argentina            1990                   0.705
          8 Armenia              1990                   0.634
          9 Australia            1990                   0.866
         10 Austria              1990                   0.794
         # ... with 4,878 more rows
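
The wide-to-long step, including the type of the resulting `year` column, can be checked on a small table. This is a sketch with hypothetical data (`toy_wide` and the shortened value name `hdi` are not from the slides). By default `gather()` leaves the former column names as character strings; `convert = TRUE` asks it to convert them, which yields integers here. The `pivot_longer()` call shows the newer tidyr interface (the `names_transform` argument assumes tidyr 1.1.0 or later).

```r
library(tidyr)

# Hypothetical wide-format data: one column per year
toy_wide <- data.frame(
  country = c("A", "B"),
  `1990`  = c(0.30, 0.64),
  `1991`  = c(0.31, NA),
  check.names = FALSE,       # keep "1990"/"1991" as column names
  stringsAsFactors = FALSE
)

# gather(): stack all columns except country into year/hdi pairs
toy_long <- gather(toy_wide, key = year, value = hdi, -country,
                   convert = TRUE)

# pivot_longer(): the same reshaping with the newer tidyr interface
toy_long2 <- pivot_longer(toy_wide, cols = -country,
                          names_to = "year", values_to = "hdi",
                          names_transform = list(year = as.integer))
```

Each result has 2 countries x 2 years = 4 rows, the same arithmetic that turns 188 countries and 26 year columns into the 4,888 rows shown on the slide.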

  9. Let's practice!

  10. More rbokeh Layers. Omayma Said, Data Scientist

  11. Scatter Plot + Regression Line

  12. Scatter Plot + Regression Line

      First: create scatter plot

         library(dplyr)   # filter(), between(), %>%
         library(rbokeh)  # figure(), ly_points()

         ## filter data (between() includes both endpoints)
         dat_90_13 <- bechdel %>%
           filter(between(year, 1990, 2013))

         ## create scatter plot
         p_scatter <- figure() %>%
           ly_points(x = log(budget_2013), y = log(intgross_2013),
                     data = dat_90_13, size = 5, alpha = 0.4)

  13. Scatter Plot + Regression Line

      Second: fit linear regression model

         ## fit linear regression model
         lin_reg <- lm(log(intgross_2013) ~ log(budget_2013), data = dat_90_13)

         > summary(lin_reg)
         Call:
         lm(formula = log(intgross_2013) ~ log(budget_2013), data = dat_90_13)

         Residuals:
             Min      1Q  Median      3Q     Max
         -9.9518 -0.5414  0.1304  0.7083  4.8586

         Coefficients:
                          Estimate Std. Error t value Pr(>|t|)
         (Intercept)       2.43003    0.38987   6.233 5.84e-10 ***
         log(budget_2013)  0.90739    0.02253  40.269  < 2e-16 ***
         Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

         Residual standard error: 1.258 on 1605 degrees of freedom
           (8 observations deleted due to missingness)
         Multiple R-squared:  0.5026, Adjusted R-squared:  0.5023
         F-statistic:  1622 on 1 and 1605 DF,  p-value: < 2.2e-16

  14. Scatter Plot + Regression Line

         ## add regression line
         p_scatter %>%
           ly_abline(lin_reg)
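
The line added in this step is defined by the intercept and slope of the fitted model, which `coef()` returns. The base-R sketch below uses noise-free, hypothetical data (`budget` and `gross` are made up, not the bechdel dataset) so the recovered coefficients are known in advance. That `ly_abline()` reads the same two numbers from the model object is an assumption about its internals, not something the slides state.

```r
# Hypothetical data following an exact power law: gross = e^2 * budget^0.9
budget <- c(1e6, 5e6, 2e7, 1e8)
gross  <- exp(2) * budget^0.9

# Fit on the log-log scale, as in the slides
fit <- lm(log(gross) ~ log(budget))

# Intercept and slope of the regression line
line_coefs <- coef(fit)
intercept  <- unname(line_coefs[1])
slope      <- unname(line_coefs[2])
```

Because both axes are logged, the slope reads as an elasticity: the estimate of about 0.907 in the summary above means a 1% higher budget is associated with roughly a 0.91% higher international gross.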

  15. Now it is your turn!

  16. Interaction Tools. Omayma Said, Data Scientist

  17. Interaction Tools

  18. Interaction Tools

  19. Interaction Tools (Default)

         figure(tools = c("pan", "wheel_zoom", "box_zoom", "reset", "save", "help"),
                toolbar_location = "right")

  20. Interaction Tools (All)

      tools: "pan", "wheel_zoom", "box_zoom", "resize", "crosshair", "box_select", "lasso_select", "reset", "save", "help"

      toolbar_location: 'above', 'below', 'left', 'right', NULL

  21. Interaction Tools (Custom)

         figure(tools = c("pan", "wheel_zoom", "box_zoom"),
                toolbar_location = "above",
                legend_location = "bottom_right",
                ylim = c(0, 100)) %>%
           ly_points(x = gdpPercap, y = lifeExp, data = gapminder_2002,
                     color = continent, size = 6, alpha = 0.7)

  22. Interaction Tools (Custom)

  23. Saving rbokeh Figures

         plot_scatter <- figure(title = "Life Expectancy Vs. GDP per Capita in 2002",
                                legend_location = "bottom_right") %>%
           ly_points(x = gdpPercap, y = lifeExp, data = gapminder_2002)

      png

         ## save figure as png
         widget2png(p = plot_scatter, file = "plot_scatter.png")

      html

         ## save figure as html
         rbokeh2html(fig = plot_scatter, file = "plot_scatter_interactive.html")

         ## open saved html
         browseURL("plot_scatter_interactive.html")

  24. Time to Practice
