Plot and Mapped Attributes (Part 1) Omayma Said Data Scientist - - PowerPoint PPT Presentation

SLIDE 1

DataCamp Interactive Data Visualization with rbokeh

Plot and Mapped Attributes (Part 1)

INTERACTIVE DATA VISUALIZATION WITH RBOKEH

Omayma Said

Data Scientist

SLIDE 2

Aesthetic attributes

color size transparency line type shape
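Each attribute in this list corresponds to an argument of an rbokeh layer function: `color`, `size`, `alpha` (transparency) and `glyph` (shape) in `ly_points()`, and `type` (line type) and `width` in `ly_lines()`. The sketch below illustrates all five using R's built-in `iris` and `BOD` data sets rather than the course data. The fixed values (size 8, alpha 0.6, line type 2, width 3) are arbitrary choices, and the line-type code assumes rbokeh follows base R's `lty` numbering (2 = dashed).

```r
library(rbokeh)
library(magrittr)  # provides the %>% pipe

# Shape (glyph) and color mapped to a variable;
# size and transparency (alpha) set to fixed values
p_points <- figure() %>%
  ly_points(x = Sepal.Length, y = Sepal.Width, data = iris,
            color = Species, glyph = Species,
            size = 8, alpha = 0.6)

# Line type and line width set to fixed values
# (type = 2 assumed to mean dashed, as with base R's lty)
p_lines <- figure() %>%
  ly_lines(x = Time, y = demand, data = BOD,
           color = "steelblue", type = 2, width = 3)

# Print p_points or p_lines at the console to render them
```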

SLIDE 3

Color Attributes (Default Color)

library(rbokeh)
library(dplyr)  # provides %>% and filter()

figure(title = "Life Expectancy Vs. GDP per Capita in 1992") %>%
  ly_points(x = gdpPercap, y = lifeExp, data = dat_1992)
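The slides use `dat_1992` without showing where it comes from. Its columns (`gdpPercap`, `lifeExp`, `continent`) match the `gapminder` package, so a plausible reconstruction, assuming that is the source, keeps only the 1992 rows:

```r
library(gapminder)  # assumed source of dat_1992
library(dplyr)

# One row per country for the year 1992
dat_1992 <- gapminder %>%
  filter(year == 1992)
```

With `library(rbokeh)` also loaded, the `figure()` calls on these slides should then run as written.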

SLIDE 4

Color Attributes (Mapping Variables)

figure(legend_location = "bottom_right", title = "Life Expectancy Vs. GDP per Capita in 1992") %>%
  ly_points(x = gdpPercap, y = lifeExp, data = dat_1992, color = continent)

SLIDE 5

HDI Dataset

Human Development Index (HDI) data by UNDP

> hdi_data
   country              year human_development_index
   <chr>               <int>                   <dbl>
 1 Afghanistan          1990                   0.295
 2 Albania              1990                   0.635
 3 Algeria              1990                   0.577
 4 Andorra              1990                  NA
 5 Angola               1990                  NA
 6 Antigua and Barbuda  1990                  NA
 7 Argentina            1990                   0.705
 8 Armenia              1990                   0.634
 9 Australia            1990                   0.866
10 Austria              1990                   0.794
# ... with 4,878 more rows

SLIDE 6

HDI Dataset

Human Development Index (HDI) data by UNDP

The HDI is a summary measure of average achievement in three dimensions of human development:
- a long and healthy life
- being knowledgeable
- having a decent standard of living

SLIDE 7

Color Attributes (Line and Fill)

## keep three countries
hdi_countries <- hdi_data %>%
  filter(country %in% c("Hungary", "Bulgaria", "Poland"))

## plot human_development_index versus year
fig_col <- figure(data = hdi_countries, legend_location = "bottom_right") %>%
  ly_lines(x = year, y = human_development_index, color = country) %>%
  ly_points(x = year, y = human_development_index, color = country)

## View plot
fig_col

SLIDE 8

Color Attributes (Line and Fill)

SLIDE 9

Color Attributes (Line and Fill)

ly_points(): line_color, fill_color

ly_lines(): line_color

SLIDE 10

Color Attributes (Line and Fill)

When only color is given to ly_points():
- line_color = color
- fill_color = color (with the alpha level of the fill reduced by 50%)

figure(legend_location = "bottom_right") %>%
  ly_points(x = year, y = human_development_index, data = hdi_countries, color = country)
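These defaults can be overridden by setting the outline and the fill separately. In the sketch below, life expectancy from the `gapminder` package stands in for the course's HDI data, and the colors and alpha value are arbitrary choices:

```r
library(rbokeh)
library(gapminder)
library(dplyr)

# Stand-in for the HDI series: life expectancy in Hungary over time
hungary <- gapminder %>%
  filter(country == "Hungary")

# Black outline, fully opaque orange fill
p_fill <- figure() %>%
  ly_points(x = year, y = lifeExp, data = hungary,
            line_color = "black", fill_color = "orange", fill_alpha = 1)
```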

SLIDE 11

Color Attributes (Line and Fill)

## plot human_development_index versus year
fig_col <- figure(data = hdi_countries, legend_location = "bottom_right") %>%
  ly_points(x = year, y = human_development_index, fill_color = country, fill_alpha = 1) %>%
  ly_lines(x = year, y = human_development_index, color = country)

## view plot
fig_col

SLIDE 12

Color Attributes (Line and Fill)

SLIDE 13

Color Palettes

## plot human_development_index versus year and map color to country
fig_col <- figure(data = hdi_countries, legend_location = "bottom_right") %>%
  ly_points(x = year, y = human_development_index, fill_color = country, fill_alpha = 1) %>%
  ly_lines(x = year, y = human_development_index, color = country)

fig_col %>%
  set_palette(discrete_color = pal_color(c("#3182bd", "#31a354", "#de2d26")))

SLIDE 14

Color Palettes

fig_col %>% set_palette(discrete_color = pal_color(c("#3182bd", "#31a354", "#de2d26")))
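Rather than typing hex codes by hand, the palette vector can also be generated programmatically and then passed to `pal_color()`. The sketch below uses base R's `grDevices::hcl.colors()` (available since R 3.6.0); the "Dark 3" palette is an arbitrary choice, and life expectancy from `gapminder` stands in for HDI. Country is converted to character, to be safe, so that only the three selected countries remain as categories.

```r
library(rbokeh)
library(gapminder)
library(dplyr)

# Stand-in for hdi_countries; as.character() drops the unused factor levels
countries <- gapminder %>%
  filter(country %in% c("Hungary", "Bulgaria", "Poland")) %>%
  mutate(country = as.character(country))

# Three qualitative colors as "#RRGGBB" strings
my_colors <- grDevices::hcl.colors(3, palette = "Dark 3")

fig_pal <- figure(data = countries, legend_location = "bottom_right") %>%
  ly_points(x = year, y = lifeExp, fill_color = country, fill_alpha = 1) %>%
  ly_lines(x = year, y = lifeExp, color = country) %>%
  set_palette(discrete_color = pal_color(my_colors))
```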

SLIDE 15

Now it is your turn!

SLIDE 16

Plot and Mapped Attributes (Part 2)

Omayma Said

Data Scientist

SLIDE 17

Bechdel Dataset

The raw data behind the FiveThirtyEight story "The Dollar-And-Cents Case Against Hollywood's Exclusion of Women", available in the fivethirtyeight package

SLIDE 18

Bechdel Dataset

Three criteria to pass the Bechdel test:
- there are at least two named women in the picture
- they have a conversation with each other at some point
- that conversation isn’t about a male character

Data: fivethirtyeight package

SLIDE 19

Bechdel Dataset

> library(fivethirtyeight)
> str(bechdel)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 1794 obs. of 15 variables:
 $ year         : int  2013 2012 2013 2013 2013 2013 2013 2013 2013 2013 ...
 $ imdb         : chr  "tt1711425" "tt1343727" "tt2024544" "tt1272878" ...
 $ title        : chr  "21 & Over" "Dredd 3D" "12 Years a Slave" "2 Guns" ...
 $ test         : chr  "notalk" "ok-disagree" "notalk-disagree" "notalk" ...
 $ clean_test   : Ord.factor w/ 5 levels "nowomen"<"notalk"<..: 2 5 2 2 3 3 2 5
 $ binary       : chr  "FAIL" "PASS" "FAIL" "FAIL" ...
 $ budget       : int  13000000 45000000 20000000 61000000 40000000 225000000 92
 $ domgross     : num  25682380 13414714 53107035 75612460 95020213 ...
 $ intgross     : num  4.22e+07 4.09e+07 1.59e+08 1.32e+08 9.50e+07 ...
 $ code         : chr  "2013FAIL" "2012PASS" "2013FAIL" "2013FAIL" ...
 $ budget_2013  : int  13000000 45658735 20000000 61000000 40000000 225000000 92
 $ domgross_2013: num  25682380 13611086 53107035 75612460 95020213 ...
 $ intgross_2013: num  4.22e+07 4.15e+07 1.59e+08 1.32e+08 9.50e+07 ...
 $ period_code  : int  1 1 1 1 1 1 1 1 1 1 ...
 $ decade_code  : int  1 1 1 1 1 1 1 1 1 1 ...
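The Bechdel plots on the following slides use `dat_90_13`, which is not defined on the slides. Going by its name, it is presumably the films released from 1990 to 2013; a minimal sketch under that assumption:

```r
library(fivethirtyeight)
library(dplyr)

# Assumed definition: films released from 1990 through 2013
dat_90_13 <- bechdel %>%
  filter(between(year, 1990, 2013))
```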

SLIDE 20

Bechdel Dataset (Revenue versus Budget)

figure() %>% ly_points(x = budget_2013, y = intgross_2013, data = dat_90_13)

SLIDE 21

Bechdel Dataset (Variables Distributions)

figure() %>% ly_hist(x = intgross_2013, data = dat_90_13)
figure() %>% ly_hist(x = budget_2013, data = dat_90_13)

SLIDE 22

Bechdel Dataset (Variables Distributions)

figure() %>% ly_hist(x = log(intgross_2013), data = dat_90_13)
figure() %>% ly_hist(x = log(budget_2013), data = dat_90_13)
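Here `log()` is the natural logarithm. A base-10 logarithm can be easier to read on a money axis, because each unit step is a factor of ten (a value of 8 corresponds to $100 million). The sketch below assumes `dat_90_13` is the 1990 to 2013 subset of `bechdel` and sets `breaks` (assumed to be handed on to R's `hist()` as a suggested bin count) to an arbitrary 30:

```r
library(rbokeh)
library(fivethirtyeight)
library(dplyr)

# Assumed definition of dat_90_13 (films from 1990 through 2013)
dat_90_13 <- bechdel %>%
  filter(between(year, 1990, 2013))

# Histogram of log10 budgets in 2013 dollars, roughly 30 bins
h_budget <- figure(xlab = "log10(budget_2013)") %>%
  ly_hist(x = log10(budget_2013), data = dat_90_13, breaks = 30)
```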

SLIDE 23

Overplotting

figure() %>% ly_points(x = log(budget_2013), y = log(intgross_2013), data = dat_90_13)

SLIDE 24

Overplotting (Alpha and Point Size)

figure() %>% ly_points(x = log(budget_2013), y = log(intgross_2013), data = dat_90_13, alpha = 0.4, size = 5)

SLIDE 25

Line Width

## filter hdi data
hdi_countries <- hdi_data %>%
  filter(country %in% c("Rwanda", "Kenya", "Botswana"))

# A tibble: 78 x 3
   country   year human_development_index
   <chr>    <int>                   <dbl>
 1 Botswana  1990                   0.585
 2 Kenya     1990                   0.473
 3 Rwanda    1990                   0.244
 4 Botswana  1991                   0.592
 5 Kenya     1991                   0.471
 6 Rwanda    1991                   0.22
 7 Botswana  1992                   0.59
 8 Kenya     1992                   0.468
 9 Rwanda    1992                   0.206
10 Botswana  1993                   0.587
# ... with 68 more rows

SLIDE 26

Line Width

## plot human_development_index over time
figure(title = "Human Development Index over Time", legend_location = "bottom_right") %>%
  ly_lines(x = year, y = human_development_index, data = hdi_countries, color = country)

SLIDE 27

Line Width

figure(title = "Human Development Index over Time", legend_location = "bottom_right") %>%
  ly_lines(x = year, y = human_development_index, data = hdi_countries, color = country, width = 3)
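Line width pairs naturally with the other line attribute from the opening list, line type, which `ly_lines()` exposes as `type`. The sketch below keeps the same three countries but substitutes life expectancy from the `gapminder` package for HDI. The width of 3 matches the slide, while `type = 2` (assumed to be dashed, following base R's `lty` codes) is an arbitrary addition.

```r
library(rbokeh)
library(gapminder)
library(dplyr)

# Same three countries; as.character() drops the unused factor levels
africa_3 <- gapminder %>%
  filter(country %in% c("Rwanda", "Kenya", "Botswana")) %>%
  mutate(country = as.character(country))

# One thick dashed line per country
p_width <- figure(title = "Life Expectancy over Time",
                  legend_location = "bottom_right") %>%
  ly_lines(x = year, y = lifeExp, data = africa_3,
           color = country, width = 3, type = 2)
```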

SLIDE 28

Let's try some examples!

SLIDE 29

Hover info & Figure Options

Omayma Said

Data Scientist

SLIDE 30

HDI CPI data 2015

Human Development Index and Corruption Perception Index in 2015

> str(hdi_cpi_2015)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 121 obs. of 8 variables:
 $ country                     : chr "Afghanistan" "Albania" "Algeria" "An
 $ year                        : int 2015 2015 2015 2015 2015 2015 2015 20
 $ human_development_index     : num 0.479 0.764 0.745 0.533 0.827 0.939 0
 $ country_code                : chr "AFG" "ALB" "DZA" "AGO" ...
 $ cpi_rank                    : int 166 88 88 163 106 13 16 50 139 15 ...
 $ region                      : chr "AP" "ECA" "MENA" "SSA" ...
 $ corruption_perception_index : int 11 36 36 15 32 79 76 51 25 77 ...
 $ continent                   : chr "Asia" "Europe" "Africa" "Africa" ...

SLIDE 31

HDI CPI data 2015

Human Development Index and Corruption Perception Index in 2015

Notes:
- corruption_perception_index: lower values reflect higher perceived corruption
- human_development_index: lower values reflect lower development

SLIDE 32

Hover Info

figure(legend_location = "bottom_right", title = "CPI versus HDI - 2015") %>%
  ly_points(x = corruption_perception_index, y = human_development_index,
            data = hdi_cpi_2015, color = continent, size = 7,
            hover = c(country, cpi_rank))

SLIDE 33

Hover Info

figure(legend_location = "bottom_right", title = "CPI versus HDI - 2015") %>%
  ly_points(x = corruption_perception_index, y = human_development_index,
            data = hdi_cpi_2015, color = continent, size = 7,
            hover = "CPI Rank: @cpi_rank")

SLIDE 34

Hover Info

figure(legend_location = "bottom_right", title = "CPI versus HDI - 2015") %>%
  ly_points(x = corruption_perception_index, y = human_development_index,
            data = hdi_cpi_2015, color = continent, size = 7,
            hover = "<b>@country</b><br><b>CPI Rank</b>: @cpi_rank")
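The template syntax carries over to any column in the layer's data: each `@column` is replaced by that row's value, and HTML tags such as `<b>` and `<br>` format the tooltip. A sketch applying it to the life expectancy plot from the first part, assuming `dat_1992` is the 1992 subset of the `gapminder` package:

```r
library(rbokeh)
library(gapminder)
library(dplyr)

# Assumed reconstruction of dat_1992
dat_1992 <- gapminder %>%
  filter(year == 1992)

# Tooltip: bold country name, then life expectancy and population
p_hover <- figure(legend_location = "bottom_right",
                  title = "Life Expectancy Vs. GDP per Capita in 1992") %>%
  ly_points(x = gdpPercap, y = lifeExp, data = dat_1992,
            color = continent, size = 7,
            hover = "<b>@country</b><br>Life expectancy: @lifeExp<br>Population: @pop")
```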

SLIDE 35

Figure Options

Options set in figure():
- Figure title (title)
- Figure size (width, height)
- Axes labels (xlab, ylab)
- Axes limits (xlim, ylim)
- Theme (theme)
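As a sketch of how these options combine in a single `figure()` call, assuming `dat_1992` is the 1992 subset of `gapminder`; the size in pixels, the labels and the limits below are arbitrary:

```r
library(rbokeh)
library(gapminder)
library(dplyr)

# Assumed reconstruction of dat_1992
dat_1992 <- gapminder %>%
  filter(year == 1992)

p_opts <- figure(title = "Life Expectancy Vs. GDP per Capita in 1992",
                 width = 600, height = 400,          # figure size in pixels
                 xlab = "GDP per capita", ylab = "Life expectancy",
                 ylim = c(0, 90),                    # fixed y-axis range
                 theme = bk_ggplot_theme()) %>%
  ly_points(x = gdpPercap, y = lifeExp, data = dat_1992, color = continent)
```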

SLIDE 36

Figure Options (Axes Limits)

Default ylim

ylim = c(0, 1)

SLIDE 37

Figure Options (Theme)

hdi_cpi_scatter <- figure(legend_location = "bottom_right", title = "CPI versus HDI - 2015",
                          ylim = c(0, 1), xlab = "CPI", ylab = "HDI",
                          theme = bk_ggplot_theme()) %>%
  ly_points(x = corruption_perception_index, y = human_development_index,
            data = hdi_cpi_2015, color = continent, size = 7)

## view plot
hdi_cpi_scatter

SLIDE 38

Figure Options (Theme)

SLIDE 39

Let's practice!