TrelliscopeJS: Modern Approaches to Data Exploration with Trellis Display




slide-1
SLIDE 1

TrelliscopeJS

Modern Approaches to Data Exploration with Trellis Display

Ryan Hafen

Hafen Consulting, LLC
Purdue University
@hafenstats

http://bit.ly/trelliscopejs1

slide-2
SLIDE 2

All examples in this talk are reproducible after installing and loading the following packages:

install.packages(c("tidyverse", "gapminder", "rbokeh", "visNetwork", "plotly"))
devtools::install_github("hafen/trelliscopejs")

library(tidyverse)
library(gapminder)
library(rbokeh)
library(visNetwork)
library(trelliscopejs)

slide-3
SLIDE 3

TrelliscopeJS is an htmlwidget
TrelliscopeJS is a layout engine for collections of htmlwidgets
TrelliscopeJS is a framework for creating interactive displays of small multiples

slide-4
SLIDE 4

Small Multiples

A series of similar plots, usually each based on a different slice of data, arranged in a grid

"For a wide range of problems in data presentation, small multiples are the best design solution."
Edward Tufte (Envisioning Information)

This idea was formalized and popularized in S/S-PLUS and subsequently R with the trellis and lattice packages

slide-5
SLIDE 5

Advantages of Small Multiple Displays

Avoid overplotting
Work with big or high dimensional data
It is often critical to the discovery of a new insight to be able to see multiple things at once
Our brains are good at perceiving simple visual features like color or shape or size, and they do it amazingly fast without any conscious effort
We can tell immediately when a part of an image is different from the rest, without really having to focus on it

In my experience, small multiples are much more effective than more flashy things like animation, linked brushing, custom interactive vis, etc.

slide-6
SLIDE 6

Trelliscope: Interactive Small Multiple Display

Small multiple displays are useful when visualizing data in detail
But the number of panels in a display can be potentially very large, too large to view all at once
It can also be difficult to specify a meaningful order in which panels are displayed

Trelliscope is a general solution that allows small multiple displays to come alive by providing the ability to interactively sort and filter the panels based on summary statistics, called cognostics, that are automatically computed for each panel
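To make the idea of cognostics concrete, here is a minimal base R sketch. It is not how Trelliscope computes them internally; it only illustrates the concept of one row of summary statistics per panel that can then be sorted and filtered. The built-in ChickWeight data stands in for the talk's data, and the names panels, cogs, sorted and complete are illustrative.

```r
# Illustrative only: the built-in ChickWeight data, one "panel" per chick.
chicks <- as.data.frame(ChickWeight)
panels <- split(chicks, chicks$Chick, drop = TRUE)

# Compute a few summary statistics (cognostics) for each panel.
cogs <- do.call(rbind, lapply(names(panels), function(id) {
  d <- panels[[id]]
  data.frame(
    chick = id,
    n_obs = nrow(d),
    max_weight = max(d$weight),
    mean_weight = mean(d$weight),
    stringsAsFactors = FALSE
  )
}))

# Sorting: decide which panels to look at first (heaviest chicks here).
sorted <- cogs[order(-cogs$max_weight), ]

# Filtering: keep only panels meeting a condition (complete series here).
complete <- cogs[cogs$n_obs == 12, ]

head(sorted)
```

In a Trelliscope display the same sorting and filtering happens interactively in the viewer rather than in code.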

slide-7
SLIDE 7

TrelliscopeJS

JavaScript Library: trelliscopejs-lib
Built using React
Pure JavaScript
Interface agnostic

R Package: trelliscopejs
htmlwidget interface to trelliscopejs-lib
Evolved from CRAN "trelliscope" package (part of the DeltaRho project)

slide-8
SLIDE 8

Gapminder Example

Suppose we want to understand mortality over time for each country

glimpse(gapminder)

Observations: 1,704
Variables: 6
$ country   <fctr> Afghanistan, Afghanistan, Afghanistan, Afghanistan, Afgh...
$ continent <fctr> Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia, A...
$ year      <int> ...
$ lifeExp   <dbl> ...
$ pop       <int> ...
$ gdpPercap <dbl> ...

https://www.gapminder.org/

slide-9
SLIDE 9

qplot(year, lifeExp, data = gapminder, color = country, geom = "line")

Yikes! There are a lot of countries...

slide-10
SLIDE 10

qplot(year, lifeExp, data = gapminder, color = continent, group = country, geom = "line")

I can't see what's going on...

slide-11
SLIDE 11

qplot(year, lifeExp, data = gapminder, color = continent, group = country, geom = "line") +
  facet_wrap(~ continent, nrow = 1)  # nrow value lost in the source; 1 is an assumption

That helped a little...

slide-12
SLIDE 12

p <- qplot(year, lifeExp, data = gapminder, color = continent, group = country, geom = "line") +
  facet_wrap(~ continent, nrow = 1)  # nrow value lost in the source; 1 is an assumption

plotly::ggplotly(p)

This helps but there is still too much overplotting...

(and hovering for additional info is too much work and we can only see more info one at a time)

slide-13
SLIDE 13

qplot(year, lifeExp, data = gapminder) +
  xlim(1948, 2011) + ylim(10, 95) + theme_bw() +
  facet_wrap(~ country + continent)

slide-14
SLIDE 14

From ggplot2 Faceting to Trelliscope

Turning a ggplot2 faceted display into a Trelliscope display is as easy as changing:

facet_wrap()

or:

facet_grid()

to:

facet_trelliscope()

slide-15
SLIDE 15

qplot(year, lifeExp, data = gapminder) +
  xlim(1948, 2011) + ylim(10, 95) + theme_bw() +
  facet_trelliscope(~ country + continent, nrow = 2, ncol = 7, width = 300)


slide-16
SLIDE 16

qplot(year, lifeExp, data = gapminder) +
  xlim(1948, 2011) + ylim(10, 95) + theme_bw() +
  facet_trelliscope(~ country + continent, nrow = 2, ncol = 7, width = 300,
    as_plotly = TRUE)


slide-17
SLIDE 17

Plotting in the Tidyverse

slide-18
SLIDE 18

Gapminder Example from "R for Data Science"

country_model <- function(df)
  lm(lifeExp ~ year, data = df)

by_country <- gapminder %>%
  group_by(country, continent) %>%
  nest() %>%
  mutate(
    model = map(data, country_model),
    resid_mad = map_dbl(model, function(x) mad(resid(x))))

by_country

# A tibble: 142 x 5
       country continent              data    model resid_mad
        <fctr>    <fctr>            <list>   <list>     <dbl>
 1 Afghanistan      Asia <tibble [12 x 4]> <S3: lm>       ...
 2     Albania    Europe <tibble [12 x 4]> <S3: lm>       ...
 3     Algeria    Africa <tibble [12 x 4]> <S3: lm>       ...
 4      Angola    Africa <tibble [12 x 4]> <S3: lm>       ...
 5   Argentina  Americas <tibble [12 x 4]> <S3: lm>       ...
 6   Australia   Oceania <tibble [12 x 4]> <S3: lm>       ...
 7     Austria    Europe <tibble [12 x 4]> <S3: lm>       ...
 8     Bahrain      Asia <tibble [12 x 4]> <S3: lm>       ...
 9  Bangladesh      Asia <tibble [12 x 4]> <S3: lm>       ...
10     Belgium    Europe <tibble [12 x 4]> <S3: lm>       ...
# ... with 132 more rows

One row per group
Per-group data and models as "list-columns"

Example adapted from "R for Data Science"

slide-19
SLIDE 19

Excerpt from "R for Data Science"

Plotting the Fit for Each Country

slide-20
SLIDE 20

Plotting the Data and Model Fit for a Group

We'll use the rbokeh package to make a plot function and apply it to the first row of our data

country_plot <- function(data, model) {
  figure(xlim = c(1948, 2011), ylim = c(10, 95), tools = NULL) %>%
    ly_points(year, lifeExp, data = data, hover = data) %>%
    ly_abline(model)
}

country_plot(by_country$data[[1]], by_country$model[[1]])

slide-21
SLIDE 21

Let's Apply This Function to Every Row!

by_country <- by_country %>%
  mutate(plot = map2_plot(data, model, country_plot))

by_country

# A tibble: 142 x 6
       country continent              data    model resid_mad         plot
        <fctr>    <fctr>            <list>   <list>     <dbl>       <list>
 1 Afghanistan      Asia <tibble [12 x 4]> <S3: lm>       ... <S3: rbokeh>
 2     Albania    Europe <tibble [12 x 4]> <S3: lm>       ... <S3: rbokeh>
 3     Algeria    Africa <tibble [12 x 4]> <S3: lm>       ... <S3: rbokeh>
 4      Angola    Africa <tibble [12 x 4]> <S3: lm>       ... <S3: rbokeh>
 5   Argentina  Americas <tibble [12 x 4]> <S3: lm>       ... <S3: rbokeh>
 6   Australia   Oceania <tibble [12 x 4]> <S3: lm>       ... <S3: rbokeh>
 7     Austria    Europe <tibble [12 x 4]> <S3: lm>       ... <S3: rbokeh>
 8     Bahrain      Asia <tibble [12 x 4]> <S3: lm>       ... <S3: rbokeh>
 9  Bangladesh      Asia <tibble [12 x 4]> <S3: lm>       ... <S3: rbokeh>
10     Belgium    Europe <tibble [12 x 4]> <S3: lm>       ... <S3: rbokeh>
# ... with 132 more rows

Plots as list-columns!!!

slide-22
SLIDE 22

by_country %>%
  trelliscope(name = "by_country_lm", nrow = 2, ncol = 4)  # nrow/ncol values lost in the source; 2 and 4 are placeholders


slide-23
SLIDE 23

Recap: TrelliscopeJS in the Tidyverse

Create a data frame with one row per group, typically using Tidyverse group_by() and nest() operations
Add a column of plots
  TrelliscopeJS provides purrr map functions map_plot(), map2_plot(), pmap_plot() that you can use to create these
  You can use any graphics system to create the plot objects (ggplot2, htmlwidgets, lattice)
Optionally add more columns to the data frame that will be used as cognostics - metrics with which you can interact with the panels
  All atomic columns will be automatically used as cognostics
  Map functions map_cog(), map2_cog(), pmap_cog() can be used for convenience to create columns of cognostics
Simply pass the data frame in to trelliscope()

With plots as columns, TrelliscopeJS provides nearly effortless detailed, flexible, interactive visualization in the Tidyverse
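The recap steps can be sketched end to end as follows. This is a minimal illustration rather than code from the talk: it uses ggplot2 panels instead of rbokeh, and the column name mean_life_exp and display name "life_exp_by_country" are made up for the example. The nrow and ncol values are arbitrary.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(ggplot2)
library(gapminder)
library(trelliscopejs)

# Step 1: one row per group, with the per-group data as a list-column.
by_country <- gapminder %>%
  group_by(country, continent) %>%
  nest() %>%
  ungroup()

by_country <- by_country %>%
  mutate(
    # Step 3 (optional): an atomic column, picked up as a cognostic.
    # mean_life_exp is an illustrative name, not from the talk.
    mean_life_exp = map_dbl(data, ~ mean(.x$lifeExp)),
    # Step 2: a column of plots, one ggplot2 object per row.
    panel = map_plot(data, ~ ggplot(.x, aes(year, lifeExp)) +
      geom_point() + theme_bw())
  )

# Step 4: pass the data frame to trelliscope().
disp <- trelliscope(by_country, name = "life_exp_by_country",
  nrow = 2, ncol = 4)
```

Printing disp in an interactive session should open the display, where mean_life_exp, country, and continent are available for sorting and filtering.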

slide-24
SLIDE 24

by_country %>%
  arrange(resid_mad) %>%
  trelliscope(name = "by_country_lm", nrow = 2, ncol = 4)  # nrow/ncol values lost in the source; 2 and 4 are placeholders


Order the data frame to set initial ordering of display

slide-25
SLIDE 25

by_country %>%
  filter(continent == "Africa") %>%
  trelliscope(name = "by_country_africa_lm", nrow = 2, ncol = 4)  # nrow/ncol values lost in the source; 2 and 4 are placeholders


Filter the data to only include plots you want in the display

slide-26
SLIDE 26

Images as Panels

slide-27
SLIDE 27

pokemon <- read_csv("http://bit.ly/plot_pokemon") %>%
  mutate_at(vars(matches("_id")), as.character) %>%
  mutate(panel = img_panel(url_image))

pokemon

First 10 of 801 entries:

    pokemon            id  species_id  height  weight  base_experience  type_1  type_2  attack
 1  bulbasaur           1           1       7      69               64  grass   poison      49
 2  ivysaur             2           2      10     130              142  grass   poison      62
 3  venusaur            3           3      20    1000              236  grass   poison      82
 4  venusaur-mega       4           3      24    1555              281  grass   poison     100
 5  charmander          5           4       6      85               62  fire                52
 6  charmeleon          6           5      11     190              142  fire                64
 7  charizard           7           6      17     905              240  fire    flying      84
 8  charizard-mega-x    8           6      17    1105              285  fire    dragon     130
 9  charizard-mega-y    9           6      17    1005              285  fire    flying     104
10  squirtle           10           7       5      90               63  water               48

slide-28
SLIDE 28

trelliscope(pokemon, name = "pokemon", nrow = 3, ncol = 6,  # nrow/ncol values lost in the source; 3 and 6 are placeholders
  state = list(labels = c("pokemon", "pokedex")))


slide-29
SLIDE 29

htmlwidgets as Panels

slide-30
SLIDE 30

Example: Network Vis with visNetwork htmlwidget

# NOTE: numeric and logical values were lost from the source slide;
# every number and TRUE/FALSE below is a placeholder, not the talk's value.
library(visNetwork)

nnodes <- 100
nnedges <- 1000

nodes <- data.frame(
  id = 1:nnodes,
  label = 1:nnodes,
  value = rep(1, nnodes))

edges <- data.frame(
  from = sample(1:nnodes, nnedges, replace = TRUE),
  to = sample(1:nnodes, nnedges, replace = TRUE)) %>%
  group_by(from, to) %>%
  summarise(value = n())

network_plot <- function(id, hide_select = TRUE) {
  # optionally hide the node-selection dropdown
  style <- ifelse(hide_select,
    "visibility: hidden; position: absolute", "")

  visNetwork(nodes, edges) %>%
    visIgraphLayout(layout = "layout_in_circle") %>%
    visNodes(fixed = TRUE,
      scaling = list(min = 20, max = 50,
        label = list(min = 35, max = 70,
          drawThreshold = 1, maxVisible = 100))) %>%
    visEdges(scaling = list(min = 5, max = 30)) %>%
    visOptions(
      highlightNearest = list(enabled = TRUE, degree = 0,
        hideColor = "rgba(200,200,200,0.2)"),
      nodesIdSelection = list(selected = as.character(id), style = style))
}

network_plot(1, hide_select = FALSE)
slide-31
SLIDE 31

Trelliscope display with one panel per node

We create a one-row-per-node data frame, with the number of nodes connected to and the total number of connections as cognostics, and add a plot panel column

nodedat <- edges %>%
  group_by(from) %>%
  summarise(n_nodes = n(), tot_conns = sum(value)) %>%
  rename(id = from) %>%
  arrange(n_nodes) %>%
  mutate(panel = map_plot(id, network_plot))

nodedat

# A tibble: ... x 4
      id n_nodes tot_conns            panel
   <int>   <int>     <int>           <list>
 1   ...     ...       ... <S3: visNetwork>
 2   ...     ...       ... <S3: visNetwork>
 3   ...     ...       ... <S3: visNetwork>
# ... with more rows

slide-32
SLIDE 32

nodedat %>%
  arrange(n_nodes) %>%
  trelliscope(name = "connections", nrow = 2, ncol = 4)  # nrow/ncol values lost in the source; 2 and 4 are placeholders


slide-33
SLIDE 33

Larger Trelliscope Displays

slide-34
SLIDE 34
slide-35
SLIDE 35

instadf %>%
  arrange(-likes_count) %>%
  trelliscope(name = "posts", width = 320, height = 320,
    nrow = 3, ncol = 6,
    state = list(labels = c("caption", "post_link", "likes_count")))


slide-36
SLIDE 36

Trelliscope Displays as Apps

slide-37
SLIDE 37

Trelliscope Displays as Apps

If you have an app that has multiple inputs and produces a plot output, the idea is simply to enumerate all possible inputs as rows of a data frame, add the plot that corresponds to each set of parameters as a column, and plot it
Trelliscope displays are most useful as exploratory plots to guide the data scientist (because they can be created rapidly)
However, in many cases Trelliscope displays can be used as interactive applications for end-users, domain experts, etc. with the bonus that they are much easier to create than a custom app
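The "enumerate all inputs as rows" idea can be sketched in base R as below. The inputs (dist, n) and the make_plot() function are hypothetical stand-ins for an app's controls and its plotting code; histogram objects are used as the stored "plots" so the sketch needs no extra packages.

```r
# Hypothetical app inputs: every combination becomes one row.
inputs <- expand.grid(
  dist = c("normal", "uniform"),
  n = c(10, 100, 1000),
  stringsAsFactors = FALSE
)

# The function an app would call whenever its inputs change.
# It returns a histogram object without drawing it.
make_plot <- function(dist, n) {
  x <- if (dist == "normal") rnorm(n) else runif(n)
  hist(x, plot = FALSE)
}

# One stored plot object per row of inputs (a list-column).
inputs$panel <- unname(Map(make_plot, inputs$dist, inputs$n))

# Drawing the panel for one input combination:
plot(inputs$panel[[1]])
```

With trelliscopejs, the panel column would instead be built with one of its map functions (for example pmap_plot()) and the data frame passed to trelliscope(), so that dist and n become the controls for filtering and sorting.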

slide-38
SLIDE 38

Gapminder Life Expectancy

library(shiny)
library(ggplot2)
library(gapminder)

# render the life expectancy plot for the selected country
server <- function(input, output) {
  output$countryPlot <- renderPlot({
    qplot(year, lifeExp, data = subset(gapminder, country == input$country)) +
      xlim(1948, 2011) + ylim(10, 95) + theme_bw()
  })
}

choices <- sort(unique(gapminder$country))

ui <- fluidPage(
  titlePanel("Gapminder Life Expectancy"),
  sidebarLayout(
    sidebarPanel(
      selectInput("country", label = "Select country: ",
        choices = choices, selected = "Afghanistan")
    ),
    mainPanel(
      plotOutput("countryPlot", height = "500px")
    )
  )
)

runApp(list(ui = ui, server = server))

slide-39
SLIDE 39

Scaling Trelliscope

Just because you can't look at all panels in a display doesn't mean it isn't useful or practical to make a large display - it's in fact beneficial because you get an unprecedented level of detail in your displays, and every corner of your data can be conceptually viewed
One insight is all you need for a display to serve a purpose (provided it is quick to create)
We used the previous implementation of Trelliscope to visualize millions of subsets of terabytes of data
slide-40
SLIDE 40

What is needed to scale in the Tidyverse?

SparklyR is the natural solution
But we need a few things...
  SparklyR support for list-columns (nested data frames and arbitrary R objects)
  SparklyR support for remote procedure calls (run arbitrary R code on the data)
  Fast random access to rows of a SparklyR data frame
  A TrelliscopeJS deferred panel rendering scheme (render on-the-fly rather than all panels up front)
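As a rough illustration of what deferred panel rendering means, the base R sketch below stores a rendering function per panel and only runs it when a panel is first requested. This is a conceptual sketch under simple assumptions, not the TrelliscopeJS design; make_renderer(), get_panel(), cache and render_count are all made-up names, and the built-in ChickWeight data stands in for a large data set.

```r
# Illustrative data: one group per chick from the built-in ChickWeight data.
chicks <- as.data.frame(ChickWeight)
groups <- split(chicks, chicks$Chick, drop = TRUE)

# Store a recipe (a function) per panel instead of a rendered plot.
make_renderer <- function(d) {
  force(d)
  function() hist(d$weight, plot = FALSE)
}
renderers <- lapply(groups, make_renderer)

# Render on first request and remember the result.
cache <- new.env()
render_count <- 0
get_panel <- function(key) {
  if (!exists(key, envir = cache, inherits = FALSE)) {
    render_count <<- render_count + 1
    assign(key, renderers[[key]](), envir = cache)
  }
  get(key, envir = cache)
}

# A viewer asking for panels "1", "2", and "1" again triggers two renders.
page <- lapply(c("1", "2", "1"), get_panel)
render_count
```

Only the panels actually viewed are ever rendered, which is the property that matters when a display has far more panels than anyone will page through.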

slide-41
SLIDE 41

What's Next

trelliscopejs
Automatic cognostics: automatically compute useful cognostics based on the context of what is being plotted (e.g. if a scatterplot has a model fit superposed, add model diagnostics cognostics)
Automatic handling of axis limits - "same", "sliced", "free" (underway - currently "same" limits need to be hard-coded)
When axes are "same", only show axes on plot margins instead of every panel (underway for ggplot2)

trelliscopejs-lib
More visual filters for cognostics (dates, geographic, bivariate relationships, etc.)
Bookmarkable / sharable state
View multiple panels side-by-side
Support for receiving panels from other endpoints

slide-42
SLIDE 42

For More Information

Twitter: @hafenstats
Blog: http://ryanhafen.com/blog
Documentation: http://hafen.github.io/trelliscopejs
Github: https://github.com/hafen/trelliscopejs