Polyglot data science: the force awakens with F#, R and D3.js
Evelina Gabasova @evelgab, Tomas Petricek @tomaspetricek
Part I
F# with type providers
fslab.org: Doing data science using F#
The data science workflow
- Data access with type providers
- Interactive analysis with .NET and R libraries
- Visualization with HTML/PDF charts and reports
High-quality open-source libraries
LINQ before it was cool :-)
var res = StockData.MSFT
    .Where(stock => stock.Close - stock.Open > 7.0)
    .Select(stock => stock.Date);
Looking under the cover
- Extension methods take Func<T1, T2> delegates
- Immutable because it returns a new IEnumerable
- Functional design allows method chaining
LINQ before it was cool :-)
StockData.MSFT
|> Array.filter (fun stock -> stock.Close - stock.Open > 7.0)
|> Array.map (fun stock -> stock.Date)
Looking under the cover
- Pipeline operator for composing functions
- Lambda functions written using fun
- Immutable lists, sequences, arrays, etc.
Charting libraries for F#
- XPlot: cross-platform, HTML-based (recommended)
- F# Charting: flexible but Windows-only library
Other options: FnuPlot and R provider
For latest information
See FsLab.org, the F# data science homepage
Charting with XPlot
Draw sin for values from 0 to 2π:
[| 0.0 .. 0.1 .. 6.3 |]
|> Array.map (fun x -> x, sin x)
|> Chart.Line
Uses Google Charts behind the scenes:
[Figure: line chart of sin x for x from 0 to 2π]
What are type providers?
Type provider patterns
Providers for a specific data source
let wb = WorldBankData.GetDataContext()
wb.Countries.India.Indicators.``Population, total``
Parameterized provider for a data format
type Rss = XmlProvider<"data/bbc.xml">
Rss.Load(url).Channel.Description
TASK: Star Wars movie profits
[Figure: Star Wars rating and box office by year; axes: Year, Box office]
github.com/evelinag/polyglot-data-science
Part II
Visualization with D3.js
The Star Wars social network
D3.js visualizations
made easier
Gallery of examples
D3.js social network visualization
Force-directed network layout
Part III
Analyzing social networks with R
Social network analysis
Who is the most central character?
How do the movies compare with each other?
The R language
"domain-specific" language for statistical analysis
Very quick R intro
# assignment
x <- 1
x = 1
# variable and function names
x
x.y
read.csv
Very quick R intro: pipeline
|> turns into %>%
install.packages("magrittr")
library(magrittr)
xs <- c(1,2,3,4,5,6,7,8,9,10)
xs %>% mean
Network analysis with igraph
igraph website igraph documentation
install.packages("igraph")
library(igraph)
Creating igraph network
library(igraph)
g <- graph(edges)
edges = flat list of nodes n1, n2, n3, n4, n5, ...; consecutive pairs represent the edges (n1, n2), (n3, n4), ...
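As a rough illustration of this flat layout (written in Python rather than R, purely as a sketch), consecutive entries can be grouped into edge pairs. The node names are the placeholders from the slide, and `to_edge_pairs` is a hypothetical helper, not part of igraph.

```python
# Hypothetical flat node list in the layout described above.
flat = ["n1", "n2", "n3", "n4", "n5", "n6"]


def to_edge_pairs(nodes):
    """Group a flat node list into consecutive (source, target) pairs."""
    if len(nodes) % 2 != 0:
        raise ValueError("flat edge list must contain an even number of nodes")
    return [(nodes[i], nodes[i + 1]) for i in range(0, len(nodes), 2)]


pairs = to_edge_pairs(flat)
print(pairs)
```

An odd-length list has a dangling node with no partner, so the sketch rejects it rather than guessing.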
Calculating degree
d <- degree(g)
F#
open RProvider.igraph
let degree = R.degree(network)
F#
export JSON into list of edges
R
perform the network analysis
Degree
$$\mathrm{Degree}(v) = \text{Number of links } v \leftrightarrow v', \quad v \neq v'$$
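The degree definition can be checked by hand on a tiny graph. The following is a minimal Python sketch, not the igraph implementation: the edge list and node names are made up for illustration, and the sketch counts distinct neighbours and ignores self-loops, which is an assumption about how duplicates should be treated.

```python
from collections import defaultdict

# Hypothetical undirected edge list; node names are placeholders.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]


def degree(edge_list):
    """Count the distinct neighbours v' != v of every node v."""
    neighbours = defaultdict(set)
    for u, v in edge_list:
        if u != v:  # skip self-loops, matching the condition v != v'
            neighbours[u].add(v)
            neighbours[v].add(u)
    return {node: len(adj) for node, adj in neighbours.items()}


deg = degree(edges)
print(deg)
```

Here node c has three links and node d has one, so c is the most connected node in this toy graph.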
Betweenness
$S_v$ = Number of shortest paths between $a$ and $b$ through $v$
$S$ = Number of shortest paths between $a$ and $b$
For a single pair $a, b$:
$$\mathrm{Betweenness}(v)_{ab} = \frac{S_v}{S}$$
Summed over all pairs:
$$\mathrm{Betweenness}(v) = \sum_{ab} \frac{S_v}{S}$$
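To make the betweenness sum concrete, here is a brute-force Python sketch for small, unweighted, undirected graphs. It is not how igraph computes the measure; the example graph (a four-node cycle) and the helper names are assumptions chosen for illustration. For each pair a, b it counts shortest paths with a breadth-first search and adds the fraction that pass through v.

```python
from collections import deque
from itertools import combinations

# Hypothetical example: a cycle a - b - c - d - a.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]


def build_adjacency(edge_list):
    """Build an undirected adjacency map from an edge list."""
    adj = {}
    for u, v in edge_list:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def shortest_path_counts(adj, source):
    """BFS from source: distance and number of shortest paths per reachable node."""
    dist = {source: 0}
    count = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                count[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:
                count[w] += count[u]
    return dist, count


def betweenness(adj, v):
    """Sum over unordered pairs (a, b), both different from v, of S_v / S."""
    info = {n: shortest_path_counts(adj, n) for n in adj}
    dist_v, count_v = info[v]
    total = 0.0
    for a, b in combinations([n for n in adj if n != v], 2):
        dist_a, count_a = info[a]
        if b not in dist_a or v not in dist_a or b not in dist_v:
            continue  # a and b (or v) are not connected
        if dist_a[v] + dist_v[b] == dist_a[b]:
            s_v = count_a[v] * count_v[b]  # shortest a-b paths through v
            total += s_v / count_a[b]      # divided by all shortest a-b paths
    return total


adj = build_adjacency(edges)
scores = {v: betweenness(adj, v) for v in adj}
print(scores)
```

In the cycle, each node lies on one of the two shortest paths between its two neighbours, so every node ends up with the same score.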
Network structure
How do the movies differ?
- Size
- Density
- Clustering coefficient
Density
$$\mathrm{Density} = \frac{\text{Existing connections}}{\text{Potential connections}} = \frac{\text{Existing connections}}{\frac{1}{2} N(N-1)}$$
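A small worked example may help with the density ratio. This Python sketch is an illustration, not the igraph call: the edge list is hypothetical, and N is taken to be the number of nodes that appear in at least one edge, so isolated characters would not be counted.

```python
# Hypothetical undirected edge list; node names are placeholders.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]


def density(edge_list):
    """Existing connections divided by the N(N - 1) / 2 potential connections."""
    nodes = {node for edge in edge_list for node in edge}
    n = len(nodes)
    # Treat edges as unordered and drop self-loops and duplicates.
    existing = len({frozenset(edge) for edge in edge_list if edge[0] != edge[1]})
    potential = n * (n - 1) / 2
    return existing / potential if potential else 0.0


d = density(edges)
print(d)
```

With four nodes there are six potential connections, and four of them exist in this toy graph.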
Clustering coefficient
$K_v$ = Number of neighbours of $v$
$E_v$ = Number of links between neighbours of $v$
$$\mathrm{Clustering}(v) = \frac{E_v}{\frac{1}{2} K_v (K_v - 1)}$$
$$\mathrm{Clustering}(\text{network}) = \frac{1}{N} \sum_{v} \frac{E_v}{\frac{1}{2} K_v (K_v - 1)}$$
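The two clustering formulas can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the igraph implementation: the graph is hypothetical, and nodes with fewer than two neighbours are given a coefficient of 0, a convention the slides do not specify.

```python
from itertools import combinations

# Hypothetical undirected edge list; node names are placeholders.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]


def build_adjacency(edge_list):
    """Build an undirected adjacency map from an edge list."""
    adj = {}
    for u, v in edge_list:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def local_clustering(adj, v):
    """E_v divided by K_v (K_v - 1) / 2 for a single node v."""
    k_v = len(adj[v])
    if k_v < 2:
        return 0.0  # assumed convention: no neighbour pairs means 0
    e_v = sum(1 for a, b in combinations(adj[v], 2) if b in adj[a])
    return e_v / (k_v * (k_v - 1) / 2)


def network_clustering(adj):
    """Average of the local coefficients over all N nodes."""
    return sum(local_clustering(adj, v) for v in adj) / len(adj)


adj = build_adjacency(edges)
local = {v: local_clustering(adj, v) for v in adj}
overall = network_clustering(adj)
print(local, overall)
```

In this toy graph a and b each have two neighbours that are linked to each other, while only one of the three neighbour pairs of c is linked.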
Size
[Figure: Number of characters per episode, Episode 1 to Episode 7]
Density
[Figure: Network density (%) per episode, Episode 1 to Episode 7]
Clustering coefficient
[Figure: Clustering coefficient (transitivity) per episode, Episode 1 to Episode 7]