Polyglot data science: the force awakens with F#, R and D3.js


slide-1
SLIDE 1

Polyglot data science the force awakens

with F#, R and D3.js

Evelina Gabasova @evelgab
Tomas Petricek @tomaspetricek

slide-2
SLIDE 2

Part I

F# with type providers

slide-3
SLIDE 3

fslab.org: Doing data science using F#

The data science workflow

  • Data access with type providers
  • Interactive analysis with .NET and R libraries
  • Visualization with HTML/PDF charts and reports

High-quality open-source libraries

slide-4
SLIDE 4

LINQ before it was cool :-)

var res = StockData.MSFT
    .Where(stock => stock.Close - stock.Open > 7.0)
    .Select(stock => stock.Date);

Looking under the cover

  • Extension methods take Func<T1, T2> delegates
  • Immutable because it returns a new IEnumerable
  • Functional design allows method chaining

slide-5
SLIDE 5

LINQ before it was cool :-)

StockData.MSFT
|> Array.filter (fun stock -> stock.Close - stock.Open > 7.0)
|> Array.map (fun stock -> stock.Date)

Looking under the cover

  • Pipeline operator for composing functions
  • Lambda functions written using fun
  • Immutable lists, sequences, arrays, etc.

slide-6
SLIDE 6

Charting libraries for F#

  • XPlot: cross-platform, HTML-based (recommended)
  • F# Charting: flexible but Windows-only library

Other options: FnuPlot and R provider

For latest information

See FsLab.org, the F# data science homepage

slide-7
SLIDE 7

Charting with XPlot

Draw sin for values from 0.0 to 6.3:

[| 0.0 .. 0.1 .. 6.3 |]
|> Array.map (fun x -> x, sin x)
|> Chart.Line

Uses Google Charts behind the scenes:

[Chart: line plot of sin x; x-axis from 0.0 to 6.0, y-axis from -1.0 to 1.0]

slide-8
SLIDE 8

What are type providers?

slide-9
SLIDE 9

Type provider patterns

Providers for a specific data source

let wb = WorldBankData.GetDataContext()
wb.Countries.India.Indicators.``Population, total``

Parameterized provider for a data format

type Rss = XmlProvider<"data/bbc.xml">
Rss.Load(url).Channel.Description

slide-10
SLIDE 10

TASK: Star Wars movie profits

[Chart: Star Wars - rating and box office; x-axis: Year, y-axis: Box office]

slide-11
SLIDE 11

github.com/evelinag/polyglot-data-science

slide-12
SLIDE 12

Part II

Visualization with D3.js

slide-13
SLIDE 13

The Star Wars social network

slide-14
SLIDE 14
slide-15
SLIDE 15
slide-16
SLIDE 16

D3.js visualizations

made easier

Gallery of examples

slide-17
SLIDE 17

D3.js social network visualization

Force-directed network layout

slide-18
SLIDE 18

Part III

Analyzing social networks with R

slide-19
SLIDE 19

Social network analysis

  • Who is the most central character?
  • How do the movies compare with each other?

slide-20
SLIDE 20

The R language

"domain-specific" language for statistical analysis

slide-21
SLIDE 21

Very quick R intro

# assignment
x <- 1
x = 1

# variable and function names
x
x.y
read.csv

slide-22
SLIDE 22

Very quick R intro: pipeline

|> turns into %>%

install.packages("magrittr")
library(magrittr)

xs <- c(1,2,3,4,5,6,7,8,9,10)
xs %>% mean

slide-23
SLIDE 23

Network analysis with igraph

  • igraph website
  • igraph documentation

install.packages("igraph")
library(igraph)

slide-24
SLIDE 24

Creating igraph network

library(igraph)
g <- graph(edges)

edges = flat list of nodes n1, n2, n3, n4, n5, ...; each consecutive pair is one edge: (n1, n2), (n3, n4), ...
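The flat edge-list convention described above can be illustrated outside R as well. The following is a minimal Python sketch, not part of the original deck; the helper name `pair_edges` and the node labels are hypothetical and exist only for illustration.

```python
def pair_edges(flat):
    """Turn a flat vertex list [n1, n2, n3, n4, ...] into [(n1, n2), (n3, n4), ...]."""
    if len(flat) % 2 != 0:
        raise ValueError("a flat edge list must contain an even number of vertices")
    # Even positions are edge starts, odd positions are edge ends.
    return list(zip(flat[0::2], flat[1::2]))


# Hypothetical node labels, mirroring the n1, n2, ... notation of the slide.
flat = ["n1", "n2", "n3", "n4", "n5", "n6"]
edges = pair_edges(flat)
print(edges)
```

Reading the list two items at a time is what makes an odd-length list an error: the last vertex would have no partner.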

slide-25
SLIDE 25

Calculating degree

d <- degree(graph)

slide-26
SLIDE 26

F#

open RProvider.igraph

let degree = R.degree(network)

slide-27
SLIDE 27

F#

export JSON into list of edges

R

perform the network analysis

slide-28
SLIDE 28

Degree

slide-29
SLIDE 29

Degree

slide-30
SLIDE 30

Degree

slide-31
SLIDE 31

Degree

$$\mathrm{Degree}(v) = \text{number of links } v \leftrightarrow v', \quad v \neq v'$$
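As a worked illustration of the degree definition, here is a minimal Python sketch. It is not the deck's R or F# code; it assumes a simple undirected graph given as a list of vertex pairs, and the function name and example edges are made up for illustration.

```python
from collections import defaultdict


def degree(edges):
    """Count, for each vertex v, the links v <-> v' with v != v'."""
    neighbours = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # the definition excludes links from a vertex to itself
        neighbours[a].add(b)
        neighbours[b].add(a)
    # Assumption: at most one link per pair, so distinct neighbours == links.
    return {v: len(ns) for v, ns in neighbours.items()}


# Hypothetical four-vertex network.
example_edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")]
print(degree(example_edges))
```

Vertices that appear in no edge are absent from the result, because an edge list alone cannot describe them.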

slide-32
SLIDE 32

Betweenness

slide-33
SLIDE 33

Betweenness

slide-34
SLIDE 34

Betweenness

slide-35
SLIDE 35

Betweenness

slide-36
SLIDE 36

Betweenness

slide-37
SLIDE 37

Betweenness

$S_v$ = number of shortest paths between $a$ and $b$ through $v$
$S$ = number of shortest paths between $a$ and $b$

$$\mathrm{Betweenness}(v)_{ab} = \frac{S_v}{S}$$

slide-38
SLIDE 38

Betweenness

$S_v$ = number of shortest paths between $a$ and $b$ through $v$
$S$ = number of shortest paths between $a$ and $b$

$$\mathrm{Betweenness}(v) = \sum_{ab} \frac{S_v}{S}$$
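The betweenness sum can be computed directly from its definition on a small graph. The sketch below is a brute-force Python illustration, not the deck's code and not how a graph library would implement it. It assumes an unweighted, undirected graph and counts each unordered pair (a, b) once; all function names are hypothetical.

```python
from collections import deque
from itertools import combinations


def build_adjacency(edges):
    """Undirected adjacency sets; self-links are ignored."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set())
        adj.setdefault(b, set())
        if a != b:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def bfs_counts(adj, source):
    """Breadth-first search returning distances and shortest-path counts."""
    dist = {source: 0}
    paths = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                paths[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:
                paths[w] += paths[u]  # every shortest path to u extends to w
    return dist, paths


def betweenness(edges):
    adj = build_adjacency(edges)
    info = {v: bfs_counts(adj, v) for v in adj}
    result = {v: 0.0 for v in adj}
    for a, b in combinations(adj, 2):
        dist_a, paths_a = info[a]
        if b not in dist_a:
            continue  # a and b are not connected
        for v in adj:
            if v == a or v == b or v not in dist_a:
                continue
            dist_v, paths_v = info[v]
            # v lies on a shortest a-b path iff the distances add up exactly.
            if b in dist_v and dist_a[v] + dist_v[b] == dist_a[b]:
                s_v = paths_a[v] * paths_v[b]  # shortest paths through v
                result[v] += s_v / paths_a[b]  # S_v / S
    return result


# Hypothetical three-vertex path a - b - c.
print(betweenness([("a", "b"), ("b", "c")]))
```

This version trades speed for closeness to the formula: it visits every pair and every candidate vertex, which is fine for networks of a few dozen characters.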

slide-39
SLIDE 39

Network structure

How do the movies differ?

  • Size
  • Density
  • Clustering coefficient

slide-40
SLIDE 40

Density

slide-41
SLIDE 41

Density

slide-42
SLIDE 42

Density

$$\mathrm{Density} = \frac{\text{Existing connections}}{\text{Potential connections}} = \frac{\text{Existing connections}}{\frac{1}{2} N(N-1)}$$
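A short arithmetic sketch of the density formula in Python follows. It is an illustration rather than the deck's code; the function name and the example numbers are hypothetical, and the graph is assumed to be simple and undirected.

```python
def density(n_vertices, n_edges):
    """Existing connections divided by the N(N - 1) / 2 potential connections."""
    if n_vertices < 2:
        raise ValueError("density needs at least two vertices")
    potential = n_vertices * (n_vertices - 1) / 2
    return n_edges / potential


# Hypothetical network: 4 vertices joined by 3 links, out of 6 possible pairs.
example_density = density(4, 3)
print(example_density)
```

Because the number of potential connections grows quadratically with N, networks with many vertices tend to have lower density unless the number of links grows just as fast.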

slide-43
SLIDE 43

Clustering coefficient

slide-44
SLIDE 44

Clustering coefficient

slide-45
SLIDE 45

Clustering coefficient

slide-46
SLIDE 46

Clustering coefficient

slide-47
SLIDE 47

Clustering coefficient

slide-48
SLIDE 48

Clustering coefficient

slide-49
SLIDE 49

Clustering coefficient

$K_v$ = number of neighbours of $v$
$E_v$ = number of links between neighbours of $v$

$$\mathrm{Clustering}(v) = \frac{E_v}{\frac{1}{2} K_v (K_v - 1)}$$

slide-50
SLIDE 50

Clustering coefficient

$K_v$ = number of neighbours of $v$
$E_v$ = number of links between neighbours of $v$

$$\mathrm{Clustering}(\text{network}) = \frac{1}{N} \sum_{v} \frac{E_v}{\frac{1}{2} K_v (K_v - 1)}$$
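The per-vertex and network-level clustering formulas can be checked by hand on a tiny graph. The Python sketch below is an illustration, not the deck's code. It assumes a simple undirected graph, and it assigns 0.0 to vertices with fewer than two neighbours, where the formula divides by zero; that convention is an assumption of this sketch, and libraries may handle the case differently. All names are hypothetical.

```python
from itertools import combinations


def build_adjacency(edges):
    """Undirected adjacency sets; self-links are ignored."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set())
        adj.setdefault(b, set())
        if a != b:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def local_clustering(adj, v):
    """E_v divided by the K_v (K_v - 1) / 2 possible links between neighbours."""
    k_v = len(adj[v])
    if k_v < 2:
        return 0.0  # assumption: undefined case treated as zero
    e_v = sum(1 for a, b in combinations(adj[v], 2) if b in adj[a])
    return e_v / (k_v * (k_v - 1) / 2)


def network_clustering(edges):
    """Average of the local clustering coefficient over all N vertices."""
    adj = build_adjacency(edges)
    return sum(local_clustering(adj, v) for v in adj) / len(adj)


# Hypothetical network: a triangle a-b-c with one extra vertex d attached to a.
example_edges = [("a", "b"), ("a", "c"), ("b", "c"), ("a", "d")]
example_adj = build_adjacency(example_edges)
print(local_clustering(example_adj, "a"), network_clustering(example_edges))
```

In the example, vertex a has three neighbours (b, c, d) but only one link among them (b to c), so its coefficient is one third, while b and c each sit in a closed triangle and score 1.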

slide-51
SLIDE 51

Size

[Chart: Number of characters, Episode 1 to Episode 7; y-axis: Number of characters]

slide-52
SLIDE 52

Density

[Chart: Network density, Episode 1 to Episode 7; y-axis: Density (%)]

slide-53
SLIDE 53

Clustering coefficient

[Chart: Clustering coefficient (transitivity), Episode 1 to Episode 7; y-axis: Clustering coefficient]

slide-54
SLIDE 54

CONCLUSIONS

slide-55
SLIDE 55

F# Software Foundation

  • non-profit
  • books and tutorials
  • cross-platform
  • community
  • data science
  • commercial support
  • open-source contributions
  • machine learning
  • web and cloud
  • consulting
  • user groups
  • research

www.fsharp.org

slide-56
SLIDE 56

The Learning Pyramid

slide-57
SLIDE 57

Community chat and Q&A

  • #fsharp on Twitter
  • Stack Overflow F# tag

Open source on GitHub

  • Visual F# repo: github.com/Microsoft/visualfsharp
  • F# Compiler and core libraries: github.com/fsharp
  • F# Incubation project space: github.com/fsprojects
  • FsLab Organization repository: github.com/fslaborg

More resources


slide-58
SLIDE 58

  • Scott Wlaschin's fsharpforfunandprofit.com
  • F# Books and Resources: fsharp.org/about/learning.html

slide-59
SLIDE 59

The Force Awakens

Evelina Gabasova @evelgab evelina@evelinag.com www.evelinag.com
Tomas Petricek @tomaspetricek tomas@tomasp.net www.tomasp.net