Polyglot data science: the force awakens with F#, R and D3.js


  1. Polyglot data science: the force awakens with F#, R and D3.js. Evelina Gabasova (@evelgab), Tomas Petricek (@tomaspetricek)

  2. Part I F# with type providers

  3. fslab.org: Doing data science using F#. The data science workflow: data access with type providers; interactive analysis with .NET and R libraries; visualization with HTML/PDF charts and reports. High-quality open-source libraries.

  4. LINQ before it was cool :-)

         using System.Linq;

         var res = StockData.MSFT
             .Where(stock => stock.Close - stock.Open > 7.0)
             .Select(stock => stock.Date);

     Looking under the cover: extension methods take Func<T1, T2> delegates; the query is immutable because each call returns a new IEnumerable; the functional design allows method chaining.

  5. LINQ before it was cool :-)

         StockData.MSFT
         |> Array.filter (fun stock -> stock.Close - stock.Open > 7.0)
         |> Array.map (fun stock -> stock.Date)

     Looking under the cover: the pipeline operator composes functions; lambda functions are written using fun; lists are immutable, and the functions in the List, Seq and Array modules return new collections instead of modifying their input.
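The slide's pipeline depends on a StockData type provider. To try the same shape of code without it, the sketch below assumes a hypothetical Stock record and a few made-up sample rows (neither comes from the talk), then runs the identical filter and map steps.

```fsharp
open System

// Hypothetical record standing in for the rows that the StockData type provider
// on the slide would supply; the provider itself is not used here.
type Stock = { Date : DateTime; Open : float; Close : float }

// Made-up sample rows, purely for illustration (not real MSFT prices).
let msft =
    [| { Date = DateTime(2015, 1, 2); Open = 10.0; Close = 11.0 }
       { Date = DateTime(2015, 1, 5); Open = 10.0; Close = 18.0 }
       { Date = DateTime(2015, 1, 6); Open = 12.0; Close = 9.0 } |]

// Same pipeline as on the slide: keep the days where the price rose by more
// than 7.0 between open and close, then project out the dates.
let bigGainDays =
    msft
    |> Array.filter (fun stock -> stock.Close - stock.Open > 7.0)
    |> Array.map (fun stock -> stock.Date)

// msft itself is not modified: Array.filter and Array.map build new arrays.
printfn "%A" bigGainDays
```

Only the second row qualifies (18.0 - 10.0 = 8.0), so the result holds a single date.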

  6. Charting libraries for F#: XPlot (cross-platform, HTML-based; recommended); F# Charting (flexible but Windows-only); other options: FnuPlot and the R provider. For the latest information, see FsLab.org, the F# data science homepage.

  7. Charting with XPlot

     Draw sin x for values of x from 0 to 2π:

         open XPlot.GoogleCharts

         [| 0.0 .. 0.1 .. 6.3 |]
         |> Array.map (fun x -> x, sin x)
         |> Chart.Line

     Uses Google Charts behind the scenes. [Figure: the resulting line chart of sin x]

  8. What are type providers?

  9. Type provider patterns

     Providers for a specific data source:

         open FSharp.Data

         let wb = WorldBankData.GetDataContext()
         wb.Countries.India.Indicators.``Population, total``

     Parameterized provider for a data format:

         type Rss = XmlProvider<"data/bbc.xml">
         Rss.Load(url).Channel.Description

  10. TASK: Star Wars movie profits. [Figure: "Star Wars - rating and box office", box office plotted against year, 1980 to 2020]

  11. github.com/evelinag/polyglot-data-science

  12. Part II Visualization with D3.js

  13. The Star Wars social network

  14. D3.js: visualizations made easier (gallery of examples)

  15. D3.js social network visualization: force-directed network layout

  16. Part III Analyzing social networks with R

  17. Social network analysis: Who is the most central character? How do the movies compare with each other?

  18. The R language: a "domain-specific" language for statistical analysis

  19. Very quick R intro

         # assignment
         x <- 1
         x = 1
         # variable and function names (dots are allowed in names)
         x
         x.y
         read.csv

  20. Very quick R intro: the F# pipeline |> turns into %>%

         install.packages("magrittr")
         library(magrittr)
         xs <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
         xs %>% mean
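The magrittr pipe is ordinary function application written left to right, just like F#'s |>. A minimal F# sketch makes that explicit. It defines a %>% operator of its own (not part of FSharp.Core or any library) and checks that it agrees with the built-in pipeline on the slide's values 1 to 10.

```fsharp
// Our own magrittr-style pipe, defined only for illustration:
// x %>% f simply applies f to x, exactly like the built-in x |> f.
let (%>%) x f = f x

let xs = [ 1.0 .. 10.0 ]

// Equivalent of the R line: xs %>% mean
let m1 = xs %>% List.average

// The built-in F# pipeline gives the same result.
let m2 = xs |> List.average

printfn "%f %f" m1 m2
```

Both values are the mean of 1 to 10, that is 5.5.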

  21. Network analysis with igraph (igraph website, igraph documentation)

         install.packages("igraph")
         library(igraph)

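  22. Creating an igraph network

         library(igraph)
         # graph() builds a directed graph by default; character links are undirected
         g <- graph(edges, directed = FALSE)

     edges is a flat list of nodes n1, n2, n3, n4, n5, ... that represents the edges (n1, n2), (n3, n4), ...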
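The flat edge convention can be mimicked in F# by reading the node list two elements at a time. This is a sketch: edgesOfFlatList is a hypothetical helper, and the character names are placeholders, not the talk's data.

```fsharp
// Hypothetical helper: read a flat list n1, n2, n3, n4, ... two at a time
// as the edges (n1, n2), (n3, n4), ..., mirroring the igraph convention.
let edgesOfFlatList (nodes : 'T list) : ('T * 'T) list =
    if nodes.Length % 2 <> 0 then
        invalidArg "nodes" "a flat edge list must contain an even number of nodes"
    nodes
    |> List.chunkBySize 2                      // non-overlapping pairs
    |> List.map (fun pair -> pair.[0], pair.[1])

// Placeholder character names for illustration only.
let flat = [ "Luke"; "Leia"; "Luke"; "Han"; "Han"; "Chewbacca" ]
let edges = edgesOfFlatList flat
printfn "%A" edges
```

List.chunkBySize is used rather than List.pairwise because pairwise would produce overlapping pairs such as (n2, n3), which are not edges in this format.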

  23. Calculating degree

         # g is the igraph network created above
         d <- degree(g)

  24. F#

         open RProvider
         open RProvider.igraph

         let degree = R.degree(network)

  25. F#: export JSON into a list of edges. R: perform the network analysis.

  26. Degree

  27. Degree

  28. Degree

  29. Degree: $\mathrm{Degree}(v)$ = number of links $v \leftrightarrow v'$ with $v \neq v'$
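The degree definition translates directly into a fold over an undirected edge list. In this sketch, the degrees function and the sample edges are hypothetical, and self-loops are skipped to honour the v ≠ v′ condition.

```fsharp
// Degree(v) = number of links v <-> v' with v <> v', for an undirected graph
// given as a list of edges. Self-loops (v, v) are dropped.
// Hypothetical helper, not taken from the talk's repository.
let degrees edges =
    // increment the counter stored for node v in map m
    let bump v m =
        m |> Map.add v (1 + (Map.tryFind v m |> Option.defaultValue 0))
    edges
    |> List.filter (fun (a, b) -> a <> b)
    |> List.fold (fun m (a, b) -> m |> bump a |> bump b) Map.empty

// Placeholder edges; ("Luke", "Luke") is a self-loop and is ignored.
let edges = [ ("Luke", "Leia"); ("Luke", "Han"); ("Han", "Chewbacca"); ("Luke", "Luke") ]
let d = degrees edges
printfn "%A" d
```

Each link adds one to both endpoints, so Luke and Han end up with degree 2, and Leia and Chewbacca with degree 1.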

  30. Betweenness

  31. Betweenness

  32. Betweenness

  33. Betweenness

  34. Betweenness

  35. Betweenness: $S_{ab}^{v}$ = number of shortest paths between $a$ and $b$ through $v$; $S_{ab}$ = number of shortest paths between $a$ and $b$; for a single pair $a$, $b$: $\mathrm{Betweenness}(v) = \dfrac{S_{ab}^{v}}{S_{ab}}$

  36. Betweenness: $S_{ab}^{v}$ = number of shortest paths between $a$ and $b$ through $v$; $S_{ab}$ = number of shortest paths between $a$ and $b$; $\mathrm{Betweenness}(v) = \sum_{a,b} \dfrac{S_{ab}^{v}}{S_{ab}}$
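For a small graph, the betweenness formula can be evaluated literally. The sketch below makes these assumptions:

- The graph is undirected, unweighted and simple, with nodes numbered 0 to n - 1.
- Each unordered pair a, b is counted once.

It runs a breadth-first search from every node to count shortest paths. That makes it a brute-force illustration rather than the faster Brandes algorithm that dedicated libraries typically use. The function name betweenness is our own.

```fsharp
// Brute-force Betweenness(v) = sum over pairs a, b of S_ab^v / S_ab.
// A shortest a-b path passes through v exactly when
// dist(a, v) + dist(v, b) = dist(a, b), and then S_ab^v = sigma(a, v) * sigma(v, b).
let betweenness (n : int) (edges : (int * int) list) : float[] =
    let adj = Array.init n (fun _ -> ResizeArray<int>())
    for (a, b) in edges do
        if a <> b then
            adj.[a].Add(b)
            adj.[b].Add(a)
    // BFS from s: distance to each node (-1 = unreachable) and
    // the number of shortest paths sigma from s to each node
    let bfs s =
        let dist = Array.create n (-1)
        let sigma = Array.zeroCreate<float> n
        dist.[s] <- 0
        sigma.[s] <- 1.0
        let queue = System.Collections.Generic.Queue<int>()
        queue.Enqueue(s)
        while queue.Count > 0 do
            let u = queue.Dequeue()
            for w in adj.[u] do
                if dist.[w] < 0 then
                    dist.[w] <- dist.[u] + 1
                    queue.Enqueue(w)
                if dist.[w] = dist.[u] + 1 then
                    sigma.[w] <- sigma.[w] + sigma.[u]
        dist, sigma
    let results = Array.init n bfs
    Array.init n (fun v ->
        let distV, sigmaV = results.[v]
        let mutable total = 0.0
        for a in 0 .. n - 1 do
            let distA, sigmaA = results.[a]
            for b in a + 1 .. n - 1 do
                let throughV =
                    a <> v && b <> v && distA.[b] > 0
                    && distA.[v] > 0 && distV.[b] > 0
                    && distA.[v] + distV.[b] = distA.[b]
                if throughV then
                    total <- total + sigmaA.[v] * sigmaV.[b] / sigmaA.[b]
        total)

// Path 0-1-2: only node 1 lies between another pair
let pathB = betweenness 3 [ (0, 1); (1, 2) ]
// Cycle 0-1-2-3-0: each node is on one of two shortest paths for one pair
let cycleB = betweenness 4 [ (0, 1); (1, 2); (2, 3); (3, 0) ]
printfn "%A %A" pathB cycleB
```

On the 4-node cycle, node 1 lies on one of the two shortest paths between 0 and 2, so it scores 1/2. By symmetry every node gets 0.5.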

  37. Network structure: How do the movies differ? Size, density, clustering coefficient

  38. Density

  39. Density

  40. Density: $\mathrm{Density} = \dfrac{\text{existing connections}}{\text{potential connections}} = \dfrac{\text{existing connections}}{\frac{1}{2} N (N - 1)}$, where $N$ is the number of nodes
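As a worked instance of the density formula: 4 characters allow ½ · 4 · 3 = 6 potential connections, so 3 existing links give a density of 3/6 = 0.5. The hypothetical helper below computes the same ratio. It assumes a simple undirected graph with no self-loops and no duplicate links. Returning 0 for fewer than two nodes is our own convention, since the formula is undefined there.

```fsharp
// Density = existing connections / potential connections,
// where an undirected simple graph with N nodes has N (N - 1) / 2 potential connections.
// Hypothetical helper: callers pass node and edge counts directly.
let density (nodeCount : int) (edgeCount : int) : float =
    if nodeCount < 2 then 0.0
    else float edgeCount / (0.5 * float nodeCount * float (nodeCount - 1))

// 4 characters with 3 links between them
let example = density 4 3
printfn "%.2f" example
```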

  41. Clustering coefficient

  42. Clustering coefficient

  43. Clustering coefficient

  44. Clustering coefficient

  45. Clustering coefficient

  46. Clustering coefficient

  47. Clustering coefficient: $K_v$ = number of neighbours of $v$; $E_v$ = number of links between neighbours of $v$; $\mathrm{Clustering}(v) = \dfrac{E_v}{\frac{1}{2} K_v (K_v - 1)}$

  48. Clustering coefficient: $K_v$ = number of neighbours of $v$; $E_v$ = number of links between neighbours of $v$; $\mathrm{Clustering}(\text{network}) = \dfrac{1}{N} \sum_{v} \dfrac{E_v}{\frac{1}{2} K_v (K_v - 1)}$
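The two clustering formulas can be sketched in F# for an undirected graph with nodes 0 to n - 1. The sketch makes these choices:

- The function name clustering is hypothetical.
- Duplicate links are collapsed.
- Nodes with fewer than two neighbours are assigned 0. The formula is undefined there, and libraries differ in how they handle such nodes.
- Averaging over all N nodes follows the 1/N in the slide's formula.

```fsharp
open System.Collections.Generic

// Clustering(v) = E_v / (K_v (K_v - 1) / 2) for each node, plus the average over all N nodes.
// Assumption: nodes with K_v < 2 get 0.
let clustering (n : int) (edges : (int * int) list) =
    let neighbours = Array.init n (fun _ -> HashSet<int>())
    for (a, b) in edges do
        if a <> b then
            neighbours.[a].Add(b) |> ignore
            neighbours.[b].Add(a) |> ignore
    let localClustering v =
        let ns = Seq.toArray neighbours.[v]
        let k = ns.Length
        if k < 2 then 0.0
        else
            // E_v: count each link between two neighbours of v once (i < j)
            let mutable e = 0
            for i in 0 .. k - 1 do
                for j in i + 1 .. k - 1 do
                    if neighbours.[ns.[i]].Contains(ns.[j]) then e <- e + 1
            float e / (0.5 * float k * float (k - 1))
    let locals = Array.init n localClustering
    locals, Array.average locals

// Triangle 0-1-2 plus node 3 attached to node 0
let locals, avg = clustering 4 [ (0, 1); (1, 2); (2, 0); (0, 3) ]
printfn "%A %f" locals avg
```

Here node 0 has three neighbours with one link among them, giving 1/3. Nodes 1 and 2 score 1 and node 3 scores 0, so the network average is 7/12.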

  49. Size [Figure: "Number of characters" in each film, Episode 1 to Episode 7]

  50. Density [Figure: "Network density" of each film, Episode 1 to Episode 7; axis: density (%)]

  51. Clustering coefficient [Figure: "Clustering coefficient (transitivity)" of each film, Episode 1 to Episode 7]

  52. CONCLUSIONS

  53. F# Software Foundation (www.fsharp.org): non-profit; community; cross-platform; open-source contributions; books and tutorials; user groups; data science; machine learning; web and cloud; research; commercial support; consulting

  54. The Learning Pyramid

  55. Community chat and Q&A: #fsharp on Twitter; StackOverflow F# tag. Open source on GitHub: Visual F# repo, github.com/Microsoft/visualfsharp; F# compiler and core libraries, github.com/fsharp; F# incubation project space, github.com/fsprojects; FsLab organization repository, github.com/fslaborg. More resources: Scott Wlaschin's

  56. Scott Wlaschin's fsharpforfunandprofit.com. F# books and resources: fsharp.org/about/learning.html

  57. The Force Awakens. Evelina Gabasova: @evelgab, evelina@evelinag.com, www.evelinag.com. Tomas Petricek: @tomaspetricek, tomas@tomasp.net, www.tomasp.net
