SLIDE 1
Spice up your website with Machine Learning!
Evelina Gabasova @evelgab
SLIDE 2
SLIDE 3
F# Snippets
fssnip.net
SLIDE 4
SLIDE 5
Searching through F# snippets
- Over 1600 snippets
- Over 1100 different tags
SLIDE 6
Searching through F# snippets
SLIDE 7
SLIDE 8
Do we need a custom system?
SLIDE 9
SLIDE 10
SLIDE 11
SLIDE 12
Great opportunity to create a custom machine learning system!
SLIDE 13
SLIDE 14
SLIDE 15
SLIDE 16
Nguyen A et al.: Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images. 2015.
SLIDE 17
Using machine learning in production
dependence on training data inputs
SLIDE 18
User-generated inputs
SLIDE 19
SLIDE 20
PART I
Finding related snippets
If you liked this F# code, you'll also like ...
SLIDE 21
Simple information retrieval
common terms
SLIDE 22
Bag of words
- Ignore order of words
- Separate text and code
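A minimal Python sketch of a bag of words, assuming a snippet is a pair of description text and code, and that a simple regular-expression tokenizer is good enough. The names `tokenize` and `bag_of_words` and the `text:`/`code:` prefixes are hypothetical: the prefixes are one way to keep description words separate from code tokens, and the `Counter` throws away word order.

```python
import re
from collections import Counter


def tokenize(s: str) -> list[str]:
    # Hypothetical tokenizer: runs of letters, digits and underscores
    # that start with a letter or underscore (bare numbers are skipped)
    return re.findall(r"[A-Za-z_][A-Za-z0-9_]*", s)


def bag_of_words(description: str, code: str) -> Counter:
    # Description words are case-folded; code tokens keep their case
    # because F# identifiers are case-sensitive.
    bag = Counter("text:" + w.lower() for w in tokenize(description))
    # Prefixes keep "async" in prose apart from the async keyword in code
    bag.update("code:" + t for t in tokenize(code))
    return bag


b1 = bag_of_words("Async download", "let x = async { return 1 }")
b2 = bag_of_words("download Async", "let x = async { return 1 }")
print(b1 == b2)  # True: only counts matter, not word order
```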
SLIDE 23
Term frequency
Snippet 1
Term     Frequency
async    3
x        15
The      2
code     1
...

Snippet 2
Term     Frequency
async
x        15
The      2
code     1
...
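Once a snippet is a bag of words, its term frequency is simply the count of each term. A Python sketch with made-up tokens (not the slide's snippets); `term_frequencies` is a hypothetical name:

```python
from collections import Counter


def term_frequencies(tokens: list[str]) -> Counter:
    # tf(term, snippet): number of occurrences of each term in the snippet
    return Counter(tokens)


tokens = ["async", "x", "async", "The", "code", "x", "async"]
tf = term_frequencies(tokens)
print(tf["async"], tf["x"], tf["List"])  # 3 2 0 (missing terms count as 0)
```

Raw counts also reward filler words such as "The"; the inverse document frequency on the next slide is what down-weights terms that appear everywhere.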
SLIDE 24
Inverse document frequency
Relative importance of terms
$$\mathrm{idf}(\text{term}) = \log \frac{\text{number of snippets}}{\text{number of snippets with the term}}$$
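The IDF formula as a Python sketch, treating each snippet as a set of terms. The slide does not fix the base of the logarithm; the natural log is assumed here (changing the base rescales every weight by the same factor). Returning 0 for a term that no snippet contains is a hypothetical convention to avoid dividing by zero.

```python
import math


def idf(term: str, snippets: list[set[str]]) -> float:
    # Number of snippets that contain the term at least once
    n_with_term = sum(1 for s in snippets if term in s)
    if n_with_term == 0:
        return 0.0  # hypothetical convention for unseen terms
    # Natural log assumed; the slide does not fix the base
    return math.log(len(snippets) / n_with_term)


snippets = [{"async", "x"}, {"async", "List"}, {"Array", "x"}, {"async", "Array"}]
print(idf("async", snippets))  # log(4/3), about 0.288
print(idf("List", snippets))   # log(4/1), about 1.386
```

A term found in every snippet gets log 1 = 0, so it carries no weight at all.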
SLIDE 25
Vector representation: TF-IDF
Term frequency - inverse document frequency
tfidf(term, snippet) = tf(term, snippet) × idf(term)
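Multiplying the two gives a TF-IDF weight for every term of a snippet. A Python sketch, assuming snippets are token lists, the natural log, and that the snippet being weighted is itself part of the corpus (so every term occurs in at least one snippet); `tfidf_vector` is a hypothetical name and the corpus is made up.

```python
import math
from collections import Counter


def tfidf_vector(snippet: list[str], corpus: list[list[str]]) -> dict[str, float]:
    tf = Counter(snippet)
    n = len(corpus)
    vector = {}
    for term, count in tf.items():
        # Document frequency: snippets that contain the term at least once
        df = sum(1 for doc in corpus if term in doc)
        vector[term] = count * math.log(n / df)  # tf x idf
    return vector


corpus = [["async", "async", "x"], ["x", "List"], ["Array", "List"]]
print(tfidf_vector(corpus[0], corpus))
```

Here "async" outweighs "x": it occurs twice and only in this snippet, while "x" also appears in another one.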
SLIDE 26
Demo
SLIDE 27
Vector representation of snippets
Columns: x, List, Array, ...
snippet1: 0.17 ...
snippet2: 0.04 0.001 ...
snippet3: 0.23 0.005 0.31 ...
snippet4: ... ...
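The slides do not say how these vectors are compared to find related snippets. Cosine similarity is a common choice for TF-IDF vectors and is assumed in this Python sketch. The vectors are sparse dictionaries; the `cosine` and `related` helpers and the weights below are hypothetical, not the numbers from the slide.

```python
import math


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    # Dot product over shared terms only (absent terms have weight 0)
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def related(target: str, vectors: dict[str, dict[str, float]], k: int = 2) -> list[str]:
    # Rank every other snippet by similarity to the target, most similar first
    others = [name for name in vectors if name != target]
    others.sort(key=lambda name: cosine(vectors[target], vectors[name]), reverse=True)
    return others[:k]


vectors = {
    "a": {"async": 0.8, "agent": 0.5},
    "b": {"async": 0.7, "agent": 0.4, "x": 0.1},
    "c": {"List": 0.9, "x": 0.2},
}
print(related("a", vectors))  # ['b', 'c']
```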
SLIDE 28
Vector representation of snippets
SLIDE 29
PART II
Suggesting tags
SLIDE 30
SLIDE 31
Suggesting tags
SLIDE 32
Making sense of user-generated tags
async, #async, async mailprocessor, async paraller, Async sequences, asyncseq, asynchronous, Asynchronous Processing, Asynchronous Programming, asynchronous sequence, asynchronous workflows
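Some of this variation is purely cosmetic. A Python sketch of a first cleanup pass, with hypothetical rules chosen for illustration (case-folding, dropping a leading `#`, collapsing whitespace); `normalize_tag` is not from the talk.

```python
import re


def normalize_tag(tag: str) -> str:
    # Hypothetical rules: trim, case-fold, drop leading '#', collapse spaces
    tag = tag.strip().lower().lstrip("#")
    return re.sub(r"\s+", " ", tag)


tags = ["async", "#async", "Async sequences", "asyncseq", "Asynchronous Programming"]
print(sorted({normalize_tag(t) for t in tags}))
```

This merges "async" with "#async", but "asyncseq" and "async sequences" stay apart: string cleanup alone cannot tell that they mean the same thing.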
SLIDE 33
Edit distance
- regex vs. regexp
- sports vs. ports
- pi vs. API
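Edit distance (Levenshtein distance) is the minimum number of single-character insertions, deletions and substitutions that turn one string into another. A Python sketch of the standard two-row dynamic programme, applied to the three pairs above after lowercasing; `edit_distance` is a hypothetical name.

```python
def edit_distance(a: str, b: str) -> int:
    # prev[j] = distance between the first i-1 chars of a and first j chars of b
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


for a, b in [("regex", "regexp"), ("sports", "ports"), ("pi", "API")]:
    print(a, b, edit_distance(a.lower(), b.lower()))  # each pair prints 1
```

All three pairs are one edit apart, yet only regex and regexp name the same thing, so a distance threshold alone would also merge sports with ports and pi with API.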
SLIDE 34
Machine learning
From snippets to tags
SLIDE 35
Associations
- string and parser
- async and MailboxProcessor
- sequence and exception
SLIDE 36
Naive Bayes
Why do you call me naive?
SLIDE 37
Why naive?
- string and parser
- async and MailboxProcessor
- sequence and exception
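The naive assumption is that terms are independent given the tag, so the probability of seeing two terms together is taken to be the product of their individual probabilities. Associated pairs such as string and parser break this. A Python sketch on hypothetical snippets, all assumed to carry the same tag, so every probability below is conditional on that tag; `prob` is a hypothetical helper.

```python
def prob(snippets: list[set[str]], *terms: str) -> float:
    # Fraction of snippets that contain every one of the given terms
    return sum(all(t in s for t in terms) for s in snippets) / len(snippets)


snippets = [
    {"string", "parser"},
    {"string", "parser", "char"},
    {"parser", "string"},
    {"string"},
    {"list"},
    {"list", "map"},
]
observed = prob(snippets, "string", "parser")
naive = prob(snippets, "string") * prob(snippets, "parser")
print(observed, naive)  # 0.5 vs. about 0.333
```

The usual trade-off is that Naive Bayes accepts this error in exchange for needing only per-term counts, which are easy to estimate.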
SLIDE 38
Building a predictor
SLIDE 39
Building a predictor
SLIDE 40
Building a predictor
SLIDE 41
Tag probabilities
Bayes theorem
$$p(A \mid B) = \frac{p(B \mid A)\, p(A)}{p(B)}$$
SLIDE 42
Tag probabilities
Bayes theorem
p(tag ∣ snippet) ∝ p(tag) p(snippet ∣ tag)
SLIDE 43
Tag probabilities
Bayes theorem
$$p(\text{tag} \mid \text{snippet}) \propto p(\text{tag}) \prod_{\text{term}} p(\text{term} \mid \text{tag})$$
SLIDE 44
- 1. Prior probabilities
$$p(\text{tag}) \approx \frac{\text{number of snippets with the tag}}{\text{number of snippets}}$$
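Estimating the prior is a single count. A Python sketch, assuming each snippet is represented only by its set of tags; the tags and the name `prior` are hypothetical.

```python
def prior(tag: str, snippet_tags: list[set[str]]) -> float:
    # p(tag) ~ snippets with the tag / all snippets
    return sum(tag in tags for tags in snippet_tags) / len(snippet_tags)


snippet_tags = [{"async", "agent"}, {"async"}, {"parser"}, {"string", "parser"}]
print(prior("async", snippet_tags))   # 0.5
print(prior("string", snippet_tags))  # 0.25
```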
SLIDE 45
- 2. Tag likelihood
How frequent is the term among snippets that have the tag?
$$p(\text{term} \mid \text{tag}) = \frac{\text{number of snippets with the term and the tag}}{\text{number of snippets with the tag}}$$
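The likelihood is the same kind of count, restricted to the snippets that carry the tag. A Python sketch, assuming each snippet is a pair (set of terms, set of tags); the data is made up, and returning 0 when no snippet has the tag is a hypothetical convention.

```python
def likelihood(term: str, tag: str,
               snippets: list[tuple[set[str], set[str]]]) -> float:
    # Term sets of the snippets that carry the tag
    with_tag = [terms for terms, tags in snippets if tag in tags]
    if not with_tag:
        return 0.0  # hypothetical convention when the tag was never used
    # Fraction of those snippets that also contain the term
    return sum(term in terms for terms in with_tag) / len(with_tag)


snippets = [
    ({"async", "let"}, {"async"}),
    ({"async", "List"}, {"async", "list"}),
    ({"let", "printfn"}, {"async"}),
    ({"List", "map"}, {"list"}),
]
print(likelihood("let", "async", snippets))  # 2/3, about 0.667
```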
SLIDE 46
Naive Bayes prediction
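$$p(\text{tag} \mid \text{snippet}) \propto p(\text{tag}) \prod_{\text{term}} p(\text{term} \mid \text{tag})$$
$$\frac{p(\text{tag} \mid \text{snippet})}{p(\neg\text{tag} \mid \text{snippet})} > 1\,?$$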
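Putting prior and likelihood together gives a yes/no decision per tag. In the ratio, the normalizer p(snippet) appears in both numerator and denominator and cancels, so comparing the two unnormalized scores is enough. A Python sketch, assuming the product runs over the distinct terms of the new snippet and that both p(term | tag) and p(term | ¬tag) are estimated by counting; `score`, `suggest` and the data are hypothetical.

```python
def score(terms: set[str], tag: str, data: list[tuple[set[str], set[str]]],
          negate: bool = False) -> float:
    # Snippets that have the tag (or, with negate=True, that lack it)
    group = [t for t, tags in data if (tag in tags) != negate]
    if not group:
        return 0.0
    p = len(group) / len(data)  # prior p(tag) or p(¬tag)
    for term in terms:
        # Likelihood of the term within the group, estimated by counting
        p *= sum(term in t for t in group) / len(group)
    return p


def suggest(terms: set[str], tag: str, data) -> bool:
    yes = score(terms, tag, data)
    no = score(terms, tag, data, negate=True)
    # yes / no > 1 is equivalent to yes > no, without dividing by zero
    return yes > no


data = [
    ({"async", "let"}, {"async"}),
    ({"async", "return"}, {"async"}),
    ({"List", "map"}, {"list"}),
    ({"List", "let"}, {"list"}),
]
print(suggest({"async", "let"}, "async", data))  # True
print(suggest({"List"}, "async", data))          # False
```

With many terms these products shrink quickly; summing log-probabilities instead is a common way to avoid floating-point underflow. A single term never seen with the tag also drives its score to 0, which is exactly the problem raised on the next slide.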
SLIDE 47
The theory is always nicer
What if there is no snippet tagged async that contains List?
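If no async snippet contains List, the estimate p(List | async) is 0 and wipes out the whole product, however strong the other evidence. A standard remedy, assumed here rather than taken from the talk, is additive (Laplace) smoothing: add a pseudo-count α to each outcome. Because a term is either present or absent (two outcomes), the denominator grows by 2α so the two probabilities still sum to 1. A Python sketch with hypothetical data; `smoothed_likelihood` is a hypothetical name.

```python
def smoothed_likelihood(term: str, tag: str,
                        data: list[tuple[set[str], set[str]]],
                        alpha: float = 1.0) -> float:
    # Term sets of the snippets that carry the tag
    with_tag = [terms for terms, tags in data if tag in tags]
    hits = sum(term in terms for terms in with_tag)
    # (hits + alpha) / (n + 2 * alpha): never exactly 0 for alpha > 0
    return (hits + alpha) / (len(with_tag) + 2 * alpha)


data = [({"async", "let"}, {"async"}), ({"async", "return"}, {"async"})]
print(smoothed_likelihood("List", "async", data))   # 0.25 instead of 0.0
print(smoothed_likelihood("async", "async", data))  # 0.75 instead of 1.0
```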
SLIDE 48
Demo
SLIDE 49
- Do you really need a custom system?
- Domain representation
- What are the important features?
- Machine learning is fun!
SLIDE 50
Learning more
- F# snippets: fssnip.net
- F# snippets on GitHub: github.com/fssnippets
- The F# Foundation: www.fsharp.org
- FsLab Package: www.fslab.org
- Introduction to information retrieval: informationretrieval.org
SLIDE 51
SLIDE 52
Workshop
Polyglot Data Science: The Force Awakens
Friday, April 1
Data science, F#, R, D3.js ... and Star Wars!
SLIDE 53