Spice up your website with Machine Learning! Evelina Gabasova



slide-1
SLIDE 1

Spice up your website with Machine Learning!

Evelina Gabasova @evelgab

slide-2
SLIDE 2

F# Snippets

slide-3
SLIDE 3

F# Snippets

fssnip.net

slide-4
SLIDE 4
slide-5
SLIDE 5

Searching through F# snippets

  • over 1600 snippets
  • over 1100 different tags
slide-6
SLIDE 6

Searching through F# snippets

slide-7
SLIDE 7
slide-8
SLIDE 8

Do we need a custom system?

slide-9
SLIDE 9
slide-10
SLIDE 10
slide-11
SLIDE 11
slide-12
SLIDE 12

Great opportunity to create a custom machine learning system!

slide-13
SLIDE 13
slide-14
SLIDE 14
slide-15
SLIDE 15
slide-16
SLIDE 16

Nguyen A et al.: Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images. 2015.

slide-17
SLIDE 17

Using machine learning in production

dependence on training data inputs

slide-18
SLIDE 18

User-generated inputs

slide-19
SLIDE 19
slide-20
SLIDE 20

PART I

Finding related snippets

If you liked this F# code, you'll also like ...

slide-21
SLIDE 21

Simple information retrieval

common terms

slide-22
SLIDE 22

Bag of words

  • ignore order of words
  • separate text and code

slide-23
SLIDE 23

Term frequency

Snippet 1

Term     Frequency
async    3
x        15
The      2
code     1
...

Snippet 2

Term     Frequency
async
x        15
The      2
code     1
...
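A minimal sketch of counting term frequencies for one snippet, written in Python as a stand-in since the slides show no code of their own. The tokenizer regex, the function name `term_frequencies`, and the sample snippet are all assumptions made for illustration.

```python
import re
from collections import Counter

def term_frequencies(text):
    """Count how often each term occurs, ignoring word order (bag of words)."""
    # Hypothetical tokenizer: identifiers made of letters, digits, underscores.
    terms = re.findall(r"[A-Za-z_][A-Za-z0-9_]*", text)
    return Counter(terms)

# Made-up snippet text, not taken from fssnip.net.
snippet = "let x = async { return x + x }"
tf = term_frequencies(snippet)
print(tf["x"], tf["async"])
```

A `Counter` returns 0 for terms that never occur, which matches the idea of a sparse term table.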

slide-24
SLIDE 24

Inverse document frequency

Relative importance of terms

idf(term) = log( number of snippets / number of snippets with term )
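The inverse document frequency can be sketched in a few lines of Python. This assumes each snippet is already reduced to a set of terms; the function name `idf` and the toy data are hypothetical.

```python
import math

def idf(term, snippets):
    """log(number of snippets / number of snippets containing the term)."""
    containing = sum(1 for terms in snippets if term in terms)
    if containing == 0:
        raise ValueError("term does not occur in any snippet")
    return math.log(len(snippets) / containing)

# Toy corpus: each snippet is a set of its terms (made-up data).
snippets = [
    {"async", "x"},
    {"x", "List"},
    {"x", "code"},
    {"async", "x", "List"},
]
print(idf("x", snippets), idf("code", snippets))
```

A term that appears in every snippet, such as `x` here, gets an idf of zero, so it carries no weight when snippets are compared.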

slide-25
SLIDE 25

Vector representation: TF-IDF

Term frequency - inverse document frequency

tfidf(term, snippet) = tf(term, snippet) × idf(term)
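Putting the two pieces together, here is a rough Python sketch that turns tokenized snippets into sparse TF-IDF vectors. It assumes raw counts for tf and the unsmoothed log ratio for idf; `tfidf_vectors` and the sample data are illustrative names, not code from the talk.

```python
import math
from collections import Counter

def tfidf_vectors(snippets):
    """Return one {term: tf * idf} dictionary per snippet."""
    n = len(snippets)
    # Document frequency: in how many snippets does each term occur?
    doc_freq = Counter(term for terms in snippets for term in set(terms))
    vectors = []
    for terms in snippets:
        tf = Counter(terms)
        vectors.append(
            {term: count * math.log(n / doc_freq[term]) for term, count in tf.items()}
        )
    return vectors

# Made-up tokenized snippets.
snippets = [["async", "x", "x"], ["List", "x"], ["async", "List", "map"]]
vectors = tfidf_vectors(snippets)
print(vectors[0])
```

Storing only the terms that occur keeps the vectors small, since most snippets use a tiny fraction of the vocabulary.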

slide-26
SLIDE 26

Demo

slide-27
SLIDE 27

Vector representation of snippets

Columns: Snippet, x, List, Array, ...
snippet1: 0.17 ...
snippet2: 0.04 0.001 ...
snippet3: 0.23 0.005 0.31 ...
snippet4: ...
...
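Once every snippet is a vector, related snippets can be found by comparing vectors. The slides do not name a similarity measure, so the sketch below assumes cosine similarity, a common choice for TF-IDF vectors. The weights and the helper names `cosine_similarity` and `most_related` are made up.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse {term: weight} vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def most_related(name, vectors):
    """Name of the other snippet whose vector is closest to the given one."""
    others = (other for other in vectors if other != name)
    return max(others, key=lambda o: cosine_similarity(vectors[name], vectors[o]))

# Illustrative weights only, not values from the talk.
vectors = {
    "snippet1": {"List": 0.2, "map": 0.3},
    "snippet2": {"List": 0.1, "map": 0.2},
    "snippet3": {"async": 0.4},
}
print(most_related("snippet1", vectors))
```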

slide-28
SLIDE 28

Vector representation of snippets

slide-29
SLIDE 29

PART II

Suggesting tags

slide-30
SLIDE 30
slide-31
SLIDE 31

Suggesting tags

slide-32
SLIDE 32

Making sense of user-generated tags

async, #async, async mailprocessor, async paraller, Async sequences, asyncseq, asynchronous, Asynchronous Processing, Asynchronous Programming, asynchronous sequence, asynchronous workflows

slide-33
SLIDE 33

Edit distance

  • regex vs. regexp
  • sports vs. ports
  • pi vs. API
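The slide's examples suggest why edit distance alone is risky: a distance of one links true variants and unrelated tags alike. Below is a sketch of the standard Levenshtein distance in Python; whether the talk uses this exact variant, or ignores case, is an assumption.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, and substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]

for left, right in [("regex", "regexp"), ("sports", "ports"), ("pi", "api")]:
    print(left, right, edit_distance(left, right))
```

All three lowercase pairs are one edit apart, although only the first pair names the same concept.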

slide-34
SLIDE 34

Machine learning

From snippets to tags

slide-35
SLIDE 35

Associations

  • string and parser
  • async and MailboxProcessor
  • sequence and exception

slide-36
SLIDE 36

Naive Bayes

Why do you call me naive?

slide-37
SLIDE 37

Why naive?

  • string and parser
  • async and MailboxProcessor
  • sequence and exception

slide-38
SLIDE 38

Building a predictor

slide-39
SLIDE 39

Building a predictor

slide-40
SLIDE 40

Building a predictor

slide-41
SLIDE 41

Tag probabilities

Bayes theorem

p(A ∣ B) = p(B ∣ A) p(A) / p(B)

slide-42
SLIDE 42

Tag probabilities

Bayes theorem

p(tag ∣ snippet) ∝ p(tag) p(snippet ∣ tag)

slide-43
SLIDE 43

Tag probabilities

Bayes theorem

p(tag ∣ snippet) ∝ p(tag) ∏_term p(term ∣ tag)

slide-44
SLIDE 44
1. Prior probabilities

p(tag) ≈ Number of snippets with the tag / Number of snippets
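As a rough illustration, the prior for each tag is just a normalized count. The sketch assumes the data is a list of tag lists, one per snippet; `tag_priors` and the sample tags are hypothetical.

```python
from collections import Counter

def tag_priors(tags_per_snippet):
    """p(tag) = snippets carrying the tag / total number of snippets."""
    total = len(tags_per_snippet)
    counts = Counter(tag for tags in tags_per_snippet for tag in set(tags))
    return {tag: count / total for tag, count in counts.items()}

# Made-up tag assignments for four snippets.
tags_per_snippet = [["async"], ["async", "parsing"], ["list"], ["async", "list"]]
priors = tag_priors(tags_per_snippet)
print(priors)
```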

slide-45
SLIDE 45
2. Tag likelihood

How frequent is the term among snippets that have the tag?

p(term ∣ tag) = Number of snippets with the term and tag / Number of snippets with the tag
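The likelihood can be sketched the same way, as a ratio of counts. This assumes each snippet is a pair of a term set and a tag set; the name `term_likelihood` and the data are invented for the example.

```python
def term_likelihood(term, tag, snippets):
    """p(term | tag): share of snippets with the tag that contain the term."""
    with_tag = [terms for terms, tags in snippets if tag in tags]
    if not with_tag:
        raise ValueError("no snippet carries this tag")
    with_both = sum(1 for terms in with_tag if term in terms)
    return with_both / len(with_tag)

# Each entry: (terms in the snippet, tags on the snippet). Made-up data.
snippets = [
    ({"async", "MailboxProcessor"}, {"async"}),
    ({"async", "List"}, {"async"}),
    ({"List", "map"}, {"list"}),
]
print(term_likelihood("MailboxProcessor", "async", snippets))
```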

slide-46
SLIDE 46

Naive Bayes prediction

p(tag ∣ snippet) ∝ p(tag) ∏_term p(term ∣ tag)

p(tag ∣ snippet) / p(¬tag ∣ snippet) > 1?
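A small Python sketch of the decision rule for a single tag. It assumes the product runs over the terms present in the new snippet, and it compares the two unnormalized scores directly instead of dividing them, which gives the same answer as the ratio test whenever the denominator is nonzero. The names `score` and `predict_tag` and the training data are hypothetical.

```python
def score(snippet_terms, labelled, has_tag):
    """Unnormalised p(class) * product over terms of p(term | class)."""
    group = [terms for terms, tagged in labelled if tagged == has_tag]
    if not group:
        return 0.0
    result = len(group) / len(labelled)  # prior of the class
    for term in snippet_terms:
        with_term = sum(1 for terms in group if term in terms)
        result *= with_term / len(group)  # likelihood of the term
    return result

def predict_tag(snippet_terms, labelled):
    """Suggest the tag when its score beats the score for 'not tagged'."""
    return score(snippet_terms, labelled, True) > score(snippet_terms, labelled, False)

# (terms in snippet, does it carry the tag "async"?). Made-up data.
labelled = [
    ({"async", "MailboxProcessor"}, True),
    ({"async", "return"}, True),
    ({"List", "map"}, False),
    ({"async", "List"}, False),
]
print(predict_tag({"async"}, labelled), predict_tag({"List"}, labelled))
```

With longer snippets the product of many small probabilities can underflow, so summing logarithms is a common alternative.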

slide-47
SLIDE 47

The theory is always nicer

What if there is no snippet tagged async that contains List?
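In that case the estimated p(List ∣ async) is zero, and a single zero factor wipes out the whole product. The slide does not say how the talk handles this; one common remedy, assumed here, is add-one (Laplace) smoothing of the counts. The function name and data below are illustrative.

```python
def smoothed_likelihood(term, snippets_with_tag, alpha=1.0):
    """p(term | tag) with additive smoothing.

    alpha pseudo-counts are added for both outcomes (term present,
    term absent), so the estimate is never exactly 0 or 1.
    """
    with_term = sum(1 for terms in snippets_with_tag if term in terms)
    return (with_term + alpha) / (len(snippets_with_tag) + 2 * alpha)

# Made-up snippets tagged async; none of them contains List.
async_snippets = [{"async", "MailboxProcessor"}, {"async", "return"}]
print(smoothed_likelihood("List", async_snippets))
print(smoothed_likelihood("async", async_snippets))
```

Setting `alpha=0` recovers the plain ratio of counts, including its zeros.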

slide-48
SLIDE 48

Demo

slide-49
SLIDE 49

  • Do you really need a custom system?
  • Domain representation
  • What are important features
  • Machine learning is fun!

slide-50
SLIDE 50

Learning more

  • F# snippets: fssnip.net
  • F# snippets on GitHub: github.com/fssnippets
  • The F# Foundation: www.fsharp.org
  • FsLab Package: www.fslab.org
  • Introduction to information retrieval: informationretrieval.org

slide-51
SLIDE 51
slide-52
SLIDE 52

Workshop

Polyglot Data Science: The Force Awakens

Friday, April 1

Data science, F#, R, D3.js ... and Star Wars!

slide-53
SLIDE 53

Thank you!

@evelgab github.com/evelinag evelinag.com