
Spice up your website with Machine Learning! Evelina Gabasova - PowerPoint PPT Presentation



  1. Spice up your website with Machine Learning! Evelina Gabasova @evelgab

  2. F# Snippets

  3. F# Snippets fssnip.net

  4. Searching through F# snippets: over 1600 snippets, over 1100 different tags

  5. Searching through F# snippets

  6. Do we need a custom system?

  7. Great opportunity to create a custom machine learning system!

  8. Nguyen A et al.: Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images. 2015.

  9. Using machine learning in production: dependence on training data, inputs

  10. User-generated inputs

  11. PART I: Finding related snippets. If you liked this F# code, you'll also like ...

  12. Simple information retrieval: common terms

  13. Bag of words: ignore order of words; separate text and code

  14. Term frequency. Snippet 1: async 3, x 15, The 2, code 1, ... Snippet 2: async 0, x 15, The 2, code 1, ...

  15. Inverse document frequency: relative importance of terms. $\mathrm{idf}(\text{term}) = \log \frac{\text{number of snippets}}{\text{number of snippets with term}}$

  16. Vector representation: TF-IDF (term frequency - inverse document frequency). $\mathrm{tfidf}(\text{term}, \text{snippet}) = \mathrm{tf}(\text{term}, \text{snippet}) \times \mathrm{idf}(\text{term})$
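The term frequency, inverse document frequency, and TF-IDF definitions on the slides above can be sketched in a few lines. The demo code is not part of this transcript, so the following is only a minimal Python illustration; the function names and the tiny pre-tokenized corpus are hypothetical, and term frequency is taken to be the raw count, as in the slide's table.

```python
import math
from collections import Counter

def term_frequencies(tokens):
    """Count how often each term occurs in one snippet (bag of words)."""
    return Counter(tokens)

def inverse_document_frequencies(snippets):
    """idf(term) = log(number of snippets / number of snippets with term)."""
    n = len(snippets)
    doc_counts = Counter()
    for tokens in snippets:
        doc_counts.update(set(tokens))  # count each term once per snippet
    return {term: math.log(n / count) for term, count in doc_counts.items()}

def tfidf(tokens, idf):
    """tfidf(term, snippet) = tf(term, snippet) * idf(term)."""
    tf = term_frequencies(tokens)
    return {term: count * idf[term] for term, count in tf.items()}

# Hypothetical, already-tokenized snippets (not taken from fssnip.net)
snippets = [
    ["async", "async", "async", "x", "code"],
    ["x", "code", "List"],
    ["x", "List", "Array"],
]
idf = inverse_document_frequencies(snippets)
vectors = [tfidf(tokens, idf) for tokens in snippets]
```

A term such as `x` that appears in every snippet gets an idf of zero, so it contributes nothing to any vector, while a term confined to one snippet is weighted most heavily.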

  17. Demo

  18. Vector representation of snippets. Columns: x, List, Array, ... snippet1: 0, 0.17, 0, ...; snippet2: 0, 0.04, 0.001, ...; snippet3: 0.23, 0.005, 0.31, ...; snippet4: 0, 0, 0, ...; ...

  19. Vector representation of snippets
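Once every snippet is a vector, "related" snippets can be found by comparing vectors. The slides do not say which similarity measure the demo uses; cosine similarity is a common choice in information retrieval and is assumed here. This is a Python sketch with hypothetical function names, using only the three columns visible on slide 18 (the elided columns are ignored).

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero vector is treated as unrelated to everything
    return dot / (norm_a * norm_b)

def most_related(name, vectors, top=3):
    """Rank all other snippets by similarity to the named snippet."""
    query = vectors[name]
    scores = [
        (other, cosine_similarity(query, vec))
        for other, vec in vectors.items()
        if other != name
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top]

# Zero entries are simply left out of the sparse representation
vectors = {
    "snippet1": {"List": 0.17},
    "snippet2": {"List": 0.04, "Array": 0.001},
    "snippet3": {"x": 0.23, "List": 0.005, "Array": 0.31},
    "snippet4": {},
}
ranking = most_related("snippet1", vectors)
```

With these values snippet2 ranks first for snippet1, because both put almost all of their weight on `List`.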

  20. PART II: Suggesting tags

  21. Suggesting tags

  22. Making sense of user-generated tags: async, #async, async mailprocessor, async paraller, Async sequences, asyncseq, asynchronous, Asynchronous Processing, Asynchronous Programming, asynchronous sequence, asynchronous workflows

  23. Edit distance: regex vs. regexp, sports vs. ports, pi vs. API
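The pairs on the edit-distance slide can be checked with a standard Levenshtein distance. The sketch below is a plain Python implementation, not code from the talk; comparing tags in lower case is an assumption made here.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, and substitutions turning a into b."""
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]

pairs = [("regex", "regexp"), ("sports", "ports"), ("pi", "API")]
distances = {
    (a, b): edit_distance(a.lower(), b.lower()) for a, b in pairs
}
```

All three pairs come out at distance 1, although only the first looks like two spellings of the same tag. One reading of the slide is that edit distance alone cannot separate variants from unrelated tags.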

  24. Machine learning: from snippets to tags

  25. Associations: string and parser, async and MailboxProcessor, sequence and exception

  26. Naive Bayes: Why do you call me naive?

  27. Why naive? string and parser, async and MailboxProcessor, sequence and exception

  28. Building a predictor

  29. Building a predictor

  30. Building a predictor

  31. Tag probabilities: Bayes theorem. $p(A \mid B) = \frac{p(B \mid A)\, p(A)}{p(B)}$

  32. Tag probabilities: Bayes theorem. $p(\text{tag} \mid \text{snippet}) \propto p(\text{tag})\, p(\text{snippet} \mid \text{tag})$

  33. Tag probabilities: Bayes theorem. $p(\text{tag} \mid \text{snippet}) \propto p(\text{tag}) \prod_{\text{term}} p(\text{term} \mid \text{tag})$
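The step from slide 32 to slide 33 relies on the "naive" assumption, spelled out here as a short derivation. The indexing of terms as $\text{term}_1, \dots, \text{term}_n$ is notation introduced for this sketch only: the terms of a snippet are treated as conditionally independent given the tag, so the joint likelihood factorizes.

```latex
\begin{align*}
p(\text{snippet} \mid \text{tag})
  &= p(\text{term}_1, \dots, \text{term}_n \mid \text{tag}) \\
  &= \prod_{i=1}^{n} p(\text{term}_i \mid \text{tag})
     && \text{(conditional independence)} \\
p(\text{tag} \mid \text{snippet})
  &\propto p(\text{tag}) \prod_{i=1}^{n} p(\text{term}_i \mid \text{tag})
\end{align*}
```

The normalizing constant $p(\text{snippet})$ from Bayes theorem is dropped because it is the same for every tag.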

  34. 1. Prior probabilities: $p(\text{tag}) \approx \frac{\text{number of snippets with the tag}}{\text{number of snippets}}$

  35. 2. Tag likelihood: how frequent is the term among snippets that have the tag? $p(\text{term} \mid \text{tag}) = \frac{\text{number of snippets with the term and tag}}{\text{number of snippets with the tag}}$

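  36. Naive Bayes prediction: $p(\text{tag} \mid \text{snippet}) \propto p(\text{tag}) \prod_{\text{term}} p(\text{term} \mid \text{tag})$. Decision: $\frac{p(\text{tag} \mid \text{snippet})}{p(\neg\text{tag} \mid \text{snippet})} > 1$?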
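The prior, likelihood, and decision rule from slides 34 to 36 can be put together as follows. This is a Python sketch with hypothetical names and made-up training data, not the talk's implementation. It assumes that $p(\neg\text{tag})$ and $p(\text{term} \mid \neg\text{tag})$ are estimated the same way from the snippets that lack the tag, and it compares the two scores directly instead of dividing, which is equivalent to testing the ratio against 1 whenever the denominator is positive.

```python
def score(terms, group, total):
    """Prior times product of term likelihoods for one group of snippets.

    group: term sets of the snippets that have (or lack) the tag
    total: number of snippets in the whole training set
    """
    if not group:
        return 0.0
    result = len(group) / total  # prior: share of snippets in this group
    for term in terms:
        with_term = sum(term in snippet_terms for snippet_terms in group)
        result *= with_term / len(group)  # p(term | group)
    return result

def suggests_tag(terms, snippets, tag):
    """True when the tag is more probable than its absence."""
    with_tag = [t for t, tags in snippets if tag in tags]
    without_tag = [t for t, tags in snippets if tag not in tags]
    total = len(snippets)
    return score(terms, with_tag, total) > score(terms, without_tag, total)

# Hypothetical training data: (terms in the snippet, tags on the snippet)
snippets = [
    ({"async", "MailboxProcessor", "let"}, {"async"}),
    ({"async", "let"}, {"async", "agent"}),
    ({"List", "map", "let"}, {"list"}),
    ({"List", "let", "async"}, {"list"}),
]

is_async = suggests_tag({"async", "let"}, snippets, "async")
is_async_for_list_code = suggests_tag({"List", "let"}, snippets, "async")
```

In this toy data a snippet containing `async` and `let` is suggested the `async` tag, while one containing `List` and `let` is not.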

  37. The theory is always nicer: what if there is no snippet tagged async that contains List?
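The problem raised on this slide is that a single zero likelihood wipes out the whole product. The transcript does not say how the talk handles it. A common remedy, assumed here, is add-one (Laplace) smoothing, shown in a short Python sketch with a hypothetical `likelihood` function and made-up data.

```python
def likelihood(term, tagged_snippets, alpha=0.0):
    """Estimate p(term | tag) from the term sets of snippets with the tag.

    alpha = 0 gives the plain estimate from slide 35.
    alpha = 1 gives add-one (Laplace) smoothing; the denominator grows by
    2 * alpha because a term is either present in or absent from a snippet.
    """
    count = sum(term in terms for terms in tagged_snippets)
    return (count + alpha) / (len(tagged_snippets) + 2 * alpha)

# Hypothetical snippets tagged "async"; none of them contains List
async_snippets = [{"async", "let"}, {"async", "MailboxProcessor"}]

unsmoothed = likelihood("List", async_snippets)           # 0 / 2
smoothed = likelihood("List", async_snippets, alpha=1.0)  # 1 / 4
```

With smoothing, an unseen term lowers the score for the tag without forcing it to zero.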

  38. Demo

  39. Do you really need a custom system? Domain representation; what are important features; machine learning is fun!

  40. Learning more: F# snippets (fssnip.net); F# snippets on GitHub (github.com/fssnippets); The F# Foundation (www.fsharp.org); FsLab Package (www.fslab.org); Introduction to information retrieval (informationretrieval.org)

  41. Workshop: Polyglot Data Science: The Force Awakens. Friday, April 1. Data science, F#, R, D3.js ... and Star Wars!

  42. Thank you! @evelgab github.com/evelinag evelinag.com
