Fast & Effective: Natural Language Understanding, Mike Conover


  1. Fast & Effective: Natural Language Understanding Mike Conover, Ph.D. Principal Data Scientist

  2. SkipFlag • Smart Knowledge Base • Instant Answers • Expert Identification • Intelligent Bot

  3. Smart Knowledge Base • Entity Graph • Projects & Jargon • Relevant Articles • Documentation • Source Code

  4. Prototype Rapidly: Or how to solve open research problems in a production environment on deadline.

  5. Reflections: Exercise is good for you.

  6. Reflections: Start with the model the state of the art claims to beat and implement that.

  7. Containers & Model Deployment

  8. Tiered Metadata Architecture • Compute-local data access • Memory-constrained environments • Fast bulk write

  9. Language in the Wild • Wikipedia: Linked, Structured, Taxonomic • Twitter: Cornucopia of Malformed Text • Common Crawl: Petabyte Scale Web Crawl, Available for Free on S3

  10. Word Embeddings occupy

  11. “All models are wrong, but some are useful.” George Box

  12. Who Needs Grammar, Anyway? • Azimuth, Declination, Percolate, ..: M’s of Dimensions (.5, .9, .01) • LSA / LDA, etc. • Orienteering, Physics, ..: 100’s of Dimensions (.9, 0.1)

  13. Targets of Interest • Document Clusters • Ranking Feature • EDA • Classification • Feature Engineering

  14. Semantic Structure • Gender: Man, Woman, King, Queen • Geography: Rome, Italy, Tokyo, Japan • Superlatives: Good, Better, Best
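
The gender grouping on slide 14 is usually demonstrated with vector arithmetic: king - man + woman lands near queen. The sketch below illustrates that idea only. The 3-d vectors are invented for this example and are not taken from any trained model, and `analogy` is a hypothetical helper name.

```python
import math

# Toy 3-d vectors invented for illustration: (person, female, royal).
# Real embeddings are learned and have tens to hundreds of dimensions.
TOY_VECTORS = {
    "man":   [1.0, 0.0, 0.0],
    "woman": [1.0, 1.0, 0.0],
    "king":  [1.0, 0.0, 1.0],
    "queen": [1.0, 1.0, 1.0],
    "good":  [0.2, 0.1, 0.0],  # distractor word
}

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

def analogy(a, b, c, vectors=TOY_VECTORS):
    """Return the word closest to vec(b) - vec(a) + vec(c), excluding the inputs."""
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(vectors[w], target))

print(analogy("man", "king", "woman"))  # queen
```

With pre-trained vectors the same lookup would run over the full vocabulary, and the nearest neighbour is not guaranteed to be the expected word.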

  15. Embedding Vectors • "The sky above the port was the color of television, tuned to a dead channel." • Word vectors (the, sky, above, .., channel) along the Embedding Dimension • Document Vector
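
Slide 15 shows per-word vectors being combined into a single document vector. The slide does not say how they are combined; a common baseline, assumed here, is to average the vectors of the words that have an embedding. The 2-d vectors and the `document_vector` helper are made up for illustration.

```python
import re

# Toy 2-d vectors, invented for illustration (not from a trained model).
TOY_VECTORS = {
    "the": [0.0, 2.0],
    "sky": [6.0, 0.0],
    "above": [3.0, 3.0],
    "channel": [3.0, 3.0],
}

def document_vector(text, vectors=TOY_VECTORS):
    """Average the vectors of all in-vocabulary tokens (mean pooling)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return []  # no token had an embedding
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]

sentence = "The sky above the port was the color of television, tuned to a dead channel."
doc_vec = document_vector(sentence)
print(doc_vec)  # [2.0, 2.0]
```

Mean pooling ignores word order, which fits the "Who Needs Grammar, Anyway?" framing earlier in the deck; weighted schemes such as TF-IDF weighting are a frequent variation.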

  16. Word Embeddings (each entry lists corpus, casing, dimensionality, and size) • GloVe Vectors, https://nlp.stanford.edu/projects/glove/ ‒ Wikipedia 2014 + Gigaword 5 (6B tokens, 400K vocab, uncased, 50d, 100d, 200d, & 300d vectors, 822 MB) ‒ Common Crawl (42B tokens, 1.9M vocab, uncased, 300d vectors, 1.75 GB) ‒ Common Crawl (840B tokens, 2.2M vocab, cased, 300d vectors, 2.03 GB) ‒ Twitter (2B tweets, 27B tokens, 1.2M vocab, uncased, 25d, 50d, 100d, & 200d vectors, 1.42 GB) • Word2Vec ‒ Google News (100B tokens, 3M vocab, 300d) ‒ Freebase (100B words, 1.4M vocab, 300d)
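
The GloVe downloads listed on slide 16 are distributed as plain text, one token per line followed by its space-separated vector components. A minimal loader sketch under that assumption follows; `load_glove` is a hypothetical helper, and the two sample lines are invented stand-ins for a real file.

```python
import io

def load_glove(lines):
    """Parse GloVe-style text: '<token> <float> <float> ...' on each line."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors

# Invented sample standing in for a downloaded vector file.
sample = io.StringIO("the 0.1 0.2 0.3\nsky -0.4 0.5 0.6\n")
vectors = load_glove(sample)
print(len(vectors), vectors["sky"])
```

For a real file, the same function can be fed an open file handle, for example `with open(path, encoding="utf-8") as f: vectors = load_glove(f)`, where `path` points at whichever download was chosen.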

  17. Build Your Own Embeddings • Out of the Box • Word2Vec • Doc2Vec • Poincaré Embeddings • LDA / LSA

  18. TensorFlow Embedding Projector • Text • Images • Music • .. Get Crazy

  19. Compositional Embeddings • Domain-Specific Corpora • Initialize with Pre-trained Embeddings

  20. Cut to the Chase • FastText ‒ Multiclass Classification ‒ Subword Embeddings (trichlorodifluorene: trichl, .., fluor) https://github.com/facebookresearch/fastText Bojanowski, Piotr, et al. "Enriching word vectors with subword information." arXiv (2016)
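
The subword idea on slide 20 (trichlorodifluorene broken into pieces such as "trichl" and "fluor") follows the character n-gram scheme in Bojanowski et al.: wrap the word in boundary markers `<` and `>`, then take every n-gram for n between 3 and 6. The sketch below shows only the n-gram extraction; `char_ngrams` is a hypothetical helper, not part of the fastText API, and it omits the hashing and vector summation the library performs.

```python
def char_ngrams(word, min_n=3, max_n=6):
    """List all character n-grams of '<word>' for min_n <= n <= max_n."""
    wrapped = f"<{word}>"  # boundary markers distinguish prefixes and suffixes
    grams = []
    for n in range(min_n, max_n + 1):
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i:i + n])
    return grams

print(char_ngrams("where", 3, 3))  # ['<wh', 'whe', 'her', 'ere', 're>']

grams = char_ngrams("trichlorodifluorene")
print("trichl" in grams, "fluor" in grams)  # True True
```

Because a rare or unseen word shares n-grams with known words, its vector can be composed from subword vectors instead of being treated as out of vocabulary.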

  21. Embed All the Things! • StarSpace ‒ Text Classification ‒ Graph Embeddings ‒ Similarity / Ranking ‒ Image Classification https://github.com/facebookresearch/StarSpace Wu, L., et al. "StarSpace: Embed All The Things!" arXiv (2017)

  22. Fine-Grained Structure • displaCy

  23. Breakdown • displaCy • www.sadtromebone.com

  24. Piece by Piece • Keyphrase Extraction ‒ RAKE Algorithm ‒ Segphrase / Autophrase • Example segmentation: graham_askew | a | biomechanics_professor | at the | university_of_leeds | in | england | leads research | to | understand | better | how | the | chambered_nautilus | moves • Diaz, Fernando, et al. "Query expansion with locally-trained word embeddings." arXiv (2016)
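
Of the keyphrase extractors named on slide 24, RAKE is simple enough to sketch: split the text into candidate phrases at stopwords and punctuation, score each word as degree / frequency, and score a phrase as the sum of its word scores. The version below is a simplified illustration, not the deck's implementation. The stopword list is a tiny invented one, so the output differs from the Segphrase-style segmentation on the slide (for example, "of" splits "university of leeds").

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list; a real run would use a full list.
STOPWORDS = {"a", "at", "the", "of", "in", "to", "how"}

def rake(text, stopwords=STOPWORDS):
    """Return (phrase, score) pairs sorted by descending RAKE score."""
    phrases = []
    for fragment in re.split(r"[.,;:!?()]", text.lower()):
        current = []
        for word in re.findall(r"[a-z0-9'-]+", fragment):
            if word in stopwords:
                if current:
                    phrases.append(current)
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(current)

    freq = defaultdict(int)    # how often each word occurs in candidates
    degree = defaultdict(int)  # summed length of the phrases containing it
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)

    scored = {" ".join(p): sum(degree[w] / freq[w] for w in p) for p in phrases}
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)

text = ("Graham Askew, a biomechanics professor at the University of Leeds "
        "in England, leads research to understand better how the chambered "
        "nautilus moves")
ranked = rake(text)
print(ranked[0])  # ('chambered nautilus moves', 9.0)
```

The degree / frequency ratio favours words that appear inside longer phrases, which is why the three-word candidate outranks the two-word ones here.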

  25. Taking Sentences Apart • Zeroth Law: This only works in practice, never in theory. • displaCy

  26. Learning to Rank with Neural Nets • Sometimes Good Enough Isn’t Good Enough • Severyn, Aliaksei, and Alessandro Moschitti. "Learning to rank short text pairs with convolutional deep neural networks." SIGIR 2015. http://www.wildml.com/2015/11/understanding-convolutional-neural-networks-for-nlp/

  28. Pete Skomoroch • Sam Shah • Scott Blackburn • Matt Hayes

  29. Fast & Effective: Natural Language Understanding Mike Conover, Ph.D. Principal Data Scientist

  30. Cut to the Chase • Emoji Space https://github.com/facebookresearch/fastText Bojanowski, Piotr, et al. "Enriching word vectors with subword information." arXiv (2016)

  31. Build Your Own Embeddings • Paragraph Vectors (Doc2Vec) • Le, Quoc, and Tomas Mikolov. "Distributed representations of sentences and documents." International Conference on Machine Learning. 2014.

  32. Ship It!

  33. Instant Answers
