A tour of machine learning ... guided by a complete amateur
Thomas Dullien, Google
Topics to cover
- 1. Logistic regression
- 2. Word embeddings
- 3. t-SNE
- 4. Deep Networks (and some transfer learning)
- 5. Hidden Markov Models for sequence tagging
- 6. Conditional Random Fields for sequence tagging
- 7. Reinforcement learning
- 8. Approximate NN and k-NN methods
- 9. Tree ensemble methods
Logistic Regression
- Also known as “maximum entropy modelling”
- Mathematically simple, easy to diagnose / inspect
- Idea: Approximate a conditional probability
distribution from (labeled) training data
- Consider k output classes and n features
Logistic Regression
- Parameters that are learnt are a k x n matrix of
weights
- Easily diagnosable: for each decision, the contribution of
each feature can be easily read off
- Features need to be provided / engineered
- Various subtleties need to be observed:
○ Lots of correlated features can make training convergence arbitrarily slow
○ Features with arbitrary values can be permitted
○ Various optimization algorithms: “Iterative Scaling”, L-BFGS, SGD
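To make the diagnosability point concrete, here is a minimal sketch using scikit-learn (the toy data and feature count are made up for illustration):

```python
# A minimal logistic regression sketch; the toy data here is made up.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))               # 200 samples, n = 5 features
y = (X[:, 0] + X[:, 1] > 0).astype(int)     # 2 output classes

clf = LogisticRegression().fit(X, y)        # default solver is L-BFGS

# The learnt weights are the per-class, per-feature weight matrix;
# each entry shows how strongly a feature pushes towards a class.
print(clf.coef_)
print(clf.predict_proba(X[:3]))             # conditional class probabilities
```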
Logistic Regression
Example implementations: Maxent Toolkit:
https://homepages.inf.ed.ac.uk/lzhang10/maxent_toolkit.html
Tensorflow Tutorial:
https://www.tensorflow.org/get_started/mnist/beginners
Word embeddings
- Extracting “meaning” from a word is difficult
- Words in a language are often related, but this relationship is not easily inferred from the written form of the word
- Letter-by-letter similarity does not imply any semantic similarity
- Is it possible to build a dictionary
that maps words into a space where some semantic relationships are represented?
- Yes - word2vec et al.
Word embeddings
- Idea: Try to train a model that predicts contexts for a
given word
- Train in a way that produces a vector representation of the word
- Vector representations are then used as stand-in for
the written word in further applications
Word embeddings: Word2Vec
“The quick brown fox jumped over the lazy dog”
[Diagram: a sliding window over the sentence, pairing each target word with its surrounding context words]
Word embeddings: Word2Vec
Let the training data be pairs of target words and their contexts, drawn from a sliding window of size $c$ over the corpus $(w_1, \dots, w_T)$. Then optimize the following (skip-gram) objective:

$$\frac{1}{T} \sum_{t=1}^{T} \sum_{\substack{-c \le j \le c \\ j \neq 0}} \log p(w_{t+j} \mid w_t), \qquad p(w_O \mid w_I) = \frac{\exp\left({v'_{w_O}}^{\top} v_{w_I}\right)}{\sum_{w} \exp\left({v'_{w}}^{\top} v_{w_I}\right)}$$
Word embeddings: Word2Vec
- “For each word, find two vectors v_in and v_out so that the performance of the prediction of the words surrounding it is maximized.”
- Words used in similar contexts are “close” in the embedding.
- Strange results of the embedding: the vectors were successfully used for solving analogies.
- Some controversy exists about how much semantics are extracted, and whether the strange linear relationships are better explained by “noise”.
Word embeddings: Word2Vec
Example implementation: https://github.com/dav/word2vec
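A minimal sketch of training embeddings with the gensim library instead (assumes gensim 4.x; the tiny corpus is a placeholder, far too small for meaningful vectors):

```python
# Train skip-gram word2vec on a toy corpus with gensim.
from gensim.models import Word2Vec

sentences = [
    ["the", "quick", "brown", "fox", "jumped", "over", "the", "lazy", "dog"],
    ["the", "lazy", "dog", "slept", "all", "day"],
]
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)  # sg=1: skip-gram

print(model.wv["fox"])                # the learnt vector for a word
print(model.wv.most_similar("dog"))   # nearest words in embedding space
```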
t-SNE
- Common problem in ML: Understanding relationships
between high-dimensional vectors
- Difficult to plot :-)
- t-SNE: Commonly used algorithm to visualize
high-dimensional data in 2D or 3D
- Attempts to optimize a mapping so that nearby points stay close in the projection, and far-apart points stay distant in the projection
t-SNE
Example implementation: https://github.com/lvdmaaten/bhtsne/
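For quick experiments, scikit-learn also ships an implementation; a minimal sketch (random data as a stand-in for real vectors):

```python
# Project random 50-dimensional vectors down to 2D for plotting.
import numpy as np
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

X = np.random.default_rng(0).normal(size=(500, 50))
X_2d = TSNE(n_components=2, perplexity=30).fit_transform(X)

plt.scatter(X_2d[:, 0], X_2d[:, 1], s=5)
plt.show()
```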
Deep Neural Networks
- Big hype since Hinton’s 2006 breakthrough results
- Didn’t work for decades, started working in 2006
- Reasons why they started working are still poorly
understood
Deep Neural Networks
- Last layer is just logistic regression
- Lower layers can be viewed as feature extractors for the last-layer logistic regression
Deep Neural Networks
- Mathematically, essentially iterated matrix multiplication with an interleaved nonlinear function
- Each layer is of the form $x_{l+1} = \sigma(W_l x_l + b_l)$
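A minimal numpy sketch of that layer form (shapes chosen arbitrarily):

```python
import numpy as np

def layer(x, W, b, nonlin=np.tanh):
    # One layer: matrix multiplication, then a pointwise non-linearity.
    return nonlin(W @ x + b)

rng = np.random.default_rng(0)
x = rng.normal(size=4)
h = layer(x, rng.normal(size=(8, 4)), np.zeros(8))     # first layer
out = layer(h, rng.normal(size=(3, 8)), np.zeros(3))   # second layer; iterate for depth
```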
Deep Neural Networks
- Structure of the DNN is encoded in restrictions on the
shape of the matrices
- Convolutional NN’s also force many weights in the
lower layers to be the same (translation invariance, locality)
- Modern DNNs often use ReLU etc. instead of the sigmoid; many other non-linear options exist
Deep Neural Networks
- Huge success in areas where feature engineering was traditionally very hard:
○ Image processing tasks
○ Speech recognition tasks
○ ...
- Data-hungry: many parameters to estimate, so clearly one needs a fair amount of data to estimate them well
- Good way to think about non-recurrent DNNs:
Sophisticated feature extractors for logistic regression.
Deep Neural Networks
Lots of competing implementations now; simply google “deep learning framework”: Tensorflow, Keras, Torch, Caffe, etc.
Transfer learning
- Lower layers of DNN extract structure from input
- Image processing example: Edge detection, shapes
etc.
- Low-level features for task A may be useful features
for task B, too
- Transfer learning: Take DNN trained on task A, then
try to re-train it to perform task B
- Examples: Google's Inception NN, the Hotdog / Not Hotdog app; another example later
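A hedged Keras sketch of the idea: take a network pre-trained on task A (ImageNet classification), freeze it, and re-train only a new head for task B. NUM_CLASSES and the training data are placeholders:

```python
import tensorflow as tf

# Pre-trained feature extractor (task A: ImageNet), classification head removed.
base = tf.keras.applications.InceptionV3(weights="imagenet",
                                         include_top=False, pooling="avg")
base.trainable = False  # freeze the pre-trained lower layers

NUM_CLASSES = 2  # e.g. hotdog / not hotdog (placeholder)
model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),  # new head for task B
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# model.fit(task_b_images, task_b_labels, epochs=5)  # re-train on task B data
```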
HMMs for sequence tagging
- Consider the problem of assigning a sequence of
syllables to an audio sample
- Space to classify over grows exponentially with sequence length
- Think of a person’s voice as a state machine
HMMs for sequence tagging
- Depending on what syllable is currently pronounced,
audio spectrum changes
- Voice probabilistically transitions between states
- Training an HMM:
○ Specify the structure of the state machine
○ Provide labeled data to infer …
■ Transition probabilities between states
■ Distribution of data emitted at each state
- Inference in HMMs:
○ Provide a data sequence to infer …
■ The most likely path through the state machine that would have produced the data sequence
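That “most likely path” inference step is the Viterbi algorithm; a minimal log-space numpy sketch (the probability matrices and observation encoding are assumptions):

```python
import numpy as np

def viterbi(obs, log_start, log_trans, log_emit):
    """Most likely state path for observation indices `obs`.
    log_start: (K,), log_trans: (K, K), log_emit: (K, V) - all log-probabilities."""
    T, K = len(obs), len(log_start)
    score = np.full((T, K), -np.inf)
    back = np.zeros((T, K), dtype=int)
    score[0] = log_start + log_emit[:, obs[0]]
    for t in range(1, T):
        for s in range(K):
            cand = score[t - 1] + log_trans[:, s]       # best way to reach state s
            back[t, s] = int(np.argmax(cand))
            score[t, s] = cand[back[t, s]] + log_emit[s, obs[t]]
    path = [int(np.argmax(score[-1]))]
    for t in range(T - 1, 0, -1):                       # walk the backpointers
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```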
HMMs for sequence tagging
- Limitation: independence assumptions:
○ Only the current state determines the data distribution
○ Only the current state determines transition probabilities to the next state
- Generative model:
○ Easy to “sample” from the distribution the model learnt
○ Everybody has seen Markov Twitter bots?
HMMs for sequence tagging
Example implementation: http://ghmm.sourceforge.net/ghmm-python-tutorial.html
Rabiner’s very accessible HMM tutorial: https://www.cs.ubc.ca/~murphyk/Bayes/rabiner.pdf
CRFs for sequence tagging
- The HMM independence assumption for state transitions is often not true in practice
- Example: part-of-speech tagging
○ The probability of a word being of a particular type depends on the type assigned to the previous word
- HMMs model joint distribution, but we normally want
conditional distribution
- CRFs are the sequence form of logistic regression: $p(y \mid x) = \frac{1}{Z(x)} \exp\Big(\sum_{t} \sum_{k} \lambda_k f_k(y_{t-1}, y_t, x, t)\Big)$
- Linear-chain CRFs computationally tractable
- More complex dependencies can make them
intractable
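Why linear-chain CRFs stay tractable: the partition function $Z(x)$ can be computed by a forward recursion over positions, in time linear in sequence length. A minimal numpy sketch (the unary/pairwise score matrices are assumptions standing in for learnt feature weights):

```python
import numpy as np
from scipy.special import logsumexp

def log_partition(unary, pairwise):
    """unary: (T, K) per-position label scores; pairwise: (K, K) transition scores."""
    alpha = unary[0]
    for t in range(1, len(unary)):
        # alpha'[j] = logsumexp_i(alpha[i] + pairwise[i, j]) + unary[t, j]
        alpha = logsumexp(alpha[:, None] + pairwise, axis=0) + unary[t]
    return logsumexp(alpha)  # log Z(x)

def log_prob(labels, unary, pairwise):
    # log p(y | x) for one label sequence under the same scores.
    score = unary[0, labels[0]]
    for t in range(1, len(labels)):
        score += pairwise[labels[t - 1], labels[t]] + unary[t, labels[t]]
    return score - log_partition(unary, pairwise)
```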
CRFs for sequence tagging
Pretty high-performance example implementation: https://wapiti.limsi.fr/
Corresponding paper: “Practical Very Large Scale CRFs”, http://www.aclweb.org/anthology/P10-1052
Approximate Nearest Neighbor Search
Consider a family of hash functions (from the domain you wish to search to some range). The family is locality-sensitive if, for some distance thresholds $r < cr$ and probabilities $p_1 > p_2$, any two points $x, y$ satisfy: $d(x, y) \le r \Rightarrow \Pr_{h}[h(x) = h(y)] \ge p_1$, and $d(x, y) \ge cr \Rightarrow \Pr_{h}[h(x) = h(y)] \le p_2$.
What does this mean?
“For similar objects, the odds of a randomly drawn hash function evaluating to the same value should be higher than for dissimilar objects.”
LSH for similarity search
- Often a matter of designing a good hash
function family for your domain
- Rest of the implementation is mostly
“pluggable”
- For Euclidean and angular distance, several
good, public, FOSS libraries exist that can be used off-the-shelf
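A minimal sketch of such a family for angular distance: random-hyperplane hashing, where each bit records which side of a random hyperplane a vector falls on (dimension and bit count are arbitrary choices):

```python
import numpy as np

def make_hash(dim, n_bits, rng):
    planes = rng.normal(size=(n_bits, dim))   # one random hyperplane per bit
    return lambda v: tuple((planes @ v) > 0)  # which side of each hyperplane

rng = np.random.default_rng(0)
h = make_hash(dim=128, n_bits=16, rng=rng)    # one randomly drawn hash function

v = rng.normal(size=128)
w = v + 0.05 * rng.normal(size=128)           # a small perturbation of v
print(h(v) == h(w))                           # usually True: similar vectors collide
```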
ANNoy and FALCONN
ANNoy
- Partition space into halves by random sampling & centroids
- Build a tree structure out of these halves
- Build N such trees
FALCONN
- Uses a particular polytope hash
Both work pretty well -- FOSS C++ libraries, easy-to-use Python bindings.
Geometric intuition behind ANNoy
- Pick two random points to start
- Pick a new random point
- Measure its distance to the two initial points
- Pick the closer element
- Calculate the average
- Repeat with new points
- Result: two “centroids”
- Split the space in the middle between the two centroids
- Repeat on both sides until the buckets are small
- Result: a tree tiling of our space (in the diagram, each color is a tree leaf / hash bucket)
ANNoy intuition
- Each tree is a “hash function” (maps a point
to a bucket)
- Easy to generate a new tree (sample random points, compute two centroids, etc.)
- Nearby points have higher probability to end
up in same bucket than far-away points
- ⇒ A family of locality-sensitive hashes
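Using the library directly is short; a minimal sketch with random vectors (pip install annoy):

```python
import random
from annoy import AnnoyIndex

dim = 40
index = AnnoyIndex(dim, "angular")
for i in range(1000):
    index.add_item(i, [random.gauss(0, 1) for _ in range(dim)])

index.build(10)                        # build 10 trees (the N above)
print(index.get_nns_by_item(0, 5))     # 5 approximate nearest neighbours of item 0
```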
Example: Image similarity search...
… in < 100 lines of Python.
- How to best turn pictures into vectors of
reals?
- Image-classification Deep Neural Networks
do this - if you just cut off the last layer
- Step 1: Convert image files to real vectors by
using a pre-trained image classification CNN and “cut off” the last layer
Example: Image similarity search...
Different classes of images, pre-trained by Google on massive data and compute; the layer feeding the classifier outputs a vector of 2048 real numbers.
Example: Image similarity search...
- Example of “transfer learning” - repurposing pre-trained neural networks
- Input to the classification layer is a real vector of 2048 numbers
- Use ANNoy to build an index
- Change-resilient image similarity search in one afternoon
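A hedged sketch of the whole pipeline (the original code is linked below; file names here are placeholders):

```python
import tensorflow as tf
from annoy import AnnoyIndex

# Pre-trained CNN with the classification layer "cut off".
extractor = tf.keras.applications.InceptionV3(weights="imagenet",
                                              include_top=False, pooling="avg")

def featurize(path):
    img = tf.keras.preprocessing.image.load_img(path, target_size=(299, 299))
    x = tf.keras.preprocessing.image.img_to_array(img)[None]
    x = tf.keras.applications.inception_v3.preprocess_input(x)
    return extractor.predict(x)[0]       # 2048-dim feature vector

paths = ["img0.jpg", "img1.jpg"]         # placeholder image files
index = AnnoyIndex(2048, "angular")
for i, p in enumerate(paths):
    index.add_item(i, featurize(p))
index.build(20)

print(index.get_nns_by_vector(featurize("query.jpg"), 3))  # closest images
```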
[Figure: a query image next to its best and 2nd-best matches]
Example: Image similarity search...
ANNoy Library: https://github.com/spotify/annoy
Example code: https://gist.github.com/thomasdullien/79d38da49cb4f4a511d74d780e53743a (short URL: http://goo.gl/TCG34i)
Tree Ensembles
- Decision trees: Classifiers where …
○ Leaves are classes (or values, or linear functions)
○ Inner nodes test for particular properties
[Example tree from the slide: tests “Ear length > 10cm” and “Height > 30cm” with True/False branches, and leaves Donkey, Hare, Tortoise]
Tree Ensembles
- Popular algorithms to build decision trees:
○ C4.5 and CART
- Discussion only of C4.5 here
- Recursively partition training set via tests
- Choose partitions that maximize the information gain $IG(S, A) = H(S) - \sum_{v} \frac{|S_v|}{|S|} H(S_v)$
- Favor partitions that reduce entropy in the
parts
- Balance against complexity of partition
Tree Ensembles: Random forests
- Subsample training data randomly
- Build decision tree
- Repeat until N trees are constructed
- Classify by “voting” of N trees
- Provides probabilities (fraction of trees that
voted for a given class)
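A minimal scikit-learn sketch; note that scikit-learn averages per-tree probabilities, which for fully grown trees coincides with the voting fractions described above:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)   # stand-in dataset
forest = RandomForestClassifier(n_estimators=100).fit(X, y)
print(forest.predict_proba(X[:3]))  # per-class probabilities from the ensemble
```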
Tree Ensembles: Tree Boosting
- Construct tree of given complexity, this time
for numerical output
- Calculate “residual”:
○ Difference between predicted value and actual value, for each element
- Construct next tree to approximate the
residual
- Add the trees together
(Boosting: Constructing a sequence of classifiers that are trained on the “mistakes” of the previous classifiers)
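The residual-fitting loop in miniature, sketched with two scikit-learn regression trees on made-up data:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)

tree1 = DecisionTreeRegressor(max_depth=2).fit(X, y)
residual = y - tree1.predict(X)                    # what the first tree got wrong
tree2 = DecisionTreeRegressor(max_depth=2).fit(X, residual)

prediction = tree1.predict(X) + tree2.predict(X)   # add the trees together
print(np.mean((y - prediction) ** 2),              # ensemble error ...
      np.mean((y - tree1.predict(X)) ** 2))        # ... vs. single-tree error
```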
Tree Ensembles: Tree Boosting
- Kaggle competitions are routinely won by
ensembles of xgboost (a tree boosting implementation) + X (other classifiers)
- Random Forests & tree boosting are widely regarded as the most “fire and forget” classifiers available
xgboost: https://github.com/dmlc/xgboost
Paper: https://arxiv.org/abs/1603.02754
Topics to cover
- 1. Logistic regression
- 2. Word embeddings
- 3. t-SNE
- 4. Deep Networks (and some transfer learning)
- 5. Hidden Markov Models for sequence tagging
- 6. Conditional Random Fields for sequence tagging
- 7. Reinforcement learning
- 8. Approximate NN and k-NN methods
- 9. Tree ensemble methods