  1. CZECH TECHNICAL UNIVERSITY IN PRAGUE, Faculty of Electrical Engineering, Department of Cybernetics
     Deep Learning
     Petr Pošík, Czech Technical University in Prague, Faculty of Electrical Engineering, Dept. of Cybernetics
     © P. Pošík 2017, Artificial Intelligence

  2. Deep Learning

  3. A brief history of Neural Networks
     ■ 1940s: Model of the neuron (McCulloch, Pitts)
     ■ 1950-60s: Modeling the brain using neural networks (Rosenblatt, Hebb, etc.)
     ■ 1969: Research stagnated after Minsky and Papert's book Perceptrons
     ■ 1970s: Backpropagation
     ■ 1986: Backpropagation popularized by Rumelhart, Hinton, Williams
     ■ 1990s: Convolutional neural networks (LeCun)
     ■ 1990s: Recurrent neural networks (Schmidhuber)
     ■ 2006: Revival of deep networks, unsupervised pre-training (Hinton et al.)
     ■ 2013-: Huge industrial interest

  4. What is Deep learning?
     Conventional ML techniques:
     ■ Limited in their ability to process natural data in their raw form.
     ■ Successful applications required careful engineering and human expertise to extract suitable features.
     Representation learning:
     ■ A set of methods that allow a machine to be fed with raw data and to automatically discover the representations suitable for correct classification/regression/modeling.
     Deep learning:
     ■ Representation-learning methods with multiple levels of representation, with an increasing level of abstraction.
     ■ They compose simple, but often non-linear, modules that transform the representation at one level into a representation at a higher, more abstract level.
     ■ The layers learn to represent the inputs in a way that makes it easy to predict the target outputs.

  5. Terminology
     ■ Narrow vs wide: refers to the number of units in a layer.
     ■ Shallow vs deep: refers to the number of layers.
     Making a deep architecture (a minimal code sketch of the three variants follows below):
     ■ A classifier uses the original representation.
       [Diagram: input layer (x1, x2, x3, x4) connected directly to the output layer (y1)]
     ■ A classifier uses features which are derived from the original representation.
       [Diagram: input layer (x1, x2, x3, x4), one hidden layer, output layer (y1)]
     ■ A classifier uses features which are derived from the features derived from the original representation.
       [Diagram: input layer (x1, x2, x3, x4), two hidden layers, output layer (y1)]
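     The following is a minimal NumPy sketch of the three architectures above: a classifier on the raw inputs, one with a single hidden layer, and one with two hidden layers. The layer sizes, random weights, and the choice of ReLU/sigmoid activations are illustrative assumptions rather than anything specified in the slides; biases are omitted for brevity.

         import numpy as np

         def sigmoid(z):
             return 1.0 / (1.0 + np.exp(-z))

         def relu(z):
             return np.maximum(0.0, z)

         rng = np.random.default_rng(0)
         x = rng.normal(size=4)                 # raw input x1..x4, as in the diagrams

         # (a) Shallow: the classifier uses the original representation directly.
         W_out = rng.normal(size=(1, 4))
         y_shallow = sigmoid(W_out @ x)

         # (b) One hidden layer: the classifier uses features derived from the input.
         W1 = rng.normal(size=(3, 4))           # 3 hidden units = the layer's "width"
         h1 = relu(W1 @ x)
         y_deeper = sigmoid(rng.normal(size=(1, 3)) @ h1)

         # (c) Two hidden layers: features derived from features (a "deeper" network).
         W2 = rng.normal(size=(3, 3))
         h2 = relu(W2 @ h1)
         y_deepest = sigmoid(rng.normal(size=(1, 3)) @ h2)

         print(y_shallow, y_deeper, y_deepest)

     Each additional weight matrix transforms the previous layer's representation into a new one, which is exactly what "adding depth" means here.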

  6. Example: Word embeddings
     Sometimes, even shallow architectures can do surprisingly well!
     Representation of text (words, sentences):
     ■ Important for many real-world apps: search, ad recommendation, ranking, spam filtering, ...
     ■ Local representations:
       ■ N-grams, 1-of-N coding, bag of words
       ■ Easy to construct.
       ■ Large and sparse.
       ■ No notion of similarity (synonyms, words with similar meaning).
     ■ Distributed representations (a small code sketch follows below):
       ■ Vectors of real numbers in a high-dimensional continuous space (but of much lower dimension than 1-of-N encoding).
       ■ Not clear how to construct such a representation meaningfully.
       ■ The size is tunable, but much smaller than that of local representations; dense.
       ■ Similarity is well defined: synonyms should lie in the same area of the space.
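     To make the contrast concrete, here is a small sketch assuming a toy three-word vocabulary; the dense vectors are invented for illustration and are not trained word2vec embeddings.

         import numpy as np

         vocab = ["cat", "dog", "car"]

         # Local (1-of-N) representation: sparse, one dimension per vocabulary word.
         one_hot = {w: np.eye(len(vocab))[i] for i, w in enumerate(vocab)}

         # Distributed representation: dense, low-dimensional vectors (values are
         # made up here; in practice they would be learned, e.g. by word2vec).
         embedding = {
             "cat": np.array([0.8, 0.1, 0.3]),
             "dog": np.array([0.7, 0.2, 0.4]),
             "car": np.array([-0.5, 0.9, 0.0]),
         }

         def cosine(a, b):
             return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

         # Any two distinct one-hot vectors are orthogonal: similarity is always 0.
         print(cosine(one_hot["cat"], one_hot["dog"]))        # 0.0
         # Dense vectors can express that "cat" is closer to "dog" than to "car".
         print(cosine(embedding["cat"], embedding["dog"]))    # close to 1
         print(cosine(embedding["cat"], embedding["car"]))    # negative / small

     This is the sense in which distributed representations give a usable notion of similarity that local representations lack.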
