CZECH TECHNICAL UNIVERSITY IN PRAGUE
Faculty of Electrical Engineering
Department of Cybernetics

Deep Learning

Petr Pošík © 2017, Artificial Intelligence
Deep Learning
A brief history of Neural Networks

■ 1940s: Model of the neuron (McCulloch, Pitts)
■ 1950s–60s: Modeling the brain using neural networks (Rosenblatt, Hebb, etc.)
■ 1969: Research stagnated after Minsky and Papert's book Perceptrons
■ 1970s: Backpropagation
■ 1986: Backpropagation popularized by Rumelhart, Hinton, Williams
■ 1990s: Convolutional neural networks (LeCun)
■ 1990s: Recurrent neural networks (Schmidhuber)
■ 2006: Revival of deep networks, unsupervised pre-training (Hinton et al.)
■ 2013–: Huge industrial interest
What is Deep learning?

Conventional ML techniques:
■ Limited in their ability to process natural data in their raw form.
■ Successful applications required careful engineering and human expertise to extract suitable features.

Representation learning:
■ A set of methods that allows a machine to be fed with raw data and to automatically discover the representations suitable for correct classification/regression/modeling.

Deep learning:
■ Representation-learning methods with multiple levels of representation, with an increasing level of abstraction.
■ Compose simple, but often non-linear, modules transforming the representation at one level into a representation at a higher, more abstract level (see the sketch below).
■ The layers learn to represent the inputs in a way that makes it easy to predict the target outputs.
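To make the idea of composing modules concrete, here is a minimal NumPy sketch (not from the lecture; the layer sizes and the ReLU non-linearity are illustrative assumptions): each module applies an affine map followed by a non-linearity, and stacking the modules yields progressively more abstract representations.

```python
import numpy as np

def relu(z):
    """Element-wise rectified linear unit."""
    return np.maximum(0.0, z)

class Layer:
    """One simple module: an affine map followed by a non-linearity."""
    def __init__(self, n_in, n_out, rng):
        self.W = rng.normal(0.0, 0.1, size=(n_in, n_out))
        self.b = np.zeros(n_out)

    def __call__(self, x):
        return relu(x @ self.W + self.b)

rng = np.random.default_rng(0)
# Three stacked modules: each maps one level of representation to the next,
# more abstract one (the sizes 784 -> 128 -> 64 -> 10 are arbitrary here).
layers = [Layer(784, 128, rng), Layer(128, 64, rng), Layer(64, 10, rng)]

x = rng.random(784)   # raw input, e.g. flattened pixel intensities
for layer in layers:  # composing the modules is the "deep" part
    x = layer(x)
print(x.shape)        # (10,) -- a representation from which to predict
```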
Terminology

■ Narrow vs wide: refers to the number of units in a layer.
■ Shallow vs deep: refers to the number of layers.

Making a deep architecture (a sketch of all three variants follows below):
■ A classifier uses the original representation:
  [Figure: input layer (x1, ..., x4) connected directly to the output layer (y1)]
■ A classifier uses features which are derived from the original representation:
  [Figure: input layer (x1, ..., x4), one hidden layer, output layer (y1)]
■ A classifier uses features which are derived from the features derived from the original representation:
  [Figure: input layer (x1, ..., x4), two hidden layers, output layer (y1)]
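A hypothetical NumPy sketch of the three architectures in the figures (the widths, activations, and random weights are invented for illustration, and biases are omitted for brevity): depth is the number of weight matrices, width is the size of a hidden layer.

```python
import numpy as np

def make_net(layer_sizes, rng):
    """Random weights for a fully connected net, e.g. layer_sizes=[4, 5, 1]."""
    return [rng.normal(0.0, 0.1, size=(m, n))
            for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward(weights, x):
    """Hidden layers use tanh; the single output unit uses a sigmoid."""
    for W in weights[:-1]:
        x = np.tanh(x @ W)
    return 1.0 / (1.0 + np.exp(-(x @ weights[-1])))

rng = np.random.default_rng(0)
x = rng.random(4)                         # four inputs, as in the figures

shallow    = make_net([4, 1], rng)        # no hidden layer
one_hidden = make_net([4, 5, 1], rng)     # one hidden layer: deeper
two_hidden = make_net([4, 5, 5, 1], rng)  # two hidden layers: deeper still
# Depth = number of weight matrices; width = a hidden size (the 5s above).
for net in (shallow, one_hidden, two_hidden):
    print(forward(net, x))
```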
Example: Word embeddings

Sometimes, even shallow architectures can do surprisingly well!

Representation of text (words, sentences):
■ Important for many real-world apps: search, ads recommendation, ranking, spam filtering, ...
■ Local representations:
  ■ N-grams, 1-of-N coding, bag of words.
  ■ Easy to construct.
  ■ Large and sparse.
  ■ No notion of similarity (synonyms, words with similar meaning).
■ Distributed representations:
  ■ Vectors of real numbers in a continuous space with far fewer dimensions than a 1-of-N encoding.
  ■ Not clear how to construct such a representation meaningfully.
  ■ The size is tunable, but much smaller than that of local representations; dense.
  ■ Similarity is well defined: synonyms should lie in the same area of the space (illustrated below).
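A small sketch contrasting the two representations (the vocabulary and the dense vectors are invented by hand here; a method like word2vec would learn them from data): 1-of-N vectors make every pair of distinct words equally dissimilar, whereas dense vectors let related words be close.

```python
import numpy as np

vocab = ["cat", "dog", "car", "truck"]
idx = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    """Local (1-of-N) representation: large, sparse."""
    v = np.zeros(len(vocab))
    v[idx[word]] = 1.0
    return v

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Any two distinct one-hot vectors are orthogonal: no notion of similarity.
print(cosine(one_hot("cat"), one_hot("dog")))   # 0.0

# Distributed representation: dense, low-dimensional (values made up here).
emb = {
    "cat":   np.array([0.9, 0.1, 0.0]),
    "dog":   np.array([0.8, 0.2, 0.1]),
    "car":   np.array([0.1, 0.9, 0.3]),
    "truck": np.array([0.0, 0.8, 0.4]),
}
print(cosine(emb["cat"], emb["dog"]))    # high: similar words are close
print(cosine(emb["cat"], emb["truck"]))  # low: unrelated words are far apart
```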