
CS 4803 / 7643: Deep Learning
Website: http://www.cc.gatech.edu/classes/AY2020/cs7643_spring/
Piazza: https://piazza.com/gatech/spring2020/cs4803dl7643a/
Staff mailing list (personal questions): cs4803-7643-staff@lists.gatech.edu
Gradescope:


  1. Deep Learning = Hierarchical Compositionality: Low-Level Feature → Mid-Level Feature → High-Level Feature → Trainable Classifier → “car”. Feature visualization of a convolutional net trained on ImageNet, from [Zeiler & Fergus 2013]. Slide Credit: Marc'Aurelio Ranzato, Yann LeCun

  2. So what is Deep (Machine) Learning? • A few different ideas: • (Hierarchical) Compositionality – Cascade of non-linear transformations – Multiple layers of representations • End-to-End Learning – Learning (goal-driven) representations – Learning feature extraction • Distributed Representations – No single neuron “encodes” everything – Groups of neurons work together (C) Dhruv Batra & Zsolt Kira 43

  3. Traditional Machine Learning: VISION: hand-crafted features (SIFT/HOG, fixed) → your favorite classifier (learned) → “car”. SPEECH: hand-crafted features (MFCC, fixed) → your favorite classifier (learned) → \ˈd ē p\. NLP: “This burrito place is yummy and fun!” → hand-crafted features (Bag-of-words, fixed) → your favorite classifier (learned) → “+”. 44 Slide Credit: Marc'Aurelio Ranzato, Yann LeCun

  4. Feature Engineering: SIFT, Spin Images, HoG, Textons, and many, many more…. (C) Dhruv Batra & Zsolt Kira 45

  5. Traditional Machine Learning (more accurately): VISION: SIFT/HOG (fixed) → K-Means/pooling (“learned”, unsupervised) → classifier (“learned”, supervised) → “car”. SPEECH: MFCC (fixed) → Mixture of Gaussians (“learned”, unsupervised) → classifier (“learned”, supervised) → \ˈd ē p\. NLP: “This burrito place is yummy and fun!” → Parse Tree Syntactic (fixed) → n-grams (“learned”, unsupervised) → classifier (“learned”, supervised) → “+”. (C) Dhruv Batra & Zsolt Kira 46 Slide Credit: Marc'Aurelio Ranzato, Yann LeCun

  6. Deep Learning = End-to-End Learning: VISION: SIFT/HOG (fixed) → K-Means/pooling (“learned”, unsupervised) → classifier (“learned”, supervised) → “car”. SPEECH: MFCC (fixed) → Mixture of Gaussians (“learned”, unsupervised) → classifier (“learned”, supervised) → \ˈd ē p\. NLP: “This burrito place is yummy and fun!” → Parse Tree Syntactic (fixed) → n-grams (“learned”, unsupervised) → classifier (“learned”, supervised) → “+”. (C) Dhruv Batra & Zsolt Kira 47 Slide Credit: Marc'Aurelio Ranzato, Yann LeCun

  7. “Shallow” vs Deep Learning • “Shallow” models: hand-crafted Feature Extractor (fixed) → “Simple” Trainable Classifier (learned) • Deep models: Trainable Feature-Transform / Classifier → Trainable Feature-Transform / Classifier → Trainable Feature-Transform / Classifier, with Learned Internal Representations. Slide Credit: Marc'Aurelio Ranzato, Yann LeCun
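The contrast between the two pipelines can be sketched in a few lines of Python (a toy illustration, not course code; the hand-crafted feature and the weights here are invented):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# "Shallow" model: a fixed, hand-crafted feature feeding one trainable weight.
# abs() stands in for an engineered descriptor like SIFT/HOG/MFCC.
def shallow_predict(x, w):
    feature = abs(x)             # fixed; never updated by learning
    return sigmoid(w * feature)  # only this stage is trained

# Deep model: a cascade of trainable non-linear transforms, no fixed stage.
def deep_predict(x, weights):
    h = x
    for w in weights:            # every layer's weight is learned
        h = sigmoid(w * h)
    return h
```

Because the shallow model's feature is fixed, it cannot distinguish x from −x here no matter how the classifier is trained; a deep model is free to learn a representation that preserves that information.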

  8. So what is Deep (Machine) Learning? • A few different ideas: • (Hierarchical) Compositionality – Cascade of non-linear transformations – Multiple layers of representations • End-to-End Learning – Learning (goal-driven) representations – Learning feature extraction • Distributed Representations – No single neuron “encodes” everything – Groups of neurons work together (C) Dhruv Batra & Zsolt Kira 49

  9. Distributed Representations Toy Example • Local vs Distributed (C) Dhruv Batra & Zsolt Kira 50 Slide Credit: Moontae Lee

  10. Distributed Representations Toy Example • Can we interpret each dimension? (C) Dhruv Batra & Zsolt Kira 51 Slide Credit: Moontae Lee

  11. Ideal Feature Extractor (C) Dhruv Batra & Zsolt Kira 52

  12. Power of distributed representations! Local vs. Distributed (C) Dhruv Batra & Zsolt Kira 53 Slide Credit: Moontae Lee

  13. Power of distributed representations! • United States:Dollar :: Mexico:? (C) Dhruv Batra & Zsolt Kira 54 Slide Credit: Moontae Lee
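word2vec-style embeddings answer such analogies with vector arithmetic: vec(Dollar) − vec(United States) + vec(Mexico) lands closest to vec(Peso). A sketch with hand-picked 2-D vectors (real embeddings learn such offsets from data; the coordinates here are invented):

```python
# Hand-picked so that currency = country + (1, 0); word2vec learns
# comparable offsets (e.g. king - man + woman ~ queen) from raw text.
vecs = {
    "united_states": (0.0, 1.0),
    "dollar":        (1.0, 1.0),
    "mexico":        (0.0, 2.0),
    "peso":          (1.0, 2.0),
    "japan":         (0.0, 3.0),
    "yen":           (1.0, 3.0),
}

def analogy(a, b, c):
    """a : b :: c : ?  -- the word nearest to vec(b) - vec(a) + vec(c)."""
    (ax, ay), (bx, by), (cx, cy) = vecs[a], vecs[b], vecs[c]
    tx, ty = bx - ax + cx, by - ay + cy
    candidates = (w for w in vecs if w not in (a, b, c))
    return min(candidates,
               key=lambda w: (vecs[w][0] - tx) ** 2 + (vecs[w][1] - ty) ** 2)

# analogy("united_states", "dollar", "mexico") -> "peso"
```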

  14. ThisPlusThat.me (C) Dhruv Batra & Zsolt Kira 55 Image Credit: http://insightdatascience.com/blog/thisplusthat_a_search_engine_that_lets_you_add_words_as_vectors.html

  15. So what is Deep (Machine) Learning? • A few different ideas: • (Hierarchical) Compositionality – Cascade of non-linear transformations – Multiple layers of representations • End-to-End Learning – Learning (goal-driven) representations – Learning feature extraction • Distributed Representations – No single neuron “encodes” everything – Groups of neurons work together (C) Dhruv Batra & Zsolt Kira 56

  16. Benefits of Deep/Representation Learning • (Usually) Better Performance – “Because gradient descent is better than you” – Yann LeCun • New domains without “experts” – RGBD – Multi-spectral data – Gene-expression data – Unclear how to hand-engineer (C) Dhruv Batra & Zsolt Kira 57

  17. “Expert” intuitions can be misleading • “Every time I fire a linguist, the performance of our speech recognition system goes up” – Fred Jelinek, IBM ’98 (C) Dhruv Batra & Zsolt Kira 58

  18. Benefits of Deep/Representation Learning • Modularity! • Plug and play architectures! (C) Dhruv Batra & Zsolt Kira 59

  19. Differentiable Computation Graph: Any DAG of differentiable modules is allowed! (C) Dhruv Batra & Zsolt Kira 60 Slide Credit: Marc'Aurelio Ranzato

  21. Logistic Regression as a Cascade: Given a library of simple functions, compose them into a complicated function. (C) Dhruv Batra & Zsolt Kira 62 Slide Credit: Marc'Aurelio Ranzato, Yann LeCun

  22. Logistic Regression as a Cascade: Given a library of simple functions, compose them into a complicated function. (C) Dhruv Batra & Zsolt Kira 63 Slide Credit: Marc'Aurelio Ranzato, Yann LeCun
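In Python, the cascade view reads as follows: each simple function is its own module, and logistic regression is just their composition (a minimal sketch, not the course's code):

```python
import math

# Library of simple functions ...
def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def add(a, b):
    return a + b

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# ... composed into a complicated one: p(y=1|x) = sigmoid(w.x + b)
def logistic_regression(w, b, x):
    return sigmoid(add(dot(w, x), b))
```

Each stage being a separate differentiable module is exactly what lets the same pieces be rearranged into deeper cascades.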

  23. Key Computation: Forward-Prop (C) Dhruv Batra & Zsolt Kira 64 Slide Credit: Marc'Aurelio Ranzato, Yann LeCun

  24. Key Computation: Back-Prop (C) Dhruv Batra & Zsolt Kira 65 Slide Credit: Marc'Aurelio Ranzato, Yann LeCun
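The two passes can be made concrete on a one-parameter model with loss = ½(sigmoid(w·x) − y)² (a toy squared-error example chosen for brevity; the principle is the same for any differentiable cascade):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Forward-prop: evaluate the cascade left to right, caching intermediates.
def forward(w, x, y):
    z = w * x                   # linear stage
    p = sigmoid(z)              # non-linear stage
    loss = 0.5 * (p - y) ** 2   # loss stage
    return z, p, loss

# Back-prop: apply the chain rule right to left over the cached values.
def backward(w, x, y):
    z, p, loss = forward(w, x, y)
    dloss_dp = p - y            # d(loss)/dp
    dp_dz = p * (1.0 - p)       # sigmoid'(z), reusing the cached p
    dz_dw = x                   # d(w*x)/dw
    return dloss_dp * dp_dz * dz_dw   # d(loss)/dw by the chain rule
```

A finite-difference check on the returned gradient is the standard sanity test that automatic-differentiation frameworks also rely on.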

  25. Differentiable Computation Graph: Any DAG of differentiable modules is allowed! (C) Dhruv Batra & Zsolt Kira 66 Slide Credit: Marc'Aurelio Ranzato

  26. Visual Dialog Model #1 Late Fusion Encoder Slide Credit: Abhishek Das

  34. Yes it works, but how? (C) Dhruv Batra & Zsolt Kira 89

  35. Outline • What is Deep Learning, the field, about? • What is this class about? • What to expect? – Logistics • FAQ (C) Dhruv Batra & Zsolt Kira 90

  37. What is this class about? (C) Dhruv Batra & Zsolt Kira 92

  38. What is this class about? • Introduction to Deep Learning • Goal: – After finishing this class, you should be ready to get started on your first DL research project. • Convolutional Neural Networks (CNNs) • Recurrent Neural Networks (RNNs) • Deep Reinforcement Learning • Generative Models (VAEs, GANs) • Target Audience: – Senior undergrads, MS-ML, and new PhD students • Note: Materials largely follow those developed by Dhruv Batra, with slight modifications (C) Dhruv Batra & Zsolt Kira 93

  39. What this class is NOT • NOT the target audience: – Advanced grad students already working in ML/DL areas – People looking to understand the latest and greatest cutting-edge research (e.g., GANs, AlphaGo, etc.) – Undergraduate/Masters students looking to graduate with a DL class on their resume. • NOT the goal: – Teaching a toolkit. “Intro to TensorFlow/PyTorch” – Intro to Machine Learning (C) Dhruv Batra & Zsolt Kira 94

  40. Caveat • This is an ADVANCED Machine Learning class – This should NOT be your first introduction to ML – You will need a formal class; not just self-reading/Coursera – Taking these concurrently does not count! – If you took CS 7641/ISYE 6740/CSE 6740 @GT, you’re in the right place – If you took an equivalent class elsewhere, see the list of topics taught in CS 7641 to be sure. (C) Dhruv Batra & Zsolt Kira 95

  41. Prerequisites • Intro Machine Learning – Classifiers, regressors, loss functions, MLE, MAP • Linear Algebra – Matrix multiplication, eigenvalues, positive semi-definiteness… • Calculus – Multi-variate gradients, Hessians, Jacobians… If you do not have these prerequisites, consider dropping! • This is for your benefit, as well as the benefit of others (C) Dhruv Batra & Zsolt Kira 96

  42. Prerequisites • Intro Machine Learning – Classifiers, regressors, loss functions, MLE, MAP • Linear Algebra – Matrix multiplication, eigenvalues, positive semi-definiteness… • Calculus – Multi-variate gradients, Hessians, Jacobians… (C) Dhruv Batra & Zsolt Kira 97

  43. Prerequisites • Intro Machine Learning – Classifiers, regressors, loss functions, MLE, MAP • Linear Algebra – Matrix multiplication, eigenvalues, positive semi-definiteness… • Calculus – Multi-variate gradients, Hessians, Jacobians… • Programming! – Homeworks will require Python, C++! – Libraries/Frameworks: PyTorch – HW1 (pure Python + PyTorch), HW2-4 (PyTorch) – Your language of choice for the project (C) Dhruv Batra & Zsolt Kira 98

  44. Course Information • Instructor: Zsolt Kira – zkira@gatech – Location: 222 CCB • I will always be available; just contact me or come to office hours • My job is to: – Teach the course such that you learn a lot – Provide any support needed towards that – Have fun and develop a passion for these topics (C) Dhruv Batra and Zsolt Kira 99

  45. Course Information • Instructor: Zsolt Kira – zkira@gatech – Location: CODA room S1181B • Incoming Ph.D. students: • Zubair Irshad • Ben Wilson • James Smith (C) Dhruv Batra & Zsolt Kira 100

  46. Current TAs • Sameer Dharur – MS-CS student – https://www.linkedin.com/in/sameerdharur/ • Rahul Duggal – 2nd-year CS PhD student – http://www.rahulduggal.com/ • Patrick Grady – 2nd-year Robotics PhD student – https://www.linkedin.com/in/patrick-grady • Jiachen Yang – 2nd-year MSCSE student – https://www.cc.gatech.edu/~jyang462/ • Anishi Mehta – 2nd-year ML PhD – https://www.linkedin.com/in/anishimehta • Yinquan Lu – MSCS student – https://www.cc.gatech.edu/~jyang462/ • More TAs coming soon! (C) Dhruv Batra & Zsolt Kira 101

  47. Organization & Deliverables • PS0 (2%) + 4 homeworks (78%) – PS0 is warm-up, graded pass/fail – Do it! – In general, PS/HWs are a mix of theory and implementation – First real one goes out next week • Start early, Start early, Start early, Start early, Start early, Start early, Start early, Start early, Start early, Start early • Final project (20%) – Projects done in groups of 3-4 • (Bonus) Class Participation (up to 3%) – Top contributors to discussions (mainly on Piazza) – Ask questions, answer questions (C) Dhruv Batra & Zsolt Kira 102

  48. New Element: FB Co-Teaching! • Several elements including: – Guest Lectures – 6 in-class lectures by FB • Data wrangling • Embeddings and world2vec • Self-attention and transformers • Language modeling and translation • Large-scale systems • Fairness, privacy, ethics – Assignments – Volunteers developing some new elements for assignments – Project ideas – Instructors will provide ideas for real-world projects and possible (surrogate/public) data sources that mirror some of the challenges they are working on (C) Dhruv Batra & Zsolt Kira 103

  49. Late Days • “Free” Late Days – 7 late days for the semester • Use for HWs • Cannot use for project related deadlines – After free late days are used up: • 25% penalty for each late day (C) Dhruv Batra & Zsolt Kira 104

  50. PS0 • Out today; due 01/14 – Available on website (will show up on Canvas today) • Grading: pass/fail – <=80% means that you might not be prepared for the class – Consider dropping or talk to me if that’s the case! • Topics – Probability, calculus, convexity, proving things (C) Dhruv Batra & Zsolt Kira 105

  51. Project • Goal – Chance to try Deep Learning – Encouraged to apply to your research (computer vision, NLP, robotics,…) – Must be done this semester. – Can combine with other classes with separate thrusts • get permission from both instructors; delineate the different parts – Extra credit for shooting for a publication – Teams of 3-4 people • Undergraduates and graduates on separate teams • Contributions of each member must be explained and cannot just be report writing, etc. • Main categories – Application/Survey • Compare a bunch of existing algorithms on a new application domain of your interest – Formulation/Development • Formulate a new model or algorithm for a new or old problem – Theory • Theoretically analyze an existing algorithm (C) Dhruv Batra & Zsolt Kira 106

  52. Computing • Major bottleneck – GPUs • Options – Your own / group / advisor’s resources – Google Cloud Credits • $50 credits to every registered student courtesy Google – Google colaboratory allows free TPU access!! • https://colab.research.google.com/notebooks/welcome.ipynb – Minsky cluster in IC (C) Dhruv Batra & Zsolt Kira 107
