

SLIDE 1

Projects

3-4 person groups preferred. Deliverables: poster, report, and main code (plus proposal and midterm slide). Topics: your own, or choose from the suggested topics; some are physics/engineering inspired.

  • April 26: groups due to TA (if you don't have a group, ask on Piazza and we can help). TAs will construct groups after that.
  • May 5: proposal due; TAs and Peter can approve. Proposal is one page: title, a large paragraph, data, weblinks, references.
  • May 20: midterm slide presentation, presented to a subgroup of the class.
  • June 5: final poster session; poster uploaded by June 3.
  • Saturday June 15: report and code due.

Q: Can the final project be shared with another class? A: If the other class allows it, that should be fine. You cannot turn in an identical project for both classes, but you can share common infrastructure/code base/datasets across the two. Do not cut and paste from other sources without making clear that that part is a copy; this applies to other reports and to material from the internet. Citations are important.

SLIDE 2

Slides: Fei-Fei Li, Justin Johnson & Serena Yeung, Lecture 7, April 25, 2017.

Last time: Data Preprocessing

Before normalization: the classification loss is very sensitive to changes in the weight matrix; hard to optimize. After normalization: less sensitive to small changes in the weights; easier to optimize.

SLIDE 3


Optimization: Problems with SGD

What if the loss changes quickly in one direction and slowly in another? What does gradient descent do? Very slow progress along the shallow dimension, jitter along the steep direction. This happens when the loss function has a high condition number: the ratio of the largest to smallest singular value of the Hessian matrix is large.

SLIDE 4


Optimization: Problems with SGD

What if the loss function has a local minimum or saddle point? Zero gradient; gradient descent gets stuck.


Optimization: Problems with SGD

What if the loss function has a local minimum or saddle point?

Saddle points are much more common in high dimensions.

Dauphin et al, “Identifying and attacking the saddle point problem in high-dimensional non-convex optimization”, NIPS 2014

SLIDE 5


Optimization: Problems with SGD

Our gradients come from minibatches so they can be noisy!


SGD + Momentum

SGD: x_{t+1} = x_t − α ∇f(x_t)

SGD+Momentum: v_{t+1} = ρ v_t + ∇f(x_t),  x_{t+1} = x_t − α v_{t+1}

  • Build up “velocity” as a running mean of gradients
  • Rho gives “friction”; typically rho=0.9 or 0.99 (see the sketch below)
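
A minimal sketch of the two update rules above (the gradient function `grad_f`, step size `alpha`, and the toy quadratic are illustrative):

```python
import numpy as np

def sgd_momentum_step(x, v, grad_f, alpha=1e-2, rho=0.9):
    """One SGD+Momentum update: velocity is a running mean of gradients."""
    v = rho * v + grad_f(x)   # rho acts as "friction" on the accumulated velocity
    x = x - alpha * v         # step along the velocity instead of the raw gradient
    return x, v

# Toy example: a badly conditioned quadratic loss f(x) = 0.5 * x^T A x
A = np.diag([10.0, 1.0])
grad_f = lambda x: A @ x
x, v = np.array([1.0, 1.0]), np.zeros(2)
for _ in range(100):
    x, v = sgd_momentum_step(x, v, grad_f)
```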
SLIDE 6


Adam (full form)

Kingma and Ba, “Adam: A method for stochastic optimization”, ICLR 2015

The full Adam update combines three ingredients: momentum (first moment), AdaGrad/RMSProp (second moment), and bias correction.

Bias correction accounts for the fact that the first and second moment estimates start at zero. Adam with beta1 = 0.9, beta2 = 0.999, and learning_rate = 1e-3 or 5e-4 is a great starting point for many models!
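
A minimal sketch of the full-form Adam update described above (the helper `compute_gradient` and the loop length are illustrative assumptions):

```python
import numpy as np

def adam(x, compute_gradient, learning_rate=1e-3, beta1=0.9, beta2=0.999,
         eps=1e-7, num_iterations=1000):
    first_moment = np.zeros_like(x)    # momentum: running mean of gradients
    second_moment = np.zeros_like(x)   # AdaGrad/RMSProp: running mean of squared gradients
    for t in range(1, num_iterations + 1):
        dx = compute_gradient(x)
        first_moment = beta1 * first_moment + (1 - beta1) * dx
        second_moment = beta2 * second_moment + (1 - beta2) * dx * dx
        first_unbias = first_moment / (1 - beta1 ** t)     # bias correction
        second_unbias = second_moment / (1 - beta2 ** t)
        x = x - learning_rate * first_unbias / (np.sqrt(second_unbias) + eps)
    return x
```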

SLIDE 7


SGD, SGD+Momentum, Adagrad, RMSProp, Adam all have learning rate as a hyperparameter.

=> Learning rate decay over time!

Step decay: e.g. halve the learning rate every few epochs.
Exponential decay: α = α₀ e^(−kt)
1/t decay: α = α₀ / (1 + kt)
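
A small sketch of the three schedules (alpha0, k, and the drop schedule are illustrative hyperparameters):

```python
import math

def step_decay(alpha0, epoch, drop=0.5, epochs_per_drop=10):
    # e.g. halve the learning rate every 10 epochs
    return alpha0 * (drop ** (epoch // epochs_per_drop))

def exponential_decay(alpha0, t, k=0.1):
    return alpha0 * math.exp(-k * t)

def one_over_t_decay(alpha0, t, k=0.1):
    return alpha0 / (1.0 + k * t)
```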

SLIDE 8


How to improve single-model performance? Regularization


Regularization: Add term to loss

L = (1/N) Σ_i L_i + λ R(W)

In common use:
  • L2 regularization (weight decay): R(W) = Σ_{k,l} W_{k,l}²
  • L1 regularization: R(W) = Σ_{k,l} |W_{k,l}|
  • Elastic net (L1 + L2): R(W) = Σ_{k,l} (β W_{k,l}² + |W_{k,l}|)
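
A minimal sketch of adding an L2 penalty (weight decay) to a loss; `data_loss`, `W`, and `lam` are illustrative names:

```python
import numpy as np

def total_loss(data_loss, W, lam=1e-4):
    """Data loss plus an L2 (weight decay) penalty: lambda * sum(W**2)."""
    return data_loss + lam * np.sum(W * W)
```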

SLIDE 9


Regularization: Dropout

In each forward pass, randomly set some neurons to zero. The probability of dropping is a hyperparameter; 0.5 is common.

Srivastava et al, “Dropout: A simple way to prevent neural networks from overfitting”, JMLR 2014
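
A minimal sketch of a dropout forward pass, in the commonly used "inverted dropout" form (the layer shape and keep probability are illustrative):

```python
import numpy as np

p = 0.5  # probability of keeping a unit active

def dropout_forward(x, W, train=True):
    h = np.maximum(0, x @ W)                        # a hidden layer (ReLU)
    if train:
        mask = (np.random.rand(*h.shape) < p) / p   # inverted dropout: scale at train time
        h = h * mask                                # randomly zero some neurons
    return h                                        # test time: no dropout, no extra scaling
```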

SLIDE 10

Homework


Regularization: Dropout

How can this possibly be a good idea?

Forces the network to have a redundant representation; prevents co-adaptation of features. (Figure: features such as "has an ear", "has a tail", "is furry", "has claws", "mischievous look" feeding a cat score, with some features dropped.)

SLIDE 11


Regularization: Data Augmentation

(Figure: load image and label "cat" → transform the image → CNN → compute loss; the label stays the same.)


Data Augmentation

Get creative for your problem! Random mixes/combinations of (see the sketch below):

  • translation
  • rotation
  • stretching
  • shearing
  • lens distortions, … (go crazy)

+ simulated data using a physical model.
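
A small sketch of random augmentations with torchvision transforms (the specific transforms and parameters are illustrative, not from the lecture):

```python
import torchvision.transforms as T

# Random mix of geometric and photometric augmentations applied at load time
augment = T.Compose([
    T.RandomHorizontalFlip(),
    T.RandomAffine(degrees=15, translate=(0.1, 0.1), scale=(0.9, 1.1), shear=10),
    T.ColorJitter(brightness=0.2, contrast=0.2),
    T.ToTensor(),
])
# image_aug = augment(pil_image)  # apply to a PIL image before feeding the CNN
```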

SLIDE 12


Transfer Learning with CNNs

  • 1. Train on ImageNet: full network (Image → Conv-64, Conv-64, MaxPool, Conv-128, Conv-128, MaxPool, Conv-256, Conv-256, MaxPool, Conv-512, Conv-512, MaxPool, Conv-512, Conv-512, MaxPool, FC-4096, FC-4096, FC-1000).
  • 2. Small dataset (C classes): freeze the pretrained layers, reinitialize the final layer as FC-C, and train only that layer.
  • 3. Bigger dataset: freeze the lower layers and train more of the top layers. Lower the learning rate when finetuning; 1/10 of the original LR is a good starting point.

Donahue et al, “DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition”, ICML 2014 Razavian et al, “CNN Features Off-the-Shelf: An Astounding Baseline for Recognition”, CVPR Workshops 2014
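
A minimal sketch of step 2 in PyTorch with a pretrained torchvision model (the choice of VGG-16 and the variable names are illustrative):

```python
import torch.nn as nn
import torchvision.models as models

num_classes = 10                       # C classes in the small dataset
model = models.vgg16(pretrained=True)  # 1. weights trained on ImageNet

for param in model.parameters():       # 2. freeze all pretrained layers
    param.requires_grad = False

# Reinitialize the last FC layer (FC-1000 -> FC-C) and train only it
model.classifier[-1] = nn.Linear(model.classifier[-1].in_features, num_classes)
```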

SLIDE 13

Predicting Weather with Machine Learning:

Intro to ARMA and Random Forest

Emma Ozanich

PhD Candidate, Scripps Institution of Oceanography

SLIDE 14

Background

Shi et al., NIPS 2015:

  • Predicting rain at different time lags
  • Shows convolutional LSTM vs. nowcast models vs. fully-connected LSTM
  • Used radar echo (image) inputs
  • Hong Kong, 2011-2013, 240 frames/day
  • Selected top 97 rainy days (note: <10% of data used!)
  • Preprocessing: k-means clustering to denoise
  • ConvLSTM has better performance and a lower false alarm rate (lower left)

Metrics (false = false alarm): CSI = hits/(hits + misses + false), FAR = false/(hits + false), POD = hits/(hits + misses)

SLIDE 15

Background

McGovern et al. 2017 (BAMS):

  • Decision trees used in meteorology since mid-1960s

McGovern et al 2017, Bull. Amer. Meteor. Soc. 98:10, p. 2073-2090.

Predicting rain at different time lags

SLIDE 16

Background

McGovern et al. 2017 (BAMS):

  • Green contours = hail occurred (truth)
  • Physics-based method: convection-allowing model (CAM)
  • Doesn't directly predict hail
  • Random forest predicts the hail size distribution (gamma, Γ) based on weather variables
  • HAILCAST = diagnostic measure based on CAMs
  • Updraft helicity = surrogate variable from CAM

McGovern et al 2017, Bull. Amer. Meteor. Soc. 98:10, p. 2073-2090.

SLIDE 17

Decision Trees

  • Algorithm made up of conditional control statements

Example decision tree (reconstructed from the slide's flowchart):

  • Homework deadline tonight? Yes → do homework.
  • No → Party invitation? Yes → go to the party.
  • No → Do I have friends? Yes → hang out with friends; No → read a book.
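
The same decisions written as conditional control statements, a minimal sketch mirroring the flowchart above:

```python
def what_to_do(deadline_tonight: bool, party_invitation: bool, have_friends: bool) -> str:
    # A decision tree is just nested conditional control statements
    if deadline_tonight:
        return "do homework"
    if party_invitation:
        return "go to the party"
    if have_friends:
        return "hang out with friends"
    return "read a book"
```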

SLIDE 18

Decision Trees

McGovern et al. 2017 (BAMS):

  • Decision trees used in meteorology since mid-1960s

McGovern et al 2017, Bull. Amer. Meteor. Soc. 98:10, p. 2073-2090.

SLIDE 19

Regression Tree

(Figure: tree with splits X1 ≤ t1, X2 ≤ t2, X1 ≤ t3, X2 ≤ t4 defining regions R1-R5.)

  • Divide the data into distinct, non-overlapping regions R1, …, RJ
  • Below, yi = color = continuous target (blue = 1 and red = 0)
  • xi, i = 1, …, 5 samples
  • xi = (X1, X2), with P = 2 features
  • j = 1, …, 5 (5 regions)

Hastie et al 2017, Chap. 9 p 307.

SLIDE 20

Tree-building

(Figure: partition of the (X1, X2) plane by split points t1-t4 into rectangles R1-R5.)

  • Or, consecutively partition a region into non-overlapping rectangles
  • yi = color = continuous target (blue = 1 and red = 0)
  • xi, i = 1, …, 5 samples
  • xi = (X1, X2), with P = 2 features
  • j = 1, …, 5 (5 regions)

Hastie et al 2017, Chap. 9 p 307.

SLIDE 21

Regression Tree

(Figure: the (X1, X2) partition again, with region R2 highlighted.)

  • How to optimize a regression tree?
  • Randomly select t1
  • Assign region labels. Example, for splitting variable j and split point s (here j = 1, s = t1):

    R1(j, s) = {X | Xj ≤ s} and R2(j, s) = {X | Xj > s}

  • The fitted value in each region is the average of the targets it contains:

    ĉm = ave(yi | xi ∈ Rm)

SLIDE 22

Regression Tree

(Figure: the (X1, X2) partition with fitted values ĉ1 and ĉ2 on either side of t1.)

  • Compute the cost of the tree, Qm(T): the squared error within each region,

    Qm(T) = (1/Nm) Σ_{xi ∈ Rm} (yi − ĉm)²,  with ĉm = ave(yi | xi ∈ Rm)

  • Minimize Qm(T) by changing t1
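
A minimal sketch of fitting a regression tree with scikit-learn (the toy data and max_leaf_nodes=5 are illustrative, not from the lecture):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy data: two features X1, X2 and a continuous target y
rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 2))
y = (X[:, 0] > 0.5).astype(float) + 0.1 * rng.normal(size=200)

# The tree greedily chooses split variables/points that minimize squared error per region
tree = DecisionTreeRegressor(max_leaf_nodes=5)   # at most 5 regions R1..R5
tree.fit(X, y)
print(tree.predict([[0.3, 0.7]]))                # average of y in the matching region
```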

SLIDE 23

Regression Tree

(Figure: the (X1, X2) partition with region R2 highlighted.)

  • Algorithm to build a tree Tb, repeated at each node:
    1. Select m of the p input variables at random.
    2. Pick the best variable/split point among the m.
    3. Split the node into two daughter nodes.
  • In our simple case, m = 1 and p = 2
  • Daughter nodes are equivalent to regions

SLIDE 24

Bootstrap samples

  • Select a subset of the total samples, (x*i, y*i), i = 1, …, N
  • Draw samples uniformly at random with replacement
  • Example: if there are 5 original samples (i = 1, …, 5), we could choose N = 2 of them
  • Samples are drawn assuming equal probability: if (xi, yi) appears more than once, it is more likely to be drawn
  • (X, Y) are drawn from the empirical distribution F̂ of the data:

    P̂_F̂((X, Y) = (x, y)) = 1/n  if (x, y) = (xi, yi) for some i,  and 0 otherwise
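
A minimal sketch of drawing one bootstrap sample with NumPy (the arrays and sample sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # 5 original samples
y = np.array([0, 1, 0, 1, 1])

# Draw N indices uniformly at random WITH replacement: each pair has probability 1/n
N = 5
idx = rng.choice(len(x), size=N, replace=True)
x_star, y_star = x[idx], y[idx]           # the bootstrap sample (x*_i, y*_i)
```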
SLIDE 25

Random Forest

  • Example of binary classification tree from Hastie et al 2017
  • Orange: trained on all data
  • Green: trained from different bootstrap samples
  • Then, average the (green) trees

Hastie et al 2017, Chap. 8 p. 284

SLIDE 26

Random Forest

  • Bootstrap + bagging => a more robust RF on future test data
  • Train each tree Tb on its own bootstrap sample

Hastie et al 2017, Chap. 15 p. 588

SLIDE 27

Timeseries (TS)

  • Timeseries: one or more variables sampled in the same location at successive time steps
SLIDE 28

ARMA

  • Autoregressive moving-average:
  • (Weakly) stationary stochastic process
  • Models the process and its errors as polynomials of prior values
  • Autoregressive (order p): linear model of past (lagged) values used to predict future values
    • p lags; φi are the (weight) parameters; c is a constant; εt is white noise (WGN)
    • Note: for stationary processes, |φi| < 1

    X_t = c + Σ_{i=1}^{p} φ_i X_{t−i} + ε_t

  • Moving-average (order q): linear model of past errors
    • q lags
    • Below, assume ⟨X_t⟩ = 0 (expectation is 0)

    X_t = c + Σ_{i=1}^{q} θ_i ε_{t−i} + ε_t

SLIDE 29

ARMA

  • Autoregressive moving-average:
  • (Weakly) stationary stochastic process
  • Linear model of prior values = expected value term + error term + WGN
  • ARMA(p, q) = AR(p) + MA(q):

    X_t = c + Σ_{i=1}^{p} φ_i X_{t−i} + Σ_{i=1}^{q} θ_i ε_{t−i} + ε_t

SLIDE 30

Data retrieval

Just a few public data sources for physical sciences…

  • NOAA: reanalysis/model data, research cruises, station observations, gridded data products, atmospheric & ocean indices timeseries, heat budgets, satellite imagery
  • NASA: EOSDIS, gridded data products (atmospheric), satellite imagery, reanalysis/model data, meteorological stations, DAACs in the US
  • IMOS: ocean observing, hosted by the Australian Ocean Data Network
  • USGS Earthquake Archives
  • CPC/NCEI: gridded and raw meteorological and oceanographic data
  • ECMWF: global-scale weather forecasts and assimilated data

… Possible data formats:

  • CSV
  • NetCDF
  • HDF5/HDF-EOS
  • Binary
  • JPEG/PNG
  • ASCII text
  • …

SLIDE 31

Basic data cleaning

  • “[ML for physical sciences] is 80% cleaning and 20% models” ~ paraphrased, Dr. Gerstoft
  • Basic cleaning of NOAA GSOD for the HW was necessary (see the sketch below):
    • Remove unwanted variables (big data is slow)
    • Replace “9999” filler values with NaN
    • Convert strings to floats (e.g. for wind speed)
    • Create a DateTime index
  • Physical data needs cleaning and reorganizing
  • Even quality-controlled data still causes bugs
  • Cleaning is application-specific
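
A minimal pandas sketch of those steps (the GSOD-style column names and sentinel values are illustrative; check the actual data):

```python
import numpy as np
import pandas as pd

def clean_gsod(df: pd.DataFrame) -> pd.DataFrame:
    df = df[["year", "mo", "da", "temp", "wdsp"]].copy()          # drop unwanted variables
    df = df.replace([999.9, 9999.9, "999.9", "9999.9"], np.nan)   # filler values -> NaN
    df["wdsp"] = df["wdsp"].astype(float)                         # strings -> floats
    dates = df[["year", "mo", "da"]].rename(columns={"mo": "month", "da": "day"})
    df.index = pd.to_datetime(dates)                              # DateTime index
    return df.sort_index()
```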
SLIDE 32

Data for HW

  • BigQuery:
  • Data warehouse hosted by Google; NOAA GSOD is available as a public dataset
  • Must have a Google account
  • 1 TB of queries free per month

NOAA GSOD dataset

SLIDE 33

Data for HW

  • How to get BigQuery data?
  • bigquery package in a Jupyter Notebook (queries are written in SQL)
  • More complex queries may include dataframe joins, aggregations, or subsetting

Workflow: yearly datasets → simple SQL query → query the client and convert to a pandas DataFrame → pickle the DataFrame (see the sketch below).
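
A minimal sketch of that workflow with the google-cloud-bigquery client (credentials setup is omitted; the table follows the public NOAA GSOD naming, and the station ID and columns are illustrative):

```python
from google.cloud import bigquery

client = bigquery.Client()  # requires a Google Cloud project and credentials

query = """
    SELECT stn, year, mo, da, temp, wdsp
    FROM `bigquery-public-data.noaa_gsod.gsod2018`
    WHERE stn = '710040'
"""
df = client.query(query).to_dataframe()   # run the SQL query, convert to pandas
df.to_pickle("gsod_2018.pkl")             # pickle the DataFrame for later use
```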

SLIDE 34

Tutorial Notebook

  • Open “In-Class Tutorial”
  • We will do:

1. Load preprocessed data
2. Define a timeseries index
3. Look at the data
4. Visualize a station
5. Detrend the data
6. Smooth the data
7. Try an ARMA model

SLIDE 35

Tutorial Notebook

  • Load packages, (pre-processed) data with Pandas

Steps: import packages → load data → find where data is after 2008.

SLIDE 36

Timeseries processing

  • We may be missing data, but that’s ok for now
  • Replace with neighbor data, smooth, fill with mean

missing data

SLIDE 37

Tutorial Notebook

Basemap is handy, but it can cause some problems if you run it on your laptop.

SLIDE 38

Timeseries processing

  • Remove the mean (slope = 0) or a linear trend (slope ≠ 0)? Here: linear (see the sketch below).
  • What can we learn from the trend?
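
A minimal sketch of removing a linear trend with SciPy (the DataFrame and column name are illustrative assumptions):

```python
from scipy.signal import detrend

temp = df["temp"].interpolate().values         # assumed temperature series, gaps filled
temp_detrended = detrend(temp, type="linear")  # type="constant" would only remove the mean
trend = temp - temp_detrended                  # the removed linear trend itself
```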
SLIDE 39

Timeseries processing

  • Smoothing: median filter
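
A minimal sketch of median-filter smoothing, applied here to the detrended series from the previous sketch (the window length is illustrative and must be odd):

```python
from scipy.signal import medfilt

temp_smooth = medfilt(temp_detrended, kernel_size=31)  # 31-sample median filter
```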
SLIDE 40

Tutorial Notebook

  • Shortened timeseries – Y2018 (final 10%)
  • ARMA most effective predicting one step at a time
SLIDE 41

Tutorial Notebook

  • Is ARMA a machine learning technique? (I think so..)
  • Filtering method (like Kalman filter)
  • Data-driven
  • Maximum likelihood
  • Conclusion: statistics-based
SLIDE 42

Tutorial Notebook

  • Autocorrelation:
  • A statistical method to find temporal (or spatial) relations in data
  • When can we reject the null hypothesis that the data are statistically similar?
  • E.g. how many time steps before the data are decorrelated?

~40 lags
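
A minimal sketch of inspecting the decorrelation scale with statsmodels (the series and number of lags are illustrative):

```python
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf

# Bars outside the shaded confidence band are significantly correlated lags
plot_acf(temp_detrended, lags=60)
plt.show()
```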

SLIDE 43

Tutorial Notebook

  • The median filter increases the decorrelation scale by smoothing over neighboring samples
  • The raw series is more random
  • Use the raw timeseries

~3 lags

SLIDE 44

Tutorial Notebook

  • ARMA algorithm:

1. Train on all previous data
2. Predict one time step
3. Add the next value to the training data
4. Repeat (see the sketch below)
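
A minimal sketch of that walk-forward loop with statsmodels (the series, the hold-out length, and the ARMA order (p, q) = (2, 1) are illustrative; statsmodels exposes ARMA as ARIMA with d = 0):

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

series = temp_detrended            # assumed 1-D array of (detrended) temperatures
n_test = 30                        # hold out the final 30 steps
history = list(series[:-n_test])
predictions = []

for t in range(n_test):
    model = ARIMA(history, order=(2, 0, 1))            # AR(2) + MA(1), no differencing
    fit = model.fit()                                   # 1. train on all previous data
    predictions.append(fit.forecast(steps=1)[0])        # 2. predict one time step
    history.append(series[len(series) - n_test + t])    # 3. add the true next value, 4. repeat
```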

SLIDE 45

Homework

  • How to load and preview data with Pandas

Steps: import packages → load data → find where data is after 2008.

SLIDE 46

Homework

  • How to load and preview data with Pandas

Notebook annotations: the column of all temperature entries and the corresponding recording station.

SLIDE 47

Homework

  • Randomly select a station
  • Check if the station has enough data
  • You may reduce “3650” to a lower number, e.g. 1000, but be aware you may have NaNs in the data – just look at it!

Steps: select a random station → find the data that matches the station → remove data related to temperature.

SLIDE 48

Homework

  • Manually time-delay data
  • Pandas “shift()”

Steps: Pandas shift() → remove the first 3 entries (see the sketch below).
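
A minimal sketch of building time-delayed features with pandas shift() (the column names and the choice of 3 lags are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"temp": temp_series})          # assumed temperature series
for lag in (1, 2, 3):
    df[f"temp_lag{lag}"] = df["temp"].shift(lag)  # value from `lag` days earlier

df = df.iloc[3:]   # remove the first 3 entries, which now contain NaN lags
```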

SLIDE 49

Homework

  • (Map is supposed to show red “X” for station)

I can barely see it!!

SLIDE 50

Homework

  • Snapshots from “timeseries_prediction_Temp.ipynb”

Steps: training label = temperature → split the data into train/test with the help of sklearn → scale the features (scaling improves learning); see the sketch below.
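
A minimal sketch of that split-and-scale step with scikit-learn, building on the lagged DataFrame from the earlier sketch (column names are illustrative):

```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X = df[["temp_lag1", "temp_lag2", "temp_lag3"]].values
y = df["temp"].values                                  # training label = temperature

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=False)                # keep time order for a timeseries

scaler = StandardScaler().fit(X_train)                 # scaling features improves learning
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)
```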

SLIDE 51

Homework

  • Random forest model in a couple lines
  • You may want to write a “plot.py” function

Steps: define, train, and predict with the random forest → plot true and predicted temperature → look at the feature importances (see the sketch below).
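
A minimal sketch of those few lines with scikit-learn, continuing from the split above (hyperparameters are illustrative):

```python
from sklearn.ensemble import RandomForestRegressor

rf = RandomForestRegressor(n_estimators=100, random_state=0)
rf.fit(X_train, y_train)                  # define and train the random forest
y_pred = rf.predict(X_test)               # predict temperature on the test set

# A plot.py-style helper could plot y_test vs. y_pred here
print(dict(zip(["temp_lag1", "temp_lag2", "temp_lag3"], rf.feature_importances_)))
```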

SLIDE 52

Homework

  • Congratulations!
  • We showed that tomorrow's temperature is usually similar to today's (at this Canadian station)

An unsurprising result, but it validates our intuition.

SLIDE 53


Takeaway for your projects and beyond:

Have some dataset of interest but it has < ~1M images?

  • 1. Find a very large dataset that has similar data, and train a big ConvNet there
  • 2. Transfer learn to your dataset

Deep learning frameworks provide a “Model Zoo” of pretrained models so you don't need to train your own:
  • Caffe: https://github.com/BVLC/caffe/wiki/Model-Zoo
  • TensorFlow: https://github.com/tensorflow/models
  • PyTorch: https://github.com/pytorch/vision