Practical Advances in Machine Learning: A Computer Science Perspective – PowerPoint PPT Presentation



SLIDE 1

Practical Advances in Machine Learning: A Computer Science Perspective

Scott Neal Reilly & Jeff Druce Charles River Analytics Prepared for 2017 Workshop on Data Science and String Theory November 30 – December 1, 2017

SLIDE 2

Objectives of this breakout session

• Quick review of machine learning “from a CS perspective”
• Review some of the latest advances in machine learning
• Tips for using ML
• Discussion of academic/industrial collaboration
  • Opportunities/challenges
• Discussion about all of the above

SLIDE 3

• Charles River Analytics
  • 160 people, 30-year history
  • Mostly government contract R&D
  • AI, ML, robotics, computer vision, human sensing, computational social science, human factors

• Scott Neal Reilly
  • PhD, Computer Science, Carnegie Mellon University
  • Senior Vice President & Principal Scientist, Charles River Analytics
  • Focus on ensemble machine learning and causal learning

• Jeff Druce
  • PhD, Civil Engineering, University of Minnesota
  • BS, Applied Math and Physics, University of Michigan
  • Scientist, Charles River Analytics
  • Focus on deep learning, GANs, signal processing + ML

Introductions

SLIDE 4

Question: What can machine learning do for me?

SLIDE 5

Machine learning is about getting computers to perform tasks that I don’t want to or don’t know how to tell them to do. What kinds of tasks? How do they learn if I don’t tell them?

Simple Definition

SLIDE 6

• Dimension #1: Data
  • What kind of data do I have?
  • What are the properties of the data?

• Dimension #2: Objective/Task
  • What is it that is being learned?
  • What are the computational/time constraints on learning/execution?

• These tend to suggest particular techniques

Dimensions of a Machine Learning Problem

SLIDE 7

• Sub-Dimension #1: What kind of data do I have?

  • Labeled: supervised

Dimension #1: Data

SLIDE 8

• Sub-Dimension #1: What kind of data do I have?

  • Labeled: supervised
  • Unlabeled: unsupervised

Dimension #1: Data

SLIDE 9

• Sub-Dimension #1: What kind of data do I have?

  • Labeled: supervised
  • Unlabeled: unsupervised
  • Partially labeled: semi-supervised

Dimension #1: Data


SLIDE 11

• Sub-Dimension #1: What kind of data do I have?

  • Labeled: supervised
  • Unlabeled: unsupervised
  • Partially labeled: semi-supervised
  • An environment that can label data for you: exploratory
    • Active learning, reinforcement learning

Dimension #1: Data


SLIDE 13

• Sub-Dimension #2: What are the properties of the data?

  • How much is there?
  • How noisy is it?
  • How many features are there?

Dimension #1: Data

SLIDE 14

• Classification
  • Given features of X, what is X?
  • Supervised, unsupervised, semi-supervised, etc.

• Regression
  • Given features of X, what is the value of feature Y?
  • Linear regression, symbolic regression/genetic programming, etc.

• Dimensionality reduction
  • Given features of X, can I describe X with fewer features that are comparably descriptive?
  • Principal component analysis, latent Dirichlet allocation, etc.

• Anomaly detection
  • Given features of X, is X unusual given other X’s?
  • Principal component analysis, support vector machines, etc.

• Process learning
  • Given task T, how do I decide what action A (or plan P) will accomplish T?
  • Reinforcement learning, genetic programming, RNNs, etc.

• Structure learning
  • Given variables V, how do they relate to each other?
  • Statistical relational learning, etc.

• Model learning
  • Discriminative vs. generative models
  • Learn p(class|features) or p(features|class), respectively

Dimension #2: What is the learning task?
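To make the classification row concrete, here is a toy nearest-neighbor classifier (an illustrative plain-Python sketch, not from the slides): given the features of a query point X, it answers “what is X?” by majority vote among the k closest labeled examples.

```python
from collections import Counter
import math

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest labeled points.

    `train` is a list of (features, label) pairs; `query` is a feature tuple.
    """
    by_dist = sorted(train, key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in by_dist[:k])
    return votes.most_common(1)[0][0]

# Two clusters of labeled (i.e., supervised) data
train = [((0.0, 0.1), "cat"), ((0.2, 0.0), "cat"), ((0.1, 0.2), "cat"),
         ((1.0, 1.1), "dog"), ((0.9, 1.0), "dog"), ((1.1, 0.9), "dog")]

print(knn_predict(train, (0.1, 0.1)))  # near the first cluster -> "cat"
print(knn_predict(train, (1.0, 1.0)))  # near the second cluster -> "dog"
```

The same "given features of X" framing carries over to regression by averaging neighbors' values instead of voting on labels.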

SLIDE 15

• Given what data is available and the task, pick from…

  • Neural nets / deep learning
  • Bayesian learning
  • Statistical relational learning
  • Symbolic/rule learning
  • Reinforcement learning
  • Genetic programming
  • Other approaches
    • kNN, SVM, logistic regression, decision trees/forests

Some Approaches to ML

SLIDE 16

Question: What are some of the interesting recent advances in machine learning?

SLIDE 17

Advance #1: Deep Learning

• Convolutional Neural Networks
• Deep Reinforcement Learning
• Generative Adversarial Networks

SLIDE 18

Convolutional Neural Networks

• In traditional image/signal processing and learning problems, human-crafted features are used to transform the images into a more informative space.
• However, using human-designed features does not leverage the computational power of modern-day computers/GPUs!
• To perform better classification, we let a deep neural network learn optimal features that can best separate the data.
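As an illustration of what a “feature” means here, the sketch below slides a single hand-crafted edge filter over a tiny image, which is exactly the operation a convolutional layer computes; in a real CNN the filter weights are learned by backpropagation rather than fixed (plain-Python sketch, illustrative only).

```python
def conv2d(image, kernel):
    """Valid-mode 2-D convolution (really cross-correlation, as in CNNs)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

# 4x4 "image" with a vertical edge down the middle
image = [[0, 0, 1, 1]] * 4

# Hand-crafted vertical-edge filter; in a CNN these weights are *learned*
kernel = [[-1, 1]]

feature_map = conv2d(image, kernel)
print(feature_map)  # responds only where the edge is
```

Stacking many such learned filters, interleaved with nonlinearities and pooling, is what turns raw pixels into the informative space the slide describes.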

SLIDE 19

Convolutional Neural Networks

[Diagram: raw input → classification]

SLIDE 20

Convolutional Neural Networks

[Diagram: raw input → automated feature extraction (CNN) → classification]

SLIDE 21

Fully Convolutional Networks

Fully Convolutional Networks for Segmentation

SLIDE 22

CNNs for non-image problems

Natural Language Processing – Text Classification

SLIDE 23

CNNs for non-image problems

Natural Language Processing – Text Classification
Signal Processing – Stereotypical Motor Movement Detection in Autism

SLIDE 24

CNNs: Tools for Local Structure Mining

• What do all problems where leveraging CNNs is effective have in common?
• CNNs mine high-dimensional data where proximal input features possess some structure which can be exploited to achieve some task.

SLIDE 25

CNNs: Tools for Local Structure Mining

• What do all problems where leveraging CNNs is effective have in common?
• CNNs mine high-dimensional data where proximal input features possess some structure which can be exploited to achieve some task.
• Lots of proximal structure!
• What problems are you facing where subtle, complex, embedded local structures could potentially be exploited?

SLIDE 26

Advance #1: Deep Learning

• Convolutional Neural Networks
• Deep Reinforcement Learning
• Generative Adversarial Networks

SLIDE 27

Reinforcement Learning

[Diagram: Agent, Observed State, Goal]

SLIDE 28

Reinforcement Learning

[Diagram: Agent, Observed State, Policy, Goal]

SLIDE 29

Reinforcement Learning

[Diagram: Agent, Observed State, Policy, Goal]

How can we learn an optimal policy to achieve the goal?

SLIDE 30

Deep Reinforcement Learning

Episodes

SLIDE 31

Deep Reinforcement Learning

Episodes

• Learn the best policy through a series of training episodes.
• Training uses an action-value function (aka Q function): the expected return for taking an action and then following some policy.

SLIDE 32

Deep Reinforcement Learning (Q learning)

Episodes

• Traditionally, a linear function approximator was used; DRL uses a deep net to approximate Q.
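The Q update is easiest to see in the classic tabular setting. The sketch below (illustrative assumptions, not from the slides: a five-state corridor, reward only at the goal, epsilon-greedy exploration) learns a policy over training episodes; DRL replaces the lookup table with a deep net.

```python
import random

random.seed(0)

# Toy corridor: states 0..4, start at state 0, reward only for reaching state 4.
N_STATES, GOAL = 5, 4
ACTIONS = (-1, +1)          # step left / step right
alpha, gamma, eps = 0.5, 0.9, 0.1

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(s, a):
    s2 = min(max(s + a, 0), GOAL)
    return s2, (1.0 if s2 == GOAL else 0.0)

for _ in range(200):                      # training episodes
    s = 0
    while s != GOAL:
        # epsilon-greedy action selection
        if random.random() < eps:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s2, r = step(s, a)
        # Q-learning update: nudge Q(s,a) toward r + gamma * max_a' Q(s',a')
        best_next = max(Q[(s2, act)] for act in ACTIONS)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s2

policy = {s: max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(GOAL)}
print(policy)  # the learned greedy policy steps right, toward the goal
```

When the state space is too large for a table (Atari frames, Go boards), the deep net stands in for `Q`, which is the jump from Q-learning to DRL.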

SLIDE 33

DRL Successes

Bots are now the world champion in…

• A variety of Atari games – Mnih et al. (DeepMind)
• Go – AlphaZero (DeepMind)
• Dota 2 – OpenAI

SLIDE 34

DRL Successes

Bots are now the world champion in…

• A variety of Atari games – Mnih et al. (DeepMind)
• Go – AlphaZero (DeepMind)
• Dota 2 – OpenAI

Is DRL only good for games?

SLIDE 35

DRL – What can it do?

• Natural Language Processing
• Intelligent Transportation Systems: Bojarski et al. (2017)
• Text Generation
• Understanding Deep Learning: Daniely et al. (2016)
• Deep Probabilistic Programming: Tran et al. (2017)
• Machine Translation: He et al. (2016a)
• Building Compact Networks

SLIDE 36

DRL – What can it do?

• Natural Language Processing
• Intelligent Transportation Systems: Bojarski et al. (2017)
• Text Generation
• Understanding Deep Learning: Daniely et al. (2016)
• Deep Probabilistic Programming: Tran et al. (2017)
• Machine Translation: He et al. (2016a)
• Building Compact Networks

DRL can be used where a large, diverse state space makes it difficult to explore all possible strategies, and where actions may have latent effects that at some point become very important in achieving a task.

SLIDE 37

Advance #1: Deep Learning

• Convolutional Neural Networks
• Deep Reinforcement Learning
• Generative Adversarial Networks

SLIDE 38

38 Proprietary

Generative Adversarial Networks (GANs)

[Diagram: generator G vs. discriminator D]

The example: we can think of G as a counterfeiter attempting to produce fake money such that it cannot be detected by the discriminative false-currency-detecting agent D.

SLIDE 39


What are GANs? – Improving G

[Diagram: alternating GAN training. (1) Learn an initial generative model; (2) update D, which gives the probability a sample is real, against the training set; (3) update G, which produces samples; repeat. The generator distribution moves p_g1 → p_g2 → p_g3 toward the data distribution p_data.]
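The alternating loop can be caricatured in one dimension (an illustrative sketch, not a real GAN: Gaussian “real” data, a shift-only generator, a logistic discriminator, and numerical gradients standing in for backprop, all assumptions of this example):

```python
import math, random

random.seed(1)

def log_sigmoid(t):
    """Numerically stable log(sigmoid(t))."""
    return -math.log1p(math.exp(-t)) if t >= 0 else t - math.log1p(math.exp(t))

def d_loss(w, b, theta, reals, zs):
    """Discriminator loss: score real samples as real, generated ones as fake."""
    fakes = [z + theta for z in zs]
    return -(sum(log_sigmoid(w * x + b) for x in reals) +
             sum(log_sigmoid(-(w * x + b)) for x in fakes)) / len(reals)

def g_loss(w, b, theta, zs):
    """Non-saturating generator loss: make D score the fakes as real."""
    return -sum(log_sigmoid(w * (z + theta) + b) for z in zs) / len(zs)

def grad(f, x, h=1e-4):
    """Numerical gradient, standing in for backprop to stay dependency-free."""
    return (f(x + h) - f(x - h)) / (2 * h)

w, b, theta, lr = 0.1, 0.0, 0.0, 0.05
for _ in range(2000):
    reals = [random.gauss(4.0, 0.5) for _ in range(16)]  # "real money"
    zs = [random.gauss(0.0, 0.5) for _ in range(16)]     # noise fed to G
    w -= lr * grad(lambda v: d_loss(v, b, theta, reals, zs), w)  # update D
    b -= lr * grad(lambda v: d_loss(w, v, theta, reals, zs), b)
    theta -= lr * grad(lambda v: g_loss(w, b, v, zs), theta)     # update G

print(theta)  # G's output mean has drifted toward the data mean
```

The counterfeiter/detector story plays out directly: D's updates sharpen the boundary between real and fake samples, and G's updates shift its output until D can no longer tell the difference.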

SLIDE 40


GANs – Successes

[Examples: HD face generation, next-frame prediction, text-to-image generation from noise input]

SLIDE 41


GANs – What can they do?

• In just a short time, GANs have proven to be an extremely ripe area for research
• Image, music, and art generation
• Super-resolution
• Domain transformation (sketch <-> photo, satellite -> map)
• Advanced malware software training

GANs can be used where one wants to sample from a complex distribution that describes the structure of some training set, but produce novel instances.

SLIDE 42

Advance #2: Probabilistic Programming

SLIDE 43

Probabilistic Reasoning: The Gist


Probabilistic model expresses general knowledge about a situation

SLIDE 44

Probabilistic Reasoning: The Gist


Evidence contains specific information about a situation

SLIDE 45

Probabilistic Reasoning: The Gist


Queries express things that will help you make a decision

SLIDE 46

Probabilistic Reasoning: The Gist


Answers to queries are framed as probabilities of different outcomes
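The model/evidence/query/answer pattern can be shown end to end with a toy model and inference by enumeration (the rain/sprinkler/wet-grass variables and their numbers are hypothetical, chosen only for illustration):

```python
from itertools import product

# General knowledge (the model): priors and a conditional probability table.
P_rain = {True: 0.2, False: 0.8}
P_sprinkler = {True: 0.1, False: 0.9}
def p_wet(rain, sprinkler):
    return {(True, True): 0.99, (True, False): 0.9,
            (False, True): 0.8, (False, False): 0.01}[(rain, sprinkler)]

def query_rain_given_wet():
    """Query P(rain | wet) by enumerating worlds consistent with the evidence."""
    weight = {True: 0.0, False: 0.0}
    for rain, sprinkler in product([True, False], repeat=2):
        # Evidence (the specific situation): the grass is wet.
        weight[rain] += P_rain[rain] * P_sprinkler[sprinkler] * p_wet(rain, sprinkler)
    total = weight[True] + weight[False]
    return weight[True] / total           # the answer, as a probability

print(query_rain_given_wet())  # roughly 0.72
```

Probabilistic programming languages automate exactly this: you write the model as a program, assert evidence, and the system runs inference for you.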

SLIDE 47


Probabilistic Reasoning: Predicting the Future

SLIDE 48

Probabilistic Reasoning: Inferring Factors that Caused Obs.


SLIDE 49

Probabilistic Reasoning: Using the Past for Prediction


The evidence contains knowledge of:

  • Preconditions and outcomes of previous situations
  • Preconditions of the current situation

SLIDE 50

• The “Corner Kick Model”
  • Not object oriented
  • No recursion
  • No loops
  • No way to integrate complex simulation models

• The “Inference Algorithm”
  • There is no such thing
  • There are lots of them, with different properties

• Hard to use in larger systems

Limitations on Probabilistic Reasoning Systems

SLIDE 51

• Figaro!
  • https://github.com/p2t2/figaro
  • Your model is a program
  • Figaro is built on Scala
  • Loops, recursion, objects
  • You can pick an included inference algorithm or let the system pick
  • Integration easy in both directions
    • E.g., deep net integration is an active area of exploration

Probabilistic Programming Languages

SLIDE 52

Advance #3: Ensemble Machine Learning

SLIDE 53

• Sometimes there is no algorithm that alone does what you need
• Sometimes there is, but you don’t know what it is
• What to do then?

• Ensemble machine learning has shown that it can often outperform any individual ML technique
• Think hurricane tracking

Advance #3: Ensemble Machine Learning
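A minimal technique ensemble is just majority voting over several learners. The three “models” below are hypothetical stand-ins for trained classifiers, but they show why the vote can beat any single member: each errs in a different region, and the majority cancels the individual mistakes.

```python
from collections import Counter

def ensemble_predict(models, x):
    """Technique ensemble: let several learners vote on each input."""
    votes = Counter(m(x) for m in models)
    return votes.most_common(1)[0][0]

# Three deliberately imperfect "models" (hypothetical stand-ins for
# trained classifiers); the true boundary is at x = 0.
m1 = lambda x: "pos" if x > 0 else "neg"    # correct boundary
m2 = lambda x: "pos" if x > -1 else "neg"   # biased toward "pos"
m3 = lambda x: "pos" if x > 1 else "neg"    # biased toward "neg"

print(ensemble_predict([m1, m2, m3], 0.5))   # votes pos, pos, neg -> "pos"
print(ensemble_predict([m1, m2, m3], -0.5))  # votes neg, pos, neg -> "neg"
```

This is the hurricane-tracking intuition in miniature: no single forecast model is trusted alone, but the consensus track is usually better than any one of them.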

SLIDE 54

Types of Ensembles

• Data ensembles
• Chain ensembles
• Technique ensembles
• Nested ensembles

SLIDE 55

Enhanced Technique Ensembles

SLIDE 56

Advance #4: Explainable Machine Learning

SLIDE 57

Advance #4: Explainable Machine Learning

• From before, letting the network pick what features are used leads to enhanced performance
• However, what guarantees are there that the features learned by the network will be human-interpretable?

Automated Feature Extraction (CNN)

SLIDE 58

Advance #4: Explainable Machine Learning

• From before, letting the network pick what features are used leads to enhanced performance
• However, what guarantees are there that the features learned by the network will be human-interpretable?
• Answer: None!

Automated Feature Extraction (CNN)

This problem is not confined to CNNs; opacity is a problem across many areas of DL (and ML/AI in general). Coming into effect in Europe in 2018: the General Data Protection Regulation (GDPR).

SLIDE 59

How can ML explain itself?

• Visual-based methods (for CNNs)
  • Attention maps
  • Saliency maps
  • Gradient maps
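A gradient/saliency map boils down to asking how much the output changes as each input feature is perturbed. The sketch below does this by finite differences on a hypothetical stand-in scoring function (a real saliency map would instead backprop through the CNN to get a per-pixel gradient image):

```python
def model(x):
    """A stand-in 'black box' scorer (hypothetical, not a real CNN):
    it mostly keys on features 0 and 2."""
    return 3.0 * x[0] - 2.0 * x[2] + 0.1 * x[1]

def saliency(f, x, h=1e-5):
    """Gradient-style saliency: |d f / d x_i| for each input feature."""
    sal = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        sal.append(abs((f(xp) - f(xm)) / (2 * h)))
    return sal

x = [0.5, 0.5, 0.5]
print(saliency(model, x))  # features 0 and 2 dominate the explanation
```

Rendering these magnitudes over an image grid is what turns this idea into the heat-map style explanations shown for CNNs.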

SLIDE 60

• Develop a causal model of the network in question
• Probe the network for human-understandable concepts – look at activations
• Use interventions to demonstrate causality
• In other approaches, it can be easy to “hallucinate” a cause-effect relationship

CRA approach for explainable ML

[Pipeline diagram: ML system internal data → concept inference → causal learning → causal model; outputs classification plus concepts]

SLIDE 61

Example Causal Model (Causal Graphical Model)


[Causal graphical model nodes: Time (Day of Year, Day of Week, Time of Day); Weather (Temp, Precip); Location (Urban %, Airport/K-12/College/Retail Prox); Scene (Person Density, Scene Lighting, Background Clutter, Auto/Bicycle/Hydrant/Constr/Animal Density); Person (Person Size, Clothing, Baggage: High/Low/Rolling/Umbrella); Visibility (Arms, Legs, Head, Face, Torso); DNN Output]

SLIDE 62

Pedestrian Detection: Causal Learning


Interventions: none, pedestrian outline, color

Image → Node 3 activation:
  • No intervention: 110
  • Color + outline: 182
  • Average activation for positive images: 197

SLIDE 63

These are Pedestrians (according to Node 3)


[Images: activation maximization; an adversarial image (selected based on the activation-maximization analysis)]

SLIDE 64


Explainable AI – What can it do?

• Back out information on why a typical “black box” algorithm is doing what it’s doing
• Give augmented example-based explanations (typical in imagery)
• For CRA: back out what human-understandable concepts a network is using, and quantify the importance of those features in some task

Explainable AI is relevant when not only high performance is desired, but also a testable and explorable framework for understanding what the AI is doing “behind the curtain”.

SLIDE 65

Question: How do computer scientists determine which techniques to apply to one type of problem vs another?

SLIDE 66

SLIDE 67

• What kind of problem is it? (Classification vs. regression vs. …)
  • Multiclass? Multilabel?
  • Can your learning method handle this?

• What kind of data do you have available?
  • Could you label some unlabeled data and go semi-supervised?
  • Can you explore the world and use active learning or RL?

• How much data do you have?
  • Deep learning only works with sufficient data
  • Can you augment the data you have?

• How much human expertise is available?
  • Can you quantify the uncertainty in this knowledge?

• What computational horsepower do you have at your disposal?
  • RAM limited? GPUs?

How to choose your algorithm: Some tips (1 of 2)

SLIDE 68

• How noisy is your data? Do you have missing values? How is the data encoded? Are “semantics” misaligned?
  • Data prep is CRITICAL in machine learning!
  • Sometimes data science techniques help
  • Sometimes machine learning can itself help

• Is your data mixed? E.g., double and categorical
  • Can the data be reasonably converted?

• What kinds of relationships do we expect to find?
  • Linear? Non-linear?
  • What kinds of non-linear relationships are plausible?

• Is explainability important, or just performance?
  • E.g., decision trees are implicitly somewhat interpretable; CNNs are not

• Remember ensembles. Maybe you don’t have to choose!

How to choose your algorithm: Some tips (2 of 2)
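As one concrete data-prep step, the sketch below imputes missing numeric values with the column mean and one-hot encodes categoricals (a deliberately minimal, hypothetical example; real pipelines need far more care with leakage, scaling, and rare categories):

```python
def prepare(rows, numeric_keys, categorical_keys):
    """Minimal data prep: mean-impute missing numerics, one-hot categoricals."""
    # Column means over the observed (non-missing) numeric values
    means = {}
    for k in numeric_keys:
        vals = [r[k] for r in rows if r.get(k) is not None]
        means[k] = sum(vals) / len(vals)
    # Category vocabularies, sorted for a stable encoding
    vocab = {k: sorted({r.get(k, "missing") for r in rows})
             for k in categorical_keys}
    out = []
    for r in rows:
        vec = [r[k] if r.get(k) is not None else means[k] for k in numeric_keys]
        for k in categorical_keys:
            vec += [1.0 if r.get(k, "missing") == v else 0.0 for v in vocab[k]]
        out.append(vec)
    return out

rows = [{"age": 30, "color": "red"},
        {"age": None, "color": "blue"},
        {"age": 50, "color": "red"}]
print(prepare(rows, ["age"], ["color"]))  # missing age filled with the mean, 40
```

Converting mixed double/categorical data into one numeric vector per row, as here, is the “can the data be reasonably converted?” question in practice.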

SLIDE 69

SLIDE 70

SLIDE 71

SLIDE 72

Question: What are the circumstances under which it benefits industry to partner with academics?

SLIDE 73

• Industry-prime / Academic-sub
  • We do this all the time
  • Academics bring cutting-edge ideas we want to build from
  • Academics can bring domain expertise that we lack
    • Note: we have almost no domain expertise in anything our customers care about
  • Can help us open up new customers, research areas, business

• Academic-prime / Industry-sub
  • Industrial research labs can feel academic in many ways
    • Though they tend to be more team-focused than MURIs (Multidisciplinary University Research Initiatives)
  • If the company has relevant capabilities, invite them to the team
  • We are happy to publish research
  • One concern: most companies are for-profit

Academic-Industry Collaborations

SLIDE 74

Question: What do you want to talk about now?

SLIDE 75

• Which ML techniques do you currently use?
  • What are the challenges associated with those techniques?
  • How do we know if the technique is working or not?

• How do you know if you have enough / good-enough data?
  • Can the preexisting data be augmented?
  • Can expert knowledge be incorporated?

• Which ML tools do you currently use?
  • What are the challenges associated with those tools?

• What are the biggest challenges associated with applying ML to string theory problems?
• What are the string theory problems (in layman’s terms if possible!) that are most appropriate for ML to help with?
• What kinds of university-industry collaborations have you engaged in? What worked well or didn’t work so well?

Discussion questions