Prof. Sameer Singh CS 175: PROJECTS IN AI (IN MINECRAFT) WINTER - - PowerPoint PPT Presentation

slide-1
SLIDE 1
  • Prof. Sameer Singh

CS 175: PROJECTS IN AI (IN MINECRAFT) WINTER 2017

April 6, 2017

slide-2
SLIDE 2

Upcoming…

CS 175: PROJECTS IN AI (SPRING 2017) 2

Misc.

  • Check out course webpage and schedule
  • Check out Canvas, especially for deadlines
  • Do the survey by tomorrow, April 7, 2017

Homework

  • Homework 1 will be up soon
  • Meanwhile, install and get Malmo working
  • Due: April 14, 2017

Project

  • Teams are due April 17, 2017; proposals April 21, 2017
  • Start assembling teams now! (use Piazza)
  • Start thinking of project ideas

slide-3
SLIDE 3

Projects in AI in Minecraft

  • Project Overview
  • Some Project Ideas
  • Introduction to Reinforcement Learning

slide-4
SLIDE 4

Projects in AI in Minecraft

  • Project Overview
  • Some Project Ideas
  • Introduction to Reinforcement Learning

slide-5
SLIDE 5

What is AI?

"Artificial intelligence is anything computers can't do yet."
Douglas Hofstadter

https://en.wikipedia.org/wiki/AI_effect
slide-6
SLIDE 6

What can a project be?

  • Research: Do difficult things automatically; Minecraft is just a testbed
  • Practical Tool: Help players do things that are otherwise time-consuming
  • “Art”: Use AI/ML to create stuff in the world
  • Just cool!

slide-7
SLIDE 7

Technical Solution

Use Artificial Intelligence or Machine Learning algorithms

  • Artificial Intelligence
  • Machine Learning
  • Natural Language Processing
  • Computer Vision
  • Supervised Learning
  • Unsupervised Learning
  • Deep Learning
  • Reinforcement Learning
  • Heuristic/Adversarial/Local Search
  • Planning
  • Constraint Satisfaction
  • Logic
  • Bayesian Networks
  • Time Series Modeling
  • Recommendation Systems

slide-8
SLIDE 8

Evaluation

How would YOU define that your project was a success?

Quantitative Evaluation

Numerical Metrics:

  • Accuracy, F1, AUC, …
  • Time to “run”, time to “train”

Baselines:

  • What would be currently used?
  • What are reasonable “simpler” methods?

By how much?

We hope to improve the METRIC by AMOUNT over BASELINE!

(I won’t hold you to it, just want you to think about it)
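
The "improve the METRIC by AMOUNT over BASELINE" template can be made concrete with a few lines of code. The sketch below computes accuracy and F1 for a hypothetical model and a trivial baseline that always predicts the positive class. The labels, predictions, and function names are made up for illustration and are not part of the course material.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # avoids division by zero when nothing is found
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up data, for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_model = [1, 0, 1, 0, 0, 0, 1, 1]   # hypothetical model predictions
y_baseline = [1] * len(y_true)       # baseline: always predict class 1

for name, metric in [("accuracy", accuracy), ("F1", f1_score)]:
    model, baseline = metric(y_true, y_model), metric(y_true, y_baseline)
    print(f"{name}: model={model:.3f} baseline={baseline:.3f} "
          f"improvement={model - baseline:+.3f}")
```

Stating the baseline in code, even a trivial one, makes the AMOUNT in the template something you can measure rather than guess.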

slide-9
SLIDE 9

Evaluation

How would YOU define that your project was a success?

Qualitative Evaluation

Simple Example Cases:

  • What are examples that your idea will “definitely” work on?
  • What is the expected output on these?

Error Analysis and Introspection:

  • Are there plots/figures to verify the behavior?
  • If it doesn’t work, how will you improve it?

The Super-Impressive Example:

  • What is the best example? “awesome if it works”
  • E.g. something that perfectly captures your idea!
slide-10
SLIDE 10

You will have doubts!

  • Is it too simple?
  • Is it too ambitious?
  • Is my evaluation inappropriate?
  • Can I only use off-the-shelf code?
  • Is there data to train my classifier?
  • Is there a different algorithm I should use?

  • Every team has to meet me during Week 4.
  • Both the TA and I are available for appointments
  • Discussion will cover many simple situations
  • Use Piazza!

slide-11
SLIDE 11

Projects in AI in Minecraft

  • Project Overview
  • Some Project Ideas
  • Introduction to Reinforcement Learning

slide-12
SLIDE 12

Projects in AI in Minecraft

  • Course Information
  • Some Project Ideas
  • Introduction to Reinforcement Learning

slide-13
SLIDE 13

Reinforcement Learning

Agent learns to do things by trying things, and succeeding/failing

Navigation

  • Explore the map without dying
  • Solve mazes
  • Learn the best way home from anywhere
  • Get to the highest hill in the map

Combat

  • Learn to hide/find shelter
  • Learn to fight, example paper

Learn Recipes

  • Figure out best way to make items
  • Without any knowledge of the recipes

http://alekhagarwal.net/arxiv_geql.pdf

slide-14
SLIDE 14

Reinforcement Learning

Agent learns to do things by trying things, and succeeding/failing

  • Observation: What the agent sees
  • Action: What the agent can do
  • Reward: What the agent likes/dislikes
      • New Item ++
      • No Item -
      • Goal ++
      • Died ---
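
One way to turn the likes and dislikes on this slide into something an agent can optimize is a table from events to numbers. The sketch below is only an illustration: the slide gives signs and rough magnitudes ("++", "-", "---"), so the numeric values and event names here are assumptions.

```python
# Hypothetical reward values; only the signs and relative sizes
# come from the slide.
REWARDS = {
    "new_item": 2.0,   # New Item ++
    "no_item": -1.0,   # No Item -
    "goal": 2.0,       # Goal ++
    "died": -3.0,      # Died ---
}


def total_reward(events):
    """Sum the rewards for a sequence of event names."""
    return sum(REWARDS[event] for event in events)


# A made-up episode: two empty steps, one new item, then the goal.
episode = ["no_item", "new_item", "no_item", "goal"]
print(total_reward(episode))
```

The relative sizes matter more than the exact numbers: making "died" much worse than "no_item" tells the agent that wandering is preferable to dying.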

slide-15
SLIDE 15

Reinforcement Learning

Next few lectures will go into details (and more ideas)

For now, let’s look at non-RL ideas

slide-16
SLIDE 16

Describe the Scene

Houses and a pig on a grassy field during the day. Pig staring at me in a village.

slide-17
SLIDE 17

Live Commentator

“Hit a rabbit”

slide-18
SLIDE 18

How is this even possible?

Diagram labels:

  • 3 blocks in a line; grass blocks as floor; daylight, clear weather
  • Malmo
  • Machine Learning: Deep Learning, CNN + LSTM
  • “3 blocks in a line”
  • Training Signal

slide-19
SLIDE 19

Many Variations of These

Diagram labels: Your code, Agent/World in Malmo, Render, “Label”, Machine Learning, “Label” (x1000, x100000, x100000)

“Label” variations:

  • object: ~object detection
  • objects: ~caption generation
  • action: ~action detection, “commentary”
  • depth of pixel: ~stereoscopy, depth/distance prediction

slide-20
SLIDE 20

Captions to Speech

Why are you making me read? Pig staring at me in a village.

slide-21
SLIDE 21

Natural Language Navigation

Quite Difficult!

> Go forward till you hit a wall (trivial)
> Go to the pig
> Go to the house on the right
> Go behind the house (hardest)

slide-22
SLIDE 22

Natural Language Interface

Quite Difficult!

> Choose steel pickaxe and dig (trivial)
> Go and destroy that window
> Put the blue block on the closest wall
> Find a tree and chop it (hardest)

slide-23
SLIDE 23

SHRDLU (from 1970!)

http://hci.stanford.edu/winograd/shrdlu/

slide-24
SLIDE 24

Natural Speech to Commands

Why are you making me type?

  • Off-the-shelf Speech to Text systems
  • Online Speech to Text APIs

slide-25
SLIDE 25

Photo to Minecraft Character

Photo of a person → Your Project → Minecraft Skin

  • Need to label data?
  • Can you use existing classifiers, like Visual QA?

slide-26
SLIDE 26

Recipe Planners

Inventory, “Need”(s)

Steps:

> Get 2 wood planks
> Make a stick
> Get 2 diamonds
> Make diamond sword
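
A recipe planner of this kind can be sketched as a recursive expansion of a "need" into raw materials and crafting steps. The recipes, item names, and function names below are simplified assumptions chosen to reproduce the steps on the slide. Real Minecraft recipes differ in quantities and crafting layout, and a real project would read the inventory from the game.

```python
# Simplified, hypothetical recipes: item -> ingredients for one unit.
RECIPES = {
    "stick": {"wood_plank": 2},
    "diamond_sword": {"stick": 1, "diamond": 2},
}


def _plan(item, count, inventory, steps):
    """Append steps so that `inventory` holds at least `count` of `item`."""
    missing = count - inventory.get(item, 0)
    if missing <= 0:
        return  # already have enough
    if item not in RECIPES:
        # Raw material: the agent has to collect it in the world.
        steps.append(f"Get {missing} {item}")
        inventory[item] = count
        return
    for _ in range(missing):
        for ingredient, needed in RECIPES[item].items():
            _plan(ingredient, needed, inventory, steps)
            inventory[ingredient] -= needed  # crafting consumes ingredients
        steps.append(f"Make {item}")
        inventory[item] = inventory.get(item, 0) + 1


def make_plan(need, inventory):
    """Return the list of steps to obtain one `need` from `inventory`."""
    steps = []
    _plan(need, 1, dict(inventory), steps)  # copy: leave caller's dict alone
    return steps


for step in make_plan("diamond_sword", {}):
    print(">", step)
```

With an empty inventory this prints the same four steps as the slide. Starting with a stick in the inventory skips the first two.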

slide-27
SLIDE 27

Lots of other possibilities

http://www.planetminecraft.com/

Many other games in Minecraft

  • Create AI for those?
  • One AI that works for all of those?

slide-28
SLIDE 28

Projects in AI in Minecraft

  • Course Information
  • Some Project Ideas
  • Introduction to Reinforcement Learning

slide-29
SLIDE 29

Projects in AI in Minecraft

  • Course Information
  • Some Project Ideas
  • Introduction to Reinforcement Learning

Based on slides by David Silver

slide-30
SLIDE 30

Reinforcement Learning

slide-31
SLIDE 31

What makes it different?

  • No direct supervision, only rewards
  • Feedback is delayed, not instantaneous
  • Time really matters, i.e. data is sequential
  • Agent’s actions affect what data it will receive

Examples

  • Fly stunt maneuvers in a helicopter
  • Defeat the world champion at Backgammon
  • Manage an investment portfolio
  • Control a power station
  • Make a humanoid robot walk
  • Play many different Atari games better than humans
  • Beat the world champion in Go

slide-32
SLIDE 32

Agent-Environment Interface

Agent

  • decides on an action
  • receives next observation
  • receives next reward

Environment

  • executes the action
  • computes next observation
  • computes next reward
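
The division of labor between agent and environment can be sketched as a loop. Everything below is a toy illustration: `LineWorld`, `RandomAgent`, and the `step`/`act` method names are assumptions made up for this example, not the Malmo API.

```python
import random


class LineWorld:
    """Toy environment: walk along a line from cell 0 to the goal cell."""

    def __init__(self, goal=3):
        self.goal = goal
        self.position = 0

    def step(self, action):
        """Execute the action, then compute next observation and reward."""
        move = 1 if action == "right" else -1
        self.position = max(0, self.position + move)
        done = self.position == self.goal
        reward = 1.0 if done else -0.1   # small cost for every extra step
        return self.position, reward, done


class RandomAgent:
    """Agent that picks an action at random, ignoring the observation."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def act(self, observation):
        return self.rng.choice(["left", "right"])


def run_episode(env, agent, max_steps=100):
    """Run one episode; return (cumulative reward, number of steps)."""
    observation, total, steps = env.position, 0.0, 0
    for _ in range(max_steps):
        action = agent.act(observation)               # agent decides
        observation, reward, done = env.step(action)  # environment responds
        total += reward
        steps += 1
        if done:
            break
    return total, steps


print(run_episode(LineWorld(), RandomAgent(seed=0)))
```

A learning agent would replace `RandomAgent.act` with something that uses the observations and rewards it has received so far.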

slide-33
SLIDE 33

Reward, $R_t$

How well the agent is doing

  • +, positive (Good)
  • −, negative (Bad)

Says nothing about WHY it is doing well; could have little to do with $A_{t-1}$

Agent is trying to maximize its cumulative reward

slide-34
SLIDE 34

Example of Rewards

  • Fly stunt maneuvers in a helicopter
      • +ve reward for following desired trajectory
      • −ve reward for crashing
  • Defeat the world champion at Backgammon
      • +/−ve reward for winning/losing a game
  • Manage an investment portfolio
      • +ve reward for each $ in bank
  • Control a power station
      • +ve reward for producing power
      • −ve reward for exceeding safety thresholds
  • Make a humanoid robot walk
      • +ve reward for forward motion
      • −ve reward for falling over
  • Play many different Atari games better than humans
      • +/−ve reward for increasing/decreasing score
slide-35
SLIDE 35

Sequential Decision Making

  • Actions have long term consequences
  • Rewards may be delayed
  • May be better to sacrifice short term reward for long term benefit

Examples

  • A financial investment (may take months to mature)
  • Refuelling a helicopter (might prevent a crash later)
  • Blocking opponent moves (might eventually help win)
  • Spend a lot of money and go to college (earn more later)
  • Don’t commit crimes (rewarded by not going to jail)
  • Get started on Malmo/project soon (make it an easy quarter)

A key aspect of intelligence: how far ahead are you able to plan?

slide-36
SLIDE 36

Reinforcement Learning

Given an environment (produces observations and rewards)

→ Reinforcement Learning →

Automated agent that selects actions to maximize total rewards in the environment

slide-37
SLIDE 37

Let’s look at the Agent

What does the choice of action depend on?

  • Can you ignore $O_t$ completely?
  • Is just $O_t$ enough? Or $(O_t, A_t)$?
  • Is it last few observations?
  • Is it all observations so far?
slide-38
SLIDE 38

Agent State, $S_t$

History: everything that happened so far

$H_t = O_1 R_1 A_1 O_2 R_2 A_2 O_3 R_3, \ldots, A_{t-1} O_t R_t$

State $S_t$ can be:

  • $O_t$
  • $O_t R_t$
  • $A_{t-1} O_t R_t$
  • $O_{t-3} O_{t-2} O_{t-1} O_t$

In general, $S_t = f(H_t)$

You, as AI designer, specify this function
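
The candidate state definitions on this slide are all different choices of the function f applied to the same history. The sketch below writes each one as a small Python function. The history representation, a list of (observation, reward, action) tuples whose last action is `None`, is an assumption made for this example.

```python
# History: list of (observation, reward, action) tuples, oldest first.
# The last tuple has action None because that action is not chosen yet.


def state_last_observation(history):
    """f(H) = O_t"""
    return history[-1][0]


def state_observation_reward(history):
    """f(H) = O_t R_t"""
    observation, reward, _ = history[-1]
    return (observation, reward)


def state_with_previous_action(history):
    """f(H) = A_{t-1} O_t R_t"""
    previous_action = history[-2][2] if len(history) > 1 else None
    observation, reward, _ = history[-1]
    return (previous_action, observation, reward)


def state_last_k_observations(history, k=4):
    """f(H) = O_{t-3} O_{t-2} O_{t-1} O_t when k = 4"""
    return tuple(step[0] for step in history[-k:])


# Made-up history with t = 3.
history = [("o1", 0.0, "left"), ("o2", 0.0, "right"), ("o3", 1.0, None)]
print(state_last_observation(history))
print(state_with_previous_action(history))
print(state_last_k_observations(history, k=2))
```

Richer states let the agent distinguish more situations, at the cost of more states to learn about.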

slide-39
SLIDE 39

Agent Policy, $\pi$

Current state $S_t$ → $\pi$ → Next action $A_t$

Deterministic Policy: $A_t = \pi(S_t)$

Stochastic Policy: $\pi(a \mid s) = P(A_t = a \mid S_t = s)$

Good policy: Leads to larger cumulative reward

Bad policy: Leads to worse cumulative reward

(we will explore this more in the next week)
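
The difference between the two kinds of policy is easiest to see in code: a deterministic policy returns one action per state, while a stochastic policy returns a probability for each action and the agent samples from it. The states, actions, and probabilities below are assumptions for a toy one-dimensional world, not part of the slides.

```python
import random

ACTIONS = ["left", "right"]


def deterministic_policy(state):
    """Maps a state to exactly one action: head for cell 3."""
    return "right" if state < 3 else "left"


def stochastic_policy(state):
    """Maps a state to a probability for each action."""
    p_right = 0.9 if state < 3 else 0.1   # hypothetical probabilities
    return {"right": p_right, "left": 1.0 - p_right}


def sample_action(state, rng):
    """Draw one action according to the stochastic policy."""
    probs = stochastic_policy(state)
    weights = [probs[action] for action in ACTIONS]
    return rng.choices(ACTIONS, weights=weights)[0]


rng = random.Random(0)
print(deterministic_policy(0))
print(stochastic_policy(0))
print([sample_action(0, rng) for _ in range(5)])
```

A stochastic policy is one simple way to keep trying actions that currently look worse, which matters when the agent still has to discover what earns reward.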

slide-40
SLIDE 40

Example: Atari

Rules are unknown

  • What makes the score increase?

Dynamics are unknown

  • How do actions change pixels?
slide-41
SLIDE 41

Video Time!

https://www.youtube.com/watch?v=V1eYniJ0Rnk

slide-42
SLIDE 42

Example: Robotic Soccer

https://www.youtube.com/watch?v=CIF2SBVY-J0

slide-43
SLIDE 43

AlphaGo

https://www.youtube.com/watch?v=I2WFvGl4y8c

slide-44
SLIDE 44

Projects in AI in Minecraft

  • Course Information
  • Some Project Ideas
  • Introduction to Reinforcement Learning