An Overview of the AI Safety Landscape (PowerPoint presentation)



SLIDE 1

http://ea-foundation.org

An Overview of the AI Safety Landscape

Max Daniel Research Project Manager, Effective Altruism Foundation

Workshop on Reliable Artificial Intelligence 2017, ETH Zurich

SLIDE 2

https://blog.openai.com/faulty-reward-functions/

SLIDE 3

https://blog.openai.com/faulty-reward-functions/

SLIDE 4


Amodei, Olah et al. 2016

“[C]oncrete safety problems that are ready for experimentation today and relevant to the cutting edge of AI systems”

  • 1. Avoid negative side effects
  • 2. Avoid reward hacking
  • 3. Scalable oversight
  • 4. Safe exploration
  • 5. Robustness to distributional shift
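The reward-hacking problem in the list above can be illustrated with a minimal numeric sketch. This is a hypothetical three-behavior toy, loosely inspired by the boat-race example in the OpenAI post linked on slides 2 and 3; all behavior names and numbers are made up:

```python
import numpy as np

# Toy illustration of reward hacking: an agent that greedily maximizes a
# proxy reward picks a behavior with low true reward. Behaviors, proxy
# rewards, and true rewards are illustrative numbers, not measured data.
behaviors = ["finish_race", "loop_for_powerups", "idle"]
proxy_reward = np.array([10.0, 15.0, 0.0])   # score the agent observes
true_reward  = np.array([10.0,  1.0, 0.0])   # what the designer wanted

chosen = int(np.argmax(proxy_reward))        # greedy proxy maximizer
print(behaviors[chosen])                     # -> loop_for_powerups
print(true_reward[chosen])                   # -> 1.0, far below the best 10.0
```

The gap between the proxy and true reward of the chosen behavior is exactly the failure the "avoid reward hacking" problem targets.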

SLIDE 5


Ng and Russell (ICML 2000), Hadfield-Menell et al. (NIPS 2016)
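Ng and Russell frame inverse reinforcement learning: given observed expert behavior, recover a reward function under which that behavior is optimal. A minimal sketch under strong simplifying assumptions (a one-step decision, a linear reward over two hand-picked features, and a max-margin-style recovery; this is not code from either cited paper):

```python
import numpy as np

# Inverse RL in the simplest possible setting: one decision, linear reward
# r(a) = w . phi(a). The expert always chooses action 0; any w with
# w . (phi[0] - phi[1]) > 0 rationalizes that choice. Following the
# max-margin idea, pick the unit vector that maximizes that margin.
phi = np.array([[1.0, 0.0],    # features of the expert's action (toy)
                [0.0, 1.0]])   # features of the alternative (toy)

diff = phi[0] - phi[1]
w = diff / np.linalg.norm(diff)    # unit w maximizing w . diff

# Sanity check: under the recovered reward, the expert action is optimal.
rewards = phi @ w
print(rewards)                     # expert action gets the higher reward
```

The underdetermination visible here (many `w` rationalize the same behavior) is the core difficulty both papers address in richer settings.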

SLIDE 6


Christiano et al. 2017
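Christiano et al. (2017) learn a reward model from human comparisons between pairs of trajectories, fitting a Bradley-Terry preference model. A hedged numpy sketch with synthetic trajectory features and a synthetic stand-in for the human labeler (not the paper's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

# Trajectories as feature vectors; the synthetic "human" prefers whichever
# trajectory has the larger hidden score w_true . x.
w_true = np.array([2.0, -1.0])
pairs = rng.normal(size=(200, 2, 2))   # (pair, which-trajectory, feature)
prefs = (pairs[:, 0] @ w_true > pairs[:, 1] @ w_true).astype(float)

# Fit reward weights w by the Bradley-Terry logistic loss:
# P(traj0 preferred) = sigmoid(w . x0 - w . x1); plain gradient descent.
w = np.zeros(2)
for _ in range(500):
    d = pairs[:, 0] - pairs[:, 1]           # feature differences
    p = 1.0 / (1.0 + np.exp(-(d @ w)))      # predicted preference prob
    grad = d.T @ (p - prefs) / len(prefs)   # logistic-loss gradient
    w -= 0.5 * grad

# The learned reward should rank trajectories like the hidden score does.
agree = np.mean((pairs[:, 0] @ w > pairs[:, 1] @ w) == prefs.astype(bool))
print(agree)   # close to 1.0 on the training pairs
```

In the paper the reward model is a neural network and the comparisons come from real human raters; the logistic comparison loss is the common core.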

SLIDE 7

Security

Huang et al. 2017

SLIDE 8

Source: http://rll.berkeley.edu/adversarial/videos/pong_a3c_trpo_l-inf.mp4
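The perturbation in that video is an adversarial example. The core mechanism can be sketched with the fast gradient sign method on a toy linear score function (illustrative weights and input; this is not Huang et al.'s setup, where the attack targets a deep RL policy):

```python
import numpy as np

# Fast gradient sign method on a linear "policy": score(x) = w . x,
# decision = score > 0. A small L-infinity perturbation in the direction
# -eps * sign(w) flips the decision while barely changing the input.
w = np.array([0.5, -0.3, 0.8])       # toy model weights
x = np.array([0.2, -0.1, 0.15])      # clean input, score > 0

score = w @ x
eps = 0.2
x_adv = x - eps * np.sign(w)         # FGSM step against the decision
adv_score = w @ x_adv

print(score, adv_score)              # 0.25 -> -0.07: the decision flips
print(np.max(np.abs(x_adv - x)))     # perturbation bounded by eps
```

For a deep network the gradient sign is taken with respect to the input via backpropagation, but the L-infinity-bounded step is the same.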

SLIDE 9

Corrigibility

Soares et al. (AAAI 2015), Orseau and Armstrong (UAI 2016)
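Orseau and Armstrong show that off-policy learners such as Q-learning are safely interruptible: their learned values are unaffected by repeated human overrides, so the agent neither learns to avoid nor to seek interruption. A toy simulation under illustrative assumptions (a two-state chain, with an interruption mechanism that forces a "stay" action half the time; the setup is mine, not the paper's):

```python
import random

random.seed(0)

# Chain MDP: state s0 --GO--> terminal state (reward 1);
# STAY loops in s0 with reward 0. Optimal values: Q(GO)=1, Q(STAY)=0.9.
# An interruption overrides the agent's action with STAY half the time,
# but Q-learning's max-based update keeps the same fixed point.
GO, STAY = 0, 1
gamma, alpha, theta = 0.9, 0.1, 0.5
Q = [0.0, 0.0]                         # Q(s0, GO), Q(s0, STAY)

for _ in range(20000):
    a = random.choice([GO, STAY])      # exploratory behavior policy
    if random.random() < theta:        # interruption mechanism:
        a = STAY                       # the human override forces STAY
    if a == GO:                        # reach the terminal state, reward 1
        target = 1.0
    else:                              # loop in s0, reward 0
        target = 0.0 + gamma * max(Q)
    Q[a] += alpha * (target - Q[a])

print(Q)   # approx [1.0, 0.9]: same fixed point as with no interruptions
```

An on-policy learner such as SARSA would instead bake the interruptions into its values, which is why off-policy learning matters for this property.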

SLIDE 10

Privacy

Papernot et al. (ICLR 2017)
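Papernot et al.'s PATE framework trains an ensemble of "teacher" models on disjoint partitions of the private data, then labels a public "student" dataset by noisy aggregation of teacher votes. A sketch of the aggregation step only (toy votes; the Laplace scale is illustrative, and in the paper it is set by the privacy budget):

```python
import numpy as np

rng = np.random.default_rng(42)

# PATE-style noisy aggregation (sketch): each teacher votes for a class;
# Laplace noise on the vote counts yields a differentially private label
# for the student model, hiding any single training example's influence.
num_classes = 3
teacher_votes = np.array([0, 0, 0, 1, 0, 2, 0, 0, 1, 0])  # 10 toy teachers

counts = np.bincount(teacher_votes, minlength=num_classes)
noisy_counts = counts + rng.laplace(scale=2.0, size=num_classes)
private_label = int(np.argmax(noisy_counts))

print(counts)          # [7 2 1]: class 0 wins by a wide margin
print(private_label)   # almost surely 0, given that margin
```

When the teachers agree by a wide margin, the noise rarely changes the outcome, which is what lets PATE spend little privacy budget on confident queries.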

SLIDE 11

Soares and Fallenstein (2017 [2014])

“This technical agenda primarily covers topics that the authors believe are tractable, uncrowded, focused, and unable to be outsourced to forerunners of the target AI system.”

  • 1. Realistic World-Models
  • 2. Decision Theory
  • 3. Logical Uncertainty
  • 4. Vingean Reflection
SLIDE 12

  • 1) Research Goal
  • 2) Research Funding
  • 3) Science-Policy Link
  • 4) Research Culture
  • 5) Race Avoidance
  • 6) Safety
  • 7) Failure Transparency
  • 8) Judicial Transparency
  • 9) Responsibility
  • 10) Value Alignment
  • 11) Human Values
  • 12) Personal Privacy
  • 13) Liberty and Privacy
  • 14) Shared Benefit
  • 15) Shared Prosperity
  • 16) Human Control
  • 17) Non-subversion
  • 18) AI Arms Race
  • 19) Capability Caution
  • 20) Importance
  • 21) Risks
  • 22) Recursive Self-Improvement
  • 23) Common Good

Source: Asilomar AI Principles

SLIDE 13

Conclusion

  • Ensuring that AI agents do what we want is a nontrivial problem.
  • Technical AI safety is a thriving field in AI/ML research.
  • Several research agendas and concrete problems have been pursued.
  • Complements contributions from law, economics, policy, philosophy, social science, …


SLIDE 14


Thank you.

max.daniel@ea-foundation.org
