SLIDE 1

Behavioral Neural Networks

Shaowei Ke (UMich Econ), Chen Zhao (HKU Econ), Zhaoran Wang (Northwestern IEMS), Sung-Lin Hsieh (UMich Econ)

November 2020

SLIDE 2

Machine Learning

Over the last 15 years, machine-learning models have performed well in many decision problems:
• Product recommendation
• Complex games: AlphaGo

2018 Turing Award (Bengio, Hinton, and LeCun): “conceptual and engineering breakthroughs that have made deep neural networks a critical component of computing”

SLIDE 3

Questions

• A statistical model that can predict well is not necessarily a good model of how people make decisions
  • E.g., insights about decision making may be lost in the approximation
• But maybe some of these useful machine-learning models are indeed good models of how people make decisions?
• If so, we may better understand them and incorporate them into economics
• With more choice data, such machine-learning models would very likely outperform our traditional models in prediction, and may even help us identify behavioral phenomena

SLIDE 4

A Good Model of Decision-Making?

E.g., the expected utility model:

1. The model is characterized by reasonable axioms imposed directly on choice behavior
2. The model provides a plausible interpretation/story of how people make choices

SLIDE 5

This Paper

1. We provide an axiomatic foundation for a class of neural-network models applied to decision making under risk, called the neural-network expected utility (NEU) models
   • The independence axiom is relaxed in a novel way consistent with experimental findings
   • The model provides a plausible interpretation of people’s choice behavior
2. We show that simple neural-network structures, referred to as behavioral neurons, can capture behavioral biases intuitively
3. Using these behavioral neurons, we find that some simple NEU models that are easy to interpret perform better than EU and CPT

SLIDE 6

Neural-Network Expected Utility

SLIDE 7

Choice Domain and Primitive

Prizes: Z = {z_1, . . . , z_n}
• Generic prizes: x, y, z
The set of lotteries: L = {p ∈ R^n_+ : ∑_{i=1}^n p_i = 1}
• Generic lotteries: p, q, r, s
• Degenerate lotteries: δ_x
Mixture: for any λ ∈ [0, 1], λp + (1 − λ)q is the lottery with (λp + (1 − λ)q)_i = λp_i + (1 − λ)q_i
• Shorthand: λpq := λp + (1 − λ)q
A decision maker has a binary relation (preference) ⪰ on L
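A quick numeric check of the mixture notation (a made-up example): with n = 3, p = (1, 0, 0), q = (0, 0.5, 0.5), and λ = 0.6, the mixture λpq = 0.6p + 0.4q = (0.6, 0.2, 0.2).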

SLIDE 8

Vector-Valued Affine Function

• t : R^w → R^w̃ is affine if there exist a w̃ × w matrix β and a w̃ × 1 vector γ such that t(a) = βa + γ for any a ∈ R^w
• t = (t^(1), . . . , t^(w̃)) is affine ⇒ the t^(j)’s are affine
• A real-valued function on L is affine if and only if it is an expected utility function
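To see the last point, note that on L we have ∑_i p_i = 1, so an affine U(p) = βp + γ satisfies U(p) = ∑_i β_i p_i + γ ∑_i p_i = ∑_i p_i (β_i + γ): an expected utility function with u(z_i) = β_i + γ. Conversely, p ↦ ∑_i p_i u(z_i) is linear, hence affine.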

SLIDE 9

NEU Representation

A function U : L → R is a NEU function if there exist
• h, w_0, w_1, . . . , w_{h+1} ∈ N with w_0 = n and w_{h+1} = 1,
• q_i : R^{w_i} → R^{w_i}, i = 1, . . . , h, such that for any b ∈ R^{w_i}, q_i(b) = (max{b_1, 0}, . . . , max{b_{w_i}, 0}),
• affine t_i : R^{w_{i−1}} → R^{w_i}, i = 1, . . . , h + 1,
such that U(p) = t_{h+1} ∘ q_h ∘ t_h ∘ · · · ∘ q_2 ∘ t_2 ∘ q_1 ∘ t_1(p).
We say that ⪰ has a NEU representation if it can be represented by a NEU function.
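In code, this definition is just a feedforward ReLU network mapping a lottery p to a scalar utility. A minimal sketch with made-up weights (h = 1 hidden layer, widths w_0 = 3, w_1 = 2, w_2 = 1):

    import numpy as np

    def affine(beta, gamma):
        # Affine map t(a) = beta @ a + gamma, as in the definition above.
        return lambda a: beta @ a + gamma

    def relu(b):
        # Activation q(b) = (max{b_1, 0}, ..., max{b_w, 0}).
        return np.maximum(b, 0.0)

    # Hypothetical NEU with n = 3 prizes: U(p) = t_2(q_1(t_1(p))).
    t1 = affine(np.array([[1.0, 0.5, 0.0],
                          [0.0, 0.2, 1.0]]), np.array([0.0, -0.1]))
    t2 = affine(np.array([[0.7, 0.3]]), np.array([0.05]))

    def U(p):
        return t2(relu(t1(np.asarray(p))))[0]

    print(U([0.2, 0.5, 0.3]))  # utility of the lottery (0.2, 0.5, 0.3)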

SLIDE 10

NEU Representation

[Network diagram: inputs p_1, p_2, p_3 feed first-hidden-layer neurons max{t_1^(1)(·), 0} and max{t_1^(2)(·), 0}; these feed second-hidden-layer neurons max{t_2^(1)(·), 0} and max{t_2^(2)(·), 0}, which are combined into U(p).]

• A NEU function: U(p) = t_3 ∘ q_2 ∘ t_2 ∘ q_1 ∘ t_1(p)
• ith hidden layer: q_i ∘ t_i
• Activation function: max{·, 0}
• Neuron: max{t_i^(j)(·), 0}

SLIDE 11

Interpretation

[Same network diagram as on the previous slide.]

• The decision maker has multiple considerations toward uncertainty (expected utility functions in the first layer)
  • E.g., one for the mean of prizes and one for downside risk
• She considers multiple ways of aggregating those attitudes plausible (affine functions in the second layer)
• Recursively, she may continue to have multiple ways in mind to aggregate the aggregations from the previous layer

SLIDE 12

Axiomatic Characterization

SLIDE 13

Expected Utility Theory

Axiom (Weak Order). ⪰ is complete and transitive.

Axiom (Continuity). For any p, the sets {q : p ⪰ q} and {q : q ⪰ p} are closed.

Axiom (Independence). For any λ ∈ (0, 1), p ⪰ q ⇒ λpr ⪰ λqr, and p ≻ q ⇒ λpr ≻ λqr.

• There are alternative ways to define independence

Axiom (Bi-Independence). For any λ ∈ (0, 1), if p ⪰ q, then r ⪰ s ⇒ λpr ⪰ λqs, and r ≻ s ⇒ λpr ≻ λqs.

• Setting p = q: Bi-Independence ⇒ Independence
• Applying Independence twice yields Bi-Independence: p ⪰ q gives λpr ⪰ λqr, r ⪰ s gives λqr ⪰ λqs, and transitivity chains the two

SLIDE 14

Violations of (Bi-)Independence: The Allais Paradox

First Pair:
    0.13pr: 100% $1M
    0.13qr: 87% $1M, 3% $0, 10% $1.5M
Second Pair:
    0.13ps: 13% $1M, 87% $0
    0.13qs: 10% $1.5M, 90% $0

• p = δ_{1M}, q = (3/13)δ_0 + (10/13)δ_{1.5M}, r = δ_{1M}, s = δ_0

1. Bias toward certainty
2. 0.13qr must look sufficiently different from a risk-free lottery
SLIDE 15

The Allais Paradox in a Nutshell (Literally)

First Pair:
    0.013pr: 100% $1M
    0.013q′r: 98.7% $1M, 0.3% $0.5M, 1% $1.5M
Second Pair:
    0.013ps′: 98.7% $0.5M, 1.3% $1M
    0.013q′s′: 99% $0.5M, 1% $1.5M

• p = δ_{1M}, q′ = (3/13)δ_{0.5M} + (10/13)δ_{1.5M}, r = δ_{1M}, s′ = δ_{0.5M}
• It seems much less likely that we would observe significant violations of (Bi-)Independence here

SLIDE 16

Violations of (Bi-)Independence

The difference between lotteries needs to be large enough for psychological effects to apply to them asymmetrically.
• We want to stick to (Bi-)Independence as much as possible because of its normative appeal
• But if (Bi-)Independence holds locally everywhere, it holds globally
• Is there a (slightly) relaxed version of (Bi-)Independence that can hold locally everywhere but not globally?

SLIDE 17

Relaxing Independence

A subset L′ ⊆ L preserves independence with respect to p (written L′ ⊥ p) if for any q, r ∈ L′ and λ ∈ (0, 1), q ⪰ r ⇒ λpq ⪰ λpr, and q ≻ r ⇒ λpq ≻ λpr.
• L′ need not be convex, and p, λpq, λpr may lie outside L′

SLIDE 18

Relaxing Independence

• A subset L′ preserves independence with respect to p if for any q, r ∈ L′ and λ ∈ (0, 1), q ⪰ r ⇒ λpq ⪰ λpr, and q ≻ r ⇒ λpq ≻ λpr
• A subset L′ ⊆ L preserves independence if for any p, q, r ∈ L′ and λ ∈ (0, 1) such that λpq, λpr ∈ L′, q ⪰ r ⇒ λpq ⪰ λpr, and q ≻ r ⇒ λpq ≻ λpr

SLIDE 19

Relaxing Independence

• A neighborhood of p: an open convex set that contains p

Axiom (Weak Local Independence). Every p ∈ L has a neighborhood L_p such that L_p ⊥ p.

• Weak Local Independence does not mean that “independence holds locally around every p”

SLIDE 20

Weak Local Independence

Axiom (Weak Local Independence). Every p ∈ L has a neighborhood L_p such that L_p ⊥ p.
• Allows the following type of indifference curves [figure not reproduced]

SLIDE 21

Relaxing Bi-Independence

• Weak Local Independence only tells us about the decision maker’s local choice behavior
• Local versions of Bi-Independence can regulate the decision maker’s non-local choice behavior

SLIDE 22

Relaxing Bi-Independence

Axiom (Weak Local Bi-Independence). If p ⪰ q, then p and q have neighborhoods L_p and L_q such that for any r ∈ L_p, s ∈ L_q, and λ ∈ (0, 1), r ⪰ s ⇒ λpr ⪰ λqs, and r ≻ s ⇒ λpr ≻ λqs.
• When p = q, we obtain Weak Local Independence
• Bi-independence is imposed only when mixing with p and q, respectively
• L_p does not have to be the same for different q’s

[Figure: p and q with nearby lotteries r ∈ L_p and s ∈ L_q, and the mixtures λpr and λqs.]

SLIDE 23

Main Theorem

Theorem

⪰ has a NEU representation if and only if it satisfies Weak Order, Continuity, and Weak Local Bi-Independence.
• EU characterizes linear functions on L
• NEU characterizes continuous finite piecewise linear functions on L

SLIDE 24

Behavioral Neurons and Empirical Analysis

SLIDE 25

NEU and the Certainty Effect

The Allais paradox: the decision maker has a bias toward certainty

[Network diagram: inputs p_1, p_2, p_3 feed an expected utility neuron V(p) and certainty-effect neurons max{p_1 − 0.9, 0}, max{p_2 − 0.9, 0}, max{p_3 − 0.9, 0}, which are combined into U(p).]

• V is an expected utility function
• If p_i > 0.98, a neuron that captures the certainty effect will be activated
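A minimal sketch of this structure (the 0.9 threshold follows the diagram; the utility vector and combining weight are made-up illustrations):

    import numpy as np

    def certainty_neurons(p, c=0.9):
        # One neuron per prize: max{p_i - c, 0}; a neuron fires only
        # when that prize's probability is close to 1.
        return np.maximum(np.asarray(p) - c, 0.0)

    def U(p, u=np.array([0.0, 1.0, 1.5]), w=2.0):
        V = np.asarray(p) @ u                      # expected utility neuron
        return V + w * certainty_neurons(p).sum()  # bias toward certainty

    print(U([1.0, 0.0, 0.0]))  # risk-free lottery: a certainty neuron fires
    print(U([0.5, 0.5, 0.0]))  # no neuron fires: plain expected utility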

SLIDE 26

NEU and the Certainty Effect

SLIDE 27

NEU and Reference Dependence

Kahneman and Tversky (1979): prizes are evaluated relative to a reference point; people treat gains and losses differently.
Ert and Erev (2013): the difference becomes insignificant when prizes don’t deviate much from the reference point.
• A neuron for expected utility: V(p)
• A neuron for loss aversion relative to $x with threshold ε: λ · min{∑_i p_i min{z_i − x, 0}, −ε}, where the inner sum is affine in p and the loss-aversion coefficient is λ > 1
• U(p) is the difference between the two neurons’ values
• Violations of expected utility theory in the form of loss aversion only occur when losses (relative to the reference point) are significant
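A sketch of this loss-aversion neuron (all numbers are illustrative, and the sign placement of the threshold is one plausible reading of the slide):

    import numpy as np

    def loss_neuron(p, z, ref=0.0, lam=2.25, eps=0.1):
        # Expected loss relative to the reference point (nonpositive,
        # affine in p), scaled by the loss-aversion coefficient lam > 1.
        # The min with -eps keeps the neuron flat (a constant) until
        # expected losses exceed eps in magnitude.
        expected_loss = np.asarray(p) @ np.minimum(np.asarray(z) - ref, 0.0)
        return lam * min(expected_loss, -eps)

    z = np.array([-10.0, 0.0, 20.0])   # monetary prizes
    p = np.array([0.3, 0.4, 0.3])      # a lottery over them
    V = p @ z                          # a risk-neutral EU neuron, say
    print(V + loss_neuron(p, z))       # EU penalized for significant losses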
SLIDE 28

Empirical Analysis

• Can the NEU model explain and predict decision makers’ choice behavior well?
• Can we do so with a NEU model that is not too complicated to interpret?
• How does the NEU model compare to other economic models?

SLIDE 29

Data

• Plonsky, Apel, Erev, Ert, and Tennenholtz: Choice Prediction Competition 2018
• Experimental data of individual binary choices between monetary gambles
  • Over half a million individual binary-choice data points
  • 270 different binary choice problems in total; each participant answers 30, with each problem repeated 25 times
  • The first 30 problems replicate 14 well-known behavioral findings
• We rule out problems involving ambiguity or correlated realizations
• CPC training data: 210 problems (169 relevant for us)
• CPC testing data: the remaining 60 problems (45 relevant for us)

SLIDE 30

A Typical Binary Choice Problem

Compared to other experiments, the lotteries here are more generic

SLIDE 31

Expected Utility (EU) Benchmark

• EU: U(p) = ∑ p_i u(z_i) with an arbitrary u
• CARA: u(z) = (1 − exp{−az})/a
  • Some prizes are negative, so we choose CARA rather than CRRA
• For every binary choice problem, record the fraction of participants choosing each lottery
• Discrete choice: U(p) + ε_p with the ε_p’s following an i.i.d. Gumbel distribution
• Evaluation measure: mean squared error × 100

             training error    testing error
    EU            1.07             19.74
    CARA          2.28              1.98
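A sketch of this benchmark (parameter values are placeholders): adding i.i.d. Gumbel errors to CARA expected utilities yields the binary logit choice probability, which is compared with the observed choice fraction via MSE × 100:

    import numpy as np

    def cara(z, a=0.01):
        # CARA utility u(z) = (1 - exp(-a z)) / a.
        return (1.0 - np.exp(-a * np.asarray(z))) / a

    def eu(p, z, a=0.01):
        # Expected CARA utility of lottery p over prizes z.
        return np.asarray(p) @ cara(z, a)

    def prob_choose_A(pA, pB, z, a=0.01):
        # U + i.i.d. Gumbel noise implies the binary logit formula.
        return 1.0 / (1.0 + np.exp(-(eu(pA, z, a) - eu(pB, z, a))))

    # Hypothetical problem: A = 60% $2000 vs. B = 45% $2500 (else $0).
    z = [0.0, 2000.0, 2500.0]
    pA, pB = [0.4, 0.6, 0.0], [0.55, 0.0, 0.45]
    observed = 0.55                  # made-up fraction choosing A
    err = 100 * (prob_choose_A(pA, pB, z) - observed) ** 2
    print(err)                       # squared error x 100 for this problem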

SLIDE 32

Overfitting of EU

SLIDE 33

Cumulative Prospect Theory Benchmark

Parameters to be estimated:

1. Reference point
2. Gain region: value function’s concavity + weighting function
3. Loss region: value function’s convexity + weighting function + loss-aversion coefficient

• Risk-neutral EU is a special case, but the other CARA EUs are not
• Overfitting is taken care of by not considering fully general value and weighting functions

                           training error    testing error
    CPT                     2.255 (0.022)    1.996 (0.099)
    TK’s estimated CPT      4.159 (0.004)    3.686 (0.010)
    Risk-neutral EU         2.512 (0.000)    2.793 (0.000)

SLIDE 34

How Do We Parametrize NEU?

• Recall that the first hidden layer’s affine functions are EU functions
• Would requiring the first hidden layer to consist of CARA EU functions help?

SLIDE 35

CARA NEU

• CARA EU testing error = 1.98 (std. dev. ≈ 0)
• CARA NEU: a NEU whose first layer consists of CARA EU functions
  • The first hidden layer’s width: 15, 20, or 25
  • The number of hidden layers above the first: 0, 1, or 2
  • The width of the hidden layers above the first: 15, 20, or 25
• The best testing error across these NEU functions is 1.97

SLIDE 36

The Problem with CARA NEU

• CARA NEU mitigates overfitting but destroys too much of NEU’s flexibility
• What useful flexibility is removed?
  • E.g., neurons that capture the certainty effect and reference dependence
• We could consider other behavioral neurons and select the most useful ones using cross-validation, but these two (the most important ingredients of CPT) are sufficient for our purpose

SLIDE 37

Behavioral NEU

We require t_1 to consist of some (or all) of the following three types of “behavioral neurons” introduced previously:

1. A neuron that is a CARA EU function
   • We can allow for multiple CARA EU functions, but that seems unnecessary
2. Neurons that capture reference dependence
   • We allow for multiple reference points and loss-aversion coefficients
3. Neurons that capture the certainty effect
   • We allow for two thresholds

• We could consider neurons that capture other behavioral models, but these three suffice to illustrate our point
Then t_1 is concatenated with an otherwise standard NEU function (see the sketch below).
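A schematic sketch of this architecture, reusing the eu, loss_neuron, and certainty_neurons helpers from the earlier sketches; the reference points, thresholds, and layer widths below are placeholders rather than the paper’s choices:

    import numpy as np

    def behavioral_t1(p, z):
        # First layer t_1: one CARA EU neuron, reference-dependence
        # neurons for two hypothetical reference points, and one
        # certainty-effect neuron per prize.
        cara_eu = eu(p, z, a=0.01)
        rd = [loss_neuron(p, z, ref=r) for r in (0.0, 10.0)]
        ce = list(certainty_neurons(p, c=0.9))
        return np.array([cara_eu, *rd, *ce])

    def behavioral_neu(p, z, W, b, w_out, b_out):
        # One standard ReLU hidden layer on top of the behavioral layer.
        h = np.maximum(W @ behavioral_t1(p, z) + b, 0.0)
        return w_out @ h + b_out

    rng = np.random.default_rng(0)
    W, b = rng.normal(size=(5, 6)), np.zeros(5)   # widths are placeholders
    w_out, b_out = rng.normal(size=5), 0.0
    print(behavioral_neu([0.3, 0.3, 0.4], [-10.0, 0.0, 20.0],
                         W, b, w_out, b_out))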

SLIDE 38

Do We Need a Complex Neural Network?

• CARA EU testing error = 1.98 (with std. dev. ≈ 0)

                        CARA + RD        CARA + CE        CARA + RD + CE
    1 Hidden Layer      1.971 (0.113)    2.009 (0.006)    1.966 (0.133)
    2 Hidden Layers     1.850 (0.176)    2.030 (0.022)    1.748 (0.221)
    >2 Hidden Layers    2.217 (0.339)    2.130 (0.038)    1.879 (0.315)

• RD alone or CE alone is not sufficiently helpful
• Having both RD and CE but too few or too many layers is also not sufficiently helpful
• Compared to CARA, CARA + RD + CE decreases the training error by 19% and the testing error by 12%

SLIDE 39

Summary

1. Overfitting and predictive power: more general models may “explain” more, but if a model does not predict well, its explanatory power may come from overfitting
2. Domain knowledge (decision theory, behavioral economics, etc.) is useful for improving a machine-learning model’s performance when we do not have lots of data
3. Reasonably complex NEU models with natural interpretations explain and predict better than (i) too-simple or too-complex ones and (ii) EU and CPT

SLIDE 40

Final Remarks

• When the dataset contains more (generic) choice problems, the NEU model will become even more useful: it may help us identify behavioral biases that are unknown to us
• Endogenous selection (e.g., via lasso or ridge) of behavioral neurons
• Similar to mixed logit vs. logit, could we develop a method to estimate mixed NEU/random NEU models?

SLIDE 41

Violations of (Bi-)Independence: Common Ratio Effect

First Pair:
    p: 60% $2000, 40% $0
    q: 45% $2500, 55% $0
Second Pair:
    0.2pδ_0: 12% $2000, 88% $0
    0.2qδ_0: 9% $2500, 91% $0

1. Focal feature: higher winning probability vs. better prize
2. The lotteries must be sufficiently different to switch the focal feature
SLIDE 42

Violations of (Bi-)Independence: Common Ratio Effect

First Pair:
    p: 60% $2000, 40% $0
    q: 45% $2500, 55% $0
Second Pair:
    (14/15)pδ_0: 56% $2000, 44% $0
    (14/15)qδ_0: 42% $2500, 58% $0

• It seems much less likely that we would observe significant violations of (Bi-)Independence here
• The focal feature does not change significantly

SLIDE 43

Necessity of Parametrization

Take EU as an example:
• Some prizes in the testing dataset never appear in the training dataset
• If we exclude binary choice problems that involve prizes that never show up in the training dataset, EU’s testing error is 17.82
• The problem is mainly overfitting
We need to parametrize the models that we estimate, since the dataset is not large enough.

SLIDE 44

Potential Issues of the Data

• Random incentive mechanism under non-expected utility theory
• How the show-up fee is determined is not explained
  • The show-up fee secretly depends on which binary choice is randomly selected to determine payment
  • If a participant understood that she would not walk out of the lab losing money, and noticed that there is no fixed show-up fee, she may realize that the show-up fee depends on the selected choice problem, and hence may view negative prizes differently
• The number of different binary choice problems is small

SLIDE 45

Training and Hyperparameter Selection

• Discrete choice: U(p) + ε_p with the ε_p’s following an i.i.d. Gumbel distribution
  • Can be axiomatized easily based on our axioms
• Adaptive moment estimation (Adam)
• L2-norm regularization
• Hyperparameter optimization (mainly for width and depth)
  • Leave-one-out cross-validation
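A minimal training sketch consistent with this setup (assuming PyTorch; the architecture, learning rate, and regularization strength are placeholders, not the paper’s settings). Note that Adam’s weight_decay argument implements the L2-norm regularization:

    import torch
    import torch.nn as nn

    # Placeholder NEU: lottery (n = 3 probabilities) -> scalar utility.
    model = nn.Sequential(nn.Linear(3, 20), nn.ReLU(), nn.Linear(20, 1))

    # weight_decay adds the L2 penalty to the Adam update.
    opt = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)

    def train_step(pA, pB, observed_freq):
        # One step: logit choice probability from U(A) - U(B), with an
        # MSE loss against the observed fraction choosing A.
        opt.zero_grad()
        prob_A = torch.sigmoid(model(pA) - model(pB)).squeeze(-1)
        loss = ((prob_A - observed_freq) ** 2).mean()
        loss.backward()
        opt.step()
        return loss.item()

    pA = torch.tensor([[0.4, 0.6, 0.0]])
    pB = torch.tensor([[0.55, 0.0, 0.45]])
    freq = torch.tensor([0.55])          # made-up choice fraction
    for _ in range(100):
        train_step(pA, pB, freq)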