ON LEARNING AND INFORMATION ACQUISITION WITH RESPECT TO FUTURE AVAILABILITY OF ALTERNATIVES
KAZUTOSHI YAMAZAKI
Department of Operations Research and Financial Engineering, Princeton University
Abstract. Most bandit frameworks applied to economic problems such as market learning and job matching rest on the unrealistic assumption that decision makers are fully confident about the future availability of alternatives. In this paper, we study two generalizations of the classical bandit problem: one in which arms may become unavailable temporarily or permanently, and one in which arms may break down and the decision maker has the option to fix them. We show that an optimal index policy does not exist for either problem. Nevertheless, the class of Whittle index policies contains a near-optimal index policy that no other index policy dominates uniformly over all instances of either problem. This index strikes a balance between exploration and exploitation that accounts for the availability of alternatives: it converges to the Gittins index as the probability of availability approaches one, and to the immediate one-time reward as that probability approaches zero. We evaluate the Whittle indices for Bernoulli arms with unknown success probabilities.
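The abstract names two limits of the proposed index: the Gittins index and the immediate one-time reward. As a rough illustration of the first limit only, the sketch below approximates the Gittins index of a Bernoulli arm whose unknown success probability has a Beta(a, b) posterior. It uses the standard retirement (calibration) formulation with a finite truncation horizon and bisection. This is not the paper's Whittle index, whose definition does not appear in this section. The function name `gittins_index`, its `horizon` and `tol` parameters, and the Beta prior are illustrative assumptions.

```python
def gittins_index(a, b, discount, horizon=200, tol=1e-6):
    """Approximate Gittins index of a Bernoulli arm with a Beta(a, b) posterior.

    Illustrative sketch (assumption): the standard Gittins index via the
    retirement formulation, NOT the Whittle index studied in the paper.
    The index is the per-period retirement reward `lam` at which pulling the
    arm (keeping the option to retire later) is worth exactly as much as
    retiring now and receiving `lam` in every period forever.
    """
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must lie in [0, 1)")
    if a <= 0 or b <= 0:
        raise ValueError("Beta parameters must be positive")
    if horizon < 1:
        raise ValueError("horizon must be at least 1")

    def continuation_value(lam):
        # Value of pulling once now and acting optimally afterwards, when
        # retiring is always worth lam / (1 - discount).
        retire = lam / (1.0 - discount)
        # Truncation at depth `horizon`: take the better of retiring and
        # pulling forever at the current posterior mean.
        value = [
            max(retire, (a + k) / (a + b + horizon) / (1.0 - discount))
            for k in range(horizon + 1)
        ]
        # Backward induction; at depth n the state has k successes, n - k failures.
        for n in range(horizon - 1, -1, -1):
            cont = []
            for k in range(n + 1):
                p = (a + k) / (a + b + n)  # posterior mean = one-step expected reward
                cont.append(p * (1.0 + discount * value[k + 1])
                            + (1.0 - p) * discount * value[k])
            if n == 0:
                return cont[0]  # at the root, the arm is pulled (no max)
            value = [max(retire, c) for c in cont]

    lo, hi = 0.0, 1.0  # rewards lie in {0, 1}, so the index lies in [0, 1]
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if continuation_value(mid) > mid / (1.0 - discount):
            lo = mid  # pulling still beats retiring: the index exceeds mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    a, b = 1, 1
    print("posterior mean (one-time reward):", a / (a + b))
    print("Gittins index, discount 0.9:", round(gittins_index(a, b, 0.9), 4))
```

With `discount = 0` the sketch returns the posterior mean a / (a + b), which is the immediate expected one-time reward of a Bernoulli arm. For positive discounts the index exceeds that mean, reflecting the value of exploration.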
1. Introduction