Optimal Non-parametric Learning in Repeated Contextual Auctions with Strategic Buyer (PowerPoint PPT Presentation)



SLIDE 1

SLIDE 2

Optimal Non-parametric Learning in Repeated Contextual Auctions with Strategic Buyer

Alexey Drutsa

SLIDE 3

Setup

SLIDE 4

Repeated Contextual Posted-Price Auctions

Different goods (e.g., ad spaces)

› described by d-dimensional feature vectors (contexts) from [0,1]^d
› repeatedly offered for sale by a seller
› to a single buyer over T rounds (one good per round).

The buyer

› holds a private fixed valuation function v: [0,1]^d → [0,1]
› used to calculate his valuation v(x) for a good with context x ∈ [0,1]^d,
› v is unknown to the seller.

At each round t = 1, …, T,

› a feature vector x_t of the current good is observed by the seller and the buyer,
› a price p_t is offered by the seller,
› and an allocation decision a_t ∈ {0,1} is made by the buyer:

a_t = 0 when the buyer rejects, and a_t = 1 when the buyer accepts.
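The round protocol above can be sketched as a small simulation loop. Everything concrete here is an illustrative assumption (the uniform contexts, the particular valuation, the constant-price seller), and the buyer is taken to be myopic (δ = 0), accepting iff his value covers the price:

```python
import random

def run_auction(T, d, valuation, price_rule, seed=0):
    """Simulate T rounds of a repeated contextual posted-price auction.

    valuation:  the buyer's private function v: [0,1]^d -> [0,1].
    price_rule: the seller's pricing algorithm, mapping (history, context) -> price.
    The buyer here is myopic (delta = 0): he accepts iff v(x_t) >= p_t.
    """
    rng = random.Random(seed)
    history = []                                # [(x_t, p_t, a_t), ...]
    revenue = 0.0
    for t in range(T):
        x = [rng.random() for _ in range(d)]    # Nature draws the context
        p = price_rule(history, x)              # the seller posts a price
        a = 1 if valuation(x) >= p else 0       # myopic accept/reject decision
        history.append((x, p, a))
        revenue += a * p
    return history, revenue

# Usage: a Lipschitz valuation and a (deliberately naive) constant-price seller.
hist, rev = run_auction(
    T=100, d=2,
    valuation=lambda x: 0.25 * (x[0] + x[1]) + 0.25,
    price_rule=lambda h, x: 0.3,
)
```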

SLIDE 5

Seller’s pricing algorithm and buyer strategy

The seller applies a pricing algorithm A that sets prices {p_t}_{t=1}^T in response to buyer decisions a = {a_t}_{t=1}^T and observed contexts x = {x_t}_{t=1}^T. The price p_t can depend only on

› past decisions {a_s}_{s=1}^{t−1}
› feature vectors {x_s}_{s=1}^{t}
› the horizon T

SLIDE 6

Strategic buyer

▌ The seller announces her pricing algorithm A in advance

The buyer has some distribution (beliefs) D about future contexts. In each round t, given the history of previous rounds, he chooses his decision a_t s.t. it maximizes his future δ-discounted surplus:

𝔼_{x ∼ D} [ Σ_{s=t}^T δ^{s−1} a_s (v(x_s) − p_s) ],  δ ∈ (0,1]
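A minimal numeric sketch of this objective (the values, prices, and the two candidate decision sequences are all hypothetical): with δ close to 1, rejecting a price below one's value can beat accepting it, if the seller then responds with lower future prices.

```python
def discounted_surplus(decisions, values, prices, delta):
    """sum_t delta^(t-1) a_t (v(x_t) - p_t) over rounds t = 1..T
    (0-indexed below, so the weight is delta ** t)."""
    return sum(delta ** t * a * (v - p)
               for t, (a, v, p) in enumerate(zip(decisions, values, prices)))

# Hypothetical 2-round scenario: the seller keeps the price after an accept
# and drops it after a reject. The buyer's value is 0.6 in both rounds.
v = [0.6, 0.6]
accept_then_high = discounted_surplus([1, 1], v, [0.5, 0.6], delta=0.9)  # 0.1 + 0
reject_then_low  = discounted_surplus([0, 1], v, [0.5, 0.2], delta=0.9)  # 0 + 0.9*0.4
assert reject_then_low > accept_then_high  # strategic rejection pays off
```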

SLIDE 7

[Diagram: "The game's workflow and knowledge structure". Public knowledge: the seller's pricing algorithm A. Private knowledge before the game starts: the buyer's valuation v and his beliefs D. In each round t = 1, 2, 3, …: Nature reveals the context x_t, the Algorithm posts the price p_t, and the Buyer responds with the decision a_t.]

SLIDE 8

Seller’s goal

The seller's strategic regret:

SReg(T, A, v, δ, x_{1:T}, D) := Σ_{t=1}^T (v(x_t) − a_t p_t)

We will learn the function v in a non-parametric way. For this, we assume that it is Lipschitz (a standard requirement for non-parametric learning):

Lip_L([0,1]^d) := { g: [0,1]^d → [0,1] | ∀x, z ∈ [0,1]^d  |g(x) − g(z)| ≤ L‖x − z‖ }

The seller seeks a no-regret pricing for the worst-case valuation function:

sup_{v ∈ Lip_L([0,1]^d), x_{1:T}, D} SReg(T, A, v, δ, x_{1:T}, D) = o(T)

Optimality: the lowest possible upper bound for the regret of the form O(g(T)).
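The two definitions above can be written out directly. Both helpers are illustrative (not from the slides), and the use of the ℓ∞ norm in the Lipschitz check is an assumption, chosen for consistency with the ℓ∞-diameters that PELS uses later:

```python
def strategic_regret(values, decisions, prices):
    """SReg = sum_t (v(x_t) - a_t p_t): the seller's shortfall relative to
    extracting the full value v(x_t) in every round."""
    return sum(v - a * p for v, a, p in zip(values, decisions, prices))

def is_lipschitz(g, pts, L):
    """Empirically check |g(x) - g(z)| <= L * ||x - z||_inf on sample points
    (a necessary-condition spot check, not a proof of membership in Lip_L)."""
    return all(
        abs(g(x) - g(z)) <= L * max(abs(xi - zi) for xi, zi in zip(x, z)) + 1e-12
        for x in pts for z in pts
    )
```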

SLIDE 9

Background & Research question

SLIDE 10

Background

[Kleinberg et al., FOCS'2003] Non-contextual setup (d = 0). Horizon-dependent optimal algorithm against myopic buyer (δ = 0) with truthful regret Θ(log log T).

[Mao et al., NIPS'2018] Our non-parametric contextual setup (d > 0). Horizon-dependent optimal algorithm against myopic buyer (δ = 0) with truthful regret Θ(T^{d/(d+1)}).

[Drutsa, WWW'2017] Non-contextual setup (d = 0). Horizon-independent optimal algorithm against strategic buyer with regret Θ(log log T) for δ < 1.

[Amin et al., NIPS'2013] Non-contextual setup (d = 0). The strategic setting is introduced. ∄ no-regret pricing in the non-discounted case δ = 1.

SLIDE 11

Research question

The key techniques of the non-contextual optimal algorithms ([pre]PRRFES) cannot be directly applied to the contextual algorithm of [Mao et al., NIPS'2018]. To search for the valuation of the strategic buyer without context:

› penalization rounds are used;
› we never propose prices below ones that were earlier accepted.

In the approach of [Mao et al., NIPS'2018]:

› standard penalization does not help;
› proposed prices can be below ones that were earlier accepted by the buyer.

▌ In this study, I overcome these issues and propose an optimal
▌ non-parametric algorithm for the contextual setting with a strategic buyer.

SLIDE 12

Novel optimal algorithm

SLIDE 13

Penalized Exploiting Lipschitz Search (PELS)

PELS has three parameters:

› the price offset γ ∈ [1, +∞)
› the degree of penalization r ∈ ℕ
› the exploitation rate g: ℤ₊ → ℤ₊

This algorithm keeps track of

› a partition P of the feature domain [0,1]^d
› initialized to ⌈(4γ + 6)L⌉^d cubes (boxes) with side length w = 1/⌈(4γ + 6)L⌉:

P = { J_1 × J_2 × ⋯ × J_d | J_1, J_2, …, J_d ∈ { [0, w), [w, 2w), …, [1 − w, 1] } }.
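This initialization can be sketched as follows, with γ the price offset and L the Lipschitz constant; representing the partition as a dictionary keyed by integer grid corners is an implementation assumption:

```python
import math
from itertools import product

def initial_partition(d, L, gamma):
    """Build PELS's initial grid: ceil((4*gamma + 6) * L)^d axis-aligned cubes
    of side w = 1 / ceil((4*gamma + 6) * L) covering [0,1]^d.
    Each box stores its per-box state (bounds, depth) alongside its side length."""
    k = math.ceil((4 * gamma + 6) * L)      # cubes per dimension
    w = 1.0 / k
    boxes = {
        corner: {"lo": 0.0, "hi": 1.0, "depth": 0, "side": w}
        for corner in product(range(k), repeat=d)
    }
    return boxes, w
```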

SLIDE 14

Penalized Exploiting Lipschitz Search (PELS)

For each box B ∈ P, PELS also keeps track of:

› the lower bound l_B ∈ [0,1],
› the upper bound u_B ∈ [0,1],
› the depth m_B ∈ ℤ₊.

They are initialized as l_B = 0, u_B = 1, and m_B = 0 for all B ∈ P. The workflow of the algorithm is organized independently in each box B ∈ P:

› the algorithm receives a good with a feature vector x_t ∈ [0,1]^d,
› finds the box B in the current partition P s.t. x_t ∈ B.

▌ Then, the proposed price p_t is determined only from the current state
▌ associated with the box B, while the buyer decision a_t is used
▌ only to update the state associated with this box B.

SLIDE 15

Penalized Exploiting Lipschitz Search (PELS)

In each box B ∈ P, the algorithm iteratively offers the exploration price l_B + γL·diam(B).

▌ If this price is accepted by the buyer:

› the lower bound l_B is increased by L·diam(B).

▌ If this price is rejected:

› the upper bound u_B is decreased by u_B − l_B − 2(γ + 1)L·diam(B), i.e., set to l_B + 2(γ + 1)L·diam(B),
› the price 1 is offered as a penalization price for the r − 1 next rounds in this box B

(if one of them is accepted, we continue offering 1 in all the remaining rounds).
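One exploration step can be sketched as a hypothetical helper, with `lo`/`hi` the per-box bounds, `diam` the box's ℓ∞-diameter, and r the degree of penalization:

```python
def explore_step(box, L, gamma, r, accepted):
    """One PELS exploration step in a box (a sketch).

    box: dict with keys "lo", "hi" (the bounds on v over the box) and "diam".
    Returns (price_offered, penalization_rounds_to_run_next).
    """
    price = box["lo"] + gamma * L * box["diam"]     # exploration price
    if accepted:
        box["lo"] += L * box["diam"]                # tighten the bound from below
        return price, 0
    # Rejection: the upper bound drops to lo + 2(gamma + 1) L diam(B),
    # and the price 1 is offered for the next r - 1 rounds (penalization).
    box["hi"] = box["lo"] + 2 * (gamma + 1) * L * box["diam"]
    return price, r - 1
```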

SLIDE 16

Penalized Exploiting Lipschitz Search (PELS)

▌ If, after an acceptance of an exploration price or after penalization rounds,

we have u_B − l_B < (2γ + 3)L·diam(B),

▌ then PELS:

› offers the exploitation price l_B for g(m_B) next rounds in this box B

(buyer decisions made at them do not affect further pricing);

› bisects each side of the box B to obtain 2^d boxes P_B := {B_1, …, B_{2^d}}

with ℓ∞-diameter equal to diam(B)/2;

› refines the partition P, replacing the box B by the new boxes P_B.

These new boxes P_B

› inherit the state of the bounds l_B and u_B from the current state of B,
› while their depth m_{B′} = m_B + 1 for all B′ ∈ P_B.
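The refinement step can be sketched as follows (the corner-plus-side box representation is an implementation assumption):

```python
from itertools import product

def refine(box, corner, side):
    """Bisect a box into 2^d children, halving its l_inf-diameter (a sketch
    of PELS refinement). Children inherit the bounds lo/hi and get depth + 1."""
    d = len(corner)
    half = side / 2.0
    children = {}
    for offs in product((0, 1), repeat=d):          # one child per orthant
        child_corner = tuple(corner[i] + offs[i] * half for i in range(d))
        children[child_corner] = {"lo": box["lo"], "hi": box["hi"],
                                  "depth": box["depth"] + 1, "side": half}
    return children
```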

SLIDE 17

PELS is optimal

Theorem 1. Let d ≥ 1 and δ₀ ∈ [0, 1). Then for the pricing algorithm A = PELS with:

› the number of penalization rounds r ≥ log_{δ₀}((1 − δ₀)/2),
› the exploitation rate g(m) = 2^m, m ∈ ℤ₊,
› the price offset γ ≥ 2/(1 − δ₀),

for any valuation function v ∈ Lip_L([0,1]^d), discount δ ≤ δ₀, distribution D, and feature vectors x_{1:T}, the strategic regret is upper bounded:

SReg(T, A, v, δ, x_{1:T}, D) ≤ C (Q₀ T^d)^{1/(d+1)} + C Q₀ = Θ(T^{d/(d+1)}),

where C := 2^d r (2γ + 3 + L^{−1}) + 1 and Q₀ := ⌈(4γ + 6)L⌉^d.
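The bound on this slide is partially garbled by extraction; assuming the reading SReg ≤ C(Q₀ T^d)^{1/(d+1)} + C Q₀ with the constants as stated, a quick numeric check confirms it grows sublinearly, as Θ(T^{d/(d+1)}):

```python
import math

def pels_regret_bound(T, d, L, delta0):
    """Evaluate the Theorem-1 upper bound as reconstructed above:
    C * (Q0 * T**d) ** (1/(d+1)) + C * Q0, with
    C = 2**d * r * (2*gamma + 3 + 1/L) + 1 and Q0 = ceil((4*gamma + 6) * L) ** d,
    using the smallest admissible gamma and r for the given delta0."""
    gamma = 2 / (1 - delta0)
    # r >= log_{delta0}((1 - delta0) / 2); trivial when delta0 = 0
    r = max(1, math.ceil(math.log((1 - delta0) / 2, delta0))) if delta0 > 0 else 1
    Q0 = math.ceil((4 * gamma + 6) * L) ** d
    C = 2 ** d * r * (2 * gamma + 3 + 1 / L) + 1
    return C * (Q0 * T ** d) ** (1 / (d + 1)) + C * Q0
```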

SLIDE 18

PELS: main properties and extensions

› Can be applied against a myopic buyer (δ = 0) (the setup of [Mao et al., NIPS'2018])
› PELS is horizon-independent (in contrast to [Mao et al., NIPS'2018])

▌ What if the loss is symmetric?

› We can generalize the algorithm to classical online learning losses.
› For instance, we want to optimize regret of the form Σ_{t=1}^T |v(x_t) − p_t|.
› But we are still interacting with the strategic buyer.
› A slight modification of PELS has regret O(T^{(d−1)/d}), which is tight for d > 1.

SLIDE 19

adrutsa@yandex.ru

Thank you!

Alexey Drutsa Yandex