Optimal Non-parametric Learning in Repeated Contextual Auctions with Strategic Buyer
Alexey Drutsa
Setup
Repeated Contextual Posted-Price Auctions
Different goods (e.g., ad spaces)
▸ described by d-dimensional feature vectors (contexts) from [0,1]^d
▸ are repeatedly offered for sale by a seller
▸ to a single buyer over T rounds (one good per round).
The buyer
▸ holds a private fixed valuation function v: [0,1]^d → [0,1]
▸ used to calculate his valuation v(x) for a good with context x ∈ [0,1]^d,
▸ v is unknown to the seller.
At each round t = 1, …, T,
▸ a feature vector x_t of the current good is observed by the seller and the buyer,
▸ a price p_t is offered by the seller,
▸ and an allocation decision a_t ∈ {0,1} is made by the buyer:
a_t = 0 when the buyer rejects, and a_t = 1 when the buyer accepts.
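The round protocol above can be sketched as a loop. This is a minimal illustrative sketch with a myopic (truthful) buyer who accepts iff his value covers the price; all names (`run_rounds`, `seller_price`, `buyer_value`) are mine, not from the paper, and a strategic buyer would deviate from this accept rule.

```python
# Sketch of the repeated posted-price protocol with a truthful buyer.
# Illustrative only: a strategic buyer may reject profitable prices on purpose.

def run_rounds(seller_price, buyer_value, contexts):
    """Play one round per context: the seller posts p_t using only the
    visible history, the buyer accepts (a_t = 1) iff v(x_t) >= p_t."""
    history = []  # (x_t, p_t, a_t) triples, visible to the seller
    for x in contexts:
        p = seller_price(x, history)         # may use past decisions/contexts only
        a = 1 if buyer_value(x) >= p else 0  # truthful accept/reject
        history.append((x, p, a))
    return history
```
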
Seller's pricing algorithm and buyer strategy
The seller applies a pricing algorithm A that sets prices {p_t}_{t=1}^{T} in response to buyer decisions a = {a_t}_{t=1}^{T} and observed contexts x = {x_t}_{t=1}^{T}. The price p_t can depend only on
▸ past decisions {a_s}_{s=1}^{t−1}
▸ feature vectors {x_s}_{s=1}^{t}
▸ the horizon T
Strategic buyer
The seller announces her pricing algorithm A in advance.
The buyer has some distribution (beliefs) D about future contexts. In each round t, given the history of previous rounds, he chooses his decision a_t s.t. it maximizes his future γ-discounted surplus:

E_{x_s ∼ D} [ Σ_{s=t}^{T} γ^{s−t} a_s (v(x_s) − p_s) ],   γ ∈ (0,1]
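For a realized play path, the buyer's γ-discounted surplus from round t onwards can be computed as below. This is a sketch with hypothetical names: the slide's expression additionally takes an expectation over future contexts drawn from D, which is omitted here.

```python
def discounted_surplus(values, prices, decisions, gamma, t=1):
    """Realized sum over s = t..T of gamma^(s-t) * a_s * (v(x_s) - p_s).
    values[s-t] plays the role of v(x_s); no expectation over D is taken."""
    return sum(gamma ** (s - t) * a * (v - p)
               for s, (v, p, a) in enumerate(zip(values, prices, decisions), start=t))
```
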
[Figure: the game's workflow and knowledge structure. Public knowledge before the game starts: the seller's algorithm A. Private knowledge of the buyer: the valuation v and the beliefs D. In each round t = 1, 2, 3, …: Nature reveals the context x_t, the seller's algorithm offers the price p_t, and the buyer makes the decision a_t.]
Seller's goal
The seller's strategic regret:

SReg(T, A, v, γ, x_{1:T}, D) := Σ_{t=1}^{T} (v(x_t) − a_t p_t)

We will learn the function v in a non-parametric way. For this, we assume that it is Lipschitz (a standard requirement for non-parametric learning):

Lip_L([0,1]^d) := { f: [0,1]^d → [0,1] | ∀x, y ∈ [0,1]^d: |f(x) − f(y)| ≤ L ‖x − y‖∞ }

The seller seeks a no-regret pricing for the worst-case valuation function:

sup_{v ∈ Lip_L([0,1]^d), γ, x_{1:T}, D} SReg(T, A, v, γ, x_{1:T}, D) = o(T)

Optimality: the lowest possible upper bound for the regret, of the form O(f(T)).
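As a sanity check, the strategic regret of a realized play can be computed directly from its definition. An illustrative sketch (function name mine): v(x_t) is the revenue of the benchmark that sells every good exactly at the buyer's value, and a_t·p_t is the seller's realized revenue.

```python
def strategic_regret(values, prices, decisions):
    """SReg = sum_t (v(x_t) - a_t * p_t): benchmark revenue minus realized revenue."""
    return sum(v - a * p for v, p, a in zip(values, prices, decisions))
```
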
Background & Research question
Background
[Kleinberg et al., FOCS'2003] Non-contextual setup (d = 0). Horizon-dependent optimal algorithm against a myopic buyer (γ = 0) with truthful regret Θ(log log T).
[Mao et al., NIPS'2018] Our non-parametric contextual setup (d > 0). Horizon-dependent optimal algorithm against a myopic buyer (γ = 0) with truthful regret Θ(T^{d/(d+1)}).
[Drutsa, WWW'2017] Non-contextual setup (d = 0). Horizon-independent optimal algorithm against a strategic buyer with regret Θ(log log T) for γ < 1.
[Amin et al., NIPS'2013] Non-contextual setup (d = 0). The strategic setting is introduced. ∄ no-regret pricing for the non-discounted case γ = 1.
Research question
The key approaches of the non-contextual optimal algorithms ([pre]PRRFES) cannot be directly applied to the contextual algorithm of [Mao et al., NIPS'2018]. To search for the valuation of the strategic buyer without contexts:
▸ penalization rounds are used
▸ we do not propose prices below the ones that were earlier accepted
In the approach of [Mao et al., NIPS'2018]:
▸ standard penalization does not help
▸ proposed prices can be below the ones that were earlier accepted by the buyer
In this study, I overcome these issues and propose an optimal non-parametric algorithm for the contextual setting with a strategic buyer.
Novel optimal algorithm
Penalized Exploiting Lipschitz Search (PELS)
PELS has three parameters:
▸ the price offset g ∈ [1, +∞)
▸ the degree of penalization M ∈ ℕ
▸ the exploitation rate m: ℤ₊ → ℤ₊
This algorithm keeps track of
▸ a partition P of the feature domain [0,1]^d
▸ initialized to ⌈(4g + 6)L⌉^d cubes (boxes) with side length r = 1/⌈(4g + 6)L⌉:

P = { I_1 × I_2 × ⋯ × I_d | I_1, I_2, …, I_d ∈ { [0, r), [r, 2r), …, [1 − r, 1] } }.
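The initial grid and the lookup of the box containing a context can be sketched as follows. Helper names (`grid_size`, `box_of`) are mine; the ceiling guarantees an integer number of cubes per axis.

```python
import math

def grid_size(L, g):
    """Cubes per axis: k = ceil((4g + 6) * L); the side length is r = 1 / k."""
    return math.ceil((4 * g + 6) * L)

def box_of(x, k):
    """Index of the grid cube containing x in the k^d partition of [0,1]^d."""
    return tuple(min(int(xi * k), k - 1) for xi in x)  # clamp xi = 1.0 into the last cube
```
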
Penalized Exploiting Lipschitz Search (PELS)
For each box B ∈ P, PELS also keeps track of:
▸ the lower bound u_B ∈ [0,1],
▸ the upper bound w_B ∈ [0,1],
▸ the depth h_B ∈ ℤ₊.
They are initialized as follows: u_B = 0, w_B = 1, and h_B = 0 for all B ∈ P. The workflow of the algorithm is organized independently in each box B ∈ P.
▸ The algorithm receives a good with a feature vector x_t ∈ [0,1]^d
▸ and finds the box B ∈ P in the current partition P s.t. x_t ∈ B.
Then the proposed price p_t is determined only from the current state associated with the box B, while the buyer decision a_t is used only to update the state associated with this box B.
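The per-box state and its lazy lookup might look like this. A sketch: `BoxState` and its field names are mine, mirroring the slide's lower bound, upper bound, and depth.

```python
from dataclasses import dataclass

@dataclass
class BoxState:
    lo: float = 0.0   # lower bound u_B on v over the box, initialized to 0
    hi: float = 1.0   # upper bound w_B on v over the box, initialized to 1
    depth: int = 0    # h_B: number of bisections this box has undergone

def state_of(states, box):
    """Fetch (or lazily create) the state of the box containing the context."""
    if box not in states:
        states[box] = BoxState()
    return states[box]
```
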
Penalized Exploiting Lipschitz Search (PELS)
In each box B ∈ P, the algorithm iteratively offers the exploration price u_B + gL·diam(B).
If this price is accepted by the buyer:
▸ the lower bound u_B is increased by L·diam(B).
If this price is rejected:
▸ the upper bound w_B is decreased by w_B − u_B − 2(g + 1)L·diam(B), i.e., set to u_B + 2(g + 1)L·diam(B)
▸ the price 1 is offered as a penalization price for the M − 1 next rounds in this box B
(if one of them is accepted, we continue offering 1 for all the remaining rounds).
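The bound updates on accept/reject can be sketched per box. A sketch under the slide's rules and with my own names; the penalization rounds that follow a rejection are only signalled by the returned flag, not simulated.

```python
def explore_step(lo, hi, accepted, L, g, diam):
    """One exploration step: the offered price is lo + g*L*diam.
    On accept, raise the lower bound by L*diam; on reject, pull the
    upper bound down to lo + 2*(g+1)*L*diam and trigger penalization."""
    if accepted:
        return lo + L * diam, hi, False           # no penalization needed
    return lo, lo + 2 * (g + 1) * L * diam, True  # then M-1 penalization rounds at price 1
```
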
Penalized Exploiting Lipschitz Search (PELS)
If, after an acceptance of an exploration price or after penalization rounds, we have

w_B − u_B < (2g + 3)L·diam(B),

then PELS:
▸ offers the exploitation price u_B for m(h_B) next rounds in this box B
(buyer decisions made at them do not affect further pricing);
▸ bisects each side of the box B to obtain 2^d boxes B′ ∈ {B_1, …, B_{2^d}}
with ℓ∞-diameter equal to diam(B)/2;
▸ refines the partition P, replacing the box B by the new boxes B′.
These new boxes B′
▸ inherit the state of the bounds u_B and w_B from the current state of B,
▸ while their depth h_{B′} = h_B + 1 for all B′ ∈ {B_1, …, B_{2^d}}.
PELS is optimal
Theorem 1. Let L ≥ 1 and γ₀ ∈ [0, 1). Then for the pricing algorithm A = PELS with:
▸ the number of penalization rounds M ≥ log_{γ₀}((1 − γ₀)/2),
▸ the exploitation rate m(h) = 2^h, h ∈ ℤ₊,
▸ the price offset g ≥ 2/(1 − γ₀),
for any valuation function v ∈ Lip_L([0,1]^d), discount γ ≤ γ₀, distribution D, and feature vectors x_{1:T}, the strategic regret is upper bounded:

SReg(T, A, v, γ, x_{1:T}, D) ≤ C (K₀ T^d + K₀^{d+1})^{1/(d+1)} = Θ(T^{d/(d+1)}),

where C := 2^d L(2g + 3) + g^{−1} + 1 and K₀ := ⌈(4g + 6)L⌉^d.
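For concreteness, the theorem's parameter choices can be instantiated numerically. A sketch taking the minimal admissible values; the myopic fallback M = 1 for γ₀ = 0 is my assumption, since the logarithmic bound degenerates there.

```python
import math

def pels_params(gamma0, L, d):
    """Minimal M and g, plus the initial box count K0, for a discount cap gamma0 in [0, 1)."""
    g = 2 / (1 - gamma0)                     # price offset: g >= 2 / (1 - gamma0)
    # penalization rounds: M >= log_{gamma0}((1 - gamma0) / 2); assume M = 1 if gamma0 = 0
    M = math.ceil(math.log((1 - gamma0) / 2, gamma0)) if 0 < gamma0 < 1 else 1
    K0 = math.ceil((4 * g + 6) * L) ** d     # initial number of boxes in the partition
    return M, g, K0
```
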
PELS: main properties and extensions
▸ PELS can be applied against a myopic buyer (γ = 0) (the setup of [Mao et al., NIPS'2018])
▸ PELS is horizon-independent (in contrast to [Mao et al., NIPS'2018])
What if the loss is symmetric?
▸ We can generalize the algorithm to classical online learning losses.
▸ For instance, we may want to optimize regret of the form Σ_{t=1}^{T} |v(x_t) − p_t|,
▸ while still interacting with the strategic buyer.
▸ A slight modification of PELS has regret O(T^{(d−1)/d}), which is tight for d > 1.
adrutsa@yandex.ru
Thank you!
Alexey Drutsa Yandex