Optimal Large-Scale Internet Media Selection, Gareth James (PowerPoint presentation)

Introduction Solving the Criterion Empirical Results Conclusion

Optimal Large-Scale Internet Media Selection

Gareth James

Department of Data Sciences and Operations Marshall School of Business University of Southern California

October 21st, 2016. Joint work with Paat Rusmevichientong, Lan Luo and Courtney Paulson

Introduction

A General Constrained Optimization

Consider the following constrained, penalized optimization problem:

$$\arg\min_{\beta} \; g(\beta) + \lambda\|\beta\|_1 \quad \text{subject to} \quad C\beta = b, \tag{1}$$

where $g$ is a convex function, $\beta \in \mathbb{R}^p$, and $C \in \mathbb{R}^{m \times p}$ and $b \in \mathbb{R}^m$ are a predefined matrix and vector. Many important problems can be formulated in this way, so we are interested both in algorithms for solving (1) and in the statistical properties of the resulting coefficient estimates.

Maximizing Reach or Click-Through Rate

The reach of an advertising campaign is the probability that a random customer views our ad at least once during the campaign, while the click-through rate (CTR) is the probability that a random customer clicks the ad. We have $p$ websites and an advertising budget of $B$ dollars. Let $\beta_j$ be the allocation to the $j$th website, and let $g(\beta)$ denote the estimated non-reach (or non-CTR), i.e. one minus the reach (or CTR), for a given budget allocation. Hence, we wish to minimize $g(\beta)$ subject to

$$\sum_j \beta_j \le B \quad \text{and} \quad \beta_j \ge 0,$$

or equivalently minimize $g(\beta) + \lambda\|\beta\|_1$ over $\beta_j \ge 0$: on the nonnegative orthant $\|\beta\|_1 = \sum_j \beta_j$, so the budget $B$ corresponds to a choice of $\lambda$. However, in many campaigns we also wish to place restrictions on subsets of websites; for example, a cruise operator may wish to spend 30% of its budget on travel websites, i.e. $\sum_{j \in \text{travel}} \beta_j = 0.3B$. This imposes a natural constraint of the form $C\beta = b$.
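To make the constraint concrete, here is a minimal sketch of how the 30% travel-share restriction could be written as one row of $C\beta = b$. It assumes NumPy. The helper `share_constraint`, the website categories and the allocation are all hypothetical illustrations and are not taken from the talk's data.

```python
import numpy as np

def share_constraint(categories, target, share, budget):
    """Build a single-row constraint C @ beta = b requiring that a fraction
    `share` of `budget` is spent on websites whose category is `target`."""
    row = np.array([1.0 if c == target else 0.0 for c in categories])
    C = row.reshape(1, -1)            # m x p with m = 1
    b = np.array([share * budget])    # m-vector
    return C, b

# Hypothetical campaign: five websites, two of which are travel sites.
categories = ["news", "travel", "sports", "travel", "social"]
budget = 1_000_000.0
C, b = share_constraint(categories, "travel", 0.30, budget)

# A candidate allocation that spends 150k on each travel site.
beta = np.array([200_000.0, 150_000.0, 250_000.0, 150_000.0, 250_000.0])
print(C @ beta, b)  # travel spend versus the required 30% share
```

Further restrictions, such as a cap on news sites, would simply add more rows to `C` and more entries to `b`.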

Solving the Criterion: Developing an Algorithm

A Quadratic Approximation

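We can approximate $g$ by

$$g(\beta) \approx \tfrac{1}{2}\|Y - X\beta\|_2^2 + K,$$

where $X = H^{1/2}$, $Y = H^{-1/2}(H\tilde{\beta} - J)$, $H$ is the Hessian and $J$ is the Jacobian (gradient) of $g$, both evaluated at $\tilde{\beta}$, and $K$ is a constant. Hence, (1) can be approximated by the constrained lasso criterion

$$\arg\min_{\beta} \; \tfrac{1}{2}\|Y - X\beta\|_2^2 + \lambda\|\beta\|_1 \quad \text{subject to} \quad C\beta = b. \tag{2}$$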
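A quick numerical check of the quadratic approximation may help. The sketch below assumes NumPy, a symmetric positive definite Hessian $H$, and that $J$ is the gradient of $g$ at $\tilde\beta$. It uses $Y = H^{-1/2}(H\tilde\beta - J)$, which is the sign that makes the linear term equal $J^T(\beta - \tilde\beta)$. It then confirms that $\tfrac12\|Y - X\beta\|_2^2$ differs from the second-order Taylor expansion only by a constant. The random $H$, $J$ and $\tilde\beta$ are hypothetical.

```python
import numpy as np

def quadratic_surrogate(H, J, beta_tilde):
    """Return (X, Y) such that 0.5 * ||Y - X @ beta||^2 equals the
    second-order Taylor expansion of g around beta_tilde, up to a constant.
    Assumes H is symmetric positive definite and J is the gradient."""
    w, V = np.linalg.eigh(H)
    X = V @ np.diag(np.sqrt(w)) @ V.T                  # H^{1/2}
    H_inv_sqrt = V @ np.diag(1.0 / np.sqrt(w)) @ V.T   # H^{-1/2}
    Y = H_inv_sqrt @ (H @ beta_tilde - J)
    return X, Y

rng = np.random.default_rng(0)
p = 4
R = rng.normal(size=(p, p))
H = R @ R.T + p * np.eye(p)      # hypothetical positive definite Hessian
J = rng.normal(size=p)           # hypothetical gradient at beta_tilde
beta_tilde = rng.normal(size=p)
X, Y = quadratic_surrogate(H, J, beta_tilde)

def taylor_increment(beta):
    """Second-order approximation of g(beta) - g(beta_tilde)."""
    d = beta - beta_tilde
    return J @ d + 0.5 * d @ H @ d

def surrogate(beta):
    r = Y - X @ beta
    return 0.5 * r @ r

# The gap should be the same constant for every beta.
gaps = [taylor_increment(b) - surrogate(b) for b in rng.normal(size=(5, p))]
print(max(gaps) - min(gaps))  # zero up to floating-point rounding
```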

Intuition for Equality Constraints

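Suppose that we are given an index set $A$ of size $m$. Then we can partition $\beta = (\beta_A, \beta_{\bar A})$ and $C = (C_A, C_{\bar A})$. Hence,

$$C_A\beta_A + C_{\bar A}\beta_{\bar A} = b,$$

or, provided the $m \times m$ matrix $C_A$ is invertible,

$$\beta_A = C_A^{-1}(b - C_{\bar A}\beta_{\bar A}).$$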
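The partition step can be sketched in a few lines. Once $\beta_{\bar A}$ is fixed, $\beta_A$ is pinned down by the equality constraint. This assumes NumPy and an invertible $C_A$. The function name `recover_beta_A` and the small example matrices are hypothetical.

```python
import numpy as np

def recover_beta_A(C, b, A, beta_Abar):
    """Given index set A (|A| = m) and the free coefficients beta_Abar,
    solve C_A beta_A = b - C_Abar beta_Abar and return the full beta.
    Assumes the m x m matrix C_A is invertible."""
    p = C.shape[1]
    Abar = [j for j in range(p) if j not in A]
    C_A, C_Abar = C[:, A], C[:, Abar]
    beta_A = np.linalg.solve(C_A, b - C_Abar @ beta_Abar)
    beta = np.empty(p)
    beta[A], beta[Abar] = beta_A, beta_Abar
    return beta

# Hypothetical example with m = 2 constraints and p = 5 coefficients.
C = np.array([[1.0, 1.0, 1.0, 1.0, 1.0],
              [0.0, 1.0, 0.0, 1.0, 0.0]])
b = np.array([10.0, 3.0])
A = [0, 1]
beta = recover_beta_A(C, b, A, beta_Abar=np.array([2.0, 1.0, 4.0]))
print(beta, C @ beta)  # beta satisfies C @ beta = b by construction
```

Using `np.linalg.solve` rather than forming $C_A^{-1}$ explicitly is the usual numerically safer choice.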

Removing the Constraint

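Hence, all we need to do is compute

$$\beta_{\bar A} = \arg\min_{\theta} \; \tfrac{1}{2}\|Y^* - X^*\theta\|_2^2 + \lambda\|\theta\|_1 + \lambda\|C_A^{-1}(b - C_{\bar A}\theta)\|_1, \tag{3}$$

where, partitioning $X = (X_A, X_{\bar A})$ in the same way, $Y^* = Y - X_A C_A^{-1} b$ and $X^* = X_{\bar A} - X_A C_A^{-1} C_{\bar A}$ (obtained by substituting $\beta_A$ into $Y - X\beta$). We then set $\beta_A = C_A^{-1}(b - C_{\bar A}\beta_{\bar A})$.

The difficulty in computing (3) lies in the second $\ell_1$ penalty, which is both non-differentiable and non-separable in $\theta$.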
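The elimination can be checked numerically. The sketch below assumes NumPy and an invertible $C_A$. It builds $Y^* = Y - X_A C_A^{-1}b$ and $X^* = X_{\bar A} - X_A C_A^{-1}C_{\bar A}$, then verifies two things for an arbitrary $\theta$. First, the implied $\beta$ is feasible. Second, criterion (3) equals the constrained lasso objective in (2). The names `reduced_problem` and `criterion_3` and the random data are hypothetical.

```python
import numpy as np

def reduced_problem(X, Y, C, b, A):
    """Eliminate beta_A via the constraint C @ beta = b.
    Returns X_star, Y_star, M, v, Abar where beta_A = v - M @ theta,
    v = C_A^{-1} b and M = C_A^{-1} C_Abar (C_A assumed invertible)."""
    p = X.shape[1]
    Abar = [j for j in range(p) if j not in A]
    C_A = C[:, A]
    v = np.linalg.solve(C_A, b)
    M = np.linalg.solve(C_A, C[:, Abar])
    Y_star = Y - X[:, A] @ v
    X_star = X[:, Abar] - X[:, A] @ M
    return X_star, Y_star, M, v, Abar

def criterion_3(theta, X_star, Y_star, M, v, lam):
    """Unconstrained objective (3) in the free coefficients theta."""
    r = Y_star - X_star @ theta
    return 0.5 * r @ r + lam * np.abs(theta).sum() + lam * np.abs(v - M @ theta).sum()

# Hypothetical random problem: n = 20, p = 6, m = 2 constraints.
rng = np.random.default_rng(1)
n, p, m, lam = 20, 6, 2, 0.5
X, Y = rng.normal(size=(n, p)), rng.normal(size=n)
C, b = rng.normal(size=(m, p)), rng.normal(size=m)
A = [0, 1]
X_star, Y_star, M, v, Abar = reduced_problem(X, Y, C, b, A)

# Any theta yields a feasible beta, and (3) matches the objective in (2).
theta = rng.normal(size=p - m)
beta = np.empty(p)
beta[A], beta[Abar] = v - M @ theta, theta
full = 0.5 * np.sum((Y - X @ beta) ** 2) + lam * np.abs(beta).sum()
print(np.allclose(C @ beta, b), np.isclose(full, criterion_3(theta, X_star, Y_star, M, v, lam)))
```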

Intuition

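However, if we choose an $m$-vector $s$ such that $s = \operatorname{sign}(\beta_A)$ (assuming every entry of $\beta_A$ is nonzero), then for $\theta$ close enough to $\beta_{\bar A}$,

$$\|C_A^{-1}(b - C_{\bar A}\theta)\|_1 = s^T C_A^{-1}(b - C_{\bar A}\theta),$$

and we can replace the $\ell_1$ penalty by a differentiable (linear) term, which no longer needs to be separable. Our optimization now becomes

$$\beta_{\bar A} = \arg\min_{\theta} \; \tfrac{1}{2}\|Y^* - X^*\theta\|_2^2 + \lambda s^T C_A^{-1}(b - C_{\bar A}\theta) + \lambda\|\theta\|_1 = \arg\min_{\theta} \; \tfrac{1}{2}\|\tilde{Y} - \tilde{X}\theta\|_2^2 + \lambda\|\theta\|_1,$$

where the linear term has been absorbed into the least-squares term through a modified response $\tilde{Y}$ and design $\tilde{X}$, leaving a standard lasso problem.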
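The slides do not give explicit formulas for $\tilde Y$ and $\tilde X$. One possible construction keeps $\tilde X = X^*$ and shifts the response: $\tilde Y = Y^* + \lambda X^*(X^{*T}X^*)^{-1}M^T s$, where $M = C_A^{-1}C_{\bar A}$. This assumes $X^*$ has full column rank. The sketch below, which assumes NumPy and uses hypothetical names and random data, checks two things. First, the linearized criterion and the resulting standard lasso differ only by a constant, so they share a minimizer. Second, at a point where the signs match, the linear term reproduces the $\ell_1$ penalty exactly.

```python
import numpy as np

def absorb_linear_term(X_star, Y_star, M, s, lam):
    """Return (X_tilde, Y_tilde) such that
        0.5*||Y_tilde - X_tilde @ theta||^2
      = 0.5*||Y_star - X_star @ theta||^2 - lam * s @ M @ theta + const.
    One possible construction; assumes X_star has full column rank."""
    shift = np.linalg.solve(X_star.T @ X_star, M.T @ s)
    return X_star, Y_star + lam * X_star @ shift

# Hypothetical reduced problem: beta_A = v - M @ theta, as in (3).
rng = np.random.default_rng(2)
n, q, m, lam = 30, 4, 2, 0.3
X_star, Y_star = rng.normal(size=(n, q)), rng.normal(size=n)
M, v = rng.normal(size=(m, q)), rng.normal(size=m)
theta0 = rng.normal(size=q)
s = np.sign(v - M @ theta0)          # s = sign(beta_A) at theta0
X_t, Y_t = absorb_linear_term(X_star, Y_star, M, s, lam)

def linearized(theta):
    """(3) with the second l1 penalty replaced by lam * s^T (v - M theta)."""
    r = Y_star - X_star @ theta
    return 0.5 * r @ r + lam * s @ (v - M @ theta) + lam * np.abs(theta).sum()

def standard_lasso(theta):
    r = Y_t - X_t @ theta
    return 0.5 * r @ r + lam * np.abs(theta).sum()

gaps = [linearized(t) - standard_lasso(t) for t in rng.normal(size=(5, q))]
print(max(gaps) - min(gaps))  # zero up to rounding: same minimizer
print(np.isclose(s @ (v - M @ theta0), np.abs(v - M @ theta0).sum()))  # True
```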

Toy Example

[Figure: toy example plotted against $\mu$, with the points $\mu_1$ and $\mu_2$ marked.]

Select Index Set A for m = 2

[Figure annotation: the two largest coefficients are selected for $m = 2$.]

Check That Coefficients in A Maintained the Same Sign

[Figure annotation: the coefficients have not crossed zero, so the solution is correct.]

Not Every Index Set Will Work

[Figure annotations: the coefficients in $A$ have not crossed zero, so the solution is correct; another coefficient crossed zero, so using it in the index set would have caused a problem.]

Select New Index Set for Next Step

[Figure annotation: new index set.]
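The toy example suggests a step-by-step procedure. Choose $A$ from the current solution, fix $s = \operatorname{sign}(\beta_A)$, solve the resulting standard lasso, and check whether the coefficients in $A$ kept their signs. The sketch below implements one such step under several assumptions. $A$ is taken to be the indices of the $m$ largest $|\beta_j|$ from a previous solution, as in the $m = 2$ illustration. $C_A$ is assumed invertible and $X^*$ is assumed to have full column rank. A plain coordinate-descent solver, chosen here for self-containment, stands in for "standard lasso fitting methods". All function names are hypothetical. The slides do not say how a new index set is chosen after a failed check, so the sketch only reports the failure.

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_cd(X, Y, lam, n_sweeps=500):
    """Cyclic coordinate descent for 0.5*||Y - X theta||^2 + lam*||theta||_1
    (a stand-in for any standard lasso solver)."""
    theta = np.zeros(X.shape[1])
    col_sq = (X ** 2).sum(axis=0)
    r = np.array(Y, dtype=float)      # residual Y - X @ theta (theta = 0)
    for _ in range(n_sweeps):
        for j in range(X.shape[1]):
            r += X[:, j] * theta[j]   # partial residual without coordinate j
            theta[j] = soft_threshold(X[:, j] @ r, lam) / col_sq[j]
            r -= X[:, j] * theta[j]   # restore residual with updated theta_j
    return theta

def index_set_step(X, Y, C, b, lam, beta_prev):
    """One step of the index-set heuristic (hypothetical name).
    Returns (beta, ok); ok is False if some coefficient in A changed sign."""
    m, p = C.shape
    A = [int(j) for j in np.argsort(-np.abs(beta_prev))[:m]]  # m largest |beta_j|
    Abar = [j for j in range(p) if j not in A]
    s = np.sign(beta_prev[A])
    M = np.linalg.solve(C[:, A], C[:, Abar])   # C_A^{-1} C_Abar
    v = np.linalg.solve(C[:, A], b)            # C_A^{-1} b
    X_star = X[:, Abar] - X[:, A] @ M
    Y_star = Y - X[:, A] @ v
    # Fold lam * s^T (v - M theta) into the response (X_star full column rank).
    Y_tilde = Y_star + lam * X_star @ np.linalg.solve(X_star.T @ X_star, M.T @ s)
    theta = lasso_cd(X_star, Y_tilde, lam)
    beta = np.empty(p)
    beta[Abar] = theta
    beta[A] = v - M @ theta
    ok = bool(np.all(np.sign(beta[A]) == s))
    return beta, ok

# Hypothetical data with known coefficients used as the previous solution.
rng = np.random.default_rng(3)
n, p, m, lam = 50, 6, 2, 1.0
X = rng.normal(size=(n, p))
beta_true = np.array([3.0, -2.0, 0.0, 0.0, 1.5, 0.0])
C = rng.normal(size=(m, p))
b = C @ beta_true
Y = X @ beta_true + 0.1 * rng.normal(size=n)
beta, ok = index_set_step(X, Y, C, b, lam, beta_prev=beta_true)
print(ok, np.round(beta, 3))
```

The key design choice is fixing $s$ from the previous solution, because that is what reduces (3) to a standard lasso. The sign check then tells us whether that linearization was valid, which mirrors the "have not crossed zero" check in the toy example. When `ok` is true, the result solves the constrained criterion (2).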

Empirical Results: Website Data

Click-Through Rate

[Figure: CTR against budget (in millions of dollars) in two panels, "CTR (Full)" and "CTR (Travel Subset)".]

Conclusion

Summary

• A large number of real-world problems are special cases of this constrained and penalized framework.
• A simple algorithm, using standard lasso fitting methods, can efficiently compute the solution to our optimization problem.
• Theoretical bounds on the coefficients can be extended from the lasso and suggest better performance.
• Simulation results show practical improvement, computational efficiency and relative insensitivity to the constraints.
• The approach provides a highly efficient and practical way to select optimal allocations of advertising budget in situations involving thousands of websites.