Introduction Solving the Criterion Empirical Results Conclusion
Optimal Large-Scale Internet Media Selection Gareth James - - PowerPoint PPT Presentation
Optimal Large-Scale Internet Media Selection
Gareth James
Department of Data Sciences and Operations
Marshall School of Business
University of Southern California
October 21st,
A General Constrained Optimization
Consider the following constrained, penalized optimization problem:
$$\arg\min_{\beta} \; g(\beta) + \lambda\|\beta\|_1 \quad \text{subject to } C\beta = b, \tag{1}$$
where $g$ is a convex function, $\beta \in \mathbb{R}^p$ is the coefficient vector, and $C \in \mathbb{R}^{m \times p}$ and $b \in \mathbb{R}^m$ are a predefined matrix and vector. Many important problems can be formulated in this fashion, so we are interested in algorithms for solving (1) and in the statistical properties of the resulting coefficient estimates.
Maximizing Reach Or Click Through Rate
The reach of an advertising campaign is defined as the probability that a random customer views our ad at least once during the campaign, while click through rate (CTR) is the probability that a random customer clicks the ad.
We have $p$ websites and an advertising budget of $B$ dollars. Let $\beta_j$ be the allocation to the $j$th website and let $g(\beta)$ represent the estimated non-reach (or non-CTR) for a given budget allocation.
Hence, we wish to minimize $g(\beta)$ such that $\sum_j \beta_j \le B$ and $\beta_j \ge 0$. Or equivalently minimize $g(\beta) + \lambda\|\beta\|_1$, since $\|\beta\|_1 = \sum_j \beta_j$ when every $\beta_j \ge 0$.
However, in many campaigns we also wish to place restrictions on subsets of websites, e.g. a cruise operator may wish to spend 30% of their budget on travel websites. This imposes a natural constraint of the form $C\beta = b$.
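As a rough illustration of how the cruise-operator restriction could be written as $C\beta = b$, the sketch below builds a single constraint row. The helper name `share_constraint`, the five-website campaign and the allocation values are all hypothetical; the slides do not specify how $C$ is constructed, and this encoding (share of total spend, with $b = 0$) is only one reasonable choice.

```python
import numpy as np

def share_constraint(is_member, share):
    """Row c such that c @ beta == 0 exactly when the flagged
    websites receive `share` of the total spend (hypothetical helper)."""
    is_member = np.asarray(is_member, dtype=float)
    # sum_{j in S} beta_j - share * sum_j beta_j = 0
    return is_member - share

# Hypothetical campaign: 5 websites, the first two are travel sites.
is_travel = [1, 1, 0, 0, 0]
C = share_constraint(is_travel, 0.30).reshape(1, -1)  # m = 1 constraint
b = np.zeros(1)

# An allocation spending 30 out of 100 on travel sites satisfies C beta = b.
beta_ok = np.array([20.0, 10.0, 30.0, 25.0, 15.0])
# An allocation spending 50 out of 100 on travel sites does not.
beta_bad = np.array([50.0, 0.0, 50.0, 0.0, 0.0])

residual_ok = C @ beta_ok - b
residual_bad = C @ beta_bad - b
```

Further restrictions (for example on another category of websites) would add further rows to `C` and entries to `b`.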
Developing an Algorithm
A Quadratic Approximation
We can approximate $g$ by
$$g(\beta) \approx \tfrac{1}{2}\|Y - X\beta\|_2^2 + K,$$
where $X = H^{1/2}$, $Y = H^{-1/2}(J - H\tilde\beta)$, $H$ is the Hessian and $J$ is the Jacobian (both evaluated at $\tilde\beta$). Hence, (1) can be approximated using the constrained lasso criterion:
$$\arg\min_{\beta} \; \tfrac{1}{2}\|Y - X\beta\|_2^2 + \lambda\|\beta\|_1 \quad \text{subject to } C\beta = b. \tag{2}$$
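The quadratic approximation can be checked numerically. The sketch below uses a stand-in convex function (a sum of exponentials, not the reach or CTR function from the talk) and assumes $J$ denotes the gradient of $g$ at $\tilde\beta$. Under that convention, matching the second-order Taylor expansion term by term gives $Y = H^{-1/2}(H\tilde\beta - J)$, so the expression on the slide corresponds to reading $J$ with the opposite sign. All names in the code are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 4
A = rng.normal(size=(6, p))  # hypothetical data defining a convex g

def g(beta):
    # Stand-in convex function: sum_i exp(-a_i . beta)
    return np.exp(-A @ beta).sum()

def gradient(beta):
    w = np.exp(-A @ beta)
    return -A.T @ w

def hessian(beta):
    w = np.exp(-A @ beta)
    return A.T @ (A * w[:, None])

beta_tilde = 0.1 * rng.normal(size=p)
J, H = gradient(beta_tilde), hessian(beta_tilde)

# Symmetric square root of H (H is assumed positive definite).
evals, evecs = np.linalg.eigh(H)
X = evecs @ np.diag(np.sqrt(evals)) @ evecs.T
# Y = H^{-1/2} (H beta_tilde - J), with J taken as the gradient.
Y = np.linalg.solve(X, H @ beta_tilde - J)
# K collects every term that does not depend on beta.
K = g(beta_tilde) - J @ beta_tilde + 0.5 * beta_tilde @ H @ beta_tilde - 0.5 * Y @ Y

def taylor(beta):
    d = beta - beta_tilde
    return g(beta_tilde) + J @ d + 0.5 * d @ H @ d

def least_squares_form(beta):
    r = Y - X @ beta
    return 0.5 * r @ r + K
```

Evaluating `taylor` and `least_squares_form` at any `beta` should give the same value up to floating-point error, which is what lets a standard lasso solver be applied to the approximated problem.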
Intuition for Equality Constraints
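Suppose that we are given an index set, $A$, of size $m$. Then we can partition $\beta = (\beta_A, \beta_{\bar A})$ and $C = (C_A, C_{\bar A})$.
Hence,
$$C_A\beta_A + C_{\bar A}\beta_{\bar A} = b,$$
or, provided the $m \times m$ matrix $C_A$ is invertible,
$$\beta_A = C_A^{-1}(b - C_{\bar A}\beta_{\bar A}).$$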
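A minimal sketch of this substitution, assuming $C_A$ is invertible. The function name `solve_for_A` and the small $2 \times 4$ example are made up for illustration; the free coordinates are passed in increasing index order.

```python
import numpy as np

def solve_for_A(C, b, beta_rest, idx_A):
    """Fill in beta_A from the free coordinates so that C @ beta == b.

    beta_rest holds the coordinates NOT in idx_A, in increasing index order.
    """
    p = C.shape[1]
    idx_A = np.asarray(idx_A)
    idx_rest = np.setdiff1d(np.arange(p), idx_A)
    C_A, C_rest = C[:, idx_A], C[:, idx_rest]
    beta = np.empty(p)
    beta[idx_rest] = beta_rest
    # beta_A = C_A^{-1} (b - C_rest beta_rest); solve() avoids forming the inverse
    beta[idx_A] = np.linalg.solve(C_A, b - C_rest @ beta_rest)
    return beta

C = np.array([[1.0, 1.0, 1.0, 1.0],
              [1.0, -1.0, 0.0, 2.0]])
b = np.array([10.0, 1.0])
beta = solve_for_A(C, b, beta_rest=np.array([3.0, 2.0]), idx_A=[0, 1])
```

Whatever values the free coordinates take, the returned vector satisfies the equality constraint by construction.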
Removing the Constraint
Hence, all we need to do is compute
$$\beta_{\bar A} = \arg\min_{\theta} \; \tfrac{1}{2}\|Y^* - X^*\theta\|_2^2 + \lambda\|\theta\|_1 + \lambda\|C_A^{-1}(b - C_{\bar A}\theta)\|_1, \tag{3}$$
and set $\beta_A = C_A^{-1}(b - C_{\bar A}\beta_{\bar A})$.
The difficulty in computing (3) lies in the non-differentiable and non-separable nature of the second $\ell_1$ penalty.
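The slides do not spell out $Y^*$ and $X^*$. Substituting $\beta_A = C_A^{-1}(b - C_{\bar A}\theta)$ into $Y - X\beta$ suggests $Y^* = Y - X_A C_A^{-1} b$ and $X^* = X_{\bar A} - X_A C_A^{-1} C_{\bar A}$; the sketch below takes those definitions as an assumption and checks on random data that the residuals agree. The function name `reduce_design` and the data are illustrative.

```python
import numpy as np

def reduce_design(X, Y, C, b, idx_A):
    """Eliminate beta_A from Y - X beta using the constraint C beta = b."""
    p = X.shape[1]
    idx_A = np.asarray(idx_A)
    idx_rest = np.setdiff1d(np.arange(p), idx_A)
    # G = C_A^{-1} C_rest and h = C_A^{-1} b, computed via linear solves
    G = np.linalg.solve(C[:, idx_A], C[:, idx_rest])
    h = np.linalg.solve(C[:, idx_A], b)
    X_star = X[:, idx_rest] - X[:, idx_A] @ G
    Y_star = Y - X[:, idx_A] @ h
    return X_star, Y_star, G, h, idx_rest

rng = np.random.default_rng(1)
n, p, m = 8, 5, 2
X, Y = rng.normal(size=(n, p)), rng.normal(size=n)
C, b = rng.normal(size=(m, p)), rng.normal(size=m)
idx_A = [0, 3]
X_star, Y_star, G, h, idx_rest = reduce_design(X, Y, C, b, idx_A)

# Any theta yields a feasible beta with the same residual in both forms.
theta = rng.normal(size=p - m)
beta = np.empty(p)
beta[idx_rest] = theta
beta[idx_A] = h - G @ theta
```

With these definitions the equality constraint disappears from the problem, at the cost of the extra non-separable penalty on $C_A^{-1}(b - C_{\bar A}\theta)$.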
Intuition
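However, if we choose an $m$-vector, $s$, such that $s = \operatorname{sign}(\beta_A)$, then for $\theta$ close enough to $\beta_{\bar A}$,
$$\|C_A^{-1}(b - C_{\bar A}\theta)\|_1 = s^T C_A^{-1}(b - C_{\bar A}\theta),$$
and we can replace the $\ell_1$ penalty by a differentiable term which no longer needs to be separable. Now our optimization becomes
$$\beta_{\bar A} = \arg\min_{\theta} \; \tfrac{1}{2}\|Y^* - X^*\theta\|_2^2 + \lambda s^T C_A^{-1}(b - C_{\bar A}\theta) + \lambda\|\theta\|_1 = \arg\min_{\theta} \; \tfrac{1}{2}\|\tilde Y - \tilde X\theta\|_2^2 + \lambda\|\theta\|_1.$$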
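The slide does not define $\tilde Y$ and $\tilde X$. One choice that works when $X^{*T}X^*$ is invertible is $\tilde X = X^*$ and $\tilde Y = Y^* + \lambda X^*(X^{*T}X^*)^{-1} C_{\bar A}^T (C_A^{-1})^T s$; this is an assumption made for the sketch below, not necessarily the construction used in the talk. The code checks on random data that the two objectives differ only by a constant in $\theta$, and that the sign identity holds near the reference point. `G` and `h` stand for $C_A^{-1}C_{\bar A}$ and $C_A^{-1}b$.

```python
import numpy as np

rng = np.random.default_rng(2)
n, q, m = 7, 3, 2                       # q = p - m free coordinates
X_star, Y_star = rng.normal(size=(n, q)), rng.normal(size=n)
G = rng.normal(size=(m, q))             # stands for C_A^{-1} C_Abar
h = np.array([2.0, -3.0])               # stands for C_A^{-1} b
lam = 0.5

theta0 = np.zeros(q)
s = np.sign(h - G @ theta0)             # signs of beta_A at the reference point

# lam * s^T (h - G theta) = const - v^T theta, with v = lam * G^T s
v = lam * G.T @ s
X_tilde = X_star
Y_tilde = Y_star + X_star @ np.linalg.solve(X_star.T @ X_star, v)

def smooth_part(theta):
    # Least squares term plus the linearised penalty on beta_A
    r = Y_star - X_star @ theta
    return 0.5 * r @ r + lam * s @ (h - G @ theta)

def lasso_part(theta):
    # Plain least squares term with the adjusted response
    r = Y_tilde - X_tilde @ theta
    return 0.5 * r @ r

def l1_of_beta_A(theta):
    return np.abs(h - G @ theta).sum()
```

Because the two smooth parts differ by a constant, adding $\lambda\|\theta\|_1$ to either gives the same minimizer, so an ordinary lasso routine can be run on $(\tilde X, \tilde Y)$.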
Toy Example
[Figure omitted; axis labels 𝜇, 𝜇1, 𝜇2.]
Select Index Set A for m = 2
[Figure annotation: the two largest coefficients are chosen for m = 2.]
Check Coefficients in A Maintained Same Sign
[Figure annotation: the coefficients have not crossed zero, so the solution is correct.]
Not Every Index Set Will Work
[Figure annotation: the coefficients that have not crossed zero give a correct solution; a coefficient that crossed zero would have caused a problem had it been used in the index set.]
Select New Index Set for Next Step
[Figure annotation: new index set.]
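The two checks illustrated by the toy example can be sketched as follows. This is a reading of the figure annotations rather than the talk's actual implementation: the index set is taken to be the $m$ coefficients largest in absolute value, and a step is accepted when no coefficient in the set changes sign. The function names and coefficient values are hypothetical.

```python
import numpy as np

def select_index_set(beta, m):
    """Indices of the m coefficients largest in absolute value (sorted)."""
    return np.sort(np.argsort(-np.abs(beta))[:m])

def signs_maintained(beta_old, beta_new, idx_A):
    """True if no coefficient in idx_A changed sign or reached zero."""
    s_old = np.sign(beta_old[idx_A])
    s_new = np.sign(beta_new[idx_A])
    return bool(np.all(s_old == s_new) and np.all(s_new != 0))

# Hypothetical coefficients at two consecutive steps.
beta_prev = np.array([0.1, -2.0, 0.0, 1.5, -0.3])
beta_next = np.array([0.0, -1.7, 0.2, 1.1, 0.1])

idx_A = select_index_set(beta_prev, m=2)
ok = signs_maintained(beta_prev, beta_next, idx_A)
# A poorly chosen index set: these coefficients cross or hit zero.
bad = signs_maintained(beta_prev, beta_next, np.array([0, 4]))
```

Choosing the largest coefficients makes a sign change within a small step less likely, which is presumably why the toy example selects them; when the check fails, a new index set would be chosen before continuing.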
Website Data
Click Through Rate
[Figure: two panels, CTR (Full) and CTR (Travel Subset), each plotted against Budget (in millions).]
Summary
A large number of real world problems are special cases of this constrained and penalized framework.
A simple algorithm, using standard lasso fitting methods, can be used to efficiently compute the solution to our optimization problem.
Theoretical bounds on the coefficients can be extended from the lasso and suggest better performance.
Simulation results show practical improvement, computational efficiency and relative insensitivity to the constraints.
Provides a highly efficient and practical approach to select optimal allocations of advertising budget in situations