Limits on Representing Functions by Linear Combinations of Simple Functions



slide-1
SLIDE 1

Limits on Representing Functions by Linear Combinations of Simple Functions

Ryan Williams MIT

[Figure: is $f : \{0,1\}^n \to \{0,1\}$ equivalent ($\equiv$) to a sum ($\sum$) of “simple” functions?]

slide-2
SLIDE 2

The ℝ-linear Representation Problem

Let $\mathcal{C}$ be a class of “simple” functions (they take Boolean inputs, but need not be Boolean-valued).

Which “interesting” functions $f$ can(not) be represented by “short” ℝ-linear combinations of functions from $\mathcal{C}$?

[Figure: $f : \{0,1\}^n \to \{0,1\}$ $\equiv$ a sum ($\sum$) of “simple” functions with real coefficients (labels in the figure: $2$, $-\pi$, $-e$, $\varphi$). Can the sum have poly($n$) “size”?]

Call this a $\sum \circ \mathcal{C}$ circuit.

Note: If $\mathcal{C}$ spans the vector space of all functions $f : \{0,1\}^n \to \mathbb{R}$, then there is always a $\sum \circ \mathcal{C}$ circuit of size $\le 2^n$…

slide-3
SLIDE 3

The ℝ-linear Representation Problem

Which “interesting” functions $f$ can(not) be represented by “short” ℝ-linear combinations of functions from $\mathcal{C}$?

If $\mathcal{C}$ is the class of $2^n$ AND functions on $n$ variables: $\sum \circ AND$ $\equiv$ $0/1$ polynomials over ℝ

If $\mathcal{C}$ is the class of $2^n$ PARITY functions on $n$ variables: $\sum \circ PARITY$ $\equiv$ $-1/1$ polynomials over ℝ (Fourier analysis of Boolean functions)

These are well-understood: $\mathcal{C}$ is a basis for the vector space of functions $f : \{0,1\}^n \to \mathbb{R}$ ⇒ the ℝ-linear representation of $f$ is unique, so the “shortest” is also the “longest”…

More interesting cases: representations are not unique
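To make the uniqueness claim concrete, here is a minimal Python sketch that computes the (unique) coefficients of a function in the AND basis by Moebius inversion over subsets. The function names and the bitmask encoding of subsets are illustrative assumptions, not something taken from the slides.

```python
def and_basis_coefficients(f, n):
    """Return {mask: c} with f(x) = sum of c over all masks contained in supp(x).

    Each mask encodes a subset S of the variables; AND_S(x) = 1 iff x_i = 1 for all i in S.
    """
    # table[mask] = f evaluated at the point whose 1-coordinates are the bits of mask.
    table = [f(tuple((mask >> i) & 1 for i in range(n))) for mask in range(2 ** n)]
    # In-place Moebius inversion over the subset lattice, one variable at a time.
    for i in range(n):
        for mask in range(2 ** n):
            if mask & (1 << i):
                table[mask] -= table[mask ^ (1 << i)]
    return {mask: c for mask, c in enumerate(table) if c != 0}


def evaluate_and_combination(coeffs, x):
    """Evaluate the linear combination of AND functions at the Boolean point x."""
    xmask = sum(bit << i for i, bit in enumerate(x))
    return sum(c for mask, c in coeffs.items() if mask & xmask == mask)


def parity(x):
    return sum(x) % 2


def conjunction(x):
    return int(all(x))
```

Under this encoding, the AND of all inputs needs a single term, while PARITY on 3 variables already needs all 7 non-empty subsets: since the representation is unique, there is no shorter one to look for.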

slide-4
SLIDE 4
This Paper: Three Simple Classes

  • 1. Linear Threshold Functions [$LTF$]
  • 2. Rectified Linear Units [$ReLU$]
  • 3. $GF(p)$-Polynomials of Degree-$d$ [$POLY_d[p]$] ($p$ prime and $d \ge 2$)

For all three classes:

  • There are $\gg 2^n$ functions on $n$ variables, so ℝ-linear representations are not unique: $2^{\Theta(n^2)}$ LTFs, $p^{\Theta(n^d)}$ degree-$d$ polys, $\infty$ ReLU functions

  • ℝ-linear representations have been studied!
    $\sum \circ LTF$ = Special Case of Depth-2 Threshold Circuits
    $\sum \circ ReLU$ = “Depth-2 Neural Net with ReLU activation”
    $\sum \circ POLY_d[p]$ = “Higher-Order” Fourier Analysis for $d \ge 2$

slide-5
SLIDE 5

Sums of Linear Threshold Functions

  • Def. $f : \{0,1\}^n \to \{0,1\}$ is an LTF if $\exists\, w_1, \ldots, w_n, t \in \mathbb{R}$ such that $\forall (x_1, \ldots, x_n) \in \{0,1\}^n$, $f(x_1, \ldots, x_n) = 1 \iff \sum_i w_i x_i \ge t$

Depth-Two LTF Circuits ($LTF \circ LTF$): Major problem to find “nice” functions without $n^k$-gate $LTF \circ LTF$ circuits, for all $k$

[Hajnal et al.'91] exp($n$) depth-two lower bounds for small $w_i$'s

[Roychowdhury-Orlitsky-Siu'94] What about $\sum \circ LTF$? Special case of $LTF \circ LTF$: the linear form for the output LTF must always evaluate to 0 or 1

Still, no $n^{1.5}$-gate lower bounds were known for $\sum \circ LTF$!

We prove:

Thm $\forall k$, $\exists f_k \in NP$ without $n^k$-size $\sum \circ LTF$

Thm $\exists f \in NTIME[n^{\log^* n}]$ without poly($n$)-size $\sum \circ LTF$

Note: It is a major open problem to prove $\exists f \in NP$ without $n^k$-size (unrestricted) circuits

slide-6
SLIDE 6

Sums of ReLUs

  • Def. $f : \mathbb{R}^n \to \mathbb{R}^+$ is a ReLU if $\exists\, w_1, \ldots, w_n, t \in \mathbb{R}$ such that $\forall (x_1, \ldots, x_n) \in \mathbb{R}^n$, $f(x_1, \ldots, x_n) = \max(0, \sum_i w_i x_i + t)$

$\sum \circ ReLU$ = “Depth-Two Neural Nets with ReLU Activations”. Very widely studied, thousands of references

$\sum \circ ReLU$ generalizes $\sum \circ LTF$

Several recent references [see paper] give lower bounds for some “weird” $f : \mathbb{R}^n \to \mathbb{R}$ which vary sharply / are sensitive

No lower bounds known for discrete-domain / Boolean functions (note: the “most sensitive” Boolean function PARITY has $O(n)$-size $\sum \circ LTF$)

We can generalize the $\sum \circ LTF$ limits to $\sum \circ ReLU$:

Thm $\forall k$, $\exists f_k \in NP$ without $n^k$-size $\sum \circ ReLU$

Thm $\exists f \in NTIME[n^{\log^* n}]$ without poly($n$)-size $\sum \circ ReLU$

Again: major open problem to prove $\exists f \in NP$ without $n^k$-size (unrestricted) circuits
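Two facts from this slide can be checked by brute force on small inputs. The sketch below uses standard identities that are assumptions here (the slides do not spell them out): PARITY(x) = Σ_{j=1..n} (−1)^{j+1}·[x_1+…+x_n ≥ j], and, for integer weights and threshold, [s ≥ t] = max(0, s − t + 1) − max(0, s − t). All helper names are hypothetical.

```python
from itertools import product


def ltf(weights, threshold):
    """Linear threshold function: 1 iff the weighted sum is >= threshold."""
    return lambda x: 1 if sum(w * xi for w, xi in zip(weights, x)) >= threshold else 0


def relu(weights, bias):
    """ReLU: max(0, weighted sum + bias)."""
    return lambda x: max(0, sum(w * xi for w, xi in zip(weights, x)) + bias)


def parity_as_ltf_sum(n):
    """n signed LTF terms whose sum equals PARITY on {0,1}^n."""
    return [((-1) ** (j + 1), ltf([1] * n, j)) for j in range(1, n + 1)]


def ltf_as_relu_difference(weights, threshold):
    """Two ReLU terms equal to the LTF on Boolean inputs (integer weights assumed)."""
    return [(1, relu(weights, 1 - threshold)), (-1, relu(weights, -threshold))]


def evaluate(terms, x):
    """Evaluate a linear combination given as (coefficient, function) pairs."""
    return sum(c * g(x) for c, g in terms)


def agrees_on_cube(terms, f, n):
    """Check that the linear combination equals f on every point of {0,1}^n."""
    return all(evaluate(terms, x) == f(x) for x in product((0, 1), repeat=n))
```

For example, `agrees_on_cube(parity_as_ltf_sum(5), lambda x: sum(x) % 2, 5)` checks the first identity on all 32 inputs. The second identity doubles the number of terms at most, which is one way to see that a sum of LTFs with integer weights can be rewritten as a sum of ReLUs.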

slide-7
SLIDE 7

Sums of Low-Degree GF(p)-Polys

$\sum \circ POLY_d[p]$: Linear combination of $f : \{0,1\}^n \to \{0, 1, \ldots, p-1\}$ where for every $f$ there is a degree-$d$ polynomial $q(x)$ such that $\forall x \in \{0,1\}^n$, $f(x) = q(x) \bmod p$

Known: $AND$ requires $\Omega(2^n)$-size $\sum \circ POLY_1[2]$

$AND$ has $O(2^{n/2})$-size $\sum \circ POLY_2[2]$

Compelling Conjecture [“Degree-Two Uncertainty Principle”]: $AND$ (on $n$ inputs) requires $n^{\omega(1)}$-size $\sum \circ POLY_2[2]$

No non-trivial lower bounds were known for $\sum \circ POLY_2[p]$

We prove:

Thm $\forall d, k$, $\forall p$ prime, $\exists f_k \in NP$ without $n^k$-size $\sum \circ POLY_d[p]$

Thm $\exists f \in NTIME[n^{\log^* n}]$ without poly($n$)-size $\sum \circ POLY_d[p]$, for all fixed $d$ and fixed prime $p$

The case of $d = 2$, $p = 2$ is already very interesting!

slide-8
SLIDE 8

A Key Theorem

Key Theorem: Let $\mathcal{C}$ be a class of functions $f : \{0,1\}^n \to \mathbb{R}$. Assume: there is an $\varepsilon > 0$ and an algorithm $A$ so that for any given $f_1, \ldots, f_4 \in \mathcal{C}$, $A$ can compute the “sum-product”

$$\sum_{a \in \{0,1\}^n} \prod_{i=1}^{4} f_i(a)$$

in $2^{n(1-\varepsilon)}$ time. Then: $\forall k$, $\exists f \in NP$ without $n^k$-size $\sum \circ \mathcal{C}$, and $\exists f \in NTIME[n^{\log^* n}]$ without poly($n$)-size $\sum \circ \mathcal{C}$

Applies the new Easy Witness Lemma of [Murray-W'18]

A new instance of “Circuit Analysis Algorithms ⇒ Circuit Lower Bounds”

We show how to compute sum-products in $2^{n(1-\varepsilon)}$ time for LTFs, ReLUs, and low-degree polynomials
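For reference, the quantity in the Key Theorem can always be computed by exhaustive enumeration in roughly $2^n$ steps; the theorem asks for an algorithm that beats this baseline by a factor of $2^{\varepsilon n}$. A minimal Python sketch of the baseline, with a hypothetical function name:

```python
from itertools import product
from math import prod


def sum_product_bruteforce(functions, n):
    """Compute sum over a in {0,1}^n of prod_i f_i(a) by trying every point.

    This takes about 2^n evaluations, so it does NOT meet the 2^{n(1 - eps)}
    requirement of the Key Theorem; it only pins down what must be computed.
    """
    return sum(prod(f(a) for f in functions) for a in product((0, 1), repeat=n))
```

When every $f_i$ is 0/1-valued, the sum-product counts the points on which all four functions are 1 at the same time.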

slide-9
SLIDE 9

Major Ideas in the Key Theorem

Assume:
(1) There is a $2^{n(1-\varepsilon)}$-time sum-product algorithm $A$ for $\mathcal{C}$
(2) For some fixed $k$, all $f \in NP$ have $n^k$-size $\sum \circ \mathcal{C}$

Goal: Derive a contradiction.

(1) and (2) ⇒ Given an (unrestricted) circuit $T$ with $n$ inputs and $n$ size, we can guess-and-check an $n^k$-size $\sum \circ \mathcal{C}$ computing $T$, in $2^{n(1-\varepsilon)} n^{O(1)}$ time

Note: to guess, we need that the coefficients in our linear combinations have “small” bit complexity, WLOG

(1) ⇒ Can solve Circuit-UNSAT in nondeterministic $2^{n(1-\varepsilon)} n^{O(1)}$ time

We can even solve #Circuit-SAT, because we can compute $\sum_{a \in \{0,1\}^n} (\sum \circ \mathcal{C})(a) = \sum \sum_{a} \mathcal{C}(a)$ by solving sum-product $n^k$ times

[Murray-W'18] ⇒ $\forall k$, $\exists f \in NP$ without $n^k$-size unrestricted circuits

Contradicts (2) when $\sum \circ \mathcal{C}$ can be simulated by Boolean circuits!

The proof crucially relies on $\sum \circ \mathcal{C}$ computing a circuit exactly
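The counting step uses only linearity: if a 0/1-valued function equals $\sum_j c_j g_j$ exactly, its number of satisfying assignments is $\sum_j c_j \sum_a g_j(a)$, so it is enough to sum each simple function on its own. The Python sketch below illustrates this on one assumed example, the representation PARITY(x) = Σ_{j=1..n} (−1)^{j+1}·[x_1+…+x_n ≥ j], where each inner sum has a closed form; the function names are hypothetical.

```python
from math import comb


def ones_of_threshold(n, j):
    """Number of a in {0,1}^n with a_1 + ... + a_n >= j (sum of one LTF over the cube)."""
    return sum(comb(n, s) for s in range(j, n + 1))


def count_parity_ones_via_linearity(n):
    """Count the inputs where PARITY is 1 by exchanging the two sums.

    No assignment is ever enumerated: each simple function is summed separately
    and the results are combined with the coefficients (-1)^(j+1).
    """
    return sum((-1) ** (j + 1) * ones_of_threshold(n, j) for j in range(1, n + 1))
```

The same exchange fails if the linear combination only approximates the function, which is one way to read the remark that the representation must be exact.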

slide-10
SLIDE 10

Sum-Product Algorithm for LTF

Uses (old) fact that #Subset-Sum is solvable in poly($n$) $\cdot\, 2^{n/2}$ time!

Thm [HS'76] #Subset-Sum on $n$ numbers is in poly($n$) $\cdot\, 2^{n/2}$ time

Proof Given $w_1, \ldots, w_n, t$, we want to know the number of $S \subseteq [n]$ such that $\sum_{i \in S} w_i = t$

  • 1. Enumerate all possible $2^{n/2}$ subsets $S$ of $\{w_1, \ldots, w_{n/2}\}$. Make a list $L_1$ of the $2^{n/2}$ subset sums, and SORT all sums in $L_1$

  • 2. Enumerate all possible $2^{n/2}$ subsets $T$ of $\{w_{n/2+1}, \ldots, w_n\}$. For each $T$ summing to a value $v$, BINARY SEARCH for a value $v'$ in $L_1$ such that $v + v' = t$

  • 3. To compute the total number of subsets summing to $t$: For each sum value $v'$ appearing in $L_1$, store the number $n_{v'}$ of subsets in $L_1$ which have value $v'$. Later, if value $v'$ is found in the binary search, add $n_{v'}$ to a running sum.

Takes poly($n$) $\cdot\, 2^{n/2}$ time in total
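The three steps above can be sketched in Python as follows. This is an illustrative rendering, not the slides' own code: the function names are hypothetical, and a `Counter` plays the role of the stored multiplicities $n_{v'}$.

```python
from bisect import bisect_left
from collections import Counter


def subset_sums(values):
    """Return the list of all 2^len(values) subset sums (with repetitions)."""
    sums = [0]
    for v in values:
        sums += [s + v for s in sums]
    return sums


def count_subset_sum(weights, target):
    """Count subsets of `weights` (by index) whose elements sum to `target`."""
    half = len(weights) // 2
    # Step 1: list the subset sums of the first half, with multiplicities, and sort them.
    multiplicity = Counter(subset_sums(weights[:half]))
    sorted_sums = sorted(multiplicity)
    total = 0
    # Step 2: for each subset sum v of the second half, binary search for target - v.
    for v in subset_sums(weights[half:]):
        need = target - v
        pos = bisect_left(sorted_sums, need)
        # Step 3: on a hit, add the number of first-half subsets with that sum.
        if pos < len(sorted_sums) and sorted_sums[pos] == need:
            total += multiplicity[need]
    return total
```

For instance, `count_subset_sum([1, 2, 3, 4, 5], 5)` counts the subsets {5}, {1, 4} and {2, 3}. Each half contributes about $2^{n/2}$ sums, and sorting plus one binary search per sum adds only a polynomial factor.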

slide-11
SLIDE 11

Sum-Product Algorithm for LTF

Uses (old) fact that #Subset-Sum is solvable in poly($n$) $\cdot\, 2^{n/2}$ time!

Thm For any $f_1, \ldots, f_4 \in LTF$, we can compute $\sum_{a \in \{0,1\}^n} \prod_{i=1}^{4} f_i(a)$ in poly($n$) $\cdot\, 2^{n/2}$ time.

Proof An Exact LTF ($ELTF$) has the form $g(x) = 1 \iff \sum_i w_i x_i = t$

[HP, CCC'10]: Every $LTF$ on $n$ inputs can be written as $\sum_{\mathrm{poly}(n)} ELTF$

So we can write, for $ELTF$s $g_{i,j}$:

$$\sum_{a \in \{0,1\}^n} \prod_{i=1}^{4} f_i(a) = \sum_{a \in \{0,1\}^n} \prod_{i=1}^{4} \sum_{j=1}^{\mathrm{poly}(n)} g_{i,j}(a)$$

Simple algebra:

$$= \sum_{a \in \{0,1\}^n} \sum_{\mathrm{poly}(n)} \prod_{i=1}^{4} g'_{i,j}(a) = \sum_{\mathrm{poly}(n)} \sum_{a \in \{0,1\}^n} \prod_{i=1}^{4} g'_{i,j}(a)$$

Each $\prod_{i=1}^{4} g'_{i,j}(x) = h(x)$ for some $ELTF$ $h$

#Subset-Sum in poly($n$) $\cdot\, 2^{n/2}$ time ⇒ $\sum_a h(a)$ in poly($n$) $\cdot\, 2^{n/2}$ time

Can compute in poly($n$) $\cdot\, 2^{n/2}$ time!
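The inner quantity $\sum_a \prod_i g_i(a)$ for exact threshold functions counts the points that satisfy all the equalities at once. The Python sketch below computes it by the same split-and-list idea, with one deliberate substitution that should be read as an assumption: instead of merging the four ELTFs into a single ELTF $h$ as on the slide, it keeps a vector of partial sums (one coordinate per ELTF) and uses a hash-based `Counter` in place of sorting and binary search. Function names are hypothetical.

```python
from collections import Counter
from itertools import product


def count_common_solutions(eltfs, n):
    """Count a in {0,1}^n with sum_i w_i * a_i == t for every (w, t) in `eltfs`.

    For 0/1-valued exact threshold functions this equals the sum-product
    sum_a prod_i g_i(a). Runs over about 2^{n/2} partial sums per half.
    """
    k = len(eltfs)
    targets = tuple(t for _, t in eltfs)
    # columns[i] = contribution of input i to each of the k linear forms.
    columns = [tuple(w[i] for w, _ in eltfs) for i in range(n)]

    def vector_subset_sums(cols):
        sums = [(0,) * k]
        for col in cols:
            sums += [tuple(s + c for s, c in zip(vec, col)) for vec in sums]
        return sums

    half = n // 2
    left = Counter(vector_subset_sums(columns[:half]))
    # A Counter returns 0 for vectors that never occurred in the first half.
    return sum(
        left[tuple(t - s for t, s in zip(targets, vec))]
        for vec in vector_subset_sums(columns[half:])
    )


def count_common_solutions_bruteforce(eltfs, n):
    """Reference implementation by full enumeration, for checking small cases."""
    return sum(
        all(sum(wi * ai for wi, ai in zip(w, a)) == t for w, t in eltfs)
        for a in product((0, 1), repeat=n)
    )
```

Summing such counts over the poly($n$) tuples of ELTFs produced by the expansion recovers the full sum-product for four LTFs.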

slide-12
SLIDE 12

Sum-Product Algorithm for Polys

Uses (recent) fact that counting Boolean roots of $n$-variable degree-$d$ $GF(p)$-polynomials is solvable in $2^{n(1 - 1/O(pd))}$ time [LPTWY'17]

Thm For any $f_1, \ldots, f_4 \in POLY_d[p]$, we can compute $\sum_{a \in \{0,1\}^n} \prod_{i=1}^{4} f_i(a)$ in poly($n$) $\cdot\, 2^{n(1 - 1/O(pd))}$ time.

Proof Idea Reduce the sum-product problem on four degree-$d$ polys to counting Boolean roots of $O(1)$ degree-$O(pd)$ polys

Algorithm uses a derandomized version of Razborov-Smolensky's probabilistic representation of $AC0[p]$ by low-degree $GF(p)$ polynomials, along with a divide-and-conquer approach for fast evaluation

slide-13
SLIDE 13

Open Problems

We proved fixed-polynomial lower bounds for functions in $NP$: For each $k$, there is an $f \in NTIME[n^{g(k)}]$ without $n^k$-size $\sum \circ LTF$

Can we prove $SAT$ requires $n^k$-size $\sum \circ LTF$, for all $k$? Current algorithms-to-lower bounds connections don't seem to point a way

Constant Degree Hypothesis [Barrington-Straubing-Therien'90]: For each fixed $d$, $AND$ does not have $MOD_p \circ MOD_q \circ AND_d$ circuits of $2^{o(n)}$ size. Open even when $d = 2$. Could our approaches say anything?

New ways of deriving strong lower bounds from “old” approaches

Reviewer asked: Can “split-and-list” be viewed as a lower bound method?

The (old) algorithm for #Subset-Sum splits the instance of $n$ items into two parts of size $n/2$ each, lists all $2^{n/2}$ subsums separately, sorts the two lists and binary searches for the overall subset sum.

The alg. for polys uses Razborov-Smolensky, which is used in lower bounds!

slide-14
SLIDE 14

Thank you!