Categorical Feature Compression via Submodular Optimization

Mohammad Hossein Bateni, Lin Chen, Hossein Esfandiari, Thomas Fu, Vahab Mirrokni, and Afshin Rostamizadeh. Pacific Ballroom #142.


  1. Categorical Feature Compression via Submodular Optimization. Mohammad Hossein Bateni, Lin Chen, Hossein Esfandiari, Thomas Fu, Vahab Mirrokni, and Afshin Rostamizadeh. Pacific Ballroom #142.

  2. Why Vocabulary Compression?

  3. Why Vocabulary Compression? The embedding layer is huge! Video ID: ~7 billion values; the embedding layer is 99.9% of the neural net.

  4. How to Compress Vocabulary?

  5. How to Compress Vocabulary?
     Group similar feature values into one: U.S., Canada → U.S./Canada; China, Japan, Korea → Chn/Jpn/Kor.
     Good compression preserves most information of labels (supervised).

  6. Problem Formulation

  7. Problem Formulation
     Max I(f(X); C) s.t. f(X) can take at most m values.
     Random variable X ∈ {Afghanistan, Albania, …, Zimbabwe} (feature).
     Random variable C ∈ {pear, apple, …, mango} (label: favorite fruit).
     Compressed feature f(X) ∈ {China/Japan/Korea, Brazil/Argentina, U.S./Canada}.
     Example rows (User ID: feature → compressed feature):
     #1843: China → China/Japan/Korea
     #429: Japan → China/Japan/Korea
     ...
     #9077: Brazil → Brazil/Argentina

  8. Our Results

  9. Our Results
     Objective: Max I(f(X); C) s.t. f(X) can take at most m values.
     There is a quasi-linear (O(n log n)) algorithm that achieves 63% of f(OPT) if the label is binary.
     ● Design a new submodular function after re-parametrization.
     There is a log(n)-round distributed algorithm that achieves 63% of f(OPT) with O(n/k) space per machine.
     ● k is the number of machines.

  10. Reparametrization for Submodularity
      ● Sort feature values x according to P(X=x|C=0).
      ● A problem of placing separators.
      ● I(f(X); C) is a function of the set of separators.

  11. Experiment Results

  12. See you this evening at Pacific Ballroom #142.
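
The objective on the Problem Formulation slide, Max I(f(X); C) with f(X) taking at most m values, can be illustrated with a minimal Python sketch. The toy rows, the fruit labels attached to them, and the grouping `f` are hypothetical, and mutual information is estimated from empirical counts, which is an assumption the slides do not spell out.

```python
import math
from collections import Counter

def mutual_information(pairs):
    """Empirical I(A; B) in bits from a list of (a, b) samples."""
    n = len(pairs)
    joint = Counter(pairs)
    count_a = Counter(a for a, _ in pairs)
    count_b = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), c in joint.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (count_a[a] * count_b[b]))
    return mi

# Hypothetical toy data: (country X, favorite fruit C)
data = [
    ("China", "pear"), ("Japan", "pear"), ("Korea", "pear"), ("China", "apple"),
    ("Brazil", "mango"), ("Argentina", "mango"), ("Brazil", "pear"),
    ("U.S.", "apple"), ("Canada", "apple"), ("Canada", "mango"),
]

# Hypothetical compression f with m = 3 output values
f = {
    "China": "China/Japan/Korea", "Japan": "China/Japan/Korea",
    "Korea": "China/Japan/Korea",
    "Brazil": "Brazil/Argentina", "Argentina": "Brazil/Argentina",
    "U.S.": "U.S./Canada", "Canada": "U.S./Canada",
}

mi_raw = mutual_information(data)                                   # I(X; C)
mi_compressed = mutual_information([(f[x], c) for x, c in data])    # I(f(X); C)
print(f"I(X;C) = {mi_raw:.3f} bits, I(f(X);C) = {mi_compressed:.3f} bits")
```

Because f is a deterministic function of X, the compressed value can never exceed the raw one; a good grouping keeps the gap small.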
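
The 63% figure on the Our Results slide is numerically 1 − 1/e ≈ 0.632, the guarantee familiar from greedy maximization of a monotone submodular function under a cardinality constraint. The sketch below shows that generic greedy on a hypothetical coverage function; it is not the quasi-linear or distributed algorithm from the slides, and the names `greedy_max`, `coverage`, and `sets` are made up for illustration.

```python
import math

def greedy_max(ground, value, budget):
    """Pick up to `budget` elements, each time adding the largest marginal gain."""
    chosen = []
    for _ in range(budget):
        base = value(chosen)
        best, best_gain = None, 0.0
        for e in ground:
            if e in chosen:
                continue
            gain = value(chosen + [e]) - base
            if gain > best_gain:
                best, best_gain = e, gain
        if best is None:  # no element improves the objective
            break
        chosen.append(best)
    return chosen

# Hypothetical monotone submodular function: number of items covered
sets = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}, "d": {1, 7}}

def coverage(keys):
    covered = set()
    for k in keys:
        covered |= sets[k]
    return len(covered)

picked = greedy_max(list(sets), coverage, budget=2)
guarantee = 1 - 1 / math.e  # fraction of the optimum guaranteed by greedy
print(picked, coverage(picked), round(guarantee, 3))
```

This plain version re-evaluates the objective for every candidate in every round, so it is far from quasi-linear; it only illustrates where a 63% bound of this kind comes from.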
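
The three steps on the Reparametrization for Submodularity slide can be sketched as follows: sort the feature values, treat a compression as a set of separators in the sorted order, and evaluate I(f(X); C) for that set. The toy counts and the name `mi_of_separators` are hypothetical, the sort key follows the slide's wording (empirical P(X=x|C=0)), and the label is assumed binary as on the Our Results slide.

```python
import math
from collections import Counter

# Hypothetical toy samples: (feature value x, binary label c)
samples = (
    [("A", 0)] * 8 + [("A", 1)] * 2 + [("B", 0)] * 5 + [("B", 1)] * 5
    + [("C", 0)] * 1 + [("C", 1)] * 9 + [("D", 0)] * 6 + [("D", 1)] * 4
)
count = Counter(samples)
n = len(samples)
n0 = sum(v for (_, c), v in count.items() if c == 0)

# Step 1: sort feature values by the empirical P(X=x | C=0)
values = sorted({x for x, _ in samples}, key=lambda x: count[(x, 0)] / n0)

def entropy(counts):
    """Entropy in bits of the distribution given by nonnegative counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

def mi_of_separators(separators):
    """I(f(X); C) in bits. Separator i (1 <= i < len(values)) cuts the sorted
    order between values[i-1] and values[i]; each segment is one group."""
    cuts = [0] + sorted(separators) + [len(values)]
    label_totals = [0, 0]
    conditional = 0.0
    for lo, hi in zip(cuts, cuts[1:]):
        group = [sum(count[(x, c)] for x in values[lo:hi]) for c in (0, 1)]
        label_totals[0] += group[0]
        label_totals[1] += group[1]
        conditional += sum(group) / n * entropy(group)  # P(group) * H(C | group)
    return entropy(label_totals) - conditional          # H(C) - H(C | f(X))

print(values)
for seps in [set(), {2}, {1, 2, 3}]:
    print(sorted(seps), round(mi_of_separators(seps), 4))
```

With no separators every value falls in one group and the mutual information is zero; adding separators refines the grouping and can only keep or increase it, with m − 1 separators giving m compressed values.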
