KenLM: Faster and Smaller Language Model Queries

Kenneth Heafield
heafield@cs.cmu.edu
Carnegie Mellon
July 30, 2011
kheafield.com/code/kenlm

What KenLM Does
Answer language model queries using less time and memory.
  log p(<s> → iran) = -3.33437
  log p(<s> iran → is) = -1.05931
  log p(<s> iran is → one) = -1.80743
  log p(<s> iran is one → of) = -0.03705
  log p(iran is one of → the) = -0.08317
  log p(is one of the → few) = -1.20788

Related Work
Downloadable Baselines
  SRI        Popular and considered fast but high-memory
  IRST       Open source, low-memory, single-threaded
  Rand       Low-memory lossy compression
  MIT        Mostly estimates models but also does queries
Papers Without Code
  TPT        Better memory locality
  Sheffield  Lossy compression techniques
After KenLM’s Public Release
  Berkeley   Java; slower and larger than KenLM

Why I Wrote KenLM
Decoding takes too long
  Answer queries quickly
  Load quickly with memory mapping
  Thread-safe
Bigger models
  Conserve memory
SRI doesn’t compile
  Distribute and compile with decoders

Outline
1 Backoff Models
    State
2 Data Structures
    Probing
    Trie
    Chop
3 Results
    Perplexity
    Translation

Example Language Model
Unigrams
  Words        log p  Back
  <s>          -∞     -2.0
  iran         -4.1   -0.8
  is           -2.5   -1.4
  one          -3.3   -0.9
  of           -2.5   -1.1
Bigrams
  Words        log p  Back
  <s> iran     -3.3   -1.2
  iran is      -1.7   -0.4
  is one       -2.0   -0.9
  one of       -1.4   -0.6
Trigrams
  Words        log p
  <s> iran is  -1.1
  iran is one  -2.0
  is one of    -0.3

Example Queries
Model: the example language model above (unigrams, bigrams, trigrams).
Query: <s> iran is
  log p(<s> iran → is) = -1.1
Query: iran is of
  log p(of)              -2.5
  Backoff(is)            -1.4
  Backoff(iran is)     + -0.4
  log p(iran is → of)  = -4.3
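
The two queries above can be reproduced with a few lines of Python. This is a minimal sketch of backoff scoring over the example model, not KenLM's implementation; the names `MODEL` and `score` are made up for illustration, and trigrams are given a backoff of 0.0 because the table lists none for them.

```python
# Example model from the slides: n-gram -> (log p, backoff).
# Assumption: trigrams have no backoff column, so 0.0 is stored for them.
MODEL = {
    ("<s>",): (float("-inf"), -2.0),
    ("iran",): (-4.1, -0.8),
    ("is",): (-2.5, -1.4),
    ("one",): (-3.3, -0.9),
    ("of",): (-2.5, -1.1),
    ("<s>", "iran"): (-3.3, -1.2),
    ("iran", "is"): (-1.7, -0.4),
    ("is", "one"): (-2.0, -0.9),
    ("one", "of"): (-1.4, -0.6),
    ("<s>", "iran", "is"): (-1.1, 0.0),
    ("iran", "is", "one"): (-2.0, 0.0),
    ("is", "one", "of"): (-0.3, 0.0),
}


def score(context, word):
    """Return log p(context -> word), backing off to shorter contexts."""
    context = tuple(context)
    # Find the longest n-gram ending in `word` that the model contains.
    for start in range(len(context) + 1):
        ngram = context[start:] + (word,)
        if ngram in MODEL:
            break
    else:
        raise KeyError(f"unknown word: {word}")
    total = MODEL[ngram][0]
    # Charge the backoff of every context longer than the one that matched.
    for skipped in range(start):
        total += MODEL.get(context[skipped:], (0.0, 0.0))[1]
    return total


print(score(("<s>", "iran"), "is"))          # trigram found directly
print(round(score(("iran", "is"), "of"), 1))  # log p(of) + both backoffs
```

For `iran is of`, neither `iran is of` nor `is of` is in the model, so the sketch falls back to log p(of) and adds Backoff(iran is) and Backoff(is), matching the sum on the slide.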

Lookups Performed by Queries
Query: <s> iran is
  Lookup 1: is
  Lookup 2: iran is
  Lookup 3: <s> iran is
  Score: log p(<s> iran → is) = -1.1
Query: iran is of
  Lookup 1: of
  Lookup 2: is of (not found)
  Lookup 3: is
  Lookup 4: iran is
  Score:
    log p(of)              -2.5
    Backoff(is)            -1.4
    Backoff(iran is)     + -0.4
    log p(iran is → of)  = -4.3
State: Backoff(is), Backoff(iran is)

Stateful Query Pattern
State: Backoff(<s>)
  log p(<s> → iran) = -3.3
State: Backoff(iran), Backoff(<s> iran)
  log p(<s> iran → is) = -1.1
State: Backoff(is), Backoff(iran is)
  log p(iran is → one) = -2.0
State: Backoff(one), Backoff(is one)
  log p(is one → of) = -0.3
State: Backoff(of), Backoff(one of)
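
The pattern above can be sketched in Python as a scoring function that takes a state and returns the next one. This is an illustrative sketch under assumptions, not KenLM's API: the state layout (context words plus their backoffs), `ORDER`, `begin_sentence_state`, and `score_with_state` are all hypothetical names, and trigram backoffs are stored as 0.0.

```python
ORDER = 3  # the example model is a trigram model

# n-gram -> (log p, backoff); trigram backoffs are an assumed 0.0.
MODEL = {
    ("<s>",): (float("-inf"), -2.0), ("iran",): (-4.1, -0.8),
    ("is",): (-2.5, -1.4), ("one",): (-3.3, -0.9), ("of",): (-2.5, -1.1),
    ("<s>", "iran"): (-3.3, -1.2), ("iran", "is"): (-1.7, -0.4),
    ("is", "one"): (-2.0, -0.9), ("one", "of"): (-1.4, -0.6),
    ("<s>", "iran", "is"): (-1.1, 0.0), ("iran", "is", "one"): (-2.0, 0.0),
    ("is", "one", "of"): (-0.3, 0.0),
}


def begin_sentence_state():
    # State = (context words, backoffs); backoffs[i] belongs to the
    # context made of the last i + 1 words.
    return (("<s>",), (MODEL[("<s>",)][1],))


def score_with_state(state, word):
    """Return (log p, next state) for appending `word` to `state`."""
    words, backoffs = state
    for start in range(len(words) + 1):
        ngram = words[start:] + (word,)
        if ngram in MODEL:
            break
    else:
        raise KeyError(f"unknown word: {word}")
    logp = MODEL[ngram][0]
    # Backoffs come from the incoming state, so no extra lookups are needed.
    for skipped in range(start):
        logp += backoffs[len(words) - skipped - 1]
    # Next state: at most ORDER - 1 words of the matched n-gram, each
    # suffix paired with its backoff.
    new_words = ngram[-(ORDER - 1):]
    kept = []
    for n in range(1, len(new_words) + 1):
        entry = MODEL.get(new_words[-n:])
        if entry is None:
            break
        kept.append(entry[1])
    new_words = new_words[len(new_words) - len(kept):]
    return logp, (new_words, tuple(kept))


state = begin_sentence_state()
scores = []
for w in ["iran", "is", "one", "of"]:
    logp, state = score_with_state(state, w)
    scores.append(logp)
print(scores)
print(state)
```

Each call consumes the backoffs stored by the previous call, which is why the second query on the earlier slide would not need to look up `is` and `iran is` again.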

Data Structures
  Probing  Fast. Uses hash tables.
  Trie     Small. Uses sorted arrays.
  Chop     Smaller. Trie with compressed pointers.
Key Subproblem
  Sparse lookup: efficiently retrieve values for sparse keys

Sparse Lookup Speed
[Figure: Lookups/µs (axis ticks 1, 10, 100) versus Entries (axis ticks 10 to 10^7). Series: probing, hash set, unordered, interpolation, binary search, set.]
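
Two of the series in the figure, interpolation and binary search, are searches over a sorted array. As a rough illustration of the former, here is a textbook interpolation search, assuming sorted, distinct integer keys; it is a generic sketch, not the code that was benchmarked.

```python
def interpolation_search(keys, target):
    """Return the index of `target` in sorted `keys`, or -1 if absent."""
    lo, hi = 0, len(keys) - 1
    while lo <= hi and keys[lo] <= target <= keys[hi]:
        if keys[hi] == keys[lo]:
            return lo if keys[lo] == target else -1
        # Guess the position by assuming keys are spread evenly
        # between keys[lo] and keys[hi].
        pos = lo + (target - keys[lo]) * (hi - lo) // (keys[hi] - keys[lo])
        if keys[pos] == target:
            return pos
        if keys[pos] < target:
            lo = pos + 1
        else:
            hi = pos - 1
    return -1


keys = [2, 3, 5, 7, 11, 13, 17]
print(interpolation_search(keys, 11))
print(interpolation_search(keys, 4))
```

Binary search always probes the midpoint; interpolation search uses the key's value to guess where it sits, which pays off when keys are close to uniformly distributed.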

Linear Probing Hash Table
Store 64-bit hashes and ignore collisions.
Bigrams
  Words     Hash                log p  Back
  <s> iran  0xf0ae9c2442c6920e  -3.3   -1.2
  iran is   0x959e48455f4a2e90  -1.7   -0.4
  is one    0x186a7caef34acf16  -2.0   -0.9
  one of    0xac66610314db8dac  -1.4   -0.6

Linear Probing Hash Table
1.5 buckets/entry (so buckets = 6).
Ideal bucket = hash mod buckets.
Resolve bucket collisions using the next free bucket.
Bigrams (array)
  Words     Ideal  Hash                log p  Back
  iran is   0      0x959e48455f4a2e90  -1.7   -0.4
  (empty)          0x0                  0      0
  is one    2      0x186a7caef34acf16  -2.0   -0.9
  one of    2      0xac66610314db8dac  -1.4   -0.6
  <s> iran  4      0xf0ae9c2442c6920e  -3.3   -1.2
  (empty)          0x0                  0      0
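
The array above can be rebuilt with a short linear probing sketch in Python, using the four hashes from the slide. The function names, the insertion order, the wrap-around at the end of the array, and the use of hash 0 as the empty marker are assumptions made for illustration; this is not KenLM's code.

```python
EMPTY = 0  # assumption: a stored hash of 0 marks an empty bucket


def build_table(entries, multiplier=1.5):
    """Place (hash, log p, backoff) entries using linear probing."""
    # Keep at least one empty bucket so that failed lookups terminate.
    buckets = max(int(len(entries) * multiplier), len(entries) + 1)
    table = [(EMPTY, 0.0, 0.0)] * buckets
    for h, logp, backoff in entries:
        i = h % buckets                   # ideal bucket
        while table[i][0] != EMPTY:       # collision: try the next bucket
            i = (i + 1) % buckets
        table[i] = (h, logp, backoff)
    return table


def find(table, h):
    """Return (log p, backoff) for hash `h`, or None if it is absent."""
    i = h % len(table)
    while table[i][0] != EMPTY:
        if table[i][0] == h:
            return table[i][1:]
        i = (i + 1) % len(table)
    return None


BIGRAMS = [
    (0xF0AE9C2442C6920E, -3.3, -1.2),  # <s> iran
    (0x959E48455F4A2E90, -1.7, -0.4),  # iran is
    (0x186A7CAEF34ACF16, -2.0, -0.9),  # is one
    (0xAC66610314DB8DAC, -1.4, -0.6),  # one of
]
table = build_table(BIGRAMS)
for bucket, (h, logp, backoff) in enumerate(table):
    print(bucket, hex(h), logp, backoff)
print(find(table, 0xAC66610314DB8DAC))
```

Both `is one` and `one of` hash to ideal bucket 2, so the second one inserted lands in bucket 3. Only the hash is compared on lookup, which is what "ignore collisions" means here: two different n-grams with the same 64-bit hash would be treated as the same entry.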

Probing Data Structure
The example language model is stored per n-gram order:
  Unigrams  Array
  Bigrams   Probing Hash Table
  Trigrams  Probing Hash Table

Probing Hash Table Summary
Hash tables are fast.
But memory is 24 bytes/entry.
Next: Saving memory with Trie.
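
The 24 bytes/entry figure is consistent with the earlier slides if each bucket holds a 64-bit hash plus log p and backoff as 32-bit floats. The float width is an assumption here; the slides state only the 64-bit hash and the 1.5 buckets/entry. A quick check of the arithmetic:

```python
import struct

# One bucket: 64-bit hash (Q) + log p (f) + backoff (f).
# Assumption: both values are stored as 32-bit floats.
BUCKET_BYTES = struct.calcsize("<Qff")
BUCKETS_PER_ENTRY = 1.5  # from the linear probing slide

bytes_per_entry = BUCKET_BYTES * BUCKETS_PER_ENTRY
print(BUCKET_BYTES, bytes_per_entry)
```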

Trie Uses Sorted Arrays
Sort in suffix order.
Unigrams
  Words        log p  Back
  <s>          -∞     -2.0
  iran         -4.1   -0.8
  is           -2.5   -1.4
  one          -3.3   -0.9
  of           -2.5   -1.1
Bigrams
  Words        log p  Back
  <s> iran     -3.3   -1.2
  iran is      -1.7   -0.4
  one is       -2.3   -0.3
  <s> one      -2.3   -1.1
  is one       -2.0   -0.9
  one of       -1.4   -0.6
Trigrams
  Words        log p
  <s> iran is  -1.1
  <s> one is   -2.3
  iran is one  -2.0
  <s> one of   -0.5
  is one of    -0.3

Trie
Sort in suffix order. Encode suffix using pointers.
Unigrams (array)
  Words        log p  Back  Ptr
  <s>          -∞     -2.0  0
  iran         -4.1   -0.8  0
  is           -2.5   -1.4  1
  one          -3.3   -0.9  4
  of           -2.5   -1.1  6
  (end)                     7
Bigrams (array)
  Words        log p  Back  Ptr
  <s> iran     -3.3   -1.2  0
  <s> is       -2.9   -1.0  0
  iran is      -1.7   -0.4  0
  one is       -2.3   -0.3  1
  <s> one      -2.3   -1.1  2
  is one       -2.0   -0.9  2
  one of       -1.4   -0.6  3
  (end)                     5
Trigrams (array)
  Words        log p
  <s> iran is  -1.1
  <s> one is   -2.3
  iran is one  -2.0
  <s> one of   -0.5
  is one of    -0.3
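
Reading the pointers in the table: a record's Ptr and the next record's Ptr delimit the range of longer n-grams that end with it. For example, `is` has Ptr 1 and `one` has Ptr 4, so bigrams 1 to 3 are the ones ending in `is`. A lookup therefore starts from the last word and walks backwards. The Python sketch below encodes the table this way; the row layout, word ids, and function names are assumptions for illustration and do not reflect KenLM's bit-packed format.

```python
from bisect import bisect_left

VOCAB = {"<s>": 0, "iran": 1, "is": 2, "one": 3, "of": 4}
NEG_INF = float("-inf")

# Unigram row i (i = word id): (log p, backoff, ptr). Last row is the end marker.
UNIGRAMS = [(NEG_INF, -2.0, 0), (-4.1, -0.8, 0), (-2.5, -1.4, 1),
            (-3.3, -0.9, 4), (-2.5, -1.1, 6), (None, None, 7)]
# Bigram row: (id of the preceding word, log p, backoff, ptr).
BIGRAMS = [(0, -3.3, -1.2, 0),   # <s> iran
           (0, -2.9, -1.0, 0),   # <s> is
           (1, -1.7, -0.4, 0),   # iran is
           (3, -2.3, -0.3, 1),   # one is
           (0, -2.3, -1.1, 2),   # <s> one
           (2, -2.0, -0.9, 2),   # is one
           (3, -1.4, -0.6, 3),   # one of
           (None, None, None, 5)]
# Trigram row: (id of the first word, log p).
TRIGRAMS = [(0, -1.1), (0, -2.3), (1, -2.0), (0, -0.5), (2, -0.3)]


def find_child(rows, begin, end, word_id):
    """Binary search rows[begin:end] for `word_id`; return its index or None."""
    ids = [rows[i][0] for i in range(begin, end)]
    k = bisect_left(ids, word_id)
    return begin + k if k < len(ids) and ids[k] == word_id else None


def lookup(ngram):
    """Return (log p, backoff) for a 1- to 3-word n-gram, or None if absent."""
    ids = [VOCAB[w] for w in ngram]
    last = ids[-1]
    if len(ids) == 1:
        return UNIGRAMS[last][:2]
    b = find_child(BIGRAMS, UNIGRAMS[last][2], UNIGRAMS[last + 1][2], ids[-2])
    if b is None:
        return None
    if len(ids) == 2:
        return BIGRAMS[b][1:3]
    t = find_child(TRIGRAMS, BIGRAMS[b][3], BIGRAMS[b + 1][3], ids[-3])
    return None if t is None else (TRIGRAMS[t][1], None)


print(lookup(["is", "one", "of"]))
print(lookup(["<s>", "is"]))
print(lookup(["is", "of"]))
```

Within each range the rows are sorted by the preceding word, so a sorted-array search is enough and no per-node hash table is stored.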
