SCALABLE ORDINAL EMBEDDING TO MODEL USER BEHAVIOR
JESSE ANDERTON ADVISOR: JAVED ASLAM COMMITTEE MEMBERS: FERNANDO DIAZ, DAVID SMITH, BYRON WALLACE
PAIRWISE CITY DISTANCES (miles)

            Boston   NYC    Seattle   SF
Boston        –      190     2,485   2,692
NYC           –       –      2,401   2,565
Seattle       –       –        –       679
SF            –       –        –        –
PAIRWISE DISTANCES, RANKED (1st = closest pair)

            Boston   NYC   Seattle   SF
Boston        –      1st     4th     6th
NYC           –       –      3rd     5th
Seattle       –       –       –      2nd
SF            –       –       –       –
PER-ANCHOR DISTANCE RANKINGS (each row ranks the other cities by distance from its anchor)

Anchor     Boston   NYC   Seattle   SF
Boston       –      1st     2nd     3rd
NYC         1st      –      2nd     3rd
Seattle     3rd     2nd      –      1st
SF          3rd     2nd     1st      –
[Figure: an embedding of Boston, NYC, Seattle, and SF that satisfies every per-anchor ranking above. Perfect?]
PER-ANCHOR RANKINGS WITH DALLAS ADDED

Anchor     Boston   NYC   Seattle   SF    Dallas
Boston       –      1st     3rd     4th    2nd
NYC         1st      –      3rd     4th    2nd
Seattle     4th     3rd      –      1st    2nd
SF          4th     3rd     1st      –     2nd
Dallas      3rd     1st     4th     2nd     –

[Figure: the same four-city embedding, with Dallas added. Perfect? No!]
WHAT IS ORDINAL EMBEDDING?
ASSIGNING ORDER-PRESERVING POSITIONS
▸ An embedding positions a set of objects within some vector space (like ℝᵈ) to satisfy some objective.
▸ An ordinal embedding focuses on satisfying some given ordering constraints.
▸ Constraints can be expressed as triples like:
“Boston is closer to New York City than to Seattle” “The Matrix is more like Star Wars than it is like La La Land” “People who like steak tend to prefer chicken over tofu”
EVALUATING ORDINAL EMBEDDING
EVALUATE BY RANK CORRELATION
GIVEN RANKINGS

Anchor     Boston   NYC   Seattle   SF
Boston       –      1st     2nd     3rd
NYC         1st      –      2nd     3rd
Seattle     3rd     2nd      –      1st
SF          3rd     2nd     1st      –

EMBEDDING’S RANKINGS

Anchor     Boston   NYC   Seattle   SF
Boston       –      1st     3rd     2nd
NYC         1st      –      2nd     3rd
Seattle     1st     2nd      –      3rd
SF          3rd     1st     2nd      –
Mean Kendall’s τ – mean rank correlation across anchors.
Mean τAP – mean top-heavy rank correlation across anchors.
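A minimal sketch of this evaluation in Python, assuming true and embedded distance matrices D_true and D_emb and using SciPy’s kendalltau (a top-heavy τAP variant would replace the plain τ here; SciPy does not ship one):

```python
import numpy as np
from scipy.stats import kendalltau

def mean_kendall_tau(D_true, D_emb):
    """Mean Kendall's tau across anchors: for each anchor, rank-correlate
    its true distances to all other objects against its embedded ones."""
    n = D_true.shape[0]
    taus = []
    for a in range(n):
        others = [i for i in range(n) if i != a]  # every object except the anchor
        tau, _ = kendalltau(D_true[a, others], D_emb[a, others])
        taus.append(tau)
    return float(np.mean(taus))
```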
WHY USE ORDINAL EMBEDDING?
HUMAN-BASED PREFERENCE/SIMILARITY
▸ It is easier for assessors to say “The Matrix is more like Star Wars than it is like La La Land.”
▸ The focus on lab studies/crowdsourcing has limited research interest in scalability.
▸ Limited scalability prohibits a focus on similarity expressed through logged user behavior.
[Figure: ordinal embedding of faces – Tamuz et al., ICML 2011. [3]]
ROAD MAP: MY PROPOSED WORK
IMPROVE ORDINAL EMBEDDING TECHNIQUES FOR TEXT SIMILARITY APPLICATIONS
▸ Active Learning – Which triples should we collect?
▸ Embedding – How can we embed accurately, at scale?
▸ Contextual Embeddings – Can we make embeddings that adapt to context?
ACTIVE LEARNING: SIMPLE METHODS
HOW MANY COMPARISONS TO LEARN ALL RANKINGS?
▸ O(n³) total triples (with n total objects).
▸ O(n² log n) triples to get all rankings.
▸ O(d n log n) triples if a perfect embedding exists in ℝᵈ (we think).
▸ On a limited budget, we want to adaptively pick the next triples to improve the embedding the most.
“a IS MORE LIKE b THAN LIKE c” ⇒ 𝜀ab < 𝜀ac ⇒ TRIPLE (a, b, c)
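As a concrete sketch (assuming ground-truth coordinates X, one row per object), the full set of triples implied by a distance matrix can be enumerated directly, which also makes the O(n³) count above visible:

```python
import numpy as np
from itertools import permutations

def triples_from_points(X):
    """Enumerate every triple (a, b, c) with dist(a, b) < dist(a, c),
    i.e. "a is more like b than c". There are O(n^3) of them."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise distances
    return [(a, b, c)
            for a, b, c in permutations(range(len(X)), 3)
            if D[a, b] < D[a, c]]
```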
RELATED WORK
ACTIVE LEARNING: RELATED WORK
ICML 2011: “ADAPTIVELY LEARNING THE CROWD KERNEL” [T,B,S,K] [3]
▸ By “kernel” they mean “embedding.”
▸ Assumes that assessors disagree more when similar distances are compared.
▸ They pick triples that (approximately) maximize expected information gain.
▸ The model uses an intermediate embedding to find triples where (a,b,c) and (a,c,b) are both likely.
Pr((a,b,c) | X) = (λ + δ²ac(X)) / (2λ + δ²ab(X) + δ²ac(X))

EXAMPLE (λ = 0.5):

δab(X)   δac(X)   Pr((a,b,c) | X)
1        2        0.75
2        1        0.25
1.4      1.5      0.53
1.5      1.5      0.50
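A minimal sketch of the CK response model (assuming an embedding matrix X with one row per object; λ = 0.5 reproduces the example table above):

```python
import numpy as np

def ck_triple_prob(X, a, b, c, lam=0.5):
    """Probability that an assessor answers "a is more like b than c",
    given the current embedding X, under the Crowd Kernel model."""
    d_ab2 = np.sum((X[a] - X[b]) ** 2)  # squared distance delta^2_ab(X)
    d_ac2 = np.sum((X[a] - X[c]) ** 2)  # squared distance delta^2_ac(X)
    return (lam + d_ac2) / (2 * lam + d_ab2 + d_ac2)
```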
ACTIVE LEARNING: RELATED WORK
SCORE CARD: CROWD KERNEL
After a year trying to use this tool, I decided to write a thesis on better tools.
                  CK
Active Learning   🥊   Good for small budgets
Scale             🥊   Hundreds
Dimension         🥊   <10
Accuracy          🥊   Medium
Speed             🐍   Prohibitively slow
MY METHOD
ACTIVE LEARNING: FRFT ADAPTIVE SORT
FARTHEST-RANK-FIRST TRAVERSAL ADAPTIVE SORT [8]
▸ Guess each new anchor’s ranking from previously sorted rankings; verify by inserting each point near its guessed position (boundary).
▸ Costs O(n) triples per ranking when the guess is good, O(n log n) if guess was bad.
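This is not the full FRFT algorithm, only a sketch of its sorting primitive: binary-insertion sort driven by an ordinal oracle, where the assumed helper closer(x, y) stands in for the answer to the triple query (anchor, x, y). It needs O(n log n) queries in the worst case:

```python
def sort_by_anchor(items, closer):
    """Sort items by distance to an implicit anchor using only ordinal
    queries: closer(x, y) is True iff x is closer to the anchor than y."""
    ranking = []
    for x in items:
        lo, hi = 0, len(ranking)
        while lo < hi:                     # binary search for x's slot
            mid = (lo + hi) // 2
            if closer(ranking[mid], x):
                lo = mid + 1
            else:
                hi = mid
        ranking.insert(lo, x)
    return ranking
```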
ACTIVE LEARNING: FRFT ADAPTIVE SORT
EMPIRICAL COMPARISON
▸ FRFT Ranking – my algorithm, using rankings from features – O(n) triples per ranking.
▸ FRFT Adaptive Sort – my algorithm, using no prior knowledge – O(n log n), then O(n).
▸ Crowd Kernel – active learning baseline.
▸ Random Tails – random baseline.
▸ kNN – gradually add the next nearest neighbor for each object.
▸ Landmarks – gradually add objects to all rankings.
[Figure: mean τAP vs. number of comparisons (×10⁴) for each method on 3-D GMM data. [8]]
τAP IS A TOP-HEAVY RANK CORRELATION MEASURE
ACTIVE LEARNING: FRFT ADAPTIVE SORT
SCORE CARD: FRFT ADAPTIVE SORT
                  CK   AS
Active Learning   🥊   🥉   Approaches lower bound
Scale             🥊   🥉   10,000’s
Dimension         🥊   <10
Accuracy          🥊   🥉   Very good
Speed             🐍   🐈   Medium
Active learning beats CK, but we still have work to do.
PROPOSED WORK
ACTIVE LEARNING: CAN WE DO BETTER?
CAN WE DO BETTER?
▸ Empirically, FRFT Adaptive Sort approaches the lower bound [4] of Ω(d n log n).
▸ The intermediate embedding step is slow and error-prone.
▸ When our guess is already correct, we still waste (?) triples to confirm it.
▸ I believe we can avoid the embedding step and reduce redundancy using the geometry implied by the triples.
ACTIVE LEARNING: WHAT DO TRIPLES TELL US?
THE THREE VIEWS OF A “TRIPLE CONSTRAINT”
a IS MORE LIKE b THAN c: (a,b,c)
⇒ 𝜀ab < 𝜀ac, which can be viewed three ways:
▸ a IS INSIDE A HALF-SPACE – a lies on b’s side of the hyperplane bisecting b and c.
▸ b IS INSIDE A SPHERE – the sphere centered at a with radius 𝜀ac.
▸ c IS OUTSIDE A SPHERE – the sphere centered at a with radius 𝜀ab.
ACTIVE LEARNING: WHAT DO TRIPLES TELL US?
COMBINING TRIPLE CONSTRAINTS
𝜀ab < 𝜀ac < 𝜀ad

c, d ARE OUTSIDE A SPHERE (centered at a, radius 𝜀ab) ∧ b, c ARE INSIDE A SPHERE (centered at a, radius 𝜀ad) ⇒ c IS INSIDE A SPHERICAL SHELL
ACTIVE LEARNING: WHAT DO TRIPLES TELL US?
COMBINING SPHERICAL SHELLS
[Figure: shell intersections in ℝ² – two shells, and three shells.]
ACTIVE LEARNING: WHAT DO TRIPLES TELL US?
PARTIAL ORDERING ON VECTOR PROJECTIONS
[Figure: inferring order in the blue-ball intersection (p, r′, s′, t′, q) vs. near the blue-ball intersection (p, q, r′, s′, t′).]
ACTIVE LEARNING: PROPOSED METHOD
GUESSING ORDER WITH LINE PROJECTION
▸ Line projection preserves approximate order. [6]
▸ Rankings for a pair of points give a partial order of projections onto their connecting line.
▸ Idea: don’t waste time on an intermediate embedding; guess order by majority vote of partial orders!
ACTIVE LEARNING: PROPOSED METHOD
GUESSING ORDER WITH LINE PROJECTION
TWO RANKINGS

Point   NN   Maj. Vote
s       t    u (1/1)
t       s    u (1/1)
u       t    t (1/1)

THREE RANKINGS

Point   NN   Maj. Vote
s       t    t (2/3)
t       s    u (2/3)
u       t    t (2/3)
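A toy sketch of the majority-vote idea: each available ranking votes on the relative order of every pair it contains, and the net tally decides the guessed order (consensus_order is an illustrative helper, not the thesis algorithm):

```python
from collections import Counter

def consensus_order(partial_orders):
    """Combine partial orders by majority vote on pairwise precedence."""
    votes = Counter()                      # votes[(a, b)]: net votes for "a before b"
    items = set()
    for order in partial_orders:
        items.update(order)
        for i, a in enumerate(order):
            for b in order[i + 1:]:        # a precedes b in this voter's order
                votes[(a, b)] += 1
                votes[(b, a)] -= 1
    # rank items by how many pairwise contests they win outright
    wins = {x: sum(votes[(x, y)] > 0 for y in items if y != x) for x in items}
    return sorted(items, key=lambda x: -wins[x])
```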
ROAD MAP: MY PROPOSED WORK
IMPROVE ORDINAL EMBEDDING TECHNIQUES FOR TEXT SIMILARITY APPLICATIONS
▸ Active Learning – Which triples should we collect?
▸ Embedding – How can we embed accurately, at scale?
▸ Contextual Embeddings – Can we make embeddings that adapt to context?
EMBEDDING: RELATED WORK
FROM TRIPLES TO EMBEDDINGS
▸ Given a set of triples and a target space ℝᵈ, how can we find an embedding?
▸ It is a hard, non-convex optimization problem.
▸ No known algorithm handles large, high-dimensional datasets.
▸ The state-of-the-art example is Soft Ordinal Embedding [5].
▸ Larger sets can be handled by merging SOE embeddings [7].
RELATED WORK
EMBEDDING: RELATED WORK
ICML 2014: SOFT ORDINAL EMBEDDING [T,VL] [5]
▸ A triple (a,b,c) means 𝜀ab + λ < 𝜀ac; λ > 0 sets the scale and prevents degenerate solutions.
▸ Can be minimized using standard optimizers.
▸ Works until n × d gets large (e.g. >100,000).
Errsoft(X | d, λ) := Σ(a,b,c) max[0, δab(X) + λ − δac(X)]²

Errsoft is nonzero when the embedding violates 𝜀ab + λ < 𝜀ac. EXAMPLE (λ = 0.2):

δab(X)   δac(X)   Errsoft
1        2        0.00
2        1        1.44
1.4      1.5      0.01
1.5      1.5      0.04
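A direct transcription of the loss as a sketch (λ = 0.2 reproduces the table above); in practice X would then be optimized by a standard gradient-based method:

```python
import numpy as np

def soe_loss(X, triples, lam=0.2):
    """Soft Ordinal Embedding loss: each triple (a, b, c) wants
    delta_ab(X) + lam < delta_ac(X); violations are penalized quadratically."""
    loss = 0.0
    for a, b, c in triples:
        d_ab = np.linalg.norm(X[a] - X[b])
        d_ac = np.linalg.norm(X[a] - X[c])
        loss += max(0.0, d_ab + lam - d_ac) ** 2
    return loss
```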
EMBEDDING: RELATED WORK
SCORE CARD: SOFT ORDINAL EMBEDDING
                  CK   AS   SOE
Active Learning   🥊   🥉   😆   N/A
Scale             🥊   🥉   🥉   10,000’s
Dimension         🥊   🥊   <10
Accuracy          🥊   🥉   🥉   High
Speed             🐍   🐈   🐈   Medium
Current state-of-the-art, but requires restarts and can’t handle high dimension.
MY METHOD
EMBEDDING: BASIS EMBEDDING
BASIS EMBEDDING (SUMMARY)
EMBEDDING: BASIS EMBEDDING
CHOOSING COORDINATES
▸ Pick the line connecting a pair of points as an “axis;” use points near the line as “coordinates.”
▸ The median “coordinate” point beneath a given point is its (approximate) position on the axis.
▸ We add axes until we can’t find a point orthogonal to the existing axes.
X IS “ABOVE” 4, 5, AND 6; WE CHOOSE 5 AS X’S COORDINATE ON THIS AXIS.
EMBEDDING: BASIS EMBEDDING
BASIS EMBEDDING: RESULTS [9]
EMBEDDING: BASIS EMBEDDING
SCORE CARD: BASIS EMBEDDING
                  CK   AS   SOE   Basis
Active Learning   🥊   🥉   😆    🥈   Meets lower bound
Scale             🥊   🥉   🥉    🥈   Unlimited
Dimension         🥊   🥉   Nontrivial for high-dim
Accuracy          🥊   🥉   🥉    🥉   Medium but reliable
Speed             🐍   🐈   🐈    🚁   Very fast
First purely-geometric approach. Fast, reliable medium-quality embeddings.
MY METHOD
EMBEDDING: SUBSET EMBEDDING
SUBSET EMBEDDING
▸ SOE can accurately embed small sets.
▸ It is easy to embed points given distances to known positions.
▸ So: embed a random subset with SOE, then use approximate distances to quickly embed the remaining points (sketched below).
▸ This makes an approximate embedding of a large set from a good embedding of a small set.
FAST APPROXIMATE EMBEDDING FROM A SUBSET
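A sketch of the second stage, assuming the subset has already been embedded by SOE: each remaining point is placed by least-squares multilateration against its approximate distances to the subset (place_point is an illustrative helper, not the exact thesis procedure):

```python
import numpy as np
from scipy.optimize import least_squares

def place_point(anchors, dists):
    """Position one new point so that its distances to the already-embedded
    subset (anchors, shape m x d) match the estimates (dists, length m)."""
    x0 = anchors.mean(axis=0)              # start from the subset centroid
    res = least_squares(
        lambda x: np.linalg.norm(anchors - x, axis=1) - dists, x0)
    return res.x
```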
EMBEDDING: SUBSET EMBEDDING
SUBSET EMBEDDING: EARLY RESULTS
▸ O(d n log m) triples when subset size m ≪ n: linear in n, and beats the active-learning lower bound!
▸ Needs further testing to explore limitations of the method (noise sensitivity, insufficient dimension?).
▸ Want to prove quality bounds and explain the quality theoretically.
RESULTS ON SIMULATED AND REAL DATASETS. MEDIAN OF 10 RUNS.
EMBEDDING: SUBSET EMBEDDING
SCORE CARD: SUBSET EMBEDDING
Fast, reliable high-quality embeddings. Sensitive to noise and limited dimensionality.
                  CK   AS   SOE   Basis   Subset
Active Learning   🥊   🥉   😆    🥈      🎗   Beats lower bound!
Scale             🥊   🥉   🥉    🥈      🥈   Unlimited
Dimension         🥊   🥉   🥊    Constrained by SOE
Accuracy          🥊   🥉   🥉    🥉      🥈   Highest; “approximate”
Speed             🐍   🐈   🐈    🚁      🚁   Linear in n!
PROPOSED WORK
EMBEDDING: CAN WE DO BETTER?
CAN WE DO BETTER?
▸ Subset embedding is amazing but does not work in high dimension.
▸ Can we replace SOE in subset embedding with something more robust?
▸ Basis embedding is geometry-based but not great…
▸ Proposal: try to improve basis embedding using random vectors instead of “axes.”
EMBEDDING: PROPOSED METHOD
EMBEDDING WITH RANDOM VECTORS
Each “orthogonal axis” in Basis Embedding is a vector upon which points are projected. So:
▸ Use random vectors instead, and infer partial orders of the points’ projections along them.
▸ Find an embedding whose points preserve the projected order along each axis.
WITH ENOUGH POINTS, PROJECTED ORDERS CONSTRAIN EMBEDDING
EMBEDDING: PROPOSED METHOD
EMBEDDING WITH RANDOM VECTORS: OPTIMIZATION IDEA
a′ precedes b′ on the vector from p to q ⇒ (a,b,p,q) ∈ PO

Objective:
▸ Penalize each (a,b,p,q) ∈ PO for which Xa − Xb has a positive component on the “axis” from p to q.
▸ This is an easier objective; it may be convex.
WITH ENOUGH POINTS, PROJECTED ORDERS CONSTRAIN EMBEDDING
L(X; PO) = Σ(a,b,p,q)∈PO max[0, (Xa − Xb) · (Xq − Xp) + λ]²
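A direct transcription of the proposed objective as a sketch, with PO and λ as defined above:

```python
import numpy as np

def projection_loss(X, po, lam=0.1):
    """Hinge loss on projected order: (a, b, p, q) in PO means a should
    precede b when both are projected onto the direction from p to q."""
    loss = 0.0
    for a, b, p, q in po:
        v = X[q] - X[p]                    # the "axis" direction
        loss += max(0.0, np.dot(X[a] - X[b], v) + lam) ** 2
    return loss
```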
ROAD MAP: MY PROPOSED WORK
IMPROVE ORDINAL EMBEDDING TECHNIQUES FOR TEXT SIMILARITY APPLICATIONS
▸ Active Learning – Which triples should we collect?
▸ Embedding – How can we embed accurately, at scale?
▸ Contextual Embeddings – Can we make embeddings that adapt to context?
CONTEXTUAL EMBEDDINGS: RECOMMENDATIONS
EMBEDDINGS FOR RECOMMENDATIONS
▸ We often try to predict future user preferences using their past behavior.
▸ Embeddings can help: users showing interest in some object may have interest in other “nearby” objects.
▸ We could embed entities from news articles by inferring triples from user behavior, e.g. articles a user reads/skips.
▸ Is this mathematically valid?
ARTICLES RECOMMENDED BY APPLE NEWS APP.
CONTEXTUAL EMBEDDINGS: PROBLEM STATEMENT
INCONSISTENT COMPARISONS
“A flame is similar to the moon because they are both luminous, and the moon is similar to a ball because they are both round, but in contradiction to the triangle inequality, a flame is not similar to a ball.” – William James, 1890

▸ The similarity function changed!
▸ An embedding would conflate “luminosity similarity” with “roundness similarity” and not quite capture either.
CONTEXTUAL EMBEDDINGS: A WAY FORWARD
SAME ENTITY, DIFFERENT CONTEXTS
▸ People care about different features in different contexts.
▸ Different features ⇒ different similarity function.
▸ But a different similarity function ⇒ different neighbors ⇒ different other entities in the article…
▸ The context should tell us this is happening!
A VARIETY OF CONTEXTS FOR ENTITY “JESSE VENTURA” – WRESTLER, GOVERNOR, AND ACTOR
CONTEXTUAL EMBEDDINGS: PROPOSED METHOD
MODELLING OPTIONS
Want to parameterize the embedding by context. Options:
▸ Learn multiple similarity functions simᵢ over embeddings of all n objects; learn each simᵢ and the probability of simᵢ given the context.
▸ Learn a context-dependent transformation of the global embedding X ∈ ℝⁿ⨉ᵈ.
ROAD MAP: MY PROPOSED WORK
IMPROVE ORDINAL EMBEDDING TECHNIQUES FOR TEXT SIMILARITY APPLICATIONS
▸ Active Learning – Which triples should we collect?
▸ Embedding – How can we embed accurately, at scale?
▸ Contextual Embeddings – Can we make embeddings that adapt to context?
ROAD MAP: MY PROPOSED WORK
TIME LINE
Fall 2017
▸ Vector projection active learning; prove the all-rankings problem is Θ(d n log n).
▸ Vector projection embedding; high-dimensional subset embedding.
Spring 2018
▸ Contextual Embeddings for recommendation.
Summer 2018
▸ ✈, 🍺
CITATIONS
[1]
[2]
[3] O. Tamuz, C. Liu, S. Belongie, O. Shamir, A. Kalai, “Adaptively Learning the Crowd Kernel,” ICML 2011.
[4] K. Jamieson, R. Nowak, “Low-Dimensional Embedding using Adaptively Selected Ordinal Data,” Allerton 2011.
[5] Y. Terada, U. von Luxburg, “Local Ordinal Embedding,” ICML 2014.
[6]
[7]
[8]
[9]
[10] J. Anderton, P. Metrikov, V. Pavlu, J. Aslam, “Measuring Human-Perceived Similarity in Heterogeneous Collections,” unpublished, 2014.
EMBEDDING: RELATED WORK
OPTIMIZATION AT SCALE IS DIFFICULT
[Figure: contour plots of the SOE loss of a single point – one with the other points in correct positions, one with the other points in random positions.]
With random initialization, the gradient is misleading. This is harder to fix as n and d increase.
CONTEXTUAL EMBEDDINGS: FIRST METHOD
PER-USER CONTEXTS FOR CROWDSOURCING [10]
▸ We tried a simple first approach using crowdsourced triples.
▸ For two datasets (movies and foods), users were asked, “Would a person who likes object a prefer b or c?”
▸ We attempted to train a global embedding of all objects and a per-user transformation of that embedding.
CROWDSOURCING INTERFACE
CONTEXTUAL EMBEDDINGS: FIRST METHOD
PER-USER CONTEXTS FOR CROWDSOURCING [10]
Given an embedding matrix X ∈ ℝⁿ⨉ᵈ, the standard similarity function is the Gram matrix K = XXᵀ. For each user k, we learn a per-user weight for each feature in a diagonal matrix Uk ∈ ℝᵈ⨉ᵈ. This gives a new similarity, K = X Uk Xᵀ. We chose questions adaptively using the Crowd Kernel method adapted to our model, and embedded the result using a Newton–Raphson method.

𝜀abᵏ = ∥Xa · diag(Uk) · Xb∥²
USER RESPONSE MODEL
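A minimal sketch of the per-user similarity (u_k holds user k’s per-feature weights; u_k = all ones recovers the global K = XXᵀ):

```python
import numpy as np

def user_gram(X, u_k):
    """Per-user similarity K_k = X diag(u_k) X^T: reweight each latent
    feature of the global embedding X by user k's diagonal weights."""
    return X @ np.diag(u_k) @ X.T
```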
CONTEXTUAL EMBEDDINGS: FIRST METHOD
PER-USER CONTEXTS FOR CROWDSOURCING: RESULTS [10]