
Multi-domain Predictive AI Correlated Cross-Occurrence with Apache Mahout and GPUs

Pat Ferrel
ActionML, Chief Consultant
Apache Mahout, PMC & Committer
Apache PredictionIO, PMC & Committer
pat@apache.org, pat@actionml.com


What is the Goal for Predictive AI?
Use all we can record about users to predict their preference for anything
• Recommenders
• Behavioral Search
• Personalized Apps

What Problem Does this Solve?
• Multi-domain, multi-modal, multi-action, multi-behavior, multi-indicator data means we know more about a user
• Coverage is greatly increased if we can use multi-indicator data
• Carefully correlating behavior means much better predictions, if only because we have new data sources
• Being able to target any type of prediction from the same dataset allows us to predict new things (caveats apply)

Matrix Factorization, ALS-style
[Figure: users-by-items “buy” matrix]
One indicator: buy

Problems with ALS
• Only one indicator of behavior
• Buy: can bring good results but limits user and item coverage to past buyers
• Ratings: mostly useless
• Others: yes, but only one at a time

For the same E-Commerce Example: Multi-modal, multi-domain behavior
What if we could use:
• Buying behavior indicator (user-id, buy, item-id)
• Viewing behavior indicator (user-id, view, item-id)
• Category-preference behavior indicator (user-id, cat-pref, item-id)
• Sharing behavior indicator (user-id, share, item-id)
• Search behavior indicator (user-id, search, keyword)
to make better:
• buy recommendations, or
• augment search indexes, or
• understand a user’s category preferences, or ...

Correlated Cross-Occurrence
Apache Mahout + Apache PredictionIO + AML code = The Universal Recommender

ANATOMY OF A RECOMMENDATION: Simple Cooccurrence Algorithm
$r$ = recommendations
$h_a$ = a user’s history of some primary action (purchase, for instance)
$A$ = the history of all users’ primary action; rows are users, columns are items
$[A^T A]$ = compares column to column using a log-likelihood-based correlation test
$r = [A^T A]\,h_a$
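The cooccurrence recipe above can be sketched in a few lines of NumPy. This is only an illustration on made-up toy data: it uses raw cooccurrence counts rather than the log-likelihood test the slide mentions, and the matrix values and variable names are assumptions, not part of the talk.

```python
import numpy as np

# Hypothetical toy data: 4 users (rows) x 5 items (columns); 1 = user bought item.
A = np.array([
    [1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 1, 1],
])

# A^T A compares every item column with every other item column.
cooccurrence = A.T @ A
np.fill_diagonal(cooccurrence, 0)  # ignore an item cooccurring with itself

# h_a: one user's purchase history over the same 5 items (bought item 0 only).
h_a = np.array([1, 0, 0, 0, 0])

# r = [A^T A] h_a, one score per item.
r = cooccurrence @ h_a

# Rank items with a positive score that the user has not already bought.
ranked = [int(i) for i in np.argsort(-r, kind="stable") if r[i] > 0 and h_a[i] == 0]
print(r.tolist(), ranked)
```

With this toy data, item 1 ranks first because two users bought it together with item 0, and item 2 second because one user did.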

The Theory Doesn’t End There
• Virtually all existing collaborative-filtering-type recommenders use only one indicator of preference: $r = [A^T A]\,h_a$
• But the theory doesn’t stop there; we can find correlation between different behaviors (CCO): $r = [A^T A]\,h_a + [A^T B]\,h_b + [A^T C]\,h_c + \dots$
• Virtually anything we know about the user can be used to improve recommendations: purchase, view, category-preference, location-preference, device-preference…
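A minimal sketch of the multi-indicator sum, assuming toy data and a hypothetical helper `cco_scores`. The secondary indicator here has a different column space (categories rather than products) to show that cross-occurrence still yields one score per product. Raw counts stand in for the correlation-tested values.

```python
import numpy as np

def cco_scores(primary, indicators, histories):
    """Sum [A^T M] h_m over every indicator matrix M and its matching history h_m."""
    total = np.zeros(primary.shape[1])
    for matrix, history in zip(indicators, histories):
        total += (primary.T @ matrix) @ history
    return total

# Hypothetical data: 3 users x 3 products (buy) and 3 users x 2 categories (category-pref).
A = np.array([[1, 0, 0],
              [1, 1, 0],
              [0, 1, 1]])
C = np.array([[1, 0],
              [1, 0],
              [0, 1]])

h_a = np.array([1, 0, 0])  # the user bought product 0
h_c = np.array([0, 1])     # the user prefers category 1

r = cco_scores(A, [A, C], [h_a, h_c])
print(r.tolist())
```

Product 1 gets no more buy-based evidence than before, but the category preference adds to its score, which is the point of using more than one indicator.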

Single User History of Multi-modal Behavior
[Figure: one row for user-i across the indicator matrices A (buy, by products), B (views, by products), C (category pref, by categories), D (share, by products), E (terms in search input, by terms), ...]

All Users’ Multi-Modal Behavior Indicators: Far More than Conversions
[Figure: all users as rows across the indicator matrices A (buy, by products), B (views, by products), C (category pref, by categories), D (share, by products), E (terms in search input, by terms), ...]

All Users’ Buys: Cooccurrence
[Figure: $A^T$ (products by users) × $A$ (users by products) = cooccurrence matrix (products by products), with the row for product-j highlighted]
product-j had 2 other products that were bought in common. We replace the cooccurrence magnitude with the LLR score, which adds the “correlation test” to simple cooccurrence.

All Users’ Buys: Cross-occurrence with Search Terms
[Figure: $A^T$ (products by users) × $E$ (users by terms in search) = cross-occurrence matrix (products by terms), with the row for product-j highlighted]
product-j had 3 terms that were searched for in common. We replace the cross-occurrence magnitude with the LLR score, which adds the “correlation test” to simple cross-occurrence!
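The slides do not spell out the LLR computation. The sketch below uses the standard log-likelihood ratio test on a 2x2 contingency table of user counts, written in its entropy form; the helper names and the toy matrices are assumptions for illustration and are not taken from the talk.

```python
import math

def x_log_x(x):
    return 0.0 if x == 0 else x * math.log(x)

def entropy(*counts):
    # Unnormalized entropy of a set of counts.
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)

def llr(k11, k12, k21, k22):
    """k11: users with both events, k12: first only, k21: second only, k22: neither."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    return max(0.0, 2.0 * (row + col - mat))

def cross_llr(primary, secondary, item, other):
    """LLR between column `item` of primary and column `other` of secondary (rows = users)."""
    p = [bool(row[item]) for row in primary]
    s = [bool(row[other]) for row in secondary]
    k11 = sum(a and b for a, b in zip(p, s))
    k12 = sum(a and not b for a, b in zip(p, s))
    k21 = sum(b and not a for a, b in zip(p, s))
    k22 = sum(not a and not b for a, b in zip(p, s))
    return llr(k11, k12, k21, k22)

# Hypothetical data: 6 users, 1 product (buy) and 2 search terms.
A = [[1], [1], [1], [0], [0], [0]]
E = [[1, 1], [1, 0], [1, 1], [0, 1], [0, 0], [0, 1]]

strong = cross_llr(A, E, 0, 0)  # term 0 is searched only by buyers
weak = cross_llr(A, E, 0, 1)    # term 1 is searched equally often by buyers and non-buyers
print(strong, weak)
```

Term 1 cross-occurs with the product twice, yet its LLR is zero because buyers search for it no more often than anyone else. That is the sense in which the score adds a correlation test to the raw count.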

CORRELATED CROSS-OCCURRENCE: Apache Mahout-Samsara
$r = [A^T A]\,h_a + [A^T B]\,h_b + [A^T C]\,h_c + \dots$
• Sparse matrix multiply: $A^T A$, $A^T B$, $A^T C$, …
• Correlation test for non-zero (i.e. co- or cross-occurring) items with the Log-Likelihood Ratio
• All done with Apache Mahout-Samsara
• Why? One of the few libs that does general linear algebra like $A^T A$ and $A^T B$ in a massively scalable way and on GPUs

CORRELATED CROSS-OCCURRENCE: The Model
product-j “bought”:
• co-occurring “bought” products: product-1, product-5, …
• cross-occurring “viewed” products: product-1, product-3, product-5, …
• cross-occurring “category-preference” categories: category-9, category-21, category-38, …
• cross-occurring “shared” products: product-50, product-99, product-301, …
• cross-occurring “searched” terms: term-10, term-21, term-49, …
user-i history of all behavior:
• bought products: product-1, product-5, …
• viewed products: product-1, product-3, product-5, …
• categories preferred: category-9, category-21, category-38, …
• shared products: product-50, product-99, product-301, …
• searched terms: term-10, term-21, term-49, …
What do we recommend...

CORRELATED CROSS-OCCURRENCE: K-NEAREST NEIGHBORS
$r = [A^T A]\,h_a + [A^T B]\,h_b + [A^T C]\,h_c + \dots$
1. The dot product of two normalized (length = 1) vectors = the cosine of the angle between them
2. The cosine of the angle between two vectors is the Machine Learning heavy lifter for similarity and is therefore used by just about all search engines: https://en.wikipedia.org/wiki/Cosine_similarity and https://lucene.apache.org/core/3_0_3/api/core/org/apache/lucene/search/Similarity.html
3. $[A^T A]\,h_a$ and $[A^T B]\,h_b$ are the dot products of every row in the model with $h_a$ and $h_b$
4. Take the sum of dot products for each item and sort them for ranking recommendations
5. Step #4 is exactly what Lucene does!
● It is fast, using sparsity, sharding, and parallel execution of queries to accelerate
● It is scalable and HA with Elasticsearch and Solr
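Steps 1 to 4 can be sketched directly with NumPy, assuming two small made-up model matrices (one row per item, one matrix per indicator) and a hypothetical helper `cosine_scores`. This brute-force version is only meant to show the arithmetic; it says nothing about how Lucene implements scoring internally.

```python
import numpy as np

def cosine_scores(model, history):
    """Cosine between every row of the model and the history vector."""
    row_norms = np.linalg.norm(model, axis=1)
    h_norm = np.linalg.norm(history)
    if h_norm == 0:
        return np.zeros(model.shape[0])
    safe = np.where(row_norms == 0, 1.0, row_norms)  # avoid dividing by zero
    return (model @ history) / (safe * h_norm)

# Hypothetical models for 3 items: "bought" (items x items) and "category" (items x categories).
model_a = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)
model_b = np.array([[1, 0], [0, 1], [1, 0]], dtype=float)

h_a = np.array([1, 0, 0], dtype=float)  # user bought item 0
h_b = np.array([1, 0], dtype=float)     # user prefers category 0

# Step 4: sum the per-indicator similarities for each item, then sort.
total = cosine_scores(model_a, h_a) + cosine_scores(model_b, h_b)
ranked = [int(i) for i in np.argsort(-total, kind="stable")]
print(total.round(4).tolist(), ranked)
```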


CORRELATED CROSS-OCCURRENCE: Find the most similar product to the user history
Lucene indexes multi-field documents, one doc per product, one field per indicator:
product-j:
• bought field: product-1, product-5, …
• viewed field: product-1, product-3, product-5, …
• category-preference field: category-9, category-21, category-38, …
• shared field: product-50, product-99, product-301, …
• searched field: term-10, term-21, term-49, …
User history query, user-i history of all behavior:
• bought products → bought field: product-1, product-5, …
• viewed products → viewed field: product-1, product-3, product-5, …
• categories preferred → category-preference field: category-9, category-21, category-38, …
• shared products → shared field: product-50, product-99, product-301, …
• searched terms → searched field: term-10, term-21, term-49, …
Search results: product-j, product-k, …
Search ranks all products most similar to the user’s multi-modal history.
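One way to picture the user history query is as an Elasticsearch-style `bool` query with one `term` clause per history value, each aimed at the field for its indicator. The sketch below only builds the request body as a Python dict; the field names, the helper `build_history_query`, and the overall query shape are assumptions for illustration and may differ from what the Universal Recommender actually sends.

```python
import json

def build_history_query(history, size=10):
    """Turn {field: [values]} into a bool/should query body (hypothetical shape)."""
    should = [
        {"term": {field: value}}
        for field, values in history.items()
        for value in values
    ]
    return {"size": size, "query": {"bool": {"should": should}}}

# Hypothetical user-i history, keyed by the indicator field it should match.
user_history = {
    "bought": ["product-1", "product-5"],
    "viewed": ["product-1", "product-3", "product-5"],
    "category-preference": ["category-9", "category-21", "category-38"],
    "shared": ["product-50", "product-99", "product-301"],
    "searched": ["term-10", "term-21", "term-49"],
}

query = build_history_query(user_history)
print(json.dumps(query, indent=2))
```

Because every clause sits under `should`, a product document scores higher the more of the user's history it matches across fields, which mirrors the sum of per-indicator similarities.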


Uses:
• Better E-Commerce Recommender
  • sure, you saw that coming
• Search index augmentation
  • some terms that lead to conversions are not in the content, like trendy slang or jargon or common misspellings
• Behavioral augmentation of search indexes
  • search terms + user history = results that might lead to a purchase
• Business Rules, it’s only a query on documents
• Blend Collaborative Filtering and Content-based Recs
• With enough data? Mind reading?

Why GPUs?
Each matrix may be 1,000,000 x 1,000,000; calculation time is too expensive! ‘Nuff said?
[Figure: a stack of matrix multiplications, X × X = X, one per indicator]
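A back-of-the-envelope calculation shows why the size matters. The matrix dimensions come from the slide; the 4-byte value size and the figure of 100 non-zeros per row are assumptions picked only to make the arithmetic concrete.

```python
rows = cols = 1_000_000            # dimensions quoted on the slide

dense_cells = rows * cols          # every cell stored explicitly
bytes_per_value = 4                # assumption: 32-bit floats
dense_terabytes = dense_cells * bytes_per_value / 1e12

nonzeros_per_row = 100             # assumption: each user touches about 100 items
sparse_values = rows * nonzeros_per_row
sparse_fraction = sparse_values / dense_cells

print(dense_cells, dense_terabytes, sparse_values, sparse_fraction)
```

Under these assumptions a dense representation would need about 4 TB per matrix, while only one cell in ten thousand is non-zero, which is why the sparse multiplies are the part worth accelerating.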

Questions? Speaker Change Andy--give-em GPUs?
