
Domain Adaptation from a Pre-trained Source Model
Application on fraud detection tasks

Presenter: Luxin Zhang (Worldline & Inria)
Supervisors: Christophe Biernacki (Inria), Pascal Germain (Inria), Yacine Kessaci (Worldline)
CMStatistics 2019, Nov 15, 2019

Fraud Detection in Transactions

Fraud Detection Problem: Detect whether a transaction was issued by the customer or not.

Fraud Detection Model: A binary classification model based on the customer's historical transactions.

Characteristics of the Fraud Detection Dataset:
- Huge number of examples (600 thousand per day).
- Extremely imbalanced classes (0.2% fraud).
- Categorical and numerical attributes.
- Manually generated attributes that are highly dependent on one another.
- Highly skewed numerical attributes.

Why to Transfer

Existing Market (Country):
- Well-trained classification model.
- Fraud patterns evolve.

Expanding Market (Country):
- Consumer behavior differs from country to country.
- Not enough label information in a new country.
- Fraud patterns evolve.

Technique used to face the challenge: Domain Adaptation

Plan

1. Introduction of Domain Adaptation
2. What to Transfer
3. How to Transfer
4. Details of the Transformation
5. Experimental Results
6. Prospects

Introduction of Domain Adaptation

What is Domain Adaptation?
Domain adaptation is a transfer learning technique that reduces the drift between the distributions of data from different domains (Pan and Yang [3]).

- Why to transfer? (Just answered.)
- What to transfer?
- How to transfer?

Context

Simplified Dataset:
- Encode categorical attributes by historical risk score.
- Use a log-transformation to correct the skewed numerical attributes.

Notations:
- $\mathcal{X} = \mathbb{R}^d$: input space.
- $\mathcal{Y} = \{0, 1\}$: output space.
- $X_s, X_t \in \mathcal{X}$: input data of the two domains.
- $Y_s, Y_t \in \mathcal{Y}$: output data of the two domains.
- $h : \mathcal{X} \to [0, 1]$: classifier that returns the probability of fraud.
- $l : \mathbb{R} \times \mathbb{R} \to \mathbb{R}^+$: loss function.
- $R^l_s(h)$, $R^l_t(h)$: true risk of classifier $h$ on the source and target domains.
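The two preprocessing steps of the simplified dataset can be sketched as follows. This is only an illustration under assumptions the slides do not state: "risk score" is read here as the historical fraud rate of a category, shrunk towards the global rate, and the names `fit_risk_scores`, `encode_category`, `log_transform` and `prior_weight` are hypothetical.

```python
import math
from collections import defaultdict


def fit_risk_scores(categories, labels, prior_weight=10.0):
    """Map each category to a smoothed historical fraud rate.

    Assumption: the "risk score" is the fraud rate observed for a category
    in past labeled transactions, shrunk towards the global fraud rate.
    """
    total = defaultdict(int)
    frauds = defaultdict(int)
    for c, y in zip(categories, labels):
        total[c] += 1
        frauds[c] += y
    global_rate = sum(labels) / len(labels)
    scores = {
        c: (frauds[c] + prior_weight * global_rate) / (total[c] + prior_weight)
        for c in total
    }
    return scores, global_rate


def encode_category(value, scores, global_rate):
    """Replace a category by its risk score; unseen categories get the global rate."""
    return scores.get(value, global_rate)


def log_transform(amount):
    """log(1 + x) compresses the long right tail of a non-negative attribute."""
    return math.log1p(amount)


if __name__ == "__main__":
    merchants = ["grocery", "grocery", "travel", "travel", "travel", "online"]
    is_fraud = [0, 0, 0, 1, 0, 1]
    scores, rate = fit_risk_scores(merchants, is_fraud)
    print(encode_category("travel", scores, rate), log_transform(250.0))
```

After this step every attribute is numerical, which is what the notation $\mathcal{X} = \mathbb{R}^d$ presupposes.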

What to Transfer

Our Proposition: Target to Source Domain Adaptation.

Target to Source Domain Adaptation
Assumption: No label shift $\implies P(Y_s) = P(Y_t)$
Proposition: $P(X_s \mid Y_s) = P(G(X_t) \mid Y_t) \implies R^l_t(h^*_s \circ G) = R^l_t(h^*_t)$

$G$ is the transformation that we are looking for, and $h^*_s$ and $h^*_t$ are the true risk minimizers of the source and target domains, respectively.

Characteristics of Fraud Detection and the assumptions they justify:
- The proportion of fraud is nearly the same: no label shift.
- No (or not enough) $Y_t$: $G$ does not depend on $Y$.
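One way to see why the proposition is plausible is the short derivation below. It is a sketch, not the presenter's proof: it uses the two stated conditions (no label shift, and $P(X_s \mid Y_s) = P(G(X_t) \mid Y_t)$) and shows that, for any classifier $h$, the target risk of $h \circ G$ equals the source risk of $h$.

```latex
\begin{align*}
R^l_t(h \circ G)
  &= \sum_{y \in \{0,1\}} P(Y_t = y)\,
     \mathbb{E}\big[\, l\big(h(G(X_t)),\, y\big) \mid Y_t = y \,\big] \\
  &= \sum_{y \in \{0,1\}} P(Y_s = y)\,
     \mathbb{E}\big[\, l\big(h(X_s),\, y\big) \mid Y_s = y \,\big]
   \;=\; R^l_s(h).
\end{align*}
```

Taking $h = h^*_s$ gives $R^l_t(h^*_s \circ G) = R^l_s(h^*_s)$. To conclude that this equals $R^l_t(h^*_t)$, an extra assumption not stated on the slide is used here: $G$ is invertible. Then any target classifier can be written as $h_t = (h_t \circ G^{-1}) \circ G$, so $R^l_t(h_t) = R^l_s(h_t \circ G^{-1}) \ge R^l_s(h^*_s)$, and $h^*_s \circ G$ attains the minimal target risk.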

What to Transfer

Related Works:
- Source to target adaptation.
- Common space adaptation.

Advantages of Target to Source Transformation:
- Leverage improvements to the source model.
- No retraining for every new country.
- A robust model needs investment and expertise.

How to Transfer

Difficulties: There is not enough $Y_t$ to estimate $G$ directly.

Industrial requirements and the corresponding properties of the transformation $G$:
- Interpretability: better understand consumer behavior in the new country.
- Modularity.
- Scalability: the transaction dataset is large.

How to Transfer

Intuition

$P(X_s \mid Y_s) = P(G(X_t) \mid Y_t) \iff P(X_s) = P(G(X_t))$

The function $G$ that minimizes the "marginal transformation effort" also aligns the conditional distributions.

$G = \operatorname{argmin}_{G} W_p\big(P_{X_s}, P_{G(X_t)}\big)$

$W_p$ is the $\ell_p$ Wasserstein distance. Domain adaptation is thus formulated as an optimal transport problem (Courty et al. [1]).

Details of the Transformation

Wasserstein Distance on Empirical Dataset:

$W_p(P_s, P_t) = \min_{\gamma \in \Gamma(P_s, P_t)} \langle C_p, \gamma \rangle$

- $\langle C_p, \gamma \rangle$: the sum of the element-wise product of the matrices $C_p$ and $\gamma$.
- $C_p$: the matrix of $\ell_p$ norms between all pairs of examples.
- $\Gamma(P_s, P_t)$: the set of joint probability matrices with marginals $P(X_s)$ and $P(X_t)$.

Optimal Transport:
- Aligns the distributions.
- Easy to interpret.
- Not scalable on big datasets (even with entropy regularization [2]).
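To make the objective $\langle C_p, \gamma \rangle$ concrete, here is a brute-force sketch for two tiny samples of equal size with uniform weights. In that special case an optimal coupling can be taken to be a permutation (each source point matched to exactly one target point with mass $1/n$), so enumerating permutations finds the minimum. This is not how real solvers work and is an assumption-laden toy; the function names are hypothetical. The factorial cost of the loop also hints at why scalability is a concern.

```python
from itertools import permutations


def lp_norm(u, v, p):
    """l_p distance between two points given as sequences of floats."""
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)


def cost_matrix(xs, xt, p=2):
    """C_p[i][j] = l_p distance between source point i and target point j."""
    return [[lp_norm(s, t, p) for t in xt] for s in xs]


def exact_ot_uniform(xs, xt, p=2):
    """Minimize <C_p, gamma> over permutation couplings (uniform weights, n = m).

    Returns (cost, perm) where source point i is matched to target point perm[i].
    Only usable for very small n: the loop visits n! permutations.
    """
    n = len(xs)
    if n != len(xt):
        raise ValueError("this toy version needs samples of equal size")
    C = cost_matrix(xs, xt, p)
    best_cost, best_perm = float("inf"), None
    for perm in permutations(range(n)):
        # gamma has mass 1/n on each matched pair, zero elsewhere
        cost = sum(C[i][perm[i]] for i in range(n)) / n
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return best_cost, best_perm


if __name__ == "__main__":
    source = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
    target = [(0.0, 1.2), (0.1, 0.0), (1.0, 0.1)]
    print(exact_ot_uniform(source, target))
```

The returned permutation is directly readable as "which target transaction is moved onto which source transaction", which is the interpretability property listed above.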

Details of the Transformation

1D Optimal Transport:
It is well known that 1D optimal transport has a closed-form solution $G_{1D}(x) = (F^{-1}_{P_s} \circ F_{P_t})(x)$, where $F$ is a cumulative distribution function. This solution is also known as the increasing arrangement (Peyré et al. [4]).

Compositions of $G$:
Assumption: All attributes are independent (or move in the same direction).

$G = (G_1, G_2, \dots, G_i, \dots, G_{k-1}, G_k)$, where $G_i = \operatorname{argmin}_{G} W_p\big(P_{X_{s,i}}, P_{G(X_{t,i})}\big)$

$X_{s,i}$ and $X_{t,i}$ are the $i$-th attributes of the input data $X$.
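The closed-form 1D solution and its attribute-wise composition can be sketched with empirical distributions as below. The sketch assumes that $F_{P_t}$ is the empirical CDF of the target sample and that $F^{-1}_{P_s}$ is the empirical quantile function of the source sample; the function names are hypothetical and the slides do not say which estimators were actually used.

```python
from bisect import bisect_right


def fit_1d_transport(source, target):
    """Return G(x) = F_s^{-1}(F_t(x)) built from two 1D samples."""
    s_sorted = sorted(source)
    t_sorted = sorted(target)
    n_s, n_t = len(s_sorted), len(t_sorted)

    def G(x):
        # r / n_t is the empirical CDF of the target sample at x
        r = bisect_right(t_sorted, x)
        # k = ceil(n_s * r / n_t), computed with integers to avoid rounding issues
        k = -((-r * n_s) // n_t)
        # clamp so values outside the target range map to the source extremes
        k = min(max(k, 1), n_s)
        return s_sorted[k - 1]

    return G


def fit_attributewise_transport(Xs, Xt):
    """Fit one independent 1D map per attribute (column) and apply them jointly."""
    k = len(Xs[0])
    maps = [
        fit_1d_transport([row[i] for row in Xs], [row[i] for row in Xt])
        for i in range(k)
    ]

    def G(x):
        return [g(v) for g, v in zip(maps, x)]

    return G


if __name__ == "__main__":
    Xs = [[10, 0.5], [20, 1.5], [30, 2.5], [40, 3.5]]      # source sample
    Xt = [[1, 100], [2, 300], [3, 200], [4, 400]]          # target sample
    G = fit_attributewise_transport(Xs, Xt)
    print(G([3, 200]))
```

Each $G_i$ costs one sort plus a binary search per query, and each can be inspected on its own as a monotone curve, which is in line with the scalability and interpretability requirements.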

Target to Source Domain Adaptation

Which attributes to transfer?
Feature selection using the accessible labeled target data:
- Separate the attributes into different groups.
- Run a greedy search based on the classifier's performance.
- Keep the attributes that matter most for adaptation.
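The greedy search can be sketched as a forward selection over attribute groups. The slides do not specify the search direction, the stopping rule, or the metric, so all three are assumptions here: groups are added one at a time while the score improves, and `evaluate` is a hypothetical caller-supplied function that returns the source classifier's performance on the labeled target data when only the given groups are adapted.

```python
def greedy_group_selection(groups, evaluate):
    """Forward greedy selection of attribute groups to adapt.

    groups   : list of group names (hypothetical labels for sets of attributes)
    evaluate : callable taking a list of groups and returning a score,
               higher is better (assumed to use labeled target data)
    Returns (selected_groups, best_score).
    """
    selected = []
    best = evaluate(selected)  # baseline: no attribute is adapted
    remaining = list(groups)
    while remaining:
        # score every one-group extension of the current selection
        candidates = [(evaluate(selected + [g]), g) for g in remaining]
        score, group = max(candidates, key=lambda c: c[0])
        if score <= best:
            break  # no extension improves the score: stop
        selected.append(group)
        remaining.remove(group)
        best = score
    return selected, best


if __name__ == "__main__":
    # Toy additive score standing in for a real evaluation on target labels.
    gains = {"amount": 0.03, "merchant": 0.01, "time": -0.02}

    def toy_evaluate(chosen):
        return 0.02 + sum(gains[g] for g in chosen)

    print(greedy_group_selection(list(gains), toy_evaluate))
```

In the toy run the group with a negative contribution is left unadapted, which mirrors the idea that adapting every attribute is not always best.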

Experimental Results

             No Adaptation   All Adaptation   Selected Adaptation
July         0.016           0.055            0.070 ± 0.009
August       0.061           0.077            0.061 ± 0.006
September    0.013           0.052            0.034 ± 0.006
Table: Performance of adaptation based on Neural Networks.

             No Adaptation   All Adaptation   Selected Adaptation
July         0.038           0.045            0.054 ± 0.002
August       0.063           0.072            0.062 ± 0.003
September    0.019           0.038            0.048 ± 0.002
Table: Performance of adaptation based on XGBoost.

Experimental Results

Figure: Comparison of feature selection performance to retrained target model.

Prospects

- Transfer the categorical attributes directly.
- Take the class imbalance into account.
- Take the dependence between attributes into account.
- Take the characteristics of the source classifier into account.

References

[1] Nicolas Courty, Rémi Flamary, Devis Tuia, and Alain Rakotomamonjy. Optimal transport for domain adaptation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(9):1853–1865, 2016.
[2] Marco Cuturi. Sinkhorn distances: Lightspeed computation of optimal transport. In Advances in Neural Information Processing Systems, pages 2292–2300, 2013.
[3] Sinno Jialin Pan and Qiang Yang. A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 22(10):1345–1359, 2009.
[4] Gabriel Peyré, Marco Cuturi, et al. Computational optimal transport. Foundations and Trends® in Machine Learning, 11(5-6):355–607, 2019.
