Adaptive Incremental Learning for Statistical Relational Models Using Gradient-Based Boosting
Yulong Gu and Paolo Missier
Presenter: Yulong Gu, School of Computing, Newcastle University, UK
Background:
- Data-driven Model
- Statistical Relational Model
- Relational Soft Margin Approach
- Structural Expectation Maximization
- Adaptive Incremental Learning
- Relational Dependency Network
- Markov Logic Network
- Relational Functional Gradient Boosting Framework

Data Properties:
[Figure: running example. Education predicates (Study Hard, Go to College, Academic Awards) and Career/Startup predicates (Work at fast food joint (Y), Profit more than N, Start a Startup Company). Structure and parameters are learned as a Relational Regression Tree, and boosting combines a sequence of such trees (+ + ... +) for the target 'Work at fast food joint (Y)'.]
Want to build a statistical relational model out of these predicates? Learn an RRT for each predicate, encoding both dependencies and parameters. Learn multiple weak models rather than a single complex model.
Natarajan, S. (2012). RFGB. Machine Learning
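As an illustration of the RRT-per-predicate idea (not the paper's implementation), a relational regression tree can be read as a list of clause bodies, each carrying a regression value; boosting then sums the values of the clauses that fire across all weak trees. A minimal Python sketch, with illustrative predicate names and values borrowed loosely from the running example:

```python
# Hypothetical sketch: an RRT as (clause body, regression value) pairs.
from typing import FrozenSet, List, Tuple

Clause = Tuple[FrozenSet[str], float]   # (body literals, regression value)
RRT = List[Clause]                      # one weak tree = ordered clauses

def tree_value(tree: RRT, facts: FrozenSet[str]) -> float:
    """Return the value of the first clause whose body holds
    (clauses are ordered from most to least specific)."""
    for body, value in tree:
        if body <= facts:
            return value
    return 0.0

def ensemble_value(trees: List[RRT], facts: FrozenSet[str]) -> float:
    """Boosted model: sum the regression values of all weak trees."""
    return sum(tree_value(t, facts) for t in trees)

# Running example: one weak tree for the target predicate workatFFJ(X).
weak1: RRT = [
    (frozenset({"college(X)", "distinction(X)"}), -0.2),
    (frozenset({"college(X)"}),                    0.8),
    (frozenset(),                                  0.5),
]
print(ensemble_value([weak1], frozenset({"college(X)", "distinction(X)"})))  # -0.2
```

Summing per-tree values rather than refitting a single complex tree is what keeps each weak model cheap to learn and easy to replace.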
[Figure: incremental update of the relational regression tree for 'Work at fast food joint', currently splitting on Go to College (regression value 0.5), with Distinction as a candidate split at a leaf.]
Positive example: person(Eric), workatFFJ(Eric), college(Eric), distinction(Eric)
- Sufficient statistics updated at the leaf (distinction + 1, startup + 0, ...)
- Fork candidate splits and calculate their regression values
- Only split when the Hoeffding Bound is satisfied
- Examples arrive through a sliding window
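A minimal sketch of the per-leaf sufficient-statistics update implied by the counts above (the dictionary layout and candidate predicate names are assumptions, not the paper's data structures):

```python
from collections import defaultdict

class LeafStats:
    """Per-leaf sufficient statistics: for each candidate split predicate,
    count how many positive/negative examples reaching this leaf satisfy it."""
    def __init__(self, candidates):
        self.n = 0
        self.counts = {c: defaultdict(int) for c in candidates}

    def update(self, example_facts, positive):
        self.n += 1
        label = "pos" if positive else "neg"
        for pred in self.counts:
            if pred in example_facts:          # e.g. distinction(Eric) holds
                self.counts[pred][label] += 1

leaf = LeafStats(candidates=["distinction", "startup"])
# Positive example: person(Eric), workatFFJ(Eric), college(Eric), distinction(Eric)
leaf.update({"person", "workatFFJ", "college", "distinction"}, positive=True)
print(leaf.counts["distinction"]["pos"], leaf.counts["startup"]["pos"])  # 1 0
```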
Blockeel, H., & De Raedt, L. (1998). TILDE. Artificial Intelligence; Hulten, G. (2001). CVFDT. KDD

CVFDT Splitting Strategy
[Figure: the tree for 'Work at fast food joint' splitting on Go to College (0.5), with the candidate node Distinction (0.1) on its True/False branches.]
Positive example: person(Eric), workatFFJ(Eric), college(Eric), distinction(Eric)
Sufficient statistics updated (distinction + 1, startup + 0, ...)
Example: after the update of the sufficient statistics, the node has seen 100 examples; with 99% certainty, the difference between the true ΔGain = Gain(distinction) - Gain(startup) and the observed one is less than a pre-defined ε, so the Hoeffding Bound is satisfied and the node splits.
Examples arrive through a sliding window.
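A minimal sketch of the Hoeffding-bound split test used by Hoeffding-tree/CVFDT-style learners, plugged with the numbers from the example (n = 100 examples, 99% certainty, i.e. δ = 0.01). The bound guarantees that, with probability 1 - δ, the observed gain difference lies within ε of the true one, so splitting is safe once the observed difference exceeds ε; the gain values and the range R = 1.0 below are hypothetical.

```python
import math

def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Epsilon such that, with probability 1 - delta, the true mean of a
    variable with range `value_range` lies within epsilon of the mean
    observed over n examples."""
    return math.sqrt((value_range ** 2) * math.log(1.0 / delta) / (2.0 * n))

def should_split(observed_gain_best: float, observed_gain_second: float,
                 value_range: float = 1.0, delta: float = 0.01, n: int = 100) -> bool:
    """Split when the observed gain difference between the two best candidate
    predicates (e.g. Distinction vs. Startup) already exceeds epsilon."""
    eps = hoeffding_bound(value_range, delta, n)
    return (observed_gain_best - observed_gain_second) > eps

# Hypothetical gains after 100 examples, 99% certainty (delta = 0.01):
print(hoeffding_bound(1.0, 0.01, 100))   # epsilon ~ 0.152
print(should_split(0.45, 0.20))          # True: difference 0.25 > epsilon
```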
[Figure: CVFDT with an alternative subtree. Before substitution, the tree for 'Work at fast food joint' splits on Go to College, Start a Startup Company and Study Hard (regression values 0.5, 0.3 and 0.9); after substitution, it splits directly on Start a Startup Company (0.9). Examples arrive through a sliding window.]
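A minimal sketch of the substitution mechanism pictured above, in the spirit of CVFDT: an alternative subtree is grown alongside the current one, both are scored on the examples in the sliding window, and the alternative replaces the original once it performs better. The squared-error metric, the check interval and the ConstantTree stand-in are assumptions for illustration.

```python
class ConstantTree:
    """Stand-in subtree that predicts a fixed regression value."""
    def __init__(self, value):
        self.value = value
    def predict(self, example):
        return self.value

class NodeWithAlternative:
    """Node that tracks an alternative subtree and substitutes it when the
    alternative's error over the recent window drops below the current one's."""
    def __init__(self, current, alternative, check_every=100):
        self.current, self.alternative = current, alternative
        self.err_current = self.err_alternative = 0.0
        self.seen, self.check_every = 0, check_every

    def record(self, example, target):
        self.err_current += (self.current.predict(example) - target) ** 2
        self.err_alternative += (self.alternative.predict(example) - target) ** 2
        self.seen += 1
        if self.seen >= self.check_every and self.err_alternative < self.err_current:
            # Substitute: the alternative subtree now fits the window better.
            self.current, self.alternative = self.alternative, self.current
            self.err_current = self.err_alternative = 0.0
            self.seen = 0

node = NodeWithAlternative(ConstantTree(0.5), ConstantTree(0.9))
for _ in range(100):
    node.record(example=None, target=0.9)   # recent window favours the alternative
print(node.current.value)                   # 0.9 after substitution
```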
Kolter, J. (2007). DWM. J. Mach. Learn. Res.
[Figure: two weak relational regression trees for 'Work at fast food joint (Y)'. Weak Model 1, with weight β, splits on Go to College and Study Hard with leaf values A, B, C; Weak Model 2, with weight γ, splits on Start a Startup Company with leaf values D, E.]
Functional Gradient Ascent
Boosting
Established Rules:
We boost an initial HRRT when it is stable, so that the objective functional is at its best, and the stable rules are transformed into established rules.
[Figure: incremental boosting pipeline. The initial HRRT t0, trained on data stream d0, passes RC and is boosted into b0; the functional gradient of b0 on data stream d1 gives Functional Gradient Tree t1, boosted into b1 = b0 + t1; the process repeats on each data stream until bn = b0 + b1 + ... + tn. Each FGT for 'Work at fast food joint (Y)' splits on predicate pairs such as Go to College/Distinction, Go to College/Failed and Start a Startup Company/Profit more than N, with leaf values A0, B0, C0 ... An, Bn, Cn.]
Discard poorly performing FGTs over time
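A minimal sketch of an incremental boosting loop with this shape. The pointwise gradient I(y) - P(y | parents) follows the standard RFGB formulation for the relational dependency network case (Natarajan et al., 2012); the tree-fitting routine and the pruning test are left as placeholders and are not the paper's algorithm.

```python
import math

def sigmoid(psi: float) -> float:
    return 1.0 / (1.0 + math.exp(-psi))

def boost_on_stream(model_trees, fit_regression_tree, keep_tree, stream):
    """model_trees: list of callables mapping an example to a regression value.
    For each batch from the stream, compute the pointwise functional gradient
    I(y) - P(y | parents) for every example, fit one Functional Gradient Tree
    to those gradients, append it, and discard poorly performing trees."""
    for batch in stream:                                       # batch = list of (example, y)
        gradients = []
        for example, y in batch:
            psi = sum(tree(example) for tree in model_trees)   # current regression value
            gradients.append((example, y - sigmoid(psi)))      # I(y) - P(y | parents)
        model_trees.append(fit_regression_tree(gradients))     # new FGT
        model_trees = [t for t in model_trees if keep_tree(t, batch)]  # prune
    return model_trees
```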
Evaluation criteria for RIB
- Complexity
- No concept drift
- Monitor global performance
- Monitor contribution to error of each FGT
- Strong consistency with training data over time
- Set S to False
[Figure: three Functional Gradient Trees for 'Work at fast food joint (Y)' learned at Time Points 1-3 on a timeline. The first splits on Go to College and Distinction (values 0.5 and 0.8), the second on Go to College and Failed (0.5 and 1.0), the third on Start a Startup Company and Profit more than N (0.8). The dependencies encoded by successive trees are conflicting.]
The decomposability of ensemble methods allows direct event analysis of time series from the real-time, incrementally learned model. Assume P(Y | Pa(Y)) = Sig(ψ) is a sigmoid function, where ψ is the regression value; Y in the following examples is the predicate 'Work at fast food joint'.

Scenario at Time Point 1: College and Distinction imply less likely to work at a fast food joint, because fast food joints pay less competitively.
P(Y = True | college, distinction) = Sig(-0.2)
P(Y = True | college, failed) = Sig(0.8)

Scenario at Time Point 2: College and Failed imply less likely to work at a fast food joint, because fast food joints pay extremely well over this period.
P(Y = True | college, distinction) = Sig(-0.2 + 1.0) = Sig(0.8)
P(Y = True | college, failed) = Sig(0.8 - 1.6) = Sig(-0.8)

Scenario at Time Point 3: Owning a start-up and profiting more than N imply less likely to work at a fast food joint, due to a tightening job market.
P(Y = True | college, distinction) = Sig(-0.2 + 1.0 - 0.5) = Sig(0.3)
P(Y = True | college, failed) = Sig(0.8 - 1.6 - 0.5) = Sig(-1.2)
P(Y = True | startup, profitMoreThanN) = Sig(0.5 + 0.5 - 1.8) = Sig(-0.8)
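A minimal sketch of the decomposed prediction above: each FGT contributes one regression value, the contributions are summed, and the sigmoid turns the sum into a probability. The per-tree contributions are taken directly from the Time Point 3 example.

```python
import math

def sigmoid(psi: float) -> float:
    return 1.0 / (1.0 + math.exp(-psi))

# Per-FGT contributions at Time Point 3 for P(Y = True | college, distinction):
contributions = [-0.2, 1.0, -0.5]   # one value from each Functional Gradient Tree
psi = sum(contributions)            # 0.3
print(psi, sigmoid(psi))            # 0.3  ~0.574
```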
[Figure: weighted variant of the pipeline. The same FGTs as above, but each tree is assigned a weight: HRRT t0 (data stream d0, weight w0) passes RC and is boosted into b0; Functional Gradient Tree t1 (data stream d1, weight w1) is boosted into b1 = t0 + t1; ...; Functional Gradient Tree tn (data stream dn, weight wn) is boosted into bn = t0 + t1 + ... + tn.]
Kolter, J. (2007). DWM. J. Mach. Learn. Res.
Discard trees with normalised weights lower than a pre-defined threshold
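A minimal sketch of dynamic-weighted-majority-style bookkeeping consistent with the rule above: trees that err are penalised multiplicatively, weights are normalised, and trees whose normalised weight falls below a threshold are discarded. The factor beta and threshold theta values are assumptions.

```python
def update_and_prune(trees, weights, mistakes, beta=0.5, theta=0.05):
    """Multiply the weight of each tree that made a mistake by beta,
    normalise the weights, and discard trees whose normalised weight
    falls below the threshold theta."""
    weights = [w * (beta if wrong else 1.0) for w, wrong in zip(weights, mistakes)]
    total = sum(weights)
    weights = [w / total for w in weights]
    kept = [(t, w) for t, w in zip(trees, weights) if w >= theta]
    return [t for t, _ in kept], [w for _, w in kept]

trees = ["FGT-1", "FGT-2", "FGT-3"]   # placeholders for real trees
weights = [1.0, 1.0, 1.0]
trees, weights = update_and_prune(trees, weights, mistakes=[False, True, True])
print(trees, weights)                 # weights renormalised to [0.5, 0.25, 0.25]
```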
Evaluation criteria for RBF
- Complexity
- No concept drift
- Monitor global performance
- Strong consistency with training data over time
- Set S to False
References
Natarajan, S., Khot, T., Kersting, K., Gutmann, B., & Shavlik, J. (2012). Gradient-based boosting for statistical relational learning: The relational dependency network case. Machine Learning.
Yang, S., et al. (2014). Learning from imbalanced data in relational domains: A soft margin approach. ICDM.
Heckerman, D., et al. (2000). Dependency networks for inference, collaborative filtering, and data visualization. The Journal of Machine Learning Research.
Khot, T., Natarajan, S., Kersting, K., & Shavlik, J. (2015). Gradient-based boosting for statistical relational learning: The Markov logic network and missing data cases. Machine Learning.
Hulten, G., Spencer, L., & Domingos, P. (2001). Mining time-changing data streams. International Conference on Knowledge Discovery and Data Mining (KDD).
European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD.
Kolter, J. Z., & Maloof, M. A. (2007). Dynamic weighted majority: An ensemble method for drifting concepts. Journal of Machine Learning Research, 8, 2755-2790.