Parallelization of the PC Algorithm

Anders L. Madsen 1,2, Frank Jensen 1, Antonio Salmerón 3, Helge Langseth 4, Thomas D. Nielsen 2

1 Hugin Expert A/S, Aalborg, Denmark
2 Dept. Computer Science, Aalborg University, Denmark
3 Dept. Mathematics, University of Almería, Spain
4 Dept. Computer and Information Science, Norwegian University of Science and Technology, Trondheim, Norway

CAEPIA 2015, Albacete, November 7, 2015

Introduction

◮ The AMiDST project: Analysis of MassIve Data STreams, http://www.amidst.eu
◮ Large number of variables
◮ Massive datasets
◮ Hybrid Bayesian networks (involving discrete and continuous variables)
◮ Conditional linear Gaussian (CLG) networks

Objectives

◮ Scale up the PC algorithm for learning CLG networks from large volumes of data.
◮ Take advantage of parallel computing environments with shared memory.

The PC algorithm

1. Determine pairwise (conditional) independence I(X, Y; S).
2. Identify the skeleton of G.
3. Identify v-structures in G.
4. Identify derived directions in G.
5. Complete the orientation of G, making it a DAG.

Remarks

◮ Step 1 takes most of the computing time.
◮ Marginal independence (S = ∅) is tested first.
◮ Only potential neighbours are included in the conditioning set.
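Step 1 and the three remarks can be illustrated with a minimal sequential sketch. It is not the authors' implementation: the function name `pc_step1`, the `indep` callback (a stand-in for a statistical test that returns True when independence is accepted) and the `max_size` limit are assumptions made for this example.

```python
from itertools import combinations

def pc_step1(variables, indep, max_size=3):
    """Sequential sketch of Step 1: returns (adjacency, separating sets).

    indep(x, y, s) is a hypothetical callback that returns True when
    X and Y are judged independent given the set s.
    """
    adj = {v: set(variables) - {v} for v in variables}  # start fully connected
    sepset = {}
    for size in range(max_size + 1):  # size 0 is the marginal test, S = empty set
        for x, y in combinations(variables, 2):
            if y not in adj[x]:
                continue  # edge already removed by an earlier test
            # Only potential neighbours of X or of Y may enter the conditioning set.
            candidates = set(combinations(sorted(adj[x] - {y}), size))
            candidates |= set(combinations(sorted(adj[y] - {x}), size))
            for s in sorted(candidates):
                if indep(x, y, frozenset(s)):
                    adj[x].discard(y)
                    adj[y].discard(x)
                    sepset[frozenset((x, y))] = frozenset(s)
                    break
    return adj, sepset
```

With an oracle encoding the chain A - B - C (A and C independent given B), the sketch keeps the edges A - B and B - C and records {B} as the separating set of A and C.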

Our proposal for parallelisation

We propose to parallelise Step 1 (the pairwise conditional independence tests):

1. Test all pairs X and Y for marginal independence.
   ◮ Use BIB designs.
2. Perform the most promising higher-order conditional independence tests.
   ◮ We create an edge index array, which the threads iterate over to select the next edge to evaluate.
   ◮ The edge index array contains all edges that have not been removed at an earlier step, sorted in decreasing order of the test score.
   ◮ Tests of size |S| = 1, 2, 3 may be performed.
3. Perform the remaining tests of conditional independence I(X, Y; S) where |S| = 1, 2, 3.
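The shared edge index array can be sketched as follows. This is an illustration of the scheduling pattern only, under stated assumptions: `run_edge_tests`, `test_edge` (a hypothetical callback returning True when independence is found for an edge) and the plain Python threads are choices made for this example, not the HUGIN implementation. Because of the interpreter lock, Python threads do not reproduce the speed-ups reported later for CPU-bound tests.

```python
import threading

def run_edge_tests(edges, scores, test_edge, n_threads=4):
    """Threads pull edges from a shared index array sorted by decreasing score."""
    # The edge index array: positions of the edges, highest test score first.
    order = sorted(range(len(edges)), key=lambda i: scores[i], reverse=True)
    results = [False] * len(edges)  # each slot is written by exactly one thread
    next_pos = 0
    lock = threading.Lock()

    def worker():
        nonlocal next_pos
        while True:
            with lock:  # selecting the next edge is the only synchronised step
                if next_pos >= len(order):
                    return
                i = order[next_pos]
                next_pos += 1
            results[i] = test_edge(edges[i])  # runs outside the lock

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Edges for which independence was found, in index-array order.
    return [edges[i] for i in order if results[i]]
```

Each edge is handed to exactly one thread, and only the few instructions that advance the shared position are serialised.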

Balanced Incomplete Block (BIB) designs

◮ A concept from the statistical design of experiments that provides a way of arranging experimental units when testing the effectiveness of a treatment.

A design is a pair (X, A) such that the following properties are satisfied:
1. X is a set of elements called points, and
2. A is a collection of nonempty subsets of X called blocks.

Let v, k and λ be positive integers such that v > k ≥ 2. A (v, k, λ)-BIB design is a design (X, A) such that the following properties are satisfied:
1. |X| = v,
2. each block contains exactly k points, and
3. every pair of distinct points is contained in exactly λ blocks.
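The definition translates directly into a small checker. The sketch below is only an illustration of the three properties; the function name `is_bib_design` and its argument layout are assumptions for this example, with v taken as the number of points supplied.

```python
from collections import Counter
from itertools import combinations

def is_bib_design(points, blocks, k, lam):
    """Check whether (points, blocks) is a (v, k, lam)-BIB design, v = len(points)."""
    points = set(points)
    if not (len(points) > k >= 2):  # requires v > k >= 2
        return False
    # Property 2: every block holds exactly k points, all taken from X.
    if any(len(set(b)) != k or not set(b) <= points for b in blocks):
        return False
    # Property 3: every pair of distinct points lies in exactly lam blocks.
    pair_counts = Counter(
        frozenset(pair) for b in blocks for pair in combinations(sorted(set(b)), 2)
    )
    return all(
        pair_counts[frozenset(pair)] == lam
        for pair in combinations(sorted(points), 2)
    )
```

For instance, the four 3-element subsets of {0, 1, 2, 3} form a (4, 3, 2)-BIB design, since every pair of points appears in exactly two of them.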

BIB Design Example

Consider the (7, 3, 1)-BIB design for 14 variables:
◮ Each point represents two variables.
◮ Each process is assigned six variables.

The seven blocks (b = 7) are:
{0,1,3}, {1,2,4}, {2,3,5}, {3,4,6}, {4,5,0}, {5,6,1}, {6,0,2}

[Figure: pairwise scoring with the (7, 3, 1)-BIB design]
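The example can be checked numerically. The sketch below expands each point into two variables and counts how often each variable pair falls inside a block. The numbering of variables (point p standing for variables 2p and 2p + 1) is an assumption for illustration; the slide does not say which variables a point represents.

```python
from collections import Counter
from itertools import combinations

# The seven blocks of the (7, 3, 1)-BIB design listed on the slide.
BLOCKS = [(0, 1, 3), (1, 2, 4), (2, 3, 5), (3, 4, 6), (4, 5, 0), (5, 6, 1), (6, 0, 2)]

def variables_of(point):
    # Assumed mapping: point p stands for variables 2p and 2p + 1.
    return (2 * point, 2 * point + 1)

def block_variables(block):
    """The six variables handled by the process that owns this block."""
    return sorted(v for p in block for v in variables_of(p))

def pair_coverage(blocks):
    """Count, for every variable pair, the number of blocks containing it."""
    counts = Counter()
    for b in blocks:
        counts.update(combinations(block_variables(b), 2))
    return counts
```

Under this mapping all 91 pairs of the 14 variables are covered. Pairs of variables from different points fall in exactly one block, while the two variables sharing a point meet in each of the three blocks containing that point, so a scheme along these lines needs a rule for scoring such pairs only once.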

Balanced Incomplete Block (BIB) designs

◮ The testing is divided into tasks of equal size such that we test exactly all pairs X, Y for marginal independence.
◮ This is achieved using BIB designs of the form (q, 6, 1) and then (3, 2, 1), where q is at least the number of variables.

[Figure: the variables X1, X2, ..., Xn are arranged into blocks of six, e.g. {X1, X2, X7, X19, X23, X30}; each block is split into sub-blocks of four, {X1, X2, X7, X19}, {X7, X19, X23, X30}, {X1, X2, X23, X30}; each sub-block is then split into pairs, (X1, X2), (X1, X7), (X1, X19), ...]

Extra heuristics

◮ For each edge, we compute the set of most promising tests.
◮ For each edge (X, Y), the best candidate variables to include in S are identified using the weight of a candidate variable Z, which is equal to the sum of the test scores for (X, Z) and (Y, Z):

  w(Z | (X, Y)) = 2N (MI(Z, X) + MI(Z, Y)),

  where MI(·, ·) is the mutual information.
◮ We create an array of best candidates with ≤ 7 variables (counts stored in memory), sorted by the sum of the edge weights.
◮ The threads iterate over the edge index array. A thread performs all tests for a selected edge (with |S| = 1, 2, 3) from the best candidate array. Testing stops as soon as an independence hypothesis is not rejected.
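The candidate weight can be sketched for the purely discrete case. The assumptions are labelled here and in the code: N is taken to be the number of cases, mutual information is estimated from empirical counts with natural logarithms, and the function names are hypothetical. The continuous variables of CLG networks are not handled.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Empirical mutual information (natural log) of two discrete samples."""
    n = len(xs)
    cx, cy, cxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log(c * n / (cx[a] * cy[b])) for (a, b), c in cxy.items()
    )

def candidate_weight(data, z, x, y):
    """w(Z | (X, Y)) = 2N (MI(Z, X) + MI(Z, Y)); N assumed to be the case count."""
    n = len(data[z])
    return 2 * n * (
        mutual_information(data[z], data[x]) + mutual_information(data[z], data[y])
    )

def best_candidates(data, x, y, limit=7):
    """Up to `limit` candidate variables for S, heaviest first."""
    others = [z for z in data if z not in (x, y)]
    others.sort(key=lambda z: candidate_weight(data, z, x, y), reverse=True)
    return others[:limit]
```

With natural logarithms, 2N times the empirical mutual information equals the G² likelihood-ratio statistic for a pair of discrete variables, which matches the slide's description of the weight as a sum of two test scores.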

Empirical evaluation

Data set    |X|     Total CPT size
ship-ship   50      130,478
Munin1      189     19,466
Diabetes    413     461,069
Munin2      1,003   83,920
sacso       2,371   44,274

◮ Software implementation based on HUGIN software.
◮ Three data sets generated at random for each network, with 100,000, 250,000, and 500,000 cases.
◮ The empirical evaluation is performed on a Linux computer running Red Hat Enterprise Linux 7 with a six-core Intel (TM) i7-5820K 3.3 GHz processor and 64 GB RAM.
◮ The computer has 6 physical cores and 12 logical cores.

Empirical evaluation

[Figures: average run time in seconds and average speed-up factor plotted against the number of threads for
(a) ship-ship, 500,000 cases; (b) Munin1, 250,000 cases;
(c) Diabetes, 250,000 cases; (d) Diabetes, 500,000 cases;
(e) Munin2, 250,000 cases; (f) Munin2, 500,000 cases;
(g) sacso, 250,000 cases; (h) sacso, 500,000 cases]

Empirical evaluation

Data set    Skeleton   v-structures   Orientation
            (Step 2)   (Step 3)       (Steps 4 and 5)
ship-ship   0          0              0
Munin1      0.005      0              0.001
Diabetes    0.001      0.004          0.002
Munin2      0.006      0.002          0.034
sacso       0.051      5.692          0.502

Conclusions

◮ Parallelisation of structure learning using the PC algorithm.
◮ The edge index array is the central bottleneck of the approach, as it is the only element that requires synchronization.
◮ The number of threads used by the algorithm may impact the result, as the order of tests is not invariant under the number of threads used. This is a topic of future research.
◮ The results of the empirical evaluation show a significant time performance improvement over the pure sequential method.

This project has received funding from the European Union’s Seventh Framework Programme for research, technological development and demonstration under grant agreement no. 619209.
