1
Message-passing Two Steps Least Square Algorithms for Simultaneous Equations Models
José Juan López Espín
Universidad Miguel Hernández (Elche, Spain)
Domingo Giménez Cánovas
Universidad de Murcia (Murcia, Spain)
2
Contents
Introduction
Simultaneous equations models
OLS and 2SLS techniques
Three different versions of the 2SLS algorithm
    General
    Inverse decomposition
    QR decomposition
Experimental results
Conclusions and future work
3
The solution of a simultaneous equations model (SEM) in high performance …
Three different versions of 2SLS are studied.
Parallel algorithms for distributed memory have …
The methods have been analyzed in different …
4
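$$
\begin{aligned}
y_{1t} &= \beta_{12} y_{2t} + \beta_{13} y_{3t} + \dots + \beta_{1M} y_{Mt} + \gamma_{11} x_{1t} + \dots + \gamma_{1k} x_{kt} + u_{1t} \\
y_{2t} &= \beta_{21} y_{1t} + \beta_{23} y_{3t} + \dots + \beta_{2M} y_{Mt} + \gamma_{21} x_{1t} + \dots + \gamma_{2k} x_{kt} + u_{2t} \\
&\;\;\vdots \\
y_{Mt} &= \beta_{M1} y_{1t} + \beta_{M2} y_{2t} + \beta_{M3} y_{3t} + \dots + \beta_{M,M-1} y_{M-1,t} + \gamma_{M1} x_{1t} + \dots + \gamma_{Mk} x_{kt} + u_{Mt}
\end{aligned}
$$

In matrix form:

$$B y_t + \Gamma x_t + u_t = 0$$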
5
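$$B y_t + \Gamma x_t + u_t = 0$$

$$y_t = -B^{-1}\Gamma x_t - B^{-1} u_t = \Pi x_t + v_t$$

$$
\begin{aligned}
y_{1t} &= \pi_{11} x_{1t} + \dots + \pi_{1k} x_{kt} + v_{1t} \\
&\;\;\vdots \\
y_{Mt} &= \pi_{M1} x_{1t} + \dots + \pi_{Mk} x_{kt} + v_{Mt}
\end{aligned}
$$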
6
$$y_t = \beta_1 x_{1t} + \dots + \beta_n x_{nt} + u_t$$

$$\hat{\beta} = (X'X)^{-1} X'y$$
7
OLS cannot be used in …
Endogenous variables …
The proxy of Y is …
When the endogenous …
8
Try to parallelize at the highest level.
Share as much information as possible. Each call to 2SLS must share more information …
Perform the maximum number of operations …
ScaLAPACK and PBLAS libraries are used to …
9
In the experiments pdgemm has been used to perform the multiplications, and pdgesv to compute the inverse. The use of ScaLAPACK allows us to obtain a portable routine.
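As a serial illustration of the two kinds of operation named above (matrix products and a linear solve), the following NumPy sketch computes the proxy X (X'X)^{-1} X'Y. NumPy's `@` operator and `numpy.linalg.solve` only stand in for the distributed pdgemm and pdgesv calls; the function name `proxy_normal_equations`, the matrix sizes, and the random data are assumptions made for the example, not part of the original routine.

```python
import numpy as np

def proxy_normal_equations(X, Y):
    """Return X (X'X)^{-1} X'Y without forming the inverse explicitly."""
    XtX = X.T @ X                     # k x k product (role played by pdgemm)
    XtY = X.T @ Y                     # k x m product (role played by pdgemm)
    P = np.linalg.solve(XtX, XtY)     # solves (X'X) P = X'Y (role played by pdgesv)
    return X @ P                      # proxy of Y, shape n x m

# Hypothetical data: 50 samples, 4 exogenous and 3 endogenous variables.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 4))
Y = rng.standard_normal((50, 3))

Y_hat = proxy_normal_equations(X, Y)
residual = Y - Y_hat                  # orthogonal to the columns of X
```

Solving the linear system rather than inverting X'X gives the same product while avoiding the explicit inverse.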
10
Three different versions of the 2SLS algorithm are presented.
The first is a basic algorithm, which is improved in the second and third versions.
The first version establishes the structure of the parallel 2SLS algorithm. The other versions follow the same structure but use matrix decompositions to reduce the cost.
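One way a matrix decomposition can replace the work on X'X is sketched below for the QR case. With the thin factorization X = QR (X of full column rank), the projector X (X'X)^{-1} X' equals QQ', so the proxy can be formed from Q alone. This is a serial NumPy illustration under those assumptions, not the parallel algorithm of the slides; `proxy_qr` and the test data are hypothetical.

```python
import numpy as np

def proxy_qr(X, Y):
    """Proxy of Y computed as Q (Q'Y), where X = QR is the thin QR factorization."""
    Q, _ = np.linalg.qr(X)            # default 'reduced' mode: Q is n x k
    return Q @ (Q.T @ Y)

# Hypothetical data: 60 samples, 5 exogenous and 2 endogenous variables.
rng = np.random.default_rng(1)
X = rng.standard_normal((60, 5))
Y = rng.standard_normal((60, 2))

Y_hat_qr = proxy_qr(X, Y)
# Reference value from the normal equations, for comparison only.
Y_hat_ne = X @ np.linalg.solve(X.T @ X, X.T @ Y)
max_diff = np.max(np.abs(Y_hat_qr - Y_hat_ne))
```

On well-conditioned data the two results agree to rounding error; `max_diff` measures that agreement.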
11
All the proxies are …
All the proxies are …
Each processor …
12
[Equation not recoverable: only the proxy $\hat{Y}$ and sums over $j$ with limits in $m$ and $k$ survive]
13
$$
\begin{pmatrix} \hat{Y}_1'\hat{Y}_1 & \hat{Y}_1'X_1 \\ X_1'\hat{Y}_1 & X_1'X_1 \end{pmatrix}
\begin{pmatrix} \beta_1 \\ \gamma_1 \end{pmatrix}
=
\begin{pmatrix} \hat{Y}_1'y_1 \\ X_1'y_1 \end{pmatrix}
$$

$$\hat{Y}_1 = X (X'X)^{-1} X'Y_1, \qquad \hat{Y}_1'\hat{Y}_1 = Y_1'X (X'X)^{-1} X'Y_1$$

[Block-matrix identity in $A$, $B$, $D$ and $Id$: not recoverable]
14
… is taken from X'X
… is calculated
20
Kefren: a cluster of 20 dual-processor Pentium Xeon 2 …
MareNostrum: a supercomputer based on PowerPC …
27
Endogenous  Exogenous  Sample  Inv-Qr       dif
500         200        500     9.13657E-12  9.08442E-12
1000        400        500     3.00996E-12  3.00927E-12
1000        400        1000    4.65709E-13  4.64279E-13
1500        600        1000    2.13886E-08  2.18451E-08
1500        600        1500    2.63E-09     2.49918E-09
2000        800        1500    7.81023E-12  7.78951E-12
2000        800        2000    2.7896E-12   2.79031E-12
28
Sometimes a …
Tools will be made …
Application to real …
Develop an algorithm …