Message-passing Two Steps Least Square Algorithms for Simultaneous Equations Models - PowerPoint PPT Presentation



SLIDE 1

Message-passing Two Steps Least Square Algorithms for Simultaneous Equations Models

José Juan López Espín

Universidad Miguel Hernández (Elche, Spain)

Domingo Giménez Cánovas

Universidad de Murcia (Murcia, Spain)

SLIDE 2

Contents

• Introduction
• Simultaneous equations models
• OLS and 2SLS techniques
• Three different versions of the 2SLS algorithm
    - General
    - Inverse decomposition
    - QR decomposition
• Experimental results
• Conclusions and future work

SLIDE 3

Introduction

• The solution of a simultaneous equations model (SEM) on high-performance parallel systems is studied using 2SLS.
• Three different versions of 2SLS are studied.
• Parallel algorithms for distributed memory have been developed for the three versions.
• The methods have been analyzed on different parallel systems.

SLIDE 4

Simultaneous Equations Models

The scheme of a system with M equations, M endogenous variables and k predetermined variables is (structural form)

$$
\begin{aligned}
Y_{1t} &= \beta_{12} Y_{2t} + \beta_{13} Y_{3t} + \dots + \beta_{1M} Y_{Mt} + \gamma_{11} X_{1t} + \dots + \gamma_{1k} X_{kt} + u_{1t} \\
Y_{2t} &= \beta_{21} Y_{1t} + \beta_{23} Y_{3t} + \dots + \beta_{2M} Y_{Mt} + \gamma_{21} X_{1t} + \dots + \gamma_{2k} X_{kt} + u_{2t} \\
&\;\;\vdots \\
Y_{Mt} &= \beta_{M1} Y_{1t} + \beta_{M2} Y_{2t} + \beta_{M3} Y_{3t} + \dots + \beta_{M,M-1} Y_{M-1,t} + \gamma_{M1} X_{1t} + \dots + \gamma_{Mk} X_{kt} + u_{Mt}
\end{aligned}
$$

These equations can be represented in matrix form

$$
B Y_t + \Gamma X_t + u_t = 0
$$

SLIDE 5

Simultaneous Equations Models

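The structural form can be expressed in reduced form

$$
Y_t = \Pi X_t + v_t
$$

with

$$
\Pi = -B^{-1}\Gamma
$$

and

$$
v_t = -B^{-1} u_t
$$

Equation by equation:

$$
\begin{aligned}
Y_{1t} &= \pi_{11} X_{1t} + \dots + \pi_{1k} X_{kt} + v_{1t} \\
&\;\;\vdots \\
Y_{Mt} &= \pi_{M1} X_{1t} + \dots + \pi_{Mk} X_{kt} + v_{Mt}
\end{aligned}
$$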
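The link between the structural and the reduced form can be checked numerically. The following is only a minimal sketch: the matrices `B` and `Gamma` and the vectors `x_t` and `u_t` are made-up illustrative values, not data from the talk, and `B` is given a diagonal of -1 on the assumption that each equation is solved for its own endogenous variable.

```python
import numpy as np

# Hypothetical system with 2 endogenous and 3 predetermined variables,
# written as B Y_t + Gamma X_t + u_t = 0.
B = np.array([[-1.0, 0.5],
              [0.3, -1.0]])
Gamma = np.array([[1.0, 0.0, 2.0],
                  [0.0, 1.5, 0.0]])

# Reduced-form coefficients Pi = -B^{-1} Gamma (a linear solve avoids forming B^{-1}).
Pi = -np.linalg.solve(B, Gamma)

x_t = np.array([1.0, 2.0, 3.0])   # predetermined variables at time t
u_t = np.array([0.1, -0.2])       # structural disturbances at time t
v_t = -np.linalg.solve(B, u_t)    # reduced-form disturbances v_t = -B^{-1} u_t
y_t = Pi @ x_t + v_t              # endogenous variables from the reduced form

# The same y_t should satisfy the structural form up to rounding error.
residual = B @ y_t + Gamma @ x_t + u_t
print(np.allclose(residual, 0.0))
```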

SLIDE 6

OLS (Method)

OLS (Ordinary Least Squares) can be used to solve a regression model

$$
Y_t = \alpha_1 X_{1t} + \dots + \alpha_n X_{nt} + u_t
$$

In matrix form

$$
Y = X\beta + u
$$

The expression of the estimator is

$$
\hat\beta = (X'X)^{-1} X'Y
$$
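As an illustration of the estimator, here is a minimal NumPy sketch. The function name `ols` and the synthetic data are assumptions made for this example. The normal equations are solved with a linear solver rather than by forming the inverse explicitly, which is a choice of this sketch and not something the slide prescribes.

```python
import numpy as np

def ols(X, Y):
    """Return the OLS coefficients (X'X)^{-1} X'Y via the normal equations."""
    return np.linalg.solve(X.T @ X, X.T @ Y)

# Synthetic, noise-free data, so the estimate should reproduce beta_true
# up to rounding error.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
beta_true = np.array([1.0, -2.0, 0.5])
Y = X @ beta_true

beta_hat = ols(X, Y)
print(np.allclose(beta_hat, beta_true))
```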

SLIDE 7

2SLS (Two Step Least Squares)

• OLS cannot be used on the structural form because the random disturbance and the endogenous variables are correlated.
• The endogenous variables are replaced by approximations (proxy variables).
• The proxy of Y is calculated using OLS with Y and the exogenous variables of the system.
• Once the endogenous variables have been replaced, OLS is used again on the equation.
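The two steps can be sketched for a single equation as follows. This is a minimal illustration under assumptions: the function names, the argument layout (`X` holds all predetermined variables of the system, `Y1` and `X1` the endogenous and predetermined regressors of the equation) and the synthetic data are all hypothetical, not taken from the talk.

```python
import numpy as np

def ols(X, Y):
    """OLS coefficients (X'X)^{-1} X'Y via the normal equations."""
    return np.linalg.solve(X.T @ X, X.T @ Y)

def two_sls_equation(X, Y1, X1, y):
    """2SLS estimate for one equation y = Y1 a + X1 g + e."""
    # Step 1: proxies of the endogenous regressors, from OLS on all of X.
    Y1_hat = X @ ols(X, Y1)
    # Step 2: OLS on the equation with the proxies in place of Y1.
    Z_hat = np.hstack([Y1_hat, X1])
    return ols(Z_hat, y)

# Synthetic data: 4 predetermined variables, 1 endogenous regressor.
rng = np.random.default_rng(1)
n = 200
X = rng.normal(size=(n, 4))
X1 = X[:, :2]                                  # predetermined variables in the equation
Y1 = X @ rng.normal(size=(4, 1)) + rng.normal(size=(n, 1))
y = 0.7 * Y1[:, 0] + X1 @ np.array([1.0, -1.0]) + rng.normal(size=n)

coef = two_sls_equation(X, Y1, X1, y)          # [a, g_1, g_2]
```

The result coincides with the closed-form instrumental-variables expression $(Z'P_XZ)^{-1}Z'P_Xy$ with $Z = [Y_1\ X_1]$ and $P_X = X(X'X)^{-1}X'$, which is a convenient cross-check.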

SLIDE 8

Parallel Algorithm for distributed memory

• Try to parallelize at the highest level.
• Share as much information as possible.
• Each call to 2SLS must share more information to reduce the number of operations.
• Perform as many operations as possible across all the processors at the beginning of the algorithm, so that any processor can reuse the results in other parts of the algorithm.
• The ScaLAPACK and PBLAS libraries are used to make the program portable.

SLIDE 9

OLSp (Parallel OLS)

In the experiments pdgemm has been used to perform the multiplications, and pdgesv to compute the inverse. The use of ScaLAPACK allows us to obtain a portable routine.

SLIDE 10

2SLS for a system (Parallel 2SLS)

• Three different versions of the 2SLS algorithm are presented.
• The first is a basic algorithm, which is improved in the second and third versions.
• The first version establishes the structure of the parallel 2SLS algorithm. The other versions follow the same structure but use matrix decompositions to obtain lower costs.

SLIDE 11

The first version of 2SLS

• All the proxies are calculated at the beginning of the algorithm.
• All the proxies are distributed to all the processors.
• Each processor solves an equation sequentially using OLS.
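The structure of this first version can be imitated on a single machine. The sketch below is not the message-passing implementation of the talk: a thread pool stands in for the processors, the proxies are simply shared in memory instead of being distributed, and the function names and the `(j, y_idx, x_idx)` description of each equation are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

def ols(Z, y):
    """OLS coefficients (Z'Z)^{-1} Z'y via the normal equations."""
    return np.linalg.solve(Z.T @ Z, Z.T @ y)

def two_sls_v1(X, Y, equations, workers=2):
    """equations: list of (j, y_idx, x_idx), meaning that Y[:, j] is explained by
    the endogenous columns y_idx and the predetermined columns x_idx."""
    # Step 1: all the proxies are computed once, before any equation is solved.
    Y_hat = X @ ols(X, Y)

    # Step 2: each worker solves whole equations with a sequential OLS.
    def solve(eq):
        j, y_idx, x_idx = eq
        Z = np.hstack([Y_hat[:, y_idx], X[:, x_idx]])
        return ols(Z, Y[:, j])

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(solve, equations))

# Synthetic system: 5 predetermined variables, 3 endogenous variables.
rng = np.random.default_rng(2)
X = rng.normal(size=(100, 5))
Y = X @ rng.normal(size=(5, 3)) + rng.normal(size=(100, 3))
equations = [(0, [1, 2], [0, 1]), (1, [0], [2, 3]), (2, [0, 1], [4])]
coefs = two_sls_v1(X, Y, equations)
```

In a distributed-memory setting the list of equations would be partitioned among the processes and the proxies sent to each of them; the sketch only shows the order of the two phases.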

SLIDE 12

The 2nd v. of 2SLS (inverse decomposition)

Solve an equation in which the proxy variables have already been substituted (they are calculated at the beginning):

$$
y_j = \alpha_{j0} + \alpha_{j1}\hat y_1 + \dots + \alpha_{jm}\hat y_m + \gamma_{j1} x_1 + \dots + \gamma_{jk} x_k + \varepsilon_j
$$

The set of endogenous variables of the equation is $\hat Y_1$ and $X_1$ is the set of predetermined variables, so the variables of the equation form the matrix $[\hat Y_1\ X_1]$, and

$$
([\hat Y_1\ X_1]^t [\hat Y_1\ X_1])^{-1} [\hat Y_1\ X_1]^t y_j
$$

must be solved.

SLIDE 13

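The 2nd v. of 2SLS (inverse decomposition)

The inverse (with the columns ordered as $[X_1\ \hat Y_1]$):

$$
\begin{pmatrix} X_1'X_1 & X_1'\hat Y_1 \\ \hat Y_1'X_1 & \hat Y_1'\hat Y_1 \end{pmatrix}^{-1}
=
\begin{pmatrix} (X_1'X_1)^{-1} & 0 \\ 0 & 0 \end{pmatrix}
+
\begin{pmatrix} -(X_1'X_1)^{-1}X_1'\hat Y_1 \\ Id \end{pmatrix}
\left(\hat Y_1'\hat Y_1 - \hat Y_1'X_1(X_1'X_1)^{-1}X_1'\hat Y_1\right)^{-1}
\begin{pmatrix} -\hat Y_1'X_1(X_1'X_1)^{-1}, & Id \end{pmatrix}
$$

Using

$$
\begin{pmatrix} A & B \\ B' & D \end{pmatrix}^{-1}
=
\begin{pmatrix} A^{-1} & 0 \\ 0 & 0 \end{pmatrix}
+
\begin{pmatrix} -A^{-1}B \\ Id \end{pmatrix}
\left(D - B'A^{-1}B\right)^{-1}
\begin{pmatrix} -B'A^{-1}, & Id \end{pmatrix}
$$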
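The block-inverse identity can be verified numerically. This is a minimal sketch, assuming a symmetric matrix partitioned as `[[A, B], [B', D]]` with `A` and the Schur complement `D - B'A^{-1}B` invertible. The function name `block_inverse` and the random test data are illustrative only.

```python
import numpy as np

def block_inverse(A, B, D):
    """Inverse of [[A, B], [B', D]] assembled from A^{-1} and the Schur complement."""
    k, m = B.shape
    A_inv = np.linalg.inv(A)
    A_inv_B = A_inv @ B
    S_inv = np.linalg.inv(D - B.T @ A_inv_B)       # (D - B'A^{-1}B)^{-1}
    left = np.vstack([-A_inv_B, np.eye(m)])        # (k+m) x m column block
    right = np.hstack([-B.T @ A_inv, np.eye(m)])   # m x (k+m) row block
    out = left @ S_inv @ right
    out[:k, :k] += A_inv                           # add the [[A^{-1}, 0], [0, 0]] term
    return out

# Random stand-ins for X1 (3 columns) and the proxies Y1_hat (2 columns).
rng = np.random.default_rng(3)
X1 = rng.normal(size=(30, 3))
Y1_hat = rng.normal(size=(30, 2))
Z = np.hstack([X1, Y1_hat])

inv_blocks = block_inverse(X1.T @ X1, X1.T @ Y1_hat, Y1_hat.T @ Y1_hat)
print(np.allclose(inv_blocks, np.linalg.inv(Z.T @ Z)))
```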

SLIDE 14

The 2nd v. of 2SLS (inverse decomposition)

• $X_1'X_1$ is taken from $X'X$.
• $(X_1'X_1)^{-1}$ is calculated.
• $X_1'\hat Y_1$ is taken from $X'Y$.
• $(X_1'X_1)^{-1}X_1'\hat Y_1$ is calculated (cost $2k^2m + \frac{2}{3}k^3$).
• $\hat Y_1'X_1(X_1'X_1)^{-1}X_1'\hat Y_1$ is calculated (cost $2m^2k$).
• $\hat Y_1'\hat Y_1$ is taken from $\hat Y'Y$.
• $(\hat Y_1'\hat Y_1 - \hat Y_1'X_1(X_1'X_1)^{-1}X_1'\hat Y_1)^{-1}$ is calculated (cost $\frac{2}{3}m^3$).
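Why products involving the proxies can be read off $X'Y$ and $\hat Y'Y$ is worth spelling out. The short derivation below assumes that the proxies are the OLS fitted values on the full matrix $X$ of predetermined variables, $\hat Y = X(X'X)^{-1}X'Y = P_X Y$. The symbol $P_X$ is introduced here only for illustration. Since $P_X$ is symmetric and idempotent and $P_X X = X$:

$$
\begin{aligned}
X'\hat Y &= X' P_X Y = (P_X X)'Y = X'Y, \\
\hat Y'\hat Y &= Y' P_X' P_X Y = Y' P_X Y = \hat Y'Y.
\end{aligned}
$$

Selecting the rows and columns that belong to one equation then gives $X_1'\hat Y_1$ as a sub-block of $X'Y$ and $\hat Y_1'\hat Y_1$ as a sub-block of $\hat Y'Y$, so no new matrix products are needed for each equation.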

SLIDE 15

The 2nd v. of 2SLS (inverse decomposition)

To calculate $[X_1\ \hat Y_1]'y_j$:

• $X_1'y_j$ can be taken from $X'Y$, which was calculated to obtain $\Pi$.
• $\hat Y_1'y_j$ can be taken from $\hat Y'Y$.
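Putting the pieces of the second version together, one equation can be solved from the precomputed products alone. The code below is a hedged sketch of that idea, not the algorithm listing of the talk: the function name, its arguments and the index lists `x_idx` and `y_idx` are hypothetical, and linear solves replace the explicit inverses for brevity.

```python
import numpy as np

def solve_equation_v2(XtX, XtY, YhtY, x_idx, y_idx, j):
    """Coefficients of y_j on [X1, Y1_hat], using only X'X, X'Y and Y_hat'Y."""
    A = XtX[np.ix_(x_idx, x_idx)]     # X1'X1, taken from X'X
    B = XtY[np.ix_(x_idx, y_idx)]     # X1'Y1_hat, taken from X'Y
    D = YhtY[np.ix_(y_idx, y_idx)]    # Y1_hat'Y1_hat, taken from Y_hat'Y
    r_x = XtY[x_idx, j]               # X1'y_j, taken from X'Y
    r_y = YhtY[y_idx, j]              # Y1_hat'y_j, taken from Y_hat'Y

    A_inv_B = np.linalg.solve(A, B)
    A_inv_rx = np.linalg.solve(A, r_x)
    S = D - B.T @ A_inv_B             # Schur complement of A
    alpha = np.linalg.solve(S, r_y - B.T @ A_inv_rx)   # coefficients of the proxies
    gamma = A_inv_rx - A_inv_B @ alpha                 # coefficients of X1
    return gamma, alpha

# Synthetic system; the three products are formed once for all equations.
rng = np.random.default_rng(4)
X = rng.normal(size=(100, 5))
Y = X @ rng.normal(size=(5, 3)) + rng.normal(size=(100, 3))
XtX, XtY = X.T @ X, X.T @ Y
Y_hat = X @ np.linalg.solve(XtX, XtY)
YhtY = Y_hat.T @ Y

gamma, alpha = solve_equation_v2(XtX, XtY, YhtY, x_idx=[0, 1], y_idx=[1, 2], j=0)
```

The result can be compared with a direct least-squares fit of `Y[:, 0]` on `[X[:, [0, 1]], Y_hat[:, [1, 2]]]`.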

SLIDE 16

The 2nd v. of 2SLS (inverse decomposition)

Finally, the algorithm is

SLIDE 17

The 3rd v. of 2SLS (QR decomposition)

X is decomposed as QR using the Householder method, where Q is orthogonal and R is upper triangular.
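How the factorization is used in the third version is not spelled out in the slide text. One common use, shown here only as an assumption, is to obtain the first-stage coefficients and the proxies without forming $X'X$: with $X = QR$, the coefficients solve $R\,C = Q'Y$ and the fitted values are $\hat Y = QQ'Y$. The sketch relies on `numpy.linalg.qr`, which returns the reduced factors (Q with orthonormal columns rather than a square orthogonal matrix). The function name `proxies_via_qr` is hypothetical.

```python
import numpy as np

def proxies_via_qr(X, Y):
    """First-stage coefficients and proxies from a QR factorization of X."""
    Q, R = np.linalg.qr(X)            # reduced QR: Q is n x K, R is K x K upper triangular
    QtY = Q.T @ Y
    # Solve R C = Q'Y. A general solver is used here for simplicity;
    # a triangular back-substitution would exploit the structure of R.
    coef = np.linalg.solve(R, QtY)
    Y_hat = Q @ QtY                   # X C = Q R R^{-1} Q'Y = Q Q'Y
    return coef, Y_hat

rng = np.random.default_rng(5)
X = rng.normal(size=(80, 4))
Y = X @ rng.normal(size=(4, 2)) + rng.normal(size=(80, 2))
coef, Y_hat = proxies_via_qr(X, Y)
```

The coefficients agree with those from the normal equations, while avoiding the product $X'X$ and its worse conditioning.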

SLIDE 18

The 3rd v. of 2SLS (QR decomposition)

The algorithm is

SLIDE 19

SLIDE 20

Computer System

• Kefren: a cluster of 20 dual-processor Pentium Xeon 2 GHz nodes interconnected by an SCI network with a 2D torus topology in a 4 × 5 mesh. Each node has 1 GB of RAM.
• Marenostrum: a supercomputer based on PowerPC processors, the BladeCenter architecture, a Linux system and a Myrinet interconnect. Its main characteristics are: 10240 IBM PowerPC 970MP processors at 2.3 GHz (2560 JS21 blades), 20 TB of main memory, 280 + 90 TB of disk storage and a peak performance of 94.21 teraflops. Marenostrum is the most powerful supercomputer in Europe and the fifth in the world, according to the latest TOP500 list.

SLIDE 21

The first version of 2SLS

SLIDE 22

The first version of 2SLS

SLIDE 23

The 2nd v. of 2SLS (inverse decomposition)

SLIDE 24

The 2nd v. of 2SLS (inverse decomposition)

SLIDE 25

The 3rd v. of 2SLS (QR decomposition)

SLIDE 26

Comparison between the three techniques

SLIDE 27

Comparison of the precisions between the three techniques

Endogenous  Exogenous  Sample  dif. Inv-QR   dif. Inv-Normal
500         200        500     9.13657E-12   9.08442E-12
1000        400        500     3.00996E-12   3.00927E-12
1000        400        1000    4.65709E-13   4.64279E-13
1500        600        1000    2.13886E-08   2.18451E-08
1500        600        1500    2.63E-09      2.49918E-09
2000        800        1500    7.81023E-12   7.78951E-12
2000        800        2000    2.7896E-12    2.79031E-12

SLIDE 28

Conclusions and Future Work

• Sometimes a simultaneous equations model needs special software and has to be solved on high-performance systems.
• Tools will be made freely available to the scientific community.
• Application to real problems.
• Development of an algorithm to find the best model.