Charon: linear algebra made easy G. M. Crosswhite Department of - - PowerPoint PPT Presentation

SLIDE 1

The Vision Components

Charon: linear algebra made easy

G. M. Crosswhite

Department of Physics, University of Washington

Thursday, April 16, 2009

SLIDE 2

My story

1. Quantum computing!
2. Can we build a reliable memory for quantum bits?
3. Numerical simulation of quantum systems
4. New algorithms for quantum simulations
5. Can they be made to scale?

SLIDE 8

Let’s do it!

Tools we need:

  • Tensor I/O
  • Tensor contractions
  • Singular Value Decompositions (SVDs)
  • Minimum eigenvalue solving

Tools we have:

  • MPI
  • Parallel I/O
  • Global Arrays
  • ScaLAPACK
  • ARPACK

Problems:

  • Incomplete!
  • Cumbersome to use!
      • Lots of boilerplate and plumbing code needed
      • Synchronous communication model

SLIDE 12

The Vision

A parallel, linear-algebra-intensive code should not be more complicated than the algorithm being implemented.

SLIDE 13

A Simple Example

1. Read in an array
2. Increment all entries in the array by 1
3. Sum all of the entries in the array
4. Divide all entries in the array by the sum
5. Write out the array

SLIDE 14

A Simple Example

    DistributedArray<float,1> A(16,1<<20);
    A.loadFrom("infile");
    A += 1;
    float s = A.sum();
    A /= s;
    A.writeTo("outfile");
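For comparison, here is roughly what steps 2 to 4 compute on a single process. This is only a serial sketch: `std::vector` stands in for the distributed array, the helper name `normalize_after_increment` is made up for illustration and is not a Charon function, and the file I/O of steps 1 and 5 is left out.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Serial stand-in for steps 2-4 of the example: increment every entry,
// sum the entries, then divide every entry by that sum.
// Hypothetical helper, not part of Charon. Returns the sum used as divisor.
inline float normalize_after_increment(std::vector<float>& a) {
    assert(!a.empty());
    for (float& v : a) v += 1.0f;                               // A += 1
    const float s = std::accumulate(a.begin(), a.end(), 0.0f);  // s = A.sum()
    assert(s != 0.0f);
    for (float& v : a) v /= s;                                  // A /= s
    return s;
}
```

The point of the slide is that the distributed version reads almost line for line like this serial one.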

SLIDE 15

A More Complicated Example

1. Read in matrices A and B
2. Invert A and B
3. Multiply them together to form M
4. Break M apart back into A and B using a SVD
5. Invert A and B again
6. Save A and B

SLIDE 16

A More Complicated Example

    DistributedArray<float,2> A(8,1024,1024), B(8,1024,1024), M(16,1024,1024);
    DistributedArray<float,1> Sigma(16,1024);
    A.loadFrom("A.in");     // 1. read in A and B
    B.loadFrom("B.in");
    inv(A);                 // 2. invert A and B
    inv(B);
    matmul(A,B,M);          // 3. M = A * B
    svd(M,A,Sigma,B);       // 4. break M apart back into A and B
    inv(A);                 // 5. invert A and B again
    inv(B);
    A.writeTo("A.out");     // 6. save A and B
    B.writeTo("B.out");

SLIDE 17

Maestro, please!

SLIDE 27

The Recipe

Ingredients:

  • Asynchronous communication
  • Master/Slave architecture
  • Explicit ordering and data dependencies

The result:

  • Local coordination of task scheduling (Caveat: Central decisions)
  • Automatic parallelization of parallel tasks (Effectively building and walking a DAG.)
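One way to picture "building and walking a DAG": each task lists the tasks whose results it needs, and a task becomes runnable once all of those have finished. The sketch below does this on a single process with Kahn's algorithm. It is an illustration under assumptions, not Charon's scheduler: the names `Task` and `run_in_waves` are hypothetical, and grouping tasks into "waves" is just a convenient way to show which tasks could run at the same time.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// A task is identified by its index; deps lists the indices of the tasks
// whose results it consumes. Hypothetical type for illustration only.
struct Task {
    std::vector<std::size_t> deps;
};

// Walks the dependency DAG with Kahn's algorithm and returns the tasks grouped
// into waves: every task in a wave depends only on tasks in earlier waves.
inline std::vector<std::vector<std::size_t>> run_in_waves(const std::vector<Task>& tasks) {
    const std::size_t n = tasks.size();
    std::vector<std::size_t> pending(n, 0);               // unmet dependencies per task
    std::vector<std::vector<std::size_t>> dependents(n);  // reverse edges
    for (std::size_t t = 0; t < n; ++t) {
        for (std::size_t d : tasks[t].deps) {
            assert(d < n);
            dependents[d].push_back(t);
            ++pending[t];
        }
    }
    std::vector<std::size_t> ready;
    for (std::size_t t = 0; t < n; ++t)
        if (pending[t] == 0) ready.push_back(t);

    std::vector<std::vector<std::size_t>> waves;
    std::size_t done = 0;
    while (!ready.empty()) {
        waves.push_back(ready);
        std::vector<std::size_t> next;
        for (std::size_t t : ready) {
            ++done;
            for (std::size_t u : dependents[t])
                if (--pending[u] == 0) next.push_back(u);
        }
        ready = std::move(next);
    }
    if (done != n) throw std::logic_error("dependency cycle");
    return waves;
}
```

Applied to the first half of the "more complicated example" (load A, load B, invert A, invert B, multiply), the two loads land in one wave, the two inversions in the next, and the multiplication in the last.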

SLIDE 31

Why Charm++?

  • Asynchronous communication model
  • AMPI provides “virtualization” for MPI libraries
  • Emphasis on higher-level programming

SLIDE 40

The Components

  • Array Master/Slave Controller → ordered
      • Distributed Array
      • Block-cyclic Array
  • Operations
  • AMPI Master/Slave Controller → coordinated
      • BLACS Grid Master/Slave (interface to BLACS)
      • Block IO Master/Slave (interface to ROMIO)
  • Matrix multiplier

SLIDE 41

Array Master/Slave Architecture

Question: How do we enforce ordering of operations?

Answer: Operation counters + slave priority queues

Question: How do we implement the counter?

  • Global counter for all slaves
      • Commands sent to single slaves waste bandwidth!
  • Separate counter for each slave
      • Huge table needed to track counters!
      • Global operations can no longer be broadcast!
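The "operation counters + slave priority queues" answer can be sketched as follows. This is a minimal model under assumptions, not Charon's code: the `Command` and `Slave` types are hypothetical, counters start at 1, and the slave is assumed to be sent every counter value (the simplest of the cases discussed above).

```cpp
#include <cassert>
#include <cstdint>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// A command tagged with the operation counter assigned by the master.
struct Command {
    std::uint64_t counter;
    std::string name;
};

// Orders the priority queue so that the smallest counter is on top.
struct LaterCounter {
    bool operator()(const Command& a, const Command& b) const {
        return a.counter > b.counter;
    }
};

// Hypothetical slave: buffers commands that arrive early and runs them
// strictly in counter order.
class Slave {
public:
    // Called when a message arrives, in whatever order the network delivers.
    void receive(Command c) {
        assert(c.counter >= next_);  // a counter value is never replayed
        queue_.push(std::move(c));
        // Drain every command that is now next in line.
        while (!queue_.empty() && queue_.top().counter == next_) {
            executed_.push_back(queue_.top().name);
            queue_.pop();
            ++next_;
        }
    }
    const std::vector<std::string>& executed() const { return executed_; }

private:
    std::uint64_t next_ = 1;  // counter value this slave is waiting for
    std::priority_queue<Command, std::vector<Command>, LaterCounter> queue_;
    std::vector<std::string> executed_;
};
```

A command that arrives ahead of its turn simply waits in the queue until the commands with smaller counters have run.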

SLIDE 49

Array Master/Slave Architecture

My solution: Global counter with “stealing”

  • Every operation increments the global counter
  • Single slaves may “steal” a value of the counter
  • Broadcasts to all slaves contain a list of who stole the counter since the last broadcast. This way every slave knows which values it should skip and which it needs to wait for.

Example:

1. Broadcast 1 → 1, []
2. Pointcast 2 to A
3. Pointcast 3 to B
4. Broadcast 1 → 4, [A,B]
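A possible reading of the stealing scheme, written out as code. Everything here is an assumption made for illustration: the `Master` and `Broadcast` types, the function `must_wait_for`, the choice to list thieves in counter order, and the starting value 0 for "previous broadcast" are not taken from Charon.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using SlaveId = int;

// Broadcast header: the counter of the previous broadcast, the counter of this
// one, and who stole each value in between (in counter order).
struct Broadcast {
    std::uint64_t previous;
    std::uint64_t counter;
    std::vector<SlaveId> thieves;
};

// Hypothetical master-side bookkeeping.
class Master {
public:
    // A command for a single slave "steals" the next counter value.
    std::uint64_t pointcast(SlaveId target) {
        thieves_.push_back(target);
        return ++counter_;
    }
    // A command for everyone takes the next value and reports the thefts.
    Broadcast broadcast() {
        Broadcast b{last_broadcast_, ++counter_, std::move(thieves_)};
        thieves_.clear();  // moved-from vector: reset it to a known empty state
        last_broadcast_ = b.counter;
        return b;
    }

private:
    std::uint64_t counter_ = 0;
    std::uint64_t last_broadcast_ = 0;
    std::vector<SlaveId> thieves_;
};

// Slave-side view: the counter values between two broadcasts that this slave
// must wait for; every other value in the gap can be skipped.
inline std::vector<std::uint64_t> must_wait_for(const Broadcast& b, SlaveId self) {
    assert(b.previous + b.thieves.size() + 1 == b.counter);
    std::vector<std::uint64_t> mine;
    for (std::size_t i = 0; i < b.thieves.size(); ++i)
        if (b.thieves[i] == self) mine.push_back(b.previous + 1 + i);
    return mine;
}
```

With slaves A, B and a third slave C, replaying the example gives counter 2 to A and counter 3 to B; the second broadcast then tells A to wait for 2 and skip 3, B to skip 2 and wait for 3, and C to skip both. No per-slave counter table is needed, and global operations are still a single broadcast.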

SLIDE 54

DistributedArray

Templated on type and dimension

Master:

  • Operation ordering
  • Addressing – map from coordinates to slave number

Slaves: (1D array)

  • Operation execution
  • Local data stored using Blitz++, a high-level array class templated on type and dimension
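The slides do not say which rule the master uses to map coordinates to a slave number. As a sketch of what such an addressing map could look like, the code below flattens the coordinates in row-major order and hands out contiguous chunks to the slaves. The rule itself and the names `Address` and `locate` are assumptions for illustration, not Charon's implementation.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Where one element lives: which slave owns it and its offset in that
// slave's local storage.
struct Address {
    std::size_t slave;
    std::size_t offset;
};

// Hypothetical addressing rule (not taken from Charon): flatten the
// coordinates in row-major order, then give contiguous chunks of
// ceil(total / num_slaves) elements to slaves 0, 1, 2, ...
template <std::size_t N>
Address locate(const std::array<std::size_t, N>& shape,
               const std::array<std::size_t, N>& coords,
               std::size_t num_slaves) {
    assert(num_slaves > 0);
    std::size_t flat = 0, total = 1;
    for (std::size_t d = 0; d < N; ++d) {
        assert(coords[d] < shape[d]);
        flat = flat * shape[d] + coords[d];
        total *= shape[d];
    }
    const std::size_t chunk = (total + num_slaves - 1) / num_slaves;
    return Address{flat / chunk, flat % chunk};
}
```

For a 2 × 3 × 4 array spread over 6 slaves, this rule puts 4 elements on each slave, so element (0,1,2) has flat index 6 and lands on slave 1 at offset 2.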

SLIDE 58

Operations

  • Whole-array transformations
      • add/subtract/divide/multiply by a constant
      • sine, cosine, absolute value
      • randomization
      • etc.
  • Single-element transformations
  • Array reductions (sum, product, etc.)

Stupidly easy to implement more:

  • Add to one of my switch statements
  • Subclass Operation

Templates are your friend!

  • Don’t need to case over the type of the array.
  • Not limited to numeric types / operations!
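To make "Subclass Operation" and "Templates are your friend" concrete, here is a small sketch of the pattern. The interface shown is an assumption: Charon's real `Operation` class is not reproduced in these slides, and `AddConstant`, `Abs` and `run_all` are hypothetical names. `std::vector<T>` stands in for a slave's local chunk.

```cpp
#include <cassert>
#include <cmath>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical base class: one operation applied to a slave's local chunk.
// Templating on the element type T means no switch over array types.
template <typename T>
struct Operation {
    virtual ~Operation() = default;
    virtual void apply(std::vector<T>& local) const = 0;
};

// Whole-array transformation: add a constant to every element.
// Works for any T with operator+=, including non-numeric types.
template <typename T>
struct AddConstant : Operation<T> {
    explicit AddConstant(T c) : c_(c) {}
    void apply(std::vector<T>& local) const override {
        for (T& v : local) v += c_;
    }
    T c_;
};

// Whole-array transformation: absolute value, for arithmetic element types.
template <typename T>
struct Abs : Operation<T> {
    void apply(std::vector<T>& local) const override {
        for (T& v : local) v = static_cast<T>(std::abs(v));
    }
};

// Applies a sequence of operations to one local chunk, in order.
template <typename T>
void run_all(const std::vector<const Operation<T>*>& ops, std::vector<T>& local) {
    for (const Operation<T>* op : ops) {
        assert(op != nullptr);
        op->apply(local);
    }
}
```

Because `AddConstant` only needs `+=`, instantiating it with `std::string` appends a suffix to every element, which is the "not limited to numeric types" point in miniature.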

SLIDE 61

Example

    DistributedArray<int,3> vec(6,2,3,4);
    Array<int,3> x(2,3,4);
    x = 1, 2, 3, 4, 5, 6, 7, 8, 9,10,11,12,
        -1,-2,-3,-5,
        -7,-9,-11,-13,
        -17,-19,-23,-29,
        -31,-37,-41,-43;
    vec = x;

SLIDE 62

Example

    vec++;
    vec *= -1;
    cout << "0,0,0 = " << (int)vec(0,0,0) << endl;
    for(int i = 0; i < 2; i++)
        for(int j = 0; j < 3; j++)
            for(int k = 0; k < 4; k += 2)
                vec(i,j,k) *= -1;
    vec.abs();
    vec.gatherInto(x);

SLIDE 63

AMPI Controller

Purpose: To allow access to libraries written in MPI

  • BLACS – opens the door to parallel linear algebra libraries such as ScaLAPACK
  • ROMIO – parallel I/O

Contains no data itself

Requires coordination but not ordering

Uses TCharm to provide the virtual MPI layer

  • Slaves inherit from TCharm
  • Constructor launches the TCharm thread
  • TCharm thread pulls operations from a “ready” queue until none are left, then it goes to sleep
  • Slave chare wakes up the thread when a new operation is ready to be run
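The sleep/wake handshake between the slave chare and its thread can be modelled without any threading library. The sketch below is a deliberately simplified, single-threaded state machine: the class `ReadyQueueWorker` and its methods are hypothetical, a state flag stands in for the real TCharm thread, and no Charm++ or AMPI calls are used.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical model of one AMPI slave: the "thread" is represented by a
// state flag instead of a real TCharm thread.
class ReadyQueueWorker {
public:
    enum class State { Asleep, Awake };

    // Slave side: queue an operation and wake the worker if it is sleeping.
    void post(std::function<void()> op) {
        assert(static_cast<bool>(op));
        ready_.push_back(std::move(op));
        if (state_ == State::Asleep) {
            state_ = State::Awake;
            ++wakeups_;
        }
    }

    // Thread side: run operations until the queue is empty, then sleep.
    void run_until_idle() {
        while (state_ == State::Awake && !ready_.empty()) {
            std::function<void()> op = std::move(ready_.front());
            ready_.pop_front();
            op();  // may call post() again; the new operation is picked up here
        }
        state_ = State::Asleep;
    }

    State state() const { return state_; }
    int wakeups() const { return wakeups_; }

private:
    std::deque<std::function<void()>> ready_;
    State state_ = State::Asleep;
    int wakeups_ = 0;
};
```

Posting two operations while the worker is asleep wakes it only once; it then drains both and goes back to sleep until the next `post`.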

SLIDE 68

Privateer

Problem: Most C libraries are not designed to be multi-threaded! They assume that they have exclusive access to their global/static variables.

Solution: Replace all references to global/static variables with pointers into a thread-local structure.

Privateer accomplishes this, and is designed to work on arbitrary C code; it uses a Converse thread-private variable to store a pointer to the global variable table for the thread.

http://launchpad.net/privateer
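The kind of rewrite described here can be illustrated by hand on a tiny library function. This is not Privateer's actual output: the names `GlobalsTable`, `current_globals` and `next_id` are made up, and a plain C++ `thread_local` pointer stands in for the Converse thread-private variable.

```cpp
#include <cassert>

// Before the transformation a library might contain:
//     static int call_count = 0;
//     int next_id(void) { return ++call_count; }

// After: every former global lives in a per-thread table...
struct GlobalsTable {
    int call_count = 0;
};

// ...and the library reaches it through a thread-private pointer. Plain C++
// thread_local stands in here for the Converse thread-private variable.
thread_local GlobalsTable* current_globals = nullptr;

// The rewritten library function: same logic, but the former global is
// accessed through the table of whichever thread is currently running.
inline int next_id() {
    assert(current_globals != nullptr);
    return ++current_globals->call_count;
}
```

Pointing `current_globals` at a different table, as a user-level thread switch would, gives each virtual MPI rank its own independent `call_count`.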

SLIDE 71

Matrix Multiplication

$$\sum_j A_{ij} \cdot B_{jk} = C_{ik}$$

  • Chare $(i, j, k)$ computes $A_{ij} \cdot B_{jk}$, and then contributes to a sum reduction on the section $(i, :, k)$.
  • Chunks sent in an ArrayMessage to minimize copying.
  • Each chare only needs to know:
      • Reduction section
      • Result callback
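A serial sketch of the same decomposition may help: one function plays the role of chare (i, j, k), and a loop over j plays the role of the sum reduction on the section (i, :, k). The `Matrix` type and both function names are hypothetical, and no Charm++ messaging or reduction API is used.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Dense row-major matrix; each chunk held by a chare is one of these.
struct Matrix {
    std::size_t rows, cols;
    std::vector<double> data;
    Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0) {}
    double& at(std::size_t r, std::size_t c) { return data[r * cols + c]; }
    double at(std::size_t r, std::size_t c) const { return data[r * cols + c]; }
};

// What chare (i, j, k) computes: the partial product of chunks A_ij and B_jk.
inline Matrix partial_product(const Matrix& a_ij, const Matrix& b_jk) {
    assert(a_ij.cols == b_jk.rows);
    Matrix p(a_ij.rows, b_jk.cols);
    for (std::size_t r = 0; r < a_ij.rows; ++r)
        for (std::size_t m = 0; m < a_ij.cols; ++m)
            for (std::size_t c = 0; c < b_jk.cols; ++c)
                p.at(r, c) += a_ij.at(r, m) * b_jk.at(m, c);
    return p;
}

// Serial stand-in for the sum reduction over the section (i, :, k):
// C_ik is the element-wise sum of the partial products for every j.
inline Matrix reduce_section(const std::vector<Matrix>& a_i,    // A_ij for all j
                             const std::vector<Matrix>& b_k) {  // B_jk for all j
    assert(!a_i.empty() && a_i.size() == b_k.size());
    Matrix c_ik(a_i[0].rows, b_k[0].cols);
    for (std::size_t j = 0; j < a_i.size(); ++j) {
        const Matrix p = partial_product(a_i[j], b_k[j]);
        for (std::size_t n = 0; n < c_ik.data.size(); ++n)
            c_ik.data[n] += p.data[n];
    }
    return c_ik;
}
```

In the distributed version every call to `partial_product` is independent, so all (i, j, k) chares can work at once and only the final sums require communication.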

SLIDE 75

Matrix Multiplication

A · B · C · D = (A · B) · (C · D)

SLIDE 76

Closing

Status:

  • Basic infrastructure largely complete; lots of simple operations could easily be implemented
  • Still wrestling with getting libraries to work under AMPI (Global variable privatization problem has been solved by Privateer.)
  • Need to profile matrix multiplication algorithm and extend to implement tensor contractions

Repositories Online:

  • http://launchpad.net/privateer
  • http://launchpad.net/charon

Please let me know what you think and/or want!
