COSC 420 – High-Performance Computing

Lab 3

1 Preliminaries

In scientific and industrial applications, it is exceedingly common to encounter data models that can be

represented by a linear system of equations

a1,1×1 + a1,2×2 + · · · + a1,mxm = b1

a2,1×1 + a2,2×2 + · · · + a2,mxm = b2

.

.

.

an,1×1 + an,2×2 + · · · + an,mxm = bn

which can, of course, be captured by the more concise matrix formula Ax = b, where A ∈ R

n×m, x ∈ R

m,

and b ∈ R

n (we will typically interpret vectors as “column-vectors” in such equations, as above; at other

times, we may need the notation x

T

to denote that the vector is being used in a row-wise fashion).

One method of solving such equations is the Gauss-Jordan elimination algorithm, which proceeds as

follows:

Algorithm 1 Gauss-Jordan Elimination

1: function GaussJordan(A ∈ R

n×m, b ∈ R

n)

2: for Pivot row k = 1, 2, . . . , n do

3: Compute the vector of scalings `

k

i = Ai,k/Ak,k ∀i = 1, . . . , n.

4: Broadcast the vector ` and perform the following on n nodes:

5: for Each row r 6= k do

6: for Each column c = 1, 2, . . . m do

7: Ar,c ← Ar,c − `

k

r

· Ak,c

8: end for

9: for Each column c of b do

10: br,c ← br,c − `

k

r

· bk,c

11: end for

12: end for

13: end for

14: Scale each row of A and b by the diagonal elements of A, so that the diagonal entries of A are now

all 1, and off-diagonal entries are 0

15: The vector b left over is the solution vector for x that solves the linear system. Verify this by using

matrix-vector multiplication! That is, once you solve for x, use your matrix multiplication routine to

check that Ax results in the b vector.

16: end function

1

The Gauss-Jordan elimination algorithm can also be used to compute the inverse of a matrix. That is,

for matrix A, find the matrix A−1

so that AAT = In, the n-dimensional identity matrix (the n × n matrix

with diagonal entries equal to 1 and off-diagonal entries equal to 0). To do this, replace the vector b with In

and perform the same algorithm! You will now essentially solve the linear matrix equation AB = In.

2 Objectives

In this lab you will focus on the following objectives:

1. Develop familiarity with C programming language

2. Develop familiarity with parallel computing tools MPICH and OpenMPI

3. Develop familiarity with linear algebra and solving linear systems of equations.

4. Explore empirical tests for program efficiency in terms of parallel computation and resources

3 Tasks

1. You may work in groups of one or two to complete this lab. Be sure to document in comments and

your README who the group members are.

2. First, take steps to improve the performance of your matrix processing library from last lab. Use

benchmarking (timing, counting operations, etc.) to demonstrate the change in performance (FLOPS,

data processed per second, etc.) versus the old version of your code. Record which programming

techniques you used to achieve this speedup, whether you use cache blocking, different parallelism

methods, etc.

3. Add to your library methods to solve linear systems of equations, and to compute the inverse of a

matrix, by using Gauss-Jordan elimination. For inversion, you may restrict the input to be only square

matrices. Utilize MPI to parallelize the algorithm, as noted in the pseudocode above.

4. Allow your code to run on file input, using the MPI_File family of functions to efficiently distribute

the matrix data for the required operations. Then save the result (and possibly any intermediate data)

to a (shared) file.

5. Note that for debugging output, it may be helpful to have each node write its output to a file, unique

to that process. This will be how we will gather logs when submitting to larger clusters.

6. Test the program on some large matrices (at least tens of thousands of rows/columns, perhaps more).

(a) Run each input several and measure the time to complete each.

i. You can use the MPI “Walltime” function to measure the time taken to get between two

points in a body of code:

double startTime, stopTime;

startTime = MPI_Wtime();

/* Do some work here */

stopTime = MPI_Wtime();

printf(“MPI_Wtime measured: %1.2f seconds\n”, stopTime-startTime);

fflush(stdout); // manually flush the buffer for safety

ii. For a more basic timing method, you can use the features of time.h, e.g.:

2

clock_t begin = clock();

/* here, do your time-consuming job */

clock_t end = clock();

// clock() returns a clock_t and

// CLOCKS_PER_SEC is used to convert it back to seconds

double time_spent = (double)(end – begin) / CLOCKS_PER_SEC;

However, be aware that this does not measure real-time. It only measures the time that the

OS spends on your process, which means it may be shorter than your real-world observation,

for instance with the time shell command.

(b) Record the averages of each, report them in a clean tabular format

(c) Be sure to use tools such as valgrind and gdb to find and fix bugs with your code. Make sure

there are not memory leaks, invalid access, or usage of undefined variables!

7. Include a README file to document your code, any interesting design choices you made, and answer the

following questions:

(a) What is the theoretical time complexity of your algorithms (best and worst case), in terms of the

input size?

(b) According to the data, does adding more nodes perfectly divide the time taken by the program?

(c) What are some real-world software examples that would need the above routines? Why? Would

they benefit greatly from using your distributed code?

(d) How could the code be improved in terms of usability, efficiency, and robustness?

4 Submission

All submitted labs must compile with mpicc and run on the COSC Linux environment. Include a Makefile

to build your code. Upload your project files to MyClasses in a single .zip file. Finally, turn in (stapled)

printouts of your source code, properly commented and formatted with your name, course number, and

complete description of the code and constituent subroutines. Also turn in printouts reflecting several

different runs of your program (you can copy/past from the terminal output window). Be sure to test

different situations, show how the program handles erroneous input and different edge cases.

5 Bonus

1. (10 pts) Use MPI file views and custom datatypes to aid the distribution of your matrix data before

and after processing.

3