COSC 420 – High-Performance Computing

Lab 2

1 Preliminaries

Recall that in the mathematical definition of a matrix, we say that A is a real n × m matrix if

A =

a1,1 a1,2 a1,3 . . . a1,m

a2,1 a2,2 a2,3 . . . a2,m

.

.

.

.

.

.

.

.

.

an,1 an,2 an,3 . . . an,m

where each ai,j ∈ R for i = 1, 2, 3, . . . , n and j = 1, 2, 3, . . . , m. In this notation, the matrix has n rows and

m columns.

If A and B are both of dimension n × m, then the sum is the n × m matrix C with entries given by

Ci,j = ai,j + bi,j

for i = 1, 2, . . . , n and j = 1, 2, . . . , m.

If A is an n × m matrix and B is an m × k matrix, then the product AB is matrix C, where

Ci,j =

Xm

l=1

ai,lbl,j

for i = 1, 2, . . . , n, and j = 1, 2, . . . , k; so C is a new n × k matrix. Thus the (i, j) entry of C is, in another

manner of speaking, the ith row of A, dot product with the jth column of B.

The transpose of n × m matrix A is a m × n matrix denoted as AT and defined such that (AT

)ij = Aji.

Your task below is to design and implement a data structure in C to model a matrix of real numbers,

then to define the operations of 1) addition, 2) subtraction, and 3) multiplication, and 4) transpose.

To program with matrices in C: Recall that in c++, one may declare multi-dimensional arrays. For

example: int arr[3][5] declares an array of three five-element integer arrays, containing fifteen total

integers. One way to visualize this is as follows:

[

[1, 2, 3, 4, 5], // arr[0]

[6, 7, 8, 9, 10], // arr[1]

[11,12,13,14,15] // arr[2]

]

To access the second element in the third array, one would use the syntax arr[2][1].

However, in the C language, it becomes cumbersome and less efficient to use the two-dimensional style,

due to the lack of constructors, destructors, etc. Instead, a more conventional technique is to allocate a

1

single-dimension array large enough to hold all n · m elements. Then, to access the (i, j)th element, one

needs to work a bit harder; the code really needs then to access element i· n + j. To make this a bit simpler,

a recommended helper is to use a macro (instead of a function, for efficiency).

// set up a C macro to calculate the location of element (i,j)

#define INDEX(n,m,i,j) m*i + j

// .. some code to set up sizes n and m

int* A = malloc(n*m*sizeof(int));

// populate the arrayx with random numbers, using matrix-style indexing

for(int i=0; i<n; i++){

for(int j=0; j<m; j++){

A[INDEX(n,m,i,j)] = rand(); // the compiler substitutes inside the [] with “n*i + j”

}

}

// the matrix is (much) easier to de-allocate as well

free(A);

2 Objectives

In this lab you will focus on the following objectives:

1. Develop familiarity with C programming language

2. Develop familiarity with parallel computing tools MPICH and OpenMPI

3. Explore empirical tests for program efficiency in terms of parallel computation and resources

3 Tasks

1. You may work in groups of one or two to complete this lab. Be sure to document in comments and

your README who the group members are.

2. Write a short program to practice using MPI_Scatter and MPI_Reduce where:

(a) A root node generates two random vectors in high dimension (thousands)

(b) “Blocks” of those vectors are then scattered to various nodes, each of which performs a partial

inner product

(c) The partial inner products are then combined by the root and reported to the user

3. Write a library to perform the basic matrix operations of addition, subtraction, multiplication, and

transpose.

(a) Distribute the tasks of addition, by splitting the matrices into “blocks”

(b) For multiplication, each member of the result matrix can be calculated independently, given

enough parts of the argument matrices. Consider ways to do this distribution efficiently, keeping

in mind the cost of replicating the matrices. Be sure to document your procedure and record its

performance metrics.

(c) Use MPI_Scatter and its variants to distribute matrix data (possibly re-using the code from

before).

2

4. Note that for debugging output, it may be helpful to have each node write its output to a file, unique

to that process. This will be how we will gather logs when submitting to larger clusters. For example:

// using the fopen utility to get a file handle and send it data through a buffer

// use the stdio.h libraries to send output through buffers

// NB: the fflush(FILE* fh) function can force a buffer flush through the file

FILE *handle;

char* fname[256]; // should be big enough 🙂

sprintf(fname, “outfile_%d.txt”, myRank);

handle = fopen(fname, “rw”);

fprintf(handle, “Begin output from processor %d”, myRank);

puts(“Can also use puts!”, handle);

fclose(handle); // release the file

5. Test the program on some large matrices (at least tens of thousands of rows/columns, perhaps more).

(a) Run each input several times and use the time command to measure the time to complete each

(b) Record the averages of each, report them in a clean tabular format

(c) Be sure to use tools such as valgrind and gdb to find and fix bugs with your code. Make sure

there are not memory leaks, invalid access, or usage of undefined variables!

6. Include a README file to document your code, any interesting design choices you made, and answer the

following questions:

(a) What is the theoretical time complexity of your algorithms (best and worst case), in terms of the

input size?

(b) According to the data, does adding more nodes perfectly divide the time taken by the program?

(c) What are some real-world software examples that would need the above routines? Why? Would

they benefit greatly from using your distributed code?

(d) How could the code be improved in terms of usability, efficiency, and robustness?

4 Submission

All submitted labs must compile with mpicc and run on the COSC Linux environment. Include a Makefile

to build your code. Include the output from test cases to demonstrate the correctness and completeness of

your program (the script) utility can be of help here, as well as sending output directly to a file. Upload

your project files to MyClasses in a single .zip file.

3