CSCE 689-600

HW 1: Parallel Programming with MPI

Compile and execute the program in the file compute_pi_mpi.c, which computes an estimate of π using the parallel algorithm discussed in class. The program is available on the shared Google Drive for this class. It should be compiled and executed on either ada.tamu.edu or terra.tamu.edu.

Load the Intel software stack prior to compiling and executing the code.

module load intel/2017A

To compile, use the command:

mpiicc -o compute_pi_mpi.exe compute_pi_mpi.c

To execute the program, use

mpirun -np <p> ./compute_pi_mpi.exe <n>

where <n> represents the number of intervals and <p> represents the number of processes.

The output of a sample run is shown below.

mpirun -np 4 compute_pi_mpi.exe 100000000

n = 100000000, p = 4, pi = 3.1415926535897749, relative error = 5.80e-15, time (sec) = 0.0608

Measure the run time of the code when it is executed in dedicated mode. On ADA, use the batch file compute_pi_mpi.job to execute the code in dedicated mode with the following command:

bsub < compute_pi_mpi.job

On Terra, use the batch file compute_pi.terra_job instead; the corresponding command is:

sbatch compute_pi.terra_job

Execute the code for n = 10^8 with p = 2^k for k = 0, 1, …, 6 (that is, p = 1, 2, 4, …, 64). Specify ptile=4 in the job file. Using the data from these runs, answer the following questions.

1. (10 points) Plot execution time versus p to demonstrate how time varies with the number of processes. Use a logarithmic scale for the x-axis.

2. (10 points) Plot speedup versus p to demonstrate the change in speedup with p.

3. (5 points) Using the definition efficiency = speedup/p, plot efficiency versus p to show how efficiency changes as the number of processes increases.

4. (5 points) What value of p minimizes the parallel runtime?

5. (10 points) With n = 10^9 and p = 64, determine the value of ptile that minimizes total_time. Plot time versus ptile to illustrate your experimental results for this question.

6. (10 points) Repeat the experiments with p = 64 for n = 10^2, 10^4, 10^6, and 10^8.

a. Plot the observed speedup relative to p = 1 versus n.

b. Plot the relative error versus n to illustrate the accuracy of the algorithm as a function of n.

Submission: Upload a single PDF or MS Word document with your answers to eCampus.