MPI - Blocking point-to-point communication
Objectives
Blocking point-to-point communication
Different send modes in MPI: MPI_Send, MPI_Ssend, MPI_Bsend, MPI_Rsend
Explore point-to-point communication with two MPI processes playing ping pong
Instructor note
45 min teaching
75 min exercises
Note
MPI Standard: MPI: A Message-Passing Interface Standard Version 5.0 (PDF)
The simplest method to communicate with MPI is point-to-point communication between two specific processes, a sender and a receiver. Both processes actively participate in this form of communication where the sender must execute some send function while the receiver executes some receive function. Furthermore, both processes must be in the same communicator and need information about the communication partner (source or destination rank) as well as a tag that helps to identify the message. MPI is equipped with two flavors of point-to-point communication: blocking and nonblocking.
Blocking point-to-point communication
With blocking communication, a process may or may not have to wait until its communication partner is ready to engage in the communication. A blocking send or receive call suspends the executing process until the send buffer can be reused (or changed) or until the receive buffer has actually been filled. After a blocking send, the process continues only once the data to be sent has been copied out of the send buffer. This does not mean, however, that the data has been received at the destination process. For a blocking receive, completion implies that the data transfer has happened and the data has been copied into the receive buffer, so it is safe to use.
Communication Modes
For blocking point-to-point communication, the MPI standard defines four modes of communication with subtle differences in their semantics:
| Sending mode | Function | Remark |
|---|---|---|
| Standard | MPI_Send | recommended for production runs |
| Synchronous | MPI_Ssend | recommended for debugging |
| Buffered | MPI_Bsend | recommended to use nonblocking communication instead |
| Ready | MPI_Rsend | dangerous, for experts only |

| Receiving mode | Function | Remark |
|---|---|---|
| Standard | MPI_Recv | only one mode needed (fits all sends) |
Standard Mode
Standard mode uses either a synchronous or an asynchronous protocol. The MPI library decides which one to use depending on the message size and handles the asynchronous protocol transparently. When the synchronous protocol is used, there is a risk of deadlocks and serialization. Standard mode is recommended for production runs.
Synchronous Send
Synchronous send is the most stringent communication mode: the send can complete only once the receiving process has provided a matching receive, which is similar to accepting a handshake. This means that the receiving process has to declare its readiness to receive the message. Ideally, every MPI program still works correctly when standard send is replaced with synchronous send. If it is used incorrectly, however, it can lead to deadlocks and serialization. The use case for this mode is debugging.
Buffered Send
Buffered send copies the data from the send buffer to a buffer that has to be managed by the programmer and then returns. Once a matching receive has been posted, the data is transmitted over the network from the user-managed buffer. Naturally, this requires an additional buffer and an extra transfer between the buffers. However, this communication mode is local: its completion does not depend on the occurrence of a matching receive. It also requires the programmer to attach and detach the user-managed buffer, where the detach call blocks until all data in the buffer has been transmitted. We are not going to show this here, as nonblocking communication can accomplish the same goal in a more elegant way.
Ready Send
Ready send may be used only if the matching receive has already been posted; under that assumption the send call can complete immediately. If this is not the case, the behavior is undefined and might give wrong results. This mode has the potential to be the fastest, but it should be handled with utmost care and used only when the control flow of the parallel program permits it. It is a rather advanced mode of communication.
Hands-on labs
Explore point-to-point communication with two MPI processes playing ping pong:
Step by step (according to the pictures below):
ping - rank 0 sends a message (ping) to rank 1 and rank 1 receives it
pingpong - after receiving the ping, rank 1 sends a message (pong) back to rank 0 and rank 0 receives it
timing - repeat the ping pong in a loop and add timing calls before and after the loop
warmup - don’t forget to warm up and do one ping pong before starting the timed loop
finish - who wins the race?

Let’s write a ping pong benchmark (to measure the latency) step by step
Only two MPI processes (rank 0 and rank 1) will be needed (mpirun -np 2 …)
For the ping pong exercise, we’ll adopt the MPMD (multiple program multiple data) approach
The ping pong ball will be 1 float (but the value is not of interest)
Be careful: in the end, the two MPI processes should play ping pong with only one ball (not ping-ping pong-pong with two balls)
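The message pattern of a single ping pong, with one ball travelling from rank 0 to rank 1 and back, can be sketched without MPI. The block below is only a standard-library stand-in and not MPI code: two Python threads play the roles of rank 0 and rank 1, and two `queue.Queue` objects (hypothetical names `to_rank1` and `to_rank0`) stand in for the ping and pong messages. The blocking `get()` call is used here as a rough analogue of a blocking receive.

```python
import queue
import threading

# NOT MPI: threads stand in for ranks, queues stand in for messages.
to_rank1 = queue.Queue()   # carries the ping (tag 17 in the exercise)
to_rank0 = queue.Queue()   # carries the pong (tag 23 in the exercise)
events = []                # shared log of what happened, in order


def rank0():
    ball = 0.0                          # the value is not of interest
    events.append("0 before send ping")
    to_rank1.put(ball)                  # ping
    ball = to_rank0.get()               # blocks until the pong arrives
    events.append("0 after recv pong")


def rank1():
    ball = to_rank1.get()               # blocks until the ping arrives
    events.append("1 after recv ping")
    events.append("1 before send pong")
    to_rank0.put(ball)                  # pong: the same ball goes back


def play_one_ping_pong():
    """Run one ping pong and return the ordered event log."""
    events.clear()
    players = [threading.Thread(target=rank0), threading.Thread(target=rank1)]
    for player in players:
        player.start()
    for player in players:
        player.join()
    return list(events)


if __name__ == "__main__":
    for line in play_one_ping_pong():
        print("I am", line)
```

Only one ball is ever in flight: rank 1 does nothing until the ping has arrived, and rank 0 does nothing after its send until the pong has arrived. The MPI version in the exercises below follows the same order of operations.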
Note
MPI_Send(&buf, count, datatype, dest, tag, comm)
blocking send procedure (other send modes have the same syntax)
source rank sends the message defined by (buf, count, datatype) to the dest(ination) rank
IN buf initial address of send buffer (choice)
IN count number of elements in send buffer (non-negative integer)
IN datatype datatype of each send buffer element (handle)
IN dest rank of destination (integer)
IN tag message tag (integer)
IN comm communicator (handle)
C binding
int MPI_Send(const void *buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm)
Usage: MPI_Send(&buffer, 1, MPI_FLOAT, 1, 17, MPI_COMM_WORLD);
Fortran 2008 binding
MPI_Send(buf, count, datatype, dest, tag, comm, ierror)
TYPE(*), DIMENSION(..), INTENT(IN) :: buf
INTEGER, INTENT(IN) :: count, dest, tag
TYPE(MPI_Datatype), INTENT(IN) :: datatype
TYPE(MPI_Comm), INTENT(IN) :: comm
INTEGER, OPTIONAL, INTENT(OUT) :: ierror
Usage: call MPI_Send(buffer, 1, MPI_REAL, 1, 17, MPI_COMM_WORLD)
Pickling
Pickling refers to the serialization and deserialization of Python objects, which is commonly done with the help of the pickle module. Before objects can be stored or transferred over a network, they need to be converted into a byte stream that preserves the objects’ structure. The inverse process converts this byte stream back into an object that is equivalent to the original. mpi4py provides pickle-based communication of generic Python objects as well as direct array data communication of buffer-provider objects, such as NumPy arrays. Communication functions with all-lowercase names are meant for generic pickled objects, while those starting with an upper-case letter are used for buffer-like objects.
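The round trip described above can be tried with the standard `pickle` module alone, without MPI. This is a minimal sketch; the dictionary `ball` and its contents are made up for illustration and only stand in for whatever generic Python object would be passed to a lowercase communication function.

```python
import pickle

# A generic Python object (hypothetical example data).
ball = {"name": "ping", "value": 0.0, "hops": [0, 1]}

# Serialization: object -> byte stream
payload = pickle.dumps(ball)

# Deserialization: byte stream -> new object with the same structure
restored = pickle.loads(payload)

print(type(payload).__name__)   # the byte stream is a bytes object
print(restored == ball)         # same structure and values
print(restored is ball)         # a copy, not the very same object
```

The restored object compares equal to the original but is a separate copy, which is what one would expect after a transfer to another process.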
Note:
MPI_Send (standard send) is recommended for production runs (best speed)
--> let the MPI library decide how to best transfer the message (same risks as MPI_Ssend)
MPI_Ssend (synchronous send) is recommended for debugging (helps to detect deadlocks)
--> completes only when the receive has started
--> risk of deadlocks and serializations
MPI_Bsend (buffered send) --> not recommended since unnecessarily complicated
--> it’s recommended to use MPI_Send or nonblocking communication instead
MPI_Rsend (ready send) --> not recommended because it’s highly dangerous to get it wrong
--> it may be started only after the matching receive is already posted (needs additional guarantees)
Note
MPI_Recv(&buf, count, datatype, source, tag, comm, &status)
blocking receive procedure
dest(ination) rank receives a message from the source rank and stores it at (buf, count, datatype)
OUT buf initial address of receive buffer (choice)
IN count number of elements in receive buffer (non-negative integer)
IN datatype datatype of each receive buffer element (handle)
IN source rank of source or MPI_ANY_SOURCE (integer)
IN tag message tag or MPI_ANY_TAG (integer)
IN comm communicator (handle)
OUT status status object (status)
C binding
int MPI_Recv(void *buf, int count, MPI_Datatype datatype, int source, int tag, MPI_Comm comm, MPI_Status *status)
Usage: MPI_Recv(&buffer, 1, MPI_FLOAT, 0, 17, MPI_COMM_WORLD, &status);
MPI 5.0 Table 3.2 - Predefined MPI datatypes corresponding to C datatypes
Fortran 2008 binding
MPI_Recv(buf, count, datatype, source, tag, comm, status, ierror)
TYPE(*), DIMENSION(..) :: buf
INTEGER, INTENT(IN) :: count, source, tag
TYPE(MPI_Datatype), INTENT(IN) :: datatype
TYPE(MPI_Comm), INTENT(IN) :: comm
TYPE(MPI_Status) :: status
INTEGER, OPTIONAL, INTENT(OUT) :: ierror
Usage: call MPI_Recv(buffer, 1, MPI_REAL, 0, 17, MPI_COMM_WORLD, status)
MPI 5.0 Table 3.1 - Predefined MPI datatypes corresponding to Fortran datatypes
MPI 5.0 Table 3.2 - Predefined MPI datatypes corresponding to C datatypes
In Python use with MPI.FLOAT
Note:
MPI_Recv completes when the message has arrived
–> only one receive mode is needed that works together with all 4 send modes
1. ping
Exercise
The very first ping:
Modify the code below such that:
rank 0 sends a message (ping) to rank 1
rank 1 receives the message (ping) from rank 0
the message (ping pong ball) shall be 1 float and please use tag=17 for the ping
What happens if you do NOT modify the code below? Try it out!
You can compile and execute part 1 without modifying the code below.
Give it a try before you actually modify.
What happens here? Why is this possible at all?
Of course, before you can proceed to the next step (2), you have to modify.
#include <stdio.h>
#include <mpi.h>
int main(int argc, char *argv[])
{
    int i, rank;
    float buffer[1];
    MPI_Status status;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    printf("I am %i before send ping \n", rank);
    printf("I am %i after recv ping \n", rank);
    MPI_Finalize();
}
program ping
    use mpi_f08
    implicit none
    type(MPI_Status) :: status
    real :: buffer(1)
    integer :: i, rank
    call MPI_Init()
    call MPI_Comm_rank(MPI_COMM_WORLD, rank)
    write(*,*) 'I am ', rank, ' before send ping'
    write(*,*) 'I am ', rank, ' after recv ping'
    call MPI_Finalize()
end program
from mpi4py import MPI
comm_world = MPI.COMM_WORLD
my_rank = comm_world.Get_rank()
buffer = [ None ]
print(f"I am {my_rank} before send ping")
print(f"I am {my_rank} after recv ping ")
Compile:
mpicc ping.c -o ping
mpif90 ping.f90 -o ping
Run:
mpirun -np 2 ./ping
mpirun -np 2 python3 ./ping.py
Expected output:
I am 0 before send ping
I am 1 after recv ping
Unexpected output - but still correct - do you remember why this might happen?
I am 1 after recv ping
I am 0 before send ping
Tip
Seeing more than 2 output lines?
If you are seeing more than 2 output lines, please modify / correct the code above.
If you have not yet modified it you will see 4 (2 x number of MPI processes) lines of output, i.e., each MPI process runs the whole code which has 2 print statements.
Solution (please try to solve the exercise by yourself before looking at the solution)
Solution
#include <stdio.h>
#include <mpi.h>
int main(int argc, char *argv[])
{
    int i, rank;
    float buffer[1];
    MPI_Status status;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0)
    {
        printf("I am %i before send ping \n", rank);
        MPI_Send(buffer, 1, MPI_FLOAT, 1, 17, MPI_COMM_WORLD);
    }
    else if (rank == 1)
    {
        MPI_Recv(buffer, 1, MPI_FLOAT, 0, 17, MPI_COMM_WORLD, &status);
        printf("I am %i after recv ping \n", rank);
    }
    MPI_Finalize();
}
program ping
    use mpi_f08
    implicit none
    type(MPI_Status) :: status
    real :: buffer(1)
    integer :: i, rank
    call MPI_Init()
    call MPI_Comm_rank(MPI_COMM_WORLD, rank)
    if (rank .eq. 0) then
        write(*,*) 'I am ', rank, ' before send ping'
        call MPI_Send(buffer, 1, MPI_REAL, 1, 17, MPI_COMM_WORLD)
    else if (rank .eq. 1) then
        call MPI_Recv(buffer, 1, MPI_REAL, 0, 17, MPI_COMM_WORLD, status)
        write(*,*) 'I am ', rank, ' after recv ping'
    end if
    call MPI_Finalize()
end program
from mpi4py import MPI
buffer = [ None ]
comm_world = MPI.COMM_WORLD
my_rank = comm_world.Get_rank()
if (my_rank == 0):
    print(f"I am {my_rank} before send ping")
    comm_world.send(buffer, dest=1, tag=17)
elif (my_rank == 1):
    buffer = comm_world.recv(source=0, tag=17)
    print(f"I am {my_rank} after recv ping ")
2. pingpong
Exercise
Sending back the pong:
Modify the code below such that:
after receiving the ping, rank 1 sends a message (pong) back to rank 0
rank 0 receives the message (pong) from rank 1
the message (ping pong ball) shall be 1 float and please use tag=23 for the pong
#include <stdio.h>
#include <mpi.h>
int main(int argc, char *argv[])
{
    int i, rank;
    float buffer[1];
    MPI_Status status;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0)
    {
        printf("I am %i before send ping \n", rank);
        MPI_Send(buffer, 1, MPI_FLOAT, 1, 17, MPI_COMM_WORLD);
        printf("I WILL BE / am %i after recv pong \n", rank);
    }
    else if (rank == 1)
    {
        MPI_Recv(buffer, 1, MPI_FLOAT, 0, 17, MPI_COMM_WORLD, &status);
        printf("I am %i after recv ping \n", rank);
        printf("I WILL BE / am %i before send pong \n", rank);
    }
    MPI_Finalize();
}
program pingpong
    use mpi_f08
    implicit none
    type(MPI_Status) :: status
    real :: buffer(1)
    integer :: i, rank
    call MPI_Init()
    call MPI_Comm_rank(MPI_COMM_WORLD, rank)
    if (rank .eq. 0) then
        write(*,*) 'I am ', rank, ' before send ping'
        call MPI_Send(buffer, 1, MPI_REAL, 1, 17, MPI_COMM_WORLD)
        ! write(*,*) 'I am ', rank, ' after recv pong'
    else if (rank .eq. 1) then
        call MPI_Recv(buffer, 1, MPI_REAL, 0, 17, MPI_COMM_WORLD, status)
        write(*,*) 'I am ', rank, ' after recv ping'
        ! write(*,*) 'I am ', rank, ' before send pong'
    end if
    call MPI_Finalize()
end program
from mpi4py import MPI
buffer = [ None ]
comm_world = MPI.COMM_WORLD
my_rank = comm_world.Get_rank()
if (my_rank == 0):
    print(f"I am {my_rank} before send ping")
    comm_world.send(buffer, dest=1, tag=17)
elif (my_rank == 1):
    buffer = comm_world.recv(source=0, tag=17)
    print(f"I am {my_rank} after recv ping")
Compile:
mpicc pingpong.c -o pingpong
mpif90 pingpong.f90 -o pingpong
Run:
mpirun -np 2 ./pingpong
mpirun -np 2 python3 ./pingpong.py
Expected output:
I am 0 before send ping
I am 1 after recv ping
I am 1 before send pong
I am 0 after recv pong
Unexpected output - but still correct - do you remember why this might happen?
I am 0 before send ping
I am 0 after recv pong
I am 1 after recv ping
I am 1 before send pong
Solution (please try to solve the exercise by yourself before looking at the solution)
Solution
#include <stdio.h>
#include <mpi.h>
int main(int argc, char *argv[])
{
    int i, rank;
    float buffer[1];
    MPI_Status status;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0)
    {
        printf("I am %i before send ping \n", rank);
        MPI_Send(buffer, 1, MPI_FLOAT, 1, 17, MPI_COMM_WORLD);
        MPI_Recv(buffer, 1, MPI_FLOAT, 1, 23, MPI_COMM_WORLD, &status);
        printf("I am %i after recv pong \n", rank);
    }
    else if (rank == 1)
    {
        MPI_Recv(buffer, 1, MPI_FLOAT, 0, 17, MPI_COMM_WORLD, &status);
        printf("I am %i after recv ping \n", rank);
        printf("I am %i before send pong \n", rank);
        MPI_Send(buffer, 1, MPI_FLOAT, 0, 23, MPI_COMM_WORLD);
    }
    MPI_Finalize();
}
program pingpong
    use mpi_f08
    implicit none
    type(MPI_Status) :: status
    real :: buffer(1)
    integer :: i, rank
    call MPI_Init()
    call MPI_Comm_rank(MPI_COMM_WORLD, rank)
    if (rank .eq. 0) then
        write(*,*) 'I am ', rank, ' before send ping'
        call MPI_Send(buffer, 1, MPI_REAL, 1, 17, MPI_COMM_WORLD)
        call MPI_Recv(buffer, 1, MPI_REAL, 1, 23, MPI_COMM_WORLD, status)
        write(*,*) 'I am ', rank, ' after recv pong'
    else if (rank .eq. 1) then
        call MPI_Recv(buffer, 1, MPI_REAL, 0, 17, MPI_COMM_WORLD, status)
        write(*,*) 'I am ', rank, ' after recv ping'
        write(*,*) 'I am ', rank, ' before send pong'
        call MPI_Send(buffer, 1, MPI_REAL, 0, 23, MPI_COMM_WORLD)
    end if
    call MPI_Finalize()
end program
from mpi4py import MPI
buffer = [ None ]
comm_world = MPI.COMM_WORLD
my_rank = comm_world.Get_rank()
if (my_rank == 0):
    print(f"I am {my_rank} before send ping")
    comm_world.send(buffer, dest=1, tag=17)
    buffer = comm_world.recv(source=1, tag=23)
    print(f"I am {my_rank} after recv pong")
elif (my_rank == 1):
    buffer = comm_world.recv(source=0, tag=17)
    print(f"I am {my_rank} after recv ping")
    print(f"I am {my_rank} before send pong")
    comm_world.send(buffer, dest=0, tag=23)
3. timing
Note
MPI_Wtime()
timing
returns a floating-point number of seconds, representing elapsed wallclock time since some time in the past
C binding
double MPI_Wtime(void)
Usage: time = MPI_Wtime();
Fortran 2008 binding
DOUBLE PRECISION MPI_Wtime()
Usage: time = MPI_Wtime()
Exercise
Repeat this in a loop and add timing calls:
Modify the code below:
repeat this ping pong with a loop of length 50
add timing calls before and after the loop
only rank 0 shall print out the transfer time of one message in micro seconds, i.e., delta_time / (2*50) * 1e6
Uncomment the 3 commented lines (// in C, ! in Fortran, # in Python) and add all other pieces needed in the code.
#include <stdio.h>
#include <mpi.h>
#define number_of_messages 50
int main(int argc, char *argv[])
{
    int i, rank;
    float buffer[1];
    // ??? start, finish, msg_transfer_time;
    MPI_Status status;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0)
    {
        MPI_Send(buffer, 1, MPI_FLOAT, 1, 17, MPI_COMM_WORLD);
        MPI_Recv(buffer, 1, MPI_FLOAT, 1, 23, MPI_COMM_WORLD, &status);
    }
    else if (rank == 1)
    {
        MPI_Recv(buffer, 1, MPI_FLOAT, 0, 17, MPI_COMM_WORLD, &status);
        MPI_Send(buffer, 1, MPI_FLOAT, 0, 23, MPI_COMM_WORLD);
    }
    if (rank == 0)
    {
        // msg_transfer_time = ((finish - start) / (2 * number_of_messages)) * 1e6 ; // in microsec
        // printf("Time for one message: %f micro seconds.\n", msg_transfer_time);
    }
    MPI_Finalize();
}
program pingpong_bench
    use mpi_f08
    implicit none
    integer :: number_of_messages
    parameter (number_of_messages=50)
    ! ??? :: start, finish, msg_transfer_time
    type(MPI_Status) :: status
    real :: buffer(1)
    integer :: i, rank
    call MPI_Init()
    call MPI_Comm_rank(MPI_COMM_WORLD, rank)
    if (rank .eq. 0) then
        call MPI_Send(buffer, 1, MPI_REAL, 1, 17, MPI_COMM_WORLD)
        call MPI_Recv(buffer, 1, MPI_REAL, 1, 23, MPI_COMM_WORLD, status)
    else if (rank .eq. 1) then
        call MPI_Recv(buffer, 1, MPI_REAL, 0, 17, MPI_COMM_WORLD, status)
        call MPI_Send(buffer, 1, MPI_REAL, 0, 23, MPI_COMM_WORLD)
    end if
    if (rank .eq. 0) then
        ! msg_transfer_time = ((finish - start) / (2 * number_of_messages)) * 1e6 ! in microsec
        ! write(*,*) 'Time for one message:', msg_transfer_time, ' micro seconds'
    end if
    call MPI_Finalize()
end program
from mpi4py import MPI
number_of_messages = 50
buffer = 0.0
status = MPI.Status()
comm_world = MPI.COMM_WORLD
my_rank = comm_world.Get_rank()
if (my_rank == 0):
    comm_world.send(buffer, dest=1, tag=17)
    buffer = comm_world.recv(source=1, tag=23, status=status)
elif (my_rank == 1):
    buffer = comm_world.recv(source=0, tag=17, status=status)
    comm_world.send(buffer, dest=0, tag=23)
#if (my_rank == 0):
#    msg_transfer_time = ((finish - start) / (2 * number_of_messages)) * 1e6 # in microsec
#    print(f"Time for one message: {msg_transfer_time:f} micro seconds.")
Compile:
mpicc pingpong-bench.c -o pingpong-bench
mpif90 pingpong-bench.f90 -o pingpong-bench
Run:
mpirun -np 2 ./pingpong-bench
mpirun -np 2 python3 ./pingpong-bench.py
Expected output - What did you measure? Run it a couple of times to see run-to-run variations!
Time for one message: 0.440590 micro seconds.
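The arithmetic behind the printed number can be checked on its own. The sketch below only reproduces the formula from the exercise, delta_time / (2 * number_of_messages) * 1e6. The function name `time_per_message_us` and the two timestamps are made up for illustration; in the benchmark they would come from the timing calls before and after the loop.

```python
def time_per_message_us(start, finish, number_of_messages):
    """Average transfer time of one message in microseconds.

    start, finish: wallclock times in seconds (e.g. from MPI_Wtime)
    number_of_messages: loop length; each iteration is 1 ping + 1 pong
    """
    elapsed = finish - start                # seconds for the whole loop
    messages = 2 * number_of_messages       # every iteration moves 2 messages
    return elapsed / messages * 1e6         # seconds -> microseconds


# Hypothetical timestamps: a loop of 50 ping pongs that took 44 microseconds,
# which corresponds to roughly 0.44 microseconds per message.
print(f"Time for one message: {time_per_message_us(2.0, 2.000044, 50):f} micro seconds.")
```

The factor 2 is there because each loop iteration contains two messages, the ping and the pong, so the result is a one-way time rather than a round-trip time.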
Solution (please try to solve the exercise by yourself before looking at the solution)
Solution
#include <stdio.h>
#include <mpi.h>
#define number_of_messages 50
int main(int argc, char *argv[])
{
    int i, rank;
    float buffer[1];
    double start, finish, msg_transfer_time;
    MPI_Status status;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    start = MPI_Wtime();
    for (i = 1; i <= number_of_messages; i++)
    {
        if (rank == 0)
        {
            MPI_Send(buffer, 1, MPI_FLOAT, 1, 17, MPI_COMM_WORLD);
            MPI_Recv(buffer, 1, MPI_FLOAT, 1, 23, MPI_COMM_WORLD, &status);
        }
        else if (rank == 1)
        {
            MPI_Recv(buffer, 1, MPI_FLOAT, 0, 17, MPI_COMM_WORLD, &status);
            MPI_Send(buffer, 1, MPI_FLOAT, 0, 23, MPI_COMM_WORLD);
        }
    }
    finish = MPI_Wtime();
    if (rank == 0)
    {
        msg_transfer_time = ((finish - start) / (2 * number_of_messages)) * 1e6 ; // in microsec
        printf("Time for one message: %f micro seconds.\n", msg_transfer_time);
    }
    MPI_Finalize();
}
program pingpong_bench
    use mpi_f08
    implicit none
    integer :: number_of_messages
    parameter (number_of_messages=50)
    double precision :: start, finish, msg_transfer_time
    type(MPI_Status) :: status
    real :: buffer(1)
    integer :: i, rank
    call MPI_Init()
    call MPI_Comm_rank(MPI_COMM_WORLD, rank)
    start = MPI_Wtime()
    do i = 1, number_of_messages
        if (rank .eq. 0) then
            call MPI_Send(buffer, 1, MPI_REAL, 1, 17, MPI_COMM_WORLD)
            call MPI_Recv(buffer, 1, MPI_REAL, 1, 23, MPI_COMM_WORLD, status)
        else if (rank .eq. 1) then
            call MPI_Recv(buffer, 1, MPI_REAL, 0, 17, MPI_COMM_WORLD, status)
            call MPI_Send(buffer, 1, MPI_REAL, 0, 23, MPI_COMM_WORLD)
        end if
    end do
    finish = MPI_Wtime()
    if (rank .eq. 0) then
        msg_transfer_time = ((finish - start) / (2 * number_of_messages)) * 1e6 ! in microsec
        write(*,*) 'Time for one message:', msg_transfer_time, ' micro seconds'
    end if
    call MPI_Finalize()
end program
Solution with send:
from mpi4py import MPI
number_of_messages = 50
buffer = 0.0
status = MPI.Status()
comm_world = MPI.COMM_WORLD
my_rank = comm_world.Get_rank()
start = MPI.Wtime()
for i in range(1, number_of_messages+1):
    if (my_rank == 0):
        comm_world.send(buffer, dest=1, tag=17)
        buffer = comm_world.recv(source=1, tag=23, status=status)
    elif (my_rank == 1):
        buffer = comm_world.recv(source=0, tag=17, status=status)
        comm_world.send(buffer, dest=0, tag=23)
finish = MPI.Wtime()
if (my_rank == 0):
    msg_transfer_time = ((finish - start) / (2 * number_of_messages)) * 1e6 # in microsec
    print(f"Time for one message: {msg_transfer_time:f} micro seconds.")
Solution with Send and numpy:
import numpy as np
from mpi4py import MPI
number_of_messages = 50
buffer = np.array([0], dtype='f')
status = MPI.Status()
comm_world = MPI.COMM_WORLD
my_rank = comm_world.Get_rank()
start = MPI.Wtime()
for i in range(1, number_of_messages+1):
    if (my_rank == 0):
        comm_world.Send((buffer,1,MPI.FLOAT), dest=1, tag=17)
        comm_world.Recv((buffer,1,MPI.FLOAT), source=1, tag=23, status=status)
    elif (my_rank == 1):
        comm_world.Recv((buffer,1,MPI.FLOAT), source=0, tag=17, status=status)
        comm_world.Send((buffer,1,MPI.FLOAT), dest=0, tag=23)
finish = MPI.Wtime()
if (my_rank == 0):
    msg_transfer_time = ((finish - start) / (2 * number_of_messages)) * 1e6 # in microsec
    print(f"Time for one message: {msg_transfer_time:f} micro seconds.")
4. warmup
Exercise
Don’t forget to warm up and do one ping pong before starting the timed loop:
Modify the code below accordingly:
#include <stdio.h>
#include <mpi.h>
#define number_of_messages 50
int main(int argc, char *argv[])
{
    int i, rank;
    float buffer[1];
    double start, finish, msg_transfer_time;
    MPI_Status status;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    start = MPI_Wtime();
    for (i = 1; i <= number_of_messages; i++)
    {
        if (rank == 0)
        {
            MPI_Send(buffer, 1, MPI_FLOAT, 1, 17, MPI_COMM_WORLD);
            MPI_Recv(buffer, 1, MPI_FLOAT, 1, 23, MPI_COMM_WORLD, &status);
        }
        else if (rank == 1)
        {
            MPI_Recv(buffer, 1, MPI_FLOAT, 0, 17, MPI_COMM_WORLD, &status);
            MPI_Send(buffer, 1, MPI_FLOAT, 0, 23, MPI_COMM_WORLD);
        }
    }
    finish = MPI_Wtime();
    if (rank == 0)
    {
        msg_transfer_time = ((finish - start) / (2 * number_of_messages)) * 1e6 ; // in microsec
        printf("Time for one message: %f micro seconds.\n", msg_transfer_time);
    }
    MPI_Finalize();
}
program pingpong_bench
    use mpi_f08
    implicit none
    integer :: number_of_messages
    parameter (number_of_messages=50)
    double precision :: start, finish, msg_transfer_time
    type(MPI_Status) :: status
    real :: buffer(1)
    integer :: i, rank
    call MPI_Init()
    call MPI_Comm_rank(MPI_COMM_WORLD, rank)
    start = MPI_Wtime()
    do i = 1, number_of_messages
        if (rank .eq. 0) then
            call MPI_Send(buffer, 1, MPI_REAL, 1, 17, MPI_COMM_WORLD)
            call MPI_Recv(buffer, 1, MPI_REAL, 1, 23, MPI_COMM_WORLD, status)
        else if (rank .eq. 1) then
            call MPI_Recv(buffer, 1, MPI_REAL, 0, 17, MPI_COMM_WORLD, status)
            call MPI_Send(buffer, 1, MPI_REAL, 0, 23, MPI_COMM_WORLD)
        end if
    end do
    finish = MPI_Wtime()
    if (rank .eq. 0) then
        msg_transfer_time = ((finish - start) / (2 * number_of_messages)) * 1e6 ! in microsec
        write(*,*) 'Time for one message:', msg_transfer_time, ' micro seconds'
    end if
    call MPI_Finalize()
end program
from mpi4py import MPI
number_of_messages = 50
buffer = 0.0
status = MPI.Status()
comm_world = MPI.COMM_WORLD
my_rank = comm_world.Get_rank()
start = MPI.Wtime()
for i in range(1, number_of_messages+1):
    if (my_rank == 0):
        comm_world.send(buffer, dest=1, tag=17)
        buffer = comm_world.recv(source=1, tag=23, status=status)
    elif (my_rank == 1):
        buffer = comm_world.recv(source=0, tag=17, status=status)
        comm_world.send(buffer, dest=0, tag=23)
finish = MPI.Wtime()
if (my_rank == 0):
    msg_transfer_time = ((finish - start) / (2 * number_of_messages)) * 1e6 # in microsec
    print(f"Time for one message: {msg_transfer_time:f} micro seconds.")
Compile:
mpicc pingpong-bench1.c -o pingpong-bench1
mpif90 pingpong-bench1.f90 -o pingpong-bench1
Run:
mpirun -np 2 ./pingpong-bench1
mpirun -np 2 python3 ./pingpong-bench1.py
Expected output - What did you measure? Run it a couple of times to see run to run variations!
Time for one message: 0.134900 micro seconds.
Solution (please try to solve the exercise by yourself before looking at the solution)
Solution
#include <stdio.h>
#include <mpi.h>
#define number_of_messages 50
int main(int argc, char *argv[])
{
    int i, rank;
    float buffer[1];
    double start, finish, msg_transfer_time;
    MPI_Status status;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0)
    {
        MPI_Send(buffer, 1, MPI_FLOAT, 1, 17, MPI_COMM_WORLD);
        MPI_Recv(buffer, 1, MPI_FLOAT, 1, 23, MPI_COMM_WORLD, &status);
    }
    else if (rank == 1)
    {
        MPI_Recv(buffer, 1, MPI_FLOAT, 0, 17, MPI_COMM_WORLD, &status);
        MPI_Send(buffer, 1, MPI_FLOAT, 0, 23, MPI_COMM_WORLD);
    }
    start = MPI_Wtime();
    for (i = 1; i <= number_of_messages; i++)
    {
        if (rank == 0)
        {
            MPI_Send(buffer, 1, MPI_FLOAT, 1, 17, MPI_COMM_WORLD);
            MPI_Recv(buffer, 1, MPI_FLOAT, 1, 23, MPI_COMM_WORLD, &status);
        }
        else if (rank == 1)
        {
            MPI_Recv(buffer, 1, MPI_FLOAT, 0, 17, MPI_COMM_WORLD, &status);
            MPI_Send(buffer, 1, MPI_FLOAT, 0, 23, MPI_COMM_WORLD);
        }
    }
    finish = MPI_Wtime();
    if (rank == 0)
    {
        msg_transfer_time = ((finish - start) / (2 * number_of_messages)) * 1e6 ; // in microsec
        printf("Time for one message: %f micro seconds.\n", msg_transfer_time);
    }
    MPI_Finalize();
}
program pingpong_bench
    use mpi_f08
    implicit none
    integer :: number_of_messages
    parameter (number_of_messages=50)
    double precision :: start, finish, msg_transfer_time
    type(MPI_Status) :: status
    real :: buffer(1)
    integer :: i, rank
    call MPI_Init()
    call MPI_Comm_rank(MPI_COMM_WORLD, rank)
    if (rank .eq. 0) then
        call MPI_Send(buffer, 1, MPI_REAL, 1, 17, MPI_COMM_WORLD)
        call MPI_Recv(buffer, 1, MPI_REAL, 1, 23, MPI_COMM_WORLD, status)
    else if (rank .eq. 1) then
        call MPI_Recv(buffer, 1, MPI_REAL, 0, 17, MPI_COMM_WORLD, status)
        call MPI_Send(buffer, 1, MPI_REAL, 0, 23, MPI_COMM_WORLD)
    end if
    start = MPI_Wtime()
    do i = 1, number_of_messages
        if (rank .eq. 0) then
            call MPI_Send(buffer, 1, MPI_REAL, 1, 17, MPI_COMM_WORLD)
            call MPI_Recv(buffer, 1, MPI_REAL, 1, 23, MPI_COMM_WORLD, status)
        else if (rank .eq. 1) then
            call MPI_Recv(buffer, 1, MPI_REAL, 0, 17, MPI_COMM_WORLD, status)
            call MPI_Send(buffer, 1, MPI_REAL, 0, 23, MPI_COMM_WORLD)
        end if
    end do
    finish = MPI_Wtime()
    if (rank .eq. 0) then
        msg_transfer_time = ((finish - start) / (2 * number_of_messages)) * 1e6 ! in microsec
        write(*,*) 'Time for one message:', msg_transfer_time, ' micro seconds'
    end if
    call MPI_Finalize()
end program
from mpi4py import MPI
number_of_messages = 50
buffer = 0.0
status = MPI.Status()
comm_world = MPI.COMM_WORLD
my_rank = comm_world.Get_rank()
if (my_rank == 0):
    comm_world.send(buffer, dest=1, tag=17)
    buffer = comm_world.recv(source=1, tag=23, status=status)
elif (my_rank == 1):
    buffer = comm_world.recv(source=0, tag=17, status=status)
    comm_world.send(buffer, dest=0, tag=23)
start = MPI.Wtime()
for i in range(1, number_of_messages+1):
    if (my_rank == 0):
        comm_world.send(buffer, dest=1, tag=17)
        buffer = comm_world.recv(source=1, tag=23, status=status)
    elif (my_rank == 1):
        buffer = comm_world.recv(source=0, tag=17, status=status)
        comm_world.send(buffer, dest=0, tag=23)
finish = MPI.Wtime()
if (my_rank == 0):
    msg_transfer_time = ((finish - start) / (2 * number_of_messages)) * 1e6 # in microsec
    print(f"Time for one message: {msg_transfer_time:f} micro seconds.")
Solution with Send and numpy:
import numpy as np
from mpi4py import MPI
number_of_messages = 50
buffer = np.array([0], dtype='f')
status = MPI.Status()
comm_world = MPI.COMM_WORLD
my_rank = comm_world.Get_rank()
if (my_rank == 0):
    comm_world.Send((buffer,1,MPI.FLOAT), dest=1, tag=17)
    comm_world.Recv((buffer,1,MPI.FLOAT), source=1, tag=23, status=status)
elif (my_rank == 1):
    comm_world.Recv((buffer,1,MPI.FLOAT), source=0, tag=17, status=status)
    comm_world.Send((buffer,1,MPI.FLOAT), dest=0, tag=23)
start = MPI.Wtime()
for i in range(1, number_of_messages+1):
    if (my_rank == 0):
        comm_world.Send((buffer,1,MPI.FLOAT), dest=1, tag=17)
        comm_world.Recv((buffer,1,MPI.FLOAT), source=1, tag=23, status=status)
    elif (my_rank == 1):
        comm_world.Recv((buffer,1,MPI.FLOAT), source=0, tag=17, status=status)
        comm_world.Send((buffer,1,MPI.FLOAT), dest=0, tag=23)
finish = MPI.Wtime()
if (my_rank == 0):
    msg_transfer_time = ((finish - start) / (2 * number_of_messages)) * 1e6 # in microsec
    print(f"Time for one message: {msg_transfer_time:f} micro seconds.")
5. finish - who wins the race?
Please do a couple of time measurements - run a couple of times each and note down your fastest result for:
MPI_Send - including the first ping pong in the time measurement (result of 3. timing)
MPI_Send - excluding the first ping pong from the time measurements (result of 4. warmup)
MPI_Ssend - including the first ping pong in the time measurement (you’ll have to edit/copy from above)
MPI_Ssend - excluding the first ping pong from the time measurements (you’ll have to edit/copy from above)
You can do these measurements on different systems and in different environments, e.g.:
VSC JupyterHub using VSC-5 or VSC-4
Submitting jobs to VSC-5 or VSC-4 and playing around with pinning (see previous 01_hello.ipynb)
put both processes on the same NUMA domain
put the two processes on different NUMA domains but still on the same CPU/socket
put the two processes on different CPUs/sockets on the same node
put them on different nodes and both on CPU/socket 0
put them on different nodes and both on CPU/socket 1
put them on different nodes and one on CPU/socket 0 and the other on CPU/socket 1
When submitting jobs you can also switch to another MPI library (e.g. Intel-MPI) and do the same.
Run the ping pong benchmark on your own laptop and/or on another HPC system you have access to.
Record your results below. We would like to see who wins the race!
(Copy the cell below to record all your measurements on different systems and in different environments.)
First name: ________
Measurement on: ________
Programming language: ________
| time for 1 ping in micro seconds with | MPI_Send | MPI_Ssend |
|---|---|---|
| including first ping pong in timing | ________ | ________ |
| excluding first ping pong from timing | ________ | ________ |
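Since each setup is run a couple of times and only the fastest result is noted down, a few lines of Python can pick that value for each cell of the table. This is just a convenience sketch: the dictionary layout, the function name `fastest_results` and all numbers are hypothetical placeholders, not real measurements.

```python
# Hypothetical placeholder timings in microseconds, several runs per setup.
# They are NOT real measurements; replace them with your own numbers.
measurements = {
    ("MPI_Send", "including first ping pong"): [0.61, 0.48, 0.52],
    ("MPI_Send", "excluding first ping pong"): [0.21, 0.19, 0.25],
    ("MPI_Ssend", "including first ping pong"): [0.95, 0.88, 0.91],
    ("MPI_Ssend", "excluding first ping pong"): [0.47, 0.52, 0.45],
}


def fastest_results(measurements):
    """Return the smallest (fastest) time recorded for each setup."""
    return {setup: min(times) for setup, times in measurements.items()}


if __name__ == "__main__":
    for (send_mode, timing), best in fastest_results(measurements).items():
        print(f"{send_mode:10s} {timing:26s} {best:.2f} micro seconds")
```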
Keypoints
Blocking point-to-point communication
Different send modes in MPI: MPI_Send, MPI_Ssend, MPI_Bsend, MPI_Rsend
Explore point-to-point communication with two MPI processes playing ping pong