MPI - Blocking point-to-point communication

Objectives

Blocking point-to-point communication
Different send modes in MPI: MPI_Send, MPI_Ssend, MPI_Bsend, MPI_Rsend
Explore point-to-point communication with two MPI processes playing ping pong

Instructor note

45 min teaching
75 min exercises

Note

MPI Standard: MPI: A Message-Passing Interface Standard Version 5.0 (PDF)

The simplest method to communicate with MPI is point-to-point communication between two specific processes, a sender and a receiver. Both processes actively participate in this form of communication where the sender must execute some send function while the receiver executes some receive function. Furthermore, both processes must be in the same communicator and need information about the communication partner (source or destination rank) as well as a tag that helps to identify the message. MPI is equipped with two flavors of point-to-point communication: blocking and nonblocking.

Blocking point-to-point communication

With blocking communication, a send or receive call returns only once the buffer passed to it is safe to use again; depending on the send mode, the process may or may not have to wait until the communication partner is ready. A blocking send suspends the executing process until the send buffer can be reused or changed, and a blocking receive suspends it until the receive buffer has actually been filled. After a blocking send, the process continues only once the data has been copied out of the send buffer. This does not mean that the data has been received at the destination process. For a blocking receive, completion implies that the data transfer has happened and that the data has been copied into the receive buffer, so it is safe to use.

Communication Modes

For blocking point-to-point communication, the MPI standard defines four modes of communication with subtle differences in their semantics:

Sending mode	Routine	Notes
Standard	`Send`	recommended for production runs
Synchronous	`Ssend`	recommended for debugging
Buffered	`Bsend`	not recommended, use nonblocking communication instead
Ready	`Rsend`	dangerous, for experts only

Receiving mode	Routine	Notes
Standard	`Recv`	only one mode needed (fits all sends)

Standard Mode

In standard mode, the MPI library uses either a synchronous or an asynchronous protocol; it decides which one depending on the message size and handles the asynchronous protocol transparently. Whenever the synchronous protocol is used, there is a risk of deadlocks and serialization. Standard mode is recommended for production runs.

Synchronous Send

Synchronous Send is the most stringent communication mode: the send completes only once the receiving process has posted a matching receive and started to receive the message, which is similar to accepting a handshake. This means that the receiving process has to declare its readiness to receive the message. Ideally, every MPI program still works correctly when each standard send is replaced with a synchronous send. If the sends and receives are not ordered correctly, however, synchronous send leads to deadlocks and serialization. The use case for this mode is debugging.
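
The replacement test described above can be illustrated with a small toy model in plain Python. It is not MPI code: the function name `completes_with_synchronous_sends` and the `("send", partner)` / `("recv", partner)` step notation are assumptions made for this sketch. Each rank has a list of blocking steps, and a send is allowed to complete only when its partner is currently waiting in the matching receive.

```python
def completes_with_synchronous_sends(programs):
    """Toy model: return True if every rank finishes when all sends are synchronous.

    programs maps rank -> list of ("send", partner) or ("recv", partner) steps.
    """
    position = {rank: 0 for rank in programs}  # index of the next step per rank

    def current(rank):
        steps = programs[rank]
        return steps[position[rank]] if position[rank] < len(steps) else None

    progressed = True
    while progressed:
        progressed = False
        for rank in programs:
            step = current(rank)
            if step is None or step[0] != "send":
                continue
            partner = step[1]
            # A synchronous send completes only if the partner waits in the matching receive.
            if partner in programs and current(partner) == ("recv", rank):
                position[rank] += 1
                position[partner] += 1
                progressed = True
    # Any rank that is stuck in an unfinished step means the program deadlocked.
    return all(current(rank) is None for rank in programs)


ping_pong = {0: [("send", 1), ("recv", 1)], 1: [("recv", 0), ("send", 0)]}
ping_ping = {0: [("send", 1), ("recv", 1)], 1: [("send", 0), ("recv", 0)]}

print(completes_with_synchronous_sends(ping_pong))  # True
print(completes_with_synchronous_sends(ping_ping))  # False
```

The first schedule is an ordinary ping pong between two ranks. In the second schedule both ranks send first, so neither ever reaches its receive and the toy model reports a deadlock.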

Buffered Send

Buffered Send copies the data from the send buffer into a buffer that has to be managed by the programmer and then returns. Once a matching receive has been posted, the data is transmitted over the network from the user-managed buffer. Naturally, this requires an additional buffer and an extra copy between the buffers. However, this communication mode is local: its completion does not depend on the occurrence of a matching receive. It also requires the programmer to attach and detach the user-managed buffer, where the detach call blocks until all data in the buffer has been transmitted. We are not going to show this here, as nonblocking communication can accomplish the same goal in a more elegant way.

Ready Send

Ready Send may be started only if the matching receive has already been posted, so the send call can complete immediately. If this is not the case, the behavior is undefined and the program might give wrong results. This communication mode has the potential to be the fastest, but it should be handled with the utmost care and used only when the control flow of the parallel program guarantees that the receive is posted first. This mode of communication is rather advanced.

Hands-on labs

Explore point-to-point communication with two MPI processes playing ping pong:

Step by step (according to the pictures below):

ping - rank 0 sends a message (ping) to rank 1 and rank 1 receives it
pingpong - after receiving the ping, rank 1 sends a message (pong) back to rank 0 and rank 0 receives it
timing - repeat the ping pong in a loop and add timing calls before and after the loop
warmup - don’t forget to warm up and do one ping pong before starting the timed loop
finish - who wins the race?

02_pingpong

Let’s write a ping pong benchmark (to measure the latency) step by step
Only two MPI processes (rank 0 and rank 1) will be needed (mpirun -np 2 …)
For the ping pong exercise, we’ll adopt the MPMD (multiple program multiple data) approach
The ping pong ball will be 1 float (but the value is not of interest)
Be careful, in the end the two MPI processes should play ping pong with only one ball
(not ping-ping pong-pong with two balls)

Note

MPI_Send(&buf, count, datatype, dest, tag, comm)

blocking send procedure (other send modes have the same syntax)
- source rank sends the message defined by (buf, count, datatype) to the dest(ination) rank
IN buf initial address of send buffer (choice)
IN count number of elements in send buffer (non-negative integer)
IN datatype datatype of each send buffer element (handle)
IN dest rank of destination (integer)
IN tag message tag (integer)
IN comm communicator (handle)

C binding
int MPI_Send(const void *buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm)
- Usage: MPI_Send(&buffer, 1, MPI_FLOAT, 1, 17, MPI_COMM_WORLD);

Note:
- MPI_Send (standard send) is recommended for production runs (best speed)
  –> let the MPI library decide how to best transfer the message (same risks as MPI_Ssend)
- MPI_Ssend (synchronous send) is recommended for debugging (helps to detect deadlocks)
  –> completes only when the receive has started –> risk of deadlocks and serializations
- MPI_Bsend (buffered send) –> not recommended since unnecessarily complicated
  –> it’s recommended to use MPI_Send or nonblocking communication instead
- MPI_Rsend (ready send) –> not recommended because it’s highly dangerous to get it wrong
  –> it may be started only after the matching receive is already posted (needs additional guarantees)

Note

MPI_Recv(&buf, count, datatype, source, tag, comm, &status)

blocking receive procedure
- dest(ination) rank receives a message from the source rank and stores it at (buf, count, datatype)
OUT buf initial address of receive buffer (choice)
IN count number of elements in receive buffer (non-negative integer)
IN datatype datatype of each receive buffer element (handle)
IN source rank of source or MPI_ANY_SOURCE (integer)
IN tag message tag or MPI_ANY_TAG (integer)
IN comm communicator (handle)
OUT status status object (status)

C binding
int MPI_Recv(void *buf, int count, MPI_Datatype datatype, int source, int tag, MPI_Comm comm, MPI_Status *status)
- Usage: MPI_Recv(&buffer, 1, MPI_FLOAT, 0, 17, MPI_COMM_WORLD, &status);

MPI 5.0 Table 3.2 - Predefined MPI datatypes corresponding to C datatypes

Note:
- MPI_Recv completes when the message has arrived
  –> only one receive mode is needed that works together with all 4 send modes
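
How a receive picks its message can be sketched as a small toy model in plain Python (not MPI code). The names `ANY_SOURCE`, `ANY_TAG` and `matches` are stand-ins invented for this sketch; the real wildcards are `MPI_ANY_SOURCE` and `MPI_ANY_TAG`, and the communicator is modeled here as a plain string.

```python
ANY_SOURCE = "any source"  # stand-in for MPI_ANY_SOURCE
ANY_TAG = "any tag"        # stand-in for MPI_ANY_TAG


def matches(message, source, tag, comm):
    """Return True if a receive posted with (source, tag, comm) accepts the message."""
    same_comm = message["comm"] == comm
    source_ok = source == ANY_SOURCE or message["source"] == source
    tag_ok = tag == ANY_TAG or message["tag"] == tag
    return same_comm and source_ok and tag_ok


# The message from the MPI_Send usage line: sent by rank 0 with tag 17.
ping = {"source": 0, "tag": 17, "comm": "world"}

print(matches(ping, source=0, tag=17, comm="world"))                # True
print(matches(ping, source=0, tag=23, comm="world"))                # False
print(matches(ping, source=ANY_SOURCE, tag=ANY_TAG, comm="world"))  # True
```

The second call shows why the tag matters: a receive that asks for tag 23 does not accept a message that was sent with tag 17, even if source and communicator agree.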

1. ping

Exercise

The very first ping:

Modify the code below such that:

rank 0 sends a message (ping) to rank 1
rank 1 receives the message (ping) from rank 0
the message (ping pong ball) shall be 1 float and please use tag=17 for the ping

What happens if you do NOT modify the code below? Try it out!

You can compile and execute part 1 without modifying the code below.
Give it a try before you actually modify.
What happens here? Why is this possible at all?
Of course, before you can proceed to the next step (2), you have to modify.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
  int i, rank;
  float buffer[1];
  MPI_Status status;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      printf("I am %i before send ping \n", rank);

      printf("I am %i after  recv ping \n", rank);

  MPI_Finalize();
}

program ping

  use mpi_f08
  implicit none

  type(MPI_Status) :: status
  real :: buffer(1)
  integer :: i, rank

  call MPI_Init()
  call MPI_Comm_rank(MPI_COMM_WORLD, rank)

        write(*,*) 'I am ', rank, ' before send ping'

        write(*,*) 'I am ', rank, ' after  recv ping'

  call MPI_Finalize()

end program

from mpi4py import MPI

comm_world = MPI.COMM_WORLD
my_rank = comm_world.Get_rank()
buffer = [ None ]

print(f"I am {my_rank} before send ping")

print(f"I am {my_rank} after  recv ping ")

Compile:

mpicc ping.c -o ping

mpif90 ping.f90 -o ping

Run:

mpirun -np 2 ./ping

mpirun -np 2 python3 ./ping.py

Expected output:

I am 0 before send ping 
I am 1 after  recv ping 

Unexpected output - but still correct - do you remember why this might happen?

I am 1 after  recv ping 
I am 0 before send ping 

Tip

Seeing more than 2 output lines?
If you are seeing more than 2 output lines, please modify / correct the code above.
If you have not yet modified it you will see 4 (2 x number of MPI processes) lines of output, i.e., each MPI process runs the whole code which has 2 print statements.

Solution (please try to solve the exercise by yourself before looking at the solution)

Solution

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
  int i, rank;
  float buffer[1];
  MPI_Status status;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0)
    {
      printf("I am %i before send ping \n", rank);
      MPI_Send(buffer, 1, MPI_FLOAT, 1, 17, MPI_COMM_WORLD);
    }
    else if (rank == 1)
    {
      MPI_Recv(buffer, 1, MPI_FLOAT, 0, 17, MPI_COMM_WORLD, &status);
      printf("I am %i after  recv ping \n", rank);
    }

  MPI_Finalize();
}

program ping

  use mpi_f08
  implicit none

  type(MPI_Status) :: status
  real :: buffer(1)
  integer :: i, rank

  call MPI_Init()
  call MPI_Comm_rank(MPI_COMM_WORLD, rank)

     if (rank .eq. 0) then
        write(*,*) 'I am ', rank, ' before send ping'
        call MPI_Send(buffer, 1, MPI_REAL, 1, 17, MPI_COMM_WORLD)
     else if (rank .eq. 1) then
        call MPI_Recv(buffer, 1, MPI_REAL, 0, 17, MPI_COMM_WORLD, status)
        write(*,*) 'I am ', rank, ' after  recv ping'
     end if     

  call MPI_Finalize()

end program

from mpi4py import MPI

buffer = [ None ]

comm_world = MPI.COMM_WORLD
my_rank = comm_world.Get_rank()

if (my_rank == 0):
   print(f"I am {my_rank} before send ping")
   comm_world.send(buffer, dest=1, tag=17)

elif (my_rank == 1):
   buffer = comm_world.recv(source=0, tag=17)
   print(f"I am {my_rank} after  recv ping ")

2. pingpong

Exercise

Sending back the pong:

Modify the code below such that:

after receiving the ping, rank 1 sends a message (pong) back to rank 0
rank 0 receives the message (pong) from rank 1
the message (ping pong ball) shall be 1 float and please use tag=23 for the pong

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
  int i, rank;
  float buffer[1];
  MPI_Status status;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0)
    {
      printf("I am %i before send ping \n", rank);
      MPI_Send(buffer, 1, MPI_FLOAT, 1, 17, MPI_COMM_WORLD);
      printf("I WILL BE / am %i after  recv pong \n", rank);
    }
    else if (rank == 1)
    {
      MPI_Recv(buffer, 1, MPI_FLOAT, 0, 17, MPI_COMM_WORLD, &status);
      printf("I am %i after  recv ping \n", rank);
      printf("I WILL BE / am %i before send pong \n", rank);
    }

  MPI_Finalize();
}

program pingpong

  use mpi_f08
  implicit none

  type(MPI_Status) :: status
  real :: buffer(1)
  integer :: i, rank

  call MPI_Init()
  call MPI_Comm_rank(MPI_COMM_WORLD, rank)

     if (rank .eq. 0) then
        write(*,*) 'I am ', rank, ' before send ping'
        call MPI_Send(buffer, 1, MPI_REAL, 1, 17, MPI_COMM_WORLD)
      ! write(*,*) 'I am ', rank, ' after  recv pong'
     else if (rank .eq. 1) then
        call MPI_Recv(buffer, 1, MPI_REAL, 0, 17, MPI_COMM_WORLD, status)
        write(*,*) 'I am ', rank, ' after  recv ping'
      ! write(*,*) 'I am ', rank, ' before send pong'
     end if     

  call MPI_Finalize()

end program

from mpi4py import MPI

buffer = [ None ]

comm_world = MPI.COMM_WORLD
my_rank = comm_world.Get_rank()

if (my_rank == 0):
   print(f"I am {my_rank} before send ping")
   comm_world.send(buffer, dest=1, tag=17)


elif (my_rank == 1):
   buffer = comm_world.recv(source=0, tag=17)
   print(f"I am {my_rank} after  recv ping")

Compile:

mpicc pingpong.c -o pingpong

mpif90 pingpong.f90 -o pingpong

Run:

mpirun -np 2 ./pingpong

mpirun -np 2 python3 ./pingpong.py

Expected output:

I am 0 before send ping 
I am 1 after  recv ping 
I am 1 before send pong 
I am 0 after  recv pong

Unexpected output - but still correct - do you remember why this might happen?

I am 0 before send ping
I am 0 after  recv pong
I am 1 after  recv ping 
I am 1 before send pong 

Solution (please try to solve the exercise by yourself before looking at the solution)

Solution

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
  int i, rank;
  float buffer[1];
  MPI_Status status;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0)
    {
      printf("I am %i before send ping \n", rank);
      MPI_Send(buffer, 1, MPI_FLOAT, 1, 17, MPI_COMM_WORLD);
      MPI_Recv(buffer, 1, MPI_FLOAT, 1, 23, MPI_COMM_WORLD, &status);
      printf("I am %i after  recv pong \n", rank);
    }
    else if (rank == 1)
    {
      MPI_Recv(buffer, 1, MPI_FLOAT, 0, 17, MPI_COMM_WORLD, &status);
      printf("I am %i after  recv ping \n", rank);
      printf("I am %i before send pong \n", rank);
      MPI_Send(buffer, 1, MPI_FLOAT, 0, 23, MPI_COMM_WORLD);
    }

  MPI_Finalize();
}

program pingpong

  use mpi_f08
  implicit none

  type(MPI_Status) :: status
  real :: buffer(1)
  integer :: i, rank

  call MPI_Init()
  call MPI_Comm_rank(MPI_COMM_WORLD, rank)

     if (rank .eq. 0) then
        write(*,*) 'I am ', rank, ' before send ping'
        call MPI_Send(buffer, 1, MPI_REAL, 1, 17, MPI_COMM_WORLD)
        call MPI_Recv(buffer, 1, MPI_REAL, 1, 23, MPI_COMM_WORLD, status)
        write(*,*) 'I am ', rank, ' after  recv pong'
     else if (rank .eq. 1) then
        call MPI_Recv(buffer, 1, MPI_REAL, 0, 17, MPI_COMM_WORLD, status)
        write(*,*) 'I am ', rank, ' after  recv ping'
        write(*,*) 'I am ', rank, ' before send pong'
        call MPI_Send(buffer, 1, MPI_REAL, 0, 23, MPI_COMM_WORLD)
     end if     

  call MPI_Finalize()

end program

from mpi4py import MPI

buffer = [ None ]

comm_world = MPI.COMM_WORLD
my_rank = comm_world.Get_rank()

if (my_rank == 0):
   print(f"I am {my_rank} before send ping")
   comm_world.send(buffer, dest=1, tag=17)
   buffer = comm_world.recv(source=1, tag=23)
   print(f"I am {my_rank} after  recv pong")

elif (my_rank == 1):
   buffer = comm_world.recv(source=0, tag=17)
   print(f"I am {my_rank} after  recv ping")
   print(f"I am {my_rank} before send pong")
   comm_world.send(buffer, dest=0, tag=23)

3. timing

Note

MPI_Wtime()

timing
- returns a floating-point number of seconds, representing elapsed wallclock time since some time in the past

C binding
double MPI_Wtime(void)
- Usage: time = MPI_Wtime();

Exercise

Repeat this in a loop and add timing calls:

Modify the code below:

repeat this ping pong with a loop of length 50
add timing calls before and after the loop
only rank 0 shall print out the transfer time of one message in micro seconds, i.e., delta_time / (2*50) * 1e6
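
The formula in the last item can be written as a small helper. This is plain Python without MPI; the function name `time_per_message_us` and the example timestamps are assumptions for illustration, not measurements. The factor 2 appears because every ping pong consists of two messages.

```python
def time_per_message_us(start, finish, number_of_messages):
    """Average time of one message in microseconds.

    start and finish are wall-clock times in seconds (as returned by MPI_Wtime),
    number_of_messages is the number of ping pongs in the timed loop.
    """
    messages_sent = 2 * number_of_messages  # one ping plus one pong per iteration
    return (finish - start) / messages_sent * 1e6


# Made-up timestamps: a loop of 50 ping pongs that took 0.5 milliseconds in total.
print(f"Time for one message: {time_per_message_us(0.0, 0.0005, 50):f} micro seconds.")
```

Dividing by the number of messages rather than the number of ping pongs gives the one-way time, which is the number usually quoted as latency.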

Uncomment the three commented-out lines (marked with //, ! or #, depending on the language) and add all other pieces needed in the code.

#include <stdio.h>
#include <mpi.h>

#define number_of_messages 50

int main(int argc, char *argv[])
{
  int i, rank;
  float buffer[1];
  // ??? start, finish, msg_transfer_time;
  MPI_Status status;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0)
    {
      MPI_Send(buffer, 1, MPI_FLOAT, 1, 17, MPI_COMM_WORLD);
      MPI_Recv(buffer, 1, MPI_FLOAT, 1, 23, MPI_COMM_WORLD, &status);
    }
    else if (rank == 1)
    {
      MPI_Recv(buffer, 1, MPI_FLOAT, 0, 17, MPI_COMM_WORLD, &status);
      MPI_Send(buffer, 1, MPI_FLOAT, 0, 23, MPI_COMM_WORLD);
    }

  if (rank == 0)
  {
    // msg_transfer_time = ((finish - start) / (2 * number_of_messages)) * 1e6 ; // in microsec
    // printf("Time for one message: %f micro seconds.\n", msg_transfer_time);
  }

  MPI_Finalize();
}

program pingpong_bench

  use mpi_f08
  implicit none

  integer :: number_of_messages
  parameter (number_of_messages=50)

! ??? :: start, finish, msg_transfer_time
  type(MPI_Status) :: status
  real :: buffer(1)
  integer :: i, rank

  call MPI_Init()
  call MPI_Comm_rank(MPI_COMM_WORLD, rank)

     if (rank .eq. 0) then
        call MPI_Send(buffer, 1, MPI_REAL, 1, 17, MPI_COMM_WORLD)
        call MPI_Recv(buffer, 1, MPI_REAL, 1, 23, MPI_COMM_WORLD, status)
     else if (rank .eq. 1) then
        call MPI_Recv(buffer, 1, MPI_REAL, 0, 17, MPI_COMM_WORLD, status)
        call MPI_Send(buffer, 1, MPI_REAL, 0, 23, MPI_COMM_WORLD)
     end if

  if (rank .eq. 0) then
!    msg_transfer_time = ((finish - start) / (2 * number_of_messages)) * 1e6  ! in microsec
!    write(*,*) 'Time for one message:', msg_transfer_time, ' micro seconds'
  end if

  call MPI_Finalize()

end program

from mpi4py import MPI

number_of_messages = 50
buffer = 0.0
status = MPI.Status()

comm_world = MPI.COMM_WORLD
my_rank = comm_world.Get_rank()

if (my_rank == 0):
   comm_world.send(buffer, dest=1, tag=17)
   buffer = comm_world.recv(source=1, tag=23, status=status)
elif (my_rank == 1):
   buffer = comm_world.recv(source=0, tag=17, status=status)
   comm_world.send(buffer, dest=0, tag=23)

#if (my_rank == 0):
#   msg_transfer_time = ((finish - start) / (2 * number_of_messages)) * 1e6 # in microsec
#   print(f"Time for one message: {msg_transfer_time:f} micro seconds.")

Compile:

mpicc pingpong-bench.c -o pingpong-bench

mpif90 pingpong-bench.f90 -o pingpong-bench

Run:

mpirun -np 2 ./pingpong-bench

mpirun -np 2 python3 ./pingpong-bench.py

Expected output - What did you measure? Run it a couple of times to see run to run variations!

Time for one message: 0.440590 micro seconds.

Solution (please try to solve the exercise by yourself before looking at the solution)

Solution

#include <stdio.h>
#include <mpi.h>

#define number_of_messages 50

int main(int argc, char *argv[])
{
  int i, rank;
  float buffer[1];
  double start, finish, msg_transfer_time;
  MPI_Status status;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  start = MPI_Wtime();
  for (i = 1; i <= number_of_messages; i++)
  {
    if (rank == 0)
    {
      MPI_Send(buffer, 1, MPI_FLOAT, 1, 17, MPI_COMM_WORLD);
      MPI_Recv(buffer, 1, MPI_FLOAT, 1, 23, MPI_COMM_WORLD, &status);
    }
    else if (rank == 1)
    {
      MPI_Recv(buffer, 1, MPI_FLOAT, 0, 17, MPI_COMM_WORLD, &status);
      MPI_Send(buffer, 1, MPI_FLOAT, 0, 23, MPI_COMM_WORLD);
    }
  }
  finish = MPI_Wtime();

  if (rank == 0)
  {
    msg_transfer_time = ((finish - start) / (2 * number_of_messages)) * 1e6 ; // in microsec
    printf("Time for one message: %f micro seconds.\n", msg_transfer_time);
  }

  MPI_Finalize();
}

program pingpong_bench

  use mpi_f08
  implicit none

  integer :: number_of_messages
  parameter (number_of_messages=50)

  double precision :: start, finish, msg_transfer_time
  type(MPI_Status) :: status
  real :: buffer(1)
  integer :: i, rank

  call MPI_Init()
  call MPI_Comm_rank(MPI_COMM_WORLD, rank)

  start = MPI_Wtime()
  do i = 1, number_of_messages
    
     if (rank .eq. 0) then
        call MPI_Send(buffer, 1, MPI_REAL, 1, 17, MPI_COMM_WORLD)
        call MPI_Recv(buffer, 1, MPI_REAL, 1, 23, MPI_COMM_WORLD, status)
     else if (rank .eq. 1) then
        call MPI_Recv(buffer, 1, MPI_REAL, 0, 17, MPI_COMM_WORLD, status)
        call MPI_Send(buffer, 1, MPI_REAL, 0, 23, MPI_COMM_WORLD)
     end if

  end do
  finish = MPI_Wtime()

  if (rank .eq. 0) then
     msg_transfer_time = ((finish - start) / (2 * number_of_messages)) * 1e6  ! in microsec
     write(*,*) 'Time for one message:', msg_transfer_time, ' micro seconds'
  end if

  call MPI_Finalize()

end program

Solution with send:

from mpi4py import MPI

number_of_messages = 50
buffer = 0.0
status = MPI.Status()

comm_world = MPI.COMM_WORLD
my_rank = comm_world.Get_rank()

start = MPI.Wtime()
for i in range(1, number_of_messages+1):
   if (my_rank == 0):
      comm_world.send(buffer, dest=1, tag=17)
      buffer = comm_world.recv(source=1, tag=23, status=status)
   elif (my_rank == 1):
      buffer = comm_world.recv(source=0, tag=17, status=status)
      comm_world.send(buffer, dest=0, tag=23)

finish = MPI.Wtime()

if (my_rank == 0):
   msg_transfer_time = ((finish - start) / (2 * number_of_messages)) * 1e6 # in microsec
   print(f"Time for one message: {msg_transfer_time:f} micro seconds.")

Solution with Send and numpy:

import numpy as np
from mpi4py import MPI

number_of_messages = 50
buffer = np.array([0], dtype='f')
status = MPI.Status()

comm_world = MPI.COMM_WORLD
my_rank = comm_world.Get_rank()

start = MPI.Wtime()
for i in range(1, number_of_messages+1):
   if (my_rank == 0):
      comm_world.Send((buffer,1,MPI.FLOAT), dest=1, tag=17)
      comm_world.Recv((buffer,1,MPI.FLOAT), source=1, tag=23, status=status)
   elif (my_rank == 1):
      comm_world.Recv((buffer,1,MPI.FLOAT), source=0, tag=17, status=status)
      comm_world.Send((buffer,1,MPI.FLOAT), dest=0, tag=23)

finish = MPI.Wtime()

if (my_rank == 0):
   msg_transfer_time = ((finish - start) / (2 * number_of_messages)) * 1e6 # in microsec
   print(f"Time for one message: {msg_transfer_time:f} micro seconds.")

4. warmup

Exercise

Don’t forget to warm up: do one ping pong before starting the timed loop.
Modify the code below accordingly:

#include <stdio.h>
#include <mpi.h>

#define number_of_messages 50

int main(int argc, char *argv[])
{
  int i, rank;
  float buffer[1];
  double start, finish, msg_transfer_time;
  MPI_Status status;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  start = MPI_Wtime();
  for (i = 1; i <= number_of_messages; i++)
  {
    if (rank == 0)
    {
      MPI_Send(buffer, 1, MPI_FLOAT, 1, 17, MPI_COMM_WORLD);
      MPI_Recv(buffer, 1, MPI_FLOAT, 1, 23, MPI_COMM_WORLD, &status);
    }
    else if (rank == 1)
    {
      MPI_Recv(buffer, 1, MPI_FLOAT, 0, 17, MPI_COMM_WORLD, &status);
      MPI_Send(buffer, 1, MPI_FLOAT, 0, 23, MPI_COMM_WORLD);
    }
  }
  finish = MPI_Wtime();

  if (rank == 0)
  {
    msg_transfer_time = ((finish - start) / (2 * number_of_messages)) * 1e6 ; // in microsec
    printf("Time for one message: %f micro seconds.\n", msg_transfer_time);
  }

  MPI_Finalize();
}

program pingpong_bench

  use mpi_f08
  implicit none

  integer :: number_of_messages
  parameter (number_of_messages=50)

  double precision :: start, finish, msg_transfer_time
  type(MPI_Status) :: status
  real :: buffer(1)
  integer :: i, rank

  call MPI_Init()
  call MPI_Comm_rank(MPI_COMM_WORLD, rank)

  start = MPI_Wtime()
  do i = 1, number_of_messages
    
     if (rank .eq. 0) then
        call MPI_Send(buffer, 1, MPI_REAL, 1, 17, MPI_COMM_WORLD)
        call MPI_Recv(buffer, 1, MPI_REAL, 1, 23, MPI_COMM_WORLD, status)
     else if (rank .eq. 1) then
        call MPI_Recv(buffer, 1, MPI_REAL, 0, 17, MPI_COMM_WORLD, status)
        call MPI_Send(buffer, 1, MPI_REAL, 0, 23, MPI_COMM_WORLD)
     end if

  end do
  finish = MPI_Wtime()

  if (rank .eq. 0) then
     msg_transfer_time = ((finish - start) / (2 * number_of_messages)) * 1e6  ! in microsec
     write(*,*) 'Time for one message:', msg_transfer_time, ' micro seconds'
  end if

  call MPI_Finalize()

end program

from mpi4py import MPI

number_of_messages = 50
buffer = 0.0
status = MPI.Status()

comm_world = MPI.COMM_WORLD
my_rank = comm_world.Get_rank()

start = MPI.Wtime()
for i in range(1, number_of_messages+1):
   if (my_rank == 0):
      comm_world.send(buffer, dest=1, tag=17)
      buffer = comm_world.recv(source=1, tag=23, status=status)
   elif (my_rank == 1):
      buffer = comm_world.recv(source=0, tag=17, status=status)
      comm_world.send(buffer, dest=0, tag=23)

finish = MPI.Wtime()

if (my_rank == 0):
   msg_transfer_time = ((finish - start) / (2 * number_of_messages)) * 1e6 # in microsec
   print(f"Time for one message: {msg_transfer_time:f} micro seconds.")

Compile:

mpicc pingpong-bench1.c -o pingpong-bench1

mpif90 pingpong-bench1.f90 -o pingpong-bench1

Run:

mpirun -np 2 ./pingpong-bench1

mpirun -np 2 python3 ./pingpong-bench1.py

Expected output - What did you measure? Run it a couple of times to see run to run variations!

Time for one message: 0.134900 micro seconds.
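
As a rough back-of-the-envelope estimate, the two example outputs shown in steps 3 and 4 can be combined to guess how expensive the very first ping pong was. The sketch below is plain Python; it assumes that all later ping pongs cost the same in both runs, which is only approximately true because the two numbers come from separate runs. The variable names are made up for this illustration.

```python
number_of_messages = 50      # ping pongs in the timed loop
with_first_us = 0.440590     # example output of 3. timing (per message)
without_first_us = 0.134900  # example output of 4. warmup (per message)

# Total time of the timed loop in step 3, which includes the first ping pong.
total_with_first_us = with_first_us * 2 * number_of_messages

# Estimated time of the 49 later ping pongs at the warmed-up rate.
later_us = without_first_us * 2 * (number_of_messages - 1)

# Whatever is left over is attributed to the first ping pong.
first_ping_pong_us = total_with_first_us - later_us
print(f"Estimated first ping pong: {first_ping_pong_us:.2f} micro seconds")
```

Under these assumptions a single unwarmed ping pong accounts for most of the time measured in step 3, which is why it is excluded from the timed loop.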

Solution (please try to solve the exercise by yourself before looking at the solution)

Solution

#include <stdio.h>
#include <mpi.h>

#define number_of_messages 50

int main(int argc, char *argv[])
{
  int i, rank;
  float buffer[1];
  double start, finish, msg_transfer_time;
  MPI_Status status;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  if (rank == 0)
  {
    MPI_Send(buffer, 1, MPI_FLOAT, 1, 17, MPI_COMM_WORLD);
    MPI_Recv(buffer, 1, MPI_FLOAT, 1, 23, MPI_COMM_WORLD, &status);
  }
  else if (rank == 1)
  {
    MPI_Recv(buffer, 1, MPI_FLOAT, 0, 17, MPI_COMM_WORLD, &status);
    MPI_Send(buffer, 1, MPI_FLOAT, 0, 23, MPI_COMM_WORLD);
  }

  start = MPI_Wtime();
  for (i = 1; i <= number_of_messages; i++)
  {
    if (rank == 0)
    {
      MPI_Send(buffer, 1, MPI_FLOAT, 1, 17, MPI_COMM_WORLD);
      MPI_Recv(buffer, 1, MPI_FLOAT, 1, 23, MPI_COMM_WORLD, &status);
    }
    else if (rank == 1)
    {
      MPI_Recv(buffer, 1, MPI_FLOAT, 0, 17, MPI_COMM_WORLD, &status);
      MPI_Send(buffer, 1, MPI_FLOAT, 0, 23, MPI_COMM_WORLD);
    }
  }
  finish = MPI_Wtime();

  if (rank == 0)
  {
    msg_transfer_time = ((finish - start) / (2 * number_of_messages)) * 1e6 ; // in microsec
    printf("Time for one message: %f micro seconds.\n", msg_transfer_time);
  }

  MPI_Finalize();
}

program pingpong_bench

  use mpi_f08
  implicit none

  integer :: number_of_messages
  parameter (number_of_messages=50)

  double precision :: start, finish, msg_transfer_time
  type(MPI_Status) :: status
  real :: buffer(1)
  integer :: i, rank

  call MPI_Init()
  call MPI_Comm_rank(MPI_COMM_WORLD, rank)

     if (rank .eq. 0) then
        call MPI_Send(buffer, 1, MPI_REAL, 1, 17, MPI_COMM_WORLD)
        call MPI_Recv(buffer, 1, MPI_REAL, 1, 23, MPI_COMM_WORLD, status)
     else if (rank .eq. 1) then
        call MPI_Recv(buffer, 1, MPI_REAL, 0, 17, MPI_COMM_WORLD, status)
        call MPI_Send(buffer, 1, MPI_REAL, 0, 23, MPI_COMM_WORLD)
     end if

  start = MPI_Wtime()
  do i = 1, number_of_messages
    
     if (rank .eq. 0) then
        call MPI_Send(buffer, 1, MPI_REAL, 1, 17, MPI_COMM_WORLD)
        call MPI_Recv(buffer, 1, MPI_REAL, 1, 23, MPI_COMM_WORLD, status)
     else if (rank .eq. 1) then
        call MPI_Recv(buffer, 1, MPI_REAL, 0, 17, MPI_COMM_WORLD, status)
        call MPI_Send(buffer, 1, MPI_REAL, 0, 23, MPI_COMM_WORLD)
     end if

  end do
  finish = MPI_Wtime()

  if (rank .eq. 0) then
     msg_transfer_time = ((finish - start) / (2 * number_of_messages)) * 1e6  ! in microsec
     write(*,*) 'Time for one message:', msg_transfer_time, ' micro seconds'
  end if

  call MPI_Finalize()

end program

from mpi4py import MPI

number_of_messages = 50
buffer = 0.0
status = MPI.Status()

comm_world = MPI.COMM_WORLD
my_rank = comm_world.Get_rank()

if (my_rank == 0):
   comm_world.send(buffer, dest=1, tag=17)
   buffer = comm_world.recv(source=1, tag=23, status=status)
elif (my_rank == 1):
   buffer = comm_world.recv(source=0, tag=17, status=status)
   comm_world.send(buffer, dest=0, tag=23)

start = MPI.Wtime()
for i in range(1, number_of_messages+1):
   if (my_rank == 0):
      comm_world.send(buffer, dest=1, tag=17)
      buffer = comm_world.recv(source=1, tag=23, status=status)
   elif (my_rank == 1):
      buffer = comm_world.recv(source=0, tag=17, status=status)
      comm_world.send(buffer, dest=0, tag=23)

finish = MPI.Wtime()

if (my_rank == 0):
   msg_transfer_time = ((finish - start) / (2 * number_of_messages)) * 1e6 # in microsec
   print(f"Time for one message: {msg_transfer_time:f} micro seconds.")

Solution with Send and numpy:

import numpy as np
from mpi4py import MPI

number_of_messages = 50
buffer = np.array([0], dtype='f')
status = MPI.Status()

comm_world = MPI.COMM_WORLD
my_rank = comm_world.Get_rank()

if (my_rank == 0):
   comm_world.Send((buffer,1,MPI.FLOAT), dest=1, tag=17)
   comm_world.Recv((buffer,1,MPI.FLOAT), source=1, tag=23, status=status)
elif (my_rank == 1):
   comm_world.Recv((buffer,1,MPI.FLOAT), source=0, tag=17, status=status)
   comm_world.Send((buffer,1,MPI.FLOAT), dest=0, tag=23)

start = MPI.Wtime()
for i in range(1, number_of_messages+1):
   if (my_rank == 0):
      comm_world.Send((buffer,1,MPI.FLOAT), dest=1, tag=17)
      comm_world.Recv((buffer,1,MPI.FLOAT), source=1, tag=23, status=status)
   elif (my_rank == 1):
      comm_world.Recv((buffer,1,MPI.FLOAT), source=0, tag=17, status=status)
      comm_world.Send((buffer,1,MPI.FLOAT), dest=0, tag=23)

finish = MPI.Wtime()

if (my_rank == 0):
   msg_transfer_time = ((finish - start) / (2 * number_of_messages)) * 1e6 # in microsec
   print(f"Time for one message: {msg_transfer_time:f} micro seconds.")

5. finish - who wins the race?

Please do a couple of time measurements - run a couple of times each and note down your fastest result for:

MPI_Send - including the first ping pong in the time measurement (result of 3. timing)
MPI_Send - excluding the first ping pong from the time measurements (result of 4. warmup)
MPI_Ssend - including the first ping pong in the time measurement (you’ll have to edit/copy from above)
MPI_Ssend - excluding the first ping pong from the time measurements (you’ll have to edit/copy from above)
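
Picking the fastest result per variant can be done with a few lines of plain Python. The function name `fastest_per_variant`, the dictionary keys and all numbers below are placeholders invented for this sketch, not real measurements; replace them with your own values in micro seconds.

```python
def fastest_per_variant(measurements):
    """Map each variant name to the smallest (fastest) time measured for it."""
    return {variant: min(times) for variant, times in measurements.items()}


# Placeholder values only (micro seconds per message), three runs per variant.
measurements = {
    "MPI_Send, including first ping pong": [3.0, 2.0, 2.5],
    "MPI_Send, excluding first ping pong": [1.5, 1.0, 1.25],
    "MPI_Ssend, including first ping pong": [4.0, 3.5, 3.75],
    "MPI_Ssend, excluding first ping pong": [2.0, 1.75, 2.25],
}

fastest = fastest_per_variant(measurements)
for variant, best in fastest.items():
    print(f"{variant}: {best} micro seconds")
```

Taking the minimum rather than the mean is a common choice for latency benchmarks, because disturbances from the system can only make a run slower, never faster.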

You can do these measurements on different systems and in different environments, e.g.:

VSC JupyterHub using VSC-5 or VSC-4
Submitting jobs to VSC-5 or VSC-4 and playing around with pinning (see previous 01_hello.ipynb)
- put both processes on the same NUMA domain
- put the two processes on different NUMA domains but still on the same CPU/socket
- put the two processes on different CPUs/sockets on the same node
- put them on different nodes and both on CPU/socket 0
- put them on different nodes and both on CPU/socket 1
- put them on different nodes and one on CPU/socket 0 and the other on CPU/socket 1
When submitting jobs, you can also switch to another MPI library (e.g. Intel-MPI) and do the same.
Run the ping pong benchmark on your own laptop and/or on another HPC system you have access to.

Record your results below; we would like to see who wins the race!
(Copy the cell below to record all your measurements on different systems and in different environments.)

First name:     ________
Measurement on: ________
Programming language: ________
time for 1 ping in micro seconds with     MPI_Send     MPI_Ssend
including first ping pong  in  timing     ________     ________
excluding first ping pong from timing     ________     ________

Keypoints

Blocking point-to-point communication
Different send modes in MPI: MPI_Send, MPI_Ssend, MPI_Bsend, MPI_Rsend
Explore point-to-point communication with two MPI processes playing ping pong