fftMPI documentation

API overview and simple example code

The fftMPI library has 4 classes: FFT3d, FFT2d, Remap3d, Remap2d. The FFT classes perform 3d and 2d FFTs. The Remap classes perform a data remap, which communicates and reorders the data that is distributed 3d and 2d arrays across processors. They can be used if you want to only rearrange data, but not perform an FFT.

The basic way to use either of the FFT classes is to instantiate the class and call setup() once to define how the FFT grid is distributed across processors for both input to and output from the FFT. Then invoke the compute() method as many times as needed to perform forward and/or inverse FFTs. Destruct the class when you are done using it.

Example code to do all of this for 3d FFTs in C++ is shown below. This code is a copy of the test/simple.cpp file. There are also equivalent files in the test dir for C, Fortran, and Python: simple_c.c, simple_f90.f90, and simple.py.

If a different size FFT or different grid distribution is needed, the FFT classes can be intstantiated as many times as needed.

Using the Remap classes is similar, where the remap() method replaces compute().

The choice to operate on single versus double precision data must be made at compile time, as explained on the compile doc page.

These are the methods that can be called for either the FFT3d or FFT2d classes:

See these apps in the test dir for examples of how all these methods are called from different languages:

These are methods that can be called for either the Remap3d or Remap2d classes:

Simple example code

These files in the text dir of the fftMPI distribution compute a forward/inverse FFT on any number of procs. The size of the FFT is hardcoded at the top of the file. Each file is about 150 lines with comments:

You should be able to compile/run any of them as as follows:

cd test make simple mpirun -np 4 simple # run C++ example on 4 procs simple_c # run C example on 1 proc mpirun -np 10 simple_f90 # run Fortran example on 10 procs mpirun -np 6 python simple.py # run Python example on 6 procs

You must link to the fftMPI library built for double-precision FFTs. In the simple source codes, you could replace "3d" by "2d" for 2d FFTs.

The C++ code is in the next section.

// Compute a forward/inverse double precision complex FFT using fftMPI
// change FFT size by editing 3 "FFT size" lines
// run on any number of procs 
// Run syntax:
// % simple               # run in serial
// % mpirun -np 4 simple  # run in parallel 
#include "fft3d.h" 
using namespace FFTMPI_NS; 
// FFT size 
#define NFAST 128
#define NMID 128
#define NSLOW 128 
// precision-dependent settings 
int precision = 1;
int precision = 2;
// main program 
int main(int narg, char **args)
  // setup MPI 
  MPI_Comm world = MPI_COMM_WORLD; 
  int me,nprocs;
  // instantiate FFT 
  FFT3d *fft = new FFT3d(world,precision); 
  // simple algorithm to factor Nprocs into roughly cube roots 
  int npfast,npmid,npslow; 
  npfast = (int) pow(nprocs,1.0/3.0);
  while (npfast < nprocs) {
    if (nprocs % npfast == 0) break;
  int npmidslow = nprocs / npfast;
  npmid = (int) sqrt(npmidslow);
  while (npmid < npmidslow) {
    if (npmidslow % npmid == 0) break;
  npslow = nprocs / npfast / npmid; 
  // partition grid into Npfast x Npmid x Npslow bricks 
  int nfast,nmid,nslow;
  int ilo,ihi,jlo,jhi,klo,khi; 
  nfast = NFAST;
  nmid = NMID;
  nslow = NSLOW; 
  int ipfast = me % npfast;
  int ipmid = (me/npfast) % npmid;
  int ipslow = me / (npfast*npmid); 
  ilo = (int) 1.0*ipfast*nfast/npfast;
  ihi = (int) 1.0*(ipfast+1)*nfast/npfast - 1;
  jlo = (int) 1.0*ipmid*nmid/npmid;
  jhi = (int) 1.0*(ipmid+1)*nmid/npmid - 1;
  klo = (int) 1.0*ipslow*nslow/npslow;
  khi = (int) 1.0*(ipslow+1)*nslow/npslow - 1; 
  // setup FFT, could replace with tune() 
  int fftsize,sendsize,recvsize;
  // tune FFT, could replace with setup() 
  //          ilo,ihi,jlo,jhi,klo,khi,ilo,ihi,jlo,jhi,klo,khi,
  //          0,fftsize,sendsize,recvsize,0,5,10.0,0); 
  // initialize each proc's local grid
  // global initialization is specific to proc count 
  FFT_SCALAR *work = (FFT_SCALAR *) malloc(2*fftsize*sizeof(FFT_SCALAR)); 
  int n = 0;
  for (int k = klo; k <= khi; k++) {
    for (int j = jlo; j <= jhi; j++) {
      for (int i = ilo; i <= ihi; i++) {
        work[n] = (double) n;
        work[n] = (double) n;
  // perform 2 FFTs 
  double timestart = MPI_Wtime();
  fft->compute(work,work,1);        // forward FFT
  fft->compute(work,work,-1);       // inverse FFT
  double timestop = MPI_Wtime(); 
  if (me == 0) {
    printf("Two %dx%dx%d FFTs on %d procs as %dx%dx%d grid\n",
    printf("CPU time = %g secs\n",timestop-timestart);
  // find largest difference between initial/final values
  // should be near zero 
  n = 0;
  double mydiff = 0.0;
  for (int k = klo; k <= khi; k++) {
    for (int j = jlo; j <= jhi; j++) {
      for (int i = ilo; i <= ihi; i++) {
        if (fabs(work[n]-n) > mydiff) mydiff = fabs(work[n]-n);
        if (fabs(work[n]-n) > mydiff) mydiff = fabs(work[n]-n);
  double alldiff;
  if (me == 0) printf("Max difference in initial/final values = %g\n",alldiff); 
  // clean up 
  delete fft;