HIP-Fortran

Cross-Platform GPU Acceleration in Fortran

AMD's HIP provides an API for accelerating C/C++ applications on both Nvidia and AMD GPUs. Fluid Numerics provides HIP-Fortran to give Fortran developers a way to call HIP routines directly from Fortran applications.

Download and contribute to the open-source HIP-Fortran repository from Fluid Numerics.

Learn how to install and work with hip-fortran through our hands-on codelabs.

Community

HIP-Fortran is in its infancy and Fluid Numerics is ready to help users get started. We are currently focusing on building tests to support Continuous Integration and creating live training modules, documentation, and codelabs. If you have requests for more HIP API exposure, would like to report bugs, or need guidance on implementation not covered in the documentation or tests, open a new issue on the GitHub page.

Support HIP-Fortran

Fluid Numerics offers consulting and hands-on coding to assist you and your team with code refactoring, GPU porting, GPU optimization, and multi-GPU acceleration. When you engage us for support, a significant portion of our service fee is reinvested into R&D accounts at Fluid Numerics to allow us to continue to support the open-source HIP-Fortran.

As you build expertise in GPU programming and in working with HIP-Fortran, we want to hear from you! Fluid Numerics is ready to give you a virtual stage to share your experience and lessons learned with the rest of the HPC community.

HIP-Fortran is meant to be a layer that allows Fortran developers to accelerate applications with both AMD and Nvidia GPUs. HIP-Fortran is regularly tested on Nvidia GPU platforms, but consistent testing on AMD platforms is still lacking. Support this project by providing access to AMD GPU systems or donating AMD GPU cards to Fluid Numerics to set up a hip-fortran CI server.

Fluid Numerics incurs expenses for time & labor and cloud computing to maintain and support the HIP-Fortran repository, codelabs, tutorials, demos, CI infrastructure, and mini-app testing. Consider donating funds to Fluid Numerics that will be reserved for these expenses.

Usage Example

The simple example program below demonstrates how to:

  • Create device (GPU) pointers

  • Copy data from host to device and device to host

  • Launch a simple HIP kernel from Fortran

  • Build a hip-fortran application

In this example, we have three files:

  • main.f03 : The main Fortran program

  • my_module.f03 : A Fortran module that defines the kernel interface

  • my_module_hip.cpp : The C++ code that defines the HIP Kernel and the kernel wrapper to launch the kernel

Assuming you:

  • Have installed hip-fortran under /opt/hip-fortran

  • Are using the gfortran compiler

  • Are using the included modulefile

  • Have the hipcc compiler and all necessary dependencies

you can build this application with:

gfortran ${HIPFORTRAN_INCLUDE} -c my_module.f03

gfortran ${HIPFORTRAN_INCLUDE} -c main.f03

hipcc -c my_module_hip.cpp

hipcc main.o my_module.o my_module_hip.o ${HIPFORTRAN_INCLUDE} ${HIPFORTRAN_LIB} -lgfortran -o hip_test

main.f03

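PROGRAM main

  ! iso_c_binding provides c_ptr, c_null_ptr, c_loc, and c_int
  USE iso_c_binding
  USE hip_fortran
  USE my_module

  IMPLICIT NONE

  ! Arrays are indexed 0:N in each dimension, giving (N+1)*(N+1) elements
  INTEGER(c_int), PARAMETER :: N = 10
  REAL(8), ALLOCATABLE, TARGET :: a(:,:)
  REAL(8), ALLOCATABLE, TARGET :: b(:,:)
  TYPE(c_ptr) :: a_dev = c_null_ptr
  TYPE(c_ptr) :: b_dev = c_null_ptr

  ! Allocate and initialize host arrays
  ALLOCATE(a(0:N,0:N), b(0:N,0:N))
  a = 10.0D0
  b = 0.0D0

  ! Allocate device arrays (SIZEOF is a compiler extension, supported by
  ! gfortran, that returns the size in bytes)
  CALL hfMalloc(a_dev, SIZEOF(a))
  CALL hfMalloc(b_dev, SIZEOF(b))

  ! Copy host memory to device memory
  CALL hfMemcpy(a_dev, c_loc(a), SIZEOF(a), hipMemcpyHostToDevice)

  ! Launch the kernel through the C wrapper; the kernel sets b = 2*a on the device
  CALL myRoutine(a_dev, b_dev, N)

  ! Copy the result from device memory back to host memory
  CALL hfMemcpy(c_loc(b), b_dev, SIZEOF(b), hipMemcpyDeviceToHost)

  ! Release device and host memory
  CALL hfFree(a_dev)
  CALL hfFree(b_dev)
  DEALLOCATE(a, b)

END PROGRAM main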

my_module.f03

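MODULE my_module

  IMPLICIT NONE

  INTERFACE
    ! Binds to the extern "C" function myroutine defined in my_module_hip.cpp
    SUBROUTINE myRoutine(a, b, N) BIND(C, name="myroutine")
      USE iso_c_binding
      IMPLICIT NONE
      ! Passed by reference, so the C side receives double** for each pointer
      TYPE(c_ptr) :: a, b
      ! Passed by value, so the C side receives a plain int
      INTEGER(c_int), VALUE :: N
    END SUBROUTINE myRoutine
  END INTERFACE

END MODULE my_module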

my_module_hip.cpp

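#include <hip/hip_runtime.h>

// Each thread doubles one element of a and stores the result in b
__global__ void myroutine_hipkernel(double *a, double *b, int n){

  size_t i = blockIdx.x*blockDim.x + threadIdx.x;
  size_t count = (size_t)(n+1)*(n+1);
  // Guard against threads past the end of the array
  if ( i < count ) {
    b[i] = 2.0*a[i];
  }

}

extern "C"
{
  // Wrapper called from Fortran; a and b arrive as pointers to the device pointers
  void myroutine(double **a, double **b, int n)
  {
    int threadPerBlock = 256;
    // Round up so that every element is covered by a thread
    int blockCount = ((n+1)*(n+1) + threadPerBlock - 1)/threadPerBlock;

    hipLaunchKernelGGL((myroutine_hipkernel), dim3(blockCount), dim3(threadPerBlock), 0, 0, *a, *b, n);
  }
}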