HIP-Fortran
Cross-Platform GPU Acceleration in Fortran
AMD's HIP provides an API for accelerating C/C++ applications on both Nvidia and AMD GPUs. Fluid Numerics provides HIP-Fortran to give Fortran developers a way to call HIP routines directly from Fortran applications.
Download and contribute to the open-source HIP-Fortran repository from Fluid Numerics.
Learn how to install and work with hip-fortran through our hands-on codelabs.
Community
HIP-Fortran is in its infancy and Fluid Numerics is ready to help users get started. We are currently focusing on building tests to support Continuous Integration and creating live training modules, documentation, and codelabs. If you have requests for more HIP API exposure, would like to report bugs, or need guidance on implementation not covered in the documentation or tests, open a new issue on the GitHub page.
Support HIP-Fortran
Fluid Numerics offers consulting and hands-on coding to assist you and your team with code refactoring, GPU porting, GPU optimization, and multi-GPU acceleration. When you engage us for support, a significant portion of our service fee is reinvested into R&D accounts at Fluid Numerics to allow us to continue to support the open-source HIP-Fortran.
As you build expertise in GPU programming and working with HIP-Fortran, we want to hear from you! Fluid Numerics is ready to help give you a virtual stage to share your experience and lessons learned with the rest of the HPC community.
HIP-Fortran is meant to be a layer that allows Fortran developers to accelerate applications with both AMD and Nvidia GPUs. HIP-Fortran is regularly tested on Nvidia GPU platforms, but consistent testing on AMD platforms is still lacking. Support this project by providing access to AMD GPU systems or donating AMD GPU cards to Fluid Numerics to set up a hip-fortran CI server.
Fluid Numerics incurs expenses for time & labor and cloud computing to maintain and support the HIP-Fortran repository, codelabs, tutorials, demos, CI infrastructure, and mini-app testing. Consider donating funds to Fluid Numerics that will be reserved for these expenses.
Usage Example
The following simple example program demonstrates how to:
Create device (GPU) pointers
Copy data from host to device and device to host
Launch a simple HIP kernel from Fortran
Build a hip-fortran application
In this example, we have three files:
main.f03 : The main Fortran program
my_module.f03 : A Fortran module that defines the kernel interface
my_module_hip.cpp : The C++ code that defines the HIP Kernel and the kernel wrapper to launch the kernel
Assuming you
have installed hip-fortran under /opt/hip-fortran,
are using the gfortran compiler,
are using the included modulefile, and
have the hipcc compiler and all necessary dependencies,
you can build this application with
gfortran ${HIPFORTRAN_INCLUDE} -c my_module.f03
gfortran ${HIPFORTRAN_INCLUDE} -c main.f03
hipcc -c my_module_hip.cpp
hipcc -lgfortran main.o my_module.o my_module_hip.o ${HIPFORTRAN_INCLUDE} ${HIPFORTRAN_LIB} -o hip_test
main.f03
PROGRAM main
USE iso_c_binding
USE hip_fortran
USE my_module
IMPLICIT NONE
INTEGER(c_int), PARAMETER :: N = 10
REAL(8), ALLOCATABLE, TARGET :: a(:,:)
REAL(8), ALLOCATABLE, TARGET :: b(:,:)
TYPE(c_ptr) :: a_dev = c_null_ptr
TYPE(c_ptr) :: b_dev = c_null_ptr
! Allocate and initialize host arrays, (N+1) x (N+1) elements each
ALLOCATE(a(0:N,0:N), b(0:N,0:N))
a = 10.0D0
b = 0.0D0
! Allocate device arrays
CALL hfMalloc(a_dev, SIZEOF(a))
CALL hfMalloc(b_dev, SIZEOF(b))
! Copy host memory to device memory
CALL hfMemcpy(a_dev, c_loc(a), SIZEOF(a), hipMemcpyHostToDevice)
! Launch the kernel through the C wrapper (b = 2*a on the device)
CALL myRoutine(a_dev,b_dev,N)
! Copy the result from device memory back to host memory
CALL hfMemcpy(c_loc(b), b_dev, SIZEOF(b), hipMemcpyDeviceToHost)
! Free device and host memory
CALL hfFree(a_dev)
CALL hfFree(b_dev)
DEALLOCATE(a, b)
END PROGRAM main
my_module.f03
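MODULE my_module
IMPLICIT NONE
INTERFACE
! Binds to the extern "C" wrapper myroutine defined in my_module_hip.cpp
SUBROUTINE myRoutine(a,b,N) BIND(c, name="myroutine")
USE iso_c_binding
IMPLICIT NONE
! No VALUE attribute: the C side receives the address of each device pointer
TYPE(c_ptr) :: a, b
INTEGER(c_int), VALUE :: N
END SUBROUTINE myRoutine
END INTERFACE
END MODULE my_module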
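The interface declares a and b as TYPE(c_ptr) without the VALUE attribute, so Fortran passes the address of each pointer variable. That is why the C++ wrapper takes double ** and dereferences once. The following host-only sketch illustrates this calling convention without a GPU; the function name first_element_doubled is hypothetical and is not part of hip-fortran.

```cpp
#include <cassert>

// Hypothetical stand-in for the wrapper's argument handling: the argument is
// the address of a variable that holds a pointer, so it is dereferenced once
// to recover the pointer itself.
extern "C" double first_element_doubled(double **a) {
    double *data = *a;        // the pointer stored in the caller's variable
    return 2.0 * data[0];     // same arithmetic as the kernel, on one element
}
```

A caller holding double *handle would pass &handle, which mirrors what the Fortran compiler does for a by-reference TYPE(c_ptr) argument.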
my_module_hip.cpp
#include <hip/hip_runtime.h>
__global__ void myroutine_hipkernel(double *a, double *b, int n){
size_t i = blockIdx.x*blockDim.x + threadIdx.x;
if ( i < (n+1)*(n+1) ) {
b[i] = 2.0*a[i];
}
}
extern "C"
{
void myroutine(double **a, double **b, int n)
{
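int threadPerBlock = 256;
// Round up so that all (n+1)*(n+1) elements are covered by a thread
int blockCount = ((n+1)*(n+1) + threadPerBlock - 1)/threadPerBlock;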
hipLaunchKernelGGL((myroutine_hipkernel), dim3(blockCount), dim3(threadPerBlock), 0, 0, *a, *b, n);
}
}
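The number of blocks in the launch must be large enough that blocks times threads per block is at least (n+1)*(n+1); with n = 10 there are 121 elements, and plain integer division by 256 would give zero blocks. The host-only sketch below emulates the launch with ordinary loops to show the effect of rounding up. The helper names ceil_div and emulate_launch are hypothetical and used only for illustration; no HIP calls are involved.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Ceiling division: smallest block count with blocks * threads >= total.
int ceil_div(int total, int threads_per_block) {
    return (total + threads_per_block - 1) / threads_per_block;
}

// Host-only stand-in for a kernel launch: visits every (block, thread) pair
// and applies the same guarded update as the kernel, b[i] = 2*a[i].
// Returns the number of elements that were written.
int emulate_launch(const std::vector<double>& a, std::vector<double>& b,
                   int n, int block_count, int threads_per_block) {
    const std::size_t total = static_cast<std::size_t>(n + 1) * (n + 1);
    int written = 0;
    for (int block = 0; block < block_count; ++block) {
        for (int thread = 0; thread < threads_per_block; ++thread) {
            const std::size_t i =
                static_cast<std::size_t>(block) * threads_per_block + thread;
            if (i < total) {   // same bounds guard as the kernel
                b[i] = 2.0 * a[i];
                ++written;
            }
        }
    }
    return written;
}
```

Calling emulate_launch with ceil_div(121, 256) blocks writes all 121 elements, while a block count of 121 / 256 writes none. The bounds guard inside the loop is what makes the rounded-up launch safe for the surplus threads.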