Aditya Ujeniya's blog

A beginner's perspective and experience in the world of HPC

Content

Power/Frequency characteristics of compute-bound codes on A40, A100, and H100 GPUs with different data initialization

Hello readers, today we talk about the power and frequency characteristics of compute-bound code, particularly SGEMM and DGEMM codes, with a focus on different data initialization techniques. GEMM is an acronym for General Matrix-Matrix multiplication. DGEMM (Double-Precision GEMM) involves 64-bit data (FP64, the double datatype): it uses input matrices of FP64 datatype  […]
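As a rough illustration of what such a compute-bound DGEMM experiment can look like, here is a minimal cuBLAS sketch that initializes one input matrix with random values and the other with zeros, then runs the kernel repeatedly. The matrix size, the initialization choices, and the iteration count are assumptions made for this sketch, not values taken from the post.

```cuda
#include <cstdlib>
#include <cuda_runtime.h>
#include <cublas_v2.h>

int main() {
    const int N = 4096;                             // assumed square matrix size
    const size_t bytes = (size_t)N * N * sizeof(double);

    // Two different initializations for comparison: pseudo-random vs. all zeros.
    double *hA = (double *)malloc(bytes);
    double *hB = (double *)malloc(bytes);
    for (size_t i = 0; i < (size_t)N * N; ++i) {
        hA[i] = (double)rand() / RAND_MAX;          // random initialization
        hB[i] = 0.0;                                // zero initialization
    }

    double *dA, *dB, *dC;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);
    const double alpha = 1.0, beta = 0.0;

    // Repeat the DGEMM so that power and clock readings have time to settle.
    for (int iter = 0; iter < 100; ++iter) {
        cublasDgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                    N, N, N, &alpha, dA, N, dB, N, &beta, dC, N);
    }
    cudaDeviceSynchronize();

    cublasDestroy(handle);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    free(hA); free(hB);
    return 0;
}
```

Power draw and clock frequency during the loop can then be sampled externally, for example with nvidia-smi or NVML.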

CUDA-Aware-MPI: Part 4: Optimizing strided data communication between GPUs

Hello readers, this is the fourth part of the 4-part CUDA-Aware-MPI series. In this blog series, we will use only NVIDIA GPGPUs and assume that the reader knows the basics of CUDA-Aware-MPI from the official blog post: https://developer.nvidia.com/blog/introduction-cuda-aware-mpi/ [1]. We also assume that the reader knows about solving the 2D Poisson equation. For this, we are going […]
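Since Part 4 deals with strided data, a small baseline sketch may help show what "strided" means here: the left and right boundary columns of a row-major 2D subdomain are not contiguous in memory. One standard way to describe them, shown below with hypothetical grid sizes, is an MPI derived datatype (MPI_Type_vector) whose buffer lives in GPU memory and is handed directly to a CUDA-aware MPI library. This is only an illustrative baseline, not necessarily the optimization the post arrives at.

```cuda
#include <mpi.h>
#include <cuda_runtime.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // A 2D subdomain stored row-major on the GPU: ny rows of nx doubles.
    // Its left/right boundary columns are strided (stride = nx elements).
    const int nx = 1024, ny = 1024;                 // assumed subdomain size
    double *dgrid;
    cudaMalloc(&dgrid, (size_t)nx * ny * sizeof(double));
    cudaMemset(dgrid, 0, (size_t)nx * ny * sizeof(double));

    // Derived datatype for one column: ny blocks of 1 double, stride nx.
    MPI_Datatype column;
    MPI_Type_vector(ny, 1, nx, MPI_DOUBLE, &column);
    MPI_Type_commit(&column);

    // Exchange boundary columns between two ranks, passing the strided
    // device region directly to a CUDA-aware MPI implementation.
    if (rank == 0) {
        MPI_Send(dgrid + (nx - 1), 1, column, 1, 0, MPI_COMM_WORLD);           // rightmost column
    } else if (rank == 1) {
        MPI_Recv(dgrid, 1, column, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);   // leftmost column
    }

    MPI_Type_free(&column);
    cudaFree(dgrid);
    MPI_Finalize();
    return 0;
}
```

Run with two MPI ranks. Whether the library handles such non-contiguous device datatypes efficiently, or whether packing the column into a contiguous GPU buffer first is faster, is the kind of trade-off a post on optimizing strided GPU communication is likely to examine.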

CUDA-Aware-MPI: Part 3: Solving 2D Poisson Equation with 2D domain decomposition

Hello readers, this is the third part of the 4-part CUDA-Aware-MPI series. In this blog series, we will use only NVIDIA GPGPUs and assume that the reader knows the basics of CUDA-Aware-MPI from the official blog post: https://developer.nvidia.com/blog/introduction-cuda-aware-mpi/ [1]. We also assume that the reader knows about solving the 2D Poisson equation. For this, we are going […]

CUDA-Aware-MPI: Part 2: Solving 2D Poisson Equation with 1D domain decomposition

Hello readers, this is the second part of the 4-part CUDA-Aware-MPI series. In this blog series, we will use only NVIDIA GPGPUs and assume that the reader knows the basics of CUDA-Aware-MPI from the official blog post: https://developer.nvidia.com/blog/introduction-cuda-aware-mpi/ [1]. We also assume that the reader knows about solving the 2D Poisson equation. For this, we are going […]

CUDA-Aware-MPI: Part 1: Understanding node topology and communication bandwidth

Hello readers, this is the first part of the 4-part CUDA-Aware-MPI series. In this blog series, we will use only NVIDIA GPGPUs and assume that the reader knows the basics of CUDA-Aware-MPI from the official blog post: https://developer.nvidia.com/blog/introduction-cuda-aware-mpi/ [1]. MPI is a widely used library for distributed programming. It allows your parallel code to scale across different nodes […]
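For readers who have not yet read the linked NVIDIA post, the core idea can be summarized with a minimal sketch: when the MPI library is built CUDA-aware, a GPU device pointer can be passed directly to MPI calls, with no manual staging through host memory. The buffer size and the two-rank send/receive pattern below are illustrative assumptions, not code from the series.

```cuda
#include <mpi.h>
#include <cuda_runtime.h>
#include <cstdio>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int n = 1 << 20;                          // number of doubles to exchange
    double *dbuf;
    cudaMalloc(&dbuf, n * sizeof(double));

    // With a CUDA-aware MPI build, the device pointer goes straight into
    // MPI_Send/MPI_Recv; the library moves the data (e.g. via GPUDirect)
    // without an explicit cudaMemcpy to a host buffer.
    if (rank == 0) {
        cudaMemset(dbuf, 0, n * sizeof(double));
        MPI_Send(dbuf, n, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(dbuf, n, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 1 received %d doubles directly into GPU memory\n", n);
    }

    cudaFree(dbuf);
    MPI_Finalize();
    return 0;
}
```

Launched with two ranks (for example, mpirun -n 2 ./a.out), this only works as written on an MPI installation built with CUDA support; otherwise the device buffer would have to be copied to the host before the MPI call.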