Aditya Ujeniya's blog

A beginner's perspective and experience in the world of HPC

Content

Power/Frequency characteristics of compute-bound codes on A40, A100, and H100 GPUs with different data initialization

Hello readers, today we talk about the power and frequency characteristics of compute-bound code, particularly SGEMM and DGEMM codes, with a focus on different data initialization techniques. GEMM is an acronym for General Matrix-Matrix multiplication. DGEMM (Double-Precision GEMM) involves 64-bit data (FP64, the double datatype): it uses input matrices of FP64 datatype  […]
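As a rough illustration of what such a compute-bound DGEMM experiment can look like, here is a minimal cuBLAS sketch that initializes one input matrix with random values and the other with zeros, then runs the kernel repeatedly. The matrix size, the initialization choices, and the iteration count are assumptions made for this sketch, not values taken from the post.

```cuda
#include <cstdlib>
#include <cuda_runtime.h>
#include <cublas_v2.h>

int main() {
    const int N = 4096;                             // assumed square matrix size
    const size_t bytes = (size_t)N * N * sizeof(double);

    // Two different initializations for comparison: pseudo-random vs. all zeros.
    double *hA = (double *)malloc(bytes);
    double *hB = (double *)malloc(bytes);
    for (size_t i = 0; i < (size_t)N * N; ++i) {
        hA[i] = (double)rand() / RAND_MAX;          // random initialization
        hB[i] = 0.0;                                // zero initialization
    }

    double *dA, *dB, *dC;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);
    const double alpha = 1.0, beta = 0.0;

    // Repeat the DGEMM so that power and clock readings have time to settle.
    for (int iter = 0; iter < 100; ++iter) {
        cublasDgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                    N, N, N, &alpha, dA, N, dB, N, &beta, dC, N);
    }
    cudaDeviceSynchronize();

    cublasDestroy(handle);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    free(hA); free(hB);
    return 0;
}
```

Power draw and clock frequency during the loop can then be sampled externally, for example with nvidia-smi or NVML.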

CUDA-Aware-MPI: Part 4: Optimizing strided data communication between GPUs

Hello readers, this is the fourth part of the 4-part CUDA-Aware-MPI series. In this blog series, we will use only NVIDIA GPGPUs and assume that the reader knows the basics of CUDA-Aware-MPI from the official blog post: https://developer.nvidia.com/blog/introduction-cuda-aware-mpi/ [1]. We also assume that the reader knows about solving the 2D Poisson equation. For this, we are going […]
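Since Part 4 deals with strided data, a small baseline sketch may help show what "strided" means here: the left and right boundary columns of a row-major 2D subdomain are not contiguous in memory. One standard way to describe them, shown below with hypothetical grid sizes, is an MPI derived datatype (MPI_Type_vector) whose buffer lives in GPU memory and is handed directly to a CUDA-aware MPI library. This is only an illustrative baseline, not necessarily the optimization the post arrives at.

```cuda
#include <mpi.h>
#include <cuda_runtime.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // A 2D subdomain stored row-major on the GPU: ny rows of nx doubles.
    // Its left/right boundary columns are strided (stride = nx elements).
    const int nx = 1024, ny = 1024;                 // assumed subdomain size
    double *dgrid;
    cudaMalloc(&dgrid, (size_t)nx * ny * sizeof(double));
    cudaMemset(dgrid, 0, (size_t)nx * ny * sizeof(double));

    // Derived datatype for one column: ny blocks of 1 double, stride nx.
    MPI_Datatype column;
    MPI_Type_vector(ny, 1, nx, MPI_DOUBLE, &column);
    MPI_Type_commit(&column);

    // Exchange boundary columns between two ranks, passing the strided
    // device region directly to a CUDA-aware MPI implementation.
    if (rank == 0) {
        MPI_Send(dgrid + (nx - 1), 1, column, 1, 0, MPI_COMM_WORLD);           // rightmost column
    } else if (rank == 1) {
        MPI_Recv(dgrid, 1, column, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);   // leftmost column
    }

    MPI_Type_free(&column);
    cudaFree(dgrid);
    MPI_Finalize();
    return 0;
}
```

Run with two MPI ranks. Whether the library handles such non-contiguous device datatypes efficiently, or whether packing the column into a contiguous GPU buffer first is faster, is the kind of trade-off a post on optimizing strided GPU communication is likely to examine.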

CUDA-Aware-MPI: Part 3: Solving 2D Poisson Equation with 2D domain decomposition

Hello readers, this is the third part of the 4-part CUDA-Aware-MPI series. In this blog series, we will use only NVIDIA GPGPUs and assume that the reader knows the basics of CUDA-Aware-MPI from the official blog post: https://developer.nvidia.com/blog/introduction-cuda-aware-mpi/ [1]. We also assume that the reader knows about solving the 2D Poisson equation. For this, we are going […]

CUDA-Aware-MPI: Part 2: Solving 2D Poisson Equation with 1D domain decomposition

Hello readers, this is the second part of the 4-part CUDA-Aware-MPI series. In this blog series, we will use only NVIDIA GPGPUs and assume that the reader knows the basics of CUDA-Aware-MPI from the official blog post: https://developer.nvidia.com/blog/introduction-cuda-aware-mpi/ [1]. We also assume that the reader knows about solving the 2D Poisson equation. For this, we are going […]

CUDA-Aware-MPI: Part 1: Understanding node topology and communication bandwidth

Hello readers, this is the first part of the 4-part CUDA-Aware-MPI series. In this blog series, we will use only NVIDIA GPGPUs and assume that the reader knows the basics of CUDA-Aware-MPI from the official blog post: https://developer.nvidia.com/blog/introduction-cuda-aware-mpi/ [1]. MPI is a widely used library for distributed programming. It allows your parallel code to scale across different nodes […]
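For readers who have not yet read the linked NVIDIA post, the core idea can be summarized with a minimal sketch: when the MPI library is built CUDA-aware, a GPU device pointer can be passed directly to MPI calls, with no manual staging through host memory. The buffer size and the two-rank send/receive pattern below are illustrative assumptions, not code from the series.

```cuda
#include <mpi.h>
#include <cuda_runtime.h>
#include <cstdio>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int n = 1 << 20;                          // number of doubles to exchange
    double *dbuf;
    cudaMalloc(&dbuf, n * sizeof(double));

    // With a CUDA-aware MPI build, the device pointer goes straight into
    // MPI_Send/MPI_Recv; the library moves the data (e.g. via GPUDirect)
    // without an explicit cudaMemcpy to a host buffer.
    if (rank == 0) {
        cudaMemset(dbuf, 0, n * sizeof(double));
        MPI_Send(dbuf, n, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(dbuf, n, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 1 received %d doubles directly into GPU memory\n", n);
    }

    cudaFree(dbuf);
    MPI_Finalize();
    return 0;
}
```

Launched with two ranks (for example, mpirun -n 2 ./a.out), this only works as written on an MPI installation built with CUDA support; otherwise the device buffer would have to be copied to the host before the MPI call.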