Heat builds on PyTorch and mpi4py to provide high-performance computing infrastructure for memory-intensive applications within the NumPy/SciPy ecosystem.

With Heat you can:

  • port existing NumPy/SciPy code from single-CPU to multi-node clusters with minimal coding effort;
  • exploit the entire, cumulative RAM of your many nodes for memory-intensive operations and algorithms;
  • run your NumPy/SciPy code on GPUs (CUDA, ROCm, coming up: Apple MPS).
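The "cumulative RAM" point rests on partitioning: a Heat array is split along one axis (chosen with the `split` argument, e.g. `ht.arange(n, split=0)`), and each MPI process stores only its own slice. The sketch below shows a balanced block partition in plain Python. It assumes that each of `p` processes holds either ⌊n/p⌋ or ⌈n/p⌉ rows, with the first `n mod p` processes taking the extra row. The helper `block_chunk` is hypothetical and is not part of Heat's API.

```python
def block_chunk(n_rows: int, n_procs: int, rank: int) -> tuple[int, int]:
    """Return (offset, length) of the rows owned by `rank` under a balanced
    block distribution of `n_rows` rows over `n_procs` processes.

    Hypothetical helper for illustration; not Heat's API.
    """
    base, remainder = divmod(n_rows, n_procs)
    # The first `remainder` ranks each hold one extra row.
    length = base + (1 if rank < remainder else 0)
    # Offset = rows held by all lower ranks.
    offset = rank * base + min(rank, remainder)
    return offset, length


if __name__ == "__main__":
    n_rows, n_cols, n_procs = 10, 1_000_000, 4
    for rank in range(n_procs):
        offset, length = block_chunk(n_rows, n_procs, rank)
        # float64 takes 8 bytes per element; only the local slice is resident.
        mib = length * n_cols * 8 / 2**20
        print(f"rank {rank}: rows [{offset}, {offset + length}), ~{mib:.1f} MiB")
```

Each rank keeps about 1/p of the data, so adding nodes raises the total array size that fits in memory.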


Within HiRSE_PS we would like to achieve at least the following objectives:

  • Continuous Benchmarking (✅)
  • Porting to IPUs and XPUs (CUDA, ROCm: ✅)
  • Optimized Communication and Distribution Semantics


  • Continuous benchmarking via the perun tool, including measurement of energy consumption for MPI applications.
  • Heat v1.3.1 supports PyTorch 2.0.
  • Usage on HPC systems simplified via Spack and Docker containers (upcoming: EasyBuild).
  • New features include support for memory-distributed truncated SVD.
  • Upcoming in v1.4: distributed FFTs, optimized QR decomposition, batch-parallel clustering, fully distributed advanced indexing, and more.
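A common way to compute a truncated SVD of a matrix whose columns are spread across processes is a hierarchical, merge-based scheme. Each process compresses its column block to a rank-k factor, and the factors are then merged and compressed again. The NumPy sketch below simulates that technique on a single process. It illustrates the general idea and does not reproduce Heat's implementation. The function names are hypothetical, and "processes" here are just column blocks.

```python
import numpy as np


def local_truncated_svd(block: np.ndarray, k: int) -> tuple[np.ndarray, np.ndarray]:
    """Rank-k truncated SVD of one block: leading left singular vectors and values."""
    u, s, _ = np.linalg.svd(block, full_matrices=False)
    return u[:, :k], s[:k]


def hierarchical_truncated_svd(a: np.ndarray, n_procs: int, k: int):
    """Simulated merge-based truncated SVD of `a`, with columns split over
    `n_procs` hypothetical processes. Returns (U_k, S_k)."""
    # Column-wise distribution: each "process" owns one block A_i.
    blocks = np.array_split(a, n_procs, axis=1)
    # Local step: A_i ~= U_i S_i V_i^T; keep U_i * S_i (scales column j of U_i by s_j).
    reduced = [u * s for u, s in (local_truncated_svd(b, k) for b in blocks)]
    # Merge step: with M = [U_1 S_1, ..., U_p S_p], M M^T ~= sum_i A_i A_i^T = A A^T,
    # so M has (approximately) the same left singular vectors and values as A.
    # In a real MPI run the pieces would be gathered or merged pairwise.
    merged = np.hstack(reduced)
    return local_truncated_svd(merged, k)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 120))  # rank-5 matrix
    u, s = hierarchical_truncated_svd(a, n_procs=4, k=5)
    s_ref = np.linalg.svd(a, compute_uv=False)[:5]
    print(np.allclose(s, s_ref))  # True: local truncation is exact for a rank-5 input
```

If every block has rank at most k, the merge reproduces the leading singular values exactly. Otherwise the result is an approximation whose quality depends on the discarded local singular values.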