CUDA Support for ClangIR

May 01, 2025 · 1 min read

Building the initial CUDA lowering logic for Meta's new high-level compiler IR.

What it is

ClangIR is a new high-level Intermediate Representation (IR) for the Clang C/C++ compiler, originally incubated at Meta. It sits between the Clang AST and LLVM IR to enable better optimizations and static analysis.

When I started looking at it, ClangIR was in its early incubation phase and CUDA support was entirely unimplemented. I built the CUDA path through the pipeline: lowering from the Clang AST to ClangIR, and from ClangIR to LLVM IR.

What I did

I implemented the lowering logic required to translate CUDA constructs through ClangIR into LLVM IR that the NVPTX (NVIDIA GPU) backend can consume. The core work fell into three areas, illustrated by the sketch after the list:

  • Host vs. Device Split: Writing the logic to correctly identify and separate host code (CPU) from device code (GPU) during the lowering phase.
  • Variable Handling: Mapping global and local variables to their correct CUDA address spaces.
  • Texture Support: Implementing support for CUDA-specific surface and texture types.
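To make those three areas concrete, here is a small, hypothetical CUDA translation unit of the kind the lowering has to handle. It is an illustrative sketch, not code from the actual patches or tests: the `__global__` kernels and the host `main` exercise the host/device split, the `__constant__`, `__device__`, and `__shared__` variables exercise address-space mapping, and the `cudaTextureObject_t` parameter exercises texture support.

```cuda
// Hypothetical example exercising the constructs the CUDA lowering must handle.
#include <cuda_runtime.h>

__constant__ float scale;        // lowered into the constant address space
__device__   float bias = 0.0f;  // lowered into the global (device) address space

// Device code: emitted only when compiling for the NVPTX target.
__global__ void saxpy(const float *x, float *y, int n) {
    __shared__ float tile[256];  // shared address space, just to exercise the mapping
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    tile[threadIdx.x] = (i < n) ? x[i] : 0.0f;
    __syncthreads();
    if (i < n)
        y[i] = scale * tile[threadIdx.x] + bias;
}

// Texture access through a texture object.
__global__ void sample(cudaTextureObject_t tex, float *out, int w) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    out[y * w + x] = tex2D<float>(tex, x + 0.5f, y + 0.5f);
}

// Host code: compiled for the CPU side, including the kernel launch.
int main() {
    float *x, *y;
    int n = 1 << 20;
    cudaMalloc(&x, n * sizeof(float));
    cudaMalloc(&y, n * sizeof(float));
    saxpy<<<(n + 255) / 256, 256>>>(x, y, n);
    cudaDeviceSynchronize();
    cudaFree(x);
    cudaFree(y);
    return 0;
}
```

Each construct above takes a different route through the lowering: the kernels and device variables are emitted only for the GPU side, the host code only for the CPU side, and the variable qualifiers and texture accesses have to end up in the address spaces and intrinsics that NVPTX expects.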

Notes / results

The goal wasn't just to write code, but to get it merged upstream.

  • Validation: I used the PolyBench benchmark suite to test the completeness of the lowering.
  • Upstreaming: All code was reviewed by the core ClangIR maintainers (including the project creator) and merged into the main incubator repository.
  • Status: The groundwork is now in place for others to build more complex CUDA optimizations on top of ClangIR.