CUDA Support for ClangIR

May 01, 2025 · 1 min read

Building the initial CUDA lowering logic for Meta's new high-level compiler IR.

What it is

ClangIR is a new high-level Intermediate Representation (IR) for the Clang C/C++ compiler, originally incubated at Meta. It sits between the Clang AST and LLVM IR to enable better optimizations and static analysis.

When I started looking at it, ClangIR was in its early incubation phase and CUDA support was entirely unimplemented. I built the CUDA path through the pipeline: lowering from the Clang AST to ClangIR, and from ClangIR to LLVM IR.

What I did

I implemented the lowering logic required to translate CUDA constructs through ClangIR into LLVM IR that the NVPTX (NVIDIA GPU) backend can consume. The core work fell into three areas, illustrated by the sketch after the list:

  • Host vs. Device Split: Writing the logic to correctly identify and separate host code (CPU) from device code (GPU) during the lowering phase.
  • Variable Handling: Mapping global and local variables to their correct CUDA address spaces.
  • Texture Support: Implementing support for CUDA-specific surface and texture types.
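To make those three areas concrete, here is a small, hypothetical CUDA translation unit of the kind the lowering has to handle. It is an illustrative sketch, not code from the actual patches or tests: the `__global__` kernels and the host `main` exercise the host/device split, the `__constant__`, `__device__`, and `__shared__` variables exercise address-space mapping, and the `cudaTextureObject_t` parameter exercises texture support.

```cuda
// Hypothetical example exercising the constructs the CUDA lowering must handle.
#include <cuda_runtime.h>

__constant__ float scale;        // lowered into the constant address space
__device__   float bias = 0.0f;  // lowered into the global (device) address space

// Device code: emitted only when compiling for the NVPTX target.
__global__ void saxpy(const float *x, float *y, int n) {
    __shared__ float tile[256];  // shared address space, just to exercise the mapping
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    tile[threadIdx.x] = (i < n) ? x[i] : 0.0f;
    __syncthreads();
    if (i < n)
        y[i] = scale * tile[threadIdx.x] + bias;
}

// Texture access through a texture object.
__global__ void sample(cudaTextureObject_t tex, float *out, int w) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    out[y * w + x] = tex2D<float>(tex, x + 0.5f, y + 0.5f);
}

// Host code: compiled for the CPU side, including the kernel launch.
int main() {
    float *x, *y;
    int n = 1 << 20;
    cudaMalloc(&x, n * sizeof(float));
    cudaMalloc(&y, n * sizeof(float));
    saxpy<<<(n + 255) / 256, 256>>>(x, y, n);
    cudaDeviceSynchronize();
    cudaFree(x);
    cudaFree(y);
    return 0;
}
```

Each construct above takes a different route through the lowering: the kernels and device variables are emitted only for the GPU side, the host code only for the CPU side, and the variable qualifiers and texture accesses have to end up in the address spaces and intrinsics that NVPTX expects.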

Notes / results

The goal wasn't just to write code, but to get it merged upstream.

  • Validation: I used the PolyBench benchmark suite to test the completeness of the lowering.
  • Upstreaming: All code was reviewed by the core ClangIR maintainers (including the project creator) and merged into the main incubator repository.
  • Status: The groundwork is now in place for others to build more complex CUDA optimizations on top of ClangIR.