AMDGPU gfx12: Add _dvgpr$ symbols for dynamic VGPRs #148251

Open
wants to merge 4 commits into
base: main
trenouf
Collaborator

@trenouf trenouf commented Jul 11, 2025

For each function that uses the AMDGPU_CS_Chain calling convention and has dynamic VGPRs enabled, add a _dvgpr$ symbol. Its value is the value of the function symbol plus an offset that encodes, in bits 5..3, one less than the number of VGPR blocks used by the function (16 VGPRs per block, no more than 128 VGPRs). A front-end uses this to have functions that are chained rather than called, together with a dispatcher that dynamically resizes the VGPR count before dispatching to a function.
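
To make the encoding concrete, here is a minimal standalone sketch of the arithmetic described above. It is not code from this PR: the helper names are hypothetical, and the decode side assumes the function address has bits 5..3 clear (for example, 64-byte alignment) so that adding the offset cannot carry into higher bits. For 1 to 128 VGPRs, the encode helper should produce the same value as the `(NumVGPR - 1) >> 1 & 0x38` expression in the diff.

```cpp
#include <cassert>
#include <cstdint>

// From the PR description: 16 VGPRs per block, at most 128 VGPRs,
// and the block count minus one stored in bits 5..3.
constexpr unsigned kVgprsPerBlock = 16;
constexpr std::uint64_t kBlockFieldMask = 0x38; // bits 5..3

// Hypothetical helper: returns (blocks - 1) placed in bits 5..3.
// Precondition (assumed): 1 <= numVgprs <= 128.
constexpr std::uint64_t encodeVgprBlocks(unsigned numVgprs) {
  assert(numVgprs >= 1 && numVgprs <= 128);
  unsigned blocksMinusOne = (numVgprs - 1) / kVgprsPerBlock;
  return (static_cast<std::uint64_t>(blocksMinusOne) << 3) & kBlockFieldMask;
}

// Hypothetical dispatcher-side helpers. They assume the function address
// itself has bits 5..3 clear, which the PR text does not state explicitly.
constexpr unsigned decodeVgprBlocks(std::uint64_t symbolValue) {
  return static_cast<unsigned>((symbolValue & kBlockFieldMask) >> 3) + 1;
}

constexpr std::uint64_t decodeFunctionAddress(std::uint64_t symbolValue) {
  return symbolValue & ~kBlockFieldMask;
}

// Worked examples, checked at compile time.
static_assert(encodeVgprBlocks(1) == 0x00, "1 VGPR needs 1 block");
static_assert(encodeVgprBlocks(17) == 0x08, "17 VGPRs need 2 blocks");
static_assert(encodeVgprBlocks(128) == 0x38, "128 VGPRs need 8 blocks");
static_assert(decodeVgprBlocks(0x1000 + 0x38) == 8, "round trip");
```

Under these assumptions, a dispatcher could read the `_dvgpr$` symbol value, call `decodeVgprBlocks` to learn how many blocks to allocate, and jump to `decodeFunctionAddress` of the same value.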

@llvmbot
Member

llvmbot commented Jul 11, 2025

@llvm/pr-subscribers-backend-amdgpu

Author: Tim Renouf (trenouf)

Full diff: https://github.com/llvm/llvm-project/pull/148251.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp (+26)
  • (added) llvm/test/CodeGen/AMDGPU/dvgpr_sym.ll (+12)
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp b/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
index 749b9efc81378..00ed5f57967ce 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
@@ -194,6 +194,32 @@ void AMDGPUAsmPrinter::emitFunctionBodyStart() {
     return;
   }
 
+  if (STM.isDynamicVGPREnabled() &&
+      MF->getFunction().getCallingConv() == CallingConv::AMDGPU_CS_Chain) {
+    // Add a _dvgpr$ symbol, with the value of the function symbol, plus an
+    // offset encoding one less than the number of VGPR blocks used by the
+    // function (16 VGPRs per block, no more than 128) in bits 5..3 of the
+    // symbol value. This is used by a front-end to have functions that are
+    // chained rather than called, and a dispatcher that dynamically resizes
+    // the VGPR count before dispatching to a function.
+    ResourceUsage = &getAnalysis<AMDGPUResourceUsageAnalysis>();
+    const AMDGPUResourceUsageAnalysis::SIFunctionResourceInfo &Info =
+        ResourceUsage->getResourceInfo();
+    MCContext &Ctx = MF->getContext();
+    unsigned EncodedNumVGPRs = (Info.NumVGPR - 1) >> 1 & 0x38;
+    MCSymbol *CurPCSym = Ctx.createTempSymbol();
+    OutStreamer->emitLabel(CurPCSym);
+    const MCExpr *DVgprFuncVal = MCBinaryExpr::createAdd(
+        MCSymbolRefExpr::create(CurPCSym, MCSymbolRefExpr::VK_None, Ctx),
+        MCConstantExpr::create(EncodedNumVGPRs, Ctx), Ctx);
+    MCSymbol *DVgprFuncSym =
+        Ctx.getOrCreateSymbol(Twine("_dvgpr$") + MF->getFunction().getName());
+    OutStreamer->emitAssignment(DVgprFuncSym, DVgprFuncVal);
+    cast<MCSymbolELF>(DVgprFuncSym)
+        ->setBinding(
+            cast<MCSymbolELF>(getSymbol(&MF->getFunction()))->getBinding());
+  }
+
   if (!MFI.isEntryFunction())
     return;
 
diff --git a/llvm/test/CodeGen/AMDGPU/dvgpr_sym.ll b/llvm/test/CodeGen/AMDGPU/dvgpr_sym.ll
new file mode 100644
index 0000000000000..992963d304ead
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/dvgpr_sym.ll
@@ -0,0 +1,12 @@
+; Test generation of _dvgpr$ symbol for an amdgpu_cs_chain function with +dynamic-vgpr.
+
+; RUN: llc -mtriple=amdgcn-amd-amdpal -mcpu=gfx1200 -asm-verbose=0 < %s | FileCheck -check-prefixes=DVGPR %s
+
+; DVGPR-LABEL: func:
+; DVGPR: .Ltmp0:
+; DVGPR: .set _dvgpr$func, .Ltmp0+{{[0-9]+}}
+
+define amdgpu_cs_chain void @func() #0 {
+  ret void
+}
+attributes #0 = { "target-features"="+dynamic-vgpr" }

MCSymbolRefExpr::create(CurPCSym, Ctx),
MCConstantExpr::create(EncodedNumVGPRs, Ctx), Ctx);
MCSymbol *DVgprFuncSym =
Ctx.getOrCreateSymbol(Twine("_dvgpr$") + MF->getFunction().getName());
Contributor

Is this the right prefix to use? Is this using the right visibility?

Collaborator Author

You mean is "_dvgpr$" the right prefix? Visibility and linkage fixed.

@arsenm arsenm requested review from rovka and jhuber6 July 14, 2025 06:38
trenouf added 2 commits July 15, 2025 21:24
* Use new func attr;
* allow 16 or 32 block size;
* put code in its own func;
* enhance test, including anonymous func;
* fix name, visibility and linkage
if (!CurrentProgramInfo.NumVGPRsForWavesPerEU->evaluateAsRelocatable(
NumVGPRs, nullptr) ||
!NumVGPRs.isAbsolute()) {
OutContext.reportError({}, "Unable to resolve _dvgpr$ symbol for '" +
Contributor

Error messages should start with a lowercase letter

NumVGPRs, nullptr) ||
!NumVGPRs.isAbsolute()) {
OutContext.reportError({}, "Unable to resolve _dvgpr$ symbol for '" +
Twine(MF.getName()) + "'");
Contributor

Use the mangled symbol name; this breaks for anonymous functions.

BlockSize;
if (NumBlocks > 8) {
OutContext.reportError({},
"Too many DVGPR blocks for _dvgpr$ symbol for '" +
Contributor

Same as above. Also should test the error cases
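
As an illustration of the limit being discussed, the following is a rough standalone sketch of a block count check with a configurable block size. It is not the PR's implementation: the function name, the `std::optional` return type, and the treatment of a zero VGPR count as one block are all assumptions made for the example. Only the block sizes of 16 or 32 and the limit of 8 blocks come from the commit notes and the snippet above.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical helper, not taken from the PR. Returns the offset to add to
// the function symbol value (blocks - 1 in bits 5..3), or std::nullopt when
// the function needs more than 8 blocks and the value cannot be encoded.
inline std::optional<std::uint64_t> encodeDvgprOffset(unsigned numVgprs,
                                                      unsigned blockSize) {
  // The commit notes mention block sizes of 16 or 32 only.
  assert((blockSize == 16 || blockSize == 32) && "unexpected block size");

  // Round up to whole blocks.
  unsigned numBlocks = (numVgprs + blockSize - 1) / blockSize;

  // Assumption of this sketch: a function using no VGPRs still gets one block.
  if (numBlocks == 0)
    numBlocks = 1;

  // Three bits are available, so at most 8 blocks can be represented.
  if (numBlocks > 8)
    return std::nullopt;

  return static_cast<std::uint64_t>(numBlocks - 1) << 3;
}
```

A caller would report an error (in the PR, through `OutContext.reportError`) when the result is empty. The boundary inputs, such as 128 versus 129 VGPRs with a block size of 16, are the kind of cases an error test would exercise.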

@@ -1768,6 +1768,10 @@ The AMDGPU backend supports the following LLVM IR attributes.
using dedicated instructions, but may not send the DEALLOC_VGPRS
message. If a shader has this attribute, then all its callees must
match its value.
An AMD_CS_Chain CC function with this enabled has an extra symbol
Contributor

Suggested change
- An AMD_CS_Chain CC function with this enabled has an extra symbol
+ An amd_cs_chain CC function with this enabled has an extra symbol
