AMDGPU gfx12: Add _dvgpr$ symbols for dynamic VGPRs #148251
For each function with the AMDGPU_CS_Chain calling convention and dynamic VGPRs enabled, add a _dvgpr$ symbol whose value is the value of the function symbol plus an offset. The offset encodes one less than the number of VGPR blocks used by the function (16 VGPRs per block, at most 128 VGPRs) in bits 5..3 of the symbol value. A front-end uses this to support functions that are chained rather than called, together with a dispatcher that dynamically resizes the VGPR count before dispatching to a function.
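As a rough illustration of the encoding described above, the following standalone sketch packs and unpacks the block count. The helper names are hypothetical and do not appear in the patch. It also assumes that the function address itself has bits 5..3 clear, which the description implies but does not state.

```cpp
#include <cstdint>

// Hypothetical names; only the bit layout (bits 5..3, 16 VGPRs per block)
// comes from the PR description.
constexpr uint64_t kBlockFieldMask = 0x38; // bits 5..3
constexpr unsigned kVGPRsPerBlock = 16;

// Offset added to the function address: (number of blocks - 1) in bits 5..3.
// Assumes 1 <= numVGPRs <= 128.
constexpr uint64_t encodeDVgprOffset(unsigned numVGPRs) {
  return (static_cast<uint64_t>((numVGPRs - 1) / kVGPRsPerBlock) << 3) &
         kBlockFieldMask;
}

// What a dispatcher might do with the _dvgpr$ symbol value.
// Assumption: the real function address has bits 5..3 clear.
constexpr unsigned decodeNumBlocks(uint64_t symValue) {
  return static_cast<unsigned>((symValue & kBlockFieldMask) >> 3) + 1;
}
constexpr uint64_t decodeFuncAddr(uint64_t symValue) {
  return symValue & ~kBlockFieldMask;
}

// 17 VGPRs need two blocks, so the field holds 1, i.e. an offset of 8.
static_assert(encodeDVgprOffset(17) == 8, "two blocks encode as 1 << 3");
```

For in-range inputs this matches the expression `(Info.NumVGPR - 1) >> 1 & 0x38` in the diff, since dividing by 16 and then shifting left by 3 is the same as shifting right by 1 and masking to bits 5..3.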
@llvm/pr-subscribers-backend-amdgpu

Author: Tim Renouf (trenouf)

Changes

For each function with the AMDGPU_CS_Chain calling convention, with dynamic VGPRs enabled, add a _dvgpr$ symbol, with the value of the function symbol, plus an offset encoding one less than the number of VGPR blocks used by the function (16 VGPRs per block, no more than 128) in bits 5..3 of the symbol value. This is used by a front-end to have functions that are chained rather than called, and a dispatcher that dynamically resizes the VGPR count before dispatching to a function.

Full diff: https://github.com/llvm/llvm-project/pull/148251.diff

2 Files Affected:
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp b/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
index 749b9efc81378..00ed5f57967ce 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
@@ -194,6 +194,32 @@ void AMDGPUAsmPrinter::emitFunctionBodyStart() {
return;
}
+ if (STM.isDynamicVGPREnabled() &&
+ MF->getFunction().getCallingConv() == CallingConv::AMDGPU_CS_Chain) {
+ // Add a _dvgpr$ symbol, with the value of the function symbol, plus an
+ // offset encoding one less than the number of VGPR blocks used by the
+ // function (16 VGPRs per block, no more than 128) in bits 5..3 of the
+ // symbol value. This is used by a front-end to have functions that are
+ // chained rather than called, and a dispatcher that dynamically resizes
+ // the VGPR count before dispatching to a function.
+ ResourceUsage = &getAnalysis<AMDGPUResourceUsageAnalysis>();
+ const AMDGPUResourceUsageAnalysis::SIFunctionResourceInfo &Info =
+ ResourceUsage->getResourceInfo();
+ MCContext &Ctx = MF->getContext();
+ unsigned EncodedNumVGPRs = (Info.NumVGPR - 1) >> 1 & 0x38;
+ MCSymbol *CurPCSym = Ctx.createTempSymbol();
+ OutStreamer->emitLabel(CurPCSym);
+ const MCExpr *DVgprFuncVal = MCBinaryExpr::createAdd(
+ MCSymbolRefExpr::create(CurPCSym, MCSymbolRefExpr::VK_None, Ctx),
+ MCConstantExpr::create(EncodedNumVGPRs, Ctx), Ctx);
+ MCSymbol *DVgprFuncSym =
+ Ctx.getOrCreateSymbol(Twine("_dvgpr$") + MF->getFunction().getName());
+ OutStreamer->emitAssignment(DVgprFuncSym, DVgprFuncVal);
+ cast<MCSymbolELF>(DVgprFuncSym)
+ ->setBinding(
+ cast<MCSymbolELF>(getSymbol(&MF->getFunction()))->getBinding());
+ }
+
if (!MFI.isEntryFunction())
return;
diff --git a/llvm/test/CodeGen/AMDGPU/dvgpr_sym.ll b/llvm/test/CodeGen/AMDGPU/dvgpr_sym.ll
new file mode 100644
index 0000000000000..992963d304ead
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/dvgpr_sym.ll
@@ -0,0 +1,12 @@
+; Test generation of _dvgpr$ symbol for an amdgpu_cs_chain function with +dynamic-vgpr.
+
+; RUN: llc -mtriple=amdgcn-amd-amdpal -mcpu=gfx1200 -asm-verbose=0 < %s | FileCheck -check-prefixes=DVGPR %s
+
+; DVGPR-LABEL: func:
+; DVGPR: .Ltmp0:
+; DVGPR: .set _dvgpr$func, .Ltmp0+{{[0-9]+}}
+
+define amdgpu_cs_chain void @func() #0 {
+ ret void
+}
+attributes #0 = { "target-features"="+dynamic-vgpr" }
MCSymbolRefExpr::create(CurPCSym, Ctx),
MCConstantExpr::create(EncodedNumVGPRs, Ctx), Ctx);
MCSymbol *DVgprFuncSym =
Ctx.getOrCreateSymbol(Twine("_dvgpr$") + MF->getFunction().getName());
Is this the right prefix to use? Is this using the right visibility?
You mean is "_dvgpr$" the right prefix? Visibility and linkage fixed.
* Use new func attr
* allow 16 or 32 block size
* put code in its own func
* enhance test, including anonymous func
* fix name, visibility and linkage
if (!CurrentProgramInfo.NumVGPRsForWavesPerEU->evaluateAsRelocatable(
NumVGPRs, nullptr) ||
!NumVGPRs.isAbsolute()) {
OutContext.reportError({}, "Unable to resolve _dvgpr$ symbol for '" +
Error messages should start with a lowercase letter
NumVGPRs, nullptr) ||
!NumVGPRs.isAbsolute()) {
OutContext.reportError({}, "Unable to resolve _dvgpr$ symbol for '" +
Twine(MF.getName()) + "'");
Use the mangled symbol name; this breaks for anonymous functions.
BlockSize;
if (NumBlocks > 8) {
OutContext.reportError({},
"Too many DVGPR blocks for _dvgpr$ symbol for '" +
Same as above. Also should test the error cases
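The block-count check under review, together with the "allow 16 or 32 block size" change, can be sketched in isolation as follows. This is a minimal illustration rather than the patch's code: the function name, the `-1` error convention, and the handling of a zero VGPR count are all assumptions made for the example.

```cpp
// Hypothetical helper, not taken from the patch. Returns the offset to add to
// the function address, or -1 if the inputs cannot be encoded in bits 5..3.
int encodeDVgprBlocks(unsigned numVGPRs, unsigned blockSize) {
  // Only the two block sizes mentioned in the update are accepted here.
  if (blockSize != 16 && blockSize != 32)
    return -1;
  // Assumption: a function that uses no VGPRs still occupies one block.
  unsigned numBlocks =
      numVGPRs == 0 ? 1 : (numVGPRs + blockSize - 1) / blockSize;
  // A 3-bit field holding (blocks - 1) can represent at most 8 blocks; the
  // reviewed code reports a "too many DVGPR blocks" error at this point.
  if (numBlocks > 8)
    return -1;
  return static_cast<int>((numBlocks - 1) << 3);
}
```

Error-case tests of the kind requested here would exercise the boundary: 128 VGPRs at block size 16 (or 256 at block size 32) is the largest encodable count, and one more VGPR should trigger the diagnostic.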
@@ -1768,6 +1768,10 @@ The AMDGPU backend supports the following LLVM IR attributes.
using dedicated instructions, but may not send the DEALLOC_VGPRS
message. If a shader has this attribute, then all its callees must
match its value.
An AMD_CS_Chain CC function with this enabled has an extra symbol
Suggested change:
- An AMD_CS_Chain CC function with this enabled has an extra symbol
+ An amd_cs_chain CC function with this enabled has an extra symbol