[NVPTX] Prevent fptrunc of v2f32 from being folded into store #149571

Merged

Conversation

AlexMaclean
Member

No description provided.

@AlexMaclean AlexMaclean requested review from Artem-B and Prince781 July 18, 2025 19:34
@AlexMaclean AlexMaclean self-assigned this Jul 18, 2025
@AlexMaclean AlexMaclean requested a review from npanchen July 18, 2025 19:34
@llvmbot
Member

llvmbot commented Jul 18, 2025

@llvm/pr-subscribers-backend-nvptx

Author: Alex MacLean (AlexMaclean)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/149571.diff

2 Files Affected:

  • (modified) llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp (+2)
  • (modified) llvm/test/CodeGen/NVPTX/f32x2-instructions.ll (+35)
diff --git a/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp b/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
index 7aa06f9079b09..4e7002feea215 100644
--- a/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
+++ b/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
@@ -731,6 +731,8 @@ NVPTXTargetLowering::NVPTXTargetLowering(const NVPTXTargetMachine &TM,
   setTruncStoreAction(MVT::f32, MVT::bf16, Expand);
   setTruncStoreAction(MVT::f64, MVT::bf16, Expand);
   setTruncStoreAction(MVT::f64, MVT::f32, Expand);
+  setTruncStoreAction(MVT::v2f32, MVT::v2f16, Expand);
+  setTruncStoreAction(MVT::v2f32, MVT::v2bf16, Expand);
 
   // PTX does not support load / store predicate registers
   setOperationAction(ISD::LOAD, MVT::i1, Custom);
diff --git a/llvm/test/CodeGen/NVPTX/f32x2-instructions.ll b/llvm/test/CodeGen/NVPTX/f32x2-instructions.ll
index af3cb63082e78..f24428ebcfb8c 100644
--- a/llvm/test/CodeGen/NVPTX/f32x2-instructions.ll
+++ b/llvm/test/CodeGen/NVPTX/f32x2-instructions.ll
@@ -1957,6 +1957,41 @@ define <2 x float> @test_uitofp_2xi32_to_2xfloat(<2 x i32> %a) #0 {
   ret <2 x float> %r
 }
 
+define void @test_trunc_to_v2bf16(<2 x float> %a, ptr %p) {
+; CHECK-LABEL: test_trunc_to_v2bf16(
+; CHECK:       {
+; CHECK-NEXT:    .reg .b32 %r<4>;
+; CHECK-NEXT:    .reg .b64 %rd<3>;
+; CHECK-EMPTY:
+; CHECK-NEXT:  // %bb.0:
+; CHECK-NEXT:    ld.param.b64 %rd2, [test_trunc_to_v2bf16_param_1];
+; CHECK-NEXT:    ld.param.v2.b32 {%r1, %r2}, [test_trunc_to_v2bf16_param_0];
+; CHECK-NEXT:    cvt.rn.bf16x2.f32 %r3, %r2, %r1;
+; CHECK-NEXT:    st.b32 [%rd2], %r3;
+; CHECK-NEXT:    ret;
+  %trunc = fptrunc <2 x float> %a to <2 x bfloat>
+  store <2 x bfloat> %trunc, ptr %p
+  ret void
+}
+
+define void @test_trunc_to_v2f16(<2 x float> %a, ptr %p) {
+; CHECK-LABEL: test_trunc_to_v2f16(
+; CHECK:       {
+; CHECK-NEXT:    .reg .b32 %r<4>;
+; CHECK-NEXT:    .reg .b64 %rd<3>;
+; CHECK-EMPTY:
+; CHECK-NEXT:  // %bb.0:
+; CHECK-NEXT:    ld.param.b64 %rd2, [test_trunc_to_v2f16_param_1];
+; CHECK-NEXT:    ld.param.v2.b32 {%r1, %r2}, [test_trunc_to_v2f16_param_0];
+; CHECK-NEXT:    cvt.rn.f16x2.f32 %r3, %r2, %r1;
+; CHECK-NEXT:    st.b32 [%rd2], %r3;
+; CHECK-NEXT:    ret;
+  %trunc = fptrunc <2 x float> %a to <2 x half>
+  store <2 x half> %trunc, ptr %p
+  ret void
+}
+
+
 attributes #0 = { nounwind }
 attributes #1 = { "unsafe-fp-math" = "true" }
 attributes #2 = { "denormal-fp-math"="preserve-sign" }
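
The new tests expect a single packed conversion (`cvt.rn.f16x2.f32 %r3, %r2, %r1`) followed by one 32-bit store. The sketch below models that sequence numerically for the `half` case. It assumes, from my reading of the PTX ISA documentation, that the first source operand of the packed `cvt` lands in the upper 16 bits of the destination and the second in the lower 16 bits, and that memory is little-endian. The function names are hypothetical and exist only for illustration; this is not code from LLVM.

```python
import struct


def f32_to_f16_bits(x: float) -> int:
    """Return the 16 raw bits of x rounded to IEEE binary16 (round-to-nearest-even)."""
    # Round to binary32 first so the input really is an f32 value, as in the IR.
    as_f32 = struct.unpack("<f", struct.pack("<f", x))[0]
    return struct.unpack("<H", struct.pack("<e", as_f32))[0]


def cvt_rn_f16x2_f32(a: float, b: float) -> int:
    """Model of `cvt.rn.f16x2.f32 d, a, b` (assumed: a -> upper half, b -> lower half)."""
    return (f32_to_f16_bits(a) << 16) | f32_to_f16_bits(b)


def trunc_and_store_v2f16(elem0: float, elem1: float) -> bytes:
    """Model of the expected PTX: one packed convert, then one little-endian 32-bit store."""
    word = cvt_rn_f16x2_f32(elem1, elem0)  # mirrors the operand order %r2, %r1
    return struct.pack("<I", word)


stored = trunc_and_store_v2f16(1.0, 2.0)
# Reading the four bytes back as two halves recovers the vector in element order.
print(stored.hex(), struct.unpack("<2e", stored))
```

Under these assumptions, passing element 1 first is what places element 0 at the lower address, which matches the in-memory order of a `<2 x half>` store.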

Contributor

@npanchen npanchen left a comment

thanks!

@AlexMaclean AlexMaclean merged commit 965b68e into llvm:main Jul 18, 2025
9 checks passed
@llvm-ci
Collaborator

llvm-ci commented Jul 18, 2025

LLVM Buildbot has detected a new failure on builder ml-opt-rel-x86-64 running on ml-opt-rel-x86-64-b1 while building llvm at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/185/builds/22430

Here is the relevant piece of the build log for the reference
Step 6 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'LLVM :: CodeGen/NVPTX/f16x2-instructions.ll' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
/b/ml-opt-rel-x86-64-b1/build/bin/llc < /b/ml-opt-rel-x86-64-b1/llvm-project/llvm/test/CodeGen/NVPTX/f16x2-instructions.ll -mtriple=nvptx64-nvidia-cuda -mcpu=sm_53           -O0 -disable-post-ra -frame-pointer=all -verify-machineinstrs  | /b/ml-opt-rel-x86-64-b1/build/bin/FileCheck -allow-deprecated-dag-overlap -check-prefixes CHECK,CHECK-F16 /b/ml-opt-rel-x86-64-b1/llvm-project/llvm/test/CodeGen/NVPTX/f16x2-instructions.ll # RUN: at line 3
+ /b/ml-opt-rel-x86-64-b1/build/bin/llc -mtriple=nvptx64-nvidia-cuda -mcpu=sm_53 -O0 -disable-post-ra -frame-pointer=all -verify-machineinstrs
+ /b/ml-opt-rel-x86-64-b1/build/bin/FileCheck -allow-deprecated-dag-overlap -check-prefixes CHECK,CHECK-F16 /b/ml-opt-rel-x86-64-b1/llvm-project/llvm/test/CodeGen/NVPTX/f16x2-instructions.ll
/b/ml-opt-rel-x86-64-b1/llvm-project/llvm/test/CodeGen/NVPTX/f16x2-instructions.ll:1507:15: error: CHECK-NEXT: expected string not found in input
; CHECK-NEXT: ld.param.v2.b32 {%r1, %r2}, [test_fptrunc_2xfloat_param_0];
              ^
<stdin>:960:10: note: scanning from here
// %bb.0:
         ^
<stdin>:961:2: note: possible intended match here
 ld.param.b64 %rd1, [test_fptrunc_2xfloat_param_0];
 ^

Input file: <stdin>
Check file: /b/ml-opt-rel-x86-64-b1/llvm-project/llvm/test/CodeGen/NVPTX/f16x2-instructions.ll

-dump-input=help explains the following input dump.

Input was:
<<<<<<
             .
             .
             .
           955: { 
           956:  .reg .b16 %rs<3>; 
           957:  .reg .b32 %r<4>; 
           958:  .reg .b64 %rd<2>; 
           959:  
           960: // %bb.0: 
next:1507'0              X error: no match found
           961:  ld.param.b64 %rd1, [test_fptrunc_2xfloat_param_0]; 
next:1507'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
next:1507'1      ?                                                   possible intended match
           962:  mov.b64 {%r1, %r2}, %rd1; 
next:1507'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~
           963:  cvt.rn.f16.f32 %rs1, %r2; 
next:1507'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~
           964:  cvt.rn.f16.f32 %rs2, %r1; 
next:1507'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~
           965:  mov.b32 %r3, {%rs2, %rs1}; 
next:1507'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           966:  st.param.b32 [func_retval0], %r3; 
next:1507'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
             .
             .
...
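
The failure above is in `f16x2-instructions.ll`, a test this PR did not modify: a `CHECK-NEXT` line expects `ld.param.v2.b32`, but the line right after `// %bb.0:` is now `ld.param.b64`. The sketch below is a deliberately simplified model of how `CHECK` and `CHECK-NEXT` behave, using plain substring matching. Real FileCheck also handles regexes, variables, and whitespace canonicalization, and the `run_checks` helper is hypothetical, not part of any LLVM tool. Treating the `// %bb.0:` match as a plain `CHECK` is likewise an assumption made to keep the example short.

```python
def run_checks(lines, directives):
    """Apply (kind, text) directives in order to the input lines.

    Returns None if every directive matches, otherwise a tuple
    (index of the failing directive, index of the line where scanning started).
    """
    pos = 0  # index of the first input line not yet consumed
    for i, (kind, text) in enumerate(directives):
        if kind == "CHECK":
            # CHECK may skip ahead to any later line that contains the text.
            hit = next((n for n in range(pos, len(lines)) if text in lines[n]), None)
            if hit is None:
                return (i, pos)
            pos = hit + 1
        elif kind == "CHECK-NEXT":
            # CHECK-NEXT must match on the line directly after the previous match.
            if pos >= len(lines) or text not in lines[pos]:
                return (i, pos)
            pos += 1
        else:
            raise ValueError(f"unsupported directive: {kind}")
    return None


# Output lines copied from the buildbot log above.
actual = [
    "// %bb.0:",
    "ld.param.b64 %rd1, [test_fptrunc_2xfloat_param_0];",
    "mov.b64 {%r1, %r2}, %rd1;",
]
# The stale expectation reported at f16x2-instructions.ll:1507.
stale = [
    ("CHECK", "// %bb.0:"),
    ("CHECK-NEXT", "ld.param.v2.b32 {%r1, %r2}, [test_fptrunc_2xfloat_param_0];"),
]
print(run_checks(actual, stale))  # (1, 1): the CHECK-NEXT directive fails at input line index 1
```

Because `CHECK-NEXT` cannot skip lines, any change to the instruction emitted right after a matched line invalidates the expectation, which is why check lines of this kind have to be regenerated when codegen changes.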

@llvm-ci
Collaborator

llvm-ci commented Jul 18, 2025

LLVM Buildbot has detected a new failure on builder ml-opt-dev-x86-64 running on ml-opt-dev-x86-64-b1 while building llvm at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/137/builds/22636

Here is the relevant piece of the build log for the reference
Step 6 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'LLVM :: CodeGen/NVPTX/f16x2-instructions.ll' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
/b/ml-opt-dev-x86-64-b1/build/bin/llc < /b/ml-opt-dev-x86-64-b1/llvm-project/llvm/test/CodeGen/NVPTX/f16x2-instructions.ll -mtriple=nvptx64-nvidia-cuda -mcpu=sm_53           -O0 -disable-post-ra -frame-pointer=all -verify-machineinstrs  | /b/ml-opt-dev-x86-64-b1/build/bin/FileCheck -allow-deprecated-dag-overlap -check-prefixes CHECK,CHECK-F16 /b/ml-opt-dev-x86-64-b1/llvm-project/llvm/test/CodeGen/NVPTX/f16x2-instructions.ll # RUN: at line 3
+ /b/ml-opt-dev-x86-64-b1/build/bin/llc -mtriple=nvptx64-nvidia-cuda -mcpu=sm_53 -O0 -disable-post-ra -frame-pointer=all -verify-machineinstrs
+ /b/ml-opt-dev-x86-64-b1/build/bin/FileCheck -allow-deprecated-dag-overlap -check-prefixes CHECK,CHECK-F16 /b/ml-opt-dev-x86-64-b1/llvm-project/llvm/test/CodeGen/NVPTX/f16x2-instructions.ll
/b/ml-opt-dev-x86-64-b1/llvm-project/llvm/test/CodeGen/NVPTX/f16x2-instructions.ll:1507:15: error: CHECK-NEXT: expected string not found in input
; CHECK-NEXT: ld.param.v2.b32 {%r1, %r2}, [test_fptrunc_2xfloat_param_0];
              ^
<stdin>:960:10: note: scanning from here
// %bb.0:
         ^
<stdin>:961:2: note: possible intended match here
 ld.param.b64 %rd1, [test_fptrunc_2xfloat_param_0];
 ^

Input file: <stdin>
Check file: /b/ml-opt-dev-x86-64-b1/llvm-project/llvm/test/CodeGen/NVPTX/f16x2-instructions.ll

-dump-input=help explains the following input dump.

Input was:
<<<<<<
             .
             .
             .
           955: { 
           956:  .reg .b16 %rs<3>; 
           957:  .reg .b32 %r<4>; 
           958:  .reg .b64 %rd<2>; 
           959:  
           960: // %bb.0: 
next:1507'0              X error: no match found
           961:  ld.param.b64 %rd1, [test_fptrunc_2xfloat_param_0]; 
next:1507'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
next:1507'1      ?                                                   possible intended match
           962:  mov.b64 {%r1, %r2}, %rd1; 
next:1507'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~
           963:  cvt.rn.f16.f32 %rs1, %r2; 
next:1507'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~
           964:  cvt.rn.f16.f32 %rs2, %r1; 
next:1507'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~
           965:  mov.b32 %r3, {%rs2, %rs1}; 
next:1507'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           966:  st.param.b32 [func_retval0], %r3; 
next:1507'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
             .
             .
...

@llvm-ci
Collaborator

llvm-ci commented Jul 18, 2025

LLVM Buildbot has detected a new failure on builder lld-x86_64-ubuntu-fast running on as-builder-4 while building llvm at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/33/builds/20554

Here is the relevant piece of the build log for the reference
Step 6 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'LLVM :: CodeGen/NVPTX/f16x2-instructions.ll' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
/home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/build/bin/llc < /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/llvm-project/llvm/test/CodeGen/NVPTX/f16x2-instructions.ll -mtriple=nvptx64-nvidia-cuda -mcpu=sm_53           -O0 -disable-post-ra -frame-pointer=all -verify-machineinstrs  | /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/build/bin/FileCheck -allow-deprecated-dag-overlap -check-prefixes CHECK,CHECK-F16 /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/llvm-project/llvm/test/CodeGen/NVPTX/f16x2-instructions.ll # RUN: at line 3
+ /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/build/bin/llc -mtriple=nvptx64-nvidia-cuda -mcpu=sm_53 -O0 -disable-post-ra -frame-pointer=all -verify-machineinstrs
+ /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/build/bin/FileCheck -allow-deprecated-dag-overlap -check-prefixes CHECK,CHECK-F16 /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/llvm-project/llvm/test/CodeGen/NVPTX/f16x2-instructions.ll
/home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/llvm-project/llvm/test/CodeGen/NVPTX/f16x2-instructions.ll:1507:15: error: CHECK-NEXT: expected string not found in input
; CHECK-NEXT: ld.param.v2.b32 {%r1, %r2}, [test_fptrunc_2xfloat_param_0];
              ^
<stdin>:960:10: note: scanning from here
// %bb.0:
         ^
<stdin>:961:2: note: possible intended match here
 ld.param.b64 %rd1, [test_fptrunc_2xfloat_param_0];
 ^

Input file: <stdin>
Check file: /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/llvm-project/llvm/test/CodeGen/NVPTX/f16x2-instructions.ll

-dump-input=help explains the following input dump.

Input was:
<<<<<<
             .
             .
             .
           955: { 
           956:  .reg .b16 %rs<3>; 
           957:  .reg .b32 %r<4>; 
           958:  .reg .b64 %rd<2>; 
           959:  
           960: // %bb.0: 
next:1507'0              X error: no match found
           961:  ld.param.b64 %rd1, [test_fptrunc_2xfloat_param_0]; 
next:1507'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
next:1507'1      ?                                                   possible intended match
           962:  mov.b64 {%r1, %r2}, %rd1; 
next:1507'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~
           963:  cvt.rn.f16.f32 %rs1, %r2; 
next:1507'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~
           964:  cvt.rn.f16.f32 %rs2, %r1; 
next:1507'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~
           965:  mov.b32 %r3, {%rs2, %rs1}; 
next:1507'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           966:  st.param.b32 [func_retval0], %r3; 
next:1507'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
             .
             .
...

AlexMaclean added a commit that referenced this pull request Jul 19, 2025
…9611)

#149393 and #149571 landed in quick succession, requiring
some tests to be regenerated to account for their interaction.
@llvm-ci
Collaborator

llvm-ci commented Jul 19, 2025

LLVM Buildbot has detected a new failure on builder premerge-monolithic-linux running on premerge-linux-1 while building llvm at step 7 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/153/builds/38648

Here is the relevant piece of the build log for the reference
Step 7 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'LLVM :: CodeGen/NVPTX/f16x2-instructions.ll' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
/build/buildbot/premerge-monolithic-linux/build/bin/llc < /build/buildbot/premerge-monolithic-linux/llvm-project/llvm/test/CodeGen/NVPTX/f16x2-instructions.ll -mtriple=nvptx64-nvidia-cuda -mcpu=sm_53           -O0 -disable-post-ra -frame-pointer=all -verify-machineinstrs  | /build/buildbot/premerge-monolithic-linux/build/bin/FileCheck -allow-deprecated-dag-overlap -check-prefixes CHECK,CHECK-F16 /build/buildbot/premerge-monolithic-linux/llvm-project/llvm/test/CodeGen/NVPTX/f16x2-instructions.ll # RUN: at line 3
+ /build/buildbot/premerge-monolithic-linux/build/bin/llc -mtriple=nvptx64-nvidia-cuda -mcpu=sm_53 -O0 -disable-post-ra -frame-pointer=all -verify-machineinstrs
+ /build/buildbot/premerge-monolithic-linux/build/bin/FileCheck -allow-deprecated-dag-overlap -check-prefixes CHECK,CHECK-F16 /build/buildbot/premerge-monolithic-linux/llvm-project/llvm/test/CodeGen/NVPTX/f16x2-instructions.ll
/build/buildbot/premerge-monolithic-linux/llvm-project/llvm/test/CodeGen/NVPTX/f16x2-instructions.ll:1507:15: error: CHECK-NEXT: expected string not found in input
; CHECK-NEXT: ld.param.v2.b32 {%r1, %r2}, [test_fptrunc_2xfloat_param_0];
              ^
<stdin>:960:10: note: scanning from here
// %bb.0:
         ^
<stdin>:961:2: note: possible intended match here
 ld.param.b64 %rd1, [test_fptrunc_2xfloat_param_0];
 ^

Input file: <stdin>
Check file: /build/buildbot/premerge-monolithic-linux/llvm-project/llvm/test/CodeGen/NVPTX/f16x2-instructions.ll

-dump-input=help explains the following input dump.

Input was:
<<<<<<
             .
             .
             .
           955: { 
           956:  .reg .b16 %rs<3>; 
           957:  .reg .b32 %r<4>; 
           958:  .reg .b64 %rd<2>; 
           959:  
           960: // %bb.0: 
next:1507'0              X error: no match found
           961:  ld.param.b64 %rd1, [test_fptrunc_2xfloat_param_0]; 
next:1507'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
next:1507'1      ?                                                   possible intended match
           962:  mov.b64 {%r1, %r2}, %rd1; 
next:1507'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~
           963:  cvt.rn.f16.f32 %rs1, %r2; 
next:1507'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~
           964:  cvt.rn.f16.f32 %rs2, %r1; 
next:1507'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~
           965:  mov.b32 %r3, {%rs2, %rs1}; 
next:1507'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           966:  st.param.b32 [func_retval0], %r3; 
next:1507'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
             .
             .
...

6 participants