Skip to content

Refactor the builtins to use more overloads #1751

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: develop
Choose a base branch
from

Conversation

carbotaniuman
Copy link
Contributor

@carbotaniuman carbotaniuman commented Mar 21, 2025

This is a wholesale refactor of the builtins, splitting them into 3 files for generic, math + integer, as well as relational builtins. As part of the refactoring, some functions have been refactored from solely using templates to using overloads and templates, while others have had their templates updated according to the latest spec.

Because of the support for floats and vectors/marrays, the actual builtin_interface has been changed, particularly with the functions taking pointer arguments.

Note: The new builtins are designed to support half in varying states, sometimes by just casting to float and doing the operations on float, but the half support should be treated as broken on most platforms. (The support for half on device code in multipass is very weird and spews lots of errors, and the accuracy tests are fail heavily if extended to half).

@carbotaniuman carbotaniuman force-pushed the refactor-builtins branch 8 times, most recently from 4c3d00b to def257f Compare March 21, 2025 21:07
@illuhad
Copy link
Collaborator

illuhad commented Apr 14, 2025

Great work! I'm a bit confused - you say in the title that you make the builtins use more overloads, but the code seems to use less overloads, and more templates instead? Or am I misunderstanding the change?

@carbotaniuman
Copy link
Contributor Author

carbotaniuman commented Apr 15, 2025

Yeah, the wording there is a bit imprecise. If you look at the 3 headers that replace the old builtin_utils, math_builtins uses (some) overloads, while generic_builtins and relational_builtins are all templates. The code duplication and repetition is sort of unfortunate, but the SYCL spec for all the categories is different enough that it's not really possible to nicely share code there.

@illuhad
Copy link
Collaborator

illuhad commented Apr 15, 2025

I see. I was just asking because there was a change in the SYCL 2020 spec to define the builtins as a mixture of templates and overloads (mostly overload for float/double/half and template for marray/vec), which matters for overload resolution. But it seems your PR already takes this into account :)

@illuhad
Copy link
Collaborator

illuhad commented May 5, 2025

To be merged after 25.02 release

This is a wholesale refactor of the builtins, splitting them into 3
files for generic, math + integer, as well as relational builtins.
As part of the refactoring, some functions have been refactored from
solely using templates to using overloads and templates, while others
have had their templates updated according to the latest spec.

Because of the support for floats and vectors/marrays, the actual
`builtin_interface` has been changed, particularly with the functions
taking pointer arguments.
Copy link
Collaborator

@illuhad illuhad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Very cool work. Our test cases only test builtin functions selectively, and such a big change is a little scary :) Have you by chance tried to run CTS with this?

Note: The new builtins are designed to support half in varying states, sometimes by just casting to float and doing the operations on float, but the half support should be treated as broken on most platforms. (The support for half on device code in multipass is very weird and spews lots of errors, and the accuracy tests are fail heavily if extended to half).

half is an optional feature; we could in principle say that it is e.g. only fully supported with generic target (SSCP compiler).

@carbotaniuman
Copy link
Contributor Author

carbotaniuman commented Jul 12, 2025

I got the math builtin CTS working (for some definition of working, see this branch), but I did get the math builtins test to run. I also hacked ACPP, see this branch, but I think it works fineish.

These are the results: https://github.com/carbotaniuman/SYCL-CTS/blob/e51802a759ebda335c53b5952f41f487afe24bd2/foo

Those are sinpi and cospi like things which we fail because of precision, which I think is existing. Unfortunately this reveals that the way we handle half is just broken and blows up on a couple of edge cases. We also don't advertise fp16 support so the CTS doesn't build it, and if we enable it everything crashes terribly. It probably needs another whole refactor.

I am fairly confident that the non-half functions are correct, but I'm a bit unsure how to proceed with the half ones. I think the best way to do it would be to properly add overloads for all the builtins for half and probably to deprecate/remove the SCMP support.

I would like to get this merged, but given the absolutely shambolic half support I feel a bit uncomfortable doing so. Do you have any thoughts?

@illuhad
Copy link
Collaborator

illuhad commented Jul 14, 2025

I got the math builtin CTS working (for some definition of working, see this branch), but I did get the math builtins test to run. I also hacked ACPP, see this branch, but I think it works fineish.

That's great :)

Those are sinpi and cospi like things which we fail because of precision, which I think is existing.

IMO that might be a bug in the CTS and we should consider opening an issue there. The SYCL math builtin precision is implementation-defined, so CTS would be wrong to check for a specific accuracy here. This was recently-ish clarified: KhronosGroup/SYCL-Docs#511

Unfortunately this reveals that the way we handle half is just broken and blows up on a couple of edge cases. We also don't advertise fp16 support so the CTS doesn't build it, and if we enable it everything crashes terribly. It probably needs another whole refactor. I am fairly confident that the non-half functions are correct, but I'm a bit unsure how to proceed with the half ones. I think the best way to do it would be to properly add overloads for all the builtins for half and probably to deprecate/remove the SCMP support.

Can you summarize what the issue is?

I think in the long run we probably won't get around adding half overloads to the builtin interface which the backends can then implement in whatever way makes sense for them/the target hardware. In this case, the SYCL builtin definitions in the header would just call into the builtin interface in the same way as for float or double.

We could then either implement these overloads in the builtin interface by blindly casting to float for SMCP, or implement them properly if we are more motivated.

I would like to get this merged, but given the absolutely shambolic half support I feel a bit uncomfortable doing so. Do you have any thoughts?

Even if the PR is perhaps not perfect, as far as I understand it, it doesn't make anything worse, right? As you point out we don't really advertise half. So I wouldn't be opposed to merging it. We can still improve half support in the future.

@carbotaniuman
Copy link
Contributor Author

Basically the way I implemented half by delegating to double is slightly broken. After thinking about it I think it would be fine to merge it now, half is currently not that very well supported anyways and it's probably better to address that in a follow up.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants