-
Notifications
You must be signed in to change notification settings - Fork 203
Refactor the builtins to use more overloads #1751
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
4c3d00b
to
def257f
Compare
Great work! I'm a bit confused - you say in the title that you make the builtins use more overloads, but the code seems to use less overloads, and more templates instead? Or am I misunderstanding the change? |
Yeah, the wording there is a bit imprecise. If you look at the 3 headers that replace the old |
I see. I was just asking because there was a change in the SYCL 2020 spec to define the builtins as a mixture of templates and overloads (mostly overload for |
To be merged after 25.02 release |
This is a wholesale refactor of the builtins, splitting them into 3 files for generic, math + integer, as well as relational builtins. As part of the refactoring, some functions have been refactored from solely using templates to using overloads and templates, while others have had their templates updated according to the latest spec. Because of the support for floats and vectors/marrays, the actual `builtin_interface` has been changed, particularly with the functions taking pointer arguments.
4dc2c15
to
9cc4592
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Very cool work. Our test cases only test builtin functions selectively, and such a big change is a little scary :) Have you by chance tried to run CTS with this?
Note: The new builtins are designed to support half in varying states, sometimes by just casting to float and doing the operations on float, but the half support should be treated as broken on most platforms. (The support for half on device code in multipass is very weird and spews lots of errors, and the accuracy tests are fail heavily if extended to half).
half is an optional feature; we could in principle say that it is e.g. only fully supported with generic
target (SSCP compiler).
I got the math builtin CTS working (for some definition of working, see this branch), but I did get the math builtins test to run. I also hacked ACPP, see this branch, but I think it works fineish. These are the results: https://github.com/carbotaniuman/SYCL-CTS/blob/e51802a759ebda335c53b5952f41f487afe24bd2/foo Those are sinpi and cospi like things which we fail because of precision, which I think is existing. Unfortunately this reveals that the way we handle half is just broken and blows up on a couple of edge cases. We also don't advertise fp16 support so the CTS doesn't build it, and if we enable it everything crashes terribly. It probably needs another whole refactor. I am fairly confident that the non-half functions are correct, but I'm a bit unsure how to proceed with the half ones. I think the best way to do it would be to properly add overloads for all the builtins for half and probably to deprecate/remove the SCMP support. I would like to get this merged, but given the absolutely shambolic half support I feel a bit uncomfortable doing so. Do you have any thoughts? |
That's great :)
IMO that might be a bug in the CTS and we should consider opening an issue there. The SYCL math builtin precision is implementation-defined, so CTS would be wrong to check for a specific accuracy here. This was recently-ish clarified: KhronosGroup/SYCL-Docs#511
Can you summarize what the issue is? I think in the long run we probably won't get around adding We could then either implement these overloads in the builtin interface by blindly casting to float for SMCP, or implement them properly if we are more motivated.
Even if the PR is perhaps not perfect, as far as I understand it, it doesn't make anything worse, right? As you point out we don't really advertise half. So I wouldn't be opposed to merging it. We can still improve half support in the future. |
Basically the way I implemented half by delegating to double is slightly broken. After thinking about it I think it would be fine to merge it now, half is currently not that very well supported anyways and it's probably better to address that in a follow up. |
This is a wholesale refactor of the builtins, splitting them into 3 files for generic, math + integer, as well as relational builtins. As part of the refactoring, some functions have been refactored from solely using templates to using overloads and templates, while others have had their templates updated according to the latest spec.
Because of the support for floats and vectors/marrays, the actual
builtin_interface
has been changed, particularly with the functions taking pointer arguments.Note: The new builtins are designed to support
half
in varying states, sometimes by just casting to float and doing the operations on float, but the half support should be treated as broken on most platforms. (The support for half on device code in multipass is very weird and spews lots of errors, and the accuracy tests are fail heavily if extended to half).