It might be too much to ask for now, given it cuts deep into ggml's internals, but in the long term I believe it's important to support 16-bit precision.
Especially as GPU support gains more and more traction in GGML, the 32-bit requirement is a significant performance burden while providing no benefit for the multiplications themselves.
After all, the multiplications inside the GPU are done in 16 bit, so converting src1 from 32 bit to 16 bit for every calculation costs quite noticeable performance.
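To make the overhead concrete, here is a minimal sketch (not ggml's actual code, just an illustration under the assumption that the GPU GEMM consumes FP16 inputs): if src1 is stored as FP32, a conversion pass like the hypothetical kernel below has to run before every matrix multiplication, touching the whole tensor and a temporary buffer each time.

```cuda
#include <cuda_fp16.h>

// Hypothetical per-matmul conversion pass: src1 arrives in FP32 but the
// GEMM kernel consumes FP16, so every multiplication pays for this extra
// read/write of the entire tensor plus a scratch buffer for the result.
__global__ void convert_f32_to_f16(const float *src, __half *dst, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        dst[i] = __float2half(src[i]);  // extra memory traffic on every call
    }
}
```

If src1 were kept in 16-bit end to end, this kernel and its scratch buffer could be skipped entirely and the GEMM would read the tensor directly, which is where the performance win would come from.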