Feature - Internal ggml precision GGML_TYPE_F16 support #1492

@cmp-nct

This may be too much to ask for now, since it reaches deep into ggml, but in the long term I believe it's important to support 16-bit precision internally.
Especially as GPU support gains traction in ggml, the 32-bit requirement is a significant performance burden while providing no benefit for the multiplications.
After all, the multiplications inside the GPU are all 16-bit, so converting src1 from 32-bit to 16-bit for every calculation costs noticeable performance.
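
To make the overhead concrete, here is a rough standalone sketch of the extra pass I mean. It is not ggml code: `f32_to_f16`, `convert_src1_row` and `conversions_for` are hypothetical names made up for illustration, and the conversion is deliberately simplified (rounds half away from zero, flushes anything below the F16 normal range to zero).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, NOT ggml's implementation: simplified F32 -> F16.
 * Rounds half away from zero and flushes values below the F16 normal
 * range to zero, which is enough to illustrate the cost. */
static uint16_t f32_to_f16(float f) {
    uint32_t x;
    memcpy(&x, &f, sizeof x);                  /* read the raw IEEE 754 bits */
    uint32_t sign = (x >> 16) & 0x8000u;
    uint32_t exp  = (x >> 23) & 0xFFu;
    uint32_t man  = x & 0x7FFFFFu;

    if (exp == 0xFFu) {                        /* inf or NaN */
        return (uint16_t)(sign | 0x7C00u | (man ? 0x0200u : 0u));
    }
    int32_t e = (int32_t)exp - 127 + 15;       /* rebias the exponent for F16 */
    if (e >= 31) return (uint16_t)(sign | 0x7C00u);  /* overflow -> inf */
    if (e <= 0)  return (uint16_t)sign;              /* underflow -> signed zero */

    uint32_t h = ((uint32_t)e << 10) | (man >> 13);
    if (man & 0x1000u) h++;                    /* round on the first dropped bit */
    return (uint16_t)(sign | h);
}

/* The extra pass a 32-bit src1 forces before a 16-bit multiply:
 * n loads, n conversions and n stores into a scratch buffer.
 * Returns the number of elements converted. */
static size_t convert_src1_row(uint16_t *dst, const float *src, size_t n) {
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < n; i++) {
        dst[i] = f32_to_f16(src[i]);
    }
    return n;
}

/* Total elements converted when n_calls multiplications each need an
 * F32 src1 of length n turned into F16 first. With a native F16 src1
 * this whole function would not be needed. */
static size_t conversions_for(const float *src1, uint16_t *scratch,
                              size_t n, size_t n_calls) {
    size_t total = 0;
    for (size_t c = 0; c < n_calls; c++) {
        total += convert_src1_row(scratch, src1, n);
    }
    return total;
}
```

The point is only that the work scales with `n * n_calls` and needs a scratch buffer each time. If the tensor were already stored as GGML_TYPE_F16, that pass would disappear.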
