When running the server in fastapi_server.py, I noticed a possible typo in the configuration of the llama_cpp.Llama instance. Here is the relevant code:
llama = llama_cpp.Llama(
    settings.model,
    f16_kv=True,
    use_mlock=True,
    embedding=True,
    n_threads=6,
    n_batch=2048,  # <-- should be n_ctx=2048
)
It appears that n_batch is set to 2048, but I believe it was intended to set n_ctx to 2048 instead. When I tried to run the code as is, I encountered an exception during generation from the assert on ctx being None. Changing n_batch to n_ctx resolved the issue.

Also, the default batch size is 8, so 2048 seems a bit high :)
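
For reference, here is a minimal sketch of what the corrected initialization could look like. The model path below is a hypothetical placeholder for illustration (the server passes settings.model here); the other keyword arguments are kept exactly as in the snippet above.

```python
import llama_cpp

# Hypothetical model path for illustration; fastapi_server.py uses settings.model.
MODEL_PATH = "./models/ggml-model-q4_0.bin"

# Corrected initialization: n_ctx sets the context window (what the 2048 was
# presumably meant for), while n_batch is left at its default of 8.
llama = llama_cpp.Llama(
    MODEL_PATH,
    f16_kv=True,
    use_mlock=True,
    embedding=True,
    n_threads=6,
    n_ctx=2048,  # was n_batch=2048 in fastapi_server.py
)
```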
Awesome job!!!