Possible Typo in fastapi_server.py: n_ctx vs n_batch #11

@MillionthOdin16

Description

While running the server in fastapi_server.py, I noticed a possible typo in the configuration of the llama_cpp.Llama instance.

Here is the relevant code:

llama = llama_cpp.Llama(
    settings.model,
    f16_kv=True,
    use_mlock=True,
    embedding=True,
    n_threads=6,
    n_batch=2048,  # <--- Should be n_ctx=2048
)

It appears that n_batch is set to 2048, but I believe the intent was to set n_ctx to 2048 instead. When I ran the code as is, generation failed with an assertion error because ctx was None. Changing n_batch to n_ctx resolved the issue.

Also, the default batch size is 8, so 2048 seems a bit high :)
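
For reference, here is a minimal sketch of what the corrected call might look like, assuming the same settings object from the example (settings.model being the model path) and leaving n_batch at its default:

import llama_cpp

llama = llama_cpp.Llama(
    settings.model,   # path to the GGML model file from the server settings
    f16_kv=True,
    use_mlock=True,
    embedding=True,
    n_threads=6,
    n_ctx=2048,       # context window size, which n_batch=2048 was likely meant to be
)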

Awesome job!!!

https://github.com/abetlen/llama-cpp-python/blob/b9a4513363267dcc1f4b77d709ac3333fc889c6e/examples/fastapi_server.py#LL36C5-L36C5


    Labels

    bug (Something isn't working), documentation (Improvements or additions to documentation)
