Possible Typo in fastapi_server.py: n_ctx vs n_batch #11

@MillionthOdin16

Description

While running the server in fastapi_server.py, I noticed a possible typo in the configuration of the llama_cpp.Llama instance.

Here is the relevant code:

llama = llama_cpp.Llama(
    settings.model,
    f16_kv=True,
    use_mlock=True,
    embedding=True,
    n_threads=6,
    n_batch=2048,  # <--- Should be n_ctx=2048
)

It appears that n_batch is set to 2048, but I believe the intent was to set n_ctx to 2048 instead. When I ran the code as is, generation failed with an assertion error because ctx was None. Changing n_batch to n_ctx resolved the issue.

Also, the default batch size is 8, so 2048 seems a bit high :)
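
For reference, here is a minimal sketch of what the corrected call might look like, assuming the same settings object from the example (settings.model being the model path) and leaving n_batch at its default:

import llama_cpp

llama = llama_cpp.Llama(
    settings.model,   # path to the GGML model file from the server settings
    f16_kv=True,
    use_mlock=True,
    embedding=True,
    n_threads=6,
    n_ctx=2048,       # context window size, which n_batch=2048 was likely meant to be
)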

Awesome job!!!

https://github.com/abetlen/llama-cpp-python/blob/b9a4513363267dcc1f4b77d709ac3333fc889c6e/examples/fastapi_server.py#LL36C5-L36C5


    Labels

    bug (Something isn't working), documentation (Improvements or additions to documentation)
