
Commit 397ae97: Update README

1 parent 1c18845


README.md

Lines changed: 23 additions & 2 deletions
@@ -31,6 +31,10 @@ You can force the use of `cmake` on Linux / MacOS setting the `FORCE_CMAKE=1` en
## High-level API

The high-level API provides a simple managed interface through the `Llama` class.

Below is a short example demonstrating how to use the high-level API to generate text:

```python
>>> from llama_cpp import Llama
>>> llm = Llama(model_path="./models/7B/ggml-model.bin")
```
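The hunk shows only the start of this example; a minimal sketch of the generation step that typically follows, assuming the `Llama` instance is callable with a prompt and sampling parameters (`max_tokens`, `stop`, `echo`) and returns an OpenAI-style completion dict:

```python
>>> # Minimal sketch (not part of this hunk): generate a completion.
>>> # Assumes Llama.__call__ returns an OpenAI-style dict with "choices".
>>> output = llm("Q: Name the planets in the solar system? A: ", max_tokens=32, stop=["Q:", "\n"], echo=True)
>>> print(output["choices"][0]["text"])
```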
@@ -90,8 +94,25 @@ docker run --rm -it -p8000:8000 -v /path/to/models:/models -eMODEL=/models/ggml-
## Low-level API

The low-level API is a direct [`ctypes`](https://docs.python.org/3/library/ctypes.html) binding to the C API provided by `llama.cpp`.
The entire low-level API can be found in [llama_cpp/llama_cpp.py](https://github.com/abetlen/llama-cpp-python/blob/master/llama_cpp/llama_cpp.py) and directly mirrors the C API in [llama.h](https://github.com/ggerganov/llama.cpp/blob/master/llama.h).

Below is a short example demonstrating how to use the low-level API to tokenize a prompt:

```python
>>> import llama_cpp
>>> import ctypes
>>> params = llama_cpp.llama_context_default_params()
>>> # use bytes for char * params
>>> ctx = llama_cpp.llama_init_from_file(b"./models/7B/ggml-model.bin", params)
>>> max_tokens = params.n_ctx
>>> # use ctypes arrays for array params
>>> tokens = (llama_cpp.llama_token * int(max_tokens))()
>>> n_tokens = llama_cpp.llama_tokenize(ctx, b"Q: Name the planets in the solar system? A: ", tokens, max_tokens, add_bos=llama_cpp.c_bool(True))
>>> llama_cpp.llama_free(ctx)
```
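The token buffer filled in by `llama_tokenize` can be inspected before the context is freed; a minimal sketch, assuming the binding also exposes `llama_token_to_str` (mirroring the declaration in `llama.h`) and that it returns `bytes` for a single token:

```python
>>> # Minimal sketch: run these lines before llama_cpp.llama_free(ctx) above.
>>> # Assumes llama_token_to_str is bound and returns bytes per token.
>>> ids = list(tokens[:n_tokens])  # a ctypes array slices to plain Python ints
>>> b"".join(llama_cpp.llama_token_to_str(ctx, token) for token in ids)
```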
Check out the [examples folder](examples/low_level_api) for more examples of using the low-level API.


# Documentation
