[pull] main from abetlen:main #31

pull · 2023-11-02T14:15:16Z

See Commits and Changes for more details.

Created by pull[bot] (v2.0.0-alpha.3)

Can you help keep this open source service alive? 💖 Please sponsor : )

…to main

* Disable Windows+CUDA workaround when compiling for HIPBLAS
* fix spacing
* change condition to check for Windows & CUDA

Co-authored-by: Andrei <abetlen@gmail.com>

…to main

* Templates sometimes have BOS in them, remove duplicate
* tokenize chat format prompts before completion

  This is to ensure that we don't duplicate any special tokens. Hopefully I amended the existing formats correctly?
* updated comment
* corrected a few
* add some missing internals
* proper bos/eos detection
* just let tokenizer do the job
* typo--
* align test with new response
* changed to a warning
* move to another PR
* Use python warnings module

Co-authored-by: Andrei Betlen <abetlen@gmail.com>
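The duplicate-BOS check described in the commits above might look roughly like the following. This is a minimal sketch, not the code from the PR: the function name `check_duplicate_bos`, its signature, and the warning text are all hypothetical, and it assumes the prompt has already been tokenized into a list of integer token ids.

```python
import warnings


def check_duplicate_bos(tokens, bos_id):
    """Hypothetical helper: warn if a token sequence starts with two BOS tokens.

    `tokens` is a list of integer token ids and `bos_id` is the id of the
    BOS token. Both names are assumptions made for this illustration.
    """
    if len(tokens) >= 2 and tokens[0] == bos_id and tokens[1] == bos_id:
        # Emit a warning through the standard warnings module instead of
        # silently rewriting the prompt.
        warnings.warn(
            "Detected duplicate leading BOS token in prompt",
            RuntimeWarning,
            stacklevel=2,
        )
        return True
    return False
```

Warning instead of stripping the token leaves the decision to the caller, who can fix the chat template or filter the warning.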

* Fix logprobs when BOS is not present
* Fix logprobs when bos is not available

* passthru rpc_servers params wip
* enable llama rpc by default
* convert string to byte
* add rpc package
* Revert "enable llama rpc by default"

  This reverts commit 832c6dd.
* update readme
* Only set rpc_servers when provided
* Add rpc servers to server options

Co-authored-by: Andrei Betlen <abetlen@gmail.com>

* Support SPM infill
* typo--
* one less layer of parenthesis necessary
* new required internals
* manually add bos/eos if model requires it
* add bos even when unknown

  This is identical behaviour to llama.cpp

  I guess any model that doesn't use BOS is recent enough to have the add_bos_token metadata.
* don't add bos/eos on non-infill pre-tokenized prompt
* add tokenizer hack to remove leading space in suffix
* I keep forgetting metadata are strings
* check if bos exists
* add example
* add cls/sep instead of bos/eos for WPM vocab
* simplify
* color-code filtered suffix

Co-authored-by: Andrei Betlen <abetlen@gmail.com>

… from memory (#1513)

* feat: add explicit methods to free model

  This commit introduces a `close` method to both `Llama` and `_LlamaModel`, allowing users to explicitly free the model from RAM/VRAM.

  The previous implementation relied on the destructor of `_LlamaModel` to free the model. However, in Python, the timing of destructor calls is unclear; for instance, the `del` statement does not guarantee immediate invocation of the destructor.

  This commit provides an explicit method to release the model, which works immediately and allows the user to load another model without memory issues.

  Additionally, this commit implements a context manager in the `Llama` class, enabling the automatic closure of the `Llama` object when used with the `with` statement.
* feat: Implement ContextManager in _LlamaModel, _LlamaContext, and _LlamaBatch

  This commit enables automatic resource management by implementing the `ContextManager` protocol in `_LlamaModel`, `_LlamaContext`, and `_LlamaBatch`. This ensures that resources are properly managed and released within a `with` statement, enhancing robustness and safety in resource handling.
* feat: add ExitStack for Llama's internal class closure

  This update implements ExitStack to manage and close internal classes in Llama, enhancing efficient and safe resource management.
* Use contextlib ExitStack and closing
* Explicitly free model when closing resources on server

Co-authored-by: Andrei Betlen <abetlen@gmail.com>
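The `ExitStack` plus `closing` pattern named in these commits can be illustrated with stand-in classes. This is a sketch of the general technique only: `FakeResource` and `Wrapper` are hypothetical names and do not reproduce the actual `Llama`, `_LlamaModel`, or `_LlamaContext` code.

```python
from contextlib import ExitStack, closing


class FakeResource:
    """Hypothetical stand-in for a native handle such as a model or context."""

    def __init__(self, name, log):
        self.name = name
        self.log = log
        self.closed = False

    def close(self):
        # Idempotent: a second call is a no-op.
        if not self.closed:
            self.closed = True
            self.log.append(f"freed {self.name}")


class Wrapper:
    """Hypothetical owner that registers its internals on one ExitStack."""

    def __init__(self, log):
        self._stack = ExitStack()
        # closing() turns any object with a close() method into a context
        # manager; enter_context() returns the wrapped object itself.
        self.model = self._stack.enter_context(closing(FakeResource("model", log)))
        self.ctx = self._stack.enter_context(closing(FakeResource("context", log)))

    def close(self):
        # Unwinds in reverse registration order: context first, then model.
        self._stack.close()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()

    def __del__(self):
        # Fallback only; callers should not rely on destructor timing.
        self.close()


log = []
with Wrapper(log) as w:
    assert not w.model.closed
print(log)
```

Resources are released as soon as the `with` block exits, in reverse order of registration, and calling `close()` again afterwards does nothing because the stack is already empty.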

Bumps [pypa/cibuildwheel](https://github.com/pypa/cibuildwheel) from 2.18.1 to 2.19.0.

- [Release notes](https://github.com/pypa/cibuildwheel/releases)
- [Changelog](https://github.com/pypa/cibuildwheel/blob/main/docs/changelog.md)
- [Commits](pypa/cibuildwheel@v2.18.1...v2.19.0)

---
updated-dependencies:
- dependency-name: pypa/cibuildwheel
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

… prefix in header file

pull bot added the ⤵️ pull label Nov 2, 2023

abetlen force-pushed the main branch from f902f59 to fa83cc5 Compare November 2, 2023 18:28

pull bot added the merge-conflict Resolve conflicts manually label Nov 4, 2023

abetlen force-pushed the main branch 5 times, most recently from 4408d7a to cc0fe43 Compare November 14, 2023 20:30

abetlen force-pushed the main branch from 0188482 to c96b2da Compare April 17, 2024 14:06

abetlen and others added 21 commits May 29, 2024 02:02

fix: fix string value kv_overrides. Closes #1487

df45a4b

fix: adjust kv_override member names to match llama.cpp

91d05ab

fix: Fix typo in Llama3VisionAlphaChatHandler. Closes #1488

165b4dc

fix: Use numpy recarray for candidates data, fixes bug with temp < 0

af3ed50

Merge branch 'main' of https://github.com/abetlen/llama-cpp-python in…

a6457ba

…to main

misc: Improve llava error messages

6b018e0

feat: Update llama.cpp

cd3f1bb

chore: Bump version

c3ef41b

Merge branch 'main' of https://github.com/abetlen/llama-cpp-python in…

951e39c

…to main

fix: fix logprobs when BOS is not present (#1471)

6e0642c

* Fix logprobs when BOS is not present
* Fix logprobs when bos is not available

feat: Update llama.cpp

255e1b4

feat: Update llama.cpp

83d6b26

feat: Update llama.cpp

1615eb9

chore: Bump version

86a38ad

feat: Update llama.cpp

e342161

abetlen and others added 30 commits March 12, 2025 04:44

feat: Update llama.cpp

e232fae

chore: Bump version

37eb5f0

feat: Update llama.cpp

99f2ebf

feat: Update llama.cpp

4c6514d

chore: Bump version

cb2edb9

hotfix: Disable curl support

b1d23df

feat: Update llama.cpp

0d475d7

misc: Fix support for new parameters, deprecate rpc_servers parameter

51dce74

fix(minor): Fix type hint for older versions of python

5a635f4

fix: Fix missing deprecated symbols on windows with missing LLAMA_API…

0dec788

… prefix in header file

feat: Add support for new mtmd api, add Qwen2.5-VL chat handler

cd548bd

fix: Use num_threads from llama model for mtmd

07a979f

docs: Add Qwen2.5-VL to README

6f3f0bf

chore: Bump version

9770b84

fix: Update reference to in Llama.embed. Closes #2037

9e5a4ea

fix(ci): Update cuda build action to use ubuntu 22.04

ae54cde

fix(ci): Add git to package list

083fcf6

fix(ci): Remove macos-13 builds to fix cross compilation error

11d28df

chore: Bump version

1580839

fix(ci): update runners for cpu builds

82ad829

fix(ci): Update docker runner

7011bc1

feat: Update llama.cpp

b39e9d4

fix(ci): Temporarily disable windows cuda wheels

98fda8c

chore: Bump version

8866fbd

fix(ci): Fix macos cpu builds

cce4887

feat: Update llama.cpp

a99fd21

fix: Better chat format for Qwen2.5-VL (#2040)

c8579d7

chore: Bump version

d9749cb

feat: Update llama.cpp

95292e3

chore: Bump version

e1af05f
