Updating to latest repo version with LLM Monitoring metrics #1

Merged
merged 16 commits into from
May 9, 2024

@juanroesel commented May 7, 2024

Closes ZenHubHQ/devops#2205

It also includes the following:

  • A new metric, kv_cache_usage_ratio, which measures the fraction of the KV cache currently in use.
  • Synced commits with the parent repo (not relevant for the PR review).
  • A Llama 3 8B model baked into the image.
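To make the new metric concrete, here is a minimal sketch, assuming kv_cache_usage_ratio is defined as occupied KV cache cells divided by the context size (n_ctx); the PR does not spell out the formula. In llama-cpp-python the occupied-cell count would likely come from the low-level binding for llama.cpp's llama_get_kv_cache_used_cells, but the hypothetical helper below takes plain integers so it runs without loading a model.

```python
def kv_cache_usage_ratio(used_cells: int, n_ctx: int) -> float:
    """Return the fraction of KV cache cells in use, in [0.0, 1.0].

    Hypothetical helper: assumes the metric is used_cells / n_ctx,
    which is not confirmed by the PR.
    """
    if n_ctx <= 0:
        raise ValueError("n_ctx must be positive")
    if not 0 <= used_cells <= n_ctx:
        raise ValueError("used_cells must be between 0 and n_ctx")
    return used_cells / n_ctx


# Example: 1,536 of 4,096 cells occupied
print(kv_cache_usage_ratio(1536, 4096))  # 0.375
```

Expressing the value as a ratio rather than a raw cell count keeps dashboards and alert thresholds comparable across deployments that use different context sizes.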

New image us.gcr.io/zenhub-ops/llama_cpp_python-llama3_8b_f16:v0.3.1 was successfully deployed into staging.

abetlen and others added 15 commits May 2, 2024 11:32
* set up streaming for v2

* assert v2 streaming, fix tool_call vs function_call

* fix streaming with tool_choice/function_call

* make functions return 1 function call only when 'auto'

* fix

---------

Co-authored-by: Andrei <abetlen@gmail.com>
…ing space (abetlen#1375)

* Fix tokenization edge case where llama output does not start with a space

See this notebook:
https://colab.research.google.com/drive/1Ooz11nFPk19zyJdMDx42CeesU8aWZMdI#scrollTo=oKpHw5PZ30uC

* Update _internals.py

Fixing to compare to b' ' instead of (str)' '
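The _internals.py fix turns on a Python detail: by default, comparing a bytes object with a str does not raise an error, it simply evaluates to False. The sketch below is not the actual patch; it uses a hypothetical helper, strip_leading_space, and made-up token bytes to show why a detokenized prefix has to be checked against b' ' rather than ' '.

```python
def strip_leading_space(detokenized: bytes) -> bytes:
    """Drop a single leading space from detokenized bytes, if present.

    Hypothetical helper illustrating the bytes-vs-str comparison;
    not the actual llama-cpp-python code.
    """
    # Slicing bytes yields bytes, so compare against the bytes literal b" ".
    if detokenized[:1] == b" ":
        return detokenized[1:]
    return detokenized


raw = b" Hello"  # made-up detokenizer output with a leading space

# Comparing bytes to str is always False (no error by default),
# so a check against " " silently never matches.
print(raw[:1] == " ")   # False
print(raw[:1] == b" ")  # True

print(strip_leading_space(raw))       # b'Hello'
print(strip_leading_space(b"Hello"))  # b'Hello' (unchanged)
```

Running the interpreter with the -b option makes mismatched bytes/str comparisons emit a BytesWarning, which can help catch this class of bug.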

---------

Co-authored-by: Andrei <abetlen@gmail.com>
)

* Update dependabot.yml

Add github-actions update

* Update dependabot.yml

* Update dependabot.yml

@juanroesel requested review from m62534 and cwarje May 7, 2024 02:32
@juanroesel (Author)

NOTE: GH Actions need to be updated in this repo. I will create a ticket for this soon.

@juanroesel requested a review from blacklander May 7, 2024 17:17

@cwarje left a comment

LGTM

@juanroesel (Author) commented May 9, 2024

@m62534 @cwarje Just FYI, given today's events with Llama3, I built a new image us.gcr.io/zenhub-ops/llama_cpp_python_zh-mistral7b_f16:v0.2.1 containing these code changes plus the Mistral model and redeployed it in staging.

@juanroesel merged commit 8cd638c into main May 9, 2024