README.md (+6 −6)
@@ -465,20 +465,20 @@ llm.create_chat_completion(
 ```
 
 <details>
-<summary>Functionary v2</summary>
+<summary>Functionary</summary>
 
-The various gguf-converted files for this set of models can be found [here](https://huggingface.co/meetkai). Functionary is able to intelligently call functions and also analyze any provided function outputs to generate coherent responses. All v2 models of functionary support **parallel function calling**. You can provide either `functionary-v1` or `functionary-v2` for the `chat_format` when initializing the Llama class.
+The various gguf-converted files for this set of models can be found [here](https://huggingface.co/meetkai). Functionary is able to intelligently call functions and also analyze any provided function outputs to generate coherent responses. All v2 models of functionary support **parallel function calling**. You can provide `functionary` for the `chat_format` when initializing the Llama class.
 
 Due to discrepancies between llama.cpp and HuggingFace's tokenizers, it is required to provide the HF tokenizer for functionary. The `LlamaHFTokenizer` class can be initialized and passed into the Llama class. This will override the default llama.cpp tokenizer used in the Llama class. The tokenizer files are already included in the respective HF repositories hosting the gguf files.
 
 ```python
 from llama_cpp import Llama
 from llama_cpp.llama_tokenizer import LlamaHFTokenizer
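# --- Illustrative continuation, not part of the original hunk: a minimal
# sketch of initializing Llama with the unified `functionary` chat format
# described above. The repo id and filename are placeholders; substitute any
# functionary GGUF (and its matching tokenizer repo) from
# https://huggingface.co/meetkai.
tokenizer = LlamaHFTokenizer.from_pretrained("meetkai/functionary-small-v2.4-GGUF")
llm = Llama.from_pretrained(
    repo_id="meetkai/functionary-small-v2.4-GGUF",
    filename="functionary-small-v2.4.Q4_0.gguf",
    chat_format="functionary",  # the single format this change introduces
    tokenizer=tokenizer,        # overrides the default llama.cpp tokenizer
)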
docs/server.md (+2 −2)
@@ -78,12 +78,12 @@ You'll first need to download one of the available function calling models in GGUF format:
 
 - [functionary](https://huggingface.co/meetkai)
 
-Then when you run the server you'll also need to specify either `functionary-v1` or `functionary-v2` as the `chat_format`.
+Then when you run the server you'll also need to specify `functionary` as the `chat_format`.
 
 Note that since functionary requires an HF tokenizer due to discrepancies between llama.cpp and HuggingFace's tokenizers, as mentioned [here](https://github.com/abetlen/llama-cpp-python/blob/main?tab=readme-ov-file#function-calling), you will need to pass in the path to the tokenizer too. The tokenizer files are already included in the respective HF repositories hosting the gguf files.
 
 Check out this [example notebook](https://github.com/abetlen/llama-cpp-python/blob/main/examples/notebooks/Functions.ipynb) for a walkthrough of some interesting use cases for function calling.
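As a usage sketch, assuming the server was launched along the lines of `python3 -m llama_cpp.server --model <path to a functionary gguf> --chat_format functionary --hf_pretrained_model_name_or_path <tokenizer repo>` (the paths are left as placeholders), any OpenAI-compatible client can then exercise function calling:

```python
from openai import OpenAI

# The llama-cpp-python server listens on http://localhost:8000 by default and
# does not require a real API key unless one is configured.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="sk-placeholder")

# `book_table` is a hypothetical tool defined purely for illustration.
response = client.chat.completions.create(
    model="functionary",  # placeholder; the server answers with its loaded model
    messages=[{"role": "user", "content": "Book a table for two at 7pm."}],
    tools=[
        {
            "type": "function",
            "function": {
                "name": "book_table",
                "description": "Reserve a restaurant table",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "party_size": {"type": "integer"},
                        "time": {"type": "string"},
                    },
                    "required": ["party_size", "time"],
                },
            },
        }
    ],
)
print(response.choices[0].message.tool_calls)
```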