
Support for Mixtral (MOE) #1000

@rlancemartin

Description


Mixtral support is being added to llama.cpp now -
ggml-org/llama.cpp#4406

Using the weights linked here, downloaded to models/mixtral-8x7b.
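For reference, one way to fetch a GGUF like this is via huggingface_hub; a minimal sketch, where the repo id and filename are assumptions inferred from the filename used below (the original link is not preserved here):

from huggingface_hub import hf_hub_download

# Assumed repo id; adjust to wherever the weights actually live.
path = hf_hub_download(
    repo_id="TheBloke/Mixtral-8x7B-v0.1-GGUF",
    filename="mixtral-8x7b-v0.1.Q4_K_M.gguf",
    local_dir="models/mixtral-8x7b",
)
print(path)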

These steps work in llama.cpp (Mac M2, 32 GB) -

git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
git checkout mixtral
make -j && ./main -m models/mixtral-8x7b/mixtral-8x7b-v0.1.Q4_K_M.gguf -p "Building a website can be done in 10 simple steps:\nStep 1:" -n 400 -e
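For comparison, the equivalent call through llama-cpp-python's high-level API would look roughly like this - a sketch using the same GGUF and the same prompt as the ./main invocation above:

from llama_cpp import Llama

# Same Mixtral GGUF that loads fine under llama.cpp's mixtral branch.
llm = Llama(
    model_path="models/mixtral-8x7b/mixtral-8x7b-v0.1.Q4_K_M.gguf",
    n_ctx=2048,
)

# Mirror the ./main prompt and its -n 400 token budget.
out = llm(
    "Building a website can be done in 10 simple steps:\nStep 1:",
    max_tokens=400,
)
print(out["choices"][0]["text"])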

Trying the same w/ llama-cpp-python 0.2.22:

from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.llms import LlamaCpp

llm = LlamaCpp(
    model_path="/Users/rlm/Desktop/Code/llama.cpp/models/mixtral-8x7b/mixtral-8x7b-instruct-v0.1.Q2_K.gguf",
    n_gpu_layers=1,  # any value > 0 enables Metal offload on Apple Silicon
    n_batch=512,     # tokens evaluated per batch
    n_ctx=2048,      # context window size
    f16_kv=True,     # keep the KV cache in half precision
    callback_manager=CallbackManager([StreamingStdOutCallbackHandler()]),  # stream tokens to stdout
    verbose=True,
)

Error:

error loading model: create_tensor: tensor 'blk.0.ffn_gate.weight' not found
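The tensor name suggests the llama.cpp vendored in llama-cpp-python 0.2.22 predates MoE support: Mixtral GGUFs store the feed-forward weights per expert rather than as a single dense blk.N.ffn_gate.weight tensor, so the older loader cannot find the tensor it expects. One way to confirm is to list the tensor names in the file; a minimal sketch using the gguf package that ships from the llama.cpp repo:

from gguf import GGUFReader

reader = GGUFReader(
    "/Users/rlm/Desktop/Code/llama.cpp/models/mixtral-8x7b/mixtral-8x7b-instruct-v0.1.Q2_K.gguf"
)

# Print the first-block FFN tensor names; a Mixtral file should show
# per-expert gate tensors instead of a single blk.0.ffn_gate.weight.
for t in reader.tensors:
    if t.name.startswith("blk.0.ffn"):
        print(t.name)

If that is the cause, the fix is presumably for llama-cpp-python to bump its vendored llama.cpp to a commit that includes ggml-org/llama.cpp#4406.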
