Fix Ernie4.5 MoE without shared experts #14746

Merged · 1 commit into ggml-org:master · Jul 17, 2025

Conversation

pwilkin (Contributor) commented on Jul 17, 2025

Fix bug per discussion in #14658

github-actions bot added the python (python script changes) label on Jul 17, 2025
pwilkin (Contributor, Author) commented on Jul 17, 2025

@CISC I believe this is what you had in mind :)

pwilkin force-pushed the fix-big-ernie-moe branch from 88611c1 to f6e4931 on July 17, 2025 at 22:22
pwilkin force-pushed the fix-big-ernie-moe branch from f6e4931 to 19eb88c on July 17, 2025 at 22:24
CISC (Collaborator) commented on Jul 17, 2025

Yes, however you can also remove add_expert_shared_feed_forward_length and change the tensor loading in llama-model.cpp; see this similar code:

llama.cpp/src/llama-model.cpp, lines 4784 to 4786 at 760b448:

layer.ffn_gate_shexp = create_tensor(tn(LLM_TENSOR_FFN_GATE_SHEXP, "weight", i), {n_embd, n_ff_exp * n_expert_shared}, 0);
layer.ffn_down_shexp = create_tensor(tn(LLM_TENSOR_FFN_DOWN_SHEXP, "weight", i), { n_ff_exp * n_expert_shared, n_embd}, 0);
layer.ffn_up_shexp = create_tensor(tn(LLM_TENSOR_FFN_UP_SHEXP, "weight", i), {n_embd, n_ff_exp * n_expert_shared}, 0);
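
For illustration, a minimal sketch of what that change could look like (not the code merged here): the shared-expert tensors from the snippet above are only created when n_expert_shared is non-zero, so checkpoints without shared experts skip them entirely. All identifiers are taken from the snippet above; the surrounding layer loop and the graph-side handling are omitted.

// sketch: only create the shared-expert projections when the model actually has shared experts
if (n_expert_shared > 0) {
    layer.ffn_gate_shexp = create_tensor(tn(LLM_TENSOR_FFN_GATE_SHEXP, "weight", i), {n_embd, n_ff_exp * n_expert_shared}, 0);
    layer.ffn_down_shexp = create_tensor(tn(LLM_TENSOR_FFN_DOWN_SHEXP, "weight", i), {n_ff_exp * n_expert_shared, n_embd}, 0);
    layer.ffn_up_shexp   = create_tensor(tn(LLM_TENSOR_FFN_UP_SHEXP,   "weight", i), {n_embd, n_ff_exp * n_expert_shared}, 0);
}
// else: the shexp tensors stay null and the shared-expert branch is skipped when building the graph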

CISC (Collaborator) commented on Jul 17, 2025

Use n_expert_shared as the condition for loading them, and remember to init that value here:

llama.cpp/src/llama-model.cpp, lines 1657 to 1662 at 760b448:

if (arch == LLM_ARCH_ERNIE4_5_MOE) {
    ml.get_key(LLM_KV_EXPERT_FEED_FORWARD_LENGTH, hparams.n_ff_exp);
    ml.get_key(LLM_KV_EXPERT_SHARED_FEED_FORWARD_LENGTH, hparams.n_ff_shexp, false);
    ml.get_key(LLM_KV_INTERLEAVE_MOE_LAYER_STEP, hparams.n_moe_layer_step);
    ml.get_key(LLM_KV_LEADING_DENSE_BLOCK_COUNT, hparams.n_layer_dense_lead);
}
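
And a sketch of the init referred to here, again not the merged change: it assumes the shared-expert count is stored under the generic expert_shared_count key (LLM_KV_EXPERT_SHARED_COUNT, as used by other MoE architectures) and reads it as optional so hparams.n_expert_shared stays 0 for models converted without shared experts.

if (arch == LLM_ARCH_ERNIE4_5_MOE) {
    ml.get_key(LLM_KV_EXPERT_FEED_FORWARD_LENGTH,        hparams.n_ff_exp);
    ml.get_key(LLM_KV_EXPERT_SHARED_FEED_FORWARD_LENGTH, hparams.n_ff_shexp, false);
    // assumption: the shared-expert count lives under the generic expert_shared_count
    // key; reading it as optional leaves it at 0 when the key is absent
    ml.get_key(LLM_KV_EXPERT_SHARED_COUNT,               hparams.n_expert_shared, false);
    ml.get_key(LLM_KV_INTERLEAVE_MOE_LAYER_STEP,         hparams.n_moe_layer_step);
    ml.get_key(LLM_KV_LEADING_DENSE_BLOCK_COUNT,         hparams.n_layer_dense_lead);
}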

CISC (Collaborator) commented on Jul 17, 2025

Actually, let's not, as there are already GGUFs out there. The old calculation is fine as well.

CISC (Collaborator) commented on Jul 17, 2025

@nicoboss You will have to reconvert (or delete the ernie4_5-moe.expert_shared_feed_forward_length key).

CISC merged commit 670e136 into ggml-org:master on Jul 17, 2025
5 checks passed
Labels: python (python script changes)

2 participants