Model: Add support for Ernie 4.5 MoE #14658

pwilkin · 2025-07-13T00:29:15Z

[x] I have no idea what I'm doing

This is my first attempt at adding a new arch and I am very much out of my depth, so I would really appreciate if someone took a look at it and verified if it even made any sense. I basically asked Gemini to make a patch based on the existing vLLM / Chatllm.cpp implementations, then tackled some of the conversion logic myself so that it actually generates a GGUF file with all the layers.

Would close #14465

pwilkin · 2025-07-13T18:04:25Z

All right, I made a Q4_0 model and got a coherent response, so I guess this somewhat works. I'm upgrading this from draft status, maybe someone can take a look.

pwilkin · 2025-07-14T10:27:44Z

A sample quant for this model has been uploaded here: https://huggingface.co/ilintar/ERNIE-4.5-21B-A3B-PT-gguf

convert_hf_to_gguf.py

gguf-py/gguf/constants.py

src/llama-model.cpp

src/llama-arch.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

pwilkin · 2025-07-14T12:47:47Z

@CISC all right, think that should be all the fixes.

src/llama-model.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

theo77186 · 2025-07-14T19:24:10Z

I'm testing this branch, while testing speculative decoding, it seems it caused a regression loading the dense 300M model.

Logs loading the dense model only

ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 2 CUDA devices:
  Device 0: NVIDIA GeForce RTX 4060 Ti, compute capability 8.9, VMM: yes
  Device 1: NVIDIA GeForce RTX 3060, compute capability 8.6, VMM: yes
build: 5896 (542f36bbb) with cc (Debian 14.2.0-19) 14.2.0 for x86_64-linux-gnu
system info: n_threads = 16, n_threads_batch = 16, total_threads = 32

system_info: n_threads = 16 (n_threads_batch = 16) / 32 | CUDA : ARCHS = 860,890 | USE_GRAPHS = 1 | PEER_MAX_BATCH_SIZE = 128 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | LLAMAFILE = 1 | OPENMP = 1 | REPACK = 1 | 

main: binding port with default address family
main: HTTP server is listening, hostname: 127.0.0.1, port: 8080, http threads: 31
main: loading model
srv    load_model: loading model 'ERNIE-4.5-0.3B-f16.gguf'
llama_model_load_from_file_impl: using device CUDA0 (NVIDIA GeForce RTX 4060 Ti) - 15216 MiB free
llama_model_load_from_file_impl: using device CUDA1 (NVIDIA GeForce RTX 3060) - 11822 MiB free
llama_model_loader: loaded meta data with 30 key-value pairs and 164 tensors from ERNIE-4.5-0.3B-f16.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = ernie4_5
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = ERNIE 4.5 0.3B PT
llama_model_loader: - kv   3:                           general.finetune str              = PT
llama_model_loader: - kv   4:                           general.basename str              = ERNIE-4.5
llama_model_loader: - kv   5:                         general.size_label str              = 0.3B
llama_model_loader: - kv   6:                            general.license str              = apache-2.0
llama_model_loader: - kv   7:                               general.tags arr[str,2]       = ["ERNIE4.5", "text-generation"]
llama_model_loader: - kv   8:                          general.languages arr[str,2]       = ["en", "zh"]
llama_model_loader: - kv   9:                       ernie4_5.block_count u32              = 18
llama_model_loader: - kv  10:                    ernie4_5.context_length u32              = 131072
llama_model_loader: - kv  11:                  ernie4_5.embedding_length u32              = 1024
llama_model_loader: - kv  12:               ernie4_5.feed_forward_length u32              = 3072
llama_model_loader: - kv  13:              ernie4_5.attention.head_count u32              = 16
llama_model_loader: - kv  14:           ernie4_5.attention.head_count_kv u32              = 2
llama_model_loader: - kv  15:                    ernie4_5.rope.freq_base f32              = 500000.000000
llama_model_loader: - kv  16:  ernie4_5.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  17:              ernie4_5.attention.key_length u32              = 128
llama_model_loader: - kv  18:            ernie4_5.attention.value_length u32              = 128
llama_model_loader: - kv  19:                          general.file_type u32              = 1
llama_model_loader: - kv  20:               general.quantization_version u32              = 2
llama_model_loader: - kv  21:                       tokenizer.ggml.model str              = llama
llama_model_loader: - kv  22:                         tokenizer.ggml.pre str              = default
llama_model_loader: - kv  23:                      tokenizer.ggml.tokens arr[str,103424]  = ["<unk>", "<s>", "</s>", "0", "1", "2...
llama_model_loader: - kv  24:                      tokenizer.ggml.scores arr[f32,103424]  = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv  25:                  tokenizer.ggml.token_type arr[i32,103424]  = [2, 3, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  26:                tokenizer.ggml.bos_token_id u32              = 1
llama_model_loader: - kv  27:                tokenizer.ggml.eos_token_id u32              = 2
llama_model_loader: - kv  28:            tokenizer.ggml.padding_token_id u32              = 0
llama_model_loader: - kv  29:                    tokenizer.chat_template str              = {%- if not add_generation_prompt is d...
llama_model_loader: - type  f32:   37 tensors
llama_model_loader: - type  f16:  127 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type   = F16
print_info: file size   = 688.14 MiB (16.00 BPW) 
load: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
load: special tokens cache size = 1012
load: token to piece cache size = 0.5907 MB
print_info: arch             = ernie4_5
print_info: vocab_only       = 0
print_info: n_ctx_train      = 131072
print_info: n_embd           = 1024
print_info: n_layer          = 18
print_info: n_head           = 16
print_info: n_head_kv        = 2
print_info: n_rot            = 128
print_info: n_swa            = 0
print_info: is_swa_any       = 0
print_info: n_embd_head_k    = 128
print_info: n_embd_head_v    = 128
print_info: n_gqa            = 8
print_info: n_embd_k_gqa     = 256
print_info: n_embd_v_gqa     = 256
print_info: f_norm_eps       = 0.0e+00
print_info: f_norm_rms_eps   = 1.0e-05
print_info: f_clamp_kqv      = 0.0e+00
print_info: f_max_alibi_bias = 0.0e+00
print_info: f_logit_scale    = 0.0e+00
print_info: f_attn_scale     = 0.0e+00
print_info: n_ff             = 3072
print_info: n_expert         = 0
print_info: n_expert_used    = 0
print_info: causal attn      = 1
print_info: pooling type     = 0
print_info: rope type        = 0
print_info: rope scaling     = linear
print_info: freq_base_train  = 500000.0
print_info: freq_scale_train = 1
print_info: n_ctx_orig_yarn  = 131072
print_info: rope_finetuned   = unknown
print_info: model type       = 0.3B
print_info: model params     = 360.75 M
print_info: general.name     = ERNIE 4.5 0.3B PT
print_info: vocab type       = SPM
print_info: n_vocab          = 103424
print_info: n_merges         = 0
print_info: BOS token        = 1 '<s>'
print_info: EOS token        = 2 '</s>'
print_info: UNK token        = 0 '<unk>'
print_info: PAD token        = 0 '<unk>'
print_info: LF token         = 23 '<0x0A>'
print_info: EOG token        = 2 '</s>'
print_info: max token length = 48
load_tensors: loading model tensors, this can take a while... (mmap = true)
llama_model_load: error loading model: missing tensor '__missing__'
llama_model_load_from_file_impl: failed to load model
common_init_from_params: failed to load model 'ERNIE-4.5-0.3B-f16.gguf'

It seems to complain about missing tensors, which doesn't happen on master.

CISC · 2025-07-14T19:28:42Z

I'm testing this branch, while testing speculative decoding, it seems it caused a regression loading the dense 300M model.

Yep, it's broken for all dense models right now, will suggest a fix. :)

src/llama-model.cpp

pwilkin · 2025-07-14T19:50:27Z

@CISC Thanks :) yeah, would love to, I'm quantizing the small one with the qwen imatrix calibration data from Bartowski, but I don't have a machine to fit the large one.

pwilkin · 2025-07-14T20:09:35Z

Also, would be probably a good idea to incorporate the vision models somehow, but I'm not sure I'll be able to handle that one myself :)

Mushoz · 2025-07-14T20:17:10Z

My apologies for this slightly offtopic post, but with the introduction of Ernie 4.5 a new quantization algorithm was also introduced with supposedly SOTA performance at 2 bit. Is that something that will also be incorporated into llama.cpp?

pwilkin · 2025-07-14T22:52:38Z

Dropping this in here for future reference (this is the only reference implementation of the VL part so far, from what I can tell):

https://github.com/PaddlePaddle/FastDeploy/tree/develop/fastdeploy/model_executor/models/ernie4_5_vl

ThiloteE · 2025-07-15T21:08:20Z

I noticed (because I wanted to try this branch), you are trying to merge from "master" of your fork into "master" of ggml-org/llama.cpp. Is this accepted practice or is creating a separate branch a requirement for merging into llama.cpp?

CISC · 2025-07-15T21:33:16Z

I noticed (because I wanted to try this branch), you are trying to merge from "master" of your fork into "master" of ggml-org/llama.cpp. Is this accepted practice or is creating a separate branch a requirement for merging into llama.cpp?

It's acceptable, but not recommended.

pwilkin · 2025-07-15T21:57:26Z

Yeah, it's generally not a great idea because if there are conflicts and you have to merge upstream changes then you have no master branch locally to easily pull them to, I just realized too late that I forgot to make a branch :>

ThiloteE · 2025-07-15T22:17:26Z

I did try https://huggingface.co/ilintar/ERNIE-4.5-21B-A3B-PT-gguf/blob/main/baidu-ERNIE-4.5-21B-A3B-PT-iq3_M.gguf on my rtx 3060 12GB with cuda 12.9.

Command: llama-server --port 8080 --jinja -fa -c 8192 -ngl 29 -t 6

something is wrong with beginning of sentence (bos) or end of sentence (eos).

but not always

For some reason the default tokenizer-config.json holds a jinja template that sets the csl token. I suppose at Baidu, they are having an app or downstream application that can make use of that somehow, but maybe for llama.cpp we can add a default template that works out of the box and filters those out. I would need to do some experiments to fix this and I don't have time for that in the coming days, unfortunately. Over and out for now.

CISC · 2025-07-16T07:27:06Z

@ThiloteE It looks like this is an error in the model config, they have not put the cls/sep tokens in the added_tokens mapping, that's probably why they are hardcoded in the chat template, however since they then are not marked as special tokens they are printed as-is and probably tokenized incorrectly.

There's not much we can do here though, this needs to be fixed by Baidu.

The cls/sep tokens will also normally only be used for WPM tokenization, so it's quite possible that Ernie is broken in general.

a31413510 · 2025-07-17T05:59:09Z

@ThiloteE It looks like this is an error in the model config, they have not put the cls/sep tokens in the added_tokens mapping, that's probably why they are hardcoded in the chat template, however since they then are not marked as special tokens they are printed as-is and probably tokenized incorrectly.

There's not much we can do here though, this needs to be fixed by Baidu.

The cls/sep tokens will also normally only be used for WPM tokenization, so it's quite possible that Ernie is broken in general.

What problems might this cause？

CISC · 2025-07-17T06:27:37Z

What problems might this cause？

Incorrect tokenization and incorrect BOS/CLS and/or EOS/SEP will cause the model to respond differently, quite often badly, to prompts.

Ernie uses SPM tokenization, which means it will add a BOS (<s>) token by default but no EOS (</s>), if the chat template is to be believed this is incorrect and it should instead add a CLS (<|begin_of_sentence|>) token and separate conversation fragments with SEP (<|end_of_sentence|>).

In effect this means that you have to use --jinja and chat completion (that way no BOS is added) with Ernie models (this was really the case already as the model was added before launch without built-in chat template support), but depending on how CLS/SEP is tokenized (I haven't checked) and whether the model was trained on that same tokenization or not, its responses might not be very good.

nicoboss · 2025-07-17T21:21:38Z

Do you need help testing the 300B model?

@pwilkin I just tested the 300B model on latest commit. It unfortunately fails the load due to missing tensor 'blk.3.ffn_gate_shexp.weight'. Do you have any idea how to fix this? This error does occur before llama.cpp even loads the model into memory so it should be no problem to reproduce it on your side even if you don't have the resources necessary to actually load it. If there is anything I can do to help you debug this please let me know. Here the full log:

root@AI:/apool/llama.cpp/build/bin# ./llama-cli -m /bpool/ERNIE-4.5-300B-A47B-PT.gguf -ngl 0 -c 7000                                                                                 ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 2 CUDA devices:
  Device 0: NVIDIA GeForce RTX 4090, compute capability 8.9, VMM: yes
  Device 1: NVIDIA GeForce RTX 4090, compute capability 8.9, VMM: yes
build: 5937 (075ffdcd) with cc (Debian 12.2.0-14+deb12u1) 12.2.0 for x86_64-linux-gnu
main: llama backend init
main: load the model and apply lora adapter, if any
llama_model_load_from_file_impl: using device CUDA0 (NVIDIA GeForce RTX 4090) - 23686 MiB free
llama_model_load_from_file_impl: using device CUDA1 (NVIDIA GeForce RTX 4090) - 23689 MiB free
llama_model_loader: loaded meta data with 34 key-value pairs and 591 tensors from /bpool/ERNIE-4.5-300B-A47B-PT.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = ernie4_5-moe
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = ERNIE 4.5 300B A47B PT
llama_model_loader: - kv   3:                           general.finetune str              = PT
llama_model_loader: - kv   4:                           general.basename str              = ERNIE-4.5
llama_model_loader: - kv   5:                         general.size_label str              = 300B-A47B
llama_model_loader: - kv   6:                            general.license str              = apache-2.0
llama_model_loader: - kv   7:                               general.tags arr[str,2]       = ["ERNIE4.5", "text-generation"]
llama_model_loader: - kv   8:                          general.languages arr[str,2]       = ["en", "zh"]
llama_model_loader: - kv   9:                   ernie4_5-moe.block_count u32              = 54
llama_model_loader: - kv  10:                ernie4_5-moe.context_length u32              = 131072
llama_model_loader: - kv  11:              ernie4_5-moe.embedding_length u32              = 8192
llama_model_loader: - kv  12:           ernie4_5-moe.feed_forward_length u32              = 28672
llama_model_loader: - kv  13:          ernie4_5-moe.attention.head_count u32              = 64
llama_model_loader: - kv  14:       ernie4_5-moe.attention.head_count_kv u32              = 8
llama_model_loader: - kv  15:                ernie4_5-moe.rope.freq_base f32              = 500000.000000
llama_model_loader: - kv  16: ernie4_5-moe.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  17:                          general.file_type u32              = 1
llama_model_loader: - kv  18:                  ernie4_5-moe.expert_count u32              = 64
llama_model_loader: - kv  19:             ernie4_5-moe.expert_used_count u32              = 8
llama_model_loader: - kv  20:     ernie4_5-moe.interleave_moe_layer_step u32              = 1
llama_model_loader: - kv  21:     ernie4_5-moe.leading_dense_block_count u32              = 3
llama_model_loader: - kv  22:    ernie4_5-moe.expert_feed_forward_length u32              = 3584
llama_model_loader: - kv  23: ernie4_5-moe.expert_shared_feed_forward_length u32              = 3584
llama_model_loader: - kv  24:               general.quantization_version u32              = 2
llama_model_loader: - kv  25:                       tokenizer.ggml.model str              = llama
llama_model_loader: - kv  26:                         tokenizer.ggml.pre str              = default
llama_model_loader: - kv  27:                      tokenizer.ggml.tokens arr[str,103424]  = ["<unk>", "<s>", "</s>", "0", "1", "2...
llama_model_loader: - kv  28:                      tokenizer.ggml.scores arr[f32,103424]  = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv  29:                  tokenizer.ggml.token_type arr[i32,103424]  = [2, 3, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  30:                tokenizer.ggml.bos_token_id u32              = 1
llama_model_loader: - kv  31:                tokenizer.ggml.eos_token_id u32              = 2
llama_model_loader: - kv  32:            tokenizer.ggml.padding_token_id u32              = 0
llama_model_loader: - kv  33:                    tokenizer.chat_template str              = {%- if not add_generation_prompt is d...
llama_model_loader: - type  f32:  211 tensors
llama_model_loader: - type  f16:  380 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type   = F16
print_info: file size   = 557.88 GiB (16.00 BPW)
load: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
load: special tokens cache size = 1012
load: token to piece cache size = 0.5907 MB
print_info: arch             = ernie4_5-moe
print_info: vocab_only       = 0
print_info: n_ctx_train      = 131072
print_info: n_embd           = 8192
print_info: n_layer          = 54
print_info: n_head           = 64
print_info: n_head_kv        = 8
print_info: n_rot            = 128
print_info: n_swa            = 0
print_info: is_swa_any       = 0
print_info: n_embd_head_k    = 128
print_info: n_embd_head_v    = 128
print_info: n_gqa            = 8
print_info: n_embd_k_gqa     = 1024
print_info: n_embd_v_gqa     = 1024
print_info: f_norm_eps       = 0.0e+00
print_info: f_norm_rms_eps   = 1.0e-05
print_info: f_clamp_kqv      = 0.0e+00
print_info: f_max_alibi_bias = 0.0e+00
print_info: f_logit_scale    = 0.0e+00
print_info: f_attn_scale     = 0.0e+00
print_info: n_ff             = 28672
print_info: n_expert         = 64
print_info: n_expert_used    = 8
print_info: causal attn      = 1
print_info: pooling type     = 0
print_info: rope type        = 0
print_info: rope scaling     = linear
print_info: freq_base_train  = 500000.0
print_info: freq_scale_train = 1
print_info: n_ctx_orig_yarn  = 131072
print_info: rope_finetuned   = unknown
print_info: model type       = 300B.A47B
print_info: model params     = 299.48 B
print_info: general.name     = ERNIE 4.5 300B A47B PT
print_info: vocab type       = SPM
print_info: n_vocab          = 103424
print_info: n_merges         = 0
print_info: BOS token        = 1 '<s>'
print_info: EOS token        = 2 '</s>'
print_info: UNK token        = 0 '<unk>'
print_info: PAD token        = 0 '<unk>'
print_info: LF token         = 23 '<0x0A>'
print_info: EOG token        = 2 '</s>'
print_info: max token length = 48
load_tensors: loading model tensors, this can take a while... (mmap = true)
llama_model_load: error loading model: missing tensor 'blk.3.ffn_gate_shexp.weight'
llama_model_load_from_file_impl: failed to load model
common_init_from_params: failed to load model '/bpool/ERNIE-4.5-300B-A47B-PT.gguf'
main: error: unable to load model

In case it helps here the convert_hf_to_gguf.py output:

root@AI:/apool/llama.cpp# venv/bin/python convert_hf_to_gguf.py --outfile /bpool/ERNIE-4.5-300B-A47B-PT.gguf /bpool/ERNIE-4.5-300B-A47B-PT
INFO:hf-to-gguf:Loading model: ERNIE-4.5-300B-A47B-PT
WARNING:hf-to-gguf:Failed to load model config from /bpool/ERNIE-4.5-300B-A47B-PT: Loading /bpool/ERNIE-4.5-300B-A47B-PT requires you to execute the configuration file in that repo on your local machine. Make sure you have read the code there to avoid malicious use, then set the option `trust_remote_code=True` to remove this error.
WARNING:hf-to-gguf:Trying to load config.json instead
INFO:hf-to-gguf:Model architecture: Ernie4_5_MoeForCausalLM
WARNING:hf-to-gguf:Failed to load model config from /bpool/ERNIE-4.5-300B-A47B-PT: Loading /bpool/ERNIE-4.5-300B-A47B-PT requires you to execute the configuration file in that repo on your local machine. Make sure you have read the code there to avoid malicious use, then set the option `trust_remote_code=True` to remove this error.
WARNING:hf-to-gguf:Trying to load config.json instead
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-gguf:Exporting model...
INFO:hf-to-gguf:gguf: loading model weight map from 'model.safetensors.index.json'
INFO:hf-to-gguf:gguf: loading model part 'model-00001-of-00123.safetensors'
INFO:hf-to-gguf:token_embd.weight,            torch.bfloat16 --> F16, shape = {8192, 103424}
INFO:hf-to-gguf:blk.0.attn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.0.ffn_down.weight,        torch.bfloat16 --> F16, shape = {28672, 8192}
INFO:hf-to-gguf:blk.0.ffn_gate.weight,        torch.bfloat16 --> F16, shape = {8192, 28672}
INFO:hf-to-gguf:blk.0.ffn_up.weight,          torch.bfloat16 --> F16, shape = {8192, 28672}
INFO:hf-to-gguf:blk.0.ffn_norm.weight,        torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.0.attn_k.weight,          torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.0.attn_output.weight,     torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.0.attn_q.weight,          torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.0.attn_v.weight,          torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.1.attn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.1.ffn_down.weight,        torch.bfloat16 --> F16, shape = {28672, 8192}
INFO:hf-to-gguf:blk.1.ffn_gate.weight,        torch.bfloat16 --> F16, shape = {8192, 28672}
INFO:hf-to-gguf:blk.1.ffn_up.weight,          torch.bfloat16 --> F16, shape = {8192, 28672}
INFO:hf-to-gguf:blk.1.ffn_norm.weight,        torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.1.attn_output.weight,     torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00002-of-00123.safetensors'
INFO:hf-to-gguf:blk.1.attn_k.weight,          torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.1.attn_q.weight,          torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.1.attn_v.weight,          torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.2.attn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.2.ffn_down.weight,        torch.bfloat16 --> F16, shape = {28672, 8192}
INFO:hf-to-gguf:blk.2.ffn_gate.weight,        torch.bfloat16 --> F16, shape = {8192, 28672}
INFO:hf-to-gguf:blk.2.ffn_up.weight,          torch.bfloat16 --> F16, shape = {8192, 28672}
INFO:hf-to-gguf:blk.2.ffn_norm.weight,        torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.2.attn_k.weight,          torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.2.attn_output.weight,     torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.2.attn_q.weight,          torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.2.attn_v.weight,          torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.3.attn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00003-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00004-of-00123.safetensors'
INFO:hf-to-gguf:blk.3.ffn_gate_exps.weight,   torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.3.ffn_up_exps.weight,     torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.3.ffn_down_exps.weight,   torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.3.ffn_gate_inp.weight,    torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.3.exp_probs_b.bias,       torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.3.ffn_norm.weight,        torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.3.attn_k.weight,          torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.3.attn_output.weight,     torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.3.attn_q.weight,          torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.3.attn_v.weight,          torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.4.attn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00005-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00006-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00007-of-00123.safetensors'
INFO:hf-to-gguf:blk.4.ffn_gate_exps.weight,   torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.4.ffn_up_exps.weight,     torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.4.ffn_down_exps.weight,   torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.4.ffn_gate_inp.weight,    torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.4.exp_probs_b.bias,       torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.4.ffn_norm.weight,        torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.4.attn_k.weight,          torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.4.attn_output.weight,     torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.4.attn_q.weight,          torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.4.attn_v.weight,          torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.5.attn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00008-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00009-of-00123.safetensors'
INFO:hf-to-gguf:blk.5.ffn_gate_exps.weight,   torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.5.ffn_up_exps.weight,     torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.5.ffn_down_exps.weight,   torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.5.ffn_gate_inp.weight,    torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.5.exp_probs_b.bias,       torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.5.ffn_norm.weight,        torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.5.attn_k.weight,          torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.5.attn_output.weight,     torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.5.attn_q.weight,          torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.5.attn_v.weight,          torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.6.attn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00010-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00011-of-00123.safetensors'
INFO:hf-to-gguf:blk.6.ffn_gate_exps.weight,   torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.6.ffn_up_exps.weight,     torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.6.ffn_down_exps.weight,   torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.6.ffn_gate_inp.weight,    torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.6.exp_probs_b.bias,       torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.6.ffn_norm.weight,        torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.6.attn_k.weight,          torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.6.attn_output.weight,     torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.6.attn_q.weight,          torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.6.attn_v.weight,          torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.7.attn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00012-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00013-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00014-of-00123.safetensors'
INFO:hf-to-gguf:blk.7.ffn_gate_exps.weight,   torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.7.ffn_up_exps.weight,     torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.7.ffn_down_exps.weight,   torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.7.ffn_gate_inp.weight,    torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.7.exp_probs_b.bias,       torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.7.ffn_norm.weight,        torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.7.attn_k.weight,          torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.7.attn_output.weight,     torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.7.attn_q.weight,          torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.7.attn_v.weight,          torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.8.attn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00015-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00016-of-00123.safetensors'
INFO:hf-to-gguf:blk.8.ffn_gate_exps.weight,   torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.8.ffn_up_exps.weight,     torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.8.ffn_down_exps.weight,   torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.8.ffn_gate_inp.weight,    torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.8.exp_probs_b.bias,       torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.8.ffn_norm.weight,        torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.8.attn_k.weight,          torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.8.attn_output.weight,     torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.8.attn_q.weight,          torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.8.attn_v.weight,          torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.9.attn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00017-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00018-of-00123.safetensors'
INFO:hf-to-gguf:blk.10.attn_norm.weight,      torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.9.ffn_gate_exps.weight,   torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.9.ffn_up_exps.weight,     torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.9.ffn_down_exps.weight,   torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.9.ffn_gate_inp.weight,    torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.9.exp_probs_b.bias,       torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.9.ffn_norm.weight,        torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.9.attn_k.weight,          torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.9.attn_output.weight,     torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.9.attn_q.weight,          torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.9.attn_v.weight,          torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:gguf: loading model part 'model-00019-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00020-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00021-of-00123.safetensors'
INFO:hf-to-gguf:blk.10.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.10.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.10.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.10.ffn_gate_inp.weight,   torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.10.exp_probs_b.bias,      torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.10.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.10.attn_k.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.10.attn_output.weight,    torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.10.attn_q.weight,         torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.10.attn_v.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.11.attn_norm.weight,      torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00022-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00023-of-00123.safetensors'
INFO:hf-to-gguf:blk.11.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.11.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.11.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.11.ffn_gate_inp.weight,   torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.11.exp_probs_b.bias,      torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.11.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.11.attn_k.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.11.attn_output.weight,    torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.11.attn_q.weight,         torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.11.attn_v.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.12.attn_norm.weight,      torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00024-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00025-of-00123.safetensors'
INFO:hf-to-gguf:blk.12.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.12.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.12.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.12.ffn_gate_inp.weight,   torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.12.exp_probs_b.bias,      torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.12.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.12.attn_k.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.12.attn_output.weight,    torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.12.attn_q.weight,         torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.12.attn_v.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.13.attn_norm.weight,      torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00026-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00027-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00028-of-00123.safetensors'
INFO:hf-to-gguf:blk.13.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.13.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.13.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.13.ffn_gate_inp.weight,   torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.13.exp_probs_b.bias,      torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.13.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.13.attn_k.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.13.attn_output.weight,    torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.13.attn_q.weight,         torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.13.attn_v.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.14.attn_norm.weight,      torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00029-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00030-of-00123.safetensors'
INFO:hf-to-gguf:blk.14.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.14.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.14.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.14.ffn_gate_inp.weight,   torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.14.exp_probs_b.bias,      torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.14.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.14.attn_k.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.14.attn_output.weight,    torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.14.attn_q.weight,         torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.14.attn_v.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.15.attn_norm.weight,      torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00031-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00032-of-00123.safetensors'
INFO:hf-to-gguf:blk.15.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.15.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.15.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.15.ffn_gate_inp.weight,   torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.15.exp_probs_b.bias,      torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.15.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.15.attn_output.weight,    torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00033-of-00123.safetensors'
INFO:hf-to-gguf:blk.15.attn_k.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.15.attn_q.weight,         torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.15.attn_v.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.16.attn_norm.weight,      torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00034-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00035-of-00123.safetensors'
INFO:hf-to-gguf:blk.16.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.16.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.16.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.16.ffn_gate_inp.weight,   torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.16.exp_probs_b.bias,      torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.16.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.16.attn_k.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.16.attn_output.weight,    torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.16.attn_q.weight,         torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.16.attn_v.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.17.attn_norm.weight,      torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00036-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00037-of-00123.safetensors'
INFO:hf-to-gguf:blk.17.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.17.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.17.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.17.ffn_gate_inp.weight,   torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.17.exp_probs_b.bias,      torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.17.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.17.attn_k.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.17.attn_output.weight,    torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.17.attn_q.weight,         torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.17.attn_v.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.18.attn_norm.weight,      torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00038-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00039-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00040-of-00123.safetensors'
INFO:hf-to-gguf:blk.18.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.18.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.18.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.18.ffn_gate_inp.weight,   torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.18.exp_probs_b.bias,      torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.18.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.18.attn_k.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.18.attn_output.weight,    torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.18.attn_q.weight,         torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.18.attn_v.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.19.attn_norm.weight,      torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00041-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00042-of-00123.safetensors'
INFO:hf-to-gguf:blk.19.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.19.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.19.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.19.ffn_gate_inp.weight,   torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.19.exp_probs_b.bias,      torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.19.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.19.attn_k.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.19.attn_output.weight,    torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.19.attn_q.weight,         torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.19.attn_v.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.20.attn_norm.weight,      torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00043-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00044-of-00123.safetensors'
INFO:hf-to-gguf:blk.20.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.20.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.20.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.20.ffn_gate_inp.weight,   torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.20.exp_probs_b.bias,      torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.20.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.20.attn_k.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.20.attn_output.weight,    torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.20.attn_q.weight,         torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.20.attn_v.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.21.attn_norm.weight,      torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00045-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00046-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00047-of-00123.safetensors'
INFO:hf-to-gguf:blk.21.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.21.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.21.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.21.ffn_gate_inp.weight,   torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.21.exp_probs_b.bias,      torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.21.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.21.attn_k.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.21.attn_output.weight,    torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.21.attn_q.weight,         torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.21.attn_v.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.22.attn_norm.weight,      torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00048-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00049-of-00123.safetensors'
INFO:hf-to-gguf:blk.22.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.22.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.22.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.22.ffn_gate_inp.weight,   torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.22.exp_probs_b.bias,      torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.22.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.22.attn_k.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.22.attn_output.weight,    torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.22.attn_q.weight,         torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.22.attn_v.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.23.attn_norm.weight,      torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00050-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00051-of-00123.safetensors'
INFO:hf-to-gguf:blk.23.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.23.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.23.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.23.ffn_gate_inp.weight,   torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.23.exp_probs_b.bias,      torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.23.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.23.attn_k.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.23.attn_output.weight,    torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.23.attn_q.weight,         torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.23.attn_v.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.24.attn_norm.weight,      torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00052-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00053-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00054-of-00123.safetensors'
INFO:hf-to-gguf:blk.24.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.24.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.24.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.24.ffn_gate_inp.weight,   torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.24.exp_probs_b.bias,      torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.24.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.24.attn_k.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.24.attn_output.weight,    torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.24.attn_q.weight,         torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.24.attn_v.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.25.attn_norm.weight,      torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00055-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00056-of-00123.safetensors'
INFO:hf-to-gguf:blk.25.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.25.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.25.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.25.ffn_gate_inp.weight,   torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.25.exp_probs_b.bias,      torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.25.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.25.attn_k.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.25.attn_output.weight,    torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.25.attn_q.weight,         torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.25.attn_v.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.26.attn_norm.weight,      torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00057-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00058-of-00123.safetensors'
INFO:hf-to-gguf:blk.26.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.26.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.26.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.26.ffn_gate_inp.weight,   torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.26.exp_probs_b.bias,      torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.26.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.26.attn_k.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.26.attn_output.weight,    torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.26.attn_q.weight,         torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.26.attn_v.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.27.attn_norm.weight,      torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00059-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00060-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00061-of-00123.safetensors'
INFO:hf-to-gguf:blk.27.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.27.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.27.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.27.ffn_gate_inp.weight,   torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.27.exp_probs_b.bias,      torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.27.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.27.attn_k.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.27.attn_output.weight,    torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.27.attn_q.weight,         torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.27.attn_v.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.28.attn_norm.weight,      torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00062-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00063-of-00123.safetensors'
INFO:hf-to-gguf:blk.28.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.28.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.28.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.28.ffn_gate_inp.weight,   torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.28.exp_probs_b.bias,      torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.28.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.28.attn_k.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.28.attn_output.weight,    torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.28.attn_q.weight,         torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.28.attn_v.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.29.attn_norm.weight,      torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00064-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00065-of-00123.safetensors'
INFO:hf-to-gguf:blk.29.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.29.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.29.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.29.ffn_gate_inp.weight,   torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.29.exp_probs_b.bias,      torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.29.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.29.attn_output.weight,    torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00066-of-00123.safetensors'
INFO:hf-to-gguf:blk.29.attn_k.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.29.attn_q.weight,         torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.29.attn_v.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.30.attn_norm.weight,      torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00067-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00068-of-00123.safetensors'
INFO:hf-to-gguf:blk.30.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.30.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.30.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.30.ffn_gate_inp.weight,   torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.30.exp_probs_b.bias,      torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.30.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.30.attn_k.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.30.attn_output.weight,    torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.30.attn_q.weight,         torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.30.attn_v.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.31.attn_norm.weight,      torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00069-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00070-of-00123.safetensors'
INFO:hf-to-gguf:blk.31.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.31.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.31.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.31.ffn_gate_inp.weight,   torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.31.exp_probs_b.bias,      torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.31.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.31.attn_k.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.31.attn_output.weight,    torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.31.attn_q.weight,         torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.31.attn_v.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.32.attn_norm.weight,      torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00071-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00072-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00073-of-00123.safetensors'
INFO:hf-to-gguf:blk.32.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.32.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.32.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.32.ffn_gate_inp.weight,   torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.32.exp_probs_b.bias,      torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.32.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.32.attn_k.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.32.attn_output.weight,    torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.32.attn_q.weight,         torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.32.attn_v.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.33.attn_norm.weight,      torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00074-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00075-of-00123.safetensors'
INFO:hf-to-gguf:blk.33.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.33.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.33.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.33.ffn_gate_inp.weight,   torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.33.exp_probs_b.bias,      torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.33.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.33.attn_k.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.33.attn_output.weight,    torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.33.attn_q.weight,         torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.33.attn_v.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.34.attn_norm.weight,      torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00076-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00077-of-00123.safetensors'
INFO:hf-to-gguf:blk.34.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.34.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.34.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.34.ffn_gate_inp.weight,   torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.34.exp_probs_b.bias,      torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.34.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.34.attn_k.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.34.attn_output.weight,    torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.34.attn_q.weight,         torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.34.attn_v.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.35.attn_norm.weight,      torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00078-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00079-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00080-of-00123.safetensors'
INFO:hf-to-gguf:blk.35.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.35.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.35.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.35.ffn_gate_inp.weight,   torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.35.exp_probs_b.bias,      torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.35.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.35.attn_k.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.35.attn_output.weight,    torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.35.attn_q.weight,         torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.35.attn_v.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.36.attn_norm.weight,      torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00081-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00082-of-00123.safetensors'
INFO:hf-to-gguf:blk.36.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.36.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.36.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.36.ffn_gate_inp.weight,   torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.36.exp_probs_b.bias,      torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.36.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.36.attn_k.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.36.attn_output.weight,    torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.36.attn_q.weight,         torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.36.attn_v.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.37.attn_norm.weight,      torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00083-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00084-of-00123.safetensors'
INFO:hf-to-gguf:blk.37.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.37.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.37.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.37.ffn_gate_inp.weight,   torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.37.exp_probs_b.bias,      torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.37.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.37.attn_k.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.37.attn_output.weight,    torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.37.attn_q.weight,         torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.37.attn_v.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.38.attn_norm.weight,      torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00085-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00086-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00087-of-00123.safetensors'
INFO:hf-to-gguf:blk.38.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.38.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.38.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.38.ffn_gate_inp.weight,   torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.38.exp_probs_b.bias,      torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.38.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.38.attn_k.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.38.attn_output.weight,    torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.38.attn_q.weight,         torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.38.attn_v.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.39.attn_norm.weight,      torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00088-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00089-of-00123.safetensors'
INFO:hf-to-gguf:blk.39.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.39.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.39.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.39.ffn_gate_inp.weight,   torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.39.exp_probs_b.bias,      torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.39.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.39.attn_k.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.39.attn_output.weight,    torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.39.attn_q.weight,         torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.39.attn_v.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.40.attn_norm.weight,      torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00090-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00091-of-00123.safetensors'
INFO:hf-to-gguf:blk.40.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.40.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.40.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.40.ffn_gate_inp.weight,   torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.40.exp_probs_b.bias,      torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.40.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.40.attn_k.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.40.attn_output.weight,    torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.40.attn_q.weight,         torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.40.attn_v.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.41.attn_norm.weight,      torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00092-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00093-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00094-of-00123.safetensors'
INFO:hf-to-gguf:blk.41.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.41.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.41.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.41.ffn_gate_inp.weight,   torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.41.exp_probs_b.bias,      torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.41.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.41.attn_k.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.41.attn_output.weight,    torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.41.attn_q.weight,         torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.41.attn_v.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.42.attn_norm.weight,      torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00095-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00096-of-00123.safetensors'
INFO:hf-to-gguf:blk.42.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.42.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.42.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.42.ffn_gate_inp.weight,   torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.42.exp_probs_b.bias,      torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.42.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.42.attn_k.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.42.attn_output.weight,    torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.42.attn_q.weight,         torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.42.attn_v.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.43.attn_norm.weight,      torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00097-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00098-of-00123.safetensors'
INFO:hf-to-gguf:blk.43.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.43.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.43.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.43.ffn_gate_inp.weight,   torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.43.exp_probs_b.bias,      torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.43.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.43.attn_output.weight,    torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00099-of-00123.safetensors'
INFO:hf-to-gguf:blk.43.attn_k.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.43.attn_q.weight,         torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.43.attn_v.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.44.attn_norm.weight,      torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00100-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00101-of-00123.safetensors'
INFO:hf-to-gguf:blk.44.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.44.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.44.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.44.ffn_gate_inp.weight,   torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.44.exp_probs_b.bias,      torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.44.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.44.attn_k.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.44.attn_output.weight,    torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.44.attn_q.weight,         torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.44.attn_v.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.45.attn_norm.weight,      torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00102-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00103-of-00123.safetensors'
INFO:hf-to-gguf:blk.45.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.45.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.45.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.45.ffn_gate_inp.weight,   torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.45.exp_probs_b.bias,      torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.45.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.45.attn_k.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.45.attn_output.weight,    torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.45.attn_q.weight,         torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.45.attn_v.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.46.attn_norm.weight,      torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00104-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00105-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00106-of-00123.safetensors'
INFO:hf-to-gguf:blk.46.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.46.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.46.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.46.ffn_gate_inp.weight,   torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.46.exp_probs_b.bias,      torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.46.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.46.attn_k.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.46.attn_output.weight,    torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.46.attn_q.weight,         torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.46.attn_v.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.47.attn_norm.weight,      torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00107-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00108-of-00123.safetensors'
INFO:hf-to-gguf:blk.47.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.47.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.47.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.47.ffn_gate_inp.weight,   torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.47.exp_probs_b.bias,      torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.47.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.47.attn_k.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.47.attn_output.weight,    torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.47.attn_q.weight,         torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.47.attn_v.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.48.attn_norm.weight,      torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00109-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00110-of-00123.safetensors'
INFO:hf-to-gguf:blk.48.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.48.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.48.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.48.ffn_gate_inp.weight,   torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.48.exp_probs_b.bias,      torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.48.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.48.attn_k.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.48.attn_output.weight,    torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.48.attn_q.weight,         torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.48.attn_v.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.49.attn_norm.weight,      torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00111-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00112-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00113-of-00123.safetensors'
INFO:hf-to-gguf:blk.49.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.49.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.49.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.49.ffn_gate_inp.weight,   torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.49.exp_probs_b.bias,      torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.49.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.49.attn_k.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.49.attn_output.weight,    torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.49.attn_q.weight,         torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.49.attn_v.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.50.attn_norm.weight,      torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00114-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00115-of-00123.safetensors'
INFO:hf-to-gguf:blk.50.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.50.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.50.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.50.ffn_gate_inp.weight,   torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.50.exp_probs_b.bias,      torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.50.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.50.attn_k.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.50.attn_output.weight,    torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.50.attn_q.weight,         torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.50.attn_v.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.51.attn_norm.weight,      torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00116-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00117-of-00123.safetensors'
INFO:hf-to-gguf:blk.51.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.51.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.51.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.51.ffn_gate_inp.weight,   torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.51.exp_probs_b.bias,      torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.51.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.51.attn_k.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.51.attn_output.weight,    torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.51.attn_q.weight,         torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.51.attn_v.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.52.attn_norm.weight,      torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00118-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00119-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00120-of-00123.safetensors'
INFO:hf-to-gguf:blk.52.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.52.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.52.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.52.ffn_gate_inp.weight,   torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.52.exp_probs_b.bias,      torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.52.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.52.attn_k.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.52.attn_output.weight,    torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.52.attn_q.weight,         torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.52.attn_v.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.53.attn_norm.weight,      torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00121-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00122-of-00123.safetensors'
INFO:hf-to-gguf:blk.53.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.53.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.53.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.53.ffn_gate_inp.weight,   torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.53.exp_probs_b.bias,      torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.53.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.53.attn_k.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.53.attn_output.weight,    torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.53.attn_q.weight,         torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.53.attn_v.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:output_norm.weight,           torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00123-of-00123.safetensors'
INFO:hf-to-gguf:output.weight,                torch.bfloat16 --> F16, shape = {8192, 103424}
INFO:hf-to-gguf:Set meta model
INFO:hf-to-gguf:Set model parameters
INFO:hf-to-gguf:gguf: context length = 131072
INFO:hf-to-gguf:gguf: embedding length = 8192
INFO:hf-to-gguf:gguf: feed forward length = 28672
INFO:hf-to-gguf:gguf: head count = 64
INFO:hf-to-gguf:gguf: key-value head count = 8
INFO:hf-to-gguf:gguf: rope theta = 500000
INFO:hf-to-gguf:gguf: rms norm epsilon = 1e-05
INFO:hf-to-gguf:gguf: file type = 1
WARNING:gguf.gguf_writer:Duplicated key name 'ernie4_5-moe.rope.freq_base', overwriting it with new value 500000 of type FLOAT32
INFO:hf-to-gguf:Set model quantization version
INFO:hf-to-gguf:Set model tokenizer
INFO:gguf.vocab:Setting special token type bos to 1
INFO:gguf.vocab:Setting special token type eos to 2
INFO:gguf.vocab:Setting special token type pad to 0
INFO:gguf.vocab:Setting chat_template to {%- if not add_generation_prompt is defined -%}
    {%- set add_generation_prompt = true -%}
{%- endif -%}
{%- if not cls_token is defined -%}
    {%- set cls_token = "<|begin_of_sentence|>" -%}
{%- endif -%}
{%- if not sep_token is defined -%}
    {%- set sep_token = "<|end_of_sentence|>" -%}
{%- endif -%}
{{- cls_token -}}
{%- for message in messages -%}
    {%- if message["role"] == "user" -%}
        {{- "User: " + message["content"] + "
" -}}
    {%- elif message["role"] == "assistant" -%}
        {{- "Assistant: " + message["content"] + sep_token -}}
    {%- elif message["role"] == "system" -%}
        {{- message["content"] + "
" -}}
    {%- endif -%}
{%- endfor -%}
{%- if add_generation_prompt -%}
    {{- "Assistant: " -}}
{%- endif -%}
INFO:gguf.gguf_writer:Writing the following files:
INFO:gguf.gguf_writer:/bpool/ERNIE-4.5-300B-A47B-PT.gguf: n_tensors = 591, total_size = 599.0G

CISC · 2025-07-17T21:25:39Z

@pwilkin I just tested the 300B model on latest commit. It unfortunately fails the load due to missing tensor 'blk.3.ffn_gate_shexp.weight'.

LOL, the timing! :D

CISC · 2025-07-17T21:36:57Z

@pwilkin The fix seems simple, just check moe_num_shared_experts before you add_expert_shared_feed_forward_length, make another PR. :)

CISC · 2025-07-17T21:39:01Z

Oh, and add_rope_freq_base can be removed.

CISC · 2025-07-17T22:14:53Z

@pwilkin Looking closer at it I think things are a little more broken, but we can address that when you make the follow up PR.

pwilkin · 2025-07-17T22:34:02Z

There's one more difference in the "big" MoE:

"moe_gate": "topk",

I guess this refers to:

uint32_t expert_gating_func = LLAMA_EXPERT_GATING_FUNC_TYPE_NONE?

CISC · 2025-07-17T22:47:50Z

There's one more difference in the "big" MoE:

"moe_gate": "topk",

I guess this refers to:

uint32_t expert_gating_func = LLAMA_EXPERT_GATING_FUNC_TYPE_NONE?

No, that would be the moe_gate_act (which defaults to softmax).

I think moe_gate is topk only in llama.cpp.
Edit: Yep:

llama.cpp/src/llama-graph.cpp

Lines 827 to 828 in 760b448

    
           // select experts 
        
           ggml_tensor * selected_experts = ggml_top_k(ctx0, selection_probs, n_expert_used); // [n_expert_used, n_tokens]

fernandaspets · 2025-07-18T01:10:52Z

Hi thanks for the amazing work! Q: Are there any 300B ggufs up on HF? :P

pwilkin · 2025-07-18T20:54:11Z

@CISC

I think moe_gate is topk only in llama.cpp.

Well, this got relevant pretty quickly.

I tried to get to work on the VL model. Actually, getting the projector converted wasn't that hard. But the normal MoE...

It turns out the VL model (the 28B one) uses something they call a "top2 gate":

https://huggingface.co/baidu/ERNIE-4.5-VL-28B-A3B-PT/blob/main/modeling_ernie_45t_vl.py

In the config, the moe_num_experts is a two-element array, not a list. And it actually means the experts for each layer are a two-element array as well, only flattened for convenience: after each 64 experts of [1 536, 2 560] dimensions come 64 experts of [512, 2 560] dimensions. Apparently, these are supposed to be multimodal (they spell it "multimodel", but I guess they actually meant "multimodal") experts.

But here's where my competence ends - I could of course create a new tensor type to store the "other" experts and even write some logic for storing the weight and weight_1 tensors together and then decoupling them on execution, but I have no clue how to implement this whole "top2 gate" algorithm.

Would love some help with this or some pointers at least (I don't even understand why there are two different feed forward lengths for the two different tensor types).

CISC · 2025-07-18T21:22:11Z

I think they actually mean multimodel, as in that's why you have 2 values, one for each model baked into the same tensor.

You can just ignore top2 gating for now, topk probably works fine, anyway, for future reference it is described in the GShard paper.

pwilkin · 2025-07-18T21:31:31Z

You mean just ignore the second layer of tensors? I guess that would just be the 21B-A3B model with a projector then 😄

CISC · 2025-07-18T21:56:09Z

No, I just meant top2 vs topk should not cause much issue.

I suspect you will have to read that GShard paper for more info on what's going on with the layers, but it looks like they are combining results from both somehow. Oh, and we have trailing dense layers! :)

pwilkin added 3 commits July 13, 2025 01:47

Add Ernie4.5 MoE

8501cb3

Fix Flake errors.

4a231eb

Properly encode/decode MoE layer step

056ab44

github-actions bot added the python python script changes label Jul 13, 2025

Correct tensor mappings (.weight)

07a5c76

pwilkin marked this pull request as draft July 13, 2025 10:44

pwilkin added 2 commits July 13, 2025 13:22

Pass and read n_ff_exp

bb23dd0

n_ff_shexp calculation and further minor changes

bd27e81

pwilkin force-pushed the master branch from 2ddadc4 to bd27e81 Compare July 13, 2025 16:30

pwilkin added 2 commits July 13, 2025 20:02

Rope fixes.

992d4f0

.gitignore fix

dde7748

pwilkin marked this pull request as ready for review July 13, 2025 18:03

Add unit32 cast for Linux builds

a387e36

CISC requested changes Jul 14, 2025

View reviewed changes

CISC reviewed Jul 14, 2025

View reviewed changes

src/llama-arch.cpp Outdated Show resolved Hide resolved

pwilkin and others added 2 commits July 14, 2025 14:08

Apply suggestions from code review

950b401

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

Further fixes from code review

7674862

Fix trailing whitespace

8d6ac42

pwilkin requested a review from CISC July 14, 2025 12:57

CISC requested changes Jul 14, 2025

View reviewed changes

src/llama-model.cpp Outdated Show resolved Hide resolved

src/llama-model.cpp Outdated Show resolved Hide resolved

Reenable missing experts error

3511437

pwilkin force-pushed the master branch from bd34667 to 3511437 Compare July 14, 2025 19:13

Code style from code review

542f36b

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

pwilkin requested a review from CISC July 14, 2025 19:16

CISC reviewed Jul 14, 2025

View reviewed changes

src/llama-model.cpp Outdated Show resolved Hide resolved

CISC requested a review from ggerganov July 14, 2025 19:39

pwilkin added 2 commits July 17, 2025 10:08

Merge branch 'ggml-org:master' into master

075ffdc

Merge branch 'ggml-org:master' into master

e9f96b2

CISC merged commit cb887f1 into ggml-org:master Jul 17, 2025
50 checks passed

pwilkin mentioned this pull request Jul 17, 2025

Fix Ernie4.5 MoE without shared experts #14746

Merged

Ph0rk0z mentioned this pull request Jul 17, 2025

Feature Request: ERNIE MoE Model Support ikawrakow/ik_llama.cpp#568

Open

xunjieliu mentioned this pull request Jul 18, 2025

Reddit News Daily 2025-07-18 xunjieliu/reddit-daily-news#124

Open

Model: Add support for Ernie 4.5 MoE #14658

Model: Add support for Ernie 4.5 MoE #14658

Conversation

pwilkin commented Jul 13, 2025

Uh oh!

pwilkin commented Jul 13, 2025

Uh oh!

pwilkin commented Jul 14, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

pwilkin commented Jul 14, 2025

Uh oh!

Uh oh!

Uh oh!

theo77186 commented Jul 14, 2025

Uh oh!

CISC commented Jul 14, 2025

Uh oh!

Uh oh!

pwilkin commented Jul 14, 2025

Uh oh!

pwilkin commented Jul 14, 2025

Uh oh!

Mushoz commented Jul 14, 2025

Uh oh!

pwilkin commented Jul 14, 2025

Uh oh!

ThiloteE commented Jul 15, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

CISC commented Jul 15, 2025

Uh oh!

pwilkin commented Jul 15, 2025

Uh oh!

ThiloteE commented Jul 15, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

CISC commented Jul 16, 2025

Uh oh!

a31413510 commented Jul 17, 2025

Uh oh!

CISC commented Jul 17, 2025

Uh oh!

Uh oh!

nicoboss commented Jul 17, 2025

Uh oh!

CISC commented Jul 17, 2025

Uh oh!

CISC commented Jul 17, 2025

Uh oh!

CISC commented Jul 17, 2025

Uh oh!

CISC commented Jul 17, 2025

Uh oh!

pwilkin commented Jul 17, 2025

Uh oh!

CISC commented Jul 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

fernandaspets commented Jul 18, 2025

Uh oh!

pwilkin commented Jul 18, 2025

Uh oh!

CISC commented Jul 18, 2025

Uh oh!

pwilkin commented Jul 18, 2025

Uh oh!

CISC commented Jul 18, 2025

Uh oh!

Uh oh!

ThiloteE commented Jul 15, 2025 •

edited

Loading

ThiloteE commented Jul 15, 2025 •

edited

Loading

CISC commented Jul 17, 2025 •

edited

Loading