Insights: ggml-org/llama.cpp
Overview
49 Releases published by 1 person
- b5874, published Jul 12, 2025
- b5875, published Jul 12, 2025
- b5876, published Jul 12, 2025
- b5880, published Jul 12, 2025
- b5882, published Jul 12, 2025
- b5884, published Jul 12, 2025
- b5886, published Jul 13, 2025
- b5887, published Jul 13, 2025
- b5888, published Jul 13, 2025
- b5889, published Jul 13, 2025
- b5890, published Jul 13, 2025
- b5891, published Jul 14, 2025
- b5892, published Jul 14, 2025
- b5893, published Jul 14, 2025
- b5894, published Jul 14, 2025
- b5895, published Jul 14, 2025
- b5896, published Jul 14, 2025
- b5897, published Jul 14, 2025
- b5898, published Jul 15, 2025
- b5899, published Jul 15, 2025
- b5900, published Jul 15, 2025
- b5901, published Jul 15, 2025
- b5902, published Jul 15, 2025
- b5904, published Jul 16, 2025
- b5908, published Jul 16, 2025
- b5909, published Jul 16, 2025
- b5910, published Jul 16, 2025
- b5911, published Jul 16, 2025
- b5912, published Jul 16, 2025
- b5913, published Jul 16, 2025
- b5914, published Jul 16, 2025
- b5915, published Jul 16, 2025
- b5916, published Jul 16, 2025
- b5919, published Jul 17, 2025
- b5920, published Jul 17, 2025
- b5921, published Jul 17, 2025
- b5922, published Jul 17, 2025
- b5923, published Jul 17, 2025
- b5924, published Jul 17, 2025
- b5927, published Jul 18, 2025
- b5928, published Jul 18, 2025
- b5929, published Jul 18, 2025
- b5930, published Jul 18, 2025
- b5932, published Jul 18, 2025
- b5933, published Jul 18, 2025
- b5934, published Jul 18, 2025
- b5935, published Jul 18, 2025
- b5936, published Jul 18, 2025
- b5937, published Jul 18, 2025
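The tags above come from llama.cpp's rolling release scheme, where passing builds of master are published under incrementing bNNNN tags. For scripting against this list, a minimal sketch using the public GitHub REST API; unauthenticated access (and its rate limit) is an assumption:

```python
# Minimal sketch: list recent llama.cpp release tags via the GitHub REST API.
# Assumes unauthenticated access (rate-limited to ~60 requests/hour).
import json
import urllib.request

url = "https://api.github.com/repos/ggml-org/llama.cpp/releases?per_page=10"
with urllib.request.urlopen(url) as resp:
    releases = json.load(resp)

for rel in releases:
    # Each entry carries the build tag (e.g. "b5937") and its publication time.
    print(rel["tag_name"], rel["published_at"])
```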
62 Pull requests merged by 26 people
- sync : ggml (#14768, merged Jul 19, 2025)
- metal : fuse add, mul (#14596, merged Jul 18, 2025)
- graph : fix graph reuse reset of params (#14760, merged Jul 18, 2025)
- parallel : add option for different RNG seeds (#14757, merged Jul 18, 2025)
- Fix Gemma3n not executed as CUDA_GRAPH on NVGPUs (#14741, merged Jul 18, 2025)
- graph : avoid huge warm-up graphs for MoE models (#14753, merged Jul 18, 2025)
- model : fix build after merge conflict (#14754, merged Jul 18, 2025)
- Add EXAONE 4.0 model architecture (#14630, merged Jul 18, 2025)
- CUDA: set_rows + cpy.cu refactor (#14712, merged Jul 18, 2025)
- graph : refactor context to not pass gf explicitly (#14629, merged Jul 18, 2025)
- Move the graph placeholder message to debug mode (#14748, merged Jul 18, 2025)
- Use max work group size for device to replace the magic number (#14732, merged Jul 18, 2025)
- Fix Ernie4.5 MoE without shared experts (#14746, merged Jul 17, 2025)
- nix: use optionalAttrs for `env` mkDerivation attrset argument (#14726, merged Jul 17, 2025)
- Model: Add support for Ernie 4.5 MoE (#14658, merged Jul 17, 2025)
- kv-cache : fix k-shift for multiple streams (#14742, merged Jul 17, 2025)
- llama : reuse compute graphs (#14482, merged Jul 17, 2025)
- model : fix parallel processing for lfm2 (#14705, merged Jul 17, 2025)
- kv-cache : opt mask set input (#14600, merged Jul 17, 2025)
- batch : fix uninitialized has_cpl flag (#14733, merged Jul 17, 2025)
- ci : disable failing vulkan crossbuilds (#14723, merged Jul 16, 2025)
- convert : make hf token optional (#14717, merged Jul 16, 2025)
- Fix parameter order issue for hybrid memory initialization (#14725, merged Jul 16, 2025)
- ggml: Add initial WebGPU backend (#14521, merged Jul 16, 2025)
- Support Cosyvoice2-0.5B by allowing the Qwen2 architecture to have an optional bias tensor (#14711, merged Jul 16, 2025)
- llama : add high-throughput mode (#14363, merged Jul 16, 2025)
- Support diffusion models: Add Dream 7B (#14644, merged Jul 16, 2025)
- ggml : add asserts (#14720, merged Jul 16, 2025)
- server : pre-calculate EOG logit biases (#14721, merged Jul 16, 2025)
- Bug: fix inputs to conv1d in mamba layer of plamo2 (#14716, merged Jul 16, 2025)
- server : fix handling of the ignore_eos flag (#14710, merged Jul 16, 2025; see the request sketch after this list)
- scripts: synthetic prompt mode for server-bench.py (#14695, merged Jul 16, 2025)
- convert : only check for tokenizer folder if we need it (#14704, merged Jul 16, 2025)
- convert : add pre-computed hashes first to prevent order mishaps (#14701, merged Jul 16, 2025)
- llama: add LLAMA_API to deprecated llama_kv_self_seq_div (#14708, merged Jul 16, 2025)
- scripts: add bpw per layer and model (#14703, merged Jul 15, 2025)
- Model : Add support for Kimi-K2 (#14654, merged Jul 15, 2025)
- vulkan: fix noncontig check for mat_mul_id splitting (#14683, merged Jul 15, 2025)
- vulkan: add RTE variants for glu/add/sub/mul/div (#14653, merged Jul 15, 2025)
- model : add PLaMo-2 model (#14560, merged Jul 15, 2025)
- cuda: fix build warnings in set-rows.cu (unused variable) (#14687, merged Jul 15, 2025)
- sycl: Hotfix for non-dnnl codepath (#14677, merged Jul 14, 2025)
- PPC: Refactor llamafile_sgemm code (#14673, merged Jul 14, 2025)
- llama-context: add ability to get logits (#14672, merged Jul 14, 2025)
- scripts: benchmark for HTTP server throughput (#14668, merged Jul 14, 2025)
- SYCL: use 1D kernel for set_rows (#14618, merged Jul 14, 2025)
- sycl: Batched mulmat rework for oneDNN dispatch (#14617, merged Jul 14, 2025)
- llama : add jinja template for rwkv-world (#14665, merged Jul 13, 2025)
- quantize: fix minor logic flaw in --tensor-type (#14572, merged Jul 13, 2025)
- cuda : add set rows for bf16 (#14664, merged Jul 13, 2025)
- Add ELU CUDA support (#14657, merged Jul 13, 2025)
- ggml : set_rows type coverage (#14661, merged Jul 13, 2025)
- Add missing unary ops Metal support (#14660, merged Jul 13, 2025)
- Add CMake presets for Linux and GCC (#14656, merged Jul 13, 2025)
- test-backend-ops : cover lfm2 cases in test_ssm_conv (#14651, merged Jul 12, 2025)
- readme : add LFM2 to models section (#14650, merged Jul 12, 2025)
- CUDA: add set rows for f32 and f16 (#14551, merged Jul 12, 2025)
- sync : ggml (#14648, merged Jul 12, 2025)
- sync : ggml (#14647, merged Jul 12, 2025)
- server : fix pooled embedding output (#14645, merged Jul 12, 2025)
- vulkan: support SET_ROWS (#14587, merged Jul 12, 2025)
- vulkan: optimizations for deepseek prompt processing (#14555, merged Jul 12, 2025)
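Two of the merged server changes above touch request-level sampling controls: #14710 fixes handling of the ignore_eos flag and #14757 adds an option for different RNG seeds. A minimal sketch exercising both fields against a locally running llama-server; the host, port, and prompt are assumptions:

```python
# Minimal sketch: call llama-server's /completion endpoint with ignore_eos set.
# Assumes a llama-server instance already running at localhost:8080.
import json
import urllib.request

payload = {
    "prompt": "Once upon a time",
    "n_predict": 64,     # generate at most 64 tokens
    "ignore_eos": True,  # keep sampling past end-of-generation tokens (#14710)
    "seed": 42,          # fixed RNG seed for reproducible sampling
}
req = urllib.request.Request(
    "http://localhost:8080/completion",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp)["content"])
```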
22 Pull requests opened by 19 people
- webui : add a preset feature to the settings (#14649, opened Jul 12, 2025)
- Add Pad Reflect 1D CUDA support (#14659, opened Jul 13, 2025)
- bug fix: handle saving/loading null layers in recurrent memory (#14675, opened Jul 14, 2025)
- kleidiai: add support for get_rows (#14676, opened Jul 14, 2025)
- Adding a simple-function-call example - hopefully not doing anything wrong (#14682, opened Jul 14, 2025)
- Fix KleidiAI compilation errors with -DGGML_NATIVE=OFF (issue #14464) (#14700, opened Jul 15, 2025)
- vulkan: Add logging for bf16 features to ggml_vk_print_gpu_info (#13274) (#14707, opened Jul 16, 2025)
- server: add prompt processing progress streaming for /completion endpoint (#14685) (#14728, opened Jul 16, 2025; see the streaming sketch after this list)
- mtmd : Support jinja in libmtmd (only for QwenVL and Qwen Omni) (#14730, opened Jul 17, 2025)
- feat: Add optional prompt processing progress streaming (#14731, opened Jul 17, 2025)
- CUDA: skip masked out KQ slices in mma FA kernel (#14735, opened Jul 17, 2025)
- Documentation: Update build.md's Vulkan section (#14736, opened Jul 17, 2025)
- Improve Mistral models integration with llama.cpp (#14737, opened Jul 17, 2025)
- examples : predicted output for text generation (#14739, opened Jul 17, 2025)
- metal: SSM_SCAN performance (#14743, opened Jul 17, 2025)
- [ROCm] Fix HIP version check for HIPBLAS V2 API compatibility (#14744, opened Jul 17, 2025)
- Fix MinicpmV model converter and clip to avoid hardcoded values (#14750, opened Jul 18, 2025)
- tests : add non-cont K,V FA tests (#14756, opened Jul 18, 2025)
- cuda : implement bf16 cpy ops and enable bf16 cont (#14763, opened Jul 18, 2025)
- webui: add missing messages in export (#13552) (#14764, opened Jul 18, 2025)
- feat: Add extended sampling API with candidate token lists (#14612) (#14765, opened Jul 19, 2025)
- docs : mention apt installation method (#14766, opened Jul 19, 2025)
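Two of the PRs above (#14728, #14731) propose streaming prompt-processing progress over the /completion endpoint. A minimal sketch of consuming the existing server-sent-event token stream that those proposals would extend; the server address is an assumption:

```python
# Minimal sketch: stream tokens from llama-server's /completion endpoint.
# Assumes a llama-server instance at localhost:8080. Each SSE event arrives
# as a line of the form `data: {...json...}`.
import json
import urllib.request

payload = {"prompt": "Hello", "n_predict": 32, "stream": True}
req = urllib.request.Request(
    "http://localhost:8080/completion",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    for raw in resp:
        line = raw.decode("utf-8").strip()
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines between events
        event = json.loads(line[len("data: "):])
        print(event.get("content", ""), end="", flush=True)
        if event.get("stop"):
            break  # final event of the stream
```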
44 Issues closed by 15 people
- Eval bug: SIGILL (#13161, closed Jul 19, 2025)
- Eval bug: Can't run Qwen3-32B Q4_K_XL (#13298, closed Jul 19, 2025)
- Eval bug: Uncaught exception [json.exception.parse_error.101] during tool use crashes llama-server (#13825, closed Jul 19, 2025)
- Compile bug: numerous deprecation warnings when compiling in Termux (#14011, closed Jul 19, 2025)
- Feature Request: Support Llama-Nemotron-Nano-VL-8B-V1 (#14015, closed Jul 19, 2025)
- Misc. bug: "error: invalid argument: /bin/sh" when using Docker image (#14019, closed Jul 19, 2025)
- Eval bug: b5922 causes gibberish on context shift (#14759, closed Jul 18, 2025)
- Eval bug: SYCL backend "invalid work-group size" error when using MoE models with Intel iGPU (#14689, closed Jul 18, 2025)
- Misc. bug: sentencepiece not included in requirements.txt (#13982, closed Jul 18, 2025)
- Compile bug: (#13992, closed Jul 18, 2025)
- Feature Request: allow spacebar to confirm web UI prompts [like the deleting a chat confirmation] (#13999, closed Jul 18, 2025)
- Feature Request: Add Ernie4.5MoE support (#14465, closed Jul 17, 2025)
- Compile bug: cannot compile get_rows_iq1_m (#14542, closed Jul 17, 2025)
- Misc. bug: mtmd cannot decode an image provided through valid OpenAI API request (#14615, closed Jul 17, 2025)
- Eval bug: Assertion failure when using LFM2 with parallel request processing (#14670, closed Jul 17, 2025)
- Eval bug: microsoft/bitnet-b1.58-2B-4T-gguf (#12997, closed Jul 17, 2025)
- Feature Request: WINA (#13964, closed Jul 17, 2025)
- Eval bug: Unable to load the model on GPU (#13967, closed Jul 17, 2025)
- make using shifting context easier (#13969, closed Jul 17, 2025)
- context shifting should be default option? (#13971, closed Jul 17, 2025)
- Misc. bug: llama-bench improper tensor split (#13972, closed Jul 17, 2025)
- Misc. bug: Hybrid models failing to load with assert GGML_ASSERT(kv_size % n_pad == 0) (#14724, closed Jul 16, 2025)
- Misc. bug: [SERVER] Multiple slots, generation speed is degraded after each generation/slot used (#10860, closed Jul 16, 2025)
- Data offline (#14722, closed Jul 16, 2025)
- Feature Request: Exclude thinking tokens from server cache for reasoning models (#14379, closed Jul 16, 2025)
- Eval bug: llama-tts abort (#13955, closed Jul 16, 2025)
- Eval bug: llama-mtmd-cli : option --image failed to load image (#13959, closed Jul 16, 2025)
- Eval bug: make_cpu_buft_list: no CPU backend found .... failed to load model (#14691, closed Jul 15, 2025)
- llama.cpp unable to compile for HIP (#14694, closed Jul 15, 2025)
- Feature Request: --swa-extra parameter needed to restore speculative decode function with SWA (#13747, closed Jul 15, 2025)
- Misc. bug: Decreased success rate for tool calling (#13769, closed Jul 15, 2025)
- Feature Request: Regarding Hardcoded GGML Tensor Name Length Limit (GGML_MAX_NAME) (#13947, closed Jul 15, 2025)
- Feature Request: Granite 4 Support (#13275, closed Jul 14, 2025)
- Compile bug: nvcc fatal : Unsupported gpu architecture 'compute_' (#13893, closed Jul 14, 2025)
- Feature Request: Generate Image Embeddings with llama.cpp (#13913, closed Jul 14, 2025)
- Eval bug: convert_hf_to_gguf.py: error: argument --outtype: invalid choice: 'q4_k_m' (#14667, closed Jul 13, 2025)
- Eval bug: Gemma 3n incoherent with HIP when prompt length > ubatch (#14604, closed Jul 13, 2025)
- Automatic optimization of runtime parameters such as -ngl given memory constraints (#13860, closed Jul 13, 2025)
- Feature Request: Make the `/completion` endpoint in `llama-server` work with multimodal models (#13872, closed Jul 13, 2025)
- Eval bug: [5808-] qwen3 30B vulkan run with GGG (#14583, closed Jul 12, 2025)
- Misc. bug: Embedding/pooling: I receive 10xvector not 1xvector (#14543, closed Jul 12, 2025; see the embedding sketch after this list)
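The pooling entry above (#14543, addressed on the server side by #14645 in the merged list) concerns getting one pooled vector per input rather than a vector per token. A minimal sketch of checking that via the OpenAI-compatible /v1/embeddings endpoint; the host, port, and server flags (e.g. `llama-server --embeddings --pooling mean`) are assumptions:

```python
# Minimal sketch: request a pooled embedding from llama-server's
# OpenAI-compatible /v1/embeddings endpoint. With pooling enabled, one input
# string should yield exactly one vector (#14543 reported per-token vectors).
import json
import urllib.request

payload = {"input": "The quick brown fox"}
req = urllib.request.Request(
    "http://localhost:8080/v1/embeddings",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    data = json.load(resp)["data"]

print(len(data))                  # expect 1 embedding object for 1 input
print(len(data[0]["embedding"]))  # dimensionality of the pooled vector
```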
29 Issues opened by 28 people
- Research: Add isolated tool for hosting a Claude-style API with tool calling support (#14769, opened Jul 19, 2025)
- Feature Request: Direct FP8 conversion from convert_hf_to_gguf.py (#14762, opened Jul 18, 2025)
- Exaone-4 gibberish when using jinja template (#14761, opened Jul 18, 2025)
- Eval bug: Nemotron 49b doesn't load correctly (#14752, opened Jul 18, 2025)
- Error with unsloth DeepSeek-V3 BF16 and imatrix (#14749, opened Jul 18, 2025)
- Misc. bug: RPC flash attention bug on deepseek models (deepseek/kimi k2) (#14747, opened Jul 17, 2025)
- Compile bug: GGML Vulkan :: "format not a string literal and no format arguments [-Werror=format-security]" (#14745, opened Jul 17, 2025)
- Misc. bug: out of memory error after PR #13746 (#14740, opened Jul 17, 2025)
- Misc. bug: b5921 release zip on GitHub misses llama-embedding binary (#14738, opened Jul 17, 2025)
- Regarding the build for 8060S (gfx1151) (#14734, opened Jul 17, 2025)
- Eval bug: Nondeterministic output with ROCm backend despite zero temperature (#14727, opened Jul 16, 2025)
- Misc. bug: Llama server uses some of the VRAM on another GPU, even when -mg 1 and -sm 'none' are set (#14719, opened Jul 16, 2025)
- Feature Request: Optimization of work for MoE architecture (#14714, opened Jul 16, 2025)
- Misc. bug: llama.cpp crashes my PC whenever I close its console (#14713, opened Jul 16, 2025)
- Compile bug: [SYCL][ARC A770] Regression: dual A770 support broken in b5422 and later (#14709, opened Jul 16, 2025)
- Misc. bug: OpenAI API v1/responses llama-server (#14702, opened Jul 15, 2025)
- Feature Request: ARMv7 / Termux Support on Mobile Devices (#14699, opened Jul 15, 2025)
- Eval bug: Gemma 3n on Vulkan fails to load (#14698, opened Jul 15, 2025)
- Eval bug: Regression: Tool calls still returned in content field as JSON string instead of tool_calls array (#14697, opened Jul 15, 2025)
- Eval bug: Unable to run with Qwen3 model (#14696, opened Jul 15, 2025)
- Compile bug: llama-llava-clip-quantize-cli not found (#14693, opened Jul 15, 2025)
- Eval bug: CUDA error: operation not supported (#14692, opened Jul 15, 2025)
- Feature Request: Server stream response for "prompt processing progress" (#14685, opened Jul 15, 2025)
- Misc. bug: DeepSeek-R1 0528 671b:Q4_K_XL think tags do not close sometimes (#14679, opened Jul 14, 2025)
- Eval bug: Qwen 2.5 VL gets stuck in a loop (#14663, opened Jul 13, 2025)
- Bunch of blank lines in prompt lead to segmentation fault in tokenizer with Qwen3 (#14655, opened Jul 12, 2025; see the tokenize sketch after this list)
- Feature Request: Add Explicit Context Reset for llama-cli or llama-server (#14652, opened Jul 12, 2025)
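Issue #14655 above reports a tokenizer segmentation fault on Qwen3 prompts containing many blank lines. A minimal reproduction sketch against llama-server's /tokenize endpoint; the server address, loaded model, and exact blank-line count are assumptions:

```python
# Minimal sketch: tokenize a prompt with a long run of blank lines, the input
# shape reported in #14655. Assumes a llama-server instance at localhost:8080
# with a Qwen3 model loaded.
import json
import urllib.request

prompt = "Hello" + "\n" * 200 + "world"  # many consecutive blank lines
req = urllib.request.Request(
    "http://localhost:8080/tokenize",
    data=json.dumps({"content": prompt}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    tokens = json.load(resp)["tokens"]

print(f"{len(tokens)} tokens")
```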
64 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
- Add CUDA non-contiguous Unary Ops support (#14639, commented on Jul 15, 2025 • 11 new comments)
- ggml: adds CONV_2D op and direct GEMM Vulkan implementation (#14316, commented on Jul 19, 2025 • 7 new comments)
- imatrix: add option to display importance score statistics for a given imatrix file (#12718, commented on Jul 13, 2025 • 2 new comments)
- OpenCL: add `mul_mat_f16_f32_image` kernel (#14635, commented on Jul 15, 2025 • 2 new comments)
- Introduce New Lookup-Table (LUT)-Based Matrix Multiplication Method (TMAC) (#13206, commented on Jul 17, 2025 • 1 new comment)
- finetune.cpp command-line arg (#13873, commented on Jul 18, 2025 • 1 new comment)
- ggml: introduce GGML_NUMA_MIGRATE to optimize cross-NUMA op computation (#14232, commented on Jul 15, 2025 • 1 new comment)
- OpenCL: add conv2d kernel (#14403, commented on Jul 18, 2025 • 1 new comment)
- Allow truncation when embedding (#14493, commented on Jul 18, 2025 • 1 new comment)
- docker : add CANN build pipeline (#14591, commented on Jul 17, 2025 • 1 new comment)
- common: add config presets for falcon (#14638, commented on Jul 12, 2025 • 1 new comment)
- Misc. bug: weird cursor placement in the web UI (#14233, commented on Jul 18, 2025 • 0 new comments)
- Misc. bug: prompt as pasted content in the server (#14251, commented on Jul 18, 2025 • 0 new comments)
- Android build on GPU not comparable with CPU? (#13910, commented on Jul 18, 2025 • 0 new comments)
- Misc. bug: missing messages in JSON export via llama-server web UI (#13552, commented on Jul 18, 2025 • 0 new comments)
- Feature Request: s390x CI (#13243, commented on Jul 18, 2025 • 0 new comments)
- Feature Request: Gemma3n multimodal support (#14429, commented on Jul 18, 2025 • 0 new comments)
- main: failed to quantize model from 'gemma-3n-E2B-it.f16.gguf' (#14405, commented on Jul 18, 2025 • 0 new comments)
- Misc. bug: [CANN] memory leak when using CANN as backend (#14257, commented on Jul 19, 2025 • 0 new comments)
- Feature Request: Improve Sampling API: Expose Top‑K/Top‑P Candidate Token Lists in C API (#14612, commented on Jul 19, 2025 • 0 new comments)
- Revert "ggml : remove OpenCL (#7735) + (#8235)" (#8986, commented on Jul 17, 2025 • 0 new comments)
- imatrix : use GGUF to store importance matrices (#9400, commented on Jul 19, 2025 • 0 new comments)
- [WIP] backend: Integrating QNN (Qualcomm AI Engine Direct) as a dedicated backend for Qualcomm NPUs (#12063, commented on Jul 18, 2025 • 0 new comments)
- llama-server : implement universal assisted decoding (#12635, commented on Jul 15, 2025 • 0 new comments)
- Update llama-quant.cpp llama_tensor_get_type with DeepSeek-friendly modifications (#12727, commented on Jul 18, 2025 • 0 new comments)
- HIP: Enable Matrix cores for MMQ Kernels, Enable stream-K for CDNA 3 (#14624, commented on Jul 17, 2025 • 0 new comments)
- tool: add conversion of text/parquet to custom format (#14622, commented on Jul 18, 2025 • 0 new comments)
- llama : support qwen3 rerank and embeddings (#14029, commented on Jul 13, 2025 • 0 new comments)
- metal : reuse graphs (#14570, commented on Jul 17, 2025 • 0 new comments)
- Q2k interleaving implementation - x86/x64 SIMD (#14373, commented on Jul 17, 2025 • 0 new comments)
- musa: upgrade musa sdk to 4.2.0 (#14498, commented on Jul 18, 2025 • 0 new comments)
- [CANN] weight format to nz for Ascend310P3 (#14407, commented on Jul 16, 2025 • 0 new comments)
- Misc. bug: linux/arm64 does not exist for the server docker image (#13891, commented on Jul 12, 2025 • 0 new comments)
- Metrics should not include : in Prometheus metric names (#14150, commented on Jul 13, 2025 • 0 new comments; see the metrics sketch at the end of this list)
- Eval bug: (MAC) fail in `GGML_METAL_ADD_KERNEL(GGML_METAL_KERNEL_TYPE_FLASH_ATTN_EXT_Q8_0_H96, flash_attn_ext_q8_0_h96, has_simdgroup_mm);` (#14110, commented on Jul 13, 2025 • 0 new comments)
- Misc. bug: llama-server webui with --jinja flag does not show thinking when using reasoning models (#14007, commented on Jul 13, 2025 • 0 new comments)
- prismatic-vlms to gguf? (#14159, commented on Jul 14, 2025 • 0 new comments)
- Research: mmap eviction (#14154, commented on Jul 14, 2025 • 0 new comments)
- Feature request: Graphical GGUF viewer (#6715, commented on Jul 14, 2025 • 0 new comments)
- OpenCL backend with Qualcomm Adreno GPUs load time is too long (#14337, commented on Jul 14, 2025 • 0 new comments)
- Feature Request: add tool calling for deepseek-r1-0528 (#14557, commented on Jul 15, 2025 • 0 new comments)
- Feature Request: Support EXAONE 4.0 (#14474, commented on Jul 15, 2025 • 0 new comments)
- Misc. bug: convert_hf_to_gguf.py not working on qwen3-embedding and qwen3-embedding lora tuned models (#14459, commented on Jul 15, 2025 • 0 new comments)
- Compile bug: zero-size array 'gemm_gemv_kernels' / invalid feature modifier 'sme' (#14464, commented on Jul 15, 2025 • 0 new comments)
- ggml : add WebGPU backend (#7773, commented on Jul 16, 2025 • 0 new comments)
- Misc. bug: full-cuda docker build needs ldconfig before launching llama-* (#14195, commented on Jul 16, 2025 • 0 new comments)
- Misc. bug: llama-server is 40% slower than llama-cli when using identical parameters, including the -ot option for tensor offloading (#14201, commented on Jul 16, 2025 • 0 new comments)
- Misc. bug: evaluate_and_capture_cuda_graph NULL POINTER DEREFERENCE (#14186, commented on Jul 16, 2025 • 0 new comments)
- Misc. bug: Failure to allocate buffer with ROCm 6.4 (#14178, commented on Jul 16, 2025 • 0 new comments)
- Misc. bug: Potential out of bound in rerank (#13549, commented on Jul 16, 2025 • 0 new comments)
- Misc. bug: Qwen3-Embedding-0.6B-GGUF doesn't work for 32768 context size (too much memory used) (#14084, commented on Jul 16, 2025 • 0 new comments)
- changelog : `libllama` API (#9289, commented on Jul 16, 2025 • 0 new comments)
- Misc. bug: crash on vulkan with new max mem alloc size calculations since b5703 (#14553, commented on Jul 16, 2025 • 0 new comments)
- Feature Request: Generic CPU in ggml-cpu/arch (#14402, commented on Jul 16, 2025 • 0 new comments)
- Feature Request: Support Kimi K2 (#14642, commented on Jul 16, 2025 • 0 new comments)
- Feature Request: llama-server: a flag for limiting input image size (#14216, commented on Jul 17, 2025 • 0 new comments)
- Misc. bug: OAI response_format json_schema and json_object not applied with Llama 3.x models (#14218, commented on Jul 17, 2025 • 0 new comments)
- Eval bug: RWKV inference with llama-parallel gets wrong output with lmhead offloaded to GPU (#14211, commented on Jul 17, 2025 • 0 new comments)
- Misc. bug: [Windows] GPU layers/tensors still consume system memory after load when mmap = true (#14187, commented on Jul 17, 2025 • 0 new comments)
- Misc. bug: Stuck while loading the model (#14114, commented on Jul 17, 2025 • 0 new comments)
- Feature Request: Support GLM-4.1V-9B-Thinking (#14495, commented on Jul 17, 2025 • 0 new comments)
- Feature Request: Add support for Kokoro TTS (#11050, commented on Jul 17, 2025 • 0 new comments)
- Eval bug: Inconsistent Embedding Similarity between llama-server and LlamaCppEmbeddings for BGE-M3 Model (#14280, commented on Jul 17, 2025 • 0 new comments)
- Misc. bug: Complex tool calling schema causes an "Unrecognized Schema" exception (#14227, commented on Jul 17, 2025 • 0 new comments)
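Issue #14150 in this list notes that the server's Prometheus metric names contain ':', which Prometheus naming guidelines reserve for recording rules. A minimal sketch of inspecting those names via the /metrics endpoint; it assumes a llama-server instance started with --metrics at localhost:8080:

```python
# Minimal sketch: list llama-server's exposed metric names and flag the ones
# containing ':' (the naming concern raised in #14150).
import urllib.request

with urllib.request.urlopen("http://localhost:8080/metrics") as resp:
    for raw in resp:
        line = raw.decode("utf-8").rstrip()
        if line and not line.startswith("#"):
            # The metric name is everything before the label block or value.
            name = line.split("{")[0].split()[0]
            print(name, "(contains ':')" if ":" in name else "")
```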