[Guide]: How to ACTUALLY get it installed on Fedora 42 Linux #2043
thedarkbird
Update: I've rechecked the whole process from scratch and corrected the instructions below accordingly. I hope it now works for everyone trying it. Let me know. Fingers crossed.
I have struggled a lot to get llama-cpp-python to work with GPU support on Linux, a lot more than on Windows actually. The main issue is that compiling the llama-cpp-python module requires specific prerequisites (not so on Windows).
Note: Other version combos are possible as well; this is the one I chose based on the info I found.
In general:
System prerequisites:
(I am not including the terminal commands; find the official instructions for your specific distro)
If you don't have Miniconda installed:
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
...and install.
For the initialization option, choose Yes: conda will modify your shell configuration so that conda is initialized whenever you open a new shell and conda commands are recognized automatically.
Now in detail.
Start a terminal and create a new conda environment named 'llm'.
We use an older Python 3.11 for compatibility reasons.
conda create -n llm python=3.11 -y
conda activate llm
Everything will be installed only in the currently activated conda environment, so it will not pollute your system.
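If you want to be sure the freshly activated environment is the one being used, here is a quick optional check from Python (nothing here is specific to this guide, it just prints interpreter details):
# Optional sanity check, run inside the activated 'llm' environment.
import sys
print(sys.version)      # should report 3.11.x
print(sys.executable)   # should point inside your Miniconda 'envs/llm' directory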
Install GCC 13, then point the CC and CXX environment variables to this GCC 13 (otherwise llama-cpp-python will use the system's GCC 15 and compilation will fail). It took me quite a while to figure this one out (I don't have a lot of make/compile experience).
conda install -c conda-forge gcc=13 gxx=13 -y
export CC=$(which x86_64-conda-linux-gnu-gcc)
export CXX=$(which x86_64-conda-linux-gnu-g++)
Note: these environment variables only live for as long as your terminal session is open.
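If you want to double-check the compiler setup before building, a small sketch like this helps; it just runs whatever $CC and $CXX point to with --version (if it prints 15.x, the exports above didn't take effect):
# Check that CC/CXX point to the conda-provided GCC 13, not the system GCC 15.
import os
import subprocess

for var in ("CC", "CXX"):
    compiler = os.environ.get(var)
    print(f"{var} = {compiler}")
    if compiler:
        subprocess.run([compiler, "--version"], check=True)  # should mention 13.x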
Install CUDA 12.5 and point the environment variables to the files inside the conda environment, otherwise the build will use your system CUDA install (if you have one).
conda install -c nvidia cuda-toolkit=12.5 -y
export CUDA_HOME=$CONDA_PREFIX
export PATH=$CUDA_HOME/bin:$PATH
export LD_LIBRARY_PATH=$CUDA_HOME/lib:$CUDA_HOME/lib64:$LD_LIBRARY_PATH
export CMAKE_ARGS="-DGGML_CUDA=on -DCMAKE_CUDA_COMPILER=$CUDA_HOME/bin/nvcc -DCUDAToolkit_ROOT=$CUDA_HOME"
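Optionally, before building, you can confirm which nvcc will be picked up; this is just a sketch that compares the nvcc found on PATH with CUDA_HOME:
# Confirm nvcc comes from the conda environment, not a system CUDA install.
import os
import shutil
import subprocess

cuda_home = os.environ.get("CUDA_HOME", "")
nvcc = shutil.which("nvcc")
print("CUDA_HOME =", cuda_home)
print("nvcc found at:", nvcc)
if nvcc and cuda_home and nvcc.startswith(cuda_home):
    subprocess.run([nvcc, "--version"], check=True)  # should report release 12.5
else:
    print("Warning: nvcc does not come from CUDA_HOME; check your PATH.")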
Make a directory to clone the llama-cpp-python repo into. Then clone it, including all of the repo's submodules.
mkdir Git
cd Git
git clone --recurse-submodules https://github.com/abetlen/llama-cpp-python.git
cd llama-cpp-python
And finally, let's compile and install the llama-cpp-python module with the CUDA backend! This is the step that failed so many times that I wanted to pull my hair out, but with all the instructions above it should work. The CMAKE_ARGS we exported earlier (including -DGGML_CUDA=on) are picked up automatically by the build:
pip install .
Now you can run any GGUF model in Python! You can download many of them from Hugging Face.
Side note: while llama-cpp-python allows you to load LLMs bigger than your VRAM and spread the load between CPU/RAM and GPU/VRAM, inference is a lot faster when you use a model that's about 3 GB smaller than your VRAM. Then you can load it entirely on the GPU and still have some room for context (the n_ctx parameter, the LLM's memory).
A basic example is below; ChatGPT will easily generate a fuller Python template for running LLMs through llama-cpp-python if you want more than that.
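This is just a minimal sketch, assuming a hypothetical GGUF file at ./models/model.gguf (swap in whatever you downloaded); n_gpu_layers=-1 offloads all layers to the GPU, which is the scenario from the side note above:
# Minimal llama-cpp-python example; the model path is a placeholder.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/model.gguf",  # any GGUF file you downloaded
    n_gpu_layers=-1,                   # offload all layers to the GPU
    n_ctx=4096,                        # context window (the LLM's "memory")
)

response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain GGUF in one sentence."}],
    max_tokens=128,
)
print(response["choices"][0]["message"]["content"])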