[Guide]: How to ACTUALLY get it installed on Fedora 42 Linux #2043
thedarkbird
Update: I've rechecked the whole process from scratch and corrected the instructions below accordingly. I hope it now works for everyone trying it. Let me know. Fingers crossed.
I have struggled a lot to get llama-cpp-python to work with GPU support on Linux, a lot more than on Windows actually. The main issue is that compiling the llama-cpp-python module requires specific prerequisites (not so on Windows).
Note: Other version combos are possible as well; this is the one I chose based on the info I found.
In general:
System prerequisites:
(I am not including the terminal commands; find the official instructions for your specific distro)
If you don't have Miniconda installed:
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
...and install.
For the initialization option, choose Yes: conda will modify your shell configuration so that conda is initialized whenever you open a new shell and conda commands are recognized automatically.
Now in detail.
Start a terminal and create a new conda environment named 'llm'.
We use an older Python 3.11 for compatibility reasons.
conda create -n llm python=3.11 -y
conda activate llm
Everything will be installed only in the currently activated conda environment, so it will not pollute your system.
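If you want to be sure the freshly activated environment is the one being used, here is a quick optional check from Python (nothing here is specific to this guide, it just prints interpreter details):
# Optional sanity check, run inside the activated 'llm' environment.
import sys
print(sys.version)      # should report 3.11.x
print(sys.executable)   # should point inside your Miniconda 'envs/llm' directory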
Install GCC 13, then point the CC and CXX environment variables to this GCC 13 (otherwise llama-cpp-python will use the system's GCC 15 and compilation will fail). It took me quite a while to figure this one out (I don't have a lot of make/compile experience).
conda install -c conda-forge gcc=13 gxx=13 -y
export CC=$(which x86_64-conda-linux-gnu-gcc)
export CXX=$(which x86_64-conda-linux-gnu-g++)
Note: these environment variables only live for as long as your terminal session is open.
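If you want to double-check the compiler setup before building, a small sketch like this helps; it just runs whatever $CC and $CXX point to with --version (if it prints 15.x, the exports above didn't take effect):
# Check that CC/CXX point to the conda-provided GCC 13, not the system GCC 15.
import os
import subprocess

for var in ("CC", "CXX"):
    compiler = os.environ.get(var)
    print(f"{var} = {compiler}")
    if compiler:
        subprocess.run([compiler, "--version"], check=True)  # should mention 13.x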
Install CUDA 12.5 and point the environment variables to the files inside the conda environment, otherwise the build will use your system CUDA install (if you have one).
conda install -c nvidia cuda-toolkit=12.5 -y
export CUDA_HOME=$CONDA_PREFIX
export PATH=$CUDA_HOME/bin:$PATH
export LD_LIBRARY_PATH=$CUDA_HOME/lib:$CUDA_HOME/lib64:$LD_LIBRARY_PATH
export CMAKE_ARGS="-DGGML_CUDA=on -DCMAKE_CUDA_COMPILER=$CUDA_HOME/bin/nvcc -DCUDAToolkit_ROOT=$CUDA_HOME"
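Optionally, before building, you can confirm which nvcc will be picked up; this is just a sketch that compares the nvcc found on PATH with CUDA_HOME:
# Confirm nvcc comes from the conda environment, not a system CUDA install.
import os
import shutil
import subprocess

cuda_home = os.environ.get("CUDA_HOME", "")
nvcc = shutil.which("nvcc")
print("CUDA_HOME =", cuda_home)
print("nvcc found at:", nvcc)
if nvcc and cuda_home and nvcc.startswith(cuda_home):
    subprocess.run([nvcc, "--version"], check=True)  # should report release 12.5
else:
    print("Warning: nvcc does not come from CUDA_HOME; check your PATH.")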
Make a directory to clone the llama-cpp-python repo into. Then clone it, including all of the repo's submodules.
mkdir Git
cd Git
git clone --recurse-submodules https://github.com/abetlen/llama-cpp-python.git
cd llama-cpp-python
And finally, let's compile and install the llama-cpp-python module with the CUDA backend! This is the step that failed so many times that I wanted to pull my hair out, but with all the instructions above it should work. The CMAKE_ARGS we exported earlier (including -DGGML_CUDA=on) are picked up automatically by the build:
pip install .
Now you can run any GGUF model in Python! You can download many of them from Hugging Face.
Side note: while llama-cpp-python allows you to load LLMs bigger than your VRAM and spread the load between CPU/RAM and GPU/VRAM, inference is a lot faster when you use a model that's about 3 GB smaller than your VRAM. Then you can load it entirely on the GPU and still have some room for context (the n_ctx parameter, the LLM's memory).
A basic example is below; ChatGPT will easily generate a fuller Python template for running LLMs through llama-cpp-python if you want more than that.
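This is just a minimal sketch, assuming a hypothetical GGUF file at ./models/model.gguf (swap in whatever you downloaded); n_gpu_layers=-1 offloads all layers to the GPU, which is the scenario from the side note above:
# Minimal llama-cpp-python example; the model path is a placeholder.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/model.gguf",  # any GGUF file you downloaded
    n_gpu_layers=-1,                   # offload all layers to the GPU
    n_ctx=4096,                        # context window (the LLM's "memory")
)

response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain GGUF in one sentence."}],
    max_tokens=128,
)
print(response["choices"][0]["message"]["content"])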