Initializing Llama() with verbose=False causes UnsupportedOperation: fileno on Colab #729

@robgon-art

Description

Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new bug or useful enhancement to share.

Expected Behavior

Running on Google Colab, I expect to be able to initialize a Llama object with verbose=False and use it like this:

from llama_cpp import Llama
llm = Llama(model_path=model_path, verbose=False)
output = llm("Q: Name the planets in the solar system? A: ", max_tokens=32, stop=["Q:", "\n"], echo=True)
print(output)

Current Behavior

The system throws an UnsupportedOperation: fileno error when initializing the Llama model.

Environment and Context

I am running on Google Colab with a P4 GPU.

  • Physical hardware I am using:
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         46 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  2
  On-line CPU(s) list:   0,1
Vendor ID:               GenuineIntel
  Model name:            Intel(R) Xeon(R) CPU @ 2.20GHz
    CPU family:          6
    Model:               79
    Thread(s) per core:  2
    Core(s) per socket:  1
    Socket(s):           1
    Stepping:            0
    BogoMIPS:            4399.99
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mc
                         a cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscal
                         l nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopo
                         logy nonstop_tsc cpuid tsc_known_freq pni pclmulqdq sss
                         e3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes 
                         xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefe
                         tch invpcid_single ssbd ibrs ibpb stibp fsgsbase tsc_ad
                         just bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed ad
                         x smap xsaveopt arat md_clear arch_capabilities
Virtualization features: 
  Hypervisor vendor:     KVM
  Virtualization type:   full
Caches (sum of all):     
  L1d:                   32 KiB (1 instance)
  L1i:                   32 KiB (1 instance)
  L2:                    256 KiB (1 instance)
  L3:                    55 MiB (1 instance)
NUMA:                    
  NUMA node(s):          1
  NUMA node0 CPU(s):     0,1
Vulnerabilities:         
  Itlb multihit:         Not affected
  L1tf:                  Mitigation; PTE Inversion
  Mds:                   Vulnerable; SMT Host state unknown
  Meltdown:              Vulnerable
  Mmio stale data:       Vulnerable
  Retbleed:              Vulnerable
  Spec store bypass:     Vulnerable
  Spectre v1:            Vulnerable: __user pointer sanitization and usercopy ba
                         rriers only; no swapgs barriers
  Spectre v2:            Vulnerable, IBPB: disabled, STIBP: disabled, PBRSB-eIBR
                         S: Not affected
  Srbds:                 Not affected
  Tsx async abort:       Vulnerable
  • Operating System:
Linux 8911910c1e29 5.15.109+ #1 SMP Fri Jun 9 10:57:30 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
  • SDK versions:
Python 3.10.12
llama_cpp_python 0.2.6

Failure Information

Here's the stack trace.

---------------------------------------------------------------------------
UnsupportedOperation                      Traceback (most recent call last)
<ipython-input-4-437d167e2732> in <cell line: 2>()
      1 from llama_cpp import Llama
----> 2 llm = Llama(model_path=model_path, verbose=False)
      3 output = llm("Q: Name the planets in the solar system? A: ", max_tokens=32, stop=["Q:", "\n"], echo=True)
      4 print(output)

/usr/local/lib/python3.10/dist-packages/llama_cpp/utils.py in __enter__(self)
      9         self.errnull_file = open(os.devnull, "w")
     10 
---> 11         self.old_stdout_fileno_undup = sys.stdout.fileno()
     12         self.old_stderr_fileno_undup = sys.stderr.fileno()
     13 

UnsupportedOperation: fileno
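
The trace points at the output-suppression helper in llama_cpp/utils.py: with verbose=False, its __enter__ duplicates the stdout/stderr file descriptors, starting with sys.stdout.fileno(). On Colab, sys.stdout is (as far as I can tell) ipykernel's OutStream, which is not backed by a real file descriptor, so fileno() raises io.UnsupportedOperation. A quick check in a Colab cell confirms this:

import io
import sys

# Colab replaces sys.stdout with a stream that has a fileno() method
# but no underlying file descriptor, so the call raises.
try:
    sys.stdout.fileno()
    print("sys.stdout is backed by a real file descriptor")
except io.UnsupportedOperation:
    print("fileno() is unsupported here, matching the trace above")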

Steps to Reproduce

  1. Go to Google Colab, https://colab.research.google.com/
  2. Enter the code below
  3. Run the cell
!pip install llama-cpp-python huggingface_hub

from huggingface_hub import hf_hub_download
model_name_or_path = "TheBloke/Llama-2-7B-chat-GGUF"
model_basename = "llama-2-7b-chat.Q4_K_M.gguf"
model_path = hf_hub_download(repo_id=model_name_or_path, filename=model_basename)

from llama_cpp import Llama
llm = Llama(model_path=model_path, verbose=False)
output = llm("Q: Name the planets in the solar system? A: ", max_tokens=32, stop=["Q:", "\n"], echo=True)
print(output)

It will fail with an UnsupportedOperation: fileno error.
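
Until the library guards against this, a possible user-side workaround (just a sketch; has_real_fd is a helper introduced here, not part of llama-cpp-python) is to request quiet mode only when both streams expose a real file descriptor, and fall back to verbose output otherwise:

import io
import sys
from llama_cpp import Llama

def has_real_fd(stream):
    # True only when the stream is backed by a real file descriptor;
    # Colab's stdout/stderr raise io.UnsupportedOperation instead.
    try:
        stream.fileno()
        return True
    except (AttributeError, io.UnsupportedOperation):
        return False

# model_path comes from the hf_hub_download call above.
quiet = has_real_fd(sys.stdout) and has_real_fd(sys.stderr)
llm = Llama(model_path=model_path, verbose=not quiet)

On Colab this keeps the loader's log output but avoids the crash; in a plain terminal it still suppresses it.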

Environment info:

llama-cpp-python$ python3 --version
Python 3.10.12

Component versions:
llama_cpp_python Version: 0.2.6
diskcache Version: 5.6.3
numpy Version: 1.23.5
typing_extensions Version: 4.5.0

!git log | head -3
commit 8d75016549e2ff62a511b1119d966ffc0df5c77b
Author: Andrei Betlen <abetlen@gmail.com>
Date:   Sat Sep 16 14:57:49 2023 -0400
