Initializing Llama() with verbose=False causes UnsupportedOperation: fileno on Colab #729

@robgon-art

Description

Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new bug or useful enhancement to share.

Expected Behavior

Running on Google Colab, I expect to be able to initialize a Llama object with verbose=False and use it like this:

from llama_cpp import Llama
llm = Llama(model_path=model_path, verbose=False)
output = llm("Q: Name the planets in the solar system? A: ", max_tokens=32, stop=["Q:", "\n"], echo=True)
print(output)

Current Behavior

The system throws an UnsupportedOperation: fileno error when initializing the Llama model.

Environment and Context

I am running on Google Colab with a P4 GPU.

  • Physical hardware I am using:
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         46 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  2
  On-line CPU(s) list:   0,1
Vendor ID:               GenuineIntel
  Model name:            Intel(R) Xeon(R) CPU @ 2.20GHz
    CPU family:          6
    Model:               79
    Thread(s) per core:  2
    Core(s) per socket:  1
    Socket(s):           1
    Stepping:            0
    BogoMIPS:            4399.99
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mc
                         a cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscal
                         l nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopo
                         logy nonstop_tsc cpuid tsc_known_freq pni pclmulqdq sss
                         e3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes 
                         xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefe
                         tch invpcid_single ssbd ibrs ibpb stibp fsgsbase tsc_ad
                         just bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed ad
                         x smap xsaveopt arat md_clear arch_capabilities
Virtualization features: 
  Hypervisor vendor:     KVM
  Virtualization type:   full
Caches (sum of all):     
  L1d:                   32 KiB (1 instance)
  L1i:                   32 KiB (1 instance)
  L2:                    256 KiB (1 instance)
  L3:                    55 MiB (1 instance)
NUMA:                    
  NUMA node(s):          1
  NUMA node0 CPU(s):     0,1
Vulnerabilities:         
  Itlb multihit:         Not affected
  L1tf:                  Mitigation; PTE Inversion
  Mds:                   Vulnerable; SMT Host state unknown
  Meltdown:              Vulnerable
  Mmio stale data:       Vulnerable
  Retbleed:              Vulnerable
  Spec store bypass:     Vulnerable
  Spectre v1:            Vulnerable: __user pointer sanitization and usercopy ba
                         rriers only; no swapgs barriers
  Spectre v2:            Vulnerable, IBPB: disabled, STIBP: disabled, PBRSB-eIBR
                         S: Not affected
  Srbds:                 Not affected
  Tsx async abort:       Vulnerable
  • Operating System:
Linux 8911910c1e29 5.15.109+ #1 SMP Fri Jun 9 10:57:30 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
  • SDK versions:
Python 3.10.12
llama_cpp_python 0.2.6

Failure Information

Here's the stack trace.

---------------------------------------------------------------------------
UnsupportedOperation                      Traceback (most recent call last)
<ipython-input-4-437d167e2732> in <cell line: 2>()
      1 from llama_cpp import Llama
----> 2 llm = Llama(model_path=model_path, verbose=False)
      3 output = llm("Q: Name the planets in the solar system? A: ", max_tokens=32, stop=["Q:", "\n"], echo=True)
      4 print(output)

/usr/local/lib/python3.10/dist-packages/llama_cpp/utils.py in __enter__(self)
      9         self.errnull_file = open(os.devnull, "w")
     10 
---> 11         self.old_stdout_fileno_undup = sys.stdout.fileno()
     12         self.old_stderr_fileno_undup = sys.stderr.fileno()
     13 

UnsupportedOperation: fileno
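
The trace points at the output-suppression helper in llama_cpp/utils.py: with verbose=False, its __enter__ duplicates the stdout/stderr file descriptors, starting with sys.stdout.fileno(). On Colab, sys.stdout is (as far as I can tell) ipykernel's OutStream, which is not backed by a real file descriptor, so fileno() raises io.UnsupportedOperation. A quick check in a Colab cell confirms this:

import io
import sys

# Colab replaces sys.stdout with a stream that has a fileno() method
# but no underlying file descriptor, so the call raises.
try:
    sys.stdout.fileno()
    print("sys.stdout is backed by a real file descriptor")
except io.UnsupportedOperation:
    print("fileno() is unsupported here, matching the trace above")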

Steps to Reproduce

  1. Go to Google Colab, https://colab.research.google.com/
  2. Enter the code below
  3. Run the cell
!pip install llama-cpp-python huggingface_hub

from huggingface_hub import hf_hub_download
model_name_or_path = "TheBloke/Llama-2-7B-chat-GGUF"
model_basename = "llama-2-7b-chat.Q4_K_M.gguf"
model_path = hf_hub_download(repo_id=model_name_or_path, filename=model_basename)

from llama_cpp import Llama
llm = Llama(model_path=model_path, verbose=False)
output = llm("Q: Name the planets in the solar system? A: ", max_tokens=32, stop=["Q:", "\n"], echo=True)
print(output)

It will fail with an UnsupportedOperation: fileno error.
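
Until the library guards against this, a possible user-side workaround (just a sketch; has_real_fd is a helper introduced here, not part of llama-cpp-python) is to request quiet mode only when both streams expose a real file descriptor, and fall back to verbose output otherwise:

import io
import sys
from llama_cpp import Llama

def has_real_fd(stream):
    # True only when the stream is backed by a real file descriptor;
    # Colab's stdout/stderr raise io.UnsupportedOperation instead.
    try:
        stream.fileno()
        return True
    except (AttributeError, io.UnsupportedOperation):
        return False

# model_path comes from the hf_hub_download call above.
quiet = has_real_fd(sys.stdout) and has_real_fd(sys.stderr)
llm = Llama(model_path=model_path, verbose=not quiet)

On Colab this keeps the loader's log output but avoids the crash; in a plain terminal it still suppresses it.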

Environment info:

llama-cpp-python$ python3 --version
Python 3.10.12

Component versions:
llama_cpp_python Version: 0.2.6
diskcache Version: 5.6.3
numpy Version: 1.23.5
typing_extensions Version: 4.5.0

!git log | head -3
commit 8d75016549e2ff62a511b1119d966ffc0df5c77b
Author: Andrei Betlen <abetlen@gmail.com>
Date:   Sat Sep 16 14:57:49 2023 -0400
