@@ -1019,8 +1019,8 @@ """Load a llama.cpp model from `model_path`.
-            model_path: Path to the model directory.
-            n_ctx: Number of tokens to keep in memory.
+            model_path: Path to the model.
+            n_ctx: Maximum context size.
             n_parts: Number of parts to split the model into. If -1, the number of parts is automatically determined.
-            seed: Random seed.
-            f16_kv: Use half-precision for key/value matrices.
-            logits_all: Return logits for all tokens, not just the vocabulary.
-            vocab_only: Only use tokens in the vocabulary.
+            seed: Random seed. 0 for random.
+            f16_kv: Use half-precision for key/value cache.
+            logits_all: Return logits for all tokens, not just the last token.
+            vocab_only: Only load the vocabulary, no weights.
             n_threads: Number of threads to use. If None, the number of threads is automatically determined.
@@ -1428,8 +1428,8 @@ """Load a llama.cpp model from `model_path`.
-            model_path: Path to the model directory.
-            n_ctx: Number of tokens to keep in memory.
+            model_path: Path to the model.
+            n_ctx: Maximum context size.
             n_parts: Number of parts to split the model into. If -1, the number of parts is automatically determined.
-            seed: Random seed.
-            f16_kv: Use half-precision for key/value matrices.
-            logits_all: Return logits for all tokens, not just the vocabulary.
-            vocab_only: Only use tokens in the vocabulary.
+            seed: Random seed. 0 for random.
+            f16_kv: Use half-precision for key/value cache.
+            logits_all: Return logits for all tokens, not just the last token.
+            vocab_only: Only load the vocabulary, no weights.
             n_threads: Number of threads to use. If None, the number of threads is automatically determined.
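For context, the parameters documented above are the keyword arguments of the `Llama` constructor in llama-cpp-python. A minimal usage sketch, assuming this version of the constructor; the model path and the specific values below are illustrative, not part of the commit:

    from llama_cpp import Llama

    # Illustrative values only; each keyword mirrors the docstring above.
    llm = Llama(
        model_path="./models/7B/ggml-model.bin",  # path to the model (illustrative)
        n_ctx=512,         # maximum context size
        n_parts=-1,        # -1: number of parts determined automatically
        seed=1337,         # random seed; 0 for a random seed
        f16_kv=True,       # half-precision key/value cache
        logits_all=False,  # return logits only for the last token
        vocab_only=False,  # True would load only the vocabulary, no weights
        n_threads=None,    # None: thread count determined automatically
    )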