LlamaCppLLM¶
llama.cpp LLM implementation running the Python bindings for the C++ code.
Attributes¶
- `model_path`: contains the path to the GGUF quantized model, compatible with the installed version of the `llama.cpp` Python bindings.
- `n_gpu_layers`: the number of layers to use for the GPU. Defaults to `-1`, meaning that the available GPU device will be used.
- `chat_format`: the chat format to use for the model. Defaults to `None`, which means the Llama format will be used.
- `n_ctx`: the context size to use for the model. Defaults to `512`.
- `n_batch`: the prompt processing maximum batch size to use for the model. Defaults to `512`.
- `seed`: random seed to use for the generation. Defaults to `4294967295`.
- `verbose`: whether to print verbose output. Defaults to `False`.
- `structured_output`: a dictionary containing the structured output configuration or, if more fine-grained control is needed, an instance of `OutlinesStructuredOutput`. Defaults to `None`.
- `extra_kwargs`: additional dictionary of keyword arguments that will be passed to the `Llama` class of the `llama_cpp` library. Defaults to `{}`.
- `_model`: the `Llama` model instance. This attribute is meant to be used internally and should not be accessed directly. It will be set in the `load` method.
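
The sketch below shows one way to instantiate the class with the attributes documented above and then call `load`, which creates the underlying `Llama` instance. The import path and the example model filename are assumptions; adjust them to your installation.

```python
# Sketch only: the import path below is an assumption and may differ in your package layout.
from distilabel.llms import LlamaCppLLM  # hypothetical import path

llm = LlamaCppLLM(
    model_path="./models/model.Q4_K_M.gguf",  # path to a GGUF quantized model (example filename)
    n_gpu_layers=-1,   # use the available GPU device for all layers (default)
    n_ctx=2048,        # context size; the default is 512
    n_batch=512,       # prompt processing maximum batch size (default)
    seed=4294967295,   # random seed for generation (default)
    verbose=False,     # do not print verbose output (default)
)

# `load` instantiates the underlying `Llama` object and stores it in the internal `_model` attribute.
llm.load()
```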
Runtime Parameters¶
- `model_path`: the path to the GGUF quantized model.
- `n_gpu_layers`: the number of layers to use for the GPU. Defaults to `-1`.
- `chat_format`: the chat format to use for the model. Defaults to `None`.
- `verbose`: whether to print verbose output. Defaults to `False`.
- `extra_kwargs`: additional dictionary of keyword arguments that will be passed to the `Llama` class of the `llama_cpp` library. Defaults to `{}`.
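
As a sketch of how `extra_kwargs` is forwarded, the snippet below passes `n_threads` (a keyword accepted by `llama_cpp.Llama`) straight through to the underlying model. The import path and filename are the same assumptions as in the previous example; how you override these values as runtime parameters depends on how your pipeline exposes them.

```python
# Any keyword accepted by llama_cpp.Llama can be forwarded via `extra_kwargs`.
llm = LlamaCppLLM(
    model_path="./models/model.Q4_K_M.gguf",  # example filename
    extra_kwargs={"n_threads": 8},  # forwarded to llama_cpp.Llama; adjust to your CPU
)
llm.load()
```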