# Llamacpp

## LlamaCppLLM
Bases: LLM
llama.cpp LLM implementation running the Python bindings for the C++ code.
Attributes:

| Name | Type | Description |
|---|---|---|
| `chat_format` | `str` | the chat format to use for the model. |
| `model_path` | `RuntimeParameter[FilePath]` | contains the path to the GGUF quantized model, compatible with the installed version of the `llama.cpp` Python bindings. |
| `n_gpu_layers` | `RuntimeParameter[int]` | the number of layers to use for the GPU. Defaults to `-1`. |
| `verbose` | `RuntimeParameter[bool]` | whether to print verbose output. Defaults to `False`. |
| `_model` | `Optional[Llama]` | the Llama model instance. This attribute is meant to be used internally and should not be accessed directly. It will be set in the `load` method. |
Runtime parameters:

- `model_path`: the path to the GGUF quantized model.
- `n_gpu_layers`: the number of layers to use for the GPU. Defaults to `-1`.
- `verbose`: whether to print verbose output. Defaults to `False`.
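For reference, a minimal usage sketch follows. It assumes the class is importable from `distilabel.llms` and that `./model.gguf` is a placeholder for a GGUF file you have already downloaded; neither is specified on this page.

```python
from distilabel.llms import LlamaCppLLM

llm = LlamaCppLLM(
    model_path="./model.gguf",  # placeholder path to the GGUF quantized model
    n_gpu_layers=-1,            # default: offload all layers to the GPU
    verbose=False,              # default: suppress llama.cpp logging
)
llm.load()  # sets the internal `_model` attribute
```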
Source code in `src/distilabel/llms/llamacpp.py`
### model_name: str (property)

Returns the model name used for the LLM.
### generate(inputs, num_generations=1, max_new_tokens=128, frequency_penalty=0.0, presence_penalty=0.0, temperature=1.0, top_p=1.0)

Generates `num_generations` responses for the given input using the Llama model.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `inputs` | `List[ChatType]` | a list of inputs in chat format to generate responses for. | *required* |
| `num_generations` | `int` | the number of generations to create per input. Defaults to `1`. | `1` |
| `max_new_tokens` | `int` | the maximum number of new tokens that the model will generate. Defaults to `128`. | `128` |
| `frequency_penalty` | `float` | the repetition penalty to use for the generation. Defaults to `0.0`. | `0.0` |
| `presence_penalty` | `float` | the presence penalty to use for the generation. Defaults to `0.0`. | `0.0` |
| `temperature` | `float` | the temperature to use for the generation. Defaults to `1.0`. | `1.0` |
| `top_p` | `float` | the top-p value to use for the generation. Defaults to `1.0`. | `1.0` |
Returns:

| Type | Description |
|---|---|
| `List[GenerateOutput]` | A list of lists of strings containing the generated responses for each input. |
Source code in `src/distilabel/llms/llamacpp.py`
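A hedged usage sketch, reusing the `llm` instance from the example above and assuming inputs follow the OpenAI-style chat format that `ChatType` represents:

```python
outputs = llm.generate(
    inputs=[
        [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "What is llama.cpp?"},
        ]
    ],
    num_generations=2,  # two completions for the single input
    max_new_tokens=64,  # cap the length of each completion
    temperature=0.7,    # slightly less random than the default 1.0
)
# Per the documented return type, `outputs` has one entry per input,
# and each entry holds `num_generations` generated strings.
print(outputs[0])
```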
### load()

Loads the `Llama` model from the `model_path`.
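Conceptually, this amounts to instantiating the `Llama` class from the `llama-cpp-python` bindings with the configured runtime parameters. The sketch below illustrates that idea; it is not the actual distilabel source, and `./model.gguf` is again a placeholder path.

```python
from llama_cpp import Llama

# Roughly what `load` does under the hood: build the `Llama` instance
# that `_model` will hold, using the attributes described above.
model = Llama(
    model_path="./model.gguf",  # placeholder GGUF path
    n_gpu_layers=-1,
    chat_format=None,
    verbose=False,
)
```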