# LlamaCppEmbeddings

`LlamaCpp` library implementation for embedding generation.
## Attributes

- `model_name`: contains the name of the GGUF quantized model, compatible with the installed version of the `llama.cpp` Python bindings.
- `model_path`: contains the path to the GGUF quantized model, compatible with the installed version of the `llama.cpp` Python bindings.
- `repo_id`: the Hugging Face Hub repository id.
- `verbose`: whether to print verbose output. Defaults to `False`.
- `n_gpu_layers`: number of layers to run on the GPU. Defaults to `-1` (use the GPU if available).
- `disable_cuda_device_placement`: whether to disable CUDA device placement. Defaults to `True`.
- `normalize_embeddings`: whether to normalize the embeddings. Defaults to `False`.
- `seed`: RNG seed; `-1` for a random seed.
- `n_ctx`: text context size; `0` means it is taken from the model.
- `n_batch`: maximum batch size for prompt processing.
- `extra_kwargs`: additional dictionary of keyword arguments that will be passed to the `Llama` class of the `llama_cpp` library (see the sketch after this list). Defaults to `{}`.
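As a minimal sketch of how `extra_kwargs` is forwarded, the snippet below passes `n_threads` (a keyword argument of the `llama_cpp.Llama` constructor) through the embeddings class; which kwargs are accepted depends on your installed `llama_cpp` version, and the model/repo values are taken from the examples further down.

```python
from distilabel.models.embeddings import LlamaCppEmbeddings

# `extra_kwargs` is forwarded to the `llama_cpp.Llama` constructor.
# `n_threads` is a standard `Llama` kwarg; adjust it to your CPU core count.
embeddings = LlamaCppEmbeddings(
    model="all-MiniLM-L6-v2-Q2_K.gguf",
    repo_id="second-state/All-MiniLM-L6-v2-Embedding-GGUF",
    extra_kwargs={"n_threads": 4},
)
```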
## Runtime Parameters

- `n_gpu_layers`: the number of layers to run on the GPU. Defaults to `-1`.
- `verbose`: whether to print verbose output. Defaults to `False`.
- `normalize_embeddings`: whether to normalize the embeddings. Defaults to `False`.
- `extra_kwargs`: additional dictionary of keyword arguments that will be passed to the `Llama` class of the `llama_cpp` library. Defaults to `{}`.
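These runtime parameters can be set directly at instantiation, as in the sketch below; when the class runs inside a `distilabel` pipeline they can typically also be overridden through the pipeline's runtime `parameters`.

```python
from distilabel.models.embeddings import LlamaCppEmbeddings

# Sketch: setting the runtime parameters at instantiation time.
embeddings = LlamaCppEmbeddings(
    model="all-MiniLM-L6-v2-Q2_K.gguf",
    repo_id="second-state/All-MiniLM-L6-v2-Embedding-GGUF",
    n_gpu_layers=0,             # CPU-only inference
    verbose=False,
    normalize_embeddings=True,  # return unit-norm vectors
)
```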
## Examples

### Generate sentence embeddings using a local model

```python
from pathlib import Path

from distilabel.models.embeddings import LlamaCppEmbeddings

# You can follow along with this example by downloading the model with the
# following terminal command, which saves it to the `Downloads` folder:
# curl -L -o ~/Downloads/all-MiniLM-L6-v2-Q2_K.gguf https://huggingface.co/second-state/All-MiniLM-L6-v2-Embedding-GGUF/resolve/main/all-MiniLM-L6-v2-Q2_K.gguf

model_path = "Downloads/"
model = "all-MiniLM-L6-v2-Q2_K.gguf"
embeddings = LlamaCppEmbeddings(
    model=model,
    model_path=str(Path.home() / model_path),
)

embeddings.load()

results = embeddings.encode(inputs=["distilabel is awesome!", "and Argilla!"])
print(results)

embeddings.unload()
```
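As a quick follow-up sketch (plain Python, no extra dependencies), you can sanity-check the two embeddings returned above by computing their cosine similarity; the exact value depends on the model and quantization.

```python
import math

# `results` is a list with one embedding (a list of floats) per input sentence.
a, b = results
dot = sum(x * y for x, y in zip(a, b))
cosine = dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))
print(f"cosine similarity: {cosine:.4f}")
```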
### Generate sentence embeddings using a Hugging Face Hub model

```python
from distilabel.models.embeddings import LlamaCppEmbeddings

# To download a private model, set the relevant environment variable
# (e.g. `HF_TOKEN`) before loading it.
repo_id = "second-state/All-MiniLM-L6-v2-Embedding-GGUF"
model = "all-MiniLM-L6-v2-Q2_K.gguf"
embeddings = LlamaCppEmbeddings(model=model, repo_id=repo_id)

embeddings.load()

results = embeddings.encode(inputs=["distilabel is awesome!", "and Argilla!"])
print(results)

embeddings.unload()
# [
# [-0.05447685346007347, -0.01623094454407692, ...],
# [4.4889533455716446e-05, 0.044016145169734955, ...],
# ]
```
### Generate sentence embeddings with CPU

```python
from pathlib import Path

from distilabel.models.embeddings import LlamaCppEmbeddings

# You can follow along with this example by downloading the model with the
# following terminal command, which saves it to the `Downloads` folder:
# curl -L -o ~/Downloads/all-MiniLM-L6-v2-Q2_K.gguf https://huggingface.co/second-state/All-MiniLM-L6-v2-Embedding-GGUF/resolve/main/all-MiniLM-L6-v2-Q2_K.gguf

model_path = "Downloads/"
model = "all-MiniLM-L6-v2-Q2_K.gguf"
embeddings = LlamaCppEmbeddings(
    model=model,
    model_path=str(Path.home() / model_path),
    n_gpu_layers=0,
    disable_cuda_device_placement=True,
)

embeddings.load()

results = embeddings.encode(inputs=["distilabel is awesome!", "and Argilla!"])
print(results)

embeddings.unload()
# [
# [-0.05447685346007347, -0.01623094454407692, ...],
# [4.4889533455716446e-05, 0.044016145169734955, ...],
# ]
```
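One more hedged sketch: if you instead construct the class with `normalize_embeddings=True`, each returned vector should have approximately unit L2 norm, which you can verify as below (using the same Hub model as the examples above).

```python
import math

from distilabel.models.embeddings import LlamaCppEmbeddings

# Same model as above, but with normalization enabled.
embeddings = LlamaCppEmbeddings(
    model="all-MiniLM-L6-v2-Q2_K.gguf",
    repo_id="second-state/All-MiniLM-L6-v2-Embedding-GGUF",
    normalize_embeddings=True,
)
embeddings.load()

results = embeddings.encode(inputs=["distilabel is awesome!", "and Argilla!"])

# Each normalized embedding should have (approximately) unit L2 norm, so the
# dot product of two embeddings is directly a cosine similarity.
for vector in results:
    print(math.sqrt(sum(x * x for x in vector)))  # ~1.0

embeddings.unload()
```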