SentenceTransformerEmbeddings¶
sentence-transformers library implementation for embedding generation.
Attributes¶
- 
model: the model Hugging Face Hub repo id or a path to a directory containing the model weights and configuration files. 
- 
device: the name of the device used to load the model e.g. "cuda", "mps", etc. Defaults to None.
- 
prompts: a dictionary containing prompts to be used with the model. Defaults to None.
- 
default_prompt_name: the default prompt (in prompts) that will be applied to the inputs. If not provided, then no prompt will be used. Defaults toNone.
- 
trust_remote_code: whether to allow fetching and executing remote code fetched from the repository in the Hub. Defaults to False.
- 
revision: if modelrefers to a Hugging Face Hub repository, then the revision (e.g. a branch name or a commit id) to use. Defaults to"main".
- 
token: the Hugging Face Hub token that will be used to authenticate to the Hugging Face Hub. If not provided, the HF_TOKENenvironment orhuggingface_hubpackage local configuration will be used. Defaults toNone.
- 
truncate_dim: the dimension to truncate the sentence embeddings. Defaults to None.
- 
model_kwargs: extra kwargs that will be passed to the Hugging Face transformersmodel class. Defaults toNone.
- 
tokenizer_kwargs: extra kwargs that will be passed to the Hugging Face transformerstokenizer class. Defaults toNone.
- 
config_kwargs: extra kwargs that will be passed to the Hugging Face transformersconfiguration class. Defaults toNone.
- 
precision: the dtype that will have the resulting embeddings. Defaults to "float32".
- 
normalize_embeddings: whether to normalize the embeddings so they have a length of 1. Defaults to None.
Examples¶
Generating sentence embeddings¶
from distilabel.embeddings import SentenceTransformerEmbeddings
embeddings = SentenceTransformerEmbeddings(model="mixedbread-ai/mxbai-embed-large-v1")
embeddings.load()
results = embeddings.encode(inputs=["distilabel is awesome!", "and Argilla!"])
# [
#   [-0.05447685346007347, -0.01623094454407692, ...],
#   [4.4889533455716446e-05, 0.044016145169734955, ...],
# ]