Embedding Gallery

This section lists the `Embeddings` subclasses implemented in `distilabel`.

`embeddings`

`LlamaCppEmbeddings`

Bases: Embeddings, CudaDevicePlacementMixin

LlamaCpp library implementation for embedding generation.

Attributes:

Name	Type	Description
`model`	`str`	the name of the GGUF quantized model file, compatible with the installed version of the `llama.cpp` Python bindings. When `repo_id` is set, this is the file name inside that repository.
`model_path`	`RuntimeParameter[str]`	the directory containing the GGUF quantized model file; the file that is loaded is `model_path/model`.
`repo_id`	`RuntimeParameter[str]`	the Hugging Face Hub repository id.
`verbose`	`RuntimeParameter[bool]`	whether to print verbose output. Defaults to `False`.
`n_gpu_layers`	`RuntimeParameter[int]`	number of layers to offload to the GPU. Defaults to `-1` (offload all layers).
`disable_cuda_device_placement`	`RuntimeParameter[bool]`	whether to disable CUDA device placement. Defaults to `True`.
`normalize_embeddings`	`RuntimeParameter[bool]`	whether to normalize the embeddings. Defaults to `False`.
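`seed`	`int`	RNG seed. Defaults to `4294967295` (`0xFFFFFFFF`, llama.cpp's default seed, which selects a random seed).
`n_ctx`	`int`	text context size in tokens; `0` takes the value from the model. Defaults to `512`.
`n_batch`	`int`	maximum batch size for prompt processing. Defaults to `512`.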
`extra_kwargs`	`Optional[RuntimeParameter[Dict[str, Any]]]`	additional dictionary of keyword arguments that will be passed to the `Llama` class of `llama_cpp` library. Defaults to `{}`.
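With `normalize_embeddings=True`, each vector is rescaled before it is returned. A minimal pure-Python sketch, assuming the usual L2 (unit-length) normalization; `l2_normalize` is a hypothetical helper, not part of `distilabel` or `llama_cpp`:

```python
import math
from typing import List


def l2_normalize(vector: List[float]) -> List[float]:
    """Hypothetical helper: scale `vector` to unit Euclidean (L2) length."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        # A zero vector has no direction, so it is returned unchanged.
        return list(vector)
    return [x / norm for x in vector]


print(l2_normalize([3.0, 4.0]))  # [0.6, 0.8]
```

Normalized vectors make cosine similarity equal to a plain dot product, which is why many vector stores expect them.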

Runtime parameters

n_gpu_layers: the number of layers to offload to the GPU. Defaults to -1 (all layers).
verbose: whether to print verbose output. Defaults to False.
normalize_embeddings: whether to normalize the embeddings. Defaults to False.
extra_kwargs: additional dictionary of keyword arguments that will be passed to the Llama class of llama_cpp library. Defaults to {}.
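`extra_kwargs` only has an effect if each of its entries reaches the `Llama` constructor as a separate keyword argument. The sketch below uses a hypothetical stand-in, `fake_llama_init`, that accepts arbitrary `**kwargs` the way `Llama.__init__` does, to show the difference between unpacking the dictionary and passing it under a literal `kwargs=` keyword (the keys `n_threads` and `use_mmap` are only illustrative):

```python
from typing import Any, Dict


def fake_llama_init(model_path: str, n_ctx: int = 512, **kwargs: Any) -> Dict[str, Any]:
    """Hypothetical stand-in for a constructor that accepts arbitrary **kwargs."""
    return {"model_path": model_path, "n_ctx": n_ctx, **kwargs}


extra_kwargs = {"n_threads": 4, "use_mmap": False}

# Unpacking forwards every key as its own keyword argument.
unpacked = fake_llama_init("model.gguf", **extra_kwargs)
# Passing the dict as `kwargs=` nests it under a single key that nothing reads.
nested = fake_llama_init("model.gguf", kwargs=extra_kwargs)

print(unpacked)  # {'model_path': 'model.gguf', 'n_ctx': 512, 'n_threads': 4, 'use_mmap': False}
print(nested)    # {'model_path': 'model.gguf', 'n_ctx': 512, 'kwargs': {'n_threads': 4, 'use_mmap': False}}
```

When the target accepts `**kwargs`, the nested form raises no error, so a misconfigured call can go unnoticed.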

References

Offline inference embeddings (https://llama-cpp-python.readthedocs.io/en/stable/#embeddings)

Examples:

Generate sentence embeddings using a local model:

from pathlib import Path
from distilabel.models.embeddings import LlamaCppEmbeddings

# To follow along, download the model into your `Downloads` folder by running the
# following command in a terminal:
# curl -L -o ~/Downloads/all-MiniLM-L6-v2-Q2_K.gguf https://huggingface.co/second-state/All-MiniLM-L6-v2-Embedding-GGUF/resolve/main/all-MiniLM-L6-v2-Q2_K.gguf

model_path = "Downloads/"
model = "all-MiniLM-L6-v2-Q2_K.gguf"
embeddings = LlamaCppEmbeddings(
    model=model,
    model_path=str(Path.home() / model_path),
)

embeddings.load()

results = embeddings.encode(inputs=["distilabel is awesome!", "and Argilla!"])
print(results)
embeddings.unload()

Generate sentence embeddings using a HuggingFace Hub model:

from distilabel.models.embeddings import LlamaCppEmbeddings
# Set the `HF_TOKEN` environment variable first if the model repository is private.

repo_id = "second-state/All-MiniLM-L6-v2-Embedding-GGUF"
model = "all-MiniLM-L6-v2-Q2_K.gguf"
embeddings = LlamaCppEmbeddings(model=model, repo_id=repo_id)

embeddings.load()

results = embeddings.encode(inputs=["distilabel is awesome!", "and Argilla!"])
print(results)
embeddings.unload()
# [
#   [-0.05447685346007347, -0.01623094454407692, ...],
#   [4.4889533455716446e-05, 0.044016145169734955, ...],
# ]

Generate sentence embeddings on the CPU:

from pathlib import Path
from distilabel.models.embeddings import LlamaCppEmbeddings

# To follow along, download the model into your `Downloads` folder by running the
# following command in a terminal:
# curl -L -o ~/Downloads/all-MiniLM-L6-v2-Q2_K.gguf https://huggingface.co/second-state/All-MiniLM-L6-v2-Embedding-GGUF/resolve/main/all-MiniLM-L6-v2-Q2_K.gguf

model_path = "Downloads/"
model = "all-MiniLM-L6-v2-Q2_K.gguf"
embeddings = LlamaCppEmbeddings(
    model=model,
    model_path=str(Path.home() / model_path),
    n_gpu_layers=0,
    disable_cuda_device_placement=True,
)

embeddings.load()

results = embeddings.encode(inputs=["distilabel is awesome!", "and Argilla!"])
print(results)
embeddings.unload()
# [
#   [-0.05447685346007347, -0.01623094454407692, ...],
#   [4.4889533455716446e-05, 0.044016145169734955, ...],
# ]
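`n_gpu_layers` controls how much of the model is offloaded: `-1` offloads every layer, `0` (as in the CPU example above) keeps everything on the CPU, and a positive value offloads up to that many layers. A hypothetical helper, `layers_on_gpu`, approximating that rule; the exact count is decided inside `llama.cpp`, and rejecting other negative values is a choice of this sketch:

```python
def layers_on_gpu(n_gpu_layers: int, total_layers: int) -> int:
    """Hypothetical helper: roughly how many of `total_layers` end up on the GPU."""
    if n_gpu_layers == -1:
        # -1 is the "offload every layer" sentinel.
        return total_layers
    if n_gpu_layers < 0:
        raise ValueError("use -1 for all layers, or a value >= 0")
    # 0 keeps the whole model on the CPU; requests beyond the model size are capped.
    return min(n_gpu_layers, total_layers)


# A 6-layer model such as all-MiniLM-L6-v2.
for requested in (-1, 0, 4, 100):
    print(requested, "->", layers_on_gpu(requested, total_layers=6))
# prints, one per line: -1 -> 6, 0 -> 0, 4 -> 4, 100 -> 6
```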

Source code in src/distilabel/models/embeddings/llamacpp.py

class LlamaCppEmbeddings(Embeddings, CudaDevicePlacementMixin):
    """`LlamaCpp` library implementation for embedding generation.

    Attributes:
        model: the name of the GGUF quantized model file, compatible with the installed
            version of the `llama.cpp` Python bindings. When `repo_id` is set, this is the
            file name inside that repository.
        model_path: the directory containing the GGUF quantized model file; the file that
            is loaded is `model_path/model`.
        repo_id: the Hugging Face Hub repository id.
        verbose: whether to print verbose output. Defaults to `False`.
        n_gpu_layers: number of layers to offload to the GPU. Defaults to `-1` (offload all layers).
        disable_cuda_device_placement: whether to disable CUDA device placement. Defaults to `True`.
        normalize_embeddings: whether to normalize the embeddings. Defaults to `False`.
        seed: RNG seed. Defaults to `4294967295` (`0xFFFFFFFF`, llama.cpp's default seed).
        n_ctx: text context size in tokens; `0` takes the value from the model. Defaults to `512`.
        n_batch: maximum batch size for prompt processing. Defaults to `512`.
        extra_kwargs: additional dictionary of keyword arguments that will be passed to the
            `Llama` class of `llama_cpp` library. Defaults to `{}`.

    Runtime parameters:
        - `n_gpu_layers`: the number of layers to offload to the GPU. Defaults to `-1` (all layers).
        - `verbose`: whether to print verbose output. Defaults to `False`.
        - `normalize_embeddings`: whether to normalize the embeddings. Defaults to `False`.
        - `extra_kwargs`: additional dictionary of keyword arguments that will be passed to the
            `Llama` class of `llama_cpp` library. Defaults to `{}`.

    References:
        - [Offline inference embeddings](https://llama-cpp-python.readthedocs.io/en/stable/#embeddings)

    Examples:
        Generate sentence embeddings using a local model:

        ```python
        from pathlib import Path
        from distilabel.models.embeddings import LlamaCppEmbeddings

        # To follow along, download the model into your `Downloads` folder by running the
        # following command in a terminal:
        # curl -L -o ~/Downloads/all-MiniLM-L6-v2-Q2_K.gguf https://huggingface.co/second-state/All-MiniLM-L6-v2-Embedding-GGUF/resolve/main/all-MiniLM-L6-v2-Q2_K.gguf

        model_path = "Downloads/"
        model = "all-MiniLM-L6-v2-Q2_K.gguf"
        embeddings = LlamaCppEmbeddings(
            model=model,
            model_path=str(Path.home() / model_path),
        )

        embeddings.load()

        results = embeddings.encode(inputs=["distilabel is awesome!", "and Argilla!"])
        print(results)
        embeddings.unload()
        ```

        Generate sentence embeddings using a HuggingFace Hub model:

        ```python
        from distilabel.models.embeddings import LlamaCppEmbeddings
        # Set the `HF_TOKEN` environment variable first if the model repository is private.

        repo_id = "second-state/All-MiniLM-L6-v2-Embedding-GGUF"
        model = "all-MiniLM-L6-v2-Q2_K.gguf"
        embeddings = LlamaCppEmbeddings(model=model, repo_id=repo_id)

        embeddings.load()

        results = embeddings.encode(inputs=["distilabel is awesome!", "and Argilla!"])
        print(results)
        embeddings.unload()
        # [
        #   [-0.05447685346007347, -0.01623094454407692, ...],
        #   [4.4889533455716446e-05, 0.044016145169734955, ...],
        # ]
        ```

        Generate sentence embeddings on the CPU:

        ```python
        from pathlib import Path
        from distilabel.models.embeddings import LlamaCppEmbeddings

        # To follow along, download the model into your `Downloads` folder by running the
        # following command in a terminal:
        # curl -L -o ~/Downloads/all-MiniLM-L6-v2-Q2_K.gguf https://huggingface.co/second-state/All-MiniLM-L6-v2-Embedding-GGUF/resolve/main/all-MiniLM-L6-v2-Q2_K.gguf

        model_path = "Downloads/"
        model = "all-MiniLM-L6-v2-Q2_K.gguf"
        embeddings = LlamaCppEmbeddings(
            model=model,
            model_path=str(Path.home() / model_path),
            n_gpu_layers=0,
            disable_cuda_device_placement=True,
        )

        embeddings.load()

        results = embeddings.encode(inputs=["distilabel is awesome!", "and Argilla!"])
        print(results)
        embeddings.unload()
        # [
        #   [-0.05447685346007347, -0.01623094454407692, ...],
        #   [4.4889533455716446e-05, 0.044016145169734955, ...],
        # ]
        ```


    """

    model: str = Field(
        description="The name of the model to use for embeddings.",
    )

    model_path: RuntimeParameter[str] = Field(
        default=None,
        description="The path to the GGUF quantized model, compatible with the installed version of the `llama.cpp` Python bindings.",
    )

    repo_id: RuntimeParameter[str] = Field(
        default=None, description="The Hugging Face Hub repository id.", exclude=True
    )

    n_gpu_layers: RuntimeParameter[int] = Field(
        default=-1,
        description="The number of layers that will be loaded in the GPU.",
    )

    n_ctx: int = 512
    n_batch: int = 512
    seed: int = 4294967295

    normalize_embeddings: RuntimeParameter[bool] = Field(
        default=False,
        description="Whether to normalize the embeddings.",
    )
    verbose: RuntimeParameter[bool] = Field(
        default=False,
        description="Whether to print verbose output from llama.cpp library.",
    )
    extra_kwargs: Optional[RuntimeParameter[Dict[str, Any]]] = Field(
        default_factory=dict,
        description="Additional dictionary of keyword arguments that will be passed to the"
        " `Llama` class of `llama_cpp` library. See all the supported arguments at: "
        "https://llama-cpp-python.readthedocs.io/en/latest/api-reference/#llama_cpp.Llama.__init__",
    )
    _model: Optional["Llama"] = PrivateAttr(None)

    def load(self) -> None:
        """Loads the `gguf` model using either the path or the Hugging Face Hub repository id."""
        super().load()
        CudaDevicePlacementMixin.load(self)

        try:
            from llama_cpp import Llama
        except ImportError as ie:
            raise ImportError(
                "`llama-cpp-python` package is not installed. Please install it using"
                " `pip install 'distilabel[llama-cpp]'`."
            ) from ie

        if self.repo_id is not None:
            # use repo_id to download the model
            from huggingface_hub.utils import validate_repo_id

            validate_repo_id(self.repo_id)
            self._model = Llama.from_pretrained(
                repo_id=self.repo_id,
                filename=self.model,
                n_gpu_layers=self.n_gpu_layers,
                seed=self.seed,
                n_ctx=self.n_ctx,
                n_batch=self.n_batch,
                verbose=self.verbose,
                embedding=True,
                **(self.extra_kwargs or {}),
            )
        elif self.model_path is not None:
            self._model = Llama(
                model_path=str(Path(self.model_path) / self.model),
                n_gpu_layers=self.n_gpu_layers,
                seed=self.seed,
                n_ctx=self.n_ctx,
                n_batch=self.n_batch,
                verbose=self.verbose,
                embedding=True,
                **(self.extra_kwargs or {}),
            )
        else:
            raise ValueError("Either 'model_path' or 'repo_id' must be provided")

    def unload(self) -> None:
        """Unloads the `gguf` model."""
        CudaDevicePlacementMixin.unload(self)
        self._model.close()
        super().unload()

    @property
    def model_name(self) -> str:
        """Returns the name of the model."""
        return self.model

    def encode(self, inputs: List[str]) -> List[List[Union[int, float]]]:
        """Generates embeddings for the provided inputs.

        Args:
            inputs: a list of texts for which an embedding has to be generated.

        Returns:
            The generated embeddings.
        """
        return self._model.embed(inputs, normalize=self.normalize_embeddings)

`model_name` `property`

Returns the name of the model.

`load()`

Loads the gguf model using either the path or the Hugging Face Hub repository id.
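The branch order in `load()` decides which source wins when both `repo_id` and `model_path` are set. A hypothetical helper, `resolve_gguf_source`, that mirrors that order without loading anything; the `(kind, location)` tuple it returns is an illustration, not a `distilabel` API:

```python
from pathlib import Path
from typing import Optional, Tuple


def resolve_gguf_source(
    model: str, model_path: Optional[str], repo_id: Optional[str]
) -> Tuple[str, str]:
    """Hypothetical helper mirroring the branch order in `load()`."""
    if repo_id is not None:
        # `repo_id` is checked first: `model` is the GGUF file name inside the Hub repo.
        return ("hub", f"{repo_id}/{model}")
    if model_path is not None:
        # Local file: `model_path` is the directory and `model` the file inside it.
        return ("local", str(Path(model_path) / model))
    raise ValueError("Either 'model_path' or 'repo_id' must be provided")


print(resolve_gguf_source(
    "all-MiniLM-L6-v2-Q2_K.gguf", None, "second-state/All-MiniLM-L6-v2-Embedding-GGUF"
))
# ('hub', 'second-state/All-MiniLM-L6-v2-Embedding-GGUF/all-MiniLM-L6-v2-Q2_K.gguf')
```

Because `repo_id` is checked first, setting it means `model_path` is ignored.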

`unload()`

Unloads the gguf model.

`encode(inputs)`

Generates embeddings for the provided inputs.

Parameters:

Name	Type	Description	Default
`inputs`	`List[str]`	a list of texts for which an embedding has to be generated.	required

Returns:

Type	Description
`List[List[Union[int, float]]]`	The generated embeddings.
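The returned list holds one vector per input, in input order. A common next step is comparing two of them; a minimal cosine-similarity sketch in pure Python, using toy 3-dimensional vectors in place of real `encode()` output:

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Stand-in for `results = embeddings.encode(inputs=[...])`; toy values, not model output.
results: List[List[float]] = [[1.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
print(round(cosine_similarity(results[0], results[1]), 3))  # 0.5
```

With `normalize_embeddings=True`, both norms are 1 and the score reduces to the dot product.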

`SentenceTransformerEmbeddings`

Bases: Embeddings, CudaDevicePlacementMixin

sentence-transformers library implementation for embedding generation.

Attributes:

Name	Type	Description
`model`	`str`	the model Hugging Face Hub repo id or a path to a directory containing the model weights and configuration files.
`device`	`Optional[RuntimeParameter[str]]`	the name of the device used to load the model e.g. "cuda", "mps", etc. Defaults to `None`.
`prompts`	`Optional[Dict[str, str]]`	a dictionary containing prompts to be used with the model. Defaults to `None`.
`default_prompt_name`	`Optional[str]`	the default prompt (in `prompts`) that will be applied to the inputs. If not provided, then no prompt will be used. Defaults to `None`.
`trust_remote_code`	`bool`	whether to allow fetching and executing remote code fetched from the repository in the Hub. Defaults to `False`.
`revision`	`Optional[str]`	if `model` refers to a Hugging Face Hub repository, the revision (e.g. a branch name or a commit id) to use. Defaults to `None`, which resolves to `"main"`.
`token`	`Optional[str]`	the Hugging Face Hub token used to authenticate to the Hugging Face Hub. If not provided, the `HF_TOKEN` environment variable or the local `huggingface_hub` configuration will be used. Defaults to `None`.
`truncate_dim`	`Optional[int]`	the dimension to truncate the sentence embeddings. Defaults to `None`.
`model_kwargs`	`Optional[Dict[str, Any]]`	extra kwargs that will be passed to the Hugging Face `transformers` model class. Defaults to `None`.
`tokenizer_kwargs`	`Optional[Dict[str, Any]]`	extra kwargs that will be passed to the Hugging Face `transformers` tokenizer class. Defaults to `None`.
`config_kwargs`	`Optional[Dict[str, Any]]`	extra kwargs that will be passed to the Hugging Face `transformers` configuration class. Defaults to `None`.
`precision`	`Optional[Literal['float32', 'int8', 'uint8', 'binary', 'ubinary']]`	the dtype of the resulting embeddings. Defaults to `"float32"`.
`normalize_embeddings`	`RuntimeParameter[bool]`	whether to normalize the embeddings so they have a length of 1. Defaults to `True`.
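With `precision="binary"` or `"ubinary"`, each float dimension collapses to a single bit. A pure-Python sketch, assuming the scheme used by sentence-transformers' `quantize_embeddings` (a dimension becomes `1` when it is positive, bits are packed 8 per byte with the first dimension in the most significant bit, and `"binary"` subtracts 128 so the bytes fit in int8); `pack_sign_bits` is hypothetical:

```python
from typing import List, Sequence


def pack_sign_bits(vector: Sequence[float], signed: bool = False) -> List[int]:
    """Hypothetical helper: threshold at 0 and pack 8 dimensions per byte, MSB first."""
    packed: List[int] = []
    for start in range(0, len(vector), 8):
        chunk = vector[start : start + 8]
        byte = 0
        for i in range(8):
            # A short final chunk is padded with 0 bits.
            bit = 1 if i < len(chunk) and chunk[i] > 0 else 0
            byte = (byte << 1) | bit
        # "ubinary" keeps 0..255; "binary" shifts into the int8 range -128..127.
        packed.append(byte - 128 if signed else byte)
    return packed


vec = [0.5, -0.2, 0.1, -0.3, 0.9, 0.0, 0.2, -0.1]
print(pack_sign_bits(vec))               # [170]  ("ubinary")
print(pack_sign_bits(vec, signed=True))  # [42]   ("binary")
```

A 1024-dimensional embedding therefore shrinks to 128 bytes.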

Examples:

Generating sentence embeddings:

from distilabel.models import SentenceTransformerEmbeddings

embeddings = SentenceTransformerEmbeddings(model="mixedbread-ai/mxbai-embed-large-v1")

embeddings.load()

results = embeddings.encode(inputs=["distilabel is awesome!", "and Argilla!"])
# [
#   [-0.05447685346007347, -0.01623094454407692, ...],
#   [4.4889533455716446e-05, 0.044016145169734955, ...],
# ]
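`truncate_dim` keeps only the leading dimensions of each vector, which suits Matryoshka-trained models. A sketch, assuming truncation is applied before `normalize_embeddings` so truncated vectors still have length 1; `truncate_and_normalize` is hypothetical:

```python
import math
from typing import List, Optional, Sequence


def truncate_and_normalize(
    vector: Sequence[float], truncate_dim: Optional[int], normalize: bool
) -> List[float]:
    """Hypothetical helper: keep the first `truncate_dim` dims, then optionally L2-normalize."""
    # Slicing with `None` keeps every dimension.
    out = list(vector[:truncate_dim])
    if normalize:
        norm = math.sqrt(sum(x * x for x in out))
        if norm > 0.0:
            out = [x / norm for x in out]
    return out


print(truncate_and_normalize([3.0, 4.0, 12.0], truncate_dim=2, normalize=True))  # [0.6, 0.8]
```

Truncating a vector that was normalized at full size would generally leave it shorter than 1, which is why the order matters.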

Source code in src/distilabel/models/embeddings/sentence_transformers.py

class SentenceTransformerEmbeddings(Embeddings, CudaDevicePlacementMixin):
    """`sentence-transformers` library implementation for embedding generation.

    Attributes:
        model: the model Hugging Face Hub repo id or a path to a directory containing the
            model weights and configuration files.
        device: the name of the device used to load the model e.g. "cuda", "mps", etc.
            Defaults to `None`.
        prompts: a dictionary containing prompts to be used with the model. Defaults to
            `None`.
        default_prompt_name: the default prompt (in `prompts`) that will be applied to the
            inputs. If not provided, then no prompt will be used. Defaults to `None`.
        trust_remote_code: whether to allow fetching and executing remote code fetched
            from the repository in the Hub. Defaults to `False`.
        revision: if `model` refers to a Hugging Face Hub repository, the revision
            (e.g. a branch name or a commit id) to use. Defaults to `None` (`"main"`).
        token: the Hugging Face Hub token used to authenticate to the Hugging Face Hub.
            If not provided, the `HF_TOKEN` environment variable or the local
            `huggingface_hub` configuration will be used. Defaults to `None`.
        truncate_dim: the dimension to truncate the sentence embeddings. Defaults to `None`.
        model_kwargs: extra kwargs that will be passed to the Hugging Face `transformers`
            model class. Defaults to `None`.
        tokenizer_kwargs: extra kwargs that will be passed to the Hugging Face `transformers`
            tokenizer class. Defaults to `None`.
        config_kwargs: extra kwargs that will be passed to the Hugging Face `transformers`
            configuration class. Defaults to `None`.
        precision: the dtype of the resulting embeddings. Defaults to `"float32"`.
        normalize_embeddings: whether to normalize the embeddings so they have a length
            of 1. Defaults to `True`.

    Examples:
        Generating sentence embeddings:

        ```python
        from distilabel.models import SentenceTransformerEmbeddings

        embeddings = SentenceTransformerEmbeddings(model="mixedbread-ai/mxbai-embed-large-v1")

        embeddings.load()

        results = embeddings.encode(inputs=["distilabel is awesome!", "and Argilla!"])
        # [
        #   [-0.05447685346007347, -0.01623094454407692, ...],
        #   [4.4889533455716446e-05, 0.044016145169734955, ...],
        # ]
        ```
    """

    model: str
    device: Optional[RuntimeParameter[str]] = Field(
        default=None,
        description="The device to be used to load the model. If `None`, then it"
        " will check if a GPU can be used.",
    )
    prompts: Optional[Dict[str, str]] = None
    default_prompt_name: Optional[str] = None
    trust_remote_code: bool = False
    revision: Optional[str] = None
    token: Optional[str] = None
    truncate_dim: Optional[int] = None
    model_kwargs: Optional[Dict[str, Any]] = None
    tokenizer_kwargs: Optional[Dict[str, Any]] = None
    config_kwargs: Optional[Dict[str, Any]] = None
    precision: Optional[Literal["float32", "int8", "uint8", "binary", "ubinary"]] = (
        "float32"
    )
    normalize_embeddings: RuntimeParameter[bool] = Field(
        default=True,
        description="Whether to normalize the embeddings so the generated vectors"
        " have a length of 1 or not.",
    )

    _model: Union["SentenceTransformer", None] = PrivateAttr(None)

    def load(self) -> None:
        """Loads the Sentence Transformer model."""
        super().load()

        if self.device == "cuda":
            CudaDevicePlacementMixin.load(self)

        try:
            from sentence_transformers import SentenceTransformer
        except ImportError as e:
            raise ImportError(
                "`sentence-transformers` package is not installed. Please install it using"
                " `pip install 'distilabel[sentence-transformers]'`."
            ) from e

        self._model = SentenceTransformer(
            model_name_or_path=self.model,
            device=self.device,
            prompts=self.prompts,
            default_prompt_name=self.default_prompt_name,
            trust_remote_code=self.trust_remote_code,
            revision=self.revision,
            token=self.token,
            truncate_dim=self.truncate_dim,
            model_kwargs=self.model_kwargs,
            tokenizer_kwargs=self.tokenizer_kwargs,
            config_kwargs=self.config_kwargs,
        )

    @property
    def model_name(self) -> str:
        """Returns the name of the model."""
        return self.model

    def encode(self, inputs: List[str]) -> List[List[Union[int, float]]]:
        """Generates embeddings for the provided inputs.

        Args:
            inputs: a list of texts for which an embedding has to be generated.

        Returns:
            The generated embeddings.
        """
        return self._model.encode(  # type: ignore
            sentences=inputs,
            batch_size=len(inputs),
            convert_to_numpy=True,
            precision=self.precision,  # type: ignore
            normalize_embeddings=self.normalize_embeddings,  # type: ignore
        ).tolist()  # type: ignore

    def unload(self) -> None:
        """Unloads the Sentence Transformer model."""
        del self._model
        if self.device == "cuda":
            CudaDevicePlacementMixin.unload(self)
        super().unload()

`model_name` `property`

Returns the name of the model.

`load()`

Loads the Sentence Transformer model.

`encode(inputs)`

Generates embeddings for the provided inputs.

Parameters:

Name	Type	Description	Default
`inputs`	`List[str]`	a list of texts for which an embedding has to be generated.	required

Returns:

Type	Description
`List[List[Union[int, float]]]`	The generated embeddings.
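`encode()` passes `batch_size=len(inputs)`, so the whole input list is encoded as one batch. For long lists, a caller could split the work itself. A hypothetical wrapper, `encode_in_chunks`, demonstrated with a toy encoder; with a loaded model, `encode_fn` would be something like `lambda batch: embeddings.encode(inputs=batch)`:

```python
from typing import Callable, List, Union

Vector = List[Union[int, float]]


def encode_in_chunks(
    encode_fn: Callable[[List[str]], List[Vector]],
    inputs: List[str],
    chunk_size: int,
) -> List[Vector]:
    """Hypothetical wrapper: call `encode_fn` on consecutive slices and keep input order."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be >= 1")
    results: List[Vector] = []
    for start in range(0, len(inputs), chunk_size):
        results.extend(encode_fn(inputs[start : start + chunk_size]))
    return results


# Toy encoder standing in for `embeddings.encode`; it records each batch size it receives.
seen_batches: List[int] = []


def toy_encode(batch: List[str]) -> List[Vector]:
    seen_batches.append(len(batch))
    return [[float(len(text))] for text in batch]


out = encode_in_chunks(toy_encode, ["a", "bb", "ccc", "dddd", "eeeee"], chunk_size=2)
print(out)           # [[1.0], [2.0], [3.0], [4.0], [5.0]]
print(seen_batches)  # [2, 2, 1]
```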

`SGLangEmbeddings`

Bases: Embeddings, CudaDevicePlacementMixin

sglang library implementation for embedding generation.

Attributes:

Name	Type	Description
`model`	`str`	the model Hugging Face Hub repo id or a path to a directory containing the model weights and configuration files.
`dtype`	`str`	the data type to use for the model. Defaults to `auto`.
`trust_remote_code`	`bool`	whether to trust the remote code when loading the model. Defaults to `False`.
`quantization`	`Optional[str]`	the quantization mode to use for the model. Defaults to `None`.
`revision`	`Optional[str]`	the revision of the model to load. Defaults to `None`.
`seed`	`int`	the seed to use for the random number generator. Defaults to `0`.
`extra_kwargs`	`Optional[RuntimeParameter[Dict[str, Any]]]`	additional dictionary of keyword arguments that will be passed to the `Engine` class of `sglang` library. Defaults to `{}`.
`_model`	`Engine`	the `SGLang` model instance. This attribute is meant to be used internally and should not be accessed directly. It will be set in the `load` method.
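In `load()`, `extra_kwargs` is unpacked with `**` next to arguments that are already passed explicitly (`model_path`, `dtype`, `trust_remote_code`, `quantization`, `revision`, `random_seed`). Repeating one of those keys in `extra_kwargs` is rejected by Python before the engine is even built. A sketch with a hypothetical stand-in, `fake_engine`, that reproduces the calling pattern (the key `mem_fraction_static` is only illustrative):

```python
from typing import Any, Dict


def fake_engine(
    model_path: str, dtype: str = "auto", random_seed: int = 0, **kwargs: Any
) -> Dict[str, Any]:
    """Hypothetical stand-in for a constructor called with explicit args plus **extra_kwargs."""
    return {"model_path": model_path, "dtype": dtype, "random_seed": random_seed, **kwargs}


def build(extra_kwargs: Dict[str, Any]) -> Dict[str, Any]:
    # Same calling pattern as `load()`: fixed keywords first, then the user's dict unpacked.
    return fake_engine(
        "intfloat/e5-mistral-7b-instruct", dtype="auto", random_seed=0, **extra_kwargs
    )


print(build({"mem_fraction_static": 0.8}))
# {'model_path': 'intfloat/e5-mistral-7b-instruct', 'dtype': 'auto', 'random_seed': 0, 'mem_fraction_static': 0.8}

try:
    build({"dtype": "float16"})  # collides with the explicit `dtype=` keyword
except TypeError as err:
    print("rejected:", err)
```

Set those options through their dedicated attributes (e.g. `dtype=`, `seed=`) and keep `extra_kwargs` for everything else.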

References

https://github.com/sgl-project/sglang/blob/main/python/sglang/srt/server_args.py

Examples:

Generating sentence embeddings:

if __name__ == "__main__":
    # SGLang starts worker subprocesses, so keep the entry point behind this guard.
    from distilabel.models import SGLangEmbeddings
    embeddings = SGLangEmbeddings(model="intfloat/e5-mistral-7b-instruct")
    embeddings.load()
    results = embeddings.encode(inputs=["distilabel is awesome!", "and Argilla!"])
    print(results)
    # [
    #   [0.0203704833984375, -0.0060882568359375, ...],
    #   [0.02398681640625, 0.0177001953125, ...],
    # ]

Source code in src/distilabel/models/embeddings/sglang.py

class SGLangEmbeddings(Embeddings, CudaDevicePlacementMixin):
    """`sglang` library implementation for embedding generation.

    Attributes:
        model: the model Hugging Face Hub repo id or a path to a directory containing the
            model weights and configuration files.
        dtype: the data type to use for the model. Defaults to `auto`.
        trust_remote_code: whether to trust the remote code when loading the model. Defaults
            to `False`.
        quantization: the quantization mode to use for the model. Defaults to `None`.
        revision: the revision of the model to load. Defaults to `None`.
        seed: the seed to use for the random number generator. Defaults to `0`.
        extra_kwargs: additional dictionary of keyword arguments that will be passed to the
            `Engine` class of `sglang` library. Defaults to `{}`.
        _model: the `SGLang` model instance. This attribute is meant to be used internally
            and should not be accessed directly. It will be set in the `load` method.

    References:
        - https://github.com/sgl-project/sglang/blob/main/python/sglang/srt/server_args.py

    Examples:
        Generating sentence embeddings:

        ```python
        if __name__ == "__main__":

            from distilabel.models import SGLangEmbeddings
            embeddings = SGLangEmbeddings(model="intfloat/e5-mistral-7b-instruct")
            embeddings.load()
            results = embeddings.encode(inputs=["distilabel is awesome!", "and Argilla!"])
            print(results)
            # [
            #   [0.0203704833984375, -0.0060882568359375, ...],
            #   [0.02398681640625, 0.0177001953125, ...],
            # ]
        ```
    """

    model: str
    dtype: str = "auto"
    trust_remote_code: bool = False
    quantization: Optional[str] = None
    revision: Optional[str] = None

    seed: int = 0

    extra_kwargs: Optional[RuntimeParameter[Dict[str, Any]]] = Field(
        default_factory=dict,
        description="Additional dictionary of keyword arguments that will be passed to the"
        " `Engine` class of `sglang` library. See all the supported arguments at: "
        "https://github.com/sgl-project/sglang/blob/main/python/sglang/srt/entrypoints/engine.py",
    )

    _model: "Engine" = PrivateAttr(None)

    def load(self) -> None:
        """Loads the `sglang` model using either the path or the Hugging Face Hub repository id."""
        super().load()

        CudaDevicePlacementMixin.load(self)

        try:
            from sglang import Engine
        except ImportError as err:
            raise ImportError(
                "`sglang` is not installed. Please install it following the instructions at"
                " https://docs.sglang.ai/start/install.html."
            ) from err

        self._model = Engine(
            model_path=self.model,
            dtype=self.dtype,
            trust_remote_code=self.trust_remote_code,
            quantization=self.quantization,
            revision=self.revision,
            random_seed=self.seed,
            **self.extra_kwargs,  # type: ignore
        )

    def unload(self) -> None:
        """Unloads the `SGLang` model."""
        self._model = None
        CudaDevicePlacementMixin.unload(self)
        super().unload()

    @property
    def model_name(self) -> str:
        """Returns the name of the model."""
        return self.model

    def encode(self, inputs: List[str]) -> List[List[Union[int, float]]]:
        """Generates embeddings for the provided inputs.

        Args:
            inputs: a list of texts for which an embedding has to be generated.

        Returns:
            The generated embeddings.
        """
        return [output["embedding"] for output in self._model.encode(inputs)]

`model_name` `property`

Returns the name of the model.

`load()`

Loads the sglang model using either the path or the Hugging Face Hub repository id.

`unload()`

Unloads the SGLang model.

`encode(inputs)`

Generates embeddings for the provided inputs.

Parameters:

Name	Type	Description	Default
`inputs`	`List[str]`	a list of texts for which an embedding has to be generated.	required

Returns:

Type	Description
`List[List[Union[int, float]]]`	The generated embeddings.

Source code in src/distilabel/models/embeddings/sglang.py

def encode(self, inputs: List[str]) -> List[List[Union[int, float]]]:
    """Generates embeddings for the provided inputs.

    Args:
        inputs: a list of texts for which an embedding has to be generated.

    Returns:
        The generated embeddings.
    """
    return [output["embedding"] for output in self._model.encode(inputs)]
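`encode` assumes that `Engine.encode` returns one dictionary per input, in input order, with the vector stored under the `"embedding"` key, and keeps only that value. The sketch below replays the same list comprehension on hand-written outputs; the extra `"meta_info"` key and all numbers are made up for illustration and are not real SGLang output.

```python
from typing import Any, Dict, List, Union


def extract_embeddings(outputs: List[Dict[str, Any]]) -> List[List[Union[int, float]]]:
    """Same comprehension as `SGLangEmbeddings.encode`, applied to raw engine outputs."""
    return [output["embedding"] for output in outputs]


# Hypothetical engine outputs: one dict per input, in input order.
fake_outputs = [
    {"embedding": [0.1, 0.2, 0.3], "meta_info": {"prompt_tokens": 4}},
    {"embedding": [0.4, 0.5, 0.6], "meta_info": {"prompt_tokens": 3}},
]

print(extract_embeddings(fake_outputs))  # [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
```

Any other keys in each output are discarded, and an output without an `"embedding"` key would raise a `KeyError`.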

`vLLMEmbeddings`

Bases: Embeddings, CudaDevicePlacementMixin

vllm library implementation for embedding generation.

Attributes:

Name	Type	Description
`model`	`str`	the model Hugging Face Hub repo id or a path to a directory containing the model weights and configuration files.
`dtype`	`str`	the data type to use for the model. Defaults to `auto`.
`trust_remote_code`	`bool`	whether to trust the remote code when loading the model. Defaults to `False`.
`quantization`	`Optional[str]`	the quantization mode to use for the model. Defaults to `None`.
`revision`	`Optional[str]`	the revision of the model to load. Defaults to `None`.
`enforce_eager`	`bool`	whether to enforce eager execution. Defaults to `True`.
`seed`	`int`	the seed to use for the random number generator. Defaults to `0`.
`extra_kwargs`	`Optional[RuntimeParameter[Dict[str, Any]]]`	additional dictionary of keyword arguments that will be passed to the `LLM` class of `vllm` library. Defaults to `{}`.
`_model`	`LLM`	the `vLLM` model instance. This attribute is meant to be used internally and should not be accessed directly. It will be set in the `load` method.

References

Offline inference embeddings: https://docs.vllm.ai/en/latest/getting_started/examples/offline_inference_embedding.html

Examples:

Generating sentence embeddings:

from distilabel.models import vLLMEmbeddings

embeddings = vLLMEmbeddings(model="intfloat/e5-mistral-7b-instruct")

embeddings.load()

results = embeddings.encode(inputs=["distilabel is awesome!", "and Argilla!"])
# [
#   [-0.05447685346007347, -0.01623094454407692, ...],
#   [4.4889533455716446e-05, 0.044016145169734955, ...],
# ]
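The vectors returned by `encode` are plain Python lists, so they can be compared directly. A minimal sketch of cosine similarity between two such vectors follows; the three-element vectors are made up for illustration, since real `e5-mistral-7b-instruct` embeddings are much longer.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Stand-ins for `results[0]` and `results[1]` from the example above.
results: List[List[float]] = [[1.0, 2.0, 2.0], [2.0, 1.0, 2.0]]
print(round(cosine_similarity(results[0], results[1]), 4))  # 0.8889
```

For large batches a vectorized library is usually preferable; the pure-Python version only shows the arithmetic (dot product divided by the product of the norms).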

Source code in src/distilabel/models/embeddings/vllm.py

class vLLMEmbeddings(Embeddings, CudaDevicePlacementMixin):
    """`vllm` library implementation for embedding generation.

    Attributes:
        model: the model Hugging Face Hub repo id or a path to a directory containing the
            model weights and configuration files.
        dtype: the data type to use for the model. Defaults to `auto`.
        trust_remote_code: whether to trust the remote code when loading the model. Defaults
            to `False`.
        quantization: the quantization mode to use for the model. Defaults to `None`.
        revision: the revision of the model to load. Defaults to `None`.
        enforce_eager: whether to enforce eager execution. Defaults to `True`.
        seed: the seed to use for the random number generator. Defaults to `0`.
        extra_kwargs: additional dictionary of keyword arguments that will be passed to the
            `LLM` class of `vllm` library. Defaults to `{}`.
        _model: the `vLLM` model instance. This attribute is meant to be used internally
            and should not be accessed directly. It will be set in the `load` method.

    References:
        - [Offline inference embeddings](https://docs.vllm.ai/en/latest/getting_started/examples/offline_inference_embedding.html)

    Examples:
        Generating sentence embeddings:

        ```python
        from distilabel.models import vLLMEmbeddings

        embeddings = vLLMEmbeddings(model="intfloat/e5-mistral-7b-instruct")

        embeddings.load()

        results = embeddings.encode(inputs=["distilabel is awesome!", "and Argilla!"])
        # [
        #   [-0.05447685346007347, -0.01623094454407692, ...],
        #   [4.4889533455716446e-05, 0.044016145169734955, ...],
        # ]
        ```
    """

    model: str
    dtype: str = "auto"
    trust_remote_code: bool = False
    quantization: Optional[str] = None
    revision: Optional[str] = None

    enforce_eager: bool = True

    seed: int = 0

    extra_kwargs: Optional[RuntimeParameter[Dict[str, Any]]] = Field(
        default_factory=dict,
        description="Additional dictionary of keyword arguments that will be passed to the"
        " `vLLM` class of `vllm` library. See all the supported arguments at: "
        "https://github.com/vllm-project/vllm/blob/main/vllm/entrypoints/llm.py",
    )

    _model: "_vLLM" = PrivateAttr(None)

    def load(self) -> None:
        """Loads the `vLLM` model using either the path or the Hugging Face Hub repository id."""
        super().load()

        CudaDevicePlacementMixin.load(self)

        try:
            from vllm import LLM as _vLLM

        except ImportError as ie:
            raise ImportError(
                "vLLM is not installed. Please install it using `pip install 'distilabel[vllm]'`."
            ) from ie

        self._model = _vLLM(
            self.model,
            dtype=self.dtype,
            trust_remote_code=self.trust_remote_code,
            quantization=self.quantization,
            revision=self.revision,
            enforce_eager=self.enforce_eager,
            seed=self.seed,
            **self.extra_kwargs,  # type: ignore
        )

    def unload(self) -> None:
        """Unloads the `vLLM` model."""
        CudaDevicePlacementMixin.unload(self)
        super().unload()

    @property
    def model_name(self) -> str:
        """Returns the name of the model."""
        return self.model

    def encode(self, inputs: List[str]) -> List[List[Union[int, float]]]:
        """Generates embeddings for the provided inputs.

        Args:
            inputs: a list of texts for which an embedding has to be generated.

        Returns:
            The generated embeddings.
        """
        return [output.outputs.embedding for output in self._model.encode(inputs)]

`model_name` `property`

Returns the name of the model.

`load()`

Loads the vLLM model using either the path or the Hugging Face Hub repository id.

Source code in src/distilabel/models/embeddings/vllm.py

def load(self) -> None:
    """Loads the `vLLM` model using either the path or the Hugging Face Hub repository id."""
    super().load()

    CudaDevicePlacementMixin.load(self)

    try:
        from vllm import LLM as _vLLM

    except ImportError as ie:
        raise ImportError(
            "vLLM is not installed. Please install it using `pip install 'distilabel[vllm]'`."
        ) from ie

    self._model = _vLLM(
        self.model,
        dtype=self.dtype,
        trust_remote_code=self.trust_remote_code,
        quantization=self.quantization,
        revision=self.revision,
        enforce_eager=self.enforce_eager,
        seed=self.seed,
        **self.extra_kwargs,  # type: ignore
    )
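`extra_kwargs` is unpacked into the same call as the explicit keyword arguments. It can therefore add options that `vLLMEmbeddings` does not expose as fields, but it cannot override one that is already passed explicitly (such as `seed`): Python rejects the duplicate keyword with a `TypeError` before the call runs. The sketch reproduces this with a hypothetical `fake_llm` function standing in for `vllm.LLM`; `max_model_len` is used only as an illustrative extra option.

```python
from typing import Any, Dict


def fake_llm(model: str, *, seed: int = 0, enforce_eager: bool = False, **kwargs: Any) -> Dict[str, Any]:
    """Hypothetical stand-in for `vllm.LLM` that just echoes its arguments."""
    return {"model": model, "seed": seed, "enforce_eager": enforce_eager, **kwargs}


def build(model: str, seed: int, extra_kwargs: Dict[str, Any]) -> Dict[str, Any]:
    # Mirrors `load()`: explicit fields first, then `**extra_kwargs`.
    return fake_llm(model, seed=seed, enforce_eager=True, **extra_kwargs)


# New options pass straight through.
print(build("my-model", 0, {"max_model_len": 4096}))
# {'model': 'my-model', 'seed': 0, 'enforce_eager': True, 'max_model_len': 4096}

# Re-specifying an explicit field fails at call time with a TypeError
# complaining about multiple values for 'seed'.
try:
    build("my-model", 0, {"seed": 42})
except TypeError as err:
    print(err)
```

To change `seed`, `dtype` or any other explicit field, set the field itself rather than adding it to `extra_kwargs`.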

`unload()`

Unloads the vLLM model.

Source code in src/distilabel/models/embeddings/vllm.py

def unload(self) -> None:
    """Unloads the `vLLM` model."""
    CudaDevicePlacementMixin.unload(self)
    super().unload()

`encode(inputs)`

Generates embeddings for the provided inputs.

Parameters:

Name	Type	Description	Default
`inputs`	`List[str]`	a list of texts for which an embedding has to be generated.	required

Returns:

Type	Description
`List[List[Union[int, float]]]`	The generated embeddings.

Source code in src/distilabel/models/embeddings/vllm.py

def encode(self, inputs: List[str]) -> List[List[Union[int, float]]]:
    """Generates embeddings for the provided inputs.

    Args:
        inputs: a list of texts for which an embedding has to be generated.

    Returns:
        The generated embeddings.
    """
    return [output.outputs.embedding for output in self._model.encode(inputs)]
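Unlike the SGLang version, which indexes a dictionary, this `encode` reads each vector through attributes: every item returned by `LLM.encode` is expected to carry an `outputs` object whose `embedding` attribute holds the vector. The sketch below mimics that shape with two hypothetical dataclasses so it runs without vLLM installed; the class names are not vLLM's, and other vLLM releases may expose pooling results differently.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FakeEmbeddingOutput:
    """Hypothetical stand-in for the per-request `outputs` object."""

    embedding: List[float]


@dataclass
class FakeRequestOutput:
    """Hypothetical stand-in for one item returned by `LLM.encode`."""

    request_id: str
    outputs: FakeEmbeddingOutput


def extract_embeddings(results: List[FakeRequestOutput]) -> List[List[float]]:
    """Same comprehension as `vLLMEmbeddings.encode`."""
    return [output.outputs.embedding for output in results]


fake_results = [
    FakeRequestOutput("0", FakeEmbeddingOutput([0.1, -0.2])),
    FakeRequestOutput("1", FakeEmbeddingOutput([0.3, 0.4])),
]
print(extract_embeddings(fake_results))  # [[0.1, -0.2], [0.3, 0.4]]
```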
