EmbeddingGeneration

Generate embeddings using an Embeddings model.

EmbeddingGeneration is a Step that uses an Embeddings model to generate a sentence embedding for each input text.

Attributes

  • embeddings: the Embeddings model used to generate the sentence embeddings.

Input & Output Columns

graph TD
    subgraph Dataset
        subgraph Columns
            ICOL0[text]
        end
        subgraph New columns
            OCOL0[embedding]
        end
    end

    subgraph EmbeddingGeneration
        StepInput[Input Columns: text]
        StepOutput[Output Columns: embedding]
    end

    ICOL0 --> StepInput
    StepOutput --> OCOL0
    StepInput --> StepOutput

Inputs

  • text (str): the text for which to generate the sentence embedding.

Outputs

  • embedding (List[Union[float, int]]): the generated sentence embedding.
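
The column contract above can be sketched without loading a real model. The snippet below is an illustrative stand-in, not distilabel's implementation: `toy_embed` and `add_embeddings` are hypothetical names, and the "embedding" is only a few character statistics so that the example runs with the standard library alone.

```python
from typing import Dict, List, Union

Row = Dict[str, object]


def toy_embed(text: str) -> List[Union[float, int]]:
    """Hypothetical stand-in for an Embeddings model: returns a tiny fixed-size vector."""
    # Not a real sentence embedding, only simple character statistics.
    return [len(text), text.count(" "), sum(map(ord, text)) / max(len(text), 1)]


def add_embeddings(rows: List[Row]) -> List[Row]:
    """Read the `text` column of each row and add an `embedding` column."""
    return [{**row, "embedding": toy_embed(str(row["text"]))} for row in rows]


rows = add_embeddings([{"text": "Hello, how are you?"}])
```

Each returned row keeps its original `text` column and gains an `embedding` column holding a list of numbers, which mirrors the shape the step is documented to produce.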

Examples

Generate sentence embeddings with Sentence Transformers

from distilabel.embeddings import SentenceTransformerEmbeddings
from distilabel.steps import EmbeddingGeneration

embedding_generation = EmbeddingGeneration(
    embeddings=SentenceTransformerEmbeddings(
        model="mixedbread-ai/mxbai-embed-large-v1",
    )
)

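# Load the embeddings model before processing any rows.
embedding_generation.load()

# process() is a generator; next() retrieves the first batch of output rows.
result = next(embedding_generation.process([{"text": "Hello, how are you?"}]))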
# [{'text': 'Hello, how are you?', 'embedding': [0.06209656596183777, -0.015797119587659836, ...]}]
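
Because each output row carries its embedding as a plain list of numbers, rows can be compared downstream without extra dependencies. The following is a minimal sketch, assuming two rows shaped like the output above. The short vectors are made up for illustration and are not real model output, and `cosine_similarity` is a helper defined here, not part of distilabel.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Illustrative rows only; real embeddings have many more dimensions.
rows = [
    {"text": "first", "embedding": [1.0, 0.0, 1.0]},
    {"text": "second", "embedding": [1.0, 0.0, 0.0]},
]
similarity = cosine_similarity(rows[0]["embedding"], rows[1]["embedding"])
```

Values close to 1 indicate vectors pointing in nearly the same direction; how well that tracks semantic similarity depends on the embeddings model used.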