Generate embeddings
          GenerateEmbeddings
¶
  
            Bases: Step
Generate embeddings for a text input using the last hidden state of an LLM, as
described in the paper 'What Makes Good Data for Alignment? A Comprehensive Study of
Automatic Data Selection in Instruction Tuning'.
Attributes:
| Name | Type | Description | 
|---|---|---|
llm | 
          
                LLM
           | 
          
             The   | 
        
Input columns
- text (
str,List[Dict[str, str]]): The input text or conversation to generate embeddings for. 
Output columns
- embedding (
List[float]): The embedding of the input text or conversation. 
Source code in src/distilabel/steps/tasks/generate_embeddings.py
              
          inputs: List[str]
  
  
      property
  
¶
  The inputs for the task is a text column containing either a string or a
list of dictionaries in OpenAI chat-like format.
          outputs: List[str]
  
  
      property
  
¶
  The outputs for the task is an embedding column containing the embedding of
the text input.
          format_input(input)
¶
  Formats the input to be used by the LLM to generate the embeddings. The input
can be in ChatType format or a string. If a string, it will be converted to a
list of dictionaries in OpenAI chat-like format.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
input | 
          
                Dict[str, Any]
           | 
          
             The input to format.  | 
          required | 
Returns:
| Type | Description | 
|---|---|
                ChatType
           | 
          
             The OpenAI chat-like format of the input.  | 
        
Source code in src/distilabel/steps/tasks/generate_embeddings.py
            
          load()
¶
  
          process(inputs)
¶
  Generates an embedding for each input using the last hidden state of the LLM.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
inputs | 
          
                StepInput
           | 
          
             A list of Python dictionaries with the inputs of the task.  | 
          required | 
Yields:
| Type | Description | 
|---|---|
                StepOutput
           | 
          
             A list of Python dictionaries with the outputs of the task.  |