Text generation

TextGeneration

Bases: Task

TextGeneration is a pre-defined task that takes an instruction as input and produces a generation as output. It is used to generate text based on the input instruction. The model_name is also returned as part of the output so that each generation can be attributed to the model that produced it.

Input columns
  • instruction (str): The instruction to generate text from.
Output columns
  • generation (str): The generated text.
  • model_name (str): The model name used to generate the text.
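Example
A minimal usage sketch; the OpenAILLM import path and the model name are illustrative and may differ across distilabel versions.

from distilabel.llms import OpenAILLM  # assumption: import path varies by distilabel version
from distilabel.steps.tasks import TextGeneration

# Instantiate the task with any supported LLM and load it before processing.
text_gen = TextGeneration(
    name="text_generation",
    llm=OpenAILLM(model="gpt-3.5-turbo"),  # illustrative model name
)
text_gen.load()

# Each input row provides an `instruction`; the output adds `generation` and `model_name`.
result = next(text_gen.process([{"instruction": "Write a haiku about the sea."}]))
# result ~ [{"instruction": "...", "generation": "...", "model_name": "gpt-3.5-turbo"}]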
Source code in src/distilabel/steps/tasks/text_generation.py
class TextGeneration(Task):
    """TextGeneration is a pre-defined task that defines the `instruction` as the input
    and `generation` as the output. This task is used to generate text based on the input
    instruction. The model_name is also returned as part of the output in order to enhance it.

    Input columns:
        - instruction (`str`): The instruction to generate text from.

    Output columns:
        - generation (`str`): The generated text.
        - model_name (`str`): The model name used to generate the text.
    """

    @property
    def inputs(self) -> List[str]:
        """The input for the task is the `instruction`."""
        return ["instruction"]

    def format_input(self, input: Dict[str, Any]) -> ChatType:
        """The input is formatted as a `ChatType` assuming that the instruction
        is the first interaction from the user within a conversation."""

        instruction = input["instruction"]

        if isinstance(instruction, str):
            return [{"role": "user", "content": input["instruction"]}]

        if not is_openai_format(instruction):
            raise ValueError(
                f"Input `instruction` must be a string or an OpenAI chat-like format. "
                f"Got: {instruction}. Please check: 'https://cookbook.openai.com/examples/how_to_format_inputs_to_chatgpt_models'."
            )

        return instruction

    @property
    def outputs(self) -> List[str]:
        """The output for the task is the `generation` and the `model_name`."""
        return ["generation", "model_name"]

    def format_output(
        self, output: Union[str, None], input: Dict[str, Any]
    ) -> Dict[str, Any]:
        """The output is formatted as a dictionary with the `generation`. The `model_name`
        will be automatically included within the `process` method of `Task`."""
        return {"generation": output}

inputs: List[str] property

The input for the task is the instruction.

outputs: List[str] property

The output for the task is the generation and the model_name.

format_input(input)

The input is formatted as a ChatType assuming that the instruction is the first interaction from the user within a conversation.

Source code in src/distilabel/steps/tasks/text_generation.py
def format_input(self, input: Dict[str, Any]) -> ChatType:
    """The input is formatted as a `ChatType` assuming that the instruction
    is the first interaction from the user within a conversation."""

    instruction = input["instruction"]

    if isinstance(instruction, str):
        return [{"role": "user", "content": input["instruction"]}]

    if not is_openai_format(instruction):
        raise ValueError(
            f"Input `instruction` must be a string or an OpenAI chat-like format. "
            f"Got: {instruction}. Please check: 'https://cookbook.openai.com/examples/how_to_format_inputs_to_chatgpt_models'."
        )

    return instruction
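The two accepted instruction formats can be illustrated with a small sketch (the instruction values below are made up for illustration):

# A plain string is wrapped as a single user turn.
plain_row = {"instruction": "Summarize the plot of Dune."}
# format_input(plain_row) -> [{"role": "user", "content": "Summarize the plot of Dune."}]

# A conversation already in OpenAI chat format is passed through unchanged.
chat_row = {
    "instruction": [
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarize the plot of Dune."},
    ]
}
# format_input(chat_row) -> the same list, untouched

# Any other type (e.g. an int, or a malformed list of dicts) raises a ValueError.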

format_output(output, input)

The output is formatted as a dictionary with the generation. The model_name will be automatically included within the process method of Task.

Source code in src/distilabel/steps/tasks/text_generation.py
def format_output(
    self, output: Union[str, None], input: Dict[str, Any]
) -> Dict[str, Any]:
    """The output is formatted as a dictionary with the `generation`. The `model_name`
    will be automatically included within the `process` method of `Task`."""
    return {"generation": output}
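For illustration, a hedged sketch of what format_output returns, given a TextGeneration instance named task (the completion text is made up):

# The raw LLM completion is simply wrapped under the `generation` key.
formatted = task.format_output(
    output="Rain taps the window...",        # raw text returned by the LLM
    input={"instruction": "Write a haiku."},
)
# formatted == {"generation": "Rain taps the window..."}
# `model_name` is appended later by `Task.process`, not by this method.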