GeneratorStep

This section contains the API reference for the GeneratorStep class.

For more information and examples on how to use existing generator steps or create custom ones, please refer to Tutorial - Step - GeneratorStep.

Bases: _Step, ABC

A special kind of Step that generates data, i.e. it does not receive any input from previous steps.

Attributes:

  • batch_size (RuntimeParameter[int]): The number of rows in each batch generated by the step. Defaults to 50.

Runtime parameters
  • batch_size: The number of rows in each batch generated by the step. Defaults to 50.
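
To make the effect of batch_size concrete, here is a minimal sketch of how a fixed batch size splits a number of rows into batches. The helper `batch_sizes` is hypothetical and not part of distilabel; it assumes batches are consecutive, equally sized slices with a possibly smaller final batch.

```python
from typing import List


def batch_sizes(num_rows: int, batch_size: int = 50) -> List[int]:
    """Return the number of rows in each consecutive batch.

    Hypothetical helper for illustration only; not part of distilabel.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    full, remainder = divmod(num_rows, batch_size)
    # All full batches first, then one smaller batch if rows are left over.
    return [batch_size] * full + ([remainder] if remainder else [])


print(batch_sizes(120))  # [50, 50, 20]
```

Under these assumptions, 120 rows with the default batch size of 50 give two full batches and a final batch of 20 rows.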
Source code in src/distilabel/steps/base.py
class GeneratorStep(_Step, ABC):
    """A special kind of `Step` that is able to generate data i.e. it doesn't receive
    any input from the previous steps.

    Attributes:
        batch_size: The number of rows that will contain the batches generated by the
            step. Defaults to `50`.

    Runtime parameters:
        - `batch_size`: The number of rows that will contain the batches generated by
            the step. Defaults to `50`.
    """

    batch_size: RuntimeParameter[int] = Field(
        default=50,
        description="The number of rows that will contain the batches generated by the"
        " step.",
    )

    @abstractmethod
    def process(self, offset: int = 0) -> "GeneratorStepOutput":
        """Method that defines the generation logic of the step. It should yield the
        output rows and a boolean indicating if it's the last batch or not.

        Args:
            offset: The offset to start the generation from. Defaults to 0.

        Yields:
            The output rows and a boolean indicating if it's the last batch or not.
        """
        pass

    def process_applying_mappings(self, offset: int = 0) -> "GeneratorStepOutput":
        """Runs the `process` method of the step applying the `output_mappings` to the
        output rows. This is the function that should be used to run the generation logic
        of the step.

        Args:
            offset: The offset to start the generation from. Defaults to 0.

        Yields:
            The output rows and a boolean indicating if it's the last batch or not.
        """

        # If the `Step` was built using the `@step` decorator, then we need to pass
        # the runtime parameters as `kwargs`, so they can be used within the processing
        # function
        generator = (
            self.process(offset=offset)
            if not self._built_from_decorator
            else self.process(offset=offset, **self._runtime_parameters)
        )

        for output_rows, last_batch in generator:
            yield (
                [
                    {self.output_mappings.get(k, k): v for k, v in row.items()}
                    for row in output_rows
                ],
                last_batch,
            )

process(offset=0) abstractmethod

Method that defines the generation logic of the step. It should yield the output rows and a boolean indicating if it's the last batch or not.

Parameters:

  • offset (int): The offset to start the generation from. Defaults to 0.

Yields:

  • GeneratorStepOutput: The output rows and a boolean indicating if it's the last batch or not.
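
The contract described above (yield a batch of rows together with a flag marking the last batch, starting from `offset`) can be sketched as a plain generator. This is a standalone illustration that runs without distilabel; `generate_batches`, `Row`, and the in-memory `rows` list are assumptions made for the example, not the library's implementation. In a real subclass, the same logic would live in `process` and read `self.batch_size`.

```python
from typing import Any, Dict, Iterator, List, Tuple

# Hypothetical row type: one dictionary per output row.
Row = Dict[str, Any]


def generate_batches(
    rows: List[Row], batch_size: int = 50, offset: int = 0
) -> Iterator[Tuple[List[Row], bool]]:
    """Yield (batch, last_batch) pairs, skipping the first `offset` rows."""
    remaining = rows[offset:]
    for start in range(0, len(remaining), batch_size):
        batch = remaining[start : start + batch_size]
        # The batch is the last one when no rows remain after it.
        last_batch = start + batch_size >= len(remaining)
        yield batch, last_batch


rows = [{"instruction": f"task {i}"} for i in range(5)]
for batch, last_batch in generate_batches(rows, batch_size=2, offset=1):
    print(len(batch), last_batch)
# 2 False
# 2 True
```

In this sketch, an offset that skips every row yields nothing at all; whether a real step should instead emit an empty final batch is a design decision left open here.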

Source code in src/distilabel/steps/base.py
@abstractmethod
def process(self, offset: int = 0) -> "GeneratorStepOutput":
    """Method that defines the generation logic of the step. It should yield the
    output rows and a boolean indicating if it's the last batch or not.

    Args:
        offset: The offset to start the generation from. Defaults to 0.

    Yields:
        The output rows and a boolean indicating if it's the last batch or not.
    """
    pass

process_applying_mappings(offset=0)

Runs the process method of the step and applies output_mappings to the output rows. This is the method that should be called to run the generation logic of the step.

Parameters:

  • offset (int): The offset to start the generation from. Defaults to 0.

Yields:

  • GeneratorStepOutput: The output rows and a boolean indicating if it's the last batch or not.
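
The renaming step can be illustrated in isolation. The sketch below mirrors the dictionary comprehension used in the source listing, with `output_mappings` assumed to be a plain dict from original column names to new ones; `apply_output_mappings` is a hypothetical standalone function, not a distilabel API.

```python
from typing import Any, Dict, List


def apply_output_mappings(
    rows: List[Dict[str, Any]], output_mappings: Dict[str, str]
) -> List[Dict[str, Any]]:
    """Rename the keys of each row; keys without a mapping are kept as-is."""
    return [
        {output_mappings.get(key, key): value for key, value in row.items()}
        for row in rows
    ]


rows = [{"prompt": "Write a haiku", "id": 1}]
print(apply_output_mappings(rows, {"prompt": "instruction"}))
# [{'instruction': 'Write a haiku', 'id': 1}]
```

Because `dict.get(key, key)` falls back to the original key, columns that are not listed in the mapping pass through unchanged.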

Source code in src/distilabel/steps/base.py
def process_applying_mappings(self, offset: int = 0) -> "GeneratorStepOutput":
    """Runs the `process` method of the step applying the `output_mappings` to the
    output rows. This is the function that should be used to run the generation logic
    of the step.

    Args:
        offset: The offset to start the generation from. Defaults to 0.

    Yields:
        The output rows and a boolean indicating if it's the last batch or not.
    """

    # If the `Step` was built using the `@step` decorator, then we need to pass
    # the runtime parameters as `kwargs`, so they can be used within the processing
    # function
    generator = (
        self.process(offset=offset)
        if not self._built_from_decorator
        else self.process(offset=offset, **self._runtime_parameters)
    )

    for output_rows, last_batch in generator:
        yield (
            [
                {self.output_mappings.get(k, k): v for k, v in row.items()}
                for row in output_rows
            ],
            last_batch,
        )