
Combine

CombineColumns

Bases: Step

CombineColumns is a Step whose process method calls the combine_dicts function to combine a list of StepInput. It exposes two attributes, columns and output_columns, which name the columns to merge and the columns that receive the merged values. These attributes override the default values of the inputs and outputs properties, respectively.

Attributes:

Name            Type                 Description
columns         List[str]            List of strings with the names of the columns to merge.
output_columns  Optional[List[str]]  Optional list of strings with the names of the output columns.

Input columns
  • dynamic, based on the columns value provided.
Output columns
  • dynamic, based on the output_columns value provided or merged_{column} for each column in columns.
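
The naming rule for the output columns can be illustrated without the library. The following is a minimal standalone sketch that mirrors the logic of the outputs property shown in the source; the function name resolve_output_columns and the example column names are hypothetical and are not part of distilabel.

```python
from typing import List, Optional


def resolve_output_columns(
    columns: List[str], output_columns: Optional[List[str]] = None
) -> List[str]:
    """Hypothetical helper mirroring the `outputs` property of CombineColumns."""
    # Explicit output names take precedence when they are provided.
    if output_columns is not None:
        return output_columns
    # Otherwise each input column `x` maps to `merged_x`.
    return [f"merged_{column}" for column in columns]


# Example column names are assumptions chosen for illustration.
default_names = resolve_output_columns(["generation", "model_name"])
custom_names = resolve_output_columns(
    ["generation", "model_name"], ["generations", "model_names"]
)
print(default_names)
print(custom_names)
```

With no output_columns given, every name in columns is prefixed with merged_; otherwise the provided names are used unchanged.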
Source code in src/distilabel/steps/combine.py
class CombineColumns(Step):
    """CombineColumns is a Step that implements the `process` method that calls the `combine_dicts`
    function to handle and combine a list of `StepInput`. Also `CombineColumns` provides two attributes
    `columns` and `output_columns` to specify the columns to merge and the output columns
    which will override the default value for the properties `inputs` and `outputs`, respectively.

    Attributes:
        columns: List of strings with the names of the columns to merge.
        output_columns: Optional list of strings with the names of the output columns.

    Input columns:
        - dynamic, based on the `columns` value provided.

    Output columns:
        - dynamic, based on the `output_columns` value provided or `merged_{column}` for each column in `columns`.
    """

    columns: List[str]
    output_columns: Optional[List[str]] = None

    @property
    def inputs(self) -> List[str]:
        """The inputs for the task are the column names in `columns`."""
        return self.columns

    @property
    def outputs(self) -> List[str]:
        """The outputs for the task are the column names in `output_columns` or
        `merged_{column}` for each column in `columns`."""
        return (
            self.output_columns
            if self.output_columns is not None
            else [f"merged_{column}" for column in self.columns]
        )

    @override
    def process(self, *inputs: StepInput) -> "StepOutput":
        """The `process` method calls the `combine_dicts` function to handle and combine a list of `StepInput`.

        Args:
            *inputs: A list of `StepInput` to be combined.

        Yields:
            A `StepOutput` with the combined `StepInput` using the `combine_dicts` function.
        """
        yield combine_dicts(
            *inputs,
            merge_keys=self.inputs,
            output_merge_keys=self.outputs,
        )

inputs: List[str] property

The inputs for the task are the column names in columns.

outputs: List[str] property

The outputs for the task are the column names in output_columns or merged_{column} for each column in columns.

process(*inputs)

The process method calls the combine_dicts function to handle and combine a list of StepInput.

Parameters:

Name     Type       Description                          Default
*inputs  StepInput  A list of StepInput to be combined.  ()

Yields:

Type        Description
StepOutput  A StepOutput with the combined StepInput using the combine_dicts function.
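
The body of combine_dicts is not shown on this page, so the following is only a sketch of how such a merge might behave. It assumes that rows from each input are paired by position, that the values of the merge keys are collected into lists under the output names, and that all other keys are copied through unchanged. The name combine_dicts_sketch and the sample rows are hypothetical; the real function may differ.

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


def combine_dicts_sketch(
    *inputs: List[Row],
    merge_keys: List[str],
    output_merge_keys: Optional[List[str]] = None,
) -> List[Row]:
    """Illustrative stand-in for `combine_dicts`; not the library's code."""
    if output_merge_keys is None:
        output_merge_keys = [f"merged_{key}" for key in merge_keys]
    if len(output_merge_keys) != len(merge_keys):
        raise ValueError("merge_keys and output_merge_keys must have equal length")

    # Map each input column name to its output column name.
    rename = dict(zip(merge_keys, output_merge_keys))

    combined: List[Row] = []
    # Assumption: the i-th row of every input belongs to the same example.
    for rows in zip(*inputs):
        merged: Row = {out: [] for out in output_merge_keys}
        for row in rows:
            for key, value in row.items():
                if key in rename:
                    # Collect the values of merged columns into a list.
                    merged[rename[key]].append(value)
                else:
                    # Copy every other column through (last value wins).
                    merged[key] = value
        combined.append(merged)
    return combined


# Hypothetical batches, as if produced by two upstream steps.
batch_a = [{"instruction": "What is 2 + 2?", "generation": "4"}]
batch_b = [{"instruction": "What is 2 + 2?", "generation": "four"}]

result = combine_dicts_sketch(batch_a, batch_b, merge_keys=["generation"])
print(result)
```

Under these assumptions, the two generation values end up together in a single merged_generation list, while instruction is carried over once.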
