Keep
KeepColumns
Bases: Step
KeepColumns is a Step whose process method keeps only the columns listed in its
columns attribute and drops all others. The columns attribute also overrides the
default values of the inputs and outputs properties.
Note
The order in which the columns are provided matters: the output columns follow
that order. This is useful before pushing either a datasets.Dataset via
the PushToHub step or a distilabel.Distiset via the Pipeline.run output variable.
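The ordering behaviour described in the note can be illustrated with a small standalone sketch. It does not use distilabel: `keep_columns` is a hypothetical helper written for this example, and the sample row is made up. It assumes rows are plain dictionaries, as in the step's input.

```python
from typing import Any, Dict, List


def keep_columns(rows: List[Dict[str, Any]], columns: List[str]) -> List[Dict[str, Any]]:
    """Hypothetical helper: keep only `columns`, in the order given."""
    # Python dicts preserve insertion order, so iterating over `columns`
    # (rather than over each row's own keys) fixes the output column order.
    return [{col: row[col] for col in columns} for row in rows]


# Made-up row whose keys are in a different order than the requested columns.
rows = [
    {
        "model_name": "my_model",
        "generation": "white",
        "instruction": "What's the brightest color?",
    }
]

kept = keep_columns(rows, columns=["instruction", "generation"])
print(list(kept[0].keys()))  # ['instruction', 'generation']
```

The output keys follow the `columns` list, not the key order of the input row, and `model_name` is dropped.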
Attributes:
| Name | Type | Description |
|---|---|---|
| columns | List[str] | List of strings with the names of the columns to keep. |
Input columns
- dynamic, based on the columns value provided.
Output columns
- dynamic, based on the columns value provided.
Source code in src/distilabel/steps/keep.py
inputs: List[str]
property
The inputs for the task are the column names in columns.
outputs: List[str]
property
The outputs for the task are the column names in columns.
process(*inputs)
The process method keeps only the columns specified in the columns attribute.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| *inputs | StepInput | A list of dictionaries with the input data. | () |
Yields:
| Type | Description |
|---|---|
| StepOutput | A list of dictionaries with the output data. |
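As a rough illustration of the signature above, the following sketch mimics a generator that accepts several input batches and yields one filtered list per batch. It is a stand-in written for this page, not the code in src/distilabel/steps/keep.py: the free function `process`, its explicit `columns` parameter, and the `Row` alias are assumptions made so the example runs without distilabel installed.

```python
from typing import Any, Dict, Iterator, List

Row = Dict[str, Any]  # assumed shape of one input record


def process(columns: List[str], *inputs: List[Row]) -> Iterator[List[Row]]:
    """Stand-in for the step's process method (not the library code)."""
    for batch in inputs:
        # One output list is yielded per input batch; each row keeps
        # only the requested columns, in the requested order.
        yield [{col: row[col] for col in columns} for row in batch]


# Made-up batches for illustration.
batch_a = [{"a": 1, "b": 2, "c": 3}]
batch_b = [{"a": 4, "b": 5, "c": 6}, {"a": 7, "b": 8, "c": 9}]

results = list(process(["c", "a"], batch_a, batch_b))
print(results)
```

In this sketch a row that lacks a requested column raises `KeyError`; whether the real step behaves the same way is not stated here.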