Keep
KeepColumns
Bases: Step
`KeepColumns` is a `Step` whose `process` method keeps only the columns listed in the `columns` attribute and drops all others. The `columns` attribute specifies the columns to keep, and it overrides the default values of the `inputs` and `outputs` properties.
Note
The order in which the columns are provided matters, because the output columns follow that order. This is useful before pushing either a `datasets.Dataset` via the `PushToHub` step or a `distilabel.Distiset` via the `Pipeline.run` output variable.
Attributes:
Name | Type | Description |
---|---|---|
columns | List[str] | List of strings with the names of the columns to keep. |
Input columns
- dynamic, based on the `columns` value provided.
Output columns
- dynamic, based on the `columns` value provided.
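The effect of the column order described in the note can be illustrated with a minimal, plain-Python sketch. It does not use distilabel; `select_columns` is a hypothetical helper written for illustration, and the sample row is made up.

```python
from typing import Any, Dict, List


def select_columns(row: Dict[str, Any], columns: List[str]) -> Dict[str, Any]:
    """Hypothetical helper: keep only `columns` from `row`, in the given order."""
    # Iterating over `columns` (not over `row`) makes the output keys
    # follow the order in which the columns were provided.
    return {column: row[column] for column in columns}


# Made-up sample row with one extra column that should be dropped.
row = {
    "model_name": "my_model",
    "generation": "white",
    "instruction": "What's the brightest color?",
}

kept = select_columns(row, ["instruction", "generation"])
print(list(kept))  # ['instruction', 'generation']
```

In this sketch a column name that is missing from the row raises a `KeyError`; whether the library behaves the same way is not stated in this section.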
Source code in src/distilabel/steps/keep.py
inputs: List[str] property
The inputs for the task are the column names in `columns`.
outputs: List[str] property
The outputs for the task are the column names in `columns`.
process(*inputs)
The `process` method keeps only the columns specified in the `columns` attribute.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
*inputs | StepInput | A list of dictionaries with the input data. | () |
Yields:
Type | Description |
---|---|
StepOutput | A list of dictionaries with the output data. |
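As a rough sketch of the shape of `process`, the following plain-Python generator accepts several input batches and yields one filtered batch per input. It is a stand-in written for illustration, not the library's source: `process_sketch` and the `Batch` alias are hypothetical names, and yielding exactly one output batch per input batch is an assumption.

```python
from typing import Any, Dict, Iterator, List

# Hypothetical alias: a batch is a list of row dictionaries.
Batch = List[Dict[str, Any]]


def process_sketch(columns: List[str], *inputs: Batch) -> Iterator[Batch]:
    """Illustrative stand-in: yield each batch with only `columns` kept."""
    for batch in inputs:
        # Assumption: one output batch is yielded per input batch.
        yield [{column: row[column] for column in columns} for row in batch]


# Made-up sample data.
batch_a = [{"instruction": "1+1?", "generation": "2", "model_name": "m"}]
batch_b = [{"instruction": "2+2?", "generation": "4", "model_name": "m"}]

results = list(process_sketch(["generation", "instruction"], batch_a, batch_b))
print(results[0])  # [{'generation': '2', 'instruction': '1+1?'}]
```

Because the function is a generator, nothing is filtered until the caller iterates over it, for example with `list(...)` or `next(...)`.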