GeneratorStep
This section contains the API reference for the GeneratorStep class.
For more information and examples on how to use existing generator steps or create custom ones, please refer to Tutorial - Step - GeneratorStep.
GeneratorStep
Bases: _Step, ABC
A special kind of Step that generates data, i.e. it doesn't receive any input from previous steps.
Attributes:
Name | Type | Description |
---|---|---|
batch_size | RuntimeParameter[int] | The number of rows that each batch generated by the step will contain. Defaults to 50. |
Runtime parameters
batch_size: The number of rows that each batch generated by the step will contain. Defaults to 50.
Source code in src/distilabel/steps/base.py
process(offset=0) abstractmethod
Method that defines the generation logic of the step. It should yield the output rows and a boolean indicating if it's the last batch or not.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
offset | int | The offset to start the generation from. Defaults to 0. | 0 |
Yields:
Type | Description |
---|---|
GeneratorStepOutput | The output rows and a boolean indicating if it's the last batch or not. |
Source code in src/distilabel/steps/base.py
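The contract described above (an abstract process method that yields output rows together with a last-batch flag) can be sketched without importing distilabel. The classes SketchGeneratorStep and CountingStep below are hypothetical stand-ins written for illustration, not the library's classes, and the sketch assumes that offset is a row index to resume from.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Iterator, List, Tuple

# Stand-in alias for illustration only; the real GeneratorStepOutput
# type is defined by distilabel.
SketchOutput = Iterator[Tuple[List[Dict[str, Any]], bool]]


class SketchGeneratorStep(ABC):
    """Hypothetical stand-in for GeneratorStep (not the distilabel class)."""

    def __init__(self, batch_size: int = 50) -> None:
        self.batch_size = batch_size

    @abstractmethod
    def process(self, offset: int = 0) -> SketchOutput:
        """Yield (rows, is_last_batch) tuples, starting at `offset`."""


class CountingStep(SketchGeneratorStep):
    """Generates the rows {"n": 0}, {"n": 1}, ... up to `total` rows."""

    def __init__(self, total: int, batch_size: int = 50) -> None:
        super().__init__(batch_size=batch_size)
        self.total = total

    def process(self, offset: int = 0) -> SketchOutput:
        start = offset  # assumption: offset counts rows already generated
        while True:
            end = min(start + self.batch_size, self.total)
            batch = [{"n": i} for i in range(start, end)]
            is_last = end >= self.total
            yield batch, is_last
            if is_last:
                return
            start = end
```

Iterating over CountingStep(total=5, batch_size=2).process() in this sketch produces batches of 2, 2, and 1 rows, with the flag set to True only on the final batch.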
process_applying_mappings(offset=0)
Runs the process method of the step, applying the output_mappings to the output rows. This is the function that should be used to run the generation logic of the step.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
offset | int | The offset to start the generation from. Defaults to 0. | 0 |
Yields:
Type | Description |
---|---|
GeneratorStepOutput | The output rows and a boolean indicating if it's the last batch or not. |
Source code in src/distilabel/steps/base.py
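As a rough illustration of what applying output mappings to generated rows can look like, the sketch below renames row keys according to a mapping while passing the last-batch flag through unchanged. The function apply_output_mappings is a hypothetical helper, not the library's implementation, and it assumes the mapping goes from original column name to new column name.

```python
from typing import Any, Dict, Iterable, Iterator, List, Tuple

Batch = List[Dict[str, Any]]


def apply_output_mappings(
    batches: Iterable[Tuple[Batch, bool]],
    output_mappings: Dict[str, str],
) -> Iterator[Tuple[Batch, bool]]:
    """Rename row keys per `output_mappings` (hypothetical helper).

    Keys missing from the mapping are kept as-is; the last-batch flag
    is forwarded untouched.
    """
    for rows, is_last in batches:
        renamed = [
            {output_mappings.get(key, key): value for key, value in row.items()}
            for row in rows
        ]
        yield renamed, is_last
```

Because the helper is itself a generator, it wraps any iterable of (rows, is_last_batch) tuples lazily, so batches are renamed one at a time as they are produced.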
make_generator_step(dataset, pipeline=None, batch_size=50, input_mappings=None, output_mappings=None, resources=StepResources(), repo_id='default_name')
Helper method that creates a GeneratorStep from a dataset.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dataset | Union[Dataset, DataFrame, List[Dict[str, str]]] | The dataset to use in the … | required |
batch_size | int | The batch size. Defaults to 50, the same value used by GeneratorStep. | 50 |
input_mappings | Optional[Dict[str, str]] | Applies the same as any other step. Defaults to None. | None |
output_mappings | Optional[Dict[str, str]] | Applies the same as any other step. Defaults to None. | None |
resources | StepResources | Applies the same as any other step. Defaults to StepResources(). | StepResources() |
repo_id | Optional[str] | The repository ID to use in the … | 'default_name' |
Raises:
Type | Description |
---|---|
ValueError | If the format is different from the ones supported. |
Returns:
Type | Description |
---|---|
GeneratorStep | A … if the input is a … |