GeneratorStep
This section contains the API reference for the GeneratorStep class.
For more information and examples on how to use existing generator steps or create custom ones, please refer to Tutorial - Step - GeneratorStep.
GeneratorStep
Bases: _Step, ABC
A special kind of Step that generates data, i.e. it doesn't receive
any input from previous steps.
Attributes:
| Name | Type | Description | 
|---|---|---|
| batch_size | RuntimeParameter[int] | The number of rows in each batch generated by the step. Defaults to 50. |
Runtime parameters
- batch_size: The number of rows in each batch generated by the step. Defaults to 50.
Source code in src/distilabel/steps/base.py
process(offset=0) abstractmethod
Method that defines the generation logic of the step. It should yield the output rows and a boolean indicating whether it's the last batch.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| offset | int | The offset to start the generation from. Defaults to 0. | 0 | 
Yields:
| Type | Description | 
|---|---|
| GeneratorStepOutput | The output rows and a boolean indicating if it's the last batch or not. | 
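The contract described above (yield the output rows together with a last-batch flag, starting from an offset) can be illustrated with a minimal standalone sketch. It deliberately does not import distilabel: `generate_batches` and the `Batch` alias are hypothetical names used only for illustration, and the sketch assumes that `offset` counts rows. In an actual subclass, similar logic would live in the body of `process`.

```python
from typing import Any, Dict, Iterator, List, Tuple

# Hypothetical alias for the documented shape: (output rows, is-last-batch flag).
Batch = Tuple[List[Dict[str, Any]], bool]


def generate_batches(
    rows: List[Dict[str, Any]], batch_size: int = 50, offset: int = 0
) -> Iterator[Batch]:
    """Yield `rows` in batches of `batch_size`, skipping the first `offset` rows."""
    remaining = rows[offset:]  # assumption: the offset is measured in rows
    if not remaining:
        # Nothing left to generate: emit a single empty, final batch.
        yield [], True
        return
    for start in range(0, len(remaining), batch_size):
        batch = remaining[start : start + batch_size]
        # The flag is True only for the batch that exhausts the data.
        is_last = start + batch_size >= len(remaining)
        yield batch, is_last


data = [{"instruction": f"task {i}"} for i in range(5)]
batches = list(generate_batches(data, batch_size=2, offset=1))
```

With five rows, `batch_size=2` and `offset=1`, the sketch yields two batches of two rows each, and only the second one carries the last-batch flag.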
process_applying_mappings(offset=0)
Runs the process method of the step and applies the output_mappings to the output rows. This is the method that should be used to run the generation logic of the step.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| offset | int | The offset to start the generation from. Defaults to 0. | 0 | 
Yields:
| Type | Description | 
|---|---|
| GeneratorStepOutput | The output rows and a boolean indicating if it's the last batch or not. | 
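As a rough illustration of what applying output mappings to generated rows means, the following standalone sketch renames row keys according to a mapping while passing the last-batch flag through unchanged. It does not use distilabel: `apply_output_mappings` is a hypothetical helper, and the assumption that the mapping goes from the original column name to the new one is stated here rather than taken from the reference above.

```python
from typing import Any, Dict, Iterable, Iterator, List, Tuple

# Hypothetical alias for the documented shape: (output rows, is-last-batch flag).
Batch = Tuple[List[Dict[str, Any]], bool]


def apply_output_mappings(
    batches: Iterable[Batch], output_mappings: Dict[str, str]
) -> Iterator[Batch]:
    """Rename the keys of every row; keys without a mapping are kept as they are."""
    for rows, is_last in batches:
        renamed = [
            {output_mappings.get(key, key): value for key, value in row.items()}
            for row in rows
        ]
        yield renamed, is_last


raw_batches = [([{"prompt": "Tell me a joke.", "id": 1}], True)]
mapped = list(apply_output_mappings(raw_batches, {"prompt": "instruction"}))
```

Here the `prompt` column is renamed to `instruction`, while `id` has no entry in the mapping and is left untouched.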
make_generator_step(dataset, pipeline=None, batch_size=50, input_mappings=None, output_mappings=None, resources=StepResources(), repo_id='default_name')
Helper function to create a GeneratorStep from a dataset.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| dataset | Union[Dataset, DataFrame, List[Dict[str, str]]] | The dataset to use in the Pipeline. | required |
| batch_size | int | The batch size, which defaults to the same value used by GeneratorStep. Defaults to 50. | 50 |
| input_mappings | Optional[Dict[str, str]] | Applies the same as any other step. Defaults to None. | None |
| output_mappings | Optional[Dict[str, str]] | Applies the same as any other step. Defaults to None. | None |
| resources | StepResources | Applies the same as any other step. Defaults to StepResources(). | StepResources() |
| repo_id | Optional[str] | The repository ID to use in the LoadDataFromHub step. Defaults to 'default_name'. | 'default_name' |
Raises:
| Type | Description | 
|---|---|
| ValueError | If the format is different from the ones supported. | 
Returns:
| Type | Description | 
|---|---|
| GeneratorStep | A LoadDataFromDicts if the input is a list of dicts, or a LoadDataFromHub instance if the input is a DataFrame or a Dataset. |