Data
LoadDataFromDicts
Bases: GeneratorStep
A generator step that loads a dataset from a list of dictionaries.
This step loads the dataset and yields the data in batches as it is read from the list of dictionaries.
Attributes:
Name | Type | Description |
---|---|---|
data | List[Dict[str, Any]] | The list of dictionaries to load the data from. |
Runtime parameters
batch_size: The batch size to use when processing the data.
Output columns
Dynamic, based on the keys found in the first dictionary of the list.
Source code in src/distilabel/steps/generators/data.py
outputs: List[str] (property)
Returns a list of strings with the names of the columns that the step will generate.
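The way the output columns follow from the first dictionary can be illustrated with a minimal, library-free sketch. It does not import distilabel; `infer_output_columns` is a hypothetical helper written for this illustration, and it assumes the column names are simply the keys of the first dictionary, as the description above states.

```python
from typing import Any, Dict, List


def infer_output_columns(data: List[Dict[str, Any]]) -> List[str]:
    """Hypothetical helper: take the column names from the first dictionary only."""
    return list(data[0].keys())


rows = [
    {"instruction": "What is 2 + 2?", "source": "demo"},
    # The extra key below is not in the first dictionary, so it adds no column.
    {"instruction": "Name a prime number.", "source": "demo", "extra": "ignored"},
]

columns = infer_output_columns(rows)
print(columns)
```

Under this assumption, keys that appear only in later dictionaries do not become output columns, so it is safest to give every dictionary the same keys.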
process(offset=0)
Yields batches from a list of dictionaries.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
offset | int | The offset to start the generation from. Defaults to 0. | 0 |
Yields:
Type | Description |
---|---|
GeneratorStepOutput | A list of Python dictionaries as read from the inputs (propagated in batches) and a flag indicating whether the yielded batch is the last one. |
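The batching behaviour described above can be sketched in plain Python. This is an illustration under stated assumptions, not the library's source: `iter_batches` is a hypothetical stand-in for `process`, `batch_size` is passed as an argument rather than read from a runtime parameter, and `offset` is assumed to count rows to skip rather than batches.

```python
from typing import Any, Dict, Iterator, List, Tuple

Batch = List[Dict[str, Any]]


def iter_batches(
    data: Batch, batch_size: int, offset: int = 0
) -> Iterator[Tuple[Batch, bool]]:
    """Hypothetical stand-in for `process`: yield (batch, is_last_batch) pairs."""
    # Assumption: the offset skips that many rows before batching starts.
    remaining = data[offset:]
    while remaining:
        batch = remaining[:batch_size]
        remaining = remaining[batch_size:]
        # The flag is True only when nothing is left after this batch.
        yield batch, not remaining


rows = [{"instruction": f"question {i}"} for i in range(5)]

batches = list(iter_batches(rows, batch_size=2, offset=1))
for batch, is_last in batches:
    print(len(batch), is_last)
```

With five rows, `batch_size=2`, and `offset=1`, the sketch skips the first row and yields two batches of two rows each, with the flag set only on the second.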