LoadHubDataset¶
Loads a dataset from the Hugging Face Hub.
A GeneratorStep that loads a dataset from the Hugging Face Hub using the datasets library.
Attributes¶
- repo_id: The Hugging Face Hub repository ID of the dataset to load.
- split: The split of the dataset to load.
- config: The configuration of the dataset to load. This is optional and only needed if the dataset has multiple configurations.
Runtime Parameters¶
- batch_size: The batch size to use when processing the data.
- repo_id: The Hugging Face Hub repository ID of the dataset to load.
- split: The split of the dataset to load. Defaults to 'train'.
- config: The configuration of the dataset to load. This is optional and only needed if the dataset has multiple configurations.
- streaming: Whether to load the dataset in streaming mode. Defaults to False.
- num_examples: The number of examples to load from the dataset. By default, all examples are loaded.
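To make the interaction between batch_size and num_examples concrete, here is a minimal pure-Python sketch. It does not use distilabel or datasets: the function name iter_batches and the in-memory rows are hypothetical, and it assumes the step emits batches of at most batch_size rows together with a flag marking the last batch. It illustrates the idea only and is not the library's implementation.

```python
from itertools import islice
from typing import Any, Dict, Iterable, Iterator, List, Optional, Tuple

Row = Dict[str, Any]


def iter_batches(
    rows: Iterable[Row],
    batch_size: int,
    num_examples: Optional[int] = None,
) -> Iterator[Tuple[List[Row], bool]]:
    """Yield (batch, is_last_batch) pairs from any iterable of rows.

    Hypothetical helper: only iteration is used, so it works for both
    in-memory lists and lazily streamed rows.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")

    iterator = iter(rows)
    if num_examples is not None:
        # Stop after num_examples rows; None means "take everything".
        iterator = islice(iterator, num_examples)

    current = list(islice(iterator, batch_size))
    while current:
        # Look one batch ahead so the final batch can be flagged.
        upcoming = list(islice(iterator, batch_size))
        yield current, not upcoming
        current = upcoming


# Ten hypothetical rows, capped at 7 examples, in batches of 4.
rows = [{"instruction": f"prompt {i}"} for i in range(10)]
batches = list(iter_batches(rows, batch_size=4, num_examples=7))
```

With these values the sketch produces two batches: one with four rows and a final one with three rows, flagged as the last batch.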
Input & Output Columns¶
graph TD
subgraph Dataset
subgraph New columns
OCOL0[dynamic]
end
end
subgraph LoadHubDataset
StepOutput[Output Columns: dynamic]
end
StepOutput --> OCOL0
Outputs¶
- dynamic (all): The columns generated by this step, which depend on the dataset loaded from the Hugging Face Hub.
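Because the output columns are dynamic, they are only known once a dataset has been chosen. The sketch below shows one way to think about this, assuming each row is a plain dictionary: the hypothetical helper infer_output_columns reads the column names from the first row. This is an illustration of the concept, not how the step itself resolves its outputs.

```python
from typing import Any, Dict, Iterable, List


def infer_output_columns(rows: Iterable[Dict[str, Any]]) -> List[str]:
    """Return the column names of the first row, or [] if there are no rows.

    Hypothetical helper for illustration only.
    """
    first = next(iter(rows), None)
    return list(first.keys()) if first is not None else []


# A hypothetical dataset with two columns.
example_rows = [
    {"instruction": "Say hello", "response": "Hello!"},
    {"instruction": "Say bye", "response": "Bye!"},
]
columns = infer_output_columns(example_rows)
```

Downstream steps would then see exactly those columns, here instruction and response.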