
LoadHubDataset

Loads a dataset from the Hugging Face Hub.

A GeneratorStep that loads a dataset from the Hugging Face Hub using the datasets library.

Attributes

  • repo_id: The Hugging Face Hub repository ID of the dataset to load.

  • split: The split of the dataset to load.

  • config: The configuration of the dataset to load. This is optional and only needed if the dataset has multiple configurations.
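As a rough illustration of how these attributes relate to the datasets library, the sketch below maps them onto keyword arguments for datasets.load_dataset. It assumes the step forwards config as the name argument, and it also includes streaming (listed under Runtime Parameters) next to split. The helper build_load_dataset_kwargs and the repository ID my-org/my-dataset are hypothetical and used only for illustration.

```python
from typing import Any, Dict, Optional


def build_load_dataset_kwargs(
    repo_id: str,
    split: str = "train",
    config: Optional[str] = None,
    streaming: bool = False,
) -> Dict[str, Any]:
    """Map the step's attributes onto keyword arguments for datasets.load_dataset.

    Hypothetical helper: assumes `config` is forwarded as the `name`
    argument of `datasets.load_dataset`; not the step's actual code.
    """
    kwargs: Dict[str, Any] = {"path": repo_id, "split": split, "streaming": streaming}
    # Only pass a configuration name when the dataset actually needs one.
    if config is not None:
        kwargs["name"] = config
    return kwargs


# "my-org/my-dataset" is a placeholder repository ID.
kwargs = build_load_dataset_kwargs("my-org/my-dataset", config="default")
print(kwargs)
# {'path': 'my-org/my-dataset', 'split': 'train', 'streaming': False, 'name': 'default'}

# With the `datasets` library installed (and network access), these could be passed on as:
#   from datasets import load_dataset
#   ds = load_dataset(**kwargs)
```

Leaving name out entirely when config is None lets load_dataset fall back to its own default for single-configuration datasets.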

Runtime Parameters

  • batch_size: The batch size to use when processing the data.

  • repo_id: The Hugging Face Hub repository ID of the dataset to load.

  • split: The split of the dataset to load. Defaults to 'train'.

  • config: The configuration of the dataset to load. This is optional and only needed if the dataset has multiple configurations.

  • streaming: Whether to load the dataset in streaming mode. Defaults to False.

  • num_examples: The number of examples to load from the dataset. By default, all examples are loaded.
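The batching behaviour implied by batch_size and num_examples can be sketched with the standard library alone. This is a hypothetical illustration, not the step's actual implementation: it assumes rows are emitted as (batch, is_last_batch) pairs and that num_examples simply truncates the stream of rows. The function name iter_batches is invented for this example.

```python
from itertools import islice
from typing import Any, Dict, Iterable, Iterator, List, Optional, Tuple

Row = Dict[str, Any]


def iter_batches(
    rows: Iterable[Row],
    batch_size: int,
    num_examples: Optional[int] = None,
) -> Iterator[Tuple[List[Row], bool]]:
    """Yield (batch, is_last_batch) pairs, stopping after num_examples rows.

    Hypothetical sketch of a generator step's batching; not distilabel's code.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    # A single shared iterator, so each islice call continues where the last stopped.
    # islice also caps the stream lazily, which works for streamed datasets too.
    source = iter(rows) if num_examples is None else islice(rows, num_examples)
    batch = list(islice(source, batch_size))
    while batch:
        next_batch = list(islice(source, batch_size))
        # A batch is the last one when nothing follows it.
        yield batch, not next_batch
        batch = next_batch


rows = [{"id": i} for i in range(7)]
for batch, last in iter_batches(rows, batch_size=3, num_examples=5):
    print([r["id"] for r in batch], last)
# [0, 1, 2] False
# [3, 4] True
```

Reading one batch ahead is what allows the final batch to be flagged without knowing the dataset length in advance, which matters when streaming is enabled.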

Input & Output Columns

graph TD
    subgraph Dataset
        subgraph New columns
            OCOL0[dynamic]
        end
    end

    subgraph LoadHubDataset
        StepOutput[Output Columns: dynamic]
    end

    StepOutput --> OCOL0

Outputs

  • dynamic (all): The columns generated by this step, determined by the dataset loaded from the Hugging Face Hub.
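Because the output columns are not fixed, they can only be known once data has been loaded. The hypothetical helper infer_output_columns collects column names from already-loaded rows in first-seen order; with the datasets library the same information would normally come from the dataset's schema, for example Dataset.column_names. The sample rows are made up for illustration.

```python
from typing import Any, Dict, List


def infer_output_columns(rows: List[Dict[str, Any]]) -> List[str]:
    """Collect column names from loaded rows, in first-seen order.

    Hypothetical helper: illustrates why the output columns are "dynamic",
    since they depend entirely on the dataset that was loaded.
    """
    columns: List[str] = []
    seen = set()
    for row in rows:
        for name in row:
            if name not in seen:
                seen.add(name)
                columns.append(name)
    return columns


# Made-up rows standing in for data loaded from the Hub.
sample = [
    {"instruction": "Translate 'hola'", "response": "hello"},
    {"instruction": "Add 2 and 3", "response": "5"},
]
print(infer_output_columns(sample))  # ['instruction', 'response']
```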