Global Steps
PushToHub
Bases: GlobalStep
A GlobalStep which creates a datasets.Dataset with the input data and pushes it to the Hugging Face Hub.
Attributes:
Name | Type | Description |
---|---|---|
repo_id | RuntimeParameter[str] | The Hugging Face Hub repository ID where the dataset will be uploaded. |
split | RuntimeParameter[str] | The split of the dataset that will be pushed. Defaults to […]. |
private | RuntimeParameter[bool] | Whether the dataset to be pushed should be private or not. Defaults to […]. |
token | Optional[RuntimeParameter[str]] | The token used to authenticate with the Hub. If not provided, the step tries to read it from the environment variable […]. |
Runtime parameters
- repo_id: The Hugging Face Hub repository ID where the dataset will be uploaded.
- split: The split of the dataset that will be pushed.
- private: Whether the dataset to be pushed should be private or not.
- token: The token that will be used to authenticate in the Hub.
Input columns
- dynamic, based on the existing data within inputs
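As a rough illustration of how the runtime parameters above might be supplied, the sketch below wires a PushToHub step after a data-loading step and passes the parameters when the pipeline runs. It assumes a distilabel 1.x installation in which `Pipeline`, `LoadDataFromDicts`, and `Step.connect` are available; the step names, the repository ID `my-user/my-dataset`, the example values for `split` and `private`, and the helper functions are hypothetical and not taken from this page. Import paths and the connection syntax may differ between versions.

```python
from typing import Any, Dict, Optional


def build_runtime_parameters(
    step_name: str,
    repo_id: str,
    split: str,
    private: bool,
    token: Optional[str] = None,
) -> Dict[str, Dict[str, Any]]:
    """Hypothetical helper: map a step name to its runtime parameters."""
    params: Dict[str, Any] = {"repo_id": repo_id, "split": split, "private": private}
    if token is not None:
        # Only include the token when one is given explicitly.
        params["token"] = token
    return {step_name: params}


def build_pipeline():
    """Build a two-step pipeline (assumes distilabel 1.x is installed)."""
    # Imported lazily so the helper above can be used without distilabel.
    from distilabel.pipeline import Pipeline
    from distilabel.steps import LoadDataFromDicts
    from distilabel.steps.globals.huggingface import PushToHub

    with Pipeline(name="push-to-hub-example") as pipeline:
        load = LoadDataFromDicts(
            name="load_data",
            data=[{"instruction": "Say hello."}],
        )
        push = PushToHub(name="push_to_hub")
        # PushToHub receives whatever columns the previous step produced.
        load.connect(push)
    return pipeline


def run_example() -> None:
    """Run the pipeline; needs valid Hub credentials and network access."""
    pipeline = build_pipeline()
    pipeline.run(
        parameters=build_runtime_parameters(
            "push_to_hub",
            repo_id="my-user/my-dataset",  # hypothetical repository ID
            split="train",  # example value, chosen for illustration
            private=True,  # example value, chosen for illustration
        )
    )
```

Calling `run_example()` would attempt a real upload, so it is left uncalled here. Passing the values at run time rather than at construction is what the RuntimeParameter types are for: the same pipeline definition can target different repositories or splits.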
Source code in src/distilabel/steps/globals/huggingface.py
process(inputs)
Method that processes the input data, respecting the datasets.Dataset formatting, and pushes it to the Hugging Face Hub based on the RuntimeParameter attributes.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
inputs | StepInput | The input data within a single object (as it's a GlobalStep) that will be transformed into a […]. | required |
Yields:
Type | Description |
---|---|
StepOutput | Propagates the received inputs so that the […] the last step of the […] steps. |
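The behaviour described for process(inputs) can be approximated with the sketch below. It is not the library's implementation: the function names are hypothetical, and it assumes that the inputs arrive as a list of row dictionaries sharing the same keys. The rows are regrouped into columns, which is the shape `datasets.Dataset.from_dict` accepts, then pushed, and finally the untouched inputs are yielded so downstream steps still receive them.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterator, List, Optional

Row = Dict[str, Any]
Columns = Dict[str, List[Any]]


def rows_to_columns(rows: List[Row]) -> Columns:
    """Regroup a list of row dicts into a dict of column lists."""
    columns: Columns = defaultdict(list)
    for row in rows:
        for key, value in row.items():
            columns[key].append(value)
    return dict(columns)


def push_columns(
    columns: Columns,
    repo_id: str,
    split: str,
    private: bool,
    token: Optional[str] = None,
) -> None:
    """Build a datasets.Dataset and upload it (needs `datasets` and network)."""
    # Imported lazily so the rest of the sketch runs without `datasets`.
    from datasets import Dataset

    Dataset.from_dict(columns).push_to_hub(
        repo_id, split=split, private=private, token=token
    )


def process_sketch(
    inputs: List[Row], push: Callable[[Columns], None]
) -> Iterator[List[Row]]:
    """Push the whole batch once, then propagate the inputs unchanged."""
    push(rows_to_columns(inputs))
    yield inputs
```

Because the step is global, `process_sketch` sees every row at once, so a single upload covers the full dataset. Taking `push` as a callable keeps the upload replaceable, for instance with `functools.partial(push_columns, repo_id=..., split=..., private=...)` in real use or with a stub in tests.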