Base

Argilla

Bases: Step, ABC

Abstract step to subclass from. It contains the boilerplate code required to interact with Argilla, plus some extra validations, and defines the abstract methods that must be implemented in order to add a new dataset type as a step.

Note

This class is not intended to be instantiated directly, only through a subclass.
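The subclassing contract can be sketched with the standard library alone. This is a minimal illustration, not the distilabel class: `ArgillaLike` and `TextToArgillaLike` are hypothetical names, and `process` here takes a plain list of dicts and returns a count, which is an assumption made to keep the example runnable without distilabel or Argilla installed.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class ArgillaLike(ABC):
    """Hypothetical stand-in for the abstract base; not the distilabel class."""

    @property
    def outputs(self) -> List[str]:
        # Leaf step: nothing is propagated downstream.
        return []

    @property
    @abstractmethod
    def inputs(self) -> List[str]:
        ...

    @abstractmethod
    def process(self, rows: List[Dict[str, Any]]) -> int:
        ...


class TextToArgillaLike(ArgillaLike):
    """Hypothetical concrete subclass that expects a single `text` column."""

    @property
    def inputs(self) -> List[str]:
        return ["text"]

    def process(self, rows: List[Dict[str, Any]]) -> int:
        # Count the rows that carry every required input column.
        return sum(all(col in row for col in self.inputs) for row in rows)
```

Calling `ArgillaLike()` raises `TypeError` because the abstract members are missing, while `TextToArgillaLike()` can be created once `inputs` and `process` are implemented.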

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| dataset_name | RuntimeParameter[str] | The name of the dataset in Argilla where the records will be added. |
| dataset_workspace | Optional[RuntimeParameter[str]] | The workspace where the dataset will be created in Argilla. Defaults to None, which means it will be created in the default workspace. |
| api_url | Optional[RuntimeParameter[str]] | The URL of the Argilla API. If not provided, it is read from the ARGILLA_API_URL environment variable. |
| api_key | Optional[RuntimeParameter[SecretStr]] | The API key to authenticate with Argilla. If not provided, it is read from the ARGILLA_API_KEY environment variable. |

Runtime parameters
  • dataset_name: The name of the dataset in Argilla where the records will be added.
  • dataset_workspace: The workspace where the dataset will be created in Argilla. Defaults to None, which means it will be created in the default workspace.
  • api_url: The base URL to use for the Argilla API requests.
  • api_key: The API key to authenticate the requests to the Argilla API.
Input columns
  • dynamic, based on the inputs value provided
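The environment-variable fallback for `api_url` and `api_key` can be illustrated with a standard-library dataclass. This is a rough sketch under stated assumptions: `ConnectionSettings` is a hypothetical name, plain `str` replaces `RuntimeParameter` and `SecretStr`, and the variable names are the ones given in the attribute descriptions above.

```python
import os
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ConnectionSettings:
    """Hypothetical settings holder; mirrors the env-var fallback only."""

    dataset_name: str
    dataset_workspace: Optional[str] = None
    # The factories run on each instantiation, so the environment is read
    # when the object is created, not when the class is defined.
    api_url: Optional[str] = field(
        default_factory=lambda: os.getenv("ARGILLA_API_URL")
    )
    api_key: Optional[str] = field(
        default_factory=lambda: os.getenv("ARGILLA_API_KEY")
    )
```

An explicitly passed value takes precedence over the environment; when neither is available the field ends up as `None`.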
Source code in src/distilabel/steps/argilla/base.py
class Argilla(Step, ABC):
    """Abstract step that provides a class to subclass from, that contains the boilerplate code
    required to interact with Argilla, as well as some extra validations on top of it. It also defines
    the abstract methods that need to be implemented in order to add a new dataset type as a step.

    Note:
        This class is not intended to be instantiated directly, only through a subclass.

    Attributes:
        dataset_name: The name of the dataset in Argilla where the records will be added.
        dataset_workspace: The workspace where the dataset will be created in Argilla. Defaults to
            `None`, which means it will be created in the default workspace.
        api_url: The URL of the Argilla API. If not provided, it is read from the
            `ARGILLA_API_URL` environment variable.
        api_key: The API key to authenticate with Argilla. If not provided, it is read
            from the `ARGILLA_API_KEY` environment variable.

    Runtime parameters:
        - `dataset_name`: The name of the dataset in Argilla where the records will be
            added.
        - `dataset_workspace`: The workspace where the dataset will be created in Argilla.
            Defaults to `None`, which means it will be created in the default workspace.
        - `api_url`: The base URL to use for the Argilla API requests.
        - `api_key`: The API key to authenticate the requests to the Argilla API.

    Input columns:
        - dynamic, based on the `inputs` value provided
    """

    dataset_name: RuntimeParameter[str] = Field(
        default=None, description="The name of the dataset in Argilla."
    )
    dataset_workspace: Optional[RuntimeParameter[str]] = Field(
        default=None,
        description="The workspace where the dataset will be created in Argilla. Defaults "
        "to `None` which means it will be created in the default workspace.",
    )

    api_url: Optional[RuntimeParameter[str]] = Field(
        default_factory=lambda: os.getenv("ARGILLA_API_URL"),
        description="The base URL to use for the Argilla API requests.",
    )
    api_key: Optional[RuntimeParameter[SecretStr]] = Field(
        default_factory=lambda: os.getenv(_ARGILLA_API_KEY_ENV_VAR_NAME),
        description="The API key to authenticate the requests to the Argilla API.",
    )

    _rg_dataset: Optional["RemoteFeedbackDataset"] = PrivateAttr(...)

    def model_post_init(self, __context: Any) -> None:
        """Checks that the Argilla Python SDK is installed, and then filters the Argilla warnings."""
        super().model_post_init(__context)

        try:
            import argilla as rg  # noqa
        except ImportError as ie:
            raise ImportError(
                "Argilla is not installed. Please install it using `pip install argilla`."
            ) from ie

        warnings.filterwarnings("ignore")

    def _rg_init(self) -> None:
        """Initializes the Argilla API client with the provided `api_url` and `api_key`."""
        try:
            if "hf.space" in self.api_url and "HF_TOKEN" in os.environ:
                headers = {"Authorization": f"Bearer {os.environ['HF_TOKEN']}"}
            else:
                headers = None
            rg.init(
                api_url=self.api_url,
                api_key=self.api_key.get_secret_value(),
                extra_headers=headers,
            )  # type: ignore
        except Exception as e:
            raise ValueError(f"Failed to initialize the Argilla API: {e}") from e

    def _rg_dataset_exists(self) -> bool:
        """Checks if the dataset already exists in Argilla."""
        return self.dataset_name in [
            dataset.name
            for dataset in rg.FeedbackDataset.list(workspace=self.dataset_workspace)  # type: ignore
        ]

    @property
    def outputs(self) -> List[str]:
        """The outputs of the step are an empty list, since steps subclassing this one will
        always be leaf nodes: they neither propagate the inputs nor generate any outputs.
        """
        return []

    def load(self) -> None:
        """Method to perform any initialization logic before the `process` method is
        called. For example, to load an LLM, establish a connection to a database, etc.
        """
        super().load()

        self._rg_init()

    @property
    @abstractmethod
    def inputs(self) -> List[str]:
        ...

    @abstractmethod
    def process(self, *inputs: StepInput) -> "StepOutput":
        ...
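The header selection inside `_rg_init` can be isolated into a small pure function for clarity. This is only a sketch: `build_extra_headers` is a hypothetical helper, not part of distilabel or Argilla, and it reproduces just the branch that decides whether an `Authorization` header is sent.

```python
import os
from typing import Dict, Optional


def build_extra_headers(api_url: str) -> Optional[Dict[str, str]]:
    """Hypothetical helper isolating the header choice made in `_rg_init`."""
    token = os.environ.get("HF_TOKEN")
    # A bearer token is attached only for Hugging Face Spaces URLs, and only
    # when HF_TOKEN is present in the environment.
    if "hf.space" in api_url and token is not None:
        return {"Authorization": f"Bearer {token}"}
    return None
```

For any other URL, or when `HF_TOKEN` is unset, the function returns `None`, matching the `headers = None` branch in the listing above.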

outputs: List[str] property

The outputs of the step are an empty list, since steps subclassing this one will always be leaf nodes: they neither propagate the inputs nor generate any outputs.

load()

Method to perform any initialization logic before the process method is called. For example, to load an LLM, establish a connection to a database, etc. In this class it calls the parent load() and then initializes the Argilla API client.

Source code in src/distilabel/steps/argilla/base.py
def load(self) -> None:
    """Method to perform any initialization logic before the `process` method is
    called. For example, to load an LLM, establish a connection to a database, etc.
    """
    super().load()

    self._rg_init()
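The ordering in `load` (parent first, then client initialization) can be shown with a toy pair of classes. `BaseStep`, `ClientStep`, and the `events` list are hypothetical and exist only to make the call order observable; they are not distilabel classes.

```python
from typing import List


class BaseStep:
    """Hypothetical minimal base; stands in for the parent step class."""

    def __init__(self) -> None:
        self.events: List[str] = []

    def load(self) -> None:
        self.events.append("base loaded")


class ClientStep(BaseStep):
    """Hypothetical step that opens a client after the parent has loaded."""

    def load(self) -> None:
        super().load()  # parent initialization runs first
        self._init_client()

    def _init_client(self) -> None:
        # Placeholder for the real connection logic (e.g. `_rg_init`).
        self.events.append("client initialized")
```

A subclass that overrides `load` again would follow the same pattern, calling `super().load()` before its own setup so the client is already available.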

model_post_init(__context)

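Checks that the Argilla Python SDK is installed, and then filters warnings. Note that the call used, warnings.filterwarnings("ignore"), silences all warnings in the process, not only those raised by Argilla.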

Source code in src/distilabel/steps/argilla/base.py
def model_post_init(self, __context: Any) -> None:
    """Checks that the Argilla Python SDK is installed, and then filters the Argilla warnings."""
    super().model_post_init(__context)

    try:
        import argilla as rg  # noqa
    except ImportError as ie:
        raise ImportError(
            "Argilla is not installed. Please install it using `pip install argilla`."
        ) from ie

    warnings.filterwarnings("ignore")
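The optional-dependency check follows a common pattern that can be written generically. The helper below is a sketch with a hypothetical name, `require_package`; it uses `importlib.import_module` from the standard library and re-raises with an installation hint, chaining the original error as the listing above does.

```python
import importlib


def require_package(module_name: str, pip_name: str) -> None:
    """Hypothetical helper: raise a descriptive ImportError if a module is missing."""
    try:
        importlib.import_module(module_name)
    except ImportError as ie:
        # Chain the original error so the underlying cause stays visible.
        raise ImportError(
            f"{module_name} is not installed. "
            f"Please install it using `pip install {pip_name}`."
        ) from ie
```

Calling `require_package("argilla", "argilla")` would then either return silently or fail with a message pointing at the missing package.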