Skip to content

prometheus

PrometheusTask dataclass

Bases: CritiqueTask

A CritiqueTask following the prompt templated used by Prometheus.

Parameters:

Name Type Description Default
system_prompt str

the system prompt to be used for generation. Defaults to None.

'You are a fair evaluator language model.'
scoring_criteria str

the scoring criteria to be used for the task, that defines the scores below, provided via score_descriptions.

required
score_descriptions Dict[int, str]

the descriptions of the scores, where the key is the rating value (ideally those should be consecutive), and the value is the description of each rating.

required
Disclaimer

Since the Prometheus model has been trained with OpenAI API generated data, the prompting strategy may just be consistent / compliant with either GPT-3.5 or GPT-4 from OpenAI API, or with their own model. Any other model may fail on the generation of a structured output, as well as providing an incorrect / inaccurate critique.

References
Source code in src/distilabel/tasks/critique/prometheus.py
@dataclass
class PrometheusTask(CritiqueTask):
    """A `CritiqueTask` following the prompt templated used by Prometheus.

    Args:
        system_prompt (str, optional): the system prompt to be used for generation. Defaults to `None`.
        scoring_criteria (str): the scoring criteria to be used for the task, that defines
            the scores below, provided via `score_descriptions`.
        score_descriptions (Dict[int, str]): the descriptions of the scores, where
            the key is the rating value (ideally those should be consecutive), and the
            value is the description of each rating.

    Disclaimer:
        Since the Prometheus model has been trained with OpenAI API generated data, the prompting
        strategy may just be consistent / compliant with either GPT-3.5 or GPT-4 from OpenAI API, or
        with their own model. Any other model may fail on the generation of a structured output, as
        well as providing an incorrect / inaccurate critique.

    References:
        - [`Prometheus: Inducing Fine-grained Evaluation Capability in Language Models`](https://arxiv.org/abs/2310.08491)
        - [`kaist-ai/prometheus-13b-v1.0`](https://huggingface.co/kaist-ai/prometheus-7b-v1.0)
        - [`kaist-ai/prometheus-13b-v1.0`](https://huggingface.co/kaist-ai/prometheus-13b-v1.0)
    """

    scoring_criteria: str
    score_descriptions: Dict[int, str]

    system_prompt: str = "You are a fair evaluator language model."

    __jinja2_template__: ClassVar[str] = _PROMETHEUS_TEMPLATE

    @property
    def input_args_names(self) -> List[str]:
        return super().input_args_names + ["ref_completion"]

    def generate_prompt(
        self, input: str, generations: str, ref_completion: str, **_: Any
    ) -> Prompt:
        """Generates a prompt following the Prometheus specification.

        Args:
            input (str): the input to be used for the prompt.
            generations (List[str]): the generations to be used for the prompt, in
                this case, the ones to be critiqued.
            ref_completion (str): the reference completion to be used for the prompt,
                which is the reference one, assuming the one with the highest score.

        Returns:
            Prompt: the generated prompt.

        Examples:
            >>> from distilabel.tasks.critique import PrometheusTask
            >>> task = PrometheusTask(
            ...     scoring_criteria="Overall quality of the responses provided.",
            ...     score_descriptions={0: "false", 1: "partially false", 2: "average", 3: "partially true", 4: "true"},
            ... )
            >>> task.generate_prompt(
            ...     input="What are the first 5 Fibonacci numbers?",
            ...     generations=["0 1 1 2 3", "0 1 1 2 3"],
            ...     ref_completion="0 1 1 2 3",
            ... )
            Prompt(
                system_prompt="You are a fair evaluator language model.",
                formatted_prompt=""###Task Description:...",
            )
        """
        render_kwargs = {
            "instruction": input,
            "completion": generations,
            "ref_completion": ref_completion,
            "scoring_criteria": self.scoring_criteria,
            "score_descriptions": self.score_descriptions,
        }
        return Prompt(
            system_prompt=self.system_prompt,
            formatted_prompt=self.template.render(**render_kwargs),
        )

    def parse_output(self, output: str) -> CritiqueTaskOutput:  # type: ignore
        """Parses the output of the model into the desired format."""
        # We use a regex instead of splitting by the delimiter because the
        # critique may contain the delimiter, and using the regex is safer.
        pattern = r"(.+?)\. \[RESULT\] (\d+)"
        match = re.match(pattern, output)
        if match:
            return CritiqueTaskOutput(
                score=float(match.group(2)),
                critique=match.group(1).strip(),
            )

generate_prompt(input, generations, ref_completion, **_)

Generates a prompt following the Prometheus specification.

Parameters:

Name Type Description Default
input str

the input to be used for the prompt.

required
generations List[str]

the generations to be used for the prompt, in this case, the ones to be critiqued.

required
ref_completion str

the reference completion to be used for the prompt, which is the reference one, assuming the one with the highest score.

required

Returns:

Name Type Description
Prompt Prompt

the generated prompt.

Examples:

>>> from distilabel.tasks.critique import PrometheusTask
>>> task = PrometheusTask(
...     scoring_criteria="Overall quality of the responses provided.",
...     score_descriptions={0: "false", 1: "partially false", 2: "average", 3: "partially true", 4: "true"},
... )
>>> task.generate_prompt(
...     input="What are the first 5 Fibonacci numbers?",
...     generations=["0 1 1 2 3", "0 1 1 2 3"],
...     ref_completion="0 1 1 2 3",
... )
Prompt(
    system_prompt="You are a fair evaluator language model.",
    formatted_prompt=""###Task Description:...",
)
Source code in src/distilabel/tasks/critique/prometheus.py
def generate_prompt(
    self, input: str, generations: str, ref_completion: str, **_: Any
) -> Prompt:
    """Generates a prompt following the Prometheus specification.

    Args:
        input (str): the input to be used for the prompt.
        generations (List[str]): the generations to be used for the prompt, in
            this case, the ones to be critiqued.
        ref_completion (str): the reference completion to be used for the prompt,
            which is the reference one, assuming the one with the highest score.

    Returns:
        Prompt: the generated prompt.

    Examples:
        >>> from distilabel.tasks.critique import PrometheusTask
        >>> task = PrometheusTask(
        ...     scoring_criteria="Overall quality of the responses provided.",
        ...     score_descriptions={0: "false", 1: "partially false", 2: "average", 3: "partially true", 4: "true"},
        ... )
        >>> task.generate_prompt(
        ...     input="What are the first 5 Fibonacci numbers?",
        ...     generations=["0 1 1 2 3", "0 1 1 2 3"],
        ...     ref_completion="0 1 1 2 3",
        ... )
        Prompt(
            system_prompt="You are a fair evaluator language model.",
            formatted_prompt=""###Task Description:...",
        )
    """
    render_kwargs = {
        "instruction": input,
        "completion": generations,
        "ref_completion": ref_completion,
        "scoring_criteria": self.scoring_criteria,
        "score_descriptions": self.score_descriptions,
    }
    return Prompt(
        system_prompt=self.system_prompt,
        formatted_prompt=self.template.render(**render_kwargs),
    )

parse_output(output)

Parses the output of the model into the desired format.

Source code in src/distilabel/tasks/critique/prometheus.py
def parse_output(self, output: str) -> CritiqueTaskOutput:  # type: ignore
    """Parses the output of the model into the desired format."""
    # We use a regex instead of splitting by the delimiter because the
    # critique may contain the delimiter, and using the regex is safer.
    pattern = r"(.+?)\. \[RESULT\] (\d+)"
    match = re.match(pattern, output)
    if match:
        return CritiqueTaskOutput(
            score=float(match.group(2)),
            critique=match.group(1).strip(),
        )