quality_scorer

QualityScorerTask dataclass

Bases: PreferenceTaskNoRationale

A PreferenceTask following the Quality Scorer specification for rating instructions in terms of quality.

This task is inspired by the Evol Quality Scorer in the Deita framework: Deita is an open-sourced project designed to facilitate Automatic Data Selection for instruction tuning in Large Language Models (LLMs).

The task follows the same scheme as the Evol Complexity Scorer, but the instructions are scored in terms of quality, obtaining a quality score q for each instruction.

Parameters:

    system_prompt (str, optional): the system prompt to be used. Not defined for this task. Defaults to ''.

References:

    - "What Makes Good Data for Alignment? A Comprehensive Study of Automatic Data Selection in Instruction Tuning": https://arxiv.org/abs/2312.15685

Source code in src/distilabel/tasks/preference/quality_scorer.py
@dataclass
class QualityScorerTask(PreferenceTaskNoRationale):
    """A `PreferenceTask` following the `Quality Scorer` specification for rating instructions
    in terms of quality.

    This task is inspired by the Evol Quality Scorer in the Deita framework: *Deita is an open-sourced project
    designed to facilitate Automatic Data Selection for instruction tuning in Large Language Models (LLMs).*

    The task follows the same scheme as the Evol Complexity Scorer, but the instructions are scored in terms of
    quality, obtaining a quality score *q* for each instruction.

    Args:
        system_prompt (str, optional): the system prompt to be used. Not defined for this task.

    References:
        - [`What Makes Good Data for Alignment? A Comprehensive Study of Automatic Data Selection in Instruction Tuning`](https://arxiv.org/abs/2312.15685)
    """

    system_prompt: str = ""
    task_description: str = """Your evaluation should consider factors such as helpfulness, relevance, accuracy, depth,
creativity, and level of detail of the response."""
    __jinja2_template__: str = _QUALITY_SCORER_TEMPLATE

    def generate_prompt(self, input: str, generations: List[str], **_: Any) -> Prompt:
        """Generates a prompt following the *Evol Quality* specification in *Deita*.

        Args:
            input (str): the instruction for which the model will score the responses.
            generations (List[str]): the generations to be used for the prompt.

        Returns:
            Prompt: the generated prompt.

        Examples:
            >>> from distilabel.tasks.preference import QualityScorerTask
            >>> task = QualityScorerTask()
            >>> task.generate_prompt("What are the first 5 Fibonacci numbers?", ["0 1 1 2 3", "0 1 1 2 3"])
            Prompt(
                system_prompt="",
                formatted_prompt="Rank the following responses provided ..."
            )
        """
        render_kwargs = {
            "instruction": input,
            "responses": generations,
            "task_description": self.task_description,
        }
        return Prompt(
            system_prompt=self.system_prompt,
            formatted_prompt=self.template.render(**render_kwargs),
        )

    def parse_output(self, output: str) -> Dict[str, List[float]]:
        """Parses the output of the task, returning a list with the rating of each response.

        Args:
            output (str): The raw output of the LLM.

        Returns:
            Dict[str, List[float]]: A dict containing the rating of each response.
        """
        output = output.lower().split("\n")
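        # Each lower-cased line looks like "[response N] score: X"; strip the
        # prefix and convert the remaining score to a float.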
        scores = [
            float(re.sub(r"\[response \d+\] score:", "", o).strip()) for o in output
        ]
        return {self.output_args_names[0]: scores}
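
The short sketch below exercises only the two methods documented on this page. The import follows the example in the docstring above; the completion string is hand-written for illustration (it would normally come from an LLM), and the 'rating' key shown in the final comment is an assumption, since the real key is taken from output_args_names.

from distilabel.tasks.preference import QualityScorerTask

task = QualityScorerTask()

# Build the scoring prompt for one instruction and two candidate responses.
prompt = task.generate_prompt(
    input="What are the first 5 Fibonacci numbers?",
    generations=["0 1 1 2 3", "1 1 2 3 5"],
)
print(prompt.system_prompt)     # "" -- not defined for this task
print(prompt.formatted_prompt)  # the rendered Jinja2 quality-scorer template

# Parse a completion in the format the task expects ("[Response N] Score: X").
completion = "[Response 1] Score: 2\n[Response 2] Score: 4"
print(task.parse_output(completion))
# e.g. {'rating': [2.0, 4.0]} -- the dict key comes from `output_args_names`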

generate_prompt(input, generations, **_)

Generates a prompt following the Evol Quality specification in Deita.

Parameters:

    input (str, required): the instruction for which the model will score the responses.
    generations (List[str], required): the generations to be used for the prompt.

Returns:

    Prompt: the generated prompt.

Examples:

>>> from distilabel.tasks.preference import QualityScorerTask
>>> task = QualityScorerTask()
>>> task.generate_prompt("What are the first 5 Fibonacci numbers?", ["0 1 1 2 3", "0 1 1 2 3"])
Prompt(
    system_prompt="",
    formatted_prompt="Rank the following responses provided ..."
)
Source code in src/distilabel/tasks/preference/quality_scorer.py
def generate_prompt(self, input: str, generations: List[str], **_: Any) -> Prompt:
    """Generates a prompt following the *Evol Quality* specification in *Deita*.

    Args:
        input (str): the instruction for which the model will score the responses.
        generations (List[str]): the generations to be used for the prompt.

    Returns:
        Prompt: the generated prompt.

    Examples:
        >>> from distilabel.tasks.preference import QualityScorerTask
        >>> task = QualityScorerTask()
        >>> task.generate_prompt("What are the first 5 Fibonacci numbers?", ["0 1 1 2 3", "0 1 1 2 3"])
        Prompt(
            system_prompt="",
            formatted_prompt="Rank the following responses provided ..."
        )
    """
    render_kwargs = {
        "instruction": input,
        "responses": generations,
        "task_description": self.task_description,
    }
    return Prompt(
        system_prompt=self.system_prompt,
        formatted_prompt=self.template.render(**render_kwargs),
    )
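
As a complement to the example above, here is a small sketch of how the returned Prompt could be adapted into chat-style messages for an OpenAI-compatible client. The prompt_to_messages helper and the message schema are illustrative assumptions, not part of the distilabel API documented on this page; Prompt only needs to expose the system_prompt and formatted_prompt attributes shown in the example output.

from typing import Dict, List

from distilabel.tasks.preference import QualityScorerTask

def prompt_to_messages(prompt) -> List[Dict[str, str]]:
    """Hypothetical helper: turn a Prompt into [{role, content}, ...] messages."""
    messages = []
    if prompt.system_prompt:  # empty string for QualityScorerTask, so skipped
        messages.append({"role": "system", "content": prompt.system_prompt})
    messages.append({"role": "user", "content": prompt.formatted_prompt})
    return messages

task = QualityScorerTask()
prompt = task.generate_prompt(
    "What are the first 5 Fibonacci numbers?", ["0 1 1 2 3", "0 1 1 2 3"]
)
messages = prompt_to_messages(prompt)  # ready to send to any chat-completions API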

parse_output(output)

Parses the output of the task, returning a list with the rating of each response.

Parameters:

    output (str, required): the raw output of the LLM.

Returns:

    Dict[str, List[float]]: a dict containing the rating of each response.

Source code in src/distilabel/tasks/preference/quality_scorer.py
def parse_output(self, output: str) -> Dict[str, List[float]]:
    """Parses the output of the task, returning a list with the rating of each response.

    Args:
        output (str): The raw output of the LLM.

    Returns:
        Dict[str, List[float]]: A dict containing the rating of each response.
    """
    output = output.lower().split("\n")
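    # Each lower-cased line looks like "[response N] score: X"; strip the
    # prefix and convert the remaining score to a float.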
    scores = [
        float(re.sub(r"\[response \d+\] score:", "", o).strip()) for o in output
    ]
    return {self.output_args_names[0]: scores}
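
parse_output assumes that, after lower-casing, every line of the completion matches "[response N] score: X"; any extra prose from the model would make float(...) raise a ValueError. The helper below is a hypothetical pre-filter (not part of distilabel) that keeps only well-formed score lines before handing the text to parse_output.

import re

from distilabel.tasks.preference import QualityScorerTask

# Hypothetical helper pattern; `parse_output` itself uses a non-anchored regex.
_SCORE_LINE = re.compile(r"^\[response \d+\] score:", re.IGNORECASE)

def keep_score_lines(raw: str) -> str:
    """Drop any line that does not look like a score line."""
    lines = [line for line in raw.split("\n") if _SCORE_LINE.match(line.strip())]
    return "\n".join(lines)

task = QualityScorerTask()
raw = "Sure! Here are my ratings:\n[Response 1] Score: 3\n[Response 2] Score: 5"
print(task.parse_output(keep_score_lines(raw)))
# e.g. {'rating': [3.0, 5.0]} -- the dict key comes from `output_args_names`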