Bases: `PreferenceTaskNoRationale`

A `PreferenceTask` following the `Quality Scorer` specification for rating instructions in terms of quality.

This task is inspired by the Evol Quality Scorer in the Deita framework: *Deita is an open-sourced project designed to facilitate Automatic Data Selection for instruction tuning in Large Language Models (LLMs).* The task follows the same scheme as the Evol Complexity Scorer, but the instructions are scored in terms of quality, obtaining a quality score *q* for each instruction.
Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `system_prompt` | `str` | the system prompt to be used. Not defined for this task. | `''` |
References

- [What Makes Good Data for Alignment? A Comprehensive Study of Automatic Data Selection in Instruction Tuning](https://arxiv.org/abs/2312.15685)
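As a quick orientation, the snippet below sketches the full round trip: build a scoring prompt for two candidate responses, then parse a hypothetical LLM completion back into numeric quality scores. The completion string and the `rating` output key are illustrative assumptions, not guaranteed library output.

```python
from distilabel.tasks.preference import QualityScorerTask

task = QualityScorerTask()

# Build the prompt asking the LLM to score two candidate responses.
prompt = task.generate_prompt(
    input="What are the first 5 Fibonacci numbers?",
    generations=["0 1 1 2 3", "0, 1, 1, 2, 3"],
)
print(prompt.formatted_prompt)  # "Rank the following responses provided ..."

# Map a raw completion back to quality scores. The "[Response N] Score: X"
# lines below are a hypothetical completion in the format parse_output expects.
scores = task.parse_output("[Response 1] Score: 4\n[Response 2] Score: 5")
print(scores)  # e.g. {"rating": [4.0, 5.0]}
```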
Source code in src/distilabel/tasks/preference/quality_scorer.py
```python
@dataclass
class QualityScorerTask(PreferenceTaskNoRationale):
    """A `PreferenceTask` following the `Quality Scorer` specification for rating instructions
    in terms of quality.

    This task is inspired by the Evol Quality Scorer in the Deita framework: *Deita is an open-sourced project
    designed to facilitate Automatic Data Selection for instruction tuning in Large Language Models (LLMs).*

    The task follows the same scheme as the Evol Complexity Scorer, but the instructions are scored in terms of
    quality, obtaining a quality score *q* for each instruction.

    Args:
        system_prompt (str, optional): the system prompt to be used. Not defined for this task.

    References:
        - [`What Makes Good Data for Alignment? A Comprehensive Study of Automatic Data Selection in Instruction Tuning`](https://arxiv.org/abs/2312.15685)
    """

    system_prompt: str = ""
    task_description: str = """Your evaluation should consider factors such as helpfulness, relevance, accuracy, depth,
    creativity, and level of detail of the response."""

    __jinja2_template__: str = _QUALITY_SCORER_TEMPLATE

    def generate_prompt(self, input: str, generations: List[str], **_: Any) -> Prompt:
        """Generates a prompt following the *Evol Quality* specification in *Deita*.

        Args:
            input (str): the instruction for which the model will score the responses.
            generations (List[str]): the generations to be used for the prompt.

        Returns:
            Prompt: the generated prompt.

        Examples:
            >>> from distilabel.tasks.preference import QualityScorerTask
            >>> task = QualityScorerTask()
            >>> task.generate_prompt("What are the first 5 Fibonacci numbers?", ["0 1 1 2 3", "0 1 1 2 3"])
            Prompt(
                system_prompt="",
                formatted_prompt="Rank the following responses provided ..."
            )
        """
        render_kwargs = {
            "instruction": input,
            "responses": generations,
            "task_description": self.task_description,
        }
        return Prompt(
            system_prompt=self.system_prompt,
            formatted_prompt=self.template.render(**render_kwargs),
        )

    def parse_output(self, output: str) -> Dict[str, List[str]]:
        """Parses the output of the task, returning a list with the rating of each instruction.

        Args:
            output (str): the raw output of the LLM.

        Returns:
            Dict[str, List[str]]: a dict containing the ratings of each instruction.
        """
        output = output.lower().split("\n")
        scores = [
            float(re.sub(r"\[response \d+\] score:", "", o).strip()) for o in output
        ]
        return {self.output_args_names[0]: scores}
```
generate_prompt(input, generations, **_)
Generates a prompt following the Evol Quality specification in Deita.
Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `input` | `str` | the instruction for which the model will score the responses. | *required* |
| `generations` | `List[str]` | the generations to be used for the prompt. | *required* |
Returns:

| Name | Type | Description |
| --- | --- | --- |
| `Prompt` | `Prompt` | the generated prompt. |
Examples:

```python
>>> from distilabel.tasks.preference import QualityScorerTask
>>> task = QualityScorerTask()
>>> task.generate_prompt("What are the first 5 Fibonacci numbers?", ["0 1 1 2 3", "0 1 1 2 3"])
Prompt(
    system_prompt="",
    formatted_prompt="Rank the following responses provided ..."
)
```
Source code in src/distilabel/tasks/preference/quality_scorer.py
```python
def generate_prompt(self, input: str, generations: List[str], **_: Any) -> Prompt:
    """Generates a prompt following the *Evol Quality* specification in *Deita*.

    Args:
        input (str): the instruction for which the model will score the responses.
        generations (List[str]): the generations to be used for the prompt.

    Returns:
        Prompt: the generated prompt.

    Examples:
        >>> from distilabel.tasks.preference import QualityScorerTask
        >>> task = QualityScorerTask()
        >>> task.generate_prompt("What are the first 5 Fibonacci numbers?", ["0 1 1 2 3", "0 1 1 2 3"])
        Prompt(
            system_prompt="",
            formatted_prompt="Rank the following responses provided ..."
        )
    """
    render_kwargs = {
        "instruction": input,
        "responses": generations,
        "task_description": self.task_description,
    }
    return Prompt(
        system_prompt=self.system_prompt,
        formatted_prompt=self.template.render(**render_kwargs),
    )
```
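For intuition about the rendering step, here is a hedged sketch of how a Jinja2 template taking the same three render keys (`instruction`, `responses`, `task_description`) could produce such a prompt. The template text is illustrative only; the actual `_QUALITY_SCORER_TEMPLATE` shipped with distilabel differs.

```python
from jinja2 import Template

# Illustrative template only: the real _QUALITY_SCORER_TEMPLATE differs,
# but it is rendered with the same three keys.
template = Template(
    "Rank the following responses provided by different AI assistants to the "
    "user's question according to their quality.\n"
    "{{ task_description }}\n\n"
    "Question: {{ instruction }}\n"
    "{% for response in responses %}"
    "[Response {{ loop.index }}] {{ response }}\n"
    "{% endfor %}\n"
    "Score each response as '[Response N] Score: X'."
)

print(
    template.render(
        instruction="What are the first 5 Fibonacci numbers?",
        responses=["0 1 1 2 3", "0, 1, 1, 2, 3"],
        task_description="Your evaluation should consider factors such as helpfulness, relevance, ...",
    )
)
```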
parse_output(output)
Parses the output of the task, returning a list with the rating of each instruction.
Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `output` | `str` | the raw output of the LLM. | *required* |

Returns:

| Type | Description |
| --- | --- |
| `Dict[str, List[str]]` | a dict containing the ratings of each instruction. |
Source code in src/distilabel/tasks/preference/quality_scorer.py
```python
def parse_output(self, output: str) -> Dict[str, List[str]]:
    """Parses the output of the task, returning a list with the rating of each instruction.

    Args:
        output (str): the raw output of the LLM.

    Returns:
        Dict[str, List[str]]: a dict containing the ratings of each instruction.
    """
    output = output.lower().split("\n")
    scores = [
        float(re.sub(r"\[response \d+\] score:", "", o).strip()) for o in output
    ]
    return {self.output_args_names[0]: scores}
```
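To make the parsing step concrete, here is a minimal, self-contained sketch of the same regex logic applied to a hypothetical completion, independent of the library code above.

```python
import re

# Hypothetical LLM completion in the "[Response N] Score: X" format the task expects.
raw = "[Response 1] Score: 2\n[Response 2] Score: 5"

# Lower-case, split per line, strip the "[response N] score:" prefix, cast to float.
lines = raw.lower().split("\n")
scores = [float(re.sub(r"\[response \d+\] score:", "", line).strip()) for line in lines]
print(scores)  # [2.0, 5.0]
```

Note that any line deviating from this format will make the `float()` conversion raise a `ValueError`, so the completion must contain exactly one score line per response.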