Bases: `PreferenceTaskNoRationale`

A `PreferenceTask` following the `Quality Scorer` specification for rating instructions in terms of quality.

This task is inspired by the Evol Quality Scorer in the Deita framework: *Deita is an open-sourced project designed to facilitate Automatic Data Selection for instruction tuning in Large Language Models (LLMs).* The task follows the same scheme as the Evol Complexity Scorer, but the instructions are scored in terms of quality, obtaining a quality score *q* for each instruction.
Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `system_prompt` | `str` | the system prompt to be used. Not defined for this task. | `''` |
References

- [What Makes Good Data for Alignment? A Comprehensive Study of Automatic Data Selection in Instruction Tuning](https://arxiv.org/abs/2312.15685)
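As a quick orientation, the snippet below sketches the full round trip: build a scoring prompt for two candidate responses, then parse a hypothetical LLM completion back into numeric quality scores. The completion string and the `rating` output key are illustrative assumptions, not guaranteed library output.

```python
from distilabel.tasks.preference import QualityScorerTask

task = QualityScorerTask()

# Build the prompt asking the LLM to score two candidate responses.
prompt = task.generate_prompt(
    input="What are the first 5 Fibonacci numbers?",
    generations=["0 1 1 2 3", "0, 1, 1, 2, 3"],
)
print(prompt.formatted_prompt)  # "Rank the following responses provided ..."

# Map a raw completion back to quality scores. The "[Response N] Score: X"
# lines below are a hypothetical completion in the format parse_output expects.
scores = task.parse_output("[Response 1] Score: 4\n[Response 2] Score: 5")
print(scores)  # e.g. {"rating": [4.0, 5.0]}
```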
Source code in src/distilabel/tasks/preference/quality_scorer.py
```python
@dataclass
class QualityScorerTask(PreferenceTaskNoRationale):
    """A `PreferenceTask` following the `Quality Scorer` specification for rating instructions
    in terms of quality.

    This task is inspired by the Evol Quality Scorer in the Deita framework: *Deita is an open-sourced project
    designed to facilitate Automatic Data Selection for instruction tuning in Large Language Models (LLMs).*

    The task follows the same scheme as the Evol Complexity Scorer, but the instructions are scored in terms of
    quality, obtaining a quality score *q* for each instruction.

    Args:
        system_prompt (str, optional): the system prompt to be used. Not defined for this task.

    References:
        - [`What Makes Good Data for Alignment? A Comprehensive Study of Automatic Data Selection in Instruction Tuning`](https://arxiv.org/abs/2312.15685)
    """

    system_prompt: str = ""
    task_description: str = """Your evaluation should consider factors such as helpfulness, relevance, accuracy, depth,
    creativity, and level of detail of the response."""

    __jinja2_template__: str = _QUALITY_SCORER_TEMPLATE

    def generate_prompt(self, input: str, generations: List[str], **_: Any) -> Prompt:
        """Generates a prompt following the *Evol Quality* specification in *Deita*.

        Args:
            input (str): the instruction for which the model will score the responses.
            generations (List[str]): the generations to be used for the prompt.

        Returns:
            Prompt: the generated prompt.

        Examples:
            >>> from distilabel.tasks.preference import QualityScorerTask
            >>> task = QualityScorerTask()
            >>> task.generate_prompt("What are the first 5 Fibonacci numbers?", ["0 1 1 2 3", "0 1 1 2 3"])
            Prompt(
                system_prompt="",
                formatted_prompt="Rank the following responses provided ..."
            )
        """
        render_kwargs = {
            "instruction": input,
            "responses": generations,
            "task_description": self.task_description,
        }
        return Prompt(
            system_prompt=self.system_prompt,
            formatted_prompt=self.template.render(**render_kwargs),
        )

    def parse_output(self, output: str) -> Dict[str, List[str]]:
        """Parses the output of the task, returning a list with the rating of each instruction.

        Args:
            output (str): the raw output of the LLM.

        Returns:
            Dict[str, List[str]]: a dict containing the ratings of each instruction.
        """
        output = output.lower().split("\n")
        scores = [
            float(re.sub(r"\[response \d+\] score:", "", o).strip()) for o in output
        ]
        return {self.output_args_names[0]: scores}
```
generate_prompt(input, generations, **_)
Generates a prompt following the Evol Quality specification in Deita.
Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `input` | `str` | the instruction for which the model will score the responses. | *required* |
| `generations` | `List[str]` | the generations to be used for the prompt. | *required* |
Returns:

| Name | Type | Description |
| --- | --- | --- |
| `Prompt` | `Prompt` | the generated prompt. |
Examples:

```python
>>> from distilabel.tasks.preference import QualityScorerTask
>>> task = QualityScorerTask()
>>> task.generate_prompt("What are the first 5 Fibonacci numbers?", ["0 1 1 2 3", "0 1 1 2 3"])
Prompt(
    system_prompt="",
    formatted_prompt="Rank the following responses provided ..."
)
```
Source code in src/distilabel/tasks/preference/quality_scorer.py
```python
def generate_prompt(self, input: str, generations: List[str], **_: Any) -> Prompt:
    """Generates a prompt following the *Evol Quality* specification in *Deita*.

    Args:
        input (str): the instruction for which the model will score the responses.
        generations (List[str]): the generations to be used for the prompt.

    Returns:
        Prompt: the generated prompt.

    Examples:
        >>> from distilabel.tasks.preference import QualityScorerTask
        >>> task = QualityScorerTask()
        >>> task.generate_prompt("What are the first 5 Fibonacci numbers?", ["0 1 1 2 3", "0 1 1 2 3"])
        Prompt(
            system_prompt="",
            formatted_prompt="Rank the following responses provided ..."
        )
    """
    render_kwargs = {
        "instruction": input,
        "responses": generations,
        "task_description": self.task_description,
    }
    return Prompt(
        system_prompt=self.system_prompt,
        formatted_prompt=self.template.render(**render_kwargs),
    )
```
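For intuition about the rendering step, here is a hedged sketch of how a Jinja2 template taking the same three render keys (`instruction`, `responses`, `task_description`) could produce such a prompt. The template text is illustrative only; the actual `_QUALITY_SCORER_TEMPLATE` shipped with distilabel differs.

```python
from jinja2 import Template

# Illustrative template only: the real _QUALITY_SCORER_TEMPLATE differs,
# but it is rendered with the same three keys.
template = Template(
    "Rank the following responses provided by different AI assistants to the "
    "user's question according to their quality.\n"
    "{{ task_description }}\n\n"
    "Question: {{ instruction }}\n"
    "{% for response in responses %}"
    "[Response {{ loop.index }}] {{ response }}\n"
    "{% endfor %}\n"
    "Score each response as '[Response N] Score: X'."
)

print(
    template.render(
        instruction="What are the first 5 Fibonacci numbers?",
        responses=["0 1 1 2 3", "0, 1, 1, 2, 3"],
        task_description="Your evaluation should consider factors such as helpfulness, relevance, ...",
    )
)
```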
parse_output(output)
Parses the output of the task, returning a list with the rating of each instruction.
Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `output` | `str` | the raw output of the LLM. | *required* |

Returns:

| Type | Description |
| --- | --- |
| `Dict[str, List[str]]` | a dict containing the ratings of each instruction. |
Source code in src/distilabel/tasks/preference/quality_scorer.py
```python
def parse_output(self, output: str) -> Dict[str, List[str]]:
    """Parses the output of the task, returning a list with the rating of each instruction.

    Args:
        output (str): the raw output of the LLM.

    Returns:
        Dict[str, List[str]]: a dict containing the ratings of each instruction.
    """
    output = output.lower().split("\n")
    scores = [
        float(re.sub(r"\[response \d+\] score:", "", o).strip()) for o in output
    ]
    return {self.output_args_names[0]: scores}
```
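To make the parsing step concrete, here is a minimal, self-contained sketch of the same regex logic applied to a hypothetical completion, independent of the library code above.

```python
import re

# Hypothetical LLM completion in the "[Response N] Score: X" format the task expects.
raw = "[Response 1] Score: 2\n[Response 2] Score: 5"

# Lower-case, split per line, strip the "[response N] score:" prefix, cast to float.
lines = raw.lower().split("\n")
scores = [float(re.sub(r"\[response \d+\] score:", "", line).strip()) for line in lines]
print(scores)  # [2.0, 5.0]
```

Note that any line deviating from this format will make the `float()` conversion raise a `ValueError`, so the completion must contain exactly one score line per response.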