Bases: Task
`QualityScorer` is a pre-defined task that defines the `instruction` as the input and `score` as the output. This task is used to rate the quality of instructions and responses. It is an implementation of the quality score task from the paper 'What Makes Good Data for Alignment? A Comprehensive Study of Automatic Data Selection in Instruction Tuning'. The task follows the same scheme as the Complexity Scorer, but the instruction-response pairs are scored in terms of quality, obtaining a quality score for each instruction.
Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `_template` | `Union[Template, None]` | a Jinja2 template used to format the input for the LLM. |
Input columns:

- instruction (`str`): The instruction that was used to generate the `responses`.
- responses (`List[str]`): The responses to be scored. Each response forms a pair with the instruction.

Output columns:

- quality_score (`List[float]`): The quality score for each instruction.
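For a quick sense of how the task is wired up, here is a minimal usage sketch. It assumes distilabel 1.x import paths, where LLMs live under `distilabel.llms` (newer releases may expose them under `distilabel.models`); the model name and example data are illustrative only.

```python
from distilabel.llms import OpenAILLM  # assumed import path, see note above
from distilabel.steps.tasks import QualityScorer

# Any distilabel-compatible LLM works here; gpt-4 is just a placeholder.
scorer = QualityScorer(llm=OpenAILLM(model="gpt-4"))
scorer.load()

result = next(
    scorer.process(
        [
            {
                "instruction": "Explain photosynthesis in one sentence.",
                "responses": [
                    "Plants convert sunlight, water and CO2 into sugars and oxygen.",
                    "It is a thing plants do.",
                ],
            }
        ]
    )
)
# Each row gains a list with one quality score per response,
# stored under the task's output column.
```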
References:

- [What Makes Good Data for Alignment? A Comprehensive Study of Automatic Data Selection in Instruction Tuning](https://arxiv.org/abs/2312.15685)
Source code in src/distilabel/steps/tasks/quality_scorer.py
```python
class QualityScorer(Task):
    """QualityScorer is a pre-defined task that defines the `instruction` as the input
    and `score` as the output. This task is used to rate the quality of instructions and responses.
    It's an implementation of the quality score task from the paper 'What Makes Good Data
    for Alignment? A Comprehensive Study of Automatic Data Selection in Instruction Tuning'.
    The task follows the same scheme as the Complexity Scorer, but the instruction-response pairs
    are scored in terms of quality, obtaining a quality score for each instruction.

    Attributes:
        _template: a Jinja2 template used to format the input for the LLM.

    Input columns:
        - instruction (`str`): The instruction that was used to generate the `responses`.
        - responses (`List[str]`): The responses to be scored. Each response forms a pair with the instruction.

    Output columns:
        - quality_score (`List[float]`): The quality score for each instruction.

    References:
        - [`What Makes Good Data for Alignment? A Comprehensive Study of Automatic Data Selection in Instruction Tuning`](https://arxiv.org/abs/2312.15685)
    """

    _template: Union[Template, None] = PrivateAttr(...)

    def load(self) -> None:
        super().load()
        self._template = Template(_QUALITY_SCORER_TEMPLATE)

    @property
    def inputs(self) -> List[str]:
        """The inputs for the task are `instruction` and `responses`."""
        return ["instruction", "responses"]

    def format_input(self, input: Dict[str, Any]) -> ChatType:  # type: ignore
        """The input is formatted as a `ChatType` assuming that the instruction
        is the first interaction from the user within a conversation."""
        return [{"role": "user", "content": self._template.render(**input)}]  # type: ignore

    @property
    def outputs(self):
        """The output for the task is a list of `quality_scores` containing the quality score for each
        response in `responses`."""
        return ["scores"]

    def format_output(
        self, output: Union[str, None], input: Dict[str, Any]
    ) -> Dict[str, Any]:
        """The output is formatted as a list with the score of each instruction-response pair.

        Args:
            output: the raw output of the LLM.
            input: the input to the task. Used for obtaining the number of responses.

        Returns:
            A dict containing the scores for each instruction-response pair.
        """
        if output is None:
            return {self.outputs[0]: [None] * len(input["responses"])}

        scores = []
        score_lines = output.split("\n")
        for i, line in enumerate(score_lines):
            match = _PARSE_SCORE_LINE_REGEX.match(line)
            score = float(match.group(1)) if match else None
            scores.append(score)
            if i == len(input["responses"]) - 1:
                break

        return {self.outputs[0]: scores}
```
inputs property

The inputs for the task are `instruction` and `responses`.

outputs property

The output for the task is a list of `quality_scores` containing the quality score for each response in `responses`.
format_input(input)

The input is formatted as a `ChatType` assuming that the instruction is the first interaction from the user within a conversation.
Source code in src/distilabel/steps/tasks/quality_scorer.py
```python
def format_input(self, input: Dict[str, Any]) -> ChatType:  # type: ignore
    """The input is formatted as a `ChatType` assuming that the instruction
    is the first interaction from the user within a conversation."""
    return [{"role": "user", "content": self._template.render(**input)}]  # type: ignore
```
format_output(output, input)

The output is formatted as a list with the score of each instruction-response pair.
Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `output` | `Union[str, None]` | the raw output of the LLM. | required |
| `input` | `Dict[str, Any]` | the input to the task. Used for obtaining the number of responses. | required |
Returns:

| Type | Description |
| --- | --- |
| `Dict[str, Any]` | A dict containing the scores for each instruction-response pair. |
Source code in src/distilabel/steps/tasks/quality_scorer.py
```python
def format_output(
    self, output: Union[str, None], input: Dict[str, Any]
) -> Dict[str, Any]:
    """The output is formatted as a list with the score of each instruction-response pair.

    Args:
        output: the raw output of the LLM.
        input: the input to the task. Used for obtaining the number of responses.

    Returns:
        A dict containing the scores for each instruction-response pair.
    """
    if output is None:
        return {self.outputs[0]: [None] * len(input["responses"])}

    scores = []
    score_lines = output.split("\n")
    for i, line in enumerate(score_lines):
        match = _PARSE_SCORE_LINE_REGEX.match(line)
        score = float(match.group(1)) if match else None
        scores.append(score)
        if i == len(input["responses"]) - 1:
            break

    return {self.outputs[0]: scores}
```
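To make the parsing behaviour concrete, here is a self-contained sketch of the same logic. `_PARSE_SCORE_LINE_REGEX` is defined elsewhere in the module, so the pattern below is an assumption for illustration, not the library's actual regex.

```python
import re
from typing import Dict, List, Optional, Union

# Assumed score-line pattern, e.g. "[1] Score: 4"; the real regex may differ.
_PARSE_SCORE_LINE_REGEX = re.compile(
    r"\[\d+\]\s*score:\s*(\d+(?:\.\d+)?)", re.IGNORECASE
)


def parse_scores(
    output: Union[str, None], responses: List[str]
) -> Dict[str, List[Optional[float]]]:
    """Mirrors format_output: one score per response, None when a line fails to parse."""
    if output is None:
        return {"scores": [None] * len(responses)}
    scores: List[Optional[float]] = []
    for i, line in enumerate(output.split("\n")):
        match = _PARSE_SCORE_LINE_REGEX.match(line)
        scores.append(float(match.group(1)) if match else None)
        if i == len(responses) - 1:
            break
    return {"scores": scores}


print(parse_scores("[1] Score: 4\n[2] Score: 2.5", ["resp A", "resp B"]))
# {'scores': [4.0, 2.5]}
```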